The AWS AI Competency program added Agentic categories in early 2026, recognising the surge in vendor work on Bedrock AgentCore, Lambda-based agents, and broader autonomous-system architectures on AWS. For enterprise buyers evaluating agencies for AI work, the badge is now a meaningful procurement signal — but it's not the final word.
This post lays out the vendor-evaluation framework we'd recommend to procurement teams, heads of AI, and CTO buyers. We're writing it partly because we're applying for the competency ourselves, and partly because we've seen badge-driven procurement decisions go sideways often enough to want the full checklist written down.
What the badge actually means
AWS AI Competency is a specialised credential within the AWS Partner Network. To earn it (and the Agentic sub-categories), a vendor must:
- Provide two or more named customer references with documented AI workloads on AWS
- Submit a reference architecture for technical review by AWS solution architects
- Pass a security and operational excellence review against AWS Well-Architected criteria
- Maintain ongoing partner activities (training, certifications, joint go-to-market)
The Agentic categories add agent-specific requirements: production agent deployments, eval evidence, observability practices, governance documentation.
A badge holder has, at minimum, shipped real systems and had AWS validate the architecture. That's a higher bar than "partnered with AWS" or "AWS-trained team." It's a meaningful filter.
Where the badge falls short as a gating signal
Three failure modes we've seen in badge-driven procurement:
1. Coverage of categories you don't need
A vendor might hold AI Competency for, say, generative document search and content generation — but you're hiring them for autonomous customer-service agents. The badge says "they ship AI on AWS"; the relevance to your specific workload requires deeper inspection.
2. Recency and relevance of references
The reference architecture might be 18 months old, the customer reference might be a system handling 50 queries a day, and the validation might predate the agentic-specific requirements. Ask when the validation was performed and what, specifically, was reviewed.
3. AWS-centric default
The badge inherently reflects work on AWS. For workloads where AWS isn't the right answer — pure on-premise sovereign deployments, multi-cloud architectures, hybrid patterns — the badge holder may not be the strongest fit. The badge is a signal of one capability, not all capabilities.
The five questions to ask any badged vendor
If you're evaluating a vendor with the AWS AI Competency Agentic credentials, ask these five before signing:
1. Show me the reference architecture you submitted
The architecture they submitted to earn the badge is visible to AWS reviewers but typically not published to customers. They should be able to walk you through it. Compare it against your workload: is it the same shape? Different shapes need different patterns; a vendor strong at one shape may be weaker at yours.
2. How many production agents are you operating today?
Not "have built." Operating today, with active monitoring, with on-call coverage, with last-quarter operational metrics. The gap between "we can build agents" and "we operate agents in production" is where most vendor weakness shows up.
3. Walk me through your eval harness — not the marketing demo, the actual one
Ask to see the eval set, the scoring functions, the CI integration, and a sample run output. A vendor that can produce these on demand has the operational discipline you need. A vendor that talks generally about "rigorous evaluation" but can't show specifics doesn't.
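To make the ask concrete, here is a minimal, hypothetical sketch of the kind of artefact a vendor should be able to show on demand: a small eval set, a scoring function, and a threshold check that can run in CI. The case format, keyword-based scoring, and 0.8 threshold are illustrative assumptions, not a prescription for any particular harness.

```python
# Hypothetical minimal eval harness: illustrative only, not any vendor's actual tooling.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str                   # input sent to the agent
    expected_keywords: List[str]  # facts the response must mention


def keyword_score(output: str, case: EvalCase) -> float:
    """Fraction of expected keywords present in the agent's output (0.0 to 1.0)."""
    hits = sum(kw.lower() in output.lower() for kw in case.expected_keywords)
    return hits / len(case.expected_keywords)


def run_eval(agent: Callable[[str], str], cases: List[EvalCase], threshold: float = 0.8) -> float:
    """Score every case and fail (e.g. in a CI job) if the mean drops below the threshold."""
    scores = [keyword_score(agent(case.prompt), case) for case in cases]
    mean = sum(scores) / len(scores)
    if mean < threshold:
        raise AssertionError(f"eval regression: mean score {mean:.2f} < {threshold}")
    return mean


if __name__ == "__main__":
    cases = [EvalCase("What is your refund window?", ["30 days", "original payment method"])]
    print(run_eval(lambda prompt: "Refunds go to the original payment method within 30 days.", cases))
```

A real production harness will be richer than this (LLM-graded rubrics, trace-level assertions, per-release regression sets), but it should be at least this concrete, and the vendor should be able to run it in front of you.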
4. What does your handover documentation look like?
If you're hiring for a fixed-scope project, the deliverable includes documentation that lets your team operate what they built. Ask to see a sample. Vendors with strong handover practices have it pre-formatted; vendors with weak ones improvise.
5. For sovereign workloads — how do you handle data residency?
For UAE/GCC clients, this is the test that separates AWS-centric vendors from genuinely region-aware ones. AWS me-south-1 (Bahrain) covers most use cases; the gaps (CBUAE-aligned banking workloads, government data, on-prem mandates) require capability outside AWS. A strong vendor explains the AWS path AND the non-AWS path; a weak one defaults to "we can do that on AWS" for everything.
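One small but telling artefact to ask for is how region constraints are enforced in code rather than by convention. The sketch below is a hypothetical example (the APPROVED_REGIONS allow-list and helper name are ours, not AWS's or any vendor's) showing a boto3 client factory that refuses to create clients outside the approved regions.

```python
# Hypothetical residency guard: the allow-list and helper name are illustrative assumptions.
import boto3

APPROVED_REGIONS = {"me-south-1"}  # e.g. Bahrain only; adjust to the client's residency policy


def residency_client(service: str, region: str = "me-south-1"):
    """Create a boto3 client only if the region is on the approved data-residency list."""
    if region not in APPROVED_REGIONS:
        raise ValueError(f"region {region!r} is outside the approved data-residency list")
    return boto3.client(service, region_name=region)


# Succeeds for the approved region; raises before any API call for anything else.
s3 = residency_client("s3")
# residency_client("s3", region="us-east-1")  # raises ValueError
```

In practice this is usually enforced organisation-wide (for example with service control policies on requested regions) rather than only in application code, and for on-prem or CBUAE-constrained workloads no region pin helps at all. The point of the question is whether the vendor can show the enforcement mechanism for both paths.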
The non-AWS alternative checklist
For workloads where AWS isn't the right fit, look for:
- NVIDIA Inception membership — signals GPU and inference expertise outside hyperscaler clouds
- Production deployments on owned hardware — track record outside AWS/Azure/GCP
- Open-weight model expertise — Llama, Qwen, Falcon-H1, Jais 2 in production
- Sovereign deployment references — UAE, GCC, EU, or other data-residency-constrained engagements
- Regulatory-overlay experience — CBUAE, PDPL, ISO 42001 specifics
Codenovai operates this way. We run substantial workloads on AWS (me-south-1 is our default for non-sovereign cloud deployments) and we operate on-premise sovereign deployments where the workload demands it. We're applying for AWS AI Competency in 2026 and concurrently maintain NVIDIA Inception status for the on-premise path.
A sane procurement framework
For enterprise AI procurement, we'd recommend:
- Filter: require AI Competency or equivalent demonstrable credentials.
- Score: ask the five questions above; weight responses heavily.
- Reference: talk to two of their named customer references. Ask what went wrong, not just what went right.
- Pilot: start with a small fixed-scope engagement before signing a long-term commitment.
- Operate: measure them on operational metrics during the pilot, not just on the deliverable.
The badge gets you to step 1. Steps 2–5 are where the actual decision is made.
Where Codenovai fits
As noted above, we run substantial workloads on AWS (me-south-1 is our default region for non-sovereign cloud workloads) and operate on-premise sovereign deployments where the workload demands it. The Agentic Pilot program we deliver answers all five questions above by default: the eval harness, handover documentation, sovereign-deployment path, and production operating discipline are part of the standard scope, not custom add-ons.
Book a scoping call — we welcome the five-question test.
