# Review Auth0 Agent Experience Score

> Learn how Auth0 scores AI agent performance across models and frameworks, including the dimensions, grading system, and methodology behind the Agent Experience Score.

The [Agent Experience Score](https://auth0.com/agent-experience) measures how well AI agents implement Auth0 across different models and frameworks. Use it to compare current scores for agents implementing Auth0 services and features, such as Multi-factor Authentication (MFA) or Auth0 Actions, in development and testing environments, and to review how Auth0 tools improve agent performance.

Use this resource to learn about the scoring methodology, including how scores are calculated, what dimensions are measured, and how grades are assigned.

## Test specifications

AI agents (Claude Code, GitHub Copilot, Gemini CLI) run Auth0 integration tasks in isolated development environments. Each agent uses the same tools a developer would in a realistic environment: a workspace, a shell, file tools, and the Auth0 CLI. The prompts are short and realistic: "add authentication to my Next.js app," not step-by-step recipes.

Each model is tested with and without Auth0 tools ([MCP Server](/docs/get-started/build-with-ai-tools#auth0-docs-mcp-server) and [Agent Skills](/docs/quickstart/agent-skills)). The difference between those scores is the measurable impact of Auth0's AI tooling on the developer experience.

## Score dimensions

Every run is scored across eight dimensions split into two categories: 50% process and 50% output. Five dimensions assess the agent's process end to end with Auth0 tools, and three score the final output. Each dimension is scored individually from 0 to 100, then weighted and combined into the overall score.

| Dimension          | Category | Weight | Description                                                                                                                                                                                                                                             |
| ------------------ | -------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Setup Friction** | Process  | 12%    | Score determined by the agent's ability to complete the task autonomously. If the agent pauses to ask questions or encounters errors, the score decreases.                                                                                              |
| **Setup Speed**    | Process  | 12%    | Score determined by the agent's active execution time. Results are comparable across environments.                                                                                                                                                      |
| **Efficiency**     | Process  | 12%    | Score determined by the proportion of wasteful tool calls — duplicate reads, errored calls, retries, and overwritten writes. Fewer wasted calls means less cost and less complexity.                                                                    |
| **Error Recovery** | Process  | 7%     | Score determined by infrastructure errors (rate limits, timeouts) that disrupted the execution.                                                                                                                                                         |
| **Docs Quality**   | Process  | 7%     | Score determined by how well the agent used external documentation during the task — whether it looked up valid, relevant sources, successfully retrieved content, and incorporated that information into the implementation rather than discarding it. |
| **Correctness**    | Output   | 25%    | Score determined by whether the generated code imports real packages, calls real methods, and wires components correctly.                                                                                                                               |
| **Hallucination**  | Output   | 15%    | Score determined by whether the agent invented packages that don't exist or used incorrect SDK variants.                                                                                                                                                |
| **Security**       | Output   | 10%    | Score determined by whether the agent hardcoded secrets, stored tokens insecurely, or committed credentials to source code.                                                                                                                             |
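
The page states only that dimension scores are "weighted and combined". As a minimal sketch, assuming a plain weighted average with the weights from the table above, the calculation could look like the following. The function name `overall_score` and the snake_case dimension keys are illustrative and not part of any Auth0 API.

```python
# Weights from the score dimensions table, as percentages of the overall score.
WEIGHTS = {
    "setup_friction": 12,
    "setup_speed": 12,
    "efficiency": 12,
    "error_recovery": 7,
    "docs_quality": 7,
    "correctness": 25,
    "hallucination": 15,
    "security": 10,
}


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-100) into one overall score.

    Assumption: a plain weighted average. The exact formula is not published
    on this page.
    """
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    weighted_sum = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return weighted_sum / 100


# Hypothetical run: 80 on every process dimension, 90 on every output dimension.
example = {name: 80 for name in WEIGHTS}
example.update(correctness=90, hallucination=90, security=90)
print(overall_score(example))
```

Under this assumption, the hypothetical run above comes out at 85, because the process and output categories each contribute half of the total.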

## Grades

Overall scores map to letter grades:

| Grade | Score range | Description                                |
| ----- | ----------- | ------------------------------------------ |
| A     | 90+         | Production-ready with 1–2 minor issues.    |
| B     | 75–89       | Sound fundamentals with notable gaps.      |
| C     | 60–74       | Usable but needs significant cleanup.      |
| D     | 40–59       | Major failures or severe process problems. |
| F     | \< 40       | Not useful — faster to start from scratch. |

Grades are calibrated to match developer intuition. A score of 91 should feel like code you'd accept with minimal review. A score of 55 should feel like something that needs real work to fix.
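
The grade table can be expressed as a small lookup. This is a sketch, assuming each range's lower bound is inclusive and that fractional scores are not rounded before grading; the function name `letter_grade` is illustrative.

```python
def letter_grade(score: float) -> str:
    """Map an overall score (0-100) to a letter grade using the table above.

    Assumption: lower bounds are inclusive and no rounding is applied,
    so 89.9 is graded B rather than A.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    if score >= 40:
        return "D"
    return "F"


print(letter_grade(91))  # the "accept with minimal review" example
print(letter_grade(55))  # the "needs real work to fix" example
```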

## Result validation

Every grader verifies generated code — not prose or explanations. Graders check that code compiles, imports real packages, calls actual SDK methods, and doesn't introduce security vulnerabilities.

Results are validated at multiple levels:

* **Presence checks**: Required SDK symbols, imports, and config keys exist in the output.
* **Hallucination detection**: Invented packages, wrong SDK variants, and fabricated API methods are caught.
* **Security checks**: Hardcoded credentials, tokens in insecure storage, and secrets in source code are flagged.
* **Structural validation**: Code is correctly wired — right components in right files, lifecycle hooks handled, middleware in the correct order.
* **Version correctness**: The agent uses current APIs, not deprecated patterns (only checked when the agent has access to current docs).
* **Holistic review**: An LLM judge evaluates overall correctness of the implementation.
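
To make two of these levels concrete, here is a deliberately simplified sketch of a presence check and a security check. It is not the graders' actual implementation: the regular expression, the function names, and the sample input are all hypothetical, and a real grader would need a far more thorough rule set.

```python
import re

# Hypothetical pattern: a secret-looking name assigned a quoted literal
# of at least eight characters.
SECRET_PATTERN = re.compile(
    r"""(client_secret|api_key|secret)\s*[:=]\s*["'][^"']{8,}["']""",
    re.IGNORECASE,
)


def missing_symbols(source: str, required: list[str]) -> list[str]:
    """Presence check: return the required strings absent from the source."""
    return [symbol for symbol in required if symbol not in source]


def find_hardcoded_secrets(source: str) -> list[str]:
    """Security check: return lines that appear to assign a literal secret."""
    return [
        line.strip()
        for line in source.splitlines()
        if SECRET_PATTERN.search(line)
    ]


# Made-up generated code: one hardcoded value, one read from the environment.
sample = (
    'AUTH0_CLIENT_SECRET = "abcd1234efgh"\n'
    'session_secret = os.environ["AUTH0_SECRET"]\n'
)
print(find_hardcoded_secrets(sample))
print(missing_symbols(sample, ["AUTH0_CLIENT_SECRET", "AUTH0_DOMAIN"]))
```

In this sample only the first line is flagged, because the second reads its value from the environment instead of embedding a literal.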

## Estimated cost and time

The results page displays estimated cost and estimated time for each configuration. These values represent a single eval run with Auth0 MCP + Skills enabled.

### Estimated cost

Cost is calculated from the total tokens consumed during the eval run (input tokens + output tokens) multiplied by the model provider's published per-token pricing. Auth0 does not charge for running evals — the cost reflects what you would pay your model provider for equivalent token usage.

Token pricing varies by model and provider. For current rates, refer to your provider's pricing page:

* [Anthropic (Claude) pricing](https://docs.anthropic.com/en/docs/about-claude/models#model-comparison-table)
* [OpenAI (GPT) pricing](https://openai.com/api/pricing/)
* [Google (Gemini) pricing](https://ai.google.dev/gemini-api/docs/pricing/)
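
The cost estimate described above can be sketched as follows. The assumptions here are that input and output tokens are priced separately and that prices are quoted per million tokens, as is common on provider pricing pages. The function name and the prices in the example are placeholders, not real rates.

```python
def estimated_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the provider cost of one eval run in US dollars.

    Assumption: prices are expressed per million tokens and differ
    between input and output tokens.
    """
    total = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    )
    return total / 1_000_000


# Placeholder prices for illustration only; check your provider's pricing page.
print(estimated_cost_usd(2_000_000, 1_000_000, 1.0, 2.0))
```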

### Estimated time

Time is the wall-clock duration of the eval run from prompt submission to final output. It includes all agent activity: reading files, making tool calls, waiting for API responses, and writing code.

Time may vary based on:

* Model provider API latency and rate limits
* Number of tool calls required (varies by task complexity)
* Network conditions between the eval environment and the model provider
* Provider-side queue depth and load

Time is not normalized across providers. A faster time reflects both model efficiency and provider infrastructure performance.

## Learn more

* [Agent Experience Score](https://auth0.com/agent-experience)
* [Auth0 MCP Server](/docs/get-started/build-with-ai-tools#auth0-docs-mcp-server)
* [Agent Skills](/docs/quickstart/agent-skills)
