The AI Grader evaluates your Requirements across four weighted dimensions and produces a letter grade (A–F) along with actionable improvement suggestions. Grades are not cosmetic — they surface specific gaps in clarity, coverage, and testability so you know exactly what to fix.
The Four Grading Dimensions
Grades are computed from 19 sub-dimensions organized into four parent dimensions. Each sub-dimension is scored 0–10. A weighted average rolls these up into an overall 0–10 score, which maps to a letter grade; the dimension weights sum to 175 points.
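The roll-up described above can be sketched as a weighted average. This is an illustration of the arithmetic only, not Catalio's actual implementation; the dimension names and weights are taken from the sections below.

```python
# Illustrative sketch of the weighted roll-up (not Catalio's actual code).
# Each parent dimension score is itself a 0-10 average of its sub-dimensions.
WEIGHTS = {"clarity": 48, "completeness": 46, "feasibility": 37, "quality": 44}

def overall_score(dimension_scores):
    """Weighted average of 0-10 dimension scores; weights sum to 175."""
    total_weight = sum(WEIGHTS.values())  # 48 + 46 + 37 + 44 = 175
    weighted = sum(dimension_scores[name] * w for name, w in WEIGHTS.items())
    return round(weighted / total_weight, 2)

print(overall_score({"clarity": 9.0, "completeness": 8.0,
                     "feasibility": 7.0, "quality": 8.5}))  # 8.19
```

Note how the heavier Clarity and Completeness weights pull the overall score toward those two dimensions.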
Clarity (weight: 48/175)
Clarity measures how well the requirement communicates what is needed and why.
Sub-dimensions:
- Title Quality — Is the title concise, specific, and distinguishable?
- Problem Statement (user_want) — Does it identify who has the problem, the current state, the desired state, and why it matters? Is it specific and in active voice?
- Benefit Clarity (user_benefit) — Are benefits measurable? Do they link to business objectives?
- Use Case Clarity — Are happy path, edge cases, and error scenarios covered in a structured format (e.g., Given-When-Then)?
- Language Precision — Does it avoid vague terms like “user-friendly”, “fast”, or “easy” in favor of specific, quantifiable language?
Completeness (weight: 46/175)
Completeness measures whether all the information needed to build and test the feature is present.
Sub-dimensions:
- Functional Completeness — Are all user interactions, system behaviors, data inputs/outputs, and business rules specified?
- Assumption Completeness — Are technical, business, and user assumptions documented?
- Constraint Identification — Are security, compliance, performance, scalability, and integration constraints specified?
- Acceptance Criteria — Are criteria specific, measurable, and testable?
- Metadata Completeness — Are priority, complexity, business value, source, and relevant personas set?
Feasibility (weight: 37/175)
Feasibility measures whether the requirement can actually be built with available resources.
Sub-dimensions:
- Technical Feasibility — Are required technologies mature and available? Is complexity matched to team capabilities?
- Business Feasibility — Does the business value justify the investment? Is the timeline realistic?
- Implementation Readiness — Is the requirement detailed enough for estimation? Are dependencies known?
- Risk Assessment — Are technical, business, security, and integration risks identified with mitigation strategies?
Quality (weight: 44/175)
Quality measures the structural integrity of the requirement as a long-lived artifact.
Sub-dimensions:
- Traceability — Does it link to source artifacts, related requirements, or business objectives?
- Consistency — Is the requirement free of internal contradictions and aligned with organizational standards and system architecture?
- Testability — Can QA derive test plans from the requirement? Are success/fail conditions clear?
- Maintainability — Is ownership clear? Is the format structured and version-controlled?
- Stakeholder Alignment — Are the requesting stakeholders, relevant personas, and discovery method documented?
Letter Grade Mapping
| Grade | Label | Score Range | Meaning |
|---|---|---|---|
| A | Excellent | 8.5–10.0 | Production-ready; minimal clarification needed |
| B | Good | 7.0–8.4 | Solid foundation; ready after small refinements |
| C | Needs Work | 5.0–6.9 | Adequate base but needs significant improvements |
| D | Poor | 3.0–4.9 | Major gaps; requires substantial rework |
| F | Failing | 0.0–2.9 | Critical issues; not implementable as written |
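The table above can be read as a set of lower-bound thresholds on the overall 0–10 score. A minimal sketch of that mapping, for illustration only:

```python
# Illustrative mapping of an overall 0-10 score to a letter grade,
# following the thresholds in the table above (not Catalio's actual code).
def letter_grade(score):
    if score >= 8.5:
        return "A"  # Excellent
    if score >= 7.0:
        return "B"  # Good
    if score >= 5.0:
        return "C"  # Needs Work
    if score >= 3.0:
        return "D"  # Poor
    return "F"      # Failing

print(letter_grade(8.2))  # "B" - just below the A cutoff
```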
How Grades Appear in the UI
Each requirement’s detail page shows:
- Overall grade — the letter grade badge (A–F)
- Overall score — the numeric score (0–10)
- Dimension breakdown — score and grade per dimension
- Sub-dimension scores — granular scores (0–10) for each of the 19 sub-dimensions
- Improvement suggestions — specific, actionable items generated by the LLM for sub-dimensions scoring below 7.0
The grading data is stored in two fields on the Requirement resource:
- `ai_grade` — the letter grade as an atom (`:a`, `:b`, `:c`, `:d`, `:f`)
- `ai_quality_assessment_result` — the full JSON from the LLM, including all dimension scores and suggestions
On-Demand vs. Automatic Grading
On-Demand
You can trigger a grade from the requirement detail page at any time by clicking the Grade Requirement button. The LLM evaluates the current state of the requirement — including its title, user_want, user_benefit, linked use cases, acceptance criteria, and metadata — and returns results within a few seconds.
Automatic
Catalio runs the grader automatically in the background when conditions are met — for example, after a new Requirement is created or meaningfully edited. The auto-grader includes a staleness check: if the Requirement is updated while the AI is still evaluating, the in-flight grade is discarded so the saved grade always reflects the current content.
Acting on Low Scores
The suggestions field is where the grader earns its value. For any sub-dimension scoring below 7.0, the LLM generates a specific suggestion — not a generic observation.
Examples of actionable suggestions:
- “Add at least two assumptions covering technical prerequisites and data availability”
- “Replace ‘user-friendly interface’ with a specific usability metric, e.g., ‘completes in fewer than 3 clicks for first-time users’”
- “Add a performance constraint specifying the expected response time under peak load”
- “Link this requirement to the source discovery artifact or stakeholder discussion”
Work through the suggestions from highest-weight dimensions first: Clarity (48) and Completeness (46) have the most impact on the overall score.
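The triage order above can be automated against the stored assessment. The exact shape of `ai_quality_assessment_result` is not documented here, so the structure below (a `sub_dimensions` list with `score` and `suggestion` keys) is an assumption for illustration only:

```python
# Sketch of triaging stored assessment JSON: collect sub-dimensions
# scoring below 7.0 and sort by parent-dimension weight, heaviest first.
# The JSON shape shown here is assumed, not Catalio's documented schema.
DIMENSION_WEIGHTS = {"clarity": 48, "completeness": 46, "feasibility": 37, "quality": 44}

def triage(assessment):
    """Return sub-dimensions below 7.0, highest-weight dimension first."""
    low = [s for s in assessment["sub_dimensions"] if s["score"] < 7.0]
    return sorted(low, key=lambda s: DIMENSION_WEIGHTS[s["dimension"]], reverse=True)

assessment = {
    "sub_dimensions": [
        {"name": "Acceptance Criteria", "dimension": "completeness",
         "score": 6.0, "suggestion": "Add pass/fail conditions for each use case."},
        {"name": "Language Precision", "dimension": "clarity",
         "score": 5.5, "suggestion": "Replace 'user-friendly' with a measurable target."},
        {"name": "Testability", "dimension": "quality",
         "score": 8.0, "suggestion": None},
    ]
}

for item in triage(assessment):
    print(f'{item["name"]} ({item["score"]}): {item["suggestion"]}')
```

Here Language Precision sorts first despite the lower raw score order, because Clarity (48) outweighs Completeness (46); Testability is excluded because it scored at or above 7.0.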
Best Practices
Grade early and often. Grade a requirement when you first draft it, not only before a sprint. Early grades reveal structural gaps before downstream work depends on the requirement.
Use the suggestions as a checklist. Each suggestion maps directly to a grading rule. Address them in order and re-grade to see the score improve.
Target B or better before development. A “C” requirement usually has enough to estimate but not enough to build without clarifying questions. A “B” or “A” requirement can go directly into a sprint.
Pair with the Product Manager AI Skill. The Product Manager skill in AI Chat is trained to write grade-ready requirements. Ask the AI to draft or improve a requirement and it will fill in the fields that the grader evaluates.
Understand the weight distribution. Clarity and Completeness together account for 94 of 175 weighted points. A vague problem statement and missing acceptance criteria will drag the overall score down even if Feasibility is strong.
Relationships at a Glance
| Entity | Relationship |
|---|---|
| Requirements | Graded record; stores ai_grade and ai_quality_assessment_result |
| AI Skills | Powered by the Requirement Grader skill, which defines the rubric |
| AI Chat | Use the Product Manager skill to draft grade-ready requirements |
Next Steps
- Learn how AI Skills power the grader’s 147-rule evaluation system
- See Requirements for the full field reference
- Use AI Chat with the Product Manager skill to draft grade-ready requirements from conversation
Support
If grading returns an error, check that the LLM provider is correctly configured at Settings > LLM Providers. Grading requires the provider to return well-formed JSON — if the provider times out or returns a malformed response, the grade will not be saved and you can retry from the requirement detail page.