
AI Grader

Understand how Catalio grades requirements across Clarity, Completeness, Feasibility, and Quality — and how to act on low scores


The AI Grader evaluates your Requirements across four weighted dimensions and produces a letter grade (A–F) along with actionable improvement suggestions. Grades are not cosmetic — they surface specific gaps in clarity, coverage, and testability so you know exactly what to fix.

The Four Grading Dimensions

Grades are computed from 19 sub-dimensions organized into four parent dimensions. Each sub-dimension is scored 0–10. Sub-dimension scores are averaged within each dimension, weighted by that dimension's share of the 175 total weight points, and rolled up into an overall 0–10 score, which maps to a letter grade.

Clarity (weight: 48/175)

Clarity measures how well the requirement communicates what is needed and why.

Sub-dimensions:

  • Title Quality — Is the title concise, specific, and distinguishable?
  • Problem Statement (user_want) — Does it identify who has the problem, the current state, the desired state, and why it matters? Is it specific and in active voice?
  • Benefit Clarity (user_benefit) — Are benefits measurable? Do they link to business objectives?
  • Use Case Clarity — Are happy path, edge cases, and error scenarios covered in a structured format (e.g., Given-When-Then)?
  • Language Precision — Does it avoid vague terms like “user-friendly”, “fast”, or “easy” in favor of specific, quantifiable language?

Completeness (weight: 46/175)

Completeness measures whether all the information needed to build and test the feature is present.

Sub-dimensions:

  • Functional Completeness — Are all user interactions, system behaviors, data inputs/outputs, and business rules specified?
  • Assumption Completeness — Are technical, business, and user assumptions documented?
  • Constraint Identification — Are security, compliance, performance, scalability, and integration constraints specified?
  • Acceptance Criteria — Are criteria specific, measurable, and testable?
  • Metadata Completeness — Are priority, complexity, business value, source, and relevant personas set?

Feasibility (weight: 37/175)

Feasibility measures whether the requirement can actually be built with available resources.

Sub-dimensions:

  • Technical Feasibility — Are required technologies mature and available? Is complexity matched to team capabilities?
  • Business Feasibility — Does the business value justify the investment? Is the timeline realistic?
  • Implementation Readiness — Is the requirement detailed enough for estimation? Are dependencies known?
  • Risk Assessment — Are technical, business, security, and integration risks identified with mitigation strategies?

Quality (weight: 44/175)

Quality measures the structural integrity of the requirement as a long-lived artifact.

Sub-dimensions:

  • Traceability — Does it link to source artifacts, related requirements, or business objectives?
  • Consistency — Is the requirement free of internal contradictions and aligned with organizational standards and system architecture?
  • Testability — Can QA derive test plans from the requirement? Are success/fail conditions clear?
  • Maintainability — Is ownership clear? Is the format structured and version-controlled?
  • Stakeholder Alignment — Are the requesting stakeholders, relevant personas, and discovery method documented?
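With all four dimensions and their weights listed, the roll-up can be made concrete. The sketch below is illustrative only (not Catalio's internal code): it averages the sub-dimension scores within each dimension, applies the weights above, and divides by the 175 total weight points to produce the 0–10 overall score.

```elixir
defmodule GradeRollup do
  @weights %{clarity: 48, completeness: 46, feasibility: 37, quality: 44}

  # sub_scores: a map of dimension => list of sub-dimension scores (each 0-10)
  def overall_score(sub_scores) do
    total_weight = @weights |> Map.values() |> Enum.sum()

    weighted_sum =
      Enum.reduce(@weights, 0.0, fn {dimension, weight}, acc ->
        scores = Map.fetch!(sub_scores, dimension)
        acc + weight * (Enum.sum(scores) / length(scores))
      end)

    Float.round(weighted_sum / total_weight, 2)
  end
end

# Example: strong Clarity and Quality, weaker Completeness and Feasibility
GradeRollup.overall_score(%{
  clarity: [9, 8, 8, 7, 9],
  completeness: [6, 5, 6, 7, 6],
  feasibility: [7, 6, 6, 5],
  quality: [8, 8, 7, 8, 8]
})
# => 7.06, which maps to a B in the table below
```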

Letter Grade Mapping

Grade | Label | Score Range | Meaning
A | Excellent | 8.5–10.0 | Production-ready; minimal clarification needed
B | Good | 7.0–8.4 | Solid foundation; ready after small refinements
C | Needs Work | 5.0–6.9 | Adequate base but needs significant improvements
D | Poor | 3.0–4.9 | Major gaps; requires substantial rework
F | Failing | 0.0–2.9 | Critical issues; not implementable as written
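As a companion to the roll-up sketch above, the score-to-grade mapping can be pictured like this. The module and function names are hypothetical; the cutoffs mirror the table, and the returned atoms match the ai_grade values described in the next section.

```elixir
defmodule LetterGrade do
  # Cutoffs taken from the mapping table above
  def from_score(score) when score >= 8.5, do: :a
  def from_score(score) when score >= 7.0, do: :b
  def from_score(score) when score >= 5.0, do: :c
  def from_score(score) when score >= 3.0, do: :d
  def from_score(_score), do: :f
end

LetterGrade.from_score(7.06)
# => :b
```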

How Grades Appear in the UI

Each requirement’s detail page shows:

  • Overall grade — the letter grade badge (A–F)
  • Overall score — the numeric score (0–10)
  • Dimension breakdown — score and grade per dimension
  • Sub-dimension scores — granular scores (0–10) for each of the 19 sub-dimensions
  • Improvement suggestions — specific, actionable items generated by the LLM for sub-dimensions scoring below 7.0

The grading data is stored in two fields on the Requirement resource:

  • ai_grade — the letter grade as an atom (:a, :b, :c, :d, :f)
  • ai_quality_assessment_result — the full JSON from the LLM, including all dimension scores and suggestions
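For orientation, here is a hypothetical decoded shape for ai_quality_assessment_result; the real key names may differ, but it shows how dimension scores, sub-dimension scores, and suggestions nest under the overall result.

```elixir
# Hypothetical shape of a decoded ai_quality_assessment_result; illustrative only.
assessment = %{
  "overall_score" => 7.06,
  "grade" => "B",
  "dimensions" => %{
    "clarity" => %{
      "score" => 8.2,
      "sub_dimensions" => %{
        "title_quality" => %{"score" => 9},
        "language_precision" => %{
          "score" => 6,
          "suggestion" => "Replace 'fast search' with a latency target, e.g. p95 under 300 ms"
        }
      }
    }
    # ...completeness, feasibility, and quality follow the same shape
  }
}

get_in(assessment, ["dimensions", "clarity", "score"])
# => 8.2
```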

On-Demand vs. Automatic Grading

On-Demand

You can trigger a grade from the requirement detail page at any time by clicking the Grade Requirement button. The LLM evaluates the current state of the requirement — including its title, user_want, user_benefit, linked use cases, acceptance criteria, and metadata — and returns results within a few seconds.

Automatic

Catalio runs the grader automatically in the background when conditions are met — for example, after a new Requirement is created or meaningfully edited. The auto-grader includes a staleness check: if the Requirement is updated while the AI is still evaluating, the in-flight grade is discarded so the saved grade always reflects the current content.
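The staleness check can be pictured as a compare-before-save step. The sketch below is illustrative, not Catalio's implementation; grade_fn and reload_fn stand in for the LLM evaluation and the record reload.

```elixir
defmodule AutoGrader do
  # Illustrative sketch of the staleness check described above.
  def grade_if_fresh(requirement, grade_fn, reload_fn) do
    snapshot_at = requirement.updated_at
    result = grade_fn.(requirement)

    case reload_fn.(requirement.id) do
      # content unchanged while grading: safe to persist the result
      %{updated_at: ^snapshot_at} -> {:ok, result}
      # edited mid-flight: discard so the saved grade matches current content
      _changed -> {:discarded, :stale}
    end
  end
end
```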

Acting on Low Scores

The suggestions field is where the grader earns its value. For any sub-dimension scoring below 7.0, the LLM generates a specific suggestion — not a generic observation.

Examples of actionable suggestions:

  • “Add at least two assumptions covering technical prerequisites and data availability”
  • “Replace ‘user-friendly interface’ with a specific usability metric, e.g., ‘completes in fewer than 3 clicks for first-time users’”
  • “Add a performance constraint specifying the expected response time under peak load”
  • “Link this requirement to the source discovery artifact or stakeholder discussion”

Work through the suggestions from highest-weight dimensions first: Clarity (48) and Completeness (46) have the most impact on the overall score.
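One way to picture that ordering: the sketch below (field names assumed) keeps only sub-dimensions scoring below 7.0 and sorts the remaining suggestions by dimension weight, highest first.

```elixir
defmodule SuggestionTriage do
  # Dimension weights from this page; suggestion field names are assumed.
  @weights %{"clarity" => 48, "completeness" => 46, "quality" => 44, "feasibility" => 37}

  # suggestions: list of maps like
  #   %{dimension: "clarity", sub_dimension: "language_precision", score: 6, text: "..."}
  def prioritized(suggestions) do
    suggestions
    |> Enum.filter(&(&1.score < 7.0))
    |> Enum.sort_by(fn s -> {-Map.get(@weights, s.dimension, 0), s.score} end)
  end
end
```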

Best Practices

Grade early and often. Grade a requirement when you first draft it, not only before a sprint. Early grades reveal structural gaps before downstream work depends on the requirement.

Use the suggestions as a checklist. Each suggestion maps directly to a grading rule. Address them in order and re-grade to see the score improve.

Target B or better before development. A “C” requirement usually has enough to estimate but not enough to build without clarifying questions. A “B” or “A” requirement can go directly into a sprint.

Pair with the Product Manager AI Skill. The Product Manager skill in AI Chat is trained to write grade-ready requirements. Ask the AI to draft or improve a requirement and it will fill in the fields that the grader evaluates.

Understand the weight distribution. Clarity and Completeness together account for 94 of 175 weighted points. A vague problem statement and missing acceptance criteria will drag the overall score down even if Feasibility is strong.

Relationships at a Glance

Entity | Relationship
Requirements | Graded record; stores ai_grade and ai_quality_assessment_result
AI Skills | Powered by the Requirement Grader skill, which defines the rubric
AI Chat | Use the Product Manager skill to draft grade-ready requirements

Next Steps

  • Learn how AI Skills power the grader’s 147-rule evaluation system
  • See Requirements for the full field reference
  • Use AI Chat with the Product Manager skill to draft grade-ready requirements from conversation

Support

If grading returns an error, check that the LLM provider is correctly configured at Settings > LLM Providers. Grading requires the provider to return well-formed JSON — if the provider times out or returns a malformed response, the grade will not be saved and you can retry from the requirement detail page.
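As a mental model for that guard, here is a minimal sketch assuming the Jason library for JSON decoding: a malformed or unexpectedly shaped response yields an error tuple that can surface a retry prompt instead of a saved grade. The expected keys shown are illustrative.

```elixir
defmodule GradeResponse do
  # Only a well-formed response with the expected top-level shape is allowed
  # to reach the requirement record.
  def parse(raw_body) do
    case Jason.decode(raw_body) do
      {:ok, %{"dimensions" => _} = assessment} -> {:ok, assessment}
      {:ok, _unexpected} -> {:error, :unexpected_shape}
      {:error, _reason} -> {:error, :malformed_json}
    end
  end
end
```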