
AI Grader

Understand how Catalio grades requirements across Clarity, Completeness, Feasibility, and Quality — and how to act on low scores


The AI Grader evaluates your Requirements across four weighted dimensions and produces a letter grade (A–F) along with actionable improvement suggestions. Grades are not cosmetic — they surface specific gaps in clarity, coverage, and testability so you know exactly what to fix.

The Four Grading Dimensions

Grades are computed from 19 sub-dimensions organized into four parent dimensions. Each sub-dimension is scored 0–10. Sub-dimension scores are averaged within each dimension, weighted by that dimension's share of the 175 total weight points, and rolled up into an overall 0–10 score, which maps to a letter grade.

Clarity (weight: 48/175)

Clarity measures how well the requirement communicates what is needed and why.

Sub-dimensions:

  • Title Quality — Is the title concise, specific, and distinguishable?
  • Problem Statement (user_want) — Does it identify who has the problem, the current state, the desired state, and why it matters? Is it specific and in active voice?
  • Benefit Clarity (user_benefit) — Are benefits measurable? Do they link to business objectives?
  • Use Case Clarity — Are happy path, edge cases, and error scenarios covered in a structured format (e.g., Given-When-Then)?
  • Language Precision — Does it avoid vague terms like “user-friendly”, “fast”, or “easy” in favor of specific, quantifiable language?

Completeness (weight: 46/175)

Completeness measures whether all the information needed to build and test the feature is present.

Sub-dimensions:

  • Functional Completeness — Are all user interactions, system behaviors, data inputs/outputs, and business rules specified?
  • Assumption Completeness — Are technical, business, and user assumptions documented?
  • Constraint Identification — Are security, compliance, performance, scalability, and integration constraints specified?
  • Acceptance Criteria — Are criteria specific, measurable, and testable?
  • Metadata Completeness — Are priority, complexity, business value, source, and relevant personas set?

Feasibility (weight: 37/175)

Feasibility measures whether the requirement can actually be built with available resources.

Sub-dimensions:

  • Technical Feasibility — Are required technologies mature and available? Is complexity matched to team capabilities?
  • Business Feasibility — Does the business value justify the investment? Is the timeline realistic?
  • Implementation Readiness — Is the requirement detailed enough for estimation? Are dependencies known?
  • Risk Assessment — Are technical, business, security, and integration risks identified with mitigation strategies?

Quality (weight: 44/175)

Quality measures the structural integrity of the requirement as a long-lived artifact.

Sub-dimensions:

  • Traceability — Does it link to source artifacts, related requirements, or business objectives?
  • Consistency — Is the requirement free of internal contradictions and aligned with organizational standards and system architecture?
  • Testability — Can QA derive test plans from the requirement? Are success/fail conditions clear?
  • Maintainability — Is ownership clear? Is the format structured and version-controlled?
  • Stakeholder Alignment — Are the requesting stakeholders, relevant personas, and discovery method documented?
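With all four dimensions and their weights listed, the roll-up can be made concrete. The sketch below is illustrative only (not Catalio's internal code): it averages the sub-dimension scores within each dimension, applies the weights above, and divides by the 175 total weight points to produce the 0–10 overall score.

```elixir
defmodule GradeRollup do
  @weights %{clarity: 48, completeness: 46, feasibility: 37, quality: 44}

  # sub_scores: a map of dimension => list of sub-dimension scores (each 0-10)
  def overall_score(sub_scores) do
    total_weight = @weights |> Map.values() |> Enum.sum()

    weighted_sum =
      Enum.reduce(@weights, 0.0, fn {dimension, weight}, acc ->
        scores = Map.fetch!(sub_scores, dimension)
        acc + weight * (Enum.sum(scores) / length(scores))
      end)

    Float.round(weighted_sum / total_weight, 2)
  end
end

# Example: strong Clarity and Quality, weaker Completeness and Feasibility
GradeRollup.overall_score(%{
  clarity: [9, 8, 8, 7, 9],
  completeness: [6, 5, 6, 7, 6],
  feasibility: [7, 6, 6, 5],
  quality: [8, 8, 7, 8, 8]
})
# => 7.06, which maps to a B in the table below
```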

Letter Grade Mapping

Grade | Label | Score Range | Meaning
A | Excellent | 8.5–10.0 | Production-ready; minimal clarification needed
B | Good | 7.0–8.4 | Solid foundation; ready after small refinements
C | Needs Work | 5.0–6.9 | Adequate base but needs significant improvements
D | Poor | 3.0–4.9 | Major gaps; requires substantial rework
F | Failing | 0.0–2.9 | Critical issues; not implementable as written
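As a companion to the roll-up sketch above, the score-to-grade mapping can be pictured like this. The module and function names are hypothetical; the cutoffs mirror the table, and the returned atoms match the ai_grade values described in the next section.

```elixir
defmodule LetterGrade do
  # Cutoffs taken from the mapping table above
  def from_score(score) when score >= 8.5, do: :a
  def from_score(score) when score >= 7.0, do: :b
  def from_score(score) when score >= 5.0, do: :c
  def from_score(score) when score >= 3.0, do: :d
  def from_score(_score), do: :f
end

LetterGrade.from_score(7.06)
# => :b
```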

How Grades Appear in the UI

Each requirement’s detail page shows:

  • Overall grade — the letter grade badge (A–F)
  • Overall score — the numeric score (0–10)
  • Dimension breakdown — score and grade per dimension
  • Sub-dimension scores — granular scores (0–10) for each of the 19 sub-dimensions
  • Improvement suggestions — specific, actionable items generated by the LLM for sub-dimensions scoring below 7.0

The grading data is stored in two fields on the Requirement resource:

  • ai_grade — the letter grade as an atom (:a, :b, :c, :d, :f)
  • ai_quality_assessment_result — the full JSON from the LLM, including all dimension scores and suggestions
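For orientation, here is a hypothetical decoded shape for ai_quality_assessment_result; the real key names may differ, but it shows how dimension scores, sub-dimension scores, and suggestions nest under the overall result.

```elixir
# Hypothetical shape of a decoded ai_quality_assessment_result; illustrative only.
assessment = %{
  "overall_score" => 7.06,
  "grade" => "B",
  "dimensions" => %{
    "clarity" => %{
      "score" => 8.2,
      "sub_dimensions" => %{
        "title_quality" => %{"score" => 9},
        "language_precision" => %{
          "score" => 6,
          "suggestion" => "Replace 'fast search' with a latency target, e.g. p95 under 300 ms"
        }
      }
    }
    # ...completeness, feasibility, and quality follow the same shape
  }
}

get_in(assessment, ["dimensions", "clarity", "score"])
# => 8.2
```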

On-Demand vs. Automatic Grading

On-Demand

You can trigger a grade from the requirement detail page at any time by clicking the Grade Requirement button. The LLM evaluates the current state of the requirement — including its title, user_want, user_benefit, linked use cases, acceptance criteria, and metadata — and returns results within a few seconds.

Automatic

Catalio runs the grader automatically in the background when conditions are met — for example, after a new Requirement is created or meaningfully edited. The auto-grader includes a staleness check: if the Requirement is updated while the AI is still evaluating, the in-flight grade is discarded so the saved grade always reflects the current content.
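The staleness check can be pictured as a compare-before-save step. The sketch below is illustrative, not Catalio's implementation; grade_fn and reload_fn stand in for the LLM evaluation and the record reload.

```elixir
defmodule AutoGrader do
  # Illustrative sketch of the staleness check described above.
  def grade_if_fresh(requirement, grade_fn, reload_fn) do
    snapshot_at = requirement.updated_at
    result = grade_fn.(requirement)

    case reload_fn.(requirement.id) do
      # content unchanged while grading: safe to persist the result
      %{updated_at: ^snapshot_at} -> {:ok, result}
      # edited mid-flight: discard so the saved grade matches current content
      _changed -> {:discarded, :stale}
    end
  end
end
```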

Acting on Low Scores

The suggestions field is where the grader earns its value. For any sub-dimension scoring below 7.0, the LLM generates a specific suggestion — not a generic observation.

Examples of actionable suggestions:

  • “Add at least two assumptions covering technical prerequisites and data availability”
  • “Replace ‘user-friendly interface’ with a specific usability metric, e.g., ‘completes in fewer than 3 clicks for first-time users’”
  • “Add a performance constraint specifying the expected response time under peak load”
  • “Link this requirement to the source discovery artifact or stakeholder discussion”

Work through the suggestions from highest-weight dimensions first: Clarity (48) and Completeness (46) have the most impact on the overall score.
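One way to picture that ordering: the sketch below (field names assumed) keeps only sub-dimensions scoring below 7.0 and sorts the remaining suggestions by dimension weight, highest first.

```elixir
defmodule SuggestionTriage do
  # Dimension weights from this page; suggestion field names are assumed.
  @weights %{"clarity" => 48, "completeness" => 46, "quality" => 44, "feasibility" => 37}

  # suggestions: list of maps like
  #   %{dimension: "clarity", sub_dimension: "language_precision", score: 6, text: "..."}
  def prioritized(suggestions) do
    suggestions
    |> Enum.filter(&(&1.score < 7.0))
    |> Enum.sort_by(fn s -> {-Map.get(@weights, s.dimension, 0), s.score} end)
  end
end
```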

Best Practices

Grade early and often. Grade a requirement when you first draft it, not only before a sprint. Early grades reveal structural gaps before downstream work depends on the requirement.

Use the suggestions as a checklist. Each suggestion maps directly to a grading rule. Address them in order and re-grade to see the score improve.

Target B or better before development. A “C” requirement usually has enough to estimate but not enough to build without clarifying questions. A “B” or “A” requirement can go directly into a sprint.

Pair with the Product Manager AI Skill. The Product Manager skill in AI Chat is trained to write grade-ready requirements. Ask the AI to draft or improve a requirement and it will fill in the fields that the grader evaluates.

Understand the weight distribution. Clarity and Completeness together account for 94 of 175 weighted points. A vague problem statement and missing acceptance criteria will drag the overall score down even if Feasibility is strong.

Relationships at a Glance

Entity | Relationship
Requirements | Graded record; stores ai_grade and ai_quality_assessment_result
AI Skills | Powered by the Requirement Grader skill, which defines the rubric
AI Chat | Use the Product Manager skill to draft grade-ready requirements

Next Steps

  • Learn how AI Skills power the grader’s 147-rule evaluation system
  • See Requirements for the full field reference
  • Use AI Chat with the Product Manager skill to draft grade-ready requirements from conversation

Support

If grading returns an error, check that the LLM provider is correctly configured at Settings > LLM Providers. Grading requires the provider to return well-formed JSON — if the provider times out or returns a malformed response, the grade will not be saved and you can retry from the requirement detail page.
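As a mental model for that guard, here is a minimal sketch assuming the Jason library for JSON decoding: a malformed or unexpectedly shaped response yields an error tuple that can surface a retry prompt instead of a saved grade. The expected keys shown are illustrative.

```elixir
defmodule GradeResponse do
  # Only a well-formed response with the expected top-level shape is allowed
  # to reach the requirement record.
  def parse(raw_body) do
    case Jason.decode(raw_body) do
      {:ok, %{"dimensions" => _} = assessment} -> {:ok, assessment}
      {:ok, _unexpected} -> {:error, :unexpected_shape}
      {:error, _reason} -> {:error, :malformed_json}
    end
  end
end
```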