A Repository is a connected source code repository. When you connect a repository to Catalio, it automatically reads the repository tree and creates Component records from the files it finds — giving you code-level traceability alongside your business requirements.
This is how Catalio bridges the gap between “what is specified” and “what is implemented”: Requirements describe what the system should do; Repositories show what the codebase actually contains; Issue Links connect the delivery work in between.
Provider Support
| Provider | Status |
|---|---|
| GitHub | Primary — fully supported |
| Bitbucket | Planned — not yet available |
| GitLab | Planned — not yet available |
| Azure DevOps | Planned — not yet available |
Sync Lifecycle
Connecting a repository triggers an automatic sync:
pending → syncing → synced
                  ↘ failed
- pending: The repository has been connected but sync has not yet started.
- syncing: The repository tree is being fetched and Components are being created or updated.
- synced: Sync completed successfully. last_synced_at records when, and last_sync_cursor stores the tree SHA used for incremental detection on future syncs.
- failed: Sync encountered an error. last_sync_error records the reason. The repository can be re-synced after the underlying issue is resolved.
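The lifecycle above can be read as a small state machine. The following is a minimal sketch, not Catalio's implementation: the `SyncStatus` enum, the `ALLOWED` table, and the `transition` helper are hypothetical names, and the re-sync edges out of `synced` and `failed` are assumptions inferred from the statement that a repository can be re-synced.

```python
from enum import Enum


class SyncStatus(Enum):
    PENDING = "pending"
    SYNCING = "syncing"
    SYNCED = "synced"
    FAILED = "failed"


# Allowed transitions. The first two rows follow the documented lifecycle;
# the edges from SYNCED and FAILED back to SYNCING are assumptions that
# model a later re-sync.
ALLOWED = {
    SyncStatus.PENDING: {SyncStatus.SYNCING},
    SyncStatus.SYNCING: {SyncStatus.SYNCED, SyncStatus.FAILED},
    SyncStatus.SYNCED: {SyncStatus.SYNCING},
    SyncStatus.FAILED: {SyncStatus.SYNCING},
}


def transition(current: SyncStatus, target: SyncStatus) -> SyncStatus:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

Under these assumptions, `transition(SyncStatus.PENDING, SyncStatus.SYNCED)` raises, because a repository has to pass through `syncing` before it can be `synced`.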
After the initial full sync, subsequent syncs are incremental: the current tree is compared against the tree SHA stored in last_sync_cursor, and only files that have changed are processed. This keeps sync times short even for large repositories.
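One way to picture incremental detection is as a comparison of two snapshots. This sketch assumes each snapshot is a mapping from file path to a content SHA (as Git provides for every file in a tree); `needs_sync` and `changed_paths` are hypothetical helpers for illustration, and how Catalio actually stores or compares snapshots is not specified here.

```python
def needs_sync(last_sync_cursor, current_tree_sha):
    """A sync is needed on first run or when the root tree SHA has moved."""
    return last_sync_cursor is None or last_sync_cursor != current_tree_sha


def changed_paths(previous, current):
    """Compare two {path: sha} snapshots and report what differs.

    Only paths whose SHA differs (or that appeared or disappeared)
    would need processing in an incremental sync.
    """
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "modified": modified, "removed": removed}
```

If the root tree SHA matches the stored cursor, nothing in the repository has changed and the per-file comparison can be skipped entirely.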
Repository Mappings
A Repository can map to multiple Applications via path-based routing rules called Repository Mappings. Each mapping defines a file path pattern (e.g., apps/billing/**) and the Application it routes to. During sync, each file is matched against the active mappings and linked to the corresponding Application.
Files that match no mapping still become Components — they are just not associated with a specific Application until a mapping rule is created.
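The routing described above can be sketched as a pattern match per file. This is an illustration under stated assumptions: `RepositoryMapping` and `route` are hypothetical names, the first matching active mapping wins (the precedence rule is not documented here), and Python's `fnmatchcase` stands in for whatever glob engine Catalio uses. In `fnmatchcase`, `*` also matches `/`, so `apps/billing/**` matches any depth below `apps/billing/`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class RepositoryMapping:
    pattern: str        # file path pattern, e.g. "apps/billing/**"
    application: str    # hypothetical Application identifier
    active: bool = True


def route(path: str, mappings: list[RepositoryMapping]) -> Optional[str]:
    """Return the Application for a file path, or None if nothing matches.

    Assumption: mappings are checked in order and the first active
    match wins.
    """
    for mapping in mappings:
        if mapping.active and fnmatchcase(path, mapping.pattern):
            return mapping.application
    return None
```

A `None` result corresponds to the case above: the file still becomes a Component, but it is not associated with any Application.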
Key Fields
| Field | Purpose |
|---|---|
| owner | Repository owner (GitHub organization or user, e.g., my-org) |
| name | Repository name (e.g., billing-service) |
| provider | Code host platform: currently github |
| default_branch | Branch to sync from (e.g., main) |
| sync_status | Current sync state: pending, syncing, synced, or failed |
| last_synced_at | When the most recent successful sync completed |
| last_sync_cursor | Tree SHA from the last sync — used for incremental change detection |
| last_sync_error | Error details if the most recent sync failed |
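For readers who think in types, the fields above could be modeled roughly as follows. This is a sketch only: the field names come from the table, but the Python types, the default values, and the `full_name` helper are assumptions rather than Catalio's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Repository:
    owner: str                                  # e.g. "my-org"
    name: str                                   # e.g. "billing-service"
    provider: str = "github"                    # only documented value today
    default_branch: str = "main"                # assumed default
    sync_status: str = "pending"                # pending, syncing, synced, failed
    last_synced_at: Optional[datetime] = None   # set after a successful sync
    last_sync_cursor: Optional[str] = None      # tree SHA from the last sync
    last_sync_error: Optional[str] = None       # set when a sync fails

    @property
    def full_name(self) -> str:
        """Hypothetical convenience: the owner/name form used by GitHub."""
        return f"{self.owner}/{self.name}"
```

A freshly connected repository would start with `sync_status` set to `pending` and the three `last_*` fields empty.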
Code Traceability Model
Connecting a repository enables a three-layer traceability model:
| Layer | Catalio Concept | What It Captures |
|---|---|---|
| Specification | Requirements | What the system should do |
| Work Items | Issue Links | Delivery tasks in GitHub/Linear/Jira |
| Code | Repository → Components | What the codebase actually contains |
The Application sits in the middle — Requirements scope to an Application; Components are linked to Applications via Repository Mappings; Issue Links can also scope to an Application. This gives you a full end-to-end audit trail: from business requirement to code file.
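Because every layer can scope to an Application, the audit trail amounts to grouping records by Application. The sketch below assumes each record is a simple `(name, application)` pair; `build_trace` and the sample identifiers are hypothetical and only illustrate the shape of the model.

```python
from collections import defaultdict


def build_trace(requirements, issue_links, components):
    """Group (name, application) pairs from each layer by Application.

    Records whose application is None (for example, Components that
    matched no Repository Mapping) are left out of the trace.
    """
    trace = defaultdict(
        lambda: {"requirements": [], "issue_links": [], "components": []}
    )
    layers = (
        ("requirements", requirements),
        ("issue_links", issue_links),
        ("components", components),
    )
    for layer, records in layers:
        for name, application in records:
            if application is not None:
                trace[application][layer].append(name)
    return dict(trace)
```

For a given Application, the resulting entry lists its Requirements, Issue Links, and Components side by side, which is the requirement-to-code view described above.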
Relationships at a Glance
| Related Concept | Relationship |
|---|---|
| Applications | Repositories map to Applications via path-based routing rules |
| Components | Repository sync creates Component records from code files |
| Issue Links | Issue Links track the work-item layer alongside code |
Best Practices
Connect repositories before starting discovery, not after.
Once a repository is connected and synced, the Component records from your codebase are available for AI analysis and traceability mapping. Connecting early means Catalio can surface code-level insights during discovery rather than only afterward.
Create Repository Mappings before the first sync.
If you have a monorepo with multiple services, create your path-based mappings before the initial sync triggers. This ensures Components are correctly attributed to their Application from the start, rather than requiring a full re-attribution afterward.
Use default_branch intentionally.
Most teams sync from main or master. If you want to track a specific release branch or a long-running feature branch, set default_branch accordingly. For teams doing continuous delivery from main, the default is usually correct.
Monitor sync status after connecting.
After connecting a new repository, check sync_status after a few minutes. A failed status with a last_sync_error message usually indicates either that the repository is private and Catalio’s GitHub App installation doesn’t have access to it, or that the owner/name values are incorrect. Read last_sync_error to confirm the cause before re-syncing.
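If you prefer to script this check, a polling loop along the following lines would work. It is a sketch under assumptions: `fetch_status` is a hypothetical callable that you supply to return the repository record as a dict with a `sync_status` key, since no specific Catalio API call is assumed here.

```python
import time
from typing import Callable


def wait_for_sync(
    fetch_status: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until sync_status is synced or failed, then return the record.

    Raises TimeoutError if neither terminal state is reached in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        repo = fetch_status()
        if repo["sync_status"] in ("synced", "failed"):
            return repo
        if time.monotonic() >= deadline:
            raise TimeoutError("sync did not finish in time")
        sleep(interval)
```

When the returned record has `sync_status` equal to `failed`, inspect its `last_sync_error` value before retrying.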
Disconnect repositories you no longer need.
Stale repositories that are no longer relevant to active Initiatives create noise in Component lists and traceability reports. Remove them when the engagement context changes.
Next Steps
- Understand Components — See what the sync creates from your repository files
- Work with Applications — Learn how Applications anchor code and work-item traceability
- Connect Issue Links — Complete the traceability chain from requirements to delivery
Pro Tip: For large monorepos, start with the most critical service areas. Define path-based mappings for the top 3–5 Applications you’re modernizing, then expand. A targeted set of mappings gives you clean, actionable traceability rather than thousands of undifferentiated Components.
Support
- Documentation: Continue reading about Components and Applications
- In-App Help: The AI assistant can help interpret repository structure and suggest mappings
- Email: support@catalio.ai
- Community: Share repository integration patterns with other Catalio users