Repositories

Connect source code repositories to Catalio to automatically surface Components from your codebase and map them to Applications for code-level traceability.

A Repository is a source code repository that has been connected to Catalio. Once connected, Catalio reads the repository tree and creates Component records from the files it finds, giving you code-level traceability alongside your business requirements.

This is how Catalio bridges the gap between “what is specified” and “what is implemented”: Requirements describe what the system should do; Repositories show what the codebase actually contains; Issue Links connect the delivery work in between.

Provider Support

Provider        Status
GitHub          Primary, fully supported
Bitbucket       Planned, not yet available
GitLab          Planned, not yet available
Azure DevOps    Planned, not yet available

Sync Lifecycle

Connecting a repository triggers an automatic sync:

pending → syncing → synced
              ↘ failed

pending — The repository has been connected but sync has not yet started.

syncing — The repository tree is being fetched and Components are being created or updated.

synced — Sync completed successfully. last_synced_at records when, and last_sync_cursor stores the tree SHA used for incremental detection on future syncs.

failed — Sync encountered an error. last_sync_error records the reason. The repository can be re-synced after the underlying issue is resolved.
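The lifecycle above can be read as a small state machine. The following is a minimal sketch, not Catalio's implementation: the `SyncStatus` enum and `can_transition` helper are hypothetical names, and the edges leading out of `synced` and `failed` back to `syncing` are an assumption based on the statement that a repository can be re-synced.

```python
from enum import Enum


class SyncStatus(str, Enum):
    """The four sync_status values described in this section."""
    PENDING = "pending"
    SYNCING = "syncing"
    SYNCED = "synced"
    FAILED = "failed"


# Allowed transitions, read off the lifecycle diagram.
# Assumption: a re-sync moves a synced or failed repository back to syncing.
ALLOWED_TRANSITIONS = {
    SyncStatus.PENDING: {SyncStatus.SYNCING},
    SyncStatus.SYNCING: {SyncStatus.SYNCED, SyncStatus.FAILED},
    SyncStatus.SYNCED: {SyncStatus.SYNCING},
    SyncStatus.FAILED: {SyncStatus.SYNCING},
}


def can_transition(current: SyncStatus, target: SyncStatus) -> bool:
    """Return True if moving from `current` to `target` fits the lifecycle."""
    return target in ALLOWED_TRANSITIONS[current]
```

For example, `can_transition(SyncStatus.PENDING, SyncStatus.SYNCED)` is false under this model, because a repository has to pass through `syncing` first.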

After the initial full sync, subsequent syncs are incremental: Catalio uses the tree SHA stored in last_sync_cursor to detect what has changed and processes only the changed files, which keeps sync times short even for large repositories.
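To illustrate how SHA-based change detection can work in general, here is a minimal sketch. It relies on a property of Git: a tree's SHA changes whenever any content beneath it changes, so an unchanged root tree SHA means nothing needs processing. The function name `plan_incremental_sync` and the path-to-SHA snapshot dictionaries are assumptions made for this example; how Catalio stores and compares snapshots internally is not described here.

```python
from typing import Dict, List, Optional


def plan_incremental_sync(
    last_sync_cursor: Optional[str],
    current_tree_sha: str,
    previous_files: Dict[str, str],
    current_files: Dict[str, str],
) -> Dict[str, List[str]]:
    """Decide which paths need processing, given two path -> SHA snapshots."""
    if last_sync_cursor == current_tree_sha:
        # Root tree SHA unchanged: nothing in the repository has changed.
        return {"upsert": [], "remove": []}
    if last_sync_cursor is None:
        # No cursor yet: treat this as the initial full sync.
        return {"upsert": sorted(current_files), "remove": []}
    # Paths that are new, or whose SHA differs from the previous snapshot.
    upsert = sorted(
        path for path, sha in current_files.items()
        if previous_files.get(path) != sha
    )
    # Paths that existed before but are gone now.
    remove = sorted(path for path in previous_files if path not in current_files)
    return {"upsert": upsert, "remove": remove}
```

The early return on an unchanged root SHA is what makes the common "nothing changed" case cheap: no per-file comparison is needed at all.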

Repository Mappings

A Repository can map to multiple Applications via path-based routing rules called Repository Mappings. Each mapping defines a file path pattern (e.g., apps/billing/**) and the Application it routes to. During sync, each file is matched against the active mappings and linked to the corresponding Application.

Files that match no mapping still become Components — they are just not associated with a specific Application until a mapping rule is created.
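The routing described above can be sketched as follows. This is an illustration under stated assumptions, not Catalio's matcher: it assumes `**` matches across directory separators, `*` matches within a single path segment, and the first matching mapping wins. The helpers `glob_to_regex` and `route_file` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path pattern into a compiled regular expression.

    Assumed semantics: `**` matches any characters including `/`,
    `*` matches any characters except `/`, everything else is literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def route_file(path: str, mappings: List[Tuple[str, str]]) -> Optional[str]:
    """Return the Application for `path`, or None if no mapping matches.

    Assumption: mappings are checked in order and the first match wins.
    """
    for pattern, application in mappings:
        if glob_to_regex(pattern).fullmatch(path):
            return application
    return None


mappings = [("apps/billing/**", "Billing"), ("apps/*/README.md", "Docs")]
print(route_file("apps/billing/src/invoice.py", mappings))  # Billing
print(route_file("tools/lint.sh", mappings))                # None
```

A `None` result corresponds to the case in the text where a file still becomes a Component but is not associated with any Application.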

Key Fields

Field               Purpose
owner               Repository owner (GitHub organization or user, e.g., my-org)
name                Repository name (e.g., billing-service)
provider            Code host platform; currently github
default_branch      Branch to sync from (e.g., main)
sync_status         Current sync state: pending, syncing, synced, or failed
last_synced_at      When the most recent successful sync completed
last_sync_cursor    Tree SHA from the last sync, used for incremental change detection
last_sync_error     Error details if the most recent sync failed
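As a way to picture these fields together, here is a sketch of a Repository record as a Python dataclass. The field names come from the table; the types, the default values, and the `full_name` helper are assumptions made for illustration and may not match Catalio's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Repository:
    owner: str                                 # e.g., "my-org"
    name: str                                  # e.g., "billing-service"
    provider: str = "github"                   # only supported provider today
    default_branch: str = "main"               # assumed default
    sync_status: str = "pending"               # pending, syncing, synced, failed
    last_synced_at: Optional[datetime] = None  # unset until a sync succeeds
    last_sync_cursor: Optional[str] = None     # tree SHA from the last sync
    last_sync_error: Optional[str] = None      # set only when a sync fails

    @property
    def full_name(self) -> str:
        """Hypothetical helper: the owner/name form used on GitHub."""
        return f"{self.owner}/{self.name}"


repo = Repository(owner="my-org", name="billing-service")
print(repo.full_name)    # my-org/billing-service
print(repo.sync_status)  # pending
```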

Code Traceability Model

Connecting a repository enables a three-layer traceability model:

Layer            Catalio Concept            What It Captures
Specification    Requirements               What the system should do
Work Items       Issue Links                Delivery tasks in GitHub/Linear/Jira
Code             Repository → Components    What the codebase actually contains

The Application sits in the middle — Requirements scope to an Application; Components are linked to Applications via Repository Mappings; Issue Links can also scope to an Application. This gives you a full end-to-end audit trail: from business requirement to code file.
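One way to picture the Application as the hub of the three layers is the sketch below. The `Application` class and `traceability_view` function are hypothetical and exist only to show the shape of the relationships. Note that the view groups records that share an Application; it does not claim that a particular Requirement is implemented by a particular Component.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Application:
    name: str
    requirements: List[str] = field(default_factory=list)  # specification layer
    issue_links: List[str] = field(default_factory=list)   # work-item layer
    components: List[str] = field(default_factory=list)    # code layer


def traceability_view(app: Application) -> Dict[str, List[str]]:
    """Group everything scoped to one Application by traceability layer."""
    return {
        "specification": app.requirements,
        "work_items": app.issue_links,
        "code": app.components,
    }


billing = Application(
    name="Billing",
    requirements=["Invoices are issued monthly"],
    issue_links=["Add monthly invoice job"],
    components=["apps/billing/src/invoice.py"],
)
print(traceability_view(billing)["code"])  # ['apps/billing/src/invoice.py']
```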

Relationships at a Glance

Related Concept    Relationship
Applications       Repositories map to Applications via path-based routing rules
Components         Repository sync creates Component records from code files
Issue Links        Issue Links track the work-item layer alongside code

Best Practices

Connect repositories before starting discovery, not after.

Once a repository is connected and synced, the Component records from your codebase are available for AI analysis and traceability mapping. Connecting early means Catalio can surface code-level insights during discovery rather than only afterward.

Create Repository Mappings before the first sync.

If you have a monorepo with multiple services, create your path-based mappings before the initial sync triggers. This ensures Components are correctly attributed to their Application from the start, rather than requiring a full re-attribution afterward.

Use default_branch intentionally.

Most teams sync from main or master. If you want to track a specific release branch or a long-running feature branch, set default_branch accordingly. For teams doing continuous delivery from main, the default is usually correct.

Monitor sync status after connecting.

After connecting a new repository, check sync_status after a few minutes. If the status is failed, read last_sync_error for the reason. Common causes are that the repository is private and Catalio’s GitHub App installation doesn’t have access to it, or that the owner or name value is incorrect.
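If you want to automate this check, a polling loop along the following lines is one option. This is a sketch only: `fetch_repository` is a hypothetical callable that you would supply to return the repository's current fields as a dictionary, since this page does not describe a Catalio API for reading them. The timeout and interval values are arbitrary.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the sync will not change state on its own.
TERMINAL_STATUSES = {"synced", "failed"}


def wait_for_sync(
    fetch_repository: Callable[[], Dict[str, Any]],
    timeout_s: float = 300.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until sync_status is synced or failed, then return the record.

    `fetch_repository` is a caller-supplied (hypothetical) function that
    returns a dict containing at least the "sync_status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        repo = fetch_repository()
        if repo["sync_status"] in TERMINAL_STATUSES:
            return repo
        if time.monotonic() >= deadline:
            raise TimeoutError("sync did not finish before the timeout")
        sleep(poll_interval_s)
```

On a `failed` result, the returned record's `last_sync_error` is the first thing to inspect.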

Disconnect repositories you no longer need.

Stale repositories that are no longer relevant to active Initiatives create noise in Component lists and traceability reports. Remove them when the engagement context changes.

Pro Tip: For large monorepos, start with the most critical service areas first. Define path-based mappings for the top 3–5 Applications you’re modernizing, then expand. A targeted sync gives you clean, actionable traceability rather than thousands of undifferentiated Components.

Support

  • Documentation: Continue reading about Components and Applications
  • In-App Help: The AI assistant can help interpret repository structure and suggest mappings
  • Email: support@catalio.ai
  • Community: Share repository integration patterns with other Catalio users