An Artifact in Catalio is an uploaded document or deliverable — a PDF, Word document, architecture diagram, meeting transcript, or specification — that Catalio processes through its AI extraction pipeline to surface requirements, capabilities, and business context.
Artifacts are the primary on-ramp for organizations that have existing documentation. Rather than manually re-entering information from legacy requirements documents or process manuals, you upload the source material and let the AI do the initial extraction pass.
What Artifacts Can Be
Artifacts typically represent:
- Legacy requirements documents — Old specifications, feature lists, or BRDs that capture what a system currently does
- Process manuals — Step-by-step guides that describe how business processes work
- Architecture documents — System design documents that describe components and integrations
- Meeting transcripts — Stakeholder interview notes or workshop summaries
- Exported reports — Compliance reports, audit findings, or gap analyses
- Vendor documentation — System manuals or API specifications for legacy software
Key Fields
| Field | Purpose |
|---|---|
| name | Human-readable name for the artifact |
| artifact_type | Classification of the document type |
| status | Processing lifecycle: pending, processing, completed, failed |
| parent_id / parent_type | The Initiative or other entity this artifact belongs to |
| extracted_text | AI-extracted plain text content (populated after processing) |
| file_url | URL to the stored file |
| organization_id | Tenant scope |
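The fields in the table can be pictured as a simple record. The sketch below is illustrative only: the class names, the Python types, the default values, and every example value (including the `artifact_type` string and the IDs) are assumptions, not Catalio's actual schema or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ArtifactStatus(str, Enum):
    """The four lifecycle values listed for the status field."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Artifact:
    """Hypothetical in-memory mirror of the documented fields."""
    name: str
    artifact_type: str
    file_url: str
    organization_id: str
    parent_id: Optional[str] = None
    parent_type: Optional[str] = None
    status: ArtifactStatus = ArtifactStatus.PENDING
    extracted_text: Optional[str] = None  # populated only after processing


# All values below are made up for illustration.
artifact = Artifact(
    name="Oracle EBS AP Module User Guide v12.2",
    artifact_type="vendor_documentation",
    file_url="https://files.example.com/ebs-ap-guide.pdf",
    organization_id="org_123",
    parent_id="init_456",
    parent_type="initiative",
)
print(artifact.status.value, artifact.extracted_text)
```

A newly uploaded artifact in this sketch starts as pending with no extracted text, which matches the lifecycle described in the next section.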
Artifact Lifecycle
Artifacts follow a processing pipeline:
pending → processing → completed
processing → failed
pending — The artifact has been uploaded and is queued for processing.
processing — The AI pipeline is actively extracting text, identifying requirements signals, and analyzing content.
completed — Processing is finished. Extracted text is available, and Change Proposals may have been generated from the content.
failed — Processing encountered an error (e.g., corrupted file, unsupported format). The artifact can be re-uploaded.
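The lifecycle above behaves like a small state machine. The following sketch encodes it as a transition table; the function names are hypothetical, and it assumes that failed is reached only from processing and that a re-upload starts over as a new pending artifact rather than reviving the failed one.

```python
# Allowed moves between statuses, as described in the lifecycle above.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal; assumption: re-upload creates a new artifact
}


def advance(current: str, target: str) -> str:
    """Return the target status if the move is allowed, else raise ValueError."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target


def is_terminal(status: str) -> bool:
    """A status is terminal when no further transitions are allowed."""
    return not ALLOWED_TRANSITIONS[status]


status = advance("pending", "processing")
status = advance(status, "completed")
print(status, is_terminal(status))
```

Keeping the transitions in a table makes invalid jumps, such as pending straight to completed, fail loudly instead of silently corrupting state.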
Artifacts and Initiatives
Artifacts are most commonly linked to Initiatives. When you start a new modernization initiative, you typically collect and upload the materials that define the current state:
- Legacy specifications
- As-is process documentation
- Previous project reports
These feed the Initiative’s Onboarding Plan — a prioritized checklist of materials Catalio recommends collecting for a thorough discovery pass.
AI Extraction Pipeline
Once an Artifact reaches processing status:
- The file is converted to plain text if needed (PDF extraction, OCR for scanned documents)
- The AI analyzes the text for requirements-relevant content
- Extracted requirements signals are created as Change Proposals linked to the Initiative or Application
- The extracted_text field is populated for audit and review
The extraction is conservative — it generates proposals for human review rather than automatically creating requirements. This keeps humans in control of what enters the requirements catalog.
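The shape of this pipeline can be sketched in a few lines. Everything here is a stand-in: `process_artifact`, `ChangeProposal`, and the `pending_review` status are hypothetical names, whitespace normalization substitutes for real PDF or OCR conversion, and a keyword match on "must" and "shall" substitutes for the AI analysis. The point it illustrates is the conservative design, where output is a list of proposals awaiting review and nothing is written to the requirements catalog directly.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeProposal:
    summary: str
    source_artifact: str
    review_status: str = "pending_review"  # hypothetical; never auto-approved


@dataclass
class ProcessingResult:
    extracted_text: str
    proposals: List[ChangeProposal] = field(default_factory=list)


def find_requirement_signals(text: str) -> List[str]:
    """Stand-in for AI analysis: keep sentences containing 'must' or 'shall'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if re.search(r"\b(must|shall)\b", s, re.IGNORECASE)]


def process_artifact(name: str, raw_text: str) -> ProcessingResult:
    # Step 1: convert to plain text (here: just collapse whitespace).
    text = " ".join(raw_text.split())
    # Steps 2-3: analyze the text and wrap each signal in a proposal for review.
    proposals = [
        ChangeProposal(summary=s, source_artifact=name)
        for s in find_requirement_signals(text)
    ]
    # Step 4: keep the extracted text alongside the proposals for audit.
    return ProcessingResult(extracted_text=text, proposals=proposals)


result = process_artifact(
    "AP Process Manual",
    "Invoices arrive by email.  Each invoice must be approved by a manager. "
    "Payments run weekly.",
)
for proposal in result.proposals:
    print(proposal.review_status, "|", proposal.summary)
```

Because every proposal starts in a review state, a reviewer decides what becomes a Requirement, regardless of how the signals were detected.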
Artifact Types in Onboarding Plans
When Catalio’s AI generates an Onboarding Plan for a new Initiative, it suggests specific Artifact types to collect:
- Process documentation
- Data flow diagrams
- Legacy system manuals
- Stakeholder interview notes
- Regulatory compliance documents
Each suggested artifact type is represented as an Onboarding Item with an AI-generated explanation of why that specific material is valuable for this engagement.
Best Practices
Upload source materials early.
The earlier you upload legacy documentation, the sooner the AI can begin surfacing requirements signals. Even imperfect or outdated documents provide useful context.
Prefer text-based documents over scanned images.
Native PDFs and Word documents produce higher-quality extraction than scanned images. If you only have scanned documents, Catalio will attempt OCR, but text quality may vary.
Review proposals promptly.
The value of Artifact processing lies in the Change Proposals it generates. Set a team cadence for reviewing proposals after each batch upload.
Name Artifacts descriptively.
“Oracle EBS AP Module User Guide v12.2” is more useful than “Document 1.” Future team members (and AI context windows) benefit from clear naming.
Don’t upload confidential data unnecessarily.
Catalio processes Artifact content through the AI pipeline. If a document contains sensitive personal data (e.g., individual HR records), assess whether that level of detail is needed before uploading.
Relationships at a Glance
| Related Concept | Relationship |
|---|---|
| Initiative | Artifacts are typically linked to an Initiative |
| Onboarding Plan | Onboarding Plans guide which Artifacts to collect |
| Change Proposals | AI extraction generates proposals from Artifact content |
| Requirements | Approved proposals become Requirements |
Next Steps
- Understand Onboarding Plans — Learn how Catalio guides artifact collection
- Review Change Proposals — Curate AI-extracted content
- Create an Initiative — Start a modernization engagement
Pro Tip: The single highest-impact action when starting a new engagement is uploading existing documentation, even if it is old, incomplete, or messy. The AI can often pull useful signal out of noisy source material, and it surfaces gaps in existing documentation as part of the analysis.
Support
- Documentation: Continue reading about Onboarding Plans and Initiatives
- Email: support@catalio.ai
- Community: Share document extraction tips with other Catalio users