
Artifacts

Upload PDFs, Word documents, and other materials to let Catalio extract requirements, capabilities, and context through AI analysis


An Artifact in Catalio is an uploaded document or deliverable — a PDF, Word document, architecture diagram, meeting transcript, or specification — that Catalio processes through its AI extraction pipeline to surface requirements, capabilities, and business context.

Artifacts are the primary on-ramp for organizations that have existing documentation. Rather than manually re-entering information from legacy requirements documents or process manuals, you upload the source material and let the AI do the initial extraction pass.

What Artifacts Can Be

Artifacts typically represent:

  • Legacy requirements documents — Old specifications, feature lists, or BRDs that capture what a system currently does
  • Process manuals — Step-by-step guides that describe how business processes work
  • Architecture documents — System design documents that describe components and integrations
  • Meeting transcripts — Stakeholder interview notes or workshop summaries
  • Exported reports — Compliance reports, audit findings, or gap analyses
  • Vendor documentation — System manuals or API specifications for legacy software

Key Fields

| Field | Purpose |
| --- | --- |
| name | Human-readable name for the artifact |
| artifact_type | Classification of the document type |
| status | Processing lifecycle: pending, processing, completed, failed |
| parent_id / parent_type | The Initiative or other entity this artifact belongs to |
| extracted_text | AI-extracted plain text content (populated after processing) |
| file_url | URL to the stored file |
| organization_id | Tenant scope |
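
To make the field list concrete, here is a minimal Python sketch of an Artifact record. It is an illustration only, not Catalio's actual schema or API: the field names and status values come from the table above, while the Python types, the example `artifact_type` value, the IDs, and the URL are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ArtifactStatus(str, Enum):
    """The four lifecycle values listed for the status field."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Artifact:
    # Field names mirror the Key Fields table; the types are assumed.
    name: str
    artifact_type: str
    organization_id: str  # tenant scope
    file_url: str
    parent_id: Optional[str] = None
    parent_type: Optional[str] = None
    status: ArtifactStatus = ArtifactStatus.PENDING
    extracted_text: Optional[str] = None  # populated after processing


# All values below are hypothetical placeholders.
artifact = Artifact(
    name="Oracle EBS AP Module User Guide v12.2",
    artifact_type="vendor_documentation",
    organization_id="org_123",
    file_url="https://example.com/files/ebs-ap-guide.pdf",
    parent_id="init_456",
    parent_type="Initiative",
)
```

A freshly uploaded record in this sketch starts in the pending state with no extracted text, matching the lifecycle described in the next section.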

Artifact Lifecycle

Artifacts follow a processing pipeline:

pending → processing → completed
                     ↘ failed

pending — The artifact has been uploaded and is queued for processing.

processing — The AI pipeline is actively extracting text, identifying requirements signals, and analyzing content.

completed — Processing is finished. Extracted text is available, and Change Proposals may have been generated from the content.

failed — Processing encountered an error (e.g., corrupted file, unsupported format). The artifact can be re-uploaded.
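
The lifecycle above can be read as a small state machine. The sketch below encodes it as a transition table; it assumes that failed is reached only from processing and that both completed and failed are terminal (a re-upload would start again at pending). These are readings of the diagram, not confirmed Catalio behavior, and the function names are hypothetical.

```python
# Allowed status transitions, as read from the lifecycle diagram.
# Assumption: "failed" branches off "processing"; end states have no exits.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from current to target follows the diagram."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def advance(current: str, target: str) -> str:
    """Return the new status, or raise ValueError for an invalid move."""
    if not can_transition(current, target):
        raise ValueError(f"invalid transition: {current} -> {target}")
    return target


status = advance("pending", "processing")
status = advance(status, "completed")
```

Keeping the transitions in one table makes it easy to reject impossible jumps, such as pending straight to completed.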

Artifacts and Initiatives

Artifacts are most commonly linked to Initiatives. When you start a new modernization initiative, you typically collect and upload the materials that define the current state:

  • Legacy specifications
  • As-is process documentation
  • Previous project reports

These feed the Initiative’s Onboarding Plan — a prioritized checklist of materials Catalio recommends collecting for a thorough discovery pass.

AI Extraction Pipeline

Once an Artifact reaches processing status:

  1. The file is converted to plain text if needed (PDF extraction, OCR for scanned documents)
  2. The AI analyzes the text for requirements-relevant content
  3. Extracted requirements signals are created as Change Proposals linked to the Initiative or Application
  4. The extracted_text field is populated for audit and review

The extraction is conservative — it generates proposals for human review rather than automatically creating requirements. This keeps humans in control of what enters the requirements catalog.
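
The four steps and the review-first principle can be sketched as follows. This is a toy illustration under stated assumptions, not Catalio's implementation: `ChangeProposal`, `ProcessingResult`, `process_artifact`, and the `pending_review` status are hypothetical names, and the text conversion and AI analysis are replaced by trivial stand-in functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChangeProposal:
    summary: str
    parent_id: str  # the Initiative or Application it is linked to
    status: str = "pending_review"  # hypothetical: awaits human review


@dataclass
class ProcessingResult:
    extracted_text: str
    proposals: List[ChangeProposal] = field(default_factory=list)


def process_artifact(
    raw: bytes,
    parent_id: str,
    to_text: Callable[[bytes], str],
    find_signals: Callable[[str], List[str]],
) -> ProcessingResult:
    text = to_text(raw)           # step 1: convert the file to plain text
    signals = find_signals(text)  # step 2: look for requirements signals
    # Step 3: each signal becomes a proposal, never a requirement directly.
    proposals = [ChangeProposal(summary=s, parent_id=parent_id) for s in signals]
    # Step 4: keep the extracted text for audit and review.
    return ProcessingResult(extracted_text=text, proposals=proposals)


# Toy stand-ins for PDF/OCR conversion and for the AI analysis.
def decode_utf8(raw: bytes) -> str:
    return raw.decode("utf-8")


def lines_with_must(text: str) -> List[str]:
    return [ln.strip() for ln in text.splitlines() if "must" in ln.lower()]


result = process_artifact(
    b"The system must export invoices.\nOverview of the AP module.",
    "init_456",
    decode_utf8,
    lines_with_must,
)
```

Note that the function only returns proposals in a review state; nothing in the sketch writes to a requirements catalog, which mirrors the conservative design described above.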

Artifact Types in Onboarding Plans

When Catalio’s AI generates an Onboarding Plan for a new Initiative, it suggests specific Artifact types to collect:

  • Process documentation
  • Data flow diagrams
  • Legacy system manuals
  • Stakeholder interview notes
  • Regulatory compliance documents

Each suggested artifact type is represented as an Onboarding Item with an AI-generated explanation of why that specific material is valuable for this engagement.

Best Practices

Upload source materials early.

The earlier you upload legacy documentation, the sooner the AI can begin surfacing requirements signals. Even imperfect or outdated documents provide useful context.

Prefer text-based documents over scanned images.

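Native PDFs and Word documents produce higher-quality extraction than scanned images. If you only have scanned documents, Catalio will attempt OCR, but the accuracy of the extracted text depends on the quality of the scan, so review the extracted_text field after processing.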

Review proposals promptly.

The value of Artifact processing is the Change Proposals it generates. Set a team cadence for reviewing proposals after each batch upload.

Name Artifacts descriptively.

“Oracle EBS AP Module User Guide v12.2” is more useful than “Document 1.” Future team members (and AI context windows) benefit from clear naming.

Don’t upload confidential data unnecessarily.

Catalio processes Artifact content through the AI pipeline. If a document contains sensitive personal data (e.g., individual HR records), assess whether that level of detail is needed before uploading.

Relationships at a Glance

| Related Concept | Relationship |
| --- | --- |
| Initiative | Artifacts are typically linked to an Initiative |
| Onboarding Plan | Onboarding Plans guide which Artifacts to collect |
| Change Proposals | AI extraction generates proposals from Artifact content |
| Requirements | Approved proposals become Requirements |

Next Steps


Pro Tip: The single highest-impact action when starting a new engagement is uploading existing documentation — even if it’s old, incomplete, or messy. The AI extracts signal from noise more effectively than you might expect, and it surfaces gaps in existing documentation as part of the analysis.

Support

  • Documentation: Continue reading about Onboarding Plans and Initiatives
  • Email: support@catalio.ai
  • Community: Share document extraction tips with other Catalio users