Catalio is designed with privacy at its core. This guide explains how your data is protected when using AI-powered features, including requirement analysis, semantic search, and the AI chat assistant.
## Core Privacy Principles
Catalio’s AI features are built on four fundamental privacy principles:
### 1. Your Data Stays Yours

**Important:** Your requirement data and conversations are never used to train global AI models.
### 2. Organization Isolation
All AI processing is isolated per organization. Data from one organization is never accessible to another, and AI features cannot cross organizational boundaries.
### 3. Explicit Control
You control which data is accessible to AI. Every requirement has an “AI Accessible” toggle, allowing you to exclude sensitive content from AI processing.
### 4. Transparency
We’re transparent about how data flows, what providers receive, and how your content is processed. This document provides complete visibility into AI data handling.
## Supported AI Providers
Catalio supports multiple AI providers through our Bring Your Own LLM (BYOLLM) capability:
| Provider | Chat | Embeddings | Vision | Data Handling |
|---|---|---|---|---|
| OpenAI | Yes | Yes | Yes | API data not used for training |
| Anthropic | Yes | No | Yes | API/commercial: no training; consumer: opt-out |
| Azure OpenAI | Yes | Yes | Yes | Data stays in your Azure tenant |
| Google Gemini | Yes | Yes | Yes | Enterprise data handling available |
| Groq | Yes | No | No | Fast inference, no training |
| xAI | Yes | No | No | OpenAI-compatible API |
| Ollama | Yes | Yes | Yes | Self-hosted, data never leaves |
| OpenRouter | Yes | Yes | Yes | Passes to underlying provider |
| GitHub Copilot | Yes | No | Yes | GitHub Enterprise data policies |
### Provider Data Policies

Each provider has its own data handling policies:
**OpenAI:** API requests through the enterprise API are not used for model training. Data may be retained for up to 30 days for abuse monitoring.

**Anthropic:** As of September 28, 2025, Anthropic’s data policy varies by plan type. Commercial offerings (Claude for Work, Claude Gov, Claude for Education, and API usage) retain full protections—data is never used for model training. Consumer plans (Free, Pro, Max) operate under an opt-out model where data may be used for training unless users disable this in settings. Since Catalio integrates via the API, your data is protected under commercial terms and is not used for training.

**Azure OpenAI:** Data remains in your Azure tenant with full enterprise controls. You control data residency, retention, and encryption.

**Ollama:** Data never leaves your infrastructure. This is the highest privacy option for organizations with strict data handling requirements.
**Tip:** If your organization has strict data handling requirements, a self-hosted Ollama provider keeps all AI processing on your own infrastructure.

## API Key Security
When you configure AI providers in Catalio, your API keys are protected with enterprise-grade security:
### AES-256-GCM Encryption

All API keys are encrypted at rest using AES-256-GCM, a NIST-standardized authenticated encryption scheme widely used in financial and government systems:
- 256-bit encryption keys
- Galois/Counter Mode for authenticated encryption
- Unique initialization vectors for each encryption
- Separate key management from database storage
#### Encryption Implementation

```
API Key Entry
      |
      v
AES-256-GCM Encryption
      |
      v
Encrypted Storage in Database
      |
      v
(When needed for API call)
      |
      v
Decryption in Memory
      |
      v
API Request to Provider
      |
      v
Immediate Memory Cleanup
```
The encryption key is stored separately from the encrypted data, typically in environment variables or a secrets management system, ensuring database access alone cannot reveal API keys.
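The encrypt-store-decrypt lifecycle described above can be sketched with the `cryptography` package's AESGCM primitive. This is an illustrative sketch, not Catalio's actual implementation; the environment variable name and function names are hypothetical.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# The 256-bit master key lives in the environment (or a secrets manager),
# never in the database. Variable name is hypothetical.
_key_hex = os.environ.get("CATALIO_KEY_ENCRYPTION_KEY")
MASTER_KEY = bytes.fromhex(_key_hex) if _key_hex else AESGCM.generate_key(bit_length=256)

def encrypt_api_key(plaintext: str) -> bytes:
    """Encrypt an API key for storage; a fresh 96-bit nonce for every call."""
    nonce = os.urandom(12)  # unique initialization vector per encryption
    return nonce + AESGCM(MASTER_KEY).encrypt(nonce, plaintext.encode(), None)

def decrypt_api_key(blob: bytes) -> str:
    """Decrypt in memory just before the outbound provider request."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(MASTER_KEY).decrypt(nonce, ciphertext, None).decode()
```

Because GCM is authenticated, any tampering with the stored ciphertext makes decryption fail loudly instead of returning garbage.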
### Never Logged
API keys and other sensitive credentials are automatically excluded from:
- Application logs
- Error reports
- Telemetry data
- Audit trails
- Debug output
Catalio’s logging system uses a sanitization layer that automatically redacts fields such as `password`, `api_key`, `secret`, `token`, `authorization`, `bearer`, `access_token`, `refresh_token`, and similar sensitive identifiers.
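A sanitization layer of this kind can be sketched as a recursive redaction pass over log records, matching the field names listed above (function and pattern names are illustrative, not Catalio's internals):

```python
import re

# Key names that must never reach logs (substring match, case-insensitive).
SENSITIVE = re.compile(
    r"password|api_key|secret|token|authorization|bearer", re.IGNORECASE
)

def sanitize(record):
    """Return a copy of a log record with sensitive fields redacted."""
    if isinstance(record, dict):
        return {
            key: "[REDACTED]" if SENSITIVE.search(key) else sanitize(value)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [sanitize(item) for item in record]
    return record
```

Note that substring matching also catches derived names: `access_token` and `refresh_token` are redacted because they contain `token`.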
## Data Flow for AI Features
Understanding how your data flows helps you make informed decisions about AI feature usage.
### Requirement Analysis
When you analyze a requirement for quality, sentiment, or categories:
```
Requirement Content
      |
      v
Catalio Application
      |
      |  (selected fields only)
      v
Your AI Provider
      |
      v
Analysis Results
      |
      v
Stored with Requirement
```
**What’s sent:** Title, description, user story (want/benefit), acceptance criteria

**What’s NOT sent:** User identifiers, organization metadata, audit trail information, internal IDs
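The field selection above can be sketched as an explicit allow-list, so nothing outside the named fields can reach the provider request. The requirement shape, field names, and function are illustrative assumptions, not Catalio's actual schema.

```python
# Only these fields are ever forwarded to the AI provider.
ANALYSIS_FIELDS = (
    "title", "description", "user_want", "user_benefit", "acceptance_criteria"
)

def build_analysis_payload(requirement: dict) -> dict:
    """Copy only allow-listed fields; IDs and metadata never leave Catalio."""
    return {f: requirement[f] for f in ANALYSIS_FIELDS if f in requirement}
```

An allow-list is safer than a deny-list here: a newly added metadata field is excluded by default instead of leaking until someone remembers to block it.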
### Semantic Search (Embeddings)
For semantic search, Catalio generates vector embeddings:
```
Requirement Text
      |
      v
Embedding Model API
      |
      v
Vector (1536 numbers)
      |
      v
Stored in Catalio Database
      |
      v
Used for Similarity Search
```
**What’s sent:** Combined text from title, user want, and user benefit fields

**What’s stored:** Only the numeric vector representation, not the original text sent to the API
**Important:** Embeddings are mathematical representations that cannot be directly reversed into the original text. They enable semantic search without storing your content at the provider.
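Once vectors are stored, similarity search reduces to comparing directions. A stdlib-only sketch of cosine ranking follows; the helper names and tiny two-dimensional vectors are illustrative (real embeddings have 1536 dimensions, but the math is identical):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_matches(query_vec, stored, k=3):
    """Rank stored (id, vector) pairs against a query embedding."""
    ranked = sorted(stored,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [item[0] for item in ranked[:k]]
```

Nothing in this computation can recover the words that produced a vector; only relative closeness between vectors is observable.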
### AI Chat Assistant
When using the AI chat assistant:
```
User Message
      |
      v
Catalio Chat System
      |
      |  + context from tools
      v
Your AI Provider
      |
      v
AI Response
      |
      v
Displayed to User
      |
      v
Stored in Conversation
```
**What’s sent:** Your message, conversation history, and context from tool calls (requirement summaries, search results)

**What’s stored in Catalio:** Complete conversation history for your reference and continuity
## Contextual Learning
Catalio offers optional contextual learning that improves AI responses based on your organization’s content and patterns.
### How Contextual Learning Works
When enabled, Catalio:
- Analyzes patterns in your requirements and usage
- Creates organization-specific context
- Provides this context to AI for better responses
### Isolation Guarantees
Contextual learning is completely isolated per organization:
- Learning from Org A never influences Org B
- Context is stored separately for each organization
- Deletion removes all associated learning data
### No Global Training

**Note:** Contextual learning does NOT:
- Train OpenAI, Anthropic, or other provider models
- Improve global AI capabilities
- Share patterns across organizations
- Create generalizable AI improvements
Contextual learning provides context at inference time, not training time.
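To make the inference-time distinction concrete: the learned context is simply prepended to each request for that organization and discarded afterward; no model weights ever change. The sketch below uses the common chat-completion message format; function and field names are illustrative assumptions.

```python
def build_chat_messages(org_context: str, history: list, user_message: str) -> list:
    """Inject per-organization context at request time only."""
    messages = [
        # Context travels with the request; it is never baked into a model.
        {"role": "system", "content": f"Organization context:\n{org_context}"}
    ]
    messages.extend(history)  # prior turns of this conversation
    messages.append({"role": "user", "content": user_message})
    return messages
```

Because the context is per-request and per-organization, deleting an organization's learning data immediately stops it from influencing any future response.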
### Enabling/Disabling
Organization administrators control contextual learning:
1. Navigate to **Settings > AI Features**
2. Toggle **Contextual Learning**
3. Choose scope: **Requirements only**, **Full content**, or **Off**
## Data Control and Deletion
You maintain full control over your AI-related data.
### AI Accessible Toggle
Every requirement has an “AI Accessible” toggle:
- Enabled (default): Requirement is included in AI analysis and semantic search
- Disabled: Requirement is excluded from all AI processing
Use this for:
- Sensitive or confidential requirements
- PII-containing content
- Internal notes not suitable for AI processing
- Compliance-restricted information
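In application code, honoring the toggle amounts to one filter applied before any requirement reaches an AI pipeline. A minimal sketch, assuming a dictionary representation with an `ai_accessible` flag mirroring the toggle (names are illustrative):

```python
def ai_visible(requirements):
    """Drop anything the organization has excluded from AI processing.

    Missing flags default to True, matching the "Enabled (default)" behavior.
    """
    return [r for r in requirements if r.get("ai_accessible", True)]
```

Applying the filter once, at the boundary, means analysis, semantic search, and chat tools all inherit the exclusion without separate checks.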
### Conversation Deletion
Users can delete AI chat conversations:
1. Open the conversation
2. Click the options menu
3. Select **Delete Conversation**
4. Confirm deletion
Deleted conversations are permanently removed and cannot be recovered.
### Organization Data Deletion
When an organization is deleted from Catalio:
- All requirements and associated AI data are deleted
- All embeddings are removed
- All chat conversations are deleted
- All contextual learning data is purged
- All provider configurations and encrypted API keys are deleted
### Right to Deletion (GDPR)
For GDPR compliance, Catalio supports data subject requests:
- Individual user data can be anonymized or deleted
- Organization data can be exported or deleted
- Audit trails maintain minimal necessary information
Contact support@catalio.ai for data deletion requests.
## Compliance Considerations
### GDPR (European Union)
For EU organizations or those handling EU citizen data:
**Recommended:** Use Azure OpenAI with EU data residency
- Deploy Azure OpenAI resource in West Europe or Sweden Central
- Data never leaves EU boundaries
- Full GDPR compliance controls
**Considerations:**
- Standard OpenAI API processes data in the US
- Document your legal basis for AI processing
- Include AI processing in your privacy policy
- Enable data subject access and deletion
### HIPAA (Healthcare)
For organizations handling protected health information (PHI):
**Recommended:** Use Azure OpenAI with BAA
- Microsoft offers Business Associate Agreements for Azure OpenAI
- Configure with HIPAA-compliant settings
- Enable audit logging
**Caution:** Do not send PHI to an AI provider without a signed Business Associate Agreement covering that service.

**Considerations:**
- Exclude PHI from AI-accessible requirements
- Document AI processing in your HIPAA policies
### SOC 2
For organizations requiring SOC 2 compliance:
**Recommended:** Use enterprise provider tiers
- Azure OpenAI includes SOC 2 compliance
- OpenAI Enterprise provides compliance documentation
- Anthropic offers enterprise agreements
**Considerations:**
- Document AI provider security controls
- Include in your third-party risk assessment
- Monitor for security advisories
### Financial Services
For banks, insurance, and financial services:
**Recommended:** Self-hosted or Azure OpenAI
- Ollama for complete on-premise control
- Azure OpenAI with financial services certifications
- Document AI processing in regulatory filings
## Best Practices
### Minimize Sensitive Data in Requirements
Write requirements to minimize PII and sensitive information:
**Instead of:**

> “John Smith (john.smith@company.com) needs to export his social security number for tax filing”

**Write:**

> “Users need to export personal tax identifiers for compliance reporting”
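As a supplementary safeguard, a lightweight pattern check can flag obvious PII before a requirement is marked AI-accessible. The patterns below are deliberately simple and far from exhaustive; real PII detection needs a dedicated tool.

```python
import re

# Illustrative patterns only: email addresses and US SSN-shaped numbers.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # SSN-like number
]

def contains_obvious_pii(text: str) -> bool:
    """True if the text matches any known PII pattern."""
    return any(pattern.search(text) for pattern in PII_PATTERNS)
```

A check like this catches the first example above while letting the rewritten, anonymized version through.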
### Use AI Accessible Toggle Appropriately

**Warning:** Mark requirements as non-AI-accessible when they contain:
- Personally identifiable information (PII)
- Financial account numbers
- Healthcare information
- Trade secrets or proprietary formulas
- Classified or restricted information
### Review Provider Policies
Before configuring a provider:
- Review their data handling policies
- Understand their data retention periods
- Verify compliance certifications
- Consider data residency requirements
### Audit AI Usage
Regularly review AI feature usage:
- Check which providers are configured
- Review feature assignments
- Audit who has configuration access
- Monitor for unusual usage patterns
### Document Your Policies
Create internal documentation covering:
- Which AI providers are approved
- What data can be processed by AI
- Who can configure AI features
- How to handle AI-related incidents
## Summary
| Aspect | Catalio’s Approach |
|---|---|
| Global model training | Never - your data is not used for training |
| API key storage | AES-256-GCM encryption |
| API key logging | Never logged or exposed |
| Organization isolation | Complete - no cross-org data access |
| Contextual learning | Optional, isolated per organization |
| Data deletion | Full deletion supported on request |
| Provider choice | BYOLLM - you choose your provider |
| Sensitive data control | AI Accessible toggle per requirement |
## Resources
- Supported AI Providers - Provider comparison and capabilities
- Bring Your Own LLM - Configure your AI provider
- Setting Up LLM API Keys - Detailed setup guide
- Catalio Privacy Policy - Legal privacy documentation
## Questions?
For privacy-related questions about AI features:
- Email: privacy@catalio.ai
- Security: security@catalio.ai
- General Support: support@catalio.ai
Last Updated: December 2025