
AI Features and Data Privacy

How Catalio protects your data when using AI features. Learn about encryption, data handling, contextual learning, and compliance considerations.


Catalio is designed with privacy at its core. This guide explains how your data is protected when using AI-powered features, including requirement analysis, semantic search, and the AI chat assistant.

Core Privacy Principles

Catalio’s AI features are built on four fundamental privacy principles:

1. Your Data Stays Yours

Important

Catalio never uses your data to train global AI models. Your requirements, use cases, personas, and other content remain exclusively within your organization’s context.

2. Organization Isolation

All AI processing is isolated per organization. Data from one organization is never accessible to another, and AI features cannot cross organizational boundaries.

3. Explicit Control

You control which data is accessible to AI. Every requirement has an “AI Accessible” toggle, allowing you to exclude sensitive content from AI processing.

4. Transparency

We’re transparent about how data flows, what providers receive, and how your content is processed. This document provides complete visibility into AI data handling.

Supported AI Providers

Catalio supports multiple AI providers through our Bring Your Own LLM (BYOLLM) capability:

Provider        Chat  Embeddings  Vision  Data Handling
OpenAI          Yes   Yes         Yes     API data not used for training
Anthropic       Yes   No          Yes     API/commercial: no training; consumer: opt-out
Azure OpenAI    Yes   Yes         Yes     Data stays in your Azure tenant
Google Gemini   Yes   Yes         Yes     Enterprise data handling available
Groq            Yes   No          No      Fast inference, no training
xAI             Yes   No          No      OpenAI-compatible API
Ollama          Yes   Yes         Yes     Self-hosted, data never leaves
OpenRouter      Yes   Yes         Yes     Passes to underlying provider
GitHub Copilot  Yes   No          Yes     GitHub Enterprise data policies

Provider Data Policies

Each provider has its own data handling policies:

OpenAI: API requests through the enterprise API are not used for model training. Data may be retained for 30 days for abuse monitoring.

Anthropic: As of September 28, 2025, Anthropic’s data policy varies by plan type. Commercial offerings (Claude for Work, Claude Gov, Claude for Education, and API usage) retain full protections—data is never used for model training. Consumer plans (Free, Pro, Max) operate under an opt-out model where data may be used for training unless users disable this in settings. Since Catalio integrates via the API, your data is protected under commercial terms and is not used for training.

Azure OpenAI: Data remains in your Azure tenant with full enterprise controls. You control data residency, retention, and encryption.

Ollama: Data never leaves your infrastructure. This is the highest privacy option for organizations with strict data handling requirements.

Tip

For maximum data privacy, consider using Ollama for self-hosted AI processing where data never leaves your infrastructure.

API Key Security

When you configure AI providers in Catalio, your API keys are protected with enterprise-grade security:

AES-256-GCM Encryption

All API keys are encrypted at rest using AES-256-GCM, the same encryption standard used by financial institutions and government agencies:

  • 256-bit encryption keys
  • Galois/Counter Mode for authenticated encryption
  • Unique initialization vectors for each encryption
  • Separate key management from database storage

Encryption Implementation

API Key Entry
|
v
AES-256-GCM Encryption
|
v
Encrypted Storage in Database
|
v
(When needed for API call)
|
v
Decryption in Memory
|
v
API Request to Provider
|
v
Immediate Memory Cleanup

The encryption key is stored separately from the encrypted data, typically in environment variables or a secrets management system, ensuring database access alone cannot reveal API keys.
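The flow above can be sketched with the `cryptography` package's AESGCM primitive. This is a minimal illustration, not Catalio's actual implementation; the environment variable name is hypothetical.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# The 256-bit key lives outside the database (environment variable or
# secrets manager), so database access alone cannot reveal API keys.
_key_hex = os.environ.get("CATALIO_ENCRYPTION_KEY")  # hypothetical variable name
KEY = bytes.fromhex(_key_hex) if _key_hex else AESGCM.generate_key(bit_length=256)

def encrypt_api_key(plaintext: str) -> bytes:
    """Encrypt with AES-256-GCM, using a fresh 12-byte nonce (IV) per call."""
    nonce = os.urandom(12)
    ciphertext = AESGCM(KEY).encrypt(nonce, plaintext.encode(), None)
    return nonce + ciphertext  # nonce is stored alongside the ciphertext

def decrypt_api_key(blob: bytes) -> str:
    """Decrypt in memory just before the provider API call."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(KEY).decrypt(nonce, ciphertext, None).decode()
```

Because GCM is an authenticated mode, any tampering with the stored blob causes decryption to fail rather than return corrupted plaintext.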

Never Logged

API keys and other sensitive credentials are automatically excluded from:

  • Application logs
  • Error reports
  • Telemetry data
  • Audit trails
  • Debug output

Catalio’s logging system uses a sanitization layer that automatically redacts fields like: password, api_key, secret, token, authorization, bearer, access_token, refresh_token, and similar sensitive identifiers.
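A sanitization layer of this kind can be sketched as a regex pass over each log line before it is written. The field names mirror the list above; the matching logic itself is an assumption, not Catalio's actual code.

```python
import re

# Field names whose values must never reach the logs.
SENSITIVE_FIELDS = (
    "password", "api_key", "secret", "token", "authorization",
    "bearer", "access_token", "refresh_token",
)

# Matches `field=value`, `field: value`, and quoted JSON-style pairs.
_PATTERN = re.compile(
    r'("?(?:' + "|".join(SENSITIVE_FIELDS) + r')"?\s*[:=]\s*)("[^"]*"|\S+)',
    re.IGNORECASE,
)

def sanitize(line: str) -> str:
    """Redact values of sensitive fields before a line reaches the log."""
    return _PATTERN.sub(r"\1[REDACTED]", line)
```

Non-sensitive fields on the same line pass through untouched, so logs stay useful for debugging.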

Data Flow for AI Features

Understanding how your data flows helps you make informed decisions about AI feature usage.

Requirement Analysis

When you analyze a requirement for quality, sentiment, or categories:

Requirement Content
|
v
Catalio Application
|
| (Selected fields only)
v
Your AI Provider
|
v
Analysis Results
|
v
Stored with Requirement

What’s sent: Title, description, user story (want/benefit), acceptance criteria

What’s NOT sent: User identifiers, organization metadata, audit trail information, internal IDs
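The "selected fields only" step can be sketched as an allowlist applied before anything leaves the application. Field names here are illustrative, not Catalio's actual schema.

```python
def build_analysis_payload(requirement: dict) -> dict:
    """Select only content fields for the AI provider; identifiers,
    audit data, and organization metadata never leave the application."""
    allowed = ("title", "description", "user_want", "user_benefit",
               "acceptance_criteria")
    return {field: requirement[field]
            for field in allowed if field in requirement}

requirement = {
    "id": "req_8f2a",          # internal ID -- never sent
    "org_id": "org_42",        # organization metadata -- never sent
    "created_by": "user_17",   # user identifier -- never sent
    "title": "Export tax report",
    "description": "Users export yearly tax reports as PDF.",
    "user_want": "to export a tax report",
    "user_benefit": "so filing is faster",
}
payload = build_analysis_payload(requirement)
```

An allowlist is deliberately chosen over a blocklist: a newly added sensitive field is excluded by default instead of leaking until someone remembers to block it.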

Semantic Search (Embeddings)

For semantic search, Catalio generates vector embeddings:

Requirement Text
|
v
Embedding Model API
|
v
Vector (1536 numbers)
|
v
Stored in Catalio Database
|
v
Used for Similarity Search

What’s sent: Combined text from title, user want, and user benefit fields

What’s stored: Only the numeric vector representation, not the original text sent to the API

Important: Embeddings are mathematical representations of meaning and are not designed to be converted back into the original text. They enable semantic search without your content being stored at the provider.
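Once vectors are stored, similarity search compares them numerically, typically by cosine similarity. A standard-library sketch (the exact metric Catalio uses is an assumption; vectors shortened from 1536 dimensions for readability):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two embedding vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# In production each vector has 1536 dimensions; 3 shown for readability.
stored = {"REQ-1": [0.9, 0.1, 0.0], "REQ-2": [0.0, 0.2, 0.9]}
query = [0.8, 0.2, 0.1]
best = max(stored, key=lambda rid: cosine_similarity(query, stored[rid]))
```

Only these numeric comparisons run at search time; the provider API is called once per requirement to produce the vector, not on every search.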

AI Chat Assistant

When using the AI chat assistant:

User Message
|
v
Catalio Chat System
|
| + Context from tools
v
Your AI Provider
|
v
AI Response
|
v
Displayed to User
|
v
Stored in Conversation

What’s sent: Your message, conversation history, and context from tool calls (requirement summaries, search results)

What’s stored in Catalio: Complete conversation history for your reference and continuity

Contextual Learning

Catalio offers optional contextual learning that improves AI responses based on your organization’s content and patterns.

How Contextual Learning Works

When enabled, Catalio:

  1. Analyzes patterns in your requirements and usage
  2. Creates organization-specific context
  3. Provides this context to AI for better responses

Isolation Guarantees

Contextual learning is completely isolated per organization:

  • Learning from Org A never influences Org B
  • Context is stored separately for each organization
  • Deletion removes all associated learning data

No Global Training

Note

Contextual learning does NOT involve training AI models. Your data is never used to:
  • Train OpenAI, Anthropic, or other provider models
  • Improve global AI capabilities
  • Share patterns across organizations
  • Create generalizable AI improvements

Contextual learning provides context at inference time, not training time.

Enabling/Disabling

Organization administrators control contextual learning:

  1. Navigate to Settings > AI Features
  2. Toggle Contextual Learning
  3. Choose scope: Requirements only, Full content, or Off

Data Control and Deletion

You maintain full control over your AI-related data.

AI Accessible Toggle

Every requirement has an “AI Accessible” toggle:

  • Enabled (default): Requirement is included in AI analysis and semantic search
  • Disabled: Requirement is excluded from all AI processing

Use this for:

  • Sensitive or confidential requirements
  • PII-containing content
  • Internal notes not suitable for AI processing
  • Compliance-restricted information
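In effect, the toggle acts as a filter applied before any AI processing. A minimal sketch, assuming a boolean `ai_accessible` field that defaults to enabled as described above:

```python
def ai_accessible_requirements(requirements: list[dict]) -> list[dict]:
    """Return only requirements whose AI Accessible toggle is on.
    Everything else is skipped by analysis, embeddings, and chat tools."""
    return [r for r in requirements if r.get("ai_accessible", True)]

requirements = [
    {"id": "REQ-1", "ai_accessible": True},
    {"id": "REQ-2", "ai_accessible": False},  # excluded from all AI processing
    {"id": "REQ-3"},                          # toggle defaults to enabled
]
visible = ai_accessible_requirements(requirements)
```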

Conversation Deletion

Users can delete AI chat conversations:

  1. Open the conversation
  2. Click the options menu
  3. Select Delete Conversation
  4. Confirm deletion

Deleted conversations are permanently removed and cannot be recovered.

Organization Data Deletion

When an organization is deleted from Catalio:

  • All requirements and associated AI data are deleted
  • All embeddings are removed
  • All chat conversations are deleted
  • All contextual learning data is purged
  • All provider configurations and encrypted API keys are deleted

Right to Deletion (GDPR)

For GDPR compliance, Catalio supports data subject requests:

  • Individual user data can be anonymized or deleted
  • Organization data can be exported or deleted
  • Audit trails maintain minimal necessary information

Contact support@catalio.ai for data deletion requests.

Compliance Considerations

GDPR (European Union)

For EU organizations or those handling EU citizen data:

Recommended: Use Azure OpenAI with EU data residency

  • Deploy Azure OpenAI resource in West Europe or Sweden Central
  • Data never leaves EU boundaries
  • Full GDPR compliance controls

Considerations:

  • Standard OpenAI API processes data in the US
  • Document your legal basis for AI processing
  • Include AI processing in your privacy policy
  • Enable data subject access and deletion

HIPAA (Healthcare)

For organizations handling protected health information (PHI):

Recommended: Use Azure OpenAI with BAA

  • Microsoft offers Business Associate Agreements for Azure OpenAI
  • Configure with HIPAA-compliant settings
  • Enable audit logging

Caution

Do NOT use standard OpenAI API for PHI. Use Azure OpenAI with a Business Associate Agreement, or self-hosted Ollama for HIPAA compliance.

Considerations:

  • Exclude PHI from AI-accessible requirements
  • Document AI processing in your HIPAA policies

SOC 2

For organizations requiring SOC 2 compliance:

Recommended: Use enterprise provider tiers

  • Azure OpenAI includes SOC 2 compliance
  • OpenAI Enterprise provides compliance documentation
  • Anthropic offers enterprise agreements

Considerations:

  • Document AI provider security controls
  • Include in your third-party risk assessment
  • Monitor for security advisories

Financial Services

For banks, insurance, and financial services:

Recommended: Self-hosted or Azure OpenAI

  • Ollama for complete on-premise control
  • Azure OpenAI with financial services certifications
  • Document AI processing in regulatory filings

Best Practices

Minimize Sensitive Data in Requirements

Write requirements to minimize PII and sensitive information:

Instead of:

“John Smith (john.smith@company.com) needs to export his social security number for tax filing”

Write:

“Users need to export personal tax identifiers for compliance reporting”
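A lightweight lint-style check can catch the most obvious patterns before a requirement is saved. The patterns below are illustrative only; real PII detection needs a dedicated tool and human review.

```python
import re

# Illustrative patterns only -- not an exhaustive PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def pii_warnings(text: str) -> list[str]:
    """Return the names of PII patterns found in requirement text."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(text)]
```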

Use AI Accessible Toggle Appropriately

Warning

Always disable AI access for requirements containing sensitive data. The toggle is your primary control for excluding content from AI processing.

Mark requirements as non-AI-accessible when they contain:

  • Personally identifiable information (PII)
  • Financial account numbers
  • Healthcare information
  • Trade secrets or proprietary formulas
  • Classified or restricted information

Review Provider Policies

Before configuring a provider:

  1. Review their data handling policies
  2. Understand their data retention periods
  3. Verify compliance certifications
  4. Consider data residency requirements

Audit AI Usage

Regularly review AI feature usage:

  1. Check which providers are configured
  2. Review feature assignments
  3. Audit who has configuration access
  4. Monitor for unusual usage patterns

Document Your Policies

Create internal documentation covering:

  • Which AI providers are approved
  • What data can be processed by AI
  • Who can configure AI features
  • How to handle AI-related incidents

Summary

Aspect                  Catalio’s Approach
Global model training   Never - your data is not used for training
API key storage         AES-256-GCM encryption
API key logging         Never logged or exposed
Organization isolation  Complete - no cross-org data access
Contextual learning     Optional, isolated per organization
Data deletion           Full deletion supported on request
Provider choice         BYOLLM - you choose your provider
Sensitive data control  AI Accessible toggle per requirement


Questions?

For privacy-related questions about AI features, contact support@catalio.ai.


Last Updated: December 2025