
AI Features and Data Privacy

How Catalio protects your data when using AI features. Learn about encryption, data handling, contextual learning, and compliance considerations.


Catalio is designed with privacy at its core. This guide explains how your data is protected when using AI-powered features, including requirement analysis, semantic search, and the AI chat assistant.

Core Privacy Principles

Catalio’s AI features are built on four fundamental privacy principles:

1. Your Data Stays Yours

Important

Catalio never uses your data to train global AI models. Your requirements, use cases, personas, and other content remain exclusively within your organization’s context.

2. Organization Isolation

All AI processing is isolated per organization. Data from one organization is never accessible to another, and AI features cannot cross organizational boundaries.

3. Explicit Control

You control which data is accessible to AI. Every requirement has an “AI Accessible” toggle, allowing you to exclude sensitive content from AI processing.

4. Transparency

We’re transparent about how data flows, what providers receive, and how your content is processed. This document provides complete visibility into AI data handling.

Supported AI Providers

Catalio supports multiple AI providers through our Bring Your Own LLM (BYOLLM) capability:

Provider        Chat  Embeddings  Vision  Data Handling
OpenAI          Yes   Yes         Yes     API data not used for training
Anthropic       Yes   No          Yes     API/commercial: no training; consumer: opt-out
Azure OpenAI    Yes   Yes         Yes     Data stays in your Azure tenant
Google Gemini   Yes   Yes         Yes     Enterprise data handling available
Groq            Yes   No          No      Fast inference, no training
xAI             Yes   No          No      OpenAI-compatible API
Ollama          Yes   Yes         Yes     Self-hosted, data never leaves
OpenRouter      Yes   Yes         Yes     Passes to underlying provider
GitHub Copilot  Yes   No          Yes     GitHub Enterprise data policies

Provider Data Policies

Each provider has its own data handling policies:

OpenAI: API requests through the enterprise API are not used for model training. Data may be retained for 30 days for abuse monitoring.

Anthropic: As of September 28, 2025, Anthropic’s data policy varies by plan type. Commercial offerings (Claude for Work, Claude Gov, Claude for Education, and API usage) retain full protections—data is never used for model training. Consumer plans (Free, Pro, Max) operate under an opt-out model where data may be used for training unless users disable this in settings. Since Catalio integrates via the API, your data is protected under commercial terms and is not used for training.

Azure OpenAI: Data remains in your Azure tenant with full enterprise controls. You control data residency, retention, and encryption.

Ollama: Data never leaves your infrastructure. This is the highest privacy option for organizations with strict data handling requirements.

Tip

For maximum data privacy, consider using Ollama for self-hosted AI processing where data never leaves your infrastructure.

API Key Security

When you configure AI providers in Catalio, your API keys are protected with enterprise-grade security:

AES-256-GCM Encryption

All API keys are encrypted at rest using AES-256-GCM, the same encryption standard used by financial institutions and government agencies:

  • 256-bit encryption keys
  • Galois/Counter Mode for authenticated encryption
  • Unique initialization vectors for each encryption
  • Separate key management from database storage

Encryption Implementation

API Key Entry
|
v
AES-256-GCM Encryption
|
v
Encrypted Storage in Database
|
v
(When needed for API call)
|
v
Decryption in Memory
|
v
API Request to Provider
|
v
Immediate Memory Cleanup

The encryption key is stored separately from the encrypted data, typically in environment variables or a secrets management system, ensuring database access alone cannot reveal API keys.
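The flow above can be sketched with the `cryptography` package's AESGCM primitive. This is a minimal illustration, not Catalio's actual implementation; the environment variable name is hypothetical.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# The 256-bit key lives outside the database (environment variable or
# secrets manager), so database access alone cannot reveal API keys.
_key_hex = os.environ.get("CATALIO_ENCRYPTION_KEY")  # hypothetical variable name
KEY = bytes.fromhex(_key_hex) if _key_hex else AESGCM.generate_key(bit_length=256)

def encrypt_api_key(plaintext: str) -> bytes:
    """Encrypt with AES-256-GCM, using a fresh 12-byte nonce (IV) per call."""
    nonce = os.urandom(12)
    ciphertext = AESGCM(KEY).encrypt(nonce, plaintext.encode(), None)
    return nonce + ciphertext  # nonce is stored alongside the ciphertext

def decrypt_api_key(blob: bytes) -> str:
    """Decrypt in memory just before the provider API call."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(KEY).decrypt(nonce, ciphertext, None).decode()
```

Because GCM is an authenticated mode, any tampering with the stored blob causes decryption to fail rather than return corrupted plaintext.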

Never Logged

API keys and other sensitive credentials are automatically excluded from:

  • Application logs
  • Error reports
  • Telemetry data
  • Audit trails
  • Debug output

Catalio’s logging system uses a sanitization layer that automatically redacts fields like: password, api_key, secret, token, authorization, bearer, access_token, refresh_token, and similar sensitive identifiers.
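A sanitization layer of this kind can be sketched as a regex pass over each log line before it is written. The field names mirror the list above; the matching logic itself is an assumption, not Catalio's actual code.

```python
import re

# Field names whose values must never reach the logs.
SENSITIVE_FIELDS = (
    "password", "api_key", "secret", "token", "authorization",
    "bearer", "access_token", "refresh_token",
)

# Matches `field=value`, `field: value`, and quoted JSON-style pairs.
_PATTERN = re.compile(
    r'("?(?:' + "|".join(SENSITIVE_FIELDS) + r')"?\s*[:=]\s*)("[^"]*"|\S+)',
    re.IGNORECASE,
)

def sanitize(line: str) -> str:
    """Redact values of sensitive fields before a line reaches the log."""
    return _PATTERN.sub(r"\1[REDACTED]", line)
```

Non-sensitive fields on the same line pass through untouched, so logs stay useful for debugging.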

Data Flow for AI Features

Understanding how your data flows helps you make informed decisions about AI feature usage.

Requirement Analysis

When you analyze a requirement for quality, sentiment, or categories:

Requirement Content
|
v
Catalio Application
|
| (Selected fields only)
v
Your AI Provider
|
v
Analysis Results
|
v
Stored with Requirement

What’s sent: Title, description, user story (want/benefit), acceptance criteria

What’s NOT sent: User identifiers, organization metadata, audit trail information, internal IDs
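The "selected fields only" step can be sketched as an allowlist applied before anything leaves the application. Field names here are illustrative, not Catalio's actual schema.

```python
def build_analysis_payload(requirement: dict) -> dict:
    """Select only content fields for the AI provider; identifiers,
    audit data, and organization metadata never leave the application."""
    allowed = ("title", "description", "user_want", "user_benefit",
               "acceptance_criteria")
    return {field: requirement[field]
            for field in allowed if field in requirement}

requirement = {
    "id": "req_8f2a",          # internal ID -- never sent
    "org_id": "org_42",        # organization metadata -- never sent
    "created_by": "user_17",   # user identifier -- never sent
    "title": "Export tax report",
    "description": "Users export yearly tax reports as PDF.",
    "user_want": "to export a tax report",
    "user_benefit": "so filing is faster",
}
payload = build_analysis_payload(requirement)
```

An allowlist is deliberately chosen over a blocklist: a newly added sensitive field is excluded by default instead of leaking until someone remembers to block it.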

Semantic Search (Embeddings)

For semantic search, Catalio generates vector embeddings:

Requirement Text
|
v
Embedding Model API
|
v
Vector (1536 numbers)
|
v
Stored in Catalio Database
|
v
Used for Similarity Search

What’s sent: Combined text from title, user want, and user benefit fields

What’s stored: Only the numeric vector representation, not the original text sent to the API

Important: Embeddings are mathematical representations of meaning and are not designed to be converted back into the original text. They enable semantic search without your content being stored at the provider.
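Once vectors are stored, similarity search compares them numerically, typically by cosine similarity. A standard-library sketch (the exact metric Catalio uses is an assumption; vectors shortened from 1536 dimensions for readability):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two embedding vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# In production each vector has 1536 dimensions; 3 shown for readability.
stored = {"REQ-1": [0.9, 0.1, 0.0], "REQ-2": [0.0, 0.2, 0.9]}
query = [0.8, 0.2, 0.1]
best = max(stored, key=lambda rid: cosine_similarity(query, stored[rid]))
```

Only these numeric comparisons run at search time; the provider API is called once per requirement to produce the vector, not on every search.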

AI Chat Assistant

When using the AI chat assistant:

User Message
|
v
Catalio Chat System
|
| + Context from tools
v
Your AI Provider
|
v
AI Response
|
v
Displayed to User
|
v
Stored in Conversation

What’s sent: Your message, conversation history, and context from tool calls (requirement summaries, search results)

What’s stored in Catalio: Complete conversation history for your reference and continuity

Contextual Learning

Catalio offers optional contextual learning that improves AI responses based on your organization’s content and patterns.

How Contextual Learning Works

When enabled, Catalio:

  1. Analyzes patterns in your requirements and usage
  2. Creates organization-specific context
  3. Provides this context to AI for better responses

Isolation Guarantees

Contextual learning is completely isolated per organization:

  • Learning from Org A never influences Org B
  • Context is stored separately for each organization
  • Deletion removes all associated learning data

No Global Training

Note

Contextual learning does NOT involve training AI models. Your data is never used to:
  • Train OpenAI, Anthropic, or other provider models
  • Improve global AI capabilities
  • Share patterns across organizations
  • Create generalizable AI improvements

Contextual learning provides context at inference time, not training time.

Enabling/Disabling

Organization administrators control contextual learning:

  1. Navigate to Settings > AI Features
  2. Toggle Contextual Learning
  3. Choose scope: Requirements only, Full content, or Off

Data Control and Deletion

You maintain full control over your AI-related data.

AI Accessible Toggle

Every requirement has an “AI Accessible” toggle:

  • Enabled (default): Requirement is included in AI analysis and semantic search
  • Disabled: Requirement is excluded from all AI processing

Use this for:

  • Sensitive or confidential requirements
  • PII-containing content
  • Internal notes not suitable for AI processing
  • Compliance-restricted information
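In effect, the toggle acts as a filter applied before any AI processing. A minimal sketch, assuming a boolean `ai_accessible` field that defaults to enabled as described above:

```python
def ai_accessible_requirements(requirements: list[dict]) -> list[dict]:
    """Return only requirements whose AI Accessible toggle is on.
    Everything else is skipped by analysis, embeddings, and chat tools."""
    return [r for r in requirements if r.get("ai_accessible", True)]

requirements = [
    {"id": "REQ-1", "ai_accessible": True},
    {"id": "REQ-2", "ai_accessible": False},  # excluded from all AI processing
    {"id": "REQ-3"},                          # toggle defaults to enabled
]
visible = ai_accessible_requirements(requirements)
```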

Conversation Deletion

Users can delete AI chat conversations:

  1. Open the conversation
  2. Click the options menu
  3. Select Delete Conversation
  4. Confirm deletion

Deleted conversations are permanently removed and cannot be recovered.

Organization Data Deletion

When an organization is deleted from Catalio:

  • All requirements and associated AI data are deleted
  • All embeddings are removed
  • All chat conversations are deleted
  • All contextual learning data is purged
  • All provider configurations and encrypted API keys are deleted

Right to Deletion (GDPR)

For GDPR compliance, Catalio supports data subject requests:

  • Individual user data can be anonymized or deleted
  • Organization data can be exported or deleted
  • Audit trails maintain minimal necessary information

Contact support@catalio.ai for data deletion requests.

Compliance Considerations

GDPR (European Union)

For EU organizations or those handling EU citizen data:

Recommended: Use Azure OpenAI with EU data residency

  • Deploy Azure OpenAI resource in West Europe or Sweden Central
  • Data never leaves EU boundaries
  • Full GDPR compliance controls

Considerations:

  • Standard OpenAI API processes data in the US
  • Document your legal basis for AI processing
  • Include AI processing in your privacy policy
  • Enable data subject access and deletion

HIPAA (Healthcare)

For organizations handling protected health information (PHI):

Recommended: Use Azure OpenAI with BAA

  • Microsoft offers Business Associate Agreements for Azure OpenAI
  • Configure with HIPAA-compliant settings
  • Enable audit logging

Caution

Do NOT use standard OpenAI API for PHI. Use Azure OpenAI with a Business Associate Agreement, or self-hosted Ollama for HIPAA compliance.

Considerations:

  • Exclude PHI from AI-accessible requirements
  • Document AI processing in your HIPAA policies

SOC 2

For organizations requiring SOC 2 compliance:

Recommended: Use enterprise provider tiers

  • Azure OpenAI includes SOC 2 compliance
  • OpenAI Enterprise provides compliance documentation
  • Anthropic offers enterprise agreements

Considerations:

  • Document AI provider security controls
  • Include in your third-party risk assessment
  • Monitor for security advisories

Financial Services

For banks, insurance, and financial services:

Recommended: Self-hosted or Azure OpenAI

  • Ollama for complete on-premise control
  • Azure OpenAI with financial services certifications
  • Document AI processing in regulatory filings

Best Practices

Minimize Sensitive Data in Requirements

Write requirements to minimize PII and sensitive information:

Instead of:

“John Smith (john.smith@company.com) needs to export his social security number for tax filing”

Write:

“Users need to export personal tax identifiers for compliance reporting”
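A lightweight lint-style check can catch the most obvious patterns before a requirement is saved. The patterns below are illustrative only; real PII detection needs a dedicated tool and human review.

```python
import re

# Illustrative patterns only -- not an exhaustive PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def pii_warnings(text: str) -> list[str]:
    """Return the names of PII patterns found in requirement text."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(text)]
```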

Use AI Accessible Toggle Appropriately

Warning

Always disable AI access for requirements containing sensitive data. The toggle is your primary control for excluding content from AI processing.

Mark requirements as non-AI-accessible when they contain:

  • Personally identifiable information (PII)
  • Financial account numbers
  • Healthcare information
  • Trade secrets or proprietary formulas
  • Classified or restricted information

Review Provider Policies

Before configuring a provider:

  1. Review their data handling policies
  2. Understand their data retention periods
  3. Verify compliance certifications
  4. Consider data residency requirements

Audit AI Usage

Regularly review AI feature usage:

  1. Check which providers are configured
  2. Review feature assignments
  3. Audit who has configuration access
  4. Monitor for unusual usage patterns

Document Your Policies

Create internal documentation covering:

  • Which AI providers are approved
  • What data can be processed by AI
  • Who can configure AI features
  • How to handle AI-related incidents

Summary

Aspect                  Catalio’s Approach
Global model training   Never - your data is not used for training
API key storage         AES-256-GCM encryption
API key logging         Never logged or exposed
Organization isolation  Complete - no cross-org data access
Contextual learning     Optional, isolated per organization
Data deletion           Full deletion supported on request
Provider choice         BYOLLM - you choose your provider
Sensitive data control  AI Accessible toggle per requirement


Questions?

For privacy-related questions about AI features, contact support@catalio.ai.


Last Updated: December 2025