Evaluating an AI Vendor's Data Practices and Privacy

Most AI vendor evaluations cover demos, pricing, and integration but skip the data and privacy questions that determine whether the deployment passes legal review. The result is contracts that stall at legal review, sometimes blocking deployment entirely. Morgan Lewis's framework for evaluating AI vendors treats the data and privacy questions as foundational, not optional. Stanford Law School's guide to AI vendor contracts puts it plainly: "AI vendor contracts are more than just legal agreements; they are actively shaping AI governance, liability structures, and compliance standards" — the negotiation is structural, not paperwork.
This article covers the seven questions that surface real data risk and the contract terms that protect against it.
Question 1: Where did the training data come from?
The model was trained on something. Knowing what it was trained on determines much of your downstream IP and compliance exposure.
What to ask: was the model trained on publicly available data, licensed data, or customer data? If publicly available, do they have a position on copyrighted material? If licensed, can they share the licensing structure? If customer data, whose customers and under what terms?
Strong answer: clear documentation of training data sources, licensing where applicable, no surprises.
Weak answer: vague gestures at "publicly available data" without specifics, or refusal to discuss training data as "proprietary." Both signal risk that's hard to quantify.
Why this matters: training data lawsuits are now a recurring risk. Vendors with opaque training data sources may face downstream liability that affects their ability to keep operating, which becomes your problem if the vendor's product becomes core to your operations.
Question 2: Will our data be used to train future models?
The default in many AI vendor contracts is yes, sometimes buried in dense terms. The default should be no.
What to ask: is customer data used to train models, fine-tune models, or improve the service in any way? Is this opt-in or opt-out? Where in the contract is this controlled?
Contract term to require: "Vendor will not use Customer Data, Customer Inputs, or Customer Outputs to train, fine-tune, or otherwise improve any AI model, including foundation models, without explicit written opt-in by Customer."
This is non-negotiable for any sensitive data deployment. Vendors that won't agree to this term are not appropriate for sensitive use cases.
Question 3: Where is the data stored and processed?
Data residency is increasingly contractual. EU personal data must often be stored and processed within the EEA. Healthcare data has US-specific residency requirements. Financial data has sector-specific rules.
What to ask: in which countries is the data stored? In which countries is it processed? Are sub-processors used (other vendors that touch the data), and where are they?
Contract term to require: explicit data residency commitment. For EU customers: "All Customer Data shall be stored and processed within the European Economic Area." For US healthcare: HIPAA-compliant data centers with BAA in place.
The sub-processor question is important. Many AI vendors use third-party model providers (Anthropic, OpenAI). Your data flows to those providers. Ensure the chain is documented and the providers are acceptable to your privacy team.
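One way to make the sub-processor chain reviewable is to record it as data and check each hop against the regions your privacy team has approved. A minimal sketch — the vendor names, regions, and field names here are hypothetical, and real due diligence works from the vendor's documented sub-processor list:

```python
# Regions your privacy team has approved (assumption for this example).
APPROVED_REGIONS = {"EEA", "US"}

# Hypothetical sub-processor chain for an AI vendor.
chain = [
    {"name": "Vendor Inc.", "region": "EEA", "role": "primary processor"},
    {"name": "Model Provider", "region": "US", "role": "foundation model API"},
    {"name": "Log Archiver", "region": "APAC", "role": "log storage"},
]

def unapproved_hops(chain, approved=APPROVED_REGIONS):
    """Return the sub-processors operating outside approved regions."""
    return [hop["name"] for hop in chain if hop["region"] not in approved]

print(unapproved_hops(chain))  # ['Log Archiver']
```

Any non-empty result is a question for the vendor before signature, not after.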
Question 4: How long is data retained?
Retention varies dramatically. Some vendors hold data indefinitely. Some have 30-day default retention with extension options.
What to ask: what's the default retention period? What gets retained — inputs, outputs, intermediate model artifacts? Can retention be configured?
Contract term to require: explicit retention limits with default short. "Vendor shall retain Customer Inputs and Outputs for no longer than [X] days unless otherwise specified by Customer." For sensitive use cases: zero-retention modes where the vendor doesn't store inputs or outputs at all.
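Once the vendor exposes its retention settings, the contractual limits above can be checked mechanically. A sketch under assumed field names (`input_retention_days`, `output_retention_days`) — real vendor configuration APIs will differ:

```python
from dataclasses import dataclass

@dataclass
class RetentionPolicy:
    # Contractual limits in days; 0 means zero-retention is required.
    max_input_days: int
    max_output_days: int

def check_retention(vendor_config: dict, policy: RetentionPolicy) -> list[str]:
    """Return violations of the contractual retention limits.

    `vendor_config` is a hypothetical dict of the vendor's reported
    settings; a missing setting is treated as indefinite retention.
    """
    violations = []
    if vendor_config.get("input_retention_days", float("inf")) > policy.max_input_days:
        violations.append("inputs retained longer than contract allows")
    if vendor_config.get("output_retention_days", float("inf")) > policy.max_output_days:
        violations.append("outputs retained longer than contract allows")
    return violations

# Example: a sensitive deployment requiring zero retention.
policy = RetentionPolicy(max_input_days=0, max_output_days=0)
print(check_retention({"input_retention_days": 30, "output_retention_days": 0}, policy))
```

Treating a missing setting as indefinite retention mirrors the negotiating posture: silence in the contract should be read against you, not for you.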
Question 5: How does deletion actually work?
GDPR and similar laws require the ability to delete personal data on request. AI vendors often have multiple data stores: production database, training data, model weights, logs, backups. Real deletion requires removing data from all of them.
What to ask: when a deletion request comes in, what gets deleted? What gets retained? How long does the deletion take? Are there model artifacts trained on the data that don't get deleted with the records?
Contract term to require: explicit deletion SLA with full chain of stores covered. "Upon Customer's request to delete Personal Data, Vendor shall delete the data from all production systems, backups, and any model artifacts derived from the data within [30] days, and shall provide written confirmation of completion."
This is harder than it sounds. Some vendors can't actually do it without retraining models. The vendors that can do it have designed for it from the start.
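The deletion SLA in the term above is simple enough to track in code: compute the deadline from the request date and flag confirmations that arrive late. A sketch assuming the 30-day figure from the sample term:

```python
from datetime import datetime, timedelta

DELETION_SLA_DAYS = 30  # the bracketed figure from the sample contract term

def deletion_deadline(request_date: datetime,
                      sla_days: int = DELETION_SLA_DAYS) -> datetime:
    """Date by which the vendor must confirm deletion, per the SLA."""
    return request_date + timedelta(days=sla_days)

def sla_met(request_date: datetime, confirmation_date: datetime,
            sla_days: int = DELETION_SLA_DAYS) -> bool:
    """True if written confirmation arrived within the SLA window."""
    return confirmation_date <= deletion_deadline(request_date, sla_days)

req = datetime(2025, 3, 1)
print(deletion_deadline(req).date())       # 2025-03-31
print(sla_met(req, datetime(2025, 4, 2)))  # False: confirmation arrived 32 days later
```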
Question 6: What's the breach response process?
When the vendor has a security incident affecting your data, what happens?
What to ask: notification timeline, what gets disclosed, who pays for incident response, what credit or remedy is available, who gets sued first.
Contract term to require: notification within 24-48 hours of discovery, full incident report within 30 days, vendor responsible for incident response costs through their layer of the stack, indemnification for vendor-side breaches.
The notification timeline is often the most contested term. Vendors prefer flexible "without undue delay" language. Customers should require specific hour-counts. Push for 48 hours maximum.
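The value of a specific hour count over "without undue delay" is that it is testable. A sketch of the check, using the 48-hour maximum suggested above:

```python
from datetime import datetime

NOTIFICATION_WINDOW_HOURS = 48  # the maximum pushed for above

def notified_in_time(discovered: datetime, notified: datetime,
                     window_hours: int = NOTIFICATION_WINDOW_HOURS) -> bool:
    """True if the vendor notified the customer within the contractual window."""
    elapsed_hours = (notified - discovered).total_seconds() / 3600
    return elapsed_hours <= window_hours

discovered = datetime(2025, 6, 1, 9, 0)
print(notified_in_time(discovered, datetime(2025, 6, 3, 8, 0)))   # True: 47 hours
print(notified_in_time(discovered, datetime(2025, 6, 3, 10, 0)))  # False: 49 hours
```

With vague "undue delay" language there is nothing equivalent to assert, which is precisely why vendors prefer it.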
Question 7: What happens to data when the contract ends?
Contract end is when data risk is highest. The vendor still has the data, the relationship is over, oversight is reduced.
What to ask: post-contract data deletion timeline, post-contract retention of any data, post-contract use rights (do they retain rights to use anonymized or aggregated data?).
Contract term to require: complete deletion within 90 days of contract termination, written certification of deletion, no post-contract use rights for any customer data including aggregates derived from it.
The aggregate-data question matters especially. Many SaaS contracts allow the vendor to retain "aggregated and anonymized" data perpetually. For AI vendors, this can include training data derived from your usage. Strike the language unless you understand and accept the implications.
Sample contract language
A useful baseline data-and-privacy clause for AI vendor contracts:
"Customer Data shall not be used by Vendor to train, fine-tune, or otherwise improve any AI model without Customer's explicit written opt-in, and any such opt-in shall be revocable on 30 days notice. Customer Data shall be stored and processed exclusively within [region]. Default retention for Customer Inputs and Outputs shall not exceed 30 days unless extended by Customer. Upon Customer request, Vendor shall delete Customer Data from all production systems, backups, and derived model artifacts within 30 days and provide written certification. In the event of a security incident affecting Customer Data, Vendor shall notify Customer within 48 hours of discovery and bear all costs of incident response through Vendor's systems. Upon contract termination, Vendor shall delete all Customer Data within 90 days and shall not retain any post-contract use rights, including for aggregated or derived data."
Most vendors will negotiate against this language. The negotiation is the point — it surfaces what they actually do versus what the marketing claims.
What to do if the vendor won't negotiate
For sensitive deployments, walk. The contract terms are not bureaucratic — they encode operational practice. Vendors who won't agree to standard data terms either don't operate that way or don't have the maturity to commit to it.
For low-stakes deployments, accept the standard terms but limit the data exposure. Don't send sensitive data to a vendor whose data practices you can't control.
The honest takeaway
Seven questions: training data sources, customer data usage, residency, retention, deletion, breach response, contract termination. Each has a standard contract term that protects you.
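The seven questions can be carried through an evaluation as a simple scorecard that reports what is still open. A sketch, not a standard tool — the phrasing of each item paraphrases the contract terms discussed above:

```python
# The seven diligence questions, phrased as pass/fail checklist items.
QUESTIONS = [
    "training data sources documented",
    "customer data excluded from training by default",
    "data residency committed in contract",
    "retention limits explicit and short",
    "deletion SLA covers backups and model artifacts",
    "breach notification within 48 hours",
    "post-contract deletion with no retained use rights",
]

def open_items(answers: dict[str, bool]) -> list[str]:
    """Questions still unanswered, or answered unsatisfactorily."""
    return [q for q in QUESTIONS if not answers.get(q, False)]

answers = {q: True for q in QUESTIONS}
answers["deletion SLA covers backups and model artifacts"] = False
print(open_items(answers))  # ['deletion SLA covers backups and model artifacts']
```

An empty list is the bar for a sensitive deployment; anything left open goes back to the negotiation.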
Most AI vendor evaluations skip these questions. The skipped evaluations produce contracts that fail legal review or, worse, get signed and create compliance exposure.
Ask the questions. Negotiate the terms. The friction is real and worth it. The data practices baked into the contract are the data practices you'll be living with for the duration.
Frequently Asked Questions
Is it normal for AI vendors to use customer data for model training?
It's common but should never be the default. Strong vendors offer explicit opt-in (or default opt-out) for customer data being used to train models. Weak vendors bury opt-out in dense terms or use customer data by default. Always negotiate this term explicitly — silence in the contract usually means opt-in.
What happens to customer data when we leave an AI vendor?
Whatever the contract says, which is usually less than buyers expect. Standard SaaS terms allow vendors to retain backups for 30-90 days. AI vendors may also have model artifacts trained on your data that don't get deleted with database records. Negotiate explicit deletion requirements including model rollback if your data was used in training.
Sources
- Stanford Law School — Navigating AI Vendor Contracts and the Future of Law
- Dentons — Key Considerations for Evaluating Vendor Contracts Involving AI
- Bloomberg Law — AI Considerations for Data Privacy Contracts
- American Bar Association — Avoiding AI Agreement Dystopia: Managing Key Risks in AI Licensing Deals
- Morgan Lewis — Key Considerations When Evaluating an AI Vendor
- NIST — AI Risk Management Framework
- European Commission — EU AI Act overview
- European Commission — GDPR official text
- Stanford HAI — AI Index Report 2026

Founder, Tech10
Doreid Haddad is the founder of Tech10. He has spent over a decade designing AI systems, marketing automation, and digital transformation strategies for global enterprise companies. His work focuses on building systems that actually work in production, not just in demos. Based in Rome.


