
Portfolio Red Flags: How to Read an AI Vendor's Past Work

AI Consulting · May 16, 2026 · 6 min read · Doreid Haddad

AI vendor portfolios are mostly marketing artifacts. Pretty screenshots, success language, customer logos, light technical detail. The portfolio is engineered to impress, not inform. Per Reed Smith LLP's red-flag guide for AI deals, the value of an AI engagement "often turns on data provenance, intellectual property clarity, regulatory compliance" — none of which a polished portfolio surfaces by default. Buyers who read portfolios charitably get the engagement they were warned about. Buyers who read past the marketing find what was actually built.

This article covers the eight signals that separate real production work from polished demos.

Signal 1: Architectural detail

Strong portfolios show how the system was built: data flow diagrams, model selection rationale, integration points, eval methodology, infrastructure choices.

Weak portfolios show what the system does at a marketing level: "AI-powered customer service" with a screenshot of a chat interface.

What to ask if the portfolio is shallow: "Can you walk me through the architecture of [specific portfolio item] for 15 minutes?" Vendors with real depth can do this comfortably. Vendors with marketing-only portfolios will deflect or produce vague answers.

Signal 2: Specific named challenges

Real engagements have specific challenges that get solved. A real portfolio item describes them: "the customer's data was 60% structured and 40% unstructured, requiring a hybrid retrieval approach" or "initial deployments hallucinated product specifications, fixed by adding strict citation requirements."

Weak portfolios describe generic "challenges" like "scaling AI to enterprise" or "ensuring data quality" without specifics. These descriptions could apply to any engagement and don't reveal what was actually solved.

Signal 3: Production usage data

Strong portfolios cite ongoing production metrics: monthly query volume, user adoption rates, accuracy benchmarks, cost-per-query trends. The data signals the system is actually running.

Weak portfolios describe project outcomes ("3-month engagement, deployed to 10 users") without ongoing metrics. The pattern suggests the system was built and then either stagnated or wasn't widely adopted.

Signal 4: Named customers willing to be referenced

The most reliable signal. Real portfolio items have customers willing to be named and to take reference calls. The friction in producing them tells the truth about the engagement.

Strong vendors: three named contacts within 1 week of request, customers responsive, willing to speak openly.

Weak vendors: anonymized "Fortune 500 healthcare company" descriptions, references that take 2-3 weeks to surface, customers willing to do a polite 15-minute call but not a detailed reference call.

Signal 5: Continuity from project to ongoing relationship

Real portfolio items often have ongoing engagement after initial build: feature additions, model retraining, new use cases. The vendor stayed engaged because the customer kept finding value.

Weak portfolio items are one-shot projects: the engagement ended at delivery. Sometimes that's appropriate (the customer took the system in-house), but a portfolio made up entirely of one-shots is a signal.

Signal 6: Failure transparency

Strong vendors discuss what didn't work in past portfolio items. "We tried fine-tuning first; it didn't beat the baseline RAG, so we reverted. Took us 4 weeks to figure out fine-tuning wasn't the answer." This kind of detail is uncomfortable to share and indicates real experience.

Weak vendors describe portfolio items as uniformly successful. This is implausible. Real engagements have things that didn't work; pretending otherwise is marketing fiction.

Signal 7: Industry-specific depth

As covered in the industry-specific experience analysis, real industry depth shows up in portfolio details. Healthcare AI portfolio items reference HIPAA-specific design decisions, ICD code handling, prior authorization workflows. Generic AI portfolio items in healthcare show none of this.

The shorthand: if the portfolio item could be moved to a different industry by changing a few words, the vendor doesn't have industry depth.

Signal 8: Technical artifacts available

Strong vendors share technical artifacts during scoping (under NDA): redacted architecture diagrams, eval set examples, model documentation, integration code samples.

Weak vendors share marketing decks. Sometimes a 1-page case study. Rarely actual technical artifacts.

The willingness to share technical artifacts under NDA is a strong predictor of capability. If it's hard to produce, the technical work probably wasn't done at the depth implied.

What to do with a thin portfolio

When the vendor's portfolio doesn't pass these tests:

Treat the engagement as a first deployment for them. Even if they have general AI experience, your engagement is a first of its kind. Price and timeline accordingly: first engagements take longer and cost more.

Require named team continuity. First-time deployments fail when the team rotates mid-engagement. Lock the team in the contract.

Build governance into the contract. First-deployment vendors will skip governance unless it's required. Specify it as a deliverable.

Plan for higher monitoring overhead. Your team will need to verify what the vendor produces because they don't have the pattern recognition of repeat-engagement vendors.

For some engagements, a vendor with a thin portfolio is still the right answer (specialized expertise, founder pedigree, niche capability). The mitigations are how you de-risk the choice.

When portfolio strength is misleading

Portfolio strength can be a misdirection. Specific cases:

Portfolio is strong but team is weak. The portfolio reflects the firm's history; your engagement is staffed with juniors who weren't on those engagements. Verify team composition independently.

Portfolio is strong in one industry but you're in a different one. Industry experience doesn't fully transfer. A healthcare-strong portfolio may not deliver well for a financial services engagement.

Portfolio is strong for a different scale. A vendor with a strong enterprise portfolio may not deliver well at SMB scale, and vice versa. Match the portfolio scale to your engagement scale.

Portfolio is from acquired teams or contractors. Some firms grow through acquisition or heavy use of contractors. The portfolio includes work the current full-time team didn't do. Verify which work is the current team's.

The portfolio is necessary but not sufficient. Combine it with team verification and reference calls.

A useful portfolio review process

Here's how to run it in 90 minutes:

Minutes 0-30: ask the vendor to walk through their three strongest portfolio items at architecture level. Listen for specifics.

Minutes 30-60: pick one portfolio item and probe. "Tell me about the data preparation. Tell me about the eval methodology. Tell me about what broke in production. Tell me about integrations."

Minutes 60-75: ask for the portfolio item most similar to your engagement. Probe specifically: industry, scale, integration points, regulatory environment.

Minutes 75-90: ask about technical artifacts. Eval sets, model cards, architecture diagrams, integration code. What can be shared under NDA?

This is more rigor than most portfolio reviews get. The 90 minutes filters out most vendors who can't back up the marketing.

The honest takeaway

Eight portfolio signals: architectural detail, named challenges, production usage data, customer references, continuity beyond initial project, failure transparency, industry depth, technical artifacts available.

Most AI vendor portfolios fail several of these. The vendors that pass are usually capable; the ones that don't pass usually aren't, regardless of marketing polish.

Read past the marketing. Probe specifically. Ask for technical artifacts under NDA. The portfolio is the firm's argument that they can do your work. Treat it as an argument, not as proof.

Frequently Asked Questions

Should I expect AI vendors to share named customer references?

Yes, for any engagement above $50K. Strong vendors will share three references with named contacts willing to talk. The friction in producing them is itself a signal — vendors that delay reference sharing or filter heavily are usually hiding either thin track record or unhappy customers.

What do I do if the vendor's portfolio looks great but references won't take my call?

Reference unresponsiveness is a meaningful red flag. The most common explanation is the customer is unhappy and the vendor knows it. Ask the vendor directly why the references are slow to respond. If the answer doesn't satisfy you, walk.

Written by Doreid Haddad

Founder, Tech10

Doreid Haddad is the founder of Tech10. He has spent over a decade designing AI systems, marketing automation, and digital transformation strategies for global enterprise companies. His work focuses on building systems that actually work in production, not just in demos. Based in Rome.

