Tech10
Back to blog

The 8 Questions to Ask Before Hiring an AI Consultant

Eight Questions To Ask Before Hiring Ai Consultant
AI ConsultingApr 25, 20266 min readDoreid Haddad

Most scoping calls with AI consultants cover the same surface ground. What's the use case. What's the timeline. What's the budget. The conversation feels productive but produces little signal about whether this firm will actually deliver. The questions that surface real signal are the ones consultants prefer to glide past.

This article is the eight questions worth asking, and what to listen for in the answers.

Question 1: "Show me an eval set from a past engagement."

The single most predictive question. Eval discipline separates AI consultants who deliver from ones that pitch.

Strong answer: a redacted eval set with 100-500 examples, organized by category (typical cases, edge cases, adversarial inputs), with documented expected outputs and pass/fail criteria. Methodology for how the set was constructed and how regression testing runs.

Weak answer: "we use accuracy and F1 score" or "we work with the client to define KPIs." This is the single most reliable indicator that the firm doesn't actually do evaluation work.

Question 2: "Walk me through three production deployments still running 12+ months later, and what's your PoC-to-production conversion rate?"

The "PoC-to-production conversion rate" framing — surfaced by DEV Community's enterprise AI buyer's checklist — is a sharper metric than raw counts. A firm with 50 PoCs and 5 productions has a 10% conversion rate. A firm with 12 PoCs and 9 productions has 75%. The rate matters more than the volume.

This is the question that filters out demo-culture firms.

Strong answer: three named deployments with specific clients (or anonymized but consistent details), what each does in production, what the operational metrics look like, and what failures the firm has handled post-deployment. Reference contacts available on request.

Weak answer: a list of "case studies" that are pilots, demos, or consulting reports rather than running systems. Pilots in 12 industries are not three productions over 12 months.

Question 3: "Who specifically will work on this engagement?"

The classic bait-and-switch lives here.

Strong answer: named individuals with bios, committed allocation percentages, and confirmation that the senior consultants in the room will be the ones delivering. Often a willingness to put the named team into the contract.

Weak answer: "we'll staff with our best available team at engagement start." This is the language of a firm planning to swap senior staff for juniors after signing.

Question 4: "How do you handle disagreement during scoping?"

A behavioral question disguised as a process question.

Push back on a recommendation they make. "I'm not sure RAG is the right choice here, I'd want to start with prompt engineering" or "I'd want governance running parallel from week one rather than after the build." Then watch how they respond.

Strong answer: engagement with the substance. "Here's why we still recommend X given the data we have" or "you may be right, let's walk through the tradeoffs." Real consultants have opinions and defend them.

Weak answer: immediate capitulation ("oh yes, of course, we can do it your way") or rigid restating of the recommendation without engaging the substance. Both signal a firm that won't push back during execution either, which is when push-back matters most.

Question 5: "What's the integration cost into our specific stack?"

The model is small. The integration is big.

Strong answer: specific knowledge of integrations into your CRM brand, your data warehouse, your monitoring stack. Concrete time estimates per integration with assumptions documented. Often: "we've integrated with [your CRM] in three past engagements; it usually takes 2-3 weeks for the level of integration you're describing."

Weak answer: "integration costs depend on the specifics, we can scope that in discovery." This is the language of a firm that has not actually integrated with your stack and is planning to learn on your dollar.

Question 6: "What's your governance approach and when does it start?"

Per the NIST AI Risk Management Framework, mature AI deployments require formal governance work.

Strong answer: governance runs as a parallel workstream from week one. Specific deliverables: regulatory mapping, audit trail design, model documentation, incident response procedures. Specific knowledge of regulations applicable to your sector.

Weak answer: governance happens at the end as a final review. Generic "compliance review" language without naming specific regulations. This is the language of a firm planning to skip governance and hope the legal review goes well.

Question 7: "What's your post-deployment support model?"

The engagement ends, but the system keeps running. What happens then?

Strong answer: a defined transition where the client team takes ownership, with a specific runbook produced as a deliverable. Optional ongoing support tier for incident response and model refinement. Honest acknowledgment that some retraining or refinement will be needed and a structure for handling it.

Weak answer: "we provide ongoing support" without specifics, or implicit assumption that you'll be paying for ongoing managed services indefinitely. Both extremes — no transition plan or eternal lock-in — signal misalignment.

Question 8: "Walk me through an engagement that went badly."

The honesty test.

Strong answer: a specific engagement that didn't deliver as planned, what went wrong, what they learned, what they changed in their methodology. The willingness to discuss failure is correlated strongly with capability.

Weak answer: "we don't really have engagements that go badly" or a non-answer about how they once had a "challenging client." Every consulting firm has had engagements that went badly. The ones that won't admit it haven't learned from theirs.

What to do with the answers

After the scoping call, score each answer on the four-dimensional scale (strong / mostly strong / mixed / weak) and look at the pattern.

Mostly strong across all eight: this is a firm worth advancing.

Strong on some, weak on others: the pattern matters. Strong on track record but weak on eval is a firm that has shipped before but won't measure quality. Strong on eval but weak on integration is a firm that builds models but won't ship them. Match the strengths to your priorities.

Mostly weak: don't advance. The scoping call is the firm's best foot forward. Performance during execution is rarely better than performance during scoping.

A working scoping-call protocol

A 60-minute scoping call covering these eight questions plus baseline use case discussion is enough to filter most firms. Schedule with the senior delivery lead, not just sales. Bring an engineer from your team to catch vague technical answers. Take notes. Compare across firms within 48 hours of the calls so memory is fresh.

This is more rigor than most scoping calls get. The outcome is materially better firm selection, and the marginal time investment pays back many times over against the cost of choosing the wrong consultant.

The honest takeaway

Eight questions. Eval set, production deployments, specific team, scoping disagreement, integration cost, governance approach, post-deployment support, an engagement that went badly. The answers separate consultants who can deliver from ones who pitch. The questions are not difficult. The willingness to ask them and listen carefully to the answers is what most buyers skip.

Frequently Asked Questions

What's the most common dodge from AI consultants during scoping?

Hand-waving about evaluation. Ask 'how do you measure model quality' and listen for specifics. Strong consultants describe eval set construction, hold-out tests, regression methodology, and how they involve domain experts. Weak consultants give a paragraph of generic language about 'KPIs and metrics.'

Should I bring my own engineering team to scoping calls?

Yes, ideally an engineer who has shipped production systems. They'll catch vague answers that business stakeholders will let slide. Engineers also push back on architecture suggestions, which surfaces whether the consultant can defend their recommendations or just cycles to a different recommendation.

Sources
Doreid Haddad
Written byDoreid Haddad

Founder, Tech10

Doreid Haddad is the founder of Tech10. He has spent over a decade designing AI systems, marketing automation, and digital transformation strategies for global enterprise companies. His work focuses on building systems that actually work in production, not just in demos. Based in Rome.

Read more about Doreid

Keep reading