What to Ask an AI Consulting Firm Before Signing: 23 Questions
23 critical questions to ask any AI consulting firm before signing a contract. Covers technical capability, delivery, pricing, and post-deployment.
TL;DR
- Most AI consulting engagements fail not because of bad models, but because buyers didn’t ask the right questions before signing
- 80% of AI projects fail to deliver business value (RAND Corporation, 2024) — and poor vendor selection is a leading cause
- These 23 questions cover five critical areas: technical capability, delivery track record, pricing, team composition, and post-deployment support
- The questions are designed to separate firms that ship production AI from those that deliver slide decks and prototypes
Hiring an AI consulting firm is one of the highest-stakes vendor decisions a company can make. According to RAND Corporation’s 2024 research, over 80% of AI projects fail — more than twice the failure rate of non-AI IT projects. S&P Global’s 2025 data shows 42% of companies have abandoned most of their AI initiatives, up from 17% the year before.
Many of those failures start at the vendor selection stage. The wrong consulting firm burns budget, wastes months, and leaves you with a prototype that never reaches production. Gartner’s July 2024 data shows 30% of generative AI projects are abandoned after the proof-of-concept stage.
These 23 questions are organized to help you evaluate an AI consulting firm systematically — before you sign anything.
Section 1: Technical Capability (Questions 1–5)
These questions determine whether the firm can actually build what you need — not just talk about it.
1. What is your evaluation framework for determining whether AI is the right solution for our problem?
Not every business problem needs AI. A good firm will have a structured process for assessing whether your use case warrants an AI solution or whether a simpler rules-based approach would work better. If they skip this step and jump straight to model selection, they are optimizing for their billable hours, not your outcomes.
What a good answer sounds like: “We start with a problem assessment that evaluates data availability, decision complexity, and ROI potential before recommending any AI approach. About 30% of the time, we recommend against an AI solution.”
2. How do you handle the transition from prototype to production?
This is where most AI projects die. Gartner’s May 2024 data puts the average prototype-to-production timeline at 8 months. Ask specifically about their deployment pipeline, infrastructure requirements, and how they handle the gap between a model that works in a notebook and a system that works at scale.
What a good answer sounds like: “We build production-grade from day one. Our architecture includes monitoring, logging, and rollback capabilities in the initial deployment — not as an afterthought.”
3. What monitoring and observability do you build into AI systems?
AI systems degrade silently. Model drift, data distribution shifts, and changing user behavior can erode performance without triggering any traditional error alerts. A firm that treats deployment as the finish line will leave you with a system that gets worse every week without anyone noticing.
What a good answer sounds like: “We instrument every model with drift detection, performance dashboards, and automated alerting thresholds tied to your business metrics — not just model accuracy.”
4. How do you approach data quality assessment and preparation?
Data quality problems are the most common root cause of AI project failure. Ask how they audit your existing data, what minimum quality thresholds they require, and how they handle gaps. Firms that skip data quality work and jump to model building are setting you up for expensive rework.
What a good answer sounds like: “We run a structured data audit in the first week that covers completeness, consistency, freshness, and bias. We give you a written report with specific remediation steps before any model work begins.”
5. Can you walk me through your approach to AI evaluation and testing?
Traditional software testing (unit tests, integration tests) is necessary but insufficient for AI systems. You need evaluation frameworks that test for hallucinations, bias, edge cases, and performance under distribution shift. Ask for specifics about their eval suite, not just a promise that they “test thoroughly.”
What a good answer sounds like: “We build custom evaluation harnesses for each project that include automated regression tests, human evaluation protocols, adversarial testing, and production monitoring. Here is an example from a recent project.”
Section 2: Delivery Track Record (Questions 6–10)
Past performance is the best predictor of future results. These questions cut through marketing claims.
6. Can you share references from clients where the AI system is still running in production today?
A demo is not a delivery. Many consulting firms can build impressive prototypes that never survive contact with production traffic. Ask specifically for references where the system has been running for 6+ months and is still delivering value. Then actually call those references.
What a good answer sounds like: They give you names and contact information without hesitation. Firms that hedge with “confidentiality concerns” for every single project are often hiding a trail of abandoned prototypes.
7. Tell me about a project that failed. What went wrong and what did you learn?
Any firm that claims a perfect track record in AI is either too new to have encountered real problems or is not being honest. BCG’s 2025 research shows 74% of companies struggle to scale AI value. Failure is normal. What matters is whether the firm learned from it and changed their process.
What a good answer sounds like: A specific, detailed account of what went wrong — whether it was a data quality issue, scope creep, or a misaligned success metric — and the concrete process changes they made afterward.
8. What is your typical project timeline from kickoff to production deployment?
Compare their answer to the industry baseline: Gartner’s May 2024 data shows the average AI prototype-to-production timeline is 8 months. If they claim 2 weeks, ask what exactly ships in that timeframe. If they say 12 months, ask why and what is happening in each phase.
What a good answer sounds like: A phased timeline with specific milestones, deliverables, and decision points — not a vague “it depends.” At Clarity, we use a structured sprint model that delivers production AI in defined timeframes with clear scope boundaries.
9. How do you define and measure project success?
If their success metric is “model accuracy,” run. Model accuracy in isolation means nothing if the system does not move a business outcome. McKinsey’s State of AI 2025 report found that only 17% of companies report 5% or more EBIT impact from generative AI. Ask how they connect technical metrics to business results.
What a good answer sounds like: “We define 2-3 business KPIs at kickoff, build dashboards that track them from day one, and tie our delivery milestones to measurable movement in those KPIs.”
10. What percentage of your projects make it to production vs. staying as prototypes?
This question directly addresses the industry’s biggest problem. If less than 70% of their projects reach production, ask why. The answer will tell you whether they are a prototyping shop or a delivery firm.
What a good answer sounds like: A specific number, backed by their project portfolio. Bonus points if they can explain the reasons for the ones that did not ship — sometimes the right answer is to kill a project early.
Section 3: Pricing and Contracts (Questions 11–16)
Pricing models determine incentive alignment. The wrong structure pays the firm to drag things out.
Misaligned Pricing
- ×Open-ended time-and-materials with no cap
- ×Success defined as 'model deployed' not 'business value delivered'
- ×Change orders for every scope adjustment
- ×No financial consequence for the firm if the project fails
Aligned Pricing
- ✓Fixed-fee or outcome-based with clear deliverables
- ✓Success tied to business metrics you define upfront
- ✓Built-in scope flexibility for reasonable adjustments
- ✓Shared risk through milestone-based payments or outcome bonuses
11. What pricing model do you use, and why?
Stack.expert’s 2025 research shows 73% of buyers prefer fixed-fee pricing for AI consulting work. OrientSoftware’s 2024 data puts AI consulting rates at $150-$500 per hour for time-and-materials engagements. Understand what you are paying for and how the pricing model aligns the firm’s incentives with your outcomes. See our detailed comparison of pricing models for more context.
What a good answer sounds like: A clear explanation of their model, why they chose it, and how it protects you from scope creep and cost overruns. They should be able to explain the trade-offs honestly.
12. What is included in the quoted price, and what costs extra?
Hidden costs kill AI project budgets. Common surprises include: cloud infrastructure charges, third-party API costs (LLM inference fees add up fast), data preparation work, additional evaluation rounds, and post-deployment support. Get the full picture in writing before signing.
What a good answer sounds like: An itemized breakdown that separates their fees from pass-through costs, with realistic estimates for infrastructure and API usage based on your expected scale.
13. What happens if the project scope changes mid-engagement?
Scope changes are inevitable in AI projects because you learn things during development that change the requirements. Ask how they handle this: do they charge change orders for every adjustment, or is there built-in flexibility? The answer reveals whether they see themselves as a partner or a vendor.
What a good answer sounds like: “We build a 15-20% scope buffer into our estimates. Minor adjustments are absorbed. Major pivots trigger a joint scope review with a revised estimate that you approve before work continues.”
14. What are your payment terms and milestone structure?
Paying 100% upfront for an AI project is a bad idea. Milestone-based payments tied to deliverables protect both sides. Ask for specific milestones and what constitutes “complete” at each stage.
What a good answer sounds like: “We structure payments across 3-4 milestones tied to specific deliverables. You review and approve each deliverable before the next payment is due.”
15. Is there a warranty or support period included after delivery?
The first 30-90 days after an AI system goes live are critical. Production traffic will expose issues that testing missed. Ask whether post-deployment support is included in the price or billed separately — and what it covers.
What a good answer sounds like: “We include 30 days of production support after deployment. This covers bug fixes, performance tuning, and model adjustments. Extended support is available as a separate retainer.”
16. Who owns the intellectual property — the code, models, and data artifacts?
This is non-negotiable. You should own everything that is built for you: the code, trained models, evaluation frameworks, documentation, and data pipelines. Some firms retain IP rights and license it back to you, which creates dependency and limits your options.
What a good answer sounds like: “You own everything. All code, models, and artifacts are yours. We retain no licenses, no usage rights, and no proprietary claims on work product built for your project.”
Section 4: Team Composition (Questions 17–20)
Who actually does the work matters more than the firm’s brand name.
17. Who will be working on my project, and can I meet them before signing?
Many large consulting firms sell with senior partners and staff with junior contractors. Ask to meet the actual team members who will do the work — not just the account manager. Review their backgrounds, ask about their relevant experience, and assess whether they understand your domain.
What a good answer sounds like: They introduce you to the specific engineers and architects who will work on your project. They can speak to relevant experience and ask intelligent questions about your problem.
18. What is your team’s experience with our specific industry and use case?
Domain expertise matters in AI. A team that has built recommendation systems for e-commerce is not automatically qualified to build clinical decision support for healthcare. Ask for specific examples of work in your industry, and probe for understanding of your regulatory environment, data constraints, and user expectations.
What a good answer sounds like: Specific examples with enough detail that you can verify they actually did the work and understood the domain-specific challenges.
19. How do you handle knowledge transfer to our internal team?
If the consulting firm leaves and nobody on your team understands the system, you are dependent on them forever. Ask about documentation standards, code handoff procedures, training sessions, and whether they pair-program with your engineers during the engagement.
What a good answer sounds like: “We embed with your team from week one. Every technical decision is documented. We run structured knowledge transfer sessions at each milestone, and we pair-program on critical components so your team can maintain the system independently.”
20. What is your subcontracting policy?
Some firms subcontract significant portions of the work to freelancers or offshore teams without disclosing it. Ask directly whether any of the work will be subcontracted, to whom, and what quality controls are in place.
What a good answer sounds like: Full transparency about who does what work. If they subcontract, they should explain why, introduce those team members, and describe their quality assurance process.
Section 5: Post-Deployment and Long-Term Support (Questions 21–23)
The real test of an AI system starts after launch. These questions determine whether your investment will last.
21. What does your post-deployment monitoring and maintenance look like?
AI systems require ongoing attention that traditional software does not. Models drift, data distributions change, and user behavior evolves. Ask about their monitoring infrastructure, alerting thresholds, and retraining cadence. A firm that treats deployment as the end of the engagement is leaving you exposed.
What a good answer sounds like: “We set up automated drift detection, performance monitoring dashboards, and alerting before go-live. Our standard maintenance includes monthly model performance reviews and quarterly retraining assessments.”
22. How do you handle model retraining and updates after the initial deployment?
Models are not static software — they degrade as the world changes around them. Ask about their retraining pipeline, how they handle new data ingestion, and whether they have automated or manual retraining workflows. Also ask about rollback procedures if a retrained model performs worse.
What a good answer sounds like: “We build automated retraining pipelines with evaluation gates. A new model only promotes to production if it passes the same evaluation suite as the original, plus regression tests on known edge cases.”
23. What happens if we want to switch providers or bring the work in-house?
This is the question most buyers forget to ask — and the one that matters most for long-term flexibility. Ask about documentation completeness, code portability, infrastructure dependencies, and whether they use proprietary frameworks that create lock-in.
What a good answer sounds like: “Everything is documented, open-source where possible, and runs on standard cloud infrastructure. We provide a complete handoff package including architecture docs, runbooks, and a transition support period.”
How to Use This List
You do not need to ask all 23 questions in a single meeting. Use this framework:
First Call
Questions 1-5 (Technical Capability) and Questions 6-10 (Track Record). These determine whether the firm is worth a deeper evaluation.
Second Call
Questions 11-16 (Pricing) and Questions 17-20 (Team). These determine whether the engagement structure protects your interests.
Before Signing
Questions 21-23 (Post-Deployment). These determine whether your investment will survive contact with production.
The firms that answer these questions clearly, with specific examples and without defensiveness, are the ones worth working with. The ones that dodge, generalize, or get uncomfortable are telling you something important about how the engagement will go.
At Clarity, we welcome these questions because our delivery model is built around the problems they expose: fixed-fee pricing that aligns our incentives with your outcomes, production-first architecture from day one, and full IP ownership for every client. If you are evaluating AI consulting firms, start a conversation with us — we will answer all 23.
Building AI that needs to understand its users?
Key insights
Stay sharp on AI personalization
Daily insights and research on AI personalization and context management at scale. Read by hundreds of AI builders.
Daily articles on AI-native products. Unsubscribe anytime.
We build in public. Get Robert's weekly newsletter on building better AI products with Clarity, with a focus on hyper-personalization and digital twin technology. Join 1500+ founders and builders at Self Aligned.
Subscribe to Self Aligned →