Boards want a number. CFOs want a formula. The reality of measuring AI’s business impact is more nuanced, and more valuable, than any single metric can capture.
Somewhere in the next quarterly board meeting, someone — probably a board member, possibly the CFO — will ask a version of this question: “What’s the ROI on our AI investment?”
It’s a reasonable question. Organizations are spending aggressively on AI. KPMG’s Q4 2025 AI Pulse Survey found that enterprises plan to deploy $124 million on AI over the coming year, with 67% saying they’ll maintain that spending even during a recession. The money is real. The desire for accountability is legitimate.
But the question, as typically asked, is unanswerable — and attempting to answer it with a single number is actively counterproductive.
Here’s why: AI is not a single investment with a single return. It’s a capability that diffuses across every function, every workflow, and every employee in the organization. Asking “What’s the ROI of AI?” is like asking “What’s the ROI of electricity?” or “What’s the ROI of the internet?” The answer depends entirely on what you’re using it for, how well you’re using it, and what you’re measuring.
The organizations that are successfully measuring AI’s impact have stopped chasing a single number. They’ve built measurement systems: suites of metrics across multiple dimensions that together tell the story of where AI is creating value, where it’s not, and what to do about it.
This guide explains the dimensions to consider when building ROI and related measurements for AI. For a case study of ROI calculations for Microsoft Copilot and GitHub Copilot, see our blog post, Creating an ROI Framework for AI Applications, Featuring Microsoft Copilot <link>.
The gap between AI investment and provable returns is well documented and widening.
S&P Global’s 2025 Voice of the Enterprise survey found that the share of companies abandoning most of their AI projects jumped to 42%, up from just 17% the year before, often citing cost and unclear value as the primary reasons. Constellation Research’s AI Survey reported that 42% of enterprises have deployed AI without seeing any return at all, with an additional 29% reporting only modest gains. BCG’s 2025 analysis of more than 1,250 firms found that only 5% of organizations are generating meaningful value from AI at scale.
The measurement challenge is real. Gartner’s research indicates that establishing ROI has become the top barrier holding back further AI adoption. Only 21% of companies report measuring the impact of their AI initiatives at all, according to S&P Global. Most organizations are flying blind — spending more, measuring less, and hoping the “vibes” are directionally correct.
But the problem isn’t that AI doesn’t work. It’s that organizations are trying to measure something multi-dimensional with a single-dimensional metric.
The instinct to boil AI impact down to a single ROI percentage is understandable. It’s how we’ve measured technology investments for decades. But AI breaks the traditional ROI model in several fundamental ways.
First, the returns are distributed, not centralized. When you deploy an ERP system, the costs and benefits accrue to a defined scope. When you deploy AI tools across an organization, the impact shows up differently in every function. A sales team might see faster deal cycles. An engineering team might see higher deployment frequency. A customer success team might see improved resolution times. These are fundamentally different value dimensions that resist aggregation into a single number.
Second, the value creation is often non-financial, at least initially. AI might enable a team to tackle more complex work, not just do work faster. It might improve the quality of analysis, not just the speed. It might allow a single person to do what previously required a team, but this doesn’t show up as cost savings if you redeploy those people to higher-value work rather than eliminating their roles. The most important AI impacts are often about capability transformation, not cost reduction.
Third, the timeline is non-linear. AI investments compound over time as proficiency increases, as workflows are redesigned, and as the organization builds muscle memory around AI-augmented processes. The ROI at month three looks nothing like the ROI at month twelve, which looks nothing like the ROI at month twenty-four. A snapshot number at any single point is misleading.
Fourth, the most insidious problem: measuring only the positive side ignores the hidden costs. Workday’s January 2026 study of 3,200 employees and leaders found that roughly 37% of time saved through AI is offset by rework: correcting, verifying, and rewriting low-quality outputs. They called this the “AI Tax.” Only 14% of employees consistently achieve net-positive outcomes from AI use. An organization that measures “hours saved” without accounting for hours lost to rework is celebrating a mirage.
Instead of chasing a single ROI number, effective AI impact measurement requires a framework — a structured approach to understanding where AI is creating value across multiple dimensions, and where it’s not.
Larridin’s Productivity Roof framework, shown in Figure 1, treats AI impact as an architectural system. Five structural pillars — Adoption, Proficiency, Throughput, Reliability, and Governance — support a “roof” of measurable business outcomes. If any pillar is weak, the roof becomes fragile, hard to prove, and impossible to scale.
Figure 1. Larridin’s AI Productivity framework shows the “roof” of productivity resting on five pillars: Adoption, Proficiency, Throughput, Reliability, and Governance.
Each pillar answers a specific executive question.
The roof — Productivity — is the layer of business outcomes these pillars support: revenue impact, cost savings, and risk mitigation. But you can only prove the roof is solid if you can show the pillars are holding it up.
Rather than seeking a single number, effective measurement works across five interconnected dimensions. Each dimension answers a different question, and together they provide a complete picture of where AI is driving value and where it may be falling short.
Effectiveness measures whether AI is helping people and teams produce better results, not just faster ones. This is the dimension most organizations skip because it’s harder to quantify than speed, but it’s often where the most valuable AI impact lives.
Effectiveness shows up as: higher win rates in sales, better forecast accuracy in finance, improved diagnostic precision in healthcare, stronger code quality in engineering, more compelling creative output in marketing.
The key insight is that effectiveness is about capability elevation: moving work up the value chain, from routine execution to strategic judgment. A finance team that uses AI to close the books faster has improved speed. A finance team that uses AI to surface anomalies and improve forecast accuracy has improved effectiveness. The second is worth dramatically more, but it doesn’t show up in a “time saved” metric.
Example metrics: Win rate improvement, forecast accuracy, first-contact resolution rate, code quality scores, content performance, decision quality assessments.
Quality is the guardrail dimension that determines whether gains in other areas are real or illusory. The Workday study’s finding that 37% of time saved is lost to rework is a quality problem masquerading as a productivity gain.
Quality measurement asks: when AI helps produce work, does that work hold up downstream? Does it require extensive editing? Does it introduce errors that compound through the workflow? Is the output trustworthy enough to act on without human verification?
This is where the concept of a “Net Productivity Matrix” becomes critical. Not every AI user is creating net-positive value. Some users save significant time, but generate output that requires equally significant correction. Others save modest time, but produce work that flows through without rework. Measuring quality separates genuine productivity from the illusion of productivity.
Example metrics: Edit rate (percentage of AI output requiring revision), defect rate, rework cost, acceptance rate (percentage of AI output used without modification), customer satisfaction scores on AI-assisted interactions, verification burden score.
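To make the distinction concrete, here is a minimal sketch (in Python, with illustrative numbers and hypothetical field names, not Larridin’s production model) of how gross time saved, rework, and acceptance rate combine into a net productivity view:

```python
from dataclasses import dataclass

@dataclass
class UserAIUsage:
    """Illustrative per-user record; field names are hypothetical."""
    hours_saved_gross: float   # self-reported or telemetry-estimated time saved
    hours_rework: float        # time spent correcting and verifying AI output
    outputs_total: int         # AI-assisted outputs produced
    outputs_accepted: int      # outputs used downstream without modification

def net_productivity(u: UserAIUsage) -> dict:
    net_hours = u.hours_saved_gross - u.hours_rework
    return {
        "net_hours_saved": net_hours,
        "rework_offset_pct": 100 * u.hours_rework / u.hours_saved_gross if u.hours_saved_gross else 0.0,
        "acceptance_rate_pct": 100 * u.outputs_accepted / u.outputs_total if u.outputs_total else 0.0,
        "net_positive": net_hours > 0,
    }

# A user who "saves" 10 hours but spends 3.7 correcting output mirrors the 37% AI Tax.
print(net_productivity(UserAIUsage(hours_saved_gross=10.0, hours_rework=3.7,
                                   outputs_total=20, outputs_accepted=13)))
```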
Time is the most commonly measured dimension — and the most commonly mismeasured. Most organizations track task-level time savings: “This task took 2 hours, now it takes 30 minutes.” That’s useful, but insufficient.
What matters for business impact is end-to-end throughput — whether complete workflows are completing faster across handoffs, approvals, and systems. A marketing team that generates campaign copy in minutes instead of days hasn’t improved throughput if the review-and-approval process still takes three weeks. An engineering team that writes code faster hasn’t improved throughput if QA and deployment cycles remain unchanged.
The financial translation of time improvement is Cost of Delay: quantifying the value of delivering outcomes sooner. If an e-commerce company launches a product recommendation engine six weeks early, and the new engine generates $75,000 in additional revenue each week, the Cost of Delay value is $450,000. This connects time improvement directly to business impact, without needing to calculate a traditional ROI.
Example metrics: Sales cycle time, deployment frequency, time to market, content production rate, close cycle time (finance), time to hire, end-to-end workflow completion time.
Revenue impact is the dimension boards care about most, but it’s also the hardest to attribute directly to AI. The causal chain from “employee used AI tool” to “revenue increased” runs through multiple intermediaries: the employee’s proficiency, the workflow they applied AI to, the quality of the output, the customer’s response.
Rather than attempting direct attribution, effective revenue measurement focuses on correlation and contribution. Do sales teams with higher AI proficiency close more deals? Do customer success teams using AI have better retention rates? Do marketing teams leveraging AI see stronger campaign performance?
The comparison methodology matters enormously here. The most rigorous approach compares AI-augmented cohorts against non-augmented cohorts performing similar work, controlling for experience, territory, and other variables. This isn’t a randomized controlled trial, but it’s a meaningful step beyond anecdotal claims.
Example metrics: Revenue per employee, deal velocity, pipeline conversion rates, customer retention rate, revenue attribution by AI-augmented vs. non-augmented cohorts, net new revenue from AI-enabled capabilities.
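As a rough illustration of the cohort approach described above, the sketch below (Python, with made-up numbers) compares average deal-cycle time for AI-augmented and non-augmented reps. A real analysis would also control for experience, territory, and deal size rather than comparing raw means:

```python
from statistics import mean

# Hypothetical deal-cycle times (days) for reps doing comparable work.
augmented = [34, 29, 41, 38, 30, 33]       # reps with consistent AI usage
non_augmented = [45, 52, 39, 48, 44, 50]   # matched reps without AI usage

def cohort_delta(aug, base):
    aug_mean, base_mean = mean(aug), mean(base)
    return {
        "augmented_mean_days": aug_mean,
        "baseline_mean_days": base_mean,
        "improvement_pct": 100 * (base_mean - aug_mean) / base_mean,
    }

print(cohort_delta(augmented, non_augmented))
```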
Cost measurement is the most straightforward dimension, but it’s frequently oversimplified. The naive approach counts AI tool licenses as cost and “hours saved × hourly rate” as benefit. This ignores several critical factors.
First, time saved only becomes cost savings if it’s actually recaptured. An employee who saves five hours per week, but fills that time with more of the same routine work, hasn’t generated cost savings; they’ve generated capacity. Capacity only becomes financially valuable when it’s deliberately reallocated to higher-value work or when it forestalls additional hiring.
This is why Capacity Reallocation Value is a more honest metric than simple “time saved.” It calculates: (Time Saved on Routine Tasks) × (Value Differential between Routine and Strategic Work). If a marketing manager saves five hours per week on drafting copy ($75/hr) and redirects that time to strategy ($200/hr), the value isn’t just $375 (5 hours × $75); it’s $625 per week (5 hours × the $125/hr differential between strategic and routine work), once you account for the higher value of the reallocated work.
Second, cost measurement must include the total cost of AI: tool licenses, infrastructure, training, change management, governance overhead, and the hidden cost of rework. The net cost impact — not the gross savings — is what matters.
Third, cost avoidance is often more valuable than cost reduction. AI governance that prevents a GDPR violation (with fines of up to 4% of global annual revenue) or an EU AI Act violation (with fines of up to €35 million) creates enormous value that doesn’t appear in a traditional cost-savings analysis.
Example metrics: Cost savings per FTE, total cost of AI ownership, capacity reallocation value, cost avoidance (compliance penalties prevented), tool consolidation savings, hiring avoidance through AI-augmented capacity.
Here’s how the framework works in practice. For any given AI initiative or use case, the measurement approach follows two steps:
Step 1: Identify the primary metric you’re optimizing for.
Not every AI deployment optimizes for the same thing. The right primary metric depends on the function, the use case, and the strategic priority:
A sales team deploying AI for proposal generation might optimize for deal velocity (time dimension): getting proposals out faster to accelerate the pipeline.
An engineering team deploying AI coding tools might optimize for deployment frequency (time): shipping features faster.
A customer service team deploying AI for ticket triage might optimize for first-contact resolution (effectiveness): resolving issues without escalation.
A finance team deploying AI for reporting might optimize for forecast accuracy (effectiveness): improving the quality of projections, not just the speed.
A marketing team deploying AI for content production might optimize for content production rate (time): publishing more frequently.
In some cases, the primary optimization is about enabling work that wasn’t previously possible. A small team taking on the complexity of work that previously required a much larger team is optimizing for capability elevation — a form of effectiveness that doesn’t neatly fit into time or cost metrics.
Step 2: Define two or three guardrail metrics to ensure the optimization isn’t creating hidden costs.
This is the critical step most organizations skip. For every primary optimization, there are potential failure modes that guardrail metrics catch.
The guardrail model prevents the most common failure mode in AI measurement: celebrating a primary metric while ignoring the damage happening in secondary dimensions. The 37% rework problem is exactly what happens when organizations optimize for speed without a quality guardrail.
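One lightweight way to operationalize the two-step approach is to declare, for each use case, a single primary metric and its guardrails. The sketch below is a hypothetical illustration in Python; the field and metric names are placeholders, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    dimension: str   # time | effectiveness | quality | revenue | cost
    direction: str   # "up" = higher is better, "down" = lower is better

@dataclass
class UseCaseMeasurement:
    use_case: str
    primary: Metric
    guardrails: list = field(default_factory=list)

# Example: marketing content production optimizes for speed,
# guarded by quality and engagement so volume gains aren't illusory.
marketing_content = UseCaseMeasurement(
    use_case="Marketing content production",
    primary=Metric("content_production_rate", "time", "up"),
    guardrails=[
        Metric("edit_rate", "quality", "down"),
        Metric("audience_engagement", "effectiveness", "up"),
    ],
)
print(marketing_content)
```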
Some use cases have obvious, clean measurements. Customer service is the textbook example: deploy AI, measure resolution time, measure CSAT, measure cost per ticket. The input-output relationship is relatively direct, the metrics are well-established, and the value is easy to quantify.
But most enterprise workflows are far more nuanced than customer service. And this is precisely why a single AI ROI number fails.
Consider a few real scenarios:
A sales team using AI for competitive intelligence and proposal generation. The primary goal might be deal velocity, but is it? Maybe the real goal is increasing win rate by producing better-tailored proposals. Or maybe it’s freeing up senior salespeople to focus on larger, more complex deals. The “right” primary metric depends on the strategic priority, and that might change quarter to quarter.
An engineering team using AI coding assistants. Are you optimizing for shipping speed? For enabling a smaller team to handle a larger codebase? For reducing the onboarding time for new developers? For improving code quality? Each of these is legitimate, each requires different primary and guardrail metrics, and each produces a different “ROI” narrative.
A finance team using AI for forecasting and close processes. You might want faster close cycles (time). Or you might want better forecast accuracy (effectiveness). Or you might want the finance team to spend less time on reporting and more time on strategic analysis (capability elevation). These aren’t the same thing, and measuring them as if they were collapses important distinctions.
A marketing team using AI for content production. Volume? Quality? Brand consistency? Audience engagement? Cost per asset? The answer depends on whether you’re a high-growth startup trying to flood the zone or an established brand managing reputation risk.
The measurement framework allows each function and use case to define its own primary optimization target, all while maintaining consistent guardrails. This produces a suite of metrics across the organization that tells a coherent story about AI’s impact, without reducing it to a single, misleading number.
One of the most significant limitations in current AI impact measurement is its disproportionate focus on software engineering work. Most AI productivity frameworks measure code generation speed, PR throughput, and deployment frequency. These are valuable metrics, but they ignore the majority of enterprise AI usage.
AI is now embedded in sales, marketing, customer service, HR, finance, legal, and operations. Engineering-only measurement ignores the larger part of potential enterprise AI value. Worse, it produces a distorted picture: engineering might show impressive productivity gains while other functions are seeing negligible returns, and no one knows because no one is measuring.
The Productivity Roof framework is deliberately enterprise-wide. The five pillars apply to every function, with function-specific metrics under each pillar.
Each function has its own primary optimization targets and guardrails. Together, they roll up into an enterprise-wide view that answers the board’s question about ROI, not with a single number but with a coherent narrative: “Here is where AI is accelerating our business. Here is where quality needs improvement. Here is where we should invest next.”
Function-specific measurement, covering sales AI impact, engineering AI impact, customer service AI impact, and other domains, deserves separate, dedicated deep-dive guides. The framework here provides the connective tissue that makes those function-specific measurements coherent across the enterprise.
Even though a single ROI number is a fool’s errand, the board still needs a financial story. The framework provides three methods for translating operational metrics into financial language.
A variety of metrics can contribute to financial returns. Figure 2 shows important workforce AI-adoption metrics that can be actively tracked, from the Larridin report, The State of Enterprise AI 2026.
Figure 2. Selected AI-adoption metrics that organizations can actively track.
(Source: The State of Enterprise AI 2026)
Capacity Reallocation Value (CRV) is the most broadly applicable financial translation. Rather than claiming “AI saved X hours,” CRV calculates the economic value of shifting time from routine to strategic work.
Formula: (Time Saved on Routine Tasks) × (Value Differential between Routine and Strategic Work)
This works across every function and avoids the trap of valuing all saved hours equally. An hour saved on data entry that’s redirected to strategic analysis is worth more than an hour saved on data entry that’s redirected to more data entry.
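Here is a minimal sketch of the CRV formula, reusing the marketing-manager numbers from the cost dimension discussion (five hours per week shifted from $75/hr copy drafting to $200/hr strategy); the function name is our own, not a prescribed API:

```python
def capacity_reallocation_value(hours_saved: float,
                                routine_rate: float,
                                strategic_rate: float) -> float:
    """CRV = time saved on routine tasks x (strategic rate - routine rate)."""
    return hours_saved * (strategic_rate - routine_rate)

# Five hours/week shifted from drafting copy ($75/hr) to strategy ($200/hr).
print(capacity_reallocation_value(5, 75, 200))  # 625.0 per week
```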
Cost of Delay (CoD) is particularly powerful for throughput improvements. It calculates the financial value of delivering outcomes sooner, whether that’s a product launch, a sales deal, a regulatory filing, or a market response.
Formula: (Time Accelerated) × (Revenue or Value per Unit of Time)
This converts time-dimension improvements directly into revenue impact without requiring full ROI calculations.
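The same translation as a sketch, using the recommendation-engine example from the time dimension (a launch accelerated by six weeks at $75,000 of additional revenue per week); again, the function name is illustrative:

```python
def cost_of_delay_value(weeks_accelerated: float, value_per_week: float) -> float:
    """CoD value = time accelerated x revenue (or value) per unit of time."""
    return weeks_accelerated * value_per_week

# Recommendation engine launched 6 weeks early, worth $75,000/week.
print(cost_of_delay_value(6, 75_000))  # 450000
```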
For organizations whose CFOs require a traditional ROI calculation, the framework provides the inputs for a Return on AI Investment (ROAI) figure:
ROAI = (Revenue Attribution + Cost Savings + Risk Mitigation Value) / Total AI Investment
But this is positioned as an optional extension, not the primary measurement. Larridin measures the operational pillars that feed into ROI: adoption, proficiency, throughput, reliability, and governance. The organization plugs in its own cost data, hourly rates, and investment totals to extend operational measurement into financial ROI.
This distinction matters. Larridin provides the measurement system. The organization provides the financial assumptions. Together, they produce a defensible ROI narrative, not a fabricated number.
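For teams that do need the traditional number, here is a minimal sketch of the ROAI calculation; the input values are placeholders an organization would replace with its own financial assumptions:

```python
def roai(revenue_attribution: float,
         cost_savings: float,
         risk_mitigation_value: float,
         total_ai_investment: float) -> float:
    """ROAI = (revenue attribution + cost savings + risk mitigation) / total AI investment."""
    return (revenue_attribution + cost_savings + risk_mitigation_value) / total_ai_investment

# Placeholder inputs supplied by the organization's own financial assumptions.
print(roai(revenue_attribution=1_200_000,
           cost_savings=400_000,
           risk_mitigation_value=250_000,
           total_ai_investment=900_000))  # ~2.06
```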
Measuring AI impact across these dimensions requires three categories of data, each capturing something the others miss.
Real-time telemetry captures what’s actually happening: which AI tools are being used, by whom, how frequently, for which tasks, and within what patterns. This is the behavioral foundation; it tells you what people do, not what they say they do. Browser-level and desktop-level monitoring provides this data at the granularity required for meaningful measurement.
Business system integration correlates AI usage with outcomes. By connecting AI telemetry with your CRM (Salesforce, HubSpot), ITSM (ServiceNow, Zendesk), ERP, HRIS (Workday), and development tools (Jira, GitHub), you can answer the executive questions: Do sales reps using AI close deals faster? Do support agents using AI achieve higher CSAT? Do finance teams close the books faster with fewer errors?
Targeted surveys capture what telemetry and system data cannot: employee sentiment, perceived quality, proficiency self-assessment, friction points, and unmet needs. Quarterly pulse surveys and event-triggered micro-surveys complement the behavioral and outcome data with the human perspective.
No single data source is sufficient. Telemetry without business system integration tells you who’s using AI, but not whether it matters. Business system data without telemetry tells you which outcomes changed, but not whether AI caused the changes. Surveys without telemetry tell you what people think, but not what they actually do. Only by gathering all relevant data sources and integrating them can you develop an accurate picture.
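To illustrate why the integration matters, here is a toy sketch (Python, with hypothetical identifiers and values) that joins the three data sources on employee ID so usage, outcomes, and sentiment can be read together. Real integrations would pull from telemetry pipelines and system APIs such as Salesforce or ServiceNow:

```python
# Hypothetical, simplified records; identifiers and values are illustrative.
telemetry = {      # employee_id -> weekly AI-assisted task count
    "e01": 42, "e02": 3, "e03": 28,
}
crm_outcomes = {   # employee_id -> average deal cycle in days
    "e01": 31.0, "e02": 49.5, "e03": 35.2,
}
survey = {         # employee_id -> self-assessed proficiency (1-5)
    "e01": 4, "e02": 2, "e03": 4,
}

# Join the three sources on employee ID so usage, outcomes, and sentiment
# can be analyzed together rather than in isolation.
joined = [
    {"employee_id": eid,
     "ai_tasks_per_week": telemetry[eid],
     "avg_deal_cycle_days": crm_outcomes.get(eid),
     "self_assessed_proficiency": survey.get(eid)}
    for eid in telemetry
]
for row in joined:
    print(row)
```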
One of the most valuable aspects of the pillar framework is its distinction between leading and lagging indicators.
Lagging indicators, such as revenue impact, cost savings, and ROAI, tell you what already happened. They’re essential for board reporting and accountability, but they’re useless for management. By the time a lagging indicator shows a problem, the damage has been done.
Leading indicators, such as adoption rates, proficiency scores, and governance coverage, signal what’s about to happen. If proficiency scores are flat or declining in a function, throughput and quality problems will follow. If governance coverage is dropping, compliance incidents are incoming. If adoption is clustering among a few power users while the majority disengages, your productivity gains are concentrated and fragile.
The pillar framework is designed so that each pillar provides leading indicators for the next.
Productivity (the roof) is the lagging indicator that confirms whether the pillars are working.
This creates an early warning system. If adoption is high but proficiency is low, you know the 37% rework problem is coming. If throughput is improving but reliability is declining, you know gains are about to evaporate. If governance coverage is dropping, you know risk is accumulating. You can intervene before the damage shows up in your lagging indicators.
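A minimal sketch of what such an early-warning check might look like; the metric names and thresholds are illustrative assumptions, not recommended values:

```python
def early_warnings(pillars: dict) -> list:
    """Flag leading-indicator patterns before they show up in lagging outcomes."""
    warnings = []
    if pillars["adoption_pct"] >= 70 and pillars["proficiency_score"] < 50:
        warnings.append("High adoption, low proficiency: expect rework (the AI Tax) to grow.")
    if pillars["throughput_trend"] > 0 and pillars["reliability_trend"] < 0:
        warnings.append("Throughput rising while reliability falls: gains are at risk.")
    if pillars["governance_coverage_pct"] < 80:
        warnings.append("Governance coverage dropping: compliance risk is accumulating.")
    return warnings

print(early_warnings({"adoption_pct": 85, "proficiency_score": 42,
                      "throughput_trend": 0.12, "reliability_trend": -0.05,
                      "governance_coverage_pct": 74}))
```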
For organizations beginning to measure AI impact, the framework provides a phased approach.
Before you can measure improvement, you need to know where you are. Deploy telemetry to capture AI tool usage across the organization. Establish baseline metrics for the five pillars. Discover the full AI tool landscape — sanctioned and unsanctioned. Document current performance in the business systems you’ll use for correlation (CRM, ITSM, HRIS, etc.).
The deliverable is a Baseline Metrics Dashboard that shows: who’s using AI, what they’re using, how well they’re using it, and what the current state of business outcomes looks like in target workflows.
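As a hypothetical illustration (not a prescribed schema), a baseline snapshot might record a handful of metrics per pillar alongside the current state of target business outcomes:

```python
# Illustrative baseline snapshot captured before improvement work begins;
# metric names and values are placeholders.
baseline = {
    "as_of": "2026-01-31",
    "adoption":    {"active_users_pct": 58, "tools_discovered": 23},
    "proficiency": {"avg_proficiency_score": 47},
    "throughput":  {"avg_workflow_completion_days": 12.4},
    "reliability": {"edit_rate_pct": 34, "acceptance_rate_pct": 61},
    "governance":  {"policy_coverage_pct": 52},
    "business_outcomes": {"avg_sales_cycle_days": 44, "csat": 4.1},
}

for pillar, metrics in baseline.items():
    print(pillar, metrics)
```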
Connect AI telemetry to business systems to enable correlation analysis. Launch proficiency enablement programs to address the rework problem. Begin identifying function-specific primary and guardrail metrics. Establish the governance framework that makes scale safe.
The deliverable is an Integrated Productivity Dashboard that connects AI usage patterns to business outcomes, and that serves as the first evidence of where AI is driving impact and where it’s not.
Identify the top three high-impact use cases per function based on actual data. Focus proficiency investment on the “Misaligned Middle”: users who are active but generating high rework. Consolidate the tool portfolio based on measured impact. Produce the first board-ready AI Impact Report using the financial translation layer.
The deliverable is a defensible narrative, backed by data, that tells the board exactly where AI is accelerating the business, where quality needs investment, where governance is reducing risk, and (if required) what the ROAI looks like.
Organizations don’t build comprehensive AI measurement overnight. The maturity journey typically follows four stages:
Measuring adoption and calling it impact. “87% of our employees use AI” tells you nothing about value creation. Adoption is the foundation, not the roof.
Using gross time saved without accounting for rework. The 37% AI Tax is real. Measure net productivity gains, not gross efficiency improvements.
Trying to calculate a single, enterprise-wide ROI number. A single number encourages gaming, hides functional variation, and produces a figure no one trusts, because everyone can see it’s oversimplified.
Measuring engineering only. AI is embedded across the enterprise. Engineering-only measurement ignores the majority of AI value and produces a distorted picture.
Relying on self-reported surveys without behavioral data. People overestimate their own AI proficiency and underestimate their rework time. Telemetry provides the behavioral ground truth.
Treating all time saved as equally valuable. An hour saved on data entry is worth less than an hour saved on strategic analysis. Capacity Reallocation Value captures this distinction.
Static measurement in a dynamic landscape. AI tools, capabilities, and best practices change monthly. Measurement frameworks that aren’t updated regularly quickly become irrelevant.
Ignoring the guardrails. Optimizing for speed without measuring quality. Optimizing for volume without measuring effectiveness. Every primary metric needs guardrails, or you’re celebrating at the top while the foundation erodes underneath.
Figure 3 shows the industries with the highest likelihood of achieving ROI with AI within the next six months, as described in the Larridin report, The State of Enterprise AI 2026. Note that these industries are likely to use AI across a variety of job roles, not only software engineering.
Figure 3. Industries with high expectations of achieving AI ROI.
(Source: The State of Enterprise AI 2026)
When the board asks “what’s the ROI on our AI investment?”, the right answer isn’t a number. It’s a narrative, backed by data across five dimensions:
“Our AI adoption is at X% across target roles, with Y% of users achieving consistent, proficient usage. AI-augmented workflows are completing Z% faster in sales, finance, and engineering. Quality metrics show that rework has decreased from A% to B% as proficiency programs take effect. Governance coverage is at C%, keeping our risk posture strong. The net impact translates to $D in capacity reallocation value, $E in cost of delay savings, and $F in risk mitigation. We recommend investing next in [specific functions/use cases], where the leading indicators suggest the highest return.”
That narrative, grounded in measurement across adoption, proficiency, throughput, reliability, and governance, is infinitely more useful than a single ROI percentage. It tells the board not just whether AI is working, but where, how, and what to do next.
The question isn’t whether to invest in AI. The question is whether you can measure it well enough to invest wisely.
Larridin’s Productivity Roof framework gives enterprises the measurement system to answer the impact question, with real-time telemetry, business system integration, and a five-pillar architecture that connects AI usage to business outcomes across every function.
Learn how Larridin enables AI governance.
If you’d like to learn more about how to accelerate your AI transformation with Larridin, sign up for our newsletter or book a demo.