The 2.5× rule: why most AI ROI numbers are quietly lying to you
How The GAiGE calculates ROI defensibly: the 2.5× extrapolation cap, the set-your-own hourly rate, aggregation thresholds, and where you should push back on our methodology.
By Colin Cardwell
If you can't defend the number, don't report the number.
Every AI vendor has an ROI calculator. Most of them produce results that would embarrass a mid-career finance person. "Save 14 hours per user per week!" Not if the user only works 37 hours a week and spends a third of them in meetings, they won't. The gap between those numbers and reality is where trust in AI measurement goes to die.
The GAiGE exists because we'd rather be honestly useful than impressively wrong. This post is a walk-through of how we calculate the numbers you'll see in the product — and the decisions we've made to keep them defensible. If you're evaluating whether to trust our reports, this is the page to read. If you're a skeptic, we're hoping to earn about eighty percent of your trust by the end of it, and leave you with useful questions for the last twenty.
Where most AI ROI numbers go wrong
Four common failure modes, in roughly decreasing order of frequency:
- Unbounded extrapolation. A small number of users report saving an outlier amount of time. The calculator multiplies that across the whole company. Twelve months of "savings" are invented in a spreadsheet.
- Survivorship bias in the respondents. Only enthusiasts reply. Their answers get treated as representative. Everyone who quietly ignores the tool is invisible to the number.
- Conflict of interest. The party calculating the ROI also wants the renewal to go through. Guess which way the ambiguous decisions break.
- Wrong unit of measurement. "Hours saved" with no hourly rate, or with an hourly rate pulled from thin air. Impressive-looking numbers that nobody can turn back into dollars.
These aren't strawmen. We've seen all four in vendor decks, and we've caught ourselves drifting toward a couple of them during product design. Naming them helps.
The 2.5× rule
Here's the cap that gives this post its name.
When a user responds to a pulse and reports saving, say, "3 hours this week" on a specific tool, we have a number to extrapolate from. If they answer four pulses in a month, we have four numbers. The temptation is to take the average, multiply by 52, and call it an annual savings figure. Nobody working with real data thinks this is a good idea.
What we actually do:
- We set a 2.5× ceiling on the ratio between reported hours saved and expected baseline hours of work. In plain English: if a task realistically took 8 hours a week before AI, the most we will count as saved on it is 8 × 2.5 = 20 hours a week. A claim of 20 hours sits exactly at that ceiling, and anything above it is cut back to 20. We don't extrapolate past the cap. The cap's the cap.
- We never report extrapolated numbers without a clearly-labelled reported figure next to them. The reader always sees both. If the two diverge a lot, the reader knows to be careful.
- We factor in the org's response rate. A 4.5/5 average from 80% of your team is a different fact from a 4.5/5 from 15%. We surface both.
- We require minimum aggregation thresholds. Fewer than five respondents in a segment and we don't report on it. Too few data points is how you get noise masquerading as insight.
Is 2.5× the right cap? It's defensible. It's approximately what the literature on self-reported time savings in consulting engagements converges on for "outlier but plausible" results. It's also a number we can justify in a room — which is the point. If you want to argue for 2× or 3×, we'll happily have that conversation and adjust the model. What we won't do is claim a uniformly applied 10× because it makes the deck look better.
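To make the cap concrete, here is a minimal Python sketch of the rule as described above. It is an illustration, not The GAiGE's production code: the function name `cap_reported_hours` and the constant `CAP_MULTIPLIER` are made up for this example, and it assumes both figures are weekly hours for the same task.

```python
CAP_MULTIPLIER = 2.5  # the ceiling discussed in this post (hypothetical constant name)


def cap_reported_hours(reported_hours: float, baseline_hours: float,
                       multiplier: float = CAP_MULTIPLIER) -> float:
    """Limit a self-reported weekly saving to `multiplier` times the baseline.

    reported_hours: hours the respondent says they saved this week
    baseline_hours: hours the task realistically took before AI
    """
    if reported_hours < 0 or baseline_hours < 0:
        raise ValueError("hours must be non-negative")
    ceiling = baseline_hours * multiplier
    # Claims at or under the ceiling pass through untouched;
    # anything above it is cut back to the ceiling.
    return min(reported_hours, ceiling)


at_ceiling = cap_reported_hours(20, 8)   # 8 x 2.5 = 20, so the claim stands
over_ceiling = cap_reported_hours(30, 8)  # cut back to 20
typical = cap_reported_hours(2.7, 7)      # well under 17.5, unchanged
```

The multiplier is a parameter rather than a hard-coded value because the cap is meant to be arguable: swapping in 2 or 3 changes one argument, not the logic.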
The blended hourly rate — yours, not ours
The other number that turns hours into dollars is the hourly rate. We could make one up. Most ROI calculators effectively do — they multiply by "the average knowledge worker salary" and hope you don't notice.
Our approach is boring: you set it. Every GAiGE organisation configures a blended hourly rate at setup, fully-loaded (salary plus on-costs, divided by productive hours per year). It's your number. If your auditor has a different view, you can argue that with them. We just do the multiplication.
Same principle applies to the tool cost side of the equation. You enter what you actually pay per seat per month, including any negotiated discounts and committed spend. We don't pull list prices from the vendor's website. Your numbers, your truth.
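The two inputs are simple enough to sketch. The snippet below is an assumption-laden illustration in Python, not our implementation: the function names are hypothetical, and the salary, on-cost and productive-hours figures are invented purely to show the arithmetic. Only the 80 seats at $60 a month come from the worked example later in this post.

```python
def blended_hourly_rate(annual_salaries: float, annual_on_costs: float,
                        productive_hours_per_year: float) -> float:
    """Fully loaded rate: (salary + on-costs) / productive hours per year."""
    if productive_hours_per_year <= 0:
        raise ValueError("productive hours must be positive")
    return (annual_salaries + annual_on_costs) / productive_hours_per_year


def annual_tool_cost(seats: int, price_per_seat_per_month: float) -> float:
    """What you actually pay per seat per month, times seats, times 12."""
    return seats * price_per_seat_per_month * 12


# Invented inputs for one average employee (illustration only).
rate = blended_hourly_rate(annual_salaries=150_000,
                           annual_on_costs=37_000,
                           productive_hours_per_year=1_700)

# Seat count and price taken from the worked example in this post.
cost = annual_tool_cost(seats=80, price_per_seat_per_month=60)

print(f"Blended hourly rate: ${rate:.2f}")
print(f"Annual tool cost: ${cost:,.0f}")
```

Both functions are plain arithmetic on purpose. Every judgment call lives in the inputs, which are yours to set and defend.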
Aggregation, because honest answers require anonymity
One more principle — and it's a product decision as much as a methodology one. Pulse responses are always aggregated before any human inside your organisation sees them. Your CTO sees "the team rated Copilot 4.1/5". They never see "Sarah rated Copilot 2/5 on Tuesday, and wrote 'honestly I prefer Claude'."
Why does this belong in a methodology post? Because the quality of the data depends on it. If your team suspects their answers are attributable — even just a little — the honest ones stop responding, and you're left with the corporate-approved vibes. We've watched this happen to engagement surveys for twenty years. We don't intend to repeat the mistake with AI.
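Here is one way the aggregation rule and the five-respondent threshold could be expressed, as a hedged Python sketch. The names `MIN_RESPONDENTS` and `aggregate_segment` are hypothetical, and the sketch assumes one rating per respondent in each segment. It is not a description of how The GAiGE stores or processes responses.

```python
from statistics import mean
from typing import Optional

MIN_RESPONDENTS = 5  # segments with fewer respondents are not reported


def aggregate_segment(ratings: list[float], invited: int) -> Optional[dict]:
    """Return segment-level figures only, or None if the segment is too small.

    ratings: one rating per respondent (assumption for this sketch)
    invited: how many people in the segment were sent the pulse
    """
    if invited <= 0:
        raise ValueError("invited must be positive")
    if len(ratings) < MIN_RESPONDENTS:
        # Too few data points: suppress rather than risk exposing individuals
        # or reporting noise as insight.
        return None
    return {
        "respondents": len(ratings),
        "response_rate": len(ratings) / invited,
        "average_rating": round(mean(ratings), 1),
    }


small = aggregate_segment([2.0, 4.0, 5.0, 4.0], invited=10)        # suppressed
large = aggregate_segment([4.0, 4.0, 5.0, 4.0, 3.0], invited=10)   # reported
```

Note that the function never returns individual ratings, only a count, a response rate and an average. That mirrors the principle above: the output a human sees contains nothing attributable to one person.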
A worked example
A 120-person firm. They pay for 80 ChatGPT Enterprise seats at $60/month each. Blended hourly rate entered as $110.
Over 8 weeks, 74 of those 80 users answer at least one pulse (response rate: 92.5% — healthy). The average self-reported hours saved is 2.7 per user per week. That number passes the 2.5× check against the baseline of "around 7 hours of content-writing or summarisation work per week."
Do the math:
- Annualised hours saved: 74 respondents × 2.7 hrs × 48 weeks ≈ 9,590 hrs (respondents only, and 48 weeks rather than 52)
- Dollar value at $110: ≈ $1.05M
- Annual tool cost: 80 × $60 × 12 = $57,600
- Ratio: roughly 18× return
That ratio is indicative — headline-safe for a board paper, with the methodology attached. It's also not a number we asked you to take on faith; every component is shown, and you can stress-test any of them.
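If you want to stress-test the components yourself, the arithmetic above fits in a few lines of Python. This is a sketch for checking the numbers, with a hypothetical function name (`roi_summary`). The inputs are the ones from the example, and the 48-week year is the figure used in the calculation above.

```python
def roi_summary(respondents: int, hours_saved_per_week: float, weeks_per_year: int,
                hourly_rate: float, seats: int, price_per_seat_per_month: float) -> dict:
    """Reproduce the worked example: hours, dollar value, tool cost, ratio."""
    annual_hours = respondents * hours_saved_per_week * weeks_per_year
    dollar_value = annual_hours * hourly_rate
    tool_cost = seats * price_per_seat_per_month * 12
    return {
        "annual_hours": annual_hours,
        "dollar_value": dollar_value,
        "tool_cost": tool_cost,
        "ratio": dollar_value / tool_cost,
    }


summary = roi_summary(respondents=74, hours_saved_per_week=2.7, weeks_per_year=48,
                      hourly_rate=110, seats=80, price_per_seat_per_month=60)

print(f"Annualised hours saved: {summary['annual_hours']:,.0f}")
print(f"Dollar value: ${summary['dollar_value']:,.0f}")
print(f"Annual tool cost: ${summary['tool_cost']:,.0f}")
print(f"Ratio: {summary['ratio']:.1f}x")
```

Change any single input (a lower hourly rate, fewer weeks, a higher seat price) and rerun it to see how far the ratio moves. Only the 74 people who responded are counted, so the six seats with no response contribute cost but no savings.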
Where you should push back
Three places where this methodology has genuine limits, and we'd rather be open about them than quiet:
- Self-reported time is self-reported. People overestimate savings on tasks they enjoy and underestimate on tasks they dread. The 2.5× cap bounds the overestimate; nothing fully fixes the rest. Pair our numbers with your intuition.
- Non-responders are a real problem. A 90% response rate is gold. A 50% response rate is concerning — the other 50% may be the people struggling the most. We show response rate prominently so you can't miss this.
- ROI is a proxy, not the goal. Sometimes a tool costs more than it saves, and you keep it because it opens up something you couldn't do before. The GAiGE tells you the numbers; you decide what the numbers mean.
In short
Cap the extrapolation. Use your numbers, not our numbers. Always show response rate. Always aggregate. Be honest about the limits. Present the reader a number they can defend, not a number that will embarrass them in six months.
If that sounds boring — it is, slightly. The glamorous AI ROI claims are the ones that don't survive contact with the CFO. The durable ones tend to look a bit like this.
Questions about the methodology? Our team would rather hear them than not — drop us a line.