AI API costs have surprised many businesses — a workflow that seemed affordable at test volume becomes expensive at production volume, or a new feature generates far more API calls than anticipated. Building a cost forecast before committing to a plan or launching a new AI workflow takes thirty minutes and prevents the budget surprises that create friction between technical and finance teams. Here is how to do it accurately.
The Forecast Formula
LLM cost forecasting has three inputs: average tokens per request (input + output), expected daily request volume, and the model’s per-token pricing. Monthly cost = (avg input tokens × input price + avg output tokens × output price) × daily requests × 30.
Example: A customer service chatbot averages 800 input tokens (system prompt + conversation history + user message) and 200 output tokens per interaction, running Claude 3.5 Haiku at $0.80/M input and $4/M output. At 500 interactions per day: (800 × $0.80/1M + 200 × $4/1M) × 500 × 30 = ($0.00064 + $0.0008) × 500 × 30 = $0.00144 × 15,000 = $21.60 per month. At 5,000 interactions per day, that becomes $216 per month; at 50,000 per day, $2,160 per month. Cost scales linearly with volume, so the volume estimate is the most important variable to get right.
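Keeping the formula in a small helper makes it easy to vary the volume assumption. This is a minimal Python sketch; the function name and the 30-day month are assumptions carried over from the formula, and prices are entered per million tokens, the way providers usually quote them.

```python
def monthly_cost(avg_input_tokens: float, avg_output_tokens: float,
                 input_price_per_m: float, output_price_per_m: float,
                 daily_requests: float, days: int = 30) -> float:
    """Monthly API cost in dollars; prices are per million tokens."""
    cost_per_request = (avg_input_tokens * input_price_per_m
                        + avg_output_tokens * output_price_per_m) / 1_000_000
    return cost_per_request * daily_requests * days


# The chatbot example: 800 input / 200 output tokens at $0.80/M and $4/M.
for daily in (500, 5_000, 50_000):
    cost = monthly_cost(800, 200, 0.80, 4.00, daily)
    print(f"{daily:>6} requests/day: ${cost:,.2f}/month")
```

Because the function is linear in `daily_requests`, each tenfold increase in volume multiplies the bill by exactly ten.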
Estimating Volume Accurately
Volume estimates are where forecasts most often go wrong. Teams typically underestimate by building their estimate around average usage rather than peak usage, by forgetting to include development and testing traffic, and by not accounting for the growth trajectory of the workflow over the forecast period. A more conservative and accurate approach: estimate the realistic daily peak volume (the busiest day, not the average day), add 20% for development and testing traffic, and use a six-month growth projection if the workflow is expected to scale.
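The three adjustments can be combined into one volume projection. This sketch is illustrative only: it assumes growth compounds at a constant month-over-month rate, and both `monthly_growth` and the 2,000-request peak in the example are hypothetical values.

```python
def forecast_daily_volume(peak_daily: float, dev_test_buffer: float = 0.20,
                          monthly_growth: float = 0.0, months: int = 6) -> list[float]:
    """Daily request volume to budget for in each month of the forecast.

    Starts from the busiest realistic day, adds a buffer for development and
    testing traffic, then compounds an assumed month-over-month growth rate.
    """
    base = peak_daily * (1 + dev_test_buffer)
    return [base * (1 + monthly_growth) ** m for m in range(months)]


# Hypothetical workflow: 2,000 requests on its busiest day, growing 15% a month.
for month, volume in enumerate(forecast_daily_volume(2_000, monthly_growth=0.15), start=1):
    print(f"month {month}: budget for {volume:,.0f} requests/day")
```

Feeding each month's figure into the cost formula gives a month-by-month budget rather than a single number.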
LLM Cost Forecast Worksheet
| Variable | How to Estimate |
|---|---|
| Avg input tokens | Count system prompt + typical user message + context |
| Avg output tokens | Generate 20 sample outputs, measure average length |
| Daily requests | Peak day estimate + 20% buffer |
| Model pricing | Check provider docs — input and output priced separately |
Scenario Modelling
Build three scenarios: conservative (50% of expected volume), base (expected volume), and optimistic (200% of expected volume, if the workflow succeeds beyond expectations). The optimistic scenario is often the most financially important to model, because it is the one in which costs are highest: a workflow that takes off faster than anticipated can overrun its budget allocation several times over if the upside was not planned for. Having a cost ceiling and a scaling plan (when to switch to a cheaper model, when to implement caching, when to consider self-hosting) before launch means you are not improvising under financial pressure if the workflow succeeds.
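The scenario check reduces to a few multiplications against a ceiling. This is a minimal sketch; the $216 base forecast (the 5,000-interaction chatbot example) and the $300 ceiling are hypothetical inputs.

```python
SCENARIOS = {"conservative": 0.5, "base": 1.0, "optimistic": 2.0}


def scenario_costs(base_monthly_cost: float, ceiling: float) -> dict[str, tuple[float, bool]]:
    """Monthly cost per scenario and whether it breaches the cost ceiling."""
    results = {}
    for name, multiplier in SCENARIOS.items():
        cost = base_monthly_cost * multiplier
        results[name] = (cost, cost > ceiling)
    return results


# Hypothetical: $216/month base forecast with a $300/month ceiling.
for name, (cost, over) in scenario_costs(216.0, ceiling=300.0).items():
    flag = "  <- exceeds ceiling: trigger the scaling plan" if over else ""
    print(f"{name:>12}: ${cost:,.2f}{flag}")
```

Any scenario that is flagged should already have a matching step in the scaling plan.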
Refreshing the Forecast
A pre-launch forecast is the starting point, not the final word. Review actual costs against forecast weekly in the first month after any new AI workflow launches. Update the forecast model with actual token counts (which often differ from estimates) and actual request volumes. After four weeks of real data, your forecast accuracy improves significantly and you can set reliable monthly budget expectations. Make the forecast refresh a standard part of your monthly AI operations review alongside the actual cost reporting.
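One way to keep the refresh mechanical is to recompute the forecast's inputs directly from logged usage. This sketch assumes you log one record per API call with its day and token counts. `UsageRecord` and its fields are hypothetical and do not reflect any provider's schema, and the function expects at least one record.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class UsageRecord:
    """One logged API call (hypothetical fields, not a provider schema)."""
    day: str
    input_tokens: int
    output_tokens: int


def refreshed_inputs(records: list[UsageRecord]) -> dict[str, float]:
    """Replace pre-launch estimates with measured values from real traffic."""
    per_day = Counter(r.day for r in records)  # requests per calendar day
    return {
        "avg_input_tokens": mean(r.input_tokens for r in records),
        "avg_output_tokens": mean(r.output_tokens for r in records),
        "avg_daily_requests": len(records) / len(per_day),
        "peak_daily_requests": max(per_day.values()),
    }


logs = [
    UsageRecord("day-1", 1_100, 260),
    UsageRecord("day-1", 900, 240),
    UsageRecord("day-2", 1_000, 250),
]
print(refreshed_inputs(logs))
```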
Estimating Token Counts for New Workflows
Before a workflow is built, you cannot measure its actual token usage, but you can estimate it. For input tokens: write your system prompt and count its tokens. OpenAI's tokenizer tool, available free online, counts tokens the way OpenAI models do. Other providers tokenize differently, so for their models treat its count as an approximation or use the provider's own token counter. Estimate the average user message or input document length for your use case: if you are processing customer emails, sample ten real emails and measure their average token count. Add a context buffer if your workflow maintains conversation history. For output tokens: generate five to ten representative outputs using a prototype prompt and measure their average token length. These measurements give you realistic per-call estimates you can plug into the cost formula.
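The estimation steps can be scripted so the sample texts flow straight into the forecast. To stay dependency-free, this sketch uses the rough "about four characters per token for English text" rule of thumb in place of a real tokenizer. That heuristic is an assumption. For OpenAI models, the `tiktoken` package gives exact counts, and other providers offer their own counters. The function names are hypothetical.

```python
from statistics import mean

CHARS_PER_TOKEN = 4  # rough rule of thumb for English; real tokenizers vary


def approx_tokens(text: str) -> int:
    """Crude token estimate; swap in the model's own tokenizer for precision."""
    return max(1, round(len(text) / CHARS_PER_TOKEN))


def estimate_input_tokens(system_prompt: str, sample_inputs: list[str],
                          history_tokens: int = 0) -> float:
    """System prompt + average sampled input + conversation-history buffer."""
    return (approx_tokens(system_prompt)
            + mean(approx_tokens(s) for s in sample_inputs)
            + history_tokens)


def estimate_output_tokens(sample_outputs: list[str]) -> float:
    """Average length of prototype outputs."""
    return mean(approx_tokens(s) for s in sample_outputs)
```

Pass in the ten sampled emails as `sample_inputs` and the prototype outputs as `sample_outputs`. The two results are the per-call token averages the cost formula expects.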
The biggest forecasting error is relying on a "typical" input length when inputs are highly variable. If your workflow processes documents ranging from one-page briefs to fifty-page reports, a typical document is not representative, and a small sample easily misses the long documents that drive a disproportionate share of cost. Because cost is linear in tokens, what you need is the true mean across the whole mix. For variable-length inputs, forecast separately for short, medium, and long inputs, then weight each by its expected share of requests to produce a blended average. This approach typically produces forecasts within 20% of actual costs, compared to typical-input approaches that can underestimate by 50% or more for distributions with a heavy long tail.
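The blended average is a weighted sum over segments. In this sketch, the document mix and the tokens-per-segment figures are made up for illustration.

```python
def blended_avg_tokens(segments: dict[str, tuple[float, float]]) -> float:
    """Weighted average input tokens.

    segments maps a label to (share_of_requests, avg_tokens_in_that_segment);
    shares must sum to 1.
    """
    total_share = sum(share for share, _ in segments.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares sum to {total_share}, expected 1.0")
    return sum(share * tokens for share, tokens in segments.values())


# Hypothetical document mix: mostly short briefs, a tail of long reports.
mix = {
    "short (1-2 pages)": (0.70, 1_000),
    "medium (5-10 pages)": (0.25, 6_000),
    "long (50 pages)": (0.05, 35_000),
}
print(blended_avg_tokens(mix))
```

In this made-up mix, the blended average is 3,950 tokens, nearly four times the 1,000-token "typical" brief. Long reports account for only 5% of requests but about 44% of input tokens.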
Scenario Planning for Growth
A cost forecast that only models current volume gives you a point estimate but misses the most important planning question: what happens when this workflow scales? Alongside the launch scenarios, build three growth scenarios into every cost forecast: current volume, 5× current volume, and 20× current volume. The 5× scenario is what happens if the workflow succeeds and the team starts using it more broadly. The 20× scenario is what happens if you integrate it into a customer-facing product or automate it across a high-volume process. The 20× scenario often reveals a cost inflection point: the volume at which caching, model downsizing, or prompt compression stop being nice-to-haves and become essential for the workflow to remain economically viable.
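One way to locate the inflection point is to ask, at each growth multiple, how long an optimisation takes to repay its one-off build cost. This sketch treats savings as a fixed fraction of the monthly bill. The $216 base cost, the 30% caching saving, and the $3,000 implementation cost are all hypothetical.

```python
GROWTH_MULTIPLES = (1, 5, 20)


def payback_months(monthly_cost: float, saving_fraction: float,
                   implementation_cost: float) -> float:
    """Months for an optimisation's savings to repay its one-off build cost."""
    monthly_saving = monthly_cost * saving_fraction
    if monthly_saving == 0:
        return float("inf")  # nothing saved, never pays back
    return implementation_cost / monthly_saving


# Hypothetical: $216/month today; caching assumed to save 30% and to cost
# $3,000 of engineering time to build.
for multiple in GROWTH_MULTIPLES:
    cost = 216.0 * multiple
    months = payback_months(cost, saving_fraction=0.30, implementation_cost=3_000)
    print(f"{multiple:>2}x volume: ${cost:,.0f}/month, caching pays back in {months:.1f} months")
```

With these assumptions, caching takes almost four years to pay for itself at current volume but under three months at 20×. That shift is the kind of inflection the 20× scenario is meant to expose.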
Build your cost forecast before writing a single line of code for any new AI workflow. The thirty minutes it takes is the cheapest form of infrastructure planning available, and it prevents the budget surprises that create friction between technical and finance teams when a workflow unexpectedly becomes expensive at scale.
Updating the Forecast With Real Data
A pre-launch forecast becomes a benchmark once the workflow is live. Compare actual costs against forecast weekly for the first month. When actual costs deviate from forecast by more than 20% in either direction, investigate why. Common causes are inputs that are longer than estimated, outputs that are more verbose than expected, and more retry traffic than anticipated. Identifying the source of the variance improves future forecasts and often surfaces optimisation opportunities. The forecast update is also when you discover whether your input distribution assumptions were correct, and correcting them early prevents larger budget surprises as the workflow's volume grows.
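Comparing each cost driver separately, instead of only the total bill, points directly at the source of a variance. This is a minimal sketch. The driver names and figures are hypothetical, and every forecast value is assumed to be non-zero.

```python
def variance_report(forecast: dict[str, float], actual: dict[str, float],
                    threshold: float = 0.20) -> dict[str, float]:
    """Drivers whose actual value deviates from forecast by more than threshold."""
    flagged = {}
    for driver, expected in forecast.items():
        deviation = (actual[driver] - expected) / expected  # relative deviation
        if abs(deviation) > threshold:
            flagged[driver] = deviation
    return flagged


forecast = {"avg_input_tokens": 800, "avg_output_tokens": 200,
            "daily_requests": 5_000, "retry_rate": 0.02}
actual = {"avg_input_tokens": 1_050, "avg_output_tokens": 210,
          "daily_requests": 4_800, "retry_rate": 0.06}
for driver, dev in variance_report(forecast, actual).items():
    print(f"{driver}: {dev:+.0%} vs forecast")
```

In this example, volume and output length are on target. Longer inputs and extra retries explain the overrun.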
Build the cost forecast for your next planned AI workflow before starting development. Count your prompt tokens with a tokenizer that matches your model (OpenAI's tokenizer for OpenAI models; for other providers, their own token counter, or OpenAI's as a rough approximation), sample real inputs to estimate input lengths, and generate prototype outputs to measure output lengths. The result is a realistic budget estimate rather than a post-hoc surprise.
Cost Optimisation Scenarios in the Forecast
Your cost forecast becomes significantly more useful if it includes optimisation scenarios alongside baseline projections. Alongside the baseline (current approach, current model), model three optimisation options: prompt compression (estimated 20–30% input token reduction), model downsizing (estimated cost reduction if a cheaper model meets your quality threshold), and batch processing (50% cost reduction for non-time-sensitive requests). For each optimisation, estimate the implementation effort and the resulting cost reduction. This pre-launch analysis tells you which optimisations are worth building into the initial implementation and which can be deferred until the workflow is running and generating real cost data.
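Each option changes a different term of the cost formula, so it helps to model them as variations on a single baseline configuration. This is a sketch under stated assumptions. The baseline is the chatbot example at 5,000 requests/day. Compression is set to 25% (within the 20–30% range), the "cheaper model" prices of $0.25/$1.25 per million tokens are hypothetical, 60% of traffic is assumed to be batchable at the 50% discount, and the effort hours are invented placeholders.

```python
from dataclasses import dataclass, replace

BATCH_DISCOUNT = 0.5  # assumed: batch requests billed at half the standard rate


@dataclass(frozen=True)
class Config:
    input_tokens: float         # average input tokens per request
    output_tokens: float        # average output tokens per request
    input_price_per_m: float    # $ per million input tokens
    output_price_per_m: float   # $ per million output tokens
    daily_requests: float
    batch_share: float = 0.0    # fraction of requests routed through a batch API

    def monthly_cost(self, days: int = 30) -> float:
        per_request = (self.input_tokens * self.input_price_per_m
                       + self.output_tokens * self.output_price_per_m) / 1_000_000
        blended_rate = 1 - BATCH_DISCOUNT * self.batch_share
        return per_request * self.daily_requests * days * blended_rate


baseline = Config(800, 200, 0.80, 4.00, 5_000)
# Each option: (modified config, hypothetical engineering hours to implement).
options = {
    "prompt compression": (replace(baseline, input_tokens=800 * 0.75), 8),
    "model downsizing": (replace(baseline, input_price_per_m=0.25,
                                 output_price_per_m=1.25), 16),
    "batch processing": (replace(baseline, batch_share=0.6), 24),
}

base_cost = baseline.monthly_cost()
print(f"baseline: ${base_cost:,.2f}/month")
for name, (config, hours) in options.items():
    cost = config.monthly_cost()
    print(f"{name}: ${cost:,.2f}/month, saves ${base_cost - cost:,.2f}/month "
          f"for ~{hours} h of work")
```

Setting the savings next to the effort is what separates options to build in from the start and options to defer.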
Pre-launch optimisation analysis prevents the common pattern of launching with an expensive configuration, discovering high costs in the first month, and then spending engineering time optimising what could have been built more efficiently from the start. A thirty-minute optimisation analysis before implementation is more efficient than a week of post-launch optimisation work on a live production system.
Integrating LLM Cost Into Product Pricing
For businesses building AI-powered products or features, LLM API costs need to be integrated into product pricing and unit economics, not treated as a separate overhead that is absorbed without analysis. Calculate your LLM cost per user per month (or per transaction, or per output, depending on your pricing model) and make sure your pricing covers it. A common failure mode: a business prices an AI feature on customer value without calculating the underlying LLM cost, then discovers that the feature is profitable at small scale but loss-making at scale. LLM costs grow linearly with usage, but the price was set on the assumption, familiar from traditional software, that cost per user falls as volume grows. Build the unit economics model before pricing: total and per-user cost, including fixed costs such as tooling and engineering, at 1,000 users, at 10,000 users, and at 100,000 users. Identify the break-even volume and the optimisations required at each scale point. This analysis, done before launch, prevents the pricing regret that comes from discovering unfavourable unit economics after you have committed to a price in the market.
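A small model that separates variable LLM cost from fixed cost shows whether there is a break-even volume at all. This is a minimal sketch. The $10 price, the $3 per-user LLM cost, and the $20,000 of monthly fixed cost are hypothetical, and per-user usage is assumed to stay constant as the user base grows.

```python
def unit_economics(users: int, llm_cost_per_user: float, fixed_monthly: float,
                   price_per_user: float) -> dict[str, float]:
    """Per-user cost and margin at a given user count."""
    total_cost = users * llm_cost_per_user + fixed_monthly
    cost_per_user = total_cost / users
    return {"cost_per_user": cost_per_user,
            "margin_per_user": price_per_user - cost_per_user}


def break_even_users(llm_cost_per_user: float, fixed_monthly: float,
                     price_per_user: float) -> float:
    """Users needed to cover fixed costs; inf if each user loses money."""
    contribution = price_per_user - llm_cost_per_user
    if contribution <= 0:
        return float("inf")  # no volume fixes a negative per-user margin
    return fixed_monthly / contribution


for users in (1_000, 10_000, 100_000):
    ue = unit_economics(users, llm_cost_per_user=3.0, fixed_monthly=20_000,
                        price_per_user=10.0)
    print(f"{users:>7} users: cost ${ue['cost_per_user']:.2f}, "
          f"margin ${ue['margin_per_user']:.2f} per user")
print(f"break-even: {break_even_users(3.0, 20_000, 10.0):,.0f} users")
```

If heavy usage pushes `llm_cost_per_user` above the price, `break_even_users` returns infinity. This matches the failure mode described above: growth makes losses larger instead of smaller.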
Budgeting for AI Infrastructure Beyond API Costs
LLM API costs are the most visible AI infrastructure cost, but not the only one. A complete AI infrastructure budget includes: API costs (the most variable component), AI tool subscriptions (flat monthly fees), orchestration and observability tooling (Portkey, Helicone, LangSmith — usage-based or subscription), compute costs for any self-hosted components, and engineering time for building and maintaining AI workflows. The engineering time component is often the largest cost and the one most frequently omitted from initial budget estimates. A workflow that costs $50 per month in API fees but required 40 hours to build and requires 2 hours per month to maintain has a true cost significantly higher than the API fees alone. Build the full-stack cost estimate — including engineering time at your team’s fully-loaded rate — before evaluating whether an AI workflow is economically justified.
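The $50-a-month example can be worked through once an hourly rate and an amortisation period are chosen. In this sketch, the $100/hour fully-loaded rate, the 12-month amortisation of build time, and the absence of subscription fees are all assumptions.

```python
def true_monthly_cost(api_monthly: float, subscriptions_monthly: float,
                      build_hours: float, maintain_hours_monthly: float,
                      hourly_rate: float, amortise_months: int = 12) -> float:
    """API + subscriptions + amortised build time + ongoing maintenance time."""
    build = build_hours * hourly_rate / amortise_months
    maintain = maintain_hours_monthly * hourly_rate
    return api_monthly + subscriptions_monthly + build + maintain


# The example workflow: $50/month API, 40 h to build, 2 h/month to maintain.
total = true_monthly_cost(50, 0, build_hours=40, maintain_hours_monthly=2,
                          hourly_rate=100)
print(f"${total:,.2f}/month, {total / 50:.1f}x the API fees")
```

Under these assumptions, the workflow costs about $583 a month, nearly twelve times its API bill.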