This is Part 2 of How to Vibe a Business, a four-part series on the economics of building with AI in 2026. Part 1 covers the full landscape. This post goes deep on the single biggest financial risk: the token tax.
Everything you learned about SaaS economics in the last decade is wrong — or at least incomplete — the moment you put an LLM in the backend.
Traditional SaaS has a beautiful cost structure. You write the software once. You deploy it. Your 1,000th user costs essentially nothing more to serve than your 10th. Hosting scales in steps, but the marginal cost per user trends toward zero. That is why SaaS gross margins sit at 70-85% and investors love the model. The cost of goods sold barely moves as revenue compounds.
AI-powered SaaS does not work this way.
Your 1,000th user costs exactly as much to serve as your first. Every request hits an inference API. Every API call costs real money. The meter is always running. And the bill scales linearly with usage — not with infrastructure tiers, not with headcount, but with every single interaction your users have with the AI feature you shipped.
This is the token tax. Most solo builders discover it after launch, not before. By then the margins are already bleeding.
What the token tax actually costs
Let’s get specific. As of early 2026, here is what the major model providers charge for API access:
- Claude Sonnet 4.6: $3 per million input tokens / $15 per million output tokens
- GPT-5.4: $2.50 per million input tokens / $15 per million output tokens
At these prices, output tokens cost 5-6x as much as input tokens: models generate output one token at a time, while input can be processed in parallel, so generation is the computationally expensive side of the meter. And for most AI features — summarization, content generation, analysis, code assistance — the output is the product. That is the side of the meter you care about.
A typical AI feature uses 1,000 to 4,000 tokens per request. A short summary might be 500 tokens. A detailed analysis runs 2,000-3,000. Code generation or long-form writing can blow past 4,000 easily.
Here is the math for one user on a $29/month subscription, assuming average output of 2,000 tokens per request at $15 per million output tokens:
Light usage — 50 requests/month:
50 × 2,000 tokens × $15/1M = $1.50/user/month
That is 5.2% of $29 in API costs. Manageable. You barely notice it.
Heavy usage — 200 requests/month:
200 × 2,000 tokens × $15/1M = $6.00/user/month
Now it is 20.7% of revenue. You notice it.
Power usage — 500 requests/month:
500 × 2,000 tokens × $15/1M = $15.00/user/month
That is 51.7% of revenue going to a single line item in your COGS. And you still have hosting, payment processing, and every other cost to cover.
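The arithmetic above reduces to a one-line function. A minimal sketch in Python, using the assumptions from this section (2,000 output tokens per request, $15 per million output tokens, a $29/month plan):

```python
OUTPUT_PRICE_PER_M = 15.00  # $ per million output tokens (assumed above)
PLAN_PRICE = 29.00          # $/month subscription

def api_cost_per_user(requests_per_month: int,
                      tokens_per_request: int = 2_000,
                      price_per_million: float = OUTPUT_PRICE_PER_M) -> float:
    """Monthly API spend for one user, counting output tokens only."""
    return requests_per_month * tokens_per_request * price_per_million / 1_000_000

for label, reqs in [("light", 50), ("heavy", 200), ("power", 500)]:
    cost = api_cost_per_user(reqs)
    print(f"{label}: ${cost:.2f}/user/mo = {cost / PLAN_PRICE:.1%} of revenue")
```

Swap in your own token counts and request volumes; the shape of the curve is what matters, and it is linear in usage.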
Plug these numbers into the Cost Per Unit Calculator. Then open the Profit Margin Calculator and watch what happens to your gross margin as API costs scale.
The margin erosion curve
The numbers above are per-user averages. In practice, your user base is a distribution: some users are light, some are heavy, and a small percentage are power users who consume an outsized share of your API budget. The problem is that your heaviest users are usually your most engaged — the ones you cannot afford to lose and the ones most likely to churn if you throttle them.
Here is the margin curve for a $29/month AI SaaS product, assuming 2,000 average output tokens per request:
| Usage level | Requests/mo | API cost/user/mo | Remaining revenue | Gross margin |
|---|---|---|---|---|
| Light | 50 | $1.50 | $27.50 | 94.8% |
| Medium | 200 | $6.00 | $23.00 | 79.3% |
| Heavy | 500 | $15.00 | $14.00 | 48.3% |
| Power | 1,000 | $30.00 | −$1.00 | −3.4% |
Read that last row again. A power user on your $29/month plan costs you $30 in API fees alone. You are paying a dollar a month for the privilege of serving them. And that is before hosting, Stripe’s 2.9%, customer support, or any other cost.
A product that looks like a 79% gross margin business at median usage becomes a negative-margin business for your most engaged cohort. If 10% of your users are power users and 30% are heavy users, your blended gross margin might land around 55-60% — well below the 70-85% SaaS benchmark that investors expect and your financial model probably assumes.
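You can check how a blend shakes out with a few lines of arithmetic. A sketch assuming a hypothetical cohort split (10% power and 30% heavy, as above, with the rest divided evenly between light and medium) and the per-user API costs from the table; this counts API spend only, so hosting, support, and payment fees push the real blended margin further down, toward the 55-60% range:

```python
PRICE = 29.00  # $/month plan, from the table above

# (share of user base, API cost per user per month) — assumed mix
cohorts = {
    "light":  (0.30, 1.50),
    "medium": (0.30, 6.00),
    "heavy":  (0.30, 15.00),
    "power":  (0.10, 30.00),
}

blended_cost = sum(share * cost for share, cost in cohorts.values())
blended_margin = (PRICE - blended_cost) / PRICE
print(f"blended API cost ${blended_cost:.2f}/user/mo, "
      f"gross margin {blended_margin:.1%} before other COGS")
```

With this mix the API line alone drags gross margin into the mid-60s; layer on the rest of your COGS and you land well under the traditional SaaS benchmark.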
Why this is different from bandwidth costs
Experienced SaaS operators will object: “Infrastructure costs have always scaled with usage. We handled this with CDN tiers and compute autoscaling.” True. But the cost dynamics are fundamentally different in three ways.
First, the slope is steeper. Bandwidth costs per additional user in traditional SaaS are fractions of a cent. Token costs per additional user are dollars. The variable cost component of COGS goes from negligible to dominant.
Second, the cost does not decrease with scale. Traditional infrastructure has economies of scale — reserved instances, committed-use discounts, amortized fixed costs. AI inference pricing is flat. Your millionth API call costs the same as your first. There is no volume tier that drops output token pricing by 80%. (Provider pricing has fallen over time as competition increases, but you cannot plan a business on the assumption that your largest cost input will get cheaper.)
Third, the cost correlates with value, not with waste. In traditional SaaS, high-bandwidth users were often serving static assets — you could cache, compress, and CDN your way to lower costs. AI usage cannot be cached in the same way because every request is unique. The user asking your AI to analyze their specific data set requires a fresh inference call. The cost is the product.
a16z identified this problem early. Their research found that gross margins for AI companies often fall in the 50-60% range — well below the 60-80%+ benchmark for comparable SaaS businesses. That analysis was written when model pricing was higher than today, but the structural point holds: AI inference is a variable cost that behaves like cost-of-goods in a physical product business, not like infrastructure in a software business.
The SaaS CFO put it more bluntly, calling the current shift a “SaaSpocalypse” — the collision between AI’s promise and the margin compression it creates for anyone building with it.
You are not running a zero-marginal-cost software business anymore. You are running something closer to a services business with software distribution. The sooner you internalize that, the sooner you can price and architect accordingly.
Three cost reduction levers
The token tax is real, but it is not fixed. Smart architecture decisions can cut your per-request cost by 60-70% without degrading the user experience. Three levers, in order of impact:
1. Model routing
Not every request needs your most powerful model. A simple classification task, a short summary, or a formatting operation does not require Sonnet-class reasoning. Route simple tasks to cheaper, faster models and reserve expensive models for complex work.
Practical example: an AI writing assistant that uses Claude Haiku ($0.25/$1.25 per million tokens) for autocomplete suggestions and Claude Sonnet ($3/$15) for full document analysis. If 70% of requests are autocomplete and 30% are analysis, your blended output cost drops from $15/M to roughly $5.40/M — a 64% reduction.
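The routing decision itself can be as simple as a lookup on request type. A minimal sketch, using hypothetical task labels and the per-million output prices quoted above (a real router might classify free-form requests first, which is the engineering cost mentioned below):

```python
# $/million output tokens per tier, from the prices quoted above
MODEL_PRICES = {"haiku": 1.25, "sonnet": 15.00}

# hypothetical mapping from request type to model tier
ROUTES = {
    "autocomplete": "haiku",
    "formatting": "haiku",
    "analysis": "sonnet",
}

def route(task: str) -> str:
    """Send known-simple tasks to the cheap model; default to the strong one."""
    return ROUTES.get(task, "sonnet")

# blended output price if 70% of traffic is autocomplete and 30% is analysis
blend = (0.7 * MODEL_PRICES[route("autocomplete")]
         + 0.3 * MODEL_PRICES[route("analysis")])
print(f"blended output price: ${blend:.2f}/M vs $15.00/M unrouted")
```

Note the default: when the classifier is unsure, fall back to the expensive model. Misrouting a complex task to a weak model costs you a customer; misrouting a simple one costs you fractions of a cent.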
2. Prompt caching
If your system prompt, few-shot examples, or context documents are the same across many requests, you are paying full price to re-send the same tokens every time. Anthropic’s prompt caching offers a 90% discount on cached input tokens — you pay a small write cost once and then read at 10% of the standard input price for subsequent requests.
For a typical application where 80% of input tokens are reusable context (system prompt + few-shot examples + shared documents), prompt caching cuts your input costs by roughly 70%. Input is cheaper than output to begin with, but on high-context applications where you are sending 3,000-5,000 input tokens per request, that savings adds up.
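The savings arithmetic is worth writing down, because the cache write carries a premium. A sketch assuming Anthropic's published multipliers (cache writes at 1.25x the base input price, cache reads at 0.1x) and the 80% reusable-context share from this example:

```python
BASE_INPUT = 3.00        # $/M input tokens (Sonnet-class price from above)
CACHE_WRITE_MULT = 1.25  # one-time premium when the prefix is first cached
CACHE_READ_MULT = 0.10   # discounted reads on subsequent requests

def cached_input_cost(tokens: int, cacheable_share: float, requests: int) -> float:
    """Total input spend across `requests` calls sharing a reusable prefix."""
    cacheable = tokens * cacheable_share
    unique = tokens - cacheable
    write = cacheable * CACHE_WRITE_MULT * BASE_INPUT / 1e6            # first call
    reads = cacheable * CACHE_READ_MULT * BASE_INPUT * (requests - 1) / 1e6
    fresh = unique * BASE_INPUT * requests / 1e6                       # never cached
    return write + reads + fresh

naive = 4_000 * BASE_INPUT * 200 / 1e6        # 200 calls, no caching
cached = cached_input_cost(4_000, 0.80, 200)
print(f"${naive:.2f} -> ${cached:.2f} input spend ({1 - cached / naive:.0%} saved)")
```

At 200 requests the one-time write premium is noise and the savings converge on the roughly 70% figure above; at very low request volumes the premium matters more, so caching rarely-reused context can be a net loss.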
3. Batch processing
Anthropic offers 50% off for batch API requests that do not require real-time responses. If your product includes any background processing — nightly report generation, bulk analysis, content preprocessing — batching those jobs halves the cost.
Combined impact. Take the heavy user from the table above: 200 requests/month, $6.00 in API costs. Apply model routing (the 64% blended reduction from sending 70% of requests to a cheaper model), prompt caching (roughly 70% off input costs), and batching (50% off the 20% of requests that are background jobs). The blended cost drops from $6.00 to roughly $2.10/user/month. That moves gross margin from 79.3% back up to 92.8%, firmly in healthy SaaS territory.
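The levers compose multiplicatively, and the stacking is easy to sanity-check. A rough sketch applying routing and batching to the $6.00 output figure (caching acts on input costs, which are not part of that $6.00, so the result lands in the same ballpark as the estimate above rather than exactly on it):

```python
PRICE = 29.00
base_cost = 6.00  # heavy user: 200 req x 2,000 output tokens at $15/M

routing_factor = 5.375 / 15.0    # blended $5.375/M vs flagship-only $15/M
batch_factor = 0.8 + 0.2 * 0.5   # 20% of requests batched at half price

optimized = base_cost * routing_factor * batch_factor
margin = (PRICE - optimized) / PRICE
print(f"${base_cost:.2f} -> ${optimized:.2f}/user/mo, gross margin {margin:.1%}")
```

The order you apply the factors in does not matter; what matters is that each lever multiplies the others, so even three partial wins compound into a large one.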
None of this is free to implement. Model routing requires classification logic. Prompt caching requires consistent context architecture. Batching requires async job infrastructure. Budget 1-2 weeks of engineering time for the full stack. But the ROI is immediate and compounds with every user you add.
How to model this before you launch
If you are building with AI in the backend, here is the minimum viable financial model you need before you ship:
Step 1: Estimate per-user API cost. Count the AI-powered features in your product. For each one, estimate requests per user per month and average tokens per request. Multiply by your provider’s per-token pricing. Use the Cost Per Unit Calculator — plug API costs into the variable cost field alongside payment processing and any other per-user costs. This gives you your true cost to serve one customer.
Step 2: Check your margins. Take the per-user cost from step 1 and your planned price. Open the Profit Margin Calculator and verify that your gross margin holds above 70% at median usage and above 50% at heavy usage. If heavy-user margin falls below 50%, you need to either raise your price, implement the cost reduction levers above, or add usage caps.
Step 3: Model break-even with real COGS. Traditional SaaS break-even models treat COGS as near-zero and focus on fixed costs. Your model cannot do that. Use the Break-Even Calculator with your actual variable cost per user — including API costs — as the variable cost input. The break-even point will be higher than a traditional SaaS model predicts, and your runway to get there will be shorter.
Step 4: Stress-test for success. This is the one founders skip. Model what happens if your product works — if you get 10x the users you planned for. In traditional SaaS, 10x users is pure upside. In AI SaaS, 10x users is 10x API costs. Use the Subscription Revenue Calculator to model revenue at scale, then compare it against the API cost at scale. The gap between those two lines is your actual margin at scale, and it might be a lot thinner than you assumed.
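The four steps above reduce to a few lines of arithmetic you can keep next to the calculators. A minimal sketch with illustrative numbers: the $1,000/month fixed-cost figure is hypothetical, the rest comes from the examples in this post:

```python
PRICE = 29.00
FIXED_COSTS = 1_000.00               # hosting, tools, etc. (hypothetical)
STRIPE_FEE = 0.029 * PRICE + 0.30    # payment processing per subscription

def unit_economics(api_cost_per_user: float) -> tuple[float, float]:
    """Return (gross margin, break-even user count) for a per-user API spend."""
    variable = api_cost_per_user + STRIPE_FEE
    contribution = PRICE - variable
    return contribution / PRICE, FIXED_COSTS / contribution

for label, cost in [("median", 6.00), ("heavy", 15.00)]:
    margin, users = unit_economics(cost)
    print(f"{label}: margin {margin:.1%}, break-even at {users:.0f} users")

# step 4 stress test: revenue and the API bill both scale linearly with users
users_10x = 1_000
print(f"at {users_10x} users: ${users_10x * PRICE:,.0f} revenue "
      f"vs ${users_10x * 6.00:,.0f} API bill at median usage")
```

Run it with your own numbers. If the heavy-usage margin prints below 50%, that is your cue to revisit price, cost levers, or usage caps before launch rather than after.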
Run all four calculators. Print the results. Tape them to your monitor. These numbers are not a one-time exercise — they are the operating constraints of your business.
The bottom line
The token tax is not a bug. It is the defining economic characteristic of AI-powered software in 2026. Every time your user clicks a button that triggers an LLM call, you are spending real money. Not amortized-over-millions money. Not fraction-of-a-cent money. Real, variable, per-request, adds-up-fast money.
This does not mean you should not build with AI. The products you can create with LLMs in the backend are genuinely better than what was possible three years ago. But it means you need to treat your COGS the way a physical product business does — model it, monitor it, optimize it, and price for it.
If your financial model has a line for “hosting: $200/month” and no line for “AI inference: $X per user per month,” your model is wrong. If your pricing assumes 80% gross margins without accounting for token costs at heavy usage, your pricing is wrong. If you have not stress-tested what happens to your margins when your product succeeds and usage spikes, you are building on assumptions that will break at exactly the moment you can least afford them to.
Model it now. The Cost Per Unit Calculator and Profit Margin Calculator take five minutes. The alternative is discovering your margin problem in month four of a product that is growing faster than your bank account can sustain.
Next in the series: Part 3 — Pricing AI Products covers how to set prices that account for the token tax without killing conversion. And Part 4 — SaaS Metrics After Launch covers the dashboard you need once the product is live.