Cost

AI Cost Will Become a Data Platform Problem

Token bills do not stay on the AI team's desk. They migrate to yours.

9 min read · by Team BricksNotes
Tags: enterprise AI, agentic AI, data professionals, AI cost, data platform, model choice, FinOps
01

The quarterly review nobody saw coming

An e-commerce company ships a smart product-search feature in March. It uses a frontier model for every query. The team is proud.

By June, the feature is wildly popular. By July, the AI bill is fourteen times the original estimate. By the August quarterly review, the CFO is asking the data platform leader — not the AI team — why the cost line keeps doubling.

The platform leader did not own the feature. They now own the bill.

02

The real problem

Models get cheaper per token, but usage grows faster than prices fall, so the total bill still rises. A feature that cost two hundred dollars a month in pilot can cost twenty thousand a month in production, and nobody is quite sure why.

AI cost is variable, per-request, and grows with success. It behaves nothing like the SaaS and infrastructure lines next to it on the bill.

03

The Context Advantage view

Cost is the third of the four C's. It rarely sinks an AI program in year one. It almost always shows up in year two, and by then the architecture decisions that drive the bill are baked in.

Cost discipline is a platform pattern, not a model choice. It belongs in the same conversation as latency, reliability, and observability.

04

In plain language

Every AI call burns tokens. Tokens cost money. Bigger model, longer context, more steps in an agent loop — more money. Multiply by traffic, by retries, by background jobs, and the bill compounds in ways nobody modeled in the pilot.
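The multiplication described above can be written down as a rough sketch. The prices, token counts, and traffic figures here are made-up illustrations, not any vendor's rates, and the model assumes every step of an agent loop uses the same token counts, which understates loops whose context grows with each step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical price in dollars per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_call(price: Price, tokens_in: int, tokens_out: int, steps: int = 1) -> float:
    """Dollar cost of one request; `steps` is the number of agent-loop iterations."""
    per_step = (tokens_in * price.input_per_mtok
                + tokens_out * price.output_per_mtok) / 1_000_000
    return per_step * steps


def monthly_cost(price: Price, tokens_in: int, tokens_out: int,
                 calls_per_month: int, steps: int = 1, retry_rate: float = 0.0) -> float:
    """Scale the per-call cost by traffic and by the share of calls that are retried."""
    return cost_per_call(price, tokens_in, tokens_out, steps) * calls_per_month * (1 + retry_rate)


# Illustrative numbers only: same feature, pilot traffic versus production traffic.
frontier = Price(input_per_mtok=3.0, output_per_mtok=15.0)
pilot = monthly_cost(frontier, tokens_in=2000, tokens_out=500, calls_per_month=10_000)
production = monthly_cost(frontier, tokens_in=2000, tokens_out=500,
                          calls_per_month=1_000_000, retry_rate=0.1)
print(f"pilot: ${pilot:,.0f}/month, production: ${production:,.0f}/month")
```

Nothing about the model or the prompt changed between the two figures. Only traffic and retries did, which is why the pilot number is a poor predictor of the production bill.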

Bending the curve is mostly about not paying for what you do not need: smaller model when smaller is fine, cached answer when the question is the same, shorter context when the long one was lazy.

05

A real-world example: customer support at an e-commerce brand

Forty percent of incoming support questions were repeats: where is my order, return policy, sizing chart. The team had been sending all of them to a frontier model. After routing those to a small model with a cache, the cost of those conversations dropped to under five percent of the previous spend, with no measurable change in resolution rate.

The expensive model still handled the genuinely complex cases. Routing was the only change.
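A minimal sketch of that kind of routing, under stated assumptions: the phrase list, the `SupportRouter` class, and the two model callables are all hypothetical stand-ins, and substring matching takes the place of whatever intent classifier the team actually used. The cache is keyed on the question text alone, so it only suits answers that are the same for every customer (return policy, sizing chart). An order-status answer would need a customer-specific key or no cache at all.

```python
from typing import Callable, Dict

# Hypothetical phrases that mark a repeat question with a customer-independent answer.
REPEAT_PHRASES = ("return policy", "sizing chart", "size guide")


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share one cache key."""
    return " ".join(question.lower().split())


def is_repeat(normalized_question: str) -> bool:
    return any(phrase in normalized_question for phrase in REPEAT_PHRASES)


class SupportRouter:
    """Send repeat questions to a small model behind a cache; everything else to the frontier model."""

    def __init__(self, small_model: Callable[[str], str],
                 frontier_model: Callable[[str], str]) -> None:
        self.small_model = small_model
        self.frontier_model = frontier_model
        self.cache: Dict[str, str] = {}
        self.calls = {"cache": 0, "small": 0, "frontier": 0}

    def answer(self, question: str) -> str:
        key = normalize(question)
        if is_repeat(key):
            if key in self.cache:
                self.calls["cache"] += 1      # no model call at all
                return self.cache[key]
            reply = self.small_model(question)
            self.cache[key] = reply
            self.calls["small"] += 1
            return reply
        self.calls["frontier"] += 1           # complex case: keep the expensive model
        return self.frontier_model(question)
```

The `calls` counter is there so the split between cache hits, small-model calls, and frontier calls can be checked against the bill rather than assumed.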

06

A practical way to act this week

Put cost on the same dashboard as latency for every AI feature you run. Model, tokens in, tokens out, latency, cost per call. If you cannot see it, you cannot bend it.
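One way to picture the data behind such a dashboard is a single record per call, rolled up by feature. The sketch below is an assumption about shape, not a prescribed schema: the `CallRecord` fields mirror the five items listed above plus a hypothetical `feature` label, and the rollup is plain Python rather than any particular warehouse or metrics tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CallRecord:
    feature: str        # hypothetical label for the product feature that made the call
    model: str
    tokens_in: int
    tokens_out: int
    latency_ms: float
    cost_usd: float


def summarize(records: Iterable[CallRecord]) -> List[Dict[str, object]]:
    """Roll calls up per feature, most expensive feature first."""
    totals = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0, "latency_ms": 0.0})
    for r in records:
        t = totals[r.feature]
        t["calls"] += 1
        t["cost_usd"] += r.cost_usd
        t["latency_ms"] += r.latency_ms
    rows = [
        {
            "feature": feature,
            "calls": t["calls"],
            "total_cost_usd": t["cost_usd"],
            "cost_per_call_usd": t["cost_usd"] / t["calls"],
            "avg_latency_ms": t["latency_ms"] / t["calls"],
        }
        for feature, t in totals.items()
    ]
    return sorted(rows, key=lambda row: row["total_cost_usd"], reverse=True)


def top_expensive(records: Iterable[CallRecord], n: int = 5) -> List[Dict[str, object]]:
    """The n features with the highest total cost."""
    return summarize(records)[:n]
```

Keeping latency and cost in the same row is the point: a change that lowers one and raises the other is visible in a single view.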

Pick your most expensive feature. Ask one question: could a smaller model answer eighty percent of these queries acceptably? If yes, route them. The first week will pay for the work.
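A back-of-the-envelope version of that question, as a sketch. It assumes every query costs about the same on the large model, which is a simplification (easy queries are often shorter and cheaper than average), and the price ratio between the small and large model is a hypothetical input you would replace with your own numbers.

```python
def routed_cost(current_monthly_cost: float, routable_share: float,
                small_to_large_price_ratio: float) -> float:
    """Estimated monthly cost after moving a share of traffic to a cheaper model.

    routable_share: fraction of queries the smaller model can handle (0 to 1).
    small_to_large_price_ratio: small-model cost per query divided by
        large-model cost per query (hypothetical; measure it for your workload).
    """
    if not 0.0 <= routable_share <= 1.0:
        raise ValueError("routable_share must be between 0 and 1")
    if small_to_large_price_ratio < 0.0:
        raise ValueError("small_to_large_price_ratio must not be negative")
    kept = current_monthly_cost * (1.0 - routable_share)
    moved = current_monthly_cost * routable_share * small_to_large_price_ratio
    return kept + moved


# Illustrative only: a $20,000/month feature, 80% routable,
# with a small model assumed to cost one tenth as much per query.
estimate = routed_cost(20_000, routable_share=0.8, small_to_large_price_ratio=0.1)
print(f"estimated cost after routing: ${estimate:,.0f}/month")
```

The estimate is only as good as the two inputs, so it is worth measuring the routable share on a sample of real traffic before committing to a number.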

07

What this means for data professionals

Platform engineers own the routing, caching, and observability. Data engineers own the cost telemetry as a first-class dataset. AI engineers design the agent loops to fit inside budgets. Data leaders own the per-team budgets and the alerting policy — before the bill, not after.

08

The common mistake

Treating every AI request as equal. Sending easy questions and hard questions to the same frontier model because it 'works.' It works until the bill arrives.

09

The better way

Build a routing layer in front of your models. Classify each request — easy, medium, hard. Route accordingly. Cache aggressively at the semantic layer, not just the HTTP layer. Cap context windows by default. Set per-team budgets with alerts at fifty, seventy-five, and ninety percent. Review the top five most expensive features every month, the same way you review the top five slowest queries.
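The routing and context-cap parts of that list can be sketched in a few lines. Everything named here is hypothetical: the model names, the default cap of 4,000 tokens, and the `plan_call` helper are placeholders, and the tier is taken as an input because how you classify requests (rules, a small classifier model, or something else) depends on your traffic.

```python
from typing import Dict, Union

# Hypothetical tier-to-model mapping; substitute the models you actually run.
MODEL_BY_TIER = {
    "easy": "small-model",
    "medium": "mid-model",
    "hard": "frontier-model",
}

# Hypothetical default cap, in tokens. Callers must opt in to a wider window.
DEFAULT_CONTEXT_CAP = 4000


def plan_call(tier: str, context_tokens: int,
              context_cap: int = DEFAULT_CONTEXT_CAP) -> Dict[str, Union[str, int]]:
    """Pick a model for the request tier and cap how much context is sent."""
    if tier not in MODEL_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    return {
        "model": MODEL_BY_TIER[tier],
        "context_tokens": min(context_tokens, context_cap),
    }


# Default: an easy request with a bloated context is routed small and capped.
print(plan_call("easy", context_tokens=12_000))
# A hard request that justifies a wider window has to ask for it explicitly.
print(plan_call("hard", context_tokens=12_000, context_cap=32_000))
```

Making the wide window an explicit argument means long contexts show up in code review instead of only on the invoice.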

"AI cost is not a finance problem. By the time finance sees it, the architecture has already decided what you owe."
Mini checklist

Try this at work

  • Add model, tokens, latency, and cost to one shared dashboard.
  • Classify requests as easy / medium / hard before they hit a model.
  • Route easy and medium traffic to smaller models by default.
  • Cache at the semantic layer, not just the HTTP layer.
  • Cap context windows; widen them only when justified.
  • Set per-team budgets with alerts before the bill arrives.
  • Review the top five most expensive AI features every month.
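The budget item in the checklist can be sketched as a small tracker that fires each alert level once, using the fifty, seventy-five, and ninety percent thresholds suggested earlier. The `TeamBudget` class and its fields are hypothetical, and a real version would persist spend and reset it each billing period instead of holding it in memory.

```python
from typing import List, Set

ALERT_THRESHOLDS = (0.50, 0.75, 0.90)  # fractions of the monthly budget


class TeamBudget:
    """Track one team's monthly AI spend and report newly crossed alert levels."""

    def __init__(self, team: str, monthly_limit_usd: float) -> None:
        if monthly_limit_usd <= 0:
            raise ValueError("monthly_limit_usd must be positive")
        self.team = team
        self.monthly_limit_usd = monthly_limit_usd
        self.spent_usd = 0.0
        self._fired: Set[float] = set()

    def record(self, cost_usd: float) -> List[float]:
        """Add spend and return the thresholds crossed for the first time."""
        self.spent_usd += cost_usd
        used = self.spent_usd / self.monthly_limit_usd
        newly_crossed = [t for t in ALERT_THRESHOLDS
                         if used >= t and t not in self._fired]
        self._fired.update(newly_crossed)
        return newly_crossed


budget = TeamBudget("search", monthly_limit_usd=1000.0)
for cost in (400.0, 200.0, 350.0):
    for threshold in budget.record(cost):
        print(f"{budget.team}: passed {threshold:.0%} of monthly budget")
```

Firing each level only once keeps the alert channel quiet enough that people still read it when the ninety percent message arrives.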

This is one of the ideas explored deeper in The Context Advantage by Team BricksNotes — a living book for data + AI professionals learning how Context, Control, Cost, and Choice shape the agentic AI era.

Over to you

If your AI bill doubled next quarter, would you know which feature did it within an hour — or within a fiscal cycle?

This is a companion post to The Context Advantage — a living book by Team BricksNotes.