Bonus 2

10 Real-World Case Studies

Short, practical stories from teams that lived through the four C's. Names are removed. The shape of each problem is what matters.

Airlines (Control)

Rebooking at the speed of disruption

Situation. A global carrier handled 12,000 rebookings on a typical disruption day. Each one took an average of 14 minutes of agent time. Customer satisfaction during disruptions was their weakest scorecard metric.

Approach. They built an AI agent that read the disrupted booking, checked customer tier, fetched alternative inventory, applied policy, and proposed three options. Action policies required a human customer service agent to confirm before any ticket was issued.
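The approach above can be sketched in a few lines. This is a minimal illustration, not the carrier's implementation: the `Flight` fields, the `ALLOWED_CABINS` tier policy, and the function names are all hypothetical. It shows policy being applied before options are proposed, and ticket issuance refusing to run without a named human confirmation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flight:
    number: str
    arrival_delay_min: int  # delay versus the original arrival time
    cabin: str              # "economy" or "business"


# Hypothetical policy: cabins each customer tier may be rebooked into.
ALLOWED_CABINS = {
    "standard": {"economy"},
    "elite": {"economy", "business"},
}


def propose_options(tier: str, alternatives: list, limit: int = 3) -> list:
    """Apply policy first, then return the `limit` options with the least delay."""
    allowed = [f for f in alternatives if f.cabin in ALLOWED_CABINS[tier]]
    return sorted(allowed, key=lambda f: f.arrival_delay_min)[:limit]


def issue_ticket(flight: Flight, confirmed_by: Optional[str]) -> str:
    """Action policy: no ticket is issued without a named human confirmation."""
    if not confirmed_by:
        raise PermissionError("ticket issuance requires human confirmation")
    return f"ISSUED {flight.number} (confirmed by {confirmed_by})"
```

The design choice worth noticing is that proposing is unrestricted while issuing is gated. The agent can do all the slow work ahead of time, and the human spends time only on the confirmation.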

Result. Average handling time dropped to 4 minutes. NPS during disruptions improved by 18 points. The agent operated on rung 3 of the human review ladder (one-click approval).

Lesson
Speed at the moment of disruption is a control problem, not a model problem. The right action policy unlocks the speed.

Retail (Context)

One definition of available inventory

Situation. A home goods retailer launched an AI assistant for store managers. Three teams used three different definitions of available inventory. The assistant gave three different answers.

Approach. They paused the rollout, built a small semantic layer with twelve agreed definitions, and made the assistant consume them on every answer.
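A semantic layer of this kind can be very small. The sketch below assumes one hypothetical definition of available inventory (on-hand minus reserved minus damaged); the retailer's actual twelve definitions are not given in the story. The point it illustrates is that the assistant computes a metric only through the agreed definition and reports which definition it used.

```python
# Hypothetical semantic layer: one agreed definition per business term.
# The formula below is an illustrative assumption, not the retailer's.
DEFINITIONS = {
    "available_inventory": {
        "description": "on-hand units minus reserved units minus damaged units",
        "compute": lambda row: row["on_hand"] - row["reserved"] - row["damaged"],
    },
}


def answer(term: str, row: dict) -> dict:
    """Compute a metric only via its agreed definition and report that definition."""
    if term not in DEFINITIONS:
        # Refuse to improvise a meaning for a term nobody has agreed on.
        raise KeyError(f"no agreed definition for {term!r}")
    definition = DEFINITIONS[term]
    return {
        "term": term,
        "value": definition["compute"](row),
        "definition": definition["description"],
    }
```

Called with `{"on_hand": 120, "reserved": 30, "damaged": 5}`, every team gets the same value and the same stated definition, whichever team asks.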

Result. Adoption tripled within a month. Inventory-related complaints fell sharply. No model change required.

Lesson
Most AI accuracy problems are actually meaning problems. Fix the definitions and the model looks smarter for free.

Insurance (Control)

Claims triage with rung-aware automation

Situation. A health insurer wanted an agent to triage claims faster without losing the human touch on sensitive cases.

Approach. They used the Human Review Ladder. Simple claims auto-approved. Mid-tier required one-click approval. Sensitive claims always reached a human regardless of rules.
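One way to express rung assignment in code is shown below. The sensitive categories and the amount threshold are hypothetical placeholders, since the insurer's real rules are not described. What the sketch does capture is the ordering: sensitivity is checked before any other rule, so no rule can route a sensitive claim away from a human.

```python
from enum import Enum


class Rung(Enum):
    AUTO_APPROVE = "auto-approve"
    ONE_CLICK = "one-click approval"
    HUMAN_REVIEW = "full human review"


# Hypothetical values for illustration only.
SENSITIVE_CATEGORIES = {"oncology", "mental_health"}
SIMPLE_CLAIM_LIMIT = 500.0


def assign_rung(category: str, amount: float) -> Rung:
    """Pick the review rung for a claim."""
    # Checked first, so later rules can never override it.
    if category in SENSITIVE_CATEGORIES:
        return Rung.HUMAN_REVIEW
    if amount <= SIMPLE_CLAIM_LIMIT:
        return Rung.AUTO_APPROVE
    return Rung.ONE_CLICK
```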

Result. Throughput rose 4x on simple claims. Complaint volume on sensitive claims dropped because the agent never auto-processed them.

Lesson
Different decisions deserve different rungs. Treat the ladder as a living document.

Banking (Cost)

A model router that paid for itself in a week

Situation. A bank's customer support agent used the most expensive model for every question. 78% of questions were simple status lookups.

Approach. They added a cheap router. Simple questions went to a small model. Complex ones escalated to a larger model. They added a semantic cache for the top 200 questions.
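A rough sketch of the router-plus-cache idea follows. Several things here are simplifying assumptions: the keyword patterns in `SIMPLE_PATTERNS` are invented, the cache matches on normalized text rather than on embedding similarity as a true semantic cache would, and it caches every answer instead of a curated top 200. The models are passed in as plain callables so the example stays self-contained.

```python
import re

# Hypothetical patterns that mark a question as a simple lookup.
SIMPLE_PATTERNS = [r"\bbalance\b", r"\bstatus\b", r"\bwhen will\b"]


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (stand-in for semantic matching)."""
    return " ".join(re.sub(r"[^a-z0-9 ]", "", question.lower()).split())


def route(question: str) -> str:
    """Return 'small' for simple lookups, 'large' for everything else."""
    q = question.lower()
    return "small" if any(re.search(p, q) for p in SIMPLE_PATTERNS) else "large"


class CachedRouter:
    def __init__(self, models: dict):
        self.models = models  # {"small": callable, "large": callable}
        self.cache = {}
        self.calls = {"small": 0, "large": 0, "cache": 0}

    def ask(self, question: str) -> str:
        key = normalize(question)
        if key in self.cache:
            self.calls["cache"] += 1  # served without any model call
            return self.cache[key]
        tier = route(question)
        self.calls[tier] += 1
        answer = self.models[tier](question)
        self.cache[key] = answer
        return answer
```

Tracking `calls` per tier is a deliberate choice: the share of traffic that reaches the large model is the number to watch when judging whether the router is earning its keep.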

Result. Cost per resolved chat dropped 74%. Latency improved. The big model handled fewer, harder questions and looked smarter than before.

Lesson
Routing is the cheapest model in your stack. Build it first.

Healthcare (Context)

A clinician assistant designed around all four C's

Situation. A regional healthcare provider built an assistant that suggested differential diagnoses and summarized patient history.

Approach. Context: a clinical ontology. Control: prescribing required physician confirmation. Cost: a router between general and clinical models. Choice: open formats for evaluation and audit.

Result. The assistant launched on schedule and passed an external regulatory audit on first attempt.

Lesson
Designing for all four C's from day one is slower at first and far faster after launch.

HR (Control)

Performance review summaries with action control

Situation. An HR team wanted to draft performance review summaries from manager notes, ratings, and goal data.

Approach. They built an agent that drafted summaries but could never publish without manager approval. Action control blocked any direct write to the HR system from the agent.
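The line between drafting and publishing can be enforced in the write path itself. The sketch below uses a hypothetical `ReviewStore` as a stand-in for the HR system, with a simple role string as the assumed identity check. It shows the agent being able to save drafts freely while any publish attempt under the agent's role is rejected.

```python
class ReviewStore:
    """Hypothetical stand-in for the HR system of record."""

    def __init__(self):
        self.drafts = {}
        self.published = {}

    def save_draft(self, employee_id: str, text: str) -> None:
        # Drafting is open to the agent: it never touches the employee record.
        self.drafts[employee_id] = text

    def publish(self, employee_id: str, actor_role: str, approver: str) -> None:
        # Action control: only a manager can move a draft into the record.
        if actor_role != "manager":
            raise PermissionError("only a manager may publish a review summary")
        self.published[employee_id] = {
            "text": self.drafts[employee_id],
            "approved_by": approver,
        }
```

Recording `approved_by` alongside the text gives compliance a direct answer to the question of which human signed off on each record.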

Result. Time per review dropped by 60%. Compliance was satisfied because no agent action reached an employee record without a human in the loop.

Lesson
Drafting is automation. Publishing should require a human. Action control draws the line.

Supply Chain (Cost)

Forecast assistant with structured retrieval

Situation. A consumer goods company's forecasting agent burned tokens by pulling pages of policy and notes into every prompt.

Approach. They moved to structured-first retrieval. The agent queried a structured planning database first and only fell back to documents when needed.
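Structured-first retrieval can be illustrated with an in-memory SQLite table standing in for the planning database. The table name, columns, SKUs, and the `DOCUMENTS` fallback store are all hypothetical. The sketch returns a compact structured fact when one exists and only falls back to document text when the database has no row.

```python
import sqlite3


def build_planning_db() -> sqlite3.Connection:
    """Create a tiny in-memory planning table (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forecast (sku TEXT PRIMARY KEY, weekly_units INTEGER)")
    conn.executemany(
        "INSERT INTO forecast VALUES (?, ?)",
        [("SKU-1", 1200), ("SKU-2", 340)],
    )
    return conn


# Hypothetical document store used only as a fallback.
DOCUMENTS = {"SKU-9": "Planner note: SKU-9 is seasonal; no structured forecast yet."}


def retrieve(conn: sqlite3.Connection, sku: str) -> dict:
    """Try the structured source first; fall back to documents only if needed."""
    row = conn.execute(
        "SELECT weekly_units FROM forecast WHERE sku = ?", (sku,)
    ).fetchone()
    if row is not None:
        return {"source": "structured", "context": f"{sku} weekly_units={row[0]}"}
    doc = DOCUMENTS.get(sku)
    if doc is not None:
        return {"source": "documents", "context": doc}
    return {"source": "none", "context": ""}
```

The structured answer is a single short line of context, which is where the token savings come from: the prompt carries one fact instead of pages of notes.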

Result. Tokens per answer dropped 60%. Accuracy improved slightly because the agent stopped grabbing irrelevant text.

Lesson
Naive retrieval is the most expensive way to be approximately right.

Manufacturing (Choice)

Open lakehouse, faster engine swap

Situation. A manufacturer ran its analytics on a proprietary query engine. When a faster query engine appeared, the migration looked like a year of work.

Approach. Years earlier, they had standardized on an open table format. They pointed the new engine at the existing data.

Result. Migration took weeks, not a year. The small ongoing tax of openness, paid in earlier years, came back as a large dividend.

Lesson
Openness is a small tax now and a large dividend later.

Customer Support (Context)

Trust signals that doubled adoption

Situation. A SaaS company shipped a support assistant. Quality was good but adoption was flat. Users did not trust it.

Approach. They added three signals to every answer: the source link, the definition used, and the freshness of the data.
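The three signals fit naturally into the answer's data structure, so an answer cannot be rendered without them. The field names and the rendered layout below are assumptions made for illustration; the story does not describe how the SaaS company displayed them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TrustedAnswer:
    text: str
    source_link: str      # where the answer came from
    definition_used: str  # which agreed definition was applied
    data_as_of: date      # freshness of the underlying data

    def render(self, today: date) -> str:
        """Return the answer followed by its three trust signals."""
        age_days = (today - self.data_as_of).days
        return "\n".join([
            self.text,
            f"Source: {self.source_link}",
            f"Definition: {self.definition_used}",
            f"Data as of: {self.data_as_of.isoformat()} (age in days: {age_days})",
        ])
```

Making all three fields required means a missing signal fails at construction time rather than silently producing an answer the user has no reason to trust.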

Result. Adoption doubled in six weeks. Accuracy was unchanged. Trust, not intelligence, had been the bottleneck.

Lesson
Users trust agents that show their work more than agents that are right.

E-commerce (Choice)

A vendor swap over a long weekend

Situation. An e-commerce company's AI workflows all ran on a single model vendor. Then a competitor launched a better model.

Approach. They had built every AI workflow against a thin internal SDK. The vendor was a plug-in behind the SDK.
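The thin-SDK pattern can be sketched as an interface plus interchangeable adapters. `ModelProvider`, `VendorA`, `VendorB`, and `InternalSDK` are hypothetical names, and the adapters return canned strings instead of calling a real vendor API. The point is that workflow code depends only on the internal SDK, so swapping vendors changes one constructor argument.

```python
from typing import Protocol


class ModelProvider(Protocol):
    """What the internal SDK expects from any vendor adapter."""

    def complete(self, prompt: str) -> str: ...


class VendorA:
    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"  # placeholder for a real API call


class VendorB:
    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"  # placeholder for a real API call


class InternalSDK:
    """Thin layer every workflow is written against."""

    def __init__(self, provider: ModelProvider):
        self._provider = provider

    def summarize(self, text: str) -> str:
        return self._provider.complete(f"Summarize: {text}")


def product_workflow(sdk: InternalSDK, description: str) -> str:
    # Workflow code never names a vendor; it only knows the SDK.
    return sdk.summarize(description)
```

Moving from `InternalSDK(VendorA())` to `InternalSDK(VendorB())` leaves `product_workflow` untouched, which is the property that turns a multi-month migration into a short one.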

Result. The swap took a long weekend. Their nearest competitor needed six months to do the same.

Lesson
Choice is a design discipline. It costs a little every week and pays back enormously the day the market shifts.