A pilot that worked beautifully — until it didn't
A mid-sized bank launches an AI assistant for relationship managers. In the pilot, three handpicked questions land perfectly. Leadership greenlights a wider rollout.
Two weeks later, the assistant tells a relationship manager that a top-twenty client has 'low engagement.' The manager calls the client to check in. The client is, in fact, in active onboarding for a new product line and has been in three meetings that week.
The prompt was fine. The model was fine. The word 'engagement' meant one thing in the marketing system, another in the CRM, and a third in the data warehouse. The assistant picked the wrong one.
The real problem
Prompt engineering has a glamour problem. It looks like the lever, so teams pull it. When the answer is wrong, they rewrite the prompt. When it is still wrong, they rewrite it again, with more emphasis and more rules.
By the fifth rewrite, the prompt has become a small novel — and the answer is still wrong. Because the answer was never about the prompt. It was about the meaning the prompt was sitting on.
The Context Advantage view
This is a Context failure, full stop. The first of the four C's exists precisely so that Control, Cost, and Choice have something solid to stand on. When Context is missing, every other discipline gets harder.
Meaning is upstream of prompting. Fix the meaning and prompts get shorter. Skip the meaning and no prompt on earth will save you.
In plain language
An AI model is a very fast guesser. It does not know your business. It only knows what you put in front of it: column names, sample rows, glossary entries, prompts, and whatever scraps of documentation made it into the context window.
If your business has two definitions for 'customer,' the model has to pick one. It will pick the one that looks most plausible based on the words it can see. Sometimes that matches the answer you wanted. Often it does not.
A real-world example: healthcare claims
A health insurer builds an AI assistant to help claims agents resolve complex cases. A new agent asks: how many open claims does this member have?
'Open' is defined three ways across the company. The claims system marks a claim 'open' until it is paid or denied. The appeals system marks it 'open' until the appeals window closes. The fraud system marks it 'open' until the investigation is complete.
The AI does not ask which definition the user wants. It picks one. The agent acts on the wrong number. The member gets a confusing call. Trust drops. The pilot quietly ends.
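To make the ambiguity concrete, here is a minimal Python sketch of the three definitions of 'open' applied to the same member. The `Claim` fields, the sample claims, and the `count_open` helper are all invented for illustration; a real fraud system would likely track only claims under investigation, so treat this as a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    claim_id: str
    paid_or_denied: bool          # the claims system closes a claim here
    appeals_window_closed: bool   # the appeals system closes a claim here
    investigation_complete: bool  # the fraud system closes a claim here


# One hypothetical predicate per system, mirroring the three definitions above.
OPEN_DEFINITIONS = {
    "claims": lambda c: not c.paid_or_denied,
    "appeals": lambda c: not c.appeals_window_closed,
    "fraud": lambda c: not c.investigation_complete,
}


def count_open(claims, system):
    """Count the claims that the given system would call 'open'."""
    is_open = OPEN_DEFINITIONS[system]
    return sum(1 for c in claims if is_open(c))


# Invented sample data: one member, three claims.
member_claims = [
    Claim("C1", paid_or_denied=True, appeals_window_closed=False, investigation_complete=False),
    Claim("C2", paid_or_denied=False, appeals_window_closed=False, investigation_complete=False),
    Claim("C3", paid_or_denied=True, appeals_window_closed=True, investigation_complete=False),
]

counts = {system: count_open(member_claims, system) for system in OPEN_DEFINITIONS}
print(counts)  # {'claims': 1, 'appeals': 2, 'fraud': 3}
```

Same member, same question, three different numbers. Nothing in the question tells the assistant which one the agent meant.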
A practical way to act this week
Run a meaning audit on your three most-used business terms. Candidates include customer, active, open, revenue, and engagement. Pick the three that feel most central to your domain.
For each term, find every system that uses it and write down the definition that system applies. If the definitions differ — and they will — call a one-hour meeting with the owners and converge on a single definition. Put it in code. Make it the only one your AI features can reach.
You will not finish in a week. You will start in a week. That is the point.
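One way to read "put it in code" is a single read-only glossary that every AI feature must go through. The sketch below is an assumption about what that could look like, not a prescribed design: the `Definition` fields, the `open_claim` entry, its owner, and the `define` and `glossary_context` helpers are all hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Definition:
    term: str
    meaning: str
    owner: str
    version: str


# Hypothetical outcome of a convergence meeting; the wording is illustrative.
_AGREED = {
    "open_claim": Definition(
        term="open_claim",
        meaning="A claim that has not yet been paid or denied.",
        owner="claims-operations",
        version="1.0.0",
    ),
}

# Read-only view, so callers cannot add a private definition at runtime.
GLOSSARY = MappingProxyType(_AGREED)


class UndefinedTermError(KeyError):
    """Raised when a feature asks for a term nobody has agreed on yet."""


def define(term: str) -> Definition:
    """Return the agreed definition, or fail loudly instead of guessing."""
    try:
        return GLOSSARY[term]
    except KeyError:
        raise UndefinedTermError(f"No agreed definition for {term!r}") from None


def glossary_context(terms) -> str:
    """Render agreed definitions as text that can be placed in a prompt."""
    return "\n".join(
        f"{d.term} (v{d.version}, owner: {d.owner}): {d.meaning}"
        for d in map(define, terms)
    )
```

The design choice that matters here is the failure mode: an unknown term raises an error rather than letting the model pick a plausible meaning on its own.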
What this means for data professionals
Data engineers and analytics engineers carry most of this work. You already curate dimensions, metrics, and reference tables. Treat that craft as the foundation of every AI feature, not as a side artifact for dashboards.
Governance teams: meaning is your jurisdiction now, not just access. AI leaders: every model evaluation should include a meaning test, not just an accuracy test.
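A meaning test could take many forms. As one hedged sketch in Python, assume the assistant returns a structured answer that records which system's definition it used for each term. The `definitions_used` field, the `AGREED` mapping, and the `meaning_failures` function are hypothetical names, not part of any real evaluation framework.

```python
# Hypothetical record of which system's definition was agreed on per term.
AGREED = {"open": "claims"}


def meaning_failures(answer: dict, agreed: dict) -> list:
    """List the terms where the answer relied on a non-agreed definition."""
    failures = []
    for term, source in answer["definitions_used"].items():
        expected = agreed.get(term)
        if source != expected:
            failures.append(f"{term}: used {source!r}, expected {expected!r}")
    return failures


# The count is numerically correct under the fraud system's definition,
# so an accuracy check against that system would pass. The meaning test fails.
answer = {"value": 3, "definitions_used": {"open": "fraud"}}
print(meaning_failures(answer, AGREED))
```

An accuracy test asks whether the number is right. This check asks whether the number answers the question the business actually agreed to ask.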
The common mistake
Treating 'bad answers' as a model problem. Swapping models. Tuning prompts. Adding retrieval. None of those fix a definition that was never agreed on in the first place.
The better way
Put a meaning review in front of every AI feature launch. One page. Three questions. What nouns does this feature use? Where is each one defined? Who owns each definition? Until those three questions have clean answers, the feature does not ship.
It feels slow. It is faster than the alternative — which is shipping, breaking trust, and starting over.
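The three review questions translate naturally into a launch check. The Python sketch below assumes the one-page review is captured as a list of entries, one per noun. `NounEntry`, `review_blockers`, and `may_ship` are illustrative names, and the sample entries are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NounEntry:
    noun: str                  # What nouns does this feature use?
    defined_in: Optional[str]  # Where is each one defined?
    owner: Optional[str]       # Who owns each definition?


def review_blockers(entries) -> list:
    """Return every unanswered review question; an empty list means clean."""
    blockers = []
    if not entries:
        blockers.append("no nouns listed")
    for entry in entries:
        if not entry.defined_in:
            blockers.append(f"{entry.noun}: no definition location")
        if not entry.owner:
            blockers.append(f"{entry.noun}: no owner")
    return blockers


def may_ship(entries) -> bool:
    """The feature ships only when all three questions have answers."""
    return not review_blockers(entries)


review = [
    NounEntry("open claim", defined_in="glossary/open_claim", owner="claims-operations"),
    NounEntry("member", defined_in=None, owner=None),
]
print(may_ship(review))         # False
print(review_blockers(review))  # ['member: no definition location', 'member: no owner']
```

A check like this could run in a release pipeline, though where it lives matters less than the fact that an unanswered question blocks the launch.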
"Prompts are how you ask. Meaning is what you are asking about. Most failures come from skipping the second one."
Try this at work
- List the top five business terms used in your AI features.
- Map each term to every system that defines it today.
- Call a one-hour meeting per disputed term and converge.
- Move the agreed definition into versioned code.
- Gate every new AI feature behind a one-page meaning review.
- Add a meaning test alongside your model accuracy tests.
- Retire duplicate definitions on a public deprecation schedule.
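The mapping step in the list above can start as a very small script. This Python sketch assumes you have collected (term, system, definition) observations by hand. The `disputed_terms` function and the sample rows are hypothetical, and the comparison is a naive text match that will not catch definitions worded differently but meaning the same thing.

```python
from collections import defaultdict


def disputed_terms(observations) -> dict:
    """Group definitions by term and keep the terms whose systems disagree.

    observations: iterable of (term, system, definition) tuples.
    """
    by_term = defaultdict(dict)
    for term, system, definition in observations:
        # Normalize whitespace and case so trivial differences do not count.
        by_term[term][system] = " ".join(definition.lower().split())
    return {
        term: systems
        for term, systems in by_term.items()
        if len(set(systems.values())) > 1
    }


# Invented sample observations for illustration.
observations = [
    ("open", "claims", "Not yet paid or denied"),
    ("open", "appeals", "Appeals window has not closed"),
    ("open", "fraud", "Investigation not complete"),
    ("revenue", "finance", "Recognized revenue, net of refunds"),
    ("revenue", "warehouse", "recognized revenue,  net of refunds"),
]

for term, systems in disputed_terms(observations).items():
    print(term, "is defined differently in:", sorted(systems))
```

Each term this prints is a candidate for a one-hour convergence meeting.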
This is one of the ideas explored deeper in The Context Advantage by Team BricksNotes — a living book for data + AI professionals learning how Context, Control, Cost, and Choice shape the agentic AI era.
Which word in your business has the most quietly disagreed-on definition, and what is it costing your AI features?