The Context Advantage glossary

Plain-English definitions for the agentic AI, data engineering, and AI governance terms used throughout the book. Every term links to the chapter that goes deeper.

A

Agent

A system that can answer, reason, plan, use tools, and take action — not just chat.

Agentic AI

AI that acts on real systems, not just answers questions.

Action Control

Policies that decide what an agent is allowed to do, not just what it can see.

Access Control

Policies that decide what data an identity is allowed to see.

Audit Trail

A structured record of what the agent did, why, and with what evidence.

Approval Queue

A shared inbox where humans review agent actions before they execute.

Action Inventory

The structured list of every action an agent can take, with risk tier and owner.

Action Gateway

An in-line service every agent action passes through for policy checks and logging.

Agent Graph

A multi-agent topology where agents call each other freely. Powerful and easy to overuse.

B

Business Memory

The layer that captures how your company defines its world, beyond just schemas.

Business Rule

A condition or exception the business enforces, like discount eligibility.

Backpressure

Slowing the producer when the consumer cannot keep up.
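
One way to picture backpressure is a bounded buffer that refuses new work when it is full. The sketch below uses Python's standard `queue.Queue`; the `try_produce` helper and the tiny buffer size are assumptions made for illustration, not something the book prescribes.

```python
import queue

# A bounded queue: the small maxsize is what creates backpressure.
buffer = queue.Queue(maxsize=2)

def try_produce(item):
    """Return True if the item was accepted, False if the producer must slow down."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False

accepted = [try_produce(i) for i in range(3)]  # the third item is refused
buffer.get_nowait()                            # the consumer catches up by one item
accepted.append(try_produce(3))                # now there is room again
```

A real producer would typically wait or shed load when `try_produce` returns False rather than simply dropping the item.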

Blast Radius

How many people, records, or systems an action affects if it goes wrong.

C

Caching

Reusing a previous answer to avoid recomputing it.

Catalog

An inventory of where data lives and who owns it.

Cascading Models

Trying a cheap model first and escalating to a larger one only when needed.
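
A minimal sketch of a cascade, assuming each model returns an answer together with a confidence score. `small_model`, `large_model`, and the 0.7 threshold are hypothetical stand-ins for illustration, not real model APIs.

```python
from typing import Tuple

def small_model(question: str) -> Tuple[str, float]:
    # Hypothetical cheap model: only confident on short questions.
    confidence = 0.9 if len(question) < 20 else 0.3
    return "small answer", confidence

def large_model(question: str) -> Tuple[str, float]:
    # Hypothetical expensive model: always confident in this toy example.
    return "large answer", 0.95

def cascade(question: str, threshold: float = 0.7) -> str:
    """Try the cheap model first and escalate only when it is unsure."""
    answer, confidence = small_model(question)
    if confidence >= threshold:
        return answer
    answer, _ = large_model(question)
    return answer
```

The threshold is the main tuning knob: set it too low and weak answers slip through, too high and every request pays for the large model.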

Citation

A reference the agent shows so users can verify where an answer came from.

Choice

The ability to swap models, tools, or vendors without rewriting your system.

Confidence

How sure the agent is about its answer, ideally surfaced honestly.

Context

The meaning, definitions, and relationships an agent needs to answer correctly.

Context Engineer

A data professional who owns meaning the way engineers own code.

Context Layer

The queryable layer above storage and compute that holds business meaning.

Context Window

The maximum number of tokens a model can handle in a single request, typically counting both the prompt and the response.

Control

The set of policies that make agent behavior safe and predictable.

Cost

The total expense of running AI — tokens, tools, retries, and infrastructure.

Cost-Aware Architecture

A design that treats cost like latency: a first-class requirement.

Compositional Tools

Small, reliable tools that agents combine to do bigger work.

Calibrated Confidence

Confidence derived from system evidence, not from raw model probabilities.

D

Data Product

A piece of data offered with quality, ownership, and a clear contract.

Delta Lake

An open table format that adds reliability features, such as ACID transactions and table versioning, on top of Parquet files.

Drift

When model or data behavior changes over time, often silently.

Determinism

Same input, same output, every time. Hard for LLMs, important for audits.

Downgrade

A policy outcome that lets a smaller, safer version of the action proceed.

E

Embedding

A numeric representation of text used for similarity search.

Evaluation

Measuring whether the agent is giving correct, safe, useful answers.

Eval Harness

A test suite that scores AI quality across many examples.
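
In its simplest form, an eval harness is a loop over a golden set that reports a score. The sketch below assumes exact-match scoring and a toy agent; real harnesses usually need fuzzier scoring. All names here are illustrative.

```python
def run_evals(agent, golden_set):
    """Return the fraction of golden examples the agent answers exactly right."""
    passed = sum(1 for question, expected in golden_set if agent(question) == expected)
    return passed / len(golden_set)

# A tiny golden set of (question, expected answer) pairs.
golden_set = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]

# A stand-in agent that gets one of the two questions wrong.
canned_answers = {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}

def toy_agent(question):
    return canned_answers[question]

score = run_evals(toy_agent, golden_set)
```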

Eval-Driven Development

Building AI features by writing evaluations first, then improving the system.

F

Fine-Tuning

Adapting a base model to your domain by training it on your data.

G

Gateway

A single entry point that all AI calls go through for security, routing, and logging.

Glossary

A list of business terms with agreed, simple definitions.

Governance

The set of rules that decide what is allowed in the data and AI stack.

Grounding

Anchoring an answer in real, citable data instead of model memory.

Guardrails

Code-enforced rules that block dangerous inputs or outputs.

Golden Set

A curated set of examples used as the truth for evaluation.

H

Hallucination

When a model produces something that sounds confident but is not true.

Human in the Loop

A design where humans review, approve, or take over agent actions.

Handoff Contract

The typed schema two agents agree on when passing work between them.
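
A handoff contract can be as small as a typed record that rejects malformed payloads. The sketch below uses a standard-library dataclass; the `ResearchHandoff` name and its fields are hypothetical examples, not a schema from the book.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ResearchHandoff:
    """What a hypothetical research agent passes to a hypothetical writer agent."""
    question: str
    sources: List[str]
    confidence: float

    def __post_init__(self):
        # Reject bad payloads at the boundary instead of deep inside the next agent.
        if not self.question.strip():
            raise ValueError("question must not be empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
```

Validating at construction time means the receiving agent can assume every handoff it sees is well formed.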

I

Iceberg

An open table format widely adopted for lakehouses.

Ingestion

Bringing raw data into your platform from source systems.

Inference

Asking a model to produce an output, as opposed to training it.

Idempotent

Safe to retry: making the same call twice has the same effect as making it once.
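
A common way to make an action with side effects idempotent is an idempotency key: the caller attaches a unique key, and a repeated key returns the stored result instead of acting again. The `Mailer` class below is a toy written for illustration, not a real mail API.

```python
class Mailer:
    """Toy email sender that remembers idempotency keys."""

    def __init__(self):
        self.results = {}      # idempotency key -> result of the first call
        self.emails_sent = 0   # how many emails actually went out

    def send(self, key, to):
        if key in self.results:
            # A retry with the same key: return the stored result, send nothing.
            return self.results[key]
        self.emails_sent += 1
        self.results[key] = f"sent to {to}"
        return self.results[key]

mailer = Mailer()
first = mailer.send("order-42-receipt", "ana@example.com")
second = mailer.send("order-42-receipt", "ana@example.com")  # retried call
```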

Interface Dividend

The compounding velocity gain a platform earns by routing change through stable interfaces.

K

Knowledge Graph

A graph of specific entities and the relationships between them, used to enrich context.

L

Lakehouse

A storage architecture that combines lake flexibility with warehouse reliability.

Lineage

The path a piece of data took from source to answer.

LLM

A large language model trained to generate human-like text.

M

MCP

Model Context Protocol — a standard way for agents to talk to tools and data.

Metric

A defined business number, like revenue or churn.

Metrics Layer

The place where metrics are defined once and calculated consistently.

Model Routing

Sending each request to the smallest model that can handle it well.

Model Choice Matrix

A map of tasks to the right model size for each one.

Multi-Agent

A design where multiple specialized agents work together.

O

Observability

The ability to see what your AI system is doing in production.

Ontology

A formal definition of the kinds of business concepts that exist and how they are allowed to relate.

Open Format

A data or model format anyone can read without vendor lock-in.

Open Interface

A standard API anyone can implement, reducing lock-in.

Orchestration

Coordinating multiple steps, tools, or agents into one workflow.

P

Parquet

An open columnar file format widely used in data platforms.

Permission

A rule about what an identity is allowed to see or do.

Policy as Code

Writing governance rules as software the system enforces automatically.

Policy Engine

A service that evaluates policy-as-code at runtime.
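
A minimal sketch of what such an evaluation might look like, with three outcomes: allow, downgrade, and deny. The `Action` type, the refund example, and the dollar thresholds are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Action:
    name: str
    amount: float  # for example, a refund value in dollars

def evaluate(action: Action) -> Tuple[str, Optional[Action]]:
    """Return (outcome, action_to_run). Thresholds are illustrative only."""
    if action.name != "refund":
        return "deny", None
    if action.amount <= 100:
        return "allow", action
    if action.amount <= 500:
        # Downgrade: let a smaller, safer version of the action proceed.
        return "downgrade", Action("refund", 100.0)
    return "deny", None
```

Because the rules are ordinary code, they can be versioned, reviewed, and tested like any other software.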

Prompt

The instructions and context sent to a model in a single request.

Prompt Engineering

Designing prompts so models behave well for a given task.

Pipeline Pattern

A multi-agent topology where agents run in a fixed sequence — the safest default.

Provenance

The chain of sources behind an agent's answer, surfaced so users can verify it.

R

RAG

Retrieval-Augmented Generation — letting the model read your data before answering.

Reasoning

A model thinking through steps before producing a final answer.

Reference Architecture

A shared pattern teams follow so each new project does not reinvent the wheel.

Retrieval

Looking up relevant content to give a model better context.

Retry

Re-running a failed model call. Quietly expensive at scale.

Rollback

Turning off or reverting an agent quickly when something goes wrong.

Router

A cheap classifier that decides which model or tool handles a request.

Replayable

A system where you can re-run history to debug or recover.

Reversibility

Whether an action can be undone cleanly, and how quickly.

S

Schema

The shape of a table — columns and types.

Semantic Cache

A cache keyed by meaning, not exact text, that reuses similar answers.
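
The sketch below shows the core idea with hand-written vectors standing in for embeddings; in practice the vectors would come from an embedding model. The `SemanticCache` class and the 0.95 similarity threshold are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def put(self, embedding, answer):
        self.entries.append((list(embedding), answer))

    def get(self, embedding):
        """Return the cached answer closest in meaning, or None on a miss."""
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache()
cache.put([1.0, 0.0], "Revenue grew last quarter.")
hit = cache.get([0.99, 0.05])   # nearly the same direction: reuse the answer
miss = cache.get([0.0, 1.0])    # unrelated direction: recompute
```

The threshold trades cost against correctness: a looser match saves more calls but risks returning an answer to a question that was only superficially similar.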

Semantic Layer

The layer that maps business meaning to underlying data.

Semantic Search

Finding content by meaning, not exact keywords.

Sensitivity Tag

A label that marks data by how sensitive it is, like PII or financial.

Signal

A measurable indicator used to monitor quality, cost, or safety.

Source of Truth

The one place a fact is considered authoritative.

Stewardship

The ongoing care and ownership of a data asset.

Stream

Continuous data that arrives event by event rather than batch by batch.

Structured Retrieval

Looking up answers in tables, metrics, or APIs before falling back to text search.

Synthetic Data

Generated data used for training or testing when real data is scarce.

Side Effect

Something an action changes in the outside world, like sending an email.

Supervisor Pattern

A multi-agent topology where one agent routes work to specialists.

T

Telemetry

Data emitted by a system that lets you observe its behavior.

Throttling

Limiting the rate of requests to control cost or load.
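
One widely used throttling technique is the token bucket: requests spend tokens, and tokens refill at a fixed rate. The sketch below passes the current time in explicitly to keep the example deterministic; the class and parameter names are illustrative. (These bucket tokens are unrelated to model tokens.)

```python
class TokenBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec   # tokens added per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity     # start with a full bucket
        self.last = 0.0            # time of the previous check, in seconds

    def allow(self, now):
        """Return True if a request at time `now` may proceed."""
        # Refill for the time elapsed since the last check, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=1.0, capacity=2.0)
burst = [bucket.allow(0.0) for _ in range(3)]  # two pass, the third is throttled
later = bucket.allow(1.0)                      # one second later, one token is back
```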

Token

The unit a model reads or writes: a word or word fragment, roughly four characters of English text on average.

Token Budget

A target for how many tokens a request is allowed to use.

Tool Call

When the agent invokes an external tool, like a query or API.

Trace

The full record of a single agent run, end to end.

Trust Path

The sequence of checks an agent action passes before it executes.

Trusted Agent Architecture

A nine-step reference flow every production agent follows.

Trusted Source

A dataset agreed upon as authoritative for a given metric.

Termination Condition

The explicit rule that ends an agent loop. Without one, loops burn money.
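
The simplest termination condition is a hard step limit. In the sketch below, `step` is a hypothetical callable that returns a final answer or `None` to keep going; the default limit of five steps is an arbitrary example.

```python
def run_agent(step, max_steps=5):
    """Run step(i) until it returns an answer or the step limit is reached.

    Returns (result, steps_used).
    """
    for i in range(max_steps):
        result = step(i)
        if result is not None:
            return result, i + 1
    # The explicit exit: never loop forever, never spend without bound.
    return "stopped: step limit reached", max_steps

finished = run_agent(lambda i: "done" if i == 2 else None)
runaway = run_agent(lambda i: None, max_steps=4)
```

Production loops often add further limits on top of the step count, such as a cost ceiling or a wall-clock timeout.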

Trust Signal

A visible cue (source, definition, confidence, limit) that helps users decide when to trust an agent.

U

Unit Cost

Cost per unit of value delivered, like cost per resolved ticket.
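
As a worked example with made-up numbers, an agent that costs 1,200 dollars a month to run and resolves 400 tickets has a unit cost of 3 dollars per resolved ticket:

```python
def unit_cost(total_cost, units_delivered):
    """Cost per unit of value, for example dollars per resolved ticket."""
    if units_delivered <= 0:
        raise ValueError("units_delivered must be positive")
    return total_cost / units_delivered

cost_per_ticket = unit_cost(1200.0, 400)
```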

Unity Catalog

Databricks' governance layer; one example of a managed catalog.

V

Validation

Checking that an input or output meets the rules before using it.

Vector Database

A store that lets you search by embedding similarity.

Vector Search

Finding similar content using embeddings.

Vendor Lock-In

A situation where switching providers is expensive or slow.

Versioning

Tracking changes to data, models, or definitions over time.

W

Warehouse

A structured store optimized for analytics queries.

Workflow

A defined sequence of steps a system runs to complete a task.

Z

Zero-Shot

Asking a model to do a task without giving it any examples in the prompt, relying only on what it learned in training.

Go deeper than definitions.

The book turns these terms into a working method — Context, Control, Cost, Choice.