Why AI gets it wrong and how data products fix that

Srinivasa Mathkur

June 30, 2026

Turning institutional knowledge into deployable context infrastructure that persists beyond individual human expertise

The Problem With Knowing Everything

A 58-year-old man walks into an emergency room with fatigue and mild chest tightness. His ECG is borderline. His troponin reads technically within the normal range, but near the upper threshold.

Two doctors look at the same results.

The first has worked in this hospital for twelve years. She knows that three patients with near-identical presentations turned out to have myocarditis last month, an atypical pattern the textbooks associate with younger patients. She knows the troponin assay this lab runs has poorer precision at low concentrations (a documented limitation of that specific platform), so a borderline number carries more uncertainty than it suggests. She admits him.

The second doctor is just as qualified. He calculates a low-risk score by standard protocol and discharges the patient with outpatient follow-up. By the guideline, the decision is defensible.

Same data, same training. One doctor understood the context around the data; the other treated the patient's results as if they existed in a vacuum.

This is a clear failure of context, and it is the same pattern that plays out every day inside enterprise AI deployments.

Enterprise AI fails at scale because it lacks bounded, governed context: the same situated understanding that separates an experienced practitioner from a qualified novice.

The Illusion of Intelligence

Frontier models are extraordinary. They write code, summarize contracts, and reason through multi-step problems. Yet enterprises keep discovering the same frustration: general intelligence does not automatically produce correct answers in specific domains.

The numbers make this concrete. Industry estimates put the global cost of AI hallucinations at $67.4 billion in 2024. A Stanford HAI and RegLab study, peer-reviewed in the Journal of Empirical Legal Studies, found that legal AI tools built for a single domain still hallucinate between 17% and 34% of the time on challenging queries. Documented court cases involving AI-generated errors grew from around 10 in 2023 to 37 in 2024, and to 73 in the first five months of 2025 alone.

These are context failures: the model has no dynamic, governed, structured understanding of the specific domain it is reasoning about. Training data baked into model weights cannot be updated, audited, or scoped to a changing business problem, whereas a context-native data product that the model references can be updated securely.

The takeaway: Hallucination rates track context quality. Bigger models inherit the same blind spot.

Why More Data Makes It Worse

The instinctive fix for hallucination has been to feed the model more context: more tables, more schema, more documentation, via retrieval-augmented generation (RAG, where a model retrieves relevant documents at query time and inserts them into the prompt), few-shot prompting, or semantic search. Each compensates for missing context by adding tokens.

That scales badly on both accuracy and cost. Without a pre-defined data context, an AI agent must reconstruct understanding at inference time, triggering background queries just to build a single response. Every hop adds tokens, latency, and a new surface for error.

Pre-defined semantic context layers have cut input token consumption by over 90% compared to dynamic schema discovery, according to early enterprise deployments. For a company running thousands of agent interactions a day, that gap determines whether the AI program is economically sustainable at all.
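To make the economics tangible, here is a minimal back-of-the-envelope sketch. Every number in it (tokens per interaction, interactions per day, price per million tokens) is a hypothetical assumption chosen for illustration, not a measurement from any deployment, and the names `ContextStrategy`, `daily_input_cost`, and `token_reduction` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextStrategy:
    """How an agent obtains its context. Token counts are hypothetical."""
    name: str
    input_tokens_per_interaction: int


def daily_input_cost(strategy: ContextStrategy,
                     interactions_per_day: int,
                     usd_per_million_tokens: float) -> float:
    """Daily spend on input tokens for one strategy."""
    tokens = strategy.input_tokens_per_interaction * interactions_per_day
    return tokens / 1_000_000 * usd_per_million_tokens


def token_reduction(baseline: ContextStrategy, improved: ContextStrategy) -> float:
    """Fraction of input tokens saved relative to the baseline."""
    return 1 - improved.input_tokens_per_interaction / baseline.input_tokens_per_interaction


# Illustrative assumptions only, not figures from any real deployment.
dynamic = ContextStrategy("dynamic schema discovery", 40_000)
predefined = ContextStrategy("pre-defined semantic context", 3_000)
INTERACTIONS_PER_DAY = 5_000
USD_PER_MILLION_INPUT_TOKENS = 3.0

dynamic_cost = daily_input_cost(dynamic, INTERACTIONS_PER_DAY, USD_PER_MILLION_INPUT_TOKENS)
predefined_cost = daily_input_cost(predefined, INTERACTIONS_PER_DAY, USD_PER_MILLION_INPUT_TOKENS)

print(f"{dynamic.name}: ${dynamic_cost:,.2f} per day")
print(f"{predefined.name}: ${predefined_cost:,.2f} per day")
print(f"input-token reduction: {token_reduction(dynamic, predefined):.1%}")
```

Swapping in your own interaction volumes and token counts shows how quickly the per-interaction difference compounds into a daily budget line.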

Token cost is the smaller problem. Research on the "lost in the middle" effect found that model accuracy dropped significantly when the relevant information sat in the middle of a long context window rather than near its beginning or end. Flooding a model with unfiltered data collapses the signal-to-noise ratio and produces confident, plausible, wrong answers.

[Figure: The trap of long context windows and how that drops model accuracy. Chart showing model accuracy dropping in the middle of a long context window, illustrating the lost-in-the-middle effect in LLMs. Image: The Modern Data Company]

The model becomes the second doctor: aware of everything, equipped for nothing.

The takeaway: Adding context without governing it does not fix hallucination. It often makes the model more confidently wrong.

What Does Bounded Context Mean?

"Bounded context" comes from domain-driven software design, where it describes a scope of meaning within which a term has one agreed-upon definition. Revenue means something specific in billing and something slightly different in sales. Within each boundary, the term is unambiguous.

Applied to AI, a bounded context means the curated, governed, semantically consistent set of data and domain understanding that an AI agent needs to reason accurately about a specific business problem, and nothing more.

This is about equipping AI correctly instead of restricting it. The experienced ER physician is no less informed than a colleague who has read every textbook. She is differently informed, and that difference is what makes her useful.
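The idea of one agreed definition per boundary can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the `BoundedContext` class and both revenue definitions are hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedContext:
    """A scope of meaning in which each term has exactly one definition."""
    name: str
    definitions: dict[str, str]

    def define(self, term: str) -> str:
        # Refuse to guess: a term outside the boundary has no meaning here.
        if term not in self.definitions:
            raise KeyError(f"{term!r} has no agreed definition in the {self.name!r} context")
        return self.definitions[term]


# Hypothetical definitions; real ones come from the owning business domain.
billing = BoundedContext("billing", {"revenue": "invoiced amounts, net of credits and refunds"})
sales = BoundedContext("sales", {"revenue": "contract value booked when a deal closes"})

for context in (billing, sales):
    print(f"{context.name}: revenue = {context.define('revenue')}")
```

The design choice worth noting is the explicit error for undefined terms: an agent scoped to the sales context gets a clear refusal for a term it was never given, instead of silently borrowing a definition from somewhere else.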

[Figure: What is bounded context and why is it essential for navigating the context window. Visual explaining bounded context as a governed semantic layer of entities, rules, relationships, constraints, semantics, and governance. Image: The Modern Data Company]

Customer data management platforms and data catalogs have pursued pieces of this (semantics, governance, quality, lineage) for decades. What a data product adds is purposeful packaging: a single, versioned, use-case-specific unit with a defined interface, built for direct consumption by an AI agent. Knowledge graphs and fine-tuned models each solve part of the problem, but neither combines all of these elements into a unit that persists independently of any model consuming it.

The takeaway: Bounded context is not a new idea. The data product is the first unit designed to deliver it directly to an AI agent.

What Are the Six Pillars of a Data Product? The Contextual Infrastructure for AI

A data product is not a dashboard or a table. It is a governed, reusable, semantically enriched data asset built to solve one business problem, with every structural element it needs packaged together.

Semantics define what a field or metric means in business terms: not just that a column is called net_revenue, but which adjustments apply and what the fiscal calendar basis is.
Governance sets stewardship accountability, access policy, and regulatory constraint. In regulated industries, this is the difference between a defensible decision and a liability.
Data quality tells the AI what it can trust. A field with 40% null values should not carry the same weight as one that passes full quality checks.
Provenance and lineage cover two distinct concerns. Provenance records origin and custody: where the data came from and who collected it. Lineage maps the transformation path: which jobs touched it, and when.
Domain taxonomy defines how entities relate: how a SKU maps to a product line, and how a product line rolls up to a category. Without taxonomy, the AI cannot generalize across entities correctly.
Operational metadata is the catalog-facing layer: descriptions, tags, usage metrics, and certified-status flags that make an asset discoverable.

An AI agent working against a well-defined data product does not reconstruct meaning at inference time. The context window stays compact. The answers stay grounded.
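One way to picture the six pillars as a single packaged unit is the sketch below. It is an assumption-laden illustration, not the schema of any particular platform: the `DataProduct` and `FieldSpec` classes, the 5% null-rate threshold, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    meaning: str       # semantics: what the field means in business terms
    null_rate: float   # data quality: share of null values, 0.0 to 1.0


@dataclass(frozen=True)
class DataProduct:
    name: str
    version: str
    fields: tuple[FieldSpec, ...]   # semantics and data quality
    governance: dict[str, str]      # steward, access policy, regulatory constraint
    provenance: str                 # origin and custody of the data
    lineage: tuple[str, ...]        # transformation path, in order
    taxonomy: dict[str, str]        # child entity -> parent entity (assumed acyclic)
    metadata: dict[str, str]        # operational metadata: tags, certification

    def trusted_fields(self, max_null_rate: float = 0.05) -> list[FieldSpec]:
        """Fields whose null rate is at or below the (hypothetical) threshold."""
        return [f for f in self.fields if f.null_rate <= max_null_rate]

    def rollup(self, entity: str) -> list[str]:
        """Follow the taxonomy from an entity up to its top-level category."""
        path = [entity]
        while path[-1] in self.taxonomy:
            path.append(self.taxonomy[path[-1]])
        return path

    def to_context(self) -> str:
        """Compact, pre-built context an agent can read instead of rediscovering it."""
        steward = self.governance.get("steward", "unassigned")
        lines = [f"{self.name} v{self.version} (steward: {steward})"]
        lines += [f"- {f.name}: {f.meaning}" for f in self.trusted_fields()]
        return "\n".join(lines)


# Hypothetical sample product.
product = DataProduct(
    name="net-revenue",
    version="1.2.0",
    fields=(
        FieldSpec("net_revenue", "gross revenue minus refunds, fiscal calendar basis", 0.0),
        FieldSpec("discount_code", "promotion applied at checkout", 0.40),
    ),
    governance={"steward": "finance-data", "access": "finance-analyst"},
    provenance="billing system export",
    lineage=("ingest", "deduplicate", "apply_refunds"),
    taxonomy={"SKU-1": "laptops", "laptops": "electronics"},
    metadata={"certified": "true"},
)
print(product.to_context())
print(" -> ".join(product.rollup("SKU-1")))
```

In this sketch the field with 40% nulls never reaches the agent's context, and the taxonomy lookup answers the SKU-to-category question without any inference-time discovery.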

The takeaway: A data product is data plus understanding: the six pillars that turn a raw table into something an AI agent can reason about correctly.

How DataOS Enables Context-native Data Products

These six pillars are easy to describe and hard to operationalize at scale, which is the problem DataOS is built to solve. Instead of treating semantics, governance, and lineage as separate efforts bolted onto a warehouse, DataOS packages them into a single, versioned data product through its Data Product Hub.

[Figure: Enabling enterprise context with DataOS. Diagram showing DataOS packaging semantics, governance, quality, lineage, taxonomy, and metadata into a single data product. Image: The Modern Data Company]

Each product built on DataOS carries its own semantic layer, so:

An agent never has to guess what net_revenue includes.
Governance and access policy are enforced at the product layer itself (instead of being passed to the model as free-text instructions that it might ignore).
Quality certification and lineage update automatically as the product evolves, so an agent knows how current and trustworthy a field is before relying on it.
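The difference between enforcing policy in the product layer and asking a model to behave can be shown with a generic sketch. This is not the DataOS API: `project_for_role`, the policy table, and the role names are hypothetical and exist only to illustrate the principle that a field the agent is not allowed to read never enters its context.

```python
def project_for_role(record: dict[str, object],
                     allowed_roles: dict[str, frozenset[str]],
                     role: str) -> dict[str, object]:
    """Return only the fields `role` may read.

    Fields with no policy entry are dropped (deny by default), so the
    model never sees them and cannot be talked into revealing them.
    """
    return {
        name: value
        for name, value in record.items()
        if role in allowed_roles.get(name, frozenset())
    }


# Hypothetical policy and record.
POLICY = {
    "net_revenue": frozenset({"finance-agent", "support-agent"}),
    "customer_email": frozenset({"support-agent"}),
}
record = {
    "net_revenue": 1200.0,
    "customer_email": "a@example.com",
    "internal_note": "escalated",
}

finance_view = project_for_role(record, POLICY, "finance-agent")
support_view = project_for_role(record, POLICY, "support-agent")
print(finance_view)
print(support_view)
```

Because the filtering happens before any prompt is assembled, compliance does not depend on the model following instructions.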

The takeaway: DataOS turns the six pillars of data products into a deployable unit, so bounded context becomes infrastructure rather than tribal knowledge.

The Token Economics of Getting Context Right

Pre-packaged data products invert the cost problem. Context discovery happens once, at build time, not on every query, which is what produces the token reductions of over 90% reported in early enterprise deployments.

There is a second-order saving too.

Data profiling (understanding freshness, null distributions, and schema drift) is expensive, and every agent repeats it independently unless the work is shared.

A single data product lets one profiling run serve every consuming agent, and amortizes its build cost across every BI tool and workload that reuses it.
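The pay-once, reuse-everywhere pattern can be sketched as a profile that is computed on first request and then shared. This is a simplified illustration under stated assumptions: `profile_rows` only measures null rates, and the `SharedProfile` class and sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    row_count: int
    null_rates: dict[str, float]  # column name -> share of null values


def profile_rows(rows: list[dict]) -> Profile:
    """One (potentially expensive) pass over the data."""
    columns = sorted({key for row in rows for key in row})
    count = len(rows)
    null_rates = {
        column: sum(1 for row in rows if row.get(column) is None) / count
        for column in columns
    } if count else {}
    return Profile(count, null_rates)


class SharedProfile:
    """Profiles the data on first request, then serves the cached result."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows
        self._profile = None
        self.runs = 0  # how many times profiling actually executed

    def get(self) -> Profile:
        if self._profile is None:
            self._profile = profile_rows(self._rows)
            self.runs += 1
        return self._profile


# Hypothetical sample data: one of four rows is missing net_revenue.
rows = [
    {"order_id": 1, "net_revenue": 100.0},
    {"order_id": 2, "net_revenue": None},
    {"order_id": 3, "net_revenue": 80.0},
    {"order_id": 4, "net_revenue": 55.0},
]
shared = SharedProfile(rows)

# Three consuming agents ask for the profile; the work happens once.
profiles = [shared.get() for _agent in ("pricing", "forecasting", "support")]
print(f"profiling runs: {shared.runs}, consumers served: {len(profiles)}")
```

A production version would also need to invalidate the cached profile when the underlying data changes, which this sketch deliberately leaves out.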

The takeaway: Bounded context is cheaper at scale because the cost of defining it is paid once and reused everywhere.

How to Build Bounded Context Correctly

Context-native data products have to be built right to left: starting from the specific business problem, then working backward through data reality to validate the semantic model before the product reaches any agent.

[Figure: Reverse engineering context-native data products. Right-to-left process diagram showing context-native data products starting from a business problem, then data reality, then semantic validation. Image: The Modern Data Company]

The doctor analogy holds here as well. What makes the ER physician effective is not just years in medicine; it is that she has worked in this ER and kept her understanding current. A data product that was accurate six months ago but is built on a source system that has since changed schema is not just unhelpful; it actively misleads downstream users.

But a data product does something the doctor cannot: it does not retire. When the twelve-year veteran moves on, her contextual understanding goes with her. A data product's governance updates, quality certifications, and domain refinements stay layered into the product itself, building an institutional intelligence that survives turnover and compounds with every quarter.

The takeaway: Bounded context has to be earned and maintained like any other asset, but unlike institutional knowledge in a person's head, it does not disappear when someone leaves.
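Part of that maintenance is noticing when a source system has changed underneath the product. A minimal sketch of such a check, assuming schemas can be represented as column-to-type mappings, might look like this. The function name `detect_schema_drift` and both schemas are hypothetical.

```python
def detect_schema_drift(expected: dict[str, str],
                        observed: dict[str, str]) -> dict[str, list[str]]:
    """Compare the schema a data product was built on with what the source has now."""
    shared = expected.keys() & observed.keys()
    return {
        "missing": sorted(expected.keys() - observed.keys()),
        "unexpected": sorted(observed.keys() - expected.keys()),
        "type_changed": sorted(c for c in shared if expected[c] != observed[c]),
    }


# Hypothetical schemas: the product's contract versus the source system today.
expected_schema = {"order_id": "int", "net_revenue": "decimal", "region": "string"}
observed_schema = {"order_id": "string", "net_revenue": "decimal", "territory": "string"}

drift = detect_schema_drift(expected_schema, observed_schema)
is_stale = any(drift.values())
print(drift)
print(f"product needs review: {is_stale}")
```

Running a check like this on a schedule, and withdrawing certification when it reports drift, is one way to keep an agent from trusting a product that no longer matches its source.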

The Deeper Principle

The fundamental challenge of deploying intelligence in any complex domain is transforming general knowledge into situated competence: the ability to reason accurately within specific constraints.

Humans build that through domain experience and institutional memory. AI builds it through well-designed data products.

[Figure: Building institutional intelligence with compounding context in DataOS. Diagram showing how human domain experience, AI situated competence, and DataOS infrastructure create compound institutional value. Image: The Modern Data Company]

Before choosing which model to deploy or which agent framework to adopt, ask the prior questions:

Do we have the data products that would make any of those choices effective?
Have we built the bounded contexts our agents need to be competent rather than merely confident?

The models will keep improving. The data products have to keep up with them, and that underlying infrastructure is what determines whether enterprise AI delivers on its ROI promise.
