ConnectSphere

What humans were silently fixing

Inside almost every enterprise, there's a layer of data work that doesn't appear on any system diagram and doesn't have a name. Excel exports get reconciled by hand. A senior analyst who's been at the company for twenty years quietly knows that the customer-status code A in the CRM means active but in the legacy billing system means archived, and translates between them in her head when she runs reports. A junior analyst pastes data from three sources into a spreadsheet and applies VLOOKUP rules that exist nowhere except in his fingers.

This is the shadow data layer. It's invisible because nothing in the formal IT estate measures it, and invisible because the people who do it experience it as just doing my job, not as compensating for structural data deficiencies.

For thirty years, the shadow data layer kept everything working. The contradictions in the underlying systems were never resolved — the systems still disagreed about what a customer's address was, what a contract's status was, what a batch's purity reading was — but the contradictions were silently fixed at every output: every report, every dashboard, every regulatory filing, every customer interaction. The fix happened in the human layer, between the systems and the consumer of the data.

Then AI arrived, and the silent fix stopped happening.

Why this worked at human scale

The shadow data layer worked because humans are good at exactly the kind of resolution it requires: they can hold ambiguity, ask follow-up questions, apply judgment, escalate when uncertain, and take responsibility for the output. None of these are model behaviors.

When the senior analyst sees three values for a customer's address, she notices, picks one based on contextual cues — which system was updated most recently? which one matches what the customer told us last week? — and produces a report with one address. The report doesn't say the customer has three different addresses. It says the customer has this address. Whoever reads the report consumes the resolved value.
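To make the implicit rule concrete, here is a minimal Python sketch of the most-recently-updated heuristic the analyst applies without ever writing it down; the systems, field names, and dates are illustrative, not any real schema.

    from datetime import date

    # Candidate values for one customer's address, one per source system.
    # Systems, fields, and dates are illustrative, not a real schema.
    candidates = [
        {"system": "crm",     "address": "12 Elm St",  "updated": date(2024, 3, 1)},
        {"system": "billing", "address": "98 Oak Ave", "updated": date(2021, 7, 19)},
        {"system": "support", "address": "12 Elm St.", "updated": date(2023, 11, 5)},
    ]

    # The unwritten rule: trust whichever system was touched most recently.
    resolved = max(candidates, key=lambda record: record["updated"])

    print(resolved["address"])  # one address reaches the report: "12 Elm St"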

Multiply this by every person doing data work in the company, every output, every day, for thirty years. The aggregate effect is enormous. The data underneath is inconsistent; what reaches consumers is consistent. Compliance reviews see consistent data. Executives see consistent data. Customers see consistent data. The systems were never reconciled, but the outputs always were.

This was sustainable as long as humans were the bottleneck and the resolution layer. It cost a lot — half of analyst time, sometimes more, went to reconciliation work — but the cost was paid invisibly, never appearing on any line item.

What changed

AI replaces the human pre-processing layer, and the silent fix stops happening.

When a company swaps an analyst's Excel reconciliation for an LLM-generated dashboard, the reconciliation work doesn't get reassigned — it disappears. The model reads the source systems directly, sees three values for the customer's address, and produces an output. The output is whatever the model decided: an averaged blend, the most-recent value, a hallucinated synthesis. The output looks confident. It also goes straight to the consumer with no resolution layer in between.

The same pattern repeats with RAG systems retrieving from contradictory documents, agent workflows querying multiple databases, and Skills procedures running over unreconciled tables. The shape is the same: the human resolution layer is removed, and what was always true about the underlying data — that it disagreed with itself — becomes visible.

Why hallucinations are a diagnostic signal

This reframes what AI hallucinations actually are. They're not the model misbehaving. They're the model correctly reporting that the data it was given contains contradictions it can't resolve. The error appears at the model layer; the cause is at the data layer; the diagnosis is that the substrate is inconsistent.

The traditional response is to suppress the symptom: better prompts, more grounding, fine-tuning to make the model less likely to invent facts. None of these address what the symptom is signalling. They make the model better at hiding the inconsistency, which means making the model better at recreating the silent fix the human layer used to provide.

The alternative response is to read the signal: the data has been like this all along; AI is just the first technology that doesn't paper over it; the way to make AI work is to fix the data, not the model.

What this looks like in practice

For most enterprises, the truth-exposure dynamic shows up in a recognisable pattern:

  • The pilot worked. The model performed brilliantly on the curated, hand-cleaned demo dataset.
  • Production didn't. The model started hallucinating, refusing, or producing answers that contradicted the dashboards.
  • The instinct was to blame the model. Try a different model. Add more RAG. Fine-tune. None of it worked, because the model wasn't the problem.

What had happened was that the demo dataset was implicitly a reconciled dataset — someone, somewhere, had done the silent fix before the data got to the model. In production, no one did. The model was reading what the underlying systems actually contained, and what the underlying systems contained was contradictory by construction.

The right response isn't to escalate model sophistication. It's to do, structurally and once, what the shadow data layer was doing tactically and continuously: resolve the contradictions in the substrate rather than at every output. The cost angle of not doing this is covered in The rework tax — the verification overhead that absorbs the AI productivity gain.

How ConnectSphere applies this

ConnectSphere's normalization engine produces a substrate where the resolution that used to happen in the shadow layer is built into the data shape. Every regulated fact has one canonical value, with provenance back to source. The Semantic Dictionary records the resolution decisions explicitly — not in a senior analyst's head, but in version-controlled metadata that every downstream consumer (humans, BI tools, LLMs) reads the same way.
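As a sketch only (this is not ConnectSphere's actual schema), an externalised resolution record needs to carry three things: the canonical value, the rule that produced it, and provenance back to every contributing system.

    # Hypothetical shape of an explicit resolution record; the real Semantic
    # Dictionary format is ConnectSphere's own and isn't reproduced here.
    # The point: the decision the analyst made in her head is now explicit,
    # versioned metadata that humans, BI tools, and LLMs all read identically.
    customer_address_fact = {
        "fact": "customer.address",
        "canonical_value": "12 Elm St",
        "resolution_rule": "most_recently_updated_source",
        "provenance": [
            {"system": "crm",     "value": "12 Elm St",  "updated": "2024-03-01"},
            {"system": "billing", "value": "98 Oak Ave", "updated": "2021-07-19"},
            {"system": "support", "value": "12 Elm St.", "updated": "2023-11-05"},
        ],
        "version": "2024-06-12",
    }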

The shadow data layer doesn't disappear; it gets externalised. What was implicit, fragile, and dependent on continued human attention becomes explicit, testable, and durable.

What was always there

The data quality problem in your enterprise isn't new. It's been there for decades. What's new is that there's now a technology — AI — that doesn't do the unpaid, invisible labor of hiding it.

This isn't AI's fault. It also isn't a problem you can solve by making AI better at the hiding. The hiding was the workaround; the underlying issue was always the substrate.

AI didn't break your data. It just stopped covering for it.

Ready to Map Your Fragmented Landscape — and See the Path to One Logical Truth?

In a 30-minute diagnostic call, we:

  • Review your current data landscape for redundancy hotspots and contradictions
  • Show a high-level redundancy map tailored to your systems
  • Outline your exact 6-month POC timeline and expected outcomes

No slides. No sales pitch. Just honest architecture insight to decide if this keystone makes sense for your environment.

Prefer email first? hello@connect-sphere.ai

Or message us on LinkedIn

We typically respond within 24 hours and work with enterprises ready for architectural change.