The $1.5 Trillion AI Problem Nobody Talks About - Until Now
November 30, 2025 · 00:54:51

with David Carmel, DataRockit

Show Notes

David Carmel is not a trained data scientist. He is a former corporate lawyer who noticed a $1.5 trillion problem hiding in plain sight: most enterprises cannot actually use AI because their data is either trapped in legacy mainframes, scattered across unstructured cloud storage, or duplicated in ways nobody has audited. He built DataRockit to solve the plumbing problem before the AI strategy problem. His claim: data is just data until it is clean enough for AI.

The Wall Street Journal puts the annual cost of US technical debt at $1.52 trillion. The US government alone spends $80 billion a year maintaining legacy systems that could be replaced. 75% of the Fortune 500 still run mainframes, 90% of enterprise data is unstructured, and by David's reckoning 99% of organizations have an unstructured cloud data problem. The reason most AI deployments underperform is not the model - it is the fuel going in.

The Victorian Plumbing Problem

David's most useful metaphor: AI adoption is blocked by Victorian-era plumbing. The pipes are old, incompatible with modern infrastructure, and nobody wants to tear out the walls to replace them. The result is enterprises doing the splits - one foot in legacy mainframe systems, one foot in the cloud - with two systems that do not communicate. Data scientists end up working as potato peelers: cleaning dirty data by hand instead of extracting insight from it.

DataRockit's core operation: clients upload their data - legacy mainframe exports, unstructured cloud files, or bloated LLM datasets - and receive it back in any format they choose, cleaned and AI-ready, in hours rather than years. The dashboard accepts data from multiple continents simultaneously, can combine 40 mainframes across 300 different locations into a single clean output, and scales from 5 gigabytes to petabytes. Clients also receive a full index showing what was done, with optional recommendations - but DataRockit does not tell clients how to use their data; its job is to put that data in the best possible position.
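
To make the shape of that operation concrete, here is a minimal, hypothetical Python sketch of a transform of this kind: normalize records from mixed sources, drop exact duplicates, emit the requested format, and return an index of what was done. It illustrates the described workflow only - it is not DataRockit's actual code or API, and every name in it is invented.

```python
# Hypothetical illustration of the workflow described above, not
# DataRockit's actual code or API. Assumes records have already been
# extracted from their source systems into flat dicts.
import csv
import io
import json

def normalize(record: dict) -> dict:
    """Lowercase keys and trim string values so records from
    different sources share one schema."""
    return {k.strip().lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in record.items()}

def transform(records: list[dict], output_format: str = "json") -> tuple[str, dict]:
    """Return cleaned data in the requested format, plus an index of
    what was done (mirroring the 'full index' described above)."""
    cleaned, seen = [], set()
    for rec in records:
        norm = normalize(rec)
        key = json.dumps(norm, sort_keys=True)  # cheap exact-duplicate check
        if key not in seen:
            seen.add(key)
            cleaned.append(norm)
    index = {"input_records": len(records),
             "output_records": len(cleaned),
             "duplicates_removed": len(records) - len(cleaned)}
    if output_format == "json":
        return json.dumps(cleaned, indent=2), index
    buf = io.StringIO()  # otherwise emit CSV (assumes at least one record)
    writer = csv.DictWriter(buf, fieldnames=sorted(cleaned[0]))
    writer.writeheader()
    writer.writerows(cleaned)
    return buf.getvalue(), index
```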

Three Layers of AI Data Readiness

David identifies three distinct data problems that each block AI differently:

  • Legacy mainframe data: Structured but siloed in systems designed before the internet. Expensive to maintain, incompatible with modern APIs, and unable to communicate with cloud infrastructure. The US government spends $80 billion annually on this problem.
  • Unstructured cloud data: 90% of all enterprise data. Documents, emails, PDFs, recordings, images - files that exist in cloud storage but have never been organized, deduplicated, or formatted for machine consumption. DataRockit finds companies with the same data duplicated in 1,500 different locations (see the deduplication sketch after this list).
  • Over-provisioned LLM data: Enterprises feeding large volumes of unoptimized data to LLMs, using enormous compute to compensate for data quality problems that could be solved upstream. David's framing: you are dropping bunker busters when you need a scalpel.
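
The duplication problem in particular has a standard, cheap diagnostic: hash every file's contents and group files by hash. The sketch below is illustrative only - the directory name is hypothetical, and production code would stream large files rather than read them whole - but content hashing is the usual way claims like "the same data in 1,500 locations" get verified.

```python
# Illustrative only: the directory is hypothetical, and production
# code would stream large files instead of reading them whole.
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Map each content hash to every file that shares it."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Keep only hashes that appear more than once.
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

for digest, copies in find_duplicates("./cloud_export").items():
    print(f"{len(copies)} copies sharing hash {digest[:12]}: {copies}")
```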

Outcome-Linked Pricing for Enterprise

David is candid that pricing is a work in progress - the model reflects a philosophy more than a fixed rate card. The target: seven-figure annual revenue per client. The structure: an upfront fee to begin the engagement, followed by a percentage of the savings the client realizes from eliminating legacy technical debt and cloud waste, plus a monthly support fee for ongoing data hygiene.

The logic is deliberate alignment. If DataRockit participates in the savings, both sides are incentivized to maximize the outcome: the vendor is paid not for the work but for the result. David also insists on the right to present the engagement as a case study - showcasing what was done together is part of the deal. For clients that cannot accept outcome-linked pricing, DataRockit will negotiate a fair flat rate, but the outcome model is the preference because it puts both parties' incentives on the same side.
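
As a back-of-the-envelope illustration of how the three components compose - all numbers below are hypothetical, since the episode gives the structure but not the rates - a single engagement might look like this:

```python
# All numbers are hypothetical: the episode gives the structure
# (upfront fee + share of savings + monthly support) but not the rates.
upfront_fee     = 250_000      # one-time fee to begin the engagement
client_savings  = 5_000_000    # annual legacy/cloud spend eliminated
savings_share   = 0.12         # vendor's percentage of those savings
monthly_support = 20_000       # ongoing data-hygiene retainer

year_one_revenue = (upfront_fee
                    + client_savings * savings_share
                    + monthly_support * 12)
print(f"Year-one revenue per client: ${year_one_revenue:,.0f}")
# -> Year-one revenue per client: $1,090,000 (the seven-figure target)
```

Note the structural point: only one of the three components is fixed, so the vendor's upside grows with the client's savings rather than with billable effort.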

Distribution-First Enterprise GTM

Getting a direct meeting with a Fortune 500 CIO as an unknown startup is nearly impossible. David's solution: find people who already have a seat at the table and give them a reason to introduce DataRockit. These are value-added resellers, managed service providers, and VC firms with enterprise portfolios - people who serve the target buyer and are actively looking for the next differentiating product to bring to their clients.

David is also piloting an unconventional model: working with VC firms not for funding, but as distribution partners. The VC has portfolio companies and LP relationships with the banks and enterprises DataRockit needs to reach. The VC can earn meaningful revenue as a reseller. Neither party needs to close a funding round for the relationship to create value. This approach treats distribution as a co-investment rather than a sales channel - and offers a model for other B2B founders trying to reach enterprise buyers through trusted intermediaries.

Q&A

What exactly does DataRockit do and how does the onboarding work?

DataRockit transforms legacy mainframe data, unstructured cloud data, and over-provisioned LLM datasets into clean, AI-ready data. Onboarding is intentionally simple: clients upload their data through a dashboard and specify what format they want back. DataRockit handles the transformation, returns a full index of what was processed, and the client receives data that is immediately usable by AI systems. No multi-year migration project. No consultant engagement. Hours, not years.

Why is dirty data such a big problem for AI specifically?

AI models learn from and operate on data. When the input data is duplicated, inconsistently formatted, unindexed, or trapped in formats the model cannot parse, the model's outputs degrade - it hallucinates, produces incorrect results, or simply cannot access the information you need it to use. Most enterprise AI failures are diagnosed as model failures but are actually data failures. The model is doing exactly what it is supposed to do with the fuel it is given.
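
A toy example of a data failure masquerading as a model failure: the same entity stored three inconsistent ways defeats the exact-match lookup that any downstream retrieval step depends on, and a cheap upstream normalization fixes it. The records and the `canon` helper here are invented for illustration.

```python
# Toy data: one customer stored three inconsistent ways.
records = [
    {"customer": "ACME Corp.",  "balance": 1200},
    {"customer": "Acme Corp",   "balance": 1200},  # different casing/punctuation
    {"customer": " acme corp ", "balance": 1200},  # stray whitespace
]

# Exact matching, the implicit assumption in many pipelines, sees one record:
print(len([r for r in records if r["customer"] == "Acme Corp"]))  # -> 1

def canon(name: str) -> str:
    """Cheap upstream normalization: trim, drop trailing dot, lowercase."""
    return name.strip().rstrip(".").lower()

# After normalization, all three copies are visible (and deduplicable):
print(len([r for r in records if canon(r["customer"]) == canon("Acme Corp")]))  # -> 3
```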

What is the $1.5 trillion technical debt number and where does it come from?

The Wall Street Journal cites $1.52 trillion as the annual cost of technical debt in the US, of which legacy data infrastructure is a major component. The US government alone spends $80 billion per year maintaining mainframe systems. Fortune 500 companies - 75% of which still run mainframes - spend billions each year on workarounds rather than solutions. Chase alone spends close to a billion dollars annually on this problem. The total represents ongoing spending, not a one-time fix cost.

What is DataRockit's pricing model and how does it structure value sharing?

DataRockit uses a three-part structure: an upfront setup fee, a percentage of the savings the client realizes from eliminating legacy technical debt and cloud waste, and a monthly support fee for ongoing data hygiene. The target deal size is seven figures of annual revenue per client. The outcome-linked component is designed to align both parties - DataRockit earns more when the client saves more. Clients that cannot accept outcome-linked pricing can negotiate a flat rate, but David views the shared-savings model as the right structure for enterprise relationships.

How does DataRockit get in front of Fortune 500 buyers without direct connections?

Distribution through trusted intermediaries. DataRockit looks for value-added resellers and service providers who already serve their target enterprises and want a differentiating product to bring to clients. They are also piloting a distribution model with VC firms - not for capital, but for access. The VC has enterprise relationships and can earn reseller revenue; DataRockit gets warm introductions at the executive level. Neither party needs a funding transaction for the relationship to create value.

What mistakes are companies making with AI adoption right now?

Getting too focused on AI capabilities before fixing the data that AI will consume. David's framework: first clarify where you want to go, then identify what is standing in the way. In most cases, what stands in the way is not the AI model - it is the quality, structure, and accessibility of the underlying data. Companies spend hundreds of millions on AI implementations that underperform because they never fixed the Victorian plumbing. The fix is often faster and cheaper than the AI investment itself.
