
Your company is sitting on a data gold mine
with Mehul Shah, Aryn
Show Notes
Mehul Shah helped build OpenSearch and AWS Glue - two of the foundational tools that enterprises use to manage their data at scale. Now he is running Aryn with a co-founder he has worked alongside for 12 years, focused on the problem that comes before most AI strategies: 90% of enterprise data is unstructured, locked in documents, PDFs, scanned forms, and emails that AI systems cannot read without help.
Aryn's name comes from a Gaelic and Hebrew word meaning "tall mountain" - a deliberate metaphor for the challenge enterprises face. The data mountain is there. Every company is sitting on it. Almost none of them have figured out how to summit it.
The Invisible AI Advantage
Most people think of AI as something they interact with - a chatbot, a prompt, a voice assistant. Mehul's most important insight is that the highest-value AI in enterprises runs invisibly. Users do not know it is there. It simply makes their workflow better.
The concrete example: Aryn works with specialty insurance underwriters who spend their days manually cutting and pasting data from submission packets into spreadsheets and policy admin systems. The AI does not replace them - it does the cutting and pasting invisibly in the background. The underwriter opens their SharePoint and the spreadsheet is already populated. No chatbot interface. No new behavior required. The work is just done. Aryn calls this "meeting humans where they are" - in the workflows they already have.
The Last 15% Problem
Getting an AI system to 80% accuracy is relatively easy. General-purpose frontier models are good enough to do it out of the box. The problem is that 80% accuracy is not useful in an enterprise context.
If you are an insurance estimator quoting a $10 million construction project and your AI gets the cost estimate wrong by 15%, you eat $1.5 million. If you are a legal analyst looking for case precedent and your AI misses the relevant filing, you walk into court without a strategy. The penalty for failure in enterprise AI is orders of magnitude higher than in consumer applications.
Mehul's framing: enterprise use cases require 95-99% accuracy. Getting from 80% to 99% is where all the real work lives. It requires deep specialization in specific document types, industry-specific training, and relentless edge case handling. This is what separates enterprise AI companies from impressive demos - and it is why most enterprise AI projects stall after the demo phase.
Verifiability as the AI Success Criterion
Mehul identifies a single underlying criterion that predicts where AI succeeds and where it fails: can a human quickly and independently verify the output?
- Writing and summarization: You read it. You know immediately if it captured the right points. High verifiability - AI has been successful here for years.
- Code generation: Experienced developers can scan generated code and spot errors immediately. High verifiability - GitHub Copilot adoption is real and growing.
- Protein structure prediction: Scientists have independent experimental methods to validate AI predictions. High verifiability - AlphaFold has been transformative.
- Self-driving vehicles: Verification requires that no one dies. Extremely low verifiability in real-world conditions - took 15 years from first Waymo to commercial deployment.
The implication for founders: build AI applications where the output can be verified quickly by the person using it. Stay away from applications where verification is expensive, slow, or requires specialized expertise to confirm. The former category is where AI creates compounding value. The latter is where projects die in pilot.
Human in the Loop vs. Human on the Critical Path
Mehul draws a precise distinction that most AI discussions collapse: humans should be in the loop for decisions, but removed from the critical path of execution.
The example he uses: imagine a maritime operator whose job is to identify distressed vessels from satellite images all day. After eight hours of scanning blue tiles, fatigue and attention decay make them error-prone. AI can scan continuously without fatigue and flag the candidates. The human's job becomes: look at what the AI identified and confirm whether it is actually a distressed vessel. The human is still in the loop - they make the final decision - but they are no longer in the critical path of reviewing every tile. The expensive, error-prone part is automated. The consequential judgment stays human.
This is the right frame for most enterprise automation. The goal is not to replace human judgment - it is to redirect human attention toward the decisions that actually require it.
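The flag-and-confirm pattern described above can be sketched in a few lines. This is a toy illustration, not Aryn's system: the scoring function, threshold, and tile names are all invented, standing in for whatever detector produces a confidence score per item.

```python
# Toy sketch of "human in the loop, off the critical path": an AI scorer
# screens every item, and a human reviews only the flagged candidates.
# The threshold and scores below are invented for illustration.
def triage(items, score, threshold=0.8):
    """Return only the items the AI flags for human review."""
    return [item for item in items if score(item) >= threshold]

# Pretend detector confidences over six satellite tiles
tiles = [("tile_a", 0.05), ("tile_b", 0.92), ("tile_c", 0.10),
         ("tile_d", 0.85), ("tile_e", 0.02), ("tile_f", 0.30)]

flagged = triage(tiles, score=lambda t: t[1])
print([name for name, _ in flagged])  # human reviews 2 tiles, not 6
```

The human still makes every final call on the flagged items; what changes is that they no longer scan all six tiles to find the two that matter.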
Q&A
What does Aryn do and who are its primary customers?
Aryn helps enterprises extract structured, actionable information from unstructured documents - PDFs, scanned forms, emails, contracts, insurance submissions, legal filings, real estate offering memos. The AI pulls specific fields with high accuracy and routes them into the downstream systems (SharePoint, Salesforce, policy admin systems) that enterprise workflows already use. Primary customers are insurance MGAs, legal firms, construction estimators, and logistics companies - anywhere that knowledge workers spend hours manually reading and transcribing documents.
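The core shape of "pull specific fields with high accuracy and route them downstream" can be illustrated with a toy sketch. To be clear, this is not Aryn's implementation - production systems use ML models over messy PDFs and scans, not regexes over clean text - and every field name and document line here is invented.

```python
# Illustrative only: a toy version of "extract structured fields from an
# unstructured document and emit a record for downstream systems."
# Real pipelines use ML extraction, not regexes; fields here are invented.
import re

SUBMISSION = """
Insured Name: Acme Logistics LLC
Policy Effective Date: 2025-01-01
Total Insured Value: $12,500,000
"""

FIELD_PATTERNS = {
    "insured_name": r"Insured Name:\s*(.+)",
    "effective_date": r"Policy Effective Date:\s*([\d-]+)",
    "tiv_usd": r"Total Insured Value:\s*\$([\d,]+)",
}

def extract(text):
    """Pull each named field from the text into a structured record."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1).strip() if match else None
    # Normalize the dollar amount so downstream systems get a number
    if record["tiv_usd"]:
        record["tiv_usd"] = int(record["tiv_usd"].replace(",", ""))
    return record

print(extract(SUBMISSION))
```

The structured record is the handoff point: once the fields exist as typed values, routing them into SharePoint, Salesforce, or a policy admin system is ordinary integration work.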
Why is 80% AI accuracy insufficient for enterprise use cases?
Because the penalty for the remaining 20% of errors is often catastrophic. A construction estimator who quotes $7 million on a $10 million job eats the $3 million gap. A legal analyst who misses key precedent walks into court without a strategy. A medical diagnostic AI that is wrong 1 in 5 times creates liability, not value. Enterprise AI requires 95-99% accuracy for most production use cases. Getting from 80% to 99% is where all the real engineering work lives - and where most enterprise AI projects fail.
What is the invisible AI model and why does it matter?
Most AI products require users to change their behavior - open a new interface, learn a new workflow, adapt to a chatbot. Aryn's model is different: the AI runs in the background of existing workflows, and users simply notice that tedious work is already done when they open their usual tools. No chatbot. No new interface. No training required. This matters because enterprise AI adoption fails when it adds cognitive load or requires behavior change. The path of least resistance is AI that blends into what users already do.
How should founders decide which AI use cases to build first?
Start with use cases where the output is quick and easy to verify independently. Writing, summarization, code generation, document extraction - these all pass the verifiability test because the person using the AI can immediately tell if it got it right. Avoid use cases where verification is expensive, slow, or requires specialized expertise to confirm. The former category compounds in value as AI improves. The latter stalls in pilots because the organization cannot build confidence in the results.
What is the distinction between human-in-the-loop and human on the critical path?
Human-in-the-loop means a human makes or confirms the final decision. Human on the critical path means a human is doing the execution work between decisions. Aryn's model removes humans from the critical path (reading every document, extracting every field) while keeping them in the loop (reviewing flagged items, making underwriting decisions). This matters because the execution work is where human error, fatigue, and inefficiency accumulate. The judgment work is where human intelligence is irreplaceable.
What is Mehul's prediction for where AI goes in the next 10 years?
AI processing speed is improving 10x per year - faster than Moore's Law. Today AI can read all of Shakespeare in under 10 minutes. In a few years, under 10 seconds. In 10 years, it will be possible to sweep all the PDFs in the world's largest document repository in under a day. The implication: the bottleneck will shift from model capability to data quality and structure. Companies that have organized and structured their unstructured data will have a compounding advantage over those that have not - because when the model is powerful enough to read everything, having clean data to read will be what separates insight from noise.
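The arithmetic behind the Shakespeare claim is a simple compounding projection. A back-of-the-envelope sketch, taking the episode's numbers (roughly 10 minutes today, 10x throughput improvement per year) as given assumptions:

```python
# Back-of-the-envelope check of the episode's claim: ~10 minutes to read
# all of Shakespeare today, throughput improving 10x per year. Both
# numbers are taken from the episode as assumptions, not measurements.
def read_time_seconds(years_from_now, today_minutes=10.0, speedup_per_year=10.0):
    """Projected read time after compounding annual speedups."""
    return today_minutes * 60.0 / (speedup_per_year ** years_from_now)

for years in range(4):
    print(f"{years} years out: {read_time_seconds(years):g} s")
```

Two years of 10x improvement takes 600 seconds down to 6 seconds - consistent with "in a few years, under 10 seconds."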