Inside the $50M Mission to Fix Clinical Evidence

Show Notes

Only 14% of daily medical decisions are backed by high-quality evidence. Brigham Hyde will tell you that is actually one of the higher estimates. He has spent his career at the intersection of clinical pharmacology, data science, and healthcare technology, and what he found when he looked at the evidence gap was not a data problem - it was an automation problem. The data exists. Hundreds of millions of patient timelines have been accumulating in electronic health records for 20 years. The bottleneck is the conversion step: turning raw data into peer-reviewed-quality evidence quickly enough to matter at the point of care.

Atropos Health was born out of Stanford, where Brigham's co-founder Dr. Shah built the original "green button" - press it and run a real-time study on patient data. What used to take a research team two to five months now takes minutes. The company has 894 million patient timelines across its evidence network, has produced over 100,000 novel studies, and is now deploying Chat RWD - a generative AI interface that lets clinicians ask questions and get evidence-backed answers in the time it takes to type a text message. This is not AI replacing doctors. It is AI giving doctors the evidence they have never had before.

The Evidence Gap and Why It Exists

Clinical trials are expensive, and most are funded by pharmaceutical companies with specific commercial interests - which means they focus on narrow populations and the healthiest patients to avoid confounding effects. The result: 70% of existing trials exclude patients with comorbidities like diabetes, obesity, and heart disease, conditions that describe roughly 60-70% of actual US patients. The evidence base that physicians rely on was largely generated on what Brigham calls "white male triathletes at Memorial Sloan Kettering" - not the patients in most exam rooms.

Atropos addresses this by using observational research - the real-world data in EHRs - to generate evidence at scale. The hard part is not the data; it is doing the analysis correctly. Causal inference methods, propensity score matching, and other statistical techniques are required to isolate the effect of a treatment from the confounding variables in messy real-world data. Do it wrong and you publish findings that do not replicate or, worse, lead clinicians to the wrong conclusion. Atropos automates the rigorous methodology so that high-quality evidence can be produced in minutes rather than months, at a fraction of the cost of a traditional clinical trial.

5 Frameworks from This Episode

1. The Data-to-Evidence Conversion Model

Having data is not the same as having evidence - the gap between them is the correct application of statistical methodology (causal inference, propensity matching, bias controls)
A study done incorrectly on the right data will produce the wrong conclusion and may not pass peer review - which means it cannot inform clinical decisions
Atropos automates the methodology, not just the analysis - it produces studies that meet the evidentiary standards physicians are trained to use
The history of medicine moved from having no data (pre-1970s) → having bad data → having good data → now needing to convert good data to evidence at scale: that is the current step
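
To make the data-versus-evidence distinction concrete, here is a minimal Python sketch on made-up counts. It is not Atropos's method: it uses plain stratification by a single confounder (disease severity) as a simple stand-in for the propensity-based techniques named above, and every name and number in it is hypothetical. In this toy data, sicker patients are more likely to receive the treatment, so the naive comparison and the adjusted comparison disagree.

```python
# Hypothetical counts: (severity stratum, treated?) -> (patients, bad outcomes).
# Sicker patients are treated more often, which confounds a naive comparison.
COUNTS = {
    ("mild", True): (100, 5),
    ("mild", False): (400, 40),
    ("severe", True): (400, 120),
    ("severe", False): (100, 40),
}


def crude_risk(treated):
    """Risk of a bad outcome, ignoring severity entirely."""
    rows = [c for (_, t), c in COUNTS.items() if t == treated]
    patients = sum(n for n, _ in rows)
    events = sum(e for _, e in rows)
    return events / patients


def standardized_risk(treated):
    """Risk if the whole population had this treatment status.

    Each stratum's risk is weighted by that stratum's share of all patients
    (direct standardization), so severity is balanced between the two arms.
    """
    total = sum(n for n, _ in COUNTS.values())
    risk = 0.0
    for stratum in {s for s, _ in COUNTS}:
        stratum_size = sum(COUNTS[(stratum, t)][0] for t in (True, False))
        n, events = COUNTS[(stratum, treated)]
        risk += (stratum_size / total) * (events / n)
    return risk


crude_diff = crude_risk(True) - crude_risk(False)
adjusted_diff = standardized_risk(True) - standardized_risk(False)
print(f"crude risk difference:    {crude_diff:+.3f}")
print(f"adjusted risk difference: {adjusted_diff:+.3f}")
```

On these invented numbers the crude difference is positive (the treatment looks harmful) while the adjusted difference is negative (the treatment looks protective within comparable patients). Same data, opposite conclusion: that is the sense in which the methodology, not the data, decides whether you end up with evidence.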

2. The Federated Data Model

Atropos never moves patient data - the technology is brought to the data at the health system where it is stored and appropriately stewarded
This solves the silo problem without creating new privacy or liability risks: the data never leaves the institution, but the analysis can cross 20+ data sets and 894 million patient timelines
Cloud computing made this possible - health data is no longer locked in hospital basement servers; it is on private and hybrid clouds where computation can happen
For any startup building on sensitive data (health, financial, legal), the federated model is worth examining: the value is in the analysis layer, not in owning the data
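
The "bring the computation to the data" idea can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a description of Atropos's architecture: the `SiteSummary` type, the `run_local_query` and `combine` functions, and the sample records are all hypothetical. The point it shows is that each site returns only aggregate counts, and the coordinator pools those counts without ever seeing a patient row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts a site releases; contains no patient-level rows."""
    site: str
    matched_patients: int
    patients_with_outcome: int


def run_local_query(site, records, min_age, outcome):
    """Runs inside the institution; raw records never leave this function."""
    matched = [r for r in records if r["age"] >= min_age]
    with_outcome = sum(1 for r in matched if outcome in r["diagnoses"])
    return SiteSummary(site, len(matched), with_outcome)


def combine(summaries):
    """Coordinator step: pools per-site aggregates into one result."""
    patients = sum(s.matched_patients for s in summaries)
    events = sum(s.patients_with_outcome for s in summaries)
    rate = events / patients if patients else None
    return {"patients": patients, "events": events, "rate": rate}


# Hypothetical records held at two separate institutions.
SITE_A = [
    {"age": 70, "diagnoses": {"gout"}},
    {"age": 66, "diagnoses": set()},
    {"age": 40, "diagnoses": {"gout"}},
]
SITE_B = [
    {"age": 81, "diagnoses": {"gout", "diabetes"}},
    {"age": 68, "diagnoses": {"gout"}},
]

summaries = [
    run_local_query("site_a", SITE_A, min_age=65, outcome="gout"),
    run_local_query("site_b", SITE_B, min_age=65, outcome="gout"),
]
pooled = combine(summaries)
print(pooled)
```

A production system would need much more than this (for example, suppressing very small counts that could identify individuals), but the division of labor is the same: the query travels, the records stay put.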

3. The Shift in Glass Theory

Each major technology platform shift is a "shift in glass" - where your eyeballs go: laptops to phones (iPhone), browsers to apps (App Store), and now apps to agents
If an agent can go hit all your apps in the background and return the answer, why would you ever open a browser or app again?
In healthcare: doctors today log into Epic, look up papers, type notes. The agentic shift means they talk to an agent, and the agent handles documentation (ambient AI) and calls specialized evidence agents in the background
Stanford's agentic tumor board is an early real deployment: clinicians talk in a Teams chat, invite specialized agents to interpret images or pull evidence, without leaving their workflow

4. The Jury Approach to AI Consensus

No single LLM is right about everything - the emerging best practice is to ask multiple models the same question and measure consensus (or divergence) across answers
Atropos published a benchmark of 3,000 clinical questions run through ChatGPT, Claude, Gemini, Perplexity, and their own service - measuring "answered with evidence" as the quality metric
The startup opportunity: build the orchestration layer that knows which model or agent to call for which part of a domain-specific workflow, and how to synthesize consensus back to the user
Big tech cannot easily dominate this layer because it requires vertical domain knowledge - this is where niche expertise becomes a durable competitive advantage
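
As a rough illustration of the jury idea, the Python sketch below takes one answer per model and reports the majority answer, the agreement rate, and which models dissented. It assumes the answers have already been normalized into comparable labels, which is the hard part in practice and is not shown here. The function name `jury_verdict`, the `min_agreement` threshold, and the model names are hypothetical.

```python
from collections import Counter


def jury_verdict(answers, min_agreement=0.6):
    """Summarize consensus across models.

    answers: dict mapping model name -> normalized answer label.
    Returns the majority answer only if enough of the jury agrees;
    otherwise the answer is None, signalling that the question is contested.
    """
    if not answers:
        raise ValueError("need at least one answer")
    tally = Counter(answers.values())
    top_answer, votes = tally.most_common(1)[0]
    agreement = votes / len(answers)
    dissenters = sorted(m for m, a in answers.items() if a != top_answer)
    return {
        "answer": top_answer if agreement >= min_agreement else None,
        "agreement": agreement,
        "dissenters": dissenters,
    }


# Hypothetical jury of four models answering the same question.
answers = {
    "model_a": "drug_x",
    "model_b": "drug_x",
    "model_c": "drug_y",
    "model_d": "drug_x",
}
verdict = jury_verdict(answers)
print(verdict)

# A split jury falls below the threshold, so no answer is returned.
split = jury_verdict({"model_a": "yes", "model_b": "no"})
print(split)
```

Returning no answer on a split jury is a deliberate design choice in this sketch: divergence is treated as information to surface to the user, not something to paper over with a coin flip.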

5. Drug Repositioning - Finding New Uses for Proven Drugs

Many drugs that are already approved and proven safe in humans may work for diseases they were never tested against - but identifying those matches historically required a 15-year drug development process
With real-world data at scale, you can run a study asking: in patients who took this drug for one condition, did the rate of a second unrelated condition go up or down?
Brigham's gout example: Stanford researcher Dylan Dodd found two antibiotics had opposite effects on gout via gut microbiome; Atropos confirmed it in millions of patients in a single day
The FDA's fast-track pathway for orphan drugs points toward where this could go: if a drug is proven safe and we can generate strong real-world evidence of efficacy quickly, why wait 15 years?
Every Cure (founded by David Fajgenbaum, who used this approach to find a cure for his own rare disease) is advocating for this model at the policy level
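
The repositioning question above ("did the rate of a second condition go up or down?") reduces, at its simplest, to comparing two rates. The Python sketch below computes a risk ratio with an approximate 95% confidence interval using the standard log-based formula. The counts are invented for illustration, and this is a crude comparison only: a real study would also need the confounder controls described in framework 1.

```python
import math


def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk ratio of exposed vs. unexposed, with an approximate 95% CI.

    Uses the usual large-sample standard error of log(RR); it requires
    non-zero event counts in both groups.
    """
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    z = 1.96  # normal quantile for a two-sided 95% interval
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high


# Hypothetical counts: patients who took drug X for condition A (exposed)
# vs. comparable patients who did not, and how many later developed condition B.
rr, low, high = risk_ratio(
    exposed_events=150, exposed_total=10_000,
    unexposed_events=300, unexposed_total=10_000,
)
print(f"risk ratio = {rr:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With these made-up counts the ratio is 0.5 and the whole interval sits below 1, which is the kind of signal that would justify a closer, properly adjusted look at the drug for the second condition.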

Founder Experiment: Map Your Domain's Evidence Gap

Step 1 - Identify the decisions in your domain that are made on instinct rather than data. In healthcare it is 86% of clinical decisions. In your vertical - legal, finance, operations, education - what percentage of important decisions are backed by rigorous evidence vs. experience and convention? The gap is the market.

Step 2 - Find where the data already exists but is not being converted into evidence. Atropos did not create new patient data; it automated the methodology to convert existing EHR data into actionable studies. In your domain, what data is being collected but never analyzed in a way that informs decisions?

Step 3 - Benchmark current AI tools against your domain's quality standard. Run 20-30 representative questions from your vertical through ChatGPT, Claude, Gemini, and any specialist tools. Score each answer against your domain's quality criteria (not just "does it sound right?"). This is your competitive landscape map and your product gap in one exercise.
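
One lightweight way to run the Step 3 exercise is a rubric plus a leaderboard, sketched below in Python. The three criteria, the tool names, and the grades are placeholders, not a recommended standard: substitute whatever your domain treats as quality. The grading itself (deciding whether each criterion was met) is assumed to be done by a human expert beforehand.

```python
from statistics import mean

# Hypothetical rubric; replace with your domain's own quality criteria.
CRITERIA = ("cites_source", "factually_correct", "actionable")


def score_answer(checks):
    """Fraction of rubric criteria met by one graded answer."""
    return sum(bool(checks[c]) for c in CRITERIA) / len(CRITERIA)


def leaderboard(graded):
    """Rank tools by mean rubric score across all benchmark questions.

    graded: dict mapping tool name -> list of per-question check dicts.
    """
    scores = {
        tool: mean(score_answer(checks) for checks in answers)
        for tool, answers in graded.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expert grades for two tools on two questions.
graded = {
    "tool_a": [
        {"cites_source": True, "factually_correct": True, "actionable": False},
        {"cites_source": True, "factually_correct": True, "actionable": True},
    ],
    "tool_b": [
        {"cites_source": False, "factually_correct": True, "actionable": False},
        {"cites_source": False, "factually_correct": True, "actionable": True},
    ],
}
ranking = leaderboard(graded)
for tool, score in ranking:
    print(f"{tool}: {score:.2f}")
```

Keeping the per-criterion grades (not just the final score) is what turns the leaderboard into a product gap map: in this toy data, the weaker tool never cites a source, which is exactly the kind of pattern the exercise is meant to expose.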

Step 4 - Design the federated version of your data model. Before building a data aggregation business, ask: can you bring computation to the data instead of moving the data? Federated models reduce regulatory friction, accelerate enterprise sales, and create network effects without the liability of owning sensitive records.

Step 5 - Identify a "green button" moment in your vertical. At Stanford, the green button was: press here, get a study. What is the single most valuable on-demand output your users would use daily if it were available instantly? Build that one thing first, prove it works, then expand - exactly as Atropos did from Stanford to 894 million patient timelines.

Glossary

Real World Evidence (RWE): Clinical evidence derived from real-world data - EHRs, claims data, patient registries, wearables - rather than controlled clinical trials. RWE captures the full diversity of patients and conditions, including the comorbidities that trial exclusion criteria filter out.

Evidence Gap: The deficit between the volume of clinical decisions made daily and the volume of high-quality evidence available to inform them. Only 14% of daily medical decisions are backed by high-quality evidence; Atropos is building the infrastructure to narrow this gap.

Causal Inference: A set of statistical techniques used in observational research to isolate the effect of a specific treatment or intervention from confounding variables in real-world data. The goal: show that an outcome changed because of the treatment, not because of something else correlated with it.

Federated Data Model: An architecture where computation is brought to data at its source rather than data being moved to a central repository. The data never leaves the institution; only results and models travel. Atropos uses this model to access 894 million patient timelines without creating a data warehouse.

Chat RWD: Atropos's generative AI interface built on a RAG (Retrieval-Augmented Generation) framework over their evidence library. Clinicians ask natural-language questions; Chat RWD surfaces relevant studies with an 'answered with evidence' confidence rating (green/yellow/red).

Drug Repositioning: The process of identifying new therapeutic uses for drugs that are already approved and proven safe. With real-world data at scale, potential repositioning candidates can be identified in days rather than the 15 years a traditional new drug approval requires.

Shift in Glass: Brigham's framework for platform shifts: each major technology transition moves where users' eyeballs go - laptops to phones, browsers to apps, apps to agents. The agentic AI shift means users talk to agents who handle all app interactions in the background.

Answered with Evidence: Atropos's quality evaluation metric for AI-generated clinical answers. A green badge means the answer is directly supported by a citable study; yellow or red means evidence is partial or missing, triggering the option to run a new on-demand study.

Prognostogram: The name Atropos gives to the PDF evidence reports they generate for clinicians. The term traces back to Duke University research in the 1970s - the original papers describing study generation from patient data - which is the intellectual origin of Atropos's entire approach.

Tools & Resources Mentioned

Atropos Health - Real world evidence platform - converts EHR data into on-demand clinical studies using automated causal inference methods. Used by health systems, academic medical centers, and life science companies.

Chat RWD - Atropos's generative AI clinical evidence interface - ask natural-language questions, get evidence-backed answers with study citations and confidence ratings.

Every Cure - Non-profit founded by David Fajgenbaum advocating for drug repositioning - identifying new uses for already-approved safe drugs using real-world data. Fajgenbaum famously used this approach to find a cure for his own rare disease.

Open Evidence - AI-powered medical question-answering platform for clinicians. Brigham cited it as an example of strong clinical summarization UI with significant physician adoption.

Epic - The dominant electronic health record system in US healthcare. Brigham referenced Epic's patient-facing agent announcements and its central role in the agentic AI shift in clinical workflows.

Q&A

Why do only 14% of medical decisions have high-quality evidence backing them?

Clinical trials are expensive and mostly funded by pharmaceutical companies with narrow commercial interests, which means they target specific populations and enroll the healthiest patients to avoid confounding. The result: 70% of existing trials exclude patients with comorbidities like diabetes, obesity, and heart disease - the conditions that describe 60-70% of actual US patients. The evidence base was built on an unrepresentative slice of humanity. The rest of medical decision-making falls into what Brigham calls the art of medicine: a physician's training, experience, and judgment, unaided by a relevant study.

What is the core technical insight behind Atropos Health?

The bottleneck was never the data - it was the methodology. Taking EHR data and generating a peer-review-quality study requires causal inference techniques (propensity score matching, confounder control, bias detection) that are time-consuming and technically demanding. Atropos automated that methodology so studies that previously took research teams two to five months to produce can now be generated in minutes. The innovation is not data access; it is the automation of scientific rigor at scale.

What happened when a Stanford neurologist used Atropos on a 12-year-old with distal nerve pain?

The neurology team suspected early-onset MS, which would have required a spinal tap and extensive imaging - a frightening and expensive workup for a child. One team member used Atropos to query what had happened to similar patients in the Stanford data. The study returned in four hours (now minutes): 85% of the roughly 300 matching cases had a latent viral infection within the prior two weeks. The team asked the family - yes, the child had been sick recently. They gave corticosteroids, the nerve pain resolved within 24 hours, the family avoided a $30,000 procedure, and the child went home in two days. A win for the patient, the physician, and healthcare costs simultaneously.

How does Atropos handle the privacy and silo problems in health data?

Atropos uses a federated model: the technology is brought to the data at each health system rather than the data being moved to a central warehouse. Patient timelines never leave the institution where they are stored. This sidesteps the privacy and regulatory complexity of data aggregation while still allowing analysis across 20+ data sets and 894 million patient timelines. Cloud computing made this feasible - health data is no longer in hospital basement servers but on private and hybrid clouds where computation can happen in place.

What is Chat RWD and how does it work?

Chat RWD is Atropos's generative AI interface built on a RAG (Retrieval-Augmented Generation) framework over their evidence library. Clinicians type natural-language questions - the same way they would text a colleague - and the system surfaces relevant studies with an 'answered with evidence' badge (green if directly supported, yellow or red if evidence is partial or missing). Atropos tested their system alongside ChatGPT, Claude, Gemini, and Perplexity across 3,000 clinical questions; the benchmark measures whether the answer is backed by a citable study, not just whether it sounds plausible.

What is the drug repositioning opportunity and why does it matter?

Thousands of drugs that are already approved and proven safe in humans may be effective against diseases they were never studied for - but without a study connecting them, clinicians cannot prescribe them for those uses and the FDA cannot approve the new indication. With real-world data, you can ask: in patients who took drug X for condition A, did the rate of condition B go up or down? Brigham's example: Stanford researcher Dylan Dodd found two antibiotics had opposite effects on gout via gut microbiome shifts. Atropos confirmed the finding across millions of patients in one day. The 15-year drug approval timeline drops to weeks if the drug is already proven safe.

What does the agentic AI shift look like inside a hospital today?

Stanford is deploying Microsoft's agentic orchestration platform with an early use case: a tumor board that runs in Teams chat. Clinicians discuss cases in the chat and can invite specialized agents - an imaging agent to read scan dimensions, an evidence agent (Atropos) to pull relevant clinical studies for a specific patient profile, a documentation agent to handle note-writing. Nobody has to log into the EHR, look up papers in a separate system, or switch contexts. Brigham also described Atropos running passively in the background during visits: absorbing the doctor-patient conversation, identifying the key clinical decisions, and surfacing relevant evidence before the physician even types a question.

Why is the evidence gap a startup opportunity rather than just a healthcare policy problem?

Because lowering the cost of generating studies creates a business, not just a public good. Life science companies currently need two to five months and large teams to produce a single real-world evidence study for drug development. Atropos does it in a day. That automation has direct commercial value - faster R&D cycles, cheaper evidence generation, earlier signal on drug efficacy. On the health system side, every avoided unnecessary procedure (like the spinal tap) is a cost saving. The evidence gap is a policy failure that is also a massive inefficiency that technology can monetize while fixing.

What is Brigham's view on AI replacing physicians?

He is not building a physician replacement - he is building the evidence layer that physicians have never had. His framing: if you give a well-trained physician a peer-reviewed study, they know exactly what to do with it. The problem is there are not enough studies. Atropos generates those studies on demand. The physician still makes the clinical judgment; the AI provides the evidentiary foundation that was previously missing. The parallel to AI in other professions: the tool does not replace the expert's judgment, it improves the information environment that judgment operates in.