Early warning for neurology and mental health with voice AI
October 21, 2025 · 01:02:42

with Henry, Canary Speech

Show Notes

Henry was calling his daughter twice a week after she moved out. She would tell him everything was fine. And every time, within moments of the call starting, he knew whether it actually was. Not from her words. From something in her voice - something he could not explain but could not ignore.

He asked his co-founder Jeff Adams, the engineer who built Dragon NaturallySpeaking, led the team that created the Amazon Echo, and spent his early career building mathematical models to decrypt intelligence communications at the NSA: how are we doing that? Jeff's answer: I don't know. Let's go figure it out.

Eight hours later, in a bagel shop in 2016, Canary Speech was born. Today the company holds 14 issued patents across five patent families, has raised $26 million, and is deploying a system that listens to ordinary doctor-patient conversations and returns real-time assessments for up to a dozen diseases - including Parkinson's, Huntington's, multiple sclerosis, anxiety, depression, and mild cognitive impairment - without the doctor or patient doing anything differently at all.

How Canary Speech Works

Speech is the most complex motor function the human body performs. The central nervous system is driving the vocal cords, controlling the rate and power of respiration, coordinating tongue movement, cheek movement, jaw position - all of it simultaneously, almost none of it consciously controlled. When any part of that system is affected by disease - a common cold, a progressive neurological condition, elevated anxiety - it leaves a fingerprint in the process of creating language.

Canary Speech analyzes 2,590 discrete vocal biomarkers every 10 milliseconds. That works out to roughly 15.5 million data elements per minute. Machine learning builds correlation models between those features and clinical diagnoses made by specialists - creating algorithms that can then be deployed across populations to detect the same signals independently, objectively, and in real time.
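The per-minute figure follows directly from the other two numbers; a quick back-of-envelope check:

```python
# Data-rate check for the figures above: 2,590 biomarkers per 10 ms window.
features_per_frame = 2_590            # vocal biomarkers per analysis window
frames_per_second = 1_000 // 10       # one window every 10 ms -> 100 per second
elements_per_minute = features_per_frame * frames_per_second * 60

print(f"{elements_per_minute:,}")     # 15,540,000 - about 15.5 million per minute
```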

The delivery mechanism is ambient listening: Canary is integrated into the natural doctor-patient conversation, whether in person, on a phone call, or via video conference platforms like Zoom or Teladoc. It does not ask the patient to take a test or produce a specific audio sample. It listens to whatever they happen to be talking about - a grandson's soccer match, a recent fishing trip - and returns disease scores to a clinician's handheld device in real time, without the patient ever knowing it happened.

Frameworks from This Episode

These frameworks have been added to the AI for Founders Frameworks Library. Filter by Henry / Canary Speech to find them.

The Vocal Biomarker Stack

A five-layer architecture for building defensible AI health diagnostics from speech - from raw audio capture to clinical deployment.

  • Layer 1 - Feature extraction: 2,590 vocal biomarkers analyzed every 10 milliseconds produce ~15.5 million data elements per minute. The signal is not the words - it is the characteristics of how speech is physically produced.
  • Layer 2 - Ground truth correlation: Machine learning builds models against expert clinical diagnoses. The algorithm learns what a specialist would assess; it is then deployable without the specialist present.
  • Layer 3 - Ambient capture: Speaker identification, sample quality monitoring, and audio stitching allow the system to gather usable data from natural conversation without structured testing or patient interruption.
  • Layer 4 - Longitudinal delta: Canary's longitudinal delta patent enables comparison of a patient's vocal biomarkers across time. A clinician can see not just where a patient is, but how they have changed since the last visit.
  • Layer 5 - LLM augmentation: Combining non-specific population-level LLM data with individual-specific vocal biomarker data enhances diagnostic accuracy. The fifth patent family covers this combination.
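As a mental model, the five layers compose into one pipeline. The sketch below is purely illustrative - every function body, field name, and weight is a hypothetical stand-in, since Canary's actual models and interfaces are proprietary:

```python
# Hypothetical sketch of the five-layer stack; all bodies are stand-ins.

def extract_features(frame):                    # Layer 1: feature extraction
    """Stand-in for 2,590 biomarkers computed per 10 ms frame of raw audio."""
    return {"pitch": 180.0, "energy": 0.4}

def score(features):                            # Layer 2: ground truth correlation
    """Stand-in for a model trained against specialist diagnoses."""
    return {"anxiety": features["energy"], "mci": 1.0 - features["energy"]}

def ambient_capture(conversation, patient_id):  # Layer 3: ambient capture
    """Keep only frames attributed to the patient (speaker identification)."""
    return [frame for speaker, frame in conversation if speaker == patient_id]

def longitudinal_delta(current, baseline):      # Layer 4: change since last visit
    return {k: round(current[k] - baseline.get(k, current[k]), 3) for k in current}

def llm_augment(scores, population_prior, w=0.8):  # Layer 5: LLM augmentation
    """Blend individual scores with population-level context (weight is arbitrary)."""
    return {k: w * v + (1 - w) * population_prior.get(k, v) for k, v in scores.items()}

# One pass through the stack on a toy two-speaker conversation.
conversation = [("doctor", "frame-a"), ("patient", "frame-b")]
frames = ambient_capture(conversation, "patient")
scores = score(extract_features(frames[0]))
delta = longitudinal_delta(scores, baseline={"anxiety": 0.3, "mci": 0.7})
final = llm_augment(scores, population_prior={"anxiety": 0.2})
```

The point of the sketch is the layering itself: each stage consumes only the previous stage's output, which is why the feature layer and the ground-truth models can evolve independently.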

Clinical Decision Support as the Entry Point

The right first position for AI in healthcare is not autonomous diagnosis. It is objective information that makes the human clinician more accurate - and earns trust before expanding scope.

  • A primary care physician may see a patient only twice a year, and is not a neurologist or psychiatrist who has seen thousands of similar cases. Canary gives them specialist-level signal at the primary care level.
  • The goal is the referral, not the diagnosis. If vocal biomarkers indicate mild cognitive impairment may warrant specialist attention, Canary surfaces that - the doctor decides what to do with it.
  • Context matters: a patient post-cancer treatment is expected to show elevated anxiety. A skilled clinician will not immediately refer - they will manage through the expected period. Canary provides the information; the clinician provides the judgment.
  • Objectivity is the key differentiator: the same algorithm returns the same result regardless of which clinician is holding the device. It standardizes the expert assessment and makes it portable.
  • Full AI adoption in healthcare will take five to ten years. Clinical decision support is the trust-building first phase.

The Patent Moat in AI Health

When most AI is built on open-source models, defensibility comes from patenting specific clinical applications before others recognize the category.

  • Most AI health companies are built on open-source foundations with no IP protection. Of the roughly two hundred companies building patient-facing avatars, not one holds a patent.
  • Canary's first two patents - identifying vocal biomarkers correlated with human disease, and using those biomarkers to build algorithms for routine clinical measurement - created the foundational families that all subsequent patents build on.
  • Patents are now issued in the US, Europe, Asia, and multiple other jurisdictions. Each foundational patent now has four patents in its family.
  • Co-founder Jeff Adams' background in creating entirely new technology categories - Dragon NaturallySpeaking, Amazon Echo - meant the team understood how to patent genuinely novel technical approaches, not incremental applications.
  • In a category where the underlying models are commoditizing rapidly, the defensible layer is the specific clinical architecture - the ambient listening methodology, the longitudinal delta capability, and the LLM combination approach.

The Force Multiplier Model for Healthcare AI

Henry's answer to AI replacing jobs: more holes get dug. AI expands what humans can do; it does not subtract from the number of people needed to do it.

  • The wooden shovel did not reduce the number of holes dug. It increased them. The pyramids were built after that. AI is the wooden shovel.
  • A primary care doctor with Canary becomes capable of detecting specialist-level neurological conditions - expanding what that single doctor can assess without requiring specialist training.
  • Standardized, objective assessment tools allow the same quality of clinical information to flow across every primary care interaction, regardless of the physician's experience with the specific disease.
  • The goal is not to replace the neurologist. The goal is to ensure that patients who need the neurologist actually get to them - and that patients who do not need the specialist do not consume that scarce resource.

Founder Experiment: Map Your Own Vocal Baseline

Before building anything with voice AI, you need to develop intuition for how much signal lives in ordinary speech. This experiment does not require access to Canary Speech - it builds the underlying mental model that makes any voice AI application more legible.

  1. Record yourself speaking for two minutes on three different occasions: once immediately after waking, once mid-afternoon during a high-focus work session, and once late in the evening when fatigued. Use the same topic each time - describe your current project.
  2. Listen to all three recordings back-to-back. Without reading transcripts, note what changes across them: rate of speech, pitch variance, energy, pause frequency. You are doing manual feature extraction.
  3. Now call someone you know well - a partner, a close friend, a family member. Listen to the first 30 seconds of the call without asking how their day was. Note what you sense before the words tell you anything.
  4. Research one open-source vocal biomarker library or speech feature extraction tool (librosa in Python is a good starting point). Run your own recordings through basic feature extraction - pitch, energy, MFCCs (mel-frequency cepstral coefficients). Look at the numbers that correlate with what you heard manually.
  5. Identify one practical application in your own life or business where vocal state monitoring would create a meaningful feedback loop - and sketch the architecture of what that agentic system would look like using a platform like Canary Speech's upcoming developer interface.
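If you want a taste of step 4 before installing librosa, the same manual features - 10 ms frame energy and a crude pitch estimate - can be computed with NumPy alone. Here a synthetic 220 Hz tone stands in for your recordings; the tone, duration, and lag range are all arbitrary choices for the sketch:

```python
import numpy as np

sr = 16_000                                    # sample rate (Hz)
n = 4_000                                      # a quarter second is plenty here
t = np.arange(n) / sr
audio = 0.5 * np.sin(2 * np.pi * 220 * t)      # steady 220 Hz tone as a fake "voice"

# Per-frame RMS energy on 10 ms frames, mirroring the episode's window size.
frame = sr // 100
frames = audio[: len(audio) // frame * frame].reshape(-1, frame)
rms = np.sqrt((frames ** 2).mean(axis=1))

# Crude pitch estimate: strongest autocorrelation lag in the 80-400 Hz range.
ac = np.correlate(audio, audio, mode="full")[len(audio) - 1:]
lo, hi = sr // 400, sr // 80
lag = lo + int(np.argmax(ac[lo:hi]))
pitch = sr / lag                               # lands within a few Hz of 220

print(round(pitch), round(float(rms.mean()), 3))
```

On your own recordings, the interesting part is not any single number but how pitch variance, energy, and pausing shift across your three sessions - the manual version of what a trained model does at scale.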

Why this matters: Henry's founding insight was that he already knew how to read his daughter's vocal state - he just could not explain how. The experiment above is designed to make that implicit knowledge explicit. Once you can articulate what you are sensing, you can design systems to sense it.

Key Terms

These terms have been added to the AI for Founders Glossary. Search by Henry / Canary Speech to filter them.

Vocal biomarker: A discrete, measurable characteristic of speech - captured at the feature level, not the word level - that correlates with a specific physiological or psychological condition. Canary Speech analyzes 2,590 of these every 10 milliseconds.
Ambient listening: A non-intrusive audio capture method where the system monitors an ongoing natural conversation - a doctor-patient interaction, for example - without requiring participants to provide a dedicated audio sample or interrupt their activity.
Clinical decision support (CDS): An AI application category that provides objective, actionable information to clinicians in real time to augment their decision-making - without replacing the clinician's judgment. Canary Speech operates as a CDS tool, surfacing disease scores that inform referral decisions.
Feature extraction: The process of computing discrete measurable characteristics from raw audio data. In Canary Speech, feature extraction produces ~16 million data elements per minute from vocal biomarkers, which are then fed into machine learning models.
Longitudinal delta: The change in a patient's vocal biomarker profile across multiple clinical encounters over time. Canary Speech's longitudinal delta patent enables instant comparison between a patient's current assessment and their baseline from any prior visit.
Ground truth (clinical): The expert clinical diagnosis used as the reference standard when building machine learning models. Canary Speech correlates vocal biomarker data against diagnoses made by specialists, then deploys the resulting algorithm as an objective, portable version of that specialist assessment.
Canary Ambient Continuous Monitoring: Canary Speech's latest capability, enabling ongoing passive vocal monitoring in environments like patient rooms or wearables - not just episodic doctor-patient conversations. Announced at HLTH 2025.
Aggression monitoring: A Canary Speech application that uses vocal biomarkers to assess the emotional state of up to five people in a room simultaneously and surface a green-yellow-red safety indicator to clinical staff before they enter. Developed in response to an estimated rate of two nurse attacks per day at large healthcare institutions.
Articulatory system: The coordinated set of physical structures - vocal cords, tongue, cheeks, jaw, respiratory system - that the central nervous system controls to produce speech. Because all of these are driven by the CNS, any neurological or physiological condition that affects the CNS leaves a detectable fingerprint in how speech is produced.

Tools from This Episode

Canary Speech

Voice AI platform that analyzes 2,590 vocal biomarkers in real time to detect up to a dozen diseases - including Parkinson's, Huntington's, MS, anxiety, depression, and mild cognitive impairment - through ambient listening during natural doctor-patient conversations. 14 issued patents, integrated with Microsoft Dragon Copilot, the Samsung Watch, and major telehealth platforms. $26M raised. Founded 2016.

Visit Canary Speech

Q&A

What is Canary Speech?

Canary Speech is a voice AI platform that analyzes 2,590 vocal biomarkers every 10 milliseconds to detect neurological and behavioral health conditions in real time. It uses ambient listening to monitor natural doctor-patient conversations - without any intrusion - and returns disease assessments to the clinician's handheld device. Diseases covered include Parkinson's, Huntington's, multiple sclerosis, anxiety, depression, and mild cognitive impairment.

Who founded Canary Speech?

Henry (co-founder and CEO) and Jeff Adams. Jeff Adams led the team that built Dragon NaturallySpeaking (the first commercial continuous speech-to-text product), led the Amazon Echo team, and earlier in his career built mathematical models for intelligence communications at the NSA. Henry was doing neurological disease research at the NIH when they met. They founded Canary Speech in 2016 after an eight-hour conversation in a bagel shop.

How does ambient listening work?

Canary Speech integrates into existing audio capture environments - smartphones, video conferencing platforms, telephone systems, the Samsung watch - and monitors natural conversation without requiring participants to produce a dedicated audio sample. It identifies the patient's voice using speaker identification, assesses sample quality in real time, stitches together usable audio segments, and returns disease scores while the conversation is still ongoing.
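That capture loop can be sketched in a few lines. The segment fields (`speaker`, `snr_db`, `audio`) and the quality threshold below are hypothetical - Canary's real interface is not public:

```python
def stitch_patient_audio(segments, patient_id, min_snr_db=15.0):
    """Filter a conversation's segments down to usable patient speech
    and stitch them into one stream for scoring."""
    usable = [
        seg["audio"]
        for seg in segments
        if seg["speaker"] == patient_id       # speaker identification
        and seg["snr_db"] >= min_snr_db       # sample-quality gate
    ]
    return b"".join(usable)                   # stitched audio, scored mid-conversation

segments = [
    {"speaker": "doctor",  "snr_db": 22.0, "audio": b"\x01"},
    {"speaker": "patient", "snr_db": 25.0, "audio": b"\x02"},
    {"speaker": "patient", "snr_db": 8.0,  "audio": b"\x03"},  # too noisy, dropped
]
print(stitch_patient_audio(segments, "patient"))  # b'\x02'
```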

What diseases can Canary Speech detect?

Canary Speech returns assessments for up to a dozen conditions per conversation, including anxiety, depression, mild cognitive impairment, Parkinson's disease, Huntington's disease, multiple sclerosis, and other behavioral and neurological health conditions. The company is also working on identifying MS up to 10 years before a traditional clinical diagnosis would typically be made.

What is Canary Speech's patent portfolio?

Canary Speech holds 14 issued patents across five patent families, with 12 additional patents pending, in the US, Europe, Asia, and other jurisdictions. The five families cover: vocal biomarker identification correlated with disease; algorithmic clinical measurement from biomarkers; longitudinal delta tracking across time; AI-enhanced ambient functionality; and the combination of LLM population-level data with individual-specific vocal biomarker assessment.

What is Canary Ambient Continuous Monitoring?

Canary's newest capability, announced at HLTH 2025 in Las Vegas. It enables ongoing passive vocal monitoring beyond episodic clinical conversations - including room-level monitoring for applications like nurse safety (aggression detection) and continuous patient state monitoring between clinical visits.

How does Canary Speech address nurse safety?

Canary Speech runs an aggression model on room audio - identifying when voices in a patient room reach elevated emotional states - and surfaces a green-yellow-red indicator on a nurse's handheld device before they enter the room. Healthcare institutions report approximately two nurse attacks per day per 17,000-nurse organization. The estimated annual industry cost is $20 billion, with 20% of first-year nurses leaving the profession due to patient-related violence.
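The indicator logic described above amounts to a simple threshold map; the function name, score scale, and thresholds here are hypothetical illustrations, not Canary's actual model:

```python
def room_indicator(arousal_scores, yellow=0.5, red=0.8):
    """Map per-person emotional-state scores (up to five people in the room)
    to the green/yellow/red safety signal shown on the nurse's device."""
    worst = max(arousal_scores, default=0.0)   # the most agitated voice decides
    if worst >= red:
        return "red"
    if worst >= yellow:
        return "yellow"
    return "green"

print(room_indicator([0.2, 0.3, 0.1]))  # green
print(room_indicator([0.2, 0.9]))       # red
```

The hard part in practice is the upstream scoring, not this mapping - but collapsing five continuous scores to one color is what makes the signal usable at a glance before entering the room.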

How does Canary Speech integrate with existing healthcare systems?

Canary Speech is integrated into the Microsoft Dragon Copilot transcription platform, the Samsung Watch, video conferencing systems, and telephony platforms. It operates at hospital-grade data security standards, is ISO audited and certified, and functions as a clinical decision support tool - not a standalone diagnostic system - to ensure compliance with healthcare data requirements.

Is Canary Speech available for non-healthcare applications?

Not yet as a commercial product, but Canary is building a developer-facing platform that will allow non-engineers to design custom vocal monitoring experiments and deploy models for applications outside of clinical healthcare. Henry expects this to be available within approximately one year.

Links & Resources