
Future of privacy-first computer vision in security tech
with Galvin Widjaja, Loretta.io
Show Notes
Galvin Widjaja is the founder of Loretta.io, a computer vision company that tracks human behavior and intent across retail and security environments - without storing biometric data, without building racial or age profiles, and without keeping records beyond 24 hours. Founded in Singapore in 2017 as one of a dozen computer vision startups, Loretta is now one of two still operating. Galvin works with TSA, Homeland Security, and special operations programs across multiple countries. In this episode he explains how to build an AI company that won't be made obsolete by the next foundation model, why bias cannot be removed from AI systems (but can be deliberately constrained), and what it actually takes to build a surveillance product you can defend ethically.
Why Loretta Survived When 10 Competitors Didn't - and How It Plans to Stay Relevant
Galvin's central insight - developed as early as 2019 - is that "tin AI" products whose differentiation is the algorithm itself are doomed. If your moat is a fight detection model that is 5% better than open source, a new entrant with access to the next foundation model can close that gap in months with none of your investment. Loretta's architecture is different: the AI is a foundational component, not the product. Step one is non-biometric re-identification - tracking the same person across multiple cameras without facial recognition, using clothing and body geometry instead. Step two is building an entity model: for a mall with 300,000 daily visitors, that means 300,000 complete end-to-end journey narratives every day. Step three is layering services on that accumulated behavioral data - data that a competitor arriving two years later with a better algorithm still cannot replicate because they were not there to collect it. The data moat deepens over time. The algorithm is only the entry point.
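The architecture maps onto a small amount of code. Below is a minimal sketch of steps one and two under stated assumptions: `Sighting`, `Journey`, and `EntityStore` are illustrative names rather than Loretta's actual API, and a cosine match over an appearance vector stands in for whatever re-identification model the system really runs.

```python
# Minimal sketch of non-biometric re-identification (step one) feeding an
# entity model (step two). All names and the 0.85 threshold are illustrative.
from dataclasses import dataclass, field
import math

@dataclass
class Sighting:
    camera_id: str
    timestamp: float
    # Non-biometric appearance vector: clothing color, body geometry, gait.
    # Deliberately excludes faces and any other biometric feature.
    signature: list

@dataclass
class Journey:
    entity_id: int
    sightings: list = field(default_factory=list)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class EntityStore:
    """Accumulates one end-to-end journey per visitor across all cameras."""

    def __init__(self, threshold=0.85):
        self.journeys = []
        self.threshold = threshold

    def observe(self, sighting):
        # Step one: re-identify by appearance similarity, not biometrics.
        best, best_score = None, self.threshold
        for journey in self.journeys:
            score = cosine(journey.sightings[-1].signature, sighting.signature)
            if score > best_score:
                best, best_score = journey, score
        # No match above the threshold: treat this as a new visitor.
        if best is None:
            best = Journey(entity_id=len(self.journeys))
            self.journeys.append(best)
        best.sightings.append(sighting)
        return best
```

Step three is then a query layer over `EntityStore.journeys` - dwell times, zone transitions, repeat-visit patterns - and that accumulated history is exactly what a later entrant with a better algorithm cannot backfill.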
5 Frameworks from the Privacy-First Computer Vision Playbook
1. Tin AI vs. Foundational AI - Build the Moat That Compounds
- Tin AI: the product IS the algorithm output - fight detection AI, gun detection AI. The moat is a marginal accuracy advantage
- Every foundation model release supersedes that advantage; a new entrant can replicate your system in months with zero prior investment
- Foundational AI: the algorithm enables data collection, and the data is the moat - a competitor with a better algorithm still lacks years of behavioral journey data
- Loretta's re-identification + entity model means the system gets more valuable every day it runs, even if the underlying CV algorithm is eventually commoditized
- Rule: if your entire differentiation can be reproduced by someone pointing a better API at your use case, you do not have a business - you have a feature
2. The 24-Hour Data Architecture - Privacy by Design, Not by Policy
- Loretta retains individual journey data for 24 hours only - after that, all personal identifiers are removed (a minimal sketch of such a retention sweep follows this list)
- The system is end-to-end non-biometric: no facial recognition, no racial classification, no age profiling
- The deliberate constraint: the government or police receive no more data about a non-criminal than they already legally have access to
- Only once a person has crossed a legal threshold (an actual crime) does the system produce actionable identifying information
- This architecture lost contracts in the short term - clients wanted more - but positions Loretta as the only defensible player when surveillance regulation tightens
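As a hedged illustration of what "privacy by design, not by policy" means mechanically, here is a sketch of a retention sweep that deletes identifiers rather than flagging them. It assumes each record carries a `created_at` timestamp; the field names are illustrative, not Loretta's schema.

```python
# Illustrative 24-hour retention sweep. Identifier fields are deleted
# outright; only non-identifying aggregates survive past the window.
import time

RETENTION_SECONDS = 24 * 60 * 60
AGGREGATE_FIELDS = ("created_at", "zone", "dwell_seconds")  # assumed safe

def sweep(records, now=None):
    """Strip personal identifiers from records older than 24 hours."""
    now = time.time() if now is None else now
    kept = []
    for record in records:
        if now - record["created_at"] > RETENTION_SECONDS:
            # Data that is deleted cannot be breached, subpoenaed, or misused.
            record = {k: v for k, v in record.items() if k in AGGREGATE_FIELDS}
        kept.append(record)
    return kept
```

The point of enforcing this structurally is that no later business decision, subpoena, or breach can surface data the system never kept.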
3. Bias Cannot Be Removed - But Its Scope Can Be Deliberately Constrained
- Galvin's position: bias cannot be removed from AI systems because stereotypes are pattern-matching and the AI is a pattern-matching engine
- A bias filter on top of a biased model is like censorship on top of a biased image generator - the bias is still there, just obscured
- The deeper problem: training data freezes today's biases into the system permanently, even as society evolves beyond them over the next decade
- Loretta's approach: narrow the system's detection scope deliberately so that racial, age, and other demographic signals cannot enter the detection logic at all (see the schema sketch after this list)
- The key distinction: behavior detection (putting something in a bag) is acceptable; SET detection (when a person like this does that) codifies the stereotype into the system and is rejected
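One way to read the behavior-vs-SET distinction is as a constraint enforceable at the schema level rather than by model training. The sketch below is hypothetical - the feature names are invented for illustration - but it shows the shape of the constraint: demographic signals are rejected before they can reach detection logic.

```python
# Hypothetical input validation for a behavior detector. Behavioral signals
# pass through; demographic signals are rejected, not merely down-weighted.
ALLOWED_FEATURES = {
    "reached_into_bag", "item_concealed", "lingered_seconds", "zone",
}
FORBIDDEN_FEATURES = {
    "age_estimate", "race_estimate", "gender_estimate",
}

def validate_detection_input(features):
    """Refuse any input that would turn behavior detection into SET detection."""
    leaked = FORBIDDEN_FEATURES & features.keys()
    if leaked:
        raise ValueError(f"demographic signals in detection input: {leaked}")
    return {k: v for k, v in features.items() if k in ALLOWED_FEATURES}
```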
4. The Milestones-to-Graphs Funding Ladder for Deep Tech
- Stage 1 - Milestones: seed to Series A on proof points (government pilot, regulatory approval, named enterprise customer) - speed and direction, not revenue scale
- Stage 2 - Graphs: Series A onward on revenue curves - the market demands performance evidence, not just trajectory
- For deep tech with a large long-term TAM, it can be rational to sacrifice graph growth temporarily to deepen product capability and expand addressable market
- Loretta's path: stayed in milestones mode longer than typical, using government and defense contracts to force technical development that commercial clients would not have demanded
- The trap: commercial customers who just want the product cheaper and more enterprise-grade will cause you to stop investing in the algorithm, making you vulnerable to new entrants who kept investing
5. Keep One Hand in the Cutting Edge - Why Government Contracts Drive Commercial Innovation
- AI companies whose most demanding clients are commercial enterprises will eventually stop improving their core technology
- Government, defense, and military requirements are uniquely forcing functions: they push capability beyond what any retail client would request
- Loretta's sensor fusion work (millimeter wave, standoff MF, Geiger counter data attributed to individuals) came from government contracts, not mall operators - a toy attribution sketch follows this list
- The resulting capability then applies back to commercial use cases - the detection system built for a government installation is a structural advantage over a competitor only serving retail
- Rule: always have at least one client whose requirements force you to improve faster than the market demands
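To make the sensor fusion point concrete, here is a toy sketch of attributing a standoff sensor reading (say, a Geiger counter hit) to a tracked individual by spatio-temporal proximity. Everything here - the class names, the distance and time bounds - is an assumption for illustration, not Loretta's implementation.

```python
# Toy attribution: match a sensor event to the nearest tracked entity
# observed within a time window. Bounds are illustrative.
from dataclasses import dataclass

@dataclass
class Track:
    entity_id: int
    x: float
    y: float
    timestamp: float

@dataclass
class SensorEvent:
    sensor_id: str
    x: float
    y: float
    timestamp: float
    reading: float

def attribute(event, tracks, max_dist=2.0, max_dt=1.0):
    """Return the id of the closest entity within the bounds, else None."""
    best_id, best_dist = None, max_dist
    for track in tracks:
        if abs(track.timestamp - event.timestamp) > max_dt:
            continue
        dist = ((track.x - event.x) ** 2 + (track.y - event.y) ** 2) ** 0.5
        if dist < best_dist:
            best_id, best_dist = track.entity_id, dist
    return best_id
```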
Founder Experiment: Test Whether Your AI Moat Is Real or Tin in 5 Steps
Step 1 - Name your current differentiation honestly. Write down in one sentence what makes your AI product better than an open-source baseline today. Is it accuracy? Dataset size? A specific detection type? Now ask: if OpenAI, Google, or a well-funded startup releases a foundation model update in six months, does that sentence still hold? If the answer is no, you have tin AI - and you need to move to Step 2 urgently.
Step 2 - Identify the data layer that no future algorithm can replicate. What data does your system collect that can only exist because you were running in production? Loretta collects 300,000 behavioral journey narratives per day in a mall - no competitor arriving two years later can retroactively collect that history. Ask: what is the equivalent in your product? It might be customer interaction patterns, proprietary labeled datasets, longitudinal behavioral baselines, or accumulated training feedback loops. If you cannot name it, your moat is the algorithm and you are vulnerable.
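The compounding is easy to quantify. A back-of-envelope calculation using the figure from the text (300,000 journeys per day) and an assumed two-year head start:

```python
# Back-of-envelope data moat arithmetic. The 300,000/day figure is from the
# text; the two-year head start is an assumption for illustration.
journeys_per_day = 300_000
head_start_days = 2 * 365

moat = journeys_per_day * head_start_days
print(f"{moat:,} journey narratives a late entrant cannot collect")
# -> 219,000,000 journey narratives a late entrant cannot collect
```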
Step 3 - Map the bias surface of your system. For every detection or classification your AI performs, ask: does this decision involve a demographic signal - age, race, gender, socioeconomic indicator? Not as an explicit input, but as an implicit correlate in the training data? If yes, you have SET detection risk - the system is pattern-matching the stereotype, not the behavior. For each instance, determine whether the demographic signal is necessary for the business outcome, and if not, redesign the detection to use only behavioral signals.
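One concrete way to probe for SET-detection risk is a per-group firing-rate audit on an offline evaluation set where demographic labels exist for testing only, never in production. The helper below is a hypothetical sketch of that audit, not a complete fairness methodology.

```python
# Hypothetical bias-surface audit: if the detector fires at very different
# rates across demographic groups for the same behavior, the model is
# likely pattern-matching a demographic correlate, not the behavior.
def flag_rate_by_group(flags, groups):
    """Return the per-group rate at which the detector fired."""
    totals = {}
    for flagged, group in zip(flags, groups):
        hit_n = totals.setdefault(group, [0, 0])
        hit_n[0] += int(flagged)
        hit_n[1] += 1
    return {group: hits / n for group, (hits, n) in totals.items()}

# Toy example: group "B" is flagged twice as often -> investigate.
rates = flag_rate_by_group(
    flags=[True, False, True, True, False, False],
    groups=["A", "A", "B", "B", "B", "A"],
)
print(rates)  # {'A': 0.333..., 'B': 0.666...}
```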
Step 4 - Find your most demanding possible client and pursue them first. Galvin's principle: the client whose requirements are hardest forces your technology to develop fastest. For Loretta, that was TSA and Homeland Security. For your product, it might be the most regulated, most security-conscious, or most technically demanding customer in your target market. Landing that client will do more for your product roadmap than 10 comfortable commercial customers - and will create a capability gap competitors cannot close without the same experience.
Step 5 - Define your 24-hour rule - what data do you actually need to keep? Most AI products accumulate data by default because storage is cheap and "we might need it later." Audit every data type your system collects and ask: what is the minimum retention period required to deliver the business outcome? Data you do not keep cannot be breached, subpoenaed, or misused. Galvin chose 24 hours deliberately - not as a marketing position, but as a structural commitment. The constraint forced better product design and created a defensible ethical narrative that competitors who kept everything cannot match.
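A minimal way to run that audit is to force every data type into an explicit retention table, each entry justified by a business outcome. The entries below are illustrative examples, not Loretta's actual policy beyond the 24-hour journey rule stated above.

```python
# Illustrative retention audit: every data type gets an explicit retention
# period and a justification. "Keep everything by default" never appears.
RETENTION_POLICY = {
    # data type:            (retention, justification)
    "journey_identifiers":  ("24h", "cross-camera tracking within one visit"),
    "zone_foot_traffic":    ("2y",  "anonymized aggregates for analytics"),
    "raw_video_clips":      ("0",   "never retained beyond live inference"),
}

for data_type, (period, why) in RETENTION_POLICY.items():
    print(f"{data_type}: keep {period} ({why})")
```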
Q&A
What is Loretta.io and how does it differ from other computer vision companies?
Loretta.io is a computer vision platform that tracks human behavior and intent in retail and security environments. Its core differentiator is architectural: instead of building a product whose value is the detection algorithm (what Galvin calls 'tin AI'), Loretta uses computer vision as a foundational layer to build 300,000 individual behavioral journey narratives per day in a large mall. The algorithm can be superseded by the next foundation model; the accumulated journey data cannot be retroactively collected by any competitor. The security product - working with TSA, Homeland Security, and special operations programs - is an offshoot of the retail analytics work, built on the same re-identification and entity modeling infrastructure.
What is 'tin AI' and why does Galvin consider it a fatal business model?
Tin AI is Galvin's term for AI products whose entire value proposition is the output of the detection algorithm - fight detection, gun detection, sentiment analysis. The moat is a marginal accuracy advantage over open-source baselines, achieved through proprietary datasets and model tuning. The problem: every major foundation model release by OpenAI, Google, or other large players expands their coverage of these capabilities. A new entrant with no prior investment can build a comparable system in months by pointing a better API at the use case. Galvin watched 10 competitors fail this way between 2017 and 2025. The only sustainable position is one where the AI enables data collection, and the data itself is the moat.
Why does Loretta delete all data after 24 hours, and what does it mean to be 'end-to-end non-biometric'?
The 24-hour deletion policy is a deliberate structural commitment, not a marketing position. Loretta's system uses no facial recognition, no racial classification, and no age profiling - tracking is based on clothing, body geometry, and behavioral signals only. The operational principle: until a person crosses a legal threshold by committing an actual crime, the system provides authorities with no more information than they already legally possess. After 24 hours, all individual identifiers are removed. Galvin acknowledges this architecture lost contracts - clients wanted more - but frames it as the only defensible long-term position as surveillance regulation tightens globally. The companies that built unconstrained systems will face the hardest regulatory reckoning.
Can bias be removed from AI systems, and what is Loretta doing about it?
Galvin's position is unambiguous: no. Bias cannot be removed from AI systems because stereotypes are pattern-matching systems and AI is a pattern-matching engine. The same logic that completes a sentence also completes a demographic inference. Post-hoc filters obscure the bias but do not eliminate it. The deeper problem is temporal: training data freezes the biases of the moment it was collected. Society will evolve, but the model trained today will still carry 2025's stereotypes in 2035. Loretta's response is not to claim it has solved bias - it is to deliberately narrow the detection scope so that demographic signals (race, age, gender) cannot enter the detection logic. The system detects behaviors, not people-types. The distinction Galvin draws is between behavior detection (acceptable) and SET detection - when a person of this description does that action (unacceptable, because it encodes the stereotype into the system).
How did Galvin validate product-market fit before raising money?
Galvin used a consulting contract as his validation mechanism. He approached major mall operators in Singapore not with a product pitch but with an offer to analyze their existing data - which allowed him to discover from the inside that the world's largest mall operators were running entirely on heuristics and guesswork rather than behavioral data. That discovery confirmed that the gap was real and large, not just visible from the outside. He then joined a property tech accelerator, giving away 2% of the company for a pilot placement in a Lendlease mall. With that pilot data, he raised his first external capital. The sequence - consulting contract to understand the problem, accelerator to get a pilot, fundraise after validation - compressed the risk at each stage.
What is the milestones-versus-graphs funding framework and how did Loretta navigate it?
Galvin distinguishes two modes of fundraising. Milestones mode (seed to Series A): investors buy direction and proof points - a government pilot, a regulatory approval, a named enterprise client. Revenue is less important than speed toward validated checkpoints. Graphs mode (Series A onward): investors buy performance - revenue curves and demonstrated scale. Most AI startups are pushed toward graphs mode too early, which forces them to prioritize commercial customer retention over technical development. Loretta deliberately stayed in milestones mode longer than typical, using government and defense contracts - which have uniquely demanding technical requirements - to force product development that a pure commercial focus would not have generated. The trade-off was slower revenue growth; the benefit was a technical capability gap competitors could not close.
How does Loretta handle the tension between increasing surveillance normalization and civil liberties?
Galvin approaches this as an empirical rather than purely moral question. His observation: surveillance normalization happens incrementally - each use case expands what the public will accept, the same way Uber normalized getting into strangers' cars. He is more concerned about the decision-making layer than the data-collection layer. His framing: a police officer stopping a vehicle makes a decision that is currently about 20% what they observed, 60% preconceived bias, and 20% their mood that day. As the objective data available to that officer increases, it crowds out the space for bias. The danger is when AI systems start making those decisions autonomously - and Galvin is explicit that current hallucination rates make autonomous life-affecting decisions unacceptable. The data as an input to human judgment: potentially net positive. The data as a replacement for human judgment: not yet.
What two business ideas does Galvin believe could reach $1M in revenue quickly?
First: an AI lecture companion app for students. The observation that drove this: a third of ChatGPT's traffic disappeared during summer break and returned when school started - students are the dominant use case for consumer AI. An app that transcribes lectures, translates notes, cross-references textbooks, and allows students to ask tutorial-style questions would monetize that existing behavior directly. Second: a computer vision memory assistant for older adults. Galvin saw the seed of this in a photo of an AI scientist's parent using ChatGPT to identify a book on a shelf. An app that scans a room and tells you where a misplaced object is - keys, glasses, medication - solves a real daily problem for aging populations at a scale that requires no new AI capability, only a well-designed interface over existing vision models.
What does Galvin wish he had known earlier as a founder, and how is he actively working on it?
Galvin identifies fear as the core constraint he wishes he had addressed sooner. Coming from a background in management consulting and restaurant operations - both industries where margin for error is extremely thin and conservatism is rational - he brought a risk tolerance calibrated entirely wrong for startup speed. His observation: the IRR of a startup is essentially its execution velocity, and founders from non-startup entrepreneurial backgrounds are systematically too conservative. His active practice: going rock climbing with his wife despite being genuinely terrified of heights. The goal is not the rock climbing - it is rebuilding the default relationship between fear and action. He reports that confronting physical fear has directly improved his business decision-making, and frames deliberate fear exposure as a leadership development practice, not a personality quirk.