
This AI expert says privacy in America is already dead
with Peter Holcomb, Optimal IT
Show Notes
Peter Holcomb has spent nearly two decades building and maturing security programs for startups and global enterprises across technology and healthcare. He's also the founder of Optimal IT, a virtual CISO and AI governance consultancy. In this conversation - the first of two parts - Peter pulls no punches on the state of AI security, privacy law in the United States, and what founders are getting badly wrong when they race to deploy AI tools.
The title of this episode isn't clickbait. Peter's view: Americans have already traded privacy for convenience, the country has no unified privacy law to defend what remains, and the AI race is accelerating the erosion at an exponential pace. This is the foundational episode for understanding the threat landscape before building anything on top of an LLM.
The Most Common AI Mistakes Founders Make
Algorithmic bias is the first and most underappreciated failure mode. When a model is trained on a subset of data that doesn't represent the full demographic it will serve, the outputs carry embedded discrimination - often invisible until the consequences surface. Most companies don't audit for this before deploying.
Data breaches are the second. If an AI system lacks proper access controls, it can inadvertently expose sensitive information or allow data to be exfiltrated from the model. This risk compounds with RAG-based architectures (retrieval-augmented generation), where the model pulls from external databases and APIs at query time. If those APIs aren't protected, every retrieval is a potential leak path.
Privacy surveillance is the third - the gradual accumulation of sensitive user data by model providers who train on what users submit. Once data enters a model's training pipeline, control is effectively relinquished. The data can surface in future model outputs, be used to train successors, or be exposed in a breach. PII (personally identifiable information) and PHI (protected health information) are especially high-risk.
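One practical mitigation for the surveillance risk is to redact PII before any prompt leaves your infrastructure. A minimal sketch, assuming hypothetical patterns - a production deployment would use a vetted PII-detection library rather than these illustrative regexes:

```python
import re

# Illustrative patterns only; real systems use dedicated PII classifiers.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matched PII with a typed placeholder before the text
    ever reaches a third-party model provider."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text
```

Redacting at the boundary means that even if the provider retains prompts for training, the sensitive values were never submitted.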
How Much Do You Actually Trust the Major AI Providers?
Peter's answer: only a little. OpenAI, Anthropic, Google, and others assert that user data is protected and not used to train models by default. Peter trusts these claims lightly - and points to the existence of the entire AI security industry as evidence that the providers' own guardrails are insufficient. If the model providers had solved inference security, red-teaming services wouldn't be growing.
His practical guidance: treat the major LLM chat interfaces as appropriate for writing, brainstorming, content outlines, and non-sensitive analysis. Treat them as inappropriate for anything containing PII, PHI, financial data, credentials, or proprietary business logic. If you need to use sensitive data, run a locally deployed or properly enclaved model where you control the infrastructure.
On DeepSeek specifically: Peter explicitly advises against using it. The company used a technique called distillation - essentially scraping and compressing OpenAI's outputs to train a cheaper open-source model - raising fundamental questions about data lineage, IP integrity, and trustworthiness.
Privacy in America Is Already Dead
The United States has no unified federal privacy law. What exists is a patchwork of state-level regulations - California's CCPA being the most prominent - with no federal standard tying them together. Compare this to the EU's GDPR, which establishes a single comprehensive framework across all member states with real enforcement teeth.
Peter's take is blunt: Americans have essentially forfeited the privacy battle in exchange for convenience and instant gratification. The data is already in the pool. Retroactive consent frameworks and opt-out mechanisms don't meaningfully restore what's been given up. The horse is out of the barn.
He sees copyright law and digital provenance - tracking how training data was sourced and whether it was used ethically - as the more promising battlefield for slowing the AI race's worst excesses. The music industry's aggressive IP enforcement offers a model: lawyers with sharp swords move faster than regulators. If AI companies can be held accountable for how they source training data (DeepSeek distilled from OpenAI, which itself scraped the internet at large), copyright liability might accomplish what privacy law has not.
The deeper concern he raises: data trained recursively on AI outputs creates a copy-of-a-copy-of-a-copy degradation problem. As models train on increasingly AI-generated content, the fidelity of the underlying signal declines. Garbage in, garbage out - at exponential scale.
Inference Security: The Practical Steps
For companies deploying AI systems, Peter outlines a concrete inference security framework:
- Sanitize inputs before inference - process all prompts before they reach the model to screen for injection attempts, sensitive data, and policy violations
- Sanitize outputs after inference - post-process model responses to catch hallucinations, policy violations, and inadvertent data exposure before they reach users
- Principle of least privilege - explicitly define and restrict which functions, tools, and data sources an AI agent can access; identity access management (IAM) becomes critical when agents call external APIs for supplementary context
- Rate limiting and anomaly detection - implement API rate limits and deploy an observability/application performance monitoring platform to detect unusual usage patterns that may indicate abuse or attack
These steps don't require a massive security team. They require intentional architecture decisions made before deployment, not bolted on after the first incident.
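The first two steps above can be sketched as a wrapper around any model call. This is a hedged illustration with hypothetical deny-lists - real guardrails use classifier models and dedicated tooling, not keyword matching:

```python
import re

# Hypothetical screening rules for illustration only.
INJECTION_MARKERS = ["ignore previous instructions", "system prompt"]
OUTPUT_DENYLIST = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # e.g. SSN-shaped strings

def sanitize_input(prompt: str) -> str:
    """Screen prompts before inference for injection attempts."""
    lowered = prompt.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            raise ValueError("prompt rejected: possible injection attempt")
    return prompt

def sanitize_output(response: str) -> str:
    """Post-process responses to catch inadvertent data exposure."""
    for pattern in OUTPUT_DENYLIST:
        response = pattern.sub("[REDACTED]", response)
    return response

def guarded_inference(prompt: str, model_call) -> str:
    """Wrap any model call so no prompt or response bypasses screening."""
    return sanitize_output(model_call(sanitize_input(prompt)))
```

The point of the wrapper pattern is architectural: every inference path goes through one choke point, so the guardrails can't be bypassed by a new feature that forgets to call them.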
Agentic AI: The Governance Frontier
The shift from single-prompt chatbots to agentic AI - systems that chain multiple LLMs, call APIs, take actions, and produce complex outputs with minimal human supervision - is where Peter sees governance becoming most critical and most difficult.
Semi-autonomous agentic AI still has a human in the loop at decision points. Fully autonomous agentic AI receives an instruction and executes independently - researching, analyzing, writing, acting - without checking in. The governance frameworks being built today for static LLM deployments will struggle to keep pace with systems that iterate daily and have real-world effects.
His most provocative prediction: on the security side, the end state is bots against bots. Autonomous AI systems will conduct reconnaissance, find vulnerabilities, and attempt breaches. Other autonomous systems will detect, respond, and defend. Humans will tap in when judgment calls exceed what either side can resolve autonomously.
How to Evaluate an AI Security Tool
With new AI security vendors appearing constantly, Peter's evaluation framework combines technical testing with business diligence:
- Run the tool through Hackaverse's red team arena - real pen testers attempting prompt injections, jailbreaks, and model-level attacks in real time
- Test guardrail effectiveness with Tesson - the tool designed to both attack and defend, showing you where the model is hardened and where it isn't
- Evaluate the vendor: funding, backers, team credentials, market traction
- Run a head-to-head bake-off between top candidates on your actual data and use cases
POCs (proof of concept evaluations) take time, but shortcuts here are how companies buy tools that look good in demos and fail in production.
The vCISO Model and Optimal IT's Services
Optimal IT offers virtual CISO (vCISO) services - fractional security leadership that gives startups and SMBs access to enterprise-grade expertise without the cost of a full-time hire. With hiring apprehension rising as companies assess AI's impact on headcount, the fractional model is increasingly attractive: pay for the expertise you need, when you need it.
The company is now layering AI governance services onto its security advisory practice - risk assessments for current AI usage, red team exercises to surface model vulnerabilities, and blue team implementation of guardrails and access controls. Frameworks deployed include NIST AI RMF, ISO 42001, and the EU AI Act, alongside traditional security certifications (SOC 2, HIPAA, HITRUST, ISO). The end goal: a white-glove, turnkey service that gives companies both the security foundation and the AI governance layer they need to operate with confidence.
Tools & Resources
- Optimal IT - Virtual CISO and AI governance consultancy; risk assessments, red/blue team exercises, compliance certifications (opto-it.io)
- Hackaverse - Red team testing arena; real pen testers evaluate AI tools and models against live attack vectors in real time (hackaverse.ai)
- Tesson - AI red team and blue team platform; tests efficacy of AI systems and implements guardrails (tesson.ai)
- NIST AI RMF - National Institute of Standards and Technology AI Risk Management Framework; the foundational governance framework for responsible AI deployment
- ISO 42001 - International standard for AI management systems; provides requirements for establishing, implementing, and improving AI governance
- EU AI Act - European Union's comprehensive AI regulation; risk-tiered compliance requirements with the most extensive global AI governance framework currently in force
- GDPR - EU General Data Protection Regulation; cited as the model for what a unified US privacy law could look like - a standard the US currently lacks
- CCPA - California Consumer Privacy Act; the most significant US state-level privacy law, covering California residents' data rights
Key Frameworks from This Episode
- The Three AI Failure Modes
- Algorithmic bias (training data that doesn't represent the full population), data breaches (missing access controls at the model or API layer), and privacy surveillance (data submitted to models being inherited by providers and used for training). All three are active risks in every production AI deployment today, and most founders aren't auditing for any of them.
- RAG Access Control Risk
- Retrieval-augmented generation supplements model responses with data from external databases and APIs. If those APIs lack proper access controls, every supplementary retrieval is a potential leak path. The AI's expanded context window creates an expanded attack surface. Access control on RAG data sources is non-negotiable security hygiene.
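The principle can be sketched in a few lines, assuming a hypothetical retrieval backend: the retrieval layer filters documents by the requesting user's entitlements before anything reaches the model's context window.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """A retrieved chunk tagged with the roles allowed to read it."""
    text: str
    allowed_roles: set = field(default_factory=set)

def retrieve_for_user(query_results: list, user_roles: set) -> list:
    """Enforce access control at retrieval time: a document only enters
    the model's context if the requesting user could read it directly."""
    return [d.text for d in query_results if d.allowed_roles & user_roles]
```

The design choice that matters: authorization happens at the retrieval boundary, per request, rather than trusting the model (or the prompt) to withhold data it has already seen.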
- The Copy-of-a-Copy Problem
- OpenAI trained on internet data. DeepSeek distilled from OpenAI. The next model will train on DeepSeek's outputs. Each generation inherits the errors, biases, and fabrications of the previous one - with reduced fidelity. Recursive AI training on AI outputs degrades signal quality over time. Garbage in, garbage out, compounded.
- Privacy as Forfeited Battlefield
- The US has no unified federal privacy law. Americans have traded data privacy for convenience at scale and without legal recourse. Peter's argument: the privacy battle is effectively lost in the US. The more viable intervention points are copyright enforcement (how training data is sourced) and digital provenance (tracking data lineage through model generations).
- Inference Security Hygiene
- Four non-negotiable steps for any production AI deployment: sanitize inputs before inference, sanitize outputs after inference, apply least-privilege access controls to all APIs the model can reach, and monitor for anomalous usage patterns with rate limiting and observability tooling. These are architectural decisions, not afterthoughts.
- Bots vs. Bots
- Peter's prediction for the endpoint of autonomous agentic AI in security: offensive AI agents conducting reconnaissance and attacks; defensive AI agents detecting and responding - continuously, without human initiation. Human judgment remains necessary for ambiguous decisions, but the front line of security will be fully automated on both sides.
FAQ
Is it safe to put business information into ChatGPT or Gemini?
For non-sensitive work - writing, brainstorming, content outlines - yes, with reasonable caution. For anything containing PII, PHI, financial credentials, proprietary business logic, or patient data - no. Once data enters a major LLM's inference pipeline, you've effectively relinquished control of where it goes. If you need to use sensitive data, run a locally deployed or properly enclaved model that you control.
What is RAG and why does it create security risks?
Retrieval-augmented generation (RAG) is an architecture where the LLM pulls additional context from external databases or APIs at query time to improve response accuracy. The security risk: if those external APIs aren't protected with proper access controls, the retrieval mechanism creates a data leak path. Every API call the model makes is a potential vulnerability if that API isn't hardened.
Why doesn't the US have a unified privacy law like GDPR?
Peter's assessment: it's a combination of political fragmentation (privacy regulation is handled at the state level, not federally) and cultural preference for convenience over privacy. Americans have consistently chosen to trade data for frictionless services. The infrastructure for a privacy-protective culture - legal standardization, enforcement mechanisms, consumer expectations - doesn't exist at the federal level.
What is the vCISO model and why is demand growing?
A virtual CISO (vCISO) is a fractional chief information security officer - a senior security expert engaged on a part-time or project basis rather than as a full-time hire. Demand is growing because AI-driven job apprehension is making companies reluctant to add headcount, while the need for security expertise is simultaneously increasing. The fractional model delivers enterprise-grade expertise at a fraction of the cost.
What is algorithmic bias and how does it happen?
Algorithmic bias occurs when a model is trained on data that doesn't represent the full population it will serve. The model learns patterns from a skewed sample and applies them universally, producing outputs that discriminate against underrepresented groups. It often manifests invisibly until consequences surface - loan decisions, hiring recommendations, medical diagnoses. Prevention requires intentional training data auditing before deployment.
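One simple form of the training-data audit the answer describes is a representation check: compare each group's share of the training sample against its share of the population the model will serve. A minimal sketch (the group labels and baseline shares are assumptions for illustration):

```python
from collections import Counter

def representation_gap(samples: list, population_share: dict) -> dict:
    """Return each group's sample share minus its population share.
    Large negative gaps flag underrepresented groups before deployment."""
    counts = Counter(samples)
    total = len(samples)
    return {
        group: counts[group] / total - expected
        for group, expected in population_share.items()
    }
```

A gap of -0.4 for a group that is half the served population, for example, means the model learned from a sample where that group was almost absent - exactly the skew that surfaces later as discriminatory outputs.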
Should I trust DeepSeek?
Peter's explicit recommendation: no. DeepSeek used a technique called distillation - essentially scraping and compressing OpenAI's model outputs to train a cheaper open-source model. This raises serious questions about data lineage, IP integrity, and the trustworthiness of the underlying training. The provenance of what's inside DeepSeek is opaque, and Peter advises founders to avoid it.
Links & Resources
- Peter Holcomb
- Kitcaster – Book Podcast Interviews
- Tesson.ai – AI Security Platform
- Hackaverse.ai – Red Team Arena
- Optimal IT – Peter Holcomb on LinkedIn
- NIST AI RMF Framework
- EU AI Act Overview
- ISO/IEC 42001 – AI Management Systems
- RAG (Retrieval Augmented Generation) Explained
- ryanestes.info
- TikTok
- YouTube
- Spotify
- Apple