
AI Doctors Are Like Self-Driving Cars: Lessons from Waymo to Build Trust

Building trust through scoped domains, rigorous simulation, and human-in-the-loop safeguards.

Richard Wang

September 19, 2025

Over the past year, Waymo’s momentum has shifted the autonomous vehicle conversation from “if” to “how soon.” The company is rapidly adding cities and integrating with public systems and mainstream partners. Waymo and Via just announced a public-transit integration in Chandler, AZ. And with Lyft, Waymo is building a Nashville robotaxi network. Airport pilots are advancing too, with new permits and testing on California airport grounds. While this feels like it’s happening fast, what we’re actually seeing is the gradual, methodical rollout of a system designed to earn trust by delivering measurable safety.

Recent analysis of Waymo's performance shows a stunning 91% reduction in serious crashes compared to human drivers, with 96% fewer injury incidents at intersections (historically the deadliest driving scenarios). This is not incremental harm reduction; it looks like harm prevention, with the potential to eliminate traffic deaths as a leading cause of mortality.

The key to Waymo's success is not just their technology. It's their careful approach to earning trust in a high-liability, high-variance domain where every interaction can be uniquely complex, and the cost of failure is measured in lives. And that playbook offers crucial lessons for another life-or-death industry grappling with AI deployment: healthcare.

So if you’re trying to build an AI doctor, the core question is the same one Waymo confronted: How do you guarantee performance?

High Stakes, High Variance, High Skepticism

Every patient interaction, like every driving scenario, involves complex, dynamic factors that make each situation unique. A slight medication dosage error or missed diagnosis can be as fatal as a miscalculated turn at an intersection. And just like with self-driving, public trust is everything. The safety and performance bar that an AI doctor needs to clear is much higher than the average human benchmark.

The question is not whether the technology can work; it's whether it will work – in ambiguous settings, 100% of the time.

This is where most AI companies get it wrong. They pursue the "general-purpose clinical agent," which is the equivalent of launching a self-driving car that claims to work perfectly in every city, weather condition, and traffic scenario from day one. As many healthcare buyers have learned the hard way, these are the agents that demonstrate great potential in demos but break down in production when faced with less-than-perfect real-world conditions.

Waymo took the opposite approach, and healthcare AI should follow suit.

The Waymo Playbook

Within the field of autonomous vehicles, an operational design domain (ODD) defines the specific conditions under which an AV is designed to operate safely, including environmental factors like weather, geographical limitations, time-of-day restrictions, and roadway characteristics. ODDs are crucial for safety, as they establish the boundaries of the system's capabilities, ensuring it functions reliably within its defined parameters and does not operate outside its design limits.
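
To make that concrete, here is a minimal sketch of an ODD as a machine-checkable specification. The field names and limits are illustrative inventions, not Waymo's actual spec:

```python
from dataclasses import dataclass

# A minimal, machine-checkable sketch of an ODD. Field names and
# limits are illustrative, not Waymo's actual specification.
@dataclass(frozen=True)
class OperationalDesignDomain:
    geofence: set[str]        # mapped street segments
    max_wind_mph: float       # environmental limit
    allowed_hours: range      # time-of-day restriction (24h clock)
    max_speed_limit_mph: int  # roadway characteristic

    def permits(self, street: str, wind_mph: float,
                hour: int, speed_limit_mph: int) -> bool:
        """Operate only if every condition is inside the design limits."""
        return (street in self.geofence
                and wind_mph <= self.max_wind_mph
                and hour in self.allowed_hours
                and speed_limit_mph <= self.max_speed_limit_mph)

phoenix_odd = OperationalDesignDomain(
    geofence={"E Ray Rd", "S Arizona Ave"},
    max_wind_mph=30.0,
    allowed_hours=range(6, 22),
    max_speed_limit_mph=45,
)

# One condition outside the boundary means the system declines to operate.
assert phoenix_odd.permits("E Ray Rd", wind_mph=12.0, hour=14, speed_limit_mph=40)
assert not phoenix_odd.permits("E Ray Rd", wind_mph=45.0, hour=14, speed_limit_mph=40)
```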

Waymo’s design approach was to master an ODD – say, a neighborhood in Phoenix – under specific conditions, then widen the geofence as competence and confidence grow. They mapped every street, understood every traffic pattern, and simulated countless scenarios within that constrained environment before putting a single car on those roads.

Even then, they didn't go fully autonomous immediately. They put safety drivers behind the wheel to monitor performance and intervene when necessary. Only after proving exceptional safety and reliability in that specific domain did they remove the human oversight and expand to new neighborhoods.

These five elements are what allowed them to build trust:

  1. Domain Specificity
  2. Virtual Validation
  3. Human Oversight
  4. Gradual Autonomy
  5. Methodical Expansion

Let’s break down how this applies to building clinical AI agents.

[Figure: The five steps that allowed Waymo to successfully gain trust in AV deployment.]

1. Domain Specificity: Define a Clear ODD

Amigo takes this approach for healthcare AI. Instead of a general-purpose “AI doctor,” we help provider companies build custom agents for specific clinical neighborhoods. This extends to both different specialties (e.g., women’s health, cardiology, oncology) and different use cases (e.g., triage, post-visit follow-ups, care coordination). Each agent is scoped to the conditions and populations where it can be thoroughly validated.

Rather than hoping a general-purpose agent will safely handle the complexities of different medical domains, we work with clinical experts to define an agent's ODD – the precise scenarios where it can reliably operate.
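
As a loose illustration (not our production schema), a scoped clinical ODD can be thought of as a routing boundary: anything outside the validated domain, or matching a red flag, defaults to a human clinician:

```python
from dataclasses import dataclass

# Illustrative only: a scoped "clinical ODD" for a triage agent.
# Real scoping is defined with clinicians; these names are hypothetical.
@dataclass(frozen=True)
class ClinicalODD:
    specialty: str
    use_case: str
    in_scope_complaints: frozenset[str]
    red_flags: frozenset[str]  # always escalate to a human clinician

    def route(self, complaint: str, symptoms: set[str]) -> str:
        if symptoms & self.red_flags:
            return "escalate_to_clinician"  # hard safety boundary
        if complaint in self.in_scope_complaints:
            return "handle_with_agent"      # inside the validated domain
        return "escalate_to_clinician"      # out of scope defaults to human

triage_odd = ClinicalODD(
    specialty="women's health",
    use_case="triage",
    in_scope_complaints=frozenset({"UTI symptoms", "contraception question"}),
    red_flags=frozenset({"chest pain", "severe bleeding"}),
)

print(triage_odd.route("UTI symptoms", {"dysuria"}))          # handle_with_agent
print(triage_odd.route("UTI symptoms", {"severe bleeding"}))  # escalate_to_clinician
```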

2. Virtual Validation: Simulation Before Street Time

Once the agent has been built, Amigo’s partners will have their clinicians and product leads tangibly define what “good” looks like. We then simulate that environment using high-fidelity synthetic patient interactions – hundreds of thousands of them – graded on these bespoke safety and performance metrics (accuracy, appropriateness, empathy, escalation reliability, regulatory adherence, etc.). This exposes rare edge cases in hours that might take years to encounter in the field, all before the agent ever talks to a real patient.

The grader in this case is Amigo’s custom-built AI Judge, capable of evaluating agent success in simulations at 100,000x the speed of a human. Crucially, this allows for iterative improvement to be done at scale and significantly shortens time-to-convergence.
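
Conceptually, the validation loop looks something like the toy sketch below, with the patient generator, agent, and Judge reduced to stubs:

```python
import statistics

# Toy sketch of simulation-based validation. The metric names mirror the
# ones above; the patient generator, agent, and Judge are stubs here.
METRICS = ["accuracy", "appropriateness", "empathy",
           "escalation_reliability", "regulatory_adherence"]

def synthetic_patient(seed: int) -> dict:
    """Stand-in for a high-fidelity synthetic patient interaction."""
    return {"id": seed, "transcript": f"simulated encounter #{seed}"}

def agent_respond(case: dict) -> str:
    """Stand-in for the scoped clinical agent under test."""
    return f"agent response to case {case['id']}"

def judge(case: dict, response: str) -> dict:
    """Stand-in for an AI Judge scoring one encounter on each metric."""
    return {m: 1.0 for m in METRICS}

def run_simulation(n_cases: int) -> dict:
    scores = {m: [] for m in METRICS}
    for seed in range(n_cases):
        case = synthetic_patient(seed)
        response = agent_respond(case)
        for metric, value in judge(case, response).items():
            scores[metric].append(value)
    # Aggregate per-metric means across the whole synthetic population.
    return {m: statistics.mean(v) for m, v in scores.items()}

print(run_simulation(n_cases=1_000))
```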

Naturally, this raises the question: How do you trust the Judge?

3. Human Oversight: Teach the Judge to Judge

Waymo's initial rollout included safety drivers who could take control when needed. Similarly, Amigo’s deployment process has clinical oversight built in: first in simulations, then in real-life production. We work with human clinicians to assess the fidelity of the simulated world and the behavior of the agent, but perhaps their most important role is in refining the accuracy of the Judge.

This is because the Judge is actually the strongest source of evolutionary pressure in the agent development process. As long as we define success correctly and train the Judge to adjudicate reliably, we create a strong safety net that ensures an unfinished agent never leaves the factory.
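
One concrete way to quantify trust in the Judge is its agreement with clinician labels on a shared review set. As an illustrative sketch (with invented data), Cohen's kappa corrects raw agreement for chance:

```python
# Sketch: measuring Judge-clinician agreement on a shared review set.
# The labels below are invented; real review sets would be far larger.
def cohens_kappa(labels_a: list, labels_b: list) -> float:
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

judge_labels     = ["safe", "safe", "unsafe", "safe",   "unsafe", "safe"]
clinician_labels = ["safe", "safe", "unsafe", "unsafe", "unsafe", "safe"]

print(f"Judge-clinician kappa: {cohens_kappa(judge_labels, clinician_labels):.2f}")
# A low kappa means the Judge needs retraining before it can be trusted.
```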

As performance stabilizes and trust in the Judge grows, oversight tapers – not based on gut feel, but on hitting explicit, pre-agreed metric thresholds.
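
In pseudocode terms, that gate might look like the following (the threshold values here are invented placeholders; real thresholds are pre-agreed with clinical partners):

```python
# Sketch of an explicit oversight gate. Threshold values are invented.
THRESHOLDS = {
    "accuracy": 0.98,
    "escalation_reliability": 0.999,
    "regulatory_adherence": 1.0,
    "judge_clinician_kappa": 0.85,
}

def ready_to_reduce_oversight(observed: dict) -> bool:
    """Every metric must clear its bar; one miss keeps humans in the loop."""
    return all(observed.get(m, 0.0) >= bar for m, bar in THRESHOLDS.items())

observed = {"accuracy": 0.985, "escalation_reliability": 0.9995,
            "regulatory_adherence": 1.0, "judge_clinician_kappa": 0.88}
print(ready_to_reduce_oversight(observed))  # True: every bar is cleared
```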

4. Gradual Autonomy: Earn the Right to Fly Solo

Just as Waymo eventually removed safety drivers as they proved reliability, we can gradually reduce reliance on human oversight. Once clinicians have gained the confidence that the Judge is calibrated correctly and the agent is performing safely across common and edge cases, Amigo’s system can take over from human reviewers to perform quality assurance at scale.

We continue to run the Judge on the same metrics we ran in simulation when the agent is live in production with real patients, continuously monitoring for any signs of degradation. And human clinicians are still able to conduct random quality inspections to ensure there is no drift over time.
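
A simplified sketch of that monitoring idea: compare a rolling window of live Judge scores against the simulation baseline and flag degradation for clinician review (all numbers are placeholders):

```python
from collections import deque

# Sketch of production monitoring: compare a rolling window of live Judge
# scores against the simulation baseline. Numbers are placeholders.
class DriftMonitor:
    def __init__(self, baseline: float, tolerance: float, window: int = 500):
        self.baseline = baseline    # mean score from pre-launch simulation
        self.tolerance = tolerance  # allowed degradation before alerting
        self.scores = deque(maxlen=window)

    def record(self, judge_score: float) -> bool:
        """Return True if the rolling mean has degraded past tolerance."""
        self.scores.append(judge_score)
        rolling_mean = sum(self.scores) / len(self.scores)
        return (self.baseline - rolling_mean) > self.tolerance

monitor = DriftMonitor(baseline=0.97, tolerance=0.02)
for score in [0.97, 0.96, 0.95, 0.90, 0.88]:  # stand-in production stream
    if monitor.record(score):
        print(f"degradation at {score}: flag transcripts for clinician review")
```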

Letting go of the reins can be the scary part. However, if we’ve brought clinicians along for the whole journey and demonstrated safety at every step, we find that this final handoff becomes easy.

5. Methodical Expansion: Onto the Next Neighborhood

Once a clinical agent is deemed safe and effective in one “neighborhood,” it becomes ready for scope expansion. As appetite for growing the agent’s capabilities increases, or as regulatory changes create more space for AI in care delivery, we can teach the agent to expand its surface area in a controlled manner that leaves nothing to chance. This also includes adjacent use cases that may require entirely new protocols and structures.

One significant advantage to building agents in this modular manner is that new agents can efficiently inherit validated behaviors and guardrails. The network effect is cumulative: every new scenario adds data and robustness to the whole system, increasing the contextual intelligence of agents across the board.
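
As a rough sketch of that inheritance (names and structure are hypothetical), a new agent composes from a shared library of already-validated guardrails rather than rebuilding them from scratch:

```python
# Rough sketch of guardrail inheritance; names and structure are hypothetical.
def red_flag_escalation(symptoms: set) -> bool:
    """Validated in an earlier triage deployment; reused as-is."""
    return bool(symptoms & {"chest pain", "severe bleeding"})

VALIDATED_GUARDRAILS = {"red_flag_escalation": red_flag_escalation}

def build_agent(specialty: str, extra_guardrails: dict | None = None) -> dict:
    """Compose a new scoped agent from the shared, validated library."""
    guardrails = dict(VALIDATED_GUARDRAILS)    # inherit the proven pieces
    guardrails.update(extra_guardrails or {})  # add domain-specific ones
    return {"specialty": specialty, "guardrails": guardrails}

cardiology = build_agent("cardiology")
print(sorted(cardiology["guardrails"]))  # inherits red_flag_escalation for free
```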

Building the Future Responsibly

Waymo’s expansion from a few dozen cars in Phoenix to 1M+ monthly rides in cities all over the US didn't happen overnight. It took careful validation, continuous improvement, and a relentless focus on safety. But that measured approach is exactly why they're now the clear leader in self-driving cars, with the safety data to prove it (sorry, Tesla). And this data shows they've achieved something remarkable: the transition from trying to reduce harm to actually preventing it. Their cars have managed to systematically avoid the conditions that lead to crashes altogether.

Clinical AI has the same potential. Properly designed AI agents can prevent medical errors from happening and ensure no patient ever falls through the cracks. This could mean anything from providing 24/7 availability for urgent questions to delivering consistent, evidence-based care at scale.

But realizing this potential requires abandoning the "move fast and break things" mentality that has worked for AI deployment in other industries. Instead, we need the methodical, safety-first approach that's making Waymo a success. The companies that will win aren't those promising to solve everything immediately, but those building trust through demonstrated competence in specific domains. They're the ones running hundreds of thousands of simulations before real-world deployment, keeping humans in the loop until autonomy is earned, and expanding thoughtfully from proven success.

With the capabilities of today’s AI models, the future of healthcare isn't an off-the-shelf general-purpose AI doctor that magically handles everything. It's a network of specialized, highly trained AI agents – each proven safe and effective in their specific domain – working alongside human clinicians to provide better, more accessible care.

To learn more about Amigo’s approach to safety, book a chat with us here.
