AI Won't Automatically Accelerate Clinical Trials

Words by
Ruxandra Teslo

A critique of a snippet from Dario Amodei and Dwarkesh Patel’s recent interview.

During a recent interview, Dwarkesh Patel and the CEO of Anthropic, Dario Amodei, discussed whether clinical trials will remain a meaningful bottleneck for drug development in the age of AI. Patel said that “most clinical trials fail because the drug does not work.” In response, Amodei speculated that as AI models get better at designing drugs, “clinical trials will be much faster … let’s say, they will take one year.”

This is a commonly voiced sentiment, but a flawed one. The truth is that the most significant barriers to progress today are rarely a lack of intelligence. London has a housing crisis even though the technology to design and construct homes has existed for centuries. The bottleneck in housing is not a lack of know-how, but rather the weaponization of environmental regulations, planning, and NIMBYism. Much the same is true for clinical trials.

AI models can help design more elegant molecules, in the same way an architect can use AI to design more efficient floor plans, but neither intervention guarantees an efficient use of the institutional machinery needed to realize that design in the real world. Even the most promising drug candidates must be tested in human bodies, which, in turn, need time to metabolize those drugs and develop side effects. Patients must be recruited and followed over time, and regulators must be satisfied. None of this is easily accelerated with AI.

Although I’m optimistic that AI will design better drug candidates, this alone cannot ensure “therapeutic abundance,” for a few reasons. First, because the history of drug development shows that even when strong preclinical models exist for a condition, like osteoporosis, the high costs of moving a drug through trials deter investment, especially for chronic diseases requiring large cohorts. And second, because there is a feedback problem between drug development and clinical trials. For AI to generate high-quality drug candidates, it must first be trained on rich, human data, especially from early, small-n studies.

Clinical Variables

The Amodei interview conflates two distinct variables: the success rate of a trial (based on the quality of a drug), and the speed of that trial, understood as an operational process.

The first variable — the success rate of a trial — is the probability that a drug candidate will be both efficacious and safe in humans. The current success rate for a drug entering clinical trials is only about ten percent, meaning that roughly 90 percent of drugs that enter trials fail. Most AI efforts in biology aim to boost this success rate.

The second variable is the speed of data generation — the calendar time required to run an experiment after it has started. A clinical trial is just an experiment in human subjects, and the duration of that experiment is determined by both operational and biological constraints that are largely independent of how confident we are in the drug itself. Recruiting 1,000 patients across 10 sites takes time; understanding and satisfying unclear regulatory requirements is onerous and often frustrating; and shipping temperature-sensitive vials to research hospitals across multiple states takes both time and money.

Amodei’s prediction that clinical trials could be done in a single year seems to assume that improving the first variable will also compress the second, but this is not so. Even if AI can help design more effective drugs, timelines will not compress until we solve the operational and regulatory bottlenecks of trials.

Admittedly, there is a tempting counter-argument: If AI does generate better drug candidates, then perhaps clinical trials will cease to be a meaningful bottleneck. If a drug is almost certainly going to work, then trials may become a “formality,” even if, in general, they remain unnecessarily costly and long.1 This argument is also wrong, but understanding why requires being clear about what clinical trials are actually for.

Trials serve two distinct functions: validation — confirming whether a drug works and is safe — and learning, or generating biological data to refine our understanding of a disease, a compound, and the relationship between the two.

Validation is the primary goal of large-scale Phase III trials, which come later in the process and are typically designed to support regulatory approval. While data from these studies can deepen our understanding of drugs, their main goal is to figure out whether a treatment works under defined conditions. Learning, by contrast, is the dominant aim of early-stage trials. Conducted in smaller patient populations and often using exploratory designs, these studies are not limited to simple “yes or no” outcomes. Instead, they are experiments in the fullest scientific sense: they seek to uncover how a drug behaves in the human body, how it interacts with biological systems, and how the disease itself responds. Because these early-stage trials can be completed relatively quickly, they are more amenable to rapid iteration.

For large “validation” trials, is it plausible that their cost will simply cease to matter in a (theoretical) world where AI makes drugs with a high probability of success? I think the answer is no, for a couple of reasons.

First, unless we increase the pace and volume of the early-stage “learning trials,” it is unlikely that we will ever approach such a level of certainty in drug discovery. Today, most AI systems in drug development are trained predominantly on in vitro data and animal models. While valuable, these sources only imperfectly capture the complexity of human biology. Without large amounts of high-quality data from actual humans, we should not expect AI to generate predictions that approach near-certainty about trial outcomes.

Second, even if improved modeling could compress early-stage development timelines, every successful drug must still demonstrate benefit on an endpoint: either a clinical endpoint or a surrogate endpoint.

For many diseases, however, the relevant endpoints take a very long time to observe. This is especially true for chronic conditions, which develop and progress over years or decades, and whose most important outcomes, such as disability, organ failure, or death, accrue slowly in clinical trials. Aging represents the most extreme case. Demonstrating an effect on mortality or durable healthspan would require following large numbers of patients for decades. The resulting trial sizes and durations are enormous, making studies extraordinarily expensive. This scale has been a major deterrent to investment in therapies that target aging directly.

Lastly, the duration of a clinical trial does not merely determine how fast an individual therapy reaches patients. It also shapes which diseases attract serious investment and which do not. In a scenario where AI produces better drug candidates, yet trials remain slow, medicines will become unevenly deployed. In that scenario, capital and innovation will flow toward indications with clear, rapidly measurable endpoints — such as oncology — where trials can be completed relatively quickly. By contrast, fields like aging, where meaningful outcomes take years or decades to observe, will continue to lag unless there is genuine innovation in endpoint development.

Osteoporosis, a progressive bone disease that primarily affects post-menopausal women, illustrates these dynamics well. It benefits from an unusually strong preclinical model in the ovariectomized rat (OvX model). Unlike many other chronic diseases, where animal models have poor predictive validity, the OvX model reliably recapitulates post-menopausal bone loss and predicts drug response. This rat model is so good, in fact, that Phase III trials for osteoporosis succeed 83.7 percent of the time, substantially higher than the cross-indication average of roughly 57.8 percent at the same stage.

Given the existence of a good pre-clinical model that allows us to select higher quality candidates and the scale of unmet need in osteoporosis, one might expect it to attract sustained and substantial investment. But instead, the opposite has occurred. Today, only two drug candidates remain in late-stage clinical development for osteoporosis.

The primary reason is that Phase III osteoporosis trials are exceptionally large, long, and expensive to run. The core challenge lies in the endpoint: fracture reduction. Fractures are relatively infrequent events, even in high-risk populations, and they happen unpredictably. To demonstrate that a new therapy meaningfully lowers fracture rates compared with standard of care, trials must wait for enough fracture events to accumulate to produce statistical confidence.

Because the event rate is low and influenced by many factors beyond bone strength — such as fall risk, age, and comorbidities — the signal-to-noise ratio is modest. As a result, Phase III osteoporosis trials typically enroll 10,000–16,000 participants and follow them for three to five years. The sheer scale and duration of these trials push costs to between $500 million and $1 billion. Thus, investment into osteoporosis drugs slowed not because the biology failed or drug candidates lacked promise, but because the cost of proving benefit became prohibitively high.
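To make this scale concrete, the enrollment figure can be sketched with the standard two-proportion sample-size formula (normal approximation). The fracture rates below are illustrative assumptions, not figures from any real trial: a 4 percent fracture rate on standard of care over the follow-up period, reduced to 3 percent by the new therapy. Even that one-percentage-point absolute difference pushes enrollment into the five figures.

```python
from math import ceil
from statistics import NormalDist

def two_proportion_sample_size(p_control, p_treatment, alpha=0.05, power=0.90):
    """Approximate per-arm sample size for a two-arm superiority trial
    comparing event proportions, using the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = (p_control - p_treatment) ** 2
    return ceil((z_alpha + z_beta) ** 2 * variance / effect)

# Illustrative assumptions only: 4% fracture rate on standard of care,
# 3% on the new therapy (a 25% relative risk reduction).
n_per_arm = two_proportion_sample_size(0.04, 0.03)
print(n_per_arm, 2 * n_per_arm)  # 7093 per arm, 14186 total
```

Note how the required enrollment scales with the inverse square of the absolute risk difference: halving the difference roughly quadruples the trial. This is why rare, slow-accumulating endpoints like fractures force trials into the 10,000-plus range.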

Osteoporosis is just one example where trial size and costs deter investment. But there is broader empirical evidence in this direction. A 2015 study examining oncology R&D found that hematological cancers — where the FDA accepts short-term surrogate endpoints in roughly 92 percent of approvals, allowing for shorter trials — attracted 112 percent more private R&D investment than solid tumors, where surrogate endpoints are used in only about half of cases. The authors traced this disparity to commercialization timelines. The shorter trials used for the former preserve more of a drug’s effective patent life, improving expected returns and drawing capital. Each one-year reduction in bringing a new therapy to market was estimated to increase R&D investment by between 7 and 23 percent.

If we want AI models to actually accelerate “therapeutic abundance,” then, we must first find ways to speed up these large validation trials. And to design better drugs in the first place, we must find ways to collect in-human data in early-stage “learning” trials much faster, too.

Regulatory Friction

The best way forward is to reduce operational and regulatory friction. AI tools can already help at the margins by automating submission drafting, improving site selection, matching patients more efficiently, and streamlining data workflows. But without deep regulatory reform, this is unlikely to shrink trial timelines or costs at scale.

One regulatory lever we could pull is to implement more high-quality surrogate endpoints. A clinical endpoint directly measures how a patient feels, functions, or survives — such as prevention of stroke or a reduction in fractures. A surrogate endpoint, by contrast, is a measurable biological marker or intermediate outcome that reliably predicts such clinical benefit. Instead of waiting years to observe clinical outcomes, trials that rely on surrogate endpoints can measure signals much earlier.

AI tools can contribute to the development of better surrogate endpoints, such as by identifying promising biomarkers, analyzing cross-trial datasets, and modeling causal relationships between intermediate signals and clinical outcomes. But here, too, technical capability is only part of the story. Institutional reform is likely to be the binding constraint. As my case study of the 12-year effort to qualify bone mineral density (BMD) as an endpoint for osteoporosis trials illustrates, the bottleneck was not scientific capability. Instead, the core barriers to faster progress were fragmented trial data scattered across sponsors, weak funding incentives for what is effectively a public good, and an unnecessarily lengthy and opaque regulatory pathway.2

For AI to generate high-quality candidates — the kind that might, one day, push success rates of drug candidates so high that trials become more of a formality — it also needs rich, dynamic data as input. But remember that such data can only come from trials in people (mice are nice, but most animal results simply do not translate). This, in turn, creates a feedback loop: better AI models require better trial data, and better trial data requires running trials. The loop is only as fast as its slowest component, the trial itself.

A regulatory structure modeled after Australia’s Clinical Trial Notification (CTN) framework — administered by the Therapeutic Goods Administration — offers a concrete example of the kind of policy push that could speed up these types of trials. There, most early-phase trials proceed after approval by a Human Research Ethics Committee (HREC), with notification rather than pre-approval by the regulator. The regulator retains inspection powers and the authority to halt unsafe studies, but does not duplicate the scientific review already conducted by the clinician-scientists and toxicologists embedded in HRECs. The result is that clinical trial sites can begin giving drugs to patients much sooner (about two times faster than in the United States, according to informal interviews with industry leaders).

In the United States, by contrast, Phase I trials typically require submission of an Investigational New Drug (IND) application to the U.S. Food and Drug Administration before initiation. This dual review — by both an IRB and the federal regulator — creates redundancy that lengthens the feedback loop. A CTN-like model for Phase I trials could preserve safety oversight while shifting scientific and toxicological reviews to accredited, transparently governed IRBs with expanded expertise. The FDA would retain the power to inspect, impose clinical holds, and intervene in high-risk cases, such as for novel gene therapies. But for the majority of small-molecule first-in-human studies, the default could be notification rather than permission.

My criticisms are not meant to imply that AI is irrelevant to trials; that’s certainly not the case. But many of the bottlenecks that determine trial speed and cost are coordination, institutional, and regulatory problems, and they cannot be solved by technology alone.

{{divider}}

Ruxandra Teslo is a fellow at Renaissance Philanthropy and co-founder of the Clinical Trial Abundance project. She writes about the intersection of science, culture and policy at her Substack. She holds a PhD in Genomics from Cambridge University.

Header image by Ella Watkins-Dulaney.

Cite: Teslo, R. “AI Will Not Solve Clinical Trials.” Asimov Press (2026). DOI: 10.62211/92wj-65fn

Footnotes

  1. Clinical trials can be stopped early for overwhelming efficacy if interim analyses show a treatment effect so large and statistically robust that continuing would be unnecessary or unethical. In such cases, sponsors may also qualify for expedited FDA pathways — such as Fast Track, Breakthrough Therapy, or Priority Review — which can shorten regulatory timelines. But this is not a general solution for long development cycles.
  2. Surrogate endpoints function as a public good because once validated, any sponsor in a therapeutic area can use them, regardless of who funded the underlying research.