
The Next Decade of Medical AI: No Shortage of Large Models, but an ImageNet Is Missing

What medical AI truly lacks may not be yet another larger model, but an ImageNet-like infrastructure: a foundational data and evaluation system that systematically records current biological states, intervention actions, and subsequent state changes. Whoever defines state, action, and transition may define the underlying coordinate system of next-generation AI in medicine.

Xiong Jianghui (熊江辉) · 2026-05-10

One of the most important turning points in artificial intelligence over the past decade was not any single model suddenly becoming smarter — it was the entire field finally acquiring a shared infrastructure: ImageNet.

On the surface, ImageNet is an image database. But what truly transformed computer vision was not just the volume of data. ImageNet gave researchers worldwide the same task, the same label system, and the same evaluation methodology. Because of this shared coordinate system, models like AlexNet, VGG, and ResNet could rapidly iterate on the same benchmark, ultimately propelling computer vision into the deep learning era.

Today, medical AI stands at a similar inflection point.

We already have a growing number of medical large models, medical QA systems, omics foundation models, virtual cells, digital twins, AI drug discovery platforms, and real-world data platforms. Arc Institute's State model has begun predicting cellular responses to drug, cytokine, and genetic perturbations. The Medical World Model (MeWM) work simulates future tumor states under therapeutic conditions. IEEE TBME has also identified Digital Twins / AI World Models as an important direction for the future of biomedical engineering, emphasizing the simulation of health trajectories from multimodal data to support therapeutic interventions and disease monitoring.

Yet I am increasingly convinced that what medical AI truly lacks may not be yet another larger model, but an ImageNet-like infrastructure.

More precisely, the medical world model needs a TransitionNet oriented toward living systems:

> A foundational data and evaluation system that systematically records "current biological state — intervention action — subsequent state change."

In image AI, the fundamental task is:

> image → label

In the medical world model, the truly critical task should be:

> state + action → next state

That is:

> current biological state + intervention action → subsequent state change

This is what distinguishes the medical world model from traditional medical AI.
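As a minimal sketch of that difference, the two tasks can be written as type signatures. Everything here is an illustrative assumption rather than a proposed standard: the `State` and `Action` aliases, the `no_change_baseline` reference model, and the marker name `crp` are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical types: a state maps marker names to values,
# and an action is a plain-text intervention description.
State = Dict[str, float]
Action = str

# Recognition-style task: one input, one label.
Classifier = Callable[[State], str]

# World-model-style task: (state, action) -> predicted next state.
TransitionModel = Callable[[State, Action], State]


def no_change_baseline(state: State, action: Action) -> State:
    """Trivial reference model: predicts that the intervention changes nothing."""
    return dict(state)


def predicted_delta(model: TransitionModel, state: State, action: Action) -> State:
    """Per-marker change that the model predicts for this state and action."""
    next_state = model(state, action)
    return {k: next_state[k] - state[k] for k in state if k in next_state}


# Usage: the baseline predicts zero change for every marker.
print(predicted_delta(no_change_baseline, {"crp": 3.0}, "hypothetical intervention"))
```

A no-change baseline is a useful floor: a transition model that cannot beat it has learned nothing about the intervention.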

Traditional medical AI is better at answering:

- Does this person have a disease?

- Does this scan look like cancer?

- Is this biomarker high-risk?

- What does this paper say?

But the medical world model must answer a more fundamental question:

> If we apply a certain intervention to a living system, in which direction will it change?

1. The Next Bottleneck in Medical AI Is Not Model Parameters, but Data Structure

Over the past few years, the dominant theme in medical AI has been "large models."

Larger language models, larger multimodal models, larger medical knowledge bases, stronger QA capabilities, better literature summarization.

All of these are important.

But if we understand the future of medical AI merely as "better models for answering medical questions," we may be underestimating the field's true opportunity.

The core of medicine is not answering questions — it is changing the trajectory of life.

When a physician faces a patient, what truly matters is not only knowing what the disease is, but also determining:

- How did the current state come about?

- Which factors are driving the system toward deterioration?

- Which nodes are intervenable?

- Which intervention might lead to state improvement?

- Through which metrics should improvement be verified?

- If there is no improvement, where did things go wrong?

- If adverse effects occur, why did the system deviate from expectations?

These questions are, in essence, not knowledge QA problems — they are state transition problems.

In other words, the next decade of medical AI must do more than enable models to comprehend medical knowledge. It must enable models to gradually learn:

> How living systems respond to interventions.

This requires a new kind of data structure.

Much of today's medical data is static, siloed, and cross-sectional.

There are test results, but no intervention records.

There are intervention records, but no follow-up measurements.

There are follow-up measurements, but no data on dosage, timing, or adherence.

There are clinical outcomes, but no mechanistic annotations.

There are omics data, but no state transitions.

There are case descriptions, but no computable before-and-after changes.

Such data certainly has value, but it is very difficult to train a true medical world model with it.

What is truly scarce is:

> longitudinal state–action–next-state data

This is the fuel for the medical world model.
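The gaps listed above can be made checkable. The sketch below assumes a record is a plain dictionary with `baseline`, `action`, and `follow_up` keys, and that an action carries `dose`, `timing`, and `adherence`. These field names are hypothetical and chosen only to mirror the gaps described in this section.

```python
from typing import Any, Dict, List

# Assumed field names for an intervention record (not an existing standard).
REQUIRED_ACTION_FIELDS = ("dose", "timing", "adherence")


def missing_pieces(record: Dict[str, Any]) -> List[str]:
    """List what a record lacks before it can count as a state transition."""
    gaps: List[str] = []
    if not record.get("baseline"):
        gaps.append("baseline measurement")
    action = record.get("action")
    if not action:
        gaps.append("intervention record")
    else:
        for name in REQUIRED_ACTION_FIELDS:
            if action.get(name) is None:
                gaps.append(f"action.{name}")
    if not record.get("follow_up"):
        gaps.append("follow-up measurement")
    return gaps


# Usage: a test result with nothing else attached is not yet a transition.
print(missing_pieces({"baseline": {"crp": 3.0}}))
```

A record counts as longitudinal state-action-next-state data only when this list comes back empty.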

2. Why Does Medicine Need Its Own ImageNet?

ImageNet's greatness lies not merely in collecting a vast number of images.

More importantly, it gave the computer vision community its first shared coordinate system.

Before ImageNet, researchers worked with their own datasets, their own labels, and their own evaluations, making it difficult to tell which methods had genuinely advanced the field. After ImageNet, everyone could finally compete, validate, and iterate on the same task.

Medical AI faces a similar issue today.

We have too many models, but too few shared tasks.

We have too much data, but too little shared structure.

We have too many metrics, but too little shared evaluation.

We have too many "intelligent systems," but very few can answer the same core question:

> Given the current state of a living system and an intervention action, can the model estimate the direction of subsequent state change?

The ImageNet for medical world models should not be an ordinary database.

It should not be merely a case repository.

It should not be merely an imaging archive.

It should not be merely an omics data warehouse.

It should not be merely a literature knowledge graph.

Nor should it be merely a large electronic health records table.

It should be an infrastructure built around state transitions.

It should include at least five components:

1. State representation: how to represent a person's current biological state;

2. Action ontology: how to standardize the description of interventions — drugs, nutrition, exercise, sleep, cell therapy, and more;

3. Transition record: how to record state changes following an intervention;

4. Evidence chain: how to connect targets, pathways, phenotypes, and validation metrics;

5. Benchmark task: how to evaluate whether a model has genuinely learned state transitions.

This is the ImageNet that the medical world model truly needs.

It does not simply reduce medical problems to classification problems; rather, it establishes a new shared coordinate system for medical AI.
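To make the first four components concrete, here is a rough sketch of how they might fit together as data types. The class names, fields, and marker names are my own illustrative assumptions, not a published schema. The fifth component, the benchmark task, would operate over collections of such records and is left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class StateSnapshot:  # 1. state representation
    measured_on: date
    markers: Dict[str, float]


@dataclass
class Action:  # 2. action ontology entry
    category: str  # e.g. "drug", "nutrition", "exercise"
    name: str


@dataclass
class EvidenceLink:  # 4. evidence chain element
    target: str
    pathway: str
    validation_marker: str


@dataclass
class TransitionRecord:  # 3. transition record
    before: StateSnapshot
    action: Action
    after: StateSnapshot
    evidence: List[EvidenceLink] = field(default_factory=list)

    def delta(self) -> Dict[str, float]:
        """Change in every marker measured both before and after."""
        shared = self.before.markers.keys() & self.after.markers.keys()
        return {m: self.after.markers[m] - self.before.markers[m] for m in sorted(shared)}

    def days_elapsed(self) -> int:
        """Days between the baseline and the follow-up measurement."""
        return (self.after.measured_on - self.before.measured_on).days


# Usage with hypothetical values.
record = TransitionRecord(
    before=StateSnapshot(date(2025, 1, 1), {"crp": 3.0, "ldl": 130.0}),
    action=Action("exercise", "brisk walking"),
    after=StateSnapshot(date(2025, 4, 1), {"crp": 1.5}),
)
print(record.delta(), record.days_elapsed())
```

Only markers measured at both time points yield a delta, which is one reason retesting the same panel matters.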

3. From "Disease Recognition" to "Intervention Simulation"

The first phase of medical AI was recognition.

Recognizing lesions in medical images.

Recognizing diagnoses in medical records.

Recognizing genetic variant risks.

Recognizing whether a person belongs to a particular disease subtype.

The second phase of medical AI was prediction.

Predicting disease risk.

Predicting hospitalization probability.

Predicting drug response.

Predicting likelihood of recurrence.

Predicting survival.

But the third phase of medical AI should be intervention simulation.

Not merely asking:

> Is this person at high future risk?

But rather asking:

> Which interventions might alter this person's future trajectory?

This step represents the shift from prediction to intervention reasoning.

It requires models to move beyond building "feature–label" mappings, and instead learn:

> How states form, how interventions act, how systems transition, and how evidence is verified.

This is also why the concept of "world models" is so important for medicine.

In robotics, autonomous driving, and reinforcement learning, the value of world models lies in enabling agents to internally simulate the consequences of actions, compare alternatives, and then decide how to act.

Medicine, of course, cannot simply copy robotic world models. Living systems are far more complex than game environments, and trial-and-error is not freely available.

But medicine does need a more cautious, auditable, and verifiable world model paradigm:

> Not to let AI arbitrarily control the human body, but to make the state transition logic of medical interventions more transparent.

The significance of the medical world model is not to build a bigger black box, but to construct a more auditable living system simulator.

4. The Medical World Model Did Not Appear from Thin Air — It Has Deep Scientific Roots

The medical world model is not a brand-new term conjured from nothing.

Looking at the longer arc of scientific history, medicine has continually attempted to build computational models of the human body. Cardiac electrophysiology modeling, the virtual heart, and digital twins are important precedents.

Professor Henggui Zhang and colleagues have long worked on mathematical modeling and simulation of cardiac cells, tissue, and three-dimensional cardiac electrical activity — using ion channel dynamics, tissue conduction models, 3D anatomical structures, and electrophysiological equations to simulate arrhythmias, ischemic states, electrical propagation, and ECG changes.

This body of work offers an important insight:

> Truly valuable medical models tend not to be black-box classifiers, but system models that connect structure, mechanism, dynamics, and verifiable outputs.

Today's virtual cells, digital twins, and medical world models can be seen as extensions of this systems modeling tradition into the era of AI, multi-omics, and real-world data.

For example, Arc Institute's State model attempts to predict cellular responses to drug, cytokine, and genetic perturbations; MeWM directly employs the Medical World Model concept to explore simulating future tumor states under therapeutic conditions.

These efforts all demonstrate that medical AI is progressively moving from static recognition and risk prediction toward state transition simulation under intervention conditions.

But for this direction to become a truly cumulative, comparable, and verifiable scientific infrastructure, models alone are not enough. We also need shared data structures and evaluation systems akin to ImageNet.

The difference is that what the medical world model needs is not:

> image → label

but rather:

> state + action → next state

That is:

> current biological state + intervention action → subsequent state change.

5. Why Emphasize "Steerability"?

If the medical world model is merely a predictor, it is still not enough.

A model can predict that a person's risk is rising, but that does not automatically tell us how to change that trajectory.

What medicine truly cares about is:

- Which states can be measured?

- Which abnormalities can be explained?

- Which interventions can be described?

- Which transitions can be verified?

- Which deviations can be tracked?

- Which failures can be reflected upon and corrected?

On this point, the SEWO / Steerable Medicine World Model framework proposed by our team emphasizes that a medical world model must not pursue only prediction accuracy — it should possess the capacity to define states, describe interventions, reason about transitions, audit mechanisms, and track deviations.

The core ideas were presented in the preprint "World Models for Biomedicine: A Steerability Framework" and are laid out systematically at steerable.world.

It is important to emphasize that this framework is not a validated clinical treatment system, but rather a set of structural constraints and evidence chain design principles for future biomedical world models.

It reminds us that the key to a medical world model is not just "what it can predict," but rather:

> Whether it can be audited, questioned, corrected, and steered by researchers and clinicians within well-defined boundaries.

This is also what sets the medical world model apart from ordinary large models.

Ordinary large models are more like knowledge and language systems.

A medical world model must become a system of states, interventions, transitions, and feedback.

It cannot merely talk.

It must be verifiable.

6. Why Is Now the Window of Opportunity?

I believe that discussing "the ImageNet for medical world models" now is not too early — the timing is just right.

There are five reasons.

First, multi-omics testing is maturing

Technologies such as genomics, transcriptomics, proteomics, metabolomics, methylomics, and single-cell omics are increasingly enabling us to measure the internal states of living systems.

In the past, medicine could only roughly observe phenotypes.

Now we are beginning to see deeper molecular perturbations, pathway changes, and cellular states.

Without state measurement, there can be no world model.

Second, longitudinal health data is growing

Wearable devices, continuous glucose monitoring, long-term physical examinations, home testing, remote follow-up, and digital health platforms are making individual health trajectories recordable.

Medical data is shifting from single-point snapshots toward continuous time series.

This is crucial for world models, because what a world model cares about is not "what is" at a given moment, but how the system changes over time.

Third, intervention data is becoming richer

Drugs, nutrition, exercise, sleep, psychological stress, supplements, cell therapy, regenerative medicine, lifestyle management — all can serve as actions in a medical world model.

In the past, such data was highly fragmented.

But if it can be recorded in a standardized manner, it could become immensely valuable state transition data.

Fourth, AI world models have become a key direction for next-generation AI

World models are becoming one of the frontier directions in AI. Whether in robotics, autonomous driving, physical world simulation, or generative environment modeling, the fundamental question being explored is:

> How can models understand how the world changes in response to actions?

Medicine also needs this capability.

But the medical world model cannot chase flashy generative effects. It must pursue mechanistic credibility, clear boundaries, rigorous verification, and controllable safety.

Fifth, personalized medicine is approaching the N-of-1 era

Medicine will increasingly be less about what is "effective on average" and more about answering:

> For this person, in this state, which intervention might be effective?

This inherently requires N-of-1 state transition data.

A structured N-of-1 intervention is essentially a small-scale world model experiment:

> individual state → intervention → individual transition

If such data can be standardized, retested, verified, and accumulated, it will become the most important fuel for medical world models.

7. Why Longevity Medicine May Be One of the Best Starting Points

If we are to build an ImageNet-like infrastructure for the medical world model, I believe longevity medicine may be one of the best starting points.

The reasons are straightforward.

First, aging is a continuous state, not a single disease label

Traditional diseases are often centered on diagnostic labels.

But aging is not a simple label. It is a continuously changing system state involving inflammation, metabolism, immunity, mitochondria, epigenetics, proteostasis, stem cell exhaustion, cellular senescence, and many other dimensions.

This is highly suited to world models, because what world models excel at is not static classification, but dynamic states.

Second, longevity medicine inherently requires retesting

Longevity medicine is not concerned with one-time diagnosis, but with long-term trajectories.

Whether an intervention is meaningful must be judged through months, years, or even longer periods of follow-up testing.

This naturally forms:

> baseline state → intervention → follow-up state

This is precisely the state transition structure needed by world models.
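One way to assemble such triples from dated test results is sketched below. It assumes the simplest possible pairing rule: the last measurement on or before the intervention start is the baseline, and the first one after it is the follow-up. The function name, the `Measurement` alias, and the marker name are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Measurement = Tuple[date, Dict[str, float]]  # (test date, marker values)


def build_triple(
    measurements: List[Measurement], intervention_start: date
) -> Optional[Tuple[Measurement, Measurement]]:
    """Return (baseline, follow-up) around an intervention, or None if either is missing."""
    ordered = sorted(measurements, key=lambda m: m[0])
    before = [m for m in ordered if m[0] <= intervention_start]
    after = [m for m in ordered if m[0] > intervention_start]
    if not before or not after:
        return None  # no retest on one side, so no transition
    return before[-1], after[0]


# Usage with hypothetical values.
tests = [
    (date(2025, 6, 1), {"hba1c": 6.1}),
    (date(2025, 1, 10), {"hba1c": 6.8}),
    (date(2024, 9, 1), {"hba1c": 6.9}),
]
print(build_triple(tests, date(2025, 2, 1)))
```

A real pipeline would also need limits on how stale a baseline may be and how soon a follow-up is meaningful. Those thresholds depend on the marker and are deliberately left out here.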

Third, longevity interventions are inherently diverse

Diet, exercise, sleep, stress management, drugs, supplements, cell therapy, regenerative medicine, environmental exposure management — all can influence aging states.

This provides a rich set of scenarios for action ontology.

Fourth, individual variation is enormous

The same intervention may produce entirely different responses in different people.

This means longevity medicine cannot rely solely on average effects — it must attend to individual states, individual responses, and individual trajectories.

This is the very core of N-of-1 state transition modeling.
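A small numerical sketch shows why the average alone can mislead. The person IDs and marker changes below are invented purely for illustration, and `summarize_responses` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, List, Tuple


def summarize_responses(deltas: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return the average change and the IDs whose change runs against it."""
    if not deltas:
        raise ValueError("need at least one individual")
    average = mean(list(deltas.values()))
    # A negative product means the individual moved opposite to the average.
    against_trend = sorted(pid for pid, d in deltas.items() if d * average < 0)
    return average, against_trend


# Usage: invented per-person changes in one marker after the same intervention.
average, against = summarize_responses({"p1": -2.0, "p2": -1.0, "p3": 1.5})
print(average, against)
```

In this toy case the average change is negative, yet one of the three individuals moved in the opposite direction. An average-effect summary would hide exactly that person.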

Fifth, longevity medicine needs a new foundation of trust

One of the biggest problems in the longevity industry today is a deficit of trust.

Users do not know which interventions actually work.

Physicians do not know how to evaluate complex combination interventions.

Companies struggle to demonstrate long-term value.

Investors find it hard to judge whether a platform has a genuine moat.

If a "state—intervention—transition" data infrastructure can be established, longevity medicine could shift from being marketing-driven to being evidence-driven.

8. The True Value of This Endeavor: Defining the Infrastructure for Next-Generation AI in Medicine

Once the ImageNet for medical world models is established, its significance goes beyond training a few models.

It could transform the foundational logic of medical AI itself.

1. It will change the competitive moat of medical AI

The core competition in future medical AI may not be about who has the largest model, but who has the best state transition data.

Large models can be called via API.

Algorithms can be replicated.

Interfaces can be copied.

But high-quality, retestable, verifiable, and traceable state transition data is very hard to replicate in the short term.

Whoever builds this data flywheel may possess a genuine platform-level moat.

2. It will change how medical research is organized

Traditional research is typically organized around diseases, drugs, or endpoints.

A portion of future medical research may be organized around state transitions:

> Which types of states, under which types of interventions, are most likely to produce which types of transitions?

This would gradually shift medical research from a "disease label–centered" paradigm to a "dynamic system–centered" paradigm.

3. It will change the evidence structure of personalized medicine

The biggest challenge in personalized medicine is generating evidence at the level of the individual.

Large-scale randomized controlled trials are suited to evaluating average effects in populations, but they do not necessarily answer state transition questions for each individual.

If we can systematically accumulate N-of-1 state transition data, it could form a new supplementary evidence paradigm:

> Population evidence + mechanistic evidence + individual state transition evidence.

This has profound implications for precision medicine, longevity medicine, rare diseases, and complex chronic disease management.

4. It will change AI drug discovery

AI drug discovery must not stop at target prediction, molecule generation, and binding affinity prediction.

What truly matters is:

> Can an intervention shift an aberrant biological state in the desired direction?

With state—intervention—transition data, drug discovery could move closer to real living system responses, rather than optimizing only at the static target level.

5. It will change investment logic

In the past, investors evaluating medical AI companies often asked: How powerful is the model? How much data do you have? Is there a product? Are physicians using it? Can it be commercialized?

In the future, they may need to ask one more question:

> Is this company accumulating reusable state transition data?

Without state transition data, many medical AI products may be merely tools.

With a continuously accumulating state transition data flywheel, a product could become a platform.

9. What Makes This Different: Not an Ordinary Dataset

The ImageNet for medical world models differs significantly from traditional AI datasets.

First, it is longitudinal, not cross-sectional

Ordinary datasets typically record samples and labels at a single point in time.

Medical world model datasets must record time.

Without time, there are no transitions.

Without transitions, there is no world model.

Second, it is intervention-linked, not purely observational

Observational data is important, but world models need actions.

If there are only states and no interventions, the model can only learn correlations.

With states, interventions, and subsequent changes, the model can potentially learn responses.

Third, it is multi-level, not single-modality

A living state cannot be represented by a single biomarker.

It requires connecting: molecules, cells, pathways, organs, phenotypes, behaviors, environment, and clinical context.

This means that medical world model datasets are inherently multimodal, multi-scale, and multi-timepoint.

Fourth, it must be auditable, not black-box labels

Medicine cannot merely assign an "effective/ineffective" label.

Every state transition should, as far as possible, be linked to mechanistic evidence: targets, pathways, biomarkers, clinical metrics, safety signals, and uncertainties.

Fifth, it must be continuously updated, not a one-time release

ImageNet could exist as a static benchmark for a long time.

But the data infrastructure for medical world models must continuously absorb new data, new interventions, new retests, new verifications, and new failure cases.

It is more like a living state transition data flywheel than a one-time dataset.

10. What Is the Biggest Challenge?

This endeavor is enormously significant, but also extremely difficult.

Challenge 1: State representation is extremely complex

A person's biological state cannot be summarized by a single diagnostic label.

How to organize multi-omics, physical examination data, lifestyle, symptoms, organ function, environmental exposures, and medical history into a computable state representation is the first major challenge.

Challenge 2: Intervention actions are difficult to standardize

Actions in medicine are far more complex than actions in robotics.

Drugs have dosage, frequency, duration, combinations, and adherence.

Exercise has type, intensity, frequency, and duration.

Diet has composition, caloric content, timing windows, and nutritional profile.

Supplements and lifestyle interventions are even more complex.

If actions cannot be standardized, models will struggle to learn from them.
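As a rough illustration of what a standardized action could look like, the sketch below uses one record shape for very different interventions. The category vocabulary, the field names, and the `delivered_total` summary are all assumptions made for this example. In particular, `delivered_total` is a bookkeeping quantity, not a pharmacological exposure model.

```python
from dataclasses import dataclass

# Assumed controlled vocabulary; a real ontology would be far richer.
ALLOWED_CATEGORIES = {"drug", "exercise", "diet", "supplement", "sleep"}


@dataclass(frozen=True)
class StandardAction:
    category: str
    name: str
    amount: float          # per administration or session
    unit: str              # e.g. "mg", "min"
    times_per_week: float
    duration_weeks: float
    adherence: float       # fraction of planned sessions actually done, 0..1

    def __post_init__(self) -> None:
        if self.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not 0.0 <= self.adherence <= 1.0:
            raise ValueError("adherence must be between 0 and 1")

    def delivered_total(self) -> float:
        """Planned total scaled by adherence, in the action's own unit."""
        return self.amount * self.times_per_week * self.duration_weeks * self.adherence


# Usage with hypothetical values: 30 min, 5 times a week, 12 weeks, half adhered to.
walk = StandardAction("exercise", "brisk walking", 30.0, "min", 5.0, 12.0, 0.5)
print(walk.delivered_total(), walk.unit)
```

Rejecting unknown categories at construction time forces the vocabulary to be extended deliberately, rather than letting free text drift into the data.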

Challenge 3: Follow-up data is scarce

Much medical data consists of a single measurement.

But world models require before-and-after changes.

This means data collection workflows must be redesigned so that testing, intervention, retesting, and feedback form a closed loop.

Challenge 4: Causal confounding is severe

In the real world, a person often simultaneously changes their diet, exercise, sleep, medications, and supplements.

Which factor caused the state change?

Do different interventions synergize or antagonize each other?

How should confounders be handled?

This requires very careful study design and statistical methods.
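A first, modest step is simply to flag windows in which several interventions overlap. This does not resolve confounding. It only marks which transitions cannot be attributed to a single action. The helper names and the intervention labels below are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Interval = Tuple[str, date, date]  # (intervention name, start, end)


def concurrent_interventions(
    interventions: List[Interval], baseline: date, follow_up: date
) -> List[str]:
    """Names of interventions active at any point between baseline and follow-up."""
    return sorted(
        name
        for name, start, end in interventions
        if start <= follow_up and end >= baseline
    )


def is_single_action_window(
    interventions: List[Interval], baseline: date, follow_up: date
) -> bool:
    """True when exactly one intervention overlaps the measurement window."""
    return len(concurrent_interventions(interventions, baseline, follow_up)) == 1


# Usage with hypothetical interventions.
history = [
    ("drug A", date(2025, 1, 1), date(2025, 6, 30)),
    ("sleep coaching", date(2025, 3, 1), date(2025, 4, 1)),
    ("supplement B", date(2024, 1, 1), date(2024, 12, 1)),
]
print(concurrent_interventions(history, date(2025, 2, 1), date(2025, 5, 1)))
```

Transitions flagged as multi-action windows are still valuable, but they need a study design or statistical model that accounts for the combination.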

Challenge 5: Safety and ethical requirements are extremely high

A medical world model cannot learn through free trial and error the way a game-playing model can.

Any model involving interventions must have clear boundaries: what is merely a research hypothesis; what can serve as health management advice; what requires a physician's judgment; what must not be automatically recommended; and what must undergo regulatory and clinical validation.

Challenge 6: Tension between business models and open standards

If this infrastructure is entirely closed, the industry will struggle to form shared standards.

If it is entirely open, companies will find it difficult to generate sustainable commercial returns.

How to balance open benchmarks, privacy protection, commercial incentives, and scientific collaboration is a very practical challenge.

11. How Should We Approach This?

This article will not elaborate on the technical roadmap. A subsequent piece can specifically address "how to build the ImageNet for medical world models."

Here I will outline only the direction.

I believe it roughly requires five steps.

Step 1: Define a minimum viable task

Do not start by attempting to simulate the entire human body.

Begin with a scenario that is measurable, retestable, intervenable, and verifiable.

For example: cellular perturbation responses, longevity medicine state transitions, inflammatory state interventions, metabolic state improvements, DNA methylation age changes, or chronic disease risk state transitions.

Getting one task right first is more important than pursuing comprehensiveness from the outset.

Step 2: Establish state standards

Define what a baseline state should record.

For example: molecular metrics, pathway metrics, clinical metrics, phenotypic metrics, behavioral metrics, environmental context, and temporal information.

Step 3: Establish intervention standards

Define how actions should be described.

For example: intervention type, dosage, frequency, duration, combination relationships, adherence, and mechanistic annotations.

Step 4: Establish follow-up and transition records

Follow-up states must be systematically recorded.

Without retesting, there are no transitions.

Without transitions, there is no medical world model.

Step 5: Establish benchmark tasks

Have different models answer the same class of questions:

- Can they predict the direction of state change?

- Can they identify key mechanisms?

- Can they propose verification metrics?

- Can they recognize risks and uncertainties?

- Can they generalize to new individuals, new interventions, and new time points?

This is the embryonic form of a medical world model benchmark.
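The first and last questions lend themselves to a simple sketch: score whether predicted and observed changes agree in direction, and hold out whole individuals so that generalization to new people is actually tested. The function names, the `person_id` field, and the tolerance parameter are illustrative assumptions, not an established benchmark definition.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple


def direction(delta: float, tolerance: float = 0.0) -> int:
    """Map a change to +1 (up), -1 (down), or 0 (within tolerance)."""
    if delta > tolerance:
        return 1
    if delta < -tolerance:
        return -1
    return 0


def directional_accuracy(
    pairs: Sequence[Tuple[float, float]], tolerance: float = 0.0
) -> float:
    """Fraction of (predicted_delta, observed_delta) pairs that agree in direction."""
    if not pairs:
        raise ValueError("no pairs to score")
    hits = sum(direction(p, tolerance) == direction(o, tolerance) for p, o in pairs)
    return hits / len(pairs)


def split_by_individual(
    records: List[Dict[str, Any]], held_out_ids: Set[str]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Keep every record of a held-out person out of the training set."""
    train = [r for r in records if r["person_id"] not in held_out_ids]
    test = [r for r in records if r["person_id"] in held_out_ids]
    return train, test


# Usage with invented (predicted, observed) changes.
print(directional_accuracy([(0.5, 1.0), (-0.2, 0.3), (0.0, 0.0), (-1.0, -0.1)]))
```

Splitting by person rather than by record matters because repeated measurements from the same individual are correlated, and a random record-level split would overstate how well a model generalizes.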

12. The Most Important Judgment: Whoever Defines State, Action, and Transition Defines the Future

The next decade of medical AI will not lack large models.

More precisely, medical AI certainly still needs stronger models, but larger models alone cannot automatically solve the state transition learning problem that medical world models require.

What is truly scarce is the data infrastructure that enables models to learn biological state transitions.

I would even argue that the platform-level companies in future medical AI may not be those with the largest language models, but rather those that are first to build the following capabilities:

> Continuously measuring biological states;

> Standardizing the recording of intervention actions;

> Systematically retesting state changes;

> Constructing mechanistic evidence chains;

> Forming state transition data flywheels.

This is the infrastructure competition of the medical world model era.

Whoever defines state determines what medical AI can see.

Whoever defines action determines how medical AI understands interventions.

Whoever defines transition determines how medical AI learns about biological change.

Whoever defines the benchmark determines how the entire field advances.

Conclusion: ImageNet Taught AI to See the World; the ImageNet for Medical World Models Must Teach AI to Understand How Life Responds to Interventions

ImageNet gave machine vision its first shared coordinate system.

It taught AI to see the world more systematically.

The ImageNet that the medical world model needs is not about teaching AI to recognize more disease labels — it is about teaching AI to understand how life responds to interventions.

Once this is accomplished, medical AI will no longer merely answer questions, summarize literature, or predict risks.

It will begin to truly learn:

> How states form, how interventions act, how systems transition, and how evidence is verified.

The next decade of medical AI does not lack large models.

What is truly missing is a shared infrastructure for biological state transitions.

Whoever builds this may well define the underlying coordinate system of next-generation AI in medicine.

References

1. Deng J, Dong W, Socher R, et al. ImageNet: A Large-Scale Hierarchical Image Database. CVPR. 2009.

2. Russakovsky O, Deng J, Su H, et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision. 2015.

3. Ha D, Schmidhuber J. World Models. 2018. https://worldmodels.github.io/

4. Arc Institute. Arc Institute's first virtual cell model: State. https://arcinstitute.org/news/virtual-cell-model-state

5. Adduri AK, Gautam D, Bevilacqua B, et al. Predicting cellular responses to perturbation across diverse contexts with State. bioRxiv. 2025.

6. Yang Y, Wang ZY, Liu Q, et al. Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning. arXiv:2506.02327.

7. IEEE Transactions on Biomedical Engineering. Digital Twins / AI World Models. https://www.embs.org/tbme/research-highlights/digital-twins-ai-world-models/

8. Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ. Multimodal biomedical AI. Nature Medicine. 2022.

9. Xia Y, Wang K, Zhang H. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU.

10. Aslanidi OV, Colman MA, Stott J, et al. 3D virtual human atria: A computational platform for studying clinical atrial fibrillation. Progress in Biophysics and Molecular Biology. 2011.

11. Xiong J. World Models for Biomedicine: A Steerability Framework. Preprints.org, 2026. doi:10.20944/preprints202605.0366.v1.

12. SEWO — Steerable Medicine World Model. https://steerable.world

Disclaimer: This article is intended solely for research, technology, and industry trend discussion. It does not constitute medical advice, diagnostic advice, or treatment advice. Any medical world model intended for clinical application requires prospective validation, safety assessment, ethical review, regulatory review, and professional clinical supervision.

xiongjianghui.com
