Medical AI Doesn’t Just Need Bigger Models. It Needs an ImageNet for State Transitions

The next frontier for medical AI is building "world models" that can predict how a biological state changes in response to an intervention, moving beyond current systems focused on classification or question-answering. To achieve this, the author proposes creating a shared research infrastructure called "Biomedical TransitionNet," analogous to ImageNet, which would standardize and evaluate data on baseline states, interventions, and follow-up states. This infrastructure is considered essential for enabling meaningful progress in understanding and altering biological trajectories, rather than simply developing larger chatbots.

Whoever builds the “state–intervention–transition” dataset for biomedicine may define the next generation of medical AI infrastructure.

Author: Jianghui Xiong

Medical AI is moving beyond classification, risk prediction, and question answering. The next frontier is not just:

sample → label

or:

question → answer

It is:

state + action → next state

In other words:

current biological state + intervention → future biological state

To build real biomedical world models, we need more than bigger models. We need something analogous to ImageNet, not for images, but for biological state transitions.

I will call this idea, for now:

Biomedical TransitionNet

A shared infrastructure for recording, standardizing, and evaluating:

- baseline biological state
- intervention
- follow-up biological state
- mechanism evidence
- uncertainty

This article explains why such an infrastructure is needed, why it matters, and why it is scientifically difficult. It does not claim that a complete biomedical world model already exists.

1. ImageNet was not just a dataset. It was infrastructure.

When people talk about the deep learning revolution in computer vision, they often mention AlexNet, VGG, ResNet, and other neural network architectures. That is correct, but incomplete.

One of the most important enabling factors was ImageNet.

ImageNet was not merely a large collection of images. Its deeper value was that it gave computer vision a shared coordinate system:

- a common task,
- a common label hierarchy,
- common training and test data,
- common benchmarks,
- and a way to compare progress across models and institutions.

Before ImageNet, many computer vision systems were difficult to compare because they were trained and evaluated on different datasets. ImageNet helped the field converge around shared evaluation.

That is why ImageNet became much more than a database. It became research infrastructure.

Medical AI may now need something similar. But not an image dataset.
Medicine needs an ImageNet for state transitions.

2. Medical AI has many models, but not enough transition data

Today, we already have many types of medical AI systems:

- medical large language models,
- medical question-answering systems,
- radiology models,
- pathology models,
- omics foundation models,
- virtual cell models,
- digital twin systems,
- clinical decision support tools,
- AI drug discovery platforms.

These are important. But if we think the future of medical AI is only “a bigger medical chatbot”, we may miss the real challenge.

Medicine is not only about answering questions. Medicine is about understanding and changing biological trajectories.

A clinician does not only ask:

What disease does this patient have?

They also ask:

Why is this biological state happening?
What is driving deterioration?
Which mechanisms are actionable?
Which intervention may shift the trajectory?
How should the response be measured?
What if the expected response does not happen?
What if an adverse response appears?

These are not just language problems. They are state transition problems.

Most medical AI today is still closer to:

sample → label

or:

question → answer

But biomedical world models require something closer to:

state + action → next state

That is the key shift.

3. What is a biomedical world model?

In AI, a world model is usually understood as an internal model that helps an agent simulate how the environment changes after an action.
A simple abstraction is:

current state + action → future state

In robotics, this may mean:

robot pose + motor command → next scene state

In autonomous driving, it may mean:

traffic scene + driving action → future traffic scene

In biomedicine, the analogous formulation would be:

biological state + intervention → future biological state

This could apply at multiple scales:

cell state + perturbation → cellular response
tissue state + treatment → tissue response
patient state + intervention → follow-up state

A biomedical world model should therefore not be understood as a medical chatbot. It is not merely:

medical text in → medical text out

A more meaningful biomedical world model would combine:

- state representation
- intervention representation
- transition modeling
- mechanism evidence
- uncertainty estimation
- feedback correction

That is much harder than ordinary medical QA. And it requires a different kind of data.

4. Why medicine needs its own ImageNet

In computer vision, a basic supervised learning unit can often be simplified as:

image + label

For biomedical world models, the basic unit should look more like:

baseline state + action + follow-up state

Or mathematically:

$$S_t + A \rightarrow S_{t+\Delta t}$$

Where:

- $S_t$ = biological state before intervention
- $A$ = action or intervention
- $S_{t+\Delta t}$ = biological state after intervention
- $\Delta t$ = time interval

This is fundamentally different from a static medical database. A biomedical world model does not only need:

- medical images,
- electronic health records,
- omics profiles,
- drug-target databases,
- clinical notes,
- literature graphs.

Those are useful, but insufficient. It needs structured longitudinal data describing what the biological state was, what action was taken, what changed afterward, over what time scale, with what evidence, and with what uncertainty.

This is why medicine needs something like a Biomedical TransitionNet: not a direct copy of ImageNet, but a new infrastructure designed for biological state transitions.

5. What should one data unit look like?

A conventional supervised learning sample may look like:

x → y

Examples:

image → diagnosis label
clinical note → ICD code
genomic variant → risk category

A biomedical world-model sample should look more like:

state before + intervention + state after + time interval + evidence chain + uncertainty

A simplified schema might look like this:

{
  "baseline state": {
    "molecular": "...",
    "clinical": "...",
    "phenotype": "...",
    "lifestyle": "...",
    "context": "..."
  },
  "action": {
    "type": "...",
    "dose": "...",
    "frequency": "...",
    "duration": "...",
    "mechanism": "..."
  },
  "follow up state": {
    "molecular": "...",
    "clinical": "...",
    "phenotype": "...",
    "adverse events": "..."
  },
  "transition": {
    "direction": "...",
    "magnitude": "...",
    "time scale": "...",
    "confidence": "..."
  },
  "evidence chain": {
    "target": "...",
    "pathway": "...",
    "biomarker": "...",
    "phenotype": "...",
    "validation": "..."
  }
}

This is obviously simplified. But the principle matters. A biomedical world model should learn not only:

what this sample is

but:

how this biological system changed after a defined intervention

6. Five layers of a biomedical ImageNet

If we want to build an ImageNet-like infrastructure for biomedical world models, it should include at least five layers.

6.1 State representation

The first question is: what is the biological state?

A patient state is not just a diagnosis label. Terms such as diabetes, hypertension, aging, inflammation, fatigue, and frailty are useful, but they are high-level descriptions.

A real biological state may include:

- genome,
- DNA methylation,
- transcriptome,
- proteome,
- metabolome,
- immune state,
- inflammatory state,
- organ function,
- microbiome,
- sleep,
- activity,
- diet,
- medication history,
- environmental exposure,
- clinical background.
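Before any of these can feed a model, they have to become computable variables. A minimal Python sketch of that step, assuming a hypothetical registry of named variables grouped by layer (the variable names, units, and values below are illustrative only, not a proposed standard): it flattens a layered state into a fixed-order vector and keeps an explicit mask, so that "not measured" is never confused with a measured value.

```python
import math

# Hypothetical variable registry: (layer, variable) pairs in a fixed order.
# Names and units are illustrative only, not a proposed standard.
STATE_VARIABLES = [
    ("molecular", "dnam_age_years"),
    ("molecular", "crp_mg_per_l"),
    ("clinical", "hba1c_percent"),
    ("clinical", "systolic_bp_mmhg"),
    ("lifestyle", "sleep_hours"),
]


def encode_state(state: dict[str, dict[str, float]]) -> tuple[list[float], list[bool]]:
    """Flatten a layered state into a fixed-order vector plus an observed mask.

    Unmeasured variables become NaN with mask False, so a downstream model
    can tell "not measured" apart from a real value.
    """
    values: list[float] = []
    observed: list[bool] = []
    for layer, name in STATE_VARIABLES:
        value = state.get(layer, {}).get(name)
        if value is None:
            values.append(math.nan)
            observed.append(False)
        else:
            values.append(float(value))
            observed.append(True)
    return values, observed


# Illustrative baseline; systolic blood pressure was not measured.
baseline = {
    "molecular": {"dnam_age_years": 52.3, "crp_mg_per_l": 3.1},
    "clinical": {"hba1c_percent": 6.1},
    "lifestyle": {"sleep_hours": 6.5},
}
vector, mask = encode_state(baseline)
print(mask)  # [True, True, True, False, True]
```

Keeping the mask separate from the values matters for longitudinal data, where a follow-up panel often measures a different subset of variables than the baseline did.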
A simplified representation may be:

individual state = molecular state + pathway state + organ state + phenotype state + lifestyle context + clinical context

Without a state representation, a biomedical world model does not know what it is simulating.

6.2 Action ontology

A world model needs actions. In medicine, actions are complex. They may include:

- drugs,
- supplements,
- diet,
- exercise,
- sleep intervention,
- stress management,
- cell therapy,
- gene therapy,
- regenerative medicine,
- combination therapy,
- N-of-1 personalized intervention.

Even a drug intervention requires many parameters: drug name, dose, frequency, route, duration, combination, adherence, contraindications, and adverse events.

An exercise intervention also requires: type, intensity, frequency, duration, heart-rate zone, recovery condition, and baseline fitness.

If actions are not standardized, the model cannot learn meaningful transitions.

6.3 Transition record

The core of a biomedical world model is the transition:

before → after

Examples:

inflammatory state before intervention → inflammatory state after intervention
DNA methylation age before intervention → DNA methylation age after intervention
metabolic state before intervention → metabolic state after intervention
tumor state before treatment → tumor state after treatment

Without follow-up measurement, there is no transition. Without transition, there is no world model.

Many medical datasets are still one-time measurements. Biomedical world models need longitudinal measurement.

6.4 Evidence chain

A medical model should not only output a probability. If a model says "This intervention may help", that is not enough. It should also answer:

Which targets are involved?
Which pathways are affected?
Which abnormal state does this address?
Which biomarkers can validate the response?
Which evidence comes from experiments?
Which evidence comes from clinical data?
Which part is only model inference?
Which risks should be monitored?
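The transition record and its evidence chain can be held together in one structure that mirrors S_t + A → S_{t+Δt}. A minimal Python sketch, assuming numeric biomarker dictionaries for both states; the class names (`Action`, `TransitionRecord`), field names, and example values are hypothetical and illustrative, not measured data or a published TransitionNet schema:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    """Hypothetical action entry; a real action ontology needs many more fields."""
    kind: str                           # mirrors the schema's "type" field
    dose: Optional[str] = None
    frequency: Optional[str] = None
    duration_days: Optional[int] = None


@dataclass
class TransitionRecord:
    """One S_t + A -> S_{t+dt} sample plus its evidence chain."""
    baseline: dict[str, float]          # S_t: biomarker name -> value
    action: Action                      # A
    follow_up: dict[str, float]         # S_{t+dt}
    interval_days: int                  # dt
    evidence: dict[str, str] = field(default_factory=dict)

    def deltas(self) -> dict[str, float]:
        """Change for each variable measured at both time points."""
        shared = sorted(self.baseline.keys() & self.follow_up.keys())
        return {k: self.follow_up[k] - self.baseline[k] for k in shared}

    def directions(self, tolerance: float = 0.0) -> dict[str, str]:
        """Label each change 'up', 'down', or 'flat' (|change| <= tolerance)."""
        labels = {}
        for name, change in self.deltas().items():
            if change > tolerance:
                labels[name] = "up"
            elif change < -tolerance:
                labels[name] = "down"
            else:
                labels[name] = "flat"
        return labels


# Illustrative values only, not real measurements.
record = TransitionRecord(
    baseline={"crp_mg_per_l": 3.1, "hba1c_percent": 6.1},
    action=Action(kind="exercise", frequency="3x/week", duration_days=84),
    follow_up={"crp_mg_per_l": 2.4, "hba1c_percent": 6.1, "sleep_hours": 7.0},
    interval_days=84,
    evidence={"biomarker": "CRP", "validation": "repeat measurement"},
)
print(record.directions(tolerance=0.05))
# {'crp_mg_per_l': 'down', 'hba1c_percent': 'flat'}
```

Only variables measured at both time points produce a delta, which is the point of 6.3 in code form: without follow-up measurement, there is no transition. The `tolerance` argument is an assumption about how to treat changes too small to call a direction.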
In medicine, prediction alone is not sufficient. A safer output should look more like:

prediction + mechanism + validation + uncertainty

This is especially important because medical AI should not become an uninspectable black box.

6.5 Benchmark task

ImageNet helped computer vision because different models could be compared on shared tasks. Biomedical world models need benchmarks too.

Possible benchmark tasks include:

- cellular perturbation response prediction,
- gene expression response after drug perturbation,
- tumor state simulation after treatment,
- metabolic biomarker response prediction,
- inflammatory state transition prediction,
- aging-related biomarker transition prediction,
- N-of-1 intervention response direction prediction.

But the metrics cannot be copied directly from image classification. Useful metrics may include:

- directional accuracy
- mechanistic consistency
- biomarker validation
- uncertainty calibration
- risk awareness
- cross-context generalization

This is much harder than top-1 accuracy. But medicine requires it.

7. Related progress: promising, but still early

To be scientifically careful, we should not pretend that complete biomedical world models already exist. They do not. But several related directions are emerging.

7.1 ImageNet as an infrastructure analogy

ImageNet and ILSVRC showed how large-scale, standardized datasets and benchmarks can accelerate a field. However, ImageNet is a benchmark for image classification and detection. It is not equivalent to what biomedicine needs.

Here, ImageNet is used only as an infrastructure analogy. The biomedical version must be longitudinal, dynamic, intervention-aware, and mechanism-sensitive.

7.2 World Models in AI

Ha and Schmidhuber’s World Models is a representative work in AI world modeling. Its key idea is that an agent can learn an internal model of the environment and use it to simulate future states.

Medicine cannot directly copy this setting. A human body is not a game environment.
Clinical intervention cannot be freely explored by trial and error. But the abstraction

state + action → future state

is still useful for thinking about medical AI.

7.3 Virtual cells and perturbation response

Arc Institute’s State model is a recent example of virtual-cell modeling. It aims to predict how cells respond to drugs, cytokines, or genetic perturbations. Public descriptions indicate that State was trained on large-scale observational and perturbational single-cell data.

This is important because it directly touches the pattern:

cell state + perturbation → cellular response

However, State is primarily a cellular-level model. It should not be confused with a complete patient-level biomedical world model.

7.4 Medical World Model for tumor evolution

Recent work using the term Medical World Model, such as MeWM, explores generative simulation of tumor evolution under treatment conditions. This is relevant because it moves medical AI from static recognition toward treatment-conditioned disease dynamics.

But this direction is still early. It should not be interpreted as a general solution to biomedical world modeling.

7.5 Digital twins and virtual physiological systems

Long before today’s AI world-model terminology, fields such as computational physiology, systems biology, virtual physiological systems, and digital twins already tried to connect biological structure, mechanism, dynamics, and measurable outputs. That tradition matters.

A good biomedical world model should not be just a black-box predictor. It should connect state, mechanism, dynamic change, measurement, and feedback.

Today’s biomedical world models can be seen as an extension of this older systems-modeling tradition into the era of AI, multi-omics, real-world data, and large-scale computation.

8. Why steerability matters

A biomedical world model that only predicts is not enough. A model may predict that a patient’s risk is increasing. But medicine needs more than that.
It needs to ask:

Which state can be measured?
Which abnormality can be explained?
Which intervention can be described?
Which transition can be tested?
Which deviation can be traced?
Which failure can be corrected?

This is why I emphasize steerability.

Going forward, I will use the name:

SteeraMed: A Steerable Biomedical World Model
Website: https://SteeraMed.com

The earlier preprint name was:

SEWO / Steerable Medicine World Model

or, in Chinese:

可驾驭医学世界模型

Whenever I mention SEWO / 可驾驭医学世界模型, it should be understood together with the new unified naming:

SteeraMed: A Steerable Biomedical World Model

The idea behind SEWO / SteeraMed is that biomedical world models should not pursue predictive accuracy alone. They should also support:

- state definition,
- intervention description,
- transition hypothesis,
- mechanism audit,
- deviation tracing,
- uncertainty inspection,
- expert steering,
- and iterative correction.

The related ideas were introduced in the preprint World Models for Biomedicine: A Steerability Framework and are also presented at:

https://steerable.world

Important clarification: SEWO / SteeraMed is not a clinically validated treatment system. It is not a medical device. It is better understood as a structural framework and evidence-chain design principle for future biomedical world models.

The key question is not only:

Can the model predict?

but:

Can researchers and clinicians inspect, question, correct, and steer the model within clearly defined boundaries?

9. Why longevity medicine may be one entry point

Biomedical world models could start from many areas:

- oncology,
- cardiovascular disease,
- metabolic disease,
- immunology,
- neurodegeneration,
- drug discovery,
- virtual cells,
- longevity medicine.

Longevity medicine is not the only entry point. But it is an interesting one. Why?

9.1 Aging is a continuous state

Aging is not a single disease label.
It is a continuous, multi-system biological process involving:

- inflammation,
- metabolism,
- immunity,
- epigenetics,
- mitochondrial function,
- proteostasis,
- stem-cell exhaustion,
- cellular senescence,
- organ function decline.

That makes it naturally suitable for state modeling.

9.2 Longevity medicine requires repeated measurement

Longevity medicine is not a one-time diagnostic event. It depends on repeated measurement over time. A useful intervention must be evaluated through:

baseline state → intervention → follow-up state

This is exactly the structure needed for biomedical world modeling.

9.3 Interventions are diverse

Longevity-related interventions may include:

- diet,
- exercise,
- sleep,
- supplements,
- drugs,
- cell therapy,
- regenerative medicine,
- stress management,
- environmental exposure management.

This provides a rich action space.

9.4 Individual responses vary

The same intervention may produce different responses in different people. That means longevity medicine cannot rely only on average effects. It needs N-of-1 style transition data:

individual state → intervention → individual transition

Each well-structured N-of-1 intervention can be seen as a small world-model experiment.

10. Engineering implications

From an engineering perspective, the biomedical ImageNet is not just a dataset. It is a data infrastructure problem. It requires:

- data collection,
- data standardization,
- multimodal integration,
- time-series modeling,
- intervention encoding,
- causal confounding control,
- privacy protection,
- benchmark design,
- safety boundaries,
- evidence-chain tracking.

A simplified loop may look like:

measure state
↓
standardize state representation
↓
record intervention
↓
measure follow-up state
↓
construct transition sample
↓
train / evaluate world model
↓
generate testable hypothesis
↓
repeat and correct

This is not a static dataset. It is a data flywheel.

11. Main challenges

This is scientifically and technically difficult.
Some of the main challenges include:

11.1 Biological state is complex

A human state cannot be compressed into one label. We need ways to represent multi-omics, clinical metrics, imaging, lifestyle, symptoms, environmental exposure, and medical history as computable state variables.

11.2 Interventions are hard to standardize

Drugs, exercise, diet, sleep, supplements, and cell therapies all have complex parameters. Without action standardization, transition learning will be noisy.

11.3 Follow-up data is scarce

Most medical data is not collected as structured pre/post-intervention transition data. This requires new data collection workflows.

11.4 Causal confounding is serious

In the real world, people often change many things at once: diet, exercise, sleep, medication, supplements, and stress. Attributing a state change to one factor is difficult. This requires careful study design and statistical methods.

11.5 Safety and ethics are central

A biomedical world model cannot freely experiment like a game-playing agent. Any intervention-related model must clearly distinguish among:

- research hypothesis
- health-management suggestion
- clinical decision support
- medical recommendation
- validated therapy

Clinical use would require prospective validation, safety evaluation, ethical review, regulatory review where applicable, and professional oversight.

11.6 Open standards and business incentives may conflict

If everything is closed, the field cannot build shared benchmarks. If everything is open, companies may lack incentives to invest. A practical ecosystem will need a balance among:

- open benchmarks
- privacy protection
- commercial incentives
- scientific collaboration

12. A minimal viable direction

A biomedical ImageNet should not begin by trying to simulate the entire human body. A more realistic path is to start with minimal viable tasks.
Examples:

- cellular perturbation response prediction,
- tumor state change after treatment,
- metabolic biomarker response prediction,
- inflammatory state transition prediction,
- DNA methylation age transition,
- N-of-1 longevity intervention tracking.

A minimal task should define:

1. state variables
2. intervention variables
3. follow-up time
4. transition metrics
5. benchmark task
6. safety boundary

Start narrow. Make it measurable. Make it repeatable. Make it auditable. Then scale.

13. Whoever defines state, action, and transition may define the field

Medical AI will still need better models. But bigger models alone cannot solve the problem of biomedical state transition learning. The scarce asset is the infrastructure that allows models to learn:

how life systems change after intervention

Future platform-level medical AI companies may not be the ones with the largest language models. They may be the ones that can build the strongest data flywheel:

- measure biological state
- standardize interventions
- record follow-up changes
- construct mechanism evidence chains
- evaluate transition models
- repeat

Whoever defines state defines what medical AI can see.
Whoever defines action defines how medical AI understands intervention.
Whoever defines transition defines how medical AI learns biological change.
Whoever defines the benchmark defines how the field measures progress.

Conclusion

ImageNet helped machines learn to see the world. A biomedical ImageNet should help AI learn how life responds to intervention.

That does not mean replacing clinicians. It means building a scientific infrastructure where models can learn:

- how states form
- how interventions act
- how systems transition
- how evidence is validated

The next decade of medical AI may not be limited by model size alone. It may be limited by the lack of a shared infrastructure for biological state transitions.

That is the real opportunity.

References

Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A Large-Scale Hierarchical Image Database. CVPR. 2009. https://ieeexplore.ieee.org/document/5206848

Russakovsky O, Deng J, Su H, et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision. 2015. https://arxiv.org/abs/1409.0575

ImageNet official website. https://www.image-net.org/

Ha D, Schmidhuber J. World Models. 2018. https://worldmodels.github.io/

Arc Institute. Arc Institute’s first virtual cell model: State. https://arcinstitute.org/news/virtual-cell-model-state

Predicting cellular responses to perturbation across diverse contexts with State. bioRxiv. 2025. https://www.biorxiv.org/content/10.1101/2025.06.26.661135v1

Yang Y, Wang ZY, Liu Q, et al. Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning. arXiv. https://arxiv.org/abs/2506.02327

IEEE Transactions on Biomedical Engineering. Digital Twins / AI World Models. https://www.embs.org/tbme/research-highlights/digital-twins-ai-world-models/

Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ. Multimodal biomedical AI. Nature Medicine. 2022. https://www.nature.com/articles/s41591-022-01981-2

Xiong J. World Models for Biomedicine: A Steerability Framework. Preprints.org. 2026. https://www.preprints.org/manuscript/202605.0366 DOI: https://doi.org/10.20944/preprints202605.0366.v1

SteeraMed: A Steerable Biomedical World Model. https://steerable.world

Disclaimer

This article is for research, technical, and industry discussion only. It is not medical advice, diagnostic advice, or treatment advice. Any biomedical world model intended for clinical use would require prospective validation, safety evaluation, ethical review, regulatory review where applicable, and professional clinical oversight.