
Medical World Models: Steerable AI vs Harness Engineering

Medical AI cannot rely solely on the external guardrails of harness engineering. A true medical world model must enter the internal architecture of state-action-transition-feedback, making biological state changes representable, deducible, auditable, and correctable. Harness engineering controls the AI system from outside. Steerable world modeling structures biomedical reasoning from inside.

熊江辉 · 2026-05-10

Over the past few years, the rapid advancement of AI capabilities has made one question increasingly important:

> How should we actually control a powerful AI system?

In general-purpose AI, software engineering, and agentic systems, the common answer is: impose external constraints on the model.

For example:

- prompt templates;

- system prompts;

- retrieval-augmented generation (RAG);

- tool calling;

- workflow orchestration;

- safety filters;

- output validators;

- human-in-the-loop;

- audit logs;

- sandboxed execution environments.

This class of approaches can be broadly described as a form of harness engineering:

using external engineering structures to wrap, constrain, schedule, and verify an AI model so that it completes tasks more safely and reliably.

This is, of course, very important.

But when AI enters medicine — particularly the question of medical world models — external guardrails alone are not enough.

Because the core challenge of medical AI is not merely:

> How do we prevent the AI from making things up?

The deeper questions are:

> Does the AI genuinely understand the current state of a biological system?

> What does an intervention mean within that system?

> After applying an intervention, in which direction might the state move?

> If the state does not move as expected, can the cause of failure be identified and corrected?

These questions cannot be solved by prompts, RAG, or safety filters alone.

They require a more foundational architecture:

a steerable biomedical world model.

1. What Problem Does a Medical World Model Actually Solve?

In robotics or game agents, a world model can be simply understood as:

current state + action → predicted next state

A robot knows where it is and, after taking a certain action, predicts where it will be next.
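As a minimal sketch of this idea, assume a toy robot on a 2D grid. The names `GridState`, `MOVES`, and `predict_next` are illustrative and do not come from any particular library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridState:
    x: int
    y: int


# Hypothetical action set: each move is a (dx, dy) displacement.
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def predict_next(state: GridState, action: str) -> GridState:
    """Simplest world model: current state + action -> predicted next state."""
    dx, dy = MOVES[action]
    return GridState(state.x + dx, state.y + dy)


print(predict_next(GridState(0, 0), "east"))
```

Here the state is a pair of coordinates and the action is a fixed displacement, which is exactly the simplicity that medicine lacks.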

But "state" and "action" in medicine are far more complex.

Medical state is not a simple coordinate, nor is it merely a disease label.

A person's biological state may include:

- molecular network states;

- DNA methylation status;

- immune status;

- metabolic status;

- organ function status;

- inflammation levels;

- aging module states;

- past medical history;

- environmental exposures;

- individual baselines;

- temporal trends.

A medical action is not simply "executing a command."

A medical action might be:

- a drug;

- a dosage;

- timing;

- intervention sequence;

- nutrition;

- exercise;

- sleep;

- behavioral change;

- cell therapy;

- combination interventions;

- follow-up strategies.

Therefore, the question a medical world model truly needs to answer is not "will this person get sick?" but rather:

> Given the current state, if a certain intervention is applied, in which direction might the biological state move?

Going further, it must also answer:

> If the state does not move as expected, where did the problem occur?

This is why I raised the question of "steerability" in the preprint World Models for Biomedicine: A Steerability Framework. A medical world model should not merely be a bigger predictor — it should be an auditable, challengeable, and correctable state-transition reasoning system.

2. Harness Engineering Is Necessary, but It Is Not a Medical World Model

Let me be clear:

I am not opposed to harness engineering.

On the contrary, I believe medical AI must have very strong harness engineering.

At minimum, a medical AI system needs to:

- not overstep diagnostic authority;

- not directly replace physician decisions;

- not prescribe medications on its own;

- not fabricate literature references;

- not exaggerate treatment efficacy;

- distinguish between health education and medical advice;

- label uncertainty;

- trigger escalation to human experts for high-risk cases;

- retain audit records;

- comply with clinical governance and regulatory constraints.

All of these are critically important.

But they primarily address AI behavioral risks:

Is the AI fabricating information?

Is the AI overstepping its boundaries?

Is the AI violating regulations?

Is the AI outputting unsafe recommendations?

They do not directly address biological reasoning risks:

Is the patient's state correctly represented?

Is the intervention correctly modeled?

Is the mechanistic chain valid?

Is the direction of state transition plausible?

Can molecular changes propagate to functional changes?

Can functional changes propagate to clinical benefit?

Can the cause of failure be pinpointed?

These two categories of risk must not be conflated.

Harness engineering can make AI outputs safer.

But it cannot automatically endow the AI with a medical world model.
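To make the contrast concrete, here is a hedged sketch of a typical harness component: an output validator. The rule names and patterns are hypothetical and far simpler than a production filter would be:

```python
import re

# Hypothetical behavioral rules: patterns a harness might flag in generated text.
BLOCKED_PATTERNS = {
    "prescribing": re.compile(r"\b(you should take|i prescribe)\b", re.I),
    "overclaiming": re.compile(r"\b(will definitely|guaranteed to)\b", re.I),
}


def harness_check(output_text: str) -> list[str]:
    """Return the names of the behavioral rules that the output text violates."""
    return [name for name, pattern in BLOCKED_PATTERNS.items()
            if pattern.search(output_text)]


print(harness_check("You should take this; it will definitely work."))
print(harness_check("Drug X is associated with the inflammatory pathway."))
```

The validator only ever sees the output string. The second sentence passes every rule, yet nothing in the check says whether the patient's state was represented or whether the mechanistic chain holds.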

3. A "Medical LLM + Toolchain" Does Not Equal a Medical World Model

Many current medical AI agent systems are, at their core, a medical LLM with the following added:

+ medical knowledge base

+ RAG retrieval

+ tool calling

+ report generation

+ safety filtering

+ physician review

Such systems can be very useful.

They can help physicians read literature, organize medical records, generate summaries, interpret reports, query clinical guidelines, and design follow-up questionnaires.

But strictly speaking, this still may not constitute a medical world model.

Because it may not have truly defined:

1. what the current biological state is;

2. how an intervention is represented;

3. how the current state transitions under a given action;

4. how counterfactual paths are compared;

5. how failure feeds back into the next iteration of the model.

In other words, it has a workflow, but not necessarily a world model.

It has a harness, but not necessarily steerability.

A system that can safely answer medical questions is not the same as one that can simulate medical state transitions.

This distinction is crucial.

4. What Is a Steerable Biomedical World Model?

The steerable biomedical world model I describe is neither an "automated prescribing system" nor a "clinical prescription engine."

More precisely, it is an architectural objective for medical AI:

> Grounded in explicit state representation, action semantics, transition hypotheses, mechanistic evidence, and feedback checks, it helps humans understand and examine how biological states might be steered.

It should not directly claim "this intervention will definitely work."

Instead, it should more cautiously state:

Based on the current state, candidate actions, and mechanistic constraints, this intervention may produce a testable direction of state transition.

That is, an early-stage medical world model is better understood as:

state-action-transition hypothesis system

rather than:

validated clinical decision system

This is an important boundary.

At the current stage, many medical world models do not yet have sufficient longitudinal intervention data to truly learn an empirical state + action → next state function.

Therefore, a more scientifically precise formulation would be:

knowledge-constrained transition tendency

That is:

> Based on existing biological mechanisms, network structures, individual states, and intervention semantics, it proposes an auditable, verifiable, and falsifiable hypothesis about the direction of state transition.

This is more robust than "predicting treatment efficacy" and better suited to the early developmental stage of medical AI.
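A knowledge-constrained transition tendency can be sketched as a purely directional statement. The example below assumes that each module's state is a signed deviation from a reference and that the intervention's per-module direction comes from prior mechanistic knowledge. Both encodings, and the function name `transition_tendency`, are illustrative assumptions rather than the preprint's formalism:

```python
def transition_tendency(deviation: dict[str, float],
                        effect: dict[str, int]) -> dict[str, str]:
    """State the hypothesized direction for each module the intervention targets.

    deviation: signed distance of each module from its reference (0 = at reference).
    effect: assumed mechanistic direction per module (+1 raises, -1 lowers).
    """
    hypothesis = {}
    for module, direction in effect.items():
        d = deviation.get(module)
        if d is None:
            hypothesis[module] = "state not measured"
        elif d == 0:
            hypothesis[module] = "no deviation to correct"
        elif d * direction < 0:
            hypothesis[module] = "toward reference"
        else:
            hypothesis[module] = "away from reference"
    return hypothesis


print(transition_tendency({"inflammation": 1.8}, {"inflammation": -1, "immune": 1}))
```

The output is a direction that can be checked against follow-up measurements, not a predicted effect size, and modules without a measured state are reported as such rather than guessed.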

5. The Core Distinction Between Steerability and Harness

It can be summarized in two sentences:

> Harness engineering controls the AI system from outside.

> Steerable world modeling structures biomedical reasoning from inside.

More concretely:

| Dimension | Harness Engineering | Steerable Biomedical World Model |
|-----------|---------------------|----------------------------------|
| Core question | How to constrain AI outputs and behaviors? | How to represent and reason about biological state transitions? |
| Primary objects | Models, tools, permissions, workflows, outputs | Biological states, interventions, transition hypotheses, feedback |
| Typical components | Prompts, RAG, validators, workflows, guardrails | State, action, transition, counterfactual, QC |
| Medical role | Reducing AI output risks | Improving the auditability of biological reasoning |
| Failure diagnosis | Did the output violate rules? Was a tool called incorrectly? | Was the state measurement wrong? Was the intervention semantics wrong? Was the transition hypothesis wrong? |
| Essence | External safety engineering | Internal state dynamics architecture |
| Sufficiency for a medical world model | Not sufficient | A core requirement |

This does not mean the two are mutually exclusive.

On the contrary, medical AI requires both to coexist.

But they belong to different layers.

6. Why Medical AI Cannot Rely Solely on Harness

Because the core object of medicine is not text — it is the human body.

Errors in text-based systems can usually be corrected through regeneration, human review, or rule-based filtering.

But medical systems face:

- dynamic biological states;

- individual variability;

- multi-scale mechanisms;

- long-term feedback;

- intervention risks;

- irreversible consequences;

- clinical ethical responsibilities.

Therefore, the key question for medical AI is not merely "is the answer compliant?" but rather:

Does the biological state model underlying this answer hold up?

Consider an example.

If a system says, based on a knowledge base, that "a certain drug is associated with the inflammatory pathway," that is merely knowledge retrieval.

If it adds, "Please consult a physician; this content does not constitute medical advice," that is harness engineering.

But if it further asks:

Is this individual's current inflammatory module truly abnormal?

Does the drug's target fall within the relevant aberrant modules?

Could the direction of action plausibly move the state toward the desired direction?

Is there a mechanistic evidence chain?

If there is no change after intervention, at which step might the failure have occurred?

This begins to approach a medical world model.

In other words:

Safe output ≠ sound biological reasoning

A medical AI system can be highly compliant yet still have shallow biological reasoning.

A medical AI system can also have interesting mechanistic reasoning, but without harness engineering, it is unfit for deployment.

This is why the two must be distinguished — and also combined.

7. Five Structural Conditions for Steerability

In my preprint, I organized the steerable medical world model around five constraint checkpoints. These are not decorative concepts — they represent the minimum set of questions a medical world model must be able to address.

1. State Representation

The first question is:

> What is this person's current state?

In medicine, state should not be reduced to a disease name. Disease labels are phenotypic-level descriptions, not equivalent to the state space of a world model.

A medical world model requires finer-grained state representations, such as: molecular states, pathway states, network states, immune states, metabolic states, organ function states, aging module states, individual baseline deviations, and temporal trajectories.

In the Capomics / mIC vector formulation, an individual can be represented as a composite state of multiple modules' intrinsic capacities, rather than a single age or single risk score.
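As a rough illustration of a state that is more than a disease label, the sketch below stores per-module scores with a measurement date and derives a temporal trend. The class `BioState`, the module names, and the scores are hypothetical; this is not the actual Capomics / mIC data format:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BioState:
    measured_on: date
    modules: dict[str, float]  # hypothetical module scores, e.g. "immune"


def trend(trajectory: list[BioState], module: str) -> float:
    """Change in one module's score between the earliest and latest measurement."""
    ordered = sorted(trajectory, key=lambda s: s.measured_on)
    return ordered[-1].modules[module] - ordered[0].modules[module]


trajectory = [
    BioState(date(2025, 6, 1), {"immune": 0.75, "metabolic": 0.5}),
    BioState(date(2025, 1, 1), {"immune": 0.5, "metabolic": 0.5}),
]
print(trend(trajectory, "immune"))
```

Keeping the date alongside the module vector is what lets the same structure express both a snapshot and a trajectory.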

2. Capability Quantification

The second question is:

> Can this state be quantified?

In medicine, it is common to describe someone as having "poor immunity," "poor metabolism," or "accelerated aging." But a world model cannot remain at the level of adjectives.

It needs to translate states into quantifiable representations that are comparable, trackable, and updateable.

Of course, these quantifications should not be over-interpreted as representing biological reality itself. They are, first and foremost, model variables — operational representations that make state-transition reasoning computationally tractable.

To be precise:

> Quantified variables are not biological truth per se, but operational representations used for modeling, comparison, and verification.
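One simple way to turn an adjective into a comparable number, shown here only as a sketch, is a z-score against the individual's own earlier measurements. The choice of a z-score and the function name `baseline_deviation` are assumptions made for illustration:

```python
from statistics import mean, stdev


def baseline_deviation(history: list[float], current: float) -> float:
    """Z-score of the current value against the individual's own baseline."""
    if len(history) < 2:
        raise ValueError("need at least two baseline measurements")
    sd = stdev(history)
    if sd == 0:
        raise ValueError("baseline has no variation; z-score is undefined")
    return (current - mean(history)) / sd


print(baseline_deviation([1.0, 2.0, 3.0], 4.0))
```

The result is an operational variable in the sense described above: it is comparable and trackable over time, but it is a modeling convenience and not a direct reading of biology.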

3. Intervention-Response Semantics

The third question is:

> What does an intervention actually mean within the model?

In an ordinary database, an intervention might simply be a label. But in a world model, an intervention cannot be just a label.

It must be translated into: which modules does this action target? In what direction? Under what conditions? Does it require information about dosage, timing, frequency, and sequence?

That is, a medical world model must not only know "what was done" but also "how this action is encoded within the biological system."

This is intervention-response semantics.
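The difference between a label and an encoded action can be sketched as a record with explicit fields. The field names below (`targets`, `dose`, `timing`, `precondition`) are illustrative assumptions, and not every field applies to every kind of intervention:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intervention:
    name: str
    targets: dict[str, int]             # module -> assumed direction (+1 raises, -1 lowers)
    dose: Optional[str] = None          # free-text dose description
    timing: Optional[str] = None        # free-text schedule description
    precondition: Optional[str] = None  # condition under which the effect is assumed


def missing_semantics(iv: Intervention) -> list[str]:
    """List the fields that are still empty, i.e. where the action is only a label."""
    missing = []
    if not iv.targets:
        missing.append("targets")
    for field_name in ("dose", "timing", "precondition"):
        if getattr(iv, field_name) is None:
            missing.append(field_name)
    return missing


print(missing_semantics(Intervention("hypothetical_drug", {})))
```

An intervention for which every field is reported missing is what an ordinary database would hold: a name with no model-internal meaning.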

4. Counterfactual Transition

The fourth question is:

> What if we do A instead of B?

Medical decision-making is inherently a counterfactual problem. A simple recommendation system cannot truly answer such questions, and even an LLM with a harness can only express its uncertainty about them more safely.

A genuine medical world model must bring these questions into a computable structure:

S(t), A → Ŝ(t+Δt | A)

However, it must be emphasized that at the current early stage, Ŝ(t+Δt | A) is better understood as a "testable transition direction hypothesis" rather than a "validated clinical outcome prediction."

This is critical for scientific rigor.
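A toy version of the A-versus-B comparison is sketched below. It assumes an additive transition, in which each action shifts module deviations by a fixed amount, and ranks actions by the total remaining deviation. Both the additive form and the ranking criterion are simplifying assumptions, and all numbers are made up:

```python
def apply_action(state: dict[str, float], effect: dict[str, float]) -> dict[str, float]:
    """Toy transition: add the assumed per-module effect to the current deviation."""
    return {module: value + effect.get(module, 0.0) for module, value in state.items()}


def total_deviation(state: dict[str, float]) -> float:
    """Sum of absolute deviations from the reference across modules."""
    return sum(abs(value) for value in state.values())


def compare(state: dict[str, float],
            actions: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank candidate actions by hypothesized remaining deviation (lower first)."""
    scored = {name: total_deviation(apply_action(state, effect))
              for name, effect in actions.items()}
    return sorted(scored.items(), key=lambda item: item[1])


state = {"inflammation": 2.0, "metabolic": -1.0}
actions = {"A": {"inflammation": -1.5}, "B": {"metabolic": 0.5}}
print(compare(state, actions))
```

Both candidate paths start from the same S(t), so the ranking is a comparison of hypotheses about direction and not a prediction of clinical outcome.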

5. Quality-Control Feedback

The fifth question is:

> If the expected outcome did not occur, where did the problem arise?

This is a key distinction between steerability and ordinary simulation.

A steerable medical world model should be able to decompose failure into multiple possible links: Was the state measurement wrong? Was the intervention definition wrong? Did the module response fail to occur? Did the state fail to move as expected? Did downstream phenotype propagation fail? Was the time window incorrect? Was the dosage inappropriate? Was the individual baseline different?

In the preprint, I described this as the shift from a "what-if simulator" to a "why-not steering system."

That is, a medical world model must not only ask "what happens if we do this" but also "why didn't the expected result occur."

This step is critically important.

Because the value of medical AI lies not only in predicting success, but in understanding failure.
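The "why-not" decomposition can be sketched as an ordered walk from upstream to downstream links, returning the first one that failed. The link names follow the questions listed above; the `FollowUp` record and its boolean fields are hypothetical simplifications, since real checks would be graded rather than yes/no:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    state_measured_correctly: bool
    intervention_as_defined: bool
    module_responded: bool
    state_moved_as_expected: bool
    phenotype_propagated: bool


# Ordered from upstream to downstream.
CHECKS = [
    ("state_measured_correctly", "state measurement"),
    ("intervention_as_defined", "intervention definition"),
    ("module_responded", "module response"),
    ("state_moved_as_expected", "state transition"),
    ("phenotype_propagated", "downstream phenotype propagation"),
]


def why_not(follow_up: FollowUp) -> Optional[str]:
    """Return the most upstream failed link, or None if every link held."""
    for attribute, link in CHECKS:
        if not getattr(follow_up, attribute):
            return link
    return None


print(why_not(FollowUp(True, True, False, False, False)))
```

Reporting the most upstream failure first matters because everything downstream of it is expected to fail as a consequence.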

8. The Relationship Between Harness and Steerability: Not Replacement, but Layering

I believe that any serious future medical AI system should have at least four layers:

1. Biomedical World Model Layer

Responsible for states, actions, transitions, mechanisms, and feedback.

2. Harness Engineering Layer

Responsible for permissions, workflows, tools, safety boundaries, and output verification.

3. Clinical Governance Layer

Responsible for clinical accountability, ethics, regulation, and scope of application.

4. Human Oversight Layer

Responsible for final judgment, contextual interpretation, and actual decision-making.

The steerable world model addresses the question: "Does the medical reasoning have internal structure?"

Harness engineering addresses: "Can the AI system be used safely?"

Clinical governance addresses: "Should this system be deployed in a real-world scenario?"

Human oversight addresses: "In a specific individual's context, what should the final judgment be?"

These four layers must not be conflated.

In particular, harness engineering must not be treated as a substitute for a medical world model.

External guardrails can prevent a model from overstepping boundaries, but they cannot automatically grant the model the ability to understand biological state transitions.
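The layering can be sketched as a sequence of independent checks, each answering its own question. The dictionary keys and messages below are hypothetical placeholders for what each layer would actually verify:

```python
from typing import Optional


def world_model_layer(case: dict) -> Optional[str]:
    return None if case.get("transition_hypothesis") else "no explicit transition hypothesis"


def harness_layer(case: dict) -> Optional[str]:
    return None if case.get("output_validated") else "output failed validation"


def governance_layer(case: dict) -> Optional[str]:
    return None if case.get("within_approved_scope") else "outside approved scope"


def human_oversight_layer(case: dict) -> Optional[str]:
    return None if case.get("clinician_signed_off") else "awaiting clinician judgment"


LAYERS = [
    ("world model", world_model_layer),
    ("harness", harness_layer),
    ("clinical governance", governance_layer),
    ("human oversight", human_oversight_layer),
]


def review(case: dict) -> str:
    """Run each layer in order and report the first one that objects."""
    for name, check in LAYERS:
        reason = check(case)
        if reason is not None:
            return f"{name}: {reason}"
    return "passed all four layers"


print(review({"output_validated": True}))
```

In this sketch a case whose output passes validation is still stopped at the world model layer, which mirrors the point that one layer cannot stand in for another.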

9. What Should SteeraMed Point Toward?

If SteeraMed is understood as a research, methodological, or platform entry point for steerable medical AI, then it should not be merely a medical LLM + RAG + safety guardrails. Such a system is certainly useful, but it operates more at the application layer of medical AI.

I believe SteeraMed should point toward a deeper question:

> How can medical AI become steerable rather than merely constrained?

That is, SteeraMed should not focus solely on "how to prevent medical AI from making things up." It should also address:

How to define medical states?

How to encode medical actions?

How to estimate state transitions?

How to compare counterfactual paths?

How to convert failure into feedback?

How to make mechanistic chains auditable?

This means SteeraMed should include at least two architectural layers:

Harness Layer

Permissions, safety, compliance, auditing, output boundaries.

Steerability Layer

State representation, action semantics, counterfactual transitions, mechanistic evidence chains, quality-control feedback.

The former keeps the AI within bounds.

The latter ensures medical reasoning does not float free.

The former addresses: "Can the AI speak safely?"

The latter addresses: "Can medical states be structurally steered?"

This is the true reason the SteeraMed direction is worth proposing.

10. A More Cautious Assessment: Steerability Is Not Clinical Automation

It must be emphasized here:

A steerable biomedical world model does not equal a clinical automated decision system.

It does not mean "the model can make decisions on behalf of physicians." Nor does it mean "the model can directly recommend treatments to patients."

A more appropriate framing is:

The model helps formulate testable state-transition hypotheses, and makes every hypothesis's state, action, mechanism, transition, and uncertainty available for inspection.
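One way to make such a hypothesis inspectable, sketched under the assumption that a flat record is enough, is to require all five elements as fields and serialize them for review. The class name `TransitionHypothesis` and the example values are hypothetical:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TransitionHypothesis:
    state: dict
    action: str
    mechanism: str
    expected_transition: str
    uncertainty: str


def to_audit_record(hypothesis: TransitionHypothesis) -> str:
    """Serialize the hypothesis so every element can be read and challenged."""
    return json.dumps(asdict(hypothesis), sort_keys=True, indent=2)


example = TransitionHypothesis(
    state={"inflammation": 1.8},
    action="hypothetical_intervention",
    mechanism="assumed to lower activity of the inflammation module",
    expected_transition="inflammation moves toward reference",
    uncertainty="direction only; no effect size claimed",
)
print(to_audit_record(example))
```

Because the dataclass has no default values, a hypothesis cannot be constructed without stating its mechanism and its uncertainty.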

For a medical world model to truly enter clinical application, it still requires: prospective validation, safety assessment, clinical trials, real-world follow-up, regulatory review, physician oversight, clearly defined scope of application, and accountability frameworks for failure.

Therefore, at this stage, steerability should be regarded primarily as a research architecture and scientific reasoning framework, rather than a completed clinical product capability.

This boundary must be made explicit.

11. From "Constrainable" to "Steerable"

"Constrainable" means:

I can prevent the model from doing things it should not do.

"Steerable" means:

I can understand the system's state, provide directional signals, observe state transitions, and adjust the next step based on feedback.

These are two fundamentally different capabilities.

Using a horse as an analogy:

- harness engineering is the saddle, reins, fences, and training arena;

- a steerable world model is the dynamic control relationship formed between rider, horse, terrain, and direction.

Having a saddle does not mean knowing which way to go.

Having fences does not mean being able to navigate complex terrain.

Having rules does not mean understanding the horse's condition or changes in terrain.

The same is true for medicine.

Having safety guardrails does not mean having a medical world model.

Having RAG does not mean being able to simulate biological state transitions.

Having audit logs does not mean being able to diagnose mechanistic failures.

Having an agent workflow does not mean possessing counterfactual transition reasoning.

A true medical world model must enter the state space of the biological system.

12. Conclusion: The Next Step for Medical AI Is Not a Bigger Black Box, but a Steerable World Model

The future of medical AI should not be merely:

Bigger models

+ more medical literature

+ more complex agent workflows

These are certainly important, but they are not enough.

What medicine needs is:

Explicit state representations

+ explicit action semantics

+ testable state-transition hypotheses

+ auditable mechanistic chains

+ diagnosable feedback loops

Harness engineering keeps AI from running out of control.

Steerable world modeling ensures that medical reasoning can be guided, inspected, and corrected.

The former addresses: "How do we keep the AI under control?"

The latter addresses: "How do we steer biological state changes?"

The first generation of truly useful medical world models may not be the systems that claim to predict all treatment outcomes.

Instead, they may be more modest, more auditable, and more falsifiable systems:

> They do not claim to know the future.

> They merely make every hypothesis about states, actions, transitions, mechanisms, and uncertainties explicit enough to be tested.

This is the critical step for medical AI to move from "prediction tools" to "steerable systems."

References

1. Xiong J. World Models for Biomedicine: A Steerability Framework. Preprints.org, 2026. DOI: 10.20944/preprints202605.0366.v1.

2. SEWO / Steerable Medicine World Model: https://steerable.world

3. SteeraMed: https://steeramed.com

4. SteeraMed: https://steeramed.org

5. DeepoMe: https://deepome.com

xiongjianghui.com
