# MARS: Making Multimodal Models Safer Without Breaking a Sweat

> Source: <https://www.machinebrief.com/news/mars-making-multimodal-models-safer-without-breaking-a-sweat-4iep>
> Published: 2026-07-01 07:25:59+00:00

MARS introduces a fresh approach to enhancing safety in multimodal language models, using text-based refusal strategies to manage multimodal challenges.

Large Language Models (LLMs) are the new rock stars of AI, but safety remains a concern. The solution? Some suggest aligning them post-[training](/glossary/training) or using refusal directions in their activation space. But for [Multimodal](/glossary/multimodal) LLMs (MLLMs), which blend text, image, and video, these methods hit a snag. Why? Because gathering unsafe multimodal data isn't exactly easy. Enter a bold new approach that might just shake things up.

## Cracking the Multimodal Code

The breakthrough here is the concept of using textual refusal directions taken straight from the [LLM](/glossary/llm) backbone. Imagine applying these text-derived directions to images and video. Sounds wild? Preliminary results say it's not only possible but effective, albeit with some caveats. The trick lies in choosing the right layer and steering strength, plus ensuring cross-modal alignment. But beware: in the process, safe multimodal inputs might accidentally get steered toward refusal.
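For readers who want a feel for what a "refusal direction" is, here is a minimal sketch in plain Python. The article does not say how the direction is extracted, so this assumes one commonly used recipe: take the difference between the mean activation on harmful text prompts and the mean activation on harmless ones, then add a scaled copy of that direction to a new activation. The function names, the toy 2-dimensional activations, and the `strength` value are all hypothetical and chosen only for illustration.

```python
import math


def mean_vector(rows):
    """Average a list of equal-length activation vectors, dimension by dimension."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Assumed recipe: unit-normalized difference of means (harmful minus harmless)."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("harmful and harmless means coincide; no direction found")
    return [x / norm for x in diff]


def steer(activation, direction, strength):
    """Push an activation along the refusal direction by a fixed strength."""
    return [h + strength * d for h, d in zip(activation, direction)]


# Toy text-only activations from one (hypothetical) layer of the LLM backbone.
harmful = [[2.0, 0.0], [4.0, 0.0]]
harmless = [[0.0, 0.0], [0.0, 0.0]]

direction = refusal_direction(harmful, harmless)

# Apply the text-derived direction to an activation that, in a real model,
# would come from an image or video input.
steered = steer([0.5, 0.5], direction, strength=2.0)
print(direction, steered)
```

In a real model the vectors would have thousands of dimensions, and the layer and strength would need tuning, which is exactly the caveat raised above: too large a strength pushes harmless inputs toward refusal as well.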

This brings us to the innovation of the hour: Modality-Agnostic Refusal Steering (MARS). Think of it as a safety net that doesn't need the crutch of unsafe multimodal data. MARS re-centers activations, tweaks steering strength within a trust zone, and picks the best intervention layer. All of this magic happens with the first [token](/glossary/token) generated, saving time and resources.
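The three ingredients named above can be sketched as follows. This is not the MARS algorithm itself, whose formulas the article does not give. It is a toy illustration under loud assumptions: re-centering is modeled as shifting an activation from a multimodal mean to a text mean, the "trust zone" is modeled as a simple cap on steering strength, and layer selection is modeled as picking the layer with the highest separation score. Every function name, parameter, and number here is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recenter(activation, modality_mean, text_mean):
    """Assumed re-centering: move the activation from the multimodal mean to the text mean."""
    return [h - m + t for h, m, t in zip(activation, modality_mean, text_mean)]


def bounded_strength(activation, direction, target_projection, max_strength):
    """Assumed trust zone: steer only as far as needed, never beyond max_strength."""
    gap = target_projection - dot(activation, direction)
    return max(0.0, min(gap, max_strength))


def pick_layer(separation_by_layer):
    """Assumed layer choice: the layer whose (hypothetical) separation score is highest."""
    return max(separation_by_layer, key=separation_by_layer.get)


def steer_once(activation, direction, modality_mean, text_mean,
               target_projection, max_strength):
    """Re-center, choose a bounded strength, then shift along a unit-norm direction."""
    centered = recenter(activation, modality_mean, text_mean)
    alpha = bounded_strength(centered, direction, target_projection, max_strength)
    return [h + alpha * d for h, d in zip(centered, direction)], alpha


direction = [1.0, 0.0]  # assumed unit-norm refusal direction from text prompts
layer = pick_layer({10: 0.2, 14: 0.9, 18: 0.5})

steered, alpha = steer_once(
    activation=[3.0, 1.0],
    direction=direction,
    modality_mean=[2.0, 0.0],
    text_mean=[0.0, 0.0],
    target_projection=5.0,
    max_strength=2.0,
)
print(layer, alpha, steered)
```

The cap is the point of the sketch: the gap to the target projection is 4.0, but the step is limited to 2.0, so an input is never pushed arbitrarily far along the refusal direction. An activation that already meets the target gets a strength of zero and is left as it was after re-centering.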

## Why MARS Matters

So, why should you care about MARS? Evaluations across five state-of-the-art MLLMs show that MARS isn't just a theoretical exercise. It significantly boosts safety while keeping utility intact. This isn't just a technical curiosity. It's a big deal. It suggests that safety structures exist across different modalities and that textual refusal directions are a gold mine for aligning MLLMs.

Here's the kicker: if textual strategies can generalize across modalities, why haven't more researchers jumped on this bandwagon sooner? It's a question worth pondering. The answer could redefine how we approach safety in AI, making it more accessible and less dependent on hard-to-get data.

## Looking Ahead

The implications of MARS reach far beyond just improving safety. They suggest a future where building reliable AI doesn't require compromising on safety or getting bogged down by the grind of data collection. This is a blueprint for smarter AI development. In AI, where safety is often at odds with utility, MARS might just be the hero we didn't know we needed.

The bottom line? MARS is a step in the right direction, proving that we can have our AI cake and eat it too. It's high time we rethought our approach to [AI safety](/glossary/ai-safety) with innovation like this leading the charge.

