Magenta RealTime 2: Open and Local Live Music Models

Google released Magenta RealTime 2, an open-weights AI model and real-time inference engine that enables live, low-latency music generation on a MacBook. The model responds instantly to MIDI, audio, and text inputs, achieving roughly 15 times lower latency than its predecessor while running on standard consumer hardware. The release includes a suite of playable instruments and an open-source library, positioning the tool as an interactive musical instrument rather than an offline music generator.

We’re excited to share Magenta RealTime 2 (MRT2), a state-of-the-art open model and efficient real-time inference engine that enables you to build and play AI musical instruments on your laptop. To get started, download the apps (/mrt2) on your MacBook (requires Apple Silicon).

Unlike other large generative music models that work offline to turn a prompt into a track, MRT2 is a live, interactive model that you can control with MIDI and audio, in addition to text. It performs low-latency on-device inference to respond to your inputs instantly. You can run it as a standalone app, drop it into your DAW, or integrate it into other music software.

In addition to the open-weights model, we are releasing a collection of playable instruments and experiences built with MRT2. Experiment with cloning sounds, blending styles, and creating live accompaniment with this low-latency music model.

To explore the potential of live music models as instruments, today we are releasing:

- Magenta RealTime 2, an open-weights model (https://huggingface.co/google/magenta-realtime-2, 2.4B parameters) capable of high-quality real-time music synthesis with low-latency real-time controls (via MIDI, text, and audio).
- Alongside our model, we release an open source Python library (https://github.com/magenta/magenta-realtime; pip install magenta-rt) offering inference via JAX (https://github.com/jax-ml/jax) / MLX (https://github.com/ml-explore/mlx) using SequenceLayers (https://github.com/google/sequence-layers).
- An inference engine (https://github.com/magenta/magenta-realtime/tree/main/core) written in C++, enabling efficient streaming audio generation on a MacBook GPU via MLX (https://github.com/ml-explore/mlx).
- A suite of example applications (https://github.com/magenta/magenta-realtime/tree/main/examples) built on the inference engine. These offer a glimpse into the creative potential of Magenta RealTime 2, and serve as references to help you get started building new instruments and software integrations.

For a decade, the Magenta team has championed a vision of AI as a tool for musicians, never a replacement. We released our first neural synthesizer, NSynth (https://magenta.withgoogle.com/nsynth), back in 2017, which put machine learning into playable hardware (https://experiments.withgoogle.com/nsynth-super). We continued creating AI instruments with projects such as DDSP (https://magenta.withgoogle.com/ddsp-vst), Piano Genie (https://magenta.withgoogle.com/pianogenie), and the first version of Magenta RealTime (https://magenta.withgoogle.com/magenta-realtime), our debut live music model (https://arxiv.org/abs/2508.04651), capable of generating and blending a wide range of musical styles. MRT2 achieves ~15x lower latency than version one, works on standard hardware, and integrates directly into DAWs, making this live model a true musical instrument.
A live music model with lower latency and expanded control

| | Magenta RealTime | Magenta RealTime 2 |
|---|---|---|
| Live music generation | ✅ | ✅ |
| Hardware required | TPU/GPU | MacBook |
| Frame size | 2 s | 40 ms |

Both MRT and MRT2 are codec language models (see the appendix section on low-latency streaming generation) operating on sequences of audio tokens from the SpectroStream (https://arxiv.org/abs/2508.05207) codec, but MRT2 achieves lower latency by performing frame-level autoregression with frame-aligned conditioning.

To enable expressive musical control, MRT2 is designed to model audio that continuously follows MIDI inputs, alongside style prompts, which can be either audio or text; prompts are embedded via MusicCoCa (https://arxiv.org/abs/2508.04651). For minimal interaction lag, both signals are injected as frame-aligned conditioning at every generation step, allowing the model to react to changes in the signal within a single frame (40 ms, plus additional sources of empirical latency; see the real-world control latency section below).

Key to this approach is the use of a causal sliding window attention mechanism, which enables continuous streaming generation while bounding memory requirements. Alongside this, learnable attention embeddings are incorporated to improve generalization to arbitrary durations and to reduce context eviction artifacts (e.g., ringing and feedback) during long-context generation.

Fast C++ inference engine via MLX

While the original Magenta RealTime required a high-power GPU or TPU, Magenta RealTime 2 brings live generation to the hardware musicians actually use. To achieve this, we built a C++ inference engine powered by MLX that allows MRT2 to run natively on Apple Silicon. Apple’s MLX framework (https://github.com/ml-explore/mlx) provides the link between Python and C++.
More specifically, we use MLX to compile the MRT2 model, implemented using the SequenceLayers library (https://arxiv.org/pdf/2507.23292), into an .mlxfn file, a model container that bundles the weights and computational graph. Our C++ inference engine loads that file and uses the MLX runtime to efficiently execute it on Apple Silicon GPUs. The inference engine handles other necessary infrastructure (model state, audio buffering / resampling, MIDI input) and can be embedded into many music application frameworks wherever C++ is supported.

MLX allows MRT2 to run on Apple Silicon (M-series): both model sizes can run offline (non-real-time) inference on any Apple Silicon Mac, while real-time streaming (generating audio faster than playback) is supported on the following devices:

| Model | Platform |
|---|---|
| Base (2.4B) | MacBook M3 Pro or higher; MacBook M2 Max or higher |
| Small (230M) | Any Apple Silicon MacBook, including MacBook Air |

A suite of example applications for musicians and developers

A key goal of Magenta RealTime 2 is to allow musicians to integrate live music models within existing software, and to help developers build custom applications. To help you get started, our codebase provides several examples (/mrt2), including standalone apps, plugins, and extensions.

What’s Next?

Our team members have been building new instruments with machine learning for nearly 10 years (https://magenta.withgoogle.com/nsynth), excitedly making unique and quirky sounds from statistical knowledge of music. With Magenta RealTime 2, AI instruments are finally starting to gain the controllability and immediacy we expect from music creation tools, but plenty remains to be explored. From even more interaction and lower control latency, to audio streaming inputs that can enable jamming and real-time audio control, we look forward to expanding the capabilities of live music models further.
Stay tuned for future updates. In the meantime, we are also excited to bring more features and example applications to MRT2 soon, including:

- Finetuning, allowing anyone to customize the model by directly training on their own data.
- Example performance tools created in collaboration with Manaswi Mishra (https://manaswimishra.com/).

In the next few days, we will also be at the Music Technology Hackathon in Boston (https://musichackspace.org/events/hackathon-boston-june-2026), where we are presenting a challenge centered around Magenta RealTime 2. We look forward to seeing what everyone will come up with.

Citation

Please cite our work as:

Magenta Team. “Magenta RealTime 2: Open & Local Live Music Models”. https://magenta.withgoogle.com/magenta-realtime-2. June 2026

@article{mrt2,
  title = {Magenta RealTime 2: Open & Local Live Music Models},
  author = {Magenta Team},
  year = {2026},
  note = {https://magenta.withgoogle.com/magenta-realtime-2}
}

Appendix: Technical Details

Low-latency streaming generation

Some background on codec language modeling. A codec language model (LM) operates on discrete sequences of tokens from a neural audio codec. Here a codec refers to a pair of functions, an encoder and a decoder, that convert audio to and from a discrete, compressed representation while minimizing distortion. More formally, the encoder is a function mapping raw stereo audio waveforms \(\mathbf{a} \in \mathbb{R}^{T f_s \times 2}\) into matrices of discrete tokens \(\mathbf{x} \in \mathbb{V}_c^{T f_k \times d_c}\), where \(T\) is the duration in seconds, \(f_s\) the audio sampling rate, \(f_k\) the token frame rate, \(\mathbb{V}_c\) the codec vocabulary, and \(d_c\) the number of tokens per frame. In this case, \(d_c\) is the “depth” of the residual vector quantization algorithm, which iteratively quantizes the continuous embedding of each audio frame. The goal of the codec LM is to model these token matrices.
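To make the shapes in the codec definition above concrete, here is a minimal sketch that computes the waveform and token matrix dimensions for a given duration. The token frame rate of 25 Hz is inferred from the 40 ms frame size quoted earlier in this post; the 48 kHz sampling rate and the depth of 16 are placeholder assumptions chosen only for illustration, and the `CodecShape` class is hypothetical rather than part of the magenta-rt library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecShape:
    """Shapes implied by the codec definition (illustrative, not the real API)."""

    sample_rate_hz: int  # f_s: ASSUMED value, not stated in this post
    frame_rate_hz: int   # f_k: 25 Hz follows from the 40 ms frame size
    depth: int           # d_c: ASSUMED value, not stated in this post
    channels: int = 2    # stereo, as in the definition of the waveform a

    def waveform_shape(self, seconds: float) -> tuple[int, int]:
        # a has shape (T * f_s, 2)
        return (round(seconds * self.sample_rate_hz), self.channels)

    def token_shape(self, seconds: float) -> tuple[int, int]:
        # x has shape (T * f_k, d_c)
        return (round(seconds * self.frame_rate_hz), self.depth)

    def frame_ms(self) -> float:
        # Duration of audio covered by one token frame, in milliseconds.
        return 1000.0 / self.frame_rate_hz


shape = CodecShape(sample_rate_hz=48_000, frame_rate_hz=25, depth=16)
print(shape.waveform_shape(2.0))  # (96000, 2)
print(shape.token_shape(2.0))     # (50, 16)
print(shape.frame_ms())           # 40.0
```

Under these assumed values, two seconds of stereo audio (96,000 samples per channel) compress to 50 frames of 16 tokens each, and each frame corresponds to 40 ms of audio, which is the granularity at which the model can react to control changes.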
For efficiency, an increasingly common approach is to adopt a hierarchical autoregressive framework (https://arxiv.org/abs/2508.04651) using a pair of Transformers: one which compresses temporal history into fixed-length embedding vectors (\(\texttt{Temporal}_\theta\)), and another which iteratively decodes tokens depth-wise given the current frame embedding (\(\texttt{Depth}_\phi\)). Assuming \(\mathbf{x_i}\) refers to the \(i\)-th frame of \(\mathbf{x}\), and \(x_i^j\) refers to its \(j\)-th token, the joint distribution over \(\mathbf{x}\) is modeled autoregressively as: \ P {\theta,\phi} \mathbf{x} = \prod {i=1}^{Tf k} \prod {j=1}^{d c} P \phi x i^j | \mathbf{x i^{