AVTR-1: Open-weight real-time flow-matching transformer for audio-driven avatars

AVTR-1 is a flow-matching-based autoregressive model for live dialogue. Given a portrait image and dual-stream audio, it renders lip-synced speech and active listening at 25 fps on a single GPU. Built for production deployment: model weights, TensorRT-accelerated inference, and the live-session backend, available as an API or fully self-hosted.

Release contents:

- Model weights
- Inference code
- Interactive streaming demo
- Technical report (coming soon)
- Production-ready backend (coming soon)

Requirements:

- Linux
- NVIDIA GPU (Ampere or later recommended)
- CUDA 12.x + TensorRT 10.x

Install pixi (https://prefix.dev/):

    curl -fsSL https://pixi.sh/install.sh | sh

Clone the repository and install the dependencies:

    git clone https://github.com/avaturn-live/avtr-1.git
    cd avtr-1
    pixi install

Set the local storage directory:

    export AVTR1_LOCAL_STORAGE=/path/to/avtr1_storage

All downloaded weights and built engines go here. Defaults to