End-to-end model that listens, sees, thinks, and responds on video in real time

Alibaba unveiled Wan Streamer, an AI agent that can see, hear, and respond to users over video in real time, marking a significant advance beyond voice-only AI systems.

Min Choi (@minchoi), 3:25 AM · Jun 26, 2026 · 371K Views:
"We are cooked. China's Alibaba just revealed Wan Streamer. AI agents can now see you, hear you, and talk back on video in real time. This is not voice mode anymore 🤯"