Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

Alibaba’s Qwen team released Qwen3.7-Plus, a multimodal large language model that understands images and video alongside text, now available on Alibaba Cloud’s Bailian platform. The model adds deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration, positioning it for agentic tasks that require planning and execution across multiple steps. The release follows Alibaba’s May unveiling of the Qwen3.7 generation and marks a shift toward multimodal hybrid agent technology for long-running, action-oriented workflows.

Alibaba’s Qwen team has released Qwen3.7-Plus (https://qwen.ai/blog?id=qwen3.7-plus). The model is now available through Alibaba Cloud’s Bailian platform, which international users access as the Model Studio console and which offers API services to external developers. The release follows Alibaba’s May unveiling of the Qwen3.7 generation.

Qwen3.7-Plus (https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc /doc/?type=model&url=2840914 2&modelId=qwen3.7-plus&serviceSite=international) is a multimodal large language model. It understands images and video alongside written prompts. Its sibling, Qwen3.7-Max, is text-only. This is visual understanding, not generation: the model reads images and video but does not create them. Alibaba’s image and video generation work sits in separate model families.

The Alibaba team describes the release as a step in multimodal hybrid agent technology. An agent is a model that plans and acts across steps. Building on image and video understanding, Qwen3.7-Plus adds five abilities:

- Deep reasoning: the model works through problems step by step.
- Self-programming: the model writes and revises its own code.
- Tool invocation: the model calls external functions or APIs.
- Verification and testing: the model runs outputs and checks results.
- Autonomous iteration: the model loops until the task is done.

Together, they describe a model built to act, not just answer.

The Vision Case

Qwen3.7-Plus is the multimodal half of the 3.7 family, and its preview has already posted measurable vision results. In Vision Arena, Qwen3.7-Plus-Preview ranked 16th overall, which placed Alibaba as the No. 5 lab in vision. The model rank and the lab rank are separate figures. Vision Arena is a neutral leaderboard run by LM Arena, where users vote on image-understanding answers in blind matchups. The 16th-place result sits behind the top US labs but inside the field. For image-heavy work, this is the signal that matters.

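
To see how a model rank and a lab rank can differ, here is a minimal Python sketch. It assumes a lab is ranked by its single best model, which is one plausible convention and not a documented LM Arena rule. The `Entry` type, the model and lab names, and the scores are all invented for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    lab: str
    score: float


def model_ranks(entries: list[Entry]) -> dict[str, int]:
    """Rank every model by score, best first (1-based)."""
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return {e.model: i for i, e in enumerate(ordered, start=1)}


def lab_ranks(entries: list[Entry]) -> dict[str, int]:
    """Rank labs by the score of their best model (assumed convention)."""
    best: dict[str, float] = {}
    for e in entries:
        best[e.lab] = max(best.get(e.lab, float("-inf")), e.score)
    ordered = sorted(best, key=lambda lab: best[lab], reverse=True)
    return {lab: i for i, lab in enumerate(ordered, start=1)}


# Invented leaderboard: two labs field two strong models each, one lab fields one.
BOARD = [
    Entry("a-large", "Lab A", 1300.0),
    Entry("a-medium", "Lab A", 1280.0),
    Entry("b-large", "Lab B", 1290.0),
    Entry("b-medium", "Lab B", 1270.0),
    Entry("c-vision", "Lab C", 1260.0),
]

if __name__ == "__main__":
    print(model_ranks(BOARD)["c-vision"])  # position among all models
    print(lab_ranks(BOARD)["Lab C"])       # position among labs
```

In this toy board, Lab C's only model sits fifth among models while the lab sits third among labs, because the labs ahead of it each occupy more than one model slot. The same effect lets a model place 16th while its lab places fifth.
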
Image-heavy work here means OCR at scale, chart reading, or video-frame analysis.

The text-only Max sibling anchors the generation’s reasoning. Max scored 56.6 on the Artificial Analysis Intelligence Index, the highest placement for a Chinese model at release.

The Agentic Loop

The clear shift in Qwen3.7 is its agentic focus. The Alibaba team is positioning the models for long-running tasks. Bailian, the host platform, adds two relevant pieces. The first is an agentic reinforcement learning (Agentic RL) mechanism, which uses real-world execution feedback to refine model accuracy over time. The second is a set of built-in safety guardrails that keep autonomous tools inside preset operational limits. That detail matters when an agent runs commands or edits files.

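
The loop of drafting code, calling a tool, verifying the result, and iterating within guardrails can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Qwen3.7-Plus or Bailian code: `stub_model` stands in for a real model call, and `ALLOWED_TOOLS`, `MAX_STEPS`, `Action`, and `run_tests` are hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical guardrail settings; not Bailian's actual configuration.
ALLOWED_TOOLS = {"run_tests"}
MAX_STEPS = 5


@dataclass
class Action:
    tool: str     # name of the tool the model wants to call
    payload: str  # argument for the tool (here: candidate source code)


def run_tests(source: str) -> bool:
    """Tool: execute candidate code and check it against one fixed test."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # defines the candidate function
        return namespace["add"](2, 3) == 5
    except Exception:
        return False


TOOLS: dict[str, Callable[[str], bool]] = {"run_tests": run_tests}


def stub_model(attempt: int) -> Action:
    """Stand-in for a model call: the first draft is buggy, the revision is not."""
    drafts = [
        "def add(a, b):\n    return a - b\n",
        "def add(a, b):\n    return a + b\n",
    ]
    return Action("run_tests", drafts[min(attempt, len(drafts) - 1)])


def agent_loop(model: Callable[[int], Action]) -> tuple[bool, int]:
    """Iterate until verification passes or the step limit is reached."""
    for step in range(MAX_STEPS):
        action = model(step)                  # self-programming: draft or revise
        if action.tool not in ALLOWED_TOOLS:  # guardrail: block unlisted tools
            raise PermissionError(f"tool not allowed: {action.tool}")
        passed = TOOLS[action.tool](action.payload)  # tool call and verification
        if passed:
            return True, step + 1
    return False, MAX_STEPS


if __name__ == "__main__":
    print(agent_loop(stub_model))
```

The step limit and the tool allowlist are the two guardrails in this sketch. A request for any tool outside the allowlist raises `PermissionError` before anything runs, which is the kind of preset limit that matters once an agent can run commands or edit files.
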
What we know today

Confirmed:
- Image and video understanding
- Agentic feature set
- Bailian API access
- Proprietary, API-only

Not yet published:
- Public price sheet
- Context window size
- Output token limits
- Open weights

The practical read

- A vision-capable agent backend through one API.
- Suits workloads mixing images, video, and tool use.
- A leaderboard rank shows promise, not a guarantee.
- Validate accuracy on your own data before committing.

Key Takeaways

- Alibaba released Qwen3.7-Plus, a multimodal model now available via API on its Bailian platform (Model Studio).
- It understands images and video as input (understanding, not generation) and adds agentic features.
- Capabilities include deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration.
- Its preview ranked 16th in Vision Arena, making Alibaba the No. 5 lab in vision.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova.
With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.