GELab-Zero: Android automation framework for multimodal LLMs

GELab-Zero, an open-source Android automation framework for multimodal LLMs, has been released, featuring a 4B GUI agent model and plug-and-play engineering infrastructure with no cloud dependencies. The framework supports local deployment, one-click launch, task distribution, and three agent modes, aiming to accelerate GUI Agent innovation and application deployment.

👋 Hi, everyone! We are proud to present the first fully open-source GUI Agent with both model and infrastructure. Our solution features plug-and-play engineering with no cloud dependencies, giving you complete privacy control.

- 🎁 Coming soon...
- 🎁 2025-12-12: MCP server is ready. To enable it, run: python mcp_server/detailed_gelab_mcp_server.py
- 🎁 2025-12: We thank the following projects and authors for providing quantization tools and tutorials: GGUF v1 (https://huggingface.co/bartowski/stepfun-ai_GELab-Zero-4B-preview-GGUF), GGUF v2 (https://huggingface.co/noctrex/GELab-Zero-4B-preview-GGUF), EXL3 (https://huggingface.co/ArtusDev/stepfun-ai_GELab-Zero-4B-preview-EXL3), Tutorials CN (http://xhslink.com/o/1WrmgHGWFYh), Tutorials EN (https://www.youtube.com/watch?v=4BMiDyQOpos).
- 🎁 2025-11: We released a lightweight 4B model on Hugging Face and ModelScope.
- 🎁 2025-11: We released the tasks from the AndroidDaily benchmark.
- 🎁 2025-11: We released the current GELab-Zero engineering infrastructure.
- 🎁 2025-10: Our research paper on GELab-Engine (https://github.com/summoneryhl/gelab-engine) was accepted by NeurIPS 2025.

Contents: Background · Application Demonstrations · AndroidDaily · Open Benchmark · Installation & Quick Start · Citation · Contact

Background

As AI experiences continue to penetrate consumer-grade terminal devices, mobile Agent research is at a critical juncture, transitioning from "feasibility verification" to "large-scale application." GUI-based solutions have emerged as the optimal approach at the current stage for addressing complex mobile ecosystems and achieving scalable Agent capabilities, thanks to their universal compatibility with all apps and zero-cost integration that requires no adaptation by app vendors.
However, because mobile application ecosystems are highly fragmented, getting GUI Agents to truly work across different brands and device models involves numerous engineering challenges: multi-device ADB connections, dependency installation, permission configuration, inference service deployment, and task recording and replay. Agent developers and MCP users therefore have to handle a substantial amount of engineering infrastructure work, which makes it difficult to focus on strategic innovation.

To address this challenge, we are open-sourcing GELab-Zero to accelerate the innovation and application deployment of GUI Agents. It consists of two main components:

- A plug-and-play, complete inference engineering infrastructure that handles all the heavy lifting
- A 4B GUI Agent model capable of running on a local computer

It provides a one-click launch experience similar to open-source GUI Agent MCP, can be deployed entirely locally, and puts the entire inference pipeline under your complete control. Specific capabilities include:

- Local Deployment: Supports 4B-scale models running on consumer-grade hardware, balancing low latency with privacy.
- One-click Launch: Provides a unified deployment pipeline that automatically handles environment dependencies and device management.
- Task Distribution: Can distribute tasks to multiple phones while recording interaction trajectories for observability and reproducibility.
- Three Agent Modes: Covers multiple working modes, including ReAct loops, multi-agent collaboration, and scheduled tasks.

These capabilities enable GELab-Zero to flexibly handle complex task flows in real-world scenarios and provide a solid foundation for future extensions. For Agent developers, the infrastructure enables rapid testing of new ideas, strategies, and interaction approaches; for enterprise users, it can be reused directly to quickly integrate MCP capabilities into their products.
Application Demonstrations

- Task: Help me find any good recent sci-fi movies
- Task: Help me find a place where I can take my kids on the weekend
- Task: Claim meal vouchers on the enterprise welfare platform
- Task: Check if Metro Line 1 is operating normally, then navigate to the nearest entrance of a Line 1 metro station
- Task: Go to the nearest Hema Fresh Store on Ele.me and purchase: Red strawberries 300g, Peruvian Bianca blueberries 125g (18mm diameter), seasonal fresh yellow potatoes 500g, sweet baby pumpkin 750g, Hema large grain shrimp sliders, 2 bottles of Hema pure black soy milk 300ml, Little Prince macadamia nut cocoa crisp 120g, Hema spinach noodles, Hema five-spice beef, 5 bags of Haohuan snail Liuzhou river snail rice noodles (extra spicy, extra smelly) 400g, m&m's milk chocolate beans 100g
- Task: Search for 'how to learn financial management' on Zhihu and view the first answer with over 10k likes
- Task: Find a pair of white canvas shoes in size 37 on Taobao, priced under 100 yuan, then add the first item that meets the criteria to favorites
- Task: Go to Baicizhan and help me complete the vocabulary learning task

AndroidDaily

Current mainstream benchmarks mostly focus on productivity applications (such as email), but users' daily high-frequency usage is dominated by lifestyle service applications (such as food delivery, ride-hailing, social media, and payments), and these scenarios better reflect the practical value of current GUI Agents.

To this end, we propose AndroidDaily: a multi-dimensional dynamic benchmark for the real world. We focus on empirical analysis of six core dimensions of modern life (food, transportation, shopping, housing, information consumption, entertainment), prioritizing the popular applications that dominate these categories. As a result, the tasks in the benchmark are characterized by real-world interaction outcomes (such as transaction payments and service bookings) and tight online-offline integration.
To balance evaluation comprehensiveness and execution efficiency, AndroidDaily adopts two evaluation modes: static testing and an end-to-end benchmark.

Static testing contains 3146 actions. It provides task descriptions and step-by-step screenshots and requires the Agent to predict the action type and action value (such as click coordinates or input text) for each step, primarily evaluating numerical accuracy. This method requires no complex engineering infrastructure and enables rapid, cost-effective, large-scale model iteration and testing.

The action type distribution in static testing is as follows (3146 actions in total):

- CLICK: 1354 times - Click operations
- COMPLETE: 410 times - Task completion
- AWAKE: 528 times - App activation
- TYPE: 371 times - Text input
- INFO: 305 times - Information query
- WAIT: 85 times - Wait operations
- SLIDE: 93 times - Slide operations

| Model | Accuracy |
|---|---|
| GPT-4o | 0.196 |
| Gemini-2.5-pro-thinking | 0.366 |
| UI-TARS-1.5 | 0.470 |
| GELab-Zero-4B-preview | 0.734 |

The end-to-end benchmark contains 235 tasks. It is conducted in a fully functional test environment (such as real devices or emulators), where the Agent must autonomously execute each task from start to finish; the evaluation metric is the overall task success rate. This setup has the highest ecological validity and reflects the Agent's comprehensive capabilities in complex environments.

The scenario distribution in the end-to-end benchmark is as follows:

- Transportation: 78 tasks (33.19%) - Ride-hailing, navigation, public transit, etc.
- Shopping: 61 tasks (25.96%) - E-commerce shopping, payment, order management, etc.
- Social Communication: 43 tasks (18.3%) - Messaging, social interactions, etc.
- Content Consumption: 37 tasks (15.74%) - News reading, video watching, content bookmarking, etc.
- Local Services: 16 tasks (6.81%) - Food delivery, on-site services, etc.

Typical tasks include ride-hailing, shopping, message sending, content bookmarking, and food delivery ordering.
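As a quick sanity check on the figures above, the following sketch recomputes the totals and each category's share from the reported counts. Only the counts come from the text; the dictionary names and the `shares` helper are illustrative and are not part of GELab-Zero.

```python
# Counts copied from the AndroidDaily description above.
# The names STATIC_ACTIONS, E2E_TASKS and shares() are illustrative only.
STATIC_ACTIONS = {
    "CLICK": 1354, "COMPLETE": 410, "AWAKE": 528, "TYPE": 371,
    "INFO": 305, "WAIT": 85, "SLIDE": 93,
}
E2E_TASKS = {
    "Transportation": 78, "Shopping": 61, "Social Communication": 43,
    "Content Consumption": 37, "Local Services": 16,
}


def shares(counts):
    """Return each category's percentage of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    # Totals should match the 3146 actions and 235 tasks stated in the text.
    print(sum(STATIC_ACTIONS.values()), sum(E2E_TASKS.values()))
    for name, pct in shares(E2E_TASKS).items():
        print(f"{name}: {pct}%")
```

The same helper can be applied to `STATIC_ACTIONS` to see how strongly the static test set is weighted toward click operations.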
GELab-Zero-4B-preview achieves a 75.86% success rate on AndroidWorld testing, demonstrating excellent performance on complex mobile tasks.

Open Benchmark

We conducted comprehensive evaluations of the GELab-Zero-4B-preview model across multiple open-source benchmarks, covering dimensions including GUI understanding, localization, and interaction. The comparison results with other open-source models are shown below:

The benchmark results demonstrate that GELab-Zero-4B-preview performs strongly across multiple open-source benchmarks, with particularly outstanding results in real mobile scenarios (AndroidWorld), confirming its capabilities in practical applications.

Installation & Quick Start

End-to-end inference requires just a few simple steps:

- Set up the LLM inference environment (ollama or vllm)
- Set up the Android device execution environment (adb configuration and developer mode)
- Set up the Agent runtime environment (gelab-zero one-click deployment script)
- Set up the trajectory visualization environment (optional)

The third-party infrastructure dependencies mentioned above are very mature, so there is no need to worry.

We assume you have a Python 3.12+ environment installed and some familiarity with the command line. If you have not installed Python yet, follow Step 0 below.

Step 0: If you have not installed a Python 3.12+ environment yet, use the following steps. For commercial friendliness and cross-platform support, we recommend miniforge for installing and managing Python environments. Official website: https://github.com/conda-forge/miniforge

Windows users (PowerShell is required):

- Download and install Miniforge manually. Refer to the Install section at https://github.com/conda-forge/miniforge . During installation, be sure to check the option that adds Conda to the PATH environment variable so that Conda can be activated properly.
- After installation, activate Conda. Open PowerShell and enter the following commands:

    # Activate Conda in PowerShell
    conda init powershell
    # Allow Conda scripts to run on PowerShell startup
    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

  Successful activation is indicated by "(base)" displayed at the beginning of the latest line in the terminal.
- We recommend VS Code for running and debugging code. Download and install it from the official website: https://code.visualstudio.com/

Mac and Linux users:

- Download and install miniforge using the command line:

    curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
    bash Miniforge3-$(uname)-$(uname -m).sh

After installation, create and activate a new Python environment:

    conda create -n gelab-zero python=3.12 -y
    conda activate gelab-zero

We have verified two mainstream methods for deploying local LLM inference: ollama and vllm. We recommend ollama for individual users, while enterprise users and those with a stronger technical background can choose vllm for a more stable inference service.

For individual users running local inference, we strongly recommend Ollama, which is simple to install and easy to use.

- Windows and Mac users: Download and install the graphical version directly from the official website: https://ollama.com/ .
- Linux users: Refer to the official documentation for installation: https://ollama.com/download/linux .
The one-click installation command for Linux users is as follows:

    # Download and install the latest Linux version of Ollama
    curl -fsSL https://ollama.com/install.sh | sh

After installing Ollama, download and deploy the gelab-zero-4b-preview model using the following commands:

    # If the huggingface CLI is not installed yet, execute this command first
    pip install huggingface_hub

    # If the download speed is slow in China, you can try the mirror "https://hf-mirror.com"
    # Windows users can use the following command:
    $env:HF_ENDPOINT = "https://hf-mirror.com"
    # Linux and Mac users can use the following command:
    export HF_ENDPOINT="https://hf-mirror.com"

    # Download the gelab-zero-4b-preview model weights from Hugging Face
    hf download --no-force-download stepfun-ai/GELab-Zero-4B-preview --local-dir gelab-zero-4b-preview

    # Import the model into ollama
    cd gelab-zero-4b-preview
    ollama create gelab-zero-4b-preview -f Modelfile

    # If Windows users encounter an error, they need to specify the installation path, for example:
    C:\Users\admin\AppData\Local\Programs\Ollama\ollama.exe create gelab-zero-4b-preview -f Modelfile

If your computer has limited resources, you may consider quantizing the model to improve inference speed. Note that quantization may cause some loss of model performance. For detailed documentation, see: https://docs.ollama.com/import#quantizing-a-model

    # Quantize the model with int8 precision (small precision loss, model size becomes 4.4G):
    ollama create -q q8_0 gelab-zero-4b-preview
    # Quantize the model with int4 precision (large precision loss, model size becomes 2.2G):
    ollama create -q Q4_K_M gelab-zero-4b-preview
    # Revert to the original precision:
    ollama create -q f16 gelab-zero-4b-preview

To test the installation:

- Windows users: Open the Ollama app, select the model gelab-zero-4b-preview, and send a message to check whether the model replies correctly.
- Mac and Linux users: You can test whether the model is installed successfully using the following command:

    curl -X POST http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gelab-zero-4b-preview",
        "messages": [{"role": "user", "content": "Hello, GELab-Zero!"}]
      }'

The expected output should include the model's reply, indicating that the model has been installed successfully and is running. For example:

    {"id":"chatcmpl-174","object":"chat.completion","created":1764405566,"model":"gelab-zero-4b-preview","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! I'm here to help with any questions or information you might need. How can I assist you today?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":24,"total_tokens":40}}

Once the steps above succeed, your ollama environment and the gelab-zero-4b-preview model are installed, and you can proceed to configuring the mobile execution environment.

To enable GELab-Zero to control the phone for task execution, complete the following steps to configure the mobile execution environment:

- Enable developer mode and USB debugging on the phone.
- Install the ADB tool and ensure that the computer can connect to the phone via ADB. (If you have already installed adb, you can skip this step.)
- Connect the phone to the computer via a USB cable and use the adb devices command to confirm a successful connection.

Generally, you can enable developer mode and USB debugging on Android phones by following these steps:

- Go to the "Settings" app on your phone.
- Find the "About Phone" or "System" option, and tap "Build Number" 10+ times until you see a message saying "You are now a developer."
- Go back to the main "Settings" menu and find "Developer Options." (Important: must be enabled.)
- In "Developer Options," find and enable the "USB Debugging" feature, following the on-screen instructions. (Important: must be enabled.)

Different phone brands may vary slightly, so adjust according to your specific situation. Generally, searching for your phone brand followed by "how to enable developer mode" will yield relevant tutorials. After completing the setup, it should look like the image below:

ADB (Android Debug Bridge) is a bridge tool for communication between Android devices and computers. You can install the ADB tool by following these steps:

Windows users:

- Download the ADB tool package from https://dl.google.com/android/repository/platform-tools-latest-windows.zip and extract it to a suitable location.
- Add the extracted folder path to the system environment variables so that you can use the adb command directly in the command line. For detailed steps, see: https://learn.microsoft.com/en-us/previous-versions/office/developer/sharepoint-2010/ee537574(v=office.14) . The specific steps are:
  1. Right-click "Computer" in the "Start" menu and select "Properties."
  2. Click "Advanced system settings."
  3. In the "System Properties" dialog box, click the "Environment Variables" button.
  4. In the "System variables" section, find and select the "Path" variable, then click the "Edit" button.
  5. In the "Edit Environment Variables" dialog box, click "New," and then enter the extracted path of the ADB tool package.
  6. Click "OK" to save the changes and close all dialog boxes.

Mac and Linux users:

- You can install the ADB tool using Homebrew (Mac) or your package manager (Linux).
  If you don't have Homebrew installed, install it first with the command:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

- Then use the following command to install the ADB tool (recent Homebrew versions no longer accept the older "brew cask install" form):

    brew install --cask android-platform-tools

After connecting your phone to the computer with a USB cable, open a terminal or command prompt and run:

    adb devices

If the connection is successful, you will see output similar to the following, listing the connected devices:

    List of devices attached
    AN2CVB4C28000731	device

If you do not see any devices, check the USB cable and confirm that USB debugging is enabled on your phone. When the phone is connected for the first time, an authorization prompt may pop up on the phone; simply select "Allow." As shown in the image below:

If the installation is unsuccessful, you can refer to third-party documentation for further troubleshooting: quickappcn/issues#120 (https://github.com/quickappcn/issues/issues/120).

After completing the above steps, you can deploy the GELab-Zero runtime environment with the following commands:

    # Clone the repository
    git clone https://github.com/stepfun-ai/gelab-zero
    cd gelab-zero

    # Install dependencies
    pip install -r requirements.txt

    # Run inference on a single task
    python examples/run_single_task_state_compress.py

By default, the trajectory is saved in the running_log/server_log/os-copilot-local-eval-logs/ directory. You can visualize the trajectory using streamlit:

    # If you want other devices on the local area network (LAN) to access it, use --server.address 0.0.0.0
    streamlit run --server.address 0.0.0.0 visualization/pages/main_page.py --server.port 33503

    # If you only want to access it on the local machine, use the following command:
    streamlit run --server.address 127.0.0.1 visualization/pages/main_page.py --server.port 33503

Then open your browser and go to http://localhost:33503 to access the visualization interface.
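The adb devices check from earlier in this section can also be scripted, which is handy when several phones are attached. The sketch below is a minimal illustration and not part of GELab-Zero: the function names `parse_adb_devices` and `connected_devices` are hypothetical, and it assumes adb is on your PATH and prints the usual serial-then-state listing.

```python
import subprocess


def parse_adb_devices(output):
    """Return the serials of devices whose state is exactly 'device'."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Skips the "List of devices attached" header, blank lines, and
        # entries in other states such as "unauthorized" or "offline".
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices():
    """Run `adb devices` (adb must be on PATH) and parse its output."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)


if __name__ == "__main__":
    # Parse the sample listing shown in this section instead of calling adb.
    sample = "List of devices attached\nAN2CVB4C28000731\tdevice\n"
    print(parse_adb_devices(sample))  # ['AN2CVB4C28000731']
```

A phone that appears as "unauthorized" is deliberately excluded, since it still needs the on-device "Allow" confirmation before it can accept commands.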
Each task execution generates a unique session ID, which can be used to query and visualize the corresponding trajectory in the visualization interface. Actions that involve one or more points, such as click and slide, are marked on the screenshot to make the agent's behavior easier to follow.

To deploy with llama.cpp, first make sure you have already downloaded the GELab-Zero-4B-preview model locally. Then clone the official llama.cpp repository:

    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    pip install -r requirements.txt
    # If there are dependency conflicts, create a Conda virtual environment.

Convert the model to GGUF format. Command-line arguments:

- The first path points to your locally downloaded GELab-Zero-4B-preview from Hugging Face.
- --outtype specifies the quantization precision.
- --outfile is the output filename; you can customize the path.

    # No quantization, keep full model quality
    python convert_hf_to_gguf.py /PATH/TO/gelab-zero-4b-preview --outtype f16 --verbose --outfile gelab-zero-4b-preview_f16.gguf

    # Quantized (faster but lossy; known issue: <THINK may become <THIN)
    python convert_hf_to_gguf.py /PATH/TO/gelab-zero-4b-preview --outtype q8_0 --verbose --outfile gelab-zero-4b-preview_q8_0.gguf

The INT8-quantized GGUF file is ~4.28 GB for reference.

GELab-Zero-4B-preview is a vision model, so you also need to export an mmproj file:

    # INT8 quantization for mmproj
    python convert_hf_to_gguf.py /PATH/TO/gelab-zero-4b-preview --outtype q8_0 --verbose --outfile gelab-zero-4b-preview_q8_0_mmproj.gguf --mmproj

The INT8-quantized mmproj GGUF file is ~454 MB for reference.

You can use any llama.cpp-compatible client to spin up a local API service; here we use Jan (https://github.com/janhq/jan) as an example:

- Download the Jan client from https://github.com/janhq/jan/releases and install it.
- Go to Settings → Model Provider → choose llama.cpp, then import the models.
- Select the two GGUF files you just converted.
- Back in the model UI, click Start.
- Create a chat to verify the model runs correctly.
- Once tokens are streaming normally, start the local API server: go to Settings → Local API Server, create an API key under server configuration, then launch the service.

llama.cpp's service differs slightly from Ollama's, so you must adjust the model config in the GELab-Zero Agent in two places:

- In model_config.yaml, update the port and API key (use the key you just created):

    local:
        api_base: "http://localhost:1337/v1"
        api_key: "YOUR_KEY"

- In examples/run_single_task_state_compress.py, set the model name to your local model and keep model_provider as local:

    local_model_config = {
        "task_type": "parser_0920_summary_adv_state_compress",
        "model_config": {
            "model_name": "gelab-zero",
            "model_provider": "local",
            "args": {
                "temperature": 1,
                "top_p": 0.95,
                "frequency_penalty": 0.05,
                "max_tokens": 32768,
            },
        },
        "config": {
            "enable_state_compression": True,
            "state_compression_interval": 10,
            "state_compression_recent_window": 10,
            "state_compression_max_field_items": 10,
        },
    }

Citation

If you find GELab-Zero useful for your research, please consider citing our work:

    @software{gelab_zero_2025,
      title={GELab-Zero: An Advanced Mobile Agent Inference System},
      author={GELab Team},
      year={2025},
      url={https://github.com/stepfun-ai/gelab-zero}
    }

    @inproceedings{gelab_mt_rl,
      title={GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning},
      author={Yan, Haolong and Shen, Yeqing and Huang, Xin and Wang, Jia and Tan, Kaijun and Liang, Zhixuan and Li, Hongxin and Ge, Zheng and Yoshie, Osamu and Li, Si and others},
      booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems}
    }

Contact

For questions and support, please contact: tankaijun@stepfun.com

You can also reach us by joining our WeChat group.