Show HN: wavecat – a fully local personal agent that watches your screen

Samuel Yuan, a developer from MIT, released wavecat, a fully local personal AI agent that continuously watches a user's screen to understand their activity and goals. All processing and data storage happen on-device, so no screen data is sent to the cloud. The free app runs its models locally using llama.cpp and Qwen3.6 35B A3B. It needs at least 24 GB of unified memory on Apple Silicon Macs; on Windows/Linux, a dedicated GPU with at least 12 GB of VRAM (or a unified-memory device with at least 24 GB of RAM) is recommended. The goal is to anticipate user needs before they ask.

A fully local personal agent that watches your screen

A super cool project developed by Samuel Yuan https://www.mit.edu/~sdkyuan/

No data centers; private and entirely on your computer.

wavecat constantly watches your screen to understand you. All models run locally, so no personal data ever leaves your device. Hopefully the future of personal AI is local. It's free. The context is always there. And privacy and sovereignty are always ensured.

Usage

Using wavecat is as easy as installing it. Once the app is installed, you will be guided through the vision and language model installation process. These models take roughly 19 GB of disk space even when heavily quantized, since they contain billions of parameters. You will also be guided to allow wavecat to view your screen. With this, wavecat will develop a rich understanding of your activity and goals. Ideally, it will be able to anticipate your needs before you even ask.

Again, don't worry: no personal data will leave your device. All data is stored locally and all processing is done locally on your device. wavecat will never send any of your personal screen data to the cloud; you can even turn off your internet and wavecat will still work.

Hardware Requirements

For Mac users, at least 24 GB of unified memory is necessary to run the models smoothly in the background, with 32+ GB recommended. wavecat only supports Apple Silicon Macs.

For Windows and Linux users, wavecat supports Vulkan, plus CUDA on Windows. A dedicated GPU with at least 12 GB of VRAM, or a unified-memory device with at least 24 GB of RAM, is recommended. More is always better.

While no hardware requirement is strictly enforced, you will not have an enjoyable time with wavecat unless your device meets these requirements.
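The hardware guidance above can be read as a simple yes/no check. The following is a minimal sketch of that reading, not wavecat's own logic (the app does not enforce any requirement). The `Machine` class and `meets_guidance` function are hypothetical names introduced for illustration, and the memory figures must be supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical description of a computer; all values are entered manually."""
    os: str                          # "mac", "windows", or "linux"
    apple_silicon: bool = False      # only meaningful when os == "mac"
    unified_memory_gb: float = 0.0   # 0 if the device has no unified memory
    dedicated_vram_gb: float = 0.0   # 0 if there is no dedicated GPU


def meets_guidance(m: Machine) -> bool:
    """Return True if the machine meets the minimum figures stated in the text."""
    if m.os == "mac":
        # Apple Silicon only, with at least 24 GB of unified memory.
        return m.apple_silicon and m.unified_memory_gb >= 24
    if m.os in ("windows", "linux"):
        # Either a dedicated GPU with >= 12 GB of VRAM,
        # or a unified-memory device with >= 24 GB of RAM.
        return m.dedicated_vram_gb >= 12 or m.unified_memory_gb >= 24
    return False
```

For example, `meets_guidance(Machine("linux", dedicated_vram_gb=12))` returns `True`, while an Intel Mac returns `False` regardless of how much memory it has.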
Hopefully, though, model improvements, hardware advances, and inference system optimizations will allow local personal agents to run on much less expensive hardware in the near future.

Technical Details

wavecat uses llama.cpp (https://github.com/ggml-org/llama.cpp) as the primary backend inference engine. It's great. Qwen3.6 35B A3B serves as the primary language model "engine." A way to connect your own more powerful open-source model as a backend is currently in development at https://github.com/sdkyuanpanda/wavecat-sdk.

On an M5 Pro with 48 GB of RAM, wavecat runs comfortably at roughly 70-90 tok/s depending on the task. Better speculative decoding methods, first-class MLX support, and other techniques rolled out in later updates will hopefully allow wavecat to run even faster soon.

Misc

Right now, English is the only language supported by wavecat. While you can interact with wavecat in other languages, performance will likely be impaired.

Integrations with other apps and tools are in development and should be available soon. I'll also be working on vastly improving the SDK so you can add your own plugins to wavecat.

If you have any questions, please read the FAQ (/faq). If you have any unanswered questions, feedback, or advice, please feel free to email me at sdkyuan at mit dot edu.
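As a rough sanity check on the figures quoted earlier (about 19 GB of disk space and 70-90 tok/s), here is some back-of-the-envelope arithmetic. Everything beyond those two figures is an assumption: the "35B" in the model name is read as roughly 35 billion total parameters, 4 bits per weight stands in for "heavily quantized", and the 500-token reply length is arbitrary. The actual quantization, the size of the vision model, and file overhead are not stated in the post.

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB,
    simplifies to params_billions * bits / 8.
    """
    return params_billions * bits_per_weight / 8


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Assumed: ~35B parameters at ~4 bits each (not confirmed by the post).
    print(f"language model weights: ~{quantized_size_gb(35, 4):.1f} GB")

    # Quoted range on an M5 Pro: roughly 70-90 tok/s.
    slow = generation_seconds(500, 70)
    fast = generation_seconds(500, 90)
    print(f"500-token reply: ~{fast:.1f} to {slow:.1f} s")
```

Under these assumptions the language model alone comes to about 17.5 GB, which is in the same range as the quoted total of roughly 19 GB for the vision and language models together.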