Show HN: AssistantAI – Real-Time Conversation Hints and Screenshot Analysis

A developer released AssistantAI, an open-source desktop AI assistant that captures audio from apps like Zoom and Teams, generates context-aware responses using attached documents, and analyzes screenshots via a web interface. Built with Python, it uses ScreenCaptureKit, Silero VAD, and whisper.cpp for low-latency speech processing.

A real-time desktop AI assistant. AssistantAI listens to a selected application (Zoom, Teams, Telegram, browsers, and other apps), detects spoken conversations, generates context-aware responses using an attached document, and can analyze screenshots through its built-in web interface.

Built as a personal engineering project to explore low-latency desktop AI, speech processing, context-aware conversations, and screenshot analysis.

- Application-specific audio capture using ScreenCaptureKit
- Voice activity detection (Silero VAD)
- Local speech recognition with whisper.cpp
- Context-aware conversations using an attached document
- Screenshot capture and AI analysis
- Local conversation history
- Remote web interface

The current implementation is written in Python. Useful starting points:

- Python setup: bash python/run/setup python env.sh
- Python docs: python/README.md
- Audio capture research: docs/modules/audio-capture-research.md

Additional implementation details, installation instructions, and OpenAI configuration are available in python/README.md.

Future versions may include additional components and languages as the project evolves.

- Additional LLM backends (Ollama, llama.cpp)
- Additional document formats
- Improved screenshot analysis
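To make the flow described above more concrete (captured audio, voice activity detection, transcription, then a response grounded in an attached document), here is a minimal sketch in plain Python. It is not the project's code, and every name in it (`is_speech`, `segment_speech`, `build_prompt`, `run_pipeline`) is hypothetical. A simple amplitude threshold stands in for Silero VAD, and a caller-supplied `transcribe` callback stands in for whisper.cpp. Audio capture and the LLM call are left out entirely, so the sketch stops at the prompt that would be sent to a model.

```python
from typing import Callable, Iterable, Iterator, List

# One short chunk of mono samples in [-1.0, 1.0] (assumed format).
Frame = List[float]


def is_speech(frame: Frame, threshold: float = 0.02) -> bool:
    """Stand-in for a VAD model: mean absolute amplitude above a threshold."""
    if not frame:
        return False
    return sum(abs(s) for s in frame) / len(frame) > threshold


def segment_speech(frames: Iterable[Frame], max_silence: int = 2) -> Iterator[List[Frame]]:
    """Group speech frames into utterances.

    Pauses of up to `max_silence` silent frames are bridged (the silent
    frames themselves are dropped); a longer pause closes the utterance.
    """
    current: List[Frame] = []
    silent_run = 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silent_run = 0
        elif current:
            silent_run += 1
            if silent_run > max_silence:
                yield current
                current, silent_run = [], 0
    if current:
        yield current


def build_prompt(document: str, history: List[str], utterance: str,
                 max_history: int = 6) -> str:
    """Combine the attached document, recent history, and the new utterance."""
    recent = history[-max_history:] if max_history > 0 else []
    parts = [
        "Use the attached document to suggest a reply.",
        "Document:\n" + document.strip(),
    ]
    if recent:
        parts.append("Conversation so far:\n" + "\n".join(recent))
    parts.append("Latest utterance:\n" + utterance.strip())
    return "\n\n".join(parts)


def run_pipeline(frames: Iterable[Frame],
                 transcribe: Callable[[List[Frame]], str],
                 document: str) -> List[str]:
    """Return one prompt per detected utterance, keeping history in memory."""
    history: List[str] = []
    prompts: List[str] = []
    for utterance_frames in segment_speech(frames):
        text = transcribe(utterance_frames)  # hypothetical speech-to-text hook
        if not text:
            continue
        prompts.append(build_prompt(document, history, text))
        history.append(text)
    return prompts


if __name__ == "__main__":
    loud, quiet = [0.5, -0.5] * 2, [0.0] * 4
    audio = [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, loud, quiet]
    fake_transcribe = lambda fs: f"utterance with {len(fs)} frames"
    for prompt in run_pipeline(audio, fake_transcribe, "Meeting agenda: Q3 roadmap."):
        print(prompt, end="\n---\n")
```

Passing `transcribe` in as a callback keeps the segmentation and prompt logic testable without a speech model. A real setup would presumably swap in the VAD and recognizer named in the feature list and send each prompt to the configured LLM backend.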