AI VTuber for Beginners/Non-Programmers: Easy to Set Up

A new open-source AI VTuber tool allows beginners and non-programmers to set up a fully local, free VTuber using Whisper for speech recognition, Ollama for LLM inference, and Chatterbox TTS, with VTube Studio integration for mouth animations and zero-shot voice cloning.

A 100% local AI VTuber for beginners and non-programmers that is 100% free to run, with instant zero-shot voice cloning. It uses VTube Studio's API to make the mouth open and close and to play animations once it is set up.

----- Readme -----

AI VTuber For Beginners/Non-Programmers, Easy To Set Up

An AI VTuber that uses Whisper for speech recognition, Ollama for LLM inference, and Chatterbox TTS in a continuous listening loop.

This was made on an AMD GPU, but the code mainly targets CPU users, so it can be used without an AMD or NVIDIA GPU.

This uses Python 3.10.11. If you don't have it as your main version, run:

    py -3.10 -m venv venv

You can check the version with python -V.

Features

- Whisper base.en model - Real-time speech-to-text in English
- Ollama llama3.2 - AI model for generating VTuber responses
- Chatterbox TTS - Text-to-speech to speak responses
- Automatic silence detection - Only records when speech is detected
- Continuous listening loop - Runs forever until Ctrl+C
- VTube Studio integration - Controls mouth expressions via the VTube Studio API

Dependencies

IMPORTANT: MAKE A VENV FIRST AND MAKE SURE YOU ARE INSIDE THE PROJECT FOLDER.

Optional RVC Voice Cloning (Advanced - Windows Build Required)

Uncomment in requirements.txt or install manually (requires C++ build tools):

    pip install torchaudio librosa onnxruntime onnx fairseq pyworld praat-parselmouth TTS edge-tts

Note: RVC voice cloning is optional. The VTuber works perfectly with just the core dependencies using Chatterbox TTS. RVC voice cloning requires C++ build tools (Visual Studio Build Tools) and can be challenging to install on Windows.
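To confirm that the interpreter inside the venv is the 3.10 series this project expects, a small check along these lines could be run. It is only a sketch: the `is_supported` helper and the `REQUIRED` constant are hypothetical names, and it assumes any 3.10.x patch release is acceptable in place of 3.10.11.

```python
import sys

# Assumption: any 3.10.x patch release works as well as 3.10.11.
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor pair matches `required`."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    found = sys.version.split()[0]
    if is_supported():
        print(f"Python {found} matches the expected 3.10 series.")
    else:
        print(f"Python {found} found; recreate the venv with: py -3.10 -m venv venv")
```

Running it with the venv activated shows whether `py -3.10 -m venv venv` picked up the intended interpreter.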
Quick Start

MAKE A VENV FIRST AND MAKE SURE YOU ARE INSIDE THE PROJECT FOLDER.

Once running, the VTuber cycles through these phases:

1. Listening phase: Waits for speech with automatic silence detection
2. Speech detection: Only starts recording after the minimum speech duration is confirmed
3. Transcribing: Uses Whisper to convert speech to text
4. AI response: Ollama generates a VTuber-appropriate response
5. Speaking: Chatterbox TTS speaks the response aloud
6. Mouth control: VTube Studio controls mouth expressions in sync with speech
7. Repeat: Returns to listening mode

Notes

- Press Ctrl+C to stop the VTuber at any time
- Ensure proper audio device permissions for microphone access
- For GPU acceleration, install PyTorch CUDA versions
- Adjust silence_threshold, silence_duration, and min_speech_duration in the code for different environments

Troubleshooting

Common Issues

"Ollama not running" error:
- Make sure Ollama is installed and running with ollama serve
- Verify the model "llama3.2" is pulled

VTube Studio connection failed:
- Ensure VTube Studio is running
- Check that VTS_PORT (default: 8001) is correct
- Make sure VTube Studio plugins are enabled

Audio permissions:
- Grant microphone permissions to this application
- On Linux: pip install pyaudio might require additional system packages

Model loading issues:
- Whisper uses "base.en" for faster performance
- Ensure all dependencies are installed from requirements.txt

Customization

Adjusting Silence Detection

Edit Aivtuber.py and modify these constants:

    silence_threshold = 0.01     # Lower = more sensitive, Higher = less sensitive
    silence_duration = 1.5       # Seconds of silence before stopping recording
    min_speech_duration = 0.5    # Minimum speech duration to trigger recording

Changing Models

- Whisper: Change in line 24: self.whisper_model = whisper.load_model("base")
- Ollama: Change in line 109: model="llama3.2"

Adding Emotions

Edit the EMOTION_HOTKEYS dictionary in the code and add hotkeys to VTube Studio:

    EMOTION_HOTKEYS = {
        "happy": "your happy hotkey id",
        "sad": "your sad hotkey id",
        "angry": "your angry hotkey id",
        "thinking": "your thinking hotkey id",
        "neutral": "your neutral hotkey id",
    }

Future Enhancements

Potential future improvements:

- Local LLM alternatives: Support for other Ollama models or local LLM implementations
- Multi-language support: Whisper language switching and response localization
- Context memory: Maintain conversation history for more coherent interactions
- Advanced emotion system: More nuanced emotion detection and expression control
- Stream processing: WebSocket streaming for lower latency
- Plugin architecture: Easy addition of new features and integrations

Questions

Is this Ali?
No, this is not Ali. In fact, Ali is a way more complicated program than this. This also doesn't use any of Ali's original code, aside from how the mouth API works and some recreated parts, like the API being used so you can play music without having issues.

Does this contain any premade VTuber models I can download?
No, but I do have older VTuber models you can use for this, for example:

- Probably not, unless you replaced Ollama with something else.

License

This project is open source. Feel free to modify and distribute as long as you give appropriate credit, since that's a really important habit to get into.
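As a rough illustration of how the three silence-detection settings from the Customization section interact, the sketch below segments a list of per-frame loudness values into utterances. It is not taken from Aivtuber.py: the `SilenceDetector` class and its `segment` method are hypothetical, and it assumes the threshold is compared against a per-frame loudness value such as RMS amplitude.

```python
from dataclasses import dataclass


@dataclass
class SilenceDetector:
    """Hypothetical helper showing how the three settings could interact."""

    silence_threshold: float = 0.01    # frames below this level count as silence
    silence_duration: float = 1.5      # seconds of silence that end a recording
    min_speech_duration: float = 0.5   # seconds of speech needed to start one

    def segment(self, levels, frame_seconds):
        """Return (start, end) frame indexes for each detected utterance.

        `levels` holds one loudness value per frame (assumed to be RMS).
        `end` is the frame at which the silence timeout fired.
        """
        segments = []
        recording = False
        loud_frames = 0    # consecutive loud frames seen while waiting
        quiet_frames = 0   # consecutive quiet frames seen while recording
        start = 0
        for i, level in enumerate(levels):
            loud = level >= self.silence_threshold
            if not recording:
                if not loud:
                    loud_frames = 0  # a short blip is discarded
                    continue
                if loud_frames == 0:
                    start = i
                loud_frames += 1
                if loud_frames * frame_seconds >= self.min_speech_duration:
                    recording, quiet_frames = True, 0
            elif loud:
                quiet_frames = 0
            else:
                quiet_frames += 1
                if quiet_frames * frame_seconds >= self.silence_duration:
                    segments.append((start, i))
                    recording, loud_frames = False, 0
        if recording:
            # Input ended while still recording; close the open segment.
            segments.append((start, len(levels) - 1))
        return segments
```

With 0.25-second frames, a single loud frame is ignored because it is shorter than min_speech_duration, while a longer burst starts a recording that ends after six quiet frames (1.5 seconds). Raising silence_threshold makes the detector ignore quieter input, which matches the "Higher = less sensitive" note above.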
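For the Adding Emotions step, the following sketch shows one way an entry in EMOTION_HOTKEYS could be turned into a VTube Studio hotkey message. The `build_hotkey_request` function is hypothetical and not part of the project; falling back to "neutral" for unknown emotions is an assumption made for this example. The message still has to be sent over an authenticated WebSocket connection to VTube Studio (default port 8001), which is not shown here.

```python
import json
import uuid

# Placeholder IDs copied from the README; replace them with real hotkey IDs.
EMOTION_HOTKEYS = {
    "happy": "your happy hotkey id",
    "sad": "your sad hotkey id",
    "angry": "your angry hotkey id",
    "thinking": "your thinking hotkey id",
    "neutral": "your neutral hotkey id",
}


def build_hotkey_request(emotion, hotkeys=EMOTION_HOTKEYS):
    """Build a VTube Studio HotkeyTriggerRequest message as a JSON string.

    Unknown emotions fall back to "neutral" (an assumption for this sketch).
    """
    hotkey_id = hotkeys.get(emotion, hotkeys["neutral"])
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": uuid.uuid4().hex,  # any unique alphanumeric ID
        "messageType": "HotkeyTriggerRequest",
        "data": {"hotkeyID": hotkey_id},
    })


if __name__ == "__main__":
    print(build_hotkey_request("happy"))
```

Keeping the message construction separate from the WebSocket code makes it easy to check that each emotion maps to the intended hotkey before connecting to VTube Studio.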