A 100% local Ai Vuber for Beginners and Non-programmer setup That is 100% free to Run With instant zero‑shot voice cloning That Uses Vtube studio’s api To make the mouth open and close and play animations after setting it up
----- Readme -----
AI VTuber For Begginers/non programmers Easy To setup
An AI VTuber that uses Whisper for speech recognition, Ollama for LLM inference, and Chatterbox TTS in a continuous listening loop.
This Was Also Made On a AMD gpu But the code is mainly supported For cpu users So it can be used without amd or nvdia gpus
This uses Python 3.10.11 if you don’t have it as your main Version do: py -3.10 -m venv venv
(you can check the version with python -V)
Features
Whisper (base.en model) - Real-time speech-to-text in English
Ollama (llama3.2) - AI model for generating VTuber responses
Chatterbox TTS - Text-to-speech to speak responses
Automatic silence detection - Only records when speech is detected
Continuous listening loop - Runs forever until Ctrl+C
VTube Studio integration - Controls mouth expressions via VTube Studio Api
Dependencies
(IMPORTANT!!!) MAKE A VENV FIRST AND MAKE SURE YOU ARE INSIDE THE PROJECT FOLDER for example
Optional RVC Voice Cloning (Advanced - Windows Build Required)
Note: RVC voice cloning is optional. The VTuber works perfectly with just the core dependencies using Chatterbox TTS. RVC voice cloning requires C++ build tools (Visual Studio Build Tools) and can be challenging to install on Windows.
Quick Start
MAKE A VENV FIRST AND MAKE SURE YOU ARE INSIDE THE PROJECT FOLDER for example
Listening phase: Waits for speech with automatic silence detection
Speech detection: Only starts recording after minimum speech duration is confirmed
Transcribing: Uses Whisper to convert speech to text
AI response: Ollama generates a VTuber-appropriate response
Speaking: Chatterbox TTS speaks the response aloud
Mouth control: VTube Studio controls mouth expressions in sync with speech
Repeat: Returns to listening mode
Notes
Press Ctrl+C to stop the VTuber at any time
Ensure proper audio device permissions for microphone access
For GPU acceleration, install PyTorch CUDA versions
Adjust silence_threshold, silence_duration, and min_speech_duration in the code for different environments
Troubleshooting
Common Issues
“Ollama not running” error:
Make sure Ollama is installed and running with ollama serve
Verify the model “llama3.2” is pulled
VTube Studio connection failed:
Ensure VTube Studio is running
Check that VTS_PORT (default: 8001) is correct
Make sure VTube Studio plugins are enabled
Audio permissions:
Grant microphone permissions to this application
On Linux: pip install pyaudio might require additional system packages
Model issues:
Whisper uses “base.en” for faster performance
Ensure all dependencies are installed from requirements.txt
Customization
Adjusting Silence Detection
Edit Aivtuber.py and modify these constants:
silence_threshold = 0.01 # Lower = more sensitive, Higher = less sensitive
silence_duration = 1.5 # Seconds of silence before stopping recording
min_speech_duration = 0.5 # Minimum speech duration to trigger recording
Changing Models
Whisper: Change in line 24: self.whisper_model = whisper.load_model("base")
Ollama: Change in line 109: model="llama3.2"
Adding Emotions
Edit the EMOTION_HOTKEYS dictionary in the code and add hotkeys to VTube Studio:
EMOTION_HOTKEYS = {
"happy": "your_happy_hotkey_id",
"sad": "your_sad_hotkey_id",
"angry": "your_angry_hotkey_id",
"thinking": "your_thinking_hotkey_id",
"neutral": "your_neutral_hotkey_id",
}
Future Enhancements
Potential future improvements:
Local LLM alternatives: Support for other Ollama models or local LLM implementations
Multi-language support: Whisper language switching and response localization
Context memory: Maintain conversation history for more coherent interactions
Advanced emotion system: More nuanced emotion detection and expression control
Stream processing: WebSocket streaming for lower latency
Plugin architecture: Easy addition of new features and integrations
##questions
Is this Ali?
No this is Not ali In Fact Ali is A WAY more complicated program than this.
This Also Doesn’t use Any of ali’s og code aside from How the mouth api works And Some recreated stuff Like the api being used so you can play music without having issues
Does This contain Any preMade vtuber models i can Download?
No But i do Have Older vtuber models you can use for this for example:
-Proby not Unless You Replaced Ollama with something Else
License
This project is open source. Feel free to modify and distribute as long as you give appropriate credit since that’s really important to get a habit out of.