Source: discuss.huggingface.co · Topic: artificial-intelligence

AI VTuber for Beginners and Non-Programmers: Easy to Set Up

A new open-source AI VTuber tool allows beginners and non-programmers to set up a fully local, free VTuber using Whisper for speech recognition, Ollama for LLM inference, and Chatterbox TTS, with VTube Studio integration for mouth animations and zero-shot voice cloning.

4 min read · published Jun 25, 2026

A 100% local AI VTuber for beginners and non-programmers that is 100% free to run, with instant zero-shot voice cloning. Once set up, it uses VTube Studio's API to open and close the model's mouth and to play animations.

----- Readme -----

AI VTuber for Beginners and Non-Programmers: Easy to Set Up

An AI VTuber that uses Whisper for speech recognition, Ollama for LLM inference, and Chatterbox TTS in a continuous listening loop.

This was made on an AMD GPU, but the code mainly targets CPU users, so it can be used without an AMD or NVIDIA GPU.

This uses Python 3.10.11. If that is not your default version, create the venv with: py -3.10 -m venv venv

(You can check the version with python -V.)

Features

Whisper (base.en model) - Real-time speech-to-text in English

Ollama (llama3.2) - AI model for generating VTuber responses

Chatterbox TTS - Text-to-speech to speak responses

Automatic silence detection - Only records when speech is detected

Continuous listening loop - Runs forever until Ctrl+C

VTube Studio integration - Controls mouth expressions via the VTube Studio API

Dependencies

(IMPORTANT!) Make a venv first and make sure you are inside the project folder, for example:

Optional RVC Voice Cloning (Advanced - Windows Build Required)

Note: RVC voice cloning is optional. The VTuber works perfectly with just the core dependencies using Chatterbox TTS. RVC voice cloning requires C++ build tools (Visual Studio Build Tools) and can be challenging to install on Windows.

Quick Start

Make a venv first and make sure you are inside the project folder, for example:

Listening phase: Waits for speech with automatic silence detection

Speech detection: Only starts recording after minimum speech duration is confirmed

Transcribing: Uses Whisper to convert speech to text

AI response: Ollama generates a VTuber-appropriate response

Speaking: Chatterbox TTS speaks the response aloud

Mouth control: VTube Studio controls mouth expressions in sync with speech

Repeat: Returns to listening mode
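
The steps above can be sketched as a short loop. This is a minimal sketch, not the project's actual code: the four callables (`listen`, `transcribe`, `respond`, `speak`) are hypothetical stand-ins for the microphone capture, Whisper, Ollama, and Chatterbox TTS steps, and `max_turns` is an added parameter that exists only so the example can terminate.

```python
from typing import Callable, Optional


def run_loop(
    listen: Callable[[], Optional[bytes]],
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    max_turns: Optional[int] = None,
) -> int:
    """Run listen -> transcribe -> respond -> speak until Ctrl+C.

    All four callables are hypothetical placeholders for the real
    microphone, Whisper, Ollama, and Chatterbox TTS calls.
    Returns the number of completed turns.
    """
    turns = 0
    try:
        while max_turns is None or turns < max_turns:
            audio = listen()               # blocks until an utterance ends
            if not audio:                  # nothing recorded, listen again
                continue
            text = transcribe(audio).strip()
            if not text:                   # no usable words in the audio
                continue
            speak(respond(text))           # mouth control would run alongside this
            turns += 1
    except KeyboardInterrupt:
        pass                               # Ctrl+C stops the loop cleanly
    return turns
```

Passing the steps in as functions keeps the loop itself easy to test without a microphone or any model loaded.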

Notes

Press Ctrl+C to stop the VTuber at any time

Ensure proper audio device permissions for microphone access

For GPU acceleration on NVIDIA cards, install a CUDA build of PyTorch

Adjust silence_threshold, silence_duration, and min_speech_duration in the code for different environments

Troubleshooting

Common Issues

“Ollama not running” error:

Make sure Ollama is installed and running with ollama serve

Verify the model “llama3.2” is pulled
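
Both checks can be done from Python before starting the VTuber. The sketch below is not part of the project. It assumes Ollama is listening on its default local address (http://localhost:11434) and uses the `/api/tags` endpoint, which lists locally pulled models. The function names are made up for this example.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address


def model_is_pulled(tags_response: dict, model: str) -> bool:
    """Return True if `model` appears in an Ollama /api/tags response."""
    names = [m.get("name", "") for m in tags_response.get("models", [])]
    # Ollama lists models with a tag suffix, e.g. "llama3.2:latest"
    return any(n == model or n.split(":")[0] == model for n in names)


def check_ollama(model: str = "llama3.2", base_url: str = OLLAMA_URL) -> str:
    """Return a short status message describing the Ollama setup."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            tags = json.load(resp)
    except (urllib.error.URLError, OSError):
        return "Ollama not running: start it with `ollama serve`"
    if not model_is_pulled(tags, model):
        return f"Model missing: run `ollama pull {model}`"
    return "OK"
```

Calling `print(check_ollama())` before launching gives a clearer message than a stack trace from inside the listening loop.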

VTube Studio connection failed:

Ensure VTube Studio is running

Check that VTS_PORT (default: 8001) is correct

Make sure VTube Studio plugins are enabled
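
A quick way to narrow down a connection failure is to test whether anything is listening on the port at all. This is a minimal sketch using only the standard library. `port_is_open` is a hypothetical helper, not a function from the project, and it only confirms that the port accepts TCP connections, not that authentication will succeed.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8001, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused or timed out: VTube Studio is closed, its API is
        # switched off, or VTS_PORT does not match the app's setting.
        return False
```

If this returns False while VTube Studio is open, the plugin API is most likely disabled or set to a different port.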

Audio permissions:

Grant microphone permissions to this application

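On Linux: pip install pyaudio might require additional system packages (PyAudio builds against the PortAudio library, so its development headers usually need to be installed first)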

Model issues:

Whisper uses “base.en” for faster performance

Ensure all dependencies are installed from requirements.txt

Customization

Adjusting Silence Detection

Edit Aivtuber.py and modify these constants:

silence_threshold = 0.01    # Lower = more sensitive, Higher = less sensitive
silence_duration = 1.5      # Seconds of silence before stopping recording
min_speech_duration = 0.5   # Minimum speech duration to trigger recording
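
To show how the three constants interact, here is a minimal sketch of one way such a detector could work. It is an assumption, not the code in Aivtuber.py: it treats audio as chunks of float samples in the range -1.0 to 1.0 and compares each chunk's RMS level with `silence_threshold`. The helper names `rms` and `record_utterance` are hypothetical.

```python
import math
from typing import Iterable, List, Sequence

# Values copied from the README defaults
silence_threshold = 0.01
silence_duration = 1.5
min_speech_duration = 0.5


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square level of one audio chunk (samples in -1.0..1.0)."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def record_utterance(
    chunks: Iterable[Sequence[float]], chunk_seconds: float
) -> List[Sequence[float]]:
    """Collect chunks from confirmed speech until enough trailing silence.

    Returns [] if speech never lasted min_speech_duration.
    """
    pending: List[Sequence[float]] = []   # loud chunks not yet confirmed as speech
    recorded: List[Sequence[float]] = []
    recording = False
    quiet_time = 0.0
    for chunk in chunks:
        loud = rms(chunk) >= silence_threshold
        if not recording:
            if loud:
                pending.append(chunk)
                if len(pending) * chunk_seconds >= min_speech_duration:
                    recording = True          # speech confirmed, keep what we have
                    recorded, pending = pending, []
            else:
                pending = []                  # a short blip is discarded
        else:
            recorded.append(chunk)
            quiet_time = 0.0 if loud else quiet_time + chunk_seconds
            if quiet_time >= silence_duration:
                break                         # speaker has stopped
    return recorded
```

In a noisy room, raising `silence_threshold` stops background hum from counting as speech. Raising `silence_duration` gives slow speakers more time to pause mid-sentence.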

Changing Models

Whisper: change the model name on line 24: self.whisper_model = whisper.load_model("base")

Ollama: change the model name on line 109: model="llama3.2"

Adding Emotions

Edit the EMOTION_HOTKEYS dictionary in the code and add hotkeys to VTube Studio:

EMOTION_HOTKEYS = {
    "happy": "your_happy_hotkey_id",
    "sad": "your_sad_hotkey_id",
    "angry": "your_angry_hotkey_id",
    "thinking": "your_thinking_hotkey_id",
    "neutral": "your_neutral_hotkey_id",
}
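
For context, a hotkey ID from this dictionary ends up in a `HotkeyTriggerRequest` message sent over the VTube Studio WebSocket. The sketch below only builds that JSON message and is not the project's code. The `hotkey_request` helper and the fallback to "neutral" for unknown emotions are assumptions made for illustration. Sending the message still requires an authenticated plugin session.

```python
import json
import uuid

# Shortened copy of the README dictionary, with placeholder IDs
EMOTION_HOTKEYS = {
    "happy": "your_happy_hotkey_id",
    "neutral": "your_neutral_hotkey_id",
}


def hotkey_request(emotion: str) -> str:
    """Build a VTube Studio HotkeyTriggerRequest message as a JSON string."""
    # Assumption: unknown emotions fall back to the neutral hotkey
    hotkey_id = EMOTION_HOTKEYS.get(emotion, EMOTION_HOTKEYS["neutral"])
    return json.dumps({
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": uuid.uuid4().hex,   # any unique ID to match the reply
        "messageType": "HotkeyTriggerRequest",
        "data": {"hotkeyID": hotkey_id},
    })
```

The placeholder values must be replaced with the real hotkey IDs (or names) from your own VTube Studio model before the request does anything.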

Future Enhancements

Potential future improvements:

Local LLM alternatives: Support for other Ollama models or local LLM implementations

Multi-language support: Whisper language switching and response localization

Context memory: Maintain conversation history for more coherent interactions

Advanced emotion system: More nuanced emotion detection and expression control

Stream processing: WebSocket streaming for lower latency

Plugin architecture: Easy addition of new features and integrations

Questions

Is this Ali?

No, this is not Ali. In fact, Ali is a far more complicated program than this.

This also doesn't use any of Ali's original code, aside from how the mouth API works and some recreated pieces, such as the way the API is used so you can play music without issues.

Does this contain any premade VTuber models I can download?

No, but I do have older VTuber models you can use for this, for example:

- Probably not, unless you replace Ollama with something else.

License

This project is open source. Feel free to modify and distribute it as long as you give appropriate credit, since giving credit is an important habit to build.
