Anti Refusal LLM Service

A developer built Cerberus AI, a 12MB desktop application using Tauri and Rust that runs uncensored language models locally. The app auto-detects GPU VRAM, pulls appropriate model quantizations, and uses refusal ablation to remove alignment-based refusal directions from model weights. Cerberus AI offers both a local-first desktop client and an OpenAI-compatible managed API for running refusal-ablated models on personal hardware.

# I Built a 12MB Desktop App for Running Uncensored AI Models Locally (Tauri + Rust + Ollama)

How I built Cerberus AI, a local-first desktop app that auto-detects your GPU, pulls the right model quantization, and gives you uncensored AI chat without sending a single prompt to the cloud.

Every major language model ships with an alignment layer that refuses certain prompts. Sometimes that's reasonable. Sometimes you're a security researcher, a creative writer, or just someone who doesn't want a corporation deciding what questions you're allowed to ask.

I built Cerberus AI to fix that, and to make the whole experience local-first, lightweight, and dead simple to install.

## What Is Cerberus AI?

Cerberus AI is a platform for running open-weight, refusal-ablated language models on your own hardware. It has three parts:

- A native desktop app (~12 MB) built with Tauri + Rust, not Electron
- Open-weight GGUF models hosted on a public CDN
- An OpenAI-compatible managed API for when you don't want to run locally

The desktop app integrates directly with Ollama, auto-detects your GPU VRAM, and recommends the right model quantization for your hardware. From 4 GB laptops to 24 GB workstations, it just works.

## What Is Refusal Ablation?

This is the core technical innovation behind Cerberus models. Here's the short version:

Language models learn a refusal direction in their activation space during alignment training. When a prompt triggers this direction, the model produces refusal text ("I can't help with that") regardless of whether the underlying model actually lacks the knowledge.

Refusal ablation surgically removes this direction from the model weights. The technique:

1. Identifies the refusal direction vector in the model's residual stream
2. Projects it out of the weight matrices
3. Preserves all other reasoning capabilities

The result is a model that treats every prompt equally. No refusals. No moralizing.
Just direct, unfiltered output from the model's actual knowledge.

We apply this to multiple base architectures:

| Model | Base | Parameters | Use Case |
| --- | --- | --- | --- |
| Cerberus 4B v2 | Qwen 3.5 | 4B | General purpose, fits on 4-8 GB GPU |
| Arbiter GL9b | GLM-4 | 9B | Heavier reasoning, needs 6+ GB |
| Gamma3 1B BDPO | Custom | 1B | Edge devices, CPU-only inference |

All models are distributed as GGUF files, the same format llama.cpp uses. Download once, run anywhere.

## Why Tauri Instead of Electron?

Electron bundles an entire Chromium browser. That's 150+ MB just for the runtime. For a chat app that talks to a local Ollama instance, that's absurd.

Tauri uses your system's existing WebView (WebView2 on Windows) and writes the backend in Rust. The result:

- **~12 MB installer** vs 150+ MB for Electron
- **Native performance**: the Rust backend handles hardware detection, Ollama management, and streaming
- **Lower memory footprint**: no spare Chrome process eating your RAM
- **System-level integration**: direct access to GPU info, filesystem, and process management

The frontend is Vue 3 + Vite, but the heavy lifting (VRAM detection, model management, MCP plugin orchestration) all happens in Rust.

## Install in 30 Seconds

On Windows, open PowerShell and run:

```powershell
irm https://cerberusai.dev/get | iex
```

This single command:

1. Checks for and installs WebView2 if missing
2. Installs Ollama if missing
3. Detects your GPU VRAM
4. Downloads the recommended model quantization
5. Installs the Cerberus desktop app
6. Launches it

You paste your API key (free to generate at access.cerberusai.dev) and you're chatting locally.

## The API: OpenAI-Compatible

Don't want to run local? Use the managed API.
It follows the OpenAI chat completions shape, so any existing OpenAI SDK code works once you point the client at a different base URL:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerberusai.dev/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="cerberus-4b-v2-abliterated",
    messages=[
        {"role": "system", "content": "You are Cerberus, an unfiltered AI."},
        {"role": "user", "content": "Explain how refusal ablation works"},
    ],
    stream=True,
)

# Print tokens as they arrive; some chunks carry no content.
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

The API streams via SSE and returns standard error codes (401, 402, 429). The public model CDN at llm.cerberusai.dev is fully CORS-enabled, so you can even fetch model metadata from browser-based apps.

### curl Example

```bash
curl -X POST https://api.cerberusai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cerberus-4b-v2-abliterated",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```

## Model Downloads: Public CDN

All GGUF model files are hosted on llm.cerberusai.dev with a public JSON API:

```bash
# List all models
curl https://llm.cerberusai.dev/api/models/

# Metadata for a single model
curl https://llm.cerberusai.dev/api/models/cerberus-4b-v2-abliterated/

# Download a quantization
wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_K_M.gguf

# Resume an interrupted download
wget -c https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q8_0.gguf
```

Range requests are supported, CORS is enabled for all origins, and GGUF files are served with proper `Content-Disposition: attachment` headers.
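
Because range requests are supported, an interrupted download can also be resumed from a script rather than with `wget -c`. Here is a minimal Python sketch using only the standard library. It assumes the CDN answers a `Range` request with `206 Partial Content`, which is the usual behavior but not something confirmed here. The names `resume_header` and `download` are illustrative and not part of any Cerberus tooling.

```python
import os
import urllib.error
import urllib.request


def resume_header(path):
    """Build a Range header that continues from the bytes already on disk."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}


def download(url, path, chunk_size=1 << 20, opener=urllib.request.urlopen):
    """Download url to path, resuming a partial file when the server allows it."""
    request = urllib.request.Request(url, headers=resume_header(path))
    try:
        response = opener(request)
    except urllib.error.HTTPError as err:
        if err.code == 416:  # requested offset is at or past the end of the file
            return os.path.getsize(path)
        raise
    with response:
        # 206 means the Range header was honoured, so append; otherwise the
        # server sent the whole file and the partial copy is overwritten.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
    return os.path.getsize(path)
```

Calling `download("https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q8_0.gguf", "Arbiter-GL9b-Q8_0.gguf")` a second time after a dropped connection would request only the missing bytes. The `opener` parameter exists so the function can be exercised without touching the network.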
## Built-In Features

Beyond chat, the desktop app includes:

- **Model Manager**: browse local Ollama models, pull from the Cerberus cloud catalog, import raw GGUF files, switch active models, see disk usage
- **MCP Plugin System**: browse and install Model Context Protocol plugins from inside the app. There's also a public MCP Skills Server at api.cerberusai.dev/skills-sse
- **Hardware Monitoring**: CPU, RAM, and VRAM activity displayed in the interface
- **Zero Telemetry**: no prompts leave your machine during local inference. No analytics. No phone-home.

## Pricing

Every account gets 50,000 free monthly credits. That's enough for casual use and testing. If you need more:

| Plan | Price | Monthly Credits |
| --- | --- | --- |
| Free | $0 | 50,000 |
| Lite | $8/mo | 300,000 |
| Mid | $15/mo | 900,000 |
| Exp | $22/mo | 2,000,000 |

One-time top-ups start at $5 (125,000 credits). Stripe and PayPal supported. The free tier has no time limit; it refreshes every month.

Local inference through Ollama costs zero credits. Credits only apply to the managed API.

## Try It

- 🌐 Website: cerberusai.dev
- 📦 GitHub: github.com/tjcrims0nx/CerberusAI-Desktop
- 🧠 Models: llm.cerberusai.dev
- 📖 API Docs: cerberusai.dev/docs/api
- 💬 Discord: discord.gg/YdVj7hEtv5
- 🔑 Get API Key: access.cerberusai.dev

If you've ever been frustrated by a language model refusing a perfectly reasonable prompt, or if you just want to run AI locally without cloud dependencies, give Cerberus a try. The install is one command, the free tier is permanent, and the weights are open.

I'd love to hear feedback. Drop into the Discord or open an issue on GitHub.

Cerberus AI is an open-weight project. The desktop app source is on GitHub. Models are distributed as GGUF under open licenses. The managed API is a pay-as-you-go service.