{"slug": "anti-refusal-llm-service", "title": "Anti Refusal LLM Service", "summary": "A developer built Cerberus AI, a 12MB desktop application using Tauri and Rust that runs uncensored language models locally. The app auto-detects GPU VRAM, pulls appropriate model quantizations, and uses refusal ablation to remove alignment-based refusal directions from model weights. Cerberus AI offers both a local-first desktop client and an OpenAI-compatible managed API for running refusal-ablated models on personal hardware.", "body_md": "I Built a 12MB Desktop App for Running Uncensored AI Models Locally (Tauri + Rust + Ollama)\n\nHow I built Cerberus AI, a local-first desktop app that auto-detects your GPU, pulls the right model quantization, and gives you uncensored AI chat without sending a single prompt to the cloud.\n\nEvery major language model ships with alignment training that makes it refuse certain prompts. Sometimes that's reasonable. Sometimes you're a security researcher, a creative writer, or just someone who doesn't want a corporation deciding what questions you're allowed to ask.\n\nI built Cerberus AI to fix that, and to make the whole experience local-first, lightweight, and dead simple to install.\n\nWhat Is Cerberus AI?\n\nCerberus AI is a platform for running open-weight, refusal-ablated language models on your own hardware. It has three parts:\n\nA native desktop app (~12 MB) built with Tauri + Rust, not Electron\n\nOpen-weight GGUF models hosted on a public CDN\n\nAn OpenAI-compatible managed API for when you don't want to run locally\n\nThe desktop app integrates directly with Ollama, auto-detects your GPU VRAM, and recommends the right model quantization for your hardware. From 4 GB laptops to 24 GB workstations, it just works.\n\nWhat Is Refusal Ablation?\n\nThis is the core technical innovation behind Cerberus models. 
Here's the short version:\n\nLanguage models learn a refusal direction in their activation space during alignment training. When a prompt triggers this direction, the model produces refusal text (\"I can't help with that\") regardless of whether the underlying model actually lacks the knowledge.\n\nRefusal ablation surgically removes this direction from the model weights. The technique:\n\nIdentifies the refusal direction vector in the model's residual stream\n\nProjects it out of the weight matrices\n\nPreserves all other reasoning capabilities\n\nThe result is a model that treats every prompt equally. No refusals. No moralizing. Just direct, unfiltered output from the model's actual knowledge.\n\nWe apply this to multiple base architectures:\n\n| Model | Base | Parameters | Use Case |\n| --- | --- | --- | --- |\n| Cerberus 4B v2 | Qwen 3.5 | 4B | General purpose, fits on 4-8 GB GPU |\n| Arbiter GL9b | GLM-4 | 9B | Heavier reasoning, needs 6+ GB |\n| Gamma3 1B BDPO | Custom | 1B | Edge devices, CPU-only inference |\n\nAll models are distributed as GGUF files, the same format llama.cpp uses. Download once, run anywhere.\n\nWhy Tauri Instead of Electron?\n\nElectron bundles an entire Chromium browser. That's 150+ MB just for the runtime. For a chat app that talks to a local Ollama instance, that's absurd.\n\nTauri uses your system's existing WebView (WebView2 on Windows), and its backend is written in Rust. 
The result:\n\n~12 MB installer vs 150+ MB for Electron\n\nNative performance: the Rust backend handles hardware detection, Ollama management, and streaming\n\nLower memory footprint: no spare Chrome process eating your RAM\n\nSystem-level integration: direct access to GPU info, filesystem, and process management\n\nThe frontend is Vue 3 + Vite, but the heavy lifting (VRAM detection, model management, MCP plugin orchestration) all happens in Rust.\n\nInstall in 30 Seconds\n\nOn Windows, open PowerShell and run:\n\n`irm https://cerberusai.dev/get | iex`\n\nThis single command:\n\nChecks for (and installs) WebView2 if missing\n\nInstalls Ollama if missing\n\nDetects your GPU VRAM\n\nDownloads the recommended model quantization\n\nInstalls the Cerberus desktop app\n\nLaunches it\n\nYou paste your API key (free to generate at access.cerberusai.dev) and you're chatting locally.\n\nThe API: OpenAI-Compatible\n\nDon't want to run locally? Use the managed API. It follows the OpenAI chat completions shape, so any existing OpenAI SDK code works with a one-line change:\n\n```python\nfrom openai import OpenAI\n\n# Point the standard OpenAI client at the Cerberus endpoint\nclient = OpenAI(\n    base_url=\"https://api.cerberusai.dev/v1\",\n    api_key=\"YOUR_API_KEY\"\n)\n\nresponse = client.chat.completions.create(\n    model=\"cerberus-4b-v2-abliterated\",\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are Cerberus, an unfiltered AI.\"},\n        {\"role\": \"user\", \"content\": \"Explain how refusal ablation works\"}\n    ],\n    stream=True\n)\n\n# Print tokens as they arrive; skip chunks that carry no choices or no text\nfor chunk in response:\n    if chunk.choices and chunk.choices[0].delta.content:\n        print(chunk.choices[0].delta.content, end=\"\", flush=True)\n```\n\nThe API streams via SSE and returns standard error codes (401, 402, 429). There is also a public model CDN at llm.cerberusai.dev that's fully CORS-enabled, so you can even fetch model metadata from browser-based apps.\n\ncurl Example\n\n```bash\ncurl -X POST https://api.cerberusai.dev/v1/chat/completions \\\n  -H \"Authorization: Bearer YOUR_API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"cerberus-4b-v2-abliterated\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}],\n    \"stream\": false\n  }'\n```\n\nModel Downloads: Public CDN\n\nAll GGUF model files are hosted on llm.cerberusai.dev with a public JSON API:\n\n`curl https://llm.cerberusai.dev/api/models/`\n\n`curl https://llm.cerberusai.dev/api/models/cerberus-4b-v2-abliterated/`\n\n`wget https://llm.cerberusai.dev/models/cerberus-4b-v2-abliterated/cerberus-4b-v2-abliterated-Q4_K_M.gguf`\n\n`wget -c https://llm.cerberusai.dev/models/Arbiter-GL9b/Arbiter-GL9b-Q8_0.gguf`\n\nRange requests are supported, CORS is enabled for all origins, and GGUF files are served with proper `Content-Disposition: attachment` headers.\n\nBuilt-In Features\n\nBeyond chat, the desktop app includes:\n\nModel Manager: browse local Ollama models, pull from the Cerberus cloud catalog, import raw GGUF files, switch active models, see disk usage\n\nMCP Plugin System: browse and install Model Context Protocol plugins from inside the app. There's also a public MCP Skills Server at api.cerberusai.dev/skills-sse\n\nHardware Monitoring: CPU, RAM, and VRAM activity displayed in the interface\n\nZero Telemetry: no prompts leave your machine during local inference. No analytics. No phone-home.\n\nPricing\n\nEvery account gets 50,000 free monthly credits. That's enough for casual use and testing.\n\nIf you need more:\n\n| Plan | Price | Monthly Credits |\n| --- | --- | --- |\n| Free | $0 | 50,000 |\n| Lite | $8/mo | 300,000 |\n| Mid | $15/mo | 900,000 |\n| Exp | $22/mo | 2,000,000 |\n\nOne-time top-ups start at $5 (125,000 credits). Stripe and PayPal supported. 
The free tier has no time limit; it refreshes every month.\n\nLocal inference through Ollama costs zero credits. Credits only apply to the managed API.\n\nTry It\n\n🌐 Website: cerberusai.dev\n\n📦 GitHub: github.com/tjcrims0nx/CerberusAI-Desktop\n\n🧠 Models: llm.cerberusai.dev\n\n📖 API Docs: cerberusai.dev/docs/api\n\n💬 Discord: discord.gg/YdVj7hEtv5\n\n🔑 Get API Key: access.cerberusai.dev\n\nIf you've ever been frustrated by a language model refusing a perfectly reasonable prompt, or if you just want to run AI locally without cloud dependencies, give Cerberus a try. The install is one command, the free tier is permanent, and the weights are open.\n\nI'd love to hear feedback. Drop into the Discord or open an issue on GitHub.\n\nCerberus AI is an open-weight project. The desktop app source is on GitHub. Models are distributed as GGUF under open licenses. The managed API is a pay-as-you-go service.", "url": "https://wpnews.pro/news/anti-refusal-llm-service", "canonical_source": "https://dev.to/cerberusai/anti-refusal-llm-service-478o", "published_at": "2026-05-31 02:24:48+00:00", "updated_at": "2026-05-31 02:41:33.580524+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "ai-tools", "ai-products", "ai-ethics"], "entities": ["Cerberus AI", "Ollama", "Tauri", "Rust", "GGUF"], "alternates": {"html": "https://wpnews.pro/news/anti-refusal-llm-service", "markdown": "https://wpnews.pro/news/anti-refusal-llm-service.md", "text": "https://wpnews.pro/news/anti-refusal-llm-service.txt", "jsonld": "https://wpnews.pro/news/anti-refusal-llm-service.jsonld"}}