LLM-Manager: Orchestrating Ollama and Llama.cpp with Pure Bash

LLM-Manager is a lightweight, modular Bash suite with a dual JSON/interactive interface, designed to manage local and remote inference engines such as Ollama and Llama.cpp across Linux and WSL2. It uses a plug-and-play architecture and provides unified commands for starting, stopping, loading, and monitoring models. Every command emits both human-readable messages and machine-parsable JSON for automation, with no need for complex Python scripts or Docker setups.

When I started experimenting with Large Language Models (LLMs) to build an on-premise RAG (Retrieval-Augmented Generation) application, I hit a massive roadblock: environment fragmentation. Managing multiple inference engines like Ollama and Llama.cpp meant memorizing different command-line flags, environment variables, and configurations. Once my frontend and backend prototypes were ready for testing, I realized I was spending too much time manually starting, stopping, loading, and unloading models.

I looked online for solutions. Most people suggested complex Python scripts, heavy Docker setups, n8n workflows, or complicated web dashboards. I didn't want the bloat. I wanted something lightweight that executed commands exactly as I would run them by hand, but with zero cognitive load.

That is why I built LLM-Manager: a modular orchestration suite written entirely in pure Bash. Choosing Bash wasn't about being old-school; it was a pragmatic engineering decision: no npm install, no runtime dependencies. It is native to Linux and WSL2 and lightning-fast.

The system is designed with a strict plug-and-play modular layout. At the center sits a single entry-point orchestrator, engine-run.sh, which validates arguments against whitelists and routes each action to an engine-specific script.
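The whitelist-and-route pattern described above can be sketched in a few dozen lines of Bash. This is a minimal sketch under stated assumptions, not LLM-Manager's actual engine-run.sh: the argument order (action, then engine), the ENGINE_ROOT variable, the whitelist contents, and the helper names in_list, sanitize, json_error, and dispatch are all hypothetical. It also assumes an error convention of a human-readable message on stderr plus a JSON object on stdout, and it skips the cross-engine "all" aggregation that the real status command performs.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a whitelist-validating dispatcher in the spirit of
# engine-run.sh. The whitelists, ENGINE_ROOT, and all helper names are
# illustrative assumptions, not LLM-Manager's actual code.

# Directory holding the per-engine folders (llama/, ollama/, ...).
ENGINE_ROOT="${ENGINE_ROOT:-$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)}"

# Hypothetical whitelists: anything not listed here is rejected.
ALLOWED_ACTIONS=(start stop status load unload show remove)
ALLOWED_ENGINES=(ollama llama)

# Succeed if the first argument equals any of the remaining arguments.
in_list() {
  local needle="$1" item
  shift
  for item in "$@"; do
    [[ "$item" == "$needle" ]] && return 0
  done
  return 1
}

# Drop everything except letters, digits, '_' and '-' so that untrusted
# input can never break the JSON string it is embedded in.
sanitize() {
  printf '%s' "${1//[^[:alnum:]_-]/}"
}

# Human-readable message on stderr, machine-parsable JSON on stdout.
json_error() {
  local action engine message="$3"
  action="$(sanitize "$1")"
  engine="$(sanitize "$2")"
  printf 'Error: %s\n' "$message" >&2
  printf '{"status":"error","action":"%s","engine":"%s","message":"%s"}\n' \
    "$action" "$engine" "$message"
}

# Validate <action> <engine>, then route to <engine>/<action>.sh.
dispatch() {
  local action="${1:-}" engine="${2:-}"
  if ! in_list "$action" "${ALLOWED_ACTIONS[@]}"; then
    json_error "$action" "$engine" "unknown action"
    return 2
  fi
  if ! in_list "$engine" "${ALLOWED_ENGINES[@]}"; then
    json_error "$action" "$engine" "unknown engine"
    return 2
  fi
  shift 2  # any remaining arguments (e.g. a model name) are forwarded as-is
  local script="$ENGINE_ROOT/$engine/$action.sh"
  if [[ ! -x "$script" ]]; then
    json_error "$action" "$engine" "missing script"
    return 3
  fi
  "$script" "$@"
}

# Run only when executed directly, not when sourced by another script.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  dispatch "$@"
fi
```

Saved as an executable script, ./engine-run.sh status ollama would execute ollama/status.sh, while ./engine-run.sh restart ollama would print "Error: unknown action" on stderr, a JSON error object on stdout, and exit with status 2. Because both tokens must exactly match a whitelisted value before any path is built, a crafted argument such as ../../bin/sh never reaches the filesystem lookup.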
.
├── engine.conf             Global configuration constants
├── engine-models.json      Model registry with per-engine metadata
├── engine-templates.json   Prompt/model templates by family
├── engine-run.sh           Main orchestrator and entry point
├── engine-common.sh        Shared utilities (OS detection, JSON formatting)
├── engine-status.sh        Cross-engine status aggregation
├── engine-system.sh        Hardware metric probing
├── logs/                   Centralized logs
├── llama/                  Llama.cpp backend scripts
└── ollama/                 Ollama backend scripts

Every engine directory implements the same interface: start.sh, stop.sh, status.sh, load.sh, unload.sh, show.sh, and remove.sh. If an engine doesn't support a specific action, a simple stub script that exits with 0 keeps the pipeline happy.

One of the core features of LLM-Manager is how it handles output. Human-readable messages go to stderr, while machine-parsable JSON goes to stdout. This dual nature makes it convenient for local interactive use, but it also means the tool can act as a local proxy: you can run it over a remote SSH session and pipe the clean JSON straight into another monitoring script, a custom web UI, or an automation tool.

Running ./engine-run.sh status probes the system metrics and queries active network ports, returning a comprehensive payload:

{
  "timestamp": "2026-05-29T06:38:30Z",
  "status": "success",
  "action": "status",
  "engine": "all",
  "data": {
    "system": {
      "os_type": "wsl",
      "memory": { "total_mb": 5927, "available_mb": 4555 },
      "gpu": { "detected": true, "name": "AMD Radeon(TM) Graphics", "vram_total_mb": 512 },
      "cpu": { "cores": 4, "load_1m": 1.11 }
    },
    "engines": {
      "ollama": { "state": "stopped", "port": 1234 },
      "llama": { "state": "stopped", "port": 12345 }
    }
  }
}

If a command fails or is called without parameters, the machine gets the JSON error contract, and the human operator gets a clean, human-readable usage menu:

Error: LLM Manager Usage: engine-run.sh