LLM-Manager: Orchestrating Ollama and Llama.cpp with Pure Bash

wpnews.pro

cd /news/large-language-models/llm-manager-orchestrating-ollama-and… · home › topics › large-language-models › article

[ARTICLE · art-17374] src=dev.to ↗ pub=2026-05-29T08:16Z topic=large-language-models verified=true sentiment=↑ positive

LLM-Manager: Orchestrating Ollama and Llama.cpp with Pure Bash

A developer built LLM-Manager, a lightweight Bash-based orchestration suite for managing local and remote LLM inference engines like Ollama and Llama.cpp across Linux and WSL2. The tool uses a modular plug-and-play architecture with a JSON/interactive dual interface, eliminating the need for complex Python scripts or Docker setups. It provides unified commands for starting, stopping, loading, and monitoring models while outputting both human-readable messages and machine-parsable JSON for automation.

read4 min views26 publishedMay 29, 2026

LLM-Manager is a lightweight, modular Bash suite with a dual JSON/Interactive interface designed to manage local and remote inference engines across Linux and WSL2.

When I started experimenting with Large Language Models (LLMs) to build an On-Premise RAG (Retrieval-Augmented Generation) application, I hit a massive roadblock: environment fragmentation.

Managing multiple inference engines like Ollama and Llama.cpp meant memorizing different command-line flags, environment variables, and configurations. Once my frontend and backend prototypes were ready for testing, I realized I was spending too much time manually starting, stopping, , and un models.

I looked online for solutions. Most people suggested complex Python scripts, heavy Docker setups, n8n workflows, or complicated web dashboards.

I didn't want the bloat. I wanted something lightweight that executed commands as if I were doing them manually, but with zero cognitive load.

That is why I built LLM-Manager: a modular orchestration suite written entirely in pure Bash.

Choosing Bash wasn't about being old-school; it was a pragmatic engineering decision:

npm install

, no runtime dependencies. It’s native and lightning-fast.The system is designed with a strict plug-and-play modular layout. At the center sits a single entry-point orchestrator (engine-run.sh

) that validates arguments against whitelists and routes actions to engine-specific scripts.

.
├── engine.conf               # Global configuration constants
├── engine-models.json        # Model registry with per-engine metadata
├── engine-templates.json     # Prompt/Model templates by family
├── engine-run.sh             # Main orchestrator & entry-point
├── engine-common.sh          # Shared utilities (OS detection, JSON formatting)
├── engine-status.sh          # Cross-engine status aggregation
├── engine-system.sh          # Hardware metric probing
├── logs/                     # Centralized logs
├── llama/                    # Llama.cpp backend scripts
└── ollama/                   # Ollama backend scripts

Every engine directory implements a consistent interface (start.sh

, stop.sh

, status.sh

, load.sh

, unload.sh

, show.sh

, remove.sh

). If an engine doesn't support a specific action, a simple stub script that exits with 0

keeps the pipeline happy.

One of the core features of LLM-Manager is how it handles output.

stderr

.stdout

.This dual nature makes it perfect for local interactive use, but also means it acts as a local proxy. You can run it over Remote SSH and pipe the clean JSON straight into another monitoring script, custom Web UI, or automation tool.

Running ./engine-run.sh status

probes the system metrics and queries active network ports, spitting out a comprehensive payload:

{
  "timestamp": "2026-05-29T06:38:30Z",
  "status": "success",
  "action": "status",
  "engine": "all",
  "data": {
    "system": {
      "os_type": "wsl",
      "memory": { "total_mb": 5927, "available_mb": 4555 },
      "gpu": { "detected": true, "name": "AMD Radeon(TM) Graphics", "vram_total_mb": 512 },
      "cpu": { "cores": 4, "load_1m": 1.11 }
    },
    "engines": {
      "ollama": { "state": "stopped", "port": 1234 },
      "llama": { "state": "stopped", "port": 12345 }
    }
  }
}

If a command fails or is called without parameters, the machine gets the JSON error contract, and the human operator gets a clean, human-readable usage menu:

Error: LLM Manager
Usage: engine-run.sh <action> [engine] [args...]
actions:
    config                           Global config
    models [-h]                      List available models (-h human readable)
    status <engine>                  Show global or engine status
    start <engine> [model] [users]   Start an engine
    stop <engine>                    Stop an engine
...

Managing raw .gguf

files on Ollama can be a chore since it requires a Modelfile

. LLM-Manager abstracts this entirely in the backend via model strategies in engine-models.json

If a model is configured with a gguf

, the load.sh

script dynamically generates the required Modelfile on the fly, injecting correct prompt templates based on the model family, and it into Ollama seamlessly. It also supports native

strategies to pull directly from the official Ollama registry, or auto

to fallback if the local file is missing.

The project is fully open-source. If you want to see how the WSL2/PowerShell bridges are handled, how the dynamic Modelfiles are generated, or if you want to use it to clean up your own local LLM testing environment, check out the repository:

A lightweight, modular Bash orchestration suite to manage, start, stop, and monitor local and remote LLM inference engines (Ollama, Llama.cpp) with a dual interactive/JSON interface.

Developed primarily to solve the complexity of managing ibrid environments (like Windows hosts from WSL2) and remote deployments via SSH without the overhead of heavy Python or dashboard solutions.

Before running the orchestrator, ensure your environment has the following tools installed:

jq

sudo apt install jq

on Debian/Ubuntu).curl

ollama

and llama.cpp

(with vLLM

planned).I am currently working on completing the vLLM

engine integration and refining the startup health-checks into proactive retry loops.

Let me know what you think or if you've built similar lightweight alternatives for your AI workflows!

source & further reading

dev.to — original article Building an Event Planning Coordinator Agent in typescript with HazelJS Building a Home Maintenance Scheduler Agent in typescript with HazelJS Lesson 2 - Security: Secure as you build

~/api · this article 200

$curl api.wpnews.pro/v1/news/llm-manager-orchestratin…

Read original on dev.to → dev.to/bumbulik0/llm-manager-orchestrating-ollam…

mentioned entities

LLM-Manager

Ollama

Llama.cpp

Bash

Linux

WSL2

RAG

n8n

metadata

slugllm-manager-orchestrating-ollama-and-llama-cpp-with-pure-bash

topic#large-language-models

secondary4 topics

sentimentpositive

canonicaldev.to

navigation

← prevChatGPT isn't the only chatbot p…

next →How to set up PostgreSQL permiss…

── more in #large-language-models 4 stories · sorted by recency

insideai.news · 14 Jul · #large-language-models

IBM Shares Crash 25% as AI Hardware Shift Catches CEO Arvind Krishna Off Guard

fortune.com · 14 Jul · #large-language-models

The secrets of an unheralded AI success story

sourcefeed.dev · 14 Jul · #large-language-models

GPT-5.6 Agents Write Their Own Orchestration

machinebrief.com · 14 Jul · #large-language-models

Apple's Siri AI Beta: A New Era or More of the Same?

── more on @llm-manager 3 stories trending now

wpnews · 23 May · #artificial-intelligence

AccessLens — a blind person's lanyard, powered by Gemma 4 on-device

wpnews · 27 May · #artificial-intelligence

How I Run Two Claude Accounts as One

wpnews · 21 May · #developer-tools

Antigravity CLI: A Hands-On Guide to Google's Terminal Coding Agent

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required