We got local models to triage the OpenClaw repo for FREE! Hugging Face engineer Onur developed a real-time notification system for the OpenClaw repository using local open-weight models like Gemma and Qwen, running on an NVIDIA GB10 with 128 GB of unified memory. The system classifies issues and PRs into categories using an agent harness, enabling near-instantaneous triage without relying on costly cloud-based AI services, highlighting the importance of self-hosted AI stacks after the removal of Anthropic's Claude Fable 5 model. Text Generation • 284B • Updated • 5.16M • 268 We got local models to triage the OpenClaw repo for FREE Update on GitHub https://github.com/huggingface/blog/blob/main/local-models-pr-triage.md Free as in beer, excluding the cost of electricity, and assuming you already own the hardware June 2026 will go down as the moment that people realized closed models can be taken away. With the removal of Anthropic's latest flagship model Claude Fable 5 fresh in memory, one can see why it is more important than ever to own your AI stack and be able to run models locally, especially if you are building your business on top of AI. In that light, we wanted to share how we use local models like Gemma and Qwen in an agent harness, to run classification tasks ^1 . This approach is different from using a model like BERT for classification. A local model in an agent harness like Pi can be used in tandem with structured outputs, to assign labels. We chose this approach because we already had local models and the harness on hand, and have conviction that similar setups will increase in popularity as local models improve in capability. ^2 Our starting point was open source contributions in the OpenClaw repo. OpenClaw gets hundreds of issues and PRs every day, which need to be triaged, prioritized and routed to maintainers. I, Onur, am working to make local models work well with OpenClaw. Being a maintainer of this specific vertical, I need to react quickly to any P0 issues. With SOTA closed models like GPT-5, Opus, or Sonnet, this is a pretty straightforward task. But I happen to sit on 128 GB of unified memory, namely an NVIDIA GB10. So I took on the task: Can I build a real-time notification system that filters and notifies me for only the issues that I am responsible for... with local open-weight models? If I set up my OpenClaw main agent running on a $200/mo ChatGPT Pro plan to trigger a job on every new issue or PR, that would use up my quota. I might instead set it to run every 2 hours, or 6 hours. This would batch issues over longer periods, so we would be trading real-time notifications for delayed processing. If I were to run this on a local model on the hardware I already have up and running, I would not only have near-instantaneous notifications, I would also be able to do it for free or rather, for the cost of electricity . Categorizing issues and PRs We came up with a finite set of labels representing the categories of issues we need to triage, and then use a local model to classify each issue into one of those categories, like local models , self hosted inference , acp , agent runtime , codex , ui tui and so on. ^3 But how do we classify pull requests? A simple single request to a Chat Completions endpoint with a tool JSON schema, with the topics as an enum? Kind of. But this is 2026, not 2023, and we have AGENTS. We can do better For the local model choices, we tested gemma-4-26b-a4b and qwen3.6-35b-a3b . With performance optimizations, both can generate hundreds of tokens per second locally. We use an agent harness to drive the classification run. For this, we bundle pi https://pi.dev as a harness that can call local model endpoints. The agent by default receives the PR title, body and a truncated excerpt of the PR diff in the first prompt. Then, it can choose to use the bash tool to perform read-only operations on the OpenClaw repo in case it needs to look at the codebase , or the final json tool to submit the final classification result. You wouldn't want to give full bash access to a local model running in this high-throughput setting, because a prompt-injected issue or PR could otherwise steer the model into doing something unrelated to classification. For that reason, we use reposhell https://github.com/osolmaz/localpager/tree/main/reposhell instead of bash : a restricted bash -like shell that only allows read-only operations ls , find , cat , grep , etc. on the OpenClaw repo. The model thinks it is using bash , but any operation that is not allowed is rejected: reposhell bound cwd=/repo/openclaw repos=openclaw type help for allowed commands; exit or quit to leave reposhell /repo/openclaw help allowed: pwd, ls, find, rg, grep, sed -n, cat, head, tail, wc -l, git status --short, git show --name-only, git grep, git ls-files search: rg -n -i "lm studio" or grep -R -n -i "lm studio" . files: rg --files -g " .ts" or git ls-files src examples: rg -n reposhell README.md | sed is not allowed; use one simple command at a time reposhell /repo/openclaw head README.md 🦞 OpenClaw — Personal AI Assistant