grep -l @gguf /news/*.json | wc -l → 32

GGUF

mentions: 32 · type: Organization · page: 2/2

// recent coverage (32 mentions)

08:42

2026-06-05

deemwar-products.github.io

large-language-models

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

Mochallama, a new Java library, enables running llama.cpp inference directly within a Java process using JDK 22's Foreign Function and Memory (FFM) API, eliminating the need for separate daemon proces…

00:29

2026-06-04

github.com

large-language-models

TensorSharp: Open-Source Local LLM Inference Engine

TensorSharp, a new open-source C# inference engine, now enables developers to run large language models locally using GGUF files. The engine supports multiple model architectures including Gemma 4, Qw…

06:03

2026-06-03

dev.to

large-language-models

llama.cpp b9455 Finally Caught vLLM: 70t/s on 2x3090 Qwen 27B UQ8

A Reddit user reported that llama.cpp build b9455 achieved 67-81 tokens per second on a dual RTX 3090 setup running Unsloth's Qwen3.6-27B-UD-Q8_K_XL model, matching the speed of vLLM for multi-GPU inf…

02:24

2026-05-31

dev.to

large-language-models

Anti Refusal LLM Service

A developer built Cerberus AI, a 12MB desktop application using Tauri and Rust that runs uncensored language models locally. The app auto-detects GPU VRAM, pulls appropriate model quantizations, and u…

16:00

2026-05-27

dev.to

large-language-models

Why your quantized LLM loses its MTP heads and how to keep them

A developer discovered that standard quantization pipelines for large language models silently discard multi-token prediction (MTP) heads, causing speculative decoding speedups to vanish despite the b…

04:27

2026-05-26

github.com

large-language-models

Ollama v0.30.0-rc23: "directly support llama.cpp" & "compatibility with GGUF"

Ollama released version 0.30.0-rc23, a pre-release that shifts the software's architecture to directly support llama.cpp instead of building on top of GGML, and adds compatibility with the GGUF file f…

18:48

2026-05-23

dev.to

large-language-models

GGUF & Modelfile: The Power User's Guide to Local LLMs

The article explains how power users can download GGUF (GPT-Generated Unified Format) model files directly from Hugging Face, quantize them (using Q4_K_M as the optimal balance of size and quality), a…

09:21

2026-05-23

dev.to

artificial-intelligence

How to Build a Self-Hosted AI Code Review Tool in Python

This article provides a guide for building a self-hosted AI code review tool in Python using Ollama and a locally-run language model like CodeLlama or DeepSeek-Coder. The tool reads a git diff, sends …

01:14

2026-05-20

dev.to

large-language-models

Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

This article compares three dominant tools for local LLM inference in 2026: Ollama, llama.cpp, and vLLM. Ollama is recommended for personal, non-technical use due to its ease of setup, while llama.cpp…

20:09

2026-05-18

dev.to

hardware

Building llama.cpp from source on a Dell Precision T5820 with an RTX 3090 Ti (after seven power cycles)

A detailed, step-by-step guide to building llama.cpp from source on a Dell Precision T5820 workstation equipped with an RTX 3090 Ti, reaching 42 tok/s with Qwen3.6-27B. The author emphasizes that the …

00:00

2026-05-14

nobodywho.ooo

large-language-models

What's in a GGUF, besides the weights - and what's still missing?

The GGUF file format consolidates all necessary model components—including weights, chat templates, and special tokens—into a single file, offering a more ergonomic alternative to the scattered JSON f…

23:30

2026-05-01

gist.github.com

large-language-models

Transplant MTP block from one GGUF file into another

A developer has released a Python script that transplants extra tensors—such as Multi-Token Prediction (MTP) layers—from one GGUF file into another, enabling the creation of mixed-quantization models.…


// co-occurs with (top 8 entities, by mention count)

llama.cpp 14 · Ollama 12 · Hugging Face 9 · NVIDIA 7 · OpenAI 5 · CUDA 4 · Apple Silicon 4 · vLLM 3