GGUF

mentions: 32 · type: Organization · page 2/2

// recent coverage 32 mentions

08:42
2026-06-05
deemwar-products.github.io
large-language-models

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

Mochallama, a new Java library, enables running llama.cpp inference directly within a Java process using JDK 22's Foreign Function and Memory (FFM) API, eliminating the need for separate daemon proces…

00:29
2026-06-04
github.com
large-language-models

TensorSharp: Open-Source Local LLM Inference Engine

TensorSharp, a new open-source C# inference engine, now enables developers to run large language models locally using GGUF files. The engine supports multiple model architectures including Gemma 4, Qw…

02:24
2026-05-31
dev.to
large-language-models

Anti Refusal LLM Service

A developer built Cerberus AI, a 12MB desktop application using Tauri and Rust that runs uncensored language models locally. The app auto-detects GPU VRAM, pulls appropriate model quantizations, and u…

16:00
2026-05-27
dev.to
large-language-models

Why your quantized LLM loses its MTP heads and how to keep them

A developer discovered that standard quantization pipelines for large language models silently discard multi-token prediction (MTP) heads, causing speculative decoding speedups to vanish despite the b…

18:48
2026-05-23
dev.to
large-language-models

GGUF & Modelfile: The Power User's Guide to Local LLMs

The article explains how power users can download GGUF (GPT-Generated Unified Format) model files directly from Hugging Face, quantize them (using Q4_K_M as the optimal balance of size and quality), a…

09:21
2026-05-23
dev.to
artificial-intelligence

How to Build a Self-Hosted AI Code Review Tool in Python

This article provides a guide for building a self-hosted AI code review tool in Python using Ollama and a locally-run language model like CodeLlama or DeepSeek-Coder. The tool reads a git diff, sends …

01:14
2026-05-20
dev.to
large-language-models

Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026?

This article compares three dominant tools for local LLM inference in 2026: Ollama, llama.cpp, and vLLM. Ollama is recommended for personal, non-technical use due to its ease of setup, while llama.cpp…

00:00
2026-05-14
nobodywho.ooo
large-language-models

What's in a GGUF, besides the weights - and what's still missing?

The GGUF file format consolidates all necessary model components—including weights, chat templates, and special tokens—into a single file, offering a more ergonomic alternative to the scattered JSON f…

23:30
2026-05-01
gist.github.com
large-language-models

Transplant MTP block from one GGUF file into another

A developer has released a Python script that transplants extra tensors—such as Multi-Token Prediction (MTP) layers—from one GGUF file into another, enabling the creation of mixed-quantization models.…

// co-occurs with top 8 entities