

GGUF

32 mentions · type: Organization · page 1/2

// recent coverage 32 mentions

00:00

2026-07-04

dev.to

large-language-models

Mastering Local Deployment of SOTA LLMs: Jamesob’s Guide to Overcoming Resource Constraints

Jamesob's guide provides developers with actionable strategies to deploy state-of-the-art large language models locally on consumer-grade hardware. The framework covers model quantization, pruning, ef…

10:13

2026-07-01

github.com

artificial-intelligence

Transcribe.cpp – ggml speech-to-text inference engine

Transcribe.cpp, a C/C++ speech-to-text inference library, has been released supporting 16 model families and 60+ variants via GGUF models on the ggml runtime. It offers Metal, Vulkan, and CUDA backend…

05:44

2026-06-28

github.com

large-language-models

KoboldCPP: Run GGUF Models Easily with a KoboldAI UI. One File. Zero Install

KoboldCpp, a single-file executable for running GGUF and GGML AI models, has been released by developer LostRuins. The tool requires no installation and supports text generation, image generation, spe…

00:08

2026-06-27

dev.to

large-language-models

What building an LLM inference engine from scratch taught me about compiler design

A developer built ignis, a from-scratch LLM inference engine in Rust with only two dependencies, to explore how compiler design principles apply to inference. The engine uses SSA IR, fusion passes, an…

20:06

2026-06-25

dev.to

large-language-models

Your Local LLM Is Not as Private as You Think

Cyera Research disclosed a critical vulnerability in Ollama, a popular tool for running large language models locally. Tracked as CVE-2026-7482 with a CVSS score of 9.1, the flaw allows attackers to l…

17:36

2026-06-25

devclubhouse.com

large-language-models

Quantize and Run Llama 3.2 on Apple Silicon with llama.cpp

Mariana Souza published a tutorial on quantizing and running Meta's Llama 3.2 3B model on Apple Silicon using llama.cpp with Metal GPU acceleration, achieving local inference with Q4_K_M quantization.…

09:46

2026-06-24

github.com

artificial-intelligence

GELab-Zero: Android automation framework for multimodal LLMs

GELab-Zero, an open-source Android automation framework for multimodal LLMs, has been released, featuring a 4B GUI agent model and plug-and-play engineering infrastructure with no cloud dependencies. …

02:20

2026-06-21

discuss.huggingface.co

machine-learning

FLUX.1-schnell on 8GB VRAM (AMD, no CUDA): the GGUF format mismatch that wastes hours

A developer discovered that FLUX.1-schnell GGUF models from city96's repository fail to load in stable-diffusion.cpp on AMD RX 580 8GB via Vulkan, while leejet's builds work. The error message 'new_sd…

10:14

2026-06-20

dev.to

ai-safety

Cool AI Projects That Failed: The File Integrity Gap

A developer identified a recurring failure mode in local AI projects: teams assume model artifacts like .gguf and .safetensors files are self-documenting and safe to consume without inspection. To add…
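One inexpensive inspection is to confirm that a .gguf file actually begins with a valid GGUF header before passing it to a runtime. The sketch below is a minimal example, not the approach from the article. It assumes the little-endian GGUF v2+ header layout used by ggml: the 4-byte magic `GGUF`, a uint32 version, then uint64 tensor and metadata key/value counts. It validates only this fixed header, not the metadata or tensor data. The names `read_gguf_header` and `GGUFHeader` are hypothetical.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Parse the fixed-size GGUF header (v2+ layout, little-endian) from a binary stream."""
    magic = f.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    raw = f.read(4)
    if len(raw) < 4:
        raise ValueError("truncated header: missing version")
    (version,) = struct.unpack("<I", raw)
    if version < 2:
        # GGUF v1 used 32-bit counts; this sketch handles only the v2+ layout.
        raise ValueError(f"unsupported GGUF version {version}")
    raw = f.read(16)
    if len(raw) < 16:
        raise ValueError("truncated header: missing counts")
    tensor_count, kv_count = struct.unpack("<QQ", raw)
    return GGUFHeader(version, tensor_count, kv_count)


if __name__ == "__main__":
    # Use a synthetic in-memory header instead of a real model file.
    fake = io.BytesIO(GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24))
    print(read_gguf_header(fake))
```

For a real file, open it with `open(path, "rb")` and pass the handle in. A failed check should stop the load rather than be logged and ignored.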

23:15

2026-06-18

unsloth.ai

artificial-intelligence

Unsloth: Easily run and train models locally

Unsloth launched Unsloth Studio, a desktop application for Mac and Windows that runs AI models offline, supporting GGUF and Safetensors formats with tool-calling, web search, and an OpenAI-compatible …

14:05

2026-06-18

dev.to

large-language-models

Quantized LoRA Adapters for On-Device LLMs: Hot-Swapping Task-Specific Behaviors on Android Without Reloading the Base Model

A developer demonstrates a technique for hot-swapping QLoRA adapters on Android devices, enabling task-specific LLM behaviors without reloading the base model. By loading a single 4-bit quantized base…

06:55

2026-06-17

github.com

large-language-models

Native Inference Engine for macOS 14 or newer

Embershard, a macOS chat app with its own LLM inference engine, has been released in beta v0.1.1 for Apple Silicon devices running macOS 14 or newer. The app bypasses llama.cpp for inference, instead …

14:03

2026-06-14

discuss.huggingface.co

large-language-models

Slopsome.com - a free VRAM fit-calculator + real tokens/sec database for local LLMs

Slopsome.com launched a free VRAM fit-calculator and real tokens-per-second database for local LLMs, enabling users to check if a model runs on specific GPUs with given quantization and context length…

23:50

2026-06-13

llama-cpp.com

large-language-models

Llama.cpp – Run LLM Inference in C/C++

Llama.cpp is an open-source C/C++ library that enables running large language model inference locally on consumer hardware, supporting multiple platforms and GPU backends. It automatically optimizes e…

11:34

2026-06-13

vettedconsumer.com

artificial-intelligence

Show HN: Quant Picker – which GGUF file fits your model and machine

Quant Picker is a new tool that calculates which GGUF quantization level fits a given model and machine, balancing file size, quality, and context budget. It recommends the highest quantization that l…
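At its core, this kind of fit check is arithmetic on bits per weight: weight size ≈ parameters × bits per weight / 8. The sketch below is a minimal illustration, not Quant Picker's actual method. Its bits-per-weight values follow from llama.cpp's block layouts for the base quant types, and it does not model mixed presets such as Q4_K_M. `reserve_gb` is a hypothetical fixed headroom standing in for the KV cache and runtime overhead. Sizes are in decimal GB.

```python
from typing import Optional

# Bits per weight implied by each block layout (block bytes * 8 / weights per block).
# Mixed presets (e.g. *_S, *_M) combine several block types and are not modeled here.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,     # 34 bytes per 32 weights
    "Q6_K": 6.5625,  # 210 bytes per 256 weights
    "Q5_K": 5.5,     # 176 bytes per 256 weights
    "Q4_K": 4.5,     # 144 bytes per 256 weights
    "Q3_K": 3.4375,  # 110 bytes per 256 weights
    "Q2_K": 2.625,   # 84 bytes per 256 weights
}


def weight_size_gb(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GB (10**9 bytes)."""
    return n_params * bpw / 8 / 1e9


def pick_quant(n_params: float, memory_gb: float, reserve_gb: float) -> Optional[str]:
    """Return the highest-precision quant whose weights plus a fixed reserve fit in memory."""
    by_precision = sorted(BITS_PER_WEIGHT.items(), key=lambda kv: kv[1], reverse=True)
    for name, bpw in by_precision:
        if weight_size_gb(n_params, bpw) + reserve_gb <= memory_gb:
            return name
    return None  # nothing fits, even at the lowest precision


if __name__ == "__main__":
    # An 8B-parameter model on an 8 GB card with 1.5 GB held back for context.
    print(pick_quant(8e9, 8.0, 1.5))  # Q5_K
```

A fixed reserve keeps the example short. Because KV-cache size grows with context length, a real tool would compute the reserve from the model's architecture rather than hard-code it.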

04:02

2026-06-12

vettedconsumer.com

large-language-models

Ollama vs LM Studio vs llama.cpp: Which Local LLM Runtime Should You Actually Use?

A comparison of three local large language model runtimes reveals that llama.cpp is the core inference engine, while Ollama and LM Studio are user-friendly wrappers built on top of it. Ollama offers a…

00:34

2026-06-11

vettedconsumer.com

large-language-models

Your Local AI Model Folder Is a Mess: Taming a Multi-Terabyte Model Hoard on Apple Silicon

Local AI users are accumulating hundreds of gigabytes of model files, including multiple quantizations of the same model, image and video generation weights, and RAG datasets, often with duplicates th…

14:58

2026-06-06

vettedconsumer.com

large-language-models

GGUF vs. GPTQ vs. AWQ: The Plain-English Guide to LLM Quantization

GGUF, GPTQ, and AWQ are the three dominant formats for running quantized large language models locally, each optimized for different hardware and use cases. GGUF, the format used by llama.cpp and its …

03:22

2026-06-06

dev.to

large-language-models

Run Gemma-4 12B on WSL2 with llama.cpp

A developer has published a guide for running Google's Gemma-4 12B instruction-tuned model on Windows Subsystem for Linux 2 (WSL2) using the llama.cpp framework. The process involves installing build …

12:25

2026-06-05

dev.to

ai-agents

I kept using Claude Code. Added one thing to it. Cut AI engineering costs by 62%.

A developer benchmarked two speech-to-text models on a CPU-only Azure VM and found that using an AI agent to plan and execute the task cut costs by 62%, from $1.96 to $0.74. The agent, Neo, achieved t…


// top 8 co-occurring entities

llama.cpp (14) · Ollama (12) · Hugging Face (9) · NVIDIA (7) · OpenAI (5) · CUDA (4) · Apple Silicon (4) · vLLM (3)