
grep -l @gemma /news/*.json | wc -l → 76

Gemma

mentions 76 · type Organization

sameAs · en.wikipedia.org

// recent coverage 76 mentions

18:03

2026-06-20

hackster.io

ai-tools

Offline AI Voice Assistant on Raspberry Pi 4 with Gemma

A developer built a fully offline voice assistant on a Raspberry Pi 4 or 5 using local AI models. The device records audio, processes it with Whisper for speech-to-text, runs a local language model vi…

23:06

2026-06-19

lector.dev

large-language-models

Show HN: Evaluating Local LLMs as language translators for my app

A reproducible benchmark of 24 local, self-hosted, and cloud LLMs for translation into English found that a local 18 GB model (gemma-4-12b-qat) performs on par with frontier cloud models on Afrikaans→…

20:25

2026-06-19

lmsys.org

large-language-models

The next generation of speculative decoding: DFlash and Spec V2

Modal and Z Lab released DFlash, a speculative decoding model for Qwen 3.5 397B-A17B, achieving over 4.3x throughput versus baseline and 1.5x versus MTP on HumanEval at concurrency 1. The model uses a…

18:13

2026-06-19

dev.to

developer-tools

I Wired OpenRouter Free Models Into My OpenClaw Fallback Chain. Here's What Actually Works.

A developer fixed a broken fallback chain in their OpenClaw agent that was causing request timeouts during peak hours. The new chain includes seven entries: two local Ollama models, three OpenRouter f…

04:00

2026-06-19

arxiv.org

large-language-models

Ensembles of Large Language Models for Identifying EQ-5D Studies in PubMed Based on Their Abstracts

Researchers developed an ensemble of large language models, including Google's Gemini and Gemma, to automatically identify EQ-5D health-related quality-of-life studies in PubMed from abstracts. The we…

01:53

2026-06-18

letsdatascience.com

large-language-models

Google releases OpenRL for LLM fine-tuning

Google released OpenRL, an open-source API for fine-tuning large language models on Kubernetes clusters, aiming to decouple infrastructure from AI research and improve GPU utilization by running multi…

00:00

2026-06-18

mindstudio.ai

large-language-models

How to Run Local AI Models with Ollama: A Beginner's Setup Guide for 2026

Ollama, an open-source tool for running large language models locally, offers a beginner-friendly setup for 2026 with privacy, cost savings, and data control. Users can install it on macOS, Windows, o…

03:56

2026-06-17

dev.to

large-language-models

How much VRAM do you actually need to run Llama 3 or Gemma locally?

A developer calculated the actual VRAM requirements for running Llama 3 8B and Gemma 2 9B locally, revealing that the KV cache can consume far more memory than the model weights, especially at longer …

00:00

2026-06-17

runagentrun.co.uk

ai-infrastructure

OpenRouter fans prompts to match Claude Fable 5

OpenRouter launched Fusion, a routing layer that sends a single prompt to multiple AI models in parallel and synthesizes their outputs, achieving performance comparable to Anthropic's Claude Fable 5 a…

14:16

2026-06-16

byteiota.com

large-language-models

Local LLMs vs Claude for Coding: The 70% Problem

A Hacker News thread on June 16 revealed that local LLMs like Qwen 3.6 35B-A3B handle about 70% of daily coding tasks but fall short on complex multi-file reasoning, creating a gap akin to a junior ve…

00:00

2026-06-16

tomtunguz.com

large-language-models

5x for Free : The Local Coding Stack

A Hacker News thread reveals that local AI coding models, led by Qwen 3.6 35B-A3B and the Pi harness, are increasingly replacing cloud-based tools like Claude and GPT, offering privacy, zero cost, and off…

14:46

2026-06-15

news.ycombinator.com

large-language-models

Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Hacker News users discuss replacing cloud-based AI coding assistants like Claude and GPT with local or self-hosted models, sharing setups and performance metrics. Some report success with models like …

13:00

2026-06-15

vettedconsumer.com

large-language-models

The KV Cache, Explained: Why Long Context Eats Your VRAM (and How to Fit More)

The KV cache, a memory store for attention keys and values, grows linearly with context length and can exceed model weights in VRAM usage, causing out-of-memory errors for local LLM users. At 32k cont…

22:29

2026-06-14

arxiv.org

large-language-models

Still: Amortized KV Cache Compaction in a Single Forward Pass

Researchers introduced Still, a per-layer Perceiver model that compacts KV cache in a single forward pass, enabling efficient long-context language model deployment. On Qwen and Gemma models, Still ou…

07:01

2026-06-14

coles.codes

large-language-models

Local Models in Mid-2026

Open-weight local language models in mid-2026 have nearly matched frontier performance for everyday tasks, driven by engineering advances in sparse attention and mixture-of-experts architectures. Deep…

00:00

2026-06-12

mindstudio.ai

artificial-intelligence

Diffusion Language Models Explained: How Google's Diffusion Gemma Works

Google released Diffusion Gemma in early 2025 as its first open-weight diffusion language model, using a masked diffusion approach that generates text by starting with noise and iteratively refining i…

22:21

2026-06-11

idlemachines.co.uk

artificial-intelligence

DiffusionGemma: Discrete diffusion in a large language model

DeepMind released DiffusionGemma, a new large language model that uses discrete diffusion to generate entire sequences in parallel instead of autoregressive token-by-token generation. The model achiev…

17:50

2026-06-11

lesswrong.com

large-language-models

Failing to Ragebait the New Gemma

Researchers from the SPAR Research Fellowship found that Google's Gemma 4 language model shows significantly reduced emotional instability compared to its predecessor, Gemma 3, which frequently exhibi…

14:00

2026-06-11

coles.codes

large-language-models

Local models in mid-2026: the engineering that closed the gap

Local large language models have nearly caught up to frontier models for everyday tasks as of mid-2026, driven by engineering advances in sparse attention and mixture-of-experts architectures that red…

20:00

2026-06-10

simonwillison.net

large-language-models

DiffusionGemma

Google released DiffusionGemma, a new open-weight AI model under the Apache 2 license, available on Hugging Face. NVIDIA is hosting the model for free on its NIM cloud API, where it generated 2,409 to…


// co-occurs with top 8 entities

Qwen 33 · Google 18 · Ollama 17 · DeepSeek 14 · Llama 10 · Gemma 4 9 · Gemini 8 · NVIDIA 7