
grep -l @cuda /news/*.json | wc -l → 194

CUDA

mentions: 194 · type: Organization

// recent coverage 194 mentions

13:01

2026-07-28

github.com

artificial-intelligence

Adaptive speculative decoding on a $300 GPU

A developer achieved up to 9.27× speedup on code editing tasks using adaptive speculative decoding on a €300 RTX 5060 GPU, with a 0.6B draft model nearly doubling math and JSON throughput. The project…

08:17

2026-07-28

probablydance.com

artificial-intelligence

If AI Writes All the Code, What Do the Programmers Do?

Eight months ago, software engineer Malte Skarupke wrote roughly 90% human code and 10% AI code; now his code is about 90% AI-generated. In a recent optimization of a matrix-multiply kernel using Nvid…

07:53

2026-07-28

marktechpost.com

artificial-intelligence

Deploying a 1-Bit Bonsai-27B Model with PrismML llama.cpp and OpenAI-Compatible Local Inference Workflows

PrismML released the 1-bit Bonsai-27B language model, deployable via a specialized fork of llama.cpp with CUDA kernels for the Q1_0_g128 GGUF format. The model requires only ~5.2 GB peak memory at 4K …

06:38

2026-07-28

runtimewire.com

large-language-models

Fermion Research publishes 3.88 GB Neutrino-1 8B for local inference

Fermion Research has published Neutrino-1 8B, an 8.19-billion-parameter language model packaged in a 3.88 GB file that fits on an 8 GB GPU or 16 GB laptop, using a proprietary ternary-family format th…

04:00

2026-07-28

machinebrief.com

large-language-models

LOCKS: Page-Local Compact Key Summaries for Efficient Long-Context Decoding

LOCKS, a new method from arXiv, enables efficient long-context decoding by giving each page of the key-value cache its own compact spectral summary, reconstructing within-page logits, and attending on…

20:30

2026-07-27

cryptobriefing.com

artificial-intelligence

China bets on broader AI strategy beyond chips, and crypto markets should pay attention

China's $295 billion AI infrastructure plan, announced in June 2026, aims to build a nationwide network of AI data centers with 80% domestic technology by 2028, sidestepping US chip export controls. T…

18:28

2026-07-27

promptcube3.com

artificial-intelligence

Kimi K3 Weights: Initial Deployment Notes

A developer deploying the Kimi K3 model encountered a CUDA out-of-memory error caused by KV cache allocation during initial inference passes, not the model weights themselves. The developer resolved t…

16:52

2026-07-27

i-programmer.info

artificial-intelligence

Programming Massively Parallel Processors, 5th Ed. (Morgan Kaufmann)

The 5th edition of 'Programming Massively Parallel Processors' by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj, published by Morgan Kaufmann, introduces new chapters on filtering, wavefront parall…

02:02

2026-07-27

promptcube3.com

artificial-intelligence

Gemma Model Deployment: Handling VRAM Spikes

A developer reports that deploying Google's Gemma model causes a CUDA out-of-memory error during the weights loading sequence, with 18.1GB already allocated on a 24GB GPU. The spike occurs during the …

11:48

2026-07-26

pub.towardsai.net

artificial-intelligence

How I Fit a Model That “Shouldn’t” Fit on a 6GB Laptop GPU

A self-study AI engineer demonstrates how to fit a large language model on a 6GB laptop GPU using quantization, a technique that reduces model precision to enable fine-tuning on consumer hardware. The…

07:23

2026-07-26

snipvote.com

artificial-intelligence

AMD publishes machine-readable ISA so frontier models can write its GPU kernels

AMD has published a machine-readable ISA for its Instinct GPUs and partnered with Anthropic and OpenAI to let frontier models natively write and optimize low-level kernels, claiming a 38% inference sp…

04:22

2026-07-26

xcancel.com

ai-infrastructure

Now that Huang believes in open source, looking forward to CUDA open sourcing

Nvidia CEO Jensen Huang now supports open source, raising expectations for CUDA to be open-sourced, according to a post on X.…

23:03

2026-07-25

promptcube3.com

artificial-intelligence

AMD ISA: Why Machine-Readable Specs Change GPU Programming

AMD's move to provide machine-readable Instruction Set Architecture (ISA) specifications could transform GPU programming by enabling LLM agents to directly generate optimized kernels, bypassing the ne…

22:43

2026-07-25

github.com

developer-tools

Nvprobe – Open-source, zero-setup CLI for CUDA benchmarks

Nvprobe, an open-source, zero-setup CLI tool for CUDA benchmarks, has been released on GitHub. The tool automates CUDA workloads including HPL, HPCG, MLPerf inference, and custom kernels, and generate…

18:08

2026-07-25

marktechpost.com

developer-tools

Designing High-Performance GPU Kernels with TileLang: Tensor-Core GEMM, Fused Softmax, FlashAttention, and Autotuning

TileLang, a high-level Python domain-specific language for designing GPU kernels through TVM, enables developers to implement tensor-core GEMM, fused softmax, FlashAttention, and autotuning while mana…

17:02

2026-07-25

promptcube3.com

machine-learning

My Machine Learning Internship: Theory vs. Real-World Deployment

A machine learning intern recounts that the biggest challenge in transitioning from a local environment to deployment was dependency conflicts, specifically a PyTorch version mismatch with CUDA driver…

13:27

2026-07-25

officechai.com

ai-policy

Anthropic Researcher Calls Out NVIDIA, Microsoft For Signing Open AI Letter, Asks Them To Open-Source CUDA And Microsoft Office

Anthropic researcher Julian Schrittwieser mocked NVIDIA and Microsoft for signing an open AI letter, calling on them to open-source CUDA and Microsoft Office. Schrittwieser highlighted the irony of hi…

04:04

2026-07-25

promptcube3.com

artificial-intelligence

Inflect v2: Running TTS under 10M Parameters

Inflect v2, a text-to-speech model under 10 million parameters, achieves competitive performance with 4.395 UTMOS22 and 3.99% semantic WER for the Micro version (9.36M parameters) and 4.386 UTMOS22 wi…

21:55

2026-07-24

machinebrief.com

artificial-intelligence

AMD vibe codes its way past the CUDA moat with ROCm.AI

AMD at its Advancing AI event in San Francisco unveiled ROCm.AI, a platform that uses frontier AI models to automatically optimize GPU kernels and inference performance on AMD Instinct hardware, claim…

17:48

2026-07-24

promptcube3.com

artificial-intelligence

Jensen Huang's New X Account: Why It Matters for AI

NVIDIA CEO Jensen Huang's new X account signals a shift toward real-time, decentralized communication with the developer community, potentially accelerating feedback loops on AI hardware and software …


// co-occurs with top 8 entities

NVIDIA (80) · PyTorch (37) · Nvidia (32) · AMD (24) · llama.cpp (18) · vLLM (16) · ROCm (14) · Vulkan (14)

// topics top 6 topics

ai infrastructure (139) · artificial intelligence (100) · developer tools (84) · ai tools (77) · machine learning (74) · large language models (68)