grep -l @gpu /news/*.json | wc -l → 87

GPU

mentions: 87 · type: Organization · page: 4/5

// recent coverage 87 mentions

02:47

2026-06-06

dev.to

artificial-intelligence

What Is Ollama? The Complete Guide to Running LLMs Locally in 2026

Ollama, an open-source runtime for large language models, enables users to run models locally on Mac, Windows, or Linux with a single command, eliminating the need for cloud dependencies or complex en…

01:20

2026-06-06

arxiv.org

machine-learning

Unlocking Non-Uniform KV Cache for Efficient Multi-Turn LLM Serving

Researchers introduced Tangram, a serving system that enables non-uniform Key-Value cache compression for multi-turn large language model inference. The system uses deterministic budget allocation, he…

00:19

2026-06-06

dev.to

ai-startups

How to build a credit system for a Next.js AI app (Stripe + Supabase)

A developer has created a credit-based billing system for Next.js AI applications using Stripe and Supabase, solving the problem of flat-rate pricing that can lead to margin erosion from heavy users. …

08:53

2026-06-05

letsdatascience.com

artificial-intelligence

AI-Powered Worm Exploits Stolen Compute to Infect Mixed Devices

Researchers published a proof-of-concept AI-driven worm that embeds an open-weight LLM on compromised GPUs to autonomously scan, exploit, and propagate across Linux, Windows, and IoT devices. The worm…

00:31

2026-06-04

dev.to

ai-agents

Your Agent Has a Memory That Runs While You Sleep

A developer built a continuous AI agent memory system called `akm improve` that runs autonomously on local hardware, processing 14,189 memories across 48 scheduled runs in 24 hours with zero failures.…

00:00

2026-06-04

spidra.io

developer-tools

What is WebGL fingerprinting and how to bypass it when scraping

WebGL fingerprinting is a hardware-level anti-bot technique that uses GPU rendering differences to create unique device identifiers, making it difficult to spoof compared to software-level signals. Th…

04:51

2026-06-03

arxiv.org

machine-learning

GPU Forecasters: Language Models as Selective Surrogates for Kernel Optimization

Researchers have demonstrated that large language models can serve as selective surrogates for GPU kernel performance evaluation, accurately forecasting kernel runtime without requiring repeated compi…

01:39

2026-06-02

marktechpost.com

machine-learning

How to Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp

NVIDIA Apex's FusedAdam optimizer and FusedLayerNorm normalization layers can accelerate Transformer training by up to 30% compared to standard PyTorch implementations, according to benchmark tests. T…

06:00

2026-05-31

dev.to

ai-agents

Multi-Agent Negotiation Protocols: How AI Agents Should Bargain for Resources

A developer has found that centralized resource scheduling, like Kubernetes-style limits, wastes 30-40% of compute capacity in AI agent swarms by failing to account for agents' real-time utility needs…

00:00

2026-05-31

cefboud.com

large-language-models

Exploring Speculative Decoding: From Concept to Implementation

Speculative decoding optimizes LLM inference by using a cheap draft model to predict multiple tokens, which are then verified in a single forward pass of the target model, reducing memory-bandwidth bo…

23:18

2026-05-30

categoryvc.com

ai-chips

AI Hardware

Modern GPUs spend most of their time during AI inference waiting for data, as memory bandwidth cannot keep pace with compute throughput. This fundamental bottleneck has driven the AI hardware market, …

18:26

2026-05-29

dev.to

ai-infrastructure

I Built a Complete AI Infrastructure Stack from Scratch — Here's What I Learned

A developer built a complete AI infrastructure stack from scratch, including a crash-recovery storage engine, an LLM inference cache, a Kubernetes-based training orchestrator, and an async AI data pip…

03:52

2026-05-29

dev.to

neural-networks

Tensors Explained Part 2: Why Tensors Are Useful

Tensors enable hardware acceleration by leveraging GPUs and TPUs to perform parallel mathematical operations efficiently, making them essential for training neural networks. They also support automati…

19:50

2026-05-28

arxiv.org

artificial-intelligence

SIA: Self Improving AI with Harness and Weight Updates

Researchers have developed SIA, a self-improving AI system that updates both its own software scaffolding and internal model weights without human intervention, combining two previously separate appro…

18:39

2026-05-28

letsdatascience.com

machine-learning

MoE Transforms Open Model Ecosystem Costs

Mixture of Experts (MoE) models are reshaping the economics of open-model deployments by reducing GPU inference costs and altering serving stack requirements. The shift toward MoE architectures in 202…

11:53

2026-05-28

github.com

large-language-models

Why LLM decode is memory-bound, not compute-bound

LLM inference costs 100x more than traditional machine learning inference because autoregressive generation requires a separate forward pass through the entire model for each output token. A Llama 3.1…

15:39

2026-05-27

pytorch.org

large-language-models

Up to 580tps! New Speed Record of Qwen3.5-397B-A17B on GPU for Agentic Workloads with TokenSpeed

TokenSpeed, an open-source inference engine, achieved a record-breaking 580 tokens per second running the Qwen3.5-397B-A17B model on GPUs. The performance gain for agentic workloads comes from elimina…

17:39

2026-05-26

aws.amazon.com

generative-ai

Build high-performance generative AI systems with Strands Agents, NVIDIA NIM, and Amazon Bedrock AgentCore

NVIDIA, Amazon Web Services, and Strands have launched a multi-agent generative AI system that combines NVIDIA NIM for GPU-accelerated inference, Amazon Bedrock AgentCore for managed runtime and share…

12:51

2026-05-26

klongpy.org

machine-learning

KlongPy: PyTorch Back End and Autograd

KlongPy now supports a PyTorch backend that enables GPU acceleration and automatic differentiation for gradient-based computations. The torch backend outperforms NumPy by up to 8x on large arrays and …

16:35

2026-05-24

thedeepview.com

artificial-intelligence

How the compute crisis is defining the next stage of AI

Lambda Chief Commercial Officer Robert Brooks IV argued that computing power is becoming one of the most strategically important resources in the AI economy, with his company building supercomputers f…

// co-occurs with top 8 entities

PyTorch 11 · LLM 9 · CPU 8 · CUDA 7 · vLLM 6 · NVIDIA 6 · HBM 6 · TPU 5