grep -l @deepseek-v3 /news/*.json | wc -l → 17

DeepSeek-V3

mentions 17 type Organization

// recent coverage 17 mentions

02:34

2026-07-08

technode.com

ai-chips

DeepSeek begins in-house AI chip development to cut reliance on NVIDIA, sources say

Chinese AI startup DeepSeek has begun developing its own AI chips focused on inference workloads to reduce reliance on NVIDIA and cut costs, sources told Reuters. The project is in early stages and ha…

15:38

2026-07-07

vettedconsumer.com

large-language-models

Kimi K2.7 Code: The Open Trillion-Parameter Coder, and the 594GB Reality of Running It Locally

Moonshot AI released Kimi K2.7 Code, a 1-trillion-parameter open-source coding model under a near-MIT license, but its 594GB size at 4-bit quantization makes local deployment impractical for most user…

00:00

2026-06-27

outofcontext.dev

large-language-models

Why a 30B model can run like a 3B: dense vs MoE for running models locally

Mixture-of-Experts (MoE) models like Qwen3-30B-A3B and DeepSeek-V3 separate total parameters (memory) from active parameters (compute), allowing a 30B-parameter model to run at the speed of a 3B model…

15:01

2026-06-20

dev.to

artificial-intelligence

Building Cost-Effective AI Workflows: Open Source + Paid Tools Done Right

A developer outlines a cost-effective AI workflow combining open-source local models with paid APIs, achieving a monthly cost of $20-30 plus initial hardware. The approach uses DeepSeek-V3 via Ollama …

15:11

2026-06-16

developer.nvidia.com

artificial-intelligence

NVIDIA Blackwell Tops MLPerf Training 6.0 with Industry-Leading Scale and Performance

NVIDIA swept MLPerf Training v6.0 benchmarks, achieving the fastest training times at scale and highest per-accelerator performance across all tests, including new DeepSeek-V3 and GPT-OSS-20B workload…

15:00

2026-06-16

blogs.nvidia.com

artificial-intelligence

Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0

NVIDIA's Blackwell platform swept MLPerf Training 6.0, achieving the fastest training times across all seven benchmarks, scaling to 8,192 GPUs on DeepSeek-V3 671B, and delivering up to 1.6x performanc…

17:54

2026-06-15

dev.to

large-language-models

The Budget Guide to Prompt Engineering: Save Money with Every Token

A developer's guide to budget prompt engineering reveals that maximizing information density while minimizing token count can achieve premium-tier productivity from budget models like GPT-4.1-mini, De…

16:45

2026-06-15

developer.nvidia.com

machine-learning

Boosting MoE Training Throughput with Advanced Fusion Kernels

NVIDIA introduced advanced fused MLP kernels for mixture-of-experts (MoE) models, built with the CuTe DSL, delivering 1.3x–2x kernel-level speedups and enabling sync-free MoE execution. The optimizati…

13:16

2026-06-15

prahladyeri.github.io

large-language-models

Applying Brevity and Language Efficiency in Prompt Engineering

Prahlad Yeri published a guide on prompt engineering for budget-tier AI models, targeting developers and students in cost-sensitive markets like Bangalore and Jakarta. The article teaches structured p…

04:00

2026-06-12

arxiv.org

artificial-intelligence

Language-Guided Abstraction for Visual Reasoning

Researchers have developed L-VARC, a novel framework that enhances visual reasoning on the Abstraction and Reasoning Corpus (ARC) by integrating a language-guided Learning Using Privileged Information…

21:46

2026-06-11

vettedconsumer.com

artificial-intelligence

Mixture-of-Experts (MoE), Explained: Why “Active Parameters” Decide What Runs on Your Machine

Mixture-of-Experts (MoE) architecture allows large language models to use only a fraction of their total parameters for each token, enabling models with 671 billion total parameters to run at speeds c…

14:37

2026-06-06

dev.to

large-language-models

Cut 70%+ LLM API Expense with Qwen-Turbo & DeepSeek: Real Pricing & Optimization Case

A developer built a cost-saving solution combining Qwen-Turbo and DeepSeek series APIs, cutting total token costs up to 72% without reducing response quality. The system uses task-based model routing,…

04:00

2026-06-05

arxiv.org

large-language-models

ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

Researchers have developed ReasoningFlow, a framework that maps the non-linear discourse structures of large reasoning model (LRM) traces into directed acyclic graphs (DAGs) to improve evaluation and …

16:00

2026-05-27

dev.to

large-language-models

Why your quantized LLM loses its MTP heads and how to keep them

A developer discovered that standard quantization pipelines for large language models silently discard multi-token prediction (MTP) heads, causing speculative decoding speedups to vanish despite the b…

03:32

2026-05-27

dev.to

large-language-models

I built a Rust inference engine that streams MoE expert weights from NVMe SSDs, no GPU required

A developer built Micro-Expert-Router, a Rust inference engine that streams Mixture-of-Experts model weights directly from NVMe SSDs using io_uring with O_DIRECT, eliminating the need for GPU VRAM. Th…

03:12

2026-05-27

metaworld.me

ai-research

Finding deadlocks in CuTe kernels with SPIN

Researchers at the FlashInfer MLSYS Challenge developed a formal verification method using the SPIN model checker to detect deadlocks in CuTe DSL kernels running on NVIDIA B200 GPUs. The approach, dem…

13:14

2026-05-23

dev.to

large-language-models

Multi-Head Latent Attention (MLA)

Multi-Head Latent Attention (MLA) is an attention mechanism used in DeepSeek-V2/V3 and Kimi K2.x models that compresses the Key-Value (KV) cache by projecting full KV pairs into a shared,…

// co-occurs with top 8 entities

NVIDIA (5), Mixtral (2), Blackwell (2), NVLink (2), DeepSeek-R1 (2), DeepSeek (2), Claude (2), GPT-4.1-mini (2)