GPT-OSS

mentions: 7 · type: Organization

// recent coverage 7 mentions

14:48

2026-06-25

pytorch.org

large-language-models

TokenSpeed-Kernel: Portable APIs and High-Performance Kernels for Multi-Silicon LLM Inference

LightSeek Org open-sourced TokenSpeed-Kernel, a portable API and high-performance kernel subsystem for multi-silicon LLM inference, decoupling runtime from hardware-specific code to simplify backend c…

05:52

2026-06-20

github.com

machine-learning

Release 4.0.0 · HuggingFace/Transformers.js

HuggingFace released Transformers.js v4, a major update featuring a new WebGPU backend rewritten in C++ for faster AI model inference in browsers, Node, Bun, and Deno. The release adds support for lar…

22:06

2026-06-19

marktechpost.com

artificial-intelligence

VibeThinker-3B: A 3B Dense Reasoning Model Built on Qwen2.5-Coder-3B With the Spectrum-to-Signal Post-Training Pipeline

Researchers from Sina Weibo Inc (China) released VibeThinker-3B, a 3-billion-parameter dense reasoning model built on Qwen2.5-Coder-3B using the Spectrum-to-Signal post-training pipeline. The open-sou…

18:13

2026-06-19

dev.to

developer-tools

I Wired OpenRouter Free Models Into My OpenClaw Fallback Chain. Here's What Actually Works.

A developer fixed a broken fallback chain in their OpenClaw agent that was causing request timeouts during peak hours. The new chain includes seven entries: two local Ollama models, three OpenRouter f…

16:45

2026-06-15

developer.nvidia.com

machine-learning

Boosting MoE Training Throughput with Advanced Fusion Kernels

NVIDIA introduced advanced fused MLP kernels for mixture-of-experts (MoE) models, built with the CuTe DSL, delivering 1.3x–2x kernel-level speedups and enabling sync-free MoE execution. The optimizati…

14:45

2026-06-05

ianbarber.blog

large-language-models

Somehow, more on distillation

Microsoft AI released a detailed technical report on the development of its first model, MAI-Thinking-1, emphasizing a controlled, reproducible training process built on human-generated data and propr…

00:00

2026-05-10

jola.dev

large-language-models

Running local models on an M4 with 24GB memory

The article describes the author's successful setup for running local AI models on an M4 Mac with 24GB of memory, specifically highlighting Qwen 3.5-9B (Q4 quantized) as the best performing model at ~…

// co-occurs with top 8 entities

NVIDIA (3), Ollama (2), Qwen (2), Gemma (2), GLM-5 (2), llama.cpp (1), LM Studio (1), Devstral (1)