grep -l @qwen3-4b-instruct /news/*.json | wc -l → 1

Qwen3-4B-Instruct

mentions: 1, type: Organization

// recent coverage: 1 mention

04:00
2026-06-29
arxiv.org
large-language-models

Tandem Reinforcement Learning with Verifiable Rewards

Researchers propose Tandem Reinforcement Learning (TRL), extending the tandem training paradigm to reinforcement learning with verifiable rewards (RLVR). Training Qwen3-4B-Instruct on competition math…
