Deepfake Detector Robustness Testing
A new benchmark dataset, the Social Media Robustness Benchmark, evaluates how deepfake detectors perform on images re-encoded by social media platforms like Instagram, Facebook, TikTok, and X. The dat…
Five research labs built a multi-model finance simulation game where each of four woodland creature agents runs on a different lab's small language model, with a human player acting as a shadow financ…
A team of developers has built an AI-powered job search tool that uses a fine-tuned Qwen3-8B language model to generate LinkedIn queries, scrape job postings, and score each role against a candidate's…
Persona Atlas, a tool developed during the "build-small" hackathon, transforms public figures into measurable behavioral portraits by having a small-model agent research them online and answer open-en…
A team of developers built Thousand Token Wood, a multi-agent economic simulation where five AI-powered woodland creatures trade goods using a 3-billion-parameter Qwen2.5-3B model. The simulation, cre…
NVIDIA released Nemotron 3.5 Content Safety, a single 4-billion-parameter model that unifies multimodal input, multilingual coverage across 140 languages, custom enterprise policy enforcement, and aud…
NVIDIA released Nemotron 3.5 ASR, a 600M-parameter streaming multilingual speech-to-text model that transcribes 40 language-locales from a single checkpoint with built-in punctuation and capitalizatio…
ServiceNow released EVA-Bench Data 2.0, expanding its enterprise voice agent benchmark from one domain to three—Airline Customer Service Management, Enterprise IT Service Management, and Healthcare HR…
NVIDIA researchers developed a task-seeded synthetic Q&A generation workflow for Nemotron-family pretraining that uses public task training splits as capability seeds to generate new task-aligned exam…
A new population-scale synthetic dataset for El Salvador has been released, containing over 8,000 multilingual, region-specific personas with detailed demographic, professional, and cultural attribute…
Hugging Face redesigned its `hf` command-line interface to optimize it for AI coding agents, which now account for significant traffic on the Hub. The new CLI auto-detects when an agent is driving it …
Dharma-AI released DharmaOCR, a structured OCR model, and published a paper demonstrating that Direct Preference Optimization (DPO) reduced text degeneration rates by an average of 59.4% across all te…
Pollen Robotics released remote tool support for the Reachy Mini robot, allowing users to add third-party capabilities like web search and weather lookups with a single command. The new system enables…
Holo3.1, a new family of computer-use agents, is now available with improved robustness across web, desktop, and mobile environments. The release introduces quantized checkpoints for local inference, …
JetBrains released Mellum2, a 12-billion-parameter Mixture-of-Experts model trained on natural language and code that activates only 2.5 billion parameters per token for efficient inference. The open-…
IBM's research demonstrates that large language models alone are insufficient for scalable enterprise AI adoption, requiring "agent logic" — software primitives like knowledge graphs and program analy…
NVIDIA released Cosmos 3, the first open omni-model for physical AI reasoning and action, on Hugging Face on June 1, 2026. The single unified model combines world generation, physical reasoning, and acti…
PyTorch released a beginner's guide to its torch.profiler tool, starting with profiling a simple matrix multiplication and addition operation on an A100 GPU. The guide walks through reading profiler t…
Artificial Analysis and IBM Research launched ITBench-AA, the first benchmark for agentic enterprise IT tasks, revealing that all frontier AI models scored below 50% on Site Reliability Engineering ch…
A new free tool called TruthLens, described as a multi-signal deepfake image detector, has been released, but its landing page on Hugging Face returns a 404 error. The tool was announced on Hacker News…