Show HN: Gandalf the Grader
Handshake Research released Gandalf the Grader, an open-source reactive agent-as-judge that evaluates AI agents against binary rubric criteria by operating inside the same environment and using the same tools as the agen…
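Grading against binary rubric criteria can be sketched in a few lines. This is a minimal illustration, not Gandalf's code or API. `Criterion`, `grade`, and the dictionary that stands in for the environment state are hypothetical, and scoring as the fraction of criteria passed is an assumed aggregation. Each check inspects the environment itself rather than the agent's transcript. That loosely mirrors a judge working in the same environment as the agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One binary rubric item: the verdict is pass or fail, with no partial credit."""
    name: str
    # Hypothetical check that inspects the environment state after the agent ran.
    check: Callable[[dict], bool]


def grade(state: dict, rubric: list[Criterion]) -> tuple[dict[str, bool], float]:
    """Return per-criterion verdicts plus the fraction passed (assumed aggregation)."""
    verdicts = {c.name: bool(c.check(state)) for c in rubric}
    score = sum(verdicts.values()) / len(rubric) if rubric else 0.0
    return verdicts, score


if __name__ == "__main__":
    rubric = [
        Criterion("report_created", lambda s: "report.md" in s["files"]),
        Criterion("tests_pass", lambda s: s["test_exit_code"] == 0),
    ]
    # Hypothetical snapshot of the environment the agent left behind.
    state = {"files": ["report.md"], "test_exit_code": 1}
    print(grade(state, rubric))
```

Keeping every criterion binary means each verdict can be checked independently. A reactive judge with tool access could replace a `check` lambda with commands run in the live environment.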
A new open-source protocol called AGENTS-COLLAB.md provides a live handoff layer for projects where multiple AI agents work on the same codebase across different sessions. The specification solves the problem of agents a…
AWS leaders at the Sales, Marketing and Global Services (SMGS) organization now use NarrateAI, an AI-powered conversational assistant built on Amazon Bedrock AgentCore, to access real-time business intelligence through n…
A developer built a Claude skill that automatically files Gmail receipts into Google Drive, organized by year, month, and vendor, with a CSV log entry for each transaction. The skill, stored as a `SKILL.md` file in a loc…
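The year/month/vendor layout and the per-receipt CSV log can be illustrated with a small sketch. It is not the developer's skill, which lives in a `SKILL.md` file and runs through Claude. Gmail and Google Drive access are left out entirely. The helper names, the `Receipts` root folder, zero-padded months, and the log's column order are all assumptions.

```python
import csv
import io
import re
from datetime import date
from pathlib import PurePosixPath
from typing import TextIO


def receipt_folder(received: date, vendor: str) -> PurePosixPath:
    """Hypothetical helper: build a Receipts/<year>/<month>/<vendor> folder path."""
    # Keep only characters that are safe in a folder name.
    safe_vendor = re.sub(r"[^A-Za-z0-9 _.-]", "", vendor).strip()
    if safe_vendor in ("", ".", ".."):
        safe_vendor = "Unknown"  # avoid empty names and path traversal
    return PurePosixPath("Receipts", f"{received.year:04d}",
                         f"{received.month:02d}", safe_vendor)


def append_log_row(log: TextIO, received: date, vendor: str,
                   amount: str, folder: PurePosixPath) -> None:
    """Hypothetical helper: append one CSV row per filed receipt."""
    csv.writer(log).writerow([received.isoformat(), vendor, amount, str(folder)])


if __name__ == "__main__":
    log = io.StringIO()  # stands in for the real CSV log file
    folder = receipt_folder(date(2024, 3, 5), "Acme, Inc.")
    append_log_row(log, date(2024, 3, 5), "Acme, Inc.", "12.50", folder)
    print(folder)
    print(log.getvalue(), end="")
```

The `csv` module quotes the vendor name automatically when it contains a comma. This keeps the log parseable even though the folder name drops that character.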
Google is scrambling to fix widespread bugs and user complaints after replacing the Fitbit app with its new AI-heavy Google Health app, which has been plagued by inaccurate data, mislabeled workouts, missing sleep scores…
ElevenLabs has partnered with Stan Lee Universe to license the late Marvel co-creator's AI-generated voice and likeness for commercial use through its Iconic Voices Marketplace. The deal allows businesses to pay for acce…
Enterprises relying on network policies, API gateways, and role-based access control (RBAC) for AI agent accountability face critical gaps, as these tools were designed for deterministic, human-driven workloads rather th…
Microsoft is rolling out a new "Copilot Design System" to integrate artificial intelligence features more deeply into Windows 11, using language that describes the technology as an "intelligent presence" and "thoughtful …
A developer conducted controlled experiments on CIFAR-10 using a convolutional neural network (CNN) and found that changing only the data preprocessing shifted model accuracy anywhere from 65% to over 87%, with one case dropping to nearly …
FastVideo has open-sourced Dreamverse, a real-time video generation workspace that enables "vibe directing" through natural-language iteration, releasing both the frontend and backend as a reference application for gener…
A developer released a small Ruby prototype for an OpenAI-compatible LLM proxy that enforces per-user rate limits using a refillable token bucket system. The proxy, built entirely with Ruby standard libraries and no exte…
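A per-user refillable token bucket can be sketched as follows. The original prototype is written in Ruby. This is an illustrative Python version written under assumptions, not a port of that code. `TokenBucket`, `PerUserLimiter`, the default cost of one token per request, and keying buckets by a user string such as an API key are all hypothetical. Refill happens lazily on each request based on elapsed time, so no background timer is needed.

```python
import threading
import time
from typing import Callable, Dict


class TokenBucket:
    """Holds up to `capacity` tokens and regains `refill_per_sec` tokens per second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity  # start full so a new user can burst up to capacity
        self.updated = clock()

    def try_consume(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Lazy refill: credit tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject the request


class PerUserLimiter:
    """Keeps one TokenBucket per user key (e.g. an API key); the keying scheme is an assumption."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}
        self._lock = threading.Lock()  # proxies usually serve requests concurrently

    def allow(self, user: str, cost: float = 1.0) -> bool:
        with self._lock:
            bucket = self._buckets.get(user)
            if bucket is None:
                bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
                self._buckets[user] = bucket
            return bucket.try_consume(cost)


if __name__ == "__main__":
    limiter = PerUserLimiter(capacity=3, refill_per_sec=0.5)
    for i in range(5):
        print(i, limiter.allow("alice"))
```

A proxy would call `allow(user)` before forwarding a request upstream and reject the request when it returns `False`. HTTP 429 is a common convention for that rejection, but the source does not specify it. The injectable `clock` makes the refill behavior easy to test deterministically.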
Google's AI-generated answers now dominate search results, leaving brands with little visibility into how the technology describes them to customers. On TechCrunch's Equity podcast, Scrunch VP of partnerships Matt Thomps…
A test of five search engines and ChatGPT across five common queries found that no tool produced consistently good results, with all failing to surface correct, non-spam information in the top results more than half the …
A developer has outlined a structured prompting strategy for AI coding tools that, they say, produces production-ready code in seconds rather than broken output requiring hours of cleanup. The approach breaks prompts into three…
Micron Technology and SK Hynix both surpassed $1 trillion in stock market value this week, driven by surging demand for chips used in artificial intelligence. Micron reached the milestone just 48 days after hitting $500 …
Meta shares rose nearly 3% Wednesday after the company announced Meta One, a suite of paid subscriptions for Instagram, Facebook, and WhatsApp, priced at $3.99 and $2.99 per month respectively, alongside two AI tiers at …
Microsoft released MAI-Image-2.5, an update to its image generation model that now ranks third on Arena's text-to-image leaderboard, tying with Google's Nano Banana 2. The model delivers significant improvements in text …
AI coding agents interact with developer tools like SDKs, CLIs, and APIs through a multi-step process that differs significantly from human usage, often bypassing documentation or extensions in favor of pre-trained knowl…
A developer argues that most software is fundamentally workflow design, not feature design, and that users care more about completing tasks than about the underlying technology. The developer explains that successful pro…
A researcher claims that giving Anthropic's AI coding agent Claude Code a simulated "ADHD" doubled its thinking speed, but outside experts are demanding more rigorous proof of the results. The experimental modificatio…