An eval is just a test that returns a value
A developer created apte, an async-first test framework for Python that unifies unit tests and LLM evals in the same engine with shared fixtures. The framework uses typed dependencies with Annotated a…
LS-SIEM LLP, the company behind the LightShield SIEM, developed qa-probe, an open-source tool that stops AI coding assistants from hallucinating bug fixes by providing ground-truth evidence. The tool analyzes sou…
A developer argues that assembling context for large language models is fundamentally a data assembly problem, not a prompt engineering one. The author introduces pydantic-resolve as a tool to structu…
AI and modern tooling are reshaping geospatial work by enabling free satellite data, cloud platforms, and machine learning models to process imagery at scale. Practical workflows now use open-source l…
Prompt engineering is not disappearing but evolving into context engineering, argues a developer who builds AI workflows. As AI models improve, the need for clear, structured prompts grows, especially…
A developer argues that the software industry is measuring engineers with outdated metrics like lines of code or number of commits, while AI tools have made code generation cheap. The real value now l…
PowWater, a startup building an operating system for water delivery, is expanding to Mexico City after processing billions of liters in Kenya. The company uses a Python/FastAPI/PostgreSQL stack and in…
A developer argues that building custom API wrappers for ML models is a waste of time, advocating instead for using the Model Context Protocol (MCP) to connect AI agents directly to deployed models. T…
A developer built a production-grade API for a 3B text-to-SQL model using FastAPI and Ollama, enabling natural language queries against SQLite databases at zero inference cost. The API, part of the de…
A developer applied Specmatic's spec-first approach to TRIO, a multi-agent AI assistant built with FastAPI and React. Contract testing with OpenAPI 3.0 caught mismatches in response fields, status cod…
A backend engineer who initially dismissed DeepSeek now routes 40% of LLM traffic through DeepSeek V4 Flash after stress-testing it on production workloads. The model delivers 97% of GPT-4o's reasonin…
An ML engineer built a system that automatically rolls back machine learning models before they harm production data by routing a small percentage of live traffic to a canary model, continuously measu…
A developer built a production-ready LangGraph ReAct agent that exposes an OpenAI-compatible API, supports multi-model switching via a gateway, and includes one-line tracing with Langfuse. The deploym…
IBM released CUGA, an open-source agent harness that handles orchestration, state management, and tool integration, allowing developers to build agentic apps with just a tool list and a prompt. The co…
MemoryOps AI launched open-source governed memory infrastructure for AI assistants, implementing a ChatGPT-style memory lifecycle with policy enforcement, typed storage, hybrid retrieval, and audit…
Aegis, an open-source OpenAI-compatible governance proxy, uses a two-path model with Python's FastAPI for the hot path and a Rust extension via PyO3 for cryptographic operations to mitigate GIL conten…
A developer released graphlens, an open-source code-analysis framework that parses source projects into a typed graph with resolved references. The tool uses language-specific adapters and resolvers t…
Koji, a self-hostable personal website engine for developers, has been released. It uses FastAPI, Markdown on disk, and a minimalist layout with no database, offering features like HTMX-based live sea…
Conduit released v0.8.4 of its self-hosted Bitcoin Lightning payment infrastructure for AI agents, enabling operators to run their own LND node with virtual sub-balances, spending policies, and a plat…
A developer building a memory agent for the Global AI Hackathon Series discovered that implementing effective forgetting is harder than expected. The agent, built with Qwen Cloud, FastAPI, Neon Postgr…