AI OSS tool repo goes archived overnight after raising $7.3M seed

TensorZero, an open-source LLMOps platform that raised $7.3M in seed funding, archived its repository overnight. The platform provides tools for LLM gateway, observability, evaluation, optimization, and experimentation, and is used by companies from AI startups to Fortune 10 firms.

TensorZero is an open-source LLMOps platform that unifies:

- Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency)
- Observability: store inferences and feedback in your database, available programmatically or in the UI
- Evaluation: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
- Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies
- Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

You can take what you need, adopt incrementally, and complement with other tools. It plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM provider.

TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and fuels ~1% of global LLM API spend today.

[Docs](https://www.tensorzero.com/docs) · [Twitter](https://www.x.com/tensorzero) · [Slack](https://www.tensorzero.com/slack) · [Discord](https://www.tensorzero.com/discord)

[Quick Start (5min)](https://www.tensorzero.com/docs/quickstart) · [Deployment Guide](https://www.tensorzero.com/docs/deployment/tensorzero-gateway) · [API Reference](https://www.tensorzero.com/docs/gateway/api-reference) · [Configuration Reference](https://www.tensorzero.com/docs/gateway/configuration-reference)

Note: TensorZero Autopilot is an automated AI engineer powered by TensorZero that analyzes LLM observability data, sets up evals, optimizes prompts and models, and runs A/B tests.
It dramatically improves the performance of LLM agents across diverse tasks.

Integrate with TensorZero once and access every major LLM provider.

- [Call any LLM](https://www.tensorzero.com/docs/gateway/call-any-llm) (API or self-hosted) through a single unified API
- Infer with [tool use](https://www.tensorzero.com/docs/gateway/guides/tool-use), [structured outputs (JSON)](https://www.tensorzero.com/docs/gateway/generate-structured-outputs), [batch](https://www.tensorzero.com/docs/gateway/guides/batch-inference), [embeddings](https://www.tensorzero.com/docs/gateway/generate-embeddings), [multimodal (images, files)](https://www.tensorzero.com/docs/gateway/call-llms-with-image-and-file-inputs), [caching](https://www.tensorzero.com/docs/gateway/guides/inference-caching), etc.
- [Create prompt templates and schemas](https://www.tensorzero.com/docs/gateway/create-a-prompt-template) to enforce a structured interface between your application and the LLMs
- Satisfy extreme throughput and latency needs, thanks to 🦀 Rust: [<1ms p99 latency overhead at 10k+ QPS](https://www.tensorzero.com/docs/gateway/benchmarks)
- [Ensure high availability](https://www.tensorzero.com/docs/gateway/guides/retries-fallbacks) with routing, retries, fallbacks, load balancing, granular timeouts, etc.
- [Track usage and cost](https://www.tensorzero.com/docs/operations/track-usage-and-cost) and [enforce custom rate limits](https://www.tensorzero.com/docs/operations/enforce-custom-rate-limits) with granular scopes (e.g. tags)
- [Set up auth for TensorZero](https://www.tensorzero.com/docs/operations/set-up-auth-for-tensorzero) to allow clients to access models without sharing provider API keys

Supported providers: Anthropic, [AWS Bedrock](https://www.tensorzero.com/docs/gateway/guides/providers/aws-bedrock), [AWS SageMaker](https://www.tensorzero.com/docs/gateway/guides/providers/aws-sagemaker), [Azure](https://www.tensorzero.com/docs/gateway/guides/providers/azure), [DeepSeek](https://www.tensorzero.com/docs/gateway/guides/providers/deepseek), [Fireworks](https://www.tensorzero.com/docs/gateway/guides/providers/fireworks), [GCP Vertex AI Anthropic](https://www.tensorzero.com/docs/gateway/guides/providers/gcp-vertex-ai-anthropic), [GCP Vertex AI Gemini](https://www.tensorzero.com/docs/gateway/guides/providers/gcp-vertex-ai-gemini), [Google AI Studio (Gemini API)](https://www.tensorzero.com/docs/gateway/guides/providers/google-ai-studio-gemini), [Groq](https://www.tensorzero.com/docs/gateway/guides/providers/groq), [Hyperbolic](https://www.tensorzero.com/docs/gateway/guides/providers/hyperbolic), [Mistral](https://www.tensorzero.com/docs/gateway/guides/providers/mistral), [OpenAI](https://www.tensorzero.com/docs/gateway/guides/providers/openai), [OpenRouter](https://www.tensorzero.com/docs/gateway/guides/providers/openrouter), [SGLang](https://www.tensorzero.com/docs/gateway/guides/providers/sglang), [TGI](https://www.tensorzero.com/docs/gateway/guides/providers/tgi), [Together AI](https://www.tensorzero.com/docs/gateway/guides/providers/together), [vLLM](https://www.tensorzero.com/docs/gateway/guides/providers/vllm), and [xAI (Grok)](https://www.tensorzero.com/docs/gateway/guides/providers/xai).

Need something else? TensorZero also supports any OpenAI-compatible API (e.g. Ollama).

You can use TensorZero with any OpenAI SDK (Python, Node, Go, etc.) or OpenAI-compatible client.

- [Deploy the TensorZero Gateway](https://www.tensorzero.com/docs/deployment/tensorzero-gateway) (one Docker container).
- Update the `base_url` and `model` in your OpenAI-compatible client.
- Run inference:

```python
from openai import OpenAI

# Point the client to the TensorZero Gateway
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    # Call any model provider (or TensorZero function)
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": "Share a fun fact about TensorZero.",
        }
    ],
)
```

See the [Quick Start](https://www.tensorzero.com/docs/quickstart) for more information.

Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time, all using the open-source TensorZero UI.

- Store inferences and [feedback (metrics, human edits, etc.)](https://www.tensorzero.com/docs/gateway/guides/metrics-feedback) in your own database
- Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically
- [Build datasets](https://www.tensorzero.com/docs/gateway/api-reference/datasets-datapoints) for optimization, evaluation, and other workflows
- Replay historical inferences with new prompts, models, inference strategies, etc.
- [Export OpenTelemetry traces (OTLP)](https://www.tensorzero.com/docs/operations/export-opentelemetry-traces) and [export Prometheus metrics](https://www.tensorzero.com/docs/operations/export-prometheus-metrics) to your favorite application observability tools
- Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling

Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies, using the UI or programmatically.
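As a rough illustration of sending feedback programmatically, the sketch below builds a JSON body that ties a metric value to one stored inference and posts it to the gateway. It assumes the gateway runs at `http://localhost:3000` and accepts `POST /feedback` with `inference_id`, `metric_name`, and `value` fields. The endpoint path, the field names, the metric name `thumbs_up`, and the placeholder inference ID are all assumptions made for illustration, so check them against the metrics and feedback guide before relying on them.

```python
import json
import urllib.request

# Assumption: the gateway listens here and exposes POST /feedback.
GATEWAY_URL = "http://localhost:3000"


def build_feedback_payload(inference_id: str, metric_name: str, value) -> dict:
    """Build the JSON body that ties a metric value to one stored inference."""
    if not inference_id:
        raise ValueError("inference_id is required")
    if not metric_name:
        raise ValueError("metric_name is required")
    return {"inference_id": inference_id, "metric_name": metric_name, "value": value}


def send_feedback(payload: dict, gateway_url: str = GATEWAY_URL) -> dict:
    """POST the payload to the gateway and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{gateway_url}/feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical metric name and inference ID, for illustration only.
    payload = build_feedback_payload(
        inference_id="00000000-0000-0000-0000-000000000000",
        metric_name="thumbs_up",
        value=True,
    )
    # Printing instead of sending keeps the sketch runnable without a gateway.
    print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the network call makes the body easy to validate in tests. Call `send_feedback(payload)` once a gateway is running and the metric has been configured.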
- Optimize your models with [supervised fine-tuning](https://www.tensorzero.com/docs/optimization/supervised-fine-tuning-sft), RLHF, and other techniques
- Optimize your prompts with automated prompt engineering algorithms like [GEPA](https://www.tensorzero.com/docs/optimization/gepa)
- Optimize your [inference strategy](https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations) with [dynamic in-context learning](https://www.tensorzero.com/docs/optimization/dynamic-in-context-learning-dicl), best/mixture-of-N sampling, etc.
- Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models
- Soon: synthetic data generation

Compare prompts, models, and inference strategies using evaluations powered by heuristics and LLM judges.

- [Evaluate individual inferences](https://www.tensorzero.com/docs/evaluations/inference-evaluations/tutorial) with inference evaluations powered by heuristics or LLM judges (≈ unit tests for LLMs)
- [Evaluate end-to-end workflows](https://www.tensorzero.com/docs/evaluations/workflow-evaluations/tutorial) with workflow evaluations with complete flexibility (≈ integration tests for LLMs)
- Optimize LLM judges just like any other TensorZero function to align them to human preferences
- Soon: more built-in evaluators; headless evaluations

Evaluation » CLI

```bash
docker compose run --rm evaluations \
  --evaluation-name extract_data \
  --dataset-name hard_test_cases \
  --variant-name gpt_4o \
  --concurrency 5
```

```
Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4
Number of datapoints: 100
██████████████████████████████████████ 100/100
exact_match: 0.83 ± 0.03 (n=100)
semantic_match: 0.98 ± 0.01 (n=100)
item_count: 7.15 ± 0.39 (n=100)
```

Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

- [Run adaptive A/B tests](https://www.tensorzero.com/docs/experimentation/run-adaptive-ab-tests) to ship with confidence and identify the best prompts and models for your use cases.
- Enforce principled experiments in complex workflows, including support for multi-turn LLM systems, sequential testing, and more.

Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.

- Build simple applications or massive deployments with GitOps-friendly orchestration
- [Extend TensorZero](https://www.tensorzero.com/docs/operations/extend-tensorzero) with built-in escape hatches, programmatic-first usage, direct database access, and more
- Integrate with third-party tools: specialized observability and evaluations, model providers, agent orchestration frameworks, etc.
- Iterate quickly by experimenting with prompts interactively using the Playground UI

How is TensorZero different from other LLM frameworks?

- TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.
- TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.
- TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges.

Can I use TensorZero with ___?

Yes. Every major programming language is supported. It plays nicely with the OpenAI SDK, OpenTelemetry, and every major LLM provider.

Is TensorZero production-ready?

Yes. TensorZero is used by companies ranging from frontier AI startups to the Fortune 10 and powers ~1% of the global LLM API spend today. Here's a case study: [Automating Code Changelogs at a Large Bank with LLMs](https://www.tensorzero.com/blog/case-study-automating-code-changelogs-at-a-large-bank-with-llms)

How much does TensorZero cost?

The TensorZero LLMOps platform is 100% self-hosted and open-source.
TensorZero Autopilot (an automated AI engineer) is a complementary paid product powered by TensorZero.

Who is building TensorZero?

Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic). See our $7.3M seed round announcement and [coverage from VentureBeat](https://venturebeat.com/ai/tensorzero-nabs-7-3m-seed-to-solve-the-messy-world-of-enterprise-llm-development/). We're [hiring in NYC](https://www.tensorzero.com/jobs).

How do I get started?

You can adopt TensorZero incrementally. Our Quick Start goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.

Start building today. The Quick Start shows it's easy to set up an LLM application with TensorZero.

Questions? Ask us on [Slack](https://www.tensorzero.com/slack) or [Discord](https://www.tensorzero.com/discord).

Using TensorZero at work? Email us at hello@tensorzero.com to set up a Slack or Teams channel with your team (free).

We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.

Optimizing Data Extraction (NER) with TensorZero

This example shows how to use TensorZero to optimize a data extraction pipeline. We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL). In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task, at a fraction of the cost and latency, using a small amount of training data.

Agentic RAG — Multi-Hop Question Answering with LLMs

This example shows how to build a multi-hop retrieval agent using TensorZero. The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.
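The retrieval loop described in the Agentic RAG example can be pictured with a small sketch: the agent alternates between deciding what to do and searching, and stops once it chooses to answer or runs out of hops. This is not the example's actual code. `decide` and `search` are hypothetical stand-ins for an LLM call and a Wikipedia search tool, and the toy corpus exists only to make the loop runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentState:
    question: str
    notes: list[str] = field(default_factory=list)


# A decision is either ("search", query) or ("answer", text).
Decision = tuple[str, str]


def run_agent(
    question: str,
    decide: Callable[[AgentState], Decision],
    search: Callable[[str], str],
    max_hops: int = 5,
) -> Optional[str]:
    """Alternate between deciding and searching until the agent answers."""
    state = AgentState(question)
    for _ in range(max_hops):
        action, argument = decide(state)
        if action == "answer":
            return argument
        # Any other action is treated as a search; keep the result as context.
        state.notes.append(search(argument))
    return None  # hop budget exhausted without an answer


# Toy stand-ins (hypothetical): a two-document corpus and a scripted policy.
TOY_CORPUS = {
    "Marie Curie": "Marie Curie was born in Warsaw.",
    "Warsaw": "Warsaw is the capital of Poland.",
}


def toy_search(query: str) -> str:
    return TOY_CORPUS.get(query, "")


def toy_decide(state: AgentState) -> Decision:
    if not state.notes:
        return ("search", "Marie Curie")
    if len(state.notes) == 1:
        return ("search", "Warsaw")
    return ("answer", "Poland")


if __name__ == "__main__":
    question = "Which country's capital was Marie Curie born in?"
    print(run_agent(question, toy_decide, toy_search))
```

The `max_hops` bound is a design choice in this sketch: it guarantees termination even if the decision step never chooses to answer.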
Writing Haikus to Satisfy a Judge with Hidden Preferences

This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see progress by fine-tuning the LLM multiple times.

Image Data Extraction — Multimodal (Vision) Fine-tuning

This example shows how to fine-tune multimodal models (VLMs) like GPT-4o to improve their performance on vision-language tasks. Specifically, we'll build a system that categorizes document images (screenshots of computer science research papers).

Improving LLM Chess Ability with Best-of-N Sampling

This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.

We write about LLM engineering on the TensorZero Blog. Here are some of our favorite posts:

- [Bandits in your LLM Gateway: Improve LLM Applications Faster with Adaptive Experimentation (A/B Testing)](https://www.tensorzero.com/blog/bandits-in-your-llm-gateway/)
- [Is OpenAI's Reinforcement Fine-Tuning (RFT) Worth It?](https://www.tensorzero.com/blog/is-openai-reinforcement-fine-tuning-rft-worth-it/)
- [Distillation with Programmatic Data Curation: Smarter LLMs, 5-30x Cheaper Inference](https://www.tensorzero.com/blog/distillation-programmatic-data-curation-smarter-llms-5-30x-cheaper-inference/)
- [From NER to Agents: Does Automated Prompt Engineering Scale to Complex Tasks?](https://www.tensorzero.com/blog/from-ner-to-agents-does-automated-prompt-engineering-scale-to-complex-tasks/)
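For readers unfamiliar with the best-of-N sampling mentioned in the chess example, here is a generic sketch of the idea: generate several candidates, score each one, and keep the highest-scoring candidate. It is not TensorZero's implementation. `generate` and `score` are hypothetical stand-ins for an LLM call and a judge, and the toy moves and scores are made up for illustration.

```python
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    n: int = 5,
) -> str:
    """Generate n candidates for the prompt and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt, i) for i in range(n)]
    # max() returns the first candidate on ties, so the result is deterministic.
    return max(candidates, key=lambda candidate: score(prompt, candidate))


# Toy stand-ins (hypothetical): a fixed list of moves and made-up scores.
TOY_MOVES = ["a3", "Nf3", "e4", "h4"]
TOY_SCORES = {"a3": 0.1, "Nf3": 0.6, "e4": 0.7, "h4": 0.05}


def toy_generate(prompt: str, sample_index: int) -> str:
    return TOY_MOVES[sample_index % len(TOY_MOVES)]


def toy_score(prompt: str, candidate: str) -> float:
    return TOY_SCORES[candidate]


if __name__ == "__main__":
    print(best_of_n("Choose an opening move.", toy_generate, toy_score, n=4))
```

Larger values of `n` trade extra inference cost for a better chance that at least one candidate is strong, which is why the quality of the scoring step matters as much as the generator.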