<p><picture><img src="https://github.com/user-attachments/assets/47d67430-386d-4675-82ad-d4734d3262d9" alt="TensorZero Logo" width=128 height=128></picture></p>

TensorZero

<p><picture><img src="https://www.tensorzero.com/github-trending-badge.svg" alt="#1 Repository Of The Day"></picture></p>

TensorZero is an open-source stack for industrial-grade LLM applications:

  • Gateway: access every LLM provider through a unified API, built for performance (<1ms p99 latency)
  • Observability: store inferences and feedback in your database, available programmatically or in the UI
  • Optimization: collect metrics and human feedback to optimize prompts, models, and inference strategies
  • Evaluations: benchmark individual inferences or end-to-end workflows using heuristics, LLM judges, etc.
  • Experimentation: ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

Take what you need, adopt incrementally, and complement with other tools.


<p align="center"> <b><a href="https://www.tensorzero.com/" target="_blank">Website</a></b> · <b><a href="https://www.tensorzero.com/docs" target="_blank">Docs</a></b> · <b><a href="https://www.x.com/tensorzero" target="_blank">Twitter</a></b> · <b><a href="https://www.tensorzero.com/slack" target="_blank">Slack</a></b> · <b><a href="https://www.tensorzero.com/discord" target="_blank">Discord</a></b> <br> <br> <b><a href="https://www.tensorzero.com/docs/quickstart" target="_blank">Quick Start (5min)</a></b> · <b><a href="https://www.tensorzero.com/docs/gateway/deployment" target="_blank">Deployment Guide</a></b> · <b><a href="https://www.tensorzero.com/docs/gateway/api-reference" target="_blank">API Reference</a></b> · <b><a href="https://www.tensorzero.com/docs/gateway/configuration-reference" target="_blank">Configuration Reference</a></b> </p>
<table> <tr> <td width="30%" valign="top"><b>What is TensorZero?</b></td> <td width="70%" valign="top">TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluations, and experimentation.</td> </tr> <tr> <td width="30%" valign="top"><b>How is TensorZero different from other LLM frameworks?</b></td> <td width="70%" valign="top"> 1. TensorZero enables you to optimize complex LLM applications based on production metrics and human feedback.<br> 2. TensorZero supports the needs of industrial-grade LLM applications: low latency, high throughput, type safety, self-hosted, GitOps, customizability, etc.<br> 3. TensorZero unifies the entire LLMOps stack, creating compounding benefits. For example, LLM evaluations can be used for fine-tuning models alongside AI judges. </td> </tr> <tr> <td width="30%" valign="top"><b>Can I use TensorZero with ___?</b></td> <td width="70%" valign="top">Yes. Every major programming language is supported. You can use TensorZero with our Python client, any OpenAI SDK or OpenAI-compatible client, or our HTTP API.</td> </tr> <tr> <td width="30%" valign="top"><b>Is TensorZero production-ready?</b></td> <td width="70%" valign="top">Yes. Here's a case study: <b><a href="https://www.tensorzero.com/blog/case-study-automating-code-changelogs-at-a-large-bank-with-llms">Automating Code Changelogs at a Large Bank with LLMs</a></b></td> </tr> <tr> <td width="30%" valign="top"><b>How much does TensorZero cost?</b></td> <td width="70%" valign="top">Nothing. TensorZero is 100% self-hosted and open-source. There are no paid features.</td> </tr> <tr> <td width="30%" valign="top"><b>Who is building TensorZero?</b></td> <td width="70%" valign="top">Our technical team includes a former Rust compiler maintainer, machine learning researchers (Stanford, CMU, Oxford, Columbia) with thousands of citations, and the chief product officer of a decacorn startup. We're backed by the same investors as leading open-source projects (e.g. ClickHouse, CockroachDB) and AI labs (e.g. OpenAI, Anthropic).</td> </tr> <tr> <td width="30%" valign="top"><b>How do I get started?</b></td> <td width="70%" valign="top">You can adopt TensorZero incrementally. Our <b><a href="https://www.tensorzero.com/docs/quickstart">Quick Start</a></b> goes from a vanilla OpenAI wrapper to a production-ready LLM application with observability and fine-tuning in just 5 minutes.</td> </tr> </table>

Features

🌐 LLM Gateway

Integrate with TensorZero once and access every major LLM provider.

  • Access every major LLM provider (API or self-hosted) through a single unified API
  • Infer with streaming, tool use, structured generation (JSON mode), batch, multimodal (VLMs), file inputs, caching, etc.
  • Define prompt templates and schemas to enforce a consistent, typed interface between your application and the LLMs (see the configuration sketch after this list)
  • Satisfy extreme throughput and latency needs, thanks to Rust: <1ms p99 latency overhead at 10k+ QPS
  • Integrate using our Python client, any OpenAI SDK or OpenAI-compatible client, or our HTTP API (use any programming language)
  • Ensure high availability with routing, retries, fallbacks, load balancing, granular timeouts, etc.
  • Soon: embeddings; real-time voice
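Prompt templates and schemas, as mentioned above, are declared in your `tensorzero.toml` configuration. Here's a minimal sketch, loosely based on the Quick Start; the function name and file paths are illustrative, so consult the Configuration Reference for the exact fields:

```toml
# Illustrative sketch of a tensorzero.toml (names and paths are hypothetical)

# A function defines the typed interface between your application and the LLMs.
[functions.generate_haiku]
type = "chat"
# Optional JSON Schema that the structured system input must satisfy.
system_schema = "functions/generate_haiku/system_schema.json"

# A variant is one concrete implementation: a model plus prompt templates.
[functions.generate_haiku.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
# MiniJinja template rendered from the structured input above.
system_template = "functions/generate_haiku/variants/gpt_4o_mini/system_template.minijinja"
```

Because your application sends structured inputs rather than raw strings, you can later swap prompts or models behind the function without touching application code.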
<table> <tr></tr> <!-- flip highlight order --> <tr> <td width="50%" align="center" valign="middle"><b>Model Providers</b></td> <td width="50%" align="center" valign="middle"><b>Features</b></td> </tr> <tr> <td width="50%" align="left" valign="top"> <p> The TensorZero Gateway natively supports: </p> <ul> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/anthropic">Anthropic</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/aws-bedrock">AWS Bedrock</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/aws-sagemaker">AWS SageMaker</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/azure">Azure OpenAI Service</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/deepseek">DeepSeek</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/fireworks">Fireworks</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/gcp-vertex-ai-anthropic">GCP Vertex AI Anthropic</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/gcp-vertex-ai-gemini">GCP Vertex AI Gemini</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/google-ai-studio-gemini">Google AI Studio (Gemini API)</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/hyperbolic">Hyperbolic</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/mistral">Mistral</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/openai">OpenAI</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/together">Together</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/vllm">vLLM</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/xai">xAI</a></b></li> </ul> <p> <em> Need something else? Your provider is most likely supported because TensorZero integrates with <b><a href="https://www.tensorzero.com/docs/gateway/guides/providers/openai-compatible">any OpenAI-compatible API (e.g. Ollama)</a></b>. 
</em> </p> </td> <td width="50%" align="left" valign="top"> <p> The TensorZero Gateway supports advanced features like: </p> <ul> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/retries-fallbacks">Retries & Fallbacks</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations">Inference-Time Optimizations</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/prompt-templates-schemas">Prompt Templates & Schemas</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/experimentation/">Experimentation (A/B Testing)</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/configuration-reference">Configuration-as-Code (GitOps)</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/batch-inference">Batch Inference</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/multimodal-inference">Multimodal Inference (VLMs)</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/inference-caching">Inference Caching</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/metrics-feedback">Metrics & Feedback</a></b></li> <li><b><a href="https://www.tensorzero.com/docs/gateway/guides/episodes">Multi-Step LLM Workflows (Episodes)</a></b></li> <li><em>& a lot more...</em></li> </ul> <p> The TensorZero Gateway is written in Rust 🦀 with <b>performance</b> in mind (&lt;1ms p99 latency overhead @ 10k QPS). See <b><a href="https://www.tensorzero.com/docs/gateway/benchmarks">Benchmarks</a></b>.<br> </p> <p> You can run inference using the <b>TensorZero client</b> (recommended), the <b>OpenAI client</b>, or the <b>HTTP API</b>. </p> </td> </tr> </table> <br> <details open> <summary><b>Usage: Python &mdash; TensorZero Client (Recommended)</b></summary>

You can access any provider using the TensorZero Python client.

  1. Install the Python client: `pip install tensorzero`
  2. Optional: Set up the TensorZero configuration.
  3. Run inference:

```python
from tensorzero import TensorZeroGateway  # or AsyncTensorZeroGateway

with TensorZeroGateway.build_embedded(clickhouse_url="...", config_file="...") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        # Try other providers easily: "anthropic::claude-3-7-sonnet-20250219"
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )
```

See the [Quick Start](https://www.tensorzero.com/docs/quickstart) for more information.

</details> <details> <summary><b>Usage: Python &mdash; OpenAI Client</b></summary>

You can access any provider using the OpenAI Python client with TensorZero.

  1. Install the Python client: `pip install tensorzero`
  2. Optional: Set up the TensorZero configuration.
  3. Run inference:

```python
from openai import OpenAI  # or AsyncOpenAI
from tensorzero import patch_openai_client

client = OpenAI()

patch_openai_client(
    client,
    clickhouse_url="http://chuser:chpassword@localhost:8123/tensorzero",
    config_file="config/tensorzero.toml",
    async_setup=False,
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-4o-mini",
    # Try other providers easily: "tensorzero::model_name::anthropic::claude-3-7-sonnet-20250219"
    messages=[
        {
            "role": "user",
            "content": "Write a haiku about artificial intelligence.",
        }
    ],
)
```

See the [Quick Start](https://www.tensorzero.com/docs/quickstart) for more information.

</details> <details> <summary><b>Usage: JavaScript / TypeScript (Node) &mdash; OpenAI Client</b></summary>

You can access any provider using the OpenAI Node client with TensorZero.

  1. Deploy `tensorzero/gateway` using Docker ([detailed instructions](https://www.tensorzero.com/docs/gateway/deployment)).
  2. Set up the TensorZero configuration.
  3. Run inference:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
});

const response = await client.chat.completions.create({
  model: "tensorzero::model_name::openai::gpt-4o-mini",
  // Try other providers easily: "tensorzero::model_name::anthropic::claude-3-7-sonnet-20250219"
  messages: [
    {
      role: "user",
      content: "Write a haiku about artificial intelligence.",
    },
  ],
});
```

See the [Quick Start](https://www.tensorzero.com/docs/quickstart) for more information.

</details> <details> <summary><b>Usage: Other Languages & Platforms &mdash; HTTP API</b></summary>

TensorZero supports virtually any programming language or platform via its HTTP API.

  1. Deploy `tensorzero/gateway` using Docker ([detailed instructions](https://www.tensorzero.com/docs/gateway/deployment)).
  2. Optional: Set up the TensorZero configuration.
  3. Run inference:

```bash
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Write a haiku about artificial intelligence."
        }
      ]
    }
  }'
```

See the [Quick Start](https://www.tensorzero.com/docs/quickstart) for more information.

</details> <br>

🔍 LLM Observability

Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time &mdash; all using the open-source TensorZero UI.

  • Store inferences and feedback (metrics, human edits, etc.) in your own database (see the metric sketch after this list)
  • Dive into individual inferences or high-level aggregate patterns using the TensorZero UI or programmatically
  • Build datasets for optimization, evaluations, and other workflows
  • Replay historical inferences with new prompts, models, inference strategies, etc.
  • Export OpenTelemetry (OTLP) traces to your favorite general-purpose observability tool
  • Soon: AI-assisted debugging and root cause analysis; AI-assisted data labeling
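Feedback is stored against metrics that you declare in your configuration. A minimal sketch (the metric name `task_success` is made up):

```toml
# Illustrative metric definition in tensorzero.toml (the name is hypothetical)
[metrics.task_success]
type = "boolean"    # or "float", e.g. for ratings
optimize = "max"    # whether higher or lower values are better
level = "inference" # attach feedback to a single inference (vs. a multi-step episode)
```

At runtime, your application reports values for this metric against specific inferences or episodes (e.g. via the client's `feedback` method or the gateway's `/feedback` endpoint), and TensorZero stores them alongside the inference data in your ClickHouse database.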
<table> <tr></tr> <!-- flip highlight order --> <tr> <td width="50%" align="center" valign="middle"><b>Observability » Inference</b></td> <td width="50%" align="center" valign="middle"><b>Observability » Function</b></td> </tr> <tr> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/2cc3cc9a-f33f-4e94-b8de-07522326f80a"></td> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/00ae6605-8fa0-4efd-8238-ae8ea589860f"></td> </tr> </table> <br>

📈 LLM Optimization

Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies &mdash; using the UI or programmatically.

  • Optimize your models with supervised fine-tuning, RLHF, and other techniques
  • Optimize your prompts with automated prompt engineering algorithms like MIPROv2
  • Optimize your inference strategy with dynamic in-context learning, chain of thought, best/mixture-of-N sampling, etc.
  • Enable a feedback loop for your LLMs: a data & learning flywheel turning production data into smarter, faster, and cheaper models
  • Soon: programmatic optimization; synthetic data generation

Model Optimization

Optimize closed-source and open-source models using supervised fine-tuning (SFT) and preference fine-tuning (DPO).

<table> <tr></tr> <!-- flip highlight order --> <tr> <td width="50%" align="center" valign="middle"><b>Supervised Fine-tuning &mdash; UI</b></td> <td width="50%" align="center" valign="middle"><b>Preference Fine-tuning (DPO) &mdash; Jupyter Notebook</b></td> </tr> <tr> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/cf7acf66-732b-43b3-af2a-5eba1ce40f6f"></td> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/a67a0634-04a7-42b0-b934-9130cb7cdf51"></td> </tr> </table>

Inference-Time Optimization

Boost performance by dynamically updating your prompts with relevant examples, combining responses from multiple inferences, and more.
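For example, best-of-N sampling is configured as just another variant that samples several candidate variants and keeps the response an evaluator model judges best. A rough sketch; the function and variant names are illustrative and the exact evaluator fields are an assumption, so see the guide linked below:

```toml
# Illustrative sketch: generate candidates from two existing variants, keep the best
[functions.draft_email.variants.best_of_n]
type = "experimental_best_of_n_sampling"
candidates = ["gpt_4o_mini", "claude_haiku"]  # names of other variants of this function

# The evaluator model that picks the winning candidate (field names are assumptions)
[functions.draft_email.variants.best_of_n.evaluator]
model = "openai::gpt-4o-mini"
```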

<table> <tr></tr> <!-- flip highlight order --> <tr> <td width="50%" align="center" valign="middle"><b><a href="https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#best-of-n-sampling">Best-of-N Sampling</a></b></td> <td width="50%" align="center" valign="middle"><b><a href="https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#mixture-of-n-sampling">Mixture-of-N Sampling</a></b></td> </tr> <tr> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/c0edfa4c-713c-4996-9964-50c0d26e6970"></td> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/75b5bf05-4c1f-43c4-b158-d69d1b8d05be"></td> </tr> <tr> <td width="50%" align="center" valign="middle"><b><a href="https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#dynamic-in-context-learning-dicl">Dynamic In-Context Learning (DICL)</a></b></td> <td width="50%" align="center" valign="middle"><b><a href="https://www.tensorzero.com/docs/gateway/guides/inference-time-optimizations#chain-of-thought-cot">Chain-of-Thought (CoT)</a></b></td> </tr> <tr> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/d8489e92-ce93-46ac-9aab-289ce19bb67d"></td> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/ea13d73c-76a4-4e0c-a35b-0c648f898311" height="320"></td> </tr> </table>

More coming soon...

<br>

Prompt Optimization

Optimize your prompts programmatically using research-driven optimization techniques.

<table> <tr></tr> <!-- flip highlight order --> <tr> <td width="50%" align="center" valign="middle"><b><a href="https://github.com/tensorzero/tensorzero/tree/main/recipes/mipro">MIPROv2</a></b></td> <td width="50%" align="center" valign="middle"><b><a href="https://github.com/tensorzero/tensorzero/tree/main/examples/gsm8k-custom-recipe-dspy">DSPy Integration</a></b></td> </tr> <tr> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/d81a7c37-382f-4c46-840f-e6c2593301db" alt="MIPROv2 diagram"></td> <td width="50%" align="center" valign="middle"> TensorZero comes with several optimization recipes, but you can also easily create your own. This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy, a popular library for automated prompt engineering. </td> </tr> </table>

More coming soon...

<br>

📊 LLM Evaluations

Compare prompts, models, and inference strategies using TensorZero Evaluations &mdash; with support for heuristics and LLM judges.

  • Evaluate individual inferences with static evaluations powered by heuristics or LLM judges (&approx; unit tests for LLMs; see the configuration sketch after this list)
  • Evaluate end-to-end workflows with fully flexible dynamic evaluations (&approx; integration tests for LLMs)
  • Optimize LLM judges just like any other TensorZero function to align them to human preferences
  • Soon: more built-in evaluators; headless evaluations
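To make the static case concrete, here's a rough sketch of how an evaluation might be declared in configuration. The field names are assumptions rather than exact syntax (and `extract_entities` is a made-up function name); consult the Evaluations docs for the authoritative schema:

```toml
# Hypothetical sketch; field names are assumptions, see the TensorZero Evaluations docs
[evaluations.extract_data]
type = "static"
function_name = "extract_entities"  # the TensorZero function being evaluated

# A heuristic evaluator: compare outputs against reference answers in the dataset
[evaluations.extract_data.evaluators.exact_match]
type = "exact_match"

# An LLM judge is itself a TensorZero function, so it can be optimized like any other
[evaluations.extract_data.evaluators.semantic_match]
type = "llm_judge"
output_type = "boolean"
```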
<table> <tr></tr> <!-- flip highlight order --> <tr> <td width="50%" align="center" valign="middle"><b>Evaluation » UI</b></td> <td width="50%" align="center" valign="middle"><b>Evaluation » CLI</b></td> </tr> <tr> <td width="50%" align="center" valign="middle"><img src="https://github.com/user-attachments/assets/f4bf54e3-1b63-46c8-be12-2eaabf615699"></td> <td width="50%" align="left" valign="middle"> <pre><code class="language-bash">docker compose run --rm evaluations \
  --evaluation-name extract_data \
  --dataset-name hard_test_cases \
  --variant-name gpt_4o \
  --concurrency 5</code></pre> <pre><code class="language-bash">Run ID: 01961de9-c8a4-7c60-ab8d-15491a9708e4
Number of datapoints: 100
██████████████████████████████████████ 100/100
exact_match: 0.83 ± 0.03
semantic_match: 0.98 ± 0.01
item_count: 7.15 ± 0.39</code></pre> </td> </tr> </table>

🧪 LLM Experimentation

Ship with confidence with built-in A/B testing, routing, fallbacks, retries, etc.

  • Ship with confidence with built-in A/B testing for models, prompts, providers, hyperparameters, etc. (see the configuration sketch after this list)
  • Enforce principled experiments (RCTs) in complex workflows, including multi-turn and compound LLM systems
  • Soon: multi-armed bandits; AI-managed experiments
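As a minimal sketch, an A/B test is expressed by giving a function multiple variants with traffic weights (the function and variant names here are illustrative):

```toml
# Illustrative sketch: split traffic 50/50 between two model/prompt combinations
[functions.generate_summary.variants.gpt_4o_mini]
type = "chat_completion"
model = "openai::gpt-4o-mini"
weight = 0.5

[functions.generate_summary.variants.claude_haiku]
type = "chat_completion"
model = "anthropic::claude-3-5-haiku-20241022"
weight = 0.5
```

Because inferences and feedback are stored per variant, you can compare variants on your own metrics and promote the winner.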

& more!

Build with an open-source stack well-suited for prototypes but designed from the ground up to support the most complex LLM applications and deployments.

  • Build simple applications or massive deployments with GitOps-friendly orchestration
  • Extend TensorZero with built-in escape hatches, programmatic-first usage, direct database access, and more
  • Integrate with third-party tools: specialized observability and evaluations, model providers, agent orchestration frameworks, etc.
  • Soon: UI playground

Demo

Watch LLMs get better at data extraction in real-time with TensorZero!

Dynamic in-context learning (DICL) is a powerful inference-time optimization available out of the box with TensorZero.
It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
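Configuration-wise, DICL is just another variant type. A rough sketch (the function and model names are illustrative, and the exact field names are assumptions; see the inference-time optimizations guide):

```toml
# Illustrative sketch: embed the input, retrieve similar past examples, add them to the prompt
[functions.extract_data.variants.dicl]
type = "experimental_dynamic_in_context_learning"
model = "openai::gpt-4o-mini"
embedding_model = "openai::text-embedding-3-small"
k = 10  # number of similar historical examples to include
```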

https://github.com/user-attachments/assets/4df1022e-886e-48c2-8f79-6af3cdad79cb

Get Started

Start building today.
The Quick Start shows how easy it is to set up an LLM application with TensorZero.

Questions?
Ask us on Slack or Discord.

Using TensorZero at work?
Email us at hello@tensorzero.com to set up a Slack or Teams channel with your team (free).

Work with us.
We're hiring in NYC.
We'd also welcome open-source contributions!

Examples

We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.

Optimizing Data Extraction (NER) with TensorZero

This example shows how to use TensorZero to optimize a data extraction pipeline.
We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL).
In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task &mdash; at a fraction of the cost and latency &mdash; using a small amount of training data.

Agentic RAG — Multi-Hop Question Answering with LLMs

This example shows how to build a multi-hop retrieval agent using TensorZero.
The agent iteratively searches Wikipedia to gather information, and decides when it has enough context to answer a complex question.

Writing Haikus to Satisfy a Judge with Hidden Preferences

This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste.
You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants.
You'll see progress by fine-tuning the LLM multiple times.

Improving LLM Chess Ability with Best-of-N Sampling

This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.

Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy)

TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows.
But you can also easily create your own recipes and workflows!
This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.

& many more on the way!
