<div align='center'>
<h1>
The Easiest Way to Deploy Agents, MCP Servers, RAG, Pipelines, and Any Model.
<br/>
No MLOps. No YAML.
</h1>
<img alt="Lightning" src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_banner2.png" width="800px" style="max-width: 100%;">
</div>
LitServe lets you serve any model (vision, audio, text) and build full AI systems - agents, chatbots, MCP servers, RAG, pipelines - with full control, batching, multi-GPU, streaming, custom logic, and multi-model support, all without YAML. Unlike most serving engines that serve one model with rigid abstractions, LitServe gives you the flexibility to build complex AI systems.
Self-host, or deploy in one click to [Lightning AI](https://lightning.ai/).
<div align='center'>
✅ Build full AI systems ✅ 2× faster than FastAPI ✅ Agents, RAG, pipelines, more
✅ Custom logic + control ✅ Any PyTorch model ✅ Self-host or managed
✅ Multi-GPU autoscaling ✅ Batching + streaming ✅ BYO model or vLLM
✅ No MLOps glue code ✅ Easy setup in Python ✅ Serverless support
<div align='center'>
[Downloads](https://pepy.tech/projects/litserve)
[Discord](https://discord.gg/WajDThKAur)

[Codecov](https://codecov.io/gh/Lightning-AI/litserve)
[License](https://github.com/Lightning-AI/litserve/blob/main/LICENSE)
</div>
</div>
<div align="center">
<div style="text-align: center;">
<a href="#quick-start">Quick Start</a> •
<a href="#featured-examples">Examples</a> •
<a href="#features">Features</a> •
<a href="#performance">Performance</a> •
<a href="#host-anywhere">Hosting</a> •
<a href="https://lightning.ai/docs/litserve">Docs</a>
</div>
</div>
<div align="center">
<a href="https://lightning.ai/docs/litserve/home/get-started">
<img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/get-started-badge.svg" height="36px" alt="Get started"/>
</a>
</div>
## Quick Start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
```bash
pip install litserve
```

Examples:

```python
import litserve as ls

# Define the API to include any number of models, databases, etc.
class InferencePipeline(ls.LitAPI):
    def setup(self, device):
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def predict(self, request):
        x = request["input"]
        # Perform calculations using both models
        a = self.model1(x)
        b = self.model2(x)
        c = a + b
        return {"output": c}

if __name__ == "__main__":
    # 12+ features like batching, streaming, etc.
    server = ls.LitServer(InferencePipeline(max_batch_size=1), accelerator="auto")
    server.run(port=8000)
```
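Beyond `setup` and `predict`, each request flows through hooks on `LitAPI` that you can override. Here's a sketch of the same pipeline with `decode_request` and `encode_response` split out, so input parsing and output shaping live in their own steps:

```python
import litserve as ls

class PipelineWithHooks(ls.LitAPI):
    def setup(self, device):
        self.model1 = lambda x: x**2
        self.model2 = lambda x: x**3

    def decode_request(self, request):
        # Pull the model input out of the raw JSON payload.
        return request["input"]

    def predict(self, x):
        return self.model1(x) + self.model2(x)

    def encode_response(self, output):
        # Shape the model output into the JSON response.
        return {"output": output}
```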
Deploy for free to Lightning cloud (or self-host anywhere):

```bash
# Deploy for free with autoscaling, monitoring, etc.
lightning deploy server.py --cloud

# Or run locally (self-host anywhere)
lightning deploy server.py
# ...or run the script directly: python server.py
```
Test the server by simulating an HTTP request from any terminal:

```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
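You can also call the endpoint from Python. A minimal client sketch, assuming the server above is running locally on port 8000:

```python
import requests

# Call the InferencePipeline server defined above.
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"input": 4.0},
)
# With the default JSON encoding this should print {'output': 80.0},
# since 4.0**2 + 4.0**3 == 80.0.
print(response.json())
```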
**Agent example:** an endpoint that fetches a web page and asks an LLM to summarize the latest news on it:

```python
import re, requests, openai
import litserve as ls

class NewsAgent(ls.LitAPI):
    def setup(self, device):
        self.openai_client = openai.OpenAI(api_key="OPENAI_API_KEY")

    def predict(self, request):
        website_url = request.get("website_url", "https://text.npr.org/")
        website_text = re.sub(r'<[^>]+>', ' ', requests.get(website_url).text)

        # Ask the LLM to tell you about the news
        llm_response = self.openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"Based on this, what is the latest: {website_text}"}],
        )
        output = llm_response.choices[0].message.content.strip()
        return {"output": output}

if __name__ == "__main__":
    server = ls.LitServer(NewsAgent())
    server.run(port=8000)
```
Test it:

```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"website_url": "https://text.npr.org/"}'
```
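For LLM-backed endpoints like this, responses are usually streamed back chunk by chunk rather than returned all at once. A minimal streaming sketch, assuming LitServe's `stream=True` server flag and generator-style `predict`/`encode_response`; the stand-in model here is hypothetical:

```python
import litserve as ls

class StreamingAPI(ls.LitAPI):
    def setup(self, device):
        # Stand-in "model" that emits a few chunks; swap in a real LLM client.
        self.model = lambda prompt: (f"chunk-{i} " for i in range(5))

    def decode_request(self, request):
        return request["input"]

    def predict(self, prompt):
        # Yield partial outputs instead of returning one final response.
        yield from self.model(prompt)

    def encode_response(self, output_stream):
        for chunk in output_stream:
            yield {"output": chunk}

if __name__ == "__main__":
    server = ls.LitServer(StreamingAPI(), stream=True)
    server.run(port=8000)
```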
Here are a few key benefits of using LitServe:

- **Full control, no glue code:** write plain Python logic; no YAML or MLOps config files.
- **Load once, serve many:** models load a single time in `setup()` and are reused across requests.
- **Fast:** 2× faster than a plain FastAPI server, with batching, streaming, and multi-GPU autoscaling built in.
- **Flexible:** serve any PyTorch model, bring your own engine like vLLM, and compose multiple models per endpoint ([more](https://lightning.ai/docs/litserve)).

> ⚠️ Not a vLLM or Ollama alternative out of the box. LitServe gives you lower-level flexibility to build what they do (and more) if you need it.
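As an example of that lower-level control, here's a minimal batching sketch. It follows the `max_batch_size` usage from the pipeline example above; passing `batch_timeout` to the API constructor is an assumption here. With batching enabled, `predict` receives a list of decoded requests and must return one result per request:

```python
import litserve as ls

class BatchedPipeline(ls.LitAPI):
    def setup(self, device):
        self.model = lambda x: x**2

    def predict(self, batch):
        # `batch` is a list of decoded requests; keep outputs in request order.
        return [{"output": self.model(req["input"])} for req in batch]

if __name__ == "__main__":
    # Requests arriving within ~50 ms are grouped, up to 8 at a time (assumed knobs).
    api = BatchedPipeline(max_batch_size=8, batch_timeout=0.05)
    ls.LitServer(api, accelerator="auto").run(port=8000)
```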
## Featured Examples

Here are examples of inference pipelines for common model types and use cases.

- **Toy model:** <a href="#define-a-server">Hello world</a>
- **LLMs:** <a href="https://lightning.ai/lightning-ai/studios/deploy-llama-3-2-vision-with-litserve">Llama 3.2</a>, <a href="https://lightning.ai/lightning-ai/studios/openai-fault-tolerant-proxy-server">LLM Proxy server</a>, <a href="https://lightning.ai/lightning-ai/studios/deploy-ai-agent-with-tool-use">Agent with tool use</a>
- **RAG:** <a href="https://lightning.ai/lightning