FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Industrial speech recognition. 170x realtime on GPU, 13x faster than Whisper-large-v3. 50+ languages.
Speaker diarization · Emotion detection · Streaming · One API call

Quick Start

No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.

pip install torch torchaudio
pip install funasr

from funasr import AutoModel

model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav")

Output — structured text with speaker labels, timestamps, and punctuation:

[00:00.4 → 00:03.8] Speaker 0: Let's discuss the Q3 plan.
[00:04.2 → 00:07.1] Speaker 1: Sounds good. I have three points.
[00:07.5 → 00:12.3] Speaker 0: Go ahead. We have 30 minutes.

That's it. One model, one call — VAD segmentation, speech recognition, punctuation, speaker diarization all happen automatically.
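
The text above shows formatted output but not the shape of the returned object. The sketch below renders diarized segments in that layout. It assumes each segment is a dict with `start` and `end` in milliseconds, a speaker index `spk`, and `text`, as in a `sentence_info` list; that layout is an assumption, so inspect `result` from your own run before relying on it. The sample data is hypothetical.

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as MM:SS.d, rounded to tenths of a second."""
    tenths = (ms + 50) // 100
    minutes, rem = divmod(tenths, 600)
    return f"{minutes:02d}:{rem // 10:02d}.{rem % 10}"


def format_segments(sentence_info):
    """Turn a list of segment dicts into '[start → end] Speaker N: text' lines."""
    lines = []
    for seg in sentence_info:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} → {end}] Speaker {seg['spk']}: {seg['text']}")
    return lines


# Hypothetical segments; the field names are assumed, not confirmed by the text above.
sample = [
    {"start": 400, "end": 3800, "spk": 0, "text": "Let's discuss the Q3 plan."},
    {"start": 4200, "end": 7100, "spk": 1, "text": "Sounds good. I have three points."},
]
for line in format_segments(sample):
    print(line)
```

If the assumed layout holds, `format_segments(result[0]["sentence_info"])` would produce the same kind of lines from a real transcription.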

LLM-powered ASR: Fun-ASR-Nano

For the highest accuracy across 31 languages (including Chinese dialects), use Fun-ASR-Nano, an LLM-based ASR model that combines a SenseVoice encoder with a Qwen3-0.6B decoder:

from funasr import AutoModel

model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", vad_model="fsmn-vad", device="cuda")
result = model.generate(input="meeting.wav")

With vLLM acceleration (16x faster, batch processing):

from funasr.auto.auto_model_vllm import AutoModelVLLM

model = AutoModelVLLM(model="FunAudioLLM/Fun-ASR-Nano-2512", tensor_parallel_size=1)
results = model.generate(["audio1.wav", "audio2.wav"], language="auto")

Deploy as API server: funasr-server --device cuda → OpenAI-compatible endpoint at localhost:8000

Use with AI agents: MCP Server for Claude/Cursor · OpenAI API for LangChain/Dify/AutoGen

Why FunASR?

| | FunASR | Whisper | Cloud APIs |
|---|---|---|---|
| Speed | 170x realtime | 13x realtime | ~1x realtime |
| Speaker ID | ✅ Built-in | ❌ Needs pyannote | ✅ Extra cost |
| Emotion | ✅ Happy/Sad/Angry | ❌ | ❌ |
| Languages | 50+ | 57 | Varies |
| Streaming | ✅ WebSocket | ❌ | ✅ |
| vLLM Acceleration | ✅ 2-3x faster | ❌ | N/A |
| Self-hosted | ✅ MIT license | ✅ MIT license | ❌ Cloud only |
| Cost | Free | Free | $0.006/min+ |
| CPU viable | ✅ 17x realtime | ❌ Too slow | N/A |

Trying FunASR for the first time? Use the Colab quickstart before setting up a local environment. Choosing a first model? Start with the model selection guide. Planning a switch from Whisper or a cloud ASR provider? Use the migration guide and benchmark example to test representative audio, map features, and roll out safely.


<a name="benchmark"></a>

Benchmark

Measured on 184 long-form audio files (192 min of audio in total).

| Model | GPU Speed | CPU Speed | vs Whisper-large-v3 |
|---|---|---|---|
| SenseVoice-Small | 170x realtime | 17x realtime | 🚀 13x faster |
| Paraformer-Large | 120x realtime | 15x realtime | 🚀 9x faster |
| Whisper-large-v3-turbo | 46x realtime | ❌ | 3.4x faster |
| Fun-ASR-Nano | 17x realtime | 3.6x realtime | 1.3x faster |
| Whisper-large-v3 | 13x realtime | ❌ | baseline |

Key takeaway: SenseVoice-Small and Paraformer-Large run faster on CPU (17x and 15x realtime) than Whisper-large-v3 runs on GPU (13x realtime).
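
To make the realtime factors concrete, the sketch below converts them into wall-clock time for the 192-minute test set. It assumes "Nx realtime" means N seconds of audio are processed per second of compute, and it uses the rounded GPU figures from the table, so the derived ratios can differ slightly from the reported ones.

```python
def processing_seconds(audio_minutes: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to transcribe the audio at a given realtime factor."""
    return audio_minutes * 60 / realtime_factor


def speedup(model_factor: float, baseline_factor: float) -> float:
    """How many times faster a model is than the baseline."""
    return model_factor / baseline_factor


# GPU realtime factors copied from the benchmark table above.
GPU_SPEED = {
    "SenseVoice-Small": 170,
    "Paraformer-Large": 120,
    "Whisper-large-v3-turbo": 46,
    "Fun-ASR-Nano": 17,
    "Whisper-large-v3": 13,
}

baseline = GPU_SPEED["Whisper-large-v3"]
for name, factor in GPU_SPEED.items():
    seconds = processing_seconds(192, factor)
    print(f"{name}: {seconds:.0f} s for 192 min, {speedup(factor, baseline):.1f}x baseline")
```

Under that reading, the full test set takes roughly a minute on SenseVoice-Small and close to fifteen minutes on Whisper-large-v3.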


What's new

  • 2026/05/24: vLLM inference engine with 2-3x faster LLM decoding for Fun-ASR-Nano, plus a streaming WebSocket service with VAD and speaker diarization.
  • 2026/05/24: Dynamic VAD with an adaptive silence threshold (on by default). Short sentences stay intact and long segments are split automatically.
  • 2026/05/24: v1.3.3 adds the funasr-server CLI, an OpenAI-compatible API, and an MCP Server for AI agents. Upgrade with pip install --upgrade funasr.
  • 2026/05/20: Added Qwen3-ASR (0.6B/1.7B): 52 languages, automatic language detection.
  • 2026/05/20: Added GLM-ASR-Nano (1.5B): 17 languages, dialect support.
  • 2026/05/19: Fun-ASR-Nano and SenseVoice now support speaker diarization.
  • 2025/12/15: Fun-ASR-Nano-2512: 31 languages, trained on tens of millions of hours of audio.
<details><summary>Older</summary>
  • 2024/10/10: Whisper-large-v3-turbo support added.
  • 2024/07/04: SenseVoice — ASR + emotion + audio events.
  • 2024/01/30: FunASR 1.0 released.
</details>

Installation

pip install funasr
<details><summary>From source / Requirements</summary>
git clone https://github.com/modelscope/FunASR.git && cd FunASR
pip install -e ./

Requirements: Python ≥ 3.8. Install PyTorch + torchaudio first (pytorch.org), then pip install funasr.

</details>

<a name="model-zoo"></a>

Model Zoo

| Model | Task | Languages | Params | Links |
|---|---|---|---|---|
| Fun-ASR-Nano | ASR + timestamps | 31 languages | 800M | ⭐ 🤗 |
| SenseVoiceSmall | ASR + emotion + events | zh/en/ja/ko/yue | 234M | ⭐ 🤗 |
| Paraformer-zh | ASR + timestamps | zh/en | 220M | ⭐ 🤗 |
| Paraformer-zh-streaming | Streaming ASR | zh/en | 220M | ⭐ 🤗 |
| Qwen3-ASR | ASR, 52 languages | multilingual | 1.7B | usage |
| GLM-ASR-Nano | ASR, 17 languages | multilingual | 1.5B | usage |
| Whisper-large-v3 | ASR + translation | multilingual | 1550M | usage |
| Whisper-large-v3-turbo | ASR + translation | multilingual | 809M | usage |
| ct-punc | Punctuation | zh/en | 290M | ⭐ 🤗 |
| fsmn-vad | VAD | zh/en | 0.4M | ⭐ 🤗 |
| cam++ | Speaker diarization | — | 7.2M | ⭐ 🤗 |
| emotion2vec+large | Emotion recognition | — | 300M | ⭐ 🤗 |

Usage

Full examples with parameter docs: Tutorial →

from funasr import AutoModel

# Chinese production (VAD + ASR + punctuation + speaker)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav", hotword="关键词 20")

# 31 languages with timestamps
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", hub="hf", trust_remote_code=True,
                  vad_model="fsmn-vad", vad_kwargs={"max_single_segment_time": 30000}, device="cuda")
result = model.generate(input="audio.wav", batch_size=1)

# Streaming real-time
model = AutoModel(model="paraformer-zh-streaming", device="cuda")
result = model.generate(input="chunk.wav", cache={}, chunk_size=[0, 10, 5])

# Emotion recognition
model = AutoModel(model="emotion2vec_plus_large", device="cuda")
result = model.generate(input="audio.wav", granularity="utterance")
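
The streaming call above processes a single chunk. For a longer signal, the audio has to be cut into consecutive chunks that share one `cache`. The helper below is a minimal sketch of that slicing. It assumes each unit of `chunk_size[1]` covers 960 samples (60 ms at 16 kHz, so `[0, 10, 5]` gives 600 ms chunks), and the `is_final` flag in the commented loop is taken from FunASR's upstream streaming example rather than from the snippet above; treat both as assumptions to verify.

```python
def plan_chunks(num_samples: int, chunk_size=(0, 10, 5), samples_per_unit: int = 960):
    """Yield (start, end, is_final) sample ranges covering the whole signal."""
    stride = chunk_size[1] * samples_per_unit      # samples per streaming chunk
    total = max(1, -(-num_samples // stride))      # ceiling division, at least one chunk
    for i in range(total):
        start = i * stride
        end = min(start + stride, num_samples)
        yield start, end, i == total - 1


# Two seconds of 16 kHz audio split into 600 ms chunks.
for start, end, is_final in plan_chunks(32000):
    print(start, end, is_final)

# Sketch of the streaming loop (requires funasr and a loaded waveform `speech`):
# cache = {}
# for start, end, is_final in plan_chunks(len(speech)):
#     result = model.generate(input=speech[start:end], cache=cache,
#                             is_final=is_final, chunk_size=[0, 10, 5])
```

Reusing the same `cache` dict across calls is what lets the model carry state from one chunk to the next.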

CLI (Agent-Friendly)

# Transcribe audio (simplest)
funasr audio.wav

# JSON output (for AI agents)
funasr audio.wav --output-format json

# SRT subtitles
funasr audio.wav --output-format srt --output-dir ./subs

# Speaker diarization + timestamps
funasr audio.wav --spk --timestamps -f json

# Choose model and language
funasr audio.wav --model paraformer --language zh

# Batch transcribe
funasr *.wav --output-format srt --output-dir ./output

Available models: sensevoice (default), paraformer, paraformer-en, fun-asr-nano
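
For agents or scripts that call the CLI from Python, a thin wrapper can assemble the flags shown above. This is a sketch under two assumptions that the text does not confirm: that the flags can be combined freely, and that `--output-format json` writes the JSON document to standard output. The function names are hypothetical.

```python
import json
import subprocess


def build_command(audio_path, output_format="json", model=None, language=None,
                  spk=False, timestamps=False):
    """Assemble a funasr CLI invocation from the flags documented above."""
    cmd = ["funasr", audio_path, "--output-format", output_format]
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    if spk:
        cmd.append("--spk")
    if timestamps:
        cmd.append("--timestamps")
    return cmd


def transcribe(audio_path, model=None, language=None, spk=False, timestamps=False):
    """Run the CLI and parse its output, assuming JSON is printed to stdout."""
    cmd = build_command(audio_path, "json", model, language, spk, timestamps)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


print(build_command("audio.wav", spk=True, timestamps=True))
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.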


Deploy

# OpenAI-compatible API (recommended)
pip install torch torchaudio
pip install funasr vllm fastapi uvicorn python-multipart
funasr-server --device cuda
# → POST /v1/audio/transcriptions at localhost:8000

Verify it with a public sample:

curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json

# Docker streaming service
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.12
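
The curl request against `/v1/audio/transcriptions` shown earlier can also be issued from Python with only the standard library. The sketch below builds the same multipart form (`file`, `model`, `response_format`) by hand. It assumes the server returns a JSON body, as the `verbose_json` option suggests; the helper names are hypothetical, and the response fields are not specified in the text above.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(fields, file_name, file_bytes, boundary=None):
    """Build a multipart/form-data body with text fields plus one 'file' part."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, url="http://localhost:8000/v1/audio/transcriptions",
               model="sensevoice"):
    """POST an audio file to the server and return the decoded JSON response."""
    with open(path, "rb") as fh:
        audio = fh.read()
    body, content_type = encode_multipart(
        {"model": model, "response_format": "verbose_json"},
        os.path.basename(path),
        audio,
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With `funasr-server` running locally, `transcribe("sample.wav")` sends the same fields as the curl command.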

OpenAI API example → · Gradio demo → · Client recipes → · JavaScript/TypeScript recipes → · Kubernetes template → · Workflow recipes → · Postman collection → · OpenAPI spec → · Security guide → · Deployment matrix → · Deployment docs → · Agent integration →


Community

📖 Documentation · 🐛 Issues · 💬 Discussions · 🤗 HuggingFace · 🤝 Contributing · 📈 20k growth plan

License

MIT License

Citations

@inproceedings{gao2023funasr,
  author={Zhifu Gao and others},
  title={FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  booktitle={INTERSPEECH},
  year={2023}
}

Repository

modelscope/FunASR

Created: November 24, 2022
Updated: June 15, 2026
Language: Python
Category: AI