Py Xiaozhi
Open-source AI assistant ecosystem with MCP integrations, multimodal workflows, IoT support, and cross-platform voice interaction.
# py-xiaozhi

English | [简体中文](README.zh.md)

## About

py-xiaozhi is a lightweight, cross-platform multi-modal AI interaction framework built on Python's async architecture. It supports real-time voice streaming, vision-language tasks, and IoT device control. Deployable across Windows, macOS, Linux desktops, and ARM embedded platforms (Raspberry Pi, Horizon Robotics RDK, Jetson Nano), it bridges the gap between Large Language Models and physical hardware out of the box.

> Evolved from the [xiaozhi-esp32](https://github.com/78/xiaozhi-esp32) firmware project. Officially adopted by [D-Robotics (xiaozhi-in-rdk)](https://github.com/D-Robotics/xiaozhi-in-rdk) as an upstream dependency.

## Related Projects

- [xiaozhi-desktop](https://xiaozhi.junsen.online): Electron desktop client with AEC echo cancellation, Live2D, floating window modes, and Windows / macOS installers

## Demo

- [Bilibili Demo Video](https://www.bilibili.com/video/BV1HmPjeSED2/#reply255921347937)

## Key Features

- **Real-time Voice AI**: Opus codec with auto frame detection (RFC 6716 TOC parsing), async streaming, sub-20ms latency
- **Multi-modal Vision**: Camera capture + vision-language model integration for image understanding and scene perception
- **MCP Tool Ecosystem**: Modular JSON-RPC 2.0 tool server: music player, camera, screenshot, app management, weather, volume control
- **Cross-platform Deployment**: Windows 10+ / macOS 10.15+ / Linux (x86_64 & ARM), optimized for Raspberry Pi and edge boards
- **Multiple UI Modes**: PySide6 + QML GUI / CLI / GPIO, adapting to desktop, headless server, and embedded environments
- **Offline Wake Word**: Sherpa-ONNX based on-device keyword spotting with custom wake word support
- **IoT & Embodied AI Ready**: GPIO interface for robotics control, hardware actuation, and sensor integration
- **WebSocket / MQTT**: Dual protocol communication with WSS/TLS encrypted transmission and auto-reconnection
- **Plugin Architecture**: Event-driven async design, clean dependency injection, extensible plugin system

## System Requirements

### Basic Requirements

- **Python Version**: 3.10 - 3.12
- **Operating System**: Windows 10+, macOS 10.15+, Linux
- **Audio Devices**: Microphone and speaker devices
- **Network Connection**: Stable internet connection (for AI services and online features)

### Recommended Configuration

- **Memory**: At least 4GB RAM (8GB+ recommended)
- **Processor**: Modern CPU with AVX instruction set support
- **Storage**: At least 2GB available disk space (for model files and cache)
- **Audio**: Audio devices supporting 16kHz sampling rate

### Optional Feature Requirements

- **Voice Wake-up**: Requires downloading Sherpa-ONNX speech recognition models
- **Camera Features**: Requires camera device and OpenCV support

## Read This First

- Carefully read the [project documentation](https://huangjunsen0406.github.io/py-xiaozhi/) for startup tutorials and file descriptions
- The main branch has the latest code; manually reinstall pip dependencies after each update to ensure you have new dependencies

[Zero to Xiaozhi Client (Video Tutorial)](https://www.bilibili.com/video/BV1dWQhYEEmq/?vd_source=2065ec11f7577e7107a55bbdc3d12fce)

## Technical Architecture

### Core Architecture Design

- **Event-Driven Architecture**: Based on asyncio asynchronous event loop, supporting h...
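
The Key Features list mentions Opus frame detection via RFC 6716 TOC parsing. As a rough illustration of what that involves, the sketch below decodes the first byte (the TOC byte) of an Opus packet according to the layout in RFC 6716, section 3.1. It is not taken from the py-xiaozhi code base: the function name `parse_opus_toc`, the returned dictionary keys, and the 16 kHz default (borrowed from the audio requirement above) are assumptions made for this example.

```python
def parse_opus_toc(packet: bytes, sample_rate: int = 16000) -> dict:
    """Decode the TOC byte of an Opus packet (RFC 6716, section 3.1).

    Hypothetical helper for illustration; not part of py-xiaozhi's API.
    """
    if not packet:
        raise ValueError("empty Opus packet")

    toc = packet[0]
    config = toc >> 3            # top 5 bits: mode, bandwidth, frame size
    stereo = bool((toc >> 2) & 1)  # bit 2: 0 = mono, 1 = stereo
    code = toc & 0x3             # low 2 bits: frame count code

    # Frame duration depends on the configuration number.
    if config < 12:              # SILK-only: 10, 20, 40, 60 ms
        mode = "SILK"
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:            # Hybrid: 10, 20 ms
        mode = "Hybrid"
        frame_ms = (10, 20)[config % 2]
    else:                        # CELT-only: 2.5, 5, 10, 20 ms
        mode = "CELT"
        frame_ms = (2.5, 5, 10, 20)[config % 4]

    # Codes 0..2 imply a fixed frame count; code 3 stores it in the
    # low 6 bits of the second byte.
    if code == 0:
        frames = 1
    elif code in (1, 2):
        frames = 2
    else:
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing the frame count byte")
        frames = packet[1] & 0x3F

    return {
        "mode": mode,
        "stereo": stereo,
        "frame_ms": frame_ms,
        "frames": frames,
        "samples_per_frame": int(sample_rate * frame_ms / 1000),
    }


if __name__ == "__main__":
    # 0xF8 = config 31 (CELT, 20 ms), mono, one frame per packet.
    print(parse_opus_toc(bytes([0xF8])))
```

A receiver could run a check like this on the first incoming packet to learn the frame duration the server is using, rather than hard-coding it in the decoder setup.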