★ 292
Rust Apache-2.0 sse 更新 1小时前
Smg
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.
安装配置
暂未识别到可直接复制的 MCP 配置,请查看 GitHub README。后台管理员可以补充配置。
README 摘要
Shepherd Model Gateway
High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.
## Why SMG?
| | |
|:--------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **🚀 Maximize GPU Utilization** | Cache-aware routing understands your inference engine's KV cache state—whether SGLang, vLLM, or TensorRT-LLM—to reuse prefixes and reduce redundant computation. |
| **🔌 One API, Any Backend** | Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint. |
| **⚡ Built for Speed** | Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running. |
| **🔒 Enterprise Control** | Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure. |
| **📊 Full Observability** | 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation—know exactly what's happening at every layer. |
**API Coverage:** OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.
## Quick Start
**Install** — pick your preferred method:
```bash
# Docker
docker pull lightseekorg/smg:latest
# Python
pip install smg
# Rust
cargo install smg
```
**Run** — point SMG at your inference workers:
```bash
# Single worker
smg launch --worker-urls http://localhost:8000
# Multiple workers with cache-aware routing
smg launch --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware
# With high availability mesh
smg launch --worker-urls http://gpu1:8000 --enable-mesh \
--mesh-advertise-host 10.0.0.1 --mesh-peer-urls 10.0.0.2:39527
```
**Use** — send requests to the gateway:
```bash
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
```
That's it. SMG is now load-balancing requests across your workers.
## Supported Backends
| Self-Hosted | Cloud Providers |
|-------------|-----------------|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |
## Features
| Feature | Description |
|---------|-------------|
| **[8 Routing Policies](docs/concepts/routing/load-balancing.md)** | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| **[gRPC Pipeline](docs/concepts/architecture/grpc-pipeline.md)** | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| **[MCP Integration](docs/concepts/extensibility/mcp.md)** | Connect external tool servers via Model Context Protocol |
| **[High Availabil...
相关 MCP
A collection of MCP servers.
★ 88072
sse 待补充
Chrome DevTools for coding agents
★ 42127
TypeScript sse 有配置
Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, M...
★ 37614
TypeScript sse 待补充
Playwright MCP server
★ 33159
TypeScript sse 有配置
GitHub's official MCP Server
★ 30243
Go sse 待补充
🚀 The fast, Pythonic way to build MCP servers and clients.
★ 25364
Python sse 待补充