★ 1206 Rust AGPL-3.0 sse Updated 47 minutes ago

Webclaw

Fast, local-first web content extraction for LLMs. Scrape, crawl, extract structured data — all from Rust. CLI, REST API, and MCP server.

Installation and configuration

{
    "mcpServers": {
        "webclaw": {
            "command": "~/.webclaw/webclaw-mcp"
        }
    }
}
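
A minimal sketch of generating this entry programmatically, assuming the binary lives at `~/.webclaw/webclaw-mcp` as in the config above. Some MCP clients may launch the command without a shell, in which case `~` might not be expanded, so this sketch writes an absolute path instead. The function name `build_mcp_config` is hypothetical and not part of webclaw.

```python
import json
from pathlib import Path


def build_mcp_config(binary: str = "~/.webclaw/webclaw-mcp") -> dict:
    """Return an mcpServers entry whose command has any leading ~ expanded."""
    # expanduser() replaces a leading "~" with the current user's home directory
    # and leaves paths that are already absolute unchanged.
    command = str(Path(binary).expanduser())
    return {"mcpServers": {"webclaw": {"command": command}}}


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the client's MCP config file.
    print(json.dumps(build_mcp_config(), indent=4))
```

Where the resulting JSON goes depends on the MCP client, so check that client's documentation for the config file location.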

README summary

webclaw

Turn websites into clean markdown, JSON, and LLM-ready context. CLI, MCP server, REST API, and SDKs for AI agents and RAG pipelines.

---

Most web scraping tools give your agent one of two bad outputs:

- a blocked page, login wall, or empty app shell
- raw HTML full of nav, scripts, styling, ads, and duplicated boilerplate

[webclaw.io](https://webclaw.io) is the hosted web extraction API for webclaw. This repo contains the open-source CLI, MCP server, extraction engine, and self-hostable server.

webclaw turns a URL into clean content your tools can actually use.

```bash
webclaw https://example.com --format markdown
```

```md
# Example Domain

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
```

Use it from the terminal, wire it into Claude/Cursor through MCP, call the hosted API from your app, or self-host the OSS server.

---

## Install

### Agent setup

The fastest way to connect webclaw to Claude Code, Claude Desktop, Cursor, Windsurf, OpenCode, Codex CLI, and other MCP-compatible tools:

```bash
npx create-webclaw
```

The installer detects supported clients and configures the MCP server for you.

### Homebrew

```bash
brew tap 0xMassi/webclaw
brew install webclaw
```

### Prebuilt binaries

Download macOS and Linux binaries from [GitHub Releases](https://github.com/0xMassi/webclaw/releases).

### Docker

```bash
docker run --rm ghcr.io/0xmassi/webclaw https://example.com
```

### Cargo

```bash
cargo install --git https://github.com/0xMassi/webclaw.git webclaw-cli
cargo install --git https://github.com/0xMassi/webclaw.git webclaw-mcp
```

If building from source fails because native build tools are missing, install the platform prerequisites:

| OS | Command |
| --- | --- |
| Debian / Ubuntu | `sudo apt install -y pkg-config libssl-dev cmake clang git build-essential` |
| Fedora / RHEL | `sudo dnf install -y pkg-config openssl-devel cmake clang git make gcc` |
| Arch | `sudo pacman -S pkg-config openssl cmake clang git base-devel` |
| macOS | `xcode-select --install` |

---

## Quick Start

### Scrape one page

```bash
webclaw https://stripe.com --format markdown
```

### Return LLM-optimized text

```bash
webclaw https://docs.anthropic.com --format llm
```

### Keep only the main content

```bash
webclaw https://example.com/blog/post --only-main-content
```

### Include or exclude selectors

```bash
webclaw https://example.com \
  --include "article, main, .content" \
  --exclude "nav, footer, .sidebar, .ad"
```

### Crawl a documentation site

```bash
webclaw https://docs.rust-lang.org --crawl --depth 2 --max-pages 50
```

### Workflow examples

- [HTML to Markdown for RAG](examples/html-to-markdown-rag/)
- [Firecrawl-compatible API](examples/firecrawl-compatible-api/)
- [MCP web scraping](examples/mcp-web-scraping/)
- [Proxy-backed crawling](examples/proxy-backed-crawling/)
- [Cloudflare diagnostics](examples/cloudflare-diagnostics/)

### Extract brand assets

```bash
webclaw https://github.com --brand
```

### Compare a page over time

```bash
webclaw https://example.com/pricing --format json > pricing-old.json
webclaw https://example.com/pricing --diff-with pricing-old.json
```

---

## MCP Server

webclaw ships with an MCP server for AI agents.

```bash
npx create-webclaw
```

Manual config:

```json
{
  "mcpServers": {
    "webclaw": {
      "command...
```
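
The Quick Start commands in the README summary can also be driven from a script. The sketch below assembles the argument list for a few of the flags shown there (`--format`, `--only-main-content`, `--include`, `--exclude`, `--crawl`, `--depth`, `--max-pages`) and runs the CLI only if a `webclaw` binary is on `PATH`. The helper names `build_webclaw_args` and `run_webclaw` are hypothetical, and pairing `--depth` and `--max-pages` only with `--crawl` is an assumption based on the one crawl example in the README, not documented behavior.

```python
import shutil
import subprocess
from typing import List, Optional


def build_webclaw_args(
    url: str,
    fmt: Optional[str] = None,
    only_main_content: bool = False,
    include: Optional[str] = None,
    exclude: Optional[str] = None,
    crawl: bool = False,
    depth: Optional[int] = None,
    max_pages: Optional[int] = None,
) -> List[str]:
    """Build an argv list that mirrors the README's Quick Start examples."""
    args = ["webclaw", url]
    if fmt is not None:
        args += ["--format", fmt]
    if only_main_content:
        args.append("--only-main-content")
    if include is not None:
        args += ["--include", include]
    if exclude is not None:
        args += ["--exclude", exclude]
    if crawl:
        args.append("--crawl")
        # Assumption: depth and page limits are only meaningful while crawling.
        if depth is not None:
            args += ["--depth", str(depth)]
        if max_pages is not None:
            args += ["--max-pages", str(max_pages)]
    return args


def run_webclaw(args: List[str]) -> str:
    """Run the CLI and return its stdout; raise if the binary is not installed."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("webclaw binary not found on PATH")
    # Passing a list (no shell) avoids quoting problems with selector strings.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    argv = build_webclaw_args("https://example.com", fmt="markdown")
    print(" ".join(argv))
```

Keeping argument construction separate from execution makes the command easy to inspect or log before anything touches the network.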

Related MCP servers

Libre Chat

Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, M...

★ 37614 TypeScript sse pending
mcp sse TypeScript

Github

GitHub's official MCP Server

★ 30243 Go sse pending
mcp sse Go

Fast

🚀 The fast, Pythonic way to build MCP servers and clients.

★ 25364 Python sse pending
mcp Python sse