Local “search → fetch → extract” plumbing, exposed as an MCP stdio server.
webpipe is designed for bounded, deterministic evidence gathering:
- cache-first fetch by default
- explicit limits on bytes/chars/results
- stable, schema-versioned JSON outputs
Workspace layout:
- crates/webpipe: public facade crate (re-exports webpipe-core)
- crates/webpipe-core: backend-agnostic types + traits
- crates/webpipe-local: local implementations (reqwest + filesystem cache + extractors)
- crates/webpipe-mcp: MCP stdio server + CLI harness (webpipe)
# From the repo root:
cargo install --path crates/webpipe-mcp --bin webpipe --features stdio --force
# Run as an MCP stdio server:
webpipe mcp-stdio

Use mcp.example.json as a starting point (it intentionally contains no API keys).
Keyless workflow tip:
- Use the MCP tool web_seed_urls to get curated “awesome list” seed URLs, then call web_search_extract with urls=[...] and url_selection_mode=query_rank.
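The two-step keyless workflow can be sketched as a pair of MCP `tools/call` requests. This is a minimal illustration, not webpipe's documented schema: the tool names and the `urls` / `url_selection_mode` arguments come from the tip above, while the `query` argument name, the empty argument object for `web_seed_urls`, and the example.com URLs are assumptions made for the example. A real client would also complete the MCP `initialize` handshake first and take the URL list from the first tool's response.

```python
import json


def tool_call(request_id, name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: ask for curated seed URLs.
# ASSUMPTION: called with no arguments; check the tool's input schema.
seed_request = tool_call(1, "web_seed_urls", {})

# Step 2: feed seed URLs back and let webpipe rank them against the query.
# ASSUMPTION: placeholder URLs; in practice these come from step 1's result.
seed_urls = [
    "https://example.com/awesome-list-a",
    "https://example.com/awesome-list-b",
]
extract_request = tool_call(
    2,
    "web_search_extract",
    {
        "query": "rust async runtimes",  # ASSUMPTION: argument name not confirmed here
        "urls": seed_urls,
        "url_selection_mode": "query_rank",
    },
)

# The MCP stdio transport carries one JSON message per line.
for request in (seed_request, extract_request):
    print(json.dumps(request))
```

Because neither request depends on a search provider key, this flow stays usable with the keyless mcp.example.json configuration.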
Environment variables (optional, provider-dependent):
- Brave search: WEBPIPE_BRAVE_API_KEY (or BRAVE_SEARCH_API_KEY)
- Tavily search: WEBPIPE_TAVILY_API_KEY (or TAVILY_API_KEY)
- SearXNG search (self-hosted): WEBPIPE_SEARXNG_ENDPOINT (or WEBPIPE_SEARXNG_ENDPOINTS)
- Firecrawl fetch: WEBPIPE_FIRECRAWL_API_KEY (or FIRECRAWL_API_KEY)
- Perplexity deep research: WEBPIPE_PERPLEXITY_API_KEY (or PERPLEXITY_API_KEY)
- ArXiv endpoint override (debug): WEBPIPE_ARXIV_ENDPOINT (default https://export.arxiv.org/api/query)
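Before starting the server, it can help to check which providers the current environment would enable. The sketch below is a hypothetical pre-flight helper, not part of webpipe: the variable names are taken from the list above, but the check order (WEBPIPE_-prefixed name first) and the choice to treat blank values as unset are assumptions, since the precedence webpipe itself applies is not stated here.

```python
import os

# Provider -> accepted variable names, as listed in this README.
PROVIDER_ENV_VARS = {
    "brave": ("WEBPIPE_BRAVE_API_KEY", "BRAVE_SEARCH_API_KEY"),
    "tavily": ("WEBPIPE_TAVILY_API_KEY", "TAVILY_API_KEY"),
    "searxng": ("WEBPIPE_SEARXNG_ENDPOINT", "WEBPIPE_SEARXNG_ENDPOINTS"),
    "firecrawl": ("WEBPIPE_FIRECRAWL_API_KEY", "FIRECRAWL_API_KEY"),
    "perplexity": ("WEBPIPE_PERPLEXITY_API_KEY", "PERPLEXITY_API_KEY"),
}


def configured_providers(env):
    """Return {provider: variable_name} for providers with a non-blank variable.

    ASSUMPTION: names are checked in the order listed; webpipe's own
    precedence between the two names may differ.
    """
    found = {}
    for provider, names in PROVIDER_ENV_VARS.items():
        for name in names:
            if env.get(name, "").strip():
                found[provider] = name  # record the name only, never the secret
                break
    return found


if __name__ == "__main__":
    for provider, name in configured_providers(os.environ).items():
        print(f"{provider}: configured via {name}")
```

The helper reports variable names rather than values so that its output is safe to paste into a bug report.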
webpipe drops request-secret headers by default:
- Authorization
- Cookie
- Proxy-Authorization
If a caller supplies them, responses include a warning:
- warning_codes contains unsafe_request_headers_dropped
- request.dropped_request_headers lists header names only
To opt in (only for trusted endpoints), set WEBPIPE_ALLOW_UNSAFE_HEADERS=true.
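The behavior described above can be modeled in a few lines. This is an illustrative sketch only, not webpipe's implementation (which lives in the Rust crates): the function names are hypothetical, case-insensitive header matching is an assumption, and the exact nesting of the response fields is inferred from the dotted name request.dropped_request_headers.

```python
# Header names the README lists as dropped by default (compared lowercased).
SECRET_HEADERS = {"authorization", "cookie", "proxy-authorization"}


def sanitize_headers(headers, allow_unsafe=False):
    """Split caller-supplied headers into (kept, dropped_names).

    `allow_unsafe` stands in for WEBPIPE_ALLOW_UNSAFE_HEADERS=true.
    ASSUMPTION: matching is case-insensitive, as HTTP header names are.
    """
    if allow_unsafe:
        return dict(headers), []
    kept, dropped = {}, []
    for name, value in headers.items():
        if name.lower() in SECRET_HEADERS:
            dropped.append(name)  # keep the name only; the value is discarded
        else:
            kept[name] = value
    return kept, dropped


def build_warning_fields(dropped):
    """Build response fields shaped like the ones documented above."""
    if not dropped:
        return {}
    return {
        "warning_codes": ["unsafe_request_headers_dropped"],
        "request": {"dropped_request_headers": sorted(dropped)},
    }


kept, dropped = sanitize_headers({"Accept": "text/html", "Cookie": "sid=abc"})
print(kept)
print(build_warning_fields(dropped))
```

Recording header names but never their values means the warning itself cannot leak the secret it reports.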
# Run the tests:
cargo test -p webpipe-mcp --features stdio