Give your AI agents the entire web.
An open-source MCP server that connects any AI agent to real-time web search, page reading, and site mapping. Three search providers. Intelligent re-ranking. One unified API.
Training data goes stale. The web does not.
AI coding agents are powerful — until they need information that is not in their training data.
A new library version ships. An API changes its response format. A framework deprecates a core method. Your agent does not know. It writes code based on what it learned months ago, and you spend your time fixing the hallucinations.
Some agents have basic web access built in. But “basic” means a single search provider, no content extraction strategy, and no visibility into what the agent actually retrieved.
Your agents need more than a search box. They need a proper web research toolkit — one that searches multiple sources, extracts clean content from any page, and gives you full observability into every request.
Three tools. Full web access.
Everything an agent needs to search, read, and map the web — exposed as MCP tools with a single endpoint.
Search the web
Queries Brave Search, Tavily, and Exa simultaneously. Results are deduplicated, scored by position, boosted when multiple providers agree, and returned as a single ranked list. Up to 35 results per query.
Extract page content
Converts web pages into clean markdown. Three-tier strategy: raw GitHub content for repos, direct markdown fetch for docs, and full JS rendering via Firecrawl for everything else. Strips navigation, ads, and boilerplate.
Discover site structure
Returns up to 5,000 URLs from any website, including subdomains. Uses sitemaps when available. Gives agents a complete picture of a site before reading individual pages.
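A minimal sketch of the sitemap path, assuming the sitemap XML has already been fetched. The function name `urlsFromSitemap` is hypothetical. The sketch uses a regex rather than an XML parser, does not follow sitemap index files, and has no crawling fallback. Treat it as an illustration of the subdomain filter and the 5,000-URL cap, not as webctx's implementation.

```typescript
// Illustrative only: regex-based <loc> extraction, no sitemap-index recursion,
// no crawling fallback. webctx's real map-site logic may differ.
function urlsFromSitemap(xml: string, siteHost: string, limit = 5000): string[] {
  // Treat "www.example.com" and "example.com" as the same site root.
  const root = siteHost.toLowerCase().replace(/^www\./, "");
  const found = new Set<string>(); // Set keeps insertion order and drops repeats
  for (const match of xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi)) {
    let url: URL;
    try {
      // Sitemaps XML-escape "&" as "&amp;" inside <loc>.
      url = new URL(match[1].replace(/&amp;/g, "&"));
    } catch {
      continue; // skip malformed entries
    }
    const host = url.hostname.toLowerCase();
    // Keep the root domain and any of its subdomains.
    if (host === root || host.endsWith(`.${root}`)) found.add(url.href);
    if (found.size >= limit) break;
  }
  return [...found];
}
```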
One query. Three search engines. One ranked result set.
Every search provider has blind spots. Brave has broad web coverage but may miss niche content. Tavily is built specifically for AI agents. Exa uses neural search with live crawling.
webctx queries all three in parallel, then combines the results intelligently.
Fan out
The query hits Brave, Tavily, and Exa simultaneously. Each provider has a 40-second timeout. If one fails, the others continue.
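The fan-out step can be sketched as follows. This assumes each provider is wrapped in an async function that returns a common result shape. The `Provider` type and the `fanOut` and `withTimeout` names are hypothetical and are not webctx's API. `Promise.allSettled` lets one provider fail or time out without discarding the others.

```typescript
// Hypothetical common shape for every provider's results.
type SearchResult = { url: string; title: string };
type Provider = { name: string; search: (query: string) => Promise<SearchResult[]> };

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Query all providers in parallel. A provider that throws or times out is
// left out; the others' results are still returned, keyed by provider name.
async function fanOut(
  providers: Provider[],
  query: string,
  timeoutMs = 40_000,
): Promise<Map<string, SearchResult[]>> {
  const settled = await Promise.allSettled(
    providers.map((p) => withTimeout(p.search(query), timeoutMs)),
  );
  const results = new Map<string, SearchResult[]>();
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") results.set(providers[i].name, outcome.value);
  });
  return results;
}
```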
Normalize
URLs are lowercased, trailing slashes removed, and tracking parameters (UTM tags, fbclid, gclid) stripped. Same page, one entry.
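The normalization rules can be implemented like this. This is a sketch only. `normalizeUrl` is a hypothetical name. Any tracking parameters beyond the three examples named above are an assumption. Dropping the `#fragment` is also an added assumption.

```typescript
// Tracking parameters named above; utm_* is matched by prefix. The exact list
// webctx strips is an assumption.
const TRACKING_PARAMS = new Set(["fbclid", "gclid"]);

function normalizeUrl(raw: string): string {
  const url = new URL(raw.trim());
  // Copy the keys first so deleting does not disturb iteration.
  for (const key of Array.from(url.searchParams.keys())) {
    const k = key.toLowerCase();
    if (k.startsWith("utm_") || TRACKING_PARAMS.has(k)) url.searchParams.delete(key);
  }
  const path = url.pathname.replace(/\/+$/, ""); // drop trailing slashes
  const query = url.searchParams.toString();
  // Rebuild without the #fragment (assumption), then lowercase everything.
  return `${url.origin}${path}${query ? `?${query}` : ""}`.toLowerCase();
}
```

Lowercasing the whole URL matches the description. However, path segments are case-sensitive on some servers, so two genuinely different pages could collapse into a single entry.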
Score
Each result gets points based on rank. Position 1 earns 30 points, position 2 earns 27, declining to 1 point for anything below #14. Top results carry disproportionate weight.
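The rank-to-points mapping can be written as a lookup table. The text specifies only three values: 30 points for rank 1, 27 for rank 2, and a 1-point floor after rank 14. The intermediate values below are illustrative placeholders, and `positionScore` is a hypothetical name.

```typescript
// Points for ranks 1..14. Only 30, 27 and the 1-point floor after rank 14
// come from the description; the values in between are placeholders.
const POSITION_POINTS = [30, 27, 24, 21, 19, 17, 15, 13, 11, 9, 7, 5, 3, 2];

function positionScore(rank: number): number {
  if (!Number.isInteger(rank) || rank < 1) throw new RangeError(`invalid rank: ${rank}`);
  // Ranks past the table all earn the floor of 1 point.
  return rank <= POSITION_POINTS.length ? POSITION_POINTS[rank - 1] : 1;
}
```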
Boost duplicates
When the same URL appears across multiple providers, scores are additive — plus a bonus: +5 points for top-5 results, +3 otherwise. Cross-provider agreement is a strong relevance signal.
Deduplicate and rank
Duplicates collapse into single entries, and the list is sorted by score and capped at 35 results. One high-quality answer set.
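Boosting, deduplication, and ranking can all be done in a single pass over the provider lists. This sketch makes three assumptions: URLs are already normalized, "top-5" refers to the duplicate's rank within the provider that returned it, and the position-scoring function is passed in as a parameter. All names are hypothetical.

```typescript
// Each provider contributes an ordered list of (already normalized) URLs.
type RankedLists = Record<string, string[]>;
type Ranked = { url: string; score: number; providers: string[] };

function mergeAndRank(
  lists: RankedLists,
  positionScore: (rank: number) => number,
  limit = 35,
): Ranked[] {
  const merged = new Map<string, { score: number; providers: string[] }>();
  for (const [provider, urls] of Object.entries(lists)) {
    urls.forEach((url, i) => {
      const rank = i + 1;
      const entry = merged.get(url);
      if (!entry) {
        merged.set(url, { score: positionScore(rank), providers: [provider] });
      } else if (!entry.providers.includes(provider)) {
        // Same URL from another provider: scores add up, plus an agreement
        // bonus (+5 if this hit is in that provider's top 5, +3 otherwise).
        entry.score += positionScore(rank) + (rank <= 5 ? 5 : 3);
        entry.providers.push(provider);
      }
      // A repeat within the same provider is ignored.
    });
  }
  return [...merged.entries()]
    .map(([url, e]) => ({ url, ...e }))
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, limit);
}
```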
Not every page needs a headless browser.
webctx uses a three-tier extraction strategy. Each request automatically takes the fastest available path. All three tiers produce the same output format — your agent doesn't need to know which was used.
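The tier order can be modelled as a chain of extractors, tried cheapest first. Each extractor returns markdown, or `null` if it does not apply to the URL. This is a hypothetical sketch: the `Tier` type and the `extract` function are not webctx's API, and falling through to the next tier when one throws is an assumption.

```typescript
// Each tier returns markdown, or null when it does not apply to this URL.
type Tier = (url: string) => Promise<string | null>;

async function extract(url: string, tiers: Tier[]): Promise<string> {
  let lastError: unknown = new Error(`no extraction tier handled ${url}`);
  for (const tier of tiers) {
    try {
      const markdown = await tier(url);
      if (markdown !== null) return markdown; // first tier that succeeds wins
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next tier
    }
  }
  throw lastError;
}
```

To follow the order described here, `tiers` would hold the GitHub raw tier, then the direct markdown tier, then the Firecrawl tier.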
GitHub Raw Content
For GitHub URLs, webctx skips the web UI entirely. It parses the URL, constructs the raw.githubusercontent.com path, and fetches the file directly. For repo roots, it locates the README automatically.
An order of magnitude faster than rendering the GitHub page.
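The URL rewrite can be sketched as follows. `toRawGitHubUrl` is a hypothetical helper with simplifying assumptions. A repo root is mapped to `README.md` on the `HEAD` ref, whereas the real lookup may try other README names. Branch names that contain `/` are not handled. Pages that are not files, such as issues, return `null` so that a later tier can handle them.

```typescript
// Map a github.com URL to raw.githubusercontent.com, or null if not a file/repo root.
function toRawGitHubUrl(input: string): string | null {
  const url = new URL(input);
  if (url.hostname !== "github.com") return null;
  const [owner, repo, kind, ref, ...path] = url.pathname.split("/").filter(Boolean);
  if (!owner || !repo) return null;
  // Repo root: assume README.md on the default branch (HEAD).
  if (!kind) return `https://raw.githubusercontent.com/${owner}/${repo}/HEAD/README.md`;
  // File view: /owner/repo/blob/<ref>/<path...>
  if (kind === "blob" && ref && path.length > 0) {
    return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${path.join("/")}`;
  }
  return null; // trees, issues, PRs, etc. fall through to other tiers
}
```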
Direct Markdown Fetch
For documentation sites that serve .md files natively, webctx first sends a lightweight HEAD request. If the file exists and its content is meaningful (more than 50 bytes), webctx fetches it directly, with no rendering overhead.
Skips the browser entirely for sites that serve markdown.
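The HEAD-then-GET check might look like this. The sketch assumes the candidate `.md` URL has already been chosen. `fetch` is injectable so that the function can be tested offline. Treating an HTML content type as "not markdown" is an added assumption. The function name is hypothetical.

```typescript
// Returns markdown if `url` serves it directly, or null so the caller can
// fall back to full rendering.
async function tryDirectMarkdown(
  url: string,
  fetchImpl: typeof fetch = fetch,
  minBytes = 50,
): Promise<string | null> {
  // Cheap existence check first: HEAD transfers no body.
  const head = await fetchImpl(url, { method: "HEAD" });
  if (!head.ok) return null;
  // Assumption: an HTML response means the server did not serve raw markdown.
  if ((head.headers.get("content-type") ?? "").includes("text/html")) return null;
  // If the server declares a size, reject tiny files without downloading them.
  if (head.headers.has("content-length") && Number(head.headers.get("content-length")) <= minBytes) {
    return null;
  }
  const res = await fetchImpl(url);
  if (!res.ok) return null;
  const body = await res.text();
  // Re-check the real size in bytes, since Content-Length may be absent.
  return new TextEncoder().encode(body.trim()).length > minBytes ? body : null;
}
```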
Full Rendering via Firecrawl
For everything else — JS-heavy SPAs, dynamic loading, complex docs — Firecrawl renders the page fully. Navigation, headers, footers, ads, and base64 images are stripped. PDFs are parsed automatically.
A token-bucket queue throttles requests to prevent 429 (Too Many Requests) errors, processing them in FIFO order.
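A token-bucket limiter with a FIFO wait queue can be sketched in a few dozen lines. The bucket permits short bursts up to `capacity` and then settles to `refillPerSec` requests per second. Callers are released strictly in the order they arrived. The class name and both parameters are hypothetical and do not reflect webctx's actual limits.

```typescript
class TokenBucketQueue {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private tokens: number;
  private lastRefill: number;
  private readonly waiting: Array<() => void> = []; // FIFO of pending callers
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = Date.now();
  }

  // Resolves when the caller may send one request.
  acquire(): Promise<void> {
    return new Promise((resolve) => {
      this.waiting.push(resolve);
      this.drain();
    });
  }

  // Add tokens for the time elapsed since the last refill, up to capacity.
  private refill(): void {
    const now = Date.now();
    const earned = ((now - this.lastRefill) / 1000) * this.refillPerSec;
    this.tokens = Math.min(this.capacity, this.tokens + earned);
    this.lastRefill = now;
  }

  private drain(): void {
    this.refill();
    // Release waiters oldest-first while whole tokens are available.
    while (this.waiting.length > 0 && this.tokens >= 1) {
      this.tokens -= 1;
      this.waiting.shift()!();
    }
    if (this.waiting.length > 0 && this.timer === null) {
      // Wake up roughly when the next token will be available.
      const waitMs = ((1 - this.tokens) / this.refillPerSec) * 1000;
      this.timer = setTimeout(() => {
        this.timer = null;
        this.drain();
      }, Math.ceil(waitMs));
    }
  }
}
```

In use, each outbound request first calls `await queue.acquire()`. Under load, requests then wait their turn instead of being rejected by the upstream rate limit.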
Built-in observability. Not an afterthought.
When an agent searches the web or reads a page, you should see exactly what happened. webctx logs every tool call automatically and provides a web dashboard to inspect them.
What gets logged
- Tool name & input parameters
- Full response content
- Execution duration (ms)
- Success / failure status
- Client IP & user agent
- Geolocation (country, region, city)
Dashboard features
- Auto-refresh every 30 seconds
- Filter by tool type or status
- Expandable entries with rendered markdown
- Copy-to-clipboard for responses
- Color-coded tool badges
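Automatic logging of the fields listed under "What gets logged" can be sketched as a wrapper around each tool handler. The wrapper times the call, records success or failure, and passes the record to a sink. The field names and `withLogging` are hypothetical. `sink` stands in for whatever persists the record, and is not necessarily shaped like webctx's database schema. Geolocation is assumed to be resolved upstream and passed in with the client info.

```typescript
interface ClientInfo {
  clientIp?: string;
  userAgent?: string;
  geo?: { country?: string; region?: string; city?: string };
}

interface ToolCallLog extends ClientInfo {
  tool: string;
  input: unknown;
  output: string | null; // full response, or null on failure
  durationMs: number;
  success: boolean;
}

// Wrap a tool handler so every call is timed and recorded, success or failure.
function withLogging<I>(
  tool: string,
  handler: (input: I) => Promise<string>,
  sink: (entry: ToolCallLog) => void,
): (input: I, client?: ClientInfo) => Promise<string> {
  return async (input, client = {}) => {
    const start = Date.now();
    let output: string;
    try {
      output = await handler(input);
    } catch (err) {
      sink({ ...client, tool, input, output: null, durationMs: Date.now() - start, success: false });
      throw err; // the caller still sees the original error
    }
    sink({ ...client, tool, input, output, durationMs: Date.now() - start, success: true });
    return output;
  };
}
```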
Works with any MCP client.
webctx speaks the Model Context Protocol. Any compatible client can connect and use all three tools immediately. One endpoint. One API key.
ChatGPT note: webctx provides a separate /mcp-chatgpt endpoint that exposes only read-link and map-site. ChatGPT already has its own search, so there is no need to duplicate it.
Skip MCP. Pipe the web.
webctx ships a CLI that calls the same tools directly — no server required. Agents can pipe web content into files, chain commands with unix tools, and download raw content from any URL. Built for workflows where stdout matters.
Zero MCP overhead
Direct function calls — no protocol negotiation, no server handshake. Just run the command.
Pipe-native output
Plain text stdout. Pipe into files, grep, awk, jq — or straight to your agent's context.
Download anything
Raw GitHub files, documentation pages, license templates — fetch and save in one line.
Same tools, same engine
Identical search, read-link, and map-site — backed by the same multi-provider pipeline.
Self-host on Vercel. Own your data.
webctx is open source and designed for self-hosting. You own the deployment, the data, and the API keys. No vendor lock-in. No usage-based pricing from a third party.
Prerequisites
- A Vercel account (free tier works)
- API keys for Brave Search, Tavily, Exa, and Firecrawl
- A Turso database (free tier available)
Give your agents the web.
An open-source MCP server with multi-provider search, intelligent content extraction, full site mapping, and built-in observability for every request.