Open source MCP server

Give your AI agents the entire web.

An open-source MCP server that connects any AI agent to real-time web search, page reading, and site mapping. Three search providers. Intelligent re-ranking. One unified API.

Training data goes stale. The web does not.

AI coding agents are powerful — until they need information that is not in their training data.

A new library version ships. An API changes its response format. A framework deprecates a core method. Your agent does not know. It writes code based on what it learned months ago, and you spend your time fixing the hallucinations.

Some agents have basic web access built in. But “basic” means a single search provider, no content extraction strategy, and no visibility into what the agent actually retrieved.

Your agents need more than a search box. They need a proper web research toolkit — one that searches multiple sources, extracts clean content from any page, and gives you full observability into every request.

Three tools. Full web access.

Everything an agent needs to search, read, and map the web — exposed as MCP tools with a single endpoint.

search

Search the web

Queries Brave Search, Tavily, and Exa simultaneously. Results are deduplicated, scored by position, boosted when multiple providers agree, and returned as a single ranked list. Up to 35 results per query.

> search("Next.js 15 server actions")
→ Ranked results from 3 providers

read-link

Extract page content

Converts web pages into clean markdown. Three-tier strategy: raw GitHub content for repos, direct markdown fetch for docs, and full JS rendering via Firecrawl for everything else. Strips navigation, ads, and boilerplate.

> read-link("https://docs.example.com/api")
→ Clean markdown with extracted content

map-site

Discover site structure

Returns up to 5,000 URLs from any website, including subdomains. Uses sitemaps when available. Gives agents a complete picture of a site before reading individual pages.

> map-site("https://docs.example.com")
→ Full URL list with titles and descriptions

One query. Three search engines. One ranked result set.

Every search provider has blind spots. Brave has broad web coverage but may miss niche content. Tavily is optimized for AI results. Exa uses neural search with live crawling.

webctx queries all three in parallel, then combines the results intelligently.

01

Fan out

The query hits Brave, Tavily, and Exa simultaneously. Each provider has a 40-second timeout. If one fails, the others continue.

02

Normalize

URLs are lowercased, trailing slashes removed, and tracking parameters (UTM tags, fbclid, gclid) stripped. Same page, one entry.
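The normalization step can be sketched as a small pure function. Note that only the host is safely lowercased (paths can be case-sensitive), and the tracking-parameter blocklist below is illustrative, not webctx's exact list:

```typescript
// Tracking parameters commonly stripped (the real list may be longer).
const TRACKING_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "fbclid", "gclid"];

function normalizeUrl(raw: string): string {
  const url = new URL(raw); // WHATWG URL parsing lowercases the host for us
  for (const p of TRACKING_PARAMS) url.searchParams.delete(p);
  // Remove trailing slashes so /docs and /docs/ collapse to one entry.
  url.pathname = url.pathname.replace(/\/+$/, "") || "/";
  const query = url.searchParams.toString();
  return `${url.protocol}//${url.host}${url.pathname === "/" ? "/" : url.pathname}${query ? `?${query}` : ""}`;
}
```

With this, `https://Example.com/docs/?utm_source=news` and `https://example.com/docs` map to the same key.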

03

Score

Each result gets points based on rank. Position 1 earns 30 points, position 2 earns 27, declining to 1 point for anything below #14. Top results carry disproportionate weight.

04

Boost duplicates

When the same URL appears across multiple providers, scores are additive — plus a bonus: +5 points for top-5 results, +3 otherwise. Cross-provider agreement is a strong relevance signal.

05

Deduplicate and rank

The final list is sorted by score, deduplicated, and capped at 35 results. One high-quality answer set.
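Steps 03 to 05 amount to a merge over per-provider result lists keyed by normalized URL. A sketch, assuming a linear decay floored at 1 point and applying the agreement bonus based on the duplicate's rank in each list; the exact curve and bonus rule in webctx may differ:

```typescript
interface SearchResult { url: string; title: string; }

// Position-based points: roughly 30 for #1, declining per position, floored at 1.
function positionScore(rank: number): number {
  return Math.max(30 - 3 * (rank - 1), 1);
}

// Merge per-provider lists (URLs assumed already normalized) into one ranked set.
function mergeResults(providerLists: SearchResult[][], cap = 35): SearchResult[] {
  const scores = new Map<string, { result: SearchResult; score: number }>();
  for (const list of providerLists) {
    list.forEach((result, i) => {
      const rank = i + 1;
      const existing = scores.get(result.url);
      if (existing) {
        // Cross-provider agreement: additive score plus a bonus.
        existing.score += positionScore(rank) + (rank <= 5 ? 5 : 3);
      } else {
        scores.set(result.url, { result, score: positionScore(rank) });
      }
    });
  }
  return Array.from(scores.values())
    .sort((a, b) => b.score - a.score)
    .map((e) => e.result)
    .slice(0, cap);
}
```

A URL ranked #2 by one provider and #1 by another outscores any single-provider #1, which is the intended effect of the agreement bonus.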

Not every page needs a headless browser.

webctx uses a three-tier extraction strategy. Each request automatically takes the fastest available path. All three tiers produce the same output format — your agent doesn't need to know which was used.

Tier 1: Fastest

GitHub Raw Content

For GitHub URLs, webctx skips the web UI entirely. It parses the URL, constructs the raw.githubusercontent.com path, and fetches the file directly. For repo roots, it locates the README automatically.

Order of magnitude faster than rendering the GitHub page.
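The rewrite can be sketched as a URL transformation. The `/blob/` path layout below is an assumption about github.com URLs, not webctx's exact implementation:

```typescript
// Rewrite a github.com blob URL to its raw.githubusercontent.com equivalent.
// e.g. https://github.com/owner/repo/blob/main/README.md
//   -> https://raw.githubusercontent.com/owner/repo/main/README.md
function toRawGitHubUrl(pageUrl: string): string | null {
  const url = new URL(pageUrl);
  if (url.hostname !== "github.com") return null;
  const parts = url.pathname.split("/").filter(Boolean); // [owner, repo, "blob", branch, ...path]
  if (parts.length < 5 || parts[2] !== "blob") return null;
  const [owner, repo, , branch, ...path] = parts;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/${path.join("/")}`;
}
```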

Tier 2: Fast

Direct Markdown Fetch

For documentation sites serving .md files natively, webctx performs a lightweight HEAD request. If the content exists and is meaningful (>50 bytes), it fetches directly. No rendering overhead.

Skips the browser entirely for sites that serve markdown.
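The probe boils down to a HEAD request and a size check. A sketch with the 50-byte threshold from above, factoring the decision into a pure helper so it can be reasoned about without a network:

```typescript
const MIN_USEFUL_BYTES = 50;

// Decide from a HEAD response whether a direct markdown fetch is worthwhile.
function isDirectFetchViable(status: number, contentLength: number | null): boolean {
  return status === 200 && contentLength !== null && contentLength > MIN_USEFUL_BYTES;
}

// Probe a candidate .md URL; return null to fall through to full rendering.
async function tryDirectMarkdown(url: string): Promise<string | null> {
  const head = await fetch(url, { method: "HEAD" });
  const length = head.headers.get("content-length");
  if (!isDirectFetchViable(head.status, length ? Number(length) : null)) return null;
  const res = await fetch(url);
  return res.text();
}
```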

Tier 3: Comprehensive

Full Rendering via Firecrawl

For everything else — JS-heavy SPAs, dynamic loading, complex docs — Firecrawl renders the page fully. Navigation, headers, footers, ads, and base64 images are stripped. PDFs are parsed automatically.

Token bucket queue prevents 429 errors. FIFO processing.
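A token bucket with a FIFO waiters queue can be sketched like this; capacity and refill behavior are illustrative, not webctx's actual Firecrawl limits:

```typescript
// Minimal token bucket: jobs consume tokens; empty bucket means callers wait.
class TokenBucket {
  private tokens: number;
  constructor(private capacity: number) {
    this.tokens = capacity;
  }
  take(): boolean {
    if (this.tokens > 0) { this.tokens--; return true; }
    return false;
  }
  refill(n = 1): void {
    this.tokens = Math.min(this.capacity, this.tokens + n);
  }
}

// FIFO limiter: waiters are woken in arrival order, one per released token.
class FifoLimiter {
  private queue: Array<() => void> = [];
  constructor(private bucket: TokenBucket) {}
  async run<T>(job: () => Promise<T>): Promise<T> {
    if (!this.bucket.take()) {
      await new Promise<void>((resolve) => this.queue.push(resolve));
    }
    try { return await job(); }
    finally {
      const next = this.queue.shift();
      if (next) next();          // hand the token straight to the next waiter
      else this.bucket.refill(); // nobody waiting: return it to the bucket
    }
  }
}
```

Because a released token is handed directly to the oldest waiter, at most `capacity` requests are ever in flight against the provider, which is what keeps 429s away.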

Built-in observability. Not an afterthought.

When an agent searches the web or reads a page, you should see exactly what happened. webctx logs every tool call automatically and provides a web dashboard to inspect them.

What gets logged

  • Tool name & input parameters
  • Full response content
  • Execution duration (ms)
  • Success / failure status
  • Client IP & user agent
  • Geolocation (country, region, city)

Dashboard features

  • Auto-refreshing every 30 seconds
  • Filter by tool type or status
  • Expandable entries with rendered markdown
  • Copy-to-clipboard for responses
  • Color-coded tool badges
/dashboard
  search     "React 19 useOptimistic"             success  2.3s
  read-link  react.dev/reference/react/useO...    success  0.8s
  map-site   react.dev                            success  1.1s
  search     "Drizzle ORM migrations"             success  1.9s
  read-link  orm.drizzle.team/docs/migrat...      success  1.4s
Zero added latency — logs run async via Next.js after()

Works with any MCP client.

webctx speaks the Model Context Protocol. Any compatible client can connect and use all three tools immediately. One endpoint. One API key.

Claude Code
Full tool support
Cursor
Full tool support
Codex
Full tool support
ChatGPT
Dedicated endpoint
Custom Agents
Any MCP SDK client

ChatGPT note: webctx provides a separate /mcp-chatgpt endpoint that exposes only read-link and map-site — since ChatGPT already has its own search, there is no need to duplicate it.
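Exact configuration varies by client, but for clients that read a JSON MCP config the shape is typically something like the following; the endpoint path and header name are placeholders for your own deployment:

```json
{
  "mcpServers": {
    "webctx": {
      "url": "https://your-deployment.vercel.app/mcp",
      "headers": { "Authorization": "Bearer YOUR_API_KEY" }
    }
  }
}
```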

CLI included

Skip MCP. Pipe the web.

webctx ships a CLI that calls the same tools directly — no server required. Agents can pipe web content into files, chain commands with Unix tools, and download raw content from any URL. Built for workflows where stdout matters.

webctx CLI
$ bun install -g webctx
# Search and save results to a file
$ webctx search "drizzle orm migration guide" > results.md
 
# Download docs straight into your project
$ webctx read-link https://orm.drizzle.team/docs/migrations > docs/migrations.md
 
✓ 4,823 tokens of clean markdown saved

Zero MCP overhead

Direct function calls — no protocol negotiation, no server handshake. Just run the command.

Pipe-native output

Plain text stdout. Pipe into files, grep, awk, jq — or straight to your agent's context.

Download anything

Raw GitHub files, documentation pages, license templates — fetch and save in one line.

Same tools, same engine

Identical search, read-link, and map-site — backed by the same multi-provider pipeline.

Self-host on Vercel. Own your data.

webctx is open source and designed for self-hosting. You own the deployment, the data, and the API keys. No vendor lock-in. No usage-based pricing from a third party.

Prerequisites

  • A Vercel account (free tier works)
  • API keys for Brave Search, Tavily, Exa, and Firecrawl
  • A Turso database (free tier available)
terminal
Clone the repo
$ git clone https://github.com/your-repo/webctx.git
Set env variables
$ cp .env.example .env.local && vim .env.local
Deploy to Vercel
$ vercel deploy
Connect agents
$ # Point your MCP clients at your deployment URL

Give your agents the web.

An open-source MCP server with multi-provider search, intelligent content extraction, full site mapping, and built-in observability for every request.

webctx · Open source. MIT License.