SearXNG MCP server with parallel fan-out to 3 VPN-backed backends (NL/US/SG)
slothitude 21bded5080 Add VPN location switcher: 3 MCP tools to dynamically swap Gluetun slots
switch_location, vpn_status, list_endpoints — connects to Lappy via SSH
to run switch_location.py against 83 PureVPN endpoints.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-05 05:59:21 +10:00

SearchMCP

Agent-optimized search and web reading MCP server. Returns clean markdown with metadata — purpose-built for LLM consumption.

Features

  • Three-tier extraction with a last-resort fallback: trafilatura (fast, ~50 ms) → trafilatura precision retry → crawl4ai browser → BeautifulSoup
  • Parallel VPN fan-out: queries 3 SearXNG backends (NL/US/SG) simultaneously via asyncio.gather, merges and deduplicates results
  • Quality scoring: domain authority + relevance ranking
  • aiosqlite cache: persists across restarts; count-based LRU eviction with stampede protection
  • Per-domain rate limiting: token bucket + robots.txt cache
  • MCP tools for search, reading, batch reading, sitemap/RSS, field extraction, summarization, and cache management
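
The extraction chain in the first bullet boils down to "try each tier in order and keep the first usable result." A minimal sketch of that control flow, assuming each tier is a plain function taking (url, html) and returning text or None. The tier stand-ins, the looks_usable gate, and its 200-character threshold are hypothetical and not taken from extract/; the sketch is synchronous for brevity.

```python
from typing import Callable, Optional

# Hypothetical tier signature: (url, html) -> extracted text, or None on failure.
Extractor = Callable[[str, str], Optional[str]]


def looks_usable(text: Optional[str], min_chars: int = 200) -> bool:
    """Hypothetical quality gate: accept only a non-trivial amount of text."""
    return text is not None and len(text.strip()) >= min_chars


def extract_with_fallback(
    url: str, html: str, tiers: list[tuple[str, Extractor]]
) -> tuple[str, str]:
    """Run tiers in order; return (tier_name, text) for the first usable result."""
    for name, extractor in tiers:
        try:
            text = extractor(url, html)
        except Exception:
            continue  # a crashing tier simply hands off to the next one
        if looks_usable(text):
            return name, text
    raise ValueError(f"all extraction tiers failed for {url}")


# The tier order would mirror the README, e.g.:
# [("trafilatura", broad), ("trafilatura-precision", precise),
#  ("crawl4ai", browser), ("beautifulsoup", soup_text)]
```

Keeping the quality gate separate from the tiers means a tier that "succeeds" with a near-empty page (common on JS-heavy sites) still falls through to the browser tier.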

Architecture

searchmcp.py             FastMCP server entry point
config.py                Env vars, domain authority, TTLs
extract/
  tier1.py               trafilatura (broad + precision)
  tier3.py               crawl4ai browser fallback + BeautifulSoup
  postprocess.py         Markdown cleanup, metadata injection
  quality.py             Article body detection, JS-heavy detection
  summarize.py           Extractive key sentences (no LLM)
search/
  searxng.py             Parallel fan-out to multiple SearXNG backends
  scoring.py             Domain authority + relevance scoring
  dedup.py               URL/title dedup
  feeds.py               Sitemap/RSS/Atom parsing + autodiscovery
cache/
  store.py               aiosqlite WAL, count-based LRU
utils/
  urls.py                URL normalization, domain extraction
  rate_limit.py          Per-domain token bucket
  markdown.py            Truncation, metadata blocks
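
utils/rate_limit.py is described only as a per-domain token bucket. One way such a limiter can look, as a hedged sketch: each hostname gets its own bucket that refills at SEARCHMCP_RATE_LIMIT tokens per second (2 by default), and every fetch spends one token. The class names, the burst size, and the polling wait() helper are assumptions, not the module's actual API.

```python
import asyncio
import time
from typing import Optional
from urllib.parse import urlsplit


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst goes through
        self.updated = now

    def try_acquire(self, now: float) -> bool:
        # Credit tokens for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class DomainRateLimiter:
    """One bucket per hostname; the default rate matches SEARCHMCP_RATE_LIMIT=2."""

    def __init__(self, rate: float = 2.0, burst: float = 2.0) -> None:
        self.rate = rate
        self.burst = burst  # hypothetical: the README does not state a burst size
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, url: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        host = urlsplit(url).hostname or ""
        bucket = self.buckets.get(host)
        if bucket is None:
            bucket = self.buckets[host] = TokenBucket(self.rate, self.burst, now)
        return bucket.try_acquire(now)

    async def wait(self, url: str) -> None:
        """Sleep in 1/rate steps until this URL's domain has a token available."""
        while not self.allow(url):
            await asyncio.sleep(1.0 / self.rate)
```

Keying buckets by hostname means a batch read across many sites is limited only by SEARCHMCP_FETCH_CONCURRENCY, while repeated hits to one site are spaced out.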

Installation

pip install fastmcp httpx aiohttp aiosqlite trafilatura beautifulsoup4 crawl4ai

Configuration

Variable                     Default                        Purpose
SEARXNG_URLS                 nl=..8899,us=..8898,sg=..8897  Location-keyed SearXNG backends
SEARXNG_URL                  First URL from SEARXNG_URLS    Backward-compatible single backend
BROWSER_CDP_URL              None (disabled)                Remote Chrome CDP endpoint for the JS fallback tier
SEARCHMCP_CACHE_DIR          ~/.searchmcp                   Cache database directory
SEARCHMCP_RATE_LIMIT         2                              Requests per second, per domain
SEARCHMCP_FETCH_CONCURRENCY  5                              Maximum concurrent fetches
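
SEARXNG_URLS packs several location-keyed backends into one string. A sketch of how that value could be parsed, assuming comma-separated location=url pairs as in the registration example, with SEARXNG_URL resolving to the first entry when it is not set explicitly. The function names and the trailing-slash normalization are hypothetical; config.py may handle this differently.

```python
import os
from typing import Mapping


def parse_searxng_urls(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Parse SEARXNG_URLS ("nl=http://host:8899,us=...") into {location: url}, in order."""
    backends: dict[str, str] = {}
    for entry in env.get("SEARXNG_URLS", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first "=" only, so any "=" inside the URL survives.
        location, sep, url = entry.partition("=")
        if not sep or not location.strip() or not url.strip():
            raise ValueError(f"malformed SEARXNG_URLS entry: {entry!r}")
        backends[location.strip().lower()] = url.strip().rstrip("/")
    return backends


def resolve_single_url(env: Mapping[str, str], backends: dict[str, str]) -> str:
    """SEARXNG_URL if set, otherwise the first SEARXNG_URLS entry (per the table)."""
    explicit = env.get("SEARXNG_URL", "").strip()
    if explicit:
        return explicit
    if not backends:
        raise ValueError("set SEARXNG_URLS or SEARXNG_URL")
    return next(iter(backends.values()))
```

Because Python dicts preserve insertion order, "first URL" is simply the first pair written in SEARXNG_URLS.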

MCP Registration

{
  "searchmcp": {
    "type": "stdio",
    "command": "python",
    "args": ["searchmcp.py"],
    "cwd": "/path/to/searchmcp",
    "env": {
      "SEARXNG_URLS": "nl=http://host:8899,us=http://host:8898,sg=http://host:8897"
    }
  }
}

Tools

Tool               Description
search             SearXNG search, ranked and deduplicated. The location parameter selects "all" (parallel fan-out) or a single backend: "nl", "us", or "sg"
search_and_read    One-shot search, then extract the top N results concurrently
read_url           Extract a single URL with the tiered pipeline
read_urls          Read multiple URLs in parallel
read_sitemap       Sitemap/RSS/Atom parsing with feed autodiscovery
extract            Extract fields from a URL by CSS selector
summarize_content  Extractive key sentences (no LLM)
cache_status       Cache statistics
clear_cache        Clear cache entries
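
summarize_content returns extractive key sentences without an LLM. The README does not say how sentences are scored, so this sketch uses one common technique, word-frequency scoring: each sentence gets the mean normalized frequency of its non-stopword words, and the top k are returned in document order. It is an illustrative stand-in, not necessarily what extract/summarize.py does, and the stopword list is deliberately tiny.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def content_words(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOPWORDS]


def key_sentences(text: str, k: int = 3) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))
    if not freq:
        return sentences[:k]
    top = max(freq.values())

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Mean (not sum) so long sentences are not favored just for their length.
        return sum(freq[w] / top for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Returning sentences in document order rather than score order keeps the summary readable as a short passage.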

VPN Stack (Optional)

SearchMCP works best behind VPN-backed SearXNG instances, which reduce search-engine rate limiting and geoblocking. The recommended setup runs each SearXNG instance behind its own Gluetun container, all routed through PureVPN:

searchmcp → parallel fan-out:
  ├── searxng-nl (port 8899) via gluetun-nl (Netherlands)
  ├── searxng-us (port 8898) via gluetun-us (United States)
  └── searxng-sg (port 8897) via gluetun-sg (Singapore)
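
The fan-out above can be sketched with asyncio.gather, assuming each backend is queried through an injected fetch coroutine (in practice likely an HTTP GET to SearXNG's /search endpoint with format=json, which must be enabled on the instance). Results are merged in backend order and deduplicated by a normalized URL; the location argument mirrors the search tool's "all"/"nl"/"us"/"sg" choice. The names and URL-normalization rules are hypothetical; search/searxng.py and search/dedup.py (which also dedups by title) may differ, and ranking is left to the scoring step.

```python
import asyncio
from typing import Awaitable, Callable
from urllib.parse import urlsplit, urlunsplit

# Hypothetical transport: (base_url, query) -> list of result dicts with at least "url".
Fetch = Callable[[str, str], Awaitable[list[dict]]]


def dedup_key(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    p = urlsplit(url)
    path = p.path.rstrip("/") or "/"
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), path, p.query, ""))


async def fan_out(
    query: str, backends: dict[str, str], fetch: Fetch, location: str = "all"
) -> list[dict]:
    if location != "all":
        backends = {location: backends[location]}  # single-slot search
    locations = list(backends)
    # Query all backends concurrently; with return_exceptions=True a failed slot
    # comes back as an exception object instead of making gather raise.
    responses = await asyncio.gather(
        *(fetch(backends[loc], query) for loc in locations), return_exceptions=True
    )
    merged: list[dict] = []
    seen: set[str] = set()
    for loc, resp in zip(locations, responses):
        if isinstance(resp, BaseException):
            continue
        for item in resp:
            key = dedup_key(item["url"])
            if key not in seen:  # the first backend to return a URL wins
                seen.add(key)
                merged.append({**item, "location": loc})
    return merged
```

Tagging each merged result with its location makes it visible which VPN exit produced it, and a slot that is down simply contributes no results.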

License

MIT