Sitemap Extractor API

Extract every page URL from any website sitemap with a single API call.

Pass a domain name and get back deduplicated page URLs from its sitemaps. Sitemap index files are crawled recursively. Non-page resources are filtered out automatically.

Perfect for building content indexes, seeding crawlers, or auditing a competitor's full site structure in seconds.

No credit card required
Daydream logo
Kovai logo
Passionfroot logo
Orange logo
SendX logo
Klarna logo
Super.com logo

What You Get

Each request crawls a domain's sitemaps and returns all discoverable page URLs.

Deduplicated page URLs

Deduplicated, page-only results with non-page resources like images and PDFs automatically filtered out.

Sitemap index support

Recursively crawls nested sitemap index files with parallel fetching and concurrency control.

Crawl metadata

See how many sitemaps were discovered, fetched, skipped, and errored in a single response.

Normalized domain input

Pass just the domain name, no protocol needed. The API validates and normalizes domains automatically.
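The API's own normalization rules are not spelled out on this page. As a rough client-side illustration of what "normalized domain input" could mean, the sketch below reduces a pasted URL or hostname to a bare lowercase domain. The function name `normalize_domain` and the exact rules (strip scheme, path, port, and trailing dot) are assumptions, not the service's documented behavior.

```python
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce input such as 'https://Example.com/path' to 'example.com'.

    Hypothetical helper: the real API may apply different rules.
    """
    text = raw.strip()
    # urlsplit only recognizes the host when a '//' authority marker is present.
    if "://" not in text:
        text = "//" + text
    host = (urlsplit(text).hostname or "").rstrip(".")
    # Require at least one dot so bare words like 'localhost' are rejected.
    if not host or "." not in host:
        raise ValueError(f"not a valid domain: {raw!r}")
    return host
```

Running input through a helper like this before calling the API is optional, since the API accepts plain domains and normalizes them itself.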

How It Works

We discover every relevant sitemap, follow sitemap indexes, and return a clean, deduplicated list of page URLs.

— step 01

Send a domain

Pass the domain name (e.g., "example.com" or "blog.example.com"). No protocol required.

— step 02

Sitemaps discovered

The API checks robots.txt and common sitemap paths, then recursively follows sitemap index files.

— step 03

URLs extracted and deduplicated

All page URLs are collected from every sitemap, deduplicated, and filtered to exclude non-page resources.

— step 04

Clean URL list returned

You get normalized page URLs plus crawl metadata about the sitemap discovery process.
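Step 03 can be pictured with a small sketch: collect URLs, drop exact duplicates, and skip anything that looks like a file rather than a page. The extension list and the names `NON_PAGE_EXTENSIONS`, `is_page_url`, and `clean_urls` are illustrative assumptions; the API's actual filter is not documented here.

```python
from urllib.parse import urlsplit

# Assumed list of non-page file types; the service's real list may differ.
NON_PAGE_EXTENSIONS = (
    ".pdf", ".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".mp4", ".webm",
)


def is_page_url(url: str) -> bool:
    """Return False when the URL path ends in a known non-page extension."""
    path = urlsplit(url).path.lower()
    return not path.endswith(NON_PAGE_EXTENSIONS)


def clean_urls(urls):
    """Deduplicate while preserving first-seen order, keeping page URLs only."""
    seen = set()
    result = []
    for url in urls:
        url = url.strip()
        if url and url not in seen and is_page_url(url):
            seen.add(url)
            result.append(url)
    return result
```

Checking the path rather than the whole URL means a query string such as `?file=report.pdf` does not cause a page to be dropped.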

API Response

Discovered URLs for context.dev

GET /v1/web/scrape/sitemap?domain=context.dev
{
  "success": true,
  "domain": "context.dev",
  "urls": [
    "https://context.dev/",
    "https://context.dev/pricing",
    "https://context.dev/blog",
    "https://context.dev/data/logo-api",
    "https://context.dev/use-cases/logo-link",
    "... more discovered URLs"
  ],
  "meta": {
    "sitemapsDiscovered": 3,
    "sitemapsFetched": 3,
    "sitemapsSkipped": 0,
    "errors": 0
  }
}
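
A minimal Python sketch of calling the endpoint and reading the response shape shown above. The base URL `https://api.example.com` is a placeholder and the Bearer-token header is an assumption; check the documentation for the real host and authentication scheme. Only the path, the `domain` parameter, and the response fields come from this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def build_request_url(domain: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for the sitemap endpoint."""
    return f"{base_url}/v1/web/scrape/sitemap?{urlencode({'domain': domain})}"


def summarize(payload: dict) -> dict:
    """Pull the headline numbers out of a response payload."""
    if not payload.get("success"):
        raise RuntimeError("sitemap extraction failed")
    meta = payload.get("meta", {})
    return {
        "domain": payload["domain"],
        "url_count": len(payload["urls"]),
        "fetched": meta.get("sitemapsFetched", 0),
        "errors": meta.get("errors", 0),
    }


def fetch_sitemap_urls(domain: str, api_key: str) -> dict:
    """Perform the request. The Authorization header format is assumed."""
    request = Request(
        build_request_url(domain),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

Typical use would be `summarize(fetch_sitemap_urls("context.dev", api_key))`, with `api_key` read from an environment variable rather than hard-coded.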

Frequently asked questions

Common questions about the Context.dev Sitemap Extractor API.

Am I billed for failed requests?
No. Failed requests are not billed, including the rare cases where we are blocked by the target site. Credits are consumed only on successful responses.
How do I extract every URL from a website?
Pass the domain to /v1/web/scrape/sitemap. The API checks robots.txt for sitemap directives, fetches the sitemap.xml (or sitemap index), recursively walks every child sitemap, deduplicates the URLs, and filters out non-page resources like PDFs, images, and video files. You get back a clean list of canonical page URLs.
How does the sitemap extractor find sitemaps?
It checks robots.txt for "Sitemap:" directives first, then falls back to common paths (/sitemap.xml, /sitemap_index.xml, /sitemap-index.xml). When the root file is a sitemap index pointing at child sitemaps, the API follows them recursively in parallel with concurrency control.
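As an illustration of that discovery order, here is a sketch that reads "Sitemap:" directives from a robots.txt body and falls back to the common paths only when none are declared. The function name `sitemap_candidates` and the choice of `https://` for fallback URLs are assumptions; the service may probe differently.

```python
COMMON_PATHS = ("/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml")


def sitemap_candidates(domain: str, robots_txt: str) -> list:
    """Return sitemap URLs to try, preferring those declared in robots.txt."""
    declared = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own 'https:' stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            declared.append(value.strip())
    if declared:
        return declared
    # No directives found: fall back to conventional locations (https assumed).
    return [f"https://{domain}{path}" for path in COMMON_PATHS]
```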
Does it support sitemap index files (nested sitemaps)?
Yes. Large sites usually split their sitemap into a top-level <sitemapindex> file with dozens or hundreds of child sitemaps. The API resolves them all and returns the union of every URL discovered, with crawl metadata showing how many sitemaps were discovered, fetched, skipped, and errored.
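To make the recursion concrete, the sketch below walks a sitemap index with Python's standard XML parser. It is sequential for brevity, whereas the API fetches in parallel with concurrency control. The `fetch` callable, the `max_depth` guard, and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def child_locs(root) -> list:
    """Collect <loc> values that sit directly under <sitemap> or <url> entries."""
    locs = []
    for entry in root:
        for field in entry:
            # Nested <image:loc> elements are deeper, so they are not picked up.
            if local_name(field.tag) == "loc" and field.text:
                locs.append(field.text.strip())
    return locs


def collect_urls(sitemap_url, fetch, seen=None, max_depth=5):
    """Return page URLs from a sitemap, following index files recursively.

    `fetch` is any callable mapping a sitemap URL to its XML text.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen or max_depth < 0:
        return []  # avoid loops and runaway nesting
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    if local_name(root.tag) != "sitemapindex":
        return child_locs(root)
    urls = []
    for child in child_locs(root):
        urls.extend(collect_urls(child, fetch, seen, max_depth - 1))
    # Union across child sitemaps, keeping first-seen order.
    return list(dict.fromkeys(urls))
```

Passing `fetch` in as a parameter keeps the traversal logic independent of the HTTP client and easy to exercise with canned XML.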
What's the difference between a sitemap extractor and a website crawler?
A sitemap extractor reads the URLs the site already declares in sitemap.xml — fast, lightweight, no rendering. A website crawler walks links from the homepage to discover URLs not in the sitemap. Use the sitemap extractor when the site publishes a sitemap; use the Website Crawler API when it doesn't, or when you want page content too.
Why use this instead of writing my own sitemap parser?
Real-world sitemaps are messy: gzip-compressed, deeply nested indexes, malformed XML, duplicate URLs across multiple files, image/video sub-sitemaps mixed in with page URLs, broken Sitemap: directives in robots.txt. The API handles all of that. One GET, clean URL list.
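One of those cases, gzip-compressed sitemaps, shows why a hand-rolled parser needs extra care. A small sketch of detecting compression from the payload itself rather than trusting the file extension; the function name `decode_sitemap_body` and the lenient UTF-8 decoding are assumptions, not a description of how the API does it.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_sitemap_body(body: bytes) -> str:
    """Return sitemap XML as text, decompressing first when the body is gzipped."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    # Replace undecodable bytes instead of failing on a slightly broken file.
    return body.decode("utf-8", errors="replace")
```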
Is the sitemap extractor API free?
Yes — the free tier covers thousands of monthly extractions, plenty for SEO audits and small projects. A single API key also unlocks Logo, Colors, NAICS, SIC, web scraping, and the rest of the Context.dev stack.

Ship an agent that actually knows things.

Free tier, 10-minute integration, and the same API powering agents at Mintlify, daily.dev, and Propane. No credit card to start.