Question 1

Am I billed for failed requests?

Accepted Answer

No. You are not billed for failed requests, or for requests where the target site blocks us (which is rare). Credits are consumed only on successful responses.

Question 2

How do I extract every URL from a website?

Accepted Answer

Pass the domain to /v1/web/scrape/sitemap. The API checks robots.txt for sitemap directives, fetches the sitemap.xml (or sitemap index), recursively walks every child sitemap, deduplicates the URLs, and filters out non-page resources like PDFs, images, and video files. You get back a clean list of canonical page URLs.
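
As a rough illustration of the call described above, here is a minimal Python sketch that builds the GET request without sending it. Only the path /v1/web/scrape/sitemap comes from the answer. The base URL, the `domain` query parameter name, and the Bearer-token header are assumptions made for the example, so check the API reference for the real values.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions (not confirmed by the text above): the base URL, the "domain"
# query parameter name, and Bearer-token auth. Only the path is from the source.
BASE_URL = "https://api.example.com"  # hypothetical placeholder host
SITEMAP_PATH = "/v1/web/scrape/sitemap"


def build_sitemap_request(domain: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for the sitemap endpoint."""
    query = urlencode({"domain": domain})
    return Request(
        f"{BASE_URL}{SITEMAP_PATH}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = build_sitemap_request("example.com", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The response shape is not described here, so the sketch stops at building the request.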

Question 3

How does the sitemap extractor find sitemaps?

Accepted Answer

It checks robots.txt for "Sitemap:" directives first, then falls back to common paths (/sitemap.xml, /sitemap_index.xml, /sitemap-index.xml). When the root file is a sitemap index pointing at child sitemaps, the API follows them recursively in parallel with concurrency control.
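
To make the discovery order concrete, the sketch below reads "Sitemap:" directives from a robots.txt body and falls back to the three common paths when none are present. It illustrates the idea under simple assumptions and is not the API's actual implementation. The function name `sitemap_candidates` is made up for this example.

```python
from urllib.parse import urljoin

# Fallback paths named in the answer above.
FALLBACK_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml"]


def sitemap_candidates(origin: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs declared in robots.txt, or the fallback paths if none."""
    found: list[str] = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        field, sep, value = line.split("#", 1)[0].partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            url = urljoin(origin, value.strip())
            if url not in found:  # ignore repeated directives
                found.append(url)
    return found or [urljoin(origin, path) for path in FALLBACK_PATHS]


robots = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/pages.xml\n"
print(sitemap_candidates("https://example.com", robots))
```

The field name is matched case-insensitively because robots.txt files in the wild use "Sitemap:", "sitemap:", and other casings.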

Question 4

Does it support sitemap index files (nested sitemaps)?

Accepted Answer

Yes. Large sites usually split their sitemap into a top-level <sitemapindex> file with dozens or hundreds of child sitemaps. The API resolves them all and returns the union of every URL discovered, along with crawl metadata showing how many sitemaps were found, fetched, and skipped.
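
The recursion over a sitemap index can be sketched as follows. This is a simplified, sequential illustration (the API is described as fetching child sitemaps in parallel). The `fetch` callable, the helper names, and the in-memory `FAKE_SITE` data are all hypothetical stand-ins so that the example runs without network access.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}urlset' -> 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def collect_urls(url, fetch, seen=None):
    """Return page URLs under `url`, following nested sitemap indexes."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against sitemaps that reference each other
        return []
    seen.add(url)
    root = ET.fromstring(fetch(url))
    # Read only the direct <loc> of each <sitemap>/<url> entry.
    locs = [
        child.text.strip()
        for entry in root
        for child in entry
        if local_name(child.tag) == "loc" and child.text
    ]
    if local_name(root.tag) == "sitemapindex":
        nested = []
        for child_url in locs:
            nested.extend(collect_urls(child_url, fetch, seen))
        locs = nested
    return list(dict.fromkeys(locs))  # union, first-seen order


def fake_xml(root, entry, locs):
    """Build a tiny sitemap document for the demo."""
    items = "".join(f"<{entry}><loc>{u}</loc></{entry}>" for u in locs)
    return f'<{root} xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">{items}</{root}>'


site = "https://example.com"
FAKE_SITE = {
    f"{site}/sitemap.xml": fake_xml("sitemapindex", "sitemap", [f"{site}/a.xml", f"{site}/b.xml"]),
    f"{site}/a.xml": fake_xml("urlset", "url", [f"{site}/", f"{site}/about"]),
    f"{site}/b.xml": fake_xml("urlset", "url", [f"{site}/about", f"{site}/pricing"]),
}

urls = collect_urls(f"{site}/sitemap.xml", FAKE_SITE.__getitem__)
print(urls)
```

Injecting `fetch` keeps the traversal logic separate from HTTP concerns, so a real client could swap in a network fetcher with retries or concurrency limits.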

Question 5

What's the difference between a sitemap extractor and a website crawler?

Accepted Answer

A sitemap extractor reads the URLs the site already declares in sitemap.xml. It is fast and lightweight, and needs no rendering. A website crawler walks links from the homepage to discover URLs that are not in the sitemap. Use the sitemap extractor when the site publishes a sitemap; use the Website Crawler API when it doesn't, or when you want page content too.

Question 6

Why use this instead of writing my own sitemap parser?

Accepted Answer

Real-world sitemaps are messy: gzip-compressed files, deeply nested indexes, malformed XML, duplicate URLs across multiple files, image/video sub-sitemaps mixed in with page URLs, and broken Sitemap: directives in robots.txt. The API handles all of that. One GET, clean URL list.
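
For a sense of what a hand-rolled parser has to cover, here is a small sketch of two of those chores: transparently decompressing gzip bodies, and dropping duplicates and non-page resources. The extension list is an illustrative assumption and not the filter the API actually applies. The function names are made up for the example.

```python
import gzip
from urllib.parse import urlsplit

# Assumption: an illustrative list, not the API's actual filter rules.
NON_PAGE_EXTENSIONS = (
    ".pdf", ".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg",
    ".mp4", ".webm", ".mov",
)


def decode_sitemap(body: bytes) -> str:
    """Decompress the body if it starts with the gzip magic bytes."""
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    return body.decode("utf-8", errors="replace")


def clean_urls(urls):
    """Drop duplicates and non-page resources, keeping first-seen order."""
    kept = {}
    for url in urls:
        # Check the path only, so query strings do not hide the extension.
        path = urlsplit(url).path.lower()
        if not path.endswith(NON_PAGE_EXTENSIONS):
            kept.setdefault(url, None)
    return list(kept)


raw = [
    "https://example.com/",
    "https://example.com/brochure.PDF?v=2",
    "https://example.com/pricing",
    "https://example.com/",
]
print(clean_urls(raw))
```

Malformed XML and broken robots.txt directives need further handling that this sketch leaves out.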

Question 7

Is the sitemap extractor API free?

Accepted Answer

Yes. The free tier covers thousands of monthly extractions, plenty for SEO audits and small projects. A single API key also unlocks Logo, Colors, NAICS, SIC, web scraping, and the rest of the Context.dev stack.

Sitemap Extractor API

What You Get

Deduplicated page URLs

Sitemap index support

Crawl metadata

Normalized domain input

How It Works

Send a domain

Sitemaps discovered

URLs extracted and deduplicated

Clean URL list returned

