Extract URLs from Sitemap reads XML sitemaps to build comprehensive lists of a website's URLs. It is essential for SEO audits, competitive analysis, and bulk website monitoring tasks.

When to Use It

Use this node to:
  • Get complete URL lists for SEO audits
  • Analyze competitor website structure
  • Monitor large websites for changes
  • Bulk check page status across entire sites
  • Feed URLs into scraping or analysis workflows

Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| XML Sitemap URL | Text | Yes | The sitemap.xml URL you want to extract URLs from |
| Limit | Number | No | Maximum number of URLs to extract |

How It Works

This node reads XML sitemap files and extracts all the URLs listed within them. Sitemaps are files that websites use to tell search engines about their pages.
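The node's internals aren't published, but the underlying technique is straightforward. A minimal sketch using only Python's standard library (the URL and limit values are placeholders):

```python
# Minimal sketch: fetch a sitemap and pull out every <loc> URL.
# Illustrative only -- not the node's actual implementation.
import urllib.request
import xml.etree.ElementTree as ET

# Standard sitemap namespace; <loc> elements live under it.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(sitemap_url, limit=None):
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    urls = [loc.text.strip() for loc in root.iter(NS + "loc") if loc.text]
    return urls[:limit] if limit else urls

# Example (placeholder URL):
print(extract_urls("https://example.com/sitemap.xml", limit=10))
```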

Common Sitemap Locations

Most websites have sitemaps at these standard locations:
  • https://example.com/sitemap.xml
  • https://example.com/sitemap_index.xml
  • https://example.com/sitemaps/sitemap.xml
You can also find sitemap URLs in:
  • robots.txt file (usually at https://example.com/robots.txt)
  • Google Search Console
  • Website footer links
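The robots.txt route is easy to automate. A minimal sketch (the site URL is a placeholder):

```python
# Minimal sketch: list sitemap URLs declared in a site's robots.txt.
import urllib.request

def find_sitemaps(site):
    with urllib.request.urlopen(site.rstrip("/") + "/robots.txt") as resp:
        text = resp.read().decode("utf-8", errors="replace")
    # Declarations take the form "Sitemap: https://example.com/sitemap.xml"
    # (the directive name is conventionally case-insensitive).
    return [line.split(":", 1)[1].strip()
            for line in text.splitlines()
            if line.lower().startswith("sitemap:")]

print(find_sitemaps("https://example.com"))
```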

Sitemap Types

Standard Sitemaps:
  • List all website pages in XML format
  • Include last modification dates
  • Show page priority and update frequency
Sitemap Index Files:
  • Point to multiple sitemap files
  • Common for large websites
  • May contain thousands of URLs across multiple files
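Handling an index comes down to checking the root element: a `<sitemapindex>` lists more sitemap files, while a `<urlset>` lists page URLs directly. A minimal recursive sketch, reusing the namespace constant from the earlier example:

```python
# Minimal sketch: follow sitemap index files recursively.
import urllib.request
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def collect_urls(sitemap_url):
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.fromstring(resp.read())
    locs = [loc.text.strip() for loc in root.iter(NS + "loc") if loc.text]
    if root.tag == NS + "sitemapindex":
        # Each <loc> is another sitemap file; descend into each one.
        return [url for child in locs for url in collect_urls(child)]
    return locs  # plain <urlset>: each <loc> is a page URL
```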
Specialized Sitemaps:
  • News sitemaps (news articles)
  • Image sitemaps (image content)
  • Video sitemaps (video content)

Output

The node returns:
  • URLs - List of all URLs found in the sitemap
  • Total Count - Number of URLs extracted
  • Last Modified - When each URL was last updated (if available)
  • Priority - Page priority as specified in sitemap (if available)
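These per-URL fields map directly onto the optional `<lastmod>` and `<priority>` elements of each `<url>` entry. A sketch of the general output shape (the field names here are illustrative, not the node's exact schema):

```python
# Minimal sketch: one record per <url> entry, keeping the optional
# metadata when the sitemap provides it.
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def _text(parent, tag):
    el = parent.find(NS + tag)
    return el.text.strip() if el is not None and el.text else None

def parse_entries(sitemap_xml):
    root = ET.fromstring(sitemap_xml)
    entries = [{
        "url": _text(u, "loc"),
        "last_modified": _text(u, "lastmod"),  # optional, W3C datetime
        "priority": _text(u, "priority"),      # optional, "0.0" to "1.0"
    } for u in root.iter(NS + "url")]
    return {"urls": entries, "total_count": len(entries)}
```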

Tips

Finding Sitemaps:
  • Check /robots.txt for sitemap declarations
  • Try common sitemap URLs first
  • Look in Google Search Console for verified sitemaps
  • Some sites have multiple sitemaps for different content types
Large Sitemaps:
  • Use the limit parameter for initial testing
  • Large sites may have sitemap index files linking to multiple sitemaps
  • Consider processing in batches for very large sites
Error Handling:
  • Not all websites have sitemaps
  • Some sitemaps may be incomplete or outdated
  • Private or restricted sitemaps may not be accessible
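A short sketch combining these tips: fetch defensively, since a sitemap may be missing or restricted, and process the extracted URLs in fixed-size batches (the batch size of 500 is an arbitrary placeholder):

```python
# Minimal sketch: defensive fetching plus simple batching.
import urllib.error
import urllib.request

def fetch_sitemap(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except (urllib.error.HTTPError, urllib.error.URLError) as exc:
        # Missing, restricted, or mistyped sitemaps land here.
        print(f"Could not fetch {url}: {exc}")
        return None

def batches(items, size=500):
    # Yield fixed-size slices so very large URL lists are processed in chunks.
    for i in range(0, len(items), size):
        yield items[i:i + size]
```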

FAQ

What if a website doesn't have a sitemap?
Not all websites have sitemaps. You can try common sitemap URLs or check the robots.txt file. For sites without sitemaps, consider using web scraping to find links or manually compiling URL lists.

Does the node handle sitemap index files?
Yes, the node can handle sitemap index files that reference multiple sitemaps. It follows the references and extracts URLs from all linked sitemaps.

How should I handle very large sitemaps?
Use the limit parameter to extract a subset of URLs first. For complete analysis of large sites, consider processing the sitemap in batches or focusing on specific sections.

Why can't the node access a sitemap?
Some websites restrict access to their sitemaps or require specific user agents. The sitemap might be protected, or the URL might be incorrect. Try accessing it directly in your browser first.

Can I extract URLs from competitor sitemaps?
Yes, extracting competitor sitemaps is a common competitive analysis technique. Combine this node with others to track their content strategy, new pages, and site structure changes.

How often should I extract sitemaps?
For monitoring purposes, weekly or monthly extraction is usually sufficient unless you're tracking rapidly changing sites. Use schedulers to automate regular sitemap analysis.