Extract URLs from Sitemap
Crawl XML sitemaps to extract all URLs for bulk analysis and monitoring.
The Extract URLs from Sitemap node reads XML sitemaps to produce comprehensive lists of a website's URLs. It is essential for SEO audits, competitive analysis, and bulk website monitoring tasks.
When to Use It
Use this node to:
- Get complete URL lists for SEO audits
- Analyze competitor website structure
- Monitor large websites for changes
- Bulk check page status across entire sites
- Feed URLs into scraping or analysis workflows
Inputs
| Field | Type | Required | Description |
|---|---|---|---|
| XML Sitemap URL | Text | Yes | The sitemap.xml URL you want to extract URLs from |
| Limit | Number | No | Maximum number of URLs to extract |
How It Works
This node reads XML sitemap files and extracts all the URLs listed within them. Sitemaps are files that websites use to tell search engines about their pages.
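To illustrate what this extraction involves, here is a minimal Python sketch (not the node's actual implementation) that pulls `<loc>` values out of a standard sitemap, assuming it follows the sitemaps.org protocol namespace:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol that standard sitemaps follow.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_urls(xml_text, limit=None):
    """Return the <loc> values from a <urlset> sitemap, optionally capped at `limit`."""
    root = ET.fromstring(xml_text)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
    return urls[:limit] if limit is not None else urls

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(extract_urls(sample))           # ['https://example.com/', 'https://example.com/about']
print(extract_urls(sample, limit=1))  # ['https://example.com/']
```

The optional `limit` argument mirrors the node's Limit input.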
Common Sitemap Locations
Most websites have sitemaps at these standard locations:
- `https://example.com/sitemap.xml`
- `https://example.com/sitemap_index.xml`
- `https://example.com/sitemaps/sitemap.xml`
You can also find sitemap URLs in:
- The `robots.txt` file (usually at `https://example.com/robots.txt`)
- Google Search Console
- Website footer links
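Sitemap declarations in `robots.txt` are plain `Sitemap:` lines, so discovering them is straightforward. A small sketch (hypothetical helper name) that collects those declarations from fetched `robots.txt` content:

```python
def sitemaps_from_robots(robots_txt):
    """Collect the URLs declared on 'Sitemap:' lines of a robots.txt file.

    The directive is matched case-insensitively, as crawlers generally do.
    """
    return [
        line.split(":", 1)[1].strip()
        for line in robots_txt.splitlines()
        if line.strip().lower().startswith("sitemap:")
    ]

robots = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/sitemap-news.xml"""

print(sitemaps_from_robots(robots))
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-news.xml']
```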
Sitemap Types
Standard Sitemaps:
- List all website pages in XML format
- Include last modification dates
- Show page priority and update frequency
Sitemap Index Files:
- Point to multiple sitemap files
- Common for large websites
- May contain thousands of URLs across multiple files
Specialized Sitemaps:
- News sitemaps (news articles)
- Image sitemaps (image content)
- Video sitemaps (video content)
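The difference between a standard sitemap and a sitemap index shows up in the XML root element: `<urlset>` lists pages, while `<sitemapindex>` lists further sitemap files. A sketch (assuming the sitemaps.org namespace) of telling the two apart:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def classify_sitemap(xml_text):
    """Return ('index', child sitemap URLs) for a sitemap index,
    or ('urlset', page URLs) for a standard sitemap."""
    root = ET.fromstring(xml_text)
    if root.tag.endswith("}sitemapindex"):
        return "index", [loc.text for loc in root.findall("sm:sitemap/sm:loc", NS)]
    return "urlset", [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

print(classify_sitemap(index_xml))
```

For an index, a crawler then fetches each child sitemap and extracts its URLs in turn.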
Output
The node returns:
- URLs - List of all URLs found in the sitemap
- Total Count - Number of URLs extracted
- Last Modified - When each URL was last updated (if available)
- Priority - Page priority as specified in sitemap (if available)
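The `lastmod` and `priority` fields are optional children of each `<url>` element, so they are only present when the sitemap provides them. A sketch of collecting the fields above per URL (the output shape here is illustrative, not the node's exact format):

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_entries(xml_text):
    """Collect loc/lastmod/priority per <url>, plus a total count (missing fields become None)."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entry = {}
        for name in ("loc", "lastmod", "priority"):
            el = url.find("sm:" + name, NS)
            entry[name] = el.text.strip() if el is not None and el.text else None
        entries.append(entry)
    return {"total_count": len(entries), "entries": entries}

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod><priority>1.0</priority></url>
  <url><loc>https://example.com/blog</loc></url>
</urlset>"""

result = extract_entries(sample)
print(result["total_count"])  # 2
print(result["entries"][1])   # {'loc': 'https://example.com/blog', 'lastmod': None, 'priority': None}
```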
Tips
Finding Sitemaps:
- Check `/robots.txt` for sitemap declarations
- Try common sitemap URLs first
- Look in Google Search Console for verified sitemaps
- Some sites have multiple sitemaps for different content types
Large Sitemaps:
- Use the limit parameter for initial testing
- Large sites may have sitemap index files linking to multiple sitemaps
- Consider processing in batches for very large sites
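Batch processing can be sketched as simple chunking of the extracted URL list, so downstream steps handle a bounded number of URLs at a time (the batch size here is an arbitrary example):

```python
def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

urls = ["https://example.com/page-%d" % n for n in range(5)]
print([len(b) for b in batches(urls, 2)])  # [2, 2, 1]
```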
Error Handling:
- Not all websites have sitemaps
- Some sitemaps may be incomplete or outdated
- Private or restricted sitemaps may not be accessible
FAQ
What if a website doesn't have a sitemap?
Not all websites have sitemaps. You can try common sitemap URLs or check the robots.txt file. For sites without sitemaps, consider using web scraping to find links or manually compile URL lists.
Can I extract URLs from sitemap index files?
Yes, the node can handle sitemap index files that reference multiple sitemaps. It will follow the references and extract URLs from all linked sitemaps.
How do I handle very large sitemaps?
Use the limit parameter to extract a subset of URLs first. For complete analysis of large sites, consider processing the sitemap in batches or focusing on specific sections.
What if I get access denied errors?
Some websites restrict access to their sitemaps or require specific user agents. The sitemap might be protected or the URL might be incorrect. Try accessing it directly in your browser first.
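If the sitemap loads fine in a browser but not programmatically, the server may be rejecting default library user agents. A sketch of building a request with an explicit User-Agent (the agent string and helper name are examples, not a requirement of this node):

```python
import urllib.request

def sitemap_request(url, user_agent="Mozilla/5.0 (compatible; sitemap-checker/1.0)"):
    """Build a request with an explicit User-Agent; some hosts answer
    default library user agents with 403 errors."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

# To actually fetch (requires network access):
# with urllib.request.urlopen(sitemap_request("https://example.com/sitemap.xml"), timeout=30) as resp:
#     xml_text = resp.read().decode("utf-8")

req = sitemap_request("https://example.com/sitemap.xml")
print(req.get_header("User-agent"))  # Mozilla/5.0 (compatible; sitemap-checker/1.0)
```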
Can I use this to monitor competitors?
Yes, extracting competitor sitemaps is a common competitive analysis technique. Combine with other nodes to track their content strategy, new pages, and site structure changes.
How often should I extract sitemap data?
For monitoring purposes, weekly or monthly extraction is usually sufficient unless you’re tracking rapidly changing sites. Use schedulers to automate regular sitemap analysis.