Scrape URL extracts content from webpages, turning unstructured web data into usable information for your workflows. It is well suited to competitor monitoring, content analysis, and automated data collection.


When to Use It

Use this node to:

  • Monitor competitor pricing and product information
  • Extract content for AI analysis and insights
  • Collect data from news sites or blogs
  • Scrape product details from e-commerce sites
  • Monitor website changes and updates

Inputs

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| URL | Text | Yes | URL of the webpage you want to scrape |
| Extract Mode | Select | Yes | Which content to extract from the page |
| CSS Selectors | Mapper | No* | Elements to extract (*required when Extract Mode is “Select Specific Elements”) |
| Enable JavaScript | Switch | No | Render JavaScript-generated content (slower but more complete) |

How It Works

This node visits the specified webpage and extracts content based on your chosen extraction mode. It can handle both static HTML content and JavaScript-rendered pages.

Extract Modes

Entire Page Content:

  • Gets all text and HTML from the webpage
  • Perfect for AI analysis of full page content
  • Includes headers, navigation, and footer content
  • Most comprehensive option

Text Only (No HTML):

  • Extracts only the readable text content
  • Removes all HTML tags and styling
  • Clean text perfect for content analysis
  • Faster processing than full page extraction

Select Specific Elements:

  • Target specific parts of the webpage using CSS selectors
  • Extract only the data you need (prices, titles, descriptions)
  • More efficient for structured data collection
  • Requires basic knowledge of CSS selectors
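
To make the difference between “Entire Page Content” and “Text Only (No HTML)” concrete, here is a minimal sketch of tag stripping using only Python's standard library. It is an illustration, not the node's actual implementation: the names `TextOnlyParser` and `text_only` and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects readable text and skips the bodies of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def text_only(html):
    """Return the readable text of an HTML string (hypothetical helper)."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page: the "entire page" result would be this full string.
page = (
    "<html><head><style>h1 {color: red}</style></head>"
    "<body><h1>Acme Widget</h1><p>Now <b>$19.99</b></p></body></html>"
)
print(text_only(page))  # Acme Widget Now $19.99
```

The full-page result keeps every tag and style rule, while the text-only result is short enough to pass straight to a content-analysis step.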

CSS Selectors Guide

When using “Select Specific Elements”, you’ll map friendly names to CSS selectors:

| Element Type | Example Selector | Use Case |
| --- | --- | --- |
| Product Price | .price, #price | E-commerce monitoring |
| Page Title | h1, .title | Content analysis |
| Article Text | .content, article p | Blog/news scraping |
| Product Description | .description | Product data collection |
| Reviews | .review-text | Sentiment analysis |

Selector Examples

Friendly Name: product_price
CSS Selector: .price-current

Friendly Name: product_title
CSS Selector: h1.product-name

Friendly Name: rating
CSS Selector: .rating-stars
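
The mapping above can be pictured as a dictionary from friendly names to selectors, with each selector resolved against the page. The sketch below is a simplified stand-in, assuming only the simplest selector forms (`tag`, `.class`, `#id`, `tag.class`) and returning the text of the first match. The node itself presumably supports full CSS selectors; the helper names and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def parse_simple_selector(selector):
    """Split 'tag', '.class', '#id' or 'tag.class' into (tag, id, class)."""
    tag, _, element_id = selector.partition("#")
    if element_id:
        return tag or None, element_id, None
    tag, _, css_class = selector.partition(".")
    return tag or None, None, css_class or None


class FirstMatchParser(HTMLParser):
    """Collects the text of the first element matching a simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.want_tag, self.want_id, self.want_class = parse_simple_selector(selector)
        self.depth = 0  # > 0 while inside the matched element
        self.done = False
        self.chunks = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        return (
            (self.want_tag is None or tag == self.want_tag)
            and (self.want_id is None or attrs.get("id") == self.want_id)
            and (self.want_class is None or self.want_class in classes)
        )

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1  # nested element inside the match
        elif self._matches(tag, attrs):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # matched element is closed

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_fields(html, selectors):
    """Map each friendly name to the text of its first match, or None."""
    result = {}
    for name, selector in selectors.items():
        parser = FirstMatchParser(selector)
        parser.feed(html)
        parser.close()
        result[name] = " ".join(parser.chunks) if parser.done else None
    return result


# Hypothetical product page fragment.
page = """
<h1 class="product-name">Acme Widget</h1>
<span class="price-current sale">$19.99</span>
<div class="rating-stars">4.5 out of 5</div>
"""
selectors = {
    "product_price": ".price-current",
    "product_title": "h1.product-name",
    "rating": ".rating-stars",
}
fields = extract_fields(page, selectors)
print(fields)
```

Each friendly name becomes a key in the result, which mirrors how the node returns your field names in its output. A selector with no match yields `None` in this sketch, which is a convenient signal that a selector needs updating.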

JavaScript Rendering

When to Enable:

  • Content is loaded dynamically with JavaScript
  • Page shows “Loading…” initially
  • Important data appears after page load
  • Single-page applications (SPAs)

When to Keep Disabled:

  • Static HTML pages
  • Faster scraping is needed
  • Content is visible in page source
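
One rough way to decide is to check whether a piece of text you expect to scrape already appears in the raw page source. The sketch below assumes you have the unrendered HTML as a string; `needs_javascript` and both sample pages are hypothetical and only illustrate the heuristic.

```python
def needs_javascript(page_source, expected_snippet):
    """Return True when the expected text is missing from the raw HTML.

    A missing snippet suggests the content is injected after page load,
    so JavaScript rendering is probably required.
    """
    return expected_snippet.lower() not in page_source.lower()


# Hypothetical static page: the price is already in the source.
static_page = "<html><body><span class='price'>$19.99</span></body></html>"

# Hypothetical single-page app shell: content arrives later via JavaScript.
spa_shell = "<html><body><div id='root'>Loading…</div><script src='app.js'></script></body></html>"

print(needs_javascript(static_page, "$19.99"))  # False
print(needs_javascript(spa_shell, "$19.99"))    # True
```

This is only a heuristic: a page can contain the text in its source and still change it after load, so confirm against the node's actual output when the data matters.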

Output

The node returns extracted content based on your selection:

Entire Page Content and Text Only modes:

  • Content - All extracted text or HTML
  • Page Title - Webpage title
  • URL - The scraped page address

Select Specific Elements mode:

  • [Your Field Names] - Data mapped to your specified selectors
  • URL - The scraped page address

Tips

Choosing Selectors:

  • Use browser developer tools (F12) to find CSS selectors
  • Test selectors on the webpage before using them
  • Use specific selectors to avoid getting unwanted content
  • Combine multiple selectors for comprehensive data extraction

Performance:

  • Disable JavaScript rendering when possible for speed
  • Use specific element extraction instead of full page when you only need certain data
  • Consider rate limiting when scraping multiple pages
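
If you drive many scrapes from your own script, a fixed pause between requests is the simplest form of rate limiting. The sketch below assumes a `scrape` callable that you supply; `scrape_all`, its parameters, and the example URLs are hypothetical and not part of the node.

```python
import time


def scrape_all(urls, scrape, delay_seconds=1.0, sleep=time.sleep):
    """Scrape each URL in order, pausing between consecutive requests.

    `scrape` is any callable that takes a URL and returns its content.
    `sleep` is injectable so the pause can be observed or skipped in tests.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first request
        results[url] = scrape(url)
    return results


# Demo with a fake scraper and a recorder in place of real sleeping.
pauses = []
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = scrape_all(
    urls,
    scrape=lambda url: f"content of {url}",
    delay_seconds=2.0,
    sleep=pauses.append,
)
print(pauses)  # [2.0, 2.0]
```

Three URLs produce two pauses, so the total added wait grows with the number of pages. Choose the delay based on what the target site tolerates.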

Reliability:

  • Websites may change their structure, breaking your selectors
  • Test your selectors periodically
  • Have fallback selectors for critical data points
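
A fallback chain can be as simple as trying selectors in priority order and keeping the first one that returns a value. In the sketch below, `extract` stands in for whatever resolves a selector against the page; `first_match` and the sample data are hypothetical.

```python
def first_match(extract, selectors):
    """Try selectors in order and return (selector, value) for the first hit.

    `extract` is any callable that takes a selector and returns the
    extracted text, or None/empty when nothing matched.
    Returns (None, None) when every selector fails.
    """
    for selector in selectors:
        value = extract(selector)
        if value:
            return selector, value
    return None, None


# Hypothetical page after a redesign: the old ".price-current" class is gone.
page_after_redesign = {".price-now": "$19.99"}

selector, price = first_match(
    page_after_redesign.get,
    [".price-current", ".price-now", "#price"],
)
print(selector, price)  # .price-now $19.99
```

Returning the selector that matched, not just the value, makes it easy to notice when the primary selector has stopped working and the fallback is carrying the load.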

FAQ