There are two main ways that you can scrape text from a website: Capture List and Capture Text. In this article, we'll go through the differences and how to choose which method suits your use case.
AT A GLANCE
Capture List (i.e. 'From a list') - use this when you need to extract the same type of data that repeats on a page, ex: gathering names and prices from a grid of products on a category page.
Capture Text (i.e. 'Just text') - use this when you need to extract specific pieces of information from a detailed page, like getting the full description and specifications from a single product's page.
How to choose between Capture List and Capture Text
When to use Capture List (i.e. 'From a list')
Use Capture List when your target website displays information in a repeating pattern, typically on pages that show multiple items.
Common examples include:
Product category pages with multiple items in a grid
Search results pages showing multiple listings
Directory pages listing multiple businesses
Capture List is great for extracting and structuring listed data from a page, including:
Product names
Prices
Image URLs
Star ratings
Short descriptions
Product URLs
Capture List organizes this data into Tables, extracts the same data fields for every list item, and uses AI to keep the field data accurate when the website changes.
When to use Capture Text (i.e. 'Just text')
Use Capture Text when you need specific information from a detailed page about a single item. Common examples include:
Individual product pages
Company "About" pages
Detailed specification pages
Capture Text is great for extracting and structuring data such as:
Full product descriptions
Technical specifications
Shipping information
Company information
Detailed product features
Capture Text lets you extract specific text elements within a page. This data can then be structured and kept up to date (even when the page design changes) using AI-powered change detection monitoring.
Using both methods together via workflows (deep scraping)
If you're looking to get data across pages and sub-pages of a website (ex: search results plus the data on each individual result page), you can use both of these methods together to get the dataset you need.
Here's how to connect two robots together using workflows:
Create Robot A (Capture List): create a robot that collects URLs from a category page where products are listed.
Create Robot B (Capture Text): using one of the URLs captured by Robot A, create another robot that extracts the detailed information.
Create a workflow: connect the two robots with a workflow so that Robot B extracts the details from every URL that Robot A collects.
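To make the data flow of deep scraping concrete, here is a minimal Python sketch. It is only a conceptual model, not how the robots or workflows are implemented: the two robots are represented by hypothetical functions (`robot_a_capture_list`, `robot_b_capture_text`) that read from a made-up in-memory "website" instead of real pages, and every URL and field name is invented for illustration.

```python
# Hypothetical sample "website": one category page plus one details page per
# product. All URLs and fields are made up for illustration.
SITE = {
    "https://example.com/category/shoes": [
        {"name": "Trail Runner", "price": "$89", "url": "https://example.com/p/trail-runner"},
        {"name": "City Sneaker", "price": "$65", "url": "https://example.com/p/city-sneaker"},
    ],
    "https://example.com/p/trail-runner": {"description": "Lightweight trail shoe.", "shipping": "Ships in 2 days"},
    "https://example.com/p/city-sneaker": {"description": "Everyday canvas sneaker.", "shipping": "Ships in 5 days"},
}


def robot_a_capture_list(category_url: str) -> list[dict]:
    """Stand-in for Robot A (Capture List): one row per repeating item, same fields in every row."""
    return [dict(item) for item in SITE[category_url]]


def robot_b_capture_text(detail_url: str) -> dict:
    """Stand-in for Robot B (Capture Text): specific fields from a single details page."""
    return dict(SITE[detail_url])


def workflow(category_url: str) -> list[dict]:
    """Stand-in for the workflow: run Robot B on each URL from Robot A and merge the results."""
    rows = []
    for item in robot_a_capture_list(category_url):
        details = robot_b_capture_text(item["url"])
        rows.append({**item, **details})  # list fields + detail fields in one row
    return rows


if __name__ == "__main__":
    for row in workflow("https://example.com/category/shoes"):
        print(row["name"], "|", row["price"], "|", row["shipping"])
```

The key idea the sketch tries to show is that Robot A's output (a list of URLs) becomes Robot B's input, and the combined dataset has one row per listed item with both the list fields and the detail fields.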
For a more in-depth look, check out How can I extract data from lists and their associated details pages? (deep scraping).
Tips for successful data extraction
Mouse movement guide
Move your mouse slowly across the page to see how the selection boxes appear
Notice which elements are grouped together
Pay attention to how websites structure their data differently
Testing your selection
After selecting data, run a test to ensure you're capturing the right information
Verify that all required fields are being extracted
Check that the data format matches your needs
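If you download your test results as a CSV file, a short script can help with the "verify that all required fields are being extracted" step. The Python sketch below is a hypothetical helper, not part of the product: it assumes a CSV with a header row, and the column names in `REQUIRED_FIELDS` are placeholders you would replace with the fields your robot actually captures. It reports required columns that are missing entirely and rows where a required cell is empty.

```python
import csv
import io

# Placeholder column names; replace with the fields your robot captures.
REQUIRED_FIELDS = ["Product name", "Price", "Product URL"]


def find_problems(csv_text: str, required: list[str]) -> list[str]:
    """Return readable problems: missing required columns or empty required cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []  # reading this consumes the header row
    problems = [f"missing column: {name}" for name in required if name not in columns]
    for row_number, row in enumerate(reader, start=1):
        for name in required:
            # Only check cells for columns that exist; missing columns are reported above.
            if name in columns and not (row.get(name) or "").strip():
                problems.append(f"row {row_number}: empty {name}")
    return problems


# Made-up sample export where the second product has no price.
sample = """Product name,Price,Product URL
Trail Runner,$89,https://example.com/p/trail-runner
City Sneaker,,https://example.com/p/city-sneaker
"""

for problem in find_problems(sample, REQUIRED_FIELDS):
    print(problem)
```

An empty cell often means the selection box missed that element for some list items, which is a sign to revisit your selection before running the robot at scale.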
Common issues to watch for
Some websites may group different pieces of information in the same box
Others may separate related information into different boxes
Test your robot on multiple pages to ensure consistent extraction
We have a more detailed article here: Tips for successful data extraction