How can I extract data from lists and their associated details pages? (deep scraping)

Deep scraping connects two robots: one to gather lists of items, another to get details from each item's page, giving you complete data sets.

Written by Nick Simard
Updated this week

Deep scraping helps you gather detailed information by connecting two robots: one that collects lists of items and another that extracts specific details from each item's page. This approach allows you to build comprehensive datasets across multiple pages from the same website.

Why use deep scraping?

When extracting data from websites, you'll often encounter this scenario: a list page shows basic information (like product names and prices), while detailed information lives on separate pages that you reach by clicking each item. Deep scraping captures both levels in one automated process instead of requiring you to visit every detail page by hand.

How deep scraping works

Deep scraping involves two main steps (illustrated in the sketch after this list):

  1. Your first robot scans a list or category page to collect basic information and URLs

  2. Your second robot visits each URL to extract detailed information from individual pages
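
Browse AI handles both of these steps without any code, but if it helps to see the pattern itself, here is a minimal Python sketch of the same two-phase idea. The URL, CSS selectors, and field names are hypothetical placeholders, not anything Browse AI uses internally.

```python
# Minimal illustration of the deep-scraping pattern (not Browse AI's
# internals). The URL and CSS selectors below are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

LIST_PAGE = "https://example.com/products"  # hypothetical list page

# Step 1 ("Robot A"): scan the list page for item names and detail-page URLs.
html = requests.get(LIST_PAGE, timeout=30).text
soup = BeautifulSoup(html, "html.parser")
items = [
    {"name": a.get_text(strip=True), "url": urljoin(LIST_PAGE, a["href"])}
    for a in soup.select("a.product-link")  # hypothetical selector
]

# Step 2 ("Robot B"): visit each collected URL and extract detailed fields.
for item in items:
    detail = BeautifulSoup(requests.get(item["url"], timeout=30).text, "html.parser")
    price = detail.select_one(".price")  # hypothetical selector
    item["price"] = price.get_text(strip=True) if price else None

print(items)
```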

How to deep scrape on Browse AI

Step 1: Collect all URLs (Robot A)

  1. Create a robot that focuses only on gathering information from the list page, including URLs

  2. Use Capture List to select repeating items (like products, job listings, or properties)

  3. Make sure to capture the URL field that links to the detail pages

  4. Download these URLs as a CSV file (Tables tab → Export → Export as CSV); a quick way to inspect the export is sketched after this list
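
Before moving on to Robot B, it can be worth sanity-checking the export. The snippet below is a sketch of one way to do that in Python; the filename and the "Product URL" column header are assumptions, so match them to your actual CSV.

```python
# Quick sanity check of Robot A's export before importing it into Robot B.
# The filename and the "Product URL" column header are assumptions; adjust
# them to match your actual export.
import csv

with open("robot-a-export.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

urls = [row["Product URL"] for row in rows if row.get("Product URL")]
print(f"{len(rows)} rows in the export, {len(urls)} with a URL")
print("first few URLs:", urls[:3])
```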

Step 2: Extract details from each URL (Robot B)

  1. Create a second robot designed for a single detail page

  2. Use Capture Text to select specific data points you want to extract

  3. Import the CSV file from Robot A (Tables tab → Import CSV)

  4. Browse AI will visit each URL and extract the detailed information (an API-based alternative to the CSV import is sketched after this list)
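
If you'd rather trigger these runs programmatically than through the Tables tab, Browse AI also exposes a REST API. The sketch below assumes a v2 "run a task" endpoint and an originUrl input parameter; both are assumptions to verify against Browse AI's current API documentation before relying on them.

```python
# Hedged sketch: queue Robot B runs over a list of URLs through Browse AI's
# REST API instead of the CSV import. The endpoint path and the "originUrl"
# input parameter name are assumptions about the v2 API; confirm both in
# Browse AI's API documentation.
import requests

API_KEY = "YOUR_BROWSE_AI_API_KEY"  # placeholder: your Browse AI API key
ROBOT_B_ID = "YOUR_ROBOT_B_ID"      # placeholder: Robot B's ID

# In practice these would come from Robot A's export (see Step 1).
urls = [
    "https://example.com/item/1",  # hypothetical detail-page URLs
    "https://example.com/item/2",
]

for url in urls:
    resp = requests.post(
        f"https://api.browse.ai/v2/robots/{ROBOT_B_ID}/tasks",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"inputParameters": {"originUrl": url}},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"Queued Robot B task for {url}")
```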

Step 3 (Optional): Connect these two robots together using workflows

If you'd prefer to automatically feed the URLs from Robot A into Robot B, you can use a workflow to do so.

  1. Go to Workflows

  2. Set up a workflow that feeds data from Robot A to Robot B (the hand-off it automates is sketched after this list)

  3. Schedule the workflow to run automatically at your preferred frequency
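
Workflows do this chaining for you inside Browse AI, but if you ever need the same behavior in your own scheduler, the rough shape is: read a finished Robot A task, pull the captured URLs, and queue one Robot B task per URL. Everything below — the endpoint paths, the "result" and "capturedLists" response fields, and the "Products" / "Product URL" names — is an assumption to check against the API docs.

```python
# Hedged sketch of what a workflow automates: take a finished Robot A task,
# read its captured URLs, and queue one Robot B task per URL. Endpoint paths
# and response fields ("result", "capturedLists") are assumptions; verify
# them against Browse AI's API documentation.
import requests

API_KEY = "YOUR_BROWSE_AI_API_KEY"      # placeholder
ROBOT_A_ID = "YOUR_ROBOT_A_ID"          # placeholder
ROBOT_B_ID = "YOUR_ROBOT_B_ID"          # placeholder
ROBOT_A_TASK_ID = "A_FINISHED_TASK_ID"  # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Fetch a finished Robot A task. "Products" and "Product URL" are
# hypothetical list and column names; use the names your robot captures.
resp = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_A_ID}/tasks/{ROBOT_A_TASK_ID}",
    headers=HEADERS,
    timeout=30,
)
resp.raise_for_status()
task = resp.json()["result"]

urls = [row["Product URL"] for row in task["capturedLists"]["Products"]]

# Fan out: queue one Robot B task per captured URL, as the workflow would.
for url in urls:
    requests.post(
        f"https://api.browse.ai/v2/robots/{ROBOT_B_ID}/tasks",
        headers=HEADERS,
        json={"inputParameters": {"originUrl": url}},
        timeout=30,
    ).raise_for_status()
```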

Video walkthrough

In this video, you'll learn how to build both robots and use a bulk run (importing a CSV of URLs into Robot B) to perform a deep scrape.

Common use cases for deep scraping

E-commerce product analysis

  • Robot A collects product information from category pages

  • Robot B visits each product page to gather specifications, reviews, and availability

Real estate market research

  • Robot A scans property listing pages for basic details

  • Robot B visits individual property pages to collect specifications and amenities

Business directory database

  • Robot A processes directory pages to gather business listings

  • Robot B visits each business profile to extract contact details and services

Best practices

  1. When creating Robot B, train it on a detail page that shows all possible information you want to extract

  2. Always verify your CSV of URLs before running Robot B in bulk (see the validation sketch after this list)

  3. Use integrations like Google Sheets to automatically collect all extracted data

  4. Monitor the extraction progress in your Browse AI dashboard
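
For best practice 2, here is one way to verify a CSV of URLs before a bulk run: check that each value parses as an absolute http(s) URL and flag duplicates. As in the earlier sketches, the filename and "Product URL" column header are assumptions.

```python
# Hedged sketch for verifying a CSV of URLs before a bulk run: confirm each
# value is an absolute http(s) URL and report duplicates. The filename and
# "Product URL" column header are assumptions.
import csv
from urllib.parse import urlparse

with open("robot-a-export.csv", newline="", encoding="utf-8") as f:
    urls = [row["Product URL"].strip()
            for row in csv.DictReader(f) if row.get("Product URL")]

malformed = [u for u in urls
             if urlparse(u).scheme not in ("http", "https") or not urlparse(u).netloc]
duplicates = len(urls) - len(set(urls))

print(f"{len(urls)} URLs checked: {len(malformed)} malformed, {duplicates} duplicates")
for u in malformed:
    print("  malformed:", u)
```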
