
How can I extract data from lists and their associated detail pages? (deep scraping)

Deep scraping connects two robots: one to gather lists of items, another to get details from each item's page, giving you complete data sets.

Written by Nick Simard
Updated this week

Deep scraping is a method for extracting data from multiple linked pages on a website. It works by systematically following the links listed on a main page and collecting data from each linked detail page, which is particularly useful for e-commerce platforms, directories, or any site with a structured list-and-detail layout.

Why use deep scraping?

When extracting data from websites, you'll often encounter this scenario: a list page shows basic information (like product names and prices), while detailed information lives on separate pages that you access by clicking each item.

When implementing deep scraping in Browse AI, you have two main options:

  • Bulk Run: Ideal for one-time extractions. Use this option to process all extracted links in one go.

  • Workflow: Automates the connection between your list robot and detail robot. This option is best for ongoing scraping tasks with recurring updates.

How deep scraping works

Deep scraping involves two main steps:

  1. Your first robot scans a list or category page to collect basic information and URLs

  2. Your second robot visits each URL to extract detailed information from individual pages
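Browse AI handles both steps without any code, but if a concrete picture helps, here is a minimal Python sketch of the same two-phase pattern using requests and BeautifulSoup. The list-page URL, CSS selectors, and field names are hypothetical placeholders, not anything Browse AI uses internally.

```python
# Conceptual sketch of deep scraping -- NOT how Browse AI works internally,
# just an illustration of the two-phase pattern the two robots follow.
# The URL and CSS selectors below are hypothetical placeholders.
import csv

import requests
from bs4 import BeautifulSoup

LIST_PAGE = "https://example.com/products?page=1"  # hypothetical list/category page


def collect_urls(list_page_url):
    """Phase 1 (Robot A): read the list page and return the detail-page URLs."""
    soup = BeautifulSoup(requests.get(list_page_url, timeout=30).text, "html.parser")
    # Assumes each list item links to its detail page via an <a class="item-link">.
    return [a["href"] for a in soup.select("a.item-link")]


def extract_details(detail_url):
    """Phase 2 (Robot B): visit one detail page and capture specific fields."""
    soup = BeautifulSoup(requests.get(detail_url, timeout=30).text, "html.parser")
    return {
        "url": detail_url,
        "name": soup.select_one("h1").get_text(strip=True),
        "price": soup.select_one(".price").get_text(strip=True),
    }


if __name__ == "__main__":
    rows = [extract_details(url) for url in collect_urls(LIST_PAGE)]
    with open("details.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "name", "price"])
        writer.writeheader()
        writer.writerows(rows)
```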

How to deep scrape on Browse AI

Step 1: Collect all URLs (Robot A)

  1. Create a robot that focuses only on gathering information from the list page, including URLs

  2. Use Capture List to select repeating items (like products, job listings, or properties)

  3. Make sure to capture the URL field that links to the detail pages

  4. Download these URLs as a CSV file (Tables tab → Export → Export as CSV)
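The exact columns in the export depend on how you trained Robot A, but the file should contain one row per list item with a column holding each detail-page URL. A hypothetical export might look like this (the column names are placeholders):

```
position,product_name,price,product_url
1,Acme Widget,19.99,https://example.com/products/acme-widget
2,Acme Gadget,34.50,https://example.com/products/acme-gadget
```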

Step 2: Extract details from each URL (Robot B)

  1. Create a second robot designed for a single detail page

  2. Use Capture Text to select specific data points you want to extract

  3. Import the CSV file from Robot A (Tables tab → Import CSV)

  4. Browse AI will visit each URL and extract the detailed information

Step 3 (Optional): Connect these two robots together using workflows

If you'd prefer to automatically feed the URLs from Robot A into Robot B, you can use a workflow to do so.

  1. Go to Workflows

  2. Set up a workflow that feeds data from Robot A to Robot B

  3. Schedule the workflow to run automatically at your preferred frequency
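Workflows are the no-code way to make this connection. If you would rather script the hand-off yourself, Browse AI also offers a REST API; the sketch below shows roughly how each captured URL could be queued as a Robot B run. The endpoint path, request body, and column name are assumptions, so verify them against the current Browse AI API documentation before relying on them.

```python
# Rough sketch of scripting the Robot A -> Robot B hand-off instead of using a
# workflow or a manual CSV import. The Browse AI REST endpoint path, parameter
# names, and request body below are assumptions -- check the current API docs.
import csv
import os

import requests

API_KEY = os.environ["BROWSE_AI_API_KEY"]   # your Browse AI secret API key
ROBOT_B_ID = "your-robot-b-id"              # hypothetical ID of the detail-page robot
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def queue_robot_b_task(url):
    """Queue one Robot B run for a single detail-page URL (assumed endpoint and body shape)."""
    response = requests.post(
        f"https://api.browse.ai/v2/robots/{ROBOT_B_ID}/tasks",
        headers=HEADERS,
        json={"inputParameters": {"originUrl": url}},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Feed Robot B the URLs that Robot A captured (read here from its CSV export;
# "product_url" is a placeholder column name).
with open("robot_a_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        queue_robot_b_task(row["product_url"])
```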

Video walkthrough

In this video, you'll learn how to build both robots and use a Bulk Run (importing your CSV of URLs into Robot B) to perform a deep scrape.

Common use cases for deep scraping

E-commerce product analysis

  • Robot A collects product information from category pages

  • Robot B visits each product page to gather specifications, reviews, and availability

Real estate market research

  • Robot A scans property listing pages for basic details

  • Robot B visits individual property pages to collect specifications and amenities

Business directory database

  • Robot A processes directory pages to gather business listings

  • Robot B visits each business profile to extract contact details and services

Best practices

  1. When creating Robot B, train it on a detail page that shows all possible information you want to extract

  2. Always verify your CSV of URLs before running Robot B in bulk (see the sketch after this list)

  3. Use integrations like Google Sheets to automatically collect all extracted data

  4. Monitor the extraction progress in your Browse AI dashboard

  5. Ensure your target URLs are publicly accessible

  6. For periodic data collection, schedule your robots to run automatically

  7. Visit the Browse AI Help Center for troubleshooting and additional support
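To support practices 2 and 5 above, here is a small, optional Python sketch that checks a CSV of URLs before a bulk run: it flags malformed URLs and any that are not publicly reachable. The file name and column name are placeholders you would match to your own export.

```python
# Quick sanity check for a CSV of URLs before a bulk run (best practices 2 and 5).
# The file name and "product_url" column are hypothetical -- match them to your export.
import csv
from urllib.parse import urlparse

import requests

with open("robot_a_export.csv", newline="") as f:
    urls = [row["product_url"] for row in csv.DictReader(f)]

for url in urls:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        print(f"Malformed URL: {url!r}")
        continue
    try:
        # HEAD request just to confirm the page is publicly reachable.
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException as exc:
        print(f"Unreachable: {url} ({exc})")
        continue
    if status >= 400:
        print(f"HTTP {status}: {url}")

print(f"Checked {len(urls)} URLs.")
```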
