Deep scraping helps you gather detailed information by connecting two robots: one that collects lists of items and another that extracts specific details from each item's page. This approach allows you to build comprehensive datasets across multiple pages from the same website.
Why use deep scraping?
When extracting data from websites, you'll often encounter this scenario: a list page shows basic information (like product names and prices), while the detailed information lives on separate pages that you reach by clicking each item. Deep scraping automates both levels at once, so you get the list data and the per-item details in a single dataset instead of opening every detail page by hand.
How deep scraping works
Deep scraping involves two main steps:
Your first robot scans a list or category page to collect basic information and URLs
Your second robot visits each URL to extract detailed information from individual pages
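To make the pattern concrete, here is what those two steps look like expressed in code outside of Browse AI. This is only an illustrative sketch: the site URL, CSS selectors, and field names are hypothetical, and Browse AI performs the equivalent steps for you without any code.

```python
# Minimal sketch of the deep-scraping pattern: step 1 collects URLs from a
# list page, step 2 visits each URL for details. Site and selectors are hypothetical.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

LIST_PAGE = "https://example.com/products"  # hypothetical list/category page

# Step 1: gather detail-page URLs from the list page (Robot A's job)
list_html = requests.get(LIST_PAGE, timeout=30).text
soup = BeautifulSoup(list_html, "html.parser")
detail_urls = [urljoin(LIST_PAGE, a["href"]) for a in soup.select("a.product-link")]

# Step 2: visit each URL and extract the detailed fields (Robot B's job)
records = []
for url in detail_urls:
    page = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    records.append({
        "url": url,
        "name": page.select_one("h1").get_text(strip=True),
        "price": page.select_one(".price").get_text(strip=True),
    })

print(f"Collected {len(records)} detail records")
```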
How to deep scrape on Browse AI using workflows
Step 1: Collect all URLs (Robot A)
Create a robot that focuses only on gathering information from the list page, including URLs
Use Capture List to select repeating items (like products, job listings, or properties)
Make sure to capture the URL field that links to the detail pages
Download these URLs as a CSV file (Tables tab → Export → Export as CSV)
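Before handing the exported file to Robot B, it's worth confirming it actually contains a usable URL column. A quick check along these lines works; the file name and the "Product URL" column name are assumptions, so substitute whatever names your export actually uses.

```python
# Quick sanity check on the CSV exported from Robot A.
# "robot_a_export.csv" and the "Product URL" column are assumed names --
# adjust them to match your own export.
import csv

with open("robot_a_export.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

urls = [row["Product URL"].strip() for row in rows if row.get("Product URL")]
unique_urls = sorted(set(urls))

print(f"{len(rows)} rows, {len(unique_urls)} unique URLs")
assert all(u.startswith("http") for u in unique_urls), "Found rows without a valid URL"
```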
Step 2: Extract details from each URL (Robot B)
Create a second robot designed for a single detail page
Use Capture Text to select specific data points you want to extract
Import the CSV file from Robot A (Tables tab → Import CSV)
Browse AI will visit each URL and extract the detailed information
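Once Robot B has finished its bulk run, you can join its detail table back onto Robot A's list data to get the single comprehensive dataset described above. A minimal pandas sketch, assuming both exports share a URL column (the file and column names here are assumptions):

```python
# Combine Robot A's list data with Robot B's detail data into one dataset.
# File names and the shared "Product URL" column are assumptions.
import pandas as pd

list_data = pd.read_csv("robot_a_export.csv")    # names, prices, URLs from the list page
detail_data = pd.read_csv("robot_b_export.csv")  # specs, reviews, etc. per detail page

combined = list_data.merge(detail_data, on="Product URL", how="left")
combined.to_csv("deep_scrape_combined.csv", index=False)
print(combined.head())
```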
Step 3 (Optional): Connect these two robots together using workflows
If you'd prefer to automatically feed the URLs from Robot A into Robot B, you can use a workflow to do so.
Go to Workflows
Set up a workflow that feeds data from Robot A to Robot B
Schedule the workflow to run automatically at your preferred frequency
Video walkthrough
In this video, you'll learn how to build both robots and run a bulk extraction (importing the CSV of URLs into Robot B) to perform a deep scrape.
Common use cases for deep scraping
E-commerce product analysis
Robot A collects product information from category pages
Robot B visits each product page to gather specifications, reviews, and availability
Real estate market research
Robot A scans property listing pages for basic details
Robot B visits individual property pages to collect specifications and amenities
Business directory database
Robot A processes directory pages to gather business listings
Robot B visits each business profile to extract contact details and services
Best practices
When creating Robot B, train it on a detail page that shows all possible information you want to extract
Always verify your CSV of URLs before running Robot B in bulk
Use integrations like Google Sheets to automatically collect all extracted data
Monitor the extraction progress in your Browse AI dashboard