Deep scraping is a method for extracting data from multiple linked pages on a website. It works by systematically following the links listed on a main page and collecting data from each linked detail page, which is particularly useful for e-commerce platforms, directories, or any site with a structured list-to-detail layout.
Why use deep scraping?
When extracting data from websites, you'll often encounter this scenario: a list page shows basic information (like product names and prices), while detailed information lives on separate pages that you access by clicking each item.
When implementing deep scraping in Browse AI, you have two main options:
Bulk Run: Ideal for one-time extractions. Use this option to process all extracted links in one go.
Workflow: Automates the connection between your list robot and detail robot. This option is best for ongoing scraping tasks with recurring updates.
How deep scraping works
Deep scraping involves two main steps:
Your first robot scans a list or category page to collect basic information and URLs
Your second robot visits each URL to extract detailed information from individual pages
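Conceptually, the two robots automate the same loop you might otherwise write by hand. Purely as an illustration of the pattern (outside Browse AI), here is a minimal Python sketch; the site URL and the CSS selectors are hypothetical placeholders, not values from any real page:

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com"  # placeholder site

# Step 1 (Robot A's job): collect detail-page URLs from the list page
list_page = requests.get(f"{BASE_URL}/products", timeout=30)
list_soup = BeautifulSoup(list_page.text, "html.parser")
detail_urls = [
    BASE_URL + a["href"]                  # assumes relative hrefs
    for a in list_soup.select(".item a")  # hypothetical selector
]

# Step 2 (Robot B's job): visit each URL and extract detail fields
for url in detail_urls:
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    name = soup.select_one("h1")          # hypothetical selectors
    price = soup.select_one(".price")
    print(url,
          name.get_text(strip=True) if name else "",
          price.get_text(strip=True) if price else "")
```

Browse AI handles this loop for you, including the browser automation, so each robot only needs to be trained once.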
How to deep scrape on Browse AI
Step 1: Collect all URLs (Robot A)
Create a robot that focuses only on gathering information from the list page, including URLs
Use Capture List to select repeating items (like products, job listings, or properties)
Make sure to capture the URL field that links to the detail pages
Download these URLs as a CSV file (Tables tab → Export → Export as CSV)
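Before handing this file to Robot B, it pays to sanity-check it (see also the best practices below). A small script like this sketch can deduplicate the rows and flag anything that is not an absolute http(s) URL; the file name and the "Product URL" column name are assumptions to adjust to your own export:

```python
import csv

URL_COLUMN = "Product URL"  # assumed column name; match your Robot A export

with open("robot_a_export.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

seen, clean, bad = set(), [], []
for row in rows:
    url = (row.get(URL_COLUMN) or "").strip()
    if not url.startswith(("http://", "https://")):
        bad.append(url)   # relative, empty, or malformed entries
    elif url not in seen:
        seen.add(url)     # keep the first occurrence of each URL
        clean.append(url)

print(f"{len(clean)} unique URLs, {len(bad)} flagged rows")
```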
Step 2: Extract details from each URL (Robot B)
Create a second robot designed for a single detail page
Use Capture Text to select specific data points you want to extract
Import the CSV file from Robot A (Tables tab → Import CSV)
Browse AI will visit each URL and extract the detailed information
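If you prefer to script this step instead of importing the CSV through the dashboard, Browse AI also exposes a REST API for running robots. The sketch below is an assumption-laden illustration rather than documented usage: the endpoint path, the `inputParameters` and `originUrl` field names, and the response shape are based on common patterns in Browse AI's v2 API, so verify them against the official API reference first:

```python
import os
import requests

API_KEY = os.environ["BROWSE_AI_API_KEY"]  # your secret Browse AI API key
ROBOT_ID = "your-robot-b-id"               # placeholder robot ID

def queue_detail_task(url: str) -> dict:
    """Queue one Robot B run for a single detail-page URL (assumed endpoint)."""
    resp = requests.post(
        f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"inputParameters": {"originUrl": url}},  # assumed parameter name
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

for url in ["https://example.com/item/1", "https://example.com/item/2"]:
    print("queued:", queue_detail_task(url))
```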
Step 3 (Optional): Connect these two robots together using workflows
If you'd prefer to automatically feed the URLs from Robot A into Robot B, you can use a workflow to do so.
Go to Workflows
Set up a workflow that feeds data from Robot A to Robot B
Schedule the workflow to run automatically at your preferred frequency
Video walkthrough
In this video, you'll learn how to build both robots and use a Bulk Run (importing a CSV of URLs into Robot B) to perform a deep scrape.
Common use cases for deep scraping
E-commerce product analysis
Robot A collects product information from category pages
Robot B visits each product page to gather specifications, reviews, and availability
Real estate market research
Robot A scans property listing pages for basic details
Robot B visits individual property pages to collect specifications and amenities
Business directory database
Robot A processes directory pages to gather business listings
Robot B visits each business profile to extract contact details and services
Best practices
When creating Robot B, train it on a detail page that shows all possible information you want to extract
Always verify your CSV of URLs before running Robot B in bulk
Use integrations like Google Sheets to automatically collect all extracted data (a sketch for combining the two robots' exports locally follows this list)
Monitor the extraction progress in your Browse AI dashboard
Ensure your target URLs are publicly accessible
For periodic data collection, schedule your robots to run automatically
Visit the Browse AI Help Center for troubleshooting and additional support
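On the point about collecting extracted data: once both robots have run, you will usually want a single combined table. As a local alternative to a Google Sheets integration, this sketch joins the two CSV exports on their URL columns with pandas; the file names and the "Product URL" / "Origin URL" column names are assumptions to adapt to your own exports:

```python
import pandas as pd

# Assumed file and column names; adjust to match your actual exports.
list_df = pd.read_csv("robot_a_export.csv")    # list-page data plus URLs
detail_df = pd.read_csv("robot_b_export.csv")  # detail-page data per URL

combined = list_df.merge(
    detail_df,
    left_on="Product URL",   # URL column captured by Robot A
    right_on="Origin URL",   # URL each Robot B run started from
    how="left",              # keep list rows even if a detail run failed
)
combined.to_csv("combined_results.csv", index=False)
print(f"{len(combined)} rows written to combined_results.csv")
```

A left join keeps every list-page row, so gaps in the detail columns point you to URLs that Robot B could not process.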