When building web scrapers or monitors, you'll often encounter this scenario:

A list page shows basic information (like product names and prices).
Detail pages hold the full information for each item, and you reach them by clicking each item on the list page.

This is called deep scraping. Deep scraping is a way to extract data from multiple linked pages on a website. Simply put, it's how you scrape an entire website across multiple layers rather than a single web page.

Depending on how the website you want to scrape or monitor is structured, you can deep scrape by:

Category - category > sub category > page
Site search - search term > results > page
Navigation - navigate > scrape links > pages
Site map - scrape site map > pages
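As a rough illustration of the site map route, the sketch below pulls every page URL out of a sitemap document using only the Python standard library. This is not part of Browse AI; the function name `urls_from_sitemap` and the example sitemap content are hypothetical, and the code assumes the site publishes a standard sitemaps.org XML file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/1</loc></url>
  <url><loc>https://example.com/products/2</loc></url>
</urlset>"""

page_urls = urls_from_sitemap(EXAMPLE_SITEMAP)
```

The resulting list of URLs is the "pages" layer: each URL is a page that a detail-page robot would then visit.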

Popular use cases for deep scraping include:

E-commerce price and product monitoring to create a database of products and prices based on category or search terms.
Scraping and monitoring directories for lead generation.
Competitive monitoring to scrape entire websites.

How do I deep scrape with Browse AI?

There are two ways you can set up deep scraping with Browse AI depending on the data you want to scrape or monitor, and the structure of the website.

	What it does	When it's ideal
Bulk run	Upload a CSV of URLs or input parameters to scrape up to 500,000 pages at once.	Site search deep scraping; single-use static datasets
Workflow	Connect multiple robots to automatically scrape all pages and subpages.	Monitored or 'live' data; category deep scraping

Bulk run: How to deep scrape with bulk runs

Read our Bulk Run Guide here to learn how to set up a bulk run to scrape and monitor up to 500,000 URLs or input parameters.
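For site search deep scraping, the CSV for a bulk run can be generated rather than typed by hand. The following is a minimal sketch, assuming the target site accepts a `q` query parameter on a search URL. The function name `build_bulk_run_csv`, the base URL, and the `url` column header are all hypothetical; check the Bulk Run Guide for the column names your robot actually expects.

```python
import csv
import io
from urllib.parse import urlencode


def build_bulk_run_csv(search_terms, base_url="https://example.com/search", column="url"):
    """Build CSV text with one search-results URL per search term.

    The column header and the "q" parameter are assumptions for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    for term in search_terms:
        # urlencode escapes spaces and special characters in the term.
        writer.writerow([f"{base_url}?{urlencode({'q': term})}"])
    return buffer.getvalue()


csv_text = build_bulk_run_csv(["red shoes", "hiking boots"])
```

Writing `csv_text` to a file gives one row per search term, ready to be uploaded as a list of URLs.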

Workflows: How to deep scrape using workflows

A basic workflow to deep scrape involves two robots:

Robot A: a robot that scrapes a list or category page to collect basic information and URLs.
Robot B: a robot that visits each URL to extract detailed information from the individual pages.

Note that with workflows you can connect as many robots as you need, depending on the structure of the data you want to scrape.
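The Robot A to Robot B hand-off can be pictured as a two-stage pipeline. The sketch below is a conceptual model only, not Browse AI's implementation: the in-memory "website", the field names, and the functions `robot_a`, `robot_b`, and `run_workflow` are hypothetical stand-ins for robots you would train in the product.

```python
from urllib.parse import urljoin

# Hypothetical in-memory "website": one list page plus two detail pages.
LIST_PAGE_URL = "https://example.com/category/shoes"
LIST_PAGE = [
    {"name": "Trail Runner", "price": "$89", "link": "/products/trail-runner"},
    {"name": "City Sneaker", "price": "$59", "link": "/products/city-sneaker"},
]
DETAIL_PAGES = {
    "https://example.com/products/trail-runner": {"availability": "In stock"},
    "https://example.com/products/city-sneaker": {"availability": "Sold out"},
}


def robot_a(list_url):
    """Layer 1: collect basic fields and an absolute detail URL per item."""
    return [
        {"name": item["name"], "price": item["price"], "url": urljoin(list_url, item["link"])}
        for item in LIST_PAGE
    ]


def robot_b(detail_url):
    """Layer 2: collect the detailed fields from one detail page."""
    return DETAIL_PAGES[detail_url]


def run_workflow(list_url):
    """Feed every URL from robot_a into robot_b and merge the results."""
    return [{**item, **robot_b(item["url"])} for item in robot_a(list_url)]


rows = run_workflow(LIST_PAGE_URL)
```

Each entry in `rows` combines the list-page fields with the detail-page fields for the same item. Adding a third layer would mean feeding URLs found by `robot_b` into another function in the same way.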

Step 1: Train Robot A to scrape a list or category page to get a list of URLs

In this first step, you'll create a robot that extracts a list of URLs from a list or category page:

Go to the category page URL.
Use Capture List to select repeating items (like products, job listings, or properties). Make sure to capture the URL field that links to the detail pages.
Finish, approve and name this robot.

Step 2: Train Robot B to extract, monitor or scrape details

Create a second robot using the URL of one of the detail pages.
Train the robot to scrape, structure and monitor the data you'd like.
Finish, approve and name this robot.

Step 3: Connect these two robots together using workflows

Automatically feed the URLs from Robot A into Robot B by creating a workflow.

Go to Workflows.
Set up a workflow that feeds data from Robot A to Robot B.
Schedule the workflow to run automatically at your preferred frequency.

Read our workflows guide here.

Common use cases for deep scraping

E-commerce product monitoring

Robot A collects product information from category pages.
Robot B visits each product page to gather specifications, reviews, and availability.

Real estate data

Robot A scans property listing pages for basic details.
Robot B visits individual property pages to collect specifications and amenities.

Lead generation from business listings

Robot A processes directory pages to gather business listings.
Robot B visits each business profile to extract contact details and services.

Best practices

Start by mapping out the structure of the website and its pages. How many workflows or layers do you need?
Make sure to set up a monitor on one or both of your robots to keep this data up to date.
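To show what keeping data up to date means in practice, here is a minimal sketch of comparing two scraped snapshots keyed by page URL. It is an illustration of the idea only and not how Browse AI monitors are implemented; the function `diff_snapshots` and the sample data are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two {url: record} snapshots and report what changed."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        url for url in current.keys() & previous.keys()
        if current[url] != previous[url]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots from two runs of the same workflow.
yesterday = {
    "https://example.com/products/1": {"price": "$89"},
    "https://example.com/products/2": {"price": "$59"},
}
today = {
    "https://example.com/products/2": {"price": "$49"},
    "https://example.com/products/3": {"price": "$120"},
}

changes = diff_snapshots(yesterday, today)
```

Here `changes` lists product 3 as added, product 1 as removed, and product 2 as changed because its price differs between the two runs.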
