How can I extract data from list pages and their associated detail pages? (Deep scraping)
Deep scraping lets you collect data from multiple levels of a website, for example both a product listing and each product's detail page. Here's how to do it effectively:
The challenge
Many users want to collect data from both list pages (like product categories) and detail pages (individual product information). While it might seem natural to create one robot that clicks through each item, this approach can cause problems:
- The list page might change while gathering details
- Infinite scroll pages can reset when returning to them
- Rapid clicking through many items might trigger website security measures
The solution: a two-robot approach
Instead of using one robot to do everything, split the task into two parts:
Step 1: Collect All Links (Robot A)
- Create a robot that focuses only on gathering links from the list page
- Example: "Extract Fintech companies from Y Combinator"
- Download these links as a CSV file (open the robot's Tables tab, then choose Export → Export as CSV)
Step 2: Extract Details (Robot B)
- Create a second robot that extracts information from a single detail page
- Example: "Extract details from single YC company"
- Use the Import CSV feature in Robot B's Tables tab to run Robot B on every link from Step 1 (a bulk run)
- (Optional) Connect to third-party tools like Google Sheets to easily collect all the extracted data
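This step is optional: Robot A's export may contain more columns than Robot B needs. If you want to tidy it before importing, a minimal Python sketch like the one below keeps only the link column, drops empty rows and repeated links, and writes a one-column CSV. The column names `Company Link` and `url` are hypothetical placeholders; replace them with the link column in your export and the column header Robot B's Import CSV expects.

```python
import csv
import io
from typing import List, Set

# Hypothetical column names: adjust to match your own export and Robot B's import.
SOURCE_COLUMN = "Company Link"  # column in Robot A's export holding detail-page URLs
TARGET_COLUMN = "url"           # header Robot B's Import CSV expects


def links_for_import(export_csv: str, source_column: str = SOURCE_COLUMN) -> List[str]:
    """Pull the link column out of Robot A's export, skipping blanks and duplicates."""
    reader = csv.DictReader(io.StringIO(export_csv))
    if reader.fieldnames is None or source_column not in reader.fieldnames:
        raise ValueError(f"column {source_column!r} not found in export")
    seen: Set[str] = set()
    links: List[str] = []
    for row in reader:
        link = (row[source_column] or "").strip()  # short rows yield None
        if link and link not in seen:
            seen.add(link)
            links.append(link)
    return links


def to_import_csv(links: List[str], target_column: str = TARGET_COLUMN) -> str:
    """Build a one-column CSV (header plus one link per row) for Robot B's Import CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([target_column])
    for link in links:
        writer.writerow([link])
    return out.getvalue()


if __name__ == "__main__":
    export = (
        "Name,Company Link\n"
        "Acme,https://example.com/a\n"
        "Beta,https://example.com/b\n"
        "Acme,https://example.com/a\n"
    )
    print(to_import_csv(links_for_import(export)))
```

In practice you would read the exported file from disk and save the result as a new `.csv` file before uploading it.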
Video walkthrough
In this video, you'll learn how to build both robots and use a bulk run (importing a CSV of URLs into Robot B) to perform a deep scrape.
Best practices
- When creating Robot B, train it on a detail page that shows every field you want to extract
- Always verify your CSV of links before running Robot B
- Use the Google Sheets integration to automatically collect all extracted data
- Monitor progress in your Browse AI dashboard
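One way to verify your CSV of links is to check every entry before the bulk run. The Python sketch below assumes the links are already loaded into a list of strings. It flags entries that are not absolute http(s) URLs (for example relative paths such as `/companies/acme`), entries on an unexpected host, and duplicates. The `check_links` function and its `expected_host` parameter are illustrative only, not part of Browse AI.

```python
from typing import List, Optional, Set, Tuple
from urllib.parse import urlparse


def check_links(links: List[str], expected_host: Optional[str] = None) -> List[Tuple[int, str, str]]:
    """Return (row number, link, problem) for every link that looks unusable."""
    problems: List[Tuple[int, str, str]] = []
    seen: Set[str] = set()
    for row_number, link in enumerate(links, start=1):
        cleaned = link.strip()
        parsed = urlparse(cleaned)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            # Relative paths or bare text cannot be opened by Robot B on their own.
            problems.append((row_number, link, "not an absolute http(s) URL"))
        elif expected_host and parsed.hostname != expected_host.lower():
            # hostname is lowercased by urlparse, so compare case-insensitively.
            problems.append((row_number, link, f"host is {parsed.hostname}, expected {expected_host}"))
        elif cleaned in seen:
            problems.append((row_number, link, "duplicate"))
        seen.add(cleaned)
    return problems


if __name__ == "__main__":
    sample = [
        "https://www.example.com/a",
        "/companies/b",
        "https://www.example.com/a",
    ]
    for row, link, problem in check_links(sample, expected_host="www.example.com"):
        print(f"row {row}: {link!r} -> {problem}")
```

An empty result means every link is absolute, on the expected site, and unique; anything else is worth fixing before Robot B processes the file.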
By using this two-robot approach, you'll avoid common pitfalls and create a more reliable data extraction process.
Can this be automated?
Yes. Browse AI's Workflow feature lets you connect two robots so they run together automatically. It addresses the same challenge described above: extracting data efficiently from both list pages and detail pages.
How it works
- Robot A extracts a list of items (e.g. products or jobs), including each item's detail-page URL
- Robot B then visits each URL and extracts that item's details (e.g. prices, descriptions, or salaries)
You simply set up your two robots and connect them in sequence. Browse AI then passes the URLs from Robot A to Robot B for you, so you don't need to export and import a CSV file manually.
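To make the data flow concrete, here is a simplified Python model of chaining two robots. It is not Browse AI's API: `robot_a` and `robot_b` are hypothetical stand-ins (plain functions) for the two robots, and `link_field` names whichever column holds the detail-page URL. Robot A runs once to snapshot the full list, then Robot B runs once per URL; merging each list row with its details is an illustrative choice, not a description of how Browse AI stores results.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def run_workflow(
    robot_a: Callable[[], List[Row]],   # hypothetical: returns the list-page rows
    robot_b: Callable[[str], Row],      # hypothetical: extracts details from one URL
    link_field: str,
) -> List[Row]:
    """Run Robot A once, then Robot B once per detail-page URL."""
    list_rows = robot_a()  # snapshot the whole list before visiting any detail page
    results: List[Row] = []
    for row in list_rows:
        url = (row.get(link_field) or "").strip()
        if not url:
            continue  # skip rows that have no detail link
        details = robot_b(url)
        results.append({**row, **details})  # list fields plus detail fields
    return results


if __name__ == "__main__":
    def fake_robot_a() -> List[Row]:
        return [{"name": "Acme", "link": "https://example.com/acme"}]

    def fake_robot_b(url: str) -> Row:
        return {"slug": url.rsplit("/", 1)[-1]}

    print(run_workflow(fake_robot_a, fake_robot_b, "link"))
```

Taking the snapshot first is what keeps a changing list page or a resetting infinite scroll from affecting which detail pages get visited.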
You can learn more in this help article.