Capture List vs. Capture Text

Browse AI robots have two main techniques for extracting data from websites: Capture List and Capture Text. Knowing when to use each is key to keeping your data gathering efficient and accurate.

Capture List

With Capture List, you can teach your robot to find and extract specific data from repeating structures on a website. This is especially useful for Product Listing Pages (PLPs), where product details are usually arranged in a predictable, consistent grid or layout. Because every product 'box' follows the same pattern, it's easy to target the same elements in each one for extraction, such as (but not limited to):

  • Product names
  • Prices
  • Image URLs
  • Star ratings
  • Short descriptions
  • Product URLs
Figure A: Visual sample for a Product Listing Page

Why should I use Capture List?

Capture List is the fastest way to capture large volumes of items or product listings. It also organises the data into a table-like format that's ready for analysis, and it ensures the same data fields are extracted for every product.
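To picture what "the same data fields for every product" means, here is a minimal Python sketch. It is not how Browse AI works internally; it simply walks some hypothetical listing-page markup (the class names `product-card`, `name` and `price` are assumptions), builds one row per product box with a fixed set of columns, and writes the rows out as a CSV table. Note how a product with no price still gets an empty `price` cell, so every row lines up.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical listing-page markup: every product "box" repeats the same structure.
# The class names (product-card, name, price) are illustrative assumptions only.
LISTING_HTML = """
<div class="product-card"><a class="name" href="/p/1">Desk Lamp</a><span class="price">$25.00</span></div>
<div class="product-card"><a class="name" href="/p/2">Office Chair</a><span class="price">$120.00</span></div>
<div class="product-card"><a class="name" href="/p/3">Monitor Stand</a></div>
"""

FIELDS = ["name", "price", "url"]


class ProductCardParser(HTMLParser):
    """Collects one row per product card, always with the same set of fields."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if cls == "product-card":
            # Start a new row with every field present, defaulting to empty.
            self.rows.append({field: "" for field in FIELDS})
        elif cls == "name" and self.rows:
            self._field = "name"
            self.rows[-1]["url"] = attrs.get("href") or ""
        elif cls == "price" and self.rows:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        # Only keep text that sits inside a field we are tracking.
        if self._field and self.rows:
            self.rows[-1][self._field] += data.strip()


parser = ProductCardParser()
parser.feed(LISTING_HTML)

# Write the rows as a CSV table: the same columns for every product.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(parser.rows)
print(buffer.getvalue())
```

In Browse AI you do this by pointing and clicking rather than writing a parser; the sketch only shows why a consistent, repeating layout makes tabular output straightforward.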


Capture Text

Capture Text works best on Product Detail Pages (PDPs), where information tends to be less structured. It's ideal for extracting in-depth information that you pick out based on specific cues or on your use case. Examples of data suited to Capture Text include, but are not limited to:

  • Product descriptions
  • Technical specifications
  • 'About' sections
  • Shipping information
  • Information not shown on the listing page
Figure B: Visual sample for a Product Detail Page

Why should I use Capture Text?

While Capture List is best for product listings, Capture Text lets you target precise text elements, even on complex pages. You choose exactly which text to extract, so you get only the information you need. It's also invaluable for gaining qualitative insights from detailed descriptions.

Once a task finishes, your robot organises the extracted data into Tables, which you can read more about here.



In a nutshell:

  • Capture List is perfect for efficiently collecting an overview of products (name, price, URLs, etc.) across several pages.
  • Capture Text is best for drilling down into selected products to get detailed descriptions and other specifics.

Using the Capture methods for deep scraping:

Unlike traditional web scraping, which extracts data from a single page, deep scraping follows internal links the way a human user would navigate, so it can extract comprehensive data that typically sits one page deeper than the main page.

Deep scraping in Browse AI often involves two robots working in tandem, each equipped with a specific Capture method:

  • Robot A (Capture List): This robot specialises in navigating product listing pages and using the Capture List method to extract all URLs leading to individual product detail pages.
  • Robot B (Capture Text): This robot is built on one of the product detail page URLs collected by Robot A, which serves as its reference point, and uses the Capture Text method to extract the data you need.
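The hand-off between the two robots can be sketched in a few lines of Python. This is a conceptual model only: the in-memory "website", the URLs under `shop.example.com`, and the functions `robot_a_capture_list` and `robot_b_capture_text` are all hypothetical stand-ins, not Browse AI's API. What it shows is the data flow: Robot A returns a list of detail-page URLs, and Robot B is run once per URL to pull the specific text fields.

```python
# Hypothetical in-memory "website": one listing page plus one detail page per product.
# The URLs, page layout and field names are illustrative assumptions, not Browse AI data.
LISTING_PAGE = [
    {"name": "Desk Lamp", "url": "https://shop.example.com/p/1"},
    {"name": "Office Chair", "url": "https://shop.example.com/p/2"},
]
DETAIL_PAGES = {
    "https://shop.example.com/p/1": {
        "description": "LED lamp with adjustable arm.",
        "shipping": "Ships in 2 days",
    },
    "https://shop.example.com/p/2": {
        "description": "Ergonomic mesh chair.",
        "shipping": "Ships in 5 days",
    },
}


def robot_a_capture_list(listing):
    """Hypothetical Robot A (Capture List): collect every product detail page URL."""
    return [item["url"] for item in listing]


def robot_b_capture_text(url, fields=("description", "shipping")):
    """Hypothetical Robot B (Capture Text): pull specific text fields from one detail page."""
    page = DETAIL_PAGES[url]
    return {"url": url, **{field: page.get(field, "") for field in fields}}


# Deep scrape: Robot A's output becomes Robot B's input, one detail page at a time.
urls = robot_a_capture_list(LISTING_PAGE)
results = [robot_b_capture_text(url) for url in urls]
for row in results:
    print(row)
```

Keeping the two steps separate mirrors the two-robot setup: the listing robot only needs to know where the products are, and the detail robot only needs to know what to read on each product page.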

Keep in mind that every use case is unique, so adapt these robots and capture methods as needed to fit your specific data extraction goals.

For a more in-depth look at deep scraping, check out our deep scraping deep-dive.

Things to consider: tips for successful data extraction

As you build your robot, pay close attention to how the outlined boxes behave as you move your mouse across the website. These outlines, visible whether you're using Capture List or Capture Text, highlight the various containers holding the data on each page.

The key to successful extraction lies in understanding these containers. Websites are constructed differently, so the behaviour of these boxes can vary. For example, imagine two websites selling similar products:

  • Website A: Here, the original price and any discounted price might be displayed together within a single outlined box.
  • Website B: In contrast, this site might show the original price and its corresponding discounted price in separate boxes.

This difference in structure matters when you want to extract specific data points. By carefully moving your mouse across the containers, you can identify which boxes hold the data you need and have your robot capture them.
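Here is a minimal Python sketch of why this matters once the data is in your table. It assumes you later clean the captured values yourself; this is not a Browse AI feature. On Website A the single box yields one string such as "Was $40.00 Now $29.99" that must be split afterwards, while on Website B each price arrives in its own column. The regex assumes dollar amounts with dot decimals and that the first amount in a combined box is the original price; both are assumptions for illustration.

```python
import re
from decimal import Decimal

# Matches amounts such as "$40.00" or "29.99".
# Assumes a single currency symbol ($) and dot decimals; adjust for other sites.
PRICE_PATTERN = re.compile(r"\$?(\d+(?:\.\d{1,2})?)")


def split_combined_price(box_text):
    """Website A style: one captured box holds both prices, e.g. "Was $40.00 Now $29.99".

    Assumes the first amount is the original price and the second, if present,
    is the discounted price.
    """
    amounts = [Decimal(a) for a in PRICE_PATTERN.findall(box_text)]
    original = amounts[0] if amounts else None
    discounted = amounts[1] if len(amounts) > 1 else None
    return original, discounted


def from_separate_boxes(price_text, discounted_text):
    """Website B style: each price was captured from its own box, so parse each one."""
    return split_combined_price(price_text)[0], split_combined_price(discounted_text)[0]


print(split_combined_price("Was $40.00 Now $29.99"))
print(split_combined_price("$15.00"))
print(from_separate_boxes("$40.00", "$29.99"))
```

If you can capture the two prices as separate boxes, as on Website B, you avoid this extra cleaning step entirely, which is why checking the containers first is worth the effort.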

Understanding the distinction between Capture List and Capture Text will sharpen your data extraction strategy and give you a more streamlined, effective data collection process.

🤖
