Capture List vs. Capture Text

Browse AI robots offer two main techniques for extracting data from websites: Capture List and Capture Text. Knowing when to use each is key to an efficient and accurate data-gathering process.

What is a Capture List?

With a capture list, you can teach your robot to find and extract specific data from repetitive website structures. This is especially useful for Product Listing Pages (PLPs), where product details are usually arranged in a consistent, repeating grid or layout. That consistency makes it easy to target specific elements within each product 'box' for extraction, such as (but not limited to):

  • Product names
  • Prices
  • Image URLs
  • Star ratings
  • Short descriptions
  • Product URLs
Figure A: Visual sample for a Product Listing Page

Why is a Capture List Good for Product Listing Pages?

Besides being the most effective way to capture large volumes of items or product listings quickly, a capture list organizes data into a table-like format that is readily usable for analysis. It also ensures that the same data fields are extracted for each product.
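
To make the table-like shape concrete, here is a minimal Python sketch of what rows from a capture list might look like after export. The field names, values, and helper functions are invented for illustration and are not Browse AI's actual export format or API.

```python
# Hypothetical rows, one per product 'box' on a listing page.
# Field names and values are invented for illustration only.
captured_list = [
    {"name": "Trail Backpack", "price": "$49.00", "product_url": "https://example.com/p/1"},
    {"name": "Canvas Tote", "price": "$19.50", "product_url": "https://example.com/p/2"},
    {"name": "Travel Duffel", "price": "$75.00", "product_url": "https://example.com/p/3"},
]


def has_consistent_fields(rows):
    """Return True when every row exposes exactly the same set of fields."""
    if not rows:
        return True
    expected = set(rows[0])
    return all(set(row) == expected for row in rows)


def column(rows, field):
    """Pull one column out of the table, in row order."""
    return [row[field] for row in rows]
```

Because every row carries the same fields, a call such as column(captured_list, "price") reads a whole column at once, which is what makes this shape convenient for analysis.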

What is Capture Text?

Capture text works best on Product Detail Pages (PDPs), where information tends to be less structured. It's ideal for getting in-depth information based on specific cues or on your use case. Examples of data suited for capture text include, but are not limited to:

  • Product descriptions
  • Technical specifications
  • About sections
  • Shipping information
Figure B: Visual sample for a Product Detail Page

Why is Capture Text Good for Product Detail Pages?

While a capture list is the better fit for product listings, capture text lets you target precise text elements even within complex pages. You can customize which text to extract, getting only the information you need. It is also invaluable for gaining qualitative insights from detailed descriptions.

Once a task finishes, your robot organizes the captured data into Tables.

In a nutshell

  • Capture List is perfect for efficiently collecting an overview of products (name, price, URLs, etc.) across several pages.
  • Capture Text is best for drilling down into selected products, getting deeper descriptions, and other details.
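
As a rough illustration of how the two kinds of output can complement each other, the Python sketch below attaches detail fields to overview rows by matching on the product URL. All names and data here are hypothetical, and the matching step is an assumption about how you might combine exports yourself, not a built-in Browse AI feature.

```python
# Hypothetical overview rows, as a capture list might produce from a listing page.
overview_rows = [
    {"name": "Trail Backpack", "price": "$49.00", "product_url": "https://example.com/p/1"},
    {"name": "Canvas Tote", "price": "$19.50", "product_url": "https://example.com/p/2"},
]

# Hypothetical detail records, as capture text might produce from detail pages,
# keyed here by product URL. Only one product was selected for a deeper look.
detail_records = {
    "https://example.com/p/1": {
        "description": "A 30-litre pack with a padded laptop sleeve.",
        "shipping": "Ships in 2 to 3 business days.",
    },
}


def add_details(rows, details):
    """Attach detail fields to each overview row that has a matching URL."""
    return [{**row, **details.get(row["product_url"], {})} for row in rows]


enriched = add_details(overview_rows, detail_records)
```

Rows without a matching detail record keep only their overview fields, which mirrors the idea of drilling down into selected products rather than all of them.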

Things to consider

As you build your robot, watch how the outlined boxes behave as you move your cursor around the screen, whether you are using Capture List or Capture Text.

If the outlined boxes highlight different containers, try to select the container that encloses your desired data. That is the basic trick: websites are built differently, so the boxes' behavior can differ too.

For example, suppose two different websites show very similar data and structure for a product category. On website ABC, the Price and Discounted Price may both appear in the same box, while on website DEF, Price and Discounted Price may each get a separate box for selection.
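
If a site behaves like website ABC and both values land in one captured field, you could separate them after export. The Python sketch below is one possible approach under stated assumptions: amounts are simple dollar values without thousands separators, and the larger amount is taken to be the original price. The function name and sample strings are hypothetical.

```python
import re

# Matches simple dollar amounts such as "$59.99" or "$20".
# Assumption: no thousands separators and no other currencies.
PRICE_PATTERN = re.compile(r"\$\s?(\d+(?:\.\d{2})?)")


def split_prices(box_text):
    """Return (price, discounted_price) from text captured as a single box.

    Assumption: when two amounts appear, the larger one is the original price.
    """
    amounts = [float(match) for match in PRICE_PATTERN.findall(box_text)]
    if not amounts:
        return None, None
    if len(amounts) == 1:
        return amounts[0], None
    return max(amounts), min(amounts)
```

For a site like website DEF, where each value has its own box, no such post-processing is needed because the two fields are already captured separately.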

Understanding the distinction between Capture List and Capture Text will strengthen your data extraction strategy and help you achieve a more streamlined and effective data collection process.
