Capture List vs. Capture Text
Browse AI robots have two main techniques for extracting data from websites: Capture List and Capture Text. Knowing when to use each is crucial for keeping your data gathering efficient and accurate.
Capture List
With Capture List, you can teach your robot to find and extract specific data from repetitive website structures. This is especially useful for Product Listing Pages (PLPs), where product details are usually arranged in a consistent, repeating grid or layout. That consistency makes it easy to target specific elements within each product 'box' for extraction, such as (but not limited to):
- Product names
- Prices
- Image URLs
- Star ratings
- Short descriptions
- Product URLs
Why should I use Capture List?
Besides being the best method for quickly capturing large volumes of items or product listings, Capture List organises the data into a table-like format that is ready for analysis. It also ensures that the same data fields are extracted for every product.
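Browse AI robots are trained in the browser without writing code, so the following is not Browse AI's implementation or API. It is a minimal conceptual sketch in Python (standard library only) of what Capture List does. It assumes hypothetical listing markup in which every product box repeats the same `div.product` structure, turns each box into one row with the same columns, and writes the rows out as a CSV table. The markup, class names and `ListingParser` are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical listing-page markup: every product "box" repeats the same structure.
LISTING_HTML = """
<div class="product"><a class="name" href="/p/1">Desk Lamp</a><span class="price">$24.99</span></div>
<div class="product"><a class="name" href="/p/2">Office Chair</a><span class="price">$129.00</span></div>
<div class="product"><a class="name" href="/p/3">Monitor Stand</a><span class="price">$39.50</span></div>
"""

FIELDS = ["name", "price", "url"]


class ListingParser(HTMLParser):
    """Builds one row per div.product, always with the same set of fields."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "div" and css_class == "product":
            # A new product box starts a new row with every column present.
            self.rows.append(dict.fromkeys(FIELDS))
        elif self.rows and css_class in ("name", "price"):
            self._field = css_class
            if tag == "a":
                self.rows[-1]["url"] = attrs.get("href")

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.rows[-1][self._field] = data.strip()


parser = ListingParser()
parser.feed(LISTING_HTML)
parser.close()

# Write the rows as a table (CSV) that is ready for analysis.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(parser.rows)
print(buf.getvalue())
```

Each row is seeded with the same keys. A product with a missing price would therefore still get a `price` column (written as an empty cell) instead of shifting the table.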
Capture Text
Capture Text shines on Product Detail Pages (PDPs), where information tends to be less structured. It is ideal for pulling in-depth information based on specific cues or on your particular use case. Examples of data suited for Capture Text include, but are not limited to:
- Product descriptions
- Technical specifications
- 'About' sections
- Shipping information
- Information not shown on the listing page
Why should I use Capture Text?
While Capture List is best suited to product listings, Capture Text lets you target precise text elements even within complex pages. You can customise exactly which text to extract, so you get only the information you need. It is also invaluable for gaining qualitative insights from detailed descriptions.
Once a task finishes, your robot organises the extracted data into Tables, which are covered in more detail in a separate article.
In a nutshell:
- Capture List is perfect for efficiently collecting an overview of products (name, price, URLs, etc.) across several pages.
- Capture Text is best for drilling down into selected products to get deeper descriptions and other details.
Utilising the Capture methods when deep scraping:
Unlike traditional web scraping, which stays on a single page, deep scraping follows internal links, mimicking a human user's navigation to extract more comprehensive data that typically sits one page deeper than the starting page.
Deep scraping in Browse AI often involves two robots working in tandem, each equipped with a specific Capture method:
- Robot A (Capture List): This robot specialises in navigating product listing pages and using the Capture List method to extract all URLs leading to individual product detail pages.
- Robot B (Capture Text): This robot takes a product detail page URL collected by Robot A as its starting point and extracts the data you need using the Capture Text method.
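The two-robot workflow above can be sketched as a small Python pipeline. This is a conceptual illustration only: it does not use Browse AI's API. How you actually feed Robot A's output into Robot B in Browse AI is not shown here. The pages live in a hypothetical in-memory dictionary instead of being fetched over HTTP, and the URLs, class names and the `robot_a_capture_list` / `robot_b_capture_text` functions are hypothetical stand-ins for the two robots.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory pages standing in for real HTTP requests.
PAGES = {
    "https://shop.example.com/lamps": (
        '<a class="product-link" href="/p/1">Desk Lamp</a>'
        '<a class="product-link" href="/p/2">Floor Lamp</a>'
    ),
    "https://shop.example.com/p/1": '<p class="description">Adjustable LED desk lamp.</p>',
    "https://shop.example.com/p/2": '<p class="description">Arc floor lamp with dimmer.</p>',
}


class ClassParser(HTMLParser):
    """Collects the href and text of every element with the given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.matches = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("class") == self.css_class:
            self._inside = True
            self.matches.append({"href": attrs.get("href"), "text": ""})

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.matches[-1]["text"] += data.strip()


def find(url, css_class):
    parser = ClassParser(css_class)
    parser.feed(PAGES[url])  # a real robot would load the live page instead
    parser.close()
    return parser.matches


def robot_a_capture_list(listing_url):
    """Robot A (Capture List): every product detail URL on a listing page."""
    return [urljoin(listing_url, m["href"]) for m in find(listing_url, "product-link")]


def robot_b_capture_text(detail_url):
    """Robot B (Capture Text): the description on one product detail page."""
    matches = find(detail_url, "description")
    return {"url": detail_url, "description": matches[0]["text"] if matches else None}


# Deep scraping: Robot A's URL list drives one Robot B run per product page.
product_urls = robot_a_capture_list("https://shop.example.com/lamps")
details = [robot_b_capture_text(url) for url in product_urls]
for row in details:
    print(row)
```

Keeping the two steps as separate functions mirrors the split between the two robots. The listing step only needs to know where the links are, and the detail step only needs one URL at a time.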
Keep in mind that each use case is unique, so adapt these robots and capture methods as needed to fit your specific data extraction goals.
For a more in-depth look at deep scraping, check out our deep scraping deep-dive.
Things to consider: tips for successful data extraction
As you build your robot, pay close attention to how the outlined boxes behave as you move your mouse across the page. These outlines, visible whether you're using Capture List or Capture Text, highlight the various containers holding the data on each page.
The key to successful extraction lies in understanding these containers. Because every website is built differently, these boxes can behave differently from site to site. For example, imagine two websites selling similar products:
- Website A: Here, the original price and any discounted price might be displayed together within a single outlined box.
- Website B: In contrast, this site might show each price and its corresponding discounted price in separate, distinct boxes.
This difference in structure matters when you want to extract specific data points. By carefully moving your mouse across the containers, you can identify which boxes hold the data you need and have your robot capture them.
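To make the Website A / Website B difference concrete, here is a minimal Python sketch (standard library only). It is not how Browse AI works internally. It treats every outermost element with a `class` attribute as one outlined box and returns that box's text. The markup, class names and the `capture_boxes` helper are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical markup for the two sites described above.
WEBSITE_A = '<div class="price-box"><s>$40.00</s> $30.00</div>'
WEBSITE_B = '<span class="price">$40.00</span><span class="discounted-price">$30.00</span>'


class BoxParser(HTMLParser):
    """Maps each outermost element that has a class (a "box") to its text.

    Text in nested tags such as <s> is folded into the enclosing box.
    Assumes simple, well-formed markup with no unclosed void tags like <img>.
    """

    def __init__(self):
        super().__init__()
        self.boxes = {}
        self._current = None  # class of the box being read, if any
        self._depth = 0       # nesting depth inside that box

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if self._current is None and css_class:
            self._current, self._depth = css_class, 1
            self.boxes[css_class] = ""
        elif self._current is not None:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse whitespace now that the box is complete.
            self.boxes[self._current] = " ".join(self.boxes[self._current].split())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.boxes[self._current] += data


def capture_boxes(html):
    parser = BoxParser()
    parser.feed(html)
    parser.close()
    return parser.boxes


print(capture_boxes(WEBSITE_A))  # {'price-box': '$40.00 $30.00'}
print(capture_boxes(WEBSITE_B))  # {'price': '$40.00', 'discounted-price': '$30.00'}
```

On Website A the single box yields both prices as one combined value. On Website B each price comes out as its own field.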
With all that said, understanding the distinction between Capture List and Capture Text will strengthen your data extraction strategy and help you achieve a more streamlined and effective data collection process.
🤖