
Capture text: How to extract data 'From a list'

Learn how to extract data 'From a list'. This option is typically used to scrape data that repeats on a page in a pattern, for example a list of search results.

Written by Melissa Shires
Updated today

'From a list' is best for extracting repeating information like product listings or search results. When you train your robot to extract data from a list, the robot automatically structures the data into a table and makes pagination options available.

Note that a single robot can be trained to do all of the following on one web page: extract data 'From a list', extract 'Just text', and take a screenshot.

Common examples include:

  • Product category pages with multiple items in a grid

  • Search results pages showing multiple listings

  • Directory pages listing multiple businesses

Scraping data 'From a list' works well for extracting and structuring listed data from a page, including:

  • Product names

  • Prices

  • Image URLs

  • Star ratings

  • Short descriptions

  • Product URLs
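To make the idea of "structuring a list into a table" concrete, here is a minimal conceptual sketch in Python. It is not how Browse AI works internally and does not use any Browse AI API. The sample HTML, the class names (`product`, `name`, `price`), and the `ListExtractor` helper are all hypothetical, chosen only to show how items that repeat in a pattern map to one table row each.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: three list items that share one pattern.
SAMPLE_HTML = """
<ul>
  <li class="product"><a class="name" href="/p/1">Kettle</a><span class="price">$25</span></li>
  <li class="product"><a class="name" href="/p/2">Toaster</a><span class="price">$40</span></li>
  <li class="product"><a class="name" href="/p/3">Blender</a><span class="price">$60</span></li>
</ul>
"""


class ListExtractor(HTMLParser):
    """Collects one row (a dict) per repeating <li class="product"> item."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per list item
        self.field = None   # column currently being filled, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "li" and css_class == "product":
            self.rows.append({})  # a new item starts a new row
        elif self.rows and css_class == "name":
            self.rows[-1]["Product URL"] = attrs.get("href")
            self.field = "Product name"
        elif self.rows and css_class == "price":
            self.field = "Price"

    def handle_data(self, data):
        # Store text only while inside a labeled field.
        if self.field and data.strip():
            self.rows[-1][self.field] = data.strip()

    def handle_endtag(self, tag):
        self.field = None


def extract_rows(html):
    parser = ListExtractor()
    parser.feed(html)
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each dictionary corresponds to one row of the resulting table, and each key corresponds to a column such as a product name, price, or product URL.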

Step 1: Start training your robot

To start training your robot, all you'll need is the URL you'd like to scrape or monitor.

  1. From your Browse AI dashboard, click "Build New Robot".

  2. Select either:

    1. Extract structured data - if you'd like to scrape data from a web page.

    2. Monitor site changes - if you want to create a web monitor.

  3. Enter the Origin URL you would like to scrape or monitor.

  4. Click Start Training Robot.

  5. Select Use Robot Studio and wait for your web page to load.

Step 2: How to scrape data from a page 'From a list'

  1. Click on Capture Text, and select From a list.

  2. Hover over the list of items on the page until you see a dotted outline around the elements you want to capture.

  3. Click to select the list when the outline matches your desired data set.

  4. Robot Studio will automatically structure that data into a recommended dataset (you can customize this if needed, see below).

  5. Give your list a descriptive name.

  6. Select the number of items you'd like the robot to capture.

  7. Configure the pagination settings to capture additional list items. These include:

    1. Click through 'next' buttons.

    2. Click 'load more items'.

    3. Infinite scroll (i.e. scroll up or down to load additional items)

    4. No more items to load.

  8. Click 'Save Captured List'.

  9. Click 'Finish' to finish recording your robot if you've captured all of the data you need, or keep capturing text or screenshots.

  10. Name your robot, run it, then review the data and approve it.
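The way the item limit (step 6) and the pagination setting (step 7) interact can be pictured with a short conceptual sketch. This is only a mental model under stated assumptions, not Browse AI's implementation: the `Pagination` enum, the `capture_list` function, and the in-memory `pages` data are hypothetical names invented for illustration.

```python
from enum import Enum
from itertools import islice


class Pagination(Enum):
    # Hypothetical labels mirroring the four options listed in step 7.
    NEXT_BUTTON = "click through 'next' buttons"
    LOAD_MORE = "click 'load more items'"
    INFINITE_SCROLL = "infinite scroll"
    NONE = "no more items to load"


def capture_list(pages, max_items, pagination):
    """Collect up to max_items, moving past the first page only when
    a pagination option other than NONE is chosen."""
    if pagination is Pagination.NONE:
        pages = pages[:1]  # stay on the first page
    items = (item for page in pages for item in page)
    return list(islice(items, max_items))


# Hypothetical site with three pages of results.
pages = [["A", "B", "C"], ["D", "E", "F"], ["G"]]

with_next = capture_list(pages, 5, Pagination.NEXT_BUTTON)
without_pagination = capture_list(pages, 5, Pagination.NONE)
print(with_next)           # ['A', 'B', 'C', 'D', 'E']
print(without_pagination)  # ['A', 'B', 'C']
```

In this model, asking for more items than the first page holds only has an effect when a pagination option is configured, which is why the two settings are chosen together.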

[arcade id="PiYeOnjDEaRTDTGGEzid" title="How to extract data 'as a list'" padding-bottom="0"]

How to customize what the robot extracts and how it structures the list data

If you're not happy with how the robot automatically structured the list data, you can customize it.

  1. From the extracted list, click 'Select Manually Instead'.

  2. Click 'Cancel Edits'.

  3. Hover over each item you'd like to extract, and click to select them.

  4. When finished, click Confirm.

  5. Label each data point (press Enter after each one to move to the next).

  6. Give your list a descriptive name.

  7. Select the number of items you'd like the robot to capture.

  8. Configure the pagination settings to capture additional list items. These include:

    1. Click through 'next' buttons.

    2. Click 'load more items'.

    3. Infinite scroll (i.e. scroll up or down to load additional items)

    4. No more items to load.

  9. Click 'Save Captured List'.

  10. Click 'Finish' to finish recording your robot if you've captured all of the data you need, or keep capturing text or screenshots.

  11. Name your robot, run it, then review the data and approve it.

[arcade id="xSu63FmvvmdiXmjKuzgO" title="Extracting data 'From a list' - customizing the list output" padding-bottom="0%"]
