
Capture text: How to extract and structure specific data from a page using 'Just text'

Capture 'Just text' to scrape, extract or monitor specific text or elements from a page.

Written by Melissa Shires
Updated today

If you want to scrape, extract or monitor specific text or elements from a page, you'll want to capture 'Just text'.

This feature lets you extract specific elements and structure the data you're scraping. Common pages to use it on include:

  • Individual product pages

  • Company "About" pages

  • Detailed specification pages

Note that a single robot can be trained to do all three on the same web page: extract data "From a list", capture "Just text", and take a screenshot.

Capture Text is great for extracting and structuring data from:

  • Full product descriptions

  • Technical specifications

  • Shipping information

  • Company information

  • Detailed product features

Capture Text lets you extract specific text elements within a page and structure them as data. You can then keep that data up to date using AI-powered change detection monitoring.
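To make the idea of monitoring structured text concrete, here is a minimal Python sketch that compares two captures of the same page field by field. It uses plain equality checks, not Browse AI's AI-powered change detection, and the function name, labels, and values are all hypothetical.

```python
def changed_fields(previous, current):
    """Return {label: (old, new)} for every label whose text differs."""
    labels = previous.keys() | current.keys()
    return {
        label: (previous.get(label), current.get(label))
        for label in sorted(labels)
        if previous.get(label) != current.get(label)
    }


# Hypothetical captures of the same page taken at two different times.
before = {"price": "$19.99", "shipping_info": "Ships in 2-3 business days"}
after = {"price": "$17.99", "shipping_info": "Ships in 2-3 business days"}

changes = changed_fields(before, after)
print(changes)
```

Because each captured element has its own label, a change can be reported per field rather than for the page as a whole.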

Step 1: Start training your robot

To start training your robot, all you'll need is the URL you'd like to scrape or monitor.

  1. From your Browse AI dashboard, click "Build New Robot".

  2. Select either:

    1. Extract structured data - if you'd like to scrape data from a web page.

    2. Monitor site changes - if you want to create a web monitor.

  3. Enter the Origin URL you would like to scrape or monitor.

  4. Click Start Training Robot.

  5. Select Use Robot Studio and wait for your web page to load.

Step 2: Use 'Just text' to capture and structure text

  1. Start training your robot.

  2. Click 'Capture Text' and select 'Just text'.

  3. Hover and click to select what you want to capture.

  4. Keep clicking until you've selected all the text on the page that you want to scrape or monitor.

  5. Click confirm when you're done.

  6. Label your captured text. Each label will be a column of data.

  7. Save the captured text.

  8. If you've captured all of the data you need, click 'Finish' to finish recording your robot. Otherwise, keep capturing text or screenshots.
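As an illustration of step 6, the Python sketch below shows how labeled captures might map onto a table: each label becomes a column header and each captured text becomes the value in that column. The labels and values are hypothetical, and the CSV layout is an assumption for illustration, not a description of Browse AI's actual export format.

```python
import csv
import io

# Hypothetical labels and captured values; each label becomes a column.
captured = {
    "product_name": "Example Widget",
    "price": "$19.99",
    "shipping_info": "Ships in 2-3 business days",
}


def to_csv(rows):
    """Write a list of label -> text dicts as CSV, one column per label."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv([captured])
print(csv_text)
```

Short, descriptive labels are worth the effort at capture time, since they end up as the column names you work with later.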

Note that, depending on what you've selected, you can often choose whether to capture the element's visible text, its HTML, or its link.
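The difference between these three options is easiest to see on a single element. The following Python sketch uses the standard library's `html.parser` on a hypothetical link element and returns what each option would roughly correspond to. It only illustrates the concept; it is not how Browse AI extracts data, and the sample HTML and names are assumptions.

```python
from html.parser import HTMLParser

# Hypothetical HTML for one selected element.
ELEMENT_HTML = '<a href="https://example.com/specs">Full <b>technical</b> specs</a>'


class ElementReader(HTMLParser):
    """Collects the visible text and the first link target of an element."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.link = None

    def handle_starttag(self, tag, attrs):
        # Remember the href of the first <a> tag encountered.
        if tag == "a" and self.link is None:
            self.link = dict(attrs).get("href")

    def handle_data(self, data):
        # Text between tags is what a visitor would see on the page.
        self.text_parts.append(data)


def capture_options(element_html):
    """Return the visible text, raw HTML, and link for one element."""
    reader = ElementReader()
    reader.feed(element_html)
    reader.close()
    return {
        "text": "".join(reader.text_parts).strip(),
        "html": element_html,
        "link": reader.link,
    }


options = capture_options(ELEMENT_HTML)
print(options["text"])
print(options["link"])
```

Visible text suits descriptions and specifications, HTML preserves formatting such as bold text or nested lists, and the link option is useful when the URL behind an element matters more than its wording.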
