12 key concepts you should know to use Browse AI
Browse AI lets you extract and monitor data from any website in minutes with no code. The robots that you train interact with websites just like a human would, allowing you to extract the most accurate data with minimal effort. This guide introduces key concepts to help you get started.
You can train a robot to automate a set of steps that you would normally take manually on a website.
For example, a robot can be trained to do all the following as part of a task:
- Open a webpage,
- Log in,
- Click on buttons,
- Fill out forms,
- Select from a dropdown menu,
- Extract structured data from a webpage into a spreadsheet,
- Click on "Next page" or "Load more" buttons to extract more data,
- Take screenshots,
- Monitor specific parts of webpages for visual or content changes.
A robot has dynamic input parameters that allow you to adjust the page URL (a.k.a. Origin URL) or what it enters in text inputs every time you run it. This allows you to use a single robot to extract or monitor data from an unlimited number of pages on a site that have a similar layout.
Most robots that people create simply open a webpage and extract data from it. You can then bulk run the robot on tens of thousands of similar pages on that site to quickly extract the full data set.
Robots also do a lot more in the background without you noticing. For example, they solve captchas, use geolocated residential IP addresses, emulate human behavior to avoid detection, and automatically adapt to website changes, essentially maintaining themselves.
Prebuilt Robots vs Custom Robots
Robots are created either from Prebuilt Robots or with the Browse AI Recorder and its click-and-extract interface. Every robot has a few input parameters (like the webpage address) that you can adjust every time you run it.
Prebuilt Robots are designed for popular use-cases and new ones are published every week. Examples include extracting data from Yelp, TripAdvisor, or LinkedIn Companies.
Over 90% of the robots Browse AI users create are Custom Robots trained for their specific use cases. For example, some realtors monitor building permits issued by their county government (on the county website) and connect that data to a sales CRM or spreadsheet to automatically email every builder that receives a building permit.
Every Custom Robot comes with an Origin URL input parameter that, by default, points to the page the robot was trained on. You can extract or monitor data from any other page on that site that has a similar layout by adjusting the Origin URL.
For example, if you are looking to monitor Walmart's product prices, you can train a custom robot on a Walmart product page, and then configure that robot to monitor 100 different product pages by adjusting the Origin URL for each monitor.
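As a sketch, the per-monitor input parameters from the Walmart example above could be generated like this. The `originUrl` key name and the product URLs are illustrative assumptions, not the exact names Browse AI uses:

```python
# Sketch: build one set of input parameters per product page to monitor.
# The "originUrl" key and the product URLs below are illustrative only.

product_ids = [f"product-{n}" for n in range(1, 101)]  # 100 hypothetical products

monitor_inputs = [
    {"originUrl": f"https://www.walmart.com/ip/{pid}"} for pid in product_ids
]

print(len(monitor_inputs))            # one input set per monitor
print(monitor_inputs[0]["originUrl"])
```

Each dictionary would back one monitor on the same robot, so a single trained robot covers all 100 pages.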
Each robot is trained to perform a certain task. Every time you run that robot, it will perform that task and the details, including the extracted data, will be stored under that task in the robot's History tab.
If you set up a monitoring robot to check a webpage for changes daily, it will run one task per day, or about 30 tasks per month, for you.
New tasks can be created in a few different ways:
- You can open a robot on your dashboard, go to its Run Task tab, and run a task.
- In the Run Task tab, you also have the option to Bulk Run up to 50,000 tasks at once by uploading a CSV file.
- If you configure monitors, every time a monitoring check needs to be done, a new monitoring task is automatically created.
- If you integrate Browse AI with another software or use the API, new tasks can be created using the API.
- Sometimes the system creates tasks to verify that a robot is healthy, or to optimize it for speed and reliability. These tasks are marked as run by "the system".
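For the API route above, a task-creation call is essentially an authenticated HTTP request carrying the input parameters. The sketch below only builds the request; the endpoint path, header names, and payload fields are assumptions for illustration, so check the API documentation for the real ones:

```python
import json

# Sketch of creating a task via an HTTP API. The endpoint path and the
# payload field names below are illustrative assumptions, not the
# documented Browse AI API -- consult the official API reference.

def build_create_task_request(api_key, robot_id, input_parameters):
    """Return (url, headers, body) for a hypothetical create-task call."""
    url = f"https://api.browse.ai/v2/robots/{robot_id}/tasks"  # assumed path
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    body = json.dumps({"inputParameters": input_parameters})
    return url, headers, body

url, headers, body = build_create_task_request(
    "YOUR_API_KEY", "robot-123", {"originUrl": "https://example.com/page"}
)
print(url)
```

You would then send this request with any HTTP client and poll (or use a webhook) for the task result.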
One of the most useful features of Browse AI is the built-in monitoring system.
Every robot can have an unlimited number of monitors configured, each watching one page or one search query on that site.
For example, you can monitor all products on an e-commerce site with a single robot and receive a notification when the prices change or a product becomes available.
Monitors detect changes and can be configured to send email notifications or send the data to another software automatically when a change is detected.
Every robot comes with certain Input Parameters that you can adjust for every task and monitor so you do not have to create new robots for every page or search keyword on a site.
The most common input parameter is Origin URL, which refers to the page that the robot opens first.
When you train a custom robot, if you interact with any text inputs, what you enter will become an input parameter that you can adjust later.
The Bulk Run feature, found in the robot dashboard under the Run Task tab, allows you to upload a CSV containing up to 50,000 different sets of Input Parameters and immediately create a task for each one. The tasks are then processed in a queue, and once they finish, you receive the full extracted data set.
This allows you, for example, to upload a CSV containing links to 50,000 company pages on LinkedIn and then get a spreadsheet of all the data that was extracted from those 50,000 pages.
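A bulk-run CSV is simply one column per input parameter and one row per task. A minimal sketch of building such a file (with five example links instead of 50,000, and an assumed `originUrl` column header that you would match to your robot's actual input parameter name):

```python
import csv
import io

# Sketch: build a CSV of input parameters for a bulk run. The column
# header "originUrl" is an assumption -- match it to your robot's actual
# input parameter names before uploading.

links = [f"https://www.linkedin.com/company/example-{n}/" for n in range(1, 6)]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["originUrl"])           # one column per input parameter
writer.writerows([link] for link in links)  # one row per task

csv_text = buf.getvalue()
print(csv_text)
```

Writing to a real file instead of `io.StringIO` gives you the CSV to upload in the Run Task tab.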
Learn more about bulk running tasks here.
Deep scraping is a popular term for extracting a list of links (from a category page, for example) and then extracting the details from each of those links.
This allows you to collect in-depth data from nested pages or sections of a website.
Check out this thorough article on deep scraping using Browse AI.
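The deep-scraping flow described above can be sketched as two stages: one robot lists the links, and a second robot runs once per link (via a bulk run or the API). The two functions below are stand-ins for real robot runs, with hypothetical URLs:

```python
# Sketch of a deep-scrape flow with two robots: one lists item links from
# a category page, the other extracts details from each link. Both
# functions are stand-ins for real robot runs, not Browse AI calls.

def run_list_robot(category_url):
    """Stand-in for a robot that extracts item links from a category page."""
    return [f"{category_url}/item-{n}" for n in range(1, 4)]

def run_detail_robot(item_url):
    """Stand-in for a robot that extracts details from one item page."""
    return {"url": item_url, "title": item_url.rsplit("/", 1)[-1]}

# Stage 1: collect the links; Stage 2: extract details from each one.
links = run_list_robot("https://example.com/category")
details = [run_detail_robot(link) for link in links]

for row in details:
    print(row["title"])
```

In practice, stage 2 is where the Bulk Run feature fits: the stage-1 links become the stage-2 robot's Origin URL column.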
Most of the time, you want to transfer the data you find on a website to another software you use, like Google Sheets or your CRM. Browse AI comes with 5,000+ integrations to make it possible to create a data pipeline from any website into the tools you already use.
Google Sheets and Airtable
There are native integrations with Google Sheets and Airtable. Once you configure them on a robot, every time the robot runs a task, the data it extracts is inserted into your spreadsheet immediately.
Browse AI for Google Sheets Add-on
This add-on gives you a few additional features in Google Sheets:
- Run robots from within your Google Sheet by highlighting a set of Input Parameters and pressing a button,
- Automatically delete old data in your Google Sheet,
- Automatically delete duplicate data in your Google Sheet.
Learn more about Browse AI for Google Sheets add-on here.
Connector Integrations (Zapier, Make, Pabbly)
These native integrations let you connect Browse AI to 5,000+ other apps with a few clicks through third-party integration platforms:
- Zapier is the easiest one to use, but it can be costly at large volumes.
- Make costs much less, but is harder to use.
- Pabbly Connect is typically used by people who have purchased their one-time payment lifetime deal to save on costs.
API & Webhooks
If you have software developers on your team, take advantage of the public API and webhooks: they let you do almost anything you can do on the dashboard (except create new robots) programmatically.
Some startups have built their software on top of the Browse AI API, offloading all of their tedious data-scraping work.
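A common webhook pattern is: your endpoint receives a JSON event when a task finishes, and you forward the captured rows to your CRM or spreadsheet. The payload shape below (`task`, `status`, `capturedLists`, `products`) is an assumption for illustration; the real event format is defined in Browse AI's webhook documentation:

```python
import json

# Sketch of handling a task-finished webhook. Every key name in the
# payload below is an illustrative assumption, not the documented
# Browse AI event format.

def handle_webhook(raw_body):
    """Parse a webhook body and return rows ready for your CRM/spreadsheet."""
    event = json.loads(raw_body)
    task = event.get("task", {})
    if task.get("status") != "successful":
        return []  # ignore failed or in-progress tasks
    return task.get("capturedLists", {}).get("products", [])

# Simulated incoming webhook body:
sample = json.dumps({
    "task": {
        "status": "successful",
        "capturedLists": {"products": [{"name": "Widget", "price": "$9.99"}]},
    }
})
rows = handle_webhook(sample)
print(rows)
```

In production this function would sit behind an HTTP endpoint; the parsing and filtering logic stays the same.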