How can I create a workflow connecting two robots?
Browse AI's workflow feature lets you connect two robots and run them in sequence. With a workflow, you can chain consecutive runs of two robots, trigger bulk runs, and automatically extract data from detail pages with no manual steps in between.
Workflows are the way to extract data from detail pages. Previously, the only option was to download the results from the first robot, import them into a second robot, and perform a bulk run to extract data from the inner pages. With a workflow, you set up two robots and chain them together: Robot A extracts a list of page links and passes it to Robot B, which then visits each page and extracts data according to its training.
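To make the hand-off between the two robots concrete, here is a minimal Python sketch of the idea. It is only an illustration: in Browse AI the robots are trained and chained in the dashboard, not written as code. The functions `robot_a`, `robot_b`, and `run_workflow`, and the `example.com` URL, are all hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical stand-in for Robot A: given a list page, return detail-page links.
def robot_a(list_url: str) -> list[str]:
    return [f"{list_url}/item-{i}" for i in range(1, 4)]

# Hypothetical stand-in for Robot B: given one detail page, return its fields.
def robot_b(page_url: str) -> dict[str, str]:
    return {"url": page_url, "name": page_url.rsplit("/", 1)[-1]}

def run_workflow(
    list_url: str,
    first: Callable[[str], list[str]],
    second: Callable[[str], dict[str, str]],
) -> list[dict[str, str]]:
    """Run the first robot once, then run the second robot on every link it found."""
    links = first(list_url)                  # Robot A extracts the list of links
    return [second(link) for link in links]  # Robot B runs once per link (the "bulk run")

rows = run_workflow("https://example.com/companies", robot_a, robot_b)
for row in rows:
    print(row)
```

Each dictionary in `rows` plays the role of one extracted record, which is why the second robot's results are easiest to collect in a row-based destination.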
How do I create a workflow for deep scraping?
In this example, we will create a workflow that extracts information from company profile pages on YCombinator.com. This requires first collecting the company profile links and then extracting data from each profile.
1- Create Robot A, which extracts company profile links from the YC companies list, the same way you would create any other robot. Make sure the profile link is one of the captured fields. The result should look like the image below.
2- Train a second robot, Robot B, to extract the company information from a single company page on YC. The result would look like this.
3- Before creating the workflow, connect your second robot to one of the integrations so that data from multiple runs is collected in one place. In this example, we will integrate with Google Sheets.
4- Now, go to Workflows on your Browse AI dashboard and click the Add new workflow button.
5- Choose a name for your workflow, select the first robot, and click on the next step.
6- Now select your second robot in step B. You must also choose the data point you want to pass from your first robot to your second robot. In this example, we extracted profile page links from YC, so we will select that data point and continue to the next step.
7- Now, you can choose when to run your second robot. You have four options to choose from:
- Always
- Only if robot A finds changes while monitoring
- Only if robot A finds new items while monitoring
- Only if robot A finds new or changed items while monitoring
For this example, we will select "always."
8- Once you complete the final step, your workflow is ready to be enabled. If you have activated Google Sheets syncing, the dashboard will indicate that the Google Sheets integration is enabled. If you have not yet enabled an integration, you need to do so: a workflow can run many tasks at once, possibly hundreds or thousands, and the most efficient way to manage that much data is through an integration such as Google Sheets or Webhooks.
9- Once you have saved and enabled your workflow, you can start it by running a task on Robot A. Browse AI will extract the company profile links from YC using Robot A and then pass them to Robot B, which triggers a bulk run. In that bulk run, Robot B extracts the required data from every profile page.
10- On the Robot B page, you will see a bulk run extracting data from company profile pages on YC.
Now, if you open your Google Sheets file, you will see that the data from each company has been added as a row.
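If you are unsure which run condition from step 7 to pick, the following Python sketch may help you reason about the difference. It is not Browse AI's implementation: the function `items_to_pass`, the condition names, and the idea that only the matching items are handed to Robot B are all assumptions made for illustration.

```python
def items_to_pass(
    condition: str,
    previous: dict[str, str],
    current: dict[str, str],
) -> list[str]:
    """Pick which links to hand to the second robot.

    `previous` and `current` map each link to the text Robot A captured for it
    on the last monitoring run and on this run (a hypothetical data shape).
    """
    def is_new(link: str) -> bool:
        return link not in previous

    def is_changed(link: str) -> bool:
        return link in previous and previous[link] != current[link]

    if condition == "always":
        return list(current)
    if condition == "changed":
        return [link for link in current if is_changed(link)]
    if condition == "new":
        return [link for link in current if is_new(link)]
    if condition == "new_or_changed":
        return [link for link in current if is_new(link) or is_changed(link)]
    raise ValueError(f"unknown condition: {condition}")

previous = {"a": "1", "b": "2"}
current = {"a": "1", "b": "3", "c": "4"}
for name in ("always", "changed", "new", "new_or_changed"):
    print(name, items_to_pass(name, previous, current))
```

In this sketch, "always" is the only condition that does not depend on an earlier monitoring run, which is why it suits a first, one-off extraction like the YC example.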
Workflows automate multi-step data extraction. Whether you are extracting data from detail pages, performing bulk runs, or pulling data from multiple sources, a workflow removes the manual hand-off between robots.