
API Guide: Retrieving and managing scraped data

Learn how to access, filter, and work with data extracted by your Browse AI robots through the API.

Written by Melissa Shires
Updated today

After running your first API task, you need to understand how to access and work with the data your robots extract. This guide covers everything from understanding response formats to filtering large datasets and handling errors.

Understanding task response structure

When your robot successfully extracts data, the API returns it in a structured format. Here's what a typical successful task response looks like:

{
  "statusCode": 200,
  "messageCode": "success",
  "result": {
    "id": "f6fb62b6-f06a-4bf7-a623-c6a35c2e70b0",
    "status": "successful",
    "robotId": "4f5cd7ff-6c98-4cac-8cf0-d7d0cb050b06",
    "capturedTexts": {
      "title": "Product Name Here",
      "price": "$99.99",
      "description": "Product description text",
      "product_list": [
        {
          "name": "Item 1",
          "price": "$29.99"
        },
        {
          "name": "Item 2",
          "price": "$39.99"
        }
      ]
    },
    "capturedScreenshots": [
      {
        "name": "full_page",
        "url": "https://screenshot-url-here.jpg"
      }
    ],
    "createdAt": 1678795867879,
    "startedAt": 1678795867879,
    "finishedAt": 1678795867879,
    "videoUrl": "https://video-debug-url.mp4"
  }
}

Key fields explained:

  • capturedTexts: Your extracted data with the field names you configured

  • capturedScreenshots: Any screenshots your robot captured

  • status: Current task status (running, successful, failed)

  • videoUrl: Debug video showing exactly what your robot did
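
If you are parsing this response in code, a minimal sketch like the one below reads the key fields with Python and requests. The API key and IDs are placeholders, and note that the timestamp fields are Unix epoch milliseconds, so divide by 1000 before converting to a datetime. (The retrieval call itself is covered in the next section.)

import requests
from datetime import datetime, timezone

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder
TASK_ID = "YOUR_TASK_ID"         # placeholder

resp = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks/{TASK_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
result = resp.json()["result"]

print("Status:", result["status"])
print("Title:", result["capturedTexts"]["title"])

# Timestamps in the response are Unix epoch milliseconds
finished = datetime.fromtimestamp(result["finishedAt"] / 1000, tz=timezone.utc)
print("Finished at:", finished.isoformat())

for shot in result.get("capturedScreenshots", []):
    print("Screenshot:", shot["name"], shot["url"])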

Getting task results

Option 1: Get a specific task

If you know the task ID (from when you started the task):

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks/TASK_ID" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Option 2: List recent tasks

Get your robot's latest tasks:

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Working with different data types

Your capturedTexts field contains different types of data depending on how you configured your robot.

Single text fields

"capturedTexts": {
"title": "Single piece of text",
"price": "$99.99"
}

Lists of items

"capturedTexts": {
"product_list": [
{"name": "Item 1", "price": "$29.99"},
{"name": "Item 2", "price": "$39.99"}
]
}

Processing lists in your code

import requests

API_KEY = "YOUR_SECRET_API_KEY"  # your Browse AI secret key
ROBOT_ID = "YOUR_ROBOT_ID"
TASK_ID = "YOUR_TASK_ID"

# Get task results
response = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks/{TASK_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"}
)

data = response.json()
if data["result"]["status"] == "successful":
    products = data["result"]["capturedTexts"]["product_list"]
    for product in products:
        print(f"Product: {product['name']}, Price: {product['price']}")

Filtering and pagination for large datasets

When you have many tasks, use filters to find exactly what you need.

Filter by status

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks?status=successful" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Filter by date range

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks?fromDate=1678795867879&toDate=1678885867879" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Pagination for large results

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks?page=1&pageSize=10" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Available filters:

  • status: successful, failed, in-progress

  • fromDate / toDate: Unix timestamps in milliseconds for the date range

  • page / pageSize: Pagination (1-10 items per page)

  • sort: Sort by creation date (-createdAt for newest first)

  • robotBulkRunId: Filter by specific bulk run
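
Putting these filters together, here is a sketch of a pagination loop in Python. It assumes (as above) that tasks live under result.robotTasks.items and that an empty page means you have reached the end.

import requests

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder

def fetch_successful_tasks(from_ms, to_ms):
    """Collect all successful tasks in a date range, ten per page."""
    tasks, page = [], 1
    while True:
        resp = requests.get(
            f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
            headers={"Authorization": f"Bearer {API_KEY}"},
            params={
                "status": "successful",
                "fromDate": from_ms,
                "toDate": to_ms,
                "page": page,
                "pageSize": 10,
                "sort": "-createdAt",  # newest first
            },
        )
        resp.raise_for_status()
        items = resp.json()["result"]["robotTasks"]["items"]
        if not items:  # assumption: an empty page means no more results
            return tasks
        tasks.extend(items)
        page += 1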

Handling failed tasks and debugging

When tasks fail, you get valuable debugging information:

{
  "statusCode": 200,
  "messageCode": "success",
  "result": {
    "id": "task-id-here",
    "status": "failed",
    "userFriendlyError": "The page took too long to load",
    "videoUrl": "https://debug-video-url.mp4",
    "triedRecordingVideo": true
  }
}

Debugging failed tasks:

  1. Check userFriendlyError: Plain English explanation of what went wrong

  2. Watch the debug video: See exactly what your robot encountered

  3. Verify input parameters: Make sure URLs and parameters are correct

  4. Check website changes: Sites may have changed since robot training

Common failure reasons:

  • Website is down or loading slowly

  • Login credentials expired

  • Website structure changed

  • Network connectivity issues
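
The sketch below automates steps 1 and 2 from the list above: it fetches a task and, if it failed, surfaces the plain-English error and the debug-video link. Field names are taken from the sample failed-task response earlier in this section.

import requests

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder
TASK_ID = "YOUR_TASK_ID"         # placeholder

resp = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks/{TASK_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
result = resp.json()["result"]

if result["status"] == "failed":
    # Plain-English explanation, as in the failed-task response above
    print("Why it failed:", result.get("userFriendlyError"))
    if result.get("triedRecordingVideo") and result.get("videoUrl"):
        print("Debug video:", result["videoUrl"])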

Working with monitoring data

When robots run on monitoring schedules, you can identify them in the task list:

{
  "result": {
    "id": "task-id",
    "runByTaskMonitorId": "monitor-id-here",
    "runByAPI": false,
    "status": "successful"
  }
}

Monitoring-specific filtering:

# Get only monitoring tasks (exclude manual and API runs)
curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY" \
| jq '.result.robotTasks.items[] | select(.runByTaskMonitorId != null)'
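
If you would rather not shell out to jq, the same filter in Python looks like this (field names follow the sample monitoring response above):

import requests

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder

resp = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

# Keep only tasks started by a monitor, i.e. runByTaskMonitorId is set;
# manual and API-triggered runs leave that field null.
monitor_tasks = [
    task for task in resp.json()["result"]["robotTasks"]["items"]
    if task.get("runByTaskMonitorId")
]
print(f"{len(monitor_tasks)} monitoring runs found")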

Best practices for data management

  • Use specific date ranges instead of fetching all tasks

  • Filter by status to focus on successful extractions only

  • Implement retry logic for temporary failures (see the sketch after this list)

  • Use webhooks instead of polling for real-time updates
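
To make the retry-logic tip concrete, here is a minimal sketch. It polls for completion, which is fine for a one-off script (for production, prefer webhooks as noted above), and it assumes a task is started by POSTing an inputParameters body to the same /tasks endpoint used for listing; check the task-running guide this article follows on from for the exact call.

import time
import requests

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder
BASE = f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def run_with_retries(input_parameters, attempts=3, poll_seconds=15):
    """Start a task, wait for a terminal status, and retry on failure."""
    for attempt in range(1, attempts + 1):
        # Assumption: tasks are started by POSTing inputParameters here
        task = requests.post(
            BASE, headers=HEADERS, json={"inputParameters": input_parameters}
        ).json()["result"]

        # Poll until the task reaches a terminal status
        while task["status"] not in ("successful", "failed"):
            time.sleep(poll_seconds)
            task = requests.get(
                f"{BASE}/{task['id']}", headers=HEADERS
            ).json()["result"]

        if task["status"] == "successful":
            return task
        print(f"Attempt {attempt} failed: {task.get('userFriendlyError')}")
        time.sleep(2 ** attempt)  # simple backoff before retrying

    raise RuntimeError("Task still failing after all retry attempts")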
