
API Guide: Retrieving and managing scraped data

Learn how to access, filter, and work with data extracted by your Browse AI robots through the API.

Written by Melissa Shires
Updated today

After running your first API task, you need to understand how to access and work with the data your robots extract. This guide covers everything from understanding response formats to filtering large datasets and handling errors.

Understanding task response structure

When your robot successfully extracts data, the API returns it in a structured format. Here's what a typical successful task response looks like:

{
  "statusCode": 200,
  "messageCode": "success",
  "result": {
    "id": "f6fb62b6-f06a-4bf7-a623-c6a35c2e70b0",
    "status": "successful",
    "robotId": "4f5cd7ff-6c98-4cac-8cf0-d7d0cb050b06",
    "capturedTexts": {
      "title": "Product Name Here",
      "price": "$99.99",
      "description": "Product description text",
      "product_list": [
        {
          "name": "Item 1",
          "price": "$29.99"
        },
        {
          "name": "Item 2",
          "price": "$39.99"
        }
      ]
    },
    "capturedScreenshots": [
      {
        "name": "full_page",
        "url": "https://screenshot-url-here.jpg"
      }
    ],
    "createdAt": 1678795867879,
    "startedAt": 1678795867879,
    "finishedAt": 1678795867879,
    "videoUrl": "https://video-debug-url.mp4"
  }
}

Key fields explained:

  • capturedTexts: Your extracted data with the field names you configured

  • capturedScreenshots: Any screenshots your robot captured

  • status: Current task status (running, successful, failed)

  • videoUrl: Debug video showing exactly what your robot did
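
If you are parsing this response in code, a minimal sketch like the one below reads the key fields with Python and requests. The API key and IDs are placeholders, and note that the timestamp fields are Unix epoch milliseconds, so divide by 1000 before converting to a datetime. (The retrieval call itself is covered in the next section.)

import requests
from datetime import datetime, timezone

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder
TASK_ID = "YOUR_TASK_ID"         # placeholder

resp = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks/{TASK_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
result = resp.json()["result"]

print("Status:", result["status"])
print("Title:", result["capturedTexts"]["title"])

# Timestamps in the response are Unix epoch milliseconds
finished = datetime.fromtimestamp(result["finishedAt"] / 1000, tz=timezone.utc)
print("Finished at:", finished.isoformat())

for shot in result.get("capturedScreenshots", []):
    print("Screenshot:", shot["name"], shot["url"])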

Getting task results

Option 1: Get a specific task

If you know the task ID (from when you started the task):

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks/TASK_ID" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Option 2: List recent tasks

Get your robot's latest tasks:

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Working with different data types

Your capturedTexts field contains different types of data depending on how you configured your robot.

Single text fields

"capturedTexts": {
"title": "Single piece of text",
"price": "$99.99"
}

Lists of items

"capturedTexts": {
"product_list": [
{"name": "Item 1", "price": "$29.99"},
{"name": "Item 2", "price": "$39.99"}
]
}

Processing lists in your code

import requests

API_KEY = "YOUR_SECRET_API_KEY"  # your Browse AI secret key
ROBOT_ID = "YOUR_ROBOT_ID"
TASK_ID = "YOUR_TASK_ID"

# Get task results
response = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks/{TASK_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"}
)

data = response.json()
if data["result"]["status"] == "successful":
    products = data["result"]["capturedTexts"]["product_list"]
    for product in products:
        print(f"Product: {product['name']}, Price: {product['price']}")

Filtering and pagination for large datasets

When you have many tasks, use filters to find exactly what you need.

Filter by status

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks?status=successful" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Filter by date range

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks?fromDate=1678795867879&toDate=1678885867879" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Pagination for large results

curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks?page=1&pageSize=10" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY"

Available filters:

  • status: successful, failed, in-progress

  • fromDate / toDate: Unix timestamps in milliseconds for the date range

  • page / pageSize: Pagination (1-10 items per page)

  • sort: Sort by creation date (-createdAt for newest first)

  • robotBulkRunId: Filter by specific bulk run
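
Putting these filters together, here is a sketch of a pagination loop in Python. It assumes (as above) that tasks live under result.robotTasks.items and that an empty page means you have reached the end.

import requests

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder

def fetch_successful_tasks(from_ms, to_ms):
    """Collect all successful tasks in a date range, ten per page."""
    tasks, page = [], 1
    while True:
        resp = requests.get(
            f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
            headers={"Authorization": f"Bearer {API_KEY}"},
            params={
                "status": "successful",
                "fromDate": from_ms,
                "toDate": to_ms,
                "page": page,
                "pageSize": 10,
                "sort": "-createdAt",  # newest first
            },
        )
        resp.raise_for_status()
        items = resp.json()["result"]["robotTasks"]["items"]
        if not items:  # assumption: an empty page means no more results
            return tasks
        tasks.extend(items)
        page += 1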

Handling failed tasks and debugging

When tasks fail, you get valuable debugging information:

{
  "statusCode": 200,
  "messageCode": "success",
  "result": {
    "id": "task-id-here",
    "status": "failed",
    "userFriendlyError": "The page took too long to load",
    "videoUrl": "https://debug-video-url.mp4",
    "triedRecordingVideo": true
  }
}

Debugging failed tasks:

  1. Check userFriendlyError: Plain English explanation of what went wrong

  2. Watch the debug video: See exactly what your robot encountered

  3. Verify input parameters: Make sure URLs and parameters are correct

  4. Check website changes: Sites may have changed since robot training

Common failure reasons:

  • Website is down or loading slowly

  • Login credentials expired

  • Website structure changed

  • Network connectivity issues
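
The sketch below automates steps 1 and 2 from the list above: it fetches a task and, if it failed, surfaces the plain-English error and the debug-video link. Field names are taken from the sample failed-task response earlier in this section.

import requests

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder
TASK_ID = "YOUR_TASK_ID"         # placeholder

resp = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks/{TASK_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
result = resp.json()["result"]

if result["status"] == "failed":
    # Plain-English explanation, as in the failed-task response above
    print("Why it failed:", result.get("userFriendlyError"))
    if result.get("triedRecordingVideo") and result.get("videoUrl"):
        print("Debug video:", result["videoUrl"])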

Working with monitoring data

When robots run on monitoring schedules, you can identify them in the task list:

{
  "result": {
    "id": "task-id",
    "runByTaskMonitorId": "monitor-id-here",
    "runByAPI": false,
    "status": "successful"
  }
}

Monitoring-specific filtering:

# Get only monitoring tasks (exclude manual and API runs)
curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/tasks" \
-H "Authorization: Bearer YOUR_SECRET_API_KEY" \
| jq '.result.robotTasks.items[] | select(.runByTaskMonitorId != null)'
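
If you would rather not shell out to jq, the same filter in Python looks like this (field names follow the sample monitoring response above):

import requests

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder

resp = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

# Keep only tasks started by a monitor, i.e. runByTaskMonitorId is set;
# manual and API-triggered runs leave that field null.
monitor_tasks = [
    task for task in resp.json()["result"]["robotTasks"]["items"]
    if task.get("runByTaskMonitorId")
]
print(f"{len(monitor_tasks)} monitoring runs found")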

Best practices for data management

  • Use specific date ranges instead of fetching all tasks

  • Filter by status to focus on successful extractions only

  • Implement retry logic for temporary failures (see the sketch after this list)

  • Use webhooks instead of polling for real-time updates
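
To make the retry-logic tip concrete, here is a minimal sketch. It polls for completion, which is fine for a one-off script (for production, prefer webhooks as noted above), and it assumes a task is started by POSTing an inputParameters body to the same /tasks endpoint used for listing; check the task-running guide this article follows on from for the exact call.

import time
import requests

API_KEY = "YOUR_SECRET_API_KEY"  # placeholder
ROBOT_ID = "YOUR_ROBOT_ID"       # placeholder
BASE = f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def run_with_retries(input_parameters, attempts=3, poll_seconds=15):
    """Start a task, wait for a terminal status, and retry on failure."""
    for attempt in range(1, attempts + 1):
        # Assumption: tasks are started by POSTing inputParameters here
        task = requests.post(
            BASE, headers=HEADERS, json={"inputParameters": input_parameters}
        ).json()["result"]

        # Poll until the task reaches a terminal status
        while task["status"] not in ("successful", "failed"):
            time.sleep(poll_seconds)
            task = requests.get(
                f"{BASE}/{task['id']}", headers=HEADERS
            ).json()["result"]

        if task["status"] == "successful":
            return task
        print(f"Attempt {attempt} failed: {task.get('userFriendlyError')}")
        time.sleep(2 ** attempt)  # simple backoff before retrying

    raise RuntimeError("Task still failing after all retry attempts")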
