
How to send Browse AI data to BigQuery

Written by Melissa Shires

This guide covers three ways to send your Browse AI scraped data into Google BigQuery, from no-code automation to custom API integrations. Choose the method that best fits your team's technical comfort and use case.

📖 Prerequisites: You'll need an approved Browse AI robot with scraped data, a Google Cloud project with BigQuery enabled, and a BigQuery dataset and table to load data into. For API-based methods, you'll also need a Browse AI API key and a Google Cloud service account.

Which method should I use?

Here's a quick comparison to help you decide:

| Method | Best for | Technical level | Speed |
|---|---|---|---|
| Zapier / Make | Teams without developers | No code | Near real-time |
| Webhooks + streaming insert | Real-time pipelines with custom logic | Intermediate | Real-time |
| API polling + batch load | Large-scale batch processing | Intermediate | On schedule |

Method 1: Zapier or Make (no code)

The fastest way to connect Browse AI to BigQuery. No coding required. Just map your scraped fields to BigQuery table columns.

Setting up with Zapier

  1. Go to zapier.com and create a new Zap.

  2. Trigger: Choose Browse AI as the trigger app, then select New Successful Task Finished as the event.

  3. Connect your Browse AI account and select the robot you want to sync data from.

  4. Action: Choose Google BigQuery as the action app, then select Create Row.

  5. Connect your Google account, select your project, dataset, and table.

  6. Map your fields: Match each Browse AI captured field to the corresponding BigQuery column.

  7. Test the Zap, then turn it on.

Setting up with Make (formerly Integromat)

  1. Create a new scenario in Make.

  2. Add a Webhooks → Custom webhook module as the trigger. Copy the webhook URL.

  3. In Browse AI, go to your robot's Integrate tab and add the Make webhook URL under Webhooks. Select the taskFinishedSuccessfully event.

  4. Add a Google BigQuery → Insert a Row module, connect your Google account, and map your fields.

  5. Activate the scenario.

Method 2: Webhooks + streaming insert (real-time)

Use Browse AI webhooks to push data directly into BigQuery as soon as a task finishes. BigQuery's streaming insert API lets you add rows in real time without running a load job.

Step 1: Create a Google Cloud service account

  1. In the Google Cloud Console, go to IAM & Admin → Service Accounts.

  2. Click Create Service Account.

  3. Give it a name (e.g. "Browse AI BigQuery Writer").

  4. Grant the role BigQuery Data Editor (allows inserting rows).

  5. Click Done, then click on the service account, go to Keys → Add Key → Create new key → JSON.

  6. Download the JSON credentials file and store it securely on your server.

Step 2: Create your BigQuery table

Create a table in BigQuery to receive your Browse AI data. Here's an example schema for lead data:

-- Run this in the BigQuery console
CREATE TABLE your_dataset.browse_ai_leads (
    task_id STRING,
    robot_id STRING,
    first_name STRING,
    last_name STRING,
    email STRING,
    company_name STRING,
    phone STRING,
    website STRING,
    job_title STRING,
    origin_url STRING,
    scraped_at TIMESTAMP
);

Step 3: Build your webhook endpoint

This example receives Browse AI webhook data and streams it into BigQuery:

from datetime import datetime, timezone

from flask import Flask, request, jsonify
from google.cloud import bigquery

app = Flask(__name__)

# Set the GOOGLE_APPLICATION_CREDENTIALS env var to your service account JSON path
client = bigquery.Client()
TABLE_ID = "your-project.your_dataset.browse_ai_leads"


@app.route("/browse-ai-webhook", methods=["POST"])
def handle_webhook():
    # Tolerate empty or non-JSON bodies instead of raising
    payload = request.get_json(silent=True) or {}

    # Ignore every event except successful task completion
    if payload.get("event") != "taskFinishedSuccessfully":
        return jsonify({"status": "ignored"}), 200

    task = payload.get("task", {})
    captured = task.get("capturedTexts", {})

    # Map captured fields to the columns of the table created in Step 2
    row = {
        "task_id": task.get("id", ""),
        "robot_id": task.get("robotId", ""),
        "first_name": captured.get("first_name", ""),
        "last_name": captured.get("last_name", ""),
        "email": captured.get("email", ""),
        "company_name": captured.get("company_name", ""),
        "phone": captured.get("phone", ""),
        "website": captured.get("website", ""),
        "job_title": captured.get("job_title", ""),
        "origin_url": task.get("inputParameters", {}).get("originUrl", ""),
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }

    # insert_rows_json returns a list of per-row errors (empty on success)
    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        return jsonify({"status": "error", "errors": errors}), 500

    return jsonify({"status": "inserted"}), 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

💡 Install the client library: Run pip install google-cloud-bigquery and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account JSON file.
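Before registering the endpoint with Browse AI, it can help to send it a test request yourself. The sketch below uses only the Python standard library. The payload is made up and contains only the fields the handler above reads, so treat its shape as an assumption; real Browse AI payloads may contain more fields. The function names build_test_request and send are illustrative.

```python
import json
import urllib.request


def build_test_request(url):
    """Build a POST request carrying a made-up Browse AI style payload."""
    payload = {
        "event": "taskFinishedSuccessfully",
        "task": {
            "id": "test-task-1",
            "robotId": "test-robot",
            "capturedTexts": {"first_name": "Ada", "email": "ada@example.com"},
            "inputParameters": {"originUrl": "https://example.com/team"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send the request and return (status_code, body_text)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8")


# Example, with the Flask app running locally:
# print(send(build_test_request("http://localhost:5000/browse-ai-webhook")))
```

Note that a successful test call inserts a real row into your table, so you may want to point TABLE_ID at a scratch table while testing.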

Step 4: Register the webhook in Browse AI

Via the dashboard:

  1. Open your robot and go to the Integrate tab.

  2. Under Webhooks, click Add webhook.

  3. Paste your endpoint URL and select the taskFinishedSuccessfully event.

Via the API:

curl -X POST "https://api.browse.ai/v2/robots/YOUR_ROBOT_ID/webhooks" \
  -H "Authorization: Bearer YOUR_BROWSE_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://yourdomain.com/browse-ai-webhook",
    "events": ["taskFinishedSuccessfully"]
  }'

💡 IP allowlisting: Browse AI sends webhooks from IP address 3.228.254.190. If your server has a firewall, add this to your allowlist. See Webhooks: IP address for allowlisting.
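If you cannot change firewall rules, a similar check could be done inside the application. This is a minimal sketch, assuming your server sees the sender's real address; behind a load balancer or reverse proxy the address your app sees is usually the proxy's, so the check would need adapting. The helper name is_allowed_sender is illustrative.

```python
import ipaddress

# Address taken from the note above; update it if Browse AI publishes a change.
BROWSE_AI_WEBHOOK_IPS = {ipaddress.ip_address("3.228.254.190")}


def is_allowed_sender(remote_addr):
    """Return True if remote_addr is one of the known Browse AI webhook IPs."""
    try:
        return ipaddress.ip_address(remote_addr) in BROWSE_AI_WEBHOOK_IPS
    except ValueError:
        # Not a valid IP address string at all
        return False
```

In the Flask handler this could be called as is_allowed_sender(request.remote_addr), returning a 403 response when it is False.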

Method 3: API polling + batch load (scheduled)

For large-scale data extraction, poll the Browse AI API on a schedule and batch-load results into BigQuery. Batch loading is more cost-effective than streaming for high volumes.

from datetime import datetime, timedelta, timezone

import requests
from google.cloud import bigquery

BROWSE_AI_API_KEY = "your_browse_ai_api_key"
ROBOT_ID = "your_robot_id"
TABLE_ID = "your-project.your_dataset.browse_ai_leads"

client = bigquery.Client()


def get_recent_tasks(since_hours=1):
    """Return successful tasks that finished within the last since_hours hours."""
    resp = requests.get(
        f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
        headers={"Authorization": f"Bearer {BROWSE_AI_API_KEY}"},
        params={"pageSize": 100},
        timeout=30,
    )
    resp.raise_for_status()  # Fail loudly on auth or server errors
    tasks = resp.json().get("result", {}).get("robotTasks", {}).get("items", [])
    cutoff = datetime.now(timezone.utc) - timedelta(hours=since_hours)
    # Assumes finishedAt is an ISO 8601 UTC string ending in "Z"
    return [
        t for t in tasks
        if t.get("status") == "successful"
        and datetime.fromisoformat(t["finishedAt"].replace("Z", "+00:00")) > cutoff
    ]


def batch_load_to_bigquery(tasks):
    """Build one row per task and send all rows to BigQuery in a single call."""
    rows = []
    for task in tasks:
        captured = task.get("capturedTexts", {})
        rows.append({
            "task_id": task.get("id", ""),
            "robot_id": task.get("robotId", ""),
            "email": captured.get("email", ""),
            "company_name": captured.get("company_name", ""),
            "origin_url": task.get("inputParameters", {}).get("originUrl", ""),
            "scraped_at": task.get("finishedAt", ""),
        })

    if rows:
        # Note: insert_rows_json sends the rows through the streaming API
        errors = client.insert_rows_json(TABLE_ID, rows)
        if errors:
            print(f"Errors: {errors}")
        else:
            print(f"Inserted {len(rows)} rows")


# Run on a schedule (e.g. cron job every hour)
if __name__ == "__main__":
    batch_load_to_bigquery(get_recent_tasks(since_hours=1))
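The script above sends rows with insert_rows_json, which goes through the streaming API. To use a load job instead, one option is to serialise the rows as newline-delimited JSON and pass them to load_table_from_file. The sketch below assumes the table already exists and reuses its schema rather than relying on autodetection; the function names rows_to_ndjson and load_rows_with_job are illustrative.

```python
import io
import json


def rows_to_ndjson(rows):
    """Serialise row dicts as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


def load_rows_with_job(rows, table_id):
    """Append rows to an existing table with a load job; return rows loaded."""
    if not rows:
        return 0

    # Imported lazily so rows_to_ndjson works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        # Reuse the existing table's schema instead of autodetecting one.
        schema=client.get_table(table_id).schema,
    )
    data = io.BytesIO(rows_to_ndjson(rows).encode("utf-8"))
    job = client.load_table_from_file(data, table_id, job_config=job_config)
    job.result()  # Wait for the job to finish; raises if the load failed
    return job.output_rows
```

In the script above, the body of batch_load_to_bigquery could call load_rows_with_job(rows, TABLE_ID) in place of insert_rows_json. Every value must be valid for its column type, so an empty string in a TIMESTAMP column would make the job fail.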

📖 For full Browse AI API details, including pagination, bulk operations, and task filtering, see the API Guide: Getting started and API Guide: Bulk operations.

BigQuery-specific tips

Handling duplicate rows

BigQuery streaming inserts don't enforce uniqueness, so a webhook that is delivered twice produces two rows. To handle this, include the Browse AI task_id in your table and deduplicate with a scheduled query or a view:

-- Create a deduplicated view
CREATE OR REPLACE VIEW your_dataset.browse_ai_leads_deduped AS
SELECT * EXCEPT(row_num)
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY task_id ORDER BY scraped_at DESC) AS row_num
    FROM your_dataset.browse_ai_leads
)
WHERE row_num = 1;
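As a complement to the view, the Python client's insert_rows_json accepts a row_ids argument, which BigQuery uses for best-effort de-duplication of streamed rows. The sketch below passes each row's task_id as its insert ID. This only covers retries within a short window, so the view above is still worth keeping. The function name insert_with_task_ids is illustrative, and it assumes one row per task.

```python
def insert_with_task_ids(client, table_id, rows):
    """Stream rows, using each row's task_id as its best-effort insert ID.

    client is expected to be a google.cloud.bigquery.Client.
    Returns the list of per-row errors (empty on success).
    """
    row_ids = [str(row["task_id"]) for row in rows]
    return client.insert_rows_json(table_id, rows, row_ids=row_ids)
```

If a single task produces several rows (for example list data), the insert ID would need to combine task_id with something that distinguishes the rows, such as the item's position.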

Working with table (list) data

If your Browse AI robot extracts tabular data (lists of items), each row in the scraped table becomes a separate BigQuery row. Access list data from task.capturedLists in the webhook payload instead of capturedTexts.
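Flattening list data into rows might look like the sketch below. It assumes capturedLists maps each list name to a list of dictionaries keyed by column name; check a real payload from your robot before relying on that shape. The list name "products" and the item_index column are hypothetical, and your table needs columns matching the item keys.

```python
def rows_from_captured_lists(task, list_name):
    """Return one row dict per item in task["capturedLists"][list_name]."""
    items = task.get("capturedLists", {}).get(list_name, [])
    rows = []
    for position, item in enumerate(items):
        row = dict(item)  # Copy the scraped columns
        # Add task metadata so each row can be traced back to its task
        row["task_id"] = task.get("id", "")
        row["robot_id"] = task.get("robotId", "")
        row["item_index"] = position
        rows.append(row)
    return rows


# Example with a made-up task
sample_task = {
    "id": "task-1",
    "robotId": "robot-1",
    "capturedLists": {
        "products": [
            {"name": "Widget", "price": "9.99"},
            {"name": "Gadget", "price": "19.99"},
        ]
    },
}
product_rows = rows_from_captured_lists(sample_task, "products")
```

The resulting list can be passed to insert_rows_json in one call. Scraped column names that contain spaces or punctuation would need renaming to valid BigQuery column names first.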

Partitioning for performance

If you're loading large volumes of scraped data, partition your table by the scraped_at timestamp. This improves query performance and reduces costs:

CREATE TABLE your_dataset.browse_ai_leads (
    ...
    scraped_at TIMESTAMP
)
PARTITION BY DATE(scraped_at);

Troubleshooting

BigQuery returns "Access Denied" errors

Check that your service account has the BigQuery Data Editor role on the dataset or project. Also verify the GOOGLE_APPLICATION_CREDENTIALS environment variable points to the correct JSON file.
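A quick way to confirm which credentials your process will pick up is to inspect the file the environment variable points to. The sketch below uses only the standard library and checks fields that service account key files contain (type, client_email, project_id). The function name describe_credentials is illustrative.

```python
import json
import os


def describe_credentials(env=os.environ):
    """Return a one-line description of the configured service account key."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"No file found at {path}"
    try:
        with open(path, encoding="utf-8") as fh:
            info = json.load(fh)
    except json.JSONDecodeError:
        return f"{path} is not valid JSON"
    if info.get("type") != "service_account":
        return f"{path} is not a service account key"
    return (
        f"Using service account {info.get('client_email')} "
        f"(project {info.get('project_id')})"
    )


# Example: print(describe_credentials())
```

The email it reports is the account that needs the BigQuery Data Editor role.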

Rows aren't appearing immediately

Streaming inserts may take a few seconds to become visible in queries. This is normal BigQuery behavior. For batch loads, the data is available as soon as the load job completes.

Schema mismatch errors

Make sure your row data matches the table schema exactly. BigQuery is strict about column names and types. If a Browse AI field returns a number but your column expects a STRING, you'll get an error.
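One way to guard against type mismatches is to normalise values before inserting. This is a minimal sketch, assuming every listed column is a STRING in your table; the helper name coerce_to_strings is illustrative.

```python
def coerce_to_strings(row, string_columns):
    """Return a copy of row with the given columns converted to str.

    None values are left untouched so they are stored as NULL.
    """
    fixed = dict(row)
    for col in string_columns:
        if fixed.get(col) is not None:
            fixed[col] = str(fixed[col])
    return fixed


# Example with a made-up row where phone arrived as a number
clean = coerce_to_strings({"phone": 5551234, "email": None}, ["phone", "email"])
```

Calling this on each row just before insert_rows_json keeps the original row unchanged and leaves columns you did not list as they were.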

Webhook isn't firing

Make sure the webhook URL is publicly accessible and that your server responds with a 200 status code. See the Webhooks: Set up guide for detailed debugging steps.
