
How to send Browse AI data to Snowflake

Written by Melissa Shires

Browse AI doesn't have a built-in Snowflake connector, but you can easily send your scraped data to Snowflake using webhooks or the Browse AI REST API. This guide walks you through both approaches.

πŸ’‘ Which method should I choose?
Use webhooks if you want data pushed to Snowflake automatically in real time. Use the REST API if you prefer to pull data on your own schedule (e.g. via a cron job or orchestration tool like Airflow).

Prerequisites

  1. A Browse AI account with at least one approved robot that has extracted data.

  2. A Snowflake account with a database, schema, and warehouse you can write to.

  3. A server or cloud function (e.g. AWS Lambda, Google Cloud Functions) to receive webhooks and write to Snowflake β€” only needed for the webhook method.

Option A: Real-time ingestion with webhooks

With this approach, Browse AI sends a POST request to your endpoint every time a task finishes. Your endpoint parses the payload and inserts the data into Snowflake.

Step 1: Create a Snowflake table

Create a table to store the incoming data. A flexible starting point is a variant column for the raw JSON plus metadata columns:

CREATE TABLE IF NOT EXISTS browse_ai_data (
  id STRING DEFAULT UUID_STRING(),
  robot_id STRING,
  task_id STRING,
  captured_at TIMESTAMP_NTZ,
  raw_payload VARIANT,
  inserted_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
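Before wiring anything up, it can help to see how a task payload maps onto these columns. A minimal sketch of that mapping, using the `robotId`, `id`, and `finishedAt` field names that appear in the code later in this guide (treat the exact payload shape as an assumption to verify against a real task):

```python
import json

def task_to_row(task: dict) -> tuple:
    """Map a Browse AI task payload onto the browse_ai_data columns
    (robot_id, task_id, captured_at, raw_payload)."""
    return (
        task.get("robotId"),
        task.get("id"),
        task.get("finishedAt"),
        json.dumps(task),  # stored via PARSE_JSON() into the VARIANT column
    )

# Hypothetical payload -- confirm field names against one of your real tasks
sample = {"robotId": "robot-123", "id": "task-456", "finishedAt": 1700000000000}
print(task_to_row(sample))
```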

Step 2: Build a webhook receiver

Set up a small server or cloud function that accepts POST requests from Browse AI and writes to Snowflake. Here’s a Python example using the Snowflake connector:

import json

import snowflake.connector
from flask import Flask, request

app = Flask(__name__)

SNOWFLAKE_CONFIG = {
    "account": "your_account",
    "user": "your_user",
    "password": "your_password",
    "database": "your_database",
    "schema": "your_schema",
    "warehouse": "your_warehouse",
}

@app.route("/browse-ai-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json()
    task = payload.get("task", {})

    # Insert the raw task JSON plus metadata into the VARIANT table
    conn = snowflake.connector.connect(**SNOWFLAKE_CONFIG)
    try:
        cur = conn.cursor()
        cur.execute(
            """
            INSERT INTO browse_ai_data (robot_id, task_id, captured_at, raw_payload)
            SELECT %s, %s, %s, PARSE_JSON(%s)
            """,
            (
                task.get("robotId"),
                task.get("id"),
                task.get("finishedAt"),
                json.dumps(task),
            ),
        )
    finally:
        conn.close()
    return "OK", 200

if __name__ == "__main__":
    app.run(port=8080)  # for local testing; use a WSGI server in production

βœ… Tip: For production, use key-pair authentication instead of a password, and store credentials in environment variables or a secrets manager.

Step 3: Register the webhook in Browse AI

  1. Open your robot in Browse AI and go to the Integrate tab.

  2. Click Webhooks.

  3. Paste your endpoint URL (e.g. https://your-server.com/browse-ai-webhook).

  4. Select the event type β€” taskFinishedSuccessfully is recommended for clean data ingestion.

  5. Click Save.

Run a test task and confirm a row appears in your Snowflake table.

πŸ“– For more detail on webhook setup and event types, see Webhooks: Set up guide. If you need to allowlist Browse AI’s IP address, see Webhooks: IP address for allowlisting.

Option B: Scheduled ingestion with the REST API

If you prefer to pull data on a schedule rather than receiving it in real time, you can use Browse AI’s REST API to fetch completed task results and load them into Snowflake.

Step 1: Get your API key

  1. Go to Account Settings β†’ API in the Browse AI dashboard.

  2. Copy your API key.

Step 2: Fetch task results

Use the /robots/{robotId}/tasks endpoint to retrieve completed tasks:

import json

import requests
import snowflake.connector

API_KEY = "your_browse_ai_api_key"
ROBOT_ID = "your_robot_id"

# Fetch the latest tasks
response = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"pageSize": 100},
)
response.raise_for_status()
tasks = response.json().get("result", {}).get("robotTasks", {}).get("items", [])

# Write successful tasks to Snowflake
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    database="your_database",
    schema="your_schema",
    warehouse="your_warehouse",
)
try:
    cur = conn.cursor()
    for task in tasks:
        if task.get("status") == "successful":
            cur.execute(
                """
                INSERT INTO browse_ai_data (robot_id, task_id, captured_at, raw_payload)
                SELECT %s, %s, %s, PARSE_JSON(%s)
                """,
                (
                    ROBOT_ID,
                    task.get("id"),
                    task.get("finishedAt"),
                    json.dumps(task),
                ),
            )
finally:
    conn.close()

βœ… Tip: To avoid inserting duplicates, add a UNIQUE constraint on task_id and use a MERGE statement instead of INSERT, or track the last ingested timestamp.

Step 3: Schedule the script

Run the script on a recurring schedule using any orchestration tool:

  • cron β€” for simple VM-based setups.

  • Apache Airflow / Dagster β€” for more complex data pipelines.

  • AWS Lambda + EventBridge β€” for serverless scheduled pulls.

  • Snowflake Tasks + External Functions β€” to keep everything inside Snowflake.

Querying your data in Snowflake

Once data is flowing in, you can pull fields out of the JSON payload into their own columns:

SELECT
  task_id,
  captured_at,
  raw_payload:capturedTexts::VARIANT AS captured_texts,
  raw_payload:capturedScreenshots::VARIANT AS screenshots,
  raw_payload:inputParameters::VARIANT AS input_params
FROM browse_ai_data
ORDER BY captured_at DESC;

For repeated use, create a view that extracts the specific fields your robot captures, so downstream dashboards and queries stay clean.
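Alternatively, you can flatten rows client-side before loading. The `capturedTexts` key matches the query above; the `title`/`price` fields inside it are hypothetical examples of what a robot might capture:

```python
def flatten_task(task: dict) -> dict:
    """Promote capturedTexts fields to top-level columns
    alongside the task metadata."""
    row = {
        "task_id": task.get("id"),
        "captured_at": task.get("finishedAt"),
    }
    row.update(task.get("capturedTexts") or {})
    return row

# Hypothetical task -- the keys inside capturedTexts depend on your robot
task = {"id": "t1", "finishedAt": 1700000000000,
        "capturedTexts": {"title": "Widget", "price": "$9.99"}}
```

Flattening in SQL via a view keeps raw data intact in Snowflake, whereas flattening client-side locks in the schema at load time; the view approach is usually the safer default.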

πŸš€ Want help getting this set up? Our managed services team can build and maintain your Browse AI β†’ Snowflake pipeline for you β€” no engineering effort on your end. Book a call with our team to get started.

Troubleshooting

  • Webhook not arriving? Make sure Browse AI’s IP (3.228.254.190) is allowlisted on your server. See Webhooks: IP address for allowlisting.

  • Snowflake connection errors? Verify your account identifier format (e.g. org-account) and that your warehouse is not suspended.

  • Duplicate rows? Add a unique constraint on task_id and switch to MERGE / INSERT ... WHERE NOT EXISTS.

  • JSON parsing issues? Confirm the payload is valid JSON before calling PARSE_JSON(). Log the raw body for debugging.
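For that last point, a defensive check before handing the body to PARSE_JSON() can be as simple as the sketch below (adapt the logging to your setup):

```python
import json
import logging

def parse_or_log(raw_body):
    """Return the parsed payload, or None after logging the raw body."""
    try:
        return json.loads(raw_body)
    except (TypeError, json.JSONDecodeError):
        logging.error("Invalid webhook body: %r", raw_body)
        return None
```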
