Browse AI doesn't have a built-in Snowflake connector, but you can easily send your scraped data to Snowflake using webhooks or the Browse AI REST API. This guide walks you through both approaches.
💡 Which method should I choose?
Use webhooks if you want data pushed to Snowflake automatically in real time. Use the REST API if you prefer to pull data on your own schedule (e.g. via a cron job or orchestration tool like Airflow).
Prerequisites
A Browse AI account with at least one approved robot that has extracted data.
A Snowflake account with a database, schema, and warehouse you can write to.
A server or cloud function (e.g. AWS Lambda, Google Cloud Functions) to receive webhooks and write to Snowflake (only needed for the webhook method).
Option A: Real-time ingestion with webhooks
With this approach, Browse AI sends a POST request to your endpoint every time a task finishes. Your endpoint parses the payload and inserts the data into Snowflake.
Step 1: Create a Snowflake table
Create a table to store the incoming data. A flexible starting point is a variant column for the raw JSON plus metadata columns:
CREATE TABLE IF NOT EXISTS browse_ai_data (
  id STRING DEFAULT UUID_STRING(),
  robot_id STRING,
  task_id STRING,
  captured_at TIMESTAMP_NTZ,
  raw_payload VARIANT,
  inserted_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
Step 2: Build a webhook receiver
Set up a small server or cloud function that accepts POST requests from Browse AI and writes to Snowflake. Here's a Python example using the Snowflake connector:
import json

import snowflake.connector
from flask import Flask, request

app = Flask(__name__)

SNOWFLAKE_CONFIG = {
    "account": "your_account",
    "user": "your_user",
    "password": "your_password",
    "database": "your_database",
    "schema": "your_schema",
    "warehouse": "your_warehouse",
}

@app.route("/browse-ai-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json()
    task = payload.get("task", {})

    conn = snowflake.connector.connect(**SNOWFLAKE_CONFIG)
    try:
        cur = conn.cursor()
        cur.execute(
            """
            INSERT INTO browse_ai_data (robot_id, task_id, captured_at, raw_payload)
            SELECT %s, %s, %s, PARSE_JSON(%s)
            """,
            (
                task.get("robotId"),
                task.get("id"),
                task.get("finishedAt"),
                json.dumps(task),
            ),
        )
        conn.commit()
    finally:
        conn.close()
    return "OK", 200
✅ Tip: For production, use key-pair authentication instead of a password, and store credentials in environment variables or a secrets manager.
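One way to act on that tip is to keep credentials out of the source file entirely and read them at startup. A minimal sketch; the environment variable names below are illustrative, so use whatever your secrets manager or deployment platform injects:

```python
import os


def snowflake_config_from_env() -> dict:
    """Build the connector settings dict from environment variables
    instead of hard-coding credentials in the script.

    The SNOWFLAKE_* variable names are an assumption for this example;
    rename them to match your own environment.
    """
    return {
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "user": os.environ["SNOWFLAKE_USER"],
        "password": os.environ["SNOWFLAKE_PASSWORD"],
        "database": os.environ["SNOWFLAKE_DATABASE"],
        "schema": os.environ["SNOWFLAKE_SCHEMA"],
        "warehouse": os.environ["SNOWFLAKE_WAREHOUSE"],
    }
```

The resulting dict can be passed straight to `snowflake.connector.connect(**config)`. A missing variable raises `KeyError` at startup, which fails fast instead of connecting with a blank credential.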
Step 3: Register the webhook in Browse AI
Open your robot in Browse AI and go to the Integrate tab.
Click Webhooks.
Paste your endpoint URL (e.g. https://your-server.com/browse-ai-webhook).
Select the event type; taskFinishedSuccessfully is recommended for clean data ingestion.
Click Save.
Run a test task and confirm a row appears in your Snowflake table.
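Before running that test task, you can sanity-check your parsing logic locally with a hand-built payload. A minimal sketch, assuming the webhook body carries a top-level `task` object with `robotId`, `id`, and `finishedAt` fields; log a real delivery to confirm the exact shape for your robot:

```python
import json


def task_to_row(payload: dict) -> tuple:
    """Turn a webhook payload into the (robot_id, task_id, captured_at,
    raw_json) tuple that an INSERT into browse_ai_data expects.

    The payload shape (a top-level "task" object) is an assumption for
    this example; inspect a real webhook delivery to verify field names.
    """
    task = payload.get("task", {})
    return (
        task.get("robotId"),
        task.get("id"),
        task.get("finishedAt"),
        json.dumps(task),
    )
```

Feeding this function a saved sample payload in a unit test catches field-name mismatches before they show up as NULL columns in Snowflake.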
📘 For more detail on webhook setup and event types, see Webhooks: Set up guide. If you need to allowlist Browse AI's IP address, see Webhooks: IP address for allowlisting.
Option B: Scheduled ingestion with the REST API
If you prefer to pull data on a schedule rather than receiving it in real time, you can use Browse AI's REST API to fetch completed task results and load them into Snowflake.
Step 1: Get your API key
Go to Account Settings → API in the Browse AI dashboard.
Copy your API key.
Step 2: Fetch task results
Use the /robots/{robotId}/tasks endpoint to retrieve completed tasks:
import json

import requests
import snowflake.connector

API_KEY = "your_browse_ai_api_key"
ROBOT_ID = "your_robot_id"

# Fetch the latest tasks
response = requests.get(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"pageSize": 100},
)
tasks = response.json().get("result", {}).get("robotTasks", {}).get("items", [])

# Write to Snowflake
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    database="your_database",
    schema="your_schema",
    warehouse="your_warehouse",
)
cur = conn.cursor()

for task in tasks:
    if task.get("status") == "successful":
        cur.execute(
            """
            INSERT INTO browse_ai_data (robot_id, task_id, captured_at, raw_payload)
            SELECT %s, %s, %s, PARSE_JSON(%s)
            """,
            (
                ROBOT_ID,
                task.get("id"),
                task.get("finishedAt"),
                json.dumps(task),
            ),
        )

conn.close()
✅ Tip: To avoid inserting duplicates, add a UNIQUE constraint on task_id and use a MERGE statement instead of INSERT, or track the last ingested timestamp.
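The MERGE approach from the tip above can look like the following sketch. It reuses the column names from the table in Step 1 and the same bind-parameter style as the Python scripts; treat it as a starting point rather than a finished statement:

```sql
MERGE INTO browse_ai_data AS t
USING (
  SELECT
    %s AS robot_id,
    %s AS task_id,
    %s AS captured_at,
    PARSE_JSON(%s) AS raw_payload
) AS s
ON t.task_id = s.task_id
WHEN NOT MATCHED THEN
  INSERT (robot_id, task_id, captured_at, raw_payload)
  VALUES (s.robot_id, s.task_id, s.captured_at, s.raw_payload)
```

Because the row is only inserted when no existing row has the same task_id, re-running the script over an overlapping window of tasks no longer creates duplicates.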
Step 3: Schedule the script
Run the script on a recurring schedule using any orchestration tool:
cron: for simple VM-based setups.
Apache Airflow / Dagster: for more complex data pipelines.
AWS Lambda + EventBridge: for serverless scheduled pulls.
Snowflake Tasks + External Functions: to keep everything inside Snowflake.
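Whichever scheduler you pick, the "track the last ingested timestamp" idea mentioned earlier keeps repeated runs from reprocessing old tasks. A minimal sketch in plain Python, assuming finishedAt is a numeric timestamp as in the scripts above; how you persist the watermark between runs (a file, a small state table) is up to you:

```python
def filter_new_tasks(tasks: list, watermark):
    """Return (tasks finished after `watermark`, updated watermark).

    Assumes each task dict carries a numeric "finishedAt" field, as used
    in the ingestion scripts above; tasks without it are skipped.
    Pass watermark=None on the very first run to take everything.
    """
    fresh = [
        t for t in tasks
        if t.get("finishedAt") is not None
        and (watermark is None or t["finishedAt"] > watermark)
    ]
    # Advance the watermark to the newest task we kept; if nothing was
    # new, carry the old watermark forward unchanged.
    new_watermark = max((t["finishedAt"] for t in fresh), default=watermark)
    return fresh, new_watermark
```

On each scheduled run, load the stored watermark, call this before the Snowflake inserts, and save the returned watermark afterwards.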
Querying your data in Snowflake
Once data is flowing in, you can flatten the JSON payload into structured columns:
SELECT
  task_id,
  captured_at,
  raw_payload:capturedTexts::VARIANT AS captured_texts,
  raw_payload:capturedScreenshots::VARIANT AS screenshots,
  raw_payload:inputParameters::VARIANT AS input_params
FROM browse_ai_data
ORDER BY captured_at DESC;
For repeated use, create a view that extracts the specific fields your robot captures, so downstream dashboards and queries stay clean.
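Such a view might look like the sketch below. The capturedTexts path matches the query above, while capturedLists and the title/price field names are assumptions for a list-scraping robot; substitute the fields your robot actually captures:

```sql
-- Flatten one row per captured list item (field names are illustrative).
CREATE OR REPLACE VIEW browse_ai_items AS
SELECT
  d.task_id,
  d.captured_at,
  item.value:title::STRING AS title,
  item.value:price::STRING AS price
FROM browse_ai_data d,
  LATERAL FLATTEN(input => d.raw_payload:capturedLists) item;
```

Dashboards can then query browse_ai_items directly without repeating the JSON-path expressions.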
📞 Want help getting this set up? Our managed services team can build and maintain your Browse AI → Snowflake pipeline for you, with no engineering effort on your end. Book a call with our team to get started.
Troubleshooting
Webhook not arriving? Make sure Browse AI's IP (3.228.254.190) is allowlisted on your server. See Webhooks: IP address for allowlisting.
Snowflake connection errors? Verify your account identifier format (e.g. org-account) and that your warehouse is not suspended.
Duplicate rows? Add a unique constraint on task_id and switch to MERGE / INSERT ... WHERE NOT EXISTS.
JSON parsing issues? Confirm the payload is valid JSON before calling PARSE_JSON(). Log the raw body for debugging.
