Browse AI doesn't have a built-in Snowflake connector, but you can easily send your scraped data to Snowflake using webhooks or the Browse AI REST API. This guide walks you through both approaches.
💡 Which method should I choose?
Use webhooks if you want data pushed to Snowflake automatically in real time. Use the REST API if you prefer to pull data on your own schedule (e.g. via a cron job or orchestration tool like Airflow).
Prerequisites
A Browse AI account with at least one approved robot that has extracted data.
A Snowflake account with a database, schema, and warehouse you can write to.
A server or cloud function (e.g. AWS Lambda, Google Cloud Functions) to receive webhooks and write to Snowflake — only needed for the webhook method.
Option A: Real-time ingestion with webhooks
With this approach, Browse AI sends a POST request to your endpoint every time a task finishes. Your endpoint parses the payload and inserts the data into Snowflake.
Step 1: Create a Snowflake table
Create a table to store the incoming data. A flexible starting point is a variant column for the raw JSON plus metadata columns:
CREATE TABLE IF NOT EXISTS browse_ai_data (
    id STRING DEFAULT UUID_STRING(),
    robot_id STRING,
    task_id STRING,
    captured_at TIMESTAMP_NTZ,
    raw_payload VARIANT,
    inserted_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
Step 2: Build a webhook receiver
Set up a small server or cloud function that accepts POST requests from Browse AI and writes to Snowflake. Here’s a Python example using Flask and the Snowflake connector:
import json

import snowflake.connector
from flask import Flask, request

app = Flask(__name__)

# Placeholder credentials; replace with your own values.
SNOWFLAKE_CONFIG = {
    "account": "your_account",
    "user": "your_user",
    "password": "your_password",
    "database": "your_database",
    "schema": "your_schema",
    "warehouse": "your_warehouse",
}


@app.route("/browse-ai-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json()
    task = payload.get("task", {})

    conn = snowflake.connector.connect(**SNOWFLAKE_CONFIG)
    try:
        cur = conn.cursor()
        # INSERT ... SELECT is used because PARSE_JSON cannot appear in a VALUES clause.
        # The scale argument 3 treats finishedAt as epoch milliseconds.
        cur.execute(
            """
            INSERT INTO browse_ai_data (robot_id, task_id, captured_at, raw_payload)
            SELECT %s, %s, TO_TIMESTAMP_NTZ(%s, 3), PARSE_JSON(%s)
            """,
            (
                task.get("robotId"),
                task.get("id"),
                task.get("finishedAt"),
                json.dumps(task),
            ),
        )
    finally:
        conn.close()

    return "OK", 200
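As a minimal sketch, the parsing step can be pulled into a small helper that is testable without a Snowflake connection. The name `payload_to_row` is hypothetical, and the field names (`robotId`, `id`, `finishedAt`) are assumed to match the payload shape used in the handler above.

```python
import json
from typing import Optional, Tuple

# (robot_id, task_id, finished_at, raw_json) in the order the INSERT expects.
Row = Tuple[Optional[str], Optional[str], Optional[int], str]


def payload_to_row(payload: Optional[dict]) -> Optional[Row]:
    """Map a webhook body to INSERT parameters, or None if there is nothing to store."""
    task = (payload or {}).get("task") or {}
    # Skip bodies without a task id; they cannot be matched to a run later.
    if not task.get("id"):
        return None
    return (
        task.get("robotId"),
        task.get("id"),
        task.get("finishedAt"),  # assumed to be epoch milliseconds
        json.dumps(task),        # raw JSON text for PARSE_JSON
    )
```

In the handler, a `None` result could be answered with a 400 response instead of opening a Snowflake connection.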
✅ Tip: For production, use key-pair authentication instead of a password, and store credentials in environment variables or a secrets manager.
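One possible way to follow this tip is to build the connection settings from environment variables. This is a sketch under assumptions: the `SNOWFLAKE_*` variable names and the helper name are hypothetical, and the `private_key_file` connection parameter is assumed to be supported by your version of the Snowflake Python connector (check its documentation before relying on it).

```python
import os
from typing import Mapping, Optional

REQUIRED = ("account", "user", "database", "schema", "warehouse")


def snowflake_config_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build connect() keyword arguments from SNOWFLAKE_* environment variables."""
    env = os.environ if env is None else env
    config = {}
    for key in REQUIRED:
        name = f"SNOWFLAKE_{key.upper()}"  # hypothetical naming scheme
        if not env.get(name):
            raise RuntimeError(f"Missing environment variable {name}")
        config[key] = env[name]

    # Prefer key-pair authentication when a private key path is provided.
    key_file = env.get("SNOWFLAKE_PRIVATE_KEY_FILE")
    password = env.get("SNOWFLAKE_PASSWORD")
    if key_file:
        config["private_key_file"] = key_file  # assumed connector parameter
    elif password:
        config["password"] = password
    else:
        raise RuntimeError("Set SNOWFLAKE_PRIVATE_KEY_FILE or SNOWFLAKE_PASSWORD")
    return config
```

The result can be passed as `snowflake.connector.connect(**snowflake_config_from_env())`, which keeps secrets out of the source file.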
Step 3: Register the webhook in Browse AI
Open your robot in Browse AI and go to the Integrate tab.
Click Webhooks.
Paste your endpoint URL (e.g. https://your-server.com/browse-ai-webhook).
Select the event type. taskFinishedSuccessfully is recommended for clean data ingestion.
Click Save.
Run a test task and confirm a row appears in your Snowflake table.
📖 For more detail on webhook setup and event types, see Webhooks: Set up guide. If you need to allowlist Browse AI’s IP address, see Webhooks: IP address for allowlisting.
Option B: Scheduled ingestion with the REST API
If you prefer to pull data on a schedule rather than receiving it in real time, you can use Browse AI’s REST API to fetch completed task results and load them into Snowflake.
Step 1: Get your API key
Go to Account Settings → API in the Browse AI dashboard.
Copy your API key.
Step 2: Fetch task results
Use the /robots/{robotId}/tasks endpoint to retrieve completed tasks:
import json

import requests
import snowflake.connector

API_KEY = "your_browse_ai_api_key"
ROBOT_ID = "your_robot_id"


def fetch_latest_tasks():
    response = requests.get(
        f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"pageSize": 100},
        timeout=30,
    )
    # Fail loudly on HTTP errors instead of silently returning an empty list.
    response.raise_for_status()
    result = response.json().get("result", {})
    items = result.get("robotTasks", {}).get("items", [])
    return items


def write_tasks_to_snowflake(tasks):
    conn = snowflake.connector.connect(
        account="your_account",
        user="your_user",
        password="your_password",
        database="your_database",
        schema="your_schema",
        warehouse="your_warehouse",
    )
    cur = conn.cursor()
    try:
        for task in tasks:
            # Only load tasks that finished successfully.
            if task.get("status") == "successful":
                cur.execute(
                    """
                    INSERT INTO browse_ai_data (robot_id, task_id, captured_at, raw_payload)
                    SELECT %s, %s, TO_TIMESTAMP_NTZ(%s, 3), PARSE_JSON(%s)
                    """,
                    (
                        ROBOT_ID,
                        task.get("id"),
                        task.get("finishedAt"),
                        json.dumps(task),
                    ),
                )
    finally:
        conn.close()


if __name__ == "__main__":
    tasks = fetch_latest_tasks()
    write_tasks_to_snowflake(tasks)
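The script above requests a single page of up to 100 tasks. If a robot produces more than that between runs, a paging loop may be needed. The sketch below assumes the response carries a `hasMore` flag next to `items` and that the endpoint accepts a `page` query parameter; both are assumptions to verify against the Browse AI API reference. `iter_all_tasks` and `fetch_page` are hypothetical names.

```python
from typing import Callable, Iterator


def iter_all_tasks(fetch_page: Callable[[int], dict], max_pages: int = 1000) -> Iterator[dict]:
    """Yield tasks page by page until the API reports that no pages remain.

    fetch_page takes a 1-based page number and returns the decoded JSON body.
    max_pages is a safety cap against an endless loop.
    """
    for page in range(1, max_pages + 1):
        robot_tasks = fetch_page(page).get("result", {}).get("robotTasks", {})
        yield from robot_tasks.get("items", [])
        if not robot_tasks.get("hasMore"):  # assumed response field
            break
```

With `requests`, `fetch_page` could be a small function that calls the same URL with `params={"page": page, "pageSize": 100}` and returns `response.json()`. Passing the fetcher in as an argument keeps the loop testable without network access.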
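✅ Tip: To avoid inserting duplicates, use a MERGE statement keyed on task_id instead of a plain INSERT, or track the last ingested timestamp. Snowflake does not enforce UNIQUE constraints on standard tables, so a constraint alone will not prevent duplicates.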
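The MERGE approach from the tip could look roughly like this. It is a sketch that assumes the `browse_ai_data` table from Option A and matches rows on `task_id`; `MERGE_SQL`, `merge_params`, and `upsert_task` are hypothetical names.

```python
import json

# Insert a task only when no row with the same task_id exists yet.
MERGE_SQL = """
MERGE INTO browse_ai_data AS t
USING (
    SELECT %s AS robot_id,
           %s AS task_id,
           TO_TIMESTAMP_NTZ(%s, 3) AS captured_at,
           PARSE_JSON(%s) AS raw_payload
) AS s
ON t.task_id = s.task_id
WHEN NOT MATCHED THEN
    INSERT (robot_id, task_id, captured_at, raw_payload)
    VALUES (s.robot_id, s.task_id, s.captured_at, s.raw_payload)
"""


def merge_params(robot_id, task):
    """Return the bind parameters in the order MERGE_SQL expects."""
    return (robot_id, task.get("id"), task.get("finishedAt"), json.dumps(task))


def upsert_task(cur, robot_id, task):
    """Run the MERGE for one task on an open cursor."""
    cur.execute(MERGE_SQL, merge_params(robot_id, task))
```

Calling `upsert_task(cur, ROBOT_ID, task)` in place of the `cur.execute(...)` INSERT makes the script safe to re-run over the same tasks.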
Step 3: Schedule the script
Run the script on a recurring schedule using any orchestration tool:
cron — for simple VM-based setups.
Apache Airflow / Dagster — for more complex data pipelines.
AWS Lambda + EventBridge — for serverless scheduled pulls.
Snowflake Tasks + External Functions — to keep everything inside Snowflake.
Querying your data in Snowflake
Once data is flowing in, you can pull the top-level fields of the JSON payload out into their own columns:
SELECT
    task_id,
    captured_at,
    raw_payload:capturedTexts::VARIANT AS captured_texts,
    raw_payload:capturedScreenshots::VARIANT AS screenshots,
    raw_payload:inputParameters::VARIANT AS input_params
FROM browse_ai_data
ORDER BY captured_at DESC;
For repeated use, create a view that extracts the specific fields your robot captures, so downstream dashboards and queries stay clean.
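As an illustration, a small helper can generate such a view from a mapping of column names to captured-text keys. Everything specific here is hypothetical: the helper name, the view name, and field names such as "Product Name" depend on what your robot actually captures. The sketch assumes the captured values sit under `capturedTexts` in `raw_payload`.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_view_sql(view_name: str, fields: dict) -> str:
    """Build a CREATE VIEW statement; fields maps column name -> capturedTexts key."""
    for name in [view_name, *fields.keys()]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"Not a simple SQL identifier: {name!r}")

    columns = ["task_id", "captured_at"]
    for column, key in fields.items():
        # Keys are embedded as string literals, so reject quote and escape characters.
        if "'" in key or "\\" in key:
            raise ValueError(f"Unsupported character in field key: {key!r}")
        columns.append(f"raw_payload['capturedTexts']['{key}']::STRING AS {column}")

    select_list = ",\n    ".join(columns)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n    {select_list}\n"
        f"FROM browse_ai_data"
    )
```

For example, `build_view_sql("browse_ai_products", {"product_name": "Product Name", "price": "Price"})` returns a statement you can review and then run with `cur.execute(...)`.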
🚀 Want help getting this set up? Our managed services team can build and maintain your Browse AI → Snowflake pipeline for you — no engineering effort on your end. Book a call with our team to get started.
Troubleshooting
Webhook not arriving? Make sure Browse AI’s IP (3.228.254.190) is allowlisted on your server. See Webhooks: IP address for allowlisting.
Snowflake connection errors? Verify your account identifier format (e.g. org-account) and that your warehouse is not suspended.
Duplicate rows? Snowflake does not enforce unique constraints on standard tables, so switch to MERGE or INSERT ... WHERE NOT EXISTS keyed on task_id.
JSON parsing issues? Confirm the payload is valid JSON before calling PARSE_JSON(). Log the raw body for debugging.
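For the JSON parsing case, a sketch of the validate-and-log step might look like this. The helper name `parse_body` and the logger name are hypothetical; the idea is to reject anything that is not a JSON object before it reaches PARSE_JSON().

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("browse_ai_webhook")  # hypothetical logger name


def parse_body(raw: bytes) -> Optional[dict]:
    """Return the decoded JSON object, or None (after logging) if the body is unusable."""
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Log a bounded slice of the raw body so oversized requests do not flood the logs.
        logger.warning("Invalid webhook body: %r", raw[:2000])
        return None
    if not isinstance(payload, dict):
        logger.warning("Webhook body is not a JSON object: %r", raw[:2000])
        return None
    return payload
```

In a Flask handler, this could be called as `parse_body(request.get_data())`, returning a 400 response when the result is `None`.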
