
How to send Browse AI data to Snowflake

Written by Melissa Shires

Browse AI doesn't have a built-in Snowflake connector, but you can easily send your scraped data to Snowflake using webhooks or the Browse AI REST API. This guide walks you through both approaches.

💡 Which method should I choose?
Use webhooks if you want data pushed to Snowflake automatically in real time. Use the REST API if you prefer to pull data on your own schedule (e.g. via a cron job or orchestration tool like Airflow).

Prerequisites

  1. A Browse AI account with at least one approved robot that has extracted data.

  2. A Snowflake account with a database, schema, and warehouse you can write to.

  3. A server or cloud function (e.g. AWS Lambda, Google Cloud Functions) to receive webhooks and write to Snowflake — only needed for the webhook method.

Option A: Real-time ingestion with webhooks

With this approach, Browse AI sends a POST request to your endpoint every time a task finishes. Your endpoint parses the payload and inserts the data into Snowflake.

Step 1: Create a Snowflake table

Create a table to store the incoming data. A flexible starting point is a VARIANT column for the raw JSON plus a few metadata columns:

CREATE TABLE IF NOT EXISTS browse_ai_data (
  id STRING DEFAULT UUID_STRING(),
  robot_id STRING,
  task_id STRING,
  captured_at TIMESTAMP_NTZ,
  raw_payload VARIANT,
  inserted_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

Step 2: Build a webhook receiver

Set up a small server or cloud function that accepts POST requests from Browse AI and writes to Snowflake. Here’s a Python example using the Snowflake connector:

import json
import snowflake.connector
from flask import Flask, request

app = Flask(__name__)

# Placeholder credentials; replace with your own values.
SNOWFLAKE_CONFIG = {
    "account": "your_account",
    "user": "your_user",
    "password": "your_password",
    "database": "your_database",
    "schema": "your_schema",
    "warehouse": "your_warehouse",
}

@app.route("/browse-ai-webhook", methods=["POST"])
def handle_webhook():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    task = payload.get("task", {})
    if not task:
        return "Missing task payload", 400

    conn = snowflake.connector.connect(**SNOWFLAKE_CONFIG)
    try:
        cur = conn.cursor()
        # PARSE_JSON is not allowed in a VALUES clause, so insert via SELECT.
        cur.execute(
            """
            INSERT INTO browse_ai_data (robot_id, task_id, captured_at, raw_payload)
            SELECT %s, %s, TO_TIMESTAMP_NTZ(%s, 3), PARSE_JSON(%s)
            """,
            (
                task.get("robotId"),
                task.get("id"),
                task.get("finishedAt"),  # read as epoch milliseconds (scale 3)
                json.dumps(task),
            ),
        )
    finally:
        conn.close()
    return "OK", 200

Tip: For production, use key-pair authentication instead of a password, and store credentials in environment variables or a secrets manager.
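As a rough sketch of the tip above, the connection settings can be read from environment variables rather than hardcoded. The variable names (SNOWFLAKE_ACCOUNT and so on) and the helper name load_snowflake_config are assumptions made for this example, not Browse AI or Snowflake conventions. The private_key_file argument is accepted by recent releases of the Snowflake Python connector; check the connector documentation for the version you have installed.

```python
import os

# Hypothetical environment variable names, chosen for this sketch.
ENV_VARS = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
}


def load_snowflake_config(env=None):
    """Build keyword arguments for snowflake.connector.connect() from the environment."""
    env = os.environ if env is None else env

    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(sorted(missing)))

    config = {key: env[name] for key, name in ENV_VARS.items()}

    # Prefer key-pair authentication when a key file is configured.
    # Assumption: the installed connector version supports private_key_file.
    key_file = env.get("SNOWFLAKE_PRIVATE_KEY_FILE")
    if key_file:
        config["private_key_file"] = key_file
    elif env.get("SNOWFLAKE_PASSWORD"):
        config["password"] = env["SNOWFLAKE_PASSWORD"]
    else:
        raise RuntimeError("Set SNOWFLAKE_PRIVATE_KEY_FILE or SNOWFLAKE_PASSWORD")
    return config
```

The result can be passed straight to the connector, for example snowflake.connector.connect(**load_snowflake_config()).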

Step 3: Register the webhook in Browse AI

  1. Open your robot in Browse AI and go to the Integrate tab.

  2. Click Webhooks.

  3. Paste your endpoint URL (e.g. https://your-server.com/browse-ai-webhook).

  4. Select the event type — taskFinishedSuccessfully is recommended for clean data ingestion.

  5. Click Save.

Run a test task and confirm a row appears in your Snowflake table.

📖 For more detail on webhook setup and event types, see Webhooks: Set up guide. If you need to allowlist Browse AI’s IP address, see Webhooks: IP address for allowlisting.

Option B: Scheduled ingestion with the REST API

If you prefer to pull data on a schedule rather than receiving it in real time, you can use Browse AI’s REST API to fetch completed task results and load them into Snowflake.

Step 1: Get your API key

  1. Go to Account Settings → API in the Browse AI dashboard.

  2. Copy your API key.

Step 2: Fetch task results

Use the /robots/{robotId}/tasks endpoint to retrieve completed tasks:

import requests
import json
import snowflake.connector

API_KEY = "your_browse_ai_api_key"
ROBOT_ID = "your_robot_id"

def fetch_latest_tasks():
    # Request up to 100 tasks for the robot in a single page.
    response = requests.get(
        f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"pageSize": 100},
        timeout=30,
    )
    # Fail loudly on HTTP errors instead of parsing an error body.
    response.raise_for_status()
    result = response.json().get("result", {})
    items = result.get("robotTasks", {}).get("items", [])
    return items

def write_tasks_to_snowflake(tasks):
    # Placeholder credentials; replace with your own values.
    conn = snowflake.connector.connect(
        account="your_account",
        user="your_user",
        password="your_password",
        database="your_database",
        schema="your_schema",
        warehouse="your_warehouse",
    )
    cur = conn.cursor()
    try:
        for task in tasks:
            # Only load tasks that completed successfully.
            if task.get("status") == "successful":
                # PARSE_JSON is not allowed in a VALUES clause, so insert via SELECT.
                cur.execute(
                    """
                    INSERT INTO browse_ai_data (robot_id, task_id, captured_at, raw_payload)
                    SELECT %s, %s, TO_TIMESTAMP_NTZ(%s, 3), PARSE_JSON(%s)
                    """,
                    (
                        ROBOT_ID,
                        task.get("id"),
                        task.get("finishedAt"),  # read as epoch milliseconds (scale 3)
                        json.dumps(task),
                    ),
                )
    finally:
        conn.close()

if __name__ == "__main__":
    tasks = fetch_latest_tasks()
    write_tasks_to_snowflake(tasks)

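Tip: Snowflake does not enforce UNIQUE constraints on standard tables, so a constraint on task_id documents intent but will not block duplicate rows. To avoid duplicates, use a MERGE statement keyed on task_id instead of INSERT, or track the last ingested timestamp and skip tasks that finished before it.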
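One way to apply the MERGE suggestion is sketched below. It assumes the browse_ai_data table from Option A and the same task fields used in the script above (id, status, finishedAt). The names MERGE_SQL, merge_params, and upsert_tasks are hypothetical helpers for this example. The statement inserts a row only when no existing row has the same task_id, so rerunning the script over an overlapping window should not create duplicates.

```python
import json

# Insert a task only if its task_id is not already in the table.
MERGE_SQL = """
MERGE INTO browse_ai_data AS target
USING (
    SELECT %s AS robot_id,
           %s AS task_id,
           TO_TIMESTAMP_NTZ(%s, 3) AS captured_at,
           PARSE_JSON(%s) AS raw_payload
) AS source
ON target.task_id = source.task_id
WHEN NOT MATCHED THEN
    INSERT (robot_id, task_id, captured_at, raw_payload)
    VALUES (source.robot_id, source.task_id, source.captured_at, source.raw_payload)
"""


def merge_params(robot_id, task):
    """Return the four bind values for MERGE_SQL, in placeholder order."""
    return (robot_id, task.get("id"), task.get("finishedAt"), json.dumps(task))


def upsert_tasks(cur, robot_id, tasks):
    """Run one MERGE per successful task and return how many statements ran."""
    count = 0
    for task in tasks:
        if task.get("status") != "successful":
            continue
        cur.execute(MERGE_SQL, merge_params(robot_id, task))
        count += 1
    return count
```

In the script above, the loop inside write_tasks_to_snowflake could be replaced with upsert_tasks(cur, ROBOT_ID, tasks). For large batches, loading into a staging table and running a single MERGE is usually cheaper than one statement per task.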

Step 3: Schedule the script

Run the script on a recurring schedule using any orchestration tool:

  • cron — for simple VM-based setups.

  • Apache Airflow / Dagster — for more complex data pipelines.

  • AWS Lambda + EventBridge — for serverless scheduled pulls.

  • Snowflake Tasks + External Functions — to keep everything inside Snowflake.

Querying your data in Snowflake

Once data is flowing in, you can pull the top-level fields of the JSON payload out into separate columns:

SELECT
  task_id,
  captured_at,
  raw_payload:capturedTexts::VARIANT AS captured_texts,
  raw_payload:capturedScreenshots::VARIANT AS screenshots,
  raw_payload:inputParameters::VARIANT AS input_params
FROM browse_ai_data
ORDER BY captured_at DESC;

For repeated use, create a view that extracts the specific fields your robot captures, so downstream dashboards and queries stay clean.
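As an illustration, the view definition can be generated from a list of the field names your robot captures. This sketch assumes each captured value sits under capturedTexts in raw_payload, as in the query above, and that the field names come from your own robot configuration rather than untrusted input. The field names "Product Name" and "Price", the view name product_prices, and the helpers column_alias and build_view_sql are all hypothetical.

```python
import re


def column_alias(field_name):
    """Turn a captured field name into a lowercase snake_case column alias."""
    alias = re.sub(r"[^0-9a-zA-Z]+", "_", field_name).strip("_").lower()
    if not alias:
        raise ValueError(f"Cannot derive a column name from {field_name!r}")
    # SQL identifiers should not start with a digit.
    return f"f_{alias}" if alias[0].isdigit() else alias


def build_view_sql(view_name, fields, table="browse_ai_data"):
    """Return a CREATE OR REPLACE VIEW statement exposing each field as a STRING column."""
    columns = ["task_id", "captured_at"]
    for field in fields:
        key = field.replace("'", "''")  # escape single quotes inside the string literal
        columns.append(
            f"raw_payload['capturedTexts']['{key}']::STRING AS {column_alias(field)}"
        )
    select_list = ",\n  ".join(columns)
    return f"CREATE OR REPLACE VIEW {view_name} AS\nSELECT\n  {select_list}\nFROM {table};"


if __name__ == "__main__":
    print(build_view_sql("product_prices", ["Product Name", "Price"]))
```

Running the printed statement in a Snowflake worksheet creates the view; adjust the casts (for example ::NUMBER) to suit the data each field holds.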

🚀 Want help getting this set up? Our managed services team can build and maintain your Browse AI → Snowflake pipeline for you — no engineering effort on your end. Book a call with our team to get started.

Troubleshooting

  • Webhook not arriving? Make sure Browse AI’s IP (3.228.254.190) is allowlisted on your server. See Webhooks: IP address for allowlisting.

  • Snowflake connection errors? Verify your account identifier format (e.g. org-account) and that your warehouse is not suspended.

  • Duplicate rows? Snowflake does not enforce unique constraints on standard tables, so a constraint on task_id alone will not stop them. Switch to MERGE or INSERT ... WHERE NOT EXISTS keyed on task_id.

  • JSON parsing issues? Confirm the payload is valid JSON before calling PARSE_JSON(). Log the raw body for debugging.
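For the JSON parsing case, a small guard like the one below can validate the body before it reaches PARSE_JSON() and log anything it rejects. This is a minimal sketch: the function name to_json_text is hypothetical, and it assumes the payload should be a JSON object, as in the webhook example in Option A.

```python
import json
import logging

logger = logging.getLogger("browse_ai_ingest")


def to_json_text(raw_body):
    """Return compact JSON text safe to pass to PARSE_JSON, or None if the body is unusable."""
    try:
        if isinstance(raw_body, bytes):
            raw_body = raw_body.decode("utf-8")
        parsed = json.loads(raw_body)
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
        # allow_nan=False rejects NaN/Infinity, which are not valid JSON.
        return json.dumps(parsed, separators=(",", ":"), allow_nan=False)
    except (TypeError, ValueError) as exc:
        # Log a truncated copy of the raw body for debugging.
        logger.warning("Rejected payload (%s): %s", exc, repr(raw_body)[:500])
        return None
```

In the Flask handler, to_json_text(request.get_data()) could be called first, returning a 400 response when the result is None.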
