Skip to main content

How to send Browse AI data to PostgreSQL

M
Written by Melissa Shires

This guide covers three ways to send your Browse AI scraped data into PostgreSQL, from no-code automation to direct database connections. Whether you're using a managed service like Supabase, Neon, or Amazon RDS, or running your own PostgreSQL instance, these methods apply to all setups.

πŸ“– Prerequisites: You'll need an approved Browse AI robot with scraped data and a PostgreSQL database (self-hosted or managed). For API-based methods, you'll also need a Browse AI API key. If using Supabase, you'll need your project URL and API key.

Which method should I use?

Method

Best for

Technical level

Speed

Zapier / Make

Teams without developers

No code

Near real-time

Webhooks + direct insert

Real-time pipelines with custom logic

Intermediate

Real-time

API polling + batch insert

Batch processing, scheduled syncs

Intermediate

On schedule

Method 1: Zapier or Make (no code)

The fastest way to connect Browse AI to PostgreSQL. Both Zapier and Make support PostgreSQL as a destination.

Setting up with Zapier

  1. Go to zapier.com and create a new Zap.

  2. Trigger: Choose Browse AI as the trigger app, then select New Successful Task Finished.

  3. Connect your Browse AI account and select the robot you want to sync data from.

  4. Action: Choose PostgreSQL as the action app, then select Create Row.

  5. Connect your PostgreSQL database (host, port, database name, username, password). If using Supabase, find these in Settings β†’ Database.

  6. Select your table and map each Browse AI captured field to the corresponding column.

  7. Test the Zap, then turn it on.

πŸ’‘ Supabase users: You can also use the Supabase Zapier integration directly, which connects via Supabase's REST API instead of a direct database connection. This is simpler if your database doesn't allow external connections.

Setting up with Make (formerly Integromat)

  1. Create a new scenario in Make.

  2. Add a Webhooks β†’ Custom webhook module as the trigger. Copy the webhook URL.

  3. In Browse AI, go to your robot's Integrate tab and add the Make webhook URL under Webhooks. Select the taskFinishedSuccessfully event.

  4. Add a PostgreSQL β†’ Insert a Row module (or Supabase β†’ Create a Row), connect your database, and map your fields.

  5. Activate the scenario.

Method 2: Webhooks + direct insert (real-time)

Use Browse AI webhooks to push data directly into PostgreSQL as soon as a task finishes. This section covers two approaches: direct connection with psycopg2, and Supabase's REST API.

Step 1: Create your table

CREATE TABLE browse_ai_leads (
    id SERIAL PRIMARY KEY,
    task_id VARCHAR(255) UNIQUE,
    robot_id VARCHAR(255),
    first_name VARCHAR(255),
    last_name VARCHAR(255),
    email VARCHAR(255),
    company_name VARCHAR(255),
    phone VARCHAR(100),
    website VARCHAR(500),
    job_title VARCHAR(255),
    origin_url TEXT,
    scraped_at TIMESTAMPTZ DEFAULT NOW()
);

Option A: Direct PostgreSQL connection (psycopg2)

import psycopg2
from flask import Flask, request, jsonifyapp = Flask(__name__)DB_CONFIG = {
    "host": "your-host.example.com",
    "port": 5432,
    "dbname": "your_database",
    "user": "your_username",
    "password": "your_password"
}@app.route("/browse-ai-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json()    if payload.get("event") != "taskFinishedSuccessfully":
        return jsonify({"status": "ignored"}), 200    task = payload["task"]
    captured = task.get("capturedTexts", {})    conn = psycopg2.connect(**DB_CONFIG)
    cur = conn.cursor()    cur.execute(
        """INSERT INTO browse_ai_leads
           (task_id, robot_id, first_name, last_name, email,
            company_name, phone, website, job_title, origin_url)
           VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
           ON CONFLICT (task_id) DO NOTHING""",
        (
            task.get("id", ""),
            task.get("robotId", ""),
            captured.get("first_name", ""),
            captured.get("last_name", ""),
            captured.get("email", ""),
            captured.get("company_name", ""),
            captured.get("phone", ""),
            captured.get("website", ""),
            captured.get("job_title", ""),
            task.get("inputParameters", {}).get("originUrl", "")
        )
    )    conn.commit()
    cur.close()
    conn.close()    return jsonify({"status": "inserted"}), 200if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

πŸ’‘ Install psycopg2: Run pip install psycopg2-binary. For production, consider using a connection pool with psycopg2.pool or SQLAlchemy to avoid opening a new connection per request.

Option B: Supabase REST API (no direct database connection needed)

If you're using Supabase, you can insert data via their REST API without exposing your database connection string:

import requests
from flask import Flask, request, jsonifyapp = Flask(__name__)SUPABASE_URL = "https://your-project.supabase.co"
SUPABASE_API_KEY = "your_supabase_anon_or_service_role_key"@app.route("/browse-ai-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json()    if payload.get("event") != "taskFinishedSuccessfully":
        return jsonify({"status": "ignored"}), 200    task = payload["task"]
    captured = task.get("capturedTexts", {})    row = {
        "task_id": task.get("id", ""),
        "robot_id": task.get("robotId", ""),
        "first_name": captured.get("first_name", ""),
        "last_name": captured.get("last_name", ""),
        "email": captured.get("email", ""),
        "company_name": captured.get("company_name", ""),
        "phone": captured.get("phone", ""),
        "website": captured.get("website", ""),
        "job_title": captured.get("job_title", ""),
        "origin_url": task.get("inputParameters", {}).get("originUrl", "")
    }    resp = requests.post(
        f"{SUPABASE_URL}/rest/v1/browse_ai_leads",
        headers={
            "apikey": SUPABASE_API_KEY,
            "Authorization": f"Bearer {SUPABASE_API_KEY}",
            "Content-Type": "application/json",
            "Prefer": "resolution=merge-duplicates"
        },
        json=row
    )    return jsonify({"status": "inserted", "code": resp.status_code}), 200if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

πŸ’‘ Automatic deduplication with Supabase: The Prefer: resolution=merge-duplicates header tells Supabase to upsert. It matches on the UNIQUE constraint (task_id in our schema) and updates the row if it already exists.

Step 2: Register the webhook in Browse AI

Via the dashboard:

  1. Open your robot and go to the Integrate tab.

  2. Under Webhooks, click Add webhook.

  3. Paste your endpoint URL and select the taskFinishedSuccessfully event.

Via the API:

curl -X POST "https://api.browse.ai/v2/robots/YOUR_ROBOT_ID/webhooks" \
  -H "Authorization: Bearer YOUR_BROWSE_AI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://yourdomain.com/browse-ai-webhook",
    "events": ["taskFinishedSuccessfully"]
  }'

πŸ’‘ IP allowlisting: Browse AI sends webhooks from IP address 3.228.254.190. If your server has a firewall, add this to your allowlist. See Webhooks: IP address for allowlisting.

Method 3: API polling + batch insert (scheduled)

Poll the Browse AI API on a schedule and batch-insert results into PostgreSQL.

import psycopg2
import requests
from datetime import datetime, timedeltaBROWSE_AI_API_KEY = "your_browse_ai_api_key"
ROBOT_ID = "your_robot_id"
DB_CONFIG = {
    "host": "your-host.example.com",
    "port": 5432,
    "dbname": "your_database",
    "user": "your_username",
    "password": "your_password"
}def get_recent_tasks(since_hours=1):
    resp = requests.get(
        f"https://api.browse.ai/v2/robots/{ROBOT_ID}/tasks",
        headers={"Authorization": f"Bearer {BROWSE_AI_API_KEY}"},
        params={"pageSize": 100}
    )
    tasks = resp.json().get("result", {}).get("robotTasks", {}).get("items", [])
    cutoff = datetime.utcnow() - timedelta(hours=since_hours)
    return [t for t in tasks if t.get("status") == "successful"
            and datetime.fromisoformat(t["finishedAt"].replace("Z","")) > cutoff]def batch_insert(tasks):
    conn = psycopg2.connect(**DB_CONFIG)
    cur = conn.cursor()    for task in tasks:
        captured = task.get("capturedTexts", {})
        cur.execute(
            """INSERT INTO browse_ai_leads (task_id, email, company_name)
               VALUES (%s, %s, %s)
               ON CONFLICT (task_id) DO NOTHING""",
            (task.get("id"), captured.get("email", ""),
             captured.get("company_name", ""))
        )    conn.commit()
    cur.close()
    conn.close()# Run on a schedule (e.g. cron job every hour)

πŸ“– For full Browse AI API details, including pagination, bulk operations, and task filtering, see the API Guide: Getting started and API Guide: Bulk operations.

PostgreSQL-specific tips

Built-in deduplication with ON CONFLICT

PostgreSQL's ON CONFLICT clause is the simplest way to handle duplicates. With a UNIQUE constraint on task_id, you can choose to skip duplicates or update existing rows:

-- Skip duplicates
INSERT INTO browse_ai_leads (task_id, email, company_name)
VALUES ('abc123', '[email protected]', 'Acme Inc')
ON CONFLICT (task_id) DO NOTHING;-- Or update on conflict
INSERT INTO browse_ai_leads (task_id, email, company_name)
VALUES ('abc123', '[email protected]', 'Acme Inc')
ON CONFLICT (task_id) DO UPDATE SET
    email = EXCLUDED.email,
    company_name = EXCLUDED.company_name;

Indexing for performance

Add indexes on columns you'll query frequently:

CREATE INDEX idx_leads_email ON browse_ai_leads (email);
CREATE INDEX idx_leads_scraped_at ON browse_ai_leads (scraped_at);
CREATE INDEX idx_leads_company ON browse_ai_leads (company_name);

Connection pooling

For webhook-based real-time inserts, avoid opening a new connection per request. Use a connection pool instead:

from psycopg2 import pooldb_pool = pool.ThreadedConnectionPool(
    minconn=2, maxconn=10,
    host="your-host", dbname="your_db",
    user="your_user", password="your_password"
)# In your webhook handler:
conn = db_pool.getconn()
# ... execute query ...
db_pool.putconn(conn)

Troubleshooting

Connection refused or timeout

Check that your PostgreSQL instance allows external connections. For cloud-hosted databases (Supabase, Neon, RDS), verify that your server's IP is in the allowed list. For Supabase specifically, enable the "Allow connections from outside" option in your project settings.

SSL connection errors

Most managed PostgreSQL services require SSL. Add sslmode=require to your connection string or pass "sslmode": "require" in your connection config.

Webhook isn't firing

Make sure the webhook URL is publicly accessible and that your server responds with a 200 status code. See the Webhooks: Set up guide for detailed debugging steps.

Did this answer your question?