What are bulk operations?
Bulk operations let you run your robot on thousands of URLs in a single API call, rather than creating tasks one by one. This is essential for large-scale data extraction and much more efficient than individual API calls.
Key advantages:
Scale: Up to 500,000 tasks per bulk operation (vs typical competitor limits of 1,000-5,000)
Performance: Much faster than individual task creation
Resource efficiency: Optimized processing and better success rates
Cost effective: More efficient use of your plan's task allowance
Browse AI API limitations and best practices
Good news: Browse AI places no limits on task volume, so you can scrape all day, non-stop if needed.
Important considerations:
Website limitations: The sites you're scraping may have rate limits or bot detection
API best practices: Use bulk endpoints for high volume, not individual task calls
Data retrieval: For large datasets, use table exports rather than retrieving tasks individually
When to use bulk operations vs individual tasks
Use bulk operations for:
Processing hundreds or thousands of similar URLs
Competitive intelligence across multiple sites
Lead generation from directory sites
Product catalog extraction
Large-scale monitoring setups
Use individual tasks for:
Testing and development
One-off data extractions
Real-time urgent requests
Custom parameter variations
Critical: If you need to create many tasks, always use the bulk-run endpoint. The individual task endpoint is not designed for frequent calls and may cause performance issues.
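To make the difference concrete, here is a minimal sketch in Python; the robot ID, API key, and URL list are placeholders:
import requests

API_KEY = "YOUR_SECRET_API_KEY"
ROBOT_ID = "ROBOT_ID"
urls = ["https://example.com/page/1", "https://example.com/page/2"]

# Anti-pattern: POSTing tasks one at a time in a loop (avoid for high volume)
# for url in urls:
#     requests.post(individual_task_endpoint, ...)

# Recommended: one bulk-run call covering up to 1,000 tasks
response = requests.post(
    f"https://api.browse.ai/v2/robots/{ROBOT_ID}/bulk-runs",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "title": "High-volume run",
        "inputParameters": [{"originUrl": url} for url in urls]
    }
)
print(response.json())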
Understanding bulk run limits
Per API call: maximum 1,000 tasks
Total per bulk run: maximum 500,000 tasks
Strategy for larger datasets: submit multiple bulk runs sequentially. For example, 50,000 URLs means 50 sequential submissions of 1,000 tasks each.
Step 1: Prepare your input data
Organize your URLs and parameters into chunks of 1,000 or fewer:
# Example: Bulk extract competitor product data
urls_to_scrape = [
    "https://competitor1.com/product1",
    "https://competitor1.com/product2",
    # ... up to 1,000 URLs per API call
]

# Convert to the bulk run input format
input_parameters = [{"originUrl": url} for url in urls_to_scrape]

# For longer lists, split into chunks of 1,000 for separate bulk-run calls
chunks = [input_parameters[i:i + 1000] for i in range(0, len(input_parameters), 1000)]
Step 2: Create your first bulk run
curl -X POST "https://api.browse.ai/v2/robots/ROBOT_ID/bulk-runs" \
  -H "Authorization: Bearer YOUR_SECRET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Competitor Product Analysis - Batch 1",
    "inputParameters": [
      {"originUrl": "https://competitor1.com/product1"},
      {"originUrl": "https://competitor1.com/product2"},
      {"originUrl": "https://competitor1.com/product3"}
    ]
  }'
Successful response:
{
  "statusCode": 200,
  "messageCode": "success",
  "result": {
    "bulkRun": {
      "id": "bulk-run-uuid-here",
      "title": "Competitor Product Analysis - Batch 1",
      "status": "running",
      "totalTaskCount": 3,
      "createdAt": 1678795867879
    }
  }
}
Step 3: Track bulk run progress
Monitor your bulk run status and progress:
curl -X GET "https://api.browse.ai/v2/robots/ROBOT_ID/bulk-runs/BULK_RUN_ID" \
  -H "Authorization: Bearer YOUR_SECRET_API_KEY"
Progress response:
{
  "statusCode": 200,
  "messageCode": "success",
  "result": {
    "id": "bulk-run-uuid",
    "title": "Competitor Product Analysis - Batch 1",
    "status": "completed",
    "totalTaskCount": 1000,
    "successfulTaskCount": 985,
    "failedTaskCount": 15,
    "createdAt": 1678795867879,
    "finishedAt": 1678825867879
  }
}
Step 4: Handle large datasets (>1,000 URLs)
For datasets larger than 1,000 URLs, submit multiple bulk runs:
import requests
import time

def submit_bulk_runs(robot_id, api_key, all_urls, chunk_size=1000):
    bulk_run_ids = []

    # Split URLs into chunks of 1,000 (the per-call maximum)
    for i in range(0, len(all_urls), chunk_size):
        chunk = all_urls[i:i + chunk_size]
        batch_num = (i // chunk_size) + 1

        input_params = [{"originUrl": url} for url in chunk]

        response = requests.post(
            f"https://api.browse.ai/v2/robots/{robot_id}/bulk-runs",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "title": f"Large Scale Extraction - Batch {batch_num}",
                "inputParameters": input_params
            }
        )

        if response.status_code == 200:
            bulk_run_id = response.json()["result"]["bulkRun"]["id"]
            bulk_run_ids.append(bulk_run_id)
            print(f"Submitted batch {batch_num}: {bulk_run_id}")
        else:
            print(f"Batch {batch_num} failed with HTTP {response.status_code}")

        # Brief pause between submissions
        time.sleep(1)

    return bulk_run_ids
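Once the batches are submitted, you can poll each bulk run with the Step 3 status endpoint until it finishes. A minimal sketch, continuing the script above; note that status values other than "running" and "completed" are an assumption here, so check the API reference for the full set:
def wait_for_bulk_runs(robot_id, api_key, bulk_run_ids, poll_interval=60):
    results = {}
    for bulk_run_id in bulk_run_ids:
        while True:
            response = requests.get(
                f"https://api.browse.ai/v2/robots/{robot_id}/bulk-runs/{bulk_run_id}",
                headers={"Authorization": f"Bearer {api_key}"}
            )
            result = response.json()["result"]
            # "running" and "completed" appear in the responses above;
            # other terminal statuses are assumed to exist
            if result["status"] != "running":
                results[bulk_run_id] = result
                break
            time.sleep(poll_interval)
    return results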
Step 5: Efficient data retrieval for bulk operations
For large datasets, consider table exports instead of individual task retrieval:
Table exports vs individual API calls:
Table exports: Export all your data in bulk formats (CSV, JSON)
Webhooks: Get notified when table exports are ready
Scheduled exports (private beta): Automatically export data on a schedule
Setting up table export webhooks:
curl -X POST "https://api.browse.ai/v2/robots/ROBOT_ID/webhooks" \
  -H "Authorization: Bearer YOUR_SECRET_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "hookUrl": "https://your-system.com/webhook/table-export",
    "eventType": "tableExportFinishedSuccessfully"
  }'
This is much more efficient than retrieving thousands of individual task results via API.
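On the receiving side, a minimal webhook handler might look like the sketch below. Flask is our choice here, and the payload fields are assumptions since the exact body isn't documented above; log the raw payload first to see what Browse AI actually sends:
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook/table-export", methods=["POST"])
def handle_table_export():
    payload = request.get_json(force=True)
    # The payload structure is an assumption; inspect a real event to
    # confirm which fields (e.g. an export download URL) are present
    print("Table export finished:", payload)
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)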
Handling website limitations and bot detection
Important: While Browse AI has no usage limits, websites you're scraping might have rate limits or bot detection.
Best practices:
Start gradually: Begin with smaller batches to test website tolerance
Monitor success rates: Watch for increased failure rates that might indicate detection (see the sketch after this list)
Respect robots.txt: Follow website guidelines when possible
Vary timing: Don't submit all bulk runs simultaneously
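As a concrete example, here is a small helper that computes the failure rate of a finished bulk run from the Step 3 status fields; the 5% threshold is an arbitrary assumption to illustrate backing off:
import requests
import time

def failure_rate(robot_id, api_key, bulk_run_id):
    response = requests.get(
        f"https://api.browse.ai/v2/robots/{robot_id}/bulk-runs/{bulk_run_id}",
        headers={"Authorization": f"Bearer {api_key}"}
    )
    result = response.json()["result"]
    return result["failedTaskCount"] / max(result["totalTaskCount"], 1)

# Example: back off before the next batch if more than 5% of tasks failed
# if failure_rate(ROBOT_ID, API_KEY, bulk_run_id) > 0.05:
#     time.sleep(600)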
Browse AI's built-in protections:
Human-like behavior simulation
IP rotation and proxy management
Realistic delays and scrolling patterns
Cookie and session management
If you encounter issues:
Contact support for site-specific guidance
Consider managed services for enterprise scale
Adjust bulk run timing and size
Common bulk operation patterns
Competitive intelligence:
{
  "title": "Daily Competitor Price Check",
  "inputParameters": [
    {"originUrl": "https://competitor1.com/category/electronics"},
    {"originUrl": "https://competitor2.com/category/electronics"},
    {"originUrl": "https://competitor3.com/category/electronics"}
  ]
}
Lead generation:
{
  "title": "Business Directory Extraction",
  "inputParameters": [
    {"originUrl": "https://directory.com/page/1"},
    {"originUrl": "https://directory.com/page/2"},
    {"originUrl": "https://directory.com/page/3"}
  ]
}
Product catalog extraction:
{
  "title": "E-commerce Product Data",
  "inputParameters": [
    {"originUrl": "https://store.com/product/123", "category": "electronics"},
    {"originUrl": "https://store.com/product/124", "category": "electronics"}
  ]
}
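In practice, payloads like the lead generation example above are generated programmatically rather than written by hand. A minimal sketch, where the page count is an assumption you'd adjust to the directory's actual depth:
num_pages = 200  # assumption: set to the directory's real page count
payload = {
    "title": "Business Directory Extraction",
    "inputParameters": [
        {"originUrl": f"https://directory.com/page/{n}"}
        for n in range(1, num_pages + 1)
    ]
}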
Enterprise and managed services are available: book a call with our sales team to learn more.
For high-scale operations:
Scale pricing available for enterprise volumes
Custom rate limiting and performance optimization
Dedicated support for large-scale implementations
When to consider managed services:
Processing 100,000+ URLs regularly
Mission-critical data extraction requirements
Complex multi-step workflows
Custom integration needs
