Understanding table extraction
Browse AI recognizes tables, whether they're built with HTML <table> elements or styled divs that look like tables. The extraction process handles:
Standard HTML tables
Div-based tables (styled to look like tables)
Tables with row or column headers
Merged cells spanning multiple rows/columns
Nested tables (tables within tables)
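To make the difference between the first two table styles concrete, here is a minimal Python sketch that reads the same rows from a real <table> and from a div-based layout. It is an illustration only, not how Browse AI detects tables; the class names "row" and "cell" are assumptions chosen for the example, and real sites use their own names.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect cell text from <table> markup or from divs whose class
    is "row" / "cell" (hypothetical class names for this example)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False
        self._div_stack = []  # remembers whether each open div is a row or cell

    def _kind(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" or (tag == "div" and "row" in classes):
            return "row"
        if tag in ("td", "th") or (tag == "div" and "cell" in classes):
            return "cell"
        return None

    def handle_starttag(self, tag, attrs):
        kind = self._kind(tag, attrs)
        if tag == "div":
            self._div_stack.append(kind)
        if kind == "row":
            self.rows.append([])
        elif kind == "cell" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "div":
            kind = self._div_stack.pop() if self._div_stack else None
        else:
            kind = "cell" if tag in ("td", "th") else None
        if kind == "cell":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def parse_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = RowCollector()
    parser.feed(html)
    return parser.rows
```

Both markup styles yield the same list of rows, which is why a div-based layout can be treated as a table even though it contains no <table> element.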
How to extract from tables
Standard table extraction
Start training your robot in Robot Studio
Click Capture Text → From a list
Hover over the table until you see the dotted outline
Click to select the entire table
Browse AI automatically:
Detects all columns and headers
Creates a recommended dataset
Structures the data properly
Review the extracted columns
Save and continue
Example:
From the YC Startup Directory table, Browse AI automatically extracts Company Name, Location, Description, Batch, Industry, Sub-Industry, and Company URL as separate columns.
Customizing table extraction
If you need specific columns or the automatic detection isn't perfect:
After the table is detected, click "Select Manually Instead"
Click individual columns or cells you want
Label each data point
Save your custom selection
Common table types and approaches
Table type | Extraction method | Notes |
Data tables | Capture List | Standard approach, works automatically |
Pricing tables | Capture List or Capture Text | May need Capture Text if the table is interactive |
Comparison tables | Capture List | Watch for dynamic content |
Specification tables | Capture List | Handles merged cells well |
Nested tables | Capture List | AI detects structure on hover |
Schedule/Calendar | Capture List | Maintains date structure |
Handling complex tables
Tables with dynamic content
To capture table content that is displayed dynamically, you may need to combine several capture methods.
If content is visible: Use Capture List normally
If content requires interaction:
Click toggles/options first
Use Capture Text → Just text for specific values
Create separate robots for different states
Tables with expandable rows
Some tables hide detail rows until clicked:
Decide what you need:
Just visible data → Extract as-is
Hidden details too → Click to expand first
Train accordingly:
Click expand buttons during training
Then capture the revealed content
Consider alternatives:
Complex cases might need workflows
Tables that paginate
Tables split across multiple pages:
Extract the table with Capture List
Configure pagination:
Click the "next" button if the table uses page buttons
Set row limit (e.g., extract 500 rows)
Robot will navigate pages automatically
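The interaction between pagination and the row limit can be pictured with a short Python sketch. It only models the stopping rule (keep turning pages until the limit is reached or the pages run out); the function name and the simulated pages are assumptions for illustration, not Browse AI's implementation.

```python
def collect_rows(pages, row_limit):
    """Walk pages in order; stop at the row limit or after the last page."""
    rows = []
    pages_visited = 0
    for page in pages:
        pages_visited += 1
        remaining = row_limit - len(rows)
        rows.extend(page[:remaining])  # never take more than the limit allows
        if len(rows) >= row_limit:
            break
    return rows, pages_visited


# Hypothetical site: 3 pages with 25 rows each (75 rows in total)
pages = [[f"row {p * 25 + i}" for i in range(25)] for p in range(3)]

rows, visited = collect_rows(pages, row_limit=40)
all_rows, all_visited = collect_rows(pages, row_limit=500)
```

With a limit of 40 the loop stops partway through the second page; with a limit of 500 it runs out of pages first and returns all 75 rows.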
Best practices for table extraction
Before extracting
✅ Check the table structure:
Scroll to see the full table
Note if headers repeat
Check for hidden columns (horizontal scroll)
See if data loads dynamically
During extraction
✅ Let Browse AI do the work:
The AI usually identifies structure correctly
Review the recommended dataset
Only customize if needed
Test with a few rows first
Setting row limits
When asked "How many rows to extract?":
Testing: Start with 20 to 50 rows (enough to test pagination).
Production: Set the limit based on your needs. If you're unsure of the total number of rows, set the limit higher than you expect so that all rows are captured.
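Whether a test limit actually exercises pagination depends on how many rows the site shows per page. The arithmetic below is a rough sketch; the page sizes are hypothetical, so check the real page size on your target site.

```python
import math


def pages_needed(row_limit, rows_per_page):
    """Number of pages that must be visited to reach the row limit."""
    return math.ceil(row_limit / rows_per_page)


def exercises_pagination(row_limit, rows_per_page):
    """True when the limit forces at least one move to a second page."""
    return pages_needed(row_limit, rows_per_page) > 1


# A 20-row test on a site showing 10 rows per page visits 2 pages,
# but the same test on a site showing 50 rows per page never leaves page 1.
small_pages = pages_needed(20, 10)
large_pages = pages_needed(20, 50)
```

If the table shows more rows per page than your test limit, raise the limit just past one page so the test run covers a page change.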
Troubleshooting table extraction
Common issues and solutions
Problem | Cause | Solution |
Missing columns | Table has horizontal scroll | Scroll right before capturing |
Wrong data grouped | Complex nested structure | Use manual selection |
Headers not detected | Non-standard formatting | Manually label columns |
Partial data only | Content loaded by JavaScript | Wait for the page to finish loading before capturing |
Duplicate headers | Multi-level headers | Focus on data rows |
When tables won't extract properly
Signs you need a different approach:
Table is actually multiple separate elements
Heavy JavaScript manipulation
Content only appears on interaction
Data is in images, not text
Alternative approaches:
Use Capture Text for specific cells
Create multiple robots for different sections
Check if site offers data export
Contact support for complex cases
Examples of successful table extraction
Standard data table
Product inventory table:
- Headers: Product, SKU, Price, Stock, Location
- Rows: 500+ products
- Method: Capture List, all columns auto-detected
Comparison table
Feature comparison:
- Headers in first column
- Products in subsequent columns
- Method: Capture List, handles row headers
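If you later want one record per product from a table laid out this way, the rows can be pivoted after export. The following Python sketch assumes the table is already available as a list of rows; the feature names and values are invented for illustration.

```python
def records_from_row_headers(table):
    """Turn a table whose first column holds headers into one dict per product."""
    header, *feature_rows = table
    products = header[1:]  # first cell labels the header column itself
    records = [{"Product": name} for name in products]
    for feature, *values in feature_rows:
        for record, value in zip(records, values):
            record[feature] = value
    return records


# Hypothetical comparison table: features down the side, products across the top
table = [
    ["Feature", "Basic", "Pro"],
    ["Storage", "10 GB", "100 GB"],
    ["Support", "Email", "Phone"],
]

records = records_from_row_headers(table)
```

Each product column becomes its own record, with the first-column headers as field names.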
Financial data table
Quarterly results:
- Nested sections for different metrics
- Merged cells for categories
- Method: Capture List, maintains structure
Tips for specific industries
E-commerce product tables
Extract variant options separately
Watch for dynamic pricing
Include hidden SKU data if needed
Financial data tables
Maintain number formatting
Extract footnote references
Consider currency symbols
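Extracted financial cells arrive as text, so a common follow-up step is splitting the currency symbol from the number without losing precision. This is a minimal Python sketch that assumes US-style formatting (comma as thousands separator, period as decimal point) and the accounting convention of parentheses for negative amounts; other locales would need different rules.

```python
import re
from decimal import Decimal


def parse_amount(text):
    """Split a cell like "$1,234.50" or "(€200)" into (symbol, Decimal)."""
    cleaned = text.strip()
    # Accounting style: parentheses mark a negative amount
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1].strip()
    match = re.fullmatch(r"([^\d\-.]*)(-?[\d,]*\.?\d+)", cleaned)
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    symbol, digits = match.groups()
    value = Decimal(digits.replace(",", ""))  # Decimal avoids float rounding
    return symbol.strip(), (-value if negative else value)
```

Keeping the original text column alongside the parsed value preserves the source formatting in case the parsing assumptions turn out not to match the site.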
Real estate listings
Capture both text and links
Extract embedded contact info
Handle variable field availability
Post-extraction formatting
Column headers become field names
Each row becomes a data record
Export maintains table format
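The mapping described above can be sketched in a few lines of Python. The headers come from the product inventory example; the row values and the CSV output are assumptions for illustration and do not represent Browse AI's actual export code.

```python
import csv
import io

headers = ["Product", "SKU", "Price"]
rows = [
    ["Pen", "P-001", "2.50"],       # hypothetical row values
    ["Notebook", "N-014", "4.00"],
]

# Column headers become field names; each row becomes one record
records = [dict(zip(headers, row)) for row in rows]

# Writing the records back out keeps the tabular layout
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=headers, lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
```

Here `records[0]` is a dictionary keyed by the column headers, and `csv_text` has one header line followed by one line per row.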
