Understanding table extraction

Browse AI recognizes tables, whether they're built with HTML <table> elements or styled divs that look like tables. The extraction process handles:

Standard HTML tables
Div-based tables (styled to look like tables)
Tables with row or column headers
Merged cells spanning multiple rows/columns
Nested tables (tables within tables)

How to extract from tables

Standard table extraction

Start training your robot in Robot Studio
Click Capture Text → From a list
Hover over the table until you see the dotted outline
Click to select the entire table
Browse AI automatically:
- Detects all columns and headers
- Creates a recommended dataset
- Structures the data properly
Review the extracted columns
Save and continue

Example:

The YC Startup Directory table automatically extracts Company Name, Location, Description, Batch, Industry, Sub-Industry, and Company URL as separate columns.

Customizing table extraction

If you need specific columns or the automatic detection isn't perfect:

After the table is detected, click "Select Manually Instead"
Click individual columns or cells you want
Label each data point
Save your custom selection

Common table types and approaches

Table type	Extraction method	Notes
Data tables	Capture List	Standard approach, works automatically
Pricing tables	Capture List or Text	May need Text if interactive
Comparison tables	Capture List	Watch for dynamic content
Specification tables	Capture List	Handles merged cells well
Nested tables	Capture List	AI detects structure on hover
Schedule/Calendar	Capture List	Maintains date structure

Handling complex tables

Tables with dynamic content

If the table content you need is displayed dynamically, you may need to combine several methods to capture it.

If content is visible: Use Capture List normally
If content requires interaction:
- Click toggles/options first
- Use Capture Text → Just text for specific values
- Create separate robots for different states

Tables with expandable rows

Some tables hide detail rows until clicked:

Decide what you need:
- Just visible data → Extract as-is
- Hidden details too → Click to expand first
Train accordingly:
- Click expand buttons during training
- Then capture the revealed content
Consider alternatives:
- Complex cases might need workflows

Tables that paginate

Tables split across multiple pages:

Extract the table with Capture List
Configure pagination:
- Click the "next" button if the table uses page buttons
- Set row limit (e.g., extract 500 rows)
Robot will navigate pages automatically
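
To see how a row limit interacts with pagination, here is a minimal Python sketch. It is a conceptual model only, not Browse AI's implementation: the function name collect_rows and the sample pages data are hypothetical and used purely for illustration.

```python
def collect_rows(pages, row_limit):
    """Walk pages in order and stop once row_limit rows have been collected.

    Hypothetical helper: `pages` stands in for the table pages a robot
    would visit by clicking "next".
    """
    collected = []
    for page in pages:
        for row in page:
            if len(collected) >= row_limit:
                return collected
            collected.append(row)
    # Ran out of pages before hitting the limit: everything was captured.
    return collected


# Three made-up pages of 10 rows each (30 rows in total).
pages = [[f"row {i}" for i in range(start, start + 10)] for start in (0, 10, 20)]

limited = collect_rows(pages, row_limit=25)     # stops partway through page 3
everything = collect_rows(pages, row_limit=500)  # limit above the total

print(len(limited), len(everything))
```

In this model, a limit lower than the table size cuts the extraction short, while a limit above the total simply returns every row. That is why a generous limit is the safer choice when the total is unknown.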

Best practices for table extraction

Before extracting

✅ Check the table structure:

Scroll to see the full table
Note if headers repeat
Check for hidden columns (horizontal scroll)
See if data loads dynamically

During extraction

✅ Let Browse AI do the work:

The AI usually identifies structure correctly
Review the recommended dataset
Only customize if needed
Test with a few rows first

Setting row limits

When asked "How many rows to extract?":

Testing: Start with 20 to 50 rows (enough to test pagination).
Production: Set based on your needs. If you're unsure of the total number of rows, set the limit higher than the number you expect so that all rows are captured.

Troubleshooting table extraction

Common issues and solutions

Problem	Cause	Solution
Missing columns	Table has horizontal scroll	Scroll right before capturing
Wrong data grouped	Complex nested structure	Use manual selection
Headers not detected	Non-standard formatting	Manually label columns
Partial data only	JavaScript loading	Wait for full load
Duplicate headers	Multi-level headers	Focus on data rows
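
If you export the extracted rows and want a quick check for the "Missing columns" or "Headers not detected" cases above, a short script can flag fields that never received a value. This is a hedged sketch in plain Python. The function missing_fields, the field names, and the sample records are all assumptions for illustration, not part of Browse AI.

```python
def missing_fields(records, expected):
    """Return the expected field names that are absent or empty in every record."""
    return [
        name
        for name in expected
        if not any(record.get(name) for record in records)
    ]


# Hypothetical exported rows: the SKU column was never captured.
records = [
    {"Product": "Desk lamp", "Price": "$24.99"},
    {"Product": "Office chair", "Price": ""},
]

gaps = missing_fields(records, ["Product", "SKU", "Price"])
print(gaps)
```

A field is only reported when no row has a value for it, so an occasional empty cell does not trigger a false alarm.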

When tables won't extract properly

Signs you need a different approach:

Table is actually multiple separate elements
Heavy JavaScript manipulation
Content only appears on interaction
Data is in images, not text

Alternative approaches:

Use Capture Text for specific cells
Create multiple robots for different sections
Check if site offers data export
Contact support for complex cases

Examples of successful table extraction

Standard data table

Product inventory table: 
- Headers: Product, SKU, Price, Stock, Location 
- Rows: 500+ products 
- Method: Capture List, all columns auto-detected

Comparison table

Feature comparison: 
- Headers in first column 
- Products in subsequent columns 
- Method: Capture List, handles row headers

Financial data table

Quarterly results: 
- Nested sections for different metrics 
- Merged cells for categories 
- Method: Capture List, maintains structure

Tips for specific industries

E-commerce product tables

Extract variant options separately
Watch for dynamic pricing
Include hidden SKU data if needed

Financial data tables

Maintain number formatting
Extract footnote references
Consider currency symbols
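
One way to keep the original formatting and still work with the numbers is to extract the cell as text, then split the symbol from the amount afterwards. The sketch below is a post-processing example in plain Python, not a Browse AI feature. The function split_amount is hypothetical, and it assumes US-style formatting (comma as thousands separator, period as decimal point).

```python
import re
from decimal import Decimal

# Optional leading symbol, then digits with optional commas and decimals.
AMOUNT = re.compile(r"^(?P<symbol>[^\d\s.,-]*)\s*(?P<number>-?\d[\d,]*(?:\.\d+)?)$")


def split_amount(text):
    """Split a string such as '$1,234.50' into (symbol, Decimal amount).

    Assumes commas are thousands separators; raises ValueError otherwise.
    """
    match = AMOUNT.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized amount: {text!r}")
    number = Decimal(match.group("number").replace(",", ""))
    return match.group("symbol"), number


symbol, amount = split_amount("$1,234.50")
print(symbol, amount)
```

Using Decimal rather than float avoids rounding surprises with financial figures. Values in other locale formats (for example "1.234,50") would need a different rule.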

Real estate listings

Capture both text and links
Extract embedded contact info
Handle variable field availability

Post-extraction formatting

Column headers become field names
Each row becomes a data record
Export maintains table format
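
The mapping described above can be pictured with a small Python sketch. It is an illustration of the idea only: the helper rows_to_records and the sample product data are hypothetical, and the exact shape of an export may differ.

```python
def rows_to_records(headers, rows):
    """Pair each row's cells with the column headers, one record per row."""
    return [dict(zip(headers, row)) for row in rows]


# Made-up table: the first list holds the column headers.
headers = ["Product", "SKU", "Price"]
rows = [
    ["Desk lamp", "DL-100", "$24.99"],
    ["Office chair", "OC-200", "$149.00"],
]

records = rows_to_records(headers, rows)
print(records[0])
```

Each header becomes a key, and each table row becomes one dictionary, which mirrors how column headers become field names and rows become data records.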
