How to extract data from tables on a web page

Tables are one of the most common data structures on the web. Browse AI can automatically detect and extract data from most table formats with a single click.

Written by Melissa Shires
Updated today

Understanding table extraction

Browse AI recognizes tables, whether they're built with HTML <table> elements or styled divs that look like tables. The extraction process handles:

  • Standard HTML tables

  • Div-based tables (styled to look like tables)

  • Tables with row or column headers

  • Merged cells spanning multiple rows/columns

  • Nested tables (tables within tables)
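To make the idea of "rows and columns" concrete, here is a minimal sketch of how a standard HTML table maps to rows of cell text, using only Python's standard library. It is illustrative only and does not represent how Browse AI detects tables; the `TableRows` class, the `parse_table` helper, and the sample markup are assumptions made for this example. It handles a single flat table and does not attempt nested tables or merged cells.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample markup, not taken from a real site.
SAMPLE_HTML = """
<table>
  <tr><th>Company</th><th>Batch</th></tr>
  <tr><td>Acme</td><td>W21</td></tr>
  <tr><td>Globex</td><td>S22</td></tr>
</table>
"""


def parse_table(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_table(SAMPLE_HTML)
```

The first row holds the headers and the remaining rows hold the data, which is the same shape a div-based table needs to be mapped into before it can be treated as a dataset.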

How to extract from tables

Standard table extraction

  1. Start training your robot in Robot Studio

  2. Click Capture Text, then choose From a list

  3. Hover over the table until you see the dotted outline

  4. Click to select the entire table

  5. Browse AI automatically:

    • Detects all columns and headers

    • Creates a recommended dataset

    • Structures the data properly

  6. Review the extracted columns

  7. Save and continue

Example:

For the YC Startup Directory table, Browse AI automatically extracts Company Name, Location, Description, Batch, Industry, Sub-Industry, and Company URL as separate columns.

Customizing table extraction

If you need specific columns or the automatic detection isn't perfect:

  1. After the table is detected, click "Select Manually Instead"

  2. Click individual columns or cells you want

  3. Label each data point

  4. Save your custom selection

Common table types and approaches

| Table type | Extraction method | Notes |
|---|---|---|
| Data tables | Capture List | Standard approach, works automatically |
| Pricing tables | Capture List or Text | May need Text if interactive |
| Comparison tables | Capture List | Watch for dynamic content |
| Specification tables | Capture List | Handles merged cells well |
| Nested tables | Capture List | AI detects structure on hover |
| Schedule/Calendar | Capture List | Maintains date structure |

Handling complex tables

Tables with dynamic content

To capture table content that is displayed dynamically, you may need to combine several capture methods.

  1. If content is visible: Use Capture List normally

  2. If content requires interaction:

    • Click toggles/options first

    • Use Capture Text, then Just text, for specific values

    • Create separate robots for different states

Tables with expandable rows

Some tables hide detail rows until clicked:

  1. Decide what you need:

    • Just visible data → Extract as-is

    • Hidden details too → Click to expand first

  2. Train accordingly:

    • Click expand buttons during training

    • Then capture the revealed content

  3. Consider alternatives:

    • Might need workflows for complex cases

Tables that paginate

Tables split across multiple pages:

  1. Extract the table with Capture List

  2. Configure pagination:

    • Click "next" for page buttons

    • Set row limit (e.g., extract 500 rows)

  3. Robot will navigate pages automatically
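The interaction between pagination and the row limit can be sketched as a simple loop: keep requesting the next page until enough rows are collected or no pages remain. This is a conceptual sketch only, not Browse AI's implementation; `collect_rows`, `fetch_page`, and the stand-in data are hypothetical names introduced for illustration.

```python
def collect_rows(fetch_page, row_limit):
    """Gather rows page by page until the limit is reached or pages run out.

    fetch_page is a hypothetical callable: given a 1-based page number, it
    returns that page's rows as a list, or an empty list past the last page.
    """
    rows = []
    page = 1
    while len(rows) < row_limit:
        page_rows = fetch_page(page)
        if not page_rows:          # no more pages to visit
            break
        rows.extend(page_rows)
        page += 1
    return rows[:row_limit]        # trim any overshoot from the final page


# Stand-in data source: 45 rows served 20 per page.
ALL_ROWS = [{"id": i} for i in range(1, 46)]


def fake_fetch_page(page, page_size=20):
    start = (page - 1) * page_size
    return ALL_ROWS[start:start + page_size]


limited = collect_rows(fake_fetch_page, row_limit=30)
everything = collect_rows(fake_fetch_page, row_limit=500)
```

With a limit of 30 the loop stops after the second page and trims the result; with a limit of 500 it stops when the table runs out of pages, which is why setting a generous limit is safe when the total row count is unknown.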

Best practices for table extraction

Before extracting

Check the table structure:

  • Scroll to see the full table

  • Note if headers repeat

  • Check for hidden columns (horizontal scroll)

  • See if data loads dynamically

During extraction

Let Browse AI do the work:

  • The AI usually identifies structure correctly

  • Review the recommended dataset

  • Only customize if needed

  • Test with a few rows first

Setting row limits

When asked "How many rows to extract?":

  • Testing: Start with 20 to 50 rows (enough to test pagination).

  • Production: Set based on your needs. If you're unsure of the total number of rows, set the limit higher than you expect so that all rows are captured.

Troubleshooting table extraction

Common issues and solutions

| Problem | Cause | Solution |
|---|---|---|
| Missing columns | Table has horizontal scroll | Scroll right before capturing |
| Wrong data grouped | Complex nested structure | Use manual selection |
| Headers not detected | Non-standard formatting | Manually label columns |
| Partial data only | JavaScript loading | Wait for full load |
| Duplicate headers | Multi-level headers | Focus on data rows |
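If repeated header rows still end up in your exported data, they can be filtered out afterwards. The following is a minimal sketch, assuming the export has already been loaded as a list of rows; `drop_repeated_headers` and the sample data are hypothetical and not part of Browse AI.

```python
def drop_repeated_headers(rows, headers):
    """Return rows with any row that merely repeats the header labels removed."""
    normalized = [h.strip().lower() for h in headers]
    return [
        row for row in rows
        if [cell.strip().lower() for cell in row] != normalized
    ]


headers = ["Product", "SKU", "Price"]
raw_rows = [
    ["Widget", "W-1", "9.99"],
    ["Product", "SKU", "Price"],   # header repeated mid-table
    ["Gadget", "G-2", "19.99"],
]
clean_rows = drop_repeated_headers(raw_rows, headers)
```

The comparison ignores case and surrounding whitespace, so a header row that differs only in formatting is still dropped.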

When tables won't extract properly

Signs you need a different approach:

  • Table is actually multiple separate elements

  • Heavy JavaScript manipulation

  • Content only appears on interaction

  • Data is in images, not text

Alternative approaches:

  1. Use Capture Text for specific cells

  2. Create multiple robots for different sections

  3. Check if site offers data export

  4. Contact support for complex cases

Examples of successful table extraction

Standard data table

Product inventory table: 
- Headers: Product, SKU, Price, Stock, Location
- Rows: 500+ products
- Method: Capture List, all columns auto-detected

Comparison table

Feature comparison: 
- Headers in first column
- Products in subsequent columns
- Method: Capture List, handles row headers

Financial data table

Quarterly results: 
- Nested sections for different metrics
- Merged cells for categories
- Method: Capture List, maintains structure

Tips for specific industries

E-commerce product tables

  • Extract variant options separately

  • Watch for dynamic pricing

  • Include hidden SKU data if needed

Financial data tables

  • Maintain number formatting

  • Extract footnote references

  • Consider currency symbols
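When extracted amounts keep their display formatting, they usually need converting before any arithmetic. Here is a minimal sketch of one way to do that after export; `parse_amount` is a hypothetical helper, and it assumes "," is a thousands separator, "." is the decimal point, and parentheses mark a negative value. Sites that use other conventions would need different rules.

```python
from decimal import Decimal

CURRENCY_SYMBOLS = "$€£¥"


def parse_amount(text):
    """Convert a display string such as "$1,234.50" or "(250.00)" to a Decimal.

    Assumes "," is a thousands separator, "." is the decimal point, and
    parentheses mark a negative value (a common accounting convention).
    """
    cleaned = text.strip()
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    # Remove currency symbols and thousands separators wherever they appear.
    for ch in CURRENCY_SYMBOLS + ",":
        cleaned = cleaned.replace(ch, "")
    value = Decimal(cleaned.strip())
    return -value if negative else value


revenue = parse_amount("$1,234.50")
loss = parse_amount("(250.00)")
```

`Decimal` is used instead of `float` so that values such as 1234.50 keep their exact decimal representation.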

Real estate listings

  • Capture both text and links

  • Extract embedded contact info

  • Handle variable field availability
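Listings that omit some fields produce records with different sets of keys. As a rough sketch, assuming the extracted data has been loaded as a list of dictionaries, every record can be given the same fields after export; `normalize_records` and the sample listings are hypothetical.

```python
def normalize_records(records, fieldnames, missing=""):
    """Give every record the same fields, filling absent ones with a placeholder."""
    return [
        {name: record.get(name, missing) for name in fieldnames}
        for record in records
    ]


listings = [
    {"Address": "12 Oak St", "Price": "450000", "Agent": "J. Doe"},
    {"Address": "9 Elm Ave", "Price": "380000"},   # no agent listed
]
normalized = normalize_records(listings, ["Address", "Price", "Agent"])
```

Filling gaps with an explicit placeholder keeps columns aligned when the records are later written to a spreadsheet or CSV file.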

Post-extraction formatting

  • Column headers become field names

  • Each row becomes a data record

  • Export maintains table format
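The mapping described above can be sketched in a few lines: headers become the keys of each record, and writing the records back out reproduces the table layout. This is an illustration using Python's standard library, not Browse AI's actual export code; `rows_to_records`, `records_to_csv`, and the sample data are assumptions.

```python
import csv
import io


def rows_to_records(headers, rows):
    """Pair each row's cells with the column headers to form one dict per row."""
    return [dict(zip(headers, row)) for row in rows]


def records_to_csv(records, fieldnames):
    """Serialize records to CSV text, keeping the original column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


headers = ["Product", "SKU", "Price"]
rows = [["Widget", "W-1", "9.99"], ["Gadget", "G-2", "19.99"]]
records = rows_to_records(headers, rows)
csv_text = records_to_csv(records, headers)
```

Each element of `records` is one data record keyed by field name, and `csv_text` starts with the header line followed by one line per row.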
