Why PDFs aren't directly supported
Browse AI robots are designed to work with websites by interacting with their HTML structure. PDFs are a fixed-layout document format with no HTML structure to interact with, so our robots can't process them directly.
PDF extraction workaround
1. Convert your PDF to HTML format using one of these methods:
   - Online converters like Adobe's PDF to HTML service, Zamzar, or PDF2Go
   - Software tools like Adobe Acrobat Pro (paid) or Calibre (free)
   - Command-line tools if you're technical (e.g., pdf2htmlEX or Poppler's pdftohtml)
2. Save or host the HTML file where your robot can access it:
   - Upload it to a web hosting service
   - Store it in a cloud storage service with public access
   - Use a temporary file hosting service
3. Create a Browse AI robot targeting the HTML version:
   - Enter the URL where your HTML file is hosted
   - Build your robot as you normally would for any webpage
   - Test to ensure the data is being extracted correctly
4. Extract and export your data in your preferred format (CSV, JSON, etc.)
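
If you take the technical route for the conversion step, the sketch below shows one way to call pdftohtml from Python. It assumes Poppler's pdftohtml is installed and on your PATH. The function names and the file names in the usage comment are hypothetical, and the flags shown are one reasonable choice, not a required configuration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path: str, output_base: str) -> list[str]:
    """Build the pdftohtml command line (hypothetical helper)."""
    return [
        "pdftohtml",
        "-noframes",  # write one HTML file instead of a frameset
        "-i",         # skip images; drop this flag if you need them
        pdf_path,
        output_base,
    ]


def convert_pdf_to_html(pdf_path: str, output_base: str) -> Path:
    """Run pdftohtml and return the path of the HTML file it wrote."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install Poppler first")
    subprocess.run(build_pdftohtml_command(pdf_path, output_base), check=True)

    # Assumption: with -noframes the output lands at <output_base>.html.
    html_path = Path(f"{output_base}.html")
    if not html_path.exists():
        raise FileNotFoundError(f"expected converted file at {html_path}")
    return html_path


# Example usage (file names are placeholders):
#   html_file = convert_pdf_to_html("report.pdf", "report")
#   print(html_file)
```

The resulting HTML file still needs to be hosted at a public URL before a robot can reach it.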
Important considerations
Conversion quality varies: Different tools produce different HTML results. If one converter doesn't work well, try another.
Complex formatting may be lost: Tables, charts, and complex layouts might not convert perfectly.
Text-heavy PDFs work best: PDFs that are mostly text typically convert more successfully than highly designed documents.
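
To compare converters before hosting a file, a quick look at the converted HTML can help. The sketch below uses only Python's standard library to count visible text, tables, and images. The class and function names are hypothetical, and the counts are a rough signal rather than a measure of conversion quality. For example, a file with very little text and many images suggests the PDF was converted as pictures, which a robot can't read as text.

```python
from html.parser import HTMLParser


class ConversionReport(HTMLParser):
    """Count visible text characters, tables, and images in HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self.tables = 0
        self.images = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "table":
            self.tables += 1
        elif tag == "img":
            self.images += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore CSS and JavaScript; count only text a visitor would see.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def summarize_html(html: str) -> dict:
    """Return rough counts for a converted HTML document."""
    report = ConversionReport()
    report.feed(html)
    report.close()
    return {
        "text_chars": report.text_chars,
        "tables": report.tables,
        "images": report.images,
    }
```

Running `summarize_html` on the output of two different converters gives a simple side-by-side comparison before you build a robot on either one.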