How Browse AI handles languages
Browse AI extracts content exactly as it appears on the website, regardless of language:
Any language: Chinese, Arabic, Russian, Japanese, etc.
Any script: Latin, Cyrillic, Arabic, Han, Devanagari, etc.
Any direction: Left-to-right (LTR) and right-to-left (RTL)
Mixed content: Pages with multiple languages
Special characters: Accents, umlauts, tildes, etc.
💡 You don't need special configuration - Browse AI preserves the original text.
Extracting from websites in different languages
Training your robot
Navigate to the website in any language.
Train normally - click, select, extract.
Label in your language - field names can be in English or any language.
Extract - robot captures text exactly as displayed.
What if the language I want to extract doesn't appear naturally?
You can train your robot to select the language on the website using a dropdown, link, or button. From there, continue the normal training steps.
You can change the country of an approved robot to trigger a different language.
Some websites set the language dynamically in the URL. If this is the case, use the URL for your desired language as the Original URL.
Example multilingual extraction:
Product name (Arabic): منتج رائع
Price (Numbers): ٥٠٠ ريال
Description (English): Amazing product
Tags (Chinese): 高质量, 快速配送
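Because the price above is captured with Arabic-Indic digits exactly as displayed, any numeric conversion happens after extraction. The following is a minimal Python sketch of one way to do that; the helper name `extract_number` is hypothetical and not part of Browse AI.

```python
import re
from typing import Optional


def extract_number(text: str) -> Optional[int]:
    """Return the first run of decimal digits in text as an int, or None."""
    # In Python 3, \d matches any Unicode decimal digit (not only 0-9),
    # and int() accepts those digits directly.
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# The extracted price text from the example above.
price = extract_number("٥٠٠ ريال")
```

The original field stays untouched, so the displayed text remains available for reference alongside the converted value.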
What gets preserved
Content type | How it's handled |
Original text | Extracted exactly as shown |
Character encoding | UTF-8 preserved automatically |
Number formats | Regional formats maintained |
Date formats | Kept as displayed |
Currency symbols | Preserved with text |
RTL text | Direction maintained |
Common multi-language scenarios
E-commerce sites with multiple languages
Product names in local language, descriptions in English.
Extract each field as it appears.
Don't try to translate during extraction.
Post-process if translation needed.
Keep original for reference.
Mixed script content
Example: Japanese sites mixing Kanji, Hiragana, Katakana, and English.
商品名: ノートパソコン (Laptop)
価格: ¥125,000
仕様: Intel Core i7
Right-to-left languages
Examples include: Arabic, Hebrew, Persian, and Urdu.
Text flows right-to-left
Numbers still read left-to-right
Mixed with LTR elements (URLs, English terms)
Browse AI automatically preserves direction and formatting.
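If you need to know the direction of a captured field when displaying it downstream, one option is a first-strong-character check using Python's standard `unicodedata` module. This is a sketch of a common heuristic, not a description of how Browse AI works internally; the function name `base_direction` is hypothetical.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess text direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi_class == "L":  # Latin, CJK, and other left-to-right letters
            return "ltr"
    # Digits, spaces, and punctuation are not strong; default to LTR.
    return "ltr"
```

Digits and punctuation are skipped, so a value such as an Arabic price that starts with numbers is still classified by its first letter.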
Regional number and date formats
Different regions display numbers and dates differently. Browse AI extracts them as displayed; you can standardize them later if needed:
Region | Number format | Date format |
US | 1,234.56 | 12/31/2024 |
Europe | 1.234,56 | 31/12/2024 |
India | 1,234.56 | 31-12-2024 |
ISO | 1234.56 | 2024-12-31 |
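Standardizing later could look like the Python sketch below, which converts the formats in the table to a plain decimal and an ISO date. The region keys and the functions `normalize_number` and `normalize_date` are illustrative assumptions; you need to know which region a site uses, since a value like 01/02/2024 is ambiguous on its own.

```python
from datetime import datetime
from decimal import Decimal

# (thousands separator, decimal separator) per region, taken from the table.
NUMBER_SEPARATORS = {
    "US": (",", "."),
    "Europe": (".", ","),
    "India": (",", "."),
    "ISO": ("", "."),
}

# strptime patterns matching the date column of the table.
DATE_FORMATS = {
    "US": "%m/%d/%Y",
    "Europe": "%d/%m/%Y",
    "India": "%d-%m-%Y",
    "ISO": "%Y-%m-%d",
}


def normalize_number(text: str, region: str) -> Decimal:
    """Convert a regionally formatted number to a Decimal."""
    thousands, decimal_sep = NUMBER_SEPARATORS[region]
    if thousands:
        text = text.replace(thousands, "")
    return Decimal(text.replace(decimal_sep, "."))


def normalize_date(text: str, region: str) -> str:
    """Convert a regionally formatted date to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, DATE_FORMATS[region]).date().isoformat()
```

`Decimal` is used instead of `float` so that prices keep their exact digits.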
Post-extraction processing
Exporting multi-language data
CSV export
UTF-8 encoding preserved
Open with Excel using UTF-8 option
Google Sheets handles automatically
JSON export
Perfect for preserving encoding
Maintains all special characters
Better for programmatic processing
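For programmatic processing, the main thing is to state the encoding explicitly when reading the file instead of relying on the system default. A minimal Python sketch, assuming the data was downloaded as a CSV or JSON file; the function names are hypothetical.

```python
import csv
import json


def load_csv(path: str) -> list:
    """Read a UTF-8 CSV export into a list of row dictionaries."""
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


def load_json(path: str):
    """Read a UTF-8 JSON export."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `load_csv("export.csv")` or `load_json("export.json")` (placeholder file names) returns the text with every script intact, whatever the operating system's default encoding is.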
Common issues
Problem | Cause | Solution |
Characters show as ??? | Wrong encoding in viewer | Use UTF-8 compatible software |
Excel shows gibberish | Default encoding wrong | Import as UTF-8 |
Database errors | Column encoding mismatch | Set database to UTF-8 |
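If Excel keeps showing gibberish when a CSV is opened by double-clicking, one common workaround is to re-save the file with a UTF-8 byte order mark, which Excel generally uses to detect the encoding. A Python sketch under that assumption; `write_excel_friendly_csv` is a hypothetical helper.

```python
import csv


def write_excel_friendly_csv(path: str, rows: list) -> None:
    """Write row dictionaries as UTF-8 with a BOM so Excel detects the encoding."""
    # The "utf-8-sig" codec prepends the bytes EF BB BF to the file.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Other tools reading such a file should also use `utf-8-sig`, otherwise the mark shows up as an invisible character at the start of the first column name.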
