Imadatobanisa (647)
I've tried opening a Unicode file by many methods. If I knew whether a file is ANSI or Unicode, I could use the proper method to read it. Now I have a strange text file (ANSI? Unicode? Nobody knows until someone opens it). In fact, I can't detect what type any text file is: ANSI or Unicode? So I'm in big trouble: if the detection fails, I can't open a text file properly. Does anyone know how to do this? Any help would be greatly appreciated. :)
andywestken (1966)
Open the file in binary mode and check for a BOM (Byte Order Mark) in the first few bytes of the file. See: Byte order mark http://en.wikipedia.org/wiki/Byte_order_mark
(see Wikipedia for more) But note that not all UTF-8 files have a BOM, and the same can be true of other encodings, though modern editors are supposed to write a BOM when they save a file. Without a BOM, you'd need to use some sort of statistical approach, like these guys: A composite approach to language/encoding detection http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
Andy
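For what it's worth, here is a minimal sketch of the BOM check described above. The names `Bom`, `detect_bom` and `detect_bom_in_file` are made up for illustration and do not come from any library. It only recognises the UTF-8, UTF-16 and UTF-32 marks, and it reports `Bom::None` when no mark is present (or the file can't be read), which tells you nothing about the actual encoding.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical result type for this sketch.
enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect the first n bytes of a buffer for a known byte order mark.
Bom detect_bom(const unsigned char* b, std::size_t n) {
    // Test the 4-byte UTF-32 marks first: FF FE 00 00 also begins with
    // the UTF-16LE mark FF FE.
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return Bom::Utf32LE;
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return Bom::Utf32BE;
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return Bom::Utf8;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return Bom::Utf16LE;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;  // no BOM found; the encoding is still unknown
}

// Open the file in binary mode and check its first (up to) four bytes.
Bom detect_bom_in_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char buf[4] = {0, 0, 0, 0};
    in.read(reinterpret_cast<char*>(buf), sizeof buf);
    // gcount() is 0 if the file could not be opened or is empty.
    return detect_bom(buf, static_cast<std::size_t>(in.gcount()));
}
```

One caveat: a UTF-16LE file whose first character is U+0000 also starts with FF FE 00 00, so the UTF-32LE answer is a guess in that (rare) case. A result of `Bom::None` is where the statistical approach would have to take over.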