How to detect a Unicode file?

I've tried opening a Unicode file with many methods. If I knew whether a file was ANSI or Unicode, I could use the proper method to read it. Now I have a strange text file (ANSI? Unicode? Nobody knows until someone opens it), and I can't detect what type of text file it is. This is a big problem: if detection fails, I can't open any text file correctly.

Does anyone know? Any help would be greatly appreciated. :)

Open the file in binary mode and check for a BOM (byte order mark) in the first few bytes of the file.


Byte order mark

Encoding     BOM (hex) BOM (dec)
UTF-8        EF BB BF  239 187 191
UTF-16 (BE)  FE FF     254 255
UTF-16 (LE)  FF FE     255 254

(see Wikipedia for more)
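
The check against the table above can be sketched in a few lines. This is only a minimal sketch in Python, not a definitive implementation: the names `detect_bom` and `detect_file_bom` are made up for illustration, and the UTF-32 entries are an addition beyond the table (the UTF-32 (LE) BOM is FF FE 00 00, so it has to be tested before UTF-16 (LE)).

```python
import codecs

# Longer BOMs come first: the UTF-32 (LE) BOM starts with the same two
# bytes (FF FE) as the UTF-16 (LE) BOM. The UTF-32 rows are not in the
# table above; they are included here as an extra.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def detect_file_bom(path):
    """Read the first four bytes of a file in binary mode and check for a BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

A result of None only means that no BOM was found, not that the file is ANSI. Also note that the BOM stays in the data: either skip those bytes yourself or decode with a BOM-aware codec such as "utf-8-sig" or "utf-16".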

But note that not all UTF-8 files have a BOM, and the same can be true of other encodings. Some editors write a BOM when they save a file, but many do not, and the Unicode Standard neither requires nor recommends one for UTF-8.

Without a BOM, you'd need to use some sort of statistical approach, like the one described here:

A composite approach to language/encoding detection
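
Short of a full statistical detector, a much simpler heuristic often helps: UTF-8 has a strict byte structure, so text in a legacy 8-bit code page that contains non-ASCII characters rarely happens to decode as valid UTF-8. The sketch below is not the composite approach from the article above, just a validity check. The function name `guess_encoding` is hypothetical, and the `cp1252` fallback is an assumption (it is what Windows calls "ANSI" in Western European locales; other locales use other code pages).

```python
def guess_encoding(data, fallback="cp1252"):
    """Guess the encoding of BOM-less bytes with a strict decode test.

    This is a heuristic, not a proof: it can only say that the bytes
    are consistent with ASCII or UTF-8, and otherwise returns the
    assumed fallback code page.
    """
    try:
        data.decode("ascii")
        return "ascii"  # pure 7-bit data reads the same in ANSI and UTF-8
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # non-ASCII bytes that form valid UTF-8 sequences
    except UnicodeDecodeError:
        return fallback  # not valid UTF-8, so assume the legacy code page
```

For example, "é" encoded as UTF-8 is the two bytes C3 A9 and passes the test, while the same character in cp1252 is the single byte E9, which is not valid UTF-8 when followed by an ASCII letter. UTF-16 without a BOM is not covered by this sketch.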
