ASCII or UTF-8?
Author: Internet
Question
A long time ago, before the world's scripts were supported, text files were all ASCII.
Nowadays, we have encodings that cover the world's scripts.
If I open a text file in a hex editor, is there a way to tell whether it is encoded in ASCII or UTF-8?
Answer 1
UTF-8 is backwards compatible with ASCII: an ASCII text file is also a UTF-8 text file.
If a file contains any byte whose first hex digit is 8 through F (that is, a value from 0x80 to 0xFF), it is not ASCII.
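As a minimal sketch of that rule, the check below treats a file as ASCII only if every byte is below 0x80. The function name `is_ascii` is just an illustrative choice, not something from the original answer.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte has a first hex digit of 0 through 7."""
    # A byte whose first hex digit is 8..F has a value of 0x80 or more.
    return all(b < 0x80 for b in data)


print(is_ascii(b"hello"))                       # True
print(is_ascii("h\u00e9llo".encode("utf-8")))   # False: contains C3 A9
```

To try it on a real file, read the file in binary mode (for example `open(path, "rb").read()`) and pass the resulting bytes in.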
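If a file is not ASCII, it may be UTF-8 if every byte that starts with C or D is followed by exactly one byte that starts with 8, 9, A, or B, every byte that starts with E is followed by exactly two such bytes, and every byte that starts with F is followed by exactly three. If a byte starting with 8, 9, A, or B appears in any other context, the file is not UTF-8.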
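The hex-digit rule can be sketched as a small scanner. This is only the structural check a person could do by eye in a hex editor, under the assumption that a lead byte starting with C or D takes one continuation byte, E takes two, and F takes three; it does not enforce the stricter rules of full UTF-8 validation. The name `looks_like_utf8` is hypothetical.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Structural check only: lead bytes followed by the right number
    of continuation bytes (first hex digit 8, 9, A, or B)."""
    continuation = (0x8, 0x9, 0xA, 0xB)
    i = 0
    while i < len(data):
        first_digit = data[i] >> 4  # first hex digit of the byte
        if first_digit <= 0x7:
            need = 0                # plain ASCII byte
        elif first_digit in (0xC, 0xD):
            need = 1
        elif first_digit == 0xE:
            need = 2
        elif first_digit == 0xF:
            need = 3
        else:
            return False            # continuation byte with no lead byte
        trail = data[i + 1 : i + 1 + need]
        if len(trail) != need:
            return False            # sequence cut off at end of data
        if any((b >> 4) not in continuation for b in trail):
            return False
        i += 1 + need
    return True


print(looks_like_utf8("\u20ac".encode("utf-8")))  # True: E2 82 AC
print(looks_like_utf8(b"caf\xe9"))                # False: Latin-1 E9 at end
```

Because it looks only at the first hex digit, this sketch accepts some byte sequences that a strict decoder rejects, such as C0 80.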
There are a few more requirements for valid UTF-8, but they are harder to glean with a hex editor. See https://en.m.wikipedia.org/wiki/UTF-8
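Those extra requirements include rejecting overlong encodings (such as C0 80), encoded surrogates (ED A0 80 through ED BF BF), and code points above U+10FFFF. Rather than checking them by hand, one option is to let a strict decoder do it. The sketch below uses Python's built-in UTF-8 codec; the helper name `is_valid_utf8` is an assumption for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly with Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("\u20ac".encode("utf-8")))  # True
print(is_valid_utf8(b"\xc0\x80"))               # False: overlong encoding
print(is_valid_utf8(b"\xed\xa0\x80"))           # False: encoded surrogate
```

Note that a pure ASCII file also passes this check, which matches the point that every ASCII file is a UTF-8 file.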
Tags: UTF, bytes, hex, file, text, ASCII. Source: https://www.cnblogs.com/chucklu/p/16394370.html