cp936 and utf8 are two character encodings. What is the difference between them? We will discuss it in this tutorial. Knowing the difference helps you pick the right encoding when you read files in Python.
The difference between cp936 and utf8
cp936 is also called gbk or ms936 (in Python these names are aliases of the same codec). It is mainly used to encode simplified Chinese.
utf8 is also called utf_8, u8 or utf. It can encode every Unicode character, so it covers all the languages in the world: not only simplified Chinese, but also Japanese, English and many others.
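As a quick illustration, here is a small sketch that uses only Python's built-in `str.encode` and `bytes.decode`. The sample strings are chosen just for this example. The same two Chinese characters take 2 bytes each in cp936 and 3 bytes each in UTF-8. An emoji, which has no code in cp936, can only be encoded in UTF-8.

```python
# Compare how cp936 (GBK) and UTF-8 encode the same text.
text = "中文"  # two simplified Chinese characters (example input)

gbk_bytes = text.encode("cp936")
utf8_bytes = text.encode("utf-8")

print(gbk_bytes, len(gbk_bytes))    # b'\xd6\xd0\xce\xc4' 4  (2 bytes per character)
print(utf8_bytes, len(utf8_bytes))  # b'\xe4\xb8\xad\xe6\x96\x87' 6  (3 bytes per character)

# UTF-8 can encode any Unicode character; cp936 cannot encode an emoji.
emoji = "\U0001F600"
print(emoji.encode("utf-8"))
try:
    emoji.encode("cp936")
except UnicodeEncodeError as err:
    print("cp936 cannot encode it:", err.reason)

# Decoding bytes with the wrong encoding fails (or produces garbled text).
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("cp936 bytes are not valid UTF-8")
```

The last check is the reason the encoding matters when reading files: bytes written in one encoding must be decoded with the same one.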
Here is a summary table:
Encoding | Aliases | Used for |
cp936 | gbk, ms936 | simplified Chinese |
utf8 | utf_8, u8, utf | all languages |
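You can confirm the aliases in the table with Python's standard `codecs.lookup`, which returns the codec registered for a name. This short check shows that every alias in a row resolves to the same codec.

```python
import codecs

# Every alias in a group resolves to the same codec in Python's registry.
for alias in ["cp936", "gbk", "ms936"]:
    print(alias, "->", codecs.lookup(alias).name)  # -> gbk

for alias in ["utf8", "utf_8", "u8", "utf"]:
    print(alias, "->", codecs.lookup(alias).name)  # -> utf-8
```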
You can also detect the character encoding of a text file in Python. This tutorial shows how:
Python Get Text File Character Encoding: A Beginner Guide – Python Tutorial
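If you already know a file is either UTF-8 or cp936, a simple alternative is to try UTF-8 first and fall back to cp936. This is a minimal sketch under that assumption, not the method from the linked tutorial. The helper `read_text_guess` and the file names are hypothetical. Strict decoding is only a heuristic, because some byte sequences are valid in both encodings.

```python
import tempfile
from pathlib import Path


def read_text_guess(path, encodings=("utf-8", "cp936")):
    """Hypothetical helper: return (text, encoding) for the first encoding
    that decodes the whole file without errors.

    Assumes the file is either UTF-8 or cp936. UTF-8 is tried first because
    its strict rules reject most cp936 byte sequences, while cp936 would
    often decode UTF-8 bytes into garbled text instead of failing.
    """
    data = Path(path).read_bytes()
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"{path} is not valid in any of {encodings}")


# Usage: write the same Chinese text in both encodings, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    gbk_file = Path(tmp, "gbk.txt")    # hypothetical file names
    utf8_file = Path(tmp, "utf8.txt")
    gbk_file.write_text("中文", encoding="cp936")
    utf8_file.write_text("中文", encoding="utf-8")
    gbk_result = read_text_guess(gbk_file)
    utf8_result = read_text_guess(utf8_file)
    print(gbk_result)   # ('中文', 'cp936')
    print(utf8_result)  # ('中文', 'utf-8')
```

The order of `encodings` is the key design choice: put the stricter encoding (UTF-8) first.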