Understand the Difference Between cp936 and utf8 Encoding: A Beginner Guide – Python Tutorial

By | March 24, 2020

cp936 and utf8 are two character encodings. What is the difference between them? We will discuss it in this tutorial. Knowing the difference is useful when you read files in Python.

The difference between cp936 and utf8

cp936 is also called gbk or ms936. It is mainly used to encode simplified Chinese text.
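To see what this means at the byte level, here is a small sketch that encodes the same two Chinese characters with cp936 and with utf8. cp936 (GBK) uses 2 bytes per Chinese character, while utf8 uses 3 bytes for these characters. The sample string "中文" is only an illustration.

```python
# Encode the same two Chinese characters with cp936 (GBK) and with UTF-8.
text = "中文"

gbk_bytes = text.encode("cp936")   # in Python, "cp936" is an alias of the "gbk" codec
utf8_bytes = text.encode("utf8")

print(gbk_bytes, len(gbk_bytes))    # b'\xd6\xd0\xce\xc4' 4
print(utf8_bytes, len(utf8_bytes))  # b'\xe4\xb8\xad\xe6\x96\x87' 6

# Bytes must be decoded with the same encoding they were written with.
print(gbk_bytes.decode("cp936") == text)   # True
print(utf8_bytes.decode("utf8") == text)   # True
```

Because the byte sequences are different, a file saved as cp936 will not read correctly if you open it as utf8, and vice versa.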

utf8 is also called utf_8, u8 or utf. It can encode the characters of all languages in the world: not only Chinese, but also Japanese, English and others.
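A quick way to see the difference in coverage is to encode a character that GBK does not include. The sketch below uses an emoji as the test character and a hypothetical helper named can_encode(); utf8 handles it, while cp936 raises UnicodeEncodeError.

```python
def can_encode(text, encoding):
    """Hypothetical helper: return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

samples = ["Hello", "中文", "😀"]  # ASCII, Chinese, emoji

for s in samples:
    print(repr(s.encode("unicode_escape")),
          "cp936:", can_encode(s, "cp936"),
          "utf8:", can_encode(s, "utf8"))
```

ASCII text and Chinese characters work with both encodings, but the emoji can only be encoded with utf8.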

Here is a summary table:

Encoding | Aliases | Languages
cp936 | gbk, ms936 | simplified Chinese
utf8 | utf_8, u8, utf | all languages
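You can check the aliases in the table with the standard codecs module: codecs.lookup() normalizes an encoding name and returns the codec it refers to, so every alias of the same encoding reports the same canonical name.

```python
import codecs

# Each alias resolves to its codec's canonical name.
for alias in ["cp936", "gbk", "ms936", "utf8", "utf_8", "u8", "utf"]:
    print(f"{alias:>6} -> {codecs.lookup(alias).name}")

# Expected output:
#  cp936 -> gbk
#    gbk -> gbk
#  ms936 -> gbk
#   utf8 -> utf-8
#  utf_8 -> utf-8
#     u8 -> utf-8
#    utf -> utf-8
```

This also means that open(path, encoding="cp936") and open(path, encoding="gbk") behave the same way in Python.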

You can easily detect the character encoding of a text file in Python. For an example, see:

Python Get Text File Character Encoding: A Beginner Guide – Python Tutorial
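If you only need to handle files that are either utf8 or cp936, a simple approach is to try utf8 first and fall back to cp936. This is a minimal sketch under that assumption, not the detection method from the tutorial above; read_text_with_fallback() is a hypothetical helper. utf8 is tried first because it is strict: GBK bytes usually fail to decode as UTF-8, whereas UTF-8 bytes can sometimes be decoded as cp936 without an error and give garbled text.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf8", "cp936")):
    """Hypothetical helper: try each encoding in order, return (text, encoding)."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding, try the next one
    raise ValueError(f"could not decode {path} with any of {encodings}")


# Demo: write a small file encoded as cp936, then read it back.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("中文".encode("cp936"))
    path = tmp.name

text, used = read_text_with_fallback(path)
print(used, text == "中文")  # cp936 True
os.remove(path)
```

A file that contains only ASCII characters decodes identically with both encodings, so the reported encoding in that case is simply the first one tried.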
