Fix Python Read File: UnicodeDecodeError: ‘gbk’ codec can’t decode byte illegal multibyte sequence – Python Tutorial

By | November 13, 2019

When you read a text file in Python, you may run into this error: UnicodeDecodeError: ‘gbk’ codec can’t decode byte 0xa2 in position 5871: illegal multibyte sequence. It stops your script before the file has been read. In this tutorial, we will show you how to fix it.

We often use Python to read a text file like this:

with open('data.txt', 'r') as f:  # read a text file
    for line in f:
        check = line.strip().split()

If you run this script, you may get this error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 5871: illegal multibyte sequence
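To see where the error comes from, here is a minimal sketch that reproduces it on any system. It writes UTF-8 text to a temporary file and reads it back with encoding='gbk', which stands in for the default encoding on a Chinese-language Windows system (an assumption made for the demo). The function name reproduce_gbk_error and the sample text are hypothetical.

```python
import os
import tempfile


def reproduce_gbk_error():
    """Write UTF-8 text to a temp file, then read it back as gbk.

    Returns the UnicodeDecodeError that was raised, or None if none was.
    """
    # Hypothetical sample text: '€' is stored as the bytes E2 82 AC in UTF-8.
    # Decoding those bytes as gbk runs into a byte pair gbk does not allow.
    with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                     delete=False) as tmp:
        tmp.write('price: 5 €\n')
        path = tmp.name
    try:
        # encoding='gbk' simulates the default on Chinese-language Windows.
        with open(path, 'r', encoding='gbk') as f:
            for line in f:
                line.strip().split()
    except UnicodeDecodeError as exc:
        return exc
    finally:
        os.remove(path)  # clean up the temporary file either way
    return None


err = reproduce_gbk_error()
print(err)
```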

How do we fix this error?

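The solution is to tell open() which encoding the file uses. Without an encoding argument, open() decodes the file with the platform's default text encoding, which on Chinese-language Windows is cp936 (gbk). If the file was actually saved as UTF-8, some of its byte sequences are invalid in gbk, and reading fails.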
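You can check which encoding open() will fall back to on your own machine. This short sketch prints locale.getpreferredencoding(False), the value open() uses when no encoding is given (Python's UTF-8 mode overrides it), and shows that Python resolves the Windows name cp936 to its gbk codec, which is why the error message mentions gbk.

```python
import codecs
import locale
import sys

# open() uses this encoding when no encoding= argument is passed
# (in Python's UTF-8 mode this reports UTF-8 instead).
default_encoding = locale.getpreferredencoding(False)
print('Default text encoding:', default_encoding)
print('UTF-8 mode enabled:', bool(sys.flags.utf8_mode))

# On Chinese-language Windows the default is usually 'cp936', which
# Python treats as an alias of its 'gbk' codec.
print('cp936 resolves to:', codecs.lookup('cp936').name)
```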

Change the code above to this:

with open('data.txt', 'r', encoding='utf-8') as f:  # decode the file as UTF-8
    for line in f:
        check = line.strip().split()

Run the code again and the UnicodeDecodeError should be gone, as long as data.txt really is UTF-8 encoded.
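If you are not sure whether a file is UTF-8 or gbk, one option is to try UTF-8 first and fall back to gbk. This is a sketch under that assumption rather than part of the fix above; read_text_lines and write_temp are hypothetical helper names, and the candidate list should match the encodings your files actually use.

```python
import os
import tempfile


def read_text_lines(path, encodings=('utf-8', 'gbk')):
    """Return the lines of a text file, trying each encoding in order."""
    if not encodings:
        raise ValueError('at least one encoding is required')
    last_error = None
    for enc in encodings:
        try:
            with open(path, 'r', encoding=enc) as f:
                return f.read().splitlines()
        except UnicodeDecodeError as exc:
            last_error = exc  # not this encoding; try the next one
    raise last_error


def write_temp(text, encoding):
    """Hypothetical helper: save text to a temp file and return its path."""
    with tempfile.NamedTemporaryFile('w', encoding=encoding, suffix='.txt',
                                     delete=False) as tmp:
        tmp.write(text)
        return tmp.name


utf8_path = write_temp('hello 世界\n', 'utf-8')
gbk_path = write_temp('hello 世界\n', 'gbk')
utf8_lines = read_text_lines(utf8_path)  # decoded as UTF-8
gbk_lines = read_text_lines(gbk_path)    # UTF-8 fails, gbk succeeds
os.remove(utf8_path)
os.remove(gbk_path)
print(utf8_lines, gbk_lines)
```

UTF-8 goes first on purpose: its strict byte structure rarely matches text in another encoding by accident, whereas gbk accepts many byte pairs and can "successfully" decode UTF-8 text into garbage.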

There is another way to fix this problem: open the file in binary mode and decode each line yourself.

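embedding_file_path = 'data.txt'  # path to the text file to read
with open(embedding_file_path, 'rb') as f:  # 'rb' yields bytes, not str
    for line in f:
        line = line.decode('gbk', 'ignore').strip()  # skip undecodable bytes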

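In this example, the 'ignore' error handler silently drops any bytes that cannot be decoded as gbk, so the error no longer appears. Note that this hides the problem rather than solving it: the undecodable bytes are lost, and if the file is not really gbk-encoded, the remaining text may be garbled. Use it only when losing a few characters is acceptable.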
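The sketch below shows what 'ignore' does to a line that was actually saved as UTF-8, compared with the 'replace' handler and with decoding it using the right codec. The sample line is made up for illustration. The € sign does not survive either gbk decode: gbk rejects some of its bytes, and the rest may be misread as an unrelated character.

```python
# Hypothetical line from a file that was saved as UTF-8.
raw_line = 'price: 5 €\n'.encode('utf-8')

ignored = raw_line.decode('gbk', 'ignore').strip()    # bad bytes dropped
replaced = raw_line.decode('gbk', 'replace').strip()  # bad bytes -> U+FFFD
correct = raw_line.decode('utf-8').strip()            # the right codec

print(repr(ignored))
print(repr(replaced))
print(repr(correct))
```

The same handlers can be passed to open() directly, for example open('data.txt', encoding='gbk', errors='replace'), which avoids decoding each line by hand; 'replace' at least leaves a visible U+FFFD marker where data was lost.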
