When reading a text file in Python, you may run into this error: UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 5871: illegal multibyte sequence. It stops the file from being read. In this tutorial, we will show you how to fix it.
We often use Python to read a text file like this:
with open('data.txt', 'r') as f:
    # read a text file line by line
    for line in f:
        check = line.strip().split()
If you run this script, you may get the UnicodeDecodeError described at the start of this tutorial.
How to fix this error?
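The solution is to specify the encoding when you open the file. If you leave it out, open() uses the default encoding of your system locale (gbk on a Chinese-language Windows system), and decoding fails when the file was saved in a different encoding such as UTF-8.
You can change the code above to this: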
with open('data.txt', 'r', encoding='utf-8') as f:
    # read a text file line by line, decoding it as UTF-8
    for line in f:
        check = line.strip().split()
Run this code. If the file is actually encoded in UTF-8, the UnicodeDecodeError will no longer occur.
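To see the fix in isolation, here is a minimal sketch that builds its own test file, so it does not depend on your data.txt. The sample text, the temporary file, and the helper name read_lines are assumptions made up for this demonstration; they are not part of the original script. It first prints the default encoding of your system, then reads the same UTF-8 file once with gbk (which fails) and once with utf-8 (which works).

```python
import locale
import os
import tempfile

# The encoding open() falls back to when no encoding argument is given.
print("default encoding:", locale.getpreferredencoding(False))

# Hypothetical sample text; \u20ac is the euro sign, three bytes in UTF-8.
SAMPLE = "cost: 5 \u20ac\n"


def read_lines(path, encoding):
    """Read a text file with an explicit encoding and return stripped lines."""
    with open(path, "r", encoding=encoding) as f:
        return [line.strip() for line in f]


# Create a temporary file that is known to be UTF-8 encoded.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(SAMPLE.encode("utf-8"))

try:
    read_lines(path, "gbk")  # wrong codec for this file
    gbk_failed = False
except UnicodeDecodeError as err:
    gbk_failed = True
    print("gbk failed:", err.reason)

utf8_lines = read_lines(path, "utf-8")  # matches how the file was written
print("utf-8 read", len(utf8_lines), "line(s)")

os.remove(path)
```

If the first line prints something other than UTF-8 on your machine, every open() call without an encoding argument is decoding with that codec instead.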
There is another way to fix this problem.
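embedding_file_path = 'data.txt'  # path to your text file
with open(embedding_file_path, 'rb') as f:
    for line in f:
        # decode the raw bytes as gbk and skip bytes that are not valid gbk
        line = line.decode('gbk', 'ignore').strip()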
In this example, we open the file in binary mode and ignore any bytes that cannot be decoded as gbk. This also avoids the error, but the ignored bytes are silently dropped from the text.
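The same idea can also be written with the errors argument of open(), which avoids decoding each line by hand. The sketch below is only an illustration: the helper name read_lossy, the sample text, and the temporary file are made up for this example. It compares 'ignore' with 'replace', which inserts the U+FFFD replacement character instead of dropping the bad bytes, and it shows that neither one gives back the original text when the file is really UTF-8.

```python
import os
import tempfile


def read_lossy(path, encoding="gbk", errors="ignore"):
    """Hypothetical helper: read a whole file, applying an error handler."""
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read()


# Hypothetical sample: UTF-8 bytes that are not a valid gbk sequence.
raw = "cost: 5 \u20ac\n".encode("utf-8")
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

ignored = read_lossy(path, errors="ignore")    # bad bytes are dropped
replaced = read_lossy(path, errors="replace")  # bad bytes become U+FFFD
correct = read_lossy(path, encoding="utf-8", errors="strict")

os.remove(path)

print("ignore matches original text: ", ignored == correct)
print("replace marks the bad bytes:  ", "\ufffd" in replaced)
```

Because both handlers lose information, they are best kept for files that really are gbk with only a few damaged bytes. When the file is in another encoding, passing the correct encoding is the safer fix.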