Fix Python Pandas Read CSV File: UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xc8 in position 0: invalid continuation byte – Python Pandas Tutorial

March 24, 2020

pandas makes it easy to read a CSV file. However, you may run into this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte. This tutorial shows how to fix it.

You might read a CSV file with pandas like this:

import pandas as pd

file = r'data/601988.csv'

df = pd.read_csv(file, sep=',')
print(df)

Run this code and you will get a UnicodeDecodeError:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte

Why does this error occur?

By default, pandas reads a CSV file as UTF-8. If the file was saved in a different character encoding, its bytes may not form valid UTF-8 sequences, and a UnicodeDecodeError is raised.
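To see why byte 0xc8 trips the UTF-8 decoder, here is a minimal sketch using only the standard library. The header text "日期,收盘价" is an assumption chosen for illustration; the actual contents of 601988.csv are not shown in this tutorial.

```python
# Hypothetical header text, encoded the way a GBK (cp936) file would store it.
text = "日期,收盘价"
raw = text.encode("gbk")

# Decoding GBK bytes as UTF-8 fails on the very first byte.
try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
# 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte

# Decoding with the encoding the bytes were written in restores the text.
decoded = raw.decode("gbk")
print(decoded == text)  # True
```

In UTF-8, 0xc8 announces a two-byte character, so the next byte must be a continuation byte (0x80 to 0xbf). In this GBK data the next byte is 0xd5, which is why the decoder reports an invalid continuation byte.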

How to fix this error?

In this example, the CSV file is encoded as cp936 (GBK), so we should tell pandas to use this encoding when reading the file.

To find out the character encoding of a CSV file with Python, see this tutorial:

Python Get Text File Character Encoding: A Beginner Guide – Python Tutorial
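If you only need to choose between a few likely encodings, trial decoding with the standard library may be enough. The sketch below is not the method from the linked tutorial; the helper name guess_encoding, the candidate list, and the sample text are all hypothetical.

```python
import os
import tempfile


def guess_encoding(path, candidates=("utf-8", "gbk")):
    """Return the first candidate that decodes the whole file, or None."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate cannot be right, try the next one
        return enc
    return None


# Demo with a throwaway GBK file (the sample text is made up).
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write("日期,收盘价\n".encode("gbk"))
    sample_path = tmp.name

detected = guess_encoding(sample_path)
os.remove(sample_path)
print(detected)  # gbk
```

A successful decode does not prove the guess is correct. Some encodings, such as latin-1, accept any byte sequence, so list strict encodings such as utf-8 first and treat the result as a hint.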

We can fix the error by passing the file's character encoding to read_csv() through the encoding parameter. Here is an example:

import pandas as pd

file = r'data/601988.csv'

df = pd.read_csv(file, sep=',', encoding='gbk')
print(df)

Here, encoding is the character encoding of the CSV file you plan to read.
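If you do not have a GBK file at hand, the fix can be reproduced in memory. This is a minimal sketch: the sample rows are made up for illustration and stand in for the real file.

```python
import io

import pandas as pd

# Made-up sample data, encoded as GBK the way the file in this tutorial is.
raw = "日期,收盘价\n2020-03-24,3.50\n".encode("gbk")

# read_csv accepts a binary file-like object; encoding tells it how to
# decode the bytes into text.
df = pd.read_csv(io.BytesIO(raw), sep=",", encoding="gbk")
print(df.columns.tolist())  # ['日期', '收盘价']
```

Dropping encoding="gbk" from the call falls back to the UTF-8 default, which cannot decode these bytes.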

Run this code and the error no longer occurs.

