Fix u'\ufeff' Invalid Character When Reading a File in Python – Python Tutorial

By | January 7, 2022

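When we read a text file in Python, the character \ufeff may show up at the start of the content. This is the Unicode byte order mark (BOM), U+FEFF, which some editors write at the beginning of UTF-8 files. In this tutorial, we will introduce how to remove it.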

For example:

We may use the code below to read a file.

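# Open in binary mode and decode each line by hand
with open("test.txt", 'rb') as f:
    for line in f:
        # Plain utf-8 keeps a leading BOM as the character \ufeff
        line = line.decode('utf-8', 'ignore')
        # Split the line into tab-separated fields
        line = line.strip().split('\t')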

Here line is the list of tab-separated fields from each line of test.txt.

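However, if test.txt was saved with a BOM, we may find \ufeff at the start of the first field of the first line. The utf-8 codec decodes the BOM as an ordinary character, and strip() does not remove it because \ufeff is not whitespace.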
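To see the problem without needing a real file, here is a minimal sketch that decodes a hand-built byte string. The helper first_field_with_utf8 and the sample data are made up for illustration; only codecs.BOM_UTF8 and bytes.decode come from Python itself.

```python
import codecs

def first_field_with_utf8(data: bytes) -> str:
    """Decode bytes as plain UTF-8 and return the first tab-separated field."""
    line = data.decode('utf-8', 'ignore')
    return line.strip().split('\t')[0]

# Hypothetical sample: the UTF-8 BOM bytes followed by one tab-separated line.
sample = codecs.BOM_UTF8 + b"name\tage\n"

field = first_field_with_utf8(sample)
# repr() makes the otherwise invisible BOM character visible.
print(repr(field))
```

The first field still carries the BOM, so a comparison such as field == 'name' would fail even though the text looks correct when printed.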

How to remove \ufeff?

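The simplest way is to use the utf-8-sig encoding. It skips a BOM at the start of the data when decoding and behaves like plain utf-8 when there is no BOM.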

For example:

with open("test.txt", 'rb') as f:
    for line in f:
        # utf-8-sig drops a BOM found at the start of the bytes being decoded
        line = line.decode('utf-8-sig', 'ignore')
        line = line.strip().split('\t')

Then, we will find that \ufeff is removed.
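As an alternative to decoding each line by hand, the file can be opened in text mode with encoding='utf-8-sig', so Python strips the BOM once at the start of the file. The sketch below is one possible way to do it, not the only one; the function name read_rows and the temporary demo file are assumptions made for this example.

```python
import codecs
import os
import tempfile

def read_rows(path):
    """Read a tab-separated file, dropping a leading UTF-8 BOM if present."""
    rows = []
    # Text mode with utf-8-sig removes the BOM at the start of the file.
    with open(path, 'r', encoding='utf-8-sig', errors='ignore') as f:
        for line in f:
            rows.append(line.strip().split('\t'))
    return rows

# Demo: write a small hypothetical file that starts with a BOM, then read it.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + b"name\tage\nTom\t20\n")
    demo_path = tmp.name

rows = read_rows(demo_path)
os.remove(demo_path)
print(rows)
```

The same function also works on files saved without a BOM, so it is safe to use when we do not know how the file was written.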