Fix u'\ufeff' Invalid Character When Reading File in Python – Python Tutorial

January 7, 2022

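When reading a text file in Python, we may find an unexpected \ufeff character in the content. This is the Unicode byte order mark (BOM), which some editors and tools write at the very start of a file. In this tutorial, we will introduce how to remove it.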

For example, we may use the code below to read a file:

with open("test.txt", 'rb') as f:
    for line in f:
        line = line.decode('utf-8', 'ignore')
        line = line.strip().split('\t')

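Here line holds the tab-separated fields of one line in test.txt.

However, we may find \ufeff at the start of the first line, because the utf-8 codec decodes the BOM as an ordinary character instead of skipping it.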
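To see where the character comes from, the sketch below writes a small file that starts with a BOM and decodes it the same way as above. The file name bom_demo.txt and its contents are made up for illustration. It only shows that the BOM bytes decode to '\ufeff' and that strip() leaves it in place, since it is not whitespace.

```python
import os
import tempfile

# Hypothetical demo file: the "utf-8-sig" codec writes the BOM bytes EF BB BF first.
path = os.path.join(tempfile.mkdtemp(), "bom_demo.txt")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("name\tvalue\n")

with open(path, "rb") as f:
    raw = f.readline()

bom_bytes = raw[:3]                  # b'\xef\xbb\xbf'
first_line = raw.decode("utf-8", "ignore")

# strip() removes the trailing newline but not the BOM character.
fields = first_line.strip().split("\t")
print(fields)                        # ['\ufeffname', 'value']
```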

How to remove \ufeff?

The simplest way is to decode with the utf-8-sig encoding, which skips a BOM at the start of the data and otherwise behaves like utf-8.

For example:

with open("test.txt", 'rb') as f:
    for line in f:
        line = line.decode('utf-8-sig', 'ignore')
        line = line.strip().split('\t')

Then we will find that \ufeff is removed.
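If the file can be opened in text mode, the same codec can be passed directly to open(), so no manual decode() call is needed. This is a minimal sketch, assuming a tab-separated UTF-8 file. The helper name read_tsv_rows and the two demo files are hypothetical.

```python
import os
import tempfile


def read_tsv_rows(path):
    """Return every line of the file as a list of tab-separated fields."""
    rows = []
    # "utf-8-sig" skips a leading BOM if there is one; files without a BOM
    # are read exactly as with "utf-8".
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            rows.append(line.rstrip("\n").split("\t"))
    return rows


# Demo: the same content saved with and without a BOM reads back identically.
folder = tempfile.mkdtemp()
bom_path = os.path.join(folder, "with_bom.txt")
plain_path = os.path.join(folder, "without_bom.txt")
with open(bom_path, "w", encoding="utf-8-sig") as f:
    f.write("name\tvalue\na\t1\n")
with open(plain_path, "w", encoding="utf-8") as f:
    f.write("name\tvalue\na\t1\n")

with_bom = read_tsv_rows(bom_path)
without_bom = read_tsv_rows(plain_path)
```

Because the codec handles both cases, the calling code does not need to know in advance whether a file was saved with a BOM.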

2 thoughts on "Fix u'\ufeff' Invalid Character When Reading File in Python – Python Tutorial"

  1. Matthias

    Hello,
    there seems to be a better way, without possibly destroying the encoding by down-converting to utf-8.
    I had this problem when loading utf-16le files.

    with open("", encoding="utf-16le") as f:
        line = f.readline().lstrip("\ufeff")

    Some remarks:
    – The \ufeff is only found in the first line. It’s the beginning of the file.
    – Because I don’t know which encoding an incoming file has, I did the following. Surely there is a better way but it works (on Linux):

    import subprocess

    output = subprocess.check_output(["file", "--mime-encoding", ""], universal_newlines=True)
    encoding = output.split(" ")[1].rstrip()

    with open("", encoding=encoding) as f:
        line = f.readline().lstrip("\ufeff")
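As a possible alternative to calling the external file command, the encoding can often be guessed from the BOM itself using only the standard library. The sketch below is an illustration rather than a general detector. The function name guess_encoding_from_bom, the fallback to utf-8, and the demo file are assumptions, and a file without a BOM cannot be identified this way.

```python
import codecs
import os
import tempfile

# The UTF-32 marks are checked first because BOM_UTF32_LE begins with the
# same two bytes as BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(path, default="utf-8"):
    """Return a codec name that consumes the file's BOM, or `default` if none is found."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


# Demo with a hypothetical UTF-16 file; the "utf-16" codec writes a BOM.
path = os.path.join(tempfile.mkdtemp(), "utf16_demo.txt")
with open(path, "w", encoding="utf-16") as f:
    f.write("hello\n")

encoding = guess_encoding_from_bom(path)
with open(path, encoding=encoding) as f:
    line = f.readline()   # no leading '\ufeff', so lstrip() is not needed
```

The generic utf-16 and utf-32 codecs read the BOM to pick the byte order and then drop it, whereas utf-16le keeps it in the decoded text, which is why the lstrip("\ufeff") call was needed in the comment above.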
