Fix u'\ufeff' Invalid Character When Reading File in Python

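When reading a text file in Python, we may find an unexpected \ufeff character at the start of the content. This character is the Unicode byte order mark (BOM), which some editors write at the beginning of a file. In this tutorial, we will introduce how to remove it.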

For example, we may use the code below to read a file.

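with open("test.txt", 'rb') as f:
    for line in f:
        # Decode the raw bytes as UTF-8, dropping bytes that cannot be decoded
        line = line.decode('utf-8', 'ignore')
        # Split the line into tab-separated fields
        line = line.strip().split('\t')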

Here, line holds the tab-separated fields of one line of test.txt.

However, if the file starts with a byte order mark, the first field of the first line will begin with \ufeff.
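The problem can be reproduced with a small sketch. It assumes a sample file whose content starts with the UTF-8 BOM bytes (EF BB BF); the temporary path and the sample fields are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample file: the UTF-8 BOM bytes followed by one tab-separated line.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfname\tage\n")

# Read it back the same way as in the code above.
with open(path, "rb") as f:
    first = f.readline().decode("utf-8", "ignore")

# strip() does not treat \ufeff as whitespace, so it stays in the first field.
fields = first.strip().split("\t")
print(repr(fields[0]))
```

The 'ignore' error handler does not help here, because the BOM is valid UTF-8 and decodes cleanly to \ufeff.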

How to remove \ufeff?

The simplest way is to decode with the utf-8-sig encoding, which skips a leading \ufeff if one is present.

For example:

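with open("test.txt", 'rb') as f:
    for line in f:
        # utf-8-sig drops a BOM at the start of the decoded bytes
        line = line.decode('utf-8-sig', 'ignore')
        # Split the line into tab-separated fields
        line = line.strip().split('\t')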

Then, we will find that \ufeff is removed.
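As a variation, the file can also be opened in text mode with encoding="utf-8-sig", so no manual decode step is needed. The sketch below is only an illustration: read_rows is a hypothetical helper name, and the two sample files are created just to show that the same code handles files with and without a BOM.

```python
import os
import tempfile


def read_rows(path):
    """Return each line of a tab-separated file as a list of fields."""
    # utf-8-sig skips a leading BOM if present and otherwise behaves like utf-8.
    with open(path, encoding="utf-8-sig") as f:
        return [line.strip().split("\t") for line in f]


# Hypothetical sample files: one with a BOM, one without.
tmp = tempfile.mkdtemp()
with_bom = os.path.join(tmp, "with_bom.txt")
without_bom = os.path.join(tmp, "without_bom.txt")
with open(with_bom, "wb") as f:
    f.write(b"\xef\xbb\xbfname\tage\n")
with open(without_bom, "wb") as f:
    f.write(b"name\tage\n")

rows_a = read_rows(with_bom)
rows_b = read_rows(without_bom)
print(rows_a == rows_b)
```

Both files give the same rows, so this approach does not require knowing in advance whether a UTF-8 file carries a BOM.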

2 thoughts on “Fix u’\ufeff’ Invalid Character When Reading File in Python – Python Tutorial”

Matthias January 9, 2022

Hello,
there seems to be a better way, without possibly destroying the encoding by down-converting to utf-8.
I had this problem when loading utf-16le files.

with open("", encoding="utf-16le") as f:
    line = f.readline().lstrip("\ufeff")
    …

Some remarks:
– The \ufeff is only found in the first line. It’s the beginning of the file.
– Because I don’t know which encoding an incoming file has, I did the following. Surely there is a better way but it works (on Linux):

import subprocess

output = subprocess.check_output(["file", "--mime-encoding", ""], universal_newlines=True)
encoding = output.split(" ")[1].rstrip()

with open("", encoding=encoding) as f:
    …


admin (Post author) January 10, 2022

Yes, that is a good solution.
