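When reading a text file in Python, the first string we get may start with the unexpected character \ufeff. This is the Unicode byte order mark (BOM), which some editors write at the beginning of UTF-8 files. In this tutorial, we will introduce how to remove it.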
For example:
We may use the code below to read a file.
with open("test.txt", 'rb') as f:
    for line in f:
        line = line.decode('utf-8', 'ignore')
        line = line.strip().split('\t')
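Here line is the list of tab-separated fields from each line of test.txt.
However, we may find \ufeff at the start of the first field of the first line. strip() does not remove it, because \ufeff is not a whitespace character.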
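To see where the character comes from, here is a minimal sketch that builds the bytes in memory instead of reading test.txt. The sample content ("name" and "value") is made up for illustration; codecs.BOM_UTF8 is the standard library constant for the three BOM bytes.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by one tab-separated line.
raw = codecs.BOM_UTF8 + "name\tvalue\n".encode("utf-8")

# Decoding with plain utf-8 keeps the BOM as the character \ufeff.
fields = raw.decode("utf-8", "ignore").strip().split("\t")

print(repr(fields[0]))  # → '\ufeffname'
```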
How to remove \ufeff?
The simplest way is to use the utf-8-sig encoding, which decodes UTF-8 and skips the BOM if one is present at the start of the data.
For example:
with open("test.txt", 'rb') as f:
    for line in f:
        line = line.decode('utf-8-sig', 'ignore')
        line = line.strip().split('\t')
Then, we will find that \ufeff has been removed.
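As an alternative, we can let open() do the decoding by opening the file in text mode with encoding="utf-8-sig", so no manual decode() call is needed. The sketch below is one possible way to do this; read_rows is a hypothetical helper name, and the sample file is created in a temporary directory so the example runs on its own.

```python
import codecs
import os
import tempfile


def read_rows(path):
    """Return each line of a tab-separated file as a list of fields.

    Hypothetical helper: utf-8-sig drops a leading BOM if the file has one.
    """
    rows = []
    with open(path, "r", encoding="utf-8-sig") as f:
        for line in f:
            rows.append(line.strip().split("\t"))
    return rows


# Create a sample file that starts with a BOM, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"name\tvalue\n1\t2\n")
    rows = read_rows(path)

print(rows)  # → [['name', 'value'], ['1', '2']]
```

utf-8-sig also reads files that have no BOM, so it is safe to use when we do not know how the file was saved.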