When we plan to read a file using Python, we need to know its charset (character encoding). In this tutorial, we will introduce how to detect it.
Read a file in Python
We can read a file as follows:
with open("test.txt", "r", encoding="utf-8") as f:
    pass
Here encoding="utf-8" means that we assume the charset of test.txt is utf-8.
However, if the charset of test.txt is not utf-8, which charset should we use?
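To see why the charset matters, here is a minimal sketch of what happens when we open a file with the wrong encoding. The helper name try_read and the GBK sample text are our own assumptions for illustration; they do not come from any library.

```python
import os
import tempfile

def try_read(path, encoding):
    """Return the file's text, or None if it cannot be decoded with `encoding`."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        # The bytes in the file are not valid in this encoding
        return None

# Write a small GBK-encoded sample file (hypothetical test data)
text = "你好，世界"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(text.encode("gbk"))
    sample_path = tmp.name

as_utf8 = try_read(sample_path, "utf-8")  # wrong charset, decoding fails
as_gbk = try_read(sample_path, "gbk")     # right charset, decoding succeeds
os.remove(sample_path)

print(as_utf8 is None, as_gbk == text)
```

Reading with utf-8 raises UnicodeDecodeError here, while reading with gbk returns the original text.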
How to detect the charset of a file in Python?
We can use the Python chardet library to detect it.
First, we should install it.
pip install -i https://mirrors.aliyun.com/pypi/simple/ chardet --trusted-host mirrors.aliyun.com
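Then we can use the code below to detect the charset of a file.
import chardet

def get_charset(fx):
    # Read the raw bytes so chardet can inspect them before any decoding
    with open(fx, "rb") as f:
        data = f.read()
    # detect() returns a dict; its "encoding" value is None if detection fails
    charset = chardet.detect(data)["encoding"]
    return charset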
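chardet.detect() returns a dictionary that contains "encoding", "confidence" and "language", and "encoding" can be None when chardet cannot make a guess. The sketch below shows one possible way to handle that. The helper choose_encoding, the default "utf-8" and the 0.5 threshold are our own assumptions, not part of chardet. To keep the example runnable without chardet installed, it uses hand-written dictionaries in the same shape.

```python
def choose_encoding(result, default="utf-8", min_confidence=0.5):
    """Pick an encoding from a chardet-style result dict.

    Falls back to `default` when no encoding was detected or when the
    confidence is below `min_confidence` (an arbitrary, assumed threshold).
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        return default
    return encoding

# Hand-written sample results (hypothetical values, same shape as chardet output)
confident = {"encoding": "GB2312", "confidence": 0.99, "language": "Chinese"}
unknown = {"encoding": None, "confidence": 0.0, "language": None}

print(choose_encoding(confident))
print(choose_encoding(unknown))
```

With chardet installed, we could call choose_encoding(chardet.detect(data)) instead of reading ["encoding"] directly.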
Finally, we can use the code below to read a file in Python.
fx = "test.txt"
charset = get_charset(fx)
print(charset)
with open(fx, "r", encoding=charset) as f:
    for line in f:
        pass
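A detected charset is only a guess, so a few bytes in the file may still fail to decode. As a hedged sketch, we can pass errors="replace" to the built-in open() so that such bytes become the replacement character U+FFFD instead of raising an error. The helper name read_text_lenient and the sample file are assumptions for illustration.

```python
import os
import tempfile

def read_text_lenient(path, charset):
    """Read a file with the given charset, replacing undecodable bytes."""
    # errors="replace" substitutes U+FFFD for bytes that do not fit the charset
    with open(path, "r", encoding=charset, errors="replace") as f:
        return f.read()

# Build a sample file: valid UTF-8 text followed by one stray byte (0xFF)
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("hello".encode("utf-8") + b"\xff")
    sample_path = tmp.name

content = read_text_lenient(sample_path, "utf-8")
os.remove(sample_path)

# The stray byte shows up as U+FFFD at the end of the text
print(content.endswith("\ufffd"))
```

This keeps the read loop from crashing, but replaced characters are lost, so it suits cases where a few damaged characters are acceptable.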
Moreover, if you want to detect a web page charset, you can read:
Python Detect Web Page Content Charset Type – Python Web Crawler Tutorial