Python Detect File Charset and Read – Python Tutorial

March 8, 2023

Before we read a text file in Python, we need to know its charset (character encoding). In this tutorial, we will show how to detect it.

Read a file in Python

We can read a file as follows:

with open("test.txt", "r", encoding = "utf-8") as f:
    pass

Here encoding="utf-8" tells Python to decode the bytes of test.txt as UTF-8.

However, if the charset of test.txt is not utf-8, reading it may raise a UnicodeDecodeError or return garbled text. Which charset should we use then?
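To see why the encoding matters, here is a minimal sketch that writes a small GBK-encoded file and then tries to read it as UTF-8. The sample text and the temporary file are illustrative assumptions, not part of the original example.

```python
import os
import tempfile

# Illustrative data: write a short GBK-encoded file to a temporary path
text = "你好, world"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(text.encode("gbk"))
    path = tmp.name

# Decoding GBK bytes as UTF-8 fails for this content
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("utf-8 failed:", exc)

# Reading with the encoding the file was written in recovers the text
with open(path, "r", encoding="gbk") as f:
    recovered = f.read()

os.remove(path)
```

A wrong encoding does not always raise an error. Some byte sequences decode "successfully" into the wrong characters, which is why detecting the charset first is useful.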

How to detect the charset of a file in Python?

We can use the Python chardet library to detect it.

First, we should install it. The command below uses the Aliyun PyPI mirror; a plain pip install chardet also works.

pip install -i https://mirrors.aliyun.com/pypi/simple/ chardet --trusted-host mirrors.aliyun.com

Then we can use the code below to detect the charset of a file.

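import chardet

def get_charset(fx):
    # Read raw bytes: chardet works on bytes, not on decoded text
    with open(fx, "rb") as f:
        data = f.read()
    # detect() returns a dict; "encoding" can be None if detection fails
    return chardet.detect(data)["encoding"]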
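chardet.detect() also reports a "confidence" value between 0 and 1 alongside the encoding name. The sketch below is one possible variant that rejects weak guesses and only reads a prefix of the file. The name detect_charset, the 0.5 threshold, the 64 KB sample size and the detect parameter are all assumptions chosen for illustration.

```python
def detect_charset(path, min_confidence=0.5, sample_size=64 * 1024, detect=None):
    """Return the detected encoding name, or None when the guess is too weak.

    min_confidence and sample_size are illustrative defaults, not values
    recommended by chardet. detect is a hypothetical hook so the function
    can be exercised with a stand-in detector.
    """
    if detect is None:
        import chardet  # imported lazily; only needed for the default detector
        detect = chardet.detect

    # Read only a prefix so large files do not have to fit in memory
    with open(path, "rb") as f:
        sample = f.read(sample_size)

    result = detect(sample)
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0

    # Treat a missing or low-confidence guess as "unknown"
    if encoding is None or confidence < min_confidence:
        return None
    return encoding
```

Calling detect_charset("test.txt") uses chardet by default. A shorter sample is faster but may make the guess less reliable, so the sample size is a trade-off to tune for your files.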

Finally, we can use the code below to read a file in Python.

fx = "test.txt"
charset = get_charset(fx)
print(charset)
with open(fx, 'r', encoding=charset) as f:
    for line in f:
        pass
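Detection is a guess, so the reported charset can be missing or wrong. The sketch below shows one way to read a file defensively: it tries the detected charset first and falls back to UTF-8 with replacement characters. The helper name read_text and the choice of UTF-8 as the fallback are assumptions for illustration.

```python
def read_text(path, charset=None, fallback="utf-8"):
    """Read path as text using charset, falling back if decoding fails.

    read_text and the utf-8 fallback are illustrative choices.
    """
    try:
        # Use the detected charset when there is one, else the fallback
        with open(path, "r", encoding=charset or fallback) as f:
            return f.read()
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: bytes do not match the charset
        # LookupError: Python does not know the encoding name
        with open(path, "r", encoding=fallback, errors="replace") as f:
            return f.read()
```

For example, read_text(fx, get_charset(fx)) reads the file with the detected charset. With errors="replace", undecodable bytes become the U+FFFD replacement character instead of raising an exception, so some characters may be lost.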

Moreover, if you want to detect the charset of a web page, you can read:

Python Detect Web Page Content Charset Type – Python Web Crawler Tutorial