pandas makes it easy to read a CSV file. However, you may run into this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte. This tutorial explains why it happens and how to fix it.
You may read a CSV file with pandas like this:
import pandas as pd

file = r'data/601988.csv'
df = pd.read_csv(file, sep=',')
print(df)
If you run this code and the file is not UTF-8 encoded, you will get the UnicodeDecodeError shown above.
Why does this error occur?
By default, pandas reads a CSV file as UTF-8. If the file was saved in a different character encoding, its bytes may not form valid UTF-8 sequences, and a UnicodeDecodeError is raised.
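The mismatch can be reproduced without any file. The sketch below uses only the standard library; the sample text is a hypothetical header row, not the actual contents of the CSV file in this tutorial. It encodes the text as GBK and then tries to decode the bytes as UTF-8, which is roughly what happens when pandas reads a GBK file with its default encoding.

```python
# Hypothetical sample text; the real file contents are not shown in this tutorial.
text = "日期,收盘价"

# Bytes as they would be stored in a GBK-encoded file.
raw = text.encode("gbk")

try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError as err:
    # The GBK bytes are not a valid UTF-8 sequence.
    decoded_as_utf8 = False
    print(err)

# Decoding with the encoding the bytes were written in recovers the text.
roundtrip = raw.decode("gbk")
print(roundtrip)
```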
How to fix this error?
In this example, the CSV file is encoded in cp936 (GBK). We should tell pandas to use this character encoding when reading the file.
To detect the character encoding of a CSV file in Python, you can read this tutorial:
Python Get Text File Character Encoding: A Beginner Guide – Python Tutorial
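If you already have a short list of likely encodings, a simple trial-decoding approach may be enough. The function below is a hypothetical helper, not part of pandas or the linked tutorial, and the candidate list is an assumption you should adapt to your own data. It returns the first encoding that decodes the bytes without an error.

```python
def guess_encoding(raw, candidates=("utf-8", "gbk")):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper: it only checks that decoding succeeds, not that
    the decoded text is meaningful, so the order of candidates matters.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


# Usage with in-memory bytes; for a file, read it with open(path, "rb").read().
print(guess_encoding("日期".encode("gbk")))
print(guess_encoding("日期".encode("utf-8")))
```

Strict encodings such as UTF-8 are best tried first, because more permissive encodings can decode unrelated bytes into garbled text without raising an error.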
We can fix the error by passing the file's encoding to pd.read_csv() through the encoding parameter. Here is an example:
import pandas as pd

file = r'data/601988.csv'
df = pd.read_csv(file, sep=',', encoding='gbk')
print(df)
Here, encoding is the character encoding of the CSV file you plan to read.
Run this code and the file is read without the UnicodeDecodeError.
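When the encoding is not known in advance, one option is to try a few encodings in turn. The following is a minimal sketch under stated assumptions: read_csv_with_fallback is a hypothetical helper name, and the default list of encodings is only a guess suited to files like the one in this tutorial.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "gbk"), **kwargs):
    """Hypothetical helper: try each encoding until pandas can decode the file."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs)
        except UnicodeDecodeError as err:
            # Remember the failure and move on to the next candidate.
            last_error = err
    if last_error is None:
        raise ValueError("no encodings given")
    raise last_error
```

For example, read_csv_with_fallback(r'data/601988.csv', sep=',') would first try UTF-8 and then fall back to GBK. If every candidate fails, the last UnicodeDecodeError is raised again so the problem is not hidden.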