Fix UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0

When you crawl a web page, you may get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0. This tutorial shows how to fix it.

Code that generates this error

content = crawl_response.read().decode("utf-8")

When you run this code, you may get this error:

content = crawl_response.read().decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0: invalid start byte
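
To see why Python calls this an invalid start byte, the short sketch below decodes a made-up two-byte payload that begins with 0x8b. The payload is an assumption for illustration only, not the real response body. In UTF-8, a byte of the form 10xxxxxx (0x80 to 0xbf) can only continue a multi-byte character, so 0x8b can never come first in valid text.

```python
# Hypothetical payload: stands in for compressed bytes that start with 0x8b.
raw = b"\x8b\x00"

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# 0x8b is 0b10001011: the top two bits are "10", which marks a continuation byte.
is_continuation_byte = (raw[0] & 0b11000000) == 0b10000000
print(is_continuation_byte)
```

An error at position 0 therefore usually means the body is binary data, such as a compressed stream, rather than text in some other character set.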

If you do not call decode("utf-8") and print the result of read() directly, you will see raw bytes instead of readable text.

From this output, you can see that the content is not UTF-8 encoded text.

Check the Content-Encoding response header

We find:

Content-Encoding: br

This means the response content is compressed with the Brotli algorithm. To print it correctly, you must decompress it first.
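
One way to read this header in code is sketched below. It assumes crawl_response comes from urllib.request.urlopen(), whose headers attribute is a case-insensitive, email.message.Message-style mapping; the tutorial does not show how crawl_response was created, so treat that as an assumption. The helper name content_encoding and the demo_headers object are hypothetical and exist only so the example runs without a network request.

```python
from email.message import Message


def content_encoding(headers):
    """Return the lower-cased Content-Encoding value, or "identity" if it is absent."""
    value = headers.get("Content-Encoding") or "identity"
    return value.strip().lower()


# Stand-in for crawl_response.headers (hypothetical demo data).
demo_headers = Message()
demo_headers["Content-Encoding"] = "br"

print(content_encoding(demo_headers))
```

With a real response, the call would be content_encoding(crawl_response.headers).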

Understand Content-Encoding: br and Decompress String – Python Web Crawler Tutorial

Here is a simple example that decompresses content compressed with the Brotli algorithm. You can use it to learn how to decompress the response body.

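import brotli  # third-party package: pip install brotli

# Read the raw (still compressed) bytes from the response
content = crawl_response.read()
# Decompress the Brotli data, then decode the resulting bytes as UTF-8
content = brotli.decompress(content)
content = content.decode("utf-8")
print(content)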
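
A crawler often meets gzip and deflate responses as well as Brotli ones, so a more general version might choose the decompressor from the header value. The sketch below is one possible approach rather than a definitive implementation: decode_body is a hypothetical helper, it imports brotli only when it is needed, and it assumes "deflate" bodies use the zlib format, which some servers do not follow.

```python
import gzip
import zlib


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Decompress raw bytes according to Content-Encoding, then decode them to text."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "br":
        import brotli  # third-party package, only required for Brotli responses
        raw = brotli.decompress(raw)
    elif encoding in ("gzip", "x-gzip"):
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        raw = zlib.decompress(raw)  # assumes zlib-wrapped deflate data
    elif encoding != "identity":
        raise ValueError("unsupported Content-Encoding: " + encoding)
    return raw.decode(charset)


# Demo with gzip so the example runs without the brotli package installed.
sample = gzip.compress("hello crawler".encode("utf-8"))
print(decode_body(sample, "gzip"))
```

With the names used in this tutorial, the call would look like decode_body(crawl_response.read(), crawl_response.headers.get("Content-Encoding")).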
