Fix UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0 – Python Tutorial

By | July 16, 2019

When you crawl a web page, you may get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0. In this tutorial, we will show how to fix it.

Code that generates this error

content = crawl_response.read().decode("utf-8")

When you run this code, you may get this error:

  content = crawl_response.read().decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0: invalid start byte
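
The error can be reproduced without any network request, which helps show that the problem is in the bytes themselves rather than in the crawler. The following is a minimal sketch; the bytes in raw are made up for illustration and are not a real compressed response.

```python
# Hypothetical bytes that merely start with 0x8b, like the response in this tutorial.
raw = bytes([0x8B, 0x02, 0x80]) + b"hello"

try:
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    # 0x8b is a UTF-8 continuation byte, so it cannot begin a character.
    message = str(exc)

print(message)
```

Because the very first byte is invalid as the start of a UTF-8 character, the decoder stops at position 0.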

If you skip decode("utf-8") and print the raw bytes instead, you may get output like this:

[Image: http response br encoding]

The output shows that the content is not plain UTF-8 text.

Check the Content-Encoding response header

We find:

Content-Encoding: br

This means the response content is compressed with the Brotli algorithm. To print it correctly, you must decompress it first.
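
The header check can also be done in code. The sketch below assumes crawl_response comes from urllib.request.urlopen, whose headers attribute behaves like email.message.Message; the helper name get_content_encoding is hypothetical, and a hand-built Message stands in for the real response headers.

```python
from email.message import Message


def get_content_encoding(headers) -> str:
    """Return the lower-cased Content-Encoding value, or "identity" if absent."""
    value = headers.get("Content-Encoding")
    return value.strip().lower() if value else "identity"


# Stand-in for crawl_response.headers (assumption: a urllib response object).
example_headers = Message()
example_headers["Content-Encoding"] = "br"

print(get_content_encoding(example_headers))  # br
```

With a real response, the call would be get_content_encoding(crawl_response.headers).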

Understand Content-Encoding: br and Decompress String – Python Web Crawler Tutorial

Here is a simple example that decompresses content compressed with the Brotli algorithm. You can use it to learn how to decompress the response before decoding it.

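import brotli  # third-party package, e.g. pip install brotli

# Read the raw (still compressed) response body
content = crawl_response.read()
# Decompress the Brotli data, then decode the resulting bytes as UTF-8
content = brotli.decompress(content)
content = content.decode("utf-8")
print(content)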
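
A crawler usually meets more than one encoding, so it can help to choose the decompression step from the Content-Encoding value instead of always assuming Brotli. The following is a rough sketch, not a complete implementation: decode_body is a hypothetical helper, only br and gzip are handled, and the brotli package is imported only when it is actually needed.

```python
import gzip


def decode_body(raw: bytes, content_encoding: str = "", charset: str = "utf-8") -> str:
    """Decompress raw according to content_encoding, then decode it to text."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "br":
        import brotli  # third-party package, imported lazily
        raw = brotli.decompress(raw)
    elif encoding in ("gzip", "x-gzip"):
        raw = gzip.decompress(raw)
    # Any other value is treated as uncompressed in this sketch.
    return raw.decode(charset)


# Usage with a locally built gzip body, so no network or brotli install is needed.
sample = gzip.compress("hello crawler".encode("utf-8"))
print(decode_body(sample, "gzip"))
```

With a real response, the call would look like decode_body(crawl_response.read(), crawl_response.headers.get("Content-Encoding")).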
