A Simple Guide to Encode and Decode HTML Entities in Python String – Python Web Crawler Tutorial

By | July 17, 2019

After a Python crawler fetches the content of a web page, the text often contains HTML entities such as &lt; and &amp;. You should decode these entities before saving the text to a database. In this tutorial, we will introduce how to encode and decode HTML entities in a Python string.

Decode HTML Entities in Python String

In this tutorial, we use Python 3.5. The html.unescape() function used below is available in Python 3.4 and later.

Preliminaries

# import the html module from the standard library
import html

Create a Python string that needs to be decoded

html_str = '&lt;Python&gt;  is nice programming language &amp; this is a test.'

Decode string

print(html.unescape(html_str))

The result is:

<Python>  is nice programming language & this is a test.

You can then save and process the decoded text.
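Crawled pages usually mix several entity forms, not just named ones. The sketch below is a minimal illustration, assuming a made-up sample string (the variable names raw and decoded are not from the original example). It shows that html.unescape() handles named entities, decimal references, and hexadecimal references in one call.

```python
import html

# Hypothetical sample text: named (&lt;, &eacute;), decimal (&#169;)
# and hexadecimal (&#xA9;) character references in one string.
raw = "&lt;b&gt;caf&eacute;&lt;/b&gt; &#169; &#xA9; 2019"

# html.unescape() converts every reference it recognizes to the
# corresponding Unicode character.
decoded = html.unescape(raw)
print(decoded)
```

Text that contains no entities is returned unchanged, so it should be harmless to run every crawled string through the same call.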

Encode string

print(html.escape('<Python>  is nice programming language & this is a test.'))

The result is:

&lt;Python&gt;  is nice programming language &amp; this is a test.

The encoded string can then be displayed correctly in a web page: the browser shows <Python> as text instead of treating it as a tag.
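html.escape() also takes an optional quote argument, which matters when the text contains quotation marks. The following is a small sketch, assuming an invented sample string (text, escaped_all and escaped_body are illustrative names). With the default quote=True, double and single quotes are escaped as well; with quote=False, only &, < and > are replaced.

```python
import html

# Hypothetical sample text containing both kinds of quotes.
text = 'He said "1 < 2" & it\'s true.'

# Default behaviour (quote=True): quotes are escaped too, which is
# the form to use inside an HTML attribute value.
escaped_all = html.escape(text)

# quote=False leaves quotes untouched and escapes only &, < and >.
escaped_body = html.escape(text, quote=False)

print(escaped_all)
print(escaped_body)

# Decoding the escaped text gives back the original string.
print(html.unescape(escaped_all) == text)
```

For this sample, escaping and then decoding is a round trip, so the encoded form can be stored or displayed without losing the original text.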

 
