After a Python crawler fetches a web page, you will usually want to decode the HTML entities in its text before saving it to a database. This tutorial shows how to encode and decode HTML entities in a Python string.
This tutorial uses Python 3.5. The html.unescape() function used here is available in Python 3.4 and later.
Preliminaries
# import the html module
import html
Create a Python string that needs decoding
html_str = '&lt;Python&gt; is nice programming language &amp; this is a test.'
Decode the string
print(html.unescape(html_str))
The result is:
<Python> is nice programming language & this is a test.
You can then save and process the decoded text.
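Crawled pages often mix named entities with numeric character references. As a small illustrative sketch (the sample strings are made up for this example), html.unescape() decodes all three forms to the same character:

```python
import html

# Hypothetical samples: the same "<p>" tag written three ways.
samples = [
    '&lt;p&gt;',      # named entities
    '&#60;p&#62;',    # decimal character references
    '&#x3c;p&#x3e;',  # hexadecimal character references
]

# html.unescape() handles every form in a single call.
decoded = [html.unescape(s) for s in samples]
print(decoded)
```

Each entry in decoded is the plain string '<p>'.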
Encode the string
print(html.escape('<Python> is nice programming language & this is a test.'))
The result is:
&lt;Python&gt; is nice programming language &amp; this is a test.
The escaped string can then be displayed correctly in a web page.
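By default html.escape() also escapes double and single quotes, which matters when the text goes inside an HTML attribute value. The sketch below uses a made-up sample string to compare the default with quote=False and to check that escaping followed by unescaping returns the original text:

```python
import html

# Hypothetical sample containing quotes, '<' and '&'.
text = 'He said "1 < 2" & left'

# Default (quote=True): &, <, > and both quote characters are escaped.
escaped_all = html.escape(text)

# quote=False: only &, < and > are escaped; quotes are left alone.
escaped_body = html.escape(text, quote=False)

# Unescaping the escaped text gives back the original string.
round_trip = html.unescape(escaped_all)

print(escaped_all)
print(escaped_body)
print(round_trip == text)
```

The default is the safer choice for attribute values, while quote=False keeps ordinary body text more readable.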