We often use the Python BeautifulSoup package to parse an HTML page and get its tags. However, the .string attribute of a tag often returns None. In this tutorial, we will use some examples to show you how to fix this problem.
Parse an HTML page with BeautifulSoup
Here is an example:
from bs4 import BeautifulSoup
html_content = '<html><div><span>Tutorial Example</span> https://www.tutorialexample.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")
Parse an HTML string and get all div tags
tags = soup.find_all('div')
Output the content of each div tag
for tag in tags:
    print(tag.string)
We plan to use the .string attribute to output the text in each div tag.
Run this Python code and you will get this result: None
Why does .string return None?
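In this example, the div tag has two children: a span tag and a text string. The .string attribute only returns a value when a tag has exactly one child, which is either a single text string (no child tag) or a single child tag that itself has a .string. If a tag has more than one child, .string is None.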
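One way to check this is to look at the direct children of the div tag. Here is a minimal sketch, assuming the same html_content as in the example above; it uses the .contents attribute of a tag, which lists its direct children:

```python
from bs4 import BeautifulSoup

html_content = '<html><div><span>Tutorial Example</span> https://www.tutorialexample.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")
div = soup.find('div')

# .contents lists the direct children of the div: a span tag and a text string
children = div.contents
for child in children:
    print(type(child).__name__, repr(child))

# The div has more than one child, so .string is None
print(len(children), div.string)
```

The div tag has two children here, so BeautifulSoup cannot decide which one .string should refer to.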
If the html is:
html_content = '<html><div>https://www.tutorialexample.com</div></html>'
There is no child tag inside the div tag, only text. After parsing this HTML and finding the div tags again, run:
for tag in tags:
    print(tag.string)
The result will be: https://www.tutorialexample.com
Moreover, if the html is:
html_content = '<html><div><span>https://www.tutorialexample.com</span></div></html>'
There is only one child tag, a span, in the div tag. The result is also: https://www.tutorialexample.com
For this HTML:
html_content = '<html><div><span>Tutorial Example</span> <span>https://www.tutorialexample.com</span></div></html>'
There are two span tags in the div tag, so the .string of the div tag is None.
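The cases above can be compared side by side. This is only a sketch: the dictionary keys are labels chosen for illustration, and each snippet is parsed on its own.

```python
from bs4 import BeautifulSoup

url = 'https://www.tutorialexample.com'

# One HTML snippet per case discussed above (the keys are only labels)
cases = {
    'text only': '<div>' + url + '</div>',
    'one span': '<div><span>' + url + '</span></div>',
    'two spans': '<div><span>Tutorial Example</span> <span>' + url + '</span></div>',
    'span and text': '<div><span>Tutorial Example</span> ' + url + '</div>',
}

results = {}
for label, html in cases.items():
    div = BeautifulSoup(html, "html.parser").find('div')
    # .string is a text string for a single child, otherwise None
    results[label] = div.string
    print(label, '->', results[label])
```

Only the first two cases print the URL. The last two print None, because the div tag has more than one child.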
How to get the text in a div tag if .string is None?
We can use the .text attribute, which returns all the text inside a tag, including the text of its child tags. Here is an example:
for tag in tags:
    print(tag.text)
The text in html div tag is:
Tutorial Example https://www.tutorialexample.com
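If you need more control over whitespace, the get_text() method and the .stripped_strings generator may also help. Here is a minimal sketch, assuming the first html_content of this tutorial; the variable names joined and pieces are only for illustration:

```python
from bs4 import BeautifulSoup

html_content = '<html><div><span>Tutorial Example</span> https://www.tutorialexample.com</div></html>'
soup = BeautifulSoup(html_content, "html.parser")

for tag in soup.find_all('div'):
    # Strip each text piece, then join the pieces with a single space
    joined = tag.get_text(separator=' ', strip=True)
    # Or keep the stripped text pieces separate in a list
    pieces = list(tag.stripped_strings)
    print(joined)
    print(pieces)
```

The first form is handy when you want one clean line of text, the second when you want to handle each piece of text on its own.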