It is easy to find elements in an HTML page using Python's BeautifulSoup. In this tutorial, we will show how to find HTML elements by their attributes.
HTML Element Attribute
To understand what an element attribute is, consider the example below:
<div class="s-prose js-post-body" itemprop="text"> <p>I do nontire small program and it threw me off. </p> <p>How do I just play a single audio file? </p> </div>
This div has two attributes: class and itemprop.
We can find the div by either of these attributes.
How to find HTML elements by attributes
We can use the syntax below:
soup.find_all(attrs={"attribute" : "value"})
For example, to find this div by its class attribute, we can do as follows:
from bs4 import BeautifulSoup
html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')
eles = soup.find_all(attrs={"class" : "s-prose js-post-body"})
print(eles)
Running this code prints:
[<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>]
If we want to find the div by its itemprop attribute, we can do it like this:
eles = soup.find_all(attrs={"itemprop" : "text"})
print(eles)
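Both attributes can also go into the same attrs dictionary, in which case an element has to match every entry. The following is a minimal sketch that reuses the html_doc string from above; the names both and no_match, and the itemprop value "title", are only illustrative.

```python
from bs4 import BeautifulSoup

html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')

# Every key in attrs must match, so this requires both class and itemprop.
both = soup.find_all(attrs={"class": "js-post-body", "itemprop": "text"})
print(len(both))  # 1

# No element has itemprop="title" (an illustrative value), so nothing matches.
no_match = soup.find_all(attrs={"class": "js-post-body", "itemprop": "title"})
print(no_match)  # []
```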
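Note that class is a multi-valued attribute: a single element can carry several class names separated by spaces. Because of this, the value we pass does not have to be the full attribute string; a single class name is enough.
For example, to find the div in this example by its class attribute, we can also do as follows:
eles = soup.find_all(attrs={"class" : "js-post-body"})
print(eles)
This finds all HTML elements whose class attribute includes the class name "js-post-body". The match is made against a whole class name, not against an arbitrary substring of the attribute value.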
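To make the matching rule for class concrete, here is a small sketch, again assuming the same html_doc. It compares a whole class name, a fragment of a class name, and the two class names in reversed order. The last line uses soup.select() with a CSS selector as one possible alternative when the order of the classes should not matter; it is not part of the original example.

```python
from bs4 import BeautifulSoup

html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')

# A whole class name matches.
whole = soup.find_all(attrs={"class": "js-post-body"})
print(len(whole))  # 1

# A fragment of a class name does not match.
fragment = soup.find_all(attrs={"class": "js-post"})
print(len(fragment))  # 0

# The full attribute string only matches in its original order.
reversed_order = soup.find_all(attrs={"class": "js-post-body s-prose"})
print(len(reversed_order))  # 0

# A CSS selector matches both classes regardless of their order.
by_selector = soup.select("div.js-post-body.s-prose")
print(len(by_selector))  # 1
```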
We can also restrict the search to a particular type of HTML element. For example:
If we only want to find div elements by their class attribute, we can do as follows:
eles = soup.find_all("div", attrs={"class" : "js-post-body"})
print(eles)
However, if we search for p elements with the class js-post-body, we will find nothing in this example:
eles = soup.find_all("p", attrs={"class" : "js-post-body"})
print(eles)
Running this code, we get:
[]
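The list is empty because the class belongs to the div, not to the p elements inside it. One way to reach those paragraphs, sketched here under the assumption that the div is located first, is to search within the matched element. The names div, paragraphs and texts are only illustrative.

```python
from bs4 import BeautifulSoup

html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')

# find() returns the first matching element (or None if there is no match).
div = soup.find("div", attrs={"class": "js-post-body"})

# Searching from the div only looks at elements nested inside it.
paragraphs = div.find_all("p")

# get_text(strip=True) returns the text with surrounding whitespace removed.
texts = [p.get_text(strip=True) for p in paragraphs]
print(texts)
# ['I do nontire small program and it threw me off.', 'How do I just play a single audio file?']
```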