Find HTML Elements by Attribute in BeautifulSoup – Python BeautifulSoup Tutorial

By | August 12, 2022

It is easy to find elements in an HTML page using Python's BeautifulSoup. In this tutorial, we will show how to find elements by their HTML attributes.

HTML Element Attribute

To understand what an element attribute is, consider the example below:

<div class="s-prose js-post-body" itemprop="text">
                
<p>I do nontire small program and it threw me off. </p>

<p>How do I just play a single audio file? </p>
</div>

This div element has two attributes: class and itemprop.

We can find the div by either of these attributes.
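Before searching, it can help to see how BeautifulSoup itself stores these attributes. The following is a minimal sketch, assuming a shortened copy of the example div and the built-in html.parser; the variable names div and attrs are chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the example div (an assumption for this sketch)
html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>How do I just play a single audio file? </p></div>'

soup = BeautifulSoup(html_doc, 'html.parser')
div = soup.find("div")

# .attrs is a dict mapping attribute names to values
attrs = div.attrs
print(attrs)

# class is a multi-valued attribute, so it is stored as a list of class names
print(div["class"])

# itemprop is stored as a plain string
print(div.get("itemprop"))
```

Because class is kept as a list of separate class names, a search on class can match any one of those names, while itemprop is compared as a single string.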

How to find HTML elements by attributes?

We can use the syntax below:

soup.find_all(attrs={"attribute" : "value"})

For example, to find this div by its class attribute, we can do the following:

from bs4 import BeautifulSoup

html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>'

soup = BeautifulSoup(html_doc, 'html.parser')
eles = soup.find_all(attrs={"class" : "s-prose js-post-body"})
print(eles)

Running this code, we will see:

[<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>]

To find the div by its itemprop attribute, we can do this:

eles = soup.find_all(attrs={"itemprop" : "text"})
print(eles)
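As a side note, find_all() also accepts attribute filters as keyword arguments, which can be shorter than passing an attrs dictionary. The sketch below assumes the same example markup and compares the two forms; the names by_attrs and by_keyword are illustrative only.

```python
from bs4 import BeautifulSoup

html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')

# Dictionary form, as used in this tutorial
by_attrs = soup.find_all(attrs={"itemprop": "text"})

# Keyword form: the attribute name is passed as an argument name
by_keyword = soup.find_all(itemprop="text")

# Both searches return the same single div
print(by_attrs == by_keyword)
print(len(by_keyword), by_keyword[0].name)
```

The keyword form does not work for every attribute name. Because class is a reserved word in Python, it has to be written as class_, and names containing a hyphen (such as data-id) cannot be used as keyword arguments at all, so the attrs dictionary remains the more general form.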

However, note that for the class attribute, the search value can also be just one of the element's class names.

For example, to find the div in this example by its class attribute, we can also do the following:

eles = soup.find_all(attrs={"class" : "js-post-body"})
print(eles)

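This finds all HTML elements whose class attribute includes the class name "js-post-body". The match is made against whole class names, not substrings, so a value such as "js-post" would not match.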
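To make the matching rule concrete, here is a small sketch that tries several class values against the same example markup. The variable names are illustrative, and the last line assumes the soupsieve package (normally installed together with BeautifulSoup) is available for select().

```python
from bs4 import BeautifulSoup

html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')

# A single, complete class name matches
whole = soup.find_all(attrs={"class": "js-post-body"})

# A fragment of a class name does not match
partial = soup.find_all(attrs={"class": "js-post"})

# The full attribute string matches only in its original order
reordered = soup.find_all(attrs={"class": "js-post-body s-prose"})

# A CSS selector matches both classes regardless of order
css = soup.select("div.js-post-body.s-prose")

print(len(whole), len(partial), len(reordered), len(css))
```

When an element must carry several classes and their order in the markup is not known, the CSS selector form is usually the safer choice.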

We can also restrict the search to a particular type of HTML element. For example:

If we only want to find div elements by their class attribute, we can do the following:

eles = soup.find_all("div", attrs={"class" : "js-post-body"})
print(eles)

However, if we search for p elements with the class js-post-body, we will find nothing in this example, because neither p element has a class attribute:

eles = soup.find_all("p", attrs={"class" : "js-post-body"})
print(eles)

Running this code, we will get:

[]
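If the goal is to get the p elements inside the div that has the class js-post-body, one possible approach is to find the div first and then search within it. This is a minimal sketch using the same example markup; the names div, paragraphs and texts are illustrative only.

```python
from bs4 import BeautifulSoup

html_doc = '<div class="s-prose js-post-body" itemprop="text"><p>I do nontire small program and it threw me off. </p><p>How do I just play a single audio file? </p></div>'
soup = BeautifulSoup(html_doc, 'html.parser')

# Step 1: find the container div by its class attribute
div = soup.find("div", attrs={"class": "js-post-body"})

# Step 2: search only inside that div for p elements
paragraphs = div.find_all("p")

# Collect the text of each paragraph without surrounding whitespace
texts = [p.get_text(strip=True) for p in paragraphs]
print(texts)
```

Note that find() returns None when nothing matches, so in real pages it is worth checking the result before calling find_all() on it.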
