RSS feeds are an important source for capturing website content. In this tutorial, we will show how to parse an RSS feed XML file and extract the information we want using the Python feedparser library.
Install feedparser
We can install it with the pip command.
pip install feedparser
feedparser online documentation
The detailed feedparser documentation is here:
https://feedparser.readthedocs.io/en/latest/
Common RSS Elements
To parse an RSS XML file, we should know which elements are commonly used in RSS. They are:
title, link, description, publication date, and entry ID.
You can find more RSS elements here:
https://www.rssboard.org/rss-profile
Here is an example of an RSS XML file.
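To make the common elements concrete, here is a minimal sketch built around a made-up RSS 2.0 document (the blog name, URLs and date are invented for illustration). It uses Python's built-in xml.etree.ElementTree rather than feedparser, only to show where each element sits in the raw XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, invented for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>A made-up feed</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
      <description>Summary of the first post.</description>
      <pubDate>Mon, 06 Jan 2020 10:00:00 +0000</pubDate>
      <guid>https://example.com/?p=1</guid>
    </item>
  </channel>
</rss>
"""

root = ET.fromstring(SAMPLE_RSS)

# Each <item> is one article; collect its child elements as tag -> text.
first_item = root.find("channel/item")
fields = {child.tag: child.text for child in first_item}

for tag, text in fields.items():
    print(tag, "=", text)
```

In the raw XML the publication date is pubDate and the entry ID is guid. feedparser exposes these under friendlier keys such as published and id, and the item description as summary.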
How to parse an RSS feed using feedparser?
We will use an example to show how to do it.
import feedparser
d = feedparser.parse('https://www.tutorialexample.com/feed/')
In this example, we will parse our blog feed.
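Before reading the entries, it can help to check what came back. The sketch below is a small helper of our own (describe_feed is a hypothetical name, not part of feedparser) that reads the feed title, the entry count and the bozo flag, which feedparser sets when a feed is not well-formed. A plain dictionary stands in for a real result so the code runs without network access.

```python
def describe_feed(parsed):
    """Summarize a feedparser result (or any dict-like object with the same keys)."""
    feed = parsed.get("feed", {})
    return {
        "title": feed.get("title", "(no title)"),
        "entry_count": len(parsed.get("entries", [])),
        # feedparser sets bozo to a true value when the feed is not well-formed
        "well_formed": not parsed.get("bozo", False),
    }


# Stand-in for a real parse result, invented for illustration.
fake_result = {"feed": {"title": "Example Blog"}, "entries": [{}, {}], "bozo": 0}
summary = describe_feed(fake_result)
print(summary)
```

With the real library, we would pass the d object returned by feedparser.parse() instead of fake_result.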
Print the number of articles
print(len(d['entries']))
For our feed, you will get 10. The number depends on how many entries the feed publishes.
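Once we know how many entries there are, a common next step is to loop over all of them. The following sketch assumes each entry is dictionary-like, as feedparser entries are; list_articles and the sample data are our own hypothetical names and values.

```python
def list_articles(entries):
    """Return (title, link) pairs, tolerating entries that lack either field."""
    return [(e.get("title", ""), e.get("link", "")) for e in entries]


# Invented entries standing in for d['entries'].
sample_entries = [
    {"title": "First post", "link": "https://example.com/first-post"},
    {"title": "Untitled draft"},  # no link in this one
]

articles = list_articles(sample_entries)
for title, link in articles:
    print(title, "->", link)
```

Using get() with a default avoids a KeyError when a feed omits an optional element.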
Parse the first article
Notice that d['entries'] is a Python list, and each element of it is a Python dictionary.
for k, v in d['entries'][0].items():
    print(k + " = " + str(v))
Run this code and it will print every key and value of the first article.
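Usually we only need a few of these keys. The sketch below keeps title, link, summary, published and id, and also converts published_parsed (a time.struct_time in UTC that feedparser provides when it can parse the date) into a datetime. The helper name pick_fields and the sample entry are assumptions made for illustration.

```python
import calendar
import time
from datetime import datetime, timezone

WANTED = ("title", "link", "summary", "published", "id")


def pick_fields(entry):
    """Keep only the wanted keys and add a timezone-aware publication time."""
    record = {key: entry.get(key) for key in WANTED}
    parsed_time = entry.get("published_parsed")
    if parsed_time is not None:
        # published_parsed is in UTC, so timegm (not mktime) is the right inverse
        seconds = calendar.timegm(parsed_time)
        record["published_at"] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    else:
        record["published_at"] = None
    return record


# Invented entry standing in for d['entries'][0].
sample_entry = {
    "title": "First post",
    "link": "https://example.com/first-post",
    "published_parsed": time.gmtime(0),
}
record = pick_fields(sample_entry)
print(record)
```

Missing keys come back as None, so the same function works for feeds that leave out the summary or the entry ID.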
Now we can extract the information we want, process it, and save it into our database. Here is the tutorial:
Python Select, Insert, Update and Delete Data from MySQL: A Completed Guide – Python Tutorial
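As a rough sketch of the saving step, the code below uses Python's built-in sqlite3 module instead of MySQL, so it runs without a database server. The table name, column names and the save_entries helper are all assumptions for illustration. The SQL would need adjusting for MySQL.

```python
import sqlite3


def save_entries(conn, entries):
    """Insert entries into an 'articles' table and return the total row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "id TEXT PRIMARY KEY, title TEXT, link TEXT, published TEXT)"
    )
    rows = [
        # fall back to the link when the feed gives no entry ID
        (e.get("id") or e.get("link"), e.get("title"), e.get("link"), e.get("published"))
        for e in entries
    ]
    # INSERT OR IGNORE skips articles whose id is already stored
    conn.executemany("INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
sample = [
    {
        "id": "https://example.com/?p=1",
        "title": "First post",
        "link": "https://example.com/first-post",
        "published": "Mon, 06 Jan 2020 10:00:00 +0000",
    }
]
save_entries(conn, sample)
total = save_entries(conn, sample)  # saving the same article again adds nothing
print(total)
```

Keying the table on the entry ID means the feed can be parsed repeatedly without storing the same article twice.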