Best Practice to Extract and Remove URLs from Python String – Python Tutorial

By | August 13, 2019

Comments and other user-submitted text often contain URLs. If you want to strip them out before displaying the text, this tutorial shows how to extract and remove URLs from a Python string.


Import library

import re

Create a Python string that contains some URLs

text = 'My blog is https://www.tutorialexample.com and not https://tutorialexample.com'

Create a regular expression to match URLs

pattern = r'(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))'

Match URLs

match = re.findall(pattern, text)

Print URLs

print(match)

The output is:

[('https://www.tutorialexample.com', '', '', '', ''), ('https://tutorialexample.com', '', '', '', '')]
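
If you only need the URLs themselves, a flat list is easier to work with than a list of tuples. The following is a minimal sketch that reuses the pattern and text from this tutorial; it swaps re.findall for re.finditer and takes the whole match with group(0). The variable name urls is only illustrative.

```python
import re

# Same pattern as in the tutorial.
pattern = r'(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))'

text = 'My blog is https://www.tutorialexample.com and not https://tutorialexample.com'

# re.finditer yields match objects; group(0) is the full matched text,
# so the empty inner capture groups never show up in the result.
urls = [m.group(0) for m in re.finditer(pattern, text)]

print(urls)
# ['https://www.tutorialexample.com', 'https://tutorialexample.com']
```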

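Each item is a tuple because the pattern contains capture groups: re.findall returns one element per group. The first element is the full URL, and the remaining elements are the inner groups, which are empty here. Now that we have extracted the URLs, we can remove them from the string.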

Remove URLs from the Python string

for m in match:
    url = m[0]
    text = text.replace(url, '')

Print result

print(text)

The output is:

My blog is  and not
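
The loop above works for this example, but str.replace removes every occurrence of each extracted URL, so when one URL is a prefix of another (for example https://a.com and https://a.com/page) it can leave a fragment such as /page behind. As an alternative, here is a minimal sketch that removes the matches in one pass with re.sub and then collapses the leftover whitespace. The helper name remove_urls is hypothetical, and the whitespace cleanup is an extra step that the tutorial itself does not perform.

```python
import re

# Same pattern as in the tutorial.
pattern = r'(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))'

text = 'My blog is https://www.tutorialexample.com and not https://tutorialexample.com'


def remove_urls(s):
    """Hypothetical helper: strip URLs, then tidy the whitespace."""
    # Replace each match with an empty string in a single pass.
    no_urls = re.sub(pattern, '', s)
    # split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with ' ' normalizes spacing.
    return ' '.join(no_urls.split())


cleaned = remove_urls(text)
print(cleaned)
# My blog is and not
```

Note that collapsing whitespace also flattens line breaks; skip that step if the original layout of the text matters.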
