After reading text from a text file, we often need to remove some special characters. This tutorial shows Python beginners how to remove them.
Special Characters
There is no fixed definition of special characters; what counts as special depends on the application.
For English text, the common characters are the printable ASCII characters (space through ~). All other characters are treated as special characters.
To learn which characters are printable, you can read the tutorial below:
An Introduction to ASCII (0 – 255) for Beginners
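Before removing anything, it can help to see which characters in a string count as special under this definition. The snippet below is a minimal sketch, assuming "printable" means the ASCII code points 0x20 (space) through 0x7E (~); the helper name find_special is hypothetical and is only used for illustration.

```python
def find_special(text):
    """Return the characters in text that fall outside printable ASCII.

    Assumption: printable means code points 0x20 (space) to 0x7E (~).
    """
    return [ch for ch in text if not (0x20 <= ord(ch) <= 0x7E)]


# The copyright sign and the tab are both outside the printable range.
print(find_special("©tutorialexample.com\tis a blog site."))
# → ['©', '\t']
```

Note that under this definition tabs and newlines are also reported as special, which may or may not be what your application wants.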
How to remove special characters?
If you only want to keep the printable English characters, you can do it like this:
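import re

text = "©tutorialexample.com is a blog site."
# Match every character outside printable ASCII, space (\x20) through ~ (\x7E).
# \x7F is the non-printable DEL control character, so the range stops at \x7E.
pattern = re.compile(r'[^\x20-\x7E]')
text = pattern.sub('', text)
print(text)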
Here text contains the special character ©, and we remove it.
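Text read from a file usually contains newlines and tabs, which also lie outside the printable range and would be stripped by a pattern that only keeps space through ~. The following is a minimal sketch of one way to keep them, assuming you want line breaks and tabs to survive; the helper name keep_printable is hypothetical.

```python
import re

# Keep printable ASCII (space through ~) plus newline and tab.
# Assumption: line breaks and tabs should be preserved for file content.
NON_PRINTABLE = re.compile(r'[^\x20-\x7E\n\t]')


def keep_printable(text):
    """Remove every character that is not printable ASCII, newline, or tab."""
    return NON_PRINTABLE.sub('', text)


cleaned = keep_printable("©line one\nline\ttwo\x7f")
print(repr(cleaned))
# → 'line one\nline\ttwo'
```

Compiling the pattern once at module level avoids recompiling it when the helper is called for every line of a large file.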
However, if you already know which special characters you want to remove, you can do it like this:
text = "©tutorialexample.com is a blog site."
# Characters to remove
sp = ['©', 'a']
# Keep only the characters that are not in sp
text = [t for t in text if t not in sp]
print(''.join(text))
In this example, '©' and 'a' are treated as special characters, so we remove them. You can replace them with your own special characters.
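For longer texts, the same removal can also be done with the built-in str.translate method instead of a list comprehension. This is an alternative sketch, not the method used above; the helper name remove_chars is hypothetical, and it assumes every entry in the list is a single character.

```python
def remove_chars(text, chars):
    """Remove every character listed in chars from text.

    Assumption: each item in chars is a single-character string.
    """
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans('', '', ''.join(chars))
    return text.translate(table)


print(remove_chars("©tutorialexample.com is a blog site.", ['©', 'a']))
# → tutorilexmple.com is  blog site.
```

The output keeps two spaces between "is" and "blog" because the word "a" is removed while the spaces around it are not.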