It is easy to split a Python string with the str.split() method.
For example:
text = "tutorialexample.com is a popular tutorial site."
x = text.split(".")
print(x)
Here text is split on the "." delimiter, and the result is a list of strings:
['tutorialexample', 'com is a popular tutorial site', '']
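The last element is an empty string because the text ends with the delimiter. If you do not want empty strings in the list, a minimal sketch is to filter them out with a list comprehension:

```python
text = "tutorialexample.com is a popular tutorial site."

# str.split() leaves an empty string after the final "."
parts = text.split(".")

# Keep only the non-empty pieces
non_empty = [p for p in parts if p]
print(non_empty)  # ['tutorialexample', 'com is a popular tutorial site']
```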
However, what if we want to split a Python string on several different string delimiters?
How to split a Python string with multiple string delimiters?
For example:
text = "this is a test<b>test</b>, hi good boy<em>nice</em>me"
and our delimiters are stored in a Python list:
d = ["<b>","</b>", "<em>","</em>"]
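Note that str.split() accepts only a single separator string, so it cannot take the list d directly. One regex-free workaround (not the approach this tutorial uses) is to replace every delimiter with one placeholder and then split once. This sketch assumes that the placeholder character, here the hypothetical SEP = "\x00", never appears in the text:

```python
text = "this is a test<b>test</b>, hi good boy<em>nice</em>me"
d = ["<b>", "</b>", "<em>", "</em>"]

# Hypothetical placeholder; assumed not to occur anywhere in text.
SEP = "\x00"

normalized = text
for delim in d:
    # Replace every occurrence of this delimiter with the placeholder.
    # If one delimiter contained another, the longer one should go first.
    normalized = normalized.replace(delim, SEP)

sentences = normalized.split(SEP)
print(sentences)  # ['this is a test', 'test', ', hi good boy', 'nice', 'me']
```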
We can use a Python regular expression to split on all of them at once.
import re

# "<", an optional "/", 1 to 10 characters that are not "/", then ">"
pattern = "<[/]{0,1}[^/]{1,10}>"
sentences = re.split(pattern, text)
print(sentences)
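In this example, the pattern matches every delimiter in the list d. It is actually more general than d: it matches any short tag such as <i> or </p>, not only the four delimiters listed.
Running this code gives: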
['this is a test', 'test', ', hi good boy', 'nice', 'me']
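To check which substrings the pattern actually treats as delimiters, re.findall() with the same pattern returns every match. The second string below is a made-up input showing that tags outside d are matched too:

```python
import re

text = "this is a test<b>test</b>, hi good boy<em>nice</em>me"
pattern = "<[/]{0,1}[^/]{1,10}>"

# Every substring that re.split() would cut on
matches = re.findall(pattern, text)
print(matches)  # ['<b>', '</b>', '<em>', '</em>']

# Tags that are not in d are matched as well (hypothetical input)
others = re.findall(pattern, "a<i>b</p>c")
print(others)  # ['<i>', '</p>']
```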
If you also want to keep the delimiters in the result, put the pattern inside parentheses. re.split() includes the text matched by capturing groups in the returned list:
import re

pattern = "<[/]{0,1}[^/]{1,10}>"
# The parentheses form a capturing group, so the matched delimiters are kept
sentences = re.split(f"({pattern})", text)
print(sentences)
Then, we will see:
['this is a test', '<b>', 'test', '</b>', ', hi good boy', '<em>', 'nice', '</em>', 'me']
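With exactly one capturing group, the result always alternates text, delimiter, text, and so on, starting with a (possibly empty) text piece. A small sketch that uses slicing to separate the two again:

```python
import re

text = "this is a test<b>test</b>, hi good boy<em>nice</em>me"
pattern = "<[/]{0,1}[^/]{1,10}>"

# One capturing group: [text, delimiter, text, delimiter, ..., text]
parts = re.split(f"({pattern})", text)

texts = parts[0::2]       # pieces between the delimiters
delimiters = parts[1::2]  # the captured delimiters
print(texts)       # ['this is a test', 'test', ', hi good boy', 'nice', 'me']
print(delimiters)  # ['<b>', '</b>', '<em>', '</em>']
```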
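Moreover, if it is hard to write a regular expression for your delimiters, you can build one from the list d by joining the delimiters with "|", which means "or" in a regular expression: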
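import re

text = "this is a test<b>test</b>, hi good boy<em>nice</em>me"
d = ["<b>", "</b>", "<em>", "</em>"]
# Join the delimiters with "|" (alternation); re.escape() keeps any
# delimiter that contains special regex characters literal
pattern = "|".join(re.escape(x) for x in d)
print(pattern)
sentences = re.split(f"({pattern})", text)
print(sentences)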
Running this code prints:
<b>|</b>|<em>|</em>
['this is a test', '<b>', 'test', '</b>', ', hi good boy', '<em>', 'nice', '</em>', 'me']
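Escaping matters when a delimiter contains regex metacharacters such as "." or "|". Unescaped, "." would match any character and "|" would add an empty alternative. re.escape() turns each delimiter into a literal pattern. A sketch with hypothetical text and delimiters (the printed pattern shown assumes Python 3.7 or later, where re.escape() only escapes special characters):

```python
import re

text = "price: 3.5|qty: 2"  # hypothetical input
d = [".", "|"]              # hypothetical delimiters that are regex metacharacters

# Escaped, each delimiter matches only itself
safe = "|".join(re.escape(x) for x in d)
print(safe)  # \.|\|

parts = re.split(f"({safe})", text)
print(parts)  # ['price: 3', '.', '5', '|', 'qty: 2']
```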
Another example:
import re

text = "The program calculates f0, energy and duration features from speech wav-file, performs continuous wavelet analysis"
d = ["ro", "en", "ure", "orm"]
pattern = "|".join(re.escape(x) for x in d)
print(pattern)
# Keep the delimiters in the result
sentences = re.split(f"({pattern})", text)
print(sentences)
# Drop the delimiters from the result
sentences = re.split(pattern, text)
print(sentences)
We will get:
ro|en|ure|orm
['The p', 'ro', 'gram calculates f0, ', 'en', 'ergy and duration feat', 'ure', 's f', 'ro', 'm speech wav-file, perf', 'orm', 's continuous wavelet analysis']
['The p', 'gram calculates f0, ', 'ergy and duration feat', 's f', 'm speech wav-file, perf', 's continuous wavelet analysis']
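Python's regex alternation tries the alternatives from left to right and uses the first one that matches, so when one delimiter is a prefix of another, the shorter one can win. Sorting the delimiters by length, longest first, before joining avoids this. A sketch with hypothetical delimiters, where build_pattern is an illustrative helper, not part of the re module:

```python
import re

text = "The program calculates f0"
d = ["ro", "rog"]  # hypothetical: "ro" is a prefix of "rog"

def build_pattern(delims):
    # Hypothetical helper: longest delimiters first, so "rog" is tried before "ro"
    ordered = sorted(delims, key=len, reverse=True)
    return "|".join(re.escape(x) for x in ordered)

naive = "|".join(re.escape(x) for x in d)
naive_parts = re.split(f"({naive})", text)
ordered_parts = re.split(f"({build_pattern(d)})", text)
print(naive_parts)    # ['The p', 'ro', 'gram calculates f0']
print(ordered_parts)  # ['The p', 'rog', 'ram calculates f0']
```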