Using a regular expression to split a string delimited by multiple spaces
I'm trying to split a string whose fields are delimited by multiple spaces, e.g.:
import re
string1 = "abcd    efgh    a. abcd    b efgh"
print(re.findall(r"[\w.]+", string1))
As expected, the result is:
['abcd', 'efgh', 'a.', 'abcd', 'b', 'efgh']
However, I would like to group 'a.' and 'abcd' into the same group, and
'b' and 'efgh' into the same group. So the result I want would look
something like:
['abcd', 'efgh', 'a. abcd', 'b efgh']
My approach at the moment is to write two expressions: the first to deal
with the tokens that contain no space, i.e. 'abcd' and 'efgh', and the
second to deal with the ones that contain a single space, i.e. 'a. abcd'.
So r'[\w]+' can deal with the first type, and r'[\w]+ [\w]+' can deal
with the second type, but I don't know how to combine them into the same
expression using '|'.
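For reference, here is a minimal sketch of how the two patterns might be combined. It assumes the fields are separated by two or more spaces and that a field contains at most one single space. The character class is [\w.] rather than [\w] so that 'a.' is matched, as in the findall call above. The name pattern is just an illustrative choice.

```python
import re

# Assumption: fields are separated by two or more spaces, while a single
# space may appear inside a field.
string1 = "abcd    efgh    a. abcd    b efgh"

# The two-word alternative is listed first: alternation is tried left to
# right, so a leading one-word alternative would match "a." on its own
# and the two-word alternative would never get a chance.
pattern = r"[\w.]+ [\w.]+|[\w.]+"

result = re.findall(pattern, string1)
print(result)
```

The same idea can be written without '|' as r"[\w.]+(?: [\w.]+)?", i.e. one word optionally followed by a single space and a second word.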
As always, any other approaches are welcome. And thanks for your time!
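One other approach, sketched here under the same assumption that the real delimiter is a run of two or more whitespace characters, would be to split on the delimiter instead of matching the fields:

```python
import re

# Assumption: any run of two or more whitespace characters is a delimiter;
# a single space is part of the field.
string1 = "abcd    efgh    a. abcd    b efgh"

# strip() avoids empty strings at the ends if the input has
# leading or trailing whitespace.
fields = re.split(r"\s{2,}", string1.strip())
print(fields)
```

This does not restrict which characters a field may contain, which may or may not be what is wanted.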