Python regex: match a line whether it exists or not


I have a little problem with a regex.

Here is a sample of the text I need to parse:

output = """
country : usa
zzzzzzz
continent : americ
eeeeeee
------
country : china
zzzzzzz
continent : asia
planet : earth
-------
country : izbud
zzzzzzz
continent : gladiora
zzzzzzz
zzzzzzz
planet : mars
"""

I want to parse it and return the country, the continent and, when it is present, the planet for each record.

So I wrote this regex:

results = re.findall(
    r"""(?mx)
        ^country\s:\s*(.+)\s
        (?:^.+\s)*?
        ^continent\s:\s*(.+)\s
        (?:^.+\s)*?
        (?:^planet\s:\s*(.+)\s)*?
    """, output)

But it returns:

[('usa', 'americ', ''), ('china', 'asia', ''), ('izbud', 'gladiora', '')] 

The planet is always empty, and I don't know what is wrong with the regex.

If anyone has an idea, thanks.

I found a pattern that seems to work:

r"""(?mx)
    ^country\s:\s*(.+)\s
    (?:^.+\s)*?
    ^continent\s:\s*(.+)\s
    (?:^.+\s)*?
    (?:^(?:planet\s:\s*(.+)\s|-+\s|\Z))
"""

Basically, I changed the last part so that it has to match one of the following: the planet line, a run of -'s, or the end of the string. It's kind of ugly, but it is the way I found to ensure you get the planet line. One problem with this solution is that there has to be a newline at the end of the string (as in the example), or it won't get the last match.
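To check the behaviour described above, here is a small runnable sketch. It assumes the sample text was originally laid out one field per line (the layout is reconstructed here), and the names `pattern`, `results` and `stripped_results` are only illustrative.

```python
import re

# Sample text from the question, assumed to be one field per line.
# Note the newline just before the closing quotes.
output = """
country : usa
zzzzzzz
continent : americ
eeeeeee
------
country : china
zzzzzzz
continent : asia
planet : earth
-------
country : izbud
zzzzzzz
continent : gladiora
zzzzzzz
zzzzzzz
planet : mars
"""

# The last group must match a planet line, a separator line, or end of string.
pattern = re.compile(r"""(?mx)
    ^country\s:\s*(.+)\s
    (?:^.+\s)*?
    ^continent\s:\s*(.+)\s
    (?:^.+\s)*?
    (?:^(?:planet\s:\s*(.+)\s|-+\s|\Z))
""")

results = pattern.findall(output)
print(results)
# [('usa', 'americ', ''), ('china', 'asia', 'earth'), ('izbud', 'gladiora', 'mars')]

# Without the final newline, the \s after the planet value has nothing to match,
# so the last record is dropped entirely.
stripped_results = pattern.findall(output.rstrip("\n"))
print(stripped_results)
# [('usa', 'americ', ''), ('china', 'asia', 'earth')]
```

If the input may lack a final newline, one simple workaround (an assumption, not part of the original answer) is to append "\n" to the text before matching.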

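By the way, a partial solution is to fix the last line of the OP's pattern so that it has ? at the end rather than *?. However, that will only match the planet info when it is on the line immediately following the continent info. The reason you weren't getting it before is that *? is lazy: it will avoid matching anything if possible, and since nothing follows it in the pattern, zero repetitions always succeed.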
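The difference between the two quantifiers can be seen side by side in the sketch below. It again assumes the sample text is one field per line; `HEAD`, `lazy` and `greedy` are hypothetical names introduced only for this comparison.

```python
import re

# Sample text from the question, assumed to be one field per line.
output = """
country : usa
zzzzzzz
continent : americ
eeeeeee
------
country : china
zzzzzzz
continent : asia
planet : earth
-------
country : izbud
zzzzzzz
continent : gladiora
zzzzzzz
zzzzzzz
planet : mars
"""

# Shared part of the question's pattern; only the quantifier on the
# planet group differs between the two calls below.
HEAD = r"""(?mx)
    ^country\s:\s*(.+)\s
    (?:^.+\s)*?
    ^continent\s:\s*(.+)\s
    (?:^.+\s)*?
"""

# Lazy *? at the end of a pattern settles for zero repetitions.
lazy = re.findall(HEAD + r"(?:^planet\s:\s*(.+)\s)*?", output)
print(lazy)
# [('usa', 'americ', ''), ('china', 'asia', ''), ('izbud', 'gladiora', '')]

# Greedy ? tries the planet line first, but only directly after the continent line.
greedy = re.findall(HEAD + r"(?:^planet\s:\s*(.+)\s)?", output)
print(greedy)
# [('usa', 'americ', ''), ('china', 'asia', 'earth'), ('izbud', 'gladiora', '')]
```

The third record still loses its planet with ?, because the lazy filler before it also prefers to match zero lines and the optional group is then free to match nothing.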

