Python regex: match a line whether it exists or not
I have a little problem with a regex.
Here is a sample of the text to parse:
output = """
country : usa
zzzzzzz
continent : americ
eeeeeee
------
country : china
zzzzzzz
continent : asia
planet : earth
-------
country : izbud
zzzzzzz
continent : gladiora
zzzzzzz
zzzzzzz
planet : mars
"""
I want to parse it and return the country, continent, and planet for each block.
So I wrote this regex:
results = re.findall(
    r"""(?mx)
    ^country\s:\s*(.+)\s
    (?:^.+\s)*?
    ^continent\s:\s*(.+)\s
    (?:^.+\s)*?
    (?:^planet\s:\s*(.+)\s)*?
    """, output)
But it returns:
[('usa', 'americ', ''), ('china', 'asia', ''), ('izbud', 'gladiora', '')]
and I don't know why the regex is wrong.
If anyone has an idea, thanks.
I found a pattern that seems to work:
r"""(?mx)
^country\s:\s*(.+)\s
(?:^.+\s)*?
^continent\s:\s*(.+)\s
(?:^.+\s)*?
(?:^(?:planet\s:\s*(.+)\s|-+\s|\Z))
"""
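Basically, I changed the last part so that it has to match one of the following: the planet line, a line of -'s, or the end of the string. It's kind of ugly, but it's the way I found to make sure the planet info gets picked up. One problem with this solution is that there has to be an empty line at the end of the string (as in your example), or it won't get the last match.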
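As a rough check, here is a minimal sketch that runs this pattern against the sample text. It assumes the sample has one field per line (which the ^ anchors suggest), and it spells the end-of-string anchor as \Z, since older Python versions do not accept \z as an anchor. The names PATTERN and results are just for illustration.

```python
import re

# Sample text, assumed to have one field per line.
output = """
country : usa
zzzzzzz
continent : americ
eeeeeee
------
country : china
zzzzzzz
continent : asia
planet : earth
-------
country : izbud
zzzzzzz
continent : gladiora
zzzzzzz
zzzzzzz
planet : mars
"""

PATTERN = re.compile(r"""(?mx)
    ^country\s:\s*(.+)\s        # group 1: country name
    (?:^.+\s)*?                 # lazily skip unrelated lines
    ^continent\s:\s*(.+)\s      # group 2: continent name
    (?:^.+\s)*?                 # lazily skip unrelated lines
    (?:^(?:planet\s:\s*(.+)\s   # group 3: planet name, if present
        |-+\s                   # or a separator line of dashes
        |\Z))                   # or the end of the string
    """)

results = PATTERN.findall(output)
for country, continent, planet in results:
    # findall yields '' for a group that did not take part in the match
    print(country, continent, planet or "(no planet)")
```

Because the lazy skip is now followed by something that must match, the regex engine is forced to keep consuming lines until it reaches a planet line, a separator, or the end of the text.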
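By the way, a partial solution is to fix the last line of the OP's pattern so that it has ? at the end rather than *?. However, that will only match the planet info if it is on the line immediately following the continent info. The reason it wasn't being captured before is that *? is lazy: it will avoid matching if possible, and since nothing comes after it in the pattern, matching zero times always succeeds.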
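The difference between the two quantifiers can be seen side by side in a small sketch. It again assumes the one-field-per-line sample; HEAD, lazy, and optional are illustrative names, not part of the original post.

```python
import re

# Sample text, assumed to have one field per line.
output = """
country : usa
zzzzzzz
continent : americ
eeeeeee
------
country : china
zzzzzzz
continent : asia
planet : earth
-------
country : izbud
zzzzzzz
continent : gladiora
zzzzzzz
zzzzzzz
planet : mars
"""

# Shared first part of the OP's pattern (verbose + multiline mode).
HEAD = r"""(?mx)
    ^country\s:\s*(.+)\s
    (?:^.+\s)*?
    ^continent\s:\s*(.+)\s
    (?:^.+\s)*?
    """

# Original ending: lazy *? at the very end always settles for zero repetitions.
lazy = re.findall(HEAD + r"(?:^planet\s:\s*(.+)\s)*?", output)

# Partial fix: a greedy ? tries to match the planet line once before giving up.
optional = re.findall(HEAD + r"(?:^planet\s:\s*(.+)\s)?", output)

print(lazy)      # planet is never captured
print(optional)  # planet captured only when it directly follows continent
```

With the greedy ?, only the china block yields a planet, because the lazy line-skipper before it still consumes zero lines, so the mars line two lines further down is never reached.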