How to use a Perl regex to detect <p> inside another <p>
I'm trying to parse some "wrong HTML" in order to fix it using a Perl regex. The wrong HTML is the following: <p>foo<p>bar</p>foo</p>
I would like the Perl regex to return me: <p>foo<p>
I tried something like '|(<p\b[^>]*>(?!</p>)*?<p[^>]*>)|' with no success, because I cannot repeat (?!</p>)*?
Is there a way in a Perl regex to match any character except a given sequence (in my case </p>)?
Try something like:
<p>(?:(?!</?p>).)*</p>(?!(?:(?!</?p>).)*(<p>|$))
A quick breakdown:
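<p>(?:(?!</?p>).)*</p> matches a <p> ... </p> that contains neither <p> nor </p> inside it. And the part:
(?!(?:(?!</?p>).)*(<p>|$)) is "true" when, looking ahead ((?! ... )), there is no <p> or end of input ((<p>|$)) that can be reached without passing a <p> or </p> in between ((?:(?!</?p>).)*). In other words, the next p-tag after the match must be a closing </p>, which means the matched paragraph sits inside another one.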
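The building block (?:(?!</?p>).)*, a dot guarded by a negative lookahead, is the "any character except a sequence" construct the question asks about. As a sketch (an adaptation, not part of the original answer), here it is used to return the unclosed <p>foo<p> from the question directly. The variable names, the \b guard so that tags like <pre> are not counted, and the /s flag (needed only when the HTML spans several lines, since . does not match a newline without it) are assumptions.

```perl
use strict;
use warnings;

# An opening <p ...> tag, then any run of characters that does not start
# another <p> or </p> tag (the "tempered" dot), then a second opening tag.
# \b keeps <pre> or <param> from counting as a p-tag; /s lets . cross newlines.
my $unclosed = qr{<p\b[^>]*>(?:(?!</?p\b).)*<p\b[^>]*>}s;

my @found;
for my $html ('<p>foo<p>bar</p>foo</p>', '<p>ok</p><p>fine</p>') {
    if ($html =~ /($unclosed)/) {
        # $1 is the opening tag, its text, and the second opening tag
        push @found, $1;
        print "unclosed: $1\n";
    } else {
        print "no problem in: $html\n";
    }
}
```

This prints `unclosed: <p>foo<p>` for the broken input and `no problem in: <p>ok</p><p>fine</p>` for the well-formed one, because there every <p> is followed by a </p> before any other opening tag.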
A demo:
my $txt = "<p>aaa aa a</p> <p>foo <p>bar</p> foo</p> <p> bb <p>x</p> bb</p>";
while ($txt =~ m/(<p>(?:(?!<\/?p>).)*<\/p>)(?!(?:(?!<\/?p>).)*(<p>|$))/g) {
    print "found: $1\n";
}
prints:
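found: <p>bar</p>
found: <p>x</p>
Note that this regex trickery only catches <p>baz</p> in a string like:
<p>foo <p>bar</p> <p>baz</p> foo</p>
Here <p>bar</p> is not matched, because another <p> follows it before the outer </p>. After replacing <p>baz</p>, a second run on the input is needed, in which case <p>bar</p> is matched.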
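The second-run approach can be sketched as a loop that repeats the substitution until a pass changes nothing. What to replace a nested paragraph with is not specified above, so this sketch assumes the fix is to unwrap it (keep only its inner text); the helper name unwrap_nested is hypothetical.

```perl
use strict;
use warnings;

# Same pattern as the answer, but group 1 captures only the inner text of the
# nested paragraph so it can be unwrapped (the "unwrap" fix is an assumption).
my $nested = qr{<p>((?:(?!</?p>).)*)</p>(?!(?:(?!</?p>).)*(?:<p>|$))};

# Hypothetical helper: returns the fixed string and the number of passes used.
sub unwrap_nested {
    my ($html) = @_;
    my $passes = 0;
    # Repeat until a pass makes no substitution: each pass can expose nested
    # paragraphs that were hidden by a later sibling <p> in the same parent.
    while ($html =~ s/$nested/$1/g) {
        $passes++;
    }
    return ($html, $passes);
}

my ($fixed, $passes) = unwrap_nested('<p>foo <p>bar</p> <p>baz</p> foo</p>');
print "$fixed\n";
print "passes: $passes\n";
```

On the string above it takes two passes: the first unwraps <p>baz</p>, the second unwraps <p>bar</p>, leaving <p>foo bar baz foo</p>. The loop always terminates, since every successful pass removes at least one pair of p-tags.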