How to use a Perl regex to detect <p> inside another <p>


I am trying to parse "wrong HTML" in order to fix it using a Perl regex. The wrong HTML is the following: <p>foo<p>bar</p>foo</p>

I would like the Perl regex to return: <p>foo<p>

I tried something like: '|(<p\b[^>]*>(?!</p>)*?<p[^>]*>)|' but with no success, because I cannot repeat (?!</p>)*?

Is there a way in a Perl regex to say "any character except the following sequence" (in this case </p>)?

Try something like:

<p>(?:(?!</?p>).)*</p>(?!(?:(?!</?p>).)*(<p>|$)) 

A quick breakdown:

<p>(?:(?!</?p>).)*</p> 

matches a <p> ... </p> that does not contain either <p> or </p>. And the part:

(?!(?:(?!</?p>).)*(<p>|$)) 

is "true" when, looking ahead ((?! ... )), there is no <p> or end of input ((<p>|$)) that can be reached without passing a <p> or </p> in between ((?:(?!</?p>).)*).
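The breakdown above may be easier to follow with the pattern written out under the /x modifier, one piece per line. This is only a sketch: the variable name $innermost_p is made up for illustration, and the last group is written as non-capturing so that a list-context match returns just the paragraphs.

```perl
use strict;
use warnings;

# Same pattern as above, spread out with /x so each piece can carry a comment.
# $innermost_p is a hypothetical name used only in this sketch.
my $innermost_p = qr{
    (                       # group 1: the whole paragraph
        <p>
        (?: (?!</?p>) . )*  # any character, unless a <p> or </p> starts here
        </p>
    )
    (?!                     # the paragraph must NOT be followed by:
        (?: (?!</?p>) . )*  #   text free of <p> and </p>
        (?: <p> | $)        #   and then an opening <p> or the end of input
    )
}x;

my @found = '<p>foo <p>bar</p> foo</p>' =~ /$innermost_p/g;
print "found: $_\n" for @found;
```

Note that "." does not match a newline by default, so multi-line input would presumably need the /s modifier as well.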

A demo:

my $txt = "<p>aaa aa a</p> <p>foo <p>bar</p> foo</p> <p> bb <p>x</p> bb</p>";
while ($txt =~ m/(<p>(?:(?!<\/?p>).)*<\/p>)(?!(?:(?!<\/?p>).)*(<p>|$))/g) {
    print "found: $1\n";
}

which prints:

found: <p>bar</p>
found: <p>x</p>
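The question asked for a slightly different result: the outer opening tag up to the nested opening tag (<p>foo<p>). The same "any character that does not start this sequence" idiom, (?:(?!</p>).), could be applied to that directly. The following is a sketch under that assumption, not part of the answer's regex, and $hit and $clean are names invented for the example.

```perl
use strict;
use warnings;

# An opening <p ...>, then as few characters as possible that never start
# a </p>, then another opening <p ...>: a <p> opened inside an unclosed <p>.
my $nested_open = qr{(<p\b[^>]*>(?:(?!</p>).)*?<p\b[^>]*>)};

my ($hit) = '<p>foo<p>bar</p>foo</p>' =~ $nested_open;
print defined $hit ? "nested at: $hit\n" : "no nesting\n";

# Properly closed siblings give no match, so $clean stays undefined.
my ($clean) = '<p>foo</p><p>bar</p>' =~ $nested_open;
print defined $clean ? "nested at: $clean\n" : "no nesting\n";
```

The first print shows the <p>foo<p> the question was after; the second input has no nesting, so nothing is captured.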

Note that with the regex from the demo, only <p>baz</p> is matched in the string:

<p>foo <p>bar</p> <p>baz</p> foo</p> 

<p>bar</p> is not matched! After replacing <p>baz</p>, you could do a second run on the input, in which case <p>bar</p> is matched.
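The "second run" idea can be sketched as a loop that removes each match and rescans until nothing is left to match. The helper name collect_nested_paragraphs is hypothetical, and deleting the matched paragraph is just one possible replacement chosen for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collects nested <p>...</p> blocks by removing each
# match and rescanning the string until the pattern no longer matches.
sub collect_nested_paragraphs {
    my ($html) = @_;
    my @found;
    # Each substitution removes one paragraph that the lookahead allows;
    # removing it can unblock a sibling that was skipped on an earlier pass.
    while ($html =~ s{(<p>(?:(?!</?p>).)*</p>)(?!(?:(?!</?p>).)*(?:<p>|$))}{}) {
        push @found, $1;
    }
    return @found;
}

my @found = collect_nested_paragraphs('<p>foo <p>bar</p> <p>baz</p> foo</p>');
print "found: $_\n" for @found;
```

With this input, <p>baz</p> is reported first and <p>bar</p> only on the following pass, which matches the behaviour described above.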

