regex - Replace sequential repeating tags with one of that tag in Ruby
I'm trying to replace multiple sequential <br> tags with one <br> tag using Ruby.
For instance:
hello <br><br/><br> world!
would become
hello <br> world!
You could do this with a regular expression, like:
"hello\n<br><br/><br>\nworld".gsub(/(?im)(<br\s*\/?>\s*)+/,'<br>')
To explain that: the (?im) part has options indicating that the match should be case-insensitive, and that . should match newlines. The grouped expression (<br\s*\/?>\s*) matches <br> (optionally with whitespace and a trailing /), possibly followed by whitespace, and the + says to match one or more of that group.
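As a rough sketch of how the pattern behaves on a few inputs, the substitution can be wrapped in a small helper. The names BR_RUN and collapse_br are made up for illustration and are not part of the original answer; the pattern itself is the one shown above.

```ruby
# BR_RUN and collapse_br are hypothetical names chosen for this sketch.
# The pattern is the one from the answer: one or more <br> tags (any case,
# optional whitespace and trailing slash), each followed by optional whitespace.
BR_RUN = /(?im)(<br\s*\/?>\s*)+/

# Replace every run of <br> tags with a single <br>.
def collapse_br(html)
  html.gsub(BR_RUN, '<br>')
end

puts collapse_br("hello <br><br/><br> world!")  # prints: hello <br>world!
puts collapse_br("a<BR>\n<br />b")              # prints: a<br>b
puts collapse_br("no tags here")                # prints: no tags here
```

Note that the trailing \s* also swallows the whitespace after the last tag in a run, so the result is "hello <br>world!" rather than "hello <br> world!". If that space matters, one option is to use '<br> ' as the replacement string.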
However, I should point out that in general it's not a good idea to use regular expressions for manipulating HTML; you should use a proper parser instead. For example, here's a better way of doing it using Nokogiri:
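require 'nokogiri'

document = Nokogiri::HTML.parse("hello <br><br/><br> world!")

# Remove each <br> whose next sibling is also a <br>, keeping the last of a run.
# (&. guards against a <br> that is the last child, where node.next is nil.)
document.search('//br').each do |node|
  node.remove if node.next&.name == 'br'
end

puts document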
That would produce output like:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>hello <br> world!</p></body></html>
(The parser turns your input into a well-formed document, which is why you have the DOCTYPE and the enclosing <html><body><p> tags.)