Manipulating string content value in an HTML file using BeautifulSoup

August 15, 2015

Folks,

I am new to Python and BeautifulSoup, so please bear with me. I am trying to do some HTML parsing.

I want to remove newlines and compact whitespace in selected strings (chosen by a string search within the HTML file).

For example, for the following HTML, I want to search for all tags whose string contains "xy" and remove the newlines and multiple spaces in that string (replacing them with a single space).

<html>
  <head></head>
  <body>
    <h1>xy         z</h1>
    <p>xy         z</p>
    <div align="center" style="margin-left: 0%; ">
      <b>
        <font style="font-family: 'times new roman', times">
          ab    c
        </font>
        <font style="font-family: 'times new roman', times">
          xy    z
        </font>
      </b>
    </div>
  </body>
</html>

The resulting HTML should look like:

<html>
  <head></head>
  <body>
    <h1>xy z</h1>
    <p>xy z</p>
    <div align="center" style="margin-left: 0%; ">
      <b>
        <font style="font-family: 'times new roman', times">
          ab    c
        </font>
        <font style="font-family: 'times new roman', times">
          xy z
        </font>
      </b>
    </div>
  </body>
</html>

OK, I found a way to do it: use findAll to collect the strings that match, then use the replaceWith() method on each one, as shown below.

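.........
import re
from bs4 import BeautifulSoup  # assumes Beautiful Soup 4

soup = BeautifulSoup(contents, "html.parser")
# Collect every text node that contains "xy"
s = soup.findAll(text=re.compile("xy"))
for s1 in s:
    # Collapse each run of whitespace (newlines included) into one space
    s1.replaceWith(re.sub(r'\s+', ' ', str(s1)))
...........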
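
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. It assumes the bs4 package (Beautiful Soup 4) and uses its snake_case names find_all and replace_with. The helper name compact_matching_strings and the shortened sample input are made up for illustration and are not part of the original post.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample input, shortened from the HTML in the question.
contents = """<html><body>
<h1>xy
    z</h1>
<p>ab    c</p>
<p>xy    z</p>
</body></html>"""


def compact_matching_strings(html, needle):
    """Collapse whitespace runs in every text node that contains `needle`."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(re.escape(needle))
    # find_all returns a list of matches, so replacing nodes while
    # looping over it does not disturb the iteration.
    for node in soup.find_all(string=pattern):
        node.replace_with(re.sub(r"\s+", " ", str(node)))
    return soup


soup = compact_matching_strings(contents, "xy")
result = str(soup)
print(result)
```

Note that the substitution also squeezes any leading or trailing whitespace in a matched string down to a single space, so the indentation around "xy z" inside the font tag will not be kept exactly as in the desired output above. Strings that do not contain "xy", such as "ab    c", are left untouched.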
