HTML parsing - extracting paragraphs in Python using lxml


I want to extract the paragraphs from an HTML page in Python. I used the lxml module, but it doesn't give me what I'm looking for.

from lxml import html

print(html.parse(url).xpath('//p')[1].text_content())

This prints only a single paragraph. The HTML looks like this:

<span id="midarticle_1"></span><p>here first paragraph.</p>
<span id="midarticle_2"></span><p>here second paragraph.</p>
<span id="midarticle_3"></span><p>paragraph three."</p>
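To see why that snippet returns just one paragraph, here is a minimal sketch. It assumes the sample markup above is parsed from a string with html.fromstring (the question fetches it from url with html.parse instead). xpath('//p') already returns a list of every <p> element; the [1] then picks out only the second one, because Python indexes from 0 while XPath positions start at 1.

```python
from lxml import html

# Sample markup from the question. Assumption: parsed from a string here,
# whereas the question uses html.parse(url).
SAMPLE = (
    '<span id="midarticle_1"></span><p>here first paragraph.</p>'
    '<span id="midarticle_2"></span><p>here second paragraph.</p>'
    '<span id="midarticle_3"></span><p>paragraph three."</p>'
)

doc = html.fromstring(SAMPLE)

# xpath('//p') returns a Python list of all <p> elements in document order.
paragraphs = doc.xpath('//p')

# Python indexing is 0-based, so [1] is the SECOND paragraph only.
second = paragraphs[1].text_content()

# XPath positions are 1-based: (//p)[2] selects that same second paragraph.
same_second = doc.xpath('(//p)[2]')[0].text_content()

print(len(paragraphs), second, same_second)
```

Dropping the [1] and looping over the list is enough to get every paragraph.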

I should add that different pages have different numbers of paragraphs, so I want to build a list and put each paragraph into it.

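Selecting the text nodes with //p/text() returns the text of every paragraph as a list, however many paragraphs the page has:

from lxml import html

print(html.parse(url).xpath('//p/text()'))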

Output:

['here first paragraph.', 'here second paragraph.',   'paragraph three."'] 
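One caveat: //p/text() returns only the text nodes that are direct children of each <p>. If a paragraph contains nested markup, its text is split into several list items and the nested text is dropped. The sketch below uses hypothetical markup with a <b> tag (not from the question) and shows an alternative that calls text_content() on each <p> element, which yields exactly one string per paragraph.

```python
from lxml import html

# Hypothetical markup: the second paragraph contains a nested <b> tag.
SAMPLE = (
    '<p>here first paragraph.</p>'
    '<p>here <b>second</b> paragraph.</p>'
    '<p>paragraph three.</p>'
)

doc = html.fromstring(SAMPLE)

# Only direct text children of <p>: the <b> splits paragraph two into
# 'here ' and ' paragraph.', and the word inside <b> is lost.
text_nodes = doc.xpath('//p/text()')

# text_content() joins all descendant text, one string per <p> element.
paragraphs = [p.text_content() for p in doc.xpath('//p')]

print(text_nodes)
print(paragraphs)
```

For this sample, text_nodes holds four strings while paragraphs holds the three complete paragraphs, so the list length always matches the number of <p> elements on the page.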
