HTML parsing - extracting paragraphs in Python using lxml


January 15, 2010

I want to extract the paragraphs from an HTML page in Python. I used the lxml module, but this doesn't give me what I'm looking for:

    from lxml import html
    # url is the address of the page being scraped
    print(html.parse(url).xpath('//p')[1].text_content())

The HTML looks like this:

    <span id="midarticle_1"></span><p>here first paragraph.</p><span id="midarticle_2"></span><p>here second paragraph.</p><span id="midarticle_3"></span><p>paragraph three."</p>
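The snippet above prints only one paragraph because `xpath('//p')` returns an ordinary Python list, and Python indices start at 0, so `[1]` picks the second `<p>`. XPath positions, by contrast, start at 1. A minimal sketch, assuming the markup has already been fetched as a string (the `SAMPLE` variable is illustrative) and is parsed with `lxml.html.fromstring` instead of `html.parse(url)`:

```python
from lxml import html

# Illustrative copy of the question's markup (normally the fetched page).
SAMPLE = (
    '<span id="midarticle_1"></span><p>here first paragraph.</p>'
    '<span id="midarticle_2"></span><p>here second paragraph.</p>'
    '<span id="midarticle_3"></span><p>paragraph three."</p>'
)

root = html.fromstring(SAMPLE)
paras = root.xpath('//p')  # a plain Python list of <p> elements

# Python list indexing is 0-based: [1] is the second paragraph.
first = paras[0].text_content()   # 'here first paragraph.'
second = paras[1].text_content()  # 'here second paragraph.'

# XPath positions are 1-based: (//p)[1] is the first <p> in the document.
first_via_xpath = root.xpath('(//p)[1]')[0].text_content()
print(first, second, first_via_xpath, sep=' | ')
```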

I should add that different pages have a different number of paragraphs, so I want to collect all of them into a list and then put each paragraph in it.
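Because `xpath('//p')` already returns every matching element, a variable number of paragraphs needs no special handling: iterating over that list yields one entry per `<p>`. A minimal sketch, assuming the page is available as a string; `paragraph_texts` and `SAMPLE` are hypothetical names used only for illustration:

```python
from lxml import html

# Illustrative copy of the question's markup (normally the fetched page).
SAMPLE = (
    '<span id="midarticle_1"></span><p>here first paragraph.</p>'
    '<span id="midarticle_2"></span><p>here second paragraph.</p>'
    '<span id="midarticle_3"></span><p>paragraph three."</p>'
)


def paragraph_texts(markup):
    """Return the text of every <p> element, in document order (hypothetical helper)."""
    root = html.fromstring(markup)
    # xpath('//p') returns all <p> elements, however many the page has;
    # text_content() joins all descendant text, including nested tags.
    return [p.text_content().strip() for p in root.xpath('//p')]


if __name__ == "__main__":
    paras = paragraph_texts(SAMPLE)
    print(len(paras))
    for i, text in enumerate(paras, 1):
        print(i, text)
```

The same comprehension works unchanged on a tree from `html.parse(url)`, since that object also exposes `.xpath()`.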

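Selecting the text() nodes under every <p> returns all the paragraphs as a list, however many the page has:

    from lxml import html
    print(html.parse(url).xpath('//p/text()'))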

Output:

    ['here first paragraph.', 'here second paragraph.', 'paragraph three."']
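One caveat worth checking on real pages: `//p/text()` selects only the text nodes that are direct children of each `<p>`, so a paragraph containing inline markup such as `<b>` comes back split into pieces, with the bold text missing. Calling `text_content()` on each `<p>` element avoids that. A small sketch using made-up markup (`MARKUP` is illustrative, not from the question):

```python
from lxml import html

# Made-up markup: the second paragraph contains an inline <b> tag.
MARKUP = "<div><p>plain paragraph</p><p>has <b>bold</b> text</p></div>"

root = html.fromstring(MARKUP)

# text() selects only the text nodes directly under each <p>,
# so the text inside <b> is skipped and the paragraph is split in two.
direct = root.xpath('//p/text()')
# -> ['plain paragraph', 'has ', ' text']

# text_content() concatenates all descendant text of each <p>.
full = [p.text_content() for p in root.xpath('//p')]
# -> ['plain paragraph', 'has bold text']

print(direct)
print(full)
```

If you prefer to stay in pure XPath, evaluating `p.xpath('string()')` on each element returns the same concatenated text as `text_content()`.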
