HTML parsing - extracting paragraphs in Python using lxml
I want to extract paragraphs from HTML in Python. I used the lxml module, but it doesn't give me what I'm looking for.
from lxml import html
print(html.parse(url).xpath('//p')[1].text_content())

The HTML looks like this:

<span id="midarticle_1"></span><p>here first paragraph.</p>
<span id="midarticle_2"></span><p>here second paragraph.</p>
<span id="midarticle_3"></span><p>paragraph three."</p>
I should add that different pages have different numbers of paragraphs, so I want to make a list and put each paragraph into it.
print(html.parse(url).xpath('//p/text()'))
Output:
['here first paragraph.', 'here second paragraph.', 'paragraph three."']
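
One caveat worth noting: '//p/text()' returns only the text nodes that sit directly inside each <p>, so text wrapped in inline tags such as <b> or <a> is left out and a single paragraph can be split into several list items. A minimal sketch of an alternative, assuming the markup is already available as a string (SAMPLE and extract_paragraphs are illustrative names, not part of lxml), calls text_content() on each <p> element instead:

```python
from lxml import html

# Sample markup copied from the question; a real page would be loaded
# with html.parse(url) instead of a string.
SAMPLE = (
    '<span id="midarticle_1"></span><p>here first paragraph.</p>'
    '<span id="midarticle_2"></span><p>here second paragraph.</p>'
    '<span id="midarticle_3"></span><p>paragraph three."</p>'
)


def extract_paragraphs(markup):
    """Return the full text of every <p> element as a list of strings."""
    tree = html.fromstring(markup)
    # text_content() joins the text of the element and all its descendants,
    # so inline tags inside a paragraph do not split or drop text.
    return [p.text_content().strip() for p in tree.xpath('//p')]


paragraphs = extract_paragraphs(SAMPLE)
print(paragraphs)
```

The list grows with the page, so it works for any number of paragraphs. For markup like '<p>a <b>b</b> c</p>', this version yields one item, 'a b c', where '//p/text()' would yield 'a ' and ' c' as separate items.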