html - Handle <nobr> tag in Python sgmllib


I'm trying to parse a page using a Python script. The <nobr> tag, together with '&', is giving me trouble. Here is the actual HTML.

<a href="http://enpass.in/algo/c12.html" class="style"> <nobr>simulation 1st & 2nd path</nobr></a> 

The handle_data function of my parser (built on sgmllib) does not handle this data the way I expect. Here is the handle_data code.

    def handle_data(self, data):
        self.datainfo.append(data)

I expect the datainfo array to have one element, namely "simulation 1st & 2nd path".

However, when I print the datainfo array, it actually contains 7 elements.

datainfo -> ['', '', 'simulation 1st', '&', '2nd path', '', ''] 

What's happening?

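You need to encode the ampersand as &amp; for this to be valid HTML. sgmllib stops at every & to look for an entity reference, so a bare & splits the text and handle_data is called separately for each piece.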
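If you can't fix the HTML at the source, another option is to stop assuming that handle_data is called once per text run, and instead buffer the pieces and join them when the tag closes. Here is a minimal sketch of that idea. Note the assumptions: it uses html.parser.HTMLParser from the Python 3 standard library rather than sgmllib (which only exists in Python 2), and the LinkTextParser class and link_texts helper are hypothetical names made up for this example, not part of the original script.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text of each <a> element as one string (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.datainfo = []   # one entry per <a> element
        self._chunks = None  # text pieces of the <a> currently open, else None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._chunks = []

    def handle_data(self, data):
        # The parser may deliver one run of text in several calls,
        # so only buffer the pieces here.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._chunks is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            self.datainfo.append(text)
            self._chunks = None


def link_texts(markup):
    """Hypothetical helper: return the text of every <a> element in markup."""
    parser = LinkTextParser()
    parser.feed(markup)
    parser.close()
    return parser.datainfo


raw = ('<a href="http://enpass.in/algo/c12.html" class="style"> '
       '<nobr>simulation 1st & 2nd path</nobr></a>')
escaped = raw.replace(" & ", " &amp; ")

print(link_texts(raw))      # ['simulation 1st & 2nd path']
print(link_texts(escaped))  # ['simulation 1st & 2nd path']
```

The same buffering pattern should carry over to an sgmllib-based parser, since it only relies on the start tag, data, and end tag callbacks, but I have only sketched it against html.parser here.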

