html - Handle <nobr> tag in Python sgmllib
I'm trying to parse a page using a Python script. A <nobr> tag together with an '&' is giving me trouble. Here is the actual HTML:
<a href="http://enpass.in/algo/c12.html" class="style"> <nobr>simulation 1st & 2nd path</nobr></a>
The handle_data function of my parser (built on sgmllib) is not able to handle this data properly. Here is the handle_data code:
def handle_data(self, data):
    self.datainfo.append(data)
I expect the datainfo array to have one element, namely "simulation 1st & 2nd path". However, when I print the datainfo array, it actually contains 7 elements:
datainfo -> ['', '', 'simulation 1st', '&', '2nd path', '', '']
What's happening?
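You need to encode the ampersand: write & as &amp; for this to become valid HTML. A bare & is read as the start of an entity reference, which is why the parser splits the text around it.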
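If you cannot fix the markup at the source, another option is to stop relying on handle_data being called once per text run, and instead join the pieces when the closing tag arrives. Below is a minimal sketch of that idea. It assumes Python 3, where sgmllib no longer exists, so html.parser.HTMLParser stands in for it; the names LinkTextParser and link_texts are made up for this illustration. The same buffering approach should carry over to an sgmllib-based parser, but that is not tested here.

```python
from html import escape
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collects the full text of each <a> element as a single string."""

    def __init__(self):
        super().__init__()
        self.datainfo = []    # one entry per <a> element
        self._chunks = None   # text pieces of the <a> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._chunks = []

    def handle_data(self, data):
        # May be called several times per element; just buffer the pieces.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._chunks is not None:
            self.datainfo.append("".join(self._chunks).strip())
            self._chunks = None


def link_texts(html_text):
    """Hypothetical helper: return the text of every <a> in html_text."""
    parser = LinkTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.datainfo


RAW = ('<a href="http://enpass.in/algo/c12.html" class="style"> '
       '<nobr>simulation 1st & 2nd path</nobr></a>')

# The valid-HTML form of the same link text, produced with html.escape.
ENCODED = ('<a href="http://enpass.in/algo/c12.html" class="style"> <nobr>'
           + escape("simulation 1st & 2nd path", quote=False)
           + '</nobr></a>')

print(link_texts(RAW))
print(link_texts(ENCODED))
```

Because the pieces are joined at the closing tag, the result does not depend on how many times the parser chooses to call handle_data for a given text run.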