html - Handle <nobr> tag in Python sgmllib
I'm trying to parse a page using a Python script. A <nobr> tag together with an '&' is giving me trouble. Here is the actual HTML:

<a href="http://enpass.in/algo/c12.html" class="style"> <nobr>simulation 1st & 2nd path</nobr></a>

The handle_data function of my parser (built on sgmllib) is not able to handle this data properly. Here is the handle_data code:

def handle_data(self, data):
    self.datainfo.append(data)

I expect the datainfo array to have one element, namely "simulation 1st & 2nd path". However, when I print the datainfo array, it actually contains 7 elements:

datainfo -> ['', '', 'simulation 1st', '&', '2nd path', '', '']

What's happening?
You need to encode the ampersand: & has to become &amp; for this to be valid HTML.
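If you cannot change the page itself, a parser that buffers the pieces passed to handle_data and joins them when the link closes should give you one string either way. The sketch below is only an illustration and makes a few assumptions: it uses Python 3's html.parser instead of sgmllib (sgmllib was removed in Python 3), and the names LinkTextCollector and link_texts are made up for this example. It also shows html.escape, which is one way to produce the &amp; form when you generate the HTML yourself.

```python
from html import escape
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the visible text of each <a> element as one string.

    Hypothetical helper for illustration; not part of the original script.
    """

    def __init__(self):
        super().__init__()
        self.datainfo = []    # one entry per <a> element
        self._chunks = None   # None while we are outside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._chunks = []

    def handle_data(self, data):
        # The parser may deliver one run of text in several calls,
        # so buffer the pieces instead of appending each one to datainfo.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._chunks is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            self.datainfo.append(text)
            self._chunks = None


def link_texts(markup):
    """Return the text of every <a> element found in markup."""
    parser = LinkTextCollector()
    parser.feed(markup)
    parser.close()
    return parser.datainfo


raw = ('<a href="http://enpass.in/algo/c12.html" class="style"> '
       '<nobr>simulation 1st & 2nd path</nobr></a>')
encoded = raw.replace(" & ", " &amp; ")  # the valid-HTML form of the same link

print(link_texts(raw))
print(link_texts(encoded))
print(escape("simulation 1st & 2nd path"))
```

Buffering between the start and end tag means the result does not depend on how many times handle_data is called for a single piece of text, so the bare & and the encoded &amp; both end up as the same single element.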