Entropy Rate of a Source of Information with Memory
I have some English written text and have calculated its entropy. I then realized that compression algorithms based on LZ methods compress the text below the limit given by that entropy.

That's due to the fact that the source of information that models English text has memory: the bound on compression is given by the entropy rate, not by the entropy of the source.
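To be concrete, this is roughly the comparison I mean (a minimal sketch; the file name sample.txt is just a placeholder for my actual corpus, and gzip stands in for the LZ-based compressor I used):

    # Compare zeroth-order (unigram) entropy against what an LZ-based
    # compressor actually achieves. Assumed setup: a plain text file;
    # "sample.txt" is hypothetical.
    import gzip
    import math
    from collections import Counter

    text = open("sample.txt", "rb").read()

    # Zeroth-order entropy in bits per character.
    counts = Counter(text)
    n = len(text)
    h0 = -sum(c / n * math.log2(c / n) for c in counts.values())

    # Bits per character actually achieved by gzip (LZ77-based).
    bpc = 8 * len(gzip.compress(text)) / n

    print(f"unigram entropy: {h0:.3f} bits/char")
    print(f"gzip rate:       {bpc:.3f} bits/char")  # often below h0 on English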
I have seen the definition of the entropy rate of a source with memory, and I am wondering how it is possible to calculate it, as an algorithm or pseudocode, for text written in English.

Any ideas?

Thanks for the help.
In general, estimating the entropy rate of a source with memory is a hard problem, and when there are lots of long-distance dependencies (as there are in natural language) it's especially difficult. You need to construct a grammar of the language based on a sample, and then calculate the entropy rate of that grammar. The particular assumptions you make when extracting the grammar are going to make a big difference in the entropy rate you end up with, so the best you'll be able to do is estimate weak bounds on the entropy rate of the actual source.
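As a first cut, one standard way to get such a bound (a sketch under the usual caveats, not the answerer's own method): estimate the conditional block entropies H(X_k | X_1 ... X_{k-1}) = H_k - H_{k-1} from k-gram frequencies. These decrease toward the entropy rate as k grows, but the plug-in estimator becomes badly biased once k-grams are undersampled, so treat the values as rough upper bounds:

    # Plug-in estimate of H(X_k | X_1..X_{k-1}) from k-gram counts.
    # Approaches the entropy rate from above as k grows, but is biased
    # for large k on finite samples; "sample.txt" is hypothetical.
    import math
    from collections import Counter

    def block_entropy(text: str, k: int) -> float:
        """Entropy (bits) of the empirical distribution of length-k blocks."""
        if k == 0:
            return 0.0
        grams = Counter(text[i:i + k] for i in range(len(text) - k + 1))
        total = sum(grams.values())
        return -sum(c / total * math.log2(c / total) for c in grams.values())

    def conditional_entropy(text: str, k: int) -> float:
        """H(X_k | X_1..X_{k-1}) = H_k - H_{k-1}, in bits per character."""
        return block_entropy(text, k) - block_entropy(text, k - 1)

    text = open("sample.txt").read()
    for k in range(1, 6):
        print(k, round(conditional_entropy(text, k), 3))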
A common practice is to estimate the entropy rate by compressing a large sample using a standard compression program such as gzip, though Cosma Shalizi points out that that's a terrible idea. If you're going to use a general data compression algorithm, LZ76 is a better choice; Fermín Moscoso del Prado has a paper coming out looking at some of the alternatives.
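For reference, here is a minimal sketch of an LZ76-style estimate (Kaspar-Schuster phrase counting): the number of phrases C(n) in the exhaustive parse satisfies C(n) log2(n) / n → h for stationary ergodic sources, though the estimate is biased on finite samples. The quadratic substring search is written for clarity, not speed:

    # LZ76-based entropy-rate estimate. Counts phrases in an exhaustive
    # parse; C(n) * log2(n) / n converges to the entropy rate, but treat
    # finite-sample values as rough. "sample.txt" is hypothetical.
    import math

    def lz76_complexity(s: str) -> int:
        """Number of phrases in an LZ76-style exhaustive parse of s."""
        i, phrases = 0, 0
        n = len(s)
        while i < n:
            length = 1
            # Grow the current phrase while it already occurs in the history.
            while i + length <= n and s[i:i + length] in s[:i + length - 1]:
                length += 1
            phrases += 1
            i += length
        return phrases

    def lz76_entropy_rate(s: str) -> float:
        """Estimated entropy rate in bits per character."""
        n = len(s)
        return lz76_complexity(s) * math.log2(n) / n

    text = open("sample.txt").read()
    print(round(lz76_entropy_rate(text), 3))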
While compression algorithms do sort of give you a kind of grammar, it would be better to use a grammar that accurately captures the long-distance dependencies in the language. Unfortunately, learning realistic natural-language grammars from samples of raw text is an open research problem. But there is promising work on learning finite-state approximations of natural language that can be used to produce entropy rate estimates. Check out CSSR for one way of approaching the problem along those lines.
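Once you do have a finite-state model (from CSSR or even a fitted Markov chain), computing its entropy rate is straightforward: h = -sum over states s of pi(s) * sum over symbols x of P(x|s) log2 P(x|s), where pi is the stationary distribution. A sketch with a first-order character model standing in for the learned machine (CSSR itself produces a different, richer state set):

    # Entropy rate of a fitted finite-state model. Here a first-order
    # character Markov chain is used for illustration; empirical state
    # frequencies approximate the stationary distribution.
    import math
    from collections import Counter, defaultdict

    def markov_entropy_rate(text: str) -> float:
        trans = defaultdict(Counter)
        for a, b in zip(text, text[1:]):
            trans[a][b] += 1
        total = len(text) - 1
        h = 0.0
        for nexts in trans.values():
            s_total = sum(nexts.values())
            pi = s_total / total  # empirical weight of this state
            h -= pi * sum(c / s_total * math.log2(c / s_total)
                          for c in nexts.values())
        return h

    text = open("sample.txt").read()
    print(round(markov_entropy_rate(text), 3))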