JavaScript / jQuery: Find Text Duplicates


How would I approach finding duplicates in a text document? Duplicates can be a set of consecutive words or a sentence. A sentence does not necessarily end with a dot. Let's say a page contains a document of 200 lines in which 2 sentences are identical; I want to highlight those 2 sentences as duplicates when a "check duplicate" button is clicked.

Interesting question. Here's an idea of how I'd probably do it: http://jsfiddle.net/saqas/1/ (not optimized in any way!)

    var text = $('p').text(),
        words = text.split(' '),
        sortedWords = words.slice(0).sort(),
        duplicateWords = [],
        sentences = text.split('.'),
        sortedSentences = sentences.slice(0).sort(),
        duplicateSentences = [];

    // After sorting, equal items sit next to each other, so compare neighbours.
    for (var i = 0; i < sortedWords.length - 1; i++) {
        if (sortedWords[i + 1] == sortedWords[i]) {
            duplicateWords.push(sortedWords[i]);
        }
    }
    // Note: $.unique is documented for arrays of DOM elements only.
    duplicateWords = $.unique(duplicateWords);

    for (var i = 0; i < sortedSentences.length - 1; i++) {
        if (sortedSentences[i + 1] == sortedSentences[i]) {
            duplicateSentences.push(sortedSentences[i]);
        }
    }
    duplicateSentences = $.unique(duplicateSentences);

    // Wrap every word that has a duplicate in a highlighting span.
    $('a.words').click(function () {
        var highlighted = $.map(words, function (word) {
            if ($.inArray(word, duplicateWords) > -1)
                return '<span class="duplicate">' + word + '</span>';
            else return word;
        });
        $('p').html(highlighted.join(' '));
        return false;
    });

    // Same idea for sentences, re-joined with the dot they were split on.
    $('a.sentences').click(function () {
        var highlighted = $.map(sentences, function (sentence) {
            if ($.inArray(sentence, duplicateSentences) > -1)
                return '<span class="duplicate">' + sentence + '</span>';
            else return sentence;
        });
        $('p').html(highlighted.join('.'));
        return false;
    });
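As a variation on the sort-and-compare idea above, duplicates can also be found by counting occurrences. This is only a minimal sketch, not the code from the fiddle: `findDuplicateFragments` is a hypothetical helper name, and trimming whitespace before comparing is an assumption about what should count as "identical".

```javascript
// Hypothetical helper (not part of the original answer): returns the
// fragments that occur more than once, in order of first appearance.
function findDuplicateFragments(text, separator) {
    var counts = new Map();
    text.split(separator).forEach(function (fragment) {
        var key = fragment.trim();          // assumption: ignore surrounding whitespace
        if (key === '') return;             // skip empty pieces, e.g. after a final dot
        counts.set(key, (counts.get(key) || 0) + 1);
    });
    var duplicates = [];
    counts.forEach(function (count, key) {
        if (count > 1) duplicates.push(key);
    });
    return duplicates;
}

var sample = 'The cat sat. The dog ran. The cat sat. The end';
console.log(findDuplicateFragments(sample, '.'));  // [ 'The cat sat' ]
```

Because `split` also accepts a regular expression, passing something like `/[.\n]+/` as the separator would treat line breaks as sentence boundaries too, which may help when sentences do not end with a dot.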

Update 1

This one finds sequences of identical words: http://jsfiddle.net/yqdk5/1/. From here it shouldn't be hard to, e.g., ignore any kind of punctuation at the end of fragments when comparing; you'd just have to write your own version of the inArray method.

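    var text = $('p').text(),
        words = text.split(' '),
        sortedWords = words.slice(0).sort(),
        duplicateWords = [],
        highlighted = [];

    for (var i = 0; i < sortedWords.length - 1; i++) {
        if (sortedWords[i + 1] == sortedWords[i]) {
            duplicateWords.push(sortedWords[i]);
        }
    }
    // Note: $.unique is documented for arrays of DOM elements only.
    duplicateWords = $.unique(duplicateWords);

    // m[j] records whether words[j] is a duplicate. Open a span where a run
    // of duplicates starts and close it where the run ends.
    for (var j = 0, m = []; j < words.length; j++) {
        m.push($.inArray(words[j], duplicateWords) > -1);
        if (!m[j] && m[j - 1])
            highlighted.push('</span>');
        else if (m[j] && !m[j - 1])
            highlighted.push('<span class="duplicate">');
        highlighted.push(words[j]);
    }
    // Close a span left open when the text ends on a duplicate word.
    if (m[words.length - 1])
        highlighted.push('</span>');

    $('p').html(highlighted.join(' '));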
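The "own version of inArray" mentioned above could look roughly like the following. It is a sketch under assumptions: `normalize` and `inArrayLoose` are hypothetical names, and the set of punctuation characters stripped is a guess at what "any kind of punctuation" should cover.

```javascript
// Hypothetical helpers, not from the original answer.
// Strip trailing punctuation so "word." and "word" compare as equal.
function normalize(fragment) {
    return fragment.replace(/[.,;:!?]+$/, '');
}

// Like $.inArray, but compares normalized values; returns an index or -1.
function inArrayLoose(value, array) {
    var needle = normalize(value);
    for (var i = 0; i < array.length; i++) {
        if (normalize(array[i]) === needle) return i;
    }
    return -1;
}

console.log(inArrayLoose('fox.', ['dog', 'fox']));  // 1
console.log(inArrayLoose('cat', ['dog', 'fox']));   // -1
```

For this to have an effect on the detection step as well, the words would presumably need to be normalized before sorting and comparing, not only when looking them up.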

Update 2

My regex-fu is weak, but this (pretty messy!) version seems to work okay: http://jsfiddle.net/yqdk5/2/. I'm pretty sure there might be a better way of doing this, but for now I've got to leave it alone! :D Good luck!

Update 3

Thinking about it, I don't think the code from the previous update was any good. That's why I've removed it. You can still find it here: http://jsfiddle.net/yqdk5/2/. The main point is to use a regex to match words, something along the lines of:

/^word(\.?)$/ 
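To apply a pattern of that shape to an arbitrary word, the word has to be escaped before it is placed into the expression. A minimal sketch, assuming hypothetical helper names `escapeRegExp` and `wordPattern` (neither is from the fiddle):

```javascript
// Hypothetical helper: escape regex metacharacters in a literal word.
function escapeRegExp(word) {
    return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build /^word(\.?)$/ for an arbitrary word: the word itself,
// optionally followed by a single trailing dot.
function wordPattern(word) {
    return new RegExp('^' + escapeRegExp(word) + '(\\.?)$');
}

var pattern = wordPattern('fox');
console.log(pattern.test('fox'));    // true
console.log(pattern.test('fox.'));   // true
console.log(pattern.test('foxes'));  // false
```

Testing each word of the text against the pattern built from a known duplicate would then treat "fox" and "fox." as the same word.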
