JavaScript / jQuery: Find Text Duplicates
How would I approach finding duplicates in a text document? A duplicate can be a set of consecutive words or a sentence. A sentence does not necessarily end with a dot. Let's say a page contains a document of 200 lines in which 2 sentences are identical; I want to highlight those 2 sentences as duplicates when a "check duplicate" button is clicked.
Interesting question. Here's an idea of how I'd probably do it: http://jsfiddle.net/saqas/1/ (not optimized in any way!)
    var text = $('p').text(),
        words = text.split(' '),
        sortedwords = words.slice(0).sort(),
        duplicatewords = [],
        sentences = text.split('.'),
        sortedsentences = sentences.slice(0).sort(),
        duplicatesentences = [];

    // After sorting, equal items are adjacent. Record each repeated item once
    // ($.unique is documented for arrays of DOM elements only, so it is not used here).
    for (var i = 0; i < sortedwords.length - 1; i++) {
        if (sortedwords[i + 1] == sortedwords[i] &&
                duplicatewords[duplicatewords.length - 1] !== sortedwords[i]) {
            duplicatewords.push(sortedwords[i]);
        }
    }

    for (var i = 0; i < sortedsentences.length - 1; i++) {
        if (sortedsentences[i + 1] == sortedsentences[i] &&
                duplicatesentences[duplicatesentences.length - 1] !== sortedsentences[i]) {
            duplicatesentences.push(sortedsentences[i]);
        }
    }

    // Wrap every duplicated word in a span when the "words" link is clicked.
    $('a.words').click(function () {
        var highlighted = $.map(words, function (word) {
            if ($.inArray(word, duplicatewords) > -1)
                return '<span class="duplicate">' + word + '</span>';
            else
                return word;
        });
        $('p').html(highlighted.join(' '));
        return false;
    });

    // Same idea for sentences, split and re-joined on '.'.
    $('a.sentences').click(function () {
        var highlighted = $.map(sentences, function (sentence) {
            if ($.inArray(sentence, duplicatesentences) > -1)
                return '<span class="duplicate">' + sentence + '</span>';
            else
                return sentence;
        });
        $('p').html(highlighted.join('.'));
        return false;
    });
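Since the version above is not optimized, here is a rough sketch of a different way to collect the duplicates: counting occurrences in a Map instead of sorting and comparing neighbours. This is a swapped-in technique, not what the fiddle does. The function name findDuplicates and the sample text are made up for illustration, and trimming the sentences is my own assumption about what you'd want.

```javascript
// Return the distinct items that occur more than once, in first-seen order.
function findDuplicates(items) {
  var counts = new Map();
  items.forEach(function (item) {
    counts.set(item, (counts.get(item) || 0) + 1);
  });
  var duplicates = [];
  // Map iterates in insertion order, so the result keeps first-seen order.
  counts.forEach(function (count, item) {
    if (count > 1) duplicates.push(item);
  });
  return duplicates;
}

// Hypothetical sample text, not taken from the original question.
var sample = 'the cat sat. the dog ran. the cat sat.';

// Assumption: trim whitespace and drop empty fragments so that
// " the cat sat" and "the cat sat" compare as equal.
var sampleSentences = sample.split('.')
  .map(function (s) { return s.trim(); })
  .filter(function (s) { return s.length > 0; });

var duplicateSampleSentences = findDuplicates(sampleSentences);
var duplicateSampleWords = findDuplicates(sample.split(' '));
```

The resulting arrays could be used in place of duplicatewords and duplicatesentences in the click handlers.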
Update 1
This one finds sequences of identical words: http://jsfiddle.net/yqdk5/1/ From here it shouldn't be hard to, e.g., ignore any kind of punctuation at the end of fragments when comparing. You'd just have to write your own version of the inArray method.
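To give an idea of what such a home-made inArray might look like, here is a minimal sketch. The names normalize and inArrayLoose are hypothetical, and the set of punctuation characters that gets stripped is an assumption you would want to adjust.

```javascript
// Hypothetical helper: drop trailing punctuation before comparing.
// The character set here is an assumption, not from the original fiddle.
function normalize(fragment) {
  return fragment.replace(/[.,;:!?]+$/, '');
}

// Works like $.inArray (returns the index, or -1 if not found),
// but compares fragments with trailing punctuation removed.
function inArrayLoose(needle, haystack) {
  var target = normalize(needle);
  for (var i = 0; i < haystack.length; i++) {
    if (normalize(haystack[i]) === target) return i;
  }
  return -1;
}
```

With this, inArrayLoose('fox.', ['the', 'fox']) finds 'fox' even though the fragment ends with a dot.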
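    var text = $('p').text(),
        words = text.split(' '),
        sortedwords = words.slice(0).sort(),
        duplicatewords = [],
        highlighted = [];

    // After sorting, equal words are adjacent. Record each repeated word once
    // ($.unique is documented for arrays of DOM elements only, so it is not used here).
    for (var i = 0; i < sortedwords.length - 1; i++) {
        if (sortedwords[i + 1] == sortedwords[i] &&
                duplicatewords[duplicatewords.length - 1] !== sortedwords[i]) {
            duplicatewords.push(sortedwords[i]);
        }
    }

    // m[j] records whether words[j] is a duplicate. Open a span where a run
    // of duplicates starts and close it where the run ends.
    for (var j = 0, m = []; j < words.length; j++) {
        m.push($.inArray(words[j], duplicatewords) > -1);
        if (!m[j] && m[j - 1]) highlighted.push('</span>');
        else if (m[j] && !m[j - 1]) highlighted.push('<span class="duplicate">');
        highlighted.push(words[j]);
    }
    // Close a run that extends to the very last word.
    if (m[words.length - 1]) highlighted.push('</span>');

    $('p').html(highlighted.join(' '));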
Update 2
My regex-fu is weak, but this (pretty messy!) version seems to work okay: http://jsfiddle.net/yqdk5/2/ I'm pretty sure there might be a better way of doing this, but for now I've got to leave it alone! :D Good luck!
Update 3
Thinking about it, I don't think the code from the previous update was any good. That's why I've removed it. You can still find it here: http://jsfiddle.net/yqdk5/2/ The main point is to use a regex to match words, something along the lines of:
/^word(\.?)$/
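For what it's worth, a pattern like that could be built per word roughly as follows. This is only a sketch: escapeRegExp, wordPattern and matchesDuplicate are hypothetical names, and escaping the word first is my own addition so that words containing regex metacharacters don't break the pattern.

```javascript
// Escape regex metacharacters so the word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build /^word(\.?)$/ for a given word: the word itself,
// optionally followed by a single trailing dot.
function wordPattern(word) {
  return new RegExp('^' + escapeRegExp(word) + '(\\.?)$');
}

// True if the token matches any known duplicate word, with or without a dot.
function matchesDuplicate(token, duplicateWords) {
  return duplicateWords.some(function (word) {
    return wordPattern(word).test(token);
  });
}
```

So matchesDuplicate('fox.', ['fox']) would be true, while 'foxes' would not match.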