Algorithm to find "most common elements" in different arrays -

i have example 5 arrays inserted elements (numbers):

1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30

i need find most common elements in arrays , every element should go way till end (see example below). in example bold combination (or same 1 "30" on end, it's "same") because contains smallest number of different elements (only two, 4 , 2/30).

this combination (see below) isn't because if have ex. "4" must "go" till ends (next array mustn't contain "4" @ all). combination must go way till end.

1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30

edit2: or

1,4,8,10
1,2,3,4,11,15
2,4,20,21
2,30

or else not good.

is there algorithm speed thing (if have thousands of arrays hundreds of elements in each one)?

to make clear - solution must contain lowest number of different elements , groups (of same numbers) must grouped first - larger ones last - smallest ones. in upper example 4,4,4,2 better 4,2,2,2 because in first example group of 4's larger group of 2's.

edit: more specific. solution must contain smallest number of different elements , elements must grouped first last. if have 3 arrrays like

1,2,3
1,4,5
4,5,6

solution 1,1,4 or 1,1,5 or 1,1,6 not 2,5,5 because 1's have larger group (two of them) 2's (only one).

thanks.

edit3: can't more specific :(

edit4: @spintheblack 1,1,1,2,4 correct solution because number used first time (let's @ position 1) can't used later (except it's in same group of 1's). grouping has "priority"? also, didn't mention (sorry that) numbers in arrays not sorted in way, typed way in post because easier me follow.

i assuming "distinct elements" not have distinct, can repeat in final solution. if presented [1], [2], [1] obvious answer [1, 2, 1] allowed. we'd count having 3 distinct elements.

if so, here python solution:

def find_best_run (first_array, *argv):     # initialize data structures.     this_array_best_run = {}     x in first_array:         this_array_best_run[x] = (1, (1,), (x,))      this_array in argv:         # find best runs ending @ each value in this_array         last_array_best_run = this_array_best_run         this_array_best_run = {}          x in this_array:             (y, pattern) in last_array_best_run.iteritems():                 (distinct_count, lengths, elements) = pattern                 if x == y:                     lengths = tuple(lengths[:-1] + (lengths[-1] + 1,))                 else :                     distinct_count += 1                     lengths = tuple(lengths + (1,))                     elements = tuple(elements + (x,))                  if x not in this_array_best_run:                     this_array_best_run[x] = (distinct_count, lengths, elements)                 else:                     (prev_count, prev_lengths, prev_elements) = this_array_best_run[x]                     if distinct_count < prev_count or prev_lengths < lengths:                         this_array_best_run[x] = (distinct_count, lengths, elements)      # find best overall run     best_count = len(argv) + 10 # needs bigger possible answer.     (distinct_count, lengths, elements) in this_array_best_run.itervalues():         if distinct_count < best_count:             best_count = distinct_count             best_lengths = lengths             best_elements = elements         elif distinct_count == best_count , best_lengths < lengths:             best_count = distinct_count             best_lengths = lengths             best_elements = elements      # convert more normal representation.                     answer = []     (length, element) in zip(best_lengths, elements):         answer.extend([element] * length)      return answer  # example print find_best_run(     [1,4,8,10],     [1,2,3,4,11,15],     [2,4,20,21],     [2,30]) # prints [4, 4, 4, 30]

here explanation. ...this_run dictionaries have keys elements in current array, , have values tuples (distinct_count, lengths, elements). trying minimize distinct_count, maximize lengths (lengths tuple, prefer element largest value in first spot) , tracking elements end. @ each step construct possible runs combination of run previous array element next in sequence, , find ones best current. when end pick best possible overall run, turn conventional representation , return it.

if have n arrays of length m, should take o(n*m*m) time run.

Search This Blog

Assebmley

Algorithm to find "most common elements" in different arrays -

Comments

Post a Comment

Popular posts from this blog

apache - Add omitted ? to URLs -

redirect - bbPress Forum - rewrite to wwww.mysite prohibits login -

php - How can I stop spam on my custom forum/blog? -