FullSearch
The FullSearch algorithm computes H for each of the (K!)R−1 alignments of the K clusters in R replicates. Considering all possible vectors of permutations (I, P2, P3, …, PR), the algorithm computes H(I(Q1),P2(Q2),P3(Q3),…,PR(QR)) and returns the vector of permutations that maximizes SSCR. As the number of possible vectors grows quickly with K and R, however, even for moderate values of K and R, it is unrealistic to test every alignment. The FullSearch algorithm runs in time proportional to TFullSearch = (K!)R−1[R(R−1)/2]KC: the number of permutation vectors is (K!)R−1, the number of computations of G in each evaluation of Equation (4) is R(R−1)/2, and the time required for each computation of G is proportional to KC. Although it proceeds slowly, unlike our other algorithms, the FullSearch algorithm is guaranteed to find the optimal alignment of clusters across multiple runs.
Greedy
The Greedy algorithm employed by CLUMPP proceeds as follows:
Choose one run, Q1.
Choose a second run, Q2, and fix the permutation, P2, that maximizes G(P1(Q1),P(Q2)) over all possible permutations P (where P1 is the identity permutation).
Continue sequentially with each remaining run, Qx, where x=3,…,R, and fix the permutation, Px of Qx, that maximizes the average similarity with the previously fixed x−1 runs, or
(7)
over all permutations P.
This algorithm runs in time proportional to TGreedy=(K !)[R(R−1)/2]KC. The number of permutations tested for each run from 2 to R is K !. For run r (r ranging from 2 to R), the number of computations of G performed for each permutation is r−1. Thus, considering all runs from 2 to R, the total number of computations of G performed is (K !)[R(R−1)]/2. Each computation of G runs in time proportional to KC.
Because the order in which the runs are considered can affect the result, several different sequences of runs should be tested. CLUMPP offers three options for testing different sequences: test all possible sequences of runs, test a pre-defined number of random sequences and test a specific set of user-defined sequences.
LargeKGreedy
When K≳15, the number of permutations, K !, is very large, and it may not be possible to calculate G for all permutations of a particular pair of Q-matrices. Instead of testing every permutation as in the Greedy algorithm, the LargeKGreedy algorithm proceeds as follows:
(1)Choose one run, Q1.
(2a)Choose a second run, Q2. Compute G for all pairs of columns, one from Q1 and one from Q2. This computation is simply the value of G for two columns—hence no permutations of Q2 are computed, unlike in step 2 for the Greedy algorithm.
(2b)Pick the pair of columns Q1,y1 and Q2,z1 with highest G-value and fix these columns (Q1,y1 refers to column y1 of matrix Q1). Then pick the pairs of columns Q1,y2 and Q2,z2 with the next highest G-value, ignoring all G-values of pairs containing either of the previously chosen columns Q1,y1 and Q2,z1. Repeat this procedure until K pairs of columns, one each from Q1 and Q2, have been picked, and fix the permutation of Q2 that matches up these pairs of columns, or P2(Q2).
(3a)Continue sequentially with each remaining run, Qx, where x = 3,…, R. For each y and z from 1 to K, compute the average similarity,
(8)
where Pℓ,y denotes column y of the permuted matrix Pℓ(Qℓ). This quantity is the similarity of column z of Qx to column y of each of the previously fixed permutations, averaged across all runs previously considered. No permutations of Qx are computed, unlike in Step 3 for the Greedy algorithm.
(3b)Pick the pair of columns y1 of P1(Q1),P2(Q2),…,Px−1(Qx−1) and z1 of Qx with highest average similarity in Equation (8). Then pick the columns y2 and z2 with the next highest similarity in Equation (8), ignoring similarity scores of pairs containing either of the previously chosen columns y1 and z1. Repeat this procedure until K pairs of columns, one for the matrices P1(Q1),P2(Q2),…,Px−1(Qx−1) and one for Qx, have been picked. Fix the per