Merge Sort

Idea

Merge sort is an efficient algorithm that applies the concepts of divide and conquer to sort the list.

The key idea of divide and conquer is to recursively break down the problems into two or more sub-problems and they are same or related to the original problem, until these divided sub-problems are simple enough to solve directly. Then, the solutions of the original problem can be combined and derived by the solutions of all the sub-problems.

Divide and conquer

The calculation of Fibonacci number, F(n)=F(n1)+F(n2)F(n) = F(n-1) + F(n-2), is one example. To calculate F(n)F(n), it needs to find F(n1)F(n-1) and F(n2)F(n-2). Similarly, to calculate F(n1)F(n-1), it needs to F(n2)F(n-2) and F(n3)F(n-3). The sub-problems for calculating F(n1)F(n-1) and F(n2)F(n-2) have same form as the one for F(n)F(n).

Recursively, we will need to get F(n1)F(n-1), F(n2)F(n-2), ..., F(2)F(2), F(1)F(1) and F(1)F(1) and F(2)F(2) are easy enough to solve directly. They are both 11. Thus, F(3)=F(2)+F(1)=2F(3) = F(2) + F(1) = 2, F(4)=F(3)+F(2)=3F(4) = F(3) + F(2) = 3, ... and then F(n)F(n) can be computed.

Dividing the sorting-problem

Let's apply this concept to the sorting problem. If we want to sort the list L=[6,3,7,1,9,2,5]L = [6, 3, 7, 1, 9, 2, 5], the sub-lists [6,3,7,1],[9,2,5][6, 3, 7, 1], [9, 2, 5] must also be sorted, so we can narrow down our problem scope for handling the sub-lists. Next, [6,3,7,1][6, 3, 7, 1] can be divided to [6,3],[7,1][6, 3], [7, 1] and [6,3][6, 3] also can be split into [6][6] and [3][3]. Finally, [6],[3][6], [3] are not dividable so we stop breaking down the list.

       [6, 3, 7, 1, 9, 2, 5]
         /               \
   [6, 3, 7, 1]       [9, 2, 5]
    /        \          /     \
 [6, 3]    [7, 1]    [9, 2]  [5]
 /    \    /    \    /    \
[6]  [3]  [7]  [1]  [9]  [2]

In the same way, the whole list can be divided into [6],[3],[7],[1],[9],[2],[5][6], [3], [7], [1], [9], [2], [5].

Conquering the sub-problems

After there is only one element left, the subproblem is solved by nature since it's already sorted.

However, the problem becomes

how do we combine these sorted chunks into a sorted list

We need a method that can merge two sorted lists, L1[1...X]L_1[1...X] and L2[1...Y]L_2[1...Y], where X,Y1X, Y \geq 1, into a bigger sorted list L[1...(X+Y)]L[1...(X+Y)].

Combining all the results of sub-sorting-problem

Suppose we have two sorted lists A=[3,6,10,23]A = [3, 6, 10, 23] and B=[2,7,50,55]B = [2, 7, 50, 55]. We provide two ways to merge them into a sorted list.

Picking the smallest elements one by one

The simplest method is to pick the smallest elements iteratively by searching both lists from the minimal to maximal.

We only need to compare the left most elements of both lists and pick the smaller one since AA and BB are already sorted.

The following example demonstrate the process of this idea:

() <- search index

A = [(3), 6, 10, 23]
B = [(2), 7, 50, 55]
L = []                            // <- 2

In the first round, 22 is picked since 2<32 < 3 and going to be put into another list LL.

                                  // You can think the left most element
                                  // is shifted one by one
A = [(3), 6, 10, 23]              // [3, 6, 10, 23]
B = [2, (7), 50, 55]              // [7, 50, 55]
L = [2]                           // <- 3

After 22 is picked, we move the index of BB from 22 to 77. Next, 33 is picked since 3<73 < 7 and going to be put into LL.

A = [3, (6), 10, 23]              // [6, 10, 23]
B = [2, (7), 50, 55]              // [7, 50, 55]
L = [2, 3]                        // <- 6

After 33 is picked, we move the index of AA from 33 to 66. Next, 66 is picked since 6<76 < 7 and going to be put into LL.

A = [3, 6, (10), 23]              // [10, 23]
B = [2, (7), 50, 55]              // [7, 50, 55]
L = [2, 3, 6]                     // <- 7

After 66 is picked, we move the index of AA from 66 to 1010. Next, 77 is picked since 7<107 < 10 and going to be put into LL.

A = [3, 6, (10), 23]              // [10, 23]
B = [2, 7, (50), 55]              // [50, 55]
L = [2, 3, 6, 7]                  // <- 10

After 77 is picked, we move the index of BB from 77 to 5050. Next, 1010 is picked since 10<5010 < 50 and going to be put into LL.

A = [3, 6, 10, (23)]              // [23]
B = [2, 7, (50), 55]              // [50, 55]
L = [2, 3, 6, 7, 10]

After 1010 is picked, we move the index of AA from 1010 to 2323. Next, 2323 is picked since 23<5023 < 50 and going to be put into LL.

A = [3, 6, 10, 23]                // []
B = [2, 7, (50), 55]              // [50, 55]
L = [2, 3, 6, 7, 10, 23]

After 2323 is picked, there is no need to compare again since the 2323 is the last element in AA.

A = [3, 6, 10, 23]                // []
B = [2, 7, 50, 55]                // []
L = [2, 3, 6, 7, 10, 23, 50, 55]

Next, we can append all the rest elements from 5050 to the end of BB into the LL. Finally, we get a sort list LL.

Swapping the elements one by one

Another idea to merge the two sorted lists A=[3,6,10,23]A = [3, 6, 10, 23] and B=[2,7,50,55]B = [2, 7, 50, 55], is to couple them together into a list L=ABL = A \cup B

() <- element who will be moved
L = [3, 6, 10, 23, | (2), 7, 50, 55]

// The '|' doesn't exist! It's only a notation for better explanation.

and then move the minimal element of the later list(BB) to the right position of the former list(AA).

The way for finding right the position is to compare the elements one by one from the end of the former list(AA) to its head.

() <- element who will be moved
L = [3, 6, 10, (2), 23, | 7, 50, 55]

In our example, the 22 is swapped with 2323 since 2<232 < 23. Then we keep comparing 22 with 1010.

() <- element who will be moved
L = [3, 6, (2), 10, 23, | 7, 50, 55]

Similarly, the 2,102, 10 are swapped since 2<102 < 10.

() <- element who will be moved
L = [3, (2), 6,  10, 23, | 7, 50, 55]

Next, the 2,62, 6 are swapped since 2<62 < 6.

() <- element who will be moved
L = [(2), 3, 6,  10, 23, | 7, 50, 55]

Next, the 2,32, 3 are swapped since 2<32 < 3. After this round, there is nothing to compare, so the 22 is moved to its right position. Now A=[2,3,6,10,23]A = [2, 3, 6, 10, 23] and B=[7,50,55]B = [7, 50, 55]

In the same way, we can do this process again with 77. It's the minimal element of the later list BB now.

() <- element who will be moved
L = [2, 3, 6, 10, 23, | (7), 50, 55]
L = [2, 3, 6, 10, (7), 23, | 50, 55]
L = [2, 3, 6, (7), 10, 23, | 50, 55]

L = [2, 3, 6, 7, 10, 23, | 50, 55]

After this round, A=[2,3,6,7,10,23]A = [2, 3, 6, 7, 10, 23] and B=[50,55]B = [50, 55].

Then do it again with with 5050.

() <- element who will be moved
L = [2, 3, 6,  7, 10, 23, | (50), 55]

L = [2, 3, 6,  7, 10, 23, 50, | 55]

However, 5050 doesn't move because 23<5023 < 50. We just need to append 5050 to the end of the former list AA. After this round, A=[2,3,6,7,10,23,50]A = [2, 3, 6, 7, 10, 23, 50] and B=[55]B = [55].

() <- element who will be moved
L = [2, 3, 6,  7, 10, 23, 50, | (55)]

L = [2, 3, 6,  7, 10, 23, 50, 55]

It's same to 5555. It doesn't need to be moved since 50<5550 < 55, so just append it to the AA. Finally, A=[2,3,6,7,10,23,50,55]A = [2, 3, 6, 7, 10, 23, 50, 55] and B=[]B = [] is empty now. Now we have a sorted list L=AB=AL = A \cup B = A!

Which merge method is better

The first method use extra space to store the sorted results, rather than the second in-place solution. On the other hand, the second method needs more swapping executions and its a linear operation. For better performance, we take the first method as our approach here.

Actually, there is a way to save the extra space and it works as fast as the first method above. However, it's complicated. I will write another post for illustrating it. Please refer In-place merge sort in Elementary Algorithms to read it.

Algorithm

MergeSort(L):    mergeSort(L,1,L) \begin{aligned} & MergeSort(L): \\ & \space \space \space \space mergeSort(L, 1, \vert L \vert) \end{aligned}

mergeSort(L,l,r):    if l<r:        ml+r2        mergeSort(L,l,m)        mergeSort(L,m+1,r)        merge(L,l,m,r) \begin{aligned} & mergeSort(L, l, r): \\ & \space \space \space \space \text{if } l < r: \\ & \space \space \space \space \space \space \space \space m \leftarrow \lfloor \frac{l+r}{2} \rfloor \\ & \space \space \space \space \space \space \space \space mergeSort(L, l, m) \\ & \space \space \space \space \space \space \space \space mergeSort(L, m+1, r) \\ & \space \space \space \space \space \space \space \space merge(L, l, m, r) \\ \end{aligned}

merge(L,l,m,r):    L[]    il,jm+1,kl    while im and jr:        if L[i]<L[j]:            L[k]L[i]            ii+1        else:            L[k]L[j]            jj+1        kk+1    while im:        L[k]L[i]        ii+1        kk+1    while jr:        L[k]L[j]        jj+1        kk+1    for il to r:        L[i]L[i] \begin{aligned} & merge(L, l, m, r): \\ & \space \space \space \space L^\prime \leftarrow [] \\ & \space \space \space \space i \leftarrow l, j \leftarrow m+1, k \leftarrow l \\ & \space \space \space \space \text{while } i \leq m \text{ and } j \leq r: \\ & \space \space \space \space \space \space \space \space \text{if } L[i] < L[j]: \\ & \space \space \space \space \space \space \space \space \space \space \space \space L^\prime[k] \leftarrow L[i] \\ & \space \space \space \space \space \space \space \space \space \space \space \space i \leftarrow i + 1 \\ & \space \space \space \space \space \space \space \space \text{else}: \\ & \space \space \space \space \space \space \space \space \space \space \space \space L^\prime[k] \leftarrow L[j] \\ & \space \space \space \space \space \space \space \space \space \space \space \space j \leftarrow j + 1 \\ & \space \space \space \space \space \space \space \space k \leftarrow k + 1 \\ & \space \space \space \space \text{while } i \leq m: \\ & \space \space \space \space \space \space \space \space L^\prime[k] \leftarrow L[i] \\ & \space \space \space \space \space \space \space \space i \leftarrow i + 1 \\ & \space \space \space \space \space \space \space \space k \leftarrow k + 1 \\ & \space \space \space \space \text{while } j \leq r: \\ & \space \space \space \space \space \space \space \space L^\prime[k] \leftarrow L[j] \\ & \space \space \space \space \space \space \space \space j \leftarrow j + 1 \\ & \space \space \space \space \space \space \space \space k \leftarrow k + 1 \\ & \space \space \space \space \text{for } i \leftarrow l \text{ to } r: \\ & \space \space \space \space \space \space \space \space L[i] \leftarrow L^\prime[i] \\ \end{aligned}

Proof

Correctness of Merge

  List L

        <------   sorted   ------> <------   sorted  ------->
        <- merged -> <---  A  ---> <- merged -> <---  B  --->
  -----+---+--------+---+-----+---+-----+------+---+-----+---+-----
   ... | l | ...... | i | ... | m | m+1 | .... | j | ... | r | ...
  -----+---+--------+---+-----+---+-----+------+---+-----+---+-----
                      ^                          ^
                head of sublist A          head of sublist B

  List L'

   <--   merged  --> <---   empty   --->
  +---+-------+-----+---+-------+-------+
  | 1 |  ...  | k-1 | k | ..... | r-l+1 |
  +---+-------+-----+---+-------+-------+
                      ^
                head of empty area of list L'

  L[l...m]    : the sorted sublists for merging with L[m+1...r]
  L[m+1...r]  : the sorted sublists for merging with L[l...m]
  A, B        : the sublists containing elements that have NOT been merged yet
  L'[1...k-1] : the merged list from L[l...i-1] and L[m+1...j-1]

  i: The index of the first element in L[l...m] that has NOT been merged yet
  j: The index of the first element in L[m+1...r] that has NOT been merged yet
  k: The index of next merged element copied from L[i] or L[j]

Loop Invariant: At the beginning of the while-loop, the following conditions hold:

  1. Sublists L[i...m]L[i...m] and L[j...r]L[j...r] are sorted
  2. LL^\prime holds the elements from sublists L[l...i1]L[l...i-1] and L[m+1...j1]L[m+1...j-1]
  3. All elements in L[1...k1]L^\prime[1...k-1] is less or equal than sublists L[i...m]L[i...m] and L[j...r]L[j...r]
  4. LL^\prime are sorted. Formally, i[l+1,r],L[i1]L[i]\forall i \in [l + 1, r], L^\prime[i - 1] \leq L^\prime[i]

Then we use loop-invariants to prove:

  • Initialization: At the very beginning when k=1,i=l,j=m+1k = 1, i = l, j = m+1
    • the input L[l...m],L[m+1...r]L[l...m], L[m+1...r] are sorted so 1 holds
    • the list LL^\prime is empty so 2, 3, 4 hold
  • Maintenance: Consider the iteration k=xk = x
    • 1 is preserved since there is no change in LL
    • 2 is preserved because
      • If j>r(imL[i]<L[j])j > r \lor (i \leq m \land L[i] < L[j]), L[k]L[i]L^\prime[k] \leftarrow L[i]
        • then kk+1,ii+1k \leftarrow k+1, i \leftarrow i+1
        • 3 is preserved because L[k1]=L[i1]<L[j]L[j+1]...L[r]L[k-1] = L[i-1] < L[j] \leq L[j+1] \leq ... \leq L[r] and L[k1]=L[i1]L[i]...L[m]L[k-1] = L[i-1] \leq L[i] \leq ... \leq L[m]
      • Otherwise, L[k]L[j]L^\prime[k] \leftarrow L[j]
        • then kk+1,jj+1k \leftarrow k+1, j \leftarrow j+1
        • 3 is preserved because L[k1]=L[j1]<L[j]L[j+1]...L[r]L[k-1] = L[j-1] < L[j] \leq L[j+1] \leq ... \leq L[r] and L[k1]=L[j1]L[i]L[i+1]...L[m]L[k-1] = L[j-1] \leq L[i] \leq L[i+1] \leq ... \leq L[m]
    • The previous appended element must be smaller than the current selected minimal element or 1 is false
    • By 3, the next selected minimal element will be larger than current one
    • So 4 is also preserved
  • Termination
    • By 2, LL^\prime consists of the elements in L[l...r]L[l...r]
    • By 4, L[l...r]=L[l...r]L[l...r] = L^\prime[l...r] are sorted

Correctness of Merge Sort

Given a list LL with NN elements, the LL can be sorted by applying the above the MergeSort with l=1,r=Nl = 1, r = N.

  • Base step: When N=1N = 1, it's trivial.
  • Induction Hypothesis: Suppose this assumption holds when list has N=1,2,...,kN = 1, 2, ..., k elements
  • Induction Step: When N=k+1N = k + 1
    • the list LL is divide to L[1...m]L[1...m](mm elements) and L[m+1...N]L[m+1...N](NmN - m elements)
    • so m=1+(k+1)2=k2+1km = \lfloor \frac{1+(k+1)}{2} \rfloor = \lfloor \frac{k}{2} \rfloor + 1 \leq k
    • 1m0m11 \leq m \to 0 \leq m-1
      • kk1+m \to k \leq k-1+m
      • k+1k+m \to k+1 \leq k+m
      • (k+1)mk \to (k+1)-m \leq k
      • Nmk \to N-m \leq k
    • By our hypothesis, L[1...m]L[1...m] and L[m+1...N]L[m+1...N] can be sorted
    • By the proved correctness of merge above, the merged L[1...m]L[1...m] and L[m+1...N]L[m+1...N] is also sorted, so the proof is done

Complexity

  ^    +------------------------------------------------------+   Merge
  |    |                           N                          |   Complexity
  |    +------------------------------------------------------+
  |                 |                              |
  |                 v                              v
  |    +------------------------+    +------------------------+
  |    |           N/2          |    |           N/2          |   2 * O(N/2)
  |    +------------------------+    +------------------------+
  |         |              |              |              |
            v              v              v              v
  K    +---------+    +---------+    +---------+    +---------+
       |   N/4   |    |   N/4   |    |   N/4   |    |   N/4   |   4 * O(N/4)
  |    +---------+    +---------+    +---------+    +---------+
  |      |     |        |     |        |     |        |     |
  |      v     v        v     v        v     v        v     v
  |
  |                        .  .  .  .  .  .                       2^i * O(N/(2^i))
  |
  |    +---+  +---+  +---+                                +---+
  |    | 1 |  | 1 |  | 1 |  .  .  .  .  .  .  .  .  .  .  | 1 |   N * O(1)
  v    +---+  +---+  +---+                                +---+

  N: the number of list elements.
  K: K layers from N to 1.
     N/2^k = 1 => N = 2^K => K = log_2(N)

The above figure is the recursion tree of merge sort. The list containing NN elements is recursively divided to sort until there is only one elements. Suppose that there is KK times of division, therefore,

N2K=1N=2KK=log2N \begin{aligned} \frac{ N }{ 2^K } &= 1 \\ \to N &= 2^K \\ \to K &= \log_{ 2 }N \end{aligned}

On the other hand, the time complexity depends on the performance of merge Tmerge(N)T_{merge}(N). The used merge here is the basic version. It iteratively picks the minimal elements from both sublists then copied to another list LL^\prime. After all the elements in one sublist are all selected, we move the rest elements in the other sublist to list LL^\prime. Finally, we assigned L[i]L[i]L[i] \leftarrow L^\prime[i], i[1,N]\forall i \in [1, N]. Thus, Tmerge(N)T_{merge}(N) can be defined as

Tmerge(N)=cN T_{merge}(N) = c \cdot N

where cc is a constant reflecting the basic operations like comparisons or assignments for merging routine.

By the recursion tree

From the above figure, the total time for the merge sort is

Time=2cN2+4cN4+8cN8+...+Nc1=K(cN) \begin{aligned} Time &= 2 \cdot c \cdot \frac{N}{2} +4 \cdot c \cdot \frac{N}{4} + 8 \cdot c \cdot \frac{N}{8} + ... + N \cdot c \cdot 1 \\ &= K \cdot ( c \cdot N) \end{aligned} , where KK is the number of terms in the above equation.

The term is 2KcN2K2^K \cdot c \cdot \frac{N}{2^K} and it is from 2cN22 \cdot c \cdot \frac{N}{2} to Nc1N \cdot c \cdot 1, so we have K=log2NK = \log_{ 2 }N terms. As a result,

Time=cNlog2N Time = c \cdot N \cdot \log_{ 2 }N

Thus, the time complexity is O(NlogN)\mathcal{O}(N \log N).

By telescoping

Formally, since the merge sort repeatedly breaks down the NN-elements list into two N2\frac{N}{2}-elements sublists, the amount of time that merge sort, Tsort(N)T_{sort}(N), can be written as follows:

Tsort(N)=Tsort(N2)+Tsort(N2)+Tmerge(N)=2Tsort(N2)+Tmerge(N)=2Tsort(N2)+cN \begin{aligned} T_{sort}(N) &= T_{sort}(\frac{N}{2}) + T_{sort}(\frac{N}{2}) + T_{merge}(N) \\ &= 2 \cdot T_{sort}(\frac{N}{2}) + T_{merge}(N) \\ &= 2 \cdot T_{sort}(\frac{N}{2}) + c \cdot N \end{aligned}

Tsort(N)=2Tsort(N2)+cN=2(2Tsort(N4)+cN2)+cN=22Tsort(N4)+2cN=22(2Tsort(N8)+cN4)+2cN=23Tsort(N4)+3cN=...=2KTsort(N2K)+KcN=NTsort(1)+KcN=N1+KcN=N+cNlog2N \begin{aligned} T_{sort}(N) &= 2 \cdot T_{sort}(\frac{N}{2}) + c \cdot N \\ &= 2 \cdot (2 \cdot T_{sort}(\frac{N}{4}) + c \cdot \frac{N}{2}) + c \cdot N \\ &= 2^2 \cdot T_{sort}(\frac{N}{4}) + 2 \cdot c \cdot N \\ &= 2^2 \cdot (2 \cdot T_{sort}(\frac{N}{8}) + c \cdot \frac{N}{4}) + 2 \cdot c \cdot N \\ &= 2^3 \cdot T_{sort}(\frac{N}{4}) + 3 \cdot c \cdot N \\ &= ... \\ &= 2^K \cdot T_{sort}(\frac{N}{2^K}) + K \cdot c \cdot N \\ &= N \cdot T_{sort}(1) + K \cdot c \cdot N \\ &= N \cdot 1 + K \cdot c \cdot N \\ &= N + c \cdot N \cdot \log_{ 2 }N \end{aligned}

Thus, the time complexity is O(NlogN)\mathcal{O}(N \log N).

Implementation

See the files on gist here.

Appendix

Reference

results matching ""

    No results matching ""