Merge Sort

Idea

Merge sort is an efficient algorithm that applies the concepts of divide and conquer to sort the list.

The key idea of divide and conquer is to recursively break down the problems into two or more sub-problems and they are same or related to the original problem, until these divided sub-problems are simple enough to solve directly. Then, the solutions of the original problem can be combined and derived by the solutions of all the sub-problems.

Divide and conquer

The calculation of Fibonacci number, $F(n) = F(n-1) + F(n-2)$ , is one example. To calculate $F(n)$ , it needs to find $F(n-1)$ and $F(n-2)$ . Similarly, to calculate $F(n-1)$ , it needs to $F(n-2)$ and $F(n-3)$ . The sub-problems for calculating $F(n-1)$ and $F(n-2)$ have same form as the one for $F(n)$ .

Recursively, we will need to get $F(n-1)$ , $F(n-2)$ , ..., $F(2)$ , $F(1)$ and $F(1)$ and $F(2)$ are easy enough to solve directly. They are both $1$ . Thus, $F(3) = F(2) + F(1) = 2$ , $F(4) = F(3) + F(2) = 3$ , ... and then $F(n)$ can be computed.

Dividing the sorting-problem

Let's apply this concept to the sorting problem. If we want to sort the list $L = [6, 3, 7, 1, 9, 2, 5]$ , the sub-lists $[6, 3, 7, 1], [9, 2, 5]$ must also be sorted, so we can narrow down our problem scope for handling the sub-lists. Next, $[6, 3, 7, 1]$ can be divided to $[6, 3], [7, 1]$ and $[6, 3]$ also can be split into $[6]$ and $[3]$ . Finally, $[6], [3]$ are not dividable so we stop breaking down the list.

       [6, 3, 7, 1, 9, 2, 5]
         /               \
   [6, 3, 7, 1]       [9, 2, 5]
    /        \          /     \
 [6, 3]    [7, 1]    [9, 2]  [5]
 /    \    /    \    /    \
[6]  [3]  [7]  [1]  [9]  [2]

In the same way, the whole list can be divided into $[6], [3], [7], [1], [9], [2], [5]$ .

Conquering the sub-problems

After there is only one element left, the subproblem is solved by nature since it's already sorted.

However, the problem becomes

how do we combine these sorted chunks into a sorted list

We need a method that can merge two sorted lists, $L_1[1...X]$ and $L_2[1...Y]$ , where $X, Y \geq 1$ , into a bigger sorted list $L[1...(X+Y)]$ .

Combining all the results of sub-sorting-problem

Suppose we have two sorted lists $A = [3, 6, 10, 23]$ and $B = [2, 7, 50, 55]$ . We provide two ways to merge them into a sorted list.

Picking the smallest elements one by one

The simplest method is to pick the smallest elements iteratively by searching both lists from the minimal to maximal.

We only need to compare the left most elements of both lists and pick the smaller one since $A$ and $B$ are already sorted.

The following example demonstrate the process of this idea:

() <- search index

A = [(3), 6, 10, 23]
B = [(2), 7, 50, 55]
L = []                            // <- 2

In the first round, $2$ is picked since $2 < 3$ and going to be put into another list $L$ .

                                  // You can think the left most element
                                  // is shifted one by one
A = [(3), 6, 10, 23]              // [3, 6, 10, 23]
B = [2, (7), 50, 55]              // [7, 50, 55]
L = [2]                           // <- 3

After $2$ is picked, we move the index of $B$ from $2$ to $7$ . Next, $3$ is picked since $3 < 7$ and going to be put into $L$ .

A = [3, (6), 10, 23]              // [6, 10, 23]
B = [2, (7), 50, 55]              // [7, 50, 55]
L = [2, 3]                        // <- 6

After $3$ is picked, we move the index of $A$ from $3$ to $6$ . Next, $6$ is picked since $6 < 7$ and going to be put into $L$ .

A = [3, 6, (10), 23]              // [10, 23]
B = [2, (7), 50, 55]              // [7, 50, 55]
L = [2, 3, 6]                     // <- 7

After $6$ is picked, we move the index of $A$ from $6$ to $10$ . Next, $7$ is picked since $7 < 10$ and going to be put into $L$ .

A = [3, 6, (10), 23]              // [10, 23]
B = [2, 7, (50), 55]              // [50, 55]
L = [2, 3, 6, 7]                  // <- 10

After $7$ is picked, we move the index of $B$ from $7$ to $50$ . Next, $10$ is picked since $10 < 50$ and going to be put into $L$ .

A = [3, 6, 10, (23)]              // [23]
B = [2, 7, (50), 55]              // [50, 55]
L = [2, 3, 6, 7, 10]

After $10$ is picked, we move the index of $A$ from $10$ to $23$ . Next, $23$ is picked since $23 < 50$ and going to be put into $L$ .

A = [3, 6, 10, 23]                // []
B = [2, 7, (50), 55]              // [50, 55]
L = [2, 3, 6, 7, 10, 23]

After $23$ is picked, there is no need to compare again since the $23$ is the last element in $A$ .

A = [3, 6, 10, 23]                // []
B = [2, 7, 50, 55]                // []
L = [2, 3, 6, 7, 10, 23, 50, 55]

Next, we can append all the rest elements from $50$ to the end of $B$ into the $L$ . Finally, we get a sort list $L$ .

Swapping the elements one by one

Another idea to merge the two sorted lists $A = [3, 6, 10, 23]$ and $B = [2, 7, 50, 55]$ , is to couple them together into a list $L = A \cup B$

() <- element who will be moved
L = [3, 6, 10, 23, | (2), 7, 50, 55]

// The '|' doesn't exist! It's only a notation for better explanation.

and then move the minimal element of the later list( $B$ ) to the right position of the former list( $A$ ).

The way for finding right the position is to compare the elements one by one from the end of the former list( $A$ ) to its head.

() <- element who will be moved
L = [3, 6, 10, (2), 23, | 7, 50, 55]

In our example, the $2$ is swapped with $23$ since $2 < 23$ . Then we keep comparing $2$ with $10$ .

() <- element who will be moved
L = [3, 6, (2), 10, 23, | 7, 50, 55]

Similarly, the $2, 10$ are swapped since $2 < 10$ .

() <- element who will be moved
L = [3, (2), 6,  10, 23, | 7, 50, 55]

Next, the $2, 6$ are swapped since $2 < 6$ .

() <- element who will be moved
L = [(2), 3, 6,  10, 23, | 7, 50, 55]

Next, the $2, 3$ are swapped since $2 < 3$ . After this round, there is nothing to compare, so the $2$ is moved to its right position. Now $A = [2, 3, 6, 10, 23]$ and $B = [7, 50, 55]$

In the same way, we can do this process again with $7$ . It's the minimal element of the later list $B$ now.

() <- element who will be moved
L = [2, 3, 6, 10, 23, | (7), 50, 55]
L = [2, 3, 6, 10, (7), 23, | 50, 55]
L = [2, 3, 6, (7), 10, 23, | 50, 55]

L = [2, 3, 6, 7, 10, 23, | 50, 55]

After this round, $A = [2, 3, 6, 7, 10, 23]$ and $B = [50, 55]$ .

Then do it again with with $50$ .

() <- element who will be moved
L = [2, 3, 6,  7, 10, 23, | (50), 55]

L = [2, 3, 6,  7, 10, 23, 50, | 55]

However, $50$ doesn't move because $23 < 50$ . We just need to append $50$ to the end of the former list $A$ . After this round, $A = [2, 3, 6, 7, 10, 23, 50]$ and $B = [55]$ .

() <- element who will be moved
L = [2, 3, 6,  7, 10, 23, 50, | (55)]

L = [2, 3, 6,  7, 10, 23, 50, 55]

It's same to $55$ . It doesn't need to be moved since $50 < 55$ , so just append it to the $A$ . Finally, $A = [2, 3, 6, 7, 10, 23, 50, 55]$ and $B = []$ is empty now. Now we have a sorted list $L = A \cup B = A$ !

Which merge method is better

The first method use extra space to store the sorted results, rather than the second in-place solution. On the other hand, the second method needs more swapping executions and its a linear operation. For better performance, we take the first method as our approach here.

Actually, there is a way to save the extra space and it works as fast as the first method above. However, it's complicated. I will write another post for illustrating it. Please refer In-place merge sort in Elementary Algorithms to read it.

Algorithm

$\begin{aligned} & MergeSort(L): \\ & \space \space \space \space mergeSort(L, 1, \vert L \vert) \end{aligned}$

$\begin{aligned} & mergeSort(L, l, r): \\ & \space \space \space \space \text{if } l < r: \\ & \space \space \space \space \space \space \space \space m \leftarrow \lfloor \frac{l+r}{2} \rfloor \\ & \space \space \space \space \space \space \space \space mergeSort(L, l, m) \\ & \space \space \space \space \space \space \space \space mergeSort(L, m+1, r) \\ & \space \space \space \space \space \space \space \space merge(L, l, m, r) \\ \end{aligned}$

$\begin{aligned} & merge(L, l, m, r): \\ & \space \space \space \space L^\prime \leftarrow [] \\ & \space \space \space \space i \leftarrow l, j \leftarrow m+1, k \leftarrow l \\ & \space \space \space \space \text{while } i \leq m \text{ and } j \leq r: \\ & \space \space \space \space \space \space \space \space \text{if } L[i] < L[j]: \\ & \space \space \space \space \space \space \space \space \space \space \space \space L^\prime[k] \leftarrow L[i] \\ & \space \space \space \space \space \space \space \space \space \space \space \space i \leftarrow i + 1 \\ & \space \space \space \space \space \space \space \space \text{else}: \\ & \space \space \space \space \space \space \space \space \space \space \space \space L^\prime[k] \leftarrow L[j] \\ & \space \space \space \space \space \space \space \space \space \space \space \space j \leftarrow j + 1 \\ & \space \space \space \space \space \space \space \space k \leftarrow k + 1 \\ & \space \space \space \space \text{while } i \leq m: \\ & \space \space \space \space \space \space \space \space L^\prime[k] \leftarrow L[i] \\ & \space \space \space \space \space \space \space \space i \leftarrow i + 1 \\ & \space \space \space \space \space \space \space \space k \leftarrow k + 1 \\ & \space \space \space \space \text{while } j \leq r: \\ & \space \space \space \space \space \space \space \space L^\prime[k] \leftarrow L[j] \\ & \space \space \space \space \space \space \space \space j \leftarrow j + 1 \\ & \space \space \space \space \space \space \space \space k \leftarrow k + 1 \\ & \space \space \space \space \text{for } i \leftarrow l \text{ to } r: \\ & \space \space \space \space \space \space \space \space L[i] \leftarrow L^\prime[i] \\ \end{aligned}$

Proof

Correctness of Merge

  List L

        <------   sorted   ------> <------   sorted  ------->
        <- merged -> <---  A  ---> <- merged -> <---  B  --->
  -----+---+--------+---+-----+---+-----+------+---+-----+---+-----
   ... | l | ...... | i | ... | m | m+1 | .... | j | ... | r | ...
  -----+---+--------+---+-----+---+-----+------+---+-----+---+-----
                      ^                          ^
                head of sublist A          head of sublist B

  List L'

   <--   merged  --> <---   empty   --->
  +---+-------+-----+---+-------+-------+
  | 1 |  ...  | k-1 | k | ..... | r-l+1 |
  +---+-------+-----+---+-------+-------+
                      ^
                head of empty area of list L'

  L[l...m]    : the sorted sublists for merging with L[m+1...r]
  L[m+1...r]  : the sorted sublists for merging with L[l...m]
  A, B        : the sublists containing elements that have NOT been merged yet
  L'[1...k-1] : the merged list from L[l...i-1] and L[m+1...j-1]

  i: The index of the first element in L[l...m] that has NOT been merged yet
  j: The index of the first element in L[m+1...r] that has NOT been merged yet
  k: The index of next merged element copied from L[i] or L[j]

Loop Invariant: At the beginning of the while-loop, the following conditions hold:

Sublists $L[i...m]$ and $L[j...r]$ are sorted
$L^\prime$ holds the elements from sublists $L[l...i-1]$ and $L[m+1...j-1]$
All elements in $L^\prime[1...k-1]$ is less or equal than sublists $L[i...m]$ and $L[j...r]$
$L^\prime$ are sorted. Formally, $\forall i \in [l + 1, r], L^\prime[i - 1] \leq L^\prime[i]$

Then we use loop-invariants to prove:

Initialization: At the very beginning when $k = 1, i = l, j = m+1$ $k = 1, i = l, j = m + 1$
- the input $L[l...m], L[m+1...r]$ are sorted so 1 holds
- the list $L^\prime$ is empty so 2, 3, 4 hold
Maintenance: Consider the iteration $k = x$ $k = x$
- 1 is preserved since there is no change in $L$
- 2 is preserved because
  - If $j > r \lor (i \leq m \land L[i] < L[j])$ $j > r \lor (i \leq m \land L [i] < L [j])$ , $L^\prime[k] \leftarrow L[i]$ $L^{' } [k] \leftarrow L [i]$
    - then $k \leftarrow k+1, i \leftarrow i+1$
    - 3 is preserved because $L[k-1] = L[i-1] < L[j] \leq L[j+1] \leq ... \leq L[r]$ and $L[k-1] = L[i-1] \leq L[i] \leq ... \leq L[m]$
  - Otherwise, $L^\prime[k] \leftarrow L[j]$ $L^{' } [k] \leftarrow L [j]$
    - then $k \leftarrow k+1, j \leftarrow j+1$
    - 3 is preserved because $L[k-1] = L[j-1] < L[j] \leq L[j+1] \leq ... \leq L[r]$ and $L[k-1] = L[j-1] \leq L[i] \leq L[i+1] \leq ... \leq L[m]$
- The previous appended element must be smaller than the current selected minimal element or 1 is false
- By 3, the next selected minimal element will be larger than current one
- So 4 is also preserved
Termination
- By 2, $L^\prime$ consists of the elements in $L[l...r]$
- By 4, $L[l...r] = L^\prime[l...r]$ are sorted

Correctness of Merge Sort

Given a list $L$ with $N$ elements, the $L$ can be sorted by applying the above the MergeSort with $l = 1, r = N$ .

Base step: When $N = 1$ , it's trivial.
Induction Hypothesis: Suppose this assumption holds when list has $N = 1, 2, ..., k$ elements
Induction Step: When $N = k + 1$ $N = k + 1$
- the list $L$ is divide to $L[1...m]$ ( $m$ elements) and $L[m+1...N]$ ( $N - m$ elements)
- so $m = \lfloor \frac{1+(k+1)}{2} \rfloor = \lfloor \frac{k}{2} \rfloor + 1 \leq k$
- $1 \leq m \to 0 \leq m-1$ $1 \leq m \to 0 \leq m - 1$
  - $\to k \leq k-1+m$
  - $\to k+1 \leq k+m$
  - $\to (k+1)-m \leq k$
  - $\to N-m \leq k$
- By our hypothesis, $L[1...m]$ and $L[m+1...N]$ can be sorted
- By the proved correctness of merge above, the merged $L[1...m]$ and $L[m+1...N]$ is also sorted, so the proof is done

Complexity

  ^    +------------------------------------------------------+   Merge
  |    |                           N                          |   Complexity
  |    +------------------------------------------------------+
  |                 |                              |
  |                 v                              v
  |    +------------------------+    +------------------------+
  |    |           N/2          |    |           N/2          |   2 * O(N/2)
  |    +------------------------+    +------------------------+
  |         |              |              |              |
            v              v              v              v
  K    +---------+    +---------+    +---------+    +---------+
       |   N/4   |    |   N/4   |    |   N/4   |    |   N/4   |   4 * O(N/4)
  |    +---------+    +---------+    +---------+    +---------+
  |      |     |        |     |        |     |        |     |
  |      v     v        v     v        v     v        v     v
  |
  |                        .  .  .  .  .  .                       2^i * O(N/(2^i))
  |
  |    +---+  +---+  +---+                                +---+
  |    | 1 |  | 1 |  | 1 |  .  .  .  .  .  .  .  .  .  .  | 1 |   N * O(1)
  v    +---+  +---+  +---+                                +---+

  N: the number of list elements.
  K: K layers from N to 1.
     N/2^k = 1 => N = 2^K => K = log_2(N)

The above figure is the recursion tree of merge sort. The list containing $N$ elements is recursively divided to sort until there is only one elements. Suppose that there is $K$ times of division, therefore,

$\begin{aligned} \frac{ N }{ 2^K } &= 1 \\ \to N &= 2^K \\ \to K &= \log_{ 2 }N \end{aligned}$

On the other hand, the time complexity depends on the performance of merge $T_{merge}(N)$ . The used merge here is the basic version. It iteratively picks the minimal elements from both sublists then copied to another list $L^\prime$ . After all the elements in one sublist are all selected, we move the rest elements in the other sublist to list $L^\prime$ . Finally, we assigned $L[i] \leftarrow L^\prime[i]$ , $\forall i \in [1, N]$ . Thus, $T_{merge}(N)$ can be defined as

$T_{merge}(N) = c \cdot N$

where $c$ is a constant reflecting the basic operations like comparisons or assignments for merging routine.

By the recursion tree

From the above figure, the total time for the merge sort is

$\begin{aligned} Time &= 2 \cdot c \cdot \frac{N}{2} +4 \cdot c \cdot \frac{N}{4} + 8 \cdot c \cdot \frac{N}{8} + ... + N \cdot c \cdot 1 \\ &= K \cdot ( c \cdot N) \end{aligned}$ , where $K$ is the number of terms in the above equation.

The term is $2^K \cdot c \cdot \frac{N}{2^K}$ and it is from $2 \cdot c \cdot \frac{N}{2}$ to $N \cdot c \cdot 1$ , so we have $K = \log_{ 2 }N$ terms. As a result,

$Time = c \cdot N \cdot \log_{ 2 }N$

Thus, the time complexity is $\mathcal{O}(N \log N)$ .

By telescoping

Formally, since the merge sort repeatedly breaks down the $N$ -elements list into two $\frac{N}{2}$ -elements sublists, the amount of time that merge sort, $T_{sort}(N)$ , can be written as follows:

$\begin{aligned} T_{sort}(N) &= T_{sort}(\frac{N}{2}) + T_{sort}(\frac{N}{2}) + T_{merge}(N) \\ &= 2 \cdot T_{sort}(\frac{N}{2}) + T_{merge}(N) \\ &= 2 \cdot T_{sort}(\frac{N}{2}) + c \cdot N \end{aligned}$

$\begin{aligned} T_{sort}(N) &= 2 \cdot T_{sort}(\frac{N}{2}) + c \cdot N \\ &= 2 \cdot (2 \cdot T_{sort}(\frac{N}{4}) + c \cdot \frac{N}{2}) + c \cdot N \\ &= 2^2 \cdot T_{sort}(\frac{N}{4}) + 2 \cdot c \cdot N \\ &= 2^2 \cdot (2 \cdot T_{sort}(\frac{N}{8}) + c \cdot \frac{N}{4}) + 2 \cdot c \cdot N \\ &= 2^3 \cdot T_{sort}(\frac{N}{4}) + 3 \cdot c \cdot N \\ &= ... \\ &= 2^K \cdot T_{sort}(\frac{N}{2^K}) + K \cdot c \cdot N \\ &= N \cdot T_{sort}(1) + K \cdot c \cdot N \\ &= N \cdot 1 + K \cdot c \cdot N \\ &= N + c \cdot N \cdot \log_{ 2 }N \end{aligned}$