Merge Sort
Idea
Merge sort is an efficient algorithm that applies the concepts of divide and conquer to sort the list.
The key idea of divide and conquer is to recursively break down the problems into two or more sub-problems and they are same or related to the original problem, until these divided sub-problems are simple enough to solve directly. Then, the solutions of the original problem can be combined and derived by the solutions of all the sub-problems.
Divide and conquer
The calculation of Fibonacci number, , is one example. To calculate , it needs to find and . Similarly, to calculate , it needs to and . The sub-problems for calculating and have same form as the one for .
Recursively, we will need to get , , ..., , and and are easy enough to solve directly. They are both . Thus, , , ... and then can be computed.
Dividing the sorting-problem
Let's apply this concept to the sorting problem. If we want to sort the list , the sub-lists must also be sorted, so we can narrow down our problem scope for handling the sub-lists. Next, can be divided to and also can be split into and . Finally, are not dividable so we stop breaking down the list.
[6, 3, 7, 1, 9, 2, 5]
/ \
[6, 3, 7, 1] [9, 2, 5]
/ \ / \
[6, 3] [7, 1] [9, 2] [5]
/ \ / \ / \
[6] [3] [7] [1] [9] [2]
In the same way, the whole list can be divided into .
Conquering the sub-problems
After there is only one element left, the subproblem is solved by nature since it's already sorted.
However, the problem becomes
how do we combine these sorted chunks into a sorted list
We need a method that can merge two sorted lists, and , where , into a bigger sorted list .
Combining all the results of sub-sorting-problem
Suppose we have two sorted lists and . We provide two ways to merge them into a sorted list.
Picking the smallest elements one by one
The simplest method is to pick the smallest elements iteratively by searching both lists from the minimal to maximal.
We only need to compare the left most elements of both lists and pick the smaller one since and are already sorted.
The following example demonstrate the process of this idea:
() <- search index
A = [(3), 6, 10, 23]
B = [(2), 7, 50, 55]
L = [] // <- 2
In the first round, is picked since and going to be put into another list .
// You can think the left most element
// is shifted one by one
A = [(3), 6, 10, 23] // [3, 6, 10, 23]
B = [2, (7), 50, 55] // [7, 50, 55]
L = [2] // <- 3
After is picked, we move the index of from to . Next, is picked since and going to be put into .
A = [3, (6), 10, 23] // [6, 10, 23]
B = [2, (7), 50, 55] // [7, 50, 55]
L = [2, 3] // <- 6
After is picked, we move the index of from to . Next, is picked since and going to be put into .
A = [3, 6, (10), 23] // [10, 23]
B = [2, (7), 50, 55] // [7, 50, 55]
L = [2, 3, 6] // <- 7
After is picked, we move the index of from to . Next, is picked since and going to be put into .
A = [3, 6, (10), 23] // [10, 23]
B = [2, 7, (50), 55] // [50, 55]
L = [2, 3, 6, 7] // <- 10
After is picked, we move the index of from to . Next, is picked since and going to be put into .
A = [3, 6, 10, (23)] // [23]
B = [2, 7, (50), 55] // [50, 55]
L = [2, 3, 6, 7, 10]
After is picked, we move the index of from to . Next, is picked since and going to be put into .
A = [3, 6, 10, 23] // []
B = [2, 7, (50), 55] // [50, 55]
L = [2, 3, 6, 7, 10, 23]
After is picked, there is no need to compare again since the is the last element in .
A = [3, 6, 10, 23] // []
B = [2, 7, 50, 55] // []
L = [2, 3, 6, 7, 10, 23, 50, 55]
Next, we can append all the rest elements from to the end of into the . Finally, we get a sort list .
Swapping the elements one by one
Another idea to merge the two sorted lists and , is to couple them together into a list
() <- element who will be moved
L = [3, 6, 10, 23, | (2), 7, 50, 55]
// The '|' doesn't exist! It's only a notation for better explanation.
and then move the minimal element of the later list() to the right position of the former list().
The way for finding right the position is to compare the elements one by one from the end of the former list() to its head.
() <- element who will be moved
L = [3, 6, 10, (2), 23, | 7, 50, 55]
In our example, the is swapped with since . Then we keep comparing with .
() <- element who will be moved
L = [3, 6, (2), 10, 23, | 7, 50, 55]
Similarly, the are swapped since .
() <- element who will be moved
L = [3, (2), 6, 10, 23, | 7, 50, 55]
Next, the are swapped since .
() <- element who will be moved
L = [(2), 3, 6, 10, 23, | 7, 50, 55]
Next, the are swapped since . After this round, there is nothing to compare, so the is moved to its right position. Now and
In the same way, we can do this process again with . It's the minimal element of the later list now.
() <- element who will be moved
L = [2, 3, 6, 10, 23, | (7), 50, 55]
L = [2, 3, 6, 10, (7), 23, | 50, 55]
L = [2, 3, 6, (7), 10, 23, | 50, 55]
L = [2, 3, 6, 7, 10, 23, | 50, 55]
After this round, and .
Then do it again with with .
() <- element who will be moved
L = [2, 3, 6, 7, 10, 23, | (50), 55]
L = [2, 3, 6, 7, 10, 23, 50, | 55]
However, doesn't move because . We just need to append to the end of the former list . After this round, and .
() <- element who will be moved
L = [2, 3, 6, 7, 10, 23, 50, | (55)]
L = [2, 3, 6, 7, 10, 23, 50, 55]
It's same to . It doesn't need to be moved since , so just append it to the . Finally, and is empty now. Now we have a sorted list !
Which merge method is better
The first method use extra space to store the sorted results, rather than the second in-place solution. On the other hand, the second method needs more swapping executions and its a linear operation. For better performance, we take the first method as our approach here.
Actually, there is a way to save the extra space and it works as fast as the first method above. However, it's complicated. I will write another post for illustrating it. Please refer In-place merge sort in Elementary Algorithms to read it.
Algorithm
Proof
Correctness of Merge
List L
<------ sorted ------> <------ sorted ------->
<- merged -> <--- A ---> <- merged -> <--- B --->
-----+---+--------+---+-----+---+-----+------+---+-----+---+-----
... | l | ...... | i | ... | m | m+1 | .... | j | ... | r | ...
-----+---+--------+---+-----+---+-----+------+---+-----+---+-----
^ ^
head of sublist A head of sublist B
List L'
<-- merged --> <--- empty --->
+---+-------+-----+---+-------+-------+
| 1 | ... | k-1 | k | ..... | r-l+1 |
+---+-------+-----+---+-------+-------+
^
head of empty area of list L'
L[l...m] : the sorted sublists for merging with L[m+1...r]
L[m+1...r] : the sorted sublists for merging with L[l...m]
A, B : the sublists containing elements that have NOT been merged yet
L'[1...k-1] : the merged list from L[l...i-1] and L[m+1...j-1]
i: The index of the first element in L[l...m] that has NOT been merged yet
j: The index of the first element in L[m+1...r] that has NOT been merged yet
k: The index of next merged element copied from L[i] or L[j]
Loop Invariant: At the beginning of the while-loop, the following conditions hold:
- Sublists and are sorted
- holds the elements from sublists and
- All elements in is less or equal than sublists and
- are sorted. Formally,
Then we use loop-invariants to prove:
- Initialization: At the very beginning when
- the input are sorted so 1 holds
- the list is empty so 2, 3, 4 hold
- Maintenance: Consider the iteration
- 1 is preserved since there is no change in
- 2 is preserved because
- If ,
- then
- 3 is preserved because and
- Otherwise,
- then
- 3 is preserved because and
- If ,
- The previous appended element must be smaller than the current selected minimal element or 1 is false
- By 3, the next selected minimal element will be larger than current one
- So 4 is also preserved
- Termination
- By 2, consists of the elements in
- By 4, are sorted
Correctness of Merge Sort
Given a list with elements, the can be sorted by applying the above the MergeSort with .
- Base step: When , it's trivial.
- Induction Hypothesis: Suppose this assumption holds when list has elements
- Induction Step: When
- the list is divide to ( elements) and ( elements)
- so
- By our hypothesis, and can be sorted
- By the proved correctness of merge above, the merged and is also sorted, so the proof is done
Complexity
^ +------------------------------------------------------+ Merge
| | N | Complexity
| +------------------------------------------------------+
| | |
| v v
| +------------------------+ +------------------------+
| | N/2 | | N/2 | 2 * O(N/2)
| +------------------------+ +------------------------+
| | | | |
v v v v
K +---------+ +---------+ +---------+ +---------+
| N/4 | | N/4 | | N/4 | | N/4 | 4 * O(N/4)
| +---------+ +---------+ +---------+ +---------+
| | | | | | | | |
| v v v v v v v v
|
| . . . . . . 2^i * O(N/(2^i))
|
| +---+ +---+ +---+ +---+
| | 1 | | 1 | | 1 | . . . . . . . . . . | 1 | N * O(1)
v +---+ +---+ +---+ +---+
N: the number of list elements.
K: K layers from N to 1.
N/2^k = 1 => N = 2^K => K = log_2(N)
The above figure is the recursion tree of merge sort. The list containing elements is recursively divided to sort until there is only one elements. Suppose that there is times of division, therefore,
On the other hand, the time complexity depends on the performance of merge . The used merge here is the basic version. It iteratively picks the minimal elements from both sublists then copied to another list . After all the elements in one sublist are all selected, we move the rest elements in the other sublist to list . Finally, we assigned , . Thus, can be defined as
where is a constant reflecting the basic operations like comparisons or assignments for merging routine.
By the recursion tree
From the above figure, the total time for the merge sort is
, where is the number of terms in the above equation.
The term is and it is from to , so we have terms. As a result,
Thus, the time complexity is .
By telescoping
Formally, since the merge sort repeatedly breaks down the -elements list into two -elements sublists, the amount of time that merge sort, , can be written as follows:
Thus, the time complexity is .
Implementation
See the files on gist here.