Insertion sort

Idea

The basic concept is similar to Selection Sort. Considering there are two lists. One is already sorted, and the other is unsorted, denoted LsortedL_{sorted} and LunsortedL_{unsorted} respectively. The key idea is to pick the element from LunsortedL_{unsorted} one by one and then insert them into the correct positions of LsortedL_{sorted}. Suppose we have Lsorted=[3,8,34]L_{sorted} = [3, 8, 34] and Lunsorted=[23,2,67,34,97]L_{unsorted} = [23, 2, 67, 34, 97]:

  • Step 1
    • Pick 2323 (which is the first element) from LunsortedL_{unsorted}, and insert it into LsortedL_{sorted}
    • Find a position in LsortedL_{sorted} such that all elements before it is less than or equal to 2323 and all elements after it is greater than 2323
    • Start comparing it from the last(maximal) element to the first(minimal) one in LsortedL_{sorted} (Or you can do same thing from the first element to the last one)
    • 3434 is greater than 2323, so we keep moving
    • Next, we found that 88 is less than 2323
    • A-ha! 2323 should be inserted between 88 and 3434
    • The LsortedL_{sorted} and LunsortedL_{unsorted} are updated to [3,8,23,34][3, 8, 23, 34] and [2,67,34,97][2, 67, 34, 97] respectively.
  • Step 2
    • Pick the current first element of LunsortedL_{unsorted}, 22, and insert it into LsortedL_{sorted}
    • Same as the previous step, we start comparing 22 from the maximal element of LsortedL_{sorted} to find the position to insert
    • 3434 is obviously larger than 22, so we should keep moving
    • In this step, we can not find any element less than or equal to 22 after the all elements in LsortedL_{sorted} are checked
    • Thus, the 22 is the minimal value among these elements
    • We should put 22 as the first element in LsortedL_{sorted}
    • The LsortedL_{sorted} and LunsortedL_{unsorted} are updated to [2,3,8,23,34][2, 3, 8, 23, 34] and [67,34,97][67, 34, 97] respectively
  • Step 3
    • pick the current first element of LunsortedL_{unsorted}, 6767, and then insert it into LsortedL_{sorted}
    • Start comparing 6767 with 3434, we found 6767 is greater
    • It means that 6767 is the maximal value among these elements
    • Therefore, 6767 should be inserted at the last position of LsortedL_{sorted}
    • The LsortedL_{sorted} and LunsortedL_{unsorted} are updated to [2,3,8,23,34,67][2, 3, 8, 23, 34, 67] and [34,97][34, 97] respectively
  • Step 4
    • 3434 is picked to compare with the elements in LsortedL_{sorted}.
    • 6767 is greater than 3434, so go next
    • 3434 is equal to 3434, so we stop here
    • The picked 3434 should be inserted between the existed 3434 and 6767
    • so the LsortedL_{sorted} and LunsortedL_{unsorted} are updated to [2,3,8,23,34,34,67][2, 3, 8, 23, 34, 34, 67] and [97][97] respectively.
  • Step 5
    • 9797 is picked to insert.
    • 9797 is greater than 6767,
    • so it should be put to the last position of LsortedL_{sorted}
    • Finally, LunsortedL_{unsorted} is empty and Lsorted=[2,3,8,23,34,34,67,97]L_{sorted} = [2, 3, 8, 23, 34, 34, 67, 97].

How to find the inserted position

We can use the following method to find the first element whose value is less than or equal to the picked element: Position(L,x):    iN    while i>0 and L[i]>x:        ii1    return i \begin{aligned} & Position(L, x): \\ & \space \space \space \space i \leftarrow N\\ & \space \space \space \space \text{while } i > 0 \text{ and } L[i] > x: \\ & \space \space \space \space \space \space \space \space i \leftarrow i - 1 \\ & \space \space \space \space \text{return} \space i \\ \end{aligned} ,where xx is the element needs to be inserted, L[i]L[i] is the iith element in the sorted list LL, and N=LN = \vert L \vert is the length of LL.

After getting the position p=Position(L,x)p = Position(L, x) given the element xx, we need to insert xx between L[p]L[p] and L[p+1]L[p+1]. (If p=0p = 0, then we insert xx as the first element L[1]L[1]. If p=Np = N, then we insert xx as the last element L[p+1]L[p + 1].)

Dividing one list into unsorted list and sorted list

In implementation, we usually divide the source list LL into two parts. One is sorted, the other is unsorted. They are denoted LsortedL_{sorted} and LunsortedL_{unsorted} respectively. This is better for memory usage than creating another list to put the sorted results.

Suppose we have L=[73,24,37,9,97,29]=LsortedLunsortedL = [73, 24, 37, 9, 97, 29] = L_{sorted} \cup L_{unsorted}, where LsortedL_{sorted} and LunsortedL_{unsorted} are initialized to [][] and [73,24,37,9,97,29][73, 24, 37, 9, 97, 29].

  • First round
    • 7373 is picked, but there is nothing could be compared
    • so we just put it into LsortedL_{sorted}
    • Lsorted=[73]L_{sorted} = [73] and Lunsorted=[24,37,9,97,29]L_{unsorted} = [24, 37, 9, 97, 29]
    • now L=LsortedLunsorted=[7324,37,9,97,29]L = L_{sorted} \cup L_{unsorted} = [73 \vert 24, 37, 9, 97, 29]
  • Second round
    • 2424 is picked and p=Position(Lsorted,24)=0p = Position(L_{sorted}, 24) = 0
    • so, we should insert 2424 as the first element and update lists
    • then Lsorted=[24,73]L_{sorted} = [24, 73] and Lunsorted=[37,9,97,29]L_{unsorted} = [37, 9, 97, 29]
    • now L=LsortedLunsorted=[24,7337,9,97,29]L = L_{sorted} \cup L_{unsorted} = [24, 73 \vert 37, 9, 97, 29]
  • Third round
    • 3737 is picked and p=Position(Lsorted,37)=1p = Position(L_{sorted}, 37) = 1
    • so we should insert 3737 between L[p]=L[1]=24L[p] = L[1] = 24 and L[p+1]=L[2]=73L[p + 1] = L[2] = 73
    • then Lsorted=[24,37,73]L_{sorted} = [24, 37, 73] and Lunsorted=[9,97,29]L_{unsorted} = [9, 97, 29]
    • now L=LsortedLunsorted=[24,37,739,97,29]L = L_{sorted} \cup L_{unsorted} = [24, 37, 73 \vert 9, 97, 29]
  • Fourth round
    • 99 is picked and p=Position(Lsorted,9)=0p = Position(L_{sorted}, 9) = 0
    • Thus, Lsorted,LunsortedL_{sorted}, L_{unsorted} are updated to [9,24,37,73][9, 24, 37, 73] and [97,29][97, 29].
    • now L=LsortedLunsorted=[9,24,37,7397,29]L = L_{sorted} \cup L_{unsorted} = [9, 24, 37, 73 \vert 97, 29]
  • Fifth round
    • 9797 is picked and p=Position(Lsorted,97)=4=Lsortedp = Position(L_{sorted}, 97) = 4 = \vert L_{sorted} \vert
    • so we should put 9797 as the last element of the LsortedL_{sorted}
    • then Lsorted=[9,24,37,73,97]L_{sorted} = [9, 24, 37, 73, 97] and Lunsorted=[29]L_{unsorted} = [29]
    • now L=LsortedLunsorted=[9,24,37,73,9729]L = L_{sorted} \cup L_{unsorted} = [9, 24, 37, 73, 97 \vert 29].
  • Final round
    • 2929 is picked and p=Position(Lsorted,29)=2p = Position(L_{sorted}, 29) = 2
    • so we should insert 2929 between L[p]=L[2]=24L[p] = L[2] = 24 and L[p+1]=L[3]=37L[p + 1] = L[3] = 37
    • then Lsorted=[9,24,29,37,73,97]L_{sorted} = [9, 24, 29, 37, 73, 97] and Lunsorted=[]L_{unsorted} = [] is empty
    • now L=LsortedLunsorted=[9,24,29,37,73,97]L = L_{sorted} \cup L_{unsorted} = [9, 24, 29, 37, 73, 97].

Algorithm

InsertionSort(L):    for i2 to L:        ji        while j>1 and L[j1]>L[j]:            swap L[j1] and L[j]            jj1 \begin{aligned} & InsertionSort(L): \\ & \space \space \space \space \text{for } i \leftarrow 2 \text{ to } \vert L \vert: \\ & \space \space \space \space \space \space \space \space j \leftarrow i\\ & \space \space \space \space \space \space \space \space \text{while } j > 1 \text{ and } L[j-1] > L[j]: \\ & \space \space \space \space \space \space \space \space \space \space \space \space \text{swap } L[j-1] \text{ and } L[j] \\ & \space \space \space \space \space \space \space \space \space \space \space \space j \leftarrow j - 1 \\ \end{aligned}

The above method will divide LL into two parts. L[1...i1]=LsortedL[1...i-1] = L_{sorted} is sorted, and L[i...N]=LunsortedL[i...N] = L_{unsorted} is unsorted, where N=LN = \vert L \vert is the length of LL. The L[i]L[i] will be picked to insert into LsortedL_{sorted} iteratively.

  • When i=2i = 2
    • Lsorted=L[1]L_{sorted} = L[1] and Lunsorted=L[2...N]L_{unsorted} = L[2...N]
    • The goal in this round is to insert the L[2]L[2] into LsortedL_{sorted}
    • The L[2]L[2] is picked and compare with L[1]L[1]
    • If L[2]<L[1]L[2] < L[1], then we swap them
    • Otherwise, do nothing
    • Then, Lsorted=L[1...2]L_{sorted} = L[1...2] is sorted and Lunsorted=L[3...N]L_{unsorted} = L[3...N]
  • When i=3i = 3
    • Lsorted=L[1...2]L_{sorted} = L[1...2] and Lunsorted=L[3...N]L_{unsorted} = L[3...N]
    • The goal in this round is to insert the L[3]L[3] into LsortedL_{sorted}
    • The L[3]L[3] is picked
    • If L[3]>=L[2]L[3] >= L[2], it means that L[1...3]L[1...3] is sorted, so we don't need to do anything
    • Otherwise(L[3]<L[2]L[3] < L[2]), swap L[3]L[3] and L[2]L[2] and check whether it needs to swap again if L[2]<L[1]L[2] < L[1]
    • After finishing checking, Lsorted=L[1...3]L_{sorted} = L[1...3] is sorted and Lunsorted=L[4...N]L_{unsorted} = L[4...N]
  • When i=ki = k
    • Lsorted=L[1...k1]L_{sorted} = L[1...k-1] and Lunsorted=L[k...N]L_{unsorted} = L[k...N]
    • The goal in this round is to insert the L[k]L[k] into LsortedL_{sorted}
    • The L[k]L[k] is picked to compare with the elements one by one in LsortedL_{sorted}, from the maximal(L[k1]L[k-1]) to minimal item(L[1]L[1]), to find a place to insert
    • After finishing checking, Lsorted=L[1...k]L_{sorted} = L[1...k] is sorted and LunsortedL_{unsorted} is L[k+1...N]L[k+1...N]
  • When i=Ni = N
    • Lsorted=L[1...N1]L_{sorted} = L[1...N-1] and Lunsorted=L[N]L_{unsorted} = L[N]
    • The goal in this round is to insert the L[N]L[N] into LsortedL_{sorted}
    • In the same way, the L[1...N]L[1...N] is sorted after finishing the procedure
    • so LsortedL_{sorted} is updated to L[1...N]L[1...N] and Lunsorted=[]L_{unsorted} = [] is empty

Another method without swapping

InsertionSort(L):    for i2 to L:        cL[i]        ji        while j>1 and L[j1]>c:            L[j]=L[j1]            j=j1        L[j]=c \begin{aligned} & InsertionSort(L): \\ & \space \space \space \space \text{for } i \leftarrow 2 \text{ to } \vert L \vert: \\ & \space \space \space \space \space \space \space \space c \leftarrow L[i] \\ & \space \space \space \space \space \space \space \space j \leftarrow i \\ & \space \space \space \space \space \space \space \space \text{while } j > 1 \text{ and } L[j-1] > c: \\ & \space \space \space \space \space \space \space \space \space \space \space \space L[j] = L[j-1] \\ & \space \space \space \space \space \space \space \space \space \space \space \space j = j - 1 \\ & \space \space \space \space \space \space \space \space L[j] = c \\ \end{aligned}

Proof

Proof by mathematical induction

After each iteration for ii in InsertionSortInsertionSort, the L[1...i]L[1...i] is sorted array.

We need to prove this statement is true.

  • when i=2i = 2:
    • Same as the above explanation
    • The assumption is hold
  • when i=ki = k:
    • Assume the statement is hold when i=ki = k
    • L[1...k]L[1...k] is sorted array
  • when i=k+1i = k + 1
    • If L[k+1]>L[k]L[k + 1] > L[k], then L[1...k+1]L[1...k + 1] is naturally sorted so the proof is done
    • Otherwise, the L[k+1]L[k + 1] is swapped with L[k]L[k]
    • Now L[1...k1]L[1...k-1] is sorted and L[k+1]>L[k]L[k + 1] > L[k](after swapping!)
    • Next, we apply this algorithm to L=L[1...k1]L[k]L = L[1...k-1] \cup L[k] and now i=ki = k
    • The statement is hold when i=ki = k, so L[1...k]L[1...k] is sorted after applying the algorithm
    • Now L[1...k]L[1...k] is sorted and L[k+1]>L[k]L[k + 1] > L[k], so the proof is done

Complexity

The time complexity depends on the speed to find the inserted position. The more iterations to find the value of Position(L,x)Position(L, x) need, the more time it takes. The worst case is that we need to go through whole LsortedL_{sorted} to find correct positions to insert. It happens when the list is arranged from maximal to minimal values(e.g.,[5,4,3,2,1][5, 4, 3, 2, 1]). In this case, if the length of list is NN, we need to search 0+1+2+...+(N1)=N(N1)2=12N212N \begin{aligned} 0 + 1 + 2 + ... + (N - 1) &= \frac{ N \cdot (N - 1) }{ 2 } \\ &= \frac{ 1 }{ 2 } \cdot N^2 - \frac{ 1 }{ 2 } N \end{aligned} times to move all the items into LsortedL_{sorted}. Thus, the complexity is O(N2)\mathcal{O}(N^2).

Implementation

See the files on gist here.

results matching ""

    No results matching ""