Searching and Sorting

Gregory M. Kapfhammer

March 25, 2024

What are purposes of searching and sorting?

  • Searching finds a specific item in a collection
  • Sorting orders the items in a collection
  • Recursive and iterative algorithms are possible
  • Basic building blocks of many algorithms

Let’s explore several sorting algorithms!

  • Quadratic time sorting algorithms are easy to implement
  • Various sorting algorithms have quadratic time complexity
  • Mergesort and quicksort are efficient yet harder to build
  • Python contains its own sorting algorithm called timsort
  • The divide and conquer paradigm is useful for sorting

Detecting a sorted list

def issorted(L):
    for i in range(len(L)-1):
        if L[i]>L[i+1]:
            return False
    return True

A = [1,2,3,4,5]
print(A, "is sorted:", issorted(A))

B = [1,4,5,7,2]
print(B, "is sorted:", issorted(B))
[1, 2, 3, 4, 5] is sorted: True
[1, 4, 5, 7, 2] is sorted: False

How do we know if a list is sorted?

  • Confirm that adjacent elements are in the correct order
  • Use a single for loop to compare adjacent elements in L

All-pairs issorted function

def issorted_allpairs(L):
    for i in range(len(L)-1):
        for j in range(i+1, len(L)):
            if L[j] < L[i]:
              return False
    return True

A = [1,2,3,4,5]
print(A, "is sorted:", issorted_allpairs(A))

B = [1,4,5,7,2]
print(B, "is sorted:", issorted_allpairs(B))
[1, 2, 3, 4, 5] is sorted: True
[1, 4, 5, 7, 2] is sorted: False
  • Use a double for loop to compare all pairs of elements in L
  • The issorted_allpairs function has a time complexity of \(O(n^2)\)
  • The issorted function has a time complexity of \(O(n)\)

Bubble sort algorithm

def bubblesort(L):
    for _ in range(len(L)-1):
        for i in range(len(L)-1):
            if L[i]>L[i+1]:
                L[i], L[i+1] = L[i+1], L[i]

data = [30,100000,54,26,93,17,77,31,44,55,20]
print("Is the data sorted?", issorted(data))
print(data)
bubblesort(data)
print("Is the data sorted?", issorted(data))
print(data)
Is the data sorted? False
[30, 100000, 54, 26, 93, 17, 77, 31, 44, 55, 20]
Is the data sorted? True
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]
  • Use a double for loop to order all of the elements in L
  • The bubblesort function has a time complexity of \(O(n^2)\)

Stopping early with bubble sort

def bubblesort_stopearly(L):
    keepgoing = True
    while keepgoing:
        keepgoing = False
        for i in range(len(L)-1):
            if L[i]>L[i+1]:
                L[i], L[i+1] = L[i+1], L[i]
                keepgoing = True

data = [30,100000,54,26,93,17,77,31,44,55,20]
bubblesort_stopearly(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]
  • Use a while loop containing a for loop to order elements in L

  • Although it may consume less time for some lists, the bubblesort_stopearly function still has a time complexity of \(O(n^2)\)

Implementing selection sort

def selectionsort(L):
    n = len(L)
    for i in range(n-1):
        max_index=0        
        for index in range(n - i):
            if L[index] > L[max_index]:
                max_index = index
        L[n-i-1], L[max_index] = L[max_index], L[n-i-1]

data = [30,100000,54,26,93,17,77,31,44,55,20]
selectionsort(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]
  • Key invariant: after \(i\) runs the largest \(i\) elements are in final position

  • Find maximum element and swap it with last unsorted element

  • Yet, the selectionsort function still has a time complexity of \(O(n^2)\)

Implementing insertion sort

def insertionsort(L):
    n = len(L)
    for i in range(n):
        j = n - i - 1
        while j < n - 1 and L[j]>L[j+1]:
            L[j], L[j+1] = L[j+1], L[j]
            j+=1

data = [30,100000,54,26,93,17,77,31,44,55,20]
insertionsort(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]
  • Key invariant: after \(i\) runs the last \(i\) elements are in sorted order

  • May be faster if the list is already sorted (or almost already sorted)

  • Yet, the insertionsort function still has a time complexity of \(O(n^2)\)

Sorting in Python

X = [3,1,5]
Y = sorted(X)
print("X:", X)
print("Y:", Y)

X.sort()
print("X:", X)
print("Y:", Y)
X: [3, 1, 5]
Y: [1, 3, 5]
X: [1, 3, 5]
Y: [1, 3, 5]
  • Two main functions to sort a list: sort() and sorted()
  • sort orders list and sorted returns a new list that is sorted
  • Note that calling sorted does not change the contents of X

Let’s design and implement some faster sorting algorithms!

  • Divide and conquer algorithms:
    • Step 1: divide the problem into 2 or more pieces
    • Step 2: conquer the problem by solving the pieces
    • Step 3: combine the solutions on the parts into a solution

Implementing mergesort

def mergesort(L):
    if len(L) < 2:
        return
    mid = len(L) // 2
    A = L[:mid]
    B = L[mid:]
    mergesort(A); mergesort(B)
    merge(A, B, L)

def merge(A, B, L):   
    i = 0
    j = 0
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            L[i+j] = A[i]
            i = i + 1
        else:
            L[i+j] = B[j]
            j = j + 1
    L[i+j:] = A[i:] + B[j:]

data = [30,100000,54,26,93,17,77,31,44,55,20]
mergesort(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]

Implementing quicksort

from random import randrange

def quicksort(L):
    _quicksort(L, 0, len(L))

def _quicksort(L, left, right):
    if right - left > 1:    
        mid = partition(L, left, right)
        _quicksort(L, left, mid)
        _quicksort(L, mid+1, right)

def partition(L, left, right):
    pivot = randrange(left, right)
    L[pivot], L[right -1] = L[right -1], L[pivot]
    i, j, pivot = left, right - 2, right - 1
    while i < j:
        while L[i] < L[pivot]:
            i += 1
        while i < j and L[j] >= L[pivot]:
            j -= 1
        if i < j:
            L[i], L[j] = L[j], L[i]
    if L[pivot] <= L[i]:
        L[pivot], L[i] = L[i], L[pivot]
        pivot = i
    return pivot

data = [30,100000,54,26,93,17,77,31,44,55,20]
quicksort(data)
print(data)
[17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]

Deep dive into quicksort

Quick sort time complexity

  • Worst-case time complexity: \(O(n^2)\)
  • Worst-case occurs when pivot is smallest or largest element
  • Random pivot selection helps avoid worst-case on average

Time complexity breakdown

  • partition(L, left, right): \(O(n)\)
  • quicksort(L): Defined by _quicksort
  • _quicksort(L, left, right):
    • Average case: \(O(n \times \log n)\)
    • Worst case: \(O(n^2)\)

Comparing mergesort and quicksort

Both algorithms use divide and conquer!

  • mergesort:
    • Stable sort maintains the relative order of equal elements
    • Worst-case time complexity is \(O(n \times \log n)\)
    • Requires additional space for merging intermediate sub-lists
  • quicksort:
    • Not a stable sort and thus relative order not preserved
    • Worst-case time complexity is \(O(n^2)\), but often faster in practice
    • In-place sorting means it does not require additional space

Recursive quick selection

def quickselect(L, k):
    return _quickselect(L, k, 0, len(L))

def _quickselect(L, k, left, right):
    pivot = partition(L, left, right)
    if k <= pivot:
        return _quickselect(L, k, left, pivot)
    elif k == pivot + 1:
        return L[pivot]
    else:
        return _quickselect(L, k, pivot + 1, right)

data = [30,100000,54,26,93,17,77,31,44,55,20]
selection_one = quickselect(data, 1)
selection_three = quickselect(data, 3)
selection_five = quickselect(data, 5)
quicksort(data)
print(selection_one, selection_three, selection_five, "for", data)
17 26 31 for [17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]

Iterative quick selection

def quickselect(L, k):
    left, right = 0, len(L)
    while left < right:
        pivot = partition(L, left, right)
        if k <= pivot:
            right = pivot
        elif k == pivot + 1:
            return L[pivot]
        else:
            left = pivot + 1
    return L[left]

data = [30,100000,54,26,93,17,77,31,44,55,20]
selection_one = quickselect(data, 1)
selection_three = quickselect(data, 3)
selection_five = quickselect(data, 5)
quicksort(data)
print(selection_one, selection_three, selection_five, "for", data)
17 26 31 for [17, 20, 26, 30, 31, 44, 54, 55, 77, 93, 100000]

Reviewing divide and conquer

  • Binary Search
    • Recursive step takes constant time
    • Makes a single recursive call on smaller list
    • Time complexity: \(O(\log n)\)
  • Sorting Algorithms
    • Running time is linear plus recursive call cost
    • Total length of shorter lists is \(O(n)\)
    • Time complexity: Recursion depth of \(O(\log n)\) means \(O(n \times \log n)\)
  • Quick Selection
    • Running time is linear plus the cost of one recursive call
    • Time complexity: \(O(n)\)