Runtime Analysis through Experimental Evaluation

Gregory M. Kapfhammer

February 12, 2024

Programmers must write programs that are correct, efficient, and maintainable

  • Prior Focus: is the program implemented correctly?
  • New Focus: does the program have good performance?

Programmers often ask “does your program have good performance?” Here are some common answers!

  • “Yes, it was fast enough on my laptop”
  • “I think it is fast enough for most users”
  • “I don’t know, I never measured its performance”
  • “It is really fast if you run it on an Apple M3 chip”
  • “Let’s wait for our users to tell us if it is fast enough”
  • “One function is too slow — and I think we rarely use it”

A two-week road map for exploring Python program performance

  • Week 1: Empirical Evaluation of Runtime

    • How do you conduct timing experiments to measure the performance of real Python programs?
  • Week 2: Analytical Evaluation of Time Complexity

    • How do you use an instruction cost model to prove that a program belongs to a specific time complexity class?

Ultimate Goal: create a nuanced description of program efficiency that adapts to different inputs and to different computers

Algorithm analysis challenges

Different Inputs

  • Varying sizes of input data
  • Different types of data
  • Degrees of sortedness
  • Degrees of duplication
  • Different data distributions

Different Computers

  • Diverse hardware setups
  • Varying system performance
  • Diverse system configurations
  • Different operating systems
  • Variation in system load

Ultimate goal: experimental and analytical evaluation methods that yield actionable insights that support understanding and prediction

Practical methods for measuring the running time of a program

  • Implement as a Python function with inputs and outputs
  • Use the time module to measure running time on inputs
  • Study trade-offs associated with different implementations
  • Run function with different inputs & execution environments

Implementing a duplicate detector

from typing import List
def duplicates(input_list: List[int]) -> bool:
    n = len(input_list)
    for i in range(n):
        for j in range(n):
            if i != j and input_list[i] == input_list[j]:
                return True
    return False

assert duplicates([1, 2, 6, 3, 4, 5, 6, 7, 8])
assert not duplicates([1, 2, 3, 4])
print(duplicates([1, 2, 6, 3, 4, 5, 6, 7, 8]))
print(not duplicates([1, 2, 3, 4]))
True
True
  • Returns True if there are any duplicates and False otherwise
  • What is the performance of the duplicates function? Why?

Performance of a duplicate detector?

from typing import List
import time

for i in range(5):
    n = 1000
    start = time.time()
    duplicates(list(range(n)))
    time_taken = time.time() - start
    print("Time taken for n = {}: {:.15f} seconds".format(n, time_taken))
Time taken for n = 1000: 0.056169748306274 seconds
Time taken for n = 1000: 0.056759595870972 seconds
Time taken for n = 1000: 0.056889295578003 seconds
Time taken for n = 1000: 0.057609319686890 seconds
Time taken for n = 1000: 0.055958986282349 seconds
  • What is the strategy for calculating the elapsed time?
  • Do the results suggest that this implementation is fast? Why?
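The strategy above records time.time() before the call and subtracts that start time afterwards. For short-running code, time.perf_counter is often preferred because it reads a monotonic, high-resolution clock. A minimal sketch of the same strategy with that clock (the helper name measure_elapsed is illustrative, not from the original code):

```python
import time
from typing import Callable

def measure_elapsed(function: Callable[[], object]) -> float:
    """Return the elapsed wall-clock time of one call, in seconds."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    function()
    return time.perf_counter() - start

elapsed = measure_elapsed(lambda: sum(range(100_000)))
print(f"elapsed = {elapsed:.9f} seconds")
```

Either clock supports the same subtraction strategy; perf_counter simply reduces the chance that two nearby readings are identical.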

General-purpose function timing

from typing import Callable
import time

def timetrials(function: Callable, n: int, trials: int = 10) -> None:
    """Time the function on a list of size n, averaged over trials runs"""
    totaltime = 0
    for _ in range(trials):
        start = time.time()
        function(list(range(n)))
        totaltime += time.time() - start
    print("average =%10.7f for n = %d" % (totaltime/trials, n))

# conduct a doubling experiment for a provided function
for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates, n)
average = 0.0001396 for n = 50
average = 0.0005698 for n = 100
average = 0.0020573 for n = 200
average = 0.0090509 for n = 400
average = 0.0349435 for n = 800
average = 0.1477875 for n = 1600
average = 0.5900513 for n = 3200
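A doubling experiment becomes most informative when you inspect the ratio between consecutive averages: if the running time roughly quadruples each time n doubles, that suggests quadratic growth. A small sketch that computes the ratios for the averages printed above (the values are hard-coded from that output, for illustration):

```python
# average running times from the doubling experiment above
averages = [0.0001396, 0.0005698, 0.0020573, 0.0090509,
            0.0349435, 0.1477875, 0.5900513]

# ratio between consecutive averages; values near 4.0 suggest
# that doubling n quadruples the time, i.e., quadratic growth
for prev, curr in zip(averages, averages[1:]):
    print(f"ratio = {curr / prev:.2f}")
```

For these measurements every ratio lands near 4, which is the pattern expected from the doubly nested loop in duplicates.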

Duplicate detecting experiment

Duplicate Function

  • Accepts an input list
  • True if duplicate found
  • False if no duplicates
  • n is the list’s length
  • Doubly nested for loop

Timing Function

  • Use the time module
  • Calculate elapsed time
  • Consistent number format
  • Provide a function as input
  • Run a doubling experiment

Key Observation: properly designed experiments yield meaningful insights into the likely worst-case performance of a Python function

Can we improve the performance of duplicate detection?

  • Avoid doing unnecessary computational steps
  • duplicates compares each pair of elements twice
  • Improve speed by only letting j range up to i

Improving duplicate’s speed

def duplicates_restrict_range(input_list: List[int]) -> bool:
    n = len(input_list)
    for i in range(1,n):
        for j in range(i):
            if input_list[i] == input_list[j]:
                return True
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_restrict_range, n)
average = 0.0000611 for n = 50
average = 0.0002808 for n = 100
average = 0.0011877 for n = 200
average = 0.0033687 for n = 400
average = 0.0137956 for n = 800
average = 0.0551500 for n = 1600
average = 0.2166065 for n = 3200
  • Did this improve the performance? How can you tell? Why?
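One way to answer is to compute the speedup of the restricted-range version at each input size; values near 2 are consistent with performing roughly half the comparisons. A sketch using the averages printed in the two experiments above (values hard-coded from that output, for illustration):

```python
# average times from the original and restricted-range experiments above
original = [0.0001396, 0.0005698, 0.0020573, 0.0090509,
            0.0349435, 0.1477875, 0.5900513]
restricted = [0.0000611, 0.0002808, 0.0011877, 0.0033687,
              0.0137956, 0.0551500, 0.2166065]

# speedup at each n; the restricted range halves the comparison count
for slow, fast in zip(original, restricted):
    print(f"speedup = {slow / fast:.2f}x")
```

The speedups cluster between roughly 2x and 3x, but the ratios between consecutive averages still hover near 4, so the growth rate itself has not changed.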

Refactoring duplicate again

def duplicates_any(input_list: List[int]) -> bool:
    n = len(input_list)

    return any(input_list[i] == input_list[j]
        for i in range(1,n) for j in range(i))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_any, n)
average = 0.0001116 for n = 50
average = 0.0005233 for n = 100
average = 0.0019467 for n = 200
average = 0.0044804 for n = 400
average = 0.0183839 for n = 800
average = 0.0709085 for n = 1600
average = 0.2807016 for n = 3200
  • Wait, this new implementation does not seem to be faster!
  • Reducing code size does not always improve performance

More alternatives to duplicates

def duplicates_sort(input_list: List[int]) -> bool:
    n = len(input_list)
    input_list.sort()  # sorts in place, mutating the caller's list
    for i in range(n-1):
        if input_list[i] == input_list[i+1]:
            return True
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_sort, n)
average = 0.0000038 for n = 50
average = 0.0000067 for n = 100
average = 0.0000122 for n = 200
average = 0.0000268 for n = 400
average = 0.0000602 for n = 800
average = 0.0001164 for n = 1600
average = 0.0002348 for n = 3200
  • Does sort improve the performance of duplicates?
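One caveat: duplicates_sort calls input_list.sort(), which reorders the caller's list as a side effect. If that matters, sorted() returns a new list and leaves the argument untouched; a sketch of that variant (the name duplicates_sorted_copy is illustrative):

```python
from typing import List

def duplicates_sorted_copy(input_list: List[int]) -> bool:
    """Detect duplicates without mutating the caller's list."""
    ordered = sorted(input_list)  # sorted() builds a new list
    return any(ordered[i] == ordered[i + 1] for i in range(len(ordered) - 1))

data = [3, 1, 3, 2]
print(duplicates_sorted_copy(data))  # True
print(data)  # [3, 1, 3, 2] -- original order preserved
```

The trade-off is the extra memory for the copy, in exchange for not surprising callers who rely on the list's order.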

Using both any and sort

def duplicates_sort_any(input_list: List[int]) -> bool:
    n = len(input_list)
    input_list.sort()  # sorts in place, mutating the caller's list
    return any(input_list[i] == input_list[i+1] for i in range(n-1))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_sort_any, n)
average = 0.0000052 for n = 50
average = 0.0000080 for n = 100
average = 0.0000149 for n = 200
average = 0.0000316 for n = 400
average = 0.0000647 for n = 800
average = 0.0001334 for n = 1600
average = 0.0002741 for n = 3200
  • Does sort improve the performance of duplicates?
  • Does the use of any improve or hinder performance?
  • What is the connection between code size and efficiency?

Using set with a for loop

def duplicates_set_loop(input_list: List[int]) -> bool:
    s = set()
    for e in input_list:
        if e in s:
            return True
        s.add(e)
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_set_loop, n)
average = 0.0000035 for n = 50
average = 0.0000064 for n = 100
average = 0.0000116 for n = 200
average = 0.0000246 for n = 400
average = 0.0000471 for n = 800
average = 0.0000968 for n = 1600
average = 0.0001907 for n = 3200
  • Is it possible to avoid the use of the for loop?

Using set without a for loop

def duplicates_set(input_list: List[int]) -> bool:
    return len(input_list) != len(set(input_list))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_set, n)
average = 0.0000015 for n = 50
average = 0.0000031 for n = 100
average = 0.0000036 for n = 200
average = 0.0000079 for n = 400
average = 0.0000151 for n = 800
average = 0.0000355 for n = 1600
average = 0.0000615 for n = 3200
  • What is the fastest approach to duplicate detection?
  • How did you know that this approach was the fastest?
  • Can you explain why this approach is the fastest?
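One explanation: a membership test on a set uses hashing and runs in average-case constant time, while a membership test on a list is a linear scan. A small sketch contrasting a single lookup in each structure (the sizes and variable names are illustrative):

```python
import time

n = 100_000
data = list(range(n))
as_set = set(data)
target = n - 1  # worst case for the list: scanned last

start = time.perf_counter()
_ = target in data    # linear scan through the list
list_time = time.perf_counter() - start

start = time.perf_counter()
_ = target in as_set  # average-case constant-time hash lookup
set_time = time.perf_counter() - start

print(f"list membership: {list_time:.7f}s, set membership: {set_time:.7f}s")
```

Because duplicates_set performs the equivalent of one hashing pass over the data, its total work grows roughly linearly with n instead of quadratically.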

Key insights from the duplicate detection experiments

  • What did we learn from these experiments?

    • Different implementations yield different performance results
    • Time overhead increases as the input size increases
    • set construction and containment check yield improvements
    • Benefits from assuming that data is comparable or hashable
    • Code size does not always correlate with performance

Algorithm Engineering Skill: quickly implement a doubling experiment using a function like timetrials and compare results
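The timetrials helper and the doubling loop above can be folded into one reusable function; a sketch under the same assumptions as the earlier experiments (the name doubling_experiment is illustrative, and perf_counter is used here in place of time.time):

```python
import time
from typing import Callable, List

def doubling_experiment(function: Callable[[List[int]], bool],
                        sizes: List[int], trials: int = 10) -> List[float]:
    """Run function on a list of each size, returning the average times."""
    averages = []
    for n in sizes:
        total = 0.0
        for _ in range(trials):
            data = list(range(n))  # build the input outside the timed region
            start = time.perf_counter()
            function(data)
            total += time.perf_counter() - start
        averages.append(total / trials)
    return averages

def duplicates_set(input_list: List[int]) -> bool:
    return len(input_list) != len(set(input_list))

sizes = [50, 100, 200, 400]
for n, avg in zip(sizes, doubling_experiment(duplicates_set, sizes)):
    print(f"average = {avg:.7f} for n = {n}")
```

Returning the averages, instead of only printing them, makes it easy to compute doubling ratios or plot the results afterwards.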

How to empirically measure the running time of a Python program

  • Implement as a Python function with inputs and outputs:
    • Function use enables testing with pytest
    • Using a function also makes experiments easier
  • Study performance trade-offs of different implementations:
    • Use the time module to measure running time on inputs
    • Provide different inputs to function during benchmarking
    • Run the benchmarks in different execution environments
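The point about pytest can be sketched concretely: because each duplicate detector is a plain function with inputs and outputs, correctness checks are ordinary assert-based test functions (the file name and test names below are illustrative):

```python
# test_duplicates.py -- run with: pytest test_duplicates.py
from typing import List

def duplicates(input_list: List[int]) -> bool:
    """The set-based detector from the experiments above."""
    return len(input_list) != len(set(input_list))

def test_detects_duplicate() -> None:
    assert duplicates([1, 2, 6, 3, 4, 5, 6, 7, 8])

def test_no_duplicate() -> None:
    assert not duplicates([1, 2, 3, 4])

def test_empty_list_has_no_duplicates() -> None:
    assert not duplicates([])
```

The same test suite can be pointed at any of the implementations, which makes it safe to swap in a faster version during benchmarking.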