Runtime Analysis through Experimental Evaluation

Gregory M. Kapfhammer

February 10, 2025

Programmers must write programs that are correct, efficient, and maintainable

  • Prior Focus: is the program implemented correctly?
  • New Focus: does the program have good performance?

Programmers often ask “does your program have good performance?” Here are some common answers!

  • “Yes, it was fast enough on my laptop”
  • “I think it is fast enough for most users”
  • “I don’t know, I never measured its performance”
  • “It is really fast if you run it on an Apple M3 chip”
  • “Let’s wait for our users to tell us if it is fast enough”
  • “One function is too slow — and I think we rarely use it”

A two-week road map for exploring Python program performance

  • Week 1: Empirical Evaluation of Runtime

    • How do you conduct timing experiments to measure the performance of real Python programs?
  • Week 2: Analytical Evaluation of Time Complexity

    • How do you use an instruction cost model to prove that a program belongs to a specific time complexity class?

Ultimate Goal: create a nuanced description of program efficiency that adapts to different inputs and to different computers

Algorithm analysis challenges

Different Inputs

  • Varying sizes of input data
  • Different types of data
  • Degrees of sortedness
  • Degrees of duplication
  • Different data distributions

Different Computers

  • Diverse hardware setups
  • Varying system performance
  • Different system configurations
  • Different operating systems
  • Variation in system load

Ultimate goal: experimental and analytical evaluation methods that yield actionable insights that support understanding and prediction

Practical methods for measuring the running time of a program

  • Implement as a Python function with inputs and outputs
  • Use the time module to measure running time on inputs
  • Study trade-offs associated with different implementations
  • Run function with different inputs & execution environments

Implementing a duplicate detector

from typing import List

def duplicates(input_list: List[int]) -> bool:
    """Return True if the list contains a duplicate value."""
    n = len(input_list)
    # compare every pair of positions, so each pair is checked twice
    for i in range(n):
        for j in range(n):
            if i != j and input_list[i] == input_list[j]:
                return True
    return False


assert duplicates([1, 2, 6, 3, 4, 5, 6, 7, 8])
print(duplicates([1, 2, 6, 3, 4, 5, 6, 7, 8]))
assert not duplicates([1, 2, 3, 4])
print(not duplicates([1, 2, 3, 4]))
True
True
  • Returns True if there are any duplicates and False otherwise
  • What is the performance of the duplicates function? Why?

Explore the duplicate detector

  • Run duplicates with different inputs and observe its output
  • Experiment with using both the assert and print statements

Performance of a duplicate detector?

from typing import List
import time

for i in range(5):
    n = 1000
    start = time.time()
    duplicates(list(range(n)))
    time_taken = time.time() - start
    print("Time taken for n = {}: {:.15f} seconds".format(n, time_taken))
Time taken for n = 1000: 0.043537139892578 seconds
Time taken for n = 1000: 0.042206764221191 seconds
Time taken for n = 1000: 0.049254894256592 seconds
Time taken for n = 1000: 0.044584751129150 seconds
Time taken for n = 1000: 0.044873714447021 seconds
  • What is the strategy for calculating the elapsed time?
  • Do the results suggest that this implementation is fast? Why?
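The strategy above records a start time, runs the function, and subtracts to get the elapsed time. The same pattern can be sketched with time.perf_counter, a higher-resolution clock in the standard time module that is better suited to benchmarking than time.time; the summation workload here is only a placeholder.

```python
import time

# record a start timestamp from a high-resolution monotonic clock
start = time.perf_counter()
total = sum(range(1_000_000))  # placeholder workload to be timed
# subtract to compute the elapsed wall-clock time
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.6f} seconds")
```

Because perf_counter is monotonic, the computed difference is never negative even if the system clock is adjusted mid-measurement.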

General-purpose function timing

from typing import Callable
import time

def timetrials(function: Callable, n: int, trials: int = 10) -> None:
    """Time the function with n inputs trials times"""
    totaltime = 0
    for _ in range(trials):
        start = time.time()
        function(list(range(n)))
        totaltime += time.time() - start
    print("average =%10.7f for n = %d" % (totaltime/trials, n))

# conduct a doubling experiment for a provided function
for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates, n)
average = 0.0001017 for n = 50
average = 0.0003443 for n = 100
average = 0.0012779 for n = 200
average = 0.0059092 for n = 400
average = 0.0265437 for n = 800
average = 0.1105893 for n = 1600
average = 0.4560852 for n = 3200

Explanation of timetrials function

  • Imports:
    • Callable from typing: Allows functions as arguments
    • time: Used for timing the function’s execution
  • Function timetrials:
    • Parameters:
      • function: The function to be timed
      • n: The size of the input list
      • trials: Number of times to run the function (default is 10)
    • Timing Loop:
      • Runs the function trials times
      • Measures the time taken for each run and calculates the average

Duplicate detecting experiment

Duplicate Function

  • Accepts an input list
  • True if duplicate found
  • False if no duplicates
  • n is the list’s length
  • Double nested for loop

Timing Function

  • Use the time module
  • Calculate elapsed time
  • Consistent number format
  • Provide a function as input
  • Run a doubling experiment

Key Observation: properly designed experiments yield meaningful insights into the likely worst-case performance of a Python function

Understanding Doubling Experiments

  • Doubling Experiments:
    • Measure performance by doubling input size
    • Observe changes in execution time
    • Helps to identify the worst-case performance
  • Key Insights:
    • Double Time: Execution time doubles when input size doubles
      • Suggests linear time complexity or \(O(n)\)
    • Quadruple Time: Time quadruples when input size doubles
      • Suggests quadratic time complexity or \(O(n^2)\)
    • Eight-times Time: Time increases eight-fold when input size doubles
      • Suggests cubic time complexity or \(O(n^3)\)
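The rule of thumb above can be sketched as a small helper that divides each timing by the previous one; the timings list below is hypothetical illustration data, not measured output.

```python
from typing import List

def doubling_ratios(timings: List[float]) -> List[float]:
    """Return the ratio of each timing to the one before it.

    In a doubling experiment, ratios near 2 suggest O(n),
    near 4 suggest O(n^2), and near 8 suggest O(n^3).
    """
    return [later / earlier for earlier, later in zip(timings, timings[1:])]

# hypothetical timings (in seconds) from four doublings of the input size
times = [0.001, 0.004, 0.016, 0.064]
print(doubling_ratios(times))  # ratios near 4 suggest quadratic growth
```

Applying this helper to the averages printed by timetrials makes it easy to see which complexity class a function's behavior suggests.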

Can we improve the performance of duplicate detection?

  • Avoid doing unnecessary computational steps
  • duplicates compares each pair of elements twice
  • Improve speed by only letting j range up to i

Improving duplicate’s speed

def duplicates_restrict_range(input_list: List[int]) -> bool:
    n = len(input_list)
    for i in range(1,n):
        for j in range(i):
            if input_list[i] == input_list[j]:
                return True
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_restrict_range, n)
average = 0.0000393 for n = 50
average = 0.0001437 for n = 100
average = 0.0005082 for n = 200
average = 0.0021856 for n = 400
average = 0.0097723 for n = 800
average = 0.0419329 for n = 1600
average = 0.1740864 for n = 3200
  • Did this improve the performance? How can you tell? Why?

Refactoring duplicate again

def duplicates_any(input_list: List[int]) -> bool:
    n = len(input_list)

    return any(input_list[i] == input_list[j]
        for i in range(1,n) for j in range(i))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_any, n)
average = 0.0000881 for n = 50
average = 0.0006855 for n = 100
average = 0.0012529 for n = 200
average = 0.0038878 for n = 400
average = 0.0156014 for n = 800
average = 0.0657139 for n = 1600
average = 0.2725738 for n = 3200
  • Wait, this new implementation does not seem to be faster!
  • Reducing code size does not always improve performance

More alternatives to duplicates

def duplicates_sort(input_list: List[int]) -> bool:
    n = len(input_list)
    input_list.sort()
    for i in range(n-1):
        if input_list[i] == input_list[i+1]:
            return True
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_sort, n)
average = 0.0000045 for n = 50
average = 0.0000069 for n = 100
average = 0.0000138 for n = 200
average = 0.0000367 for n = 400
average = 0.0000837 for n = 800
average = 0.0001772 for n = 1600
average = 0.0006435 for n = 3200
  • Does sort improve the performance of duplicates?

Using both any and sort

def duplicates_sort_any(input_list: List[int]) -> bool:
    n = len(input_list)
    input_list.sort()
    return any(input_list[i] == input_list[i+1] for i in range(n-1))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_sort_any, n)
average = 0.0000067 for n = 50
average = 0.0000109 for n = 100
average = 0.0000188 for n = 200
average = 0.0000405 for n = 400
average = 0.0000736 for n = 800
average = 0.0001280 for n = 1600
average = 0.0002752 for n = 3200
  • Does sort improve the performance of duplicates?
  • Does the use of any improve or hinder performance?
  • What is the connection between code size and efficiency?

Using set with a for loop

def duplicates_set_loop(input_list: List[int]) -> bool:
    s = set()
    for e in input_list:
        if e in s:
            return True
        s.add(e)
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_set_loop, n)
average = 0.0000036 for n = 50
average = 0.0000072 for n = 100
average = 0.0000118 for n = 200
average = 0.0000268 for n = 400
average = 0.0000550 for n = 800
average = 0.0001166 for n = 1600
average = 0.0001940 for n = 3200
  • Is it possible to avoid the use of the for loop?

Using set without a for loop

def duplicates_set(input_list: List[int]) -> bool:
    return len(input_list) != len(set(input_list))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_set, n)
average = 0.0000019 for n = 50
average = 0.0000027 for n = 100
average = 0.0000041 for n = 200
average = 0.0000125 for n = 400
average = 0.0000210 for n = 800
average = 0.0000479 for n = 1600
average = 0.0000899 for n = 3200
  • What is the fastest approach to duplicate detection?
  • How did you know that this approach was the fastest?
  • Can you explain why this approach is the fastest?
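One likely explanation, sketched below under the assumption that the inputs are hashable, is that membership checks in a set use hashing and take constant time on average, while membership checks in a list scan the elements one by one; the input size chosen here is arbitrary.

```python
import time

n = 100_000
data_list = list(range(n))
data_set = set(data_list)

# list membership scans linearly until it finds the target
start = time.perf_counter()
found_in_list = (n - 1) in data_list
list_time = time.perf_counter() - start

# set membership uses an average-case constant-time hash lookup
start = time.perf_counter()
found_in_set = (n - 1) in data_set
set_time = time.perf_counter() - start

print(f"list membership: {list_time:.6f}s, set membership: {set_time:.6f}s")
```

Searching for the last element is the worst case for the list scan, which makes the contrast with the hash lookup most visible.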

Confirming correct output

print(duplicates([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates([1, 2, 3, 4, 1, 5, 6, 7, 8]))
print(duplicates_restrict_range([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_restrict_range([1, 2, 3, 4, 5, 1, 6, 7, 8]))
print(duplicates_any([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_any([1, 2, 3, 4, 5, 6, 7, 8, 3]))
print(duplicates_sort([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_sort([1, 2, 3, 4, 5, 2, 6, 7, 8]))
print(duplicates_sort_any([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_sort_any([1, 2, 3, 4, 5, 6, 7, 8, 8]))
print(duplicates_set_loop([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_set_loop([1, 2, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_set([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_set([1, 2, 3, 4, 5, 6, 7, 8, 3]))
False
True
False
True
False
True
False
True
False
True
False
True
False
True

Duplicate Detection: for or any

  • duplicates:
    • Uses double nested for loops
    • Compares each pair of elements
  • duplicates_restrict_range:
    • Optimizes by restricting the range of the inner loop
    • Compares each pair only once
  • duplicates_any:
    • Uses any with a generator expression
    • More concise but not necessarily faster

Duplicate Detection: sort and any

  • duplicates_sort:
    • Sorts the list first
    • Checks adjacent elements for duplicates
  • duplicates_sort_any:
    • Combines sorting with any
    • Uses any to check adjacent elements
  • Compared to duplicates_restrict_range, are either of these functions faster? Can you explain why or why not?

Duplicate Detection: set and for

  • duplicates_set_loop:
    • Uses a set to track seen elements
    • Checks for duplicates during iteration
  • duplicates_set:
    • Compares list length with set length
    • Fastest approach using set
  • Compared to duplicates_restrict_range, are either of these functions faster? Can you explain why or why not?

Key insights from the duplicate detection experiments

  • What did we learn from these experiments?

    • Different implementations yield different performance results
    • Time overhead increases as the input size increases
    • set construction and containment check yield improvements
    • Benefits from assuming that data is comparable or hashable
    • Code size does not always correlate with performance

Algorithm Engineering Skill: quickly implement a doubling experiment using a function like timetrials and compare results

Summarize results from empirically studying duplicate detection

  • Make a table to summarize the results for each function
  • Determine which function is best according to runtime
  • Clearly articulate why that function is the best choice

Let’s play an algorithm analysis game!

  • You are invited to experimentally evaluate the runtime of def f(values: List[int]) -> None

  • Alternate from one table to the next:

    • Sit with the members of your algorithm all-hands team
    • State the step that you would next take to evaluate the function
    • If prior step was wrong, explain why and suggest a new one
    • Continue this process until we have correctly studied runtime
    • For each round, you may ask the instructor one question
    • Question asking is determined by consensus across all teams

How to empirically measure the running time of a Python program

  • Implement as a Python function with inputs and outputs:
    • Function use enables testing with pytest
    • Using a function also makes experiments easier
  • Study performance trade-offs of different implementations:
    • Use the time module to measure running time on inputs
    • Provide different inputs to function during benchmarking
    • Run the benchmarks in different execution environments
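As a complement to a hand-rolled helper like timetrials, Python's standard timeit module automates repeated timing runs; the duplicates_set function below is reproduced from earlier in these notes so that the sketch is self-contained.

```python
import timeit
from typing import List

def duplicates_set(input_list: List[int]) -> bool:
    """Return True if the list contains a duplicate value."""
    return len(input_list) != len(set(input_list))

# time 100 runs on a list of 1000 distinct integers (no duplicates,
# so the function must examine the entire input)
elapsed = timeit.timeit(lambda: duplicates_set(list(range(1000))), number=100)
print(f"total time for 100 runs: {elapsed:.6f} seconds")
```

Unlike a bare time.time loop, timeit handles repetition and uses a high-resolution clock by default, which makes it a convenient cross-check on hand-written benchmarks.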