Runtime Analysis through Experimental Evaluation

Gregory M. Kapfhammer

February 10, 2025

Programmers must write programs that are correct, efficient, and maintainable

  • Prior Focus: is the program implemented correctly?
  • New Focus: does the program have good performance?

Programmers often ask “does your program have good performance?” Here are some common answers!

  • “Yes, it was fast enough on my laptop”
  • “I think it is fast enough for most users”
  • “I don’t know, I never measured its performance”
  • “It is really fast if you run it on an Apple M3 chip”
  • “Let’s wait for our users to tell us if it is fast enough”
  • “One function is too slow — and I think we rarely use it”

A two-week road map for exploring Python program performance

  • Week 1: Empirical Evaluation of Runtime

    • How do you conduct timing experiments to measure the performance of real Python programs?
  • Week 2: Analytical Evaluation of Time Complexity

    • How do you use an instruction cost model to prove that a program belongs to a specific time complexity class?

Ultimate Goal: create a nuanced description of program efficiency that adapts to different inputs and to different computers

Algorithm analysis challenges

Different Inputs

  • Varying sizes of input data
  • Different types of data
  • Degrees of sortedness
  • Degrees of duplication
  • Different data distributions

Different Computers

  • Diverse hardware setups
  • Varying system performance
  • Different system configurations
  • Different operating systems
  • Variation in system load

Ultimate goal: experimental and analytical evaluation methods that yield actionable insights that support understanding and prediction

Practical methods for measuring the running time of a program

  • Implement as a Python function with inputs and outputs
  • Use the time module to measure running time on inputs
  • Study trade-offs associated with different implementations
  • Run function with different inputs & execution environments

Implementing a duplicate detector

from typing import List

def duplicates(input_list: List[int]) -> bool:
    """Return True if the list contains a duplicate value."""
    n = len(input_list)
    # compare every pair of positions, so each pair is checked twice
    for i in range(n):
        for j in range(n):
            if i != j and input_list[i] == input_list[j]:
                return True
    return False


assert duplicates([1, 2, 6, 3, 4, 5, 6, 7, 8])
print(duplicates([1, 2, 6, 3, 4, 5, 6, 7, 8]))
assert not duplicates([1, 2, 3, 4])
print(not duplicates([1, 2, 3, 4]))
True
True
  • Returns True if there are any duplicates and False otherwise
  • What is the performance of the duplicates function? Why?

Explore the duplicate detector

  • Run duplicates with different inputs and observe its output
  • Experiment with using both the assert and print statements

Performance of a duplicate detector?

from typing import List
import time

for i in range(5):
    n = 1000
    start = time.time()
    duplicates(list(range(n)))
    time_taken = time.time() - start
    print("Time taken for n = {}: {:.15f} seconds".format(n, time_taken))
Time taken for n = 1000: 0.043537139892578 seconds
Time taken for n = 1000: 0.042206764221191 seconds
Time taken for n = 1000: 0.049254894256592 seconds
Time taken for n = 1000: 0.044584751129150 seconds
Time taken for n = 1000: 0.044873714447021 seconds
  • What is the strategy for calculating the elapsed time?
  • Do the results suggest that this implementation is fast? Why?
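The strategy above records a start time, runs the function, and subtracts to get the elapsed time. The same pattern can be sketched with time.perf_counter, a higher-resolution clock in the standard time module that is better suited to benchmarking than time.time; the summation workload here is only a placeholder.

```python
import time

# record a start timestamp from a high-resolution monotonic clock
start = time.perf_counter()
total = sum(range(1_000_000))  # placeholder workload to be timed
# subtract to compute the elapsed wall-clock time
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.6f} seconds")
```

Because perf_counter is monotonic, the computed difference is never negative even if the system clock is adjusted mid-measurement.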

General-purpose function timing

from typing import Callable
import time

def timetrials(function: Callable, n: int, trials: int = 10) -> None:
    """Time the function with n inputs trials times"""
    totaltime = 0
    for _ in range(trials):
        start = time.time()
        function(list(range(n)))
        totaltime += time.time() - start
    print("average =%10.7f for n = %d" % (totaltime/trials, n))

# conduct a doubling experiment for a provided function
for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates, n)
average = 0.0001017 for n = 50
average = 0.0003443 for n = 100
average = 0.0012779 for n = 200
average = 0.0059092 for n = 400
average = 0.0265437 for n = 800
average = 0.1105893 for n = 1600
average = 0.4560852 for n = 3200

Explanation of timetrials function

  • Imports:
    • Callable from typing: Allows functions as arguments
    • time: Used for timing the function’s execution
  • Function timetrials:
    • Parameters:
      • function: The function to be timed
      • n: The size of the input list
      • trials: Number of times to run the function (default is 10)
    • Timing Loop:
      • Runs the function trials times
      • Measures the time taken for each run and calculates the average

Duplicate detecting experiment

Duplicate Function

  • Accepts an input list
  • True if duplicate found
  • False if no duplicates
  • n is the list’s length
  • Double nested for loop

Timing Function

  • Use the time module
  • Calculate elapsed time
  • Consistent number format
  • Provide a function as input
  • Run a doubling experiment

Key Observation: properly designed experiments yield meaningful insights into the likely worst-case performance of a Python function

Understanding Doubling Experiments

  • Doubling Experiments:
    • Measure performance by doubling input size
    • Observe changes in execution time
    • Helps to identify the worst-case performance
  • Key Insights:
    • Double Time: Execution time doubles when input size doubles
      • Suggests linear time complexity or \(O(n)\)
    • Quadruple Time: Time quadruples when input size doubles
      • Suggests quadratic time complexity or \(O(n^2)\)
    • Eight-times Time: Time increases eight-fold when input size doubles
      • Suggests cubic time complexity or \(O(n^3)\)
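The rule of thumb above can be sketched as a small helper that divides each timing by the previous one; the timings list below is hypothetical illustration data, not measured output.

```python
from typing import List

def doubling_ratios(timings: List[float]) -> List[float]:
    """Return the ratio of each timing to the one before it.

    In a doubling experiment, ratios near 2 suggest O(n),
    near 4 suggest O(n^2), and near 8 suggest O(n^3).
    """
    return [later / earlier for earlier, later in zip(timings, timings[1:])]

# hypothetical timings (in seconds) from four doublings of the input size
times = [0.001, 0.004, 0.016, 0.064]
print(doubling_ratios(times))  # ratios near 4 suggest quadratic growth
```

Applying this helper to the averages printed by timetrials makes it easy to see which complexity class a function's behavior suggests.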

Can we improve the performance of duplicate detection?

  • Avoid doing unnecessary computational steps
  • duplicates compares each pair of elements twice
  • Improve speed by only letting j range up to i

Improving duplicate’s speed

def duplicates_restrict_range(input_list: List[int]) -> bool:
    n = len(input_list)
    for i in range(1,n):
        for j in range(i):
            if input_list[i] == input_list[j]:
                return True
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_restrict_range, n)
average = 0.0000393 for n = 50
average = 0.0001437 for n = 100
average = 0.0005082 for n = 200
average = 0.0021856 for n = 400
average = 0.0097723 for n = 800
average = 0.0419329 for n = 1600
average = 0.1740864 for n = 3200
  • Did this improve the performance? How can you tell? Why?

Refactoring duplicate again

def duplicates_any(input_list: List[int]) -> bool:
    n = len(input_list)

    return any(input_list[i] == input_list[j]
        for i in range(1,n) for j in range(i))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_any, n)
average = 0.0000881 for n = 50
average = 0.0006855 for n = 100
average = 0.0012529 for n = 200
average = 0.0038878 for n = 400
average = 0.0156014 for n = 800
average = 0.0657139 for n = 1600
average = 0.2725738 for n = 3200
  • Wait, this new implementation does not seem to be faster!
  • Reducing code size does not always improve performance

More alternatives to duplicates

def duplicates_sort(input_list: List[int]) -> bool:
    n = len(input_list)
    input_list.sort()
    for i in range(n-1):
        if input_list[i] == input_list[i+1]:
            return True
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_sort, n)
average = 0.0000045 for n = 50
average = 0.0000069 for n = 100
average = 0.0000138 for n = 200
average = 0.0000367 for n = 400
average = 0.0000837 for n = 800
average = 0.0001772 for n = 1600
average = 0.0006435 for n = 3200
  • Does sort improve the performance of duplicates?

Using both any and sort

def duplicates_sort_any(input_list: List[int]) -> bool:
    n = len(input_list)
    input_list.sort()
    return any(input_list[i] == input_list[i+1] for i in range(n-1))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_sort_any, n)
average = 0.0000067 for n = 50
average = 0.0000109 for n = 100
average = 0.0000188 for n = 200
average = 0.0000405 for n = 400
average = 0.0000736 for n = 800
average = 0.0001280 for n = 1600
average = 0.0002752 for n = 3200
  • Does sort improve the performance of duplicates?
  • Does the use of any improve or hinder performance?
  • What is the connection between code size and efficiency?

Using set with a for loop

def duplicates_set_loop(input_list: List[int]) -> bool:
    s = set()
    for e in input_list:
        if e in s:
            return True
        s.add(e)
    return False

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_set_loop, n)
average = 0.0000036 for n = 50
average = 0.0000072 for n = 100
average = 0.0000118 for n = 200
average = 0.0000268 for n = 400
average = 0.0000550 for n = 800
average = 0.0001166 for n = 1600
average = 0.0001940 for n = 3200
  • Is it possible to avoid the use of the for loop?

Using set without a for loop

def duplicates_set(input_list: List[int]) -> bool:
    return len(input_list) != len(set(input_list))

for n in [50, 100, 200, 400, 800, 1600, 3200]:
    timetrials(duplicates_set, n)
average = 0.0000019 for n = 50
average = 0.0000027 for n = 100
average = 0.0000041 for n = 200
average = 0.0000125 for n = 400
average = 0.0000210 for n = 800
average = 0.0000479 for n = 1600
average = 0.0000899 for n = 3200
  • What is the fastest approach to duplicate detection?
  • How did you know that this approach was the fastest?
  • Can you explain why this approach is the fastest?
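One likely explanation, sketched below under the assumption that the inputs are hashable, is that membership checks in a set use hashing and take constant time on average, while membership checks in a list scan the elements one by one; the input size chosen here is arbitrary.

```python
import time

n = 100_000
data_list = list(range(n))
data_set = set(data_list)

# list membership scans linearly until it finds the target
start = time.perf_counter()
found_in_list = (n - 1) in data_list
list_time = time.perf_counter() - start

# set membership uses an average-case constant-time hash lookup
start = time.perf_counter()
found_in_set = (n - 1) in data_set
set_time = time.perf_counter() - start

print(f"list membership: {list_time:.6f}s, set membership: {set_time:.6f}s")
```

Searching for the last element is the worst case for the list scan, which makes the contrast with the hash lookup most visible.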

Confirming correct output

print(duplicates([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates([1, 2, 3, 4, 1, 5, 6, 7, 8]))
print(duplicates_restrict_range([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_restrict_range([1, 2, 3, 4, 5, 1, 6, 7, 8]))
print(duplicates_any([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_any([1, 2, 3, 4, 5, 6, 7, 8, 3]))
print(duplicates_sort([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_sort([1, 2, 3, 4, 5, 2, 6, 7, 8]))
print(duplicates_sort_any([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_sort_any([1, 2, 3, 4, 5, 6, 7, 8, 8]))
print(duplicates_set_loop([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_set_loop([1, 2, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_set([1, 2, 3, 4, 5, 6, 7, 8]))
print(duplicates_set([1, 2, 3, 4, 5, 6, 7, 8, 3]))
False
True
False
True
False
True
False
True
False
True
False
True
False
True

Duplicate Detection: for or any

  • duplicates:
    • Uses double nested for loops
    • Compares each pair of elements
  • duplicates_restrict_range:
    • Optimizes by restricting the range of the inner loop
    • Compares each pair only once
  • duplicates_any:
    • Uses any with a generator expression
    • More concise but not necessarily faster

Duplicate Detection: sort and any

  • duplicates_sort:
    • Sorts the list first
    • Checks adjacent elements for duplicates
  • duplicates_sort_any:
    • Combines sorting with any
    • Uses any to check adjacent elements
  • Compared to duplicates_restrict_range, are either of these functions faster? Can you explain why or why not?

Duplicate Detection: set and for

  • duplicates_set_loop:
    • Uses a set to track seen elements
    • Checks for duplicates during iteration
  • duplicates_set:
    • Compares list length with set length
    • Fastest approach using set
  • Compared to duplicates_restrict_range, are either of these functions faster? Can you explain why or why not?

Key insights from the duplicate detection experiments

  • What did we learn from these experiments?

    • Different implementations yield different performance results
    • Time overhead increases as the input size increases
    • set construction and containment check yield improvements
    • Benefits from assuming that data is comparable or hashable
    • Code size does not always correlate with performance

Algorithm Engineering Skill: quickly implement a doubling experiment using a function like timetrials and compare results

Summarize results from empirically studying duplicate detection

  • Make a table to summarize the results for each function
  • Determine which function is best according to runtime
  • Clearly articulate why that function is the best choice

Let’s play an algorithm analysis game!

  • You are invited to experimentally evaluate the runtime of def f(values: List[int]) -> None

  • Alternate from one table to the next:

    • Sit with the members of your algorithm all-hands team
    • State the step that you would next take to evaluate the function
    • If prior step was wrong, explain why and suggest a new one
    • Continue this process until we have correctly studied runtime
    • For each round, you may ask the instructor one question
    • Question asking is determined by consensus across all teams

How to empirically measure the running time of a Python program

  • Implement as a Python function with inputs and outputs:
    • Function use enables testing with pytest
    • Using a function also makes experiments easier
  • Study performance trade-offs of different implementations:
    • Use the time module to measure running time on inputs
    • Provide different inputs to function during benchmarking
    • Run the benchmarks in different execution environments
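As a complement to a hand-rolled helper like timetrials, Python's standard timeit module automates repeated timing runs; the duplicates_set function below is reproduced from earlier in these notes so that the sketch is self-contained.

```python
import timeit
from typing import List

def duplicates_set(input_list: List[int]) -> bool:
    """Return True if the list contains a duplicate value."""
    return len(input_list) != len(set(input_list))

# time 100 runs on a list of 1000 distinct integers (no duplicates,
# so the function must examine the entire input)
elapsed = timeit.timeit(lambda: duplicates_set(list(range(1000))), number=100)
print(f"total time for 100 runs: {elapsed:.6f} seconds")
```

Unlike a bare time.time loop, timeit handles repetition and uses a high-resolution clock by default, which makes it a convenient cross-check on hand-written benchmarks.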