Introduction to Algorithm Analysis

Gregory M. Kapfhammer

January 16, 2024

What is algorithm analysis?

  • Data structures that organize data
  • Algorithms that process data structures
  • Experimental and theoretical analysis
  • “Algorithmology” is both science and engineering

Algorithm analysis process

  • Design, implement, and test
    • Data structures and algorithms
    • Benchmark framework
    • Data analysis tools
  • Create execution environments
    • Repeatable and reproducible
    • Controlled and varied
    • Local and in the cloud

Examples of four programs to study through performance benchmarks

  • Terminal window shell like bash or zsh
  • Terminal prompt like powerlevel10k or starship
  • Web browser like firefox or chrome
  • Text editor like vim or emacs

How do we characterize the execution environment of these programs? How do we compare their performance in different configurations? How do we improve their performance?

Wow, measuring and improving the performance of any complex program is very challenging! Why is that?

  • Differences in the execution environment
  • Difficulty in measuring the performance
  • Challenging to compare the performance
  • Hard to repeat and reproduce experiments
  • Results change as the program evolves
  • Caution needed to avoid over-optimization

Learn more about algorithm analysis

Read chapters 1 and 2 in “A First Course on Data Structures in Python

Overview the ds2 package in donsheehy/datastructures book and package code

  • The FCDSP book is both “simple” and “complex”
  • Coverage of introductory topics:
    • Python programming
    • Software testing
  • Data structures and algorithms
  • Experimental and theoretical analysis of performance

How can we characterize the execution environment of a program?

  • Sensing properties of a system with Python
  • Characterizing system performance with micro-benchmarks

Using the systemsense tool

Commands
benchmarkinfo: Benchmark the system used for experiments.
completeinfo: Detect information about and then benchmark the system used for experiments.
systeminfo: Detect all relevant information about the system used for experiments.
  • systeminfo: use packages like psutil to collect relevant information about the execution environment, suitable for characterizing and comparing local and cloud-based systems
  • benchmarkinfo: use packages like timeit to run simple micro-benchmarks involving basic operations so as to characterize baseline system performance

Detecting CPU details

def get_cpu() -> Dict[str, str]:
    """Return information about the current CPU in the system."""
    # detect the name of the function in
    # which this source code exists
    function_name = inspect.stack()[0][3]
    # parse out the second part of the name after
    # the underscore character
    function_name = function_name.split("_")[1]
    # create a dictionary with the function's
    # purpose as the key and the value as
    # the return of the function that collects it
    return {function_name: str(platform.machine())}
  • Use the inspect package to detect the name of the function
  • Use the platform package to detect the CPU architecture
  • Ensure that the function works in all execution environments!

Detecting disk details

def get_disk() -> Dict[str, str]:
    """Return disk space usage."""
    function_name = inspect.stack()[0][3]
    function_name = function_name.split("_")[1]
    if platform.system() == constants.system.Windows:
        total_disk = psutil.disk_usage("C:\\").total
        used_disk = psutil.disk_usage("C:\\").used
    else:
        total_disk = psutil.disk_usage("/").total
        used_disk = psutil.disk_usage("/").used
    total_disk_gb = total_disk / (1024**3)
    used_disk_gb = used_disk / (1024**3)
    disk = f"Using {used_disk_gb:.2f} GB of {total_disk_gb:.2f} GB"
    return {function_name: disk}
  • Use the psutil package to detect disk usage details
  • Customize the function for different operating systems

Performing a micro-benchmark

def time_benchmark_concatenation(
    repeat: int = 3, number: int = 100000, size: int = 100
) -> Dict[str, str]:
    """Time the benchmark_concatenation function."""
    function_name = inspect.stack()[0][3].split(_)[2]
    performance_list = timeit.repeat(
        "benchmark.benchmark_concatenation(size)",
        repeat=repeat,
        setup=f"""
from systemsense import benchmark
size = {size}
        """,
        number=number,
    )
    return {function_name: str(performance_list)}
  • Use timeit to benchmark string concatenation function
  • Call function subject to benchmarking multiple times

Overall goals of “algorithmology”

  • Algorithm Engineering:
    • Design and implement algorithm and data structure
    • Test all aspects of the system to ensure correctness
    • Make a benchmark framework to measure performance
  • Algorithm Evaluation:
    • Design experiments to answer research questions
    • Conduct experiments and collect data
    • Analyze and visualize data to draw conclusions