How does a container’s memory overhead influence the time efficiency of containment checking?
Categories: post, containment checking, lists
Authors
Miles Franck
Aidan Dyga
Nicholas Ingerson-Meacham
Simon Jones
Pallas-Athena Cain
Jason Gyamfi
Published
February 16, 2024
Introduction
When programming in Python, each object created takes up a specific amount of space on the computer, measured in bytes. An object stores its own header and attributes, and container objects additionally store references (pointers) to the values they hold. In the containmentcheck program we are looking in particular at these container objects. In this post, we experiment to see how much of an impact the size of a container has on the time efficiency of the containmentcheck program. Understanding the relationship between object size and program efficiency is crucial for optimizing the containmentcheck program's performance.
Motivation
As society's programs get more complex, computer memory is becoming an increasingly valuable resource. When programming, it is important to consider how the types of objects you use will affect your program's performance. The goal of our experiments is to examine how the size of a container influences the amount of space a program takes up, and whether that causes an increase or decrease in the program's runtime. Our experiments therefore seek to shed light on the relationship between container size and its effects on a program's space and runtime efficiency, emphasizing the significance of effective memory management and careful consideration of object types.
Method
Part One
For the first part of our experiment, we compared the runtimes of the three types of containers used in our containmentcheck program: the list, the tuple, and the set. We used the sys.getsizeof() function to find the size, in bytes, of the current container.
Part One - How does sys.getsizeof work?
sys.getsizeof provides us with the size of an object in bytes. Here is an example of using the sys.getsizeof() function:

```python
import sys

list_example = [10, 2.0, "cat"]
tuple_example = (10, 2.0, "cat")
set_example = set([10, 2.0, "cat"])

print("size of list_example:", sys.getsizeof(list_example))
print("size of tuple_example:", sys.getsizeof(tuple_example))
print("size of set_example:", sys.getsizeof(set_example))
```

```
size of list_example: 88
size of tuple_example: 64
size of set_example: 216
```
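One caveat worth noting: sys.getsizeof is shallow. For a container it reports only the container's own footprint (the object header plus its pointer slots or hash table), not the sizes of the elements it references. A small sketch of the difference, with variable names of our own choosing:

```python
import sys

items = ["a" * 50, "b" * 50, "c" * 50]

# getsizeof is shallow: it reports only the container's own footprint,
# not the string objects it points to.
shallow = sys.getsizeof(items)

# Adding the elements' sizes gives a rough "deep" total for this flat list.
deep = shallow + sum(sys.getsizeof(item) for item in items)

print("shallow size:", shallow)
print("deep size:   ", deep)
```

For nested containers a full accounting would have to recurse through every reference, which is one reason precise memory measurement in Python is harder than it first appears.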
Part One - Our specific approach
In our experimentation, we edited main.py of the containmentcheck project to measure the size of the random container generated in each repeat. The implementation is straightforward:
```diff
diff --git a/containmentcheck/containmentcheck/main.py b/containmentcheck/containmentcheck/main.py
index 5d0b347..301b1fd 100644
--- a/containmentcheck/containmentcheck/main.py
+++ b/containmentcheck/containmentcheck/main.py
@@ -4,6 +4,7 @@
 import timeit
 from typing import Any, Callable, List, Tuple, Union
 import typer
+import sys
 from rich.console import Console
 from containmentcheck.analyze import calculate_average_values
@@ -135,3 +136,4 @@ def containmentcheck(  # noqa: PLR0913
         number_runs,
         number_repeats,
     )
+    console.print(f"Size of container in memory {sys.getsizeof(random_container)} bytes")
```
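Outside the project, the same measurement can be sketched in isolation. Here build_containers is a hypothetical stand-in for the container generation in containmentcheck, not the project's actual function:

```python
import sys


def build_containers(size):
    """Fill each of the three container types with `size` integers."""
    data = list(range(size))
    return {"list": data, "tuple": tuple(data), "set": set(data)}


# Report the shallow size of each container type at one experiment size.
for name, container in build_containers(10000).items():
    print(f"{name}: {sys.getsizeof(container)} bytes")
```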
Part One - Our results
As you can see in the following charts, when the container size increases, so does the number of bytes it takes up. This experiment was run at container sizes of 10000, 1000000, and 100000000.
The results on our various operating systems are as follows:
Result from running on WSL2 Ubuntu 22.04.3 from Pallas
The following graphs show that, for all three container types, there is a positive correlation between the size of the container and the amount of time it takes to process. Despite having different sizes in bytes, tuples and lists have nearly the same performance, with tuples performing slightly better overall. Lists occupy more bytes, but our results show that this size increase does not directly correlate with processing speed. Sets occupied roughly the same number of bytes as lists in our measurements, yet produced much slower results. The following scatter plots show time versus bytes. As you can see, lists are slightly slower than tuples, but not by the same margin by which sets are slower. In this case, it appears that the size in bytes had less of an impact on the program than the type of the container itself.
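For readers who want to reproduce a single containment check outside the full benchmark harness, Python's in operator can be timed directly with the standard timeit module. This is a minimal sketch with names of our own choosing, not the project's code:

```python
import timeit

size = 100_000
data = list(range(size))
as_list, as_tuple, as_set = data, tuple(data), set(data)
missing = size + 1  # a value not present: the worst case for a sequence scan

# Time 100 containment checks against each container type.
for name, container in [("list", as_list), ("tuple", as_tuple), ("set", as_set)]:
    seconds = timeit.timeit(lambda: missing in container, number=100)
    print(f"{name}: {seconds:.6f} s")
```

Timing the check in isolation like this separates the cost of the lookup itself from the cost of generating the container, which the full benchmark measures together.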
However, this does not tell the entire story of how bytes impact runtime. To better understand the relationship, we can look at the program's overall memory usage.
Part Two
Memory usage can be difficult to measure in Python in particular. Because Python automatically allocates memory for objects behind the scenes, it can be tricky to know exactly what is happening (Turner-Trauring, 2021).
There are three main cases for where our containers may be stored: in RAM, on disk (swap), or both (Turner-Trauring, 2021). As a fourth possibility, a container may not be stored at all. For the main three cases, the other processes running on the machine influence where the data ends up. For example, if many browser tabs are running in the background, the operating system will likely move some memory over to disk, also known as swap. The portion that stays in RAM is the program's resident memory usage.
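On Unix-like systems, a program can inspect its own peak resident memory through the standard library's resource module. A rough sketch (note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so the units are platform-dependent):

```python
import resource

# Peak resident set size of this process so far (Unix-only).
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

big = list(range(1_000_000))  # allocate something large enough to notice

peak_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS before:", peak)
print("peak RSS after: ", peak_after)
```

Because ru_maxrss is a high-water mark, it only ever grows; it cannot show memory being returned to the operating system, which is where tools like tracemalloc become useful.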
Part Two: Our Specific Approach
For this part of our experiment, we added a function named display_top that displays memory snapshots for the most memory-intensive lines. Additionally, we used tracemalloc to trace the memory allocated during the containment checking process.
```diff
diff --git a/containmentcheck/containmentcheck/main.py b/containmentcheck/containmentcheck/main.py
index 301b1fd..ddcaf6e 100644
--- a/containmentcheck/containmentcheck/main.py
+++ b/containmentcheck/containmentcheck/main.py
@@ -1,6 +1,8 @@
 """Perform an experiment to study efficiency of containment checking for collections."""
 
 import timeit
+import tracemalloc
+import linecache
 from typing import Any, Callable, List, Tuple, Union
 
 import typer
@@ -27,6 +29,30 @@
 cli = typer.Typer()
 console = Console()
 
+def display_top(snapshot, key_type='lineno', limit=10):
+    snapshot = snapshot.filter_traces((
+        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
+        tracemalloc.Filter(False, "<unknown>"),
+    ))
+    top_stats = snapshot.statistics(key_type)
+
+    print("Top %s lines" % limit)
+    for index, stat in enumerate(top_stats[:limit], 1):
+        frame = stat.traceback[0]
+        print("#%s: %s:%s: %.1f KiB"
+              % (index, frame.filename, frame.lineno, stat.size / 1024))
+        line = linecache.getline(frame.filename, frame.lineno).strip()
+        if line:
+            print('    %s' % line)
+
+    other = top_stats[limit:]
+    if other:
+        size = sum(stat.size for stat in other)
+        print("%s other: %.1f KiB" % (len(other), size / 1024))
+    total = sum(stat.size for stat in top_stats)
+    print("Total allocated size: %.1f KiB" % (total / 1024))
+
+
 def perform_containment_check_benchmark(
     containment_check_lambda: Union[
         Callable[[List[Any], Any], bool], Callable[[Tuple[Any], Any], bool]
@@ -70,6 +96,7 @@ def containmentcheck(  # noqa: PLR0913
     number_repeats: int = typer.Option(3),
 ) -> None:
     """Conduct an experiment to measure the performance of containment checking."""
+    tracemalloc.start()
     # create the starting data container and random number
     random_container = None
     # generate a random value that goes up to the maximum value;
@@ -137,3 +164,8 @@ def containmentcheck(  # noqa: PLR0913
         number_repeats,
     )
     console.print(f"Size of container in memory {sys.getsizeof(random_container)} bytes")
+
+    console.print(f"Total traced memory {tracemalloc.get_traced_memory()}")
+    snapshot = tracemalloc.take_snapshot()
+    display_top(snapshot)
+    tracemalloc.stop()
```
Part Two: Tracemalloc
The tracemalloc library is a tool that lets you trace the amount of memory allocated by a Python program. In this experiment we used tracemalloc to get the total amount of traced memory as well as the total allocated size of the program itself ("tracemalloc — Trace memory allocations", 2024).
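Stripped of the surrounding project code, the basic tracemalloc workflow we rely on looks like this:

```python
import tracemalloc

tracemalloc.start()  # begin tracing Python memory allocations

container = list(range(100_000))  # the allocation we want to observe

# get_traced_memory() returns (current, peak) sizes in bytes of the
# memory blocks traced since start() was called.
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current} bytes, peak: {peak} bytes")

tracemalloc.stop()
```

Note that tracemalloc only sees allocations made while tracing is active, which is why the project diff above calls tracemalloc.start() at the beginning of the containmentcheck command.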
The following chart demonstrates the increase in traced memory and allocated size. We were not able to gather results for a container size of 100000000 because it used so much memory that it caused our systems to crash. The sizes were found using the get_traced_memory() function from the tracemalloc library.
tracemalloc results

The results table records, for each approach: the container size, the maximum values, the container size in bytes, and the traced memory (the current size and peak size of the memory blocks).
Another utility of the tracemalloc library is the take_snapshot() method, which lets the user inspect snapshots of the memory blocks allocated by Python. The snapshot output lists the memory blocks traced by tracemalloc, ordered by the amount of space they occupy. This is the output from running with the 1000000-element tuple.
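A minimal, standalone illustration of take_snapshot() and its statistics() method (the allocation here is just an arbitrary example):

```python
import tracemalloc

tracemalloc.start()
data = [str(n) for n in range(50_000)]  # allocations attributed to this line
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group the traced blocks by source line, largest first, and show the top 3.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

Each printed statistic names the file and line responsible for the allocations, their total size, and the block count, which is exactly the information our display_top function formats in the project.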
Overall, the snapshot sizes of the list and the set are almost equal, whereas the tuple shows less storage usage overall. This is consistent with our previous results.
The snapshot can be extremely helpful because it visualizes the space overhead of each part of the program. As we can see, the top line accounts for over half of the space traced by tracemalloc, and the output shows that creating the random integers takes up the majority of the space overhead. The integers themselves also vary in size depending on how many bytes each one takes up individually.
Overall, tracemalloc can be utilized beyond this project to visualize how much space the parts of a program use. For the containmentcheck program specifically, the space overhead is consistently larger for the set and list containers than for the tuple.
Results
Part 1
The first part of our experiment showed that the tuple approach was the most memory efficient, with the list and set approaches coming in behind it. Overall, we discovered that memory usage did not have much of an impact and that measuring time is a much better indicator of performance in this case. Despite this, we learned valuable information about the correlation between memory usage and the size of the different types of containers. In all, we can empirically conclude that tuples are the most memory-efficient of the three containers.
Part 2
In future work on this experiment, we hope to test the largest container size with tracemalloc as well. This would provide more conclusive data about the correlation between container size, traced memory, and allocated size.
From the tests we were able to run, there is a clear positive correlation between the size of the container and the total amount of memory used: the larger the container, the more storage is allocated to the program.