Priority Queues

Gregory M. Kapfhammer

April 15, 2024

What is a priority queue? Why is it useful?

  • Normally data items removed based on insertion order
  • Now, data items are removed based on priority!
  • Queue that supports priority-oriented data access
  • Aim to have good performance for all key operations

Any examples of data we should store in a priority queue?

Main methods in a PriorityQueue API

  • insert(item, priority): Add item with the given priority.

  • findmin() - Return the item with the minimum priority. If there are multiple items with the minimum priority, ties may be broken arbitrarily.

  • removemin() - Remove and return the item with the minimum priority. As with the findmin method, ties are broken arbitrarily.

  • Use a list-based approach for a starting implementation
  • Identify performance trade-offs when using a list
  • Develop a new data structure called a heap
  • Use the heap to support efficient priority queue operations

Using a list for a PriorityQueue

class SimpleListPQ:
    def __init__(self):
        self._L = []

    def insert(self, item, priority):
        self._L.append((item, priority))

    def findmin(self):
        return min(self._L, key = lambda x : x[1])[0]

    def removemin(self):
        item, priority = min(self._L, key = lambda x : x[1])
        self._L.remove((item, priority))
        return item
  • The variable _L is a list of tuples stored in self
  • Each tuple stores an item and its priority
  • Both findmin and removemin have \(O(n)\) time complexity

Creating and using a SimpleListPQ

pq = SimpleListPQ()

pq.insert('apple', 3)
pq.insert('banana', 1)
pq.insert('cherry', 2)

print(pq.findmin())    # Outputs: 'banana'
print(pq.removemin())  # Outputs: 'banana'
print(pq.findmin())    # Outputs: 'cherry'
banana
banana
cherry
  • The first parameter to insert method is the item
  • The second parameter to insert method is the priority
  • The findmin method returns the item with minimum priority
  • The removemin method removes and returns item with minimum priority

Creating an Entry class

class Entry:
    def __init__(self, item, priority):
        self.priority = priority
        self.item = item

    def __lt__(self, other):
        return self.priority < other.priority
  • Indexing the tuple in the list is error-prone for SimpleListPQ
  • Alternatively, use an Entry class to store item and priority
  • The __init__ constructor initializes the item and priority
  • The __lt__ method compares the priorities of two Entrys
  • A lower priority value indicates a higher priority item!
  • Now, we can use this to build UnsortedListPQ!

Creating the UnsortedListPQ class

class UnsortedListPQ:
    def __init__(self):
        self._entries = []

    def insert(self, item, priority):
        self._entries.append(Entry(item, priority))

    def findmin(self):
        return min(self._entries).item

    def removemin(self):
        entry = min(self._entries)
        self._entries.remove(entry)
        return entry.item
  • The insert method has a worst-case time complexity of \(O(1)\)
  • The findmin and removemin methods have a worst-case time complexity of \(O(n)\) because the min function is \(O(n)\)

Creating the SortedListPQ class

class SortedListPQ:
    def __init__(self):
        self._entries = []

    def insert(self, item, priority):
        self._entries.append(Entry(item, priority))
        self._entries.sort(reverse = True)

    def findmin(self):
        return self._entries[-1].item

    def removemin(self):
        return self._entries.pop().item
  • findmin and removemin have a worst-case time complexity of \(O(1)\)
  • However, now insert has worst-case time complexity of \(O(n \log n)\)!
  • Wait, using an insertion sort could improve performance of insert!

What are the trade-offs in the SortedListPQ?

  • Trade-offs exist between the time complexity of the insert method and the findmin and removemin methods

  • In one system, we have fast insert and slow removemin and in the other we have slow insert and fast removemin

  • Achieve constant or logarithmic time complexity for all?

  • Need a new data structure to achieve this ambitious goal!

Exploring the heap data structure

Priority queues and heaps

  • Priority queue: Abstract data type (ADT)
  • Heap: Data structure used to implement a priority queue

Clarifying terminology

  • Many types of heaps exist, we’ll focus on the binary heap
  • The term “key” is often used for “priority” in a heap

Binary heap

  • A binary tree with smaller priorities “above” larger ones

Creating a HeapPQ class

class HeapPQ:
    def __init__(self):
        self._entries = []

    def insert(self, item, priority):
        self._entries.append(Entry(item, priority))
        self._upheap(len(self._entries) - 1)

    def _parent(self, i):
        return (i - 1) // 2

    def _children(self, i):
        left = 2 * i + 1
        right = 2 * i + 2
        return range(left, min(len(self._entries), right + 1))

    def _swap(self, a, b):
        L = self._entries
        L[a], L[b] = L[b], L[a]

    def _upheap(self, i):
        L = self._entries
        parent = self._parent(i)
        if i > 0 and L[i] < L[parent]:
            self._swap(i, parent)
            self._upheap(parent)

    def findmin(self):
        return self._entries[0].item

    def removemin(self):
        L = self._entries
        item = L[0].item
        L[0] = L[-1]
        L.pop()
        self._downheap(0)
        return item

    def _downheap(self, i):
        L = self._entries
        children = self._children(i)
        if children:
            child = min(children, key = lambda x: L[x])
            if L[child] < L[i]:
                self._swap(i, child)
                self._downheap(child)

    def _heapify(self):
        n = len(self._entries)
        for i in reversed(range(n)):
            self._downheap(i)

    def __len__(self):
        return len(self._entries)
  • insert has a worst-case time complexity of \(O(\log n)\)

Creating and using a HeapPQ

pq = HeapPQ()

pairs = [(10, 10), (2, 2), (30, 30), (4,4)]
for item, priority in pairs:
    pq.insert(item, priority)

print(pq.findmin())    # Outputs: 2
print(pq.removemin())  # Outputs: 2
print(pq.findmin())    # Outputs: 4
print(pq.removemin())  # Outputs: 4
2
2
4
4
  • Removing the minimum value maintains the “heap property”
  • The findmin method returns the minimum value in the heap
  • The removemin method removes and returns the minimum value

Creating the PriorityQueue class

class PriorityQueue(HeapPQ):
    def __init__(self,
                 items = (),
                 entries = (),
                 key = lambda x: x):
        self._key = key
        self._entries = [Entry(i, p) for i, p in entries]
        self._entries.extend([Entry(i, key(i)) for i in items])
        self._itemmap = {entry.item : index
                         for index, entry in enumerate(self._entries)}
        self._heapify()

    def insert(self, item, priority = None):
        if priority is None:
            priority = self._key(item)
        index = len(self._entries)
        self._entries.append(Entry(item, priority))
        self._itemmap[item] = index
        self._upheap(index)

    def _swap(self, a, b):
        L = self._entries
        va = L[a].item
        vb = L[b].item
        self._itemmap[va] = b
        self._itemmap[vb] = a
        L[a], L[b] = L[b], L[a]

    def changepriority(self, item, priority = None):
        if priority is None:
            priority = self._key(item)
        i = self._itemmap[item]
        self._entries[i].priority = priority
        self._upheap(i)
        self._downheap(i)

    def _remove_at_index(self, index):
        L = self._entries
        self._swap(index, len(L) - 1)
        del self._itemmap[L[-1].item]
        L.pop()
        self._downheap(index)

    def removemin(self):
        item = self._entries[0].item
        self._remove_at_index(0)
        return item

    def remove(self, item):
        self._remove_at_index(self._itemmap[item])

    def __iter__(self):
        return self

    def __next__(self):
        if len(self) > 0:
            return self.removemin()
        else:
            raise StopIteration
  • PriorityQueue extends HeapPQ and adds more methods

Random access in a PriorityQueue

def _remove_at_index(self, index):
    L = self._entries
    self._swap(index, len(L) - 1)
    del self._itemmap[L[-1].item]
    L.pop()
    self._downheap(index)

def removemin(self):
    item = self._entries[0].item
    self._remove_at_index(0)
    return item

def remove(self, item):
    self.remove_at_index(self._itemmap[item])
  • removemin method is the canonical way to extract next value
  • remove method supports random access to remove any item
  • _remove_at_index method supports both remove methods

What are some uses of this PriorityQueue?

  • Featured application areas:
    • Prioritizing steps in a task list
    • Inverted priority in a MaxHeap
    • Sorting algorithms like heapsort
    • Scheduling decisions in operating systems

Creating and using a PriorityQueue

pq = PriorityQueue()

pq.insert('apple', 3)
pq.insert('banana', 1)
pq.insert('cherry', 2)

pq.changepriority('apple', 0)

print(pq.removemin())  # Outputs: 'apple'

pq.remove('cherry')

for item in pq:
    print(item)        # Outputs: 'banana'
apple
banana
  • Illustrates the use of the PriorityQueue constructor and methods
  • What is the worst-case time complexity of these methods?

Creating and using a MaxHeap

# create a PriorityQueue instance where the
# key is the opposite of the input value of x
maxheap = PriorityQueue(key = lambda x: -x)

# does not specify a priority for a value
n = 10
for i in range(n):
    maxheap.insert(i)

# displays the values in decreasing order
print([maxheap.removemin() for _ in range(n)])
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
  • Creates a PriorityQueue to store values in decreasing order
  • Conveniently coalesces the key and the value into a single key
  • Note that the removemin function was not changed in any way!

Performing a HeapSort

def heapsort(L):
    H = PriorityQueue(L)
    L[:] = [item for item in H]

L = [3, 2, 4, 1, 6, 5]
print("Before heapsort:", L)
heapsort(L)
print("After heapsort: ", L)
Before heapsort: [3, 2, 4, 1, 6, 5]
After heapsort:  [1, 2, 3, 4, 5, 6]
  • Uses the _heapify method to create a PriorityQueue in \(O(n)\)
  • Iteration over each item in list comprehension costs \(O(\log n)\)
  • This means that the heapsort function has a time complexity of \(O(n \log n)\). Is this better, worse, or the same as mergesort?

A Job for an operating system

class Job:
    def __init__(self, name, priority, duration):
        self.name = name
        self.priority = priority
        self.duration = duration

    def __str__(self):
        return f"Job: {self.name}, Priority: {self.priority}, Duration: {self.duration}"

job1 = Job("Long-running job 1", 3, 5)
job2 = Job("Medium-running job 2", 1, 2)
job3 = Job("Short-running job 3", 2, 1)

job_queue = PriorityQueue()
job_queue.insert(job1, job1.priority)
job_queue.insert(job2, job2.priority)
job_queue.insert(job3, job3.priority)
  • Define a Job class with name, priority, and duration
  • Create three Job instances with different priorities and durations

Running Jobs with a PriorityQueue

# run all of the jobs in order of priority
while not len(job_queue) == 0:
    next_job = job_queue.removemin()
    print(f"Running {next_job}")
    for i in range(next_job.duration):
        print(f"{next_job.name} is running...")
    print(f"{next_job.name} has finished running.")
Running Job: Medium-running job 2, Priority: 1, Duration: 2
Medium-running job 2 is running...
Medium-running job 2 is running...
Medium-running job 2 has finished running.
Running Job: Short-running job 3, Priority: 2, Duration: 1
Short-running job 3 is running...
Short-running job 3 has finished running.
Running Job: Long-running job 1, Priority: 3, Duration: 5
Long-running job 1 is running...
Long-running job 1 is running...
Long-running job 1 is running...
Long-running job 1 is running...
Long-running job 1 is running...
Long-running job 1 has finished running.

Comparing PriorityQueue structures

UnsortedListPQ

  • Uses an unsorted list to store entries
  • Fast: insert method is \(O(1)\)
  • Slow: findmin and removemin methods are \(O(n)\)

SortedListPQ

  • Uses a sorted list to store entries
  • Slow: insert method is \(O(n \times log_2n)\)
  • Fast: findmin and removemin methods are \(O(1)\)

HeapPQ and PriorityQueue

  • Performance analysis:
    • Uses a binary heap to store entries
    • Fast: insert, findmin, and removemin are \(O(\log_2 n)\)
    • Trade-offs: no more constant-time complexity methods!
  • Functionality review:
    • PriorityQueue extends HeapPQ and adds more methods
    • PriorityQueue offers random access to remove any item
    • PriorityQueue supports iteration to remove all items