Software Testing

Gregory M. Kapfhammer

February 3, 2025

Software programs must work correctly

  • Data structures: does each structure store data correctly?
  • Algorithms: does each algorithm produce the correct output?
  • Benchmarks: is the software under study invoked correctly?
  • Data storage: do the benchmarks store data correctly?
  • Data analysis: does the software analyze data correctly?

By running a program and checking its output, software testing establishes confidence in its correctness

  • Steps during software testing:
    • Create an input for the program
    • Set up the program’s environment
    • Pass the input to the program
    • Collect the output from the program
    • Compare the output to the expected output
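
The steps above can be sketched in Python. This is a minimal illustration that assumes a hypothetical function called double as the program under test; neither the function nor the variable names come from the slides.

```python
def double(n):
    """Hypothetical program under test: return twice the input."""
    return 2 * n


# Step 1: create an input for the program
test_input = 5

# Step 2: set up the program's environment
# (a pure function needs no setup; only the expected output is recorded)
expected_output = 10

# Steps 3 and 4: pass the input to the program and collect its output
actual_output = double(test_input)

# Step 5: compare the output to the expected output
assert actual_output == expected_output
```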

How would you test the Doubler?

class Doubler:
    def __init__(self, n):
        self._n = 2 * n

    def n(self):
        return self._n

x = Doubler(5)
print(x.n() == 10)
assert x.n() == 10
y = Doubler(-4)
print(y.n() == -8)
assert y.n() == -8
True
True
  • Establishes confidence in the correctness of the Doubler class
  • When testing, is it better to use print or assert statements?

Best practices for software testing

  • Answer the following questions when testing:
    • Does the program meet its specification?
    • After changing the program, does it still work correctly?
  • Using assertion statements:
    • print statements require manual checking of output
    • assert statements automatically check correctness
  • Use a testing framework like pytest or unittest
  • Assess the adequacy of the test suite with coverage.py
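
The difference between print and assert can be seen by checking a deliberately wrong expectation. This is a small sketch that reuses the Doubler class from the earlier slide; the incorrect expected value 11 and the variable assertion_failed are assumptions added only for illustration.

```python
class Doubler:
    def __init__(self, n):
        self._n = 2 * n

    def n(self):
        return self._n


x = Doubler(5)

# print only displays the comparison; a person must notice the False
print(x.n() == 11)

# assert raises AssertionError when the comparison is False
try:
    assert x.n() == 11, "expected the doubled value to be 11"
    assertion_failed = False
except AssertionError:
    assertion_failed = True

print(assertion_failed)
```

With print, the program keeps running after the failed check, whereas the assert statement stops the check automatically and reports the failure.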

unittest for DayOfTheWeek

import unittest
from dayoftheweek import DayOfTheWeek

class TestDayOfTheWeek(unittest.TestCase):
    def test_init(self):
        d = DayOfTheWeek('F')
        self.assertEqual(d.name(), 'Friday')
        d = DayOfTheWeek('Th')
        self.assertEqual(d.name(), 'Thursday')

unittest.main(argv=['ignored'], verbosity=2, exit=False)
  • Call unittest.main differently for tests outside Quarto
  • Run test_dayoftheweek.py in slides/weekfour/
  • The OK output confirms that the assertions passed
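
In a standalone file, the usual idiom is to call unittest.main() under an if __name__ == "__main__": guard, which parses the command-line arguments and exits when the run finishes. The sketch below instead loads and runs the test case programmatically so that the result can be inspected. It assumes a trimmed, inlined copy of DayOfTheWeek so that the example is self-contained; the real class lives in dayoftheweek.py.

```python
import unittest


class DayOfTheWeek:
    """Trimmed stand-in for the class defined in dayoftheweek.py."""

    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.name_map = {"Th": "Thursday", "F": "Friday"}

    def name(self):
        return self.name_map.get(self.abbreviation)


class TestDayOfTheWeek(unittest.TestCase):
    def test_init(self):
        self.assertEqual(DayOfTheWeek("F").name(), "Friday")
        self.assertEqual(DayOfTheWeek("Th").name(), "Thursday")


# Load and run the test case without reading sys.argv or exiting
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDayOfTheWeek)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())
```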

Explore DayOfTheWeek

class DayOfTheWeek:
    """A class to represent a day of the week."""
    def __init__(self, abbreviation):
        """Create a new DayOfTheWeek object."""
        self.abbreviation = abbreviation
        self.name_map = {
            "M": "Monday",
            "T": "Tuesday",
            "W": "Wednesday",
            "Th": "Thursday",
            "F": "Friday",
            "Sa": "Saturday",
            "Su": "Sunday",
        }

    def name(self):
        return self.name_map.get(self.abbreviation)
  • Support the lookup of a day of the week through an abbreviation
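
A short usage sketch of the lookup follows. Because name relies on the dictionary's get method, an abbreviation that is not in name_map yields None instead of raising an error. The inputs "X" and "th" are assumptions chosen for illustration.

```python
class DayOfTheWeek:
    """A class to represent a day of the week."""

    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.name_map = {
            "M": "Monday",
            "T": "Tuesday",
            "W": "Wednesday",
            "Th": "Thursday",
            "F": "Friday",
            "Sa": "Saturday",
            "Su": "Sunday",
        }

    def name(self):
        return self.name_map.get(self.abbreviation)


known = DayOfTheWeek("Th").name()      # a key that is in the map
unknown = DayOfTheWeek("X").name()     # a key that is not in the map
lowercase = DayOfTheWeek("th").name()  # the keys are case-sensitive
print(known, unknown, lowercase)
```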

Exploring test-driven development in Python

  • Test-driven development asks you to write tests before code:
    • How will you use a function?
    • What are its inputs and outputs?
    • Can you write code to make tests pass?
  • The TDD mantra is Red-Green-Refactor:
    • Red: The tests fail. You haven’t written the code yet!
    • Green: You get the tests to pass by changing the code.
    • Refactor: You clean up the code, removing duplication.
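
One Red-Green-Refactor cycle might look like the following sketch. The function is_weekend and its test are hypothetical examples that do not appear in the slides; the Red phase is simulated by calling the test before the function exists.

```python
def test_is_weekend():
    """Written first: describes how is_weekend should behave."""
    assert is_weekend("Sa") is True
    assert is_weekend("Su") is True
    assert is_weekend("M") is False


# Red: the test fails because is_weekend has not been written yet
try:
    test_is_weekend()
    red_phase_failed = False
except NameError:
    red_phase_failed = True


# Green: write the simplest code that makes the test pass
def is_weekend(abbreviation):
    if abbreviation == "Sa":
        return True
    if abbreviation == "Su":
        return True
    return False


test_is_weekend()


# Refactor: remove the duplication, then rerun the same test
def is_weekend(abbreviation):
    return abbreviation in {"Sa", "Su"}


test_is_weekend()
```

The same test runs unchanged in the Green and Refactor phases, which is what makes the cleanup safe.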

How can you refactor Python code?

L1 = [1, 2, 3, 4, 5]
L2 = [6, 7, 8, 9, 10]
avg1 = sum(L1)/len(L1)
avg2 = sum(L2)/len(L2)
print("avg(", L1, ") -->", avg1)
print("avg(", L2, ") -->", avg2)
avg( [1, 2, 3, 4, 5] ) --> 3.0
avg( [6, 7, 8, 9, 10] ) --> 8.0
  • This code will not work for empty lists!
  • And, the code is repetitive and hard to read
  • Can we refactor the program to avoid the defect?
L1 = [1, 2, 3, 4, 5]
L2 = [6, 7, 8, 9, 10]
if len(L1) == 0:
    avg1 = 0
else:
    avg1 = sum(L1) / len(L1)
if len(L2) == 0:
    avg2 = 0
else:
    avg2 = sum(L2) / len(L2)
print("avg(", L1, ") -->", avg1)
print("avg(", L2, ") -->", avg2)
avg( [1, 2, 3, 4, 5] ) --> 3.0
avg( [6, 7, 8, 9, 10] ) --> 8.0
  • This avoids the defect but is repetitive and hard to read!
def avg(L):
    if len(L) == 0:
        return 0
    else:
        return sum(L) / len(L)

L1 = [1, 2, 3, 4, 5]
L2 = [6, 7, 8, 9, 10]
avg1 = avg(L1)
avg2 = avg(L2)
print("avg(", L1, ") -->", avg1)
print("avg(", L2, ") -->", avg2)
avg( [1, 2, 3, 4, 5] ) --> 3.0
avg( [6, 7, 8, 9, 10] ) --> 8.0
  • The avg function avoids the defect and is easier to read!
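
Once the logic lives in a function, the empty-list case can be checked directly. This is a minimal sketch; returning 0 for an empty list follows the choice made in the slides, and the third assertion is an added check.

```python
def avg(L):
    if len(L) == 0:
        return 0
    else:
        return sum(L) / len(L)


# Typical inputs from the slides
assert avg([1, 2, 3, 4, 5]) == 3.0
assert avg([6, 7, 8, 9, 10]) == 8.0

# Edge case that caused the original defect (division by zero)
assert avg([]) == 0
```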

What should we test?

  • For each function, ask yourselves the following questions:
    • What should happen when I run this function?
    • How do I want to use this function?
    • What are the inputs and outputs of this function?
    • What should be the function’s inputs and outputs?
    • What are the edge cases for this function?
  • Test the system’s expected behavior, not its implementation
  • Test the public interface of the program under test
  • Transform detected defects into repeatable test cases
  • Assess the adequacy of the test suite with coverage.py
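
Testing behavior through the public interface can be sketched with the Doubler class from earlier. The checks call n() and never read the private attribute _n, so they would still pass if the class stored its data differently. The zero and floating-point inputs are assumed edge cases added for illustration.

```python
class Doubler:
    def __init__(self, n):
        self._n = 2 * n

    def n(self):
        return self._n


# Each pair holds an input and the output expected from the public method
cases = [(5, 10), (-4, -8), (0, 0), (2.5, 5.0)]

for value, expected in cases:
    # Check the expected behavior, not how the value is stored
    assert Doubler(value).n() == expected
```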

Testing often positively influences a program’s object-oriented design

  • Software testing helps to refine an object-oriented design

  • Interplay between testing and object-oriented design:

    • Identify nouns (classes) and verbs (methods)
    • Specify what should happen when you call a method
    • Write a unit test case to encode this expected behavior
    • Confirm that all of the test cases pass correctly
    • Refactor the code to improve its object-oriented design
    • Repeatedly run the test suite to confirm correctness

Advanced Software Testing Techniques

  • Parameterized test cases with pytest
  • Property-based testing with hypothesis
  • Test coverage with coverage.py
  • Mutation testing with mutmut
  • Using artificial intelligence to generate tests

Testing DayOfTheWeek with Pytest

from daydetector.dayoftheweek import DayOfTheWeek

def test_init():
    """Test the DayOfTheWeek class."""
    d = DayOfTheWeek("F")
    assert d.name() == "Friday"
    d = DayOfTheWeek("Th")
    assert d.name() == "Thursday"
    d = DayOfTheWeek("W")
    assert d.name() == "Wednesday"
    d = DayOfTheWeek("T")
    assert d.name() == "Tuesday"
    d = DayOfTheWeek("M")
    assert d.name() == "Monday"
  • Run poetry run pytest in the daydetector directory

  • pytest will automatically discover and run the tests!

Parameterized Tests with pytest

import pytest

from daydetector.dayoftheweek import DayOfTheWeek

@pytest.mark.parametrize(
    "abbreviation, expected",
    [
        ("M", "Monday"),
        ("T", "Tuesday"),
        ("W", "Wednesday"),
        ("Th", "Thursday"),
        ("F", "Friday"),
        ("Sa", "Saturday"),
        ("Su", "Sunday"),
        ("X", None),
    ],
)
def test_day_name(abbreviation, expected):
    """Use parameterized testing to confirm that lookup works correctly."""
    day = DayOfTheWeek(abbreviation)
    assert day.name() == expected
  • Express the inputs and the expected outputs in a table!

Property-based test case

import hypothesis.strategies as st
import pytest
from hypothesis import given

from daydetector.dayoftheweek import DayOfTheWeek

@pytest.mark.parametrize(
    "valid_days",
    [["Monday", "Tuesday", "Wednesday", "Thursday",
      "Friday", "Saturday", "Sunday"]],
)
@given(
    st.text(alphabet=st.characters(), min_size=1, max_size=2)
)
def test_abbreviation_maps_to_name(valid_days, abbreviation):
    """Use property-based testing with Hypothesis to confirm mapping."""
    day = DayOfTheWeek(abbreviation)
    assert day.name() in valid_days or day.name() is None
  • Hypothesis strategies generate random one- or two-character strings for the abbreviation parameter, thereby increasing input diversity

Test coverage with coverage.py

tests/test_dayoftheweek.py ..........                                                                              [100%]

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Name                          Stmts   Miss Branch BrPart  Cover
---------------------------------------------------------------
daydetector/__init__.py           0      0      0      0   100%
daydetector/dayoftheweek.py       6      0      0      0   100%
---------------------------------------------------------------
TOTAL                             6      0      0      0   100%


10 passed in 0.21s
  • The test suite covers 100% of the statements in the program’s source code

  • Covering statements increases the likelihood of detecting defects

  • Does high test coverage mean that the program is defect-free?
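
A small sketch suggests why full statement coverage does not guarantee a defect-free program. The function absolute_difference is a hypothetical example with a seeded defect; it is not part of the daydetector project.

```python
def absolute_difference(a, b):
    """Hypothetical function with a seeded defect: abs() was forgotten."""
    return a - b


# This single check executes every statement in the function,
# so a coverage tool would report 100% statement coverage
assert absolute_difference(5, 3) == 2

# The defect appears only for an input that the test never tried
wrong_result = absolute_difference(3, 5)
print(wrong_result)
```

Coverage reports which statements ran, not whether the checks were strong enough to reveal every wrong output.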

Mutation testing with mutmut

--- daydetector/dayoftheweek.py
+++ daydetector/dayoftheweek.py
@@ -7,15 +7,7 @@
     def __init__(self, abbreviation):
         """Create a new DayOfTheWeek object."""
         self.abbreviation = abbreviation
-        self.name_map = {
-            "M": "Monday",
-            "T": "Tuesday",
-            "W": "Wednesday",
-            "Th": "Thursday",
-            "F": "Friday",
-            "Sa": "Saturday",
-            "Su": "Sunday",
-        }
+        self.name_map = None
  • mutmut changes the program to see whether the tests detect the possible defect

  • Mutation operators systematically modify the program’s code
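
The idea behind the diff above can be imitated by hand. In this sketch, MutantDayOfTheWeek is a hand-written stand-in for the mutant, and is_killed is a hypothetical helper; neither reflects how mutmut works internally.

```python
class DayOfTheWeek:
    """Original program, abbreviated to two entries for this sketch."""

    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.name_map = {"Th": "Thursday", "F": "Friday"}

    def name(self):
        return self.name_map.get(self.abbreviation)


class MutantDayOfTheWeek(DayOfTheWeek):
    """Hand-written copy of the mutant shown in the diff."""

    def __init__(self, abbreviation):
        self.abbreviation = abbreviation
        self.name_map = None  # the mutation replaces the dictionary


def check_friday(day_class):
    """A test case that can run against either version of the class."""
    assert day_class("F").name() == "Friday"


def is_killed(day_class):
    """Return True when the test fails, meaning the change was detected."""
    try:
        check_friday(day_class)
    except Exception:
        return True
    return False


original_killed = is_killed(DayOfTheWeek)
mutant_killed = is_killed(MutantDayOfTheWeek)
print(original_killed, mutant_killed)
```

A mutant that no test kills points to behavior that the test suite does not check.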

Can you reproduce the software testing examples from these slides?

  • Use www.algorithmology.org/slides/weekfour/
  • Software testing commands you can try:
    • poetry install
    • poetry run pytest
    • poetry run pytest --cov --cov-branch
    • poetry run mutmut run
    • poetry run mutmut show 16
  • What does the output tell you about program and/or test quality?

Oh, one more thing! Did you know that some of the test cases you see in test_dayoftheweek.py were written by a large language model?

  • What are the benefits and downsides of using artificial intelligence (AI) to generate tests?

  • What are situations in which you should and should not use AI to generate tests?

What are the performance trade-offs associated with software testing?

  • Questions to consider when performing software testing:
    • When should you run the test suite?
    • How frequently should you run the test suite?
    • How can you reduce the cost of:
      • … test execution?
      • … coverage monitoring?
      • … mutation testing?
  • How do you balance the costs and benefits of software testing?