test_curl_plugin.py

This tutorial assumes a Python 3 environment; the easiest way to arrange for one if it’s not your system default is with Miniconda. You’ll need to install pytest (pip install --user pytest if you’re using the system Python; pip install pytest otherwise). Your shell will also need to have its CONDOR_CONFIG, PATH, and PYTHONPATH environment variables pointing at a working HTCondor installation. Finally, the code in this tutorial requires Ornithology; either run the example code from the src/condor_tests directory, or add src/condor_tests to your PYTHONPATH.

Description

In this tutorial, we’re going to work through how to write a real, but basic, test from scratch. We’re going to work backwards from the assertions that constitute the test. Those assertions require certain data to be available; at each step of our backwards iteration, we will define how to generate that data, possibly in terms of other data, until the last step doesn’t need any input data, or the framework provides it all for us.

In this example, we’re trying to test the curl plugin. At the simplest level, we want to make sure that (a) a job transferring input files via a URL actually gets them, and (b) when a transfer from a URL fails, the plugin notices and correctly puts the job on hold.

In this first pass, we’ll simplify things by assuming that if the job in case (a) completes, then the URL was downloaded to the right place with the right contents. We can elaborate on these conditions to avoid false positives later.

The Assertions

Our first task is to write down what it means for the test to pass. We do this by using Python’s assert statement to check whether a given value is True. Our job is to figure out what we want to assert on, and how to produce that value in the first place.

Create the following as test_curl_plugin.py. The filename must begin with test_ to become part of the test suite. Likewise, the functions in that file which start with test_ become individual tests. Note that we gave the functions relatively long but self-explanatory names, because those are what we’ll see in the test reports. For now, ignore the fact that the test functions are part of a class; we’ll explain why later. (The leading self in a member function’s parameters is mandatory but may safely be ignored.) Like the filename and the function names, the class name must begin with Test (but without the underscore).

from ornithology import JobStatus

class TestCurlPlugin:
    def test_job_with_good_url_succeeds(self, job_with_good_url):
        assert job_with_good_url.state[0] == JobStatus.COMPLETED

    def test_job_with_bad_url_holds(self, job_with_bad_url):
        assert job_with_bad_url.state[0] == JobStatus.HELD

We do not attempt to create the job object in the test; instead, we take the test jobs as arguments. The canonical test function is just a (short) series of assertions about its arguments. This makes it easy – and indeed, automatic – to distinguish a test failure (the job with the bad URL didn’t go on hold…) from a testing infrastructure failure (… because it never started running).

pytest calls each object referenced in a test function’s arguments a “fixture”, in the sense of a fixed piece of machinery necessary to run the test. Ornithology provides three special ways of defining fixtures: @config, @standup, and @action (config(), standup(), and action()). We’ll only need the third one for this tutorial.

Note that we gave the fixtures long but descriptively self-explanatory names as well, since those names will be used to report errors.

Fixtures!

(Think: “Plastics!”)

@action is a decorator; a decorator is syntactic sugar for calling a function on the function defined immediately after it. In the following code, @action basically just marks job_with_bad_url and job_with_good_url as the functions which produce the job_with_bad_url and job_with_good_url fixtures. (Which is why the function names match the argument names in the test functions.)
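To make the “syntactic sugar” concrete, here is a minimal sketch using a made-up decorator named register. It is only an illustration of the @ syntax (Python’s own term for the construct is “decorator”); it is not how Ornithology implements @action, and REGISTRY and register are hypothetical names.

```python
# Hypothetical stand-in for a fixture registry; this is NOT Ornithology's @action.
REGISTRY = {}


def register(func):
    """Record func under its own name and hand it back unchanged."""
    REGISTRY[func.__name__] = func
    return func


# Using the @ syntax...
@register
def job_with_good_url():
    return "a job handle would go here"


# ...is shorthand for defining the function and then calling
# the decorator on it by hand, as done here.
def job_with_bad_url():
    return "another job handle"


job_with_bad_url = register(job_with_bad_url)
```

Both functions end up in REGISTRY under their own names, which mirrors the idea that the name of the decorated function is the name of the fixture.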

We start with an empty argument list and a desire to submit a job and then wait for it to complete.

from ornithology import action

@action
def job_with_good_url():
    pass

We need to submit this job and wait for it to reach a “terminal” state: either completed, held, or removed. The easiest way to wait for a job to terminate is to use a ClusterHandle. These are what we get back when submitting jobs via Ornithology. Once we have a handle, we can use its ClusterHandle.wait() method to do the actual waiting. Luckily, we don’t care all that much about the details of our personal condor, so we can use the default_condor fixture provided by Ornithology.

from ornithology import action

@action
def job_with_good_url(default_condor):
    job = default_condor.submit(
        {
            # Do nothing of interest.
            "executable": "/bin/sleep",
            "arguments": "1",
            # These are the two lines we really care about.
            "transfer_input_files": "FIXME",
            "should_transfer_files": "YES",
        }
    )

    job.wait(condition = FIXME)

It is considered good Python form to leave the trailing comma in so that the individual lines may be freely reordered.

Note

Why do we wait for the jobs to enter a terminal state in this fixture?

At one level, we have to wait at some point for the test to work, and we don’t want to wait in the test functions because waiting could fail. At another level, it’s a judgement call: you could certainly instead write a smaller job_with_bad_url() function that accepted a different fixture, a job which had only just been submitted, and that would be fine too.

In this case, the judgement was that we didn’t expect the abstract operation of “running the job” to fail often enough to be worth breaking into two separately-checked pieces.

In any case, if these functions checked for the specific state the test functions expect to see, that would defeat the point of splitting them up, so we don’t do that, either.

What about the FIXMEs?

The job we submit needs to know what URL to download from, but to minimize the test’s fragility and to isolate it from the outside world, we want that URL to be served by a server we started for the test. We obviously can’t count on port 80 being available, so we’ll need the URL to include the port. The safest way to do that is to determine the URL at run-time, after we’ve started the web server and it has bound to its listen port. That sounds like a lot of work, and something else that could fail, so let’s make the URL a fixture.
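As a rough illustration of “determine the URL at run-time”, the sketch below uses only Python’s standard library, not the pytest extension this tutorial relies on. It asks the operating system for a free port by binding to port 0 and builds the URL only after the bind has happened. The handler class and the response body are made up for the example, and the standard library’s HTTPServer is a different class from the one in pytest_httpserver.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class GoodUrlHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed body."""

    def do_GET(self):
        body = b"Great success!"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep the example quiet.


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), GoodUrlHandler)
# Only now, after binding, do we know which port we got.
url = f"http://127.0.0.1:{server.server_port}/goodurl"

thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    # Ignore any proxy configured in the environment for this local fetch.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        fetched = response.read()
finally:
    server.shutdown()
    server.server_close()
    thread.join()
```

The important ordering is bind first, build the URL second; anything that needs the URL has to run after the server exists, which is exactly the kind of dependency a fixture expresses.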

Now we’ll get the waiting working. As an implementation detail, ClusterHandle.wait() requires the job to produce an event log, so we’ll have to provide one. By convention, everything the job produces should go into the corresponding test-specific directory. As you might expect by now, Ornithology provides a fixture for that, test_dir().

from ornithology import action, ClusterState

@action
def job_with_good_url(default_condor, good_url, test_dir):
    job = default_condor.submit(
        {
            # Do nothing of interest.
            "executable": "/bin/sleep",
            "arguments": "1",
            # These are the two lines we really care about.
            "transfer_input_files": good_url,
            "should_transfer_files": "YES",
            # Implementation detail.
            "log": (test_dir / "good_url.log").as_posix(),
        }
    )

    job.wait(condition = FIXME)

    return job

The actual waiting condition will be a method on the ClusterState that is attached to the ClusterHandle. Because functions are first-class objects in Python, we can simply pass a reference to the appropriate method to ClusterHandle.wait(). In this case we will wait for the job to either complete or get held, which are both “terminal” states. The code block below also adds the job_with_bad_url fixture.

from ornithology import action, ClusterState

@action
def job_with_good_url(default_condor, good_url, test_dir):
    job = default_condor.submit(
        {
            "executable": "/bin/sleep",
            "arguments": "1",
            "transfer_input_files": good_url,
            "should_transfer_files": "YES",
            "log": (test_dir / "good_url.log").as_posix(),
        }
    )

    job.wait(condition=ClusterState.all_terminal)

    return job

@action
def job_with_bad_url(default_condor, bad_url, test_dir):
    job = default_condor.submit(
        {
            "executable": "/bin/sleep",
            "arguments": "1",
            "transfer_input_files": bad_url,
            "should_transfer_files": "YES",
            "log": (test_dir / "bad_url.log").as_posix(),
        }
    )

    job.wait(condition=ClusterState.all_terminal)

    return job
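The fixtures above pass ClusterState.all_terminal without parentheses, which hands over the method itself rather than its result. The toy sketch below shows that pattern in isolation. ToyState and this wait function are hypothetical stand-ins written for the example; how Ornithology’s ClusterHandle.wait() actually polls is not shown in this tutorial and is not what this code claims to reproduce.

```python
class ToyState:
    """Hypothetical stand-in for a cluster's state; not Ornithology's ClusterState."""

    def __init__(self, statuses):
        self.statuses = statuses

    def all_terminal(self):
        return all(s in ("COMPLETED", "HELD", "REMOVED") for s in self.statuses)


def wait(state, condition, updates):
    """Apply each update in turn until condition(state) is true.

    Returns the number of updates consumed, or None if the condition
    never became true.
    """
    for polls, new_statuses in enumerate(updates, start=1):
        state.statuses = new_statuses
        # `condition` is a plain function object; calling it with the
        # instance is the same as calling state.all_terminal().
        if condition(state):
            return polls
    return None


state = ToyState(["IDLE"])
history = [["IDLE"], ["RUNNING"], ["COMPLETED"]]
# Note the missing parentheses: we pass the method itself, not its result.
polls_needed = wait(state, ToyState.all_terminal, history)
```

Because the condition is just a value, a different predicate can be swapped in without touching the waiting logic.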

OK! Now we just need the good and bad URL fixtures. Again, we could split this fixture in two pieces, but it’s already short and simple, so we won’t bother.

@action
def good_url(server):
    server.expect_request("/goodurl").respond_with_data("Great success!")
    return f"http://localhost:{server.port}/goodurl"

@action
def bad_url(server):
    server.expect_request("/badurl").respond_with_data(status=404)
    return f"http://localhost:{server.port}/badurl"

We’re getting a little test-specific and a little exotic here, so I’ll just say that server is provided by a pytest extension designed for exactly this purpose. The fixture is implemented in the following, funny, way.

from pytest_httpserver import HTTPServer

@action
def server():
    with HTTPServer() as httpserver:
        yield httpserver

This song-and-dance works around a detail in how @action is implemented that we’ll talk about further below.

Testing the Test

We’ve now iterated backwards from the asserts, writing functions for the missing arguments until we’ve reached a function which takes no arguments, which means it’s now time to run pytest and see what happens.

$ pytest ./test_curl_plugin.py
============================= test session starts ==============================
platform linux -- Python 3.8.2, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /home/tlmiller/miniconda3/bin/python
cachedir: .pytest_cache
rootdir: /home/tlmiller/condor/source/src/condor_tests, inifile: pytest.ini
plugins: cov-2.8.1, dependency-0.5.1, httpserver-0.3.4, mock-3.1.0, flask-1.0.0

Base per-test directory: /tmp/condor-tests-1591061678-16424
Python bindings version:
$CondorVersion: 8.9.7 May 20 2020 BuildID: UW_Python_Wheel_Build $
HTCondor version:
$CondorVersion: 8.9.8 Jun 01 2020 PRE-RELEASE-UWCS $
$CondorPlatform: x86_64-Devuan-2 $

collected 2 items

example01.py::TestCurlPlugin::test_job_with_good_url_succeeds PASSED     [ 50%]
example01.py::TestCurlPlugin::test_job_with_bad_url_holds PASSED         [100%]

============================== 2 passed in 19.99s ==============================

Parametrization

Warning

pytest uses the British spelling parametrize instead of parameterize. Be aware if you’re looking for more documentation!

As written, the bad URL gets a code 404 reply. If we wanted to test how the curl plugin responds to a code 500 reply, we wouldn’t have to change anything about the test except the bad_url fixture. With pytest, that’s true even if we want to test both codes.

Parametrizing @actions involves an unfortunate amount of syntactic magic, but here’s how you do it:

@action(params={"404":404, "500":500})
def bad_url(server, request):
    server.expect_request("/badurl").respond_with_data(status=request.param)
    return f"http://localhost:{server.port}/badurl"

If you’re not familiar with the syntax, that’s calling @action with the named argument params as an inline-constant dictionary mapping the string “404” to the integer 404, and the string “500” to the integer 500. The keys are used by pytest to generate the test’s “id” when reporting results; the values will be injected into the test as described below.

For each use of the job_with_bad_url fixture, pytest will generate two subtests: one named “404”, and the other named “500”. In the former, request.param is 404, and in the latter, it is 500. If you run pytest again, you’ll see that it now reports three test results, one for the good URL job, and one for each of the two bad URL jobs:

$ pytest ./test_curl_plugin.py
============================= test session starts ==============================
platform linux -- Python 3.8.2, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /home/tlmiller/miniconda3/bin/python
cachedir: .pytest_cache
rootdir: /home/tlmiller/condor/source/src/condor_tests, inifile: pytest.ini
plugins: cov-2.8.1, dependency-0.5.1, httpserver-0.3.4, mock-3.1.0, flask-1.0.0

Base per-test directory: /tmp/condor-tests-1591061845-16808
Python bindings version:
$CondorVersion: 8.9.7 May 20 2020 BuildID: UW_Python_Wheel_Build $
HTCondor version:
$CondorVersion: 8.9.8 Jun 01 2020 PRE-RELEASE-UWCS $
$CondorPlatform: x86_64-Devuan-2 $

collected 3 items

example02.py::TestCurlPlugin::test_job_with_good_url_succeeds PASSED     [ 33%]
example02.py::TestCurlPlugin::test_job_with_bad_url_holds[404] PASSED    [ 66%]
example02.py::TestCurlPlugin::test_job_with_bad_url_holds[500] PASSED    [100%]

============================== 3 passed in 29.46s ==============================
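If the way request.param gets its value still feels like magic, the following toy loop may help. It is a plain-Python imitation of the idea, not pytest’s implementation: the loop, the SimpleNamespace stand-in for request, and the bad_url_status function are all assumptions made for illustration.

```python
from types import SimpleNamespace

# Same shape as the dictionary passed to @action above: id -> value.
params = {"404": 404, "500": 500}


def bad_url_status(request):
    """Hypothetical fixture body that only reports the status it was given."""
    return f"server will reply with status {request.param}"


# A toy loop standing in for pytest: one run per entry, each with its own
# `request` object whose `param` attribute holds the value.
results = {}
for test_id, value in params.items():
    request = SimpleNamespace(param=value)
    results[f"test_job_with_bad_url_holds[{test_id}]"] = bad_url_status(request)
```

The keys of results line up with the bracketed names in the report above, and each run sees only its own value.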

You could parametrize job_with_good_url in a similar way to verify that a very small (0 byte) file or a very large file is also handled correctly.

If you also wanted to verify that the curl plugin worked with both static and dynamic slots, then pytest would run six tests: the good URL test and the two bad URL tests in dynamic slots, and those three again in static slots.
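The count of six is just a cross product, which can be checked with a few lines of standard-library Python. The labels below are illustrative only; this tutorial does not define a slot-type parameter, and the names are not what pytest would print.

```python
from itertools import product

# Hypothetical labels; the slot-type parameter does not exist in this tutorial's code.
slot_types = ["static", "dynamic"]
url_cases = ["good_url", "bad_url[404]", "bad_url[500]"]

# Every slot type is paired with every URL case.
test_matrix = list(product(slot_types, url_cases))
```

Two slot types times three URL cases gives the six combinations described above.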

The Song-and-Dance

pytest normally doesn’t cache fixtures at all (although they call this “caching at the function level”). However, for testing HTCondor, where starting up a personal condor is a core task, and therefore a core fixture, this rapidly becomes a burden, both in terms of time and in terms of writing a multi-step test where the state of that personal condor matters.

The Ornithology framework solves this by defining all of its custom fixtures to cache at the class level – all functions that are members of the same class share a common pool of fixtures. This makes the tests both easier to write and faster, and it’s why the tutorial starts off with the functions in a class.

However, since the pytest default is not to share fixtures between functions, some extensions – including pytest_httpserver – only provide their default fixtures at the function level. (Why pytest can’t automagically convert, I don’t know.) This is why we needed to write an adapter around it.

Implementation details of our workaround: the yield <value> construct causes the value to be “returned”, but instead of the function returning, its execution is temporarily suspended. When the fixture goes out of scope, pytest resumes the execution of the function. The with construct is a “context manager” which arranges for the cleanup of the server when the with block ends. This is all implemented via generators.
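The suspend-and-resume behaviour can be watched directly by driving such a generator by hand. In this sketch, toy_server is a made-up context manager standing in for the HTTP server, and the calls to next() play the role that pytest plays for a real fixture; it is an illustration under those assumptions, not pytest’s code.

```python
from contextlib import contextmanager

events = []


@contextmanager
def toy_server():
    """Hypothetical resource that records when it starts and stops."""
    events.append("server started")
    try:
        yield "server object"
    finally:
        events.append("server stopped")


def server_fixture():
    with toy_server() as srv:
        yield srv  # Execution pauses here while tests use srv.


gen = server_fixture()
value = next(gen)  # Runs up to the yield: setup happens.
events.append("tests run")
try:
    next(gen)  # Resumes after the yield: the with block exits.
except StopIteration:
    pass  # The generator finished, as expected.
```

After this runs, events records the server starting, the tests running, and the server stopping, in that order, which is the setup/use/cleanup sequence the fixture relies on.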

Complete Test

This version is slightly different than what’s in the source tree (it doesn’t check the contents of the downloaded file) so here’s a copy of the whole thing in one go, as formatted by the black package (pip install [--user] black).

from ornithology import action, JobStatus, ClusterState
from pytest_httpserver import HTTPServer


@action
def server():
    with HTTPServer() as httpserver:
        yield httpserver


@action
def good_url(server):
    server.expect_request("/goodurl").respond_with_data("Great success!")
    return f"http://localhost:{server.port}/goodurl"


@action(params={"404": 404, "500": 500})
def bad_url(server, request):
    server.expect_request("/badurl").respond_with_data(status=request.param)
    return f"http://localhost:{server.port}/badurl"


@action
def job_with_good_url(default_condor, good_url, test_dir):
    job = default_condor.submit(
        {
            "executable": "/bin/sleep",
            "arguments": "1",
            "transfer_input_files": good_url,
            "should_transfer_files": "YES",
            "log": (test_dir / "good_url.log").as_posix(),
        }
    )

    job.wait(condition=ClusterState.all_terminal)

    return job


@action
def job_with_bad_url(default_condor, bad_url, test_dir):
    job = default_condor.submit(
        {
            "executable": "/bin/sleep",
            "arguments": "1",
            "transfer_input_files": bad_url,
            "should_transfer_files": "YES",
            "log": (test_dir / "bad_url.log").as_posix(),
        }
    )

    job.wait(condition=ClusterState.all_terminal)

    return job


class TestCurlPlugin:
    def test_job_with_good_url_succeeds(self, job_with_good_url):
        assert job_with_good_url.state[0] == JobStatus.COMPLETED

    def test_job_with_bad_url_holds(self, job_with_bad_url):
        assert job_with_bad_url.state[0] == JobStatus.HELD