Python development

Normally I do Python development in work, where everything is already set up for easy development and testing. It took considerable research and setup to get a similarly good environment for Python development at home, so I decided to document it for myself and others :)

Install all the packages

Note: I’m using pip3 to install packages because I’m running on Mac OS, but if you’re running on Linux I’d recommend using packages provided by your distribution.

pip3 install lxml mutmut mypy pudb pyfakefs pylint pytest pytest-cov yapf

See Upgrading packages installed with pip3 is troublesome for how I upgrade packages.

Testing

I use pytest for running tests and pytest-cov to get test coverage so I can figure out which parts of my code still need to be tested.

cd test-directory
pytest

Test coverage

To enable test coverage with pytest, create pytest.ini in the directory containing your tests, with contents like https://github.com/tobinjt/bin/blob/master/python/pytest.ini. Every time your tests are run successfully coverage will be generated and a message will be printed showing the coverage percentage.

I found that test coverage needed quite a bit of configuration through the file .coveragerc in the directory containing your tests; my final contents are https://github.com/tobinjt/bin/blob/master/python/.coveragerc. I highly recommend configuring testing to fail if there is insufficient coverage. You can also configure a HTML output directory so that you can easily see which lines of code you haven’t tested; configuring a portable output directory might be difficult, I haven’t needed to do so yet.

Integration tests.

I’m a fan of integration tests, where instead of testing individual functions you test large swathes of code at a time. I take the approach of picking a piece of functionality that should be supported, then writing a test to exercise that functionality end to end.

https://github.com/tobinjt/bin/blob/master/python/linkdirs_test.py#L38 is a good example of this, where I test progressively more complex use cases and scenarios by:

  1. Populating a fake filesystem (pyfakefs is great for this) with the scenario to deal with.
  2. Calling main() with the right arguments.
  3. Checking that the resulting filesystem is correct.

This was particularly reassuring when I added deletion to linkdirs.py :)

Automatically running tests

I like having tests run every time I save the source or test file because it saves me from doing it manually. This is quite easy to do with fswatch:

ipytest () {
  if [[ "$#" -eq 0 ]]; then
    echo "ipytest: expecting at least one filename." >&2
    return 1
  fi
  fswatch -o "$@" | xargs -n2 -I{} pytest "$@"
}

# Run 'pytest foo.py foo_test.py' whenever either file changes.
ipytest foo.py foo_test.py

Mutation testing

Mutation testing is where minor changes are made to source code (e.g. initialising a variable to True instead of False) and then the tests are run with the changed source. If the tests still pass despite the change it might indicate a gap in testing, or it might indicate that the default value is simply a placeholder that is always overwritten, or the change might not be meaningful (e.g. .rstrip('\n') changing to .rstrip('XX\nXX')), or other reasons. There isn’t a one-size-fits-all approach that you can take to addressing mutations that don’t cause test failures - sometimes you will add tests, sometimes you will change the source, and sometimes you will mark the line so it’s not mutated in future.

I use mutmut for mutation testing. I have made some changes to my code in response to mutmut warnings, and thus far I think the most significant change was to add tests for the output from --help to ensure that I don’t accidentally make it confusing (e.g. changing the description so it no longer matches the options).

The first run of mutmut for any file will be slow, but for subsequent runs mutmut will only check mutations that previously failed, so it speeds up as you make fixes. Sadly there have been a couple of occasions where it stayed reporting problems despite me fixing them, and the only way to mitigate that situation was rm .mutmut-cache and rerun from scratch :( mutmut runs your tests once without mutations to get a timing baseline so it can detect mutations that cause tests to run for too long, but this means that you cannot switch to doing anything else while testing mutations, because if the extra system load from doing something else slows down the testing enough you will have false positives.

Beware: mutmut only runs your tests with unmodified source the first time you run it. If you break your tests while addressing mutmut warnings, on the next run it will make all the mutations, run the tests, and declare that you have no problems to fix because the tests fail - but not because of the mutations :(

I found running mutmut was awkward - it doesn’t understand that tests for foo.py are in foo_test.py, so I wrote a wrapper for mutmut run that first runs tests with unmodified sources, then runs mutmut with the correct arguments, and fixes permissions afterwards (because mutmut replaces the source files it operates on and doesn’t set permissions correctly).

I use mutmut like this:

  1. Run mutation testing:

    $ mutmut_run colx.py
    
    - Mutation testing starting -
    
    These are the steps:
    1. A full test suite run will be made to make sure we
       can run the tests successfully and we know how long
       it takes (to detect infinite loops for example)
    2. Mutants will be generated and checked
    
    Results are stored in .mutmut-cache.
    Print found mutants with `mutmut results`.
    
    Legend for output:
    ๐ŸŽ‰ Killed mutants.   The goal is for everything to end up in this bucket.
    โฐ Timeout.          Test suite took 10 times as long as the baseline so were killed.
    ๐Ÿค” Suspicious.       Tests took a long time, but not long enough to be fatal.
    ๐Ÿ™ Survived.         This means your tests needs to be expanded.
    ๐Ÿ”‡ Skipped.          Skipped.
    
    1. Using cached time for baseline tests, to run baseline again delete the cache file
    
    2. Checking mutants
    โ ‹ 88/88  ๐ŸŽ‰ 62  โฐ 1  ๐Ÿค” 0  ๐Ÿ™ 25  ๐Ÿ”‡ 0
    
    real    4m38.098s
    user    4m17.888s
    sys     0m17.081s
    
  2. Show results:

    $ mutmut show colx.py
    
    To apply a mutant on disk:
        mutmut apply <id>
    
    To show a mutant:
        mutmut show <id>
    
    
    Timed out โฐ (1)
    
    ---- colx.py (1) ----
    
    # mutant 229
    --- colx.py
    +++ colx.py
    @@ -121,7 +121,7 @@
         # Strip leading and trailing empty fields.
         first_index = 0
         while len(split_columns) > first_index and not split_columns[first_index]:
    -      first_index += 1
    +      first_index = 1
         last_index = len(split_columns) - 1
         while last_index > first_index and not split_columns[last_index]:
           last_index -= 1
    
    
    Survived ๐Ÿ™ (25)
    
    ---- colx.py (25) ----
    
    # mutant 161
    --- colx.py
    +++ colx.py
    @@ -39,7 +39,7 @@
       Returns:
         argparse.Namespace, with attributes set based on the arguments.
       """
    -  description = '\n'.join(__doc__.split('\n')[1:])
    +  description = 'XX\nXX'.join(__doc__.split('\n')[1:])
       usage = __doc__.split('\n')[0]
    
       argv_parser = argparse.ArgumentParser(
    
  3. Investigate and fix some of the reported mutants.

    • False positives (e.g. changing a constant used consistently throughout the codebase) can be disabled with # pragma: no mutate.
    • Tests can be added or expanded, e.g. changing regexes to be anchored, or adding edge cases so that boundary conditions are tested more tightly.
    • Source changes may be appropriate, but I haven’t encountered any good examples yet.
  4. GOTO 1.

I’ve found that I can fix a lot of the mutations pretty quickly, then my progress slows down as I pick off more and more of the low hanging fruit. My aim is to reach 0 mutations that pass tests, and sometimes the pragmatic way to deal with the last few is to just disable them. I aim for 0 so that future runs give me a better signal, and I’m not left wondering “is that something I decided to ignore in the past?" because I know I didn’t decide to ignore anything. I’ve also found that I’m getting faster with each subsequent file, because I’m learning patterns and I can reuse tests and fixes from earlier files, which encourages me to continue.

Linting

I use pylint for linting. I have configured Syntastic and Vim to run pylint automatically on saving.

pylint *.py

https://github.com/tobinjt/dotfiles/blob/master/.pylintrc is my $HOME/.pylintrc.

Automatic formatting

I use yapf for autoformatting my code. I run this manually but will probably make it automatic the next time I write some Python. I tried pyformat but there doesn’t appear to be a way to set indentation to 2 spaces so I gave up on it.

yapf -i *.py

https://github.com/tobinjt/dotfiles/blob/master/.config/yapf/style is my $HOME/.config/yapf/style. You can run yapf --style-help to generate a config with all settings documented and set to their current setting.

Type checking

I use mypy for checking type annotations, which gives me more confidence that I’m passing the right types to functions, and helps document my code - particularly when I use type aliases to give meaningful names to parameter types. I have configured Syntastic and Vim to run mypy automatically on saving.

mypy *.py

https://github.com/tobinjt/dotfiles/blob/master/.mypy.ini is my $HOME/.mypy.ini.

Debugging

I’ve started using pudb for debugging because it presents so much more information at once. I recently debugged a failing test where displaying all the local variables made the problem obvious, but without that display (e.g. using pdb or printing debug information on each run) the debugging process would have been far slower.

To trigger debugging at a specific point in your code:

import pudb
pudb.set_trace()

Stop generating .pyc files

By default Python will write compiled bytecode for foo.py to foo.pyc, which I found annoying because it clutters up your source directory. I disabled that by setting the environment variable PYTHONDONTWRITEBYTECODE, e.g.:

export PYTHONDONTWRITEBYTECODE="No .pyc files please"

Note that you’ll probably want to unset this when installing Python modules so that you get the benefit of loading bytecode; to do that I use this wrapper function:

pip3() {
  (unset PYTHONDONTWRITEBYTECODE; command pip3 "$@")
}

Upgrading packages installed with pip3 is troublesome

pip3 doesn’t track manually installed packages vs auto-installed packages, and doesn’t have a way to upgrade all packages. Upgrading all packages (manually installed and auto-installed) can be done with a shell one-liner, except that it doesn’t take dependencies into account, so you might upgrade a dependency to a version that breaks a package you manually installed :(

If you keep track of the packages you’ve installed, you can upgrade them with pip3 install --upgrade pkg1 pkg2 .... Keeping track of which packages I need also lets me use the nuke it from orbit approach: remove the site-packages directory, reinstall Python, and reinstall the packages I need. I do this once a month because frequent updates make it far easier to figure out breakages, so I’ve written https://github.com/tobinjt/bin/blob/master/update-python-modules to make it a single command. This approach is probably overkill, but I’d prefer overkill than debugging a failed upgrade.

Vim configuration

I don’t have much configuration for Python:

  • Make sure that syntastic uses Python 3 so type annotations can be parsed:

    let g:syntastic_python_python_exec = 'python3'
    
  • Configure syntastic to use mypy too:

    let g:syntastic_python_checkers = ['python', 'mypy', 'pylint']
    

    I chose the order python, mypy, pylint to fail fast and surface errors from most severe to least severe - syntax errors will be caught by python, type errors by mypy, and lint errors by pylint.