Normally I do Python development in work, where everything is already set up for easy development and testing. Recently I did some Python development at home, so I had to figure out how to do it, and here's what I came up with.

Note: I'm using pip to install packages because I'm running on Mac OS, but if you're running on Linux I'd recommend using packages provided by your distribution.

Testing

I use pytest for running tests and pytest-cov to get test coverage so I can figure out which parts of my code still need to be tested.

$ pip install pytest pytest-cov

To run your tests simply run pytest in the directory containing the tests.
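
For example, a minimal test file looks like this (the names here are purely illustrative); pytest collects any function named test_* from files named test_*.py or *_test.py:

# adder.py
def add(x, y):
    return x + y

# adder_test.py
import adder

def test_add():
    assert adder.add(2, 3) == 5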

Test coverage

To enable test coverage, create pytest.ini in the directory containing your tests, with contents like https://github.com/tobinjt/bin/blob/master/python/pytest.ini. Every time you successfully run tests, coverage will be generated.
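
For reference, a minimal pytest.ini might look something like this (the real file is at the URL above; the options come from pytest-cov):

[pytest]
addopts = --cov=. --cov-report=term-missing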

I found that test coverage needed quite a bit of configuration; create .coveragerc in the directory containing your tests, with contents like https://github.com/tobinjt/bin/blob/master/python/.coveragerc. In particular, you can configure testing to fail if there is insufficient coverage, something I highly recommend.
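
A minimal .coveragerc along those lines might look like this (again, the real file is at the URL above; fail_under is the setting that makes insufficient coverage a failure):

[run]
branch = True

[report]
show_missing = True
fail_under = 100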

Integration tests

Unit tests are useful, but I'm a much bigger fan of integration tests, where instead of testing individual functions you test large swathes of code at a time. I take the approach of picking a piece of functionality that should be supported, then writing a test to exercise that functionality. https://github.com/tobinjt/bin/blob/master/python/linkdirs_test.py#L38 is a good example of this: I test progressively more complex use cases and scenarios, by:

  1. Populating a fake filesystem (pyfakefs is great for this) with the scenario to deal with.
  2. Calling main() with the right arguments.
  3. Checking that the resulting filesystem is correct.

This was particularly reassuring when I added deletion to linkdirs.py :)
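
A stripped-down sketch of the pattern (the real tests are at the URL above; mytool and the arguments to main() are stand-ins for whatever you're testing):

import os

from pyfakefs import fake_filesystem_unittest

import mytool

class MyToolTest(fake_filesystem_unittest.TestCase):
    def setUp(self):
        # Replace the real filesystem with an in-memory fake for each test.
        self.setUpPyfakefs()

    def test_links_single_file(self):
        # 1. Populate the fake filesystem with the scenario to deal with.
        self.fs.create_file('/src/bashrc', contents='export EDITOR=vim\n')
        self.fs.create_dir('/home/user')
        # 2. Call main() with the right arguments.
        mytool.main(['mytool', '/src', '/home/user'])
        # 3. Check that the resulting filesystem is correct.
        self.assertTrue(os.path.exists('/home/user/bashrc'))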

Linting

I use pylint for linting.

$ pip install pylint

To configure linting, create $HOME/.pylintrc with contents like https://github.com/tobinjt/dotfiles/blob/master/.pylintrc.
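
A tiny fragment to give the flavour (the real file at that URL is much longer):

[MESSAGES CONTROL]
disable=locally-disabled,too-few-public-methods

[FORMAT]
max-line-length=80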

To check files run pylint *.py.

Misc

Stop generating .pyc files

By default, Python will write compiled bytecode for foo.py to foo.pyc, which I found annoying. I disabled that by setting the environment variable PYTHONDONTWRITEBYTECODE, e.g.:

$ export PYTHONDONTWRITEBYTECODE="No .pyc files please"

Upgrading packages installed with pip is troublesome

pip doesn't track requested packages vs auto-installed packages, and doesn't have a way to upgrade all packages. Doing that is a shell one-liner, except that it doesn't take dependencies into account, so you might upgrade a dependency to a version that breaks a package you care about :(
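
For the record, the naive one-liner looks something like this (it feeds every installed package back into pip install --upgrade, dependencies be damned):

$ pip freeze | cut -d = -f 1 | xargs pip install --upgrade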

The only way I've found to upgrade packages with pip is to keep track of the ones you've installed, then upgrade them with pip install --upgrade pkg1 pkg2 ....

I'm migrating all my source code repositories from Subversion to Git. I tried git-svnimport, but it only works if your repository has the recommended layout of trunk, tags, and branches; unfortunately, a lot of mine don't. git-svn initially looked like overkill, but it worked quite well. Below is the simple shell script I used to import my repositories and push them to Github; I manually created each repository using Github's web interface, but it may be possible to script that too.

#!/bin/bash

set -e

for repo in $( < "$HOME/repo-list" ); do
    echo "$repo"
    cd "$HOME/src"
    git svn clone svn+ssh://subversion.scss.tcd.ie/users/staff/tobinjt/src/svnroot/"$repo"
    cd "$repo"
    git remote add origin git@github.com:tobinjt/"$repo".git
    git push origin master
done

I spend a lot of my time using Vim, Bash, and various CLI tools. Over the past 15 years I've spent a lot of time configuring these tools, and I've gotten so used to my configuration that it's really weird when I don't have it. I use 6 machines on a regular basis (some with multiple accounts), so I need a way of managing those configuration files (typically known as dotfiles) and keeping them in sync between machines.

Configuration files aren't much different to code, so the obvious way to maintain them is a Version Control System. I originally used CVS back in 2002 or so, then migrated to Subversion around 2007 (I think), and I've been using Git since 2010. The big difference between dotfiles and code is that dotfiles need to be in your home directory, not a subdirectory somewhere. One approach is to make your home directory into a VCS repository and configure the VCS to ignore everything you don't want checked in, but that requires more maintenance than I'm happy with, and it possibly leaks information (e.g. if .gitignore contains bank-details.txt). The other approach is to keep the checked-out repository somewhere else and link all the files into your home directory - this is the approach I've taken.

Start by creating a Git repository on a hosting service somewhere; I use https://github.com, but others have recommended https://bitbucket.org. Why use a hosted service? Because you want the repository to be easily available and you want someone else taking care of backups for you. I was very imaginative and named mine dotfiles :) Check it out somewhere; the tools I wrote assume it will be under ~/src and match *dotfiles*.

Now I needed a tool to link the files in ~/src/dotfiles into my home directory. I couldn't find one with a quick search back in 2010 (though now there appear to be many available), and I needed a project to learn Python after starting work in Google, so I wrote one: linkdirs. I'm not happy with that code, but it's good enough for now - the ugly Perl code it replaced was much worse. linkdirs is generic: it ignores various files associated with VCS systems, and Vim swap files, but you can use it for linking directories for other reasons. It links from multiple source directories, creates destination directories as necessary, and hard links files from source to destination. If a destination file exists but isn't a hard link to the source file, it will check if the contents are the same; if they are it will delete the destination and create the hard link, otherwise it will display the diffs. If anything fails or there are diffs it will exit unsuccessfully.

linkdirs is pretty low level, so I wrote a wrapper: dotfiles. It finds all directories matching *dotfiles* directly under ~/src (so I can have a standard repository on every computer plus a work repository on work computers), runs linkdirs with the right args, and does two more things:

  1. cat "${HOME}"/.ssh/config-??-* > "${HOME}/.ssh/config"

    ssh doesn't support multiple config files or includes, but I have standard configs and work configs in different repositories, so I keep the config snippets in separate files and combine them. This is done every time dotfiles runs - there's nothing clever to check if an update is necessary.

  2. vim help tags from different plugins (see below) need to be updated, and spell files need to be compiled. I wrote a simple vim function for each update (UpdateBundleHelptags and UpdateSpellFiles) and they're both run every time by dotfiles.
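
    Roughly speaking, UpdateBundleHelptags does something like this (a simplified, untested sketch, not the real function from my .vimrc):

    function! UpdateBundleHelptags()
      " Regenerate help tags for every plugin installed under ~/.vim/bundle.
      for dir in glob('~/.vim/bundle/*/doc', 0, 1)
        execute 'helptags' fnameescape(dir)
      endfor
    endfunction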

Both linkdirs and dotfiles support reporting unexpected files in the destination directory, making it relatively easy to find leftover files that I've deleted from the repository.

I use about 20 Vim plugins, and I manage each plugin as a git submodule, allowing me to easily update each plugin over time. Because I add and update plugins quite infrequently I've written instructions for myself in my .vimrc. I use Vundle to manage Vim's runtimepath, but I add the repositories manually because Vundle doesn't support submodules.
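
Adding a plugin boils down to something like this (the plugin and paths here are just an example):

$ cd ~/src/dotfiles
$ git submodule add https://github.com/tpope/vim-fugitive.git .vim/bundle/vim-fugitive
$ git commit -m "Add the vim-fugitive plugin as a submodule."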

When I push a change to Github I later need to sync those changes to every machine (normally the next time I use the machine, or when I notice that something is missing). This is simple but tedious, so I wrapped up the per-machine work in update-dotfiles-and-bin, which also pushes any local changes and reports any unexpected files.

A relatively rare action is setting up a new machine or a new user, but I also made that really simple: clone-dotfiles. Originally dotfiles was a shell function rather than a standalone tool, so clone-dotfiles was more complicated back then. When I use a new machine I clone my bin repository, run clone-dotfiles, and I'm ready to go.

All of these tools are generic except for clone-dotfiles and can be reused by anyone.

About a month ago I wrote that I need a better LISP book. I gave up on the LISP book I had been reading, and started reading On LISP: Advanced Techniques for Common LISP by Paul Graham. I've read about one third of it, and understood most of it - I had trouble with some of the more difficult code, but I understood his explanations of LISP features and what the code was doing. I was impressed enough to get a copy of ANSI Common LISP, and I've read about one third of it in the last week. It's excellent - clear, concise, well structured; I highly recommend it. I've started solving Project Euler problems again, and I'm much happier with my code.

Last year I migrated the School of Computer Science and Statistics mail server from Solaris to Debian Linux. I made a lot of changes and improvements during the migration; one of the simplest was to keep /etc under version control. I assume most people are familiar with version control from writing code - if you're not, please spend a couple of hours reading and experimenting with any modern VCS, you'll be thankful you did. I first set up a version controlled /etc almost 10 years ago when I was Netsoc's sysadmin, but back then I was using CVS, and it was complicated by Solaris putting binaries and named pipes in /etc for backwards (and I really mean backwards) compatibility. This time I used etckeeper and git. One of the reasons for using git is that it's distributed: if we added a second mail server, I wanted to make synchronising /etc as simple as possible. It has proven to be very useful:

  • Being able to see the changes I made in previous days, especially during the initial setup, when a lot of services needed a lot of configuration.

  • Finding out when files last changed, so we can assure ourselves and users that we haven't changed anything that would cause the problems they're having, or find out that someone else made a change unbeknownst to us that could be responsible.

  • Avoiding directory listings like this:

dovecot.conf
dovecot.conf.2008-2009
dovecot.conf.2009-05-07
dovecot.conf.2009.07.13
dovecot.conf.2009-12-19.attempt.3.nearly.there
dovecot.conf.before-changes
dovecot.conf.Friday
dovecot.conf.worked-for-brendan
dovecot.conf.worked.yesterday
dovecot.conf.yesterday
dovecot.jic.conf.jan

Setup is explained in /usr/share/doc/etckeeper/README.gz but I'll summarise here:

cd /etc
etckeeper init
git status
# review the list of files to be added; files can be removed with
#   git rm --cached FILE
# files can be ignored by adding them to /etc/.gitignore
git commit -m "Initial import of /etc"

That's it - you now have a version controlled /etc. Chances are that you'll need to ignore some files because they're generated from others or modified by daemons, but that's easy to do. If you intend cloning the repository, please read the security advice in /usr/share/doc/etckeeper/README.gz to avoid any nasty surprises.
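
Ignoring a file is as easy as promised; for example, /etc/mtab changes every time a filesystem is mounted or unmounted, so you would probably ignore it (an illustrative example; etckeeper's defaults may already cover it):

cd /etc
echo "mtab" >> .gitignore
git add .gitignore
git rm --cached mtab
git commit -m "Ignore /etc/mtab: it changes on every mount."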

I've been working on my wife's website recently, and I wanted to check that all the internal links and resources worked properly. I wasn't going to do this by hand, so I wrote a simple wrapper around wget. It deliberately downloads everything and saves it to make finding the location of broken links easier. Any request that wasn't answered with HTTP status 200 is displayed, e.g.:

--2014-11-17 22:07:14--  http://example.com/bar/
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 404 Not Found
--
--2014-11-17 22:07:16--  http://example.com/baz/
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 404 Not Found
--
--2014-11-17 22:07:18--  http://example.com/qwerty/
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 404 Not Found
See /tmp/check-links-R4ZxQqw1Ak/wget.log and the contents of /tmp/check-links-R4ZxQqw1Ak for further investigation

That tells you which links are broken, and with that knowledge you're a simple grep -r /qwerty/ /tmp/check-links-R4ZxQqw1Ak away from finding the page containing the broken link.

It's not amazingly advanced, but it has been useful. I found a couple of 404s, and a large number of 301s that I could easily fix to avoid one more round trip for people viewing the site.
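
The wrapper boils down to something like this (a simplified sketch rather than the real check-links script; the wget options and the grep at the end are the interesting parts):

#!/bin/bash

set -e

url="$1"
dir="$(mktemp -d /tmp/check-links-XXXXXXXXXXXX)"
# Download everything reachable from $url, plus the images and CSS each page
# needs, logging every request; wget exits unsuccessfully if anything 404s,
# so don't let that kill the script.
wget --recursive --page-requisites --no-parent \
    --directory-prefix="$dir" --output-file="$dir/wget.log" "$url" || true
# Print every response that wasn't a 200, with enough context to see the URL.
grep -v ' 200 OK' "$dir/wget.log" \
    | grep -B 2 'HTTP request sent, awaiting response' || true
echo "See $dir/wget.log and the contents of $dir for further investigation"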

I needed to write a static web page in work recently, so I decided to use Markdown, because writing HTML is time-consuming and unproductive. I was writing a reasonably large page, so I wanted folding, which the syntax highlighting I've been using for years didn't support. I wrote some simple folding support to create nested folds at headers, and also reconfigured vim to recognise bulleted lists so that reformatting with gq doesn't destroy them.

Save https://github.com/tobinjt/dotfiles/blob/master/.vim/plugin/markdown-folding.vim as ~/.vim/plugin/markdown-folding.vim - it will be automatically loaded every time you start vim, but it won't do anything by itself.

Add these lines to ~/.vimrc:

" Associate *.mdwn with markdown syntax.
autocmd BufRead,BufNewFile *.mdwn setlocal filetype=markdown
" Recognise bulleted lists starting with ^\*
autocmd FileType markdown setlocal formatoptions+=n formatlistpat=^\\*\\s*
" Interpret blockquotes as comments.
autocmd FileType markdown setlocal comments=n:>
" Configure folding to use the function defined earlier.
autocmd FileType markdown setlocal foldmethod=expr
    \ foldexpr=MarkdownFolding(v:lnum)

Note: this was originally a lot longer and more complex, but a later version of tmux show-environment supports formatting the output as shell commands to eval, so this is much easier now.

tmux is a tty multiplexer similar to screen, but with some really nice features. One of those features is updating environment variables when you reconnect to a session - the client sends the current values to the tmux server, and they can be retrieved with:

$ tmux show-environment -s
unset DISPLAY
SSH_AGENT_PID=3912; export SSH_AGENT_PID
unset SSH_ASKPASS
SSH_AUTH_SOCK=/tmp/ssh-lXpzMY3205/agent.3205; export SSH_AUTH_SOCK
SSH_CONNECTION=192.0.2.1 43512 192.0.2.1 22; export SSH_CONNECTION
unset WINDOWID
unset XAUTHORITY

Of course, tmux can't force other processes to update their environment. bash has a hook you can use to do it: PROMPT_COMMAND. If this variable is set to the name of a function, bash will run that function before displaying your prompt. Here's a function and supporting settings to update your environment:

function prompt_command() {
    if [ -n "${TMUX}" ]; then
        eval "$(tmux show-environment -s)"
    fi
}
PROMPT_COMMAND=prompt_command


From http://pool.ntp.org:

The pool.ntp.org project is a big virtual cluster of timeservers providing reliable easy to use NTP service for millions of clients.

The pool is being used by millions or tens of millions of systems around the world. It's the default "time server" for most of the major Linux distributions and many networked appliances (see information for vendors).

The NTP package in Debian Lenny uses the NTP pool, so when a user installs NTP on their home machine, it Just Works. Unfortunately, the SCSS firewall blocks NTP traffic for all hosts except our NTP server, breaking the default configuration for users on our network. Rather than reconfiguring every client, I configured bind on our DNS servers to hijack the pool.ntp.org domain, answering nearly [1] all requests for hosts in that domain with the address of our NTP server. This means that a user can just

apt-get install ntp

and NTP will work properly for them.

[1] The sole exception is www.pool.ntp.org: I want the URL http://www.pool.ntp.org to work in a user's browser. Although pool.ntp.org does resolve to our NTP server, the web server running on that host redirects requests for http://pool.ntp.org to http://www.pool.ntp.org, so that URL works too.

The bind zone file is quite short:

; vim: set filetype=bindzone :
; ----------------------------------------------------------------------
; Zonefile to hijack the pool.ntp.org domain, so NTP clients use our local
; NTP server instead of futilely trying to get through the firewall.
; ----------------------------------------------------------------------

$TTL            1D

@           IN SOA  ns.cs.tcd.ie. postmaster.cs.tcd.ie. (
                2009052001  ; Serial
                2H      ; Refresh - how often slaves
                        ; check for changes.
                2H      ; Retry - how often slaves will
                        ; retry if checking for changes
                        ; fails
                14D     ; Expire - how long slaves
                        ; consider their copies of our
                        ; zone to be valid for
                6H      ; Minimum
            )

            ; Name server records
            IN NS       ns.cs.tcd.ie.
            IN NS       ns2.cs.tcd.ie.
            IN NS       ns3.cs.tcd.ie.
            IN NS       ns4.cs.tcd.ie.

            ; There are no MX records, because pool.ntp.org doesn't have any.

; This makes www.pool.ntp.org work, but of course the real address could
; change at any time.
www     IN CNAME    ntppool-varnish.develooper.com.
; pool.ntp.org resolves to ntp.cs.tcd.ie
; We can't use a CNAME, because bind complains that the record has 
; "CNAME and other data", and ignores it.
@       IN A        134.226.32.57
; *.pool.ntp.org resolves to ntp.cs.tcd.ie
*       IN CNAME    ntp.cs.tcd.ie.

You can play with it using commands like:

dig @ns.cs.tcd.ie pool.ntp.org
dig @ns.cs.tcd.ie www.pool.ntp.org
dig @ns.cs.tcd.ie didgeridoo.pool.ntp.org
dig @ns.cs.tcd.ie i.play.with.matches.pool.ntp.org

Our NTP server (ntp.scss.tcd.ie) is part of the NTP pool, and can be used by anybody, but you're probably better off using the pool.


I first thought about learning LISP when I was still an undergrad, but I was stymied by Real Life and a lack of material to learn from. Shortly before I submitted my MSc thesis I picked up two LISP books - LISP and On LISP: Advanced Techniques for Common LISP - but my MSc was taking up all my time, so I put them on a shelf and forgot about them. About a month ago, I read Recursive Functions of Symbolic Expressions and their Computation by Machine (Part I), the original paper about LISP. It's very clearly written, and explains the design of LISP so well (in only 34 pages) that someone could make a reasonable attempt at implementing LISP based solely on reading it. Inspired by the paper, I dug out my books and started learning LISP; I've now reached a point where the solutions to some exercises are interesting enough to post.


Problem 5-3: Now write a pair of procedures KEEP-FIRST-N-CLEVERLY and KEEP-FIRST-N-CLEVERLY-AUX, that together make a list of the first n elements in a list. Be sure that KEEP-FIRST-N-CLEVERLY-AUX is tail recursive.

My solution:

(defun keep-first-n-cleverly (n alist)
  (keep-first-n-cleverly-aux n alist nil)
)

(defun keep-first-n-cleverly-aux (n alist newlist)
  (if (zerop n)
    newlist
    (keep-first-n-cleverly-aux
      (- n 1)
      (rest alist)
      (append newlist (list (first alist)))
    )
  )
)

I like tail recursion: lots of problems are simpler to solve recursively, and knowing that a tail recursive call will be optimised to a goto satisfies the part of my mind that thinks "What if my function is run on a list with 1000 elements? Would I be better writing it iteratively, so that it doesn't run out of stack space?".


Problem 5-9: Define SQUASH, a procedure that takes an expression as its argument and returns a non-nested list of all atoms found in the expression. Here is an example:

* (squash '(a (a (a (a b))) (((a b) b) b) b))
(A A A A B A B B B B)

Essentially, this procedure explores the fringe of the tree represented by the list given as its argument, and returns a list of all the leaves.

My solution:

(defun squash (alist)
  (cond
    ((null alist) nil)
    ((atom alist) (list alist))
    (t (append
         (squash (first alist))
         (squash (rest alist))
       )
    )
  )
)


Problem 5-12: The version of FIBONACCI we have already exhibited is inefficient beyond comparison. Many computations are repeated. Write a version with optional parameters that does not have this flaw. Think of working forward from the first month rather than backward from the nth month.

My solution:

(defun fib (n &optional (count 2) (fibn-2 0) (fibn-1 1))
  (case n
    (0 0)
    (1 1)
    (otherwise
      (if
        (equal n count)
        (+ fibn-2 fibn-1)
        (fib n (+ count 1) fibn-1 (+ fibn-2 fibn-1))
      )
    )
  )
)

The point of this exercise was to use optional parameters; if I was writing fib() for real, I would use an auxiliary procedure, like this:

(defun fib (n)
  (case n
    (0 0)
    (1 1)
    (otherwise (fib-aux n 2 0 1))
  )
)
(defun fib-aux (n num-calculated fibn-2 fibn-1)
  (if (equal n num-calculated)
    (+ fibn-2 fibn-1)
    (fib-aux n (+ num-calculated 1) fibn-1 (+ fibn-2 fibn-1))
  )
)

My first inclination when writing a fibonacci function is to use Memoization; if I was writing it in Perl I would use the standard module Memoize, where fibonacci is presented as an example in the documentation. I don't know yet how hard it would be to do this in LISP, but I expect closures will make it easy enough.
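
It would go something like this (an untested sketch), with a hash table captured by a closure acting as the cache:

; Memoised fib: the hash table lives in the closure around the function.
(let ((cache (make-hash-table)))
  (defun memo-fib (n)
    (if (< n 2)
      n
      (or (gethash n cache)
        (setf (gethash n cache)
          (+ (memo-fib (- n 1)) (memo-fib (- n 2)))
        )
      )
    )
  )
)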