In this post I want to have a look at the code structure of Watson, a simple Python program for time tracking. It is the first of hopefully a long series of articles in which I analyse the architecture of open source projects. Watson is quite simple and is programmed in my primary language (Python), so it is a good starting point.

So let’s dive right into the code. For this article I looked at commit b820093, which is the release commit of version 2.1.0.

Repository Structure

In the first section of this article, we will look at the structure of the repository.

[Figure: Watson repository structure]

The outermost directory follows the common Python project setup: We have a code directory with the name of the package (watson), a tests folder for unit tests, a docs folder for documentation and a setup.py. Additionally, Watson contains a folder scripts with some helper scripts not required by the main program.

Watson uses two requirements files, a requirements.txt for all libraries required to run the program and a requirements-dev.txt for additional packages needed for testing and development. These files are directly mapped to the entries install_requires and tests_require in setup.py. Watson does not pin versions in requirements.txt. I think that for a public repository that’s a good choice. For in-house projects, in my opinion, it makes sense to pin the versions in requirements.txt to get reproducible releases.
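One common way to map a requirements file to install_requires is to parse it into a list of requirement specifiers. The following is a minimal sketch under that assumption, not a copy of Watson's setup.py; the helper name parse_requirements and the sample file contents are hypothetical.

```python
def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines and comments."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


# Hypothetical, unpinned file contents (illustrative package names)
sample_requirements_txt = "arrow\nclick\n\n# HTTP client for syncing\nrequests\n"

runtime = parse_requirements(sample_requirements_txt)
print(runtime)
```

In a real setup.py the text would be read from requirements.txt on disk and the resulting list passed to setup() as install_requires.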

The repository contains a Makefile that collects routines from several different programs in a single place. For example, make install runs python setup.py install, whereas make docs calls one of the helper scripts followed by mkdocs build. I personally like this approach because it serves as documentation for the typical build steps. And even though setup.py already combines several of them, most projects need some other programs, too.

# Excerpt from Watson's Makefile

PYTHON ?= python

.PHONY: install
install:
	$(PYTHON) setup.py install

# [...]

.PHONY: docs
docs: install-dev
	$(PYTHON) scripts/gen-cli-docs.py
	mkdocs build

We also find files for shell completion in the repository, namely watson.completion, watson.fish and watson.zsh-completion.

Testing

Watson uses Travis for CI. The pipeline tests the project against several different versions of Python and runs one additional job that checks the source code for problems using flake8.

The testing framework is pytest. Tests are structured the same as the program source; there’s one test file for each source program file except for frames.py.

Additional data needed by the unit tests is stored in the folder tests/resources.

Documentation

Let’s have a look at the documentation. We already saw the docs folder previously. This folder contains the user documentation as well as documentation for the processes around the development of Watson. It does not contain documentation for the code itself. These files are publicly hosted as GitHub Pages.

The documentation for all Watson commands inside commands.md is auto-generated using the script scripts/gen-cli-docs.py. It uses a clever approach: As we will later see, Watson uses the Python library click to build the command line interface. The script to generate the commands documentation iterates over all click commands using reflection and calls the method Command.format_help to write the documentation into a temporary buffer in Markdown format. From this buffer it can then be copied into the Markdown file.
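The underlying idea, generating user documentation from the help text that already lives next to each command, can be sketched without click. The example below is not Watson's script: it swaps click's Command.format_help for plain docstring inspection with the standard library, and the two command functions as well as render_markdown are hypothetical stand-ins.

```python
import inspect


def start(project):
    """Start monitoring time for the given project."""


def stop():
    """Stop monitoring time for the current project."""


def render_markdown(commands):
    """Render one Markdown section per command from its docstring."""
    sections = []
    for name, func in sorted(commands.items()):
        # getdoc() returns the cleaned-up docstring, or None if there is none
        doc = inspect.getdoc(func) or "(undocumented)"
        sections.append("## `{}`\n\n{}\n".format(name, doc))
    return "\n".join(sections)


markdown = render_markdown({"start": start, "stop": stop})
print(markdown)
```

Because the documentation is derived from the code, a command without a docstring shows up as undocumented in the generated file instead of silently going missing.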

Code is documented with docstrings. The quality of the documentation varies greatly between files: config.py contains quite good documentation, whereas frames.py does not contain any docstrings at all and only three lines of comments. There is the argument that good code does not require documentation. While this contains some truth (worse code needs more explanation), as a newcomer to the project I see three classes Frame, Span and Frames in this file and wonder what exactly they represent and how they interact.

Code Structure

That said, let’s look at the code structure. Watson consists almost entirely of two files: cli.py with 1727 lines and watson.py with 583 lines.

cloc outputs the following statistics:

---------------------------------------------------------------------------------
File                               blank        comment           code
---------------------------------------------------------------------------------
./cli.py                             242            496            989
./watson.py                          113             54            416
./utils.py                            71            103            228
./fullmoon.py                          7              6            220
./frames.py                           43              3            141
./autocompletion.py                   29             35             54
./config.py                           32             46             29
./__init__.py                          1              0              3
./__main__.py                          1              0              2
./version.py                           0              0              1
---------------------------------------------------------------------------------
SUM:                                 539            743           2083
---------------------------------------------------------------------------------

Let’s first look at watson.py. This file contains a single class Watson. Even though many methods inside Watson are documented, the class Watson itself is not. To my understanding Watson is a god-object. It seems to be responsible for file management, state management as well as communication with a remote sync server.

One noteworthy finding for me in watson.py is the import of requests. The standard approach in Python is to collect all imports at the top of the file, which means they are all imported as soon as the file is loaded. Python, however, also allows imports in other places. Watson makes use of this and imports the library requests only inside the methods that actually need it “to reduce watson response time”.
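A function-level import looks like the following minimal sketch. The class Tracker and its methods are hypothetical, and the standard-library module json stands in for requests so that the example has no third-party dependency.

```python
class Tracker:
    """Hypothetical stand-in for a class with one rarely used sync method."""

    def status(self):
        # Frequently used path: needs no additional imports
        return "No project started."

    def encode_for_sync(self, frames):
        # Deferred import: the module is only loaded the first time this
        # method runs, not when the file containing Tracker is imported
        import json
        return json.dumps(frames)


tracker = Tracker()
print(tracker.status())
print(tracker.encode_for_sync([{"project": "apollo11"}]))
```

The trade-off is that the dependency is no longer visible at the top of the file, so a comment explaining the reason for the local import is worthwhile.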

Command Line Interface

cli.py is the biggest file in the project. It uses the library click to create the command line interface. The file introduces two custom click extensions for handling the arguments provided to commands: MutuallyExclusiveOption and DateTimeParamType. A custom parameter type such as DateTimeParamType is a very handy way to standardize how special kinds of arguments (e.g. dates) are provided across all CLI commands. If support for relative dates like 10 minutes ago were to be added, it could be implemented in this parameter type and would immediately be available in all commands.
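To illustrate the relative-date idea, here is a minimal sketch of the parsing logic such a parameter type could delegate to. It is not part of Watson: the function parse_relative, the accepted grammar and the supported units are assumptions. In click, a function like this would typically be called from the convert method of a ParamType subclass.

```python
import re
from datetime import datetime, timedelta

# Accepts strings such as "10 minutes ago" or "1 hour ago"
_RELATIVE = re.compile(r"^(\d+) (second|minute|hour|day)s? ago$")


def parse_relative(value, now):
    """Convert a relative expression into an absolute datetime."""
    match = _RELATIVE.match(value.strip())
    if match is None:
        raise ValueError("Not a relative date: {!r}".format(value))
    amount = int(match.group(1))
    unit = match.group(2) + "s"  # timedelta expects plural keyword names
    return now - timedelta(**{unit: amount})


reference = datetime(2020, 1, 1, 12, 0)
print(parse_relative("10 minutes ago", reference))
```

Passing the reference time in as an argument instead of calling datetime.now() inside the function keeps the parser easy to test.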

# Example of a command line interface command in Watson

@cli.command(context_settings={'ignore_unknown_options': True})
@click.option('--at', 'at_', type=DateTime, default=None,
              help=('Stop frame at this time. Must be in '
                    '(YYYY-MM-DDT)?HH:MM(:SS)? format.'))
@click.pass_obj
@catch_watson_error
def stop(watson, at_):
    """
    Stop monitoring time for the current project.
    If `--at` option is given, the provided stopping time is used. The
    specified time must be after the beginning of the to-be-ended frame and must
    not be in the future.
    Example:
    \b
    $ watson stop --at 13:37
    Stopping project apollo11, started an hour ago and stopped 30 minutes ago. (id: e9ccd52) # noqa: E501
    """
    frame = watson.stop(stop_at=at_)
    output_str = "Stopping project {}{}, started {} and stopped {}. (id: {})"
    click.echo(output_str.format(
        style('project', frame.project),
        (" " if frame.tags else "") + style('tags', frame.tags),
        style('time', frame.start.humanize()),
        style('time', frame.stop.humanize()),
        style('short_id', frame.id),
    ))
    watson.save()

I don’t think there is much more to say about this file. It follows the standard click approach: each function represents one command. Click allows us to perform actions for all commands within a group (or even for all commands in the whole application) and to pass information to the command that is actually called in the form of a context, using click.pass_context and click.pass_obj.

Watson uses this feature for two things:

  • each command theoretically can support the flag --color or --no-color
  • for each command the Watson object is created

Configuration

Watson implements a configuration reader class in config.py. Configuration data is structured with two levels: a section and the parameter name. Parameters can be accessed through methods containing the expected return type in their name, e.g. getboolean(section: str, name: str) -> bool.

The configuration reader can be accessed through the Watson class via Watson.config.

# Example for reading a configuration value in cli.py

def add(watson, args, from_, to, confirm_new_project, confirm_new_tag):
    # [...]

    # Confirm creation of new project if that option is set
    if (watson.config.getboolean('options', 'confirm_new_project') or
            confirm_new_project):
        confirm_project(project, watson.projects)
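The two-level lookup with typed getters can be tried out with Python's standard configparser module, which offers methods of the same shape. This is a minimal sketch that assumes an INI-style file; it uses the stdlib class directly as a stand-in for Watson's own reader, and the sample values are illustrative.

```python
import configparser

# Illustrative configuration with one section and two parameters
SAMPLE = """
[options]
confirm_new_project = true
stop_on_start = false
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Typed getter: the string "true" is converted to the boolean True
confirm = config.getboolean("options", "confirm_new_project")

# A fallback avoids an exception when the parameter is not configured
confirm_tag = config.getboolean("options", "confirm_new_tag", fallback=False)

print(confirm, confirm_tag)
```

Encoding the return type in the method name moves the string conversion into one place, so callers never have to compare against "true" or "1" themselves.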

Error Handling

Watson uses a single exception class to raise all its errors towards the command line interface entry point: WatsonError. Most commands in cli.py are decorated with a function that catches WatsonError and converts it to a styled ClickException. This is click’s exception type to show errors to a user.

# catch_watson_error is used to decorate most Watson commands.
# It catches WatsonError and converts it to ClickException,
# which is displayed to the user.

def catch_watson_error(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except _watson.WatsonError as e:
            raise click.ClickException(style('error', str(e)))
    return wrapper

This means that when some code within Watson raises a WatsonError, it must expect that the error message is shown to the user. The message should therefore be understandable to the user and not loaded with internal information.
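The consequence for the code that raises the error can be shown in a small sketch without click. All names here (TrackerError, stop, run) are hypothetical and only mirror the pattern; the point is that the message is written for the user because the entry point passes it through unchanged.

```python
class TrackerError(Exception):
    """Hypothetical counterpart of WatsonError: its message is user-facing."""


def stop(current_project):
    if current_project is None:
        # Phrased in the user's terms, without internal details
        raise TrackerError("No project started.")
    return "Stopping project {}.".format(current_project)


def run(command, *args):
    """Entry point: turn domain errors into a plain message for the user."""
    try:
        return command(*args)
    except TrackerError as e:
        return "Error: {}".format(e)


print(run(stop, None))
print(run(stop, "apollo11"))
```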

Learnings

Before I looked at Watson’s code I did not know that it is possible to implement custom data types for click arguments. This is something I want to adopt to keep my own command line interfaces consistent.

I also liked seeing that other people use a Makefile to collect build steps in Python projects. I use this in some projects, but have always been unsure how common it is in the Python world.

The idea to generate user documentation from the docstrings in cli.py is really nice. That way your functions are all documented and you automatically get rich user documentation. I recently tested pdoc and its fork pdoc3 to generate documentation from docstrings.

Possible Improvements

My main goal with these articles is to improve my own understanding of code architectures, so my advice should be taken with a grain of salt. Nonetheless, to make myself think about the approaches and trade-offs, I want to collect some possible improvements.

First, I would add type annotations to the project. I was sceptical about them at first, but came to love them in bigger Python code bases. It helps so much to immediately see which data type a function expects or returns. It also gives my IDE the possibility to highlight wrong argument types.

I saw that Watson sometimes uses dictionaries to represent complex types, e.g. the currently running project. In my opinion it would be nicer to represent such a type as a dataclass object. This way refactoring it is much easier, because we can easily find out where the data object is used. Again, it allows us to leverage IDE support because IDEs can show us whether all properties of a dataclass have been set in its constructor.

# Currently, Watson uses this dictionary to represent the active frame
current = {
    'project': self.current['project'],
    'start': self._format_date(self.current['start']),
    'tags': self.current['tags'],
}

# My suggestion would be to replace it with a dataclass
import dataclasses
from typing import List

from arrow import Arrow


@dataclasses.dataclass
class ActiveFrame:
    project: str
    start: Arrow  # Watson uses arrow to represent time
    tags: List[str]

current = ActiveFrame(
    project=self.current['project'],
    start=self.current['start'],
    tags=self.current['tags'],
)
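To make the benefit concrete, here is a self-contained variant of the suggestion that can be run as is. It is a sketch, not Watson code: datetime stands in for arrow's Arrow type to avoid the dependency, and the sample values are made up.

```python
import dataclasses
from datetime import datetime
from typing import List


@dataclasses.dataclass
class ActiveFrame:
    project: str
    start: datetime  # stand-in for arrow.Arrow to stay dependency-free
    tags: List[str]


frame = ActiveFrame(
    project="apollo11",
    start=datetime(2020, 1, 1, 13, 37),
    tags=["reactor"],
)

# Forgetting a field fails immediately with a TypeError, whereas an
# incomplete dictionary would only fail later when the key is accessed
try:
    ActiveFrame(project="apollo11", start=datetime(2020, 1, 1, 13, 37))
    missing_field_detected = False
except TypeError:
    missing_field_detected = True

print(frame.project, missing_field_detected)
```

Note that the annotations themselves are not enforced at runtime; checking that start really is a date object is left to a type checker or the IDE.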

Untangling the Watson class might also be a good idea. I think at least the communication with a remote sync server should not be a concern of the class Watson. If Watson is supposed to be a class for internal state management it should, in my opinion, not perform any HTTP communication.

Conclusion

This concludes my dive into the code of Watson, a time tracking application written in Python. It is quite a simple program and easy to understand.

I was able to gain some new ideas for my own Python projects like implementing custom argument types for click.

A few improvements could make the project even simpler. For example, to understand the data types of variables I had to skim through a lot of the code. Type annotations could help with this. If the code base grows further in the future, I see the risk that the class Watson might accumulate too many responsibilities. In my opinion, communication with a remote sync server is already something that clearly could be handled by another class.

I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send me an e-mail to blog@stefan-koch.name.