Code Structure of Watson Time Tracking (Python)
In this post I want to have a look at the code structure of Watson, a simple Python program for time tracking. It is the first of hopefully a long series of articles in which I analyse the architecture of open source projects. Watson is quite simple and is programmed in my primary language (Python), so it is a good starting point.
So let’s dive right into the code.
For this article I had a look at commit b820093, which is the release commit
of version 2.1.0.
Repository Structure
In the first section of this article, we will look at the structure of the repository.
The outermost directory follows the common Python project setup: we have a
code directory with the name of the package (watson), a tests folder for
unit tests, a docs folder for documentation and a setup.py. Additionally,
Watson contains a folder scripts with some helper scripts not required by
the main program.
Watson uses two requirements files: a requirements.txt for all libraries
required to run the program and a requirements-dev.txt for additional
packages needed for testing and development. These files are directly mapped
to the entries install_requires and tests_require in setup.py. Watson
does not pin versions in requirements.txt. I think that for a public
repository that’s a good choice. For in-house projects, in my opinion, it
makes sense to pin the versions in requirements.txt to get reproducible
releases.
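One way such a mapping from requirements files to setup.py entries could look is sketched below. I have not verified how Watson's setup.py actually does it; the helper name parse_requirements and the sample file contents are assumptions made for illustration.

```python
def parse_requirements(text):
    """Return the requirement specifiers found in a requirements file's text."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        requirements.append(line)
    return requirements


# Hypothetical file contents, not copied from Watson's repository.
RUNTIME = "# runtime dependencies\nclick\nrequests\n"
DEVELOPMENT = "pytest\nflake8\n"

# These keyword arguments would be passed on to setuptools.setup().
setup_kwargs = {
    "install_requires": parse_requirements(RUNTIME),
    "tests_require": parse_requirements(DEVELOPMENT),
}
print(setup_kwargs)
```

Because the files stay the single source of truth, pinning a version for an in-house release would only require editing requirements.txt.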
The repository contains a Makefile that collects routines calling various
other programs in a single place. For example, make install
will run python setup.py install, whereas make docs will call one of the
helper scripts followed by mkdocs build. I personally like this approach,
because it serves as documentation of the typical build steps. And even though
setup.py already combines several of them, most projects need some other
programs, too.
# Excerpt from Watson's Makefile
PYTHON ?= python

.PHONY: install
install:
	$(PYTHON) setup.py install

# [...]

.PHONY: docs
docs: install-dev
	$(PYTHON) scripts/gen-cli-docs.py
	mkdocs build
We also find files for shell completion in the repository, namely
watson.completion, watson.fish and watson.zsh-completion.
Testing
Watson uses Travis for CI. The pipeline tests the project against several
versions of Python and additionally runs flake8 to check for source code
problems.
The testing framework is pytest. Tests are structured the same way as the
program source; there’s one test file for each source file except for
frames.py.
Additional data needed by the unit tests is stored in the folder
tests/resources.
Documentation
Let’s have a look at the documentation. We already saw the docs folder
previously. This folder contains the user documentation as well as
documentation for the processes around the development of Watson.
It does not contain documentation for the code itself.
These files are publicly hosted as GitHub Pages.
The documentation for all Watson commands inside commands.md is
auto-generated using the script scripts/gen-cli-docs.py.
It uses a clever approach:
As we will see later, Watson uses the Python library click to build the
command line interface. The script to generate the commands documentation
iterates over all click commands using reflection and calls the method
Command.format_help to write the documentation into a temporary buffer in
Markdown format. From this buffer it can then be copied into the Markdown file.
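The general idea can be sketched with the standard library alone. Note that this is not Watson's script: instead of click's Command.format_help it uses inspect.getdoc, and the registry COMMANDS as well as the function render_markdown are hypothetical names.

```python
import inspect
import io


def start():
    """Start monitoring time for a project."""


def stop():
    """Stop monitoring time for the current project."""


# Hypothetical command registry; click keeps a comparable mapping per group.
COMMANDS = {"stop": stop, "start": start}


def render_markdown(commands):
    """Render one Markdown section per command from its docstring."""
    buffer = io.StringIO()
    buffer.write("# Commands\n")
    for name in sorted(commands):
        doc = inspect.getdoc(commands[name]) or "(undocumented)"
        buffer.write("\n## `{}`\n\n{}\n".format(name, doc))
    return buffer.getvalue()


print(render_markdown(COMMANDS))
```

Writing into a buffer first, as the article describes, keeps the rendering step separate from the decision where the text ends up.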
Code is documented with docstrings. The quality of documentation varies greatly
between files. config.py contains quite good documentation, whereas frames.py
does not contain docstrings at all and only three lines of comments.
There is the argument that good code does not require
documentation. And while this contains some truth (worse code needs more
explanation), as a newcomer to the project I see three classes Frame, Span
and Frames in this file and I am wondering what exactly
they represent and how they interact.
Code Structure
That said, let’s look at the code structure. Watson consists almost entirely
of two files: cli.py with 1727 lines and watson.py with 583 lines.
cloc outputs the following statistics:
----------------------------------------------------
File                       blank   comment      code
----------------------------------------------------
./cli.py                     242       496       989
./watson.py                  113        54       416
./utils.py                    71       103       228
./fullmoon.py                  7         6       220
./frames.py                   43         3       141
./autocompletion.py           29        35        54
./config.py                   32        46        29
./__init__.py                  1         0         3
./__main__.py                  1         0         2
./version.py                   0         0         1
----------------------------------------------------
SUM:                         539       743      2083
----------------------------------------------------
Let’s first look at watson.py. This file contains a single class Watson.
Even though many methods inside Watson are documented,
the class Watson itself is not.
To my understanding Watson is a god-object. It seems to be responsible for
file management, state management as well as communication with a remote
sync server.
One noteworthy finding for me in watson.py is the import of requests.
It is the standard approach in Python to collect all imports at the top of the
file, which means they will all be imported as soon as the file is loaded.
Python, however, also allows us to execute imports in other places.
Watson makes use of this and imports the library requests
only in methods that actually need it
“to reduce watson response time”.
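A minimal sketch of such a deferred import follows. It uses urllib.request from the standard library as a stand-in for requests; the function name, the URL and the header format are assumptions and not taken from Watson.

```python
def build_sync_request(url, token):
    """Build (but do not send) a request for a hypothetical sync server."""
    # Deferred import: the module is loaded on the first call instead of at
    # program start, so commands that never sync do not pay for it.
    import urllib.request

    return urllib.request.Request(
        url, headers={"Authorization": "Token " + token}
    )


request = build_sync_request("https://example.invalid/frames/", "secret")
print(request.full_url)
```

After the first call Python caches the module, so repeated calls only pay for a cheap lookup.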
Command Line Interface
cli.py is the biggest file in the project. It uses the library click
to create the command line interface. The file introduces two custom
types to parse the arguments provided to commands: MutuallyExclusiveOption
and DateTimeParamType. This is a very handy feature to standardize how
special types of arguments (e.g. dates) can be provided across all
commands. If support for relative dates like 10 minutes ago were to be
added, it could be implemented in this data type parser and would be
available in all commands immediately.
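To illustrate why one central parser is attractive, here is a sketch of the parsing logic such a type could wrap, including the relative form. It is plain Python and not Watson's DateTimeParamType: the function parse_when is hypothetical, and in click this logic would live in the convert method of a click.ParamType subclass.

```python
import re
from datetime import datetime, timedelta

_RELATIVE = re.compile(r"^(\d+) (minute|hour)s? ago$")
_DATETIME_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M")
_TIME_FORMATS = ("%H:%M:%S", "%H:%M")


def parse_when(value, now):
    """Parse '(YYYY-MM-DDT)?HH:MM(:SS)?' or 'N minutes ago' into a datetime."""
    match = _RELATIVE.match(value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{unit + "s": amount})
    for fmt in _DATETIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    for fmt in _TIME_FORMATS:
        try:
            time_of_day = datetime.strptime(value, fmt).time()
        except ValueError:
            continue
        # A bare time refers to the current day.
        return datetime.combine(now.date(), time_of_day)
    raise ValueError("Cannot parse {!r} as a point in time.".format(value))


now = datetime(2021, 3, 4, 12, 0, 0)
print(parse_when("13:37", now))
print(parse_when("10 minutes ago", now))
```

Every option declared with this type would accept the new syntax without any change to the individual commands.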
# Example of a command line interface command in Watson
@cli.command(context_settings={'ignore_unknown_options': True})
@click.option('--at', 'at_', type=DateTime, default=None,
              help=('Stop frame at this time. Must be in '
                    '(YYYY-MM-DDT)?HH:MM(:SS)? format.'))
@click.pass_obj
@catch_watson_error
def stop(watson, at_):
    """
    Stop monitoring time for the current project.

    If `--at` option is given, the provided stopping time is used. The
    specified time must be after the beginning of the to-be-ended frame and must
    not be in the future.

    Example:

    \b
    $ watson stop --at 13:37
    Stopping project apollo11, started an hour ago and stopped 30 minutes ago. (id: e9ccd52) # noqa: E501
    """
    frame = watson.stop(stop_at=at_)
    output_str = "Stopping project {}{}, started {} and stopped {}. (id: {})"
    click.echo(output_str.format(
        style('project', frame.project),
        (" " if frame.tags else "") + style('tags', frame.tags),
        style('time', frame.start.humanize()),
        style('time', frame.stop.humanize()),
        style('short_id', frame.id),
    ))
    watson.save()
I don’t think there is much more to say about this file. It’s a standard
click approach: each function represents one command. Click allows us to
perform actions for all commands within a group (or even for all commands
in the whole application) and pass information to the actually called command
in the form of a context using click.pass_context and click.pass_obj.
Watson uses this feature for two things:
- each command can, in theory, support the flag --color or --no-color
- for each command the Watson object is created
Configuration
Watson implements a configuration reader class in config.py.
Configuration data is structured with two levels: a section and the
parameter name. Parameters can be accessed through methods containing the
expected return type in their name,
e.g. getboolean(section: str, name: str) -> bool.
The configuration reader can be accessed through the Watson class
via Watson.config.
# Example for reading a configuration value in cli.py
def add(watson, args, from_, to, confirm_new_project, confirm_new_tag):
    # [...]
    # Confirm creation of new project if that option is set
    if (watson.config.getboolean('options', 'confirm_new_project') or
            confirm_new_project):
        confirm_project(project, watson.projects)
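The two-level lookup with typed getters resembles what the standard library's configparser offers. The following sketch uses configparser directly; that Watson's reader behaves the same way is an assumption on my side, and the sample values are made up.

```python
import configparser

# Hypothetical configuration content; the section name and first option
# follow the example above, the values are invented.
SAMPLE = """
[options]
confirm_new_project = true
stop_on_start = no
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

confirm = config.getboolean("options", "confirm_new_project")
stop_on_start = config.getboolean("options", "stop_on_start")
# A fallback avoids an exception when the option is missing.
missing = config.getboolean("options", "not_set", fallback=False)
print(confirm, stop_on_start, missing)
```

Putting the type into the method name means the caller states its expectation at the call site, which partly compensates for the missing type annotations.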
Error Handling
Watson uses a single exception class to raise all its errors towards the
command line interface entry point: WatsonError. Most commands in cli.py
are decorated with a function that catches WatsonError and converts it to
a styled ClickException. This is click’s exception type to show errors
to a user.
# catch_watson_error is used to decorate most Watson commands.
# It catches WatsonError and converts it to ClickException,
# which is displayed to the user.
# (_watson and style come from Watson's own modules.)
from functools import wraps

import click


def catch_watson_error(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except _watson.WatsonError as e:
            raise click.ClickException(style('error', str(e)))
    return wrapper
This means that whenever some code within Watson raises a WatsonError, it
must expect that the error message is shown to the user. It should thus
contain an error message that can be understood by the user and is not
loaded with internal information.
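As a sketch of what this implies for code below the command line layer: low-level exceptions can be translated into a user-readable message where they occur, while the original exception is kept as the cause for debugging. The class here is only a stand-in for Watson's WatsonError, and load_frames is a hypothetical function.

```python
import json
import os
import tempfile


class WatsonError(RuntimeError):
    """Stand-in for Watson's exception of the same name."""


def load_frames(path):
    """Load a JSON frames file, reporting failures in user-facing terms."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except OSError as exc:
        raise WatsonError(
            "Could not open frames file {}.".format(path)) from exc
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        raise WatsonError(
            "Frames file {} is not valid JSON.".format(path)) from exc


with tempfile.TemporaryDirectory() as folder:
    broken = os.path.join(folder, "frames")
    with open(broken, "w") as handle:
        handle.write("{not json")
    try:
        load_frames(broken)
    except WatsonError as error:
        message = str(error)
        cause = error.__cause__

print(message)
```

The message names the file and the problem, but no parser internals; those remain available through the chained exception.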
Learnings
Before I had looked at Watson’s code I did not know that it is possible to implement custom data types for click arguments. This is something I must adopt to keep my own command line interfaces consistent.
I also liked seeing that other people use a Makefile to collect build steps
in Python projects. I use this in some projects, but have always been unsure
how common it is in the Python world.
The idea to generate user documentation from the docstrings in cli.py
is really nice. That way your functions are all documented and you
automatically get rich user documentation. I recently tested pdoc and its
fork pdoc3 to generate documentation from docstrings.
Possible Improvements
My main goal with these articles is to improve my own understanding of code architectures. So my advice should be taken with a grain of salt. Nonetheless, to make myself think about the approaches and trade-offs I want to collect some possible improvements.
First, I would add type annotations to the project. I myself have been sceptical about them at first, but came to love them in bigger Python code bases. It helps so much to immediately see which data type a function expects or returns. It also gives my IDE the possibility to highlight wrong argument types.
I saw that Watson sometimes uses dictionaries to represent complex types, e.g.
the currently running project.
In my opinion it would be nicer to represent such a type as a dataclass
object. This way refactoring it is much easier, because we can easily find out
where the data object is used. Again, it allows us to leverage IDE support,
because IDEs can show us whether all properties of a dataclass have been
set in its constructor.
# Currently, Watson uses this dictionary to represent the active frame
current = {
    'project': self.current['project'],
    'start': self._format_date(self.current['start']),
    'tags': self.current['tags'],
}

# My suggestion would be to replace it with a dataclass
import dataclasses
from typing import List

from arrow import Arrow  # Watson uses arrow to represent time


@dataclasses.dataclass
class ActiveFrame:
    project: str
    start: Arrow
    tags: List[str]


current = ActiveFrame(
    project=self.current['project'],
    start=self.current['start'],
    tags=self.current['tags'],
)
Untangling the Watson class might also be a good idea. I think at least the
communication with a remote sync server should not be a concern of
the class Watson. If Watson is supposed to be a class for internal
state management it should, in my opinion, not perform any HTTP communication.
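One possible shape of such a split is sketched below. All names (SyncBackend, FrameStore, InMemoryBackend, sync) are hypothetical and do not exist in Watson; the point is only that the state class knows nothing about HTTP and the transport can be swapped out.

```python
from typing import Dict, List, Protocol


class SyncBackend(Protocol):
    """Anything that can upload frames; hypothetical interface."""

    def push(self, frames: List[Dict[str, str]]) -> int:
        ...


class FrameStore:
    """Holds tracking state only; knows nothing about HTTP."""

    def __init__(self) -> None:
        self.frames: List[Dict[str, str]] = []

    def add(self, project: str) -> None:
        self.frames.append({"project": project})


class InMemoryBackend:
    """Test double standing in for a real HTTP client."""

    def __init__(self) -> None:
        self.received: List[Dict[str, str]] = []

    def push(self, frames: List[Dict[str, str]]) -> int:
        self.received.extend(frames)
        return len(frames)


def sync(store: FrameStore, backend: SyncBackend) -> int:
    """Upload all frames of the store through the given backend."""
    return backend.push(list(store.frames))


store = FrameStore()
store.add("apollo11")
backend = InMemoryBackend()
print(sync(store, backend))
```

A real backend would wrap the HTTP calls, and unit tests for the state handling could run without any network access.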
Conclusion
This concludes my dive into the code of Watson, a time tracking application written in Python. It is quite a simple program and easy to understand.
I was able to gain some new ideas for my own Python projects like implementing custom argument types for click.
A few improvements could make the project even simpler. For example,
to understand the data types of variables I had to skim through a lot of
code. Type annotations could help with this. If the code base grows further
in the future, I see the risk that the class Watson might accumulate too many
responsibilities. In my opinion, communication with a remote sync server
already is something that clearly could be handled by another class.