A few days ago, the Süddeutsche Zeitung published the data behind its Zugmonitor project. For this project, the newspaper stores the delay information for Deutsche Bahn trains, which customers can look up on the website (and which is supposed to be sent out by e-mail soon as well). The Süddeutsche Zeitung already analyzes and visualizes this data itself, but thanks to open data, other developers can examine it too.

As of today, Monday, the public API is documented and therefore ready for use. To make it easier to access, I wrote a small Python class for sending requests to the Zugmonitor API. The responses are cached in MongoDB (for one hour by default) to avoid long latencies on web access and to go easy on the operators at OpenDataCity.

import simplejson
import urllib2
import time, datetime
import pymongo

class Zugmonitor(object):
   ##
   # The URL to the zugmonitor API, without trailing slash
   api_url = "http://zugmonitor.sz.de/api"
   
   ##
   # The mongodb database name to be used
   db_name = "zugmonitor"
   
   
   ##
   # Default expire time in seconds that will be added to the current
   # time when something is cached
   expire = 3600

   
   ##
   # Initialize a new instance for connecting to the zugmonitor
   def __init__(self):
      # MongoClient replaces pymongo.Connection, which was removed in PyMongo 3
      connection = pymongo.MongoClient('localhost')
      self.cache = connection[self.db_name]['cache']
   
   ##
   # Send a query to the zugmonitor API server
   # @param command The command that shall be executed, as a string
   # @param params Additional parameters to attach with the query,
   # as a list
   # @return The returned JSON dictionary if the resource could be
   # requested or None if an error occurred
   def query(self, command, params=None):
      # Avoid a mutable default argument; treat None as "no parameters"
      url_parts = [self.api_url, command] + list(params or [])
      url = "/".join(url_parts)
      
      data = self.from_cache(url)
      if data is None:
         req = urllib2.Request(url)
         opener = urllib2.build_opener()
         try:
            f = opener.open(req)
         except urllib2.URLError:
            # HTTPError is a subclass of URLError, so this covers both
            return None
         
         data = simplejson.load(f)
         self.to_cache(url, data)
      
      return data
   
   
   ##
   # Query for all stations of Deutsche Bahn that are contained
   # in the Zugmonitor.
   # @return A dictionary with a unique station_id as a key and
   # a subdictionary as value, where the subdictionary contains:
   # lat, lon, name and id (the same as the key)
   def stations(self):
      return self.query("stations")
   
   ##
   # Get all trains of a specific day.
   # @param year The year of the date to be retrieved
   # @param month The month of the date to be retrieved
   # @param day The day of the date to be retrieved
   # @return A dictionary with optionally the following keys:
   # train_nr, date, status (E=Error, F=Finished, S=Sleeping,
   # X=Running), started, nextrun, finished (for scraping),
   # stations (subdict: station_id, arrival, departure, delay,
   # delay_cause, scraped)
   # For further information refer to the API documentation:
   # http://www.opendatacity.de/zugmonitor-api/
   def trains(self, year, month, day):
      date = datetime.date(year, month, day)
      date_string = date.strftime('%Y-%m-%d')
      return self.query("trains", [date_string])
   
   ##
   # Get this element from cache if it exists there
   # @param identity The identity of the element, as string
   # @return The element if something was found (it contains an
   # additional _id key from mongodb), None if nothing in cache
   def from_cache(self, identity):
      result = self.cache.find_one({'_id': identity})
      
      if result is None:
         return None
      elif result['expire'] <= time.time():
         return None
      else:
         return result['content']
   
   ##
   # Store a value in the cache, overrides an existing entry if there
   # already is one
   # @param identity The identity under which the element shall be
   # stored
   # @param data The data that shall be saved
   # @param expire A unix timestamp for when this value shall expire,
   # if None is given the default expire will be added
   # @return void
   def to_cache(self, identity, data, expire = None):
      if expire is None:
         expire = time.time() + self.expire
      
      element = {
         '_id': identity,
         'expire': expire,
         'content': data
      }
      self.cache.save(element)
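
The expiry logic of from_cache and to_cache can be tried in isolation, without a running MongoDB. The following is only a minimal sketch and not part of the class: DictCache and its clock parameter are hypothetical names, a plain dictionary stands in for the MongoDB collection, and the snippet is written so that it also runs on Python 3.

```python
import time


class DictCache(object):
    """In-memory stand-in for the MongoDB 'cache' collection (hypothetical helper)."""

    def __init__(self, default_expire=3600, clock=time.time):
        self.default_expire = default_expire  # seconds, like Zugmonitor.expire
        self.clock = clock                    # injectable so expiry can be simulated
        self.entries = {}

    def to_cache(self, identity, data, expire=None):
        # Store an absolute expiry timestamp next to the payload
        if expire is None:
            expire = self.clock() + self.default_expire
        self.entries[identity] = {'expire': expire, 'content': data}

    def from_cache(self, identity):
        # Return the payload only if it exists and has not expired yet
        entry = self.entries.get(identity)
        if entry is None or entry['expire'] <= self.clock():
            return None
        return entry['content']


# Simulated clock, so the example does not have to sleep for an hour
now = [1000.0]
cache = DictCache(default_expire=3600, clock=lambda: now[0])
url = "http://zugmonitor.sz.de/api/stations"

cache.to_cache(url, {"1": {"name": "Example"}})
fresh = cache.from_cache(url)   # entry is still valid
now[0] += 3600                  # exactly one hour later
stale = cache.from_cache(url)   # treated as expired, because expire <= now
```

As in the class, an entry counts as expired as soon as its timestamp is less than or equal to the current time, so the second lookup falls through and would trigger a fresh request to the API.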

A usage example of the Zugmonitor class (saved as zugmonitor.py) looks like this:

import zugmonitor

zm = zugmonitor.Zugmonitor()
# prints all stations, pretty long
print zm.stations()
# prints all trains of one day and their stations, even longer
print zm.trains(2011, 10, 10)
# prints the same, but this time from mongodb (much faster)
print zm.trains(2011, 10, 10)

# prints empty dict {} because zugmonitor does not go so far
print zm.trains(2010, 10, 10)
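
Since stations() returns a dictionary keyed by station ID, a small helper is enough to search it by name. This is a rough sketch under assumptions: find_stations is a hypothetical helper, the sample data is made up in the shape described in the comment of stations() (id, name, lat, lon), and the real API may use different ID types or values. It is written so that it also runs on Python 3.

```python
def find_stations(stations, name_part):
    """Return station entries whose name contains name_part (case-insensitive)."""
    needle = name_part.lower()
    matches = [s for s in stations.values() if needle in s['name'].lower()]
    # Sort by name so the result does not depend on dictionary order
    return sorted(matches, key=lambda s: s['name'])


# Made-up sample data; IDs and coordinates are illustrative only
sample_stations = {
    "1": {"id": "1", "name": "Berlin Hbf", "lat": 52.5, "lon": 13.4},
    "2": {"id": "2", "name": "Hamburg Hbf", "lat": 53.6, "lon": 10.0},
    "3": {"id": "3", "name": "Berlin Ostbahnhof", "lat": 52.5, "lon": 13.4},
}

berlin = find_stations(sample_stations, "berlin")
berlin_names = [s['name'] for s in berlin]
```

With the real class, the call would be find_stations(zm.stations(), "Berlin"). Keep in mind that stations() returns None if the request fails, so that case has to be checked first.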