Messaging Systems: An Overview over RabbitMQ, Kafka, ZeroMQ and Mosquitto - Part 1

This article is part of a series; the series overview is at the end of the article.

When dealing with messaging systems, there are many options available, from classical message brokers to simple libraries that handle the messaging logic without a central server. They differ in important ways, and each of them has a reason to exist. In this article I will compare a few popular and very different ones: the message broker RabbitMQ, the distributed streaming platform Kafka, the socket and concurrency library ZeroMQ and the lightweight MQTT broker Mosquitto. You will see that each of them has its own advantages, and you should choose one according to your needs.

Messaging Patterns

When talking about messaging systems we first have to understand the typical messaging patterns available. In some cases, you want to send messages from one program (producer) to another program (consumer) and in other cases you might have multiple producers or multiple consumers or even multiple steps. Some of these patterns are so common that they have their own names.

Publish-Subscribe (also known as Pub-Sub) is a messaging pattern where you want to send messages from a set of publishers or producers to different subscribers or consumers. However, messages are not sent to the subscribers directly, but instead each message is published to a so-called topic and subscribers can subscribe to topics and retrieve only those messages.

Imagine a toy train system in which each component can send messages about its state. A train might be interested in messages from the traffic signals, crossings might be interested in the positions of trains and a dashboard for the owner of the toys might be interested in everything. This gives us a quite complex system with several publishers and subscribers (sometimes even both at the same time) which are interested in different messages.

The trains could send their position updates to a topic called traffic/train/position, while traffic signals could publish to traffic/signal/state. This would allow our crossing to listen to the topic traffic/train/position for approaching trains, while the dashboard would just subscribe to all topics with a wildcard.
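To make the wildcard subscription concrete, here is a minimal sketch of how a broker could decide whether a subscription matches a topic. It assumes MQTT-style wildcards, where + stands for exactly one topic level and # for all remaining levels. The function name topic_matches and the filter traffic/+/state are made up for this illustration and are not taken from any particular broker.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a concrete topic.

    Simplified sketch: assumes a well-formed filter in which '#' only
    appears as the last level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches everything from here on.
            return True
        if i >= len(topic_levels):
            # The filter is longer than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must match exactly; '+' matches any single level.
            return False
    # Without '#', filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


# The crossing listens to train positions only, the dashboard to everything.
print(topic_matches("traffic/train/position", "traffic/train/position"))
print(topic_matches("traffic/train/position", "traffic/signal/state"))
print(topic_matches("#", "traffic/signal/state"))
```

The crossing's filter matches only the position topic, while the dashboard's # filter matches any topic.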

Competing Consumers

The Competing Consumers pattern describes the situation in which you run several consumers in parallel to speed up the processing of your messages. You have one or more producers and a set of consumers which are ready to read messages from a shared queue. All of the consumers do the same job and are interchangeable. When a consumer receives a message, the message is deleted (or set to an in-flight state, which can be reverted if the consumer fails) and the other consumers won’t see it anymore. The next available consumer gets the next message and so on.

An example of this pattern is the re-encoding of videos on a video platform. Once a user uploads a video to the platform, requests for encoding the video in all required web formats are sent to the message queue. Since video encoding is a slow operation and videos get uploaded very often, a set of workers is available to process the videos. Each worker pulls the next message from the queue and once the video has been converted, the task is finished. The message gets deleted from the queue to ensure that the other workers won’t encode the same video again.
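The following sketch mimics this setup in a single Python process, with threads as workers and queue.Queue from the standard library standing in for the message queue. The function name run_workers and the file names are assumptions for the illustration. A real system would use a broker, and the sketch does not model the in-flight state.

```python
import queue
import threading


def run_workers(tasks, worker_count=3):
    """Process every task exactly once with a pool of identical workers."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    processed = []  # (worker_id, task) pairs
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                # Taking a task removes it from the queue, so no other
                # worker can pick up the same task.
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            with lock:
                processed.append((worker_id, task))

    threads = [
        threading.Thread(target=worker, args=(worker_id,))
        for worker_id in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return processed


results = run_workers([f"video-{n}.mp4" for n in range(10)])
for worker_id, task in results:
    print(f"worker {worker_id} encoded {task}")
```

Which worker handles which video varies from run to run, but every video shows up exactly once.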

Messaging Systems

Knowing some of the different messaging patterns, let’s have a look at some of the existing messaging systems.

Mosquitto

Mosquitto is a lightweight MQTT broker supporting the publish-subscribe pattern. It does not support competing consumers or other protocols apart from MQTT. Mosquitto is available in the software repository of most Linux distributions and there also is a Windows installer available.

If there is no pre-built package available or you do not want to use one, you can also install it from source code. On my computer it compiled in a few seconds without problems, which shows how simple this program really is. You have to make sure that the header ares.h is available on your system. Depending on your distribution it is included in a package called c-ares (Arch) or libc-ares-dev (Debian based).

If you installed mosquitto from your package manager, you should be able to start it with your standard daemon manager (systemd, SysVinit). Otherwise, it’s as simple as calling mosquitto to start the server and it will greet you with

1509264671: mosquitto version 1.4.14 (build date 2017-10-29 08:06:39+0000) starting
1509264671: Using default config.
1509264671: Opening ipv4 listen socket on port 1883.
1509264671: Opening ipv6 listen socket on port 1883.

If you just executed make && make install during compilation of mosquitto, libmosquitto.so.1 will have been installed to /usr/local/lib/libmosquitto.so.1. If you now try to run the client programs mosquitto_sub or mosquitto_pub, this might give you the error

mosquitto_sub: error while loading shared libraries: libmosquitto.so.1: cannot open shared object file: No such file or directory

On Arch Linux, I was able to solve this by adding an entry for the folder /usr/local/lib in the file /etc/ld.so.conf.d/local.conf and then running /sbin/ldconfig.

To communicate with Mosquitto we should be able to use any MQTT client library. Mosquitto ships the command line clients mosquitto_pub and mosquitto_sub, and the semi-official Python library is paho-mqtt.

The MQTT protocol was built for high latency, low energy situations like embedded devices communicating over high latency networks. Thus, it’s also a quite simple protocol. You connect to the broker and then you subscribe to a topic and/or publish messages to a topic.

So, let’s test the publish-subscribe pattern with two subscribers and one or two publishers. We can test this with just four terminal windows and the mosquitto programs mosquitto_pub and mosquitto_sub. Let’s subscribe one of our subscribers to all topics with the topic wildcard # and the other one only to the topic test/simple.

Execute these commands on two different shells:

mosquitto_sub -t \#
mosquitto_sub -t test/simple

And then on a third shell execute:

mosquitto_pub -m Hello -t test/simple

You should see the message arriving in both other sessions. Now, let’s try two publishers at the same time.

Shell 1: for i in $(seq 1 100); do mosquitto_pub -m Hello -t test/simple; sleep 2; done
Shell 2: for i in $(seq 1 100); do mosquitto_pub -m "Very complicated" -t test/complicated; sleep 2; done

In this case, one of your subscribers should display

Hello
Very complicated
Hello
Very complicated
Hello
Very complicated
Hello
Very complicated
Hello
Very complicated

while the other one displays only the simple messages, because it only subscribed to the topic test/simple:

Hello
Hello
Hello
Hello
Hello
Hello
Hello

This is the basic functionality of mosquitto and the MQTT protocol. You subscribe to topics and each subscriber gets the messages it subscribed to. If the subscriber is not online, it won’t get the message, as there is no persistence in the default mode.

However, mosquitto does support persistence if you use specific settings. One of these settings is the Quality of Service (QoS). MQTT supports three different levels of QoS:

QoS=0: At most once delivery: messages might be lost
QoS=1: At least once delivery: messages are not lost, but might be sent multiple times to the subscribers
QoS=2: Exactly-once delivery: messages arrive at the subscriber exactly once
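The difference between the first two levels can be shown with a small simulation. This is a toy model and not the MQTT protocol itself: the function deliver and its drop_pattern parameter are invented for this sketch, and QoS=2 with its longer handshake is left out.

```python
from itertools import cycle


def deliver(messages, qos, drop_pattern):
    """Simulate sending messages over a lossy link with QoS 0 or QoS 1.

    drop_pattern is an iterator of booleans. Every transmission (a message
    or an acknowledgement) consumes one value; True means it gets lost.
    With QoS 1 the pattern must eventually let a message and its ack through.
    """
    received = []
    for message in messages:
        while True:
            if next(drop_pattern):
                # The message was lost on its way to the subscriber.
                if qos == 0:
                    break  # QoS 0: fire and forget, no retry
                continue  # QoS 1: no ack arrives, so send again
            received.append(message)
            if qos == 0:
                break
            if not next(drop_pattern):
                break  # the ack arrived, the sender is done
            # The ack was lost: the sender resends, creating a duplicate.
    return received


# Every second transmission out of three is lost.
print(deliver(["a", "b", "c"], 0, cycle([False, True, False])))
print(deliver(["a", "b", "c"], 1, cycle([False, True, False])))
```

With this loss pattern the QoS=0 run loses message b, while the QoS=1 run delivers everything but hands message a to the subscriber twice.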

Of course, in an ideal world you’d want exactly-once delivery all of the time, but this comes at a price. To ensure that a message has arrived at a subscriber exactly once, broker and client have to exchange more packets, which costs your embedded device more power. Thus, you usually want to set the QoS level as low as your application allows. However, with QoS=0 mosquitto will not use any persistence. By setting QoS=0, you told the message broker “I do not care if messages get lost”, so why should mosquitto spend resources on storing a message for a subscriber that is down? This is why you have to set QoS to either one or two if you want persistence in mosquitto. Since there is a QoS level assigned to the message and another one assigned to the subscription, and a message is delivered with the lower of the two, both must be at least one.

Usually when you connect to mosquitto you connect in a mode called clean_session=True. This tells mosquitto that it may forget about your subscriptions once you disconnect. If you want to use persistence, you have to connect with clean_session=False to tell mosquitto that it should persist details about your subscriptions even after you disconnect.

And finally you need a fixed client ID, because otherwise when you re-connect to mosquitto, it will not know which subscription was yours.
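The conditions can be summed up in a few lines of code. This is only a sketch of the rules described above, not mosquitto's actual implementation. The function names are hypothetical, and it assumes that a message is delivered with the lower of the message QoS and the subscription QoS.

```python
def effective_qos(message_qos, subscription_qos):
    """A message is delivered with the lower of the two QoS levels."""
    return min(message_qos, subscription_qos)


def queued_while_offline(message_qos, subscription_qos, clean_session, client_id):
    """Return True if the broker is expected to keep the message
    for a subscriber that is currently disconnected."""
    return (
        effective_qos(message_qos, subscription_qos) >= 1  # QoS 0 is never stored
        and not clean_session  # the broker must remember the session
        and bool(client_id)  # a fixed ID lets the broker recognise the client
    )


print(queued_while_offline(1, 1, clean_session=False, client_id="dashboard"))
print(queued_while_offline(1, 0, clean_session=False, client_id="dashboard"))
print(queued_while_offline(2, 2, clean_session=True, client_id="dashboard"))
```

Only the first call meets all conditions. The second fails because the subscription uses QoS=0, the third because the session is clean.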

If you follow all of these rules, mosquitto will start to queue messages for clients that are currently offline. The specification for MQTT itself does not use the word persistence anywhere in the text, but it clearly states:

When a Client reconnects with CleanSession set to 0, both the Client and Server MUST re-send any unacknowledged PUBLISH Packets (where QoS > 0) and PUBREL Packets using their original Packet Identifiers

Thus, it should be safe to assume that all correct MQTT brokers do store messages for offline clients.

This is of course within limits: a broker can only store a finite number of messages, at most until the hard disk is full if it persists to disk, but usually up to some lower limit. Mosquitto has an option called max_queued_messages which defines the maximum number of messages it will hold in its queue.

ZeroMQ

What if you want to support more messaging patterns than Mosquitto, but still want to have a lightweight solution? Then ZeroMQ might be the way to go. Unlike all other solutions in this series, ZeroMQ is not a central server, but just a library which you can use in your client programs. It looks similar to standard networking from a user’s perspective, but in the background it handles the overhead of network communication for you. E.g., when you want to connect a client to a server with standard networking libraries you have to start the server first and then the client can connect to the server. With ZeroMQ it does not matter if you start the client first. In the background the library will just wait until the server is available and then connect.

ZeroMQ supports both the publish-subscribe and the competing consumers pattern, but since it does not have a central server, you will have to write a very simple server in ZeroMQ yourself if you want a many-to-many connection. One-to-many, on the other hand, is quite simple, so we will look at this first. With a one-to-one or a one-to-many connection, you select a single instance on one side to be the server, and all other instances are clients that connect to it.

ZeroMQ is simple to install, because it is just a library. For Python, we can get it from pip with:

pip install pyzmq

The publish-subscribe pattern works quite similarly to our implementation with MQTT. The only difference is that with ZeroMQ you do not subscribe to topics, but instead you subscribe to a message prefix. You can specify an empty string to subscribe to all incoming messages (which we will do in this example).
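Prefix matching is easy to picture in plain Python. The helper below is a made-up illustration of the idea and not part of the ZeroMQ API: a message is accepted if it starts with at least one of the subscribed prefixes, and the empty prefix accepts everything.

```python
def accepts(subscribed_prefixes, message):
    """Return True if the message starts with any subscribed prefix."""
    return any(message.startswith(prefix) for prefix in subscribed_prefixes)


# A subscriber interested only in temperature readings.
print(accepts([b"temperature"], b"temperature 22"))
print(accepts([b"temperature"], b"humidity 40"))
# The empty prefix matches every message.
print(accepts([b""], b"humidity 40"))
```

Note that a prefix is matched against the raw beginning of the message, so the prefix temp would accept temperature 22 as well.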

Let’s first implement the publisher. For the publisher we just have to bind to a ZeroMQ socket and then send our message or our messages:

import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")

while True:
    socket.send_string("temperature 22")
    time.sleep(1)

The subscriber is not much more complex. This time we need to establish a connection, then subscribe to a message filter and finally receive all relevant messages in a loop:

import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")

socket.setsockopt_string(zmq.SUBSCRIBE, "")

while True:
    string = socket.recv_string()
    print(string)

When we run both the subscriber and the publisher now, we will receive one message per second on the subscriber’s side.

However, there is one caveat with ZeroMQ which we can see if we slightly adjust the publisher:

for i in range(1, 10):
    socket.send_string("temperature %d" % i)
    time.sleep(1)

After this adjustment the publisher will send numbered messages from 1 to 9. If we start the subscriber first and only then the publisher, we’d expect to receive all messages. After all, the subscriber has been available all the time.

Instead we get this:

temperature 2
temperature 3
temperature 4
temperature 5
temperature 6
temperature 7
temperature 8
temperature 9

The message temperature 1 seems to be missing.

Because it takes time to establish the connection on the socket, the subscriber misses every message the publisher sends before the connection is ready. The ZeroMQ guide describes this as the “slow joiner” symptom. This is an important effect to take into consideration. If you rely on receiving all messages, you might want to wait a few seconds before the publisher starts to send messages to your subscribers.
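The effect can be reproduced with a toy model that does not use ZeroMQ at all. The class ToyPublisher is invented for this sketch. It only captures the one property that matters here: a publisher hands each message to the subscribers that are connected at that moment and keeps nothing for those that join later.

```python
class ToyPublisher:
    """Toy stand-in for a publisher without any message history."""

    def __init__(self):
        self.inboxes = []

    def connect(self, inbox):
        # From now on the subscriber receives newly sent messages.
        self.inboxes.append(inbox)

    def send(self, message):
        # Messages go only to currently connected subscribers;
        # if nobody is connected, the message is simply dropped.
        for inbox in self.inboxes:
            inbox.append(message)


publisher = ToyPublisher()
inbox = []

for i in range(1, 10):
    if i == 2:
        # Assumption: the connection is only ready after the first send.
        publisher.connect(inbox)
    publisher.send("temperature %d" % i)

print(inbox)
```

The inbox starts with temperature 2, just like the output of the real subscriber above.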

Competing Consumers

The competing consumers pattern is called Parallel Pipeline in ZeroMQ and we can use it with the PUSH and PULL socket types. In the competing consumers pattern there is one producer which sends out tasks and several consumers which pull the tasks from the queue and process them.

This example is as simple as the one before. In fact, it is even simpler, because for the Parallel Pipeline we do not have to specify a message subscription. Thus, the producer and the consumer again look very simple.

The producer opens a socket and starts to write out messages, again one message per second.

import zmq
import time

context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.bind("tcp://*:5556")

for i in range(1, 10):
    socket.send_string("temperature %d" % i)
    time.sleep(1)

The consumer connects to the open socket and just reads messages in a loop.

import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)
socket.connect("tcp://localhost:5556")

while True:
    string = socket.recv_string()
    print(string)

If you run this with only one consumer and one producer, you will see that it will send one message per second just like the publisher in the pub-sub pattern did before. However, if you now start two consumers, you should see both of them receiving messages and each message will be read by only one consumer.
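A PUSH socket distributes its messages round-robin among the connected PULL sockets. The sketch below shows the resulting split for two consumers in plain Python. The function distribute and the consumer names are hypothetical, and the sketch assumes that both consumers are connected before the first message is sent and stay connected.

```python
from itertools import cycle


def distribute(tasks, consumers):
    """Assign tasks to consumers in round-robin order."""
    assignments = {name: [] for name in consumers}
    for task, name in zip(tasks, cycle(consumers)):
        assignments[name].append(task)
    return assignments


tasks = ["temperature %d" % i for i in range(1, 10)]
assignments = distribute(tasks, ["consumer-a", "consumer-b"])
for name, received in assignments.items():
    print(name, received)
```

Under these assumptions one consumer receives the odd-numbered messages and the other the even-numbered ones. In a real run the split depends on when each consumer connects.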

Request-Reply

Apart from the two patterns above, ZeroMQ also supports the request-reply pattern, which is used for more chatty communication between programs. It allows programs to send messages in both directions, i.e. a client can send a message to a server and the server can respond.

This is not a use-case shared by the other messaging architectures, so if you need a request-reply type communication between two programs, ZeroMQ might be the way to go.

Persistency / Reliability

You can also play around with the two solutions a bit and see what happens if consumers go down. This is what I have tested:

Publish-Subscribe: Publisher will continue to publish messages and messages will be lost.
Competing Consumers: If one consumer goes down, work will be done by the other one. If all go down, the producer will block and wait until at least one consumer is available again. If one or more consumers join again, the work will be distributed again.

Part 2

In the second part of this two-part series we will focus on the bigger systems like RabbitMQ and Kafka, while this part was dedicated to the lightweight solutions.

Messaging Systems

Part 1 - Messaging Systems: An Overview over RabbitMQ, Kafka, ZeroMQ and Mosquitto - Part 1
Part 2 - Messaging Systems: RabbitMQ - Part 2

I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send me an e-mail to blog@stefan-koch.name.