Messaging Systems: An Overview over RabbitMQ, Kafka, ZeroMQ and Mosquitto - Part 1
When dealing with messaging systems there are a lot of options available from classical message brokers to simple libraries that handle the messaging logic without a central server. Almost all of them have some differences and each of them has a reason to exist. In this article I will compare a few popular ones and very different ones, namely the message broker RabbitMQ, the distributed streaming platform Kafka, the socket and concurrency library ZeroMQ and the lightweight MQTT broker Mosquitto. You will see that each of them has their own advantages and differences from the others and you should choose one according to your needs.
Messaging Patterns
When talking about messaging systems we first have to understand the typical messaging patterns available. In some cases, you want to send messages from one program (producer) to another program (consumer) and in other cases you might have multiple producers or multiple consumers or even multiple steps. Some of these patterns are so common that they have their own names.
Publish-Subscribe
Publish-Subscribe (also known as Pub-Sub) is a messaging pattern where you want to send messages from a set of publishers or producers to different subscribers or consumers. However, messages are not sent to the subscribers directly, but instead each message is published to a so-called topic and subscribers can subscribe to topics and retrieve only those messages.
Imagine a toy train system in which each component can send messages about its state. A train might be interested in messages from the traffic signals, crossings might be interested in the positions of trains and a dashboard for the owner of the toys might be interested in everything. This gives us a quite complex system with several publishers and subscribers (sometimes even both at the same time) which are interested in different messages.
The trains could send their position updates to a topic called
traffic/train/position
, while traffic signs could publish to
traffic/signal/state
. This would allow our crossing to listen to the
topic traffic/train/position
for approaching trains while the dashboard
would just subscribe to all topics with a wildcard.
Competing Consumers
The Competing Consumer pattern describes the situation when you want to use parallelized consumers to speed up the processing of your messages. In this situation you have one or multiple producers and a set of consumers, which are ready to read messages from this consumer. All of the consumers do the same job and could be mutually exchanged. When a consumer receives a message the message will be deleted (or set to an in-flight state, which could be reverted if the consumer fails) and the other consumers won’t see it anymore. The next available consumer will get the next message and so on.
An example for this pattern could be the re-encoding of videos on a video platform. Once a user uploads a video to the platform, requests for encoding of the video in all required web formats will be sent to the message queue. Since video encoding is a slow operation and videos get uploaded very often, there will be a set of workers available to process the videos. Each worker will pull the next message from the queue and once the video has been converted, the task is finished. The message gets deleted from the queue to ensure that the other workers won’t encode the same video again.
Messaging Systems
Knowing some of the different messaging patterns, let’s have a look at some of the existing messaging systems.
Mosquitto
Mosquitto is a lightweight MQTT broker supporting the publish-subscribe pattern. It does not support competing consumers or other protocols apart from MQTT. Mosquitto is available in the software repository of most Linux distributions and there also is a Windows installer available.
If there is no pre-built package available or you do not want to use them,
you can also install it from source code. On my computer it compiled in a few
seconds without problems. This shows how simple this program really is. You
have to make sure that you have the ares.h
available on your system.
Depending on your distribution this is included in a package called
c-ares
(Arch) or libc-ares-dev
(Debian based).
If you installed mosquitto from your package manager, you should be able to
start it with your standard daemon manager (systemd, SysVinit). Otherwise,
it’s as simple as calling mosquitto
to start the server and it will greet
you with
1509264671: mosquitto version 1.4.14 (build date 2017-10-29 08:06:39+0000) starting
1509264671: Using default config.
1509264671: Opening ipv4 listen socket on port 1883.
1509264671: Opening ipv6 listen socket on port 1883.
If you just executed make && make install
during compilation of mosquitto,
libmosquitto.so.1
will have been installed to
/usr/local/lib/libmosquitto.so.1
. If you now try to run the client programs
mosquitto_sub
or mosquitto_pub
, this might give you the error
mosquitto_sub: error while loading shared libraries: libmosquitto.so.1: cannot open shared object file: No such file or directory
On Arch linux, I was able to solve this by adding an entry for the
folder /usr/local/lib
in the file /etc/ld.so.conf.d/local.conf
and then
running /sbin/ldconfig
.
For communication with Mosquitto we should be able to use any MQTT client
library. There are command line clients available from mosquitto called
mosquitto_pub
and mosquitto_sub
and the semi official Python library is
paho-mqtt
.
The MQTT protocol was built for high latency, low energy situations like embedded devices communicating over high latency networks. Thus, it’s also a quite simple protocol. You connect to the broker and then you subscribe to a topic and/or publish messages to a topic.
So, let’s test the publish-subscribe pattern with two subscribers and one
or two publishers. We can test this with just four terminal windows and the
mosquitto programs mosquitto_pub
and mosquitto_sub
. Let’s subscribe on
of our subscribers to all topics with the topic wildcard #
and one only to
the topic test/simple
.
Execute these commands on two different shells:
mosquitto_sub -t \#
mosquitto_sub -t test/simple
And then on a third shell execute:
mosquitto_pub -m Hello -t test/simple
You should see the message arriving in both other sessions. Now, let’s try two publishers at the same time.
Shell 1: for i in $(seq 1 100); do mosquitto_pub -m Hello -t test/simple; sleep 2; done
Shell 2: for i in $(seq 1 100); do mosquitto_pub -m "Very complicated" -t test/complicated; sleep 2; done
In this case, one of your subscribers should display
Hello
Very complicated
Hello
Very complicated
Hello
Very complicated
Hello
Very complicated
Hello
Very complicated
while the other one displays only the simple messages, because it only
subscribed to the topic test/simple
Hello
Hello
Hello
Hello
Hello
Hello
Hello
This is the basic functionality of mosquitto or the MQTT protocol. You subscribe to messages and each subscriber gets the messages it subscribed for. If the subscriber is not online, it won’t get the message, as there is no persistence in the default mode.
However, mosquitto does support persistence if you use specific settings. One of these settings is the Quality of Service QoS. MQTT supports three different levels of QoS:
- QoS=0: At most once delivery: messages might be lost
- QoS=1: At least once delivery: messages are not lost, but might be sent multiple times to the subscribers
- QoS=2: Exactly-once delivery: messages arrive at the subscriber exactly once
Of course in an ideal world, you’d want to have exactly-once delivery all of the time, but this comes at a price. To ensure that a message has arrived at a subscriber exactly once, you will have to do much more communication which requires a lot more power consumption from your embedded device. Thus, you usually want to set the QoS level as low as possible for your application. However, with QoS=0 mosquitto will not use any persistence. By setting QoS=0, you just told the message broker “I do not care if messages get lost”, so why should mosquitto use resources to store a message in case a subscriber is down if you say that you do not care about lost messages? This is why you have to set QoS to either one or two if you want persistence in mosquitto. Since there is a QoS level assigned to the message and a QoS level assigned to the subscription, you need to have QoS at least at one for both of these settings.
Usually when you connect to mosquitto you connect in a mode called
clean_session=True
. This tells mosquitto that it may forget about your
subscriptions once you disconnect. If you want to use persistence, you have
to connect with clean_session=False
to tell mosquitto that it should
persist details about your subscriptions even after you disconnect.
And finally you need a fixed client ID, because otherwise when you re-connect to mosquitto, it will not know which subscription was yours.
If you follow all of these rules, mosquitto
will start to retain messages for clients that are currently offline. The
specification for MQTT
itself does not use the word persistence anywhere
in the text, but it clearly states:
When a Client reconnects with CleanSession set to 0, both the Client and Server MUST re-send any unacknowledged PUBLISH Packets (where QoS > 0) and PUBREL Packets using their original Packet Identifiers
Thus, it should be save to assume that all correct MQTT brokers do store messages for offline clients.
This is of course within limits: A broker can only store a finite number
of elements - at maximum until the whole hard disk is full if it uses
hard disk persistence, but usually at some lower limit. Mosquitto has an
option called max_queued_messages
which defines how many messages mosquitto
will store in its queue at maximum.
ZeroMQ
What if you want to support more messaging patterns than Mosquitto, but still want to have a lightweight solution? Then ZeroMQ might be the way to go. Unlike all other solutions in this series, ZeroMQ is not a central server, but just a library which you can use in your client programs. It looks similar to standard networking from a user’s perspective, but in the background it handles the overhead of network communication for you. E.g., when you want to connect a client to a server with standard networking libraries you have to start the server first and then the client can connect to the server. With ZeroMQ it does not matter if you start the client first. In the background the library will just wait until the server is available and then connect.
ZeroMQ does support both the publish-subscribe and the competing consumers pattern, but since it does not have a central server, you will have to write a very simple server in ZeroMQ yourself if you want to have a many to many connection. One-to-many on the other hand is quite simple, so we will look at this first. With a one-to-one or a one-to-many connection, you can select a single instance on one side to be the server and all other instances will be clients and connect to the server.
Publish-Subscribe
ZeroMQ is simple to install, because it is just a library. For Python, we can get it from pip with:
pip install zmq
The publish-subscribe pattern works quite similar to our implementation with MQTT. The only difference is that with ZeroMQ you do not subscribe to topics, but instead you subscribe to a message prefix. You can specify an empty string to subscribe to all incoming messages (which we will do in this example).
Let’s first implement the publisher. For the publisher we just have to bind to a ZeroMQ socket and then send our message or our messages:
import zmq
import time
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")
while True:
socket.send_string("temperature 22")
time.sleep(1)
The client is not much more complex. This time we need to establish a connection, then subscribe to a message filter and finally receive all relevant messages in a loop:
import zmq
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")
socket.setsockopt_string(zmq.SUBSCRIBE, "")
while True:
string = socket.recv_string()
print(string)
When we run both the subscriber and the publisher now, we will receive one message per second on the subscriber’s side.
However, there is one caveat with ZeroMQ which we can see if we slighly adjust the publisher.
for i in range(1, 10):
socket.send_string("temperature %d" % i)
time.sleep(1)
After this adjustment the publisher will send numbered messages from 1 to 9. If we start the subscriber first and only then the publisher, we’d expect to receive all messages. After all, the subscriber has been available all the time.
Instead we get this:
temperature 2
temperature 3
temperature 4
temperature 5
temperature 6
temperature 7
temperature 8
temperature 9
The message temperature 1
seems to be missing.
Due to the time it takes to establish the connection on the socket, the subscriber will not receive all messages the publisher sends. This is an important effect to take into consideration. If you rely on all messages, you might want to wait a few seconds before starting to send out messages from the publisher to your subscribers.
Competing Consumers
The competing consumers pattern is called Parallel Pipeline in ZeroMQ and
we can use it with the PUSH
and PULL
socket types. In the competing
consumers pattern there are one producer
which sends out tasks and several consumers which pull the tasks from the
queue and process them.
This example is equally simple as the example before. In fact, it is even simpler, because for the Parallel Pipeline we do not have to specify a message subscription. Thus, the producer and consumer look again very simple.
The producer opens a socket and starts to write out messages, again one message per second.
import zmq
import time
context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.bind("tcp://*:5556")
for i in range(1, 10):
socket.send_string("temperature %d" % i)
time.sleep(1)
The consumer connects to the open socket and just reads messages in a loop.
import zmq
context = zmq.Context()
socket = context.socket(zmq.PULL)
socket.connect("tcp://localhost:5556")
while True:
string = socket.recv_string()
print(string)
If you run this with only one consumer and one producer, you will see that it will send one message per second just like the publisher in the pub-sub pattern did before. However, if you now start two subscribers, you should see both of them receiving messages and each message wil be read by only one consumer.
Request-Reply
Apart from the two patterns above, ZeroMQ also supports the request reply pattern, which is used for more chatty communications between programs. It allows programs to send messages in both directions, i.e. a client can send a message to a server and the server can respond.
This is not a use-case shared by the other messaging architectures, so if you need a request-reply type communication between two programs, ZeroMQ might be the way to go.
Persistency / Reliability
You can also play around with the two solutions a bit and see what happens if consumers go down. This is what I have tested:
- Publish-Subscribe: Publisher will continue to publish messages and messages will be lost.
- Competing Consumers: If one consumer goes down, work will be done by the other one. If all go down, the producer will block and wait until at least one consumer is available again. If one or more consumers join again, the work will be distributed again.
Part 2
In the second part of this two part series we will focus on the bigger systems like RabbitMQ and Kafka, while this part was dedicated to the lightweight solutions.
I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send me an e-mail to blog@stefan-koch.name.