Graceful Shutdown of QEMU Guest from Python

This article is part of a series.

In this tutorial I want to show how you can trigger a clean shutdown of your QEMU guest VM from Python. This is not as simple as one might expect, but it is not difficult either. If you simply stop the QEMU process, the guest experiences this like a sudden power loss and gets no chance to shut down cleanly. For a clean shutdown of a QEMU guest we have to connect to the QEMU monitor and send a powerdown command there.

So, let’s get started. A QEMU monitor can be opened for each guest VM individually and there are multiple connection methods available. There are also two kinds of monitor: The standard monitor targeting human users (sometimes called HMP) and the control monitor using the JSON-based QEMU Machine Protocol (QMP).

By default, QEMU uses a human monitor attached to the graphical user interface of the VM. Another option described in the man page is to attach the monitor to stdio.

Since we want to shut down the guest programmatically, we will use QMP and connect through a Unix socket.

Manual Steps for Shutdown

Let’s try this manually first. Start a QEMU guest with a QMP monitor listening on /tmp/qmp-sock:

$ qemu-system-x86_64 -m 1024 -hda /srv/nfs/vms/my_image.qcow2 -qmp unix:/tmp/qmp-sock,server,nowait

We can connect to this socket using socat. QEMU greets us with a JSON message describing its version and capabilities. Since we’re using the JSON protocol, we have to initialize the communication by sending {"execute": "qmp_capabilities"}. This command finishes the capabilities negotiation without enabling any additional capabilities. Only after the capabilities negotiation has finished can we start sending actual commands. When the machine has fully booted, we can send {"execute": "system_powerdown"} to shut down the VM.

$ socat - unix-connect:/tmp/qmp-sock
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 5}, "package": ""}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities"}
{"return": {}}
{"execute": "system_powerdown"}
{"return": {}}
{"timestamp": {"seconds": 1607364674, "microseconds": 315794}, "event": "POWERDOWN"}
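
The session above contains three kinds of messages: the greeting, replies to our commands, and an asynchronous event. As a minimal sketch, the following Python snippet tells them apart by their top-level key. The function name classify_qmp_message is my own choice for illustration, and the 'error' branch covers failed commands, which do not occur in the session above.

```python
import json
from typing import Any, Dict


def classify_qmp_message(line: str) -> str:
    """Classify one line received from a QMP socket by its top-level key."""
    message: Dict[str, Any] = json.loads(line)
    if 'QMP' in message:
        return 'greeting'
    if 'return' in message:
        return 'return'
    if 'error' in message:
        return 'error'
    if 'event' in message:
        return 'event'
    return 'unknown'


# The lines QEMU sent in the manual session above
transcript = [
    '{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 5}, '
    '"package": ""}, "capabilities": ["oob"]}}',
    '{"return": {}}',
    '{"return": {}}',
    '{"timestamp": {"seconds": 1607364674, "microseconds": 315794}, '
    '"event": "POWERDOWN"}',
]

kinds = [classify_qmp_message(line) for line in transcript]
print(kinds)  # ['greeting', 'return', 'return', 'event']
```

Events such as POWERDOWN are not tied to a specific command, so a client should not assume that the next line it reads is always the reply to the command it just sent.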

Shut Down the VM from Python

We can achieve the same thing pretty easily from Python using socket programming. Let’s adjust the cloud computing platform code we have so far.

Before we create a systemd unit file for the VM, we want to add the new parameters to our QEMU command. Each VM will get its own socket file for monitor connections. QEMU can be told to hold back the boot of the VM until a client has connected to the monitor. We want the VM to start even if no client is attached to the monitor. Thus, we use the option nowait.

import shlex
from pathlib import Path


def qemu_socket_monitor(vm_id: str) -> Path:
    return Path(f'/tmp/aetherscale-qmp-{vm_id}.sock')


def create_qemu_systemd_unit(
        unit_name: str, qemu_config: QemuStartupConfig):
    # ...

    qemu_monitor_path = qemu_socket_monitor(qemu_config.vm_id)
    socket_quoted = shlex.quote(f'unix:{qemu_monitor_path},server,nowait')

    command = f'qemu-system-x86_64 -m 4096 -accel kvm -hda {hda_quoted} ' \
        f'-device {device_quoted} -netdev {netdev_quoted} ' \
        f'-name {name_quoted} ' \
        '-nographic ' \
        f'-qmp {socket_quoted}'
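
To see what shlex.quote does with the monitor argument, here is a small self-contained sketch. The helper qmp_argument is a name I made up for this example; it only wraps the two lines from the snippet above.

```python
import shlex
from pathlib import Path


def qemu_socket_monitor(vm_id: str) -> Path:
    return Path(f'/tmp/aetherscale-qmp-{vm_id}.sock')


def qmp_argument(vm_id: str) -> str:
    """Build the shell-safe value for QEMU's -qmp option (illustrative helper)."""
    return shlex.quote(f'unix:{qemu_socket_monitor(vm_id)},server,nowait')


# A plain ID needs no quoting, so the string is returned unchanged
print(qmp_argument('abc'))

# An ID with a space is wrapped in single quotes, so the shell
# still passes it to QEMU as one argument
print(qmp_argument('my vm'))
```

With well-formed VM IDs the quoting changes nothing, but it keeps the command line intact if an ID ever contains characters that the shell would interpret.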

Next, we will adjust the stop behaviour. By default, we want VMs to be stopped gracefully. Only when the user submits a kill flag do we want the VM to be killed immediately. It might also make sense to first try to shut down the guest gracefully and to kill it if it has not stopped after k seconds. However, we will keep this option for later.

For communication with the QEMU monitor we implement a class QemuMonitor. Python's socket module lets us connect to a UNIX socket. I am new to socket programming, so this might not be the best code, but it should work for the current use case and we can iterate on it.

import json
from pathlib import Path
import socket
from typing import Any


class QemuMonitor:
    def __init__(self, socket_file: Path):
        # TODO: It's not really nice that we use the file object
        # to read lines and the socket to write
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(str(socket_file))
        self.f = self.sock.makefile('rw')

        # Initialize connection immediately
        self._initialize()

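    def execute(self, command: str) -> Any:
        json_line = json.dumps({'execute': command}) + '\n'
        self.sock.sendall(json_line.encode('utf-8'))

        # QEMU can emit asynchronous events (e.g. POWERDOWN) at any
        # time, so skip them until the reply to our command arrives
        while True:
            line = self.f.readline()
            if not line:
                raise ConnectionError('QEMU monitor connection closed')
            message = json.loads(line)
            if 'return' in message or 'error' in message:
                return message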

    def _initialize(self):
        # Read the capabilities
        self.f.readline()

        # Acknowledge the QMP capability negotiation
        self.execute('qmp_capabilities')
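
To try out the order of greeting, capability negotiation and command without a running VM, one option is a tiny fake QMP server on a temporary UNIX socket. The following is only a sketch under my own assumptions: fake_qmp_server, run_commands and demo are hypothetical names, and the fake server answers every line with an empty success reply, which is much less than a real QEMU does.

```python
import json
import os
import socket
import tempfile
import threading
from typing import Any, Dict, List


def send_json(sock: socket.socket, message: Dict[str, Any]) -> None:
    sock.sendall((json.dumps(message) + '\n').encode('utf-8'))


def fake_qmp_server(listener: socket.socket) -> None:
    # Hypothetical stand-in for QEMU: greet one client, then answer
    # every received line with an empty success reply
    conn, _ = listener.accept()
    with conn, conn.makefile('r', encoding='utf-8') as reader:
        send_json(conn, {'QMP': {'version': {}, 'capabilities': []}})
        for line in reader:
            json.loads(line)  # raises if the client sent invalid JSON
            send_json(conn, {'return': {}})


def run_commands(socket_path: str, commands: List[str]) -> List[Dict[str, Any]]:
    """Read the greeting, negotiate capabilities, then send the commands."""
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile('r', encoding='utf-8') as reader:
            reader.readline()  # greeting
            for command in ['qmp_capabilities'] + commands:
                send_json(sock, {'execute': command})
                replies.append(json.loads(reader.readline()))
    return replies


def demo() -> List[Dict[str, Any]]:
    with tempfile.TemporaryDirectory() as tmp_dir:
        socket_path = os.path.join(tmp_dir, 'qmp.sock')
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(socket_path)
            listener.listen(1)
            server = threading.Thread(
                target=fake_qmp_server, args=(listener,))
            server.start()
            replies = run_commands(socket_path, ['system_powerdown'])
            server.join()
    return replies


print(demo())
```

The sketch reads through a file object and writes through the socket, like the class above does. Because the fake server never sends events, the client here can get away with reading exactly one line per command.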

The QemuMonitor can now be used to send commands when we want to issue the shutdown of a VM:

def stop_vm(options: Dict[str, Any]) -> Dict[str, str]:
    # ...
    kill_flag = bool(options.get('kill', False))
    stop_status = 'killed' if kill_flag else 'stopped'

    # ...
    if not execution.systemd_unit_exists(unit_name):
        # ...
    else:
        execution.disable_systemd_unit(unit_name)

        if kill_flag:
            execution.stop_systemd_unit(unit_name)
        else:
            qemu_socket = qemu_socket_monitor(vm_id)
            qm = qemu.QemuMonitor(qemu_socket)
            qm.execute('system_powerdown')

        response = {
            'status': stop_status,
            'vm-id': vm_id,
        }

    return response

For the deletion of a VM we will keep the kill behaviour, because the VM must no longer be running when its resources are deleted. If a user wants to gracefully stop and delete a VM, they have to do this in two steps: first issue a stop-vm command and then issue a delete-vm command.

def delete_vm(options: Dict[str, Any]) -> Dict[str, str]:
    try:
        vm_id = options['vm-id']
    except KeyError:
        raise ValueError('VM ID not specified')

    # Always kill the VM (no graceful shutdown) when it is deleted
    options['kill'] = True
    stop_vm(options)

    # ...

Finally, we have to adjust the client to allow setting the kill flag:

    # ...

    stop_vm_parser = subparsers.add_parser('stop-vm')
    stop_vm_parser.add_argument(
        '--vm-id', dest='vm_id', help='ID of the VM to stop', required=True)
    stop_vm_parser.add_argument(
        '--kill', dest='kill', action='store_true', default=False,
        help='Kill the VM immediately, no graceful shutdown')

    # ...

    if args.subparser_name == 'list-vms':
        # ...
    elif args.subparser_name == 'stop-vm':
        response_expected = True
        data = {
            'command': args.subparser_name,
            'options': {
                'vm-id': args.vm_id,
                'kill': args.kill,
            }
        }
    elif args.subparser_name in ['start-vm', 'delete-vm']:
        response_expected = True
        data = {
            'command': args.subparser_name,
            'options': {
                'vm-id': args.vm_id,
            }
        }
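
As a self-contained sketch of how the --kill flag ends up in the request, the following snippet builds only the stop-vm part of the parser. The functions build_parser and build_request are names I chose for the example, and dest='subparser_name' is an assumption derived from the args.subparser_name attribute used above.

```python
import argparse
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumption: the sub-command name is stored as args.subparser_name
    subparsers = parser.add_subparsers(dest='subparser_name')

    stop_vm_parser = subparsers.add_parser('stop-vm')
    stop_vm_parser.add_argument(
        '--vm-id', dest='vm_id', help='ID of the VM to stop', required=True)
    stop_vm_parser.add_argument(
        '--kill', dest='kill', action='store_true', default=False,
        help='Kill the VM immediately, no graceful shutdown')
    return parser


def build_request(argv: List[str]) -> Dict[str, Any]:
    """Turn command line arguments into the message sent to the server."""
    args = build_parser().parse_args(argv)
    return {
        'command': args.subparser_name,
        'options': {
            'vm-id': args.vm_id,
            'kill': args.kill,
        }
    }


# Without --kill the flag defaults to False (graceful shutdown)
print(build_request(['stop-vm', '--vm-id', 'abc']))

# With --kill the server is asked to stop the VM immediately
print(build_request(['stop-vm', '--vm-id', 'abc', '--kill']))
```

Since store_true flags default to False, existing calls of stop-vm without the flag automatically get the new graceful behaviour.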

Now we can, with some limitations, gracefully shut down our VMs. There is still room for improvement. For example, system_powerdown does not work while the guest is still booting. I assume some operating systems already react to shutdown requests in this phase, but mine did not. We must also expect that some VMs will not react to the command at all. Thus, in the future we should kill VMs that do not shut down gracefully within a short amount of time.
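
A graceful stop with a kill fallback could look roughly like the following sketch. It is not part of the platform code yet. The function stop_with_timeout and its three callables are placeholders I made up, so that the logic can be shown without a real VM: in our case they would wrap the system_powerdown command, a check whether the systemd unit is still active, and the existing kill path.

```python
import time
from typing import Callable


def stop_with_timeout(
        request_shutdown: Callable[[], None],
        is_running: Callable[[], bool],
        kill: Callable[[], None],
        timeout: float = 30.0,
        poll_interval: float = 0.5) -> str:
    """Ask for a graceful shutdown and kill the VM if it takes too long."""
    request_shutdown()

    # Poll until the VM has stopped or the deadline has passed
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_running():
            return 'stopped'
        time.sleep(poll_interval)

    # One last check, so that a VM which stopped just before the
    # deadline is not reported as killed
    if not is_running():
        return 'stopped'

    kill()
    return 'killed'


# Example with stand-in callables: this "VM" never stops on its own
status = stop_with_timeout(
    request_shutdown=lambda: None,
    is_running=lambda: True,
    kill=lambda: None,
    timeout=0.05,
    poll_interval=0.01)
print(status)  # killed
```

The default values for timeout and poll_interval are arbitrary. A guest that is still booting might need a considerably longer timeout before killing it is justified.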

Another option for graceful shutdown would be using the QEMU Guest Agent with the command guest-shutdown. However, this requires the Guest Agent to be running inside the VM. For now, I will try to do as much as possible without the Guest Agent, even if the solutions are not perfect. We may revisit the Guest Agent at some point in the future when we’re trying to fetch IP addresses of VMs (libvirt can show the IPs of VMs with its internal DHCP server or with the Guest Agent).
