Persistent QEMU Instances with systemd
If you’re running QEMU instances without libvirt, you have the problem that no daemon manages your instances: if your host reboots, the VMs will not be re-created. In this part of my cloud computing tutorial I will use systemd to create QEMU VMs that are managed by a daemon (in this case systemd). If you haven’t followed along with my tutorial series, you can just read the systemd info and skip the Python parts.
One limitation of the systemd approach is that we should not start VMs with a graphical user interface this way. It might work, but it’s not recommended. For the cloud computing tutorial this only limits debugging, because on a real cloud computing provider we would not want to see a VM window on our host anyway. For GUI instances, customers would connect with a remote desktop utility.
Up to now I have used the graphical output of QEMU; now we will use SSH to access our machines. For IP address assignment I will rely on the DHCP server of my router. When we set up private VPNs for a collection of VMs we will have to run our own DHCP server to assign IP addresses.
Luckily, the Ubuntu base image I created earlier already starts an SSH server in the background, so there is nothing more to prepare. If you used another image without an SSH server, install one and enable it as a service in your base image.
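On a Debian or Ubuntu base image this could look as follows inside the VM (an assumption on my side; package and service names differ on other distributions):

$ sudo apt install openssh-server
$ sudo systemctl enable ssh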
A systemd unit file for QEMU
What would a systemd unit file for QEMU look like? It’s quite simple actually. I’m using user-mode systemd, but of course you can also use system-wide systemd with User= and Group= directives in the [Service] section (QEMU instances should not be run as root):
[Unit]
Description=QEMU VM Operating System X
[Service]
ExecStart=qemu-system-x86_64 -m 4096 -accel kvm -nographic ...
[Install]
WantedBy=default.target
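Assuming you save this file as, say, ~/.config/systemd/user/my-vm.service (a hypothetical name; the directory is systemd’s default location for user units), you can then manage the VM like any other user service:

$ systemctl --user daemon-reload
$ systemctl --user enable --now my-vm.service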
Just extend the qemu command with all options that you need for your exact use case. The important option for usage with systemd is -nographic, because otherwise QEMU will try to open a window on the window manager. This would fail if you don’t set the right environment variables and is not recommended anyway. If you need to start a VM with a GUI window, use the autostart method of your window manager.
Return the VM ID on startup
In the Python code we will perform one small change before we actually start our VMs as systemd services.
Up to now, when we started a VM we got a window and could check the status there. Now, if we want to query the status of a VM, we’ll have to ask systemd. For this we need an identifier for our machine, e.g. the VM ID. At the moment the VM ID is not returned, though.
In the client, we will change response_expected for start-vm to True:
# ...
elif args.subparser_name == 'start-vm':
    response_expected = True
    data = {
        'command': 'start-vm',
        'options': {
            'image': args.image,
        }
    }
# ...
And in computing.py we will set a response value after the VM was started. The response status will be starting, because to the user the VM might still seem to be in the starting stage and possibly not responsive to SSH for a while. We will think about a clean lifecycle naming scheme later, but I think it could be something like starting, running, crashed and stopped, probably with some open issues remaining regarding crashed and stopped.
# ...
print(f'Started VM "{vm_id}"')

response = {
    'status': 'starting',
    'vm-id': vm_id,
}
With these two small changes we will now see the ID of a started VM in the terminal:
$ aetherscale-cli start-vm --image ubuntu_base
[{'status': 'starting', 'vm-id': 'wiuakczz'}]
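As an aside, the lifecycle naming scheme mentioned above could eventually be modelled as a small enum. This is only a sketch of the idea and not yet part of the tutorial code:

from enum import Enum


class VmStatus(Enum):
    STARTING = 'starting'
    RUNNING = 'running'
    CRASHED = 'crashed'
    STOPPED = 'stopped'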
Starting a VM
To start a new VM we will create a systemd unit file and then start and enable the service. This can be achieved with the functions copy_systemd_unit, start_systemd_unit and enable_systemd_unit which we already created previously. These actions will replace the current call to subprocess.Popen. Additionally, we will create a new function create_qemu_systemd_unit that creates a new systemd unit for a VM.
Let’s first create a data class to pass all required QEMU startup options between functions.
from dataclasses import dataclass
from pathlib import Path


@dataclass
class QemuStartupConfig:
    vm_id: str
    hda_image: Path
    mac_addr: str
    vde_folder: Path
vm_id contains the ID of our VM, hda_image is the full path to the image for this VM (which uses the base image as a backing image), mac_addr of course is the MAC address for the network interface and vde_folder is the path to the previously set up VDE network.
At the same time I removed all locations in the code that distinguish between TAP networking and VDE networking. I will only support VDE networking in the future. In case we need TAP networking again, we can refer to older git commits.
With this data structure available, let’s create a function create_qemu_systemd_unit that writes the systemd unit file.
import os
import shlex
import tempfile


def create_qemu_systemd_unit(
        unit_name: str, qemu_config: QemuStartupConfig):
    hda_quoted = shlex.quote(str(qemu_config.hda_image.absolute()))
    device_quoted = shlex.quote(
        f'virtio-net-pci,netdev=pubnet,mac={qemu_config.mac_addr}')
    netdev_quoted = shlex.quote(
        f'vde,id=pubnet,sock={str(qemu_config.vde_folder)}')
    name_quoted = shlex.quote(
        f'qemu-vm-{qemu_config.vm_id},process=vm-{qemu_config.vm_id}')

    command = f'qemu-system-x86_64 -m 4096 -accel kvm -hda {hda_quoted} ' \
        f'-device {device_quoted} -netdev {netdev_quoted} ' \
        f'-name {name_quoted} ' \
        '-nographic'

    with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as f:
        f.write('[Unit]\n')
        f.write(f'Description=aetherscale VM {qemu_config.vm_id}\n')
        f.write('\n')
        f.write('[Service]\n')
        f.write(f'ExecStart={command}\n')
        f.write('\n')
        f.write('[Install]\n')
        f.write('WantedBy=default.target\n')

    execution.copy_systemd_unit(Path(f.name), unit_name)
    os.remove(f.name)
First, I quote all arguments containing input that might require quoting. It would be quite bad if an attacker could execute commands on our host. Then I write a pretty simple systemd unit file with the full QEMU command and the VM ID in the unit description.
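To see why the quoting matters, here is a quick demonstration of shlex.quote with a malicious, attacker-controlled value (the image path is made up for this example):

>>> import shlex
>>> shlex.quote('image.qcow2; rm -rf ~')
"'image.qcow2; rm -rf ~'"

The whole value becomes a single shell word, so the semicolon can no longer terminate the QEMU command and start a second one.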
With this function prepared, all we have to do is change the actual startup of the VM inside callback to use this function and start_systemd_unit instead of calling subprocess.Popen. Since we will need the name of the systemd unit at different places throughout the code, we will also create a function to compute it.
def systemd_unit_name_for_vm(vm_id: str) -> str:
    return f'aetherscale-vm-{vm_id}.service'
def callback(ch, method, properties, body):
    # ...
    mac_addr = interfaces.create_mac_address()
    print(f'Assigning MAC address "{mac_addr}" to VM "{vm_id}"')

    qemu_config = QemuStartupConfig(
        vm_id=vm_id,
        hda_image=user_image,
        mac_addr=mac_addr,
        vde_folder=Path(VDE_FOLDER))
    unit_name = systemd_unit_name_for_vm(vm_id)
    create_qemu_systemd_unit(unit_name, qemu_config)
    execution.start_systemd_unit(unit_name)

    print(f'Started VM "{vm_id}"')
    response = {
        'status': 'starting',
        'vm-id': vm_id,
    }
    # ...
With this change, run the server with the command aetherscale and start a machine from the command line:
$ aetherscale-cli start-vm --image ubuntu_base
[{'status': 'starting', 'vm-id': 'wiuakczz'}]
$ systemctl status --user aetherscale-vm-wiuakczz.service
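Because of -nographic, QEMU multiplexes the serial console and the QEMU monitor onto standard output, which systemd captures in the journal. Depending on the guest’s console configuration, you can follow that output for the unit from the example above with:

$ journalctl --user -u aetherscale-vm-wiuakczz.service -f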
Stopping a VM
We can also stop our instances with systemd. When systemd stops a process it sends a SIGTERM signal, followed by SIGKILL if the process does not stop. This does not mean that the operating system inside the VM is allowed to shut down cleanly, but it’s a clean stop command for QEMU.
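If you want to give QEMU more (or less) time before systemd escalates to SIGKILL, you can tune the timeout in the [Service] section of the unit file. This directive is optional and not used in the tutorial:

[Service]
TimeoutStopSec=30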
It’s also possible to specify a custom command to execute on stop with ExecStop. This can be useful for performing a clean VM shutdown through the QEMU monitor, but we will only do this in a later tutorial.
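As a rough sketch of where this is heading: if the VM were started with a monitor socket (e.g. -monitor unix:/tmp/my-vm.sock,server,nowait, with a made-up socket path), an ExecStop line could send a clean powerdown request through that socket. This is only an illustration, not the solution we will actually build:

[Service]
ExecStop=/bin/sh -c 'echo system_powerdown | socat - UNIX-CONNECT:/tmp/my-vm.sock'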
To implement this, we will create three new functions stop_systemd_unit, disable_systemd_unit and delete_systemd_unit:
def systemd_unit_path(unit_name: str) -> Path:
    systemd_unit_dir = Path.home() / '.config/systemd/user'
    return systemd_unit_dir / unit_name


def delete_systemd_unit(unit_name: str):
    systemd_unit_path(unit_name).unlink(missing_ok=True)


def stop_systemd_unit(unit_name: str) -> bool:
    return run_command_chain([
        ['systemctl', '--user', 'stop', unit_name],
    ])


def disable_systemd_unit(unit_name: str) -> bool:
    return run_command_chain([
        ['systemctl', '--user', 'disable', unit_name],
    ])
With this, we can adjust our code to stop a VM:
def callback(ch, method, properties, body):
    # ...
    elif data['command'] == 'stop-vm':
        try:
            vm_id = data['options']['vm-id']
        except KeyError:
            print('VM ID not specified', file=sys.stderr)
            return

        unit_name = systemd_unit_name_for_vm(vm_id)

        is_running = execution.systemctl_is_running(unit_name)
        if is_running:
            execution.disable_systemd_unit(unit_name)
            execution.stop_systemd_unit(unit_name)

            response = {
                'status': 'killed',
                'vm-id': vm_id,
            }
        else:
            response = {
                'status': 'error',
                'reason': f'VM "{vm_id}" does not exist',
            }
If you’ve followed closely you might notice that this chain of actions will delete the systemd unit file, but keep the VM’s image file around. This is undesirable: either we want to fully delete the VM, in which case we don’t need its image anymore, or we want to be able to restart it, in which case we can keep the systemd file as well.
Cleaning up the CLI commands and messages
Up to now there was only a distinction between starting and stopping a VM, but now that we can restart stopped VMs we should distinguish between more actions:
- Create a new VM
- (Re-)start a stopped VM
- Stop a running VM
- Delete a VM
So, let’s clean up our current VM messages and create a new set of messages for VM management:
Creating a new VM:

{
    "command": "create-vm",
    "options": {
        "image": "some-image-name"
    }
}

Stopping a running VM:

{
    "command": "stop-vm",
    "options": {
        "vm-id": "id-of-the-vm"
    }
}

Re-starting a stopped VM:

{
    "command": "start-vm",
    "options": {
        "vm-id": "id-of-the-vm"
    }
}

Deleting a VM:

{
    "command": "delete-vm",
    "options": {
        "vm-id": "id-of-the-vm"
    }
}
Now that we know which messages we want to support we can implement the logic. While we’re at it, let’s change the callback function to only include the actual message handling logic:
from typing import Any, Callable, Dict


def callback(ch, method, properties, body):
    command_fn: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
        'list-vms': list_vms,
        'create-vm': create_vm,
        'start-vm': start_vm,
        'stop-vm': stop_vm,
        'delete-vm': delete_vm,
    }

    message = body.decode('utf-8')
    logging.debug('Received message: ' + message)

    data = json.loads(message)

    try:
        command = data['command']
    except KeyError:
        logging.error('No "command" specified in message')
        return

    try:
        fn = command_fn[command]
    except KeyError:
        logging.error(f'Invalid command "{command}" specified')
        return

    options = data.get('options', {})
    try:
        response = fn(options)
        # if the function returned a response without raising,
        # set its execution status to success
        resp_message = {
            'execution-info': {
                'status': 'success'
            },
            'response': response,
        }
    except Exception as e:
        resp_message = {
            'execution-info': {
                'status': 'error',
                # TODO: Only output message if it is an exception generated by us
                'reason': str(e),
            }
        }

    ch.basic_ack(delivery_tag=method.delivery_tag)

    if properties.reply_to:
        ch.basic_publish(
            exchange='',
            routing_key=properties.reply_to,
            properties=pika.BasicProperties(
                correlation_id=properties.correlation_id
            ),
            body=json.dumps(resp_message))
This uses a dictionary mapping from the command names to functions. If a valid command was specified, the associated function will be called. Each function has to take an options dictionary as input (even if it does not use it) and return a JSON-serializable response.
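For illustration, a minimal command function conforming to this signature could look like the following. The body is made up; the real list_vms of the project queries systemd for the running units:

from typing import Any, Dict, List


def list_vms(options: Dict[str, Any]) -> List[str]:
    # options is part of the interface even though this command ignores it
    return ['vm-wiuakczz']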
Now we can also move the code for VM creation into its own function. While we’re at it, we will replace all print statements with proper logging, and instead of returning on errors we will raise exceptions.
If a command raises an exception, this means that an error has happened. In this case we will write it to a sub-structure called execution-info, which holds all meta-information about the execution. The actual response will go into a sub-structure called response.
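On the wire, a successful response then looks like this (matching the example output further below):

{
    "execution-info": {"status": "success"},
    "response": {"status": "starting", "vm-id": "wiuakczz"}
}

An error response only carries the execution-info part with status error and a reason.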
def create_vm(options: Dict[str, Any]) -> Dict[str, str]:
    vm_id = ''.join(
        random.choice(string.ascii_lowercase) for _ in range(8))
    logging.info(f'Starting VM "{vm_id}"')

    try:
        image_name = os.path.basename(options['image'])
    except KeyError:
        raise ValueError('Image not specified')

    try:
        user_image = create_user_image(vm_id, image_name)
    except (OSError, QemuException):
        raise

    mac_addr = interfaces.create_mac_address()
    logging.debug(f'Assigning MAC address "{mac_addr}" to VM "{vm_id}"')

    qemu_config = QemuStartupConfig(
        vm_id=vm_id,
        hda_image=user_image,
        mac_addr=mac_addr,
        vde_folder=Path(VDE_FOLDER))
    unit_name = systemd_unit_name_for_vm(vm_id)
    create_qemu_systemd_unit(unit_name, qemu_config)
    execution.start_systemd_unit(unit_name)

    logging.info(f'Started VM "{vm_id}"')
    return {
        'status': 'starting',
        'vm-id': vm_id,
    }
Inside the stop_vm code we will change the behaviour a bit. If the VM does not exist at all, we will raise an exception. If it was already stopped, we will return the instance status nonetheless, but with an additional hint that it was already stopped. This hint could be displayed to the user and can be ignored by automatic scripts that only wanted to stop the VM (which did succeed, just maybe another process did the same thing at the same time).
To be able to distinguish between these situations we need a new function systemd_unit_exists to check whether a unit exists. Since we install all VM units ourselves, we know which directory they are in and just have to check for file existence.
def systemd_unit_exists(unit_name: str) -> bool:
    return systemd_unit_path(unit_name).is_file()
Calling that function, we can distinguish between a unit that exists but is not running and one that does not exist at all.
def stop_vm(options: Dict[str, Any]) -> Dict[str, str]:
    try:
        vm_id = options['vm-id']
    except KeyError:
        raise ValueError('VM ID not specified')

    unit_name = systemd_unit_name_for_vm(vm_id)

    if not execution.systemd_unit_exists(unit_name):
        raise RuntimeError('VM does not exist')
    elif not execution.systemctl_is_running(unit_name):
        response = {
            'status': 'killed',
            'vm-id': vm_id,
            'hint': f'VM "{vm_id}" was not running',
        }
    else:
        execution.disable_systemd_unit(unit_name)
        execution.stop_systemd_unit(unit_name)

        response = {
            'status': 'killed',
            'vm-id': vm_id,
        }

    return response
Next up is the start_vm function. To start a VM that exists but was previously stopped, we just have to enable and start the systemd service. Again, we will return a hint if the user tried to start an already started VM. It might be more reasonable to return the status running in this case, because the VM is already running. But since it’s also possible that another process started the VM one second ago and it’s still not available, we will just use starting as the response status. Returning the actual VM status can be done once we have proper status management.
def start_vm(options: Dict[str, Any]) -> Dict[str, str]:
    try:
        vm_id = options['vm-id']
    except KeyError:
        raise ValueError('VM ID not specified')

    unit_name = systemd_unit_name_for_vm(vm_id)

    if not execution.systemd_unit_exists(unit_name):
        raise RuntimeError('VM does not exist')
    elif execution.systemctl_is_running(unit_name):
        response = {
            'status': 'starting',
            'vm-id': vm_id,
            'hint': f'VM "{vm_id}" was already started',
        }
    else:
        execution.start_systemd_unit(unit_name)
        execution.enable_systemd_unit(unit_name)

        response = {
            'status': 'starting',
            'vm-id': vm_id,
        }

    return response
The last one is deletion of a VM. This is a combination of stopping the VM (de-registering it from systemd) and deleting all its resources. Up to now the resources are only the VM image and the systemd unit file. In order not to duplicate code, we will call stop_vm, ignore the return message and then clean up the resources.
We will also create a new function user_image_path to construct the target path for a user image from a VM ID. It doesn’t really do much, but it’s required in two different locations and we will probably change the path at some point in the future.
def user_image_path(vm_id: str) -> Path:
    return USER_IMAGE_FOLDER / f'{vm_id}.qcow2'


def delete_vm(options: Dict[str, Any]) -> Dict[str, str]:
    try:
        vm_id = options['vm-id']
    except KeyError:
        raise ValueError('VM ID not specified')

    stop_vm(options)

    unit_name = systemd_unit_name_for_vm(vm_id)
    user_image = user_image_path(vm_id)

    execution.delete_systemd_unit(unit_name)
    user_image.unlink()

    return {
        'status': 'deleted',
        'vm-id': vm_id,
    }
Finally, we also have to adjust the command line interface in client.py. As start-vm, stop-vm and delete-vm all take the same arguments, I will handle them in one block of code.
def main():
    parser = argparse.ArgumentParser(
        description='Manage aetherscale instances')
    subparsers = parser.add_subparsers(dest='subparser_name')

    create_vm_parser = subparsers.add_parser('create-vm')
    create_vm_parser.add_argument(
        '--image', help='Name of the image to create a VM from',
        required=True)
    start_vm_parser = subparsers.add_parser('start-vm')
    start_vm_parser.add_argument(
        '--vm-id', dest='vm_id', help='ID of the VM to start', required=True)
    stop_vm_parser = subparsers.add_parser('stop-vm')
    stop_vm_parser.add_argument(
        '--vm-id', dest='vm_id', help='ID of the VM to stop', required=True)
    delete_vm_parser = subparsers.add_parser('delete-vm')
    delete_vm_parser.add_argument(
        '--vm-id', dest='vm_id', help='ID of the VM to delete', required=True)
    subparsers.add_parser('list-vms')

    args = parser.parse_args()

    if args.subparser_name == 'list-vms':
        response_expected = True
        data = {
            'command': 'list-vms',
        }
    elif args.subparser_name == 'create-vm':
        response_expected = True
        data = {
            'command': 'create-vm',
            'options': {
                'image': args.image,
            }
        }
    elif args.subparser_name in ['start-vm', 'stop-vm', 'delete-vm']:
        response_expected = True
        data = {
            'command': args.subparser_name,
            'options': {
                'vm-id': args.vm_id,
            }
        }
    else:
        parser.print_usage()
        sys.exit(1)

    try:
        with ServerCommunication() as c:
            result = c.send_msg(data, response_expected)
            print(result)
    except pika.exceptions.AMQPConnectionError:
        print('Could not connect to AMQP broker. Is it running?',
              file=sys.stderr)
Time to have fun
Now let’s play around with this a bit.
$ aetherscale-cli create-vm --image ubuntu_base
[{'execution-info': {'status': 'success'}, 'response': {'status': 'starting', 'vm-id': 'ygjpppvk'}}]
$ aetherscale-cli list-vms
[{'execution-info': {'status': 'success'}, 'response': ['vm-ygjpppvk']}]
$ aetherscale-cli create-vm --image ubuntu_base
[{'execution-info': {'status': 'success'}, 'response': {'status': 'starting', 'vm-id': 'cxzafamg'}}]
$ systemctl --user status "aetherscale-vm-*" | grep -B 2 Active
● aetherscale-vm-cxzafamg.service - aetherscale VM cxzafamg
Loaded: loaded (/home/user/.config/systemd/user/aetherscale-vm-cxzafamg.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2020-12-06 12:00:34 CET; 2min 38s ago
--
● aetherscale-vm-ygjpppvk.service - aetherscale VM ygjpppvk
Loaded: loaded (/home/user/.config/systemd/user/aetherscale-vm-ygjpppvk.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2020-12-06 11:51:35 CET; 11min ago
$ aetherscale-cli stop-vm --vm-id cxzafamg
[{'execution-info': {'status': 'success'}, 'response': {'status': 'killed', 'vm-id': 'cxzafamg'}}]
$ aetherscale-cli list-vms
[{'execution-info': {'status': 'success'}, 'response': ['vm-ygjpppvk']}]
$ aetherscale-cli start-vm --vm-id cxzafamg
[{'execution-info': {'status': 'success'}, 'response': {'status': 'starting', 'vm-id': 'cxzafamg'}}]
$ aetherscale-cli stop-vm --vm-id ygjpppvk
[{'execution-info': {'status': 'success'}, 'response': {'status': 'killed', 'vm-id': 'ygjpppvk'}}]
$ aetherscale-cli delete-vm --vm-id ygjpppvk
[{'execution-info': {'status': 'success'}, 'response': {'status': 'deleted', 'vm-id': 'ygjpppvk'}}]
$ aetherscale-cli delete-vm --vm-id cxzafamg
[{'execution-info': {'status': 'success'}, 'response': {'status': 'deleted', 'vm-id': 'cxzafamg'}}]
list-vms only lists running VMs. It would be much nicer if it listed all VMs with their current status. Feel free to implement this if you want; a possible starting point is sketched below.
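A minimal sketch of such a listing, assuming the unit naming scheme and user unit directory from above (the helper name list_vms_with_status is made up for this illustration):

import subprocess
from pathlib import Path
from typing import Dict, List


def list_vms_with_status() -> List[Dict[str, str]]:
    unit_dir = Path.home() / '.config/systemd/user'

    vms = []
    for unit_file in sorted(unit_dir.glob('aetherscale-vm-*.service')):
        vm_id = unit_file.stem[len('aetherscale-vm-'):]
        # systemctl is-active prints e.g. "active", "inactive" or "failed"
        result = subprocess.run(
            ['systemctl', '--user', 'is-active', unit_file.name],
            capture_output=True, text=True)
        vms.append({'vm-id': vm_id, 'status': result.stdout.strip()})

    return vms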
There is one more open issue that can easily be solved: enabled systemd user services do not start automatically on bootup, but only when the user logs in. To run long-running services under a standard user account, one has to enable lingering for that account.
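Lingering is enabled with loginctl; run this as root (or via sudo) and substitute the account that runs the VMs:

$ sudo loginctl enable-linger your-user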