Proponents of ephemeral servers say that one should treat infrastructure like cattle instead of pets. The counter-argument is that a farmer would not simply let their investments (the animals) die, but would at least try to give them some medicine. While I don’t perform changes on individual servers in my fleet, there are definitely times when I need to SSH into them and check something.

For example, today I saw a weird spike in the maximum response time of a service in my monitoring. For a brief period it went from 5 seconds to around 80 seconds, and I was uncertain whether this might be an ongoing issue. But which of the six servers in the fleet was the culprit?

The simple solution for me was to just log in to all servers of the AWS auto scaling group with clusterssh. With the AWS command line tool combined with jq this is extremely simple: we first poll the auto scaling group for the instance IDs of all its servers, then fetch the public IP addresses of those instances. Since the instances run Ubuntu, we prepend ubuntu as the user to each IP address and pass the whole list to cssh:

#!/usr/bin/env bash

name=$1
region=$2

# Fetch the instance IDs of all instances in the auto scaling group
instances=$(aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names "$name" \
    --region "$region" | jq -r '.AutoScalingGroups[0].Instances[].InstanceId')

# Look up the public IP addresses of those instances ($instances is deliberately
# unquoted so the IDs are passed as separate arguments), prepend the ubuntu user
# and open a clusterssh session to all of them
aws ec2 describe-instances --instance-ids $instances --region "$region" \
    | jq -r '.Reservations[].Instances[].PublicIpAddress' \
    | awk '{print "ubuntu@" $0}' | xargs cssh
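Assuming the script is saved as something like cssh-asg.sh (the file name, group name and region below are just placeholder examples), it is called with the auto scaling group name and the AWS region:

./cssh-asg.sh my-service-asg eu-central-1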

Voila, we can now run commands on all servers at the same time.

I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send me an e-mail to blog@stefan-koch.name.