Get RSS Feeds for YouTube Channels with a Shell Script

Sometimes I’d like to follow YouTube channels without using the YouTube website. Luckily, YouTube offers RSS feeds; it just doesn’t advertise them publicly (so let’s see how long they continue to exist…).

The URL of an RSS feed of a channel has the following schema:

https://www.youtube.com/feeds/videos.xml?channel_id=<channel-id>
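
As a minimal sketch, the schema above can be wrapped in a small shell function. The function name feed_url is hypothetical and not part of the final script; the channel ID is the one from the test run at the end of this post.

```shell
# Hypothetical helper: build the RSS feed URL for a given channel ID.
feed_url() {
    printf 'https://www.youtube.com/feeds/videos.xml?channel_id=%s\n' "$1"
}

feed_url 'UCqHc12tVGhzBtlMUq7MR7wA'
```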

For channel URLs I know the following two schemas:

https://www.youtube.com/channel/<channel-id>
https://www.youtube.com/c/<channel-name>

For the first case, it’s really simple. We just need to extract the channel ID from the URL and generate the RSS feed URL using that channel ID. I used the Linux commands grep, rev and cut to achieve this.

We can first grep the URL to make sure that the user has passed a channel URL that contains a channel ID. If this is the case, we extract the channel ID as the last component after a slash. To get the last component, we reverse the string, take the first component using cut and then reverse the result again.

# Quote the variables so that special characters in the URL are not expanded by the shell
channel_with_id=$(echo "$url" | grep 'youtube.com/channel/[0-9a-zA-Z_-][0-9a-zA-Z_-]*' --only-matching)
if [[ -n "$channel_with_id" ]]; then
    channel_id=$(echo "$channel_with_id" | rev | cut -d/ -f1 | rev)
    echo "https://www.youtube.com/feeds/videos.xml?channel_id=$channel_id"
    exit 0
fi
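
The reverse, cut, reverse trick can be tried in isolation. The following is only a sketch: the function names last_component and last_component_expansion are hypothetical and do not appear in the script. The second function shows an alternative I did not use above, the shell's parameter expansion, which strips the longest prefix ending in a slash.

```shell
# Hypothetical helper: return the last slash-separated component of a string
# using the rev | cut | rev approach described above.
last_component() {
    printf '%s\n' "$1" | rev | cut -d/ -f1 | rev
}

# Alternative sketch: "${1##*/}" removes the longest prefix matching "*/",
# which leaves only the text after the final slash.
last_component_expansion() {
    printf '%s\n' "${1##*/}"
}

last_component 'youtube.com/channel/UCqHc12tVGhzBtlMUq7MR7wA'
last_component_expansion 'youtube.com/channel/UCqHc12tVGhzBtlMUq7MR7wA'
```

Both calls should print the channel ID only. The parameter expansion avoids starting three extra processes, but the pipeline version is arguably easier to read if you are not used to the expansion syntax.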

The situation for named channels is a bit more difficult, but we can also solve it quite quickly using sed. The page source of a named channel contains the channel ID in a meta tag with itemprop="channelId". We can address this task in two steps. First, drop all lines that do not contain the string itemprop="channelId". Second, on all lines that do contain it, match the full tag and everything surrounding it, and replace the whole line with the captured channel ID.

Let me explain some sed logic here which might not be so common. First of all, a sed instruction consists of an optional address and a command.

When you use /search/d to delete lines containing search, the pattern /search/ acts as an address for the command d. It tells d which lines it should delete.

The s command is most commonly seen without an address, but it accepts one as well. When you use s/search/replace/g to replace all occurrences of search with replace in a text, you are not supplying an address; everything following s belongs to the s command. Not supplying an address tells sed to execute the command on every line. With an address, you can restrict the substitution, for example to lines containing the string here: /here/s/search/replace/g.

Let’s quickly try this out before we continue with the RSS feed script:

$ echo "search for search here" | sed "/here/s/search/replace/g"
replace for replace here

$ echo "search for search" | sed "/here/s/search/replace/g"
search for search

And that’s exactly the pattern used in the following sed command. We first filter the lines down to lines that contain the string we are looking for (hopefully only a single line matches), and then we execute a more complex substitution command on that line.

# Quote the variables so that special characters in the URL are not expanded by the shell
channel_with_name=$(echo "$url" | grep 'youtube.com/c/[0-9a-zA-Z_-][0-9a-zA-Z_-]*' --only-matching)
if [[ -n "$channel_with_name" ]]; then
    channel_id=$(curl --silent "$url" | sed -e '/itemprop="channelId"/ ! d' -e '/itemprop="channelId"/s/.*<meta itemprop="channelId" content="\([0-9a-zA-Z_-][0-9a-zA-Z_-]*\)">.*/\1/')
    echo "https://www.youtube.com/feeds/videos.xml?channel_id=$channel_id"
    exit 0
fi
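
To see the two sed expressions at work without a network request, they can be run against a small HTML snippet. This is only a sketch: the function name extract_channel_id is hypothetical, and the sample markup is made up for illustration, so the real page source may look different.

```shell
# Hypothetical helper: read HTML on stdin and print the channel ID found in
# the itemprop="channelId" meta tag, using the same sed expressions as above.
extract_channel_id() {
    sed -e '/itemprop="channelId"/ ! d' \
        -e '/itemprop="channelId"/s/.*<meta itemprop="channelId" content="\([0-9a-zA-Z_-][0-9a-zA-Z_-]*\)">.*/\1/'
}

# Made-up sample page; only the third line contains the meta tag.
sample_html='<html><head>
<title>Example channel</title>
<meta itemprop="name" content="Example"><meta itemprop="channelId" content="UCqHc12tVGhzBtlMUq7MR7wA"><link rel="canonical" href="#">
</head></html>'

printf '%s\n' "$sample_html" | extract_channel_id
```

The first expression deletes the three lines without the tag, and the second one reduces the remaining line to the captured ID. If no line matches, the output is empty, which the script above does not check for.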

The complete script only contains some more boilerplate:

#!/usr/bin/env bash

if [[ -z $1 ]]; then
    echo "Please specify a channel URL"
    exit 1
fi

url=$1

# Case 1: the URL already contains the channel ID
channel_with_id=$(echo "$url" | grep 'youtube.com/channel/[0-9a-zA-Z_-][0-9a-zA-Z_-]*' --only-matching)
if [[ -n "$channel_with_id" ]]; then
    channel_id=$(echo "$channel_with_id" | rev | cut -d/ -f1 | rev)
    echo "https://www.youtube.com/feeds/videos.xml?channel_id=$channel_id"
    exit 0
fi

# Case 2: named channel, so fetch the page and read the ID from its meta tag
channel_with_name=$(echo "$url" | grep 'youtube.com/c/[0-9a-zA-Z_-][0-9a-zA-Z_-]*' --only-matching)
if [[ -n "$channel_with_name" ]]; then
    channel_id=$(curl --silent "$url" | sed -e '/itemprop="channelId"/ ! d' -e '/itemprop="channelId"/s/.*<meta itemprop="channelId" content="\([0-9a-zA-Z_-][0-9a-zA-Z_-]*\)">.*/\1/')
    echo "https://www.youtube.com/feeds/videos.xml?channel_id=$channel_id"
    exit 0
fi

echo "Could not extract RSS feed from URL"
exit 1

And here is a test execution on a channel about table tennis:

$ ./youtube-channel-rss.sh https://www.youtube.com/c/TomLodziak
https://www.youtube.com/feeds/videos.xml?channel_id=UCqHc12tVGhzBtlMUq7MR7wA

I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send me an e-mail to blog@stefan-koch.name.