Get RSS Feeds for Youtube Channels with a Shell Script
Sometimes, I’d like to follow Youtube channels, but I don’t want to use the Youtube website for that. Luckily, Youtube offers RSS feeds; it just doesn’t advertise them publicly (so let’s see how long they continue to exist…).
The URL of an RSS feed of a channel has the following schema:
https://www.youtube.com/feeds/videos.xml?channel_id=<channel-id>
For channel URLs I know the following two schemas:
https://www.youtube.com/channel/<channel-id>
https://www.youtube.com/c/<channel-name>
For the first case, it’s really simple. We just need to extract the
channel ID from the URL and generate the RSS feed URL using that
channel ID. I used the Linux commands grep, rev and cut to achieve this.
We can first grep the URL to make sure that the user has passed a channel
URL with a channel ID to the program. If this is the case, we just extract
the channel ID as the last component after a slash. To get the last
component, we reverse the string, get the first component using cut and
then reverse the result again.
# Quote the variables so that unusual characters in the URL are not split or globbed.
channel_with_id=$(echo "$url" | grep --only-matching 'youtube.com/channel/[0-9a-zA-Z_-][0-9a-zA-Z_-]*')
if [[ -n "$channel_with_id" ]]; then
    channel_id=$(echo "$channel_with_id" | rev | cut -d/ -f1 | rev)
    echo "https://www.youtube.com/feeds/videos.xml?channel_id=$channel_id"
    exit 0
fi
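As a side note, the rev/cut/rev pipeline is not the only way to get the last component. The following is a minimal sketch of an alternative that uses shell parameter expansion instead; the function name last_component is made up for this illustration and is not part of the script in this post.

```shell
# Hypothetical helper (not part of the original script): print the last
# slash-separated component of its first argument.
last_component() {
    # ${1##*/} strips the longest prefix that matches "*/",
    # i.e. everything up to and including the last slash.
    printf '%s\n' "${1##*/}"
}

last_component "youtube.com/channel/UCqHc12tVGhzBtlMUq7MR7wA"
```

This avoids spawning three extra processes, at the cost of being a little more cryptic to read than the pipeline.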
The situation for named channels is a bit more difficult, but we can also
handle it quite quickly using sed. The pages of named channels contain the
channel ID in their source code in a meta tag with itemprop="channelId".
We can now address this task in two steps: First, drop all lines that do
not contain the string itemprop="channelId". Then, for all lines that do
contain the string itemprop="channelId", match the full tag and all its
surroundings and replace everything (i.e. the whole line) with the matched
channel ID.
Let me explain some sed logic here which might not be so common. First of all, a sed command consists of an optional address and a command.
When you commonly use /search/d to delete lines containing search, the
string /search/ acts as an address to the command d. It tells d which
lines it should delete.
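To make the role of the address concrete, here is a small sketch on a made-up three-line input. It also shows the negated form with !, which inverts the address so that the command applies to every line that does not match. The helper names sample, delete_matching and keep_matching are only for illustration.

```shell
# Hypothetical three-line sample input.
sample() { printf 'keep me\nsearch result\nkeep me too\n'; }

# /search/d deletes the lines that match the address.
delete_matching() { sed '/search/d'; }

# /search/!d negates the address: every line that does NOT match is deleted.
keep_matching() { sed '/search/!d'; }

sample | delete_matching   # prints "keep me" and "keep me too"
sample | keep_matching     # prints "search result"
```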
The s command is most commonly seen without an address, but it does
also accept an address. When you are using s/search/replace/g to replace
all occurrences of search with replace in a text, you are not supplying
an address. Everything following s belongs to the s command.
Not supplying an address effectively tells sed to execute the command on
any line.
However, s also accepts an address. For example, you could say that
you only want to replace search with replace on lines containing the
string here: /here/s/search/replace/g.
Let’s quickly try this out before we continue with the RSS feed script:
$ echo "search for search here" | sed "/here/s/search/replace/g"
replace for replace here
$ echo "search for search" | sed "/here/s/search/replace/g"
search for search
And that’s exactly the pattern used in the following sed command. We first filter the lines down to lines that contain the string we are looking for (hopefully only a single line matches), and then we execute a more complex substitution command on that line.
channel_with_name=$(echo "$url" | grep --only-matching 'youtube.com/c/[0-9a-zA-Z_-][0-9a-zA-Z_-]*')
if [[ -n "$channel_with_name" ]]; then
    # First expression: delete every line without the meta tag.
    # Second expression: replace the remaining line with the captured channel ID.
    channel_id=$(curl --silent "$url" | sed -e '/itemprop="channelId"/ ! d' -e '/itemprop="channelId"/s/.*<meta itemprop="channelId" content="\([0-9a-zA-Z_-][0-9a-zA-Z_-]*\)">.*/\1/')
    echo "https://www.youtube.com/feeds/videos.xml?channel_id=$channel_id"
    exit 0
fi
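The sed part can be tried out without any network access. The sketch below wraps the two expressions in a function that reads from standard input and feeds it a hand-written HTML snippet. Both the function name extract_channel_id and the sample markup are assumptions for illustration; the real page source is much larger and its markup may differ or change.

```shell
# Hypothetical wrapper around the two sed expressions; reads HTML on stdin.
extract_channel_id() {
    # 1st expression: delete all lines that do not contain the meta tag.
    # 2nd expression: replace the whole remaining line with the captured ID.
    sed -e '/itemprop="channelId"/!d' \
        -e 's/.*<meta itemprop="channelId" content="\([0-9a-zA-Z_-][0-9a-zA-Z_-]*\)">.*/\1/'
}

# Made-up sample page, only meant to exercise the pattern.
sample_html='<html>
<head><title>Some channel</title>
<meta itemprop="channelId" content="UCqHc12tVGhzBtlMUq7MR7wA"></head>
</html>'

printf '%s\n' "$sample_html" | extract_channel_id
```

Since the first expression has already removed every non-matching line, the address in front of s is left out here; keeping it, as in the script, does no harm.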
The complete script only contains some more boilerplate:
#!/usr/bin/env bash

if [[ -z $1 ]]; then
    echo "Please specify a channel URL"
    exit 1
fi

url=$1

# Case 1: the URL already contains the channel ID.
channel_with_id=$(echo "$url" | grep --only-matching 'youtube.com/channel/[0-9a-zA-Z_-][0-9a-zA-Z_-]*')
if [[ -n "$channel_with_id" ]]; then
    channel_id=$(echo "$channel_with_id" | rev | cut -d/ -f1 | rev)
    echo "https://www.youtube.com/feeds/videos.xml?channel_id=$channel_id"
    exit 0
fi

# Case 2: named channel, so fetch the page and read the channel ID from the meta tag.
channel_with_name=$(echo "$url" | grep --only-matching 'youtube.com/c/[0-9a-zA-Z_-][0-9a-zA-Z_-]*')
if [[ -n "$channel_with_name" ]]; then
    channel_id=$(curl --silent "$url" | sed -e '/itemprop="channelId"/ ! d' -e '/itemprop="channelId"/s/.*<meta itemprop="channelId" content="\([0-9a-zA-Z_-][0-9a-zA-Z_-]*\)">.*/\1/')
    echo "https://www.youtube.com/feeds/videos.xml?channel_id=$channel_id"
    exit 0
fi

echo "Could not extract RSS feed from URL"
exit 1
And here is a test execution on a channel about table tennis:
$ ./youtube-channel-rss.sh https://www.youtube.com/c/TomLodziak
https://www.youtube.com/feeds/videos.xml?channel_id=UCqHc12tVGhzBtlMUq7MR7wA