Lowercase all Tags in Jekyll Posts using sed

I’m currently reading the book sed & awk by Dale Dougherty and Arnold Robbins, and already in the third chapter I have learnt amazing new features. Like probably many other people, up to now I have only used sed for simple one-liner replacements with the sed commands s or d. Whenever I wanted to achieve something more, I had to use a search engine, look up a few answers on Stack Overflow and then adjust them. But that’s not the way I want to learn my tools! I want to learn them thoroughly.

One use case I currently have is lowercasing all tags in my Jekyll blog, and only after reading a bit of the book did I recognize that this can be done with sed. It seems that I sometimes wrote tags with a capital first letter and sometimes not. Jekyll displays warnings about this when I generate my site, and only one of the two variants actually gets listed on the tag overview pages.

sed

To learn sed correctly, I bought sed & awk. The book starts with an explanation of the origins of sed and awk, which are rooted in the editor ed. A sed command has a very simple structure: address and command. The address can be omitted, or there can be one address or an address range. Addresses can be given as line numbers or as regular expressions. Regular expressions are surrounded by slashes.

Examples of addresses or address ranges are thus (maybe not all, but the ones I know so far):

(empty): execute command on all lines
1: explicit line number
/ab*a/: regular expression
S,E: range selection (where S and E can be explicit line numbers or regular expressions)

A sed command can thus be as simple as 1d (delete the first line), or as complex as /^a/,/^c/s/foo/bar/g (starting at a line that begins with a and ending at the next line that begins with c, replace all occurrences of foo with bar). But in both cases it has the same simple basic structure: address and command.
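To get a feeling for the address forms listed above, a small experiment like the following can help. It is only a sketch: the four sample lines and the variable names are made up for illustration, and each command is run on the same input.

```shell
# Made-up sample input: four lines, fed into sed once per address form.
input='abba
foo one
cab foo
foo two'

# No address: the command applies to every line.
all=$(printf '%s\n' "$input" | sed 's/foo/bar/')

# Line number address: delete only the first line.
no_first=$(printf '%s\n' "$input" | sed '1d')

# Regular expression address: print only lines matching /ab*a/.
# -n suppresses the default output, p prints the selected lines.
regex=$(printf '%s\n' "$input" | sed -n '/ab*a/p')

# Numeric range: print lines 2 through 3.
range=$(printf '%s\n' "$input" | sed -n '2,3p')

# Regex range: substitute only from a line starting with "a"
# up to and including the next line starting with "c".
regex_range=$(printf '%s\n' "$input" | sed '/^a/,/^c/s/foo/bar/g')

printf '%s\n' "$regex_range"
```

In the last command the fourth line keeps its foo, because the range has already ended at the line starting with c.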

Commands can also be grouped in braces after an address, and ranges can be nested this way, which is most readable in a multiline script. Let’s look at an example that would also work as a single line script but is more readable in multiple lines. Let’s assume we have a document where lists are surrounded by the markers LIST and END. We want to replace the - character of each list item with a *.

For example given the input

These are some important items to remember:

LIST
- foo
- bar
END

Be sure to remember them carefully.

the output should be:

These are some important items to remember:

LIST
* foo
* bar
END

Be sure to remember them carefully.

The following sed script can achieve this:

/^LIST$/,/^END$/{
     s/-/*/
}
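One way to try this out is to store the script in a file and pass it to sed with -f. The following is a minimal sketch; the temporary file and the sample document (including the extra - line outside the list) are assumptions made for the demonstration.

```shell
# Write the script to a temporary file so it can be passed with -f.
script=$(mktemp)
cat > "$script" <<'EOF'
/^LIST$/,/^END$/{
    s/-/*/
}
EOF

# Sample document; the "-" lines outside LIST/END must stay untouched.
doc='A well-known fact:

LIST
- foo
- bar
END

- not a list item'

result=$(printf '%s\n' "$doc" | sed -f "$script")
printf '%s\n' "$result"
rm -f "$script"
```

Only the two lines between LIST and END change, since the substitution is never executed for lines outside the range.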

Jekyll Post Tags

A post in Jekyll has a front matter, a region for metadata that is not interpreted as Markdown.

Here is an example of a Jekyll front matter:

---
author: Author
date: 2022-01-01 00:00:00+0000
title: "Some blog title"
layout: post
categories:
- Category
tags:
- tag
- tag2
---

As we can see, the front matter is delimited by lines of dashes. The Jekyll documentation does not mention whether anything other than exactly three dashes is allowed, but I’d rather be flexible and allow more than three in the regular expression.

This already gives us a first idea for our sed script to lowercase tags:

/^---/,/^---/{

}

This range selects everything from the first front matter marker to the second one. However, that’s not enough yet. If we lowercased all lines in this selection, we would also lowercase the author name, the post title and so on.

We can throw in a second range to select all lines from the one starting with tags: up to and including the next line that does not start with a -. That line is the next parameter in the front matter; if there is none, the selection stops at the end of the outer range.

/^---/,/^---/{
    /^tags:/,/^[^-]/{

    }
}

This selector should give us excerpts similar to the following ones:

tags:
- tag1
- tag2
---

tags: tag1 tag2
categories: Category

tags:
- tag1
- tag2
categories: Category
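To check what such a nested range really selects, it can be run with -n and the p command, so that only the selected lines are printed. This is a sketch with a made-up sample post; both outer addresses are anchored at the start of the line with ^.

```shell
# Made-up sample post with tags followed by another parameter.
post='---
title: "Some Blog Title"
tags:
- Tag1
- tag2
categories:
- Category
---
Body text'

# -n suppresses normal output; p prints only what the inner range selects.
selected=$(printf '%s\n' "$post" | sed -n '/^---/,/^---/{
    /^tags:/,/^[^-]/p
}')
printf '%s\n' "$selected"
```

The output contains the tags: line, both tag lines and the categories: line that ends the inner range, but not the - Category line after it.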

So, we have one more problem to solve: We want to lowercase all lines starting with tags: or with a -, but not other lines like the line starting with categories:.

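We can achieve this with yet another selection! /^-/ will match all lines starting with - and /^tags:/ will match all lines starting with tags:. On those lines, s/.*/\L&/ replaces the whole line with its lowercased version: & stands for the matched text and \L, a GNU sed extension, converts what follows to lowercase.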

/^---/,/^---/{
    /^tags:/,/^[^-]/{
        /^-/s/.*/\L&/
        /^tags:/s/.*/\L&/
    }
}
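To apply the script to real files, it can be saved and run with sed -i, which edits files in place. The sketch below assumes GNU sed (needed for \L and for -i without an argument) and uses a scratch directory with hypothetical file names instead of a real _posts folder, so nothing important gets overwritten while experimenting.

```shell
# Hypothetical scratch directory standing in for a Jekyll _posts folder.
posts=$(mktemp -d)
cat > "$posts/2022-01-01-example.md" <<'EOF'
---
author: Author
title: "Some Blog Title"
categories:
- Category
tags:
- Tag
- tag2
---
Post body stays As Is.
EOF

# The lowercasing script, stored in a file for reuse.
cat > "$posts/lowercase-tags.sed" <<'EOF'
/^---/,/^---/{
    /^tags:/,/^[^-]/{
        /^-/s/.*/\L&/
        /^tags:/s/.*/\L&/
    }
}
EOF

# -i edits every matching file in place (GNU sed).
sed -i -f "$posts/lowercase-tags.sed" "$posts"/*.md
result=$(cat "$posts/2022-01-01-example.md")
printf '%s\n' "$result"
rm -rf "$posts"
```

Only - Tag becomes - tag here; the author, the title, the category and the post body keep their capital letters. Since -i overwrites files, it is safest to run it on a repository with no uncommitted changes.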

Open Issues

There is one problem with the script which I cannot solve with my current sed knowledge. The outer range does not only match the front matter, but also any later section delimited by similar-looking lines of dashes. When I ran the script on all my blog posts, this produced one false positive: an RSA key sample I had once posted.

For me, this is good enough. I can undo the false positive and then commit all changes to the repository.
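For anyone who wants to avoid such false positives, one possible approach is to start the outer range at line 1 instead of at a regular expression, so that it can only match once per file. This is only a sketch under the assumption that every post begins with its front matter marker on the very first line; the sample post below is made up, and GNU sed is assumed for \L.

```shell
# Made-up post with a dash-delimited block after the front matter.
post='---
tags:
- Tag
---
Body

-----BEGIN EXAMPLE-----
tags:
- NotATag
-----END EXAMPLE-----'

# The range starts at line 1 and ends at the next line starting with ---,
# so later dash-delimited sections are never selected.
result=$(printf '%s\n' "$post" | sed '1,/^---/{
    /^tags:/,/^[^-]/{
        /^-/s/.*/\L&/
        /^tags:/s/.*/\L&/
    }
}')
printf '%s\n' "$result"
```

When several files are edited with GNU sed -i, line numbers restart for each file, so the range should apply to each post separately.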

I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send an e-mail to blog@stefan-koch.name.