A Static Site Generator

Why?

A little while ago I wrote about micropub, which meant I needed a way to integrate dynamic content into this static site.

Of course the easiest way is to just let some dynamic part create a new file (or append to one) in the input folder and then rebuild the website, but I didn't find that very elegant. I did look into it (especially hugo's data feature) and it seemed ok, but not great.

Anyway, I tried to upgrade hugo from a 4-year-old version, and while it worked, some things felt needlessly changed, I had to rewrite some templates, and overall I was a little annoyed at the thousand features I didn't need and the amount of docs to be read.

That was a while ago. A week ago I thought it might be fun to see just how much time and effort it would take to write a static site generator from scratch, because for all the myriad blog systems I'd written in the past, I had never written a static site generator.

And I think I'm finished now, so I decided to write a bit about nextgen.

Just the facts

  • Written in Rust
  • 333 lines of code (plus a few external crates, of course)
  • Fast enough (250 markdown pages in 0.1s)
  • It can build this website more or less identically: not bit for bit, but structurally the same
  • It took roughly a week to build it, a few hours each night
  • Seriously, don't use this in its current form

How it evolved

The first commit still uses a lot of regex parsing and was mostly playing around. But the basic flow is already the same, and the static asset handling hasn't really changed: use the walkdir crate to look for matching files in a directory and copy them to ./public/static.
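
A simplified sketch of that asset copying with the walkdir crate; the directory names follow the description above, the rest is an assumption:

```rust
// Copy everything under ./static into ./public/static, keeping the
// directory structure. Error handling is reduced to the bare minimum.
use std::fs;
use std::path::Path;
use walkdir::WalkDir;

fn copy_static() -> std::io::Result<()> {
    for entry in WalkDir::new("static").into_iter().filter_map(Result::ok) {
        if entry.file_type().is_file() {
            let dest = Path::new("public").join(entry.path());
            if let Some(parent) = dest.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::copy(entry.path(), &dest)?;
        }
    }
    Ok(())
}
```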

The pulldown_cmark crate for Markdown parsing works great.
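
Using it boils down to a couple of lines; roughly this:

```rust
// Render a Markdown string to HTML with pulldown_cmark's default options.
use pulldown_cmark::{html, Parser};

fn render_markdown(input: &str) -> String {
    let mut out = String::new();
    html::push_html(&mut out, Parser::new(input));
    out
}
```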

The second commit introduces the tera crate for templating, which is similar to Jinja. I tried liquid at first, but it really didn't click.

This is the final form of the templates.
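
For anyone who hasn't used tera: a minimal round trip looks roughly like this; the template name and context variables are made up for illustration, not taken from my templates:

```rust
// Load all templates from templates/ and render one of them with a
// small context. Hypothetical names, purely for illustration.
use tera::{Context, Tera};

fn render_page(title: &str, content: &str) -> tera::Result<String> {
    let tera = Tera::new("templates/**/*.html")?;
    let mut ctx = Context::new();
    ctx.insert("title", title);
    ctx.insert("content", content);
    tera.render("page.html", &ctx)
}
```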

Then I introduced sections, which means that, for example, everything under /blog/ is grouped under the blog section. Also, for years I had kind of wondered how websites calculate the "takes 5 minutes to read" estimate, but never looked it up. Well, hugo has that feature, so I searched a bit and settled on a very basic algorithm: count the words in the Markdown file and divide by 200. This was close to hugo's output but not perfect, so I simply add one minute and it seems to mostly match.
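
In code the whole feature is essentially one line; a sketch of the algorithm as described above:

```rust
// Estimated reading time in minutes: word count / 200, plus one minute
// to roughly match hugo's numbers.
fn reading_time(markdown: &str) -> usize {
    markdown.split_whitespace().count() / 200 + 1
}
```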

The next planned dependency was toml to read the front matter of the markdown files (and also the site config), which also pulled in serde.
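
A sketch of what that front matter handling can look like; the +++ delimiter and the field names are assumptions for illustration, not necessarily my exact format:

```rust
// Split a raw markdown file into TOML front matter and body, assuming
// the front matter is fenced by +++ lines, and deserialize it via serde.
use serde::Deserialize;

#[derive(Deserialize)]
struct FrontMatter {
    title: String,
    date: Option<String>,
}

fn split_front_matter(raw: &str) -> Option<(FrontMatter, &str)> {
    let rest = raw.strip_prefix("+++")?;
    let (front, body) = rest.split_once("+++")?;
    let front_matter = toml::from_str(front).ok()?;
    Some((front_matter, body))
}
```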

Then I noticed that I should generate directories with an index.html inside instead of a file called e.g. /foo.html, thus this boring commit.
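
The change itself is tiny; a hypothetical sketch of the new output path mapping:

```rust
// foo.md now ends up as public/foo/index.html instead of public/foo.html.
use std::path::{Path, PathBuf};

fn output_path(slug: &str) -> PathBuf {
    Path::new("public").join(slug).join("index.html")
}
```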

Next up were the section index pages (live example). If there's a file called _index.md in the blog folder, a section index gets generated; otherwise nothing happens. I got the idea for how this is handled from Zola, because of course I looked at a few static site generators this week.
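
The check itself is as simple as it sounds; a sketch, with the function name made up:

```rust
// Only generate a section index if the section directory contains _index.md.
use std::path::Path;

fn has_section_index(section_dir: &Path) -> bool {
    section_dir.join("_index.md").is_file()
}
```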

I couldn't really work with or format dates yet, so the next commit introduced the chrono crate, as expected.
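
A sketch of the kind of date handling this enables; the format strings are examples, not necessarily the ones I ended up using:

```rust
// Parse a date from the front matter and format it for the templates.
use chrono::NaiveDate;

fn format_date(raw: &str) -> Option<String> {
    let date = NaiveDate::parse_from_str(raw, "%Y-%m-%d").ok()?;
    Some(date.format("%B %e, %Y").to_string())
}
```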

Working on the section index pages made me restructure how I keep the parsed pages in memory, since they can only be generated once all the pages in their section have been parsed. I first implemented them in Rust and not in the template.

Commit #9 introduced section rss feeds (Atom actually), which was pretty straightforward.

Then the section index pages were converted to the template language and everything was cleaned up a bit.

Here all parsed markdown pages are finally sorted into a defined section: _index for /index.html, _default for normal pages (like my about page), and the section name for e.g. /blog/ pages. Everything that is not a section index or the index also goes into _pages. This makes it possible to show the latest posts on the index page, in my case the last 10 from the blog and stack sections.
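
A sketch of that grouping, assuming a Page struct that already knows its section; the field names are made up, the special keys follow the scheme above:

```rust
// Group parsed pages by section. Regular pages additionally go into the
// flat _pages list, which feeds the "latest posts" block on the index page.
use std::collections::HashMap;

#[derive(Clone)]
struct Page {
    section: String,        // "_index", "_default", or e.g. "blog"
    is_section_index: bool,
    // ... the rest of the parsed content and template variables
}

fn group_pages(pages: Vec<Page>) -> HashMap<String, Vec<Page>> {
    let mut by_section: HashMap<String, Vec<Page>> = HashMap::new();
    for page in pages {
        if page.section != "_index" && !page.is_section_index {
            by_section.entry("_pages".to_string()).or_default().push(page.clone());
        }
        by_section.entry(page.section.clone()).or_default().push(page);
    }
    by_section
}
```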

Next came a little refactoring to eliminate four copies of the same four lines that write the resulting pages to disk.
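
The refactoring boils down to a small helper along these lines (a sketch, not the actual signature from the repo):

```rust
// Create the parent directory if needed and write the rendered page.
use std::fs;
use std::path::Path;

fn write_page(path: &Path, html: &str) -> std::io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, html)
}
```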

And finally only the main RSS feed is missing, but the list of pages is already there, so it's easy.

An overview

And that's how the final program looks:

  • read config.toml
  • let tera read all the template files
  • copy static files
  • identify the sections
  • parse every markdown page and prepare the variables for the templates
  • if it's a normal page, just generate it
  • if it's the index or a section index, defer writing, but save the variables
  • sort by section, but keep a list of all pages
  • go over the sections and generate the section index and the section rss feed
  • generate the index page and the main rss feed

I haven't done a real comparison in the browser yet, only looked at colordiff output for the different types of pages, so there might be a few small bugfixes left, but the goal of replicating all of hugo's features that I actually use is achieved. I don't generate CSS from SASS, but looking at my Makefile, hugo doesn't even do that anymore...

Next steps?

I'm not sure if I'll switch to nextgen or if I will add the features I need for micropub, but working with Rust for a few hours made me understand a few things better and I can really use the training.

So yes, I do think hugo went a little off the rails in the last few years, but I am still very happy with it and I'm thankful it was written; it was the best one I found back then.

But I dreaded updating, so I never did it, for over 4 years. The good thing is that it only parses my own content and is not a network service, so I don't need to care much about security.