Soupault 4.8.0 release


Date:

Soupault 4.8.0 is available for download from my own server and from GitHub releases. It is a small release. It fixes a bug in page body HTML parsing that could cause weird behavior when pages contained tags like <style>. It also makes index entry data available to all hooks after the indexing stage, and adds a small helper function, HTML.inner_text(). However, there are big plans for the next release (whenever it is made), so read on for details.

New features and improvements

New plugin API functions

Bug fixes

Future plans

I’m still glad that most design decisions I made early in the development process stood the test of time. However, some clearly did not. One of them is the idea of always processing all pages sequentially.

Here’s the thing: most static site generators load all pages into memory before they do any further processing. They load all data, then process it, then save it all to disk.

Soupault, as of now, doesn’t. It loads only the list of page files, then reads each page file individually, processes it, saves the generated HTML to disk, and moves on to the next page. To provide site index data to index pages, it splits the page file list into content pages and index pages (typically, pages named index.*). It processes all content pages first, then gets to the index pages.
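The processing order described above can be modeled in a few lines. This is a simplified sketch, not soupault's actual code (soupault is written in OCaml). Several parts are assumptions made for illustration: the `index.*` file name test, the `<h1>`-based "metadata extraction", and the `read_page`/`write_page` callbacks. It also ignores sections, so every index page gets the same entries, whereas soupault gives an index page the entries of its own section.

```python
import re
from pathlib import PurePosixPath

def is_index_page(path):
    # Assumption: a page is an index page if its file name is index.* (index.html, index.md, ...)
    return PurePosixPath(path).stem == "index"

def extract_entry(path, html):
    # Hypothetical metadata rule: the title is the text of the first <h1>
    m = re.search(r"<h1>(.*?)</h1>", html)
    return {"url": path, "title": m.group(1) if m else None}

def build_site(page_files, read_page, write_page):
    """Process pages one at a time: all content pages first, then all index pages.
    Returns the order in which pages were processed."""
    content_pages = [p for p in page_files if not is_index_page(p)]
    index_pages = [p for p in page_files if is_index_page(p)]
    site_index = []
    order = []
    for path in content_pages:
        html = read_page(path)  # only this one page is held in memory
        site_index.append(extract_entry(path, html))
        # Written before the index is complete, so a content page never sees it
        write_page(path, html)
        order.append(path)
    for path in index_pages:
        html = read_page(path)
        # Every content page has been seen by now, so the index is complete
        # (simplification: no per-section filtering)
        items = "".join(f"<li>{e['title']}</li>" for e in site_index)
        write_page(path, f"{html}<ul>{items}</ul>")
        order.append(path)
    return order
```

Content pages are written before the index is finished. That is the compromise discussed next: with this order, a content page has nothing to read from.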

There are a few reasons why I did it that way:

However, there’s a huge compromise: content pages don’t have access to site index data by default, because the complete site index only exists once all content pages have been processed. In SSGs that use “front matter”, it’s trivial to provide every page with its own metadata. Soupault, however, extracts metadata from the HTML itself, and widgets can create new metadata. For this reason, there’s an option to do index extraction only after certain widgets have run (extract_after_widgets).
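To see why extraction timing matters when widgets can create metadata, here is a minimal sketch under stated assumptions. Widgets are modeled as named functions run in list order, and metadata is the first `<h1>`. The widget names `insert-title` and `footer` are hypothetical, and soupault's real widget ordering (including dependencies between widgets) is more involved than a plain list.

```python
import re

def extract_entry(html):
    # Hypothetical metadata rule: the title is the text of the first <h1>
    m = re.search(r"<h1>(.*?)</h1>", html)
    return {"title": m.group(1) if m else None}

def process_page(html, widgets, extract_after_widgets):
    """Run (name, function) widgets in order and extract index data once every
    widget named in extract_after_widgets has run. If one of them never runs,
    the entry stays None."""
    pending = set(extract_after_widgets)
    # With nothing to wait for, extract from the page as it was loaded
    entry = extract_entry(html) if not pending else None
    for name, widget in widgets:
        html = widget(html)
        pending.discard(name)
        if entry is None and not pending:
            entry = extract_entry(html)
    return html, entry
```

If a hypothetical `insert-title` widget generates the `<h1>`, extracting before it runs yields no title. Listing it in `extract_after_widgets` makes the generated title visible to the index.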

That meant the only way to give content pages access to their own metadata was to do some of the processing twice. So, in soupault 4.0.0, I introduced a two-pass workflow that allows the user to do that. It comes at the cost of longer build times and debug logs that are even harder to follow.
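A two-pass build can be sketched as follows. This is a simplified model, not soupault's implementation. The first pass only collects index data. The second pass reads every page again and renders it with the complete index available. The `<h1>` title rule, the rendered markup, and the `read_page`/`write_page` callbacks are all assumptions for illustration.

```python
import re

def extract_title(html):
    # Hypothetical metadata rule: the title is the text of the first <h1>
    m = re.search(r"<h1>(.*?)</h1>", html)
    return m.group(1) if m else None

def build_site_two_pass(page_files, read_page, write_page):
    """Simplified two-pass build: every page is read twice."""
    # Pass 1: extract index data from every page; nothing is written yet
    site_index = {}
    for path in page_files:
        site_index[path] = extract_title(read_page(path))
    # Pass 2: read and render every page again. The index is now complete,
    # so a content page can show its own metadata and site-wide navigation.
    nav = "".join(f"<li>{title}</li>" for title in site_index.values())
    for path in page_files:
        html = read_page(path)
        write_page(path, f"{html}<p>This page: {site_index[path]}</p><ul>{nav}</ul>")
```

Reading every page twice is where the extra build time comes from. Interleaving two passes in one run is also why the debug logs get harder to follow.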

Initially I assumed that accessing a page’s own metadata or the entire site index from content pages was an uncommon use case. However, that assumption turned out to be false. There are lots of use cases for it: autogenerated site-wide navigation menus (like the chapter index pane in ocamlbook.org), tag clouds, and more. Moreover, many questions on the mailing list and the IRC channel are about that behavior. Clearly, making index data available to all pages by default would make soupault much more intuitive for people.

So, in the next big release, I plan to make the following changes:

I still need to work out the implementation details, so it will take some time. I also have a few more plans for the next big release that need quite a lot of work. Still, I’m committed to making soupault better for everyone, so make sure to check the mailing list and the Atom feed once in a while.