Soupault 2.4.0 release

Estimated reading time: 7 minutes.

Date: 2021-01-17

Soupault 2.4.0, the first release of the new year, is available for download from my own server and from GitHub releases. It offers a few bug fixes, new plugin functions (e.g. a new Value.is_nil etc. function family you can use for explicit type checking), and new options. Among others, threre’s now an option to mark some directories as "hand-made clean URLs" rather than sections to bundle a page with its assets. At the end there’s a brief discussion of the plans for 2021.

New features

An option to keep existing page title

Soupault used to always overwrite the original <title> if the title widget was active. Now you can add keep = true to the widget config to preserve the original page title in cases when a <title> element exists and isn’t empty.

Treating index pages as normal pages

Let’s face it: clean URLs is quite a dirty hack. The World Wide Web doesn’t actually have a concept of a site index, and doesn’t really differentiate ‘pages’ from ‘sections’.¹ An index page is simply a page that web servers return when a URL points at a directory rather than a file. That page isn’t guaranteed or required to provide links to other pages in that directory.

Soupault, however, does have a distinction between normal pages and section index pages. A page is either a metadata source, or a rendered index insertion target. At the very least it prevents nonsensical index pages that link to themselves.

The model is really simple: a directory is a section, a file not named index.* is a normal page, and a file named index.*² is the index page where an autogenerated section index should be inserted.

Thus, a ‘hand-made clean URL page’ like about/index.html instead of /about.html, is essentially a degenerate section with a single page.

Since soupault can transform normal pages to clean URLs by itself, normally it’s best to keep a logical site structure: directory = section, file = page, and leave creation of clean URLs to the software.

However, sometimes creating a degenerate section by hand is a sensible thing to do. One use case is bundling a page with its assets. Suppose you are making a page with a lot of photos, and those photos aren’t going to be used by any other page. In that case placing those photos in a shared asset directory will only make it harder to remember or find what pages are they used by, and will make all links to those images longer. Storing them in a directory with the page offers the easiest mental model.

There’s one issue though: how to tell a real section from a ‘hand-made clean URL’

One option is to ‘just’ check if there are any page files in that directory and or its subdirectories. However, that’s quite resource-intensive.

Another option is to use a different file name for ‘real’ and ‘fake’ index pages. Hugo uses that approach: directories with index.* pages are assumed to be leaf bundles (hand-made clean URLs), while _index.* implies a branch bundle (section).

Starting from this version, soupault offers two ways to mark your ‘index’ page as a normal page rather than a section index.

One way is like in Hugo, but configurable. Using a new force_indexing_path_regex option in the [index] table, you can make soupault treat some pages as normal pages even though their files are named index.*. This can be helpful if you only have a few such pages, or they all are within a single directory.

If you want to be able to mark any directory as a ‘leaf’ (hand-made clean URL), there’s another way: a new leaf_file option in the [index] table. Suppose you set leaf_file = ".leaf". In that case, when soupault finds a directory that has files named index.html and .leaf, it treats index.html as a normal page and extracts metadata from it.

There’s no default value for the leaf_file option, you need to set it explicitly if you want it. It’s to prevent people with existing websites from experiencing an unexpected effect (unlikely, but better take compatibility considerations seriously).

New plugin functions

widget_config alias for config (since now there’s the global soupault_config, a more specific alias may be a good idea)
build_dir and site_dir variables in the plugin environment (PR by Hristos)
Table.get_key_default(table, key, default_value)
Type checking functions in the new Value module: is_int, is_float, is_string, is_table, is_list (table with all integer keys), is_nil.
Sys.mkdir, Sys.get_file_modification_date

Bug fixes

include_subsection option in the [index] table works correctly now (used to cause a spurious option validation error).
soupault no longer outputs duplicate newlines on Windows (#19, reported by wilt00).
The HTML.get_heading_level function now works with nodes returned by HTML.select (rather than just values created with HTML.create_element).

Future plans

The year 2021 has just begun. While I can never know how it works out, I do have some plans in mind.

A new TOML parsing library is in the works. The goal is to provide nice (i.e. specific and helpful) parse error messages and allow other people to manipulate TOML data easier than other libraries allow. I can’t give a specific estimate, but I assure you I haven’t abandoned that idea.

Pagination still remains an unsolved problem. I’m thinking of a system of hooks that would allow Lua code or external scripts to take over a specific part of the generation process. I’ve been researching other projects, and found that Pandoc uses a system of filters that makes the process really flexible. This site now converts Markdown to HTML with pandoc and a Lua hook that produces CommonMark-compliant code blocks with language-* classes (by default pandoc just adds class="$language", which makes it impossible to select all <code> elements with a language set).

Another feature I have in mind is simple asset caching. There’s really no reason to copy the same files over and over again if they haven’t changed on disk. It’s not so hard to implement, so I’ll try to add it in the near future, as time allows.

Of course, the biggest plan is to make soupault multi-threaded. The multicore OCaml project is now moving faster than ever, and the multicore compiler variant is usable with normal OPAM packages now, so I can at least start experimenting with it, if not make a production release yet. This summer I spent quite a lot of time reworking the algorithms to make parallelizing a matter of switching the normal List.fold_left with a parallel fold, but the devil is often in the details. We’ll see how it goes.

I’m also planning to add soupault to package repositories like Chocolatey, HomeBrew, Flatpak etc. Whether I’ll do it or not, and which repositories I will add it to, depends on the maintenance burden it imposes on me. Im looking at it as a chance to get more familiar with those projects and see what they are like for a package maintainer.

¹Some other hypertext network projects, like Gopher (protocol), do have dedicated menus and explicit site hierarchies.

²Or whatever you set the index_page under [settings] to.