Soupault 2.4.0 release
Soupault 2.4.0, the first release of the new year, is available for download from my own server and from GitHub releases.
It offers a few bug fixes, new plugin functions (e.g. a new Value.is_nil etc. function family you can use for explicit type checking), and new options. Among others, there’s now an option to mark some directories as "hand-made clean URLs" rather than sections, so that you can bundle a page with its assets. At the end there’s a brief discussion of the plans for 2021.
An option to keep existing page title
Soupault used to always overwrite the original <title> if the title widget was active. Now you can add keep = true to the widget config to preserve the original page title when a <title> element exists and isn’t empty.
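As a rough illustration, a title widget with the new option might be configured like this. The widget name page-title, the selector, and the default text are made up for the example; only keep = true comes from this release.

```toml
# Hypothetical widget name; pick whatever name you like
[widgets.page-title]
  widget = "title"
  # Take the title text from the first <h1> (illustrative choice)
  selector = "h1"
  # Fallback text for pages without an <h1> (illustrative value)
  default = "My website"
  # New in 2.4.0: leave an existing, non-empty <title> untouched
  keep = true
```

With this in place, pages that already have a hand-written title should keep it, while pages with a missing or empty <title> still get one generated from the heading.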
Treating index pages as normal pages
Let’s face it: clean URLs are quite a dirty hack. The World Wide Web doesn’t actually have a concept of a site index, and doesn’t really differentiate ‘pages’ from ‘sections’.1 An index page is simply a page that web servers return when a URL points at a directory rather than a file. That page isn’t guaranteed or required to provide links to other pages in that directory.
Soupault, however, does have a distinction between normal pages and section index pages. A page is either a metadata source, or a rendered index insertion target. At the very least it prevents nonsensical index pages that link to themselves.
The model is really simple: a directory is a section, a file not named
index.* is a normal page, and a file named
index.*2 is the index page where an autogenerated section index should be inserted.
Thus, a ‘hand-made clean URL page’ like
about/index.html instead of
/about.html, is essentially a degenerate section with a single page.
Since soupault can transform normal pages to clean URLs by itself, normally it’s best to keep a logical site structure: directory = section, file = page, and leave creation of clean URLs to the software.
However, sometimes creating a degenerate section by hand is a sensible thing to do. One use case is bundling a page with its assets. Suppose you are making a page with a lot of photos, and those photos aren’t going to be used by any other page. In that case, placing those photos in a shared asset directory will only make it harder to remember or find which pages use them, and will make all links to those images longer. Storing them in a directory with the page offers the easiest mental model.
There’s one issue though: how to tell a real section from a ‘hand-made clean URL’?
One option is to ‘just’ check if there are any page files in that directory or its subdirectories. However, that’s quite resource-intensive.
Another option is to use a different file name for ‘real’ and ‘fake’ index pages. Hugo uses that approach: directories with
index.* pages are assumed to be leaf bundles (hand-made clean URLs), while
_index.* implies a branch bundle (section).
Starting from this version, soupault offers two ways to mark your ‘index’ page as a normal page rather than a section index.
One way is like in Hugo, but configurable. Using a new
force_indexing_path_regex option in the
[index] table, you can make soupault treat some pages as normal pages even though their files are named
index.*. This can be helpful if you only have a few such pages, or they all are within a single directory.
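A minimal sketch of the first approach, assuming the option takes a single regular expression string that is matched against the page file path. The galleries directory and the pattern itself are hypothetical.

```toml
# Goes in your existing [index] table
[index]
  # Assumes metadata extraction is enabled
  index = true
  # Hypothetical pattern: treat galleries/<anything>/index.html
  # as normal pages rather than section index pages
  force_indexing_path_regex = "galleries/.*/index.html"
```

The intent is that a page such as galleries/summer/index.html is then indexed like any other page, while index files outside galleries keep behaving as section indices.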
If you want to be able to mark any directory as a ‘leaf’ (hand-made clean URL), there’s another way: a new
leaf_file option in the
[index] table. Suppose you set
leaf_file = ".leaf". In that case, when soupault finds a directory that contains a file named .leaf, it treats the index.html in that directory as a normal page and extracts metadata from it.
There’s no default value for the leaf_file option, so you need to set it explicitly if you want it. This is to prevent people with existing websites from experiencing an unexpected effect (unlikely, but it’s better to take compatibility considerations seriously).
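For illustration, the marker file approach could be set up as follows. The .leaf name is the one used in the text above; the index = true line is an assumption that metadata extraction is switched on.

```toml
# Goes in your existing [index] table
[index]
  # Assumes metadata extraction is enabled
  index = true
  # Any directory containing a file with this name is treated as a leaf
  leaf_file = ".leaf"
```

With that setting, a hypothetical directory photos/ that contains index.html, an empty .leaf file, and a handful of images would be handled as a single normal page bundled with its assets, not as a section.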
New plugin functions
- config (since now there’s the global soupault_config, a more specific alias may be a good idea)
- site_dir variables in the plugin environment (PR by Hristos)
- Table.get_key_default(table, key, default_value)
- Type checking functions in the new Value.is_nil etc. family, including is_list (table with all integer keys)
Bug fixes
- The include_subsection option in the [index] table works correctly now (used to cause a spurious option validation error).
- soupault no longer outputs duplicate newlines on Windows (#19, reported by wilt00).
- The HTML.get_heading_level function now works with nodes returned by HTML.select (rather than just values created with ...)
The year 2021 has just begun. While I can never know how it works out, I do have some plans in mind.
A new TOML parsing library is in the works. The goal is to provide nice (i.e. specific and helpful) parse error messages and to let other people manipulate TOML data more easily than other libraries allow. I can’t give a specific estimate, but I assure you I haven’t abandoned that idea.
Pagination still remains an unsolved problem. I’m thinking of a system of hooks that would allow Lua code or external scripts to take over a specific part of the generation process. I’ve been researching other projects, and found that Pandoc uses a system of filters that makes the process really flexible. This site now converts Markdown to HTML with pandoc and a Lua hook that produces CommonMark-compliant code blocks with
language-* classes (by default pandoc just adds
class="$language", which makes it impossible to select all
<code> elements with a language set).
Another feature I have in mind is simple asset caching. There’s really no reason to copy the same files over and over again if they haven’t changed on disk. It’s not so hard to implement, so I’ll try to add it in the near future, as time allows.
Of course, the biggest plan is to make soupault multi-threaded. The multicore OCaml project is now moving faster than ever, and the multicore compiler variant is usable with normal OPAM packages now, so I can at least start experimenting with it, if not make a production release yet. This summer I spent quite a lot of time reworking the algorithms to make parallelizing a matter of replacing the normal List.fold_left with a parallel fold, but the devil is often in the details. We’ll see how it goes.
I’m also planning to add soupault to package repositories like Chocolatey, HomeBrew, Flatpak etc. Whether I’ll do it or not, and which repositories I will add it to, depends on the maintenance burden it imposes on me. I’m looking at it as a chance to get more familiar with those projects and see what they are like for a package maintainer.