Soupault 4.7.0 release: CSV support, global shared data, post-build hook, and more

Estimated reading time: 6 minutes.

Soupault 4.7.0 is available for download from my own server and from GitHub releases. It adds support for loading CSV files, a variable for passing global data between plugins and hooks, a way for plugins to determine which pass of the two-pass workflow they are executed in, and a few more improvements.

Configurable page character encoding

By default, soupault assumes that all pages are stored in UTF-8. I would encourage everyone to migrate to it, now that all operating systems use it by default. But there are certainly sites that are older than the widespread deployment of UTF-8, and there are tools that still produce legacy encodings as well.

Now it’s possible to specify the encoding explicitly for such cases:

[settings]
  page_character_encoding = 'utf-8'

The following encodings are supported: ascii, iso-8859-1, windows-1251, windows-1252, utf-8, utf-16, utf-16le, utf-16be, utf-32le, utf-32be, and ebcdic. You can write those options in either upper or lower case (e.g., UTF-16LE, UTF-16le, and utf-16le are equally acceptable). You cannot omit hyphens or replace them with underscores, though.
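For example, if your pages come from an old CMS export in windows-1251 (a hypothetical scenario for illustration), the configuration would look like this:

[settings]
  page_character_encoding = 'windows-1251'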

Plugin support for the two-pass workflow

Soupault supports a two-pass workflow that allows users to make the index data available to all pages (even to content pages).

That feature comes at the cost of duplicating some of the page processing work (at the very least, HTML parsing and index extraction), but enables use cases that would be impossible otherwise. For example, the book blueprint uses that capability to inject a fully auto-generated chapter list sidebar in every page, while its main competitor, mdBook, requires a hand-written chapter list.

However, until this release, plugins could only guess where soupault was in its website build process, e.g., by checking whether the site_index table was empty. That approach is neither foolproof nor flexible.

Now there’s a new soupault_pass plugin environment variable: it’s 0 when index_first = false, and 1 or 2 for the first and the second pass respectively when it’s true. Thus plugins can check whether the two-pass workflow is enabled at all and find out which pass is currently running.

if soupault_pass < 2 then
  -- Do nothing
else
  -- Do things that require index data
end
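Here is a slightly fuller sketch of how a widget might use it. This is a hypothetical example: the div#page-list selector is made up, and it assumes a title index field in addition to the built-in url field, so adjust it to your own widget and index configuration.

-- Hypothetical widget: inject links to all indexed pages,
-- but only when complete index data is available.
if soupault_pass == 1 then
  -- First pass of the two-pass workflow: the site index is not complete yet.
  Plugin.exit("Waiting for the second pass")
end

container = HTML.select_one(page, "div#page-list")
if container then
  count = size(site_index)
  i = 1
  while (i <= count) do
    entry = site_index[i]
    link = HTML.create_element("a", entry["title"])
    HTML.set_attribute(link, "href", entry["url"])
    HTML.append_child(container, link)
    i = i + 1
  end
end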

Global data shared between all plugins and hooks

There was already a persistent_data variable that plugins could use to preserve data across pages — for example, to calculate the total reading time of all pages and output it on a specific page.

However, there was no way for plugins and hooks to share any data. For example, suppose you want to profile your website build and measure the time it takes to build each page. You could call Date.now_timestamp() in pre-parse and post-save hooks, then subtract the start time from the end time… but where would you store that data to make it available to both hooks? Technically, you could inject it in the page, but that’s a rather dirty hack.

Now there’s a new variable named global_data that allows different plugins and hooks to communicate without any dirty hacks. You can simply do global_data["start_time"] = Date.now_timestamp() in the pre-parse hook and read it back in the post-save hook.
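For example, the build profiling idea above could be set up roughly like this. This is only a sketch: it assumes inline lua_source hook definitions and keys the table by page_file, so adapt it to your own setup.

[hooks.pre-parse]
  lua_source = '''
    -- Remember when processing of this page started.
    global_data[page_file] = Date.now_timestamp()
  '''

[hooks.post-save]
  lua_source = '''
    -- Log how long the page took to build.
    elapsed = Date.now_timestamp() - global_data[page_file]
    Log.info(format("Building %s took %d seconds", page_file, elapsed))
  '''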

This feature certainly comes at a cost: it will make it harder to process pages in parallel in the future. Making soupault use more than one worker thread is currently blocked by the fact that Lua-ML, the Lua interpreter it uses, is neither reentrant nor thread-safe and needs a deep refactoring to become so. Once that part is done, there will be further questions about the right design for multi-core soupault workflows, but that is a matter for the future.

CSV support

Soupault can already load JSON, TOML, and YAML data files. However, what if you want to build a product catalog website for a small store? A lot of such data is kept in spreadsheets or local databases, and the most common export format for it is CSV.

Now soupault supports loading CSV files, but that’s not all: it can also convert CSV data with a header row into a list of objects that you can easily pass to a template for rendering.

Now let’s look at the new functions in action. We’ll write a Lua snippet that uses CSV.from_string and CSV.to_list_of_tables on CSV data embedded in it for demonstration:

csv_source = [[name,price,comment
baby shoes,5,never worn
fake amulet of Yendor,1,uncursed
]]

csv_data = CSV.from_string(csv_source)
Log.debug(format("Raw CSV data: %s", JSON.pretty_print(csv_data)))
csv_table = CSV.to_list_of_tables(csv_data)
Log.debug(format("Converted CSV data: %s", JSON.pretty_print(csv_table)))

If you add it to a plugin and run soupault, you will see the following output:

[INFO] Processing widget csv-test on page site/index.html
[DEBUG] Raw CSV data: [
  [
    "name",
    "price",
    "comment"
  ],
  [
    "baby shoes",
    5,
    "never worn"
  ],
  [
    "fake amulet of Yendor",
    1,
    "uncursed"
  ]
]

[DEBUG] Converted CSV data: [
  {
    "price": 5,
    "comment": "never worn",
    "name": "baby shoes"
  },
  {
    "price": 1,
    "comment": "uncursed",
    "name": "fake amulet of Yendor"
  }
]

As you can see, the “converted CSV data” can be directly passed to a template like this:

{% for i in items %}
Item {{i.name}} ({{i.comment}}) is sold for {{i.price}}.
{% endfor %}
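To tie it all together, here is a hypothetical plugin sketch that loads a CSV file from disk and renders it into a page. The data/catalog.csv path, the div#catalog selector, and the template are made up for this example, and it relies on Sys.read_file, String.render_template, and the HTML functions from the plugin API.

-- Hypothetical product catalog widget.
catalog_file = "data/catalog.csv"
csv_data = CSV.from_string(Sys.read_file(catalog_file))
items = CSV.to_list_of_tables(csv_data)

item_template = [[
<ul>
  {% for i in items %}
  <li>{{i.name}} ({{i.comment}}) is sold for {{i.price}}.</li>
  {% endfor %}
</ul>
]]

container = HTML.select_one(page, "div#catalog")
if container then
  env = {}
  env["items"] = items
  rendered = String.render_template(item_template, env)
  HTML.append_child(container, HTML.parse(rendered))
end

That way a catalog page can stay in sync with a spreadsheet export without any manual HTML editing.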

Other new features and improvements

Other new plugin API functions

Bug fixes

Platform support

Official binaries are now available for Linux on ARM64 (e.g., Raspberry Pi 3 and 4).