Soupault 2.5.0 release

Soupault 2.5.0 is available for download from my own server and from GitHub releases. New features in this release include an option to preserve the original whitespace in HTML pages (i.e. disable pretty-printing) and two new built-in widgets for rewriting internal links: relative_links and absolute_links. There are also some bug fixes and quality of life improvements.

New features

HTML output whitespace control

There’s now a new option pretty_print_html under [settings]. When set to false, it will prevent soupault from inserting any whitespace into HTML for aesthetic reasons.

This option does not enable any kind of HTML minification: whitespace from templates and content files will be preserved. Soupault will only refrain from inserting any more whitespace into the page on its own.

In this release, it’s true by default for compatibility with older releases.

Deprecation warning: In some cases, trying to ‘prettify’ HTML by inserting more whitespace actually breaks the intended layout (thanks to Thomas Letan for pointing it out). For this reason, this option will be set to false by default in the next release.

If you want to keep the current behaviour in the future, make sure to adjust your configuration:

[settings]
  pretty_print_html = true

Internal link rewriting widgets

The mainstream assumption is that every website has its own domain and is located at its virtual host root.[1] This is indeed the simplest setup, since you can link to a shared resource at the site root with just <img src="/images/header.png"> or similar.

However, that assumption isn’t always true. First, with the resurgence of public access UNIX hosts (sometimes called “tildes”), the “site in a subdirectory” scheme (like example.com/~user/) has also made a comeback. Second, there are pages that are naturally placed in a subdirectory, like autogenerated documentation.

There are now two new built-in widgets that will help users deal with this issue.

relative_links

The relative_links widget adjusts internal links to account for the depth of each page in the directory tree, which allows hosting the website in any location.

Suppose you have this in your templates/main.html: <img src="/header.png">. Then in about/index.html that element will be rewritten as <img src="../header.png">; in books/magnetic-fields/index.html it will be <img src="../../header.png"> and so on.
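
The rewriting rule from the example above can be illustrated with a short Python sketch. This is not soupault's actual implementation: the function name relativize is hypothetical, page paths are assumed to be given relative to the site root, and the handling of pages at the root itself is a guess.

```python
from pathlib import PurePosixPath


def relativize(target: str, page_path: str) -> str:
    """Rewrite a root-relative link for a page located at page_path.

    page_path is relative to the site root, e.g. "about/index.html".
    """
    # Count the directories between the site root and the page itself.
    depth = len(PurePosixPath(page_path).parent.parts)
    # Go up one level per directory, then append the target without its leading slash.
    # Assumption: a page at the site root (depth 0) simply loses the leading slash.
    return "../" * depth + target.lstrip("/")


print(relativize("/header.png", "about/index.html"))                  # ../header.png
print(relativize("/header.png", "books/magnetic-fields/index.html"))  # ../../header.png
```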

Default configuration:

[widgets.relativize]
  widget = "relative_links"
  check_file = false
  exclude_target_regex = '^((([a-zA-Z0-9]+):)|#|\.|//)'

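The default regex is meant to exclude links that either have a URI schema (like https: or mailto:), point to an anchor within the same page (start with #), are explicitly relative (start with a dot, like ./ or ../), or are protocol-relative (start with //).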

If you want to narrow the scope down, you can use the only_target_regex option instead. For example, with only_target_regex = '^/[a-zA-Z0-9]', it will only rewrite links like /style.css.
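
To see which links the two regexes would catch, they can be tried against a few sample targets. The sketch below uses Python's re module as a stand-in for soupault's own regex engine, which may differ in details. The helper names is_excluded and is_selected and the sample links are made up for illustration.

```python
import re

# The default exclude_target_regex from the configuration above.
EXCLUDE = re.compile(r'^((([a-zA-Z0-9]+):)|#|\.|//)')
# The only_target_regex from the example above.
ONLY = re.compile(r'^/[a-zA-Z0-9]')


def is_excluded(target: str) -> bool:
    """True if the default exclude regex matches, i.e. the link is left untouched."""
    return EXCLUDE.search(target) is not None


def is_selected(target: str) -> bool:
    """True if the narrower only_target_regex matches, i.e. the link is rewritten."""
    return ONLY.search(target) is not None


samples = [
    "https://example.com/", "mailto:me@example.com", "#top",
    "./selfie.jpg", "//cdn.example.com/lib.js", "/style.css", "selfie.jpg",
]
for target in samples:
    print(f"{target!r}: excluded={is_excluded(target)}, selected={is_selected(target)}")
```

With these samples, only /style.css and selfie.jpg escape the default exclusion, and only /style.css is picked up by the narrower only_target_regex.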

The check_file option is helpful if you have pages with unmarked relative links, e.g. there’s about/index.html with <img src="selfie.jpg"> in it, and also an about/selfie.jpg file. Arguably, it would be a good idea to use <img src="./selfie.jpg"> to make it explicit where the file is, but it may be impractical to modify all old pages just to be able to use this widget.

In that case you can set check_file = true and this widget will rewrite such links only if there is no such file in the directory with the page.
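
The decision described above can be sketched in Python as follows. This is only an illustration of the stated rule, not soupault's code: the function should_rewrite and its parameters are hypothetical, and the is_file parameter exists only so that the file lookup can be swapped out.

```python
import os
from typing import Callable


def should_rewrite(
    target: str,
    page_dir: str,
    check_file: bool,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> bool:
    """Decide whether an unmarked relative link like "selfie.jpg" gets rewritten."""
    if not check_file:
        # Without the check, every link that passes the regex filters is rewritten.
        return True
    # With the check, a link is left alone when a file with that name
    # exists in the same directory as the page.
    return not is_file(os.path.join(page_dir, target))
```

For a page in about/ that sits next to about/selfie.jpg, should_rewrite("selfie.jpg", "about", check_file=True) would return False, so the link stays as it is.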

absolute_links

This widget prepends a prefix to every internal link. It is the polar opposite of the relative_links widget.

Sample configuration:

[widgets.absolutize]
  widget = "absolute_links"
  prefix = "https://example.com/~jrandomhacker"

A prefix can be just a directory: a URI schema or a host address is not required.

This widget supports all options of the relative_links widget.
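
The prefixing itself amounts to simple string concatenation, as this Python sketch shows. The function name absolutize is hypothetical, and the way slashes are joined is an assumption rather than soupault's documented behaviour.

```python
def absolutize(target: str, prefix: str) -> str:
    """Prepend prefix to a root-relative link such as "/header.png"."""
    # Assumption: exactly one slash is kept between the prefix and the target.
    return prefix.rstrip("/") + "/" + target.lstrip("/")


# The prefix can be a full URL...
print(absolutize("/header.png", "https://example.com/~jrandomhacker"))
# ...or just a directory.
print(absolutize("/header.png", "/~jrandomhacker"))
```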

Bug fixes

On Windows, errors of external programs executed by the exec and preprocess_element widgets (as well as plugins) could crash soupault. This happened because Windows doesn’t have a concept of signals, so the standard library communicates conditions like SIGPIPE (“broken pipe”) by raising exceptions. Soupault handles these exceptions correctly now, so the behaviour is consistent between all OSes.

The ignore_extensions option now checks all extensions rather than just the last one. For example, ignore_extensions = ["tar"] now matches both file.tar and file.tar.gz (report by Anton Bachin).
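
The new matching rule can be sketched in Python like this. The function is_ignored is a hypothetical stand-in, and treating everything after the first dot as a list of extensions is an assumption about how multiple extensions are split.

```python
def is_ignored(file_name: str, ignore_extensions: list[str]) -> bool:
    """True if any extension of file_name is in ignore_extensions."""
    # "file.tar.gz" is split into ["file", "tar", "gz"]; the first part is the base name.
    extensions = file_name.split(".")[1:]
    return any(ext in ignore_extensions for ext in extensions)


print(is_ignored("file.tar", ["tar"]))     # True
print(is_ignored("file.tar.gz", ["tar"]))  # True
print(is_ignored("index.html", ["tar"]))   # False
```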

Quality of life

New plugin functions


[1] This assumption is even enforced by many services like Google Webmaster Tools and Google Analytics. Some services like Alexa go further and require a second-level domain to work. Whether this is systemic discrimination against small, independent, non-commercial websites or merely a bad assumption made without evil intent is debatable.