Soupault 2.5.0 release
Date:
Soupault 2.5.0 is available for download from my own server and from GitHub releases.
New features in this release include an option to preserve the original whitespace in HTML pages (i.e. disable pretty-printing) and two new built-in widgets for rewriting internal links: relative_links and absolute_links.
There are also some bug fixes and quality of life improvements.
New features
HTML output whitespace control
There’s now a new option, pretty_print_html, under [settings]. When set to false, it will prevent soupault from inserting any whitespace into HTML for aesthetic reasons.
This option does not enable any kind of HTML minification: whitespace from templates and content files will be preserved. Soupault will only refrain from inserting any more whitespace into the page on its own.
In this release, it’s true by default for compatibility with older releases.
Deprecation warning: In some cases, trying to ‘prettify’ HTML by inserting more whitespace actually breaks the intended layout (thanks to Thomas Letan for pointing it out). For this reason, this option will be set to false by default in the next release.
If you want to keep the current behaviour in the future, make sure to adjust your configuration:
[settings]
pretty_print_html = true
Internal link rewriting widgets
The mainstream assumption is that every website has its own domain and is located at its virtual host root.[1] This is indeed the simplest setup, since you can link to a shared resource at the site root with just <img src="/images/header.png"> or similar.
However, that assumption isn’t always true. First, with the resurgence of public access UNIX hosts (sometimes called “tildes”), the “site in a subdirectory” scheme (like example.com/~user/) has also made a comeback. Second, there are pages that are naturally placed in a subdirectory, like autogenerated documentation.
There are now two new built-in widgets that will help users deal with this issue.
relative_links
The relative_links widget adjusts internal links to account for the depth of each page in the directory tree, which allows hosting the website in any location.
Suppose you have <img src="/header.png"> in your templates/main.html. Then in about/index.html that element will be rewritten as <img src="../header.png">; in books/magnetic-fields/index.html it will be <img src="../../header.png">, and so on.
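The rewriting rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not soupault's actual implementation: the function name relativize is hypothetical, and the "./" prefix used for pages at the site root is an assumption, since the example above does not cover that case.

```python
from pathlib import PurePosixPath

def relativize(target, page_path):
    """Rewrite a root-relative link for a page located at page_path.

    page_path is relative to the site root, e.g. "about/index.html".
    """
    # Count the directories between the site root and the page file.
    depth = len(PurePosixPath(page_path).parent.parts)
    # One "../" per directory level; "./" for root pages is an assumption.
    prefix = "../" * depth if depth else "./"
    return prefix + target.lstrip("/")

print(relativize("/header.png", "about/index.html"))
print(relativize("/header.png", "books/magnetic-fields/index.html"))
```

The two calls reproduce the about/ and books/magnetic-fields/ cases from the paragraph above.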
Default configuration:
[widgets.relativize]
widget = "relative_links"
check_file = false
exclude_target_regex = '^((([a-zA-Z0-9]+):)|#|\.|//)'
The default regex is meant to exclude links that are either:
- External links with a URI scheme.
- Links to anchors within the same page.
- Hand-made relative links.
- Protocol-relative URLs.
If you want to narrow the scope down, you can use the only_target_regex option instead. For example, with only_target_regex = '^/[a-zA-Z0-9]', it will only rewrite links like /style.css.
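To see which link targets the two patterns above select, here is a small Python check. It assumes that Python's re module treats these particular patterns the same way as the regex engine soupault uses, which should hold for syntax this simple but is not verified here. The helper names are hypothetical.

```python
import re

# Patterns copied verbatim from the configuration examples above.
EXCLUDE_DEFAULT = re.compile(r'^((([a-zA-Z0-9]+):)|#|\.|//)')
ONLY_EXAMPLE = re.compile(r'^/[a-zA-Z0-9]')

def is_excluded(target):
    """True if the default exclude_target_regex matches the link target."""
    return EXCLUDE_DEFAULT.search(target) is not None

def is_selected(target):
    """True if the example only_target_regex matches the link target."""
    return ONLY_EXAMPLE.search(target) is not None

for target in ["https://example.com/", "#top", "./selfie.jpg",
               "//cdn.example.com/lib.js", "/style.css", "selfie.jpg"]:
    print(target, is_excluded(target), is_selected(target))
```

Note that an unmarked relative target like selfie.jpg is not matched by the default exclusion pattern, which is the case the check_file option is meant for.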
The check_file option is helpful if you have pages with unmarked relative links, e.g. there’s about/index.html with <img src="selfie.jpg"> in it, and also an about/selfie.jpg file. Arguably, it would be a good idea to use <img src="./selfie.jpg"> to make it explicit where the file is, but it may be impractical to modify all old pages just to be able to use this widget.
In that case you can set check_file = true, and this widget will rewrite such links only if there is no such file in the directory with the page.
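The decision that check_file adds can be pictured roughly like this. It is a sketch of the rule as described above, not soupault's code: the function name should_rewrite is hypothetical, and skipping the file check for targets that start with a slash is an assumption made to keep the example simple.

```python
from pathlib import Path

def should_rewrite(target, page_dir, check_file):
    """Decide whether a link target that passed the regex filters is rewritten.

    page_dir is the directory that contains the page being processed.
    """
    # Assumption: only targets without a leading slash can point at a
    # file sitting next to the page.
    if check_file and not target.startswith("/"):
        if (Path(page_dir) / target).is_file():
            # The file exists beside the page, so the link is treated as
            # an unmarked relative link and left alone.
            return False
    return True
```

With about/selfie.jpg present, should_rewrite("selfie.jpg", "about", True) leaves the link untouched, while the same call with check_file set to False rewrites it.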
absolute_links
This widget prepends a prefix to every internal link. It is the polar opposite of the relative_links widget.
Sample configuration:
[widgets.absolutize]
widget = "absolute_links"
prefix = "https://example.com/~jrandomhacker"
A prefix can simply be a directory; a URI scheme or a host address is not required.
This widget supports all options of the relative_links widget.
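The effect of the prefix can be sketched as a simple string join. The function name absolutize is hypothetical, and the way slashes are normalized at the joint is an assumption of this sketch rather than documented soupault behaviour.

```python
def absolutize(target, prefix):
    """Prepend a prefix to an internal link target."""
    # Assumption: exactly one slash is kept between prefix and target.
    return prefix.rstrip("/") + "/" + target.lstrip("/")

# A full URL prefix, as in the sample configuration above.
print(absolutize("/images/header.png", "https://example.com/~jrandomhacker"))
# A bare directory prefix also works, since no scheme or host is required.
print(absolutize("/images/header.png", "/~jrandomhacker"))
```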
Bug fixes
On Windows, errors of external programs executed by the exec and preprocess_element widgets (as well as plugins) could crash soupault. This was because Windows doesn’t have a concept of signals, so the standard library communicates conditions like SIGPIPE (“broken pipe”) by raising exceptions. Soupault handles these exceptions correctly now, so the behaviour is consistent between all OSes.
The ignore_extensions option now checks all extensions rather than just the last. I.e. ignore_extensions = ["tar"] will match both file.tar and file.tar.gz now (report by Anton Bachin).
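The difference between the old and the new matching can be illustrated with a short Python sketch. The helper names are hypothetical, and the treatment of dotfiles (a leading dot does not start an extension) is an assumption of this sketch, not something stated above.

```python
def all_extensions(filename):
    """Return every extension of a file name, e.g. ["tar", "gz"]."""
    # Drop directories, then leading dots so dotfiles have no extension
    # (an assumption of this sketch).
    name = filename.rsplit("/", 1)[-1].lstrip(".")
    return name.split(".")[1:]

def is_ignored(filename, ignore_extensions, check_all=True):
    """check_all=True models the new behaviour, False the old one."""
    exts = all_extensions(filename)
    if not check_all:
        # Old behaviour: only the last extension was considered.
        exts = exts[-1:]
    return any(ext in ignore_extensions for ext in exts)

print(is_ignored("file.tar.gz", ["tar"]))
print(is_ignored("file.tar.gz", ["tar"], check_all=False))
```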
Quality of life
- Exception tracing is now automatically enabled by settings.debug = true and by the --debug command line option; there is no need to set OCAMLRUNPARAM=b by hand anymore.
- Better error messages for attempts to run soupault outside of a project directory, and for missing templates and site_dir.
- Proper alignment of options in the output of soupault --help (patch by Anton Bachin).
New plugin functions
- Sys.get_extensions, Sys.has_extension