| 1 | +<h1 id="post-title">Soupault 4.7.0 release: CSV support, global shared data, post-build hook, and more</h1> |
| 2 | + |
| 3 | +<p>Date: <time id="post-date">2023-09-19</time> </p> |
| 4 | + |
| 5 | +<p id="post-excerpt"> |
| 6 | +Soupault 4.7.0 is available for download from <a href="https://files.baturin.org/software/soupault/4.7.0">my own server</a> |
| 7 | +and from <a href="https://github.com/PataphysicalSociety/soupault/releases/tag/4.7.0">GitHub releases</a>. |
| 8 | +It adds support for loading CSV files, a variable for passing global data between plugins and hooks, |
| 9 | +a way to determine which pass of the two-pass workflow a plugin is being executed in, and a few more improvements. |
| 10 | +</p> |
| 11 | + |
| 12 | +## Configurable page character encoding |
| 13 | + |
| 14 | +By default, soupault assumes that all pages are stored in UTF-8. I would encourage everyone to migrate to it, |
| 15 | +now that all operating systems use it by default. But there are certainly sites that are older than the |
| 16 | +widespread deployment of UTF-8, and there are tools that still produce legacy encodings as well. |
| 17 | + |
| 18 | +Now it's possible to specify the encoding explicitly for such cases: |
| 19 | + |
| 20 | +```toml |
| 21 | +[settings] |
| 22 | + page_character_encoding = 'utf-8' |
| 23 | +``` |
| 24 | + |
| 25 | +The following encodings are supported: `ascii`, `iso-8859-1`, `windows-1251`, `windows-1252`, `utf-8`, |
| 26 | +`utf-16`, `utf-16le`, `utf-16be`, `utf-32le`, `utf-32be`, and `ebcdic`. |
| 27 | +You can write those options in either upper or lower case (e.g., `UTF-16LE`, `UTF-16le`, and `utf-16le` |
| 28 | +are equally acceptable). You cannot omit hyphens or replace them with underscores, though. |
| 29 | + |
| 30 | +## Plugin support for the two-pass workflow |
| 31 | + |
| 32 | +Soupault supports a [two-pass workflow](/reference-manual/#making-index-data-available-to-every-page) |
| 33 | +that allows users to make the index data available to all pages (even to content pages). |
| 34 | + |
| 35 | +That feature comes at the cost of duplicating some of the page processing work (at the very least, HTML parsing |
| 36 | +and index extraction), but enables use cases that would be impossible otherwise. |
| 37 | +For example, the [book blueprint](https://github.com/PataphysicalSociety/soupault-blueprints-book) |
| 38 | +uses that capability to inject a fully auto-generated chapter list sidebar in every page, |
| 39 | +while its main competitor, [mdBook](https://rust-lang.github.io/mdBook/), requires a hand-written chapter list. |
| 40 | + |
| 41 | +However, until this release, plugins could only guess where soupault was in its website build process, |
| 42 | +e.g., by checking if the `site_index` table was empty. That approach is not foolproof and absolutely not flexible. |
| 43 | + |
| 44 | +Now there's a new `soupault_pass` plugin environment variable: it is 0 when `index_first = false`, and 1 or 2 during the first and the second pass respectively when `index_first = true`. |
| 45 | +Thus plugins can check whether the two-pass workflow is enabled at all and find out which pass is currently running. |
| 46 | + |
| 47 | +```lua |
| 48 | +if soupault_pass < 2 then |
| 49 | + -- Do nothing |
| 50 | +else |
| 51 | + -- Do things that require index data |
| 52 | +end |
| 53 | +``` |
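| | + |
| | +For context, the two-pass workflow itself is switched on in the site configuration. A minimal sketch, assuming the options live in the `[index]` section of `soupault.toml` and no other index options are needed: |
| | + |
| | +```toml |
| | +[index] |
| | +  # Enable index data extraction |
| | +  index = true |
| | +  # Extract index data from all pages first, then build them in a second pass |
| | +  index_first = true |
| | +``` |
| | + |
| | +With such a configuration, plugins should see `soupault_pass` set to 1 during the first pass and to 2 during the second one. |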
| 54 | + |
| 55 | +## Global data shared between all plugins and hooks |
| 56 | + |
| 57 | +There was already a `persistent_data` variable that plugins could use to preserve data, for example, |
| 58 | +to calculate the total reading time of all pages and output it on a specific page. |
| 59 | + |
| 60 | +However, there was no way for plugins and hooks to share any data. For example, suppose you want to profile |
| 61 | +your website build and measure the time it takes to build each page. You could call `Date.now_timestamp()` |
| 62 | +in `pre-parse` and `post-save` hooks, then subtract the start time from the end time... but where would you store |
| 63 | +that data to make it available to both hooks? Technically, you could inject it in the page, |
| 64 | +but that's a rather dirty hack. |
| 65 | + |
| 66 | +Now there's a new variable named `global_data` that allows different plugins and hooks to communicate |
| 67 | +without any dirty hacks. You could just do something like `global_data["start_time"] = Date.now_timestamp()` |
| 68 | +in the `pre-parse` hook and access it from the `post-save` hook easily. |
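| | + |
| | +As a rough sketch of the profiling idea above, the `pre-parse` and `post-save` hooks could be configured in `soupault.toml` like this. |
| | +The `start_time` key and the log message are illustrative choices. The sketch also assumes that `page_file` is available in the `post-save` hook environment and that pages are processed one at a time, so the stored value always belongs to the current page: |
| | + |
| | +```toml |
| | +[hooks.pre-parse] |
| | +  lua_source = ''' |
| | +    -- Remember when processing of the current page started |
| | +    global_data["start_time"] = Date.now_timestamp() |
| | +  ''' |
| | + |
| | +[hooks.post-save] |
| | +  lua_source = ''' |
| | +    -- Subtract the start time stored by the pre-parse hook from the current time |
| | +    elapsed = Date.now_timestamp() - global_data["start_time"] |
| | +    Log.info(format("Page %s took %s seconds to build", page_file, elapsed)) |
| | +  ''' |
| | +``` |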
| 69 | + |
| 70 | +This feature certainly comes at the cost of making parallel page processing harder to implement in the future. |
| 71 | +Making soupault use more than one worker thread is currently blocked by the fact that Lua-ML, the Lua interpreter it uses, |
| 72 | +is neither reentrant nor thread-safe and needs a deep refactoring to become so. When that part is done, |
| 73 | +there will be more questions about the right design for multi-core soupault workflows, but that's a problem for the future. |
| 74 | + |
| 75 | +## CSV support |
| 76 | + |
| 77 | +Soupault can already load JSON, TOML, and YAML data files. However, what if you want to create a website |
| 78 | +for a product catalog for a small store? A lot of data is kept in spreadsheets or local databases, |
| 79 | +and the most common export format for such data is CSV. |
| 80 | + |
| 81 | +Now soupault supports loading CSV files, but that's not all — it can also convert CSV data with a correct header |
| 82 | +to a list of objects that you can easily pass to a template for rendering. |
| 83 | + |
| 84 | +These are the new functions: |
| 85 | + |
| 86 | +* `CSV.from_string(str)` — parses CSV data and returns it as a list (i.e., an int-indexed table) of lists. |
| 87 | +* `CSV.unsafe_from_string(str)` — like `CSV.from_string` but returns `nil` on errors instead of raising an exception. |
| 88 | +* `CSV.to_list_of_tables(csv_data)` — converts CSV data with a header returned by `CSV.from_string` into a list of string-indexed tables for easy rendering. |
| 89 | + |
| 90 | +Now let's look at the `CSV.to_list_of_tables` function in action. Let's write a Lua snippet with CSV data embedded in it for demonstration: |
| 91 | + |
| 92 | +```lua |
| 93 | +csv_source = [[name,price,comment |
| 94 | +baby shoes,5,never worn |
| 95 | +fake amulet of Yendor,1,uncursed |
| 96 | +]] |
| 97 | + |
| 98 | +csv_data = CSV.from_string(csv_source) |
| 99 | +Log.debug(format("Raw CSV data: %s", JSON.pretty_print(csv_data))) |
| 100 | +csv_table = CSV.to_list_of_tables(csv_data) |
| 101 | +Log.debug(format("Converted CSV data: %s", JSON.pretty_print(csv_table))) |
| 102 | +``` |
| 103 | + |
| 104 | +If you add it to a plugin and run soupault, you will see the following output: |
| 105 | + |
| 106 | +``` |
| 107 | +[INFO] Processing widget csv-test on page site/index.html |
| 108 | +[DEBUG] Raw CSV data: [ |
| 109 | + [ |
| 110 | + "name", |
| 111 | + "price", |
| 112 | + "comment" |
| 113 | + ], |
| 114 | + [ |
| 115 | + "baby shoes", |
| 116 | + 5, |
| 117 | + "never worn" |
| 118 | + ], |
| 119 | + [ |
| 120 | + "fake amulet of Yendor", |
| 121 | + 1, |
| 122 | + "uncursed" |
| 123 | + ] |
| 124 | +] |
| 125 | + |
| 126 | +[DEBUG] Converted CSV data: [ |
| 127 | + { |
| 128 | + "price": 5, |
| 129 | + "comment": "never worn", |
| 130 | + "name": "baby shoes" |
| 131 | + }, |
| 132 | + { |
| 133 | + "price": 1, |
| 134 | + "comment": "uncursed", |
| 135 | + "name": "fake amulet of Yendor" |
| 136 | + } |
| 137 | +] |
| 138 | +``` |
| 139 | + |
| 140 | +As you can see, the "converted CSV data" can be directly passed to a template like this: |
| 141 | + |
| 142 | +```jinja2 |
| 143 | +{% for i in items %} |
| 144 | +Item {{i.name}} ({{i.comment}}) is sold for {{i.price}}. |
| 145 | +{% endfor %} |
| 146 | +``` |
| 147 | + |
| 148 | +## Other new features and improvements |
| 149 | + |
| 150 | +* New `max_items` option in index views allows limiting the number of displayed items. |
| 151 | +* New `post-build` hook that runs when all pages are processed and soupault is about to terminate. |
| 152 | +* Info logs to indicate the first and second passes in the `index_first = true` mode. |
| 153 | +* Debug logs now tell why a page is included or excluded from an index view: `"page_included checks for %s: regex=%b, page=%b, section=%b"` |
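| | + |
| | +To illustrate the first two items, here is a hedged configuration sketch. The view name `latest-posts`, the `#latest-posts` selector, and the item template are hypothetical, and the template assumes that a `title` index field is defined elsewhere in the config: |
| | + |
| | +```toml |
| | +# Hypothetical index view that displays at most ten entries |
| | +[index.views.latest-posts] |
| | +  index_selector = "#latest-posts" |
| | +  index_item_template = '<li><a href="{{url}}">{{title}}</a></li>' |
| | +  max_items = 10 |
| | + |
| | +# Runs once, after all pages are processed |
| | +[hooks.post-build] |
| | +  lua_source = ''' |
| | +    Log.info("All pages are processed, the build is complete") |
| | +  ''' |
| | +``` |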
| 154 | + |
| 155 | +### Other new plugin API functions |
| 156 | + |
| 157 | +* `HTML.swap(l, r)` — swaps two elements in an element tree. |
| 158 | +* `HTML.wrap(node, elem)` — wraps `node` in `elem`. |
| 159 | + |
| 160 | +## Bug fixes |
| 161 | + |
| 162 | +* Fixed an unhandled exception on index entry sorting failures when `sort_strict = true` and `sort_by` is unspecified. |
| 163 | +* Fixed a typo in the comments of the config generated by `soupault --init` (s/ULRs/URLs/). |
| 164 | + |
| 165 | +## Platform support |
| 166 | + |
| 167 | +Official binaries are now available for Linux on ARM64 (e.g., Raspberry Pi 3 and 4). |