DAISY2.02 audiobooks with TOC PageList conversion to Readium WebPub Manifest
- node --version => v16.13.0 (or greater)
- npm --version => 8.1.4 (or greater)
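The version requirement above can also be checked from a script before installing anything. This is a minimal sketch: the helper name meetsMinimum is hypothetical, and it only compares the numeric major.minor.patch parts of a version string.

```javascript
// Compare two dotted version strings numerically, e.g. "v18.2.1" against "16.13.0".
// A leading "v" (as printed by node --version) is ignored.
function meetsMinimum(actual, minimum) {
  const parse = (v) => v.replace(/^v/, "").split(".").map((n) => parseInt(n, 10));
  const a = parse(actual);
  const m = parse(minimum);
  for (let i = 0; i < Math.max(a.length, m.length); i++) {
    const x = a[i] || 0;
    const y = m[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // identical versions satisfy the minimum
}

// process.version holds the version of the running Node executable.
if (!meetsMinimum(process.version, "16.13.0")) {
  console.warn(`Node ${process.version} is older than the documented minimum v16.13.0`);
}
```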
npm install json-diff --global
npm install r2-shared-js --global
npm install r2-streamer-js --global
If the installation fails with filesystem permission errors, retry with sudo on Linux and Mac, or on Windows open the shell with "run as administrator" (adding --unsafe-perm=true sometimes helps too).
Verify installed "binaries" (i.e. globally-available NodeJS scripts):
- which r2-shared-js-cli => /usr/local/bin/r2-shared-js-cli (for example, on Mac)
- which r2-streamer-js-server => /usr/local/bin/r2-streamer-js-server (for example, on Mac)
Note that a future revision of the CLI utilities will include a UNIX "shebang" at the top of the JS file in order to automatically invoke the Node executable. Until then, the scripts are started by passing them to node explicitly, as in the examples below.
Assuming some EPUB files are present inside a folder path (replace PATH_TO_EPUB_FOLDER
with your own filesystem location, which can be absolute or relative to the current pwd
folder):
- DEBUG=r2:* node /usr/local/bin/r2-streamer-js-server PATH_TO_EPUB_FOLDER (note that DEBUG=r2:* is optional, but useful to display runtime information in the console; for even more verbosity, use DEBUG=*)
- Open a web browser with URL http://127.0.0.1:3000 (as indicated in the console)
- Click on any blue link at the top of the page (each link corresponds to an EPUB file discovered inside the folder, but note that subfolders are not scanned by this simple server demo / test CLI)
- Click on the ./manifest.json/show/all link; this will display the Readium WebPub Manifest with clickable links to resources (images, CSS, HTML, etc.)
- Note that the http://127.0.0.1:3000/pub/_ID_/manifest.json URL endpoint (without /show/all) serves the raw JSON resource, which is probably what a real-world deployment would use. The /show/all URL is here to facilitate debugging / exploration of the Readium WebPub Manifest JSON.
A production deployment of r2-streamer-js would typically not use the built-in CLI as-is (i.e. https://github.com/readium/r2-streamer-js/blob/develop/src/http/server-cli.ts ); instead, a smarter CLI should be implemented to meet real-world needs. The core server runtime can be created with the following lines of code:
const server = new Server({
// options
});
server.preventRobots(); // for example
server.addPublications(files); // <=== this can be called any time after the server starts (incremental add/remove of publications, cache management)
const url = await server.start(0, false);
Assuming some DAISY2.02 audio-only publications are present inside a folder path (replace PATH_TO_DAISY_FOLDER
with your own filesystem location, which can be absolute or relative to the current pwd
folder):
- DEBUG=r2:* node /usr/local/bin/r2-shared-js-cli PATH_TO_DAISY_FOLDER/book.zip PATH_TO_DAISY_FOLDER generate-daisy-audio-manifest-only (note that DEBUG=r2:* is optional, but useful to display runtime information in the console; for even more verbosity, use DEBUG=*)
In the above example, PATH_TO_DAISY_FOLDER/book.zip
refers to a zipped DAISY fileset, but the command works with exploded / unzipped contents too:
DEBUG=r2:* node /usr/local/bin/r2-shared-js-cli PATH_TO_DAISY_FOLDER/book/ PATH_TO_DAISY_FOLDER generate-daisy-audio-manifest-only
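When several DAISY filesets need converting, the same command line can be assembled and launched from a Node script. The sketch below is only an illustration: buildConversionCommand and convertAll are hypothetical helper names, the CLI location defaults to the Mac example path shown earlier, and treating a zero exit status as success is an assumption about the CLI that should be verified.

```javascript
const { spawnSync } = require("child_process");

// Build the same argument list as the shell command above.
// cliPath defaults to the example Mac location; adjust it to the output of "which".
function buildConversionCommand(daisyPath, outputFolder, cliPath = "/usr/local/bin/r2-shared-js-cli") {
  return {
    command: "node",
    args: [cliPath, daisyPath, outputFolder, "generate-daisy-audio-manifest-only"],
  };
}

// Run the conversion once per DAISY zip file or unzipped folder.
// ASSUMPTION: a zero exit status means the manifest was generated.
function convertAll(daisyPaths, outputFolder, cliPath) {
  return daisyPaths.map((daisyPath) => {
    const { command, args } = buildConversionCommand(daisyPath, outputFolder, cliPath);
    const result = spawnSync(command, args, {
      env: { ...process.env, DEBUG: "r2:*" }, // optional, same as the DEBUG=r2:* prefix
      stdio: "inherit",
    });
    return { daisyPath, ok: result.status === 0 };
  });
}
```

A call such as convertAll(["books/book.zip", "books/other/"], "books") would then run the CLI twice and report one result per input.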
When the DEBUG environment variable is set, the console displays the following messages on success: DAISY audio only book => manifest-audio.json
and DAISY-EPUB-RWPM done.
The Readium WebPub Manifest JSON files are named after the original DAISY filename, for example: book.zip_manifest.json for a zipped fileset, or book_manifest.json for the unzipped folder. This file naming convention is critical: the DAISY and JSON file names must be kept in sync.
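The naming convention can be expressed as a pair of small helpers, which is handy for checking that a folder still has its DAISY and JSON names in sync. This is a sketch derived only from the two examples above; the function names are hypothetical and not part of r2-shared-js.

```javascript
const path = require("path");

// "some/folder/book.zip" -> "book.zip_manifest.json"
// "some/folder/book/"    -> "book_manifest.json"
function expectedManifestName(daisyPath) {
  // Strip trailing separators so "book/" and "book" behave the same.
  const trimmed = daisyPath.replace(/[\\/]+$/, "");
  return `${path.basename(trimmed)}_manifest.json`;
}

// Inverse mapping: recover the DAISY file or folder name from a manifest name.
// Returns null when the name does not follow the convention.
function daisyNameFromManifest(manifestPath) {
  const name = path.basename(manifestPath);
  const suffix = "_manifest.json";
  return name.endsWith(suffix) ? name.slice(0, -suffix.length) : null;
}
```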
Now start the r2-streamer-js test server on the folder that contains both the generated JSON files and the original DAISY filesets, to demonstrate them working together: DEBUG=r2:* node /usr/local/bin/r2-streamer-js-server PATH_TO_DAISY_FOLDER
. The CLI offers an easy way to test the server, but in a real-world scenario the server.addPublications(files)
JavaScript function would be called after the server is started, to enable on-demand streaming of the Readium WebPub Manifest JSON. For example, call server.addPublications([PATH_TO_JSON_FILE])
and the streamer will automatically find the corresponding original DAISY book based on the common root filename.
Note that the current r2-streamer-js
implementation does not provide an out-of-the-box caching / memory management solution. It is therefore recommended to write additional logic based on server.removePublications(files)
or server.uncachePublication(file)
in order to ensure that the streamer runtime does not allocate unnecessary memory, and does not keep filesystem handles open during access to zipped publications or unzipped folders.
See:
https://github.com/readium/r2-streamer-js/blob/a2faa6140074418fc354bca792023b387cb837a3/src/http/server.ts#L304-L332
and:
https://github.com/readium/r2-streamer-js/blob/a2faa6140074418fc354bca792023b387cb837a3/src/http/server.ts#L338-L411
Point of interest: Thorium (the desktop app) is currently the most active user of the streamer software component, which powers the application's publication backend service. The server is started and killed automatically based on whether or not publications are opened. Publications are removed from the streamer's internal cache as soon as all opened windows are closed by the user. Naturally, this memory management strategy isn't applicable to a real network client/server context, but it shows versatility. A future revision of r2-streamer-js
might include an off-the-shelf cache invalidation strategy, such as least-recently-used / time window. Suggestions welcome! :)
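As one possible starting point for the least-recently-used idea, the sketch below wraps a server object and evicts the stalest publication once a limit is exceeded. It is not part of r2-streamer-js: the class name LruPublicationCache and the maxEntries parameter are hypothetical, and the only server methods assumed are addPublications(files) and removePublications(files) as named in this document. Their exact behaviour should be checked against the server.ts links above; server.uncachePublication(file) may be the better call if the publication should stay registered.

```javascript
// Minimal LRU bookkeeping around a streamer-like server object.
// ASSUMPTION: server exposes addPublications(files) and removePublications(files).
class LruPublicationCache {
  constructor(server, maxEntries) {
    this.server = server;
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map keeps insertion order: oldest entry first
  }

  // Call this whenever a publication is requested.
  touch(file) {
    if (this.entries.has(file)) {
      this.entries.delete(file); // re-inserted below as most recently used
    } else {
      this.server.addPublications([file]);
    }
    this.entries.set(file, Date.now());

    // Evict least recently used publications beyond the limit.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
      this.server.removePublications([oldest]);
    }
  }
}
```

A time-window strategy could reuse the stored timestamps, for example by periodically removing every entry that has not been touched within a chosen interval.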