Releases: SciLifeLab/umi-transfer
Version 1.5.0 - "Able adenine"
New and Improved Features:
- As of this release, umi-transfer features internal multi-threaded output compression. As a result, umi-transfer 1.5 runs approximately 25 times faster than version 1.0 when using internal compression, and about twice as fast as using an external compression tool via a buffered pipe.
- The new CLI arguments `-t`, `--threads <NUM_THREADS>` and `-l`, `--compression_level <COMPRESSION_LEVEL>` have been introduced accordingly. The compression level defaults to 3. Higher numbers result in marginally smaller files but take significantly longer to compress. For the number of threads, we recommend 9 or 11, if enough logical cores are available.
- The Docker base image has been updated from Debian Bullseye (Version 11) to Debian Bookworm (Version 12).
- Improved integration tests, including CLI prompts.
- Updates to the internal Rust libraries and dependencies.
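The release notes do not describe how the multi-threaded output compression is implemented. As a rough illustration of the general idea only, the sketch below splits the data into blocks, compresses the blocks on several worker threads, and returns them in input order. Everything in it is hypothetical: `compress_block` is a toy run-length encoder standing in for a real compressor such as gzip, and `compress_parallel`, `threads` and `block_size` are illustrative names, not part of umi-transfer.

```rust
use std::thread;

/// Stand-in for a real compressor such as gzip: plain run-length encoding.
/// Each run is emitted as (count, byte), with runs capped at 255.
fn compress_block(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < block.len() {
        let byte = block[i];
        let mut run = 1usize;
        while i + run < block.len() && block[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Compress `data` in independent blocks on up to `threads` worker threads.
/// The compressed blocks are returned in input order.
fn compress_parallel(data: &[u8], threads: usize, block_size: usize) -> Vec<Vec<u8>> {
    assert!(block_size > 0, "block_size must be positive");
    let blocks: Vec<&[u8]> = data.chunks(block_size).collect();
    // Number of consecutive blocks handed to each worker (at least one).
    let workers = threads.max(1);
    let per_worker = ((blocks.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn all workers first so that they run concurrently.
        let handles: Vec<_> = blocks
            .chunks(per_worker)
            .map(|group| {
                scope.spawn(move || group.iter().map(|b| compress_block(b)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order keeps the output in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because every block is compressed independently, the result does not depend on the number of threads: `compress_parallel(data, 1, 4)` and `compress_parallel(data, 3, 4)` return the same blocks. Only the wall-clock time changes, which is the property a real multi-threaded compressor relies on.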
Discontinued Previous Features:
- None
Changes in detail:
- Update the Rust CI on the main branch by @MatthiasZepper in #12
- Implement multi-threaded FastQ compression for umi-transfer by @MatthiasZepper in #11
- Polishing for 1.5 release by @MatthiasZepper in #13
- Version 1.5.0 release PR - "Able adenine" by @MatthiasZepper in #14
Full Changelog: v1.0.0...v.1.5.0
Version 1.0.0 - "Ardent adenine"
This release represents an essentially full rewrite of umi-transfer by Johannes Alneberg (@alneberg) and me (@MatthiasZepper):
New and Improved Features:
- Code organization: The code base has been split into separate files, with each file representing a subcommand and its associated CLI configuration. This improves clarity and allows for easy integration of additional subcommands and functionalities in the future.
- Enhanced CLI options: The CLI arguments have been revamped for improved usability. Previously, specifying the output directly was not possible, which hindered the creation of an nf-core Nextflow module. Specifying an output is still optional, but the output file names are now derived from the input file names rather than from a constant base provided as a CLI argument. Furthermore, the delimiter used to join the UMIs can now be customized. The `--edit_nr` flag has been renamed to `--correct_numbers` and applies to both files for better consistency.
- Improved output file handling: The output file name will automatically include a .gz suffix if the `-z`/`--compress` flag is enabled. Conversely, an existing .gz suffix will be removed if no compression was requested. Additionally, the tool verifies that the output file does not exist yet and prompts for overwrite confirmation (unless `-f`/`--force` is specified).
- Enhanced error handling: Functions have been rewritten to utilize Results and Options, enabling proper error handling. Before, many functions simply panicked and the program crashed, for example if a non-existing input file was specified.
- UMI ID validation: The tool now compares the ID of the UMI to that of the read, ensuring that the tool terminates upon encountering a mismatch. This prevents incorrect UMIs from being added to the read IDs due to differently sorted files.
- Automated tests: Several unit tests and extensive integration tests have been implemented to enhance the reliability of the tool.
- Continuous integration pipelines: The CI pipelines have been refactored, and a new release pipeline builds the tool for seven common architectures.
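Two of the behaviours listed above, the .gz suffix handling and the UMI ID validation, can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names `output_name`, `split_header` and `transfer_umi` are hypothetical, the ID is assumed to be the part of the FastQ header before the first space, and the UMI is assumed to be appended to that ID with the chosen delimiter.

```rust
/// Hypothetical helper: add or strip a ".gz" suffix depending on whether
/// compressed output was requested.
fn output_name(name: &str, compress: bool) -> String {
    match (compress, name.strip_suffix(".gz")) {
        (true, None) => format!("{name}.gz"),   // compression on, suffix missing
        (false, Some(stem)) => stem.to_string(), // compression off, suffix present
        _ => name.to_string(),                   // name already matches the request
    }
}

/// Split a FastQ header into its ID and the optional rest after the first space.
/// Assumption: the ID is everything before the first space.
fn split_header(header: &str) -> (&str, Option<&str>) {
    match header.split_once(' ') {
        Some((id, rest)) => (id, Some(rest)),
        None => (header, None),
    }
}

/// Hypothetical helper: append the UMI sequence to the read ID, but only if
/// the read and the UMI record carry the same ID. A mismatch is returned as
/// an error instead of panicking.
fn transfer_umi(
    read_header: &str,
    umi_header: &str,
    umi_seq: &str,
    delimiter: &str,
) -> Result<String, String> {
    let (read_id, read_rest) = split_header(read_header);
    let (umi_id, _) = split_header(umi_header);
    if read_id != umi_id {
        return Err(format!("ID mismatch: read '{read_id}' vs. UMI '{umi_id}'"));
    }
    Ok(match read_rest {
        Some(rest) => format!("{read_id}{delimiter}{umi_seq} {rest}"),
        None => format!("{read_id}{delimiter}{umi_seq}"),
    })
}
```

With these definitions, `output_name("sample_R1.fastq", true)` yields `sample_R1.fastq.gz`, and `transfer_umi("@r1 1:N:0", "@r1 2:N:0", "ACGT", ":")` yields `@r1:ACGT 1:N:0`. If the two files are sorted differently, the IDs stop matching and the function returns an `Err`, which mirrors the termination on mismatch described above.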
Discontinued Previous Features:
- Support for inline UMIs: The previous inline functionality for transferring fixed-length UMIs was limited and did not support offsets or regular expressions. Since existing tools like `umitools` already serve this purpose, we decided to prioritize the development of novel functionality. However, the new subcommand structure in the code paves the way for future support of inline UMIs.
- Progress bar: The progress bar provided a helpful visual aid, but it required counting one of the files to determine the total number of reads, so the file had to be read twice. For performance reasons, we removed this feature, especially since most runs are expected to be non-interactive in workflow systems like Nextflow.
- Multi-threading: In the previous version (0.1) of `umi-transfer`, it was possible to run the tool on two cores when processing paired FastQ files, with each file assigned to a separate thread. However, the tool's performance was primarily limited by output compression, and multi-threading caused significant overhead. A future version of `umi-transfer` will be designed to run fully asynchronously and scale efficiently over multiple threads. In the meantime, we recommend utilizing FIFOs and external compression with tools like `pigz`.
- Support for singletons: To simplify the code structure, we made the second FastQ file mandatory. For running on singletons, you can provide the same input twice and redirect one of the output files to `/dev/null` using a FIFO.
v.0.2.0 Gzipped input and output file support, customizable flags etc.
- Gzipped input files are now accepted. Non-gzipped files are still accepted as well. The tool will automatically detect whether the input is gzipped or not.
- Automatically gzip output files. This can be disabled with the '--no-gzip' flag.
- Disabled automatic renaming from '3' to '2' in R3 output files. This can be enabled again with the '--edit-nr' flag.
- Progress bar UI added in console.
- Better help messages printed with '--help'/'-h' flag.
- Comments added to code.
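The notes do not say how the gzip detection works. One common approach, shown here purely as an assumption, is to peek at the first two bytes of the input and compare them with the gzip magic number `0x1f 0x8b`. The function name `is_gzipped` is hypothetical.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: report whether a stream starts with the gzip magic
/// bytes (0x1f, 0x8b). `fill_buf` only peeks at the buffered data, so the
/// bytes remain available to whichever reader handles the stream afterwards.
/// Caveat: `fill_buf` may in principle return fewer than two bytes even when
/// more data will follow, in which case this sketch reports `false`.
fn is_gzipped<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    let buffered = reader.fill_buf()?;
    Ok(buffered.starts_with(&[0x1f, 0x8b]))
}
```

Since nothing is consumed, the caller can afterwards wrap the same reader in either a gzip decoder or a plain FastQ parser, depending on the result.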
umi-transfer version 0.1.0
Initial release of umi-transfer.