CRAM v4 isn't on the cards yet, but it's good to keep track of any ideas in one place so if it ever happens we don't forget something.
- Index format improvements: .crai index improvements #137
- Removal of CORE block?
- Cull some of the codecs (mainly those to do with core, eg beta, subexp, etc)
- Additional codecs, if not already added by then in a 3.x sub-release. Eg FSE, Huff0 at the fast end, ZSTD in the middle and maybe fqzcomp for qualities and custom name-LZ at the slower end (see io_lib cram_modules branch).
- Read names present/absent via another route, eg zero length read name => generate. Currently it is done via a flag but this also forces detached mode and has other implications.
- Possibility of transforms prior to codec.
  - Eg nibble packing (put two qualities into one byte before applying rans0, rans1, huff0, etc).
  - String delta (prefix / suffix removal).
- Ability to store confidence values in original orientation. This improves order-1 compression ratio for qualities and is very quick to achieve.
- Support for multiple embedded references per slice?
- Permit rANS table to be in compression header and maybe implement an order-2 codec? It'll be much slower due to memory usage, but with, say, 100 slices of 1k reads each per container we could potentially get size efficiency coupled with strong random access capability.
- Potential for conditional decoding, possibly with duplicated entries. Eg h37 & h38.
  - Each read has a bit field to indicate under which reference the data should be emitted (both, in most cases).
  - Possibility of a bit field per data series too? Eg cigar may change, but quality string doesn't.
  - How to get this working with read-pairing (read number delta)?
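
To make the "transforms prior to codec" idea above a bit more concrete, here is a rough Python sketch of nibble packing and prefix-based string delta. All function names are hypothetical and nothing here is taken from io_lib or any existing CRAM implementation. It assumes nibble packing only applies when a block has at most 16 distinct quality values, and it only shows prefix removal (not suffix). The output of either transform would then be fed to the entropy codec.

```python
from typing import List, Tuple


def nibble_pack(quals: bytes) -> Tuple[bytes, bytes, int]:
    """Pack two quality symbols per byte (assumes <= 16 distinct symbols).

    Returns (alphabet, packed bytes, original length).
    """
    alphabet = bytes(sorted(set(quals)))
    if len(alphabet) > 16:
        raise ValueError("more than 16 distinct symbols; cannot nibble pack")
    rank = {sym: i for i, sym in enumerate(alphabet)}
    packed = bytearray()
    for i in range(0, len(quals), 2):
        hi = rank[quals[i]]
        # Pad the final low nibble with 0 when the length is odd.
        lo = rank[quals[i + 1]] if i + 1 < len(quals) else 0
        packed.append((hi << 4) | lo)
    return alphabet, bytes(packed), len(quals)


def nibble_unpack(alphabet: bytes, packed: bytes, length: int) -> bytes:
    """Inverse of nibble_pack; `length` trims the padding nibble."""
    out = bytearray()
    for b in packed:
        out.append(alphabet[b >> 4])
        out.append(alphabet[b & 0x0F])
    return bytes(out[:length])


def prefix_delta_encode(names: List[bytes]) -> List[Tuple[int, bytes]]:
    """Replace each name by (shared prefix length with previous name, remainder)."""
    prev = b""
    out = []
    for name in names:
        n = 0
        limit = min(len(prev), len(name))
        while n < limit and prev[n] == name[n]:
            n += 1
        out.append((n, name[n:]))
        prev = name
    return out


def prefix_delta_decode(deltas: List[Tuple[int, bytes]]) -> List[bytes]:
    """Inverse of prefix_delta_encode."""
    prev = b""
    names = []
    for n, suffix in deltas:
        name = prev[:n] + suffix
        names.append(name)
        prev = name
    return names
```

Both transforms are lossless round trips, so a decoder only needs to know which transform was applied (plus the alphabet and length for the nibble case). Where that information would live in the format is left open here.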