
CRAM v4 idea tracker #144

Open
@jkbonfield

Description

CRAM v4 isn't on the cards yet, but it's good to keep track of ideas in one place so that, if it ever happens, we don't forget anything.

  • Index format improvements: see .crai index improvements #137.
  • Removal of CORE block?
  • Cull some of the codecs (mainly the bit-level ones used with the CORE block, e.g. BETA, SUBEXP, etc.).
  • Additional codecs, if not already added by then in a 3.x sub-release. E.g. FSE and Huff0 at the fast end, ZSTD in the middle, and maybe fqzcomp for qualities and a custom name-LZ at the slower end (see the io_lib cram_modules branch).
  • Signal read names present/absent via another route, e.g. a zero-length read name meaning "generate one". Currently this is done via a flag, but that also forces detached mode and has other implications.
  • Possibility of transforms applied before the codec.
    • E.g. nibble packing (put two qualities into one byte before applying rans0, rans1, huff0, etc.).
    • String delta (prefix / suffix removal).
  • Ability to store confidence (quality) values in their original orientation, i.e. not reversed for reverse-strand reads. This improves the order-1 compression ratio for qualities and is very quick to achieve.
  • Support for multiple embedded references per slice?
  • Permit the rANS frequency table to be stored in the compression header, so one table is shared by every slice in the container, and maybe implement an order-2 codec? Order-2 would be much slower due to its memory usage, but with, say, 100 slices of 1k reads each per container, we'd get good size efficiency coupled with strong random-access capability.
  • Potential for conditional decoding, allowing duplicated entries. E.g. h37 & h38 (the same reads aligned against two references).
    • Each read has a bit field indicating under which reference(s) the data should be emitted (both, in most cases).
    • Possibility of a bit field per data series too? E.g. the CIGAR may change but the quality string doesn't.
    • How would this work with read pairing (read number delta)?
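
One possible shape for the zero-length read name idea: the decision is made per record, and the decoder generates a name only when the stored one is empty, so stored and generated names can coexist. This is a minimal Python sketch; `decode_names` and the `prefix:index` naming scheme are hypothetical and do not reflect what any CRAM decoder actually generates.

```python
from typing import Iterable, Iterator


def decode_names(stored: Iterable[str], prefix: str = "read") -> Iterator[str]:
    """Yield read names, generating one whenever the stored name is empty.

    The "prefix:index" scheme is a hypothetical placeholder.
    """
    for index, name in enumerate(stored):
        yield name if name else f"{prefix}:{index}"


names = list(decode_names(["q1", "", "q3", ""]))
print(names)  # ['q1', 'read:1', 'q3', 'read:3']
```

A real scheme would also have to give both mates of a pair the same generated name, since SAM requires mates to share a QNAME.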
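
The nibble-packing transform from the list could look like this minimal Python sketch. It assumes every quality value already fits in 4 bits (0 to 15, e.g. after quality binning) and that the original length is stored elsewhere so the padding nibble can be dropped; `nibble_pack` and `nibble_unpack` are hypothetical names, not an existing CRAM API.

```python
def nibble_pack(quals: bytes) -> bytes:
    """Pack pairs of 4-bit quality values into single bytes.

    Assumes every value is in 0..15. An odd trailing value is padded
    with a zero low nibble; the caller must keep the original length.
    """
    if any(q > 15 for q in quals):
        raise ValueError("nibble packing needs values in 0..15")
    out = bytearray()
    for i in range(0, len(quals), 2):
        hi = quals[i]
        lo = quals[i + 1] if i + 1 < len(quals) else 0
        out.append((hi << 4) | lo)
    return bytes(out)


def nibble_unpack(packed: bytes, length: int) -> bytes:
    """Reverse nibble_pack, trimming the padding nibble via the stored length."""
    out = bytearray()
    for b in packed:
        out.append(b >> 4)
        out.append(b & 0x0F)
    return bytes(out[:length])


quals = bytes([2, 7, 7, 11, 11, 11, 15])
packed = nibble_pack(quals)
print(len(quals), len(packed))  # 7 4
assert nibble_unpack(packed, len(quals)) == quals
```

The packed stream is half as long, and each byte handed to rans0, rans1 or huff0 then carries two qualities, so an order-1 model's context is effectively the previous pair of qualities.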
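
String delta (prefix / suffix removal) can be illustrated by encoding each read name against the previous one as a triple: length of the shared prefix, length of the shared suffix, and the literal middle. This is a hypothetical transform written in Python for illustration, not the CRAM name tokeniser or any existing codec; the lengths and literals would then be passed to whatever codecs the data series use.

```python
from typing import List, Tuple

Delta = Tuple[int, int, str]  # (prefix length, suffix length, literal middle)


def common_prefix(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def delta_encode(names: List[str]) -> List[Delta]:
    out: List[Delta] = []
    prev = ""
    for name in names:
        p = common_prefix(prev, name)
        # Limit the suffix so it never overlaps the prefix in either string.
        s = 0
        while (s < len(prev) - p and s < len(name) - p
               and prev[-1 - s] == name[-1 - s]):
            s += 1
        out.append((p, s, name[p:len(name) - s]))
        prev = name
    return out


def delta_decode(deltas: List[Delta]) -> List[str]:
    out: List[str] = []
    prev = ""
    for p, s, middle in deltas:
        # prev[len(prev) - s:] is "" when s == 0.
        name = prev[:p] + middle + prev[len(prev) - s:]
        out.append(name)
        prev = name
    return out


names = ["SRR123.1/1", "SRR123.2/1", "SRR123.3/1"]
print(delta_encode(names))  # [(0, 0, 'SRR123.1/1'), (7, 2, '2'), (7, 2, '3')]
assert delta_decode(delta_encode(names)) == names
```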
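
Storing qualities in their original orientation amounts to undoing, before compression, the reversal that SAM/BAM apply to the qualities of reverse-strand reads (FLAG bit 0x10), and redoing it on decode. Presumably the gain comes from quality profiles then always running in sequencing order, which gives an order-1 model more consistent contexts. The Python sketch below assumes the FLAG is already available when qualities are re-oriented; the function names are hypothetical.

```python
SAM_FLAG_REVERSE = 0x10  # SAM FLAG bit: SEQ is reverse complemented


def to_machine_order(qual: bytes, flag: int) -> bytes:
    """Return qualities in the order the sequencer produced them."""
    return qual[::-1] if flag & SAM_FLAG_REVERSE else qual


def to_sam_order(qual: bytes, flag: int) -> bytes:
    """Restore SAM/BAM orientation on decode (reversal is its own inverse)."""
    return qual[::-1] if flag & SAM_FLAG_REVERSE else qual


# Two reads with the same sequencing-order profile, one on each strand.
records = [(0, b"IIIHH#"), (16, b"#HHIII")]
stored = [to_machine_order(q, f) for f, q in records]
print(stored)  # [b'IIIHH#', b'IIIHH#']
restored = [(f, to_sam_order(q, f)) for (f, _), q in zip(records, stored)]
assert restored == records
```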
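
The per-read bit field for conditional decoding might look like this Python sketch: each record carries a mask with one bit per reference, and the decoder emits only the records whose mask includes the reference being decoded against. The mask layout, the `Record` fields and the `REF_H37` / `REF_H38` constants are hypothetical.

```python
from dataclasses import dataclass
from typing import List

REF_H37 = 1 << 0  # hypothetical bit for the first reference (h37)
REF_H38 = 1 << 1  # hypothetical bit for the second reference (h38)


@dataclass
class Record:
    name: str
    cigar: str
    ref_mask: int  # references under which this record should be emitted


def emit_for(records: List[Record], ref_bit: int) -> List[Record]:
    """Return the records to output when decoding against one reference."""
    return [r for r in records if r.ref_mask & ref_bit]


records = [
    Record("r1", "100M", REF_H37 | REF_H38),  # identical under both references
    Record("r2", "100M", REF_H37),            # h37-specific alignment
    Record("r2", "60M40S", REF_H38),          # h38-specific alignment, same read
]
print([r.cigar for r in emit_for(records, REF_H38)])  # ['100M', '60M40S']
```

A per-data-series bit field would push the same test down to individual fields, e.g. storing two CIGARs for a read while sharing a single quality string.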

Project status: Stalled