
cram: Clarify the preservation map's reference required flag for unmapped slices #809

@zaeleus


This is in regard to the CRAM format specification (version 3.1, 2024-09-04).

§ 8.4.1 "Compression header block: Preservation map" describes the reference required (RR) flag as "true if reference sequence is required to restore the data completely". If this flag is true but the records do not require a reference sequence to restore the data (e.g., in an unmapped slice), is that considered an invalid state?

As I understand it, this flag should be false if the slice is unmapped, but some implementations do not set it accordingly, e.g., htslib:

$ samtools --version | head -2
samtools 1.21
Using htslib 1.21

$ (
samtools view --output-fmt cram <<EOF
@HD	VN:1.6
r1	4	*	0	0	*	*	0	0	NNNN	!!!!
EOF
) | cram_dump - | grep --after-context 4 "Preservation map" | head -5
      Preservation map:
        SM => 30398990 (0x1cfda0e)
        TD => 30398986 (0x1cfda0a)
        RN => 1 (0x1)
        AP => 1 (0x1)

All the records in this slice are unmapped (cram_dump reports "Slice ref seq -1"). Because the preservation map omits the RR key, the reference required flag implicitly defaults to true, as per "The boolean values are optional, defaulting to true when absent..." However, a reference sequence is not required to decode this slice.
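To make the defaulting rule concrete, here is a minimal sketch of how a reader might compute the effective RR value from cram_dump-style preservation map lines. The function names `effective_rr` and `rr_set_for_unmapped` are hypothetical, and the `RR => 0 (0x0)` line shape is an assumption modeled on the `RN`/`AP` lines in the dump above.

```shell
#!/bin/sh
# Hypothetical helper: print the effective value of the preservation map's
# RR (reference required) flag from cram_dump-style lines on stdin.
# Assumption: an explicit RR entry looks like "RR => 0 (0x0)", matching the
# RN/AP lines shown above. Per the spec, boolean keys default to true when absent.
effective_rr() {
    awk '
        $1 == "RR" && $2 == "=>" { found = 1; value = $3 }
        END {
            if (!found)          print "true"   # key absent: defaults to true
            else if (value == 0) print "false"  # explicitly cleared
            else                 print "true"   # explicitly set
        }
    '
}

# Hypothetical check for the situation described in this issue: the slice
# reference sequence id is -1 (unmapped) yet the effective RR value is true.
rr_set_for_unmapped() {
    ref_seq_id=$1
    rr=$2
    [ "$ref_seq_id" -eq -1 ] && [ "$rr" = "true" ]
}
```

Fed the preservation map from the htslib output above (no `RR` line), `effective_rr` prints `true`, so `rr_set_for_unmapped -1 true` succeeds. That is exactly the combination whose validity is in question here.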
