Unexpected change in GInteractions row-count after hicNormalize / hicCorrectMatrix, and --load_raw_values flag not giving raw counts in normalized/corrected matrix #944

@fatimanawmi

Hi developers,
Below is the exact command sequence I ran, with a short note after each block giving the dimensions of the output.


--------------- BUILD MATRIX ------------

hicBuildMatrix \
--samFiles "$ALL.1.sorted.bam" "$ALL.2.sorted.bam" \
--binSize 5000 \
--restrictionSequence GATC \
--danglingSequence GATC \
--restrictionCutFile "$REF.positions.bed" \
--threads $THREADS \
--inputBufferSize 100000 \
--outBam "$ALL.hic.bam" \
-o "$ALL.hic_matrix.h5" \
--QCfolder "$OUTDIR.hicQC"


------------ RAW → GInteractions ------------

hicConvertFormat \
--matrices $F.hic_matrix.h5 \
--inputFormat h5 \
--outputFormat ginteractions \
--outFileName $F.hic_matrix_raw

Row count: 189 718 253 × 7 (each row = one non-zero pixel).


---------------- NORMALISATION -------------

hicNormalize \
-m $F.hic_matrix.h5 \
--normalize norm_range \
-o $F.hic_normalized.h5


-------- NORM. MATRIX → GInteractions ------

hicConvertFormat \
--matrices $F.hic_normalized.h5 \
--inputFormat h5 \
--outputFormat ginteractions \
--outFileName $F.hic_normalized

After normalisation the conversion reports 21 956 835 × 7 rows, ~12 % of the original.
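
The percentage above can be re-derived from the row counts. This is a minimal sketch, assuming each exported GInteractions file holds one tab-separated row per non-zero pixel; the helper names count_rows and percent_of are illustrative and not part of HiCExplorer.

```shell
#!/bin/sh
set -eu

# count_rows FILE: number of non-empty lines (one per non-zero pixel).
count_rows() {
    awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}

# percent_of PART WHOLE: PART as a percentage of WHOLE, one decimal place.
percent_of() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

# Row counts reported in this issue (after hicNormalize vs. raw).
percent_of 21956835 189718253
```

On real exports, count_rows would be pointed at whatever file hicConvertFormat wrote; the exact output file name is not shown here, so it is left out of the sketch.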


------------- CORRECTION: ICE --------------

hicCorrectMatrix correct \
--correctionMethod ICE \
-m $F.hic_normalized.h5 \
--filterThreshold -1 3.3791169303797464 \
-o $F.hic_corrected_normalized.h5

hicConvertFormat \
--matrices $F.hic_corrected_normalized.h5 \
--inputFormat h5 \
--outputFormat ginteractions \
--outFileName $F.hic_corrected_normalized

ICE correction lowers the row count to 19 521 502 × 7 (the normalised matrix had 21 956 835).


------------- CORRECTION: KR ---------------

hicCorrectMatrix correct \
--correctionMethod KR \
-m $F.hic_normalized.h5 \
-o $F.hic_corrected_normalized.h5

hicConvertFormat \
--matrices $F.hic_corrected_normalized.h5 \
--inputFormat h5 \
--outputFormat ginteractions \
--outFileName $F.hic_corrected_normalized

KR correction raises the row count to 22 032 533 × 7, slightly above the normalised matrix (21 956 835).
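
To help narrow down where rows appear or disappear, the value column of each export can be summarised. This is only a diagnostic sketch, assuming the seventh tab-separated column holds the pixel value; summarize_values is a hypothetical helper, and the sketch makes no claim about what HiCExplorer does internally.

```shell
#!/bin/sh
set -eu

# summarize_values FILE: row count, non-finite count, and min/max of column 7.
summarize_values() {
    awk -F '\t' '
        NF >= 7 {
            n++
            # Treat nan/inf spellings as non-finite and skip them for min/max.
            if (tolower($7) ~ /nan|inf/) { bad++; next }
            v = $7 + 0
            if (count == 0 || v < min) min = v
            if (count == 0 || v > max) max = v
            count++
        }
        END { printf "rows=%d nonfinite=%d min=%g max=%g\n", n, bad, min, max }
    ' "$1"
}
```

Running it on the raw, normalised, and corrected exports side by side would show whether the differing row counts coincide with non-finite or very small values.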


Questions
• Row-count changes – I expected normalisation to preserve the set of non-zero rows, and correction to keep it the same or shrink it if bins are filtered. Instead I see a large drop after hicNormalize, a further drop after ICE correction, and a slight rise after KR correction. What internal filtering or bin-merging steps explain this behaviour? Is there a recommended way to keep the row count stable across steps?
• Raw counts – After normalisation/correction, hicConvertFormat … --load_raw_values does not output the raw counts; I only get the normalised/corrected values. Is there a recommended way to obtain the corresponding raw counts for the processed matrix? To use the --load_raw_values flag, I ran a pipeline with only .cool files, no .h5 files.
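
As a possible workaround for both questions, the raw and processed exports can be compared by pixel coordinates. This is a rough sketch, assuming both files are tab-separated with columns chrom1, start1, end1, chrom2, start2, end2, value; attach_raw and count_missing are hypothetical helpers, and the demo rows are made up.

```shell
#!/bin/sh
set -eu

# attach_raw RAW PROCESSED: print each PROCESSED row with the matching RAW
# count appended as column 8 ("NA" if the pixel is absent from RAW).
attach_raw() {
    awk -F '\t' -v OFS='\t' '
        { key = $1 FS $2 FS $3 FS $4 FS $5 FS $6 }
        NR == FNR { raw[key] = $7; next }
        { print $0, ((key in raw) ? raw[key] : "NA") }
    ' "$1" "$2"
}

# count_missing RAW PROCESSED: number of RAW pixels with no row in PROCESSED.
count_missing() {
    awk -F '\t' '
        { key = $1 FS $2 FS $3 FS $4 FS $5 FS $6 }
        NR == FNR { seen[key] = 1; next }
        !(key in seen) { n++ }
        END { print n + 0 }
    ' "$2" "$1"
}

# Tiny made-up example: two raw pixels, one of which survives processing.
demo=$(mktemp -d)
printf 'chr1\t0\t5000\tchr1\t0\t5000\t12\n' > "$demo/raw.tsv"
printf 'chr1\t0\t5000\tchr1\t5000\t10000\t1\n' >> "$demo/raw.tsv"
printf 'chr1\t0\t5000\tchr1\t0\t5000\t0.75\n' > "$demo/processed.tsv"
attach_raw "$demo/raw.tsv" "$demo/processed.tsv"
count_missing "$demo/raw.tsv" "$demo/processed.tsv"
rm -rf "$demo"
```

Both helpers hold one file's keys in memory, so with roughly 190 million raw rows attach_raw may need a lot of RAM; a sort-based join on the coordinate columns would be a more frugal alternative.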

Thank you for your help and for developing the tool!
