Description
Hi developers,
Below is the exact command sequence I ran, with a short note after each conversion step giving the dimensions (rows × columns) of the exported table.
--------------- BUILD MATRIX ------------
hicBuildMatrix \
--samFiles "$ALL.1.sorted.bam" "$ALL.2.sorted.bam" \
--binSize 5000 \
--restrictionSequence GATC \
--danglingSequence GATC \
--restrictionCutFile "$REF.positions.bed" \
--threads $THREADS \
--inputBufferSize 100000 \
--outBam "$ALL.hic.bam" \
-o "$ALL.hic_matrix.h5" \
--QCfolder "$OUTDIR.hicQC"
------------ RAW → GInteractions ------------
hicConvertFormat \
--matrices $F.hic_matrix.h5 \
--inputFormat h5 \
--outputFormat ginteractions \
--outFileName $F.hic_matrix_raw
Row count: 189 718 253 × 7 (each row = one non-zero pixel).
---------------- NORMALISATION -------------
hicNormalize \
-m $F.hic_matrix.h5 \
--normalize norm_range \
-o $F.hic_normalized.h5
-------- NORM. MATRIX → GInteractions ------
hicConvertFormat \
--matrices $F.hic_normalized.h5 \
--inputFormat h5 \
--outputFormat ginteractions \
--outFileName $F.hic_normalized
After normalisation the conversion reports 21 956 835 × 7 rows, about 12 % of the original row count.
------------- CORRECTION: ICE --------------
hicCorrectMatrix correct \
--correctionMethod ICE \
-m $F.hic_normalized.h5 \
--filterThreshold -1 3.3791169303797464 \
-o $F.hic_corrected_normalized.h5
hicConvertFormat \
--matrices $F.hic_corrected_normalized.h5 \
--inputFormat h5 \
--outputFormat ginteractions \
--outFileName $F.hic_corrected_normalized
ICE correction lowers the row count to 19 521 502 × 7 (from 21 956 835 after normalisation).
------------- CORRECTION: KR ---------------
hicCorrectMatrix correct \
--correctionMethod KR \
-m $F.hic_normalized.h5 \
-o $F.hic_corrected_normalized.h5
hicConvertFormat \
--matrices $F.hic_corrected_normalized.h5 \
--inputFormat h5 \
--outputFormat ginteractions \
--outFileName $F.hic_corrected_normalized
KR correction, applied to the same normalised matrix, raises the row count slightly to 22 032 533 × 7 (from 21 956 835 after normalisation).
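To see which pixels disappear or appear between two steps, rather than only the totals, the exported tables can be compared on their coordinate columns. The following is a minimal sketch, assuming each exported table is a tab-separated file with the seven columns chrom1, start1, end1, chrom2, start2, end2, value. The function names `pixel_keys` and `compare_pixels` and the file names in the usage comment are hypothetical and not part of HiCExplorer.

```shell
# Hypothetical helpers, not part of HiCExplorer.
# Assumption: tab-separated input with columns
#   chrom1 start1 end1 chrom2 start2 end2 value

pixel_keys() {
    # Emit one sorted, unique key per pixel (the six coordinate columns).
    cut -f1-6 "$1" | LC_ALL=C sort -u
}

compare_pixels() (
    # $1 = table from the earlier step, $2 = table from the later step.
    # Runs in a subshell so the temporary variables do not leak.
    ka=$(mktemp)
    kb=$(mktemp)
    pixel_keys "$1" > "$ka"
    pixel_keys "$2" > "$kb"
    # comm needs both inputs sorted in the same collation order (LC_ALL=C here).
    printf 'only_in_first\t%s\n'  "$(LC_ALL=C comm -23 "$ka" "$kb" | wc -l | tr -d ' ')"
    printf 'only_in_second\t%s\n' "$(LC_ALL=C comm -13 "$ka" "$kb" | wc -l | tr -d ' ')"
    printf 'shared\t%s\n'         "$(LC_ALL=C comm -12 "$ka" "$kb" | wc -l | tr -d ' ')"
    rm -f "$ka" "$kb"
)

# Example call (placeholder file names):
#   compare_pixels raw_table.tsv normalized_table.tsv
```

A large `only_in_first` count with a small `only_in_second` count would indicate that pixels were dropped between the two steps; a non-zero `only_in_second` would indicate that new non-zero pixels were introduced. Sorting tables of this size needs considerable temporary disk space.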
Questions
• Row-count changes – I expected normalisation to preserve the set of non-zero rows, and correction to keep it the same or shrink it if bins are filtered. Instead I see a large drop after hicNormalize (to about 12 % of the raw count), a further drop after ICE correction, and a slight rise after KR correction. What internal filtering or bin-merging steps explain this behaviour? Is there a recommended way to keep the row count stable across steps?
• Raw counts – After normalisation/correction, hicConvertFormat … --load_raw_values does not output the raw counts; I only get the normalised/corrected values. Is there a recommended way to obtain the raw counts corresponding to the processed matrix? (To test the --load_raw_values flag I used a pipeline with only .cool files, no .h5 files.)
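As a possible workaround for the second question, the raw counts could be attached to the processed pixels by joining the two exported tables on their coordinates. This is only a sketch under assumptions: both tables are tab-separated with the columns chrom1, start1, end1, chrom2, start2, end2, value, and both use the same bins. The function name `attach_raw_counts` and the file names in the usage comment are hypothetical and not part of HiCExplorer.

```shell
# Hypothetical helper, not part of HiCExplorer.
# Assumption: both tables are tab-separated with columns
#   chrom1 start1 end1 chrom2 start2 end2 value
# and were exported at the same bin size.

attach_raw_counts() {
    # $1 = processed (normalised/corrected) table, $2 = raw table.
    # Loads the smaller processed table into memory, then streams the raw
    # table and prints, for every pixel present in both:
    #   six coordinate columns, processed value, raw count
    awk -F'\t' -v OFS='\t' '
        NR == FNR {
            processed[$1 FS $2 FS $3 FS $4 FS $5 FS $6] = $7
            next
        }
        {
            key = $1 FS $2 FS $3 FS $4 FS $5 FS $6
            if (key in processed)
                print $1, $2, $3, $4, $5, $6, processed[key], $7
        }
    ' "$1" "$2"
}

# Example call (placeholder file names):
#   attach_raw_counts corrected_table.tsv raw_table.tsv > corrected_with_raw.tsv
```

Output rows follow the order of the raw table, and processed pixels that have no raw counterpart are silently omitted. Holding roughly 20 million keys in an awk array needs several gigabytes of memory.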
Thank you for your help and for developing the tool!