Fix the rANS 4x8 order-1 frequency encoding example. #819

jkbonfield · 2025-03-24T17:22:40Z

The text erroneously divided 43 into 11,11,11,10 when it would be 10,10,10,13.

Instead of 4x11 minus 1 I chose a new example of 4x11 plus 1, so the word boundaries ended at the same points.

Fixes #817

github-actions · 2025-03-24T17:24:49Z

Changed PDFs as of 6259dcd: CRAMcodecs (diff).

zaeleus · 2025-03-24T20:58:11Z

CRAMcodecs.tex

+a & a &  614 \\
+  & b & 1638 \\
+  & c &  819 \\
+  & d & 1024 \\


Suggested change

a & a & 614 \\

& b & 1638 \\

& c & 819 \\

& d & 1024 \\

a & a & 614 \\

& b & 1639 \\

& c & 819 \\

& d & 1023 \\

I know the spec doesn't define the adjustment to the normalized sum,¹ but consider being consistent with the (implicit) method used in the order-0 example, i.e., the symbol with the highest frequency is adjusted.

Footnotes

§ 2.1 "Frequency table" (2023-03-15): "Given rounding differences when renormalising to a fixed sum, it is up to the encoder to decide how to distribute any remainder or remove excess frequencies." ↩

I'm happy to change it. I think neither of these are actually the values that came out of htscodecs (until I modified it a bit anyway) as it was dividing by 21 instead of 20 initially due to an extra line for incrementing the first (or last, I forget) character frequency.

Infact I'm wondering if that's a bug in hist1_4. I see the line that bumped T0[a] to 21 for my little test sequence was from moving something out of a loop:

while (in < in_end) { F0[l][c = *in++]++; - T0[l]++; l = c; } + T0[l]++;

I don't see why I did this and it looks wrong! I wonder if I can come up with some strings that break it...

Ok I can see my code above is a weird quirk, but harmless. It's correct to include that last char in T0, but T0 is not the same as the sum of frequencies for transitions from a->b as that inevitably misses one char. Anyway, it's a harmless thing and it's all an approximation anyway. If the last char is something we've never seen before, not including it in T0 still leads to a correct decode table (as we don't ever have last-char to next-char occuring anyway).

That segway aside, the numbers do work out correctly in my code as I'm strangely scaling to 4096 and then performing the correction to 4095 (which was erroneous and I removed that foible when I did the 4x16 frequency encoding, but it was probably to avoid some overflow in the initial implementation).

Before norm:

T[a]=20 F[a][a]=3 F[a][b]=8 F[a][c]=4 F[a][d]=5

Multiplicative factor is 204.8 if we divide 4096 by 20, which gives the numbers I have above, exactly summing to 4095 by luck! If we divide 4095/20 we get multiplicative factor of 204.75 which gives 614, 1638, 819, 1023, and then as you say we increase the 1638 to 1639 instead. Arguably this would be a better implementation so I'll change it, but none of this have any material impact on the spec as we are defining the format of the frequency table along with the rules to decode it. It's totally up to the implementation to decide how normalisation should work and what I'm currently doing is a really poor strategy compared to a more meaningful redistribution strategy.

github-actions · 2025-03-25T10:25:10Z

Changed PDFs as of 5f24b21: CRAMcodecs (diff).

zaeleus · 2025-03-28T19:36:31Z

CRAMcodecs.tex

+0x61      0x82 0x66  # a             <614>
+0x62 0x02 0x86 0x66  # b <+2: c,d>  <1638>
+          0x83 0x33  # c (implicit)  <819>
+          0x84 0x00  # d (implicit) <1024>


Suggested change

0x61 0x82 0x66 # a <614>

0x62 0x02 0x86 0x66 # b <+2: c,d> <1638>

0x83 0x33 # c (implicit) <819>

0x84 0x00 # d (implicit) <1024>

0x61 0x82 0x66 # a <614>

0x62 0x02 0x86 0x67 # b <+2: c,d> <1639>

0x83 0x33 # c (implicit) <819>

0x83 0xff # d (implicit) <1023>

github-actions · 2025-04-07T13:48:24Z

Changed PDFs as of 562cc61: CRAMcodecs (diff).

The text erroneously divided 43 into 11,11,11,10 when it would be 10,10,10,13. Instead of 4x11 minus 1 I chose a new example of 4x11 plus 1, so the word boundaries ended at the same points. Fixes samtools#817

github-actions · 2025-04-07T14:37:41Z

Changed PDFs as of 563e8ab: CRAMcodecs (diff).

jkbonfield added the cram label Mar 24, 2025

zaeleus reviewed Mar 24, 2025

View reviewed changes

jkbonfield force-pushed the rans4x8-example branch from 6259dcd to 5f24b21 Compare March 25, 2025 10:23

zaeleus reviewed Mar 28, 2025

View reviewed changes

jkbonfield force-pushed the rans4x8-example branch from 5f24b21 to 562cc61 Compare April 7, 2025 13:46

zaeleus approved these changes Apr 7, 2025

View reviewed changes

Fix the rANS 4x8 order-1 frequency encoding example (PR samtools#819)

563e8ab

The text erroneously divided 43 into 11,11,11,10 when it would be 10,10,10,13. Instead of 4x11 minus 1 I chose a new example of 4x11 plus 1, so the word boundaries ended at the same points. Fixes samtools#817

jkbonfield force-pushed the rans4x8-example branch from 562cc61 to 563e8ab Compare April 7, 2025 14:35

jkbonfield merged commit 563e8ab into samtools:master Apr 7, 2025
1 check passed

jkbonfield temporarily deployed to github-pages April 7, 2025 14:39 — with GitHub Pages Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix the rANS 4x8 order-1 frequency encoding example. #819

Fix the rANS 4x8 order-1 frequency encoding example. #819

Uh oh!

jkbonfield commented Mar 24, 2025

Uh oh!

github-actions bot commented Mar 24, 2025

Uh oh!

zaeleus Mar 24, 2025

Uh oh!

jkbonfield Mar 25, 2025

Uh oh!

jkbonfield Mar 25, 2025

Uh oh!

github-actions bot commented Mar 25, 2025

Uh oh!

zaeleus Mar 28, 2025

Uh oh!

github-actions bot commented Apr 7, 2025

Uh oh!

github-actions bot commented Apr 7, 2025

Uh oh!

Uh oh!

Uh oh!

Fix the rANS 4x8 order-1 frequency encoding example. #819

Fix the rANS 4x8 order-1 frequency encoding example. #819

Uh oh!

Conversation

jkbonfield commented Mar 24, 2025

Uh oh!

github-actions bot commented Mar 24, 2025

Uh oh!

zaeleus Mar 24, 2025

Choose a reason for hiding this comment

Footnotes

Uh oh!

jkbonfield Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

jkbonfield Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 25, 2025

Uh oh!

zaeleus Mar 28, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 7, 2025

Uh oh!

github-actions bot commented Apr 7, 2025

Uh oh!

Uh oh!

Uh oh!