Bit-packed arithmetic over Z/nZ for tiny n

Consider n < 256. We get 8x memory reduction using ``nmod8`` instead of ``nmod``, so it would be interesting to optimize the arithmetic in the ``nmod8`` representation. Some low-hanging fruit here is to do batch modular reduction using SIMD operations.

For really small n we can do even better by bit packing. Obviously for n = 2 we want to use a single word to represent a chunk of 64 entries. Polynomial multiplication is easy on modern hardware since there are now dedicated CLMUL instructions

What about other tiny n?

We can store an entry of Z/3Z using 2 bits, so it's possible to pack 32 entries into a word. Packing as tightly as possible is not obviously optimal: using 3 or 4 bits or even 6 or 8 bits has the benefit that one can do some arithmetic in-place before reducing. When packing to 2 bits, the article https://doi.org/10.36045/bbms/1369316548 discusses a nonstandard encoding which makes the arithmetic operations easier to implement using bitwise operations. To figure out what works best, one probably needs some implementation experiments.

It should be possible to do some more small cases like n = 5 and 7 very similarly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bit-packed arithmetic over Z/nZ for tiny n #2428

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Bit-packed arithmetic over Z/nZ for tiny n #2428

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions