Description
Perhaps you already have plans for this, but it seems the degree 32 data takes up 1.4 GB, yet I think this can be reduced by at least a factor of 3, possibly more (and that is not even considering the option of storing everything gzip compressed).
Some suggestions follow; if you are interested, I would also be willing to help implement them; after all, I could use some of the techniques for my SGL extension anyway.
- `dat32/trans32.grp` takes up 17 MB but could be reduced to less than 20 kB. Namely, `TRANSSIZES[32]` contains only 487 distinct entries. So we only need to store a list of pairs `[n, count]`, where `count` indicates how often `n` occurs in `TRANSSIZES[32]`. From this, one can reconstruct the array during loading.
- Indeed, `TRANSSIZES[32]` also takes up 21 MB of RAM. This could also be reduced by replacing all read accesses to `TRANSSIZES` with calls to a helper function `TRANSsizes` (this change would also make future HPC-GAP support easier). This helper function can return `TRANSSIZES[deg]` for most degrees, but for degree 32 we can do something different.
- The various `dat32/trans32*.g` files are 2-3 MB each. They mostly consist of degree 32 permutations, stored in cycle notation. I sampled some of the files, and it seems that on average over 94 bytes are used to store each permutation (the minimum, if all 32 points are moved, is 88 bytes for a full cycle; add 1 byte for each additional cycle in the cycle decomposition). But we can do much better: we can store permutations as strings of length 32 (almost a factor of 3 better):
```gap
gap> perm;
(1,23,9,31,20,8,27,15)(2,24,10,32,19,7,28,16)(3,21,11,30,17,5,26,13)(4,22,12,29,18,6,25,14)
gap> List(ListPerm(perm, 32), n->IdentifierLetters[n]);
"LMJKONQPTUSR23014567@A89CBDEGFIH"
```
To further reduce the overhead (avoiding quotes and commas) one can simply concatenate these strings.
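A minimal sketch of the string encoding and its inverse, including the concatenated form, might look as follows. The four function names are hypothetical and not part of transgrp; the sketch assumes the degree does not exceed the length of `IdentifierLetters`.

```gap
# Hypothetical helper names; not part of the transgrp package.

# Encode a permutation on [1..deg] as a string of deg characters.
PermToLetters := function(perm, deg)
  return List(ListPerm(perm, deg), n -> IdentifierLetters[n]);
end;

# Decode one such string back into a permutation.
LettersToPerm := function(str)
  return PermList(List(str, c -> Position(IdentifierLetters, c)));
end;

# Store many permutations as one long string, without quotes or commas.
PermsToLetters := function(perms, deg)
  return Concatenation(List(perms, p -> PermToLetters(p, deg)));
end;

# Split the long string into blocks of deg characters and decode each block.
LettersToPerms := function(data, deg)
  return List([1 .. QuoInt(Length(data), deg)],
              i -> LettersToPerm(data{[(i - 1) * deg + 1 .. i * deg]}));
end;

# Usage: round trip for two sample permutations of degree 32.
perms := [ (1,2,3), (1,32)(5,7) ];
data := PermsToLetters(perms, 32);
Length(data);                        # 64
LettersToPerms(data, 32) = perms;    # true
```

Because every block has the same length, no separator is needed to find where one permutation ends and the next begins.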
Of course one can do even better: each of the 32 values fits exactly into 5 bits, so 32 of them take 160 bits or 20 bytes. But then you'd have to be willing to store binary data. If one instead uses base64 encoding, one could store these 160 bits in 27 ASCII characters (at 6 bits per character), and with base122 it gets down to 23 characters. Implementing this is quite easy, and should reduce the data size by a factor in the ballpark of 88/23 ≈ 3.83.
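The 5-bit packing can be sketched in GAP with plain integer arithmetic. This is only an illustration under assumptions: the function names are hypothetical, the byte order (least significant first) is an arbitrary choice, and writing the bytes to a file or wrapping them in base64 or base122 is left out.

```gap
# Hypothetical helper names; a sketch, not part of transgrp.

# Pack a permutation on [1..32] into one integer below 2^160:
# point i contributes the 5-bit value (i^perm - 1) at bit offset 5*(i-1).
PackPerm32 := function(perm)
  local img;
  img := ListPerm(perm, 32);
  return Sum([1 .. 32], i -> (img[i] - 1) * 32^(i - 1));
end;

# The 20 byte values, as base-256 digits with the least significant first.
PackedBytes32 := function(perm)
  local bytes;
  bytes := ShallowCopy(CoefficientsQadic(PackPerm32(perm), 256));
  # CoefficientsQadic omits leading zero digits, so pad to exactly 20 bytes.
  Append(bytes, ListWithIdenticalEntries(20 - Length(bytes), 0));
  return bytes;
end;

# Inverse: rebuild the integer from the bytes, then split it into 5-bit digits.
UnpackBytes32 := function(bytes)
  local n, digits;
  n := Sum([1 .. Length(bytes)], i -> bytes[i] * 256^(i - 1));
  digits := ShallowCopy(CoefficientsQadic(n, 32));
  Append(digits, ListWithIdenticalEntries(32 - Length(digits), 0));
  return PermList(List(digits, d -> d + 1));
end;

# Usage: round trip for the permutation from the example above.
perm := (1,23,9,31,20,8,27,15)(2,24,10,32,19,7,28,16)(3,21,11,30,17,5,26,13)(4,22,12,29,18,6,25,14);
bytes := PackedBytes32(perm);
Length(bytes);                  # 20
UnpackBytes32(bytes) = perm;    # true
```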
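Returning to the first two suggestions, the `[n, count]` idea can be sketched as a run-length encoding. All names below are hypothetical. The sketch assumes that equal sizes sit next to each other in the array; if they do not, the number of pairs equals the number of runs rather than the number of distinct values. The lookup function shows one possible way a helper could answer a query for degree 32 without ever expanding the full array in RAM.

```gap
# Hypothetical helper names; not part of transgrp.

# Compress a list into [value, count] pairs, one pair per run of equal entries.
RLEncodeSizes := function(list)
  local runs, x;
  runs := [];
  for x in list do
    if Length(runs) > 0 and runs[Length(runs)][1] = x then
      runs[Length(runs)][2] := runs[Length(runs)][2] + 1;
    else
      Add(runs, [x, 1]);
    fi;
  od;
  return runs;
end;

# Rebuild the original list from the pairs (what loading would do).
RLDecodeSizes := function(runs)
  local list, r;
  list := [];
  for r in runs do
    Append(list, ListWithIdenticalEntries(r[2], r[1]));
  od;
  return list;
end;

# Cumulative end positions of the runs, computed once after loading.
RLRunEnds := function(runs)
  local ends, total, r;
  ends := [];
  total := 0;
  for r in runs do
    total := total + r[2];
    Add(ends, total);
  od;
  return ends;
end;

# Return the k-th entry of the encoded list by binary search over the run ends.
RLLookup := function(runs, ends, k)
  local lo, hi, mid;
  if Length(ends) = 0 or k < 1 or k > ends[Length(ends)] then
    Error("index out of range");
  fi;
  lo := 1;
  hi := Length(ends);
  while lo < hi do
    mid := QuoInt(lo + hi, 2);
    if ends[mid] < k then lo := mid + 1; else hi := mid; fi;
  od;
  return runs[lo][1];
end;

# Usage with made-up sample data (not the real TRANSSIZES[32]).
sizes := [ 32, 32, 64, 64, 64, 128 ];
runs := RLEncodeSizes(sizes);       # [ [ 32, 2 ], [ 64, 3 ], [ 128, 1 ] ]
RLDecodeSizes(runs) = sizes;        # true
ends := RLRunEnds(runs);            # [ 2, 5, 6 ]
RLLookup(runs, ends, 4);            # 64
```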