Note that multi-axis compress can be implemented using ∧⌜´𝕨

mlochbaum · mlochbaum · commit fb2b3aab3f6f · 2025-02-11T21:31:22.000-05:00
diff --git a/docs/implementation/primitive/replicate.html b/docs/implementation/primitive/replicate.html
@@ -121,6 +121,6 @@ <h3 id="grouped-compress"><a class="header" href="#grouped-compress">Grouped com
 <p>The sparse method can also be adapted to find groups of 1s instead of individual 1s, by searching for the first 1 and then the first 0 after that. This is useful if <code><span class='Value'>𝕨</span></code> changes value rarely, that is, if <code><span class='Function'>+</span><span class='Modifier'>´</span><span class='Function'>»</span><span class='Modifier2'>⊸</span><span class='Function'>&lt;</span><span class='Value'>𝕨</span></code> is small. Computing this value can be expensive so it's best to compute the threshold first, then update it in blocks and stop if it exceeds the threshold.</p>
 <p>For copying medium-sized cells with memcpy, all the branching here is pretty cheap relative to the actual operation, and it may as well be used all the time. This may not be true for smaller cells copied with overwriting, but I haven't implemented overwriting so I'm not sure.</p>
 <h2 id="higher-ranks"><a class="header" href="#higher-ranks">Higher ranks</a></h2>
-<p>When replicating along the first axis only, additional axes only change the element size (these are the main reason why a large element method is given). Replicating along a later axis offers a few opportunities for improvement relative to replicating each cell individually. See also <a href="select.html#multi-axis-selection">multi-axis Select</a>.</p>
-<p>Particularly for boolean <code><span class='Value'>𝕨</span></code>, Select is usually faster than Replicate (a major exception is for a boolean <code><span class='Value'>𝕩</span></code>). Simply replacing <code><span class='Function'>/</span></code> with <code><span class='Function'>/</span><span class='Modifier'>¨</span><span class='Modifier2'>⊸</span><span class='Function'>⊏</span></code> (after checking conformability) could be an improvement. It's probably best to compute the result shape first to avoid doing any work if it's empty. Similarly, if early result axes are small then the overhead of separating out Indices might make it worse than just doing the small number of Replicates.</p>
-<p>A technique when <code><span class='Value'>𝕨</span></code> is processed with one or more bytes at a time, and applies to many rows, is to repeat it up to an even number of bytes and combine rows of <code><span class='Value'>𝕩</span></code> into longer virtual rows (the last one can be short). I think this only ends up being useful when <code><span class='Value'>𝕩</span></code> is boolean.</p>
+<p>When replicating along the first axis only, additional axes only change the element size (these are the main reason why a large-element method is given). Replicating along a later axis offers a few opportunities for improvement relative to replicating each cell individually. See also <a href="select.html#multi-axis-selection">multi-axis Select</a>.</p>
+<p>Particularly for boolean <code><span class='Value'>𝕨</span></code>, Select is usually faster than Replicate (a major exception is for a boolean <code><span class='Value'>𝕩</span></code>). Simply replacing <code><span class='Function'>/</span></code> with <code><span class='Function'>/</span><span class='Modifier'>¨</span><span class='Modifier2'>⊸</span><span class='Function'>⊏</span></code> (after checking length agreement) could be an improvement. It's probably best to compute the result shape first to avoid doing any work if it's empty. Similarly, if early result axes are small then the overhead of separating out Indices might make it worse than just doing the small number of Replicates.</p>
+<p>Some other tricks are possible for boolean <code><span class='Value'>𝕨</span></code>. If there's a large enough unchanged axis above, perhaps with <code><span class='Value'>𝕨</span><span class='Function'>/</span><span class='Modifier2'>⎉</span><span class='Number'>1</span><span class='Value'>𝕩</span></code>, then <code><span class='Value'>𝕨</span></code> can be repeated to act on virtual rows consisting of multiple rows of <code><span class='Value'>𝕩</span></code> (the last one can be short). I think this only ends up being useful when <code><span class='Value'>𝕩</span></code> is boolean. But we can also combine compress along several axes, as multi-axis <code><span class='Function'>⥊</span><span class='Value'>𝕨</span><span class='Function'>/</span><span class='Value'>𝕩</span></code> is <code><span class='Paren'>(</span><span class='Function'>∧</span><span class='Modifier'>⌜´</span><span class='Value'>𝕨</span><span class='Paren'>)</span><span class='Function'>/</span><span class='Modifier2'>○</span><span class='Function'>⥊</span><span class='Value'>𝕩</span></code>: the previous method is a bit like a specialization where entries of <code><span class='Value'>𝕨</span></code> other than the last are lists of <code><span class='Number'>1</span></code>s. This is particularly nice if <code><span class='Value'>𝕩</span></code> as a whole is small, but even if <code><span class='Value'>𝕨</span></code> will eventually be converted to indices, it's a faster way to combine the bottom few levels if they're fairly dense.</p>
diff --git a/implementation/primitive/replicate.md b/implementation/primitive/replicate.md
@@ -122,8 +122,8 @@ For copying medium-sized cells with memcpy, all the branching here is pretty che
 
 ## Higher ranks
 
-When replicating along the first axis only, additional axes only change the element size (these are the main reason why a large element method is given). Replicating along a later axis offers a few opportunities for improvement relative to replicating each cell individually. See also [multi-axis Select](select.md#multi-axis-selection).
+When replicating along the first axis only, additional axes only change the element size (these are the main reason why a large-element method is given). Replicating along a later axis offers a few opportunities for improvement relative to replicating each cell individually. See also [multi-axis Select](select.md#multi-axis-selection).
 
-Particularly for boolean `𝕨`, Select is usually faster than Replicate (a major exception is for a boolean `𝕩`). Simply replacing `/` with `/¨⊸⊏` (after checking conformability) could be an improvement. It's probably best to compute the result shape first to avoid doing any work if it's empty. Similarly, if early result axes are small then the overhead of separating out Indices might make it worse than just doing the small number of Replicates.
+Particularly for boolean `𝕨`, Select is usually faster than Replicate (a major exception is for a boolean `𝕩`). Simply replacing `/` with `/¨⊸⊏` (after checking length agreement) could be an improvement. It's probably best to compute the result shape first to avoid doing any work if it's empty. Similarly, if early result axes are small then the overhead of separating out Indices might make it worse than just doing the small number of Replicates.
 
-A technique when `𝕨` is processed with one or more bytes at a time, and applies to many rows, is to repeat it up to an even number of bytes and combine rows of `𝕩` into longer virtual rows (the last one can be short). I think this only ends up being useful when `𝕩` is boolean.
+Some other tricks are possible for boolean `𝕨`. If there's a large enough unchanged axis above, perhaps with `𝕨/⎉1𝕩`, then `𝕨` can be repeated to act on virtual rows consisting of multiple rows of `𝕩` (the last one can be short). I think this only ends up being useful when `𝕩` is boolean. But we can also combine compress along several axes, as multi-axis `⥊𝕨/𝕩` is `(∧⌜´𝕨)/○⥊𝕩`: the previous method is a bit like a specialization where entries of `𝕨` other than the last are lists of `1`s. This is particularly nice if `𝕩` as a whole is small, but even if `𝕨` will eventually be converted to indices, it's a faster way to combine the bottom few levels if they're fairly dense.
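
The identity in the added paragraph, that multi-axis `⥊𝕨/𝕩` is `(∧⌜´𝕨)/○⥊𝕩`, can be checked on a small rank-2 case. The following is only an illustrative sketch in plain Python rather than BQN: the function names (`compress`, `multi_axis_compress`, `and_table`, `compress_via_table`) are hypothetical helpers invented for this example and are not part of any BQN implementation, and nested lists stand in for a rank-2 array.

```python
def compress(mask, items):
    """Boolean compress: keep items whose mask entry is 1."""
    return [v for m, v in zip(mask, items) if m]

def flat(x):
    """Ravel a rank-2 array stored as a list of rows."""
    return [v for row in x for v in row]

def multi_axis_compress(w, x):
    """Compress rows by w[0], then each remaining row by w[1]."""
    w0, w1 = w
    return [compress(w1, row) for row in compress(w0, x)]

def and_table(w0, w1):
    """Outer product of two boolean lists under AND."""
    return [[a & b for b in w1] for a in w0]

def compress_via_table(w, x):
    """Single flat compress using the raveled AND table as the mask."""
    w0, w1 = w
    return compress(flat(and_table(w0, w1)), flat(x))

x = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]
w = ([1, 0, 1], [0, 1, 1, 0])

# Both routes select the same elements in the same order.
assert flat(multi_axis_compress(w, x)) == compress_via_table(w, x)

# Virtual-row specialization: when the leading mask is all 1s,
# the raveled table is just the last mask repeated once per row.
ones = [1] * len(x)
assert flat(and_table(ones, w[1])) == w[1] * len(x)
```

The second assertion mirrors the remark that the virtual-row method behaves like a specialization where entries of `𝕨` other than the last are lists of `1`s.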