
Shimuuar
Contributor

The implementation follows the plan described in #477, with generateA :: Applicative f => Int -> (Int -> f a) -> f (v a) as the primitive. It is not subject to stream fusion.
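To make the shape of the primitive concrete, here is a minimal sketch of a list-based version and of a traversal expressed through it. The names generateViaList and traverseVia are hypothetical and are not taken from the PR; only functions exported by Data.Vector.Generic are used.

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G

-- Hypothetical baseline: run the effects over the indices [0 .. n-1] using
-- the list Traversable instance, then copy the results into a vector.
generateViaList :: (G.Vector v a, Applicative f) => Int -> (Int -> f a) -> f (v a)
generateViaList n f = G.fromListN (max 0 n) <$> traverse f [0 .. n - 1]

-- A traversal phrased in terms of the primitive: index into the source
-- vector and apply the effectful function to each element.
traverseVia :: (G.Vector v a, G.Vector v b, Applicative f)
            => (a -> f b) -> v a -> f (v b)
traverseVia f v = generateViaList (G.length v) (\i -> f (G.unsafeIndex v i))
```

For example, generateViaList 3 (\i -> Just (i * 2)) :: Maybe (V.Vector Int) evaluates to Just of the vector [0,2,4].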

Writing generateA is not difficult; the problem is doing it efficiently. The current benchmarks are (suggestions for more are very welcome):

  1. Generate vector of random numbers using state monad
  2. Generate vector using IO action
  3. Compute sum of vector using lens
  4. Map vector using lens

The first, naive implementation uses a list as the intermediate data structure. The sum benchmark performs well in this case. Using STA instead brings the map benchmark on par with an explicit loop and produces slight (5-10%) improvements in the state and IO benchmarks.

newtype STA v a = STA {  _runSTA :: forall s. Mutable v s a -> ST s (v a) }
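As an illustration of how such a wrapper can be used, the sketch below builds up a chain of deferred writes inside the applicative and only allocates and fills the buffer once the effects have produced all values. This is an assumption about the general technique, not the code from the PR; generateSTA and runSTA are hypothetical names.

```haskell
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad.ST (ST, runST)
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Generic.Mutable as M

-- An action that fills a mutable vector and freezes it.
newtype STA v a = STA { _runSTA :: forall s. G.Mutable v s a -> ST s (v a) }

-- Allocate a fresh buffer of length n and run the accumulated writes on it.
runSTA :: G.Vector v a => Int -> STA v a -> v a
runSTA n (STA k) = runST (M.unsafeNew n >>= k)

-- Hypothetical STA-based generateA: effects run in the applicative f,
-- writes are only recorded and replayed later by runSTA.
generateSTA :: forall v f a. (G.Vector v a, Applicative f)
            => Int -> (Int -> f a) -> f (v a)
generateSTA n f
  | n <= 0    = pure G.empty
  | otherwise = runSTA n <$> go 0
  where
    go :: Int -> f (STA v a)
    go i
      | i >= n    = pure (STA G.unsafeFreeze)
      | otherwise = (\a (STA k) -> STA (\mv -> M.unsafeWrite mv i a >> k mv))
                      <$> f i <*> go (i + 1)
```

Because every result of the applicative gets its own call to runSTA, each one writes into a fresh buffer, so this stays safe even for applicatives that produce several results.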

Currently sum and map perform on par with an explicit loop. The state benchmark shows a 7x slowdown and 8x allocations, and the IO benchmark a 4x slowdown and 8x allocations. We can obviously add rewrite rules for IO/ST, but maybe there are more general optimizations.
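For illustration only, a rewrite rule for IO might look roughly like the sketch below. The generateA here is a list-based stand-in so the block compiles on its own, and generateIO and the rule name are hypothetical; whether such a rule fires reliably in real code is not verified here.

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Generic.Mutable as M

-- Stand-in for the generic primitive (list-based, see the assumptions above).
generateA :: (G.Vector v a, Applicative f) => Int -> (Int -> f a) -> f (v a)
generateA n f = G.fromListN (max 0 n) <$> traverse f [0 .. n - 1]
{-# NOINLINE [1] generateA #-}

-- IO has no backtracking, so writing straight into one buffer is fine.
generateIO :: G.Vector v a => Int -> (Int -> IO a) -> IO (v a)
generateIO n f
  | n <= 0    = pure G.empty
  | otherwise = do
      mv <- M.unsafeNew n
      let loop i
            | i >= n    = pure ()
            | otherwise = f i >>= M.unsafeWrite mv i >> loop (i + 1)
      loop 0
      G.unsafeFreeze mv

-- Specialise the generic function to the direct loop when f is IO.
{-# RULES "generateA/IO" generateA = generateIO #-}
```

Both functions should return the same vector for the same IO action; the rule only changes how the result is built.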


Fixes #477, #69, #132, #144

Contributor

@konsumlamm konsumlamm left a comment


What about for/for_/traverse_ etc?

@Shimuuar
Contributor Author

What about for/for_/traverse_ etc?

Good point! I forgot about them. But they're simpler: in this case one isn't in the business of constructing a vector, so it simply reduces to a fold.
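A minimal sketch of that reduction, assuming a right fold that sequences the effects and discards the results (traverseVec_ and forVec_ are hypothetical names, not the ones from the PR):

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Generic as G

-- Run an effect for every element, left to right, and discard the results.
-- No output vector is built, so a plain fold is enough.
traverseVec_ :: (G.Vector v a, Applicative f) => (a -> f b) -> v a -> f ()
traverseVec_ f = G.foldr (\a rest -> f a *> rest) (pure ())

-- Same function with the arguments flipped.
forVec_ :: (G.Vector v a, Applicative f) => v a -> (a -> f b) -> f ()
forVec_ = flip traverseVec_
```

For example, forVec_ (V.fromList [1, 2, 3 :: Int]) print prints the three elements in order.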

@Shimuuar
Contributor Author

Shimuuar commented Sep 3, 2025

I think that's about as far as reasonably possible for improving performance.

For many PrimMonads we would like to simply create a buffer and write to it directly. But that's not safe in the general case: applicatives with backtracking can break referential transparency. And it's not clear how to express such a rewrite rule, or whether it's even possible. unstreamM suffers from the same problem.
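The list applicative gives a small illustration of the backtracking concern. The sketch below uses the existing Traversable instance of Data.Vector; the function name branches is hypothetical.

```haskell
import qualified Data.Vector as V

-- In the list applicative every element independently becomes x or x + 10,
-- so a single traversal yields one vector per combination of choices.
branches :: V.Vector Int -> [V.Vector Int]
branches = traverse (\x -> [x, x + 10])

-- branches (V.fromList [1, 2])
--   yields the vectors [1,2], [1,12], [11,2], [11,12]
```

All four results must be distinct vectors. If they shared a single mutable buffer that was filled once and then frozen, later writes would overwrite values that earlier results still refer to.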

Shimuuar and others added 12 commits September 5, 2025 09:30
generateA is used as the primitive and all other functions are expressed in
terms of it. The first version goes through an intermediate list.

This is the simplest implementation possible and will serve as a baseline for
further optimizations.
We establish the implementation which goes through a list as a baseline, and then we can
try to optimize it.

Note the definition of foldlOf'. It's different from the definition in lens<=5.3.3,
but it's absolutely necessary to get good performance in folds.
Does wonders for traversals using Identity
No performance change in benchmarks
This is clearly not enough. There are many other types that would benefit from
the same rewrite, but we don't know how to do that. unstreamM suffers from the same
problem.

Identity is important since it's used in lens for mapping (over), and the rewrite
rule does improve performance: 10-20% in a microbenchmark.
Relevant for TypeApplications, and other functions in the library use the same convention
Now it works in the same way as generateM
@Shimuuar Shimuuar marked this pull request as ready for review September 5, 2025 08:33
@Shimuuar
Contributor Author

Shimuuar commented Sep 5, 2025

I think the PR is ready.

It may perform better for some applicatives
Development

Successfully merging this pull request may close these issues.

API using Applicative (traverse et.al.)
2 participants