Description
`sqlalchemy-utils` (v0.40.0) was recently added to our project as a dependency of another package, and once it was being imported we noticed a severe performance degradation in one of our processes. Upon investigation, we found that a lot of time was being spent in `sqlalchemy_utils/aggregates.py:536(construct_aggregate_queries)`, because that function loops through all the objects in the session regardless of whether any aggregates have been defined (and this process currently loads millions of objects, causing frequent autoflushes).
While I know there are improvements we can make to our process, on principle, simply importing a package should cause minimal side effects.
I have two ideas for how to address this:
- Defer registering the event until/unless an aggregate is defined.
- In `construct_aggregate_queries()`, add a guard clause so that if the `generator_registry` is empty we return immediately.
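
To make the second idea concrete, here is a minimal sketch of the guard clause. It does not use the real `sqlalchemy-utils` code: `AggregationManagerSketch`, the `objects_scanned` counter, and the plain list standing in for the session are all hypothetical, and the shape of `generator_registry` (model class mapped to a list of generators) is an assumption.

```python
from collections import defaultdict


class AggregationManagerSketch:
    """Hypothetical stand-in, not the real sqlalchemy-utils manager."""

    def __init__(self):
        # Assumed shape: model class -> list of aggregate generators.
        self.generator_registry = defaultdict(list)
        # Counter used only to show how much work the scan performs.
        self.objects_scanned = 0

    def construct_aggregate_queries(self, session_objects):
        # Guard clause: with no aggregates registered, skip the scan entirely.
        if not self.generator_registry:
            return []

        matched = []
        for obj in session_objects:
            self.objects_scanned += 1
            # Membership test does not create keys in the defaultdict.
            if type(obj) in self.generator_registry:
                matched.append(obj)
        return matched
```

With an empty registry the loop over session objects never runs, so the remaining per-flush cost is one function call plus one truthiness check.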
While the second option is easier to implement, it still adds some overhead to each flush event. For the first option, do you know if it is possible to register `construct_aggregate_queries` in `update_generator_registry()` only if there are any aggregated attributes?
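
A rough sketch of what I have in mind for the first option is below. It is only an illustration under assumptions: `FakeEventBus` is a hypothetical stand-in for the event system (in the real code the registration would presumably go through `sqlalchemy.event.listen` on the session's `after_flush` event), and `DeferredAggregationManager`, `flush_listener_registered`, and `flush_calls` are names I made up.

```python
class FakeEventBus:
    """Hypothetical stand-in for an event system, for illustration only."""

    def __init__(self):
        self.listeners = {}

    def listen(self, name, fn):
        self.listeners.setdefault(name, []).append(fn)

    def fire(self, name, *args):
        for fn in self.listeners.get(name, []):
            fn(*args)


class DeferredAggregationManager:
    """Registers the flush listener lazily, once an aggregate exists."""

    def __init__(self, bus):
        self.bus = bus
        self.generator_registry = {}
        self.flush_listener_registered = False
        self.flush_calls = 0  # only here to observe listener invocations

    def update_generator_registry(self, pending_aggregates):
        # Merge newly discovered aggregates (assumed: class -> generators).
        for cls, generators in pending_aggregates.items():
            self.generator_registry.setdefault(cls, []).extend(generators)

        # Attach the flush listener once, and only if there is work for it.
        if self.generator_registry and not self.flush_listener_registered:
            self.bus.listen("after_flush", self.construct_aggregate_queries)
            self.flush_listener_registered = True

    def construct_aggregate_queries(self, session_objects):
        self.flush_calls += 1
```

The flag keeps the listener from being attached more than once if `update_generator_registry()` runs repeatedly, and projects that never define an aggregate never pay for a flush listener at all.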
I'm willing to make a PR for this; I just wanted some feedback on the preferred direction first.