Event listener for aggregates can cause performance issues on import #700

@smsearcy

sqlalchemy-utils (v0.40.0) was recently added to our project as a dependency of another package, and as soon as it was being imported we noticed severe performance degradation in one of our processes. Upon investigation, we found that a lot of time was being spent in sqlalchemy_utils/aggregates.py:536(construct_aggregate_queries). That function loops through all the objects in the session, regardless of whether any aggregates have been defined. The affected process currently loads millions of objects and causes frequent autoflushes, so the cost adds up quickly.

While I know there are improvements we can make to our process, on principle, simply importing a package should cause minimal side effects.

I have two ideas for how to address this:

  1. Defer registering the event until/unless an aggregate is defined.
  2. In construct_aggregate_queries(), add a guard clause such that if the generator_registry is empty then we immediately return.
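
To make the second idea concrete, here is a minimal sketch of the guard clause. It is a simplified stand-in written in plain Python, not the actual sqlalchemy-utils code: the function signature, the shape of `generator_registry` (a dict mapping a class to a list of aggregate names), and the return value are all assumptions made for illustration.

```python
def construct_aggregate_queries(session_objects, generator_registry):
    """Simplified stand-in for the flush listener (hypothetical signature).

    Returns (object, aggregate_name) pairs that would need recalculating.
    """
    # Guard clause: with no aggregates defined there is nothing to do,
    # so return before touching any object in the session.
    if not generator_registry:
        return []

    work = []
    for obj in session_objects:
        # Only objects whose class has registered aggregates produce work.
        for aggregate_name in generator_registry.get(type(obj), ()):
            work.append((obj, aggregate_name))
    return work
```

With an empty registry the session is never iterated, so the per-flush cost drops to a single truthiness check.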

While the second option is easier to implement, it still adds some overhead to each flush event. For the first option, do you know if it is possible to register construct_aggregate_queries from within update_generator_registry(), and only when there are aggregated attributes?
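
A rough sketch of what the first option might look like is below. It uses a toy event bus rather than SQLAlchemy itself, so `FakeEventBus`, `LazyAggregationManager`, `declare`, the `pending` list, and the `"after_flush"` event name are all hypothetical names chosen for illustration. In a real patch the `listen` call would presumably map to `sqlalchemy.event.listen`, but whether that is safe to call from inside `update_generator_registry()` is exactly the open question.

```python
class FakeEventBus:
    """Hypothetical stand-in for an event system such as sqlalchemy.event."""

    def __init__(self):
        self.listeners = []

    def listen(self, name, fn):
        self.listeners.append((name, fn))


class LazyAggregationManager:
    """Registers the flush listener only once an aggregate exists."""

    def __init__(self, bus):
        self.bus = bus
        self.generator_registry = {}
        self.pending = []  # aggregates declared but not yet resolved
        self._flush_hook_registered = False

    def declare(self, cls, aggregate_name):
        self.pending.append((cls, aggregate_name))

    def update_generator_registry(self):
        # Move pending declarations into the registry.
        for cls, aggregate_name in self.pending:
            self.generator_registry.setdefault(cls, []).append(aggregate_name)
        self.pending.clear()

        # Deferred, idempotent registration: hook into flush only when
        # there is at least one aggregate, and only once.
        if self.generator_registry and not self._flush_hook_registered:
            self.bus.listen("after_flush", self.construct_aggregate_queries)
            self._flush_hook_registered = True

    def construct_aggregate_queries(self, session_objects):
        return [
            (obj, aggregate_name)
            for obj in session_objects
            for aggregate_name in self.generator_registry.get(type(obj), ())
        ]
```

In this model, a project that never declares an aggregate never gets a flush listener, so importing the package adds no per-flush work at all.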

I'm willing to make a PR for this; I just wanted some feedback on the preferred direction first.
