Linear speed drop using embeddings with hoplite (#562) due to excessive database commits #614

@FloMee

Description:
I was very happy to see that you introduced the "embeddings with hoplite" feature in #562. Unfortunately, I've observed that the processing speed drops linearly while analyzing hundreds of sound files:

Image

Using the pyinstrument module, I've confirmed that the issue lies in the excessive database commits during the analysis process:

Image

Solution:
I've solved the problem on my side by moving the db.commit() call from line 107 in embeddings/utils.py to line 209, right before the db.db.close() call.

Question:
I'm aware that my change (committing to the database only once, after all files are analyzed) introduces a risk of data loss if the run is interrupted. However, I don't think committing the results of every single chunk is useful either.

What do you think could be a feasible solution?
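
One possible middle ground would be to commit once every N files instead of once per chunk or once per run. Below is a minimal sketch of that idea. It uses Python's built-in sqlite3 as a stand-in and does not touch the hoplite database interface; `store_embeddings`, `commit_every`, and the table layout are hypothetical names chosen for illustration only.

```python
import sqlite3


def store_embeddings(conn, embeddings_per_file, commit_every=50):
    """Insert embeddings and commit once per `commit_every` files.

    `embeddings_per_file` is an iterable of (file_name, list_of_chunk_blobs).
    Returns the number of commits issued.
    """
    # Hypothetical table layout, only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings (file TEXT, chunk INTEGER, vec BLOB)"
    )
    commits = 0
    pending_files = 0
    for file_name, chunks in embeddings_per_file:
        # All chunks of one file are inserted without committing.
        conn.executemany(
            "INSERT INTO embeddings VALUES (?, ?, ?)",
            [(file_name, i, vec) for i, vec in enumerate(chunks)],
        )
        pending_files += 1
        # Commit only after a full batch of files has been inserted.
        if pending_files >= commit_every:
            conn.commit()
            commits += 1
            pending_files = 0
    # Commit the remaining files that did not fill a whole batch.
    if pending_files:
        conn.commit()
        commits += 1
    return commits
```

With this approach, an interrupted run would lose at most the files inserted since the last commit (up to `commit_every` files), while the number of commits no longer grows with the number of chunks. The batch size would probably need to be tuned or made configurable.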
