Skip to content

[BUG] Performance degradation with datacatalog.tags table #4583

@andrew-freenome

Description

@andrew-freenome

Describe the bug

I am seeing a performance bottleneck with the Flyte database. With my workload, the query SELECT * FROM "tags" WHERE ("tags"."artifact_id","tags"."dataset_uuid") IN (($1,$2)) is getting executed frequently (380,000 times in the last day) against the datacatalog database. The workload I'm executing does have ~380k tasks, so the number of queries makes sense. On average, it takes 2 seconds to complete and returns 0 rows. I believe this query is executed as part of the task cache, which I do have enabled (but I expect all cache misses). I am using v1.0.1 of Datacatalog, and v1.1.47 of FlytePropeller. This is a database with 32 vCPU and 64GB of memory, 200GB storage with ~50GB of that used.
Screenshot 2023-12-05 at 12 36 55 PM

Expected behavior

I expect that the database would not be a performance bottleneck, and that the datacatalog.tags table is properly indexed in order to support the queries that are executed against it.

Additional context to reproduce

If I add an index (CREATE INDEX tags_dataset_uuid_artifact_id_idx ON tags (dataset_uuid, artifact_id);), the query gets significantly faster (1000x in my local testing). The postgres planner seems to vary the way it actually executes the query. The attached screenshots are from a different DB instance under slightly lesser load, but I'm adding them to show the 3 different ways I've seen the planner choose to execute the query. (Note that these are all without the index).

Context: https://flyte-org.slack.com/archives/CP2HDHKE1/p1701797879836439

Screenshots

Screenshot 2023-12-06 at 2 03 33 PM Screenshot 2023-12-06 at 2 03 45 PM Screenshot 2023-12-06 at 2 03 57 PM

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingstale

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions