Conversation

@hernandezc1 hernandezc1 commented Aug 21, 2025

PR Summary

Changed

  • /setup_broker/lsst/setup_broker.sh
    • updated the bq mk command to include the --time_partitioning_type=DAY flag for the LSST alerts table (see the sketch below)

Partitioning the alerts table improves query performance by allowing BigQuery to scan only the relevant partitions instead of the full table. The maximum number of partitions allowed for a single BigQuery table is 10,000. Additional types of partitioning are outlined here.
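A minimal sketch of what the updated table-creation command could look like with this flag; the project, dataset, table, and schema names below are placeholders, not the actual values used in setup_broker.sh:

```bash
# Sketch only: project, dataset, table, and schema names are placeholders,
# not the values used by setup_broker.sh.
bq mk \
    --table \
    --time_partitioning_type=DAY \
    "${PROJECT_ID}:${DATASET}.alerts" \
    path/to/alerts_schema.json
```

If no --time_partitioning_field is also set, the table is partitioned by ingestion time and exposes the _PARTITIONTIME / _PARTITIONDATE pseudo-columns for filtering.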

@hernandezc1 hernandezc1 self-assigned this Aug 21, 2025
@hernandezc1 hernandezc1 added the Enhancement and Pipeline: Storage labels Aug 21, 2025
@hernandezc1 hernandezc1 requested a review from troyraen August 21, 2025 20:13

@troyraen troyraen left a comment

I definitely agree that we should partition and cluster our tables. I'll leave the specific decision about what to partition this one by up to you since you're working much more closely with the data and use cases these days. My input is the following. I'm doing a lot of guessing about how BigQuery works. You'd need to look into that more if/when you feel it's relevant.

BigQuery can probably execute a query by parallelizing over partitions, and in that case could return results faster from a table that has partitioning vs one that doesn't. That could be faster regardless of what the table is partitioned by and without any requirements on the specifics of a given query.

However, the monetary cost of a query is determined by the amount of data that needs to be processed. So I'm guessing that a partitioned table should facilitate cheaper queries by allowing the query engine to skip entire partitions, but it needs some way to determine which partitions can be skipped. To do this, it can probably use a constraint on the partitioning column itself (i.e., the given query has to include something like "WHERE partition_column < value") and/or compare other constraints in the given query with partition metadata (e.g., min/max of any column).

Because of the distributions of data in these alert tables, if we partition by time/day, I think users will typically have to include a constraint on the partitioning column itself in order to pay significantly less for a query. Things like magnitude won't be correlated with time, so they won't be useful. Spatial constraints will have some correlation with time because of LSST's survey strategy, which could mean significant cost savings compared to no partitioning at all. By contrast, if we partition spatially (e.g., by HEALPix), our users will be much more likely to pay even less for their queries: (1) spatial constraints are the most common need shared across time-domain astronomy use cases, and (2) all alerts for a given astronomical object will be in the same partition. That's why we've always intended to partition these tables spatially.
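To make the pruning point concrete, here is a rough example (dataset, table, and date range are hypothetical) of a query that lets BigQuery skip partitions on a DAY ingestion-time partitioned table, because it constrains the partition pseudo-column directly:

```bash
# Illustrative only; dataset/table names and dates are placeholders.
# With ingestion-time DAY partitioning, BigQuery can prune partitions when the
# query filters on _PARTITIONDATE (or _PARTITIONTIME); without such a filter,
# the full table is scanned and billed.
bq query --use_legacy_sql=false "
  SELECT COUNT(*) AS n_alerts
  FROM my_dataset.alerts
  WHERE _PARTITIONDATE BETWEEN DATE '2025-08-20' AND DATE '2025-08-27'
"
```

A spatially partitioned alternative could use integer-range partitioning (the --range_partitioning flag) on a HEALPix index column, so the analogous pruning filter would be a constraint on the HEALPix column rather than on a date.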

Our buckets are organized by time, and that's where we've intended to direct users who have a strong dependence on time-based searches. I know it's possible to configure a BigQuery table to use a bucket as its underlying data store. I have no idea how easy that is to set up. I'm pretty sure that queries will take longer and it might cost more as well, but by how much, I don't know. You could look into it as a potential additional service we could offer and/or suggest it to a user who wants a time-partitioned BigQuery table.
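For reference, the bucket-backed option mentioned above is a BigQuery external table over Cloud Storage. A rough sketch, assuming Avro files and placeholder bucket/table names:

```bash
# Rough sketch only; bucket path, source format, and table name are placeholders.
# An external table reads objects directly from GCS at query time instead of
# loading them into BigQuery-managed storage.
bq mkdef --source_format=AVRO \
    "gs://example-alerts-bucket/alerts/*.avro" > /tmp/alerts_ext_def.json

bq mk --external_table_definition=/tmp/alerts_ext_def.json \
    my_dataset.alerts_external
```

Queries against external tables generally run slower than against native tables, consistent with the caveat above.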

If we only have one potential user right now (that we know of) and they strongly prefer a BigQuery table with time-based partitioning, of course it makes sense to consider that. But also consider what you're going to do when we have more potential users who want spatial lookups.

@hernandezc1 hernandezc1 merged commit f3d9ed9 into develop Aug 27, 2025
4 checks passed
@hernandezc1 hernandezc1 deleted the u/ch/bq/partition branch August 27, 2025 17:48