-
Notifications
You must be signed in to change notification settings - Fork 0
Update SMT UDFs for LSST modules #302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
troyraen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| --max-delivery-attempts=5 \ | ||
| --min-retry-delay=10 \ | ||
| --max-retry-delay=600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, double check that the combination of these three flags does what you expect.
| ) | ||
|
|
||
|
|
||
| def _process_field(original_value: Any, config: dict) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this really accept and return Any type? Looks to me like it only handles dict | (list[dict] | None).
| LITE_FIELDS_CONFIG = { | ||
| "diaSource": { | ||
| "fields": { | ||
| "diaSourceId", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine for now, but it would be great to define this in a yaml file in pittgoogle-client instead. _create_lite_alert() could move there too.
| const attrs = message.attributes || {}; | ||
| function addTopLevelFields(message, metadata) { | ||
| const attrs = message.attributes || {}; | ||
| const dataStr = message.data.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does toString() have any effect on the data that comes out?
| // We avoid casting fields as JavaScript numbers to prevent precision loss | ||
| if (attrs.healpix9) newFields.healpix9 = attrs.healpix9.toString(); | ||
| if (attrs.healpix19) newFields.healpix19 = attrs.healpix19.toString(); | ||
| if (attrs.healpix29) newFields.healpix29 = attrs.healpix29.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does javascript really not support 64bit ints? That seems strange. If leaving toString(), does that mean these will be strings in the output data? That would be a pain for everyone downstream.
| diaSourceId: attrs.diaSource_diaSourceId ? attrs.diaSource_diaSourceId.toString() : null, | ||
| diaObjectId: attrs.diaObject_diaObjectId ? attrs.diaObject_diaObjectId.toString() : null, | ||
| ssObjectId: attrs.ssSource_ssObjectId ? attrs.ssSource_ssObjectId.toString() : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern about toString() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments at the tops of these SMT yaml files briefly explaining what the transforms do and where there are intended to be used.
Also, minor, but I would name these files like ps_smt_lsst_add_top_level_fields.yaml or just smt_lsst_add_top_level_fields.yaml so "smt" is closer to the beginning. Also, 'lsst' could probably be removed since these are underneath an 'lsst' directory.
| if (attrs.healpix19) newFields.healpix19 = attrs.healpix19.toString(); | ||
| if (attrs.healpix29) newFields.healpix29 = attrs.healpix29.toString(); | ||
| if (attrs["kafka.timestamp"]) { | ||
| newFields.kafkaPublishTimestamp = attrs["kafka.timestamp"] * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'kafka.timestamp * 1000' is happening in several places (2 in this PR and I think I saw it somewhere else when you were screensharing earlier today). Is it necessary? If so, leave a comment explaining why.
Also consider whether a user could be easily confused when comparing the data that comes out of a '* 1000' operation with data from the same alert that hasn't undergone '* 1000'. In most cases I'm aware of where a '* 1000' operation is necessary, the transformation results in output data will not be confusing (e.g., the output is a python datetime object, so not a float like 'kafka.timestamp' is and thus not confusing). I don't know what the output data looks like here, which is why I suggest considering it. If it is both necessary and confusing, what to do about that is probably a case-by-case decision.
Summary of Changes
Changed
pittgoogle-client>=0.3.19broker/cloud_run/lsst/ps_to_storage/main.pyto generate lite alerts before they are published tolsst-litebroker/setup_broker/lsst/setup_broker.shps_lsst_lite_smt.yamlfromps_topic_alerts_liteps_lsst_flatten_schema_smt.yamlandps_lsst_add_top_level_fields_smt.yamldiaSourceIdin value added BigQuery table schemasdiaSource.diaSourceIdin the alert packet schemaps_input_subscriptionfor our value added modules are now configured with exponential backoffs for delivery failuresRemoved
broker/setup_broker/lsst/templates/ps_lsst_lite_smt.yaml