
Conversation

@hernandezc1 hernandezc1 commented Jun 17, 2025

This PR creates the resources needed to ingest the JSON-serialized GCN Notice Types generated by the Neil Gehrels Swift Observatory. Alerts are published to a Pub/Sub topic, swift-alerts. A BigQuery subscription attached to the topic writes the alert data to a BigQuery table (alerts_v4_5_0) in the dataset swift_alerts.

@hernandezc1 hernandezc1 self-assigned this Jun 17, 2025
@hernandezc1 hernandezc1 added Enhancement New feature or request Pipeline: Conductor Components that orchestrate the live pipeline labels Jun 17, 2025
@troyraen troyraen commented

Yay!

One thought: We already listen to one GCN stream (LVK) and may want to add more in the future. Consider how we can make this easy for ourselves. For example, maybe all of our GCN consumers can use the same base container image and ingest different streams based only on an environment variable (and maybe separate credentials)? I don't know the answer or even all the possibilities, just putting a bug in your ear.
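(For illustration only — a minimal sketch of that idea, in which a shared entrypoint baked into a common base image picks the stream from an environment variable; the GCN_STREAM/GCN_CREDS names and the generic consumer script are hypothetical:)

#! /bin/bash
# Hypothetical shared entrypoint for all GCN consumers: the same base image runs
# any stream, selected only by environment variables set at deploy time.
set -euo pipefail
GCN_STREAM="${GCN_STREAM:?set to e.g. 'swift' or 'lvk'}"
GCN_CREDS="${GCN_CREDS:-/secrets/${GCN_STREAM}-gcn-credentials.json}"
exec ./run_gcn_consumer.sh --stream "${GCN_STREAM}" --credentials "${GCN_CREDS}"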

@hernandezc1 hernandezc1 requested a review from troyraen June 19, 2025 15:43
@troyraen troyraen left a comment

Thanks, this will be great! A few things to clarify below.

I assume this consumer will be one that we'll leave running 24/7?

I strongly encourage you to include the storage of alerts in a bucket with this PR. A bucket is more valuable to us than a bigquery table. We can create or recreate a table from alerts in a bucket but it's difficult to create a bucket of alerts from a bigquery table. It would also be easy to "replay" a pubsub stream using alerts in a bucket but that would be difficult if trying to use a table. We've already run into this problem with LSST (and/or LVK?) when we wanted to change the table but had trouble because we didn't have a bucket to draw from.
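(For concreteness, replaying from a bucket could be as simple as the sketch below — the replay topic name is hypothetical and the loop assumes small JSON alerts:)

# Hypothetical sketch: republish every archived alert in the bucket to a replay topic.
for uri in $(gsutil ls "gs://${PROJECT_ID}-${survey}_alerts/**"); do
    gcloud pubsub topics publish "${survey}-alerts-replay" --message="$(gsutil cat "${uri}")"
done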

# The Java class for the Pub/Sub sink connector.
connector.class=com.google.pubsub.kafka.sink.CloudPubSubSinkConnector
# The maximum number of tasks that should be created for this connector.
tasks.max=1
@troyraen troyraen commented

What is the alert rate you expect from this stream? I'm guessing that one task is probably fine, but it's worth checking.

@troyraen troyraen commented

Now I see in the data listings PR, "less than 1 alert per week". So 1 task will be fine.

@@ -0,0 +1,77 @@
#! /bin/bash
# Installs the software required to run the Kafka Consumer.
# Assumes a Debian 10 OS.
@troyraen troyraen commented

Double check this. I think you're using Debian 11.

@hernandezc1 hernandezc1 commented Jun 27, 2025

@troyraen it's actually using Debian 12! Will update the comments in this script. I was reusing code from previous vm_install scripts (e.g., LVK).

@troyraen troyraen commented

it's actually using Debian 12!

even better!

snap install yq

#--- Install Java and the dev kit
# see https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-on-debian-10
@troyraen troyraen commented

Similar to above. I think this should be a Debian 11 link.

echo "Installing the Kafka -> Pub/Sub connector"
(
plugindir=/usr/local/share/kafka/plugins
CONNECTOR_RELEASE="1.1.0"
@troyraen troyraen commented

This version is more than 2 yrs old now. Latest version is 1.3.2, released 2025-06-02. It's probably worth a quick test to see if we can upgrade to the latest version without much trouble. If it turns into a rabbit hole, ok to punt that task and continue with this version for now.


#--- GCP resources used directly in this script
broker_bucket=$(define_GCP_resources "${PROJECT_ID}-${survey}-broker_files")
bq_dataset_alerts=$(define_GCP_resources "${survey}_alerts")
@troyraen troyraen commented

Suggested change:
- bq_dataset_alerts=$(define_GCP_resources "${survey}_alerts")
+ bq_dataset_alerts=$(define_GCP_resources "${survey}")

Started this conversation on slack... Our datasets for other surveys are just named as the name of the survey. I see that we've started putting our value-added products in a separate dataset but I think that only creates more confusion than it's worth.

bq --location="${region}" mk --dataset "${bq_dataset_alerts}"

(cd templates && bq mk --table "${PROJECT_ID}:${bq_dataset_alerts}.${alerts_table}" "bq_${survey}_${alerts_table}_schema.json") || exit 5
bq update --description "Alert data from Swift/BAT-GUANO. This table is an archive of the swift-alerts Pub/Sub stream. It has the same schema as the original alert bytes, including nested and repeated fields." "${PROJECT_ID}:${bq_dataset_alerts}.${alerts_table}"
@troyraen troyraen commented

Add a link to the survey's documentation of the schema or some other page where users can find more info, if you know it.

@troyraen troyraen commented

Actually, maybe we should link to one of our pages instead. Maybe the survey page at docs/source/surveys/swift.rst being added in mwvgroup/pittgoogle-client#100?

Comment on lines 43 to 47
if [ "$base_name" = "${survey}_alerts" ]; then
testid_suffix="_${testid}" # complies with BigQuery naming conventions
else
testid_suffix="-${testid}"
fi
@troyraen troyraen commented

I recommend finding a clearer, more direct way to identify whether "-" or "_" should be used here. Maybe a boolean function arg? Using the resource name makes this code quite brittle because we have to remember to change the guts of the function if we change the resource name and/or add new resources.
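For example, something along these lines (a sketch only; the second argument and its default are hypothetical):

# Hypothetical sketch: the caller states which naming convention applies instead of
# the function inferring it from the resource name.
define_GCP_resources() {
    local base_name="$1"
    local use_bq_naming="${2:-false}"  # pass "true" for BigQuery resources
    if [ "$use_bq_naming" = "true" ]; then
        testid_suffix="_${testid}"  # complies with BigQuery naming conventions
    else
        testid_suffix="-${testid}"
    fi
    echo "${base_name}${testid_suffix}"
}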

@hernandezc1 hernandezc1 commented

The function has been updated

Comment on lines 96 to 98
# in order to create BigQuery subscriptions, ensure that the following service account:
# service-<project number>@gcp-sa-pubsub.iam.gserviceaccount.com" has the
# bigquery.dataEditor role for each table
@troyraen troyraen commented

Can the code for this be included here? Ideally, the setup script(s) should take care of all setup and the developer should not have to do things like this manually. If there is a good reason for the developer to do something manually, those instructions should be included with the other setup instructions rather than being buried in code comments.

@hernandezc1 hernandezc1 commented

Yes, I've now included the code here
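(That code isn't quoted in this thread; for reference, a grant of this kind typically looks roughly like the sketch below — whether the PR applies it at the project, dataset, or table level may differ.)

# Sketch: allow the Pub/Sub service agent to write to BigQuery so that BigQuery
# subscriptions can deliver messages; shown project-wide here for brevity.
project_number=$(gcloud projects describe "${PROJECT_ID}" --format="value(projectNumber)")
pubsub_service_agent="service-${project_number}@gcp-sa-pubsub.iam.gserviceaccount.com"
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${pubsub_service_agent}" \
    --role="roles/bigquery.dataEditor"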

Comment on lines 108 to 111
# set IAM policies on resources
user="allUsers"
roleid="projects/${GOOGLE_CLOUD_PROJECT}/roles/userPublic"
gcloud pubsub topics add-iam-policy-binding "${topic_alerts}" --member="${user}" --role="${roleid}"
@troyraen troyraen commented

Make the bigquery dataset public as well.

@hernandezc1 hernandezc1 commented

The process of making the dataset public involves a few more steps than I was expecting. The documentation I referenced can be accessed here
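(Roughly, that workflow comes down to editing the dataset's access list and pushing it back with bq update — a sketch, with an illustrative file name:)

# Sketch of the documented workflow for granting public read access to a dataset.
bq show --format=prettyjson "${PROJECT_ID}:${bq_dataset_alerts}" > dataset_policy.json
# ... add a public READER entry to the "access" list in dataset_policy.json ...
bq update --source dataset_policy.json "${PROJECT_ID}:${bq_dataset_alerts}"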

@troyraen troyraen commented

I recall it being more difficult for bigquery than pubsub as well. Looks like you worked it out, so great job!

Comment on lines 113 to 125
else
if [ "$environment_type" = "testing" ]; then
# delete testing resources
o="GSUtil:parallel_process_count=1" # disable multiprocessing for Macs
gsutil -m -o "${o}" rm -r "gs://${broker_bucket}"
bq rm -r -f "${PROJECT_ID}:${bq_dataset_alerts}"
gcloud pubsub topics delete "${topic_alerts}"
gcloud pubsub topics delete "${deadletter_topic_bigquery_import}"
gcloud pubsub subscriptions delete "${subscription_alerts_reservoir}"
gcloud pubsub subscriptions delete "${deadletter_subscription_bigquery_import}"
gcloud pubsub subscriptions delete "${subscription_bigquery_import}"
fi
fi
@troyraen troyraen commented

  1. Add a comment explaining that the consumer VM will be deleted by the create_vm.sh script.

  2. Add a print statement for the case environment_type="production" explaining why nothing happens and what the developer needs to do instead. You can copy/paste the one from the ZTF setup_broker.sh.

"description": "URL of HEALPix localization probability file."
},
{
"name": "healpix_file",
@hernandezc1 hernandezc1 commented

@troyraen do we want to keep the healpix_file field here? This information will be available in the GCS bucket. We had a similar conversation when we first started storing alert data in BigQuery for LVK

@troyraen troyraen commented

My default instinct is to not alter the survey's schema. And/but if we are serving a copy of this file from GCP, we should definitely figure out how/where to advertise that since getting it from within GCP will be more convenient for at least some users.

I'm not sure I remember the details about what's going on here... Is this the file containing the probability that the event originated from a particular sky location (ie, a probability per healpix pixel)? And are we ingesting that file and serving it from a GCS bucket in addition to the alerts? If so, maybe we leave this original schema field intact but add another field called something like "healpix_file_gcp" or "healpix_file_pg" or "healpix_file_cloud"? I'm much more comfortable adding a new field than changing the existing one. Happy to talk this through in more detail on Thursday if I'm missing the point here.

@hernandezc1 hernandezc1 requested a review from troyraen June 27, 2025 18:45
@troyraen troyraen left a comment

A few things to look at. Most important are (1) the "avro" resources and (2) the name of the alert in the bucket. I'm also noticing that the ways we assign permissions to users/resources have gotten pretty scattered. It would be good to consolidate and handle them consistently.

Comment on lines 33 to 35
TOPIC_ALERTS_JSON = pittgoogle.Topic.from_cloud(
"alerts-json", survey=SURVEY, testid=TESTID, projectid=PROJECT_ID
)
@troyraen troyraen commented

Do these alerts come from Swift as json or avro? If json, I think the name of this topic can just be swift-alerts since we're not changing the serialization. Also, if this stays as swift-alerts-json does that mean we don't publish any topic that's just called swift-alerts? I think from a usability/consistency standpoint we should always publish a stream called <survey>-alerts that is just a pass through of the survey's full (but deduplicated) alert stream from Kafka (or whatever) into Pub/Sub.

If that seems confusing in comparison with our LSST streams where lsst-alerts is avro and lsst-alerts-json is json, maybe we consider changing those names so that lsst-alerts is the json version and we make the avro one called lsst-alerts-avro? Benefit of the current naming is that lsst-alerts is byte-for-byte the same as what Rubin publishes. That was my original intention for all of our <survey>-alerts streams, and in that sense using swift-alerts is consistent (assuming Swift really does publish these as json -- otherwise, sorry for this irrelevant tangent). But since all of our topics downstream of <survey>-alerts use json exclusively, I can see an argument for making all <survey>-alerts streams json as well.

@hernandezc1 hernandezc1 commented

@troyraen the alerts from Swift are JSON serialized. I named the Pub/Sub resource that way because at the time it seemed more appropriate and descriptive, but I agree that having a <survey>-alerts topic that is just a pass through of the survey's full (deduplicated) alert stream is the convention this module should adopt

@troyraen troyraen commented

There are a bunch of variables in here with names that include "avro" -- I'm sure this is a holdover from the ZTF and LSST modules. For reusability, we should change those names because not all surveys publish avro alerts. For Swift in particular, assuming they publish in json and not avro, having "avro" here is confusing.

cr_module_name=$(define_GCP_resources "${survey}-${MODULE_NAME}") # lower case required by cloud run
gcs_avro_bucket=$(define_GCP_resources "${PROJECT_ID}-${survey}_alerts")
ps_input_subscrip=$(define_GCP_resources "${survey}-alerts_raw") # pub/sub subscription used to trigger cloud run module
ps_subscription_avro=$(define_GCP_resources "${survey}-alert_avros-counter")
@troyraen troyraen commented

Are you actually using these counter subscriptions? If not, get rid of this. I think ZTF is the only broker pipeline that uses them.

FWIW, ZTF uses them with an ancillary module that helps us track broker performance. The module is very useful, but the way I wrote it is complicated and expensive, and that's why we haven't added it to any other broker pipeline. We should rewrite it using bigquery subscriptions (which didn't exist when I wrote the original) -- #172.

gcs_avro_bucket=$(define_GCP_resources "${PROJECT_ID}-${survey}_alerts")
ps_input_subscrip=$(define_GCP_resources "${survey}-alerts_raw") # pub/sub subscription used to trigger cloud run module
ps_subscription_avro=$(define_GCP_resources "${survey}-alert_avros-counter")
ps_topic_avro=$(define_GCP_resources "projects/${PROJECT_ID}/topics/${survey}-alert_avros")
@troyraen troyraen commented

I know this is the original name of the topic, but using "avros" is not generically good for the aforementioned reasons, and if Swift publishes json then we definitely shouldn't use it here. Let's think of something better. <survey>-alert_in_bucket? Just off the top of my head.

Comment on lines 87 to 94
# WARNING: This is set to retry failed deliveries. If there is a bug in main.py this will
# retry indefinitely, until the message is deleted manually.
gcloud pubsub subscriptions create "${ps_input_subscrip}" \
--topic "${ps_trigger_topic}" \
--topic-project "${PROJECT_ID}" \
--ack-deadline=600 \
--push-endpoint="${url}${ROUTE_RUN}" \
--push-auth-service-account="${runinvoker_svcact}"
@troyraen troyraen commented

Did you figure out how to limit the number of retries for cloud run services? If so, implement it here so we can drop this disaster waiting to happen 🥴.

@hernandezc1 hernandezc1 commented

@troyraen the only solution I can think of is the following:

# add to line 43
ps_deadletter_topic_input_subscrip=$(define_GCP_resources "${survey}-upsilon-deadletter") 

# add to line 51
gcloud pubsub topics delete "${ps_deadletter_topic_input_subscrip}"

# add to line 61
gcloud pubsub topics create "${ps_deadletter_topic_input_subscrip}"

gcloud pubsub subscriptions create "${ps_input_subscrip}" \
  --topic="${ps_trigger_topic}" \
  --topic-project="${PROJECT_ID}" \
  --ack-deadline=600 \
  --push-endpoint="${url}${ROUTE_RUN}" \
  --push-auth-service-account="${runinvoker_svcact}" \
  --dead-letter-topic="${ps_deadletter_topic_input_subscrip}" \
  --max-delivery-attempts=5

Rather than retrying indefinitely, the message will be published to a dead letter topic after 5 delivery attempts

@troyraen troyraen commented

Ok. What do you think about having only one deadletter topic that we use for every module? I'm resistant to having to manage a different one for each module but I also don't want to create a mess there if/when we actually need to use it. I don't have a clear sense of if/how we will actually want to work with messages that get delivered there. Do you? If we only have one deadletter topic and multiple modules start failing and dumping messages there, will we actually have a need to dig through them and sort out which ones came from which module? I think I would be more inclined to look at the bigquery tables to figure out which messages did/didn't make it through a given module. What do you think?

Comment on lines 121 to 126
# set IAM policies on public Pub/Sub resources
if [ "$testid" = "False" ]; then
user="allUsers"
roleid="projects/${GOOGLE_CLOUD_PROJECT}/roles/userPublic"
gcloud pubsub topics add-iam-policy-binding "${ps_topic_alerts_json}" --member="${user}" --role="${roleid}"
fi
@troyraen troyraen commented

'alerts_raw' needs this same policy.
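That is, something like the following (the variable name for the 'alerts_raw' topic is assumed):

gcloud pubsub topics add-iam-policy-binding "${ps_topic_alerts_raw}" --member="${user}" --role="${roleid}"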

Comment on lines 58 to 60
gcloud storage buckets add-iam-policy-binding "gs://${gcs_avro_bucket}" \
--member="allUsers" \
--role="roles/storage.objectViewer"
@troyraen troyraen commented

Similar to the bigquery policy, we should incorporate the permissions being granted here into our custom userPublic role so that we consolidate all the permissions we intend to grant to the public and we can apply the same role/policy to every resource. Also, we should really only give public access to production resources.
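For instance, the consolidation might look roughly like this (a sketch; the added permissions and the reuse of userPublic on buckets are assumptions to verify):

# Hypothetical sketch: fold the bucket permissions into the custom userPublic role
# so the same role/policy can be bound to topics, datasets, and buckets.
gcloud iam roles update userPublic --project="${PROJECT_ID}" \
    --add-permissions="storage.objects.get,storage.objects.list"
# ...then bind the custom role instead of roles/storage.objectViewer:
gcloud storage buckets add-iam-policy-binding "gs://${gcs_avro_bucket}" \
    --member="allUsers" \
    --role="projects/${PROJECT_ID}/roles/userPublic"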

@hernandezc1 hernandezc1 commented

I'd like to standardize permissions across broker instances for all surveys in a separate PR

# name of the survey this broker instance will ingest
survey="${4:-swift}"
zone="${5:-us-central1-a}"
project_id="${6:-PROJECT_ID}"
@troyraen troyraen commented

No need to define yet another name for this variable. Just use $PROJECT_ID directly where applicable. (GCP makes this variable confusing by using different names for it in different contexts -- PROJECT_ID and GOOGLE_CLOUD_PROJECT being most common, but there's at least one more. In our scripts we try to standardize on PROJECT_ID.)

Comment on lines 78 to 80
# grant public access to the dataset; for more information, see:
# https://cloud.google.com/bigquery/docs/control-access-to-resources-iam#grant_access_to_a_dataset
(cd templates && bq update --source "bq_${survey}_policy.json" "${PROJECT_ID}:${bq_dataset}") || exit 5
@troyraen troyraen commented

Move this to the same block that grants public access to the pubsub topics so that it's only done for production resources.
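Concretely, something like the sketch below, combining this snippet with the public-topic block quoted earlier (final placement and variable names may differ between scripts):

# Sketch: grant all public access only for production resources (testid="False").
if [ "$testid" = "False" ]; then
    user="allUsers"
    roleid="projects/${GOOGLE_CLOUD_PROJECT}/roles/userPublic"
    gcloud pubsub topics add-iam-policy-binding "${ps_topic_alerts_json}" --member="${user}" --role="${roleid}"
    # grant public access to the dataset; for more information, see:
    # https://cloud.google.com/bigquery/docs/control-access-to-resources-iam#grant_access_to_a_dataset
    (cd templates && bq update --source "bq_${survey}_policy.json" "${PROJECT_ID}:${bq_dataset}") || exit 5
fi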

fi
else
echo
echo "Creating avro_bucket..."
@troyraen troyraen commented

Let's revisit which scripts create and delete which resources. I know we had a whole discussion about it when we first created these deploy.sh scripts for individual modules and decided to put this bucket creation here, but this is the only public resource that we handle this way and I've been confused by it more than once. Every other resource gets created by setup_broker.sh. At a minimum, our handling of this bucket and the bigquery dataset/table should be consistent.

@hernandezc1 hernandezc1 requested a review from troyraen July 17, 2025 14:30
@troyraen troyraen removed their request for review October 2, 2025 21:21