@hernandezc1
Contributor

@hernandezc1 hernandezc1 commented Nov 5, 2025

Summary of Changes

Added

  • pittgoogle/pubsub.py
    • The Subscription class now supports the arguments: attribute_filter and udf
    • The _create() function for the Subscription class now supports the creation of subscriptions that use Pub/Sub's built-in filters (i.e., filter based on message attributes) and/or single message transforms through user-defined functions
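For context, a minimal sketch of the request shape this change targets, assuming the Pub/Sub v1 `Subscription` fields `filter` and `message_transforms` (each transform holding a `javascript_udf` with `function_name` and `code`). The helper name `build_subscription_request` and the example resource paths are hypothetical and are not pittgoogle's code; the dict is only built here, not sent.

```python
from typing import Optional


def build_subscription_request(
    name: str,
    topic: str,
    attribute_filter: Optional[str] = None,
    udf_code: Optional[str] = None,
    udf_function_name: Optional[str] = None,
) -> dict:
    """Build a dict shaped like a Pub/Sub v1 Subscription (hypothetical helper)."""
    request = {"name": name, "topic": topic}
    if attribute_filter:
        # Built-in filter on message attributes; fixed once the subscription exists.
        request["filter"] = attribute_filter
    if udf_code:
        if not udf_function_name:
            raise ValueError("udf_function_name is required when udf_code is given")
        # Single Message Transform backed by a JavaScript UDF.
        request["message_transforms"] = [
            {"javascript_udf": {"function_name": udf_function_name, "code": udf_code}}
        ]
    return request


# Example paths are placeholders, not real resources.
request = build_subscription_request(
    name="projects/my-project/subscriptions/my-subscription",
    topic="projects/my-project/topics/my-topic",
    attribute_filter="attributes:diaObject_diaObjectId",
)
```

A dict like this could then be passed as `request=` to `google.cloud.pubsub_v1.SubscriberClient().create_subscription`, which needs credentials and a real project.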

Closes #116.

@codacy-production

codacy-production bot commented Nov 5, 2025

Coverage summary from Codacy


| Coverage variation | Diff coverage |
| --- | --- |
| -0.48% | 25.00% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (f7201f5) | 974 | 685 | 70.33% |
| Head commit (da8ee8c) | 985 (+11) | 688 (+3) | 69.85% (-0.48%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#117) | 12 | 3 | 25.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
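As a quick check of the two definitions above, the reported figures can be reproduced from the line counts in this comment. This is only a sketch of the arithmetic; the helper name `percent` is made up here and is not part of Codacy.

```python
def percent(covered: int, coverable: int) -> float:
    """Coverage as a percentage, rounded to two decimals."""
    return round(100 * covered / coverable, 2)


ancestor = percent(685, 974)    # common ancestor commit (f7201f5)
head = percent(688, 985)        # head commit (da8ee8c)
variation = round(head - ancestor, 2)  # head minus common ancestor
diff_coverage = percent(3, 12)  # lines added or modified by #117

print(ancestor, head, variation, diff_coverage)
```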


@hernandezc1 hernandezc1 self-assigned this Nov 5, 2025
@hernandezc1 hernandezc1 added enhancement New feature or request python Pull requests that update python code labels Nov 5, 2025
@hernandezc1 hernandezc1 requested a review from troyraen November 5, 2025 19:46
@hernandezc1
Contributor Author

@troyraen requesting your review and noting the error caused when building the documentation. Have you encountered that error before?

| CMake Error at CMakeLists.txt:293 (find_package):
    |   By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
    |   asked CMake to find a package configuration file provided by "Arrow", but
    |   CMake did not find one.
    | 
    |   Could not find a package configuration file provided by "Arrow" with any of
    |   the following names:
    | 
    |     ArrowConfig.cmake
    |     arrow-config.cmake
    | 
    |   Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
    |   "Arrow_DIR" to a directory containing one of the above files.  If "Arrow"
    |   provides a separate development package or SDK, be sure it has been
    |   installed.
    | 
    | 
    | -- Configuring incomplete, errors occurred!
    | error: command '/usr/local/bin/cmake' failed with exit code 1

Note: This error originates from the build backend, and is likely not a problem with poetry but one of the following issues with pyarrow (19.0.1)

  - not supporting PEP 517 builds
  - not specifying PEP 517 build requirements correctly
  - the build requirements are incompatible with your operating system or Python version
  - the build requirements are missing system dependencies (eg: compilers, libraries, headers).

You can verify this by running pip wheel --no-cache-dir --use-pep517 "pyarrow (==19.0.1)".

Error: Process completed with exit code 1.

@troyraen
Contributor

troyraen commented Nov 5, 2025

I've seen similar things. They can be a little tricky. But I would think this should be using our poetry.lock file, which we haven't changed in a while. The most recent change I see that I would suspect could be causing this is #113, which upgraded setuptools for actions/setup-python, but the Build Documentation job completed successfully twice after that merged.

I would start by trying to build the documentation locally, if you haven't already. That may or may not provide insight, since this could be related to the underlying OS environment, which will be different for you locally. But it's the easiest place to start looking.

If the local build works for you, I'd suggest upgrading our dependencies in the poetry.lock file. There is a more recent version of pyarrow, so just upgrading that might fix the problem. (But go ahead and upgrade everything.) I think the command is something like poetry update but our docs should say specifically.

@hernandezc1 hernandezc1 force-pushed the u/ch/pubsub/subscription branch from d766522 to 7407e55 Compare November 6, 2025 14:15
@hernandezc1
Copy link
Contributor Author

@troyraen thanks for your help!

Contributor

@troyraen troyraen left a comment

Thanks! Try to write at least one unit test to check some part of the code you're adding here. (Maybe the lines that parse the function name out of the JavaScript string?)

Might want to copy the validation piece of #116 into a new issue to be dealt with later.
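A rough sketch of what such a unit test might look like. The helper name `parse_udf_function_name` and its regex are assumptions made for illustration; the PR's actual parsing code is not shown in this thread and may differ.

```python
import re
from typing import Optional

# Hypothetical pattern: the first `function <name>(` declaration in the source.
_FUNCTION_NAME = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(")


def parse_udf_function_name(udf: str) -> Optional[str]:
    """Return the first declared function name in a JavaScript string, else None."""
    match = _FUNCTION_NAME.search(udf)
    return match.group(1) if match else None


def test_parses_declared_name():
    udf = "function filterByNPrevDetections(message, metadata) { return message; }"
    assert parse_udf_function_name(udf) == "filterByNPrevDetections"


def test_returns_none_without_declaration():
    # Arrow functions have no `function <name>(` declaration to match.
    assert parse_udf_function_name("const f = (message, metadata) => message;") is None
```

Written this way, the tests run under pytest without touching Pub/Sub at all.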

import attrs.validators
import google.api_core.exceptions
import google.cloud.pubsub_v1
from google.pubsub_v1.types import JavaScriptUDF, MessageTransform

Do import google.pubsub_v1.types instead of from ....

Also, is this a new dependency that needs to be added to the toml file?

Comment on lines +326 to +333
attribute_filter (str, optional):
Specify a filter to only receive the messages whose attributes match the filter. The filter is an immutable
property of a subscription. After you create a subscription, you cannot update the subscription to modify
the filter.
udf (str, optional):
Specify a JavaScript User-Defined Function (UDF). UDFs attached to a subscription can enable a wide range
of use cases, including: message filtering (based on the message payload and/or attributes), simple data
transformations, data masking and redaction, and data format conversions.

You don't need to go into the details here, but do give the user a heads up that both of these strings must conform to very specific requirements and then provide links to the google documentation where they can learn what those requirements are.

In the udf docstring, incorporate "Pub/Sub Single Message Transforms (SMTs)" so it's clear what this is used for.

Check our other docstrings and examples to see what words we typically use to describe message attributes and payloads. New users probably won't immediately understand what they mean and I may have intentionally used different words elsewhere that I thought would be more recognizable. Would be good to be consistent throughout our docs. (Maybe we do use "attributes" and "payload", I just literally don't remember right now.)

Example:
Create a subscription to Pitt-Google's 'ztf-loop' topic and pull messages:
Create a subscription to Pitt-Google's 'lsst-loop' topic and pull messages:

Make it clear that this is simulated data. (Ideally, the topic itself would be called something like lsstsims-loop, but also ok just to make a note right here instead of changing the topic name.)

Comment on lines +345 to +346
keepDiaObjects = "attributes:diaObject_diaObjectId"
filterByNPrevDetections = '''

Personally, I would name these two variables attribute_filter and udf to match the kwarg names so it's clear from the start where each is destined to be applied. But whatever you prefer.

Add a comment for each explaining what kind of messages will pass the filter. (Does the first one actually do anything with just a field name?)
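A sketch of how the two variables might read with the suggested names and comments. As far as I know from Pub/Sub's filter syntax, the `attributes:KEY` form matches any message that carries that attribute key, so the first one does act as a filter. The UDF body, the attribute name `n_previous_detections`, and the threshold of 5 are hypothetical, since the PR's actual UDF is not shown in this thread; `has_attribute` is only a local stand-in for the key-existence check.

```python
# Passes only messages that carry the attribute key `diaObject_diaObjectId`,
# whatever its value.
attribute_filter = "attributes:diaObject_diaObjectId"

# Passes messages whose `n_previous_detections` attribute (hypothetical name)
# is at least 5. Returning null from the UDF drops the message.
udf = """
function filterByNPrevDetections(message, metadata) {
  const n = parseInt(message.attributes["n_previous_detections"], 10);
  return n >= 5 ? message : null;
}
"""


def has_attribute(attributes: dict, key: str) -> bool:
    """Local stand-in for the `attributes:KEY` form, for illustration only."""
    return key in attributes
```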

Specify a filter to only receive the messages whose attributes match the filter. The filter is an immutable
property of a subscription. After you create a subscription, you cannot update the subscription to modify
the filter.
udf (str, optional):

Maybe call this variable smt_udf to include the info about what the string is used for?


Now I see that Google calls this variable javascript_udf, so that would be a good name for it here as well. It's helpful to know that this is JavaScript. Could also add in the SMT info and call it smt_javascript_udf.

raise TypeError("The subscription needs to be created but no topic was provided.")

if self.udf:
import re

I think this library is quite small, so it's fine to just import it at the top of this file.

Comment on lines +443 to +445
_function_name = match.group(1) if match else "filter"
if not match:
LOGGER.warning("Could not parse function name from UDF; using default 'filter'.")

This isn't necessarily a filter though, right? It could be a transformation that doesn't filter.
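One possible alternative to a default name, sketched under assumptions: fail loudly when no function name can be parsed, so a transform is never registered under a misleading name. The helper `require_udf_function_name` and its regex are hypothetical and are not the PR's code.

```python
import re


def require_udf_function_name(udf: str) -> str:
    """Return the declared function name or raise (hypothetical helper)."""
    match = re.search(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(", udf)
    if match is None:
        raise ValueError(
            "Could not parse a function name from the UDF; "
            "declare it as `function name(message, metadata)`."
        )
    return match.group(1)
```

Raising here trades convenience for clarity: the caller learns about a malformed UDF before any request is made.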

Comment on lines +382 to +383
attribute_filter: Optional[str] = attrs.field(default=None)
udf: Optional[str] = attrs.field(default=None)

I think these should just be kwargs in the touch() function. These are only used once, when creating the subscription. And when a user creates a Subscription object for a GCP subscription that already exists, our code doesn't make the filter or udf available in the Subscription object even if they exist on the actual subscription. So I don't think they should be class attributes.

And if the user passes one in but the subscription already exists, should probably raise a warning saying the subscription exists so the kwarg wasn't used.
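A minimal sketch of that behavior, using a stand-in class rather than pittgoogle's `Subscription`. The class name `SubscriptionSketch`, the `exists` flag, and the `created_with` attribute are all hypothetical; a real implementation would look the subscription up in Pub/Sub instead.

```python
import warnings
from typing import Optional


class SubscriptionSketch:
    """Stand-in for illustration; not pittgoogle's Subscription class."""

    def __init__(self, name: str, exists: bool = False):
        self.name = name
        self._exists = exists  # stands in for a lookup against Pub/Sub
        self.created_with: Optional[dict] = None

    def touch(
        self,
        attribute_filter: Optional[str] = None,
        javascript_udf: Optional[str] = None,
    ) -> None:
        """Create the subscription if needed; the kwargs only apply at creation."""
        if self._exists:
            if attribute_filter is not None or javascript_udf is not None:
                warnings.warn(
                    f"Subscription '{self.name}' already exists; "
                    "attribute_filter and javascript_udf were not applied.",
                    UserWarning,
                    stacklevel=2,
                )
            return
        # A real implementation would call the Pub/Sub API here.
        self.created_with = {
            "attribute_filter": attribute_filter,
            "javascript_udf": javascript_udf,
        }
        self._exists = True
```

Keeping the two values as `touch()` arguments means the object never claims to know a filter or UDF that it did not set itself.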


Development

Successfully merging this pull request may close these issues.

Support keyword arguments when creating Pub/Sub subscriptions
