DM-53622: enable provenance recording #363
base: main
Conversation
kfindeisen left a comment:
I have some concerns about separation of concerns, and having multiple methods with multiple, partially overlapping responsibilities. That's something we've tried to avoid in this codebase so far. In a few cases (like the error handling), it's easy to factor the code to keep the boundaries clean; with the predefined provenance types it's harder.
I think the root cause of a lot of the complexity is the apparent need to pass a single set of "correct" provenance dimensions to the pipeline execution. How does a single definition work if the pipeline tasks have different dimensions (as they do in practice)? How can the need to human curate the definitions work for general-purpose execution? Why can't the dimensions be inferred automatically from the pipeline definition?
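A minimal sketch of what automatic inference might look like, using the same PipelineGraph attributes as the snippet quoted below; the helper name and grouping strategy are assumptions, not anything proposed in this PR:

```python
from collections import defaultdict


def infer_provenance_dimensions(pipeline_graph):
    """Hypothetical helper: group task labels by their dimensions.

    If each distinct dimension set got its own provenance dataset
    type, no hand-curated definition would be needed even when the
    pipeline's tasks have different dimensions.
    """
    groups = defaultdict(list)
    for task_node in pipeline_graph.tasks.values():
        # Assumes task_node.dimensions is hashable, so it can serve
        # directly as a grouping key.
        groups[task_node.dimensions].append(task_node.label)
    return dict(groups)
```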
```python
for task_node in qg.pipeline_graph.tasks.values():
    if task_node.dimensions == dataset_type.dimensions:
        data_ids = qg.quanta_by_task[task_node.label].keys()
        if len(data_ids) == 1:
            return DatasetRef(dataset_type, next(iter(data_ids)), run=qg.header.output_run)
```
kfindeisen:
It took me a while to work out that this is getting a "typical" data ID for the pipeline execution. We already have code for generating that. The caveat is that MWI mostly works in terms of exposure, not visit; I think visit is only queried in the APDB error handler.
kfindeisen:
Now that I think about it -- I don't think this would work in the case where we have multiple snaps but also can only run the ISR-only pipeline (this almost happens in daytime testing, though those aren't proper snaps so we only get to run one of them). In that case, you need two exposure IDs and two provenance datasets, don't you?
TallJimbo:
We've received a guarantee from project leadership that there will not be snaps, ever, and we can start removing code to support them. The exposure vs. visit split in butler won't be going away anytime soon, if ever, but it is now safe to assume that they have a 1-0 or 1-1 relationship.
kfindeisen:
Much like with the Butler, I'm not sure it's worth the effort to rewrite all the code. 🙂
Though your comment on Jira about groups still having multiple exposures/visits is very relevant here -- snaps may have been what originally motivated PP's multi-exposure support, but it can still be useful in other cases. (Though maybe that means we should remove all uses of the word "snap" to avoid confusion.)
TallJimbo:
This has been switched to a small butler query that uses the where string you mentioned.
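As an illustration only, such a query might look roughly like this; the where string, bind values, and surrounding variable names are assumptions, since the actual query is not shown in this thread:

```python
# Hypothetical sketch of the "small butler query": resolve the single
# (exposure, detector) data ID for this run via the registry instead
# of scanning the quantum graph.
data_ids = list(
    butler.registry.queryDataIds(
        ["exposure", "detector"],
        where="exposure = exp AND detector = det",
        bind={"exp": exposure_id, "det": detector_id},
    )
)
if len(data_ids) == 1:
    provenance_ref = DatasetRef(dataset_type, data_ids[0], run=output_run)
```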
This PR can't be tested or merged until we have a […]
TallJimbo left a comment:
I'm deferring replies/action on points related to the "provenance dimensions" problem while we discuss the big picture in Jira.
(force-pushed 8705bec to 615ffc8)
kfindeisen left a comment:
Looks good, but please clean up the commit history before merging -- it looks like most of the later commits are fixes to the original ones?
```python
    return PredictedQuantumGraph.make_empty(universe=DimensionUniverse(), output_run="test")
elif n_quanta < 0:
    raise RuntimeError("Invalid input")
else:
```
kfindeisen:
Can you add a TODO comment for the remaining from_old_quantum_graph call?
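Something like the following would do; the wording is illustrative, and the reason given for keeping the call is an assumption, not stated in the thread:

```python
# TODO: remove this remaining from_old_quantum_graph call once the
# tests generate PredictedQuantumGraph objects directly.
```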
tests/test_middleware_interface.py (outdated)
| "activator.middleware_interface.SeparablePipelineExecutor.run_pipeline" | ||
| ) as mock_run, | ||
| # Mocked QGs do not have realistic dimensions, and provenance | ||
| # dataset types need to have the same dimensions. |
kfindeisen:
Should this comment be attached to the definition of test_provenance_dataset_type instead?
```python
    self.butler.dimensions.conform(["exposure", "detector"]),
    "ProvenanceQuantumGraph",
)
self.butler.registry.registerDatasetType(self._exposure_provenance_dataset_type)
```
kfindeisen:
I don't see how it's cleaner to make the dataset type part of the object state (and far from where it's actually needed), but I suppose it minimizes the changes that need to be made to the tests.
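For comparison, the alternative being described would construct and register the type at the point of use, e.g. as sketched here; the dataset type name is illustrative, while the constructor arguments mirror the snippet above:

```python
from lsst.daf.butler import DatasetType

# Hypothetical: build the dataset type where it is needed instead of
# storing it on the object in __init__.
exposure_provenance_type = DatasetType(
    "promptProvenance",  # illustrative name only
    butler.dimensions.conform(["exposure", "detector"]),
    "ProvenanceQuantumGraph",
)
butler.registry.registerDatasetType(exposure_provenance_type)
```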
(force-pushed c59f134 to 7cd6e38)
This wasn't an option when this code was first written, but we don't have to live with all those backslashes anymore.
(force-pushed 7cd6e38 to ffa6a9f)
I believe all comments have now been addressed. I'll post again when all of the needed upstream changes have made it into a daily or weekly release.
The upstream changes are now in a […]
We'll create a new base build as soon as we're able to test it, but that's proving tricky right now. Thank you for your patience.
Previously, we'd depended on the default value being False, and didn't update our code when the API switched the default. It didn't matter until provenance datasets could appear in each run with the same ID.
(force-pushed 617a096 to 610b8be)