Update extract fields #1305

cswartzvi · 2025-05-09T03:02:53Z

Updated the existing function modifier extract_fields so that it can infer field types from the type annotation.

Warning

Important: This PR is based on the branch in #1303 and not main. Recommend merging that branch first. Sorry for the confusion!

Changes

Isolated the field extraction logic in a helper function called _process_extract_fields. This function determines field types when necessary before calling the preexisting helper _validate_extract_fields.
The extract_fields class now calls _process_extract_fields directly (instead of _validate_extract_fields)
Documentation on using extract_fields was updated to include unpacked field names, list of field names, and the previously undocumented TypedDict
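As a rough illustration of the inference step described in the changes above, the sketch below maps field names to the value type of a generic dictionary annotation. The helper name infer_field_types and its error messages are hypothetical and are not taken from the Hamilton source; the real _process_extract_fields may differ. It assumes Python 3.8+ for typing.get_origin and typing.get_args.

```python
from typing import Any, Dict, List, get_args, get_origin


def infer_field_types(fields: List[str], return_annotation: Any) -> Dict[str, type]:
    """Hypothetical helper: give every field the value type T of Dict[str, T]."""
    origin = get_origin(return_annotation)
    args = get_args(return_annotation)
    # Only a parameterized dictionary such as Dict[str, int] carries a value type.
    if origin is not dict or len(args) != 2:
        raise TypeError(
            f"Cannot infer field types from {return_annotation!r}; "
            "expected a generic dictionary such as Dict[str, int]."
        )
    key_type, value_type = args
    if key_type is not str:
        raise TypeError("Dictionary keys must be annotated as str.")
    # Homogeneous dictionary: every extracted field shares the same type.
    return {name: value_type for name in fields}


inferred = infer_field_types(["X_train", "X_test"], Dict[str, float])
```

A bare Dict annotation has no type parameters, so this sketch rejects it; in that case the field types would have to be given explicitly.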

How I tested this

Added test cases to validate the functionality of the extract_fields decorator with inferred field types
Updated and consolidated existing annotation checks to handle explicit field types, inferred field types, and TypedDicts

Notes

To use this feature you must specify a generic dictionary with valid type parameters, so it only works for homogeneous dictionaries. For example, the following would extract the standard X_train, X_test, y_train, and y_test as np.ndarray by using unpacked field names:

@extract_fields('X_train', 'X_test', 'y_train', 'y_test')  # unpacked field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}

You can also pass a list of field names to the first argument:

@extract_fields(['X_train', 'X_test', 'y_train', 'y_test'])   # list of field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}
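The three calling styles (unpacked names, a list of names, and a dictionary of explicit types) could be normalized to one internal shape along the lines of the sketch below. The function normalize_fields is a hypothetical illustration, not the decorator's actual code; a value of None stands for "type to be inferred from the return annotation".

```python
from typing import Dict, List, Optional, Union

FieldsArg = Union[str, List[str], Dict[str, type]]


def normalize_fields(fields: FieldsArg, *others: str) -> Dict[str, Optional[type]]:
    """Hypothetical helper: return {field_name: declared_type_or_None}."""
    if isinstance(fields, dict):
        # Explicit types were given, so nothing needs to be inferred later.
        if others:
            raise ValueError("Extra field names are not allowed alongside a dictionary.")
        return dict(fields)
    if isinstance(fields, str):
        # Unpacked style: normalize_fields("a", "b", "c")
        names = [fields, *others]
    elif isinstance(fields, (list, tuple)):
        # List style: normalize_fields(["a", "b", "c"])
        if others:
            raise ValueError("Pass either a list of names or unpacked names, not both.")
        names = list(fields)
    else:
        raise TypeError(f"Unsupported fields argument: {fields!r}")
    return {name: None for name in names}


unpacked = normalize_fields("X_train", "X_test")
listed = normalize_fields(["X_train", "X_test"])
explicit = normalize_fields({"X_train": int})
```

Keeping the dictionary form untouched is what would preserve the existing behavior for callers who already declare their types.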

This also preserves backward compatibility with non-generic dictionaries:

@extract_fields(dict(  # fields specified as a dictionary
    X_train=np.ndarray,
    X_validation=np.ndarray,
    X_test=np.ndarray,
))
def train_test_split_func(...) -> Dict:
    ...
    return {"X_train": ..., "X_validation": ..., "X_test": ...}

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to d91009b in 2 minutes and 40 seconds.

Reviewed 1199 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 12 draft comments. View those below.

1. tests/function_modifiers/test_expanders.py:1024

Draft comment:
Very comprehensive tests; consider adding inline comments in complex cases (e.g. in test_inject_multiple_things where a magic number formula is used) to explain the expected result. This helps clarify why the computed sum equals the given expression.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

2. tests/function_modifiers/test_expanders.py:29

Draft comment:
Consider adding a module-level comment to explain the purpose and structure of these tests. This will help new contributors understand the various decorator mechanisms being validated.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

3. tests/function_modifiers/test_expanders.py:295

Draft comment:
The parametrized tests using eval() on type strings (e.g., in test_extract_fields_invalid_annotations_for_inferred_types) are acceptable in a trusted test context, but ensure that future modifications sanitize these inputs properly.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50% None

4. tests/function_modifiers/test_expanders.py:1

Draft comment:
Consider splitting this large test file into smaller modules (e.g., separate tests for extract_fields, parameterize, and inject) to improve maintainability and readability.
Reason this comment was not posted:
Comment was on unchanged code.

5. tests/function_modifiers/test_expanders.py:367

Draft comment:
Avoid using eval() to obtain type annotations (e.g. in test_extract_fields_valid_annotations_for_inferred_types). Prefer direct type references to improve clarity and reduce potential runtime issues.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a test file and the eval() is being used specifically to test different type annotations in a parameterized way. The alternative would be to repeat very similar test cases or have more complex test setup. Using eval() here is a reasonable testing pattern. The security concerns of eval() are not relevant in test code. The comment seems to misunderstand the testing context. The comment raises valid concerns about eval() being dangerous in production code. Perhaps there could be a safer way to test type annotations. While eval() is dangerous in production code, this is test code where the inputs are fully controlled. The use of eval() here is an appropriate testing pattern for parameterized type annotation tests. The comment should be deleted as it misunderstands the testing context where eval() is an acceptable pattern for parameterized type annotation tests.

6. tests/function_modifiers/test_expanders.py:1023

Draft comment:
In test_inject_multiple_things, the expected output is computed as '8 * (8 + 1) // 2'. Consider adding an inline comment explaining this calculation for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

7. tests/function_modifiers/test_expanders.py:247

Draft comment:
Assertions comparing entire node objects (e.g. 'assert nodes[0] == node.Node(...)') rely on the node's eq implementation. Ensure that this method is robust or compare individual attributes to prevent subtle mismatches.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

8. tests/function_modifiers/test_expanders.py:900

Draft comment:
Several tests include similar assertions on node transformation outputs. Consider refactoring common assertions into helper functions to avoid repetition and improve test readability.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

9. docs/reference/decorators/index.rst:5

Draft comment:
Typo detected: 'configuratibility' should be 'configurability'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

10. hamilton/function_modifiers/expanders.py:26

Draft comment:
Typo: In the module docstring, "Decorators that enables DRY code..." should be corrected to "Decorators that enable DRY code..." for proper subject-verb agreement.
Reason this comment was not posted:
Comment was on unchanged code.

11. hamilton/function_modifiers/expanders.py:1141

Draft comment:
Typographical error: In the comment on line 1141, "naturally maeks sense" should be corrected to "naturally makes sense".
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

12. tests/function_modifiers/test_expanders.py:36

Draft comment:
Typographical error: The string 'non_existant' (line 36) is misspelled; it should be 'non_existent'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_tmjf5f1ze95sTcgM

docs/concepts/function-modifiers.rst

cswartzvi · 2025-05-09T03:34:48Z

Errors coming from the dlt plugin ...

elijahbenizzy

Looks good, some nits!

elijahbenizzy · 2025-05-19T03:43:50Z

hamilton/function_modifiers/expanders.py

@@ -694,6 +699,88 @@ def extractor_fn(
        return output_nodes


+def _process_extract_fields(


Nit -- change the name from _process to something more descriptive?

elijahbenizzy · 2025-05-19T03:46:13Z

hamilton/function_modifiers/expanders.py

+        self.output_type = output_type
+
+    @override
+    def transform_node(


Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

ellipsis-dev bot reviewed May 9, 2025

docs/concepts/function-modifiers.rst (outdated)

cswartzvi mentioned this pull request May 9, 2025

Combined extract decorator #121

Open

cswartzvi mentioned this pull request May 9, 2025

Fix dlt plugin with changes to loader_file_format #1306

Merged

7 tasks

elijahbenizzy reviewed May 19, 2025

cswartzvi and others added 20 commits May 19, 2025 12:38

Initial implementation of unpack_fields

4f19fad

Initial tests for unpack_fields

71cd1d3

Add documentation for unpack_fields

240b4c4

Use Optional for backward compatibility

6a00141

Fix typo in test name test_unpack_fields_valid_indeterminate_tuple

99b5323

Fix docstring of _process_unpack_fields

6a10ce7

Split unpack_field annotation tests for backward compatibility

7fbed27

Removed 'future' type annotations for backward compatibility

ada4a05

More backward compatibility changes

1d41134

Move error handling to _process_unpack_fields

9a6ae0c

Remove redundant asynchronous extractor for unpack_fields

c284b96

Expand _process_unpack_fields check for tuple[..., int]

de38709

Expand unpack_fields annotation tests

8f560e2

Fix some docstring typos

d15354d

Use single synchronous extractor function

81941bd

Upgrade extract_fields

eca6c99

Update extract_fields documentation

f88b322

Rename unpack_fields tests for consistency

8b86553

Add backward compatible Union

f2c9671

Update docs/concepts/function-modifiers.rst

824ba2b

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

cswartzvi force-pushed the update_extract_fields branch from cf42e9b to 824ba2b Compare May 19, 2025 16:38

cswartzvi added 2 commits May 19, 2025 12:57

Rename process functions

73c81e8

Update docstrings

67f5c66

elijahbenizzy approved these changes May 26, 2025
