Skip to content

Update extract fields #1305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

Conversation

cswartzvi
Copy link
Contributor

@cswartzvi cswartzvi commented May 9, 2025

Updated the existing function modifier extract_fields so that it can infer field types from the type annotation.

Warning

Important: This PR is based on the branch in #1303 and not main. Recommend merging that branch first. Sorry for the confusion!

Changes

  • Isolated the field extraction logic in a helper function called _process_extract_fields this function determines field types when necessary before calling the preexisting helper _validate_extract_fields
  • The extract_fields class now calls _process_extract_fields directly (instead of _validate_extract_fields)
  • Documentation on using extract_fields was updated to include unpacked field names, list of field names, and the previously undocumented TypedDict

How I tested this

  • Added test cases to validate the functionality of the extract_fields decorator with inferred field types
  • Updated and consolidated existing annotation checks to handle explicit field types, inferred field types, and TypedDicts

Notes

To use this feature you must specify a generic dictionary with valid type paramerters - therefore it will only work for homogenous dictionaries. For example, the following would extract the standard X_train, X_test, y_train, and y_test as np.ndarray by using unpacked field names:

@extract_fields('X_train', 'X_test' 'y_train' 'y_test')  # unpacked field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}

You can also pass a list of field names to the first argument:

@extract_fields(['X_train', 'X_test' 'y_train' 'y_test'])   # list of field names
def train_test_split_func(...) -> Dict[str, np.ndarray]:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}

This also preserves backward compatibility with non-generic dictionaries:

@extract_fields(dict(  # fields specified as a dictionary
    X_train=np.ndarray,
    X_validation=np.ndarray,
    X_test=np.ndarray,
))
def train_test_split_func(...) -> Dict:
    ...
    return {"X_train": ..., "X_test": ..., "y_train": ..., "y_test": ...}

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to d91009b in 2 minutes and 40 seconds. Click for details.
  • Reviewed 1199 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. tests/function_modifiers/test_expanders.py:1024
  • Draft comment:
    Very comprehensive tests; consider adding inline comments in complex cases (e.g. in test_inject_multiple_things where a magic number formula is used) to explain the expected result. This helps clarify why the computed sum equals the given expression.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. tests/function_modifiers/test_expanders.py:29
  • Draft comment:
    Consider adding a module-level comment to explain the purpose and structure of these tests. This will help new contributors understand the various decorator mechanisms being validated.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
3. tests/function_modifiers/test_expanders.py:295
  • Draft comment:
    The parametrized tests using eval() on type strings (e.g., in test_extract_fields_invalid_annotations_for_inferred_types) are acceptable in a trusted test context, but ensure that future modifications sanitize these inputs properly.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
4. tests/function_modifiers/test_expanders.py:1
  • Draft comment:
    Consider splitting this large test file into smaller modules (e.g., separate tests for extract_fields, parameterize, and inject) to improve maintainability and readability.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. tests/function_modifiers/test_expanders.py:367
  • Draft comment:
    Avoid using eval() to obtain type annotations (e.g. in test_extract_fields_valid_annotations_for_inferred_types). Prefer direct type references to improve clarity and reduce potential runtime issues.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a test file and the eval() is being used specifically to test different type annotations in a parameterized way. The alternative would be to repeat very similar test cases or have more complex test setup. Using eval() here is a reasonable testing pattern. The security concerns of eval() are not relevant in test code. The comment seems to misunderstand the testing context. The comment raises valid concerns about eval() being dangerous in production code. Perhaps there could be a safer way to test type annotations. While eval() is dangerous in production code, this is test code where the inputs are fully controlled. The use of eval() here is an appropriate testing pattern for parameterized type annotation tests. The comment should be deleted as it misunderstands the testing context where eval() is an acceptable pattern for parameterized type annotation tests.
6. tests/function_modifiers/test_expanders.py:1023
  • Draft comment:
    In test_inject_multiple_things, the expected output is computed as '8 * (8 + 1) // 2'. Consider adding an inline comment explaining this calculation for clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. tests/function_modifiers/test_expanders.py:247
  • Draft comment:
    Assertions comparing entire node objects (e.g. 'assert nodes[0] == node.Node(...)') rely on the node's eq implementation. Ensure that this method is robust or compare individual attributes to prevent subtle mismatches.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. tests/function_modifiers/test_expanders.py:900
  • Draft comment:
    Several tests include similar assertions on node transformation outputs. Consider refactoring common assertions into helper functions to avoid repetition and improve test readability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. docs/reference/decorators/index.rst:5
  • Draft comment:
    Typo detected: 'configuratibility' should be 'configurability'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. hamilton/function_modifiers/expanders.py:26
  • Draft comment:
    Typo: In the module docstring, "Decorators that enables DRY code..." should be corrected to "Decorators that enable DRY code..." for proper subject-verb agreement.
  • Reason this comment was not posted:
    Comment was on unchanged code.
11. hamilton/function_modifiers/expanders.py:1141
  • Draft comment:
    Typographical error: In the comment on line 1141, "naturally maeks sense" should be corrected to "naturally makes sense".
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. tests/function_modifiers/test_expanders.py:36
  • Draft comment:
    Typographical error: The string 'non_existant' (line 36) is misspelled; it should be 'non_existent'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_tmjf5f1ze95sTcgM

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@cswartzvi
Copy link
Contributor Author

Errors coming from the dlt plugin ...

Copy link
Contributor

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some nits!

@@ -694,6 +699,88 @@ def extractor_fn(
return output_nodes


def _process_extract_fields(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit -- change the name from _process to something more descriptive?

self.output_type = output_type

@override
def transform_node(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring?

@cswartzvi cswartzvi force-pushed the update_extract_fields branch from cf42e9b to 824ba2b Compare May 19, 2025 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants