fix(setup): allow empty values for args #1230

shcheklein · 2025-07-11T23:22:41Z

Allow passing empty values ({}, None, "", etc.) to setup. Previously this failed with an index error.

Summary by Sourcery

Allow empty values to be used as setup arguments by adjusting the key check logic in row_to_objs and updating return types to list[Any]

Bug Fixes:

Prevent index error when passing empty setup values by checking key presence instead of truthiness

Enhancements:

Allow empty values (like {}, None, "") in setup by using key existence checks in row_to_objs
Change return types of row_to_objs and _parse_row from list[DataValue] to list[Any] to support generic values

Tests:

Extend test_row_to_objs_setup to include an 'empty' field and verify empty dict is accepted

sourcery-ai · 2025-07-11T23:22:46Z

Reviewer's Guide

Modified row_to_objs in SignalSchema and the UDF parser to allow empty setup values by checking for key presence instead of truthiness, updated type annotations accordingly, and added a unit test for empty values.

Sequence diagram for row_to_objs handling of empty setup values

sequenceDiagram
    participant UDFParser
    participant SignalSchema
    UDFParser->>SignalSchema: row_to_objs(row)
    SignalSchema->>SignalSchema: _init_setup_values()
    alt setup_values contains name
        SignalSchema-->>UDFParser: objs.append(setup_values.get(name))
    else ModelStore.to_pydantic(fr_type) is not None
        SignalSchema-->>UDFParser: objs.append(fr(**j))
    end

Class diagram for updated SignalSchema and UDF parsing

classDiagram
    class SignalSchema {
        +row_to_objs(row: Sequence[Any]) list[Any]
        - _init_setup_values()
        values: dict
        setup_values: dict
    }
    class UDFParser {
        +_parse_row(row_dict: RowDict, catalog: Catalog, cache: bool, download_cb: Callback) list[Any]
        params: SignalSchema
    }
    SignalSchema <.. UDFParser : uses

File-Level Changes

Change	Details	Files
Relax setup_values handling in row_to_objs to accept empty values	Removed the truthiness check on setup_values.get(name); replaced it with a key membership check and an unconditional append of setup_values[name]	`src/datachain/lib/signal_schema.py`
Update return type annotations to reflect broader value types	Changed the row_to_objs return annotation from list[DataValue] to list[Any]; changed the _parse_row return annotation in the UDF parser from list[DataValue] to list[Any]	`src/datachain/lib/signal_schema.py`, `src/datachain/lib/udf.py`
Add unit test for empty setup values	Extended the test spec and setup to include an empty dict value; updated the expected result assertion to include the empty dict	`tests/unit/lib/test_signal_schema.py`

cloudflare-workers-and-pages · 2025-07-11T23:23:15Z

Deploying datachain-documentation with Cloudflare Pages

Latest commit:	`c9ca06a`
Status:	✅ Deploy successful!
Preview URL:	https://6e1936f2.datachain-documentation.pages.dev
Branch Preview URL:	https://fix-setup-empty-arg.datachain-documentation.pages.dev

sourcery-ai

Hey @shcheklein - I've reviewed your changes - here's some feedback:

Double‐check that changing the setup check to if name in setup_values won’t accidentally override parsing for legitimately falsy values (like None or 0) that you might still want to unflatten instead of returning directly.
Since you widened the return types of row_to_objs and _parse_row to list[Any], consider whether you can still preserve a more specific type alias or narrow that type in downstream code to avoid losing type information.
In the test, using dict as the setup value actually refers to the constructor rather than an empty dict instance—consider passing {} directly to make the intent clearer and avoid confusion.
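On the last point, a small illustration may help. The existing entry `"init_val": lambda: setup_value` suggests that setup entries are zero-argument callables that get called to produce the actual value. Assuming that is how they are resolved (the dict comprehension below is a stand-in for that step, not the library's code), `dict` is itself a zero-argument callable and yields a fresh empty dict:

```python
setup_value = 84635
setup = {"init_val": lambda: setup_value, "empty": dict}

# Assumption: each setup entry is called with no arguments to obtain its value.
# This comprehension is a hypothetical stand-in for that resolution step.
resolved = {name: factory() for name, factory in setup.items()}

# `dict` called with no arguments returns a new empty dict,
# the same value that `lambda: {}` would return.
same_as_lambda = dict() == (lambda: {})()
```

Under that assumption, passing `dict` and passing `lambda: {}` produce the same resolved value; the latter only makes the intent more explicit.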

## Individual Comments

### Comment 1
<location> `tests/unit/lib/test_signal_schema.py:995` </location>
<code_context>
+    spec = {"name": str, "age": float, "init_val": int, "fr": MyType2, "empty": dict}
     setup_value = 84635
-    setup = {"init_val": lambda: setup_value}
+    setup = {"init_val": lambda: setup_value, "empty": dict}
     schema = SignalSchema(spec, setup)

</code_context>

<issue_to_address>
Consider adding tests for other types of empty values (None, empty string, etc).

Please add test cases for 'empty': None and 'empty': "" to ensure all intended empty value types are covered.
</issue_to_address>

sourcery-ai · 2025-07-11T23:23:27Z

tests/unit/lib/test_signal_schema.py

    setup_value = 84635
-    setup = {"init_val": lambda: setup_value}
+    setup = {"init_val": lambda: setup_value, "empty": dict}


suggestion (testing): Consider adding tests for other types of empty values (None, empty string, etc).

Please add test cases for 'empty': None and 'empty': "" to ensure all intended empty value types are covered.
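A rough sketch of what such coverage could check, written against a tiny hypothetical helper (`pick_setup_value`) rather than the real `SignalSchema`, so it only demonstrates the key-presence rule and not the library's API:

```python
from typing import Any

# Sentinel standing in for "value parsed from the row" (hypothetical).
FROM_ROW = object()


def pick_setup_value(name: str, setup_values: dict[str, Any]) -> Any:
    """Return the setup value if the key is present, even when it is falsy."""
    if name in setup_values:
        return setup_values[name]
    return FROM_ROW


# Empty or falsy values that should all be passed through unchanged.
EMPTY_VALUES: list[Any] = [{}, None, "", 0, [], False]

results = [pick_setup_value("empty", {"empty": value}) for value in EMPTY_VALUES]
all_passed_through = all(r is v for r, v in zip(results, EMPTY_VALUES))

# A name that is not a setup key still falls back to the row.
falls_back = pick_setup_value("name", {"empty": {}}) is FROM_ROW
```

In the real test suite the same list of values could drive a parametrized version of test_row_to_objs_setup.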

shcheklein · 2025-07-11T23:23:36Z

src/datachain/lib/signal_schema.py

        pos = 0
        for name, fr_type in self.values.items():
-            if self.setup_values and (val := self.setup_values.get(name, None)):


Here we skip the setup value whenever bool(val) is False, so a value set to {} (for example) is skipped.
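To make the failure mode concrete, here is a minimal standalone model of the loop, not the actual datachain implementation. It assumes one row column per non-setup value and ignores unflattening of nested types; the function names are made up for illustration.

```python
from typing import Any, Sequence


def row_to_objs_truthy(
    values: dict[str, type], setup_values: dict[str, Any], row: Sequence[Any]
) -> list[Any]:
    """Simplified pre-fix logic: use a setup value only if it is truthy."""
    objs: list[Any] = []
    pos = 0
    for name in values:
        if setup_values and (val := setup_values.get(name, None)):
            objs.append(val)
        else:
            # A falsy setup value ends up here and tries to read a row
            # column that was never provided for it.
            objs.append(row[pos])
            pos += 1
    return objs


def row_to_objs_by_key(
    values: dict[str, type], setup_values: dict[str, Any], row: Sequence[Any]
) -> list[Any]:
    """Simplified post-fix logic: use a setup value whenever the key exists."""
    objs: list[Any] = []
    pos = 0
    for name in values:
        if setup_values and name in setup_values:
            objs.append(setup_values[name])
        else:
            objs.append(row[pos])
            pos += 1
    return objs


values = {"name": str, "init_val": int, "empty": dict}
setup_values = {"init_val": 84635, "empty": {}}
row = ["Alice"]  # the row carries only the non-setup column

old_error = None
try:
    row_to_objs_truthy(values, setup_values, row)
except IndexError as exc:
    old_error = exc  # "empty" fell through and indexed past the row

fixed = row_to_objs_by_key(values, setup_values, row)
```

In this model the truthiness version raises IndexError on the "empty" entry, while the key-membership version returns the name, the setup integer, and the empty dict in order.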

codecov · 2025-07-11T23:29:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.66%. Comparing base (fd3795e) to head (c9ca06a).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1230      +/-   ##
==========================================
- Coverage   88.68%   88.66%   -0.03%     
==========================================
  Files         152      152              
  Lines       13606    13605       -1     
  Branches     1893     1893              
==========================================
- Hits        12067    12063       -4     
- Misses       1093     1095       +2     
- Partials      446      447       +1

Flag	Coverage Δ
datachain	`88.59% <100.00%> (-0.03%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/datachain/lib/signal_schema.py	`95.42% <100.00%> (ø)`
src/datachain/lib/udf.py	`93.36% <ø> (-0.03%)`	⬇️

... and 1 file with indirect coverage changes

dmpetrov

LG

fix(setup): allow empty values for args

c9ca06a

sourcery-ai bot reviewed Jul 11, 2025

shcheklein commented Jul 11, 2025

shcheklein requested review from a team and dmpetrov July 11, 2025 23:23

dmpetrov approved these changes Jul 12, 2025

shcheklein merged commit c6d32d6 into main Jul 12, 2025
58 of 59 checks passed

shcheklein deleted the fix-setup-empty-arg branch July 12, 2025 16:19
