Add option to disable distributed mode #1235

Merged
merged 1 commit into from
Jul 14, 2025

dreadatour
Contributor

@dreadatour dreadatour commented Jul 13, 2025

Add an option to disable distributed mode by setting an environment variable:

DATACHAIN_DISTRIBUTED_DISABLED=True

This will be useful to debug queries in SaaS.
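
As a rough illustration of how the flag is read: the check quoted later in this review compares the variable against the exact string "True". The helper name `is_distributed_disabled` below is hypothetical and not part of this PR; it only mirrors that comparison to show which values would count.

```python
import os

# Variable name as given in this PR.
DISTRIBUTED_DISABLED = "DATACHAIN_DISTRIBUTED_DISABLED"


def is_distributed_disabled(environ=None) -> bool:
    """Hypothetical helper mirroring the exact-match check quoted in review."""
    env = os.environ if environ is None else environ
    return env.get(DISTRIBUTED_DISABLED) == "True"


# Under an exact string comparison, only the literal "True" disables
# distributed mode; "true", "1" or an empty value do not.
for value in ("True", "true", "1", ""):
    print(repr(value), is_distributed_disabled({DISTRIBUTED_DISABLED: value}))
```

If the comparison in the merged code is indeed exact, setting the variable to `true` or `1` would leave distributed mode enabled.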

Summary by Sourcery

Add an environment variable option to disable distributed mode for debugging.

New Features:

  • Introduce DATACHAIN_DISTRIBUTED_DISABLED environment variable to disable distributed mode
  • Update get_udf_distributor_class to return None when distributed mode is disabled

@dreadatour dreadatour requested review from a team and Copilot July 13, 2025 16:01
@dreadatour dreadatour self-assigned this Jul 13, 2025
Contributor

sourcery-ai bot commented Jul 13, 2025

Reviewer's Guide

Introduces a new environment variable to disable distributed processing by short-circuiting the UDF distributor loader, aiding local debugging.

Class diagram for get_udf_distributor_class update

classDiagram
    class loader {
        +get_udf_distributor_class() Optional[type[AbstractUDFDistributor]]
    }
    class AbstractUDFDistributor
    loader ..> AbstractUDFDistributor : returns Optional[type]

File-Level Changes

Change: Define a flag to disable distributed mode via environment variable
Details:
  • Add DISTRIBUTED_DISABLED constant
  • Document the new env var alongside existing constants
Files: src/datachain/catalog/loader.py

Change: Early exit in get_udf_distributor_class when disabling distributed mode
Details:
  • Check if DISTRIBUTED_DISABLED is set to "True"
  • Return None before attempting to import or load distributor
Files: src/datachain/catalog/loader.py
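
The early exit described above might look roughly like the sketch below. This is not the actual contents of loader.py: `AbstractUDFDistributor` is replaced by an empty stand-in, and `ExampleDistributor` and `_load_configured_distributor` are hypothetical placeholders for whatever import and load logic the real function performs. Only the constant name, the function name, the return type and the `== "True"` check come from this PR page.

```python
import os
from typing import Optional

DISTRIBUTED_DISABLED = "DATACHAIN_DISTRIBUTED_DISABLED"


class AbstractUDFDistributor:
    """Stand-in for the real base class, which is not shown on this page."""


class ExampleDistributor(AbstractUDFDistributor):
    """Hypothetical concrete distributor, used only for illustration."""


def _load_configured_distributor() -> type[AbstractUDFDistributor]:
    # Hypothetical placeholder for the existing import/load logic.
    return ExampleDistributor


def get_udf_distributor_class() -> Optional[type[AbstractUDFDistributor]]:
    # Early exit: skip loading entirely when the flag is exactly "True".
    if os.environ.get(DISTRIBUTED_DISABLED) == "True":
        return None
    return _load_configured_distributor()
```

Returning `None` before any import happens means that, with the flag set, a missing or misconfigured distributor would never be touched.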

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 598704d
Status: ✅  Deploy successful!
Preview URL: https://b2292233.datachain-documentation.pages.dev
Branch Preview URL: https://add-option-to-disable-distri.datachain-documentation.pages.dev

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @dreadatour - I've reviewed your changes and they look great!

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds an environment variable flag to disable distributed mode for debugging in SaaS deployments.

  • Defines a new DISTRIBUTED_DISABLED constant for the DATACHAIN_DISTRIBUTED_DISABLED env var
  • Updates get_udf_distributor_class to early-return None when the flag is set to "True"
  • Does not update documentation or tests for the new behavior
Comments suppressed due to low confidence (2)

src/datachain/catalog/loader.py:21

  • [nitpick] Consider adding this new environment variable to the project documentation (e.g., README or module docstring) to explain its purpose and usage.
DISTRIBUTED_DISABLED = "DATACHAIN_DISTRIBUTED_DISABLED"

src/datachain/catalog/loader.py:107

  • There are no tests covering the new DATACHAIN_DISTRIBUTED_DISABLED behavior. Add unit tests to verify that setting this flag causes get_udf_distributor_class to return None.
    if os.environ.get(DISTRIBUTED_DISABLED) == "True":
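
A test along the lines suggested here could be sketched as follows. To stay self-contained it exercises a simplified stand-in for `get_udf_distributor_class` rather than the real loader, and the `object` return value is only a placeholder for a distributor class; a real test would import the function from `datachain.catalog.loader` instead.

```python
import os
from unittest import mock

DISTRIBUTED_DISABLED = "DATACHAIN_DISTRIBUTED_DISABLED"


def get_udf_distributor_class():
    """Simplified stand-in for the loader function under test."""
    if os.environ.get(DISTRIBUTED_DISABLED) == "True":
        return None
    return object  # placeholder for a real distributor class


def test_returns_none_when_disabled():
    # patch.dict restores os.environ when the block exits.
    with mock.patch.dict(os.environ, {DISTRIBUTED_DISABLED: "True"}):
        assert get_udf_distributor_class() is None


def test_loads_distributor_when_flag_unset():
    with mock.patch.dict(os.environ, {}):
        os.environ.pop(DISTRIBUTED_DISABLED, None)
        assert get_udf_distributor_class() is object
```

Both functions follow the pytest naming convention, so a test runner would pick them up without extra wiring.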

codecov bot commented Jul 13, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 88.63%. Comparing base (8b3c25a) to head (598704d).
Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
src/datachain/catalog/loader.py | 33.33% | 1 Missing and 1 partial ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1235      +/-   ##
==========================================
- Coverage   88.66%   88.63%   -0.04%     
==========================================
  Files         153      153              
  Lines       13793    13796       +3     
  Branches     1927     1928       +1     
==========================================
- Hits        12230    12228       -2     
- Misses       1109     1112       +3     
- Partials      454      456       +2     
Flag | Coverage Δ
datachain | 88.56% <33.33%> (-0.04%) ⬇️

Files with missing lines | Coverage Δ
src/datachain/catalog/loader.py | 74.68% <33.33%> (-1.64%) ⬇️

... and 1 file with indirect coverage changes

Member

@shcheklein shcheklein left a comment

How is it useful? Runs faster?

@dreadatour
Contributor Author

How is it useful? Runs faster?

It skips running UDFs in separate processes (workers) and runs them in place. This helps determine whether a problem is in the UDF-running machinery or somewhere else. Also, logs from UDFs are currently not processed completely, so this helps to see the logs (this is a temporary reason until I fix this issue).

I have used this a few times on localhost while reproducing bugs; it would be nice to have it in dev/prod environments for easier investigation.
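
To make the "run in place" idea concrete, here is a minimal sketch of how a caller might branch on the loader's return value. Everything in it is hypothetical: `FakeDistributor`, its `run` method and `apply_udf` are illustrative names, not DataChain's actual interfaces, which this conversation does not show.

```python
from typing import Callable, Iterable, Optional


class FakeDistributor:
    """Hypothetical distributor; the real interface is not shown in this PR."""

    def __init__(self, udf: Callable[[int], int]) -> None:
        self.udf = udf

    def run(self, rows: Iterable[int]) -> list[int]:
        # A real distributor would hand rows to worker processes here.
        return [self.udf(row) for row in rows]


def apply_udf(
    udf: Callable[[int], int],
    rows: Iterable[int],
    distributor_cls: Optional[type[FakeDistributor]],
) -> list[int]:
    if distributor_cls is None:
        # Distributed mode disabled: run in the current process, so
        # exceptions and log output from the UDF surface directly.
        return [udf(row) for row in rows]
    return distributor_cls(udf).run(rows)


def double(x: int) -> int:
    return x * 2


print(apply_udf(double, [1, 2, 3], None))  # prints [2, 4, 6]
```

If both paths produce the same result for a given UDF, a failure seen only in distributed mode would point at the worker machinery rather than the UDF itself.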

@dreadatour dreadatour merged commit fc7847c into main Jul 14, 2025
57 of 59 checks passed
@dreadatour dreadatour deleted the add-option-to-disable-distributed-mode branch July 14, 2025 02:34
3 participants