-
Notifications
You must be signed in to change notification settings - Fork 117
Add option to disable distributed mode #1235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideIntroduces a new environment variable to disable distributed processing by short-circuiting the UDF distributor loader, aiding local debugging. Class diagram for get_udf_distributor_class updateclassDiagram
class loader {
+get_udf_distributor_class() Optional[type[AbstractUDFDistributor]]
}
class AbstractUDFDistributor
loader ..> AbstractUDFDistributor : returns Optional[type]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Deploying datachain-documentation with
|
Latest commit: |
598704d
|
Status: | ✅ Deploy successful! |
Preview URL: | https://b2292233.datachain-documentation.pages.dev |
Branch Preview URL: | https://add-option-to-disable-distri.datachain-documentation.pages.dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @dreadatour - I've reviewed your changes and they look great!
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds an environment variable flag to disable distributed mode for debugging in SaaS deployments.
- Defines a new
DISTRIBUTED_DISABLED
constant for theDATACHAIN_DISTRIBUTED_DISABLED
env var - Updates
get_udf_distributor_class
to early-returnNone
when the flag is set to"True"
- Does not update documentation or tests for the new behavior
Comments suppressed due to low confidence (2)
src/datachain/catalog/loader.py:21
- [nitpick] Consider adding this new environment variable to the project documentation (e.g., README or module docstring) to explain its purpose and usage.
DISTRIBUTED_DISABLED = "DATACHAIN_DISTRIBUTED_DISABLED"
src/datachain/catalog/loader.py:107
- There are no tests covering the new
DATACHAIN_DISTRIBUTED_DISABLED
behavior. Add unit tests to verify that setting this flag causesget_udf_distributor_class
to returnNone
.
if os.environ.get(DISTRIBUTED_DISABLED) == "True":
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1235 +/- ##
==========================================
- Coverage 88.66% 88.63% -0.04%
==========================================
Files 153 153
Lines 13793 13796 +3
Branches 1927 1928 +1
==========================================
- Hits 12230 12228 -2
- Misses 1109 1112 +3
- Partials 454 456 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it useful? Runs faster?
Skip running UDFs in the separate processes (workers) and run it in-place. This helps to find out if problem is within UDF running machinery or somewhere else. Also logs from UDFs right now are not processed completely, this helps to see the logs (this is temporary reason until I'll this this issue). I have used this few times on localhost while reproducing bugs, it would be nice to have it dev/prod environment for easier investigations. |
Add option to disable distributed mode by set environment variable:
This will be useful to debug queries in SaaS.
Summary by Sourcery
Add an environment variable option to disable distributed mode for debugging.
New Features: