-
Notifications
You must be signed in to change notification settings - Fork 116
Add support for scheduled task from CLI in studio #1243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Deploying datachain-documentation with
|
Latest commit: |
dbd0732
|
Status: | ✅ Deploy successful! |
Preview URL: | https://10d09159.datachain-documentation.pages.dev |
Branch Preview URL: | https://amrit-create-job-api.datachain-documentation.pages.dev |
Reviewer's GuideThis PR integrates scheduling support into the Studio CLI by adding --start-time and --cron flags, implementing natural‐language datetime parsing, updating the job creation flow to include scheduling parameters, and extending documentation and tests accordingly. Sequence diagram for job scheduling with --start-time and --cronsequenceDiagram
actor User
participant CLI as CLI Parser
participant Studio as Studio Logic
participant Remote as Remote Studio API
User->>CLI: datachain job run --start-time/--cron ...
CLI->>Studio: process_jobs_args(args)
Studio->>Studio: parse_start_time(args.start_time)
Studio->>Remote: create_job(..., start_time, cron)
Remote->>Remote: Prepare job data with start_after/cron_expression
Remote-->>Studio: Response (job scheduled)
Studio-->>CLI: Print scheduling confirmation
CLI-->>User: Output: Job scheduled as a task
Class diagram for updated job creation flowclassDiagram
class StudioClient {
+create_job(query, query_type, ... , start_time, cron)
}
class process_jobs_args {
+process_jobs_args(args)
}
class create_job {
+create_job(query_file, ..., start_time, cron)
}
class parse_start_time {
+parse_start_time(start_time_str)
}
StudioClient <|-- process_jobs_args
process_jobs_args o-- create_job
create_job o-- parse_start_time
StudioClient <.. create_job : calls create_job()
create_job <.. process_jobs_args : called by
parse_start_time <.. create_job : called by
Class diagram for Remote StudioClient changesclassDiagram
class StudioClient {
+create_job(query, query_type, ..., start_time, cron)
}
class Response {
+ok
+message
+data
}
StudioClient --> Response : returns
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @amritghimire - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/datachain/studio.py:269` </location>
<code_context>
+def parse_start_time(start_time_str: Optional[str]) -> Optional[str]:
</code_context>
<issue_to_address>
Consider returning a timezone-aware ISO string for consistency.
If the input lacks timezone info, consider defaulting to UTC to avoid ambiguity in downstream processing.
</issue_to_address>
### Comment 2
<location> `src/datachain/studio.py:364` </location>
<code_context>
+ # Parse start_time if provided
+ parsed_start_time = parse_start_time(start_time)
+ if cron and parsed_start_time is None:
+ parsed_start_time = datetime.now(timezone.utc).isoformat()
+
response = client.create_job(
</code_context>
<issue_to_address>
Defaulting start_time to now when cron is set may be surprising.
This behavior could cause jobs to start immediately without user intent. Consider requiring explicit start_time input or documenting this default clearly.
</issue_to_address>
### Comment 3
<location> `src/datachain/remote/studio.py:435` </location>
<code_context>
+ start_time: Optional[str] = None,
+ cron: Optional[str] = None,
) -> Response[JobData]:
data = {
"query": query,
</code_context>
<issue_to_address>
Including None values in the payload may cause issues with some APIs.
Omit start_after and cron_expression from the payload when their values are None to prevent potential API compatibility issues.
</issue_to_address>
### Comment 4
<location> `tests/test_cli_studio.py:448` </location>
<code_context>
+def test_studio_run_task(capsys, mocker, tmp_dir, studio_token):
</code_context>
<issue_to_address>
Missing test for edge cases: invalid or missing --start-time and --cron combinations.
Please add tests for cases where only --start-time, only --cron, neither, or invalid values are provided to ensure correct CLI behavior in all scenarios.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
def parse_start_time(start_time_str: Optional[str]) -> Optional[str]: | ||
if not start_time_str: | ||
return None | ||
|
||
try: | ||
# Parse the datetime string using dateparser | ||
parsed_datetime = dateparser.parse(start_time_str) | ||
|
||
if parsed_datetime is None: | ||
raise DataChainError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider returning a timezone-aware ISO string for consistency.
If the input lacks timezone info, consider defaulting to UTC to avoid ambiguity in downstream processing.
if cron and parsed_start_time is None: | ||
parsed_start_time = datetime.now(timezone.utc).isoformat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Defaulting start_time to now when cron is set may be surprising.
This behavior could cause jobs to start immediately without user intent. Consider requiring explicit start_time input or documenting this default clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
"2024-01-15 14:30:00", | ||
"2024-01-15T14:30:00Z", | ||
"2024-01-15T14:30:00+00:00", | ||
"Jan 15, 2024 2:30 PM", | ||
"15/01/2024 14:30", | ||
"2024-01-15", | ||
"tomorrow", | ||
"next week", | ||
"in 2 hours", | ||
"monday 9am", | ||
"tomorrow 3pm", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how useful tests for dateparser
functionality are. I'm inclined to think we shouldn't test functionality already provided by a third-party library with a solid test suite.
(I guess this test may have been generated by a language model)
This adds two argument start time and cron to specify. These two arguments are used to schedule a task in studio.
145a2af
to
2668b30
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1243 +/- ##
==========================================
- Coverage 88.71% 88.71% -0.01%
==========================================
Files 153 153
Lines 13820 13841 +21
Branches 1932 1936 +4
==========================================
+ Hits 12261 12279 +18
- Misses 1104 1105 +1
- Partials 455 457 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
usage: datachain job run [-h] [-v] [-q] [--team TEAM] [--env-file ENV_FILE] [--env ENV [ENV ...]] | ||
[--workers WORKERS] [--files FILES [FILES ...]] [--python-version PYTHON_VERSION] | ||
[--req-file REQ_FILE] [--req REQ [REQ ...]] | ||
usage: datachain job run [-h] [-v] [-q] [--team TEAM] [--env-file ENV_FILE] [--env ENV [ENV ...]] [--cluster CLUSTER] [--workers WORKERS] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has become unreadable
Screen.Recording.2025-07-16.at.9.59.18.AM.mov
* `--req-file REQ_FILE` - Python requirements file | ||
* `--req REQ` - Python package requirements | ||
* `--priority PRIORITY` - Priority for the job in range 0-5. Lower value is higher priority (default: 5) | ||
* `--repository URL` - Repository URL to clone before running the job. | ||
* `--start-time START_TIME` - Start time in ISO format or natural language for the cron task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nobody knows what ISO format is
say exactly like MMDDYYY or something, but keep it short
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is still very confusing - what is the start time for the cron task? cron is defined by cron expression has exact start time ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, it can be run once it seems ... so it's not only about cron then? (the fact the we still call all these tasks cron is confusing and wrong ... cron - it seems only a particular subtype?)
@@ -99,3 +131,14 @@ datachain job run --cluster-id 1 query.py | |||
* To cancel a running job, use the `datachain job cancel` command | |||
* The job will continue running in Studio even after you stop viewing the logs | |||
* You can get the list of compute clusters using `datachain job clusters` command. | |||
* When using `--start-time` or `--cron` options, the job is scheduled as a task and will not show logs immediately. The job will be executed according to the schedule. | |||
* The `--start-time` option supports natural language parsing using the dateparser library, allowing flexible time expressions like "tomorrow 3pm", "in 2 hours", "monday 9am", etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put a link to it
|
||
# Convert to ISO format string | ||
return parsed_datetime.isoformat() | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to have this along with if parsed_datetime is None:
? and repeat the message twice? let's refactor it please ...
studio_run_description = "Run a job in Studio. \n" | ||
studio_run_description += ( | ||
"When using --start-time or --cron," | ||
" the job is scheduled as a task and will not show logs immediately." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
job as a task - won't be clear
just say "job is scheduled to run but won't start immediately (can be seen in the Tasks tab in UI)"
file | ||
``` | ||
|
||
## Description | ||
|
||
This command runs a job in Studio using the specified query file. You can configure various aspects of the job including environment variables, Python version, dependencies, and more. | ||
This command runs a job in Studio using the specified query file. You can configure various aspects of the job including environment variables, Python version, dependencies, and more. When using --start-time or --cron, the job is scheduled as a task and will not show logs immediately. The job will be executed according to the schedule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here also , please see below, improve the message a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amritghimire we need a followup and improve the messaging here
This adds two argument start time and cron to specify. These two
arguments are used to schedule a task in studio.
Related to https://github.com/iterative/studio/pull/11886
Summary by Sourcery
Add scheduling support to the CLI by introducing start-time and cron options for one-off and recurring tasks, parsing flexible date/time expressions with dateparser, and forwarding scheduling parameters to the Studio API.
New Features:
Enhancements:
Build:
Documentation:
Tests: