Update inference to resume from temporary result file when possible #1734
Conversation
Pull Request Overview
This PR introduces a resume mechanism for inference jobs by leveraging temporary scratch files, reducing redundant computations after failures. Key changes include refactoring tests to use the unified infer method, adding scratch file handling in multiple inference engines, and integrating resume logic in the base inference engine.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit/inference/test_vllm_inference_engine.py | Removed direct `infer_from_file` call to use the unified `infer` method for resumed inference |
| tests/unit/inference/test_remote_inference_engine.py | Updated inference invocation to reflect the new resume mechanism |
| tests/unit/inference/test_generation_params.py | Added mocks for `tokenizer.batch_decode` to support new generation behavior |
| tests/unit/inference/test_base_inference_engine.py | Added tests to validate scratch file creation, resume logic, and cleanup behavior |
| tests/integration/infer/test_infer.py | Adjusted conversation comparison and removed redundant GPU marker |
| src/oumi/inference/{vllm, native_text, llama_cpp}_inference_engine.py | Removed direct cleanup calls to centralize scratch file handling in the base engine |
| src/oumi/core/inference/base_inference_engine.py | Integrated scratch file loading, filtering of completed conversations, and deterministic conversation ID generation (see the sketch after this table) |
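As a point of reference, here is a minimal sketch of how a deterministic conversation ID could be derived from conversation content; the hashing scheme and message layout below are illustrative assumptions, not the PR's actual implementation:

```python
import hashlib
import json


def deterministic_conversation_id(messages: list[dict]) -> str:
    """Derive a stable ID from a conversation's content so that a rerun of the
    same input maps to the same record in the scratch file.

    Illustrative sketch; the PR's actual hashing scheme may differ.
    """
    # Serialize deterministically: sorted keys, fixed separators.
    payload = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same content always yields the same ID across runs.
msgs = [{"role": "user", "content": "Hello"}]
assert deterministic_conversation_id(msgs) == deterministic_conversation_id(msgs)
```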
Comments suppressed due to low confidence (2)
tests/unit/inference/test_vllm_inference_engine.py:429
- The direct call to infer_from_file was removed; ensure that the unified infer method adequately tests the resume-from-scratch behavior and that all expected outcomes are validated elsewhere in the suite.
- result = engine.infer_from_file(str(input_path), inference_config)
src/oumi/core/inference/base_inference_engine.py:309
- The computation of the inference_hash in _get_scratch_filepath depends on _dataset_hash; ensure that _dataset_hash is correctly computed in all cases (even when the input conversation list is empty) to avoid potential hash collisions or unexpected file paths.
return str(Path.cwd() / "tmp" / f"temp_inference_output_{inference_hash}.jsonl")
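To make the concern concrete, here is a hedged sketch of how such a path could be derived, including the empty-input case; the helper bodies are assumptions for illustration (only the returned path format comes from the line quoted above):

```python
import hashlib
from pathlib import Path


def _dataset_hash(conversations: list) -> str:
    """Hash the inference inputs; an empty list still produces a stable,
    well-defined digest rather than an undefined value."""
    hasher = hashlib.sha256()
    for conversation in conversations:
        hasher.update(repr(conversation).encode("utf-8"))
    return hasher.hexdigest()[:16]


def _get_scratch_filepath(conversations: list) -> str:
    """Build a deterministic scratch path from the dataset hash."""
    inference_hash = _dataset_hash(conversations)
    return str(Path.cwd() / "tmp" / f"temp_inference_output_{inference_hash}.jsonl")


print(_get_scratch_filepath([]))      # stable path even for an empty input list
print(_get_scratch_filepath(["hi"]))  # different inputs -> different path
```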
Co-authored-by: Copilot <[email protected]>
Could you confirm that this was tested? Ideally both a local and remote inference engine are tested.
Can confirm with a local engine by manually throwing an exception mid-run.
Description
Inference jobs already write results to a temporary file as they run, but they should also resume from that file when the job is restarted.
This PR loads the scratch file before inference, skips any inputs that have already been completed, runs inference on the remaining inputs, reads all results back from the scratch file, cleans up the scratch file, and returns.
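A minimal sketch of that flow, assuming a JSONL scratch file keyed by deterministic conversation IDs; `infer_with_resume`, the record layout, and the placeholder model call are illustrative assumptions rather than the engine's actual code:

```python
import json
from pathlib import Path


def infer_with_resume(inputs: dict[str, str], scratch_path: Path) -> list[dict]:
    """Run inference over `inputs` (id -> prompt), resuming from `scratch_path`."""
    # 1. Load any results left behind by a previous, interrupted run.
    completed: dict[str, dict] = {}
    if scratch_path.exists():
        with scratch_path.open() as f:
            completed = {rec["id"]: rec for rec in map(json.loads, f)}

    # 2. Run inference only for inputs not already in the scratch file,
    #    appending each result as soon as it is produced.
    scratch_path.parent.mkdir(parents=True, exist_ok=True)
    with scratch_path.open("a") as f:
        for conv_id, prompt in inputs.items():
            if conv_id in completed:
                continue  # Skip work finished before the restart.
            result = {"id": conv_id, "output": f"response to {prompt!r}"}  # placeholder model call
            f.write(json.dumps(result) + "\n")
            f.flush()  # Keep partial progress durable for future resumes.

    # 3. Read all results (old + new) back, clean up the scratch file, and return.
    with scratch_path.open() as f:
        results = [json.loads(line) for line in f]
    scratch_path.unlink()
    return results
```

Restarting after a failure simply calls the same function again; only the IDs missing from the scratch file are recomputed.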
Related issues
Towards OPE-1307
Fixes OPE-1257
Before submitting
Reviewers
At least one review from a member of oumi-ai/oumi-staff is required.