
Update inference to resume from temporary result file when possible #1734


Merged: 27 commits into main from jgreer013/inference-start-from-temp, Jun 20, 2025

Conversation

@jgreer013 (Contributor) commented on Jun 5, 2025

Description

Inference jobs write results to a temporary file as they run, but we should also resume from those files when restarting a job.

This PR loads the scratch file before inference, skips any inputs that have already completed, runs inference on the remaining inputs, reads all results back from the scratch file, cleans up the scratch file, and returns.
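At a high level the flow resembles the sketch below. The names (infer_with_resume, run_inference, conversation_id) are hypothetical stand-ins, not oumi's actual API; the real logic lives in base_inference_engine.py, and the scratch file is JSONL, as the temp_inference_output_*.jsonl path later in this PR suggests.

```python
import json
from pathlib import Path


def infer_with_resume(conversations, run_inference, scratch_path: Path):
    """Resume-aware inference driver (hypothetical names, not oumi's API).

    `run_inference(remaining, scratch_path)` is expected to append one JSON
    line per finished conversation, keyed by a stable "conversation_id".
    """
    # 1. Load any results left behind by a previous, interrupted run.
    completed = set()
    if scratch_path.exists():
        with scratch_path.open() as f:
            for line in f:
                completed.add(json.loads(line)["conversation_id"])

    # 2. Skip inputs whose results are already in the scratch file.
    remaining = [c for c in conversations if c["conversation_id"] not in completed]
    if remaining:
        run_inference(remaining, scratch_path)

    # 3. Read back all results (old and new), clean up, return in input order.
    results = {}
    if scratch_path.exists():
        with scratch_path.open() as f:
            for line in f:
                record = json.loads(line)
                results[record["conversation_id"]] = record
        scratch_path.unlink()
    return [results[c["conversation_id"]] for c in conversations]
```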

Related issues

Towards OPE-1307
Fixes OPE-1257

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline (Pull Request guidelines)?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

@jgreer013 requested review from oelachqar, taenin and Copilot on June 9, 2025 16:36

@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR introduces a resume mechanism for inference jobs by leveraging temporary scratch files, reducing redundant computations after failures. Key changes include refactoring tests to use the unified infer method, adding scratch file handling in multiple inference engines, and integrating resume logic in the base inference engine.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

  • tests/unit/inference/test_vllm_inference_engine.py: Removed the direct infer_from_file call in favor of the unified infer method for resumed inference
  • tests/unit/inference/test_remote_inference_engine.py: Updated the inference invocation to reflect the new resume mechanism
  • tests/unit/inference/test_generation_params.py: Added mocks for tokenizer.batch_decode to support the new generation behavior
  • tests/unit/inference/test_base_inference_engine.py: Added tests validating scratch file creation, resume logic, and cleanup behavior
  • tests/integration/infer/test_infer.py: Adjusted the conversation comparison and removed a redundant GPU marker
  • src/oumi/inference/{vllm, native_text, llama_cpp}_inference_engine.py: Removed direct cleanup calls to centralize scratch file handling in the base engine
  • src/oumi/core/inference/base_inference_engine.py: Integrated scratch file loading, filtering of completed conversations, and deterministic conversation ID generation (see the sketch below)
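One way to make conversation IDs deterministic, as the base engine's changes require, is to hash the conversation content so a restarted job maps its inputs back to the same IDs. A minimal sketch with hypothetical names, not the PR's exact code:

```python
import hashlib
import json


def deterministic_conversation_id(conversation: dict) -> str:
    """Derive a stable ID from conversation content, so a restarted job can
    match its inputs against results written by an earlier attempt."""
    canonical = json.dumps(conversation, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```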
Comments suppressed due to low confidence (2)

tests/unit/inference/test_vllm_inference_engine.py:429

  • The direct call to infer_from_file was removed; ensure that the unified infer method adequately tests the resume-from-scratch behavior and that all expected outcomes are validated elsewhere in the suite.
-        result = engine.infer_from_file(str(input_path), inference_config)

src/oumi/core/inference/base_inference_engine.py:309

  • The computation of the inference_hash in _get_scratch_filepath depends on _dataset_hash; ensure that _dataset_hash is correctly computed in all cases (even when the input conversation list is empty) to avoid potential hash collisions or unexpected file paths.
return str(Path.cwd() / "tmp" / f"temp_inference_output_{inference_hash}.jsonl")
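A minimal sketch of how the dataset hash could stay well-defined for an empty input list, addressing the concern above (hypothetical helper, not the PR's code):

```python
import hashlib


def dataset_hash(conversation_ids: list[str]) -> str:
    """Hash the ordered list of conversation IDs; an explicit sentinel keeps
    the result well-defined even when the input list is empty."""
    if not conversation_ids:
        return hashlib.sha256(b"<empty-dataset>").hexdigest()
    digest = hashlib.sha256()
    for cid in conversation_ids:
        digest.update(cid.encode("utf-8"))
        digest.update(b"\x00")  # separator prevents ambiguity across IDs
    return digest.hexdigest()
```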


@wizeng23 (Contributor) left a comment

Could you confirm that this was tested? Ideally both a local and remote inference engine are tested.

@jgreer013 (Contributor, Author)

> Could you confirm that this was tested? Ideally both a local and remote inference engine are tested.

Confirmed with the local engine by manually throwing an exception mid-run:

[base_inference_engine.py:158] Found 2 completed conversations. Processing remaining 0 conversations.
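For illustration, the same failure-and-resume behavior can be reproduced against the infer_with_resume sketch above by injecting a crash partway through (again hypothetical, not the project's actual test):

```python
import json
from pathlib import Path


def crashing_runner(fail_after: int):
    """Build a runner that writes `fail_after` results, then raises,
    simulating a crash partway through an inference job."""
    def run(remaining, scratch_path: Path):
        with scratch_path.open("a") as f:
            for i, conv in enumerate(remaining):
                if i == fail_after:
                    raise RuntimeError("simulated crash")
                f.write(json.dumps(
                    {"conversation_id": conv["conversation_id"], "output": "done"}
                ) + "\n")
    return run


conversations = [{"conversation_id": "a"}, {"conversation_id": "b"}]
scratch = Path("scratch.jsonl")
try:
    # First attempt: writes the result for "a", then crashes.
    infer_with_resume(conversations, crashing_runner(fail_after=1), scratch)
except RuntimeError:
    pass
# Retry: "a" is found in the scratch file and skipped; only "b" runs.
results = infer_with_resume(conversations, crashing_runner(fail_after=10), scratch)
assert [r["conversation_id"] for r in results] == ["a", "b"]
```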

@jgreer013 merged commit 15a6c68 into main on Jun 20, 2025
5 checks passed
@jgreer013 deleted the jgreer013/inference-start-from-temp branch on June 20, 2025 16:25