Merge 2.4.6 to 2.5.0 (Fix checkpoint loading update) by kprokofi · Pull Request #4475 · open-edge-platform/training_extensions

kprokofi · 2025-07-21T14:19:12Z

Summary

Merge changes from 2.4.6
Update checkpoint loading fix

How to test

Checklist

I have added unit tests to cover my changes.
I have added integration tests to cover my changes.
I have run e2e tests and there are no issues.
I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
I have linked related issues.

License

I submit my code changes under the same Apache License that covers the project.
Feel free to contact the maintainers if that's a concern.
I have updated the license header for each file (see an example below).

# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

src/otx/backend/native/models/instance_segmentation/base.py

src/otx/backend/native/utils/utils.py

eugene123tw · 2025-07-22T10:14:51Z

Thanks @kprokofi! Great work on handling backward checkpoint compatibility! 👍 To make this even more robust, could you consider refactoring mock_modules_for_chkpt() to a context manager like legacy_otx_compatibility_context()? Happy to help if you need any help on the implementation.
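For readers unfamiliar with the pattern suggested in this comment, here is a minimal sketch of a context manager that registers placeholder modules only while a checkpoint is being loaded and then restores `sys.modules`. The helper name `temporary_module_placeholders` and the `legacy_pkg_demo` module paths are hypothetical and chosen for illustration; this is not the code that was merged in this PR.

```python
import importlib
import sys
import types
from contextlib import contextmanager
from typing import Iterator

_MISSING = object()


@contextmanager
def temporary_module_placeholders(*module_names: str) -> Iterator[None]:
    """Register empty placeholder modules for the duration of a ``with`` block.

    Hypothetical helper for illustration only; it is not the OTX implementation.
    """
    # Remember the previous state of every name so it can be restored exactly.
    saved = {name: sys.modules.get(name, _MISSING) for name in module_names}
    try:
        for name in module_names:
            # Only fill in names that are not already importable.
            if saved[name] is _MISSING:
                sys.modules[name] = types.ModuleType(name)
        yield
    finally:
        # Runs even if loading fails, so placeholders never leak out.
        for name, previous in saved.items():
            if previous is _MISSING:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = previous


# Usage with made-up legacy module paths.
with temporary_module_placeholders("legacy_pkg_demo", "legacy_pkg_demo.models"):
    placeholder = importlib.import_module("legacy_pkg_demo.models")
    print(placeholder.__name__)

print("legacy_pkg_demo" in sys.modules)
```

Compared with a plain function that patches `sys.modules` permanently, the `try`/`finally` keeps the mocks scoped to the load call. A real checkpoint loader would probably also need to attach the expected classes to the placeholder modules, since unpickling looks classes up by module path; that part is left out of this sketch.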

eugene123tw

Thanks Kirill!

CHANGELOG.md

* Merge develop to release/2.5 (#4432)
* Update demo requirements (#4421) Fix demo requirements
* Cleanup Geti task templates for anomaly task (#4420)
* Remove sub task templates for anomaly
* Move anomaly classification templates one level up
* Update model_template_id for PADIM and STFPM anomaly templates
* Restore Engine (#4430) Restore engine.py
Signed-off-by: Ashwin Vaidya <[email protected]>
---------
Signed-off-by: Ashwin Vaidya <[email protected]>
Co-authored-by: Vladislav Sovrasov <[email protected]>
Co-authored-by: Rajesh Gangireddy <[email protected]>
Co-authored-by: Ashwin Vaidya <[email protected]>
* Support OVAnomaly in OVEngine (#4436)
* fix anomaly model
* almost refactored
* refactor AnomalyOV
* Add MaskRCNN v2 Rotated Detection task via Instance Segmentation (#4437)
* ✨ Add Rotated MaskRCNN v2 model implementation and configuration files
* fix: ensure newline at end of file in rotated_det.py
* fix: reorder imports and improve error message in convert_masks_to_rotated_predictions
* Update src/otx/backend/native/models/instance_segmentation/rotated_det.py
Co-authored-by: Ashwin Vaidya <[email protected]>
---------
Co-authored-by: Ashwin Vaidya <[email protected]>
* Benchmark Refactor for 2.5 (#4435)
* Refactor benchmark criteria in performance tests to remove redundant metrics and add GPU memory tracking
* Refactor OVEngine logging and streamline benchmark task handling
* Refactor dataset info entries to remove unnecessary extra_overrides in performance benchmark tests
* Remove performance benchmark tests for anomaly detection, classification, instance segmentation, keypoint detection, semantic segmentation, and tiling instance segmentation. These tests included various model and dataset configurations along with benchmark criteria for performance evaluation.
* Remove performance benchmark workflow configuration
* Refactor benchmark.py to streamline engine initialization and remove unnecessary extra_kwargs handling
* Refactor engine initialization in Benchmark class to return engine directly from configuration
* Fix end time initialization in IterationTimer to ensure proper timing for each phase
* Refactor test assertions in TestIterationTimer to simplify data_time logging checks for batch_idx
* Fix model name in MODEL_TEST_CASES for keypoint detection benchmark
* Fix kp detection metric name
* Update documentation for 2.5 release (#4447)
* update documentation
* change to additional feature
* added edits to the documentation
* delete product design
* change README
* small fix
* Provide XPU workarounds (release/2.5) (#4464)
* Provide workarounds for the XPU training (#4441)
* provide XPU workarounds
* add note section to the installation
* Update __init__.py
* 🐞 Fix 0 image scores in Anomaly OV model (#4469) Bugfix
Signed-off-by: Ashwin Vaidya <[email protected]>
* Fix regression on release 2.5 (#4468) Update adaptive early stopping configuration across multiple detection and segmentation recipes
* Improve EarlyStoppingWithWarmup docs and set check_on_train_epoch_end to False as default (#4473)
* Enhance EarlyStoppingWithWarmup functionality and add unit tests - Set default value for check_on_train_epoch_end to False in EarlyStoppingWithWarmup.
* Fix formatter
* Introduce Classification Factory and Simplify Model Imports (#4456)
* add factory for classficaiton
* add mising files
* minor
* fix imports
* fix imports in tests 2
* fix ruff
* fix unit test
* update factory. Reply comments
* add literal to other backbones
* 🐞 Benchmark fixes for 2.5 (#4471) Bug fixes - Max epochs in train overrides the max_epochs value loaded from config when creating the engine - Other fixes for benchmarking script
Signed-off-by: Ashwin Vaidya <[email protected]>
* Merge 2.4.6 to 2.5.0 (Fix checkpoint loading update) (#4475)
* merge changes
* fix linter
* fix readme
* update modules mock
* fix unit test
* fix tox
* create context manager
* add snapshot for anomaly
* add hlabel snapshot test
* minor fix
* fix changelog
* fix linter
* Update ConfigConverter for Geti2.12 (#4477)
* add factory for classficaiton
* add mising files
* minor
* fix imports
* fix imports in tests 2
* fix ruff
* fix unit test
* fix paths
* change converter
* add configurable augmentation and input size
* temporary fix
* update ConfigConverter:
* fix linter
* update unit test for ConfigConverter
* change integration tests
* add missing file
* fix unit test
* delete templates
* update changelog
* update recipe
* fix linter
* return templates back
* (release/2.5) Remove duplicate explain() method and consolidate XAI functionality into predict() (#4493)
* Refactor XAI utilities and remove deprecated explain method
* Fix XPU training and optimization from Geti2.5 (#4486)
* apply fix to run xpu, change from_config
* fix typing
* add example
* fix xai test
* fix linte
* fix auto batch size for XPU
* return max_epochs for atss
* add kwargs override for OTXEngine.from_config()
* use cache instead
* return train kwargs back
* minor fixes
* reply comments
* Fix overriding train parameters (#4496)
* apply param overrides
* add additional kwargs to cache
* fix unit test
* add test for overriding epochs
* add test for overriding epochs
* Fix adaptive batch size to run on CPU (#4499)
* add warning instead of raising error
* fix unit test
* Fix UFLow configuration (#4504) add callbacks for uflow
* reimplement Gaussian noise
* Fix confidence threshold cache invalidation and filtering logic (#4498)
* Refactor confidence threshold handling in detection and instance segmentation models
* adding stage parameter to model methods for validation and testing
* Refactor metric computation in OTX models by removing stage parameter and consolidating test step logic
* fix inst-seg _filter_outputs_by_threshold
* Remove best_confidence_threshold_list from checkpoint during save and add unit tests for detection model confidence threshold logic.
* Fix format
* Enhance unit tests for detection threshold logic to ensure compatibility with Python 3.10
* Enhance unit tests for detection threshold logic to ensure compatibility with Python 3.10
* Fix tests
* Fix format
* fix tests
* update unit test
* Removing best_confidence_threshold_list and updating related unit tests for checkpoint functionality.
* Refactor checkpoint saving in OTXModel to remove unnecessary line and update comments in OTXDetectionModel for clarity on best_confidence_threshold usage.
* add RandomGaussianBlur aug
* minor fix
* fix unit tests
* reply comments
* provide workaround for XPU batch search
* return back parameters for MaskRCNN
* fix unit test
* Fix semantic segmentation annotation handling for ExtractedMask type (#4511)
* Fix tiling when polygons are given
* Fix gaussian noise augmentation and add random gaussian blur (#4508)
* reimplement Gaussian noise
* add RandomGaussianBlur aug
* minor fix
* fix unit tests
* reply comments
* Filter invalid annotation by task (#4515)
* Add task parameter to pre-filtering and enhance annotation validation logic
* fix unit test
* Workaround for batch size search on xpu devices (#4513)
* provide workaround for XPU batch search
* return back parameters for MaskRCNN
* fix unit test
* switch off adaprive_bs by default
* fix linter
* Fix cache args (#4522)
* reimplement Gaussian noise
* add RandomGaussianBlur aug
* minor fix
* fix unit tests
* reply comments
* provide workaround for XPU batch search
* return back parameters for MaskRCNN
* fix unit test
* fix train args
* fix unit tests
* add tiling arrow
* fix deim recipe
* fix test_xai
* try self hosted
* try pre-commit on Ubuntu
* try to bypass unit tests
* add installing build tools
* remove sudo
* fix integration tests
* return workflow back
* fix pre-commit
---------
Signed-off-by: Ashwin Vaidya <[email protected]>
Signed-off-by: Ashwin Vaidya <[email protected]>
Co-authored-by: Vladislav Sovrasov <[email protected]>
Co-authored-by: Rajesh Gangireddy <[email protected]>
Co-authored-by: Ashwin Vaidya <[email protected]>
Co-authored-by: Eugene Liu <[email protected]>
Co-authored-by: Ashwin Vaidya <[email protected]>

merge changes

64022a5

github-actions bot added the TEST (Any changes in tests), BUILD, and DOC (Improvements or additions to documentation) labels Jul 21, 2025

fix linter

6301db5

kprokofi changed the base branch from develop to release/2.5 July 21, 2025 14:27

kprokofi added 3 commits July 21, 2025 23:27

fix readme

38adcd4

update modules mock

efce3dd

fix unit test

829ce27

kprokofi marked this pull request as ready for review July 21, 2025 21:47

kprokofi requested review from Daankrol, ashwinvaidya17, eugene123tw, rajeshgangireddy, samet-akcay and sovrasov as code owners July 21, 2025 21:47

kprokofi modified the milestones: 2.4.5, 2.5.0 Jul 21, 2025

fix tox

679c01c

eugene123tw reviewed Jul 22, 2025

src/otx/backend/native/models/instance_segmentation/base.py (resolved)

src/otx/backend/native/utils/utils.py (outdated, resolved)

kprokofi added 3 commits July 22, 2025 22:57

create context manager

9bf88ce

add snapshot for anomaly

de59f25

add hlabel snapshot test

54b31b1

kprokofi requested a review from eugene123tw July 22, 2025 14:07

kprokofi added 2 commits July 22, 2025 23:22

minor fix

b4e4429

merge release/2.5

baf189a

eugene123tw approved these changes Jul 22, 2025

sovrasov reviewed Jul 23, 2025

CHANGELOG.md (resolved)

fix changelog

43786dd

kprokofi requested a review from sovrasov July 23, 2025 11:26

fix linter

9f74d6b

sovrasov approved these changes Jul 23, 2025

sovrasov merged commit ac266a2 into open-edge-platform:release/2.5 Jul 23, 2025
14 checks passed

eugene123tw linked an issue Jul 24, 2025 that may be closed by this pull request

Implement Model Checkpoint Compatibility Testing #4460

Closed

sovrasov merged 13 commits into open-edge-platform:release/2.5 from kprokofi:kp/merge_2.4

3 participants
