Merge 2.4.6 to 2.5.0 (Fix checkpoint loading update)#4475

Merged
sovrasov merged 13 commits into open-edge-platform:release/2.5 from kprokofi:kp/merge_2.4
Jul 23, 2025

Conversation

@kprokofi
Contributor

@kprokofi kprokofi commented Jul 21, 2025

Summary

  • Merge changes from 2.4.6
  • Update checkpoint loading fix

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have run e2e tests and there are no issues.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@github-actions github-actions bot added the TEST (Any changes in tests), BUILD, and DOC (Improvements or additions to documentation) labels Jul 21, 2025
@kprokofi kprokofi changed the base branch from develop to release/2.5 July 21, 2025 14:27
@kprokofi kprokofi marked this pull request as ready for review July 21, 2025 21:47
@kprokofi kprokofi modified the milestones: 2.4.5, 2.5.0 Jul 21, 2025
@eugene123tw
Contributor

Thanks @kprokofi! Great work on handling backward checkpoint compatibility! 👍 To make this even more robust, could you consider refactoring mock_modules_for_chkpt() to a context manager like legacy_otx_compatibility_context()? Happy to help if you need any help on the implementation.
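
For illustration, the context-manager shape suggested in this comment could look roughly like the sketch below. It is not the code from this PR: the name is borrowed from the comment, while the `module_names` parameter and the placeholder-module approach are assumptions made for the example. The idea is that placeholder modules are registered only for the duration of the checkpoint load and are always removed afterwards, even if loading raises.

```python
import sys
import types
from contextlib import contextmanager
from typing import Iterator, Sequence


@contextmanager
def legacy_otx_compatibility_context(module_names: Sequence[str]) -> Iterator[None]:
    """Temporarily register placeholder modules under legacy import paths.

    Hypothetical sketch: `module_names` is an assumed parameter, not the
    signature used in the repository.
    """
    # Only touch names that are not already importable, so real modules
    # are never replaced or removed.
    added = [name for name in module_names if name not in sys.modules]
    for name in added:
        sys.modules[name] = types.ModuleType(name)
    try:
        yield
    finally:
        # Clean up on both the normal and the exception path.
        for name in added:
            sys.modules.pop(name, None)
```

A caller would wrap only the load step, for example `with legacy_otx_compatibility_context(["hypothetical_legacy_pkg.models"]): ...`, so the mocked modules cannot leak into the rest of the process the way a plain setup function without teardown could.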

@kprokofi kprokofi requested a review from eugene123tw July 22, 2025 14:07
Contributor

@eugene123tw eugene123tw left a comment

Thanks Kirill!

@kprokofi kprokofi requested a review from sovrasov July 23, 2025 11:26
@sovrasov sovrasov merged commit ac266a2 into open-edge-platform:release/2.5 Jul 23, 2025
14 checks passed
@eugene123tw eugene123tw linked an issue Jul 24, 2025 that may be closed by this pull request
kprokofi added a commit that referenced this pull request Aug 19, 2025
* Merge develop to release/2.5 (#4432)

* Update demo requirements (#4421)

Fix demo requirements

* Cleanup Geti task templates for anomaly task (#4420)

* Remove sub task templates for anomaly
* Move anomaly classification templates one level up
* Update model_template_id for PADIM and STFPM anomaly templates

* Restore Engine (#4430)

Restore engine.py

Signed-off-by: Ashwin Vaidya <[email protected]>

---------

Signed-off-by: Ashwin Vaidya <[email protected]>
Co-authored-by: Vladislav Sovrasov <[email protected]>
Co-authored-by: Rajesh Gangireddy <[email protected]>
Co-authored-by: Ashwin Vaidya <[email protected]>

* Support OVAnomaly in OVEngine (#4436)

* fix anomaly model
* almost refactored
* refactor AnomalyOV

* Add MaskRCNN v2 Rotated Detection task via Instance Segmentation (#4437)

* ✨ Add Rotated MaskRCNN v2 model implementation and configuration files
* fix: ensure newline at end of file in rotated_det.py
* fix: reorder imports and improve error message in convert_masks_to_rotated_predictions
* Update src/otx/backend/native/models/instance_segmentation/rotated_det.py
Co-authored-by: Ashwin Vaidya <[email protected]>
---------
Co-authored-by: Ashwin Vaidya <[email protected]>

* Benchmark Refactor for 2.5 (#4435)

* Refactor benchmark criteria in performance tests to remove redundant metrics and add GPU memory tracking

* Refactor OVEngine logging and streamline benchmark task handling

* Refactor dataset info entries to remove unnecessary extra_overrides in performance benchmark tests

* Remove performance benchmark tests for anomaly detection, classification, instance segmentation, keypoint detection, semantic segmentation, and tiling instance segmentation. These tests included various model and dataset configurations along with benchmark criteria for performance evaluation.

* Remove performance benchmark workflow configuration

* Refactor benchmark.py to streamline engine initialization and remove unnecessary extra_kwargs handling

* Refactor engine initialization in Benchmark class to return engine directly from configuration

* Fix end time initialization in IterationTimer to ensure proper timing for each phase

* Refactor test assertions in TestIterationTimer to simplify data_time logging checks for batch_idx

* Fix model name in MODEL_TEST_CASES for keypoint detection benchmark

* Fix kp detection metric name

* Update documentation for 2.5 release (#4447)

* update documentation

* change to additional feature

* added edits to the documentation

* delete product design

* change README

* small fix

* Provide XPU workarounds (release/2.5) (#4464)

* Provide workarounds for the XPU training (#4441)

* provide XPU workarounds

* add note section to the installation

* Update __init__.py

* 🐞 Fix 0 image scores in Anomaly OV model (#4469)

Bugfix

Signed-off-by: Ashwin Vaidya <[email protected]>

* Fix regression on release 2.5  (#4468)

Update adaptive early stopping configuration across multiple detection and segmentation recipes

* Improve EarlyStoppingWithWarmup docs and set check_on_train_epoch_end to False as default (#4473)

* Enhance EarlyStoppingWithWarmup functionality and add unit tests
- Set default value for check_on_train_epoch_end to False in EarlyStoppingWithWarmup.

* Fix formatter

* Introduce Classification Factory and Simplify Model Imports (#4456)

* add factory for classification

* add missing files

* minor

* fix imports

* fix imports in tests 2

* fix ruff

* fix unit test

* update factory. Reply comments

* add literal to other backbones

* 🐞 Benchmark fixes for 2.5 (#4471)

Bug fixes

- Max epochs in train overrides the max_epochs value loaded from config when creating the engine
- Other fixes for benchmarking script

Signed-off-by: Ashwin Vaidya <[email protected]>

* Merge 2.4.6 to 2.5.0 (Fix checkpoint loading update) (#4475)

* merge changes

* fix linter

* fix readme

* update modules mock

* fix unit test

* fix tox

* create context manager

* add snapshot for anomaly

* add hlabel snapshot test

* minor fix

* fix changelog

* fix linter

* Update ConfigConverter for Geti2.12 (#4477)

* add factory for classification

* add missing files

* minor

* fix imports

* fix imports in tests 2

* fix ruff

* fix unit test

* fix paths

* change converter

* add configurable augmentation and input size

* temporary fix

* update ConfigConverter:

* fix linter

* update unit test for ConfigConverter

* change integration tests

* add missing file

* fix unit test

* delete templates

* update changelog

* update recipe

* fix linter

* return templates back

* (release/2.5) Remove duplicate explain() method and consolidate XAI functionality into predict() (#4493)

* Refactor XAI utilities and remove deprecated explain method

* Fix XPU training and optimization from Geti2.5 (#4486)

* apply fix to run xpu, change from_config

* fix typing

* add example

* fix xai test

* fix linter

* fix auto batch size for XPU

* return max_epochs for atss

* add kwargs override for OTXEngine.from_config()

* use cache instead

* return train kwargs back

* minor fixes

* reply comments

* Fix overriding train parameters (#4496)

* apply param overrides

* add additional kwargs to cache

* fix unit test

* add test for overriding epochs

* add test for overriding epochs

* Fix adaptive batch size to run on CPU (#4499)

* add warning instead of raising error

* fix unit test

* Fix UFLow configuration (#4504)

add callbacks for uflow

* reimplement Gaussian noise

* Fix confidence threshold cache invalidation and filtering logic (#4498)

* Refactor confidence threshold handling in detection and instance segmentation models

* adding stage parameter to model methods for validation and testing

* Refactor metric computation in OTX models by removing stage parameter and consolidating test step logic

* fix inst-seg _filter_outputs_by_threshold

* Remove best_confidence_threshold_list from checkpoint during save and add unit tests for detection model confidence threshold logic.

* Fix format

* Enhance unit tests for detection threshold logic to ensure compatibility with Python 3.10

* Enhance unit tests for detection threshold logic to ensure compatibility with Python 3.10

* Fix tests

* Fix format

* fix tests

* update unit test

* Removing best_confidence_threshold_list and updating related unit tests for checkpoint functionality.

* Refactor checkpoint saving in OTXModel to remove unnecessary line and update comments in OTXDetectionModel for clarity on best_confidence_threshold usage.

* add RandomGaussianBlur aug

* minor fix

* fix unit tests

* reply comments

* provide workaround for XPU batch search

* return back parameters for MaskRCNN

* fix unit test

* Fix semantic segmentation annotation handling for ExtractedMask type (#4511)

* Fix tiling when polygons are given

* Fix gaussian noise augmentation and add random gaussian blur (#4508)

* reimplement Gaussian noise

* add RandomGaussianBlur aug

* minor fix

* fix unit tests

* reply comments

* Filter invalid annotation by task (#4515)

* Add task parameter to pre-filtering and enhance annotation validation logic

* fix unit test

* Workaround for batch size search on xpu devices (#4513)

* provide workaround for XPU batch search

* return back parameters for MaskRCNN

* fix unit test

* switch off adaptive_bs by default

* fix linter

* Fix cache args (#4522)

* reimplement Gaussian noise

* add RandomGaussianBlur aug

* minor fix

* fix unit tests

* reply comments

* provide workaround for XPU batch search

* return back parameters for MaskRCNN

* fix unit test

* fix train args

* fix unit tests

* add tiling arrow

* fix deim recipe

* fix test_xai

* try self hosted

* try pre-commit on Ubuntu

* try to bypass unit tests

* add installing build tools

* remove sudo

* fix integration tests

* return workflow back

* fix pre-commit

---------

Signed-off-by: Ashwin Vaidya <[email protected]>
Signed-off-by: Ashwin Vaidya <[email protected]>
Co-authored-by: Vladislav Sovrasov <[email protected]>
Co-authored-by: Rajesh Gangireddy <[email protected]>
Co-authored-by: Ashwin Vaidya <[email protected]>
Co-authored-by: Eugene Liu <[email protected]>
Co-authored-by: Ashwin Vaidya <[email protected]>
kprokofi added a commit that referenced this pull request Feb 23, 2026

Labels

BUILD, DOC (Improvements or additions to documentation), TEST (Any changes in tests)

Development

Successfully merging this pull request may close these issues.

Implement Model Checkpoint Compatibility Testing

3 participants