Changes for OML 3.0 #557

AlekseySh · 2024-05-10T15:50:55Z

Checklist:

Check Pipelines:

inshop (0.921)
SOP (0.866)
CARS (0.908)
CUB (0.837)
FNMR@FMR works without OOM on SOP
Training progress on SOP looks okay (0.8480 -> 0.8585 -> 0.8587 -> 0.8602)
Postprocessor validation in InShop: 0.948
Training progress of postprocessor on InShop looks okay (as before: Made inference modality agnostic in re-ranking and other parts of the repo #542 (comment))
All expected train and val values have been logged, including images

Misc

all disabled tests are turned on
code examples reworked
colab is updated
all configs in pipelines are updated
check all sort and topk functions in the code that rebuild predictions. They need to be moved to RetrievalResults.
CI on oml_3.0_release branch is turned off

**Changelog**

All the functions and classes on the right side are modality agnostic:

* `EmbeddingPairsDataset`, `ImagePairsDataset` -> `PairDataset`
* `pairwise_inference_on_images`, `pairwise_inference_on_embeddings` -> `pairwise_inference`
* `IDistancesPostprocessor` -> (mostly renamed) -> `IRetrievalPostprocessor`
* `PairwisePostprocessor`, `PairwiseEmbeddingsPostprocessor`, `PairwiseImagesPostprocessor` -> `PairwiseReranker`
* `inference_on_images` -> `inference`
* `inference_on_dataframe` -> `inference_cached`

Also:

* `EmbeddingMetrics` takes optional `dataset` argument in order to perform postprocessing.
* Made postprocessing tests a bit more informative via making dummy models a bit less trivial (added bias to their outputs)

Examples changed:

* `train + val` and `prediction` for postprocessor
* `retrieval usage`
* added `global_paths` parameter to `download_mock_dataset` so it looks nicer
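The renames listed in this changelog can be collected into a lookup table to help find outdated names in user code. The snippet below is only a sketch: the mapping is copied from the list above, while the helper `find_outdated_names` is hypothetical and not part of OML.

```python
import re
from typing import List, Tuple

# Old name -> new name, copied from the changelog above.
RENAMES = {
    "EmbeddingPairsDataset": "PairDataset",
    "ImagePairsDataset": "PairDataset",
    "pairwise_inference_on_images": "pairwise_inference",
    "pairwise_inference_on_embeddings": "pairwise_inference",
    "IDistancesPostprocessor": "IRetrievalPostprocessor",
    "PairwisePostprocessor": "PairwiseReranker",
    "PairwiseEmbeddingsPostprocessor": "PairwiseReranker",
    "PairwiseImagesPostprocessor": "PairwiseReranker",
    "inference_on_images": "inference",
    "inference_on_dataframe": "inference_cached",
}

# Match whole identifiers only, so `inference_on_images` is not reported
# inside `pairwise_inference_on_images`.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def find_outdated_names(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, old name, new name) for every outdated name found."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((line_no, old, RENAMES[old]))
    return hits
```

Since several old names map to one new name, the table only works in the old-to-new direction. The replacement itself is better done by hand, because the new functions are modality agnostic and their arguments may differ.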

…keys. Changed signature of EmbeddingMetrics.

**CHANGELOG**

* Removed keys: `IS_QUERY_KEY`, `IS_GALLERY_KEY`, `CATEGORIES_KEY`, `PATHS_KEY`, `X1_KEY`, `X2_KEY`, `Y1_KEY`, `Y2_KEY`, `SEQUENCE_KEY`. Categories and sequences are passed through `extra_data` instead. The rest is encapsulated in Dataset.
* Removed `IMetricDDP`, `EmbeddingMetricsDDP`. Reason: having `EmbeddingMetrics` is enough, because we do accumulator sync there anyway.
* Changed signatures of `EmbeddingMetrics`: keys are replaced by providing a dataset, removed `.sync()` and `.visualisation()` methods and so on.
* Updated `.md` examples and `.rst` docs

Minor:

* Removed `calc_distance_matrix`, `validate_dataset` -- this logic happens in `RetrievalResults`. Also removed `find_first_occurrences` -- we have `unique_by_ids` instead.
* Removed `DummyDataset` in tests (used `EmbeddingsQueryGalleryLabeledDataset` instead).

* `MetricValCallback` is enough to handle DDP
* `samples_in_getitem` is not used

Removed visualization.ipynb

@leoromanovich

**CHANGELOG**

* `RetrievalResults` uses a Sequence of Tensors which may have different sizes. In other words, it allows us to support the case when queries have different numbers of retrieved items.
* Consequently, changed `batched_knn`, `retrieval_metrics` and `PairwiseReranker` to support the new input type.
* Added asserts to `RetrievalResults` that distances arrive sorted and retrieved ids are unique, plus other checks.

New tests:

* Added tests on corner cases for `RetrievalResults` creation.
* Added tests on visualization when queries in `RetrievalResults` have different numbers of retrieved items.
* Added a new test with predefined values for `batched_knn` to make debugging easier.
* Changed existing postprocessor tests: used `sequence` in datasets so queries have different numbers of retrieved items and we actually test the new functionality.

@leoromanovich and I also checked that using a Sequence of Tensors doesn't lead to poor performance on validation.
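To illustrate the kind of checks described here (sorted distances, unique retrieved ids, a different number of items per query), below is a minimal sketch. It assumes plain Python lists instead of tensors, and the function `check_retrieval_results` is hypothetical; it is not the actual validation code of `RetrievalResults`.

```python
from typing import Sequence


def check_retrieval_results(
    distances: Sequence[Sequence[float]],
    retrieved_ids: Sequence[Sequence[int]],
) -> None:
    """Validate per-query results; queries may have different lengths."""
    if len(distances) != len(retrieved_ids):
        raise ValueError("Number of queries differs between distances and ids.")

    for q, (dists, ids) in enumerate(zip(distances, retrieved_ids)):
        if len(dists) != len(ids):
            raise ValueError(f"Query {q}: got {len(dists)} distances for {len(ids)} ids.")
        # Distances must arrive in non-decreasing order.
        if any(a > b for a, b in zip(dists, dists[1:])):
            raise ValueError(f"Query {q}: distances are not sorted.")
        # The same gallery item must not be retrieved twice.
        if len(set(ids)) != len(ids):
            raise ValueError(f"Query {q}: retrieved ids are not unique.")


# The first query has three retrieved items, the second has only one.
check_retrieval_results(
    distances=[[0.1, 0.2, 0.5], [0.3]],
    retrieved_ids=[[4, 2, 7], [1]],
)
```

Storing one sequence per query, rather than one rectangular matrix, is what makes the ragged case representable at all.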

**CHANGELOG**

* Added support for empty predictions to the retrieval metrics. For example, it may be useful when we cut retrieval results by a distance threshold.
* Moved categories handling to functional metrics from the `EmbeddingMetrics` class, also updated the `.md` example to show how to deal with categories.
* Added `calc_fnmr_at_fmr_rr`, removed `extract_pos_neg_dists` and `calc_fnmr_at_fmr_from_matrices`. Returned `fnmr@fmr` to `EmbeddingMetrics` (there was a todo).

TESTS

* Moved tests that use old formats of retrieval metrics to a separate folder: `...test_metrics/test_outdated/...`.
* Added a few new tests on retrieval metrics: test handling categories and empty predictions.
* Added test on `calc_fnmr_at_fmr_rr`.
* Added test that `EmbeddingMetrics` calculates all expected metrics.
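As an illustration of what supporting empty predictions can mean for a retrieval metric, here is a minimal sketch of CMC@k over plain Python lists. It assumes that a query with no retrieved items simply counts as a miss; that convention and the function name `cmc_at_k` are assumptions made for this example, not the behaviour of the OML metrics.

```python
from typing import Sequence


def cmc_at_k(
    retrieved_ids: Sequence[Sequence[int]],
    gt_ids: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Share of queries with at least one correct item among the top k."""
    if len(retrieved_ids) != len(gt_ids):
        raise ValueError("Each query needs both retrieved ids and ground truth ids.")
    if len(retrieved_ids) == 0:
        raise ValueError("At least one query is required.")

    hits = []
    for ids, gts in zip(retrieved_ids, gt_ids):
        # An empty prediction gives an empty top-k, so it is scored as a miss
        # (assumption of this sketch).
        top_k = ids[:k]
        hits.append(1.0 if set(top_k) & set(gts) else 0.0)
    return sum(hits) / len(hits)


# The second query has no retrieved items at all.
score = cmc_at_k(retrieved_ids=[[3, 5], [], [9]], gt_ids=[[5], [1], [2]], k=2)
```

In this example only the first query finds its ground truth item within the top two, so one query out of three is a hit.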

… so on) Moved outdated matrix functions to tests (mask_gt, mask_to_ignore and so on)

**CHANGELOG**

* Simplified handling nans in bboxes
* Improved typings in training pipelines
* Added show argument to `RetrievalResults.visualise()`
* Specified reason for skipping cloud logging tests
* Polished md examples
* Added `mode_for_checkpointing` to Pipeline config

…rning into oml_3.0_release

* Added `TextBaseDataset`, `TextLabeledDataset`, `TextQueryGalleryLabeledDataset`, `TextQueryGalleryDataset`, `get_mock_texts_dataset`, `visualise_text`
* Added `HFWrapper` to wrap models from HuggingFace library
* `download_mock_dataset` -> `get_mock_images_dataset` (the original name is also kept for back compatibility)

================== Docs, Readme and examples for OML 3.0 ====================

General

* Made imports shorter in all examples (updated the corresponding `__init__.py` files and `__all__` variables)
* `train.md`, `val.md` -> `train_val_img_txt.md`
* Joined example of using pre-trained image models, pre-trained HF text models (just added), and zoo table (moved) into one file.
* Removed links to colab notebooks for all examples except for the `train_val_img_txt.md` code snippet.
* Updated dataset format description: added info about `text` column.
* Updated mock dataset of texts so we have more data in train.
* Added handling categories example (train + val)

Renaming:

* `download_mock_dataset` got a short link `get_mock_images_dataset` so it looks similar to `get_mock_texts_dataset`
* `RetrievalResults.compute_from_embeddings(embeddings, dataset, n_items_to_retrieve=5)` -> `RetrievalResults.from_embeddings(embeddings, dataset, n_items=5)`

README:

* Added release notes for OML 3.0
* Added side-by-side example of training and validation text and image models
* Added `OML Features` section
* Zoo section is updated
* Updated FAQ, moved to Documentation section

ReadTheDocs:

* Added new text Datasets to docs
* Added `OML Features` section to the home page
* Moved getters for mock datasets from utils to Datasets page
* Python examples: hide most of the examples under details. Added an example of handling categories.
* Updated the page about logging.
* Split post-processing section into re-ranking by model (the old content of post-processing) and algo post-processing (just a placeholder page for the moment).
* Removed `zoo` section from post-processing by model.

MISC

* Updated `check_retrieval_format` so it works with text dataset.
* Added code example of usage `check_retrieval_format` (+ the corresponding test)
* Made `last_logs` property and added docs for it (for triplet & arcface losses, bank miner)
* Made `distances`, `retrieved_ids`, `gt_ids` documented properties of `RetrievalResults`.

ALGO POST-PROCESSING

* Added `AdaptiveThresholding`, `ConstantThresholding`:
  * classes implementation
  * updated registry and configs
  * readthedocs: contents and algo postprocessing page with a new example
  * pytests and pipelines test
  * added thresholding to a few existing python examples
* Added `is_empty` and `deepcopy` methods to `RetrievalResults`, also updated readthedocs
* Removed `top_n` from `IRetrievalPostprocessor` interface
* Updated code so it can work with the old NN postprocessing and new algorithmic ones. Added todo so we refactor it in future after we have more postprocessors.
* Added drawing test for empty `RetrievalResults`
* Used mock text dataset in categories example instead of the image one (because it has bigger categories)
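The general idea of cutting retrieval results by a constant distance threshold can be sketched as follows. This is a minimal illustration over plain Python lists, assuming a strict `<` comparison; the function `constant_thresholding` is hypothetical and is not the implementation of the `ConstantThresholding` class.

```python
from typing import List, Sequence, Tuple


def constant_thresholding(
    distances: Sequence[Sequence[float]],
    retrieved_ids: Sequence[Sequence[int]],
    th: float,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Keep only the retrieved items whose distance is below `th`."""
    new_distances, new_ids = [], []
    for dists, ids in zip(distances, retrieved_ids):
        # Strict comparison is an assumption of this sketch.
        keep = [j for j, d in enumerate(dists) if d < th]
        new_distances.append([dists[j] for j in keep])
        new_ids.append([ids[j] for j in keep])
    return new_distances, new_ids


dists, ids = constant_thresholding(
    distances=[[0.1, 0.4, 0.9], [0.8]],
    retrieved_ids=[[1, 2, 3], [4]],
    th=0.5,
)
# The second query ends up with no retrieved items.
```

A cut like this naturally produces queries with different numbers of items, including none, which is why the results container and the metrics have to tolerate ragged and empty predictions.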

Misc:

* Fixed categories handling in pcf metric
* Added docs to calculating metrics by rr
* Added verbose argument

… branch

AlekseySh added the rework label May 10, 2024

AlekseySh self-assigned this May 10, 2024

AlekseySh linked an issue May 10, 2024 that may be closed by this pull request

[EPIC] Release OML 3.0 #522

Closed

AlekseySh added refactoring high priority labels May 11, 2024

AlekseySh added 11 commits May 12, 2024 20:32

Removed MetricValCallbackDDP and samples_in_getitem

cd9fc85

* `MetricValCallback` is enough to handle DDP
* `samples_in_getitem` is not used

Removed visualization.ipynb

9cf1815


Moved outdated matrix functions to tests (mask_gt, mask_to_ignore and…

2eb8941

… so on) Moved outdated matrix functions to tests (mask_gt, mask_to_ignore and so on)

upd

221f73f

Added verbose parameter.

9d6c080

Merge branch 'oml_3.0_release' of github.com:OML-Team/open-metric-lea…

99a2a74

…rning into oml_3.0_release

changed examples optimizer to Adam

980d0fe

AlekseySh mentioned this pull request Jun 3, 2024

Make an example of using OML for texts #530

Closed

AlekseySh added 4 commits June 4, 2024 20:22

Last changes for 3.0 release

6c4e20e

Misc:

* Fixed categories handling in pcf metric
* Added docs to calculating metrics by rr
* Added verbose argument

AlekseySh changed the title ~~OML 3.0~~ Changes for OML 3.0 Jun 6, 2024

AlekseySh added 3 commits June 6, 2024 20:26

fixed text visualisation and linters; temporarily turned on full CI on…

18d51ff

… branch

made sorting check more robust

16d5839

updated tolerance when concat distances

3f9177a

AlekseySh merged commit ca114d6 into main Jun 7, 2024
11 checks passed

AlekseySh deleted the oml_3.0_release branch June 7, 2024 12:29
