Collaborator

@iluise iluise commented Jan 29, 2026

Description

Add code for standard inference and evaluation for JEPA, DINOv3, etc.
Usage:

agpu
uv run ssl_analysis --run-id <run id> (optional: --verbose)
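
For reviewers who want to picture the invocation above, here is a minimal sketch of how an entry point with these two options could parse its arguments. It is not the code added in this PR: the use of `argparse`, the function names `build_parser` and `parse_args`, and the help strings are assumptions made for illustration, and `--verbose` is assumed to be a plain on/off flag.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the usage line above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ssl_analysis")
    # Hypothetical help texts; the real CLI may describe these differently.
    parser.add_argument("--run-id", required=True, help="ID of the run to evaluate")
    parser.add_argument("--verbose", action="store_true", help="enable detailed logging")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


# Equivalent to: ssl_analysis --run-id example-run --verbose
args = parse_args(["--run-id", "example-run", "--verbose"])
print(args.run_id, args.verbose)  # prints: example-run True
```

Note that `argparse` exposes the dashed option `--run-id` as the attribute `args.run_id`, so dashed CLI options and underscore-named Python variables coexist without extra mapping.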

Issue Number

Closes #1746

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

iluise and others added 30 commits January 16, 2026 16:56
* rm model_forward assignment in val

* rm clutter from diffusion branch

* reverse if order
* Fix bug with diagnostic streams

* Avoid that empty decoders are allocated
* Doing something wrong

* Make fine-tuning work

* Rename sensibly
* Enable multiple student views for one target

* Improved readability
* add pin mem to IOReaderData

* add pin mem to sample & modelbatch class

* add pin mem to stream data

* add pin mem to training loop

* run /scripts/actions.sh lint

* run ./scripts/actions.sh unit-test

* ignore check torch import in package

* move pinning to MultiStreamDataSampler

* add _pin_tensor & _pin_tensor_list helper func

* ruff the code

* move back pin mem. to train loop

* Remove the ignore-import-error rule and revert to the state before the change

* create protocol for pinnable obj

* remove pin_mem from IOReaderData class

* add pin_memory to Trainer.validate

* remove pin_memory from loader_params

* Revert export/export_inference.py to state before c3fc9a7

* change name

* revise Pinnable class description

* add memory_pinning in config, train & val loop

* use getattr to avoid CICD warning

* use setattr to avoid CICD warning

* disable pylint for self.source_tokens_lens

* Fixed issues with memory pinning due to rebasing and also adjusted config position of flag

* Reverting inadvertent changes

---------

Co-authored-by: Javad Kasravi <[email protected]>
Co-authored-by: Javad kasravi <[email protected]>
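
The memory-pinning commits above mention a protocol for pinnable objects and helpers for pinning a single tensor or a list of tensors. A rough sketch of that pattern, under stated assumptions: the class `Pinnable` and the helpers `pin_if_possible` and `pin_list` are hypothetical names, and `FakeTensor` is a stand-in so the example runs without PyTorch (a real `torch.Tensor` also offers a `pin_memory()` method that returns a pinned copy). The repository's actual implementation may differ.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class Pinnable(Protocol):
    """Anything that can return a page-locked (pinned) copy of itself."""

    def pin_memory(self) -> "Pinnable": ...


class FakeTensor:
    """Stand-in for a tensor; only tracks whether it has been pinned."""

    def __init__(self, pinned: bool = False) -> None:
        self.pinned = pinned

    def pin_memory(self) -> "FakeTensor":
        # Mirrors the tensor convention of returning a new, pinned object.
        return FakeTensor(pinned=True)


def pin_if_possible(obj: Any) -> Any:
    """Pin objects that satisfy the protocol; pass everything else through."""
    return obj.pin_memory() if isinstance(obj, Pinnable) else obj


def pin_list(items: List[Any]) -> List[Any]:
    """Apply pin_if_possible to every element of a list."""
    return [pin_if_possible(item) for item in items]


batch = pin_list([FakeTensor(), 3, FakeTensor()])
print([getattr(item, "pinned", None) for item in batch])  # prints: [True, None, True]
```

The structural check keeps the training loop independent of concrete sample and batch classes: any object that defines `pin_memory()` is handled, and plain values such as integers pass through unchanged.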
* split WeatherGenReader functionality to allow reading only JSON

adding weathergen JSON reader to develop

* informative error when metrics are not there

* restore JSONreader after rebase

* JSONreader mostly restored

* MLFlow logging independent of JSON/zarr

* linting, properly checking fsteps, ens, samples in JSONreader

* tiny change to restore the MergeReader

* lint

* enabling JSONreader to skip plots and missing scores gracefully

* required reformatting

* move skipping of metrics to the reader class

* slightly more explicit formulations

---------

Co-authored-by: Sebastian Buschow <[email protected]>
Co-authored-by: iluise <[email protected]>
Co-authored-by: Ilaria Luise <[email protected]>
* Add target type value error

* Remove type

* Remove unused code

* Commit what shall have been committed

* Remove target readout type from config

* Add computing stream names to embedding engine

---------

Co-authored-by: Christian Lessig <[email protected]>
* add default streams + fix lead time error

* update config

* Correct a bug creating aggr issues on scores (#1685)

---------

Co-authored-by: Savvas Melidonis <[email protected]>
* add default streams + fix lead time error

* update config

* update ratio plots and bar plots for single run

* fix title

* Update config

Added support information for forecast_step configuration.

---------

Co-authored-by: Savvas Melidonis <[email protected]>
* add argument

* check stage argument

* removed unnecessary code

* arbitrary position arguments

* Fix error text

* get stage info from environment variable.

* Update run_train.py

---------

Co-authored-by: Simon Grasse <[email protected]>
* caching get_shared_wg_path()

* renaming get_path_output to get_path_results

* model and results paths from get_shared_wg_path() and removed _get_config_attribute()

* marking get_shared_wg_path() as private

* removing set_path()

* fixed call to _get_shared_wg_path

* fixed import, code clean-up, change caching decorator

* changed way of caching _get_shared_wg_base_path

* fixed typing error

* Refactor shared WG path handling and model config I/O

- Simplify get_path_model/get_path_run to always resolve via _get_shared_wg_path()
- Change _get_shared_wg_path() to cached, argument-free helper returning the shared working dir from private config
- Adjust model config save/load to build filenames relative to the run’s model directory instead of passing parent paths around
- Update load_run_config and load_merge_configs to use new path helpers and improve assertion/log messages
- Replace internal _get_shared_wg_path("results") usages with get_path_run() in wegen_reader and train_logger

* fixed base_path in metrics_path

* fixed forgotten config.general

* fixed lint raised issues

* Improve path handling and add missing docstrings

- Add docstrings to 10+ utility functions for better documentation
- Refactor load_run_config to improve path construction logic
- Move mini_epoch string formatting from _get_model_config_file_read_name
  to caller for better separation of concerns
- Add validation for mini_epoch_str format with descriptive error messages
- Fix multi-line docstring format in _load_private_conf

* fixed line too long

* reverting to previous _get_model_config_file_read_name()

* pretty fix for _get_model_config_file_read_name

* pretty fix for _get_model_config_file_read_name

* removed unused/undefined path
* replace '_' with '-'

* cli options underscore to dash

* change underscores to hyphens

* rename options in cli unit test
Co-authored-by: Savvas Melidonis <[email protected]>
* rename write_num_samples to num_samples

* Fixing linting

---------

Co-authored-by: Christian Lessig <[email protected]>
* remove misleading logging of mini_epoch

* add forecast_steps logging
* Fix duplicate run_id in results and runplots paths. Linting.

* remove duplicate run_id also from metrics directory

* Linting
@iluise iluise self-assigned this Jan 29, 2026
@iluise iluise added the eval label (anything related to the model evaluation pipeline) Jan 29, 2026
@iluise iluise changed the title from "Iluise/develop/eval latent space" to "implement evaluation routine for SSL" Jan 29, 2026
Labels

eval anything related to the model evaluation pipeline

Development

Successfully merging this pull request may close these issues.

Implement evaluation pipeline for SSL