Releases: jpata/particleflow
v2.1.0
v2.0.0
v1.9.0
What's Changed
- fix CMS instructions by @jpata in #334
- MLPF datasets v2.0.0: track pythia-level genjets, genmet in datasets; add per-particle ispu flag by @jpata in #332
- CMS training instructions by @jpata in #336
- Retraining with CMS samples v2.0.0 by @jpata in #337
- Fix use of deprecated Ray Tune environment variable by @erwulff in #338
- CMS dataset relabel, generate v2.1.0 with more stats, separate binary classifier by @jpata in #340
- Pre-layernorm by @erwulff in #339
- Switch to datasets v2.2.0 by @jpata in #341
- try to improve val loss stability by @jpata in #342
- Regression of log-transformed energy and pt, training checkpoints by @jpata in #343
- CLIC dataset v2.2.0, CMS dataset 2.4.0 by @jpata in #345
- Update dataset validation plot notebooks by @jpata in #347
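One change above (#343) switches the regression targets to log-transformed energy and pT. As a minimal illustration of the idea (not the actual MLPF code; variable names are invented here), the network regresses the logarithm of the target and the transform is inverted at inference time:

```python
import numpy as np

# Regressing log(pt) instead of pt compresses the dynamic range of the
# target, which typically stabilizes training on steeply falling spectra.
pt_true = np.array([0.5, 2.0, 50.0, 800.0])
target = np.log(pt_true)   # what the network would regress
pt_pred = np.exp(target)   # invert the transform at inference time

assert np.allclose(pt_pred, pt_true)
```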
Full Changelog: v1.8.0...v1.9.0
v1.8.0
What's Changed
The focus of this release is training on CMS datasets. The model has been retrained on high-statistics CMS samples and outperforms the baseline on our MLPF samples. The export of the transformer model to ONNX works with Flash Attention, and the model can be integrated with CMSSW 14 and run on GPU. We have run the first physics validations in CMSSW and find that the performance is improved with respect to the previous CMS version of MLPF, but it does not yet outperform the baseline PF in CMSSW.
Some slides from the CMS progress:
- https://indico.cern.ch/event/1399778/#2-cms-status
- https://indico.cern.ch/event/1415765/#2-cms-status-and-plans
- https://indico.cern.ch/event/1421798/#2-cms-status-and-plans-virtual
- https://indico.cern.ch/event/1426959/#2-cms-status-and-plans
The full list of PRs:
- Remove pytorch geometric by @jpata in #310
- add new paper to README by @jpata in #312
- Add Ray Train training to GitHub actions CI/CD test by @erwulff in #314
- CMSSW documentation by @jpata in #319
- Full CMS pytorch training in May 2024 by @jpata in #316
- update CMSSW validation scripts and documentation by @jpata in #322
- onnx export with dynamic shapes, fast attention by @jpata in #324
- switch ONNX model to full float for CMSSW compatibility by @jpata in #325
- Update validation scripts to CMSSW_14_1_0 by @jpata in #323
- update cmssw plots, add ttbar sample to valid, add multiparticlegun and vbf to training by @jpata in #330
Full Changelog: v1.7.0...v1.8.0
v1.7.0
What's Changed
The primary feature of the new release is that PyTorch is now the main mode of training.
The CMS status was presented at https://indico.cern.ch/event/1399688/#1-ml-for-pf.
- switch pytorch training to tfds array-record datasets by @farakiko in #228
- Timing the ONNX model, retrain CMS-GNNLSH-TF by @jpata in #229
- fixes for pytorch, CMS t1tttt dataset, update response plots by @jpata in #232
- fix pytorch multi-GPU training hang by @farakiko in #233
- feat: specify number of samples as cmd line arg in pytorch training and testing by @erwulff in #237
- Automatically name training dir in pytorch pipeline by @erwulff in #238
- pytorch backend major update by @farakiko in #240
- Update dist.barrier() and fix stale epochs for torch backend by @farakiko in #249
- multi-bin loss in TF, plot fixes by @jpata in #234
- PyTorch distributed num-workers>0 fix by @farakiko in #252
- speedup of the pytorch GNN-LSH model by @jpata in #245
- Implement HPO for PyTorch pipeline. by @erwulff in #246
- fix tensorboard error by @farakiko in #254
- fix config files by @erwulff in #255
- making the 3d-padded models more efficient in pytorch by @jpata in #256
- Fix pytorch inference after #256 by @jpata in #257
- Update training.py by @jpata in #261
- Reduce the number of data loader workers per dataset in pytorch by @farakiko in #262
- fix inference by @farakiko in #264
- Implementing configurable checkpointing. by @erwulff in #263
- restore onnx export in pytorch by @jpata in #265
- remove outdated forward_batch from pytorch by @jpata in #266
- Separate multiparticlegun samples from singleparticle gun samples by @farakiko in #267
- compare all three models in pytorch by @jpata in #268
- Allows testing on a given --load-checkpoint by @farakiko in #269
- added clic evaluation notebook by @jpata in #272
- Fix --load-checkpoint bug by @farakiko in #270
- Implement CometML logging to PyTorch training pipeline. by @erwulff in #273
- Add command line argument to choose experiments dir in PyTorch training pipeline by @erwulff in #274
- Implement multi-gpu training in HPO with Ray Tune and Ray Train by @erwulff in #277
- Better CometML logging + Ray Train vs DDP comparison by @erwulff in #278
- Fix checkpoint loading by @erwulff in #280
- Learning rate schedules and Mamba layer by @erwulff in #282
- use modern optimizer, revert multi-bin loss in TF by @jpata in #253
- track individual particle loss components, speedup inference by @jpata in #284
- Update the jet pt threshold to be the same as the PF paper by @farakiko in #283
- towards v1.7: new CMS datasets, CLIC hit-based datasets, TF backward-compat optimizations by @jpata in #285
- fix torch no grad by @jpata in #290
- pytorch regression output layer configurability by @jpata in #291
- Implement resume-from-checkpoint in HPO by @erwulff in #293
- enable FlashAttention in pytorch, update to torch 2.2.0 by @jpata in #292
- fix pad_power_of_two by @jpata in #296
- Feat val freq by @erwulff in #298
- normalize loss, reparametrize network by @jpata in #297
- fix up configs by @jpata in #300
- clean up loading by @jpata in #301
- Fix unpacking for 3d padded batch, update plot style by @jpata in #306
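The FlashAttention support added in #292 (with the update to torch 2.2.0) builds on PyTorch's fused attention entry point. A minimal sketch, with invented tensor shapes: on CUDA with half-precision inputs this call can dispatch to the FlashAttention kernel when eligible, while on CPU it falls back to the math implementation.

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dimension)
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Fused scaled-dot-product attention; the backend (FlashAttention,
# memory-efficient, or plain math) is chosen by PyTorch at runtime.
out = F.scaled_dot_product_attention(q, k, v)
assert out.shape == (2, 8, 128, 64)
```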
Full Changelog: v1.6...v1.7.0
v1.6.2
v1.6.1
v1.6
What's Changed
- pin matplotlib due to breaking changes in mpl 3.8.0 by @jpata in #210
- Update README_tf.md by @jpata in #209
- Update README.md by @jpata in #213
- implement GNN-LSH model in torch by @jpata in #211
- small fixes for CMS training, torch onnx export by @jpata in #215
- updates for v1.6 by @jpata in #219
- Improve the training and evaluation recipe for the benchmark model provided in the paper. Addresses #217
- fix #222
- fix #220
- if padded batching is used, compute number of steps naively, do not step through the dataset to get the number of steps
- split the Delphes dataset clearly to ttbar and qcd, like the other datasets
- regenerate the Delphes dataset using tfds with array_record
- fix #225
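Computing the number of steps "naively" for padded batching, as described above, amounts to deriving the step count from the event count alone instead of iterating the dataset. A sketch under that assumption (function name is illustrative):

```python
import math

def num_steps(num_events: int, batch_size: int) -> int:
    # With padded batching every batch has a fixed size, so the step
    # count follows directly from the event count, with the last
    # (possibly partial) batch rounded up.
    return math.ceil(num_events / batch_size)

print(num_steps(1000, 64))  # 16
```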
Full Changelog: https://github.com/jpata/particleflow/commits/v1.6
MLPF training with CLIC simulation
MLPF as in the upcoming paper, with training on CLIC simulation.
Cleanup of v1.5 with BFG to remove LFS errors.
What's Changed
- add license (Apache 2.0) by @jmduarte in #86
- hep_tfds support in raytune by @erwulff in #82
- update with larger dataset 1.2.0 [TF] by @jpata in #84
- Add raytune algorithms by @erwulff in #85
- Minor raytune fixes by @erwulff in #87
- Load best weights before saving model after end on training by @erwulff in #88
- Add Bayesian Optimization to raytune command, using nevergrad or scikit-optimize by @erwulff in #89
- ACAT2021 benchmark by @jpata in #92
- New best hyperparameter config by @erwulff in #90
- Rewrite `tf.einsum` using `tf.math.multiply` by @jmduarte in #93
- Update plots for ACAT'21 based on PPD suggestions by @jpata in #94
- Specify version of Ray Tune in GitHub tests by @erwulff in #99
- Gen/Sim training dataset by @jpata in #100
- Fix raytune imports by @erwulff in #106
- Better CMS dataset, fix f16, fix transformer by @jpata in #105
- Fix output decoding for PFNetDense by @jpata in #107
- feat: Ray Tune analysis on JUWELS by @erwulff in #111
- Up flatiron modules by @erwulff in #112
- fix optimizer save/load, add mpnn config, get f16 training to work by @jpata in #108
- updated/documented lrp and pytorch pipeline by @farakiko in #110
- Fix PCgrad loading, trainable weights by @jpata in #113
- Fix quickstart nb by @erwulff in #116
- Comet-ml offline logging by @erwulff in #115
- optimized pytorch geometric pipeline using DDP by @farakiko in #118
- Fix bug in CustomCallback class by @erwulff in #119
- Hypertuning development by @erwulff in #120
- Ray cleanup by @erwulff in #121
- June 2022 update: new datasets, jet/MET level validation, additional loss terms by @jpata in #114
- Add multinode training using Horovod by @MaPoKen in #104
- log jet/met reso, make event loss configurable, add sliced Wasserstein loss by @jpata in #123
- fix small bug in eval by @jpata in #127
- Gen jet loss by @jmduarte in #126
- Faster test, pre-commit formatting, general cleanup by @jpata in #129
- Pre commit fixes by @farakiko in #131
- Comparison job for different event losses by @jpata in #132
- Fix lr logging by @erwulff in #137
- integrate hep_tfds, September 2022 benchmark training by @jpata in #136
- MET loss as an option by @jpata in #138
- added MET file by @jpata in #139
- Fix MET loss, validation in CMSSW by @jpata in #141
- Bump tensorflow from 2.9 to 2.9.1 by @dependabot in #143
- Ray Tune checkpointing fix, allow LR schedules for non-PCGrad opt, and more. by @erwulff in #142
- PCGrad with LR schedules, resume from checkpoint with LR schedules by @erwulff in #145
- Add ability to train on Habana Gaudi by @jmduarte in #135
- high-pT gun samples by @jpata in #144
- Additional gun samples, move padding from dataset to model, change response plot definition, update transformer model by @jpata in #146
- Add benchmarking utilities by @erwulff in #147
- added clic pipeline from parquet by @jpata in #149
- added acat2022 model by @jpata in #148
- fix num_cpus flag by @jpata in #150
- Bump tensorflow from 2.10.0 to 2.10.1 by @dependabot in #151
- training on LUMI HPC by @jpata in #152
- Refactoring, CLIC datasets by @jpata in #153
- format black by @jpata in #154
- ssl-based mlpf first iteration by @farakiko in #158
- Fix legacy CLIC dataset pdgid by @jpata in #160
- edm4hep postprocessing by @jpata in #159
- SSL updates: pipeline, new datasets and jet clustering by @farakiko in #161
- tune the pytorch MLPF model to be more similar to TF by @jpata in #165
- Raytune updates, LR-finder bug fix by @erwulff in #164
- tuning the downstream MLPF model [pytorch] by @jpata in #166
- Ssl finetuning by @farakiko in #167
- Update data split mode for SSL studies [pytorch] by @farakiko in #168
- few fixes and cleanups to pytorch, update sim scripts by @jpata in #169
- update clic plots by @jpata in #171
- fix: error in raytune search space by @erwulff in #170
- optimizing VICReg by @farakiko in #173
- update CLIC dataset, retrain MLPF by @jpata in #172
- update README by @jpata in #175
- Refactoring by @farakiko in #174
- clean up dataset prep by @jpata in #176
- fix loader, readd tensorboard by @jpata in #177
- Additional small fixes to pytorch by @jpata in #178
- standardize input features, re-enable fp16 [TF], unify plotting [pytorch] by @jpata in #179
- Pin torch to 1.13.0 by @jpata in #180
- CLIC new samples with 1M events by @jpata in #181
- CLIC new datasets, hit based training option by @jpata in #182
- update hit-based training by @jpata in #184
- TF perf tuning, CLIC benchmarks, flatiron scripts by @erwulff in #185
- Add inference command by @erwulff in #187
- clean up repo by @jpata in #188
- scale test on lumi, fix horovod by @jpata in #189
- switch to tfds array_record, improve visualization, dataset descriptions by @jpata in #190
Full Changelog: v1.4...v1.5
Baseline MLPF model for CMS
MLPF CMS status as reported in the PF group: