
Commit 7bd0362

Add Human3.6M pretrained models

* Added Human3.6M pretrained models
* Update README.md

1 parent b171a0f commit 7bd0362

File tree: 8 files changed (+123, -66 lines)

README.md (+19 -25)

@@ -22,34 +22,27 @@ pip install -r requirements.txt
 
 #### Human3.6M
 1. Download and preprocess the dataset by following the instructions in [mvn/datasets/human36m_preprocessing/README.md](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/mvn/datasets/human36m_preprocessing/README.md).
-2. Place the preprocessed dataset to `data/human36m`. If you don't want to store the dataset in the directory with code, just create a soft symbolic link: `ln -s {PATH_TO_HUMAN36M_DATASET} ./data/human36m`.
-3. Download pretrained backbone's weights from [here](https://drive.google.com/open?id=1TGHBfa9LsFPVS5CH6Qkcy5Jr2QsJdPEa) and place them here: `data/pretrained/human36m/pose_resnet_4.5_pixels_human36m.pth` (ResNet-152 trained on COCO dataset and finetuned jointly on MPII and Human3.6M).
-4. If you want to train Volumetric model, you need rough estimations of the 3D skeleton both for train and val splits. You have two options:
-    - Rough 3D skeletons can be estimated by Algebraic model and placed to `data/precalculated_results/human36m/results_train.pkl` and `data/precalculated_results/human36m/results_val.pkl` respectively.
-    - Other option is to use the ground truth (GT) estimate of the 3D skeleton by setting `use_gt_pelvis: true` in a config file. Here you don't need any precalculated results, but such training mode overestimates the resulting accuracy, because the pelvis is always perfectly defined.
+2. Place the preprocessed dataset in `./data/human36m`. If you don't want to store the dataset in the directory with the code, just create a soft symbolic link: `ln -s {PATH_TO_HUMAN36M_DATASET} ./data/human36m`.
+3. Download the pretrained backbone's weights from [here](https://drive.google.com/open?id=1TGHBfa9LsFPVS5CH6Qkcy5Jr2QsJdPEa) and place them at `./data/pretrained/human36m/pose_resnet_4.5_pixels_human36m.pth` (ResNet-152 trained on the COCO dataset and finetuned jointly on MPII and Human3.6M).
+4. If you want to train the Volumetric model, you need rough estimations of the 3D skeleton for both the train and val splits. In the paper we estimate 3D skeletons via the Algebraic model. You can use the [pretrained](#model-zoo) Algebraic model to produce predictions, or just take the [precalculated 3D skeletons](#model-zoo).
 
-#### CMU Panoptic
-*Will be added soon*
-
-## Train
-Every experiment is defined by `.config` files. Configs with experiments from the paper can be found in `experiments` directory (results can be found below):
+## Model zoo
+In this section we collect pretrained models and configs. All **pretrained weights** and **precalculated 3D skeletons** can be downloaded from [Google Drive](https://drive.google.com/open?id=1TGHBfa9LsFPVS5CH6Qkcy5Jr2QsJdPEa) and placed in the `./data` dir, so that the eval configs work out of the box (without additional path setup).
 
 **Human3.6M:**
-1. Algebraic w/o confidences — [experiments/human36m/train/human36m_alg_no_conf.yaml](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/experiments/human36m/train/human36m_alg_no_conf.yaml)
-2. Algebraic w/ confidences — [experiments/human36m/train/human36m_alg.yaml](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/experiments/human36m/train/human36m_alg.yaml)
-3. Volumetric (softmax aggregation) — [experiments/human36m/train/human36m_vol_softmax.yaml](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/experiments/human36m/train/human36m_vol_softmax.yaml)
-4. Volumetric (softmax aggregation, GT pelvis) — [experiments/human36m/train/human36m_vol_softmax_gtpelvis.yaml](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/experiments/human36m/train/human36m_vol_softmax_gtpelvis.yaml)
-
-**CMU Panoptic**
-
-*Will be added soon*
 
+| Model | Train config | Eval config | Weights | Precalculated results | MPJPE (relative to pelvis), mm |
+|----------------------|:------------|:------------|:-------:|:---------------------:|-------------------------------:|
+| Algebraic | [train/human36m_alg.yaml](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/train/human36m_alg.yaml) | [eval/human36m_alg.yaml](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/eval/human36m_alg.yaml) | [link](https://drive.google.com/file/d/1HAqMwH94kCfTs9jUHiuCB7vt94rMvxWe/view?usp=sharing) | [link](https://drive.google.com/drive/folders/1LCzMQswdn4UM9fbRYOZb3FmMZ7pZFyIP?usp=sharing) | 22.4 |
+| Volumetric (softmax) | [train/human36m_vol_softmax.yaml](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/train/human36m_vol_softmax.yaml) | [eval/human36m_vol_softmax.yaml](https://github.com/karfly/learnable-triangulation-pytorch/blob/master/eval/human36m_vol_softmax.yaml) | [link](https://drive.google.com/file/d/1r6Ut3oMKPxhyxRh3PZ05taaXwekhJWqj/view?usp=sharing) | | **20.5** |
+## Train
+Every experiment is defined by `.config` files. Configs with experiments from the paper can be found in the `./experiments` directory (see the [model zoo](#model-zoo)).
 
 #### Single-GPU
-To train a Volumetric model with softmax aggregation and GT-estimated pelvises using **1 GPU**, run:
+To train a Volumetric model with softmax aggregation using **1 GPU**, run:
 ```bash
 python3 train.py \
-  --config experiments/human36m/train/human36m_vol_softmax_gtpelvis.yaml \
+  --config train/human36m_vol_softmax.yaml \
   --logdir ./logs
 ```
 
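The model-zoo paragraph above assumes everything is unpacked under `./data`. A minimal sketch of the resulting layout, pieced together from the checkpoint and `pred_results_path` entries in the configs further down this commit (only files touched by this commit are shown; the tree itself is an assumption, not taken from the repo docs):

```
data/
├── human36m/                                  # preprocessed dataset (or a symlink)
└── pretrained/human36m/
    ├── pose_resnet_4.5_pixels_human36m.pth    # backbone weights
    ├── human36m_alg_10-04-2019/checkpoints/0060/
    │   ├── weights.pth                        # Algebraic model weights
    │   └── results/
    │       ├── train.pkl                      # precalculated 3D skeletons (train)
    │       └── val.pkl                        # precalculated 3D skeletons (val)
    └── human36m_vol_softmax_10-08-2019/checkpoints/0040/
        └── weights.pth                        # Volumetric model weights
```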

@@ -58,11 +51,11 @@ The training will start with the config file specified by `--config`, and logs (
 #### Multi-GPU (*in testing*)
 Multi-GPU training is implemented with PyTorch's [DistributedDataParallel](https://pytorch.org/docs/stable/nn.html#distributeddataparallel). It can be used both for single-machine and multi-machine (cluster) training. To run the processes, use the PyTorch [launch utility](https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py).
 
-To train a Volumetric model with softmax aggregation and GT-estimated pelvises using **2 GPUs on single machine**, run:
+To train a Volumetric model with softmax aggregation using **2 GPUs on a single machine**, run:
 ```bash
 python3 -m torch.distributed.launch --nproc_per_node=2 --master_port=2345 \
   train.py \
-  --config experiments/human36m/train/human36m_vol_softmax_gtpelvis.yaml \
+  --config train/human36m_vol_softmax.yaml \
   --logdir ./logs
 ```
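The README stops at the single-machine case. Since the same launch utility also handles clusters, a hypothetical two-machine variant could look like the following (the `--nnodes`, `--node_rank`, and `--master_addr` flags are standard `torch.distributed.launch` options; their use with this repo is untested here):

```bash
# On node 0 (assumed reachable at NODE0_IP); repeat on node 1 with --node_rank=1.
python3 -m torch.distributed.launch \
    --nnodes=2 --node_rank=0 --nproc_per_node=2 \
    --master_addr=NODE0_IP --master_port=2345 \
    train.py \
    --config train/human36m_vol_softmax.yaml \
    --logdir ./logs
```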

@@ -86,7 +79,7 @@ Run:
 ```bash
 python3 train.py \
   --eval --eval_dataset val \
-  --config experiments/human36m/eval/human36m_vol_softmax.yaml \
+  --config eval/human36m_vol_softmax.yaml \
   --logdir ./logs
 ```
 Argument `--eval_dataset` can be `val` or `train`. Results can be seen in `logs` directory or in the tensorboard.
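This eval entry point is also the plausible way to regenerate the "precalculated 3D skeletons" that step 4 of the setup and the Volumetric configs rely on: run the pretrained Algebraic model over the train split and point `pred_results_path` at its output. This recipe is inferred from the configs in this commit rather than documented anywhere, so treat it as a sketch:

```bash
# Hypothetical: dump Algebraic predictions for the train split, then set
# dataset.train.pred_results_path in the Volumetric config to the produced .pkl.
python3 train.py \
    --eval --eval_dataset train \
    --config eval/human36m_alg.yaml \
    --logdir ./logs
```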
@@ -111,8 +104,8 @@ MPJPE relative to pelvis:
 | Kadkhodamohammadi & Padoy [\[5\]](#references) | 49.1 |
 | [Qiu et al.](https://github.com/microsoft/multiview-human-pose-estimation-pytorch) [\[9\]](#references) | 26.2 |
 | RANSAC (our implementation) | 27.4 |
-| **Ours, algebraic** | 22.6 |
-| **Ours, volumetric** | **20.8** |
+| **Ours, algebraic** | 22.4 |
+| **Ours, volumetric** | **20.5** |
 
 <br>
 MPJPE absolute (scenes with invalid ground-truth annotations are excluded):
@@ -190,6 +183,7 @@ Volumetric triangulation additionally improves accuracy, drastically reducing th
 - [Ivan Bulygin](https://github.com/blufzzz)
 
 # News
+**18 Oct 2019:** Pretrained models (algebraic and volumetric) for Human3.6M are released.
 **8 Oct 2019:** Code is released!
 
 # References
New config file (+74 lines):

@@ -0,0 +1,74 @@
+title: "human36m_alg"
+kind: "human36m"
+vis_freq: 1000
+vis_n_elements: 10
+
+image_shape: [384, 384]
+
+opt:
+  criterion: "MSESmooth"
+  mse_smooth_threshold: 400
+
+  n_objects_per_epoch: 15000
+  n_epochs: 9999
+
+  batch_size: 8
+  val_batch_size: 100
+
+  lr: 0.00001
+
+  scale_keypoints_3d: 0.1
+
+model:
+  name: "alg"
+
+  init_weights: true
+  checkpoint: "./data/pretrained/human36m/human36m_alg_10-04-2019/checkpoints/0060/weights.pth"
+
+
+  use_confidences: true
+  heatmap_multiplier: 100.0
+  heatmap_softmax: true
+
+  backbone:
+    name: "resnet152"
+    style: "simple"
+
+    init_weights: true
+    checkpoint: "./data/pretrained/human36m/pose_resnet_4.5_pixels_human36m.pth"
+
+    num_joints: 17
+    num_layers: 152
+
+dataset:
+  kind: "human36m"
+
+  train:
+    h36m_root: "./data/human36m/processed"
+    labels_path: "./data/human36m/extra/human36m-multiview-labels-GTbboxes.npy"
+    with_damaged_actions: true
+    undistort_images: true
+
+    scale_bbox: 1.0
+
+    shuffle: true
+    randomize_n_views: false
+    min_n_views: null
+    max_n_views: null
+    num_workers: 8
+
+  val:
+    h36m_root: "./data/human36m/processed"
+    labels_path: "./data/human36m/extra/human36m-multiview-labels-GTbboxes.npy"
+    with_damaged_actions: true
+    undistort_images: true
+
+    scale_bbox: 1.0
+
+    shuffle: false
+    randomize_n_views: false
+    min_n_views: null
+    max_n_views: null
+    num_workers: 8
+
+    retain_every_n_frames_in_test: 1
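Judging by its `title`, checkpoint path, and `retain_every_n_frames_in_test: 1` (full test set, versus 30 in the old debug config below), this new file appears to be the `eval/human36m_alg.yaml` referenced in the model-zoo table. Under that assumption, it would be used with the eval entry point shown above:

```bash
python3 train.py \
    --eval --eval_dataset val \
    --config eval/human36m_alg.yaml \
    --logdir ./logs
```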
Modified config file:

@@ -1,46 +1,33 @@
-title: "debug"
+title: "human36m_ransac"
 kind: "human36m"
 vis_freq: 1000
 vis_n_elements: 10
 
 image_shape: [384, 384]
 
 opt:
-  criterion: "MAE"
+  criterion: "MSESmooth"
+  mse_smooth_threshold: 400
 
-  use_volumetric_ce_loss: true
-  volumetric_ce_loss_weight: 0.01
-
-  n_objects_per_epoch: 50
+  n_objects_per_epoch: 15000
   n_epochs: 9999
 
-  batch_size: 5
-  val_batch_size: 10
+  batch_size: 8
+  val_batch_size: 100
 
-  lr: 0.0001
-  process_features_lr: 0.001
-  volume_net_lr: 0.001
+  lr: 0.00001
 
   scale_keypoints_3d: 0.1
 
 model:
-  name: "vol"
-  kind: "mpii"
-  volume_aggregation_method: "softmax"
+  name: "ransac"
 
   init_weights: false
   checkpoint: ""
 
-  use_gt_pelvis: false
-
-  cuboid_side: 2500.0
-
-  volume_size: 64
-  volume_multiplier: 1.0
-  volume_softmax: true
-
-  heatmap_softmax: true
+  direct_optimization: true
   heatmap_multiplier: 100.0
+  heatmap_softmax: true
 
   backbone:
     name: "resnet152"

@@ -58,8 +45,6 @@ dataset:
   train:
     h36m_root: "./data/human36m/processed"
     labels_path: "./data/human36m/extra/human36m-multiview-labels-GTbboxes.npy"
-    pred_results_path: "./data/precalculated_results/human36m/results_train.pkl"
-
     with_damaged_actions: true
     undistort_images: true
 

@@ -74,8 +59,6 @@ dataset:
   val:
     h36m_root: "./data/human36m/processed"
     labels_path: "./data/human36m/extra/human36m-multiview-labels-GTbboxes.npy"
-    pred_results_path: "./data/precalculated_results/human36m/results_val.pkl"
-
     with_damaged_actions: true
     undistort_images: true
 

@@ -87,4 +70,4 @@ dataset:
     max_n_views: null
     num_workers: 8
 
-    retain_every_n_frames_in_test: 30
+    retain_every_n_frames_in_test: 1

experiments/human36m/train/human36m_vol_softmax_gtpelvis.yaml → experiments/human36m/eval/human36m_vol_softmax.yaml (+8 -6)

@@ -1,4 +1,4 @@
-title: "human36m_vol_softmax_gtpelvis"
+title: "human36m_vol_softmax"
 kind: "human36m"
 vis_freq: 1000
 vis_n_elements: 10

@@ -15,7 +15,7 @@ opt:
   n_epochs: 9999
 
   batch_size: 5
-  val_batch_size: 10
+  val_batch_size: 20
 
   lr: 0.0001
   process_features_lr: 0.001

@@ -28,10 +28,10 @@ model:
   kind: "mpii"
   volume_aggregation_method: "softmax"
 
-  init_weights: false
-  checkpoint: ""
+  init_weights: true
+  checkpoint: "./data/pretrained/human36m/human36m_vol_softmax_10-08-2019/checkpoints/0040/weights.pth"
 
-  use_gt_pelvis: true
+  use_gt_pelvis: false
 
   cuboid_side: 2500.0

@@ -58,6 +58,7 @@ dataset:
   train:
     h36m_root: "./data/human36m/processed"
     labels_path: "./data/human36m/extra/human36m-multiview-labels-GTbboxes.npy"
+    pred_results_path: "./data/pretrained/human36m/human36m_alg_10-04-2019/checkpoints/0060/results/train.pkl"
 
     with_damaged_actions: true
     undistort_images: true

@@ -73,6 +74,7 @@ dataset:
   val:
     h36m_root: "./data/human36m/processed"
     labels_path: "./data/human36m/extra/human36m-multiview-labels-GTbboxes.npy"
+    pred_results_path: "./data/pretrained/human36m/human36m_alg_10-04-2019/checkpoints/0060/results/val.pkl"
 
     with_damaged_actions: true
     undistort_images: true

@@ -85,4 +87,4 @@ dataset:
     max_n_views: null
     num_workers: 8
 
-    retain_every_n_frames_in_test: 30
+    retain_every_n_frames_in_test: 1
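The two `pred_results_path` additions wire the Volumetric model to the Algebraic model's dumped predictions, now shipped alongside the pretrained weights. A quick sanity check one might run after downloading (hypothetical snippet; it assumes the `.pkl` holds one predicted skeleton per dataset sample, as suggested by `sample['pred_keypoints_3d'] = self.keypoints_3d_pred[idx]` in `mvn/datasets/human36m.py` below):

```python
import pickle

path = "./data/pretrained/human36m/human36m_alg_10-04-2019/checkpoints/0060/results/train.pkl"
with open(path, "rb") as f:
    keypoints_3d_pred = pickle.load(f)

# Expect one entry per train sample; each entry should be indexable by the dataset.
print(type(keypoints_3d_pred), len(keypoints_3d_pred))
```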

experiments/human36m/train/human36m_alg.yaml (+1 -1)

@@ -9,7 +9,7 @@ opt:
   criterion: "MSESmooth"
   mse_smooth_threshold: 400
 
-  n_objects_per_epoch: 10000
+  n_objects_per_epoch: 15000
   n_epochs: 9999
 
   batch_size: 8

mvn/datasets/human36m.py (+2 -4)

@@ -180,10 +180,8 @@ def __getitem__(self, idx):
         # save sample's index
         sample['indexes'] = idx
 
-        try:
+        if self.keypoints_3d_pred is not None:
             sample['pred_keypoints_3d'] = self.keypoints_3d_pred[idx]
-        except AttributeError:
-            pass
 
         sample.default_factory = None
         return sample

@@ -270,4 +268,4 @@ def evaluate(self, keypoints_3d_predicted, split_by_subject=False, transfer_cmu_
             'per_pose_error_relative': self.evaluate_using_per_pose_error(per_pose_error_relative, split_by_subject)
         }
 
-        return result['per_pose_error']['Average']['Average'], result
+        return result['per_pose_error_relative']['Average']['Average'], result
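Two notes on these hunks. First, the old `try`/`except AttributeError` only covered a missing attribute; if `keypoints_3d_pred` exists but is `None` (presumably when no `pred_results_path` is set in the config), `self.keypoints_3d_pred[idx]` raises a `TypeError` that the except clause would not catch, so the explicit `None` check is the safer guard. A standalone illustration (hypothetical snippet, not repo code):

```python
class ToyDataset:
    def __init__(self, keypoints_3d_pred=None):
        # None when no precalculated results are supplied
        self.keypoints_3d_pred = keypoints_3d_pred

    def __getitem__(self, idx):
        sample = {}
        # With try/except AttributeError, this line raised an uncaught TypeError
        # whenever keypoints_3d_pred was None; the explicit check avoids that.
        if self.keypoints_3d_pred is not None:
            sample['pred_keypoints_3d'] = self.keypoints_3d_pred[idx]
        return sample

print(ToyDataset()[0])  # {} rather than a crash
```

Second, the changed return value makes `evaluate` report the pelvis-relative MPJPE, which matches the metric quoted in the README tables above.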

mvn/models/pose_resnet.py (+1 -1)

@@ -372,6 +372,6 @@ def get_pose_net(config, device='cuda:0'):
         print("Parameters [{}] were not inited".format(not_inited_params))
 
     model.load_state_dict(new_pretrained_state_dict, strict=False)
-    print("Successfully loaded pretrained weights")
+    print("Successfully loaded pretrained weights for backbone")
 
     return model

train.py (+7 -1)

@@ -7,6 +7,7 @@
 from collections import defaultdict
 from itertools import islice
 import pickle
+import copy
 
 import numpy as np
 import cv2

@@ -406,7 +407,12 @@ def main(args):
 
     if config.model.init_weights:
         state_dict = torch.load(config.model.checkpoint)
-        model.load_state_dict(state_dict, strict=False)
+        for key in list(state_dict.keys()):
+            new_key = key.replace("module.", "")
+            state_dict[new_key] = state_dict.pop(key)
+
+        model.load_state_dict(state_dict, strict=True)
+        print("Successfully loaded pretrained weights for whole model")
 
     # criterion
     criterion_class = {
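The key-renaming loop exists because checkpoints saved from a model wrapped in `(Distributed)DataParallel` carry a `module.` prefix on every parameter name; stripping it lets the unwrapped model load the released checkpoints with `strict=True`. A minimal sketch of the same idiom (`Net` is a hypothetical stand-in; the wrapping and loading calls are standard PyTorch API):

```python
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical stand-in for the repo's model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

net = Net()
wrapped = nn.DataParallel(net)
state_dict = wrapped.state_dict()
print(list(state_dict))  # ['module.fc.weight', 'module.fc.bias']

# Strip the "module." prefix so the unwrapped model accepts strict loading:
state_dict = {k.replace("module.", "", 1): v for k, v in state_dict.items()}
net.load_state_dict(state_dict, strict=True)
```

The switch from `strict=False` to `strict=True` also means a key mismatch now fails loudly instead of silently skipping weights, which is the behavior you want when loading released checkpoints.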
