
Commit cb3cfb4

add demo for downstream tasks
1 parent c1ee07f commit cb3cfb4

18 files changed: +1181 −5 lines

.gitignore

+1 −1
```diff
@@ -111,7 +111,7 @@ venv.bak/
 *.pkl
 *.pkl.json
 *.log.json
-*.jpg
+# *.jpg
 bash
 data
 data_set
```

README.md

+9 −4
```diff
@@ -42,12 +42,17 @@ We plan to release implementations of MogaNet in a few months. Please watch us f
 
 - [x] **ImageNet-1K** Training and Validation Code with [timm](https://github.com/rwightman/pytorch-image-models) [[code](#image-classification)] [[models](https://github.com/Westlake-AI/MogaNet/releases/tag/moganet-in1k-weights)] [[Hugging Face 🤗](https://huggingface.co/MogaNet)]
 - [x] **ImageNet-1K** Training and Validation Code in [OpenMixup](https://github.com/Westlake-AI/openmixup/tree/main/configs/classification/imagenet/moganet) / [MMPretrain (TODO)](https://github.com/open-mmlab/mmpretrain)
-- [x] Downstream Transfer to **Object Detection and Instance Segmentation on COCO** [[code](detection/)] [[models](https://github.com/Westlake-AI/MogaNet/releases/tag/moganet-det-weights)]
-- [x] Downstream Transfer to **Semantic Segmentation on ADE20K** [[code](segmentation/)] [[models](https://github.com/Westlake-AI/MogaNet/releases/tag/moganet-seg-weights)]
-- [x] Downstream Transfer to **2D Human Pose Estimation on COCO** [[code](pose_estimation/)] (baseline models are supported)
+- [x] Downstream Transfer to **Object Detection and Instance Segmentation on COCO** [[code](detection/)] [[models](https://github.com/Westlake-AI/MogaNet/releases/tag/moganet-det-weights)] [[demo](detection/demo/)]
+- [x] Downstream Transfer to **Semantic Segmentation on ADE20K** [[code](segmentation/)] [[models](https://github.com/Westlake-AI/MogaNet/releases/tag/moganet-seg-weights)] [[demo](segmentation/demo/)]
+- [x] Downstream Transfer to **2D Human Pose Estimation on COCO** [[code](pose_estimation/)] (baseline models are supported) [[models](https://github.com/Westlake-AI/MogaNet/releases/tag/moganet-pose-weights)] [[demo](pose_estimation/demo/)]
 - [ ] Downstream Transfer to **3D Human Pose Estimation** (baseline models will be supported) <!--[[code](human_pose_3d/)] (baseline models will be supported) -->
 - [x] Downstream Transfer to **Video Prediction on MMNIST** [[code](video_prediction/)] (baseline models are supported)
-- [x] Image Classification on Google Colab and Notebook Demo [[here](demo.ipynb)]
+- [x] Image Classification on Google Colab and Notebook Demo [[demo](demo.ipynb)]
+
+<p align="center">
+<img src="https://github-production-user-asset-6210df.s3.amazonaws.com/44519745/239330216-a93e71ee-7909-485d-8257-1b34abcd61c6.jpg" width=100% height=100%
+class="center">
+</p>
 
 
 ## Image Classification
```

detection/README.md

+8
````diff
@@ -71,6 +71,14 @@ python get_flops.py /path/to/config --shape 1280 800
 | Mask R-CNN | MogaNet-B | ImageNet-1K | 63.4M | 373.1G | 1x | 49.0 | 43.8 | [config](configs/mask_rcnn_moganet_base_fpn_1x_coco.py) | [log](https://github.com/Westlake-AI/MogaNet/releases/download/moganet-det-weights/mask_rcnn_moganet_base_fpn_1x_coco.log.json) / [model](https://github.com/Westlake-AI/MogaNet/releases/download/moganet-det-weights/mask_rcnn_moganet_base_fpn_1x_coco.pth) |
 | Mask R-CNN | MogaNet-L | ImageNet-1K | 102.1M | 495.3G | 1x | 49.4 | 44.2 | [config](configs/mask_rcnn_moganet_large_fpn_1x_coco.py) | [log](https://github.com/Westlake-AI/MogaNet/releases/download/moganet-det-weights/mask_rcnn_moganet_large_fpn_1x_coco.log.json) / [model](https://github.com/Westlake-AI/MogaNet/releases/download/moganet-det-weights/mask_rcnn_moganet_large_fpn_1x_coco.pth) |
 
+## Demo
+
+We provide demos following [MMDetection](https://github.com/open-mmlab/mmdetection/demo). Please use [inference_demo](./demo/inference_demo.ipynb) or run the following script:
+```bash
+cd demo
+python image_demo.py demo.png ../configs/moganet/mask_rcnn_moganet_small_fpn_1x_coco.py ../../work_dirs/checkpoints/mask_rcnn_moganet_small_fpn_1x_coco.pth --out-file pred.png
+```
+
 ## Training
 
 We train the model on a single node with 8 GPUs (a batch size of 16) by default. Start training with the config as:
````
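For notebook-style use, the same inference flow can be written directly against the MMDetection 2.x Python API that `image_demo.py` (further below) wraps. The following is a minimal sketch, assuming the MogaNet config and checkpoint paths from the command above.

```python
# Minimal sketch of the demo's inference flow via the MMDetection 2.x API
# (mirrors detection/demo/image_demo.py). The config/checkpoint paths are
# assumptions taken from the README command above.
from mmdet.apis import inference_detector, init_detector, show_result_pyplot

import sys
sys.path.append('../../')     # make the repo root importable
import models                 # noqa: F401  register MogaNet backbones with mmdet

config = '../configs/moganet/mask_rcnn_moganet_small_fpn_1x_coco.py'
checkpoint = '../../work_dirs/checkpoints/mask_rcnn_moganet_small_fpn_1x_coco.pth'

model = init_detector(config, checkpoint, device='cuda:0')   # build model and load weights
result = inference_detector(model, 'demo.png')               # per-class boxes (and masks)
show_result_pyplot(model, 'demo.png', result, score_thr=0.3, out_file='pred.png')
```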
@@ -0,0 +1,42 @@ (new file)

```python
_base_ = [
    '../_base_/models/mask_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_instance.py',
    '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth'  # noqa
model = dict(
    type='MaskRCNN',
    backbone=dict(
        _delete_=True,
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.2,
        patch_norm=True,
        out_indices=(0, 1, 2, 3),
        with_cp=False,
        convert_weights=True,
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
    neck=dict(in_channels=[96, 192, 384, 768]))

optimizer = dict(
    _delete_=True,
    type='AdamW',
    lr=0.0001,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    paramwise_cfg=dict(
        custom_keys={
            'absolute_pos_embed': dict(decay_mult=0.),
            'relative_position_bias_table': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.)
        }))
lr_config = dict(warmup_iters=1000, step=[8, 11])
runner = dict(max_epochs=12)
```
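The config above relies on MMDetection's `_base_` inheritance: it starts from the stock Mask R-CNN R-50/FPN, COCO, and 1x-schedule bases, then uses `_delete_=True` to swap in a Swin-Tiny backbone (loaded from the `init_cfg` checkpoint) and an AdamW optimizer with zero weight decay on norm and position-embedding parameters. A rough sketch of how such a file is consumed, assuming mmcv 1.x and MMDetection 2.x, is shown below; the config path is a hypothetical placeholder.

```python
# Rough sketch (assuming mmcv 1.x / mmdet 2.x): load the merged config and
# build the detector it describes, as mmdet's tools/train.py would.
from mmcv import Config
from mmdet.models import build_detector

cfg = Config.fromfile('path/to/this_config.py')  # hypothetical path; _base_ files are merged here
print(cfg.model.backbone.type)                   # 'SwinTransformer' after the _delete_ override

model = build_detector(
    cfg.model,
    train_cfg=cfg.get('train_cfg'),
    test_cfg=cfg.get('test_cfg'))
model.init_weights()  # downloads the Swin-Tiny checkpoint named in init_cfg
```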

detection/demo/demo.png (254 KB)

detection/demo/image_demo.py

+71
```python
# Copyright (c) OpenMMLab. All rights reserved.
import asyncio
from argparse import ArgumentParser

from mmdet.apis import (async_inference_detector, inference_detector,
                        init_detector, show_result_pyplot)
import sys
sys.path.append('../../')
import models  # register_model for MogaNet


def parse_args():
    parser = ArgumentParser()
    parser.add_argument('img', help='Image file')
    parser.add_argument('config', help='Config file')
    parser.add_argument('checkpoint', help='Checkpoint file')
    parser.add_argument('--out-file', default=None, help='Path to output file')
    parser.add_argument(
        '--device', default='cuda:0', help='Device used for inference')
    parser.add_argument(
        '--palette',
        default='coco',
        choices=['coco', 'voc', 'citys', 'random'],
        help='Color palette used for visualization')
    parser.add_argument(
        '--score-thr', type=float, default=0.3, help='bbox score threshold')
    parser.add_argument(
        '--async-test',
        action='store_true',
        help='whether to set async options for async inference.')
    args = parser.parse_args()
    return args


def main(args):
    # build the model from a config file and a checkpoint file
    model = init_detector(args.config, args.checkpoint, device=args.device)
    # test a single image
    result = inference_detector(model, args.img)
    # show the results
    show_result_pyplot(
        model,
        args.img,
        result,
        palette=args.palette,
        score_thr=args.score_thr,
        out_file=args.out_file)


async def async_main(args):
    # build the model from a config file and a checkpoint file
    model = init_detector(args.config, args.checkpoint, device=args.device)
    # test a single image
    tasks = asyncio.create_task(async_inference_detector(model, args.img))
    result = await asyncio.gather(tasks)
    # show the results
    show_result_pyplot(
        model,
        args.img,
        result[0],
        palette=args.palette,
        score_thr=args.score_thr,
        out_file=args.out_file)


if __name__ == '__main__':
    args = parse_args()
    if args.async_test:
        asyncio.run(async_main(args))
    else:
        main(args)
```

detection/demo/inference_demo.ipynb

+202
Large diffs are not rendered by default.

detection/demo/pred.png (574 KB)

pose_estimation/README.md

+8
````diff
@@ -66,6 +66,14 @@ We provide results of MogaNet and popular architectures (Swin, ConvNeXt, and Uni
 | UniFormer-B | 256x192 | 53.5M | 9.2G | 75.0 | 90.6 | 83.0 | 80.4 | 67.8 | 77.7 | [config](https://github.com/Westlake-AI/MogaNet/tree/main/pose_estimation/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/uniformer_b_coco_256x192.py) | [log](https://github.com/Westlake-AI/MogaNet/releases/download/moganet-pose-weights/uniformer_b_coco_256x192.log.json) \| [model](https://github.com/Westlake-AI/MogaNet/releases/download/moganet-pose-weights/uniformer_b_coco_256x192.pth) |
 | UniFormer-B | 384x288 | 53.5M | 14.8G | 76.7 | 90.8 | 84.0 | 81.4 | 69.3 | 79.7 | [config](https://github.com/Westlake-AI/MogaNet/tree/main/pose_estimation/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/uniformer_b_coco_384x288.py) | [log](https://github.com/Westlake-AI/MogaNet/releases/download/moganet-pose-weights/uniformer_b_coco_384x288.log.json) \| [model](https://github.com/Westlake-AI/MogaNet/releases/download/moganet-pose-weights/uniformer_b_coco_384x288.pth) |
 
+## Demo
+
+We provide demos following [MMPose](https://github.com/open-mmlab/mmpose/demo). Please use [inference_demo](./demo/inference_demo.ipynb) or run the following script:
+```bash
+cd demo
+python top_down_img_demo.py path/to/config path/to/checkpoint --img-root coco2017_val --json-file ../data/coco/annotations/person_keypoints_val2017.json --show
+```
+
 ## Training
 
 We train the model on a single node with 8 GPUs by default (a batch size of 32 $\times$ 8 for Top-Down). Start training with the config as:
````
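For a single image with a known person box, the same top-down flow can be scripted against the MMPose 0.x API that the demo script further below uses. This is a minimal sketch; the config, checkpoint, image path, and bbox values are placeholders.

```python
# Minimal sketch of top-down pose inference via the MMPose 0.x API (mirrors the
# demo script added in this commit). All paths and the bbox are placeholders.
from mmpose.apis import (inference_top_down_pose_model, init_pose_model,
                         vis_pose_result)

import sys
sys.path.append('../../')
import models  # noqa: F401  register MogaNet backbones with mmpose

pose_model = init_pose_model('path/to/config.py', 'path/to/checkpoint.pth',
                             device='cuda:0')
dataset = pose_model.cfg.data['test']['type']

# one person box in 'xywh' format (x, y, width, height)
person_results = [{'bbox': [50, 50, 200, 400]}]

pose_results, _ = inference_top_down_pose_model(
    pose_model,
    'path/to/image.jpg',
    person_results,
    bbox_thr=None,
    format='xywh',
    dataset=dataset)

vis_pose_result(
    pose_model,
    'path/to/image.jpg',
    pose_results,
    dataset=dataset,
    kpt_score_thr=0.3,
    out_file='vis_result.jpg')
```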

pose_estimation/demo/inference_demo.ipynb

+217
Large diffs are not rendered by default.
+134

@@ -0,0 +1,134 @@ (new file)

```python
# Copyright (c) OpenMMLab. All rights reserved.
import os
import warnings
from argparse import ArgumentParser

import mmcv
from xtcocotools.coco import COCO

from mmpose.apis import (inference_top_down_pose_model, init_pose_model,
                         vis_pose_result)
from mmpose.datasets import DatasetInfo

import sys
sys.path.append('../../')
import models  # register_model for MogaNet


def main():
    """Visualize the demo images.

    Require the json_file containing boxes.
    """
    parser = ArgumentParser()
    parser.add_argument('pose_config', help='Config file for detection')
    parser.add_argument('pose_checkpoint', help='Checkpoint file')
    parser.add_argument('--img-root', type=str, default='', help='Image root')
    parser.add_argument(
        '--json-file',
        type=str,
        default='',
        help='Json file containing image info.')
    parser.add_argument(
        '--show',
        action='store_true',
        default=False,
        help='whether to show img')
    parser.add_argument(
        '--out-img-root',
        type=str,
        default='',
        help='Root of the output img file. '
        'Default not saving the visualization images.')
    parser.add_argument(
        '--device', default='cuda:0', help='Device used for inference')
    parser.add_argument(
        '--kpt-thr', type=float, default=0.3, help='Keypoint score threshold')
    parser.add_argument(
        '--radius',
        type=int,
        default=4,
        help='Keypoint radius for visualization')
    parser.add_argument(
        '--thickness',
        type=int,
        default=1,
        help='Link thickness for visualization')

    args = parser.parse_args()

    assert args.show or (args.out_img_root != '')

    coco = COCO(args.json_file)
    # build the pose model from a config file and a checkpoint file
    pose_model = init_pose_model(
        args.pose_config, args.pose_checkpoint, device=args.device.lower())

    dataset = pose_model.cfg.data['test']['type']
    dataset_info = pose_model.cfg.data['test'].get('dataset_info', None)
    if dataset_info is None:
        warnings.warn(
            'Please set `dataset_info` in the config.'
            'Check https://github.com/open-mmlab/mmpose/pull/663 for details.',
            DeprecationWarning)
    else:
        dataset_info = DatasetInfo(dataset_info)

    img_keys = list(coco.imgs.keys())

    # optional
    return_heatmap = False

    # e.g. use ('backbone', ) to return backbone feature
    output_layer_names = None

    # process each image
    for i in mmcv.track_iter_progress(range(len(img_keys))):
        # get bounding box annotations
        image_id = img_keys[i]
        image = coco.loadImgs(image_id)[0]
        image_name = os.path.join(args.img_root, image['file_name'])
        ann_ids = coco.getAnnIds(image_id)

        # make person bounding boxes
        person_results = []
        for ann_id in ann_ids:
            person = {}
            ann = coco.anns[ann_id]
            # bbox format is 'xywh'
            person['bbox'] = ann['bbox']
            person_results.append(person)

        # test a single image, with a list of bboxes
        pose_results, returned_outputs = inference_top_down_pose_model(
            pose_model,
            image_name,
            person_results,
            bbox_thr=None,
            format='xywh',
            dataset=dataset,
            dataset_info=dataset_info,
            return_heatmap=return_heatmap,
            outputs=output_layer_names)

        if args.out_img_root == '':
            out_file = None
        else:
            os.makedirs(args.out_img_root, exist_ok=True)
            out_file = os.path.join(args.out_img_root, f'vis_{i}.jpg')

        vis_pose_result(
            pose_model,
            image_name,
            pose_results,
            dataset=dataset,
            dataset_info=dataset_info,
            kpt_score_thr=args.kpt_thr,
            radius=args.radius,
            thickness=args.thickness,
            show=args.show,
            out_file=out_file)


if __name__ == '__main__':
    main()
```
