Commit cdad798

First commit

113 files changed, +12769 −0 lines changed
Finetune_image_mode.md

+135

# Requirement

Main requirement:

* PyTorch 0.4.1

If you are missing any package, install it with `pip install missing_package`.

# Data split

Change the following parameters:

* `skiprows`: number of rows to skip.
Ex: if you want to skip 30k samples, use `skiprows=(1, 30000)`.
* `nrows`: number of samples per class.
Default: 50,000 samples per class.
* `root_csv`: directory of your `train_simplified` folder.
Ex:
`/media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/csv/train_simplified/`
* `split_csv`: directory where the split data will be saved.
Ex:
`/media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/50k/`

Run:

`python split_data_top.py`

Output:

340 CSV files each for `train` and `valid` are saved under your `split_csv`; each CSV file contains `nrows` samples. A rough sketch of this step is shown below.
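As a minimal sketch of what such a split could look like (this is not the actual `split_data_top.py`; the loop, the output paths, the validation tail size, and the `range` interpretation of `skiprows=(1, 30000)` are all assumptions):

```python
import glob
import os

import pandas as pd

root_csv = "/path/to/train_simplified/"  # per-class Quick Draw CSVs
split_csv = "/path/to/50k/"              # output directory (assumed layout)
nrows = 50000                            # samples kept per class
skiprows = range(1, 30001)               # assumed meaning of skiprows=(1, 30000)

for path in glob.glob(os.path.join(root_csv, "*.csv")):
    # Row 0 is the header, so skipping rows 1..30000 drops the first 30k samples.
    df = pd.read_csv(path, skiprows=skiprows, nrows=nrows)
    name = os.path.basename(path)
    # Hypothetical split: hold out a small tail of each class for validation.
    df.iloc[:-2000].to_csv(os.path.join(split_csv, "train", name), index=False)
    df.iloc[-2000:].to_csv(os.path.join(split_csv, "valid", name), index=False)
```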
# Run model

* Configure `train.yml`

In this file, change the `main parameters` as follows:

* `train_split`
Path to the `train` folder: `{split_csv}/train`
* `train_token`
Not important, but keep it the same as `train_split`
* `valid_split`
Path to the `valid` folder: `{split_csv}/valid`
* `valid_token`
Not important, but keep it the same as `valid_split`

You can change the other parameters (`workers`, `batch_size`, ...) to suit your environment. A sketch of these keys follows this list.
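A minimal sketch of the keys described above, assuming a flat layout (the actual `train.yml` in this repo may nest or name them differently, and the `workers`/`batch_size` values are placeholders):

```python
import yaml

# Hypothetical flat layout; verify against the train.yml shipped with this repo.
config = {
    "train_split": "/path/to/50k/train",
    "train_token": "/path/to/50k/train",   # keep identical to train_split
    "valid_split": "/path/to/50k/valid",
    "valid_token": "/path/to/50k/valid",   # keep identical to valid_split
    "workers": 8,                          # tune for your CPU
    "batch_size": 128,                     # tune for your GPU memory
}
print(yaml.safe_dump(config))
```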
* Run

```bash
bash run_model.sh
```

Logs and checkpoints will be saved to `./logs/se_resnext101_50k`. Change this as you like.
# Predict model

* Configure `inference.yml`

In this file, change:

* `infer_csv`
Path to your `test_simplified.csv` file

* Run

```bash
bash predict_5best.sh
```

We save multiple checkpoints (snapshots) during training. Ensembling the 5 best checkpoints gives a free ~0.0005 LB boost; a sketch follows.
The outputs are the `logits`, saved into the `log` folder you defined above.
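What the 5-checkpoint ensembling amounts to, as a minimal sketch (the file names are hypothetical; `predict_5best.sh` determines the real ones, and each array is assumed to have shape `(n_samples, 340)`):

```python
import numpy as np

# Hypothetical logit files, one per checkpoint.
logit_files = [f"./logs/se_resnext101_50k/logits.top{i}.npy" for i in range(5)]

# Average logits across checkpoints, then take the top-3 classes per sample
# (the competition metric is MAP@3).
ensembled = np.mean([np.load(f) for f in logit_files], axis=0)
top3 = np.argsort(-ensembled, axis=1)[:, :3]
```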

# Predict dataset for cleaning

* Configure `inference.yml`

In this file, change:

* `infer_csv`
Comment out this line
* `data_clean_train`
Path to the train data you want to clean
* `data_clean_valid`
Path to the valid data you want to clean

* Run

```bash
bash predict_data_for_clean.sh
```

Point `LOGDIR` at the best checkpoint of the model you use for cleaning the data.
Ex: `LOGDIR=$(pwd)/logs/clean_model_2_resnet34/`

# Clean data

In this file, change the following parameters to match your environment (a hedged sketch of the cleaning idea follows the list):

* `data_clean_train`
Path to the `train data` you want to clean.
Ex: `/media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/30k/data_2/train/`
* `data_clean_valid`
Path to the `valid data` you want to clean.
Ex: `/media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/30k/data_2/valid/`
* `data_clean_train_out`
Output of the train data after cleaning.
Ex: `/media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/30k/data_2/train/`
* `data_clean_valid_out`
Output of the valid data after cleaning.
Ex: `/media/ngxbac/Bac/competition/kaggle/competition_data/quickdraw/data/30k/data_2_cleannn/valid/`
* `data_train_predict`
Logit predictions for `data_clean_train` produced by the cleaning model.
Ex: `./logs/clean_model_1_resnet34/dataset.predictions.data_2_train.logits.satge1.5.npy`
* `data_valid_predict`
Logit predictions for `data_clean_valid` produced by the cleaning model.
Ex: `./logs/clean_model_1_resnet34/dataset.predictions.data_2_valid.logits.satge1.5.npy`
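The cleaning logic itself is not shown in this commit. One common logit-based approach, sketched here purely under assumptions (the CSV column name, the 0.9 threshold, and the paths are all hypothetical), keeps a sample unless the model confidently disagrees with its label:

```python
import numpy as np
import pandas as pd

logits = np.load("./logs/clean_model_1_resnet34/train_logits.npy")  # assumed (n, 340)
df = pd.read_csv("/path/to/data_2/train/airplane.csv")              # matching sample order

pred = logits.argmax(axis=1)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
conf = probs.max(axis=1) / probs.sum(axis=1)   # softmax confidence of the top class

# "label" is a hypothetical column of integer class ids.
keep = (pred == df["label"].values) | (conf < 0.9)
df[keep].to_csv("/path/to/data_2_clean/train/airplane.csv", index=False)
```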

# Make submission

```bash
python make_submission.py
```

Make sure you set the correct `log_dir` in `make_submission.py`.
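For reference, the submission format this script must produce is the standard Quick Draw one: the top-3 labels per drawing, space-separated, with spaces inside a label replaced by underscores. A hedged sketch (the file names are hypothetical, and `word_encoder` refers to the Label Encoder proposal in README.md):

```python
import numpy as np
import pandas as pd

ensembled = np.load("./logs/se_resnext101_50k/ensembled_logits.npy")  # hypothetical path
test = pd.read_csv("/path/to/test_simplified.csv")
classes = word_encoder.classes_            # 340 label names, see README.md

top3 = np.argsort(-ensembled, axis=1)[:, :3]
words = classes[top3]                      # fancy indexing -> (n, 3) label names
preds = [" ".join(w.replace(" ", "_") for w in row) for row in words]
pd.DataFrame({"key_id": test["key_id"], "word": preds}).to_csv(
    "submission.csv", index=False)
```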

# How to resume

Set `resume` in `train.yml` and run the `Run model` step again. Usually we resume from `checkpoint.best.pth.tar` in the `logs` folder.

### Supported architectures and models

#### From the [torchvision](https://github.com/pytorch/vision/) package:

- ResNet (`resnet18`, `resnet34`, `resnet50`, `resnet101`, `resnet152`)
- DenseNet (`densenet121`, `densenet169`, `densenet201`, `densenet161`)
- Inception v3 (`inception_v3`)
- VGG (`vgg11`, `vgg11_bn`, `vgg13`, `vgg13_bn`, `vgg16`, `vgg16_bn`, `vgg19`, `vgg19_bn`)
- SqueezeNet (`squeezenet1_0`, `squeezenet1_1`)
- AlexNet (`alexnet`)

#### From the [Pretrained models for PyTorch](https://github.com/Cadene/pretrained-models.pytorch) package:

- ResNeXt (`resnext101_32x4d`, `resnext101_64x4d`)
- NASNet-A Large (`nasnetalarge`)
- NASNet-A Mobile (`nasnetamobile`)
- Inception-ResNet v2 (`inceptionresnetv2`)
- Dual Path Networks (`dpn68`, `dpn68b`, `dpn92`, `dpn98`, `dpn131`, `dpn107`)
- Inception v4 (`inception_v4`)
- Xception (`xception`)
- Squeeze-and-Excitation Networks (`senet154`, `se_resnet50`, `se_resnet101`, `se_resnet152`, `se_resnext50_32x4d`, `se_resnext101_32x4d`)
- PNASNet-5-Large (`pnasnet5large`)
- PolyNet (`polynet`)
README.md

+124

# Summary of ideas (tentative); everyone, please contribute and give feedback

Please respect the Kaggle rules:
- Only submit to test new ideas or to probe the LB
- Or when there are unused submissions (do not create 2 accounts)
- Any external dataset or pre-trained weights must be posted in the forum

# Dataset: 340 classes
- Train: unbalanced distribution
- Test: (almost) balanced distribution

The test draws were collected from a different period and different locations,
so countries are useless, and time (in the train set) is difficult to exploit.

## Proposed split
- The train set was already shuffled; no need to re-shuffle it
- Keep the last 10K for blending (the blending set); please treat this subset as a test set, and do not use it even for validation in Level 0
- Number of draws per class:
  snowman     340029
  potato      329204
  calendar    321981
  ...
  ceiling fan 115413
  bed         113862
  panda       113613

# Approach
Due to the size of the data, it is fine to use blending instead of stacking.

## Level 0
Please keep the model weights (and the seeds), and produce probabilities for both the test set and the blending set!

### GrayImage-based models
If needed, an external dataset could be used here.
(Bac, please comment here!)

### ColorImage-based models
If needed, an external dataset could be used here.

### Stroke-based models
- LSTM
  Around LB 0.87 with 75K draws/class
- RANET (another type of LSTM)
  Around LB 0.87 with 75K draws/class
- Wavenet
  Around LB 0.87 with 75K draws/class
- ConvLSTM, where each timestep is an image of the drawing in progress, each one more complete than the last
  (idea from Hau)

## Level 1
- Feed the features from Level 0 into XGBoost, RF, NN (CNN); see the sketch below
- Extra features: https://www.kaggle.com/c/quickdraw-doodle-recognition/discussion/70680
- Statistical features from the raw data: number of strokes, number of points, ...
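A minimal sketch of that Level 1 step, under assumptions (file names, shapes, and the XGBoost hyperparameters are placeholders; in practice you would fit on the blending set with proper validation):

```python
import numpy as np
import xgboost as xgb

# Each Level 0 model is assumed to have saved (n_samples, 340) probabilities.
level0 = [np.load(f) for f in ["cnn_probs.npy", "lstm_probs.npy", "wavenet_probs.npy"]]
extra = np.load("stroke_stats.npy")   # e.g. number of strokes / points per draw
X = np.hstack(level0 + [extra])
y = np.load("labels.npy")             # integer class ids, 0..339

clf = xgb.XGBClassifier(n_estimators=200, max_depth=6, tree_method="hist")
clf.fit(X, y)
level1_probs = clf.predict_proba(X)   # these feed into Level 2 blending
```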

## Level 2
Weighted average of the predictions from Level 1 => one final submission

## Post-processing
Balance the distribution from Level 2 => one final submission
LB improvement: 0.005
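The balancing step is not spelled out here; a hedged sketch of one common approach (greedily assigning the most confident predictions first while capping each class at its expected share of the test set):

```python
import numpy as np

def balance_top1(probs: np.ndarray, quota: int) -> np.ndarray:
    """Greedy rebalancing of top-1 predictions.

    probs: (n_samples, n_classes) probabilities; quota: max picks per class.
    Illustrative O(n * c) loop, not optimized for the full test set.
    """
    n, c = probs.shape
    counts = np.zeros(c, dtype=int)
    labels = np.full(n, -1)
    flat = np.argsort(-probs, axis=None)   # most confident (sample, class) pairs first
    for idx in flat:
        i, k = divmod(int(idx), c)
        if labels[i] == -1 and counts[k] < quota:
            labels[i] = k
            counts[k] += 1
    return labels

# e.g. quota = int(np.ceil(n_test / 340)) for an (almost) balanced test set
```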

# Label Encoder proposal

Setting `classes_` directly pins the label order, so every model and script encodes the 340 classes the same way:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

word_encoder = LabelEncoder()
word_encoder.classes_ = np.array(['The Eiffel Tower', 'The Great Wall of China', 'The Mona Lisa',
    'airplane', 'alarm clock', 'ambulance', 'angel',
    'animal migration', 'ant', 'anvil', 'apple', 'arm', 'asparagus',
    'axe', 'backpack', 'banana', 'bandage', 'barn', 'baseball',
    'baseball bat', 'basket', 'basketball', 'bat', 'bathtub', 'beach',
    'bear', 'beard', 'bed', 'bee', 'belt', 'bench', 'bicycle',
    'binoculars', 'bird', 'birthday cake', 'blackberry', 'blueberry',
    'book', 'boomerang', 'bottlecap', 'bowtie', 'bracelet', 'brain',
    'bread', 'bridge', 'broccoli', 'broom', 'bucket', 'bulldozer',
    'bus', 'bush', 'butterfly', 'cactus', 'cake', 'calculator',
    'calendar', 'camel', 'camera', 'camouflage', 'campfire', 'candle',
    'cannon', 'canoe', 'car', 'carrot', 'castle', 'cat', 'ceiling fan',
    'cell phone', 'cello', 'chair', 'chandelier', 'church', 'circle',
    'clarinet', 'clock', 'cloud', 'coffee cup', 'compass', 'computer',
    'cookie', 'cooler', 'couch', 'cow', 'crab', 'crayon', 'crocodile',
    'crown', 'cruise ship', 'cup', 'diamond', 'dishwasher',
    'diving board', 'dog', 'dolphin', 'donut', 'door', 'dragon',
    'dresser', 'drill', 'drums', 'duck', 'dumbbell', 'ear', 'elbow',
    'elephant', 'envelope', 'eraser', 'eye', 'eyeglasses', 'face',
    'fan', 'feather', 'fence', 'finger', 'fire hydrant', 'fireplace',
    'firetruck', 'fish', 'flamingo', 'flashlight', 'flip flops',
    'floor lamp', 'flower', 'flying saucer', 'foot', 'fork', 'frog',
    'frying pan', 'garden', 'garden hose', 'giraffe', 'goatee',
    'golf club', 'grapes', 'grass', 'guitar', 'hamburger', 'hammer',
    'hand', 'harp', 'hat', 'headphones', 'hedgehog', 'helicopter',
    'helmet', 'hexagon', 'hockey puck', 'hockey stick', 'horse',
    'hospital', 'hot air balloon', 'hot dog', 'hot tub', 'hourglass',
    'house', 'house plant', 'hurricane', 'ice cream', 'jacket', 'jail',
    'kangaroo', 'key', 'keyboard', 'knee', 'ladder', 'lantern',
    'laptop', 'leaf', 'leg', 'light bulb', 'lighthouse', 'lightning',
    'line', 'lion', 'lipstick', 'lobster', 'lollipop', 'mailbox',
    'map', 'marker', 'matches', 'megaphone', 'mermaid', 'microphone',
    'microwave', 'monkey', 'moon', 'mosquito', 'motorbike', 'mountain',
    'mouse', 'moustache', 'mouth', 'mug', 'mushroom', 'nail',
    'necklace', 'nose', 'ocean', 'octagon', 'octopus', 'onion', 'oven',
    'owl', 'paint can', 'paintbrush', 'palm tree', 'panda', 'pants',
    'paper clip', 'parachute', 'parrot', 'passport', 'peanut', 'pear',
    'peas', 'pencil', 'penguin', 'piano', 'pickup truck',
    'picture frame', 'pig', 'pillow', 'pineapple', 'pizza', 'pliers',
    'police car', 'pond', 'pool', 'popsicle', 'postcard', 'potato',
    'power outlet', 'purse', 'rabbit', 'raccoon', 'radio', 'rain',
    'rainbow', 'rake', 'remote control', 'rhinoceros', 'river',
    'roller coaster', 'rollerskates', 'sailboat', 'sandwich', 'saw',
    'saxophone', 'school bus', 'scissors', 'scorpion', 'screwdriver',
    'sea turtle', 'see saw', 'shark', 'sheep', 'shoe', 'shorts',
    'shovel', 'sink', 'skateboard', 'skull', 'skyscraper',
    'sleeping bag', 'smiley face', 'snail', 'snake', 'snorkel',
    'snowflake', 'snowman', 'soccer ball', 'sock', 'speedboat',
    'spider', 'spoon', 'spreadsheet', 'square', 'squiggle', 'squirrel',
    'stairs', 'star', 'steak', 'stereo', 'stethoscope', 'stitches',
    'stop sign', 'stove', 'strawberry', 'streetlight', 'string bean',
    'submarine', 'suitcase', 'sun', 'swan', 'sweater', 'swing set',
    'sword', 't-shirt', 'table', 'teapot', 'teddy-bear', 'telephone',
    'television', 'tennis racquet', 'tent', 'tiger', 'toaster', 'toe',
    'toilet', 'tooth', 'toothbrush', 'toothpaste', 'tornado',
    'tractor', 'traffic light', 'train', 'tree', 'triangle',
    'trombone', 'truck', 'trumpet', 'umbrella', 'underwear', 'van',
    'vase', 'violin', 'washing machine', 'watermelon', 'waterslide',
    'whale', 'wheel', 'windmill', 'wine bottle', 'wine glass',
    'wristwatch', 'yoga', 'zebra', 'zigzag'], dtype=object)
```
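The array is in sorted order, matching what `LabelEncoder.fit` on the raw labels would produce, so transforms behave as usual. A quick round-trip check:

```python
idx = word_encoder.transform(["cat", "zigzag"])   # integer class ids
words = word_encoder.inverse_transform(idx)       # -> ['cat', 'zigzag']
```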

augmentation.py

+37

```python
from albumentations import *  # Compose, Resize, HorizontalFlip, etc.


def train_aug(p=1.0, image_size=256):
    # Training transform: currently just a resize; heavier augmentations
    # (noise, flips, rotation, normalization) are left disabled.
    return Compose([
        Resize(image_size, image_size),
        # GaussNoise(),
        # HorizontalFlip(0.5),
        # Rotate(limit=10),
        # Normalize(),
    ], p=p)


def valid_aug(p=1.0, image_size=256):
    # Validation transform: resize only, no augmentation.
    return Compose([
        Resize(image_size, image_size),
        # Normalize(),
    ], p=p)


def test_aug(p=1.0, image_size=256):
    # Test-time transform: resize only.
    return Compose([
        Resize(image_size, image_size),
        # Normalize(),
    ], p=p)


def test_tta(p=1.0, image_size=256):
    # Two-view test-time augmentation: a horizontally flipped view
    # plus the original orientation.
    return [
        Compose([
            Resize(image_size, image_size),
            HorizontalFlip(p=1),
        ], p=p),
        Compose([
            Resize(image_size, image_size),
        ], p=p),
    ]
```
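A hedged usage note: albumentations transforms take and return HxWxC numpy arrays via keyword arguments, so calling one of these looks like the following (the zero image is just a stand-in):

```python
import numpy as np

aug = train_aug(image_size=256)
image = np.zeros((128, 128, 3), dtype=np.uint8)  # stand-in for a rendered drawing
resized = aug(image=image)["image"]              # -> shape (256, 256, 3)
```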

catalyst/CNAME

+1
scitator.io

catalyst/LICENSE

+21

MIT License

Copyright (c) 2018 Sergey Kolesnikov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

catalyst/README.md

+66

# Catalyst
High-level utils for PyTorch DL/RL research.
It was developed with a focus on reproducibility, fast experimentation, and reuse of code, ideas, and models.
The goal is to be able to research and develop something new, rather than writing yet another regular train loop.
Best coding practices included.

## Features

- Universal train/inference loop.
- Key-value storages.
- Data and model usage standardization.
- Configuration files for model/data hyperparameters.
- Loggers and Tensorboard support.
- Reproducibility – even the source code is saved.
- 1Cycle and LRFinder support.
- FP16 support.
- Corrected weight decay (AdamW).
- N-best-checkpoint saving (SWA).
- Training stages support.
- Logdir autonaming based on hyperparameters.
- Callbacks – reusable train/inference pipeline parts.
- Well structured and production friendly.
- Lots of reusable code for different purposes: losses, optimizers, models, kNNs, embeddings projector.

Catalyst is compatible with Python 3.6+ and PyTorch 0.4.1+.

Stable branch: `master`. Development branch: `dev`.

## Usage
```bash
git submodule add https://github.com/Scitator/catalyst.git catalyst
```

## Examples

https://github.com/Scitator/catalyst-examples

## Dependencies
```bash
pip install git+https://github.com/pytorch/tnt.git@master \
    tensorboardX jpeg4py albumentations
```

## Docker

See `./docker` for more information and examples.

## Contribution guide

##### Autoformatting code

We use [yapf](https://github.com/google/yapf) for linting,
and the config file is located at `.style.yapf`.
We recommend running `yapf.sh` prior to pushing to format changed files.

##### Linter

To run the Python linter on a specific file,
run something like `flake8 dl/scripts/train.py`.
You may need to run `pip install flake8` first.

See `codestyle.md` for more information.

catalyst/__init__.py

Whitespace-only changes.
