Training error #114

Open
alexiycv opened this issue Aug 16, 2022 · 5 comments

Comments

@alexiycv

λ f02d1b16ca1e /home/PLSC mkdir -p ./dataset/
λ f02d1b16ca1e /home/PLSC tar -xzf MS1M_v3_One_Sample.tgz -C ./dataset/

λ f02d1b16ca1e /home/PLSC
λ f02d1b16ca1e /home/PLSC python plsc/data/dataset/tools/lfw_style_bin_dataset_converter.py --bin_path ./dataset/MS1M_v3_One_Sample/agedb_30.bin --out_dir ./dataset/MS1M_v3_One_Sample/agedb_30/ --flip_test
convert 6000 pair images.
plsc/data/dataset/tools/lfw_style_bin_dataset_converter.py:66: DeprecationWarning: FLIP_LEFT_RIGHT is deprecated and will be removed in Pillow 10 (2023-07-01). Use Transpose.FLIP_LEFT_RIGHT instead.
img1 = img1.transpose(Image.FLIP_LEFT_RIGHT)
plsc/data/dataset/tools/lfw_style_bin_dataset_converter.py:73: DeprecationWarning: FLIP_LEFT_RIGHT is deprecated and will be removed in Pillow 10 (2023-07-01). Use Transpose.FLIP_LEFT_RIGHT instead.
img2 = img2.transpose(Image.FLIP_LEFT_RIGHT)
convert 6000 pair horizontal flip images.
λ f02d1b16ca1e /home/PLSC export CUDA_VISIBLE_DEVICES=0
λ f02d1b16ca1e /home/PLSC python tools/train.py -c ./plsc/configs/FaceRecognition/IResNet50_MS1MV3OneSample_ArcFace_0.1_1n8c_dp_fp32.yaml
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/home/PLSC/plsc/data/preprocess/timm_autoaugment.py:38: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
/home/PLSC/plsc/data/preprocess/timm_autoaugment.py:38: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
_RANDOM_INTERPOLATION = (Image.BILINEAR, Image.BICUBIC)
[2022/08/16 02:58:22] plsc INFO: DataLoader :
[2022/08/16 02:58:22] plsc INFO: Eval :
[2022/08/16 02:58:22] plsc INFO: dataset :
[2022/08/16 02:58:22] plsc INFO: cls_label_path : ./dataset/MS1M_v3_One_Sample/agedb_30/label.txt
[2022/08/16 02:58:22] plsc INFO: image_root : ./dataset/MS1M_v3_One_Sample/agedb_30
[2022/08/16 02:58:22] plsc INFO: name : FaceVerificationDataset
[2022/08/16 02:58:22] plsc INFO: transform_ops :
[2022/08/16 02:58:22] plsc INFO: DecodeImage :
[2022/08/16 02:58:22] plsc INFO: channel_first : False
[2022/08/16 02:58:22] plsc INFO: to_rgb : True
[2022/08/16 02:58:22] plsc INFO: NormalizeImage :
[2022/08/16 02:58:22] plsc INFO: mean : [0.5, 0.5, 0.5]
[2022/08/16 02:58:22] plsc INFO: order :
[2022/08/16 02:58:22] plsc INFO: scale : 1.0/255.0
[2022/08/16 02:58:22] plsc INFO: std : [0.5, 0.5, 0.5]
[2022/08/16 02:58:22] plsc INFO: ToCHWImage : None
[2022/08/16 02:58:22] plsc INFO: loader :
[2022/08/16 02:58:22] plsc INFO: num_workers : 0
[2022/08/16 02:58:22] plsc INFO: use_shared_memory : True
[2022/08/16 02:58:22] plsc INFO: sampler :
[2022/08/16 02:58:22] plsc INFO: batch_size : 128
[2022/08/16 02:58:22] plsc INFO: drop_last : False
[2022/08/16 02:58:22] plsc INFO: name : BatchSampler
[2022/08/16 02:58:22] plsc INFO: shuffle : False
[2022/08/16 02:58:22] plsc INFO: Train :
[2022/08/16 02:58:22] plsc INFO: dataset :
[2022/08/16 02:58:22] plsc INFO: cls_label_path : ./dataset/MS1M_v3_One_Sample/label.txt
[2022/08/16 02:58:22] plsc INFO: image_root : ./dataset/MS1M_v3_One_Sample/
[2022/08/16 02:58:22] plsc INFO: name : FaceIdentificationDataset
[2022/08/16 02:58:22] plsc INFO: transform_ops :
[2022/08/16 02:58:22] plsc INFO: DecodeImage :
[2022/08/16 02:58:22] plsc INFO: channel_first : False
[2022/08/16 02:58:22] plsc INFO: to_rgb : True
[2022/08/16 02:58:22] plsc INFO: RandFlipImage :
[2022/08/16 02:58:22] plsc INFO: flip_code : 1
[2022/08/16 02:58:22] plsc INFO: NormalizeImage :
[2022/08/16 02:58:22] plsc INFO: mean : [0.5, 0.5, 0.5]
[2022/08/16 02:58:22] plsc INFO: order :
[2022/08/16 02:58:22] plsc INFO: scale : 1.0/255.0
[2022/08/16 02:58:22] plsc INFO: std : [0.5, 0.5, 0.5]
[2022/08/16 02:58:22] plsc INFO: ToCHWImage : None
[2022/08/16 02:58:22] plsc INFO: loader :
[2022/08/16 02:58:22] plsc INFO: num_workers : 8
[2022/08/16 02:58:22] plsc INFO: use_shared_memory : True
[2022/08/16 02:58:22] plsc INFO: sampler :
[2022/08/16 02:58:22] plsc INFO: batch_size : 128
[2022/08/16 02:58:22] plsc INFO: drop_last : False
[2022/08/16 02:58:22] plsc INFO: name : DistributedBatchSampler
[2022/08/16 02:58:22] plsc INFO: shuffle : True
[2022/08/16 02:58:22] plsc INFO: DistributedStrategy :
[2022/08/16 02:58:22] plsc INFO: data_parallel : True
[2022/08/16 02:58:22] plsc INFO: Export :
[2022/08/16 02:58:22] plsc INFO: export_type : onnx
[2022/08/16 02:58:22] plsc INFO: input_shape : ['None', 3, 112, 112]
[2022/08/16 02:58:22] plsc INFO: Global :
[2022/08/16 02:58:22] plsc INFO: accum_steps : 1
[2022/08/16 02:58:22] plsc INFO: checkpoint : None
[2022/08/16 02:58:22] plsc INFO: device : gpu
[2022/08/16 02:58:22] plsc INFO: distributed : False
[2022/08/16 02:58:22] plsc INFO: epochs : 25
[2022/08/16 02:58:22] plsc INFO: eval_during_train : True
[2022/08/16 02:58:22] plsc INFO: eval_func : face_verification_eval
[2022/08/16 02:58:22] plsc INFO: eval_interval : 200
[2022/08/16 02:58:22] plsc INFO: eval_unit : step
[2022/08/16 02:58:22] plsc INFO: max_num_latest_checkpoint : 0
[2022/08/16 02:58:22] plsc INFO: output_dir : ./output/
[2022/08/16 02:58:22] plsc INFO: pretrained_model : None
[2022/08/16 02:58:22] plsc INFO: print_batch_step : 10
[2022/08/16 02:58:22] plsc INFO: rank : 0
[2022/08/16 02:58:22] plsc INFO: save_interval : 1
[2022/08/16 02:58:22] plsc INFO: seed : 2022
[2022/08/16 02:58:22] plsc INFO: task_type : recognition
[2022/08/16 02:58:22] plsc INFO: train_epoch_func : defualt_train_one_epoch
[2022/08/16 02:58:22] plsc INFO: use_visualdl : True
[2022/08/16 02:58:22] plsc INFO: world_size : 1
[2022/08/16 02:58:22] plsc INFO: LRScheduler :
[2022/08/16 02:58:22] plsc INFO: boundaries : [10, 16, 22]
[2022/08/16 02:58:22] plsc INFO: decay_unit : epoch
[2022/08/16 02:58:22] plsc INFO: name : Step
[2022/08/16 02:58:22] plsc INFO: values : [0.2, 0.02, 0.002, 0.0002]
[2022/08/16 02:58:22] plsc INFO: Loss :
[2022/08/16 02:58:22] plsc INFO: Train :
[2022/08/16 02:58:22] plsc INFO: MarginLoss :
[2022/08/16 02:58:22] plsc INFO: m1 : 1.0
[2022/08/16 02:58:22] plsc INFO: m2 : 0.5
[2022/08/16 02:58:22] plsc INFO: m3 : 0.0
[2022/08/16 02:58:22] plsc INFO: model_parallel : False
[2022/08/16 02:58:22] plsc INFO: s : 64.0
[2022/08/16 02:58:22] plsc INFO: weight : 1.0
[2022/08/16 02:58:22] plsc INFO: Metric :
[2022/08/16 02:58:22] plsc INFO: Eval :
[2022/08/16 02:58:22] plsc INFO: LFWAcc :
[2022/08/16 02:58:22] plsc INFO: flip_test : True
[2022/08/16 02:58:22] plsc INFO: Model :
[2022/08/16 02:58:22] plsc INFO: class_num : 93431
[2022/08/16 02:58:22] plsc INFO: data_format : NCHW
[2022/08/16 02:58:22] plsc INFO: name : IResNet50
[2022/08/16 02:58:22] plsc INFO: num_features : 512
[2022/08/16 02:58:22] plsc INFO: pfc_config :
[2022/08/16 02:58:22] plsc INFO: model_parallel : False
[2022/08/16 02:58:22] plsc INFO: sample_ratio : 0.1
[2022/08/16 02:58:22] plsc INFO: Optimizer :
[2022/08/16 02:58:22] plsc INFO: grad_clip :
[2022/08/16 02:58:22] plsc INFO: always_clip : True
[2022/08/16 02:58:22] plsc INFO: clip_norm : 2.0
[2022/08/16 02:58:22] plsc INFO: clip_norm_max : 2.0
[2022/08/16 02:58:22] plsc INFO: name : ClipGradByGlobalNorm
[2022/08/16 02:58:22] plsc INFO: no_clip_list : ['partialfc']
[2022/08/16 02:58:22] plsc INFO: momentum : 0.9
[2022/08/16 02:58:22] plsc INFO: name : Momentum
[2022/08/16 02:58:22] plsc INFO: use_master_param : False
[2022/08/16 02:58:22] plsc INFO: weight_decay : 0.0005
[2022/08/16 02:58:22] plsc INFO: profiler_options : None
[2022/08/16 02:58:22] plsc INFO: train with paddle 2.3.1 and device Place(gpu:0)
[2022/08/16 02:58:22] plsc INFO: Loading dataset ./dataset/MS1M_v3_One_Sample/label.txt
[2022/08/16 02:58:23] plsc INFO: Load dataset finished, 93431 samples
W0816 02:58:23.308667 876 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.2, Runtime API Version: 11.2
W0816 02:58:23.372398 876 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
[2022/08/16 02:58:24] plsc INFO: Number of Parameters is 91.43M.
Traceback (most recent call last):
  File "tools/train.py", line 34, in <module>
    engine = Engine(config, mode="train")
  File "/home/PLSC/plsc/engine/engine.py", line 213, in __init__
    self.lr_scheduler, self.model)
  File "/home/PLSC/plsc/optimizer/__init__.py", line 60, in build_optimizer
    param_group[key] = get_fused_params(param_group[key])
  File "/home/PLSC/plsc/core/param_fuse.py", line 454, in get_fused_params
    var_groups = assign_group_by_size(params)
  File "/home/PLSC/plsc/core/param_fuse.py", line 391, in assign_group_by_size
    parameters, is_sparse_gradient, [group_size, group_size])
ValueError: (InvalidArgument) argument (position 1) must be list of Tensor, but got ParamBase at pos 0 (at /paddle/paddle/fluid/pybind/eager_utils.cc:240)

@GuoxiaWang
Collaborator

Which version of PaddlePaddle are you using? If you built it from the latest develop branch, on which day was it built?

You can get the commit hash with the following code:

import paddle
print(paddle.__git_commit__)

@alexiycv
Author

3cc6ae69ed93388b2648bcc819d593130dede752

@GuoxiaWang
Collaborator

Are you using the latest PLSC code?

I should be able to reproduce this by following the steps above, right? Let me try.

@GuoxiaWang
Collaborator

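I looked at your other issue, issue115, and I now know the cause.

You are most likely running the PLSC release 2.2 code with the latest paddlepaddle 2.3.

PLSC release 2.2 needs to be run with paddlepaddle 2.2, because paddlepaddle 2.3 was a major upgrade.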
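
The version pairing described above can be checked before launching training. What follows is only a minimal sketch, not part of PLSC: the names `REQUIRED_PADDLE`, `major_minor`, and `check_paddle_for_plsc` are hypothetical, and the table contains only the one pairing stated in this thread (PLSC release 2.2 with paddlepaddle 2.2).

```python
import re
from typing import Tuple

# Hypothetical compatibility table. It holds only the pairing stated in this
# thread: PLSC release 2.2 is meant to run on paddlepaddle 2.2.
REQUIRED_PADDLE = {"2.2": (2, 2)}


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.3.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_paddle_for_plsc(plsc_release: str, paddle_version: str) -> bool:
    """Return True if paddle_version matches the pairing for plsc_release."""
    return major_minor(paddle_version) == REQUIRED_PADDLE[plsc_release]


if __name__ == "__main__":
    # The log in this issue reports "train with paddle 2.3.1".
    print(check_paddle_for_plsc("2.2", "2.3.1"))
```

With paddle installed, the second argument could be taken from `paddle.__version__` instead of a literal string.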

@alexiycv
Author

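> I looked at your other issue, issue115, and I now know the cause.
>
> You are most likely running the PLSC release 2.2 code with the latest paddlepaddle 2.3.
>
> PLSC release 2.2 needs to be run with paddlepaddle 2.2, because paddlepaddle 2.3 was a major upgrade.

OK, thanks. I followed the tutorial step by step, and it still would not run.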
