Npu test #10358

Open
zkyseu wants to merge 5 commits into master from npu_test

zkyseu commented Nov 26, 2023

Modifications to the OneFlow source code for the Huawei Ascend 910, so that it is compatible with the machine.

CLAassistant commented Nov 26, 2023

CLA assistant check
All committers have signed the CLA.

@github-actions
Contributor

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.4ms (= 4340.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.5ms (= 5747.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.5ms / 43.4ms)

OneFlow resnet50 time: 26.0ms (= 2604.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.1ms (= 3807.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.46 (= 38.1ms / 26.0ms)

OneFlow resnet50 time: 18.6ms (= 3718.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 34.8ms (= 6967.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.87 (= 34.8ms / 18.6ms)

OneFlow resnet50 time: 17.9ms (= 3580.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.2ms (= 6236.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.74 (= 31.2ms / 17.9ms)

OneFlow resnet50 time: 17.4ms (= 3480.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.7ms (= 5932.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.70 (= 29.7ms / 17.4ms)

OneFlow swin dataloader time: 0.200s (= 40.026s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.421s / 200, num_workers=1)
Relative speed: 0.635 (= 0.127s / 0.200s)

OneFlow swin dataloader time: 0.056s (= 11.150s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.559s / 200, num_workers=4)
Relative speed: 0.588 (= 0.033s / 0.056s)

OneFlow swin dataloader time: 0.031s (= 6.135s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.308s / 200, num_workers=8)
Relative speed: 0.539 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 47.7ms (= 4767.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 62.9ms (= 6285.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 62.9ms / 47.7ms)

OneFlow resnet50 time: 31.3ms (= 3131.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.2ms (= 4621.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.48 (= 46.2ms / 31.3ms)

OneFlow resnet50 time: 23.8ms (= 4753.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 43.0ms (= 8592.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.81 (= 43.0ms / 23.8ms)

OneFlow resnet50 time: 21.5ms (= 4300.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 37.6ms (= 7525.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.75 (= 37.6ms / 21.5ms)

OneFlow resnet50 time: 20.8ms (= 4166.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.1ms (= 7019.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 35.1ms / 20.8ms)
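
The arithmetic behind each entry above can be reproduced with a minimal sketch. It assumes, based only on the numbers shown, that the per-iteration time is the total time divided by the iteration count and that the relative speed is the PyTorch time divided by the OneFlow time, taken before rounding. The function names are hypothetical and are not taken from the CI script.

```python
def per_iter_ms(total_ms: float, iters: int) -> float:
    """Mean time per iteration in milliseconds (total / iteration count)."""
    return total_ms / iters


def relative_speed(pytorch_total_ms: float, oneflow_total_ms: float, iters: int) -> float:
    """PyTorch time divided by OneFlow time; values above 1 mean OneFlow was faster.

    Assumption: the ratio is computed from unrounded per-iteration times,
    which matches the reported figures better than the rounded ones do.
    """
    return per_iter_ms(pytorch_total_ms, iters) / per_iter_ms(oneflow_total_ms, iters)


if __name__ == "__main__":
    # First entry of the report: input_shape=[16, 3, 224, 224], 100 iterations.
    oneflow = per_iter_ms(4340.4, 100)
    pytorch = per_iter_ms(5747.9, 100)
    ratio = relative_speed(5747.9, 4340.4, 100)
    print(f"OneFlow {oneflow:.1f}ms, PyTorch {pytorch:.1f}ms, relative speed {ratio:.2f}")
```

Working from the unrounded totals matters for the batch size 8 entry: 3807.0 / 2604.3 rounds to the reported 1.46, whereas dividing the rounded 38.1 by 26.0 would round to 1.47.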

Contributor

yuanms2 commented Nov 27, 2023

Does this mean it can be compiled on Ascend machines but cannot run yet?

Author

zkyseu commented Nov 27, 2023

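> Does this mean it can be compiled on Ascend machines but cannot run yet?

@yuanms2 In testing so far, OneFlow Lite runs inference normally, but model training with OneFlow has not been tested.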

hjchen2 enabled auto-merge (squash) November 27, 2023 01:58

@github-actions
Contributor

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 44.0ms (= 4399.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 61.5ms (= 6149.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.40 (= 61.5ms / 44.0ms)

OneFlow resnet50 time: 26.5ms (= 2652.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.4ms (= 3738.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.41 (= 37.4ms / 26.5ms)

OneFlow resnet50 time: 18.5ms (= 3696.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.5ms (= 7292.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.97 (= 36.5ms / 18.5ms)

OneFlow resnet50 time: 17.6ms (= 3514.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 30.8ms (= 6164.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.75 (= 30.8ms / 17.6ms)

OneFlow resnet50 time: 17.0ms (= 3405.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.4ms (= 5683.6ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.67 (= 28.4ms / 17.0ms)

OneFlow swin dataloader time: 0.200s (= 40.057s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.423s / 200, num_workers=1)
Relative speed: 0.635 (= 0.127s / 0.200s)

OneFlow swin dataloader time: 0.054s (= 10.824s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.495s / 200, num_workers=4)
Relative speed: 0.600 (= 0.032s / 0.054s)

OneFlow swin dataloader time: 0.030s (= 6.050s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.429s / 200, num_workers=8)
Relative speed: 0.567 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 47.7ms (= 4766.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.0ms (= 6499.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 65.0ms / 47.7ms)

OneFlow resnet50 time: 32.0ms (= 3204.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 44.0ms (= 4404.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 44.0ms / 32.0ms)

OneFlow resnet50 time: 23.7ms (= 4738.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.3ms (= 8265.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 41.3ms / 23.7ms)

OneFlow resnet50 time: 21.1ms (= 4224.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.1ms (= 7228.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.71 (= 36.1ms / 21.1ms)

OneFlow resnet50 time: 20.7ms (= 4134.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 34.2ms (= 6830.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.65 (= 34.2ms / 20.7ms)

@github-actions
Contributor

CI failed when running job: cuda-speed-test. The PR label automerge has been removed.

hjchen2 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot November 27, 2023 10:54

@github-actions
Contributor

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.5ms (= 4348.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.0ms (= 5700.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.31 (= 57.0ms / 43.5ms)

OneFlow resnet50 time: 26.5ms (= 2651.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.2ms (= 3820.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.44 (= 38.2ms / 26.5ms)

OneFlow resnet50 time: 19.1ms (= 3824.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.7ms (= 7144.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.87 (= 35.7ms / 19.1ms)

OneFlow resnet50 time: 17.6ms (= 3524.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 30.9ms (= 6177.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.75 (= 30.9ms / 17.6ms)

OneFlow resnet50 time: 17.8ms (= 3550.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.3ms (= 5659.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.59 (= 28.3ms / 17.8ms)

OneFlow swin dataloader time: 0.200s (= 40.090s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.618s / 200, num_workers=1)
Relative speed: 0.639 (= 0.128s / 0.200s)

OneFlow swin dataloader time: 0.055s (= 10.909s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.526s / 200, num_workers=4)
Relative speed: 0.598 (= 0.033s / 0.055s)

OneFlow swin dataloader time: 0.030s (= 6.023s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.375s / 200, num_workers=8)
Relative speed: 0.560 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 47.8ms (= 4778.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.1ms (= 6410.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 64.1ms / 47.8ms)

OneFlow resnet50 time: 33.2ms (= 3322.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 48.1ms (= 4810.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 48.1ms / 33.2ms)

OneFlow resnet50 time: 23.5ms (= 4699.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.3ms (= 8259.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 41.3ms / 23.5ms)

OneFlow resnet50 time: 20.7ms (= 4141.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.1ms (= 7221.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 36.1ms / 20.7ms)

OneFlow resnet50 time: 20.3ms (= 4050.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 34.0ms (= 6797.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 34.0ms / 20.3ms)

7 participants