Npu test #10358

Open: wants to merge 5 commits into base: master

zkyseu commented Nov 26, 2023

Modifications to the OneFlow source code for the Huawei Ascend 910, so that it is compatible with the machine.

CLAassistant commented Nov 26, 2023

CLA assistant check
All committers have signed the CLA.

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.4ms (= 4340.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.5ms (= 5747.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.5ms / 43.4ms)

OneFlow resnet50 time: 26.0ms (= 2604.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.1ms (= 3807.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.46 (= 38.1ms / 26.0ms)

OneFlow resnet50 time: 18.6ms (= 3718.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 34.8ms (= 6967.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.87 (= 34.8ms / 18.6ms)

OneFlow resnet50 time: 17.9ms (= 3580.8ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.2ms (= 6236.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.74 (= 31.2ms / 17.9ms)

OneFlow resnet50 time: 17.4ms (= 3480.0ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.7ms (= 5932.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.70 (= 29.7ms / 17.4ms)

OneFlow swin dataloader time: 0.200s (= 40.026s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.421s / 200, num_workers=1)
Relative speed: 0.635 (= 0.127s / 0.200s)

OneFlow swin dataloader time: 0.056s (= 11.150s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.559s / 200, num_workers=4)
Relative speed: 0.588 (= 0.033s / 0.056s)

OneFlow swin dataloader time: 0.031s (= 6.135s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.308s / 200, num_workers=8)
Relative speed: 0.539 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 47.7ms (= 4767.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 62.9ms (= 6285.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 62.9ms / 47.7ms)

OneFlow resnet50 time: 31.3ms (= 3131.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.2ms (= 4621.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.48 (= 46.2ms / 31.3ms)

OneFlow resnet50 time: 23.8ms (= 4753.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 43.0ms (= 8592.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.81 (= 43.0ms / 23.8ms)

OneFlow resnet50 time: 21.5ms (= 4300.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 37.6ms (= 7525.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.75 (= 37.6ms / 21.5ms)

OneFlow resnet50 time: 20.8ms (= 4166.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.1ms (= 7019.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 35.1ms / 20.8ms)
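
The "Relative speed" figures above can be reproduced from the raw totals in each pair of lines. The following is a minimal sketch, assuming the ratio is computed from the unrounded per-iteration means (which matches the reported values); the function names are hypothetical and are not taken from the CI script.

```python
def mean_time(total: float, iterations: int) -> float:
    """Average time per iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(pytorch_total: float, oneflow_total: float, iterations: int) -> float:
    """PyTorch mean time divided by OneFlow mean time.

    Assumption: both runs use the same iteration count, as in the stats above.
    A value above 1 means OneFlow was faster; below 1 means PyTorch was faster.
    """
    return mean_time(pytorch_total, iterations) / mean_time(oneflow_total, iterations)


if __name__ == "__main__":
    # resnet50, input_shape=[16, 3, 224, 224]: totals in ms over 100 iterations
    print(f"{relative_speed(5747.9, 4340.4, 100):.2f}")  # prints 1.32
    # swin dataloader, num_workers=1: totals in s over 200 iterations
    print(f"{relative_speed(25.421, 40.026, 200):.3f}")  # prints 0.635
```

Read this way, the resnet50 rows show OneFlow ahead of PyTorch, while the swin dataloader rows (ratios below 1) show it behind.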

yuanms2 commented Nov 27, 2023

So this means it can be compiled on Ascend machines, but it can't run yet, right?

.gitignore (outdated, resolved)
cmake/caches/cn/cpu.cmake (outdated, resolved)
zkyseu commented Nov 27, 2023

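> So this means it can be compiled on Ascend machines, but it can't run yet, right?

@yuanms2 In current testing, OneFlow Lite runs inference correctly, but model training with OneFlow has not been tested.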

hjchen2 enabled auto-merge (squash) November 27, 2023 01:58

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 44.0ms (= 4399.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 61.5ms (= 6149.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.40 (= 61.5ms / 44.0ms)

OneFlow resnet50 time: 26.5ms (= 2652.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.4ms (= 3738.6ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.41 (= 37.4ms / 26.5ms)

OneFlow resnet50 time: 18.5ms (= 3696.8ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.5ms (= 7292.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.97 (= 36.5ms / 18.5ms)

OneFlow resnet50 time: 17.6ms (= 3514.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 30.8ms (= 6164.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.75 (= 30.8ms / 17.6ms)

OneFlow resnet50 time: 17.0ms (= 3405.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.4ms (= 5683.6ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.67 (= 28.4ms / 17.0ms)

OneFlow swin dataloader time: 0.200s (= 40.057s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.423s / 200, num_workers=1)
Relative speed: 0.635 (= 0.127s / 0.200s)

OneFlow swin dataloader time: 0.054s (= 10.824s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.495s / 200, num_workers=4)
Relative speed: 0.600 (= 0.032s / 0.054s)

OneFlow swin dataloader time: 0.030s (= 6.050s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.429s / 200, num_workers=8)
Relative speed: 0.567 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 47.7ms (= 4766.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.0ms (= 6499.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 65.0ms / 47.7ms)

OneFlow resnet50 time: 32.0ms (= 3204.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 44.0ms (= 4404.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 44.0ms / 32.0ms)

OneFlow resnet50 time: 23.7ms (= 4738.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.3ms (= 8265.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 41.3ms / 23.7ms)

OneFlow resnet50 time: 21.1ms (= 4224.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.1ms (= 7228.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.71 (= 36.1ms / 21.1ms)

OneFlow resnet50 time: 20.7ms (= 4134.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 34.2ms (= 6830.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.65 (= 34.2ms / 20.7ms)

CI failed when running job: cuda-speed-test. PR label automerge has been removed

hjchen2 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot November 27, 2023 10:54

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.5ms (= 4348.6ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.0ms (= 5700.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.31 (= 57.0ms / 43.5ms)

OneFlow resnet50 time: 26.5ms (= 2651.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.2ms (= 3820.5ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.44 (= 38.2ms / 26.5ms)

OneFlow resnet50 time: 19.1ms (= 3824.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.7ms (= 7144.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.87 (= 35.7ms / 19.1ms)

OneFlow resnet50 time: 17.6ms (= 3524.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 30.9ms (= 6177.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.75 (= 30.9ms / 17.6ms)

OneFlow resnet50 time: 17.8ms (= 3550.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.3ms (= 5659.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.59 (= 28.3ms / 17.8ms)

OneFlow swin dataloader time: 0.200s (= 40.090s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.618s / 200, num_workers=1)
Relative speed: 0.639 (= 0.128s / 0.200s)

OneFlow swin dataloader time: 0.055s (= 10.909s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.526s / 200, num_workers=4)
Relative speed: 0.598 (= 0.033s / 0.055s)

OneFlow swin dataloader time: 0.030s (= 6.023s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.375s / 200, num_workers=8)
Relative speed: 0.560 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 47.8ms (= 4778.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.1ms (= 6410.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 64.1ms / 47.8ms)

OneFlow resnet50 time: 33.2ms (= 3322.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 48.1ms (= 4810.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.45 (= 48.1ms / 33.2ms)

OneFlow resnet50 time: 23.5ms (= 4699.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 41.3ms (= 8259.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 41.3ms / 23.5ms)

OneFlow resnet50 time: 20.7ms (= 4141.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.1ms (= 7221.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 36.1ms / 20.7ms)

OneFlow resnet50 time: 20.3ms (= 4050.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 34.0ms (= 6797.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 34.0ms / 20.3ms)
