-
@lostmsu, thanks for your question. Sorry, ZeRO-Infinity does not currently work on Windows. Are you currently running DeepSpeed on Windows?
-
@tjruwase I am. I also just improved Windows support in OpenAI's Triton, so their next public release might have it.
-
@tjruwase Can you point me to the part of the code that is responsible for Infinity offloading? I might be able to port it. Does it depend on libaio?
-
Below is the output of your script on a DGX-2 box. The temp-12G file exists on an NVMe device with peak GB/sec: read=14 and write=12.4.

run 0 time: 11.538774251937866s write speed: 1064.9311384124103MB/s
run 1 time: 12.798313856124878s write speed: 960.1264774515076MB/s
run 2 time: 12.075251340866089s write speed: 1017.6185698440834MB/s
run 3 time: 11.411003351211548s write speed: 1076.855349332216MB/s
run 0 time: 1.5713260173797607s read speed: 7820.146719450783MB/s
run 1 time: 1.664602518081665s read speed: 7381.942455644629MB/s
run 2 time: 1.7217073440551758s read speed: 7137.101460610488MB/s
run 3 time: 1.7196314334869385s read speed: 7145.7172512154675MB/s

As you can see, the reads are about 7X faster than the writes, which suggests to me that the reads are actually hitting the buffer cache rather than the disk. Thus, it seems to me that the writes are more reflective of the sustainable I/O performance, which is roughly 1 GB/sec.
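For readers who want to reproduce this kind of measurement, here is a minimal sketch of a timing script. It is not the script discussed above, which is not shown in this thread; the function names (`throughput_mb_s`, `time_write`, `time_read`, `run_benchmark`) are hypothetical. It assumes 1 MB = 1024 * 1024 bytes, which is consistent with the reported numbers if the file is 12 GiB. The write path calls `os.fsync` so the timing includes the flush to the device, while the read path is a plain read that the OS may serve from its cache, which is the effect described above.

```python
import os
import tempfile
import time


def throughput_mb_s(num_bytes, seconds):
    """Throughput in MB/s, taking 1 MB = 1024 * 1024 bytes (an assumption)."""
    return num_bytes / (1024 * 1024) / seconds


def time_write(path, data):
    """Write data and fsync, so the elapsed time includes the flush to the device."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


def time_read(path):
    """Read the whole file back; the OS may serve this from its buffer cache."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        num_bytes = len(f.read())
    return num_bytes, time.perf_counter() - start


def run_benchmark(size_bytes=16 * 1024 * 1024, runs=4, directory=None):
    """Time `runs` writes followed by `runs` reads of one temporary file."""
    data = os.urandom(size_bytes)
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    results = {"write": [], "read": []}
    try:
        for _ in range(runs):
            results["write"].append(throughput_mb_s(size_bytes, time_write(path, data)))
        for _ in range(runs):
            num_bytes, elapsed = time_read(path)
            results["read"].append(throughput_mb_s(num_bytes, elapsed))
    finally:
        os.remove(path)
    return results


if __name__ == "__main__":
    for kind, speeds in run_benchmark().items():
        for i, speed in enumerate(speeds):
            print(f"run {i} {kind} speed: {speed:.1f}MB/s")
```

Pointing `directory` at the NVMe mount and using a file larger than system memory would make the read numbers less likely to reflect the buffer cache, though this sketch does nothing to bypass the cache explicitly.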
-
You might also be interested in the following discussion.
-
I also can't find any tutorial on how to use it.