Skip to content

Latest commit

 

History

History
54 lines (42 loc) · 1.56 KB

verification.md

File metadata and controls

54 lines (42 loc) · 1.56 KB

t = torch.rand(1024) print(t[1].dtype) # torch.float32

  • Default size 1024 * 4B = 4096B = 4KB

  • Task1: Vary TENSOR_SIZE from 256 (1KB) to 26214400 (100MB). Measure time.

  • Task2: Set TENSOR_SIZE to 2621440 (10MB). Vary world_size for 2,4,8,16 Measure time.

  • Task3: How do we measure bandwidth?

  • No information

Recursive halving/doubling relies on bidirectional exchange between a pair of nodes.

  • Output of BDE Reduce Scatter:
node0:~/allreduce-839> ./run_bde.sh
4
4
Staring rank 1
Staring rank 2
Staring rank 3

Rank 0 Before tensor([0, 1, 2, 3])
Rank 1 After tensor([16, 28,  6,  7])

Rank 2 Before tensor([ 8,  9, 10, 11])
Rank 2 After tensor([ 8,  9, 32, 14])

Rank 3 Before tensor([12, 13, 14, 15])
Rank 3 After tensor([12, 13, 20, 36])

Rank 0 After tensor([24, 10,  2,  3])
Rank 1 Before tensor([4, 5, 6, 7])

Output of All Gather

Rank 0 Before tensor([0, 1, 2, 3])
Rank 0 After Reduce Scatter tensor([24, 10,  2,  3])
Rank 0 After All Gather tensor([24, 28, 32, 36])
Rank 3 Before tensor([12, 13, 14, 15])
Rank 3 After Reduce Scatter tensor([12, 13, 20, 36])
Rank 3 After All Gather tensor([24, 28, 32, 36])
Rank 1 Before tensor([4, 5, 6, 7])
Rank 1 After Reduce Scatter tensor([16, 28,  6,  7])
Rank 1 After All Gather tensor([24, 28, 32, 36])
Rank 2 Before tensor([ 8,  9, 10, 11])
Rank 2 After Reduce Scatter tensor([ 8,  9, 32, 14])
Rank 2 After All Gather tensor([24, 28, 32, 36])

cd wondershaper && sudo ./wondershaper -a eth1 -d 102400 -u 102400

cd wondershaper && sudo ./wondershaper -a eth1 -c