New Policy Network: nn-6e49a41bd7c0.network
#151
Merged
Switches from using Monty data to instead distilling from the lc0 BT4 net.
This is done in two stages: first a direct comparison against the master net, then a scaling up of net size.
Stage 1: Same size, schedule, and number of games as master net:
UHO, 1 node
Results of Patch vs Baseline (1 nodes, 1t, 64MB, UHO_Lichess_4852_v1.epd):
Elo: 61.05 +/- 2.53, nElo: 90.44 +/- 3.67
LOS: 100.00 %, DrawRatio: 36.97 %, PairsRatio: 2.50
Games: 34416, Wins: 11362, Losses: 5376, Draws: 17678, Points: 20201.0 (58.70 %)
Ptnml(0-2): [427, 2676, 6362, 5970, 1773], WL/DD Ratio: 0.41
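For readers cross-checking the numbers, the summary lines can be re-derived from the pentanomial counts. The sketch below is not the test tool's code: it assumes Ptnml lists game pairs scoring 0, 0.5, 1, 1.5 and 2 points, that Elo is the logistic Elo of the overall score, that DrawRatio is the share of pairs scoring exactly 1 point, and that PairsRatio is won pairs over lost pairs. The function name is hypothetical.

```python
import math

def summarize_pentanomial(ptnml):
    """Summarize pair outcomes given counts for 0, 0.5, 1, 1.5 and 2 points.

    The definitions used here are assumptions that happen to match the
    figures printed above; they are not taken from the tool's source.
    """
    pairs = sum(ptnml)
    games = 2 * pairs
    # Pair outcome i is worth i/2 points out of 2.
    points = sum(0.5 * i * n for i, n in enumerate(ptnml))
    score = points / games
    # Logistic Elo difference implied by the score fraction.
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return {
        "games": games,
        "points": points,
        "score_pct": 100.0 * score,
        "elo": elo,
        "draw_ratio_pct": 100.0 * ptnml[2] / pairs,
        "pairs_ratio": (ptnml[3] + ptnml[4]) / (ptnml[0] + ptnml[1]),
    }

# Stage 1, UHO, 1 node
stage1_uho = summarize_pentanomial([427, 2676, 6362, 5970, 1773])
print(stage1_uho)
```

Under these assumptions the Stage 1 UHO counts give back the reported Games, Points, Elo, DrawRatio and PairsRatio to the printed precision.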
DFRC, 1 node
Results of Patch vs Baseline (1 nodes, 1t, 64MB, DFRC_openings.epd):
Elo: 128.77 +/- 3.68, nElo: 177.40 +/- 4.63
LOS: 100.00 %, DrawRatio: 27.16 %, PairsRatio: 5.26
Games: 21600, Wins: 10492, Losses: 2834, Draws: 8274, Points: 14629.0 (67.73 %)
Ptnml(0-2): [178, 1078, 2933, 4130, 2481], WL/DD Ratio: 0.91
Passed STC:
LLR: 2.93 (-2.94,2.94) <0.00,4.00>
Total: 1504 W: 482 L: 315 D: 707
Ptnml(0-2): 15, 128, 325, 243, 41
https://tests.montychess.org/tests/view/69627ee53974b6e428003a08
Passed LTC:
LLR: 2.94 (-2.94,2.94) <1.00,5.00>
Total: 1554 W: 429 L: 275 D: 850
Ptnml(0-2): 6, 136, 360, 248, 27
https://tests.montychess.org/tests/view/69627eee3974b6e428003a0a
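The (-2.94, 2.94) interval in the STC and LTC lines is consistent with the standard Wald SPRT stopping bounds at alpha = beta = 0.05. Those error rates are an assumption, since they are not stated in this PR. A minimal sketch of the bound calculation, with a hypothetical function name:

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald SPRT log-likelihood-ratio stopping bounds.

    alpha and beta are assumed error rates, not values taken from this PR.
    The test passes when LLR reaches the upper bound and fails at the lower.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

lower, upper = sprt_bounds()
print(f"({lower:.2f}, {upper:.2f})")
```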
Stage 2: 100M games -> 400M games, L1 16384 -> L1 40960
UHO, 1 node
Results of Patch vs Baseline (1 nodes, 1t, 64MB, UHO_Lichess_4852_v1.epd):
Elo: 68.54 +/- 2.06, nElo: 94.73 +/- 2.78
LOS: 100.00 %, DrawRatio: 34.26 %, PairsRatio: 2.50
Games: 60102, Wins: 22986, Losses: 11281, Draws: 25835, Points: 35903.5 (59.74 %)
Ptnml(0-2): [965, 4673, 10296, 9926, 4191], WL/DD Ratio: 0.83
Speed testing: Comparing the stage 1 and stage 2 nets in this PR:
EPYC 9654 x2:
setoption name Threads value 384
setoption name Hash value 384000
go movetime 60000
Start position:
L1 16384:
info depth 19 seldepth 60 score cp 16 time 60017 nodes 2406951331 nps 40103928 pv d2d4 g8f6 g1f3 b7b6 g2g3 c8b7 c2c4 e7e6 f1g2 f8b4 c1d2 b4d2 d1d2 e8g8 b1c3 d7d6 d2f4 f6h5 f4e3
bestmove d2d4
L1 40960:
info depth 18 seldepth 53 score cp 15 time 60046 nodes 1453206163 nps 24201402 pv d2d4 g8f6 c2c4 e7e6 b1c3 f8b4 d1c2 e8g8 g1f3 c7c5 d4c5 b8a6 g2g3 a6c5 f1g2 c5e4 e1g1 e4c3
bestmove d2d4
Speed diff: -40%
Endgame position:
position fen 7k/6p1/6Pp/3B1b1P/5Pn1/B7/4K3/8 w - - 3 70
L1 16384:
info depth 15 seldepth 56 score cp 108 time 60003 nodes 1947063535 nps 32448930 pv a3b2 f5b1 d5g2 b1f5 g2c6 f5e6 c6f3 g4h2 f3g2 e6c4 e2e3 h2g4 e3d2 g4h2 f4f5
bestmove a3b2
L1 40960:
info depth 16 seldepth 50 score cp 123 time 60029 nodes 1789662569 nps 29813008 pv a3c5 g4h2 c5g1 h2g4 g1d4 g4h2 d5g2 h2g4 g2h3 h8g8 e2f3 g4e5 f3g3 e5c6 d4c5 f5b1
bestmove a3c5
Speed diff: -8%
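The speed differences can be recomputed from the `nps` fields of the `info` lines. This is a small sketch, assuming "Speed diff" means the relative change in nps from the L1 16384 net to the L1 40960 net. The helper names are hypothetical and the `info` strings are truncated to the fields that matter.

```python
import re

def parse_nps(info_line):
    """Extract the integer after the 'nps' token of a UCI info line."""
    match = re.search(r"\bnps (\d+)", info_line)
    if match is None:
        raise ValueError("no nps field in info line")
    return int(match.group(1))

def speed_diff_percent(baseline_nps, candidate_nps):
    """Relative nps change of the candidate versus the baseline, in percent."""
    return (candidate_nps / baseline_nps - 1.0) * 100.0

# Start position
start_small = parse_nps("info depth 19 seldepth 60 score cp 16 time 60017 nodes 2406951331 nps 40103928")
start_large = parse_nps("info depth 18 seldepth 53 score cp 15 time 60046 nodes 1453206163 nps 24201402")
# Endgame position
end_small = parse_nps("info depth 15 seldepth 56 score cp 108 time 60003 nodes 1947063535 nps 32448930")
end_large = parse_nps("info depth 16 seldepth 50 score cp 123 time 60029 nodes 1789662569 nps 29813008")

print(round(speed_diff_percent(start_small, start_large)))
print(round(speed_diff_percent(end_small, end_large)))
```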
Finally, the L1 40960 net is compared to the original BT4 net:
Results of Patch vs Baseline (1 nodes, 1t - 0t, 64MB - NULL, 8moves_v3.pgn):
Elo: -673.44 +/- 14.80, nElo: -1577.71 +/- 5.56
LOS: 0.00 %, DrawRatio: 0.65 %, PairsRatio: 0.00
Games: 15000, Wins: 43, Losses: 14434, Draws: 523, Points: 304.5 (2.03 %)
Ptnml(0-2): [6944, 505, 49, 2, 0], WL/DD Ratio: 5.12
This is at 200,000x fewer operations per inference than BT4. The L1 40960 policy net performs at a 2000 FIDE level statically (this has been verified with other measurements, e.g. against weaker lc0 nets such as T79).
Bench: 1620531