At the moment, Monty's speed regresses once the thread count goes beyond roughly 64-128 threads.
The main remaining source of contention is in updating node statistics:
- During backpropagation, a thread writes to the statistics of every node on its selection path, on every iteration
- So every thread writes to the root node on every one of its iterations (with much overlap on nodes near the root as well)
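
The write pattern described above can be illustrated with a minimal sketch. This is not Monty's actual code: `NodeStats`, `backprop`, `simulate`, the fixed-point `score_sum` and the flat `Vec` tree are all hypothetical names and simplifications chosen for illustration. Each thread is given its own child of the root, so the only shared write target is the root itself.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical per-node statistics (an assumption, not Monty's real layout).
#[derive(Default)]
struct NodeStats {
    visits: AtomicU64,
    /// Sum of results in fixed point (result * 1000), assumed for this sketch.
    score_sum: AtomicU64,
}

/// Backpropagate one result along a selection path of node indices,
/// walking from the leaf back up to the root.
fn backprop(tree: &[NodeStats], path: &[usize], result_milli: u64) {
    for &idx in path.iter().rev() {
        // Every node on the path receives two atomic writes per iteration.
        tree[idx].visits.fetch_add(1, Ordering::Relaxed);
        tree[idx].score_sum.fetch_add(result_milli, Ordering::Relaxed);
    }
}

/// Thread `t` repeatedly backpropagates along the path [root, child t + 1].
/// Returns (root visits, visits of each child).
fn simulate(threads: usize, iters: u64) -> (u64, Vec<u64>) {
    // Index 0 is the root; indices 1..=threads are its children.
    let tree: Vec<NodeStats> = (0..=threads).map(|_| NodeStats::default()).collect();

    thread::scope(|s| {
        for t in 0..threads {
            let tree = &tree;
            s.spawn(move || {
                let path = [0, t + 1];
                for _ in 0..iters {
                    // 500 stands in for a draw-like result of 0.5.
                    backprop(tree, &path, 500);
                }
            });
        }
    });

    let root = tree[0].visits.load(Ordering::Relaxed);
    let children = tree[1..]
        .iter()
        .map(|n| n.visits.load(Ordering::Relaxed))
        .collect();
    (root, children)
}
```

Calling `simulate(threads, iters)` leaves each child with `iters` visits while the root accumulates `threads * iters`, so the root's statistics receive writes from every thread on every iteration. In a real search, paths also overlap on nodes near the root, which this sketch deliberately leaves out.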