Hi,
I have successfully made my own version of colormap using some of your code. It uses multi-threading everywhere: it splits the work across threads, and it merges colors on doors with the same id across threads, but you already know this. That being said, do you think the reduced color palette generation could be improved? From reading your colormap code, I suspect the k-step is the part that could be parallelized, but I don't know whether you have tried parallelizing it. The rest of your implementation is fast, except in scenarios such as finding 1M colors in 20M+ pixels, which my solution handles.
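For reference, here is a minimal sketch of one way a k-step could be split across threads. It assumes the k-step means one k-means iteration (assign each pixel to its nearest palette color, then recompute each color as the mean of its pixels). It is not the colormap implementation. The function name `kmeans_step_parallel` and the flat interleaved pixel layout are hypothetical choices for illustration. Each thread handles a contiguous range of pixels and accumulates its own partial sums and counts, so the hot loop needs no locks. The partial results are then merged serially, which costs only `k * dims * num_threads` additions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical helper: one k-means iteration split across std::thread workers.
// pixels:    n * dims floats, interleaved (e.g. RGBRGB...)
// centroids: k * dims floats, updated in place with the new means
// labels:    per-pixel cluster index (resized to n; -1 means "unassigned")
// Returns how many pixels changed cluster, usable as a convergence test.
std::size_t kmeans_step_parallel(const std::vector<float>& pixels,
                                 std::vector<float>& centroids,
                                 std::vector<int>& labels,
                                 std::size_t dims, std::size_t k,
                                 unsigned num_threads) {
    assert(dims > 0 && k > 0 && num_threads > 0);
    const std::size_t n = pixels.size() / dims;
    labels.resize(n, -1);

    // Per-thread accumulators: each worker writes only to its own slot.
    std::vector<std::vector<double>> sums(num_threads, std::vector<double>(k * dims, 0.0));
    std::vector<std::vector<std::size_t>> counts(num_threads, std::vector<std::size_t>(k, 0));
    std::vector<std::size_t> changed(num_threads, 0);

    auto worker = [&](unsigned t) {
        const std::size_t begin = n * t / num_threads;
        const std::size_t end = n * (t + 1) / num_threads;
        for (std::size_t i = begin; i < end; ++i) {
            const float* p = &pixels[i * dims];
            // Assignment: nearest centroid by squared Euclidean distance.
            std::size_t best = 0;
            float best_d = std::numeric_limits<float>::max();
            for (std::size_t c = 0; c < k; ++c) {
                float d = 0.0f;
                for (std::size_t j = 0; j < dims; ++j) {
                    const float diff = p[j] - centroids[c * dims + j];
                    d += diff * diff;
                }
                if (d < best_d) { best_d = d; best = c; }
            }
            if (labels[i] != static_cast<int>(best)) {
                labels[i] = static_cast<int>(best);
                ++changed[t];
            }
            // Accumulate into this thread's partial sums only.
            ++counts[t][best];
            for (std::size_t j = 0; j < dims; ++j) sums[t][best * dims + j] += p[j];
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();

    // Serial merge of the per-thread partial results.
    std::size_t total_changed = 0;
    for (unsigned t = 0; t < num_threads; ++t) total_changed += changed[t];
    for (std::size_t c = 0; c < k; ++c) {
        std::size_t cnt = 0;
        for (unsigned t = 0; t < num_threads; ++t) cnt += counts[t][c];
        if (cnt == 0) continue;  // empty cluster keeps its previous centroid
        for (std::size_t j = 0; j < dims; ++j) {
            double s = 0.0;
            for (unsigned t = 0; t < num_threads; ++t) s += sums[t][c * dims + j];
            centroids[c * dims + j] = static_cast<float>(s / static_cast<double>(cnt));
        }
    }
    return total_changed;
}
```

Calling it in a loop until it returns 0 (or a small threshold) gives the usual k-means iteration. Note that the per-pixel cost is still O(k), so for something like 1M palette colors the assignment step would likely also need a spatial structure (e.g. a k-d tree) rather than only more threads.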
Thank you,
Reptorian