Replies: 3 comments
-
The code becomes memory bound somewhere between 8 and 16 threads on my 16-core system. I suspect your system has 4 cores / 8 hyperthreads, in which case hyperthreading isn't helping your performance. The output may change subtly with different numbers of threads due to the multithreading architecture of the code, but the average quality shouldn't.
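One way to see where scaling stops on a particular machine is to time the same workload at several thread counts and compare. The sketch below is a generic harness, not part of the project being discussed: `sweep_thread_counts`, `fastest`, and `fake_run` are hypothetical names, and `fake_run` only sleeps to imitate a workload that is quickest at 4 threads. In practice it would be replaced by a function that launches the real program with the given thread count.

```python
import time
from typing import Callable, Dict, Iterable


def sweep_thread_counts(
    run: Callable[[int], None],
    counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best (lowest) wall-clock time in seconds per thread count."""
    results: Dict[int, float] = {}
    for n in counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)  # run the workload with n threads
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


def fastest(results: Dict[int, float]) -> int:
    """Return the thread count with the lowest measured time."""
    return min(results, key=results.get)


def fake_run(n_threads: int) -> None:
    """Hypothetical stand-in workload: sleeps instead of doing real work."""
    simulated_seconds = {1: 0.06, 4: 0.02, 8: 0.04}  # made-up numbers
    time.sleep(simulated_seconds[n_threads])


if __name__ == "__main__":
    timings = sweep_thread_counts(fake_run, [1, 4, 8])
    for n, seconds in sorted(timings.items()):
        print(f"{n} threads: {seconds:.3f} s")
    print("fastest thread count:", fastest(timings))
```

Taking the minimum over a few repeats reduces noise from other processes; if the curve flattens or rises past some thread count, adding more threads is unlikely to help on that machine.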
-
M1 definitely has 8 physical cores (and I believe it has fairly high memory bandwidth, but I may be wrong). It could have something to do with 4 of those cores being lower-performance efficiency cores, but spreading the workload across more cores should still improve performance.
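Whether the efficiency cores help depends on how the work is divided. As a rough illustration only, the sketch below assumes the work is split evenly and statically across threads, so the run ends when the slowest core finishes its share. That scheduling model is an assumption, not something confirmed about the code in this thread, and the relative core speeds are made-up numbers, not M1 measurements.

```python
from typing import Sequence


def static_split_time(work: float, core_speeds: Sequence[float]) -> float:
    """Finish time when `work` is split evenly, one share per core.

    Each core gets work / len(core_speeds); the slowest core sets the total.
    """
    share = work / len(core_speeds)
    return max(share / speed for speed in core_speeds)


# Hypothetical relative speeds: performance core = 1.0, efficiency core = 0.25.
four_threads = static_split_time(100.0, [1.0] * 4)
eight_threads = static_split_time(100.0, [1.0] * 4 + [0.25] * 4)

if __name__ == "__main__":
    print("4 performance cores:", four_threads)
    print("4 performance + 4 efficiency cores:", eight_threads)
```

Under this model, 8 threads beat 4 only when an efficiency core runs at more than half the speed of a performance core. A scheduler that hands out work dynamically would not have this limitation.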
-
Going from 4 to 7-8 threads helps, but only marginally. Maybe it would help more if the threads were pinned to cores.
-
I've been testing your code with 1 to 8 threads, and the output is always different. The speed does not scale with the number of threads. 4 threads may perform much better than 1, and 8 threads should supposedly give an even better result, yet the same prompt can produce equally excellent output three times faster with 4 threads than with 8. When I use 8 threads (my maximum on M1), all of my CPU resources are in use, but it doesn't improve speed at all (it seems to run slower) and apparently doesn't improve quality either. Am I wrong? Please correct me if I'm mistaken. Maybe there is some best speed/quality option and I just failed to figure out how to use it?