[INFO] Code execution speed considerations for developers #4206
Comments
@DedeHai I was wondering if the tables from https://esp32.com/viewtopic.php?p=82090# are still correct, especially for float multiply vs. float divide. The table comes from a time when FPU support for ESP32 was broken. It seems correct that "float divide" is a lot slower than multiplying by the inverse, and I think (please correct me) the compiler can generate this optimization automatically. However, the difference today should be more like "8-10 times slower", not a factor of almost 100x. EDIT: there was a PR for esp-idf that corrected the usage of FPU instructions in esp-idf v4.
There is an additional thing worth mentioning: floating point "literals". According to C++ semantics, in an expression like "if (x > 1.0)" (with float x), x is first "promoted" to double before evaluation because 1.0 is a double literal, which makes it SLOW. This can be avoided
You can check the code for such "double promotions" by adding the -Wdouble-promotion warning option: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wdouble-promotion
use
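A minimal sketch of the literal issue, assuming a float variable compared against a constant. The function names are hypothetical and only serve as illustration; the static_assert lines simply confirm which type the compiler picks for each kind of expression.

```cpp
#include <type_traits>

// Hypothetical helpers, for illustration only.

// 1.0 is a double literal, so x is converted to double before the comparison.
constexpr bool aboveOneDouble(float x) { return x > 1.0; }

// 1.0f is a float literal, so the comparison stays in single precision.
constexpr bool aboveOneFloat(float x) { return x > 1.0f; }

// The result type of a mixed expression shows the promotion directly.
static_assert(std::is_same<decltype(2.0f * 1.0), double>::value,
              "float * double literal is evaluated as double");
static_assert(std::is_same<decltype(2.0f * 1.0f), float>::value,
              "float * float literal stays float");
```

Both functions return the same result; the difference is only in the arithmetic the compiler has to emit. Building with -Wdouble-promotion should flag the first variant.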
...and the classical one: avoid 8bit and 16bit integers for local variables
Update: If 8-bit math (with roll-over at 255) is needed, 8-bit types should be used; it is still faster than manually checking and adjusting 8-bit overflows. The reason is that ESP32 processors have 32-bit registers and 32-bit instructions, so any calculation on
For more info: https://en.cppreference.com/w/cpp/types/integer
This one is tricky. The code would need to not rely on overflows, as it currently does in WLED.
They are for current WLED; I generated this yesterday by inserting the code into 0.15. I can add IDF 4 once we move there. The 8-bit/16-bit topic is a bit more elaborate. In general what you write is true, but the CPU has 8-bit/16-bit instructions too. So yes, avoid 8-bit types, but manually checking and adjusting overflows is slower. So if 8-bit math is needed, 8-bit types should be used.
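The roll-over point from the comments above can be sketched as follows. Both helper names are hypothetical; the second function assumes its inputs are already in the range 0..255.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.

// Roll-over comes for free: converting back to uint8_t keeps the low 8 bits.
constexpr uint8_t addWrapNative(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(a + b);
}

// Same result with 32-bit variables and a manual overflow check
// (assumes both inputs are in 0..255).
constexpr uint32_t addWrapManual(uint32_t a, uint32_t b) {
    return (a + b > 255u) ? (a + b - 256u) : (a + b);
}
```

The manual variant needs a compare and a conditional adjustment, which is the extra work the comments refer to. Where no roll-over is needed, a plain 32-bit local variable avoids both.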
I'm really curious to see the numbers for the newer V4 framework 😀 . But yeah, it won't be better than -S3 results. You could use the Line 481 in e9d2182
I want to collect some info here about things I have learned while writing code for the ESP32 family MCUs. Please feel free to add to this.
This is a work in progress.
Comparison of basic operations on the CPU architectures
This table was generated using code from https://esp32.com/viewtopic.php?p=82090#
Even though the ESP32 and the S3 have hardware floating point units, they still do floating point division in software, so it should be avoided in speed-critical functions.
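One common way to keep float division out of a hot loop is to divide once and multiply by the inverse afterwards. The sketch below assumes a buffer that has to be scaled by a constant divisor; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only: divides every element by
// `divisor` using a single division and n multiplications.
inline void scaleByInverse(float* data, std::size_t n, float divisor) {
    assert(divisor != 0.0f);            // guard against division by zero
    const float inv = 1.0f / divisor;   // the only division, done once
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= inv;                 // multiplication inside the loop
    }
}
```

Multiplying by the inverse is not always bit-identical to dividing, because the inverse itself is rounded; for LED effect math that is usually acceptable.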
Edit (softhack007): "Float Multiply-Add" uses a special CPU instruction that combines addition and multiplication. It is generated by the compiler for expressions like
a = a + b * C;
Why integer divisions on the C3 are so slow is unknown; the datasheet clearly states that it can do 32-bit integer division in hardware.
Bit shifts vs. division
Bit shifts are always faster than a division because a shift is a single instruction. The compiler replaces divisions by bit shifts wherever possible, so var / 256 is equivalent to var >> 8 if var is unsigned. If var is a signed integer, the two are only equivalent if the value of var is never negative and this fact is known at compile time. The reason is that -200 / 256 = 0 but -200 >> 8 = -1. So when using signed integers where a bit shift is possible, it is better to write the shift explicitly instead of leaving it to the compiler. (Please correct me if I am wrong here.)
Fixed point vs. float
Using fixed point math is less accurate, but for most operations it is accurate enough, and it runs much faster, especially when doing divisions.
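A minimal fixed-point sketch, assuming a format with 8 fractional bits (so 256 represents 1.0). All names are hypothetical. The last two helpers also illustrate the signed case from the bit-shift section: division truncates toward zero, while an arithmetic right shift rounds toward negative infinity.

```cpp
#include <cstdint>

// Hypothetical fixed-point helpers with 8 fractional bits, illustration only.
constexpr int32_t kFracBits = 8;

// Convert a float factor to fixed point once, outside the hot path.
constexpr int32_t toQ8(float f) {
    return static_cast<int32_t>(f * (1 << kFracBits));
}

// Scale an unsigned value: one multiply and one shift, no float math.
// The product value * factorQ8 must fit into 32 bits.
constexpr uint32_t scaleQ8(uint32_t value, uint32_t factorQ8) {
    return (value * factorQ8) >> kFracBits;
}

// Signed division truncates toward zero.
constexpr int32_t divBy256(int32_t v) { return v / 256; }

// Signed right shift is arithmetic on GCC (guaranteed since C++20)
// and rounds toward negative infinity.
constexpr int32_t shiftBy8(int32_t v) { return v >> 8; }
```

For example, scaling a brightness of 200 by 0.5 becomes scaleQ8(200, toQ8(0.5f)), which works entirely on integers after the one-time conversion.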
When doing mixed-math there is a pitfall: casting negative floats into unsigned integers is undefined and leads to problems on some CPUs. https://embeddeduse.com/2013/08/25/casting-a-negative-float-to-an-unsigned-int/
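The pitfall can be sketched as below, assuming the float value fits into a 32-bit signed integer. The helper name is hypothetical. Converting through a signed integer first makes the result well defined for negative inputs, because signed-to-unsigned conversion wraps modulo 2^32.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// float -> int32_t truncates toward zero (defined while the value fits into
// int32_t); int32_t -> uint32_t then wraps modulo 2^32 for negative values.
constexpr uint32_t floatToU32ViaInt(float f) {
    return static_cast<uint32_t>(static_cast<int32_t>(f));
}
```

A direct static_cast<uint32_t>(-1.0f) has no such guarantee and may give different results on different CPUs.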
To avoid this problem, explicitly cast a float into int before assigning it to an unsigned integer.
Modulo Operator: %
The modulo operator uses several instructions. A modulo by 2^i can be replaced with a bitwise AND (the & operator), which is a single instruction. The rule is n % 2^i = n & (2^i - 1). For example, n % 2048 = n & 2047. Note that this identity only holds for unsigned or non-negative n.
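The rule can be checked with a small helper. The name is hypothetical, and the caller is assumed to pass a divisor that is a power of two.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// Computes n % pow2 for unsigned n, assuming pow2 is a power of two.
constexpr uint32_t modPow2(uint32_t n, uint32_t pow2) {
    return n & (pow2 - 1u);
}
```

A typical use is wrapping an index into a buffer whose length is a power of two, for example modPow2(index + 1u, 256u).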