Mbed massive overhead makes same code to run much much slower than "bare metal one"


Topic last updated 06 Apr 2016, by -deleted-. 2 replies

# 01 Apr 2016

Hi, I decided to test performance. I took a 128x128 pixel TFT display library from mbed; it does not use any hardware peripheral for the serial communication. Then I ported it to bare metal on the same board (Nucleo 446RE).

I have only changed one thing:

mbed `D6 = x` became `GPIOx->BSRR = xx`

Same clock settings.

Any idea how to improve the performance?

[Attached images: mbed; bare metal]

Bill Bellis

# 05 Apr 2016

Hmmmm... mbed doesn't have any overhead. In fact, it is a bare-metal system with classes and a basic BSP for initialization. It sounds like some of the default setups may not be using the same peripheral settings as your own code. Add mbed-dev to your project and delete the standard mbed lib. Now you can examine what may be causing the problem. You might want to use a free debugger like IAR Kickstart so that you can single-step and examine registers, etc. It's free as long as you don't exceed 32K. You just need to export the project and select IAR. There are lots of other good ways to solve this as well. Hopefully others can chime in.

-deleted-

# 06 Apr 2016

Hi there,

`GPIOx->BSRR = xx` is hardcoded and maps to 3 instructions, which execute in 5 cycles on the CM4 (3 cycles for the instructions, 2 cycles for bus wait states). That's the fastest pin set/reset/toggle implementation possible on the F446.
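To illustrate why one store is enough, here is a minimal host-side sketch of how a BSRR write behaves. `FakeGpioPort`, `write_bsrr`, `bsrr_set` and `bsrr_reset` are made-up names for this example; on the real chip the register is memory-mapped and the hardware does the set/reset for you.

```cpp
#include <cstdint>

// Host-side model of one GPIO port (hypothetical; real hardware uses the
// memory-mapped GPIOx block, not a plain struct).
struct FakeGpioPort {
    std::uint16_t odr = 0;  // models the output data register (pin levels)

    // Models a single store to BSRR: bits 0..15 set pins, bits 16..31 reset
    // pins. If both are requested for the same pin, the set bit wins.
    void write_bsrr(std::uint32_t value) {
        const std::uint16_t set_mask   = static_cast<std::uint16_t>(value & 0xFFFFu);
        const std::uint16_t reset_mask = static_cast<std::uint16_t>(value >> 16);
        odr = static_cast<std::uint16_t>((odr & ~reset_mask) | set_mask);
    }
};

// Value to write to BSRR to drive `pin` (0..15) high.
constexpr std::uint32_t bsrr_set(unsigned pin)   { return 1u << pin; }

// Value to write to BSRR to drive `pin` (0..15) low.
constexpr std::uint32_t bsrr_reset(unsigned pin) { return 1u << (pin + 16u); }
```

Because the register itself decides which pins change, the caller never has to read the port first, which is why the bare-metal version can be a load of the address, a load of the value and one store.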

In contrast, `D6 = x` is syntactic sugar (i.e. C++ operator overloading) which maps (for free) to something like `gpio_write(D6, x)`. This is a C function that cannot be inlined, so a jump has to happen. That's already at least 2 instructions (ldr the address of the function + branch to it), NOT counting setting up the function arguments. Branching back from the function takes another instruction. So even if you do the hardcoded pin set inside the function, you are already slower (2x). The actual implementation of `gpio_write` will only make it slower.
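Roughly, the wrapper pattern looks like the sketch below. This is not the mbed source: `SketchDigitalOut`, `SketchGpio`, `sketch_gpio_write` and `FakePort` are hypothetical stand-ins, and the port is a plain struct so the example runs on a PC.

```cpp
#include <cstdint>

// Host-side stand-in for a GPIO port (hypothetical).
struct FakePort {
    std::uint16_t odr = 0;  // models the pin output levels
};

// Hypothetical per-pin state, loosely in the spirit of a HAL gpio object.
struct SketchGpio {
    FakePort*     port;
    std::uint16_t mask;
};

// Stand-in for a HAL-level write function. If this lives in another
// translation unit, the compiler has to emit a real call to reach it.
void sketch_gpio_write(SketchGpio& gpio, int value) {
    if (value) {
        gpio.port->odr = static_cast<std::uint16_t>(gpio.port->odr | gpio.mask);
    } else {
        gpio.port->odr = static_cast<std::uint16_t>(gpio.port->odr & ~gpio.mask);
    }
}

// The operator= overload is what makes `d6 = 1` look like a plain assignment.
class SketchDigitalOut {
public:
    SketchDigitalOut(FakePort& port, unsigned pin)
        : gpio_{&port, static_cast<std::uint16_t>(1u << pin)} {}

    SketchDigitalOut& operator=(int value) {
        sketch_gpio_write(gpio_, value);  // the function call hides here
        return *this;
    }

    // Reading the object back yields the current pin level (0 or 1).
    operator int() const { return (gpio_.port->odr & gpio_.mask) ? 1 : 0; }

private:
    SketchGpio gpio_;
};
```

With this in place, `SketchDigitalOut d6(port, 10); d6 = 1;` reads like a register write, but every assignment goes through `sketch_gpio_write`, including the argument setup and the branch there and back.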

So, sadly, yes, there is an overhead for this abstraction. Unfortunately the hardcoded method isn't very scalable either, so you will need to have this kind of compromise to keep the code flexible and generic.

If you only need to set/reset/toggle the pin, you can pass a pointer to the GPIO bit inside the `GPIOx->BSRR` register in the bit-banding alias region (available on the CM3 and CM4). See the documentation on bit-banding for details: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337h/Behcjiic.html
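The alias address for a given bit can be worked out with the formula from the ARM bit-banding documentation. A small sketch follows; `bitband_alias` and the constant names are made up for this example, and the GPIOA base address and BSRR offset are assumptions taken from the STM32F4 memory map, so check them against the reference manual for your part.

```cpp
#include <cstdint>

// Peripheral bit-band region and its alias region on Cortex-M3/M4.
constexpr std::uint32_t kPeriphBitBandBase  = 0x40000000u;
constexpr std::uint32_t kPeriphBitBandAlias = 0x42000000u;

// Each bit in the bit-band region gets its own 32-bit word in the alias
// region: alias = alias_base + byte_offset * 32 + bit_number * 4.
constexpr std::uint32_t bitband_alias(std::uint32_t reg_addr, unsigned bit) {
    return kPeriphBitBandAlias
         + (reg_addr - kPeriphBitBandBase) * 32u
         + bit * 4u;
}

// Assumed addresses (STM32F4 memory map): GPIOA base plus the BSRR offset.
constexpr std::uint32_t kGpioaBsrr = 0x40020000u + 0x18u;

// Compile-time check of the arithmetic for bit 5 of that register.
static_assert(bitband_alias(kGpioaBsrr, 5) == 0x42400314u,
              "alias address arithmetic");
```

On the target you would then store 1 through a `volatile std::uint32_t*` pointing at that alias address. Whether this actually beats a plain BSRR store on the F446 is something you would have to measure.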

Cheers, Niklas
