Fast GPIO with C++ templates

Some people have complained that mbed's GPIO seems to be kinda slow. I decided to investigate and see how it can be improved.

Let's see what something like "led1 = 1;" looks like in assembly.

   MOVS    R1, #1
   LDR     R0, =led1
   BL      _ZN4mbed10DigitalOutaSEi ; mbed::DigitalOut::operator=(int)
            
            |
            v

 ; mbed::DigitalOut::operator=(int)
 _ZN4mbed10DigitalOutaSEi
   PUSH    {R4,LR}         ; save registers
   MOV     R4, R0          ; R4 = this
   LDR     R0, [R0,#0x10]  ; R0 = this->_pin;
   BL      gpio_write      ; gpio_write(_pin, value); value is still in R1
   MOV     R0, R4          ; return this;
   POP     {R4,PC}
 ; End of function mbed::DigitalOut::operator=(int)
            
            |
            v

 gpio_write
   BIC.W   R2, R0, #0x1F   ; base = pin & ~0x1F
   AND.W   R0, R0, #0x1F   ; bit_no = pin & 0x1F
   MOVS    R3, #1
   LSL.W   R0, R3, R0      ; R0 = 1 << bit_no
   CMP     R1, #0          ; value == 0?
   ITE EQ
   STREQ   R0, [R2,#0x1C]  ; yes, use FIOCLR
   STRNE   R0, [R2,#0x18]  ; no, use FIOSET
   BX      LR
 ; End of function gpio_write

You can see that the code in gpio_write() exploits the way pins are defined in PinNames.h: pin values begin at LPC_GPIO0_BASE and there are 32 pins per port. Conveniently, the register blocks of the GPIO ports are also 32 bytes (0x20) apart. So if you clear the low 5 bits of the pin value, you get the GPIO port base for that pin, and the same low 5 bits give the bit number within the port. Still, that is quite a bit of overhead for a simple port write.
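
As a rough illustration of that decoding (a sketch written for this explanation, not the library's source), the same two operations can be tried on a PC. It assumes the LPC1768 GPIO block starts at 0x2009C000; kGpio0Base, make_pin, port_base and bit_mask are hypothetical names.

```cpp
#include <cassert>
#include <stdint.h>

// Assumption: the LPC1768 GPIO block starts at 0x2009C000 and a pin value is
// base + port * 32 + bit. kGpio0Base is a stand-in for LPC_GPIO0_BASE.
const uint32_t kGpio0Base = 0x2009C000u;

// Encode a pin the way the text describes PinNames.h doing it.
inline uint32_t make_pin(uint32_t port, uint32_t bit) {
    return kGpio0Base + port * 32u + bit;
}

// Clearing the low 5 bits gives the address of the port's register block.
inline uint32_t port_base(uint32_t pin) { return pin & ~0x1Fu; }

// The low 5 bits are the bit number; shifting 1 by it gives the bit mask.
inline uint32_t bit_mask(uint32_t pin) { return 1u << (pin & 0x1Fu); }

// Worked example for LED2 (P1.20).
inline void check_led2() {
    const uint32_t led2 = make_pin(1, 20);
    assert(port_base(led2) == 0x2009C020u);  // GPIO1 base
    assert(bit_mask(led2) == 0x00100000u);   // 1 << 20
}
```

The two helpers correspond to the BIC.W and AND.W/LSL.W instructions at the top of gpio_write.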

What if we use the write() method instead of the assignment operator?

   MOVS    R1, #1
   LDR     R0, =led1
   BL      _ZN4mbed10DigitalOut5writeEi ; mbed::DigitalOut::write(int)
            
            |
            v

 ; mbed::DigitalOut::write(int)
 _ZN4mbed10DigitalOut5writeEi
   LDR     R0, [R0,#0x10]
   B.W     gpio_write 
; End of function mbed::DigitalOut::write(int)

Well, this is somewhat better. The compiler knows that it doesn't need to return the this pointer, so it optimized away the register save/restore and turned the call into a tail branch. So here's Lesson 1: if you want DigitalOut to be somewhat faster, use write() instead of an assignment.

However, the gpio_write function remains and while it's not that big, it does add some processing to a simple operation.

Can we make the code more efficient while keeping the convenient C++ interface?

To do that, we need to let the compiler optimize more. It works best when it can see the complete source code instead of linking against libraries or separate object files; then it can tailor the optimization to how each function is actually called.

One of the ways to do it is to use templates. They expand at compile time and give the optimizer the most freedom. I wrote the following class:

template <PinName pin> class FastOut
{
// pin = LPC_GPIO0_BASE + port * 32 + bit
// port = (pin - LPC_GPIO0_BASE) / 32
// bit  = (pin - LPC_GPIO0_BASE) % 32

    // helper function to calculate the GPIO port definition for the pin
    // we rely on the fact that port structs are 0x20 bytes apart
    inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };

    // helper function to calculate the mask for the pin's bit in the port
    inline uint32_t mask() { return 1UL<<((pin - LPC_GPIO0_BASE) % 32); };

public:
    FastOut()
    {
        // set FIODIR bit to 1 (output)
        portdef()->FIODIR |= mask();
    }
    void write(int value)
    { 
        if ( value )
            portdef()->FIOSET = mask();
        else
            portdef()->FIOCLR = mask();
    } 
    int read()
    {
        // parentheses matter: != binds tighter than &
        return (portdef()->FIOPIN & mask()) != 0;
    }
    FastOut& operator= (int value) { write(value); return *this; };
    FastOut& operator= (FastOut& rhs) { write(rhs.read()); return *this; };
    operator int() { return read(); };
};
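
To try the logic of such a class without hardware, one option is to swap the real registers for an in-memory mock. The following is only a sketch under that assumption: MockPort, g_ports and MockFastOut are hypothetical names, and the pin is encoded as port * 32 + bit without the base address.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical in-memory stand-in for one GPIO port; this is not the real
// LPC_GPIO_TypeDef, only enough of its behavior to test the logic.
struct MockPort {
    uint32_t dir;  // models FIODIR (1 = output)
    uint32_t pin;  // models FIOPIN (current pin levels)
    void set(uint32_t m) { pin |= m; }   // models a write to FIOSET
    void clr(uint32_t m) { pin &= ~m; }  // models a write to FIOCLR
};

static MockPort g_ports[5];  // LPC1768 GPIO ports 0..4, zero-initialized

// Pin is encoded as port * 32 + bit; the base address is left out here.
template <unsigned Pin> class MockFastOut {
    static MockPort& port() { return g_ports[Pin / 32]; }
    static uint32_t mask() { return uint32_t(1) << (Pin % 32); }
public:
    MockFastOut() { port().dir |= mask(); }  // configure the pin as output
    void write(int value) {
        if (value) port().set(mask());
        else       port().clr(mask());
    }
    int read() { return (port().pin & mask()) != 0; }
    MockFastOut& operator=(int value) { write(value); return *this; }
    operator int() { return read(); }
};
```

With this mock, `MockFastOut<1 * 32 + 20> led2; led2 = 1;` sets bit 20 of g_ports[1].pin, which is easy to check in a unit test.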

It can be used as follows:

FastOut<LED2> led2;

Let's see what "led2 = 1;" looks like now:

 LDR.W   R8, =0x2009C020 ; 0x2009C020 = GPIO1 base
 MOV.W   R7, #0x100000   ; 0x100000 = 1<<20 (LED2 = P1.20)
 STR.W   R7, [R8,#0x18]  ; FIOSET = mask

As you can see, the compiler precalculated all constants, eliminated dead code (the else branch of if (value) ) and left only the absolutely necessary instructions. Also, even though we used the operator= version, it realized that we don't use the return value so it doesn't need to be kept around. In fact, the led2 instance does not take any flash or RAM space at all, since the class doesn't have any data members.
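
The constant folding can also be made explicit. The sketch below assumes a C++11 or later compiler and the LPC1768 GPIO base address of 0x2009C000; PinTraits, kGpioBase and kLed2 are hypothetical names. It checks at compile time that the pin template parameter alone determines the two constants seen in the listing above.

```cpp
#include <stdint.h>

// Assumption: LPC1768 GPIO block at 0x2009C000, ports 0x20 bytes apart.
constexpr uint32_t kGpioBase = 0x2009C000u;

template <uint32_t Pin> struct PinTraits {
    // Address of the register block of the pin's port.
    static constexpr uint32_t port_addr =
        kGpioBase + ((Pin - kGpioBase) / 32u) * 0x20u;
    // Mask of the pin's bit within that port.
    static constexpr uint32_t mask = 1u << ((Pin - kGpioBase) % 32u);
};

// LED2 on the mbed LPC1768 is P1.20.
constexpr uint32_t kLed2 = kGpioBase + 1u * 32u + 20u;

// Both values are known before the program ever runs.
static_assert(PinTraits<kLed2>::port_addr == 0x2009C020u, "GPIO1 base");
static_assert(PinTraits<kLed2>::mask == 0x00100000u, "1 << 20");
```

If either static_assert were wrong, the build would fail, which shows that nothing is left to compute at run time.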

Pretty cool, huh?

But this was the extreme case. If we do an assignment with a variable, the compiler has to keep the comparison in.

 

; led2 = value;
LDR     R1, =0x2009C020 ; 0x2009C020 = GPIO1 base
MOV.W   R0, #0x100000   ; 0x100000 = 1<<20 (LED2 = P1.20)
CMP     R4, #0          ; value == 0 ?
ITE EQ
STREQ   R0, [R1,#0x1C]  ; if yes, FIOCLR = mask
STRNE   R0, [R1,#0x18]  ; if no, FIOSET = mask

Still, the code is much shorter than with DigitalOut.

 

Now let's see some figures. I made a program with two loops of a hundred million iterations each that toggle a pin, first using DigitalOut, then using FastOut. You can try it out here: FastIO

Here are the results for the LPC1768:

 

DigitalOut: 28.125000 seconds (281 ns per iteration).
FastOut: 7.291668 seconds (72 ns per iteration).

Almost 4x faster.

 

And for LPC2368:

 

DigitalOut: 90.000000 seconds (900 ns per iteration).
FastOut: 25.000004 seconds (250 ns per iteration).

About 3.6x faster.
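
For anyone who wants to redo the arithmetic behind these figures, here is a small sketch. The iteration count comes from the text above; the function names are made up for this example, and a C++11 compiler is assumed.

```cpp
// Number of loop iterations in each benchmark run, as stated above.
constexpr double kIterations = 100e6;

// Average time per loop iteration in nanoseconds.
constexpr double ns_per_iteration(double seconds) {
    return seconds * (1e9 / kIterations);
}

// How many times faster the second measurement is than the first.
constexpr double speedup(double slow_seconds, double fast_seconds) {
    return slow_seconds / fast_seconds;
}

// LPC1768: 28.125 s for DigitalOut is a little over 281 ns per iteration.
static_assert(ns_per_iteration(28.125) > 281.0 &&
              ns_per_iteration(28.125) < 282.0, "DigitalOut on LPC1768");
// LPC2368: 90 s versus 25 s is a factor of about 3.6.
static_assert(speedup(90.0, 25.000004) > 3.59 &&
              speedup(90.0, 25.000004) < 3.61, "LPC2368 speedup");
```

The same helper gives a factor of roughly 3.86 for the LPC1768 numbers (28.125 s versus 7.291668 s).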

 

So, if you need better speed than DigitalOut but don't want to mess with registers directly, you can use this class and keep the syntax almost the same. However, as it is, it's missing some features of DigitalOut:

1) It does not set up the pin function to be GPIO but relies on the fact that that's the default after reset. It also doesn't set pull-up/pull-down configuration. However, that can be easily added to the constructor.
2) It does not implement RPC functionality. I haven't looked yet at what's needed for it.
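
For point 1, the bookkeeping could look roughly like the sketch below. It assumes the LPC17xx pin connect block layout, where the PINSEL and PINMODE registers hold two bits per pin and sixteen pins per 32-bit register, and that a PINSEL field value of 0 selects GPIO. PinField, pin_field and with_field are hypothetical helpers, and the PINMODE value encodings should be taken from the user manual.

```cpp
#include <stdint.h>

// Assumption (LPC17xx pin connect block): two bits per pin, sixteen pins per
// 32-bit PINSELn / PINMODEn register, so every port uses two registers.
struct PinField {
    unsigned reg_index;  // which PINSELn / PINMODEn register
    unsigned shift;      // bit position of the pin's 2-bit field
};

// Locate the 2-bit field that belongs to a given port and bit.
constexpr PinField pin_field(unsigned port, unsigned bit) {
    return PinField{port * 2u + bit / 16u, (bit % 16u) * 2u};
}

// Return reg with the pin's field replaced by value (0..3).
// For PINSEL, value 0 is assumed to select the GPIO function.
constexpr uint32_t with_field(uint32_t reg, PinField f, uint32_t value) {
    return (reg & ~(uint32_t(3) << f.shift)) | ((value & 3u) << f.shift);
}

// LED2 is P1.20: under these assumptions it lives in PINSEL3, bits 9:8.
static_assert(pin_field(1, 20).reg_index == 3, "PINSEL3");
static_assert(pin_field(1, 20).shift == 8, "bits 9:8");
```

In a constructor, the result of with_field() would be written back to the matching register of the pin connect block; since port and bit are template constants, the compiler can fold the index and shift away as well.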


25 comments

12 Apr 2010

Hi

Newbie here, so the question may be silly.

How can I use the FastIO class on pins other than LED1, LED2 etc., i.e. on any pin of the mbed?

 

Thanks in advance

Thanos

12 Apr 2010

Just put the pin number between <>:

FastOut<p8> pin8;

12 Apr 2010

Hi Igor,

I've done some work on the DigitalIn/Out implementation to get the speed improved, and I think it'll get something near to the speed-up you've seen but without changing the API. Would you be up for trying them out if we gave you a beta release?

Thanks,
Simon

12 Apr 2010

I guess I could, though I'm not a very good tester IMO.

-deleted-
22 May 2010 . Edited: 22 May 2010

Hi Guys,

 

I just ran a test with this on my mbed.

It was a very basic test; see the code below.

#include "mbed.h"

template <PinName pin> class FastOut
{
// pin = LPC_GPIO0_BASE + port * 32 + bit
// port = (pin - LPC_GPIO0_BASE) / 32
// bit  = (pin - LPC_GPIO0_BASE) % 32

    // helper function to calculate the GPIO port definition for the pin
    // we rely on the fact that port structs are 0x20 bytes apart
    inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };

    // helper function to calculate the mask for the pin's bit in the port
    inline uint32_t mask() { return 1UL<<((pin - LPC_GPIO0_BASE) % 32); };

public:
    FastOut()
    {
        // set FIODIR bit to 1 (output)
        portdef()->FIODIR |= mask();
    }
    void write(int value)
    { 
        if ( value )
            portdef()->FIOSET = mask();
        else
            portdef()->FIOCLR = mask();
    } 
    int read()
    {
        return (portdef()->FIOPIN & mask()) != 0;
    }
    FastOut& operator= (int value) { write(value); return *this; };
    FastOut& operator= (FastOut& rhs) { write(rhs.read()); return *this; };
    operator int() { return read(); };
};

FastOut<p8> led;

int main() 
{
    while(1) 
    {
            led = 0;
            led = 1;        
    }
}

This gave me a frequency of 24MHz.

I am looking at this because I would like to try to use the Sharp LCD display from Sparkfun that is used in the PSP.

This LCD requires a pixel clock of 9 MHz.

I thought I would try to drive this with the mbed at first. I know I don't have enough I/Os, but I think I'll just leave out some of the LSB data bits on the screen.

Anyway, just to let people know, I found this works.

I'm going to run some more tests switching multiple I/Os to see what the effect is.

22 May 2010

Also, have a go with the new DigitalOut and PortOut interfaces in the latest version of the library. DigitalOut has been sped up a lot now, and PortOut gives you the ability to write to all bits in a port at about the same speed, which might be interesting.

-deleted-
22 May 2010

Hi Simon,

Just tried DigitalOut, see the code below:

#include "mbed.h"

DigitalOut led(p8);

int main() {
    while(1) {
        led = 1;
        led = 0;
    }
}

This gives a frequency of 8.7MHz, considerably less than the 24MHz I was getting with Igor's code.

Using this code:

#include "mbed.h"

DigitalOut led(p8);

int main() {
    while(1) {
        led = !led;
    }
}

gives an even worse figure of 3.4MHz.

I noticed a significant difference in compiled file sizes: when using the mbed library's DigitalOut I get just under 10kB for each of the two programs above, but Igor's code takes up under 2kB, as seen in my earlier comment.

I think there must be a lot more going on in the background with the DigitalOut library.

I tried Igor's code with switching multiple I/Os.

I tried switching 4 I/Os sequentially, like this:

#include "mbed.h"

template <PinName pin> class FastOut
{
// pin = LPC_GPIO0_BASE + port * 32 + bit
// port = (pin - LPC_GPIO0_BASE) / 32
// bit  = (pin - LPC_GPIO0_BASE) % 32

    // helper function to calculate the GPIO port definition for the pin
    // we rely on the fact that port structs are 0x20 bytes apart
    inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };

    // helper function to calculate the mask for the pin's bit in the port
    inline uint32_t mask() { return 1UL<<((pin - LPC_GPIO0_BASE) % 32); };

public:
    FastOut()
    {
        // set FIODIR bit to 1 (output)
        portdef()->FIODIR |= mask();
    }
    void write(int value)
    { 
        if ( value )
            portdef()->FIOSET = mask();
        else
            portdef()->FIOCLR = mask();
    } 
    int read()
    {
        return (portdef()->FIOPIN & mask()) != 0;
    }
    FastOut& operator= (int value) { write(value); return *this; };
    FastOut& operator= (FastOut& rhs) { write(rhs.read()); return *this; };
    operator int() { return read(); };
};

FastOut led1;
FastOut led2;
FastOut led3;
FastOut led4;
int main() 
{
    while(1) 
    {
            led1 = 0;
            led2 = 1;
            led3 = 0;
            led4 = 1; 
            led1 = 1;
            led2 = 0; 
            led3 = 1;
            led4 = 0;         
    }
}

It brought the frequency down to 9MHz; this is obviously going to get worse if I try to drive a lot more I/Os like this.

I have yet to try PortOut, but I'll let you know the results.

-deleted-
22 May 2010

OK, so I tried PortOut.

I used the following code:

#include "mbed.h"

#define LED_MASK 0x07800000

PortOut ledport(Port0, LED_MASK);

int main() {
    while(1) {
        ledport = LED_MASK;
        ledport = 0;
    }
}

This gave me a 6MHz output on all 4 pins of the port. I tried the same code using a mask of 0x00800000 and it still gives 6MHz, which shows that it's not affected by how many I/Os you want to change, as long as they are on the same port.

Still, it's a lot slower than 24MHz and wouldn't cope with the 9MHz clock I need for the LCD.

I wonder if it would be possible to implement Igor's code ideas using PortOut. I don't think Igor's code does all that the mbed library does, but maybe we could have a cut-down version.

 

22 May 2010

David, try the new version of my program I just published: FastIO. I added FastPortOut, which you can use much like the new PortOut; just pass the port and mask as template parameters instead of constructor arguments:

FastPortOut<Port0, LED_MASK> ledport;

Here are the new test results versus library version 21:

DigitalOut: 12.500001 seconds (125 ns per iteration).
FastOut: 7.291668 seconds (72 ns per iteration).
PortOut: 13.541668 seconds (135 ns per iteration).
FastPortOut: 9.375001 seconds (93 ns per iteration).
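
For readers curious how a template-parameter port class can work, here is a guess at the general shape. It is not the code from the FastIO program. It uses a hypothetical in-memory MockPort so it can run on a PC, and it assumes the write is done as one store to FIOSET and one to FIOCLR.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical in-memory model of a GPIO port for host-side testing.
struct MockPort {
    uint32_t dir;  // models FIODIR
    uint32_t pin;  // models FIOPIN
    void set(uint32_t m) { pin |= m; }   // models a write to FIOSET
    void clr(uint32_t m) { pin &= ~m; }  // models a write to FIOCLR
};

static MockPort g_ports[5];  // GPIO ports 0..4

// Port number and mask are template parameters, so both are compile-time
// constants inside write().
template <unsigned PortNum, uint32_t Mask> class MockFastPortOut {
    static MockPort& port() { return g_ports[PortNum]; }
public:
    MockFastPortOut() { port().dir |= Mask; }  // masked pins become outputs
    void write(uint32_t value) {
        port().set(value & Mask);   // raise the requested bits inside the mask
        port().clr(~value & Mask);  // lower the remaining bits inside the mask
    }
    uint32_t read() { return port().pin & Mask; }
    MockFastPortOut& operator=(uint32_t value) { write(value); return *this; }
};
```

Pins outside the mask are never touched, so other outputs on the same port keep their state.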

22 May 2010

Just uploaded new version of the program to add a new class, MaskedPortOut. It uses FIOMASK to pre-set the pin mask and so can use FIOPIN to set the pin values in one go, instead of two writes to FIOSET and FIOCLR. This makes it even faster than FastPortOut, but with a drawback that you can't control other pins on the port, even from other PinOut instances.

DigitalOut: 12.500001 seconds (125 ns per iteration).
FastOut: 8.333334 seconds (83 ns per iteration).
PortOut: 13.541668 seconds (135 ns per iteration).
FastPortOut: 8.333334 seconds (83 ns per iteration).
MaskedPortOut: 5.208334 seconds (52 ns per iteration).
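
The FIOMASK trade-off can be modelled on a PC. The sketch below is not the FastIO source; MockMaskedPort and MockMaskedPortOut are hypothetical names. It models only the FIOPIN write path, where a 1 bit in FIOMASK protects that pin. On the real chip the mask applies to FIOSET and FIOCLR as well.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical model of one port with FIOMASK semantics: a 1 bit in fiomask
// protects that pin from writes made through FIOPIN.
struct MockMaskedPort {
    uint32_t dir;      // models FIODIR
    uint32_t fiomask;  // models FIOMASK
    uint32_t pin;      // models FIOPIN
    void write_pin(uint32_t v) { pin = (pin & fiomask) | (v & ~fiomask); }
};

static MockMaskedPort g_port0;

template <uint32_t Mask> class MockMaskedPortOut {
public:
    MockMaskedPortOut() {
        g_port0.dir |= Mask;
        g_port0.fiomask = ~Mask;  // only the pins in Mask stay writable
    }
    // A single store sets and clears all masked pins in one go.
    void write(uint32_t value) { g_port0.write_pin(value); }
    MockMaskedPortOut& operator=(uint32_t value) { write(value); return *this; }
};
```

Because the mask register is shared by everything that uses the port, a later write to any pin outside Mask is silently dropped in this model, which is the drawback described above.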

24 May 2010

Igor Skochinsky wrote:

Just uploaded new version of the program to add a new class, MaskedPortOut. It uses FIOMASK to pre-set the pin mask and so can use FIOPIN to set the pin values in one go, instead of two writes to FIOSET and FIOCLR. This makes it even faster than FastPortOut, but with a drawback that you can't control other pins on the port, even from other PinOut instances.

 

DigitalOut: 12.500001 seconds (125 ns per iteration).
FastOut: 8.333334 seconds (83 ns per iteration).
PortOut: 13.541668 seconds (135 ns per iteration).
FastPortOut: 8.333334 seconds (83 ns per iteration).
MaskedPortOut: 5.208334 seconds (52 ns per iteration).

 

When you say: 'you can't control other pins on the port, even from other PinOut instances.' if these pins are assigned to other interfaces will this stop them operating? e.g. SPI, UART

24 May 2010

Andrew Harpin wrote:

When you say: 'you can't control other pins on the port, even from other PinOut instances.' if these pins are assigned to other interfaces will this stop them operating? e.g. SPI, UART

I'm not 100% sure, but the non-GPIO pins will probably continue to work.

-deleted-
24 May 2010

Can't get it working, Igor.

I tried using the following bit of code without touching FastIO.h:

 

#include "mbed.h"

#include "FastIO.h"

#define LED_MASK 0x07800000


FastPortOut ledport2;

int main() {
 
   
    while (1)
    {
        ledport2 = 0x0;
        ledport2 = LED_MASK;
    }

}

 

Just got high output on DIP pins 15,16,17,18. :(

24 May 2010

Your code paste removed stuff between angle brackets but I'll assume you had Port0 and LED_MASK there. Also, your statement is a bit vague. Do you mean you have only high output, i.e. you don't see low? Try inserting a delay between assignment of 0 and LED_MASK, it could be that the switch happens too quickly and you don't see 0 at all. I don't have an oscilloscope/logic analyzer myself so I can't test it.

-deleted-
25 May 2010

Yes, you are right Igor, I did have it there :s It's the second time posting has done that to code I have inserted :(

I only get a high output; I don't see it go low.

I will try a delay, but I don't think it will make a difference because I am using a 300MHz scope to see the signal.

25 May 2010

David,

You were right, I had a bug in FastPortOut (MaskedPortOut should have worked correctly). Please try the new version: FastIO

-deleted-
25 May 2010

Igor,

DigitalOut gives 4MHz

FastOut gives 6MHz

but port out fast out and masked out give nothing on pin 15

-deleted-
25 May 2010

How can I get this bit of code to work?

int main() {
0x2009c000 = 0x00800000; //dir reg
0x2009c010 = 0x00800000; //mask reg
    while(1) {
0x2009c018 = 0x00800000; //set reg
0x2009c01c = 0x00800000; //clear reg
    }
}

Basically, all it is meant to do is:

set the port 0 direction register to output for DIP pin 15

set the mask register for pin 15

start an infinite loop

set pin 15 high

set pin 15 low

repeat

But the compiler doesn't recognise that I just want to write to a register in the device, and it won't compile.

Sorry if I should know this, but I'm fairly new to C programming, especially for microcontrollers, and am only just getting to grips with the mbed stuff.
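
The snippet above fails to compile because an integer literal such as 0x2009c000 is a value, not a storage location, so nothing can be assigned to it. The usual approach is to cast the address to a pointer to volatile and dereference it. A minimal sketch follows; reg32 and demo_on_host are hypothetical helper names, and the hardware addresses in the comments are the ones from the question.

```cpp
#include <cassert>
#include <stdint.h>

// Treat an address as a 32-bit hardware register. volatile tells the
// compiler that every access matters, so it may not cache or drop them.
inline volatile uint32_t& reg32(uintptr_t address) {
    return *reinterpret_cast<volatile uint32_t*>(address);
}

// On the LPC1768 this would read, for example:
//   reg32(0x2009C000) = 0x00800000;  // FIODIR: P0.23 as output
//   reg32(0x2009C018) = 0x00800000;  // FIOSET: drive it high
//   reg32(0x2009C01C) = 0x00800000;  // FIOCLR: drive it low

// On a PC the same helper can be pointed at an ordinary variable instead.
inline uint32_t demo_on_host() {
    uint32_t fake_register = 0;
    reg32(reinterpret_cast<uintptr_t>(&fake_register)) = 0x00800000u;
    return fake_register;
}
```

In practice the CMSIS structs such as LPC_GPIO0 wrap exactly this kind of cast, so using them is less error-prone than raw addresses.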

 

-deleted-
25 May 2010

#include "mbed.h"

PinName pin = P0_23;  // P0.23 is DIP pin 15
inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };

int main() {
    int dir = 0;
    portdef()->FIODIR = dir;
    portdef()->FIODIR = 0x00800000 | dir;   // P0.23 as output
    portdef()->FIOMASK = 0xFF7FFFFF;        // 0 bits are writable: only P0.23
    while(1){
        portdef()->FIOSET = 0x00800000;     // pin high
        portdef()->FIOCLR = 0x00800000;     // pin low
    }
}

The above code works: it makes DIP pin 15 go high and low constantly at 24MHz, but the amplitude is pretty rubbish at only 300mV.

-deleted-
25 May 2010

Forget the 300mV thing, it's really just over 2V. I now have it toggling on DIP pins 15, 16, 17 and 18, all at 24MHz.

-deleted-
25 May 2010

I'll be honest though, I have no idea what the two lines below do:

PinName pin = P0_23;
inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };

 

The top one I wrote myself, but it was just luck that it worked, and the second one was taken from Igor's code.

25 May 2010

The portdef() helper function just returns the port definition for the pin number. Since you know that you need port 0, you can use LPC_GPIO0->FIODIR and so on directly.

-deleted-
25 May 2010

OK, just thinking about the I/Os: is it possible to get a higher frequency than 24MHz out of an I/O?

Is 24MHz I/O good?

This means that it takes 2 clock cycles to write to a register, is this correct?

-deleted-
25 May 2010

Thanks Igor I'm starting to understand a bit better.

I changed the code:

#include "mbed.h"
int main() {
    int dir = 0;
    LPC_GPIO0->FIODIR = dir;
    LPC_GPIO0->FIODIR = 0x07800000 | dir;   // P0.23..P0.26 as outputs
    LPC_GPIO0->FIOMASK = 0xF87FFFFF;        // 0 bits are writable: P0.23..P0.26
    while(1){
        LPC_GPIO0->FIOSET = 0x07800000;     // all four pins high
        LPC_GPIO0->FIOCLR = 0x07800000;     // all four pins low
    }
}

This is the complete code for toggling 4 I/O lines at the same time at 24MHz.

I think that's as fast as it can go, unless the LPC1768 is capable of 1 clock cycle writes, in which case we could see a 50MHz I/O.
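
The cycle counts behind these estimates can be checked with a little arithmetic. The sketch assumes the mbed LPC1768 core clock is 96 MHz; the function names are made up for this example, and a C++11 compiler is assumed.

```cpp
// Assumption: the mbed LPC1768 core runs at 96 MHz.
constexpr unsigned long kCoreHz = 96000000UL;

// Core clock cycles in one full high/low period of the output.
constexpr unsigned long cycles_per_period(unsigned long pin_hz) {
    return kCoreHz / pin_hz;
}

// Best-case square wave if each of the two writes per period took the given
// number of cycles and nothing else (such as the loop branch) cost any time.
constexpr unsigned long max_toggle_hz(unsigned long cycles_per_write) {
    return kCoreHz / (2UL * cycles_per_write);
}

static_assert(cycles_per_period(24000000UL) == 4, "24 MHz is 4 cycles per period");
static_assert(max_toggle_hz(1) == 48000000UL, "1-cycle writes would give 48 MHz");
```

So 24MHz corresponds to an average of 2 cycles per write, with any loop overhead included in that average, and single-cycle writes would top out at 48MHz rather than 50MHz under this clock assumption.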

08 Jan 2011

If this is of any interest to you, I made some tests this morning with my mbed; you can see the oscilloscope plots here:

http://sylvain.azarian.org/doku.php?id=mbed
