Fast GPIO with C++ templates
Some people have complained that mbed's GPIO seems to be kinda slow. I decided to investigate and see how it can be improved.
Let's see what something like "led1 = 1;" looks like in assembly.
MOVS R1, #1
LDR R0, =led1
BL _ZN4mbed10DigitalOutaSEi ; mbed::DigitalOut::operator=(int)
|
v
; mbed::DigitalOut::operator=(int)
_ZN4mbed10DigitalOutaSEi
PUSH {R4,LR} ; save registers
MOV R4, R0 ; R4 = this
LDR R0, [R0,#0x10] ; R0 = this->_pin;
BL gpio_write ; gpio_write(_pin);
MOV R0, R4 ; return this;
POP {R4,PC}
; End of function mbed::DigitalOut::operator=(int)
|
v
gpio_write
BIC.W R2, R0, #0x1F ; base = pin & ~0x1F
AND.W R0, R0, #0x1F ; bit_no = pin & 0x1F
MOVS R3, #1
LSL.W R0, R3, R0 ; R0 = 1 << bit_no
CMP R1, #0 ; value == 0?
ITE EQ
STREQ R0, [R2,#0x1C] ; yes, use FIOCLR
STRNE R0, [R2,#0x18] ; no, use FIOSET
BX LR
; End of function gpio_write
You can see that the code in gpio_write() relies on a trick in how pins are defined in PinNames.h: pin values begin at LPC_GPIO0_BASE and there are 32 pins per port. Conveniently, the base addresses of the GPIO ports are also 32 bytes apart. So if you clear the low 5 bits of the pin value, you get the GPIO port base for that pin, and those same low 5 bits give the bit number within the port. Still, that is quite a bit of overhead for a simple port write.
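The decoding trick can be checked on a PC with a few constant expressions. This is only a sketch: kGpio0Base is assumed to be 0x2009C000 (inferred from the GPIO1 base 0x2009C020 that appears in the FastOut disassembly), and encode_pin, port_base and bit_mask are hypothetical names, not part of the mbed library.

```cpp
#include <cstdint>

// Assumed GPIO port 0 base address on the LPC1768 (not copied from the mbed headers).
constexpr std::uint32_t kGpio0Base = 0x2009C000u;

// Hypothetical helper: build a pin value the way the text describes the encoding,
// with 32 pin values per port starting at the GPIO0 base.
constexpr std::uint32_t encode_pin(std::uint32_t port, std::uint32_t bit) {
    return kGpio0Base + port * 32u + bit;
}

// Clearing the low 5 bits leaves the port's register base.
constexpr std::uint32_t port_base(std::uint32_t pin) { return pin & ~0x1Fu; }

// The low 5 bits select the bit inside the port.
constexpr std::uint32_t bit_mask(std::uint32_t pin) { return 1u << (pin & 0x1Fu); }

// P1.20 (port 1, bit 20) decodes to the GPIO1 base and the mask 1<<20.
static_assert(port_base(encode_pin(1, 20)) == 0x2009C020u, "port 1 base");
static_assert(bit_mask(encode_pin(1, 20)) == 0x00100000u, "bit 20 mask");
```

Because everything here is a constant expression, the compiler can fold it away completely when the pin is known at compile time, which is the idea the rest of the post builds on.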
What if we use the write() method instead of the assignment operator?
MOVS R1, #1
LDR R0, =led1
BL _ZN4mbed10DigitalOut5writeEi ; mbed::DigitalOut::write(int)
|
v
; mbed::DigitalOut::write(int)
_ZN4mbed10DigitalOut5writeEi
LDR R0, [R0,#0x10]
B.W gpio_write
; End of function mbed::DigitalOut::write(int)
Well, this is somewhat better. The compiler knows that it doesn't need to return the this pointer, so it optimized away the register save/restore and turned the tail call into a branch. So here's Lesson 1: if you want DigitalOut to be somewhat faster, use write() instead of an assignment.
However, the gpio_write function remains and while it's not that big, it does add some processing to a simple operation.
Can we make the code more efficient while keeping the convenient C++ interface?
To do that, we should let the compiler do more optimization. For that, it's best if it can work with the complete source code instead of linking to libraries or separate object files. In that case it can optimize each call for the way the function is actually used.
One of the ways to do it is to use templates. They expand at compile time and give the optimizer the most freedom. I wrote the following class:
template <PinName pin> class FastOut
{
// pin = LPC_GPIO0_BASE + port * 32 + bit
// port = (pin - LPC_GPIO0_BASE) / 32
// bit = (pin - LPC_GPIO0_BASE) % 32
// helper function to calculate the GPIO port definition for the pin
// we rely on the fact that port structs are 0x20 bytes apart
inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };
// helper function to calculate the mask for the pin's bit in the port
inline uint32_t mask() { return 1UL<<((pin - LPC_GPIO0_BASE) % 32); };
public:
FastOut()
{
// set FIODIR bit to 1 (output)
portdef()->FIODIR |= mask();
}
void write(int value)
{
if ( value )
portdef()->FIOSET = mask();
else
portdef()->FIOCLR = mask();
}
int read()
{
return ((portdef()->FIOPIN) & mask()) != 0; // parenthesized: != binds tighter than &
}
FastOut& operator= (int value) { write(value); return *this; };
FastOut& operator= (FastOut& rhs) { write(rhs.read()); return *this; }; // write() returns void
operator int() { return read(); };
};
It can be used as follows:
FastOut<LED2> led2;
Let's see what "led2 = 1;" looks like now:
LDR.W R8, =0x2009C020 ; 0x2009C020 = GPIO1 base
MOV.W R7, #0x100000 ; 0x100000 = 1<<20 (LED2 = P1.20)
STR.W R7, [R8,#0x18] ; FIOSET = mask
As you can see, the compiler precalculated all constants, eliminated dead code (the else branch of if (value) ) and left only the absolutely necessary instructions. Also, even though we used the operator= version, it realized that we don't use the return value so it doesn't need to be kept around. In fact, the led2 instance does not take any flash or RAM space at all, since the class doesn't have any data members.
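The "no data members" point can be tried on a PC with a stand-in for the register block. What follows is a minimal sketch, not the FastOut class itself: FakeGpio is a hypothetical substitute for LPC_GPIO_TypeDef laid out to match the offsets in the listings (FIOSET at 0x18, FIOCLR at 0x1C), fake_port1 lives in ordinary RAM, and SketchOut takes the port and bit as separate template parameters instead of a PinName.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the GPIO register block, so the sketch runs on a host.
struct FakeGpio {
    volatile std::uint32_t FIODIR;   // offset 0x00
    std::uint32_t reserved[3];       // offsets 0x04..0x0C
    volatile std::uint32_t FIOMASK;  // offset 0x10
    volatile std::uint32_t FIOPIN;   // offset 0x14
    volatile std::uint32_t FIOSET;   // offset 0x18
    volatile std::uint32_t FIOCLR;   // offset 0x1C
};

FakeGpio fake_port1 = {};  // pretend port 1, zero-initialized

// Port and bit are template parameters, so the class needs no data members.
template <FakeGpio& Port, unsigned Bit>
class SketchOut {
    static constexpr std::uint32_t mask() { return 1u << Bit; }
public:
    SketchOut() { Port.FIODIR = Port.FIODIR | mask(); }  // mark the pin as output
    void write(int value) {
        if (value) Port.FIOSET = mask();
        else       Port.FIOCLR = mask();
    }
};

// The instance carries no state of its own.
static_assert(std::is_empty<SketchOut<fake_port1, 20> >::value, "no per-instance storage");

// Host-side check that the writes land in the expected fake registers.
inline void demo() {
    SketchOut<fake_port1, 20> led;
    assert(fake_port1.FIODIR == (1u << 20));
    led.write(1);
    assert(fake_port1.FIOSET == (1u << 20));
}
```

On real hardware the writes would go to the memory-mapped registers instead of a RAM struct, but the compile-time structure is the same.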
Pretty cool, huh?
But this was the extreme case. If we do an assignment with a variable, the compiler has to keep the comparison in.
; led2 = value;
LDR R1, =0x2009C020 ; 0x2009C020 = GPIO1 base
MOV.W R0, #0x100000 ; 0x100000 = 1<<20 (LED2 = P1.20)
CMP R4, #0 ; value == 0 ?
ITE EQ
STREQ R0, [R1,#0x1C] ; if yes, FIOCLR = mask
STRNE R0, [R1,#0x18] ; if no, FIOSET = mask
Still, the code is much shorter than with DigitalOut.
Now let's see some figures. I made a program with two loops of a hundred million iterations each that toggle a pin, first using DigitalOut, then using FastOut. You can try it out here: FastIO
Here are the results for LPC1768:
DigitalOut: 28.125000 seconds (281 ns per iteration).
FastOut: 7.291668 seconds (72 ns per iteration).
Almost 4x faster.
And for LPC2368:
DigitalOut: 90.000000 seconds (900 ns per iteration).
FastOut: 25.000004 seconds (250 ns per iteration).
About 3.6x faster.
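The per-iteration figures and the speed-ups follow directly from the totals and the hundred million iterations. A small sketch of that arithmetic (the function names are made up for illustration):

```cpp
// Total run time divided by the iteration count, expressed in nanoseconds.
constexpr double ns_per_iteration(double seconds, double iterations) {
    return seconds * 1e9 / iterations;
}

// How many times faster the second measurement is than the first.
constexpr double speedup(double slow_seconds, double fast_seconds) {
    return slow_seconds / fast_seconds;
}

// LPC1768, DigitalOut: 28.125 s over 1e8 iterations is about 281 ns each.
static_assert(ns_per_iteration(28.125, 1e8) > 281.2 &&
              ns_per_iteration(28.125, 1e8) < 281.3, "about 281 ns");

// LPC1768: 28.125 s vs 7.291668 s is a bit under 3.9x.
static_assert(speedup(28.125, 7.291668) > 3.85 &&
              speedup(28.125, 7.291668) < 3.86, "almost 4x");
```

The same calculation for the LPC2368 totals (90.0 s against 25.000004 s) gives a ratio of roughly 3.6.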
So, if you need better speed than DigitalOut but don't want to mess with registers directly, you can use this class and keep the syntax almost the same. However, as it is, it's missing some features of DigitalOut:
1) It does not set up the pin function to be GPIO but relies on the fact that that's the default after reset. It also doesn't set pull-up/pull-down configuration. However, that can be easily added to the constructor.
2) It does not implement RPC functionality. I haven't looked yet at what's needed for it.
Hi
Newbie here, so the question may be silly.
How can I use the FastIO class on pins other than LED1, LED2, etc., i.e. on any pin of the mbed?
Thanks in advance
Thanos
Just put the pin number between <>:
FastOut<p8> pin8;
Hi Igor,
I've done some work on the DigitalIn/Out implementation to get the speed improved, and I think it'll get something near to the speed-up you've seen but without changing the API. Would you be up for trying them out if we gave you a beta release?
Thanks,
Simon
I guess I could, though I'm not a very good tester IMO.
22 May 2010 . Edited: 22 May 2010
Hi Guys,
Just ran a test with this on my mbed.
It was a very basic test; see the code below.
#include "mbed.h"
template <PinName pin> class FastOut
{
// pin = LPC_GPIO0_BASE + port * 32 + bit
// port = (pin - LPC_GPIO0_BASE) / 32
// bit = (pin - LPC_GPIO0_BASE) % 32
// helper function to calculate the GPIO port definition for the pin
// we rely on the fact that port structs are 0x20 bytes apart
inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };
// helper function to calculate the mask for the pin's bit in the port
inline uint32_t mask() { return 1UL<<((pin - LPC_GPIO0_BASE) % 32); };
public:
FastOut()
{
// set FIODIR bit to 1 (output)
portdef()->FIODIR |= mask();
}
void write(int value)
{
if ( value )
portdef()->FIOSET = mask();
else
portdef()->FIOCLR = mask();
}
int read()
{
return ((portdef()->FIOPIN) & mask()) != 0; // parenthesized: != binds tighter than &
}
FastOut& operator= (int value) { write(value); return *this; };
FastOut& operator= (FastOut& rhs) { write(rhs.read()); return *this; }; // write() returns void
operator int() { return read(); };
};
FastOut<p8> led;
int main()
{
while(1)
{
led = 0;
led = 1;
}
}
This gave me a frequency of 24MHz.
I am looking at this because I would like to try to use the Sharp LCD display from Sparkfun that is used in the PSP.
This LCD requires a pixel clock of 9 MHz.
I thought I would try to drive it with the mbed at first. I know I don't have enough I/Os, but I think I'll just leave out some of the LSB data bits on the screen.
Anyway, just to let people know, I found this works.
I'm going to run some more tests switching multiple I/Os to see what the effect is.
Also, have a go with the new DigitalOut and PortOut interfaces in the latest version of the library. DigitalOut has been sped up a lot now, and PortOut gives you the ability to write to all bits in a port at about the same speed, which might be interesting.
Hi Simon,
Just tried DigitalOut; see the code below:
#include "mbed.h"
DigitalOut led(p8);
int main() {
while(1) {
led = 1;
led = 0;
}
}
This gives a frequency of 8.7MHz, considerably less than the 24MHz I was getting with Igor's code.
Using this code:
#include "mbed.h"
DigitalOut led(p8);
int main() {
while(1) {
led = !led;
}
}
gives an even worse result of 3.4MHz.
I noticed a significant difference in compiled file sizes: with the mbed library's DigitalOut I get just under 10kB for each of the two programs above, but Igor's code takes up under 2kB, as seen in my earlier comment.
I think there must be a lot more going on in the background with the DigitalOut library.
I tried Igor's code with multiple I/Os.
I switched 4 I/Os, one after another, like this:
#include "mbed.h"
template <PinName pin> class FastOut
{
// pin = LPC_GPIO0_BASE + port * 32 + bit
// port = (pin - LPC_GPIO0_BASE) / 32
// bit = (pin - LPC_GPIO0_BASE) % 32
// helper function to calculate the GPIO port definition for the pin
// we rely on the fact that port structs are 0x20 bytes apart
inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };
// helper function to calculate the mask for the pin's bit in the port
inline uint32_t mask() { return 1UL<<((pin - LPC_GPIO0_BASE) % 32); };
public:
FastOut()
{
// set FIODIR bit to 1 (output)
portdef()->FIODIR |= mask();
}
void write(int value)
{
if ( value )
portdef()->FIOSET = mask();
else
portdef()->FIOCLR = mask();
}
int read()
{
return ((portdef()->FIOPIN) & mask()) != 0; // parenthesized: != binds tighter than &
}
FastOut& operator= (int value) { write(value); return *this; };
FastOut& operator= (FastOut& rhs) { write(rhs.read()); return *this; }; // write() returns void
operator int() { return read(); };
};
FastOut led1;
FastOut led2;
FastOut led3;
FastOut led4;
int main()
{
while(1)
{
led1 = 0;
led2 = 1;
led3 = 0;
led4 = 1;
led1 = 1;
led2 = 0;
led3 = 1;
led4 = 0;
}
}
It brought the frequency down to 9MHz. This is obviously going to get worse if I try to drive a lot more I/Os like this.
I have yet to try PortOut, but I'll let you know the results.
OK, so I tried PortOut.
I used the following code:
#include "mbed.h"
#define LED_MASK 0x07800000
PortOut ledport(Port0, LED_MASK);
int main() {
while(1) {
ledport = LED_MASK;
ledport = 0;
}
}
This gave me a 6MHz output on all 4 pins of the port. I tried the same code with a mask of 0x00800000 and it still gives 6MHz, which shows that the speed isn't affected by how many I/Os you change, as long as they are on the same port.
Still, it's a lot slower than 24MHz and wouldn't cope with the 9MHz clock I need for the LCD.
I wonder if it would be possible to apply the ideas from Igor's code to PortOut. I don't think Igor's code does everything the mbed library does, but maybe we could have a cut-down version.
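For readers wondering where 0x07800000 comes from: it is simply bits 23 to 26 of the port OR-ed together. The sketch below rebuilds it from bit numbers; the mapping of those bits to P0.23..P0.26 (DIP pins 15 to 18) is an assumption based on the pins mentioned later in the thread, and bit() and kLedMask are illustrative names.

```cpp
#include <cstdint>

// One-bit mask for bit n of a 32-bit port.
constexpr std::uint32_t bit(unsigned n) { return 1u << n; }

// Bits 23..26 of port 0 (assumed to be P0.23..P0.26).
constexpr std::uint32_t kLedMask = bit(23) | bit(24) | bit(25) | bit(26);

static_assert(kLedMask == 0x07800000u, "same value as LED_MASK in the post");
static_assert(bit(23) == 0x00800000u, "the single-pin mask used in the second run");
```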
David, try the new version of my program I just published: FastIO. I added FastPortOut, which you can use much like the new PortOut; just pass the parameters as template arguments instead of to the constructor:
FastPortOut<Port0, LED_MASK> ledport;
Here are the new test results versus library version 21:
DigitalOut: 12.500001 seconds (125 ns per iteration).
FastOut: 7.291668 seconds (72 ns per iteration).
PortOut: 13.541668 seconds (135 ns per iteration).
FastPortOut: 9.375001 seconds (93 ns per iteration).
Just uploaded a new version of the program that adds a new class, MaskedPortOut. It uses FIOMASK to pre-set the pin mask, so it can set the pin values with a single write to FIOPIN instead of two writes to FIOSET and FIOCLR. This makes it even faster than FastPortOut, but with the drawback that you can't control other pins on the port, even from other PinOut instances.
DigitalOut: 12.500001 seconds (125 ns per iteration).
FastOut: 8.333334 seconds (83 ns per iteration).
PortOut: 13.541668 seconds (135 ns per iteration).
FastPortOut: 8.333334 seconds (83 ns per iteration).
MaskedPortOut: 5.208334 seconds (52 ns per iteration).
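To make the FIOMASK idea concrete, here is a small software model of a masked FIOPIN write. It is a sketch under an assumption about the hardware, namely that bits set to 1 in FIOMASK are left untouched by writes and bits set to 0 take the written value; fiopin_after_write is a made-up name and nothing here touches real registers.

```cpp
#include <cstdint>

// Model (assumption): bits that are 1 in fiomask keep their old value,
// bits that are 0 take the newly written value.
constexpr std::uint32_t fiopin_after_write(std::uint32_t old_pins,
                                           std::uint32_t fiomask,
                                           std::uint32_t value) {
    return (old_pins & fiomask) | (value & ~fiomask);
}

constexpr std::uint32_t kLedMask = 0x07800000u;  // pins we want to drive
constexpr std::uint32_t kFioMask = ~kLedMask;    // protect every other pin

static_assert(kFioMask == 0xF87FFFFFu, "inverse of the LED mask");

// Writing all ones only changes the four unmasked pins; bit 0 stays as it was.
static_assert(fiopin_after_write(0x00000001u, kFioMask, 0xFFFFFFFFu) == 0x07800001u,
              "one write sets all four pins");
```

Under this model a single store replaces the FIOSET/FIOCLR pair, but the mask stays in effect for any other code accessing the same port, which is the drawback described above.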
When you say 'you can't control other pins on the port, even from other PinOut instances': if these pins are assigned to other interfaces (e.g. SPI, UART), will this stop them operating?
I'm not 100% sure, but the non-GPIO pins will probably continue to work.
Can't get it working, Igor.
I tried the following bit of code without touching FastIO.h:
#include "mbed.h"
#include "FastIO.h"
#define LED_MASK 0x07800000
FastPortOut ledport2;
int main() {
while (1)
{
ledport2 = 0x0;
ledport2 = LED_MASK;
}
}
Just got high output on DIP pins 15,16,17,18. :(
Your code paste removed stuff between angle brackets but I'll assume you had Port0 and LED_MASK there. Also, your statement is a bit vague. Do you mean you have only high output, i.e. you don't see low? Try inserting a delay between assignment of 0 and LED_MASK, it could be that the switch happens too quickly and you don't see 0 at all. I don't have an oscilloscope/logic analyzer myself so I can't test it.
Yes, you are right Igor, I did have it there :s It's the second time posting has done that to code I have inserted :(
I only get a high output; I don't see it go low.
I will try a delay, but I don't think it will make a difference because I am using a 300MHz scope to look at the signal.
David,
You were right, I had a bug in FastPortOut (MaskedPortOut should have worked correctly). Please try the new version: FastIO
Igor,
DigitalOut gives 4MHz
FastOut gives 6MHz
but port out fast out and masked out give nothing on pin 15
How can I get this bit of code to work?
int main() {
0x2009c000 = 0x00800000; //dir reg
0x2009c010 = 0x00800000; //mask reg
while(1) {
0x2009c018 = 0x00800000; //set reg
0x2009c01c = 0x00800000; //clear reg
}
}
Basically, all it does is:
set the port 0 direction register to output for DIP pin 15
set the mask register for pin 15
start an infinite loop
set pin 15 high
set pin 15 low
repeat
But the compiler doesn't recognise that I just want to write to a register in the device, and it won't compile.
Sorry if I should know this, but I'm fairly new to C programming, especially for microcontrollers, and am only just getting to grips with the mbed stuff.
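The snippet above fails to compile because a bare number such as 0x2009c018 is just an integer value, not something that can be assigned to. The usual C/C++ idiom is to cast the address to a pointer to a volatile 32-bit integer and dereference it. A minimal sketch follows; reg32 is a hypothetical helper name, and demo() points it at an ordinary variable so it can run on a PC instead of real hardware.

```cpp
#include <cassert>
#include <cstdint>

// Treat an integer address as a 32-bit memory-mapped register.
// volatile tells the compiler that every read and write must really happen.
inline volatile std::uint32_t& reg32(std::uintptr_t address) {
    return *reinterpret_cast<volatile std::uint32_t*>(address);
}

// On the target this would read, for example:
//   reg32(0x2009C018) = 0x00800000;   // write the "set" register from the post
// Here the helper is aimed at a local variable instead, purely as a host-side check.
inline void demo() {
    std::uint32_t fake_register = 0;
    reg32(reinterpret_cast<std::uintptr_t>(&fake_register)) = 0x00800000u;
    assert(fake_register == 0x00800000u);
}
```

The register structs that come with mbed.h wrap this same idea behind named fields, so raw addresses are rarely needed in practice.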
#include "mbed.h"
PinName pin = P0_23;
inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };
int main() {
int dir = 0;
portdef()->FIODIR = dir;
portdef()->FIODIR = 0x00800000 | dir;
portdef()->FIOMASK = 0xFF7FFFFF;
while(1){
portdef()->FIOSET = 0x00800000;
portdef()->FIOCLR = 0x00800000;
}
}
The above code works: it makes DIP pin 15 go high and low constantly at 24MHz, but the amplitude is pretty rubbish at only 300mV.
Forget the 300mV thing, it's really just over 2V. I now have it toggling on DIP pins 15, 16, 17 and 18, all at 24MHz.
I'll be honest though, I have no idea what the two lines below do:
PinName pin = P0_23;
inline LPC_GPIO_TypeDef* portdef() { return (LPC_GPIO_TypeDef*)(LPC_GPIO_BASE + ((pin - P0_0)/32)*0x20); };
The top one I wrote myself, but it was just luck that it worked, and the second one was taken from Igor's code.
The portdef() helper just returns the port definition for the pin number. Since you know that you need port 0, you can use LPC_GPIO0->FIODIR and so on directly.
OK, just thinking about the I/Os: is it possible to get a higher frequency than 24MHz out of an I/O?
Is 24MHz I/O good?
This means that it takes 2 clock cycles to write to a register, is this correct?
Thanks Igor, I'm starting to understand a bit better.
I changed the code.
#include "mbed.h"
int main() {
int dir = 0;
LPC_GPIO0->FIODIR = dir;
LPC_GPIO0->FIODIR = 0x07800000 | dir;
LPC_GPIO0->FIOMASK = 0xF87FFFFF;
while(1){
LPC_GPIO0->FIOSET = 0x07800000;
LPC_GPIO0->FIOCLR = 0x07800000;
}
}
This is the complete code for toggling 4 I/O lines at the same time at 24MHz.
I think that's as fast as it can go, unless the LPC1768 is capable of 1 clock cycle writes, in which case we could see a 50MHz I/O.
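The cycle estimate can be written out explicitly. The sketch below assumes a 96 MHz core clock, which is not stated anywhere in the thread; the names are illustrative. Note that one output period contains one pass through the loop, which is two stores plus the loop branch, so 4 cycles per period does not by itself prove that each store costs 2 cycles.

```cpp
#include <cstdint>

// Assumption: the core runs at 96 MHz (not stated in the thread).
constexpr std::uint32_t kAssumedCoreHz = 96000000u;

// Core cycles spent per full high/low period of the output signal.
constexpr std::uint32_t cycles_per_period(std::uint32_t core_hz, std::uint32_t toggle_hz) {
    return core_hz / toggle_hz;
}

// Output frequency if one full period took the given number of cycles.
constexpr std::uint32_t max_toggle_hz(std::uint32_t core_hz, std::uint32_t cycles) {
    return core_hz / cycles;
}

// 24 MHz on the pin corresponds to 4 core cycles per loop pass.
static_assert(cycles_per_period(kAssumedCoreHz, 24000000u) == 4u, "4 cycles per period");

// A hypothetical 2-cycle loop pass would give 48 MHz, close to the 50 MHz guess.
static_assert(max_toggle_hz(kAssumedCoreHz, 2u) == 48000000u, "48 MHz upper estimate");
```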
If this is of any interest to you, I made some tests this morning with my mbed; you can see the oscilloscope plots here:
http://sylvain.azarian.org/doku.php?id=mbed