The official Mbed 2 C/C++ SDK provides the software platform and libraries to build your applications.

Issue: Significant LPC1768 InterruptIn speed improvement (Closed: Fixed)

After reading http://mbed.org/questions/1228/Can-I-use-InterruptIn-to-receive-interru/ I got curious and decided to check the LPC1768's InterruptIn code, which is here: http://mbed.org/users/mbed_official/code/mbed-src/file/3bc89ef62ce7/vendor/NXP/LPC1768/hal/gpio_irq_api.c.

This code runs loops that check every possible interrupt pending bit, which works but isn't exactly efficient. So plan A was to write a simple search algorithm to find the bit that is '1'. Since by far the most likely situation is a single pending interrupt, that would speed up the process a lot in most use cases. Sadly, after roughly 5 minutes I got bored of trying to make that.
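A minimal sketch of what such a plan A search could look like (this is not code from the issue, and `highest_set_bit` is a hypothetical name): a binary search that halves the bit range each step, so any non-zero 32-bit word is resolved in exactly five comparisons.

```c
#include <stdint.h>

/* Hypothetical helper: return the index (0..31) of the highest set bit
 * of a 32-bit value using a binary search over bit ranges.
 * Returns 0 for x == 0, so callers must check for a pending bit first. */
static int highest_set_bit(uint32_t x)
{
    int pos = 0;
    if (x >= (1u << 16)) { x >>= 16; pos += 16; }  /* bit in upper 16? */
    if (x >= (1u << 8))  { x >>= 8;  pos += 8;  }  /* bit in upper 8 of rest? */
    if (x >= (1u << 4))  { x >>= 4;  pos += 4;  }
    if (x >= (1u << 2))  { x >>= 2;  pos += 2;  }
    if (x >= (1u << 1))  {           pos += 1;  }
    return pos;
}
```

CLZ does the same job in a single instruction, which is what made plan B attractive.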

So time for plan B, going old-school (at least for me personally): google the instruction set and see if there is anything useful in it. The instruction set is surprisingly small and easy to understand, and after almost missing it I found what I was looking for: CLZ (Count Leading Zeros). With some more googling I found that it is also available in the standard code as a C function (a CMSIS intrinsic):

__CLZ

So at that point it becomes fairly straightforward: run a loop while there are still bits set to '1', count the leading zeros, use that to calculate the bit position, and call the required user function. Then clear the interrupt and check whether more are pending:

    while (fall0 != 0) {    //Continue as long as there are interrupts pending
        bitloc = 31 - __CLZ(fall0); //CLZ returns the number of leading zeros, so 31 minus that is the position of the highest pending interrupt
        irq_handler(channel_ids[bitloc], IRQ_FALL); //Run that interrupt
        
        //Clear the interrupt in the clear register, and remove it from our local copy of the interrupt pending register
        LPC_GPIOINT->IO0IntClr = 1u << bitloc;
        fall0 &= ~(1u << bitloc);
    }
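To make the speed argument concrete, here is a minimal host-side sketch (not the mbed source) that counts loop iterations for the two strategies: the old approach tests all 32 bits every time, while the CLZ loop runs once per pending bit. It assumes a GCC or Clang compiler, where `__builtin_clz` stands in for the CMSIS `__CLZ`; both function names are hypothetical.

```c
#include <stdint.h>

/* Old strategy: test every bit position, whether pending or not. */
static unsigned scan_all_bits(uint32_t pending)
{
    unsigned iterations = 0;
    for (int i = 0; i < 32; i++) {
        iterations++;
        if (pending & (1u << i)) {
            /* ...the handler for bit i would be called here... */
        }
    }
    return iterations;
}

/* CLZ strategy: jump straight to the highest pending bit each time.
 * __builtin_clz is undefined for 0, so the loop condition guards it. */
static unsigned scan_with_clz(uint32_t pending)
{
    unsigned iterations = 0;
    while (pending != 0) {
        iterations++;
        int bitloc = 31 - __builtin_clz(pending);
        /* ...the handler for bitloc would be called here... */
        pending &= ~(1u << bitloc);   /* mark this bit as handled */
    }
    return iterations;
}
```

With a single pending interrupt that is 1 iteration instead of 32. The gain on real hardware is smaller, because interrupt entry/exit and the register reads cost the same in both versions.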

The rising-edge loop is exactly the same. For port 2 we have to add an offset to the channel_ids index and take into account that port 2 is only half a port:

    while (fall2 != 0) {    //Continue as long as there are interrupts pending
        bitloc = 31 - __CLZ(fall2); //CLZ returns the number of leading zeros, so 31 minus that is the position of the highest pending interrupt
        
        if (bitloc < 16)            //Not sure if this is actually needed
            irq_handler(channel_ids[bitloc+32], IRQ_FALL); //Run that interrupt
        
        //Clear the interrupt in the clear register, and remove it from our local copy of the interrupt pending register
        LPC_GPIOINT->IO2IntClr = 1u << bitloc;
        fall2 &= ~(1u << bitloc);
    }
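The four loops (rise/fall on port 0 and port 2) differ only in the status word, the channel_ids offset and the number of valid bits. Below is a hedged host-side sketch of folding them into one helper, including the `channel_ids[...] != 0` check discussed in the comments below. Everything here is hypothetical scaffolding: `dispatch_pending`, `record_irq`, the 48-entry table and the `edge_t` enum stand in for the real mbed types, the IntClr write is modelled by returning the mask that would be written, and `__builtin_clz` (GCC/Clang) stands in for `__CLZ`.

```c
#include <stdint.h>

#define NUM_CHANNELS 48   /* hypothetical: 32 port-0 bits + 16 port-2 bits */

typedef enum { EDGE_RISE, EDGE_FALL } edge_t;   /* stand-in for the mbed event type */
typedef void (*handler_fn)(uint32_t id, edge_t edge);

static uint32_t channel_ids[NUM_CHANNELS];      /* 0 = no InterruptIn attached */

/* Dispatch every pending bit in `status`. Bit n maps to
 * channel_ids[offset + n]; bits at or above `width` are cleared but not
 * dispatched. Returns the mask that would be written to the IntClr register. */
static uint32_t dispatch_pending(uint32_t status, unsigned offset,
                                 unsigned width, edge_t edge, handler_fn handler)
{
    uint32_t cleared = 0;
    while (status != 0) {
        unsigned bitloc = 31u - (unsigned)__builtin_clz(status);
        uint32_t id = (bitloc < width) ? channel_ids[offset + bitloc] : 0;
        if (id != 0)                  /* skip pins without a handler */
            handler(id, edge);
        cleared |= 1u << bitloc;      /* on hardware: IOxIntClr = 1u << bitloc */
        status &= ~(1u << bitloc);
    }
    return cleared;
}

/* Example handler for trying it out: remembers the last id it saw. */
static uint32_t last_id;
static unsigned call_count;
static void record_irq(uint32_t id, edge_t edge)
{
    (void)edge;
    last_id = id;
    call_count++;
}
```

For port 0 the call would use offset 0 and width 32, for port 2 offset 32 and width 16.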

The test case is a simple program in which you have to short p21 and p22. p21 generates a 50% duty-cycle square wave with increasing frequencies, and p22 uses InterruptIn to count the number of rising edges in one second. If more than 90% of the edges are counted it is considered a pass, otherwise a fail (the test isn't exact, since there is some overhead and not all frequencies can be generated exactly).
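The pass/fail rule can be written as a small helper. This is a sketch, not the actual test program: `edge_test_passed` is a hypothetical name, and it assumes "more than 90%" means measured > 0.9 × expected, computed in integer math to avoid floating point. Since the test counts for one second, the expected edge count equals the frequency in Hz.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pass if strictly more than 90% of the expected edges were counted:
 * measured / expected > 9/10, rearranged to avoid division. */
static bool edge_test_passed(uint32_t expected, uint32_t measured)
{
    return (uint64_t)measured * 10u > (uint64_t)expected * 9u;
}
```

Applied to the results reported in the comments below, 279840 of 395000 edges (about 71%) and 316706 of 385000 edges (about 82%) both come out as fails.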

In vendor/NXP/LPC1768/hal/gpio_irq_api.c there is a define to switch between the old and new IRQ routines (apparently it doesn't work if I try to make the switch in main.cpp).
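The reason a switch in main.cpp has no effect is that each source file is compiled as its own translation unit, so a `#define` in main.cpp is never seen while gpio_irq_api.c is compiled. A common pattern, sketched here with the hypothetical macro name `GPIO_IRQ_USE_CLZ` (the real define's name isn't given above), is to give the macro a default with `#ifndef`, so that a build-wide definition such as `-DGPIO_IRQ_USE_CLZ=0` on the compiler command line can still override it, if your toolchain lets you set one.

```c
/* Hypothetical macro name; the real define in gpio_irq_api.c may differ.
 * #ifndef provides a default that a build-wide -D flag can override;
 * a #define in main.cpp cannot, since main.cpp is a separate
 * translation unit. */
#ifndef GPIO_IRQ_USE_CLZ
#define GPIO_IRQ_USE_CLZ 1
#endif

/* Reports which dispatch routine this translation unit was built with. */
static const char *irq_routine_name(void)
{
#if GPIO_IRQ_USE_CLZ
    return "new (CLZ)";
#else
    return "old (bit scan)";
#endif
}
```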

The result is that the old routine fails somewhere between 90kHz and 100kHz, while the new one only starts missing edges at 390kHz: roughly a four times increase in speed :)

Example program

9 comments:

10 Sep 2013

Hi Erik,

This certainly looks like a nice improvement, thanks! What I don't understand is why you call irq_handler directly now, without checking for a proper ID like the old code did:

    if (channel_ids[i] != 0)
        irq_handler(channel_ids[i], IRQ_RISE);

If you skip the test, it looks like you might end up calling handlers for pins that don't actually have a handler set (that is, pins with no InterruptIn object associated with them).

Thanks, Bogdan

10 Sep 2013

That was a while ago for me ;).

In principle I don't think a bit can ever be set without a proper irq_handler attached: the interrupt shouldn't be enabled if no handler is attached (since attaching a NULL pointer should disable the interrupt). Still, for robustness it is probably better to include that check in the code.

I expect it to add only very slight overhead, so it should not affect the performance difference much.

Btw, I tested this on the LPC1768 (Cortex-M3). Note that the Cortex-M0 instruction set (ARMv6-M) does not include CLZ, so on M0-based mbeds __CLZ would need a software implementation.
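CLZ is available on the Cortex-M3 (ARMv7-M) but not on the Cortex-M0/M0+ (ARMv6-M), where the compiler has to fall back to a software routine. As a hedged alternative for such cores, the well-known De Bruijn multiplication trick finds the lowest set bit using only AND, negate, multiply, shift and a table lookup. This is a swapped-in technique, not the code that was merged, and `lowest_set_bit` is a hypothetical name. Handling the lowest pending bit first instead of the highest makes no difference here, because the loop handles every pending bit either way.

```c
#include <stdint.h>

/* Lookup table for the De Bruijn constant 0x077CB531 (from the
 * well-known "Bit Twiddling Hacks" collection). */
static const uint8_t debruijn_index[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* Return the index (0..31) of the lowest set bit of a non-zero x. */
static unsigned lowest_set_bit(uint32_t x)
{
    uint32_t isolated = x & (0u - x);   /* keep only the lowest 1 bit */
    /* Multiplying by the De Bruijn constant puts a unique 5-bit
     * pattern in the top bits for each possible single-bit value. */
    return debruijn_index[(uint32_t)(isolated * 0x077CB531u) >> 27];
}
```

In the dispatch loop, `bitloc = lowest_set_bit(pending)` would replace the `31 - __CLZ(...)` line. Whether it beats the compiler's software CLZ on a given M0 part would have to be measured.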

11 Sep 2013

The problem is that on the LPC1768 the GPIO interrupts are not handled per pin, but per port. You set a GPIO interrupt handler for the whole port and then you check to see what pin(s) actually triggered the interrupt, which is exactly what your code does. However, a pin might not have a corresponding InterruptIn instance, so the interrupt for that particular pin must not be handled. This is why the check for channel_ids[i] is required. Did you get a chance to test your code with the added 'if' statements and check how that affects the speed?

Thanks,
Bogdan

11 Sep 2013

That is true, but I don't think an interrupt bit can be set when the interrupt for the corresponding pin is not enabled (unless user code sets it). If it could, the LPC1768 would crash whenever you enable some InterruptIn while a high-frequency signal is on another pin, simply because it would be stuck forever checking whether it should handle an interrupt. A quick test with the old code also doesn't crash my mbed when I connect the PWM to another pin, while it should crash if it tried to execute a NULL pointer.

That said, I added the check to the test code (and published a new version). This version starts at 65kHz, since otherwise the code would lock up at 380kHz. Considering that it now stops normally at 385kHz, my guess is that it was just permanently handling interrupts, so it never got to my code that prints the result; that is normal behavior when interrupts arrive faster than they can be handled.

Anyway, the result without the if statement:

    Expected 395000 edges, measured 279840 edges.
    Test failed at 395kHz!

With the if statement:

    Expected 385000 edges, measured 316706 edges.
    Test failed at 385kHz!

So it is slightly slower, but not a big change. Btw, this is with the old interrupt system from before the chaining code was added, which will also add some overhead. I also didn't expect much of a difference: if it runs at roughly 400kHz, then it has 240 cycles (at the LPC1768's 96MHz clock) to enter, execute and exit the complete interrupt. A single check whether a variable is NULL should not take more than a few clock cycles.

11 Sep 2013

Thanks for the report! The improvement is still significant, so I'll try to merge your code today.

11 Sep 2013

Your code was merged. Thanks again!

26 Sep 2013

But I still don't understand: how come a CPU running at 100 MHz can only handle 400 kHz of interrupts? Is this because of the pipeline, the memory access cycles, or the library?

26 Sep 2013

It is a combination of two things. First, there is the general overhead of the library's abstraction layer: it simply costs time to walk through the multiple layers, which are needed to give the user a lot of flexibility. This is generally the case: easy-to-use libraries are not suitable if you need very high performance.

Second, a general GPIO interrupt is used here. The advantage is that it can be used on most mbed pins. If, however, you use one of the dedicated external interrupt (EINT) pins, and only that pin, you can tie the user function directly to that interrupt and it will be a lot faster. Even then you shouldn't expect to reach 50MHz or anything similar: switching to and from interrupt context also costs time (registers need to be saved, pointers set to new values, etc.).

It could be interesting to find out which part is now limiting the speed. Previously it was obviously figuring out which interrupt fired; now it may be something else. If I read it correctly, a Cortex-M3 requires 12 cycles to enter an interrupt (with context switching it is a lot more, but I don't think that is relevant here). Let's say also 12 cycles to exit it. That's already 24 cycles. It needs to read 4 APB registers, and the code says that is slow. No idea how slow, but let's say 5 cycles each; that's another 20 cycles. That makes 44 cycles, so at 96MHz we are down to roughly 2MHz max, and it hasn't done anything yet.
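The back-of-the-envelope numbers in this thread come down to two divisions. A small sketch, assuming the mbed LPC1768's 96MHz core clock and the guessed costs above (12 cycles entry, 12 cycles exit, 4 APB reads at about 5 cycles each); the function names are hypothetical.

```c
#include <stdint.h>

#define CPU_HZ 96000000u   /* mbed LPC1768 core clock */

/* Cycles available per interrupt when interrupts arrive at irq_hz. */
static uint32_t cycle_budget(uint32_t irq_hz)
{
    return CPU_HZ / irq_hz;
}

/* Highest interrupt rate if each interrupt costs `cycles` cycles. */
static uint32_t max_irq_rate(uint32_t cycles)
{
    return CPU_HZ / cycles;
}
```

At roughly 400kHz, `cycle_budget(400000)` gives the 240 cycles mentioned earlier, and the 44 guessed fixed cycles (12 + 12 + 4 × 5) cap the rate at `max_irq_rate(44)`, about 2.18MHz, before any handler or library code runs.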

01 Dec 2014

Hi there. It appears that the example code is missing. How do I add this code to my project?