Mirror with some correction

Dependencies:   mbed FastIO FastPWM USBDevice

FastInterruptIn/FastInterruptIn.h

Committer:
arnoz
Date:
2021-10-01
Revision:
116:7a67265d7c19
Parent:
101:755f44622abc

File content as of revision 116:7a67265d7c19:

// Fast Interrupt In for KL25Z
//
// This is a replacement for the mbed library InterruptIn class, which
// sets up GPIO ports for edge-sensitive interrupt handling.  This class
// provides the same API but has a shorter code path for responding to
// each interrupt.  In my tests, the mbed InterruptIn class has a maximum
// interrupt rate of about 112kHz; this class can increase that to about
// 181kHz.
//
// If speed is critical (and it is, because why else would you be using 
// this class?), you should elevate the GPIO interrupt priority in the
// hardware interrupt controller so that GPIO pin signals can preempt other
// interrupt handlers.  The mbed USB and timer handlers in particular spend
// relative long periods in interrupt context, so if these are at the same
// or higher priority than the GPIO interrupts, they'll become the limiting
// factor.  The mbed library leaves all interrupts set to maximum priority
// by default, so to elevate the GPIO interrupt priority, you have to lower 
// the priority of everything else.  Call FastInterruptIn::elevatePriority()
// to do this.
//
//
// Performance measurements:  I set up a test program using one KL25Z to
// send 50% duty cycle square wave signals to a second KL25Z (using a PWM
// output on the sender), and measured the maximum interrupt frequency
// where the receiver could correctly count every edge, repeating the test
// with FastInterruptIn and the mbed InterruptIn.  I tested with handlers
// for both edges and handlers for single edges (just rise() or just fall()).
// The Hz rates reflect the maximum *interrupt* frequency, which is twice
// the PWM frequency when testing with handlers for both rise + fall in
// effect.  In all cases, the user callbacks were minimal code paths that
// just incremented counters, and all tests ran with PTA/PTD at elevated
// IRQ priority.  The time per interrupt values shown are the inverse of
// the maximum frequency; these reflect the time between interrupts at
// the corresponding frequency.  Since each frequency is the maximum at
// which that class can handle every interrupt without losing any, the
// time between interrupts tells us how long the CPU takes to fully process
// one interrupt and return to the base state where it's able to handle the
// next one.  This time is the sum of the initial CPU interrupt latency
// (the time it takes from an edge signal occuring on a pin to the CPU
// executing the first instruction of the IRQ vector), the time spent in
// the InterruptIn or FastInterruptIn code, the time spent in the user
// callback, and the time for the CPU to return from the interrupt to
// normal context.  For the test program, the user callback is about 4
// instructions, so perhaps 6 clocks or 360ns.  Other people have measured
// the M0+ initial interrupt latency at about 450ns, and the return time
// is probably similar.  So we have about 1.2us in fixed overhead and user
// callback time, hence the rest is the time spent in the library code.
//
//   mbed InterruptIn:
//     max rate 112kHz
//     -> 8.9us per interrupt 
//        less 1.2us fixed overhead = 7.7us in library code
//
//   FastInterruptIn:
//     max rate 181kHz
//     -> 5.5us per interrupt
//        less 1.2us fixed overhead = 3.3us in library code
// 
//
// Limitations:
//
// 1. KL25Z ONLY.  This is a bare-metal KL25Z class.
//
// 2. Globally incompatible with InterruptIn.  Both classes take over the
// IRQ vectors for the GPIO interrupts globally, so they can't be mixed
// in the same system.  If you use this class anywhere in a program, it
// has to be used exclusively throughout the whole program - don't use
// the mbed InterruptIn anywhere in a program that uses this class.
//
// 3. API differences.  The API is very similar to InterruptIn's API,
// but we don't support the method-based rise/fall callback attachers.  We
// instead use static function pointers (void functions with 'void *'
// context arguments).  It's easy to write static methods for these that 
// dispatch to regular member functions, so the functionality is the same; 
// it's just a little different syntax.  The simpler (in the sense of
// more primitive) callback interface saves a little memory and is
// slightly faster than the method attachers, since it doesn't require
// any variation checks at interrupt time.
//
// Theory of operation
//
// How the mbed code works
// On every interrupt event, the mbed library's GPIO interrupt handler
// searches for a port with an active interrupt.  Each PORTx_IRQn vector
// handles 32 ports, so each handler has to search this space of 32 ports
// for an active interrupt.  The mbed code approaches this problem by
// searching for a '1' bit in the ISFR (interrupt status flags register),
// which is effectively a 32-bit vector of bits indicating which ports have
// active interrupts.  This search could be done quickly if the hardware
// had a "count leading zeroes" instruction, which actually does exist in
// the ARM instruction set, but alas not in the M0+ subset.  So the mbed
// code has to search for the bit by other means.  It accomplishes this by
// way of a binary search.  By my estimate, this takes about 110 clocks or
// 7us.  The routine has some other slight overhead dispatching to the
// user callback once one is selected via the bit search, but the bulk of
// the time is spent in the bit search.  The mbed code could be made more
// efficient by using a better 'count leading zeroes' algorithm; there are 
// readily available implementations that run in about 15 clocks on M0+.
//
// How this code works
// FastInterruptIn takes a different approach that bypasses the bit vector
// search.  We instead search the installed handlers.  We work on the 
// assumption that the total number of interrupt handlers in the system is 
// small compared with the number of ports.  So instead of searching the 
// entire ISFR bit vector, we only check the ports with installed handlers.
//
// The mbed code takes essentially constant time to run.  It doesn't have
// any dependencies (that I can see) on the number of active InterruptIn
// pins.  In contrast, FastInterruptIn's run time is linear in the number
// of active pins: adding more pins will increase the run time.  This is
// a tradeoff, obviously.  It's very much the right tradeoff for the Pinscape 
// system, because we have very few interrupt pins overall.  I suspect it's
// the right tradeoff for most systems, too, since most embedded systems
// have a small fixed set of peripherals they're talking to.
//
// We have a few other small optimizations to maximize our sustainable
// interrupt frequency.  The most important is probably that we read the
// port pin state immediately on entry to the IRQ vector handler.  Since
// we get the same interrupt on a rising or falling edge, we have to read
// the pin state to determine which type of transition triggered the
// interrupt.  This is inherently problematic because the pin state could 
// have changed between the time the interrupt occurred and the time we 
// got around to reading the state - the likelihood of this increases as
// the interrupt source frequency increases.  The soonest we can possibly
// read the state is at entry to the IRQ vector handler, so we do that.
// Even that isn't perfectly instantaneous, due to the unavoidable 450ns
// or so latency in the hardware before the vector code starts executing;
// it would be better if the hardware read the state at the moment the
// interrupt was triggered, but there's nothing we can do about that.
// In contrast, the mbed code waits until after deciding which interrupt
// is active to read the port, so its reading is about 7us delayed vs our
// 500ns delay.  That further reduces the mbed code's ability to keep up
// with fast interrupt sources when both rise and fall handlers are needed.


#ifndef _FASTINTERRUPTIN_H_
#define _FASTINTERRUPTIN_H_

#include "mbed.h"
#include "gpio_api.h"

struct fiiCallback
{
    fiiCallback() { func = 0; }
    void (*func)(void *);
    void *context;
    
    inline void call() { func(context); }
};

class FastInterruptIn
{
public:
    // Globally elevate the PTA and PTD interrupt priorities.  Since the
    // mbed default is to start with all IRQs at maximum priority, we
    // LOWER the priority of all IRQs to the minimum, then raise the PTA
    // and PTD interrupts to maximum priority.  
    //
    // The reason we set all priorities to minimum (except for PTA and PTD) 
    // rather than some medium priority is that this is the most flexible
    // default.  It really should have been the mbed default, in my opinion,
    // since (1) it doesn't matter what the setting is if they're all the
    // same, so an mbed default of 3 would have been equivalent to an mbed
    // default of 0 (the current one) for all programs that don't make any
    // changes anyway, and (2) the most likely use case for programs that
    // do need to differentiate IRQ priorities is that they need one or two
    // items to respond MORE quickly.  It seems extremely unlikely that
    // anyone would need only one or two to be especially slow, which is
    // effectively the case the mbed default is optimized for.
    //
    // This should be called (if desired at all) once at startup.  The 
    // effect is global and permanent (unless later changes are made by
    // someone else), so there's no need to call this again when setting
    // up new handlers or changing existing handlers.  Callers are free to 
    // further adjust priorities as needed (e.g., elevate the priority of
    // some other IRQ), but that should be done after calling this, since we
    // change ALL IRQ priorities with prejudice.
    static void elevatePriority()
    {
        // Set all IRQ priorities to minimum.  M0+ has priority levels
        // 0 (highest) to 3 (lowest).  (Note that the hardware uses the
        // high-order two bits of the low byte, so the hardware priority
        // levels are 0x00 [highest], 0x40, 0x80, 0xC0 [lowest]).  The
        // mbed NVIC macros, in contrast, abstract this to use the LOW
        // two bits, for levels 0, 1, 2, 3.)
        for (int irq = 0 ; irq < 32 ; ++irq)
            NVIC_SetPriority(IRQn(irq), 0x3);
            
        // set the PTA and PTD IRQs to highest priority
        NVIC_SetPriority(PORTA_IRQn, 0x00);
        NVIC_SetPriority(PORTD_IRQn, 0x00);
    }
    
    // set up a FastInterruptIn handler on a given pin
    FastInterruptIn(PinName pin)
    {
        // start with the null callback
        callcb = &FastInterruptIn::callNone;
        
        // initialize the pin as a GPIO Digital In port
        gpio_t gpio;
        gpio_init_in(&gpio, pin);

        // get the port registers
        PDIR = gpio.reg_in;
        pinMask = gpio.mask;
        portno = uint8_t(pin >> PORT_SHIFT);
        pinno = uint8_t((pin & 0x7F) >> 2);
        
        // set up for the selected port
        IRQn_Type irqn;
        void (*vector)();
        switch (portno)
        {
        case PortA:
            irqn = PORTA_IRQn;
            vector = &PortA_ISR;
            PDIR = &FPTA->PDIR;
            break;
        
        case PortD:
            irqn = PORTD_IRQn;
            vector = &PortD_ISR;
            PDIR = &FPTD->PDIR;
            break;
        
        default:
            error("FastInterruptIn: invalid pin specified; "
                "only PTAxx and PTDxx pins are interrupt-capable");
            return;
        }
        
        // set the vector
        NVIC_SetVector(irqn, uint32_t(vector));
        NVIC_EnableIRQ(irqn);
    }
    
    // read the current pin status - returns 1 or 0
    int read() const { return (fastread() >> pinno) & 0x01; }
    
    // Fast read - returns the pin's port bit, which is '0' or '1' shifted
    // left by the port number (e.g., PTA7 or PTD7 return (1<<7) or (0<<7)).
    // This is slightly faster than read() because it doesn't normalize the
    // result to a literal '0' or '1' value.  When the value is only needed
    // for an 'if' test or the like, zero/nonzero is generally good enough,
    // so you can save a tiny bit of time by skiping the shift.
    uint32_t fastread() const { return *PDIR & pinMask; }
    
    // set a rising edge handler
    void rise(void (*func)(void *), void *context = 0)
    {
        setHandler(&cbRise, PCR_IRQC_RISING, func, context);
    }
    
    // set a falling edge handler
    void fall(void (*func)(void *), void *context = 0)
    {
        setHandler(&cbFall, PCR_IRQC_FALLING, func, context);
    }
    
    // Set the pull mode.  Note that the KL25Z only supports PullUp
    // and PullNone modes.  We'll ignore other modes.
    void mode(PinMode pull)
    {
        volatile uint32_t *PCR = &(portno == PortA ? PORTA : PORTD)->PCR[pinno];
        switch (pull)
        {
        case PullNone:
            *PCR &= ~PORT_PCR_PE_MASK;
            break;
            
        case PullUp:
            *PCR |= PORT_PCR_PE_MASK;
            break;
        }
    }
    
protected:
    // set a handler - the mode is PCR_IRQC_RISING or PCR_IRQC_FALLING
    void setHandler(
        fiiCallback *cb, uint32_t mode, void (*func)(void *), void *context)
    {
        // get the PCR (port control register) for the pin
        volatile uint32_t *PCR = &(portno == PortA ? PORTA : PORTD)->PCR[pinno];

        // disable interrupts while messing with shared statics
        __disable_irq();

        // set the callback
        cb->func = func;
        cb->context = context;
        
        // enable or disable the mode in the PCR
        if (func != 0)
        {
            // Handler function is non-null, so we're setting a handler.
            // Enable the mode in the PCR.  Note that we merely need to
            // OR the new mode bits into the existing mode bits, since
            // disabled is 0 and BOTH is equal to RISING|FALLING.
            *PCR |= mode;
            
            // if we're not already in the active list, add us
            listAdd();
        }
        else
        {
            // Handler function is null, so we're clearing the handler.
            // Disable the mode bits in the PCR.  If the old mode was
            // the same as the mode we're disabling, switch to NONE.
            // If the old mode was BOTH, switch to the mode we're NOT
            // disabling.  Otherwise make no change.
            int cur = *PCR & PORT_PCR_IRQC_MASK;
            if (cur == PCR_IRQC_BOTH)
            {
                *PCR &= ~PORT_PCR_IRQC_MASK;
                *PCR |= (mode == PCR_IRQC_FALLING ? PCR_IRQC_RISING : PCR_IRQC_FALLING);
            }
            else if (cur == mode)
            {
                *PCR &= ~PORT_PCR_IRQC_MASK;
            }
            
            // if we're disabled, remove us from the list
            if ((*PCR & PORT_PCR_IRQC_MASK) == PCR_IRQC_DISABLED)
                listRemove();
        }
        
        // set the appropriate callback mode
        if (cbRise.func != 0 && cbFall.func != 0)
        {
            // They want to be called on both Rise and Fall events. 
            // The hardware triggers the same interrupt on both, so we
            // need to distinguish which is which by checking the current
            // pin status when the interrupt occurs.
            callcb = &FastInterruptIn::callBoth;
        }
        else if (cbRise.func != 0)
        {
            // they only want Rise events
            callcb = &FastInterruptIn::callRise;
        }
        else if (cbFall.func != 0)
        {
            // they only want Fall events
            callcb = &FastInterruptIn::callFall;
        }
        else
        {
            // no events are registered
            callcb = &FastInterruptIn::callNone;
        }
        
        // done messing with statics
        __enable_irq();
    }
    
    // add me to the active list for my port
    void listAdd()
    {
        // figure the list head
        FastInterruptIn **headp = (portno == PortA) ? &headPortA : &headPortD;
        
        // search the list to see if I'm already there
        FastInterruptIn **nxtp = headp;
        for ( ; *nxtp != 0 && *nxtp != this ; nxtp = &(*nxtp)->nxt) ;
        
        // if we reached the last entry without finding me, add me
        if (*nxtp == 0)
        {
            *nxtp = this;
            this->nxt = 0;
        }
    }
    
    // remove me from the active list for my port
    void listRemove()
    {
        // figure the list head
        FastInterruptIn **headp = (portno == PortA) ? &headPortA : &headPortD;
        
        // find me in the list
        FastInterruptIn **nxtp = headp;
        for ( ; *nxtp != 0 && *nxtp != this ; nxtp = &(*nxtp)->nxt) ;
        
        // if we found me, unlink me
        if (*nxtp == this)
        {
            *nxtp = this->nxt;
            this->nxt = 0;
        }
    }
    
    // next link in active list for our port
    FastInterruptIn *nxt;
    
    // pin mask - this is 1<<pinno, used for selecting or setting the port's
    // bit in the port-wide bit vector registers (IFSR, PDIR, etc)
    uint32_t pinMask;
    
    // Internal interrupt dispatcher.  This is set to one of 
    // &callNone, &callRise, &callFall, or &callBoth, according 
    // to which type of handler(s) we have registered.
    void (*callcb)(FastInterruptIn *, uint32_t pinstate);
    
    // PDIR (data read) register
    volatile uint32_t *PDIR;
    
    // port and pin number
    uint8_t portno;
    uint8_t pinno;

    // user interrupt handler callbacks
    fiiCallback cbRise;
    fiiCallback cbFall;
    
protected:
    static void callNone(FastInterruptIn *f, uint32_t pinstate) { }
    static void callRise(FastInterruptIn *f, uint32_t pinstate) { f->cbRise.call(); }
    static void callFall(FastInterruptIn *f, uint32_t pinstate) { f->cbFall.call(); }
    static void callBoth(FastInterruptIn *f, uint32_t pinstate)
    {
        if (pinstate)
            f->cbRise.call();
        else
            f->cbFall.call();
    }
    
    // Head of active interrupt handler lists.  When a handler is
    // active, we link it into this static list.  At interrupt time,
    // we search the list for an active interrupt.
    static FastInterruptIn *headPortA;
    static FastInterruptIn *headPortD;

    // PCR_IRQC modes
    static const uint32_t PCR_IRQC_DISABLED = PORT_PCR_IRQC(0);
    static const uint32_t PCR_IRQC_RISING = PORT_PCR_IRQC(9);
    static const uint32_t PCR_IRQC_FALLING = PORT_PCR_IRQC(10);
    static const uint32_t PCR_IRQC_BOTH = PORT_PCR_IRQC(11);
    
    // IRQ handlers.  We set up a separate handler for each port to call
    // the common handler with the port-specific parameters.  
    // 
    // We read the current pin input status immediately on entering the 
    // handler, so that we have the pin reading as soon as possible after 
    // the interrupt.  In cases where we're handling both rising and falling
    // edges, the only way to tell which type of edge triggered the interrupt
    // is to look at the pin status, since the same interrupt is generated
    // in either case.  For a high-frequency signal source, the pin state
    // might change again very soon after the edge that triggered the
    // interrupt, so we can get the wrong state if we wait too long to read
    // the pin.  The soonest we can read the pin is at entry to our handler,
    // which isn't even perfectly instantaneous, since the hardware has some
    // latency (reportedly about 400ns) responding to an interrupt.  
    static void PortA_ISR() { ISR(&PORTA->ISFR, headPortA, FPTA->PDIR); }
    static void PortD_ISR() { ISR(&PORTD->ISFR, headPortD, FPTD->PDIR); }
    inline static void ISR(volatile uint32_t *pifsr, FastInterruptIn *f, uint32_t pdir)
    {
        // search the list for an active entry
        uint32_t ifsr = *pifsr;
        for ( ; f != 0 ; f = f->nxt)
        {
            // check if this entry's pin is in interrupt state
            if ((ifsr & f->pinMask) != 0)
            {
                // clear the interrupt flag by writing '1' to the bit
                *pifsr = f->pinMask;
                
                // call the appropriate user callback
                f->callcb(f, pdir & f->pinMask);
                
                // Stop searching.  If another pin has an active interrupt,
                // or this pin already has another pending interrupt, the
                // hardware will immediately call us again as soon as we
                // return, and we'll find the new interrupt on that new call.
                // This should be more efficient on average than checking all 
                // pins even after finding an active one, since in most cases 
                // there will only be one interrupt to handle at a time.
                return;
            }                
        }
    }
    
};

#endif