USB device stack, with KL25Z fixes for USB 3.0 hosts and sleep/resume interrupt handling

Dependents:   frdm_Slider_Keyboard idd_hw2_figlax_PanType idd_hw2_appachu_finger_chording idd_hw3_AngieWangAntonioDeLimaFernandesDanielLim_BladeSymphony ... more

Fork of USBDevice by mbed official

This is an overhauled version of the standard mbed USB device-side driver library, with bug fixes for KL25Z devices. It greatly improves reliability and stability of USB on the KL25Z, especially with devices using multiple endpoints concurrently.

I've had some nagging problems with the base mbed implementation for a long time, manifesting as occasional random disconnects that required rebooting the device. Recently (late 2015), I started implementing a USB device on the KL25Z that used multiple endpoints, and suddenly the nagging, occasional problems turned into frequent and predictable crashes. This forced me to delve into the USB stack and figure out what was really going on. Happily, the frequent crashes made it possible to track down and fix the problems. This new version is working very reliably in my testing - the random disconnects seem completely eradicated, even under very stressful conditions for the device.

Summary

  • Overall stability improvements
  • USB 3.0 host support
  • Stalled endpoint fixes
  • Sleep/resume notifications
  • Smaller memory footprint
  • General code cleanup

Update - 2/15/2016

My recent fixes introduced a new problem that made the initial connection fail most of the time on certain hosts. It's not clear if the common thread was a particular type of motherboard or USB chip set, or a specific version of Windows, or what, but several people ran into it. We tracked the problem down to the "stall" fixes in the earlier updates, which we now know weren't quite the right fixes after all. The latest update (2/15/2016) fixes this. It has new and improved "unstall" handling that so far works well with diverse hosts.

Race conditions and overall stability

The base mbed KL25Z implementation has a lot of problems with "race conditions" - timing problems that can happen when hardware interrupts occur at inopportune moments. The library shares a bunch of static variable data between interrupt handler context and regular application context. This isn't automatically a bad thing, but it does require careful coordination to make sure that the interrupt handler doesn't corrupt data that the other code was in the middle of updating when an interrupt occurs. The base mbed code, though, doesn't do any of the necessary coordination. This makes it kind of amazing that the base code worked at all for anyone, but I guess the interrupt rate is low enough in most applications that the glitch rate was below anyone's threshold to seriously investigate.

This overhaul adds the necessary coordination for the interrupt handlers to protect against these data corruptions. I think it's very solid now, and hopefully entirely free of the numerous race conditions in the old code. It's always hard to be certain that you've fixed every possible bug like this because they strike (effectively) at random, but I'm pretty confident: my test application was reliably able to trigger glitches in the base code in a matter of minutes, but the same application (with the overhauled library) now runs for days on end without dropping the connection.

Stalled endpoint fixes

USB has a standard way of handling communications errors called a "stall", which basically puts the connection into an error mode to let both sides know that they need to reset their internal states and sync up again. The original mbed version of the USB device library doesn't seem to have the necessary code to recover from this condition properly. The KL25Z hardware does some of the work, but it also seems to require the software to take some steps to "un-stall" the connection. (I keep saying "seems to" because the hardware reference material is very sketchy about all of this. Most of what I've figured out is from observing the device in action with a Windows host.) This new version adds code to do the necessary re-syncing and get the connection going again, automatically, and transparently to the user.

USB 3.0 Hosts

The original mbed code sometimes didn't work when connecting to hosts with USB 3.0 ports. This didn't affect every host, but it affected many of them. The common element seemed to be the Intel Haswell chip set on the host, but there may be other chip sets affected as well. In any case, the problem affected many PCs from the Windows 7 and 8 generation, as well as many Macs. It was possible to work around the problem by avoiding USB 3.0 ports - you could use a USB 2 port on the host, or plug a USB 2 hub between the host and device. But I wanted to just fix the problem and eliminate the need for such workarounds. This modified version of the library has such a fix, which so far has worked for everyone who's tried.

Sleep/resume notifications

This modified version also contains an innocuous change to the KL25Z USB HAL code to handle sleep and resume interrupts with calls to suspendStateChanged(). The original KL25Z code omitted these calls (and in fact didn't even enable the interrupts), but I think this was an unintentional oversight - the notifier function is part of the generic API, and other supported boards all implement it. I use this feature in my own application so that I can distinguish sleep mode from actual disconnects and handle the two conditions correctly.

Smaller memory footprint

The base mbed version of the code allocates twice as much memory for USB buffers as it really needed to. It looks like the original developers intended to implement the KL25Z USB hardware's built-in double-buffering mechanism, but they ultimately abandoned that effort. But they left in the double memory allocation. This version removes that and allocates only what's actually needed. The USB buffers aren't that big (128 bytes per endpoint), so this doesn't save a ton of memory, but even a little memory is pretty precious on this machine given that it only has 16K.

(I did look into adding the double-buffering support that the original developers abandoned, but after some experimentation I decided they were right to skip it. It just doesn't seem to mesh well with the design of the rest of the mbed USB code. I think it would take a major rewrite to make it work, and it doesn't seem worth the effort given that most applications don't need it - it would only benefit applications that are moving so much data through USB that they're pushing the limits of the CPU. And even for those, I think it would be a lot simpler to build a purely software-based buffer rotation mechanism.)

General code cleanup

The KL25Z HAL code in this version has greatly expanded commentary and a lot of general cleanup. Some of the hardware constants were given the wrong symbolic names (e.g., EVEN and ODD were reversed), and many were just missing (written as hard-coded numbers without explanation). I fixed the misnomers and added symbolic names for formerly anonymous numbers. Hopefully the next person who has to overhaul this code will at least have an easier time understanding what I thought I was doing!

USBDevice/USBHAL_KL25Z.cpp

Committer:
mjr
Date:
2016-02-26
Revision:
49:03527ce6840e
Parent:
48:b225d025ca1d
Child:
50:946bc763c068

File content as of revision 49:03527ce6840e:

/* Copyright (c) 2010-2011 mbed.org, MIT License
*
* Permission is hereby granted, free of charge, to any person obtaining a copy of this software
* and associated documentation files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all copies or
* substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
* BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

#if defined(TARGET_KL25Z) | defined(TARGET_KL46Z) | defined(TARGET_K20D5M) | defined(TARGET_K64F)

//#define DEBUG
#ifdef DEBUG
#define printd(fmt, ...) printf(fmt, __VA_ARGS__)
#else
#define printd(fmt, ...)
#endif


#include "USBHAL.h"

// Critical section controls.  This module uses a bunch of static variables,
// and much of the code that accesses the statics can be called from either
// normal application context or IRQ context.  Whenever a shared variable is
// accessed from code that can run in an application context, we have to
// protect against interrupts by entering a critical section.  These macros
// enable and disable the USB IRQ if we're running in application context.
// (They do nothing if we're already in interrupt context, because the
// hardware interrupt controller won't generated another of the same IRQ
// that we're already handling.  We could still be interrupted by a different,
// higher-priority IRQ, but our shared variables are only shared within this
// module, so they won't be affected by other interrupt handlers.)
static int inIRQ;
#define ENTER_CRITICAL_SECTION \
    if (!inIRQ) \
        NVIC_DisableIRQ(USB0_IRQn);
#define EXIT_CRITICAL_SECTION \
    if (!inIRQ) \
        NVIC_EnableIRQ(USB0_IRQn);

// static singleton instance pointer
USBHAL * USBHAL::instance;


// Convert physical endpoint number to register bit
#define EP(endpoint) (1<<(endpoint))

// Convert physical endpoint number to logical endpoint number.
// Each logical endpoint has two physical endpoints, one RX and 
// one TX.  The physical endpoints are numbered in RX,TX pairs,
// so the logical endpoint number is simply the physical endpoint
// number divided by 2 (discarding the remainder).
#define PHY_TO_LOG(endpoint)    ((endpoint)>>1)

// Get a physical endpoint's direction.  IN and OUT are from
// the host's perspective, so from our perspective on the device,
// IN == TX and OUT == RX.  The physical endpoints are in RX,TX
// pairs, so the OUT/RX is the even numbered element of a pair
// and the IN/TX is the odd numbered element.
#define IN_EP(endpoint)     ((endpoint) & 1U ? true : false)
#define OUT_EP(endpoint)    ((endpoint) & 1U ? false : true)

// BDT status flags, defined by the SIE hardware.  These are
// bits packed into the 'info' byte of a BDT entry.
#define BD_OWN_MASK        (1<<7)       // OWN - hardware SIE owns the BDT (TX/RX in progress)
#define BD_DATA01_MASK     (1<<6)       // DATA01 - DATA0/DATA1 bit for current TX/RX on endpoint
#define BD_KEEP_MASK       (1<<5)       // KEEP - hardware keeps BDT ownership after token completes
#define BD_NINC_MASK       (1<<4)       // NO INCREMENT - buffer location is a FIFO, so use same address for all bytes
#define BD_DTS_MASK        (1<<3)       // DATA TOGGLE SENSING - hardware SIE checks for DATA0/DATA1 match during RX/TX
#define BD_STALL_MASK      (1<<2)       // STALL - SIE issues STALL handshake in reply to any host access to endpoint

// Endpoint direction (from DEVICE perspective)
#define TX    1
#define RX    0

// Buffer parity.  The hardware has a double-buffering scheme where each
// physical endpoint has two associated BDT entries, labeled EVEN and ODD.
// We disable the double buffering, so only the EVEN buffers are used in
// this implementation.
#define EVEN  0
#define ODD   1

// Get the BDT index for a given logical endpoint, direction, and buffer parity
#define EP_BDT_IDX(logep, dir, odd) (((logep) * 4) + (2 * (dir)) + (1 *  (odd)))

// Get the BDT index for a given physical endpoint and buffer parity
#define PEP_BDT_IDX(phyep, odd)  (((phyep) * 2) + (1 * (odd)))

// Token types reported in the BDT 'info' flags.  
#define TOK_PID(idx)   ((bdt[idx].info >> 2) & 0x0F)
#define SETUP_TOKEN    0x0D
#define IN_TOKEN       0x09
#define OUT_TOKEN      0x01

// Buffer Descriptor Table (BDT) entry.  This is the hardware-defined
// memory structure for the shared memory block controlling an endpoint.
typedef struct BDT {
    uint8_t   info;       // BD[0:7]
    uint8_t   dummy;      // RSVD: BD[8:15]
    uint16_t  byte_count; // BD[16:32]
    uint32_t  address;    // Addr
} BDT;


// There are:
//    * 16 bidirectional logical endpoints -> 32 physical endpoints
//    * 2 BDT entries per endpoint (EVEN/ODD) -> 64 BDT entries
__attribute__((__aligned__(512))) BDT bdt[NUMBER_OF_PHYSICAL_ENDPOINTS * 2];

// Transfer buffers.  We allocate the transfer buffers and point the
// SIE hardware to them via the BDT.  We disable hardware SIE's
// double-buffering (EVEN/ODD) scheme, so we only allocate one buffer
// per physical endpoint.
uint8_t *endpoint_buffer[NUMBER_OF_PHYSICAL_ENDPOINTS];

// Allocated size of each endpoint buffer
size_t epMaxPacket[NUMBER_OF_PHYSICAL_ENDPOINTS];


// SET ADDRESS mode tracking.  The address assignment has to be done in a
// specific order and with specific timing defined by the USB setup protocol 
// standards.  To get the sequencing right, we set a flag when we get the
// address message, and then set the address in the SIE when we're at the 
// right subsequent packet step in the protocol exchange.  These variables
// are just a place to stash the information between the time we receive the
// data and the time we're ready to update the SIE register.
static uint8_t set_addr = 0;
static uint8_t addr = 0;

// Endpoint DATA0/DATA1 bits, packed as a bit vector.  Each endpoint's
// bit is at (1 << endpoint number).  These track the current bit value
// on the endpoint.  For TX endpoints, this is the bit for the LAST
// packet we sent (so the next packet will be the inverse).  For RX
// endpoints, this is the bit value we expect for the NEXT packet.
// (Yes, it's inconsistent.)
static uint32_t Data1  = 0x55555555;

// Endpoint read/write completion flags, packed as a bit vector.  Each 
// endpoint's bit is at (1 << endpoint number).  A 1 bit signifies that
// the last read or write has completed (and hasn't had its result 
// consumed yet).
static volatile uint32_t epComplete = 0;

static uint32_t frameNumber() 
{
    return((USB0->FRMNUML | (USB0->FRMNUMH << 8)) & 0x07FF);
}

uint32_t USBHAL::endpointReadcore(uint8_t endpoint, uint8_t *buffer) 
{
    return 0;
}

USBHAL::USBHAL(void) {
    // Disable IRQ
    NVIC_DisableIRQ(USB0_IRQn);

#if defined(TARGET_K64F)
    MPU->CESR=0;
#endif
    // fill in callback array
    epCallback[0] = &USBHAL::EP1_OUT_callback;
    epCallback[1] = &USBHAL::EP1_IN_callback;
    epCallback[2] = &USBHAL::EP2_OUT_callback;
    epCallback[3] = &USBHAL::EP2_IN_callback;
    epCallback[4] = &USBHAL::EP3_OUT_callback;
    epCallback[5] = &USBHAL::EP3_IN_callback;
    epCallback[6] = &USBHAL::EP4_OUT_callback;
    epCallback[7] = &USBHAL::EP4_IN_callback;
    epCallback[8] = &USBHAL::EP5_OUT_callback;
    epCallback[9] = &USBHAL::EP5_IN_callback;
    epCallback[10] = &USBHAL::EP6_OUT_callback;
    epCallback[11] = &USBHAL::EP6_IN_callback;
    epCallback[12] = &USBHAL::EP7_OUT_callback;
    epCallback[13] = &USBHAL::EP7_IN_callback;
    epCallback[14] = &USBHAL::EP8_OUT_callback;
    epCallback[15] = &USBHAL::EP8_IN_callback;
    epCallback[16] = &USBHAL::EP9_OUT_callback;
    epCallback[17] = &USBHAL::EP9_IN_callback;
    epCallback[18] = &USBHAL::EP10_OUT_callback;
    epCallback[19] = &USBHAL::EP10_IN_callback;
    epCallback[20] = &USBHAL::EP11_OUT_callback;
    epCallback[21] = &USBHAL::EP11_IN_callback;
    epCallback[22] = &USBHAL::EP12_OUT_callback;
    epCallback[23] = &USBHAL::EP12_IN_callback;
    epCallback[24] = &USBHAL::EP13_OUT_callback;
    epCallback[25] = &USBHAL::EP13_IN_callback;
    epCallback[26] = &USBHAL::EP14_OUT_callback;
    epCallback[27] = &USBHAL::EP14_IN_callback;
    epCallback[28] = &USBHAL::EP15_OUT_callback;
    epCallback[29] = &USBHAL::EP15_IN_callback;

    // choose usb src as PLL
    SIM->SOPT2 |= (SIM_SOPT2_USBSRC_MASK | SIM_SOPT2_PLLFLLSEL_MASK);

    // enable OTG clock
    SIM->SCGC4 |= SIM_SCGC4_USBOTG_MASK;

    // Attach IRQ
    instance = this;
    NVIC_SetVector(USB0_IRQn, (uint32_t)&_usbisr);
    NVIC_EnableIRQ(USB0_IRQn);

    // USB Module Configuration
    // Reset USB Module
    USB0->USBTRC0 |= USB_USBTRC0_USBRESET_MASK;
    while(USB0->USBTRC0 & USB_USBTRC0_USBRESET_MASK);

    // Set BDT Base Register
    USB0->BDTPAGE1 = (uint8_t)((uint32_t)bdt>>8);
    USB0->BDTPAGE2 = (uint8_t)((uint32_t)bdt>>16);
    USB0->BDTPAGE3 = (uint8_t)((uint32_t)bdt>>24);

    // Clear interrupt flag
    USB0->ISTAT = 0xff;

    // USB Interrupt Enablers
    USB0->INTEN |= USB_INTEN_TOKDNEEN_MASK |
                   USB_INTEN_SOFTOKEN_MASK |
                   USB_INTEN_ERROREN_MASK  |
                   USB_INTEN_SLEEPEN_MASK |
                   USB_INTEN_RESUMEEN_MASK |
                   USB_INTEN_USBRSTEN_MASK;

    // Disable weak pull downs
    USB0->USBCTRL &= ~(USB_USBCTRL_PDE_MASK | USB_USBCTRL_SUSP_MASK);

    USB0->USBTRC0 |= 0x40;
}

USBHAL::~USBHAL(void) 
{
}

void USBHAL::connect(void) 
{
    // enable USB
    USB0->CTL |= USB_CTL_USBENSOFEN_MASK;
    
    // Pull up enable
    USB0->CONTROL |= USB_CONTROL_DPPULLUPNONOTG_MASK;
}

void USBHAL::disconnect(void) 
{
    // disable USB
    USB0->CTL &= ~USB_CTL_USBENSOFEN_MASK;
    
    // Pull up disable
    USB0->CONTROL &= ~USB_CONTROL_DPPULLUPNONOTG_MASK;

    // Free buffers
    for (int i = 0 ; i < NUMBER_OF_PHYSICAL_ENDPOINTS ; i++) 
    {
        if (endpoint_buffer[i] != NULL)
        {
            delete [] endpoint_buffer[i];
            endpoint_buffer[i] = NULL;
            epMaxPacket[i] = 0;
        }
    }
}

void USBHAL::configureDevice(void) 
{
    // not needed
}

void USBHAL::unconfigureDevice(void) 
{
    // not needed
}

void USBHAL::setAddress(uint8_t address) 
{
    // we don't set the address now otherwise the usb controller does not ack
    // we set a flag instead
    // see usbisr when an IN token is received
    set_addr = 1;
    addr = address;
 }

bool USBHAL::realiseEndpoint(uint8_t endpoint, uint32_t maxPacket, uint32_t flags) 
{
    // validate the endpoint number
    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
        return false;

    // get the logical endpoint
    uint32_t log_endpoint = PHY_TO_LOG(endpoint);
    
    // Assume this is a bulk or interrupt endpoint.  For these, the hardware maximum
    // packet size is 64 bytes, and we use packet handshaking.
    uint32_t hwMaxPacket = 64;
    uint32_t handshake_flag = USB_ENDPT_EPHSHK_MASK;
    
    // If it's to be an isochronous endpoint, the hardware maximum packet size
    // increases to 1023 bytes, and we don't use handshaking.
    if (flags & ISOCHRONOUS) 
    {
        handshake_flag = 0;
        hwMaxPacket = 1023;
    }

    // limit the requested max packet size to the hardware limit
    if (maxPacket > hwMaxPacket)
        maxPacket = hwMaxPacket;
        
    ENTER_CRITICAL_SECTION
    {
        // if the endpoint buffer hasn't been allocated yet or was previously
        // allocated at a smaller size, allocate a new buffer        
        uint8_t *buf = endpoint_buffer[endpoint];
        if (buf == NULL || epMaxPacket[endpoint] < maxPacket)
        {
            // free any previous buffer
            if (buf != 0)
                delete [] buf;
    
            // allocate at the new size
            endpoint_buffer[endpoint] = buf = new uint8_t[maxPacket];
            
            // set the new max packet size
            epMaxPacket[endpoint] = maxPacket;
        }
        
        // set the endpoint register flags and BDT entry
        if (IN_EP(endpoint)) 
        {
            // IN endpt -> device to host (TX)
            USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag | USB_ENDPT_EPTXEN_MASK;  // en TX (IN) tran
            bdt[EP_BDT_IDX(log_endpoint, TX, EVEN)].address = (uint32_t) buf;
            bdt[EP_BDT_IDX(log_endpoint, TX, ODD )].address = 0;
        }
        else 
        {
            // OUT endpt -> host to device (RX)
            USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag | USB_ENDPT_EPRXEN_MASK;  // en RX (OUT) tran.
            bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].address = (uint32_t) buf;
            bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].address = 0;
            
            // set up the first read
            bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].byte_count = maxPacket;
            bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].info       = BD_OWN_MASK | BD_DTS_MASK;
            bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].info       = 0;
        }
    
        // Set DATA1 on the endpoint.  For RX endpoints, we just queued up our first
        // read, which will always be a DATA0 packet, so the next read will use DATA1.
        // For TX endpoints, we always flip the bit *before* sending the packet, so
        // (counterintuitively) we need to set the DATA1 bit now to send DATA0 in the
        // next packet.  So in either case, we want DATA1 initially.
        Data1 |= (1 << endpoint);
    }
    EXIT_CRITICAL_SECTION
    
    // success
    return true;
}

// read setup packet
void USBHAL::EP0setup(uint8_t *buffer) 
{
    uint32_t sz;
    endpointReadResult(EP0OUT, buffer, &sz);
}

void USBHAL::EP0readStage(void) 
{
    if (!(bdt[0].info & BD_OWN_MASK))
    {
        Data1 &= ~1UL;  // set DATA0
        bdt[0].byte_count = MAX_PACKET_SIZE_EP0;
        bdt[0].info = (BD_DTS_MASK | BD_OWN_MASK);
    }
}

void USBHAL::EP0read(void) 
{
    if (!(bdt[0].info & BD_OWN_MASK))
        bdt[0].byte_count = MAX_PACKET_SIZE_EP0;
}

uint32_t USBHAL::EP0getReadResult(uint8_t *buffer) 
{
    uint32_t sz;
    endpointReadResult(EP0OUT, buffer, &sz);
    return sz;
}

void USBHAL::EP0write(const uint8_t *buffer, uint32_t size) 
{
    endpointWrite(EP0IN, buffer, size);
}

void USBHAL::EP0getWriteResult(void) 
{
}

void USBHAL::EP0stall(void) 
{
    stallEndpoint(EP0OUT);
}

EP_STATUS USBHAL::endpointRead(uint8_t endpoint, uint32_t maximumSize) 
{
    // We always start a new read when we fetch the result of the
    // previous read, so we don't have to do anything here.  Simply
    // indicate that the read is pending so that the caller can proceed
    // to check the results.
    return EP_PENDING;
}

EP_STATUS USBHAL::endpointReadResult(uint8_t endpoint, uint8_t *buffer, uint32_t *bytesRead) 
{
    // validate the endpoint number and direction
    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS || !OUT_EP(endpoint))
        return EP_INVALID;

    // get the logical endpoint
    uint32_t log_endpoint = PHY_TO_LOG(endpoint);
    
    // get the mode - it's isochronous if it doesn't have the handshake flag
    bool iso = (USB0->ENDPOINT[log_endpoint].ENDPT & USB_ENDPT_EPHSHK_MASK) == 0;
    
    // get the BDT index
    int idx = EP_BDT_IDX(log_endpoint, RX, 0);
        
    // Check to see if the endpoint is ready to read
    if (log_endpoint == 0)
    {
        // control endpoint - just make sure we own the BDT
        if (bdt[idx].info & BD_OWN_MASK)
            return EP_PENDING;
    }
    else
    {
        // If it's not isochronous, check to see if we've received data, and
        // return PENDING if not.  Isochronous endpoints don't use the TOKNE 
        // interrupt (they use SOF instead), so the 'complete' flag doesn't
        // apply if it's an iso endpoint.
        if (!iso && !(epComplete & EP(endpoint)))
            return EP_PENDING;
    }
    
    ENTER_CRITICAL_SECTION
    {
        // note if we have a SETUP token
        bool setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
    
        // get the received data buffer and size
        uint8_t *ep_buf = endpoint_buffer[endpoint];
        uint32_t sz  = bdt[idx].byte_count;
    
        // copy the data from the hardware receive buffer to the caller's buffer
        *bytesRead = sz;
        for (uint32_t n = 0 ; n < sz ; n++)
            buffer[n] = ep_buf[n];
        
        // Figure the DATA0/DATA1 bit for the next packet received on this
        // endpoint.  The bit normally toggles on each packet, but it's
        // special for SETUP packets on endpoint 0.  The next OUT packet
        // after a SETUP packet with no data stage is always DATA0, even
        // if the SETUP packet was also DATA0.
        if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1)) 
        {
            if (setup && (buffer[6] == 0))  // if SETUP with no data stage,
                Data1 &= ~1UL;              // the next packet is always DATA0
            else
                Data1 ^= (1 << endpoint);   // otherwise just toggle the last bit
        }
    
        // set up the BDT entry to receive the next packet, and hand it to the SIE
        bdt[idx].byte_count = epMaxPacket[endpoint];
        bdt[idx].info = BD_DTS_MASK | BD_OWN_MASK | (((Data1 >> endpoint) & 1) << 6);
    
        // clear the SUSPEND TOKEN BUSY flag to allow token processing to continue
        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
    
        // clear the 'completed' flag - we're now awaiting the next packet
        epComplete &= ~EP(endpoint);
    }
    EXIT_CRITICAL_SECTION
        
    // the read is completed
    return EP_COMPLETED;
}

EP_STATUS USBHAL::endpointWrite(uint8_t endpoint, const uint8_t *data, uint32_t size) 
{
    // validate the endpoint number and direction
    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS || !IN_EP(endpoint))
        return EP_INVALID;

    // get the BDT index
    int idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, 0);
    
    ENTER_CRITICAL_SECTION
    {
        // get the endpoint buffer
        uint8_t *ep_buf = endpoint_buffer[endpoint];
    
        // copy the data to the hardware buffer
        bdt[idx].byte_count = size;
        for (uint32_t n = 0 ; n < size ; n++)
            ep_buf[n] = data[n];
        
        // toggle DATA0/DATA1 before sending
        Data1 ^= (1 << endpoint);

        // hand the BDT to the SIE to do the send
        bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
    }
    EXIT_CRITICAL_SECTION
    
    // write is now pending in the hardware
    return EP_PENDING;
}

EP_STATUS USBHAL::endpointWriteResult(uint8_t endpoint) 
{
    // assume write is still pending
    EP_STATUS result = EP_PENDING;
    
    ENTER_CRITICAL_SECTION
    {
        // check the 'completed' flag - if set, the write is completed
        if (epComplete & EP(endpoint)) 
        {
            // the result is COMPLETED
            result = EP_COMPLETED;

            // clear the 'completed' flag - this is consumed by fetching the result
            epComplete &= ~EP(endpoint);
        }
    }
    EXIT_CRITICAL_SECTION
    
    // return the result
    return result;
}

void USBHAL::stallEndpoint(uint8_t endpoint) 
{
    USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
}

void USBHAL::unstallEndpoint(uint8_t endpoint) 
{
    ENTER_CRITICAL_SECTION
    {
        // clear the stall bit in the endpoint register
        USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
        
        // take ownership of the BDT entry
        int idx = PEP_BDT_IDX(endpoint, 0);
        bdt[idx].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
        
        // if this is an RX endpoint, start a new read
        if (OUT_EP(endpoint))
        {
            bdt[idx].byte_count = epMaxPacket[endpoint];
            bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
        }

        // Reset Data1 for the endpoint - we need to set the bit to 1 for 
        // either TX or RX, by the same logic as in realiseEndpoint()
        Data1 |= (1 << endpoint);
        
        // clear the 'completed' bit for the endpoint
        epComplete &= ~(1 << endpoint);
    }
    EXIT_CRITICAL_SECTION
}

bool USBHAL::getEndpointStallState(uint8_t endpoint) 
{
    uint8_t stall = (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK);
    return (stall) ? true : false;
}

void USBHAL::remoteWakeup(void) 
{
    // [TODO]
}

void USBHAL::_usbisr(void) 
{
    inIRQ = true;
    instance->usbisr();
    inIRQ = false;
}


void USBHAL::usbisr(void) 
{
    uint8_t i;
    uint8_t istat = USB0->ISTAT;

    // reset interrupt
    if (istat & USB_ISTAT_USBRST_MASK) 
    {
        // disable all endpoints
        for (i = 0 ; i < 16 ; i++)
            USB0->ENDPOINT[i].ENDPT = 0x00;

        // enable control endpoint
        realiseEndpoint(EP0OUT, MAX_PACKET_SIZE_EP0, 0);
        realiseEndpoint(EP0IN, MAX_PACKET_SIZE_EP0, 0);

        // reset DATA0/1 state
        Data1 = 0x55555555;
    
        // reset endpoint completion status
        epComplete = 0;
        
        // reset EVEN/ODD state (and keep it permanently on EVEN -
        // this disables the hardware double-buffering system)
        USB0->CTL |=  USB_CTL_ODDRST_MASK;

        USB0->ISTAT   =  0xFF;  // clear all interrupt status flags
        USB0->ERRSTAT =  0xFF;  // clear all error flags
        USB0->ERREN   =  0xFF;  // enable error interrupt sources
        USB0->ADDR    =  0x00;  // set default address
        
        // notify upper layers of the bus reset
        busReset();
        
        // we're not suspended
        suspendStateChanged(0);

        // do ONLY the reset processing on a RESET interrupt
        return;
    }

    // resume interrupt
    if (istat & USB_ISTAT_RESUME_MASK) 
    {
        suspendStateChanged(0);
        USB0->ISTAT = USB_ISTAT_RESUME_MASK;
    }

    // SOF interrupt
    if (istat & USB_ISTAT_SOFTOK_MASK) 
    {
        // Read frame number and signal the SOF event to the callback
        SOF(frameNumber());
        USB0->ISTAT = USB_ISTAT_SOFTOK_MASK;
    }

    // stall interrupt
    if (istat & USB_ISTAT_STALL_MASK)
    {
        // if the control endpoint (EP 0) is stalled, unstall it
        if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK)
        {
            // clear the stall bit in the endpoint register
            USB0->ENDPOINT[0].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
        
            // take ownership of the RX and TX BDTs
            bdt[EP_BDT_IDX(0, RX, EVEN)].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
            bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
            bdt[EP_BDT_IDX(0, RX, EVEN)].byte_count = MAX_PACKET_SIZE_EP0;
            bdt[EP_BDT_IDX(0, TX, EVEN)].byte_count = MAX_PACKET_SIZE_EP0;

            // start a new read on EP0OUT
            bdt[EP_BDT_IDX(0, RX, EVEN)].info = BD_OWN_MASK | BD_DTS_MASK;

            // reset the DATA0/1 bit to 1 on EP0IN and EP0OUT, by the same 
            // logic as in realiseEndpoint()            
            Data1 |= 0x03;
        }
        
        // clear the busy-suspend bit to resume token processing
        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
        
        // clear the interrupt status bit for STALL
        USB0->ISTAT = USB_ISTAT_STALL_MASK;
    }

    // token interrupt
    if (istat & USB_ISTAT_TOKDNE_MASK) 
    {
        // get the endpoint information from the status register
        uint32_t num  = (USB0->STAT >> 4) & 0x0F;
        uint32_t dir  = (USB0->STAT >> 3) & 0x01;
        int endpoint = (num << 1) | dir;
        uint32_t ev_odd = (USB0->STAT >> 2) & 0x01;

        // check which endpoint we're working with
        if (num == 0)
        {
            // Endpoint 0 requires special handling
            uint32_t idx = EP_BDT_IDX(num, dir, ev_odd);
            int pid = TOK_PID(idx);
            if (pid == SETUP_TOKEN)
            {
                // SETUP packet - next IN (TX) packet must be DATA1 (confusingly,
                // this means we must clear the Data1 bit, since we flip the bit
                // before each send)
                Data1 &= ~0x02;
                
                // forcibly take ownership of the EP0IN endpoints in case we have
                // unfinished previous transmissions (the protocol state machine here
                // assumes that we don't, so it's probably an error if this code
                // actually does anything, but we make no provision for handling this)
                bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~BD_OWN_MASK;
                bdt[EP_BDT_IDX(0, TX, ODD )].info &= ~BD_OWN_MASK;

                // handle the EP0 SETUP event in the generic protocol layer
                EP0setupCallback();
            } 
            else if (pid == OUT_TOKEN)
            {
                // OUT packet on EP0
                EP0out();
            }
            else if (pid == IN_TOKEN)
            {
                // IN packet on EP0
                EP0in();
                
                // Special case: if the 'set address' flag is set, it means that the
                // host just sent us our bus address.  We must put this into effect
                // in the hardware SIE *after* sending the reply, which we just did
                // above.  So it's now time!
                if (set_addr == 1) {
                    USB0->ADDR = addr & 0x7F;
                    set_addr = 0;
                }
            }
        }
        else
        {
            // For all other endpoints, note the read/write completion in the flags
            epComplete |= EP(endpoint);
            
            // call the endpoint token callback; if that handles the token, it consumes
            // the 'completed' status, so clear that flag again
            if ((instance->*(epCallback[endpoint - 2]))()) {
                epComplete &= ~EP(endpoint);
            }
        }

        // clear the TOKDNE interrupt status bit
        USB0->ISTAT = USB_ISTAT_TOKDNE_MASK;
    }

    // sleep interrupt
    if (istat & USB_ISTAT_SLEEP_MASK) 
    {
        suspendStateChanged(1);
        USB0->ISTAT = USB_ISTAT_SLEEP_MASK;
    }

    // error interrupt
    if (istat & USB_ISTAT_ERROR_MASK) 
    {
        // reset all error status bits, and clear the SUSPEND flag to allow
        // token processing to continue
        USB0->ERRSTAT = 0xFF;
        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
        USB0->ISTAT = USB_ISTAT_ERROR_MASK;
    }
}

#endif