USB device stack, with KL25Z fixes for USB 3.0 hosts and sleep/resume interrupt handling

Dependents:   frdm_Slider_Keyboard idd_hw2_figlax_PanType idd_hw2_appachu_finger_chording idd_hw3_AngieWangAntonioDeLimaFernandesDanielLim_BladeSymphony ... more

Fork of USBDevice by mbed official

This is an overhauled version of the standard mbed USB device-side driver library, with bug fixes for KL25Z devices. It greatly improves reliability and stability of USB on the KL25Z, especially with devices using multiple endpoints concurrently.

I've had some nagging problems with the base mbed implementation for a long time, manifesting as occasional random disconnects that required rebooting the device. Recently (late 2015), I started implementing a USB device on the KL25Z that used multiple endpoints, and suddenly the nagging, occasional problems turned into frequent and predictable crashes. This forced me to delve into the USB stack and figure out what was really going on. Happily, the frequent crashes made it possible to track down and fix the problems. This new version is working very reliably in my testing - the random disconnects seem completely eradicated, even under very stressful conditions for the device.

Summary

  • Overall stability improvements
  • USB 3.0 host support
  • Stalled endpoint fixes
  • Sleep/resume notifications
  • Smaller memory footprint
  • General code cleanup

Update - 2/15/2016

My recent fixes introduced a new problem that made the initial connection fail most of the time on certain hosts. It's not clear if the common thread was a particular type of motherboard or USB chip set, or a specific version of Windows, or what, but several people ran into it. We tracked the problem down to the "stall" fixes in the earlier updates, which we now know weren't quite the right fixes after all. The latest update (2/15/2016) fixes this. It has new and improved "unstall" handling that so far works well with diverse hosts.

Race conditions and overall stability

The base mbed KL25Z implementation has a lot of problems with "race conditions" - timing problems that can happen when hardware interrupts occur at inopportune moments. The library shares a bunch of static variable data between interrupt handler context and regular application context. This isn't automatically a bad thing, but it does require careful coordination to make sure that the interrupt handler doesn't corrupt data that the other code was in the middle of updating when an interrupt occurs. The base mbed code, though, doesn't do any of the necessary coordination. This makes it kind of amazing that the base code worked at all for anyone, but I guess the interrupt rate is low enough in most applications that the glitch rate was below anyone's threshold to seriously investigate.

This overhaul adds the necessary coordination for the interrupt handlers to protect against these data corruptions. I think it's very solid now, and hopefully entirely free of the numerous race conditions in the old code. It's always hard to be certain that you've fixed every possible bug like this because they strike (effectively) at random, but I'm pretty confident: my test application was reliably able to trigger glitches in the base code in a matter of minutes, but the same application (with the overhauled library) now runs for days on end without dropping the connection.

Stalled endpoint fixes

USB has a standard way of handling communications errors called a "stall", which basically puts the connection into an error mode to let both sides know that they need to reset their internal states and sync up again. The original mbed version of the USB device library doesn't seem to have the necessary code to recover from this condition properly. The KL25Z hardware does some of the work, but it also seems to require the software to take some steps to "un-stall" the connection. (I keep saying "seems to" because the hardware reference material is very sketchy about all of this. Most of what I've figured out is from observing the device in action with a Windows host.) This new version adds code to do the necessary re-syncing and get the connection going again, automatically, and transparently to the user.

USB 3.0 Hosts

The original mbed code sometimes didn't work when connecting to hosts with USB 3.0 ports. This didn't affect every host, but it affected many of them. The common element seemed to be the Intel Haswell chip set on the host, but there may be other chip sets affected as well. In any case, the problem affected many PCs from the Windows 7 and 8 generation, as well as many Macs. It was possible to work around the problem by avoiding USB 3.0 ports - you could use a USB 2 port on the host, or plug a USB 2 hub between the host and device. But I wanted to just fix the problem and eliminate the need for such workarounds. This modified version of the library has such a fix, which so far has worked for everyone who's tried.

Sleep/resume notifications

This modified version also contains an innocuous change to the KL25Z USB HAL code to handle sleep and resume interrupts with calls to suspendStateChanged(). The original KL25Z code omitted these calls (and in fact didn't even enable the interrupts), but I think this was an unintentional oversight - the notifier function is part of the generic API, and other supported boards all implement it. I use this feature in my own application so that I can distinguish sleep mode from actual disconnects and handle the two conditions correctly.

Smaller memory footprint

The base mbed version of the code allocates twice as much memory for USB buffers as it really needed to. It looks like the original developers intended to implement the KL25Z USB hardware's built-in double-buffering mechanism, but they ultimately abandoned that effort. But they left in the double memory allocation. This version removes that and allocates only what's actually needed. The USB buffers aren't that big (128 bytes per endpoint), so this doesn't save a ton of memory, but even a little memory is pretty precious on this machine given that it only has 16K.

(I did look into adding the double-buffering support that the original developers abandoned, but after some experimentation I decided they were right to skip it. It just doesn't seem to mesh well with the design of the rest of the mbed USB code. I think it would take a major rewrite to make it work, and it doesn't seem worth the effort given that most applications don't need it - it would only benefit applications that are moving so much data through USB that they're pushing the limits of the CPU. And even for those, I think it would be a lot simpler to build a purely software-based buffer rotation mechanism.)

General code cleanup

The KL25Z HAL code in this version has greatly expanded commentary and a lot of general cleanup. Some of the hardware constants were given the wrong symbolic names (e.g., EVEN and ODD were reversed), and many were just missing (written as hard-coded numbers without explanation). I fixed the misnomers and added symbolic names for formerly anonymous numbers. Hopefully the next person who has to overhaul this code will at least have an easier time understanding what I thought I was doing!

USBDevice/USBHAL_KL25Z.cpp

Committer:
mjr
Date:
2015-12-24
Revision:
34:884405d998bb
Parent:
31:81f57ea86f8f
Child:
35:53e1a208f582

File content as of revision 34:884405d998bb:

/* Copyright (c) 2010-2011 mbed.org, MIT License
*
* Permission is hereby granted, free of charge, to any person obtaining a copy of this software
* and associated documentation files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all copies or
* substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
* BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

// USB HAL implementation for KLxx
//
// This implementation uses the KLxx USB SIE (Serial Interface Engine), which is
// a hardware subsystem in these microcontroller that handles the low-level transmission
// details.  The SIE operates concurrently with the CPU, allowing bus transmissions to 
// proceed without CPU attention.  The CPU and SIE coordinate their activities through
// hardware registers and shared RAM buffers.  The main coordination point is the BDT,
// or Buffer Descriptor Table, a shared memory region that holds status information on
// all active endpoint buffers.
//
// This version has a number of substantial changes from the original mbed version:
//
//  - Fixed several bugs in the interrupt handler related to figuring the endpoint
//    number.  The official version does the calculations wrong in such a way that
//    the errors are mostly harmless for endpoints 0-4 but will cause stray pointer
//    writes for higher endpoint numbers.
//
//  - Improved concurrency control between SEI and CPU by paying better attention to
//    who has ownership of a BDT entry at any given time, and only manipulating a BDT
//    when we have ownership.
//
//  - Cleaned up and added a lot of documentation to the code.  Rationalized some
//    misnomers (e.g., EVEN and ODD were reversed).  Simplified the endpoint realization
//    routine considerably to remove redundant code.

#if defined(TARGET_KL25Z) | defined(TARGET_KL46Z) | defined(TARGET_K20D5M) | defined(TARGET_K64F)

#include "USBHAL.h"

USBHAL * USBHAL::instance;

// Convert physical endpoint number to register bit
#define EPMASK(endpoint) (1<<(endpoint))

// Convert physical to logical
#define PHY_TO_LOG(endpoint)    ((endpoint)>>1)

// Get endpoint direction
#define IN_EP(endpoint)     ((endpoint) & 1U ? true : false)
#define OUT_EP(endpoint)    ((endpoint) & 1U ? false : true)

// TRANSMIT/RECEIVE direction flag - this is relative to the microcontroller.
// In USB lingo, directions are always host-relative, so TX == IN (we're
// transmitting, host is receiving) and RX == OUT (host is transmitting,
// we're receiving).
#define TX    1
#define RX    0

// Double-buffer parity.  This implementation doesn't use the double-buffering
// feature of the SIE hardware; we simply always use the EVEN buffer for each
// endpoint.
#define EVEN  0
#define ODD   1

// Get the index of the buffer descriptor for the given endpoint in the
// given direction with the given double-buffer parity.  'ep' is the
// logical endpoint number (0-15), 'dir' is the direction (0=receive,
// 1=transmit), and 'odd' is the double-buffer parity (0=even, 1=odd).
#define EP_BDT_IDX(ep, dir, odd) ((((ep) * 4) + (2 * (dir)) + (1 *  (odd))))

// BDT = buffer descriptor table entry.  Each endpoint has two of these structures, 
// forming a double buffer arrangement that allows the microprocessor to prepare a 
// buffer for an upcoming operation while the USB SIE is transferring data in the 
// other buffer.  The OWN bit in the 'info' member indicates who owns the buffer:
// 0 means that the microprocessor owns the buffer, 1 means that the USB SIE owns
// the buffer (and is actively transferring between USB and local memory).
typedef struct BDT {
    uint8_t   info;       // flags - a combination of BD_xxx_MASK flags below;
                          //         bits 2-5 are overloaded as the Token PID
                          //         for received packets
    uint8_t   reserved;   // reserved, must be 0
    uint16_t  byte_count; // bytes in buffer (10 bits only - high 6 bits reserved)
    uint32_t  address;    // address of buffer in RAM
} BDT;

// BD (buffer descriptor) bit flags, defined by the USB hardware
#define BD_OWN_MASK        (1<<7)   // OWN semaphore bit - 0 -> CPU owns BD, 1 -> USB-FS owns BD
#define BD_DATA01_MASK     (1<<6)   // PID - 0 -> DATA0, 1 -> DATA1
#define BD_KEEP_MASK       (1<<5)   // Keep: 0 -> release ownership upon packet completion, 1 -> keep ownership
#define BD_NINC_MASK       (1<<4)   // Mode: 0 -> auto increment DMA mode, 1 -> no increment FIFO mode
#define BD_DTS_MASK        (1<<3)   // Data Toggle Sync: 0 -> DTS disabled, 1 -> DTS enabled
#define BD_STALL_MASK      (1<<2)   // Stall: 0 -> no stall is issued; 1 -> STALL handshake is issued if a token is received for this BDT
//                         (1<<1)   // reserved (bit 1), must be 0
//                         (1<<0)   // reserved (bit 0), must be 0

// Private BD bit flags.  When we own a buffer, we can use the
// reserved bits in the info field for our own private information.
#define BD_COMPLETED_MASK  (1<<0)   // Operation Completed - buffer contains valid read data, or write data was transmitted

// Extract the Token PID from a BDT entry.  This gives us the PID for a 
// received packet, stored in bits 2-5 of the first byte of the BDT
// (the 'info' byte in our struct).  The USB hardware sets these bits
// when a packet is received.  Note that these bits are overloaded
// as flag bits when we hand off the BDT to the USB hardware; the USB
// subsystem overwrites them on receiving data.  The possible PID
// values are listed below (only for device mode - there are other
// values possible in host mode, which we don't implement here).
#define TOK_PID(idx)   ((bdt[idx].info >> 2) & 0x0F)
#define SETUP_TOKEN    0x0D
#define IN_TOKEN       0x09
#define OUT_TOKEN      0x01

// 16 bidirectional endpoints -> 32 physical endpoints
// 2 BDT entries per endpoint (even, odd)) -> 64 BDT entries total
__attribute__((__aligned__(512))) BDT bdt[NUMBER_OF_PHYSICAL_ENDPOINTS * 2];

// Endpoint buffers - one buffer per physical endpoint
uint8_t *endpoint_buffer[NUMBER_OF_PHYSICAL_ENDPOINTS];
uint32_t endpoint_buffer_size[NUMBER_OF_PHYSICAL_ENDPOINTS];

// USB device address
static uint8_t addr = 0;

// SET ADDRESS mode flag
static uint8_t set_addr = 0;

// Current DATA0/DATA1 value, per physical endpoint.  This has the 32
// endpoints packed into the 32 bits (TX15 RX15 TX14 RX14 ... TX0 RX0).
// The bits represent the packet type for the next operation - a 0 bit
// means DATA0, a 1 bit means DATA1.
static uint32_t Data1  = 0x55555555;


// Current connection state
static uint8_t connectState;
const uint8_t CONNECTED = 0x01;     // connected
const uint8_t SUSPENDED = 0x02;     // suspended/sleep mode


// get the current USB frame number from the hardware controller
static uint32_t frameNumber()
{
    return((USB0->FRMNUML | (USB0->FRMNUMH << 8)) & 0x07FF);
}

uint32_t USBHAL::endpointReadcore(uint8_t endpoint, uint8_t *buffer)
{
    return 0;
}

USBHAL::USBHAL(void)
{
    // Disable IRQ
    NVIC_DisableIRQ(USB0_IRQn);

#if defined(TARGET_K64F)
    MPU->CESR=0;
#endif
    // fill in callback array
    epCallback[0] = &USBHAL::EP1_OUT_callback;
    epCallback[1] = &USBHAL::EP1_IN_callback;
    epCallback[2] = &USBHAL::EP2_OUT_callback;
    epCallback[3] = &USBHAL::EP2_IN_callback;
    epCallback[4] = &USBHAL::EP3_OUT_callback;
    epCallback[5] = &USBHAL::EP3_IN_callback;
    epCallback[6] = &USBHAL::EP4_OUT_callback;
    epCallback[7] = &USBHAL::EP4_IN_callback;
    epCallback[8] = &USBHAL::EP5_OUT_callback;
    epCallback[9] = &USBHAL::EP5_IN_callback;
    epCallback[10] = &USBHAL::EP6_OUT_callback;
    epCallback[11] = &USBHAL::EP6_IN_callback;
    epCallback[12] = &USBHAL::EP7_OUT_callback;
    epCallback[13] = &USBHAL::EP7_IN_callback;
    epCallback[14] = &USBHAL::EP8_OUT_callback;
    epCallback[15] = &USBHAL::EP8_IN_callback;
    epCallback[16] = &USBHAL::EP9_OUT_callback;
    epCallback[17] = &USBHAL::EP9_IN_callback;
    epCallback[18] = &USBHAL::EP10_OUT_callback;
    epCallback[19] = &USBHAL::EP10_IN_callback;
    epCallback[20] = &USBHAL::EP11_OUT_callback;
    epCallback[21] = &USBHAL::EP11_IN_callback;
    epCallback[22] = &USBHAL::EP12_OUT_callback;
    epCallback[23] = &USBHAL::EP12_IN_callback;
    epCallback[24] = &USBHAL::EP13_OUT_callback;
    epCallback[25] = &USBHAL::EP13_IN_callback;
    epCallback[26] = &USBHAL::EP14_OUT_callback;
    epCallback[27] = &USBHAL::EP14_IN_callback;
    epCallback[28] = &USBHAL::EP15_OUT_callback;
    epCallback[29] = &USBHAL::EP15_IN_callback;


    // choose usb src as PLL
    SIM->SOPT2 |= (SIM_SOPT2_USBSRC_MASK | SIM_SOPT2_PLLFLLSEL_MASK);

    // enable OTG clock
    SIM->SCGC4 |= SIM_SCGC4_USBOTG_MASK;

    // Attach IRQ
    instance = this;
    NVIC_SetVector(USB0_IRQn, (uint32_t)&_usbisr);
    NVIC_EnableIRQ(USB0_IRQn);

    // USB Module Configuration
    // Reset USB Module
    USB0->USBTRC0 |= USB_USBTRC0_USBRESET_MASK;
    while(USB0->USBTRC0 & USB_USBTRC0_USBRESET_MASK);

    // Set BDT Base Register
    USB0->BDTPAGE1 = (uint8_t)((uint32_t)bdt>>8);
    USB0->BDTPAGE2 = (uint8_t)((uint32_t)bdt>>16);
    USB0->BDTPAGE3 = (uint8_t)((uint32_t)bdt>>24);

    // Clear interrupt flag
    USB0->ISTAT = 0xff;

    // USB Interrupt Enablers
    USB0->INTEN |= USB_INTEN_TOKDNEEN_MASK |
                   USB_INTEN_SOFTOKEN_MASK |
                   USB_INTEN_ERROREN_MASK  |
                   USB_INTEN_SLEEPEN_MASK |
                   USB_INTEN_RESUMEEN_MASK |
                   USB_INTEN_USBRSTEN_MASK;

    // Disable weak pull downs
    USB0->USBCTRL &= ~(USB_USBCTRL_PDE_MASK | USB_USBCTRL_SUSP_MASK);

    USB0->USBTRC0 |= 0x40;
}

USBHAL::~USBHAL(void)
{
}

void USBHAL::connect(void)
{
    // Enable USB, and keep the SIE's internal even/odd state set to EVEN
    // for all packets.  This effectively disables the double-buffering system.
    USB0->CTL |= USB_CTL_USBENSOFEN_MASK | USB_CTL_ODDRST_MASK;

    // Pull up enable
    USB0->CONTROL |= USB_CONTROL_DPPULLUPNONOTG_MASK;

    // mark as connected
    connectState = CONNECTED;
}

void USBHAL::disconnect(void)
{
    // disable USB
    USB0->CTL &= ~USB_CTL_USBENSOFEN_MASK;

    // Pull up disable
    USB0->CONTROL &= ~USB_CONTROL_DPPULLUPNONOTG_MASK;

    //Free buffers if required:
    for (int i = 0 ; i < NUMBER_OF_PHYSICAL_ENDPOINTS ; i++) {
        free(endpoint_buffer[i]);
        endpoint_buffer[i] = NULL;
    }

    // mark as disconnected
    connectState = 0;
}

void USBHAL::configureDevice(void)
{
    // not needed
}

void USBHAL::unconfigureDevice(void)
{
    // not needed
}

void USBHAL::setAddress(uint8_t address)
{
    // we don't set the address now otherwise the usb controller does not ack
    // we set a flag instead
    // see usbisr when an IN token is received
    set_addr = 1;
    addr = address;
}

bool USBHAL::realiseEndpoint(uint8_t endpoint, uint32_t maxPacket, uint32_t flags)
{
    // validate the endpoint number
    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
        return false;

    // get the logical endpoint
    uint32_t log_endpoint = PHY_TO_LOG(endpoint);

    // get the BDT index for the endpoint's two buffers (even and odd)
    int txrx = IN_EP(endpoint) ? TX : RX;
    int eidx = EP_BDT_IDX(log_endpoint, txrx, EVEN);
    int oidx = EP_BDT_IDX(log_endpoint, txrx, ODD);

    // is this to be an isochronous endpoing?
    int iso = (flags & ISOCHRONOUS);

    // figure the size needed for this buffer
    uint32_t alo_size = (iso ? 1023 : 64);

    // get the current buffer
    uint8_t *buf = endpoint_buffer[endpoint];
    
    // if there's an existing buffer that's too small, delete it and replace it
    if (buf != NULL && endpoint_buffer_size[endpoint] < alo_size)
    {
        free(endpoint_buffer[endpoint]);
        endpoint_buffer[endpoint] = buf = NULL;
    }
    
    // if there's no buffer, allocate a new one
    if (buf == NULL)
    {
        // Allocate the buffer space
        endpoint_buffer[endpoint] = buf = (uint8_t *)malloc(alo_size);
        
        // remember the buffer size
        endpoint_buffer_size[endpoint] = alo_size;
    }

    // Set the up endpoint flags.  Use the ACK handshake for all types except
    // isochronous, and set the RX/TX flag according to the IN/OUT type.
    uint32_t handshake_flag = (iso ? 0 : USB_ENDPT_EPHSHK_MASK);
    uint32_t rxtx_flag = IN_EP(endpoint) ? USB_ENDPT_EPTXEN_MASK : USB_ENDPT_EPRXEN_MASK;
    USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag | rxtx_flag;
    
    // set up the buffer address and size for the first (even) buffer
    bdt[eidx].info = 0;
    bdt[eidx].address = (uint32_t)buf;
    bdt[eidx].byte_count = maxPacket;

    // set the second (odd) buffer - we don't use this, so just clear it out
    bdt[oidx].info = 0;
    bdt[oidx].address = 0;
    bdt[oidx].byte_count = 0;

    // If it's an OUT (RX) endpoint, queue the first read on the channel by
    // giving ownership of the BDT entry to the USB hardware.  This lets the
    // USB subsystem receive data into this buffer as soon as the host sends
    // something.  Note that this queues a read on the even buffer, so the
    // next read will be on the odd buffer.
    if (OUT_EP(endpoint))
        bdt[eidx].info = BD_OWN_MASK | BD_DTS_MASK;

    // Set the initial DATA1 for the endpoint.  This represents the packet
    // type (DATA0 or DATA1) for the next operation.  The first packet sent
    // or received on an endpoint is always DATA0, but note that we set the
    // 1 bit. For an RX endpoint, we've just queued the first read with a
    // DATA0 packet, so the next read operation we perform will set up to 
    // receive the *second* packet, which wll be a DATA1, thus the 1 bit.
    // For a TX endpoint, the next write operation will be the first on the
    // new pipe, so it will need to send a DATA0 packet; but our write
    // routine happens to invert the Data1 bit *before* sending the new
    // packet, so we need to initialize this to 1.  Conveniently, then,
    // it's a 1 for both channel types.
    Data1 |= EPMASK(endpoint);    

    // success
    return true;
}

// Determine if we own an endpoint's BDT structure.  Returns true if so, false
// if the USB hardware owns the BDT.  The USB system has exclusive access to the
// BDT while it's processing a transfer; this is negotiated via the OWN bit in
// the BDT's info field.
bool ownBDT(int idx)
{
    // get the BDT 'info' field pointer - note that this is accessed asynchronously
    // by the USB hardware, so use a 'volatile' pointer to let the compiler know
    // that the memory location is shared with other hardware writers
    volatile uint8_t *p = &bdt[idx].info;

    // The OWN bit means that the USB hardware owns the BDT, so *we* own the BDT
    // if and only if the OWN bit is OFF.
    return (*p & BD_OWN_MASK) == 0;
}

// Was the last operation on a buffer completed?  An operation is complete if
// we own the buffer and the BD_COMPLETED_MASK flag is set.  
bool completed(int idx)
{
    return ownBDT(idx) && (bdt[idx].info & BD_COMPLETED_MASK) != 0;
}

// set the completed flag on a buffer
void setCompleted(int idx, bool val)
{
    if (val)
        bdt[idx].info |= BD_COMPLETED_MASK;
    else
        bdt[idx].info &= ~BD_COMPLETED_MASK;
}

// Queue the next read on an endpoint
static void queueNextRead(int endpoint)
{
    // get the active buffer for the endpoint
    int log_endpoint = PHY_TO_LOG(endpoint);
    int idx = EP_BDT_IDX(log_endpoint, RX, EVEN);
    
    // If the buffer is owned by the SIE, it's already queued for
    // reading, so there's nothing more to do.  If it's marked as
    // "completed", it contains received data that we haven't read
    // yet, so we also don't need to queue a new read.
    if (!ownBDT(idx) || completed(idx))
        return;

    // get the buffer we read from check for a SETUP token
    int setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
    
    // Figure the DATA0/DATA1 status for the next packet.  Assuming
    // that the value we read matches the value in Data1, flip the
    // bit for the next packet, EXCEPT that we need to set DATA0 for
    // the next packet after a SETUP token with no data stage.  If
    // we don't have a match, leave the value the same so that we can
    // retry with the next packet.
    if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1)) 
    {
        uint8_t *buffer = (uint8_t *)bdt[idx].address;
        if (setup && (buffer[6] == 0))  // if no setup data stage,
            Data1 &= ~1UL;              // set DATA0
        else
            Data1 ^= (1 << endpoint);
    }
    
    // Hand ownership of the buffer back to the USB subsystem so
    // that it can transfer the next incomding packet.  Set DTS
    // mode and set the DATA0/DATA1 value we figured above.
    bdt[idx].byte_count = endpoint_buffer_size[endpoint];
    bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
}

// read setup packet
void USBHAL::EP0setup(uint8_t *buffer)
{
    uint32_t sz;
    endpointReadResult(EP0OUT, buffer, &sz);
}

void USBHAL::EP0readStage(void)
{
    Data1 &= ~1UL;  // set DATA0
    bdt[0].info = (BD_DTS_MASK | BD_OWN_MASK);
}

void USBHAL::EP0read(void)
{
    // queue the next read on the endpoint
    queueNextRead(EP0OUT);
}

uint32_t USBHAL::EP0getReadResult(uint8_t *buffer)
{
    // read the endpoint result and return the size
    uint32_t sz = 0;
    endpointReadResult(EP0OUT, buffer, &sz);
    return sz;
}

void USBHAL::EP0write(uint8_t *buffer, uint32_t size)
{
    endpointWrite(EP0IN, buffer, size);
}

void USBHAL::EP0getWriteResult(void)
{
}

void USBHAL::EP0stall(void)
{
    stallEndpoint(EP0OUT);
}

EP_STATUS USBHAL::endpointRead(uint8_t endpoint, uint32_t maximumSize)
{
    // queue the next read on the endpoint
    queueNextRead(endpoint);

    // reads are always pending on setup - they're completed by reading the result
    return EP_PENDING;
}

EP_STATUS USBHAL::endpointReadResult(uint8_t endpoint, uint8_t *buffer, uint32_t *bytesRead)
{
    // validate the endpoint number
    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
        return EP_INVALID;

    // we can only read from an OUT (RX) endpoint
    if (IN_EP(endpoint))
        return EP_INVALID;
        
    // get the logical endpoint number and BDT index
    uint32_t log_endpoint = PHY_TO_LOG(endpoint);
    int idx = EP_BDT_IDX(log_endpoint, RX, EVEN);

    // note if it's an ISO endpoint
    int iso = !(USB0->ENDPOINT[log_endpoint].ENDPT & USB_ENDPT_EPHSHK_MASK);

    // If we haven't completed the read, return Pending; this doesn't apply
    // to isochronous endpoints, where we read continuously without an interrupt
    if (log_endpoint != 0 && !iso && !completed(idx))
        return EP_PENDING;

    // copy the data
    uint8_t *ep_buf = (uint8_t *)bdt[idx].address;
    uint32_t sz = *bytesRead = bdt[idx].byte_count;
    for (uint32_t n = 0 ; n < sz ; n++)
        buffer[n] = ep_buf[n];

    // we've read the data out of the buffer - clear the completion flag
    // for the next read
    setCompleted(idx, false);
    
    // queue the next read
    queueNextRead(endpoint);
    
    // Clear the SUSPEND flag to resume token processing.  The USB hardware 
    // sets this flag upon receiving a SETUP token, so that the microcontroller
    // software can dequeueue pending packet transactions if necessary.
    USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;

    // successful completion
    return EP_COMPLETED;
}

EP_STATUS USBHAL::endpointWrite(uint8_t endpoint, uint8_t *data, uint32_t size)
{
    // validate the endpoint number
    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
        return EP_INVALID;

    // we can only write to an IN (TX) endpoint
    if (OUT_EP(endpoint))
        return EP_INVALID;
        
    // get the BDT index
    uint32_t idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, EVEN);

    // we can't write the endpoint until we have ownership - if the USB
    // subsystem owns it, it means that the last write is still in progress,
    // so we have to wait until it completes
    if (!ownBDT(idx))
        return EP_BUSY;

    // copy the data into the USB buffer
    uint8_t *ep_buf = (uint8_t *)bdt[idx].address;
    bdt[idx].byte_count = size;
    for (uint32_t n = 0 ; n < size ; n++)
        ep_buf[n] = data[n];
        
    // flip the DATA0/DATA1 status
    Data1 ^= EPMASK(endpoint);
    
    // hand ownership of the BDT to the USB subsystem, setting DTS mode
    // (auto DATA0/DATA1 sequence sensing), and setting the next DATA0/DATA1
    // type for the packet
    bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);

    // the transfer is now pending in the USB hardware
    return EP_PENDING;
}

EP_STATUS USBHAL::endpointWriteResult(uint8_t endpoint)
{
    // get the BDT index
    uint32_t idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, EVEN);

    // check the completion flag
    if (completed(idx))
    {
        // operation completed - we've now consumed the completion flag, so
        // clear the flag to mark the buffer as free
        setCompleted(idx, false);
        
        // return status = COMPLETED
        return EP_COMPLETED;
    }
    
    // not completed - operation is still pending
    return EP_PENDING;
}

void USBHAL::stallEndpoint(uint8_t endpoint)
{
    USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
}

void USBHAL::unstallEndpoint(uint8_t endpoint)
{
    USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
}

bool USBHAL::getEndpointStallState(uint8_t endpoint)
{
    uint8_t stall = (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK);
    return (stall) ? true : false;
}

void USBHAL::remoteWakeup(void)
{
    // [TODO]
}


void USBHAL::_usbisr(void)
{
    instance->usbisr();
}


void USBHAL::usbisr(void)
{
    uint8_t i;
    uint8_t istat = USB0->ISTAT;

    // reset interrupt
    if (istat & USB_ISTAT_USBRST_MASK)
    {
        // disable all endpt
        for (i = 0 ; i < 16 ; i++)
            USB0->ENDPOINT[i].ENDPT = 0x00;

        // enable control endpoint
        realiseEndpoint(EP0OUT, MAX_PACKET_SIZE_EP0, 0);
        realiseEndpoint(EP0IN, MAX_PACKET_SIZE_EP0, 0);

        Data1 = 0x55555555;
        USB0->CTL |=  USB_CTL_ODDRST_MASK;

        USB0->ISTAT   =  0xFF;  // clear all interrupt status flags
        USB0->ERRSTAT =  0xFF;  // clear all error flags
        USB0->ERREN   =  0xFF;  // enable error interrupt sources
        USB0->ADDR    =  0x00;  // set default address
        
        // notify upper layers of the bus reset
        busReset();
        
        // we're not suspended
        suspendStateChanged(0);

        // we're not connected or suspended
        connectState = 0;

        // do no other processing on a reset event
        return;
    }

    // resume interrupt
    if (istat & USB_ISTAT_RESUME_MASK)
    {
        USB0->ISTAT = USB_ISTAT_RESUME_MASK;
        connectState &= ~SUSPENDED;
        suspendStateChanged(0);
    }

    // SOF interrupt
    if (istat & USB_ISTAT_SOFTOK_MASK)
    {
        USB0->ISTAT = USB_ISTAT_SOFTOK_MASK;
        // SOF event, read frame number
        SOF(frameNumber());
    }

    // stall interrupt
    if (istat & (1<<7))
    {
        if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK)
            USB0->ENDPOINT[0].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
        USB0->ISTAT |= USB_ISTAT_STALL_MASK;
    }

    // token interrupt
    if (istat & (1<<3))
    {
        // get the logical endpoint number, direction, and buffer parity
        // from the device status
        uint32_t num = (USB0->STAT >> 4) & 0x0F;
        uint32_t dir = (USB0->STAT >> 3) & 0x01;
        uint32_t odd = (USB0->STAT >> 2) & 0x01;

        // figure the physical endpoint number and bit vector mask
        int endpoint = (num << 1) | dir;
        
        // figure the BDT index
        int idx = EP_BDT_IDX(num, dir, odd);

        // setup packet
        if (num == 0 && TOK_PID(idx) == SETUP_TOKEN)
        {
            // The first IN packet after a SETUP token is required to be
            // DATA1.  (Note that the write routine flips the bit before
            // writing, so we have to clear the bit to get DATA1 on the
            // next packet.)
            Data1 &= ~0x02;

            // handle the EP0 SETUP event (SETUP data received)
            EP0setupCallback();
        }
        else
        {
            // Normal data (non-SETUP) packet
            
            if (TOK_PID(idx) == OUT_TOKEN)
            {
                // OUT packet
                if (num == 0)
                    EP0out();
                else
                {
                    setCompleted(idx, true);
                    if ((instance->*(epCallback[endpoint - 2]))())
                        setCompleted(idx, false);
                }
            }

            if (TOK_PID(idx) == IN_TOKEN)
            {
                // IN packet
                if (num == 0)
                {
                    EP0in();
                    if (set_addr == 1)
                    {
                        USB0->ADDR = addr & 0x7F;
                        set_addr = 0;
                    }
                }
                else
                {
                    setCompleted(idx, true);
                    if ((instance->*(epCallback[endpoint - 2]))())
                        setCompleted(idx, false);
                }
            }
        }

        USB0->ISTAT = USB_ISTAT_TOKDNE_MASK;
    }

    // sleep interrupt
    if (istat & (1<<4))
    {
        USB0->ISTAT |= USB_ISTAT_SLEEP_MASK;
        connectState |= SUSPENDED;
        suspendStateChanged(1);
    }

    // error interrupt
    if (istat & USB_ISTAT_ERROR_MASK)
    {
        USB0->ERRSTAT = 0xFF;
        USB0->ISTAT |= USB_ISTAT_ERROR_MASK;
    }
}

#endif