USB device stack, with KL25Z fixes for USB 3.0 hosts and sleep/resume interrupt handling

Dependents:   frdm_Slider_Keyboard idd_hw2_figlax_PanType idd_hw2_appachu_finger_chording idd_hw3_AngieWangAntonioDeLimaFernandesDanielLim_BladeSymphony ... more

Fork of USBDevice by mbed official

This is an overhauled version of the standard mbed USB device-side driver library, with bug fixes for KL25Z devices. It greatly improves reliability and stability of USB on the KL25Z, especially with devices using multiple endpoints concurrently.

I've had some nagging problems with the base mbed implementation for a long time, manifesting as occasional random disconnects that required rebooting the device. Recently (late 2015), I started implementing a USB device on the KL25Z that used multiple endpoints, and suddenly the nagging, occasional problems turned into frequent and predictable crashes. This forced me to delve into the USB stack and figure out what was really going on. Happily, the frequent crashes made it possible to track down and fix the problems. This new version is working very reliably in my testing - the random disconnects seem completely eradicated, even under very stressful conditions for the device.

Summary

  • Overall stability improvements
  • USB 3.0 host support
  • Stalled endpoint fixes
  • Sleep/resume notifications
  • Smaller memory footprint
  • General code cleanup

Update - 2/15/2016

My recent fixes introduced a new problem that made the initial connection fail most of the time on certain hosts. It's not clear if the common thread was a particular type of motherboard or USB chip set, or a specific version of Windows, or what, but several people ran into it. We tracked the problem down to the "stall" fixes in the earlier updates, which we now know weren't quite the right fixes after all. The latest update (2/15/2016) fixes this. It has new and improved "unstall" handling that so far works well with diverse hosts.

Race conditions and overall stability

The base mbed KL25Z implementation has a lot of problems with "race conditions" - timing problems that can happen when hardware interrupts occur at inopportune moments. The library shares a bunch of static variable data between interrupt handler context and regular application context. This isn't automatically a bad thing, but it does require careful coordination to make sure that the interrupt handler doesn't corrupt data that the other code was in the middle of updating when an interrupt occurs. The base mbed code, though, doesn't do any of the necessary coordination. This makes it kind of amazing that the base code worked at all for anyone, but I guess the interrupt rate is low enough in most applications that the glitch rate was below anyone's threshold to seriously investigate.

This overhaul adds the necessary coordination for the interrupt handlers to protect against these data corruptions. I think it's very solid now, and hopefully entirely free of the numerous race conditions in the old code. It's always hard to be certain that you've fixed every possible bug like this because they strike (effectively) at random, but I'm pretty confident: my test application was reliably able to trigger glitches in the base code in a matter of minutes, but the same application (with the overhauled library) now runs for days on end without dropping the connection.

Stalled endpoint fixes

USB has a standard way of handling communications errors called a "stall", which basically puts the connection into an error mode to let both sides know that they need to reset their internal states and sync up again. The original mbed version of the USB device library doesn't seem to have the necessary code to recover from this condition properly. The KL25Z hardware does some of the work, but it also seems to require the software to take some steps to "un-stall" the connection. (I keep saying "seems to" because the hardware reference material is very sketchy about all of this. Most of what I've figured out is from observing the device in action with a Windows host.) This new version adds code to do the necessary re-syncing and get the connection going again, automatically, and transparently to the user.

USB 3.0 Hosts

The original mbed code sometimes didn't work when connecting to hosts with USB 3.0 ports. This didn't affect every host, but it affected many of them. The common element seemed to be the Intel Haswell chip set on the host, but there may be other chip sets affected as well. In any case, the problem affected many PCs from the Windows 7 and 8 generation, as well as many Macs. It was possible to work around the problem by avoiding USB 3.0 ports - you could use a USB 2 port on the host, or plug a USB 2 hub between the host and device. But I wanted to just fix the problem and eliminate the need for such workarounds. This modified version of the library has such a fix, which so far has worked for everyone who's tried.

Sleep/resume notifications

This modified version also contains an innocuous change to the KL25Z USB HAL code to handle sleep and resume interrupts with calls to suspendStateChanged(). The original KL25Z code omitted these calls (and in fact didn't even enable the interrupts), but I think this was an unintentional oversight - the notifier function is part of the generic API, and other supported boards all implement it. I use this feature in my own application so that I can distinguish sleep mode from actual disconnects and handle the two conditions correctly.

Smaller memory footprint

The base mbed version of the code allocates twice as much memory for USB buffers as it really needed to. It looks like the original developers intended to implement the KL25Z USB hardware's built-in double-buffering mechanism, but they ultimately abandoned that effort. But they left in the double memory allocation. This version removes that and allocates only what's actually needed. The USB buffers aren't that big (128 bytes per endpoint), so this doesn't save a ton of memory, but even a little memory is pretty precious on this machine given that it only has 16K.

(I did look into adding the double-buffering support that the original developers abandoned, but after some experimentation I decided they were right to skip it. It just doesn't seem to mesh well with the design of the rest of the mbed USB code. I think it would take a major rewrite to make it work, and it doesn't seem worth the effort given that most applications don't need it - it would only benefit applications that are moving so much data through USB that they're pushing the limits of the CPU. And even for those, I think it would be a lot simpler to build a purely software-based buffer rotation mechanism.)

General code cleanup

The KL25Z HAL code in this version has greatly expanded commentary and a lot of general cleanup. Some of the hardware constants were given the wrong symbolic names (e.g., EVEN and ODD were reversed), and many were just missing (written as hard-coded numbers without explanation). I fixed the misnomers and added symbolic names for formerly anonymous numbers. Hopefully the next person who has to overhaul this code will at least have an easier time understanding what I thought I was doing!

Revision:
34:884405d998bb
Parent:
31:81f57ea86f8f
Child:
35:53e1a208f582
--- a/USBDevice/USBHAL_KL25Z.cpp	Sat Dec 19 06:35:44 2015 +0000
+++ b/USBDevice/USBHAL_KL25Z.cpp	Thu Dec 24 01:37:22 2015 +0000
@@ -16,16 +16,39 @@
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */
 
+// USB HAL implementation for KLxx
+//
+// This implementation uses the KLxx USB SIE (Serial Interface Engine), which is
+// a hardware subsystem in these microcontroller that handles the low-level transmission
+// details.  The SIE operates concurrently with the CPU, allowing bus transmissions to 
+// proceed without CPU attention.  The CPU and SIE coordinate their activities through
+// hardware registers and shared RAM buffers.  The main coordination point is the BDT,
+// or Buffer Descriptor Table, a shared memory region that holds status information on
+// all active endpoint buffers.
+//
+// This version has a number of substantial changes from the original mbed version:
+//
+//  - Fixed several bugs in the interrupt handler related to figuring the endpoint
+//    number.  The official version does the calculations wrong in such a way that
+//    the errors are mostly harmless for endpoints 0-4 but will cause stray pointer
+//    writes for higher endpoint numbers.
+//
+//  - Improved concurrency control between SEI and CPU by paying better attention to
+//    who has ownership of a BDT entry at any given time, and only manipulating a BDT
+//    when we have ownership.
+//
+//  - Cleaned up and added a lot of documentation to the code.  Rationalized some
+//    misnomers (e.g., EVEN and ODD were reversed).  Simplified the endpoint realization
+//    routine considerably to remove redundant code.
+
 #if defined(TARGET_KL25Z) | defined(TARGET_KL46Z) | defined(TARGET_K20D5M) | defined(TARGET_K64F)
 
 #include "USBHAL.h"
 
 USBHAL * USBHAL::instance;
 
-static volatile int epComplete = 0;
-
 // Convert physical endpoint number to register bit
-#define EP(endpoint) (1<<(endpoint))
+#define EPMASK(endpoint) (1<<(endpoint))
 
 // Convert physical to logical
 #define PHY_TO_LOG(endpoint)    ((endpoint)>>1)
@@ -34,55 +57,107 @@
 #define IN_EP(endpoint)     ((endpoint) & 1U ? true : false)
 #define OUT_EP(endpoint)    ((endpoint) & 1U ? false : true)
 
-#define BD_OWN_MASK        (1<<7)
-#define BD_DATA01_MASK     (1<<6)
-#define BD_KEEP_MASK       (1<<5)
-#define BD_NINC_MASK       (1<<4)
-#define BD_DTS_MASK        (1<<3)
-#define BD_STALL_MASK      (1<<2)
-
+// TRANSMIT/RECEIVE direction flag - this is relative to the microcontroller.
+// In USB lingo, directions are always host-relative, so TX == IN (we're
+// transmitting, host is receiving) and RX == OUT (host is transmitting,
+// we're receiving).
 #define TX    1
 #define RX    0
-#define ODD   0
-#define EVEN  1
-// this macro waits a physical endpoint number
-#define EP_BDT_IDX(ep, dir, odd) (((ep * 4) + (2 * dir) + (1 *  odd)))
+
+// Double-buffer parity.  This implementation doesn't use the double-buffering
+// feature of the SIE hardware; we simply always use the EVEN buffer for each
+// endpoint.
+#define EVEN  0
+#define ODD   1
+
+// Get the index of the buffer descriptor for the given endpoint in the
+// given direction with the given double-buffer parity.  'ep' is the
+// logical endpoint number (0-15), 'dir' is the direction (0=receive,
+// 1=transmit), and 'odd' is the double-buffer parity (0=even, 1=odd).
+#define EP_BDT_IDX(ep, dir, odd) ((((ep) * 4) + (2 * (dir)) + (1 *  (odd))))
 
+// BDT = buffer descriptor table entry.  Each endpoint has two of these structures, 
+// forming a double buffer arrangement that allows the microprocessor to prepare a 
+// buffer for an upcoming operation while the USB SIE is transferring data in the 
+// other buffer.  The OWN bit in the 'info' member indicates who owns the buffer:
+// 0 means that the microprocessor owns the buffer, 1 means that the USB SIE owns
+// the buffer (and is actively transferring between USB and local memory).
+typedef struct BDT {
+    uint8_t   info;       // flags - a combination of BD_xxx_MASK flags below;
+                          //         bits 2-5 are overloaded as the Token PID
+                          //         for received packets
+    uint8_t   reserved;   // reserved, must be 0
+    uint16_t  byte_count; // bytes in buffer (10 bits only - high 6 bits reserved)
+    uint32_t  address;    // address of buffer in RAM
+} BDT;
+
+// BD (buffer descriptor) bit flags, defined by the USB hardware
+#define BD_OWN_MASK        (1<<7)   // OWN semaphore bit - 0 -> CPU owns BD, 1 -> USB-FS owns BD
+#define BD_DATA01_MASK     (1<<6)   // PID - 0 -> DATA0, 1 -> DATA1
+#define BD_KEEP_MASK       (1<<5)   // Keep: 0 -> release ownership upon packet completion, 1 -> keep ownership
+#define BD_NINC_MASK       (1<<4)   // Mode: 0 -> auto increment DMA mode, 1 -> no increment FIFO mode
+#define BD_DTS_MASK        (1<<3)   // Data Toggle Sync: 0 -> DTS disabled, 1 -> DTS enabled
+#define BD_STALL_MASK      (1<<2)   // Stall: 0 -> no stall is issued; 1 -> STALL handshake is issued if a token is received for this BDT
+//                         (1<<1)   // reserved (bit 1), must be 0
+//                         (1<<0)   // reserved (bit 0), must be 0
+
+// Private BD bit flags.  When we own a buffer, we can use the
+// reserved bits in the info field for our own private information.
+#define BD_COMPLETED_MASK  (1<<0)   // Operation Completed - buffer contains valid read data, or write data was transmitted
+
+// Extract the Token PID from a BDT entry.  This gives us the PID for a 
+// received packet, stored in bits 2-5 of the first byte of the BDT
+// (the 'info' byte in our struct).  The USB hardware sets these bits
+// when a packet is received.  Note that these bits are overloaded
+// as flag bits when we hand off the BDT to the USB hardware; the USB
+// subsystem overwrites them on receiving data.  The possible PID
+// values are listed below (only for device mode - there are other
+// values possible in host mode, which we don't implement here).
+#define TOK_PID(idx)   ((bdt[idx].info >> 2) & 0x0F)
 #define SETUP_TOKEN    0x0D
 #define IN_TOKEN       0x09
 #define OUT_TOKEN      0x01
-#define TOK_PID(idx)   ((bdt[idx].info >> 2) & 0x0F)
+
+// 16 bidirectional endpoints -> 32 physical endpoints
+// 2 BDT entries per endpoint (even, odd)) -> 64 BDT entries total
+__attribute__((__aligned__(512))) BDT bdt[NUMBER_OF_PHYSICAL_ENDPOINTS * 2];
+
+// Endpoint buffers - one buffer per physical endpoint
+uint8_t *endpoint_buffer[NUMBER_OF_PHYSICAL_ENDPOINTS];
+uint32_t endpoint_buffer_size[NUMBER_OF_PHYSICAL_ENDPOINTS];
 
-// for each endpt: 8 bytes
-typedef struct BDT {
-    uint8_t   info;       // BD[0:7]
-    uint8_t   dummy;      // RSVD: BD[8:15]
-    uint16_t  byte_count; // BD[16:32]
-    uint32_t  address;    // Addr
-} BDT;
+// USB device address
+static uint8_t addr = 0;
+
+// SET ADDRESS mode flag
+static uint8_t set_addr = 0;
+
+// Current DATA0/DATA1 value, per physical endpoint.  This has the 32
+// endpoints packed into the 32 bits (TX15 RX15 TX14 RX14 ... TX0 RX0).
+// The bits represent the packet type for the next operation - a 0 bit
+// means DATA0, a 1 bit means DATA1.
+static uint32_t Data1  = 0x55555555;
 
 
-// there are:
-//    * 16 bidirectionnal endpt -> 32 physical endpt
-//    * as there are ODD and EVEN buffer -> 32*2 bdt
-__attribute__((__aligned__(512))) BDT bdt[NUMBER_OF_PHYSICAL_ENDPOINTS * 2];
-uint8_t * endpoint_buffer[(NUMBER_OF_PHYSICAL_ENDPOINTS - 2) * 2];
-uint8_t * endpoint_buffer_iso[2*2];
+// Current connection state
+static uint8_t connectState;
+const uint8_t CONNECTED = 0x01;     // connected
+const uint8_t SUSPENDED = 0x02;     // suspended/sleep mode
 
-static uint8_t set_addr = 0;
-static uint8_t addr = 0;
 
-static uint32_t Data1  = 0x55555555;
-
-static uint32_t frameNumber() {
+// get the current USB frame number from the hardware controller
+static uint32_t frameNumber()
+{
     return((USB0->FRMNUML | (USB0->FRMNUMH << 8)) & 0x07FF);
 }
 
-uint32_t USBHAL::endpointReadcore(uint8_t endpoint, uint8_t *buffer) {
+uint32_t USBHAL::endpointReadcore(uint8_t endpoint, uint8_t *buffer)
+{
     return 0;
 }
 
-USBHAL::USBHAL(void) {
+USBHAL::USBHAL(void)
+{
     // Disable IRQ
     NVIC_DisableIRQ(USB0_IRQn);
 
@@ -160,286 +235,402 @@
     USB0->USBTRC0 |= 0x40;
 }
 
-USBHAL::~USBHAL(void) { }
+USBHAL::~USBHAL(void)
+{
+}
 
-void USBHAL::connect(void) {
-    // enable USB
-    USB0->CTL |= USB_CTL_USBENSOFEN_MASK;
+void USBHAL::connect(void)
+{
+    // Enable USB, and keep the SIE's internal even/odd state set to EVEN
+    // for all packets.  This effectively disables the double-buffering system.
+    USB0->CTL |= USB_CTL_USBENSOFEN_MASK | USB_CTL_ODDRST_MASK;
+
     // Pull up enable
     USB0->CONTROL |= USB_CONTROL_DPPULLUPNONOTG_MASK;
+
+    // mark as connected
+    connectState = CONNECTED;
 }
 
-void USBHAL::disconnect(void) {
+void USBHAL::disconnect(void)
+{
     // disable USB
     USB0->CTL &= ~USB_CTL_USBENSOFEN_MASK;
+
     // Pull up disable
     USB0->CONTROL &= ~USB_CONTROL_DPPULLUPNONOTG_MASK;
 
     //Free buffers if required:
-    for (int i = 0; i<(NUMBER_OF_PHYSICAL_ENDPOINTS - 2) * 2; i++) {
+    for (int i = 0 ; i < NUMBER_OF_PHYSICAL_ENDPOINTS ; i++) {
         free(endpoint_buffer[i]);
         endpoint_buffer[i] = NULL;
     }
-    free(endpoint_buffer_iso[2]);
-    endpoint_buffer_iso[2] = NULL;
-    free(endpoint_buffer_iso[0]);
-    endpoint_buffer_iso[0] = NULL;
+
+    // mark as disconnected
+    connectState = 0;
 }
 
-void USBHAL::configureDevice(void) {
+void USBHAL::configureDevice(void)
+{
     // not needed
 }
 
-void USBHAL::unconfigureDevice(void) {
+void USBHAL::unconfigureDevice(void)
+{
     // not needed
 }
 
-void USBHAL::setAddress(uint8_t address) {
+void USBHAL::setAddress(uint8_t address)
+{
     // we don't set the address now otherwise the usb controller does not ack
     // we set a flag instead
     // see usbisr when an IN token is received
     set_addr = 1;
     addr = address;
- }
+}
 
-bool USBHAL::realiseEndpoint(uint8_t endpoint, uint32_t maxPacket, uint32_t flags) {
-    uint32_t handshake_flag = 0;
-    uint8_t * buf;
+bool USBHAL::realiseEndpoint(uint8_t endpoint, uint32_t maxPacket, uint32_t flags)
+{
+    // validate the endpoint number
+    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
+        return false;
 
-    if (endpoint > NUMBER_OF_PHYSICAL_ENDPOINTS - 1) {
-        return false;
-    }
-
+    // get the logical endpoint
     uint32_t log_endpoint = PHY_TO_LOG(endpoint);
 
-    if ((flags & ISOCHRONOUS) == 0) {
-        handshake_flag = USB_ENDPT_EPHSHK_MASK;
-        if (IN_EP(endpoint)) {
-            if (endpoint_buffer[EP_BDT_IDX(log_endpoint, TX, ODD)] == NULL)
-                endpoint_buffer[EP_BDT_IDX(log_endpoint, TX, ODD)] = (uint8_t *) malloc (64*2);
-            buf = &endpoint_buffer[EP_BDT_IDX(log_endpoint, TX, ODD)][0];
-        } else {
-            if (endpoint_buffer[EP_BDT_IDX(log_endpoint, RX, ODD)] == NULL)
-                endpoint_buffer[EP_BDT_IDX(log_endpoint, RX, ODD)] = (uint8_t *) malloc (64*2);
-            buf = &endpoint_buffer[EP_BDT_IDX(log_endpoint, RX, ODD)][0];
-        }
-    } else {
-        if (IN_EP(endpoint)) {
-            if (endpoint_buffer_iso[2] == NULL)
-                endpoint_buffer_iso[2] = (uint8_t *) malloc (1023*2);
-            buf = &endpoint_buffer_iso[2][0];
-        } else {
-            if (endpoint_buffer_iso[0] == NULL)
-                endpoint_buffer_iso[0] = (uint8_t *) malloc (1023*2);
-            buf = &endpoint_buffer_iso[0][0];
-        }
+    // get the BDT index for the endpoint's two buffers (even and odd)
+    int txrx = IN_EP(endpoint) ? TX : RX;
+    int eidx = EP_BDT_IDX(log_endpoint, txrx, EVEN);
+    int oidx = EP_BDT_IDX(log_endpoint, txrx, ODD);
+
+    // is this to be an isochronous endpoing?
+    int iso = (flags & ISOCHRONOUS);
+
+    // figure the size needed for this buffer
+    uint32_t alo_size = (iso ? 1023 : 64);
+
+    // get the current buffer
+    uint8_t *buf = endpoint_buffer[endpoint];
+    
+    // if there's an existing buffer that's too small, delete it and replace it
+    if (buf != NULL && endpoint_buffer_size[endpoint] < alo_size)
+    {
+        free(endpoint_buffer[endpoint]);
+        endpoint_buffer[endpoint] = buf = NULL;
+    }
+    
+    // if there's no buffer, allocate a new one
+    if (buf == NULL)
+    {
+        // Allocate the buffer space
+        endpoint_buffer[endpoint] = buf = (uint8_t *)malloc(alo_size);
+        
+        // remember the buffer size
+        endpoint_buffer_size[endpoint] = alo_size;
     }
 
-    // IN endpt -> device to host (TX)
-    if (IN_EP(endpoint)) {
-        USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag |        // ep handshaking (not if iso endpoint)
-                                              USB_ENDPT_EPTXEN_MASK;  // en TX (IN) tran
-        bdt[EP_BDT_IDX(log_endpoint, TX, ODD )].address = (uint32_t) buf;
-        bdt[EP_BDT_IDX(log_endpoint, TX, EVEN)].address = 0;
-    }
-    // OUT endpt -> host to device (RX)
-    else {
-        USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag |        // ep handshaking (not if iso endpoint)
-                                              USB_ENDPT_EPRXEN_MASK;  // en RX (OUT) tran.
-        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].byte_count = maxPacket;
-        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].address    = (uint32_t) buf;
-        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].info       = BD_OWN_MASK | BD_DTS_MASK;
-        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].info       = 0;
-    }
+    // Set the up endpoint flags.  Use the ACK handshake for all types except
+    // isochronous, and set the RX/TX flag according to the IN/OUT type.
+    uint32_t handshake_flag = (iso ? 0 : USB_ENDPT_EPHSHK_MASK);
+    uint32_t rxtx_flag = IN_EP(endpoint) ? USB_ENDPT_EPTXEN_MASK : USB_ENDPT_EPRXEN_MASK;
+    USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag | rxtx_flag;
+    
+    // set up the buffer address and size for the first (even) buffer
+    bdt[eidx].info = 0;
+    bdt[eidx].address = (uint32_t)buf;
+    bdt[eidx].byte_count = maxPacket;
+
+    // set the second (odd) buffer - we don't use this, so just clear it out
+    bdt[oidx].info = 0;
+    bdt[oidx].address = 0;
+    bdt[oidx].byte_count = 0;
 
-    Data1 |= (1 << endpoint);
+    // If it's an OUT (RX) endpoint, queue the first read on the channel by
+    // giving ownership of the BDT entry to the USB hardware.  This lets the
+    // USB subsystem receive data into this buffer as soon as the host sends
+    // something.  Note that this queues a read on the even buffer, so the
+    // next read will be on the odd buffer.
+    if (OUT_EP(endpoint))
+        bdt[eidx].info = BD_OWN_MASK | BD_DTS_MASK;
 
+    // Set the initial DATA1 for the endpoint.  This represents the packet
+    // type (DATA0 or DATA1) for the next operation.  The first packet sent
+    // or received on an endpoint is always DATA0, but note that we set the
+    // 1 bit. For an RX endpoint, we've just queued the first read with a
+    // DATA0 packet, so the next read operation we perform will set up to 
+    // receive the *second* packet, which wll be a DATA1, thus the 1 bit.
+    // For a TX endpoint, the next write operation will be the first on the
+    // new pipe, so it will need to send a DATA0 packet; but our write
+    // routine happens to invert the Data1 bit *before* sending the new
+    // packet, so we need to initialize this to 1.  Conveniently, then,
+    // it's a 1 for both channel types.
+    Data1 |= EPMASK(endpoint);    
+
+    // success
     return true;
 }
 
-// read setup packet
-void USBHAL::EP0setup(uint8_t *buffer) {
-    uint32_t sz;
-    endpointReadResult(EP0OUT, buffer, &sz);
-}
+// Determine if we own an endpoint's BDT structure.  Returns true if so, false
+// if the USB hardware owns the BDT.  The USB system has exclusive access to the
+// BDT while it's processing a transfer; this is negotiated via the OWN bit in
+// the BDT's info field.
+bool ownBDT(int idx)
+{
+    // get the BDT 'info' field pointer - note that this is accessed asynchronously
+    // by the USB hardware, so use a 'volatile' pointer to let the compiler know
+    // that the memory location is shared with other hardware writers
+    volatile uint8_t *p = &bdt[idx].info;
 
-void USBHAL::EP0readStage(void) {
-    Data1 &= ~1UL;  // set DATA0
-    bdt[0].info = (BD_DTS_MASK | BD_OWN_MASK);
-}
-
-void USBHAL::EP0read(void) {
-    uint32_t idx = EP_BDT_IDX(PHY_TO_LOG(EP0OUT), RX, 0);
-    bdt[idx].byte_count = MAX_PACKET_SIZE_EP0;
+    // The OWN bit means that the USB hardware owns the BDT, so *we* own the BDT
+    // if and only if the OWN bit is OFF.
+    return (*p & BD_OWN_MASK) == 0;
 }
 
-uint32_t USBHAL::EP0getReadResult(uint8_t *buffer) {
-    uint32_t sz;
-    endpointReadResult(EP0OUT, buffer, &sz);
-    return sz;
-}
-
-void USBHAL::EP0write(uint8_t *buffer, uint32_t size) {
-    endpointWrite(EP0IN, buffer, size);
-}
-
-void USBHAL::EP0getWriteResult(void) {
-}
-
-void USBHAL::EP0stall(void) {
-    stallEndpoint(EP0OUT);
-}
-
-EP_STATUS USBHAL::endpointRead(uint8_t endpoint, uint32_t maximumSize) {
-    endpoint = PHY_TO_LOG(endpoint);
-    uint32_t idx = EP_BDT_IDX(endpoint, RX, 0);
-    bdt[idx].byte_count = maximumSize;
-    return EP_PENDING;
+// Was the last operation on a buffer completed?  An operation is complete if
+// we own the buffer and the BD_COMPLETED_MASK flag is set.  
+bool completed(int idx)
+{
+    return ownBDT(idx) && (bdt[idx].info & BD_COMPLETED_MASK) != 0;
 }
 
-EP_STATUS USBHAL::endpointReadResult(uint8_t endpoint, uint8_t * buffer, uint32_t *bytesRead) {
-    uint32_t n, sz, idx, setup = 0;
-    uint8_t not_iso;
-    uint8_t * ep_buf;
-
-    uint32_t log_endpoint = PHY_TO_LOG(endpoint);
-
-    if (endpoint > NUMBER_OF_PHYSICAL_ENDPOINTS - 1) {
-        return EP_INVALID;
-    }
-
-    // if read on a IN endpoint -> error
-    if (IN_EP(endpoint)) {
-        return EP_INVALID;
-    }
-
-    idx = EP_BDT_IDX(log_endpoint, RX, 0);
-    sz  = bdt[idx].byte_count;
-    not_iso = USB0->ENDPOINT[log_endpoint].ENDPT & USB_ENDPT_EPHSHK_MASK;
+// set the completed flag on a buffer
+void setCompleted(int idx, bool val)
+{
+    if (val)
+        bdt[idx].info |= BD_COMPLETED_MASK;
+    else
+        bdt[idx].info &= ~BD_COMPLETED_MASK;
+}
 
-    //for isochronous endpoint, we don't wait an interrupt
-    if ((log_endpoint != 0) && not_iso && !(epComplete & EP(endpoint))) {
-        return EP_PENDING;
-    }
-
-    if ((log_endpoint == 0) && (TOK_PID(idx) == SETUP_TOKEN)) {
-        setup = 1;
-    }
+// Queue the next read on an endpoint
+static void queueNextRead(int endpoint)
+{
+    // get the active buffer for the endpoint
+    int log_endpoint = PHY_TO_LOG(endpoint);
+    int idx = EP_BDT_IDX(log_endpoint, RX, EVEN);
+    
+    // If the buffer is owned by the SIE, it's already queued for
+    // reading, so there's nothing more to do.  If it's marked as
+    // "completed", it contains received data that we haven't read
+    // yet, so we also don't need to queue a new read.
+    if (!ownBDT(idx) || completed(idx))
+        return;
 
-    // non iso endpoint
-    if (not_iso) {
-        ep_buf = endpoint_buffer[idx];
-    } else {
-        ep_buf = endpoint_buffer_iso[0];
-    }
-
-    for (n = 0; n < sz; n++) {
-        buffer[n] = ep_buf[n];
-    }
-
-    if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1)) {
+    // get the buffer we read from check for a SETUP token
+    int setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
+    
+    // Figure the DATA0/DATA1 status for the next packet.  Assuming
+    // that the value we read matches the value in Data1, flip the
+    // bit for the next packet, EXCEPT that we need to set DATA0 for
+    // the next packet after a SETUP token with no data stage.  If
+    // we don't have a match, leave the value the same so that we can
+    // retry with the next packet.
+    if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1)) 
+    {
+        uint8_t *buffer = (uint8_t *)bdt[idx].address;
         if (setup && (buffer[6] == 0))  // if no setup data stage,
             Data1 &= ~1UL;              // set DATA0
         else
             Data1 ^= (1 << endpoint);
     }
+    
+    // Hand ownership of the buffer back to the USB subsystem so
+    // that it can transfer the next incomding packet.  Set DTS
+    // mode and set the DATA0/DATA1 value we figured above.
+    bdt[idx].byte_count = endpoint_buffer_size[endpoint];
+    bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
+}
 
-    if (((Data1 >> endpoint) & 1)) {
-        bdt[idx].info = BD_DTS_MASK | BD_DATA01_MASK | BD_OWN_MASK;
-    }
-    else {
-        bdt[idx].info = BD_DTS_MASK | BD_OWN_MASK;
-    }
+// read setup packet
+void USBHAL::EP0setup(uint8_t *buffer)
+{
+    uint32_t sz;
+    endpointReadResult(EP0OUT, buffer, &sz);
+}
+
+void USBHAL::EP0readStage(void)
+{
+    Data1 &= ~1UL;  // set DATA0
+    bdt[0].info = (BD_DTS_MASK | BD_OWN_MASK);
+}
+
+void USBHAL::EP0read(void)
+{
+    // queue the next read on the endpoint
+    queueNextRead(EP0OUT);
+}
+
+uint32_t USBHAL::EP0getReadResult(uint8_t *buffer)
+{
+    // read the endpoint result and return the size
+    uint32_t sz = 0;
+    endpointReadResult(EP0OUT, buffer, &sz);
+    return sz;
+}
+
+void USBHAL::EP0write(uint8_t *buffer, uint32_t size)
+{
+    endpointWrite(EP0IN, buffer, size);
+}
+
+void USBHAL::EP0getWriteResult(void)
+{
+}
+
+void USBHAL::EP0stall(void)
+{
+    stallEndpoint(EP0OUT);
+}
+
+EP_STATUS USBHAL::endpointRead(uint8_t endpoint, uint32_t maximumSize)
+{
+    // queue the next read on the endpoint
+    queueNextRead(endpoint);
 
+    // reads are always pending on setup - they're completed by reading the result
+    return EP_PENDING;
+}
+
+EP_STATUS USBHAL::endpointReadResult(uint8_t endpoint, uint8_t *buffer, uint32_t *bytesRead)
+{
+    // validate the endpoint number
+    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
+        return EP_INVALID;
+
+    // we can only read from an OUT (RX) endpoint
+    if (IN_EP(endpoint))
+        return EP_INVALID;
+        
+    // get the logical endpoint number and BDT index
+    uint32_t log_endpoint = PHY_TO_LOG(endpoint);
+    int idx = EP_BDT_IDX(log_endpoint, RX, EVEN);
+
+    // note if it's an ISO endpoint
+    int iso = !(USB0->ENDPOINT[log_endpoint].ENDPT & USB_ENDPT_EPHSHK_MASK);
+
+    // If we haven't completed the read, return Pending; this doesn't apply
+    // to isochronous endpoints, where we read continuously without an interrupt
+    if (log_endpoint != 0 && !iso && !completed(idx))
+        return EP_PENDING;
+
+    // copy the data
+    uint8_t *ep_buf = (uint8_t *)bdt[idx].address;
+    uint32_t sz = *bytesRead = bdt[idx].byte_count;
+    for (uint32_t n = 0 ; n < sz ; n++)
+        buffer[n] = ep_buf[n];
+
+    // we've read the data out of the buffer - clear the completion flag
+    // for the next read
+    setCompleted(idx, false);
+    
+    // queue the next read
+    queueNextRead(endpoint);
+    
+    // Clear the SUSPEND flag to resume token processing.  The USB hardware 
+    // sets this flag upon receiving a SETUP token, so that the microcontroller
+    // software can dequeueue pending packet transactions if necessary.
     USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
-    *bytesRead = sz;
 
-    epComplete &= ~EP(endpoint);
+    // successful completion
     return EP_COMPLETED;
 }
 
-EP_STATUS USBHAL::endpointWrite(uint8_t endpoint, uint8_t *data, uint32_t size) {
-    uint32_t idx, n;
-    uint8_t * ep_buf;
-
-    if (endpoint > NUMBER_OF_PHYSICAL_ENDPOINTS - 1) {
+EP_STATUS USBHAL::endpointWrite(uint8_t endpoint, uint8_t *data, uint32_t size)
+{
+    // validate the endpoint number
+    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
         return EP_INVALID;
-    }
 
-    // if write on a OUT endpoint -> error
-    if (OUT_EP(endpoint)) {
+    // we can only write to an IN (TX) endpoint
+    if (OUT_EP(endpoint))
         return EP_INVALID;
-    }
-
-    idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, 0);
-    bdt[idx].byte_count = size;
-
+        
+    // get the BDT index
+    uint32_t idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, EVEN);
 
-    // non iso endpoint
-    if (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPHSHK_MASK) {
-        ep_buf = endpoint_buffer[idx];
-    } else {
-        ep_buf = endpoint_buffer_iso[2];
-    }
+    // we can't write the endpoint until we have ownership - if the USB
+    // subsystem owns it, it means that the last write is still in progress,
+    // so we have to wait until it completes
+    if (!ownBDT(idx))
+        return EP_BUSY;
 
-    for (n = 0; n < size; n++) {
+    // copy the data into the USB buffer
+    uint8_t *ep_buf = (uint8_t *)bdt[idx].address;
+    bdt[idx].byte_count = size;
+    for (uint32_t n = 0 ; n < size ; n++)
         ep_buf[n] = data[n];
-    }
+        
+    // flip the DATA0/DATA1 status
+    Data1 ^= EPMASK(endpoint);
+    
+    // hand ownership of the BDT to the USB subsystem, setting DTS mode
+    // (auto DATA0/DATA1 sequence sensing), and setting the next DATA0/DATA1
+    // type for the packet
+    bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
 
-    if ((Data1 >> endpoint) & 1) {
-        bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
-    } else {
-        bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | BD_DATA01_MASK;
-    }
-
-    Data1 ^= (1 << endpoint);
-
+    // the transfer is now pending in the USB hardware
     return EP_PENDING;
 }
 
-EP_STATUS USBHAL::endpointWriteResult(uint8_t endpoint) {
-    if (epComplete & EP(endpoint)) {
-        epComplete &= ~EP(endpoint);
+EP_STATUS USBHAL::endpointWriteResult(uint8_t endpoint)
+{
+    // get the BDT index
+    uint32_t idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, EVEN);
+
+    // check the completion flag
+    if (completed(idx))
+    {
+        // operation completed - we've now consumed the completion flag, so
+        // clear the flag to mark the buffer as free
+        setCompleted(idx, false);
+        
+        // return status = COMPLETED
         return EP_COMPLETED;
     }
-
+    
+    // not completed - operation is still pending
     return EP_PENDING;
 }
 
-void USBHAL::stallEndpoint(uint8_t endpoint) {
+void USBHAL::stallEndpoint(uint8_t endpoint)
+{
     USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
 }
 
-void USBHAL::unstallEndpoint(uint8_t endpoint) {
+void USBHAL::unstallEndpoint(uint8_t endpoint)
+{
     USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
 }
 
-bool USBHAL::getEndpointStallState(uint8_t endpoint) {
+bool USBHAL::getEndpointStallState(uint8_t endpoint)
+{
     uint8_t stall = (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK);
     return (stall) ? true : false;
 }
 
-void USBHAL::remoteWakeup(void) {
+void USBHAL::remoteWakeup(void)
+{
     // [TODO]
 }
 
 
-void USBHAL::_usbisr(void) {
+void USBHAL::_usbisr(void)
+{
     instance->usbisr();
 }
 
 
-void USBHAL::usbisr(void) {
+void USBHAL::usbisr(void)
+{
     uint8_t i;
     uint8_t istat = USB0->ISTAT;
 
     // reset interrupt
-    if (istat & USB_ISTAT_USBRST_MASK) {
-        
+    if (istat & USB_ISTAT_USBRST_MASK)
+    {
         // disable all endpt
-        for(i = 0; i < 16; i++) {
+        for (i = 0 ; i < 16 ; i++)
             USB0->ENDPOINT[i].ENDPT = 0x00;
-        }
 
         // enable control endpoint
         realiseEndpoint(EP0OUT, MAX_PACKET_SIZE_EP0, 0);
@@ -459,71 +650,98 @@
         // we're not suspended
         suspendStateChanged(0);
 
+        // we're not connected or suspended
+        connectState = 0;
+
+        // do no other processing on a reset event
         return;
     }
 
     // resume interrupt
-    if (istat & USB_ISTAT_RESUME_MASK) {
+    if (istat & USB_ISTAT_RESUME_MASK)
+    {
         USB0->ISTAT = USB_ISTAT_RESUME_MASK;
+        connectState &= ~SUSPENDED;
         suspendStateChanged(0);
     }
 
     // SOF interrupt
-    if (istat & USB_ISTAT_SOFTOK_MASK) {
+    if (istat & USB_ISTAT_SOFTOK_MASK)
+    {
         USB0->ISTAT = USB_ISTAT_SOFTOK_MASK;
         // SOF event, read frame number
         SOF(frameNumber());
     }
 
     // stall interrupt
-    if (istat & 1<<7) {
+    if (istat & (1<<7))
+    {
         if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK)
             USB0->ENDPOINT[0].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
         USB0->ISTAT |= USB_ISTAT_STALL_MASK;
     }
 
     // token interrupt
-    if (istat & 1<<3) {
-        uint32_t num  = (USB0->STAT >> 4) & 0x0F;
-        uint32_t dir  = (USB0->STAT >> 3) & 0x01;
-        uint32_t ev_odd = (USB0->STAT >> 2) & 0x01;
+    if (istat & (1<<3))
+    {
+        // get the logical endpoint number, direction, and buffer parity
+        // from the device status
+        uint32_t num = (USB0->STAT >> 4) & 0x0F;
+        uint32_t dir = (USB0->STAT >> 3) & 0x01;
+        uint32_t odd = (USB0->STAT >> 2) & 0x01;
+
+        // figure the physical endpoint number and bit vector mask
+        int endpoint = (num << 1) | dir;
+        
+        // figure the BDT index
+        int idx = EP_BDT_IDX(num, dir, odd);
 
         // setup packet
-        if ((num == 0) && (TOK_PID((EP_BDT_IDX(num, dir, ev_odd))) == SETUP_TOKEN)) {
+        if (num == 0 && TOK_PID(idx) == SETUP_TOKEN)
+        {
+            // The first IN packet after a SETUP token is required to be
+            // DATA1.  (Note that the write routine flips the bit before
+            // writing, so we have to clear the bit to get DATA1 on the
+            // next packet.)
             Data1 &= ~0x02;
-            bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~BD_OWN_MASK;
-            bdt[EP_BDT_IDX(0, TX, ODD)].info  &= ~BD_OWN_MASK;
 
-            // EP0 SETUP event (SETUP data received)
+            // handle the EP0 SETUP event (SETUP data received)
             EP0setupCallback();
-
-        } else {
-            // OUT packet
-            if (TOK_PID((EP_BDT_IDX(num, dir, ev_odd))) == OUT_TOKEN) {
+        }
+        else
+        {
+            // Normal data (non-SETUP) packet
+            
+            if (TOK_PID(idx) == OUT_TOKEN)
+            {
+                // OUT packet
                 if (num == 0)
                     EP0out();
-                else {
-                    epComplete |= (1 << EP(num));
-                    if ((instance->*(epCallback[EP(num) - 2]))()) {
-                        epComplete &= ~(1 << EP(num));
-                    }
+                else
+                {
+                    setCompleted(idx, true);
+                    if ((instance->*(epCallback[endpoint - 2]))())
+                        setCompleted(idx, false);
                 }
             }
 
-            // IN packet
-            if (TOK_PID((EP_BDT_IDX(num, dir, ev_odd))) == IN_TOKEN) {
-                if (num == 0) {
+            if (TOK_PID(idx) == IN_TOKEN)
+            {
+                // IN packet
+                if (num == 0)
+                {
                     EP0in();
-                    if (set_addr == 1) {
+                    if (set_addr == 1)
+                    {
                         USB0->ADDR = addr & 0x7F;
                         set_addr = 0;
                     }
                 }
-                else {
-                    epComplete |= (1 << (EP(num) + 1));
-                    if ((instance->*(epCallback[EP(num) + 1 - 2]))()) {
-                        epComplete &= ~(1 << (EP(num) + 1));
-                    }
+                else
+                {
+                    setCompleted(idx, true);
+                    if ((instance->*(epCallback[endpoint - 2]))())
+                        setCompleted(idx, false);
                 }
             }
         }
@@ -532,17 +750,19 @@
     }
 
     // sleep interrupt
-    if (istat & 1<<4) {
+    if (istat & (1<<4))
+    {
         USB0->ISTAT |= USB_ISTAT_SLEEP_MASK;
+        connectState |= SUSPENDED;
         suspendStateChanged(1);
     }
 
     // error interrupt
-    if (istat & USB_ISTAT_ERROR_MASK) {
+    if (istat & USB_ISTAT_ERROR_MASK)
+    {
         USB0->ERRSTAT = 0xFF;
         USB0->ISTAT |= USB_ISTAT_ERROR_MASK;
     }
 }
 
-
 #endif