USB device stack, with KL25Z fixes for USB 3.0 hosts and sleep/resume interrupt handling

Dependents:   frdm_Slider_Keyboard idd_hw2_figlax_PanType idd_hw2_appachu_finger_chording idd_hw3_AngieWangAntonioDeLimaFernandesDanielLim_BladeSymphony ... more

Fork of USBDevice by mbed official

This is an overhauled version of the standard mbed USB device-side driver library, with bug fixes for KL25Z devices. It greatly improves reliability and stability of USB on the KL25Z, especially with devices using multiple endpoints concurrently.

I've had some nagging problems with the base mbed implementation for a long time, manifesting as occasional random disconnects that required rebooting the device. Recently (late 2015), I started implementing a USB device on the KL25Z that used multiple endpoints, and suddenly the nagging, occasional problems turned into frequent and predictable crashes. This forced me to delve into the USB stack and figure out what was really going on. Happily, the frequent crashes made it possible to track down and fix the problems. This new version is working very reliably in my testing - the random disconnects seem completely eradicated, even under very stressful conditions for the device.

Summary

  • Overall stability improvements
  • USB 3.0 host support
  • Stalled endpoint fixes
  • Sleep/resume notifications
  • Smaller memory footprint
  • General code cleanup

Update - 2/15/2016

My recent fixes introduced a new problem that made the initial connection fail most of the time on certain hosts. It's not clear if the common thread was a particular type of motherboard or USB chip set, or a specific version of Windows, or what, but several people ran into it. We tracked the problem down to the "stall" fixes in the earlier updates, which we now know weren't quite the right fixes after all. The latest update (2/15/2016) fixes this. It has new and improved "unstall" handling that so far works well with diverse hosts.

Race conditions and overall stability

The base mbed KL25Z implementation has a lot of problems with "race conditions" - timing problems that can happen when hardware interrupts occur at inopportune moments. The library shares a bunch of static variable data between interrupt handler context and regular application context. This isn't automatically a bad thing, but it does require careful coordination to make sure that the interrupt handler doesn't corrupt data that the other code was in the middle of updating when an interrupt occurs. The base mbed code, though, doesn't do any of the necessary coordination. This makes it kind of amazing that the base code worked at all for anyone, but I guess the interrupt rate is low enough in most applications that the glitch rate was below anyone's threshold to seriously investigate.

This overhaul adds the necessary coordination for the interrupt handlers to protect against these data corruptions. I think it's very solid now, and hopefully entirely free of the numerous race conditions in the old code. It's always hard to be certain that you've fixed every possible bug like this because they strike (effectively) at random, but I'm pretty confident: my test application was reliably able to trigger glitches in the base code in a matter of minutes, but the same application (with the overhauled library) now runs for days on end without dropping the connection.

Stalled endpoint fixes

USB has a standard way of handling communications errors called a "stall", which basically puts the connection into an error mode to let both sides know that they need to reset their internal states and sync up again. The original mbed version of the USB device library doesn't seem to have the necessary code to recover from this condition properly. The KL25Z hardware does some of the work, but it also seems to require the software to take some steps to "un-stall" the connection. (I keep saying "seems to" because the hardware reference material is very sketchy about all of this. Most of what I've figured out is from observing the device in action with a Windows host.) This new version adds code to do the necessary re-syncing and get the connection going again, automatically, and transparently to the user.

USB 3.0 Hosts

The original mbed code sometimes didn't work when connecting to hosts with USB 3.0 ports. This didn't affect every host, but it affected many of them. The common element seemed to be the Intel Haswell chip set on the host, but there may be other chip sets affected as well. In any case, the problem affected many PCs from the Windows 7 and 8 generation, as well as many Macs. It was possible to work around the problem by avoiding USB 3.0 ports - you could use a USB 2 port on the host, or plug a USB 2 hub between the host and device. But I wanted to just fix the problem and eliminate the need for such workarounds. This modified version of the library has such a fix, which so far has worked for everyone who's tried.

Sleep/resume notifications

This modified version also contains an innocuous change to the KL25Z USB HAL code to handle sleep and resume interrupts with calls to suspendStateChanged(). The original KL25Z code omitted these calls (and in fact didn't even enable the interrupts), but I think this was an unintentional oversight - the notifier function is part of the generic API, and other supported boards all implement it. I use this feature in my own application so that I can distinguish sleep mode from actual disconnects and handle the two conditions correctly.

Smaller memory footprint

The base mbed version of the code allocates twice as much memory for USB buffers as it really needed to. It looks like the original developers intended to implement the KL25Z USB hardware's built-in double-buffering mechanism, but they ultimately abandoned that effort. But they left in the double memory allocation. This version removes that and allocates only what's actually needed. The USB buffers aren't that big (128 bytes per endpoint), so this doesn't save a ton of memory, but even a little memory is pretty precious on this machine given that it only has 16K.

(I did look into adding the double-buffering support that the original developers abandoned, but after some experimentation I decided they were right to skip it. It just doesn't seem to mesh well with the design of the rest of the mbed USB code. I think it would take a major rewrite to make it work, and it doesn't seem worth the effort given that most applications don't need it - it would only benefit applications that are moving so much data through USB that they're pushing the limits of the CPU. And even for those, I think it would be a lot simpler to build a purely software-based buffer rotation mechanism.)

General code cleanup

The KL25Z HAL code in this version has greatly expanded commentary and a lot of general cleanup. Some of the hardware constants were given the wrong symbolic names (e.g., EVEN and ODD were reversed), and many were just missing (written as hard-coded numbers without explanation). I fixed the misnomers and added symbolic names for formerly anonymous numbers. Hopefully the next person who has to overhaul this code will at least have an easier time understanding what I thought I was doing!

Revision:
48:b225d025ca1d
Parent:
40:cd877d5c09ea
Child:
49:03527ce6840e
--- a/USBDevice/USBHAL_KL25Z.cpp	Wed Feb 03 22:51:03 2016 +0000
+++ b/USBDevice/USBHAL_KL25Z.cpp	Mon Feb 15 23:18:55 2016 +0000
@@ -50,6 +50,7 @@
 // static singleton instance pointer
 USBHAL * USBHAL::instance;
 
+
 // Convert physical endpoint number to register bit
 #define EP(endpoint) (1<<(endpoint))
 
@@ -77,7 +78,7 @@
 #define BD_DTS_MASK        (1<<3)       // DATA TOGGLE SENSING - hardware SIE checks for DATA0/DATA1 match during RX/TX
 #define BD_STALL_MASK      (1<<2)       // STALL - SIE issues STALL handshake in reply to any host access to endpoint
 
-// Endpoint direction
+// Endpoint direction (from DEVICE perspective)
 #define TX    1
 #define RX    0
 
@@ -109,6 +110,7 @@
     uint32_t  address;    // Addr
 } BDT;
 
+
 // There are:
 //    * 16 bidirectional logical endpoints -> 32 physical endpoints
 //    * 2 BDT entries per endpoint (EVEN/ODD) -> 64 BDT entries
@@ -123,6 +125,7 @@
 // Allocated size of each endpoint buffer
 size_t epMaxPacket[NUMBER_OF_PHYSICAL_ENDPOINTS];
 
+
 // SET ADDRESS mode tracking.  The address assignment has to be done in a
 // specific order and with specific timing defined by the USB setup protocol 
 // standards.  To get the sequencing right, we set a flag when we get the
@@ -145,7 +148,7 @@
 // endpoint's bit is at (1 << endpoint number).  A 1 bit signifies that
 // the last read or write has completed (and hasn't had its result 
 // consumed yet).
-static volatile int epComplete = 0;
+static volatile uint32_t epComplete = 0;
 
 static uint32_t frameNumber() 
 {
@@ -165,38 +168,36 @@
     MPU->CESR=0;
 #endif
     // fill in callback array
-    epCallback[0] = &USBHAL::EP0_OUT_callback;
-    epCallback[1] = &USBHAL::EP0_IN_callback;
-    epCallback[2] = &USBHAL::EP1_OUT_callback;
-    epCallback[3] = &USBHAL::EP1_IN_callback;
-    epCallback[4] = &USBHAL::EP2_OUT_callback;
-    epCallback[5] = &USBHAL::EP2_IN_callback;
-    epCallback[6] = &USBHAL::EP3_OUT_callback;
-    epCallback[7] = &USBHAL::EP3_IN_callback;
-    epCallback[8] = &USBHAL::EP4_OUT_callback;
-    epCallback[9] = &USBHAL::EP4_IN_callback;
-    epCallback[10] = &USBHAL::EP5_OUT_callback;
-    epCallback[11] = &USBHAL::EP5_IN_callback;
-    epCallback[12] = &USBHAL::EP6_OUT_callback;
-    epCallback[13] = &USBHAL::EP6_IN_callback;
-    epCallback[14] = &USBHAL::EP7_OUT_callback;
-    epCallback[15] = &USBHAL::EP7_IN_callback;
-    epCallback[16] = &USBHAL::EP8_OUT_callback;
-    epCallback[17] = &USBHAL::EP8_IN_callback;
-    epCallback[18] = &USBHAL::EP9_OUT_callback;
-    epCallback[19] = &USBHAL::EP9_IN_callback;
-    epCallback[20] = &USBHAL::EP10_OUT_callback;
-    epCallback[21] = &USBHAL::EP10_IN_callback;
-    epCallback[22] = &USBHAL::EP11_OUT_callback;
-    epCallback[23] = &USBHAL::EP11_IN_callback;
-    epCallback[24] = &USBHAL::EP12_OUT_callback;
-    epCallback[25] = &USBHAL::EP12_IN_callback;
-    epCallback[26] = &USBHAL::EP13_OUT_callback;
-    epCallback[27] = &USBHAL::EP13_IN_callback;
-    epCallback[28] = &USBHAL::EP14_OUT_callback;
-    epCallback[29] = &USBHAL::EP14_IN_callback;
-    epCallback[30] = &USBHAL::EP15_OUT_callback;
-    epCallback[31] = &USBHAL::EP15_IN_callback;
+    epCallback[0] = &USBHAL::EP1_OUT_callback;
+    epCallback[1] = &USBHAL::EP1_IN_callback;
+    epCallback[2] = &USBHAL::EP2_OUT_callback;
+    epCallback[3] = &USBHAL::EP2_IN_callback;
+    epCallback[4] = &USBHAL::EP3_OUT_callback;
+    epCallback[5] = &USBHAL::EP3_IN_callback;
+    epCallback[6] = &USBHAL::EP4_OUT_callback;
+    epCallback[7] = &USBHAL::EP4_IN_callback;
+    epCallback[8] = &USBHAL::EP5_OUT_callback;
+    epCallback[9] = &USBHAL::EP5_IN_callback;
+    epCallback[10] = &USBHAL::EP6_OUT_callback;
+    epCallback[11] = &USBHAL::EP6_IN_callback;
+    epCallback[12] = &USBHAL::EP7_OUT_callback;
+    epCallback[13] = &USBHAL::EP7_IN_callback;
+    epCallback[14] = &USBHAL::EP8_OUT_callback;
+    epCallback[15] = &USBHAL::EP8_IN_callback;
+    epCallback[16] = &USBHAL::EP9_OUT_callback;
+    epCallback[17] = &USBHAL::EP9_IN_callback;
+    epCallback[18] = &USBHAL::EP10_OUT_callback;
+    epCallback[19] = &USBHAL::EP10_IN_callback;
+    epCallback[20] = &USBHAL::EP11_OUT_callback;
+    epCallback[21] = &USBHAL::EP11_IN_callback;
+    epCallback[22] = &USBHAL::EP12_OUT_callback;
+    epCallback[23] = &USBHAL::EP12_IN_callback;
+    epCallback[24] = &USBHAL::EP13_OUT_callback;
+    epCallback[25] = &USBHAL::EP13_IN_callback;
+    epCallback[26] = &USBHAL::EP14_OUT_callback;
+    epCallback[27] = &USBHAL::EP14_IN_callback;
+    epCallback[28] = &USBHAL::EP15_OUT_callback;
+    epCallback[29] = &USBHAL::EP15_IN_callback;
 
     // choose usb src as PLL
     SIM->SOPT2 |= (SIM_SOPT2_USBSRC_MASK | SIM_SOPT2_PLLFLLSEL_MASK);
@@ -237,14 +238,14 @@
 }
 
 USBHAL::~USBHAL(void) 
-{ 
+{
 }
 
 void USBHAL::connect(void) 
 {
     // enable USB
     USB0->CTL |= USB_CTL_USBENSOFEN_MASK;
-
+    
     // Pull up enable
     USB0->CONTROL |= USB_CONTROL_DPPULLUPNONOTG_MASK;
 }
@@ -253,17 +254,18 @@
 {
     // disable USB
     USB0->CTL &= ~USB_CTL_USBENSOFEN_MASK;
-
+    
     // Pull up disable
     USB0->CONTROL &= ~USB_CONTROL_DPPULLUPNONOTG_MASK;
 
-    //Free buffers if required:
+    // Free buffers
     for (int i = 0 ; i < NUMBER_OF_PHYSICAL_ENDPOINTS ; i++) 
     {
         if (endpoint_buffer[i] != NULL)
         {
             free(endpoint_buffer[i]);
             endpoint_buffer[i] = NULL;
+            epMaxPacket[i] = 0;
         }
     }
 }
@@ -296,56 +298,60 @@
     // get the logical endpoint
     uint32_t log_endpoint = PHY_TO_LOG(endpoint);
     
-    // Constrain the requested packet size to the maximum for the endpoint type.
-    // Full Speed USB allows up to 1023 bytes for isochronous endpoints and 64 bytes
-    // for bulk and interrupt endpoints.
-    uint32_t realMaxPacket = ((flags & ISOCHRONOUS) ? 1023 : 64);
-    if (maxPacket > realMaxPacket)
-        maxPacket = realMaxPacket;
+    // Assume this is a bulk or interrupt endpoint.  For these, the hardware maximum
+    // packet size is 64 bytes, and we use packet handshaking.
+    uint32_t hwMaxPacket = 64;
+    uint32_t handshake_flag = USB_ENDPT_EPHSHK_MASK;
+    
+    // If it's to be an isochronous endpoint, the hardware maximum packet size
+    // increases to 1023 bytes, and we don't use handshaking.
+    if (flags & ISOCHRONOUS) 
+    {
+        handshake_flag = 0;
+        hwMaxPacket = 1023;
+    }
 
-    // Use the HANDSHAKE flag for non-isochronous endpoints.  Don't use handshaking
-    // for an iso endpoint, since this type of endpoint is for applications like
-    // audio and video streaming where it's preferable to ignore lost packets and
-    // just carry on with the latest data.
-    uint32_t ctlFlags = 0;
-    if (!(flags & ISOCHRONOUS))
-        ctlFlags |= USB_ENDPT_EPHSHK_MASK;
+    // limit the requested max packet size to the hardware limit
+    if (maxPacket > hwMaxPacket)
+        maxPacket = hwMaxPacket;
         
-    // figure the RX/TX based on the endpoint direction
-    ctlFlags |= (IN_EP(endpoint) ? USB_ENDPT_EPTXEN_MASK : USB_ENDPT_EPRXEN_MASK);
-
     ENTER_CRITICAL_SECTION
     {
-        // if we don't already have a buffer that's big enough, allocate a new one
+        // if the endpoint buffer hasn't been allocated yet or was previously
+        // allocated at a smaller size, allocate a new buffer        
         uint8_t *buf = endpoint_buffer[endpoint];
-        if (maxPacket > epMaxPacket[endpoint])
+        if (buf == NULL || epMaxPacket[endpoint] < maxPacket)
         {
-            // free any existing buffer
+            // free any previous buffer
             if (buf != 0)
                 free(buf);
-                
-            // allocate a new one
+    
+            // allocate at the new size
+            endpoint_buffer[endpoint] = buf = (uint8_t *)malloc(maxPacket);
+            
+            // set the new max packet size
             epMaxPacket[endpoint] = maxPacket;
-            endpoint_buffer[endpoint] = buf = (uint8_t *)malloc(maxPacket);
         }
         
-        // Set up the BDT entry.  Note that we disable the hardware double-buffering
-        // scheme, so we only use the EVEN buffer for the endpoint.
-        int idx = PEP_BDT_IDX(endpoint, EVEN);
-        bdt[idx].info = 0;
-        bdt[idx].address = (uint32_t)buf;
-        
-        // Set the endpoint flags.  Note that these bits are additive, since the
-        // endpoint register represents the logical endpoint, which is the combination
-        // of the physical IN and OUT endpoints.
-        USB0->ENDPOINT[log_endpoint].ENDPT |= ctlFlags;
-    
-        // If this is an OUT endpoint, queue the first read on the endpoint by
-        // handing ownership of the BDT to the SIE.
-        if (OUT_EP(endpoint))
+        // set the endpoint register flags and BDT entry
+        if (IN_EP(endpoint)) 
+        {
+            // IN endpt -> device to host (TX)
+            USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag | USB_ENDPT_EPTXEN_MASK;  // en TX (IN) tran
+            bdt[EP_BDT_IDX(log_endpoint, TX, EVEN)].address = (uint32_t) buf;
+            bdt[EP_BDT_IDX(log_endpoint, TX, ODD )].address = 0;
+        }
+        else 
         {
-            bdt[idx].byte_count = maxPacket;
-            bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
+            // OUT endpt -> host to device (RX)
+            USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag | USB_ENDPT_EPRXEN_MASK;  // en RX (OUT) tran.
+            bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].address = (uint32_t) buf;
+            bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].address = 0;
+            
+            // set up the first read
+            bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].byte_count = maxPacket;
+            bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].info       = BD_OWN_MASK | BD_DTS_MASK;
+            bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].info       = 0;
         }
     
         // Set DATA1 on the endpoint.  For RX endpoints, we just queued up our first
@@ -402,7 +408,6 @@
 
 void USBHAL::EP0stall(void) 
 {
-    printd("EP0 stall!\r\n");
     stallEndpoint(EP0OUT);
 }
 
@@ -425,42 +430,42 @@
     uint32_t log_endpoint = PHY_TO_LOG(endpoint);
     
     // get the mode - it's isochronous if it doesn't have the handshake flag
-    bool iso = !(USB0->ENDPOINT[log_endpoint].ENDPT & USB_ENDPT_EPHSHK_MASK);
+    bool iso = (USB0->ENDPOINT[log_endpoint].ENDPT & USB_ENDPT_EPHSHK_MASK) == 0;
     
     // get the BDT index
     int idx = EP_BDT_IDX(log_endpoint, RX, 0);
         
-    // If the "complete" flag isn't set, the read is still pending in the SIE.
-    // This doesn't apply the isochronous endpoints, since we don't get TOKDNE
-    // interrupts on those (we use the SOF signal instead).  It also doesn't
-    // apply to endpoint 0, since that doesn't use the epComplete mechanism
-    // at all (necessary because we handle all transactions on this endpoint
-    //  in IRQ context).  For EP0, just make sure the hardware doesn't still
-    // own the BDT - if it does, the last read hasn't completed yet.
+    // Check to see if the endpoint is ready to read
     if (log_endpoint == 0)
     {
         // control endpoint - just make sure we own the BDT
         if (bdt[idx].info & BD_OWN_MASK)
             return EP_PENDING;
     }
-    else if (!iso && !(epComplete & EP(endpoint)))
-        return EP_PENDING;
+    else
+    {
+        // If it's not isochronous, check to see if we've received data, and
+        // return PENDING if not.  Isochronous endpoints don't use the TOKNE 
+        // interrupt (they use SOF instead), so the 'complete' flag doesn't
+        // apply if it's an iso endpoint.
+        if (!iso && !(epComplete & EP(endpoint)))
+            return EP_PENDING;
+    }
     
-    // get the buffer
-    uint8_t *ep_buf = endpoint_buffer[endpoint];
-
     ENTER_CRITICAL_SECTION
     {
-        // get the packet size from the BDT
+        // note if we have a SETUP token
+        bool setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
+    
+        // get the received data buffer and size
+        uint8_t *ep_buf = endpoint_buffer[endpoint];
         uint32_t sz  = bdt[idx].byte_count;
     
-        // note if it's a SETUP token    
-        bool setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
-    
-        // copy the data
-        memcpy(buffer, ep_buf, sz);
+        // copy the data from the hardware receive buffer to the caller's buffer
         *bytesRead = sz;
-
+        for (uint32_t n = 0 ; n < sz ; n++)
+            buffer[n] = ep_buf[n];
+        
         // Figure the DATA0/DATA1 bit for the next packet received on this
         // endpoint.  The bit normally toggles on each packet, but it's
         // special for SETUP packets on endpoint 0.  The next OUT packet
@@ -468,20 +473,20 @@
         // if the SETUP packet was also DATA0.
         if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1)) 
         {
-            if (setup && (buffer[6] == 0))  // if no setup data stage,
-                Data1 &= ~1UL;              // set DATA0
+            if (setup && (buffer[6] == 0))  // if SETUP with no data stage,
+                Data1 &= ~1UL;              // the next packet is always DATA0
             else
                 Data1 ^= (1 << endpoint);   // otherwise just toggle the last bit
         }
-
-        // set up the BDT entry to receive the next packet, and hand it to the SIE to fill    
+    
+        // set up the BDT entry to receive the next packet, and hand it to the SIE
         bdt[idx].byte_count = epMaxPacket[endpoint];
-        bdt[idx].info = BD_DTS_MASK | BD_OWN_MASK | ((Data1 >> endpoint) << 6);
+        bdt[idx].info = BD_DTS_MASK | BD_OWN_MASK | (((Data1 >> endpoint) & 1) << 6);
     
-        // clear the SUSPEND TOKEN BUSY flag to allow the SIE to continue processing tokens
+        // clear the SUSPEND TOKEN BUSY flag to allow token processing to continue
         USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
     
-        // clear the completion flag
+        // clear the 'completed' flag - we're now awaiting the next packet
         epComplete &= ~EP(endpoint);
     }
     EXIT_CRITICAL_SECTION
@@ -499,40 +504,48 @@
     // get the BDT index
     int idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, 0);
     
-    // get the buffer pointer
-    uint8_t *buf = endpoint_buffer[endpoint];
-
     ENTER_CRITICAL_SECTION
     {
+        // get the endpoint buffer
+        uint8_t *ep_buf = endpoint_buffer[endpoint];
+    
         // copy the data to the hardware buffer
         bdt[idx].byte_count = size;
-        memcpy(buf, data, size);
-
-        // flip the DATA1 bit before sending    
+        for (uint32_t n = 0 ; n < size ; n++)
+            ep_buf[n] = data[n];
+        
+        // toggle DATA0/DATA1 before sending
         Data1 ^= (1 << endpoint);
 
-        // hand the BDT to the SIE hardware, and set the current DATA1 bit
-        bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | ((Data1 >> endpoint) << 6);
+        // hand the BDT to the SIE to do the send
+        bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
     }
     EXIT_CRITICAL_SECTION
-
-    // the operation is now pending
+    
+    // write is now pending in the hardware
     return EP_PENDING;
 }
 
 EP_STATUS USBHAL::endpointWriteResult(uint8_t endpoint) 
 {
+    // assume write is still pending
     EP_STATUS result = EP_PENDING;
-
+    
     ENTER_CRITICAL_SECTION
     {
-        if (epComplete & EP(endpoint)) {
+        // check the 'completed' flag - if set, the write is completed
+        if (epComplete & EP(endpoint)) 
+        {
+            // the result is COMPLETED
+            result = EP_COMPLETED;
+
+            // clear the 'completed' flag - this is consumed by fetching the result
             epComplete &= ~EP(endpoint);
-            result = EP_COMPLETED;
         }
     }
     EXIT_CRITICAL_SECTION
     
+    // return the result
     return result;
 }
 
@@ -543,13 +556,28 @@
 
 void USBHAL::unstallEndpoint(uint8_t endpoint) 
 {
-    printd("unstall endpoint %d %s\r\n", endpoint>>1,endpoint&1?"TX":"RX");
     ENTER_CRITICAL_SECTION
     {
+        // clear the stall bit in the endpoint register
         USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
+        
+        // take ownership of the BDT entry
         int idx = PEP_BDT_IDX(endpoint, 0);
         bdt[idx].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
-        Data1 &= ~(1 << endpoint);
+        
+        // if this is an RX endpoint, start a new read
+        if (OUT_EP(endpoint))
+        {
+            bdt[idx].byte_count = epMaxPacket[endpoint];
+            bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
+        }
+
+        // Reset Data1 for the endpoint - we need to set the bit to 1 for 
+        // either TX or RX, by the same logic as in realiseEndpoint()
+        Data1 |= (1 << endpoint);
+        
+        // clear the 'completed' bit for the endpoint
+        epComplete &= ~(1 << endpoint);
     }
     EXIT_CRITICAL_SECTION
 }
@@ -565,60 +593,6 @@
     // [TODO]
 }
 
-// Control endpoint OUT/SETUP callback.  Called from ISR context only.
-bool USBHAL::EP0_OUT_callback(void)
-{
-    // setup packet
-    int idx = PEP_BDT_IDX(EP0OUT, EVEN);
-    if (TOK_PID(idx) == SETUP_TOKEN) 
-    {
-        // SETUP packet on EP0
-        
-        // Set DATA1 on Control IN endpoint for next packet (recall that we
-        // toggle the bit before a send, so clearing the bit sets DATA1 for
-        // the next send).  If there's a data IN stage for the SETUP packet,
-        // it must always be DATA1, regardless of the prior state of the IN
-        // endpoint.
-        Data1 &= ~0x02;
-        
-        // make sure we own the IN enpdoint now, in preparation for the reply
-        bdt[PEP_BDT_IDX(EP0IN, EVEN)].info &= ~BD_OWN_MASK;
-        
-        // process the SETUP packet through the portable protocol code
-        EP0setupCallback();
-    } 
-    else 
-    {
-        // OUT packet on EP0 - process it through the protocol code
-        EP0out();
-    }
-    
-    // success
-    return true;
-}
-
-// Control endpoint IN packet handler.  This is only called from ISR context.
-bool USBHAL::EP0_IN_callback(void)
-{
-    // process the IN packet through the portable protocol code
-    EP0in();
-    
-    // If we have a SET ADDRESS command outstanding, put it into effect now.
-    // The USB spec requires an address change to be made immediately (within 
-    // 2ms) after the reply to the SET ADDRESS SETUP packet.  If the flag is
-    // set, it means that the EP0in() call above just sent the response, so
-    // now is the time to make the address change in the SIE hardware register.
-    if (set_addr == 1) 
-    {
-        USB0->ADDR = addr & 0x7F;
-        set_addr = 0;
-    }
-    
-    // success
-    return true;
-}
-
-
 void USBHAL::_usbisr(void) 
 {
     inIRQ = true;
@@ -635,16 +609,22 @@
     // reset interrupt
     if (istat & USB_ISTAT_USBRST_MASK) 
     {
-        // disable all endpt
-        for(i = 0 ; i < 16 ; i++)
+        // disable all endpoints
+        for (i = 0 ; i < 16 ; i++)
             USB0->ENDPOINT[i].ENDPT = 0x00;
 
         // enable control endpoint
         realiseEndpoint(EP0OUT, MAX_PACKET_SIZE_EP0, 0);
         realiseEndpoint(EP0IN, MAX_PACKET_SIZE_EP0, 0);
 
+        // reset DATA0/1 state
         Data1 = 0x55555555;
+    
+        // reset endpoint completion status
         epComplete = 0;
+        
+        // reset EVEN/ODD state (and keep it permanently on EVEN -
+        // this disables the hardware double-buffering system)
         USB0->CTL |=  USB_CTL_ODDRST_MASK;
 
         USB0->ISTAT   =  0xFF;  // clear all interrupt status flags
@@ -657,8 +637,8 @@
         
         // we're not suspended
         suspendStateChanged(0);
-        
-        // return now - do no more processing on a RESET interrupt
+
+        // do ONLY the reset processing on a RESET interrupt
         return;
     }
 
@@ -683,38 +663,95 @@
         // if the control endpoint (EP 0) is stalled, unstall it
         if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK)
         {
-            unstallEndpoint(EP0OUT);
-            unstallEndpoint(EP0IN);
+            // clear the stall bit in the endpoint register
+            USB0->ENDPOINT[0].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
+        
+            // take ownership of the RX and TX BDTs
+            bdt[EP_BDT_IDX(0, RX, EVEN)].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
+            bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
+            bdt[EP_BDT_IDX(0, RX, EVEN)].byte_count = MAX_PACKET_SIZE_EP0;
+            bdt[EP_BDT_IDX(0, TX, EVEN)].byte_count = MAX_PACKET_SIZE_EP0;
+
+            // start a new read on EP0OUT
+            bdt[EP_BDT_IDX(0, RX, EVEN)].info = BD_OWN_MASK | BD_DTS_MASK;
+
+            // reset the DATA0/1 bit to 1 on EP0IN and EP0OUT, by the same 
+            // logic as in realiseEndpoint()            
+            Data1 |= 0x03;
         }
         
-        // clear the SUSPEND flag to allow token processing to continue
+        // clear the busy-suspend bit to resume token processing
         USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
+        
+        // clear the interrupt status bit for STALL
         USB0->ISTAT = USB_ISTAT_STALL_MASK;
     }
 
     // token interrupt
     if (istat & USB_ISTAT_TOKDNE_MASK) 
     {
+        // get the endpoint information from the status register
         uint32_t num  = (USB0->STAT >> 4) & 0x0F;
         uint32_t dir  = (USB0->STAT >> 3) & 0x01;
         int endpoint = (num << 1) | dir;
-        // uint32_t ev_odd = (USB0->STAT >> 2) & 0x01;  // we only use EVEN buffers, so this is always 0
+        uint32_t ev_odd = (USB0->STAT >> 2) & 0x01;
+
+        // check which endpoint we're working with
+        if (num == 0)
+        {
+            // Endpoint 0 requires special handling
+            uint32_t idx = EP_BDT_IDX(num, dir, ev_odd);
+            int pid = TOK_PID(idx);
+            if (pid == SETUP_TOKEN)
+            {
+                // SETUP packet - next IN (TX) packet must be DATA1 (confusingly,
+                // this means we must clear the Data1 bit, since we flip the bit
+                // before each send)
+                Data1 &= ~0x02;
+                
+                // forcibly take ownership of the EP0IN endpoints in case we have
+                // unfinished previous transmissions (the protocol state machine here
+                // assumes that we don't, so it's probably an error if this code
+                // actually does anything, but we make no provision for handling this)
+                bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~BD_OWN_MASK;
+                bdt[EP_BDT_IDX(0, TX, ODD )].info &= ~BD_OWN_MASK;
 
-        // set the Completed bit for the endpoint to indicate that we've
-        // finished this send/receive request
-        epComplete |= EP(endpoint);
-        
-        // Call the endpoint packet callback.  If that returns true, it means
-        // that the callback handled the packet.  That consumes the packet, so
-        // clear the Completed flag to indicate that we're on to the next
-        // transaction on the endpoint.
-        if ((instance->*(epCallback[endpoint]))())
-            epComplete &= ~EP(endpoint);
+                // handle the EP0 SETUP event in the generic protocol layer
+                EP0setupCallback();
+            } 
+            else if (pid == OUT_TOKEN)
+            {
+                // OUT packet on EP0
+                EP0out();
+            }
+            else if (pid == IN_TOKEN)
+            {
+                // IN packet on EP0
+                EP0in();
+                
+                // Special case: if the 'set address' flag is set, it means that the
+                // host just sent us our bus address.  We must put this into effect
+                // in the hardware SIE *after* sending the reply, which we just did
+                // above.  So it's now time!
+                if (set_addr == 1) {
+                    USB0->ADDR = addr & 0x7F;
+                    set_addr = 0;
+                }
+            }
+        }
+        else
+        {
+            // For all other endpoints, note the read/write completion in the flags
+            epComplete |= EP(endpoint);
+            
+            // call the endpoint token callback; if that handles the token, it consumes
+            // the 'completed' status, so clear that flag again
+            if ((instance->*(epCallback[endpoint - 2]))()) {
+                epComplete &= ~EP(endpoint);
+            }
+        }
 
-        // allow token processing to resume
-        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
-        
-        // reset the TOKDNE interrupt status flag
+        // clear the TOKDNE interrupt status bit
         USB0->ISTAT = USB_ISTAT_TOKDNE_MASK;
     }