USB device stack, with KL25Z fixes for USB 3.0 hosts and sleep/resume interrupt handling

Fork of USBDevice by mbed official

This is an overhauled version of the standard mbed USB device-side driver library, with bug fixes for KL25Z devices. It greatly improves reliability and stability of USB on the KL25Z, especially with devices using multiple endpoints concurrently.

I've had some nagging problems with the base mbed implementation for a long time, manifesting as occasional random disconnects that required rebooting the device. Recently (late 2015), I started implementing a USB device on the KL25Z that used multiple endpoints, and suddenly the nagging, occasional problems turned into frequent and predictable crashes. This forced me to delve into the USB stack and figure out what was really going on. Happily, the frequent crashes made it possible to track down and fix the problems. This new version is working very reliably in my testing - the random disconnects seem completely eradicated, even under very stressful conditions for the device.

Summary

  • Overall stability improvements
  • USB 3.0 host support
  • Stalled endpoint fixes
  • Sleep/resume notifications
  • Smaller memory footprint
  • General code cleanup

Update - 2/15/2016

My recent fixes introduced a new problem that made the initial connection fail most of the time on certain hosts. It's not clear if the common thread was a particular type of motherboard or USB chip set, or a specific version of Windows, or what, but several people ran into it. We tracked the problem down to the "stall" fixes in the earlier updates, which we now know weren't quite the right fixes after all. The latest update (2/15/2016) fixes this. It has new and improved "unstall" handling that so far works well with diverse hosts.

Race conditions and overall stability

The base mbed KL25Z implementation has a lot of problems with "race conditions" - timing problems that can happen when hardware interrupts occur at inopportune moments. The library shares a bunch of static variable data between interrupt handler context and regular application context. This isn't automatically a bad thing, but it does require careful coordination to make sure that the interrupt handler doesn't corrupt data that the other code was in the middle of updating when an interrupt occurs. The base mbed code, though, doesn't do any of the necessary coordination. This makes it kind of amazing that the base code worked at all for anyone, but I guess the interrupt rate is low enough in most applications that the glitch rate was below anyone's threshold to seriously investigate.

This overhaul adds the necessary coordination for the interrupt handlers to protect against these data corruptions. I think it's very solid now, and hopefully entirely free of the numerous race conditions in the old code. It's always hard to be certain that you've fixed every possible bug like this because they strike (effectively) at random, but I'm pretty confident: my test application was reliably able to trigger glitches in the base code in a matter of minutes, but the same application (with the overhauled library) now runs for days on end without dropping the connection.
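
As a rough illustration of the kind of coordination involved, here is a minimal host-side sketch. It is modeled on the ENTER_CRITICAL_SECTION / EXIT_CRITICAL_SECTION macros in USBHAL_KL25Z.cpp, which mask the USB interrupt unless the code is already running inside the handler. The guard class, the stub functions, and the two helper functions are hypothetical names made up for this example; on the real device the stubs would presumably be NVIC_DisableIRQ(USB0_IRQn) and NVIC_EnableIRQ(USB0_IRQn).

```cpp
#include <cstdint>

// Host-side stubs so the sketch compiles off-target (assumption: on the
// KL25Z these would be NVIC_DisableIRQ(USB0_IRQn) / NVIC_EnableIRQ(USB0_IRQn)).
static bool usbIrqEnabled = true;
static void disableUsbIrq() { usbIrqEnabled = false; }
static void enableUsbIrq()  { usbIrqEnabled = true; }

// State shared between the interrupt handler and application code.
static volatile bool inIRQ = false;
static volatile uint32_t epComplete = 0;

// Hypothetical RAII guard: masks the USB interrupt for the lifetime of the
// object, unless we are already running inside the handler.
class UsbCriticalSection {
public:
    UsbCriticalSection() : fromIrq(inIRQ) { if (!fromIrq) disableUsbIrq(); }
    ~UsbCriticalSection() { if (!fromIrq) enableUsbIrq(); }
    UsbCriticalSection(const UsbCriticalSection &) = delete;
    UsbCriticalSection &operator=(const UsbCriticalSection &) = delete;
private:
    const bool fromIrq;
};

// Application context: test and clear an endpoint's completion bit.
// Without the guard, an interrupt landing between the read and the
// write-back could have its update silently overwritten.
bool consumeCompletion(unsigned endpoint)
{
    UsbCriticalSection guard;
    const uint32_t mask = static_cast<uint32_t>(1) << endpoint;
    const bool done = (epComplete & mask) != 0;
    epComplete = epComplete & ~mask;
    return done;
}

// Simulated interrupt context: set an endpoint's completion bit.
void markCompleteFromIsr(unsigned endpoint)
{
    inIRQ = true;
    {
        UsbCriticalSection guard;   // no-op here: already in the handler
        epComplete = epComplete | (static_cast<uint32_t>(1) << endpoint);
    }
    inIRQ = false;
}
```

The point of the guard is only that every read-modify-write of the shared bit vectors happens with the USB interrupt masked. The actual library uses macros rather than a class, so treat this as a sketch of the idea, not a copy of the implementation.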

Stalled endpoint fixes

USB has a standard way of handling communications errors called a "stall", which basically puts the connection into an error mode to let both sides know that they need to reset their internal states and sync up again. The original mbed version of the USB device library doesn't seem to have the necessary code to recover from this condition properly. The KL25Z hardware does some of the work, but it also seems to require the software to take some steps to "un-stall" the connection. (I keep saying "seems to" because the hardware reference material is very sketchy about all of this. Most of what I've figured out is from observing the device in action with a Windows host.) This new version adds code to do the necessary re-syncing and get the connection going again, automatically, and transparently to the user.
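
The re-syncing steps can be pictured with a small host-side model. This is only a sketch that loosely follows the order of operations in unstallEndpoint() in USBHAL_KL25Z.cpp; the EndpointModel struct, its fields, and unstallModel() are hypothetical stand-ins for the hardware registers and buffer descriptors, not part of the library.

```cpp
#include <cstdint>

// Hypothetical model of the per-endpoint state involved in a stall.
struct EndpointModel {
    bool stalled;        // stands in for the EPSTALL bit in the endpoint register
    bool sieOwnsBuffer;  // stands in for the OWN bit in the buffer descriptor
    bool isOut;          // true for a receive (OUT) endpoint
    uint16_t byteCount;  // buffer descriptor byte count
    uint16_t maxPacket;  // endpoint's maximum packet size
};

// Bit vectors indexed by physical endpoint number, as in the library.
static uint32_t data1Bits = 0x55555555;
static uint32_t completeBits = 0;

// Bring a stalled endpoint back to a known starting state.
void unstallModel(EndpointModel &ep, unsigned endpoint)
{
    const uint32_t mask = static_cast<uint32_t>(1) << endpoint;

    // 1. clear the stall condition
    ep.stalled = false;

    // 2. take the buffer back from the hardware
    ep.sieOwnsBuffer = false;

    // 3. for a receive endpoint, re-arm the buffer for a fresh read
    if (ep.isOut) {
        ep.byteCount = ep.maxPacket;
        ep.sieOwnsBuffer = true;
    }

    // 4. reset the DATA0/DATA1 tracking bit to its initial value
    data1Bits |= mask;

    // 5. forget any stale completion from before the stall
    completeBits &= ~mask;
}
```

On the real device these steps run with the USB interrupt masked, and only for endpoints that have been realised.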

USB 3.0 Hosts

The original mbed code sometimes didn't work when connecting to hosts with USB 3.0 ports. This didn't affect every host, but it affected many of them. The common element seemed to be the Intel Haswell chip set on the host, but other chip sets may be affected as well. In any case, the problem affected many PCs from the Windows 7 and 8 generation, as well as many Macs. It was possible to work around the problem by avoiding USB 3.0 ports - you could use a USB 2 port on the host, or plug a USB 2 hub between the host and device. But I wanted to fix the problem itself and eliminate the need for such workarounds. This modified version of the library has such a fix, which so far has worked for everyone who's tried it.

Sleep/resume notifications

This modified version also contains an innocuous change to the KL25Z USB HAL code to handle sleep and resume interrupts with calls to suspendStateChanged(). The original KL25Z code omitted these calls (and in fact didn't even enable the interrupts), but I think this was an unintentional oversight - the notifier function is part of the generic API, and other supported boards all implement it. I use this feature in my own application so that I can distinguish sleep mode from actual disconnects and handle the two conditions correctly.
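
Here is a minimal sketch of how an application might use the notification. The base class below is a host-side stand-in so the example compiles on its own; in a real project you would derive from one of the library's device classes instead. The class names and isSuspended() are hypothetical, and the sketch assumes that a nonzero argument means "suspended", which is consistent with the suspendStateChanged(0) calls in the reset path.

```cpp
// Stand-in for the library base class (assumption: the real class declares
// suspendStateChanged() as a virtual notifier with this signature).
class UsbDeviceStandIn {
public:
    virtual ~UsbDeviceStandIn() {}
    virtual void suspendStateChanged(unsigned int /*suspended*/) {}
};

// Hypothetical application device that remembers the suspend state.
class MyDevice : public UsbDeviceStandIn {
public:
    MyDevice() : suspended_(false) {}

    // Called by the HAL from interrupt context, so keep it short:
    // just record the new state for the main loop to act on.
    virtual void suspendStateChanged(unsigned int suspended)
    {
        suspended_ = (suspended != 0);
    }

    // Polled from the main loop, e.g. to pause reports while the host sleeps.
    bool isSuspended() const { return suspended_; }

private:
    volatile bool suspended_;
};
```

A main loop could check isSuspended() before sending and treat a suspended link as "host asleep, keep state" rather than as a disconnect that needs a full reconnect.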

Smaller memory footprint

The base mbed version of the code allocates twice as much memory for USB buffers as it really needs. It looks like the original developers intended to implement the KL25Z USB hardware's built-in double-buffering mechanism, but ultimately abandoned that effort while leaving the double memory allocation in place. This version removes that and allocates only what's actually needed. The USB buffers aren't that big (128 bytes per endpoint), so this doesn't save a ton of memory, but even a little memory is pretty precious on this machine given that it only has 16K of RAM.

(I did look into adding the double-buffering support that the original developers abandoned, but after some experimentation I decided they were right to skip it. It just doesn't seem to mesh well with the design of the rest of the mbed USB code. I think it would take a major rewrite to make it work, and it doesn't seem worth the effort given that most applications don't need it - it would only benefit applications that are moving so much data through USB that they're pushing the limits of the CPU. And even for those, I think it would be a lot simpler to build a purely software-based buffer rotation mechanism.)

General code cleanup

The KL25Z HAL code in this version has greatly expanded commentary and a lot of general cleanup. Some of the hardware constants were given the wrong symbolic names (e.g., EVEN and ODD were reversed), and many were just missing (written as hard-coded numbers without explanation). I fixed the misnomers and added symbolic names for formerly anonymous numbers. Hopefully the next person who has to overhaul this code will at least have an easier time understanding what I thought I was doing!

Revision:
50:946bc763c068
Parent:
49:03527ce6840e
Child:
51:666cc4fedd3f
--- a/USBDevice/USBHAL_KL25Z.cpp	Fri Feb 26 18:41:47 2016 +0000
+++ b/USBDevice/USBHAL_KL25Z.cpp	Wed Apr 27 01:50:32 2016 +0000
@@ -18,14 +18,7 @@
 
 #if defined(TARGET_KL25Z) | defined(TARGET_KL46Z) | defined(TARGET_K20D5M) | defined(TARGET_K64F)
 
-//#define DEBUG
-#ifdef DEBUG
-#define printd(fmt, ...) printf(fmt, __VA_ARGS__)
-#else
-#define printd(fmt, ...)
-#endif
-
-
+#include <stdarg.h>
 #include "USBHAL.h"
 
 // Critical section controls.  This module uses a bunch of static variables,
@@ -47,6 +40,102 @@
     if (!inIRQ) \
         NVIC_EnableIRQ(USB0_IRQn);
 
+//#define DEBUG_WITH_PRINTF
+// debug printf; does a regular printf() in debug mode, nothing in
+// normal mode.  Note that many of our routines are called in ISR
+// context, so printf should really never be used here.  But in
+// practice we can get away with it enough that it can be helpful
+// as a limited debugging tool.
+#ifdef DEBUG_WITH_PRINTF
+#define printd(fmt, ...) printf(fmt, __VA_ARGS__)
+#else
+#define printd(fmt, ...)
+#endif
+
+// Makeshift debug instrumentation.  This is a safer and better
+// alternative to printf() that gathers event information in a 
+// circular buffer for later use outside of interrupt context, such 
+// as printf() display at intervals in the main program loop.  
+//
+// Timing is critical to USB, so debug instrumentation is inherently 
+// problematic in that it can affect the timing and thereby change 
+// the behavior of what we're trying to debug.  Small timing changes
+// can create new errors that wouldn't be there otherwise, or even
+// accidentally fix the bug we're trying to find (e.g., by changing
+// the timing enough to avoid a race condition).  To minimize these 
+// effects, we use a small buffer and very terse event codes - 
+// generally one character per event.  That makes for a cryptic 
+// debug log, but it results in almost zero timing effects, allowing
+// us to see a more faithful version of the subject program.
+//
+// Note that the buffer size isn't critical to timing, because any
+// printf()-type display should always occur in regular (non-ISR)
+// context and thus won't have any significant effect on interrupt
+// timing or latency.  The buffer can be expanded if longer logs
+// would be helpful.  However, it is important to keep the individual
+// event messages short (a character or two in most cases), because
+// it takes time to move them into the buffer.  
+//#define DEBUG_WITH_EVENTS
+#ifdef DEBUG_WITH_EVENTS
+const int nevents = 64;  // MUST BE A POWER OF 2
+char events[nevents];
+char ewrite = 0, eread = 0;
+void HAL_DEBUG_EVENT(char c)
+{
+    events[ewrite] = c;
+    ewrite = (ewrite+1) & (nevents-1);
+    if (ewrite == eread)
+        eread = (eread+1) & (nevents-1);
+}
+void HAL_DEBUG_EVENT(char a, char b) { 
+    HAL_DEBUG_EVENT(a); HAL_DEBUG_EVENT(b); 
+}
+void HAL_DEBUG_EVENT(char a, char b, char c) { 
+    HAL_DEBUG_EVENT(a); HAL_DEBUG_EVENT(b); HAL_DEBUG_EVENT(c); 
+}
+void HAL_DEBUG_EVENT(const char *s) { 
+    while (*s) HAL_DEBUG_EVENT(*s++); 
+}
+void HAL_DEBUG_EVENTI(char c, int i) {
+    HAL_DEBUG_EVENT(c);
+    if (i >= 1000) HAL_DEBUG_EVENT(((i / 1000) % 10) + '0');
+    if (i >= 100) HAL_DEBUG_EVENT(((i / 100) % 10) + '0');
+    if (i >= 10) HAL_DEBUG_EVENT(((i / 10) % 10) + '0');
+    HAL_DEBUG_EVENT((i % 10) + '0');
+}
+void HAL_DEBUG_EVENTF(const char *fmt, ...) {
+    va_list va; 
+    va_start(va, fmt); 
+    char buf[64]; 
+    vsnprintf(buf, sizeof(buf), fmt, va);
+    va_end(va); 
+    HAL_DEBUG_EVENT(buf);
+}
+void HAL_DEBUG_PRINTEVENTS(const char *prefix)
+{
+    if (prefix != 0)
+        printf("%s ", prefix);
+    else
+        printf("ev: ");
+
+    char buf[nevents];
+    int i;
+    ENTER_CRITICAL_SECTION
+    {
+        for (i = 0 ; eread != ewrite ; eread = (eread+1) & (nevents - 1))
+            buf[i++] = events[eread];
+    }
+    EXIT_CRITICAL_SECTION
+    printf("%.*s\r\n", i, buf);
+}
+#else
+#define HAL_DEBUG_EVENT(...)   void(0)
+#define HAL_DEBUG_EVENTF(...)  void(0)
+#define HAL_DEBUG_EVENTI(...)  void(0)
+void HAL_DEBUG_PRINTEVENTS(const char *) { }
+#endif
+
+
 // static singleton instance pointer
 USBHAL * USBHAL::instance;
 
@@ -142,7 +231,7 @@
 // packet we sent (so the next packet will be the inverse).  For RX
 // endpoints, this is the bit value we expect for the NEXT packet.
 // (Yes, it's inconsistent.)
-static uint32_t Data1  = 0x55555555;
+static volatile uint32_t Data1  = 0x55555555;
 
 // Endpoint read/write completion flags, packed as a bit vector.  Each 
 // endpoint's bit is at (1 << endpoint number).  A 1 bit signifies that
@@ -150,6 +239,12 @@
 // consumed yet).
 static volatile uint32_t epComplete = 0;
 
+// Endpoint Realised flags.  We set these flags (arranged in the usual 
+// endpoint bit vector format) when endpoints are realised, so that
+// read/write operations will know if it's okay to proceed.  The
+// control endpoint (EP0) is always realised in both directions.
+static volatile uint32_t epRealised = 0x03;
+
 static uint32_t frameNumber() 
 {
     return((USB0->FRMNUML | (USB0->FRMNUMH << 8)) & 0x07FF);
@@ -160,7 +255,63 @@
     return 0;
 }
 
-USBHAL::USBHAL(void) {
+// Enabled interrupts at startup or reset:
+//   TOKDN  - token done
+//   SOFTOK - start-of-frame token
+//   ERROR  - error
+//   SLEEP  - sleep (inactivity on bus)
+//   RST    - bus reset
+//
+// Note that we don't enable RESUME (resume from suspend mode), per 
+// the hardware reference manual ("When not in suspend mode this 
+// interrupt must be disabled").  We also don't enable ATTACH, which
+// is only meaningful in host mode.
+#define BUS_RESET_INTERRUPTS \
+    USB_INTEN_TOKDNEEN_MASK \
+    | USB_INTEN_STALLEN_MASK \
+    | USB_INTEN_SOFTOKEN_MASK \
+    | USB_INTEN_ERROREN_MASK \
+    | USB_INTEN_SLEEPEN_MASK \
+    | USB_INTEN_USBRSTEN_MASK
+
+// Do a low-level reset on the USB hardware module.  This lets the 
+// device software initiate a hard reset.
+static void resetSIE(void)
+{
+    // set the reset bit in the transceiver control register,
+    // then wait for it to clear
+    USB0->USBTRC0 |= USB_USBTRC0_USBRESET_MASK;
+    while (USB0->USBTRC0 & USB_USBTRC0_USBRESET_MASK);
+    
+    // clear BDT entries
+    for (int i = 0 ; i < sizeof(bdt)/sizeof(bdt[0]) ; ++i)
+    {
+        bdt[i].info = 0;
+        bdt[i].byte_count = 0;
+    }
+
+    // Set BDT Base Register
+    USB0->BDTPAGE1 = (uint8_t)((uint32_t)bdt>>8);
+    USB0->BDTPAGE2 = (uint8_t)((uint32_t)bdt>>16);
+    USB0->BDTPAGE3 = (uint8_t)((uint32_t)bdt>>24);
+
+    // Clear interrupt flag
+    USB0->ISTAT = 0xff;
+
+    // Enable the initial set of interrupts
+    USB0->INTEN = BUS_RESET_INTERRUPTS;
+
+    // Disable weak pull downs, and turn off suspend mode
+    USB0->USBCTRL = 0;
+    //$$$USB0->USBCTRL &= ~(USB_USBCTRL_PDE_MASK | USB_USBCTRL_SUSP_MASK);
+
+    // set the "reserved" bit in the transceiver control register
+    // (hw ref: "software must set this bit to 1")
+    USB0->USBTRC0 |= 0x40;
+}
+
+USBHAL::USBHAL(void) 
+{
     // Disable IRQ
     NVIC_DisableIRQ(USB0_IRQn);
 
@@ -212,33 +363,21 @@
 
     // USB Module Configuration
     // Reset USB Module
-    USB0->USBTRC0 |= USB_USBTRC0_USBRESET_MASK;
-    while(USB0->USBTRC0 & USB_USBTRC0_USBRESET_MASK);
-
-    // Set BDT Base Register
-    USB0->BDTPAGE1 = (uint8_t)((uint32_t)bdt>>8);
-    USB0->BDTPAGE2 = (uint8_t)((uint32_t)bdt>>16);
-    USB0->BDTPAGE3 = (uint8_t)((uint32_t)bdt>>24);
-
-    // Clear interrupt flag
-    USB0->ISTAT = 0xff;
-
-    // USB Interrupt Enablers
-    USB0->INTEN |= USB_INTEN_TOKDNEEN_MASK |
-                   USB_INTEN_SOFTOKEN_MASK |
-                   USB_INTEN_ERROREN_MASK  |
-                   USB_INTEN_SLEEPEN_MASK |
-                   USB_INTEN_RESUMEEN_MASK |
-                   USB_INTEN_USBRSTEN_MASK;
-
-    // Disable weak pull downs
-    USB0->USBCTRL &= ~(USB_USBCTRL_PDE_MASK | USB_USBCTRL_SUSP_MASK);
-
-    USB0->USBTRC0 |= 0x40;
+    resetSIE();
 }
 
 USBHAL::~USBHAL(void) 
 {
+    // Free buffers
+    for (int i = 0 ; i < NUMBER_OF_PHYSICAL_ENDPOINTS ; i++) 
+    {
+        if (endpoint_buffer[i] != NULL)
+        {
+            delete [] endpoint_buffer[i];
+            endpoint_buffer[i] = NULL;
+            epMaxPacket[i] = 0;
+        }
+    }
 }
 
 void USBHAL::connect(void) 
@@ -257,17 +396,15 @@
     
     // Pull up disable
     USB0->CONTROL &= ~USB_CONTROL_DPPULLUPNONOTG_MASK;
+}
 
-    // Free buffers
-    for (int i = 0 ; i < NUMBER_OF_PHYSICAL_ENDPOINTS ; i++) 
-    {
-        if (endpoint_buffer[i] != NULL)
-        {
-            delete [] endpoint_buffer[i];
-            endpoint_buffer[i] = NULL;
-            epMaxPacket[i] = 0;
-        }
-    }
+void USBHAL::hardReset(void)
+{
+    // reset the SIE module
+    resetSIE();
+    
+    // do the internal reset work
+    internalReset();
 }
 
 void USBHAL::configureDevice(void) 
@@ -307,8 +444,8 @@
     // increases to 1023 bytes, and we don't use handshaking.
     if (flags & ISOCHRONOUS) 
     {
+        hwMaxPacket = 1023;
         handshake_flag = 0;
-        hwMaxPacket = 1023;
     }
 
     // limit the requested max packet size to the hardware limit
@@ -360,6 +497,9 @@
         // (counterintuitively) we need to set the DATA1 bit now to send DATA0 in the
         // next packet.  So in either case, we want DATA1 initially.
         Data1 |= (1 << endpoint);
+        
+        // mark the endpoint as realised
+        epRealised |= (1 << endpoint);
     }
     EXIT_CRITICAL_SECTION
     
@@ -374,6 +514,7 @@
     endpointReadResult(EP0OUT, buffer, &sz);
 }
 
+// Start reading the data stage of a SETUP transaction on EP0
 void USBHAL::EP0readStage(void) 
 {
     if (!(bdt[0].info & BD_OWN_MASK))
@@ -384,20 +525,29 @@
     }
 }
 
+// Read an OUT packet on EP0
 void USBHAL::EP0read(void) 
 {
     if (!(bdt[0].info & BD_OWN_MASK))
+    {
+        Data1 &= ~1UL;
         bdt[0].byte_count = MAX_PACKET_SIZE_EP0;
+        bdt[0].info = (BD_DTS_MASK | BD_OWN_MASK);
+    }
 }
 
 uint32_t USBHAL::EP0getReadResult(uint8_t *buffer) 
 {
     uint32_t sz;
-    endpointReadResult(EP0OUT, buffer, &sz);
-    return sz;
+    if (endpointReadResult(EP0OUT, buffer, &sz) == EP_COMPLETED) {
+        return sz;
+    }
+    else {
+        return 0;
+    }
 }
 
-void USBHAL::EP0write(const uint8_t *buffer, uint32_t size) 
+void USBHAL::EP0write(const volatile uint8_t *buffer, uint32_t size) 
 {
     endpointWrite(EP0IN, buffer, size);
 }
@@ -408,7 +558,7 @@
 
 void USBHAL::EP0stall(void) 
 {
-    stallEndpoint(EP0OUT);
+    // $$$ stallEndpoint(EP0OUT);
 }
 
 EP_STATUS USBHAL::endpointRead(uint8_t endpoint, uint32_t maximumSize) 
@@ -422,7 +572,7 @@
 
 EP_STATUS USBHAL::endpointReadResult(uint8_t endpoint, uint8_t *buffer, uint32_t *bytesRead) 
 {
-    // validate the endpoint number and direction
+    // validate the endpoint number and direction (the realised check is made below)
     if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS || !OUT_EP(endpoint))
         return EP_INVALID;
 
@@ -451,51 +601,62 @@
         if (!iso && !(epComplete & EP(endpoint)))
             return EP_PENDING;
     }
-    
+
+    EP_STATUS result = EP_INVALID;    
     ENTER_CRITICAL_SECTION
     {
-        // note if we have a SETUP token
-        bool setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
-    
-        // get the received data buffer and size
-        uint8_t *ep_buf = endpoint_buffer[endpoint];
-        uint32_t sz  = bdt[idx].byte_count;
-    
-        // copy the data from the hardware receive buffer to the caller's buffer
-        *bytesRead = sz;
-        for (uint32_t n = 0 ; n < sz ; n++)
-            buffer[n] = ep_buf[n];
+        // proceed only if the endpoint has been realised
+        if (epRealised & EP(endpoint))
+        {
+            // note if we have a SETUP token
+            bool setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
+        
+            // get the received data buffer and size
+            uint8_t *ep_buf = endpoint_buffer[endpoint];
+            uint32_t sz = bdt[idx].byte_count;
         
-        // Figure the DATA0/DATA1 bit for the next packet received on this
-        // endpoint.  The bit normally toggles on each packet, but it's
-        // special for SETUP packets on endpoint 0.  The next OUT packet
-        // after a SETUP packet with no data stage is always DATA0, even
-        // if the SETUP packet was also DATA0.
-        if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1)) 
-        {
-            if (setup && (buffer[6] == 0))  // if SETUP with no data stage,
-                Data1 &= ~1UL;              // the next packet is always DATA0
-            else
-                Data1 ^= (1 << endpoint);   // otherwise just toggle the last bit
+            // copy the data from the hardware receive buffer to the caller's buffer
+            *bytesRead = sz;
+            for (uint32_t n = 0 ; n < sz ; n++)
+                buffer[n] = ep_buf[n];
+            
+            // Figure the DATA0/DATA1 bit for the next packet received on this
+            // endpoint.  The bit normally toggles on each packet, but it's
+            // special for SETUP packets on endpoint 0.  The next OUT packet
+            // after a SETUP packet with no data stage is always DATA0, even
+            // if the SETUP packet was also DATA0.
+            if (setup && (sz >= 7 && buffer[6] == 0)) {
+                // SETUP with no data stage -> next packet is always DATA0
+                Data1 &= ~1UL;
+            }
+            else {
+                // otherwise just toggle the last bit (assuming it matches our
+                // internal state - if not, we must be out of sync, so presumably
+                // *not* toggling our state will get us back in sync)
+                if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1))
+                    Data1 ^= (1 << endpoint);
+            }
+        
+            // set up the BDT entry to receive the next packet, and hand it to the SIE
+            bdt[idx].byte_count = epMaxPacket[endpoint];
+            bdt[idx].info = BD_DTS_MASK | BD_OWN_MASK | (((Data1 >> endpoint) & 1) << 6);
+
+            // clear the SUSPEND TOKEN BUSY flag to allow token processing to continue
+            USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
+        
+            // clear the 'completed' flag - we're now awaiting the next packet
+            epComplete &= ~EP(endpoint);
+            
+            // the read is now complete
+            result = EP_COMPLETED;
         }
-    
-        // set up the BDT entry to receive the next packet, and hand it to the SIE
-        bdt[idx].byte_count = epMaxPacket[endpoint];
-        bdt[idx].info = BD_DTS_MASK | BD_OWN_MASK | (((Data1 >> endpoint) & 1) << 6);
-    
-        // clear the SUSPEND TOKEN BUSY flag to allow token processing to continue
-        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
-    
-        // clear the 'completed' flag - we're now awaiting the next packet
-        epComplete &= ~EP(endpoint);
     }
     EXIT_CRITICAL_SECTION
         
-    // the read is completed
-    return EP_COMPLETED;
+    return result;
 }
 
-EP_STATUS USBHAL::endpointWrite(uint8_t endpoint, const uint8_t *data, uint32_t size) 
+EP_STATUS USBHAL::endpointWrite(uint8_t endpoint, const volatile uint8_t *data, uint32_t size) 
 {
     // validate the endpoint number and direction
     if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS || !IN_EP(endpoint))
@@ -504,26 +665,33 @@
     // get the BDT index
     int idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, 0);
     
+    EP_STATUS result = EP_INVALID;
     ENTER_CRITICAL_SECTION
     {
-        // get the endpoint buffer
-        uint8_t *ep_buf = endpoint_buffer[endpoint];
+        // proceed only if the endpoint has been realised and we own the BDT
+        if ((epRealised & EP(endpoint)) && !(bdt[idx].info & BD_OWN_MASK))
+        {
+            // get the endpoint buffer
+            uint8_t *ep_buf = endpoint_buffer[endpoint];
+        
+            // copy the data to the hardware buffer
+            bdt[idx].byte_count = size;
+            for (uint32_t n = 0 ; n < size ; n++)
+                ep_buf[n] = data[n];
+            
+            // toggle DATA0/DATA1 before sending
+            Data1 ^= (1 << endpoint);
     
-        // copy the data to the hardware buffer
-        bdt[idx].byte_count = size;
-        for (uint32_t n = 0 ; n < size ; n++)
-            ep_buf[n] = data[n];
-        
-        // toggle DATA0/DATA1 before sending
-        Data1 ^= (1 << endpoint);
+            // hand the BDT to the SIE to do the send
+            bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
 
-        // hand the BDT to the SIE to do the send
-        bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
+            // write is now pending in the hardware
+            result = EP_PENDING;
+        }
     }
     EXIT_CRITICAL_SECTION
     
-    // write is now pending in the hardware
-    return EP_PENDING;
+    return result;
 }
 
 EP_STATUS USBHAL::endpointWriteResult(uint8_t endpoint) 
@@ -533,8 +701,14 @@
     
     ENTER_CRITICAL_SECTION
     {
-        // check the 'completed' flag - if set, the write is completed
-        if (epComplete & EP(endpoint)) 
+        // If the endpoint isn't realised, the result is 'invalid'.  Otherwise,
+        // check the 'completed' flag: if set, the write is completed.
+        if (!(epRealised & EP(endpoint)))
+        {
+            // endpoint isn't realised - can't write to it
+            result = EP_INVALID;
+        }
+        else if (epComplete & EP(endpoint)) 
         {
             // the result is COMPLETED
             result = EP_COMPLETED;
@@ -551,117 +725,50 @@
 
 void USBHAL::stallEndpoint(uint8_t endpoint) 
 {
-    USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
+    ENTER_CRITICAL_SECTION
+    {
+        if (epRealised & EP(endpoint))
+            USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
+    }
+    EXIT_CRITICAL_SECTION
 }
 
 void USBHAL::unstallEndpoint(uint8_t endpoint) 
 {
     ENTER_CRITICAL_SECTION
     {
-        // clear the stall bit in the endpoint register
-        USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
-        
-        // take ownership of the BDT entry
-        int idx = PEP_BDT_IDX(endpoint, 0);
-        bdt[idx].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
-        
-        // if this is an RX endpoint, start a new read
-        if (OUT_EP(endpoint))
+        if (epRealised & EP(endpoint))
         {
-            bdt[idx].byte_count = epMaxPacket[endpoint];
-            bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
+            // clear the stall bit in the endpoint register
+            USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
+            
+            // take ownership of the BDT entry
+            int idx = PEP_BDT_IDX(endpoint, 0);
+            bdt[idx].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
+            
+            // if this is an RX endpoint, start a new read
+            if (OUT_EP(endpoint))
+            {
+                bdt[idx].byte_count = epMaxPacket[endpoint];
+                bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
+            }
+    
+            // Reset Data1 for the endpoint - we need to set the bit to 1 for 
+            // either TX or RX, by the same logic as in realiseEndpoint()
+            Data1 |= (1 << endpoint);
+            
+            // clear the 'completed' bit for the endpoint
+            epComplete &= ~(1 << endpoint);
         }
-
-        // Reset Data1 for the endpoint - we need to set the bit to 1 for 
-        // either TX or RX, by the same logic as in realiseEndpoint()
-        Data1 |= (1 << endpoint);
-        
-        // clear the 'completed' bit for the endpoint
-        epComplete &= ~(1 << endpoint);
     }
     EXIT_CRITICAL_SECTION
 }
 
-bool USBHAL::getEndpointStallState(uint8_t endpoint) 
-{
-    uint8_t stall = (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK);
-    return (stall) ? true : false;
-}
-
-void USBHAL::remoteWakeup(void) 
-{
-    // [TODO]
-}
-
-void USBHAL::_usbisr(void) 
+void USBHAL_KL25Z_unstall_EP0(bool force)
 {
-    inIRQ = true;
-    instance->usbisr();
-    inIRQ = false;
-}
-
-
-void USBHAL::usbisr(void) 
-{
-    uint8_t i;
-    uint8_t istat = USB0->ISTAT;
-
-    // reset interrupt
-    if (istat & USB_ISTAT_USBRST_MASK) 
+    ENTER_CRITICAL_SECTION
     {
-        // disable all endpoints
-        for (i = 0 ; i < 16 ; i++)
-            USB0->ENDPOINT[i].ENDPT = 0x00;
-
-        // enable control endpoint
-        realiseEndpoint(EP0OUT, MAX_PACKET_SIZE_EP0, 0);
-        realiseEndpoint(EP0IN, MAX_PACKET_SIZE_EP0, 0);
-
-        // reset DATA0/1 state
-        Data1 = 0x55555555;
-    
-        // reset endpoint completion status
-        epComplete = 0;
-        
-        // reset EVEN/ODD state (and keep it permanently on EVEN -
-        // this disables the hardware double-buffering system)
-        USB0->CTL |=  USB_CTL_ODDRST_MASK;
-
-        USB0->ISTAT   =  0xFF;  // clear all interrupt status flags
-        USB0->ERRSTAT =  0xFF;  // clear all error flags
-        USB0->ERREN   =  0xFF;  // enable error interrupt sources
-        USB0->ADDR    =  0x00;  // set default address
-        
-        // notify upper layers of the bus reset
-        busReset();
-        
-        // we're not suspended
-        suspendStateChanged(0);
-
-        // do ONLY the reset processing on a RESET interrupt
-        return;
-    }
-
-    // resume interrupt
-    if (istat & USB_ISTAT_RESUME_MASK) 
-    {
-        suspendStateChanged(0);
-        USB0->ISTAT = USB_ISTAT_RESUME_MASK;
-    }
-
-    // SOF interrupt
-    if (istat & USB_ISTAT_SOFTOK_MASK) 
-    {
-        // Read frame number and signal the SOF event to the callback
-        SOF(frameNumber());
-        USB0->ISTAT = USB_ISTAT_SOFTOK_MASK;
-    }
-
-    // stall interrupt
-    if (istat & USB_ISTAT_STALL_MASK)
-    {
-        // if the control endpoint (EP 0) is stalled, unstall it
-        if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK)
+        if (force || (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK))
         {
             // clear the stall bit in the endpoint register
             USB0->ENDPOINT[0].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
@@ -679,23 +786,119 @@
             // logic as in realiseEndpoint()            
             Data1 |= 0x03;
         }
+    }
+    EXIT_CRITICAL_SECTION
+}
+
+bool USBHAL::getEndpointStallState(uint8_t endpoint) 
+{
+    uint8_t stall = (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK);
+    return (stall) ? true : false;
+}
+
+void USBHAL::remoteWakeup(void) 
+{
+    // [TODO]
+}
+
+// Internal reset handler.  Called when we get a Bus Reset signal
+// from the host, and when we initiate a reset of the SIE hardware
+// from the device side.
+void USBHAL::internalReset(void)
+{
+    ENTER_CRITICAL_SECTION
+    {
+        int i;
         
-        // clear the busy-suspend bit to resume token processing
+        // set the default bus address
+        USB0->ADDR = 0x00;
+        addr = 0;
+        set_addr = 0;
+        
+        // disable all endpoints
+        epRealised = 0x00;
+        for (i = 0 ; i < 16 ; i++)
+            USB0->ENDPOINT[i].ENDPT = 0x00;
+    
+        // take control of all BDTs away from the SIE
+        for (i = 0 ; i < sizeof(bdt)/sizeof(bdt[0]) ; ++i) 
+        {
+            bdt[i].info = 0;
+            bdt[i].byte_count = 0;
+        }
+            
+        // reset DATA0/1 state
+        Data1 = 0x55555555;
+    
+        // reset endpoint completion status
+        epComplete = 0;
+    
+        // reset EVEN/ODD state (and keep it permanently on EVEN -
+        // this disables the hardware double-buffering system)
+        USB0->CTL |= USB_CTL_ODDRST_MASK;
+        
+        // reset error status and enable all error interrupts
+        USB0->ERRSTAT = 0xFF;
+        USB0->ERREN = 0xFF;
+        
+        // enable our standard complement of interrupts
+        USB0->INTEN = BUS_RESET_INTERRUPTS;
+        
+        // we're not suspended
+        suspendStateChanged(0);
+        
+        // we're not sleeping
+        sleepStateChanged(0);
+    
+        // notify upper layers of the bus reset, to reset the protocol state
+        busReset();
+        
+        // realise the control endpoint (EP0) in both directions
+        realiseEndpoint(EP0OUT, MAX_PACKET_SIZE_EP0, 0);
+        realiseEndpoint(EP0IN, MAX_PACKET_SIZE_EP0, 0);
+    }
+    EXIT_CRITICAL_SECTION
+}
+
+void USBHAL::_usbisr(void) 
+{
+    inIRQ = true;
+    instance->usbisr();
+    inIRQ = false;
+}
+
+void USBHAL::usbisr(void) 
+{
+    // get the interrupt status - this tells us which event(s)
+    // triggered this interrupt
+    uint8_t istat = USB0->ISTAT;
+       
+    // reset interrupt
+    if (istat & USB_ISTAT_USBRST_MASK) 
+    {
+        // do the internal reset work
+        internalReset();
+        
+        // resume token processing if it was suspended
         USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
         
-        // clear the interrupt status bit for STALL
-        USB0->ISTAT = USB_ISTAT_STALL_MASK;
+        // clear the interrupt status
+        USB0->ISTAT = USB_ISTAT_USBRST_MASK;
+        
+        // return immediately, ignoring any other status flags
+        return;
     }
-
+    
     // token interrupt
     if (istat & USB_ISTAT_TOKDNE_MASK) 
     {
         // get the endpoint information from the status register
-        uint32_t num  = (USB0->STAT >> 4) & 0x0F;
-        uint32_t dir  = (USB0->STAT >> 3) & 0x01;
+        uint32_t stat = USB0->STAT;
+        uint32_t num  = (stat >> 4) & 0x0F;
+        uint32_t dir  = (stat >> 3) & 0x01;
         int endpoint = (num << 1) | dir;
-        uint32_t ev_odd = (USB0->STAT >> 2) & 0x01;
-
+        uint32_t ev_odd = (stat >> 2) & 0x01;
+        
         // check which endpoint we're working with
         if (num == 0)
         {
@@ -709,12 +912,11 @@
                 // before each send)
                 Data1 &= ~0x02;
                 
-                // forcibly take ownership of the EP0IN endpoints in case we have
-                // unfinished previous transmissions (the protocol state machine here
+                // Forcibly take ownership of the EP0IN BDT in case we have
+                // unfinished previous transmissions.  The protocol state machine
                 // assumes that we don't, so it's probably an error if this code
-                // actually does anything, but we make no provision for handling this)
+                // actually does anything, but just in case...
                 bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~BD_OWN_MASK;
-                bdt[EP_BDT_IDX(0, TX, ODD )].info &= ~BD_OWN_MASK;
 
                 // handle the EP0 SETUP event in the generic protocol layer
                 EP0setupCallback();
@@ -731,9 +933,9 @@
                 
                 // Special case: if the 'set address' flag is set, it means that the
                 // host just sent us our bus address.  We must put this into effect
-                // in the hardware SIE *after* sending the reply, which we just did
-                // above.  So it's now time!
-                if (set_addr == 1) {
+                // in the hardware SIE immediately after sending the reply.  We just
+                // did that above, so this is the time.
+                if (set_addr) {
                     USB0->ADDR = addr & 0x7F;
                     set_addr = 0;
                 }
@@ -751,17 +953,108 @@
             }
         }
 
+        // resume token processing if suspended
+        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
+
         // clear the TOKDNE interrupt status bit
         USB0->ISTAT = USB_ISTAT_TOKDNE_MASK;
+        return;
+    }
+
+    // SOF interrupt
+    if (istat & USB_ISTAT_SOFTOK_MASK) 
+    {
+        // Read frame number and signal the SOF event to the callback
+        SOF(frameNumber());
+        USB0->ISTAT = USB_ISTAT_SOFTOK_MASK;
+    }
+
+    // stall interrupt
+    if (istat & USB_ISTAT_STALL_MASK)
+    {
+        // if the control endpoint (EP 0) is stalled, unstall it
+        USBHAL_KL25Z_unstall_EP0(false);
+        
+        // clear the busy-suspend bit to resume token processing
+        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
+        
+        // clear the interrupt status bit for STALL
+        USB0->ISTAT = USB_ISTAT_STALL_MASK;
     }
 
-    // sleep interrupt
+    // Sleep interrupt.  This indicates that the USB bus has been
+    // idle for at least 3ms (no frames transacted).  This has
+    // several possible causes:
+    //
+    //  - The USB cable was unplugged
+    //  - The host was powered off
+    //  - The host has stopped communicating due to a software fault
+    //  - The host has stopped communicating deliberately (e.g., due
+    //    to user action, or due to a protocol error)
+    //
+    // A "sleep" event on the SIE is not to be confused with the
+    // sleep/suspend power state on the PC.  The sleep event here
+    // simply means that the SIE isn't seeing token traffic on the
+    // required schedule.
+    //
+    // Note that the sleep event is the closest thing the KL25Z USB 
+    // module has to a disconnect event.  There's no way to detect 
+    // if we're physically connected to a host, so all we can really
+    // know is that we're not transacting tokens.  A full-speed host
+    // sends a start-of-frame token every 1ms, so if we see no tokens for
+    // a few milliseconds, the connection must be broken at some level.
     if (istat & USB_ISTAT_SLEEP_MASK) 
     {
-        suspendStateChanged(1);
+        // tell the upper layers about the change
+        sleepStateChanged(1);
+        
+        // $$$ cycle the connection (disconnect, reconnect 5ms later) to
+        // trigger a host retry if we already have a bus address assigned
+        if (USB0->ADDR != 0x00)
+        {
+            static Timeout to;
+            disconnect();
+            to.attach_us(this, &USBHAL::connect, 5000);
+        }
+        
+        // resume token processing
+        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
+
+        // reset the interrupt bit
         USB0->ISTAT = USB_ISTAT_SLEEP_MASK;
     }
 
+    // Resume from suspend mode.
+    //
+    // NB: Don't confuse "suspend" with "sleep".  Suspend mode refers 
+    // to a hardware low-power mode initiated by the device.  "Sleep"
+    // means only that the USB connection has been idle (no tokens
+    // transacted) for at least 3ms.  A sleep signal means that the
+    // connection with the host was broken, either physically or 
+    // logically; it doesn't of itself have anything to do with suspend
+    // mode, and in particular it doesn't mean that the host has
+    // commanded us to enter suspend mode or told us that the host
+    // is entering a low-power state.  The higher-level device
+    // implementation might choose to enter suspend mode on the device
+    // in response to a lost connection, but the USB/HAL layers don't
+    // take any such action on their own.  Note that suspend mode can
+    // only end with explicit intervention by the host, in the form of
+    // a USB RESUME signal, so the host has to be aware that we're
+    // doing this sort of power management.
+    if (istat & USB_ISTAT_RESUME_MASK) 
+    {
+        // note the change
+        suspendStateChanged(0);
+
+        // remove suspend mode flags
+        USB0->USBCTRL &= ~USB_USBCTRL_SUSP_MASK;
+        USB0->USBTRC0 &= ~USB_USBTRC0_USBRESMEN_MASK;
+        USB0->INTEN &= ~USB_INTEN_RESUMEEN_MASK;
+        
+        // clear the interrupt status
+        USB0->ISTAT = USB_ISTAT_RESUME_MASK;
+    }
+
     // error interrupt
     if (istat & USB_ISTAT_ERROR_MASK) 
     {