USB device stack, with KL25Z fixes for USB 3.0 hosts and sleep/resume interrupt handling

Dependents:   frdm_Slider_Keyboard idd_hw2_figlax_PanType idd_hw2_appachu_finger_chording idd_hw3_AngieWangAntonioDeLimaFernandesDanielLim_BladeSymphony ... more

Fork of USBDevice by mbed official

This is an overhauled version of the standard mbed USB device-side driver library, with bug fixes for KL25Z devices. It greatly improves reliability and stability of USB on the KL25Z, especially with devices using multiple endpoints concurrently.

I've had some nagging problems with the base mbed implementation for a long time, manifesting as occasional random disconnects that required rebooting the device. Recently (late 2015), I started implementing a USB device on the KL25Z that used multiple endpoints, and suddenly the nagging, occasional problems turned into frequent and predictable crashes. This forced me to delve into the USB stack and figure out what was really going on. Happily, the frequent crashes made it possible to track down and fix the problems. This new version is working very reliably in my testing - the random disconnects seem completely eradicated, even under very stressful conditions for the device.

Summary

  • Overall stability improvements
  • USB 3.0 host support
  • Stalled endpoint fixes
  • Sleep/resume notifications
  • Smaller memory footprint
  • General code cleanup

Update - 2/15/2016

My recent fixes introduced a new problem that made the initial connection fail most of the time on certain hosts. It's not clear if the common thread was a particular type of motherboard or USB chip set, or a specific version of Windows, or what, but several people ran into it. We tracked the problem down to the "stall" fixes in the earlier updates, which we now know weren't quite the right fixes after all. The latest update (2/15/2016) fixes this. It has new and improved "unstall" handling that so far works well with diverse hosts.

Race conditions and overall stability

The base mbed KL25Z implementation has a lot of problems with "race conditions" - timing problems that can happen when hardware interrupts occur at inopportune moments. The library shares a bunch of static variable data between interrupt handler context and regular application context. This isn't automatically a bad thing, but it does require careful coordination to make sure that the interrupt handler doesn't corrupt data that the other code was in the middle of updating when an interrupt occurs. The base mbed code, though, doesn't do any of the necessary coordination. This makes it kind of amazing that the base code worked at all for anyone, but I guess the interrupt rate is low enough in most applications that the glitch rate was below anyone's threshold to seriously investigate.

This overhaul adds the necessary coordination for the interrupt handlers to protect against these data corruptions. I think it's very solid now, and hopefully entirely free of the numerous race conditions in the old code. It's always hard to be certain that you've fixed every possible bug like this because they strike (effectively) at random, but I'm pretty confident: my test application was reliably able to trigger glitches in the base code in a matter of minutes, but the same application (with the overhauled library) now runs for days on end without dropping the connection.

Stalled endpoint fixes

USB has a standard way of handling communications errors called a "stall", which basically puts the connection into an error mode to let both sides know that they need to reset their internal states and sync up again. The original mbed version of the USB device library doesn't seem to have the necessary code to recover from this condition properly. The KL25Z hardware does some of the work, but it also seems to require the software to take some steps to "un-stall" the connection. (I keep saying "seems to" because the hardware reference material is very sketchy about all of this. Most of what I've figured out is from observing the device in action with a Windows host.) This new version adds code to do the necessary re-syncing and get the connection going again, automatically, and transparently to the user.

USB 3.0 Hosts

The original mbed code sometimes didn't work when connecting to hosts with USB 3.0 ports. This didn't affect every host, but it affected many of them. The common element seemed to be the Intel Haswell chip set on the host, but there may be other chip sets affected as well. In any case, the problem affected many PCs from the Windows 7 and 8 generation, as well as many Macs. It was possible to work around the problem by avoiding USB 3.0 ports - you could use a USB 2 port on the host, or plug a USB 2 hub between the host and device. But I wanted to just fix the problem and eliminate the need for such workarounds. This modified version of the library has such a fix, which so far has worked for everyone who's tried.

Sleep/resume notifications

This modified version also contains an innocuous change to the KL25Z USB HAL code to handle sleep and resume interrupts with calls to suspendStateChanged(). The original KL25Z code omitted these calls (and in fact didn't even enable the interrupts), but I think this was an unintentional oversight - the notifier function is part of the generic API, and other supported boards all implement it. I use this feature in my own application so that I can distinguish sleep mode from actual disconnects and handle the two conditions correctly.

Smaller memory footprint

The base mbed version of the code allocates twice as much memory for USB buffers as it really needed to. It looks like the original developers intended to implement the KL25Z USB hardware's built-in double-buffering mechanism, but they ultimately abandoned that effort. But they left in the double memory allocation. This version removes that and allocates only what's actually needed. The USB buffers aren't that big (128 bytes per endpoint), so this doesn't save a ton of memory, but even a little memory is pretty precious on this machine given that it only has 16K.

(I did look into adding the double-buffering support that the original developers abandoned, but after some experimentation I decided they were right to skip it. It just doesn't seem to mesh well with the design of the rest of the mbed USB code. I think it would take a major rewrite to make it work, and it doesn't seem worth the effort given that most applications don't need it - it would only benefit applications that are moving so much data through USB that they're pushing the limits of the CPU. And even for those, I think it would be a lot simpler to build a purely software-based buffer rotation mechanism.)

General code cleanup

The KL25Z HAL code in this version has greatly expanded commentary and a lot of general cleanup. Some of the hardware constants were given the wrong symbolic names (e.g., EVEN and ODD were reversed), and many were just missing (written as hard-coded numbers without explanation). I fixed the misnomers and added symbolic names for formerly anonymous numbers. Hopefully the next person who has to overhaul this code will at least have an easier time understanding what I thought I was doing!

Revision:
36:20bb47609697
Parent:
35:53e1a208f582
Child:
37:c5ac4ccf6597
--- a/USBDevice/USBHAL_KL25Z.cpp	Fri Dec 25 20:24:34 2015 +0000
+++ b/USBDevice/USBHAL_KL25Z.cpp	Tue Jan 05 05:22:50 2016 +0000
@@ -16,12 +16,18 @@
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */
 
+#define DEBUG_PRINTF // enable debug messages
+
 #if defined(TARGET_KL25Z) | defined(TARGET_KL46Z) | defined(TARGET_K20D5M) | defined(TARGET_K64F)
 
 #include "USBHAL.h"
 
 USBHAL * USBHAL::instance;
 
+// DMAERR has occurred
+bool USB_DMAERR;
+
+
 // Perform an operation atomically, by disabling interrupts.  This must be
 // used for any test-and-write operations in non-IRQ code, including increment, 
 // decrement, bit set, and bit clear, on variables that are also written 
@@ -37,14 +43,6 @@
         NVIC_EnableIRQ(USB0_IRQn); \
     } while (0)
     
-    
-// endpoint completion flags
-static volatile int epComplete = 0;
-
-// clear a completion flag - must be atomic, as the IRQ also manipulates these bits
-#define clear_completion(endpoint) atomic(epComplete &= ~EP(endpoint))
-    
-
 // Convert physical endpoint number to register bit
 #define EP(endpoint) (1<<(endpoint))
 
@@ -62,39 +60,99 @@
 #define BD_DTS_MASK        (1<<3)
 #define BD_STALL_MASK      (1<<2)
 
+// Endpoint direction.  These are relative to the DEVICE:
+//  TX = transmit to host = "IN" endpoint
+//  RX = receive from host = "OUT" endpoint
 #define TX    1
 #define RX    0
-#define ODD   0
-#define EVEN  1
-// this macro waits a physical endpoint number
-#define EP_BDT_IDX(ep, dir, odd) (((ep * 4) + (2 * dir) + (1 *  odd)))
+
+// Double-buffering parity.  The hadrware SIE has a native double-buffering
+// scheme that switches back and forth between two buffers per physical
+// endpoint.  We disable this feature, so we effectively only use the EVEN
+// buffer for each endpoint.
+#define EVEN  0
+#define ODD   1
+
+// Get the BDT index for a given endpoint.  'logep' is the LOGICAL
+// endpoint number; 'dir' is the direction (TX or RX).  'odd' is
+// the even/odd flag for the hardware SIE double-buffering system.
+// We don't actually use the double-buffering scheme, so this is 
+// essentially always EVEN throughout this implementation.
+#define EP_BDT_IDX(logep, dir, odd) ((4*(logep)) + (2*(dir)) + (1*(odd)))
+
+// get the BDT for a given physical endpoint
+#define PEP_BDT_IDX(phyep, odd) ((2*(phyep)) + (1*(odd)))
+
 
 #define SETUP_TOKEN    0x0D
 #define IN_TOKEN       0x09
 #define OUT_TOKEN      0x01
 #define TOK_PID(idx)   ((bdt[idx].info >> 2) & 0x0F)
 
-// for each endpt: 8 bytes
+
+// Buffer Descriptor Table (BDT) entry.
+//
+// Each entry in the BDT uses this structure, which is defined by
+// the hardware SIE design.
 typedef struct BDT {
     uint8_t   info;       // BD[0:7]
     uint8_t   dummy;      // RSVD: BD[8:15]
-    uint16_t  byte_count; // BD[16:32]
+    uint16_t  byte_count; // BD[16:25] reserved[26:31]
     uint32_t  address;    // Addr
-} BDT;
+} __attribute__((packed)) BDT;
 
+// The Buffer Descriptor Table is a contiguous block of BDT entries
+// (the structure defined above).  This memory block is shared with the
+// hardware SIE to coordinate endpoint send/receive operations between
+// the hardware and software.  The hardware design requires this block
+// to be aligned on a 512-byte boundary; the location of the block is
+// stored in a hardware register that we set during initialization.
+//
+// There are:
+//    * 16 logical endpoints (EP0 .. EP15)
+//    * 2 physical endpoints (TX and RX) per logical endpoint -> 32 physical endpoints
+//    * two buffers per physical endpoint (EVEN and ODD) -> 64 buffers
+__attribute__((__aligned__(512))) BDT bdt[NUMBER_OF_PHYSICAL_ENDPOINTS * 2];
+uint8_t *endpoint_buffer[NUMBER_OF_PHYSICAL_ENDPOINTS];
+
+// The device's bus address
+static uint8_t addr = 0;
+
+// Mode flag indicating that the host will send our assigned address
+// in the next packet
+static uint8_t set_addr = 0;
 
-// there are:
-//    * 16 bidirectionnal endpt -> 32 physical endpt
-//    * as there are ODD and EVEN buffer -> 32*2 bdt
-__attribute__((__aligned__(512))) BDT bdt[NUMBER_OF_PHYSICAL_ENDPOINTS * 2];
-uint8_t * endpoint_buffer[NUMBER_OF_PHYSICAL_ENDPOINTS];
+// Private data per endpoint
+struct {
+    uint32_t flags;     // flags set when endpoint was realized
+    uint32_t bufsiz;    // allocated buffer size for endpoint
+} epInfo[NUMBER_OF_PHYSICAL_ENDPOINTS];
+    
+// Endpoint completion flags (one for each endpoint).  These flags indicate
+// if the last read or write operation on an endpoint has completed.  It's 
+// set by the interrupt handler when a send/receive operation completes.  
+// It's cleared by the readResult/writeResult routines on retrieving a 
+// successful completion status.
+static volatile int epComplete = 0;
 
-static uint8_t set_addr = 0;
-static uint8_t addr = 0;
+// clear a completion flag - must be atomic, as the IRQ also manipulates these bits
+#define clear_completion(endpoint) atomic(epComplete &= ~EP(endpoint))
+    
+// Endpoint ready flags.  These indicate if endpoints are ready for their
+// next read/write operations.  These are set by the read/write routines, and
+// cleared by the interrupt handler.  Note that these are *almost* the
+// inverse of the epComplete flags, but not quite: these aren't consumed by
+// the status checks.  We use these to check for clearance before starting an
+// operation, to make sure we're not attempting to overwrite a previous one.
+static volatile uint32_t epReady = 0;
+
+// set/clear the ready flag - must be atomic, as the IRQ also manipulates these bits
+#define set_ready(endpoint) atomic(epReady |= EP(endpoint))
+#define clear_ready(endpoint) atomic(epReady &= ~EP(endpoint))
 
 
 // DATA0/DATA1 setting for next packet on each endpoint
-static uint32_t Data1  = 0x55555555;
+static volatile uint32_t Data1  = 0x55555555;
 
 // set DATA0/DATA1 status on an endpoint (must be atomic as the IRQ handler
 // manpulates this variable)
@@ -110,7 +168,8 @@
     return 0;
 }
 
-USBHAL::USBHAL(void) {
+USBHAL::USBHAL(void) 
+{
     // Disable IRQ
     NVIC_DisableIRQ(USB0_IRQn);
 
@@ -149,50 +208,51 @@
     epCallback[28] = &USBHAL::EP15_OUT_callback;
     epCallback[29] = &USBHAL::EP15_IN_callback;
 
-
     // choose usb src as PLL
     SIM->SOPT2 |= (SIM_SOPT2_USBSRC_MASK | SIM_SOPT2_PLLFLLSEL_MASK);
 
     // enable OTG clock
     SIM->SCGC4 |= SIM_SCGC4_USBOTG_MASK;
 
-    // Attach IRQ
-    instance = this;
-    NVIC_SetVector(USB0_IRQn, (uint32_t)&_usbisr);
-    NVIC_EnableIRQ(USB0_IRQn);
-
     // USB Module Configuration
     // Reset USB Module
     USB0->USBTRC0 |= USB_USBTRC0_USBRESET_MASK;
     while(USB0->USBTRC0 & USB_USBTRC0_USBRESET_MASK);
 
     // Set BDT Base Register
-    USB0->BDTPAGE1 = (uint8_t)((uint32_t)bdt>>8);
-    USB0->BDTPAGE2 = (uint8_t)((uint32_t)bdt>>16);
-    USB0->BDTPAGE3 = (uint8_t)((uint32_t)bdt>>24);
+    USB0->BDTPAGE1 = (uint8_t)(uint32_t(bdt) >> 8);
+    USB0->BDTPAGE2 = (uint8_t)(uint32_t(bdt) >> 16);
+    USB0->BDTPAGE3 = (uint8_t)(uint32_t(bdt) >> 24);
 
     // Clear interrupt flag
     USB0->ISTAT = 0xff;
 
     // USB Interrupt Enablers
-    USB0->INTEN |= USB_INTEN_TOKDNEEN_MASK |
-                   USB_INTEN_SOFTOKEN_MASK |
-                   USB_INTEN_ERROREN_MASK  |
-                   USB_INTEN_SLEEPEN_MASK |
-                   USB_INTEN_RESUMEEN_MASK |
-                   USB_INTEN_USBRSTEN_MASK;
+    USB0->INTEN = USB_INTEN_TOKDNEEN_MASK |
+                  USB_INTEN_SOFTOKEN_MASK |
+                  USB_INTEN_ERROREN_MASK  |
+                  USB_INTEN_SLEEPEN_MASK |
+                  USB_INTEN_RESUMEEN_MASK |
+                  USB_INTEN_USBRSTEN_MASK |
+                  (1 << 7);                   // stall mask
+
+    // Attach IRQ
+    instance = this;
+    NVIC_SetVector(USB0_IRQn, (uint32_t)&_usbisr);
+    NVIC_EnableIRQ(USB0_IRQn);
 
     // Disable weak pull downs
     USB0->USBCTRL &= ~(USB_USBCTRL_PDE_MASK | USB_USBCTRL_SUSP_MASK);
-
     USB0->USBTRC0 |= 0x40;
 }
 
 USBHAL::~USBHAL(void) { }
 
-void USBHAL::connect(void) {
+void USBHAL::connect(void) 
+{
     // enable USB
     USB0->CTL |= USB_CTL_USBENSOFEN_MASK;
+    
     // Pull up enable
     USB0->CONTROL |= USB_CONTROL_DPPULLUPNONOTG_MASK;
 }
@@ -247,63 +307,103 @@
         alo_size = 64;
     }
     
-    if (endpoint_buffer[endpoint] == 0)
+    // if we have a buffer, but the new buffer size is bigger, delete the old one
+    if (endpoint_buffer[endpoint] != 0 && epInfo[endpoint].bufsiz < maxPacket) {
+        free(endpoint_buffer[endpoint]);
+        endpoint_buffer[endpoint] = 0;
+        epInfo[endpoint].bufsiz = 0;
+    }
+    
+    // allocate a new buffer if we don't already have one    
+    if (endpoint_buffer[endpoint] == 0) {
         endpoint_buffer[endpoint] = (uint8_t *)malloc(alo_size);
+        epInfo[endpoint].bufsiz = alo_size;
+    }
+    
+    // remember the realisation flags for the endpoint
+    epInfo[endpoint].flags = flags;
 
-    // IN endpt -> device to host (TX)
-    if (IN_EP(endpoint)) {
+    // Set DATA1 initially on the endpoint.  If it's an RX endpoint, we're about to start
+    // a read with DATA0, so the next read will be DATA1.  If it's a TX endpoint, we'll
+    // actually want to send DATA0 for the first packet, but for that we need the flag
+    // to be set to DATA1, since the write routine always inverts the flag *before*
+    // using it.
+    set_DATA1(endpoint);
+    
+    // the endpoint hasn't completed its first operation yet
+    clear_completion(endpoint);
+    
+    if (IN_EP(endpoint)) 
+    {
+        // IN endpt -> device to host (TX)
+        set_ready(endpoint);
         USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag |        // ep handshaking (not if iso endpoint)
                                               USB_ENDPT_EPTXEN_MASK;  // en TX (IN) tran
-        bdt[EP_BDT_IDX(log_endpoint, TX, ODD )].address = (uint32_t) endpoint_buffer[endpoint];
-        bdt[EP_BDT_IDX(log_endpoint, TX, EVEN)].address = 0;
+        bdt[EP_BDT_IDX(log_endpoint, TX, EVEN)].info       = 0;
+        bdt[EP_BDT_IDX(log_endpoint, TX, ODD )].info       = 0;
+        bdt[EP_BDT_IDX(log_endpoint, TX, EVEN)].address = (uint32_t) endpoint_buffer[endpoint];
+        bdt[EP_BDT_IDX(log_endpoint, TX, ODD )].address = 0;
     }
-    // OUT endpt -> host to device (RX)
-    else {
+    else 
+    {
+        // OUT endpt -> host to device (RX)
+        clear_ready(endpoint);
         USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag |        // ep handshaking (not if iso endpoint)
                                               USB_ENDPT_EPRXEN_MASK;  // en RX (OUT) tran.
-        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].byte_count = maxPacket;
-        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].address    = (uint32_t) endpoint_buffer[endpoint];
-        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].address    = 0;
-        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].info       = BD_OWN_MASK | BD_DTS_MASK;
-        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].info       = 0;
+        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].byte_count = maxPacket;
+        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].address    = (uint32_t) endpoint_buffer[endpoint];
+        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].address    = 0;
+        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].info       = BD_OWN_MASK | BD_DTS_MASK;
+        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].info       = 0;
     }
 
-    set_DATA1(endpoint);
-    clear_completion(endpoint);
-
+    // success
     return true;
 }
 
 // read setup packet
-void USBHAL::EP0setup(uint8_t *buffer) {
+void USBHAL::EP0setup(uint8_t *buffer) 
+{
     uint32_t sz;
-    endpointReadResult(EP0OUT, buffer, &sz);
+    if (endpointReadResult(EP0OUT, buffer, &sz) != EP_COMPLETED)
+        printf("EP0setup not ready\r\n"); // $$$
+    if (sz != 8) 
+        printf("EP0setup - wrong size! %d bytes, expected 8 bytes\r\n", sz); // $$$
 }
 
-void USBHAL::EP0readStage(void) {
+void USBHAL::EP0readStage(void) 
+{
+    uint32_t idx = EP_BDT_IDX(PHY_TO_LOG(EP0OUT), RX, 0);
     set_DATA0(EP0OUT);                              // set DATA0 for the next packet
-    bdt[0].info = (BD_DTS_MASK | BD_OWN_MASK);      // start the read
+    { // if (epReady & EP(EP0OUT)) {
+        bdt[idx].byte_count = MAX_PACKET_SIZE_EP0;
+        bdt[idx].info = (BD_DTS_MASK | BD_OWN_MASK);    // start the read
+    }
+#ifdef DEBUG_PRINTF
+ //   else printf("!!! EP0readStage() not ready !!!\r\n");
+#endif
 }
 
-void USBHAL::EP0read(void) {
-    uint32_t idx = EP_BDT_IDX(PHY_TO_LOG(EP0OUT), RX, 0);
-    bdt[idx].byte_count = MAX_PACKET_SIZE_EP0;
+void USBHAL::EP0read(void) 
+{
 }
 
-uint32_t USBHAL::EP0getReadResult(uint8_t *buffer) {
+uint32_t USBHAL::EP0getReadResult(uint8_t *buffer) 
+{
     uint32_t sz;
-    endpointReadResult(EP0OUT, buffer, &sz);
+    while (endpointReadResult(EP0OUT, buffer, &sz) == EP_PENDING) ;
     return sz;
 }
 
 void USBHAL::EP0write(uint8_t *buffer, uint32_t size) {
-    endpointWrite(EP0IN, buffer, size);
+    while (endpointWrite(EP0IN, buffer, size) == EP_BUSY) ;
 }
 
 void USBHAL::EP0getWriteResult(void) {
 }
 
-void USBHAL::EP0stall(void) {
+void USBHAL::EP0stall(void) 
+{
     stallEndpoint(EP0OUT);
 }
 
@@ -312,7 +412,7 @@
     return EP_PENDING;
 }
 
-EP_STATUS USBHAL::endpointReadResult(uint8_t endpoint, uint8_t * buffer, uint32_t *bytesRead) 
+EP_STATUS USBHAL::endpointReadResult(uint8_t endpoint, uint8_t *buffer, uint32_t *bytesRead) 
 {
     // validate the endpoint number
     if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
@@ -327,14 +427,32 @@
 
     // get the BDT index
     int idx = EP_BDT_IDX(log_endpoint, RX, 0);
+ 
+#if 1 // $$$$   
+#ifdef DEBUG_PRINTF
+    if (endpoint == EP0OUT && !(epReady & EP(endpoint))) printf("!!! EP0 read not ready\r\n");
+#endif
+    // if the endpoint isn't ready, the read is pending
+    if (!(epReady & EP(endpoint)))
+        return EP_PENDING;
+#else
+    // If the endpoint isn't ready, the read is still pending.  Ignore this
+    // for EP0, since the control procotol inherently serializes packets on
+    // both the RX and TX endpoints.
+    if (endpoint != EP0OUT && !(epReady & EP(endpoint)))
+        return EP_PENDING;
+#endif
 
     // make sure we're marked as complete (except for isochronous endpoints)
-    bool iso = !(USB0->ENDPOINT[log_endpoint].ENDPT & USB_ENDPT_EPHSHK_MASK);
+    bool iso = epInfo[log_endpoint].flags & ISOCHRONOUS;
     if ((log_endpoint != 0) && !iso && !(epComplete & EP(endpoint)))
         return EP_PENDING;
 
     // this consumes the read status - clear the completion flag to prepare for the next read
     clear_completion(endpoint);
+    
+    // the new operation is pending, so the endpoint is not ready
+    clear_ready(endpoint);
 
     // copy the data from the hardware buffer to the caller's buffer
     uint8_t *ep_buf = (uint8_t *)bdt[idx].address;
@@ -358,9 +476,10 @@
     // hand off the BDT to start the next read
     bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
 
-    // turn off the "suspend token busy" flag (the hardware sets this for
-    // SETUP tokens to let us process tokens sequentially in this special case)
-    USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
+    // If this is a SETUP packet, clear the "suspend token busy" hardware flag.
+    // The SIE sets this bit during setup to allow non-interrupted processing.
+    if (setup)
+        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
     
     // operation completed
     return EP_COMPLETED;
@@ -379,10 +498,23 @@
     // get the BDT entry for the endpoint
     int idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, 0);
 
-    // if the buffer is owned by the SIE, there's a previous write that
-    // hasn't finished yet - we can't start a new one yet
-    if (bdt[idx].info & BD_OWN_MASK)
+#if 0
+#ifdef DEBUG_PRINTF
+    if (endpoint == EP0IN && !(epReady & EP(endpoint))) printf("!!! EP0 write not ready\r\n");
+#endif
+    // $$$
+    // if the endpoint isn't ready, return BUSY, since we have an outstanding
+    // write that hasn't completed yet
+    if (!(epReady & EP(endpoint)))
         return EP_BUSY;
+#else
+    // If the endpoint isn't marked as ready, return BUSY, since we have an
+    // outstanding write on the endpoint that hasn't completed yet.  Skip this
+    // test for EP0, since EP0 is handled entirely through interrupt callbacks
+    // and packet exchange is inherently serialized by the protocol.  
+    if (endpoint != EP0IN && !(epReady & EP(endpoint)))
+        return EP_BUSY;
+#endif
 
     // copy the bytes to the endpoint hardware buffer
     uint8_t *ep_buf = (uint8_t *)bdt[idx].address;
@@ -392,6 +524,10 @@
 
     // toggle DATA0/DATA1 on each packet
     toggle_DATA01(endpoint);
+    
+    // mark the endpoint as not ready and not complete
+    clear_completion(endpoint);
+    clear_ready(endpoint);
 
     // hand the buffer to the SIE
     bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
@@ -402,25 +538,88 @@
 
 EP_STATUS USBHAL::endpointWriteResult(uint8_t endpoint) 
 {
-    if (epComplete & EP(endpoint)) {
+    // if the endpoint is ready, and the completion flag is set, we're done
+    uint32_t mask = EP(endpoint);
+    if ((epReady & mask) && (epComplete & mask)) {
         clear_completion(endpoint);
         return EP_COMPLETED;
     }
-
+    
     return EP_PENDING;
 }
 
-void USBHAL::stallEndpoint(uint8_t endpoint) {
-    USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
+void USBHAL::stallEndpoint(uint8_t endpoint) 
+{
+#ifdef DEBUG_PRINTF
+    printf("Stall %d\r\n", endpoint);
+#endif
+    int idx = PEP_BDT_IDX(endpoint, EVEN);
+    if ((epReady & EP(endpoint)) && !(bdt[idx].info & BD_OWN_MASK))
+        bdt[idx].info |= BD_STALL_MASK;
+    else
+        USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
 }
 
-void USBHAL::unstallEndpoint(uint8_t endpoint) {
-    USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
+void USBHAL::unstallEndpoint(uint8_t endpoint) 
+{
+#ifdef DEBUG_PRINTF
+    printf("Unstall %d\r\n", endpoint);
+#endif
+
+    // get the physical endpoint
+    int logep = PHY_TO_LOG(endpoint);
+    
+    // figure the new endpoint control register flags
+    uint8_t f = IN_EP(endpoint) ? USB_ENDPT_EPTXEN_MASK : USB_ENDPT_EPRXEN_MASK;
+    if (logep != 0)
+    {
+        // add the HANDSHAKE flag if it's not isochronous
+        if (!(epInfo[endpoint].flags & ISOCHRONOUS))
+            f |= USB_ENDPT_EPHSHK_MASK;
+    }
+            
+    // set the new flags
+    USB0->ENDPOINT[logep].ENDPT = f;
+    
+    // clear the STALL bit in the endponit control register
+    USB0->ENDPOINT[logep].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
+    
+    // clear the stall bit in the BDT, and take ownership
+    int idx = PEP_BDT_IDX(endpoint, EVEN);
+    bdt[idx].info &= ~(BD_OWN_MASK | BD_STALL_MASK);
+    
+    // the endpoint is now NOT COMPLETED
+    clear_completion(endpoint);
+    
+    // Set the data flag to DATA1 for the endpoint.  If it's TX, the next
+    // packet sent needs to be DATA0, and write flips the bit first, so we
+    // want it to be DATA1 now.  If it's RX, we're about to start a read
+    // with DATA0, so the next read will be back to DATA1.
+    set_DATA1(endpoint);
+
+    // check the endpoint type for final settings
+    if (IN_EP(endpoint))
+    {
+        // TX endpoint - mark as ready so that next write operation can proceed,
+        // and set the DATA1 flag so that the next packet will be sent as DATA0
+        // (the write opereation flips the bit before sending)
+        set_ready(endpoint);
+    }
+    else
+    {
+        // RX endpoint - start next read operation with DATA0
+        if (epReady & EP(endpoint))
+        {
+            clear_ready(endpoint);
+            bdt[idx].byte_count = epInfo[endpoint].bufsiz;
+            bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
+        }
+    }
 }
 
-bool USBHAL::getEndpointStallState(uint8_t endpoint) {
-    uint8_t stall = (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK);
-    return (stall) ? true : false;
+bool USBHAL::getEndpointStallState(uint8_t endpoint) 
+{
+    return (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK) != 0;
 }
 
 void USBHAL::remoteWakeup(void) {
@@ -433,15 +632,20 @@
 }
 
 
-void USBHAL::usbisr(void) {
+void USBHAL::usbisr(void) 
+{
     uint8_t i;
     uint8_t istat = USB0->ISTAT;
 
     // reset interrupt
-    if (istat & USB_ISTAT_USBRST_MASK) {
-        
+    if (istat & USB_ISTAT_USBRST_MASK) 
+    {
+#ifdef DEBUG_PRINTF
+       // printf("\r\n>>> USB RESET\r\n");
+#endif
+
         // disable all endpt
-        for(i = 0; i < 16; i++) {
+        for(i = 0 ; i < 16 ; i++) {
             USB0->ENDPOINT[i].ENDPT = 0x00;
         }
 
@@ -451,11 +655,15 @@
 
         Data1 = 0x55555555;
         epComplete = 0;
+        
+        // Reset the SIE even/odd buffer state.  We keep this bit
+        // set throughout the session, which makes the EVEN buffer
+        // the only one that's ever used, effectively disabling the
+        // double-buffering system.
         USB0->CTL |=  USB_CTL_ODDRST_MASK;
 
-        USB0->ISTAT   =  0xFF;  // clear all interrupt status flags
-        USB0->ERRSTAT =  0xFF;  // clear all error flags
-        USB0->ERREN   =  0xFF;  // enable error interrupt sources
+        USB0->ERRSTAT =  0xBF;  // clear all error flags
+        USB0->ERREN   =  0x9F; // 0xBF;  // enable error interrupt sources 
         USB0->ADDR    =  0x00;  // set default address
         
         // notify upper layers of the bus reset
@@ -464,91 +672,174 @@
         // we're not suspended
         suspendStateChanged(0);
 
+        // clear all interrupt status flags and return        
+        USB0->ISTAT = 0xFF;
         return;
     }
 
     // resume interrupt
     if (istat & USB_ISTAT_RESUME_MASK) {
+        suspendStateChanged(0);
         USB0->ISTAT = USB_ISTAT_RESUME_MASK;
-        suspendStateChanged(0);
+        return;
     }
 
     // SOF interrupt
     if (istat & USB_ISTAT_SOFTOK_MASK) {
-        USB0->ISTAT = USB_ISTAT_SOFTOK_MASK;
         // SOF event, read frame number
         SOF(frameNumber());
+        USB0->ISTAT = USB_ISTAT_SOFTOK_MASK;
+        return;
     }
 
     // stall interrupt
-    if (istat & 1<<7) {
-        if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK)
-            USB0->ENDPOINT[0].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
-        USB0->ISTAT |= USB_ISTAT_STALL_MASK;
+    if (istat & USB_ISTAT_STALL_MASK)
+    {
+#ifdef DEBUG_PRINTF
+        printf("\r\n>>>>STALL INTERRUPT\r\n");
+#endif
+        if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK) {
+            unstallEndpoint(EP0IN);
+            unstallEndpoint(EP0OUT);
+        }
+        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
+        USB0->ISTAT = USB_ISTAT_STALL_MASK;
+        return;
     }
 
-    // token interrupt
-    if (istat & 1<<3) {
-        uint32_t num  = (USB0->STAT >> 4) & 0x0F;
-        uint32_t dir  = (USB0->STAT >> 3) & 0x01;
-        uint32_t ev_odd = (USB0->STAT >> 2) & 0x01;
+    // TOKEN DONE interrupt
+    if (istat & USB_ISTAT_TOKDNE_MASK)
+    {
+        uint32_t stat = USB0->STAT;
+        uint32_t num  = (stat >> 4) & 0x0F;
+        uint32_t dir  = (stat >> 3) & 0x01;
+        uint32_t ev_odd = (stat >> 2) & 0x01;
         int endpoint = (num << 1) | dir;
+        int idx = EP_BDT_IDX(num, dir, ev_odd);
+        int pid = TOK_PID(idx);
+        int len = bdt[idx].byte_count;
+        
+        if (ev_odd) printf("!!! odd BDT entry???\r\n");//$$$
 
+        // endpoint is ready for next operation
+        epReady |= EP(endpoint);
+        // if (num == 0) epReady |= EP(EP0OUT) | EP(EP0IN);
+        
+        //$$$for (int i = 0 ; (bdt[idx].info & BD_OWN_MASK) && i < 10000000 ; ++i) ;
+        //$$$static int unrel;
+       //$$$ if (bdt[idx].info & BD_OWN_MASK) { printf("\r\nIRQ: BDT wasn't released from SIE to CPU (ep=%d)\r\n", endpoint); } 
+        if (pid == SETUP_TOKEN && len != 8) {
+            pid = OUT_TOKEN;
+          //$$$  printf("IRQ: SETUP packet has wrong size (%d)\r\n", bdt[EP_BDT_IDX(num, dir, ev_odd)].byte_count);//$$$
+        }
+            
         // setup packet
-        if ((num == 0) && (TOK_PID((EP_BDT_IDX(num, dir, ev_odd))) == SETUP_TOKEN)) {
+        if (num == 0 && pid == SETUP_TOKEN)
+        {
             Data1 &= ~0x02;
-            bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~BD_OWN_MASK;
-            bdt[EP_BDT_IDX(0, TX, ODD)].info  &= ~BD_OWN_MASK;
+            //bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~BD_OWN_MASK;
+            //bdt[EP_BDT_IDX(0, TX, ODD )].info &= ~BD_OWN_MASK;
 
             // EP0 SETUP event (SETUP data received)
             EP0setupCallback();
-
-        } else {
+        } 
+        else if (pid == OUT_TOKEN)
+        {
             // OUT packet
-            if (TOK_PID((EP_BDT_IDX(num, dir, ev_odd))) == OUT_TOKEN) {
-                if (num == 0)
-                    EP0out();
-                else {
-                    epComplete |= EP(endpoint);
-                    if ((instance->*(epCallback[endpoint - 2]))()) {
-                        epComplete &= ~EP(endpoint);
-                    }
+            if (num == 0)
+                EP0out();
+            else 
+            {
+                epComplete |= EP(endpoint);
+                if ((instance->*(epCallback[endpoint - 2]))())
+                    epComplete &= ~EP(endpoint);
+            }
+        }
+        else if (pid == IN_TOKEN)
+        {
+            // IN packet
+            if (num == 0) 
+            {
+                EP0in();
+                if (set_addr == 1) 
+                {
+                    USB0->ADDR = addr & 0x7F;
+                    set_addr = 0;
                 }
             }
-
-            // IN packet
-            if (TOK_PID((EP_BDT_IDX(num, dir, ev_odd))) == IN_TOKEN) {
-                if (num == 0) {
-                    EP0in();
-                    if (set_addr == 1) {
-                        USB0->ADDR = addr & 0x7F;
-                        set_addr = 0;
-                    }
-                }
-                else {
-                    epComplete |= EP(endpoint);
-                    if ((instance->*(epCallback[endpoint - 2]))()) {
-                        epComplete &= ~EP(endpoint);
-                    }
-                }
+            else 
+            {
+                epComplete |= EP(endpoint);
+                if ((instance->*(epCallback[endpoint - 2]))())
+                    epComplete &= ~EP(endpoint);
             }
         }
 
         USB0->ISTAT = USB_ISTAT_TOKDNE_MASK;
+        return;
     }
 
     // sleep interrupt
-    if (istat & 1<<4) {
-        USB0->ISTAT |= USB_ISTAT_SLEEP_MASK;
+    if (istat & USB_ISTAT_SLEEP_MASK) 
+    {
+#ifdef DEBUG_PRINTF
+        //printf("\r\n>>> USB SLEEP INTERRUPT\r\n");
+#endif
         suspendStateChanged(1);
+        USB0->ISTAT = USB_ISTAT_SLEEP_MASK;
+        return;
     }
 
     // error interrupt
-    if (istat & USB_ISTAT_ERROR_MASK) {
-        USB0->ERRSTAT = 0xFF;
-        USB0->ISTAT |= USB_ISTAT_ERROR_MASK;
+    if (istat & USB_ISTAT_ERROR_MASK) 
+    {
+        uint32_t errstat = USB0->ERRSTAT;
+#ifdef DEBUG_PRINTF
+        printf("\r\n>>>>ERROR INTERRUPT: ERRSTAT=%x\r\n", errstat);
+#endif
+        if (errstat & 0x20)
+            USB_DMAERR = true;
+
+        // reset the error status
+        USB0->ERRSTAT = 0xBF;
+        
+        // allow the SIE to continue token processing
+        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
+
+        // reset the interrupt
+        USB0->ISTAT = USB_ISTAT_ERROR_MASK;
+        return;
     }
 }
 
 
+#ifdef DEBUG_PRINTF
+void USBDeviceStatusDump()
+{
+    printf("USBDevice status:\r\n");
+    printf("        0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10 11 11 12 12 13 13 14 14 15 15\r\n");
+    printf("       RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX\r\n");
+    printf("Data1  ");
+    for (int i = 0 ; i < 32 ; ++i)
+        printf(" %d ", (Data1 >> i) & 1);
+    printf("\r\n"
+           "Cmplt  ");
+    for (int i = 0 ; i < 32 ; ++i)
+        printf(" %d ", (epComplete >> i) & 1);
+    printf("\r\n"
+           "Ready  ");
+    for (int i = 0 ; i < 32 ; ++i)
+        printf(" %d ", (epReady >> i) & 1);
+    printf("\r\n"
+           "OWN    ");
+    for (int i = 0 ; i < 16 ; ++i)
+        printf(" %d  %d ", (bdt[EP_BDT_IDX(i, RX, 0)].info >> 7) & 1, (bdt[EP_BDT_IDX(i, TX, 0)].info >> 7) & 1);
+    printf("\r\n"
+           "Stall  ");
+    for (int i = 0 ; i < 16 ; ++i)
+        printf("  %s   ", (USB0->ENDPOINT[i].ENDPT & USB_ENDPT_EPSTALL_MASK) ? "S" : ".");
+    printf("\r\n\r\n");
+}
 #endif
+
+#endif