USB device stack, with KL25Z fixes for USB 3.0 hosts and sleep/resume interrupt handling

Dependents:   frdm_Slider_Keyboard idd_hw2_figlax_PanType idd_hw2_appachu_finger_chording idd_hw3_AngieWangAntonioDeLimaFernandesDanielLim_BladeSymphony ... more

Fork of USBDevice by mbed official

This is an overhauled version of the standard mbed USB device-side driver library, with bug fixes for KL25Z devices. It greatly improves reliability and stability of USB on the KL25Z, especially with devices using multiple endpoints concurrently.

I've had some nagging problems with the base mbed implementation for a long time, manifesting as occasional random disconnects that required rebooting the device. Recently (late 2015), I started implementing a USB device on the KL25Z that used multiple endpoints, and suddenly the nagging, occasional problems turned into frequent and predictable crashes. This forced me to delve into the USB stack and figure out what was really going on. Happily, the frequent crashes made it possible to track down and fix the problems. This new version is working very reliably in my testing - the random disconnects seem completely eradicated, even under very stressful conditions for the device.

Summary

  • Overall stability improvements
  • USB 3.0 host support
  • Stalled endpoint fixes
  • Sleep/resume notifications
  • Smaller memory footprint
  • General code cleanup

Update - 2/15/2016

My recent fixes introduced a new problem that made the initial connection fail most of the time on certain hosts. It's not clear if the common thread was a particular type of motherboard or USB chip set, or a specific version of Windows, or what, but several people ran into it. We tracked the problem down to the "stall" fixes in the earlier updates, which we now know weren't quite the right fixes after all. The latest update (2/15/2016) fixes this. It has new and improved "unstall" handling that so far works well with diverse hosts.

Race conditions and overall stability

The base mbed KL25Z implementation has a lot of problems with "race conditions" - timing problems that can happen when hardware interrupts occur at inopportune moments. The library shares a bunch of static variable data between interrupt handler context and regular application context. This isn't automatically a bad thing, but it does require careful coordination to make sure that the interrupt handler doesn't corrupt data that the other code was in the middle of updating when an interrupt occurs. The base mbed code, though, doesn't do any of the necessary coordination. This makes it kind of amazing that the base code worked at all for anyone, but I guess the interrupt rate is low enough in most applications that the glitch rate was below anyone's threshold to seriously investigate.

This overhaul adds the necessary coordination for the interrupt handlers to protect against these data corruptions. I think it's very solid now, and hopefully entirely free of the numerous race conditions in the old code. It's always hard to be certain that you've fixed every possible bug like this because they strike (effectively) at random, but I'm pretty confident: my test application was reliably able to trigger glitches in the base code in a matter of minutes, but the same application (with the overhauled library) now runs for days on end without dropping the connection.

Stalled endpoint fixes

USB has a standard way of handling communications errors called a "stall", which basically puts the connection into an error mode to let both sides know that they need to reset their internal states and sync up again. The original mbed version of the USB device library doesn't seem to have the necessary code to recover from this condition properly. The KL25Z hardware does some of the work, but it also seems to require the software to take some steps to "un-stall" the connection. (I keep saying "seems to" because the hardware reference material is very sketchy about all of this. Most of what I've figured out is from observing the device in action with a Windows host.) This new version adds code to do the necessary re-syncing and get the connection going again, automatically, and transparently to the user.

USB 3.0 Hosts

The original mbed code sometimes didn't work when connecting to hosts with USB 3.0 ports. This didn't affect every host, but it affected many of them. The common element seemed to be the Intel Haswell chip set on the host, but there may be other chip sets affected as well. In any case, the problem affected many PCs from the Windows 7 and 8 generation, as well as many Macs. It was possible to work around the problem by avoiding USB 3.0 ports - you could use a USB 2 port on the host, or plug a USB 2 hub between the host and device. But I wanted to just fix the problem and eliminate the need for such workarounds. This modified version of the library has such a fix, which so far has worked for everyone who's tried.

Sleep/resume notifications

This modified version also contains an innocuous change to the KL25Z USB HAL code to handle sleep and resume interrupts with calls to suspendStateChanged(). The original KL25Z code omitted these calls (and in fact didn't even enable the interrupts), but I think this was an unintentional oversight - the notifier function is part of the generic API, and other supported boards all implement it. I use this feature in my own application so that I can distinguish sleep mode from actual disconnects and handle the two conditions correctly.

Smaller memory footprint

The base mbed version of the code allocates twice as much memory for USB buffers as it really needed to. It looks like the original developers intended to implement the KL25Z USB hardware's built-in double-buffering mechanism, but they ultimately abandoned that effort. But they left in the double memory allocation. This version removes that and allocates only what's actually needed. The USB buffers aren't that big (128 bytes per endpoint), so this doesn't save a ton of memory, but even a little memory is pretty precious on this machine given that it only has 16K.

(I did look into adding the double-buffering support that the original developers abandoned, but after some experimentation I decided they were right to skip it. It just doesn't seem to mesh well with the design of the rest of the mbed USB code. I think it would take a major rewrite to make it work, and it doesn't seem worth the effort given that most applications don't need it - it would only benefit applications that are moving so much data through USB that they're pushing the limits of the CPU. And even for those, I think it would be a lot simpler to build a purely software-based buffer rotation mechanism.)

General code cleanup

The KL25Z HAL code in this version has greatly expanded commentary and a lot of general cleanup. Some of the hardware constants were given the wrong symbolic names (e.g., EVEN and ODD were reversed), and many were just missing (written as hard-coded numbers without explanation). I fixed the misnomers and added symbolic names for formerly anonymous numbers. Hopefully the next person who has to overhaul this code will at least have an easier time understanding what I thought I was doing!

Revision:
37:c5ac4ccf6597
Parent:
36:20bb47609697
Child:
38:072e12583e73
--- a/USBDevice/USBHAL_KL25Z.cpp	Tue Jan 05 05:22:50 2016 +0000
+++ b/USBDevice/USBHAL_KL25Z.cpp	Mon Jan 11 21:07:25 2016 +0000
@@ -16,160 +16,148 @@
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */
 
-#define DEBUG_PRINTF // enable debug messages
+#if defined(TARGET_KL25Z) | defined(TARGET_KL46Z) | defined(TARGET_K20D5M) | defined(TARGET_K64F)
 
-#if defined(TARGET_KL25Z) | defined(TARGET_KL46Z) | defined(TARGET_K20D5M) | defined(TARGET_K64F)
+//#define DEBUG
+#ifdef DEBUG
+#define printd(fmt, ...) printf(fmt, __VA_ARGS__)
+#else
+#define printd(fmt, ...)
+#endif
+
 
 #include "USBHAL.h"
 
+// Critical section controls.  This module uses a bunch of static variables,
+// and much of the code that accesses the statics can be called from either
+// normal application context or IRQ context.  Whenever a shared variable is
+// accessed from code that can run in an application context, we have to
+// protect against interrupts by entering a critical section.  These macros
+// enable and disable the USB IRQ if we're running in application context.
+// (They do nothing if we're already in interrupt context, because the
+// hardware interrupt controller won't generated another of the same IRQ
+// that we're already handling.  We could still be interrupted by a different,
+// higher-priority IRQ, but our shared variables are only shared within this
+// module, so they won't be affected by other interrupt handlers.)
+static int inIRQ;
+#define ENTER_CRITICAL_SECTION \
+    if (!inIRQ) \
+        NVIC_DisableIRQ(USB0_IRQn);
+#define EXIT_CRITICAL_SECTION \
+    if (!inIRQ) \
+        NVIC_EnableIRQ(USB0_IRQn);
+
+// static singleton instance pointer
 USBHAL * USBHAL::instance;
 
-// DMAERR has occurred
-bool USB_DMAERR;
-
-
-// Perform an operation atomically, by disabling interrupts.  This must be
-// used for any test-and-write operations in non-IRQ code, including increment, 
-// decrement, bit set, and bit clear, on variables that are also written 
-// within the IRQ handler.  (Operations *within* the IRQ handler don't require 
-// this, because regular code can't interrupt an interrupt handler.)  This
-// applies in particular to 'epComplete' and 'Data1' updates, which are both
-// bit vectors that are written with bitwise test-and-set operations in
-// regular code and IRQ code.
-#define atomic(_stm_) \
-    do { \
-        NVIC_DisableIRQ(USB0_IRQn); \
-        _stm_; \
-        NVIC_EnableIRQ(USB0_IRQn); \
-    } while (0)
-    
 // Convert physical endpoint number to register bit
 #define EP(endpoint) (1<<(endpoint))
 
-// Convert physical to logical
+// Convert physical endpoint number to logical endpoint number.
+// Each logical endpoint has two physical endpoints, one RX and 
+// one TX.  The physical endpoints are numbered in RX,TX pairs,
+// so the logical endpoint number is simply the physical endpoint
+// number divided by 2 (discarding the remainder).
 #define PHY_TO_LOG(endpoint)    ((endpoint)>>1)
 
-// Get endpoint direction
+// Get a physical endpoint's direction.  IN and OUT are from
+// the host's perspective, so from our perspective on the device,
+// IN == TX and OUT == RX.  The physical endpoints are in RX,TX
+// pairs, so the OUT/RX is the even numbered element of a pair
+// and the IN/TX is the odd numbered element.
 #define IN_EP(endpoint)     ((endpoint) & 1U ? true : false)
 #define OUT_EP(endpoint)    ((endpoint) & 1U ? false : true)
 
-#define BD_OWN_MASK        (1<<7)
-#define BD_DATA01_MASK     (1<<6)
-#define BD_KEEP_MASK       (1<<5)
-#define BD_NINC_MASK       (1<<4)
-#define BD_DTS_MASK        (1<<3)
-#define BD_STALL_MASK      (1<<2)
+// BDT status flags, defined by the SIE hardware.  These are
+// bits packed into the 'info' byte of a BDT entry.
+#define BD_OWN_MASK        (1<<7)       // OWN - hardware SIE owns the BDT (TX/RX in progress)
+#define BD_DATA01_MASK     (1<<6)       // DATA01 - DATA0/DATA1 bit for current TX/RX on endpoint
+#define BD_KEEP_MASK       (1<<5)       // KEEP - hardware keeps BDT ownership after token completes
+#define BD_NINC_MASK       (1<<4)       // NO INCREMENT - buffer location is a FIFO, so use same address for all bytes
+#define BD_DTS_MASK        (1<<3)       // DATA TOGGLE SENSING - hardware SIE checks for DATA0/DATA1 match during RX/TX
+#define BD_STALL_MASK      (1<<2)       // STALL - SIE issues STALL handshake in reply to any host access to endpoint
 
-// Endpoint direction.  These are relative to the DEVICE:
-//  TX = transmit to host = "IN" endpoint
-//  RX = receive from host = "OUT" endpoint
+// Endpoint direction
 #define TX    1
 #define RX    0
 
-// Double-buffering parity.  The hadrware SIE has a native double-buffering
-// scheme that switches back and forth between two buffers per physical
-// endpoint.  We disable this feature, so we effectively only use the EVEN
-// buffer for each endpoint.
+// Buffer parity.  The hardware has a double-buffering scheme where each
+// physical endpoint has two associated BDT entries, labeled EVEN and ODD.
+// We disable the double buffering, so only the EVEN buffers are used in
+// this implementation.
 #define EVEN  0
 #define ODD   1
 
-// Get the BDT index for a given endpoint.  'logep' is the LOGICAL
-// endpoint number; 'dir' is the direction (TX or RX).  'odd' is
-// the even/odd flag for the hardware SIE double-buffering system.
-// We don't actually use the double-buffering scheme, so this is 
-// essentially always EVEN throughout this implementation.
-#define EP_BDT_IDX(logep, dir, odd) ((4*(logep)) + (2*(dir)) + (1*(odd)))
+// Get the BDT index for a given logical endpoint, direction, and buffer parity
+#define EP_BDT_IDX(logep, dir, odd) (((logep) * 4) + (2 * (dir)) + (1 *  (odd)))
 
-// get the BDT for a given physical endpoint
-#define PEP_BDT_IDX(phyep, odd) ((2*(phyep)) + (1*(odd)))
+// Get the BDT index for a given physical endpoint and buffer parity
+#define PEP_BDT_IDX(phyep, odd)  (((phyep) * 2) + (1 * (odd)))
 
-
+// Token types reported in the BDT 'info' flags.  
+#define TOK_PID(idx)   ((bdt[idx].info >> 2) & 0x0F)
 #define SETUP_TOKEN    0x0D
 #define IN_TOKEN       0x09
 #define OUT_TOKEN      0x01
-#define TOK_PID(idx)   ((bdt[idx].info >> 2) & 0x0F)
 
-
-// Buffer Descriptor Table (BDT) entry.
-//
-// Each entry in the BDT uses this structure, which is defined by
-// the hardware SIE design.
+// Buffer Descriptor Table (BDT) entry.  This is the hardware-defined
+// memory structure for the shared memory block controlling an endpoint.
 typedef struct BDT {
     uint8_t   info;       // BD[0:7]
     uint8_t   dummy;      // RSVD: BD[8:15]
-    uint16_t  byte_count; // BD[16:25] reserved[26:31]
+    uint16_t  byte_count; // BD[16:32]
     uint32_t  address;    // Addr
-} __attribute__((packed)) BDT;
+} BDT;
 
-// The Buffer Descriptor Table is a contiguous block of BDT entries
-// (the structure defined above).  This memory block is shared with the
-// hardware SIE to coordinate endpoint send/receive operations between
-// the hardware and software.  The hardware design requires this block
-// to be aligned on a 512-byte boundary; the location of the block is
-// stored in a hardware register that we set during initialization.
-//
 // There are:
-//    * 16 logical endpoints (EP0 .. EP15)
-//    * 2 physical endpoints (TX and RX) per logical endpoint -> 32 physical endpoints
-//    * two buffers per physical endpoint (EVEN and ODD) -> 64 buffers
+//    * 16 bidirectional logical endpoints -> 32 physical endpoints
+//    * 2 BDT entries per endpoint (EVEN/ODD) -> 64 BDT entries
 __attribute__((__aligned__(512))) BDT bdt[NUMBER_OF_PHYSICAL_ENDPOINTS * 2];
+
+// Transfer buffers.  We allocate the transfer buffers and point the
+// SIE hardware to them via the BDT.  We disable hardware SIE's
+// double-buffering (EVEN/ODD) scheme, so we only allocate one buffer
+// per physical endpoint.
 uint8_t *endpoint_buffer[NUMBER_OF_PHYSICAL_ENDPOINTS];
 
-// The device's bus address
+// Allocated size of each endpoint buffer
+size_t epMaxPacket[NUMBER_OF_PHYSICAL_ENDPOINTS];
+
+// SET ADDRESS mode tracking.  The address assignment has to be done in a
+// specific order and with specific timing defined by the USB setup protocol 
+// standards.  To get the sequencing right, we set a flag when we get the
+// address message, and then set the address in the SIE when we're at the 
+// right subsequent packet step in the protocol exchange.  These variables
+// are just a place to stash the information between the time we receive the
+// data and the time we're ready to update the SIE register.
+static uint8_t set_addr = 0;
 static uint8_t addr = 0;
 
-// Mode flag indicating that the host will send our assigned address
-// in the next packet
-static uint8_t set_addr = 0;
+// Endpoint DATA0/DATA1 bits, packed as a bit vector.  Each endpoint's
+// bit is at (1 << endpoint number).  These track the current bit value
+// on the endpoint.  For TX endpoints, this is the bit for the LAST
+// packet we sent (so the next packet will be the inverse).  For RX
+// endpoints, this is the bit value we expect for the NEXT packet.
+// (Yes, it's inconsistent.)
+static uint32_t Data1  = 0x55555555;
 
-// Private data per endpoint
-struct {
-    uint32_t flags;     // flags set when endpoint was realized
-    uint32_t bufsiz;    // allocated buffer size for endpoint
-} epInfo[NUMBER_OF_PHYSICAL_ENDPOINTS];
-    
-// Endpoint completion flags (one for each endpoint).  These flags indicate
-// if the last read or write operation on an endpoint has completed.  It's 
-// set by the interrupt handler when a send/receive operation completes.  
-// It's cleared by the readResult/writeResult routines on retrieving a 
-// successful completion status.
+// Endpoint read/write completion flags, packed as a bit vector.  Each 
+// endpoint's bit is at (1 << endpoint number).  A 1 bit signifies that
+// the last read or write has completed (and hasn't had its result 
+// consumed yet).
 static volatile int epComplete = 0;
 
-// clear a completion flag - must be atomic, as the IRQ also manipulates these bits
-#define clear_completion(endpoint) atomic(epComplete &= ~EP(endpoint))
-    
-// Endpoint ready flags.  These indicate if endpoints are ready for their
-// next read/write operations.  These are set by the read/write routines, and
-// cleared by the interrupt handler.  Note that these are *almost* the
-// inverse of the epComplete flags, but not quite: these aren't consumed by
-// the status checks.  We use these to check for clearance before starting an
-// operation, to make sure we're not attempting to overwrite a previous one.
-static volatile uint32_t epReady = 0;
-
-// set/clear the ready flag - must be atomic, as the IRQ also manipulates these bits
-#define set_ready(endpoint) atomic(epReady |= EP(endpoint))
-#define clear_ready(endpoint) atomic(epReady &= ~EP(endpoint))
-
-
-// DATA0/DATA1 setting for next packet on each endpoint
-static volatile uint32_t Data1  = 0x55555555;
-
-// set DATA0/DATA1 status on an endpoint (must be atomic as the IRQ handler
-// manpulates this variable)
-#define set_DATA0(endpoint) atomic(Data1 &= ~EP(endpoint))
-#define set_DATA1(endpoint) atomic(Data1 |= EP(endpoint))
-#define toggle_DATA01(endpoint) atomic(Data1 ^= EP(endpoint))
-
-static uint32_t frameNumber() {
+static uint32_t frameNumber() 
+{
     return((USB0->FRMNUML | (USB0->FRMNUMH << 8)) & 0x07FF);
 }
 
-uint32_t USBHAL::endpointReadcore(uint8_t endpoint, uint8_t *buffer) {
+uint32_t USBHAL::endpointReadcore(uint8_t endpoint, uint8_t *buffer) 
+{
     return 0;
 }
 
-USBHAL::USBHAL(void) 
-{
+USBHAL::USBHAL(void) {
     // Disable IRQ
     NVIC_DisableIRQ(USB0_IRQn);
 
@@ -214,45 +202,47 @@
     // enable OTG clock
     SIM->SCGC4 |= SIM_SCGC4_USBOTG_MASK;
 
+    // Attach IRQ
+    instance = this;
+    NVIC_SetVector(USB0_IRQn, (uint32_t)&_usbisr);
+    NVIC_EnableIRQ(USB0_IRQn);
+
     // USB Module Configuration
     // Reset USB Module
     USB0->USBTRC0 |= USB_USBTRC0_USBRESET_MASK;
     while(USB0->USBTRC0 & USB_USBTRC0_USBRESET_MASK);
 
     // Set BDT Base Register
-    USB0->BDTPAGE1 = (uint8_t)(uint32_t(bdt) >> 8);
-    USB0->BDTPAGE2 = (uint8_t)(uint32_t(bdt) >> 16);
-    USB0->BDTPAGE3 = (uint8_t)(uint32_t(bdt) >> 24);
+    USB0->BDTPAGE1 = (uint8_t)((uint32_t)bdt>>8);
+    USB0->BDTPAGE2 = (uint8_t)((uint32_t)bdt>>16);
+    USB0->BDTPAGE3 = (uint8_t)((uint32_t)bdt>>24);
 
     // Clear interrupt flag
     USB0->ISTAT = 0xff;
 
     // USB Interrupt Enablers
-    USB0->INTEN = USB_INTEN_TOKDNEEN_MASK |
-                  USB_INTEN_SOFTOKEN_MASK |
-                  USB_INTEN_ERROREN_MASK  |
-                  USB_INTEN_SLEEPEN_MASK |
-                  USB_INTEN_RESUMEEN_MASK |
-                  USB_INTEN_USBRSTEN_MASK |
-                  (1 << 7);                   // stall mask
-
-    // Attach IRQ
-    instance = this;
-    NVIC_SetVector(USB0_IRQn, (uint32_t)&_usbisr);
-    NVIC_EnableIRQ(USB0_IRQn);
+    USB0->INTEN |= USB_INTEN_TOKDNEEN_MASK |
+                   USB_INTEN_SOFTOKEN_MASK |
+                   USB_INTEN_ERROREN_MASK  |
+                   USB_INTEN_SLEEPEN_MASK |
+                   USB_INTEN_RESUMEEN_MASK |
+                   USB_INTEN_USBRSTEN_MASK;
 
     // Disable weak pull downs
     USB0->USBCTRL &= ~(USB_USBCTRL_PDE_MASK | USB_USBCTRL_SUSP_MASK);
+
     USB0->USBTRC0 |= 0x40;
 }
 
-USBHAL::~USBHAL(void) { }
+USBHAL::~USBHAL(void) 
+{ 
+}
 
 void USBHAL::connect(void) 
 {
     // enable USB
     USB0->CTL |= USB_CTL_USBENSOFEN_MASK;
-    
+
     // Pull up enable
     USB0->CONTROL |= USB_CONTROL_DPPULLUPNONOTG_MASK;
 }
@@ -266,21 +256,28 @@
     USB0->CONTROL &= ~USB_CONTROL_DPPULLUPNONOTG_MASK;
 
     //Free buffers if required:
-    for (int i = 0 ; i < NUMBER_OF_PHYSICAL_ENDPOINTS ; i++) {
-        free(endpoint_buffer[i]);
-        endpoint_buffer[i] = NULL;
+    for (int i = 0 ; i < NUMBER_OF_PHYSICAL_ENDPOINTS ; i++) 
+    {
+        if (endpoint_buffer[i] != NULL)
+        {
+            free(endpoint_buffer[i]);
+            endpoint_buffer[i] = NULL;
+        }
     }
 }
 
-void USBHAL::configureDevice(void) {
+void USBHAL::configureDevice(void) 
+{
     // not needed
 }
 
-void USBHAL::unconfigureDevice(void) {
+void USBHAL::unconfigureDevice(void) 
+{
     // not needed
 }
 
-void USBHAL::setAddress(uint8_t address) {
+void USBHAL::setAddress(uint8_t address) 
+{
     // we don't set the address now otherwise the usb controller does not ack
     // we set a flag instead
     // see usbisr when an IN token is received
@@ -290,73 +287,74 @@
 
 bool USBHAL::realiseEndpoint(uint8_t endpoint, uint32_t maxPacket, uint32_t flags) 
 {
-    uint32_t handshake_flag;
-    uint32_t alo_size;
-
+    // validate the endpoint number
     if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
         return false;
 
+    // get the logical endpoint
     uint32_t log_endpoint = PHY_TO_LOG(endpoint);
     
-    if (flags & ISOCHRONOUS) {
-        handshake_flag = 0;
-        alo_size = 1023;
-    }
-    else {
-        handshake_flag = USB_ENDPT_EPHSHK_MASK;
-        alo_size = 64;
-    }
-    
-    // if we have a buffer, but the new buffer size is bigger, delete the old one
-    if (endpoint_buffer[endpoint] != 0 && epInfo[endpoint].bufsiz < maxPacket) {
-        free(endpoint_buffer[endpoint]);
-        endpoint_buffer[endpoint] = 0;
-        epInfo[endpoint].bufsiz = 0;
-    }
-    
-    // allocate a new buffer if we don't already have one    
-    if (endpoint_buffer[endpoint] == 0) {
-        endpoint_buffer[endpoint] = (uint8_t *)malloc(alo_size);
-        epInfo[endpoint].bufsiz = alo_size;
-    }
-    
-    // remember the realisation flags for the endpoint
-    epInfo[endpoint].flags = flags;
+    // Constrain the requested packet size to the maximum for the endpoint type.
+    // Full Speed USB allows up to 1023 bytes for isochronous endpoints and 64 bytes
+    // for bulk and interrupt endpoints.
+    uint32_t realMaxPacket = ((flags & ISOCHRONOUS) ? 1023 : 64);
+    if (maxPacket > realMaxPacket)
+        maxPacket = realMaxPacket;
+
+    // Use the HANDSHAKE flag for non-isochronous endpoints.  Don't use handshaking
+    // for an iso endpoint, since this type of endpoint is for applications like
+    // audio and video streaming where it's preferable to ignore lost packets and
+    // just carry on with the latest data.
+    uint32_t ctlFlags = 0;
+    if (!(flags & ISOCHRONOUS))
+        ctlFlags |= USB_ENDPT_EPHSHK_MASK;
+        
+    // figure the RX/TX based on the endpoint direction
+    ctlFlags |= (IN_EP(endpoint) ? USB_ENDPT_EPTXEN_MASK : USB_ENDPT_EPRXEN_MASK);
 
-    // Set DATA1 initially on the endpoint.  If it's an RX endpoint, we're about to start
-    // a read with DATA0, so the next read will be DATA1.  If it's a TX endpoint, we'll
-    // actually want to send DATA0 for the first packet, but for that we need the flag
-    // to be set to DATA1, since the write routine always inverts the flag *before*
-    // using it.
-    set_DATA1(endpoint);
-    
-    // the endpoint hasn't completed its first operation yet
-    clear_completion(endpoint);
-    
-    if (IN_EP(endpoint)) 
+    ENTER_CRITICAL_SECTION
     {
-        // IN endpt -> device to host (TX)
-        set_ready(endpoint);
-        USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag |        // ep handshaking (not if iso endpoint)
-                                              USB_ENDPT_EPTXEN_MASK;  // en TX (IN) tran
-        bdt[EP_BDT_IDX(log_endpoint, TX, EVEN)].info       = 0;
-        bdt[EP_BDT_IDX(log_endpoint, TX, ODD )].info       = 0;
-        bdt[EP_BDT_IDX(log_endpoint, TX, EVEN)].address = (uint32_t) endpoint_buffer[endpoint];
-        bdt[EP_BDT_IDX(log_endpoint, TX, ODD )].address = 0;
+        // if we don't already have a buffer that's big enough, allocate a new one
+        uint8_t *buf = endpoint_buffer[endpoint];
+        if (maxPacket > epMaxPacket[endpoint])
+        {
+            // free any existing buffer
+            if (buf != 0)
+                free(buf);
+                
+            // allocate a new one
+            epMaxPacket[endpoint] = maxPacket;
+            endpoint_buffer[endpoint] = buf = (uint8_t *)malloc(maxPacket);
+        }
+        
+        // Set up the BDT entry.  Note that we disable the hardware double-buffering
+        // scheme, so we only use the EVEN buffer for the endpoint.
+        int idx = PEP_BDT_IDX(endpoint, EVEN);
+        bdt[idx].info = 0;
+        bdt[idx].address = (uint32_t)buf;
+        
+        // Set the endpoint flags.  Note that these bits are additive, since the
+        // endpoint register represents the logical endpoint, which is the combination
+        // of the physical IN and OUT endpoints.
+        USB0->ENDPOINT[log_endpoint].ENDPT |= ctlFlags;
+    
+        // If this is an OUT endpoint, queue the first read on the endpoint by
+        // handing ownership of the BDT to the SIE.
+        if (OUT_EP(endpoint))
+        {
+            bdt[idx].byte_count = maxPacket;
+            bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
+        }
+    
+        // Set DATA1 on the endpoint.  For RX endpoints, we just queued up our first
+        // read, which will always be a DATA0 packet, so the next read will use DATA1.
+        // For TX endpoints, we always flip the bit *before* sending the packet, so
+        // (counterintuitively) we need to set the DATA1 bit now to send DATA0 in the
+        // next packet.  So in either case, we want DATA1 initially.
+        Data1 |= (1 << endpoint);
     }
-    else 
-    {
-        // OUT endpt -> host to device (RX)
-        clear_ready(endpoint);
-        USB0->ENDPOINT[log_endpoint].ENDPT |= handshake_flag |        // ep handshaking (not if iso endpoint)
-                                              USB_ENDPT_EPRXEN_MASK;  // en RX (OUT) tran.
-        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].byte_count = maxPacket;
-        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].address    = (uint32_t) endpoint_buffer[endpoint];
-        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].address    = 0;
-        bdt[EP_BDT_IDX(log_endpoint, RX, EVEN)].info       = BD_OWN_MASK | BD_DTS_MASK;
-        bdt[EP_BDT_IDX(log_endpoint, RX, ODD )].info       = 0;
-    }
-
+    EXIT_CRITICAL_SECTION
+    
     // success
     return true;
 }
@@ -365,270 +363,212 @@
 void USBHAL::EP0setup(uint8_t *buffer) 
 {
     uint32_t sz;
-    if (endpointReadResult(EP0OUT, buffer, &sz) != EP_COMPLETED)
-        printf("EP0setup not ready\r\n"); // $$$
-    if (sz != 8) 
-        printf("EP0setup - wrong size! %d bytes, expected 8 bytes\r\n", sz); // $$$
+    endpointReadResult(EP0OUT, buffer, &sz);
 }
 
 void USBHAL::EP0readStage(void) 
 {
-    uint32_t idx = EP_BDT_IDX(PHY_TO_LOG(EP0OUT), RX, 0);
-    set_DATA0(EP0OUT);                              // set DATA0 for the next packet
-    { // if (epReady & EP(EP0OUT)) {
-        bdt[idx].byte_count = MAX_PACKET_SIZE_EP0;
-        bdt[idx].info = (BD_DTS_MASK | BD_OWN_MASK);    // start the read
+    if (!(bdt[0].info & BD_OWN_MASK))
+    {
+        Data1 &= ~1UL;  // set DATA0
+        bdt[0].byte_count = MAX_PACKET_SIZE_EP0;
+        bdt[0].info = (BD_DTS_MASK | BD_OWN_MASK);
     }
-#ifdef DEBUG_PRINTF
- //   else printf("!!! EP0readStage() not ready !!!\r\n");
-#endif
 }
 
 void USBHAL::EP0read(void) 
 {
+    if (!bdt[0].info & BD_OWN_MASK)
+        bdt[0].byte_count = MAX_PACKET_SIZE_EP0;
 }
 
 uint32_t USBHAL::EP0getReadResult(uint8_t *buffer) 
 {
     uint32_t sz;
-    while (endpointReadResult(EP0OUT, buffer, &sz) == EP_PENDING) ;
+    endpointReadResult(EP0OUT, buffer, &sz);
     return sz;
 }
 
-void USBHAL::EP0write(uint8_t *buffer, uint32_t size) {
-    while (endpointWrite(EP0IN, buffer, size) == EP_BUSY) ;
+void USBHAL::EP0write(uint8_t *buffer, uint32_t size) 
+{
+    endpointWrite(EP0IN, buffer, size);
 }
 
-void USBHAL::EP0getWriteResult(void) {
+void USBHAL::EP0getWriteResult(void) 
+{
 }
 
 void USBHAL::EP0stall(void) 
 {
+    printd("EP0 stall!\r\n");
     stallEndpoint(EP0OUT);
 }
 
 EP_STATUS USBHAL::endpointRead(uint8_t endpoint, uint32_t maximumSize) 
 {
+    // We always start a new read when we fetch the result of the
+    // previous read, so we don't have to do anything here.  Simply
+    // indicate that the read is pending so that the caller can proceed
+    // to check the results.
     return EP_PENDING;
 }
 
 EP_STATUS USBHAL::endpointReadResult(uint8_t endpoint, uint8_t *buffer, uint32_t *bytesRead) 
 {
-    // validate the endpoint number
-    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
+    // validate the endpoint number and direction
+    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS || !OUT_EP(endpoint))
         return EP_INVALID;
 
-    // we can only read an OUT endpoint
-    if (IN_EP(endpoint))
-        return EP_INVALID;
-
-    // get the logical endpoint number
+    // get the logical endpoint
     uint32_t log_endpoint = PHY_TO_LOG(endpoint);
-
+    
+    // get the mode - it's isochronous if it doesn't have the handshake flag
+    bool iso = !(USB0->ENDPOINT[log_endpoint].ENDPT & USB_ENDPT_EPHSHK_MASK);
+    
     // get the BDT index
     int idx = EP_BDT_IDX(log_endpoint, RX, 0);
- 
-#if 1 // $$$$   
-#ifdef DEBUG_PRINTF
-    if (endpoint == EP0OUT && !(epReady & EP(endpoint))) printf("!!! EP0 read not ready\r\n");
-#endif
-    // if the endpoint isn't ready, the read is pending
-    if (!(epReady & EP(endpoint)))
+        
+    // If the "complete" flag isn't set, the read is still pending in the SIE.
+    // This doesn't apply the isochronous endpoints, since we don't get TOKDNE
+    // interrupts on those (we use the SOF signal instead).  It also doesn't
+    // apply to endpoint 0, since that doesn't use the epComplete mechanism
+    // at all (necessary because we handle all transactions on this endpoint
+    //  in IRQ context).  For EP0, just make sure the hardware doesn't still
+    // own the BDT - if it does, the last read hasn't completed yet.
+    if (log_endpoint == 0)
+    {
+        // control endpoint - just make sure we own the BDT
+        if (bdt[idx].info & BD_OWN_MASK)
+            return EP_PENDING;
+    }
+    else if (!iso && !(epComplete & EP(endpoint)))
         return EP_PENDING;
-#else
-    // If the endpoint isn't ready, the read is still pending.  Ignore this
-    // for EP0, since the control procotol inherently serializes packets on
-    // both the RX and TX endpoints.
-    if (endpoint != EP0OUT && !(epReady & EP(endpoint)))
-        return EP_PENDING;
-#endif
+    
+    // get the buffer
+    uint8_t *ep_buf = endpoint_buffer[endpoint];
 
-    // make sure we're marked as complete (except for isochronous endpoints)
-    bool iso = epInfo[log_endpoint].flags & ISOCHRONOUS;
-    if ((log_endpoint != 0) && !iso && !(epComplete & EP(endpoint)))
-        return EP_PENDING;
-
-    // this consumes the read status - clear the completion flag to prepare for the next read
-    clear_completion(endpoint);
+    ENTER_CRITICAL_SECTION
+    {
+        // get the packet size from the BDT
+        uint32_t sz  = bdt[idx].byte_count;
     
-    // the new operation is pending, so the endpoint is not ready
-    clear_ready(endpoint);
-
-    // copy the data from the hardware buffer to the caller's buffer
-    uint8_t *ep_buf = (uint8_t *)bdt[idx].address;
-    uint32_t sz = *bytesRead = bdt[idx].byte_count;
-    for (uint32_t n = 0; n < sz; n++)
-        buffer[n] = ep_buf[n];
+        // note if it's a SETUP token    
+        bool setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
     
-    // Figure the DATA0/DATA1 value for the next packet.  If we have a
-    // SETUP token with no data stage, it's DATA0.  Otherwise we just
-    // flip the last value, assuming the new token matches what we expect.
-    bool setup = (log_endpoint == 0 && TOK_PID(idx) == SETUP_TOKEN);
-    if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1)) {
-        if (setup && (buffer[6] == 0)) {
-            set_DATA0(EP0OUT);          // no setup data stage, so set DATA0
+        // copy the data
+        memcpy(buffer, ep_buf, sz);
+        *bytesRead = sz;
+
+        // Figure the DATA0/DATA1 bit for the next packet received on this
+        // endpoint.  The bit normally toggles on each packet, but it's
+        // special for SETUP packets on endpoint 0.  The next OUT packet
+        // after a SETUP packet with no data stage is always DATA0, even
+        // if the SETUP packet was also DATA0.
+        if (((Data1 >> endpoint) & 1) == ((bdt[idx].info >> 6) & 1)) 
+        {
+            if (setup && (buffer[6] == 0))  // if no setup data stage,
+                Data1 &= ~1UL;              // set DATA0
+            else
+                Data1 ^= (1 << endpoint);   // otherwise just toggle the last bit
         }
-        else {
-            toggle_DATA01(endpoint);    // in other cases, toggle the DATA0/DATA1 status on each packet 
-        }
-    }
 
-    // hand off the BDT to start the next read
-    bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
-
-    // If this is a SETUP packet, clear the "suspend token busy" hardware flag.
-    // The SIE sets this bit during setup to allow non-interrupted processing.
-    if (setup)
+        // set up the BDT entry to receive the next packet, and hand it to the SIE to fill    
+        bdt[idx].byte_count = epMaxPacket[endpoint];
+        bdt[idx].info = BD_DTS_MASK | BD_OWN_MASK | ((Data1 >> endpoint) << 6);
+    
+        // clear the SUSPEND TOKEN BUSY flag to allow the SIE to continue processing tokens
         USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
     
-    // operation completed
+        // clear the completion flag
+        epComplete &= ~EP(endpoint);
+    }
+    EXIT_CRITICAL_SECTION
+        
+    // the read is completed
     return EP_COMPLETED;
 }
 
 EP_STATUS USBHAL::endpointWrite(uint8_t endpoint, uint8_t *data, uint32_t size) 
 {
-    // validate the endpoint number
-    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS)
-        return EP_INVALID;
-
-    // if write on a OUT endpoint -> error
-    if (OUT_EP(endpoint))
+    // validate the endpoint number and direction
+    if (endpoint >= NUMBER_OF_PHYSICAL_ENDPOINTS || !IN_EP(endpoint))
         return EP_INVALID;
 
-    // get the BDT entry for the endpoint
+    // get the BDT index
     int idx = EP_BDT_IDX(PHY_TO_LOG(endpoint), TX, 0);
+    
+    // get the buffer pointer
+    uint8_t *buf = endpoint_buffer[endpoint];
 
-#if 0
-#ifdef DEBUG_PRINTF
-    if (endpoint == EP0IN && !(epReady & EP(endpoint))) printf("!!! EP0 write not ready\r\n");
-#endif
-    // $$$
-    // if the endpoint isn't ready, return BUSY, since we have an outstanding
-    // write that hasn't completed yet
-    if (!(epReady & EP(endpoint)))
-        return EP_BUSY;
-#else
-    // If the endpoint isn't marked as ready, return BUSY, since we have an
-    // outstanding write on the endpoint that hasn't completed yet.  Skip this
-    // test for EP0, since EP0 is handled entirely through interrupt callbacks
-    // and packet exchange is inherently serialized by the protocol.  
-    if (endpoint != EP0IN && !(epReady & EP(endpoint)))
-        return EP_BUSY;
-#endif
+    ENTER_CRITICAL_SECTION
+    {
+        // copy the data to the hardware buffer
+        bdt[idx].byte_count = size;
+        memcpy(buf, data, size);
 
-    // copy the bytes to the endpoint hardware buffer
-    uint8_t *ep_buf = (uint8_t *)bdt[idx].address;
-    bdt[idx].byte_count = size;
-    for (uint32_t n = 0 ; n < size ; n++)
-        ep_buf[n] = data[n];
+        // flip the DATA1 bit before sending    
+        Data1 ^= (1 << endpoint);
 
-    // toggle DATA0/DATA1 on each packet
-    toggle_DATA01(endpoint);
-    
-    // mark the endpoint as not ready and not complete
-    clear_completion(endpoint);
-    clear_ready(endpoint);
+        // hand the BDT to the SIE hardware, and set the current DATA1 bit
+        bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | ((Data1 >> endpoint) << 6);
+    }
+    EXIT_CRITICAL_SECTION
 
-    // hand the buffer to the SIE
-    bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK | (((Data1 >> endpoint) & 1) << 6);
-
-    // the packet is now pending
+    // the operation is now pending
     return EP_PENDING;
 }
 
 EP_STATUS USBHAL::endpointWriteResult(uint8_t endpoint) 
 {
-    // if the endpoint is ready, and the completion flag is set, we're done
-    uint32_t mask = EP(endpoint);
-    if ((epReady & mask) && (epComplete & mask)) {
-        clear_completion(endpoint);
-        return EP_COMPLETED;
+    EP_STATUS result = EP_PENDING;
+
+    ENTER_CRITICAL_SECTION
+    {
+        if (epComplete & EP(endpoint)) {
+            epComplete &= ~EP(endpoint);
+            result = EP_COMPLETED;
+        }
     }
+    EXIT_CRITICAL_SECTION
     
-    return EP_PENDING;
+    return result;
 }
 
 void USBHAL::stallEndpoint(uint8_t endpoint) 
 {
-#ifdef DEBUG_PRINTF
-    printf("Stall %d\r\n", endpoint);
-#endif
-    int idx = PEP_BDT_IDX(endpoint, EVEN);
-    if ((epReady & EP(endpoint)) && !(bdt[idx].info & BD_OWN_MASK))
-        bdt[idx].info |= BD_STALL_MASK;
-    else
-        USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
+    USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT |= USB_ENDPT_EPSTALL_MASK;
 }
 
 void USBHAL::unstallEndpoint(uint8_t endpoint) 
 {
-#ifdef DEBUG_PRINTF
-    printf("Unstall %d\r\n", endpoint);
-#endif
-
-    // get the physical endpoint
-    int logep = PHY_TO_LOG(endpoint);
-    
-    // figure the new endpoint control register flags
-    uint8_t f = IN_EP(endpoint) ? USB_ENDPT_EPTXEN_MASK : USB_ENDPT_EPRXEN_MASK;
-    if (logep != 0)
+    printd("unstall endpoint %d %s\r\n", endpoint>>1,endpoint&1?"TX":"RX");
+    ENTER_CRITICAL_SECTION
     {
-        // add the HANDSHAKE flag if it's not isochronous
-        if (!(epInfo[endpoint].flags & ISOCHRONOUS))
-            f |= USB_ENDPT_EPHSHK_MASK;
+        USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
+        int idx = PEP_BDT_IDX(endpoint, 0);
+        bdt[idx].info &= ~(BD_OWN_MASK | BD_STALL_MASK | BD_DATA01_MASK);
+        Data1 &= ~(1 << endpoint);
     }
-            
-    // set the new flags
-    USB0->ENDPOINT[logep].ENDPT = f;
-    
-    // clear the STALL bit in the endponit control register
-    USB0->ENDPOINT[logep].ENDPT &= ~USB_ENDPT_EPSTALL_MASK;
-    
-    // clear the stall bit in the BDT, and take ownership
-    int idx = PEP_BDT_IDX(endpoint, EVEN);
-    bdt[idx].info &= ~(BD_OWN_MASK | BD_STALL_MASK);
-    
-    // the endpoint is now NOT COMPLETED
-    clear_completion(endpoint);
-    
-    // Set the data flag to DATA1 for the endpoint.  If it's TX, the next
-    // packet sent needs to be DATA0, and write flips the bit first, so we
-    // want it to be DATA1 now.  If it's RX, we're about to start a read
-    // with DATA0, so the next read will be back to DATA1.
-    set_DATA1(endpoint);
-
-    // check the endpoint type for final settings
-    if (IN_EP(endpoint))
-    {
-        // TX endpoint - mark as ready so that next write operation can proceed,
-        // and set the DATA1 flag so that the next packet will be sent as DATA0
-        // (the write opereation flips the bit before sending)
-        set_ready(endpoint);
-    }
-    else
-    {
-        // RX endpoint - start next read operation with DATA0
-        if (epReady & EP(endpoint))
-        {
-            clear_ready(endpoint);
-            bdt[idx].byte_count = epInfo[endpoint].bufsiz;
-            bdt[idx].info = BD_OWN_MASK | BD_DTS_MASK;
-        }
-    }
+    EXIT_CRITICAL_SECTION
 }
 
 bool USBHAL::getEndpointStallState(uint8_t endpoint) 
 {
-    return (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK) != 0;
+    uint8_t stall = (USB0->ENDPOINT[PHY_TO_LOG(endpoint)].ENDPT & USB_ENDPT_EPSTALL_MASK);
+    return (stall) ? true : false;
 }
 
-void USBHAL::remoteWakeup(void) {
+void USBHAL::remoteWakeup(void) 
+{
     // [TODO]
 }
 
 
-void USBHAL::_usbisr(void) {
+void USBHAL::_usbisr(void) 
+{
+    inIRQ = true;
     instance->usbisr();
+    inIRQ = false;
 }
 
 
@@ -638,32 +578,22 @@
     uint8_t istat = USB0->ISTAT;
 
     // reset interrupt
-    if (istat & USB_ISTAT_USBRST_MASK) 
-    {
-#ifdef DEBUG_PRINTF
-       // printf("\r\n>>> USB RESET\r\n");
-#endif
-
+    if (istat & USB_ISTAT_USBRST_MASK) {
+        
         // disable all endpt
-        for(i = 0 ; i < 16 ; i++) {
+        for(i = 0 ; i < 16 ; i++)
             USB0->ENDPOINT[i].ENDPT = 0x00;
-        }
 
         // enable control endpoint
         realiseEndpoint(EP0OUT, MAX_PACKET_SIZE_EP0, 0);
         realiseEndpoint(EP0IN, MAX_PACKET_SIZE_EP0, 0);
 
         Data1 = 0x55555555;
-        epComplete = 0;
-        
-        // Reset the SIE even/odd buffer state.  We keep this bit
-        // set throughout the session, which makes the EVEN buffer
-        // the only one that's ever used, effectively disabling the
-        // double-buffering system.
         USB0->CTL |=  USB_CTL_ODDRST_MASK;
 
-        USB0->ERRSTAT =  0xBF;  // clear all error flags
-        USB0->ERREN   =  0x9F; // 0xBF;  // enable error interrupt sources 
+        USB0->ISTAT   =  0xFF;  // clear all interrupt status flags
+        USB0->ERRSTAT =  0xFF;  // clear all error flags
+        USB0->ERREN   =  0xFF;  // enable error interrupt sources
         USB0->ADDR    =  0x00;  // set default address
         
         // notify upper layers of the bus reset
@@ -671,175 +601,115 @@
         
         // we're not suspended
         suspendStateChanged(0);
-
-        // clear all interrupt status flags and return        
-        USB0->ISTAT = 0xFF;
+        
+        // return now - do no more processing on a RESET interrupt
         return;
     }
 
     // resume interrupt
-    if (istat & USB_ISTAT_RESUME_MASK) {
+    if (istat & USB_ISTAT_RESUME_MASK) 
+    {
         suspendStateChanged(0);
         USB0->ISTAT = USB_ISTAT_RESUME_MASK;
-        return;
     }
 
     // SOF interrupt
-    if (istat & USB_ISTAT_SOFTOK_MASK) {
-        // SOF event, read frame number
+    if (istat & USB_ISTAT_SOFTOK_MASK) 
+    {
+        // Read frame number and signal the SOF event to the callback
         SOF(frameNumber());
         USB0->ISTAT = USB_ISTAT_SOFTOK_MASK;
-        return;
     }
 
     // stall interrupt
     if (istat & USB_ISTAT_STALL_MASK)
     {
-#ifdef DEBUG_PRINTF
-        printf("\r\n>>>>STALL INTERRUPT\r\n");
-#endif
-        if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK) {
+        // if the control endpoint (EP 0) is stalled, unstall it
+        if (USB0->ENDPOINT[0].ENDPT & USB_ENDPT_EPSTALL_MASK)
+        {
+            unstallEndpoint(EP0OUT);
             unstallEndpoint(EP0IN);
-            unstallEndpoint(EP0OUT);
         }
+        
+        // clear the SUSPEND flag to allow token processing to continue
         USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
-        USB0->ISTAT = USB_ISTAT_STALL_MASK;
-        return;
+        USB0->ISTAT |= USB_ISTAT_STALL_MASK;
     }
 
-    // TOKEN DONE interrupt
-    if (istat & USB_ISTAT_TOKDNE_MASK)
+    // token interrupt
+    if (istat & USB_ISTAT_TOKDNE_MASK) 
     {
-        uint32_t stat = USB0->STAT;
-        uint32_t num  = (stat >> 4) & 0x0F;
-        uint32_t dir  = (stat >> 3) & 0x01;
-        uint32_t ev_odd = (stat >> 2) & 0x01;
+        uint32_t num  = (USB0->STAT >> 4) & 0x0F;
+        uint32_t dir  = (USB0->STAT >> 3) & 0x01;
+        uint32_t ev_odd = (USB0->STAT >> 2) & 0x01;
         int endpoint = (num << 1) | dir;
-        int idx = EP_BDT_IDX(num, dir, ev_odd);
-        int pid = TOK_PID(idx);
-        int len = bdt[idx].byte_count;
-        
-        if (ev_odd) printf("!!! odd BDT entry???\r\n");//$$$
 
-        // endpoint is ready for next operation
-        epReady |= EP(endpoint);
-        // if (num == 0) epReady |= EP(EP0OUT) | EP(EP0IN);
-        
-        //$$$for (int i = 0 ; (bdt[idx].info & BD_OWN_MASK) && i < 10000000 ; ++i) ;
-        //$$$static int unrel;
-       //$$$ if (bdt[idx].info & BD_OWN_MASK) { printf("\r\nIRQ: BDT wasn't released from SIE to CPU (ep=%d)\r\n", endpoint); } 
-        if (pid == SETUP_TOKEN && len != 8) {
-            pid = OUT_TOKEN;
-          //$$$  printf("IRQ: SETUP packet has wrong size (%d)\r\n", bdt[EP_BDT_IDX(num, dir, ev_odd)].byte_count);//$$$
-        }
-            
         // setup packet
-        if (num == 0 && pid == SETUP_TOKEN)
+        if (num == 0 && dir == RX && TOK_PID(EP_BDT_IDX(num, dir, ev_odd)) == SETUP_TOKEN) 
         {
             Data1 &= ~0x02;
-            //bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~BD_OWN_MASK;
-            //bdt[EP_BDT_IDX(0, TX, ODD )].info &= ~BD_OWN_MASK;
-
+            bdt[EP_BDT_IDX(0, TX, EVEN)].info &= ~BD_OWN_MASK;
+            bdt[EP_BDT_IDX(0, TX, ODD)].info  &= ~BD_OWN_MASK;
+            
             // EP0 SETUP event (SETUP data received)
             EP0setupCallback();
         } 
-        else if (pid == OUT_TOKEN)
+        else 
         {
             // OUT packet
-            if (num == 0)
-                EP0out();
-            else 
+            if (TOK_PID((EP_BDT_IDX(num, dir, ev_odd))) == OUT_TOKEN) 
             {
-                epComplete |= EP(endpoint);
-                if ((instance->*(epCallback[endpoint - 2]))())
-                    epComplete &= ~EP(endpoint);
-            }
-        }
-        else if (pid == IN_TOKEN)
-        {
-            // IN packet
-            if (num == 0) 
-            {
-                EP0in();
-                if (set_addr == 1) 
-                {
-                    USB0->ADDR = addr & 0x7F;
-                    set_addr = 0;
+                if (num == 0)
+                    EP0out();
+                else {
+                    epComplete |= EP(endpoint);
+                    if ((instance->*(epCallback[endpoint - 2]))()) {
+                        epComplete &= ~EP(endpoint);
+                    }
                 }
             }
-            else 
+
+            // IN packet
+            if (TOK_PID((EP_BDT_IDX(num, dir, ev_odd))) == IN_TOKEN) 
             {
-                epComplete |= EP(endpoint);
-                if ((instance->*(epCallback[endpoint - 2]))())
-                    epComplete &= ~EP(endpoint);
+                if (num == 0) {
+                    EP0in();
+                    if (set_addr == 1) {
+                        USB0->ADDR = addr & 0x7F;
+                        set_addr = 0;
+                    }
+                }
+                else {
+                    epComplete |= EP(endpoint);
+                    if ((instance->*(epCallback[endpoint - 2]))()) {
+                        epComplete &= ~EP(endpoint);
+                    }
+                }
             }
         }
 
+        USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
         USB0->ISTAT = USB_ISTAT_TOKDNE_MASK;
-        return;
     }
 
     // sleep interrupt
     if (istat & USB_ISTAT_SLEEP_MASK) 
     {
-#ifdef DEBUG_PRINTF
-        //printf("\r\n>>> USB SLEEP INTERRUPT\r\n");
-#endif
         suspendStateChanged(1);
-        USB0->ISTAT = USB_ISTAT_SLEEP_MASK;
-        return;
+        USB0->ISTAT |= USB_ISTAT_SLEEP_MASK;
     }
 
     // error interrupt
     if (istat & USB_ISTAT_ERROR_MASK) 
     {
-        uint32_t errstat = USB0->ERRSTAT;
-#ifdef DEBUG_PRINTF
-        printf("\r\n>>>>ERROR INTERRUPT: ERRSTAT=%x\r\n", errstat);
-#endif
-        if (errstat & 0x20)
-            USB_DMAERR = true;
-
-        // reset the error status
-        USB0->ERRSTAT = 0xBF;
-        
-        // allow the SIE to continue token processing
+        // reset all error status bits, and clear the SUSPEND flag to allow
+        // token processing to continue
+        USB0->ERRSTAT = 0xFF;
         USB0->CTL &= ~USB_CTL_TXSUSPENDTOKENBUSY_MASK;
-
-        // reset the interrupt
-        USB0->ISTAT = USB_ISTAT_ERROR_MASK;
-        return;
+        USB0->ISTAT |= USB_ISTAT_ERROR_MASK;
     }
 }
 
-
-#ifdef DEBUG_PRINTF
-void USBDeviceStatusDump()
-{
-    printf("USBDevice status:\r\n");
-    printf("        0  0  1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10 11 11 12 12 13 13 14 14 15 15\r\n");
-    printf("       RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX RX TX\r\n");
-    printf("Data1  ");
-    for (int i = 0 ; i < 32 ; ++i)
-        printf(" %d ", (Data1 >> i) & 1);
-    printf("\r\n"
-           "Cmplt  ");
-    for (int i = 0 ; i < 32 ; ++i)
-        printf(" %d ", (epComplete >> i) & 1);
-    printf("\r\n"
-           "Ready  ");
-    for (int i = 0 ; i < 32 ; ++i)
-        printf(" %d ", (epReady >> i) & 1);
-    printf("\r\n"
-           "OWN    ");
-    for (int i = 0 ; i < 16 ; ++i)
-        printf(" %d  %d ", (bdt[EP_BDT_IDX(i, RX, 0)].info >> 7) & 1, (bdt[EP_BDT_IDX(i, TX, 0)].info >> 7) & 1);
-    printf("\r\n"
-           "Stall  ");
-    for (int i = 0 ; i < 16 ; ++i)
-        printf("  %s   ", (USB0->ENDPOINT[i].ENDPT & USB_ENDPT_EPSTALL_MASK) ? "S" : ".");
-    printf("\r\n\r\n");
-}
-#endif
+void USBDeviceStatusDump() { }
 
 #endif