Mirror with some correction
Dependencies: mbed FastIO FastPWM USBDevice
Diff: main.cpp
- Revision: 54:fd77a6b2f76c
- Parent: 53:9b2611964afc
- Child: 55:4db125cd11a0
--- a/main.cpp Fri Apr 22 17:58:35 2016 +0000 +++ b/main.cpp Sat Apr 30 17:43:38 2016 +0000 @@ -291,9 +291,9 @@ for (;;) { diagLED(1, 0, 0); - wait(.2); + wait_us(200000); diagLED(1, 0, 1); - wait(.2); + wait_us(200000); } } @@ -1713,21 +1713,212 @@ bool waitForConnect, bool enableJoystick, bool useKB) : USBJoystick(vendor_id, product_id, product_release, waitForConnect, enableJoystick, useKB) { - suspended_ = false; + sleeping_ = false; + reconnectPending_ = false; + timer_.start(); + } + + // show diagnostic LED feedback for connect state + void diagFlash() + { + if (!configured() || sleeping_) + { + // flash once if sleeping or twice if disconnected + for (int j = isConnected() ? 1 : 2 ; j > 0 ; --j) + { + // short red flash + diagLED(1, 0, 0); + wait_us(50000); + diagLED(0, 0, 0); + wait_us(50000); + } + } } // are we connected? int isConnected() { return configured(); } - // Are we in suspend mode? - int isSuspended() const { return suspended_; } + // Are we in sleep mode? If true, this means that the hardware has + // detected no activity on the bus for 3ms. This happens when the + // cable is physically disconnected, the computer is turned off, or + // the connection is otherwise disabled. + bool isSleeping() const { return sleeping_; } + + // If necessary, attempt to recover from a broken connection. + // + // This is a hack, to work around an apparent timing bug in the + // KL25Z USB implementation that I haven't been able to solve any + // other way. + // + // The issue: when we have an established connection, and the + // connection is broken by physically unplugging the cable or by + // rebooting the PC, the KL25Z sometimes fails to reconnect when + // the physical connection is re-established. The failure is + // sporadic; I'd guess it happens about 25% of the time, but I + // haven't collected any real statistics on it. 
+ // + // The proximate cause of the failure is a deadlock in the SETUP + // protocol between the host and device that happens around the + // point where the PC is requesting the configuration descriptor. + // The exact point in the protocol where this occurs varies slightly; + // it can occur a message or two before or after the Get Config + // Descriptor packet. No matter where it happens, the nature of + // the deadlock is the same: the PC thinks it sees a STALL on EP0 + // from the device, so it terminates the connection attempt, which + // stops further traffic on the cable. The KL25Z USB hardware sees + // the lack of traffic and triggers a SLEEP interrupt (a misnomer + // for what should have been called a BROKEN CONNECTION interrupt). + // Both sides simply stop talking at this point, so the connection + // is effectively dead. + // + // The strange thing is that, as far as I can tell, the KL25Z isn't + // doing anything to trigger the STALL on its end. Both the PC + // and the KL25Z are happy up until the very point of the failure + // and show no signs of anything wrong in the protocol exchange. + // In fact, every detail of the protocol exchange up to this point + // is identical to every successful exchange that does finish the + // whole setup process successfully, on both the KL25Z and Windows + // sides of the connection. I can't find any point of difference + // between successful and unsuccessful sequences that suggests why + // the fateful message fails. This makes me suspect that whatever + // is going wrong is inside the KL25Z USB hardware module, which + // is a pretty substantial black box - it has a lot of internal + // state that's inaccessible to the software. 
Further bolstering + // this theory is a little experiment where I found that I could + // reproduce the exact sequence of events of a failed reconnect + // attempt in an *initial* connection, which is otherwise 100% + // reliable, by inserting a little bit of artificial time padding + // (200us per event) into the SETUP interrupt handler. My + // hypothesis is that the STALL event happens because the KL25Z + // USB hardware is too slow to respond to a message. I'm not + // sure why this would only happen after a disconnect and not + // during the initial connection; maybe there's some reset work + // in the hardware that takes a substantial amount of time after + // a disconnect. + // + // The solution: the problem happens during the SETUP exchange, + // after we've been assigned a bus address. It only happens on + // some percentage of connection requests, so if we can simply + // start over when the failure occurs, we'll eventually succeed + // simply because not every attempt fails. The ideal would be + // to get the success rate up to 100%, but I can't figure out how + // to fix the underlying problem, so this is the next best thing. + // + // We can detect when the failure occurs by noticing when a SLEEP + // interrupt happens while we have an assigned bus address. + // + // To start a new connection attempt, we have to make the *host* + // try again. The logical connection is initiated solely by the + // host. Fortunately, it's easy to get the host to initiate the + // process: if we disconnect on the device side, it effectively + // makes the device look to the PC like it's electrically unplugged. + // When we reconnect on the device side, the PC thinks a new device + // has been plugged in and initiates the logical connection setup. + // We have to remain disconnected for a macroscopic interval for + // this to happen - 5ms seems to do the trick. + // + // Here's the full algorithm: + // + // 1. 
In the SLEEP interrupt handler, if we have a bus address, + // we disconnect the device. This happens in ISR context, so we + // can't wait around for 5ms. Instead, we simply set a flag noting + // that the connection has been broken, and we note the time and + // return. + // + // 2. In our main loop, whenever we find that we're disconnected, + // we call recoverConnection(). The main loop's job is basically a + // bunch of device polling. We're just one more device to poll, so + // recoverConnection() will be called soon after a disconnect, and + // then will be called in a loop for as long as we're disconnected. + // + // 3. In recoverConnection(), we check the flag we set in the SLEEP + // handler. If set, we wait until 5ms has elapsed from the SLEEP + // event time that we noted, then we'll reconnect and clear the flag. + // This gives us the required 5ms (or longer) delay between the + // disconnect and reconnect, ensuring that the PC will notice and + // will start over with the connection protocol. + // + // 4. The main loop keeps calling recoverConnection() in a loop for + // as long as we're disconnected, so if the new connection attempt + // triggered in step 3 fails, the SLEEP interrupt will happen again, + // we'll disconnect again, the flag will get set again, and + // recoverConnection() will reconnect again after another suitable + // delay. This will repeat until the connection succeeds or hell + // freezes over. + // + // Each disconnect happens immediately when a reconnect attempt + // fails, and an entire successful connection only takes about 25ms, + // so our loop can retry at more than 30 attempts per second. + // In my testing, lost connections almost always reconnect in + // less than a second with this code in place. + void recoverConnection() + { + // if a reconnect is pending, reconnect + if (reconnectPending_) + { + // Loop until we reach 5ms after the last sleep event. 
+ for (bool done = false ; !done ; ) + { + // If we've reached the target time, reconnect. Do the + // time check and flag reset atomically, so that we can't + // have another sleep event sneak in after we've verified + // the time. If another event occurs, it has to happen + // before we check, in which case it'll update the time + // before we check it, or after we clear the flag, in + // which case it will reset the flag and we'll do another + // round the next time we call this routine. + __disable_irq(); + if (uint32_t(timer_.read_us() - lastSleepTime_) > 5000) + { + connect(false); + reconnectPending_ = false; + done = true; + } + __enable_irq(); + } + } + } protected: - virtual void suspendStateChanged(unsigned int suspended) - { suspended_ = suspended; } - - // are we suspended? - int suspended_; + // Handle a USB SLEEP interrupt. This interrupt signifies that the + // USB hardware module hasn't seen any token traffic for 3ms, which + // means that we're either physically or logically disconnected. + // + // Important: this runs in ISR context. + // + // Note that this is a specialized sense of "sleep" that's unrelated + // to the similarly named power modes on the PC. This has nothing + // to do with suspend/sleep mode on the PC, and it's not a low-power + // mode on the KL25Z. They really should have called this interrupt + // DISCONNECT or BROKEN CONNECTION. + virtual void sleepStateChanged(unsigned int sleeping) + { + // note the new state + sleeping_ = sleeping; + + // If we have a non-zero bus address, we have at least a partial + // connection to the host (we've made it at least as far as the + // SETUP stage). Explicitly disconnect, set the pending reconnect + // flag, and remember the time of the sleep event. + if (USB0->ADDR != 0x00) + { + disconnect(); + lastSleepTime_ = timer_.read_us(); + reconnectPending_ = true; + } + } + + // is the USB connection asleep? 
+ volatile bool sleeping_; + + // flag: reconnect pending after sleep event + volatile bool reconnectPending_; + + // time of last sleep event while connected + volatile uint32_t lastSleepTime_; + + // timer to keep track of interval since last sleep event + Timer timer_; }; // --------------------------------------------------------------------------- @@ -3003,18 +3194,6 @@ inline void firingMode(int m) { firing = m; -#if 0 // $$$ - lwPin[3]->set(0); - lwPin[4]->set(0); - lwPin[5]->set(0); - switch (m) - { - case 1: lwPin[3]->set(255); break; // red - case 2: lwPin[4]->set(255); break; // green - case 3: lwPin[5]->set(255); break; // blue - case 4: lwPin[3]->set(255); lwPin[5]->set(255); break; // purple - } -#endif //$$$ } // Find the most recent local maximum in the history data, up to @@ -3290,13 +3469,14 @@ // // Reboot - resets the microcontroller // -void reboot(USBJoystick &js) +void reboot(USBJoystick &js, bool disconnect = true, long pause_us = 2000000L) { // disconnect from USB - js.disconnect(); + if (disconnect) + js.disconnect(); // wait a few seconds to make sure the host notices the disconnect - wait(2.5f); + wait_us(pause_us); // reset the device NVIC_SystemReset(); @@ -3734,11 +3914,10 @@ // enable the 74HC595 chips, if present init_hc595(cfg); - // Initialize the LedWiz ports. Note that it's important to wait until - // after initializing the various off-board output port controller chip - // sybsystems (TLC5940, 74HC595), since pins attached to peripheral - // controllers will need to address their respective controller objects, - // which don't exit until we initialize those subsystems. + // Initialize the LedWiz ports. Note that the ordering here is important: + // this has to come after we create the TLC5940 and 74HC595 object instances + // (which we just did above), since we need to access those objects to set + // up ports assigned to the respective chips. 
initLwOut(cfg); // start the TLC5940 clock @@ -3767,13 +3946,14 @@ { // short yellow flash diagLED(1, 1, 0); - wait(0.05); + wait_us(50000); diagLED(0, 0, 0); // reset the flash timer connectTimer.reset(); } } + connected = true; // Last report timer for the joystick interface. We use the joystick timer // to throttle the report rate, because VP doesn't benefit from reports any @@ -3781,8 +3961,6 @@ Timer jsReportTimer; jsReportTimer.start(); - Timer plungerIntervalTimer; plungerIntervalTimer.start(); // $$$ - // Time since we successfully sent a USB report. This is a hacky workaround // for sporadic problems in the USB stack that I haven't been able to figure // out. If we go too long without successfully sending a USB report, we'll @@ -3826,7 +4004,11 @@ // set up the ZB Launch Ball monitor ZBLaunchBall zbLaunchBall; - Timer dbgTimer; dbgTimer.start(); // $$$ plunger debug report timer + // enable the peripheral chips + if (tlc5940 != 0) + tlc5940->enable(true); + if (hc595 != 0) + hc595->enable(true); // we're all set up - now just loop, processing sensor reports and // host requests @@ -4006,12 +4188,6 @@ // rotate X and Y according to the device orientation in the cabinet accelRotate(x, y); -#if 0 - // $$$ report velocity in x axis and timestamp in y axis - x = int(plungerReader.getVelocity() * 1.0 * JOYMAX); - y = (plungerReader.getTimestamp() / 1000) % JOYMAX; -#endif - // send the joystick report jsOK = js.update(x, y, zrep, jsButtons, statusFlags); @@ -4050,12 +4226,12 @@ #endif // check for connection status changes - bool newConnected = js.isConnected() && !js.isSuspended(); + bool newConnected = js.isConnected() && !js.isSleeping(); if (newConnected != connected) { - // give it a few seconds to stabilize + // give it a moment to stabilize connectChangeTimer.start(); - if (connectChangeTimer.read() > 3) + if (connectChangeTimer.read_us() > 100000) { // note the new status connected = newConnected; @@ -4064,34 +4240,10 @@ connectChangeTimer.stop(); 
connectChangeTimer.reset(); - // adjust to the new status - if (connected) + // if we're newly disconnected, clean up for PC suspend mode or power off + if (!connected) { - // We're newly connected. This means we just powered on, we were - // just plugged in to the PC USB port after being unplugged, or the - // PC just came out of sleep/suspend mode and resumed the connection. - // In any of these cases, we can now assume that the PC power supply - // is on (the PC must be on for the USB connection to be running, and - // if the PC is on, its power supply is on). This also means that - // power to any external output controller chips (TLC5940, 74HC595) - // is now on, because those have to be powered from the PC power - // supply to allow for a reliable data connection to the KL25Z. - // We can thus now set clear initial output state in those chips and - // enable their outputs. - if (tlc5940 != 0) - { - tlc5940->update(true); - tlc5940->enable(true); - } - if (hc595 != 0) - { - hc595->update(true); - hc595->enable(true); - } - } - else - { - // We're no longer connected. Turn off all outputs. + // turn off all outputs allOutputsOff(); // The KL25Z runs off of USB power, so we might (depending on the PC @@ -4124,84 +4276,107 @@ // if we're disconnected, initiate a new connection if (!connected) { - // The "connected" variable means that we're either disconnected - // or that the connection has been suspended (e.g., the host is in - // a sleep mode). If the connection was lost entirely, explicitly - // initiate a reconnection. 
- if (!js.isConnected()) - js.connect(false); + // show USB HAL debug events + extern void HAL_DEBUG_PRINTEVENTS(const char *prefix); + HAL_DEBUG_PRINTEVENTS(">DISC"); + + // show immediate diagnostic feedback + js.diagFlash(); + + // clear any previous diagnostic LED display + diagLED(0, 0, 0); // set up a timer to monitor the reboot timeout Timer rebootTimer; rebootTimer.start(); - // wait for reconnect or reboot - connectTimer.reset(); - connectTimer.start(); - while (!js.isConnected() || js.isSuspended()) + // set up a timer for diagnostic displays + Timer diagTimer; + diagTimer.reset(); + diagTimer.start(); + + // loop until we get our connection back + while (!js.isConnected() || js.isSleeping()) { - // show a diagnostic flash every 2 seconds - if (connectTimer.read_us() > 2000000) + // try to recover the connection + js.recoverConnection(); + + // show a diagnostic flash every couple of seconds + if (diagTimer.read_us() > 2000000) { - // flash once if suspended or twice if disconnected - for (int j = js.isConnected() ? 1 : 2 ; j > 0 ; --j) - { - // short red flash - diagLED(1, 0, 0); - wait(0.05f); - diagLED(0, 0, 0); - wait(0.05f); - } + // flush the USB HAL debug events, if in debug mode + HAL_DEBUG_PRINTEVENTS(">NC"); + + // show diagnostic feedback + js.diagFlash(); // reset the flash timer - connectTimer.reset(); + diagTimer.reset(); } // if the disconnect reboot timeout has expired, reboot if (cfg.disconnectRebootTimeout != 0 && rebootTimer.read() > cfg.disconnectRebootTimeout) - reboot(js); + reboot(js, false, 0); + } + + // if we made it out of that loop alive, we're connected again! 
+ connected = true; + HAL_DEBUG_PRINTEVENTS(">C"); + + // Enable peripheral chips and update them with current output data + if (tlc5940 != 0) + { + tlc5940->update(true); + tlc5940->enable(true); + } + if (hc595 != 0) + { + hc595->update(true); + hc595->enable(true); } } - // $$$ -#if 0 - if (dbgTimer.read() > 10) { - dbgTimer.reset(); - if (plungerSensor != 0 && (cfg.plunger.sensorType == PlungerType_TSL1410RS || cfg.plunger.sensorType == PlungerType_TSL1410RP)) - { - PlungerSensorTSL1410R *ps = (PlungerSensorTSL1410R *)plungerSensor; - uint32_t nRuns; - uint64_t totalTime; - ps->ccd.getTimingStats(totalTime, nRuns); - printf("average plunger read time: %f ms (total=%f, n=%d)\r\n", totalTime / 1000.0f / nRuns, totalTime, nRuns); - } - } -#endif - // end $$$ - // provide a visual status indication on the on-board LED if (calBtnState < 2 && hbTimer.read_us() > 1000000) { - if (jsOKTimer.read() > 5) + static int spiTimeUpdate = 0; // $$$ + if (spiTimeUpdate++ > 10 && tlc5940 != 0) { + spiTimeUpdate = 0; + printf("Average SPI time: %lf us\r\n", double(tlc5940->spi_total_time) / tlc5940->spi_runs); + } + + if (jsOKTimer.read_us() > 1000000) { // USB freeze - show red/yellow. - // Our outgoing joystick messages aren't going through, even though we - // think we're still connected. This indicates that one or more of our - // USB endpoints have stopped working, which can happen as a result of - // bugs in the USB HAL or latency responding to a USB IRQ. Show a - // distinctive diagnostic flash to signal the error. I haven't found a - // way to recover from this class of error other than rebooting the MCU, - // so the goal is to fix the HAL so that this error never happens. // - // NOTE! This diagnostic code *hopefully* shouldn't occur. It happened - // in the past due to a number of bugs in the mbed KL25Z USB HAL that - // I've since fixed. 
I think I found all of the cases that caused it, - // but I'm leaving the diagnostics here in case there are other bugs - // still lurking that can trigger the same symptoms. - jsOKTimer.stop(); + // It's been more than a second since we successfully sent a joystick + // update message. This must mean that something's wrong on the USB + // connection, even though we haven't detected an outright disconnect. + // Show a distinctive diagnostic LED pattern when this occurs. hb = !hb; diagLED(1, hb, 0); + + // If the reboot-on-disconnect option is in effect, treat this condition + // as equivalent to a disconnect, since something is obviously wrong + // with the USB connection. + if (cfg.disconnectRebootTimeout != 0) + { + // The reboot timeout is in effect. If we've been incommunicado for + // longer than the timeout, reboot. If we haven't reached the time + // limit, keep running for now, and leave the OK timer running so + // that we can continue to monitor this. + if (jsOKTimer.read() > cfg.disconnectRebootTimeout) + reboot(js, false, 0); + } + else + { + // There's no reboot timer, so just keep running with the diagnostic + // pattern displayed. Since we're not waiting for any other timed + // conditions in this state, stop the timer so that it doesn't + // overflow if this condition persists for a long time. + jsOKTimer.stop(); + } } else if (cfg.plunger.enabled && !cfg.plunger.cal.calibrated) {