MBED stalling on call to interface chip

21 Jan 2015

We are using the mbed in autonomous systems that should run continuously for months / years.

While doing so, we have observed that sometimes (seemingly randomly), the mbed interface chip may become "inaccessible" to the Cortex M3. When this happens, any call to the interface chip (e.g. the default MAC address getter or use of the mbed's local file system) will result in the system stalling.

In this state, the function "mbed_interface_connected" still returns 1, so there seems to be no way to check for the issue prior to a call that will stall the program.

Once in this state, it is possible to get stuck in a loop where the M3's watchdog reboots the M3, only to have it stall again on the interface chip call (after which the watchdog reboots the M3, it stalls again, etc).

Rebooting the mbed interface chip (via the black button or a power cycle) will put the system back into a normal state. This isn't particularly helpful when one does not have physical access to the mbed, however. :)

An extremely simple program to demonstrate this issue is available at:

http://developer.mbed.org/users/uci1/code/an_m0interface_stall_test/

Note: in three runs of this program, the stalling occurred after 11 seconds, 2 minutes, and 2 minutes 40 seconds, so your mileage may vary.

It would be fantastic if someone could explain what might be going on to cause the inaccessibility of the interface chip. I'd also welcome any clever ideas for working around this issue other than avoiding all calls to the interface chip, which is an obvious, if inconvenient, workaround.

Thanks very much for your help!

21 Jan 2015

Never heard about this one before, but did you try either updating the firmware, or going one version back?

21 Jan 2015

Can you clarify whether this is happening on the LPC1768, the LPC11U24, or both? Your test program is called m0 but your question references the M3.

21 Jan 2015

Oh, of course, I should have specified.

This is using the LPC1768. The "m0" in the test program's name refers to the mbed interface chip.

I've seen this behavior with mbed libraries v85 and the latest v92 (which the test program uses). Likely it occurs with other versions.

The LPC1768 mbed that I'm currently using has firmware 16457. We have some other units already deployed that may have an older firmware version (I'm not sure). I haven't yet tried updating the firmware, but it would be nice to gain a better understanding (if possible) of what's going on currently before doing so.

23 Jan 2015

A bit more information. It seems that calls to printf or debug (via mbed_debug.h) do not result in the stalling, even though I guess they access the interface chip in order to send data out of the USB plug on the MBED board.

Does this imply that local file system access is related to the stalling? It does seem that either requesting the MAC address or accessing the local file system can lead to the system stalling. These could be related if the MAC address function is attempting to parse the MBED.HTM file on the local file system.

Also, it does not appear to matter whether or not a computer is connected to the MBED board via the USB interface cable.
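One way to keep the MAC address lookup away from the interface chip is to supply the address from the application itself. The sketch below assumes that the mbed library in use declares `mbed_mac_address` as a weak symbol that a user definition can override; that should be checked against the library version actually deployed. The address bytes are a hypothetical placeholder (a locally administered address), not a value from this thread.

```cpp
#include <cstring>

// Hypothetical placeholder MAC (locally administered: bit 1 of the first
// byte is set). Replace with an address assigned to the actual unit.
static const unsigned char kFixedMac[6] = {0x02, 0x00, 0x00, 0x00, 0x00, 0x01};

// Assumption: the mbed library provides a weak mbed_mac_address() that
// this definition replaces at link time, so no interface chip call is made.
extern "C" void mbed_mac_address(char *mac) {
    std::memcpy(mac, kFixedMac, sizeof(kFixedMac));
}
```

If the override takes effect, the networking code receives the fixed bytes and the stalling call is never reached. Each deployed unit would need its own address.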

23 Jan 2015

LocalFileSystem uses a semihosting interface (debug), while printf and also mbed_debug use a regular UART to send data to the interface chip, which in turn sends it via USB.

23 Jan 2015

Not a solution but a possible fail-safe: use a DigitalOut to pull the nRST pin low when the watchdog timer causes a reset. This should reset the interface chip and also reset the target again. That second reset should not trigger yet another reset, provided the DigitalOut pin defaults to high. An external watchdog or CPU monitor device would probably be even more robust.
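A minimal sketch of this fail-safe's boot-time logic, written so it can be compiled and exercised off-target. It relies on bit 2 of the LPC17xx `WDMOD` register being the watchdog time-out flag (WDTOF). The `FailSafeHooks` struct and both of its callbacks are hypothetical names introduced here; on the real board they would wrap the register write that clears the flag and the DigitalOut wired to nRST. Clearing the flag before pulling nRST low is a precaution in case the flag survives the external reset.

```cpp
#include <cstdint>
#include <functional>

// Bit 2 of the LPC17xx WDMOD register: watchdog time-out flag (WDTOF).
constexpr std::uint32_t kWdtofMask = 1u << 2;

// True if the WDMOD value read at startup reports a watchdog time-out.
bool watchdog_caused_reset(std::uint32_t wdmod) {
    return (wdmod & kWdtofMask) != 0;
}

// Hypothetical hardware hooks, supplied by the application.
struct FailSafeHooks {
    std::function<void()> clear_watchdog_flag;  // e.g. clear WDTOF in WDMOD
    std::function<void()> pull_nrst_low;        // e.g. drive the nRST DigitalOut low
};

// Runs once at startup. Returns true if a hard reset was requested.
bool run_boot_failsafe(std::uint32_t wdmod, const FailSafeHooks &hooks) {
    if (!watchdog_caused_reset(wdmod)) {
        return false;  // normal boot: leave nRST alone
    }
    hooks.clear_watchdog_flag();  // so the next boot looks like a normal one
    hooks.pull_nrst_low();        // resets the interface chip and the target
    return true;
}
```

On the target, the `wdmod` argument would be the value of `LPC_WDT->WDMOD` read before the watchdog is re-armed.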

23 Jan 2015

Thanks for the suggestion. We are indeed working on an external watchdog. We're a bit hesitant to reset via the nRST pin (or just a call to mbed_reset) whenever the watchdog flag is high on startup, as it could potentially lead to a reset loop. In the meantime, avoiding calls to get the MAC address or to use the local file system seems more robust...
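The reset-loop worry could be bounded by counting hard resets and giving up after a limit. The sketch below is only the decision logic, under stated assumptions: the counter lives somewhere that survives resets (for example a battery-backed register, if the target offers one), and the application clears it to zero itself after an interface chip call succeeds. All names here are hypothetical.

```cpp
#include <cstdint>

enum class BootAction {
    NormalBoot,          // start up as usual
    HardResetInterface,  // pulse nRST to reset the interface chip
    DegradedBoot         // stop resetting; avoid interface chip calls
};

struct BootDecision {
    BootAction action;
    std::uint32_t next_count;  // value to store back in the persistent counter
};

// Decide what to do at startup.
//   watchdog_reset : the watchdog flag was set on this boot
//   prior_count    : hard resets attempted since the last successful call
//   max_hard_resets: limit after which we stop trying
BootDecision decide_boot_action(bool watchdog_reset,
                                std::uint32_t prior_count,
                                std::uint32_t max_hard_resets) {
    if (!watchdog_reset) {
        // The counter is deliberately left unchanged here: it is cleared
        // only once an interface chip call has actually succeeded.
        return {BootAction::NormalBoot, prior_count};
    }
    if (prior_count < max_hard_resets) {
        return {BootAction::HardResetInterface, prior_count + 1};
    }
    return {BootAction::DegradedBoot, prior_count};
}
```

With a limit of 2, a unit whose interface chip never recovers would attempt two hard resets and then come up in the degraded mode instead of looping.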

It would be great to have more understanding of what's going on, however. Even a note that others see the same effect would be nice.