Two Serial Quirks

19 Feb 2010 . Edited: 19 Feb 2010

Sorry, I'm not sure if this is the place for this post?  I checked the early post by Dan Ros that says Bugs & Suggestions is "about the development of mbed (as opposed to development on mbed)".  I'm being a bit slow, but should I stick to website / IDE related things here, or can I talk about things I've noticed specific to the embedded system?  I'll try anyway and ask for forgiveness later! ;p  On with the show:

1)  A few weeks ago Christian Lerche and I noticed an issue that caused a system hang with the Serial interrupts and the printf() routine (both via the USB serial port and other ports too).  It is already documented and I have a test program to reproduce the issue.  I'm not sure if Simon Ford has run across it yet, as he seems busy lately with PWMs, watchdogs and all his other duties too! :)

http://mbed.org/forum/mbed/topic/408/?page=1#comment-2221

http://mbed.org/users/leimgrub/programs/SerialInterrupt/5z0gd/

The work around I came up with was to implement a puts() function and just sprintf() my messages into a buffer before puts()ing them...

2) Just yesterday I finished up a system that does a lot of full-duplex serial I/O.  I have a Python GUI that sends messages to the mbed when I modify some widgets, and the mbed periodically sends messages back to the GUI via the USB serial link.  It works okay for a while, but when I start to "hammer" the system (e.g. the mbed sends 10-byte packets to the PC at ~10 Hz and the PC sends 10-byte packets to the mbed at ~10 Hz) I can get the mbed into a state where it stops receiving bytes properly.  It *may* be an issue with the custom USB/flash interface or perhaps my Linux USB<->serial driver.  I think it isn't the mbed chip itself, as I can push the reset button and it still doesn't receive data properly.  If I wait for a while (maybe 3-30 seconds), it will sometimes start working again.  If I cycle power, it will always come up in a working state.

I haven't done extensive testing, but it seems reproducible at various baud rates (9600 and 115200)...

I also tested running the same code on one of the non-USB serial ports.  I haven't seen the issue come up there yet (and that's over the SparkFun BlueSMiRF link at the same 115200 baud rate).

Just curious if you've seen this before, otherwise I might try to put together the most simple test program to reproduce the issue if possible... Thanks, and keep up all the good work adding features and promoting this super handy system!

-John

19 Feb 2010 . Edited: 19 Feb 2010

John:

I use serial printfs at a 57600 baud rate over the mbed USB-serial driver.  I wrote some routines to do a fully buffered interface, i.e. I have generic software byte queues (1024 bytes in size) tied to the TX/RX routines.  I have not seen anything hang.  I can give you the code if interested.

-Eli Hughes
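
Eli's queue code is not shown in the thread, so the following is only a rough model of the idea he describes: a fixed-size byte queue (ring buffer) that a receive handler fills and the main loop drains. On the mbed this would be C/C++ driven by the serial interrupt; the Python class below, including the names `ByteQueue`, `put` and `get`, is hypothetical and just illustrates the data structure.

```python
class ByteQueue:
    """Fixed-capacity FIFO of bytes, modelled on a ring buffer."""

    def __init__(self, capacity=1024):
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._head = 0   # index of the next byte to read
        self._count = 0  # number of bytes currently stored

    def put(self, byte):
        """Store one byte; return False (dropping it) if the queue is full."""
        if self._count == self._capacity:
            return False
        tail = (self._head + self._count) % self._capacity
        self._buf[tail] = byte
        self._count += 1
        return True

    def get(self):
        """Remove and return the oldest byte, or None if the queue is empty."""
        if self._count == 0:
            return None
        byte = self._buf[self._head]
        self._head = (self._head + 1) % self._capacity
        self._count -= 1
        return byte

    def __len__(self):
        return self._count
```

In this model the receive side only ever calls `put()` and the main loop only ever calls `get()`, which is what keeps the slow work (parsing, printf) out of the interrupt path. A real interrupt-driven version would also have to make the shared count update safe against the interrupt firing mid-update.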

19 Feb 2010

Eli,

 

I'd like to give your code a try and see if it runs more stably on my setup.  What I'm using now is based on Richard Sewell's "SerialBuffered"* class with some more of my stuff on top of that.  Is your library based on the serial RX interrupt and a ticker for the TX?  Or is it polled somehow on both the RX and TX ends?  Or maybe you even went above and beyond the regular serial lib and did your own register poking? *grin* Thanks for your feedback and I'll check back to see if you publish your program sometime soon (if you don't want to publish, perhaps we can find a way for you to send me the exported zip?)

*http://mbed.org/users/jarkman/notebook/serialbuffered/

-John

20 Feb 2010

Hi John, Eli,

This is exactly the place to post! Thank you for taking the time to investigate these issues, and in particular for providing a way to reproduce the first. I had it broken within seconds :)

1) The hanging issue you see is definitely a problem related to re-entrancy (two things trying to access a stream at once), and we'll look at fixing it along with a look at re-entrancy and thread safety in general. For now, I have determined a workaround which should get you what you need:

  • Create 2 Serial objects (e.g. pc, pc2) both tied to (USBTX, USBRX)
  • Use one inside interrupts, and another outside interrupts

With this modification, your program runs ok. Here is a version showing that:

2) For the USB serial interface, we have a beta firmware update which fixes some serial problems. I'd be interested if you could try this and see if it solves your issues. You can download it here:

To update, save this to your mbed, then power cycle it (unplug, replug) - after a few seconds it'll be updated. For more information, and if you want to revert to the latest official release, see:

Please tell us how you get on.

Simon

20 Feb 2010

Hi Simon

I've posted an alternative fix, SerialInterruptFix2, which I think might be more deterministic than using two objects with a known re-entrancy problem and hoping it somehow works. However, the multi-object approach might be necessary when the stream is used in different interrupt handlers (e.g. timer and serial) as well as in the main thread, because it then becomes difficult to create critical sections without resorting to disabling all interrupts rather than just selected ones.

As with my post here, the key is making a critical section for the printf outside the interrupt handler:

      NVIC_DisableIRQ(UART0_IRQn);
      pc.printf("If I receive while transmitting this I *won't* crash...\r\n");
      NVIC_EnableIRQ(UART0_IRQn);

Thanks
Daniel

 

24 Feb 2010

Simon and Daniel,

Thanks much for both of your options.  I spent much of Monday exploring both options and trying to figure out exactly what was going on under the hood including:

1) I first tried to simply upgrade the firmware, but it didn't fix my issue.

2) I wrote a number of test apps trying to push the full-duplex RX/TX and couldn't get it to break, even at throughputs much higher than my original 10 Hz test... I tried using two separate Serial objects and that did fix the issue with printf... I also tried disabling interrupts, but doing so I could create a situation where the mbed would miss a packet that it received while interrupts were turned off... Instead of disabling interrupts, is there a way to just 'temporarily queue up interrupts' or something? hah...

3) I realized that my specific issue still occurred even without the mbed sending data!  So neither using two separate Serial objects nor disabling interrupts even applied at this point... After doing some more digging, I fixed the problem by adding flush() calls in the Python pySerial code after EVERY byte...

For example, the mbed receives everything fine with no checksum errors if I use this call from my laptop:

 

usbSerialPort.write('<magic byte> <data byte 0> <data byte 1> ... <data byte n> <checksum> <magic byte>')

 

However, I frequently get checksum errors if I do the following instead:

 

usbSerialPort.write('<magic byte>')
usbSerialPort.write('<data byte 0>')
usbSerialPort.write('<data byte 1>')
...
usbSerialPort.write('<data byte n>')
usbSerialPort.write('<checksum>')
usbSerialPort.write('<magic byte>')

 

BUT, if I do this, it works fine again:

 

usbSerialPort.write('<magic byte>')
usbSerialPort.flush()
usbSerialPort.write('<data byte 0>')
usbSerialPort.flush()
usbSerialPort.write('<data byte 1>')
usbSerialPort.flush()
...
usbSerialPort.write('<data byte n>')
usbSerialPort.flush()
usbSerialPort.write('<checksum>')
usbSerialPort.flush()
usbSerialPort.write('<magic byte>')
usbSerialPort.flush()

 

My wild speculation is that it has something to do with the latency/throughput tradeoffs involved in wrapping a serial stream over USB.  My guess is that when I don't call flush() after each byte, the bytes get buffered up and perhaps are not immediately sent off as one group.  This may be happening on the PC / Linux / USB driver / pySerial side or on the mbed USB chip, I really don't know... My receive code does allow for up to 15 ms between each incoming packet, so it seems more like some bytes are getting dropped, but I haven't proven this...

Anyway, just an observation that stumped me for a good while that hopefully someone else might find useful...
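
The two working variants above (one write() for the whole packet, or write() plus flush() per byte) can be sketched as small helpers. This is only an illustration under stated assumptions: the thread gives neither the magic byte value nor the checksum algorithm, so `MAGIC = 0x7E` and the sum-modulo-256 checksum are placeholders, and the helper names are hypothetical. The code is Python 3 (bytes rather than str) and uses an `io.BytesIO` object as a stand-in for the serial port so it runs without hardware.

```python
import io

MAGIC = 0x7E  # hypothetical frame marker; the real magic byte is not given


def checksum(payload):
    """Assumed checksum: sum of the payload bytes, modulo 256."""
    return sum(payload) & 0xFF


def build_frame(payload):
    """Return <magic> <payload...> <checksum> <magic> as one bytes object."""
    return bytes([MAGIC]) + bytes(payload) + bytes([checksum(payload), MAGIC])


def send_frame(port, payload):
    """Variant 1: write the whole frame with one write() call, flush once."""
    frame = build_frame(payload)
    port.write(frame)
    port.flush()
    return frame


def send_frame_bytewise(port, payload):
    """Variant 2: write and flush one byte at a time."""
    for b in build_frame(payload):
        port.write(bytes([b]))
        port.flush()


def parse_frame(frame):
    """Return the payload if the frame is well formed, else None."""
    if len(frame) < 3 or frame[0] != MAGIC or frame[-1] != MAGIC:
        return None
    payload, received = frame[1:-2], frame[-2]
    return payload if checksum(payload) == received else None


# Stand-in for a pySerial port so the sketch runs without hardware.
fake_port = io.BytesIO()
sent = send_frame(fake_port, b"\x01\x02\x03")
```

With real hardware, `fake_port` would be replaced by a pySerial object such as `serial.Serial("/dev/ttyACM0", 115200)`, where the device path is an assumption about the setup. Both variants put the same bytes on the wire; what differs is how the bytes are handed to the driver, which is the part the speculation above is about.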