5 years, 7 months ago.
Synchronization bug in libexactLE/hci driver?
After streaming data over BLE for many hours my device ends up in a hung state where the hciDrvWrite routine is stuck spinning on the reading flag.
I don't have source for libexactLE.a, but I've tracked this down to the following lines in the disassembled code (which I have annotated):
    867c: 4b43       ldr   r3, [pc, #268]  ; (878c <hciDrvWrite+0x124>) r3 <= irqObj
    867e: 6818       ldr   r0, [r3, #0]    ; r0 <= [irqObj]
    8680: f7fd ff68  bl    6554 <gpio_irq_disable>
    8684: 4942       ldr   r1, [pc, #264]  ; (8790 <hciDrvWrite+0x128>) r1 <= reading
    8686: 4b43       ldr   r3, [pc, #268]  ; (8794 <hciDrvWrite+0x12c>) r3 <= writing
    8688: 680a       ldr   r2, [r1, #0]    ; r2 <= [reading]
    868a: 2a00       cmp   r2, #0
    868c: d1fc       bne.n 8688 <hciDrvWrite+0x20>  ; SPIN!
    868e: 681a       ldr   r2, [r3, #0]
    8690: 2a00       cmp   r2, #0
    8692: d1f9       bne.n 8688 <hciDrvWrite+0x20>
    . . .
    8790: 200021b0   .word 0x200021b0
    8794: 20002194   .word 0x20002194
and gdb shows 0x200021b0 as the reading flag which is set to 1:
    (gdb) x/x 0x200021b0
    0x200021b0 <reading>: 0x01
So, for some unknown reason, the reading flag is left set and the write routine spins forever.
If I had the source I would try to work through this myself (is it possible to get source for libexactLE.a?). My first thought is that disabling the interrupt before spinning on the reading flag seems like a bad idea, because it could prevent a pending read from ever finishing.
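To make the concern concrete, here is a minimal C sketch of the pattern the disassembly appears to show: disable the interrupt, then wait until both flags are clear. It is a host-side model, not the library's code. The flag names follow the annotations above, while irq_enabled, simulated_read_irq, write_entry_wait and the max_spins bound are hypothetical and exist only so the sketch terminates. It also assumes that a pending read can only clear the reading flag from the interrupt that was just disabled, which is the hypothesis here and not something confirmed from the source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's state. The names follow the
 * annotations in the disassembly; the real definitions are not available. */
static volatile uint32_t reading;  /* nonzero while a read is in progress  */
static volatile uint32_t writing;  /* nonzero while a write is in progress */
static bool irq_enabled = true;    /* models gpio_irq_enable/disable       */

/* Models the read-completion interrupt. ASSUMPTION: the read can only
 * finish (and clear `reading`) while the interrupt is enabled. */
static void simulated_read_irq(void)
{
    if (irq_enabled) {
        reading = 0;
    }
}

/* Models the entry of hciDrvWrite as inferred from the disassembly:
 * disable the IRQ, then loop until both flags are zero.
 * `max_spins` is a hypothetical bound so this model terminates;
 * the loop in the disassembly has no such bound.
 * Returns true if the wait completed, false if it would have hung. */
static bool write_entry_wait(uint32_t max_spins)
{
    irq_enabled = false;                  /* bl gpio_irq_disable */
    for (uint32_t i = 0; i < max_spins; i++) {
        simulated_read_irq();             /* a pending read tries to finish */
        if (reading == 0 && writing == 0) {
            return true;
        }
    }
    return false;                         /* flags never cleared */
}
```

With reading set to 1 before the call, write_entry_wait returns false no matter how large the bound is, because nothing in the model can clear the flag once the interrupt is off. With both flags at 0 it returns true on the first pass.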
Could someone at Maxim take a look at the hci driver source and let me know if this condition is possible and whether or not a fix is easy?
Thanks, Kevin
I've traced this a little further, and it looks like a corrupted/misaligned hdrRx is the root cause. Typically, during normal operation, I see the following with a breakpoint set just after the header is read:
whereas when the write is stuck waiting on the reading flag I see:
It looks to me like the hdrRx is offset by one byte, i.e. the 0x04 indicating the packet type was lost, and things fall apart from there.
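The effect of a one-byte offset can be sketched in C as follows. This is an illustration under assumptions, not the driver's actual parsing code: the hdr_t layout (packet type, event code, parameter length), parse_hdr and hdr_is_event are hypothetical, and the bytes in the usage note are made-up example values, not data captured from the device.

```c
#include <stdint.h>

#define HCI_TYPE_EVENT 0x04u  /* packet-type indicator mentioned above */

/* Hypothetical view of a received header: the packet-type byte followed
 * by an event code and a parameter length. The real hdrRx layout in the
 * library is not known. */
typedef struct {
    uint8_t type;
    uint8_t code;
    uint8_t len;
} hdr_t;

/* Interpret the first three bytes at `buf` as a header. */
static hdr_t parse_hdr(const uint8_t *buf)
{
    hdr_t h = { buf[0], buf[1], buf[2] };
    return h;
}

/* Nonzero if the header carries the expected event packet type. */
static int hdr_is_event(const hdr_t *h)
{
    return h->type == HCI_TYPE_EVENT;
}
```

For an example stream of 0x04 0x13 0x05 0x01 ..., parsing from the first byte gives type 0x04, code 0x13 and length 0x05. Parsing from one byte later, as if the 0x04 had been dropped, gives type 0x13, code 0x05 and length 0x01, so both the type check and the length used for the rest of the read are wrong.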
posted by Kevin Peterson 25 May 2019