In the process of debugging another issue, I was looking at the disassembly of a GCC built version of some networking code which had linked in the mbed-rtos code. I was initially just scanning the disassembly to see where the code played around with interrupt masking and priority level to see if there were spots in the code which could hang without allowing me to break in with my debug monitor. While doing that I saw multiple instances of code that looked similar to this snippet:
00000d60 <rt_mbx_send>:
d60: b5f0 push {r4, r5, r6, r7, lr}
d62: 6846 ldr r6, [r0, #4]
d64: b083 sub sp, #12
d66: 4603 mov r3, r0
d68: 460f mov r7, r1
d6a: b116 cbz r6, d72 <rt_mbx_send+0x12>
d6c: 7841 ldrb r1, [r0, #1]
d6e: 2901 cmp r1, #1
d70: d02b beq.n dca <rt_mbx_send+0x6a>
d72: 899d ldrh r5, [r3, #12]
d74: 89dc ldrh r4, [r3, #14]
d76: 42a5 cmp r5, r4
d78: d014 beq.n da4 <rt_mbx_send+0x44>
d7a: 8918 ldrh r0, [r3, #8]
d7c: 1d02 adds r2, r0, #4
d7e: f843 7022 str.w r7, [r3, r2, lsl #2]
d82: f3ef 8210 mrs r2, PRIMASK
d86: b672 cpsid i
d88: 3501 adds r5, #1
d8a: 819d strh r5, [r3, #12]
d8c: b662 cpsie i
...
The code at addresses 0xd88 and 0xd8a looks very suspicious to me. It increments r5 by one and then stores the result at r3+12 but the load into r5 from the same address happened earlier at address 0xd72. This means that the load happened outside of the critical section created by the cpsid and cpsie instructions. This code looks like it is trying to perform an interlocked increment but this is not a safe way to do it.
I looked at the C code and this disassembly corresponds to this macro:
#define rt_inc(p) __disable_irq();(*p)++;__enable_irq();
This C code makes it clear that the intent was to perform an interlocked increment but that isn't what the compiler generated.
I was able to correct the code generation on my machine by marking the count fields manipulated by rt_inc() and rt_dec() as volatile.
I doubt the online compiler would have the same problems since it uses load/store exclusive compiler intrinsics instead. In general I think it would be better if the interlocked functionality like rt_inc() and rt_dec() for GCC and IAR (it looks like IAR would probably have similar problems but I haven't actually looked at its code gen) were implemented as forced inline assembly functions which used load/store exclusive instructions on ARMv7m.
Hopefully I am just mis-interpreting the situation here and everything is ok :) If not, this could lead to some nasty and hard to debug concurrency issues.
-Adam
In the process of debugging another issue, I was looking at the disassembly of a GCC built version of some networking code which had linked in the mbed-rtos code. I was initially just scanning the disassembly to see where the code played around with interrupt masking and priority level to see if there were spots in the code which could hang without allowing me to break in with my debug monitor. While doing that I saw multiple instances of code that looked similar to this snippet:
The code at addresses 0xd88 and 0xd8a looks very suspicious to me. It increments r5 by one and then stores the result at r3+12 but the load into r5 from the same address happened earlier at address 0xd72. This means that the load happened outside of the critical section created by the cpsid and cpsie instructions. This code looks like it is trying to perform an interlocked increment but this is not a safe way to do it.
I looked at the C code and this disassembly corresponds to this macro:
This C code makes it clear that the intent was to perform an interlocked increment but that isn't what the compiler generated.
I was able to correct the code generation on my machine by marking the count fields manipulated by rt_inc() and rt_dec() as volatile.
I doubt the online compiler would have the same problems since it uses load/store exclusive compiler intrinsics instead. In general I think it would be better if the interlocked functionality like rt_inc() and rt_dec() for GCC and IAR (it looks like IAR would probably have similar problems but I haven't actually looked at its code gen) were implemented as forced inline assembly functions which used load/store exclusive instructions on ARMv7m.
Hopefully I am just mis-interpreting the situation here and everything is ok :) If not, this could lead to some nasty and hard to debug concurrency issues.
-Adam