Potential concurrency issues in RTOS when build with GCC?

# 07 Aug 2013

In the process of debugging another issue, I was looking at the disassembly of a GCC built version of some networking code which had linked in the mbed-rtos code. I was initially just scanning the disassembly to see where the code played around with interrupt masking and priority level to see if there were spots in the code which could hang without allowing me to break in with my debug monitor. While doing that I saw multiple instances of code that looked similar to this snippet:

00000d60 <rt_mbx_send>:
     d60:	b5f0      	push	{r4, r5, r6, r7, lr}
     d62:	6846      	ldr	r6, [r0, #4]
     d64:	b083      	sub	sp, #12
     d66:	4603      	mov	r3, r0
     d68:	460f      	mov	r7, r1
     d6a:	b116      	cbz	r6, d72 <rt_mbx_send+0x12>
     d6c:	7841      	ldrb	r1, [r0, #1]
     d6e:	2901      	cmp	r1, #1
     d70:	d02b      	beq.n	dca <rt_mbx_send+0x6a>
     d72:	899d      	ldrh	r5, [r3, #12]
     d74:	89dc      	ldrh	r4, [r3, #14]
     d76:	42a5      	cmp	r5, r4
     d78:	d014      	beq.n	da4 <rt_mbx_send+0x44>
     d7a:	8918      	ldrh	r0, [r3, #8]
     d7c:	1d02      	adds	r2, r0, #4
     d7e:	f843 7022 	str.w	r7, [r3, r2, lsl #2]
     d82:	f3ef 8210 	mrs	r2, PRIMASK
     d86:	b672      	cpsid	i
     d88:	3501      	adds	r5, #1
     d8a:	819d      	strh	r5, [r3, #12]
     d8c:	b662      	cpsie	i
    ...

The code at addresses 0xd88 and 0xd8a looks very suspicious to me. It increments r5 by one and then stores the result at r3+12 but the load into r5 from the same address happened earlier at address 0xd72. This means that the load happened outside of the critical section created by the cpsid and cpsie instructions. This code looks like it is trying to perform an interlocked increment but this is not a safe way to do it.

I looked at the C code and this disassembly corresponds to this macro:

 #define rt_inc(p)     __disable_irq();(*p)++;__enable_irq();

This C code makes it clear that the intent was to perform an interlocked increment but that isn't what the compiler generated.

I was able to correct the code generation on my machine by marking the count fields manipulated by rt_inc() and rt_dec() as volatile.

I doubt the online compiler would have the same problems since it uses load/store exclusive compiler intrinsics instead. In general I think it would be better if the interlocked functionality like rt_inc() and rt_dec() for GCC and IAR (it looks like IAR would probably have similar problems but I haven't actually looked at its code gen) were implemented as forced inline assembly functions which used load/store exclusive instructions on ARMv7m.

Hopefully I am just mis-interpreting the situation here and everything is ok :) If not, this could lead to some nasty and hard to debug concurrency issues.

-Adam

In the process of debugging another issue, I was looking at the disassembly of a GCC built version of some networking code which had linked in the mbed-rtos code. I was initially just scanning the disassembly to see where the code played around with interrupt masking and priority level to see if there were spots in the code which could hang without allowing me to break in with my debug monitor. While doing that I saw multiple instances of code that looked similar to this snippet: {{{ 00000d60 <rt_mbx_send>: d60: b5f0 push {r4, r5, r6, r7, lr} d62: 6846 ldr r6, [r0, #4] d64: b083 sub sp, #12 d66: 4603 mov r3, r0 d68: 460f mov r7, r1 d6a: b116 cbz r6, d72 <rt_mbx_send+0x12> d6c: 7841 ldrb r1, [r0, #1] d6e: 2901 cmp r1, #1 d70: d02b beq.n dca <rt_mbx_send+0x6a> d72: 899d ldrh r5, [r3, #12] d74: 89dc ldrh r4, [r3, #14] d76: 42a5 cmp r5, r4 d78: d014 beq.n da4 <rt_mbx_send+0x44> d7a: 8918 ldrh r0, [r3, #8] d7c: 1d02 adds r2, r0, #4 d7e: f843 7022 str.w r7, [r3, r2, lsl #2] d82: f3ef 8210 mrs r2, PRIMASK d86: b672 cpsid i d88: 3501 adds r5, #1 d8a: 819d strh r5, [r3, #12] d8c: b662 cpsie i ... }}} The code at addresses **0xd88** and **0xd8a** looks very suspicious to me. It increments r5 by one and then stores the result at r3+12 but the load into r5 from the same address happened earlier at address **0xd72**. This means that the load happened outside of the critical section created by the **cpsid** and **cpsie** instructions. This code looks like it is trying to perform an interlocked increment but this is not a safe way to do it. I looked at the C code and this disassembly corresponds to this macro: <<code>> #define rt_inc(p) __disable_irq();(*p)++;__enable_irq(); <</code>> This C code makes it clear that the intent was to perform an interlocked increment but that isn't what the compiler generated. I was able to correct the code generation on my machine by marking the count fields manipulated by rt_inc() and rt_dec() as volatile. I doubt the online compiler would have the same problems since it uses load/store exclusive compiler intrinsics instead. In general I think it would be better if the interlocked functionality like rt_inc() and rt_dec() for GCC and IAR (it looks like IAR would probably have similar problems but I haven't actually looked at its code gen) were implemented as forced inline assembly functions which used load/store exclusive instructions on ARMv7m. Hopefully I am just mis-interpreting the situation here and everything is ok :) If not, this could lead to some nasty and hard to debug concurrency issues. -Adam

Joey Ye

# 10 Sep 2013

This is a bug in dis/enable_irq implementation in mbed rtos. I believe it starts from a buggy old version of CMSIS. The bug in CMSIS was fixed since v3.10, but mbed rtos still keeps it.

Following patch should fix it. But I feel that a better solution is to have rtos use the up-to-date implementation in libraries/mbed/targets/cmsis/core_cmFunc.h

Thanks, Joey

diff --git a/libraries/rtos/rtx/rt_HAL_CM.h b/libraries/rtos/rtx/rt_HAL_CM.h
index 1c664fc..1c74b59 100644
--- a/libraries/rtos/rtx/rt_HAL_CM.h
+++ b/libraries/rtos/rtx/rt_HAL_CM.h
@@ -69,7 +69,7 @@
 
 __attribute__((always_inline)) static inline void __enable_irq(void)
 {
-  __asm volatile ("cpsie i");
+  __asm volatile ("cpsie i" : : : "memory");
 }
 
 __attribute__((always_inline)) static inline U32 __disable_irq(void)
@@ -77,7 +77,7 @@ __attribute__((always_inline)) static inline U32 __disable_irq
   U32 result;
 
   __asm volatile ("mrs %0, primask" : "=r" (result));
-  __asm volatile ("cpsid i");
+  __asm volatile ("cpsid i" : : : "memory");
   return(result & 1);
 }

Adam Green

# 11 Sep 2013

Joey, thanks for the very helpful response. The diff you provided above makes sense to me. Later this week, I will try switching the rtos to use the more recent CMSIS implementation and then look at the resulting asm.

-Adam

Martin Kojtal

# 09 May 2014

Hello,

looks like this issue is still relevant. Adam, did you look at the code after proposed implementation by Joey?

Regards,
0xc0170

Adam Green

# 15 May 2014

Heh Martin,

I didn't have a chance to try out Joey's fix because I got busy on other projects. It is still on my work item list but I don't know when I will get to it.

I thought Joey's suggestion of removing CMSIS duplication in RTX and using the latest CMSIS instead makes a lot of sense.

Thanks,

Adam

Important changes to forums and questions