6 years, 5 months ago.

RTOS Stack Issue at runtime - ARM GCC

Hi everyone, I'm asking for some help because I'm struggling with the RTOS part of mbed and I can't figure out what's going on.

I've exported my project to an offline compiler because I wanted more information on what the linker was doing. But now, when I run my project, it fails with a beautiful Blue LED of Death. Digging a little, I can confirm that mbed_die() is called from os_error(), and the cause is a stack overflow check (I've enhanced mbed_die() to blink the cause passed to os_error(); in my case it blinks OS_ERR_STK_OVF = 1). So memory corruption is the key.

Code that triggers my mbed_die() in rt_System.c

if ((os_tsk.run->tsk_stack < (U32)os_tsk.run->stack) ||
    (os_tsk.run->stack[0] != MAGIC_WORD)) {
    os_error (OS_ERR_STK_OVF);
}
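To make the two conditions in that check easier to see, here is a minimal host-side sketch. It is not RTX code: demo_tcb_t, demo_stack_overflowed and DEMO_MAGIC_WORD are hypothetical names, the magic value is a placeholder, and the saved stack pointer is stored as uintptr_t so the sketch also runs on a 64-bit host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder marker value for this sketch; the real MAGIC_WORD is defined by RTX. */
#define DEMO_MAGIC_WORD 0xE25A2EA5u

/* Hypothetical, stripped-down stand-in for the two TCB fields used by the check. */
typedef struct {
    uintptr_t tsk_stack;  /* saved stack pointer of the task            */
    uint32_t *stack;      /* lowest address of the task's stack buffer  */
} demo_tcb_t;

/* Returns true if either condition of the check fires:
   1. the saved stack pointer is below the bottom of the stack buffer, or
   2. the marker word at the bottom of the buffer has been overwritten. */
bool demo_stack_overflowed(const demo_tcb_t *tcb) {
    return (tcb->tsk_stack < (uintptr_t)tcb->stack) ||
           (tcb->stack[0] != DEMO_MAGIC_WORD);
}
```

The second condition is the one that catches a task that pushed data over the bottom of its stack and then unwound again, since the saved stack pointer alone would look fine by then.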
  • So first of all, I wanted to know if someone has faced an issue like this one. I've seen some very similar issues when allocated stack buffers are not aligned, but this is not my case (my RTOS libs are up to date).
  • Furthermore, to better understand the code pasted above, could someone explain how the rt_init_stack() function works? The most obscure thing is the size part: why is the stack size (priv_stack) divided by 4? I know we use downward-growing stacks, and I would have taken &p_TCB->stack[priv_stack] for stk ... but I'm not an OS designer, so I'm pretty sure I'm missing something :)

HAL_CM.c

void rt_init_stack (P_TCB p_TCB, FUNCP task_body) {
  /* Prepare TCB and saved context for a first time start of a task. */
  U32 *stk,i,size;

  /* Prepare a complete interrupt frame for first task start */
  size = p_TCB->priv_stack >> 2;

  /* Write to the top of stack. */
  stk = &p_TCB->stack[size];

  /* Auto correct to 8-byte ARM stack alignment. */
  if ((U32)stk & 0x04) {
    stk--;
  }

  stk -= 16;

  /* Default xPSR and initial PC */
  stk[15] = INITIAL_xPSR;
  stk[14] = (U32)task_body;

  /* Clear R4-R11,R0-R3,R12,LR registers. */
  for (i = 0; i < 14; i++) {
    stk[i] = 0;
  }

  /* Assign a void pointer to R0. */
  stk[8] = (U32)p_TCB->msg;

  /* Initial Task stack pointer. */
  p_TCB->tsk_stack = (U32)stk;

  /* Task entry point. */
  p_TCB->ptask = task_body;

  /* Set a magic word for checking of stack overflow.
   For the main thread (ID: 0x01) the stack is in a memory area shared with the
   heap, therefore the last word of the stack is a moving target.
   We want to do stack/heap collision detection instead.
  */
  if (p_TCB->task_id != 0x01)
      p_TCB->stack[0] = MAGIC_WORD;
}

I really appreciate your help. Regards Clément

platform: LPC1768 compiler: arm-none-eabi-* libs: USBHost + 3GWanDongle + LWip

Well, I still don't fully understand how the stacks are initialized. However, I have an explanation to share about my initial issue. I modified mbed_die() a bit more to show which thread was raising this error. In my case the faulty thread was an AT command stream parser that uses sscanf, strcmp, and so on. Increasing the stack of this thread resolved the issue. The most interesting part is that everything compiled and worked well with the online compiler, so I imagine the standard libraries used in the online toolchain do not work in the same way, or maybe they are even better optimized than the nano lib in GCC. Well, now I have other issues, but that is another story ;). Hope this helps. clement

posted by Clément BENOIT 17 Apr 2015

1 Answer

6 years, 5 months ago.

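The stack size is divided by 4 since the p_TCB->stack array is an array of 32-bit words and not bytes, so the byte count in priv_stack has to be converted to a word index before it is used to index the array.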
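As a rough illustration of that arithmetic, the sketch below repeats the pointer steps from rt_init_stack() on a plain array and reports where the initial stack pointer ends up. initial_sp_index is a hypothetical helper written for this example, not an RTX function, and it only computes addresses without touching the memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given a stack buffer and its size in bytes, return the
   word index (relative to the buffer start) of the initial stack pointer. */
size_t initial_sp_index(const uint32_t *stack, uint32_t priv_stack_bytes) {
    assert(stack != NULL);
    assert(priv_stack_bytes >= 17u * 4u);  /* frame plus one alignment word must fit */

    size_t words = priv_stack_bytes >> 2;  /* bytes -> 32-bit words */
    const uint32_t *stk = &stack[words];   /* one past the last word: top of a descending stack */

    if ((uintptr_t)stk & 0x04u) {          /* top not 8-byte aligned: drop one word */
        stk--;
    }
    stk -= 16;                             /* reserve the 16-word initial frame */

    return (size_t)(stk - stack);
}
```

For a 512-byte stack that starts on an 8-byte boundary, this gives index 112, which is 128 words minus the 16-word frame. Indexing with priv_stack directly would instead point 4 times too far, well past the end of the buffer.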

The standard C libraries are different between the online compiler and GCC so such differences in stack usage aren't unexpected but it does suck when you have code that works with the online compiler and then fails in GCC. The best way is to do what you did, trap it under the debugger to see which thread has too small of a stack and then increase it.
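If a debugger is not at hand, one common alternative for sizing a stack is to "paint" the buffer with a known pattern before the thread starts and later count how many words were never overwritten. This is a different technique from the one described above, and the sketch below is host-side only: demo_stack_paint, demo_stack_unused_words and DEMO_FILL are hypothetical names, not part of mbed or RTX.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill pattern chosen for this sketch. */
#define DEMO_FILL 0xDEADBEEFu

/* Fill the whole stack buffer with the pattern before the thread runs. */
void demo_stack_paint(uint32_t *stack, size_t words) {
    for (size_t i = 0; i < words; i++) {
        stack[i] = DEMO_FILL;
    }
}

/* Count untouched words from the bottom of the buffer upwards. On a
   descending stack these are the words the thread never reached, so
   (words - result) approximates the peak stack usage in words. */
size_t demo_stack_unused_words(const uint32_t *stack, size_t words) {
    size_t unused = 0;
    while (unused < words && stack[unused] == DEMO_FILL) {
        unused++;
    }
    return unused;
}
```

A result close to zero for a thread suggests its stack is nearly exhausted. The count is only an estimate, since a thread can skip over words without writing them or happen to write the pattern value itself.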