9 years, 11 months ago.

mbed-rtos memory utilization

Is it normal for the mbed-rtos library to use 200k of RAM? I took the basic LED template project and added the heapSize() function I found here on mbed to check how much heap space is available.

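#include <stdint.h>
#include <stdlib.h>

// Rough estimate: distance from a fresh heap allocation up to the current stack position.
unsigned long heapSize()
{
    char   stackVariable;
    void   *heap;
    unsigned long result;
    heap  = malloc(4);
    if (heap == NULL)
        return 0;   // nothing could be allocated, so report no free space
    result  = (uint8_t*)&stackVariable - (uint8_t*)heap;
    free(heap);
    return result;
}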

If I just run the program as it is auto-generated, I'm left with 261208 bytes free. If I merely add the mbed-rtos library and run my program again, it goes down to about 60k.

Is this normal memory usage for mbed-rtos?

Dave, I would be interested in seeing the actual pointer values of &stackVariable and heap to see how they deviate from what I expect. Can you add a printf() to your heapSize function to print those addresses and report them here? I don't have a K64F device to try your code out on for myself.
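A minimal sketch of the instrumented function requested above, assuming printf() is retargeted to the serial port on the target. The name heapSizeVerbose and its two out-parameters are hypothetical and are not part of mbed; they only make the two addresses available to the caller as well as printing them.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints and returns the two addresses behind the estimate. */
unsigned long heapSizeVerbose(uintptr_t *stackAddr, uintptr_t *heapAddr)
{
    char stackVariable;            /* lives on the current thread's stack */
    void *heap = malloc(4);        /* lands near the current top of the heap */

    if (heap == NULL)
        return 0;

    *stackAddr = (uintptr_t)&stackVariable;
    *heapAddr  = (uintptr_t)heap;
    printf("stack: 0x%08lx  heap: 0x%08lx\n",
           (unsigned long)*stackAddr, (unsigned long)*heapAddr);

    free(heap);
    return (unsigned long)(*stackAddr - *heapAddr);
}
```

The returned gap is only meaningful when the stack sits above the heap in the address map, which is the assumption the original heapSize() makes as well.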

From what I see in the RTOS code, I would expect the main thread's stack to be similar to what you have when there is no RTOS in use. The beginning of the heap should be pushed ahead by the fact that the RTOS will use more RAM for the globals that it maintains but I still wouldn't expect it to be 200k.

posted by Adam Green 09 Dec 2014

The values I get (on K64F) are:

stack: 0x1ffffbc4  heap: 0x1fff1170  free: 0xea54 (59988)
posted by Giles Biddison 09 Dec 2014

3 Answers

9 years, 10 months ago.

I have issued a pull request with a fix for this issue. I increased the priv_stack field to be 32-bits as well.

https://github.com/mbedmicro/mbed/pull/826

Thanks, looks great!

Can you explain why the extra u16 reserved2 is necessary? It looks Word aligned to me whether that's there or not?

posted by Giles Biddison 06 Jan 2015

Giles Biddison wrote:

It looks Word aligned to me whether that's there or not?

That added u16 reserved2 field explicitly shows the 2 bytes of padding that need to be inserted in the structure to properly align the now 4-byte long priv_stack field. If I didn't insert it, it would have been automatically inserted by the compiler. This just makes it explicit so that people don't make the mistake you made when you tried setting the TCB_TSTACK offset to a value of 38 (0x26), which isn't 4-byte aligned.
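The padding argument can be checked with offsetof(). The two structs below are hypothetical fragments, not the real OS_TCB layout; the field named state is invented for illustration. On ABIs where a 32-bit integer needs 4-byte alignment (ARM EABI and x86-64, for example), both variants put priv_stack at the same offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment: the compiler silently adds 2 padding bytes. */
typedef struct {
    uint8_t  state;
    uint8_t  reserved;
    uint32_t priv_stack;   /* must start on a 4-byte boundary */
} TcbImplicitPad;

/* Same layout with the padding written out as a named field. */
typedef struct {
    uint8_t  state;
    uint8_t  reserved;
    uint16_t reserved2;    /* the 2 padding bytes, now visible */
    uint32_t priv_stack;
} TcbExplicitPad;

size_t implicit_offset(void) { return offsetof(TcbImplicitPad, priv_stack); }
size_t explicit_offset(void) { return offsetof(TcbExplicitPad, priv_stack); }
```

Because the compiler rounds the offset up to a multiple of 4 either way, an assembly-side constant such as 38 can never match a field that follows a 4-byte member.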

posted by Adam Green 07 Jan 2015
9 years, 11 months ago.

That should not be the actual memory usage: very few targets even have that much space!

However, I am not sure that your current method is the correct way to assess memory usage under an RTOS. The problem is that each thread has its own memory area for its stack, and the memory model is a lot more complex than without an RTOS.

I definitely agree that this can't possibly be right. Unfortunately, I couldn't find my other mbed target, which has fewer resources, so I couldn't run it there and compare. I'll look around for other methods of dumping the available RAM, but this is all I have seen so far. I definitely have some kind of memory issue, though. The method outlined above shows very little RAM available, and if I do something as simple as allocate a byte array on the stack, my program indicates a hard fault error. Thanks for your response!

posted by Dave M 09 Dec 2014

On which stack are you allocating it, and how large is it? Each thread has its own stack of limited size (you can override the size when creating a new thread). I would make large arrays global; they are then placed in static memory rather than on a thread's stack.

posted by Erik - 09 Dec 2014

It was in the main thread, not in a thread that I had explicitly created. Basically, at some point creating *anything* would cause the program to fail. The problem came up when I found that adding to my constructor's initializer list would result in a hard fault indication. Even adding an object with an empty implementation would crash... hence the hunt for memory problems.

posted by Dave M 09 Dec 2014

Erik, would you mind running my app to see what I am referring to? http://developer.mbed.org/users/dmatsumoto/code/RtosHeapTest/

posted by Dave M 09 Dec 2014

I am sure I will get the same result as you get, however that isn't the amount of memory used by RTOS :).

If you have something simple which crashes I can look at it.

Edit: On the LPC1768 it goes directly to the hardfault handler, how did you do that :P

Edit2: Without the printf in the RTOS:

in main, heap free is: 27184

Which is virtually its entire RAM.

For K64F:

in main, heap free is: 60024

F401 which has 96k SRAM:

Stack = 0x20007BF0, heap = 0x20000940, diff = 29360

(Changed it a bit around there).

I don't know enough about the RTOS memory model to say anything relevant about it :).

posted by Erik - 09 Dec 2014

So if I'm understanding what you're saying, for the LPC1768 the RTOS uses almost no RAM with my test application, but the K64F seems to use almost 200k, whereas for the F401 it uses over 60k? Is that correct?

posted by Dave M 09 Dec 2014

Well, if you replace 'RAM usage' with 'difference between heap and stack pointers taken in a thread' (the main thread is also a thread), then yes :P. I am not willing to say that this is the actual free memory we are measuring.

Btw, on the K64F I also checked the stack/heap locations:

Stack = 0x1FFFFBF0, heap = 0x1FFF1178, diff = 60024

Which is in the first quarter of the RAM. So at first glance it would look like the stack is huge, but again, I would need to spend a lot more time on it to draw that conclusion :).

posted by Erik - 09 Dec 2014

Haha, ok, fair enough. :) I'll keep looking for possible issues.

posted by Dave M 09 Dec 2014

Found something interesting!

in HAL_CM.c:

void rt_init_stack (P_TCB p_TCB, FUNCP task_body) {
  /* Prepare TCB and saved context for a first time start of a task. */
  U32 *stk,i,size;

  /* Prepare a complete interrupt frame for first task start */
  size = p_TCB->priv_stack >> 2;

Now since main is a thread, and it takes up whatever memory is available, let's just assume it's about 200k. When the main thread is created, priv_stack should be set to:

os_thread_def_main.stack_size = INITIAL_SP - HEAP_START - (OS_SCHEDULERSTKSIZE*4)

which is 0x3eaf4. However, priv_stack is declared as U16, so the bits above the low 16 are discarded and 0xeaf4 is stored instead. This cuts the stack size to roughly a quarter of what it should be.
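The truncation can be reproduced on any host compiler. This is only a sketch of the arithmetic: store_in_u16 and size_in_words are hypothetical helper names that stand in for the assignment to priv_stack and for the `>> 2` in rt_init_stack.

```c
#include <stdint.h>

/* Mimics assigning a 32-bit byte count to the U16 priv_stack field. */
uint16_t store_in_u16(uint32_t stack_size)
{
    return (uint16_t)stack_size;   /* keeps only the low 16 bits */
}

/* Mimics `size = p_TCB->priv_stack >> 2;` (bytes to 32-bit words). */
uint32_t size_in_words(uint16_t priv_stack)
{
    return (uint32_t)priv_stack >> 2;
}
```

With the value from this thread, store_in_u16(0x3eaf4) gives 0xeaf4 (60148 bytes), which is close to the roughly 60k that the heap check reports as free.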

Unfortunately, there is a lot of code to wade through because merely changing from U16 to U32 causes issues with memory alignment and access from some assembly code. Removing the >> 2 in the size calculation also does not work.

posted by Dave M 09 Dec 2014

That indeed seems weird, although it is also 16-bit for the Cortex-A targets. And they definitely should be able to handle a lot more than 60kB for a thread.

posted by Erik - 09 Dec 2014

Dave,

That is a good catch. It definitely looks like it could be the cause of the problem you are seeing. It is very unfortunate that the user specifies the stack size via a 32-bit field, os_thread_def::stacksize, but the internal RTOS OS_TCB::priv_stack field is only 16 bits wide.

One fix would be to modify set_main_stack() in RTX_CM_lib.h to cap the stacksize and stack_pointer fields to be 64k below INITIAL_SP - (OS_SCHEDULERSTKSIZE * 4). This would work around the size limit in the OS_TCB structure but still give you the stack you expect.

posted by Adam Green 09 Dec 2014

I wonder what happens if you change set_main_stack() in RTX_CM_lib.h to something like this:

void set_main_stack(void) {
    // Make sure that stack isn't greater than 64k as that is the maximum
    // supported by RTX internal OS_TCB thread structure.
    uint32_t topOfStack = INITIAL_SP - (OS_SCHEDULERSTKSIZE * 4);
    uint32_t stackSize = topOfStack - (uint32_t)HEAP_START;
    if (stackSize >= 0x10000)
        stackSize = 0xFFFC;

    // That is the bottom of the main stack block: no collision detection.
    // No collision detection means that stack can grow larger than 64k limit.
    os_thread_def_main.stack_pointer = (uint8_t*)(topOfStack - stackSize);

    // Leave OS_SCHEDULERSTKSIZE words for the scheduler and interrupts
    os_thread_def_main.stacksize = stackSize;
}
posted by Adam Green 10 Dec 2014

Hi Adam, this hack seems to work, though it's not optimal since we can't get the maximum stack size for the main thread (but at this point it's better than nothing!). Thanks for the suggestion. Just as a nitpick, should stackSize = 0xFFF8 in your check, since everything is supposed to be 8-byte aligned? I know there's code to handle such a case, but I was just wondering.

posted by Dave M 10 Dec 2014

There is no limit check on the main stack so it should actually grow over 64k. That means you don't get a limited main stack. You are telling the RTOS it is 64k but it will grow until the heap collides with it.

I believe the main 8-byte alignment happens later when RTX makes sure that the top of stack value actually set into PSP is properly aligned.
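The capping logic can be tried off-target by passing the linker-derived values in as parameters. This is a sketch under assumptions: capped_main_stack is a hypothetical stand-alone function, and any addresses or scheduler stack sizes fed to it are illustrative rather than taken from a real K64F build.

```c
#include <stdint.h>

#define MAX_TCB_STACK 0xFFFCu   /* largest 4-byte-aligned size that fits in 16 bits */

/* Returns the size to report to RTX and writes the resulting stack bottom. */
uint32_t capped_main_stack(uint32_t initial_sp, uint32_t heap_start,
                           uint32_t scheduler_words, uint32_t *stack_bottom)
{
    /* Leave room at the very top for the scheduler and interrupts. */
    uint32_t top_of_stack = initial_sp - scheduler_words * 4u;
    uint32_t stack_size   = top_of_stack - heap_start;

    if (stack_size >= 0x10000u)
        stack_size = MAX_TCB_STACK;

    /* The reported bottom moves up; nothing stops the real stack growing below it. */
    *stack_bottom = top_of_stack - stack_size;
    return stack_size;
}
```

When the gap between heap and stack is 0x3eaf4 bytes, the function reports 0xFFFC, while a gap below 64k is passed through unchanged.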

posted by Adam Green 10 Dec 2014

Thanks for that clarification!

posted by Dave M 10 Dec 2014

However, even if that's true for the main thread, how about other threads? Are they limited to 65k?

posted by Erik - 10 Dec 2014

I think you're bringing up a good point, Erik. If the stack grows from the top down, as shown in the mbed docs, then the main stack might not collide into anything but an increasing heap, but what if more threads are created? I have to look again to see how the other threads have their stacks allocated, but I would think there's a problem with the main thread's stack overrunning another thread's.

posted by Dave M 11 Dec 2014

The other threads' stacks are allocated from the heap. Their stacks are limited to 64k.

posted by Adam Green 12 Dec 2014

That makes sense on one hand; on the other hand, I think 64k is limiting, since apparently Cortex-A processors are also supported.

posted by Erik - 12 Dec 2014
9 years, 11 months ago.

From os_tcb.h

typedef struct OS_TCB {

...
  U8     reserved;
  U16    priv_stack;              /* Private stack size in bytes             */
...
} *P_TCB;

This is what Dave is talking about: OS_TCB::priv_stack is a U16, which limits any thread to less than 2^16 bytes of stack space.

The initial stack size calculation for the K64F comes out to 0x3eaf4 (256756) bytes, which is truncated to 0xeaf4 (60148) bytes.

This is why the RTOS appears to consume 200 KB on the K64F.

Attempting to naively change the type to U32 causes a hard fault.

There is at least 1 place where assembly code uses an offset into the TCB structure:

From rt_TypeDef.h

#define TCB_TSTACK      36        /* 'tsk_stack' offset                      */

But changing this to 38 does not prevent the hard fault.

The 16 bit size value is used in rt_init_stack (HAL_CM.c) to set the address of a pointer which eventually becomes the stack address.

rt_init_stack in HAL_CM.c uses a 32 bit size value when it does the actual stack pointer manipulation.

I've temporarily hacked this by adding a special case for the main thread which adds the truncated bits back in. Ugly, but does the job of restoring our stack space to us:

  // hack to fix main stack truncation for K64F
  if( rt_get_TID() == 0x01)  
    size = (0x030000 + p_TCB->priv_stack) >> 2;

We'll probably stick with this until someone can suggest a real fix (I'd like to just change the U16 in the struct to be a U32).
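The arithmetic behind the hack can be sketched as follows. restored_size_words is a hypothetical helper, and the constant 0x030000 only matches this particular build, where the untruncated main stack size is 0x3eaf4 bytes; a different heap start or RAM size would need a different value.

```c
#include <stdint.h>

/* Adds the lost high bits back before converting bytes to 32-bit words. */
uint32_t restored_size_words(uint16_t priv_stack, uint32_t lost_high_bits)
{
    return (lost_high_bits + priv_stack) >> 2;
}
```

For the truncated value 0xeaf4 and lost bits 0x030000, the result equals 0x3eaf4 >> 2, which is the word count rt_init_stack would have computed from an untruncated field.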

posted by Giles Biddison 16 Dec 2014