Memory Optimization

Beginning with Mbed OS 5, new features such as the RTOS increased flash and RAM usage. This guide explains how to optimize program memory usage for release builds using Mbed OS 5.

Note: More information about the memory usage differences between Mbed OS 2 and Mbed OS 5 is available in a blog post.

Removing unused modules

For a simple program like Blinky, a program that flashes an LED, typical memory usage is split among the following modules:

+---------------------+-------+-------+-------+
| Module              | .text | .data |  .bss |
+---------------------+-------+-------+-------+
| Fill                |   132 |     4 |  2377 |
| Misc                | 28807 |  2216 |    88 |
| features/frameworks |  4236 |    52 |   744 |
| hal/common          |  2745 |     4 |   325 |
| hal/targets         | 12172 |    12 |   200 |
| rtos/rtos           |   119 |     4 |     0 |
| rtos/rtx            |  5721 |    20 |  6786 |
| Subtotals           | 53932 |  2312 | 10520 |
+---------------------+-------+-------+-------+

The features/frameworks module includes the Mbed OS test tools, and it is built in even if you are no longer testing your program. As a result, every binary contains one of the Mbed OS test harnesses. Removing this module saves a significant amount of RAM and flash memory.

Printf and UART

The linker can also remove other modules that your program does not use. For example, Blinky's main program doesn't use printf or UART drivers. However, every Mbed OS module handles traces and assertions by redirecting its error messages to printf on serial output. This forces the printf and UART drivers to be compiled in, which requires a large amount of flash memory.

To disable error logging to serial output, set the NDEBUG macro and the following configuration parameter in your program's mbed_app.json file:

{
    "macros": [
        "NDEBUG=1"
    ],
    "target_overrides": {
        "*": {
            "platform.stdio-flush-at-exit": false
        }
    }
}

Note: Results differ between compilers; the same change yields different memory savings with one compiler than with another.

Embedded targets

You can also take advantage of the fact that these programs only run on embedded targets. When you run a C++ application on a desktop computer, the operating system constructs every global C++ object before calling main. It also registers a handler to destroy these objects when the program ends. The code the compiler injects has some implications for the application:

  • The code that the compiler injects consumes memory.
  • It implies dynamic memory allocation and thus requires the binary to include malloc, even when the application does not use it.

When you run an application on an embedded device, you don't need handlers to destroy objects when the program exits, because the application never ends. You can save more RAM and flash memory by removing destructor registration on application startup and by eliminating the code that destructs objects when the operating system calls exit() at runtime.

memap - static memory map analysis

memap is a simple utility that displays static memory information required by Arm Mbed applications. This information is produced by analyzing the memory map file previously generated by your toolchain.

Note: This tool shows static RAM usage and the total size of allocated heap and stack space defined at compile time, not the actual heap and stack usage (which may be different depending on your application).

Using memap

memap is automatically invoked after an Mbed build finishes successfully. It's also possible to manually run the program with different command-line options, for example:

$> python memap.py
usage: memap.py [-h] -t TOOLCHAIN [-o OUTPUT] [-e EXPORT] [-v] file

Memory Map File Analyser for ARM mbed version 0.3.11

positional arguments:
  file                  memory map file

optional arguments:
  -h, --help            show this help message and exit
  -t TOOLCHAIN, --toolchain TOOLCHAIN
                        select a toolchain used to build the memory map file
                        (ARM, ARMC6, GCC_ARM, IAR)
  -o OUTPUT, --output OUTPUT
                        output file name
  -e EXPORT, --export EXPORT
                        export format (examples: 'json', 'csv-ci', 'table':
                        default)
  -v, --version         show program's version number and exit

Result example:

$> python memap.py GCC_ARM\myprog3.map -t GCC_ARM

+----------------------------+-------+-------+------+
| Module                     | .text | .data | .bss |
+----------------------------+-------+-------+------+
| Fill                       |   170 |     0 | 2294 |
| Misc                       | 36282 |  2220 | 2152 |
| core/hal                   | 15396 |    16 |  568 |
| core/rtos                  |  6751 |    24 | 2662 |
| features/FEATURE_IPV4      |    96 |     0 |   48 |
| frameworks/greentea-client |   912 |    28 |   44 |
| frameworks/utest           |  3079 |     0 |  732 |
| Subtotals                  | 62686 |  2288 | 8500 |
+----------------------------+-------+-------+------+
Allocated Heap: 65540 bytes
Allocated Stack: 32768 bytes
Total Static RAM memory (data + bss): 10788 bytes
Total RAM memory (data + bss + heap + stack): 109096 bytes
Total Flash memory (text + data + misc): 66014 bytes

Information on memory sections

The table above showed multiple memory sections.

  • .text: application code and constants, located in Flash.
  • .data: nonzero initialized variables; allocated in both RAM and Flash memory (variables are copied from Flash to RAM at runtime).
  • .bss: uninitialized data allocated in RAM, or variables initialized to zero.
  • Heap: dynamic allocations in the Heap area in RAM (for example, used by malloc). The maximum size value may be defined at build time.
  • Stack: dynamic allocations in the Stack area in RAM (for example, used to store local data, temporary data when branching to a subroutine or context switch information). The maximum size value may be defined at build time.

There are other entries that require a bit of clarification:

  • Fill: represents the bytes in multiple sections (RAM and Flash) that the toolchain has filled with zeros because it requires subsequent data or code to be aligned appropriately in memory.
  • Misc: usually represents helper libraries introduced by the toolchain (like libc), but can also represent modules that are not part of Mbed.
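
The RAM totals that memap prints can be reproduced from the section sizes. The following is a minimal sketch in standard C++, not part of memap itself; the SectionTotals type and both function names are hypothetical and exist only to illustrate the arithmetic behind the "Total Static RAM memory" and "Total RAM memory" lines.

```cpp
#include <cstdint>

// Section sizes in bytes, as found on the "Subtotals" row of a memap table.
// Hypothetical helper type for illustration; memap does not define it.
struct SectionTotals {
    std::uint32_t text;  // code and constants
    std::uint32_t data;  // nonzero initialized variables
    std::uint32_t bss;   // uninitialized or zero-initialized variables
};

// Static RAM: .data plus .bss.
constexpr std::uint32_t static_ram(const SectionTotals &s)
{
    return s.data + s.bss;
}

// Total RAM: static RAM plus the heap and stack sizes fixed at build time.
constexpr std::uint32_t total_ram(const SectionTotals &s,
                                  std::uint32_t heap,
                                  std::uint32_t stack)
{
    return static_ram(s) + heap + stack;
}
```

With the Subtotals row from the result example (.data 2288, .bss 8500) and the reported heap and stack sizes (65540 and 32768 bytes), static_ram gives 10788 and total_ram gives 109096, which matches the memap output.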

Current support

We have tested memap on Windows 7, Linux and Mac OS X. The GCC_ARM (GNU Arm Embedded Toolchain), Arm Compiler 5, Arm Compiler 6 and IAR toolchains generate memory map files.

Known issues and new features

This utility is considered "alpha" quality at the moment. The information generated by this utility may not be fully accurate and may vary from one toolchain to another.

If you are experiencing problems, or would like additional features, please raise a ticket on GitHub and use [memap] in the title.

Runtime memory tracing

Running out of memory is a common problem with resource-constrained systems such as the MCUs on which Arm Mbed OS runs. When faced with an out-of-memory error, you often need to understand how your software uses dynamic memory. The runtime memory tracer in Mbed OS 5 is the tool that shows the runtime memory allocation patterns of your software: which parts of the code allocate and free memory and how much memory they need.

Using the memory tracer

The memory tracer is not enabled by default. To enable it, you need to define the MBED_MEM_TRACING_ENABLED macro. The recommended way to define this macro is to add it to the list of macros defined in your mbed_app.json:

{
    "macros": ["MBED_MEM_TRACING_ENABLED"]
}

Tip: See the documentation of the Arm Mbed configuration system for more details about mbed_app.json.

After it is enabled, the memory tracer intercepts the calls to the standard allocation functions (malloc, realloc, calloc and free). It invokes a user supplied callback each time one of these functions is called. To let the tracer know which callback it needs to invoke, call mbed_mem_trace_set_callback(callback_function_name) as early as possible (preferably at the beginning of your main function). You can find the full documentation of the callback function in the memory tracer header file. The tracer supplies a default callback function (mbed_mem_trace_default_callback) that outputs trace data on the Mbed console (using printf). For each memory operation, the callback outputs a line that begins with #<op>:<0xresult>;<0xcaller>-:

  • op identifies the memory operation (m for malloc, r for realloc, c for calloc and f for free).
  • result (base 16) is the result returned by the memory operation. This is always 0 for free because free doesn't return anything.
  • caller (base 16) is the address in the code where the memory operation was called.

The rest of the output depends on the operation being traced:

  • For malloc: size, where size is the original argument to malloc.
  • For realloc: 0xptr;size, where ptr (base 16) and size are the original arguments to realloc.
  • For calloc: nmemb;size, where nmemb and size are the original arguments to calloc.
  • For free: 0xptr, where ptr (base 16) is the original argument to free.

Examples:

  • #m:0x20003240;0x600d-50 encodes a malloc that returned 0x20003240. It was called by the instruction at 0x600d with the size argument equal to 50.
  • #f:0x0;0x602f-0x20003240 encodes a free that was called by the instruction at 0x602f with the ptr argument equal to 0x20003240.
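
To make the line format concrete, here is a minimal sketch of a decoder for it, written in standard C++ so that it can run on a development machine against captured console output. It is not part of Mbed OS: TraceRecord and parse_trace_line are hypothetical names, and the sketch assumes 32-bit addresses and exactly the format described above.

```cpp
#include <cstdint>
#include <cstdio>

// One decoded trace line. Fields an operation does not use stay 0.
// Hypothetical type for illustration only.
struct TraceRecord {
    char op;               // 'm', 'r', 'c' or 'f'
    std::uint32_t result;  // value returned by the operation (0 for free)
    std::uint32_t caller;  // address of the calling instruction
    std::uint32_t ptr;     // pointer argument (realloc and free)
    std::uint32_t nmemb;   // element count (calloc)
    std::uint32_t size;    // size argument (malloc, realloc and calloc)
};

// Decodes a line such as "#m:0x20003240;0x600d-50".
// Returns true on success and false if the line does not match the format.
bool parse_trace_line(const char *line, TraceRecord &rec)
{
    rec = TraceRecord();
    char op = 0;
    unsigned int result = 0, caller = 0, a = 0, b = 0;
    int consumed = 0;

    // Common prefix: "#<op>:0x<result>;0x<caller>-"
    if (std::sscanf(line, "#%c:0x%x;0x%x-%n", &op, &result, &caller, &consumed) != 3
            || consumed == 0) {
        return false;
    }
    const char *rest = line + consumed;

    // The remainder depends on the operation being traced.
    switch (op) {
        case 'm':
            if (std::sscanf(rest, "%u", &a) != 1) return false;
            rec.size = a;
            break;
        case 'r':
            if (std::sscanf(rest, "0x%x;%u", &a, &b) != 2) return false;
            rec.ptr = a;
            rec.size = b;
            break;
        case 'c':
            if (std::sscanf(rest, "%u;%u", &a, &b) != 2) return false;
            rec.nmemb = a;
            rec.size = b;
            break;
        case 'f':
            if (std::sscanf(rest, "0x%x", &a) != 1) return false;
            rec.ptr = a;
            break;
        default:
            return false;
    }
    rec.op = op;
    rec.result = result;
    rec.caller = caller;
    return true;
}
```

Applied to the two example lines above, the decoder yields a malloc record with result 0x20003240, caller 0x600d and size 50, and a free record with result 0, caller 0x602f and ptr 0x20003240.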

Find the source of the default callback here. Besides being useful in itself, it can also serve as a template for user defined callback functions.

Example

A simple code example that uses the memory tracer on a K64F board:

#include <stdlib.h>
#include "mbed.h"
#include "mbed_mem_trace.h"


int main() {
    mbed_mem_trace_set_callback(mbed_mem_trace_default_callback);
    while (true) {
        void *p = malloc(50);
        wait(0.5);
        free(p);
    }
}

It outputs the following trace:

#m:0x20003080;0x182f-50
#f:0x0;0x183f-0x20003080
#m:0x20003080;0x182f-50
#f:0x0;0x183f-0x20003080
#m:0x20003080;0x182f-50
#f:0x0;0x183f-0x20003080
...
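
A captured trace like this can be replayed on a development machine to check that every allocation is eventually freed. The following is a minimal sketch under stated assumptions: outstanding_bytes is a hypothetical helper, not an Mbed OS function, it assumes 32-bit addresses, and it only understands malloc and free lines (realloc and calloc lines are ignored).

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Replays malloc ('m') and free ('f') trace lines and returns the number of
// bytes still allocated after the last line. Lines that match neither
// pattern are skipped.
std::size_t outstanding_bytes(const std::vector<std::string> &lines)
{
    std::map<unsigned int, unsigned int> live;  // address -> allocated size

    for (const std::string &line : lines) {
        unsigned int result = 0, caller = 0, arg = 0;
        if (std::sscanf(line.c_str(), "#m:0x%x;0x%x-%u", &result, &caller, &arg) == 3) {
            live[result] = arg;   // malloc: remember the block and its size
        } else if (std::sscanf(line.c_str(), "#f:0x%x;0x%x-0x%x", &result, &caller, &arg) == 3) {
            live.erase(arg);      // free: the argument is the released pointer
        }
    }

    std::size_t total = 0;
    for (const auto &entry : live) {
        total += entry.second;
    }
    return total;
}
```

For the trace above, each malloc of 50 bytes at 0x20003080 is followed by a matching free, so the function returns 0. A nonzero result points at blocks that were allocated but never freed during the capture.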

Limitations

  • The tracer doesn't handle nested calls of the memory functions. For example, if you call realloc and the implementation of realloc calls malloc internally, the call to malloc is not traced.
  • The caller argument of the callback function isn't always reliable. It doesn't work at all on some toolchains, and it might output erroneous data on others.

Runtime statistics

Arm Mbed OS 5 provides various runtime statistics to help characterize resource usage. This allows easy identification of potential problems, such as a stack close to overflowing. The metrics currently supported are available for the heap and the stack.

Heap statistics

Heap statistics provide exact information about the number of bytes dynamically allocated by a program. They do not take into account heap fragmentation or allocation overhead. This allows allocation size reports to remain consistent, regardless of order of allocation (fragmentation) or allocation algorithm (overhead).

To enable heap stats:

  1. Add the command-line flag -DMBED_HEAP_STATS_ENABLED=1.
  2. Use the function mbed_stats_heap_get() to take a snapshot of heap stats.

Note: This function is available even when the heap stats are not enabled, but always returns zero for all fields.

Example use cases

  • Getting worst case memory usage (max_size) to properly size MCU RAM.
  • Detecting program memory leaks by the current size allocated (current_size) or number of allocations in use (alloc_cnt).
  • Checking alloc_fail_cnt to see whether allocations have been failing, and if so, how many.
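
The leak and failure checks can be expressed as comparisons between two snapshots taken before and after an operation that should release everything it allocates. The sketch below is standard C++ that runs on a host: HeapSnapshot is a hypothetical stand-in that models only the four fields named in this guide (the real structure is declared in mbed_stats.h), and both function names are illustrative.

```cpp
#include <cstdint>

// Host-side stand-in for a heap statistics snapshot. Only the fields named
// in this guide are modeled; this is not the Mbed OS type itself.
struct HeapSnapshot {
    std::uint32_t current_size;    // bytes currently allocated
    std::uint32_t max_size;        // worst-case bytes allocated so far
    std::uint32_t alloc_cnt;       // allocations currently in use
    std::uint32_t alloc_fail_cnt;  // allocations that have failed
};

// A leak is suspected if either the allocated byte count or the number of
// allocations in use grew across an operation that should be balanced.
bool leak_suspected(const HeapSnapshot &before, const HeapSnapshot &after)
{
    return after.current_size > before.current_size
        || after.alloc_cnt > before.alloc_cnt;
}

// Number of allocation failures that occurred between the two snapshots.
std::uint32_t new_failures(const HeapSnapshot &before, const HeapSnapshot &after)
{
    return after.alloc_fail_cnt - before.alloc_fail_cnt;
}
```

On a target, the two snapshots would come from calls to mbed_stats_heap_get() placed around the code under test.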

Example program using heap statistics

#include "mbed.h"
#include "mbed_stats.h"

int main(void)
{
    mbed_stats_heap_t heap_stats;

    printf("Starting heap stats example\r\n");

    void *allocation = malloc(1000);
    printf("Allocated 1000 bytes\r\n");

    // Snapshot while the allocation is still live
    mbed_stats_heap_get(&heap_stats);
    printf("Current heap: %lu\r\n", heap_stats.current_size);
    printf("Max heap size: %lu\r\n", heap_stats.max_size);

    printf("Freeing 1000 bytes\r\n");
    free(allocation);

    mbed_stats_heap_get(&heap_stats);
    printf("Current heap after: %lu\r\n", heap_stats.current_size);
    printf("Max heap size after: %lu\r\n", heap_stats.max_size);
}

Side effects of enabling heap statistics

  • An additional 8 bytes of overhead for each memory allocation.
  • The function realloc will never reuse the buffer it is resizing.
  • Memory allocation is slightly slower due to the added bookkeeping.

Stack statistics

Stack stats provide information on the allocated stack size of a thread and the worst case stack usage. Any thread on the system can be queried for stack information.

To enable stack stats, add the command-line flag -DMBED_STACK_STATS_ENABLED=1.

There are two functions you can use to access the stack stats:

  • mbed_stats_stack_get calculates combined stack information for all threads.
  • mbed_stats_stack_get_each provides stack information for each thread separately.

Note: These functions are available even when the stack stats are not enabled but always return zero for all fields.

Example use cases

  • Using max_size to calibrate stack sizes for each thread.
  • Detecting which stack is close to overflowing.
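
Both use cases reduce to simple arithmetic on the worst-case usage (max_size) and the allocated size (reserved_size) of a thread's stack. The sketch below is standard C++ that runs on a host: StackSnapshot is a hypothetical stand-in that models only the fields used in this guide, and the function names, the percentage threshold and the margin are assumptions chosen for illustration. The result is rounded up to a multiple of 8 bytes because the Arm procedure call standard requires 8-byte stack alignment.

```cpp
#include <cstdint>

// Host-side stand-in for one per-thread stack statistics entry. Only the
// fields used in this guide are modeled; this is not the Mbed OS type itself.
struct StackSnapshot {
    std::uint32_t thread_id;      // identifier of the thread
    std::uint32_t max_size;       // worst-case stack usage in bytes
    std::uint32_t reserved_size;  // allocated stack size in bytes
};

// True when worst-case usage has reached threshold_percent of the stack.
// 64-bit arithmetic keeps the multiplication from overflowing.
bool near_overflow(const StackSnapshot &s, std::uint32_t threshold_percent)
{
    return static_cast<std::uint64_t>(s.max_size) * 100
        >= static_cast<std::uint64_t>(s.reserved_size) * threshold_percent;
}

// Worst-case usage plus a safety margin, rounded up to a multiple of 8 bytes.
std::uint32_t suggested_stack_size(std::uint32_t max_size, std::uint32_t margin_percent)
{
    std::uint64_t scaled = static_cast<std::uint64_t>(max_size) * (100 + margin_percent);
    std::uint64_t bytes = (scaled + 99) / 100;  // divide by 100, rounding up
    return static_cast<std::uint32_t>((bytes + 7) / 8 * 8);
}
```

For example, a thread that has used 3900 bytes of a 4096-byte stack is flagged by near_overflow with a 90 percent threshold, and a worst case of 1000 bytes with a 25 percent margin gives a suggested size of 1256 bytes.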

Example program using stack statistics

#include "mbed.h"
#include "mbed_stats.h"

int main(void)
{
    printf("Starting stack stats example\r\n");

    int cnt = osThreadGetCount();
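    mbed_stats_stack_t *stats = (mbed_stats_stack_t*) malloc(cnt * sizeof(mbed_stats_stack_t));
    if (stats == NULL) {
        printf("Failed to allocate the stats buffer\r\n");
        return 1;
    }

    // Fill one entry per thread; the return value is the number of entries written
    cnt = mbed_stats_stack_get_each(stats, cnt);
    for (int i = 0; i < cnt; i++) {
        printf("Thread: 0x%X, Stack size: %u, Max stack: %u\r\n", stats[i].thread_id, stats[i].reserved_size, stats[i].max_size);
    }

    free(stats);
}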