Performance comparison: Arduino Uno vs. ST Nucleo L152RE

1. Introduction

Recently I started to explore the ARM Cortex-M world and searched for suitable and affordable evaluation boards. Inevitably I came across blogs where readers argued in the comments about whether an AVR microcontroller is outperformed by an ARM or not. As usual, these comments contained a lot of opinion and no reliable data. So, now that I have an ARM board, I decided to compare it to the Arduino Uno.

I examined two things: the minimum latency, i.e. the minimum time between the occurrence of an external interrupt and a change of a digital output, and the timing of integer math operations.

I used the following hardware for the tests:

  • Arduino Uno R3 with ATmega328P @ 16 MHz
  • ST Nucleo L152RE with STM32L152RET6 @ 32 MHz
  • Oscilloscope for time measurement

2. Minimum latency measurement

To measure the minimum latency I configured one pin as an input and another pin as an output. Both pins were connected by a wire and the oscilloscope was hooked up to this connection. An interrupt handler which reacts to every edge was attached to the input pin. Its purpose is to set the output pin to the inverse value of the input pin. Since the output is connected back to the input, this leads to an endless loop of value changes that produces a square wave whose period is twice the minimum latency.
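The relation between the square wave on the oscilloscope and the latency can be written down as two small helpers. This is a minimal host-side sketch and not part of the original test code; the function names are made up for illustration.

```cpp
// Hypothetical helpers (not from the original test code).

// Each half period of the square wave is one interrupt-to-output delay,
// so the minimum latency is half of the measured period.
constexpr double latency_from_period_us(double period_us)
{
    return period_us / 2.0;
}

// Inverse direction: the oscillation frequency in kHz that a given
// latency (in microseconds) would produce.
constexpr double frequency_khz_from_latency_us(double latency_us)
{
    return 1000.0 / (2.0 * latency_us);
}
```

For example, a latency of 5 µs corresponds to a period of 10 µs, i.e. a 100 kHz square wave.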

2.1 Arduino Uno

On the Arduino Uno pin 2 was used as the input and pin 3 as the output. The following listing shows the first sketch I tried. This version uses the functions of the Wiring library as documented on the Arduino website and is most likely the way most Arduino users would implement it. The resulting minimum latency is 14.35µs.

Pure Wiring functions

static const int PIN_IN = 2;
static const int PIN_OUT = 3;
  
static void pin_in_on_change()
{
  digitalWrite(PIN_OUT, !digitalRead(PIN_IN));
}

void setup()
{
  pinMode(PIN_IN, INPUT);
  pinMode(PIN_OUT, OUTPUT);
  
  attachInterrupt(0, pin_in_on_change, CHANGE);
}

void loop()
{
  /* start interrupt loop: */
  digitalWrite(PIN_OUT, HIGH);
  while(1);
}

The second version is the way more advanced users would implement the interrupt handler. Here the port register is manipulated directly, which saves a lot of time. In this case the minimum latency is only 6.38µs.

Direct manipulation of port register

static const int PIN_IN = 2;
static const int PIN_OUT = 3;
  
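static void pin_in_on_change()
{
  /* PORTD holds the output latch, not the pin level. Bit 2 (the input pin)
     reads 0 there, so this expression toggles bit 3 (the output pin): */
  PORTD ^= ((~PORTD) << 1) & 0x08;
}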

void setup()
{
  pinMode(PIN_IN, INPUT);
  pinMode(PIN_OUT, OUTPUT);
  
  attachInterrupt(0, pin_in_on_change, CHANGE);
}

void loop()
{
  /* start interrupt loop: */
  digitalWrite(PIN_OUT, HIGH);
  while(1);
}
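On the ATmega328P the level of an input pin is read from PIND, while PORTD holds the output latch. A handler that really derives the output from the sampled input could therefore be built around a function like the one below. This is only a sketch: `next_portd` and the mask names are hypothetical, the logic is written as a pure function so it can be checked on a PC, and its timing was not measured.

```cpp
#include <cstdint>

// Bit masks for the pins used in the sketch:
// Arduino pin 2 is PD2 (input), Arduino pin 3 is PD3 (output).
constexpr std::uint8_t MASK_IN  = 0x04;
constexpr std::uint8_t MASK_OUT = 0x08;

// Hypothetical pure function: takes the sampled input register (PIND) and
// the current output latch (PORTD) and returns the new PORTD value with the
// output bit set to the inverse of the input bit. All other bits are kept.
constexpr std::uint8_t next_portd(std::uint8_t pind, std::uint8_t portd)
{
    return (pind & MASK_IN) ? static_cast<std::uint8_t>(portd & ~MASK_OUT)
                            : static_cast<std::uint8_t>(portd | MASK_OUT);
}
```

On the target the handler body would then be `PORTD = next_portd(PIND, PORTD);`.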

2.2 ST Nucleo

On the ST Nucleo pin D6 was used as the output and pin D7 as the input. With the following code the minimum latency was 5.23µs.

#include "mbed.h"

InterruptIn d7_in(D7);
DigitalOut d6_out(D6);

static void d7_on_change()
{
  d6_out = !d7_in;
}

int main()
{
  d7_in.rise(d7_on_change);
  d7_in.fall(d7_on_change);

  /* start interrupt loop: */
  d6_out = 1;

  /* keep main() from returning; the interrupt handler does all the work: */
  while(1);
}

2.3 Results

The final results are summarized in the following table:

Arduino with Wiring                      14.35µs
Arduino with direct port manipulation     6.38µs
ST Nucleo                                 5.23µs

The ST Nucleo is slightly faster (about 1.2 times) than the Arduino using direct port manipulation, and about 2.7 times faster than the Arduino using the Wiring functions.

3. Integer math comparison

To measure the execution time of mathematical operations I have configured one pin as an output pin and raised the output before the operation and lowered it afterwards. That produced a pulse for each operation and the length of the pulse equals the execution time of the math operation plus the time needed to raise and lower the output. To compensate for the time to raise and lower the output the math operation was omitted in the first pulse. Therefore the first pulse gives the time that has to be subtracted from the other pulses to get the pure math operation time. The following pulses encapsulated addition, subtraction, multiplication and division.
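The compensation step described above is a plain subtraction. A minimal sketch, with a made-up function name:

```cpp
// Hypothetical helper: subtracts the width of the empty calibration pulse
// (raise + lower only) from the width of a pulse that wraps a math
// operation. Both widths are in nanoseconds as read off the oscilloscope.
constexpr double net_operation_ns(double pulse_ns, double empty_pulse_ns)
{
    return pulse_ns - empty_pulse_ns;
}
```

With illustrative (not measured) pulse widths of 500 ns for an operation pulse and 300 ns for the empty pulse, the operation itself would account for 200 ns.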

3.1 Arduino Uno

On the Arduino Uno the following code was used:

/** 
 * Output pin for oscilloscope monitoring:
 */
static const int PIN_OUT = 3;

/** 
 * Declare variables as volatile to assure that the compiler does not optimize
 * mathematical operations away:
 */
static volatile uint8_t a, b, c, d, e, f;

void setup()
{
  pinMode(PIN_OUT, OUTPUT);
  
  a = 0;
  b = 0;
}

void loop()
{
  a++;
  b--;
  
  /* empty pulse: */
  PORTD |= 0x08;
  PORTD ^= 0x08;

  /* addition: */
  PORTD |= 0x08;
  c = a+b;
  PORTD ^= 0x08;
  
  /* subtraction: */
  PORTD |= 0x08;
  d = a-b;
  PORTD ^= 0x08;
  
  /* multiplication: */
  PORTD |= 0x08;
  e = a*b;
  PORTD ^= 0x08;
  
  /* division: */
  PORTD |= 0x08;
  f = a/b;
  PORTD ^= 0x08;
 
  delay(100);
}

To test other integer sizes, the type in the variable declaration (static volatile uint8_t a, b, c, d, e, f; on line 10 of the listing) was changed.

The following image shows the pulse burst produced by the above code:

[Image: /media/uploads/iliketux/arduino_uint8.png]

The final results for the different integer data types are shown in the following table:

Addition (uint8_t)                      186.00ns
Subtraction (uint8_t)                   186.00ns
Multiplication (uint8_t)                374.00ns
Division (uint8_t)                       12.63µs

Addition (int8_t)                       186.00ns
Subtraction (int8_t)                    186.00ns
Multiplication (int8_t)                 374.00ns
Division (int8_t)                        15.35µs

Addition (uint16_t)                     376.00ns
Subtraction (uint16_t)                  376.00ns
Multiplication (uint16_t)               940.00ns
Division (uint16_t)                      12.59µs

Addition (int16_t)                      376.00ns
Subtraction (int16_t)                   376.00ns
Multiplication (int16_t)                940.00ns
Division (int16_t)                       15.39µs

Addition (uint32_t)                     750.00ns
Subtraction (uint32_t)                  750.00ns
Multiplication (uint32_t)                 3.50µs
Division (uint32_t)                      36.35µs

Addition (int32_t)                      750.00ns
Subtraction (int32_t)                   750.00ns
Multiplication (int32_t)                  3.50µs
Division (int32_t)                       39.75µs

3.2 ST Nucleo

On the ST Nucleo the following code was used:

#include "mbed.h"

DigitalOut d6_out(D6);
DigitalOut led(LED1);

static volatile int32_t a, b, c, d, e, f;

void setup()
{
    a = 0;
    b = 0;
}

void loop()
{
    a++;
    b--;
    
    d6_out = 1;
    d6_out = 0;
        
    d6_out = 1;
    c = a + b;
    d6_out = 0;
    
    d6_out = 1;
    d = a - b;
    d6_out = 0;
    
    d6_out = 1;
    e = a * b;
    d6_out = 0;
    
    d6_out = 1;
    f = a / b;
    d6_out = 0;
    
    wait(0.1f);
}

int main() 
{
    setup();
    while(1) loop();
}

The following two images show the resulting pulse bursts:

[Image: /media/uploads/iliketux/st_nucleo_uint8.png]
[Image: /media/uploads/iliketux/st_nucleo_zoom.png]

During the tests it became apparent that the execution time of the integer math operations is independent of the integer type used. Therefore the final results can be summarized in a single table:

Addition                                156.00ns
Subtraction                             156.00ns
Multiplication                          156.00ns
Division                                280.00ns

3.3 Results

The comparison shows that the Arduino comes close to the ST Nucleo only for 8 bit additions and subtractions (186 ns vs. 156 ns). In all other cases the ST Nucleo is the clear winner, from about 2.4 times as fast for 8 bit multiplications up to 142 times as fast for signed 32 bit divisions.

