8 years, 4 months ago.

Critical section bug + unpredictable behaviour

Hello,

I guess I found a bug in Timer.cpp I'll explain what is happening (Correct me if I am wrong). Then I will explain another problem happening, I can't even debug or think of root cause, so I'd really appreciate it if someone brainstorm with me.

So First let's talk about first bug in Timer.cpp, there are two function

int Timer::read_us() {
    core_util_critical_section_enter();
    int time = _time + slicetime();
    core_util_critical_section_exit();
    return time;
}

int Timer::slicetime() {
    core_util_critical_section_enter();
    int ret = 0;
    if (_running) {
        ret = ticker_read(_ticker_data) - _start;
    }
    core_util_critical_section_exit();
    return ret;
}

As you can see, read_us() calls slicetime() inside critical section, while slicetime() contain critical section. So what happen is we enter nested critical section and the counter inside the critical section increment before it's decremented.

void core_util_critical_section_enter(void)
{
    bool interrupts_disabled = !core_util_are_interrupts_enabled();
    __disable_irq();

    /* Save the interrupt disabled state as it was prior to any nested critical section lock use */
    if (!interrupt_enable_counter) {
        critical_interrupts_disabled = interrupts_disabled;
    }

    /* If the interrupt_enable_counter overflows or we are in a nested critical section and interrupts
       are enabled, then something has gone badly wrong thus assert an error.
    */
    MBED_ASSERT(interrupt_enable_counter < UINT32_MAX); 
// FIXME
#ifndef   FEATURE_UVISOR
    if (interrupt_enable_counter > 0) {
        MBED_ASSERT(interrupts_disabled);
    }
#else
#warning "core_util_critical_section_enter needs fixing to work from unprivileged code"
#endif /* FEATURE_UVISOR */
    interrupt_enable_counter++;
}

void core_util_critical_section_exit(void)
{
    /* If critical_section_enter has not previously been called, do nothing */
    if (interrupt_enable_counter) {

// FIXME
#ifndef   FEATURE_UVISOR
        bool interrupts_disabled = !core_util_are_interrupts_enabled(); /* get the current interrupt disabled state */

        MBED_ASSERT(interrupts_disabled); /* Interrupts must be disabled on invoking an exit from a critical section */
#else
#warning "core_util_critical_section_exit needs fixing to work from unprivileged code"
#endif /* FEATURE_UVISOR */

        interrupt_enable_counter--;

        /* Only re-enable interrupts if we are exiting the last of the nested critical sections and
           interrupts were enabled on entry to the first critical section.
        */
        if (!interrupt_enable_counter && !critical_interrupts_disabled) {
            __enable_irq();
        }
    }
}

Inside function void core_util_critical_section_enter(void), if counter interrupt_enable_counter>0, MBED should assert and go to mbed_die() function, which is what happens because there are two consecutive calls for function void core_util_critical_section_enter(void).

The second issue, which is related to my implementation,: I have been porting mbed to Tiva C using ARM GCC, I've successfully implemented GPIO and us_ticker, and was function alright so far, Until, I started to implement UART to communicate with ESP WIFI module. When there's UART receiver interrupt, Something weird happens to the counter inside the function void core_util_critical_section_enter(void), the counter suddenly increase to several hundreds. And the function calls before core_util_critical_section_enter(void) are always different and the incremented value are always different too. but when I force the counter to 0, the program continues normally (but not functioning the way I want). So the second question is, Can you think how should I debug this issue? or how could I exclude the problem one by one. You can find my code in the following link :

https://github.com/Mohamedsaleh14/mbed-porting-to-tiva-C--using-Eclipse-and-ARM-GCC/tree/Flashing-ESP-through-UART

2 Answers

8 years, 4 months ago.

Hello,

I do not yet provide an answer, but would like to have this logged in on github where we can reference the code and a patch plus mention other developers to get their attention. Can you please create an issue for it https://github.com/ARMmbed/mbed-os ?

We will have a look at this, thanks for reporting !

8 years, 4 months ago.

Hello Mohamed,

The function core_util_critical_section_enter()/core_util_critical_section_exit() is safe to call in nested blocks. When interrupt_enable_counter > 0 the code should only call mbed_die() if interrupts are enabled. Is it possible that the board you are using is enabling interrupts in ticker_read()?

As for debugging the critical section counter, I would recommend adding code to break or loop if the counter gets above 10 or so, as the count should not typically get this high. If you have a debugger attached, you should be able to see the callstack and what is causing the count to increment so high. If you don't find anything there, I would then recommend looking at the map file to see if there are any buffers near the critical section counter which could be overflowing and clobbering the counter.