wait_ms implementation not suitable for <50us on FRDM KL25Z board

17 Mar 2013

The title should refer to the wait_us function, NOT the wait_ms function :(

The us_ticker_read function for the Freescale board has an issue with this line of code from us_ticker.c:

ticks = (~ticks) / 24;

It seems that the compiler is killing us with the divide-by-24 code: a wait_us(1) actually takes closer to 30+us, and the delay doesn't get close to the requested value until you are over the 100us mark.

I've implemented a version of wait_us which is more accurate for smaller delays (though it does have greater errors than the mbed library over 100us). Alternatively, simply remove the divide by 24 altogether and treat it as a wait_tick function.

uint32_t my_ticker_read() {
    uint64_t ticks;
    ticks  = (uint64_t)PIT->LTMR64H << 32;
    ticks |= (uint64_t)PIT->LTMR64L;
    ticks  = (~ticks); // remove divide by 24 from standard lib
    ticks = ticks >> 3; // divide by 8
    ticks = ticks * 0x55555556 >> 32; // divide by 3
    return (uint32_t)(0xFFFFFFFF & ticks);
}

void my_wait_us(int us) {
    uint32_t start = my_ticker_read();
    while ((my_ticker_read() - start) < us);
}    

NOTE: You do have to call the regular us_ticker_read() function first to ensure the ticker is initialized...
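The shift-and-multiply trick used in my_ticker_read above can be checked in isolation. The following is only a host-side sketch, not the library code: the helper name div24_mul is hypothetical, and it assumes the tick count has already been narrowed to 32 bits, a range in which the result matches a true division by 24.

```c
#include <stdint.h>

/* Hypothetical helper: divide a 32-bit tick count by 24 without a
 * division instruction (the Cortex-M0+ on the KL25Z has no hardware
 * divider, so a plain "/ 24" becomes a library call). */
static uint32_t div24_mul(uint32_t ticks)
{
    uint32_t eighths = ticks >> 3;   /* divide by 8 */

    /* 0x55555556 is (2^32 + 2) / 3, so the top 32 bits of the 64-bit
     * product equal eighths / 3 for any value below 2^31. */
    return (uint32_t)(((uint64_t)eighths * 0x55555556u) >> 32);
}
```

For example, div24_mul(24) gives 1 and div24_mul(47) still gives 1, the same as integer division by 24. Whether this is actually faster than the compiler's own division on a given target is worth measuring rather than assuming.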

Regards Andy

18 Mar 2013

Hi Andy, thank you for reporting this issue.

We have actually adopted your optimization in the official library: Revision 3

Cheers, Emilio

18 Mar 2013

That's weird, I would expect any modern compiler to turn that division into a multiplication. I suspect it's caused by usage of -Ospace instead of -Otime. You may want to tune this option per file.

BTW, here's the list of other places referring to __aeabi_uidivmod or __aeabi_idivmod:

i2c_frequency(), expression "PCLK / (i*ICR[j]);". There might be some clever optimization here, but it's not terribly important since it's unlikely to be called often. You could calculate the table on the first call but that would increase the RAM consumption.

spi_frequency(): similar to above.

serial_baud(): reverse constant division ("PCLK / (16 * baudrate);") so not easily optimized. Since it's a setup function like i2c_frequency() it's not important.

clock(): division by 10000 that should be replaced by multiplication (maybe try #pragma Otime for this specific function). This one can be called often so it's worth fixing IMO.

Timer::read_ms(): division by 1000. Can and should be fixed.
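As an illustration of what replacing a constant division by a multiplication looks like for the division by 1000 mentioned above, here is a minimal sketch. The helper name us_to_ms is hypothetical and this is not the mbed Timer code; it assumes a 32-bit microsecond count as input.

```c
#include <stdint.h>

/* Hypothetical helper: microseconds to whole milliseconds without a divide.
 * 274877907 is 2^38 / 1000 rounded up. The rounding error stays below
 * 1/1000 for any 32-bit input, so the result equals us / 1000. */
static uint32_t us_to_ms(uint32_t us)
{
    return (uint32_t)(((uint64_t)us * 274877907u) >> 38);
}
```

The 64-bit multiply is not free on a small core either, so whether this beats the library division routine depends on the compiler and target.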

Related:

Timer::read(): you could eliminate a call to __aeabi_fdiv by doing division first (which should be optimized to multiplication) and conversion to float second.

There are many FP calculations in pwmout_api.c and I think some can be optimized. E.g.:

float pwmout_read(pwmout_t* obj)
{
    float v = (float)(*obj->CnV) / (float)(*obj->MOD);
    return (v > 1.0) ? (1.0) : (v);
}

I would replace by:

float pwmout_read(pwmout_t* obj) {
    uint32_t CnV = *obj->CnV;
    uint32_t MOD = *obj->MOD;
    if ( CnV >= MOD )
        return 1.0f;
    else
        return (float)(CnV) / (float)(MOD);
}

19 Mar 2013

Hi Igor, thank you for suggesting all these optimizations.

I have opened a ticket in our issue tracker to actually implement some of them.

We treated the "wait_us" issue with higher priority, because it was badly breaking the "promise" of the API with too high a relative error.

Cheers, Emilio

19 Mar 2013

Emilio,

Why don't we get a wait_cy function, with cy = number of cycles in the ticker counter? Just an integer that represents the facts.

When we know the frequency that counter is running at, most of us are capable of calculating the time in us ourselves.

I am most irritated that this easy programming in s, ms and us always gives these rounding errors. It is fine that this can be done for convenience, but in general I think it is a bad habit to program a controller with floating point values if you can avoid it.

19 Mar 2013

wait_ms/wait_us take an integer as input, so you won't have rounding errors and that is exactly what you want. The problem with cycles is that you immediately lose portability, not to mention that it doesn't really add a lot, mainly because, as Igor said, the conversion can be made a lot faster by optimizing the divisions into multiplications.

Edit: regarding those divisions in frequency setups, they also do store the frequency in a variable, so you might as well store the clock divider settings.

19 Mar 2013

The problem is that 1 us is still 24 (or is it 48?) counter cycles.

What portability problems are there to expect?

19 Mar 2013

Geert Hospers wrote:

What portability problems are there to expect?

Each mbed board has a different clock frequency. Moreover, you need an external peripheral to keep count of the number of clock cycles (AKA timer). You cannot use the main processor, because a simple loop with NOPs could get interrupted by multiple ISRs making its execution time very variable.

Despite their simplicity, timers can be very different from one microcontroller to another; as a single unified interface we chose microsecond timestamps.

Cheers, Emilio
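To make the portability point concrete: the same delay corresponds to a different cycle count on each timer clock, so a cycle-based argument means something different on every board. A minimal sketch follows; the helper name us_to_cycles is hypothetical and not part of the mbed API, and it assumes the timer clock is a whole number of MHz.

```c
#include <stdint.h>

/* Hypothetical helper: number of timer cycles that a delay in
 * microseconds corresponds to, for a timer clocked at timer_hz.
 * Assumes timer_hz is a whole number of MHz. */
static uint64_t us_to_cycles(uint32_t us, uint32_t timer_hz)
{
    uint32_t cycles_per_us = timer_hz / 1000000u;
    return (uint64_t)us * cycles_per_us;
}
```

With the clock figures mentioned in this thread, one microsecond is 24 cycles on a 24 MHz timer and 96 cycles on a 96 MHz one, so code written in cycles for one board would run four times too fast or too slow on the other.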

19 Mar 2013

Emilio Monti wrote:

Geert Hospers wrote:

What portability problems are there to expect?

Each mbed board has a different clock frequency. Moreover, you need an external peripheral to keep count of the number of clock cycles (AKA timer). You cannot use the main processor, because a simple loop with NOPs could get interrupted by multiple ISRs making its execution time very variable.

Despite their simplicity, timers can be very different from one microcontroller to another; as a single unified interface we chose microsecond timestamps.

Cheers, Emilio

I am not giving up yet :)) I am just asking for (timer) cy as an alternative way to pass the real contents to the timer registers. It is then the responsibility of the programmer to use cy or us (or ms or s). It is just an additional choice. And sorry, I do not see the problem with differences in timers, controllers and brands: every board has its own startup files where differences in clock speeds and dividers can be handled. That is necessary anyway for the translation from cy to us/ms/s; in fact you are turning the world upside down. I think you look at the controller too much from a top-down view, like programs that are written for a PC. Please do not force us in the direction where everything has to work through an abstraction or OS that kills accuracy. I am coming from AVR, and for precise control we think and count in cycles.

19 Mar 2013

If you want low-level, why stop with the Timer class? Just go and write to the timer regs directly if you want. Or you can even write assembly and count each instruction. That's the nice thing about mbed platform you can use high-level wrappers or you can go down to the bare metal. That doesn't mean that mbed library should support all imaginable scenarios. It has a specific purpose and I think it performs it pretty well. Feature bloat is not always good.

19 Mar 2013

Igor Skochinsky wrote:

If you want low-level, why stop with the Timer class? Just go and write to the timer regs directly if you want. Or you can even write assembly and count each instruction. That's the nice thing about mbed platform you can use high-level wrappers or you can go down to the bare metal. That doesn't mean that mbed library should support all imaginable scenarios. It has a specific purpose and I think it performs it pretty well. Feature bloat is not always good.

1. What is that "specific purpose"?

1a. Do you want mbed to mimic a multitasking PC (we already have embedded Linux for that) or do you want to make it an accurate controller?

2. I do not prefer to go low level. But the timers are the most important peripheral (for me) on a microcontroller. It seems wrong in principle to set up a programming environment for a microcontroller that wastes accuracy by programming registers with (floating point) approximated values. The minimum resolution of an integer value is 1 us. It is unacceptable that 24 peripheral cycles are wasted. Even with Bascom-BASIC this will not happen on AVRs.

19 Mar 2013

/media/uploads/sam_grove/specific_purpose.jpg

19 Mar 2013

Geert Hospers wrote:

Igor Skochinsky wrote:

If you want low-level, why stop with the Timer class? Just go and write to the timer regs directly if you want. Or you can even write assembly and count each instruction. That's the nice thing about mbed platform you can use high-level wrappers or you can go down to the bare metal. That doesn't mean that mbed library should support all imaginable scenarios. It has a specific purpose and I think it performs it pretty well. Feature bloat is not always good.

1. What is that "specific purpose"?

1a. Do you want mbed to mimic a multitasking PC (we already have embedded Linux for that) or do you want to make it an accurate controller?

2. I do not prefer to go low level. But the timers are the most important peripheral (for me) on a microcontroller. It seems wrong in principle to set up a programming environment for a microcontroller that wastes accuracy by programming registers with (floating point) approximated values. The minimum resolution of an integer value is 1 us. It is unacceptable that 24 peripheral cycles are wasted. Even with Bascom-BASIC this will not happen on AVRs.

They wanted mbed to have a common interface for different microcontrollers (which is imo a good idea). You cannot do that if each microcontroller has a different time function. They could set the LPC1768 to a 1/96us rate, but then the maximum time it could count also becomes 96 times smaller, so suddenly the maximum time you can count also becomes dependent on the microcontroller. Now they have imo a fairly nice balance between the maximum time you can measure (and wait) and the minimum.

And this is only really an issue on the Freescale board, simply because it does not have a proper timer prescaler. On all other mbed boards, including the next mbed board, the timers have prescalers, so they are simply set to count in ticks of 1us. Then no timers have to be chained as on the Freescale board, and you just have a single 32-bit timer used for the common time interface. Also, there really is no rounding error if you just use the us/ms functions.
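The trade-off between resolution and range described above is easy to put numbers on. This is only an illustrative sketch with a hypothetical helper name (wrap_seconds); it assumes a single free-running 32-bit counter.

```c
#include <stdint.h>

/* Hypothetical helper: whole seconds before a free-running 32-bit
 * counter clocked at tick_hz wraps around to zero. */
static uint32_t wrap_seconds(uint32_t tick_hz)
{
    return (uint32_t)((UINT64_C(1) << 32) / tick_hz);
}
```

At 1 MHz (1us ticks) the counter runs for 4294 seconds, roughly 71 minutes, before wrapping. At a 96 MHz tick rate the same counter wraps after about 44 seconds.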

Additionally, it is fairly straightforward to make a child class of the timer class and use that to have the timer functions plus your own custom function. Then you publish it and everyone is happy :)

However, if we are complaining here about time-related functions, I also have one: the PWM function. Imo there are two problems:

1. The clock divider is Fclk/4 (at least on the LPC1768). That costs you a factor of 4 in clock speed (and accuracy), but gives a maximum time that is 4 times longer. It is a 32-bit PWM register; I cannot think of any legitimate PWM application that requires times you won't reach without the prescaler. If you really want it you can also make a dynamic clock divider.

2. While it runs at Fclk/4, you can't write values more accurately than microsecond resolution, which really is just pointless rounding.

See: http://mbed.org/users/Sissors/code/FastPWM/

19 Mar 2013

Erik thanks for your understanding and explanation!

I honestly did not know that on the mbed LPC1768 the timers were properly prescaled to run at 1 MHz. That makes it all more "digestible".

Nevertheless, I find it hard to believe that it is so difficult (or code-breaking) to introduce time units other than us, ms and s, such as peripheral_cy. It is only that for the newcomer 1/96 us may seem weird (it does not look "nice"). It is normal that you have to set a clock speed and dividers on a microcontroller. On ARM it is more complex, but at least the peripheral speed setting seems easy.

I still have very much to learn about C/C++ before I can write (and publish) such a child class for the timers, so don't expect that too soon....

20 Mar 2013

I believe the updated functions below from us_ticker.c will allow you to avoid both multiplication and division. When chaining the PIT timers, timer 0 can be set with a 24-count period, which leaves timer 1 counting at a 1 usec rate for a full 32-bit period. The timer 0 LDVAL should probably be defined in a header somewhere (or even calculated at runtime) in case the bus frequency is changed from the current 24 MHz.

static void pit_init(void) {
    SIM->SCGC6 |= SIM_SCGC6_PIT_MASK;   // Clock PIT
    PIT->MCR = 0;                       // Enable PIT

    // Timer 1
    PIT->CHANNEL[1].LDVAL = 0xFFFFFFFF;
    PIT->CHANNEL[1].TCTRL = PIT_TCTRL_CHN_MASK;    // Chain to timer 0, disable Interrupts
    PIT->CHANNEL[1].TCTRL |= PIT_TCTRL_TEN_MASK;   // Start timer 1

    // Timer 0
    PIT->CHANNEL[0].LDVAL = 23;
    PIT->CHANNEL[0].TCTRL = PIT_TCTRL_TEN_MASK;    // Start timer 0, disable interrupts
}

uint32_t us_ticker_read() {
    if (!us_ticker_inited)
        us_ticker_init();
    
    return ~(PIT->CHANNEL[1].CVAL);
}
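The suggestion to derive the timer 0 LDVAL from the bus frequency, rather than hard-coding 23, could look something like the sketch below. The helper name pit_ldval_for_1us is hypothetical and not part of the library; it assumes the bus clock is a whole number of MHz and at least 1 MHz.

```c
#include <stdint.h>

/* Hypothetical helper: reload value for the prescaling PIT channel so
 * that the chained channel advances once per microsecond.
 * A channel loaded with LDVAL = n expires every n + 1 bus clocks, which
 * is why a 24-count period is written as 23 in pit_init().
 * Assumes bus_hz is a whole number of MHz and at least 1 MHz. */
static uint32_t pit_ldval_for_1us(uint32_t bus_hz)
{
    return (bus_hz / 1000000u) - 1u;
}
```

For the current 24 MHz bus clock this gives 23, matching the constant in pit_init(). A 48 MHz bus clock would give 47.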

20 Mar 2013

That looks to be a nice solution! Then you effectively made the same as what the LPCs have, channel 0 just works as prescaler.

20 Mar 2013

David White wrote:

When chaining the PIT timers, timer 0 can be set with a 24 count period which leads timer 1 counting at a 1 usec rate for a full 32bit period

Thanks David! :-)

I missed that one channel can be used as a prescaler for the other one.

When I am back to the office I'll update the library with this solution.

Cheers, Emilio

21 Mar 2013

We should probably add a rollover check to the wait_us function. There is a (common?) use case in wait_us where ticker_read() returns a value less than the start value, or am I missing something here?

Andy

21 Mar 2013

Hi Andy,

Andy M wrote:

We should probably add a rollover check to the wait_us function. There is a (common?) use case in wait_us where ticker_read() returns a value less than the start value, or am I missing something here?

I guess you are worried about the case where the "us_ticker" timer starts at a value very close to the maximum 32-bit unsigned integer and you want to wait for a delay that expires only after the timer has wrapped around to a very low count. For example, let's say:

    // We want to wait 100 microseconds
    int us = 100;

    // The timer is 16 microseconds away from wrapping to 0
    uint32_t start = 0xFFFFFFF0;

To show that the code is robust in this case, let's see how it handles the check when the timer reads 100 ticks, for a total elapsed time of 116 microseconds from the original 0xFFFFFFF0 count.

    // Current timer count
    uint32_t us_ticker  = 100;
    
    // void wait_us(int us) {
    //     uint32_t start = us_ticker_read();
    //     while ((us_ticker_read() - start) < us);
    // }
    if ((us_ticker - start) < us) {
        printf("while(True): Failed to detect that the interval is expired, still in the loop\n");
    } else {
        printf("while(False): Got out of the wait loop\n");
    }
    printf("%u\n", (unsigned)(us_ticker - start));

The above actually prints:

while(False): Got out of the wait loop
116

As expected and desired.
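The wrap-around behaviour walked through above can also be packaged as a small self-contained check. This is only a sketch with hypothetical helper names (elapsed_us, delay_expired); it assumes a free-running 32-bit microsecond counter and a delay shorter than one full counter period.

```c
#include <stdint.h>

/* Hypothetical helper: microseconds elapsed between two readings of a
 * free-running 32-bit microsecond counter. Unsigned subtraction is
 * performed modulo 2^32, so a single wrap between the two readings
 * needs no special handling. */
static uint32_t elapsed_us(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Returns 1 once at least `us` microseconds have passed since `start`,
 * otherwise 0. */
static int delay_expired(uint32_t start, uint32_t now, uint32_t us)
{
    return elapsed_us(start, now) >= us;
}
```

With start = 0xFFFFFFF0 and a current count of 100, elapsed_us returns 116, so a 100 microsecond delay is reported as expired. With a current count of 50 the elapsed time is 66 and the delay is still pending.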

Cheers, Emilio

Program: test_wait ("wait_us" example)

21 Mar 2013

We updated the KL25Z "us_ticker" implementation based on David's suggestion: Revision 4.

Cheers, Emilio

07 Apr 2013

Emilio Monti wrote:

We updated the KL25Z "us_ticker" implementation based on David's suggestion: Revision 4.

Cheers, Emilio

How do I go about using that revision? Sorry, I'm just starting with mbed.

08 Apr 2013

If you have a new project it uses the latest revision.

If you have an older project click on the mbed library in your program in the compiler, then at the right side you have an update button.

08 Apr 2013

Hmmm It still isn't working then. It crashes with anything less than about 70.

08 Apr 2013

What code are you using it with?

08 Apr 2013

I made this simple program to test it. http://mbed.org/users/vaxon/code/KL25Z_timer_issues/

I also tried "test_wait" above but it is also very glitchy.

09 Apr 2013

Quote:

I made this simple program to test it. http://mbed.org/users/vaxon/code/KL25Z_timer_issues/

I looked at your example. Noticed that you use LED3, which does not exist on KL25Z? Also note that the LEDs on KL25Z light up when you write a 0 to the pin, rather than when you write a 1 as on the LPC1768. With your current example code you will have a hard time noticing the LED light up as it will be on for only 100us (20us x 5) during each 200ms interval (20us * 10000).

09 Apr 2013

LED3 seems to light the red LED. (Oh, that is supposed to be 1000. I changed it and forgot to publish.) Sure, you can't see it blink, but the red LED is visible and dim at >100.

That is also just the stripped-down example, to remove anything that might be a problem. I tried it with serial and that won't work at all below 70 or so.

Thank you for your responses! I'd really love to get started developing on this platform.

09 Apr 2013

This is a better example. At 100ns, the green LED blinks on for 0.25 s and off for 0.75 s, and serial prints j. At 20 it locks up.

Repository: KL25Z_timer_issues