A library that drives up to 16 strings of 60 WS2811 or WS2812 LEDs from a single FRDM-KL25Z board. It uses hardware DMA to sustain the full 800 kHz bit rate with little CPU load.
After being frustrated by the SPI system's performance, I ended up using an approach inspired by Paul Stoffregen's OctoWS2811. This uses 3 of the 4 DMA channels triggered by the TPM0 timer PWM and overflow events.
This design allows up to 16 strings of up to 60 WS2811/WS2812 LEDs each (the limit is RAM space) to be driven on a single port. Adding more strings does not lengthen the DMA transfer, because the bits for all strings are output in parallel.
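To illustrate why extra strings cost no extra transfer time, here is a minimal sketch of packing one LED's color bits into port-wide words. It is not the library's actual code: the name `packLed`, the `grb` argument, and the most-significant-bit-first ordering are assumptions made for this example. Each of the 24 words is written to the port in one bit period, and each string only owns one bit lane in that word.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical illustration, not the library's actual code.
constexpr std::size_t BITS_PER_LED = 24;

// Set or clear one string's bit lane in the 24 words that describe one LED.
// `pin` is the port bit the string is wired to; `grb` holds the 24 color
// bits, assumed here to be sent most-significant bit first.
inline void packLed(std::uint32_t words[BITS_PER_LED], unsigned pin, std::uint32_t grb)
{
    const std::uint32_t pinMask = 1u << pin;
    for (std::size_t bit = 0; bit < BITS_PER_LED; ++bit) {
        const bool one = ((grb >> (BITS_PER_LED - 1 - bit)) & 1u) != 0;
        if (one)
            words[bit] |= pinMask;   // this string sends a 1 in this bit slot
        else
            words[bit] &= ~pinMask;  // this string sends a 0 in this bit slot
    }
}
```

Calling `packLed` once per string fills different bit lanes of the same 24 words, so the number of words to transfer depends only on the number of LEDs per string.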
Here is my test program:
Import program: Multi_WS2811_test
Test program for my Multi_WS2811 library that started out as a fork of heroic/WS2811. My library uses hardware DMA on the FRDM-KL25Z to drive up to 16 strings of WS2811 or WS2812 LEDs in parallel.
Here are 60 LEDs on a single string, at 10% brightness: https://www.icloud.com/sharedalbum/#B015oqs3qeGdFY
Note though that the 3.3V output from the FRDM-KL25Z's GPIO pins is OUT OF SPEC for driving the 5V WS2812 inputs, which require 3.5V for a logic HIGH signal. It only works on my board if I don't connect my scope or logic analyzer to the output pin. I recommend that you add a 5V buffer to the outputs to properly drive the LED strings. I added a CD4504 to do the 3.3 to 5V translation (mostly because I had one). You could use (say) a 74HCT244 to do 8 strings.
Each LED in a string takes 24/800e3 seconds (30 usec) to DMA, so if MAX_LEDS_PER_STRING is set to 60, the DMA itself takes 1.8 msec. Adding 64 usec of guard time gives about 1.86 msec per frame (roughly 536 frames/second). Of course, actually composing the frame will take most of the time in a real program.
The way I have my code set up, I can use up to 8 pins on PORTD. However, changing the defines at the top of WS2811.cpp will change the selected port.
Alternatively, you could use another port to get more strings. Watch out for pin mux conflicts, though.
Here are your choices:
- PORTE: 15 total: PTE0-PTE5, PTE20-PTE25, PTE29-PTE31
- PORTD: 8 total: PTD0-PTD7
- PORTC: 16 total: PTC0-PTC13, PTC16-17
- PORTB: 16 total: PTB0-PTB11, PTB16-19
- PORTA: 15 total: PTA0-PTA5, PTA12-PTA20
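As a cross-check of the pin counts listed above, and as a way to validate a pin mask before using it, here is a small sketch. The helper names (`pinRange`, `countPins`, `pinsAvailable`) and the `PORTx_PINS` constants are made up for this example and are not part of the library.

```cpp
#include <cstdint>

// Mask with bits lo..hi (inclusive) set. Hypothetical helper.
constexpr std::uint32_t pinRange(unsigned lo, unsigned hi)
{
    return (hi >= 31 ? 0xFFFFFFFFu : ((1u << (hi + 1)) - 1u)) & ~((1u << lo) - 1u);
}

// Number of set bits in a mask.
constexpr unsigned countPins(std::uint32_t mask)
{
    return mask == 0 ? 0u : 1u + countPins(mask & (mask - 1u));
}

// Pins listed above for each port.
constexpr std::uint32_t PORTA_PINS = pinRange(0, 5) | pinRange(12, 20);
constexpr std::uint32_t PORTB_PINS = pinRange(0, 11) | pinRange(16, 19);
constexpr std::uint32_t PORTC_PINS = pinRange(0, 13) | pinRange(16, 17);
constexpr std::uint32_t PORTD_PINS = pinRange(0, 7);
constexpr std::uint32_t PORTE_PINS = pinRange(0, 5) | pinRange(20, 25) | pinRange(29, 31);

static_assert(countPins(PORTA_PINS) == 15, "PORTA pin count");
static_assert(countPins(PORTB_PINS) == 16, "PORTB pin count");
static_assert(countPins(PORTC_PINS) == 16, "PORTC pin count");
static_assert(countPins(PORTD_PINS) == 8,  "PORTD pin count");
static_assert(countPins(PORTE_PINS) == 15, "PORTE pin count");

// True if every requested pin exists on the chosen port.
constexpr bool pinsAvailable(std::uint32_t portPins, std::uint32_t requested)
{
    return (requested & ~portPins) == 0;
}
```

A check like `pinsAvailable(PORTD_PINS, mask)` only confirms that the pins exist on the port; it does not detect pin mux conflicts with other peripherals.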
Here is how the DMA channels are interleaved to generate the three phases of the required waveform:
I have timer TPM0 set up to generate events at overflow (OVF), at 250 nsec (CH0), and at 650 nsec (CH1). At 1250 nsec it resets to 0.
At timer count = 0, DMA0 fires, because it's triggered by TPM0's overflow (OVF) event. This results in the data lines being driven to a constant "1" level, as the data that DMA0 is programmed to transfer is all 1's. (Describing this as a single all-1's word is the easiest way to explain it, and it is how I'd wanted it to work. In practice, to get it to work, I had to fill a buffer of 1's as large as the bit data, which uses as much precious RAM again.)
At 250 nsec, DMA1 fires, because it's triggered by TPM0's CH0 compare event. This drives either a 0 or 1 level to the pins, because DMA1 is programmed to transfer our data bytes to the pins.
At 650 nsec, DMA2 fires, because it's triggered by TPM0's CH1 compare event. This results in the data lines being driven to a constant "0" level, as the data that DMA2 is programmed to transfer is a single, all-0's word.
At 1250 nsec, the timer resets to 0, and the whole cycle repeats.
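The cycle above can be modeled in a few lines. This is only a sketch under simplifying assumptions: it ignores DMA latency, and `lineLevel` and `highTicks` are hypothetical names. The tick conversion mirrors the driver's `NSEC_TO_TICKS` macro (48 MHz timer clock, no prescaling).

```cpp
// Same integer conversion as the driver's NSEC_TO_TICKS macro.
constexpr unsigned nsecToTicks(unsigned nsec) { return nsec * 48u / 1000u; }

constexpr unsigned PERIOD_TICKS = nsecToTicks(1250); // 60 ticks per bit
constexpr unsigned CH0_TICKS    = nsecToTicks(250);  // 12 ticks
constexpr unsigned CH1_TICKS    = nsecToTicks(650);  // 31 ticks (about 646 nsec)

// Idealized level of one data line at a given tick within a bit period.
constexpr bool lineLevel(unsigned tick, bool dataBit)
{
    return tick < CH0_TICKS ? true      // after OVF: all lines driven high
         : tick < CH1_TICKS ? dataBit   // after CH0: lines follow the data bit
         : false;                       // after CH1: all lines driven low
}

// Number of ticks the line spends high during one bit period.
inline unsigned highTicks(bool dataBit)
{
    unsigned n = 0;
    for (unsigned t = 0; t < PERIOD_TICKS; ++t) {
        if (lineLevel(t, dataBit))
            ++n;
    }
    return n;
}
```

In this model a 0 bit is high for 12 ticks (250 nsec) and a 1 bit is high for 31 ticks, which is the short-pulse and long-pulse encoding the LEDs expect.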
Because this library uses three of timer TPM0's six channels (and sets TPM0 to 800kHz), you will need to select TPM1 or TPM2 output pins if you want to use PwmOut pins in your program (for instance, for RC servos, which want a 50Hz frequency). If you just want to change discrete LED brightnesses, you can use TPM0's CH3, CH4, or CH5 pins. Just make sure that you set up your PwmOut instance at the same frequency.
Here is a table showing the assignment of timer resources to PwmOut capable pins in the FRDM-KL25Z:
KL25Z pin | Arduino name | Timer | Channel |
---|---|---|---|
PTA3 | | TPM0 | CH0 |
PTC1 | A5 | TPM0 | CH0 |
PTD0 | D10 | TPM0 | CH0 |
PTE24 | | TPM0 | CH0 |
PTA4 | D4 | TPM0 | CH1 |
PTC2 | A4 | TPM0 | CH1 |
PTD1 | D13/LED_BLUE | TPM0 | CH1 |
PTE25 | | TPM0 | CH1 |
PTA5 | D5 | TPM0 | CH2 |
PTC3 | | TPM0 | CH2 |
PTD2 | D11 | TPM0 | CH2 |
PTE29 | | TPM0 | CH2 |
PTC4 | | TPM0 | CH3 |
PTD3 | D12 | TPM0 | CH3 |
PTE30 | | TPM0 | CH3 |
PTC8 | D6 | TPM0 | CH4 |
PTD4 | D2 | TPM0 | CH4 |
PTE31 | | TPM0 | CH4 |
PTA0 | | TPM0 | CH5 |
PTC9 | D7 | TPM0 | CH5 |
PTD5 | D9 | TPM0 | CH5 |
PTE26 | | TPM0 | CH5 |
PTA12 | D3 | TPM1 | CH0 |
PTB0 | A0 | TPM1 | CH0 |
PTE20 | | TPM1 | CH0 |
PTA13 | D8 | TPM1 | CH1 |
PTB1 | A1 | TPM1 | CH1 |
PTE21 | | TPM1 | CH1 |
PTA1 | D0/USBRX | TPM2 | CH0 |
PTB18 | LED_RED | TPM2 | CH0 |
PTB2 | A2 | TPM2 | CH0 |
PTE22 | | TPM2 | CH0 |
PTA2 | D1/USBTX | TPM2 | CH1 |
PTB19 | LED_GREEN | TPM2 | CH1 |
PTB3 | A3 | TPM2 | CH1 |
PTE23 | | TPM2 | CH1 |
WS2811.cpp
- Committer: Ned Konz
- Date: 2015-06-12
- Revision: 4:990838718b51
- Parent: 3:df4319053bfa
- Child: 5:2c3b76ea0b40
File content as of revision 4:990838718b51:
// 800 KHz WS2811 driver driving potentially many LED strings.
// Uses 3-phase DMA
// 16K SRAM less stack, etc.
//
// Per LED: 3 bytes (malloc'd) for RGB data
//
// Per LED strip / per LED
//        96 bytes (static) for bit data
//      + 96 bytes (static) for ones data
//      = 192 bytes
//
//      40 LEDs max per string = 7680 bytes static
//
//      40 LEDs: 7680 + 40*3 = 7800 bytes
//      80 LEDs: 7680 + 80*3 = 7920 bytes

#include <mbed.h>
#include "MKL25Z4.h"

#ifndef MBED_WS2811_H
#include "WS2811.h"
#endif

#if defined(WS2811_DEBUG_PIN)
#define DEBUG_MASK (1<<WS2811_DEBUG_PIN)
#define RESET_DEBUG (WS2811_IO_GPIO->PDOR &= ~DEBUG_MASK)
#define SET_DEBUG (WS2811_IO_GPIO->PDOR |= DEBUG_MASK)
#else
#define DEBUG_MASK 0
#define RESET_DEBUG (void)0
#define SET_DEBUG (void)0
#endif

static volatile unsigned dma_done = 3;

// 48 MHz clock, no prescaling.
#define NSEC_TO_TICKS(nsec) ((nsec)*48/1000)
#define USEC_TO_TICKS(usec) ((usec)*48)
#define CLK_NSEC 1250
#define tpm_period    NSEC_TO_TICKS(CLK_NSEC)
#define tpm_p0_period NSEC_TO_TICKS(250)
#define tpm_p1_period NSEC_TO_TICKS(650)
#define guardtime_period USEC_TO_TICKS(55)   /* guardtime minimum 50 usec. */

enum DMA_MUX_SRC {
    DMA_MUX_SRC_TPM0_CH_0     = 24,
    DMA_MUX_SRC_TPM0_CH_1,
    DMA_MUX_SRC_TPM0_Overflow = 54,
};

enum DMA_CHAN {
    DMA_CHAN_START = 0,
    DMA_CHAN_0_LOW = 1,
    DMA_CHAN_1_LOW = 2,
    N_DMA_CHANNELS
};

// class static
template <unsigned MAX_LEDS_PER_STRIP>
void WS2811<MAX_LEDS_PER_STRIP>::wait_for_dma_done()
{
#if 0
    while (dma_done < 3) wait_us(100);
#endif
}

// class static
template <unsigned MAX_LEDS_PER_STRIP>
bool WS2811<MAX_LEDS_PER_STRIP>::initialized = false;

// class static
template <unsigned MAX_LEDS_PER_STRIP>
uint32_t WS2811<MAX_LEDS_PER_STRIP>::enabledPins = 0;

#define WORD_ALIGNED __attribute__ ((aligned(4)))

// class static
template <unsigned MAX_LEDS_PER_STRIP>
struct WS2811<MAX_LEDS_PER_STRIP>::DMALayout WS2811<MAX_LEDS_PER_STRIP>::dmaData WORD_ALIGNED;

// class static
template <unsigned MAX_LEDS_PER_STRIP>
void WS2811<MAX_LEDS_PER_STRIP>::hw_init()
{
    if (initialized) return;

    dma_data_init();
    clock_init();
    dma_init();
    io_init();
    tpm_init();

    initialized = true;

    SET_DEBUG;
    RESET_DEBUG;
}

// class static
template <unsigned MAX_LEDS_PER_STRIP>
void WS2811<MAX_LEDS_PER_STRIP>::dma_data_init()
{
    memset(dmaData.allOnes, 0xFF, sizeof(dmaData.allOnes));
}

// class static
/// Enable PORTD, DMA and TPM0 clocking
template <unsigned MAX_LEDS_PER_STRIP>
void WS2811<MAX_LEDS_PER_STRIP>::clock_init()
{
    SIM->SCGC5 |= SIM_SCGC5_PORTD_MASK;
    SIM->SCGC6 |= SIM_SCGC6_DMAMUX_MASK | SIM_SCGC6_TPM0_MASK; // Enable clock to DMA mux and TPM0
    SIM->SCGC7 |= SIM_SCGC7_DMA_MASK;  // Enable clock to DMA

    SIM->SOPT2 |= SIM_SOPT2_TPMSRC(1); // Clock source: MCGFLLCLK or MCGPLLCLK
}

// class static
/// Configure GPIO output pins
template <unsigned MAX_LEDS_PER_STRIP>
void WS2811<MAX_LEDS_PER_STRIP>::io_init()
{
    uint32_t m = 1;
    for (uint32_t i = 0; i < 32; i++) {
        // set up each pin
        if (m & enabledPins) {
            WS2811_IO_PORT->PCR[i] = PORT_PCR_MUX(1) // GPIO
                                     | PORT_PCR_DSE_MASK; // high drive strength
        }
        m <<= 1;
    }

    WS2811_IO_GPIO->PDDR |= enabledPins; // set as outputs

#if WS2811_MONITOR_TPM0_PWM
    // PTD0 CH0 monitor: TPM0, high drive strength
    WS2811_IO_PORT->PCR[0] = PORT_PCR_MUX(4) | PORT_PCR_DSE_MASK;
    // PTD1 CH1 monitor: TPM0, high drive strength
    WS2811_IO_PORT->PCR[1] = PORT_PCR_MUX(4) | PORT_PCR_DSE_MASK;
    WS2811_IO_GPIO->PDDR |= 3;                   // set as outputs
    WS2811_IO_GPIO->PDOR &= ~(enabledPins | 3);  // initially low
#else
    WS2811_IO_GPIO->PDOR &= ~enabledPins;        // initially low
#endif

#ifdef WS2811_DEBUG_PIN
    WS2811_IO_PORT->PCR[WS2811_DEBUG_PIN] = PORT_PCR_MUX(1) | PORT_PCR_DSE_MASK;
    WS2811_IO_GPIO->PDDR |= DEBUG_MASK;
    WS2811_IO_GPIO->PDOR &= ~DEBUG_MASK;
#endif
}

// class static
/// Configure DMA and DMAMUX
template <unsigned MAX_LEDS_PER_STRIP>
void WS2811<MAX_LEDS_PER_STRIP>::dma_init()
{
    // reset DMAMUX
    DMAMUX0->CHCFG[DMA_CHAN_START] = 0;
    DMAMUX0->CHCFG[DMA_CHAN_0_LOW] = 0;
    DMAMUX0->CHCFG[DMA_CHAN_1_LOW] = 0;

    // wire our DMA event sources into the first three DMA channels
    // t=0: all enabled outputs go high on TPM0 overflow
    DMAMUX0->CHCFG[DMA_CHAN_START] = DMAMUX_CHCFG_ENBL_MASK | DMAMUX_CHCFG_SOURCE(DMA_MUX_SRC_TPM0_Overflow);
    // t=tpm_p0_period: all of the 0 bits go low.
    DMAMUX0->CHCFG[DMA_CHAN_0_LOW] = DMAMUX_CHCFG_ENBL_MASK | DMAMUX_CHCFG_SOURCE(DMA_MUX_SRC_TPM0_CH_0);
    // t=tpm_p1_period: all outputs go low.
    DMAMUX0->CHCFG[DMA_CHAN_1_LOW] = DMAMUX_CHCFG_ENBL_MASK | DMAMUX_CHCFG_SOURCE(DMA_MUX_SRC_TPM0_CH_1);

    NVIC_SetVector(DMA0_IRQn, (uint32_t)&DMA0_IRQHandler);
    NVIC_EnableIRQ(DMA0_IRQn);
}

// class static
/// Configure TPM0 to do two different PWM periods at 800kHz rate
template <unsigned MAX_LEDS_PER_STRIP>
void WS2811<MAX_LEDS_PER_STRIP>::tpm_init()
{
    // set up TPM0 for proper period (800 kHz = 1.25 usec ±600nsec)
    TPM_Type volatile *tpm = TPM0;
    tpm->SC = TPM_SC_DMA_MASK        // enable DMA
              | TPM_SC_TOF_MASK      // reset TOF flag if set
              | TPM_SC_CMOD(0)       // disable clocks
              | TPM_SC_PS(0);        // 48MHz / 1 = 48MHz clock
    tpm->MOD = tpm_period - 1;       // 48MHz / 800kHz

    // No Interrupts; High True pulses on Edge Aligned PWM
    tpm->CONTROLS[0].CnSC = TPM_CnSC_MSB_MASK | TPM_CnSC_ELSB_MASK | TPM_CnSC_DMA_MASK;
    tpm->CONTROLS[1].CnSC = TPM_CnSC_MSB_MASK | TPM_CnSC_ELSB_MASK | TPM_CnSC_DMA_MASK;

    // set TPM0 channel 0 for 0.35 usec (±150nsec) (0 code)
    // 1.25 usec * 1/3 = 417 nsec
    tpm->CONTROLS[0].CnV = tpm_p0_period;

    // set TPM0 channel 1 for 0.7 usec (±150nsec) (1 code)
    // 1.25 usec * 2/3 = 833 nsec
    tpm->CONTROLS[1].CnV = tpm_p1_period;

    NVIC_SetVector(TPM0_IRQn, (uint32_t)&TPM0_IRQHandler);
    NVIC_EnableIRQ(TPM0_IRQn);
}

// class static
template <unsigned MAX_LEDS_PER_STRIP>
void WS2811<MAX_LEDS_PER_STRIP>::startDMA()
{
    if (!initialized) hw_init();

    wait_for_dma_done();
    dma_done = 0;

    DMA_Type volatile *dma = DMA0;
    TPM_Type volatile *tpm = TPM0;
    uint32_t nBytes = sizeof(dmaData.start_t1_low)
                      + sizeof(dmaData.dmaWords)
                      + sizeof(dmaData.trailing_zeros_1);

    tpm->SC = TPM_SC_DMA_MASK        // enable DMA
              | TPM_SC_TOF_MASK      // reset TOF flag if set
              | TPM_SC_CMOD(0)       // disable clocks
              | TPM_SC_PS(0);        // 48MHz / 1 = 48MHz clock
    tpm->MOD = tpm_period - 1;       // 48MHz / 800kHz

    tpm->CNT = tpm_p0_period - 2;
    tpm->STATUS = 0xFFFFFFFF;

    dma->DMA[DMA_CHAN_START].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
    dma->DMA[DMA_CHAN_0_LOW].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
    dma->DMA[DMA_CHAN_1_LOW].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status

    // t=0: all outputs go high
    // triggered by TPM0_Overflow
    // source is one word of 0 then 24 x 0xffffffff, then another 0 word
    dma->DMA[DMA_CHAN_START].SAR     = (uint32_t)(void*)dmaData.start_t0_high;
    dma->DMA[DMA_CHAN_START].DSR_BCR = DMA_DSR_BCR_BCR_MASK & nBytes; // length of transfer in bytes

    // t=tpm_p0_period: some outputs (the 0 bits) go low.
    // Triggered by TPM0_CH0
    // Start 2 words before the actual data to avoid garbage pulses.
    dma->DMA[DMA_CHAN_0_LOW].SAR     = (uint32_t)(void*)dmaData.start_t1_low; // set source address
    dma->DMA[DMA_CHAN_0_LOW].DSR_BCR = DMA_DSR_BCR_BCR_MASK & nBytes; // length of transfer in bytes

    // t=tpm_p1_period: all outputs go low.
    // Triggered by TPM0_CH1
    // source is constant 0x00000000 (first word of dmaWords)
    dma->DMA[DMA_CHAN_1_LOW].SAR     = (uint32_t)(void*)dmaData.start_t1_low; // set source address
    dma->DMA[DMA_CHAN_1_LOW].DSR_BCR = DMA_DSR_BCR_BCR_MASK & nBytes; // length of transfer in bytes

    dma->DMA[DMA_CHAN_0_LOW].DAR
        = dma->DMA[DMA_CHAN_1_LOW].DAR
        = dma->DMA[DMA_CHAN_START].DAR
        = (uint32_t)(void*)&WS2811_IO_GPIO->PDOR;

    SET_DEBUG;

    dma->DMA[DMA_CHAN_0_LOW].DCR = DMA_DCR_EINT_MASK // enable interrupt on end of transfer
                                   | DMA_DCR_ERQ_MASK
                                   | DMA_DCR_D_REQ_MASK // clear ERQ on end of transfer
                                   | DMA_DCR_SINC_MASK // increment source each transfer
                                   | DMA_DCR_CS_MASK
                                   | DMA_DCR_SSIZE(0) // 32-bit source transfers
                                   | DMA_DCR_DSIZE(0); // 32-bit destination transfers

    dma->DMA[DMA_CHAN_1_LOW].DCR = DMA_DCR_EINT_MASK // enable interrupt on end of transfer
                                   | DMA_DCR_ERQ_MASK
                                   | DMA_DCR_D_REQ_MASK // clear ERQ on end of transfer
                                   | DMA_DCR_CS_MASK
                                   | DMA_DCR_SSIZE(0) // 32-bit source transfers
                                   | DMA_DCR_DSIZE(0); // 32-bit destination transfers

    dma->DMA[DMA_CHAN_START].DCR = DMA_DCR_EINT_MASK // enable interrupt on end of transfer
                                   | DMA_DCR_ERQ_MASK
                                   | DMA_DCR_D_REQ_MASK // clear ERQ on end of transfer
                                   | DMA_DCR_SINC_MASK // increment source each transfer
                                   | DMA_DCR_CS_MASK
                                   | DMA_DCR_SSIZE(0) // 32-bit source transfers
                                   | DMA_DCR_DSIZE(0);

    tpm->SC |= TPM_SC_CMOD(1);         // enable internal clocking
}

#if !INSTANTIATE_TEMPLATES
extern "C" void DMA0_IRQHandler()
{
    DMA_Type volatile *dma = DMA0;
    TPM_Type volatile *tpm = TPM0;

    uint32_t db;

    db = dma->DMA[DMA_CHAN_0_LOW].DSR_BCR;
    if (db & DMA_DSR_BCR_DONE_MASK) {
        dma->DMA[DMA_CHAN_0_LOW].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
    }

    db = dma->DMA[DMA_CHAN_1_LOW].DSR_BCR;
    if (db & DMA_DSR_BCR_DONE_MASK) {
        dma->DMA[DMA_CHAN_1_LOW].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
    }

    db = dma->DMA[DMA_CHAN_START].DSR_BCR;
    if (db & DMA_DSR_BCR_DONE_MASK) {
        dma->DMA[DMA_CHAN_START].DSR_BCR = DMA_DSR_BCR_DONE_MASK; // clear/reset DMA status
    }

    tpm->SC = TPM_SC_TOF_MASK;        // reset TOF flag; disable internal clocking

    dma_done++;

    SET_DEBUG;

#if 0
    // set TPM0 to interrupt after guardtime
    tpm->MOD = guardtime_period - 1; // 48MHz * 55 usec
    tpm->CNT = 0;
    tpm->SC = TPM_SC_PS(0)        // 48MHz / 1 = 48MHz clock
              | TPM_SC_TOIE_MASK  // enable interrupts
              | TPM_SC_CMOD(1);   // and internal clocking
#endif
}

extern "C" void TPM0_IRQHandler()
{
    TPM0->SC = 0; // disable internal clocking
    TPM0->SC = TPM_SC_TOF_MASK;
    RESET_DEBUG;
    dma_done = 3;
}
#endif
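The memory budget in the header comment of WS2811.cpp can be reproduced with a short sketch. The function name `ramBytes` is hypothetical, and the estimate leaves out the few extra start and trailing words in the DMA buffers, as well as stack and other program data.

```cpp
#include <cstddef>

// Figures from the header comment of WS2811.cpp.
constexpr std::size_t WORD_BYTES   = 4;   // one 32-bit port write per bit slot
constexpr std::size_t BITS_PER_LED = 24;
constexpr std::size_t STATIC_BYTES_PER_LED = 2 * BITS_PER_LED * WORD_BYTES; // bit data + ones data = 192
constexpr std::size_t RGB_BYTES_PER_LED    = 3;  // malloc'd per LED on each strip

// Approximate RAM used: static DMA buffers sized by the longest strip,
// plus RGB storage for every LED on every strip.
constexpr std::size_t ramBytes(std::size_t maxLedsPerStrip, std::size_t totalLeds)
{
    return maxLedsPerStrip * STATIC_BYTES_PER_LED + totalLeds * RGB_BYTES_PER_LED;
}
```

With a 40-LED maximum this gives 7800 bytes for 40 LEDs and 7920 bytes for 80 LEDs, matching the comment. A single 60-LED strip comes to 11700 bytes, which is why 60 is close to the practical limit on a part with 16K of SRAM.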