New VGA 640x400 70Hz library

04 Jul 2011

Here's a library for everyone that wants to connect a VGA monitor directly to their mbed. This library, inspired by Cliff Biffle's assembly implementation, is 100% pure C. It uses I2S, PWM and DMA to generate the VGA signal and uses very little CPU time. The library is here:

http://mbed.org/users/Ivop/libraries/vga640x400/ltqx8p

It uses my fastlib library:

http://mbed.org/users/Ivop/libraries/fastlib/ltrmdj

There isn't much documentation apart from the header files, but this example shows that it is pretty straightforward to use:

http://mbed.org/users/Ivop/programs/vga640x400_example/ltsaxj

Now all we need is a terminal emulator on top of this ;-)

pins used:

  • DIP5 - Monochrome bitstream (connect to R, G and B through 1K resistors)
  • DIP8 - VSYNC
  • DIP25 - HSYNC
  • GND

/media/uploads/Ivop/mbed-vga640x400.jpg

03 Jul 2011

Impressive !

Is it a bitmap framebuffer ?

I had played with some ideas of implementing DMA for my QVGA color LCD (320x240) but as far as I could find out you can't do parallel DMA to an IO port (bus)

I need 120 parallel bytes pushed out of port0 every 80 usec as fast as possible with a clock signal. see http://mbed.org/users/gertk/notebook/driving-a-controllerless-qvga-display/

Gert

03 Jul 2011

Some more info:

Here's the timing for 640x400 70Hz:

http://tinyvga.com/vga-timing/640x400@70Hz

My driver is slightly off (25MHz pixel clock, vertical refresh 69.6Hz) but close enough. Note the positive vsync polarity.
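
For anyone who wants to check the 69.6Hz figure, here is a minimal sketch of the arithmetic. It assumes the totals from the timing page linked above (800 pixel clocks per line, 449 lines per frame); the function names are made up for illustration.

```c
/* Totals per line and per frame for 640x400 @ 70Hz (assumed from the
 * timing page linked above). */
enum { H_TOTAL = 800, V_TOTAL = 449 };

/* Horizontal line rate in Hz for a given pixel clock. */
static double line_rate_hz(double pixel_clock_hz) {
    return pixel_clock_hz / H_TOTAL;
}

/* Vertical refresh rate in Hz for a given pixel clock. */
static double refresh_hz(double pixel_clock_hz) {
    return line_rate_hz(pixel_clock_hz) / V_TOTAL;
}
```

With a 25MHz pixel clock this gives a 31.25kHz line rate and roughly 69.6Hz refresh; the nominal 25.175MHz clock gives roughly 70.1Hz.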

It uses a text buffer of 80x25 characters and a 256-character 8x16 bitmap font. DMA is set up to constantly cycle between two 100-byte buffers (800 'pixels' each); while one is transferred to the I2S interface, the other is filled with the next scanline's data. After 400 visible lines, both buffers are cleared. I use partially unrolled loops for rendering the characters and clearing the buffers. The binary of the example is around 4.7kB.
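
To make the rendering step concrete, here is a rough, simplified sketch of filling one line buffer from the text buffer and font. It is not the library's actual loop: the font layout (16 consecutive bytes per glyph), the placement of the 80 visible bytes at the start of the 100-byte buffer, and the function name are all assumptions, and it ignores the byte-order swap and the loop unrolling.

```c
#include <stdint.h>
#include <string.h>

enum { COLS = 80, ROWS = 25, GLYPH_H = 16, LINE_BYTES = 100 };

/* Render one visible scanline (0..399) into a 100-byte line buffer.
 * Assumed font layout: 16 consecutive bytes per glyph, one byte per row.
 * The 20 bytes after the visible part stay zero (blanking). */
static void render_scanline(uint8_t *line_buf,
                            const uint8_t *text,
                            const uint8_t *font,
                            unsigned scanline) {
    const uint8_t *row = text + (scanline / GLYPH_H) * COLS;
    unsigned glyph_row = scanline % GLYPH_H;

    memset(line_buf, 0, LINE_BYTES);
    for (unsigned x = 0; x < COLS; x++)
        line_buf[x] = font[row[x] * GLYPH_H + glyph_row];
}
```

In the double-buffered scheme described above, something like this would run for scanline N+1 while DMA is still sending the buffer for scanline N.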

HSYNC is generated by PWM and VSYNC is set and cleared during the DMA interrupts.

It would be possible to replace the whole character-based rendering engine with a framebuffer of 32000 bytes. Or perhaps mix them (192 lines of framebuffer and 13 lines of characters) to save some memory and be able to render both images and text. Originally I intended to enter the competition with a logic analyzer, for which I needed this. It never got off the ground in time, but I still ponder the idea :)

If you need to output 120x8 bits parallel, you could try to use DMA on pins p0.4-p0.11. See my overview here:

http://mbed.org/users/Ivop/notebook/overview-of-all-mbed-pins-and-functions/

These pins are consecutive, so you only need to preshift your data. Perhaps you could use a technique similar to the one I used above (preshift 120 bytes while outputting the previous 120 bytes).
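
A minimal sketch of the preshift step, assuming the data bus sits on p0.4-p0.11 as suggested above. The names are hypothetical, and whether the DMA engine can actually reach the GPIO port this way is something you would have to try.

```c
#include <stddef.h>
#include <stdint.h>

enum { DATA_SHIFT = 4 };                  /* data bus assumed to start at p0.4 */
#define DATA_MASK (0xFFu << DATA_SHIFT)   /* p0.4..p0.11 = 0x00000FF0 */

/* Expand n bytes into port-aligned 32-bit words, one word per byte,
 * so each word can be written to the port without further shifting. */
static void preshift_line(uint32_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)src[i] << DATA_SHIFT;
}
```

The cost is memory: 120 bytes become 120 words (480 bytes) per buffer, so two buffers take 960 bytes.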

Regards, Ivo

04 Jul 2011

In my QVGA system I moved the framebuffer to the AHBSRAM0 and AHBSRAM1 space; I don't use USB, CAN or Ethernet, so this space is available. I sped up the line output routine by shuffling some pins so that the clock pins are on port0 too. I shaved off some cycles, but found that when I used your fastlib routines I had to insert nops between the data output and the clock high-to-low transition because the display could not keep up. So I can't gain more speed this way. Every data byte has to be clocked, unlike VGA, where you can just shift data out at a fixed speed and the PLL inside the monitor will clock the bits into the panel.

// direct access pointer to port0
volatile uint32_t *myport0= (volatile uint32_t *) 0x2009c000;

// the main interrupt routine, runs every 80 usecs
inline void LineInterrupt() {
    unsigned char t;
    unsigned int s;
    
    *(myport0+7)=2;                     // clear line latch pulse (pin10 p0.1)

    if (pointer==FRAMEBUFFERMAX) {      // when end of framebuffer
        pointer=0;                      // reset pointer
        *(myport0+6)=1<<26;             // signal new frame (pin18 p0.26)
        FR=!FR;                         // toggle FR (pin17 p0.25) 
        systemclock++;                  // increment system clock (internal software timer)
    } else {
        *(myport0+7)=1<<26;             // reset FLM (pin18 p0.26)
    }

    // shift a line of the framebuffer out
    for (t=120; t>0; t--) {               // counting from 120 to 0 is faster than 0 to 120
        s=1|framebuffer[pointer++]<<4;    // get a byte from the framebuffer and preshift
        *(myport0+7)=0x00000FF0;          // clear the 0 bits
        *(myport0+6)=s;                   // set the 1 bits + CL2 (pin9 p0.0)
        __nop();                          // minimal wait seems to be two nops
        __nop();                          // with one nop the display can not keep up
        *(myport0+7)=1;                   // reset CL2
    }
    *(myport0+6)=2;                       // set line latch pulse CL1 (pin10 p0.1)
}

This increased the remaining CPU time by 30%!

07 Jul 2011

Toying around with a simple terminal emulator on your mbed VGA code:

/media/uploads/gertk/_scaled_2011-07-07_10.47.32.jpg

(note, the LCD monitor is physically 800x600, so the characters are not so 'crisp', brightness is a bit low, I connected the RGB inputs each with a 1k resistor to DIP5 )

The USB-to-host UART settings get corrupted because of the power-down of all peripherals. I tried to adjust the baud rate back to 115200 by setting the divisor to 54, but it still works only some of the time. I commented out the lines below the powerup line of init_uart0 and now the original mbed settings remain. The mbed has no problem keeping up with 115200 baud.

Just some thoughts: the I2S is used as a fast shift register pushing monochrome bits out. Would it be possible to add an external CMOS/TTL shift register with an output latch (latching on every 3rd bit) and send RGB to the monitor (with 640/3 resolution of course)? Is the I2S clock active? Maybe even tweaking it so 320x400 or 320x240 is possible in 8 colors?

07 Jul 2011

Use the source...

#include "vga640x400/vga640x400.h"
#include "mbed.h"


// define serial port for debug
Serial linktopc(USBTX,USBRX);


extern unsigned char font_lin[256*16];
int cx=0,cy=0;

void scroll_up() {
    int n;
    // source and destination rows overlap, so use memmove rather than memcpy
    memmove(text_buffer,text_buffer+80,80*24);
    for (n=0; n<80; n++)
        text_buffer[24*80+n]=32;
}

void printchar(unsigned char c) {
    // xor 1 to compensate for character order (10325476 et cetera)
    // moved to userspace instead of rendering loop for obvious reasons
    text_buffer[(cx^1)+80*cy]=c;
    cx++;
    if (cx==80) {
        cx=0;
        cy++;
        if (cy>24) {
            scroll_up();
            cy=24;
        }
    }
}

void cls() {
    for (int i=0; i<2000; i++)
        text_buffer[i] = 32;
    cx=0;
    cy=0;
}

int main(int argc, char **argv) {
    font = font_lin;
    int p;
   
    
    init_vga();
    
    
    // serial port on at 115200 baud 
    linktopc.baud(115200);
    setbuf(stdout, NULL); // no buffering for this filehandle

    printf("VGA terminal emulator\n\r");

    cls();

    printchar('V');
    printchar('G');
    printchar('A');
    
    cx=0; cy=2;
    
    while (1) {
        p=getchar();
        putchar(p);
        switch (p) {
            case 13:
                cx=0;
                break;
            case 10:
                cy++;
                if (cy>24) {
                    scroll_up();
                    cy=24;
                }
                cx=0;
                break;
            case 12:
                cls();
                break;
            default:
                printchar(p);
                break;
        }
    }
}

07 Jul 2011

Gert van der Knokke wrote:

(note, the LCD monitor is physically 800x600, so the characters are not so 'crisp', brightness is a bit low, I connected the RGB inputs each with a 1k resistor to DIP5 )

Nice to see you were able to replicate everything! You could use lower-value resistors to increase the luminance, but I preferred to be safe rather than sorry and risk damaging my monitor, although I have seen other embedded MCU projects connect 5V lines straight to the R, G and B lines (0.7V max., ouch). I think going down to 510 ohms should be safe though.

Gert van der Knokke wrote:

The USB-to-host UART settings get corrupted because of the power-down of all peripherals. I tried to adjust the baud rate back to 115200 by setting the divisor to 54, but it still works only some of the time. I commented out the lines below the powerup line of init_uart0 and now the original mbed settings remain. The mbed has no problem keeping up with 115200 baud.

I power down all peripherals to save power and only use what is necessary. I think it is best to adjust the UART's baud rate after init_vga() because I change PLL0 to get a CCLK of 100MHz (which divides nicely to 25MHz). Perhaps I will move init_uart0 to debug mode and not touch the UART at all in the next version of my library. But even then init_vga() should be called first in any application. If you set up Serial linktopc afterwards, the mbed C++ library should adjust itself automatically to the new clock.

Gert van der Knokke wrote:

Just some thoughts: the I2S is used as a fast shift register pushing monochrome bits out. Would it be possible to add an external CMOS/TTL shift register with an output latch (latching on every 3rd bit) and send RGB to the monitor (with 640/3 resolution of course)? Is the I2S clock active? Maybe even tweaking it so 320x400 or 320x240 is possible in 8 colors?

I2S has a clock line, so that should be possible. It is currently not exported to the outside world though. Also, keep in mind that a lower hsync frequency is probably not supported by many monitors. To halve the vertical resolution, you have to display each line twice.

07 Jul 2011

I have checked the specs of the mbed and it seems each output is capable of 40 mA (source/sink). 0.7V over 75 ohms gives a bit under 10 mA per color, so even with 270 or 330 ohms per color we are still within safe limits. But you are right, better safe than sorry. I have moved the mbed serial initialization to after init_vga() (see code above) and then it works OK (with the init_uart0 lines active again). The other idea of reducing resolution to get 8 colors is still in the making... :-)
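
A back-of-the-envelope sketch for picking the resistor value. It assumes an ideal 3.3V high level on the pin, a 75 ohm termination inside the monitor on each color input, and that R, G and B each get their own series resistor from the same pin (DIP5). The function names are made up.

```c
/* Voltage (V) at one 75-ohm terminated color input, fed from a 3.3V pin
 * through a series resistor of r_ohm (simple voltage divider). */
static double vga_level_v(double r_ohm) {
    return 3.3 * 75.0 / (r_ohm + 75.0);
}

/* Total current (mA) drawn from the single pin when it drives R, G and B,
 * each through its own series resistor of r_ohm. */
static double pin_current_ma(double r_ohm) {
    return 3.0 * 3.3 / (r_ohm + 75.0) * 1000.0;
}
```

Under these assumptions 1k gives about 0.23V per color, 390 ohms about 0.53V, and 270 ohms lands slightly above the 0.7V full-scale level. Note that the three currents add up on the one pin: about 9 mA in total at 1k and about 21 mA at 390 ohms.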

07 Jul 2011

Gert van der Knokke wrote:

I have moved the mbed serial initialization to after init_vga() (see code above) and then it works OK (with the init_uart0 lines active again).

Great. But now the comment is not right. You are actually overriding the baudrate instead of the other way around :) Also, you can use memset to clear the text_buffer and the bottom line after scrolling.
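
Something along these lines is what I mean by the memset suggestion. This is only a sketch: text_buffer is declared locally here so that it compiles on its own, whereas in the real program it comes from the library header.

```c
#include <string.h>

enum { COLS = 80, ROWS = 25 };

/* Stand-in for the library's text buffer so the sketch is self-contained. */
static unsigned char text_buffer[COLS * ROWS];

/* Fill the whole screen with spaces. */
static void cls(void) {
    memset(text_buffer, ' ', sizeof text_buffer);
}

/* Move rows 1..24 up by one row and blank the bottom row. */
static void scroll_up(void) {
    /* Source and destination overlap, so memmove rather than memcpy. */
    memmove(text_buffer, text_buffer + COLS, COLS * (ROWS - 1));
    memset(text_buffer + COLS * (ROWS - 1), ' ', COLS);
}
```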

Gert van der Knokke wrote:

The other idea of reducing resolution to get 8 colors is still in the making... :-)

Do you want a 320x200 framebuffer to draw on or a 'console mode' with character cell attributes?

07 Jul 2011

Ok changed the comment :) Some further tinkering resulted in this, a 32000 byte framebuffer:

/media/uploads/gertk/_scaled_2011-07-07_22.18.01.jpg

Just love the modular approach hehe... Moved my z80 emulated BBC Basic into your example and adjusted some X and Y limits.

This is a first version of the code for the framebuffer I came up with:

#define FRAMEBUFFERMAX 32000

// force this framebuffer in AHBSRAM0 and AHBSRAM1
// warning!! this disables the use of the USB, Ethernet and CAN bus functions!
unsigned char *framebuffer = (unsigned char *)(0x2007C000);

// the framepointer
volatile unsigned int pointer=0;

static void state_visible_area(void) {
    int x;
    unsigned char *sp = curline;

    for (x=0; x<80; x++)
        *sp++=framebuffer[1^pointer++];
    
    if (pointer==FRAMEBUFFERMAX) pointer=0;

    if (line_counter == 438) {
        state = state_clearing_buffers;
    }
}

I put in the XOR just for ease of access from 'userspace'; however, by moving it back to userspace you could simply use memcpy in the code above (hmmm.....). At least the bytes are in the correct order now. Compared to my QVGA version the emulator runs a bit faster: a small Basic program calculating SIN(3) 1000 times takes 13.5 seconds with the VGA code and 16.5 seconds on my QVGA.
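
To convince myself that the two placements of the XOR are equivalent, here is a small host-side sketch (the names are hypothetical, not library code). Either the renderer swaps while copying from a linear framebuffer, or userspace stores each byte at index i^1 and the renderer does a plain memcpy.

```c
#include <stdint.h>
#include <string.h>

enum { LINE = 80 };

/* Renderer-side swap: linear framebuffer, XOR applied while copying. */
static void copy_line_xor(uint8_t *dst, const uint8_t *fb) {
    for (unsigned x = 0; x < LINE; x++)
        dst[x] = fb[x ^ 1];
}

/* Userspace-side swap: the writer stores logical byte i at index i^1 ... */
static void put_byte_swapped(uint8_t *fb, unsigned i, uint8_t v) {
    fb[i ^ 1] = v;
}

/* ... so the renderer can use a plain memcpy. */
static void copy_line_plain(uint8_t *dst, const uint8_t *fb) {
    memcpy(dst, fb, LINE);
}
```

Both paths put the logical bytes on the wire in the order 1, 0, 3, 2, and so on; the only difference is who pays for the swap.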

07 Jul 2011

Gert van der Knokke wrote:

Some further tinkering resulted in this, a 32000 byte framebuffer: /media/uploads/gertk/_scaled_2011-07-07_22.18.01.jpg

Looks very nice :)

As for your framebuffer code, it is probably faster to partially unroll the loop (I used 4 times per loop) and something like:

if (line_counter != 438) return;
state   = state_clearing_buffers;
pointer = 0;

This saves some code and thus some cycles every time the ISR is called (only one comparison except for the last time when line_counter equals 438).
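
As an untested sketch of the partial unrolling (four bytes per iteration, pair swap done with fixed offsets instead of an XOR on the index); the function name is made up:

```c
#include <stdint.h>

enum { LINE_BYTES = 80 };

/* Copy one 80-byte scanline, swapping each pair of bytes, four bytes per
 * loop iteration. Returns the advanced source pointer for the next line. */
static const uint8_t *copy_line_unrolled(uint8_t *dst, const uint8_t *src) {
    for (unsigned n = 0; n < LINE_BYTES / 4; n++) {
        dst[0] = src[1];
        dst[1] = src[0];
        dst[2] = src[3];
        dst[3] = src[2];
        dst += 4;
        src += 4;
    }
    return src;
}
```

Whether this is actually faster on the Cortex-M3 depends on the compiler and optimization level, so it needs measuring.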

07 Jul 2011

If you want to keep the 'userspace' framebuffer linear, maybe instead of just partially unrolling the loop, you could do:

    unsigned char *sp, *pointer;
    for (pointer=framebuffer+1, sp=curline; pointer<framebuffer+80; pointer+=2) {
        *sp++ = *pointer--;
        *sp++ = *pointer;
    }

or

    unsigned char *sp, *pointer;
    for (pointer=framebuffer+1, sp=curline; pointer<framebuffer+80; sp+=2, pointer+=2)
        *sp = *pointer;
    for (pointer=framebuffer, sp=curline+1; pointer<framebuffer+80; sp+=2, pointer+=2)
        *sp = *pointer;

I'm curious if any of this significantly speeds up your Basic program :) Both alternatives could also be partially unrolled.

08 Jul 2011

I just noticed that using the local filesystem from within the mbed stops the VGA output during the file transfer and it does not recover well... Sometimes the screen is OK but most of the time I need to reset. So I can save a Basic program to the local filesystem but reloading it is tricky. I tried putting an extra init_vga() after the load and save routines but then the monitor switches off and the mbed locks up.

if you'd like to give it a spin:

/media/uploads/gertk/bbc_basic_vga.zip

10 PRINT "Hello world!" 
SAVE "TEST" 
NEW
LOAD "TEST" 

Remember: BBC Basic is case sensitive, only uppercase commands are recognized.

08 Jul 2011

I tried moving the initialization of local after init_vga() but that didn't help and increasing the DMA interrupt priority to zero didn't work either. I suspect that the LocalFileSystem module messes with the interrupts and/or gpdma and afterwards the timing is sometimes off by a little.

BTW very nice to see basic running :)

BTW2 Here, loading or saving twice restores the timing most of the time.

08 Jul 2011

Hmm, I found the problem:

http://mbed.org/forum/mbed/topic/1769/

My QVGA system uses a microSD card as storage so it does not display the problem. The VGA testbed is just the mbed on a breadboard and nothing else so I had to use the local filesystem for 'incidental storage'. At least that explains the disturbance...

BTW I was wondering: if you used the I2S output in 8-bit mono, would that not get rid of the byte order problem? In the source I could not find where the number of bytes is given to the DMA engine, as I assume you would need to double the number it is outputting now. There is also a fastlib function for setting the 'DMA endianness', but I tried both settings and could not see a difference.

Anyway great seeing this little NXP chip doing a z80 emulation with Basic on top and VGA output at the same time.

08 Jul 2011

Ah, that explains why the VGA signal gets distorted sometimes.

I2S mono sends each sample twice, so that won't work. DMA endianness affects the whole 32-bit word. See UM10360 page 589 for details.

Have you tried the xor-less rendering loops yet, while keeping the linear framebuffer?

08 Jul 2011

The first solution discarding the XOR:

Ivo van Poorten wrote:

unsigned char *sp, *pointer;
    for (pointer=framebuffer+1, sp=curline; pointer<framebuffer+80; pointer+=2) {
        *sp++ = *pointer--;
        *sp++ = *pointer;
    }

This one does not work, although it puzzled me quite some time why it did not work:

1) pointer+=2 does not have the intended effect: because the loop body decrements pointer, it effectively advances by just 1. One solution is to add 3 (pointer+=3), or to post-increment pointer in the second line (*pointer++).

2) Another problem is that framebuffer is a static pointer to the framebuffer RAM, so it will copy the same line over and over again.

Well, I came up with this little gem:

    for (sp=curline; sp<(curline+80); pointer+=3) {
        *sp++ = *pointer--;
        *sp++ = *pointer;
    }

    // reset pointer if we're running out of framebuffer, this could be done in VSYNC too
    if (pointer==(FRAMEBUFFERMAX+framebuffer+1)) pointer=framebuffer+1;

This shows the big difference between C and Basic, setting up a for/next loop with one variable and incrementing something completely different :-)

Now pointer is a global pointer to framebuffer. And it works but the speed difference is negligible: 12.6 with the old code, 12.3 with the new.

08 Jul 2011

IMHO 2.4% speed-up is not negligible :) I have worked on projects where every 0.1% counts.

Sorry about the +=2 error. The code wasn't tested in any way.

You could also set up DMA to transfer directly from the framebuffer with a transfer size of 4000 (16000 bytes). The linked list could be something like:

loop 100 bytes buffer; before visible area, change next field to first 4000 bytes lli, which links to the other 4000 bytes lli and then back to looping an empty 100 bytes buffer again.

Side-effect is that you have to account for the weird byte order in 'userspace' again.

09 Jul 2011

One big advantage of DMA-ing from the two 16k blocks (AHBSRAM0/AHBSRAM1) will be that the CPU can run full speed on the lower 32k block since these two upper blocks are on a separate internal bus.

P.S. my benchmark is this simple Basic program:

10 S=TIME
20 FOR T=1 TO 1000
30 A=SIN(3)
40 NEXT T
50 PRINT (TIME-S)/70;"sec."

the TIME variable is updated every VSYNC so dividing it by 70 should give a fairly accurate seconds reading.

PP.SS. I added an SD card to the project on pins DIP11-DIP14, initialized it after init_vga, and load/save is no problem now: rock-solid screen :-) Reducing the resistors on the VGA output to 3 x 390 ohm resulted in a nice bright white on black without overloading the output pin.

17 Jul 2011

Ivo van Poorten wrote:

You could also set up DMA to transfer directly from the framebuffer with a transfer size of 4000 (16000 bytes). The linked list could be something like:

loop 100 bytes buffer; before visible area, change next field to first 4000 bytes lli, which links to the other 4000 bytes lli and then back to looping an empty 100 bytes buffer again.

Side-effect is that you have to account for the weird byte order in 'userspace' again.

Ehm.. You now send 100 bytes per line; I assume that is 80 bytes of data and 20 bytes of 'retrace', so sending a large block of framebuffer data at once is not going to work, is it? (Unless each line is 100 bytes, but then we're running out of RAM.)

How about DMA-ing from an 80-byte source, then setting up a DMA list from a single NULL word (without increment on source) for 20 bytes, then changing the source address of the first list, and so on?

17 Jul 2011

Yes, you are right. I had completely forgotten about the horizontal sync and the front and back porch when I wrote the above. Indeed you need to switch between 20 bytes of zero and 80 bytes of framebuffer during the visible area and all zeroes during the vertical porches and sync pulse.
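
A rough, untested sketch of what such a two-item list could look like: 80 bytes from the framebuffer, then 20 bytes taken from a single zero word, linked in a ring. The struct mirrors the four words of a GPDMA linked-list item (on the Cortex-M3 each pointer is one 32-bit word), the control word uses the same bit positions as the control word in Gert's code further down the thread, and all names are hypothetical. Whether the source address can be changed safely while the ring is running is an open question.

```c
#include <stdint.h>

/* One linked-list item: source, destination, next item, control word. */
typedef struct lli {
    const void       *src;
    volatile void    *dst;
    const struct lli *next;
    uint32_t          control;
} lli_t;

/* Control word: 'words' 32-bit transfers, burst size 4, 32-bit wide,
 * optional source increment, no destination increment, no interrupt. */
static uint32_t control_word(unsigned words, int src_increment) {
    return words | (1u << 12) | (1u << 15) | (2u << 18) | (2u << 21)
         | ((uint32_t)(src_increment != 0) << 26);
}

static const uint32_t zero_word = 0;   /* source for the blanked part */
static lli_t visible, blank;

/* Build the ring: 20 words of pixels, then 5 words of zeroes, repeat. */
static void build_line_ring(const uint8_t *first_line, volatile void *i2s_fifo) {
    visible.src     = first_line;
    visible.dst     = i2s_fifo;
    visible.next    = &blank;
    visible.control = control_word(20, 1);   /* 80 bytes, incrementing */

    blank.src     = &zero_word;
    blank.dst     = i2s_fifo;
    blank.next    = &visible;
    blank.control = control_word(5, 0);      /* 20 bytes, same zero word */
}

/* Called once per line to point the visible item at the next 80 bytes. */
static void next_line(void) {
    visible.src = (const uint8_t *)visible.src + 80;
}
```

The two items together cover 25 words, which is 100 bytes or 800 pixel clocks, i.e. exactly one line.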

18 Jul 2011

Well, spectacular results :-)

By using direct DMA from the 'other' 32k and outputting only 80 bytes per line I got 8.8 seconds in my benchmark. The linked list points to itself.

But I was thinking: is it possible to let the HSYNC do the interrupt instead of the DMA ?

Pondering through the source I still don't quite understand how the DMA and the HSYNC are synchronized. Is this by 'chance' because they have the same clock source ? That would explain the different horizontal startup positions at each reset.

There is a mention of an interrupt in the init_hsync PWM setup, but how is it used?

If the HSYNC would generate the interrupt then we only have to send 80 bytes of framebuffer data and then it blanks automatically (I assume I2S is default low or will it hold the last value?)

static void state_blank_area(void) {
    if (line_counter != 449) return;

    line_counter = 0;
    pointer=framebuffer;
    systemclock++;          // external VSYNC counter
    state = state_before_vsync;
}

// DMA from graphic framebuffer @ AHBSRAM0 + AHBSRAM1
static void state_visible_area(void) {
    static unsigned int nn=0;

    if (line_counter != 438) {
        pointer+=80;
        fl_dma_set_srcaddr (0,(unsigned char *)pointer);

    } else {
        // setup DMA from a single NULL word
        fl_dma_set_srcaddr (0, &nn);
        fl_dma_channel_control(0,       // control word
                               20,                  // count (20*4 = 80)
                               4, 4,               // src and dest burst size
                               32, 32,             // src and dest width
                               FL_NO_SRC_INCREMENT, // do not increment source
                               FL_NO_DEST_INCREMENT,
                               FL_ON               // terminal count interrupt
                              );
        state = state_blank_area;
    }

}


static void state_after_vsync(void) {
    if (line_counter != 38) return;
    // setup dma for 80 bytes per line begining at pointer
    fl_dma_set_srcaddr (0, (unsigned char *)pointer);
    fl_dma_channel_control(0,       // control word
                           20,                  // count (20*4 = 80)
                           4, 4,               // src and dest burst size
                           32, 32,             // src and dest width
                           FL_SRC_INCREMENT,
                           FL_NO_DEST_INCREMENT,
                           FL_ON               // terminal count interrupt
                          );

    state = state_visible_area;
}

18 Jul 2011

IIRC I2S repeats its last value (I'm fairly sure, but you have to look at the datasheet if you want to be 100% sure). The sync between hsync and dma is not completely by chance as they (pwm and gpdma) are started at almost the same time. The HSHIFT define determines the relative position of the pwm channel. It might be though that my monitor is more forgiving than yours as it seems possible that somehow dma starts a bit later sometimes (which is kinda weird btw as the "flow" through the program and the instructions executed after a reset are 100% the same each time the device is reset).

Anyway, it is entirely possible to ditch the whole LLI thing and trigger an interrupt when MR1 matches. With a bit of luck you burn enough cycles before you start a single 80-byte DMA transfer each cycle during that interrupt (because the pixels start 48 colorclocks after the hsync pulse). During the blanking period you make sure the I2S peripheral has been fed 0 and nothing else has to be done.
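
To put a number on "enough cycles", here is a small sketch of the horizontal budget. It assumes the 100MHz CPU clock and 25MHz pixel clock mentioned earlier in the thread, and the horizontal timing from the timing page (640 visible, 16 front porch, 96 sync, 48 back porch).

```c
enum {
    CPU_MHZ       = 100,
    PIXEL_MHZ     = 25,
    H_VISIBLE     = 640,
    H_FRONT_PORCH = 16,
    H_SYNC        = 96,
    H_BACK_PORCH  = 48
};

/* Convert a number of pixel (color) clocks into CPU cycles. */
static unsigned cpu_cycles(unsigned pixel_clocks) {
    return pixel_clocks * (CPU_MHZ / PIXEL_MHZ);
}
```

So an interrupt taken at the end of the sync pulse has 48 color clocks, or 192 CPU cycles, before the first pixel is due, and a whole line lasts 3200 CPU cycles.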

A 30% speed increase is indeed spectacular! (And stating that your previous code was 43% slower sounds even better ;-))
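
Where the two percentages come from, as a tiny sketch. It assumes the baseline is the 12.6 second run quoted earlier and the new time is 8.8 seconds; the function names are made up.

```c
/* Time saved, relative to the old time (the "x% faster" figure). */
static double percent_faster(double old_s, double new_s) {
    return (old_s - new_s) / old_s * 100.0;
}

/* Extra time the old code needed, relative to the new time
 * (the "x% slower" figure). */
static double percent_slower(double old_s, double new_s) {
    return (old_s - new_s) / new_s * 100.0;
}
```

With those numbers, 3.8 seconds saved is about 30% of 12.6 and about 43% of 8.8. The same first formula gives the 2.4% for the earlier 12.6 to 12.3 step.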

18 Jul 2011

BTW you are right that an interrupt is generated for MR0 (the call to fl_pwm_config_match() configures MR0, the one that matches with 800, the line length). Reset means it starts over at 0 when it matches and stop is disabled, so it keeps going on forever. Enabling the interrupt is a leftover from previous experiments and should be disabled. Right now it calls a dummy function, set by the CMSIS startup code. Disabling it might get you another 0.1% speed increase :)

20 Jul 2011

Not perfect yet but this one creates a steady canvas with bursts of DMA on every line. Too bad you need to reset all the channel parameters after every burst.. Also I needed to fiddle with the HSYNC width to get the left hand start of the screen visible...

I2S seems to be repeating the last bytes in the FIFO so I need to switch it off in the END of DMA interrupt (which is not active here) maybe by using the mute function.

#include "fastlib/common.h"
#include "fastlib/pinsel.h"
#include "fastlib/gpio.h"
#include "fastlib/clock.h"
#include "fastlib/power.h"
#include "fastlib/pwm.h"
#include "fastlib/i2s.h"
#include "fastlib/uart.h"
#include "fastlib/dma.h"
#include "fastlib/nvic.h"

#include "mbed.h"
DigitalOut myled1(LED1);

DigitalOut myled2(LED2);
#include <string.h>

#if 0       // DEBUG messages on UART0
#include <stdio.h>
#define dbprintf(...) printf(__VA_ARGS__)
#else
#define dbprintf(...)
#endif

// -----------------------------------------------------------------------------------

// VSYNC counter
extern unsigned long systemclock;

// 640 x 400 pixels / 8 =
#define FRAMEBUFFERMAX 32000

// force this framebuffer in AHBSRAM0 and AHBSRAM1
// warning!! this disables the use of the USB, Ethernet and CAN bus functions!
unsigned char *framebuffer = (unsigned char *)(0x2007C000);

// the framepointer
volatile unsigned char *pointer=framebuffer;

static unsigned line_counter;



// -----------------------------------------------------------------------------------

#define FEED0_AND_WAIT(x,y)   fl_pll0_feed(); while(y fl_pll0_get_##x())

static void init_pll0(unsigned int clock_source, int N, int M, int cpu_divider) {
    fl_select_pll0_clock_source(clock_source);

    fl_pll0_control(FL_ENABLE,  FL_DISCONNECT);
    FEED0_AND_WAIT(connect,);
    fl_pll0_control(FL_DISABLE, FL_DISCONNECT);
    FEED0_AND_WAIT(enable,);

    fl_pll0_config(N, M);
    fl_pll0_feed();

    fl_pll0_control(FL_ENABLE,  FL_DISCONNECT);
    FEED0_AND_WAIT(enable,!);

    fl_set_cpu_clock_divider(cpu_divider);
    while (!fl_pll0_get_lock()) ;

    fl_pll0_control(FL_ENABLE,  FL_CONNECT);
    FEED0_AND_WAIT(connect,!);
}

// -----------------------------------------------------------------------------------

static void init_uart0(unsigned divaddval, unsigned mulval, unsigned divisor) {
    fl_power_uart0(FL_ON);
    fl_select_clock_uart0(FL_CLOCK_DIV1);
    fl_uart_set_fractional_divider(0, divaddval, mulval);
    fl_uart_set_divisor_latch(0, divisor);
}

// -----------------------------------------------------------------------------------

static void init_vsync(unsigned port, unsigned pin) {
    fl_power_gpio(FL_ON);
    fl_pinsel(port, pin, FL_FUNC_DEFAULT, FL_IGNORE, FL_IGNORE);
    fl_gpio_set_direction(port, 1<<pin, FL_OUTPUT);
    fl_gpio_clear_value    (port, 1<<pin);
}

// -----------------------------------------------------------------------------------

static void init_dma_controller(void) {
    fl_power_gpdma(FL_ON);
    fl_dma_enable(FL_ENABLE);
    while (!fl_dma_is_enabled()) ;

    fl_dma_set_srcaddr (0, framebuffer);
    fl_dma_set_destaddr(0, (void*)FL_I2STXFIFO);
    fl_dma_set_next_lli(0, 0);

    fl_dma_channel_control(0,       // control word
                           20,                  // count (20*4 = 80)
                           4, 4,               // src and dest burst size
                           32, 32,             // src and dest width
                           FL_SRC_INCREMENT,
                           FL_NO_DEST_INCREMENT,
                           FL_ON              // terminal count interrupt
                          );

    fl_dma_channel_config(0, FL_ENABLE,
                          FL_DMA_PERIPHERAL_IS_MEMORY, FL_DMA_BURST_REQUEST_I2S_CH0,
                          FL_DMA_MEMORY_TO_PERIPHERAL,
                          FL_ON, FL_ON
                         );

    // fl_nvic_interrupt_set_enable(FL_NVIC_INT_DMA);
}

// -----------------------------------------------------------------------------------

static void init_i2s(void) {
    // I2S on P0.9 (DIP5)
    fl_power_i2s(FL_ON);
    fl_select_clock_i2s(FL_CLOCK_DIV1);                     // assume 100MHz
    fl_pinsel(0, 7, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_CLK
    fl_pinsel(0, 8, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_WS
    fl_pinsel(0, 9, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_SDA
    fl_i2s_set_tx_rate(1, 4);
    fl_i2s_output_set_config(FL_8BITS, FL_STEREO, 8, 0, 0, 0, 0);
}

// -----------------------------------------------------------------------------------

static void init_hsync(void) {
    // PWM1.2 on P2.1 (DIP25)
    fl_power_pwm1(FL_ON);
    fl_select_clock_pwm1(FL_CLOCK_DIV1);
    fl_pinsel(2, 1, FL_FUNC_ALT1, FL_FLOATING, FL_FLOATING);    // PWM1.2, no pullup/down
    fl_pwm_set_prescale(4);         // 100/25 = 4
    fl_pwm_config_match(0, FL_ON, FL_ON, FL_OFF);   // interrupt, reset, !stop
    fl_pwm_set_match(0, 800);   // 800 color clocks

// was 48
#define HSHIFT 10

//    fl_pwm_set_match(1, 97+HSHIFT);    // go high at 97
    fl_pwm_set_match(1, 47+HSHIFT);    // go high at 47 (was 97)
    fl_pwm_set_match(2, 1+HSHIFT);     // go low at 1
    fl_pwm_config_edges(2, FL_DOUBLE_EDGE);
    fl_pwm_output_enable(2, FL_ENABLE);


}

// -----------------------------------------------------------------------------------

static void state_before_vsync(void);

static void (*state)(void) = state_before_vsync;

static void state_blank_area(void) {
    if (line_counter != 449) return;

    line_counter = 0;
    pointer=framebuffer;
    systemclock++;          // external VSYNC counter
    state = state_before_vsync;
}

// graphic framebuffer @ AHBSRAM0 + AHBSRAM1
static void state_visible_area(void) {

    if (line_counter != 438) {
        fl_dma_set_srcaddr (0,(unsigned char *)pointer);
        fl_dma_set_destaddr(0, (void*)FL_I2STXFIFO);
        fl_dma_set_next_lli(0, 0);

        *FL_DMA(0, Control) = 20
                              | (1  << 12)
                              | (1 << 15)
                              | (2  << 18)
                              | (2 << 21)
                              | (FL_SRC_INCREMENT  << 26)
                              | (FL_NO_DEST_INCREMENT << 27)
                              | (FL_OFF << 31);

        *FL_DMA(0, Config) = FL_ENABLE | (FL_DMA_PERIPHERAL_IS_MEMORY<<1) | (FL_DMA_BURST_REQUEST_I2S_CH0<<6) |
                             (FL_DMA_MEMORY_TO_PERIPHERAL<<11) | (FL_ON<<14) | (FL_ON<<15);
        pointer+=80;

    } else   state = state_blank_area;


}


static void state_after_vsync(void) {
    if (line_counter != 38) return;
    state = state_visible_area;
}

static void state_during_vsync(void) {
    if (line_counter != 4) return;

    fl_gpio_clear_value(0, 1<<6);
    state = state_after_vsync;
}

static void state_before_vsync(void) {
    if (line_counter != 2) return;

    fl_gpio_set_value(0, 1<<6);
    state = state_during_vsync;
}

// inactive
extern "C" void DMA_IRQHandler(void) __irq {
    fl_dma_clear_terminal_count_interrupt_request(0);
    //  line_counter++;
    //  state();
}

// active
extern "C" void PWM1_IRQHandler(void) __irq {
    int regval=*FL_PWM1IR;
    *FL_PWM1IR=regval;
    line_counter++;
    state();
}


// -----------------------------------------------------------------------------------

void init_vga(void) {
    fl_power_off_all_peripherals();

    init_pll0(FL_PLL0_CLOCK_SOURCE_MAIN, 2, 25, 3); // 100MHz
    init_uart0(0, 0, 651);                          // 100MHz/651/16=9600.6 (default 8N1)

    init_vsync(0, 6);                               // VSYNC on P0.6 (DIP8)
    init_i2s();
    init_hsync();
    init_dma_controller();

    fl_pwm_timer_counter_enable(FL_ENABLE);
    fl_pwm_enable(FL_ENABLE);
    fl_i2s_config_dma1(FL_OFF, FL_ON, 0, 2);
    fl_nvic_interrupt_set_enable(FL_NVIC_INT_PWM);   // enable PWM interrupts

}

20 Jul 2011

Gert van der Knokke wrote:

Not perfect yet but this one creates a steady canvas with bursts of DMA on every line. Too bad you need to reset all the channel parameters after every burst.. Also I needed to fiddle with the HSYNC width to get the left hand start of the screen visible...

The HSYNC pulse needs to be 96 colorclocks as per the specification. Your monitor might cope with a shorter pulse, but others might not. If you want the interrupt to start earlier because you need to set up all the DMA parameters, perhaps you could trigger the interrupt by MR2 (i.e. when the pulse starts instead of when it ends).

Gert van der Knokke wrote:

[.....]
    fl_dma_channel_control(0,       // control word
                           20,                  // count (20*4 = 80)
                           4, 4,               // src and dest burst size
                           32, 32,             // src and dest width
                           FL_SRC_INCREMENT,
                           FL_NO_DEST_INCREMENT,
                           FL_ON              // terminal count interrupt
                          );
[.....]
        *FL_DMA(0, Control) = 20
                              | (1  << 12)
                              | (1 << 15)
                              | (2  << 18)
                              | (2 << 21)
                              | (FL_SRC_INCREMENT  << 26)
                              | (FL_NO_DEST_INCREMENT << 27)
                              | (FL_OFF << 31);

Just to be sure, you realize that it makes no difference whether you use the inline function or inline it yourself? My fastlib is meant to resolve everything to direct assignments w/o any function calls. That's why all function parameters are const, functions are static inline, if/else statements are supposed to be optimized out (because of constant propagation and dead-code elimination), et cetera. A proper compiler should do that.
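As a rough illustration of that point, the sketch below builds the same control word through a `static inline` helper with constant arguments. This is not the real fastlib source: `burst_code`, `width_code` and `dma_control_word` are hypothetical names, the bit positions are the ones used in the hand-written version quoted above, and only a few burst sizes are handled.

```c
/* Hypothetical stand-ins for fastlib-style helpers (not the actual library code). */

/* Burst size to register code: 1 -> 0, 4 -> 1, 8 -> 2, 16 -> 3 (larger sizes omitted). */
static inline unsigned burst_code(const unsigned burst) {
    if (burst == 1) return 0;
    else if (burst == 4) return 1;
    else if (burst == 8) return 2;
    else return 3;
}

/* Transfer width in bits to register code: 8 -> 0, 16 -> 1, 32 -> 2. */
static inline unsigned width_code(const unsigned width) {
    if (width == 8) return 0;
    else if (width == 16) return 1;
    else return 2;
}

/* Assemble the channel control word from readable parameters. */
static inline unsigned dma_control_word(const unsigned count,
                                        const unsigned src_burst, const unsigned dst_burst,
                                        const unsigned src_width, const unsigned dst_width,
                                        const unsigned src_inc, const unsigned dst_inc,
                                        const unsigned tc_irq) {
    return (count & 0xFFFu)
         | (burst_code(src_burst) << 12)
         | (burst_code(dst_burst) << 15)
         | (width_code(src_width) << 18)
         | (width_code(dst_width) << 21)
         | ((src_inc & 1u) << 26)
         | ((dst_inc & 1u) << 27)
         | ((tc_irq & 1u) << 31);
}
```

Called with constants, e.g. `dma_control_word(20, 4, 4, 32, 32, 1, 0, 0)`, an optimizing compiler can typically fold the whole call into a single constant, which is the same value as the hand-assembled expression.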

Anyway, nice work! Does it have any positive or negative impact on the available cpu time for your emulator?

20 Jul 2011

I was not sure if it would have any impact so I tried both and also tried to understand what happens exactly :-)

All in all it is a bit slower than my earlier try with the linked list(s). Last time I checked I was at 9.1 seconds, so setting up the DMA seems to take some time. But my first priority was to get the 'blank canvas' running using the PWM interrupt. Most of my time went into finding out what the handler was called: I had read PWM_IRQHandler() somewhere and could not work out why the mbed kept hanging after activating the NVIC, with the compiler giving me no error (of course). Then I saw it should have been PWM1_IRQHandler (duh) and the thing came to life...

By using the PWM IRQ on channel 0 I can shift the image by changing the width and match values of the HSYNC pulse generated by channels 1 and 2. I also tried IRQ-ing from channel 1 or 2, but then the starting point is static and I cannot center the image. Now I am going to try setting the I2S mute bit at the end of the DMA cycle to get blanking working properly.

20 Jul 2011

Phew... Blanking at the end of the DMA transfer does not work, as the I2S system has a FIFO whose contents still need to be sent...
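A back-of-the-envelope sketch of how much can still be queued when the DMA finishes. The FIFO geometry here (8 words of 32 bits) is an assumption that should be checked against the LPC17xx user manual, and both helper functions are hypothetical names made up for this illustration.

```c
/* Assumed TX FIFO geometry: 8 words of 32 bits (verify in the user manual). */
enum { FIFO_WORDS = 8, BITS_PER_WORD = 32 };

/* Pixels still waiting in the FIFO, one pixel per bit, capped at the FIFO depth. */
static unsigned fifo_backlog_pixels(unsigned words_in_fifo) {
    if (words_in_fifo > FIFO_WORDS) words_in_fifo = FIFO_WORDS;
    return words_in_fifo * BITS_PER_WORD;
}

/* Time needed to shift that backlog out at the given pixel clock, in nanoseconds. */
static unsigned backlog_ns(unsigned pixels, unsigned pixel_clock_hz) {
    return (unsigned)((unsigned long long)pixels * 1000000000ULL / pixel_clock_hz);
}
```

Under these assumptions a full FIFO holds 256 pixels, which take 10240 ns to leave at 25 MHz, so muting right at the terminal count could cut off a visible part of the line.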

So I tried blanking with an extra PWM channel (and two interrupts per line):

#include "fastlib/common.h"
#include "fastlib/pinsel.h"
#include "fastlib/gpio.h"
#include "fastlib/clock.h"
#include "fastlib/power.h"
#include "fastlib/pwm.h"
#include "fastlib/i2s.h"
#include "fastlib/uart.h"
#include "fastlib/dma.h"
#include "fastlib/nvic.h"

#include "mbed.h"
DigitalOut myled1(LED1);

DigitalOut myled2(LED2);
#include <string.h>

#if 0       // DEBUG messages on UART0
#include <stdio.h>
#define dbprintf(...) printf(__VA_ARGS__)
#else
#define dbprintf(...)
#endif

// -----------------------------------------------------------------------------------

// VSYNC counter
extern unsigned long systemclock;

// 640 x 400 pixels / 8 =
#define FRAMEBUFFERMAX 32000

// force this framebuffer into AHBSRAM0 and AHBSRAM1
// warning!! this disables the use of the USB, Ethernet and CAN bus functions!
unsigned char *framebuffer = (unsigned char *)(0x2007C000);

// the framepointer
volatile unsigned char *pointer=framebuffer+1;

static unsigned line_counter;

// -----------------------------------------------------------------------------------

#define FEED0_AND_WAIT(x,y)   fl_pll0_feed(); while(y fl_pll0_get_##x())

static void init_pll0(unsigned int clock_source, int N, int M, int cpu_divider) {
    fl_select_pll0_clock_source(clock_source);

    fl_pll0_control(FL_ENABLE,  FL_DISCONNECT);
    FEED0_AND_WAIT(connect,);
    fl_pll0_control(FL_DISABLE, FL_DISCONNECT);
    FEED0_AND_WAIT(enable,);

    fl_pll0_config(N, M);
    fl_pll0_feed();

    fl_pll0_control(FL_ENABLE,  FL_DISCONNECT);
    FEED0_AND_WAIT(enable,!);

    fl_set_cpu_clock_divider(cpu_divider);
    while (!fl_pll0_get_lock()) ;

    fl_pll0_control(FL_ENABLE,  FL_CONNECT);
    FEED0_AND_WAIT(connect,!);
}

// -----------------------------------------------------------------------------------

static void init_uart0(unsigned divaddval, unsigned mulval, unsigned divisor) {
    fl_power_uart0(FL_ON);
    fl_select_clock_uart0(FL_CLOCK_DIV1);
    fl_uart_set_fractional_divider(0, divaddval, mulval);
    fl_uart_set_divisor_latch(0, divisor);
}

// -----------------------------------------------------------------------------------

static void init_vsync(unsigned port, unsigned pin) {
    fl_power_gpio(FL_ON);
    fl_pinsel(port, pin, FL_FUNC_DEFAULT, FL_IGNORE, FL_IGNORE);
    fl_gpio_set_direction(port, 1<<pin, FL_OUTPUT);
    fl_gpio_clear_value    (port, 1<<pin);
}

// -----------------------------------------------------------------------------------

static void init_dma_controller(void) {
    fl_power_gpdma(FL_ON);
    fl_dma_enable(FL_ENABLE);
    while (!fl_dma_is_enabled()) ;

    fl_dma_set_srcaddr (0, framebuffer);
    fl_dma_set_destaddr(0, (void*)FL_I2STXFIFO);
    fl_dma_set_next_lli(0, 0);

    fl_dma_channel_control(0,       // control word
                           20,                  // count (20*4 = 80)
                           4, 4,               // src and dest burst size
                           32, 32,             // src and dest width
                           FL_SRC_INCREMENT,
                           FL_NO_DEST_INCREMENT,
                           FL_OFF              // terminal count interrupt
                          );
//    if ((fl_dma_channel_get_control_mask(0) | 25) != dma_lli0.control_word) {
//        dbprintf("%08x and %08x\r\n", fl_dma_channel_get_control_mask(0) | 25, dma_lli0.control_word);
//        dbprintf("control_word mismatch\r\n");
//        while(1);
//    }

    fl_dma_channel_config(0, FL_ENABLE,
                          FL_DMA_PERIPHERAL_IS_MEMORY, FL_DMA_BURST_REQUEST_I2S_CH0,
                          FL_DMA_MEMORY_TO_PERIPHERAL,
                          FL_ON, FL_ON
                         );

//     fl_nvic_interrupt_set_enable(FL_NVIC_INT_DMA);
}

// -----------------------------------------------------------------------------------

static void init_i2s(void) {
    // I2S on P0.9 (DIP5)
    fl_power_i2s(FL_ON);
    fl_select_clock_i2s(FL_CLOCK_DIV1);                     // assume 100MHz
    fl_pinsel(0, 7, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_CLK
    fl_pinsel(0, 8, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_WS
    fl_pinsel(0, 9, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_SDA
    fl_i2s_set_tx_rate(1, 4);
    fl_i2s_output_set_config(FL_8BITS, FL_STEREO, 8, 0, 0, 0, 0);

}

// -----------------------------------------------------------------------------------

static void init_hsync(void) {
    // PWM1.2 on P2.1 (DIP25)
    fl_power_pwm1(FL_ON);
    fl_select_clock_pwm1(FL_CLOCK_DIV1);
    fl_pinsel(2, 1, FL_FUNC_ALT1, FL_FLOATING, FL_FLOATING);    // PWM1.2, no pullup/down
    fl_pinsel(3, 1, FL_FUNC_ALT1, FL_FLOATING, FL_FLOATING);    // PWM1.2, no pullup/down
    fl_pwm_set_prescale(4);         // 100/25 = 4

// was 48
#define HSHIFT 0

    // main PWM
    fl_pwm_set_match(0, 800);   // 800 color clocks

    // interrupt on blank start 
    fl_pwm_config_match(0, FL_ON, FL_ON, FL_OFF);   // interrupt, reset, !stop


    // this PWM generates the HSYNC pulse
    fl_pwm_set_match(2, 16+HSHIFT);         // go low at 16
    fl_pwm_set_match(1, 48+HSHIFT);         // go high at 48
    fl_pwm_config_edges(2, FL_DOUBLE_EDGE); // need this for negative sync
    fl_pwm_output_enable(2, FL_ENABLE);     // enable this output

    // this PWM generates the blanking
    fl_pwm_set_match(3, 96+HSHIFT);         // go high at 96
    fl_pwm_set_match(4, 1+HSHIFT);          // go low at 1 (0)
    fl_pwm_config_edges(4, FL_DOUBLE_EDGE); // need this for negative edges
    fl_pwm_output_enable(4, FL_ENABLE);     // enable this output
    fl_pwm_config_match(4, FL_OFF, FL_OFF, FL_OFF);   // interrupt, reset, !stop

    // and interrupt on blank end 
    fl_pwm_config_match(3, FL_ON, FL_OFF, FL_OFF);   // interrupt, reset, !stop

}

// -----------------------------------------------------------------------------------

static void state_before_vsync(void);

static void (*state)(void) = state_before_vsync;

static void state_blank_area(void) {
    if (line_counter != 449) return;

    line_counter = 0;
    pointer=framebuffer;
    systemclock++;          // external VSYNC counter
    state = state_before_vsync;
}

// graphic framebuffer @ AHBSRAM0 + AHBSRAM1

// emit a line from the visible area (framebuffer)
static void state_visible_area(void) {

    if (line_counter != 438) {
        fl_dma_set_srcaddr (0,(unsigned char *)pointer);
        fl_dma_set_destaddr(0, (void*)FL_I2STXFIFO);
        fl_dma_set_next_lli(0, 0);

        fl_dma_channel_control(0,       // control word
                               20,                  // count (20*4 = 80)
                               4, 4,               // src and dest burst size
                               32, 32,             // src and dest width
                               FL_SRC_INCREMENT,
                               FL_NO_DEST_INCREMENT,
                               FL_OFF              // terminal count interrupt
                              );

        fl_dma_channel_config(0, FL_ENABLE,
                              FL_DMA_PERIPHERAL_IS_MEMORY, FL_DMA_BURST_REQUEST_I2S_CH0,
                              FL_DMA_MEMORY_TO_PERIPHERAL,
                              FL_ON, FL_ON
                             );

        // disable blanking
        fl_i2s_output_set_config(FL_8BITS, FL_STEREO, 8, 0, 0, 0, 0);

        pointer+=80;

    } else   state = state_blank_area;


}


static void state_after_vsync(void) {
    if (line_counter != 38) return;
    state = state_visible_area;
}

static void state_during_vsync(void) {
    if (line_counter != 4) return;

    fl_gpio_clear_value(0, 1<<6);
    state = state_after_vsync;
}

static void state_before_vsync(void) {
    if (line_counter != 2) return;

    fl_gpio_set_value(0, 1<<6);
    state = state_during_vsync;
}

// inactive 
extern "C" void DMA_IRQHandler(void) __irq {
    fl_dma_clear_terminal_count_interrupt_request(0);
}

// active
extern "C" void PWM1_IRQHandler(void) __irq {
    int regval=*FL_PWM1IR;
    if (regval & (1<<3)) {  // match 3 = end of horizontal blanking area
        fl_i2s_output_set_config(FL_8BITS, FL_STEREO, 8, 0, 0, 0, 0);   // enable I2S output
        state();
        line_counter++;
    }
    if (regval & (1<<0)) {  // match 0 = start of horizontal blanking area
        fl_i2s_output_set_config(FL_8BITS, FL_STEREO, 8, 0, 0, 1, 0);   // disable I2S output
    }
    // clear interrupt flag
    *FL_PWM1IR=regval;
}


// -----------------------------------------------------------------------------------

void init_vga(void) {
    fl_power_off_all_peripherals();

    init_pll0(FL_PLL0_CLOCK_SOURCE_MAIN, 2, 25, 3); // 100MHz
    init_uart0(0, 0, 651);                          // 100MHz/651/16=9600.6 (default 8N1)

    init_vsync(0, 6);                               // VSYNC on P0.6 (DIP8)
    init_i2s();
    init_hsync();
    init_dma_controller();

    fl_pwm_timer_counter_enable(FL_ENABLE);
    fl_pwm_enable(FL_ENABLE);
    fl_i2s_config_dma1(FL_OFF, FL_ON, 0, 2);
    fl_nvic_interrupt_set_enable(FL_NVIC_INT_PWM);

}

It is not getting any faster (though it is still faster than the double scanline buffer), but the signal is more standard (haha) and the monitor seems happier. Getting the negative logic right with the match registers was quite tricky.

P.S. If one of the PWM channels has a 'reset' bit set, do all PWM channels reset simultaneously?

21 Jul 2011

Well, I ditched the whole double PWM thing and tried again with a different linked-list approach. This one is running really well: it is faster than the double scanline buffer, and blanking is done by linking the active list to a blank list which links to NULL (and so stops the DMA). The horizontal sync is now correctly timed (HSHIFT = 0) and the state machine interrupts are generated on match register 0.

The Basic benchmark runs in 9.18 seconds.

I think I will call this my final version :-)

/*
 * 640x400 70Hz full graphic VGA Driver
 *
 * Copyright (C) 2011 by Ivo van Poorten <ivop@euronet.nl>
 * and Gert van der Knokke <gertk@xs4all.nl>
 * This file is licensed under the terms of the GNU Lesser
 * General Public License, version 3.
 *
 * Inspired by Simon's (not Ford) Cookbook entry and Cliff Biffle's
 * assembly code.
 */
#include "fastlib/common.h"
#include "fastlib/pinsel.h"
#include "fastlib/gpio.h"
#include "fastlib/clock.h"
#include "fastlib/power.h"
#include "fastlib/pwm.h"
#include "fastlib/i2s.h"
#include "fastlib/uart.h"
#include "fastlib/dma.h"
#include "fastlib/nvic.h"

// -----------------------------------------------------------------------------------
// VSYNC counter (used externally)
extern unsigned long systemclock;

// 640 x 400 pixels / 8 =
#define FRAMEBUFFERMAX 32000

// force the framebuffer into AHBSRAM0 and AHBSRAM1 (16k + 16k)
// warning!! this disables the use of the USB, Ethernet and CAN bus functions!
unsigned char *framebuffer = (unsigned char *)(0x2007C000);

// the framepointer
volatile unsigned char *pointer=framebuffer;

// active line counter
static unsigned line_counter;

// -----------------------------------------------------------------------------------
// setup CPU PLL
#define FEED0_AND_WAIT(x,y)   fl_pll0_feed(); while(y fl_pll0_get_##x())

static void init_pll0(unsigned int clock_source, int N, int M, int cpu_divider) {
    fl_select_pll0_clock_source(clock_source);

    fl_pll0_control(FL_ENABLE,  FL_DISCONNECT);
    FEED0_AND_WAIT(connect,);
    fl_pll0_control(FL_DISABLE, FL_DISCONNECT);
    FEED0_AND_WAIT(enable,);

    fl_pll0_config(N, M);
    fl_pll0_feed();

    fl_pll0_control(FL_ENABLE,  FL_DISCONNECT);
    FEED0_AND_WAIT(enable,!);

    fl_set_cpu_clock_divider(cpu_divider);
    while (!fl_pll0_get_lock()) ;

    fl_pll0_control(FL_ENABLE,  FL_CONNECT);
    FEED0_AND_WAIT(connect,!);
}

// -----------------------------------------------------------------------------------
// setup UART0
static void init_uart0(unsigned divaddval, unsigned mulval, unsigned divisor) {
    fl_power_uart0(FL_ON);
    fl_select_clock_uart0(FL_CLOCK_DIV1);
    fl_uart_set_fractional_divider(0, divaddval, mulval);
    fl_uart_set_divisor_latch(0, divisor);
}

// -----------------------------------------------------------------------------------
// setup VSYNC output on designated pin
static void init_vsync(unsigned port, unsigned pin) {
    fl_power_gpio(FL_ON);
    fl_pinsel(port, pin, FL_FUNC_DEFAULT, FL_IGNORE, FL_IGNORE);
    fl_gpio_set_direction(port, 1<<pin, FL_OUTPUT);
    fl_gpio_clear_value    (port, 1<<pin);
}

// -----------------------------------------------------------------------------------
// define structure for DMA linked lists
struct dma_lli {
    void *source;
    void *dest;
    struct dma_lli *next;
    unsigned control_word;
};

// some arbitrary blank data for I2S, used for blanking
// even after DMA the I2S output will keep on emitting zeroes (= blank)
static unsigned char blanking[32]={0,0,0,0,0,0,0,0,
                                   0,0,0,0,0,0,0,0,
                                   0,0,0,0,0,0,0,0,
                                   0,0,0,0,0,0,0,0
                                  };

// preset our blanking DMA linked list
extern const struct dma_lli blank_lli;

// the blank linked list item ends the DMA cycle (next lli = 0)
static const struct dma_lli blank_lli = {
    blanking, (void*)FL_I2STXFIFO, 0, 4
    | (1 << 12)
    | (1 << 15)
    | (2 << 18)
    | (2 << 21)
    | (0 << 26)
    | (0 << 27)
    | (0 << 31)
};

// setup the DMA controller
static void init_dma_controller(void) {
    fl_power_gpdma(FL_ON);
    fl_dma_enable(FL_ENABLE);
    while (!fl_dma_is_enabled()) ;

    // do some initial DMA setup but no need to start the DMA
    fl_dma_set_srcaddr (0, framebuffer);         // initial source pointer
    fl_dma_set_destaddr(0, (void*)FL_I2STXFIFO); // destination is I2S
    fl_dma_set_next_lli(0, &blank_lli);          // link active list to blank list

    fl_dma_channel_control(0,                    // control word
                           20,                   // count (20*4 = 80)
                           4, 4,                 // src and dest burst size
                           32, 32,               // src and dest width
                           FL_SRC_INCREMENT,
                           FL_NO_DEST_INCREMENT,
                           FL_OFF                // no interrupts
                          );

}

// -----------------------------------------------------------------------------------
// setup I2S for 25 MHz dot/pixel clock
static void init_i2s(void) {
    // I2S on P0.9 (DIP5)
    fl_power_i2s(FL_ON);
    fl_select_clock_i2s(FL_CLOCK_DIV1);                     // assume 100MHz
    fl_pinsel(0, 7, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_CLK
    fl_pinsel(0, 8, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_WS
    fl_pinsel(0, 9, FL_FUNC_ALT1, FL_IGNORE, FL_IGNORE);    // I2STX_SDA
    fl_i2s_set_tx_rate(1, 4);
    fl_i2s_output_set_config(FL_8BITS, FL_STEREO, 8, 0, 0, 0, 0);

}

// -----------------------------------------------------------------------------------
// create HSYNC output with PWM
static void init_hsync(void) {
    // PWM1.2 on P2.1 (DIP25)
    fl_power_pwm1(FL_ON);
    fl_select_clock_pwm1(FL_CLOCK_DIV1);
    fl_pinsel(2, 1, FL_FUNC_ALT1, FL_FLOATING, FL_FLOATING);    // PWM1.2, no pullup/down
    fl_pwm_set_prescale(4);         // 100/25 = 4

#define HSHIFT 0

    // main PWM
    fl_pwm_set_match(0, 800);   // 800 color clocks

    // generate line interrupts from PWM MR0
    fl_pwm_config_match(0, FL_ON, FL_ON, FL_OFF);   // interrupt, reset, !stop

    // this PWM generates the HSYNC pulse
    fl_pwm_set_match(2, 16+HSHIFT);         // go low at 16
    fl_pwm_set_match(1, 48+HSHIFT);         // go high at 48
    fl_pwm_config_edges(2, FL_DOUBLE_EDGE); // need this for negative sync
    fl_pwm_output_enable(2, FL_ENABLE);     // enable this output

}

// -----------------------------------------------------------------------------------
// state machine list for the complete screen output
static void state_before_vsync(void);

static void (*state)(void) = state_before_vsync;

static void state_blank_area(void) {
    if (line_counter != 449) return;

    line_counter = 0;
    pointer=framebuffer;
    systemclock++;          // external VSYNC counter
    state = state_before_vsync;
}


// emit a line from the visible area (framebuffer)
static void state_visible_area(void) {
    extern const struct dma_lli blank_lli;

    if (line_counter != 438) {
        // reset DMA parameters for active line
        fl_dma_set_srcaddr (0,(unsigned char *)pointer);  // source is our current framebuffer pointer
        fl_dma_set_destaddr(0, (void*)FL_I2STXFIFO);      // destination is I2S
        fl_dma_set_next_lli(0, &blank_lli);               // connect to blanking list
        fl_dma_channel_control(0,                  // control word
                               20,                 // count (20*4 = 80 bytes active)
                               4, 4,               // src and dest burst size
                               32, 32,             // src and dest width
                               FL_SRC_INCREMENT,
                               FL_NO_DEST_INCREMENT,
                               FL_OFF              // no interrupt on first 640 pixel
                              );
        // restart DMA sequence
        fl_dma_channel_config(0, FL_ENABLE,
                              FL_DMA_PERIPHERAL_IS_MEMORY, FL_DMA_BURST_REQUEST_I2S_CH0,
                              FL_DMA_MEMORY_TO_PERIPHERAL,
                              FL_ON, FL_ON
                             );
        // increment framebuffer pointer
        pointer+=80;

    } else   state = state_blank_area;
}


static void state_after_vsync(void) {
    if (line_counter != 38) return;
    state = state_visible_area;
}

static void state_during_vsync(void) {
    if (line_counter != 4) return;

    fl_gpio_clear_value(0, 1<<6);
    state = state_after_vsync;
}

static void state_before_vsync(void) {
    if (line_counter != 2) return;

    fl_gpio_set_value(0, 1<<6);
    state = state_during_vsync;
}

// inactive
extern "C" void DMA_IRQHandler(void) __irq {
    fl_dma_clear_terminal_count_interrupt_request(0);

}

// active
extern "C" void PWM1_IRQHandler(void) __irq {
    int regval=*FL_PWM1IR;
    state();
    line_counter++;
    // clear interrupt flag
    *FL_PWM1IR=regval;
}


// -----------------------------------------------------------------------------------

void init_vga(void) {
    fl_power_off_all_peripherals();

    init_pll0(FL_PLL0_CLOCK_SOURCE_MAIN, 2, 25, 3); // 100MHz
    init_uart0(0, 0, 651);                          // 100MHz/651/16=9600.6 (default 8N1)

    init_vsync(0, 6);                               // VSYNC on P0.6 (DIP8)
    init_i2s();
    init_hsync();
    init_dma_controller();

    fl_pwm_timer_counter_enable(FL_ENABLE);
    fl_pwm_enable(FL_ENABLE);
    fl_i2s_config_dma1(FL_OFF, FL_ON, 0, 2);
    fl_nvic_interrupt_set_enable(FL_NVIC_INT_PWM);  // start the PWM interrupts
}
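
The chaining used for blanking can also be looked at in isolation. The following is a host-side model only, written under the assumption that the controller follows `next` links until it finds a NULL link. `struct lli` mirrors the four-word layout of `dma_lli` above, while `lli_count` and `chain_total_transfers` are hypothetical helpers that do not exist in fastlib. In the driver itself the first item is written straight into the channel registers rather than stored as a list entry.

```c
#include <stddef.h>

/* Same four-word layout as the dma_lli struct in the driver. */
struct lli {
    const void *source;
    void *dest;
    const struct lli *next;
    unsigned control_word;
};

/* The transfer count sits in the low 12 bits of the control word. */
static unsigned lli_count(const struct lli *item) {
    return item->control_word & 0xFFFu;
}

/* Walk the chain until a NULL link ends it, summing the transfers.
 * If items is not NULL, it receives the number of list entries visited. */
static unsigned chain_total_transfers(const struct lli *first, unsigned *items) {
    unsigned total = 0, n = 0;
    for (const struct lli *p = first; p != NULL; p = p->next) {
        total += lli_count(p);
        n++;
    }
    if (items) *items = n;
    return total;
}
```

For an active item of 20 words linked to a blank item of 4 words with a NULL link, the model visits two entries and counts 24 word transfers per scanline before the DMA stops.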

23 Jul 2011

Gert van der Knokke wrote:

Well, I ditched the whole double PWM thing and tried again with a different linked-list approach. This one is running really well: it is faster than the double scanline buffer, and blanking is done by linking the active list to a blank list which links to NULL (and so stops the DMA). The horizontal sync is now correctly timed (HSHIFT = 0) and the state machine interrupts are generated on match register 0.

The Basic benchmark runs in 9.18 seconds.

I think I will call this my final version :-)

Nice! I might have some spare time tomorrow to check it out myself.