An I/O controller for virtual pinball machines: accelerometer nudge sensing, analog plunger input, button input encoding, LedWiz compatible output controls, and more.

Dependencies:   mbed FastIO FastPWM USBDevice

Fork of Pinscape_Controller by Mike R

/media/uploads/mjr/pinscape_no_background_small_L7Miwr6.jpg

Source code migrated to github: Since ARM is shutting down MBED in the near future, I've moved the active project source tree to github: https://github.com/mjrgh/Pinscape_Controller/. The source tree at the bottom of this page is a snapshot as of the migration but will no longer be updated.

~ ~ ~

This is Version 2 of the Pinscape Controller, an I/O controller for virtual pinball machines. (You can find the old version 1 software here.) Pinscape is software for the KL25Z that turns the board into a full-featured I/O controller for virtual pinball, with support for accelerometer-based nudging, a mechanical plunger, button inputs, and feedback device control.

In case you haven't heard of the idea before, a "virtual pinball machine" is basically a video pinball simulator that's built into a real pinball machine body. A TV monitor goes in place of the pinball playfield, and a second TV goes in the backbox to show the backglass artwork. Some cabs also include a third monitor to simulate the DMD (Dot Matrix Display) used for scoring on 1990s machines, or even an original plasma DMD. A computer (usually a Windows PC) is hidden inside the cabinet, running pinball emulation software that displays a life-sized playfield on the main TV. The cabinet has all of the usual buttons, too, so it not only looks like the real thing, but plays like it too. That's a picture of my own machine to the right. On the outside, it's built exactly like a real arcade pinball machine, with the same overall dimensions and all of the standard pinball cabinet trim hardware.

It's possible to buy a pre-built virtual pinball machine, but it also makes a great DIY project. If you have some basic wood-working skills and know your way around PCs, you can build one from scratch. The computer part is just an ordinary Windows PC, and all of the pinball emulation can be built out of free, open-source software. In that spirit, the Pinscape Controller is an open-source software/hardware project that offers a no-compromises, all-in-one control center for all of the unique input/output needs of a virtual pinball cabinet. If you've been thinking about building one of these, but you're not sure how to connect a plunger, flipper buttons, lights, nudge sensor, and whatever else you can think of, this project might be just what you're looking for.

You can find much more information about DIY Pin Cab building in general in the Virtual Cabinet Forum on vpforums.org. Also visit my Pinscape Resources page for more about this project and other virtual pinball projects I'm working on.

Downloads

  • Pinscape Release Builds: This page has download links for all of the Pinscape software. To get started, install and run the Pinscape Config Tool on your Windows computer. It will lead you through the steps for installing the Pinscape firmware on the KL25Z.
  • Config Tool Source Code. The complete C# source code for the config tool. You don't need this to run the tool, but it's available if you want to customize anything or see how it works inside.

Documentation

The new Version 2 Build Guide is now complete! This new version aims to be a complete guide to building a virtual pinball machine, including not only the Pinscape elements but all of the basics, from sourcing parts to building all of the hardware.

You can also refer to the original Hardware Build Guide (PDF), but that's out of date now, since it refers to the old version 1 software, which was rather different (especially when it comes to configuration).

System Requirements

The new Config Tool requires a fairly up-to-date Microsoft .NET installation. If you use Windows Update to keep your system current, you should be fine. A modern version of Internet Explorer (IE) is required, even if you don't use it as your main browser, because the Config Tool uses some system components that Microsoft packages into the IE install set. I test with IE11, so that's known to work. IE8 doesn't work. IE9 and 10 are unknown at this point.

The Windows requirements are only for the config tool. The firmware doesn't care about anything on the Windows side, so if you can make do without the config tool, you can use almost any Windows setup.

Main Features

Plunger: The Pinscape Controller started out as a "mechanical plunger" controller: a device for attaching a real pinball plunger to the video game software so that you could launch the ball the natural way. This is still, of course, a central feature of the project. The software supports several types of sensors: a high-resolution optical sensor (which works by essentially taking pictures of the plunger as it moves); a slide potentiometer (which determines the position via the changing electrical resistance in the pot); a quadrature sensor (which counts bars printed on a special guide rail that it moves along); and an IR distance sensor (which determines the position by sending pulses of light at the plunger and measuring the round-trip travel time). The Build Guide explains how to set up each type of sensor.

Nudging: The KL25Z (the little microcontroller that the software runs on) has a built-in accelerometer. The Pinscape software uses it to sense when you nudge the cabinet, and feeds the acceleration data to the pinball software on the PC. This turns physical nudges into virtual English on the ball. The accelerometer is quite sensitive and accurate, so we can measure the difference between little bumps and hard shoves, and everything in between. The result is natural and immersive.

Buttons: You can wire real pinball buttons to the KL25Z, and the software will translate the buttons into PC input. You have the option to map each button to a keyboard key or joystick button. You can wire up your flipper buttons, Magna Save buttons, Start button, coin slots, operator buttons, and whatever else you need.

Feedback devices: You can also attach "feedback devices" to the KL25Z. Feedback devices are things that create tactile, sound, and lighting effects in sync with the game action. The most popular PC pinball emulators know how to address a wide variety of these devices, and know how to match them to on-screen action in each virtual table. You just need an I/O controller that translates commands from the PC into electrical signals that turn the devices on and off. The Pinscape Controller can do that for you.

Expansion Boards

There are two main ways to run the Pinscape Controller: standalone, or using the "expansion boards".

In the basic standalone setup, you just need the KL25Z, plus whatever buttons, sensors, and feedback devices you want to attach to it. This mode lets you take advantage of everything the software can do, but for some features, you'll have to build some ad hoc external circuitry to interface external devices with the KL25Z. The Build Guide has detailed plans for exactly what you need to build.

The other option is the Pinscape Expansion Boards. The expansion boards are a companion project, which is also totally free and open-source, that provides Printed Circuit Board (PCB) layouts that are designed specifically to work with the Pinscape software. The PCB designs are in the widely used EAGLE format, which many PCB manufacturers can turn directly into physical boards for you. The expansion boards organize all of the external connections more neatly than on the standalone KL25Z, and they add all of the interface circuitry needed for all of the advanced software functions. The big thing they bring to the table is lots of high-power outputs. The boards provide a modular system that lets you add boards to add more outputs. If you opt for the basic core setup, you'll have enough outputs for all of the toys in a really well-equipped cabinet. If your ambitions go beyond merely well-equipped and run to the ridiculously extravagant, just add an extra board or two. The modular design also means that you can add to the system over time.

Expansion Board project page

Update notes

If you have a Pinscape V1 setup already installed, you should be able to switch to the new version pretty seamlessly. There are just a couple of things to be aware of.

First, the "configuration" procedure is completely different in the new version. Way better and way easier, but it's not what you're used to from V1. In V1, you had to edit the project source code and compile your own custom version of the program. No more! With V2, you simply install the standard, pre-compiled .bin file, and select options using the Pinscape Config Tool on Windows.

Second, if you're using the TSL1410R optical sensor for your plunger, there's a chance you'll need to boost your light source's brightness a little bit. The "shutter speed" is faster in this version, which means that it doesn't spend as much time collecting light per frame as before. The software actually does "auto exposure" adaptation on every frame, so the increased shutter speed really shouldn't bother it, but it does require a certain minimum level of contrast, which requires a certain minimal level of lighting. Check the plunger viewer in the setup tool if you have any problems; if the image looks totally dark, try increasing the light level to see if that helps.

New Features

V2 has numerous new features. Here are some of the highlights...

Dynamic configuration: as explained above, configuration is now handled through the Config Tool on Windows. It's no longer necessary to edit the source code or compile your own modified binary.

Improved plunger sensing: the software now reads the TSL1410R optical sensor about 15x faster than it did before. This allows reading the sensor at full resolution (400dpi), about 400 times per second. The faster frame rate makes a big difference in how accurately we can read the plunger position during the fast motion of a release, which allows for more precise position sensing and faster response. The differences aren't dramatic, since the sensing was already pretty good even with the slower V1 scan rate, but you might notice a little better precision in tricky skill shots.

Keyboard keys: button inputs can now be mapped to keyboard keys. The joystick button option is still available as well, of course. Keyboard keys have the advantage of being closer to universal for PC pinball software: some pinball software can be set up to take joystick input, but nearly all PC pinball emulators can take keyboard input, and nearly all of them use the same key mappings.

Local shift button: one physical button can be designated as the local shift button. This works like a Shift key on a keyboard, but with cabinet buttons. It allows each physical button on the cabinet to have two PC keys assigned, one normal and one shifted. Hold down the local shift button, then press another button, and the other button's shifted key mapping is sent to the PC. The shift button can have a regular key mapping of its own as well, so it can do double duty. The shift feature lets you access more functions without cluttering your cabinet with extra buttons. It's especially nice for less frequently used functions like adjusting the volume or activating night mode.
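
As a rough illustration of the shift logic described above, here is a minimal C++ sketch. The ButtonMapping struct, the key codes, and the resolveKey function are hypothetical names invented for this example, not the firmware's actual data structures. Falling back to the normal key when no shifted key is assigned is also an assumption.

```cpp
#include <cstdint>

// Hypothetical per-button mapping: one key code sent when the button is
// pressed alone, another when the local shift button is held.
// A code of 0 means "no key assigned" (an assumption for this sketch).
struct ButtonMapping {
    uint8_t normalKey;
    uint8_t shiftedKey;
};

// Choose the key code to report for a pressed button.  If the shift button
// is held and this button has a shifted mapping, report that; otherwise
// report the normal mapping.
inline uint8_t resolveKey(const ButtonMapping &m, bool shiftHeld)
{
    if (shiftHeld && m.shiftedKey != 0)
        return m.shiftedKey;
    return m.normalKey;
}
```

With a mapping such as {0x04, 0x80}, resolveKey returns 0x04 when the button is pressed alone and 0x80 while the shift button is held.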

Night mode: the output controller has a new "night mode" option, which lets you turn off all of your noisy devices with a single button, switch, or PC command. You can designate individual ports as noisy or not. Night mode only disables the noisemakers, so you still get the benefit of your flashers, button lights, and other quiet devices. This lets you play late into the night without disturbing your housemates or neighbors.
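
The night mode rule above can be sketched in a few lines of C++. This is only an illustration under assumed names (OutputPort, effectiveLevel); the real firmware's port structures are not shown on this page.

```cpp
#include <cstdint>

// Hypothetical per-port configuration: "noisy" marks noisemakers such as
// solenoids and knockers, as designated by the user.
struct OutputPort {
    bool noisy;
};

// Level actually driven on a port.  In night mode, ports flagged as noisy
// are forced off; quiet ports (flashers, button lights) pass through.
inline uint8_t effectiveLevel(const OutputPort &port, uint8_t requested, bool nightMode)
{
    return (nightMode && port.noisy) ? 0 : requested;
}
```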

Gamma correction: you can designate individual output ports for gamma correction. This adjusts the intensity level of an output to make it match the way the human eye perceives brightness, so that fades and color mixes look more natural in lighting devices. You can apply this to individual ports, so that it only affects ports that actually have lights of some kind attached.
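
To give a feel for what gamma correction does to an output level, here is a minimal C++ sketch using a power-law curve. The function name gammaCorrect, the exponent of 2.8, the 8-bit input range, and the 12-bit output range (chosen to match the 4096 grayscale steps of the TLC5940) are all assumptions for illustration; the firmware's actual curve and table may differ.

```cpp
#include <cmath>
#include <cstdint>

// Map a linear 8-bit brightness level (0..255) to a gamma-corrected 12-bit
// PWM level (0..4095).  The default exponent is an assumed typical value.
inline uint16_t gammaCorrect(uint8_t level, double gamma = 2.8)
{
    double normalized = level / 255.0;               // scale to 0.0 .. 1.0
    double corrected = std::pow(normalized, gamma);  // apply the power curve
    return static_cast<uint16_t>(corrected * 4095.0 + 0.5);  // round to 12 bits
}
```

The end points are unchanged (0 stays off, 255 stays at full scale), while mid-range levels are pulled well below their linear values, which is what makes fades look even to the eye.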

IR Remote Control: the controller software can transmit and/or receive IR remote control commands if you attach appropriate parts (an IR LED to send, an IR sensor chip to receive). This can be used to turn on your TV(s) when the system powers on, if they don't turn on automatically, and for any other functions you can think of requiring IR send/receive capabilities. You can assign IR commands to cabinet buttons, so that pressing a button on your cabinet sends a remote control command from the attached IR LED, and you can have the controller generate virtual key presses on your PC in response to received IR commands. If you have the IR sensor attached, the system can use it to learn commands from your existing remotes.

Yet more USB fixes: I've been gradually finding and fixing USB bugs in the mbed library for months now. This version has all of the fixes of the last couple of releases, of course, plus some new ones. It also has a new "last resort" feature, since there always seems to be "just one more" USB bug. The last resort is that you can tell the device to automatically reboot itself if it loses the USB connection and can't restore it within a given time limit.
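
The "last resort" reboot amounts to a timeout on the disconnected state. The following C++ sketch shows one way such a check could be structured. The UsbRebootWatchdog class and its interface are hypothetical, and treating a timeout of 0 as "feature disabled" is an assumption; this is not the firmware's actual implementation.

```cpp
#include <cstdint>

// Hypothetical helper that tracks how long the USB connection has been down.
class UsbRebootWatchdog {
public:
    // timeoutMs: how long to tolerate a lost connection; 0 disables the check
    explicit UsbRebootWatchdog(uint32_t timeoutMs)
        : timeoutMs_(timeoutMs), downMs_(0) {}

    // Call periodically with the current connection state and the time
    // elapsed since the previous call.  Returns true when the connection
    // has been down for at least the timeout, i.e., a reboot is due.
    bool update(bool connected, uint32_t elapsedMs)
    {
        if (connected) {
            downMs_ = 0;          // connection is healthy; restart the count
            return false;
        }
        downMs_ += elapsedMs;     // still disconnected; accumulate down time
        return timeoutMs_ != 0 && downMs_ >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t downMs_;
};
```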

More Downloads

  • Custom VP builds: I created modified versions of Visual Pinball 9.9 and Physmod5 that you might want to use in combination with this controller. The modified versions have special handling for plunger calibration specific to the Pinscape Controller, as well as some enhancements to the nudge physics. If you're not using the plunger, you might still want it for the nudge improvements. The modified version also works with any other input controller, so you can get the enhanced nudging effects even if you're using a different plunger/nudge kit. The big change in the modified versions is a "filter" for accelerometer input that's designed to make the response to cabinet nudges more realistic. It also makes the response more subdued than in the standard VP, so it's not to everyone's taste. The downloads include both the updated executables and the source code changes, in case you want to merge the changes into your own custom version(s).

    Note! These features are now standard in the official VP releases, so you don't need my custom builds if you're using 9.9.1 or later and/or VP 10. I don't think there's any reason to use my versions instead of the latest official ones, and in fact I'd encourage you to use the official releases since they're more up to date, but I'm leaving my builds available just in case. In the official versions, look for the checkbox "Enable Nudge Filter" in the Keys preferences dialog. My custom versions don't include that checkbox; they just enable the filter unconditionally.
  • Output circuit shopping list: This is a saved shopping cart at mouser.com with the parts needed to build one copy of the high-power output circuit for the LedWiz emulator feature, for use with the standalone KL25Z (that is, without the expansion boards). The quantities in the cart are for one output channel, so if you want N outputs, simply multiply the quantities by N, with one exception: you only need one ULN2803 transistor array chip for each eight output circuits. If you're using the expansion boards, you won't need any of this, since the boards provide their own high-power outputs.
  • Cary Owens' optical sensor housing: A 3D-printable design for a housing/mounting bracket for the optical plunger sensor, designed by Cary Owens. This makes it easy to mount the sensor.
  • Lemming77's potentiometer mounting bracket and shooter rod connector: Sketchup designs for 3D-printable parts for mounting a slide potentiometer as the plunger sensor. These were designed for a particular slide potentiometer that used to be available from an Aliexpress.com seller but is no longer listed. You can probably use this design as a starting point for other similar devices; just check the dimensions before committing the design to plastic.
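
The quantity rule in the output circuit shopping list above (multiply by N, except one ULN2803 per eight outputs) can be written out as a small C++ sketch. The function names are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Number of ULN2803 chips needed for a given number of output circuits:
// one chip per eight outputs, rounded up.
inline int uln2803Count(int outputs)
{
    assert(outputs >= 0);
    return (outputs + 7) / 8;
}

// Quantity to order for any other part in the cart: the per-channel
// quantity listed in the cart times the number of outputs.
inline int perChannelPartCount(int qtyInCart, int outputs)
{
    assert(qtyInCart >= 0 && outputs >= 0);
    return qtyInCart * outputs;
}
```

For example, 20 outputs need 3 ULN2803 chips, and a part listed with quantity 2 in the cart needs 40 pieces.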

Copyright and License

The Pinscape firmware is copyright 2014, 2021 by Michael J Roberts. It's released under an MIT open-source license. See License.

Warning to VirtuaPin Kit Owners

This software isn't designed as a replacement for the VirtuaPin plunger kit's firmware. If you bought the VirtuaPin kit, I recommend that you don't install this software. The KL25Z can only run one firmware program at a time, so if you install the Pinscape firmware on your KL25Z, it will replace and erase your existing VirtuaPin proprietary firmware. If you do this, the only way to restore your VirtuaPin firmware is to physically ship the KL25Z back to VirtuaPin and ask them to re-flash it. They don't allow you to do this at home, and they don't even allow you to back up your firmware, since they want to protect their proprietary software from copying. For all of these reasons, if you want to run the Pinscape software, I strongly recommend that you buy a "blank" retail KL25Z to use with Pinscape. They only cost about $15 and are available at several online retailers, including Amazon, Mouser, and eBay. The blank retail boards don't come with any proprietary firmware pre-installed, so installing Pinscape won't delete anything that you paid extra for.

With those warnings in mind, if you're absolutely sure that you don't mind permanently erasing your VirtuaPin firmware, it is at least possible to use Pinscape as a replacement for the VirtuaPin firmware. Pinscape uses the same button wiring conventions as the VirtuaPin setup, so you can keep your buttons (although you'll have to update the GPIO pin mappings in the Config Tool to match your physical wiring). As of the June, 2021 firmware, the Vishay VCNL4010 plunger sensor that comes with the VirtuaPin v3 plunger kit is supported, so you can also keep your plunger, if you have that chip. (You should check to be sure that's the sensor chip you have before committing to this route, if keeping the plunger sensor is important to you. The older VirtuaPin plunger kits came with different IR sensors that the Pinscape software doesn't handle.)

TLC5940/TLC5940.h

Committer: mjr
Date: 2021-12-22
Revision: 116:80ebb41bad94
Parent: 98:4df3c0f7e707

File content as of revision 116:80ebb41bad94:

// Pinscape Controller TLC5940 interface
//
// Based on Spencer Davis's mbed TLC5940 library.  Adapted for the
// KL25Z and simplified (removes dot correction and status input 
// support).

 
#ifndef TLC5940_H
#define TLC5940_H

#include "NewPwm.h"

// --------------------------------------------------------------------------
// Data Transmission Mode.
//
// NOTE!  This section contains a possible workaround to try if you're 
// having data signal stability problems with your TLC5940 chips.  If
// things are working properly, you can ignore this part.
//
// The software has two options for sending data updates to the chips:
//
// Mode 0:  Send data *during* the grayscale cycle.  This is the default,
// and it's the standard method the chips are designed for.  In this mode, 
// we start sending an update just after the blanking interval that starts 
// a new grayscale cycle.  The timing is arranged so that the update is 
// completed well before the end of the grayscale cycle.  At the next 
// blanking interval, we latch the new data, so the new brightness levels 
// will be shown starting on the next cycle.
//
// Mode 1:  Send data *between* grayscale cycles.  In this mode, we send
// each complete update during a blanking period, then latch the update
// and start the next grayscale cycle.  This isn't the way the chips were
// intended to be used, but it works.  The disadvantage is that it requires
// the blanking interval to be extended long enough for the full data 
// update (192 bits * the number of chips in the chain).  Since the
// outputs are turned off throughout the blanking period, this reduces
// the overall brightness/intensity of the outputs by reducing the duty
// cycle.  The TLC5940 chips can't achieve 100% duty cycle to begin with,
// since they require a brief minimum time in the blanking interval
// between grayscale cycles; however, the minimum is so short that the
// duty cycle is close to 100%.  With the full data transmission stuffed
// into the blanking interval, we reduce the duty cycle further below
// 100%.  With four chips in the chain, a 28 MHz data clock, and a
// 500 kHz grayscale clock, the reduction is about 0.3%.
//
// Mode 0 is the method documented in the manufacturer's data sheet.
// It works well empirically with the Pinscape expansion boards.
//
// So what's the point of Mode 1?  In early testing, with a breadboard 
// setup, I saw some problems with data signal stability, which manifested 
// as sporadic flickering in the outputs.  Switching to Mode 1 improved
// the signal stability considerably.  I'm therefore leaving this code
// available as an option in case anyone runs into similar signal problems
// and wants to try the alternative mode as a workaround.
//
#define DATA_UPDATE_INSIDE_BLANKING  0

#include "mbed.h"


// --------------------------------------------------------------------------
// Some notes on the data transmission design
//
// I spent a while working on using DMA to send the data, thinking that
// this would reduce the CPU load.  But I couldn't get this working
// reliably; there was some kind of timing interaction or race condition
// that caused crashes when initiating the DMA transfer from within the
// blanking interrupt.  I spent quite a while trying to debug it and
// couldn't figure out what was going on.  There are some complications
// involved in using DMA with SPI that are documented in the KL25Z
// reference manual, and I was following those carefully, but I suspect
// that the problem was somehow related to that, because it seemed to
// be sporadic and timing-related, and I couldn't find any software race
// conditions or concurrency issues that could explain it.
//
// I finally decided that I wasn't going to crack that and started looking
// for alternatives, so out of curiosity, I measured the time needed for a 
// synchronous (CPU-driven) SPI send, to see how it would fit into various
// places in the code.  This turned out to be faster than I expected: with
// SPI at 28MHz, the measured time for a synchronous send is about 72us for
// 4 chips' worth of GS data (192 bits per chip), which I expect to be the typical
// Expansion Board setup.  For an 8-chip setup, which will probably be 
// about the maximum workable setup, the time would be 144us.  We only have
// to send the data once per grayscale cycle, and each cycle is 11.7ms with 
// the grayscale clock at 350kHz (4096 steps per cycle divided by 350,000 
// steps per second = 11.7ms per cycle), so this is only 1% overhead.  The 
// main loop spends most of its time polling anyway, so we have plenty of 
// cycles to reallocate from idle polling to sending the data.
//
// The easiest place to do the send is in the blanking interval ISR, but
// I wanted to keep this out of the ISR.  It's only ~100us, but even so,
// it's critical to minimize time in ISRs so that we don't miss other 
// interrupts.  So instead, I set it up so that the ISR coordinates with
// the main loop via a flag:
//
//  - In the blanking interrupt, set a flag ("cts" = clear to send),
//    and arm a timeout that fires 2/3 through the next blanking cycle
//
//  - In the main loop, poll "cts" each time through the loop.  When 
//    cts is true, send the data synchronously and clear the flag.
//    Do nothing when cts is false.
//
// The main loop runs on about a 1.5ms cycle, and 2/3 of the grayscale
// cycle is 8ms, so the main loop will poll cts on average 5 times per
// 8ms window.  That makes it all but certain that we'll do a send in
// a timely fashion on every grayscale cycle.
//
// The point of the 2/3 window is to guarantee that the data send is
// finished before the grayscale cycle ends.  The TLC5940 chips require
// this; data transmission has to be entirely between blanking intervals.
// The main loop and interrupt handler are operating asynchronously
// relative to one another, so the exact phase alignment will vary
// randomly.  If we start a transmission within the 2/3 window, we're
// guaranteed to have at least 3.5ms (1/3 of the cycle) left before
// the next blanking interval.  The transmission only takes ~100us,
// so we're leaving tons of margin for error in the timing - we have
// 34x longer than we need.
//
// The main loop can easily absorb the extra ~100us of overhead without
// even noticing.  The loop spends most of its time polling devices, so
// it's really mostly idle time to start with.  So we're effectively
// reallocating some idle time to useful work.  The chunk of time is
// only about 6% of one loop iteration, so we're not even significantly
// extending the occasional iterations that actually do this work.
// (If we had a 2ms chunk of monolithic work to do, that could start
// to add undesirable latency to other polling tasks.  100us won't.)
//
// We could conceivably reduce this overhead slightly by adding DMA, 
// but I'm not sure it would actually do much good.  Setting up the DMA
// transfer would probably take at least 20us in CPU time just to set
// up all of the registers.  And SPI is so fast that the DMA transfer
// would saturate the CPU memory bus for the 30us or so of the transfer.
// (I have my suspicions that this bus saturation effect might be part
// of the problem I was having getting DMA working in the first place.)
// So we'd go from 100us of overhead per cycle to maybe 50us per 
// cycle.  We'd also have to introduce some concurrency controls to the 
// output "set" operation that we don't need with the current scheme 
// (because it's synchronous).  So overall I think the current
// synchronous approach is almost as good in terms of performance as 
// an asynchronous DMA setup would be, and it's a heck of a lot simpler
// and seems very reliable.
//
// --------------------------------------------------------------------------
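
// --------------------------------------------------------------------------
// Illustration only (not part of the original library): the timing
// arithmetic from the notes above, written out as code.  The helper names
// are hypothetical.  Note that tlc5940WireTimeUs() gives only the ideal
// time on the wire; the measured CPU-driven send quoted above (about 72us
// for 4 chips) is longer, presumably because of per-byte software overhead.
//
// Example at GSCLK = 350kHz, SPI = 28MHz, 4 chips:
//   tlc5940CycleTimeUs(350000)       is about 11703us (11.7ms)
//   tlc5940WireTimeUs(4, 28000000)   is about 27.4us
//
inline double tlc5940CycleTimeUs(double gsclkHz)
{
    // one grayscale cycle is 4096 GSCLK steps
    return 4096.0 * 1.0e6 / gsclkHz;
}

inline double tlc5940WireTimeUs(int nchips, double spiHz)
{
    // 192 grayscale bits per chip, one SPI clock per bit
    return nchips * 192.0 * 1.0e6 / spiHz;
}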


/**
  * SPI speed used by the mbed to communicate with the TLC5940.
  * The TLC5940 supports up to 30MHz.  It's best to keep this as
  * high as possible, since a higher SPI speed yields a faster 
  * grayscale data update.  However, I've seen some slight
  * instability in the signal in my breadboard setup using the
  * full 30MHz, so I've reduced this slightly, which seems to
  * yield a solid signal.  The limit will vary according to how
  * clean the signal path is to the chips; you can probably crank
  * this up to full speed if you have a well-designed PCB, good
  * decoupling capacitors near the 5940 VCC/GND pins, and short
  * wires between the KL25Z and the PCB.  A short, clean path to
  * KL25Z ground seems especially important.
  *
  * The SPI clock must be fast enough that the data transmission
  * time for a full update is comfortably less than the blanking 
  * cycle time.  The grayscale refresh requires 192 bits per TLC5940 
  * in the daisy chain, and each bit takes one SPI clock to send.  
  * Our reference setup in the Pinscape controller allows for up to 
  * 4 TLC5940s, so a full refresh cycle on a fully populated system 
  * would be 768 SPI clocks.  The blanking cycle is 4096 GSCLK cycles.  
  *
  *   t(blank) = 4096 * 1/GSCLK_SPEED
  *   t(refresh) = 768 * 1/SPI_SPEED
  *   Therefore:  SPI_SPEED must be > 768/4096 * GSCLK_SPEED
  *
  * Since the SPI speed can be so high, and since we want to keep
  * the GSCLK speed relatively low, the constraint above simply
  * isn't a factor.  E.g., at SPI=30MHz and GSCLK=500kHz, 
  * t(blank) is 8192us and t(refresh) is 25us.
  */
#define SPI_SPEED 28000000

/**
  * The rate at which the GSCLK pin is pulsed.   This also controls 
  * how often the reset function is called.   The reset function call
  * interval in seconds is (4096/GSCLK_SPEED).  The maximum reliable 
  * rate is around 32Mhz.  It's best to keep this rate as low as 
  * possible:  the higher the rate, the higher the refresh() call 
  * frequency, so the higher the CPU load.  Higher frequencies also 
  * make it more challenging to wire the chips for clean signal 
  * transmission.  Lower clock speeds are more forgiving of wiring
  * quality.
  *
  * The lower bound depends on the application.  For driving lights,
  * the limiting factor is flicker: the lower the rate, the more
  * noticeable the flicker.  Incandescents tend to look flicker-free
  * at about 50 Hz (205 kHz grayscale clock).  LEDs need significantly 
  * faster rates than incandescents, since they don't have the thermal
  * lag of incandescents; for flicker-free LEDs, you usually need at
  * least 200Hz (GSCLK_SPEED 819200).
  */
#define GSCLK_SPEED    350000
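
// Illustration only (not part of the original library): the relationships
// described in the two comments above, written as code.  The helper names
// are hypothetical.  For example, tlc5940RefreshHz(350000) is about 85 Hz,
// tlc5940RefreshHz(819200) is 200 Hz, and 
// tlc5940SpiFastEnough(28000000, 350000, 4) is true by a wide margin.
inline double tlc5940RefreshHz(double gsclkHz)
{
    // one full grayscale (PWM) cycle takes 4096 GSCLK pulses
    return gsclkHz / 4096.0;
}

inline bool tlc5940SpiFastEnough(double spiHz, double gsclkHz, int nchips)
{
    // t(refresh) = nchips*192/spiHz must be less than t(blank) = 4096/gsclkHz
    return spiHz > (nchips * 192.0 / 4096.0) * gsclkHz;
}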

class TLC5940
{
public:
    /**
      *  Set up the TLC5940
      *
      *  @param SCLK - The SCK pin of the SPI bus
      *  @param MOSI - The MOSI pin of the SPI bus
      *  @param GSCLK - The GSCLK pin of the TLC5940(s)
      *  @param BLANK - The BLANK pin of the TLC5940(s)
      *  @param XLAT - The XLAT pin of the TLC5940(s)
      *  @param nchips - The number of TLC5940s (if you are daisy chaining)
      */
    TLC5940(PinName SCLK, PinName MOSI, PinName GSCLK, PinName BLANK, PinName XLAT, int nchips)
        : spi(MOSI, NC, SCLK),
          gsclk(GSCLK),
          blank(BLANK, 1),
          xlat(XLAT),
          nchips(nchips)
    {
        // start up initially disabled
        enabled = false;
        
        // set XLAT to initially off
        xlat = 0;
        
        // Assert BLANK while starting up, to keep the outputs turned off until
        // everything is stable.  This helps prevent spurious flashes during startup.
        // (That's not particularly important for lights, but it matters more for
        // tactile devices.  It's a bit alarming to fire a replay knocker on every
        // power-on, for example.)
        blank = 1;
        
        // Configure SPI format and speed.  The KL25Z only supports 8-bit mode.
        // We nominally need to write the data in 12-bit chunks for the TLC5940
        // grayscale levels, but SPI is ultimately just a bit-level serial format,
        // so we can reformat the 12-bit blocks into 8-bit bytes to fit the 
        // KL25Z's limits.  This should work equally well on other microcontrollers 
        // that are more flexible.  The TLC5940 requires polarity/phase format 0.
        spi.format(8, 0);
        spi.frequency(SPI_SPEED);
        
        // Send out a full data set to the chips, to clear out any random
        // startup data from the registers.  Include some extra bits - there
        // are some cases (such as after sending dot correct commands) where
        // an extra bit per chip is required, and the initial state is 
        // unpredictable, so send extra bits to make sure we cover all bases.  
        // This does no harm; extra bits just fall off the end of the daisy 
        // chain, and since we want all registers initially set to 0, we can 
        // send arbitrarily many extra 0's.
        for (int i = 0 ; i < nchips*25 ; ++i)
            spi.write(0x00);
            
        // do an initial XLAT to latch all of these "0" values into the
        // grayscale registers
        xlat = 1;
        xlat = 0;

        // Allocate our SPI buffer.  The transfer on each cycle is 192 bits per
        // chip = 24 bytes per chip.
        spilen = nchips*24;
        spibuf = new uint8_t[spilen];
        memset(spibuf, 0x00, spilen);
        
        // Configure the GSCLK output's frequency
        gsclk.getUnit()->period(1.0f/GSCLK_SPEED);
        
        // we're not yet ready to send new data to the chips
        cts = false;
        
        // we don't need an XLAT signal until we send data
        needXlat = false;
    }
     
    // Global enable/disable.  When disabled, we assert the blanking signal
    // continuously to keep all outputs turned off.  This can be used during
    // startup and sleep mode to prevent spurious output signals from
    // uninitialized grayscale registers.  The chips have random values in
    // their internal registers when power is first applied, so we have to 
    // explicitly send the initial zero levels after power cycling the chips.
    // The chips might not have power even when the KL25Z is running, because
    // they might be powered from a separate power supply from the KL25Z
    // (the Pinscape Expansion Boards work this way).  Global blanking helps
    // us start up more cleanly by suppressing all outputs until we can be
    // reasonably sure that the various chip registers are initialized.
    void enable(bool f)
    {
        // note the new setting
        enabled = f;
        
        // If disabled, apply blanking immediately.  If enabled, do nothing
        // extra; we'll drop the blanking signal at the end of the next 
        // blanking interval as normal.
        if (!f)
        {
            // disable interrupts, since the blanking interrupt writes gsclk too
            __disable_irq();
        
            // turn off the GS clock and assert BLANK to turn off all outputs
            gsclk.glitchFreeWrite(0);
            wait_us(3);
            blank = 1;

            // done messing with shared data
            __enable_irq();
        }
        
    }
    
    // Start the clock running
    void start()
    {        
        // Set up the first call to the reset function, which asserts BLANK to
        // end the PWM cycle and handles new grayscale data output and latching.
        // The original version of this library used a timer to call reset
        // periodically, but that approach is somewhat problematic because the
        // reset function itself takes a small amount of time to run, so the
        // *actual* cycle is slightly longer than what we get from counting
        // GS clocks.  Running reset on a timer therefore causes the calls to
        // slip out of phase with the actual full cycles, which causes 
        // premature blanking that shows up as visible flicker.  To get the
        // reset cycle to line up more precisely with a full PWM cycle, it
        // works better to set up a new timer at the end of each cycle.  That
        // organically accounts for the time spent in the interrupt handler.
        // This doesn't result in perfectly uniform timing, since interrupt
        // latency varies slightly on each interrupt, but it does guarantee
        // that the blanking will never be premature - all variation will go
        // into the tail end of the cycle after the 4096 GS clocks.  That
        // might cause some brightness variation, but it won't cause flicker,
        // and in practice any brightness variation from this seems to be too 
        // small to be visible.
        armReset();
    }
    
    // stop the timer
    void stop()
    {
        disarmReset();
    }
    
     /*
      *  Set an output.  'idx' is the output index: 0 is OUT0 on the first
      *  chip, 1 is OUT1 on the first chip, 16 is OUT0 on the second chip
      *  in the daisy chain, etc.  'data' is the brightness value for the
      *  output, 0=off, 4095=full brightness.
      */
    void set(int idx, unsigned short data) 
    {
        // validate the index
        if (idx >= 0 && idx < nchips*16)
        {
#if DATA_UPDATE_INSIDE_BLANKING
            // If we send data within the blanking interval, turn off interrupts while 
            // modifying the buffer, since the send happens in the interrupt handler.
            __disable_irq();
#endif

            // Figure the SPI buffer location of the output we're changing.  The SPI
            // buffer has the packed bit format that we send across the wire, with 12 
            // bits per output, arranged from last output to first output (N = number 
            // of outputs = nchips*16):
            //
            //       byte 0  =  high 8 bits of output N-1
            //            1  =  low 4 bits of output N-1 | high 4 bits of output N-2
            //            2  =  low 8 bits of N-2
            //            3  =  high 8 bits of N-3
            //            4  =  low 4 bits of N-3 | high 4 bits of N-4
            //            5  =  low 8 bits of N-4
            //           ...
            //  24*nchips-3  =  high 8 bits of output 1
            //  24*nchips-2  =  low 4 bits of output 1 | high 4 bits of output 0
            //  24*nchips-1  =  low 8 bits of output 0
            //
            // So this update will affect two bytes.  If the output number is even, we're
            // in the high 4 + low 8 pair; if odd, we're in the high 8 + low 4 pair.
            int di = nchips*24 - 3 - (3*(idx/2));
            if (idx & 1)
            {
                // ODD = high 8 | low 4
                spibuf[di]    = uint8_t((data >> 4) & 0xff);
                spibuf[di+1] &= 0x0F;
                spibuf[di+1] |= uint8_t((data << 4) & 0xf0);
            }
            else
            {
                // EVEN = high 4 | low 8
                spibuf[di+1] &= 0xF0;
                spibuf[di+1] |= uint8_t((data >> 8) & 0x0f);
                spibuf[di+2]  = uint8_t(data & 0xff);
            }

#if DATA_UPDATE_INSIDE_BLANKING
            // re-enable interrupts
            __enable_irq();
#endif
        }
    }
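
    // Illustration only (not part of the driver): a minimal standalone sketch
    // of the packing arithmetic used in set() above, assuming the same layout
    // of 12 bits per output, last output first, 24 bytes per chip.  The names
    // packOutput() and unpackOutput() are hypothetical and exist only in this
    // example.  The block is wrapped in "#if 0" so that it never compiles
    // into the firmware.
#if 0
```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Store a 12-bit level for output 'idx' into a packed buffer of
// nchips*24 bytes, using the same index arithmetic as set().
inline void packOutput(std::vector<uint8_t> &buf, int nchips, int idx, uint16_t data)
{
    // first byte of the 3-byte group shared by outputs idx|1 and idx&~1
    int di = nchips*24 - 3 - 3*(idx/2);
    if (idx & 1)
    {
        // ODD = high 8 bits in byte di, low 4 bits in high nibble of di+1
        buf[di]   = uint8_t((data >> 4) & 0xff);
        buf[di+1] = uint8_t((buf[di+1] & 0x0F) | ((data << 4) & 0xF0));
    }
    else
    {
        // EVEN = high 4 bits in low nibble of di+1, low 8 bits in di+2
        buf[di+1] = uint8_t((buf[di+1] & 0xF0) | ((data >> 8) & 0x0F));
        buf[di+2] = uint8_t(data & 0xff);
    }
}

// Read the 12-bit level for output 'idx' back out of the packed buffer.
inline uint16_t unpackOutput(const std::vector<uint8_t> &buf, int nchips, int idx)
{
    int di = nchips*24 - 3 - 3*(idx/2);
    if (idx & 1)
        return uint16_t((buf[di] << 4) | (buf[di+1] >> 4));
    else
        return uint16_t(((buf[di+1] & 0x0F) << 8) | buf[di+2]);
}
```
#endif
    // For example, with a single chip, packing 0xABC into output 0 and 0x123
    // into output 1 fills the last three buffer bytes with 0x12, 0x3A, 0xBC.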
    
    // Send updates if ready.  Our top-level program's main loop calls this on
    // every iteration.  This lets us send grayscale updates to the chips in
    // regular application context (rather than in interrupt context), to keep
    // the time in the ISR as short as possible.  We return immediately if
    // we're not within the update window or we've already sent updates for
    // the current cycle.
    void send()
    {
        // if we're in the transmission window, send the data
        if (cts)
        {
            // Write the data to the SPI port.  Note that we go directly
            // to the hardware registers rather than using the mbed SPI
            // class, because this makes the operation about 50% faster.
            // The mbed class checks for input on every byte in case the
            // SPI connection is bidirectional, but for this application
            // it's strictly one-way, so we can skip checking for input 
            // and just blast bits to the output register as fast as 
            // it'll take them.  Before writing the output register 
            // ("D"), we have to check the status register ("S") and see
            // that the Transmit Empty Flag (SPTEF) is set.  The 
            // procedure is: spin until SPTEF is set in "S", write the 
            // next byte to "D", loop until out of bytes.
            uint8_t *p = spibuf;
            for (int i = spilen ; i > 0 ; --i) {
                while (!(SPI0->S & SPI_S_SPTEF_MASK)) ;
                SPI0->D = *p++;
            }
        
            // we've sent new data, so we need an XLAT signal to latch it
            needXlat = true;
            
            // done - we don't need to send again until the next GS cycle
            cts = false;
        }
    }

private:
    // SPI port.  This is master mode, output only, so we only assign the MOSI 
    // and SCK pins.
    SPI spi;

    // SPI transfer buffer.  This contains the live grayscale data, formatted
    // for direct transmission to the TLC5940 chips via SPI.
    uint8_t *volatile spibuf;
    
    // Length of the SPI buffer in bytes.  The native data format of the chips
    // is 12 bits per output = 1.5 bytes.  There are 16 outputs per chip, which
    // comes to 192 bits == 24 bytes per chip.
    uint16_t spilen;
    
    // Dirty: true means that the non-live buffer has new pending data.  False means
    // that the non-live buffer is empty.
    volatile bool dirty;
    
    // Enabled: this enables or disables all outputs.  When this is false, we assert the
    // BLANK signal continuously.
    bool enabled;
    
    // use a PWM out for the grayscale clock - this provides a stable
    // square wave signal without consuming CPU
    NewPwmOut gsclk;

    // Digital out pins used for the TLC5940
    DigitalOut blank;
    DigitalOut xlat;
    
    // number of daisy-chained TLC5940s we're controlling
    int nchips;

    // Timeout to end each PWM cycle.  This is a one-shot timer that we reset
    // on each cycle.
    Timeout resetTimer;
    
    // Timeout to end the data window for the PWM cycle.
    Timeout windowTimer;
    
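    // "Clear To Send" flag: true means we're within the window of the current
    // grayscale cycle where send() is allowed to transmit new data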
    volatile bool cts;
    
    // Do we need an XLAT signal on the next blanking interval?
    volatile bool needXlat;
        
    // Reset the grayscale cycle and send the next data update
    void reset()
    {
        // start the blanking cycle
        startBlank();
        
        // we're now clear to send the new GS data
        cts = true;
        
#if DATA_UPDATE_INSIDE_BLANKING
        // We're configured to send the new GS data inline during each 
        // blanking cycle.  Send it now.
        send();
#else
        // We're configured to send GS data during the GS cycle.  This means
        // we can defer the GS data transmission to any point within the next
        // GS cycle, which will last about 12ms (assuming a 350kHz GS clock).
        // That's a ton of time given that our GS transmission only takes about
        // 100us.  With such a leisurely time window to work with, we can move
        // the transmission out of the ISR context and into regular application
        // context, which is good because it greatly reduces the time we spend 
        // in this ISR, which is good in turn because more ISR time means more 
        // latency for other interrupts and more chances to miss interrupts
        // entirely.  
        //
        // The mechanism for deferring the transmission to application context
        // is simple.  The main program loop periodically polls the "cts" flag
        // and transmits the data if it finds "cts" set.  To conform to the
        // hardware spec for the TLC5940 chips, the data transmission has to
        // finish before the next blanking interval.  This means our time 
        // window to do the transmission is the 12ms of the grayscale cycle 
        // minus the ~100us to do the transmission.  So basically 12ms.  
        // Timing is never exact on the KL25Z, though, so we should build in
        // a little margin for error.  To be conservative, we'll say that the 
        // update must begin within the first 2/3 of the grayscale cycle time.
        // That's an 8ms window, and leaves a 4ms margin of error.  It's
        // almost inconceivable that any of the timing factors would be 
        // outside of those bounds.
        //
        // To coordinate this 2/3-of-a-cycle window with the main loop, set
        // up a timeout to clear the "cts" flag 2/3 into the cycle time.  If
        // for some reason the main loop doesn't do the transmission before
        // this timer fires, it'll see the "cts" flag turned off and won't
        // attempt the transmission on this round.  (That should essentially
        // never happen, but it wouldn't be a problem even if it happened with
        // some regularity, because we'd just transmit the data on the next
        // cycle.)
        windowTimer.attach_us(this, &TLC5940::closeSendWindow, 
            uint32_t((1.0f/GSCLK_SPEED)*4096.0f*2.0f/3.0f*1000000.0f));
#endif

        // end the blanking interval
        endBlank();

        // re-arm the reset handler for the next blanking interval
        armReset();
    }
    
    // End the data-send window.  This is a timeout routine that fires 2/3 of
    // the way through each grayscale cycle.  The TLC5940 chips allow new data to be
    // sent at any time during the grayscale pulse cycle, but the transmission
    // has to fit into this window.  We do these transmissions from the main loop,
    // so that they happen in application context rather than interrupt context,
    // but this means that we have to synchronize the main loop activity to the
    // grayscale timer cycle.  To make sure the transmission is done before the
    // next grayscale cycle ends, we only allow the transmission to start for
    // the first 2/3 of the cycle.  This gives us plenty of time to send the
    // data and plenty of padding to make sure we don't go too late.  Consider
    // the relative time periods: we run the grayscale clock at 350kHz, and each
    // grayscale cycle has 4096 steps, so each cycle takes 11.7ms.  For the
    // typical Expansion Board setup with 4 TLC5940 chips, we have 768 bits 
    // to send via SPI at 28 MHz, which nominally takes 27us.  The actual
    // measured time to send 768 bits via send() is 72us, so there's CPU overhead 
    // of about 2.6x.  The biggest workable Expansion Board setup would probably 
    // be around 8 TLC chips, so we'd have twice the bits and twice the 
    // transmission time of our 4-chip scenario, so the send time would be
    // about 150us.  2/3 of the grayscale cycle gives us an 8ms window to 
    // perform a 150us operation.  The main loop runs about every 1.5ms, so 
    // we're all but certain to poll CTS more than once during each 8ms window.  
    // Even if we start at the very end of the window, we still have about 3.9ms 
    // to finish a <150us operation, so we're all but certain to finish in time.
    void closeSendWindow() 
    { 
        cts = false; 
    }
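
    // Illustration only (not part of the driver): a small sketch of the
    // timing arithmetic described above.  It assumes the 350kHz grayscale
    // clock and 28MHz SPI clock quoted in the comments; the actual values
    // come from GSCLK_SPEED and SPI_SPEED, which are defined elsewhere in
    // this file.  The function names are hypothetical.  The block is wrapped
    // in "#if 0" so that it never compiles into the firmware.
#if 0
```cpp
#include <cassert>
#include <cmath>

// All results are in microseconds.

// Length of one full grayscale PWM cycle: 4096 GS clock ticks.
inline double gsCycleUs(double gsclkHz)
{
    return 4096.0 / gsclkHz * 1.0e6;
}

// Length of the send window: the first 2/3 of the grayscale cycle.
inline double sendWindowUs(double gsclkHz)
{
    return gsCycleUs(gsclkHz) * 2.0 / 3.0;
}

// Nominal wire time for one full update: 192 bits per chip at the SPI
// clock rate.  This ignores CPU overhead, so measured times run longer.
inline double spiNominalUs(int nchips, double spiHz)
{
    return nchips * 192.0 / spiHz * 1.0e6;
}
```
#endif
    // With those figures, the cycle comes to about 11.7ms, the send window
    // to about 7.8ms, and the nominal 4-chip transfer to about 27us.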
    
    // arm the reset handler - this fires at the end of each GS cycle    
    void armReset()
    {
        resetTimer.attach_us(this, &TLC5940::reset, 
            uint32_t((1.0/GSCLK_SPEED)*4096.0*1000000.0f));
    }
    
    void disarmReset()
    {
        resetTimer.detach();
    }

    void startBlank()
    {
        // turn off the grayscale clock
        gsclk.glitchFreeWrite(0);
        
        // Make sure the gsclk cycle ends, since the TLC5940 data sheet
        // says we can't take BLANK high until GSCLK has been low for 20ns.
        // (We don't have to add any padding for the 20ns, since it'll take
        // at least one CPU cycle of 60ns to return from waitEndCycle().
        // That routine won't return until GSCLK is low, so it will have been
        // low for at least 60ns by the time we get back from this call.)
        gsclk.waitEndCycle();
        
        // assert BLANK to end the grayscale cycle
        blank = 1;
    }
            
    void endBlank()
    {
        // if we've sent new grayscale data since the last blanking
        // interval, latch it by asserting XLAT
        if (needXlat)
        {
            // latch the new data while we're still blanked
            xlat = 1;
            xlat = 0;
            needXlat = false;
        }

        // End the blanking interval and restart the grayscale clock.  Note
        // that we keep the blanking on if the chips are globally disabled.
        if (enabled)
        {
            blank = 0;
            gsclk.write(.5);
        }
    }
};
 
#endif