Mike R / Mbed 2 deprecated Pinscape_Controller_V2

Dependencies:   mbed FastIO FastPWM USBDevice

Fork of Pinscape_Controller by Mike R

Revision:
47:df7a88cd249c
Parent:
45:c42166b2878c
Child:
48:058ace2aed1d
diff -r c42166b2878c -r df7a88cd249c TSL1410R/tsl1410r.h
--- a/TSL1410R/tsl1410r.h	Mon Feb 15 20:30:32 2016 +0000
+++ b/TSL1410R/tsl1410r.h	Thu Feb 18 07:32:20 2016 +0000
@@ -1,90 +1,204 @@
-// DMA VERSION - NOT WORKING
-
-// I'm saving this code for now, since it was somewhat promising but doesn't
-// quite work.  The idea here was to read the ADC via DMA, operating the ADC
-// in continuous mode.  This speeds things up pretty impressively (by about 
-// a factor of 3 vs having the MCU read each result from the ADC sampling 
-// register), but I can't figure out how to get a stable enough signal out of 
-// it.  I think the problem is that the timing isn't precise enough in detecting 
-// when the DMA completes each write.  We have to clock the next pixel onto the 
-// CCD output each time we complete a sample, and we have to do so quickly so 
-// that the next pixel charge is stable at the ADC input pin by the time the 
-// ADC sample interval starts.  I'm seeing a ton of noise, which I think means 
-// that the new pixel isn't ready for the ADC in time. 
-//
-// I've tried a number of approaches, none of which works:
-//
-// - Skip every other sample, so that we can spend one whole sample just 
-// clocking in the next pixel.  We discard the "odds" samples that are taken
-// during pixel changes, and use only the "even" samples where the pixel is
-// stable the entire time.  I'd think the extra sample would give us plenty
-// of time to stabilize the next pixel, but it doesn't seem to work out that
-// way.  I think the problem might be that the latency of the MCU responding
-// to each sample completion is long enough relative to the sampling interval
-// that we can't reliably respond to the ADC done condition fast enough.  I've
-// tried basing the sample completion detection on the DMA byte counter and
-// the ADC interrupt.  The DMA byte counter is updated after the DMA transfer
-// is done, so that's probably just too late in the cycle.  The ADC interrupt
-// should be concurrent with the DMA transfer starting, but in practice it 
-// still doesn't give us good results.
-//
-// - Use DMA, but with the ADC in single-sample mode.  This bypasses the latency
-// problem by ensuring that the ADC doesn't start a new sample until we've
-// definitely finished clocking in the next pixel.  But it defeats the whole
-// purpose by eliminating the speed improvement - the speeds are comparable to
-// doing the transfers via the MCU.  This surprises me because I'd have expected
-// that the DMA would run concurrently with the MCU pixel clocking code, but
-// maybe there's enough bus contention between the MCU and DMA in this case that
-// there's no true overlapping of the operation.  Or maybe the interrupt dispatch
-// adds enough overhead to negate any overlapping.  I haven't actually been able
-// to get good data out of this mode, either, but I gave up early because of the
-// lack of any speed improvement.
-
 /*
- *  TSL1410R interface class.
+ *  TSL1410R/TSL1412R interface class.
+ *
+ *  This provides a high-level interface for the Taos TSL1410R and
+ *  TSL1412R linear CCD array sensors.  (The 1410R and 1412R are
+ *  identical except for the pixel array size, which the caller 
+ *  provides as a parameter when instantiating the class.)
+ *
+ *  For fast reads of the pixel array from the sensor, we use the KL25Z's 
+ *  DMA capability.  The method we use is very specific to the KL25Z 
+ *  hardware and is pretty tricky.  These are attributes I don't normally 
+ *  like, but the speedup is amazing, enough to justify the complex 
+ *  design.  I don't think there's any other way to even get close to 
+ *  this kind of read speed for this sensor with this MCU.  (And I've 
+ *  tried!)  Reading the sensor quickly is more than just academic,
+ *  too: the plunger moves very fast during a release motion, so we 
+ *  have to be able to take correspondingly fast pictures to get 
+ *  clean images without motion blur.  Before the speedup of this DMA
+ *  approach, images captured during release motions had a lot of 
+ *  motion blur, and even aliasing from the sinusoidal motion when
+ *  the plunger bounces back and forth off the springs.  The speedup
+ *  gets us over the threshold where we can capture images with very
+ *  little blur, so we can track the motion much more precisely even
+ *  at release speeds.  This lets us determine the plunger position
+ *  more precisely and more quickly, which improves responsiveness in
+ *  the pinball simulator on the PC.
+ *
+ *  Here's our approach.  
+ * 
+ *  First, we put the analog input port (the ADC == Analog-to-Digital 
+ *  Converter) in "continuous" mode, at the highest clock speed we can 
+ *  program with the available clocks and the fastest read cycle 
+ *  available in the ADC hardware.  (The analog input port is the 
+ *  GPIO pin attached to the sensor's AO == Analog Output pin, where 
+ *  it outputs each pixel's value, one at a time, as an analog voltage 
+ *  level.)  In continuous mode, every time the ADC finishes taking a 
+ *  sample, it stores the result value in its output register and then 
+ *  immediately starts taking a new sample.  This means that no MCU 
+ *  (or even DMA) action is required to start each new sample.  This 
+ *  is where most of the speedup comes from, since it takes significant
+ *  time (multiple microseconds) to move data through the peripheral 
+ *  registers, and it takes more time (also multiple microseconds) for
+ *  the ADC to spin up for each new sample when in single-sample mode.
+ *  We cut out about 7us this way and get the time per sample down to 
+ *  about 2us.  This is close to the documented maximum speed for the
+ *  ADC hardware.
+ *
+ *  Second, we use the DMA controller to read the ADC result register
+ *  and store each sample in a memory array for processing.  The ADC
+ *  hardware is designed to work with the DMA controller by signaling
+ *  the DMA controller when a new sample is ready; this allows DMA to
+ *  move each sample immediately when it's available without any CPU
+ *  involvement.
  *
- *  This provides a high-level interface for the Taos TSL1410R linear CCD array sensor.
+ *  Third - and this is where it really gets tricky - we use two
+ *  additional "linked" DMA channels to generate the clock signal
+ *  to the CCD sensor.  The clock signal is how we tell the CCD when
+ *  to place the next pixel voltage on its AO pin, so the clock has
+ *  to be generated in lock step with the ADC sampling cycle.  The
+ *  ADC timing isn't perfectly uniform or predictable, so we can't 
+ *  just generate the pixel clock with a *real* clock.  We have to
+ *  time the signal exactly with the ADC, which means that we have 
+ *  to generate it from the ADC "sample is ready" signal.  Fortunately,
+ *  there is just such a signal, and in fact we're already using it,
+ *  as described above, to tell the DMA when to move each result from
+ *  the ADC output register to our memory array.  So how do we use this
+ *  to generate the CCD clock?  The answer lies in the DMA controller's
+ *  channel linking feature.  This allows one DMA channel to trigger a
+ *  second DMA channel each time the first channel completes one
+ *  transfer.  And we can use DMA to control our clock GPIO pin by
+ *  using the pin's GPIO IOPORT register as the DMA destination address.
+ *  Specifically, we can take the clock high by writing our pin's bit 
+ *  pattern to the PSOR ("set output") register, and we can take the 
+ *  clock low by writing to the PCOR ("clear output") register.  We 
+ *  use one DMA channel for each of these operations.
+ *
+ *  Putting it all together, the cascade of linked DMA channels
+ *  works like this:
+ *
+ *   - The ADC sample completes, which triggers channel 1, the 
+ *     "Clock Up" channel.  This performs one transfer of the 
+ *     clock GPIO bit to the clock PSOR register, taking the clock
+ *     high, which causes the CCD to move the next pixel onto AO.
+ *
+ *   - After the Clock Up channel does its transfer, it triggers
+ *     its link to channel 2, the ADC transfer channel.  This
+ *     channel moves the ADC output register value to our memory
+ *     array.
+ *
+ *   - After the ADC channel does its transfer, it triggers channel
+ *     3, the "Clock Down" channel.  This performs one transfer of
+ *     the clock GPIO bit to the clock PCOR register, taking the
+ *     clock low.
+ *
+ *  Note that the order of the channels - Clock Up, ADC, Clock Down -
+ *  is important, because it ensures that we don't toggle the clock
+ *  bit too fast.  The CCD has a minimum pulse duration of 50ns for
+ *  the clock signal.  The DMA controller is so fast that we could
+ *  toggle the clock faster than this limit if we did the Up and 
+ *  Down transfers adjacently.  
+ *
+ *  Note also that it's important that Clock Up be the first operation,
+ *  because the ADC is in continuous mode, meaning that it starts
+ *  taking a new sample immediately upon finishing the previous one.
+ *  So when the ADC DMA signal fires, the new sample is just starting.
+ *  We therefore have to get the next pixel onto the sampling pin as
+ *  quickly as possible.  The CCD sensor's "analog output settling
+ *  time" is 120ns - this is the time for a new pixel voltage to 
+ *  stabilize on AO after a clock rising edge.  So assuming that the
+ *  ADC raises the DMA signal immediately, and the DMA controller
+ *  responds within a couple of MCU clock cycles, we should have the
+ *  new pixel voltage stable on the sampling pin by about 200ns after
+ *  the new ADC sample cycle starts.  The sampling cycle with our
+ *  current parameters is about 2us, so the voltage level is stable
+ *  for 90% of the cycle.  
+ *
+ *  Also, it's okay that the ADC sample transfer doesn't happen until 
+ *  after the Clock Up DMA transfer.  The ADC output register holds the 
+ *  last result until the next sample completes, so we have about 2us 
+ *  to grab it.  The first Clock Up DMA transfer only takes a couple 
+ *  of clocks - order of 100ns - so we get to it with time to spare.
+ *
+ *  (Note that it's tempting to try to handle the clock with a single 
+ *  DMA channel, by using the PTOR "toggle output" to do TWO writes:
+ *  one to toggle the clock up and another to toggle it down.  But
+ *  I haven't found a good way to do this.  The problem is that the
+ *  DMA controller can only do one transfer per trigger in the fully
+ *  autonomous mode we're using, and we need to do two writes.  In
+ *  fact, we'd really need to do three or four writes: we'd have to
+ *  throw in one or two no-op writes (of all zeroes) between the two
+ *  toggles, for time padding to ensure that we meet the minimum 50ns
+ *  pulse width for the TSL1410R clock signal.  But it's the same
+ *  issue whether it's two writes or four.  The DMA controller does
+ *  have a "continuous" mode that does an entire transfer on a single
+ *  trigger, but it can't reset itself after such a transfer, so CPU
+ *  intervention would be required on every ADC cycle to set up the
+ *  next clock write.  We could do that with an interrupt, but given
+ *  the 2us cycle time, an interrupt would create a ton of CPU load, 
+ *  and 2us probably isn't even enough time to reliably complete each
+ *  interrupt service call before the next cycle.  Fortunately, at
+ *  the moment we only have one other module in the whole system
+ *  using DMA at all - the TLC5940 PWM controller interface, which
+ *  only needs one channel.  So with the four available channels in
+ *  the hardware, we can afford to use three of them here.)
  */
  
 #include "mbed.h"
 #include "config.h"
 #include "AltAnalogIn.h"
 #include "SimpleDMA.h"
+#include "DMAChannels.h"
  
 #ifndef TSL1410R_H
 #define TSL1410R_H
-#define TSL1410R_DMA
+
 
-// For faster GPIO on the clock pin, we write the IOPORT registers directly.
-// PORT_BASE gives us the memory mapped location of the IOPORT register set
-// for a pin; PINMASK gives us the bit pattern to write to the registers.
-//
-// - To turn a pin ON:  PORT_BASE(pin)->PSOR |= PINMASK(pin)
-// - To turn a pin OFF: PORT_BASE(pin)->PCOR |= PINMASK(pin)
-// - To toggle a pin:   PORT_BASE(pin)->PTOR |= PINMASK(pin)
+// To allow DMA access to the clock pin, we need to point the DMA
+// controller to the IOPORT registers that control the pin.  GPIO_PORT_BASE()
+// gives us the address of the register group for the 32 GPIO pins with
+// the same letter name as our target pin (e.g., PTA0 through PTA31), 
+// and GPIO_PINMASK() gives us the bit pattern to write to those registers
+// to access our single GPIO pin.  Each register group has three special
+// registers that update the pin in particular ways:  PSOR ("set output 
+// register") turns pins on, PCOR ("clear output register") turns pins 
+// off, and PTOR ("toggle output register") toggles pins to the opposite
+// of their current values.  These registers have special semantics:
+// writing a bit as 0 has no effect on the corresponding pin, while
+// writing a bit as 1 performs the register's action on the pin.  This
+// allows a single GPIO pin to be set, cleared, or toggled with a
+// 32-bit write to one of these registers, without affecting any of the
+// other pins addressed by the register.  (It also allows changing any
+// group of pins with a single write, although we don't use that
+// feature here.)
 //
-// When used in a loop where the port address and pin mask are cached in
-// local variables, this runs at the same speed as the FastIO library - about 
-// 78ns per pin write on the KL25Z.  Not surprising since it's doing the same
-// thing, and the compiler should be able to reduce a pin write to a single ARM
-// instruction when the port address and mask are in local register variables.
-// The advantage over the FastIO library is that this approach allows for pins
-// to be assigned dynamically at run-time, which we prefer because it allows for
-// configuration changes to be made on the fly rather than having to recompile
-// the program.
+// - To turn a pin ON:  GPIO_PORT_BASE(pin)->PSOR = GPIO_PINMASK(pin)
+// - To turn a pin OFF: GPIO_PORT_BASE(pin)->PCOR = GPIO_PINMASK(pin)
+// - To toggle a pin:   GPIO_PORT_BASE(pin)->PTOR = GPIO_PINMASK(pin)
+//
 #define GPIO_PORT(pin)        (((unsigned int)(pin)) >> PORT_SHIFT)
-#define GPIO_PORT_BASE(pin)   ((FGPIO_Type *)(FPTA_BASE + GPIO_PORT(pin) * 0x40))
+#define GPIO_PORT_BASE(pin)   ((GPIO_Type *)(PTA_BASE + GPIO_PORT(pin) * 0x40))
 #define GPIO_PINMASK(pin)     gpio_set(pin)
  
 class TSL1410R
 {
 public:
-    TSL1410R(int nPixSensor, PinName siPin, PinName clockPin, PinName ao1Pin, PinName ao2Pin) 
-        : nPixSensor(nPixSensor), si(siPin), clock(clockPin), ao1(ao1Pin), ao2(ao2Pin)
+    TSL1410R(int nPixSensor, PinName siPin, PinName clockPin, PinName ao1Pin, PinName /*ao2Pin*/) 
+        : adc_dma(DMAch_ADC), 
+          clkUp_dma(DMAch_CLKUP), 
+          clkDn_dma(DMAch_CLKDN),
+          si(siPin), 
+          clock(clockPin), 
+          ao1(ao1Pin, true),
+          nPixSensor(nPixSensor)
     {
-        // we're in parallel mode if ao2 is a valid pin
-        parallel = (ao2Pin != NC);
+        // allocate our double pixel buffers
+        pix1 = new uint8_t[nPixSensor*2];
+        pix2 = pix1 + nPixSensor;
         
+        // start on pix1 (startCapture() swaps first, so capture #1 uses pix2)
+        pixDMA = 0;
+
         // remember the clock pin port base and pin mask for fast access
         clockPort = GPIO_PORT_BASE(clockPin);
         clockMask = GPIO_PINMASK(clockPin);
@@ -93,221 +207,176 @@
         clear();
         clear();
         
-        // set up our DMA channel for reading from our analog in pin
-        ao1.initDMA(&adc_dma);
+        // Set up the Clock Up DMA channel.  This channel takes the
+        // clock high by writing the clock bit to the PSOR (set output) 
+        // register for the clock pin.
+        clkUp_dma.source(&clockMask, false, 32);
+        clkUp_dma.destination(&clockPort->PSOR, false, 32);
+
+        // Set up the Clock Down DMA channel.  This channel takes the
+        // clock low by writing the clock bit to the PCOR (clear output)
+        // register for the clock pin.
+        clkDn_dma.source(&clockMask, false, 32);
+        clkDn_dma.destination(&clockPort->PCOR, false, 32);
         
-        // Set up our DMA channel for writing the sensor SCLK - we use the PTOR
-        // (toggle) register to flip the bit on each write.  To pad the timing
-        // to the rate required by the CCD, do a no-op 0 write to PTOR after
-        // each toggle.  This gives us a 16-byte buffer, which we can make
-        // circular in the DMA controller.
-        static const uint32_t clkseq[] = { clockMask, 0, clockMask, 0 };
-        clk_dma.destination(&clockPort->PTOR, false, 32);
-        clk_dma.source(clkseq, true, 32, 16);   // set up our circular source buffer
-        clk_dma.trigger(Trigger_ADC0);          // software trigger
-        clk_dma.setCycleSteal(false);           // do the entire transfer on each trigger
+        // Set up the ADC transfer DMA channel.  This channel transfers
+        // the current analog sampling result from the ADC output register
+        // to our pixel array.
+        ao1.initDMA(&adc_dma);
+
+        // Set up our chain of linked DMA channels:
+        //   ADC sample completion triggers Clock Up
+        //   ...which triggers the ADC transfer
+        //   ...which triggers Clock Down
+        clkUp_dma.trigger(Trigger_ADC0);
+        clkUp_dma.link(adc_dma);
+        adc_dma.link(clkDn_dma, false);
         
-        totalTime = 0.0; nRuns = 0; // $$$
+        // Set the trigger on the downstream links to NONE - these are
+        // triggered by their upstream links, so they don't need separate
+        // peripheral or software triggers.
+        adc_dma.trigger(Trigger_NONE);
+        clkDn_dma.trigger(Trigger_NONE);
+        
+        // Register an interrupt callback so that we're notified when
+        // the last transfer completes.
+        clkDn_dma.attach(this, &TSL1410R::transferDone);
+
+        // clear the timing statistics        
+        totalTime = 0.0; 
+        nRuns = 0;
     }
     
-    float totalTime; int nRuns; // $$$
-
-    // ADC interrupt handler - on each ADC event, 
-    static TSL1410R *instance;
-    static void _aiIRQ() { }
+    // end of transfer notification
+    void transferDone()
+    {
+        // stop the ADC sampler
+        ao1.stop();
+            
+        // clock out one extra pixel to leave AO in the high-Z state
+        clock = 1;
+        clock = 0;
 
-    // Read the pixels.
-    //
-    // 'n' specifies the number of pixels to sample, and is the size of
-    // the output array 'pix'.  This can be less than the full number
-    // of pixels on the physical device; if it is, we'll spread the
-    // sample evenly across the full length of the device by skipping
-    // one or more pixels between each sampled pixel to pad out the
-    // difference between the sample size and the physical CCD size.
-    // For example, if the physical sensor has 1280 pixels, and 'n' is
-    // 640, we'll read every other pixel and skip every other pixel.
-    // If 'n' is 160, we'll read every 8th pixel and skip 7 between
-    // each sample.
-    // 
-    // The reason that we provide this subset mode (where 'n' is less
-    // than the physical pixel count) is that reading a pixel is the most
-    // time-consuming part of the scan.  For each pixel we read, we have
-    // to wait for the pixel's charge to transfer from its internal smapling
-    // capacitor to the CCD's output pin, for that charge to transfer to
-    // the KL25Z input pin, and for the KL25Z ADC to get a stable reading.
-    // This all takes on the order of 20us per pixel.  Skipping a pixel
-    // only requires a clock pulse, which takes about 350ns.  So we can
-    // skip 60 pixels in the time it takes to sample 1 pixel.
-    //
-    // We clock an SI pulse at the beginning of the read.  This starts the
-    // next integration cycle: the pixel array will reset on the SI, and 
-    // the integration starts 18 clocks later.  So by the time this method
-    // returns, the next sample will have been integrating for npix-18 clocks.  
-    // That's usually enough time to allow immediately reading the next
-    // sample.  If more integration time is required, the caller can simply
-    // sleep/spin for the desired additional time, or can do other work that
-    // takes the desired additional time.
-    //
-    // If the caller has other work to tend to that takes longer than the
-    // desired maximum integration time, it can call clear() to clock out
-    // the current pixels and start a fresh integration cycle.
-    void read(register uint16_t *pix, int n)
+        // stop the clock
+        t.stop();
+        
+        // count the statistics
+        totalTime += t.read();
+        nRuns += 1;
+    }
+    
+    // Get the stable pixel array.  This is the image array from the
+    // previous capture.  It remains valid until the next startCapture()
+    // call, at which point this buffer will be reused for the new capture.
+    void getPix(uint8_t * &pix, int &n)
     {
-        Timer t; t.start(); //float tDMA, tPix; // $$$
+        // return the pixel array that ISN'T assigned to the DMA
+        pix = pixDMA ? pix1 : pix2;
+        n = nPixSensor;
+    }
+    
+    // Start an image capture from the sensor.  This waits for any previous 
+    // capture to finish, then starts a new one and returns immediately.  The
+    // new capture proceeds autonomously via the DMA hardware, so the caller
+    // can continue with other processing during the capture.
+    void startCapture()
+    {
+        // wait for the previous transfer to finish
+        while (adc_dma.isBusy())  { }
         
-        // get the clock pin pointers into local variables for fast access
-        register volatile uint32_t *clockPTOR = &clockPort->PTOR;
-        register const uint32_t clockMask = this->clockMask;
+        // swap to the other DMA buffer
+        pixDMA ^= 1;
+        
+        // start timing this transfer
+        t.reset();
+        t.start();
         
+        // set up the active pixel array as the destination buffer for 
+        // the ADC DMA channel
+        adc_dma.destination(pixDMA ? pix2 : pix1, true);
+
+        // start the DMA transfers
+        clkDn_dma.start(nPixSensor*4);
+        adc_dma.start(nPixSensor);
+        clkUp_dma.start(nPixSensor*4);
+            
         // start the next integration cycle by pulsing SI and one clock
         si = 1;
         clock = 1;
         si = 0;
         clock = 0;
         
-        // figure how many pixels to skip on each read
-        int skip = nPixSensor/n - 1;
-        
-///$$$
-static int done=0;
-if (done++ == 0) printf("nPixSensor=%d, n=%d, skip=%d, parallel=%d\r\n", nPixSensor, n, skip, parallel);
-
-        // read all of the pixels
-        int dst;
-        if (parallel)
-        {
-            // Parallel mode - read pixels from each half sensor concurrently.
-            // Divide 'n' (the output pixel count) by 2 to get the loop count,
-            // since we're going to do 2 pixels on each iteration.
-            for (n /= 2, dst = 0 ; dst < n ; ++dst)
-            {
-                // Take the clock high.  The TSL1410R will connect the next
-                // pixel pair's hold capacitors to the A01 and AO2 lines 
-                // (respectively) on the clock rising edge.
-                *clockPTOR = clockMask;
+        // clock in the first pixel
+        clock = 1;
+        clock = 0;
 
-                // Start the ADC sampler for AO1.  The TSL1410R sample 
-                // stabilization time per the data sheet is 120ns.  This is
-                // fast enough that we don't need an explicit delay, since
-                // the instructions to execute this call will take longer
-                // than that.
-                ao1.start();
-                
-                // take the clock low while we're waiting for the reading
-                *clockPTOR = clockMask;
-                
-                // Read the first half-sensor pixel from AO1
-                pix[dst] = ao1.read_u16();
-                
-                // Read the second half-sensor pixel from AO2, and store it
-                // in the destination array at the current index PLUS 'n',
-                // which you will recall contains half the output pixel count.
-                // This second pixel is halfway up the sensor from the first 
-                // pixel, so it goes halfway up the output array from the
-                // current output position.
-                ao2.start();
-                pix[dst + n] = ao2.read_u16();
-                
-                // Clock through the skipped pixels
-                for (int i = skip ; i > 0 ; --i) 
-                {
-                    *clockPTOR = clockMask;
-                    *clockPTOR = clockMask;
-                    *clockPTOR = 0;         // pad the timing with an extra nop write
-                }
-            }
-        }
-        else
-        {
-            // serial mode - read all pixels in a single file
-
-            // clock in the first pixel
-            clock = 1;
-            clock = 0;
-            
-            // start the ADC DMA transfer
-            ao1.startDMA(pix, n, true);
-            
-            // We do 4 clock PTOR writes per clocked pixel (the skipped pixels 
-            // plus the pixel we actually want to sample), at 32 bits (4 bytes) 
-            // each, giving 16 bytes per pixel for the overall write.
-            int clk_dma_len = (skip+1)*16;
-            clk_dma.start(clk_dma_len);
-            
-            // start the first sample
-            ao1.start();
-            
-            // read all pixels
-            for (dst = n*2 ; dst > 0 ; dst -= 2)
-            {
-                // wait for the current ADC sample to finish
-                while (adc_dma.remaining() >= dst) { }
-                
-                // start the next analog read while we're finishing the DMA transfers
-                ao1.start();
-                
-                // re-arm the clock DMA
-                //clk_dma.restart(clk_dma_len);
-            }
-            
-            // wait for the DMA transfer to finish
-            while (adc_dma.isBusy()) { }
-            
-            // apply the 12-bit to 16-bit rescaling to all values
-            for (int i = 0 ; i < n ; ++i)
-                pix[i] <<= 4;
-        }
-        
-//$$$
-if (done==1) printf(". done: dst=%d\r\n", dst);
-        
-        // clock out one extra pixel to leave A1 in the high-Z state
-        *clockPTOR = clockMask;
-        *clockPTOR = clockMask;
-        
-        if (n >= 64) { totalTime += t.read(); nRuns += 1; } // $$$
+        // Start the ADC sampler.  The ADC will read samples continuously
+        // until we tell it to stop.  Each sample completion will trigger 
+        // our linked DMA channel, which will store the next sample in our
+        // pixel array and pulse the CCD serial data clock to load the next
+        // pixel onto the analog sampler pin.  This will all happen without
+        // any CPU involvement, so we can continue with other work.
+        ao1.start();
     }
-
+    
     // Clock through all pixels to clear the array.  Pulses SI at the
     // beginning of the operation, which starts a new integration cycle.
     // The caller can thus immediately call read() to read the pixels 
     // integrated while the clear() was taking place.
     void clear()
     {
-        // get the clock pin pointers into local variables for fast access
-        register FGPIO_Type *clockPort = this->clockPort;
-        register uint32_t clockMask = this->clockMask;
-
         // clock in an SI pulse
         si = 1;
         clockPort->PSOR = clockMask;
         si = 0;
         clockPort->PCOR = clockMask;
         
-        // if in serial mode, clock all pixels across both sensor halves;
-        // in parallel mode, the pixels are clocked together
-        int n = parallel ? nPixSensor/2 : nPixSensor;
-        
         // clock out all pixels
-        for (int i = 0 ; i < n + 1 ; ++i) {
-            clock = 1; // $$$clockPort->PSOR = clockMask;
-            clock = 0; // $$$clockPort->PCOR = clockMask;
+        for (int i = 0 ; i < nPixSensor + 1 ; ++i) 
+        {
+            clock = 1;
+            clock = 0;
         }
     }
+    
+    // get the timing statistics
+    void getTimingStats(float &totalTime, uint32_t &nRuns)
+    {
+        totalTime = this->totalTime;
+        nRuns = this->nRuns;
+    }
 
 private:
-    SimpleDMA adc_dma;        // DMA controller for reading the analog input
-    SimpleDMA clk_dma;        // DMA controller for the sensor SCLK (writes the PTOR register to toggle the clock bit)
-    char *dmabuf;             // buffer for DMA transfers
-    int nPixSensor;           // number of pixels in physical sensor array
+    // DMA controller interfaces
+    SimpleDMA adc_dma;        // DMA channel for reading the analog input
+    SimpleDMA clkUp_dma;      // "Clock Up" channel
+    SimpleDMA clkDn_dma;      // "Clock Down" channel
+
+    // Sensor interface pins
     DigitalOut si;            // GPIO pin for sensor SI (serial data) 
     DigitalOut clock;         // GPIO pin for sensor SCLK (serial clock)
-    FGPIO_Type *clockPort;    // IOPORT base address for clock pin - cached for fast writes
+    GPIO_Type *clockPort;     // IOPORT base address for clock pin - cached for DMA writes
     uint32_t clockMask;       // IOPORT register bit mask for clock pin
-    AltAnalogIn ao1;          // GPIO pin for sensor AO1 (analog output 1) - we read sensor data from this pin
-    AltAnalogIn ao2;          // GPIO pin for sensor AO2 (analog output 2) - 2nd sensor data pin, when in parallel mode
-    bool parallel;            // true -> running in parallel mode (we read AO1 and AO2 separately on each clock)
+    AltAnalogIn ao1;          // GPIO pin for sensor AO (analog output)
+    
+    // number of pixels in the physical sensor array
+    int nPixSensor;
+
+    // pixel buffers - we keep two buffers so that we can transfer the
+    // current sensor data into one buffer via DMA while we concurrently
+    // process the last buffer
+    uint8_t *pix1;            // pixel array 1
+    uint8_t *pix2;            // pixel array 2
+    
+    // DMA target buffer.  This is the buffer for the next DMA transfer.
+    // 0 means pix1, 1 means pix2.  The other buffer contains the stable 
+    // data from the last transfer.
+    uint8_t pixDMA;
+
+    // timing statistics
+    Timer t;                  // timer - started when we start a DMA transfer
+    float totalTime;          // total time consumed by all reads so far
+    uint32_t nRuns;           // number of runs so far
 };
  
 #endif /* TSL1410R_H */
-
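
The linked-channel cascade and the double-buffer swap described in the new header comments can be sketched as a small host-side model. This is only an illustration that runs on a PC, not the KL25Z code: FakeGpioPort, FakeCapture, and all of their members are hypothetical names invented for this sketch, and the per-sample loop stands in for what the DMA hardware does autonomously. It assumes the write-1-to-act semantics of PSOR/PCOR/PTOR and the pixDMA convention (0 means pix1, 1 means pix2) as stated in the comments above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one GPIO port.  Writing a 1 bit to PSOR, PCOR,
// or PTOR sets, clears, or toggles that pin; 0 bits leave pins untouched.
struct FakeGpioPort {
    std::uint32_t pdor = 0;                               // output latch
    void writePSOR(std::uint32_t mask) { pdor |= mask; }  // set pins
    void writePCOR(std::uint32_t mask) { pdor &= ~mask; } // clear pins
    void writePTOR(std::uint32_t mask) { pdor ^= mask; }  // toggle pins
};

// Hypothetical model of one capture engine with two pixel buffers.
struct FakeCapture {
    int n;                          // pixels per capture
    std::uint32_t clockMask;        // bit pattern for the clock pin
    std::vector<std::uint8_t> buf;  // pix1 followed by pix2
    int pixDMA = 0;                 // 0 -> pix1 is the DMA target, 1 -> pix2
    int risingEdges = 0;            // clock pulses seen by the "sensor"
    FakeGpioPort port;

    FakeCapture(int nPix, std::uint32_t mask)
        : n(nPix), clockMask(mask), buf(static_cast<std::size_t>(nPix) * 2) {}

    // Buffer currently assigned to the (simulated) DMA transfer.
    std::uint8_t *dmaBuffer() { return buf.data() + (pixDMA ? n : 0); }

    // Buffer holding the previous, stable capture.
    const std::uint8_t *stableBuffer() const {
        return buf.data() + (pixDMA ? 0 : n);
    }

    // Simulate one capture.  Each loop pass models one ADC completion
    // event followed by the chain Clock Up -> ADC transfer -> Clock Down.
    void capture(const std::vector<std::uint8_t> &samples) {
        pixDMA ^= 1;                      // swap to the other buffer
        std::uint8_t *dst = dmaBuffer();
        for (int i = 0; i < n && i < static_cast<int>(samples.size()); ++i) {
            port.writePSOR(clockMask);    // Clock Up: next pixel onto AO
            ++risingEdges;
            dst[i] = samples[i];          // ADC transfer into pixel array
            port.writePCOR(clockMask);    // Clock Down: finish the pulse
        }
    }
};
```

For example, after two calls to capture() the stable buffer still holds the first capture while the second one sits in the DMA buffer, which mirrors how getPix() returns the array that is not assigned to the DMA.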