Program to record speech audio into RAM and then play it back, moving Billy Bass's mouth in sync with the speech.

Dependencies: mbed

Remember Big Mouth Billy Bass?

I've made a simple demo program for him using the Freescale FRDM-KL25Z board. I've hooked up the digital I/O to his motor driver transistors and pushbutton switch.

This program records 1.8 seconds of speech audio from the ADC input when the pushbutton is pressed, then plays it back while controlling Billy Bass's mouth so that it opens during vowel sounds.
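
A minimal sketch of how a recording could be turned into mouth movements. It assumes the buffer is cut into fixed-length frames, each classified once before playback. `FrameClassifier`, `buildMouthSchedule` and `mouthStateForSample` are hypothetical names, not the project's code. In the real program the per-frame decision would come from something like `AudioAnalyzer::isVowel()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-frame classifier: returns true when the frame
// [samples, samples + n) should be treated as a vowel.
typedef bool (*FrameClassifier)(const int8_t *samples, uint16_t n);

// Split a recorded buffer into fixed-length frames and decide, frame by
// frame, whether the mouth should be open during playback.
// A trailing partial frame is ignored in this sketch.
std::vector<bool> buildMouthSchedule(const int8_t *samples, size_t nsamples,
                                     uint16_t frameLen, FrameClassifier isVowel)
{
    assert(frameLen > 0);
    std::vector<bool> mouthOpen;
    for (size_t start = 0; start + frameLen <= nsamples; start += frameLen) {
        mouthOpen.push_back(isVowel(samples + start, frameLen));
    }
    return mouthOpen;
}

// During playback, sample i belongs to frame i / frameLen; samples past the
// last full frame keep the mouth closed.
bool mouthStateForSample(const std::vector<bool> &schedule, size_t i, uint16_t frameLen)
{
    size_t frame = i / frameLen;
    return frame < schedule.size() ? schedule[frame] : false;
}
```

Precomputing the schedule reduces the playback loop to an index calculation and a lookup per sample, so the mouth motor output never delays the next sample going to the DAC.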

The ADC input is driven from a microphone and preamplifier, AC-coupled through a capacitor into a resistor divider from the +3.3V supply pin, which biases the signal to the middle of the ADC's input range.
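
Because the input sits at mid-range, a silent microphone reads about half of full scale, and the `int8_t` samples the analyzer works on have to be re-centred on zero. A minimal sketch, assuming 16-bit readings such as those returned by mbed's `AnalogIn::read_u16()` and a bias of exactly half the supply; `adcToSample` is a hypothetical name, not the project's code.

```cpp
#include <cassert>
#include <cstdint>

// Convert a 16-bit ADC reading (0..65535) into a signed 8-bit sample.
// Assumes the resistor divider biases the input at mid-scale, so the
// silent level reads about 0x8000 and maps to 0.
int8_t adcToSample(uint16_t raw)
{
    int centred = (raw >> 8) - 128;   // keep the top 8 bits, remove the mid-scale bias
    assert(centred >= -128 && centred <= 127);
    return static_cast<int8_t>(centred);
}
```

Keeping only the top 8 bits halves nothing but resolution: one byte per sample is what lets 1.8 seconds of audio fit in RAM.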

The DAC output is connected to his audio amplifier input (the trace that was connected to pin 10 of the controller IC). I had to add a DC bias to the DAC output to get the single-transistor amplifier biased into proper operation.
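
Playback is the reverse mapping plus that DC offset. A minimal sketch, assuming the DAC is written with 16-bit codes as mbed's `AnalogOut::write_u16()` accepts; `sampleToDac` is a hypothetical name, and the bias value is a placeholder to be tuned on the actual amplifier.

```cpp
#include <cassert>
#include <cstdint>

// Map a signed 8-bit sample to a 16-bit DAC code, adding a DC bias so the
// single-transistor amplifier stays in its operating region.
// Out-of-range results are clipped rather than allowed to wrap around.
uint16_t sampleToDac(int8_t sample, uint16_t bias)
{
    int32_t value = static_cast<int32_t>(bias) + static_cast<int32_t>(sample) * 256;
    if (value < 0) value = 0;
    if (value > 0xFFFF) value = 0xFFFF;
    return static_cast<uint16_t>(value);
}
```

Clipping matters here: if the bias sits far from mid-scale, loud samples would otherwise wrap from one rail to the other and produce a sharp click.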

For more on the method of vowel recognition, please see the paper: http://www.mirlab.org/conference_papers/International_Conference/ICASSP%201999/PDF/AUTHOR/IC991957.PDF

Y. Nishida, Y. Nakadai, Y. Suzuki, T. Sakurai, T. Kurokawa, and H. Sato. 1999. Voice recognition focusing on vowel strings on a fixed-point 20-MIPS DSP board. In Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), Vol. 1. IEEE Computer Society, Washington, DC, USA, 137-140. DOI: 10.1109/ICASSP.1999.758081 http://dx.doi.org/10.1109/ICASSP.1999.758081
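
The AudioAnalyzer class in this diff classifies each block of samples from a zero-crossing count and a power figure, but its analyze() method is not part of the diff. Here is a hypothetical stand-in that fills the same fields. It assumes power is the sum of squared samples, since the real definition isn't shown; `FrameFeatures` and `analyzeFrame` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AudioAnalyzer::analyze().
struct FrameFeatures {
    uint16_t zeroCrossings; // sign changes between consecutive samples
    uint32_t power;         // sum of squared samples (assumed definition)
    int8_t minValue;
    int8_t maxValue;
};

FrameFeatures analyzeFrame(const int8_t *samples, uint16_t nsamples)
{
    assert(nsamples > 0);
    FrameFeatures f = {0, 0, samples[0], samples[0]};
    bool wasNegative = samples[0] < 0;
    for (uint16_t i = 0; i < nsamples; ++i) {
        int32_t s = samples[i];
        f.power += static_cast<uint32_t>(s * s);
        if (samples[i] < f.minValue) f.minValue = samples[i];
        if (samples[i] > f.maxValue) f.maxValue = samples[i];
        bool isNegative = samples[i] < 0;   // zero counts as non-negative
        if (isNegative != wasNegative) ++f.zeroCrossings;
        wasNegative = isNegative;
    }
    return f;
}
```

With `int8_t` samples and at most 65535 of them, the sum of squares stays below 2^30, so a `uint32_t` accumulator cannot overflow.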

Revision: 4:c989412b91ea
Parent: 3:c04d8d0493f4
--- a/AudioAnalyzer.h	Wed May 15 15:32:34 2013 +0000
+++ b/AudioAnalyzer.h	Wed May 15 17:53:33 2013 +0000
@@ -1,6 +1,8 @@
 #ifndef __included_audio_analyzer_h
 #define __included_audio_analyzer_h
 
+#include <math.h>
+
 namespace NK
 {
 
@@ -11,6 +13,8 @@
     uint16_t nsamples;
     uint16_t zeroCrossings;
     uint32_t power;
+    float logPower;
+    float powerRef;
     int8_t minValue;
     int8_t maxValue;
     bool analyzed;
@@ -19,7 +23,7 @@
 
 public:
     AudioAnalyzer(int8_t const *_samples, uint16_t _nsamples)
-        : samples(_samples), nsamples(_nsamples), zeroCrossings(0), power(0), analyzed(false) {
+        : samples(_samples), nsamples(_nsamples), zeroCrossings(0), power(0), logPower(0.0), powerRef(0.0), analyzed(false) {
     }
 
     uint16_t getZeroCrossings() {
@@ -27,16 +31,52 @@
         return zeroCrossings;
     }
 
+    float getZeroCrossingRatioPercent() {
+        return getZeroCrossings() * 100.0 / nsamples;
+    }
+
     uint32_t getPower() {
         if (!analyzed) analyze();
         return power;
     }
 
+    float getLogPower() {
+        if (!analyzed) analyze();
+        logPower = ::log((double)power) - powerRef;
+        return logPower;
+    }
+
     void getMinMaxValues(int8_t *min, int8_t *max) {
         if (!analyzed) analyze();
         *min = minValue;
         *max = maxValue;
     }
+
+    bool isVoiced() {
+        return !(isnan(getLogPower()) || logPower < PowerThreshold);
+    }
+
+    void setPowerRef(float _powerRef) {
+        powerRef = _powerRef;
+    }
+
+    // anything with logPower above PowerThreshold
+    // and below the line
+    // zeroCrossingRatioPercent = VowelSlope * (logPower - VowelXIntercept)
+    bool isVowel() {
+        getLogPower();
+        if (logPower < PowerThreshold)
+            return false;
+        return (getZeroCrossingRatioPercent() < VowelSlope * (logPower - VowelXIntercept));
+    }
+
+    static const float PowerThreshold = -4.0;
+    // anything below the line
+    // zeroCrossingRatioPercent = VowelSlope * (logPower - VowelXIntercept)
+    // and above PowerThreshold
+    // is considered a vowel.
+    static const float VowelSlope = 14.7;
+    static const float VowelXIntercept = -0.7;
 };
 
 } // namespace NK
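
The isVoiced() and isVowel() rules added in this commit can be checked in isolation. The sketch below copies the three constants from the diff into free functions. It uses namespace-scope `constexpr` because standard C++ permits an in-class initialiser on a `static const` member only for integral or enumeration types. `isVoicedFrame` and `isVowelFrame` are illustrative names, not part of the class.

```cpp
#include <cassert>
#include <cmath>

// Constants copied from AudioAnalyzer.h.
constexpr float PowerThreshold = -4.0f;
constexpr float VowelSlope = 14.7f;
constexpr float VowelXIntercept = -0.7f;

// Voiced: log power is a real number at or above PowerThreshold.
// log(0) yields -inf, which fails the threshold comparison.
bool isVoicedFrame(float logPower)
{
    return !(std::isnan(logPower) || logPower < PowerThreshold);
}

// Vowel: voiced, and the zero-crossing ratio (percent of samples) lies
// below the line zcr = VowelSlope * (logPower - VowelXIntercept).
bool isVowelFrame(float logPower, float zeroCrossingRatioPercent)
{
    if (!isVoicedFrame(logPower))
        return false;
    return zeroCrossingRatioPercent < VowelSlope * (logPower - VowelXIntercept);
}
```

As a worked example, at logPower = 0 the boundary sits at 14.7 × 0.7 ≈ 10.3 percent, so a frame crossing zero on 5 percent of its samples counts as a vowel while one at 20 percent does not. For logPower between -4.0 and -0.7 the line is negative and no zero-crossing ratio can fall below it, so in practice isVowel() only fires above the x-intercept of -0.7; PowerThreshold mainly affects isVoiced().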