CMSIS DSP library

Dependents:   KL25Z_FFT_Demo Hat_Board_v5_1 KL25Z_FFT_Demo_tony KL25Z_FFT_Demo_tony ... more

Fork of mbed-dsp by mbed official

Committer:
emilmont
Date:
Wed Nov 28 12:30:09 2012 +0000
Revision:
1:fdd22bb7aa52
Child:
2:da51fb522205
DSP library code

Who changed what in which revision?

UserRevisionLine numberNew contents of line
emilmont 1:fdd22bb7aa52 1 /* ----------------------------------------------------------------------
emilmont 1:fdd22bb7aa52 2 * Copyright (C) 2010 ARM Limited. All rights reserved.
emilmont 1:fdd22bb7aa52 3 *
emilmont 1:fdd22bb7aa52 4 * $Date: 15. February 2012
emilmont 1:fdd22bb7aa52 5 * $Revision: V1.1.0
emilmont 1:fdd22bb7aa52 6 *
emilmont 1:fdd22bb7aa52 7 * Project: CMSIS DSP Library
emilmont 1:fdd22bb7aa52 8 * Title: arm_dct4_f32.c
emilmont 1:fdd22bb7aa52 9 *
emilmont 1:fdd22bb7aa52 10 * Description: Processing function of DCT4 & IDCT4 F32.
emilmont 1:fdd22bb7aa52 11 *
emilmont 1:fdd22bb7aa52 12 * Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
emilmont 1:fdd22bb7aa52 13 *
emilmont 1:fdd22bb7aa52 14 * Version 1.1.0 2012/02/15
emilmont 1:fdd22bb7aa52 15 * Updated with more optimizations, bug fixes and minor API changes.
emilmont 1:fdd22bb7aa52 16 *
emilmont 1:fdd22bb7aa52 17 * Version 1.0.10 2011/7/15
emilmont 1:fdd22bb7aa52 18 * Big Endian support added and Merged M0 and M3/M4 Source code.
emilmont 1:fdd22bb7aa52 19 *
emilmont 1:fdd22bb7aa52 20 * Version 1.0.3 2010/11/29
emilmont 1:fdd22bb7aa52 21 * Re-organized the CMSIS folders and updated documentation.
emilmont 1:fdd22bb7aa52 22 *
emilmont 1:fdd22bb7aa52 23 * Version 1.0.2 2010/11/11
emilmont 1:fdd22bb7aa52 24 * Documentation updated.
emilmont 1:fdd22bb7aa52 25 *
emilmont 1:fdd22bb7aa52 26 * Version 1.0.1 2010/10/05
emilmont 1:fdd22bb7aa52 27 * Production release and review comments incorporated.
emilmont 1:fdd22bb7aa52 28 *
emilmont 1:fdd22bb7aa52 29 * Version 1.0.0 2010/09/20
emilmont 1:fdd22bb7aa52 30 * Production release and review comments incorporated.
emilmont 1:fdd22bb7aa52 31 * -------------------------------------------------------------------- */
emilmont 1:fdd22bb7aa52 32
emilmont 1:fdd22bb7aa52 33 #include "arm_math.h"
emilmont 1:fdd22bb7aa52 34
emilmont 1:fdd22bb7aa52 35 /**
emilmont 1:fdd22bb7aa52 36 * @ingroup groupTransforms
emilmont 1:fdd22bb7aa52 37 */
emilmont 1:fdd22bb7aa52 38
emilmont 1:fdd22bb7aa52 39 /**
emilmont 1:fdd22bb7aa52 40 * @defgroup DCT4_IDCT4 DCT Type IV Functions
emilmont 1:fdd22bb7aa52 41 * Representation of signals by minimum number of values is important for storage and transmission.
emilmont 1:fdd22bb7aa52 42 * The possibility of large discontinuity between the beginning and end of a period of a signal
emilmont 1:fdd22bb7aa52 43 * in DFT can be avoided by extending the signal so that it is even-symmetric.
emilmont 1:fdd22bb7aa52 44 * Discrete Cosine Transform (DCT) is constructed such that its energy is heavily concentrated in the lower part of the
emilmont 1:fdd22bb7aa52 45 * spectrum and is very widely used in signal and image coding applications.
emilmont 1:fdd22bb7aa52 46 * The family of DCTs (DCT type- 1,2,3,4) is the outcome of different combinations of homogeneous boundary conditions.
emilmont 1:fdd22bb7aa52 47 * DCT has an excellent energy-packing capability, hence has many applications and in data compression in particular.
emilmont 1:fdd22bb7aa52 48 *
emilmont 1:fdd22bb7aa52 49 * DCT is essentially the Discrete Fourier Transform(DFT) of an even-extended real signal.
emilmont 1:fdd22bb7aa52 50 * Reordering of the input data makes the computation of DCT just a problem of
emilmont 1:fdd22bb7aa52 51 * computing the DFT of a real signal with a few additional operations.
emilmont 1:fdd22bb7aa52 52 * This approach provides regular, simple, and very efficient DCT algorithms for practical hardware and software implementations.
emilmont 1:fdd22bb7aa52 53 *
emilmont 1:fdd22bb7aa52 54 * DCT type-II can be implemented using Fast fourier transform (FFT) internally, as the transform is applied on real values, Real FFT can be used.
emilmont 1:fdd22bb7aa52 55 * DCT4 is implemented using DCT2 as their implementations are similar except with some added pre-processing and post-processing.
emilmont 1:fdd22bb7aa52 56 * DCT2 implementation can be described in the following steps:
emilmont 1:fdd22bb7aa52 57 * - Re-ordering input
emilmont 1:fdd22bb7aa52 58 * - Calculating Real FFT
emilmont 1:fdd22bb7aa52 59 * - Multiplication of weights and Real FFT output and getting real part from the product.
emilmont 1:fdd22bb7aa52 60 *
emilmont 1:fdd22bb7aa52 61 * This process is explained by the block diagram below:
emilmont 1:fdd22bb7aa52 62 * \image html DCT4.gif "Discrete Cosine Transform - type-IV"
emilmont 1:fdd22bb7aa52 63 *
emilmont 1:fdd22bb7aa52 64 * \par Algorithm:
emilmont 1:fdd22bb7aa52 65 * The N-point type-IV DCT is defined as a real, linear transformation by the formula:
emilmont 1:fdd22bb7aa52 66 * \image html DCT4Equation.gif
emilmont 1:fdd22bb7aa52 67 * where <code>k = 0,1,2,.....N-1</code>
emilmont 1:fdd22bb7aa52 68 *\par
emilmont 1:fdd22bb7aa52 69 * Its inverse is defined as follows:
emilmont 1:fdd22bb7aa52 70 * \image html IDCT4Equation.gif
emilmont 1:fdd22bb7aa52 71 * where <code>n = 0,1,2,.....N-1</code>
emilmont 1:fdd22bb7aa52 72 *\par
emilmont 1:fdd22bb7aa52 73 * The DCT4 matrices become involutory (i.e. they are self-inverse) by multiplying with an overall scale factor of sqrt(2/N).
emilmont 1:fdd22bb7aa52 74 * The symmetry of the transform matrix indicates that the fast algorithms for the forward
emilmont 1:fdd22bb7aa52 75 * and inverse transform computation are identical.
emilmont 1:fdd22bb7aa52 76 * Note that the implementation of Inverse DCT4 and DCT4 is same, hence same process function can be used for both.
emilmont 1:fdd22bb7aa52 77 *
emilmont 1:fdd22bb7aa52 78 * \par Lengths supported by the transform:
emilmont 1:fdd22bb7aa52 79 * As DCT4 internally uses Real FFT, it supports all the lengths supported by arm_rfft_f32().
emilmont 1:fdd22bb7aa52 80 * The library provides separate functions for Q15, Q31, and floating-point data types.
emilmont 1:fdd22bb7aa52 81 * \par Instance Structure
emilmont 1:fdd22bb7aa52 82 * The instances for Real FFT and FFT, cosine values table and twiddle factor table are stored in an instance data structure.
emilmont 1:fdd22bb7aa52 83 * A separate instance structure must be defined for each transform.
emilmont 1:fdd22bb7aa52 84 * There are separate instance structure declarations for each of the 3 supported data types.
emilmont 1:fdd22bb7aa52 85 *
emilmont 1:fdd22bb7aa52 86 * \par Initialization Functions
emilmont 1:fdd22bb7aa52 87 * There is also an associated initialization function for each data type.
emilmont 1:fdd22bb7aa52 88 * The initialization function performs the following operations:
emilmont 1:fdd22bb7aa52 89 * - Sets the values of the internal structure fields.
emilmont 1:fdd22bb7aa52 90 * - Initializes Real FFT as its process function is used internally in DCT4, by calling arm_rfft_init_f32().
emilmont 1:fdd22bb7aa52 91 * \par
emilmont 1:fdd22bb7aa52 92 * Use of the initialization function is optional.
emilmont 1:fdd22bb7aa52 93 * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.
emilmont 1:fdd22bb7aa52 94 * To place an instance structure into a const data section, the instance structure must be manually initialized.
emilmont 1:fdd22bb7aa52 95 * Manually initialize the instance structure as follows:
emilmont 1:fdd22bb7aa52 96 * <pre>
emilmont 1:fdd22bb7aa52 97 *arm_dct4_instance_f32 S = {N, Nby2, normalize, pTwiddle, pCosFactor, pRfft, pCfft};
emilmont 1:fdd22bb7aa52 98 *arm_dct4_instance_q31 S = {N, Nby2, normalize, pTwiddle, pCosFactor, pRfft, pCfft};
emilmont 1:fdd22bb7aa52 99 *arm_dct4_instance_q15 S = {N, Nby2, normalize, pTwiddle, pCosFactor, pRfft, pCfft};
emilmont 1:fdd22bb7aa52 100 * </pre>
emilmont 1:fdd22bb7aa52 101 * where \c N is the length of the DCT4; \c Nby2 is half of the length of the DCT4;
emilmont 1:fdd22bb7aa52 102 * \c normalize is normalizing factor used and is equal to <code>sqrt(2/N)</code>;
emilmont 1:fdd22bb7aa52 103 * \c pTwiddle points to the twiddle factor table;
emilmont 1:fdd22bb7aa52 104 * \c pCosFactor points to the cosFactor table;
emilmont 1:fdd22bb7aa52 105 * \c pRfft points to the real FFT instance;
emilmont 1:fdd22bb7aa52 106 * \c pCfft points to the complex FFT instance;
emilmont 1:fdd22bb7aa52 107 * The CFFT and RFFT structures also needs to be initialized, refer to arm_cfft_radix4_f32()
emilmont 1:fdd22bb7aa52 108 * and arm_rfft_f32() respectively for details regarding static initialization.
emilmont 1:fdd22bb7aa52 109 *
emilmont 1:fdd22bb7aa52 110 * \par Fixed-Point Behavior
emilmont 1:fdd22bb7aa52 111 * Care must be taken when using the fixed-point versions of the DCT4 transform functions.
emilmont 1:fdd22bb7aa52 112 * In particular, the overflow and saturation behavior of the accumulator used in each function must be considered.
emilmont 1:fdd22bb7aa52 113 * Refer to the function specific documentation below for usage guidelines.
emilmont 1:fdd22bb7aa52 114 */
emilmont 1:fdd22bb7aa52 115
emilmont 1:fdd22bb7aa52 116 /**
emilmont 1:fdd22bb7aa52 117 * @addtogroup DCT4_IDCT4
emilmont 1:fdd22bb7aa52 118 * @{
emilmont 1:fdd22bb7aa52 119 */
emilmont 1:fdd22bb7aa52 120
emilmont 1:fdd22bb7aa52 121 /**
emilmont 1:fdd22bb7aa52 122 * @brief Processing function for the floating-point DCT4/IDCT4.
emilmont 1:fdd22bb7aa52 123 * @param[in] *S points to an instance of the floating-point DCT4/IDCT4 structure.
emilmont 1:fdd22bb7aa52 124 * @param[in] *pState points to state buffer.
emilmont 1:fdd22bb7aa52 125 * @param[in,out] *pInlineBuffer points to the in-place input and output buffer.
emilmont 1:fdd22bb7aa52 126 * @return none.
emilmont 1:fdd22bb7aa52 127 */
emilmont 1:fdd22bb7aa52 128
emilmont 1:fdd22bb7aa52 129 void arm_dct4_f32(
emilmont 1:fdd22bb7aa52 130 const arm_dct4_instance_f32 * S,
emilmont 1:fdd22bb7aa52 131 float32_t * pState,
emilmont 1:fdd22bb7aa52 132 float32_t * pInlineBuffer)
emilmont 1:fdd22bb7aa52 133 {
emilmont 1:fdd22bb7aa52 134 uint32_t i; /* Loop counter */
emilmont 1:fdd22bb7aa52 135 float32_t *weights = S->pTwiddle; /* Pointer to the Weights table */
emilmont 1:fdd22bb7aa52 136 float32_t *cosFact = S->pCosFactor; /* Pointer to the cos factors table */
emilmont 1:fdd22bb7aa52 137 float32_t *pS1, *pS2, *pbuff; /* Temporary pointers for input buffer and pState buffer */
emilmont 1:fdd22bb7aa52 138 float32_t in; /* Temporary variable */
emilmont 1:fdd22bb7aa52 139
emilmont 1:fdd22bb7aa52 140
emilmont 1:fdd22bb7aa52 141 /* DCT4 computation involves DCT2 (which is calculated using RFFT)
emilmont 1:fdd22bb7aa52 142 * along with some pre-processing and post-processing.
emilmont 1:fdd22bb7aa52 143 * Computational procedure is explained as follows:
emilmont 1:fdd22bb7aa52 144 * (a) Pre-processing involves multiplying input with cos factor,
emilmont 1:fdd22bb7aa52 145 * r(n) = 2 * u(n) * cos(pi*(2*n+1)/(4*n))
emilmont 1:fdd22bb7aa52 146 * where,
emilmont 1:fdd22bb7aa52 147 * r(n) -- output of preprocessing
emilmont 1:fdd22bb7aa52 148 * u(n) -- input to preprocessing(actual Source buffer)
emilmont 1:fdd22bb7aa52 149 * (b) Calculation of DCT2 using FFT is divided into three steps:
emilmont 1:fdd22bb7aa52 150 * Step1: Re-ordering of even and odd elements of input.
emilmont 1:fdd22bb7aa52 151 * Step2: Calculating FFT of the re-ordered input.
emilmont 1:fdd22bb7aa52 152 * Step3: Taking the real part of the product of FFT output and weights.
emilmont 1:fdd22bb7aa52 153 * (c) Post-processing - DCT4 can be obtained from DCT2 output using the following equation:
emilmont 1:fdd22bb7aa52 154 * Y4(k) = Y2(k) - Y4(k-1) and Y4(-1) = Y4(0)
emilmont 1:fdd22bb7aa52 155 * where,
emilmont 1:fdd22bb7aa52 156 * Y4 -- DCT4 output, Y2 -- DCT2 output
emilmont 1:fdd22bb7aa52 157 * (d) Multiplying the output with the normalizing factor sqrt(2/N).
emilmont 1:fdd22bb7aa52 158 */
emilmont 1:fdd22bb7aa52 159
emilmont 1:fdd22bb7aa52 160 /*-------- Pre-processing ------------*/
emilmont 1:fdd22bb7aa52 161 /* Multiplying input with cos factor i.e. r(n) = 2 * x(n) * cos(pi*(2*n+1)/(4*n)) */
emilmont 1:fdd22bb7aa52 162 arm_scale_f32(pInlineBuffer, 2.0f, pInlineBuffer, S->N);
emilmont 1:fdd22bb7aa52 163 arm_mult_f32(pInlineBuffer, cosFact, pInlineBuffer, S->N);
emilmont 1:fdd22bb7aa52 164
emilmont 1:fdd22bb7aa52 165 /* ----------------------------------------------------------------
emilmont 1:fdd22bb7aa52 166 * Step1: Re-ordering of even and odd elements as,
emilmont 1:fdd22bb7aa52 167 * pState[i] = pInlineBuffer[2*i] and
emilmont 1:fdd22bb7aa52 168 * pState[N-i-1] = pInlineBuffer[2*i+1] where i = 0 to N/2
emilmont 1:fdd22bb7aa52 169 ---------------------------------------------------------------------*/
emilmont 1:fdd22bb7aa52 170
emilmont 1:fdd22bb7aa52 171 /* pS1 initialized to pState */
emilmont 1:fdd22bb7aa52 172 pS1 = pState;
emilmont 1:fdd22bb7aa52 173
emilmont 1:fdd22bb7aa52 174 /* pS2 initialized to pState+N-1, so that it points to the end of the state buffer */
emilmont 1:fdd22bb7aa52 175 pS2 = pState + (S->N - 1u);
emilmont 1:fdd22bb7aa52 176
emilmont 1:fdd22bb7aa52 177 /* pbuff initialized to input buffer */
emilmont 1:fdd22bb7aa52 178 pbuff = pInlineBuffer;
emilmont 1:fdd22bb7aa52 179
emilmont 1:fdd22bb7aa52 180 #ifndef ARM_MATH_CM0
emilmont 1:fdd22bb7aa52 181
emilmont 1:fdd22bb7aa52 182 /* Run the below code for Cortex-M4 and Cortex-M3 */
emilmont 1:fdd22bb7aa52 183
emilmont 1:fdd22bb7aa52 184 /* Initializing the loop counter to N/2 >> 2 for loop unrolling by 4 */
emilmont 1:fdd22bb7aa52 185 i = (uint32_t) S->Nby2 >> 2u;
emilmont 1:fdd22bb7aa52 186
emilmont 1:fdd22bb7aa52 187 /* First part of the processing with loop unrolling. Compute 4 outputs at a time.
emilmont 1:fdd22bb7aa52 188 ** a second loop below computes the remaining 1 to 3 samples. */
emilmont 1:fdd22bb7aa52 189 do
emilmont 1:fdd22bb7aa52 190 {
emilmont 1:fdd22bb7aa52 191 /* Re-ordering of even and odd elements */
emilmont 1:fdd22bb7aa52 192 /* pState[i] = pInlineBuffer[2*i] */
emilmont 1:fdd22bb7aa52 193 *pS1++ = *pbuff++;
emilmont 1:fdd22bb7aa52 194 /* pState[N-i-1] = pInlineBuffer[2*i+1] */
emilmont 1:fdd22bb7aa52 195 *pS2-- = *pbuff++;
emilmont 1:fdd22bb7aa52 196
emilmont 1:fdd22bb7aa52 197 *pS1++ = *pbuff++;
emilmont 1:fdd22bb7aa52 198 *pS2-- = *pbuff++;
emilmont 1:fdd22bb7aa52 199
emilmont 1:fdd22bb7aa52 200 *pS1++ = *pbuff++;
emilmont 1:fdd22bb7aa52 201 *pS2-- = *pbuff++;
emilmont 1:fdd22bb7aa52 202
emilmont 1:fdd22bb7aa52 203 *pS1++ = *pbuff++;
emilmont 1:fdd22bb7aa52 204 *pS2-- = *pbuff++;
emilmont 1:fdd22bb7aa52 205
emilmont 1:fdd22bb7aa52 206 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 207 i--;
emilmont 1:fdd22bb7aa52 208 } while(i > 0u);
emilmont 1:fdd22bb7aa52 209
emilmont 1:fdd22bb7aa52 210 /* pbuff initialized to input buffer */
emilmont 1:fdd22bb7aa52 211 pbuff = pInlineBuffer;
emilmont 1:fdd22bb7aa52 212
emilmont 1:fdd22bb7aa52 213 /* pS1 initialized to pState */
emilmont 1:fdd22bb7aa52 214 pS1 = pState;
emilmont 1:fdd22bb7aa52 215
emilmont 1:fdd22bb7aa52 216 /* Initializing the loop counter to N/4 instead of N for loop unrolling */
emilmont 1:fdd22bb7aa52 217 i = (uint32_t) S->N >> 2u;
emilmont 1:fdd22bb7aa52 218
emilmont 1:fdd22bb7aa52 219 /* Processing with loop unrolling 4 times as N is always multiple of 4.
emilmont 1:fdd22bb7aa52 220 * Compute 4 outputs at a time */
emilmont 1:fdd22bb7aa52 221 do
emilmont 1:fdd22bb7aa52 222 {
emilmont 1:fdd22bb7aa52 223 /* Writing the re-ordered output back to inplace input buffer */
emilmont 1:fdd22bb7aa52 224 *pbuff++ = *pS1++;
emilmont 1:fdd22bb7aa52 225 *pbuff++ = *pS1++;
emilmont 1:fdd22bb7aa52 226 *pbuff++ = *pS1++;
emilmont 1:fdd22bb7aa52 227 *pbuff++ = *pS1++;
emilmont 1:fdd22bb7aa52 228
emilmont 1:fdd22bb7aa52 229 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 230 i--;
emilmont 1:fdd22bb7aa52 231 } while(i > 0u);
emilmont 1:fdd22bb7aa52 232
emilmont 1:fdd22bb7aa52 233
emilmont 1:fdd22bb7aa52 234 /* ---------------------------------------------------------
emilmont 1:fdd22bb7aa52 235 * Step2: Calculate RFFT for N-point input
emilmont 1:fdd22bb7aa52 236 * ---------------------------------------------------------- */
emilmont 1:fdd22bb7aa52 237 /* pInlineBuffer is real input of length N , pState is the complex output of length 2N */
emilmont 1:fdd22bb7aa52 238 arm_rfft_f32(S->pRfft, pInlineBuffer, pState);
emilmont 1:fdd22bb7aa52 239
emilmont 1:fdd22bb7aa52 240 /*----------------------------------------------------------------------
emilmont 1:fdd22bb7aa52 241 * Step3: Multiply the FFT output with the weights.
emilmont 1:fdd22bb7aa52 242 *----------------------------------------------------------------------*/
emilmont 1:fdd22bb7aa52 243 arm_cmplx_mult_cmplx_f32(pState, weights, pState, S->N);
emilmont 1:fdd22bb7aa52 244
emilmont 1:fdd22bb7aa52 245 /* ----------- Post-processing ---------- */
emilmont 1:fdd22bb7aa52 246 /* DCT-IV can be obtained from DCT-II by the equation,
emilmont 1:fdd22bb7aa52 247 * Y4(k) = Y2(k) - Y4(k-1) and Y4(-1) = Y4(0)
emilmont 1:fdd22bb7aa52 248 * Hence, Y4(0) = Y2(0)/2 */
emilmont 1:fdd22bb7aa52 249 /* Getting only real part from the output and Converting to DCT-IV */
emilmont 1:fdd22bb7aa52 250
emilmont 1:fdd22bb7aa52 251 /* Initializing the loop counter to N >> 2 for loop unrolling by 4 */
emilmont 1:fdd22bb7aa52 252 i = ((uint32_t) S->N - 1u) >> 2u;
emilmont 1:fdd22bb7aa52 253
emilmont 1:fdd22bb7aa52 254 /* pbuff initialized to input buffer. */
emilmont 1:fdd22bb7aa52 255 pbuff = pInlineBuffer;
emilmont 1:fdd22bb7aa52 256
emilmont 1:fdd22bb7aa52 257 /* pS1 initialized to pState */
emilmont 1:fdd22bb7aa52 258 pS1 = pState;
emilmont 1:fdd22bb7aa52 259
emilmont 1:fdd22bb7aa52 260 /* Calculating Y4(0) from Y2(0) using Y4(0) = Y2(0)/2 */
emilmont 1:fdd22bb7aa52 261 in = *pS1++ * (float32_t) 0.5;
emilmont 1:fdd22bb7aa52 262 /* input buffer acts as inplace, so output values are stored in the input itself. */
emilmont 1:fdd22bb7aa52 263 *pbuff++ = in;
emilmont 1:fdd22bb7aa52 264
emilmont 1:fdd22bb7aa52 265 /* pState pointer is incremented twice as the real values are located alternatively in the array */
emilmont 1:fdd22bb7aa52 266 pS1++;
emilmont 1:fdd22bb7aa52 267
emilmont 1:fdd22bb7aa52 268 /* First part of the processing with loop unrolling. Compute 4 outputs at a time.
emilmont 1:fdd22bb7aa52 269 ** a second loop below computes the remaining 1 to 3 samples. */
emilmont 1:fdd22bb7aa52 270 do
emilmont 1:fdd22bb7aa52 271 {
emilmont 1:fdd22bb7aa52 272 /* Calculating Y4(1) to Y4(N-1) from Y2 using equation Y4(k) = Y2(k) - Y4(k-1) */
emilmont 1:fdd22bb7aa52 273 /* pState pointer (pS1) is incremented twice as the real values are located alternatively in the array */
emilmont 1:fdd22bb7aa52 274 in = *pS1++ - in;
emilmont 1:fdd22bb7aa52 275 *pbuff++ = in;
emilmont 1:fdd22bb7aa52 276 /* points to the next real value */
emilmont 1:fdd22bb7aa52 277 pS1++;
emilmont 1:fdd22bb7aa52 278
emilmont 1:fdd22bb7aa52 279 in = *pS1++ - in;
emilmont 1:fdd22bb7aa52 280 *pbuff++ = in;
emilmont 1:fdd22bb7aa52 281 pS1++;
emilmont 1:fdd22bb7aa52 282
emilmont 1:fdd22bb7aa52 283 in = *pS1++ - in;
emilmont 1:fdd22bb7aa52 284 *pbuff++ = in;
emilmont 1:fdd22bb7aa52 285 pS1++;
emilmont 1:fdd22bb7aa52 286
emilmont 1:fdd22bb7aa52 287 in = *pS1++ - in;
emilmont 1:fdd22bb7aa52 288 *pbuff++ = in;
emilmont 1:fdd22bb7aa52 289 pS1++;
emilmont 1:fdd22bb7aa52 290
emilmont 1:fdd22bb7aa52 291 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 292 i--;
emilmont 1:fdd22bb7aa52 293 } while(i > 0u);
emilmont 1:fdd22bb7aa52 294
emilmont 1:fdd22bb7aa52 295 /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
emilmont 1:fdd22bb7aa52 296 ** No loop unrolling is used. */
emilmont 1:fdd22bb7aa52 297 i = ((uint32_t) S->N - 1u) % 0x4u;
emilmont 1:fdd22bb7aa52 298
emilmont 1:fdd22bb7aa52 299 while(i > 0u)
emilmont 1:fdd22bb7aa52 300 {
emilmont 1:fdd22bb7aa52 301 /* Calculating Y4(1) to Y4(N-1) from Y2 using equation Y4(k) = Y2(k) - Y4(k-1) */
emilmont 1:fdd22bb7aa52 302 /* pState pointer (pS1) is incremented twice as the real values are located alternatively in the array */
emilmont 1:fdd22bb7aa52 303 in = *pS1++ - in;
emilmont 1:fdd22bb7aa52 304 *pbuff++ = in;
emilmont 1:fdd22bb7aa52 305 /* points to the next real value */
emilmont 1:fdd22bb7aa52 306 pS1++;
emilmont 1:fdd22bb7aa52 307
emilmont 1:fdd22bb7aa52 308 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 309 i--;
emilmont 1:fdd22bb7aa52 310 }
emilmont 1:fdd22bb7aa52 311
emilmont 1:fdd22bb7aa52 312
emilmont 1:fdd22bb7aa52 313 /*------------ Normalizing the output by multiplying with the normalizing factor ----------*/
emilmont 1:fdd22bb7aa52 314
emilmont 1:fdd22bb7aa52 315 /* Initializing the loop counter to N/4 instead of N for loop unrolling */
emilmont 1:fdd22bb7aa52 316 i = (uint32_t) S->N >> 2u;
emilmont 1:fdd22bb7aa52 317
emilmont 1:fdd22bb7aa52 318 /* pbuff initialized to the pInlineBuffer(now contains the output values) */
emilmont 1:fdd22bb7aa52 319 pbuff = pInlineBuffer;
emilmont 1:fdd22bb7aa52 320
emilmont 1:fdd22bb7aa52 321 /* Processing with loop unrolling 4 times as N is always multiple of 4. Compute 4 outputs at a time */
emilmont 1:fdd22bb7aa52 322 do
emilmont 1:fdd22bb7aa52 323 {
emilmont 1:fdd22bb7aa52 324 /* Multiplying pInlineBuffer with the normalizing factor sqrt(2/N) */
emilmont 1:fdd22bb7aa52 325 in = *pbuff;
emilmont 1:fdd22bb7aa52 326 *pbuff++ = in * S->normalize;
emilmont 1:fdd22bb7aa52 327
emilmont 1:fdd22bb7aa52 328 in = *pbuff;
emilmont 1:fdd22bb7aa52 329 *pbuff++ = in * S->normalize;
emilmont 1:fdd22bb7aa52 330
emilmont 1:fdd22bb7aa52 331 in = *pbuff;
emilmont 1:fdd22bb7aa52 332 *pbuff++ = in * S->normalize;
emilmont 1:fdd22bb7aa52 333
emilmont 1:fdd22bb7aa52 334 in = *pbuff;
emilmont 1:fdd22bb7aa52 335 *pbuff++ = in * S->normalize;
emilmont 1:fdd22bb7aa52 336
emilmont 1:fdd22bb7aa52 337 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 338 i--;
emilmont 1:fdd22bb7aa52 339 } while(i > 0u);
emilmont 1:fdd22bb7aa52 340
emilmont 1:fdd22bb7aa52 341
emilmont 1:fdd22bb7aa52 342 #else
emilmont 1:fdd22bb7aa52 343
emilmont 1:fdd22bb7aa52 344 /* Run the below code for Cortex-M0 */
emilmont 1:fdd22bb7aa52 345
emilmont 1:fdd22bb7aa52 346 /* Initializing the loop counter to N/2 */
emilmont 1:fdd22bb7aa52 347 i = (uint32_t) S->Nby2;
emilmont 1:fdd22bb7aa52 348
emilmont 1:fdd22bb7aa52 349 do
emilmont 1:fdd22bb7aa52 350 {
emilmont 1:fdd22bb7aa52 351 /* Re-ordering of even and odd elements */
emilmont 1:fdd22bb7aa52 352 /* pState[i] = pInlineBuffer[2*i] */
emilmont 1:fdd22bb7aa52 353 *pS1++ = *pbuff++;
emilmont 1:fdd22bb7aa52 354 /* pState[N-i-1] = pInlineBuffer[2*i+1] */
emilmont 1:fdd22bb7aa52 355 *pS2-- = *pbuff++;
emilmont 1:fdd22bb7aa52 356
emilmont 1:fdd22bb7aa52 357 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 358 i--;
emilmont 1:fdd22bb7aa52 359 } while(i > 0u);
emilmont 1:fdd22bb7aa52 360
emilmont 1:fdd22bb7aa52 361 /* pbuff initialized to input buffer */
emilmont 1:fdd22bb7aa52 362 pbuff = pInlineBuffer;
emilmont 1:fdd22bb7aa52 363
emilmont 1:fdd22bb7aa52 364 /* pS1 initialized to pState */
emilmont 1:fdd22bb7aa52 365 pS1 = pState;
emilmont 1:fdd22bb7aa52 366
emilmont 1:fdd22bb7aa52 367 /* Initializing the loop counter */
emilmont 1:fdd22bb7aa52 368 i = (uint32_t) S->N;
emilmont 1:fdd22bb7aa52 369
emilmont 1:fdd22bb7aa52 370 do
emilmont 1:fdd22bb7aa52 371 {
emilmont 1:fdd22bb7aa52 372 /* Writing the re-ordered output back to inplace input buffer */
emilmont 1:fdd22bb7aa52 373 *pbuff++ = *pS1++;
emilmont 1:fdd22bb7aa52 374
emilmont 1:fdd22bb7aa52 375 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 376 i--;
emilmont 1:fdd22bb7aa52 377 } while(i > 0u);
emilmont 1:fdd22bb7aa52 378
emilmont 1:fdd22bb7aa52 379
emilmont 1:fdd22bb7aa52 380 /* ---------------------------------------------------------
emilmont 1:fdd22bb7aa52 381 * Step2: Calculate RFFT for N-point input
emilmont 1:fdd22bb7aa52 382 * ---------------------------------------------------------- */
emilmont 1:fdd22bb7aa52 383 /* pInlineBuffer is real input of length N , pState is the complex output of length 2N */
emilmont 1:fdd22bb7aa52 384 arm_rfft_f32(S->pRfft, pInlineBuffer, pState);
emilmont 1:fdd22bb7aa52 385
emilmont 1:fdd22bb7aa52 386 /*----------------------------------------------------------------------
emilmont 1:fdd22bb7aa52 387 * Step3: Multiply the FFT output with the weights.
emilmont 1:fdd22bb7aa52 388 *----------------------------------------------------------------------*/
emilmont 1:fdd22bb7aa52 389 arm_cmplx_mult_cmplx_f32(pState, weights, pState, S->N);
emilmont 1:fdd22bb7aa52 390
emilmont 1:fdd22bb7aa52 391 /* ----------- Post-processing ---------- */
emilmont 1:fdd22bb7aa52 392 /* DCT-IV can be obtained from DCT-II by the equation,
emilmont 1:fdd22bb7aa52 393 * Y4(k) = Y2(k) - Y4(k-1) and Y4(-1) = Y4(0)
emilmont 1:fdd22bb7aa52 394 * Hence, Y4(0) = Y2(0)/2 */
emilmont 1:fdd22bb7aa52 395 /* Getting only real part from the output and Converting to DCT-IV */
emilmont 1:fdd22bb7aa52 396
emilmont 1:fdd22bb7aa52 397 /* pbuff initialized to input buffer. */
emilmont 1:fdd22bb7aa52 398 pbuff = pInlineBuffer;
emilmont 1:fdd22bb7aa52 399
emilmont 1:fdd22bb7aa52 400 /* pS1 initialized to pState */
emilmont 1:fdd22bb7aa52 401 pS1 = pState;
emilmont 1:fdd22bb7aa52 402
emilmont 1:fdd22bb7aa52 403 /* Calculating Y4(0) from Y2(0) using Y4(0) = Y2(0)/2 */
emilmont 1:fdd22bb7aa52 404 in = *pS1++ * (float32_t) 0.5;
emilmont 1:fdd22bb7aa52 405 /* input buffer acts as inplace, so output values are stored in the input itself. */
emilmont 1:fdd22bb7aa52 406 *pbuff++ = in;
emilmont 1:fdd22bb7aa52 407
emilmont 1:fdd22bb7aa52 408 /* pState pointer is incremented twice as the real values are located alternatively in the array */
emilmont 1:fdd22bb7aa52 409 pS1++;
emilmont 1:fdd22bb7aa52 410
emilmont 1:fdd22bb7aa52 411 /* Initializing the loop counter */
emilmont 1:fdd22bb7aa52 412 i = ((uint32_t) S->N - 1u);
emilmont 1:fdd22bb7aa52 413
emilmont 1:fdd22bb7aa52 414 do
emilmont 1:fdd22bb7aa52 415 {
emilmont 1:fdd22bb7aa52 416 /* Calculating Y4(1) to Y4(N-1) from Y2 using equation Y4(k) = Y2(k) - Y4(k-1) */
emilmont 1:fdd22bb7aa52 417 /* pState pointer (pS1) is incremented twice as the real values are located alternatively in the array */
emilmont 1:fdd22bb7aa52 418 in = *pS1++ - in;
emilmont 1:fdd22bb7aa52 419 *pbuff++ = in;
emilmont 1:fdd22bb7aa52 420 /* points to the next real value */
emilmont 1:fdd22bb7aa52 421 pS1++;
emilmont 1:fdd22bb7aa52 422
emilmont 1:fdd22bb7aa52 423
emilmont 1:fdd22bb7aa52 424 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 425 i--;
emilmont 1:fdd22bb7aa52 426 } while(i > 0u);
emilmont 1:fdd22bb7aa52 427
emilmont 1:fdd22bb7aa52 428
emilmont 1:fdd22bb7aa52 429 /*------------ Normalizing the output by multiplying with the normalizing factor ----------*/
emilmont 1:fdd22bb7aa52 430
emilmont 1:fdd22bb7aa52 431 /* Initializing the loop counter */
emilmont 1:fdd22bb7aa52 432 i = (uint32_t) S->N;
emilmont 1:fdd22bb7aa52 433
emilmont 1:fdd22bb7aa52 434 /* pbuff initialized to the pInlineBuffer(now contains the output values) */
emilmont 1:fdd22bb7aa52 435 pbuff = pInlineBuffer;
emilmont 1:fdd22bb7aa52 436
emilmont 1:fdd22bb7aa52 437 do
emilmont 1:fdd22bb7aa52 438 {
emilmont 1:fdd22bb7aa52 439 /* Multiplying pInlineBuffer with the normalizing factor sqrt(2/N) */
emilmont 1:fdd22bb7aa52 440 in = *pbuff;
emilmont 1:fdd22bb7aa52 441 *pbuff++ = in * S->normalize;
emilmont 1:fdd22bb7aa52 442
emilmont 1:fdd22bb7aa52 443 /* Decrement the loop counter */
emilmont 1:fdd22bb7aa52 444 i--;
emilmont 1:fdd22bb7aa52 445 } while(i > 0u);
emilmont 1:fdd22bb7aa52 446
emilmont 1:fdd22bb7aa52 447 #endif /* #ifndef ARM_MATH_CM0 */
emilmont 1:fdd22bb7aa52 448
emilmont 1:fdd22bb7aa52 449 }
emilmont 1:fdd22bb7aa52 450
emilmont 1:fdd22bb7aa52 451 /**
emilmont 1:fdd22bb7aa52 452 * @} end of DCT4_IDCT4 group
emilmont 1:fdd22bb7aa52 453 */