Important changes to repositories hosted on mbed.com
Mbed hosted mercurial repositories are deprecated and are due to be permanently deleted in July 2026.
To keep a copy of this software download the repository Zip archive or clone locally using Mercurial.
It is also possible to export all your personal repositories from the account settings page.
arm_dct4_f32.c
00001 /* ---------------------------------------------------------------------- 00002 * Project: CMSIS DSP Library 00003 * Title: arm_dct4_f32.c 00004 * Description: Processing function of DCT4 & IDCT4 F32 00005 * 00006 * $Date: 27. January 2017 00007 * $Revision: V.1.5.1 00008 * 00009 * Target Processor: Cortex-M cores 00010 * -------------------------------------------------------------------- */ 00011 /* 00012 * Copyright (C) 2010-2017 ARM Limited or its affiliates. All rights reserved. 00013 * 00014 * SPDX-License-Identifier: Apache-2.0 00015 * 00016 * Licensed under the Apache License, Version 2.0 (the License); you may 00017 * not use this file except in compliance with the License. 00018 * You may obtain a copy of the License at 00019 * 00020 * www.apache.org/licenses/LICENSE-2.0 00021 * 00022 * Unless required by applicable law or agreed to in writing, software 00023 * distributed under the License is distributed on an AS IS BASIS, WITHOUT 00024 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 00025 * See the License for the specific language governing permissions and 00026 * limitations under the License. 00027 */ 00028 00029 #include "arm_math.h" 00030 00031 /** 00032 * @ingroup groupTransforms 00033 */ 00034 00035 /** 00036 * @defgroup DCT4_IDCT4 DCT Type IV Functions 00037 * Representation of signals by minimum number of values is important for storage and transmission. 00038 * The possibility of large discontinuity between the beginning and end of a period of a signal 00039 * in DFT can be avoided by extending the signal so that it is even-symmetric. 00040 * Discrete Cosine Transform (DCT) is constructed such that its energy is heavily concentrated in the lower part of the 00041 * spectrum and is very widely used in signal and image coding applications. 00042 * The family of DCTs (DCT type- 1,2,3,4) is the outcome of different combinations of homogeneous boundary conditions. 00043 * DCT has an excellent energy-packing capability, hence has many applications and in data compression in particular. 00044 * 00045 * DCT is essentially the Discrete Fourier Transform(DFT) of an even-extended real signal. 00046 * Reordering of the input data makes the computation of DCT just a problem of 00047 * computing the DFT of a real signal with a few additional operations. 00048 * This approach provides regular, simple, and very efficient DCT algorithms for practical hardware and software implementations. 00049 * 00050 * DCT type-II can be implemented using Fast fourier transform (FFT) internally, as the transform is applied on real values, Real FFT can be used. 00051 * DCT4 is implemented using DCT2 as their implementations are similar except with some added pre-processing and post-processing. 00052 * DCT2 implementation can be described in the following steps: 00053 * - Re-ordering input 00054 * - Calculating Real FFT 00055 * - Multiplication of weights and Real FFT output and getting real part from the product. 00056 * 00057 * This process is explained by the block diagram below: 00058 * \image html DCT4.gif "Discrete Cosine Transform - type-IV" 00059 * 00060 * \par Algorithm: 00061 * The N-point type-IV DCT is defined as a real, linear transformation by the formula: 00062 * \image html DCT4Equation.gif 00063 * where <code>k = 0,1,2,.....N-1</code> 00064 *\par 00065 * Its inverse is defined as follows: 00066 * \image html IDCT4Equation.gif 00067 * where <code>n = 0,1,2,.....N-1</code> 00068 *\par 00069 * The DCT4 matrices become involutory (i.e. they are self-inverse) by multiplying with an overall scale factor of sqrt(2/N). 00070 * The symmetry of the transform matrix indicates that the fast algorithms for the forward 00071 * and inverse transform computation are identical. 00072 * Note that the implementation of Inverse DCT4 and DCT4 is same, hence same process function can be used for both. 00073 * 00074 * \par Lengths supported by the transform: 00075 * As DCT4 internally uses Real FFT, it supports all the lengths 128, 512, 2048 and 8192. 00076 * The library provides separate functions for Q15, Q31, and floating-point data types. 00077 * \par Instance Structure 00078 * The instances for Real FFT and FFT, cosine values table and twiddle factor table are stored in an instance data structure. 00079 * A separate instance structure must be defined for each transform. 00080 * There are separate instance structure declarations for each of the 3 supported data types. 00081 * 00082 * \par Initialization Functions 00083 * There is also an associated initialization function for each data type. 00084 * The initialization function performs the following operations: 00085 * - Sets the values of the internal structure fields. 00086 * - Initializes Real FFT as its process function is used internally in DCT4, by calling arm_rfft_init_f32(). 00087 * \par 00088 * Use of the initialization function is optional. 00089 * However, if the initialization function is used, then the instance structure cannot be placed into a const data section. 00090 * To place an instance structure into a const data section, the instance structure must be manually initialized. 00091 * Manually initialize the instance structure as follows: 00092 * <pre> 00093 *arm_dct4_instance_f32 S = {N, Nby2, normalize, pTwiddle, pCosFactor, pRfft, pCfft}; 00094 *arm_dct4_instance_q31 S = {N, Nby2, normalize, pTwiddle, pCosFactor, pRfft, pCfft}; 00095 *arm_dct4_instance_q15 S = {N, Nby2, normalize, pTwiddle, pCosFactor, pRfft, pCfft}; 00096 * </pre> 00097 * where \c N is the length of the DCT4; \c Nby2 is half of the length of the DCT4; 00098 * \c normalize is normalizing factor used and is equal to <code>sqrt(2/N)</code>; 00099 * \c pTwiddle points to the twiddle factor table; 00100 * \c pCosFactor points to the cosFactor table; 00101 * \c pRfft points to the real FFT instance; 00102 * \c pCfft points to the complex FFT instance; 00103 * The CFFT and RFFT structures also needs to be initialized, refer to arm_cfft_radix4_f32() 00104 * and arm_rfft_f32() respectively for details regarding static initialization. 00105 * 00106 * \par Fixed-Point Behavior 00107 * Care must be taken when using the fixed-point versions of the DCT4 transform functions. 00108 * In particular, the overflow and saturation behavior of the accumulator used in each function must be considered. 00109 * Refer to the function specific documentation below for usage guidelines. 00110 */ 00111 00112 /** 00113 * @addtogroup DCT4_IDCT4 00114 * @{ 00115 */ 00116 00117 /** 00118 * @brief Processing function for the floating-point DCT4/IDCT4. 00119 * @param[in] *S points to an instance of the floating-point DCT4/IDCT4 structure. 00120 * @param[in] *pState points to state buffer. 00121 * @param[in,out] *pInlineBuffer points to the in-place input and output buffer. 00122 * @return none. 00123 */ 00124 00125 void arm_dct4_f32( 00126 const arm_dct4_instance_f32 * S, 00127 float32_t * pState, 00128 float32_t * pInlineBuffer) 00129 { 00130 uint32_t i; /* Loop counter */ 00131 float32_t *weights = S->pTwiddle; /* Pointer to the Weights table */ 00132 float32_t *cosFact = S->pCosFactor; /* Pointer to the cos factors table */ 00133 float32_t *pS1, *pS2, *pbuff; /* Temporary pointers for input buffer and pState buffer */ 00134 float32_t in; /* Temporary variable */ 00135 00136 00137 /* DCT4 computation involves DCT2 (which is calculated using RFFT) 00138 * along with some pre-processing and post-processing. 00139 * Computational procedure is explained as follows: 00140 * (a) Pre-processing involves multiplying input with cos factor, 00141 * r(n) = 2 * u(n) * cos(pi*(2*n+1)/(4*n)) 00142 * where, 00143 * r(n) -- output of preprocessing 00144 * u(n) -- input to preprocessing(actual Source buffer) 00145 * (b) Calculation of DCT2 using FFT is divided into three steps: 00146 * Step1: Re-ordering of even and odd elements of input. 00147 * Step2: Calculating FFT of the re-ordered input. 00148 * Step3: Taking the real part of the product of FFT output and weights. 00149 * (c) Post-processing - DCT4 can be obtained from DCT2 output using the following equation: 00150 * Y4(k) = Y2(k) - Y4(k-1) and Y4(-1) = Y4(0) 00151 * where, 00152 * Y4 -- DCT4 output, Y2 -- DCT2 output 00153 * (d) Multiplying the output with the normalizing factor sqrt(2/N). 00154 */ 00155 00156 /*-------- Pre-processing ------------*/ 00157 /* Multiplying input with cos factor i.e. r(n) = 2 * x(n) * cos(pi*(2*n+1)/(4*n)) */ 00158 arm_scale_f32(pInlineBuffer, 2.0f, pInlineBuffer, S->N); 00159 arm_mult_f32(pInlineBuffer, cosFact, pInlineBuffer, S->N); 00160 00161 /* ---------------------------------------------------------------- 00162 * Step1: Re-ordering of even and odd elements as, 00163 * pState[i] = pInlineBuffer[2*i] and 00164 * pState[N-i-1] = pInlineBuffer[2*i+1] where i = 0 to N/2 00165 ---------------------------------------------------------------------*/ 00166 00167 /* pS1 initialized to pState */ 00168 pS1 = pState; 00169 00170 /* pS2 initialized to pState+N-1, so that it points to the end of the state buffer */ 00171 pS2 = pState + (S->N - 1U); 00172 00173 /* pbuff initialized to input buffer */ 00174 pbuff = pInlineBuffer; 00175 00176 #if defined (ARM_MATH_DSP) 00177 00178 /* Run the below code for Cortex-M4 and Cortex-M3 */ 00179 00180 /* Initializing the loop counter to N/2 >> 2 for loop unrolling by 4 */ 00181 i = (uint32_t) S->Nby2 >> 2U; 00182 00183 /* First part of the processing with loop unrolling. Compute 4 outputs at a time. 00184 ** a second loop below computes the remaining 1 to 3 samples. */ 00185 do 00186 { 00187 /* Re-ordering of even and odd elements */ 00188 /* pState[i] = pInlineBuffer[2*i] */ 00189 *pS1++ = *pbuff++; 00190 /* pState[N-i-1] = pInlineBuffer[2*i+1] */ 00191 *pS2-- = *pbuff++; 00192 00193 *pS1++ = *pbuff++; 00194 *pS2-- = *pbuff++; 00195 00196 *pS1++ = *pbuff++; 00197 *pS2-- = *pbuff++; 00198 00199 *pS1++ = *pbuff++; 00200 *pS2-- = *pbuff++; 00201 00202 /* Decrement the loop counter */ 00203 i--; 00204 } while (i > 0U); 00205 00206 /* pbuff initialized to input buffer */ 00207 pbuff = pInlineBuffer; 00208 00209 /* pS1 initialized to pState */ 00210 pS1 = pState; 00211 00212 /* Initializing the loop counter to N/4 instead of N for loop unrolling */ 00213 i = (uint32_t) S->N >> 2U; 00214 00215 /* Processing with loop unrolling 4 times as N is always multiple of 4. 00216 * Compute 4 outputs at a time */ 00217 do 00218 { 00219 /* Writing the re-ordered output back to inplace input buffer */ 00220 *pbuff++ = *pS1++; 00221 *pbuff++ = *pS1++; 00222 *pbuff++ = *pS1++; 00223 *pbuff++ = *pS1++; 00224 00225 /* Decrement the loop counter */ 00226 i--; 00227 } while (i > 0U); 00228 00229 00230 /* --------------------------------------------------------- 00231 * Step2: Calculate RFFT for N-point input 00232 * ---------------------------------------------------------- */ 00233 /* pInlineBuffer is real input of length N , pState is the complex output of length 2N */ 00234 arm_rfft_f32(S->pRfft, pInlineBuffer, pState); 00235 00236 /*---------------------------------------------------------------------- 00237 * Step3: Multiply the FFT output with the weights. 00238 *----------------------------------------------------------------------*/ 00239 arm_cmplx_mult_cmplx_f32(pState, weights, pState, S->N); 00240 00241 /* ----------- Post-processing ---------- */ 00242 /* DCT-IV can be obtained from DCT-II by the equation, 00243 * Y4(k) = Y2(k) - Y4(k-1) and Y4(-1) = Y4(0) 00244 * Hence, Y4(0) = Y2(0)/2 */ 00245 /* Getting only real part from the output and Converting to DCT-IV */ 00246 00247 /* Initializing the loop counter to N >> 2 for loop unrolling by 4 */ 00248 i = ((uint32_t) S->N - 1U) >> 2U; 00249 00250 /* pbuff initialized to input buffer. */ 00251 pbuff = pInlineBuffer; 00252 00253 /* pS1 initialized to pState */ 00254 pS1 = pState; 00255 00256 /* Calculating Y4(0) from Y2(0) using Y4(0) = Y2(0)/2 */ 00257 in = *pS1++ * (float32_t) 0.5; 00258 /* input buffer acts as inplace, so output values are stored in the input itself. */ 00259 *pbuff++ = in; 00260 00261 /* pState pointer is incremented twice as the real values are located alternatively in the array */ 00262 pS1++; 00263 00264 /* First part of the processing with loop unrolling. Compute 4 outputs at a time. 00265 ** a second loop below computes the remaining 1 to 3 samples. */ 00266 do 00267 { 00268 /* Calculating Y4(1) to Y4(N-1) from Y2 using equation Y4(k) = Y2(k) - Y4(k-1) */ 00269 /* pState pointer (pS1) is incremented twice as the real values are located alternatively in the array */ 00270 in = *pS1++ - in; 00271 *pbuff++ = in; 00272 /* points to the next real value */ 00273 pS1++; 00274 00275 in = *pS1++ - in; 00276 *pbuff++ = in; 00277 pS1++; 00278 00279 in = *pS1++ - in; 00280 *pbuff++ = in; 00281 pS1++; 00282 00283 in = *pS1++ - in; 00284 *pbuff++ = in; 00285 pS1++; 00286 00287 /* Decrement the loop counter */ 00288 i--; 00289 } while (i > 0U); 00290 00291 /* If the blockSize is not a multiple of 4, compute any remaining output samples here. 00292 ** No loop unrolling is used. */ 00293 i = ((uint32_t) S->N - 1U) % 0x4U; 00294 00295 while (i > 0U) 00296 { 00297 /* Calculating Y4(1) to Y4(N-1) from Y2 using equation Y4(k) = Y2(k) - Y4(k-1) */ 00298 /* pState pointer (pS1) is incremented twice as the real values are located alternatively in the array */ 00299 in = *pS1++ - in; 00300 *pbuff++ = in; 00301 /* points to the next real value */ 00302 pS1++; 00303 00304 /* Decrement the loop counter */ 00305 i--; 00306 } 00307 00308 00309 /*------------ Normalizing the output by multiplying with the normalizing factor ----------*/ 00310 00311 /* Initializing the loop counter to N/4 instead of N for loop unrolling */ 00312 i = (uint32_t) S->N >> 2U; 00313 00314 /* pbuff initialized to the pInlineBuffer(now contains the output values) */ 00315 pbuff = pInlineBuffer; 00316 00317 /* Processing with loop unrolling 4 times as N is always multiple of 4. Compute 4 outputs at a time */ 00318 do 00319 { 00320 /* Multiplying pInlineBuffer with the normalizing factor sqrt(2/N) */ 00321 in = *pbuff; 00322 *pbuff++ = in * S->normalize; 00323 00324 in = *pbuff; 00325 *pbuff++ = in * S->normalize; 00326 00327 in = *pbuff; 00328 *pbuff++ = in * S->normalize; 00329 00330 in = *pbuff; 00331 *pbuff++ = in * S->normalize; 00332 00333 /* Decrement the loop counter */ 00334 i--; 00335 } while (i > 0U); 00336 00337 00338 #else 00339 00340 /* Run the below code for Cortex-M0 */ 00341 00342 /* Initializing the loop counter to N/2 */ 00343 i = (uint32_t) S->Nby2; 00344 00345 do 00346 { 00347 /* Re-ordering of even and odd elements */ 00348 /* pState[i] = pInlineBuffer[2*i] */ 00349 *pS1++ = *pbuff++; 00350 /* pState[N-i-1] = pInlineBuffer[2*i+1] */ 00351 *pS2-- = *pbuff++; 00352 00353 /* Decrement the loop counter */ 00354 i--; 00355 } while (i > 0U); 00356 00357 /* pbuff initialized to input buffer */ 00358 pbuff = pInlineBuffer; 00359 00360 /* pS1 initialized to pState */ 00361 pS1 = pState; 00362 00363 /* Initializing the loop counter */ 00364 i = (uint32_t) S->N; 00365 00366 do 00367 { 00368 /* Writing the re-ordered output back to inplace input buffer */ 00369 *pbuff++ = *pS1++; 00370 00371 /* Decrement the loop counter */ 00372 i--; 00373 } while (i > 0U); 00374 00375 00376 /* --------------------------------------------------------- 00377 * Step2: Calculate RFFT for N-point input 00378 * ---------------------------------------------------------- */ 00379 /* pInlineBuffer is real input of length N , pState is the complex output of length 2N */ 00380 arm_rfft_f32(S->pRfft, pInlineBuffer, pState); 00381 00382 /*---------------------------------------------------------------------- 00383 * Step3: Multiply the FFT output with the weights. 00384 *----------------------------------------------------------------------*/ 00385 arm_cmplx_mult_cmplx_f32(pState, weights, pState, S->N); 00386 00387 /* ----------- Post-processing ---------- */ 00388 /* DCT-IV can be obtained from DCT-II by the equation, 00389 * Y4(k) = Y2(k) - Y4(k-1) and Y4(-1) = Y4(0) 00390 * Hence, Y4(0) = Y2(0)/2 */ 00391 /* Getting only real part from the output and Converting to DCT-IV */ 00392 00393 /* pbuff initialized to input buffer. */ 00394 pbuff = pInlineBuffer; 00395 00396 /* pS1 initialized to pState */ 00397 pS1 = pState; 00398 00399 /* Calculating Y4(0) from Y2(0) using Y4(0) = Y2(0)/2 */ 00400 in = *pS1++ * (float32_t) 0.5; 00401 /* input buffer acts as inplace, so output values are stored in the input itself. */ 00402 *pbuff++ = in; 00403 00404 /* pState pointer is incremented twice as the real values are located alternatively in the array */ 00405 pS1++; 00406 00407 /* Initializing the loop counter */ 00408 i = ((uint32_t) S->N - 1U); 00409 00410 do 00411 { 00412 /* Calculating Y4(1) to Y4(N-1) from Y2 using equation Y4(k) = Y2(k) - Y4(k-1) */ 00413 /* pState pointer (pS1) is incremented twice as the real values are located alternatively in the array */ 00414 in = *pS1++ - in; 00415 *pbuff++ = in; 00416 /* points to the next real value */ 00417 pS1++; 00418 00419 00420 /* Decrement the loop counter */ 00421 i--; 00422 } while (i > 0U); 00423 00424 00425 /*------------ Normalizing the output by multiplying with the normalizing factor ----------*/ 00426 00427 /* Initializing the loop counter */ 00428 i = (uint32_t) S->N; 00429 00430 /* pbuff initialized to the pInlineBuffer(now contains the output values) */ 00431 pbuff = pInlineBuffer; 00432 00433 do 00434 { 00435 /* Multiplying pInlineBuffer with the normalizing factor sqrt(2/N) */ 00436 in = *pbuff; 00437 *pbuff++ = in * S->normalize; 00438 00439 /* Decrement the loop counter */ 00440 i--; 00441 } while (i > 0U); 00442 00443 #endif /* #if defined (ARM_MATH_DSP) */ 00444 00445 } 00446 00447 /** 00448 * @} end of DCT4_IDCT4 group 00449 */ 00450
Generated on Tue Jul 12 2022 16:46:23 by
 1.7.2
 1.7.2