Fork of mbed-dsp. CMSIS-DSP library of supporting NEON
Dependents: mbed-os-example-cmsis_dsp_neon
Fork of mbed-dsp by
Information
Japanese version is available in lower part of this page.
このページの後半に日本語版が用意されています.
CMSIS-DSP of supporting NEON
What is this ?
A library for CMSIS-DSP of supporting NEON.
We supported the NEON to CMSIS-DSP Ver1.4.3(CMSIS V4.1) that ARM supplied, has achieved the processing speed improvement.
If you use the mbed-dsp library, you can use to replace this library.
CMSIS-DSP of supporting NEON is provied as a library.
Library Creation environment
CMSIS-DSP library of supporting NEON was created by the following environment.
- Compiler
ARMCC Version 5.03 - Compile option switch[C Compiler]
-DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -O3 -Otime --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp --vectorize --asm
- Compile option switch[Assembler]
--cpreproc --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp
Effects of NEON support
In the data which passes to each function, large size will be expected more effective than small size.
Also if the data is a multiple of 16, effect will be expected in every function in the CMSIS-DSP.
NEON対応CMSIS-DSP
概要
NEON対応したCMSIS-DSPのライブラリです。
ARM社提供のCMSIS-DSP Ver1.4.3(CMSIS V4.1)をターゲットにNEON対応を行ない、処理速度向上を実現しております。
mbed-dspライブラリを使用している場合は、本ライブラリに置き換えて使用することができます。
NEON対応したCMSIS-DSPはライブラリで提供します。
ライブラリ作成環境
NEON対応CMSIS-DSPライブラリは、以下の環境で作成しています。
- コンパイラ
ARMCC Version 5.03 - コンパイルオプションスイッチ[C Compiler]
-DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -O3 -Otime --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp --vectorize --asm
- コンパイルオプションスイッチ[Assembler]
--cpreproc --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp
NEON対応による効果について
CMSIS-DSP内の各関数へ渡すデータは、小さいサイズよりも大きいサイズの方が効果が見込めます。
また、16の倍数のデータであれば、CMSIS-DSP内のどの関数でも効果が見込めます。
Diff: cmsis_dsp/MatrixFunctions/arm_mat_inverse_f32.c
- Revision:
- 1:fdd22bb7aa52
- Child:
- 2:da51fb522205
diff -r 83d0537c7d84 -r fdd22bb7aa52 cmsis_dsp/MatrixFunctions/arm_mat_inverse_f32.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/cmsis_dsp/MatrixFunctions/arm_mat_inverse_f32.c Wed Nov 28 12:30:09 2012 +0000 @@ -0,0 +1,668 @@ +/* ---------------------------------------------------------------------- +* Copyright (C) 2010 ARM Limited. All rights reserved. +* +* $Date: 15. February 2012 +* $Revision: V1.1.0 +* +* Project: CMSIS DSP Library +* Title: arm_mat_inverse_f32.c +* +* Description: Floating-point matrix inverse. +* +* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0 +* +* Version 1.1.0 2012/02/15 +* Updated with more optimizations, bug fixes and minor API changes. +* +* Version 1.0.10 2011/7/15 +* Big Endian support added and Merged M0 and M3/M4 Source code. +* +* Version 1.0.3 2010/11/29 +* Re-organized the CMSIS folders and updated documentation. +* +* Version 1.0.2 2010/11/11 +* Documentation updated. +* +* Version 1.0.1 2010/10/05 +* Production release and review comments incorporated. +* +* Version 1.0.0 2010/09/20 +* Production release and review comments incorporated. +* -------------------------------------------------------------------- */ + +#include "arm_math.h" + +/** + * @ingroup groupMatrix + */ + +/** + * @defgroup MatrixInv Matrix Inverse + * + * Computes the inverse of a matrix. + * + * The inverse is defined only if the input matrix is square and non-singular (the determinant + * is non-zero). The function checks that the input and output matrices are square and of the + * same size. + * + * Matrix inversion is numerically sensitive and the CMSIS DSP library only supports matrix + * inversion of floating-point matrices. + * + * \par Algorithm + * The Gauss-Jordan method is used to find the inverse. + * The algorithm performs a sequence of elementary row-operations till it + * reduces the input matrix to an identity matrix. Applying the same sequence + * of elementary row-operations to an identity matrix yields the inverse matrix. + * If the input matrix is singular, then the algorithm terminates and returns error status + * <code>ARM_MATH_SINGULAR</code>. + * \image html MatrixInverse.gif "Matrix Inverse of a 3 x 3 matrix using Gauss-Jordan Method" + */ + +/** + * @addtogroup MatrixInv + * @{ + */ + +/** + * @brief Floating-point matrix inverse. + * @param[in] *pSrc points to input matrix structure + * @param[out] *pDst points to output matrix structure + * @return The function returns + * <code>ARM_MATH_SIZE_MISMATCH</code> if the input matrix is not square or if the size + * of the output matrix does not match the size of the input matrix. + * If the input matrix is found to be singular (non-invertible), then the function returns + * <code>ARM_MATH_SINGULAR</code>. Otherwise, the function returns <code>ARM_MATH_SUCCESS</code>. + */ + +arm_status arm_mat_inverse_f32( + const arm_matrix_instance_f32 * pSrc, + arm_matrix_instance_f32 * pDst) +{ + float32_t *pIn = pSrc->pData; /* input data matrix pointer */ + float32_t *pOut = pDst->pData; /* output data matrix pointer */ + float32_t *pInT1, *pInT2; /* Temporary input data matrix pointer */ + float32_t *pInT3, *pInT4; /* Temporary output data matrix pointer */ + float32_t *pPivotRowIn, *pPRT_in, *pPivotRowDst, *pPRT_pDst; /* Temporary input and output data matrix pointer */ + uint32_t numRows = pSrc->numRows; /* Number of rows in the matrix */ + uint32_t numCols = pSrc->numCols; /* Number of Cols in the matrix */ + +#ifndef ARM_MATH_CM0 + + /* Run the below code for Cortex-M4 and Cortex-M3 */ + + float32_t Xchg, in = 0.0f, in1; /* Temporary input values */ + uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l; /* loop counters */ + arm_status status; /* status of matrix inverse */ + +#ifdef ARM_MATH_MATRIX_CHECK + + + /* Check for matrix mismatch condition */ + if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols) + || (pSrc->numRows != pDst->numRows)) + { + /* Set status as ARM_MATH_SIZE_MISMATCH */ + status = ARM_MATH_SIZE_MISMATCH; + } + else +#endif /* #ifdef ARM_MATH_MATRIX_CHECK */ + + { + + /*-------------------------------------------------------------------------------------------------------------- + * Matrix Inverse can be solved using elementary row operations. + * + * Gauss-Jordan Method: + * + * 1. First combine the identity matrix and the input matrix separated by a bar to form an + * augmented matrix as follows: + * _ _ _ _ + * | a11 a12 | 1 0 | | X11 X12 | + * | | | = | | + * |_ a21 a22 | 0 1 _| |_ X21 X21 _| + * + * 2. In our implementation, pDst Matrix is used as identity matrix. + * + * 3. Begin with the first row. Let i = 1. + * + * 4. Check to see if the pivot for row i is zero. + * The pivot is the element of the main diagonal that is on the current row. + * For instance, if working with row i, then the pivot element is aii. + * If the pivot is zero, exchange that row with a row below it that does not + * contain a zero in column i. If this is not possible, then an inverse + * to that matrix does not exist. + * + * 5. Divide every element of row i by the pivot. + * + * 6. For every row below and row i, replace that row with the sum of that row and + * a multiple of row i so that each new element in column i below row i is zero. + * + * 7. Move to the next row and column and repeat steps 2 through 5 until you have zeros + * for every element below and above the main diagonal. + * + * 8. Now an identical matrix is formed to the left of the bar(input matrix, pSrc). + * Therefore, the matrix to the right of the bar is our solution(pDst matrix, pDst). + *----------------------------------------------------------------------------------------------------------------*/ + + /* Working pointer for destination matrix */ + pInT2 = pOut; + + /* Loop over the number of rows */ + rowCnt = numRows; + + /* Making the destination matrix as identity matrix */ + while(rowCnt > 0u) + { + /* Writing all zeroes in lower triangle of the destination matrix */ + j = numRows - rowCnt; + while(j > 0u) + { + *pInT2++ = 0.0f; + j--; + } + + /* Writing all ones in the diagonal of the destination matrix */ + *pInT2++ = 1.0f; + + /* Writing all zeroes in upper triangle of the destination matrix */ + j = rowCnt - 1u; + while(j > 0u) + { + *pInT2++ = 0.0f; + j--; + } + + /* Decrement the loop counter */ + rowCnt--; + } + + /* Loop over the number of columns of the input matrix. + All the elements in each column are processed by the row operations */ + loopCnt = numCols; + + /* Index modifier to navigate through the columns */ + l = 0u; + + while(loopCnt > 0u) + { + /* Check if the pivot element is zero.. + * If it is zero then interchange the row with non zero row below. + * If there is no non zero element to replace in the rows below, + * then the matrix is Singular. */ + + /* Working pointer for the input matrix that points + * to the pivot element of the particular row */ + pInT1 = pIn + (l * numCols); + + /* Working pointer for the destination matrix that points + * to the pivot element of the particular row */ + pInT3 = pOut + (l * numCols); + + /* Temporary variable to hold the pivot value */ + in = *pInT1; + + /* Destination pointer modifier */ + k = 1u; + + /* Check if the pivot element is zero */ + if(*pInT1 == 0.0f) + { + /* Loop over the number rows present below */ + i = numRows - (l + 1u); + + while(i > 0u) + { + /* Update the input and destination pointers */ + pInT2 = pInT1 + (numCols * l); + pInT4 = pInT3 + (numCols * k); + + /* Check if there is a non zero pivot element to + * replace in the rows below */ + if(*pInT2 != 0.0f) + { + /* Loop over number of columns + * to the right of the pilot element */ + j = numCols - l; + + while(j > 0u) + { + /* Exchange the row elements of the input matrix */ + Xchg = *pInT2; + *pInT2++ = *pInT1; + *pInT1++ = Xchg; + + /* Decrement the loop counter */ + j--; + } + + /* Loop over number of columns of the destination matrix */ + j = numCols; + + while(j > 0u) + { + /* Exchange the row elements of the destination matrix */ + Xchg = *pInT4; + *pInT4++ = *pInT3; + *pInT3++ = Xchg; + + /* Decrement the loop counter */ + j--; + } + + /* Flag to indicate whether exchange is done or not */ + flag = 1u; + + /* Break after exchange is done */ + break; + } + + /* Update the destination pointer modifier */ + k++; + + /* Decrement the loop counter */ + i--; + } + } + + /* Update the status if the matrix is singular */ + if((flag != 1u) && (in == 0.0f)) + { + status = ARM_MATH_SINGULAR; + + break; + } + + /* Points to the pivot row of input and destination matrices */ + pPivotRowIn = pIn + (l * numCols); + pPivotRowDst = pOut + (l * numCols); + + /* Temporary pointers to the pivot row pointers */ + pInT1 = pPivotRowIn; + pInT2 = pPivotRowDst; + + /* Pivot element of the row */ + in = *(pIn + (l * numCols)); + + /* Loop over number of columns + * to the right of the pilot element */ + j = (numCols - l); + + while(j > 0u) + { + /* Divide each element of the row of the input matrix + * by the pivot element */ + in1 = *pInT1; + *pInT1++ = in1 / in; + + /* Decrement the loop counter */ + j--; + } + + /* Loop over number of columns of the destination matrix */ + j = numCols; + + while(j > 0u) + { + /* Divide each element of the row of the destination matrix + * by the pivot element */ + in1 = *pInT2; + *pInT2++ = in1 / in; + + /* Decrement the loop counter */ + j--; + } + + /* Replace the rows with the sum of that row and a multiple of row i + * so that each new element in column i above row i is zero.*/ + + /* Temporary pointers for input and destination matrices */ + pInT1 = pIn; + pInT2 = pOut; + + /* index used to check for pivot element */ + i = 0u; + + /* Loop over number of rows */ + /* to be replaced by the sum of that row and a multiple of row i */ + k = numRows; + + while(k > 0u) + { + /* Check for the pivot element */ + if(i == l) + { + /* If the processing element is the pivot element, + only the columns to the right are to be processed */ + pInT1 += numCols - l; + + pInT2 += numCols; + } + else + { + /* Element of the reference row */ + in = *pInT1; + + /* Working pointers for input and destination pivot rows */ + pPRT_in = pPivotRowIn; + pPRT_pDst = pPivotRowDst; + + /* Loop over the number of columns to the right of the pivot element, + to replace the elements in the input matrix */ + j = (numCols - l); + + while(j > 0u) + { + /* Replace the element by the sum of that row + and a multiple of the reference row */ + in1 = *pInT1; + *pInT1++ = in1 - (in * *pPRT_in++); + + /* Decrement the loop counter */ + j--; + } + + /* Loop over the number of columns to + replace the elements in the destination matrix */ + j = numCols; + + while(j > 0u) + { + /* Replace the element by the sum of that row + and a multiple of the reference row */ + in1 = *pInT2; + *pInT2++ = in1 - (in * *pPRT_pDst++); + + /* Decrement the loop counter */ + j--; + } + + } + + /* Increment the temporary input pointer */ + pInT1 = pInT1 + l; + + /* Decrement the loop counter */ + k--; + + /* Increment the pivot index */ + i++; + } + + /* Increment the input pointer */ + pIn++; + + /* Decrement the loop counter */ + loopCnt--; + + /* Increment the index modifier */ + l++; + } + + +#else + + /* Run the below code for Cortex-M0 */ + + float32_t Xchg, in = 0.0f; /* Temporary input values */ + uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l; /* loop counters */ + arm_status status; /* status of matrix inverse */ + +#ifdef ARM_MATH_MATRIX_CHECK + + /* Check for matrix mismatch condition */ + if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols) + || (pSrc->numRows != pDst->numRows)) + { + /* Set status as ARM_MATH_SIZE_MISMATCH */ + status = ARM_MATH_SIZE_MISMATCH; + } + else +#endif /* #ifdef ARM_MATH_MATRIX_CHECK */ + { + + /*-------------------------------------------------------------------------------------------------------------- + * Matrix Inverse can be solved using elementary row operations. + * + * Gauss-Jordan Method: + * + * 1. First combine the identity matrix and the input matrix separated by a bar to form an + * augmented matrix as follows: + * _ _ _ _ _ _ _ _ + * | | a11 a12 | | | 1 0 | | | X11 X12 | + * | | | | | | | = | | + * |_ |_ a21 a22 _| | |_0 1 _| _| |_ X21 X21 _| + * + * 2. In our implementation, pDst Matrix is used as identity matrix. + * + * 3. Begin with the first row. Let i = 1. + * + * 4. Check to see if the pivot for row i is zero. + * The pivot is the element of the main diagonal that is on the current row. + * For instance, if working with row i, then the pivot element is aii. + * If the pivot is zero, exchange that row with a row below it that does not + * contain a zero in column i. If this is not possible, then an inverse + * to that matrix does not exist. + * + * 5. Divide every element of row i by the pivot. + * + * 6. For every row below and row i, replace that row with the sum of that row and + * a multiple of row i so that each new element in column i below row i is zero. + * + * 7. Move to the next row and column and repeat steps 2 through 5 until you have zeros + * for every element below and above the main diagonal. + * + * 8. Now an identical matrix is formed to the left of the bar(input matrix, src). + * Therefore, the matrix to the right of the bar is our solution(dst matrix, dst). + *----------------------------------------------------------------------------------------------------------------*/ + + /* Working pointer for destination matrix */ + pInT2 = pOut; + + /* Loop over the number of rows */ + rowCnt = numRows; + + /* Making the destination matrix as identity matrix */ + while(rowCnt > 0u) + { + /* Writing all zeroes in lower triangle of the destination matrix */ + j = numRows - rowCnt; + while(j > 0u) + { + *pInT2++ = 0.0f; + j--; + } + + /* Writing all ones in the diagonal of the destination matrix */ + *pInT2++ = 1.0f; + + /* Writing all zeroes in upper triangle of the destination matrix */ + j = rowCnt - 1u; + while(j > 0u) + { + *pInT2++ = 0.0f; + j--; + } + + /* Decrement the loop counter */ + rowCnt--; + } + + /* Loop over the number of columns of the input matrix. + All the elements in each column are processed by the row operations */ + loopCnt = numCols; + + /* Index modifier to navigate through the columns */ + l = 0u; + //for(loopCnt = 0u; loopCnt < numCols; loopCnt++) + while(loopCnt > 0u) + { + /* Check if the pivot element is zero.. + * If it is zero then interchange the row with non zero row below. + * If there is no non zero element to replace in the rows below, + * then the matrix is Singular. */ + + /* Working pointer for the input matrix that points + * to the pivot element of the particular row */ + pInT1 = pIn + (l * numCols); + + /* Working pointer for the destination matrix that points + * to the pivot element of the particular row */ + pInT3 = pOut + (l * numCols); + + /* Temporary variable to hold the pivot value */ + in = *pInT1; + + /* Destination pointer modifier */ + k = 1u; + + /* Check if the pivot element is zero */ + if(*pInT1 == 0.0f) + { + /* Loop over the number rows present below */ + for (i = (l + 1u); i < numRows; i++) + { + /* Update the input and destination pointers */ + pInT2 = pInT1 + (numCols * l); + pInT4 = pInT3 + (numCols * k); + + /* Check if there is a non zero pivot element to + * replace in the rows below */ + if(*pInT2 != 0.0f) + { + /* Loop over number of columns + * to the right of the pilot element */ + for (j = 0u; j < (numCols - l); j++) + { + /* Exchange the row elements of the input matrix */ + Xchg = *pInT2; + *pInT2++ = *pInT1; + *pInT1++ = Xchg; + } + + for (j = 0u; j < numCols; j++) + { + Xchg = *pInT4; + *pInT4++ = *pInT3; + *pInT3++ = Xchg; + } + + /* Flag to indicate whether exchange is done or not */ + flag = 1u; + + /* Break after exchange is done */ + break; + } + + /* Update the destination pointer modifier */ + k++; + } + } + + /* Update the status if the matrix is singular */ + if((flag != 1u) && (in == 0.0f)) + { + status = ARM_MATH_SINGULAR; + + break; + } + + /* Points to the pivot row of input and destination matrices */ + pPivotRowIn = pIn + (l * numCols); + pPivotRowDst = pOut + (l * numCols); + + /* Temporary pointers to the pivot row pointers */ + pInT1 = pPivotRowIn; + pInT2 = pPivotRowDst; + + /* Pivot element of the row */ + in = *(pIn + (l * numCols)); + + /* Loop over number of columns + * to the right of the pilot element */ + for (j = 0u; j < (numCols - l); j++) + { + /* Divide each element of the row of the input matrix + * by the pivot element */ + *pInT1++ = *pInT1 / in; + } + for (j = 0u; j < numCols; j++) + { + /* Divide each element of the row of the destination matrix + * by the pivot element */ + *pInT2++ = *pInT2 / in; + } + + /* Replace the rows with the sum of that row and a multiple of row i + * so that each new element in column i above row i is zero.*/ + + /* Temporary pointers for input and destination matrices */ + pInT1 = pIn; + pInT2 = pOut; + + for (i = 0u; i < numRows; i++) + { + /* Check for the pivot element */ + if(i == l) + { + /* If the processing element is the pivot element, + only the columns to the right are to be processed */ + pInT1 += numCols - l; + pInT2 += numCols; + } + else + { + /* Element of the reference row */ + in = *pInT1; + + /* Working pointers for input and destination pivot rows */ + pPRT_in = pPivotRowIn; + pPRT_pDst = pPivotRowDst; + + /* Loop over the number of columns to the right of the pivot element, + to replace the elements in the input matrix */ + for (j = 0u; j < (numCols - l); j++) + { + /* Replace the element by the sum of that row + and a multiple of the reference row */ + *pInT1++ = *pInT1 - (in * *pPRT_in++); + } + /* Loop over the number of columns to + replace the elements in the destination matrix */ + for (j = 0u; j < numCols; j++) + { + /* Replace the element by the sum of that row + and a multiple of the reference row */ + *pInT2++ = *pInT2 - (in * *pPRT_pDst++); + } + + } + /* Increment the temporary input pointer */ + pInT1 = pInT1 + l; + } + /* Increment the input pointer */ + pIn++; + + /* Decrement the loop counter */ + loopCnt--; + /* Increment the index modifier */ + l++; + } + + +#endif /* #ifndef ARM_MATH_CM0 */ + + /* Set status as ARM_MATH_SUCCESS */ + status = ARM_MATH_SUCCESS; + + if((flag != 1u) && (in == 0.0f)) + { + status = ARM_MATH_SINGULAR; + } + } + /* Return to application */ + return (status); +} + +/** + * @} end of MatrixInv group + */