Fork of mbed-dsp. CMSIS-DSP library of supporting NEON
Dependents: mbed-os-example-cmsis_dsp_neon
Fork of mbed-dsp by
Information
Japanese version is available in lower part of this page.
このページの後半に日本語版が用意されています.
CMSIS-DSP of supporting NEON
What is this ?
A library for CMSIS-DSP of supporting NEON.
We supported the NEON to CMSIS-DSP Ver1.4.3(CMSIS V4.1) that ARM supplied, has achieved the processing speed improvement.
If you use the mbed-dsp library, you can use to replace this library.
CMSIS-DSP of supporting NEON is provied as a library.
Library Creation environment
CMSIS-DSP library of supporting NEON was created by the following environment.
- Compiler
ARMCC Version 5.03 - Compile option switch[C Compiler]
-DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -O3 -Otime --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp --vectorize --asm
- Compile option switch[Assembler]
--cpreproc --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp
Effects of NEON support
In the data which passes to each function, large size will be expected more effective than small size.
Also if the data is a multiple of 16, effect will be expected in every function in the CMSIS-DSP.
NEON対応CMSIS-DSP
概要
NEON対応したCMSIS-DSPのライブラリです。
ARM社提供のCMSIS-DSP Ver1.4.3(CMSIS V4.1)をターゲットにNEON対応を行ない、処理速度向上を実現しております。
mbed-dspライブラリを使用している場合は、本ライブラリに置き換えて使用することができます。
NEON対応したCMSIS-DSPはライブラリで提供します。
ライブラリ作成環境
NEON対応CMSIS-DSPライブラリは、以下の環境で作成しています。
- コンパイラ
ARMCC Version 5.03 - コンパイルオプションスイッチ[C Compiler]
-DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -O3 -Otime --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp --vectorize --asm
- コンパイルオプションスイッチ[Assembler]
--cpreproc --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp
NEON対応による効果について
CMSIS-DSP内の各関数へ渡すデータは、小さいサイズよりも大きいサイズの方が効果が見込めます。
また、16の倍数のデータであれば、CMSIS-DSP内のどの関数でも効果が見込めます。
Diff: cmsis_dsp/MatrixFunctions/arm_mat_inverse_f32.c
- Revision:
- 5:a912b042151f
- Parent:
- 4:9cee975aadce
diff -r 9cee975aadce -r a912b042151f cmsis_dsp/MatrixFunctions/arm_mat_inverse_f32.c --- a/cmsis_dsp/MatrixFunctions/arm_mat_inverse_f32.c Mon Jun 23 09:30:09 2014 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,700 +0,0 @@ -/* ---------------------------------------------------------------------- -* Copyright (C) 2010-2013 ARM Limited. All rights reserved. -* -* $Date: 1. March 2013 -* $Revision: V1.4.1 -* -* Project: CMSIS DSP Library -* Title: arm_mat_inverse_f32.c -* -* Description: Floating-point matrix inverse. -* -* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0 -* -* Redistribution and use in source and binary forms, with or without -* modification, are permitted provided that the following conditions -* are met: -* - Redistributions of source code must retain the above copyright -* notice, this list of conditions and the following disclaimer. -* - Redistributions in binary form must reproduce the above copyright -* notice, this list of conditions and the following disclaimer in -* the documentation and/or other materials provided with the -* distribution. -* - Neither the name of ARM LIMITED nor the names of its contributors -* may be used to endorse or promote products derived from this -* software without specific prior written permission. -* -* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS -* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE -* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, -* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, -* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; -* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT -* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN -* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -* POSSIBILITY OF SUCH DAMAGE. -* -------------------------------------------------------------------- */ - -#include "arm_math.h" - -/** - * @ingroup groupMatrix - */ - -/** - * @defgroup MatrixInv Matrix Inverse - * - * Computes the inverse of a matrix. - * - * The inverse is defined only if the input matrix is square and non-singular (the determinant - * is non-zero). The function checks that the input and output matrices are square and of the - * same size. - * - * Matrix inversion is numerically sensitive and the CMSIS DSP library only supports matrix - * inversion of floating-point matrices. - * - * \par Algorithm - * The Gauss-Jordan method is used to find the inverse. - * The algorithm performs a sequence of elementary row-operations till it - * reduces the input matrix to an identity matrix. Applying the same sequence - * of elementary row-operations to an identity matrix yields the inverse matrix. - * If the input matrix is singular, then the algorithm terminates and returns error status - * <code>ARM_MATH_SINGULAR</code>. - * \image html MatrixInverse.gif "Matrix Inverse of a 3 x 3 matrix using Gauss-Jordan Method" - */ - -/** - * @addtogroup MatrixInv - * @{ - */ - -/** - * @brief Floating-point matrix inverse. - * @param[in] *pSrc points to input matrix structure - * @param[out] *pDst points to output matrix structure - * @return The function returns - * <code>ARM_MATH_SIZE_MISMATCH</code> if the input matrix is not square or if the size - * of the output matrix does not match the size of the input matrix. - * If the input matrix is found to be singular (non-invertible), then the function returns - * <code>ARM_MATH_SINGULAR</code>. Otherwise, the function returns <code>ARM_MATH_SUCCESS</code>. - */ - -arm_status arm_mat_inverse_f32( - const arm_matrix_instance_f32 * pSrc, - arm_matrix_instance_f32 * pDst) -{ - float32_t *pIn = pSrc->pData; /* input data matrix pointer */ - float32_t *pOut = pDst->pData; /* output data matrix pointer */ - float32_t *pInT1, *pInT2; /* Temporary input data matrix pointer */ - float32_t *pInT3, *pInT4; /* Temporary output data matrix pointer */ - float32_t *pPivotRowIn, *pPRT_in, *pPivotRowDst, *pPRT_pDst; /* Temporary input and output data matrix pointer */ - uint32_t numRows = pSrc->numRows; /* Number of rows in the matrix */ - uint32_t numCols = pSrc->numCols; /* Number of Cols in the matrix */ - -#ifndef ARM_MATH_CM0_FAMILY - float32_t maxC; /* maximum value in the column */ - - /* Run the below code for Cortex-M4 and Cortex-M3 */ - - float32_t Xchg, in = 0.0f, in1; /* Temporary input values */ - uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l; /* loop counters */ - arm_status status; /* status of matrix inverse */ - -#ifdef ARM_MATH_MATRIX_CHECK - - - /* Check for matrix mismatch condition */ - if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols) - || (pSrc->numRows != pDst->numRows)) - { - /* Set status as ARM_MATH_SIZE_MISMATCH */ - status = ARM_MATH_SIZE_MISMATCH; - } - else -#endif /* #ifdef ARM_MATH_MATRIX_CHECK */ - - { - - /*-------------------------------------------------------------------------------------------------------------- - * Matrix Inverse can be solved using elementary row operations. - * - * Gauss-Jordan Method: - * - * 1. First combine the identity matrix and the input matrix separated by a bar to form an - * augmented matrix as follows: - * _ _ _ _ - * | a11 a12 | 1 0 | | X11 X12 | - * | | | = | | - * |_ a21 a22 | 0 1 _| |_ X21 X21 _| - * - * 2. In our implementation, pDst Matrix is used as identity matrix. - * - * 3. Begin with the first row. Let i = 1. - * - * 4. Check to see if the pivot for column i is the greatest of the column. - * The pivot is the element of the main diagonal that is on the current row. - * For instance, if working with row i, then the pivot element is aii. - * If the pivot is not the most significant of the coluimns, exchange that row with a row - * below it that does contain the most significant value in column i. If the most - * significant value of the column is zero, then an inverse to that matrix does not exist. - * The most significant value of the column is the absolut maximum. - * - * 5. Divide every element of row i by the pivot. - * - * 6. For every row below and row i, replace that row with the sum of that row and - * a multiple of row i so that each new element in column i below row i is zero. - * - * 7. Move to the next row and column and repeat steps 2 through 5 until you have zeros - * for every element below and above the main diagonal. - * - * 8. Now an identical matrix is formed to the left of the bar(input matrix, pSrc). - * Therefore, the matrix to the right of the bar is our solution(pDst matrix, pDst). - *----------------------------------------------------------------------------------------------------------------*/ - - /* Working pointer for destination matrix */ - pInT2 = pOut; - - /* Loop over the number of rows */ - rowCnt = numRows; - - /* Making the destination matrix as identity matrix */ - while(rowCnt > 0u) - { - /* Writing all zeroes in lower triangle of the destination matrix */ - j = numRows - rowCnt; - while(j > 0u) - { - *pInT2++ = 0.0f; - j--; - } - - /* Writing all ones in the diagonal of the destination matrix */ - *pInT2++ = 1.0f; - - /* Writing all zeroes in upper triangle of the destination matrix */ - j = rowCnt - 1u; - while(j > 0u) - { - *pInT2++ = 0.0f; - j--; - } - - /* Decrement the loop counter */ - rowCnt--; - } - - /* Loop over the number of columns of the input matrix. - All the elements in each column are processed by the row operations */ - loopCnt = numCols; - - /* Index modifier to navigate through the columns */ - l = 0u; - - while(loopCnt > 0u) - { - /* Check if the pivot element is zero.. - * If it is zero then interchange the row with non zero row below. - * If there is no non zero element to replace in the rows below, - * then the matrix is Singular. */ - - /* Working pointer for the input matrix that points - * to the pivot element of the particular row */ - pInT1 = pIn + (l * numCols); - - /* Working pointer for the destination matrix that points - * to the pivot element of the particular row */ - pInT3 = pOut + (l * numCols); - - /* Temporary variable to hold the pivot value */ - in = *pInT1; - - /* Destination pointer modifier */ - k = 1u; - - /* Grab the most significant value from column l */ - maxC = 0; - for (i = 0; i < numRows; i++) - { - maxC = *pInT1 > 0 ? (*pInT1 > maxC ? *pInT1 : maxC) : (-*pInT1 > maxC ? -*pInT1 : maxC); - pInT1 += numCols; - } - - /* Update the status if the matrix is singular */ - if(maxC == 0.0f) - { - status = ARM_MATH_SINGULAR; - break; - } - - /* Restore pInT1 */ - pInT1 -= numRows * numCols; - - /* Check if the pivot element is the most significant of the column */ - if( (in > 0.0f ? in : -in) != maxC) - { - /* Loop over the number rows present below */ - i = numRows - (l + 1u); - - while(i > 0u) - { - /* Update the input and destination pointers */ - pInT2 = pInT1 + (numCols * l); - pInT4 = pInT3 + (numCols * k); - - /* Look for the most significant element to - * replace in the rows below */ - if((*pInT2 > 0.0f ? *pInT2: -*pInT2) == maxC) - { - /* Loop over number of columns - * to the right of the pilot element */ - j = numCols - l; - - while(j > 0u) - { - /* Exchange the row elements of the input matrix */ - Xchg = *pInT2; - *pInT2++ = *pInT1; - *pInT1++ = Xchg; - - /* Decrement the loop counter */ - j--; - } - - /* Loop over number of columns of the destination matrix */ - j = numCols; - - while(j > 0u) - { - /* Exchange the row elements of the destination matrix */ - Xchg = *pInT4; - *pInT4++ = *pInT3; - *pInT3++ = Xchg; - - /* Decrement the loop counter */ - j--; - } - - /* Flag to indicate whether exchange is done or not */ - flag = 1u; - - /* Break after exchange is done */ - break; - } - - /* Update the destination pointer modifier */ - k++; - - /* Decrement the loop counter */ - i--; - } - } - - /* Update the status if the matrix is singular */ - if((flag != 1u) && (in == 0.0f)) - { - status = ARM_MATH_SINGULAR; - - break; - } - - /* Points to the pivot row of input and destination matrices */ - pPivotRowIn = pIn + (l * numCols); - pPivotRowDst = pOut + (l * numCols); - - /* Temporary pointers to the pivot row pointers */ - pInT1 = pPivotRowIn; - pInT2 = pPivotRowDst; - - /* Pivot element of the row */ - in = *pPivotRowIn; - - /* Loop over number of columns - * to the right of the pilot element */ - j = (numCols - l); - - while(j > 0u) - { - /* Divide each element of the row of the input matrix - * by the pivot element */ - in1 = *pInT1; - *pInT1++ = in1 / in; - - /* Decrement the loop counter */ - j--; - } - - /* Loop over number of columns of the destination matrix */ - j = numCols; - - while(j > 0u) - { - /* Divide each element of the row of the destination matrix - * by the pivot element */ - in1 = *pInT2; - *pInT2++ = in1 / in; - - /* Decrement the loop counter */ - j--; - } - - /* Replace the rows with the sum of that row and a multiple of row i - * so that each new element in column i above row i is zero.*/ - - /* Temporary pointers for input and destination matrices */ - pInT1 = pIn; - pInT2 = pOut; - - /* index used to check for pivot element */ - i = 0u; - - /* Loop over number of rows */ - /* to be replaced by the sum of that row and a multiple of row i */ - k = numRows; - - while(k > 0u) - { - /* Check for the pivot element */ - if(i == l) - { - /* If the processing element is the pivot element, - only the columns to the right are to be processed */ - pInT1 += numCols - l; - - pInT2 += numCols; - } - else - { - /* Element of the reference row */ - in = *pInT1; - - /* Working pointers for input and destination pivot rows */ - pPRT_in = pPivotRowIn; - pPRT_pDst = pPivotRowDst; - - /* Loop over the number of columns to the right of the pivot element, - to replace the elements in the input matrix */ - j = (numCols - l); - - while(j > 0u) - { - /* Replace the element by the sum of that row - and a multiple of the reference row */ - in1 = *pInT1; - *pInT1++ = in1 - (in * *pPRT_in++); - - /* Decrement the loop counter */ - j--; - } - - /* Loop over the number of columns to - replace the elements in the destination matrix */ - j = numCols; - - while(j > 0u) - { - /* Replace the element by the sum of that row - and a multiple of the reference row */ - in1 = *pInT2; - *pInT2++ = in1 - (in * *pPRT_pDst++); - - /* Decrement the loop counter */ - j--; - } - - } - - /* Increment the temporary input pointer */ - pInT1 = pInT1 + l; - - /* Decrement the loop counter */ - k--; - - /* Increment the pivot index */ - i++; - } - - /* Increment the input pointer */ - pIn++; - - /* Decrement the loop counter */ - loopCnt--; - - /* Increment the index modifier */ - l++; - } - - -#else - - /* Run the below code for Cortex-M0 */ - - float32_t Xchg, in = 0.0f; /* Temporary input values */ - uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l; /* loop counters */ - arm_status status; /* status of matrix inverse */ - -#ifdef ARM_MATH_MATRIX_CHECK - - /* Check for matrix mismatch condition */ - if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols) - || (pSrc->numRows != pDst->numRows)) - { - /* Set status as ARM_MATH_SIZE_MISMATCH */ - status = ARM_MATH_SIZE_MISMATCH; - } - else -#endif /* #ifdef ARM_MATH_MATRIX_CHECK */ - { - - /*-------------------------------------------------------------------------------------------------------------- - * Matrix Inverse can be solved using elementary row operations. - * - * Gauss-Jordan Method: - * - * 1. First combine the identity matrix and the input matrix separated by a bar to form an - * augmented matrix as follows: - * _ _ _ _ _ _ _ _ - * | | a11 a12 | | | 1 0 | | | X11 X12 | - * | | | | | | | = | | - * |_ |_ a21 a22 _| | |_0 1 _| _| |_ X21 X21 _| - * - * 2. In our implementation, pDst Matrix is used as identity matrix. - * - * 3. Begin with the first row. Let i = 1. - * - * 4. Check to see if the pivot for row i is zero. - * The pivot is the element of the main diagonal that is on the current row. - * For instance, if working with row i, then the pivot element is aii. - * If the pivot is zero, exchange that row with a row below it that does not - * contain a zero in column i. If this is not possible, then an inverse - * to that matrix does not exist. - * - * 5. Divide every element of row i by the pivot. - * - * 6. For every row below and row i, replace that row with the sum of that row and - * a multiple of row i so that each new element in column i below row i is zero. - * - * 7. Move to the next row and column and repeat steps 2 through 5 until you have zeros - * for every element below and above the main diagonal. - * - * 8. Now an identical matrix is formed to the left of the bar(input matrix, src). - * Therefore, the matrix to the right of the bar is our solution(dst matrix, dst). - *----------------------------------------------------------------------------------------------------------------*/ - - /* Working pointer for destination matrix */ - pInT2 = pOut; - - /* Loop over the number of rows */ - rowCnt = numRows; - - /* Making the destination matrix as identity matrix */ - while(rowCnt > 0u) - { - /* Writing all zeroes in lower triangle of the destination matrix */ - j = numRows - rowCnt; - while(j > 0u) - { - *pInT2++ = 0.0f; - j--; - } - - /* Writing all ones in the diagonal of the destination matrix */ - *pInT2++ = 1.0f; - - /* Writing all zeroes in upper triangle of the destination matrix */ - j = rowCnt - 1u; - while(j > 0u) - { - *pInT2++ = 0.0f; - j--; - } - - /* Decrement the loop counter */ - rowCnt--; - } - - /* Loop over the number of columns of the input matrix. - All the elements in each column are processed by the row operations */ - loopCnt = numCols; - - /* Index modifier to navigate through the columns */ - l = 0u; - //for(loopCnt = 0u; loopCnt < numCols; loopCnt++) - while(loopCnt > 0u) - { - /* Check if the pivot element is zero.. - * If it is zero then interchange the row with non zero row below. - * If there is no non zero element to replace in the rows below, - * then the matrix is Singular. */ - - /* Working pointer for the input matrix that points - * to the pivot element of the particular row */ - pInT1 = pIn + (l * numCols); - - /* Working pointer for the destination matrix that points - * to the pivot element of the particular row */ - pInT3 = pOut + (l * numCols); - - /* Temporary variable to hold the pivot value */ - in = *pInT1; - - /* Destination pointer modifier */ - k = 1u; - - /* Check if the pivot element is zero */ - if(*pInT1 == 0.0f) - { - /* Loop over the number rows present below */ - for (i = (l + 1u); i < numRows; i++) - { - /* Update the input and destination pointers */ - pInT2 = pInT1 + (numCols * l); - pInT4 = pInT3 + (numCols * k); - - /* Check if there is a non zero pivot element to - * replace in the rows below */ - if(*pInT2 != 0.0f) - { - /* Loop over number of columns - * to the right of the pilot element */ - for (j = 0u; j < (numCols - l); j++) - { - /* Exchange the row elements of the input matrix */ - Xchg = *pInT2; - *pInT2++ = *pInT1; - *pInT1++ = Xchg; - } - - for (j = 0u; j < numCols; j++) - { - Xchg = *pInT4; - *pInT4++ = *pInT3; - *pInT3++ = Xchg; - } - - /* Flag to indicate whether exchange is done or not */ - flag = 1u; - - /* Break after exchange is done */ - break; - } - - /* Update the destination pointer modifier */ - k++; - } - } - - /* Update the status if the matrix is singular */ - if((flag != 1u) && (in == 0.0f)) - { - status = ARM_MATH_SINGULAR; - - break; - } - - /* Points to the pivot row of input and destination matrices */ - pPivotRowIn = pIn + (l * numCols); - pPivotRowDst = pOut + (l * numCols); - - /* Temporary pointers to the pivot row pointers */ - pInT1 = pPivotRowIn; - pInT2 = pPivotRowDst; - - /* Pivot element of the row */ - in = *(pIn + (l * numCols)); - - /* Loop over number of columns - * to the right of the pilot element */ - for (j = 0u; j < (numCols - l); j++) - { - /* Divide each element of the row of the input matrix - * by the pivot element */ - *pInT1 = *pInT1 / in; - pInT1++; - } - for (j = 0u; j < numCols; j++) - { - /* Divide each element of the row of the destination matrix - * by the pivot element */ - *pInT2 = *pInT2 / in; - pInT2++; - } - - /* Replace the rows with the sum of that row and a multiple of row i - * so that each new element in column i above row i is zero.*/ - - /* Temporary pointers for input and destination matrices */ - pInT1 = pIn; - pInT2 = pOut; - - for (i = 0u; i < numRows; i++) - { - /* Check for the pivot element */ - if(i == l) - { - /* If the processing element is the pivot element, - only the columns to the right are to be processed */ - pInT1 += numCols - l; - pInT2 += numCols; - } - else - { - /* Element of the reference row */ - in = *pInT1; - - /* Working pointers for input and destination pivot rows */ - pPRT_in = pPivotRowIn; - pPRT_pDst = pPivotRowDst; - - /* Loop over the number of columns to the right of the pivot element, - to replace the elements in the input matrix */ - for (j = 0u; j < (numCols - l); j++) - { - /* Replace the element by the sum of that row - and a multiple of the reference row */ - *pInT1 = *pInT1 - (in * *pPRT_in++); - pInT1++; - } - /* Loop over the number of columns to - replace the elements in the destination matrix */ - for (j = 0u; j < numCols; j++) - { - /* Replace the element by the sum of that row - and a multiple of the reference row */ - *pInT2 = *pInT2 - (in * *pPRT_pDst++); - pInT2++; - } - - } - /* Increment the temporary input pointer */ - pInT1 = pInT1 + l; - } - /* Increment the input pointer */ - pIn++; - - /* Decrement the loop counter */ - loopCnt--; - /* Increment the index modifier */ - l++; - } - - -#endif /* #ifndef ARM_MATH_CM0_FAMILY */ - - /* Set status as ARM_MATH_SUCCESS */ - status = ARM_MATH_SUCCESS; - - if((flag != 1u) && (in == 0.0f)) - { - status = ARM_MATH_SINGULAR; - } - } - /* Return to application */ - return (status); -} - -/** - * @} end of MatrixInv group - */