Fork of mbed-dsp. CMSIS-DSP library of supporting NEON
Dependents: mbed-os-example-cmsis_dsp_neon
Fork of mbed-dsp by
Information
Japanese version is available in lower part of this page.
このページの後半に日本語版が用意されています.
CMSIS-DSP of supporting NEON
What is this ?
A library for CMSIS-DSP of supporting NEON.
We supported the NEON to CMSIS-DSP Ver1.4.3(CMSIS V4.1) that ARM supplied, has achieved the processing speed improvement.
If you use the mbed-dsp library, you can use to replace this library.
CMSIS-DSP of supporting NEON is provied as a library.
Library Creation environment
CMSIS-DSP library of supporting NEON was created by the following environment.
- Compiler
ARMCC Version 5.03 - Compile option switch[C Compiler]
-DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -O3 -Otime --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp --vectorize --asm
- Compile option switch[Assembler]
--cpreproc --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp
Effects of NEON support
In the data which passes to each function, large size will be expected more effective than small size.
Also if the data is a multiple of 16, effect will be expected in every function in the CMSIS-DSP.
NEON対応CMSIS-DSP
概要
NEON対応したCMSIS-DSPのライブラリです。
ARM社提供のCMSIS-DSP Ver1.4.3(CMSIS V4.1)をターゲットにNEON対応を行ない、処理速度向上を実現しております。
mbed-dspライブラリを使用している場合は、本ライブラリに置き換えて使用することができます。
NEON対応したCMSIS-DSPはライブラリで提供します。
ライブラリ作成環境
NEON対応CMSIS-DSPライブラリは、以下の環境で作成しています。
- コンパイラ
ARMCC Version 5.03 - コンパイルオプションスイッチ[C Compiler]
-DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -O3 -Otime --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp --vectorize --asm
- コンパイルオプションスイッチ[Assembler]
--cpreproc --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp
NEON対応による効果について
CMSIS-DSP内の各関数へ渡すデータは、小さいサイズよりも大きいサイズの方が効果が見込めます。
また、16の倍数のデータであれば、CMSIS-DSP内のどの関数でも効果が見込めます。
cmsis_dsp/MatrixFunctions/arm_mat_inverse_f32.c
- Committer:
- emilmont
- Date:
- 2013-05-30
- Revision:
- 2:da51fb522205
- Parent:
- 1:fdd22bb7aa52
- Child:
- 3:7a284390b0ce
File content as of revision 2:da51fb522205:
/* ---------------------------------------------------------------------- * Copyright (C) 2010 ARM Limited. All rights reserved. * * $Date: 15. February 2012 * $Revision: V1.1.0 * * Project: CMSIS DSP Library * Title: arm_mat_inverse_f32.c * * Description: Floating-point matrix inverse. * * Target Processor: Cortex-M4/Cortex-M3/Cortex-M0 * * Version 1.1.0 2012/02/15 * Updated with more optimizations, bug fixes and minor API changes. * * Version 1.0.10 2011/7/15 * Big Endian support added and Merged M0 and M3/M4 Source code. * * Version 1.0.3 2010/11/29 * Re-organized the CMSIS folders and updated documentation. * * Version 1.0.2 2010/11/11 * Documentation updated. * * Version 1.0.1 2010/10/05 * Production release and review comments incorporated. * * Version 1.0.0 2010/09/20 * Production release and review comments incorporated. * -------------------------------------------------------------------- */ #include "arm_math.h" /** * @ingroup groupMatrix */ /** * @defgroup MatrixInv Matrix Inverse * * Computes the inverse of a matrix. * * The inverse is defined only if the input matrix is square and non-singular (the determinant * is non-zero). The function checks that the input and output matrices are square and of the * same size. * * Matrix inversion is numerically sensitive and the CMSIS DSP library only supports matrix * inversion of floating-point matrices. * * \par Algorithm * The Gauss-Jordan method is used to find the inverse. * The algorithm performs a sequence of elementary row-operations till it * reduces the input matrix to an identity matrix. Applying the same sequence * of elementary row-operations to an identity matrix yields the inverse matrix. * If the input matrix is singular, then the algorithm terminates and returns error status * <code>ARM_MATH_SINGULAR</code>. * \image html MatrixInverse.gif "Matrix Inverse of a 3 x 3 matrix using Gauss-Jordan Method" */ /** * @addtogroup MatrixInv * @{ */ /** * @brief Floating-point matrix inverse. * @param[in] *pSrc points to input matrix structure * @param[out] *pDst points to output matrix structure * @return The function returns * <code>ARM_MATH_SIZE_MISMATCH</code> if the input matrix is not square or if the size * of the output matrix does not match the size of the input matrix. * If the input matrix is found to be singular (non-invertible), then the function returns * <code>ARM_MATH_SINGULAR</code>. Otherwise, the function returns <code>ARM_MATH_SUCCESS</code>. */ arm_status arm_mat_inverse_f32( const arm_matrix_instance_f32 * pSrc, arm_matrix_instance_f32 * pDst) { float32_t *pIn = pSrc->pData; /* input data matrix pointer */ float32_t *pOut = pDst->pData; /* output data matrix pointer */ float32_t *pInT1, *pInT2; /* Temporary input data matrix pointer */ float32_t *pInT3, *pInT4; /* Temporary output data matrix pointer */ float32_t *pPivotRowIn, *pPRT_in, *pPivotRowDst, *pPRT_pDst; /* Temporary input and output data matrix pointer */ uint32_t numRows = pSrc->numRows; /* Number of rows in the matrix */ uint32_t numCols = pSrc->numCols; /* Number of Cols in the matrix */ #ifndef ARM_MATH_CM0 /* Run the below code for Cortex-M4 and Cortex-M3 */ float32_t Xchg, in = 0.0f, in1; /* Temporary input values */ uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l; /* loop counters */ arm_status status; /* status of matrix inverse */ #ifdef ARM_MATH_MATRIX_CHECK /* Check for matrix mismatch condition */ if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols) || (pSrc->numRows != pDst->numRows)) { /* Set status as ARM_MATH_SIZE_MISMATCH */ status = ARM_MATH_SIZE_MISMATCH; } else #endif /* #ifdef ARM_MATH_MATRIX_CHECK */ { /*-------------------------------------------------------------------------------------------------------------- * Matrix Inverse can be solved using elementary row operations. * * Gauss-Jordan Method: * * 1. First combine the identity matrix and the input matrix separated by a bar to form an * augmented matrix as follows: * _ _ _ _ * | a11 a12 | 1 0 | | X11 X12 | * | | | = | | * |_ a21 a22 | 0 1 _| |_ X21 X21 _| * * 2. In our implementation, pDst Matrix is used as identity matrix. * * 3. Begin with the first row. Let i = 1. * * 4. Check to see if the pivot for row i is zero. * The pivot is the element of the main diagonal that is on the current row. * For instance, if working with row i, then the pivot element is aii. * If the pivot is zero, exchange that row with a row below it that does not * contain a zero in column i. If this is not possible, then an inverse * to that matrix does not exist. * * 5. Divide every element of row i by the pivot. * * 6. For every row below and row i, replace that row with the sum of that row and * a multiple of row i so that each new element in column i below row i is zero. * * 7. Move to the next row and column and repeat steps 2 through 5 until you have zeros * for every element below and above the main diagonal. * * 8. Now an identical matrix is formed to the left of the bar(input matrix, pSrc). * Therefore, the matrix to the right of the bar is our solution(pDst matrix, pDst). *----------------------------------------------------------------------------------------------------------------*/ /* Working pointer for destination matrix */ pInT2 = pOut; /* Loop over the number of rows */ rowCnt = numRows; /* Making the destination matrix as identity matrix */ while(rowCnt > 0u) { /* Writing all zeroes in lower triangle of the destination matrix */ j = numRows - rowCnt; while(j > 0u) { *pInT2++ = 0.0f; j--; } /* Writing all ones in the diagonal of the destination matrix */ *pInT2++ = 1.0f; /* Writing all zeroes in upper triangle of the destination matrix */ j = rowCnt - 1u; while(j > 0u) { *pInT2++ = 0.0f; j--; } /* Decrement the loop counter */ rowCnt--; } /* Loop over the number of columns of the input matrix. All the elements in each column are processed by the row operations */ loopCnt = numCols; /* Index modifier to navigate through the columns */ l = 0u; while(loopCnt > 0u) { /* Check if the pivot element is zero.. * If it is zero then interchange the row with non zero row below. * If there is no non zero element to replace in the rows below, * then the matrix is Singular. */ /* Working pointer for the input matrix that points * to the pivot element of the particular row */ pInT1 = pIn + (l * numCols); /* Working pointer for the destination matrix that points * to the pivot element of the particular row */ pInT3 = pOut + (l * numCols); /* Temporary variable to hold the pivot value */ in = *pInT1; /* Destination pointer modifier */ k = 1u; /* Check if the pivot element is zero */ if(*pInT1 == 0.0f) { /* Loop over the number rows present below */ i = numRows - (l + 1u); while(i > 0u) { /* Update the input and destination pointers */ pInT2 = pInT1 + (numCols * l); pInT4 = pInT3 + (numCols * k); /* Check if there is a non zero pivot element to * replace in the rows below */ if(*pInT2 != 0.0f) { /* Loop over number of columns * to the right of the pilot element */ j = numCols - l; while(j > 0u) { /* Exchange the row elements of the input matrix */ Xchg = *pInT2; *pInT2++ = *pInT1; *pInT1++ = Xchg; /* Decrement the loop counter */ j--; } /* Loop over number of columns of the destination matrix */ j = numCols; while(j > 0u) { /* Exchange the row elements of the destination matrix */ Xchg = *pInT4; *pInT4++ = *pInT3; *pInT3++ = Xchg; /* Decrement the loop counter */ j--; } /* Flag to indicate whether exchange is done or not */ flag = 1u; /* Break after exchange is done */ break; } /* Update the destination pointer modifier */ k++; /* Decrement the loop counter */ i--; } } /* Update the status if the matrix is singular */ if((flag != 1u) && (in == 0.0f)) { status = ARM_MATH_SINGULAR; break; } /* Points to the pivot row of input and destination matrices */ pPivotRowIn = pIn + (l * numCols); pPivotRowDst = pOut + (l * numCols); /* Temporary pointers to the pivot row pointers */ pInT1 = pPivotRowIn; pInT2 = pPivotRowDst; /* Pivot element of the row */ in = *(pIn + (l * numCols)); /* Loop over number of columns * to the right of the pilot element */ j = (numCols - l); while(j > 0u) { /* Divide each element of the row of the input matrix * by the pivot element */ in1 = *pInT1; *pInT1++ = in1 / in; /* Decrement the loop counter */ j--; } /* Loop over number of columns of the destination matrix */ j = numCols; while(j > 0u) { /* Divide each element of the row of the destination matrix * by the pivot element */ in1 = *pInT2; *pInT2++ = in1 / in; /* Decrement the loop counter */ j--; } /* Replace the rows with the sum of that row and a multiple of row i * so that each new element in column i above row i is zero.*/ /* Temporary pointers for input and destination matrices */ pInT1 = pIn; pInT2 = pOut; /* index used to check for pivot element */ i = 0u; /* Loop over number of rows */ /* to be replaced by the sum of that row and a multiple of row i */ k = numRows; while(k > 0u) { /* Check for the pivot element */ if(i == l) { /* If the processing element is the pivot element, only the columns to the right are to be processed */ pInT1 += numCols - l; pInT2 += numCols; } else { /* Element of the reference row */ in = *pInT1; /* Working pointers for input and destination pivot rows */ pPRT_in = pPivotRowIn; pPRT_pDst = pPivotRowDst; /* Loop over the number of columns to the right of the pivot element, to replace the elements in the input matrix */ j = (numCols - l); while(j > 0u) { /* Replace the element by the sum of that row and a multiple of the reference row */ in1 = *pInT1; *pInT1++ = in1 - (in * *pPRT_in++); /* Decrement the loop counter */ j--; } /* Loop over the number of columns to replace the elements in the destination matrix */ j = numCols; while(j > 0u) { /* Replace the element by the sum of that row and a multiple of the reference row */ in1 = *pInT2; *pInT2++ = in1 - (in * *pPRT_pDst++); /* Decrement the loop counter */ j--; } } /* Increment the temporary input pointer */ pInT1 = pInT1 + l; /* Decrement the loop counter */ k--; /* Increment the pivot index */ i++; } /* Increment the input pointer */ pIn++; /* Decrement the loop counter */ loopCnt--; /* Increment the index modifier */ l++; } #else /* Run the below code for Cortex-M0 */ float32_t Xchg, in = 0.0f; /* Temporary input values */ uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l; /* loop counters */ arm_status status; /* status of matrix inverse */ #ifdef ARM_MATH_MATRIX_CHECK /* Check for matrix mismatch condition */ if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols) || (pSrc->numRows != pDst->numRows)) { /* Set status as ARM_MATH_SIZE_MISMATCH */ status = ARM_MATH_SIZE_MISMATCH; } else #endif /* #ifdef ARM_MATH_MATRIX_CHECK */ { /*-------------------------------------------------------------------------------------------------------------- * Matrix Inverse can be solved using elementary row operations. * * Gauss-Jordan Method: * * 1. First combine the identity matrix and the input matrix separated by a bar to form an * augmented matrix as follows: * _ _ _ _ _ _ _ _ * | | a11 a12 | | | 1 0 | | | X11 X12 | * | | | | | | | = | | * |_ |_ a21 a22 _| | |_0 1 _| _| |_ X21 X21 _| * * 2. In our implementation, pDst Matrix is used as identity matrix. * * 3. Begin with the first row. Let i = 1. * * 4. Check to see if the pivot for row i is zero. * The pivot is the element of the main diagonal that is on the current row. * For instance, if working with row i, then the pivot element is aii. * If the pivot is zero, exchange that row with a row below it that does not * contain a zero in column i. If this is not possible, then an inverse * to that matrix does not exist. * * 5. Divide every element of row i by the pivot. * * 6. For every row below and row i, replace that row with the sum of that row and * a multiple of row i so that each new element in column i below row i is zero. * * 7. Move to the next row and column and repeat steps 2 through 5 until you have zeros * for every element below and above the main diagonal. * * 8. Now an identical matrix is formed to the left of the bar(input matrix, src). * Therefore, the matrix to the right of the bar is our solution(dst matrix, dst). *----------------------------------------------------------------------------------------------------------------*/ /* Working pointer for destination matrix */ pInT2 = pOut; /* Loop over the number of rows */ rowCnt = numRows; /* Making the destination matrix as identity matrix */ while(rowCnt > 0u) { /* Writing all zeroes in lower triangle of the destination matrix */ j = numRows - rowCnt; while(j > 0u) { *pInT2++ = 0.0f; j--; } /* Writing all ones in the diagonal of the destination matrix */ *pInT2++ = 1.0f; /* Writing all zeroes in upper triangle of the destination matrix */ j = rowCnt - 1u; while(j > 0u) { *pInT2++ = 0.0f; j--; } /* Decrement the loop counter */ rowCnt--; } /* Loop over the number of columns of the input matrix. All the elements in each column are processed by the row operations */ loopCnt = numCols; /* Index modifier to navigate through the columns */ l = 0u; //for(loopCnt = 0u; loopCnt < numCols; loopCnt++) while(loopCnt > 0u) { /* Check if the pivot element is zero.. * If it is zero then interchange the row with non zero row below. * If there is no non zero element to replace in the rows below, * then the matrix is Singular. */ /* Working pointer for the input matrix that points * to the pivot element of the particular row */ pInT1 = pIn + (l * numCols); /* Working pointer for the destination matrix that points * to the pivot element of the particular row */ pInT3 = pOut + (l * numCols); /* Temporary variable to hold the pivot value */ in = *pInT1; /* Destination pointer modifier */ k = 1u; /* Check if the pivot element is zero */ if(*pInT1 == 0.0f) { /* Loop over the number rows present below */ for (i = (l + 1u); i < numRows; i++) { /* Update the input and destination pointers */ pInT2 = pInT1 + (numCols * l); pInT4 = pInT3 + (numCols * k); /* Check if there is a non zero pivot element to * replace in the rows below */ if(*pInT2 != 0.0f) { /* Loop over number of columns * to the right of the pilot element */ for (j = 0u; j < (numCols - l); j++) { /* Exchange the row elements of the input matrix */ Xchg = *pInT2; *pInT2++ = *pInT1; *pInT1++ = Xchg; } for (j = 0u; j < numCols; j++) { Xchg = *pInT4; *pInT4++ = *pInT3; *pInT3++ = Xchg; } /* Flag to indicate whether exchange is done or not */ flag = 1u; /* Break after exchange is done */ break; } /* Update the destination pointer modifier */ k++; } } /* Update the status if the matrix is singular */ if((flag != 1u) && (in == 0.0f)) { status = ARM_MATH_SINGULAR; break; } /* Points to the pivot row of input and destination matrices */ pPivotRowIn = pIn + (l * numCols); pPivotRowDst = pOut + (l * numCols); /* Temporary pointers to the pivot row pointers */ pInT1 = pPivotRowIn; pInT2 = pPivotRowDst; /* Pivot element of the row */ in = *(pIn + (l * numCols)); /* Loop over number of columns * to the right of the pilot element */ for (j = 0u; j < (numCols - l); j++) { /* Divide each element of the row of the input matrix * by the pivot element */ *pInT1++ = *pInT1 / in; } for (j = 0u; j < numCols; j++) { /* Divide each element of the row of the destination matrix * by the pivot element */ *pInT2++ = *pInT2 / in; } /* Replace the rows with the sum of that row and a multiple of row i * so that each new element in column i above row i is zero.*/ /* Temporary pointers for input and destination matrices */ pInT1 = pIn; pInT2 = pOut; for (i = 0u; i < numRows; i++) { /* Check for the pivot element */ if(i == l) { /* If the processing element is the pivot element, only the columns to the right are to be processed */ pInT1 += numCols - l; pInT2 += numCols; } else { /* Element of the reference row */ in = *pInT1; /* Working pointers for input and destination pivot rows */ pPRT_in = pPivotRowIn; pPRT_pDst = pPivotRowDst; /* Loop over the number of columns to the right of the pivot element, to replace the elements in the input matrix */ for (j = 0u; j < (numCols - l); j++) { /* Replace the element by the sum of that row and a multiple of the reference row */ *pInT1++ = *pInT1 - (in * *pPRT_in++); } /* Loop over the number of columns to replace the elements in the destination matrix */ for (j = 0u; j < numCols; j++) { /* Replace the element by the sum of that row and a multiple of the reference row */ *pInT2++ = *pInT2 - (in * *pPRT_pDst++); } } /* Increment the temporary input pointer */ pInT1 = pInT1 + l; } /* Increment the input pointer */ pIn++; /* Decrement the loop counter */ loopCnt--; /* Increment the index modifier */ l++; } #endif /* #ifndef ARM_MATH_CM0 */ /* Set status as ARM_MATH_SUCCESS */ status = ARM_MATH_SUCCESS; if((flag != 1u) && (in == 0.0f)) { status = ARM_MATH_SINGULAR; } } /* Return to application */ return (status); } /** * @} end of MatrixInv group */