Fork of mbed-dsp. CMSIS-DSP library of supporting NEON
Dependents: mbed-os-example-cmsis_dsp_neon
Fork of mbed-dsp by
Information
Japanese version is available in lower part of this page.
このページの後半に日本語版が用意されています.
CMSIS-DSP of supporting NEON
What is this ?
A library for CMSIS-DSP of supporting NEON.
We supported the NEON to CMSIS-DSP Ver1.4.3(CMSIS V4.1) that ARM supplied, has achieved the processing speed improvement.
If you use the mbed-dsp library, you can use to replace this library.
CMSIS-DSP of supporting NEON is provied as a library.
Library Creation environment
CMSIS-DSP library of supporting NEON was created by the following environment.
- Compiler
ARMCC Version 5.03 - Compile option switch[C Compiler]
-DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -O3 -Otime --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp --vectorize --asm
- Compile option switch[Assembler]
--cpreproc --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp
Effects of NEON support
In the data which passes to each function, large size will be expected more effective than small size.
Also if the data is a multiple of 16, effect will be expected in every function in the CMSIS-DSP.
NEON対応CMSIS-DSP
概要
NEON対応したCMSIS-DSPのライブラリです。
ARM社提供のCMSIS-DSP Ver1.4.3(CMSIS V4.1)をターゲットにNEON対応を行ない、処理速度向上を実現しております。
mbed-dspライブラリを使用している場合は、本ライブラリに置き換えて使用することができます。
NEON対応したCMSIS-DSPはライブラリで提供します。
ライブラリ作成環境
NEON対応CMSIS-DSPライブラリは、以下の環境で作成しています。
- コンパイラ
ARMCC Version 5.03 - コンパイルオプションスイッチ[C Compiler]
-DARM_MATH_MATRIX_CHECK -DARM_MATH_ROUNDING -O3 -Otime --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp --vectorize --asm
- コンパイルオプションスイッチ[Assembler]
--cpreproc --cpu=Cortex-A9 --littleend --arm --apcs=/interwork --no_unaligned_access --fpu=vfpv3_fp16 --fpmode=fast --apcs=/hardfp
NEON対応による効果について
CMSIS-DSP内の各関数へ渡すデータは、小さいサイズよりも大きいサイズの方が効果が見込めます。
また、16の倍数のデータであれば、CMSIS-DSP内のどの関数でも効果が見込めます。
Diff: cmsis_dsp/arm_math.h
- Revision:
- 5:a912b042151f
- Parent:
- 3:7a284390b0ce
--- a/cmsis_dsp/arm_math.h Mon Jun 23 09:30:09 2014 +0100 +++ b/cmsis_dsp/arm_math.h Tue Jun 23 06:23:42 2015 +0000 @@ -1,13 +1,13 @@ /* ---------------------------------------------------------------------- -* Copyright (C) 2010-2013 ARM Limited. All rights reserved. +* Copyright (C) 2010-2014 ARM Limited. All rights reserved. * -* $Date: 17. January 2013 -* $Revision: V1.4.1 +* $Date: 12. March 2014 +* $Revision: V1.4.3 * -* Project: CMSIS DSP Library -* Title: arm_math.h +* Project: CMSIS DSP Library +* Title: arm_math.h * -* Description: Public header file for CMSIS DSP Library +* Description: Public header file for CMSIS DSP Library * * Target Processor: Cortex-M4/Cortex-M3/Cortex-M0 * @@ -41,7 +41,8 @@ /** \mainpage CMSIS DSP Software Library * - * <b>Introduction</b> + * Introduction + * ------------ * * This user manual describes the CMSIS DSP software library, * a suite of common signal processing functions for use on Cortex-M processor based devices. @@ -61,7 +62,8 @@ * The library has separate functions for operating on 8-bit integers, 16-bit integers, * 32-bit integer and 32-bit floating-point values. * - * <b>Using the Library</b> + * Using the Library + * ------------ * * The library installer contains prebuilt versions of the libraries in the <code>Lib</code> folder. * - arm_cortexM4lf_math.lib (Little endian and Floating Point Unit on Cortex-M4) @@ -79,31 +81,28 @@ * Define the appropriate pre processor MACRO ARM_MATH_CM4 or ARM_MATH_CM3 or * ARM_MATH_CM0 or ARM_MATH_CM0PLUS depending on the target processor in the application. * - * <b>Examples</b> + * Examples + * -------- * * The library ships with a number of examples which demonstrate how to use the library functions. * - * <b>Toolchain Support</b> + * Toolchain Support + * ------------ * * The library has been developed and tested with MDK-ARM version 4.60. * The library is being tested in GCC and IAR toolchains and updates on this activity will be made available shortly. * - * <b>Building the Library</b> + * Building the Library + * ------------ * - * The library installer contains project files to re build libraries on MDK Tool chain in the <code>CMSIS\\DSP_Lib\\Source\\ARM</code> folder. - * - arm_cortexM0b_math.uvproj - * - arm_cortexM0l_math.uvproj - * - arm_cortexM3b_math.uvproj - * - arm_cortexM3l_math.uvproj - * - arm_cortexM4b_math.uvproj - * - arm_cortexM4l_math.uvproj - * - arm_cortexM4bf_math.uvproj - * - arm_cortexM4lf_math.uvproj + * The library installer contains a project file to re build libraries on MDK-ARM Tool chain in the <code>CMSIS\\DSP_Lib\\Source\\ARM</code> folder. + * - arm_cortexM_math.uvproj * * - * The project can be built by opening the appropriate project in MDK-ARM 4.60 chain and defining the optional pre processor MACROs detailed above. + * The libraries can be built by opening the arm_cortexM_math.uvproj project in MDK-ARM, selecting a specific target, and defining the optional pre processor MACROs detailed above. * - * <b>Pre-processor Macros</b> + * Pre-processor Macros + * ------------ * * Each library project have differant pre-processor macros. * @@ -132,9 +131,27 @@ * * Initialize macro __FPU_PRESENT = 1 when building on FPU supported Targets. Enable this macro for M4bf and M4lf libraries * - * <b>Copyright Notice</b> + * <hr> + * CMSIS-DSP in ARM::CMSIS Pack + * ----------------------------- + * + * The following files relevant to CMSIS-DSP are present in the <b>ARM::CMSIS</b> Pack directories: + * |File/Folder |Content | + * |------------------------------|------------------------------------------------------------------------| + * |\b CMSIS\\Documentation\\DSP | This documentation | + * |\b CMSIS\\DSP_Lib | Software license agreement (license.txt) | + * |\b CMSIS\\DSP_Lib\\Examples | Example projects demonstrating the usage of the library functions | + * |\b CMSIS\\DSP_Lib\\Source | Source files for rebuilding the library | + * + * <hr> + * Revision History of CMSIS-DSP + * ------------ + * Please refer to \ref ChangeLog_pg. * - * Copyright (C) 2010-2013 ARM Limited. All rights reserved. + * Copyright Notice + * ------------ + * + * Copyright (C) 2010-2014 ARM Limited. All rights reserved. */ @@ -266,6 +283,7 @@ #define __CMSIS_GENERIC /* disable NVIC and Systick functions */ +#if 0 #if defined (ARM_MATH_CM4) #include "core_cm4.h" #elif defined (ARM_MATH_CM3) @@ -276,10 +294,51 @@ #elif defined (ARM_MATH_CM0PLUS) #include "core_cm0plus.h" #define ARM_MATH_CM0_FAMILY +#elif defined (ARM_MATH_CA9) +#include "core_ca9.h" #else #include "ARMCM4.h" #warning "Define either ARM_MATH_CM4 OR ARM_MATH_CM3...By Default building on ARM_MATH_CM4....." #endif +#else +#include <arm_neon.h> +#if !defined(__CC_ARM) +#define ARM_MATH_CM0_FAMILY +#endif +#define __FPU_USED 1 +#define __INLINE __inline +#if defined(__CC_ARM) +#define __QADD __qadd +#define __QSUB __qsub +#define __QSUB16 __qsub16 +#define __QSUB8 __qsub8 +#define __CLZ __clz +#define __SSAT __ssat +#define __SMUAD __smuad +#define __SMLALD __smlald +#define __QADD16 __qadd16 +#define __SHADD16 __shadd16 +#define __SMUADX __smuadx +#define __SMUSD __smusd +#define __SMUSDX __smusdx +#define __QASX __qasx +#define __QSAX __qsax +#define __SHASX __shasx +#define __SHSAX __shsax +#define __SHSUB16 __shsub16 +#define __SMLAD __smlad +#define __SMLADX __smladx +#define __SMLSDX __smlsdx +#define __ROR __ror +#define __SXTB16 __sxtb16 +#define __SMLALDX __smlaldx +#define __QADD8 __qadd8 +#endif +#define __PKHBT(ARG1, ARG2, ARG3) ( (((int32_t)(ARG1) << 0) & (int32_t)0x0000FFFF) | \ + (((int32_t)(ARG2) << ARG3) & (int32_t)0xFFFF0000) ) +#define __PKHTB(ARG1, ARG2, ARG3) ( (((int32_t)(ARG1) << 0) & (int32_t)0xFFFF0000) | \ + (((int32_t)(ARG2) >> ARG3) & (int32_t)0x0000FFFF) ) +#endif #undef __CMSIS_GENERIC /* enable NVIC and Systick functions */ #include "string.h" @@ -305,9 +364,13 @@ * @brief Macros required for SINE and COSINE Fast math approximations */ -#define TABLE_SIZE 256 -#define TABLE_SPACING_Q31 0x800000 -#define TABLE_SPACING_Q15 0x80 +#define FAST_MATH_TABLE_SIZE 512 +#define FAST_MATH_Q31_SHIFT (32 - 10) +#define FAST_MATH_Q15_SHIFT (16 - 10) +#define CONTROLLER_Q31_SHIFT (32 - 9) +#define TABLE_SIZE 256 +#define TABLE_SPACING_Q31 0x400000 +#define TABLE_SPACING_Q15 0x80 /** * @brief Macros required for SINE and COSINE Controller functions @@ -386,6 +449,9 @@ #elif defined __GNUC__ #define __SIMD32_TYPE int32_t #define CMSIS_UNUSED __attribute__((unused)) +#elif defined __CSMC__ /* Cosmic */ +#define CMSIS_UNUSED +#define __SIMD32_TYPE int32_t #else #error Unknown compiler #endif @@ -483,7 +549,10 @@ #if defined (ARM_MATH_CM0_FAMILY) && defined ( __CC_ARM ) #define __CLZ __clz -#elif defined (ARM_MATH_CM0_FAMILY) && ((defined (__ICCARM__)) ||(defined (__GNUC__)) || defined (__TASKING__) ) +#endif + +/* #if defined (ARM_MATH_CM0_FAMILY) && ((defined (__ICCARM__)) ||(defined (__GNUC__)) || defined (__TASKING__) ) */ +#if !defined(__CLZ) static __INLINE uint32_t __CLZ( q31_t data); @@ -728,8 +797,8 @@ q31_t sum; q31_t r, s; - r = (short) x; - s = (short) y; + r = (q15_t) x; + s = (q15_t) y; r = __SSAT(r + s, 16); s = __SSAT(((q31_t) ((x >> 16) + (y >> 16))), 16) << 16; @@ -751,8 +820,8 @@ q31_t sum; q31_t r, s; - r = (short) x; - s = (short) y; + r = (q15_t) x; + s = (q15_t) y; r = ((r >> 1) + (s >> 1)); s = ((q31_t) ((x >> 17) + (y >> 17))) << 16; @@ -774,8 +843,8 @@ q31_t sum; q31_t r, s; - r = (short) x; - s = (short) y; + r = (q15_t) x; + s = (q15_t) y; r = __SSAT(r - s, 16); s = __SSAT(((q31_t) ((x >> 16) - (y >> 16))), 16) << 16; @@ -796,8 +865,8 @@ q31_t diff; q31_t r, s; - r = (short) x; - s = (short) y; + r = (q15_t) x; + s = (q15_t) y; r = ((r >> 1) - (s >> 1)); s = (((x >> 17) - (y >> 17)) << 16); @@ -819,8 +888,8 @@ sum = ((sum + - clip_q31_to_q15((q31_t) ((short) (x >> 16) + (short) y))) << 16) + - clip_q31_to_q15((q31_t) ((short) x - (short) (y >> 16))); + clip_q31_to_q15((q31_t) ((q15_t) (x >> 16) + (q15_t) y))) << 16) + + clip_q31_to_q15((q31_t) ((q15_t) x - (q15_t) (y >> 16))); return sum; } @@ -836,8 +905,8 @@ q31_t sum; q31_t r, s; - r = (short) x; - s = (short) y; + r = (q15_t) x; + s = (q15_t) y; r = ((r >> 1) - (y >> 17)); s = (((x >> 17) + (s >> 1)) << 16); @@ -860,8 +929,8 @@ sum = ((sum + - clip_q31_to_q15((q31_t) ((short) (x >> 16) - (short) y))) << 16) + - clip_q31_to_q15((q31_t) ((short) x + (short) (y >> 16))); + clip_q31_to_q15((q31_t) ((q15_t) (x >> 16) - (q15_t) y))) << 16) + + clip_q31_to_q15((q31_t) ((q15_t) x + (q15_t) (y >> 16))); return sum; } @@ -877,8 +946,8 @@ q31_t sum; q31_t r, s; - r = (short) x; - s = (short) y; + r = (q15_t) x; + s = (q15_t) y; r = ((r >> 1) + (y >> 17)); s = (((x >> 17) - (s >> 1)) << 16); @@ -896,8 +965,8 @@ q31_t y) { - return ((q31_t) (((short) x * (short) (y >> 16)) - - ((short) (x >> 16) * (short) y))); + return ((q31_t) (((q15_t) x * (q15_t) (y >> 16)) - + ((q15_t) (x >> 16) * (q15_t) y))); } /* @@ -908,8 +977,8 @@ q31_t y) { - return ((q31_t) (((short) x * (short) (y >> 16)) + - ((short) (x >> 16) * (short) y))); + return ((q31_t) (((q15_t) x * (q15_t) (y >> 16)) + + ((q15_t) (x >> 16) * (q15_t) y))); } /* @@ -941,8 +1010,8 @@ q31_t sum) { - return (sum + ((short) (x >> 16) * (short) (y >> 16)) + - ((short) x * (short) y)); + return (sum + ((q15_t) (x >> 16) * (q15_t) (y >> 16)) + + ((q15_t) x * (q15_t) y)); } /* @@ -954,8 +1023,8 @@ q31_t sum) { - return (sum + ((short) (x >> 16) * (short) (y)) + - ((short) x * (short) (y >> 16))); + return (sum + ((q15_t) (x >> 16) * (q15_t) (y)) + + ((q15_t) x * (q15_t) (y >> 16))); } /* @@ -967,8 +1036,8 @@ q31_t sum) { - return (sum - ((short) (x >> 16) * (short) (y)) + - ((short) x * (short) (y >> 16))); + return (sum - ((q15_t) (x >> 16) * (q15_t) (y)) + + ((q15_t) x * (q15_t) (y >> 16))); } /* @@ -980,8 +1049,8 @@ q63_t sum) { - return (sum + ((short) (x >> 16) * (short) (y >> 16)) + - ((short) x * (short) y)); + return (sum + ((q15_t) (x >> 16) * (q15_t) (y >> 16)) + + ((q15_t) x * (q15_t) y)); } /* @@ -993,8 +1062,8 @@ q63_t sum) { - return (sum + ((short) (x >> 16) * (short) y)) + - ((short) x * (short) (y >> 16)); + return (sum + ((q15_t) (x >> 16) * (q15_t) y)) + + ((q15_t) x * (q15_t) (y >> 16)); } /* @@ -1476,6 +1545,49 @@ const arm_matrix_instance_q31 * pSrcB, arm_matrix_instance_q31 * pDst); + /** + * @brief Floating-point, complex, matrix multiplication. + * @param[in] *pSrcA points to the first input matrix structure + * @param[in] *pSrcB points to the second input matrix structure + * @param[out] *pDst points to output matrix structure + * @return The function returns either + * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking. + */ + + arm_status arm_mat_cmplx_mult_f32( + const arm_matrix_instance_f32 * pSrcA, + const arm_matrix_instance_f32 * pSrcB, + arm_matrix_instance_f32 * pDst); + + /** + * @brief Q15, complex, matrix multiplication. + * @param[in] *pSrcA points to the first input matrix structure + * @param[in] *pSrcB points to the second input matrix structure + * @param[out] *pDst points to output matrix structure + * @return The function returns either + * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking. + */ + + arm_status arm_mat_cmplx_mult_q15( + const arm_matrix_instance_q15 * pSrcA, + const arm_matrix_instance_q15 * pSrcB, + arm_matrix_instance_q15 * pDst, + q15_t * pScratch); + + /** + * @brief Q31, complex, matrix multiplication. + * @param[in] *pSrcA points to the first input matrix structure + * @param[in] *pSrcB points to the second input matrix structure + * @param[out] *pDst points to output matrix structure + * @return The function returns either + * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking. + */ + + arm_status arm_mat_cmplx_mult_q31( + const arm_matrix_instance_q31 * pSrcA, + const arm_matrix_instance_q31 * pSrcB, + arm_matrix_instance_q31 * pDst); + /** * @brief Floating-point matrix transpose. @@ -1534,7 +1646,7 @@ * @param[in] *pSrcA points to the first input matrix structure * @param[in] *pSrcB points to the second input matrix structure * @param[out] *pDst points to output matrix structure - * @param[in] *pState points to the array for storing intermediate results + * @param[in] *pState points to the array for storing intermediate results * @return The function returns either * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking. */ @@ -2046,7 +2158,6 @@ uint16_t bitRevFactor; /**< bit reversal modifier that supports different size FFTs with the same bit reversal table. */ } arm_cfft_radix4_instance_q31; - void arm_cfft_radix4_q31( const arm_cfft_radix4_instance_q31 * S, q31_t * pSrc); @@ -5874,8 +5985,12 @@ #if (__FPU_USED == 1) && defined ( __CC_ARM ) *pOut = __sqrtf(in); #else +#if defined(__GNUC__) + *pOut = __builtin_sqrtf(in); +#else *pOut = sqrtf(in); #endif +#endif return (ARM_MATH_SUCCESS); } @@ -6348,7 +6463,7 @@ void arm_var_q31( q31_t * pSrc, uint32_t blockSize, - q63_t * pResult); + q31_t * pResult); /** * @brief Variance of the elements of a Q15 vector. @@ -6361,7 +6476,7 @@ void arm_var_q15( q15_t * pSrc, uint32_t blockSize, - q31_t * pResult); + q15_t * pResult); /** * @brief Root Mean Square of the elements of a floating-point vector. @@ -7284,6 +7399,24 @@ #define IAR_ONLY_LOW_OPTIMIZATION_EXIT +#elif defined(__CSMC__) // Cosmic +//SMMLA + #define multAcc_32x32_keep32_R(a, x, y) \ + a += (q31_t) (((q63_t) x * y) >> 32) + + //SMMLS + #define multSub_32x32_keep32_R(a, x, y) \ + a -= (q31_t) (((q63_t) x * y) >> 32) + +//SMMUL + #define mult_32x32_keep32_R(a, x, y) \ + a = (q31_t) (((q63_t) x * y ) >> 32) + +#define LOW_OPTIMIZATION_ENTER +#define LOW_OPTIMIZATION_EXIT +#define IAR_ONLY_LOW_OPTIMIZATION_ENTER +#define IAR_ONLY_LOW_OPTIMIZATION_EXIT + #endif