

How to use CMSIS-DSP

Notes on using CMSIS-DSP with RZ/A1 boards (GR-PEACH, GR-LYCHEE, etc.).

  • Tested compilers
    • ARMCompiler5 (ARMCompiler5.06u6)
    • ARMCompiler6.11
    • GNU Tools ARM Embedded (6 2017-q2-update)
    • IAR Systems (Embedded Workbench 8.2). Note that the "When using NEON" configuration fails to build with IAR.

When using only CPU

  1. Copy and update CMSIS Core
    Add the include files at the following URL to your project.
    In an Mbed environment, the same files already exist in "mbed-os\cmsis\TARGET_CORTEX_A" in the mbed-os library, so update those files instead.
  2. Add DSP code
    Add the Include and Source folders at the following URL to your project.
  3. Exclude unnecessary files
    Exclude the following files from compilation.
    (Each of these files only #includes the other .c files in its folder, so compiling it alongside them causes duplicate-definition errors.)
    - Source/BasicMathFunctions/BasicMathFunctions.c
    - Source/CommonTables/CommonTables.c
    - Source/ComplexMathFunctions/ComplexMathFunctions.c
    - Source/ControllerFunctions/ControllerFunctions.c
    - Source/FastMathFunctions/FastMathFunctions.c
    - Source/FilteringFunctions/FilteringFunctions.c
    - Source/MatrixFunctions/MatrixFunctions.c
    - Source/StatisticsFunctions/StatisticsFunctions.c
    - Source/SupportFunctions/SupportFunctions.c
    - Source/TransformFunctions/TransformFunctions.c
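
Once the three steps above are done, a small smoke test can confirm that the library compiles and links. The following is a minimal sketch and not part of the original notes: `USE_CMSIS_DSP` and `dsp_smoke_test` are hypothetical names, and the host-side stand-in for `arm_dot_prod_f32` exists only so the sketch also builds on a PC without the library.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_CMSIS_DSP            /* hypothetical macro: define it in the target project */
#include "arm_math.h"           /* provides float32_t and arm_dot_prod_f32() */
#else
/* Host-side stand-in with the same signature, for building without the library. */
typedef float float32_t;
static void arm_dot_prod_f32(const float32_t *pSrcA, const float32_t *pSrcB,
                             uint32_t blockSize, float32_t *result)
{
    float32_t sum = 0.0f;

    assert(pSrcA && pSrcB && result);
    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrcA[i] * pSrcB[i];
    }
    *result = sum;
}
#endif

/* Returns 1 when the dot product of two known vectors matches the expected value. */
int dsp_smoke_test(void)
{
    const float32_t a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float32_t b[4] = {4.0f, 3.0f, 2.0f, 1.0f};
    float32_t result = 0.0f;

    arm_dot_prod_f32(a, b, 4, &result);   /* 1*4 + 2*3 + 3*2 + 4*1 */
    return result == 20.0f;
}
```

Calling `dsp_smoke_test()` early in `main()` on the target, with `USE_CMSIS_DSP` defined, exercises the include path, the Source folder and the linker settings in one go.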

When using 32-bit SIMD instructions

In addition to "When using only CPU", make the following changes.

  1. Add compiler options
    Add the following to the compiler options:

When using NEON

In addition to "When using 32-bit SIMD instructions", make the following changes. Note that this configuration fails to build with IAR.

  1. Change the FPU option
    Set the compiler FPU option to "-mfpu=neon".
  2. Add compiler options
    Add the following to the compiler options:
  3. Add ComputeLibrary
    Add the ComputeLibrary folder at the following URL to your project.
  4. Fix the file that fails to compile (for GCC)
    The following file causes a type-related compile error and must be corrected:
    - Source/DistanceFunctions/arm_canberra_distance_f32.c
    Refer to the "#if(1) // neon test" blocks for "arm_canberra_distance_f32.c" under "Code correction points".
  5. Fix the file that fails to compile (for Arm Compiler 6)
    Correct the following file:
    - Source/TransformFunctions/arm_bitreversal2.S
    Refer to the "/* Arm Compiler 6 */" branch for "arm_bitreversal2.S" under "Code correction points".
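
The GCC error in step 4 comes from mixing vector types: `vceqq_f32` returns a `uint32x4_t` mask, while the unmodified code stores it in an `int32x4_t` and then passes a `float32x4_t` to `vbicq_s32`. As an alternative to C-style casts, the `vreinterpretq_*` intrinsics express the same bit reinterpretation. The sketch below is an illustration under assumptions, not the fix these notes apply: `zero_where_sum_is_zero` and `zero_lane_model` are hypothetical names, and the scalar model is included so the idea can be checked on a host without NEON.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Clears every lane of recip whose matching lane in sum equals 0.0f. */
static inline float32x4_t zero_where_sum_is_zero(float32x4_t recip, float32x4_t sum)
{
    uint32x4_t isZero = vceqq_f32(sum, vdupq_n_f32(0.0f));  /* all-ones where sum == 0 */
    uint32x4_t bits   = vreinterpretq_u32_f32(recip);       /* same bits, integer type */
    return vreinterpretq_f32_u32(vbicq_u32(bits, isZero));  /* bits AND NOT isZero */
}
#endif

/* Scalar model of one lane of the sequence above (assumes a 32-bit float). */
float zero_lane_model(float recip, float sum)
{
    uint32_t mask = (sum == 0.0f) ? 0xFFFFFFFFu : 0u;  /* what vceqq_f32 yields per lane */
    uint32_t bits;

    memcpy(&bits, &recip, sizeof bits);   /* reinterpret the float's bit pattern */
    bits &= ~mask;                        /* bit clear: keep bits where mask is 0 */
    memcpy(&recip, &bits, sizeof recip);
    return recip;
}
```

With an all-ones mask every bit of the lane is cleared, which is the bit pattern of +0.0f; with a zero mask the lane passes through unchanged.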

Code correction points


arm_canberra_distance_f32.c

float32_t arm_canberra_distance_f32(const float32_t *pA,const float32_t *pB, uint32_t blockSize)
{
   float32_t accum=0.0, tmpA, tmpB,diff,sum;
   uint32_t i,blkCnt;
   float32x4_t a,b,c,d,accumV;
   float32x2_t accumV2;
   int32x4_t   isZeroV;
   float32x4_t zeroV = vdupq_n_f32(0.0);

   accumV = vdupq_n_f32(0.0);

   blkCnt = blockSize >> 2;
   while(blkCnt > 0)
   {
        a = vld1q_f32(pA);
        b = vld1q_f32(pB);

        c = vabdq_f32(a,b);

        a = vabsq_f32(a);
        b = vabsq_f32(b);
        a = vaddq_f32(a,b);
#if(1) // neon test
        /* Corrected: vceqq_f32 returns uint32x4_t, so cast to the type of isZeroV. */
        isZeroV = (int32x4_t)vceqq_f32(a,zeroV);
#else
        isZeroV = vceqq_f32(a,zeroV);
#endif

        /*
         * May divide by zero when a and b have both the same lane at zero.
         */
        a = vinvq_f32(a);
        /*
         * Force result of a division by 0 to 0. It is the behavior of the
         * sklearn canberra function.
         */
#if(1) // neon test
        /* Corrected: vbicq_s32 takes and returns int32x4_t, so cast in and out. */
        a = (float32x4_t)vbicq_s32((int32x4_t)a,isZeroV);
#else
        a = vbicq_s32(a,isZeroV);
#endif
        c = vmulq_f32(c,a);
        accumV = vaddq_f32(accumV,c);

        pA += 4;
        pB += 4;
        blkCnt --;
   }
==  omit ==
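
For checking the NEON path after the correction, a scalar reference is handy. The sketch below computes the quantity the loop above accumulates, the sum of |a - b| / (|a| + |b|) over all elements, and counts a term with a zero denominator as 0, as the comment in the listing describes. `canberra_ref` is a hypothetical helper name, not a CMSIS-DSP function.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar Canberra distance: sum of |a-b| / (|a|+|b|), with 0/0 terms counted as 0. */
float canberra_ref(const float *pA, const float *pB, uint32_t blockSize)
{
    float accum = 0.0f;

    assert(pA != NULL && pB != NULL);
    for (uint32_t i = 0; i < blockSize; i++) {
        float diff = fabsf(pA[i] - pB[i]);
        float sum  = fabsf(pA[i]) + fabsf(pB[i]);
        if (sum != 0.0f) {            /* skip the 0/0 case instead of dividing */
            accum += diff / sum;
        }
    }
    return accum;
}
```

Because the NEON loop multiplies by a reciprocal from `vinvq_f32` rather than dividing, its result may differ slightly from this reference, so a comparison with a small tolerance is safer than an exact equality check.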


arm_bitreversal2.S

== omit ==
#if   defined ( __CC_ARM )     /* Keil */
    #define CODESECT AREA     ||.text||, CODE, READONLY, ALIGN=2
    #define LABEL
#elif defined ( __ARMCC_VERSION ) && ( __ARMCC_VERSION >= 6010050 ) /* Arm Compiler 6 */
    #define CODESECT AREA     ||.text||, CODE, READONLY, ALIGN=2
    #define LABEL
#elif defined ( __IASMARM__ )  /* IAR */
== omit ==
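
The added `#elif` branch relies on how each toolchain identifies itself to the preprocessor: Arm Compiler 5 defines `__CC_ARM`, while Arm Compiler 6 does not and is recognized by `__ARMCC_VERSION >= 6010050`. The same checks can be sketched in C. `toolchain_name` is a hypothetical helper, and the `__ICCARM__` and `__GNUC__` branches are additions for illustration that do not come from the listing above. Arm Compiler 6 is Clang-based and also defines `__GNUC__`, so the Arm checks come first.

```c
/* Returns a short label for the toolchain that compiled this file. */
const char *toolchain_name(void)
{
#if defined(__CC_ARM)
    return "Arm Compiler 5";
#elif defined(__ARMCC_VERSION) && (__ARMCC_VERSION >= 6010050)
    return "Arm Compiler 6";
#elif defined(__ICCARM__)
    return "IAR";
#elif defined(__GNUC__)
    return "GCC or compatible";
#else
    return "unknown";
#endif
}
```

Printing this label at startup is a quick way to confirm which branch of a toolchain-dependent header or assembler file a given build actually takes.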






