EVAL-CN0540-ARDZ

Analog Devices / Mbed OS EVAL-CN0540-ARDZ

Dependencies: platform_drivers LTC26X6 AD77681

Revision 1:9dd7c64b4a64, committed 2021-12-06

Committer: jngarlitos
Date: Mon Dec 06 05:22:28 2021 +0000
Parent: 0:b9debc14d077
Commit message: EVAL-CN0540-ARDZ mbed example program Initial Commit

Changed in this revision

CMSIS-DSP_Lib/Include/arm_common_tables.h
CMSIS-DSP_Lib/Include/arm_const_structs.h
CMSIS-DSP_Lib/Include/arm_math.h
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_f32.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q7.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_f32.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q7.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_f32.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q7.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_f32.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q7.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_f32.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q7.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_f32.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q7.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_f32.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q7.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q7.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_f32.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q15.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q31.c
CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q7.c
CMSIS-DSP_Lib/Source/CommonTables/arm_common_tables.c
CMSIS-DSP_Lib/Source/CommonTables/arm_const_structs.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_f32.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_q15.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_q31.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_f32.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_q15.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_q31.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_f32.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_q15.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_q31.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_f32.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_q15.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_q31.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_f32.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_q15.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_q31.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_f32.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_q15.c
CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_q31.c
CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_f32.c
CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_q15.c
CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_q31.c
CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_f32.c
CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_q15.c
CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_q31.c
CMSIS-DSP_Lib/Source/ControllerFunctions/arm_sin_cos_f32.c
CMSIS-DSP_Lib/Source/ControllerFunctions/arm_sin_cos_q31.c
CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_f32.c
CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_q15.c
CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_q31.c
CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_f32.c
CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_q15.c
CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_q31.c
CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sqrt_q15.c
CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sqrt_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_32x64_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_32x64_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_fast_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_fast_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_f64.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_init_f64.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_stereo_df2T_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_stereo_df2T_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_opt_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_opt_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_opt_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_opt_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_opt_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_opt_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_opt_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_opt_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_opt_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_fast_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_fast_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_fast_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_fast_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q7.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_f32.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_q31.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_q15.c
CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_q31.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_f32.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_q15.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_q31.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_f32.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_q15.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_q31.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_f32.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_q15.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_q31.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_inverse_f32.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_inverse_f64.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_f32.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_fast_q15.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_fast_q31.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_q15.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_q31.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_f32.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_q15.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_q31.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_f32.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_q15.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_q31.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_f32.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_q15.c
CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_q31.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_f32.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q15.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q31.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q7.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_f32.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q15.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q31.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q7.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_f32.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q15.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q31.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q7.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_f32.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q15.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q31.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q7.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_f32.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_q15.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_q31.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_f32.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_q15.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_q31.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_f32.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_q15.c
CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_q31.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_f32.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q15.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q31.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q7.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_f32.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q15.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q31.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q7.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q15.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q31.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q7.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_float.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_q31.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_q7.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_float.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_q15.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_q7.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_float.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_q15.c
CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_q31.c
CMSIS-DSP_Lib/Source/TransformFunctions/arm_bitreversal.c
CMSIS-DSP_Lib/Source/TransformFunctions/arm_cfft_radix4_f32.c
CMSIS-DSP_Lib/Source/TransformFunctions/arm_cfft_radix4_init_f32.c
CMSIS-DSP_Lib/license.txt
CN0540_FFT/cn0540_adi_fft.c
CN0540_FFT/cn0540_adi_fft.h
CN0540_FFT/cn0540_windowing.h
LICENSE.md
LTC26X6.lib
README.txt
cn0540_app_config.h
cn0540_init_params.h
main.cpp
main.h
mbed-os.lib
platform_drivers.lib

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Include/arm_common_tables.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Include/arm_common_tables.h	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,136 @@
+/* ----------------------------------------------------------------------
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.
+*
+* $Date:        19. October 2015
+* $Revision:    V.1.4.5 a
+*
+* Project:      CMSIS DSP Library
+* Title:        arm_common_tables.h
+*
+* Description:  This file has extern declarations for common tables (bit reversal, reciprocal, etc.) that are used across different functions
+*
+* Target Processor: Cortex-M4/Cortex-M3
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#ifndef _ARM_COMMON_TABLES_H
+#define _ARM_COMMON_TABLES_H
+
+#include "arm_math.h"
+
+extern const uint16_t armBitRevTable[1024];
+extern const q15_t armRecipTableQ15[64];
+extern const q31_t armRecipTableQ31[64];
+/* extern const q31_t realCoefAQ31[1024]; */
+/* extern const q31_t realCoefBQ31[1024]; */
+extern const float32_t twiddleCoef_16[32];
+extern const float32_t twiddleCoef_32[64];
+extern const float32_t twiddleCoef_64[128];
+extern const float32_t twiddleCoef_128[256];
+extern const float32_t twiddleCoef_256[512];
+extern const float32_t twiddleCoef_512[1024];
+extern const float32_t twiddleCoef_1024[2048];
+extern const float32_t twiddleCoef_2048[4096];
+extern const float32_t twiddleCoef_4096[8192];
+#define twiddleCoef twiddleCoef_4096
+extern const q31_t twiddleCoef_16_q31[24];
+extern const q31_t twiddleCoef_32_q31[48];
+extern const q31_t twiddleCoef_64_q31[96];
+extern const q31_t twiddleCoef_128_q31[192];
+extern const q31_t twiddleCoef_256_q31[384];
+extern const q31_t twiddleCoef_512_q31[768];
+extern const q31_t twiddleCoef_1024_q31[1536];
+extern const q31_t twiddleCoef_2048_q31[3072];
+extern const q31_t twiddleCoef_4096_q31[6144];
+extern const q15_t twiddleCoef_16_q15[24];
+extern const q15_t twiddleCoef_32_q15[48];
+extern const q15_t twiddleCoef_64_q15[96];
+extern const q15_t twiddleCoef_128_q15[192];
+extern const q15_t twiddleCoef_256_q15[384];
+extern const q15_t twiddleCoef_512_q15[768];
+extern const q15_t twiddleCoef_1024_q15[1536];
+extern const q15_t twiddleCoef_2048_q15[3072];
+extern const q15_t twiddleCoef_4096_q15[6144];
+extern const float32_t twiddleCoef_rfft_32[32];
+extern const float32_t twiddleCoef_rfft_64[64];
+extern const float32_t twiddleCoef_rfft_128[128];
+extern const float32_t twiddleCoef_rfft_256[256];
+extern const float32_t twiddleCoef_rfft_512[512];
+extern const float32_t twiddleCoef_rfft_1024[1024];
+extern const float32_t twiddleCoef_rfft_2048[2048];
+extern const float32_t twiddleCoef_rfft_4096[4096];
+
+
+/* floating-point bit reversal tables */
+#define ARMBITREVINDEXTABLE__16_TABLE_LENGTH ((uint16_t)20  )
+#define ARMBITREVINDEXTABLE__32_TABLE_LENGTH ((uint16_t)48  )
+#define ARMBITREVINDEXTABLE__64_TABLE_LENGTH ((uint16_t)56  )
+#define ARMBITREVINDEXTABLE_128_TABLE_LENGTH ((uint16_t)208 )
+#define ARMBITREVINDEXTABLE_256_TABLE_LENGTH ((uint16_t)440 )
+#define ARMBITREVINDEXTABLE_512_TABLE_LENGTH ((uint16_t)448 )
+#define ARMBITREVINDEXTABLE1024_TABLE_LENGTH ((uint16_t)1800)
+#define ARMBITREVINDEXTABLE2048_TABLE_LENGTH ((uint16_t)3808)
+#define ARMBITREVINDEXTABLE4096_TABLE_LENGTH ((uint16_t)4032)
+
+extern const uint16_t armBitRevIndexTable16[ARMBITREVINDEXTABLE__16_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable32[ARMBITREVINDEXTABLE__32_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable64[ARMBITREVINDEXTABLE__64_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable128[ARMBITREVINDEXTABLE_128_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable256[ARMBITREVINDEXTABLE_256_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable512[ARMBITREVINDEXTABLE_512_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable1024[ARMBITREVINDEXTABLE1024_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable2048[ARMBITREVINDEXTABLE2048_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable4096[ARMBITREVINDEXTABLE4096_TABLE_LENGTH];
+
+/* fixed-point bit reversal tables */
+#define ARMBITREVINDEXTABLE_FIXED___16_TABLE_LENGTH ((uint16_t)12  )
+#define ARMBITREVINDEXTABLE_FIXED___32_TABLE_LENGTH ((uint16_t)24  )
+#define ARMBITREVINDEXTABLE_FIXED___64_TABLE_LENGTH ((uint16_t)56  )
+#define ARMBITREVINDEXTABLE_FIXED__128_TABLE_LENGTH ((uint16_t)112 )
+#define ARMBITREVINDEXTABLE_FIXED__256_TABLE_LENGTH ((uint16_t)240 )
+#define ARMBITREVINDEXTABLE_FIXED__512_TABLE_LENGTH ((uint16_t)480 )
+#define ARMBITREVINDEXTABLE_FIXED_1024_TABLE_LENGTH ((uint16_t)992 )
+#define ARMBITREVINDEXTABLE_FIXED_2048_TABLE_LENGTH ((uint16_t)1984)
+#define ARMBITREVINDEXTABLE_FIXED_4096_TABLE_LENGTH ((uint16_t)4032)
+
+extern const uint16_t armBitRevIndexTable_fixed_16[ARMBITREVINDEXTABLE_FIXED___16_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable_fixed_32[ARMBITREVINDEXTABLE_FIXED___32_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable_fixed_64[ARMBITREVINDEXTABLE_FIXED___64_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable_fixed_128[ARMBITREVINDEXTABLE_FIXED__128_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable_fixed_256[ARMBITREVINDEXTABLE_FIXED__256_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable_fixed_512[ARMBITREVINDEXTABLE_FIXED__512_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable_fixed_1024[ARMBITREVINDEXTABLE_FIXED_1024_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable_fixed_2048[ARMBITREVINDEXTABLE_FIXED_2048_TABLE_LENGTH];
+extern const uint16_t armBitRevIndexTable_fixed_4096[ARMBITREVINDEXTABLE_FIXED_4096_TABLE_LENGTH];
+
+/* Tables for Fast Math Sine and Cosine */
+extern const float32_t sinTable_f32[FAST_MATH_TABLE_SIZE + 1];
+extern const q31_t sinTable_q31[FAST_MATH_TABLE_SIZE + 1];
+extern const q15_t sinTable_q15[FAST_MATH_TABLE_SIZE + 1];
+
+#endif /* _ARM_COMMON_TABLES_H */
\ No newline at end of file
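
Note on the bit-reversal tables declared in arm_common_tables.h: they support the reordering step of the FFT routines. The sketch below only illustrates what bit reversal of an index means for a radix-2 transform. The helper names bit_reverse() and bit_reverse_permute() are hypothetical and are not CMSIS-DSP functions, and the sketch does not reproduce the layout of the armBitRevIndexTable arrays, which this header does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a CMSIS-DSP function): reverses the low
 * `bits` bits of `index`, e.g. binary 0001 becomes 1000 for bits == 4. */
static uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    assert(bits >= 1u && bits <= 32u);
    uint32_t reversed = 0u;
    for (unsigned i = 0u; i < bits; i++) {
        /* Shift the result left and append the current lowest input bit. */
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

/* Hypothetical helper: reorders n = 2^bits real samples in place so that
 * the sample at position i ends up at position bit_reverse(i, bits). */
static void bit_reverse_permute(float *data, unsigned bits)
{
    assert(data != 0 && bits >= 1u && bits < 32u);
    uint32_t n = (uint32_t)1u << bits;
    for (uint32_t i = 0u; i < n; i++) {
        uint32_t j = bit_reverse(i, bits);
        if (j > i) {            /* swap each pair only once */
            float tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

A precomputed table trades flash for speed: the index pairs are looked up at run time instead of being recomputed for every transform.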

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Include/arm_const_structs.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Include/arm_const_structs.h	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,79 @@
+/* ----------------------------------------------------------------------
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.
+*
+* $Date:        19. March 2015
+* $Revision:    V.1.4.5
+*
+* Project:      CMSIS DSP Library
+* Title:        arm_const_structs.h
+*
+* Description:  This file has constant structs that are initialized for
+*              user convenience.  For example, some can be given as
+*              arguments to the arm_cfft_f32() function.
+*
+* Target Processor: Cortex-M4/Cortex-M3
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#ifndef _ARM_CONST_STRUCTS_H
+#define _ARM_CONST_STRUCTS_H
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len16;
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len32;
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len64;
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len128;
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len256;
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len512;
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len1024;
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len2048;
+   extern const arm_cfft_instance_f32 arm_cfft_sR_f32_len4096;
+
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len16;
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len32;
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len64;
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len128;
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len256;
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len512;
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len1024;
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len2048;
+   extern const arm_cfft_instance_q31 arm_cfft_sR_q31_len4096;
+
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len16;
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len32;
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len64;
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len128;
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len256;
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len512;
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len1024;
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len2048;
+   extern const arm_cfft_instance_q15 arm_cfft_sR_q15_len4096;
+
+#endif
\ No newline at end of file
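
Note on arm_const_structs.h: the header declares one constant CFFT instance per supported length, from 16 to 4096 points in powers of two, for each of the f32, q31 and q15 types. A minimal sketch of a length check an application might run before picking an instance follows. The function name cfft_len_is_supported() is hypothetical and is not part of CMSIS-DSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a CMSIS-DSP function): returns true when
 * fft_len matches one of the lengths for which arm_const_structs.h
 * declares an instance, i.e. a power of two from 16 to 4096. */
static bool cfft_len_is_supported(uint32_t fft_len)
{
    bool is_power_of_two = (fft_len != 0u) && ((fft_len & (fft_len - 1u)) == 0u);
    return is_power_of_two && fft_len >= 16u && fft_len <= 4096u;
}
```

In a project that includes arm_math.h, a supported length would then map to the matching instance, for example arm_cfft_sR_f32_len1024 for a 1024-point floating-point transform, which the header's description says can be given as an argument to arm_cfft_f32().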

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Include/arm_math.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Include/arm_math.h	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,7154 @@
+/* ----------------------------------------------------------------------
+* Copyright (C) 2010-2015 ARM Limited. All rights reserved.
+*
+* $Date:        20. October 2015
+* $Revision:    V1.4.5 b
+*
+* Project:      CMSIS DSP Library
+* Title:        arm_math.h
+*
+* Description:  Public header file for CMSIS DSP Library
+*
+* Target Processor: Cortex-M7/Cortex-M4/Cortex-M3/Cortex-M0
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+ * -------------------------------------------------------------------- */
+
+/**
+   \mainpage CMSIS DSP Software Library
+   *
+   * Introduction
+   * ------------
+   *
+   * This user manual describes the CMSIS DSP software library,
+   * a suite of common signal processing functions for use on Cortex-M processor based devices.
+   *
+   * The library is divided into a number of functions each covering a specific category:
+   * - Basic math functions
+   * - Fast math functions
+   * - Complex math functions
+   * - Filters
+   * - Matrix functions
+   * - Transforms
+   * - Motor control functions
+   * - Statistical functions
+   * - Support functions
+   * - Interpolation functions
+   *
+   * The library has separate functions for operating on 8-bit integers, 16-bit integers,
+   * 32-bit integers and 32-bit floating-point values.
+   *
+   * Using the Library
+   * ------------
+   *
+   * The library installer contains prebuilt versions of the libraries in the <code>Lib</code> folder.
+   * - arm_cortexM7lfdp_math.lib (Little endian and Double Precision Floating Point Unit on Cortex-M7)
+   * - arm_cortexM7bfdp_math.lib (Big endian and Double Precision Floating Point Unit on Cortex-M7)
+   * - arm_cortexM7lfsp_math.lib (Little endian and Single Precision Floating Point Unit on Cortex-M7)
+   * - arm_cortexM7bfsp_math.lib (Big endian and Single Precision Floating Point Unit on Cortex-M7)
+   * - arm_cortexM7l_math.lib (Little endian on Cortex-M7)
+   * - arm_cortexM7b_math.lib (Big endian on Cortex-M7)
+   * - arm_cortexM4lf_math.lib (Little endian and Floating Point Unit on Cortex-M4)
+   * - arm_cortexM4bf_math.lib (Big endian and Floating Point Unit on Cortex-M4)
+   * - arm_cortexM4l_math.lib (Little endian on Cortex-M4)
+   * - arm_cortexM4b_math.lib (Big endian on Cortex-M4)
+   * - arm_cortexM3l_math.lib (Little endian on Cortex-M3)
+   * - arm_cortexM3b_math.lib (Big endian on Cortex-M3)
+   * - arm_cortexM0l_math.lib (Little endian on Cortex-M0 / CortexM0+)
+   * - arm_cortexM0b_math.lib (Big endian on Cortex-M0 / CortexM0+)
+   *
+   * The library functions are declared in the public file <code>arm_math.h</code> which is placed in the <code>Include</code> folder.
+   * Simply include this file, link the appropriate library in the application, and begin calling the library functions. The library provides a single
+   * public header file <code>arm_math.h</code> for Cortex-M7/M4/M3/M0/M0+ targets, both little endian and big endian. The same header file is used for floating point unit (FPU) variants.
+   * Define the appropriate pre-processor macro ARM_MATH_CM7, ARM_MATH_CM4, ARM_MATH_CM3,
+   * ARM_MATH_CM0 or ARM_MATH_CM0PLUS in the application, depending on the target processor.
+   *
+   * Examples
+   * --------
+   *
+   * The library ships with a number of examples which demonstrate how to use the library functions.
+   *
+   * Toolchain Support
+   * ------------
+   *
+   * The library has been developed and tested with MDK-ARM version 5.14.0.0.
+   * The library is being tested in GCC and IAR toolchains and updates on this activity will be made available shortly.
+   *
+   * Building the Library
+   * ------------
+   *
+   * The library installer contains a project file to rebuild the libraries with the MDK-ARM toolchain in the <code>CMSIS\\DSP_Lib\\Source\\ARM</code> folder.
+   * - arm_cortexM_math.uvprojx
+   *
+   *
+   * The libraries can be built by opening the arm_cortexM_math.uvprojx project in MDK-ARM, selecting a specific target, and defining the optional pre-processor macros detailed above.
+   *
+   * Pre-processor Macros
+   * ------------
+   *
+   * Each library project has different pre-processor macros.
+   *
+   * - UNALIGNED_SUPPORT_DISABLE:
+   *
+   * Define macro UNALIGNED_SUPPORT_DISABLE if the silicon does not support unaligned memory access.
+   *
+   * - ARM_MATH_BIG_ENDIAN:
+   *
+   * Define macro ARM_MATH_BIG_ENDIAN to build the library for big endian targets. By default the library builds for little endian targets.
+   *
+   * - ARM_MATH_MATRIX_CHECK:
+   *
+   * Define macro ARM_MATH_MATRIX_CHECK for checking on the input and output sizes of matrices
+   *
+   * - ARM_MATH_ROUNDING:
+   *
+   * Define macro ARM_MATH_ROUNDING for rounding on support functions
+   *
+   * - ARM_MATH_CMx:
+   *
+   * Define macro ARM_MATH_CM4 for building the library on a Cortex-M4 target, ARM_MATH_CM3 for building the library on a Cortex-M3 target,
+   * ARM_MATH_CM0 for building the library on a Cortex-M0 target, ARM_MATH_CM0PLUS for building the library on a Cortex-M0+ target, and
+   * ARM_MATH_CM7 for building the library on a Cortex-M7 target.
+   *
+   * - __FPU_PRESENT:
+   *
+   * Set macro __FPU_PRESENT = 1 when building for targets with an FPU. Enable this macro for the M4bf and M4lf libraries.
+   *
+   * <hr>
+   * CMSIS-DSP in ARM::CMSIS Pack
+   * -----------------------------
+   *
+   * The following files relevant to CMSIS-DSP are present in the <b>ARM::CMSIS</b> Pack directories:
+   * |File/Folder                   |Content                                                                 |
+   * |------------------------------|------------------------------------------------------------------------|
+   * |\b CMSIS\\Documentation\\DSP  | This documentation                                                     |
+   * |\b CMSIS\\DSP_Lib             | Software license agreement (license.txt)                               |
+   * |\b CMSIS\\DSP_Lib\\Examples   | Example projects demonstrating the usage of the library functions      |
+   * |\b CMSIS\\DSP_Lib\\Source     | Source files for rebuilding the library                                |
+   *
+   * <hr>
+   * Revision History of CMSIS-DSP
+   * ------------
+   * Please refer to \ref ChangeLog_pg.
+   *
+   * Copyright Notice
+   * ------------
+   *
+   * Copyright (C) 2010-2015 ARM Limited. All rights reserved.
+   */
+
+
+/**
+ * @defgroup groupMath Basic Math Functions
+ */
+
+/**
+ * @defgroup groupFastMath Fast Math Functions
+ * This set of functions provides a fast approximation to sine, cosine, and square root.
+ * As compared to most of the other functions in the CMSIS math library, the fast math functions
+ * operate on individual values and not arrays.
+ * There are separate functions for Q15, Q31, and floating-point data.
+ *
+ */
+
+/**
+ * @defgroup groupCmplxMath Complex Math Functions
+ * This set of functions operates on complex data vectors.
+ * The data in the complex arrays is stored in an interleaved fashion
+ * (real, imag, real, imag, ...).
+ * In the API functions, the number of samples in a complex array refers
+ * to the number of complex values; the array contains twice this number of
+ * real values.
+ */
+
+/**
+ * @defgroup groupFilters Filtering Functions
+ */
+
+/**
+ * @defgroup groupMatrix Matrix Functions
+ *
+ * This set of functions provides basic matrix math operations.
+ * The functions operate on matrix data structures.  For example,
+ * the type
+ * definition for the floating-point matrix structure is shown
+ * below:
+ * <pre>
+ *     typedef struct
+ *     {
+ *       uint16_t numRows;     // number of rows of the matrix.
+ *       uint16_t numCols;     // number of columns of the matrix.
+ *       float32_t *pData;     // points to the data of the matrix.
+ *     } arm_matrix_instance_f32;
+ * </pre>
+ * There are similar definitions for Q15 and Q31 data types.
+ *
+ * The structure specifies the size of the matrix and then points to
+ * an array of data.  The array is of size <code>numRows X numCols</code>
+ * and the values are arranged in row order.  That is, the
+ * matrix element (i, j) is stored at:
+ * <pre>
+ *     pData[i*numCols + j]
+ * </pre>
+ *
+ * \par Init Functions
+ * There is an associated initialization function for each type of matrix
+ * data structure.
+ * The initialization function sets the values of the internal structure fields.
+ * Refer to the function <code>arm_mat_init_f32()</code>, <code>arm_mat_init_q31()</code>
+ * and <code>arm_mat_init_q15()</code> for floating-point, Q31 and Q15 types,  respectively.
+ *
+ * \par
+ * Use of the initialization function is optional. However, if the initialization function is used
+ * then the instance structure cannot be placed into a const data section.
+ * To place the instance structure in a const data
+ * section, manually initialize the data structure.  For example:
+ * <pre>
+ * <code>arm_matrix_instance_f32 S = {nRows, nColumns, pData};</code>
+ * <code>arm_matrix_instance_q31 S = {nRows, nColumns, pData};</code>
+ * <code>arm_matrix_instance_q15 S = {nRows, nColumns, pData};</code>
+ * </pre>
+ * where <code>nRows</code> specifies the number of rows, <code>nColumns</code>
+ * specifies the number of columns, and <code>pData</code> points to the
+ * data array.
+ *
+ * \par Size Checking
+ * By default all of the matrix functions perform size checking on the input and
+ * output matrices.  For example, the matrix addition function verifies that the
+ * two input matrices and the output matrix all have the same number of rows and
+ * columns.  If the size check fails the functions return:
+ * <pre>
+ *     ARM_MATH_SIZE_MISMATCH
+ * </pre>
+ * Otherwise the functions return
+ * <pre>
+ *     ARM_MATH_SUCCESS
+ * </pre>
+ * There is some overhead associated with this matrix size checking.
+ * The matrix size checking is enabled via the \#define
+ * <pre>
+ *     ARM_MATH_MATRIX_CHECK
+ * </pre>
+ * within the library project settings.  By default this macro is defined
+ * and size checking is enabled.  By changing the project settings and
+ * undefining this macro size checking is eliminated and the functions
+ * run a bit faster.  With size checking disabled the functions always
+ * return <code>ARM_MATH_SUCCESS</code>.
+ */
+
+/**
+ * @defgroup groupTransforms Transform Functions
+ */
+
+/**
+ * @defgroup groupController Controller Functions
+ */
+
+/**
+ * @defgroup groupStats Statistics Functions
+ */
+/**
+ * @defgroup groupSupport Support Functions
+ */
+
+/**
+ * @defgroup groupInterpolation Interpolation Functions
+ * These functions perform 1- and 2-dimensional interpolation of data.
+ * Linear interpolation is used for 1-dimensional data and
+ * bilinear interpolation is used for 2-dimensional data.
+ */
+
+/**
+ * @defgroup groupExamples Examples
+ */
+#ifndef _ARM_MATH_H
+#define _ARM_MATH_H
+
+/* ignore some GCC warnings */
+#if defined ( __GNUC__ )
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wsign-conversion"
+#pragma GCC diagnostic ignored "-Wconversion"
+#pragma GCC diagnostic ignored "-Wunused-parameter"
+#endif
+
+#define __CMSIS_GENERIC         /* disable NVIC and Systick functions */
+
+#if defined(ARM_MATH_CM7)
+  #include "core_cm7.h"
+#elif defined (ARM_MATH_CM4)
+  #include "core_cm4.h"
+#elif defined (ARM_MATH_CM3)
+  #include "core_cm3.h"
+#elif defined (ARM_MATH_CM0)
+  #include "core_cm0.h"
+  #define ARM_MATH_CM0_FAMILY
+#elif defined (ARM_MATH_CM0PLUS)
+  #include "core_cm0plus.h"
+  #define ARM_MATH_CM0_FAMILY
+#else
+  #error "Define according the used Cortex core ARM_MATH_CM7, ARM_MATH_CM4, ARM_MATH_CM3, ARM_MATH_CM0PLUS or ARM_MATH_CM0"
+#endif
+
+#undef  __CMSIS_GENERIC         /* enable NVIC and Systick functions */
+#include "string.h"
+#include "math.h"
+#ifdef   __cplusplus
+extern "C"
+{
+#endif
+
+
+  /**
+   * @brief Macros required for reciprocal calculation in Normalized LMS
+   */
+
+#define DELTA_Q31          (0x100)
+#define DELTA_Q15          0x5
+#define INDEX_MASK         0x0000003F
+#ifndef PI
+#define PI                 3.14159265358979f
+#endif
+
+  /**
+   * @brief Macros required for SINE and COSINE Fast math approximations
+   */
+
+#define FAST_MATH_TABLE_SIZE  512
+#define FAST_MATH_Q31_SHIFT   (32 - 10)
+#define FAST_MATH_Q15_SHIFT   (16 - 10)
+#define CONTROLLER_Q31_SHIFT  (32 - 9)
+#define TABLE_SIZE  256
+#define TABLE_SPACING_Q31     0x400000
+#define TABLE_SPACING_Q15     0x80
+
+  /**
+   * @brief Macros required for SINE and COSINE Controller functions
+   */
+  /* 1.31(q31) Fixed value of 2/360 */
+  /* -1 to +1 is divided into 360 values so total spacing is (2/360) */
+#define INPUT_SPACING         0xB60B61
+
+  /**
+   * @brief Macro for Unaligned Support
+   */
+#ifndef UNALIGNED_SUPPORT_DISABLE
+    #define ALIGN4
+#else
+  #if defined  (__GNUC__)
+    #define ALIGN4 __attribute__((aligned(4)))
+  #else
+    #define ALIGN4 __align(4)
+  #endif
+#endif   /* #ifndef UNALIGNED_SUPPORT_DISABLE */
+
+  /**
+   * @brief Error status returned by some functions in the library.
+   */
+
+  typedef enum
+  {
+    ARM_MATH_SUCCESS = 0,                /**< No error */
+    ARM_MATH_ARGUMENT_ERROR = -1,        /**< One or more arguments are incorrect */
+    ARM_MATH_LENGTH_ERROR = -2,          /**< Length of data buffer is incorrect */
+    ARM_MATH_SIZE_MISMATCH = -3,         /**< Size of matrices is not compatible with the operation. */
+    ARM_MATH_NANINF = -4,                /**< Not-a-number (NaN) or infinity is generated */
+    ARM_MATH_SINGULAR = -5,              /**< Generated by matrix inversion if the input matrix is singular and cannot be inverted. */
+    ARM_MATH_TEST_FAILURE = -6           /**< Test Failed  */
+  } arm_status;
+
+  /**
+   * @brief 8-bit fractional data type in 1.7 format.
+   */
+  typedef int8_t q7_t;
+
+  /**
+   * @brief 16-bit fractional data type in 1.15 format.
+   */
+  typedef int16_t q15_t;
+
+  /**
+   * @brief 32-bit fractional data type in 1.31 format.
+   */
+  typedef int32_t q31_t;
+
+  /**
+   * @brief 64-bit fractional data type in 1.63 format.
+   */
+  typedef int64_t q63_t;
+
+  /**
+   * @brief 32-bit floating-point type definition.
+   */
+  typedef float float32_t;
+
+  /**
+   * @brief 64-bit floating-point type definition.
+   */
+  typedef double float64_t;
+
+  /**
+   * @brief definition to read/write two 16 bit values.
+   */
+#if defined __CC_ARM
+  #define __SIMD32_TYPE int32_t __packed
+  #define CMSIS_UNUSED __attribute__((unused))
+
+#elif defined(__ARMCC_VERSION) && (__ARMCC_VERSION >= 6010050)
+  #define __SIMD32_TYPE int32_t
+  #define CMSIS_UNUSED __attribute__((unused))
+
+#elif defined __GNUC__
+  #define __SIMD32_TYPE int32_t
+  #define CMSIS_UNUSED __attribute__((unused))
+
+#elif defined __ICCARM__
+  #define __SIMD32_TYPE int32_t __packed
+  #define CMSIS_UNUSED
+
+#elif defined __CSMC__
+  #define __SIMD32_TYPE int32_t
+  #define CMSIS_UNUSED
+
+#elif defined __TASKING__
+  #define __SIMD32_TYPE __unaligned int32_t
+  #define CMSIS_UNUSED
+
+#else
+  #error Unknown compiler
+#endif
+
+#define __SIMD32(addr)        (*(__SIMD32_TYPE **) & (addr))
+#define __SIMD32_CONST(addr)  ((__SIMD32_TYPE *)(addr))
+#define _SIMD32_OFFSET(addr)  (*(__SIMD32_TYPE *)  (addr))
+#define __SIMD64(addr)        (*(int64_t **) & (addr))
+
+#if defined (ARM_MATH_CM3) || defined (ARM_MATH_CM0_FAMILY)
+  /**
+   * @brief definition to pack two 16 bit values.
+   */
+#define __PKHBT(ARG1, ARG2, ARG3)      ( (((int32_t)(ARG1) <<  0) & (int32_t)0x0000FFFF) | \
+                                         (((int32_t)(ARG2) << ARG3) & (int32_t)0xFFFF0000)  )
+#define __PKHTB(ARG1, ARG2, ARG3)      ( (((int32_t)(ARG1) <<  0) & (int32_t)0xFFFF0000) | \
+                                         (((int32_t)(ARG2) >> ARG3) & (int32_t)0x0000FFFF)  )
+
+#endif
+
+
+   /**
+   * @brief definition to pack four 8 bit values.
+   */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+#define __PACKq7(v0,v1,v2,v3) ( (((int32_t)(v0) <<  0) & (int32_t)0x000000FF) | \
+                                (((int32_t)(v1) <<  8) & (int32_t)0x0000FF00) | \
+                                (((int32_t)(v2) << 16) & (int32_t)0x00FF0000) | \
+                                (((int32_t)(v3) << 24) & (int32_t)0xFF000000)  )
+#else
+
+#define __PACKq7(v0,v1,v2,v3) ( (((int32_t)(v3) <<  0) & (int32_t)0x000000FF) | \
+                                (((int32_t)(v2) <<  8) & (int32_t)0x0000FF00) | \
+                                (((int32_t)(v1) << 16) & (int32_t)0x00FF0000) | \
+                                (((int32_t)(v0) << 24) & (int32_t)0xFF000000)  )
+
+#endif
+
+
+  /**
+   * @brief Clips Q63 to Q31 values.
+   */
+  static __INLINE q31_t clip_q63_to_q31(
+  q63_t x)
+  {
+    return ((q31_t) (x >> 32) != ((q31_t) x >> 31)) ?
+      ((0x7FFFFFFF ^ ((q31_t) (x >> 63)))) : (q31_t) x;
+  }
+
+  /**
+   * @brief Clips Q63 to Q15 values.
+   */
+  static __INLINE q15_t clip_q63_to_q15(
+  q63_t x)
+  {
+    return ((q31_t) (x >> 32) != ((q31_t) x >> 31)) ?
+      ((0x7FFF ^ ((q15_t) (x >> 63)))) : (q15_t) (x >> 15);
+  }
+
+  /**
+   * @brief Clips Q31 to Q7 values.
+   */
+  static __INLINE q7_t clip_q31_to_q7(
+  q31_t x)
+  {
+    return ((q31_t) (x >> 24) != ((q31_t) x >> 23)) ?
+      ((0x7F ^ ((q7_t) (x >> 31)))) : (q7_t) x;
+  }
+
+  /**
+   * @brief Clips Q31 to Q15 values.
+   */
+  static __INLINE q15_t clip_q31_to_q15(
+  q31_t x)
+  {
+    return ((q31_t) (x >> 16) != ((q31_t) x >> 15)) ?
+      ((0x7FFF ^ ((q15_t) (x >> 31)))) : (q15_t) x;
+  }
+
+  /**
+   * @brief Multiplies a 64-bit value by a 32-bit value; returns the product shifted right by 32 bits as a q63_t.
+   */
+
+  static __INLINE q63_t mult32x64(
+  q63_t x,
+  q31_t y)
+  {
+    return ((((q63_t) (x & 0x00000000FFFFFFFF) * y) >> 32) +
+            (((q63_t) (x >> 32) * y)));
+  }
+
+/*
+  #if defined (ARM_MATH_CM0_FAMILY) && defined ( __CC_ARM   )
+  #define __CLZ __clz
+  #endif
+ */
+/* Note: this function can be removed once all toolchains support __CLZ for Cortex-M0. */
+#if defined (ARM_MATH_CM0_FAMILY) && ((defined (__ICCARM__))  )
+  static __INLINE uint32_t __CLZ(
+  q31_t data);
+
+  static __INLINE uint32_t __CLZ(
+  q31_t data)
+  {
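+    uint32_t count = 0;
+    uint32_t mask = 0x80000000;
+
+    /* A zero input has no set bit: return 32 instead of looping forever. */
+    if(data == 0)
+    {
+      return (32u);
+    }
+
+    while((data & mask) == 0)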
+    {
+      count += 1u;
+      mask = mask >> 1u;
+    }
+
+    return (count);
+  }
+#endif
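Review note: a software count-leading-zeros fallback of this kind walks a one-bit mask down from bit 31 until it hits a set bit. A minimal standalone sketch follows. The explicit zero check, which returns 32 so the loop cannot run past the last bit, is an assumption of this sketch, and `clz32_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable count-leading-zeros for a 32-bit value. */
static uint32_t clz32_sketch(uint32_t data)
{
  uint32_t count = 0u;
  uint32_t mask = 0x80000000u;

  /* No bit is set, so every one of the 32 bits is a leading zero. */
  if (data == 0u) {
    return 32u;
  }

  /* Move the mask towards bit 0 until it lands on the first set bit. */
  while ((data & mask) == 0u) {
    count++;
    mask >>= 1;
  }
  return count;
}

/* Usage example. */
static void clz32_sketch_demo(void)
{
  assert(clz32_sketch(0u) == 32u);
  assert(clz32_sketch(1u) == 31u);
  assert(clz32_sketch(0x00010000u) == 15u);
  assert(clz32_sketch(0x80000000u) == 0u);
}
```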
+
+  /**
+   * @brief Function to calculate the reciprocal (1/in) of a Q31 value.
+   */
+
+  static __INLINE uint32_t arm_recip_q31(
+  q31_t in,
+  q31_t * dst,
+  q31_t * pRecipTable)
+  {
+    q31_t out;
+    uint32_t tempVal;
+    uint32_t index, i;
+    uint32_t signBits;
+
+    if(in > 0)
+    {
+      signBits = ((uint32_t) (__CLZ( in) - 1));
+    }
+    else
+    {
+      signBits = ((uint32_t) (__CLZ(-in) - 1));
+    }
+
+    /* Convert input sample to 1.31 format */
+    in = (in << signBits);
+
+    /* calculation of index for initial approximated Val */
+    index = (uint32_t)(in >> 24);
+    index = (index & INDEX_MASK);
+
+    /* 1.31 with exp 1 */
+    out = pRecipTable[index];
+
+    /* calculation of reciprocal value */
+    /* running approximation for two iterations */
+    for (i = 0u; i < 2u; i++)
+    {
+      tempVal = (uint32_t) (((q63_t) in * out) >> 31);
+      tempVal = 0x7FFFFFFFu - tempVal;
+      /*      1.31 with exp 1 */
+      /* out = (q31_t) (((q63_t) out * tempVal) >> 30); */
+      out = clip_q63_to_q31(((q63_t) out * tempVal) >> 30);
+    }
+
+    /* write output */
+    *dst = out;
+
+    /* return num of signbits of out = 1/in value */
+    return (signBits + 1u);
+  }
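Review note: the two-iteration loop in `arm_recip_q31` refines the table lookup with what reads as a Newton-Raphson step for the reciprocal, r ← r·(2 − in·r), carried out in fixed point with an extra exponent bit. The sketch below shows the same recurrence in double precision only to illustrate how quickly it converges. It is not a fixed-point replacement, and `recip_newton_sketch` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical floating-point illustration of the reciprocal refinement:
 * each step roughly squares the relative error of the current estimate r. */
static double recip_newton_sketch(double a, double r0, unsigned iterations)
{
  double r = r0;
  unsigned i;

  for (i = 0u; i < iterations; i++) {
    r = r * (2.0 - a * r);
  }
  return r;
}

/* Usage example: start from a crude guess of 1.0 for 1/0.75. */
static void recip_newton_sketch_demo(void)
{
  double r = recip_newton_sketch(0.75, 1.0, 5u);
  double err = r - (1.0 / 0.75);

  if (err < 0.0) {
    err = -err;
  }
  assert(err < 1e-12);
}
```

The initial guess has to be close enough for the iteration to converge (0 < in·r0 < 2), which is the job of the lookup table in the header version.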
+
+
+  /**
+   * @brief Function to calculate the reciprocal (1/in) of a Q15 value.
+   */
+  static __INLINE uint32_t arm_recip_q15(
+  q15_t in,
+  q15_t * dst,
+  q15_t * pRecipTable)
+  {
+    q15_t out = 0;
+    uint32_t tempVal = 0;
+    uint32_t index = 0, i = 0;
+    uint32_t signBits = 0;
+
+    if(in > 0)
+    {
+      signBits = ((uint32_t)(__CLZ( in) - 17));
+    }
+    else
+    {
+      signBits = ((uint32_t)(__CLZ(-in) - 17));
+    }
+
+    /* Convert input sample to 1.15 format */
+    in = (in << signBits);
+
+    /* calculation of index for initial approximated Val */
+    index = (uint32_t)(in >>  8);
+    index = (index & INDEX_MASK);
+
+    /*      1.15 with exp 1  */
+    out = pRecipTable[index];
+
+    /* calculation of reciprocal value */
+    /* running approximation for two iterations */
+    for (i = 0u; i < 2u; i++)
+    {
+      tempVal = (uint32_t) (((q31_t) in * out) >> 15);
+      tempVal = 0x7FFFu - tempVal;
+      /*      1.15 with exp 1 */
+      out = (q15_t) (((q31_t) out * tempVal) >> 14);
+      /* out = clip_q31_to_q15(((q31_t) out * tempVal) >> 14); */
+    }
+
+    /* write output */
+    *dst = out;
+
+    /* return num of signbits of out = 1/in value */
+    return (signBits + 1);
+  }
+
+
+  /*
+   * @brief C custom defined intrinsic function for M0 processors only
+   */
+#if defined(ARM_MATH_CM0_FAMILY)
+  static __INLINE q31_t __SSAT(
+  q31_t x,
+  uint32_t y)
+  {
+    int32_t posMax, negMin;
+    uint32_t i;
+
+    posMax = 1;
+    for (i = 0; i < (y - 1); i++)
+    {
+      posMax = posMax * 2;
+    }
+
+    if(x > 0)
+    {
+      posMax = (posMax - 1);
+
+      if(x > posMax)
+      {
+        x = posMax;
+      }
+    }
+    else
+    {
+      negMin = -posMax;
+
+      if(x < negMin)
+      {
+        x = negMin;
+      }
+    }
+    return (x);
+  }
+#endif /* end of ARM_MATH_CM0_FAMILY */
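Review note: the C fallback for `__SSAT` clamps `x` to the signed range representable in `y` bits, that is [-2^(y-1), 2^(y-1) - 1]. A compact standalone sketch of the same clamp follows. It computes the limits with a shift rather than a loop; `ssat_sketch` is a hypothetical name and the accepted bit range is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signed saturation of x to a 'bits'-wide two's complement range. */
static int32_t ssat_sketch(int32_t x, uint32_t bits)
{
  int32_t max;
  int32_t min;

  assert(bits >= 1u && bits <= 32u);

  max = (int32_t)((1u << (bits - 1u)) - 1u);  /*  2^(bits-1) - 1 */
  min = -max - 1;                             /* -2^(bits-1)     */

  if (x > max) { return max; }
  if (x < min) { return min; }
  return x;
}

/* Usage example for 8-bit and 16-bit saturation. */
static void ssat_sketch_demo(void)
{
  assert(ssat_sketch(200, 8u) == 127);
  assert(ssat_sketch(-200, 8u) == -128);
  assert(ssat_sketch(5, 8u) == 5);
  assert(ssat_sketch(40000, 16u) == 32767);
}
```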
+
+
+  /*
+   * @brief C custom defined intrinsic function for M3 and M0 processors
+   */
+#if defined (ARM_MATH_CM3) || defined (ARM_MATH_CM0_FAMILY)
+
+  /*
+   * @brief C custom defined QADD8 for M3 and M0 processors
+   */
+  static __INLINE uint32_t __QADD8(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s, t, u;
+
+    r = __SSAT(((((q31_t)x << 24) >> 24) + (((q31_t)y << 24) >> 24)), 8) & (int32_t)0x000000FF;
+    s = __SSAT(((((q31_t)x << 16) >> 24) + (((q31_t)y << 16) >> 24)), 8) & (int32_t)0x000000FF;
+    t = __SSAT(((((q31_t)x <<  8) >> 24) + (((q31_t)y <<  8) >> 24)), 8) & (int32_t)0x000000FF;
+    u = __SSAT(((((q31_t)x      ) >> 24) + (((q31_t)y      ) >> 24)), 8) & (int32_t)0x000000FF;
+
+    return ((uint32_t)((u << 24) | (t << 16) | (s <<  8) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined QSUB8 for M3 and M0 processors
+   */
+  static __INLINE uint32_t __QSUB8(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s, t, u;
+
+    r = __SSAT(((((q31_t)x << 24) >> 24) - (((q31_t)y << 24) >> 24)), 8) & (int32_t)0x000000FF;
+    s = __SSAT(((((q31_t)x << 16) >> 24) - (((q31_t)y << 16) >> 24)), 8) & (int32_t)0x000000FF;
+    t = __SSAT(((((q31_t)x <<  8) >> 24) - (((q31_t)y <<  8) >> 24)), 8) & (int32_t)0x000000FF;
+    u = __SSAT(((((q31_t)x      ) >> 24) - (((q31_t)y      ) >> 24)), 8) & (int32_t)0x000000FF;
+
+    return ((uint32_t)((u << 24) | (t << 16) | (s <<  8) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined QADD16 for M3 and M0 processors
+   */
+  static __INLINE uint32_t __QADD16(
+  uint32_t x,
+  uint32_t y)
+  {
+/*  r and s are initialised explicitly: without initialisation the 'arm_offset_q15' test fails with armCC, although the 'intrinsic' tests pass. */
+    q31_t r = 0, s = 0;
+
+    r = __SSAT(((((q31_t)x << 16) >> 16) + (((q31_t)y << 16) >> 16)), 16) & (int32_t)0x0000FFFF;
+    s = __SSAT(((((q31_t)x      ) >> 16) + (((q31_t)y      ) >> 16)), 16) & (int32_t)0x0000FFFF;
+
+    return ((uint32_t)((s << 16) | (r      )));
+  }
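Review note: the `__QADD16` fallback treats each 32-bit operand as two independent signed 16-bit lanes, adds lane by lane, saturates each sum to 16 bits and repacks. The standalone sketch below spells out those steps without relying on shifts of negative values. All names ending in `_sketch` or `_demo` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of h into an int32_t. */
static int32_t sext16_sketch(uint32_t h)
{
  h &= 0xFFFFu;
  return (h & 0x8000u) ? (int32_t)h - 65536 : (int32_t)h;
}

/* Clamp v to the int16_t range. */
static int32_t sat16_sketch(int32_t v)
{
  if (v > 32767) { return 32767; }
  if (v < -32768) { return -32768; }
  return v;
}

/* Hypothetical lane-wise saturating add of two packed 16-bit pairs. */
static uint32_t qadd16_sketch(uint32_t x, uint32_t y)
{
  int32_t lo = sat16_sketch(sext16_sketch(x) + sext16_sketch(y));
  int32_t hi = sat16_sketch(sext16_sketch(x >> 16) + sext16_sketch(y >> 16));

  return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}

/* Usage example: the high lane saturates, the low lane does not. */
static void qadd16_sketch_demo(void)
{
  assert(qadd16_sketch(0x7FFF0001u, 0x00010001u) == 0x7FFF0002u);
  assert(qadd16_sketch(0x8000FFFFu, 0xFFFF0001u) == 0x80000000u);
}
```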
+
+
+  /*
+   * @brief C custom defined SHADD16 for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SHADD16(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s;
+
+    r = (((((q31_t)x << 16) >> 16) + (((q31_t)y << 16) >> 16)) >> 1) & (int32_t)0x0000FFFF;
+    s = (((((q31_t)x      ) >> 16) + (((q31_t)y      ) >> 16)) >> 1) & (int32_t)0x0000FFFF;
+
+    return ((uint32_t)((s << 16) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined QSUB16 for M3 and M0 processors
+   */
+  static __INLINE uint32_t __QSUB16(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s;
+
+    r = __SSAT(((((q31_t)x << 16) >> 16) - (((q31_t)y << 16) >> 16)), 16) & (int32_t)0x0000FFFF;
+    s = __SSAT(((((q31_t)x      ) >> 16) - (((q31_t)y      ) >> 16)), 16) & (int32_t)0x0000FFFF;
+
+    return ((uint32_t)((s << 16) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined SHSUB16 for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SHSUB16(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s;
+
+    r = (((((q31_t)x << 16) >> 16) - (((q31_t)y << 16) >> 16)) >> 1) & (int32_t)0x0000FFFF;
+    s = (((((q31_t)x      ) >> 16) - (((q31_t)y      ) >> 16)) >> 1) & (int32_t)0x0000FFFF;
+
+    return ((uint32_t)((s << 16) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined QASX for M3 and M0 processors
+   */
+  static __INLINE uint32_t __QASX(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s;
+
+    r = __SSAT(((((q31_t)x << 16) >> 16) - (((q31_t)y      ) >> 16)), 16) & (int32_t)0x0000FFFF;
+    s = __SSAT(((((q31_t)x      ) >> 16) + (((q31_t)y << 16) >> 16)), 16) & (int32_t)0x0000FFFF;
+
+    return ((uint32_t)((s << 16) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined SHASX for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SHASX(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s;
+
+    r = (((((q31_t)x << 16) >> 16) - (((q31_t)y      ) >> 16)) >> 1) & (int32_t)0x0000FFFF;
+    s = (((((q31_t)x      ) >> 16) + (((q31_t)y << 16) >> 16)) >> 1) & (int32_t)0x0000FFFF;
+
+    return ((uint32_t)((s << 16) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined QSAX for M3 and M0 processors
+   */
+  static __INLINE uint32_t __QSAX(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s;
+
+    r = __SSAT(((((q31_t)x << 16) >> 16) + (((q31_t)y      ) >> 16)), 16) & (int32_t)0x0000FFFF;
+    s = __SSAT(((((q31_t)x      ) >> 16) - (((q31_t)y << 16) >> 16)), 16) & (int32_t)0x0000FFFF;
+
+    return ((uint32_t)((s << 16) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined SHSAX for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SHSAX(
+  uint32_t x,
+  uint32_t y)
+  {
+    q31_t r, s;
+
+    r = (((((q31_t)x << 16) >> 16) + (((q31_t)y      ) >> 16)) >> 1) & (int32_t)0x0000FFFF;
+    s = (((((q31_t)x      ) >> 16) - (((q31_t)y << 16) >> 16)) >> 1) & (int32_t)0x0000FFFF;
+
+    return ((uint32_t)((s << 16) | (r      )));
+  }
+
+
+  /*
+   * @brief C custom defined SMUSDX for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SMUSDX(
+  uint32_t x,
+  uint32_t y)
+  {
+    return ((uint32_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y      ) >> 16)) -
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y << 16) >> 16))   ));
+  }
+
+  /*
+   * @brief C custom defined SMUADX for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SMUADX(
+  uint32_t x,
+  uint32_t y)
+  {
+    return ((uint32_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y      ) >> 16)) +
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y << 16) >> 16))   ));
+  }
+
+
+  /*
+   * @brief C custom defined QADD for M3 and M0 processors
+   */
+  static __INLINE int32_t __QADD(
+  int32_t x,
+  int32_t y)
+  {
+    return ((int32_t)(clip_q63_to_q31((q63_t)x + (q31_t)y)));
+  }
+
+
+  /*
+   * @brief C custom defined QSUB for M3 and M0 processors
+   */
+  static __INLINE int32_t __QSUB(
+  int32_t x,
+  int32_t y)
+  {
+    return ((int32_t)(clip_q63_to_q31((q63_t)x - (q31_t)y)));
+  }
+
+
+  /*
+   * @brief C custom defined SMLAD for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SMLAD(
+  uint32_t x,
+  uint32_t y,
+  uint32_t sum)
+  {
+    return ((uint32_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y << 16) >> 16)) +
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y      ) >> 16)) +
+                       ( ((q31_t)sum    )                                  )   ));
+  }
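Review note: the `__SMLAD` fallback is a dual multiply-accumulate. It multiplies the two low halfwords and the two high halfwords as signed 16-bit values, then adds both products to `sum`. A standalone sketch under the same reading follows. The names are hypothetical, and the accumulation is done in unsigned arithmetic so that wrap-around is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of h into an int32_t. */
static int32_t smlad_sext16(uint32_t h)
{
  h &= 0xFFFFu;
  return (h & 0x8000u) ? (int32_t)h - 65536 : (int32_t)h;
}

/* Hypothetical dual 16x16 multiply-accumulate:
 * lo(x)*lo(y) + hi(x)*hi(y) + sum, wrapping modulo 2^32. */
static uint32_t smlad_sketch(uint32_t x, uint32_t y, uint32_t sum)
{
  int32_t lo = smlad_sext16(x) * smlad_sext16(y);
  int32_t hi = smlad_sext16(x >> 16) * smlad_sext16(y >> 16);

  return (uint32_t)lo + (uint32_t)hi + sum;
}

/* Usage example: 2*4 + 3*5 + 10 and 2*4 + (-1)*3 + 0. */
static void smlad_sketch_demo(void)
{
  assert(smlad_sketch(0x00030002u, 0x00050004u, 10u) == 33u);
  assert(smlad_sketch(0xFFFF0002u, 0x00030004u, 0u) == 5u);
}
```

This is the building block for Q15 dot products: two samples and two coefficients are consumed per call.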
+
+
+  /*
+   * @brief C custom defined SMLADX for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SMLADX(
+  uint32_t x,
+  uint32_t y,
+  uint32_t sum)
+  {
+    return ((uint32_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y      ) >> 16)) +
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y << 16) >> 16)) +
+                       ( ((q31_t)sum    )                                  )   ));
+  }
+
+
+  /*
+   * @brief C custom defined SMLSDX for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SMLSDX(
+  uint32_t x,
+  uint32_t y,
+  uint32_t sum)
+  {
+    return ((uint32_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y      ) >> 16)) -
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y << 16) >> 16)) +
+                       ( ((q31_t)sum    )                                  )   ));
+  }
+
+
+  /*
+   * @brief C custom defined SMLALD for M3 and M0 processors
+   */
+  static __INLINE uint64_t __SMLALD(
+  uint32_t x,
+  uint32_t y,
+  uint64_t sum)
+  {
+/*  return (sum + ((q15_t) (x >> 16) * (q15_t) (y >> 16)) + ((q15_t) x * (q15_t) y)); */
+    return ((uint64_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y << 16) >> 16)) +
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y      ) >> 16)) +
+                       ( ((q63_t)sum    )                                  )   ));
+  }
+
+
+  /*
+   * @brief C custom defined SMLALDX for M3 and M0 processors
+   */
+  static __INLINE uint64_t __SMLALDX(
+  uint32_t x,
+  uint32_t y,
+  uint64_t sum)
+  {
+/*  return (sum + ((q15_t) (x >> 16) * (q15_t) y)) + ((q15_t) x * (q15_t) (y >> 16)); */
+    return ((uint64_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y      ) >> 16)) +
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y << 16) >> 16)) +
+                       ( ((q63_t)sum    )                                  )   ));
+  }
+
+
+  /*
+   * @brief C custom defined SMUAD for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SMUAD(
+  uint32_t x,
+  uint32_t y)
+  {
+    return ((uint32_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y << 16) >> 16)) +
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y      ) >> 16))   ));
+  }
+
+
+  /*
+   * @brief C custom defined SMUSD for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SMUSD(
+  uint32_t x,
+  uint32_t y)
+  {
+    return ((uint32_t)(((((q31_t)x << 16) >> 16) * (((q31_t)y << 16) >> 16)) -
+                       ((((q31_t)x      ) >> 16) * (((q31_t)y      ) >> 16))   ));
+  }
+
+
+  /*
+   * @brief C custom defined SXTB16 for M3 and M0 processors
+   */
+  static __INLINE uint32_t __SXTB16(
+  uint32_t x)
+  {
+    return ((uint32_t)(((((q31_t)x << 24) >> 24) & (q31_t)0x0000FFFF) |
+                       ((((q31_t)x <<  8) >>  8) & (q31_t)0xFFFF0000)  ));
+  }
+
+#endif /* defined (ARM_MATH_CM3) || defined (ARM_MATH_CM0_FAMILY) */
+
+
+  /**
+   * @brief Instance structure for the Q7 FIR filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;        /**< number of filter coefficients in the filter. */
+    q7_t *pState;            /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    q7_t *pCoeffs;           /**< points to the coefficient array. The array is of length numTaps.*/
+  } arm_fir_instance_q7;
+
+  /**
+   * @brief Instance structure for the Q15 FIR filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;         /**< number of filter coefficients in the filter. */
+    q15_t *pState;            /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    q15_t *pCoeffs;           /**< points to the coefficient array. The array is of length numTaps.*/
+  } arm_fir_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q31 FIR filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;         /**< number of filter coefficients in the filter. */
+    q31_t *pState;            /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    q31_t *pCoeffs;           /**< points to the coefficient array. The array is of length numTaps. */
+  } arm_fir_instance_q31;
+
+  /**
+   * @brief Instance structure for the floating-point FIR filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;     /**< number of filter coefficients in the filter. */
+    float32_t *pState;    /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    float32_t *pCoeffs;   /**< points to the coefficient array. The array is of length numTaps. */
+  } arm_fir_instance_f32;
+
+
+  /**
+   * @brief Processing function for the Q7 FIR filter.
+   * @param[in]  S          points to an instance of the Q7 FIR filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_q7(
+  const arm_fir_instance_q7 * S,
+  q7_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q7 FIR filter.
+   * @param[in,out] S          points to an instance of the Q7 FIR structure.
+   * @param[in]     numTaps    Number of filter coefficients in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of samples that are processed at a time.
+   */
+  void arm_fir_init_q7(
+  arm_fir_instance_q7 * S,
+  uint16_t numTaps,
+  q7_t * pCoeffs,
+  q7_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q15 FIR filter.
+   * @param[in]  S          points to an instance of the Q15 FIR structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_q15(
+  const arm_fir_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the fast Q15 FIR filter for Cortex-M3 and Cortex-M4.
+   * @param[in]  S          points to an instance of the Q15 FIR filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_fast_q15(
+  const arm_fir_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q15 FIR filter.
+   * @param[in,out] S          points to an instance of the Q15 FIR filter structure.
+   * @param[in]     numTaps    Number of filter coefficients in the filter. Must be even and greater than or equal to 4.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of samples that are processed at a time.
+   * @return The function returns ARM_MATH_SUCCESS if initialization was successful or ARM_MATH_ARGUMENT_ERROR if
+   * <code>numTaps</code> is not a supported value.
+   */
+  arm_status arm_fir_init_q15(
+  arm_fir_instance_q15 * S,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q31 FIR filter.
+   * @param[in]  S          points to an instance of the Q31 FIR filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_q31(
+  const arm_fir_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the fast Q31 FIR filter for Cortex-M3 and Cortex-M4.
+   * @param[in]  S          points to an instance of the Q31 FIR structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_fast_q31(
+  const arm_fir_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q31 FIR filter.
+   * @param[in,out] S          points to an instance of the Q31 FIR structure.
+   * @param[in]     numTaps    Number of filter coefficients in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of samples that are processed at a time.
+   */
+  void arm_fir_init_q31(
+  arm_fir_instance_q31 * S,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the floating-point FIR filter.
+   * @param[in]  S          points to an instance of the floating-point FIR structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_f32(
+  const arm_fir_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the floating-point FIR filter.
+   * @param[in,out] S          points to an instance of the floating-point FIR filter structure.
+   * @param[in]     numTaps    Number of filter coefficients in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of samples that are processed at a time.
+   */
+  void arm_fir_init_f32(
+  arm_fir_instance_f32 * S,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  uint32_t blockSize);
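Review note: the FIR instance structures document a state buffer of length `numTaps+blockSize-1`. One way to read that size is `numTaps-1` samples of history followed by the current block, so every output sample can see a full window. The plain C sketch below illustrates that bookkeeping. It is not the library implementation, all names are hypothetical, and coefficients are stored in natural order h[0..numTaps-1] here, which may differ from the ordering the library expects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical FIR instance, loosely modelled on arm_fir_instance_f32. */
typedef struct {
  size_t num_taps;
  const float *coeffs;  /* h[0..num_taps-1], natural order (assumption) */
  float *state;         /* num_taps + block_size - 1 floats             */
} fir_sketch;

static void fir_sketch_init(fir_sketch *f, size_t num_taps,
                            const float *coeffs, float *state,
                            size_t block_size)
{
  f->num_taps = num_taps;
  f->coeffs = coeffs;
  f->state = state;
  /* Start with an all-zero history. */
  memset(state, 0, (num_taps + block_size - 1u) * sizeof(float));
}

static void fir_sketch_process(fir_sketch *f, const float *src,
                               float *dst, size_t block_size)
{
  size_t hist = f->num_taps - 1u;
  size_t n, k;

  /* Place the new block right after the saved history. */
  memcpy(f->state + hist, src, block_size * sizeof(float));

  /* y[n] = sum over k of h[k] * x[n - k]. */
  for (n = 0; n < block_size; n++) {
    float acc = 0.0f;
    for (k = 0; k < f->num_taps; k++) {
      acc += f->coeffs[k] * f->state[hist + n - k];
    }
    dst[n] = acc;
  }

  /* Keep the newest num_taps-1 samples as history for the next call. */
  memmove(f->state, f->state + block_size, hist * sizeof(float));
}

/* Usage example: a 2-tap averager run over two consecutive blocks. */
static void fir_sketch_demo(void)
{
  static const float taps[2] = { 0.5f, 0.5f };
  float state[2 + 4 - 1];
  float in1[4] = { 1.0f, 1.0f, 1.0f, 1.0f };
  float in2[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
  float out[4];
  fir_sketch f;

  fir_sketch_init(&f, 2u, taps, state, 4u);
  fir_sketch_process(&f, in1, out, 4u);
  assert(out[0] == 0.5f && out[1] == 1.0f && out[3] == 1.0f);

  /* The last sample of the first block still contributes here. */
  fir_sketch_process(&f, in2, out, 4u);
  assert(out[0] == 0.5f && out[1] == 0.0f);
}
```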
+
+
+  /**
+   * @brief Instance structure for the Q15 Biquad cascade filter.
+   */
+  typedef struct
+  {
+    int8_t numStages;        /**< number of 2nd order stages in the filter.  Overall order is 2*numStages. */
+    q15_t *pState;           /**< Points to the array of state variables.  The array is of length 4*numStages. */
+    q15_t *pCoeffs;          /**< Points to the array of coefficients.  The array is of length 5*numStages. */
+    int8_t postShift;        /**< Additional shift, in bits, applied to each output sample. */
+  } arm_biquad_casd_df1_inst_q15;
+
+  /**
+   * @brief Instance structure for the Q31 Biquad cascade filter.
+   */
+  typedef struct
+  {
+    uint32_t numStages;      /**< number of 2nd order stages in the filter.  Overall order is 2*numStages. */
+    q31_t *pState;           /**< Points to the array of state variables.  The array is of length 4*numStages. */
+    q31_t *pCoeffs;          /**< Points to the array of coefficients.  The array is of length 5*numStages. */
+    uint8_t postShift;       /**< Additional shift, in bits, applied to each output sample. */
+  } arm_biquad_casd_df1_inst_q31;
+
+  /**
+   * @brief Instance structure for the floating-point Biquad cascade filter.
+   */
+  typedef struct
+  {
+    uint32_t numStages;      /**< number of 2nd order stages in the filter.  Overall order is 2*numStages. */
+    float32_t *pState;       /**< Points to the array of state variables.  The array is of length 4*numStages. */
+    float32_t *pCoeffs;      /**< Points to the array of coefficients.  The array is of length 5*numStages. */
+  } arm_biquad_casd_df1_inst_f32;
+
+
+  /**
+   * @brief Processing function for the Q15 Biquad cascade filter.
+   * @param[in]  S          points to an instance of the Q15 Biquad cascade structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cascade_df1_q15(
+  const arm_biquad_casd_df1_inst_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q15 Biquad cascade filter.
+   * @param[in,out] S          points to an instance of the Q15 Biquad cascade structure.
+   * @param[in]     numStages  number of 2nd order stages in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     postShift  Shift to be applied to the output. Varies according to the coefficients format.
+   */
+  void arm_biquad_cascade_df1_init_q15(
+  arm_biquad_casd_df1_inst_q15 * S,
+  uint8_t numStages,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  int8_t postShift);
+
+
+  /**
+   * @brief Fast but less precise processing function for the Q15 Biquad cascade filter for Cortex-M3 and Cortex-M4.
+   * @param[in]  S          points to an instance of the Q15 Biquad cascade structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cascade_df1_fast_q15(
+  const arm_biquad_casd_df1_inst_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q31 Biquad cascade filter.
+   * @param[in]  S          points to an instance of the Q31 Biquad cascade structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cascade_df1_q31(
+  const arm_biquad_casd_df1_inst_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Fast but less precise processing function for the Q31 Biquad cascade filter for Cortex-M3 and Cortex-M4.
+   * @param[in]  S          points to an instance of the Q31 Biquad cascade structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cascade_df1_fast_q31(
+  const arm_biquad_casd_df1_inst_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q31 Biquad cascade filter.
+   * @param[in,out] S          points to an instance of the Q31 Biquad cascade structure.
+   * @param[in]     numStages  number of 2nd order stages in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     postShift  Shift to be applied to the output. Varies according to the coefficients format.
+   */
+  void arm_biquad_cascade_df1_init_q31(
+  arm_biquad_casd_df1_inst_q31 * S,
+  uint8_t numStages,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  int8_t postShift);
+
+
+  /**
+   * @brief Processing function for the floating-point Biquad cascade filter.
+   * @param[in]  S          points to an instance of the floating-point Biquad cascade structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cascade_df1_f32(
+  const arm_biquad_casd_df1_inst_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the floating-point Biquad cascade filter.
+   * @param[in,out] S          points to an instance of the floating-point Biquad cascade structure.
+   * @param[in]     numStages  number of 2nd order stages in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   */
+  void arm_biquad_cascade_df1_init_f32(
+  arm_biquad_casd_df1_inst_f32 * S,
+  uint8_t numStages,
+  float32_t * pCoeffs,
+  float32_t * pState);
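Review note: the Biquad instance structures document 5 coefficients and 4 state values per stage. The sketch below shows a single Direct Form I stage in plain C. It assumes the coefficient order {b0, b1, b2, a1, a2}, the state order {x[n-1], x[n-2], y[n-1], y[n-2]}, and feedback terms that are added rather than subtracted. That is my understanding of the CMSIS-DSP convention, but this excerpt does not state it. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single Direct Form I biquad stage. */
typedef struct {
  const float *coeffs;  /* {b0, b1, b2, a1, a2} (assumed order)        */
  float state[4];       /* {x[n-1], x[n-2], y[n-1], y[n-2]} (assumed)  */
} biquad_df1_sketch;

static void biquad_df1_sketch_process(biquad_df1_sketch *s,
                                      const float *src, float *dst,
                                      size_t block_size)
{
  const float *c = s->coeffs;
  size_t i;

  for (i = 0; i < block_size; i++) {
    float x = src[i];
    /* y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] + a1 y[n-1] + a2 y[n-2] */
    float y = c[0] * x + c[1] * s->state[0] + c[2] * s->state[1]
            + c[3] * s->state[2] + c[4] * s->state[3];

    /* Shift the input and output histories by one sample. */
    s->state[1] = s->state[0];
    s->state[0] = x;
    s->state[3] = s->state[2];
    s->state[2] = y;
    dst[i] = y;
  }
}

/* Usage example: a one-pole decay (a1 = 0.5) driven by an impulse. */
static void biquad_df1_sketch_demo(void)
{
  static const float coeffs[5] = { 1.0f, 0.0f, 0.0f, 0.5f, 0.0f };
  biquad_df1_sketch s = { coeffs, { 0.0f, 0.0f, 0.0f, 0.0f } };
  float in[4] = { 1.0f, 0.0f, 0.0f, 0.0f };
  float out[4];

  biquad_df1_sketch_process(&s, in, out, 4u);
  assert(out[0] == 1.0f && out[1] == 0.5f);
  assert(out[2] == 0.25f && out[3] == 0.125f);
}
```

A cascade would run `numStages` such stages back to back, feeding each stage's output block into the next.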
+
+
+  /**
+   * @brief Instance structure for the floating-point matrix structure.
+   */
+  typedef struct
+  {
+    uint16_t numRows;     /**< number of rows of the matrix.     */
+    uint16_t numCols;     /**< number of columns of the matrix.  */
+    float32_t *pData;     /**< points to the data of the matrix. */
+  } arm_matrix_instance_f32;
+
+
+  /**
+   * @brief Instance structure for the double-precision floating-point matrix structure.
+   */
+  typedef struct
+  {
+    uint16_t numRows;     /**< number of rows of the matrix.     */
+    uint16_t numCols;     /**< number of columns of the matrix.  */
+    float64_t *pData;     /**< points to the data of the matrix. */
+  } arm_matrix_instance_f64;
+
+  /**
+   * @brief Instance structure for the Q15 matrix structure.
+   */
+  typedef struct
+  {
+    uint16_t numRows;     /**< number of rows of the matrix.     */
+    uint16_t numCols;     /**< number of columns of the matrix.  */
+    q15_t *pData;         /**< points to the data of the matrix. */
+  } arm_matrix_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q31 matrix structure.
+   */
+  typedef struct
+  {
+    uint16_t numRows;     /**< number of rows of the matrix.     */
+    uint16_t numCols;     /**< number of columns of the matrix.  */
+    q31_t *pData;         /**< points to the data of the matrix. */
+  } arm_matrix_instance_q31;
+
+
+  /**
+   * @brief Floating-point matrix addition.
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_add_f32(
+  const arm_matrix_instance_f32 * pSrcA,
+  const arm_matrix_instance_f32 * pSrcB,
+  arm_matrix_instance_f32 * pDst);
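Review note: the matrix instance structures carry only the dimensions and a data pointer, and the addition functions report a size mismatch through their return value. The standalone sketch below illustrates that contract with element-wise addition over a flat array. The type, enum and function names are hypothetical stand-ins, not the library's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for arm_status. */
typedef enum {
  MAT_SKETCH_SUCCESS,
  MAT_SKETCH_SIZE_MISMATCH
} mat_sketch_status;

/* Hypothetical matrix descriptor, loosely modelled on arm_matrix_instance_f32. */
typedef struct {
  uint16_t num_rows;
  uint16_t num_cols;
  float *data;  /* num_rows * num_cols elements in one flat array */
} mat_sketch_f32;

static mat_sketch_status mat_sketch_add(const mat_sketch_f32 *a,
                                        const mat_sketch_f32 *b,
                                        mat_sketch_f32 *dst)
{
  size_t i, count;

  /* All three matrices must share the same shape. */
  if (a->num_rows != b->num_rows || a->num_cols != b->num_cols ||
      a->num_rows != dst->num_rows || a->num_cols != dst->num_cols) {
    return MAT_SKETCH_SIZE_MISMATCH;
  }

  count = (size_t)a->num_rows * a->num_cols;
  for (i = 0; i < count; i++) {
    dst->data[i] = a->data[i] + b->data[i];
  }
  return MAT_SKETCH_SUCCESS;
}

/* Usage example: a valid 2x2 addition and a rejected shape mismatch. */
static void mat_sketch_add_demo(void)
{
  float a_data[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  float b_data[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
  float d_data[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
  mat_sketch_f32 a = { 2u, 2u, a_data };
  mat_sketch_f32 b = { 2u, 2u, b_data };
  mat_sketch_f32 d = { 2u, 2u, d_data };
  mat_sketch_f32 row = { 1u, 4u, b_data };

  assert(mat_sketch_add(&a, &b, &d) == MAT_SKETCH_SUCCESS);
  assert(d_data[0] == 11.0f && d_data[3] == 44.0f);
  assert(mat_sketch_add(&a, &row, &d) == MAT_SKETCH_SIZE_MISMATCH);
}
```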
+
+
+  /**
+   * @brief Q15 matrix addition.
+   * @param[in]   pSrcA  points to the first input matrix structure
+   * @param[in]   pSrcB  points to the second input matrix structure
+   * @param[out]  pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_add_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst);
+
+
+  /**
+   * @brief Q31 matrix addition.
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_add_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst);
+
+
+  /**
+   * @brief Floating-point, complex, matrix multiplication.
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_cmplx_mult_f32(
+  const arm_matrix_instance_f32 * pSrcA,
+  const arm_matrix_instance_f32 * pSrcB,
+  arm_matrix_instance_f32 * pDst);
+
+
+  /**
+   * @brief Q15, complex, matrix multiplication.
+   * @param[in]  pSrcA     points to the first input matrix structure
+   * @param[in]  pSrcB     points to the second input matrix structure
+   * @param[out] pDst      points to output matrix structure
+   * @param[in]  pScratch  points to the array for storing intermediate results
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_cmplx_mult_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst,
+  q15_t * pScratch);
+
+
+  /**
+   * @brief Q31, complex, matrix multiplication.
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_cmplx_mult_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst);
+
+
+  /**
+   * @brief Floating-point matrix transpose.
+   * @param[in]  pSrc  points to the input matrix
+   * @param[out] pDst  points to the output matrix
+   * @return    The function returns either  <code>ARM_MATH_SIZE_MISMATCH</code>
+   * or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_trans_f32(
+  const arm_matrix_instance_f32 * pSrc,
+  arm_matrix_instance_f32 * pDst);
+
+
+  /**
+   * @brief Q15 matrix transpose.
+   * @param[in]  pSrc  points to the input matrix
+   * @param[out] pDst  points to the output matrix
+   * @return    The function returns either  <code>ARM_MATH_SIZE_MISMATCH</code>
+   * or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_trans_q15(
+  const arm_matrix_instance_q15 * pSrc,
+  arm_matrix_instance_q15 * pDst);
+
+
+  /**
+   * @brief Q31 matrix transpose.
+   * @param[in]  pSrc  points to the input matrix
+   * @param[out] pDst  points to the output matrix
+   * @return    The function returns either  <code>ARM_MATH_SIZE_MISMATCH</code>
+   * or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_trans_q31(
+  const arm_matrix_instance_q31 * pSrc,
+  arm_matrix_instance_q31 * pDst);
+
+
+  /**
+   * @brief Floating-point matrix multiplication
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_mult_f32(
+  const arm_matrix_instance_f32 * pSrcA,
+  const arm_matrix_instance_f32 * pSrcB,
+  arm_matrix_instance_f32 * pDst);
+
+
+  /**
+   * @brief Q15 matrix multiplication
+   * @param[in]  pSrcA   points to the first input matrix structure
+   * @param[in]  pSrcB   points to the second input matrix structure
+   * @param[out] pDst    points to output matrix structure
+   * @param[in]  pState  points to the array for storing intermediate results
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_mult_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst,
+  q15_t * pState);
+
+
+  /**
+   * @brief Q15 matrix multiplication (fast variant) for Cortex-M3 and Cortex-M4
+   * @param[in]  pSrcA   points to the first input matrix structure
+   * @param[in]  pSrcB   points to the second input matrix structure
+   * @param[out] pDst    points to output matrix structure
+   * @param[in]  pState  points to the array for storing intermediate results
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_mult_fast_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst,
+  q15_t * pState);
+
+
+  /**
+   * @brief Q31 matrix multiplication
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_mult_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst);
+
+
+  /**
+   * @brief Q31 matrix multiplication (fast variant) for Cortex-M3 and Cortex-M4
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_mult_fast_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst);
+
+
+  /**
+   * @brief Floating-point matrix subtraction
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_sub_f32(
+  const arm_matrix_instance_f32 * pSrcA,
+  const arm_matrix_instance_f32 * pSrcB,
+  arm_matrix_instance_f32 * pDst);
+
+
+  /**
+   * @brief Q15 matrix subtraction
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_sub_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst);
+
+
+  /**
+   * @brief Q31 matrix subtraction
+   * @param[in]  pSrcA  points to the first input matrix structure
+   * @param[in]  pSrcB  points to the second input matrix structure
+   * @param[out] pDst   points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_sub_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst);
+
+
+  /**
+   * @brief Floating-point matrix scaling.
+   * @param[in]  pSrc   points to the input matrix
+   * @param[in]  scale  scale factor
+   * @param[out] pDst   points to the output matrix
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_scale_f32(
+  const arm_matrix_instance_f32 * pSrc,
+  float32_t scale,
+  arm_matrix_instance_f32 * pDst);
+
+
+  /**
+   * @brief Q15 matrix scaling.
+   * @param[in]  pSrc        points to input matrix
+   * @param[in]  scaleFract  fractional portion of the scale factor
+   * @param[in]  shift       number of bits to shift the result by
+   * @param[out] pDst        points to output matrix
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_scale_q15(
+  const arm_matrix_instance_q15 * pSrc,
+  q15_t scaleFract,
+  int32_t shift,
+  arm_matrix_instance_q15 * pDst);
+
+
+  /**
+   * @brief Q31 matrix scaling.
+   * @param[in]  pSrc        points to input matrix
+   * @param[in]  scaleFract  fractional portion of the scale factor
+   * @param[in]  shift       number of bits to shift the result by
+   * @param[out] pDst        points to output matrix structure
+   * @return     The function returns either
+   * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.
+   */
+  arm_status arm_mat_scale_q31(
+  const arm_matrix_instance_q31 * pSrc,
+  q31_t scaleFract,
+  int32_t shift,
+  arm_matrix_instance_q31 * pDst);
+
+
+  /**
+   * @brief  Q31 matrix initialization.
+   * @param[in,out] S         points to an instance of the Q31 matrix structure.
+   * @param[in]     nRows     number of rows in the matrix.
+   * @param[in]     nColumns  number of columns in the matrix.
+   * @param[in]     pData     points to the matrix data array.
+   */
+  void arm_mat_init_q31(
+  arm_matrix_instance_q31 * S,
+  uint16_t nRows,
+  uint16_t nColumns,
+  q31_t * pData);
+
+
+  /**
+   * @brief  Q15 matrix initialization.
+   * @param[in,out] S         points to an instance of the Q15 matrix structure.
+   * @param[in]     nRows     number of rows in the matrix.
+   * @param[in]     nColumns  number of columns in the matrix.
+   * @param[in]     pData     points to the matrix data array.
+   */
+  void arm_mat_init_q15(
+  arm_matrix_instance_q15 * S,
+  uint16_t nRows,
+  uint16_t nColumns,
+  q15_t * pData);
+
+
+  /**
+   * @brief  Floating-point matrix initialization.
+   * @param[in,out] S         points to an instance of the floating-point matrix structure.
+   * @param[in]     nRows     number of rows in the matrix.
+   * @param[in]     nColumns  number of columns in the matrix.
+   * @param[in]     pData     points to the matrix data array.
+   */
+  void arm_mat_init_f32(
+  arm_matrix_instance_f32 * S,
+  uint16_t nRows,
+  uint16_t nColumns,
+  float32_t * pData);
+
+
+
+  /**
+   * @brief Instance structure for the Q15 PID Control.
+   */
+  typedef struct
+  {
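+    q15_t A0;           /**< The derived gain, A0 = Kp + Ki + Kd. */
+#ifdef ARM_MATH_CM0_FAMILY
+    q15_t A1;           /**< The derived gain, A1 = -Kp - 2Kd. */
+    q15_t A2;           /**< The derived gain, A2 = Kd. */
+#else
+    q31_t A1;           /**< The derived gains A1 = -Kp - 2Kd and A2 = Kd, packed into one 32-bit word. */
+#endif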
+    q15_t state[3];     /**< The state array of length 3. */
+    q15_t Kp;           /**< The proportional gain. */
+    q15_t Ki;           /**< The integral gain. */
+    q15_t Kd;           /**< The derivative gain. */
+  } arm_pid_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q31 PID Control.
+   */
+  typedef struct
+  {
+    q31_t A0;            /**< The derived gain, A0 = Kp + Ki + Kd . */
+    q31_t A1;            /**< The derived gain, A1 = -Kp - 2Kd. */
+    q31_t A2;            /**< The derived gain, A2 = Kd . */
+    q31_t state[3];      /**< The state array of length 3. */
+    q31_t Kp;            /**< The proportional gain. */
+    q31_t Ki;            /**< The integral gain. */
+    q31_t Kd;            /**< The derivative gain. */
+  } arm_pid_instance_q31;
+
+  /**
+   * @brief Instance structure for the floating-point PID Control.
+   */
+  typedef struct
+  {
+    float32_t A0;          /**< The derived gain, A0 = Kp + Ki + Kd . */
+    float32_t A1;          /**< The derived gain, A1 = -Kp - 2Kd. */
+    float32_t A2;          /**< The derived gain, A2 = Kd . */
+    float32_t state[3];    /**< The state array of length 3. */
+    float32_t Kp;          /**< The proportional gain. */
+    float32_t Ki;          /**< The integral gain. */
+    float32_t Kd;          /**< The derivative gain. */
+  } arm_pid_instance_f32;
+
+
+
+  /**
+   * @brief  Initialization function for the floating-point PID Control.
+   * @param[in,out] S               points to an instance of the PID structure.
+   * @param[in]     resetStateFlag  flag to reset the state. 0 = no change in state, 1 = reset the state.
+   */
+  void arm_pid_init_f32(
+  arm_pid_instance_f32 * S,
+  int32_t resetStateFlag);
+
+
+  /**
+   * @brief  Reset function for the floating-point PID Control.
+   * @param[in,out] S  points to an instance of the floating-point PID Control structure
+   */
+  void arm_pid_reset_f32(
+  arm_pid_instance_f32 * S);
+
+
+  /**
+   * @brief  Initialization function for the Q31 PID Control.
+   * @param[in,out] S               points to an instance of the Q31 PID structure.
+   * @param[in]     resetStateFlag  flag to reset the state. 0 = no change in state, 1 = reset the state.
+   */
+  void arm_pid_init_q31(
+  arm_pid_instance_q31 * S,
+  int32_t resetStateFlag);
+
+
+  /**
+   * @brief  Reset function for the Q31 PID Control.
+   * @param[in,out] S   points to an instance of the Q31 PID Control structure
+   */
+
+  void arm_pid_reset_q31(
+  arm_pid_instance_q31 * S);
+
+
+  /**
+   * @brief  Initialization function for the Q15 PID Control.
+   * @param[in,out] S               points to an instance of the Q15 PID structure.
+   * @param[in]     resetStateFlag  flag to reset the state. 0 = no change in state, 1 = reset the state.
+   */
+  void arm_pid_init_q15(
+  arm_pid_instance_q15 * S,
+  int32_t resetStateFlag);
+
+
+  /**
+   * @brief  Reset function for the Q15 PID Control.
+   * @param[in,out] S  points to an instance of the Q15 PID Control structure
+   */
+  void arm_pid_reset_q15(
+  arm_pid_instance_q15 * S);
+
+
+  /**
+   * @brief Instance structure for the floating-point Linear Interpolate function.
+   */
+  typedef struct
+  {
+    uint32_t nValues;           /**< number of values in the Y data table */
+    float32_t x1;               /**< x value of the first table entry */
+    float32_t xSpacing;         /**< spacing between consecutive x values */
+    float32_t *pYData;          /**< pointer to the table of Y values */
+  } arm_linear_interp_instance_f32;
+
+  /**
+   * @brief Instance structure for the floating-point bilinear interpolation function.
+   */
+  typedef struct
+  {
+    uint16_t numRows;   /**< number of rows in the data table. */
+    uint16_t numCols;   /**< number of columns in the data table. */
+    float32_t *pData;   /**< points to the data table. */
+  } arm_bilinear_interp_instance_f32;
+
+   /**
+   * @brief Instance structure for the Q31 bilinear interpolation function.
+   */
+  typedef struct
+  {
+    uint16_t numRows;   /**< number of rows in the data table. */
+    uint16_t numCols;   /**< number of columns in the data table. */
+    q31_t *pData;       /**< points to the data table. */
+  } arm_bilinear_interp_instance_q31;
+
+   /**
+   * @brief Instance structure for the Q15 bilinear interpolation function.
+   */
+  typedef struct
+  {
+    uint16_t numRows;   /**< number of rows in the data table. */
+    uint16_t numCols;   /**< number of columns in the data table. */
+    q15_t *pData;       /**< points to the data table. */
+  } arm_bilinear_interp_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q7 bilinear interpolation function.
+   */
+  typedef struct
+  {
+    uint16_t numRows;   /**< number of rows in the data table. */
+    uint16_t numCols;   /**< number of columns in the data table. */
+    q7_t *pData;        /**< points to the data table. */
+  } arm_bilinear_interp_instance_q7;
+
+
+  /**
+   * @brief Q7 vector multiplication.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_mult_q7(
+  q7_t * pSrcA,
+  q7_t * pSrcB,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q15 vector multiplication.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_mult_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q31 vector multiplication.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_mult_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Floating-point vector multiplication.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_mult_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Instance structure for the Radix-2 Q15 CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                 /**< length of the FFT. */
+    uint8_t ifftFlag;                /**< flag that selects forward (ifftFlag=0) or inverse (ifftFlag=1) transform. */
+    uint8_t bitReverseFlag;          /**< flag that enables (bitReverseFlag=1) or disables (bitReverseFlag=0) bit reversal of output. */
+    q15_t *pTwiddle;                 /**< points to the twiddle factor table. */
+    uint16_t *pBitRevTable;          /**< points to the bit reversal table. */
+    uint16_t twidCoefModifier;       /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    uint16_t bitRevFactor;           /**< bit reversal modifier that supports different size FFTs with the same bit reversal table. */
+  } arm_cfft_radix2_instance_q15;
+
+/* Deprecated */
+  arm_status arm_cfft_radix2_init_q15(
+  arm_cfft_radix2_instance_q15 * S,
+  uint16_t fftLen,
+  uint8_t ifftFlag,
+  uint8_t bitReverseFlag);
+
+/* Deprecated */
+  void arm_cfft_radix2_q15(
+  const arm_cfft_radix2_instance_q15 * S,
+  q15_t * pSrc);
+
+
+  /**
+   * @brief Instance structure for the Radix-4 Q15 CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                 /**< length of the FFT. */
+    uint8_t ifftFlag;                /**< flag that selects forward (ifftFlag=0) or inverse (ifftFlag=1) transform. */
+    uint8_t bitReverseFlag;          /**< flag that enables (bitReverseFlag=1) or disables (bitReverseFlag=0) bit reversal of output. */
+    q15_t *pTwiddle;                 /**< points to the twiddle factor table. */
+    uint16_t *pBitRevTable;          /**< points to the bit reversal table. */
+    uint16_t twidCoefModifier;       /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    uint16_t bitRevFactor;           /**< bit reversal modifier that supports different size FFTs with the same bit reversal table. */
+  } arm_cfft_radix4_instance_q15;
+
+/* Deprecated */
+  arm_status arm_cfft_radix4_init_q15(
+  arm_cfft_radix4_instance_q15 * S,
+  uint16_t fftLen,
+  uint8_t ifftFlag,
+  uint8_t bitReverseFlag);
+
+/* Deprecated */
+  void arm_cfft_radix4_q15(
+  const arm_cfft_radix4_instance_q15 * S,
+  q15_t * pSrc);
+
+  /**
+   * @brief Instance structure for the Radix-2 Q31 CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                 /**< length of the FFT. */
+    uint8_t ifftFlag;                /**< flag that selects forward (ifftFlag=0) or inverse (ifftFlag=1) transform. */
+    uint8_t bitReverseFlag;          /**< flag that enables (bitReverseFlag=1) or disables (bitReverseFlag=0) bit reversal of output. */
+    q31_t *pTwiddle;                 /**< points to the Twiddle factor table. */
+    uint16_t *pBitRevTable;          /**< points to the bit reversal table. */
+    uint16_t twidCoefModifier;       /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    uint16_t bitRevFactor;           /**< bit reversal modifier that supports different size FFTs with the same bit reversal table. */
+  } arm_cfft_radix2_instance_q31;
+
+/* Deprecated */
+  arm_status arm_cfft_radix2_init_q31(
+  arm_cfft_radix2_instance_q31 * S,
+  uint16_t fftLen,
+  uint8_t ifftFlag,
+  uint8_t bitReverseFlag);
+
+/* Deprecated */
+  void arm_cfft_radix2_q31(
+  const arm_cfft_radix2_instance_q31 * S,
+  q31_t * pSrc);
+
+  /**
+   * @brief Instance structure for the Radix-4 Q31 CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                 /**< length of the FFT. */
+    uint8_t ifftFlag;                /**< flag that selects forward (ifftFlag=0) or inverse (ifftFlag=1) transform. */
+    uint8_t bitReverseFlag;          /**< flag that enables (bitReverseFlag=1) or disables (bitReverseFlag=0) bit reversal of output. */
+    q31_t *pTwiddle;                 /**< points to the twiddle factor table. */
+    uint16_t *pBitRevTable;          /**< points to the bit reversal table. */
+    uint16_t twidCoefModifier;       /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    uint16_t bitRevFactor;           /**< bit reversal modifier that supports different size FFTs with the same bit reversal table. */
+  } arm_cfft_radix4_instance_q31;
+
+/* Deprecated */
+  void arm_cfft_radix4_q31(
+  const arm_cfft_radix4_instance_q31 * S,
+  q31_t * pSrc);
+
+/* Deprecated */
+  arm_status arm_cfft_radix4_init_q31(
+  arm_cfft_radix4_instance_q31 * S,
+  uint16_t fftLen,
+  uint8_t ifftFlag,
+  uint8_t bitReverseFlag);
+
+  /**
+   * @brief Instance structure for the Radix-2 floating-point CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                   /**< length of the FFT. */
+    uint8_t ifftFlag;                  /**< flag that selects forward (ifftFlag=0) or inverse (ifftFlag=1) transform. */
+    uint8_t bitReverseFlag;            /**< flag that enables (bitReverseFlag=1) or disables (bitReverseFlag=0) bit reversal of output. */
+    float32_t *pTwiddle;               /**< points to the Twiddle factor table. */
+    uint16_t *pBitRevTable;            /**< points to the bit reversal table. */
+    uint16_t twidCoefModifier;         /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    uint16_t bitRevFactor;             /**< bit reversal modifier that supports different size FFTs with the same bit reversal table. */
+    float32_t onebyfftLen;             /**< value of 1/fftLen. */
+  } arm_cfft_radix2_instance_f32;
+
+/* Deprecated */
+  arm_status arm_cfft_radix2_init_f32(
+  arm_cfft_radix2_instance_f32 * S,
+  uint16_t fftLen,
+  uint8_t ifftFlag,
+  uint8_t bitReverseFlag);
+
+/* Deprecated */
+  void arm_cfft_radix2_f32(
+  const arm_cfft_radix2_instance_f32 * S,
+  float32_t * pSrc);
+
+  /**
+   * @brief Instance structure for the Radix-4 floating-point CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                   /**< length of the FFT. */
+    uint8_t ifftFlag;                  /**< flag that selects forward (ifftFlag=0) or inverse (ifftFlag=1) transform. */
+    uint8_t bitReverseFlag;            /**< flag that enables (bitReverseFlag=1) or disables (bitReverseFlag=0) bit reversal of output. */
+    float32_t *pTwiddle;               /**< points to the Twiddle factor table. */
+    uint16_t *pBitRevTable;            /**< points to the bit reversal table. */
+    uint16_t twidCoefModifier;         /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    uint16_t bitRevFactor;             /**< bit reversal modifier that supports different size FFTs with the same bit reversal table. */
+    float32_t onebyfftLen;             /**< value of 1/fftLen. */
+  } arm_cfft_radix4_instance_f32;
+
+/* Deprecated */
+  arm_status arm_cfft_radix4_init_f32(
+  arm_cfft_radix4_instance_f32 * S,
+  uint16_t fftLen,
+  uint8_t ifftFlag,
+  uint8_t bitReverseFlag);
+
+/* Deprecated */
+  void arm_cfft_radix4_f32(
+  const arm_cfft_radix4_instance_f32 * S,
+  float32_t * pSrc);
+
+  /**
+   * @brief Instance structure for the Q15 CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                   /**< length of the FFT. */
+    const q15_t *pTwiddle;             /**< points to the Twiddle factor table. */
+    const uint16_t *pBitRevTable;      /**< points to the bit reversal table. */
+    uint16_t bitRevLength;             /**< bit reversal table length. */
+  } arm_cfft_instance_q15;
+
+void arm_cfft_q15(
+    const arm_cfft_instance_q15 * S,
+    q15_t * p1,
+    uint8_t ifftFlag,
+    uint8_t bitReverseFlag);
+
+  /**
+   * @brief Instance structure for the Q31 CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                   /**< length of the FFT. */
+    const q31_t *pTwiddle;             /**< points to the Twiddle factor table. */
+    const uint16_t *pBitRevTable;      /**< points to the bit reversal table. */
+    uint16_t bitRevLength;             /**< bit reversal table length. */
+  } arm_cfft_instance_q31;
+
+void arm_cfft_q31(
+    const arm_cfft_instance_q31 * S,
+    q31_t * p1,
+    uint8_t ifftFlag,
+    uint8_t bitReverseFlag);
+
+  /**
+   * @brief Instance structure for the floating-point CFFT/CIFFT function.
+   */
+  typedef struct
+  {
+    uint16_t fftLen;                   /**< length of the FFT. */
+    const float32_t *pTwiddle;         /**< points to the Twiddle factor table. */
+    const uint16_t *pBitRevTable;      /**< points to the bit reversal table. */
+    uint16_t bitRevLength;             /**< bit reversal table length. */
+  } arm_cfft_instance_f32;
+
+  void arm_cfft_f32(
+  const arm_cfft_instance_f32 * S,
+  float32_t * p1,
+  uint8_t ifftFlag,
+  uint8_t bitReverseFlag);
+
+  /**
+   * @brief Instance structure for the Q15 RFFT/RIFFT function.
+   */
+  typedef struct
+  {
+    uint32_t fftLenReal;                      /**< length of the real FFT. */
+    uint8_t ifftFlagR;                        /**< flag that selects forward (ifftFlagR=0) or inverse (ifftFlagR=1) transform. */
+    uint8_t bitReverseFlagR;                  /**< flag that enables (bitReverseFlagR=1) or disables (bitReverseFlagR=0) bit reversal of output. */
+    uint32_t twidCoefRModifier;               /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    q15_t *pTwiddleAReal;                     /**< points to the real twiddle factor table. */
+    q15_t *pTwiddleBReal;                     /**< points to the imaginary twiddle factor table. */
+    const arm_cfft_instance_q15 *pCfft;       /**< points to the complex FFT instance. */
+  } arm_rfft_instance_q15;
+
+  arm_status arm_rfft_init_q15(
+  arm_rfft_instance_q15 * S,
+  uint32_t fftLenReal,
+  uint32_t ifftFlagR,
+  uint32_t bitReverseFlag);
+
+  void arm_rfft_q15(
+  const arm_rfft_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst);
+
+  /**
+   * @brief Instance structure for the Q31 RFFT/RIFFT function.
+   */
+  typedef struct
+  {
+    uint32_t fftLenReal;                        /**< length of the real FFT. */
+    uint8_t ifftFlagR;                          /**< flag that selects forward (ifftFlagR=0) or inverse (ifftFlagR=1) transform. */
+    uint8_t bitReverseFlagR;                    /**< flag that enables (bitReverseFlagR=1) or disables (bitReverseFlagR=0) bit reversal of output. */
+    uint32_t twidCoefRModifier;                 /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    q31_t *pTwiddleAReal;                       /**< points to the real twiddle factor table. */
+    q31_t *pTwiddleBReal;                       /**< points to the imaginary twiddle factor table. */
+    const arm_cfft_instance_q31 *pCfft;         /**< points to the complex FFT instance. */
+  } arm_rfft_instance_q31;
+
+  arm_status arm_rfft_init_q31(
+  arm_rfft_instance_q31 * S,
+  uint32_t fftLenReal,
+  uint32_t ifftFlagR,
+  uint32_t bitReverseFlag);
+
+  void arm_rfft_q31(
+  const arm_rfft_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst);
+
+  /**
+   * @brief Instance structure for the floating-point RFFT/RIFFT function.
+   */
+  typedef struct
+  {
+    uint32_t fftLenReal;                        /**< length of the real FFT. */
+    uint16_t fftLenBy2;                         /**< length of the complex FFT. */
+    uint8_t ifftFlagR;                          /**< flag that selects forward (ifftFlagR=0) or inverse (ifftFlagR=1) transform. */
+    uint8_t bitReverseFlagR;                    /**< flag that enables (bitReverseFlagR=1) or disables (bitReverseFlagR=0) bit reversal of output. */
+    uint32_t twidCoefRModifier;                 /**< twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table. */
+    float32_t *pTwiddleAReal;                   /**< points to the real twiddle factor table. */
+    float32_t *pTwiddleBReal;                   /**< points to the imaginary twiddle factor table. */
+    arm_cfft_radix4_instance_f32 *pCfft;        /**< points to the complex FFT instance. */
+  } arm_rfft_instance_f32;
+
+  arm_status arm_rfft_init_f32(
+  arm_rfft_instance_f32 * S,
+  arm_cfft_radix4_instance_f32 * S_CFFT,
+  uint32_t fftLenReal,
+  uint32_t ifftFlagR,
+  uint32_t bitReverseFlag);
+
+  void arm_rfft_f32(
+  const arm_rfft_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst);
+
+  /**
+   * @brief Instance structure for the fast floating-point RFFT/RIFFT function.
+   */
+  typedef struct
+  {
+    arm_cfft_instance_f32 Sint;      /**< Internal CFFT structure. */
+    uint16_t fftLenRFFT;             /**< length of the real sequence */
+    float32_t * pTwiddleRFFT;        /**< Twiddle factors real stage  */
+  } arm_rfft_fast_instance_f32;
+
+arm_status arm_rfft_fast_init_f32 (
+   arm_rfft_fast_instance_f32 * S,
+   uint16_t fftLen);
+
+void arm_rfft_fast_f32(
+  arm_rfft_fast_instance_f32 * S,
+  float32_t * p, float32_t * pOut,
+  uint8_t ifftFlag);
+
+  /**
+   * @brief Instance structure for the floating-point DCT4/IDCT4 function.
+   */
+  typedef struct
+  {
+    uint16_t N;                          /**< length of the DCT4. */
+    uint16_t Nby2;                       /**< half of the length of the DCT4. */
+    float32_t normalize;                 /**< normalizing factor. */
+    float32_t *pTwiddle;                 /**< points to the twiddle factor table. */
+    float32_t *pCosFactor;               /**< points to the cosFactor table. */
+    arm_rfft_instance_f32 *pRfft;        /**< points to the real FFT instance. */
+    arm_cfft_radix4_instance_f32 *pCfft; /**< points to the complex FFT instance. */
+  } arm_dct4_instance_f32;
+
+
+  /**
+   * @brief  Initialization function for the floating-point DCT4/IDCT4.
+   * @param[in,out] S          points to an instance of floating-point DCT4/IDCT4 structure.
+   * @param[in]     S_RFFT     points to an instance of floating-point RFFT/RIFFT structure.
+   * @param[in]     S_CFFT     points to an instance of floating-point CFFT/CIFFT structure.
+   * @param[in]     N          length of the DCT4.
+   * @param[in]     Nby2       half of the length of the DCT4.
+   * @param[in]     normalize  normalizing factor.
+   * @return      arm_status function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_ARGUMENT_ERROR if <code>N</code> is not a supported transform length.
+   */
+  arm_status arm_dct4_init_f32(
+  arm_dct4_instance_f32 * S,
+  arm_rfft_instance_f32 * S_RFFT,
+  arm_cfft_radix4_instance_f32 * S_CFFT,
+  uint16_t N,
+  uint16_t Nby2,
+  float32_t normalize);
+
+
+  /**
+   * @brief Processing function for the floating-point DCT4/IDCT4.
+   * @param[in]     S              points to an instance of the floating-point DCT4/IDCT4 structure.
+   * @param[in]     pState         points to state buffer.
+   * @param[in,out] pInlineBuffer  points to the in-place input and output buffer.
+   */
+  void arm_dct4_f32(
+  const arm_dct4_instance_f32 * S,
+  float32_t * pState,
+  float32_t * pInlineBuffer);
+
+
+  /**
+   * @brief Instance structure for the Q31 DCT4/IDCT4 function.
+   */
+  typedef struct
+  {
+    uint16_t N;                          /**< length of the DCT4. */
+    uint16_t Nby2;                       /**< half of the length of the DCT4. */
+    q31_t normalize;                     /**< normalizing factor. */
+    q31_t *pTwiddle;                     /**< points to the twiddle factor table. */
+    q31_t *pCosFactor;                   /**< points to the cosFactor table. */
+    arm_rfft_instance_q31 *pRfft;        /**< points to the real FFT instance. */
+    arm_cfft_radix4_instance_q31 *pCfft; /**< points to the complex FFT instance. */
+  } arm_dct4_instance_q31;
+
+
+  /**
+   * @brief  Initialization function for the Q31 DCT4/IDCT4.
+   * @param[in,out] S          points to an instance of Q31 DCT4/IDCT4 structure.
+   * @param[in]     S_RFFT     points to an instance of Q31 RFFT/RIFFT structure.
+   * @param[in]     S_CFFT     points to an instance of Q31 CFFT/CIFFT structure.
+   * @param[in]     N          length of the DCT4.
+   * @param[in]     Nby2       half of the length of the DCT4.
+   * @param[in]     normalize  normalizing factor.
+   * @return      arm_status function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_ARGUMENT_ERROR if <code>N</code> is not a supported transform length.
+   */
+  arm_status arm_dct4_init_q31(
+  arm_dct4_instance_q31 * S,
+  arm_rfft_instance_q31 * S_RFFT,
+  arm_cfft_radix4_instance_q31 * S_CFFT,
+  uint16_t N,
+  uint16_t Nby2,
+  q31_t normalize);
+
+
+  /**
+   * @brief Processing function for the Q31 DCT4/IDCT4.
+   * @param[in]     S              points to an instance of the Q31 DCT4/IDCT4 structure.
+   * @param[in]     pState         points to state buffer.
+   * @param[in,out] pInlineBuffer  points to the in-place input and output buffer.
+   */
+  void arm_dct4_q31(
+  const arm_dct4_instance_q31 * S,
+  q31_t * pState,
+  q31_t * pInlineBuffer);
+
+
+  /**
+   * @brief Instance structure for the Q15 DCT4/IDCT4 function.
+   */
+  typedef struct
+  {
+    uint16_t N;                          /**< length of the DCT4. */
+    uint16_t Nby2;                       /**< half of the length of the DCT4. */
+    q15_t normalize;                     /**< normalizing factor. */
+    q15_t *pTwiddle;                     /**< points to the twiddle factor table. */
+    q15_t *pCosFactor;                   /**< points to the cosFactor table. */
+    arm_rfft_instance_q15 *pRfft;        /**< points to the real FFT instance. */
+    arm_cfft_radix4_instance_q15 *pCfft; /**< points to the complex FFT instance. */
+  } arm_dct4_instance_q15;
+
+
+  /**
+   * @brief  Initialization function for the Q15 DCT4/IDCT4.
+   * @param[in,out] S          points to an instance of Q15 DCT4/IDCT4 structure.
+   * @param[in]     S_RFFT     points to an instance of Q15 RFFT/RIFFT structure.
+   * @param[in]     S_CFFT     points to an instance of Q15 CFFT/CIFFT structure.
+   * @param[in]     N          length of the DCT4.
+   * @param[in]     Nby2       half of the length of the DCT4.
+   * @param[in]     normalize  normalizing factor.
+   * @return      arm_status function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_ARGUMENT_ERROR if <code>N</code> is not a supported transform length.
+   */
+  arm_status arm_dct4_init_q15(
+  arm_dct4_instance_q15 * S,
+  arm_rfft_instance_q15 * S_RFFT,
+  arm_cfft_radix4_instance_q15 * S_CFFT,
+  uint16_t N,
+  uint16_t Nby2,
+  q15_t normalize);
+
+
+  /**
+   * @brief Processing function for the Q15 DCT4/IDCT4.
+   * @param[in]     S              points to an instance of the Q15 DCT4/IDCT4 structure.
+   * @param[in]     pState         points to state buffer.
+   * @param[in,out] pInlineBuffer  points to the in-place input and output buffer.
+   */
+  void arm_dct4_q15(
+  const arm_dct4_instance_q15 * S,
+  q15_t * pState,
+  q15_t * pInlineBuffer);
+
+
+  /**
+   * @brief Floating-point vector addition.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_add_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  float32_t * pDst,
+  uint32_t blockSize);
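+
+  /*
+   * Usage sketch (editorial annotation, not part of the upstream CMSIS-DSP header).
+   * Assumes element-wise semantics, pDst[n] = pSrcA[n] + pSrcB[n]; the array
+   * names a, b and sum are hypothetical.
+   *
+   *   float32_t a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
+   *   float32_t b[4] = { 0.5f, 0.5f, 0.5f, 0.5f };
+   *   float32_t sum[4];
+   *
+   *   arm_add_f32(a, b, sum, 4);   // sum = { 1.5f, 2.5f, 3.5f, 4.5f }
+   */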
+
+
+  /**
+   * @brief Q7 vector addition.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_add_q7(
+  q7_t * pSrcA,
+  q7_t * pSrcB,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q15 vector addition.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_add_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q31 vector addition.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_add_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Floating-point vector subtraction.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_sub_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q7 vector subtraction.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_sub_q7(
+  q7_t * pSrcA,
+  q7_t * pSrcB,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q15 vector subtraction.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_sub_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q31 vector subtraction.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_sub_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Multiplies a floating-point vector by a scalar.
+   * @param[in]  pSrc       points to the input vector
+   * @param[in]  scale      scale factor to be applied
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_scale_f32(
+  float32_t * pSrc,
+  float32_t scale,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Multiplies a Q7 vector by a scalar.
+   * @param[in]  pSrc        points to the input vector
+   * @param[in]  scaleFract  fractional portion of the scale value
+   * @param[in]  shift       number of bits to shift the result by
+   * @param[out] pDst        points to the output vector
+   * @param[in]  blockSize   number of samples in the vector
+   */
+  void arm_scale_q7(
+  q7_t * pSrc,
+  q7_t scaleFract,
+  int8_t shift,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Multiplies a Q15 vector by a scalar.
+   * @param[in]  pSrc        points to the input vector
+   * @param[in]  scaleFract  fractional portion of the scale value
+   * @param[in]  shift       number of bits to shift the result by
+   * @param[out] pDst        points to the output vector
+   * @param[in]  blockSize   number of samples in the vector
+   */
+  void arm_scale_q15(
+  q15_t * pSrc,
+  q15_t scaleFract,
+  int8_t shift,
+  q15_t * pDst,
+  uint32_t blockSize);
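+
+  /*
+   * Usage sketch (editorial annotation, not part of the upstream CMSIS-DSP header).
+   * Assumes the effective gain is scaleFract * 2^shift, as the CMSIS-DSP
+   * documentation describes for the fixed-point scale functions, so a gain of
+   * 1.5 can be split into 0.75 * 2^1. The array names in and out are hypothetical.
+   *
+   *   q15_t in[4] = { 0x0800, 0x1000, 0x2000, 0x4000 };
+   *   q15_t out[4];
+   *
+   *   // 0x6000 is 0.75 in Q15; shift = 1 doubles it, giving a gain of 1.5.
+   *   arm_scale_q15(in, 0x6000, 1, out, 4);
+   */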
+
+
+  /**
+   * @brief Multiplies a Q31 vector by a scalar.
+   * @param[in]  pSrc        points to the input vector
+   * @param[in]  scaleFract  fractional portion of the scale value
+   * @param[in]  shift       number of bits to shift the result by
+   * @param[out] pDst        points to the output vector
+   * @param[in]  blockSize   number of samples in the vector
+   */
+  void arm_scale_q31(
+  q31_t * pSrc,
+  q31_t scaleFract,
+  int8_t shift,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q7 vector absolute value.
+   * @param[in]  pSrc       points to the input buffer
+   * @param[out] pDst       points to the output buffer
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_abs_q7(
+  q7_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Floating-point vector absolute value.
+   * @param[in]  pSrc       points to the input buffer
+   * @param[out] pDst       points to the output buffer
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_abs_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q15 vector absolute value.
+   * @param[in]  pSrc       points to the input buffer
+   * @param[out] pDst       points to the output buffer
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_abs_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Q31 vector absolute value.
+   * @param[in]  pSrc       points to the input buffer
+   * @param[out] pDst       points to the output buffer
+   * @param[in]  blockSize  number of samples in each vector
+   */
+  void arm_abs_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Dot product of floating-point vectors.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[in]  blockSize  number of samples in each vector
+   * @param[out] result     output result returned here
+   */
+  void arm_dot_prod_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  uint32_t blockSize,
+  float32_t * result);
+
+
+  /**
+   * @brief Dot product of Q7 vectors.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[in]  blockSize  number of samples in each vector
+   * @param[out] result     output result returned here
+   */
+  void arm_dot_prod_q7(
+  q7_t * pSrcA,
+  q7_t * pSrcB,
+  uint32_t blockSize,
+  q31_t * result);
+
+
+  /**
+   * @brief Dot product of Q15 vectors.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[in]  blockSize  number of samples in each vector
+   * @param[out] result     output result returned here
+   */
+  void arm_dot_prod_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  uint32_t blockSize,
+  q63_t * result);
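+
+  /*
+   * Usage sketch (editorial annotation, not part of the upstream CMSIS-DSP header).
+   * Assumes the Q15 dot product is accumulated in 34.30 format, as the CMSIS-DSP
+   * documentation describes, so dividing by 2^30 recovers the real value.
+   * The names x, y, acc and value are hypothetical.
+   *
+   *   q15_t x[2] = { 0x4000, 0x4000 };   // 0.5, 0.5
+   *   q15_t y[2] = { 0x2000, 0x2000 };   // 0.25, 0.25
+   *   q63_t acc;
+   *
+   *   arm_dot_prod_q15(x, y, 2, &acc);
+   *   // 1073741824 is 2^30; under the 34.30 assumption value is 0.25.
+   *   float32_t value = (float32_t) acc / 1073741824.0f;
+   */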
+
+
+  /**
+   * @brief Dot product of Q31 vectors.
+   * @param[in]  pSrcA      points to the first input vector
+   * @param[in]  pSrcB      points to the second input vector
+   * @param[in]  blockSize  number of samples in each vector
+   * @param[out] result     output result returned here
+   */
+  void arm_dot_prod_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  uint32_t blockSize,
+  q63_t * result);
+
+
+  /**
+   * @brief  Shifts the elements of a Q7 vector a specified number of bits.
+   * @param[in]  pSrc       points to the input vector
+   * @param[in]  shiftBits  number of bits to shift.  A positive value shifts left; a negative value shifts right.
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_shift_q7(
+  q7_t * pSrc,
+  int8_t shiftBits,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Shifts the elements of a Q15 vector a specified number of bits.
+   * @param[in]  pSrc       points to the input vector
+   * @param[in]  shiftBits  number of bits to shift.  A positive value shifts left; a negative value shifts right.
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_shift_q15(
+  q15_t * pSrc,
+  int8_t shiftBits,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Shifts the elements of a Q31 vector a specified number of bits.
+   * @param[in]  pSrc       points to the input vector
+   * @param[in]  shiftBits  number of bits to shift.  A positive value shifts left; a negative value shifts right.
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_shift_q31(
+  q31_t * pSrc,
+  int8_t shiftBits,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Adds a constant offset to a floating-point vector.
+   * @param[in]  pSrc       points to the input vector
+   * @param[in]  offset     is the offset to be added
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_offset_f32(
+  float32_t * pSrc,
+  float32_t offset,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Adds a constant offset to a Q7 vector.
+   * @param[in]  pSrc       points to the input vector
+   * @param[in]  offset     is the offset to be added
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_offset_q7(
+  q7_t * pSrc,
+  q7_t offset,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Adds a constant offset to a Q15 vector.
+   * @param[in]  pSrc       points to the input vector
+   * @param[in]  offset     is the offset to be added
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_offset_q15(
+  q15_t * pSrc,
+  q15_t offset,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Adds a constant offset to a Q31 vector.
+   * @param[in]  pSrc       points to the input vector
+   * @param[in]  offset     is the offset to be added
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_offset_q31(
+  q31_t * pSrc,
+  q31_t offset,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Negates the elements of a floating-point vector.
+   * @param[in]  pSrc       points to the input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_negate_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Negates the elements of a Q7 vector.
+   * @param[in]  pSrc       points to the input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_negate_q7(
+  q7_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Negates the elements of a Q15 vector.
+   * @param[in]  pSrc       points to the input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_negate_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Negates the elements of a Q31 vector.
+   * @param[in]  pSrc       points to the input vector
+   * @param[out] pDst       points to the output vector
+   * @param[in]  blockSize  number of samples in the vector
+   */
+  void arm_negate_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Copies the elements of a floating-point vector.
+   * @param[in]  pSrc       input pointer
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_copy_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Copies the elements of a Q7 vector.
+   * @param[in]  pSrc       input pointer
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_copy_q7(
+  q7_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Copies the elements of a Q15 vector.
+   * @param[in]  pSrc       input pointer
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_copy_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Copies the elements of a Q31 vector.
+   * @param[in]  pSrc       input pointer
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_copy_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Fills a constant value into a floating-point vector.
+   * @param[in]  value      input value to be filled
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_fill_f32(
+  float32_t value,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Fills a constant value into a Q7 vector.
+   * @param[in]  value      input value to be filled
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_fill_q7(
+  q7_t value,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Fills a constant value into a Q15 vector.
+   * @param[in]  value      input value to be filled
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_fill_q15(
+  q15_t value,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Fills a constant value into a Q31 vector.
+   * @param[in]  value      input value to be filled
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_fill_q31(
+  q31_t value,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Convolution of floating-point sequences.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the location where the output result is written.  Length srcALen+srcBLen-1.
+   */
+  void arm_conv_f32(
+  float32_t * pSrcA,
+  uint32_t srcALen,
+  float32_t * pSrcB,
+  uint32_t srcBLen,
+  float32_t * pDst);
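+
+  /*
+   * Usage sketch (editorial annotation, not part of the upstream CMSIS-DSP header).
+   * Shows how the output buffer is sized at srcALen+srcBLen-1 samples for a full
+   * linear convolution. The array names a, b and y are hypothetical.
+   *
+   *   float32_t a[3] = { 1.0f, 2.0f, 3.0f };
+   *   float32_t b[2] = { 1.0f, 1.0f };
+   *   float32_t y[3 + 2 - 1];        // srcALen + srcBLen - 1 = 4 outputs
+   *
+   *   arm_conv_f32(a, 3, b, 2, y);   // y = { 1.0f, 3.0f, 5.0f, 3.0f }
+   */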
+
+
+  /**
+   * @brief Convolution of Q15 sequences.
+   * @param[in]  pSrcA      points to the first input sequence.
+   * @param[in]  srcALen    length of the first input sequence.
+   * @param[in]  pSrcB      points to the second input sequence.
+   * @param[in]  srcBLen    length of the second input sequence.
+   * @param[out] pDst       points to the block of output data.  Length srcALen+srcBLen-1.
+   * @param[in]  pScratch1  points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   * @param[in]  pScratch2  points to scratch buffer of size min(srcALen, srcBLen).
+   */
+  void arm_conv_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  q15_t * pScratch1,
+  q15_t * pScratch2);
+
+
+  /**
+   * @brief Convolution of Q15 sequences.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the location where the output result is written.  Length srcALen+srcBLen-1.
+   */
+  void arm_conv_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst);
+
+
+  /**
+   * @brief Convolution of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data.  Length srcALen+srcBLen-1.
+   */
+  void arm_conv_fast_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst);
+
+
+  /**
+   * @brief Convolution of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA      points to the first input sequence.
+   * @param[in]  srcALen    length of the first input sequence.
+   * @param[in]  pSrcB      points to the second input sequence.
+   * @param[in]  srcBLen    length of the second input sequence.
+   * @param[out] pDst       points to the block of output data.  Length srcALen+srcBLen-1.
+   * @param[in]  pScratch1  points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   * @param[in]  pScratch2  points to scratch buffer of size min(srcALen, srcBLen).
+   */
+  void arm_conv_fast_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  q15_t * pScratch1,
+  q15_t * pScratch2);
+
+
+  /**
+   * @brief Convolution of Q31 sequences.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data.  Length srcALen+srcBLen-1.
+   */
+  void arm_conv_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst);
+
+
+  /**
+   * @brief Convolution of Q31 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data.  Length srcALen+srcBLen-1.
+   */
+  void arm_conv_fast_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst);
+
+
+  /**
+   * @brief Convolution of Q7 sequences.
+   * @param[in]  pSrcA      points to the first input sequence.
+   * @param[in]  srcALen    length of the first input sequence.
+   * @param[in]  pSrcB      points to the second input sequence.
+   * @param[in]  srcBLen    length of the second input sequence.
+   * @param[out] pDst       points to the block of output data.  Length srcALen+srcBLen-1.
+   * @param[in]  pScratch1  points to scratch buffer (of type q15_t) of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   * @param[in]  pScratch2  points to scratch buffer (of type q15_t) of size min(srcALen, srcBLen).
+   */
+  void arm_conv_opt_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  q15_t * pScratch1,
+  q15_t * pScratch2);
+
+
+  /**
+   * @brief Convolution of Q7 sequences.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data.  Length srcALen+srcBLen-1.
+   */
+  void arm_conv_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst);
+
+
+  /**
+   * @brief Partial convolution of floating-point sequences.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @return  ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_f32(
+  float32_t * pSrcA,
+  uint32_t srcALen,
+  float32_t * pSrcB,
+  uint32_t srcBLen,
+  float32_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints);
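+
+  /*
+   * Usage sketch (editorial annotation, not part of the upstream CMSIS-DSP header).
+   * Requests output samples 1 and 2 of a four-sample convolution and checks the
+   * returned status. The buffer is sized for the full result as a conservative
+   * assumption about where the subset is written; the names a, b, y and status
+   * are hypothetical.
+   *
+   *   float32_t a[3] = { 1.0f, 2.0f, 3.0f };
+   *   float32_t b[2] = { 1.0f, 1.0f };
+   *   float32_t y[3 + 2 - 1] = { 0.0f };
+   *
+   *   // firstIndex = 1, numPoints = 2: indices 1..2 lie inside [0, 3].
+   *   arm_status status = arm_conv_partial_f32(a, 3, b, 2, y, 1, 2);
+   *   if (status != ARM_MATH_SUCCESS)
+   *   {
+   *     // The requested subset fell outside [0, srcALen+srcBLen-2].
+   *   }
+   */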
+
+
+  /**
+   * @brief Partial convolution of Q15 sequences.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @param[in]  pScratch1   points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   * @param[in]  pScratch2   points to scratch buffer of size min(srcALen, srcBLen).
+   * @return  ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2);
+
+
+  /**
+   * @brief Partial convolution of Q15 sequences.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @return  ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints);
+
+
+  /**
+   * @brief Partial convolution of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @return  ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_fast_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints);
+
+
+  /**
+   * @brief Partial convolution of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @param[in]  pScratch1   points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   * @param[in]  pScratch2   points to scratch buffer of size min(srcALen, srcBLen).
+   * @return  ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_fast_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2);
+
+
+  /**
+   * @brief Partial convolution of Q31 sequences.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @return  ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints);
+
+
+  /**
+   * @brief Partial convolution of Q31 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @return  ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_fast_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints);
+
+
+  /**
+   * @brief Partial convolution of Q7 sequences.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @param[in]  pScratch1   points to scratch buffer (of type q15_t) of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   * @param[in]  pScratch2   points to scratch buffer (of type q15_t) of size min(srcALen, srcBLen).
+   * @return  ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_opt_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2);
+
+
+  /**
+   * @brief Partial convolution of Q7 sequences.
+   * @param[in]  pSrcA       points to the first input sequence.
+   * @param[in]  srcALen     length of the first input sequence.
+   * @param[in]  pSrcB       points to the second input sequence.
+   * @param[in]  srcBLen     length of the second input sequence.
+   * @param[out] pDst        points to the block of output data
+   * @param[in]  firstIndex  is the first output sample to start with.
+   * @param[in]  numPoints   is the number of output points to be computed.
+   * @return  Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0 srcALen+srcBLen-2].
+   */
+  arm_status arm_conv_partial_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints);
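The partial-convolution contract documented above (compute only output samples firstIndex through firstIndex+numPoints-1 of the full srcALen+srcBLen-1 result, and report an argument error otherwise) can be sketched in plain C. The helper name `ref_conv_partial_f32`, the float data type, and the -1 error code are hypothetical and not part of CMSIS-DSP. The sketch stores sample n at dst[n]; whether the library uses the same output layout is an assumption to check against the CMSIS-DSP sources.

```c
#include <stdint.h>

/* Hypothetical reference helper, NOT a CMSIS-DSP function.
 * Computes samples firstIndex .. firstIndex+numPoints-1 of conv(a, b)
 * and stores sample n at dst[n] (layout is an assumption of this sketch).
 * Returns 0 on success, -1 if the requested subset is out of range. */
static int ref_conv_partial_f32(const float *a, uint32_t aLen,
                                const float *b, uint32_t bLen,
                                float *dst,
                                uint32_t firstIndex, uint32_t numPoints)
{
    if (aLen == 0u || bLen == 0u) {
        return -1;
    }

    /* The full convolution has aLen + bLen - 1 samples (indices 0 .. aLen+bLen-2). */
    uint32_t fullLen = aLen + bLen - 1u;
    if (firstIndex > fullLen || numPoints > fullLen - firstIndex) {
        return -1;
    }

    for (uint32_t n = firstIndex; n < firstIndex + numPoints; n++) {
        float acc = 0.0f;
        for (uint32_t k = 0u; k < aLen; k++) {
            /* Only accumulate terms whose b index (n - k) is valid. */
            if (n >= k && (n - k) < bLen) {
                acc += a[k] * b[n - k];
            }
        }
        dst[n] = acc;
    }
    return 0;
}
```

With a = {1, 2, 3} and b = {1, 1}, requesting firstIndex = 1 and numPoints = 2 fills only dst[1] and dst[2] and leaves the rest of dst untouched.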
+
+
+  /**
+   * @brief Instance structure for the Q15 FIR decimator.
+   */
+  typedef struct
+  {
+    uint8_t M;                  /**< decimation factor. */
+    uint16_t numTaps;           /**< number of coefficients in the filter. */
+    q15_t *pCoeffs;             /**< points to the coefficient array. The array is of length numTaps.*/
+    q15_t *pState;              /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+  } arm_fir_decimate_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q31 FIR decimator.
+   */
+  typedef struct
+  {
+    uint8_t M;                  /**< decimation factor. */
+    uint16_t numTaps;           /**< number of coefficients in the filter. */
+    q31_t *pCoeffs;             /**< points to the coefficient array. The array is of length numTaps.*/
+    q31_t *pState;              /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+  } arm_fir_decimate_instance_q31;
+
+  /**
+   * @brief Instance structure for the floating-point FIR decimator.
+   */
+  typedef struct
+  {
+    uint8_t M;                  /**< decimation factor. */
+    uint16_t numTaps;           /**< number of coefficients in the filter. */
+    float32_t *pCoeffs;         /**< points to the coefficient array. The array is of length numTaps.*/
+    float32_t *pState;          /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+  } arm_fir_decimate_instance_f32;
+
+
+  /**
+   * @brief Processing function for the floating-point FIR decimator.
+   * @param[in]  S          points to an instance of the floating-point FIR decimator structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of input samples to process per call.
+   */
+  void arm_fir_decimate_f32(
+  const arm_fir_decimate_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the floating-point FIR decimator.
+   * @param[in,out] S          points to an instance of the floating-point FIR decimator structure.
+   * @param[in]     numTaps    number of coefficients in the filter.
+   * @param[in]     M          decimation factor.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of input samples to process per call.
+   * @return    The function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_LENGTH_ERROR if
+   * <code>blockSize</code> is not a multiple of <code>M</code>.
+   */
+  arm_status arm_fir_decimate_init_f32(
+  arm_fir_decimate_instance_f32 * S,
+  uint16_t numTaps,
+  uint8_t M,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  uint32_t blockSize);
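The decimator contract above (state buffer of numTaps+blockSize-1 samples, blockSize required to be a multiple of M) can be illustrated with a plain-C reference. Everything named `ref_decim_*` is hypothetical and not part of CMSIS-DSP. The sketch keeps the coefficients in natural order h[0..numTaps-1] and produces y[m] = sum over k of h[k]*x[m*M-k]; the library's own coefficient ordering and output phase may differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference decimator, NOT the CMSIS-DSP implementation. */
typedef struct {
    uint8_t M;            /* decimation factor */
    uint16_t numTaps;     /* number of coefficients */
    const float *coeffs;  /* h[0..numTaps-1], natural order (sketch convention) */
    float *state;         /* numTaps + blockSize - 1 samples */
} ref_decim_t;

/* Returns 0 on success, -1 if blockSize is not a multiple of M. */
static int ref_decim_init(ref_decim_t *S, uint16_t numTaps, uint8_t M,
                          const float *coeffs, float *state, uint32_t blockSize)
{
    if (M == 0u || numTaps == 0u || (blockSize % M) != 0u) {
        return -1;
    }
    S->M = M;
    S->numTaps = numTaps;
    S->coeffs = coeffs;
    S->state = state;
    /* Clear the whole state buffer: history plus room for one input block. */
    memset(state, 0, (numTaps + blockSize - 1u) * sizeof(float));
    return 0;
}

static void ref_decim_run(const ref_decim_t *S, const float *src,
                          float *dst, uint32_t blockSize)
{
    uint32_t hist = S->numTaps - 1u;  /* samples carried over between calls */

    /* Append the new block after the numTaps-1 history samples. */
    memcpy(S->state + hist, src, blockSize * sizeof(float));

    for (uint32_t m = 0u; m < blockSize / S->M; m++) {
        const float *newest = S->state + hist + m * S->M;  /* x[m*M] */
        float acc = 0.0f;
        for (uint32_t k = 0u; k < S->numTaps; k++) {
            acc += S->coeffs[k] * *(newest - k);
        }
        dst[m] = acc;
    }

    /* Keep the last numTaps-1 input samples as history for the next call. */
    memmove(S->state, S->state + blockSize, hist * sizeof(float));
}
```

The buffer sizing follows directly from the layout: numTaps-1 history samples plus one block of blockSize new samples.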
+
+
+  /**
+   * @brief Processing function for the Q15 FIR decimator.
+   * @param[in]  S          points to an instance of the Q15 FIR decimator structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of input samples to process per call.
+   */
+  void arm_fir_decimate_q15(
+  const arm_fir_decimate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q15 FIR decimator (fast variant) for Cortex-M3 and Cortex-M4.
+   * @param[in]  S          points to an instance of the Q15 FIR decimator structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of input samples to process per call.
+   */
+  void arm_fir_decimate_fast_q15(
+  const arm_fir_decimate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q15 FIR decimator.
+   * @param[in,out] S          points to an instance of the Q15 FIR decimator structure.
+   * @param[in]     numTaps    number of coefficients in the filter.
+   * @param[in]     M          decimation factor.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of input samples to process per call.
+   * @return    The function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_LENGTH_ERROR if
+   * <code>blockSize</code> is not a multiple of <code>M</code>.
+   */
+  arm_status arm_fir_decimate_init_q15(
+  arm_fir_decimate_instance_q15 * S,
+  uint16_t numTaps,
+  uint8_t M,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q31 FIR decimator.
+   * @param[in]  S          points to an instance of the Q31 FIR decimator structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of input samples to process per call.
+   */
+  void arm_fir_decimate_q31(
+  const arm_fir_decimate_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+  /**
+   * @brief Processing function for the Q31 FIR decimator (fast variant) for Cortex-M3 and Cortex-M4.
+   * @param[in]  S          points to an instance of the Q31 FIR decimator structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of input samples to process per call.
+   */
+  void arm_fir_decimate_fast_q31(
+  arm_fir_decimate_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q31 FIR decimator.
+   * @param[in,out] S          points to an instance of the Q31 FIR decimator structure.
+   * @param[in]     numTaps    number of coefficients in the filter.
+   * @param[in]     M          decimation factor.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of input samples to process per call.
+   * @return    The function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_LENGTH_ERROR if
+   * <code>blockSize</code> is not a multiple of <code>M</code>.
+   */
+  arm_status arm_fir_decimate_init_q31(
+  arm_fir_decimate_instance_q31 * S,
+  uint16_t numTaps,
+  uint8_t M,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Instance structure for the Q15 FIR interpolator.
+   */
+  typedef struct
+  {
+    uint8_t L;                      /**< upsample factor. */
+    uint16_t phaseLength;           /**< length of each polyphase filter component. */
+    q15_t *pCoeffs;                 /**< points to the coefficient array. The array is of length L*phaseLength. */
+    q15_t *pState;                  /**< points to the state variable array. The array is of length blockSize+phaseLength-1. */
+  } arm_fir_interpolate_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q31 FIR interpolator.
+   */
+  typedef struct
+  {
+    uint8_t L;                      /**< upsample factor. */
+    uint16_t phaseLength;           /**< length of each polyphase filter component. */
+    q31_t *pCoeffs;                 /**< points to the coefficient array. The array is of length L*phaseLength. */
+    q31_t *pState;                  /**< points to the state variable array. The array is of length blockSize+phaseLength-1. */
+  } arm_fir_interpolate_instance_q31;
+
+  /**
+   * @brief Instance structure for the floating-point FIR interpolator.
+   */
+  typedef struct
+  {
+    uint8_t L;                     /**< upsample factor. */
+    uint16_t phaseLength;          /**< length of each polyphase filter component. */
+    float32_t *pCoeffs;            /**< points to the coefficient array. The array is of length L*phaseLength. */
+    float32_t *pState;             /**< points to the state variable array. The array is of length blockSize+phaseLength-1. */
+  } arm_fir_interpolate_instance_f32;
+
+
+  /**
+   * @brief Processing function for the Q15 FIR interpolator.
+   * @param[in]  S          points to an instance of the Q15 FIR interpolator structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of input samples to process per call.
+   */
+  void arm_fir_interpolate_q15(
+  const arm_fir_interpolate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q15 FIR interpolator.
+   * @param[in,out] S          points to an instance of the Q15 FIR interpolator structure.
+   * @param[in]     L          upsample factor.
+   * @param[in]     numTaps    number of filter coefficients in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficient buffer.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of input samples to process per call.
+   * @return        The function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_LENGTH_ERROR if
+   * the filter length <code>numTaps</code> is not a multiple of the interpolation factor <code>L</code>.
+   */
+  arm_status arm_fir_interpolate_init_q15(
+  arm_fir_interpolate_instance_q15 * S,
+  uint8_t L,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q31 FIR interpolator.
+   * @param[in]  S          points to an instance of the Q31 FIR interpolator structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of input samples to process per call.
+   */
+  void arm_fir_interpolate_q31(
+  const arm_fir_interpolate_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q31 FIR interpolator.
+   * @param[in,out] S          points to an instance of the Q31 FIR interpolator structure.
+   * @param[in]     L          upsample factor.
+   * @param[in]     numTaps    number of filter coefficients in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficient buffer.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of input samples to process per call.
+   * @return        The function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_LENGTH_ERROR if
+   * the filter length <code>numTaps</code> is not a multiple of the interpolation factor <code>L</code>.
+   */
+  arm_status arm_fir_interpolate_init_q31(
+  arm_fir_interpolate_instance_q31 * S,
+  uint8_t L,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the floating-point FIR interpolator.
+   * @param[in]  S          points to an instance of the floating-point FIR interpolator structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of input samples to process per call.
+   */
+  void arm_fir_interpolate_f32(
+  const arm_fir_interpolate_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the floating-point FIR interpolator.
+   * @param[in,out] S          points to an instance of the floating-point FIR interpolator structure.
+   * @param[in]     L          upsample factor.
+   * @param[in]     numTaps    number of filter coefficients in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficient buffer.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     blockSize  number of input samples to process per call.
+   * @return        The function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_LENGTH_ERROR if
+   * the filter length <code>numTaps</code> is not a multiple of the interpolation factor <code>L</code>.
+   */
+  arm_status arm_fir_interpolate_init_f32(
+  arm_fir_interpolate_instance_f32 * S,
+  uint8_t L,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  uint32_t blockSize);
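The interpolator parameters above imply a polyphase structure: numTaps must be a multiple of L, each of the L phases has phaseLength = numTaps/L coefficients, and every input sample yields L output samples. A minimal stateless sketch in plain C follows. The helper `ref_interp_f32` is hypothetical and not part of CMSIS-DSP; it assumes zero history before the block and natural coefficient order.

```c
#include <stdint.h>

/* Hypothetical reference interpolator, NOT the CMSIS-DSP implementation.
 * Assumes all input before src[0] is zero. Writes blockSize * L outputs.
 * Returns 0 on success, -1 if numTaps is not a multiple of L. */
static int ref_interp_f32(uint8_t L, uint16_t numTaps, const float *h,
                          const float *src, float *dst, uint32_t blockSize)
{
    if (L == 0u || numTaps == 0u || (numTaps % L) != 0u) {
        return -1;
    }

    uint32_t phaseLength = (uint32_t)numTaps / L;

    for (uint32_t n = 0u; n < blockSize; n++) {
        for (uint32_t p = 0u; p < L; p++) {
            /* Phase p uses coefficients h[p], h[p+L], h[p+2L], ... */
            float acc = 0.0f;
            for (uint32_t k = 0u; k < phaseLength; k++) {
                if (k <= n) {
                    acc += h[p + k * L] * src[n - k];
                }
            }
            dst[n * L + p] = acc;
        }
    }
    return 0;
}
```

A streaming version would carry the last phaseLength-1 input samples between calls, which is why the state buffer holds blockSize+phaseLength-1 samples.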
+
+
+  /**
+   * @brief Instance structure for the high precision Q31 Biquad cascade filter.
+   */
+  typedef struct
+  {
+    uint8_t numStages;       /**< number of 2nd order stages in the filter.  Overall order is 2*numStages. */
+    q63_t *pState;           /**< points to the array of state variables.  The array is of length 4*numStages. */
+    q31_t *pCoeffs;          /**< points to the array of coefficients.  The array is of length 5*numStages. */
+    uint8_t postShift;       /**< additional shift, in bits, applied to each output sample. */
+  } arm_biquad_cas_df1_32x64_ins_q31;
+
+
+  /**
+   * @brief Processing function for the high precision Q31 Biquad cascade filter.
+   * @param[in]  S          points to an instance of the high precision Q31 Biquad cascade filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cas_df1_32x64_q31(
+  const arm_biquad_cas_df1_32x64_ins_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the high precision Q31 Biquad cascade filter.
+   * @param[in,out] S          points to an instance of the high precision Q31 Biquad cascade filter structure.
+   * @param[in]     numStages  number of 2nd order stages in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     postShift  shift to be applied to the output. Varies according to the coefficient format.
+   */
+  void arm_biquad_cas_df1_32x64_init_q31(
+  arm_biquad_cas_df1_32x64_ins_q31 * S,
+  uint8_t numStages,
+  q31_t * pCoeffs,
+  q63_t * pState,
+  uint8_t postShift);
+
+
+  /**
+   * @brief Instance structure for the floating-point transposed direct form II Biquad cascade filter.
+   */
+  typedef struct
+  {
+    uint8_t numStages;         /**< number of 2nd order stages in the filter.  Overall order is 2*numStages. */
+    float32_t *pState;         /**< points to the array of state variables.  The array is of length 2*numStages. */
+    float32_t *pCoeffs;        /**< points to the array of coefficients.  The array is of length 5*numStages. */
+  } arm_biquad_cascade_df2T_instance_f32;
+
+  /**
+   * @brief Instance structure for the stereo (2-channel) floating-point transposed direct form II Biquad cascade filter.
+   */
+  typedef struct
+  {
+    uint8_t numStages;         /**< number of 2nd order stages in the filter.  Overall order is 2*numStages. */
+    float32_t *pState;         /**< points to the array of state variables.  The array is of length 4*numStages. */
+    float32_t *pCoeffs;        /**< points to the array of coefficients.  The array is of length 5*numStages. */
+  } arm_biquad_cascade_stereo_df2T_instance_f32;
+
+  /**
+   * @brief Instance structure for the double-precision floating-point transposed direct form II Biquad cascade filter.
+   */
+  typedef struct
+  {
+    uint8_t numStages;         /**< number of 2nd order stages in the filter.  Overall order is 2*numStages. */
+    float64_t *pState;         /**< points to the array of state variables.  The array is of length 2*numStages. */
+    float64_t *pCoeffs;        /**< points to the array of coefficients.  The array is of length 5*numStages. */
+  } arm_biquad_cascade_df2T_instance_f64;
+
+
+  /**
+   * @brief Processing function for the floating-point transposed direct form II Biquad cascade filter.
+   * @param[in]  S          points to an instance of the filter data structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cascade_df2T_f32(
+  const arm_biquad_cascade_df2T_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the stereo (2-channel) floating-point transposed direct form II Biquad cascade filter.
+   * @param[in]  S          points to an instance of the filter data structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cascade_stereo_df2T_f32(
+  const arm_biquad_cascade_stereo_df2T_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the double-precision floating-point transposed direct form II Biquad cascade filter.
+   * @param[in]  S          points to an instance of the filter data structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_biquad_cascade_df2T_f64(
+  const arm_biquad_cascade_df2T_instance_f64 * S,
+  float64_t * pSrc,
+  float64_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the floating-point transposed direct form II Biquad cascade filter.
+   * @param[in,out] S          points to an instance of the filter data structure.
+   * @param[in]     numStages  number of 2nd order stages in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   */
+  void arm_biquad_cascade_df2T_init_f32(
+  arm_biquad_cascade_df2T_instance_f32 * S,
+  uint8_t numStages,
+  float32_t * pCoeffs,
+  float32_t * pState);
+
+
+  /**
+   * @brief  Initialization function for the stereo (2-channel) floating-point transposed direct form II Biquad cascade filter.
+   * @param[in,out] S          points to an instance of the filter data structure.
+   * @param[in]     numStages  number of 2nd order stages in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   */
+  void arm_biquad_cascade_stereo_df2T_init_f32(
+  arm_biquad_cascade_stereo_df2T_instance_f32 * S,
+  uint8_t numStages,
+  float32_t * pCoeffs,
+  float32_t * pState);
+
+
+  /**
+   * @brief  Initialization function for the double-precision floating-point transposed direct form II Biquad cascade filter.
+   * @param[in,out] S          points to an instance of the filter data structure.
+   * @param[in]     numStages  number of 2nd order stages in the filter.
+   * @param[in]     pCoeffs    points to the filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   */
+  void arm_biquad_cascade_df2T_init_f64(
+  arm_biquad_cascade_df2T_instance_f64 * S,
+  uint8_t numStages,
+  float64_t * pCoeffs,
+  float64_t * pState);
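The buffer sizes quoted above (5 coefficients and 2 state variables per stage for the mono transposed direct form II filter) correspond to the following per-sample recurrence, sketched in plain C. The helper `ref_biquad_df2T` is hypothetical and not part of CMSIS-DSP. It assumes each stage stores {b0, b1, b2, a1, a2} with the feedback terms added, so a1 and a2 carry the opposite sign from the common 1 + a1*z^-1 + a2*z^-2 denominator form; verify this convention against the library documentation before reusing coefficients.

```c
#include <stdint.h>

/* Hypothetical reference cascade, NOT the CMSIS-DSP implementation.
 * coeffs: 5 per stage, {b0, b1, b2, a1, a2} (feedback terms are added).
 * state:  2 per stage, {d1, d2}, zeroed by the caller before first use. */
static void ref_biquad_df2T(uint8_t numStages, const float *coeffs,
                            float *state, const float *src, float *dst,
                            uint32_t blockSize)
{
    for (uint32_t i = 0u; i < blockSize; i++) {
        float x = src[i];
        for (uint32_t s = 0u; s < numStages; s++) {
            const float *c = coeffs + 5u * s;
            float *d = state + 2u * s;

            /* Transposed direct form II update for one 2nd order stage. */
            float y = c[0] * x + d[0];
            d[0] = c[1] * x + c[3] * y + d[1];
            d[1] = c[2] * x + c[4] * y;

            x = y;  /* output of this stage feeds the next one */
        }
        dst[i] = x;
    }
}
```

With a single stage {1, 0, 0, 0.5, 0}, an impulse input decays as 1, 0.5, 0.25, 0.125.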
+
+
+  /**
+   * @brief Instance structure for the Q15 FIR lattice filter.
+   */
+  typedef struct
+  {
+    uint16_t numStages;                  /**< number of filter stages. */
+    q15_t *pState;                       /**< points to the state variable array. The array is of length numStages. */
+    q15_t *pCoeffs;                      /**< points to the coefficient array. The array is of length numStages. */
+  } arm_fir_lattice_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q31 FIR lattice filter.
+   */
+  typedef struct
+  {
+    uint16_t numStages;                  /**< number of filter stages. */
+    q31_t *pState;                       /**< points to the state variable array. The array is of length numStages. */
+    q31_t *pCoeffs;                      /**< points to the coefficient array. The array is of length numStages. */
+  } arm_fir_lattice_instance_q31;
+
+  /**
+   * @brief Instance structure for the floating-point FIR lattice filter.
+   */
+  typedef struct
+  {
+    uint16_t numStages;                  /**< number of filter stages. */
+    float32_t *pState;                   /**< points to the state variable array. The array is of length numStages. */
+    float32_t *pCoeffs;                  /**< points to the coefficient array. The array is of length numStages. */
+  } arm_fir_lattice_instance_f32;
+
+
+  /**
+   * @brief Initialization function for the Q15 FIR lattice filter.
+   * @param[in,out] S          points to an instance of the Q15 FIR lattice structure.
+   * @param[in]     numStages  number of filter stages.
+   * @param[in]     pCoeffs    points to the coefficient buffer.  The array is of length numStages.
+   * @param[in]     pState     points to the state buffer.  The array is of length numStages.
+   */
+  void arm_fir_lattice_init_q15(
+  arm_fir_lattice_instance_q15 * S,
+  uint16_t numStages,
+  q15_t * pCoeffs,
+  q15_t * pState);
+
+
+  /**
+   * @brief Processing function for the Q15 FIR lattice filter.
+   * @param[in]  S          points to an instance of the Q15 FIR lattice structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_lattice_q15(
+  const arm_fir_lattice_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for the Q31 FIR lattice filter.
+   * @param[in,out] S          points to an instance of the Q31 FIR lattice structure.
+   * @param[in]     numStages  number of filter stages.
+   * @param[in]     pCoeffs    points to the coefficient buffer.  The array is of length numStages.
+   * @param[in]     pState     points to the state buffer.  The array is of length numStages.
+   */
+  void arm_fir_lattice_init_q31(
+  arm_fir_lattice_instance_q31 * S,
+  uint16_t numStages,
+  q31_t * pCoeffs,
+  q31_t * pState);
+
+
+  /**
+   * @brief Processing function for the Q31 FIR lattice filter.
+   * @param[in]  S          points to an instance of the Q31 FIR lattice structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_lattice_q31(
+  const arm_fir_lattice_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for the floating-point FIR lattice filter.
+   * @param[in,out] S          points to an instance of the floating-point FIR lattice structure.
+   * @param[in]     numStages  number of filter stages.
+   * @param[in]     pCoeffs    points to the coefficient buffer.  The array is of length numStages.
+   * @param[in]     pState     points to the state buffer.  The array is of length numStages.
+   */
+  void arm_fir_lattice_init_f32(
+  arm_fir_lattice_instance_f32 * S,
+  uint16_t numStages,
+  float32_t * pCoeffs,
+  float32_t * pState);
+
+
+  /**
+   * @brief Processing function for the floating-point FIR lattice filter.
+   * @param[in]  S          points to an instance of the floating-point FIR lattice structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_fir_lattice_f32(
+  const arm_fir_lattice_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Instance structure for the Q15 IIR lattice filter.
+   */
+  typedef struct
+  {
+    uint16_t numStages;                  /**< number of stages in the filter. */
+    q15_t *pState;                       /**< points to the state variable array. The array is of length numStages+blockSize. */
+    q15_t *pkCoeffs;                     /**< points to the reflection coefficient array. The array is of length numStages. */
+    q15_t *pvCoeffs;                     /**< points to the ladder coefficient array. The array is of length numStages+1. */
+  } arm_iir_lattice_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q31 IIR lattice filter.
+   */
+  typedef struct
+  {
+    uint16_t numStages;                  /**< number of stages in the filter. */
+    q31_t *pState;                       /**< points to the state variable array. The array is of length numStages+blockSize. */
+    q31_t *pkCoeffs;                     /**< points to the reflection coefficient array. The array is of length numStages. */
+    q31_t *pvCoeffs;                     /**< points to the ladder coefficient array. The array is of length numStages+1. */
+  } arm_iir_lattice_instance_q31;
+
+  /**
+   * @brief Instance structure for the floating-point IIR lattice filter.
+   */
+  typedef struct
+  {
+    uint16_t numStages;                  /**< number of stages in the filter. */
+    float32_t *pState;                   /**< points to the state variable array. The array is of length numStages+blockSize. */
+    float32_t *pkCoeffs;                 /**< points to the reflection coefficient array. The array is of length numStages. */
+    float32_t *pvCoeffs;                 /**< points to the ladder coefficient array. The array is of length numStages+1. */
+  } arm_iir_lattice_instance_f32;
+
+
+  /**
+   * @brief Processing function for the floating-point IIR lattice filter.
+   * @param[in]  S          points to an instance of the floating-point IIR lattice structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_iir_lattice_f32(
+  const arm_iir_lattice_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for the floating-point IIR lattice filter.
+   * @param[in,out] S          points to an instance of the floating-point IIR lattice structure.
+   * @param[in]     numStages  number of stages in the filter.
+   * @param[in]     pkCoeffs   points to the reflection coefficient buffer.  The array is of length numStages.
+   * @param[in]     pvCoeffs   points to the ladder coefficient buffer.  The array is of length numStages+1.
+   * @param[in]     pState     points to the state buffer.  The array is of length numStages+blockSize.
+   * @param[in]     blockSize  number of samples to process.
+   */
+  void arm_iir_lattice_init_f32(
+  arm_iir_lattice_instance_f32 * S,
+  uint16_t numStages,
+  float32_t * pkCoeffs,
+  float32_t * pvCoeffs,
+  float32_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q31 IIR lattice filter.
+   * @param[in]  S          points to an instance of the Q31 IIR lattice structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_iir_lattice_q31(
+  const arm_iir_lattice_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for the Q31 IIR lattice filter.
+   * @param[in,out] S          points to an instance of the Q31 IIR lattice structure.
+   * @param[in]     numStages  number of stages in the filter.
+   * @param[in]     pkCoeffs   points to the reflection coefficient buffer.  The array is of length numStages.
+   * @param[in]     pvCoeffs   points to the ladder coefficient buffer.  The array is of length numStages+1.
+   * @param[in]     pState     points to the state buffer.  The array is of length numStages+blockSize.
+   * @param[in]     blockSize  number of samples to process.
+   */
+  void arm_iir_lattice_init_q31(
+  arm_iir_lattice_instance_q31 * S,
+  uint16_t numStages,
+  q31_t * pkCoeffs,
+  q31_t * pvCoeffs,
+  q31_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q15 IIR lattice filter.
+   * @param[in]  S          points to an instance of the Q15 IIR lattice structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[out] pDst       points to the block of output data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_iir_lattice_q15(
+  const arm_iir_lattice_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for the Q15 IIR lattice filter.
+   * @param[in,out] S          points to an instance of the fixed-point Q15 IIR lattice structure.
+   * @param[in]     numStages  number of stages in the filter.
+   * @param[in]     pkCoeffs   points to the reflection coefficient buffer.  The array is of length numStages.
+   * @param[in]     pvCoeffs   points to the ladder coefficient buffer.  The array is of length numStages+1.
+   * @param[in]     pState     points to the state buffer.  The array is of length numStages+blockSize.
+   * @param[in]     blockSize  number of samples to process per call.
+   */
+  void arm_iir_lattice_init_q15(
+  arm_iir_lattice_instance_q15 * S,
+  uint16_t numStages,
+  q15_t * pkCoeffs,
+  q15_t * pvCoeffs,
+  q15_t * pState,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Instance structure for the floating-point LMS filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;    /**< number of coefficients in the filter. */
+    float32_t *pState;   /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    float32_t *pCoeffs;  /**< points to the coefficient array. The array is of length numTaps. */
+    float32_t mu;        /**< step size that controls filter coefficient updates. */
+  } arm_lms_instance_f32;
+
+
+  /**
+   * @brief Processing function for floating-point LMS filter.
+   * @param[in]  S          points to an instance of the floating-point LMS filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[in]  pRef       points to the block of reference data.
+   * @param[out] pOut       points to the block of output data.
+   * @param[out] pErr       points to the block of error data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_lms_f32(
+  const arm_lms_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pRef,
+  float32_t * pOut,
+  float32_t * pErr,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for floating-point LMS filter.
+   * @param[in,out] S          points to an instance of the floating-point LMS filter structure.
+   * @param[in]     numTaps    number of filter coefficients.
+   * @param[in]     pCoeffs    points to the coefficient buffer.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     mu         step size that controls filter coefficient updates.
+   * @param[in]     blockSize  number of samples to process.
+   */
+  void arm_lms_init_f32(
+  arm_lms_instance_f32 * S,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  float32_t mu,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Instance structure for the Q15 LMS filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;    /**< number of coefficients in the filter. */
+    q15_t *pState;       /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    q15_t *pCoeffs;      /**< points to the coefficient array. The array is of length numTaps. */
+    q15_t mu;            /**< step size that controls filter coefficient updates. */
+    uint32_t postShift;  /**< bit shift applied to coefficients. */
+  } arm_lms_instance_q15;
+
+
+  /**
+   * @brief Initialization function for the Q15 LMS filter.
+   * @param[in] S          points to an instance of the Q15 LMS filter structure.
+   * @param[in] numTaps    number of filter coefficients.
+   * @param[in] pCoeffs    points to the coefficient buffer.
+   * @param[in] pState     points to the state buffer.
+   * @param[in] mu         step size that controls filter coefficient updates.
+   * @param[in] blockSize  number of samples to process.
+   * @param[in] postShift  bit shift applied to coefficients.
+   */
+  void arm_lms_init_q15(
+  arm_lms_instance_q15 * S,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  q15_t mu,
+  uint32_t blockSize,
+  uint32_t postShift);
+
+
+  /**
+   * @brief Processing function for Q15 LMS filter.
+   * @param[in]  S          points to an instance of the Q15 LMS filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[in]  pRef       points to the block of reference data.
+   * @param[out] pOut       points to the block of output data.
+   * @param[out] pErr       points to the block of error data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_lms_q15(
+  const arm_lms_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pRef,
+  q15_t * pOut,
+  q15_t * pErr,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Instance structure for the Q31 LMS filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;    /**< number of coefficients in the filter. */
+    q31_t *pState;       /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    q31_t *pCoeffs;      /**< points to the coefficient array. The array is of length numTaps. */
+    q31_t mu;            /**< step size that controls filter coefficient updates. */
+    uint32_t postShift;  /**< bit shift applied to coefficients. */
+  } arm_lms_instance_q31;
+
+
+  /**
+   * @brief Processing function for Q31 LMS filter.
+   * @param[in]  S          points to an instance of the Q31 LMS filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[in]  pRef       points to the block of reference data.
+   * @param[out] pOut       points to the block of output data.
+   * @param[out] pErr       points to the block of error data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_lms_q31(
+  const arm_lms_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pRef,
+  q31_t * pOut,
+  q31_t * pErr,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for Q31 LMS filter.
+   * @param[in] S          points to an instance of the Q31 LMS filter structure.
+   * @param[in] numTaps    number of filter coefficients.
+   * @param[in] pCoeffs    points to coefficient buffer.
+   * @param[in] pState     points to state buffer.
+   * @param[in] mu         step size that controls filter coefficient updates.
+   * @param[in] blockSize  number of samples to process.
+   * @param[in] postShift  bit shift applied to coefficients.
+   */
+  void arm_lms_init_q31(
+  arm_lms_instance_q31 * S,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  q31_t mu,
+  uint32_t blockSize,
+  uint32_t postShift);
+
+
+  /**
+   * @brief Instance structure for the floating-point normalized LMS filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;     /**< number of coefficients in the filter. */
+    float32_t *pState;    /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    float32_t *pCoeffs;   /**< points to the coefficient array. The array is of length numTaps. */
+    float32_t mu;         /**< step size that controls filter coefficient updates. */
+    float32_t energy;     /**< saves previous frame energy. */
+    float32_t x0;         /**< saves previous input sample. */
+  } arm_lms_norm_instance_f32;
+
+
+  /**
+   * @brief Processing function for floating-point normalized LMS filter.
+   * @param[in]  S          points to an instance of the floating-point normalized LMS filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[in]  pRef       points to the block of reference data.
+   * @param[out] pOut       points to the block of output data.
+   * @param[out] pErr       points to the block of error data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_lms_norm_f32(
+  arm_lms_norm_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pRef,
+  float32_t * pOut,
+  float32_t * pErr,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for floating-point normalized LMS filter.
+   * @param[in] S          points to an instance of the floating-point normalized LMS filter structure.
+   * @param[in] numTaps    number of filter coefficients.
+   * @param[in] pCoeffs    points to coefficient buffer.
+   * @param[in] pState     points to state buffer.
+   * @param[in] mu         step size that controls filter coefficient updates.
+   * @param[in] blockSize  number of samples to process.
+   */
+  void arm_lms_norm_init_f32(
+  arm_lms_norm_instance_f32 * S,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  float32_t mu,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Instance structure for the Q31 normalized LMS filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;     /**< number of coefficients in the filter. */
+    q31_t *pState;        /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    q31_t *pCoeffs;       /**< points to the coefficient array. The array is of length numTaps. */
+    q31_t mu;             /**< step size that controls filter coefficient updates. */
+    uint8_t postShift;    /**< bit shift applied to coefficients. */
+    q31_t *recipTable;    /**< points to the reciprocal initial value table. */
+    q31_t energy;         /**< saves previous frame energy. */
+    q31_t x0;             /**< saves previous input sample. */
+  } arm_lms_norm_instance_q31;
+
+
+  /**
+   * @brief Processing function for Q31 normalized LMS filter.
+   * @param[in]  S          points to an instance of the Q31 normalized LMS filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[in]  pRef       points to the block of reference data.
+   * @param[out] pOut       points to the block of output data.
+   * @param[out] pErr       points to the block of error data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_lms_norm_q31(
+  arm_lms_norm_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pRef,
+  q31_t * pOut,
+  q31_t * pErr,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for Q31 normalized LMS filter.
+   * @param[in] S          points to an instance of the Q31 normalized LMS filter structure.
+   * @param[in] numTaps    number of filter coefficients.
+   * @param[in] pCoeffs    points to coefficient buffer.
+   * @param[in] pState     points to state buffer.
+   * @param[in] mu         step size that controls filter coefficient updates.
+   * @param[in] blockSize  number of samples to process.
+   * @param[in] postShift  bit shift applied to coefficients.
+   */
+  void arm_lms_norm_init_q31(
+  arm_lms_norm_instance_q31 * S,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  q31_t mu,
+  uint32_t blockSize,
+  uint8_t postShift);
+
+
+  /**
+   * @brief Instance structure for the Q15 normalized LMS filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;     /**< number of coefficients in the filter. */
+    q15_t *pState;        /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+    q15_t *pCoeffs;       /**< points to the coefficient array. The array is of length numTaps. */
+    q15_t mu;             /**< step size that controls filter coefficient updates. */
+    uint8_t postShift;    /**< bit shift applied to coefficients. */
+    q15_t *recipTable;    /**< points to the reciprocal initial value table. */
+    q15_t energy;         /**< saves previous frame energy. */
+    q15_t x0;             /**< saves previous input sample. */
+  } arm_lms_norm_instance_q15;
+
+
+  /**
+   * @brief Processing function for Q15 normalized LMS filter.
+   * @param[in]  S          points to an instance of the Q15 normalized LMS filter structure.
+   * @param[in]  pSrc       points to the block of input data.
+   * @param[in]  pRef       points to the block of reference data.
+   * @param[out] pOut       points to the block of output data.
+   * @param[out] pErr       points to the block of error data.
+   * @param[in]  blockSize  number of samples to process.
+   */
+  void arm_lms_norm_q15(
+  arm_lms_norm_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pRef,
+  q15_t * pOut,
+  q15_t * pErr,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Initialization function for Q15 normalized LMS filter.
+   * @param[in] S          points to an instance of the Q15 normalized LMS filter structure.
+   * @param[in] numTaps    number of filter coefficients.
+   * @param[in] pCoeffs    points to coefficient buffer.
+   * @param[in] pState     points to state buffer.
+   * @param[in] mu         step size that controls filter coefficient updates.
+   * @param[in] blockSize  number of samples to process.
+   * @param[in] postShift  bit shift applied to coefficients.
+   */
+  void arm_lms_norm_init_q15(
+  arm_lms_norm_instance_q15 * S,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  q15_t mu,
+  uint32_t blockSize,
+  uint8_t postShift);
+
+
+  /**
+   * @brief Correlation of floating-point sequences.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   */
+  void arm_correlate_f32(
+  float32_t * pSrcA,
+  uint32_t srcALen,
+  float32_t * pSrcB,
+  uint32_t srcBLen,
+  float32_t * pDst);
+
+
+  /**
+   * @brief Correlation of Q15 sequences.
+   * @param[in]  pSrcA     points to the first input sequence.
+   * @param[in]  srcALen   length of the first input sequence.
+   * @param[in]  pSrcB     points to the second input sequence.
+   * @param[in]  srcBLen   length of the second input sequence.
+   * @param[out] pDst      points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   * @param[in]  pScratch  points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   */
+  void arm_correlate_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  q15_t * pScratch);
+
+
+  /**
+   * @brief Correlation of Q15 sequences.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   */
+
+  void arm_correlate_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst);
+
+
+  /**
+   * @brief Correlation of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   */
+
+  void arm_correlate_fast_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst);
+
+
+  /**
+   * @brief Correlation of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA     points to the first input sequence.
+   * @param[in]  srcALen   length of the first input sequence.
+   * @param[in]  pSrcB     points to the second input sequence.
+   * @param[in]  srcBLen   length of the second input sequence.
+   * @param[out] pDst      points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   * @param[in]  pScratch  points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   */
+  void arm_correlate_fast_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  q15_t * pScratch);
+
+
+  /**
+   * @brief Correlation of Q31 sequences.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   */
+  void arm_correlate_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst);
+
+
+  /**
+   * @brief Correlation of Q31 sequences (fast version) for Cortex-M3 and Cortex-M4.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   */
+  void arm_correlate_fast_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst);
+
+
+  /**
+   * @brief Correlation of Q7 sequences.
+   * @param[in]  pSrcA      points to the first input sequence.
+   * @param[in]  srcALen    length of the first input sequence.
+   * @param[in]  pSrcB      points to the second input sequence.
+   * @param[in]  srcBLen    length of the second input sequence.
+   * @param[out] pDst       points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   * @param[in]  pScratch1  points to scratch buffer (of type q15_t) of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+   * @param[in]  pScratch2  points to scratch buffer (of type q15_t) of size min(srcALen, srcBLen).
+   */
+  void arm_correlate_opt_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  q15_t * pScratch1,
+  q15_t * pScratch2);
+
+
+  /**
+   * @brief Correlation of Q7 sequences.
+   * @param[in]  pSrcA    points to the first input sequence.
+   * @param[in]  srcALen  length of the first input sequence.
+   * @param[in]  pSrcB    points to the second input sequence.
+   * @param[in]  srcBLen  length of the second input sequence.
+   * @param[out] pDst     points to the block of output data. Length 2 * max(srcALen, srcBLen) - 1.
+   */
+  void arm_correlate_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst);
+
+
+  /**
+   * @brief Instance structure for the floating-point sparse FIR filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;             /**< number of coefficients in the filter. */
+    uint16_t stateIndex;          /**< state buffer index.  Points to the oldest sample in the state buffer. */
+    float32_t *pState;            /**< points to the state buffer array. The array is of length maxDelay+blockSize-1. */
+    float32_t *pCoeffs;           /**< points to the coefficient array. The array is of length numTaps.*/
+    uint16_t maxDelay;            /**< maximum offset specified by the pTapDelay array. */
+    int32_t *pTapDelay;           /**< points to the array of delay values.  The array is of length numTaps. */
+  } arm_fir_sparse_instance_f32;
+
+  /**
+   * @brief Instance structure for the Q31 sparse FIR filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;             /**< number of coefficients in the filter. */
+    uint16_t stateIndex;          /**< state buffer index.  Points to the oldest sample in the state buffer. */
+    q31_t *pState;                /**< points to the state buffer array. The array is of length maxDelay+blockSize-1. */
+    q31_t *pCoeffs;               /**< points to the coefficient array. The array is of length numTaps.*/
+    uint16_t maxDelay;            /**< maximum offset specified by the pTapDelay array. */
+    int32_t *pTapDelay;           /**< points to the array of delay values.  The array is of length numTaps. */
+  } arm_fir_sparse_instance_q31;
+
+  /**
+   * @brief Instance structure for the Q15 sparse FIR filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;             /**< number of coefficients in the filter. */
+    uint16_t stateIndex;          /**< state buffer index.  Points to the oldest sample in the state buffer. */
+    q15_t *pState;                /**< points to the state buffer array. The array is of length maxDelay+blockSize-1. */
+    q15_t *pCoeffs;               /**< points to the coefficient array. The array is of length numTaps.*/
+    uint16_t maxDelay;            /**< maximum offset specified by the pTapDelay array. */
+    int32_t *pTapDelay;           /**< points to the array of delay values.  The array is of length numTaps. */
+  } arm_fir_sparse_instance_q15;
+
+  /**
+   * @brief Instance structure for the Q7 sparse FIR filter.
+   */
+  typedef struct
+  {
+    uint16_t numTaps;             /**< number of coefficients in the filter. */
+    uint16_t stateIndex;          /**< state buffer index.  Points to the oldest sample in the state buffer. */
+    q7_t *pState;                 /**< points to the state buffer array. The array is of length maxDelay+blockSize-1. */
+    q7_t *pCoeffs;                /**< points to the coefficient array. The array is of length numTaps.*/
+    uint16_t maxDelay;            /**< maximum offset specified by the pTapDelay array. */
+    int32_t *pTapDelay;           /**< points to the array of delay values.  The array is of length numTaps. */
+  } arm_fir_sparse_instance_q7;
+
+
+  /**
+   * @brief Processing function for the floating-point sparse FIR filter.
+   * @param[in]  S           points to an instance of the floating-point sparse FIR structure.
+   * @param[in]  pSrc        points to the block of input data.
+   * @param[out] pDst        points to the block of output data.
+   * @param[in]  pScratchIn  points to a temporary buffer of size blockSize.
+   * @param[in]  blockSize   number of input samples to process per call.
+   */
+  void arm_fir_sparse_f32(
+  arm_fir_sparse_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  float32_t * pScratchIn,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the floating-point sparse FIR filter.
+   * @param[in,out] S          points to an instance of the floating-point sparse FIR structure.
+   * @param[in]     numTaps    number of nonzero coefficients in the filter.
+   * @param[in]     pCoeffs    points to the array of filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     pTapDelay  points to the array of offset times.
+   * @param[in]     maxDelay   maximum offset time supported.
+   * @param[in]     blockSize  number of samples that will be processed per block.
+   */
+  void arm_fir_sparse_init_f32(
+  arm_fir_sparse_instance_f32 * S,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  int32_t * pTapDelay,
+  uint16_t maxDelay,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q31 sparse FIR filter.
+   * @param[in]  S           points to an instance of the Q31 sparse FIR structure.
+   * @param[in]  pSrc        points to the block of input data.
+   * @param[out] pDst        points to the block of output data.
+   * @param[in]  pScratchIn  points to a temporary buffer of size blockSize.
+   * @param[in]  blockSize   number of input samples to process per call.
+   */
+  void arm_fir_sparse_q31(
+  arm_fir_sparse_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  q31_t * pScratchIn,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q31 sparse FIR filter.
+   * @param[in,out] S          points to an instance of the Q31 sparse FIR structure.
+   * @param[in]     numTaps    number of nonzero coefficients in the filter.
+   * @param[in]     pCoeffs    points to the array of filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     pTapDelay  points to the array of offset times.
+   * @param[in]     maxDelay   maximum offset time supported.
+   * @param[in]     blockSize  number of samples that will be processed per block.
+   */
+  void arm_fir_sparse_init_q31(
+  arm_fir_sparse_instance_q31 * S,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  int32_t * pTapDelay,
+  uint16_t maxDelay,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q15 sparse FIR filter.
+   * @param[in]  S            points to an instance of the Q15 sparse FIR structure.
+   * @param[in]  pSrc         points to the block of input data.
+   * @param[out] pDst         points to the block of output data.
+   * @param[in]  pScratchIn   points to a temporary buffer of size blockSize.
+   * @param[in]  pScratchOut  points to a temporary buffer of size blockSize.
+   * @param[in]  blockSize    number of input samples to process per call.
+   */
+  void arm_fir_sparse_q15(
+  arm_fir_sparse_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  q15_t * pScratchIn,
+  q31_t * pScratchOut,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q15 sparse FIR filter.
+   * @param[in,out] S          points to an instance of the Q15 sparse FIR structure.
+   * @param[in]     numTaps    number of nonzero coefficients in the filter.
+   * @param[in]     pCoeffs    points to the array of filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     pTapDelay  points to the array of offset times.
+   * @param[in]     maxDelay   maximum offset time supported.
+   * @param[in]     blockSize  number of samples that will be processed per block.
+   */
+  void arm_fir_sparse_init_q15(
+  arm_fir_sparse_instance_q15 * S,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  int32_t * pTapDelay,
+  uint16_t maxDelay,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Processing function for the Q7 sparse FIR filter.
+   * @param[in]  S            points to an instance of the Q7 sparse FIR structure.
+   * @param[in]  pSrc         points to the block of input data.
+   * @param[out] pDst         points to the block of output data.
+   * @param[in]  pScratchIn   points to a temporary buffer of size blockSize.
+   * @param[in]  pScratchOut  points to a temporary buffer of size blockSize.
+   * @param[in]  blockSize    number of input samples to process per call.
+   */
+  void arm_fir_sparse_q7(
+  arm_fir_sparse_instance_q7 * S,
+  q7_t * pSrc,
+  q7_t * pDst,
+  q7_t * pScratchIn,
+  q31_t * pScratchOut,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Initialization function for the Q7 sparse FIR filter.
+   * @param[in,out] S          points to an instance of the Q7 sparse FIR structure.
+   * @param[in]     numTaps    number of nonzero coefficients in the filter.
+   * @param[in]     pCoeffs    points to the array of filter coefficients.
+   * @param[in]     pState     points to the state buffer.
+   * @param[in]     pTapDelay  points to the array of offset times.
+   * @param[in]     maxDelay   maximum offset time supported.
+   * @param[in]     blockSize  number of samples that will be processed per block.
+   */
+  void arm_fir_sparse_init_q7(
+  arm_fir_sparse_instance_q7 * S,
+  uint16_t numTaps,
+  q7_t * pCoeffs,
+  q7_t * pState,
+  int32_t * pTapDelay,
+  uint16_t maxDelay,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Floating-point sin_cos function.
+   * @param[in]  theta   input value in degrees
+   * @param[out] pSinVal  points to the processed sine output.
+   * @param[out] pCosVal  points to the processed cosine output.
+   */
+  void arm_sin_cos_f32(
+  float32_t theta,
+  float32_t * pSinVal,
+  float32_t * pCosVal);
+
+
+  /**
+   * @brief  Q31 sin_cos function.
+   * @param[in]  theta    scaled input value in degrees
+   * @param[out] pSinVal  points to the processed sine output.
+   * @param[out] pCosVal  points to the processed cosine output.
+   */
+  void arm_sin_cos_q31(
+  q31_t theta,
+  q31_t * pSinVal,
+  q31_t * pCosVal);
+
+
+  /**
+   * @brief  Floating-point complex conjugate.
+   * @param[in]  pSrc        points to the input vector
+   * @param[out] pDst        points to the output vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   */
+  void arm_cmplx_conj_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t numSamples);
+
+  /**
+   * @brief  Q31 complex conjugate.
+   * @param[in]  pSrc        points to the input vector
+   * @param[out] pDst        points to the output vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   */
+  void arm_cmplx_conj_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Q15 complex conjugate.
+   * @param[in]  pSrc        points to the input vector
+   * @param[out] pDst        points to the output vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   */
+  void arm_cmplx_conj_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Floating-point complex magnitude squared
+   * @param[in]  pSrc        points to the complex input vector
+   * @param[out] pDst        points to the real output vector
+   * @param[in]  numSamples  number of complex samples in the input vector
+   */
+  void arm_cmplx_mag_squared_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Q31 complex magnitude squared
+   * @param[in]  pSrc        points to the complex input vector
+   * @param[out] pDst        points to the real output vector
+   * @param[in]  numSamples  number of complex samples in the input vector
+   */
+  void arm_cmplx_mag_squared_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Q15 complex magnitude squared
+   * @param[in]  pSrc        points to the complex input vector
+   * @param[out] pDst        points to the real output vector
+   * @param[in]  numSamples  number of complex samples in the input vector
+   */
+  void arm_cmplx_mag_squared_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @ingroup groupController
+   */
+
+  /**
+   * @defgroup PID PID Motor Control
+   *
+   * A Proportional Integral Derivative (PID) controller is a generic feedback control
+   * loop mechanism widely used in industrial control systems.
+   * A PID controller is the most commonly used type of feedback controller.
+   *
+   * This set of functions implements Proportional Integral Derivative (PID) controllers
+   * for Q15, Q31, and floating-point data types.  The functions operate on a single sample
+   * of data and each call to the function returns a single processed value.
+   * <code>S</code> points to an instance of the PID control data structure.  <code>in</code>
+   * is the input sample value. The functions return the output value.
+   *
+   * \par Algorithm:
+   * <pre>
+   *    y[n] = y[n-1] + A0 * x[n] + A1 * x[n-1] + A2 * x[n-2]
+   *    A0 = Kp + Ki + Kd
+   *    A1 = (-Kp ) - (2 * Kd )
+   *    A2 = Kd  </pre>
+   *
+   * \par
+   * where \c Kp is the proportional constant, \c Ki is the integral constant, and \c Kd is the derivative constant.
+   *
+   * \par
+   * \image html PID.gif "Proportional Integral Derivative Controller"
+   *
+   * \par
+   * The PID controller calculates an "error" value as the difference between
+   * the measured output and the reference input.
+   * The controller attempts to minimize the error by adjusting the process control inputs.
+   * The proportional value determines the reaction to the current error,
+   * the integral value determines the reaction based on the sum of recent errors,
+   * and the derivative value determines the reaction based on the rate at which the error has been changing.
+   *
+   * \par Instance Structure
+   * The Gains A0, A1, A2 and state variables for a PID controller are stored together in an instance data structure.
+   * A separate instance structure must be defined for each PID Controller.
+   * There are separate instance structure declarations for each of the 3 supported data types.
+   *
+   * \par Reset Functions
+   * There is also an associated reset function for each data type which clears the state array.
+   *
+   * \par Initialization Functions
+   * There is also an associated initialization function for each data type.
+   * The initialization function performs the following operations:
+   * - Initializes the Gains A0, A1, A2 from the Kp, Ki, and Kd gains.
+   * - Zeros out the values in the state buffer.
+   *
+   * \par
+   * The instance structure cannot be placed into a const data section, and it is recommended to use the initialization function.
+   *
+   * \par Fixed-Point Behavior
+   * Care must be taken when using the fixed-point versions of the PID Controller functions.
+   * In particular, the overflow and saturation behavior of the accumulator used in each function must be considered.
+   * Refer to the function specific documentation below for usage guidelines.
+   */
+
+  /**
+   * @addtogroup PID
+   * @{
+   */
+
+  /**
+   * @brief  Process function for the floating-point PID Control.
+   * @param[in,out] S   points to an instance of the floating-point PID Control structure
+   * @param[in]     in  input sample to process
+   * @return out processed output sample.
+   */
+  static __INLINE float32_t arm_pid_f32(
+  arm_pid_instance_f32 * S,
+  float32_t in)
+  {
+    float32_t out;
+
+    /* y[n] = y[n-1] + A0 * x[n] + A1 * x[n-1] + A2 * x[n-2]  */
+    out = (S->A0 * in) +
+      (S->A1 * S->state[0]) + (S->A2 * S->state[1]) + (S->state[2]);
+
+    /* Update state */
+    S->state[1] = S->state[0];
+    S->state[0] = in;
+    S->state[2] = out;
+
+    /* return to application */
+    return (out);
+
+  }
+
+  /**
+   * @brief  Process function for the Q31 PID Control.
+   * @param[in,out] S   points to an instance of the Q31 PID Control structure
+   * @param[in]     in  input sample to process
+   * @return out processed output sample.
+   *
+   * <b>Scaling and Overflow Behavior:</b>
+   * \par
+   * The function is implemented using an internal 64-bit accumulator.
+   * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.
+   * Thus, if the accumulator result overflows it wraps around rather than clipping.
+   * In order to avoid overflows completely the input signal must be scaled down by 2 bits as there are four additions.
+   * After all multiply-accumulates are performed, the 2.62 accumulator is truncated to 1.32 format and then saturated to 1.31 format.
+   */
+  static __INLINE q31_t arm_pid_q31(
+  arm_pid_instance_q31 * S,
+  q31_t in)
+  {
+    q63_t acc;
+    q31_t out;
+
+    /* acc = A0 * x[n]  */
+    acc = (q63_t) S->A0 * in;
+
+    /* acc += A1 * x[n-1] */
+    acc += (q63_t) S->A1 * S->state[0];
+
+    /* acc += A2 * x[n-2]  */
+    acc += (q63_t) S->A2 * S->state[1];
+
+    /* convert output to 1.31 format to add y[n-1] */
+    out = (q31_t) (acc >> 31u);
+
+    /* out += y[n-1] */
+    out += S->state[2];
+
+    /* Update state */
+    S->state[1] = S->state[0];
+    S->state[0] = in;
+    S->state[2] = out;
+
+    /* return to application */
+    return (out);
+  }
+
+
+  /**
+   * @brief  Process function for the Q15 PID Control.
+   * @param[in,out] S   points to an instance of the Q15 PID Control structure
+   * @param[in]     in  input sample to process
+   * @return out processed output sample.
+   *
+   * <b>Scaling and Overflow Behavior:</b>
+   * \par
+   * The function is implemented using a 64-bit internal accumulator.
+   * Both gains and state variables are represented in 1.15 format and multiplications yield a 2.30 result.
+   * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.
+   * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.
+   * After all additions have been performed, the accumulator is truncated to 34.15 format by discarding the low 15 bits.
+   * Lastly, the accumulator is saturated to yield a result in 1.15 format.
+   */
+  static __INLINE q15_t arm_pid_q15(
+  arm_pid_instance_q15 * S,
+  q15_t in)
+  {
+    q63_t acc;
+    q15_t out;
+
+#ifndef ARM_MATH_CM0_FAMILY
+    __SIMD32_TYPE *vstate;
+
+    /* Implementation of PID controller */
+
+    /* acc = A0 * x[n]  */
+    acc = (q31_t) __SMUAD((uint32_t)S->A0, (uint32_t)in);
+
+    /* acc += A1 * x[n-1] + A2 * x[n-2]  */
+    vstate = __SIMD32_CONST(S->state);
+    acc = (q63_t)__SMLALD((uint32_t)S->A1, (uint32_t)*vstate, (uint64_t)acc);
+#else
+    /* acc = A0 * x[n]  */
+    acc = ((q31_t) S->A0) * in;
+
+    /* acc += A1 * x[n-1] + A2 * x[n-2]  */
+    acc += (q31_t) S->A1 * S->state[0];
+    acc += (q31_t) S->A2 * S->state[1];
+#endif
+
+    /* acc += y[n-1] */
+    acc += (q31_t) S->state[2] << 15;
+
+    /* saturate the output */
+    out = (q15_t) (__SSAT((acc >> 15), 16));
+
+    /* Update state */
+    S->state[1] = S->state[0];
+    S->state[0] = in;
+    S->state[2] = out;
+
+    /* return to application */
+    return (out);
+  }
+
+  /**
+   * @} end of PID group
+   */
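The difference equation used by the PID process functions can be sketched in portable C. This is a minimal illustration, not the library code: `pid_sketch_t`, `pid_sketch_init` and `pid_sketch_step` are hypothetical names, and the gain derivation (A0 = Kp + Ki + Kd, A1 = -Kp - 2*Kd, A2 = Kd) is assumed rather than taken from the lines above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the floating-point PID instance; not the library type. */
typedef struct {
    float A0, A1, A2;   /* derived gains */
    float state[3];     /* state[0] = x[n-1], state[1] = x[n-2], state[2] = y[n-1] */
} pid_sketch_t;

/* Assumed gain derivation: A0 = Kp + Ki + Kd, A1 = -Kp - 2*Kd, A2 = Kd. */
static void pid_sketch_init(pid_sketch_t *S, float Kp, float Ki, float Kd)
{
    assert(S != NULL);  /* debug-time precondition check */
    S->A0 = Kp + Ki + Kd;
    S->A1 = -Kp - 2.0f * Kd;
    S->A2 = Kd;
    S->state[0] = S->state[1] = S->state[2] = 0.0f;
}

/* One step of y[n] = y[n-1] + A0*x[n] + A1*x[n-1] + A2*x[n-2]. */
static float pid_sketch_step(pid_sketch_t *S, float in)
{
    float out = (S->A0 * in) + (S->A1 * S->state[0])
              + (S->A2 * S->state[1]) + S->state[2];

    /* Shift the input history and remember the output for the next call. */
    S->state[1] = S->state[0];
    S->state[0] = in;
    S->state[2] = out;
    return out;
}
```

With only an integral gain (Kp = 0, Ki = 1, Kd = 0) a constant input of 1 produces a ramp, while a purely proportional controller holds its output at Kp times the input.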
+
+
+  /**
+   * @brief Floating-point matrix inverse.
+   * @param[in]  src   points to the instance of the input floating-point matrix structure.
+   * @param[out] dst   points to the instance of the output floating-point matrix structure.
+   * @return The function returns ARM_MATH_SIZE_MISMATCH, if the dimensions do not match.
+   * If the input matrix is singular (does not have an inverse), then the algorithm terminates and returns error status ARM_MATH_SINGULAR.
+   */
+  arm_status arm_mat_inverse_f32(
+  const arm_matrix_instance_f32 * src,
+  arm_matrix_instance_f32 * dst);
+
+
+  /**
+   * @brief Double-precision (64-bit) floating-point matrix inverse.
+   * @param[in]  src   points to the instance of the input floating-point matrix structure.
+   * @param[out] dst   points to the instance of the output floating-point matrix structure.
+   * @return The function returns ARM_MATH_SIZE_MISMATCH, if the dimensions do not match.
+   * If the input matrix is singular (does not have an inverse), then the algorithm terminates and returns error status ARM_MATH_SINGULAR.
+   */
+  arm_status arm_mat_inverse_f64(
+  const arm_matrix_instance_f64 * src,
+  arm_matrix_instance_f64 * dst);
+
+
+
+  /**
+   * @ingroup groupController
+   */
+
+  /**
+   * @defgroup clarke Vector Clarke Transform
+   * Forward Clarke transform converts the instantaneous stator phases into a two-coordinate time-invariant vector.
+   * Generally the Clarke transform uses three-phase currents <code>Ia</code>, <code>Ib</code> and <code>Ic</code> to calculate currents
+   * in the two-phase orthogonal stator axis <code>Ialpha</code> and <code>Ibeta</code>.
+   * When <code>Ialpha</code> is superposed with <code>Ia</code>, as shown in the figure below,
+   * \image html clarke.gif Stator current space vector and its components in (a,b).
+   * and <code>Ia + Ib + Ic = 0</code>, <code>Ialpha</code> and <code>Ibeta</code>
+   * can be calculated using only <code>Ia</code> and <code>Ib</code>.
+   *
+   * The function operates on a single sample of data and each call to the function returns the processed output.
+   * The library provides separate functions for Q31 and floating-point data types.
+   * \par Algorithm
+   * \image html clarkeFormula.gif
+   * where <code>Ia</code> and <code>Ib</code> are the instantaneous stator phases and
+   * <code>pIalpha</code> and <code>pIbeta</code> are the two coordinates of the time-invariant vector.
+   * \par Fixed-Point Behavior
+   * Care must be taken when using the Q31 version of the Clarke transform.
+   * In particular, the overflow and saturation behavior of the accumulator used must be considered.
+   * Refer to the function specific documentation below for usage guidelines.
+   */
+
+  /**
+   * @addtogroup clarke
+   * @{
+   */
+
+  /**
+   *
+   * @brief  Floating-point Clarke transform
+   * @param[in]  Ia       input three-phase coordinate <code>a</code>
+   * @param[in]  Ib       input three-phase coordinate <code>b</code>
+   * @param[out] pIalpha  points to output two-phase orthogonal vector axis alpha
+   * @param[out] pIbeta   points to output two-phase orthogonal vector axis beta
+   */
+  static __INLINE void arm_clarke_f32(
+  float32_t Ia,
+  float32_t Ib,
+  float32_t * pIalpha,
+  float32_t * pIbeta)
+  {
+    /* Calculate pIalpha using the equation, pIalpha = Ia */
+    *pIalpha = Ia;
+
+    /* Calculate pIbeta using the equation, pIbeta = (1/sqrt(3)) * Ia + (2/sqrt(3)) * Ib */
+    *pIbeta = ((float32_t) 0.57735026919 * Ia + (float32_t) 1.15470053838 * Ib);
+  }
+
+
+  /**
+   * @brief  Clarke transform for Q31 version
+   * @param[in]  Ia       input three-phase coordinate <code>a</code>
+   * @param[in]  Ib       input three-phase coordinate <code>b</code>
+   * @param[out] pIalpha  points to output two-phase orthogonal vector axis alpha
+   * @param[out] pIbeta   points to output two-phase orthogonal vector axis beta
+   *
+   * <b>Scaling and Overflow Behavior:</b>
+   * \par
+   * The function is implemented using an internal 32-bit accumulator.
+   * The accumulator maintains 1.31 format by truncating lower 31 bits of the intermediate multiplication in 2.62 format.
+   * There is saturation on the addition, hence there is no risk of overflow.
+   */
+  static __INLINE void arm_clarke_q31(
+  q31_t Ia,
+  q31_t Ib,
+  q31_t * pIalpha,
+  q31_t * pIbeta)
+  {
+    q31_t product1, product2;                    /* Temporary variables used to store intermediate results */
+
+    /* Calculating pIalpha from Ia by equation pIalpha = Ia */
+    *pIalpha = Ia;
+
+    /* Intermediate product is calculated by (1/(sqrt(3)) * Ia) */
+    product1 = (q31_t) (((q63_t) Ia * 0x24F34E8B) >> 30);
+
+    /* Intermediate product is calculated by (2/sqrt(3) * Ib) */
+    product2 = (q31_t) (((q63_t) Ib * 0x49E69D16) >> 30);
+
+    /* pIbeta is calculated by adding the intermediate products */
+    *pIbeta = __QADD(product1, product2);
+  }
+
+  /**
+   * @} end of clarke group
+   */
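As a quick check of the Clarke equations, the sketch below feeds in a balanced three-phase sample and recovers the expected two-axis vector. `clarke_sketch` is a hypothetical helper that mirrors the floating-point formula; it is not the library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the floating-point Clarke equations:
 *   Ialpha = Ia
 *   Ibeta  = (1/sqrt(3)) * Ia + (2/sqrt(3)) * Ib
 * Only Ia and Ib are needed because Ia + Ib + Ic = 0 is assumed. */
static void clarke_sketch(float Ia, float Ib, float *pIalpha, float *pIbeta)
{
    const float inv_sqrt3     = 0.57735026919f;  /* 1/sqrt(3) */
    const float two_inv_sqrt3 = 1.15470053838f;  /* 2/sqrt(3) */

    assert(pIalpha != NULL && pIbeta != NULL);   /* debug-time precondition check */

    *pIalpha = Ia;
    *pIbeta  = inv_sqrt3 * Ia + two_inv_sqrt3 * Ib;
}
```

For a unit-amplitude balanced system at electrical angle 0 (Ia = 1, Ib = -0.5) the result is approximately (1, 0); at 90 degrees (Ia = 0, Ib = 0.8660254) it is approximately (0, 1).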
+
+  /**
+   * @brief  Converts the elements of the Q7 vector to Q31 vector.
+   * @param[in]  pSrc       input pointer
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_q7_to_q31(
+  q7_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+
+  /**
+   * @ingroup groupController
+   */
+
+  /**
+   * @defgroup inv_clarke Vector Inverse Clarke Transform
+   * Inverse Clarke transform converts the two-coordinate time invariant vector into instantaneous stator phases.
+   *
+   * The function operates on a single sample of data and each call to the function returns the processed output.
+   * The library provides separate functions for Q31 and floating-point data types.
+   * \par Algorithm
+   * \image html clarkeInvFormula.gif
+   * where <code>pIa</code> and <code>pIb</code> are the instantaneous stator phases and
+   * <code>Ialpha</code> and <code>Ibeta</code> are the two coordinates of the time-invariant vector.
+   * \par Fixed-Point Behavior
+   * Care must be taken when using the Q31 version of the inverse Clarke transform.
+   * In particular, the overflow and saturation behavior of the accumulator used must be considered.
+   * Refer to the function specific documentation below for usage guidelines.
+   */
+
+  /**
+   * @addtogroup inv_clarke
+   * @{
+   */
+
+   /**
+   * @brief  Floating-point Inverse Clarke transform
+   * @param[in]  Ialpha  input two-phase orthogonal vector axis alpha
+   * @param[in]  Ibeta   input two-phase orthogonal vector axis beta
+   * @param[out] pIa     points to output three-phase coordinate <code>a</code>
+   * @param[out] pIb     points to output three-phase coordinate <code>b</code>
+   */
+  static __INLINE void arm_inv_clarke_f32(
+  float32_t Ialpha,
+  float32_t Ibeta,
+  float32_t * pIa,
+  float32_t * pIb)
+  {
+    /* Calculating pIa from Ialpha by equation pIa = Ialpha */
+    *pIa = Ialpha;
+
+    /* Calculating pIb from Ialpha and Ibeta by equation pIb = -(1/2) * Ialpha + (sqrt(3)/2) * Ibeta */
+    *pIb = -0.5f * Ialpha + 0.8660254039f * Ibeta;
+  }
+
+
+  /**
+   * @brief  Inverse Clarke transform for Q31 version
+   * @param[in]  Ialpha  input two-phase orthogonal vector axis alpha
+   * @param[in]  Ibeta   input two-phase orthogonal vector axis beta
+   * @param[out] pIa     points to output three-phase coordinate <code>a</code>
+   * @param[out] pIb     points to output three-phase coordinate <code>b</code>
+   *
+   * <b>Scaling and Overflow Behavior:</b>
+   * \par
+   * The function is implemented using an internal 32-bit accumulator.
+   * The accumulator maintains 1.31 format by truncating lower 31 bits of the intermediate multiplication in 2.62 format.
+   * There is saturation on the subtraction, hence there is no risk of overflow.
+   */
+  static __INLINE void arm_inv_clarke_q31(
+  q31_t Ialpha,
+  q31_t Ibeta,
+  q31_t * pIa,
+  q31_t * pIb)
+  {
+    q31_t product1, product2;                    /* Temporary variables used to store intermediate results */
+
+    /* Calculating pIa from Ialpha by equation pIa = Ialpha */
+    *pIa = Ialpha;
+
+    /* Intermediate product is calculated by ((1/2) * Ialpha); 0x40000000 is 0.5 in 1.31 format */
+    product1 = (q31_t) (((q63_t) (Ialpha) * (0x40000000)) >> 31);
+
+    /* Intermediate product is calculated by ((sqrt(3)/2) * Ibeta); 0x6ED9EBA1 is sqrt(3)/2 in 1.31 format */
+    product2 = (q31_t) (((q63_t) (Ibeta) * (0x6ED9EBA1)) >> 31);
+
+    /* pIb is calculated by subtracting the products */
+    *pIb = __QSUB(product2, product1);
+  }
+
+  /**
+   * @} end of inv_clarke group
+   */
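The Q31 scaling and saturation behaviour described for the inverse Clarke transform can be sketched without the Cortex-M intrinsics. In this illustration `qsub_sketch` is a hypothetical portable stand-in for `__QSUB`, and `inv_clarke_q31_sketch` is not the library function. The sketch assumes that right-shifting a negative 64-bit value is an arithmetic shift, which holds on common compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating 32-bit subtract: a portable stand-in for the __QSUB intrinsic. */
static int32_t qsub_sketch(int32_t a, int32_t b)
{
    int64_t d = (int64_t)a - (int64_t)b;
    if (d > INT32_MAX) return INT32_MAX;
    if (d < INT32_MIN) return INT32_MIN;
    return (int32_t)d;
}

/* pIa = Ialpha, pIb = (sqrt(3)/2) * Ibeta - (1/2) * Ialpha, all in 1.31 format. */
static void inv_clarke_q31_sketch(int32_t Ialpha, int32_t Ibeta,
                                  int32_t *pIa, int32_t *pIb)
{
    assert(pIa != NULL && pIb != NULL);  /* debug-time precondition check */

    /* 1.31 x 1.31 gives 2.62; shifting right by 31 returns to 1.31. */
    int32_t half_alpha = (int32_t)(((int64_t)Ialpha * 0x40000000) >> 31); /* 0.5       */
    int32_t beta_term  = (int32_t)(((int64_t)Ibeta  * 0x6ED9EBA1) >> 31); /* sqrt(3)/2 */

    *pIa = Ialpha;
    *pIb = qsub_sketch(beta_term, half_alpha);
}
```

With Ialpha = 0.5 and Ibeta = 0 the second phase is -0.25 (0xE0000000 as a 32-bit pattern). With Ialpha at the most negative value and Ibeta at the most positive value, the exact result would exceed +1, so the subtraction saturates to 0x7FFFFFFF instead of wrapping.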
+
+  /**
+   * @brief  Converts the elements of the Q7 vector to Q15 vector.
+   * @param[in]  pSrc       input pointer
+   * @param[out] pDst       output pointer
+   * @param[in]  blockSize  number of samples to process
+   */
+  void arm_q7_to_q15(
+  q7_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+
+  /**
+   * @ingroup groupController
+   */
+
+  /**
+   * @defgroup park Vector Park Transform
+   *
+   * Forward Park transform converts the input two-coordinate vector to flux and torque components.
+   * The Park transform can be used to realize the transformation of the <code>Ialpha</code> and the <code>Ibeta</code> currents
+   * from the stationary to the moving reference frame and control the spatial relationship between
+   * the stator vector current and rotor flux vector.
+   * If we consider the d axis aligned with the rotor flux, the diagram below shows the
+   * current vector and the relationship between the two reference frames:
+   * \image html park.gif "Stator current space vector and its component in (a,b) and in the d,q rotating reference frame"
+   *
+   * The function operates on a single sample of data and each call to the function returns the processed output.
+   * The library provides separate functions for Q31 and floating-point data types.
+   * \par Algorithm
+   * \image html parkFormula.gif
+   * where <code>Ialpha</code> and <code>Ibeta</code> are the stator vector components,
+   * <code>pId</code> and <code>pIq</code> are rotor vector components and <code>cosVal</code> and <code>sinVal</code> are the
+   * cosine and sine values of theta (rotor flux position).
+   * \par Fixed-Point Behavior
+   * Care must be taken when using the Q31 version of the Park transform.
+   * In particular, the overflow and saturation behavior of the accumulator used must be considered.
+   * Refer to the function specific documentation below for usage guidelines.
+   */
+
+  /**
+   * @addtogroup park
+   * @{
+   */
+
+  /**
+   * @brief Floating-point Park transform
+   * @param[in]  Ialpha  input two-phase vector coordinate alpha
+   * @param[in]  Ibeta   input two-phase vector coordinate beta
+   * @param[out] pId     points to output rotor reference frame d
+   * @param[out] pIq     points to output rotor reference frame q
+   * @param[in]  sinVal  sine value of rotation angle theta
+   * @param[in]  cosVal  cosine value of rotation angle theta
+   *
+   * The function implements the forward Park transform.
+   *
+   */
+  static __INLINE void arm_park_f32(
+  float32_t Ialpha,
+  float32_t Ibeta,
+  float32_t * pId,
+  float32_t * pIq,
+  float32_t sinVal,
+  float32_t cosVal)
+  {
+    /* Calculate pId using the equation, pId = Ialpha * cosVal + Ibeta * sinVal */
+    *pId = Ialpha * cosVal + Ibeta * sinVal;
+
+    /* Calculate pIq using the equation, pIq = - Ialpha * sinVal + Ibeta * cosVal */
+    *pIq = -Ialpha * sinVal + Ibeta * cosVal;
+  }
+
+
+  /**
+   * @brief  Park transform for Q31 version
+   * @param[in]  Ialpha  input two-phase vector coordinate alpha
+   * @param[in]  Ibeta   input two-phase vector coordinate beta
+   * @param[out] pId     points to output rotor reference frame d
+   * @param[out] pIq     points to output rotor reference frame q
+   * @param[in]  sinVal  sine value of rotation angle theta
+   * @param[in]  cosVal  cosine value of rotation angle theta
+   *
+   * <b>Scaling and Overflow Behavior:</b>
+   * \par
+   * The function is implemented using an internal 32-bit accumulator.
+   * The accumulator maintains 1.31 format by truncating lower 31 bits of the intermediate multiplication in 2.62 format.
+   * There is saturation on the addition and subtraction, hence there is no risk of overflow.
+   */
+  static __INLINE void arm_park_q31(
+  q31_t Ialpha,
+  q31_t Ibeta,
+  q31_t * pId,
+  q31_t * pIq,
+  q31_t sinVal,
+  q31_t cosVal)
+  {
+    q31_t product1, product2;                    /* Temporary variables used to store intermediate results */
+    q31_t product3, product4;                    /* Temporary variables used to store intermediate results */
+
+    /* Intermediate product is calculated by (Ialpha * cosVal) */
+    product1 = (q31_t) (((q63_t) (Ialpha) * (cosVal)) >> 31);
+
+    /* Intermediate product is calculated by (Ibeta * sinVal) */
+    product2 = (q31_t) (((q63_t) (Ibeta) * (sinVal)) >> 31);
+
+
+    /* Intermediate product is calculated by (Ialpha * sinVal) */
+    product3 = (q31_t) (((q63_t) (Ialpha) * (sinVal)) >> 31);
+
+    /* Intermediate product is calculated by (Ibeta * cosVal) */
+    product4 = (q31_t) (((q63_t) (Ibeta) * (cosVal)) >> 31);
+
+    /* Calculate pId by adding the two intermediate products 1 and 2 */
+    *pId = __QADD(product1, product2);
+
+    /* Calculate pIq by subtracting intermediate product 3 from intermediate product 4 */
+    *pIq = __QSUB(product4, product3);
+  }
+
+  /**
+   * @} end of park group
+   */
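A short floating-point sketch shows what the Park transform does to a stator vector that is aligned with the rotor flux: the d component carries the full magnitude and the q component vanishes. `park_sketch` is a hypothetical helper that mirrors the formula; the caller is assumed to supply matching sine and cosine values for the same angle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the forward Park equations:
 *   Id =  Ialpha * cosVal + Ibeta * sinVal
 *   Iq = -Ialpha * sinVal + Ibeta * cosVal */
static void park_sketch(float Ialpha, float Ibeta, float *pId, float *pIq,
                        float sinVal, float cosVal)
{
    assert(pId != NULL && pIq != NULL);  /* debug-time precondition check */

    *pId =  Ialpha * cosVal + Ibeta * sinVal;
    *pIq = -Ialpha * sinVal + Ibeta * cosVal;
}
```

For a unit vector at 30 degrees (Ialpha = 0.8660254, Ibeta = 0.5) and a rotor angle of 30 degrees (sinVal = 0.5, cosVal = 0.8660254), the result is approximately Id = 1 and Iq = 0.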
+
+  /**
+   * @brief  Converts the elements of the Q7 vector to floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[out] pDst       is output pointer
+   * @param[in]  blockSize  is the number of samples to process
+   */
+  void arm_q7_to_float(
+  q7_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @ingroup groupController
+   */
+
+  /**
+   * @defgroup inv_park Vector Inverse Park transform
+   * Inverse Park transform converts the input flux and torque components to a two-coordinate vector.
+   *
+   * The function operates on a single sample of data and each call to the function returns the processed output.
+   * The library provides separate functions for Q31 and floating-point data types.
+   * \par Algorithm
+   * \image html parkInvFormula.gif
+   * where <code>pIalpha</code> and <code>pIbeta</code> are the stator vector components,
+   * <code>Id</code> and <code>Iq</code> are rotor vector components and <code>cosVal</code> and <code>sinVal</code> are the
+   * cosine and sine values of theta (rotor flux position).
+   * \par Fixed-Point Behavior
+   * Care must be taken when using the Q31 version of the inverse Park transform.
+   * In particular, the overflow and saturation behavior of the accumulator used must be considered.
+   * Refer to the function specific documentation below for usage guidelines.
+   */
+
+  /**
+   * @addtogroup inv_park
+   * @{
+   */
+
+   /**
+   * @brief  Floating-point Inverse Park transform
+   * @param[in]  Id       input coordinate of rotor reference frame d
+   * @param[in]  Iq       input coordinate of rotor reference frame q
+   * @param[out] pIalpha  points to output two-phase orthogonal vector axis alpha
+   * @param[out] pIbeta   points to output two-phase orthogonal vector axis beta
+   * @param[in]  sinVal   sine value of rotation angle theta
+   * @param[in]  cosVal   cosine value of rotation angle theta
+   */
+  static __INLINE void arm_inv_park_f32(
+  float32_t Id,
+  float32_t Iq,
+  float32_t * pIalpha,
+  float32_t * pIbeta,
+  float32_t sinVal,
+  float32_t cosVal)
+  {
+    /* Calculate pIalpha using the equation, pIalpha = Id * cosVal - Iq * sinVal */
+    *pIalpha = Id * cosVal - Iq * sinVal;
+
+    /* Calculate pIbeta using the equation, pIbeta = Id * sinVal + Iq * cosVal */
+    *pIbeta = Id * sinVal + Iq * cosVal;
+  }
+
+
+  /**
+   * @brief  Inverse Park transform for Q31 version
+   * @param[in]  Id       input coordinate of rotor reference frame d
+   * @param[in]  Iq       input coordinate of rotor reference frame q
+   * @param[out] pIalpha  points to output two-phase orthogonal vector axis alpha
+   * @param[out] pIbeta   points to output two-phase orthogonal vector axis beta
+   * @param[in]  sinVal   sine value of rotation angle theta
+   * @param[in]  cosVal   cosine value of rotation angle theta
+   *
+   * <b>Scaling and Overflow Behavior:</b>
+   * \par
+   * The function is implemented using an internal 32-bit accumulator.
+   * The accumulator maintains 1.31 format by truncating lower 31 bits of the intermediate multiplication in 2.62 format.
+   * There is saturation on the addition and subtraction, hence there is no risk of overflow.
+   */
+  static __INLINE void arm_inv_park_q31(
+  q31_t Id,
+  q31_t Iq,
+  q31_t * pIalpha,
+  q31_t * pIbeta,
+  q31_t sinVal,
+  q31_t cosVal)
+  {
+    q31_t product1, product2;                    /* Temporary variables used to store intermediate results */
+    q31_t product3, product4;                    /* Temporary variables used to store intermediate results */
+
+    /* Intermediate product is calculated by (Id * cosVal) */
+    product1 = (q31_t) (((q63_t) (Id) * (cosVal)) >> 31);
+
+    /* Intermediate product is calculated by (Iq * sinVal) */
+    product2 = (q31_t) (((q63_t) (Iq) * (sinVal)) >> 31);
+
+
+    /* Intermediate product is calculated by (Id * sinVal) */
+    product3 = (q31_t) (((q63_t) (Id) * (sinVal)) >> 31);
+
+    /* Intermediate product is calculated by (Iq * cosVal) */
+    product4 = (q31_t) (((q63_t) (Iq) * (cosVal)) >> 31);
+
+    /* Calculate pIalpha by subtracting intermediate product 2 from intermediate product 1 */
+    *pIalpha = __QSUB(product1, product2);
+
+    /* Calculate pIbeta by adding intermediate products 3 and 4 */
+    *pIbeta = __QADD(product4, product3);
+  }
+
+  /**
+   * @} end of Inverse park group
+   */
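The inverse Park transform undoes the forward transform when both are given the same sine and cosine values. The sketch below checks that round trip in floating point. `park_fwd_sketch` and `park_inv_sketch` are hypothetical helpers that mirror the two formulas; they are not the library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Forward Park: stationary (alpha, beta) frame to rotating (d, q) frame. */
static void park_fwd_sketch(float Ialpha, float Ibeta, float *pId, float *pIq,
                            float sinVal, float cosVal)
{
    assert(pId != NULL && pIq != NULL);  /* debug-time precondition check */
    *pId =  Ialpha * cosVal + Ibeta * sinVal;
    *pIq = -Ialpha * sinVal + Ibeta * cosVal;
}

/* Inverse Park: rotating (d, q) frame back to stationary (alpha, beta) frame. */
static void park_inv_sketch(float Id, float Iq, float *pIalpha, float *pIbeta,
                            float sinVal, float cosVal)
{
    assert(pIalpha != NULL && pIbeta != NULL);  /* debug-time precondition check */
    *pIalpha = Id * cosVal - Iq * sinVal;
    *pIbeta  = Id * sinVal + Iq * cosVal;
}
```

Using sinVal = 0.6 and cosVal = 0.8, the vector (0.3, -0.7) maps to roughly (-0.18, -0.74) in the rotating frame and back to (0.3, -0.7), up to floating-point rounding.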
+
+
+  /**
+   * @brief  Converts the elements of the Q31 vector to floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[out] pDst       is output pointer
+   * @param[in]  blockSize  is the number of samples to process
+   */
+  void arm_q31_to_float(
+  q31_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+  /**
+   * @ingroup groupInterpolation
+   */
+
+  /**
+   * @defgroup LinearInterpolate Linear Interpolation
+   *
+   * Linear interpolation is a method of curve fitting using linear polynomials.
+   * Linear interpolation works by effectively drawing a straight line between two neighboring samples and returning the appropriate point along that line.
+   *
+   * \par
+   * \image html LinearInterp.gif "Linear interpolation"
+   *
+   * \par
+   * A linear interpolation function calculates an output value (y) for the input (x)
+   * using linear interpolation of the input values x0, x1 (nearest input values) and the output values y0 and y1 (nearest output values).
+   *
+   * \par Algorithm:
+   * <pre>
+   *       y = y0 + (x - x0) * ((y1 - y0)/(x1-x0))
+   *       where x0, x1 are the table inputs nearest to x
+   *             y0, y1 are the table outputs corresponding to x0, x1
+   * </pre>
+   *
+   * \par
+   * This set of functions implements the linear interpolation process
+   * for Q7, Q15, Q31, and floating-point data types.  The functions operate on a single
+   * sample of data and each call to the function returns a single processed value.
+   * <code>S</code> points to an instance of the Linear Interpolate function data structure.
+   * <code>x</code> is the input sample value. The functions return the output value.
+   *
+   * \par
+   * If x is outside the table boundary, linear interpolation returns the first value of the table
+   * if x is below the input range, and the last value of the table if x is above the range.
+   */
+
+  /**
+   * @addtogroup LinearInterpolate
+   * @{
+   */
+
+  /**
+   * @brief  Process function for the floating-point Linear Interpolation Function.
+   * @param[in,out] S  is an instance of the floating-point Linear Interpolation structure
+   * @param[in]     x  input sample to process
+   * @return y processed output sample.
+   *
+   */
+  static __INLINE float32_t arm_linear_interp_f32(
+  arm_linear_interp_instance_f32 * S,
+  float32_t x)
+  {
+    float32_t y;
+    float32_t x0, x1;                            /* Nearest input values */
+    float32_t y0, y1;                            /* Nearest output values */
+    float32_t xSpacing = S->xSpacing;            /* spacing between input values */
+    int32_t i;                                   /* Index variable */
+    float32_t *pYData = S->pYData;               /* pointer to output table */
+
+    /* Calculation of index */
+    i = (int32_t) ((x - S->x1) / xSpacing);
+
+    if(i < 0)
+    {
+      /* Initialize output for below specified range as first output value of table */
+      y = pYData[0];
+    }
+    else if((uint32_t)i >= (S->nValues - 1))     /* clamp so that pYData[i + 1] below stays inside the table */
+    {
+      /* Initialize output for above specified range as last output value of table */
+      y = pYData[S->nValues - 1];
+    }
+    else
+    {
+      /* Calculation of nearest input values */
+      x0 = S->x1 +  i      * xSpacing;
+      x1 = S->x1 + (i + 1) * xSpacing;
+
+      /* Read of nearest output values */
+      y0 = pYData[i];
+      y1 = pYData[i + 1];
+
+      /* Calculation of output */
+      y = y0 + (x - x0) * ((y1 - y0) / (x1 - x0));
+
+    }
+
+    /* returns output value */
+    return (y);
+  }
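The floating-point table lookup can be illustrated with a small standalone sketch. `interp_sketch_t` and `interp_sketch` are hypothetical names modelled on the fields used above (`nValues`, `x1`, `xSpacing`, `pYData`). Unlike the code above, this version clamps on `x < x1` before computing the index and treats the last table entry as the upper clamp, so it never reads past the end of the table; a table of at least two entries is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the floating-point interpolation instance. */
typedef struct {
    uint32_t nValues;     /* number of table entries (assumed >= 2) */
    float x1;             /* x value of the first table entry */
    float xSpacing;       /* spacing between consecutive x values */
    const float *pYData;  /* table of y values */
} interp_sketch_t;

static float interp_sketch(const interp_sketch_t *S, float x)
{
    assert(S != NULL && S->pYData != NULL && S->nValues >= 2u);

    /* Below the table: return the first entry. */
    if (x < S->x1) {
        return S->pYData[0];
    }

    /* Index of the interval that contains x. */
    int32_t i = (int32_t)((x - S->x1) / S->xSpacing);

    /* At or beyond the last entry: return the last entry. */
    if ((uint32_t)i >= S->nValues - 1u) {
        return S->pYData[S->nValues - 1u];
    }

    /* y = y0 + (x - x0) * ((y1 - y0) / (x1 - x0)), with x1 - x0 = xSpacing. */
    float x0 = S->x1 + (float)i * S->xSpacing;
    float y0 = S->pYData[i];
    float y1 = S->pYData[i + 1];
    return y0 + (x - x0) * ((y1 - y0) / S->xSpacing);
}
```

For the table {0, 10, 20, 40} with `x1 = 0` and `xSpacing = 1`, an input of 0.5 gives 5, an input of 2.5 gives 30, and inputs below 0 or at and above 3 return the first and last entries respectively.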
+
+
+   /**
+   *
+   * @brief  Process function for the Q31 Linear Interpolation Function.
+   * @param[in] pYData   pointer to Q31 Linear Interpolation table
+   * @param[in] x        input sample to process
+   * @param[in] nValues  number of table values
+   * @return y processed output sample.
+   *
+   * \par
+   * Input sample <code>x</code> is in 12.20 format which contains 12 bits for table index and 20 bits for fractional part.
+   * This function can support maximum of table size 2^12.
+   *
+   */
+  static __INLINE q31_t arm_linear_interp_q31(
+  q31_t * pYData,
+  q31_t x,
+  uint32_t nValues)
+  {
+    q31_t y;                                     /* output */
+    q31_t y0, y1;                                /* Nearest output values */
+    q31_t fract;                                 /* fractional part */
+    int32_t index;                               /* Index to read nearest output values */
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    index = ((x & (q31_t)0xFFF00000) >> 20);
+
+    if(index >= (int32_t)(nValues - 1))
+    {
+      return (pYData[nValues - 1]);
+    }
+    else if(index < 0)
+    {
+      return (pYData[0]);
+    }
+    else
+    {
+      /* 20 bits for the fractional part */
+      /* shift left by 11 to keep fract in 1.31 format */
+      fract = (x & 0x000FFFFF) << 11;
+
+      /* Read two nearest output values from the index in 1.31(q31) format */
+      y0 = pYData[index];
+      y1 = pYData[index + 1];
+
+      /* Calculation of y0 * (1-fract) and y is in 2.30 format */
+      y = ((q31_t) ((q63_t) y0 * (0x7FFFFFFF - fract) >> 32));
+
+      /* Calculation of y0 * (1-fract) + y1 *fract and y is in 2.30 format */
+      y += ((q31_t) (((q63_t) y1 * fract) >> 32));
+
+      /* Convert y to 1.31 format */
+      return (y << 1u);
+    }
+  }
+
+
+  /**
+   *
+   * @brief  Process function for the Q15 Linear Interpolation Function.
+   * @param[in] pYData   pointer to Q15 Linear Interpolation table
+   * @param[in] x        input sample to process
+   * @param[in] nValues  number of table values
+   * @return y processed output sample.
+   *
+   * \par
+   * Input sample <code>x</code> is in 12.20 format which contains 12 bits for table index and 20 bits for fractional part.
+   * This function can support maximum of table size 2^12.
+   *
+   */
+  static __INLINE q15_t arm_linear_interp_q15(
+  q15_t * pYData,
+  q31_t x,
+  uint32_t nValues)
+  {
+    q63_t y;                                     /* output */
+    q15_t y0, y1;                                /* Nearest output values */
+    q31_t fract;                                 /* fractional part */
+    int32_t index;                               /* Index to read nearest output values */
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    index = ((x & (int32_t)0xFFF00000) >> 20);
+
+    if(index >= (int32_t)(nValues - 1))
+    {
+      return (pYData[nValues - 1]);
+    }
+    else if(index < 0)
+    {
+      return (pYData[0]);
+    }
+    else
+    {
+      /* 20 bits for the fractional part */
+      /* fract is in 12.20 format */
+      fract = (x & 0x000FFFFF);
+
+      /* Read two nearest output values from the index */
+      y0 = pYData[index];
+      y1 = pYData[index + 1];
+
+      /* Calculation of y0 * (1-fract) and y is in 13.35 format */
+      y = ((q63_t) y0 * (0xFFFFF - fract));
+
+      /* Calculation of (y0 * (1-fract) + y1 * fract) and y is in 13.35 format */
+      y += ((q63_t) y1 * (fract));
+
+      /* convert y to 1.15 format */
+      return (q15_t) (y >> 20);
+    }
+  }
+
+
+  /**
+   *
+   * @brief  Process function for the Q7 Linear Interpolation Function.
+   * @param[in] pYData   pointer to Q7 Linear Interpolation table
+   * @param[in] x        input sample to process
+   * @param[in] nValues  number of table values
+   * @return y processed output sample.
+   *
+   * \par
+   * Input sample <code>x</code> is in 12.20 format which contains 12 bits for table index and 20 bits for fractional part.
+   * This function can support maximum of table size 2^12.
+   */
+  static __INLINE q7_t arm_linear_interp_q7(
+  q7_t * pYData,
+  q31_t x,
+  uint32_t nValues)
+  {
+    q31_t y;                                     /* output */
+    q7_t y0, y1;                                 /* Nearest output values */
+    q31_t fract;                                 /* fractional part */
+    uint32_t index;                              /* Index to read nearest output values */
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    if (x < 0)
+    {
+      return (pYData[0]);
+    }
+    index = (x >> 20) & 0xfff;
+
+    if(index >= (nValues - 1))
+    {
+      return (pYData[nValues - 1]);
+    }
+    else
+    {
+      /* 20 bits for the fractional part */
+      /* fract is in 12.20 format */
+      fract = (x & 0x000FFFFF);
+
+      /* Read two nearest output values from the index and are in 1.7(q7) format */
+      y0 = pYData[index];
+      y1 = pYData[index + 1];
+
+      /* Calculation of y0 * (1-fract ) and y is in 13.27(q27) format */
+      y = ((y0 * (0xFFFFF - fract)));
+
+      /* Calculation of y1 * fract + y0 * (1-fract) and y is in 13.27(q27) format */
+      y += (y1 * fract);
+
+      /* convert y to 1.7(q7) format */
+      return (q7_t) (y >> 20);
+     }
+  }
+
+  /**
+   * @} end of LinearInterpolate group
+   */
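The 12.20 input format used by the fixed-point interpolation functions packs the table index into the upper 12 bits and the position between two entries into the lower 20 bits. The sketch below follows the Q15 arithmetic in portable C. `pack_12_20` and `interp_q15_sketch` are hypothetical helpers, and the sketch assumes non-negative table values so that the final right shift is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a table index and a 20-bit fraction into the 12.20 input format. */
static int32_t pack_12_20(uint32_t index, uint32_t frac20)
{
    /* Keep the packed value non-negative: index below 2^11, fraction in 20 bits. */
    assert(index < 0x800u && frac20 <= 0xFFFFFu);
    return (int32_t)((index << 20) | frac20);
}

/* Q15 table interpolation: y = (y0 * (0xFFFFF - fract) + y1 * fract) >> 20. */
static int16_t interp_q15_sketch(const int16_t *pYData, int32_t x, uint32_t nValues)
{
    assert(pYData != NULL && nValues >= 2u);

    if (x < 0) {
        return pYData[0];                 /* below the table */
    }

    uint32_t index = (uint32_t)x >> 20;   /* upper 12 bits */
    if (index >= nValues - 1u) {
        return pYData[nValues - 1u];      /* at or beyond the last entry */
    }

    int64_t fract = x & 0x000FFFFF;       /* lower 20 bits */
    int64_t y = (int64_t)pYData[index] * (0xFFFFF - fract)
              + (int64_t)pYData[index + 1u] * fract;

    return (int16_t)(y >> 20);            /* back to the table's format */
}
```

Because `0xFFFFF` stands in for 1.0, the weights sum to slightly less than one and the result is truncated: for the table {0, 1000, 2000}, index 1 with a zero fraction yields 999 rather than 1000, and index 1 with fraction 0x80000 (one half) yields 1499.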
+
+  /**
+   * @brief  Fast approximation to the trigonometric sine function for floating-point data.
+   * @param[in] x  input value in radians.
+   * @return  sin(x).
+   */
+  float32_t arm_sin_f32(
+  float32_t x);
+
+
+  /**
+   * @brief  Fast approximation to the trigonometric sine function for Q31 data.
+   * @param[in] x  Scaled input value in radians.
+   * @return  sin(x).
+   */
+  q31_t arm_sin_q31(
+  q31_t x);
+
+
+  /**
+   * @brief  Fast approximation to the trigonometric sine function for Q15 data.
+   * @param[in] x  Scaled input value in radians.
+   * @return  sin(x).
+   */
+  q15_t arm_sin_q15(
+  q15_t x);
+
+
+  /**
+   * @brief  Fast approximation to the trigonometric cosine function for floating-point data.
+   * @param[in] x  input value in radians.
+   * @return  cos(x).
+   */
+  float32_t arm_cos_f32(
+  float32_t x);
+
+
+  /**
+   * @brief Fast approximation to the trigonometric cosine function for Q31 data.
+   * @param[in] x  Scaled input value in radians.
+   * @return  cos(x).
+   */
+  q31_t arm_cos_q31(
+  q31_t x);
+
+
+  /**
+   * @brief  Fast approximation to the trigonometric cosine function for Q15 data.
+   * @param[in] x  Scaled input value in radians.
+   * @return  cos(x).
+   */
+  q15_t arm_cos_q15(
+  q15_t x);
+
+
+  /**
+   * @ingroup groupFastMath
+   */
+
+
+  /**
+   * @defgroup SQRT Square Root
+   *
+   * Computes the square root of a number.
+   * There are separate functions for Q15, Q31, and floating-point data types.
+   * The square root function is computed using the Newton-Raphson algorithm.
+   * This is an iterative algorithm of the form:
+   * <pre>
+   *      x1 = x0 - f(x0)/f'(x0)
+   * </pre>
+   * where <code>x1</code> is the current estimate,
+   * <code>x0</code> is the previous estimate, and
+   * <code>f'(x0)</code> is the derivative of <code>f()</code> evaluated at <code>x0</code>.
+   * For the square root function, the algorithm reduces to:
+   * <pre>
+   *     x0 = in/2                         [initial guess]
+   *     x1 = 1/2 * ( x0 + in / x0)        [each iteration]
+   * </pre>
+   */
+
+
+  /**
+   * @addtogroup SQRT
+   * @{
+   */
+
+  /**
+   * @brief  Floating-point square root function.
+   * @param[in]  in    input value.
+   * @param[out] pOut  square root of input value.
+   * @return The function returns ARM_MATH_SUCCESS if the input value is non-negative, or ARM_MATH_ARGUMENT_ERROR if
+   * <code>in</code> is negative, in which case the output is set to zero.
+   */
+  static __INLINE arm_status arm_sqrt_f32(
+  float32_t in,
+  float32_t * pOut)
+  {
+    if(in >= 0.0f)
+    {
+
+#if   (__FPU_USED == 1) && defined ( __CC_ARM   )
+      *pOut = __sqrtf(in);
+#elif (__FPU_USED == 1) && (defined(__ARMCC_VERSION) && (__ARMCC_VERSION >= 6010050))
+      *pOut = __builtin_sqrtf(in);
+#elif (__FPU_USED == 1) && defined(__GNUC__)
+      *pOut = __builtin_sqrtf(in);
+#elif (__FPU_USED == 1) && defined ( __ICCARM__ ) && (__VER__ >= 6040000)
+      __ASM("VSQRT.F32 %0,%1" : "=t"(*pOut) : "t"(in));
+#else
+      *pOut = sqrtf(in);
+#endif
+
+      return (ARM_MATH_SUCCESS);
+    }
+    else
+    {
+      *pOut = 0.0f;
+      return (ARM_MATH_ARGUMENT_ERROR);
+    }
+  }
+
+
+  /**
+   * @brief Q31 square root function.
+   * @param[in]  in    input value.  The range of the input value is [0 +1) or 0x00000000 to 0x7FFFFFFF.
+   * @param[out] pOut  square root of input value.
+   * @return The function returns ARM_MATH_SUCCESS if the input value is positive, or ARM_MATH_ARGUMENT_ERROR if
+   * <code>in</code> is negative, in which case the output is set to zero.
+   */
+  arm_status arm_sqrt_q31(
+  q31_t in,
+  q31_t * pOut);
+
+
+  /**
+   * @brief  Q15 square root function.
+   * @param[in]  in    input value.  The range of the input value is [0 +1) or 0x0000 to 0x7FFF.
+   * @param[out] pOut  square root of input value.
+   * @return The function returns ARM_MATH_SUCCESS if the input value is positive, or ARM_MATH_ARGUMENT_ERROR if
+   * <code>in</code> is negative, in which case the output is set to zero.
+   */
+  arm_status arm_sqrt_q15(
+  q15_t in,
+  q15_t * pOut);
+
+  /**
+   * @} end of SQRT group
+   */
+
+
+  /**
+   * @brief floating-point Circular write function.
+   */
+  static __INLINE void arm_circularWrite_f32(
+  int32_t * circBuffer,
+  int32_t L,
+  uint16_t * writeOffset,
+  int32_t bufferInc,
+  const int32_t * src,
+  int32_t srcInc,
+  uint32_t blockSize)
+  {
+    uint32_t i = 0u;
+    int32_t wOffset;
+
+    /* Copy the value of the index pointer, which points to the
+     * current location where the input samples are to be written */
+    wOffset = *writeOffset;
+
+    /* Loop over the blockSize */
+    i = blockSize;
+
+    while(i > 0u)
+    {
+      /* copy the input sample to the circular buffer */
+      circBuffer[wOffset] = *src;
+
+      /* Update the input pointer */
+      src += srcInc;
+
+      /* Circularly update wOffset.  Watch out for positive and negative value */
+      wOffset += bufferInc;
+      if(wOffset >= L)
+        wOffset -= L;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Update the index pointer */
+    *writeOffset = (uint16_t)wOffset;
+  }
+
+
+
+  /**
+   * @brief floating-point Circular Read function.
+   */
+  static __INLINE void arm_circularRead_f32(
+  int32_t * circBuffer,
+  int32_t L,
+  int32_t * readOffset,
+  int32_t bufferInc,
+  int32_t * dst,
+  int32_t * dst_base,
+  int32_t dst_length,
+  int32_t dstInc,
+  uint32_t blockSize)
+  {
+    uint32_t i = 0u;
+    int32_t rOffset, dst_end;
+
+    /* Copy the value of the index pointer, which points to the
+     * current location from which the samples are to be read */
+    rOffset = *readOffset;
+    dst_end = (int32_t) (dst_base + dst_length);
+
+    /* Loop over the blockSize */
+    i = blockSize;
+
+    while(i > 0u)
+    {
+      /* copy the sample from the circular buffer to the destination buffer */
+      *dst = circBuffer[rOffset];
+
+      /* Update the destination pointer */
+      dst += dstInc;
+
+      if(dst == (int32_t *) dst_end)
+      {
+        dst = dst_base;
+      }
+
+      /* Circularly update rOffset.  Watch out for positive and negative value  */
+      rOffset += bufferInc;
+
+      if(rOffset >= L)
+      {
+        rOffset -= L;
+      }
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Update the index pointer */
+    *readOffset = rOffset;
+  }
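The wrap-around logic used by the circular write and read helpers can be illustrated with a stripped-down sketch. The names `ring_write` and `ring_read` are hypothetical, the stride is fixed at 1, and the destination is treated as a plain linear buffer, so this is a simplification of the helpers above rather than a replacement for them.

```c
#include <stdint.h>

/* Hypothetical stand-alone sketch (not the CMSIS-DSP helpers themselves).
 * Writes `count` samples into a ring of `len` elements with a stride of 1. */
static void ring_write(int32_t *ring, int32_t len, int32_t *offset,
                       const int32_t *src, uint32_t count)
{
    int32_t w = *offset;

    while (count > 0u) {
        ring[w] = *src++;       /* store the sample at the current position */
        w += 1;                 /* advance the write position */
        if (w >= len) {
            w -= len;           /* wrap past the end of the ring */
        }
        count--;
    }
    *offset = w;                /* hand the updated position back to the caller */
}

/* Reads `count` samples from the ring into a linear destination buffer. */
static void ring_read(const int32_t *ring, int32_t len, int32_t *offset,
                      int32_t *dst, uint32_t count)
{
    int32_t r = *offset;

    while (count > 0u) {
        *dst++ = ring[r];       /* fetch the sample at the current position */
        r += 1;                 /* advance the read position */
        if (r >= len) {
            r -= len;           /* wrap past the end of the ring */
        }
        count--;
    }
    *offset = r;
}
```

Starting both offsets at 2 in a four-element ring and transferring three samples leaves each offset at 1, with the third sample stored in element 0.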
+
+
+  /**
+   * @brief Q15 Circular write function.
+   */
+  static __INLINE void arm_circularWrite_q15(
+  q15_t * circBuffer,
+  int32_t L,
+  uint16_t * writeOffset,
+  int32_t bufferInc,
+  const q15_t * src,
+  int32_t srcInc,
+  uint32_t blockSize)
+  {
+    uint32_t i = 0u;
+    int32_t wOffset;
+
+    /* Copy the value of the index pointer, which points to the
+     * current location where the input samples are to be written */
+    wOffset = *writeOffset;
+
+    /* Loop over the blockSize */
+    i = blockSize;
+
+    while(i > 0u)
+    {
+      /* copy the input sample to the circular buffer */
+      circBuffer[wOffset] = *src;
+
+      /* Update the input pointer */
+      src += srcInc;
+
+      /* Circularly update wOffset.  Watch out for positive and negative value */
+      wOffset += bufferInc;
+      if(wOffset >= L)
+        wOffset -= L;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Update the index pointer */
+    *writeOffset = (uint16_t)wOffset;
+  }
+
+
+  /**
+   * @brief Q15 Circular Read function.
+   */
+  static __INLINE void arm_circularRead_q15(
+  q15_t * circBuffer,
+  int32_t L,
+  int32_t * readOffset,
+  int32_t bufferInc,
+  q15_t * dst,
+  q15_t * dst_base,
+  int32_t dst_length,
+  int32_t dstInc,
+  uint32_t blockSize)
+  {
+    uint32_t i = 0;
+    int32_t rOffset, dst_end;
+
+    /* Copy the value of the index pointer, which points to the
+     * current location from which the samples are to be read */
+    rOffset = *readOffset;
+
+    dst_end = (int32_t) (dst_base + dst_length);
+
+    /* Loop over the blockSize */
+    i = blockSize;
+
+    while(i > 0u)
+    {
+      /* copy the sample from the circular buffer to the destination buffer */
+      *dst = circBuffer[rOffset];
+
+      /* Update the destination pointer */
+      dst += dstInc;
+
+      if(dst == (q15_t *) dst_end)
+      {
+        dst = dst_base;
+      }
+
+      /* Circularly update rOffset.  Watch out for positive and negative value */
+      rOffset += bufferInc;
+
+      if(rOffset >= L)
+      {
+        rOffset -= L;
+      }
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Update the index pointer */
+    *readOffset = rOffset;
+  }
+
+
+  /**
+   * @brief Q7 Circular write function.
+   */
+  static __INLINE void arm_circularWrite_q7(
+  q7_t * circBuffer,
+  int32_t L,
+  uint16_t * writeOffset,
+  int32_t bufferInc,
+  const q7_t * src,
+  int32_t srcInc,
+  uint32_t blockSize)
+  {
+    uint32_t i = 0u;
+    int32_t wOffset;
+
+    /* Copy the value of the index pointer, which points to the
+     * current location where the input samples are to be written */
+    wOffset = *writeOffset;
+
+    /* Loop over the blockSize */
+    i = blockSize;
+
+    while(i > 0u)
+    {
+      /* copy the input sample to the circular buffer */
+      circBuffer[wOffset] = *src;
+
+      /* Update the input pointer */
+      src += srcInc;
+
+      /* Circularly update wOffset.  Watch out for positive and negative value */
+      wOffset += bufferInc;
+      if(wOffset >= L)
+        wOffset -= L;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Update the index pointer */
+    *writeOffset = (uint16_t)wOffset;
+  }
+
+
+  /**
+   * @brief Q7 Circular Read function.
+   */
+  static __INLINE void arm_circularRead_q7(
+  q7_t * circBuffer,
+  int32_t L,
+  int32_t * readOffset,
+  int32_t bufferInc,
+  q7_t * dst,
+  q7_t * dst_base,
+  int32_t dst_length,
+  int32_t dstInc,
+  uint32_t blockSize)
+  {
+    uint32_t i = 0;
+    int32_t rOffset, dst_end;
+
+    /* Copy the value of the index pointer, which points to the
+     * current location from which the samples are to be read */
+    rOffset = *readOffset;
+
+    dst_end = (int32_t) (dst_base + dst_length);
+
+    /* Loop over the blockSize */
+    i = blockSize;
+
+    while(i > 0u)
+    {
+      /* copy the sample from the circular buffer to the destination buffer */
+      *dst = circBuffer[rOffset];
+
+      /* Update the destination pointer */
+      dst += dstInc;
+
+      if(dst == (q7_t *) dst_end)
+      {
+        dst = dst_base;
+      }
+
+      /* Circularly update rOffset.  Watch out for positive and negative value */
+      rOffset += bufferInc;
+
+      if(rOffset >= L)
+      {
+        rOffset -= L;
+      }
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Update the index pointer */
+    *readOffset = rOffset;
+  }
+
+
+  /**
+   * @brief  Sum of the squares of the elements of a Q31 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_power_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q63_t * pResult);
+
+
+  /**
+   * @brief  Sum of the squares of the elements of a floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_power_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult);
+
+
+  /**
+   * @brief  Sum of the squares of the elements of a Q15 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_power_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q63_t * pResult);
+
+
+  /**
+   * @brief  Sum of the squares of the elements of a Q7 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_power_q7(
+  q7_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult);
+
+
+  /**
+   * @brief  Mean value of a Q7 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_mean_q7(
+  q7_t * pSrc,
+  uint32_t blockSize,
+  q7_t * pResult);
+
+
+  /**
+   * @brief  Mean value of a Q15 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_mean_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult);
+
+
+  /**
+   * @brief  Mean value of a Q31 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_mean_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult);
+
+
+  /**
+   * @brief  Mean value of a floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_mean_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult);
+
+
+  /**
+   * @brief  Variance of the elements of a floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_var_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult);
+
+
+  /**
+   * @brief  Variance of the elements of a Q31 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_var_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult);
+
+
+  /**
+   * @brief  Variance of the elements of a Q15 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_var_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult);
+
+
+  /**
+   * @brief  Root Mean Square of the elements of a floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_rms_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult);
+
+
+  /**
+   * @brief  Root Mean Square of the elements of a Q31 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_rms_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult);
+
+
+  /**
+   * @brief  Root Mean Square of the elements of a Q15 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_rms_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult);
+
+
+  /**
+   * @brief  Standard deviation of the elements of a floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_std_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult);
+
+
+  /**
+   * @brief  Standard deviation of the elements of a Q31 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_std_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult);
+
+
+  /**
+   * @brief  Standard deviation of the elements of a Q15 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output value.
+   */
+  void arm_std_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult);
+
+
+  /**
+   * @brief  Floating-point complex magnitude
+   * @param[in]  pSrc        points to the complex input vector
+   * @param[out] pDst        points to the real output vector
+   * @param[in]  numSamples  number of complex samples in the input vector
+   */
+  void arm_cmplx_mag_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Q31 complex magnitude
+   * @param[in]  pSrc        points to the complex input vector
+   * @param[out] pDst        points to the real output vector
+   * @param[in]  numSamples  number of complex samples in the input vector
+   */
+  void arm_cmplx_mag_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Q15 complex magnitude
+   * @param[in]  pSrc        points to the complex input vector
+   * @param[out] pDst        points to the real output vector
+   * @param[in]  numSamples  number of complex samples in the input vector
+   */
+  void arm_cmplx_mag_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Q15 complex dot product
+   * @param[in]  pSrcA       points to the first input vector
+   * @param[in]  pSrcB       points to the second input vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   * @param[out] realResult  real part of the result returned here
+   * @param[out] imagResult  imaginary part of the result returned here
+   */
+  void arm_cmplx_dot_prod_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  uint32_t numSamples,
+  q31_t * realResult,
+  q31_t * imagResult);
+
+
+  /**
+   * @brief  Q31 complex dot product
+   * @param[in]  pSrcA       points to the first input vector
+   * @param[in]  pSrcB       points to the second input vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   * @param[out] realResult  real part of the result returned here
+   * @param[out] imagResult  imaginary part of the result returned here
+   */
+  void arm_cmplx_dot_prod_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  uint32_t numSamples,
+  q63_t * realResult,
+  q63_t * imagResult);
+
+
+  /**
+   * @brief  Floating-point complex dot product
+   * @param[in]  pSrcA       points to the first input vector
+   * @param[in]  pSrcB       points to the second input vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   * @param[out] realResult  real part of the result returned here
+   * @param[out] imagResult  imaginary part of the result returned here
+   */
+  void arm_cmplx_dot_prod_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  uint32_t numSamples,
+  float32_t * realResult,
+  float32_t * imagResult);
+
+
+  /**
+   * @brief  Q15 complex-by-real multiplication
+   * @param[in]  pSrcCmplx   points to the complex input vector
+   * @param[in]  pSrcReal    points to the real input vector
+   * @param[out] pCmplxDst   points to the complex output vector
+   * @param[in]  numSamples  number of samples in each vector
+   */
+  void arm_cmplx_mult_real_q15(
+  q15_t * pSrcCmplx,
+  q15_t * pSrcReal,
+  q15_t * pCmplxDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Q31 complex-by-real multiplication
+   * @param[in]  pSrcCmplx   points to the complex input vector
+   * @param[in]  pSrcReal    points to the real input vector
+   * @param[out] pCmplxDst   points to the complex output vector
+   * @param[in]  numSamples  number of samples in each vector
+   */
+  void arm_cmplx_mult_real_q31(
+  q31_t * pSrcCmplx,
+  q31_t * pSrcReal,
+  q31_t * pCmplxDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Floating-point complex-by-real multiplication
+   * @param[in]  pSrcCmplx   points to the complex input vector
+   * @param[in]  pSrcReal    points to the real input vector
+   * @param[out] pCmplxDst   points to the complex output vector
+   * @param[in]  numSamples  number of samples in each vector
+   */
+  void arm_cmplx_mult_real_f32(
+  float32_t * pSrcCmplx,
+  float32_t * pSrcReal,
+  float32_t * pCmplxDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Minimum value of a Q7 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] result     is output pointer
+   * @param[out] index      is the array index of the minimum value in the input buffer.
+   */
+  void arm_min_q7(
+  q7_t * pSrc,
+  uint32_t blockSize,
+  q7_t * result,
+  uint32_t * index);
+
+
+  /**
+   * @brief  Minimum value of a Q15 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output pointer
+   * @param[out] pIndex     is the array index of the minimum value in the input buffer.
+   */
+  void arm_min_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult,
+  uint32_t * pIndex);
+
+
+  /**
+   * @brief  Minimum value of a Q31 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output pointer
+   * @param[out] pIndex     is the array index of the minimum value in the input buffer.
+   */
+  void arm_min_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult,
+  uint32_t * pIndex);
+
+
+  /**
+   * @brief  Minimum value of a floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[in]  blockSize  is the number of samples to process
+   * @param[out] pResult    is output pointer
+   * @param[out] pIndex     is the array index of the minimum value in the input buffer.
+   */
+  void arm_min_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult,
+  uint32_t * pIndex);
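The contract shared by the minimum-value functions (one value out, one index out) can be shown with a plain-C reference. `min_ref` is a hypothetical name; the sketch assumes `blockSize` is at least 1 and reports the first occurrence when the minimum is repeated, which is an assumption of this example rather than a documented guarantee of the library.

```c
#include <stdint.h>

/* Hypothetical reference helper, not part of CMSIS-DSP.
 * Assumes blockSize >= 1; ties resolve to the lowest index. */
static void min_ref(const float *src, uint32_t blockSize,
                    float *result, uint32_t *index)
{
    float best = src[0];
    uint32_t bestIndex = 0u;
    uint32_t i;

    for (i = 1u; i < blockSize; i++) {
        if (src[i] < best) {    /* strict comparison keeps the first minimum */
            best = src[i];
            bestIndex = i;
        }
    }
    *result = best;             /* minimum value */
    *index = bestIndex;         /* position of that value in src */
}
```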
+
+
+/**
+ * @brief Maximum value of a Q7 vector.
+ * @param[in]  pSrc       points to the input buffer
+ * @param[in]  blockSize  length of the input vector
+ * @param[out] pResult    maximum value returned here
+ * @param[out] pIndex     index of maximum value returned here
+ */
+  void arm_max_q7(
+  q7_t * pSrc,
+  uint32_t blockSize,
+  q7_t * pResult,
+  uint32_t * pIndex);
+
+
+/**
+ * @brief Maximum value of a Q15 vector.
+ * @param[in]  pSrc       points to the input buffer
+ * @param[in]  blockSize  length of the input vector
+ * @param[out] pResult    maximum value returned here
+ * @param[out] pIndex     index of maximum value returned here
+ */
+  void arm_max_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult,
+  uint32_t * pIndex);
+
+
+/**
+ * @brief Maximum value of a Q31 vector.
+ * @param[in]  pSrc       points to the input buffer
+ * @param[in]  blockSize  length of the input vector
+ * @param[out] pResult    maximum value returned here
+ * @param[out] pIndex     index of maximum value returned here
+ */
+  void arm_max_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult,
+  uint32_t * pIndex);
+
+
+/**
+ * @brief Maximum value of a floating-point vector.
+ * @param[in]  pSrc       points to the input buffer
+ * @param[in]  blockSize  length of the input vector
+ * @param[out] pResult    maximum value returned here
+ * @param[out] pIndex     index of maximum value returned here
+ */
+  void arm_max_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult,
+  uint32_t * pIndex);
+
+
+  /**
+   * @brief  Q15 complex-by-complex multiplication
+   * @param[in]  pSrcA       points to the first input vector
+   * @param[in]  pSrcB       points to the second input vector
+   * @param[out] pDst        points to the output vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   */
+  void arm_cmplx_mult_cmplx_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  q15_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Q31 complex-by-complex multiplication
+   * @param[in]  pSrcA       points to the first input vector
+   * @param[in]  pSrcB       points to the second input vector
+   * @param[out] pDst        points to the output vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   */
+  void arm_cmplx_mult_cmplx_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  q31_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief  Floating-point complex-by-complex multiplication
+   * @param[in]  pSrcA       points to the first input vector
+   * @param[in]  pSrcB       points to the second input vector
+   * @param[out] pDst        points to the output vector
+   * @param[in]  numSamples  number of complex samples in each vector
+   */
+  void arm_cmplx_mult_cmplx_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  float32_t * pDst,
+  uint32_t numSamples);
+
+
+  /**
+   * @brief Converts the elements of the floating-point vector to Q31 vector.
+   * @param[in]  pSrc       points to the floating-point input vector
+   * @param[out] pDst       points to the Q31 output vector
+   * @param[in]  blockSize  length of the input vector
+   */
+  void arm_float_to_q31(
+  float32_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Converts the elements of the floating-point vector to Q15 vector.
+   * @param[in]  pSrc       points to the floating-point input vector
+   * @param[out] pDst       points to the Q15 output vector
+   * @param[in]  blockSize  length of the input vector
+   */
+  void arm_float_to_q15(
+  float32_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief Converts the elements of the floating-point vector to Q7 vector.
+   * @param[in]  pSrc       points to the floating-point input vector
+   * @param[out] pDst       points to the Q7 output vector
+   * @param[in]  blockSize  length of the input vector
+   */
+  void arm_float_to_q7(
+  float32_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Converts the elements of the Q31 vector to Q15 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[out] pDst       is output pointer
+   * @param[in]  blockSize  is the number of samples to process
+   */
+  void arm_q31_to_q15(
+  q31_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Converts the elements of the Q31 vector to Q7 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[out] pDst       is output pointer
+   * @param[in]  blockSize  is the number of samples to process
+   */
+  void arm_q31_to_q7(
+  q31_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Converts the elements of the Q15 vector to floating-point vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[out] pDst       is output pointer
+   * @param[in]  blockSize  is the number of samples to process
+   */
+  void arm_q15_to_float(
+  q15_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Converts the elements of the Q15 vector to Q31 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[out] pDst       is output pointer
+   * @param[in]  blockSize  is the number of samples to process
+   */
+  void arm_q15_to_q31(
+  q15_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize);
+
+
+  /**
+   * @brief  Converts the elements of the Q15 vector to Q7 vector.
+   * @param[in]  pSrc       is input pointer
+   * @param[out] pDst       is output pointer
+   * @param[in]  blockSize  is the number of samples to process
+   */
+  void arm_q15_to_q7(
+  q15_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize);
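The relationship between floating-point values and Q15 samples can be sketched for a single element. This assumes the usual Q15 convention (scale by 32768 and saturate to the 16-bit range); `float_to_q15_ref` and `q15_to_float_ref` are hypothetical names, and the library routines may round differently depending on build options.

```c
#include <stdint.h>

/* Hypothetical single-sample helpers, not part of CMSIS-DSP.
 * Q15 represents [-1, +1) as a signed 16-bit integer scaled by 2^15. */
static int16_t float_to_q15_ref(float v)
{
    float scaled = v * 32768.0f;

    if (!(scaled > -32768.0f)) {
        return INT16_MIN;           /* saturate low (also catches NaN) */
    }
    if (!(scaled < 32767.0f)) {
        return INT16_MAX;           /* saturate high: +1.0 is not representable */
    }
    return (int16_t)scaled;         /* truncate toward zero */
}

static float q15_to_float_ref(int16_t v)
{
    return (float)v / 32768.0f;     /* undo the 2^15 scaling */
}
```

With these definitions 0.5 maps to 16384, while 1.0 saturates to 32767 because the format cannot represent +1 exactly.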
+
+
+  /**
+   * @ingroup groupInterpolation
+   */
+
+  /**
+   * @defgroup BilinearInterpolate Bilinear Interpolation
+   *
+   * Bilinear interpolation is an extension of linear interpolation applied to a two dimensional grid.
+   * The underlying function <code>f(x, y)</code> is sampled on a regular grid and the interpolation process
+   * determines values between the grid points.
+   * Bilinear interpolation is equivalent to two step linear interpolation, first in the x-dimension and then in the y-dimension.
+   * Bilinear interpolation is often used in image processing to rescale images.
+   * The CMSIS DSP library provides bilinear interpolation functions for Q7, Q15, Q31, and floating-point data types.
+   *
+   * <b>Algorithm</b>
+   * \par
+   * The instance structure used by the bilinear interpolation functions describes a two dimensional data table.
+   * For floating-point, the instance structure is defined as:
+   * <pre>
+   *   typedef struct
+   *   {
+   *     uint16_t numRows;
+   *     uint16_t numCols;
+   *     float32_t *pData;
+   *   } arm_bilinear_interp_instance_f32;
+   * </pre>
+   *
+   * \par
+   * where <code>numRows</code> specifies the number of rows in the table;
+   * <code>numCols</code> specifies the number of columns in the table;
+   * and <code>pData</code> points to an array of size <code>numRows*numCols</code> values.
+   * The data table <code>pData</code> is organized in row order and the supplied data values fall on integer indexes.
+   * That is, table element (x,y) is located at <code>pData[x + y*numCols]</code> where x and y are integers.
+   *
+   * \par
+   * Let <code>(x, y)</code> specify the desired interpolation point.  Then define:
+   * <pre>
+   *     XF = floor(x)
+   *     YF = floor(y)
+   * </pre>
+   * \par
+   * The interpolated output point is computed as:
+   * <pre>
+   *  f(x, y) = f(XF, YF) * (1-(x-XF)) * (1-(y-YF))
+   *           + f(XF+1, YF) * (x-XF)*(1-(y-YF))
+   *           + f(XF, YF+1) * (1-(x-XF))*(y-YF)
+   *           + f(XF+1, YF+1) * (x-XF)*(y-YF)
+   * </pre>
+   * Note that the coordinates (x, y) contain integer and fractional components.
+   * The integer components specify which portion of the table to use while the
+   * fractional components control the interpolation processor.
+   *
+   * \par
+   * If (x,y) is outside the table boundary, bilinear interpolation returns zero output.
+   */
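The interpolation formula in the group description can be written out directly as a reference implementation. `bilinear_ref` is a hypothetical helper, not a library function. It assumes the row-order layout `table[x + y*numCols]` stated above, and it treats any point whose four neighbours are not all inside the table as out of range, which is a choice made for this sketch.

```c
#include <stdint.h>

/* Hypothetical reference helper, not part of CMSIS-DSP.
 * table holds numRows*numCols values, element (x, y) at table[x + y*numCols]. */
static float bilinear_ref(const float *table, uint16_t numRows,
                          uint16_t numCols, float x, float y)
{
    int32_t xf, yf;
    float fx, fy;
    float f00, f10, f01, f11;

    /* Reject NaN and any point without a full 2x2 neighbourhood. */
    if (!(x >= 0.0f && x < (float)(numCols - 1) &&
          y >= 0.0f && y < (float)(numRows - 1))) {
        return 0.0f;
    }

    xf = (int32_t)x;                 /* XF = floor(x), valid because x >= 0 */
    yf = (int32_t)y;                 /* YF = floor(y), valid because y >= 0 */
    fx = x - (float)xf;              /* fractional part in x */
    fy = y - (float)yf;              /* fractional part in y */

    f00 = table[xf     +  yf      * numCols];   /* f(XF,   YF)   */
    f10 = table[xf + 1 +  yf      * numCols];   /* f(XF+1, YF)   */
    f01 = table[xf     + (yf + 1) * numCols];   /* f(XF,   YF+1) */
    f11 = table[xf + 1 + (yf + 1) * numCols];   /* f(XF+1, YF+1) */

    return f00 * (1.0f - fx) * (1.0f - fy)
         + f10 * fx          * (1.0f - fy)
         + f01 * (1.0f - fx) * fy
         + f11 * fx          * fy;
}
```

For the 2x2 table {1, 2, 3, 4}, the centre point (0.5, 0.5) evaluates to the average of the four corners, 2.5.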
+
+  /**
+   * @addtogroup BilinearInterpolate
+   * @{
+   */
+
+
+  /**
+  *
+  * @brief  Floating-point bilinear interpolation.
+  * @param[in]     S  points to an instance of the interpolation structure.
+  * @param[in]     X  interpolation coordinate.
+  * @param[in]     Y  interpolation coordinate.
+  * @return out interpolated value.
+  */
+  static __INLINE float32_t arm_bilinear_interp_f32(
+  const arm_bilinear_interp_instance_f32 * S,
+  float32_t X,
+  float32_t Y)
+  {
+    float32_t out;
+    float32_t f00, f01, f10, f11;
+    float32_t *pData = S->pData;
+    int32_t xIndex, yIndex, index;
+    float32_t xdiff, ydiff;
+    float32_t b1, b2, b3, b4;
+
+    xIndex = (int32_t) X;
+    yIndex = (int32_t) Y;
+
+    /* Care taken for table outside boundary */
+    /* Returns zero output when values are outside table boundary. */
+    /* X indexes columns and Y indexes rows; the point (xIndex + 1, yIndex + 1) must also lie in the table. */
+    if(xIndex < 0 || xIndex > (S->numCols - 2) || yIndex < 0 || yIndex > (S->numRows - 2))
+    {
+      return (0);
+    }
+
+    /* Calculation of index for two nearest points in X-direction */
+    index = xIndex + yIndex * S->numCols;
+
+
+    /* Read two nearest points in X-direction */
+    f00 = pData[index];
+    f01 = pData[index + 1];
+
+    /* Calculation of index for two nearest points in Y-direction */
+    index = xIndex + (yIndex + 1) * S->numCols;
+
+
+    /* Read two nearest points in Y-direction */
+    f10 = pData[index];
+    f11 = pData[index + 1];
+
+    /* Calculation of intermediate values */
+    b1 = f00;
+    b2 = f01 - f00;
+    b3 = f10 - f00;
+    b4 = f00 - f01 - f10 + f11;
+
+    /* Calculation of fractional part in X */
+    xdiff = X - xIndex;
+
+    /* Calculation of fractional part in Y */
+    ydiff = Y - yIndex;
+
+    /* Calculation of bi-linear interpolated output */
+    out = b1 + b2 * xdiff + b3 * ydiff + b4 * xdiff * ydiff;
+
+    /* return to application */
+    return (out);
+  }
+
+
+  /**
+  *
+  * @brief  Q31 bilinear interpolation.
+  * @param[in,out] S  points to an instance of the interpolation structure.
+  * @param[in]     X  interpolation coordinate in 12.20 format.
+  * @param[in]     Y  interpolation coordinate in 12.20 format.
+  * @return out interpolated value.
+  */
+  static __INLINE q31_t arm_bilinear_interp_q31(
+  arm_bilinear_interp_instance_q31 * S,
+  q31_t X,
+  q31_t Y)
+  {
+    q31_t out;                                   /* Temporary output */
+    q31_t acc = 0;                               /* output */
+    q31_t xfract, yfract;                        /* X, Y fractional parts */
+    q31_t x1, x2, y1, y2;                        /* Nearest output values */
+    int32_t rI, cI;                              /* Row and column indices */
+    q31_t *pYData = S->pData;                    /* pointer to output table values */
+    uint32_t nCols = S->numCols;                 /* number of columns */
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    rI = ((X & (q31_t)0xFFF00000) >> 20);
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    cI = ((Y & (q31_t)0xFFF00000) >> 20);
+
+    /* Care taken for table outside boundary */
+    /* Returns zero output when values are outside table boundary. */
+    /* rI indexes columns and cI indexes rows; the reads below also touch rI + 1 and cI + 1. */
+    if(rI < 0 || rI > (S->numCols - 2) || cI < 0 || cI > (S->numRows - 2))
+    {
+      return (0);
+    }
+
+    /* 20 bits for the fractional part */
+    /* shift left xfract by 11 to keep 1.31 format */
+    xfract = (X & 0x000FFFFF) << 11u;
+
+    /* Read two nearest output values from the index */
+    x1 = pYData[(rI) + (int32_t)nCols * (cI)    ];
+    x2 = pYData[(rI) + (int32_t)nCols * (cI) + 1];
+
+    /* 20 bits for the fractional part */
+    /* shift left yfract by 11 to keep 1.31 format */
+    yfract = (Y & 0x000FFFFF) << 11u;
+
+    /* Read two nearest output values from the index */
+    y1 = pYData[(rI) + (int32_t)nCols * (cI + 1)    ];
+    y2 = pYData[(rI) + (int32_t)nCols * (cI + 1) + 1];
+
+    /* Calculation of x1 * (1-xfract ) * (1-yfract) and acc is in 3.29(q29) format */
+    out = ((q31_t) (((q63_t) x1  * (0x7FFFFFFF - xfract)) >> 32));
+    acc = ((q31_t) (((q63_t) out * (0x7FFFFFFF - yfract)) >> 32));
+
+    /* x2 * (xfract) * (1-yfract)  in 3.29(q29) and adding to acc */
+    out = ((q31_t) ((q63_t) x2 * (0x7FFFFFFF - yfract) >> 32));
+    acc += ((q31_t) ((q63_t) out * (xfract) >> 32));
+
+    /* y1 * (1 - xfract) * (yfract)  in 3.29(q29) and adding to acc */
+    out = ((q31_t) ((q63_t) y1 * (0x7FFFFFFF - xfract) >> 32));
+    acc += ((q31_t) ((q63_t) out * (yfract) >> 32));
+
+    /* y2 * (xfract) * (yfract)  in 3.29(q29) and adding to acc */
+    out = ((q31_t) ((q63_t) y2 * (xfract) >> 32));
+    acc += ((q31_t) ((q63_t) out * (yfract) >> 32));
+
+    /* Convert acc to 1.31(q31) format */
+    return ((q31_t)(acc << 2));
+  }
+
+
+  /**
+  * @brief  Q15 bilinear interpolation.
+  * @param[in,out] S  points to an instance of the interpolation structure.
+  * @param[in]     X  interpolation coordinate in 12.20 format.
+  * @param[in]     Y  interpolation coordinate in 12.20 format.
+  * @return out interpolated value.
+  */
+  static __INLINE q15_t arm_bilinear_interp_q15(
+  arm_bilinear_interp_instance_q15 * S,
+  q31_t X,
+  q31_t Y)
+  {
+    q63_t acc = 0;                               /* output */
+    q31_t out;                                   /* Temporary output */
+    q15_t x1, x2, y1, y2;                        /* Nearest output values */
+    q31_t xfract, yfract;                        /* X, Y fractional parts */
+    int32_t rI, cI;                              /* Row and column indices */
+    q15_t *pYData = S->pData;                    /* pointer to output table values */
+    uint32_t nCols = S->numCols;                 /* number of columns */
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    rI = ((X & (q31_t)0xFFF00000) >> 20);
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    cI = ((Y & (q31_t)0xFFF00000) >> 20);
+
+    /* Care taken for table outside boundary */
+    /* Returns zero output when values are outside table boundary */
+    if(rI < 0 || rI > (S->numRows - 1) || cI < 0 || cI > (S->numCols - 1))
+    {
+      return (0);
+    }
+
+    /* 20 bits for the fractional part */
+    /* xfract should be in 12.20 format */
+    xfract = (X & 0x000FFFFF);
+
+    /* Read two nearest output values from the index */
+    x1 = pYData[((uint32_t)rI) + nCols * ((uint32_t)cI)    ];
+    x2 = pYData[((uint32_t)rI) + nCols * ((uint32_t)cI) + 1];
+
+    /* 20 bits for the fractional part */
+    /* yfract should be in 12.20 format */
+    yfract = (Y & 0x000FFFFF);
+
+    /* Read two nearest output values from the index */
+    y1 = pYData[((uint32_t)rI) + nCols * ((uint32_t)cI + 1)    ];
+    y2 = pYData[((uint32_t)rI) + nCols * ((uint32_t)cI + 1) + 1];
+
+    /* Calculation of x1 * (1-xfract ) * (1-yfract) and acc is in 13.51 format */
+
+    /* x1 is in 1.15(q15), xfract in 12.20 format and out is in 13.35 format */
+    /* convert 13.35 to 13.31 by right shifting  and out is in 1.31 */
+    out = (q31_t) (((q63_t) x1 * (0xFFFFF - xfract)) >> 4u);
+    acc = ((q63_t) out * (0xFFFFF - yfract));
+
+    /* x2 * (xfract) * (1-yfract)  in 1.51 and adding to acc */
+    out = (q31_t) (((q63_t) x2 * (0xFFFFF - yfract)) >> 4u);
+    acc += ((q63_t) out * (xfract));
+
+    /* y1 * (1 - xfract) * (yfract)  in 1.51 and adding to acc */
+    out = (q31_t) (((q63_t) y1 * (0xFFFFF - xfract)) >> 4u);
+    acc += ((q63_t) out * (yfract));
+
+    /* y2 * (xfract) * (yfract)  in 1.51 and adding to acc */
+    out = (q31_t) (((q63_t) y2 * (xfract)) >> 4u);
+    acc += ((q63_t) out * (yfract));
+
+    /* acc is in 13.51 format; right-shift acc by 36 bits */
+    /* Convert out to 1.15 format */
+    return ((q15_t)(acc >> 36));
+  }
+
+
+  /**
+  * @brief  Q7 bilinear interpolation.
+  * @param[in,out] S  points to an instance of the interpolation structure.
+  * @param[in]     X  interpolation coordinate in 12.20 format.
+  * @param[in]     Y  interpolation coordinate in 12.20 format.
+  * @return out interpolated value.
+  */
+  static __INLINE q7_t arm_bilinear_interp_q7(
+  arm_bilinear_interp_instance_q7 * S,
+  q31_t X,
+  q31_t Y)
+  {
+    q63_t acc = 0;                               /* output */
+    q31_t out;                                   /* Temporary output */
+    q31_t xfract, yfract;                        /* X, Y fractional parts */
+    q7_t x1, x2, y1, y2;                         /* Nearest output values */
+    int32_t rI, cI;                              /* Row and column indices */
+    q7_t *pYData = S->pData;                     /* pointer to output table values */
+    uint32_t nCols = S->numCols;                 /* number of columns */
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    rI = ((X & (q31_t)0xFFF00000) >> 20);
+
+    /* Input is in 12.20 format */
+    /* 12 bits for the table index */
+    /* Index value calculation */
+    cI = ((Y & (q31_t)0xFFF00000) >> 20);
+
+    /* Care taken for table outside boundary */
+    /* Returns zero output when values are outside table boundary */
+    if(rI < 0 || rI > (S->numRows - 1) || cI < 0 || cI > (S->numCols - 1))
+    {
+      return (0);
+    }
+
+    /* 20 bits for the fractional part */
+    /* xfract should be in 12.20 format */
+    xfract = (X & (q31_t)0x000FFFFF);
+
+    /* Read two nearest output values from the index */
+    x1 = pYData[((uint32_t)rI) + nCols * ((uint32_t)cI)    ];
+    x2 = pYData[((uint32_t)rI) + nCols * ((uint32_t)cI) + 1];
+
+    /* 20 bits for the fractional part */
+    /* yfract should be in 12.20 format */
+    yfract = (Y & (q31_t)0x000FFFFF);
+
+    /* Read two nearest output values from the index */
+    y1 = pYData[((uint32_t)rI) + nCols * ((uint32_t)cI + 1)    ];
+    y2 = pYData[((uint32_t)rI) + nCols * ((uint32_t)cI + 1) + 1];
+
+    /* Calculation of x1 * (1-xfract ) * (1-yfract) and acc is in 16.47 format */
+    out = ((x1 * (0xFFFFF - xfract)));
+    acc = (((q63_t) out * (0xFFFFF - yfract)));
+
+    /* x2 * (xfract) * (1-yfract)  in 2.22 and adding to acc */
+    out = ((x2 * (0xFFFFF - yfract)));
+    acc += (((q63_t) out * (xfract)));
+
+    /* y1 * (1 - xfract) * (yfract)  in 2.22 and adding to acc */
+    out = ((y1 * (0xFFFFF - xfract)));
+    acc += (((q63_t) out * (yfract)));
+
+    /* y2 * (xfract) * (yfract)  in 2.22 and adding to acc */
+    out = ((y2 * (yfract)));
+    acc += (((q63_t) out * (xfract)));
+
+    /* acc is in 16.47 format; right-shift by 40 bits to convert to 1.7 format */
+    return ((q7_t)(acc >> 40));
+  }
+
+  /**
+   * @} end of BilinearInterpolate group
+   */
+
+
+/* SMMLAR */
+#define multAcc_32x32_keep32_R(a, x, y) \
+    a = (q31_t) (((((q63_t) a) << 32) + ((q63_t) x * y) + 0x80000000LL ) >> 32)
+
+/* SMMLSR */
+#define multSub_32x32_keep32_R(a, x, y) \
+    a = (q31_t) (((((q63_t) a) << 32) - ((q63_t) x * y) + 0x80000000LL ) >> 32)
+
+/* SMMULR */
+#define mult_32x32_keep32_R(a, x, y) \
+    a = (q31_t) (((q63_t) x * y + 0x80000000LL ) >> 32)
+
+/* SMMLA */
+#define multAcc_32x32_keep32(a, x, y) \
+    a += (q31_t) (((q63_t) x * y) >> 32)
+
+/* SMMLS */
+#define multSub_32x32_keep32(a, x, y) \
+    a -= (q31_t) (((q63_t) x * y) >> 32)
+
+/* SMMUL */
+#define mult_32x32_keep32(a, x, y) \
+    a = (q31_t) (((q63_t) x * y ) >> 32)
+
+
+#if defined ( __CC_ARM )
+  /* Enter low optimization region - place directly above function definition */
+  #if defined( ARM_MATH_CM4 ) || defined( ARM_MATH_CM7)
+    #define LOW_OPTIMIZATION_ENTER \
+       _Pragma ("push")         \
+       _Pragma ("O1")
+  #else
+    #define LOW_OPTIMIZATION_ENTER
+  #endif
+
+  /* Exit low optimization region - place directly after end of function definition */
+  #if defined( ARM_MATH_CM4 ) || defined( ARM_MATH_CM7)
+    #define LOW_OPTIMIZATION_EXIT \
+       _Pragma ("pop")
+  #else
+    #define LOW_OPTIMIZATION_EXIT
+  #endif
+
+  /* Enter low optimization region - place directly above function definition */
+  #define IAR_ONLY_LOW_OPTIMIZATION_ENTER
+
+  /* Exit low optimization region - place directly after end of function definition */
+  #define IAR_ONLY_LOW_OPTIMIZATION_EXIT
+
+#elif defined(__ARMCC_VERSION) && (__ARMCC_VERSION >= 6010050)
+  #define LOW_OPTIMIZATION_ENTER
+  #define LOW_OPTIMIZATION_EXIT
+  #define IAR_ONLY_LOW_OPTIMIZATION_ENTER
+  #define IAR_ONLY_LOW_OPTIMIZATION_EXIT
+
+#elif defined(__GNUC__)
+  #define LOW_OPTIMIZATION_ENTER __attribute__(( optimize("-O1") ))
+  #define LOW_OPTIMIZATION_EXIT
+  #define IAR_ONLY_LOW_OPTIMIZATION_ENTER
+  #define IAR_ONLY_LOW_OPTIMIZATION_EXIT
+
+#elif defined(__ICCARM__)
+  /* Enter low optimization region - place directly above function definition */
+  #if defined( ARM_MATH_CM4 ) || defined( ARM_MATH_CM7)
+    #define LOW_OPTIMIZATION_ENTER \
+       _Pragma ("optimize=low")
+  #else
+    #define LOW_OPTIMIZATION_ENTER
+  #endif
+
+  /* Exit low optimization region - place directly after end of function definition */
+  #define LOW_OPTIMIZATION_EXIT
+
+  /* Enter low optimization region - place directly above function definition */
+  #if defined( ARM_MATH_CM4 ) || defined( ARM_MATH_CM7)
+    #define IAR_ONLY_LOW_OPTIMIZATION_ENTER \
+       _Pragma ("optimize=low")
+  #else
+    #define IAR_ONLY_LOW_OPTIMIZATION_ENTER
+  #endif
+
+  /* Exit low optimization region - place directly after end of function definition */
+  #define IAR_ONLY_LOW_OPTIMIZATION_EXIT
+
+#elif defined(__CSMC__)
+  #define LOW_OPTIMIZATION_ENTER
+  #define LOW_OPTIMIZATION_EXIT
+  #define IAR_ONLY_LOW_OPTIMIZATION_ENTER
+  #define IAR_ONLY_LOW_OPTIMIZATION_EXIT
+
+#elif defined(__TASKING__)
+  #define LOW_OPTIMIZATION_ENTER
+  #define LOW_OPTIMIZATION_EXIT
+  #define IAR_ONLY_LOW_OPTIMIZATION_ENTER
+  #define IAR_ONLY_LOW_OPTIMIZATION_EXIT
+
+#endif
+
+
+#ifdef   __cplusplus
+}
+#endif
+
+
+#if defined ( __GNUC__ )
+#pragma GCC diagnostic pop
+#endif
+
+#endif /* _ARM_MATH_H */
+
+/**
+ *
+ * End of file.
+ */
\ No newline at end of file
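
Note on the 12.20 coordinate format used by the bilinear interpolation routines at the end of arm_math.h: the upper 12 bits select the table index and the lower 20 bits carry the fractional position between neighbouring entries. The following is a minimal standalone sketch of that split, assuming non-negative coordinates only. The helper names are hypothetical and are not part of CMSIS-DSP.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of CMSIS-DSP. */

/* Pack a non-negative integer part and a 20-bit fraction into 12.20 format. */
static int32_t to_q12_20(uint32_t whole, uint32_t frac20)
{
    /* Keep the sign bit clear: only 11 bits of integer part are used here. */
    return (int32_t)(((whole & 0x7FFu) << 20) | (frac20 & 0x000FFFFFu));
}

/* Table index: the upper 12 bits (valid for non-negative inputs). */
static int32_t q12_20_index(int32_t v)
{
    return v >> 20;
}

/* Fractional part: the lower 20 bits, still scaled by 2^20. */
static int32_t q12_20_frac(int32_t v)
{
    return v & 0x000FFFFF;
}

/* Fraction rescaled to 1.31, as the Q31 routine does with a left shift by 11. */
static int32_t q12_20_frac_to_q31(int32_t v)
{
    return (int32_t)((uint32_t)(v & 0x000FFFFF) << 11);
}
```

For example, the coordinate 3.5 is packed as to_q12_20(3u, 0x80000u); the index helper then returns 3 and the Q31 fraction is 0x40000000, which is 0.5 in 1.31 format.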

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,165 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_abs_f32.c    
+*    
+* Description:	Vector absolute value.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include <math.h>
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @defgroup BasicAbs Vector Absolute Value        
+ *        
+ * Computes the absolute value of a vector on an element-by-element basis.        
+ *        
+ * <pre>        
+ *     pDst[n] = abs(pSrc[n]),   0 <= n < blockSize.        
+ * </pre>        
+ *        
+ * The functions support in-place computation allowing the source and
+ * destination pointers to reference the same memory buffer.
+ * There are separate functions for floating-point, Q7, Q15, and Q31 data types.
+ */
+
+/**        
+ * @addtogroup BasicAbs        
+ * @{        
+ */
+
+/**        
+ * @brief Floating-point vector absolute value.        
+ * @param[in]       *pSrc points to the input buffer        
+ * @param[out]      *pDst points to the output buffer        
+ * @param[in]       blockSize number of samples in each vector        
+ * @return none.        
+ */
+
+void arm_abs_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t in1, in2, in3, in4;                  /* temporary variables */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Calculate absolute and then store the results in the destination buffer. */
+    /* read sample from source */
+    in1 = *pSrc;
+    in2 = *(pSrc + 1);
+    in3 = *(pSrc + 2);
+
+    /* find absolute value */
+    in1 = fabsf(in1);
+
+    /* read sample from source */
+    in4 = *(pSrc + 3);
+
+    /* find absolute value */
+    in2 = fabsf(in2);
+
+    /* store result to destination */
+    *pDst = in1;
+
+    /* find absolute value */
+    in3 = fabsf(in3);
+
+    /* find absolute value */
+    in4 = fabsf(in4);
+
+    /* store result to destination */
+    *(pDst + 1) = in2;
+
+    /* store result to destination */
+    *(pDst + 2) = in3;
+
+    /* store result to destination */
+    *(pDst + 3) = in4;
+
+
+    /* Update source pointer to process next samples */
+    pSrc += 4u;
+
+    /* Update destination pointer to process next samples */
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY   */
+
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Calculate absolute and then store the results in the destination buffer. */
+    *pDst++ = fabsf(*pSrc++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of BasicAbs group        
+ */
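
Note on the loop structure in arm_abs_f32.c: the Cortex-M3/M4 path processes blockSize >> 2 groups of four samples and then finishes the remaining blockSize % 4 samples one at a time. The sketch below reproduces that pattern in portable C under a few assumptions: the function name is hypothetical, and a ternary stands in for fabsf so the example has no libm dependency (unlike fabsf, it leaves -0.0f unchanged). All four inputs are read before any output is written, so source and destination may be the same buffer.

```c
#include <stdint.h>

/* Hypothetical stand-in for the unrolled loop; not the library function. */
static void abs_f32_unrolled(const float *src, float *dst, uint32_t n)
{
    uint32_t blk = n >> 2;              /* number of 4-sample groups */

    while (blk > 0u) {
        /* Read four samples first so in-place use stays safe. */
        float a = src[0], b = src[1], c = src[2], d = src[3];

        dst[0] = (a < 0.0f) ? -a : a;
        dst[1] = (b < 0.0f) ? -b : b;
        dst[2] = (c < 0.0f) ? -c : c;
        dst[3] = (d < 0.0f) ? -d : d;

        src += 4;
        dst += 4;
        blk--;
    }

    blk = n % 4u;                       /* 0 to 3 leftover samples */

    while (blk > 0u) {
        float v = *src++;
        *dst++ = (v < 0.0f) ? -v : v;
        blk--;
    }
}
```

With a 7-element buffer, the first loop handles elements 0 to 3 and the second loop handles elements 4 to 6.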

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,179 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_abs_q15.c    
+*    
+* Description:	Q15 vector absolute value.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicAbs    
+ * @{    
+ */
+
+/**    
+ * @brief Q15 vector absolute value.    
+ * @param[in]       *pSrc points to the input buffer    
+ * @param[out]      *pDst points to the output buffer    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * The Q15 value -1 (0x8000) will be saturated to the maximum allowable positive value 0x7FFF.    
+ */
+
+void arm_abs_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+  __SIMD32_TYPE *simd;
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q15_t in1;                                     /* Input value1 */
+  q15_t in2;                                     /* Input value2 */
+
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  simd = __SIMD32_CONST(pDst);
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Read two inputs */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+
+
+    /* Store the Absolute result in the destination buffer by packing the two values, in a single cycle */
+#ifndef  ARM_MATH_BIG_ENDIAN
+    *simd++ =
+      __PKHBT(((in1 > 0) ? in1 : (q15_t)__QSUB16(0, in1)),
+              ((in2 > 0) ? in2 : (q15_t)__QSUB16(0, in2)), 16);
+
+#else
+
+
+    *simd++ =
+      __PKHBT(((in2 > 0) ? in2 : (q15_t)__QSUB16(0, in2)),
+              ((in1 > 0) ? in1 : (q15_t)__QSUB16(0, in1)), 16);
+
+#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+    *simd++ =
+      __PKHBT(((in1 > 0) ? in1 : (q15_t)__QSUB16(0, in1)),
+              ((in2 > 0) ? in2 : (q15_t)__QSUB16(0, in2)), 16);
+
+#else
+
+
+    *simd++ =
+      __PKHBT(((in2 > 0) ? in2 : (q15_t)__QSUB16(0, in2)),
+              ((in1 > 0) ? in1 : (q15_t)__QSUB16(0, in1)), 16);
+
+#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+  pDst = (q15_t *)simd;
+	
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Read the input */
+    in1 = *pSrc++;
+
+    /* Calculate absolute value of input and then store the result in the destination buffer. */
+    *pDst++ = (in1 > 0) ? in1 : (q15_t)__QSUB16(0, in1);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q15_t in;                                      /* Temporary input variable */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Read the input */
+    in = *pSrc++;
+
+    /* Calculate absolute value of input and then store the result in the destination buffer. */
+    *pDst++ = (in > 0) ? in : ((in == (q15_t) 0x8000) ? 0x7fff : -in);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of BasicAbs group    
+ */
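
Note on the saturation rule documented for arm_abs_q15: the most negative Q15 value, 0x8000, has no positive counterpart in 16 bits, so the result is clamped to 0x7FFF. A minimal portable sketch of the per-sample rule, mirroring the Cortex-M0 branch above, is shown here. The function name is hypothetical; the DSP-enabled branch relies on the __QSUB16 intrinsic instead.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not part of CMSIS-DSP. */
static int16_t sat_abs_q15(int16_t in)
{
    if (in > 0) {
        return in;                      /* already positive */
    }
    if (in == INT16_MIN) {
        return INT16_MAX;               /* -1.0 in Q15 saturates to 0x7FFF */
    }
    return (int16_t)(-in);              /* ordinary negation is safe here */
}
```

The same rule applies to the Q7 and Q31 variants with INT8_MIN/INT8_MAX and INT32_MIN/INT32_MAX respectively.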

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,130 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_abs_q31.c    
+*    
+* Description:	Q31 vector absolute value.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicAbs    
+ * @{    
+ */
+
+
+/**    
+ * @brief Q31 vector absolute value.    
+ * @param[in]       *pSrc points to the input buffer    
+ * @param[out]      *pDst points to the output buffer    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * The Q31 value -1 (0x80000000) will be saturated to the maximum allowable positive value 0x7FFFFFFF.    
+ */
+
+void arm_abs_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+  q31_t in;                                      /* Input value */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2, in3, in4;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Calculate absolute of input (if -1 then saturated to 0x7fffffff) and then store the results in the destination buffer. */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+    in3 = *pSrc++;
+    in4 = *pSrc++;
+
+    *pDst++ = (in1 > 0) ? in1 : (q31_t)__QSUB(0, in1);
+    *pDst++ = (in2 > 0) ? in2 : (q31_t)__QSUB(0, in2);
+    *pDst++ = (in3 > 0) ? in3 : (q31_t)__QSUB(0, in3);
+    *pDst++ = (in4 > 0) ? in4 : (q31_t)__QSUB(0, in4);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY   */
+
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Calculate absolute value of the input (if -1 then saturated to 0x7fffffff) and then store the results in the destination buffer. */
+    in = *pSrc++;
+    *pDst++ = (in > 0) ? in : ((in == INT32_MIN) ? INT32_MAX : -in);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+}
+
+/**    
+ * @} end of BasicAbs group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_abs_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,157 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_abs_q7.c    
+*    
+* Description:	Q7 vector absolute value.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @addtogroup BasicAbs        
+ * @{        
+ */
+
+/**        
+ * @brief Q7 vector absolute value.        
+ * @param[in]       *pSrc points to the input buffer        
+ * @param[out]      *pDst points to the output buffer        
+ * @param[in]       blockSize number of samples in each vector        
+ * @return none.        
+ *    
+ * \par Conditions for optimum performance    
+ *  Input and output buffers should be 32-bit aligned    
+ *    
+ *        
+ * <b>Scaling and Overflow Behavior:</b>        
+ * \par        
+ * The function uses saturating arithmetic.        
+ * The Q7 value -1 (0x80) will be saturated to the maximum allowable positive value 0x7F.        
+ */
+
+void arm_abs_q7(
+  q7_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+  q7_t in;                                       /* Input value */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2, in3, in4;                      /* temporary input variables */
+  q31_t out1, out2, out3, out4;                  /* temporary output variables */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Read inputs */
+    in1 = (q31_t) * pSrc;
+    in2 = (q31_t) * (pSrc + 1);
+    in3 = (q31_t) * (pSrc + 2);
+
+    /* find absolute value */
+    out1 = (in1 > 0) ? in1 : (q31_t)__QSUB8(0, in1);
+
+    /* read input */
+    in4 = (q31_t) * (pSrc + 3);
+
+    /* find absolute value */
+    out2 = (in2 > 0) ? in2 : (q31_t)__QSUB8(0, in2);
+
+    /* store result to destination */
+    *pDst = (q7_t) out1;
+
+    /* find absolute value */
+    out3 = (in3 > 0) ? in3 : (q31_t)__QSUB8(0, in3);
+
+    /* find absolute value */
+    out4 = (in4 > 0) ? in4 : (q31_t)__QSUB8(0, in4);
+
+    /* store result to destination */
+    *(pDst + 1) = (q7_t) out2;
+
+    /* store result to destination */
+    *(pDst + 2) = (q7_t) out3;
+
+    /* store result to destination */
+    *(pDst + 3) = (q7_t) out4;
+
+    /* update pointers to process next samples */
+    pSrc += 4u;
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+#else
+
+  /* Run the below code for Cortex-M0 */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = |A| */
+    /* Read the input */
+    in = *pSrc++;
+
+    /* Store the Absolute result in the destination buffer */
+    *pDst++ = (in > 0) ? in : ((in == (q7_t) 0x80) ? 0x7f : -in);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of BasicAbs group        
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,150 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_add_f32.c    
+*    
+* Description:	Floating-point vector addition.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @defgroup BasicAdd Vector Addition        
+ *        
+ * Element-by-element addition of two vectors.        
+ *        
+ * <pre>        
+ *     pDst[n] = pSrcA[n] + pSrcB[n],   0 <= n < blockSize.        
+ * </pre>        
+ *        
+ * There are separate functions for floating-point, Q7, Q15, and Q31 data types.        
+ */
+
+/**        
+ * @addtogroup BasicAdd        
+ * @{        
+ */
+
+/**        
+ * @brief Floating-point vector addition.        
+ * @param[in]       *pSrcA points to the first input vector        
+ * @param[in]       *pSrcB points to the second input vector        
+ * @param[out]      *pDst points to the output vector        
+ * @param[in]       blockSize number of samples in each vector        
+ * @return none.        
+ */
+
+void arm_add_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t inA1, inA2, inA3, inA4;              /* temporary input variables */
+  float32_t inB1, inB2, inB3, inB4;              /* temporary input variables */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+
+    /* read four inputs from sourceA and four inputs from sourceB */
+    inA1 = *pSrcA;
+    inB1 = *pSrcB;
+    inA2 = *(pSrcA + 1);
+    inB2 = *(pSrcB + 1);
+    inA3 = *(pSrcA + 2);
+    inB3 = *(pSrcB + 2);
+    inA4 = *(pSrcA + 3);
+    inB4 = *(pSrcB + 3);
+
+    /* C = A + B */
+    /* add and store result to destination */
+    *pDst = inA1 + inB1;
+    *(pDst + 1) = inA2 + inB2;
+    *(pDst + 2) = inA3 + inB3;
+    *(pDst + 3) = inA4 + inB4;
+
+    /* update pointers to process next samples */
+    pSrcA += 4u;
+    pSrcB += 4u;
+    pDst += 4u;
+
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    *pDst++ = (*pSrcA++) + (*pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of BasicAdd group        
+ */
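
The structure of arm_add_f32 above (a loop unrolled by four, followed by a tail loop for the leftover samples) can be tried on a host machine without the CMSIS headers. The sketch below is an illustration under assumptions: add_f32_sketch is a hypothetical name, and the const-qualified inputs are a choice made here, not the library signature.

```c
#include <stdint.h>

/* Hypothetical stand-in for arm_add_f32, written in portable C.
 * Same shape as the Cortex-M3/M4 path: groups of four, then a tail loop. */
static void add_f32_sketch(const float *a, const float *b, float *dst, uint32_t n)
{
    uint32_t blocks = n >> 2;     /* number of full groups of four samples */
    uint32_t tail = n & 3u;       /* 0 to 3 leftover samples */

    while (blocks > 0u) {
        dst[0] = a[0] + b[0];
        dst[1] = a[1] + b[1];
        dst[2] = a[2] + b[2];
        dst[3] = a[3] + b[3];
        a += 4;
        b += 4;
        dst += 4;
        blocks--;
    }

    while (tail > 0u) {
        *dst++ = *a++ + *b++;
        tail--;
    }
}
```

With n = 7, the first loop runs once (samples 0 to 3) and the tail loop handles samples 4 to 6.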

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,140 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_add_q15.c    
+*    
+* Description:	Q15 vector addition    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicAdd    
+ * @{    
+ */
+
+/**    
+ * @brief Q15 vector addition.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.    
+ */
+
+void arm_add_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t inA1, inA2, inB1, inB2;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    inA1 = *__SIMD32(pSrcA)++;
+    inA2 = *__SIMD32(pSrcA)++;
+    inB1 = *__SIMD32(pSrcB)++;
+    inB2 = *__SIMD32(pSrcB)++;
+
+    *__SIMD32(pDst)++ = __QADD16(inA1, inB1);
+    *__SIMD32(pDst)++ = __QADD16(inA2, inB2);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    *pDst++ = (q15_t) __QADD16(*pSrcA++, *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    *pDst++ = (q15_t) __SSAT(((q31_t) * pSrcA++ + *pSrcB++), 16);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+}
+
+/**    
+ * @} end of BasicAdd group    
+ */
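
The Cortex-M0 path of arm_add_q15 above widens each operand to 32 bits, adds, and saturates the sum to 16 bits with __SSAT. A portable sketch of the same per-sample rule is shown here. The name add_q15_sat is hypothetical, and the explicit comparisons stand in for the __SSAT intrinsic.

```c
#include <stdint.h>

/* Hypothetical helper: saturating Q15 addition of two samples. */
static int16_t add_q15_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* cannot overflow in 32 bits */

    if (sum > INT16_MAX) {
        sum = INT16_MAX;                     /* clamp to 0x7FFF */
    }
    if (sum < INT16_MIN) {
        sum = INT16_MIN;                     /* clamp to 0x8000 */
    }
    return (int16_t)sum;
}
```

For example, add_q15_sat(32767, 1) stays at 32767 instead of wrapping to -32768.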

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,148 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_add_q31.c    
+*    
+* Description:	Q31 vector addition.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicAdd    
+ * @{    
+ */
+
+
+/**    
+ * @brief Q31 vector addition.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] will be saturated.    
+ */
+
+void arm_add_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t inA1, inA2, inA3, inA4;
+  q31_t inB1, inB2, inB3, inB4;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    inA1 = *pSrcA++;
+    inA2 = *pSrcA++;
+    inB1 = *pSrcB++;
+    inB2 = *pSrcB++;
+
+    inA3 = *pSrcA++;
+    inA4 = *pSrcA++;
+    inB3 = *pSrcB++;
+    inB4 = *pSrcB++;
+
+    *pDst++ = __QADD(inA1, inB1);
+    *pDst++ = __QADD(inA2, inB2);
+    *pDst++ = __QADD(inA3, inB3);
+    *pDst++ = __QADD(inA4, inB4);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    *pDst++ = __QADD(*pSrcA++, *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrcA++ + *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of BasicAdd group    
+ */
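
On Cortex-M0, arm_add_q31 above forms the sum in 64 bits and clips it back to 32 bits with clip_q63_to_q31. The sketch below shows that idea in portable C. The name add_q31_sat is hypothetical, and the clamp is written out by hand rather than calling the library helper.

```c
#include <stdint.h>

/* Hypothetical helper: saturating Q31 addition via a 64-bit intermediate. */
static int32_t add_q31_sat(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* exact in 64 bits */

    if (sum > INT32_MAX) {
        return INT32_MAX;                    /* clamp to 0x7FFFFFFF */
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;                    /* clamp to 0x80000000 */
    }
    return (int32_t)sum;
}
```

The widening step is what makes the overflow detectable: the 32-bit sum alone would already have wrapped.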

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_add_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,134 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_add_q7.c    
+*    
+* Description:	Q7 vector addition.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicAdd    
+ * @{    
+ */
+
+/**    
+ * @brief Q7 vector addition.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q7 range [0x80 0x7F] will be saturated.    
+ */
+
+void arm_add_q7(
+  q7_t * pSrcA,
+  q7_t * pSrcB,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    *__SIMD32(pDst)++ = __QADD8(*__SIMD32(pSrcA)++, *__SIMD32(pSrcB)++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    *pDst++ = (q7_t) __SSAT(*pSrcA++ + *pSrcB++, 8);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + B */
+    /* Add and then store the results in the destination buffer. */
+    *pDst++ = (q7_t) __SSAT((q15_t) * pSrcA++ + *pSrcB++, 8);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+}
+
+/**    
+ * @} end of BasicAdd group    
+ */
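
The unrolled path of arm_add_q7 above processes four Q7 samples per iteration by treating them as one 32-bit word and applying __QADD8, which performs four independent signed saturating byte additions. The following host-side emulation is only a sketch of that behaviour. The name qadd8_emulated is hypothetical, and the loop is far slower than the single instruction it imitates.

```c
#include <stdint.h>

/* Hypothetical emulation of a 4-lane signed saturating byte add. */
static uint32_t qadd8_emulated(uint32_t x, uint32_t y)
{
    uint32_t out = 0u;

    for (int lane = 0; lane < 4; lane++) {
        uint32_t ux = (x >> (8 * lane)) & 0xFFu;
        uint32_t uy = (y >> (8 * lane)) & 0xFFu;

        /* Sign-extend each byte to a 32-bit signed value. */
        int32_t a = (ux & 0x80u) ? (int32_t)ux - 256 : (int32_t)ux;
        int32_t b = (uy & 0x80u) ? (int32_t)uy - 256 : (int32_t)uy;

        int32_t s = a + b;
        if (s > 127) {
            s = 127;
        }
        if (s < -128) {
            s = -128;
        }

        /* Conversion to uint8_t is modular, so negative values map to 0x80..0xFF. */
        out |= (uint32_t)(uint8_t)s << (8 * lane);
    }
    return out;
}
```

Each lane saturates on its own: an overflow in one byte does not carry into its neighbour, which is why a plain 32-bit add cannot be used here.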

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,135 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_dot_prod_f32.c    
+*    
+* Description:	Floating-point dot product.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath
+ */
+
+/**
+ * @defgroup dot_prod Vector Dot Product
+ *
+ * Computes the dot product of two vectors.
+ * The vectors are multiplied element-by-element and then summed.
+ *
+ * <pre>
+ *     sum = pSrcA[0]*pSrcB[0] + pSrcA[1]*pSrcB[1] + ... + pSrcA[blockSize-1]*pSrcB[blockSize-1]
+ * </pre>     
+ *
+ * There are separate functions for floating-point, Q7, Q15, and Q31 data types.    
+ */
+
+/**    
+ * @addtogroup dot_prod    
+ * @{    
+ */
+
+/**    
+ * @brief Dot product of floating-point vectors.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @param[out]      *result output result returned here    
+ * @return none.    
+ */
+
+
+void arm_dot_prod_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  uint32_t blockSize,
+  float32_t * result)
+{
+  float32_t sum = 0.0f;                          /* Temporary result storage */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 products per iteration.    
+   ** A second loop below handles the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Calculate dot product and then store the result in a temporary buffer */
+    sum += (*pSrcA++) * (*pSrcB++);
+    sum += (*pSrcA++) * (*pSrcB++);
+    sum += (*pSrcA++) * (*pSrcB++);
+    sum += (*pSrcA++) * (*pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Calculate dot product and then store the result in a temporary buffer. */
+    sum += (*pSrcA++) * (*pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+  /* Store the result back in the destination buffer */
+  *result = sum;
+}
+
+/**    
+ * @} end of dot_prod group    
+ */
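
arm_dot_prod_f32 above accumulates the products in a local variable and writes the total through the result pointer only once, at the end. A minimal host-side sketch of that shape follows. The name dot_f32_sketch and the const-qualified inputs are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for arm_dot_prod_f32 in portable C. */
static void dot_f32_sketch(const float *a, const float *b, uint32_t n, float *result)
{
    float sum = 0.0f;             /* local accumulator, as in the source */

    for (uint32_t i = 0u; i < n; i++) {
        sum += a[i] * b[i];
    }

    *result = sum;                /* single store of the final value */
}
```

Because the products are summed in index order, this sketch and the unrolled version above add the terms in the same sequence.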

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,140 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_dot_prod_q15.c    
+*    
+* Description:	Q15 dot product.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup dot_prod    
+ * @{    
+ */
+
+/**    
+ * @brief Dot product of Q15 vectors.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @param[out]      *result output result returned here    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The intermediate multiplications are in 1.15 x 1.15 = 2.30 format and these    
+ * results are added to a 64-bit accumulator in 34.30 format.    
+ * Nonsaturating additions are used and given that there are 33 guard bits in the accumulator    
+ * there is no risk of overflow.    
+ * The return result is in 34.30 format.    
+ */
+
+void arm_dot_prod_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  uint32_t blockSize,
+  q63_t * result)
+{
+  q63_t sum = 0;                                 /* Temporary result storage */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 products per iteration.    
+   ** A second loop below handles the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Calculate dot product and then store the result in a temporary buffer. */
+    sum = __SMLALD(*__SIMD32(pSrcA)++, *__SIMD32(pSrcB)++, sum);
+    sum = __SMLALD(*__SIMD32(pSrcA)++, *__SIMD32(pSrcB)++, sum);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Calculate dot product and then store the results in a temporary buffer. */
+    sum = __SMLALD(*pSrcA++, *pSrcB++, sum);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Calculate dot product and then store the results in a temporary buffer. */
+    sum += (q63_t) ((q31_t) * pSrcA++ * *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  /* Store the result in the destination buffer in 34.30 format */
+  *result = sum;
+
+}
+
+/**    
+ * @} end of dot_prod group    
+ */
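
The Q15 dot product above returns its result in 34.30 format, so the caller has to divide by 2^30 to recover a real-valued number. The sketch below follows the Cortex-M0 path (a 32-bit product added to a 64-bit accumulator) and adds a conversion helper. Both function names are hypothetical and are not provided by CMSIS.

```c
#include <stdint.h>

/* Hypothetical reference: Q15 dot product with a 64-bit accumulator. */
static int64_t dot_q15_sketch(const int16_t *a, const int16_t *b, uint32_t n)
{
    int64_t sum = 0;

    for (uint32_t i = 0u; i < n; i++) {
        /* 1.15 x 1.15 = 2.30; the product always fits in 32 bits. */
        sum += (int64_t)((int32_t)a[i] * (int32_t)b[i]);
    }
    return sum;                   /* 34.30 fixed point */
}

/* Hypothetical helper: interpret a 34.30 accumulator as a real number. */
static double q34_30_to_double(int64_t acc)
{
    return (double)acc / 1073741824.0;   /* divide by 2^30 */
}
```

For instance, with a = {0.5, 0.5} and b = {0.5, -0.25} encoded in Q15, the accumulator holds 2^27, which converts to 0.125.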

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,143 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_dot_prod_q31.c    
+*    
+* Description:	Q31 dot product.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup dot_prod    
+ * @{    
+ */
+
+/**    
+ * @brief Dot product of Q31 vectors.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @param[out]      *result output result returned here    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The intermediate multiplications are in 1.31 x 1.31 = 2.62 format and these    
+ * are truncated to 2.48 format by discarding the lower 14 bits.    
+ * The 2.48 result is then added without saturation to a 64-bit accumulator in 16.48 format.    
+ * There are 15 guard bits in the accumulator and there is no risk of overflow as long as    
+ * the length of the vectors is less than 2^16 elements.    
+ * The return result is in 16.48 format.    
+ */
+
+void arm_dot_prod_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  uint32_t blockSize,
+  q63_t * result)
+{
+  q63_t sum = 0;                                 /* Temporary result storage */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t inA1, inA2, inA3, inA4;
+  q31_t inB1, inB2, inB3, inB4;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 products per iteration.    
+   ** A second loop below handles the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Calculate dot product and then store the result in a temporary buffer. */
+    inA1 = *pSrcA++;
+    inA2 = *pSrcA++;
+    inA3 = *pSrcA++;
+    inA4 = *pSrcA++;
+    inB1 = *pSrcB++;
+    inB2 = *pSrcB++;
+    inB3 = *pSrcB++;
+    inB4 = *pSrcB++;
+
+    sum += ((q63_t) inA1 * inB1) >> 14u;
+    sum += ((q63_t) inA2 * inB2) >> 14u;
+    sum += ((q63_t) inA3 * inB3) >> 14u;
+    sum += ((q63_t) inA4 * inB4) >> 14u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Calculate dot product and then store the result in a temporary buffer. */
+    sum += ((q63_t) * pSrcA++ * *pSrcB++) >> 14u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Store the result in the destination buffer in 16.48 format */
+  *result = sum;
+}
+
+/**    
+ * @} end of dot_prod group    
+ */
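
The scaling described in the arm_dot_prod_q31 comment can be checked on a host PC with a small portable sketch. Nothing here comes from CMSIS: `ref_dot_prod_q31` and `q16_48_to_double` are hypothetical helper names, plain `int32_t`/`int64_t` stand in for `q31_t`/`q63_t`, and the right shift of a negative product is assumed to be arithmetic, as it is on the usual compilers.

```c
#include <stdint.h>

/* Hypothetical host-side reference, not part of CMSIS-DSP. */
static int64_t ref_dot_prod_q31(const int32_t *a, const int32_t *b, uint32_t n)
{
    int64_t sum = 0;                       /* accumulator in 16.48 format */

    for (uint32_t i = 0; i < n; i++) {
        /* 1.31 x 1.31 = 2.62; dropping the 14 low bits leaves 2.48 */
        sum += ((int64_t)a[i] * b[i]) >> 14;
    }
    return sum;
}

/* Convert a 16.48 fixed-point value to double for inspection. */
static double q16_48_to_double(int64_t x)
{
    return (double)x / 281474976710656.0;  /* divide by 2^48 */
}
```

With inputs {0.5, 0.5} and {0.5, 0.25}, written as Q31 values 0x40000000 and 0x20000000, the sum is 3 * 2^45, which `q16_48_to_double` reports as 0.375.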

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_dot_prod_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,159 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_dot_prod_q7.c    
+*    
+* Description:	Q7 dot product.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup dot_prod    
+ * @{    
+ */
+
+/**    
+ * @brief Dot product of Q7 vectors.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @param[out]      *result output result returned here    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The intermediate multiplications are in 1.7 x 1.7 = 2.14 format and these    
+ * results are added to an accumulator in 18.14 format.    
+ * Nonsaturating additions are used and there is no danger of wrap around as long as    
+ * the vectors are less than 2^18 elements long.    
+ * The return result is in 18.14 format.    
+ */
+
+void arm_dot_prod_q7(
+  q7_t * pSrcA,
+  q7_t * pSrcB,
+  uint32_t blockSize,
+  q31_t * result)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+  q31_t sum = 0;                                 /* Temporary variables to store output */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t input1, input2;                          /* Temporary variables to store input */
+  q31_t inA1, inA2, inB1, inB2;                  /* Temporary variables to store input */
+
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* read 4 samples at a time from sourceA */
+    input1 = *__SIMD32(pSrcA)++;
+    /* read 4 samples at a time from sourceB */
+    input2 = *__SIMD32(pSrcB)++;
+
+    /* extract two q7_t samples to q15_t samples */
+    inA1 = __SXTB16(__ROR(input1, 8));
+    /* extract remaining two samples */
+    inA2 = __SXTB16(input1);
+    /* extract two q7_t samples to q15_t samples */
+    inB1 = __SXTB16(__ROR(input2, 8));
+    /* extract remaining two samples */
+    inB2 = __SXTB16(input2);
+
+    /* multiply and accumulate two samples at a time */
+    sum = __SMLAD(inA1, inB1, sum);
+    sum = __SMLAD(inA2, inB2, sum);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Dot product and then store the results in a temporary buffer. */
+    sum += (q31_t) ((q15_t) * pSrcA++ * *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0]* B[0] + A[1]* B[1] + A[2]* B[2] + .....+ A[blockSize-1]* B[blockSize-1] */
+    /* Dot product and then store the results in a temporary buffer. */
+    sum += (q31_t) ((q15_t) * pSrcA++ * *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  /* Store the result in the destination buffer in 18.14 format */
+  *result = sum;
+}
+
+/**    
+ * @} end of dot_prod group    
+ */
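
The unrolled Q7 loop unpacks four bytes into two pairs of 16-bit lanes and feeds them to a dual multiply-accumulate. The sketch below models that data flow on a host. The `emu_*` functions are stand-ins written for this illustration, modelled on the documented behaviour of the ROR, SXTB16 and SMLAD instructions; they are not the CMSIS intrinsics, and `dot4_q7_packed` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Sign-extend an 8-bit or 16-bit field held in an unsigned value. */
static int32_t sext8(uint32_t v)  { return (int32_t)(v ^ 0x80u) - 0x80; }
static int32_t sext16(uint32_t v) { return (int32_t)(v ^ 0x8000u) - 0x8000; }

/* Rotate right by n bits (0 < n < 32). */
static uint32_t emu_ror(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Bytes 0 and 2 are sign-extended into the low and high halfwords. */
static uint32_t emu_sxtb16(uint32_t x)
{
    uint32_t lo = (uint32_t)sext8(x & 0xFFu) & 0xFFFFu;
    uint32_t hi = (uint32_t)sext8((x >> 16) & 0xFFu) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* acc + lo(x)*lo(y) + hi(x)*hi(y), all lanes signed 16-bit. */
static int32_t emu_smlad(uint32_t x, uint32_t y, int32_t acc)
{
    return acc + sext16(x & 0xFFFFu) * sext16(y & 0xFFFFu)
               + sext16(x >> 16) * sext16(y >> 16);
}

/* Accumulate four Q7 products into a 18.14 sum. */
static int32_t dot4_q7_packed(const int8_t a[4], const int8_t b[4], int32_t sum)
{
    uint32_t wa, wb;
    memcpy(&wa, a, 4);                     /* one 32-bit load per vector */
    memcpy(&wb, b, 4);

    /* rotate by 8 so that bytes 1 and 3 land in the lanes SXTB16 reads */
    sum = emu_smlad(emu_sxtb16(emu_ror(wa, 8)), emu_sxtb16(emu_ror(wb, 8)), sum);
    /* bytes 0 and 2 */
    sum = emu_smlad(emu_sxtb16(wa), emu_sxtb16(wb), sum);
    return sum;
}
```

Because both vectors are unpacked the same way, each byte of `a` meets the byte of `b` from the same position, so the order in which the four products are added does not affect the result.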

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,174 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mult_f32.c    
+*    
+* Description:	Floating-point vector multiplication.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @defgroup BasicMult Vector Multiplication        
+ *        
+ * Element-by-element multiplication of two vectors.        
+ *        
+ * <pre>        
+ *     pDst[n] = pSrcA[n] * pSrcB[n],   0 <= n < blockSize.        
+ * </pre>        
+ *        
+ * There are separate functions for floating-point, Q7, Q15, and Q31 data types.        
+ */
+
+/**        
+ * @addtogroup BasicMult        
+ * @{        
+ */
+
+/**        
+ * @brief Floating-point vector multiplication.        
+ * @param[in]       *pSrcA points to the first input vector        
+ * @param[in]       *pSrcB points to the second input vector        
+ * @param[out]      *pDst points to the output vector        
+ * @param[in]       blockSize number of samples in each vector        
+ * @return none.        
+ */
+
+void arm_mult_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counters */
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t inA1, inA2, inA3, inA4;              /* temporary input variables */
+  float32_t inB1, inB2, inB3, inB4;              /* temporary input variables */
+  float32_t out1, out2, out3, out4;              /* temporary output variables */
+
+  /* loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A * B */
+    /* Multiply the inputs and store the results in output buffer */
+    /* read sample from sourceA */
+    inA1 = *pSrcA;
+    /* read sample from sourceB */
+    inB1 = *pSrcB;
+    /* read sample from sourceA */
+    inA2 = *(pSrcA + 1);
+    /* read sample from sourceB */
+    inB2 = *(pSrcB + 1);
+
+    /* out = sourceA * sourceB */
+    out1 = inA1 * inB1;
+
+    /* read sample from sourceA */
+    inA3 = *(pSrcA + 2);
+    /* read sample from sourceB */
+    inB3 = *(pSrcB + 2);
+
+    /* out = sourceA * sourceB */
+    out2 = inA2 * inB2;
+
+    /* read sample from sourceA */
+    inA4 = *(pSrcA + 3);
+
+    /* store result to destination buffer */
+    *pDst = out1;
+
+    /* read sample from sourceB */
+    inB4 = *(pSrcB + 3);
+
+    /* out = sourceA * sourceB */
+    out3 = inA3 * inB3;
+
+    /* store result to destination buffer */
+    *(pDst + 1) = out2;
+
+    /* out = sourceA * sourceB */
+    out4 = inA4 * inB4;
+    /* store result to destination buffer */
+    *(pDst + 2) = out3;
+    /* store result to destination buffer */
+    *(pDst + 3) = out4;
+
+
+    /* update pointers to process next samples */
+    pSrcA += 4u;
+    pSrcB += 4u;
+    pDst += 4u;
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * B */
+    /* Multiply the inputs and store the results in output buffer */
+    *pDst++ = (*pSrcA++) * (*pSrcB++);
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of BasicMult group        
+ */
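
The loop structure used throughout these files (groups of four, then a remainder loop) can be shown in plain C. This is a simplified sketch, not the library routine: `mult_f32_unrolled` is a hypothetical name, `float` stands in for `float32_t`, and the load/store interleaving of the original is omitted.

```c
#include <stdint.h>

/* Element-wise product dst[i] = a[i] * b[i], unrolled by four. */
static void mult_f32_unrolled(const float *a, const float *b,
                              float *dst, uint32_t n)
{
    uint32_t blk = n >> 2;                 /* number of complete groups of 4 */

    while (blk > 0u) {
        dst[0] = a[0] * b[0];
        dst[1] = a[1] * b[1];
        dst[2] = a[2] * b[2];
        dst[3] = a[3] * b[3];
        a += 4;
        b += 4;
        dst += 4;
        blk--;
    }

    uint32_t rem = n % 4u;                 /* 0 to 3 leftover samples */

    while (rem > 0u) {
        *dst++ = (*a++) * (*b++);
        rem--;
    }
}
```

With n = 6 the first loop runs once and the remainder loop handles the last two samples.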

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,154 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. October 2015
+* $Revision: 	V.1.4.5 a
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mult_q15.c    
+*    
+* Description:	Q15 vector multiplication.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicMult    
+ * @{    
+ */
+
+
+/**    
+ * @brief           Q15 vector multiplication    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.    
+ */
+
+void arm_mult_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counters */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t inA1, inA2, inB1, inB2;                  /* temporary input variables */
+  q15_t out1, out2, out3, out4;                  /* temporary output variables */
+  q31_t mul1, mul2, mul3, mul4;                  /* temporary variables */
+
+  /* loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* read two samples at a time from sourceA */
+    inA1 = *__SIMD32(pSrcA)++;
+    /* read two samples at a time from sourceB */
+    inB1 = *__SIMD32(pSrcB)++;
+    /* read two samples at a time from sourceA */
+    inA2 = *__SIMD32(pSrcA)++;
+    /* read two samples at a time from sourceB */
+    inB2 = *__SIMD32(pSrcB)++;
+
+    /* multiply mul = sourceA * sourceB */
+    mul1 = (q31_t) ((q15_t) (inA1 >> 16) * (q15_t) (inB1 >> 16));
+    mul2 = (q31_t) ((q15_t) inA1 * (q15_t) inB1);
+    mul3 = (q31_t) ((q15_t) (inA2 >> 16) * (q15_t) (inB2 >> 16));
+    mul4 = (q31_t) ((q15_t) inA2 * (q15_t) inB2);
+
+    /* saturate result to 16 bit */
+    out1 = (q15_t) __SSAT(mul1 >> 15, 16);
+    out2 = (q15_t) __SSAT(mul2 >> 15, 16);
+    out3 = (q15_t) __SSAT(mul3 >> 15, 16);
+    out4 = (q15_t) __SSAT(mul4 >> 15, 16);
+
+    /* store the result */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pDst)++ = __PKHBT(out2, out1, 16);
+    *__SIMD32(pDst)++ = __PKHBT(out4, out3, 16);
+
+#else
+
+    *__SIMD32(pDst)++ = __PKHBT(out2, out1, 16);
+    *__SIMD32(pDst)++ = __PKHBT(out4, out3, 16);
+
+#endif /* #ifndef ARM_MATH_BIG_ENDIAN */
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * B */
+    /* Multiply the inputs and store the result in the destination buffer */
+    *pDst++ = (q15_t) __SSAT((((q31_t) (*pSrcA++) * (*pSrcB++)) >> 15), 16);
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of BasicMult group    
+ */
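
The saturating behaviour described for arm_mult_q15 can be reproduced on a host with a short reference. The names `sat16` and `ref_mult_q15` are hypothetical, `int16_t` stands in for `q15_t`, and the shift of a negative product is assumed to be arithmetic.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the Q15 range [-32768, 32767]. */
static int32_t sat16(int32_t x)
{
    if (x > 32767)  return 32767;
    if (x < -32768) return -32768;
    return x;
}

/* dst[i] = saturate((a[i] * b[i]) >> 15) */
static void ref_mult_q15(const int16_t *a, const int16_t *b,
                         int16_t *dst, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        /* 1.15 x 1.15 = 2.30; shifting by 15 returns to a 1.15 scale */
        dst[i] = (int16_t)sat16(((int32_t)a[i] * b[i]) >> 15);
    }
}
```

For example, 0.5 * 0.5 (0x4000 * 0x4000) gives 0x2000, while -1.0 * -1.0 (0x8000 * 0x8000) would be +1.0 and is clamped to 0x7FFF.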

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,160 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mult_q31.c    
+*    
+* Description:	Q31 vector multiplication.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicMult    
+ * @{    
+ */
+
+/**    
+ * @brief Q31 vector multiplication.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] will be saturated.
+ */
+
+void arm_mult_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counters */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t inA1, inA2, inA3, inA4;                  /* temporary input variables */
+  q31_t inB1, inB2, inB3, inB4;                  /* temporary input variables */
+  q31_t out1, out2, out3, out4;                  /* temporary output variables */
+
+  /* loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A * B */
+    /* Multiply the inputs and then store the results in the destination buffer. */
+    inA1 = *pSrcA++;
+    inA2 = *pSrcA++;
+    inA3 = *pSrcA++;
+    inA4 = *pSrcA++;
+    inB1 = *pSrcB++;
+    inB2 = *pSrcB++;
+    inB3 = *pSrcB++;
+    inB4 = *pSrcB++;
+
+    out1 = ((q63_t) inA1 * inB1) >> 32;
+    out2 = ((q63_t) inA2 * inB2) >> 32;
+    out3 = ((q63_t) inA3 * inB3) >> 32;
+    out4 = ((q63_t) inA4 * inB4) >> 32;
+
+    out1 = __SSAT(out1, 31);
+    out2 = __SSAT(out2, 31);
+    out3 = __SSAT(out3, 31);
+    out4 = __SSAT(out4, 31);
+
+    *pDst++ = out1 << 1u;
+    *pDst++ = out2 << 1u;
+    *pDst++ = out3 << 1u;
+    *pDst++ = out4 << 1u;
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+  
+  while(blkCnt > 0u)
+  {
+    /* C = A * B */
+    /* Multiply the inputs and then store the results in the destination buffer. */
+    inA1 = *pSrcA++;
+    inB1 = *pSrcB++;
+    out1 = ((q63_t) inA1 * inB1) >> 32;
+    out1 = __SSAT(out1, 31);
+    *pDst++ = out1 << 1u;
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * B */
+    /* Multiply the inputs and then store the results in the destination buffer. */
+    *pDst++ =
+      (q31_t) clip_q63_to_q31(((q63_t) (*pSrcA++) * (*pSrcB++)) >> 31);
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+  
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+}
+
+/**    
+ * @} end of BasicMult group    
+ */
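
The two branches of arm_mult_q31 use different arithmetic sequences: the Cortex-M3/M4 branch shifts the product right by 32, saturates to 31 bits and shifts left by 1, while the Cortex-M0 branch shifts right by 31 and clips to 32 bits. A host-side sketch makes the difference visible. The function names are hypothetical, the clamps are written out by hand rather than calling `__SSAT` or `clip_q63_to_q31`, and arithmetic right shift of negative values is assumed.

```c
#include <stdint.h>

/* Mirrors the sequence in the unrolled branch: >> 32, saturate to 31 bits, << 1. */
static int32_t mult_q31_shift32(int32_t a, int32_t b)
{
    int32_t hi = (int32_t)(((int64_t)a * b) >> 32);

    if (hi > 0x3FFFFFFF)  hi = 0x3FFFFFFF;     /* 31-bit signed maximum */
    if (hi < -0x40000000) hi = -0x40000000;    /* 31-bit signed minimum */
    return hi * 2;                             /* same value as hi << 1, no overflow */
}

/* Mirrors the sequence in the other branch: >> 31, then clip to the Q31 range. */
static int32_t mult_q31_shift31(int32_t a, int32_t b)
{
    int64_t p = ((int64_t)a * b) >> 31;

    if (p > INT32_MAX) p = INT32_MAX;
    if (p < INT32_MIN) p = INT32_MIN;
    return (int32_t)p;
}
```

The first variant always produces an even result, so the two can differ in the least significant bit: for inputs 3 and 0x40000000 they return 0 and 1 respectively. At the saturation point (-1.0 * -1.0) they return 0x7FFFFFFE and 0x7FFFFFFF.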

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_mult_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,127 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mult_q7.c    
+*    
+* Description:	Q7 vector multiplication.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicMult    
+ * @{    
+ */
+
+/**    
+ * @brief           Q7 vector multiplication    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q7 range [0x80 0x7F] will be saturated.    
+ */
+
+void arm_mult_q7(
+  q7_t * pSrcA,
+  q7_t * pSrcB,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counters */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q7_t out1, out2, out3, out4;                   /* Temporary variables to store the product */
+
+  /* loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A * B */
+    /* Multiply the inputs and store the results in temporary variables */
+    out1 = (q7_t) __SSAT((((q15_t) (*pSrcA++) * (*pSrcB++)) >> 7), 8);
+    out2 = (q7_t) __SSAT((((q15_t) (*pSrcA++) * (*pSrcB++)) >> 7), 8);
+    out3 = (q7_t) __SSAT((((q15_t) (*pSrcA++) * (*pSrcB++)) >> 7), 8);
+    out4 = (q7_t) __SSAT((((q15_t) (*pSrcA++) * (*pSrcB++)) >> 7), 8);
+
+    /* Store the results of 4 inputs in the destination buffer in single cycle by packing */
+    *__SIMD32(pDst)++ = __PACKq7(out1, out2, out3, out4);
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * B */
+    /* Multiply the inputs and store the result in the destination buffer */
+    *pDst++ = (q7_t) __SSAT((((q15_t) (*pSrcA++) * (*pSrcB++)) >> 7), 8);
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of BasicMult group    
+ */
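
For Q7 the input space is small enough to check the saturation claim exhaustively. The sketch below is a host-side experiment, not library code: `ref_mult_q7` and `count_saturating_pairs_q7` are hypothetical names, and arithmetic right shift of negative values is assumed.

```c
#include <stdint.h>

/* Scalar Q7 multiply: saturate((a * b) >> 7) to [-128, 127]. */
static int8_t ref_mult_q7(int8_t a, int8_t b)
{
    int32_t p = ((int32_t)a * b) >> 7;

    if (p > 127)  p = 127;
    if (p < -128) p = -128;
    return (int8_t)p;
}

/* Count the (a, b) pairs whose shifted product falls outside the Q7 range. */
static unsigned count_saturating_pairs_q7(void)
{
    unsigned count = 0;

    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            int32_t p = (a * b) >> 7;
            if (p > 127 || p < -128) {
                count++;
            }
        }
    }
    return count;
}
```

Under these assumptions only one of the 65536 input pairs needs the clamp: -128 * -128, which would otherwise yield +128.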

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,146 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_negate_f32.c    
+*    
+* Description:	Negates floating-point vectors.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @defgroup negate Vector Negate        
+ *        
+ * Negates the elements of a vector.        
+ *        
+ * <pre>        
+ *     pDst[n] = -pSrc[n],   0 <= n < blockSize.        
+ * </pre>        
+ *
+ * The functions support in-place computation allowing the source and
+ * destination pointers to reference the same memory buffer.
+ * There are separate functions for floating-point, Q7, Q15, and Q31 data types.
+ */
+
+/**        
+ * @addtogroup negate        
+ * @{        
+ */
+
+/**        
+ * @brief  Negates the elements of a floating-point vector.        
+ * @param[in]  *pSrc points to the input vector        
+ * @param[out]  *pDst points to the output vector        
+ * @param[in]  blockSize number of samples in the vector        
+ * @return none.        
+ */
+
+void arm_negate_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t in1, in2, in3, in4;                  /* temporary variables */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* read inputs from source */
+    in1 = *pSrc;
+    in2 = *(pSrc + 1);
+    in3 = *(pSrc + 2);
+    in4 = *(pSrc + 3);
+
+    /* negate the input */
+    in1 = -in1;
+    in2 = -in2;
+    in3 = -in3;
+    in4 = -in4;
+
+    /* store the result to destination */
+    *pDst = in1;
+    *(pDst + 1) = in2;
+    *(pDst + 2) = in3;
+    *(pDst + 3) = in4;
+
+    /* update pointers to process next samples */
+    pSrc += 4u;
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = -A */
+    /* Negate and then store the results in the destination buffer. */
+    *pDst++ = -*pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of negate group        
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,142 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_negate_q15.c    
+*    
+* Description:	Negates Q15 vectors.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @addtogroup negate        
+ * @{        
+ */
+
+/**        
+ * @brief  Negates the elements of a Q15 vector.        
+ * @param[in]  *pSrc points to the input vector        
+ * @param[out]  *pDst points to the output vector        
+ * @param[in]  blockSize number of samples in the vector        
+ * @return none.        
+ *    
+ * \par Conditions for optimum performance    
+ *  Input and output buffers should be 32-bit aligned.
+ *    
+ *        
+ * <b>Scaling and Overflow Behavior:</b>        
+ * \par        
+ * The function uses saturating arithmetic.        
+ * The Q15 value -1 (0x8000) will be saturated to the maximum allowable positive value 0x7FFF.        
+ */
+
+void arm_negate_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+  q15_t in;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t in1, in2;                                /* Temporary variables */
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = -A */
+    /* Read two inputs at a time */
+    in1 = _SIMD32_OFFSET(pSrc);
+    in2 = _SIMD32_OFFSET(pSrc + 2);
+
+    /* negate two samples at a time */
+    in1 = __QSUB16(0, in1);
+
+    /* negate two samples at a time */
+    in2 = __QSUB16(0, in2);
+
+    /* store the result to destination 2 samples at a time */
+    _SIMD32_OFFSET(pDst) = in1;
+    /* store the result to destination 2 samples at a time */
+    _SIMD32_OFFSET(pDst + 2) = in2;
+
+
+    /* update pointers to process next samples */
+    pSrc += 4u;
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = -A */
+    /* Negate and then store the result in the destination buffer. */
+    in = *pSrc++;
+    *pDst++ = (in == (q15_t) 0x8000) ? 0x7fff : -in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of negate group        
+ */
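
The saturation rule in the arm_negate_q15.c comment above (0x8000 becomes 0x7FFF) can be checked on a host machine without the Cortex-M intrinsics. The following is a minimal sketch in plain C99, not the library's implementation; `negate_q15_sat` and `negate_q15_ref` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CMSIS-DSP: negate one Q15 sample.
 * INT16_MIN (0x8000, Q15 -1.0) has no positive counterpart, so it is
 * clamped to INT16_MAX (0x7FFF) instead of wrapping back to itself. */
static int16_t negate_q15_sat(int16_t x)
{
    return (x == INT16_MIN) ? INT16_MAX : (int16_t)(-x);
}

/* Hypothetical reference loop: dst[i] = -src[i] with saturation.
 * src and dst may point to the same buffer (in-place operation). */
static void negate_q15_ref(const int16_t *src, int16_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = negate_q15_sat(src[i]);
    }
}
```

Calling `negate_q15_ref(buf, buf, n)` mirrors the in-place use that the negate group documentation allows.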

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,129 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_negate_q31.c    
+*    
+* Description:	Negates Q31 vectors.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup negate    
+ * @{    
+ */
+
+/**    
+ * @brief  Negates the elements of a Q31 vector.    
+ * @param[in]  *pSrc points to the input vector    
+ * @param[out]  *pDst points to the output vector    
+ * @param[in]  blockSize number of samples in the vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * The Q31 value -1 (0x80000000) will be saturated to the maximum allowable positive value 0x7FFFFFFF.    
+ */
+
+void arm_negate_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t in;                                      /* Temporary variable */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2, in3, in4;
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = -A */
+    /* Negate and then store the results in the destination buffer. */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+    in3 = *pSrc++;
+    in4 = *pSrc++;
+
+    *pDst++ = __QSUB(0, in1);
+    *pDst++ = __QSUB(0, in2);
+    *pDst++ = __QSUB(0, in3);
+    *pDst++ = __QSUB(0, in4);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = -A */
+    /* Negate and then store the result in the destination buffer. */
+    in = *pSrc++;
+    *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of negate group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_negate_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,125 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_negate_q7.c    
+*    
+* Description:	Negates Q7 vectors.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup negate    
+ * @{    
+ */
+
+/**    
+ * @brief  Negates the elements of a Q7 vector.    
+ * @param[in]  *pSrc points to the input vector    
+ * @param[out]  *pDst points to the output vector    
+ * @param[in]  blockSize number of samples in the vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * The Q7 value -1 (0x80) will be saturated to the maximum allowable positive value 0x7F.    
+ */
+
+void arm_negate_q7(
+  q7_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+  q7_t in;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t input;                                   /* Input values 1-4, packed in one word */
+  q31_t zero = 0x00000000;
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = -A */
+    /* Read four inputs */
+    input = *__SIMD32(pSrc)++;
+
+    /* Store the Negated results in the destination buffer in a single cycle by packing the results */
+    *__SIMD32(pDst)++ = __QSUB8(zero, input);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = -A */
+    /* Negate and then store the results in the destination buffer. */
+    in = *pSrc++;
+    *pDst++ = (in == (q7_t) 0x80) ? 0x7f : -in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of negate group    
+ */
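
The unrolled path in arm_negate_q7.c negates four Q7 samples per iteration by subtracting a packed word from zero with `__QSUB8`. The sketch below emulates that lane-wise saturating subtract in portable C so the behavior can be inspected on a host. `qsub8_emul` and `lane_to_signed` are hypothetical names; this is an illustration of the arithmetic under that assumption, not the intrinsic itself.

```c
#include <stdint.h>

/* Interpret the low 8 bits of v as a signed two's-complement value. */
static int32_t lane_to_signed(uint32_t v)
{
    uint32_t u = v & 0xFFu;
    return (u & 0x80u) ? (int32_t)u - 256 : (int32_t)u;
}

/* Hypothetical stand-in for __QSUB8: four independent signed 8-bit
 * subtractions (a - b), each saturated to the range [-128, 127]. */
static uint32_t qsub8_emul(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        int32_t d = lane_to_signed(a >> (8 * lane))
                  - lane_to_signed(b >> (8 * lane));
        if (d > INT8_MAX) d = INT8_MAX;
        if (d < INT8_MIN) d = INT8_MIN;
        /* Conversion to uint8_t wraps modulo 256, giving the lane bits. */
        out |= (uint32_t)(uint8_t)d << (8 * lane);
    }
    return out;
}
```

With `a` equal to zero, each lane becomes the negated input, and a lane holding 0x80 saturates to 0x7F, which matches the scalar tail loop.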

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,165 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_offset_f32.c    
+*    
+* Description:	Floating-point vector offset.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @defgroup offset Vector Offset        
+ *        
+ * Adds a constant offset to each element of a vector.        
+ *        
+ * <pre>        
+ *     pDst[n] = pSrc[n] + offset,   0 <= n < blockSize.        
+ * </pre>        
+ *        
+ * The functions support in-place computation allowing the source and
+ * destination pointers to reference the same memory buffer.
+ * There are separate functions for floating-point, Q7, Q15, and Q31 data types.
+ */
+
+/**        
+ * @addtogroup offset        
+ * @{        
+ */
+
+/**        
+ * @brief  Adds a constant offset to a floating-point vector.        
+ * @param[in]  *pSrc points to the input vector        
+ * @param[in]  offset is the offset to be added        
+ * @param[out]  *pDst points to the output vector        
+ * @param[in]  blockSize number of samples in the vector        
+ * @return none.        
+ */
+
+
+void arm_offset_f32(
+  float32_t * pSrc,
+  float32_t offset,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t in1, in2, in3, in4;
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the results in the destination buffer. */
+    /* read samples from source */
+    in1 = *pSrc;
+    in2 = *(pSrc + 1);
+
+    /* add offset to input */
+    in1 = in1 + offset;
+
+    /* read samples from source */
+    in3 = *(pSrc + 2);
+
+    /* add offset to input */
+    in2 = in2 + offset;
+
+    /* read samples from source */
+    in4 = *(pSrc + 3);
+
+    /* add offset to input */
+    in3 = in3 + offset;
+
+    /* store result to destination */
+    *pDst = in1;
+
+    /* add offset to input */
+    in4 = in4 + offset;
+
+    /* store result to destination */
+    *(pDst + 1) = in2;
+
+    /* store result to destination */
+    *(pDst + 2) = in3;
+
+    /* store result to destination */
+    *(pDst + 3) = in4;
+
+    /* update pointers to process next samples */
+    pSrc += 4u;
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the result in the destination buffer. */
+    *pDst++ = (*pSrc++) + offset;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of offset group        
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,136 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_offset_q15.c    
+*    
+* Description:	Q15 vector offset.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup offset    
+ * @{    
+ */
+
+/**    
+ * @brief  Adds a constant offset to a Q15 vector.    
+ * @param[in]  *pSrc points to the input vector    
+ * @param[in]  offset is the offset to be added    
+ * @param[out]  *pDst points to the output vector    
+ * @param[in]  blockSize number of samples in the vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.    
+ */
+
+void arm_offset_q15(
+  q15_t * pSrc,
+  q15_t offset,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t offset_packed;                           /* Offset packed to 32 bit */
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* Offset is packed to 32 bit in order to use SIMD32 for addition */
+  offset_packed = __PKHBT(offset, offset, 16);
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the results in the destination buffer, 2 samples at a time. */
+    *__SIMD32(pDst)++ = __QADD16(*__SIMD32(pSrc)++, offset_packed);
+    *__SIMD32(pDst)++ = __QADD16(*__SIMD32(pSrc)++, offset_packed);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the results in the destination buffer. */
+    *pDst++ = (q15_t) __QADD16(*pSrc++, offset);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the results in the destination buffer. */
+    *pDst++ = (q15_t) __SSAT(((q31_t) * pSrc++ + offset), 16);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of offset group    
+ */
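
In arm_offset_q15.c the offset is duplicated into both halfwords of a 32-bit word so that one `__QADD16` handles two samples. A portable sketch of that idea follows; `pack_q15_pair`, `half_to_signed`, and `qadd16_emul` are hypothetical helpers written for this note, and the emulation only approximates what the intrinsics do on a DSP-capable Cortex-M core.

```c
#include <stdint.h>

/* Hypothetical helper: place lo in bits 15..0 and hi in bits 31..16. */
static uint32_t pack_q15_pair(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Interpret the low 16 bits of v as a signed two's-complement value. */
static int32_t half_to_signed(uint32_t v)
{
    uint32_t u = v & 0xFFFFu;
    return (u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
}

/* Hypothetical stand-in for __QADD16: two independent signed 16-bit
 * additions, each saturated to the Q15 range [-32768, 32767]. */
static uint32_t qadd16_emul(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t s = half_to_signed(a >> (16 * lane))
                  + half_to_signed(b >> (16 * lane));
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out |= (uint32_t)(uint16_t)s << (16 * lane);
    }
    return out;
}
```

Packing the offset once outside the loop, as the source does, keeps the per-iteration work to the loads, the saturating adds, and the stores.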

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,140 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_offset_q31.c    
+*    
+* Description:	Q31 vector offset.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup offset    
+ * @{    
+ */
+
+/**    
+ * @brief  Adds a constant offset to a Q31 vector.    
+ * @param[in]  *pSrc points to the input vector    
+ * @param[in]  offset is the offset to be added    
+ * @param[out]  *pDst points to the output vector    
+ * @param[in]  blockSize number of samples in the vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.    
+ */
+
+void arm_offset_q31(
+  q31_t * pSrc,
+  q31_t offset,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2, in3, in4;
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the results in the destination buffer. */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+    in3 = *pSrc++;
+    in4 = *pSrc++;
+
+    *pDst++ = __QADD(in1, offset);
+    *pDst++ = __QADD(in2, offset);
+    *pDst++ = __QADD(in3, offset);
+    *pDst++ = __QADD(in4, offset);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the result in the destination buffer. */
+    *pDst++ = __QADD(*pSrc++, offset);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the result in the destination buffer. */
+    *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of offset group    
+ */
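
The Cortex-M0 branch of arm_offset_q31.c widens each sample to 64 bits before adding the offset and then clips the sum back to 32 bits. The sketch below shows the same widen-then-clip pattern in plain C; `clip_to_q31` and `offset_q31_sat` are hypothetical names that stand in for the library's own clipping helper, whose definition is not shown in this diff.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a 64-bit intermediate to the Q31 range. */
static int32_t clip_to_q31(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Hypothetical helper: saturating x + offset. The addition is done in
 * 64 bits, so the intermediate sum cannot overflow before clipping. */
static int32_t offset_q31_sat(int32_t x, int32_t offset)
{
    return clip_to_q31((int64_t)x + offset);
}
```

On cores with the saturating add instruction the source uses `__QADD` instead, which should give the same results without the 64-bit intermediate.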

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_offset_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,135 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_offset_q7.c    
+*    
+* Description:	Q7 vector offset.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup offset    
+ * @{    
+ */
+
+/**    
+ * @brief  Adds a constant offset to a Q7 vector.    
+ * @param[in]  *pSrc points to the input vector    
+ * @param[in]  offset is the offset to be added    
+ * @param[out]  *pDst points to the output vector    
+ * @param[in]  blockSize number of samples in the vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q7 range [0x80 0x7F] are saturated.    
+ */
+
+void arm_offset_q7(
+  q7_t * pSrc,
+  q7_t offset,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t offset_packed;                           /* Offset packed to 32 bit */
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* Offset is packed to 32 bit in order to use SIMD32 for addition */
+  offset_packed = __PACKq7(offset, offset, offset, offset);
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the results in the destination buffer for 4 samples at a time. */
+    *__SIMD32(pDst)++ = __QADD8(*__SIMD32(pSrc)++, offset_packed);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the result in the destination buffer. */
+    *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A + offset */
+    /* Add offset and then store the result in the destination buffer. */
+    *pDst++ = (q7_t) __SSAT((q15_t) * pSrc++ + offset, 8);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of offset group    
+ */
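
The saturation rule documented for arm_offset_q7 can be illustrated without the Cortex-M intrinsics. What follows is a minimal reference sketch in portable C, not the library's implementation: the names sat_q7 and offset_q7_ref are hypothetical, and the loop processes one sample at a time instead of four packed lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a wider value to the Q7 range [-128, 127]. */
static int8_t sat_q7(int32_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Hypothetical reference model: dst[i] = saturate(src[i] + offset). */
static void offset_q7_ref(const int8_t *src, int8_t offset,
                          int8_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Widen before adding so the sum cannot wrap in 8 bits. */
        dst[i] = sat_q7((int32_t)src[i] + offset);
    }
}
```

With an offset of 64, an input of 100 clamps to 127 rather than wrapping to a negative value, which is the behaviour the doc comment describes for results outside [0x80 0x7F].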

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,169 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_scale_f32.c    
+*    
+* Description:	Multiplies a floating-point vector by a scalar.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @defgroup scale Vector Scale        
+ *        
+ * Multiply a vector by a scalar value.  For floating-point data, the algorithm used is:        
+ *        
+ * <pre>        
+ *     pDst[n] = pSrc[n] * scale,   0 <= n < blockSize.        
+ * </pre>        
+ *        
+ * In the fixed-point Q7, Q15, and Q31 functions, <code>scale</code> is represented by        
+ * a fractional multiplication <code>scaleFract</code> and an arithmetic shift <code>shift</code>.        
+ * The shift allows the gain of the scaling operation to exceed 1.0.        
+ * The algorithm used with fixed-point data is:        
+ *        
+ * <pre>        
+ *     pDst[n] = (pSrc[n] * scaleFract) << shift,   0 <= n < blockSize.        
+ * </pre>        
+ *        
+ * The overall scale factor applied to the fixed-point data is        
+ * <pre>        
+ *     scale = scaleFract * 2^shift.        
+ * </pre>        
+ *
+ * The functions support in-place computation allowing the source and destination
+ * pointers to reference the same memory buffer.
+ */
+
+/**        
+ * @addtogroup scale        
+ * @{        
+ */
+
+/**        
+ * @brief Multiplies a floating-point vector by a scalar.        
+ * @param[in]       *pSrc points to the input vector        
+ * @param[in]       scale scale factor to be applied        
+ * @param[out]      *pDst points to the output vector        
+ * @param[in]       blockSize number of samples in the vector        
+ * @return none.        
+ */
+
+
+void arm_scale_f32(
+  float32_t * pSrc,
+  float32_t scale,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t in1, in2, in3, in4;                  /* temporary variables */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A * scale */
+    /* Scale the input and then store the results in the destination buffer. */
+    /* read input samples from source */
+    in1 = *pSrc;
+    in2 = *(pSrc + 1);
+
+    /* multiply with scaling factor */
+    in1 = in1 * scale;
+
+    /* read input sample from source */
+    in3 = *(pSrc + 2);
+
+    /* multiply with scaling factor */
+    in2 = in2 * scale;
+
+    /* read input sample from source */
+    in4 = *(pSrc + 3);
+
+    /* multiply with scaling factor */
+    in3 = in3 * scale;
+    in4 = in4 * scale;
+    /* store the result to destination */
+    *pDst = in1;
+    *(pDst + 1) = in2;
+    *(pDst + 2) = in3;
+    *(pDst + 3) = in4;
+
+    /* update pointers to process next samples */
+    pSrc += 4u;
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * scale */
+    /* Scale the input and then store the result in the destination buffer. */
+    *pDst++ = (*pSrc++) * scale;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of scale group        
+ */
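
The "Vector Scale" group comment states that the fixed-point functions apply scale = scaleFract * 2^shift. One way to choose the two parameters for a Q15 call is sketched below. This is an illustration only: q15_gain_t and q15_gain_from_float are hypothetical names that do not exist in CMSIS-DSP, and the sketch assumes a positive gain below 32768.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two arguments of a Q15 scale call. */
typedef struct {
    int16_t scaleFract;  /* fractional part in 1.15 format */
    int8_t  shift;       /* power-of-two exponent */
} q15_gain_t;

/* Split a positive gain into scaleFract in [0.5, 1.0) and a shift,
 * so that gain is approximately scaleFract * 2^shift. */
static q15_gain_t q15_gain_from_float(float gain)
{
    assert(gain >= 1.0f / 32768.0f && gain < 32768.0f);

    int8_t shift = 0;
    while (gain >= 1.0f) { gain *= 0.5f; shift++; }  /* normalise down */
    while (gain < 0.5f)  { gain *= 2.0f; shift--; }  /* normalise up */

    /* Round to the nearest 1.15 value, clamping just below 1.0. */
    int32_t q = (int32_t)(gain * 32768.0f + 0.5f);
    if (q > 32767) q = 32767;

    q15_gain_t g = { (int16_t)q, shift };
    return g;
}
```

For a gain of 2.5 this yields scaleFract = 20480 (0.625 in 1.15 format) and shift = 2, since 0.625 * 2^2 = 2.5. Keeping the fraction in [0.5, 1.0) preserves the most precision in the 16-bit multiplier.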

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,162 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_scale_q15.c    
+*    
+* Description:	Multiplies a Q15 vector by a scalar.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup scale    
+ * @{    
+ */
+
+/**    
+ * @brief Multiplies a Q15 vector by a scalar.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       scaleFract fractional portion of the scale value    
+ * @param[in]       shift number of bits to shift the result by    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in the vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The input data <code>*pSrc</code> and <code>scaleFract</code> are in 1.15 format.    
+ * These are multiplied to yield a 2.30 intermediate result and this is shifted with saturation to 1.15 format.    
+ */
+
+
+void arm_scale_q15(
+  q15_t * pSrc,
+  q15_t scaleFract,
+  int8_t shift,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  int8_t kShift = 15 - shift;                    /* shift to apply after scaling */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q15_t in1, in2, in3, in4;
+  q31_t inA1, inA2;                              /* Temporary variables */
+  q31_t out1, out2, out3, out4;
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Reading 4 inputs from memory as two packed 32-bit words */
+    inA1 = *__SIMD32(pSrc)++;
+    inA2 = *__SIMD32(pSrc)++;
+
+    /* C = A * scale */
+    /* Scale the inputs and then store the 2 results in the destination buffer        
+     * in single cycle by packing the outputs */
+    out1 = (q31_t) ((q15_t) (inA1 >> 16) * scaleFract);
+    out2 = (q31_t) ((q15_t) inA1 * scaleFract);
+    out3 = (q31_t) ((q15_t) (inA2 >> 16) * scaleFract);
+    out4 = (q31_t) ((q15_t) inA2 * scaleFract);
+
+    /* apply shifting */
+    out1 = out1 >> kShift;
+    out2 = out2 >> kShift;
+    out3 = out3 >> kShift;
+    out4 = out4 >> kShift;
+
+    /* saturate the output */
+    in1 = (q15_t) (__SSAT(out1, 16));
+    in2 = (q15_t) (__SSAT(out2, 16));
+    in3 = (q15_t) (__SSAT(out3, 16));
+    in4 = (q15_t) (__SSAT(out4, 16));
+
+    /* store the result to destination */
+    *__SIMD32(pDst)++ = __PKHBT(in2, in1, 16);
+    *__SIMD32(pDst)++ = __PKHBT(in4, in3, 16);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * scale */
+    /* Scale the input and then store the result in the destination buffer. */
+    *pDst++ = (q15_t) (__SSAT(((*pSrc++) * scaleFract) >> kShift, 16));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * scale */
+    /* Scale the input and then store the result in the destination buffer. */
+    *pDst++ = (q15_t) (__SSAT(((q31_t) * pSrc++ * scaleFract) >> kShift, 16));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of scale group    
+ */
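
The Q15 arithmetic above (a 2.30 product shifted right by kShift = 15 - shift, then saturated to 16 bits) can be checked on a host with a plain C model. The sketch below is an assumption-labelled reference, not the library code: sat_q15 and scale_q15_ref are hypothetical names, and like the original it relies on the compiler implementing right shift of negative values as an arithmetic shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp to the Q15 range [-32768, 32767]. */
static int16_t sat_q15(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical reference model of a Q15 vector scale. */
static void scale_q15_ref(const int16_t *src, int16_t scaleFract,
                          int8_t shift, int16_t *dst, size_t n)
{
    /* Keep the post-multiply shift inside 0..31. */
    assert(shift >= -16 && shift <= 15);
    const int kShift = 15 - shift;

    for (size_t i = 0; i < n; i++) {
        /* 1.15 * 1.15 gives a 2.30 product in 32 bits. */
        int32_t prod = (int32_t)src[i] * scaleFract;
        /* Assumes arithmetic right shift for negative products. */
        dst[i] = sat_q15(prod >> kShift);
    }
}
```

Scaling 0.5 (16384) by scaleFract = 16384 with shift = 0 gives 8192, i.e. 0.25. With scaleFract = 20480 and shift = 2 (a gain of 2.5) the same input saturates at 32767.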

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,239 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_scale_q31.c    
+*    
+* Description:	Multiplies a Q31 vector by a scalar.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**       
+ * @ingroup groupMath       
+ */
+
+/**       
+ * @addtogroup scale       
+ * @{       
+ */
+
+/**       
+ * @brief Multiplies a Q31 vector by a scalar.       
+ * @param[in]       *pSrc points to the input vector       
+ * @param[in]       scaleFract fractional portion of the scale value       
+ * @param[in]       shift number of bits to shift the result by       
+ * @param[out]      *pDst points to the output vector       
+ * @param[in]       blockSize number of samples in the vector       
+ * @return none.       
+ *       
+ * <b>Scaling and Overflow Behavior:</b>       
+ * \par       
+ * The input data <code>*pSrc</code> and <code>scaleFract</code> are in 1.31 format.       
+ * These are multiplied to yield a 2.62 intermediate result and this is shifted with saturation to 1.31 format.       
+ */
+
+void arm_scale_q31(
+  q31_t * pSrc,
+  q31_t scaleFract,
+  int8_t shift,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  int8_t kShift = shift + 1;                     /* Shift to apply after scaling */
+  int8_t sign = (kShift & 0x80);
+  uint32_t blkCnt;                               /* loop counter */
+  q31_t in, out;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t in1, in2, in3, in4;                      /* temporary input variables */
+  q31_t out1, out2, out3, out4;                  /* temporary output variables */
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  if(sign == 0u)
+  {
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.       
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* read four inputs from source */
+      in1 = *pSrc;
+      in2 = *(pSrc + 1);
+      in3 = *(pSrc + 2);
+      in4 = *(pSrc + 3);
+
+      /* multiply input with scalar value */
+      in1 = ((q63_t) in1 * scaleFract) >> 32;
+      in2 = ((q63_t) in2 * scaleFract) >> 32;
+      in3 = ((q63_t) in3 * scaleFract) >> 32;
+      in4 = ((q63_t) in4 * scaleFract) >> 32;
+
+      /* apply shifting */
+      out1 = in1 << kShift;
+      out2 = in2 << kShift;
+
+      /* saturate the results. */
+      if(in1 != (out1 >> kShift))
+        out1 = 0x7FFFFFFF ^ (in1 >> 31);
+
+      if(in2 != (out2 >> kShift))
+        out2 = 0x7FFFFFFF ^ (in2 >> 31);
+
+      out3 = in3 << kShift;
+      out4 = in4 << kShift;
+
+      *pDst = out1;
+      *(pDst + 1) = out2;
+
+      if(in3 != (out3 >> kShift))
+        out3 = 0x7FFFFFFF ^ (in3 >> 31);
+
+      if(in4 != (out4 >> kShift))
+        out4 = 0x7FFFFFFF ^ (in4 >> 31);
+
+      /* Store result to destination */
+      *(pDst + 2) = out3;
+      *(pDst + 3) = out4;
+
+      /* Update pointers to process next samples */
+      pSrc += 4u;
+      pDst += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+  }
+  else
+  {
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.       
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* read four inputs from source */
+      in1 = *pSrc;
+      in2 = *(pSrc + 1);
+      in3 = *(pSrc + 2);
+      in4 = *(pSrc + 3);
+
+      /* multiply input with scalar value */
+      in1 = ((q63_t) in1 * scaleFract) >> 32;
+      in2 = ((q63_t) in2 * scaleFract) >> 32;
+      in3 = ((q63_t) in3 * scaleFract) >> 32;
+      in4 = ((q63_t) in4 * scaleFract) >> 32;
+
+      /* apply shifting */
+      out1 = in1 >> -kShift;
+      out2 = in2 >> -kShift;
+
+      out3 = in3 >> -kShift;
+      out4 = in4 >> -kShift;
+
+      /* Store result to destination */
+      *pDst = out1;
+      *(pDst + 1) = out2;
+
+      *(pDst + 2) = out3;
+      *(pDst + 3) = out4;
+
+      /* Update pointers to process next samples */
+      pSrc += 4u;
+      pDst += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.       
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  if(sign == 0)
+  {
+	  while(blkCnt > 0u)
+	  {
+		/* C = A * scale */
+		/* Scale the input and then store the result in the destination buffer. */
+		in = *pSrc++;
+		in = ((q63_t) in * scaleFract) >> 32;
+
+		out = in << kShift;
+		
+		if(in != (out >> kShift))
+			out = 0x7FFFFFFF ^ (in >> 31);
+
+		*pDst++ = out;
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	  }
+  }
+  else
+  {
+	  while(blkCnt > 0u)
+	  {
+		/* C = A * scale */
+		/* Scale the input and then store the result in the destination buffer. */
+		in = *pSrc++;
+		in = ((q63_t) in * scaleFract) >> 32;
+
+		out = in >> -kShift;
+
+		*pDst++ = out;
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	  }
+  
+  }
+}
+
+/**       
+ * @} end of scale group       
+ */
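
The Q31 routine keeps the upper 32 bits of the 64-bit product and then shifts by kShift = shift + 1, saturating when a left shift would overflow. A single-sample model of that flow might look like the sketch below. It is a hedged illustration with a hypothetical name (scale_q31_one); it detects overflow by widening to 64 bits rather than by the shift-and-compare test used in the source, and it assumes arithmetic right shift of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-sample model of a Q31 scale. */
static int32_t scale_q31_one(int32_t in, int32_t scaleFract, int8_t shift)
{
    /* Keep the derived shift amount inside -31..31. */
    assert(shift >= -32 && shift <= 30);

    /* 1.31 * 1.31 gives a 2.62 product; keep the upper 32 bits (2.30). */
    int64_t prod = (int64_t)in * scaleFract;
    int32_t hi = (int32_t)(prod >> 32);

    int k = shift + 1;
    if (k >= 0) {
        /* Left shift in 64 bits so the C expression cannot overflow. */
        int64_t wide = (int64_t)hi * ((int64_t)1 << k);
        if (wide > INT32_MAX) return INT32_MAX;
        if (wide < INT32_MIN) return INT32_MIN;
        return (int32_t)wide;
    }

    /* Negative k: plain right shift, no saturation needed. */
    return hi >> -k;
}
```

The source picks its saturation limit with 0x7FFFFFFF ^ (in >> 31), which evaluates to 0x7FFFFFFF for non-negative inputs and 0x80000000 for negative ones. The explicit comparisons here reach the same two limits.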

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_scale_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,149 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_scale_q7.c    
+*    
+* Description:	Multiplies a Q7 vector by a scalar.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup scale    
+ * @{    
+ */
+
+/**    
+ * @brief Multiplies a Q7 vector by a scalar.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       scaleFract fractional portion of the scale value    
+ * @param[in]       shift number of bits to shift the result by    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in the vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The input data <code>*pSrc</code> and <code>scaleFract</code> are in 1.7 format.    
+ * These are multiplied to yield a 2.14 intermediate result and this is shifted with saturation to 1.7 format.    
+ */
+
+void arm_scale_q7(
+  q7_t * pSrc,
+  q7_t scaleFract,
+  int8_t shift,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  int8_t kShift = 7 - shift;                     /* shift to apply after scaling */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q7_t in1, in2, in3, in4, out1, out2, out3, out4;      /* Temporary variables to store input & output */
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Reading 4 inputs from memory */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+    in3 = *pSrc++;
+    in4 = *pSrc++;
+
+    /* C = A * scale */
+    /* Scale the inputs and then store the results in the temporary variables. */
+    out1 = (q7_t) (__SSAT(((in1) * scaleFract) >> kShift, 8));
+    out2 = (q7_t) (__SSAT(((in2) * scaleFract) >> kShift, 8));
+    out3 = (q7_t) (__SSAT(((in3) * scaleFract) >> kShift, 8));
+    out4 = (q7_t) (__SSAT(((in4) * scaleFract) >> kShift, 8));
+
+    /* Packing the individual outputs into 32bit and storing in    
+     * destination buffer in single write */
+    *__SIMD32(pDst)++ = __PACKq7(out1, out2, out3, out4);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * scale */
+    /* Scale the input and then store the result in the destination buffer. */
+    *pDst++ = (q7_t) (__SSAT(((*pSrc++) * scaleFract) >> kShift, 8));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A * scale */
+    /* Scale the input and then store the result in the destination buffer. */
+    *pDst++ = (q7_t) (__SSAT((((q15_t) * pSrc++ * scaleFract) >> kShift), 8));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of scale group    
+ */
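
The unrolled Q7 loop writes four results with one 32-bit store by packing them through __PACKq7. The definition of that macro is not part of this file, so the sketch below only assumes a little-endian layout in which the first sample occupies the lowest byte. pack_q7x4_le and unpack_q7_le are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packer: v0 lands in bits 7..0, v3 in bits 31..24. */
static uint32_t pack_q7x4_le(int8_t v0, int8_t v1, int8_t v2, int8_t v3)
{
    /* Cast through uint8_t first so sign extension cannot spill
     * into the neighbouring lanes. */
    return  (uint32_t)(uint8_t)v0
         | ((uint32_t)(uint8_t)v1 << 8)
         | ((uint32_t)(uint8_t)v2 << 16)
         | ((uint32_t)(uint8_t)v3 << 24);
}

/* Hypothetical inverse: extract lane 0..3 as a signed Q7 value. */
static int8_t unpack_q7_le(uint32_t word, unsigned lane)
{
    assert(lane < 4u);
    /* Assumes the usual two's-complement narrowing conversion. */
    return (int8_t)(uint8_t)(word >> (8u * lane));
}
```

Packing the samples 1, -1, 2, -128 under this assumption gives the word 0x8002FF01, and unpacking lane 1 returns -1.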

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,248 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_shift_q15.c    
+*    
+* Description:	Shifts the elements of a Q15 vector by a specified number of bits.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup shift    
+ * @{    
+ */
+
+/**    
+ * @brief  Shifts the elements of a Q15 vector a specified number of bits.    
+ * @param[in]  *pSrc points to the input vector    
+ * @param[in]  shiftBits number of bits to shift.  A positive value shifts left; a negative value shifts right.    
+ * @param[out]  *pDst points to the output vector    
+ * @param[in]  blockSize number of samples in the vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.    
+ */
+
+void arm_shift_q15(
+  q15_t * pSrc,
+  int8_t shiftBits,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+  uint8_t sign;                                  /* Sign of shiftBits */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q15_t in1, in2;                                /* Temporary variables */
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* Getting the sign of shiftBits */
+  sign = (shiftBits & 0x80);
+
+  /* If the shift value is positive then do left shift else right shift */
+  if(sign == 0u)
+  {
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* Read 2 inputs */
+      in1 = *pSrc++;
+      in2 = *pSrc++;
+      /* C = A << shiftBits */
+      /* Shift the inputs and then store the results in the destination buffer. */
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pDst)++ = __PKHBT(__SSAT((in1 << shiftBits), 16),
+                                  __SSAT((in2 << shiftBits), 16), 16);
+
+#else
+
+      *__SIMD32(pDst)++ = __PKHBT(__SSAT((in2 << shiftBits), 16),
+                                  __SSAT((in1 << shiftBits), 16), 16);
+
+#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      in1 = *pSrc++;
+      in2 = *pSrc++;
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pDst)++ = __PKHBT(__SSAT((in1 << shiftBits), 16),
+                                  __SSAT((in2 << shiftBits), 16), 16);
+
+#else
+
+      *__SIMD32(pDst)++ = __PKHBT(__SSAT((in2 << shiftBits), 16),
+                                  __SSAT((in1 << shiftBits), 16), 16);
+
+#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = blockSize % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* C = A << shiftBits */
+      /* Shift and then store the results in the destination buffer. */
+      *pDst++ = __SSAT((*pSrc++ << shiftBits), 16);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* Read 2 inputs */
+      in1 = *pSrc++;
+      in2 = *pSrc++;
+
+      /* C = A >> shiftBits */
+      /* Shift the inputs and then store the results in the destination buffer. */
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pDst)++ = __PKHBT((in1 >> -shiftBits),
+                                  (in2 >> -shiftBits), 16);
+
+#else
+
+      *__SIMD32(pDst)++ = __PKHBT((in2 >> -shiftBits),
+                                  (in1 >> -shiftBits), 16);
+
+#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      in1 = *pSrc++;
+      in2 = *pSrc++;
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pDst)++ = __PKHBT((in1 >> -shiftBits),
+                                  (in2 >> -shiftBits), 16);
+
+#else
+
+      *__SIMD32(pDst)++ = __PKHBT((in2 >> -shiftBits),
+                                  (in1 >> -shiftBits), 16);
+
+#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = blockSize % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* C = A >> shiftBits */
+      /* Shift the inputs and then store the results in the destination buffer. */
+      *pDst++ = (*pSrc++ >> -shiftBits);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Getting the sign of shiftBits */
+  sign = (shiftBits & 0x80);
+
+  /* If the shift value is non-negative then do left shift else right shift */
+  if(sign == 0u)
+  {
+    /* Initialize blkCnt with number of samples */
+    blkCnt = blockSize;
+
+    while(blkCnt > 0u)
+    {
+      /* C = A << shiftBits */
+      /* Shift and then store the results in the destination buffer. */
+      *pDst++ = __SSAT(((q31_t) * pSrc++ << shiftBits), 16);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* Initialize blkCnt with number of samples */
+    blkCnt = blockSize;
+
+    while(blkCnt > 0u)
+    {
+      /* C = A >> shiftBits */
+      /* Shift the inputs and then store the results in the destination buffer. */
+      *pDst++ = (*pSrc++ >> -shiftBits);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of shift group    
+ */
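
The Q15 shift above can be checked against a plain-C reference model. The sketch below is not part of this changeset or of CMSIS-DSP: `ref_sat_q15` and `ref_shift_q15` are hypothetical names, and the assumed valid range of `shiftBits` (-15 to 15) is an assumption of this example. It multiplies instead of left-shifting so that negative inputs stay well defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit value to the Q15 range [0x8000 0x7FFF]. */
static int16_t ref_sat_q15(int32_t x)
{
  if (x > INT16_MAX) return INT16_MAX;
  if (x < INT16_MIN) return INT16_MIN;
  return (int16_t) x;
}

/* Hypothetical reference model: positive shiftBits shifts left with
 * saturation, negative shiftBits performs an arithmetic right shift. */
static int16_t ref_shift_q15(int16_t x, int8_t shiftBits)
{
  int32_t wide = x;

  /* Assumption of this sketch: the shift amount stays within the word size. */
  assert(shiftBits > -16 && shiftBits < 16);

  if (shiftBits >= 0)
  {
    /* x * 2^shiftBits fits in 32 bits for every Q15 input. */
    return ref_sat_q15(wide * ((int32_t) 1 << shiftBits));
  }

  /* Arithmetic right shift written without shifting a negative value. */
  if (wide < 0)
  {
    return (int16_t) ~(~wide >> -shiftBits);
  }
  return (int16_t) (wide >> -shiftBits);
}
```

Comparing `ref_shift_q15` element by element with the output of `arm_shift_q15` is one way to test both the unrolled loop and the tail loop.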

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,203 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_shift_q31.c    
+*    
+* Description:	Shifts the elements of a Q31 vector by a specified number of bits.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+/**        
+ * @defgroup shift Vector Shift        
+ *        
+ * Shifts the elements of a fixed-point vector by a specified number of bits.        
+ * There are separate functions for Q7, Q15, and Q31 data types.        
+ * The underlying algorithm used is:        
+ *        
+ * <pre>        
+ *     pDst[n] = pSrc[n] << shift,   0 <= n < blockSize.        
+ * </pre>        
+ *        
+ * If <code>shift</code> is positive then the elements of the vector are shifted to the left.        
+ * If <code>shift</code> is negative then the elements of the vector are shifted to the right.        
+ *
+ * The functions support in-place computation allowing the source and destination
+ * pointers to reference the same memory buffer.
+ */
+
+/**        
+ * @addtogroup shift        
+ * @{        
+ */
+
+/**        
+ * @brief  Shifts the elements of a Q31 vector by a specified number of bits.
+ * @param[in]  *pSrc points to the input vector        
+ * @param[in]  shiftBits number of bits to shift.  A positive value shifts left; a negative value shifts right.        
+ * @param[out]  *pDst points to the output vector        
+ * @param[in]  blockSize number of samples in the vector        
+ * @return none.        
+ *        
+ *        
+ * <b>Scaling and Overflow Behavior:</b>        
+ * \par        
+ * The function uses saturating arithmetic.        
+ * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] will be saturated.        
+ */
+
+void arm_shift_q31(
+  q31_t * pSrc,
+  int8_t shiftBits,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+  uint8_t sign = (shiftBits & 0x80);             /* Sign of shiftBits */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  q31_t in1, in2, in3, in4;                      /* Temporary input variables */
+  q31_t out1, out2, out3, out4;                  /* Temporary output variables */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+
+  if(sign == 0u)
+  {
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C = A  << shiftBits */
+      /* Shift the input and then store the results in the destination buffer. */
+      in1 = *pSrc;
+      in2 = *(pSrc + 1);
+      out1 = in1 << shiftBits;
+      in3 = *(pSrc + 2);
+      out2 = in2 << shiftBits;
+      in4 = *(pSrc + 3);
+      if(in1 != (out1 >> shiftBits))
+        out1 = 0x7FFFFFFF ^ (in1 >> 31);
+
+      if(in2 != (out2 >> shiftBits))
+        out2 = 0x7FFFFFFF ^ (in2 >> 31);
+
+      *pDst = out1;
+      out3 = in3 << shiftBits;
+      *(pDst + 1) = out2;
+      out4 = in4 << shiftBits;
+
+      if(in3 != (out3 >> shiftBits))
+        out3 = 0x7FFFFFFF ^ (in3 >> 31);
+
+      if(in4 != (out4 >> shiftBits))
+        out4 = 0x7FFFFFFF ^ (in4 >> 31);
+
+      *(pDst + 2) = out3;
+      *(pDst + 3) = out4;
+
+      /* Update source and destination pointers to process next samples */
+      pSrc += 4u;
+      pDst += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C = A >>  shiftBits */
+      /* Shift the input and then store the results in the destination buffer. */
+      in1 = *pSrc;
+      in2 = *(pSrc + 1);
+      in3 = *(pSrc + 2);
+      in4 = *(pSrc + 3);
+
+      *pDst = (in1 >> -shiftBits);
+      *(pDst + 1) = (in2 >> -shiftBits);
+      *(pDst + 2) = (in3 >> -shiftBits);
+      *(pDst + 3) = (in4 >> -shiftBits);
+
+
+      pSrc += 4u;
+      pDst += 4u;
+
+      blkCnt--;
+    }
+
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = A (>> or <<) shiftBits */
+    /* Shift the input and then store the result in the destination buffer. */
+    *pDst++ = (sign == 0u) ? clip_q63_to_q31((q63_t) * pSrc++ << shiftBits) :
+      (*pSrc++ >> -shiftBits);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+}
+
+/**        
+ * @} end of shift group        
+ */
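
The unrolled Q31 loop above detects overflow by shifting the result back and comparing it with the input, then saturates with `0x7FFFFFFF ^ (in >> 31)`. As a cross-check, the same result can be sketched with a 64-bit intermediate, which is closer to what the tail loop does through `clip_q63_to_q31`. The function name `ref_shift_left_q31` is hypothetical and the accepted shift range (0 to 31) is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of a saturating left shift on a Q31 value. */
static int32_t ref_shift_left_q31(int32_t in, int8_t shiftBits)
{
  int64_t wide;

  /* Assumption of this sketch: only left shifts below the word size. */
  assert(shiftBits >= 0 && shiftBits < 32);

  /* Multiply rather than shift so negative inputs are well defined;
   * |in| * 2^31 still fits in 64 bits. */
  wide = (int64_t) in * ((int64_t) 1 << shiftBits);

  if (wide > INT32_MAX || wide < INT32_MIN)
  {
    /* Same outcome as 0x7FFFFFFF ^ (in >> 31): the most positive value
     * for non-negative inputs, the most negative value otherwise. */
    return (in < 0) ? INT32_MIN : INT32_MAX;
  }

  return (int32_t) wide;
}
```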

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_shift_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,220 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_shift_q7.c    
+*    
+* Description:	Shifts the elements of a Q7 vector by a specified number of bits.
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @addtogroup shift        
+ * @{        
+ */
+
+
+/**        
+ * @brief  Shifts the elements of a Q7 vector by a specified number of bits.
+ * @param[in]  *pSrc points to the input vector        
+ * @param[in]  shiftBits number of bits to shift.  A positive value shifts left; a negative value shifts right.        
+ * @param[out]  *pDst points to the output vector        
+ * @param[in]  blockSize number of samples in the vector        
+ * @return none.        
+ *    
+ * \par Conditions for optimum performance    
+ *  Input and output buffers should be 32-bit aligned
+ *    
+ *        
+ * <b>Scaling and Overflow Behavior:</b>        
+ * \par        
+ * The function uses saturating arithmetic.        
+ * Results outside of the allowable Q7 range [0x80 0x7F] will be saturated.
+ */
+
+void arm_shift_q7(
+  q7_t * pSrc,
+  int8_t shiftBits,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+  uint8_t sign;                                  /* Sign of shiftBits */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q7_t in1;                                      /* Input value1 */
+  q7_t in2;                                      /* Input value2 */
+  q7_t in3;                                      /* Input value3 */
+  q7_t in4;                                      /* Input value4 */
+
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* Getting the sign of shiftBits */
+  sign = (shiftBits & 0x80);
+
+  /* If the shift value is non-negative then do left shift else right shift */
+  if(sign == 0u)
+  {
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C = A << shiftBits */
+      /* Read 4 inputs */
+      in1 = *pSrc;
+      in2 = *(pSrc + 1);
+      in3 = *(pSrc + 2);
+      in4 = *(pSrc + 3);
+
+      /* Store the Shifted result in the destination buffer in single cycle by packing the outputs */
+      *__SIMD32(pDst)++ = __PACKq7(__SSAT((in1 << shiftBits), 8),
+                                   __SSAT((in2 << shiftBits), 8),
+                                   __SSAT((in3 << shiftBits), 8),
+                                   __SSAT((in4 << shiftBits), 8));
+      /* Update source pointer to process next samples */
+      pSrc += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+     ** No loop unrolling is used. */
+    blkCnt = blockSize % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* C = A << shiftBits */
+      /* Shift the input and then store the result in the destination buffer. */
+      *pDst++ = (q7_t) __SSAT((*pSrc++ << shiftBits), 8);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    shiftBits = -shiftBits;
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C = A >> shiftBits */
+      /* Read 4 inputs */
+      in1 = *pSrc;
+      in2 = *(pSrc + 1);
+      in3 = *(pSrc + 2);
+      in4 = *(pSrc + 3);
+
+      /* Store the Shifted result in the destination buffer in single cycle by packing the outputs */
+      *__SIMD32(pDst)++ = __PACKq7((in1 >> shiftBits), (in2 >> shiftBits),
+                                   (in3 >> shiftBits), (in4 >> shiftBits));
+
+
+      pSrc += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = blockSize % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* C = A >> shiftBits */
+      /* Shift the input and then store the result in the destination buffer. */
+      in1 = *pSrc++;
+      *pDst++ = (in1 >> shiftBits);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Getting the sign of shiftBits */
+  sign = (shiftBits & 0x80);
+
+  /* If the shift value is non-negative then do left shift else right shift */
+  if(sign == 0u)
+  {
+    /* Initialize blkCnt with number of samples */
+    blkCnt = blockSize;
+
+    while(blkCnt > 0u)
+    {
+      /* C = A << shiftBits */
+      /* Shift the input and then store the result in the destination buffer. */
+      *pDst++ = (q7_t) __SSAT(((q15_t) * pSrc++ << shiftBits), 8);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* Initialize blkCnt with number of samples */
+    blkCnt = blockSize;
+
+    while(blkCnt > 0u)
+    {
+      /* C = A >> shiftBits */
+      /* Shift the input and then store the result in the destination buffer. */
+      *pDst++ = (*pSrc++ >> -shiftBits);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+}
+
+/**        
+ * @} end of shift group        
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,150 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_sub_f32.c    
+*    
+* Description:	Floating-point vector subtraction.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMath        
+ */
+
+/**        
+ * @defgroup BasicSub Vector Subtraction        
+ *        
+ * Element-by-element subtraction of two vectors.        
+ *        
+ * <pre>        
+ *     pDst[n] = pSrcA[n] - pSrcB[n],   0 <= n < blockSize.        
+ * </pre>        
+ *        
+ * There are separate functions for floating-point, Q7, Q15, and Q31 data types.        
+ */
+
+/**        
+ * @addtogroup BasicSub        
+ * @{        
+ */
+
+
+/**        
+ * @brief Floating-point vector subtraction.        
+ * @param[in]       *pSrcA points to the first input vector        
+ * @param[in]       *pSrcB points to the second input vector        
+ * @param[out]      *pDst points to the output vector        
+ * @param[in]       blockSize number of samples in each vector        
+ * @return none.        
+ */
+
+void arm_sub_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t inA1, inA2, inA3, inA4;              /* temporary variables */
+  float32_t inB1, inB2, inB3, inB4;              /* temporary variables */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the results in the destination buffer. */
+    /* Read 4 input samples from sourceA and sourceB */
+    inA1 = *pSrcA;
+    inB1 = *pSrcB;
+    inA2 = *(pSrcA + 1);
+    inB2 = *(pSrcB + 1);
+    inA3 = *(pSrcA + 2);
+    inB3 = *(pSrcB + 2);
+    inA4 = *(pSrcA + 3);
+    inB4 = *(pSrcB + 3);
+
+    /* dst = srcA - srcB */
+    /* subtract and store the result */
+    *pDst = inA1 - inB1;
+    *(pDst + 1) = inA2 - inB2;
+    *(pDst + 2) = inA3 - inB3;
+    *(pDst + 3) = inA4 - inB4;
+
+
+    /* Update pointers to process next samples */
+    pSrcA += 4u;
+    pSrcB += 4u;
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the results in the destination buffer. */
+    *pDst++ = (*pSrcA++) - (*pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of BasicSub group        
+ */
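
The control flow of `arm_sub_f32` (groups of four outputs, then a tail loop for the remaining 0 to 3 samples) can be reproduced in portable C as a rough sketch. `ref_sub_f32` is a hypothetical name, plain `float` stands in for `float32_t`, and the `const` qualifiers are an addition of this example rather than the library signature.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone version of the unrolled subtraction loop. */
static void ref_sub_f32(const float *pSrcA, const float *pSrcB,
                        float *pDst, uint32_t blockSize)
{
  uint32_t blkCnt = blockSize >> 2u;    /* number of complete groups of four */

  assert(blockSize == 0u || (pSrcA && pSrcB && pDst));

  while (blkCnt > 0u)
  {
    /* Compute four outputs per iteration. */
    pDst[0] = pSrcA[0] - pSrcB[0];
    pDst[1] = pSrcA[1] - pSrcB[1];
    pDst[2] = pSrcA[2] - pSrcB[2];
    pDst[3] = pSrcA[3] - pSrcB[3];

    pSrcA += 4;
    pSrcB += 4;
    pDst += 4;
    blkCnt--;
  }

  /* Tail loop: the samples left over when blockSize is not a multiple of 4. */
  blkCnt = blockSize % 4u;

  while (blkCnt > 0u)
  {
    *pDst++ = (*pSrcA++) - (*pSrcB++);
    blkCnt--;
  }
}
```

A block size of 7 exercises both loops: one unrolled iteration followed by three tail samples.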

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,140 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_sub_q15.c    
+*    
+* Description:	Q15 vector subtraction.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicSub    
+ * @{    
+ */
+
+/**    
+ * @brief Q15 vector subtraction.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.    
+ */
+
+void arm_sub_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t inA1, inA2;
+  q31_t inB1, inB2;
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the results in the destination buffer two samples at a time. */
+    inA1 = *__SIMD32(pSrcA)++;
+    inA2 = *__SIMD32(pSrcA)++;
+    inB1 = *__SIMD32(pSrcB)++;
+    inB2 = *__SIMD32(pSrcB)++;
+
+    *__SIMD32(pDst)++ = __QSUB16(inA1, inB1);
+    *__SIMD32(pDst)++ = __QSUB16(inA2, inB2);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the result in the destination buffer. */
+    *pDst++ = (q15_t) __QSUB16(*pSrcA++, *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the result in the destination buffer. */
+    *pDst++ = (q15_t) __SSAT(((q31_t) * pSrcA++ - *pSrcB++), 16);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+}
+
+/**    
+ * @} end of BasicSub group    
+ */
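
The Cortex-M0 path of `arm_sub_q15` widens each operand to 32 bits, subtracts, and saturates to 16 bits with `__SSAT`. A portable sketch of that behaviour, without the intrinsic, might look like the following. `ref_sub_q15` and `ref_sub_q15_vec` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: widen, subtract, clamp to [0x8000 0x7FFF]. */
static int16_t ref_sub_q15(int16_t a, int16_t b)
{
  int32_t diff = (int32_t) a - (int32_t) b;

  if (diff > INT16_MAX) diff = INT16_MAX;
  if (diff < INT16_MIN) diff = INT16_MIN;

  return (int16_t) diff;
}

/* Hypothetical vector wrapper: pDst[n] = sat(pSrcA[n] - pSrcB[n]). */
static void ref_sub_q15_vec(const int16_t *pSrcA, const int16_t *pSrcB,
                            int16_t *pDst, uint32_t blockSize)
{
  uint32_t n;

  assert(blockSize == 0u || (pSrcA && pSrcB && pDst));

  for (n = 0u; n < blockSize; n++)
  {
    pDst[n] = ref_sub_q15(pSrcA[n], pSrcB[n]);
  }
}
```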

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,146 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_sub_q31.c    
+*    
+* Description:	Q31 vector subtraction.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicSub    
+ * @{    
+ */
+
+/**    
+ * @brief Q31 vector subtraction.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] will be saturated.    
+ */
+
+void arm_sub_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t inA1, inA2, inA3, inA4;
+  q31_t inB1, inB2, inB3, inB4;
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the results in the destination buffer. */
+    inA1 = *pSrcA++;
+    inA2 = *pSrcA++;
+    inB1 = *pSrcB++;
+    inB2 = *pSrcB++;
+
+    inA3 = *pSrcA++;
+    inA4 = *pSrcA++;
+    inB3 = *pSrcB++;
+    inB4 = *pSrcB++;
+
+    *pDst++ = __QSUB(inA1, inB1);
+    *pDst++ = __QSUB(inA2, inB2);
+    *pDst++ = __QSUB(inA3, inB3);
+    *pDst++ = __QSUB(inA4, inB4);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the result in the destination buffer. */
+    *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the result in the destination buffer. */
+    *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrcA++ - *pSrcB++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of BasicSub group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/BasicMathFunctions/arm_sub_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,131 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_sub_q7.c    
+*    
+* Description:	Q7 vector subtraction.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMath    
+ */
+
+/**    
+ * @addtogroup BasicSub    
+ * @{    
+ */
+
+/**    
+ * @brief Q7 vector subtraction.    
+ * @param[in]       *pSrcA points to the first input vector    
+ * @param[in]       *pSrcB points to the second input vector    
+ * @param[out]      *pDst points to the output vector    
+ * @param[in]       blockSize number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q7 range [0x80 0x7F] will be saturated.    
+ */
+
+void arm_sub_q7(
+  q7_t * pSrcA,
+  q7_t * pSrcB,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the results in the destination buffer 4 samples at a time. */
+    *__SIMD32(pDst)++ = __QSUB8(*__SIMD32(pSrcA)++, *__SIMD32(pSrcB)++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the result in the destination buffer. */
+    *pDst++ = __SSAT(*pSrcA++ - *pSrcB++, 8);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Initialize blkCnt with number of samples */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A - B */
+    /* Subtract and then store the result in the destination buffer. */
+    *pDst++ = (q7_t) __SSAT((q15_t) * pSrcA++ - *pSrcB++, 8);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+}
+
+/**    
+ * @} end of BasicSub group    
+ */
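
Note on the hunk above: the unrolled path of arm_sub_q7 handles four Q7 samples per iteration by treating them as one packed 32-bit word and applying __QSUB8, which performs four independent signed 8-bit saturating subtractions. The sketch below emulates that per-lane behaviour in portable C. The names sat_sub_q7 and qsub8_ref are hypothetical and are not part of CMSIS; the sketch also assumes the usual two's-complement conversion from uint8_t to int8_t.

```c
#include <stdint.h>

/* Hypothetical helper (not part of CMSIS): saturating Q7 subtraction.
 * The difference of two 8-bit values always fits in 16 bits, so it is
 * computed there and then clamped to [-128, 127]. */
static int8_t sat_sub_q7(int8_t a, int8_t b)
{
    int16_t diff = (int16_t)((int16_t)a - (int16_t)b);

    if (diff > INT8_MAX) {
        return INT8_MAX;
    }
    if (diff < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)diff;
}

/* Hypothetical emulation of a four-lane saturating subtract on packed
 * 8-bit values: each byte of a and b is treated as a signed Q7 sample. */
static uint32_t qsub8_ref(uint32_t a, uint32_t b)
{
    uint32_t result = 0u;

    for (unsigned lane = 0u; lane < 4u; lane++) {
        unsigned shift = 8u * lane;
        /* Assumes two's-complement narrowing, as on all common compilers. */
        int8_t lane_a = (int8_t)(uint8_t)(a >> shift);
        int8_t lane_b = (int8_t)(uint8_t)(b >> shift);

        result |= (uint32_t)(uint8_t)sat_sub_q7(lane_a, lane_b) << shift;
    }
    return result;
}
```

For example, qsub8_ref(0x7F800102, 0xFF01FF01) yields 0x7F800201: the two upper lanes saturate at 0x7F and 0x80, while the two lower lanes subtract normally.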

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/CommonTables/arm_common_tables.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/CommonTables/arm_common_tables.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,27251 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_common_tables.c    
+*    
+* Description:	Common tables (FFT twiddle factors, bit-reversal indices, reciprocals, etc.) that are shared across different functions    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**    
+ * @ingroup groupTransforms    
+ */
+
+/**    
+ * @addtogroup CFFT_CIFFT Complex FFT Tables  
+ * @{    
+ */
+
+/**    
+* \par    
+* Pseudocode for generating the bit-reversal table:    
+* \par    
+* <pre>for(l=1;l <= N/4;l++)    
+* {    
+*   for(i=0;i<logN2;i++)    
+*   {     
+*     a[i]=l&(1<<i);    
+*   }    
+*   for(j=0; j<logN2; j++)    
+*   {    
+*     if (a[j]!=0)    
+*     y[l]+=(1<<((logN2-1)-j));    
+*   }    
+*   y[l] = y[l] >> 1;    
+*  } </pre>    
+* \par    
+* where N = 4096	logN2 = 12   
+* \par    
+* N is the maximum FFT Size supported    
+*/
+
+/*    
+* @brief  Table for bit reversal process    
+*/
+const uint16_t armBitRevTable[1024] = {
+   0x400, 0x200, 0x600, 0x100, 0x500, 0x300, 0x700, 0x80, 0x480, 0x280, 
+   0x680, 0x180, 0x580, 0x380, 0x780, 0x40, 0x440, 0x240, 0x640, 0x140, 
+   0x540, 0x340, 0x740, 0xc0, 0x4c0, 0x2c0, 0x6c0, 0x1c0, 0x5c0, 0x3c0, 
+   0x7c0, 0x20, 0x420, 0x220, 0x620, 0x120, 0x520, 0x320, 0x720, 0xa0, 
+   0x4a0, 0x2a0, 0x6a0, 0x1a0, 0x5a0, 0x3a0, 0x7a0, 0x60, 0x460, 0x260, 
+   0x660, 0x160, 0x560, 0x360, 0x760, 0xe0, 0x4e0, 0x2e0, 0x6e0, 0x1e0, 
+   0x5e0, 0x3e0, 0x7e0, 0x10, 0x410, 0x210, 0x610, 0x110, 0x510, 0x310, 
+   0x710, 0x90, 0x490, 0x290, 0x690, 0x190, 0x590, 0x390, 0x790, 0x50, 
+   0x450, 0x250, 0x650, 0x150, 0x550, 0x350, 0x750, 0xd0, 0x4d0, 0x2d0, 
+   0x6d0, 0x1d0, 0x5d0, 0x3d0, 0x7d0, 0x30, 0x430, 0x230, 0x630, 0x130, 
+   0x530, 0x330, 0x730, 0xb0, 0x4b0, 0x2b0, 0x6b0, 0x1b0, 0x5b0, 0x3b0, 
+   0x7b0, 0x70, 0x470, 0x270, 0x670, 0x170, 0x570, 0x370, 0x770, 0xf0, 
+   0x4f0, 0x2f0, 0x6f0, 0x1f0, 0x5f0, 0x3f0, 0x7f0, 0x8, 0x408, 0x208, 
+   0x608, 0x108, 0x508, 0x308, 0x708, 0x88, 0x488, 0x288, 0x688, 0x188, 
+   0x588, 0x388, 0x788, 0x48, 0x448, 0x248, 0x648, 0x148, 0x548, 0x348, 
+   0x748, 0xc8, 0x4c8, 0x2c8, 0x6c8, 0x1c8, 0x5c8, 0x3c8, 0x7c8, 0x28, 
+   0x428, 0x228, 0x628, 0x128, 0x528, 0x328, 0x728, 0xa8, 0x4a8, 0x2a8, 
+   0x6a8, 0x1a8, 0x5a8, 0x3a8, 0x7a8, 0x68, 0x468, 0x268, 0x668, 0x168, 
+   0x568, 0x368, 0x768, 0xe8, 0x4e8, 0x2e8, 0x6e8, 0x1e8, 0x5e8, 0x3e8, 
+   0x7e8, 0x18, 0x418, 0x218, 0x618, 0x118, 0x518, 0x318, 0x718, 0x98, 
+   0x498, 0x298, 0x698, 0x198, 0x598, 0x398, 0x798, 0x58, 0x458, 0x258, 
+   0x658, 0x158, 0x558, 0x358, 0x758, 0xd8, 0x4d8, 0x2d8, 0x6d8, 0x1d8, 
+   0x5d8, 0x3d8, 0x7d8, 0x38, 0x438, 0x238, 0x638, 0x138, 0x538, 0x338, 
+   0x738, 0xb8, 0x4b8, 0x2b8, 0x6b8, 0x1b8, 0x5b8, 0x3b8, 0x7b8, 0x78, 
+   0x478, 0x278, 0x678, 0x178, 0x578, 0x378, 0x778, 0xf8, 0x4f8, 0x2f8, 
+   0x6f8, 0x1f8, 0x5f8, 0x3f8, 0x7f8, 0x4, 0x404, 0x204, 0x604, 0x104, 
+   0x504, 0x304, 0x704, 0x84, 0x484, 0x284, 0x684, 0x184, 0x584, 0x384, 
+   0x784, 0x44, 0x444, 0x244, 0x644, 0x144, 0x544, 0x344, 0x744, 0xc4, 
+   0x4c4, 0x2c4, 0x6c4, 0x1c4, 0x5c4, 0x3c4, 0x7c4, 0x24, 0x424, 0x224, 
+   0x624, 0x124, 0x524, 0x324, 0x724, 0xa4, 0x4a4, 0x2a4, 0x6a4, 0x1a4, 
+   0x5a4, 0x3a4, 0x7a4, 0x64, 0x464, 0x264, 0x664, 0x164, 0x564, 0x364, 
+   0x764, 0xe4, 0x4e4, 0x2e4, 0x6e4, 0x1e4, 0x5e4, 0x3e4, 0x7e4, 0x14, 
+   0x414, 0x214, 0x614, 0x114, 0x514, 0x314, 0x714, 0x94, 0x494, 0x294, 
+   0x694, 0x194, 0x594, 0x394, 0x794, 0x54, 0x454, 0x254, 0x654, 0x154, 
+   0x554, 0x354, 0x754, 0xd4, 0x4d4, 0x2d4, 0x6d4, 0x1d4, 0x5d4, 0x3d4, 
+   0x7d4, 0x34, 0x434, 0x234, 0x634, 0x134, 0x534, 0x334, 0x734, 0xb4, 
+   0x4b4, 0x2b4, 0x6b4, 0x1b4, 0x5b4, 0x3b4, 0x7b4, 0x74, 0x474, 0x274, 
+   0x674, 0x174, 0x574, 0x374, 0x774, 0xf4, 0x4f4, 0x2f4, 0x6f4, 0x1f4, 
+   0x5f4, 0x3f4, 0x7f4, 0xc, 0x40c, 0x20c, 0x60c, 0x10c, 0x50c, 0x30c, 
+   0x70c, 0x8c, 0x48c, 0x28c, 0x68c, 0x18c, 0x58c, 0x38c, 0x78c, 0x4c, 
+   0x44c, 0x24c, 0x64c, 0x14c, 0x54c, 0x34c, 0x74c, 0xcc, 0x4cc, 0x2cc, 
+   0x6cc, 0x1cc, 0x5cc, 0x3cc, 0x7cc, 0x2c, 0x42c, 0x22c, 0x62c, 0x12c, 
+   0x52c, 0x32c, 0x72c, 0xac, 0x4ac, 0x2ac, 0x6ac, 0x1ac, 0x5ac, 0x3ac, 
+   0x7ac, 0x6c, 0x46c, 0x26c, 0x66c, 0x16c, 0x56c, 0x36c, 0x76c, 0xec, 
+   0x4ec, 0x2ec, 0x6ec, 0x1ec, 0x5ec, 0x3ec, 0x7ec, 0x1c, 0x41c, 0x21c, 
+   0x61c, 0x11c, 0x51c, 0x31c, 0x71c, 0x9c, 0x49c, 0x29c, 0x69c, 0x19c, 
+   0x59c, 0x39c, 0x79c, 0x5c, 0x45c, 0x25c, 0x65c, 0x15c, 0x55c, 0x35c, 
+   0x75c, 0xdc, 0x4dc, 0x2dc, 0x6dc, 0x1dc, 0x5dc, 0x3dc, 0x7dc, 0x3c, 
+   0x43c, 0x23c, 0x63c, 0x13c, 0x53c, 0x33c, 0x73c, 0xbc, 0x4bc, 0x2bc, 
+   0x6bc, 0x1bc, 0x5bc, 0x3bc, 0x7bc, 0x7c, 0x47c, 0x27c, 0x67c, 0x17c, 
+   0x57c, 0x37c, 0x77c, 0xfc, 0x4fc, 0x2fc, 0x6fc, 0x1fc, 0x5fc, 0x3fc, 
+   0x7fc, 0x2, 0x402, 0x202, 0x602, 0x102, 0x502, 0x302, 0x702, 0x82, 
+   0x482, 0x282, 0x682, 0x182, 0x582, 0x382, 0x782, 0x42, 0x442, 0x242, 
+   0x642, 0x142, 0x542, 0x342, 0x742, 0xc2, 0x4c2, 0x2c2, 0x6c2, 0x1c2, 
+   0x5c2, 0x3c2, 0x7c2, 0x22, 0x422, 0x222, 0x622, 0x122, 0x522, 0x322, 
+   0x722, 0xa2, 0x4a2, 0x2a2, 0x6a2, 0x1a2, 0x5a2, 0x3a2, 0x7a2, 0x62, 
+   0x462, 0x262, 0x662, 0x162, 0x562, 0x362, 0x762, 0xe2, 0x4e2, 0x2e2, 
+   0x6e2, 0x1e2, 0x5e2, 0x3e2, 0x7e2, 0x12, 0x412, 0x212, 0x612, 0x112, 
+   0x512, 0x312, 0x712, 0x92, 0x492, 0x292, 0x692, 0x192, 0x592, 0x392, 
+   0x792, 0x52, 0x452, 0x252, 0x652, 0x152, 0x552, 0x352, 0x752, 0xd2, 
+   0x4d2, 0x2d2, 0x6d2, 0x1d2, 0x5d2, 0x3d2, 0x7d2, 0x32, 0x432, 0x232, 
+   0x632, 0x132, 0x532, 0x332, 0x732, 0xb2, 0x4b2, 0x2b2, 0x6b2, 0x1b2, 
+   0x5b2, 0x3b2, 0x7b2, 0x72, 0x472, 0x272, 0x672, 0x172, 0x572, 0x372, 
+   0x772, 0xf2, 0x4f2, 0x2f2, 0x6f2, 0x1f2, 0x5f2, 0x3f2, 0x7f2, 0xa, 
+   0x40a, 0x20a, 0x60a, 0x10a, 0x50a, 0x30a, 0x70a, 0x8a, 0x48a, 0x28a, 
+   0x68a, 0x18a, 0x58a, 0x38a, 0x78a, 0x4a, 0x44a, 0x24a, 0x64a, 0x14a, 
+   0x54a, 0x34a, 0x74a, 0xca, 0x4ca, 0x2ca, 0x6ca, 0x1ca, 0x5ca, 0x3ca, 
+   0x7ca, 0x2a, 0x42a, 0x22a, 0x62a, 0x12a, 0x52a, 0x32a, 0x72a, 0xaa, 
+   0x4aa, 0x2aa, 0x6aa, 0x1aa, 0x5aa, 0x3aa, 0x7aa, 0x6a, 0x46a, 0x26a, 
+   0x66a, 0x16a, 0x56a, 0x36a, 0x76a, 0xea, 0x4ea, 0x2ea, 0x6ea, 0x1ea, 
+   0x5ea, 0x3ea, 0x7ea, 0x1a, 0x41a, 0x21a, 0x61a, 0x11a, 0x51a, 0x31a, 
+   0x71a, 0x9a, 0x49a, 0x29a, 0x69a, 0x19a, 0x59a, 0x39a, 0x79a, 0x5a, 
+   0x45a, 0x25a, 0x65a, 0x15a, 0x55a, 0x35a, 0x75a, 0xda, 0x4da, 0x2da, 
+   0x6da, 0x1da, 0x5da, 0x3da, 0x7da, 0x3a, 0x43a, 0x23a, 0x63a, 0x13a, 
+   0x53a, 0x33a, 0x73a, 0xba, 0x4ba, 0x2ba, 0x6ba, 0x1ba, 0x5ba, 0x3ba, 
+   0x7ba, 0x7a, 0x47a, 0x27a, 0x67a, 0x17a, 0x57a, 0x37a, 0x77a, 0xfa, 
+   0x4fa, 0x2fa, 0x6fa, 0x1fa, 0x5fa, 0x3fa, 0x7fa, 0x6, 0x406, 0x206, 
+   0x606, 0x106, 0x506, 0x306, 0x706, 0x86, 0x486, 0x286, 0x686, 0x186, 
+   0x586, 0x386, 0x786, 0x46, 0x446, 0x246, 0x646, 0x146, 0x546, 0x346, 
+   0x746, 0xc6, 0x4c6, 0x2c6, 0x6c6, 0x1c6, 0x5c6, 0x3c6, 0x7c6, 0x26, 
+   0x426, 0x226, 0x626, 0x126, 0x526, 0x326, 0x726, 0xa6, 0x4a6, 0x2a6, 
+   0x6a6, 0x1a6, 0x5a6, 0x3a6, 0x7a6, 0x66, 0x466, 0x266, 0x666, 0x166, 
+   0x566, 0x366, 0x766, 0xe6, 0x4e6, 0x2e6, 0x6e6, 0x1e6, 0x5e6, 0x3e6, 
+   0x7e6, 0x16, 0x416, 0x216, 0x616, 0x116, 0x516, 0x316, 0x716, 0x96, 
+   0x496, 0x296, 0x696, 0x196, 0x596, 0x396, 0x796, 0x56, 0x456, 0x256, 
+   0x656, 0x156, 0x556, 0x356, 0x756, 0xd6, 0x4d6, 0x2d6, 0x6d6, 0x1d6, 
+   0x5d6, 0x3d6, 0x7d6, 0x36, 0x436, 0x236, 0x636, 0x136, 0x536, 0x336, 
+   0x736, 0xb6, 0x4b6, 0x2b6, 0x6b6, 0x1b6, 0x5b6, 0x3b6, 0x7b6, 0x76, 
+   0x476, 0x276, 0x676, 0x176, 0x576, 0x376, 0x776, 0xf6, 0x4f6, 0x2f6, 
+   0x6f6, 0x1f6, 0x5f6, 0x3f6, 0x7f6, 0xe, 0x40e, 0x20e, 0x60e, 0x10e, 
+   0x50e, 0x30e, 0x70e, 0x8e, 0x48e, 0x28e, 0x68e, 0x18e, 0x58e, 0x38e, 
+   0x78e, 0x4e, 0x44e, 0x24e, 0x64e, 0x14e, 0x54e, 0x34e, 0x74e, 0xce, 
+   0x4ce, 0x2ce, 0x6ce, 0x1ce, 0x5ce, 0x3ce, 0x7ce, 0x2e, 0x42e, 0x22e, 
+   0x62e, 0x12e, 0x52e, 0x32e, 0x72e, 0xae, 0x4ae, 0x2ae, 0x6ae, 0x1ae, 
+   0x5ae, 0x3ae, 0x7ae, 0x6e, 0x46e, 0x26e, 0x66e, 0x16e, 0x56e, 0x36e, 
+   0x76e, 0xee, 0x4ee, 0x2ee, 0x6ee, 0x1ee, 0x5ee, 0x3ee, 0x7ee, 0x1e, 
+   0x41e, 0x21e, 0x61e, 0x11e, 0x51e, 0x31e, 0x71e, 0x9e, 0x49e, 0x29e, 
+   0x69e, 0x19e, 0x59e, 0x39e, 0x79e, 0x5e, 0x45e, 0x25e, 0x65e, 0x15e, 
+   0x55e, 0x35e, 0x75e, 0xde, 0x4de, 0x2de, 0x6de, 0x1de, 0x5de, 0x3de, 
+   0x7de, 0x3e, 0x43e, 0x23e, 0x63e, 0x13e, 0x53e, 0x33e, 0x73e, 0xbe, 
+   0x4be, 0x2be, 0x6be, 0x1be, 0x5be, 0x3be, 0x7be, 0x7e, 0x47e, 0x27e, 
+   0x67e, 0x17e, 0x57e, 0x37e, 0x77e, 0xfe, 0x4fe, 0x2fe, 0x6fe, 0x1fe, 
+   0x5fe, 0x3fe, 0x7fe, 0x1 
+};
+
+
+/*    
+* @brief  Floating-point Twiddle factors Table Generation    
+*/
+
+/**    
+* \par    
+* Example code for Floating-point Twiddle factors Generation:    
+* \par    
+* <pre>for(i = 0; i < N; i++)    
+* {    
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);    
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 16	and PI = 3.14159265358979    
+* \par    
+* Cos and Sin values are in interleaved fashion    
+*     
+*/
+const float32_t twiddleCoef_16[32] = {
+    1.000000000f,  0.000000000f,
+    0.923879533f,  0.382683432f,
+    0.707106781f,  0.707106781f,
+    0.382683432f,  0.923879533f,
+    0.000000000f,  1.000000000f,
+   -0.382683432f,  0.923879533f,
+   -0.707106781f,  0.707106781f,
+   -0.923879533f,  0.382683432f,
+   -1.000000000f,  0.000000000f,
+   -0.923879533f, -0.382683432f,
+   -0.707106781f, -0.707106781f,
+   -0.382683432f, -0.923879533f,
+   -0.000000000f, -1.000000000f,
+    0.382683432f, -0.923879533f,
+    0.707106781f, -0.707106781f,
+    0.923879533f, -0.382683432f
+};
+
+/**    
+* \par    
+* Example code for Floating-point Twiddle factors Generation:    
+* \par    
+* <pre>for(i = 0; i < N; i++)    
+* {    
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);    
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 32	and PI = 3.14159265358979    
+* \par    
+* Cos and Sin values are in interleaved fashion    
+*     
+*/
+const float32_t twiddleCoef_32[64] = {
+    1.000000000f,  0.000000000f,
+    0.980785280f,  0.195090322f,
+    0.923879533f,  0.382683432f,
+    0.831469612f,  0.555570233f,
+    0.707106781f,  0.707106781f,
+    0.555570233f,  0.831469612f,
+    0.382683432f,  0.923879533f,
+    0.195090322f,  0.980785280f,
+    0.000000000f,  1.000000000f,
+   -0.195090322f,  0.980785280f,
+   -0.382683432f,  0.923879533f,
+   -0.555570233f,  0.831469612f,
+   -0.707106781f,  0.707106781f,
+   -0.831469612f,  0.555570233f,
+   -0.923879533f,  0.382683432f,
+   -0.980785280f,  0.195090322f,
+   -1.000000000f,  0.000000000f,
+   -0.980785280f, -0.195090322f,
+   -0.923879533f, -0.382683432f,
+   -0.831469612f, -0.555570233f,
+   -0.707106781f, -0.707106781f,
+   -0.555570233f, -0.831469612f,
+   -0.382683432f, -0.923879533f,
+   -0.195090322f, -0.980785280f,
+   -0.000000000f, -1.000000000f,
+    0.195090322f, -0.980785280f,
+    0.382683432f, -0.923879533f,
+    0.555570233f, -0.831469612f,
+    0.707106781f, -0.707106781f,
+    0.831469612f, -0.555570233f,
+    0.923879533f, -0.382683432f,
+    0.980785280f, -0.195090322f
+};
+
+/**    
+* \par    
+* Example code for Floating-point Twiddle factors Generation:    
+* \par    
+* <pre>for(i = 0; i < N; i++)    
+* {    
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);    
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 64	and PI = 3.14159265358979    
+* \par    
+* Cos and Sin values are in interleaved fashion    
+*     
+*/
+const float32_t twiddleCoef_64[128] = {
+    1.000000000f,  0.000000000f,
+    0.995184727f,  0.098017140f,
+    0.980785280f,  0.195090322f,
+    0.956940336f,  0.290284677f,
+    0.923879533f,  0.382683432f,
+    0.881921264f,  0.471396737f,
+    0.831469612f,  0.555570233f,
+    0.773010453f,  0.634393284f,
+    0.707106781f,  0.707106781f,
+    0.634393284f,  0.773010453f,
+    0.555570233f,  0.831469612f,
+    0.471396737f,  0.881921264f,
+    0.382683432f,  0.923879533f,
+    0.290284677f,  0.956940336f,
+    0.195090322f,  0.980785280f,
+    0.098017140f,  0.995184727f,
+    0.000000000f,  1.000000000f,
+   -0.098017140f,  0.995184727f,
+   -0.195090322f,  0.980785280f,
+   -0.290284677f,  0.956940336f,
+   -0.382683432f,  0.923879533f,
+   -0.471396737f,  0.881921264f,
+   -0.555570233f,  0.831469612f,
+   -0.634393284f,  0.773010453f,
+   -0.707106781f,  0.707106781f,
+   -0.773010453f,  0.634393284f,
+   -0.831469612f,  0.555570233f,
+   -0.881921264f,  0.471396737f,
+   -0.923879533f,  0.382683432f,
+   -0.956940336f,  0.290284677f,
+   -0.980785280f,  0.195090322f,
+   -0.995184727f,  0.098017140f,
+   -1.000000000f,  0.000000000f,
+   -0.995184727f, -0.098017140f,
+   -0.980785280f, -0.195090322f,
+   -0.956940336f, -0.290284677f,
+   -0.923879533f, -0.382683432f,
+   -0.881921264f, -0.471396737f,
+   -0.831469612f, -0.555570233f,
+   -0.773010453f, -0.634393284f,
+   -0.707106781f, -0.707106781f,
+   -0.634393284f, -0.773010453f,
+   -0.555570233f, -0.831469612f,
+   -0.471396737f, -0.881921264f,
+   -0.382683432f, -0.923879533f,
+   -0.290284677f, -0.956940336f,
+   -0.195090322f, -0.980785280f,
+   -0.098017140f, -0.995184727f,
+   -0.000000000f, -1.000000000f,
+    0.098017140f, -0.995184727f,
+    0.195090322f, -0.980785280f,
+    0.290284677f, -0.956940336f,
+    0.382683432f, -0.923879533f,
+    0.471396737f, -0.881921264f,
+    0.555570233f, -0.831469612f,
+    0.634393284f, -0.773010453f,
+    0.707106781f, -0.707106781f,
+    0.773010453f, -0.634393284f,
+    0.831469612f, -0.555570233f,
+    0.881921264f, -0.471396737f,
+    0.923879533f, -0.382683432f,
+    0.956940336f, -0.290284677f,
+    0.980785280f, -0.195090322f,
+    0.995184727f, -0.098017140f
+};
+
+/**    
+* \par    
+* Example code for Floating-point Twiddle factors Generation:    
+* \par    
+* <pre>for(i = 0; i < N; i++)    
+* {    
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);    
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 128	and PI = 3.14159265358979    
+* \par    
+* Cos and Sin values are in interleaved fashion    
+*     
+*/
+
+const float32_t twiddleCoef_128[256] = {
+    1.000000000f	,	0.000000000f	,
+    0.998795456f	,	0.049067674f	,
+    0.995184727f	,	0.098017140f	,
+    0.989176510f	,	0.146730474f	,
+    0.980785280f	,	0.195090322f	,
+    0.970031253f	,	0.242980180f	,
+    0.956940336f	,	0.290284677f	,
+    0.941544065f	,	0.336889853f	,
+    0.923879533f	,	0.382683432f	,
+    0.903989293f	,	0.427555093f	,
+    0.881921264f	,	0.471396737f	,
+    0.857728610f	,	0.514102744f	,
+    0.831469612f	,	0.555570233f	,
+    0.803207531f	,	0.595699304f	,
+    0.773010453f	,	0.634393284f	,
+    0.740951125f	,	0.671558955f	,
+    0.707106781f	,	0.707106781f	,
+    0.671558955f	,	0.740951125f	,
+    0.634393284f	,	0.773010453f	,
+    0.595699304f	,	0.803207531f	,
+    0.555570233f	,	0.831469612f	,
+    0.514102744f	,	0.857728610f	,
+    0.471396737f	,	0.881921264f	,
+    0.427555093f	,	0.903989293f	,
+    0.382683432f	,	0.923879533f	,
+    0.336889853f	,	0.941544065f	,
+    0.290284677f	,	0.956940336f	,
+    0.242980180f	,	0.970031253f	,
+    0.195090322f	,	0.980785280f	,
+    0.146730474f	,	0.989176510f	,
+    0.098017140f	,	0.995184727f	,
+    0.049067674f	,	0.998795456f	,
+    0.000000000f	,	1.000000000f	,
+    -0.049067674f	,	0.998795456f	,
+    -0.098017140f	,	0.995184727f	,
+    -0.146730474f	,	0.989176510f	,
+    -0.195090322f	,	0.980785280f	,
+    -0.242980180f	,	0.970031253f	,
+    -0.290284677f	,	0.956940336f	,
+    -0.336889853f	,	0.941544065f	,
+    -0.382683432f	,	0.923879533f	,
+    -0.427555093f	,	0.903989293f	,
+    -0.471396737f	,	0.881921264f	,
+    -0.514102744f	,	0.857728610f	,
+    -0.555570233f	,	0.831469612f	,
+    -0.595699304f	,	0.803207531f	,
+    -0.634393284f	,	0.773010453f	,
+    -0.671558955f	,	0.740951125f	,
+    -0.707106781f	,	0.707106781f	,
+    -0.740951125f	,	0.671558955f	,
+    -0.773010453f	,	0.634393284f	,
+    -0.803207531f	,	0.595699304f	,
+    -0.831469612f	,	0.555570233f	,
+    -0.857728610f	,	0.514102744f	,
+    -0.881921264f	,	0.471396737f	,
+    -0.903989293f	,	0.427555093f	,
+    -0.923879533f	,	0.382683432f	,
+    -0.941544065f	,	0.336889853f	,
+    -0.956940336f	,	0.290284677f	,
+    -0.970031253f	,	0.242980180f	,
+    -0.980785280f	,	0.195090322f	,
+    -0.989176510f	,	0.146730474f	,
+    -0.995184727f	,	0.098017140f	,
+    -0.998795456f	,	0.049067674f	,
+    -1.000000000f	,	0.000000000f	,
+    -0.998795456f	,	-0.049067674f	,
+    -0.995184727f	,	-0.098017140f	,
+    -0.989176510f	,	-0.146730474f	,
+    -0.980785280f	,	-0.195090322f	,
+    -0.970031253f	,	-0.242980180f	,
+    -0.956940336f	,	-0.290284677f	,
+    -0.941544065f	,	-0.336889853f	,
+    -0.923879533f	,	-0.382683432f	,
+    -0.903989293f	,	-0.427555093f	,
+    -0.881921264f	,	-0.471396737f	,
+    -0.857728610f	,	-0.514102744f	,
+    -0.831469612f	,	-0.555570233f	,
+    -0.803207531f	,	-0.595699304f	,
+    -0.773010453f	,	-0.634393284f	,
+    -0.740951125f	,	-0.671558955f	,
+    -0.707106781f	,	-0.707106781f	,
+    -0.671558955f	,	-0.740951125f	,
+    -0.634393284f	,	-0.773010453f	,
+    -0.595699304f	,	-0.803207531f	,
+    -0.555570233f	,	-0.831469612f	,
+    -0.514102744f	,	-0.857728610f	,
+    -0.471396737f	,	-0.881921264f	,
+    -0.427555093f	,	-0.903989293f	,
+    -0.382683432f	,	-0.923879533f	,
+    -0.336889853f	,	-0.941544065f	,
+    -0.290284677f	,	-0.956940336f	,
+    -0.242980180f	,	-0.970031253f	,
+    -0.195090322f	,	-0.980785280f	,
+    -0.146730474f	,	-0.989176510f	,
+    -0.098017140f	,	-0.995184727f	,
+    -0.049067674f	,	-0.998795456f	,
+    -0.000000000f	,	-1.000000000f	,
+    0.049067674f	,	-0.998795456f	,
+    0.098017140f	,	-0.995184727f	,
+    0.146730474f	,	-0.989176510f	,
+    0.195090322f	,	-0.980785280f	,
+    0.242980180f	,	-0.970031253f	,
+    0.290284677f	,	-0.956940336f	,
+    0.336889853f	,	-0.941544065f	,
+    0.382683432f	,	-0.923879533f	,
+    0.427555093f	,	-0.903989293f	,
+    0.471396737f	,	-0.881921264f	,
+    0.514102744f	,	-0.857728610f	,
+    0.555570233f	,	-0.831469612f	,
+    0.595699304f	,	-0.803207531f	,
+    0.634393284f	,	-0.773010453f	,
+    0.671558955f	,	-0.740951125f	,
+    0.707106781f	,	-0.707106781f	,
+    0.740951125f	,	-0.671558955f	,
+    0.773010453f	,	-0.634393284f	,
+    0.803207531f	,	-0.595699304f	,
+    0.831469612f	,	-0.555570233f	,
+    0.857728610f	,	-0.514102744f	,
+    0.881921264f	,	-0.471396737f	,
+    0.903989293f	,	-0.427555093f	,
+    0.923879533f	,	-0.382683432f	,
+    0.941544065f	,	-0.336889853f	,
+    0.956940336f	,	-0.290284677f	,
+    0.970031253f	,	-0.242980180f	,
+    0.980785280f	,	-0.195090322f	,
+    0.989176510f	,	-0.146730474f	,
+    0.995184727f	,	-0.098017140f	,
+    0.998795456f	,	-0.049067674f
+};
+
+/**    
+* \par    
+* Example code for Floating-point Twiddle factors Generation:    
+* \par    
+* <pre>for(i = 0; i < N; i++)    
+* {    
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);    
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 256	and PI = 3.14159265358979    
+* \par    
+* Cos and Sin values are in interleaved fashion    
+*     
+*/
+const float32_t twiddleCoef_256[512] = {
+    1.000000000f,  0.000000000f,
+    0.999698819f,  0.024541229f,
+    0.998795456f,  0.049067674f,
+    0.997290457f,  0.073564564f,
+    0.995184727f,  0.098017140f,
+    0.992479535f,  0.122410675f,
+    0.989176510f,  0.146730474f,
+    0.985277642f,  0.170961889f,
+    0.980785280f,  0.195090322f,
+    0.975702130f,  0.219101240f,
+    0.970031253f,  0.242980180f,
+    0.963776066f,  0.266712757f,
+    0.956940336f,  0.290284677f,
+    0.949528181f,  0.313681740f,
+    0.941544065f,  0.336889853f,
+    0.932992799f,  0.359895037f,
+    0.923879533f,  0.382683432f,
+    0.914209756f,  0.405241314f,
+    0.903989293f,  0.427555093f,
+    0.893224301f,  0.449611330f,
+    0.881921264f,  0.471396737f,
+    0.870086991f,  0.492898192f,
+    0.857728610f,  0.514102744f,
+    0.844853565f,  0.534997620f,
+    0.831469612f,  0.555570233f,
+    0.817584813f,  0.575808191f,
+    0.803207531f,  0.595699304f,
+    0.788346428f,  0.615231591f,
+    0.773010453f,  0.634393284f,
+    0.757208847f,  0.653172843f,
+    0.740951125f,  0.671558955f,
+    0.724247083f,  0.689540545f,
+    0.707106781f,  0.707106781f,
+    0.689540545f,  0.724247083f,
+    0.671558955f,  0.740951125f,
+    0.653172843f,  0.757208847f,
+    0.634393284f,  0.773010453f,
+    0.615231591f,  0.788346428f,
+    0.595699304f,  0.803207531f,
+    0.575808191f,  0.817584813f,
+    0.555570233f,  0.831469612f,
+    0.534997620f,  0.844853565f,
+    0.514102744f,  0.857728610f,
+    0.492898192f,  0.870086991f,
+    0.471396737f,  0.881921264f,
+    0.449611330f,  0.893224301f,
+    0.427555093f,  0.903989293f,
+    0.405241314f,  0.914209756f,
+    0.382683432f,  0.923879533f,
+    0.359895037f,  0.932992799f,
+    0.336889853f,  0.941544065f,
+    0.313681740f,  0.949528181f,
+    0.290284677f,  0.956940336f,
+    0.266712757f,  0.963776066f,
+    0.242980180f,  0.970031253f,
+    0.219101240f,  0.975702130f,
+    0.195090322f,  0.980785280f,
+    0.170961889f,  0.985277642f,
+    0.146730474f,  0.989176510f,
+    0.122410675f,  0.992479535f,
+    0.098017140f,  0.995184727f,
+    0.073564564f,  0.997290457f,
+    0.049067674f,  0.998795456f,
+    0.024541229f,  0.999698819f,
+    0.000000000f,  1.000000000f,
+   -0.024541229f,  0.999698819f,
+   -0.049067674f,  0.998795456f,
+   -0.073564564f,  0.997290457f,
+   -0.098017140f,  0.995184727f,
+   -0.122410675f,  0.992479535f,
+   -0.146730474f,  0.989176510f,
+   -0.170961889f,  0.985277642f,
+   -0.195090322f,  0.980785280f,
+   -0.219101240f,  0.975702130f,
+   -0.242980180f,  0.970031253f,
+   -0.266712757f,  0.963776066f,
+   -0.290284677f,  0.956940336f,
+   -0.313681740f,  0.949528181f,
+   -0.336889853f,  0.941544065f,
+   -0.359895037f,  0.932992799f,
+   -0.382683432f,  0.923879533f,
+   -0.405241314f,  0.914209756f,
+   -0.427555093f,  0.903989293f,
+   -0.449611330f,  0.893224301f,
+   -0.471396737f,  0.881921264f,
+   -0.492898192f,  0.870086991f,
+   -0.514102744f,  0.857728610f,
+   -0.534997620f,  0.844853565f,
+   -0.555570233f,  0.831469612f,
+   -0.575808191f,  0.817584813f,
+   -0.595699304f,  0.803207531f,
+   -0.615231591f,  0.788346428f,
+   -0.634393284f,  0.773010453f,
+   -0.653172843f,  0.757208847f,
+   -0.671558955f,  0.740951125f,
+   -0.689540545f,  0.724247083f,
+   -0.707106781f,  0.707106781f,
+   -0.724247083f,  0.689540545f,
+   -0.740951125f,  0.671558955f,
+   -0.757208847f,  0.653172843f,
+   -0.773010453f,  0.634393284f,
+   -0.788346428f,  0.615231591f,
+   -0.803207531f,  0.595699304f,
+   -0.817584813f,  0.575808191f,
+   -0.831469612f,  0.555570233f,
+   -0.844853565f,  0.534997620f,
+   -0.857728610f,  0.514102744f,
+   -0.870086991f,  0.492898192f,
+   -0.881921264f,  0.471396737f,
+   -0.893224301f,  0.449611330f,
+   -0.903989293f,  0.427555093f,
+   -0.914209756f,  0.405241314f,
+   -0.923879533f,  0.382683432f,
+   -0.932992799f,  0.359895037f,
+   -0.941544065f,  0.336889853f,
+   -0.949528181f,  0.313681740f,
+   -0.956940336f,  0.290284677f,
+   -0.963776066f,  0.266712757f,
+   -0.970031253f,  0.242980180f,
+   -0.975702130f,  0.219101240f,
+   -0.980785280f,  0.195090322f,
+   -0.985277642f,  0.170961889f,
+   -0.989176510f,  0.146730474f,
+   -0.992479535f,  0.122410675f,
+   -0.995184727f,  0.098017140f,
+   -0.997290457f,  0.073564564f,
+   -0.998795456f,  0.049067674f,
+   -0.999698819f,  0.024541229f,
+   -1.000000000f,  0.000000000f,
+   -0.999698819f, -0.024541229f,
+   -0.998795456f, -0.049067674f,
+   -0.997290457f, -0.073564564f,
+   -0.995184727f, -0.098017140f,
+   -0.992479535f, -0.122410675f,
+   -0.989176510f, -0.146730474f,
+   -0.985277642f, -0.170961889f,
+   -0.980785280f, -0.195090322f,
+   -0.975702130f, -0.219101240f,
+   -0.970031253f, -0.242980180f,
+   -0.963776066f, -0.266712757f,
+   -0.956940336f, -0.290284677f,
+   -0.949528181f, -0.313681740f,
+   -0.941544065f, -0.336889853f,
+   -0.932992799f, -0.359895037f,
+   -0.923879533f, -0.382683432f,
+   -0.914209756f, -0.405241314f,
+   -0.903989293f, -0.427555093f,
+   -0.893224301f, -0.449611330f,
+   -0.881921264f, -0.471396737f,
+   -0.870086991f, -0.492898192f,
+   -0.857728610f, -0.514102744f,
+   -0.844853565f, -0.534997620f,
+   -0.831469612f, -0.555570233f,
+   -0.817584813f, -0.575808191f,
+   -0.803207531f, -0.595699304f,
+   -0.788346428f, -0.615231591f,
+   -0.773010453f, -0.634393284f,
+   -0.757208847f, -0.653172843f,
+   -0.740951125f, -0.671558955f,
+   -0.724247083f, -0.689540545f,
+   -0.707106781f, -0.707106781f,
+   -0.689540545f, -0.724247083f,
+   -0.671558955f, -0.740951125f,
+   -0.653172843f, -0.757208847f,
+   -0.634393284f, -0.773010453f,
+   -0.615231591f, -0.788346428f,
+   -0.595699304f, -0.803207531f,
+   -0.575808191f, -0.817584813f,
+   -0.555570233f, -0.831469612f,
+   -0.534997620f, -0.844853565f,
+   -0.514102744f, -0.857728610f,
+   -0.492898192f, -0.870086991f,
+   -0.471396737f, -0.881921264f,
+   -0.449611330f, -0.893224301f,
+   -0.427555093f, -0.903989293f,
+   -0.405241314f, -0.914209756f,
+   -0.382683432f, -0.923879533f,
+   -0.359895037f, -0.932992799f,
+   -0.336889853f, -0.941544065f,
+   -0.313681740f, -0.949528181f,
+   -0.290284677f, -0.956940336f,
+   -0.266712757f, -0.963776066f,
+   -0.242980180f, -0.970031253f,
+   -0.219101240f, -0.975702130f,
+   -0.195090322f, -0.980785280f,
+   -0.170961889f, -0.985277642f,
+   -0.146730474f, -0.989176510f,
+   -0.122410675f, -0.992479535f,
+   -0.098017140f, -0.995184727f,
+   -0.073564564f, -0.997290457f,
+   -0.049067674f, -0.998795456f,
+   -0.024541229f, -0.999698819f,
+   -0.000000000f, -1.000000000f,
+    0.024541229f, -0.999698819f,
+    0.049067674f, -0.998795456f,
+    0.073564564f, -0.997290457f,
+    0.098017140f, -0.995184727f,
+    0.122410675f, -0.992479535f,
+    0.146730474f, -0.989176510f,
+    0.170961889f, -0.985277642f,
+    0.195090322f, -0.980785280f,
+    0.219101240f, -0.975702130f,
+    0.242980180f, -0.970031253f,
+    0.266712757f, -0.963776066f,
+    0.290284677f, -0.956940336f,
+    0.313681740f, -0.949528181f,
+    0.336889853f, -0.941544065f,
+    0.359895037f, -0.932992799f,
+    0.382683432f, -0.923879533f,
+    0.405241314f, -0.914209756f,
+    0.427555093f, -0.903989293f,
+    0.449611330f, -0.893224301f,
+    0.471396737f, -0.881921264f,
+    0.492898192f, -0.870086991f,
+    0.514102744f, -0.857728610f,
+    0.534997620f, -0.844853565f,
+    0.555570233f, -0.831469612f,
+    0.575808191f, -0.817584813f,
+    0.595699304f, -0.803207531f,
+    0.615231591f, -0.788346428f,
+    0.634393284f, -0.773010453f,
+    0.653172843f, -0.757208847f,
+    0.671558955f, -0.740951125f,
+    0.689540545f, -0.724247083f,
+    0.707106781f, -0.707106781f,
+    0.724247083f, -0.689540545f,
+    0.740951125f, -0.671558955f,
+    0.757208847f, -0.653172843f,
+    0.773010453f, -0.634393284f,
+    0.788346428f, -0.615231591f,
+    0.803207531f, -0.595699304f,
+    0.817584813f, -0.575808191f,
+    0.831469612f, -0.555570233f,
+    0.844853565f, -0.534997620f,
+    0.857728610f, -0.514102744f,
+    0.870086991f, -0.492898192f,
+    0.881921264f, -0.471396737f,
+    0.893224301f, -0.449611330f,
+    0.903989293f, -0.427555093f,
+    0.914209756f, -0.405241314f,
+    0.923879533f, -0.382683432f,
+    0.932992799f, -0.359895037f,
+    0.941544065f, -0.336889853f,
+    0.949528181f, -0.313681740f,
+    0.956940336f, -0.290284677f,
+    0.963776066f, -0.266712757f,
+    0.970031253f, -0.242980180f,
+    0.975702130f, -0.219101240f,
+    0.980785280f, -0.195090322f,
+    0.985277642f, -0.170961889f,
+    0.989176510f, -0.146730474f,
+    0.992479535f, -0.122410675f,
+    0.995184727f, -0.098017140f,
+    0.997290457f, -0.073564564f,
+    0.998795456f, -0.049067674f,
+    0.999698819f, -0.024541229f
+};
+
+/**    
+* \par    
+* Example code for Floating-point Twiddle factors Generation:    
+* \par    
+* <pre>for(i = 0; i < N; i++)
+* {    
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);    
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 512 and PI = 3.14159265358979
+* \par    
+* Cos and Sin values are in interleaved fashion    
+*     
+*/
+const float32_t twiddleCoef_512[1024] = {
+    1.000000000f,  0.000000000f,
+    0.999924702f,  0.012271538f,
+    0.999698819f,  0.024541229f,
+    0.999322385f,  0.036807223f,
+    0.998795456f,  0.049067674f,
+    0.998118113f,  0.061320736f,
+    0.997290457f,  0.073564564f,
+    0.996312612f,  0.085797312f,
+    0.995184727f,  0.098017140f,
+    0.993906970f,  0.110222207f,
+    0.992479535f,  0.122410675f,
+    0.990902635f,  0.134580709f,
+    0.989176510f,  0.146730474f,
+    0.987301418f,  0.158858143f,
+    0.985277642f,  0.170961889f,
+    0.983105487f,  0.183039888f,
+    0.980785280f,  0.195090322f,
+    0.978317371f,  0.207111376f,
+    0.975702130f,  0.219101240f,
+    0.972939952f,  0.231058108f,
+    0.970031253f,  0.242980180f,
+    0.966976471f,  0.254865660f,
+    0.963776066f,  0.266712757f,
+    0.960430519f,  0.278519689f,
+    0.956940336f,  0.290284677f,
+    0.953306040f,  0.302005949f,
+    0.949528181f,  0.313681740f,
+    0.945607325f,  0.325310292f,
+    0.941544065f,  0.336889853f,
+    0.937339012f,  0.348418680f,
+    0.932992799f,  0.359895037f,
+    0.928506080f,  0.371317194f,
+    0.923879533f,  0.382683432f,
+    0.919113852f,  0.393992040f,
+    0.914209756f,  0.405241314f,
+    0.909167983f,  0.416429560f,
+    0.903989293f,  0.427555093f,
+    0.898674466f,  0.438616239f,
+    0.893224301f,  0.449611330f,
+    0.887639620f,  0.460538711f,
+    0.881921264f,  0.471396737f,
+    0.876070094f,  0.482183772f,
+    0.870086991f,  0.492898192f,
+    0.863972856f,  0.503538384f,
+    0.857728610f,  0.514102744f,
+    0.851355193f,  0.524589683f,
+    0.844853565f,  0.534997620f,
+    0.838224706f,  0.545324988f,
+    0.831469612f,  0.555570233f,
+    0.824589303f,  0.565731811f,
+    0.817584813f,  0.575808191f,
+    0.810457198f,  0.585797857f,
+    0.803207531f,  0.595699304f,
+    0.795836905f,  0.605511041f,
+    0.788346428f,  0.615231591f,
+    0.780737229f,  0.624859488f,
+    0.773010453f,  0.634393284f,
+    0.765167266f,  0.643831543f,
+    0.757208847f,  0.653172843f,
+    0.749136395f,  0.662415778f,
+    0.740951125f,  0.671558955f,
+    0.732654272f,  0.680600998f,
+    0.724247083f,  0.689540545f,
+    0.715730825f,  0.698376249f,
+    0.707106781f,  0.707106781f,
+    0.698376249f,  0.715730825f,
+    0.689540545f,  0.724247083f,
+    0.680600998f,  0.732654272f,
+    0.671558955f,  0.740951125f,
+    0.662415778f,  0.749136395f,
+    0.653172843f,  0.757208847f,
+    0.643831543f,  0.765167266f,
+    0.634393284f,  0.773010453f,
+    0.624859488f,  0.780737229f,
+    0.615231591f,  0.788346428f,
+    0.605511041f,  0.795836905f,
+    0.595699304f,  0.803207531f,
+    0.585797857f,  0.810457198f,
+    0.575808191f,  0.817584813f,
+    0.565731811f,  0.824589303f,
+    0.555570233f,  0.831469612f,
+    0.545324988f,  0.838224706f,
+    0.534997620f,  0.844853565f,
+    0.524589683f,  0.851355193f,
+    0.514102744f,  0.857728610f,
+    0.503538384f,  0.863972856f,
+    0.492898192f,  0.870086991f,
+    0.482183772f,  0.876070094f,
+    0.471396737f,  0.881921264f,
+    0.460538711f,  0.887639620f,
+    0.449611330f,  0.893224301f,
+    0.438616239f,  0.898674466f,
+    0.427555093f,  0.903989293f,
+    0.416429560f,  0.909167983f,
+    0.405241314f,  0.914209756f,
+    0.393992040f,  0.919113852f,
+    0.382683432f,  0.923879533f,
+    0.371317194f,  0.928506080f,
+    0.359895037f,  0.932992799f,
+    0.348418680f,  0.937339012f,
+    0.336889853f,  0.941544065f,
+    0.325310292f,  0.945607325f,
+    0.313681740f,  0.949528181f,
+    0.302005949f,  0.953306040f,
+    0.290284677f,  0.956940336f,
+    0.278519689f,  0.960430519f,
+    0.266712757f,  0.963776066f,
+    0.254865660f,  0.966976471f,
+    0.242980180f,  0.970031253f,
+    0.231058108f,  0.972939952f,
+    0.219101240f,  0.975702130f,
+    0.207111376f,  0.978317371f,
+    0.195090322f,  0.980785280f,
+    0.183039888f,  0.983105487f,
+    0.170961889f,  0.985277642f,
+    0.158858143f,  0.987301418f,
+    0.146730474f,  0.989176510f,
+    0.134580709f,  0.990902635f,
+    0.122410675f,  0.992479535f,
+    0.110222207f,  0.993906970f,
+    0.098017140f,  0.995184727f,
+    0.085797312f,  0.996312612f,
+    0.073564564f,  0.997290457f,
+    0.061320736f,  0.998118113f,
+    0.049067674f,  0.998795456f,
+    0.036807223f,  0.999322385f,
+    0.024541229f,  0.999698819f,
+    0.012271538f,  0.999924702f,
+    0.000000000f,  1.000000000f,
+   -0.012271538f,  0.999924702f,
+   -0.024541229f,  0.999698819f,
+   -0.036807223f,  0.999322385f,
+   -0.049067674f,  0.998795456f,
+   -0.061320736f,  0.998118113f,
+   -0.073564564f,  0.997290457f,
+   -0.085797312f,  0.996312612f,
+   -0.098017140f,  0.995184727f,
+   -0.110222207f,  0.993906970f,
+   -0.122410675f,  0.992479535f,
+   -0.134580709f,  0.990902635f,
+   -0.146730474f,  0.989176510f,
+   -0.158858143f,  0.987301418f,
+   -0.170961889f,  0.985277642f,
+   -0.183039888f,  0.983105487f,
+   -0.195090322f,  0.980785280f,
+   -0.207111376f,  0.978317371f,
+   -0.219101240f,  0.975702130f,
+   -0.231058108f,  0.972939952f,
+   -0.242980180f,  0.970031253f,
+   -0.254865660f,  0.966976471f,
+   -0.266712757f,  0.963776066f,
+   -0.278519689f,  0.960430519f,
+   -0.290284677f,  0.956940336f,
+   -0.302005949f,  0.953306040f,
+   -0.313681740f,  0.949528181f,
+   -0.325310292f,  0.945607325f,
+   -0.336889853f,  0.941544065f,
+   -0.348418680f,  0.937339012f,
+   -0.359895037f,  0.932992799f,
+   -0.371317194f,  0.928506080f,
+   -0.382683432f,  0.923879533f,
+   -0.393992040f,  0.919113852f,
+   -0.405241314f,  0.914209756f,
+   -0.416429560f,  0.909167983f,
+   -0.427555093f,  0.903989293f,
+   -0.438616239f,  0.898674466f,
+   -0.449611330f,  0.893224301f,
+   -0.460538711f,  0.887639620f,
+   -0.471396737f,  0.881921264f,
+   -0.482183772f,  0.876070094f,
+   -0.492898192f,  0.870086991f,
+   -0.503538384f,  0.863972856f,
+   -0.514102744f,  0.857728610f,
+   -0.524589683f,  0.851355193f,
+   -0.534997620f,  0.844853565f,
+   -0.545324988f,  0.838224706f,
+   -0.555570233f,  0.831469612f,
+   -0.565731811f,  0.824589303f,
+   -0.575808191f,  0.817584813f,
+   -0.585797857f,  0.810457198f,
+   -0.595699304f,  0.803207531f,
+   -0.605511041f,  0.795836905f,
+   -0.615231591f,  0.788346428f,
+   -0.624859488f,  0.780737229f,
+   -0.634393284f,  0.773010453f,
+   -0.643831543f,  0.765167266f,
+   -0.653172843f,  0.757208847f,
+   -0.662415778f,  0.749136395f,
+   -0.671558955f,  0.740951125f,
+   -0.680600998f,  0.732654272f,
+   -0.689540545f,  0.724247083f,
+   -0.698376249f,  0.715730825f,
+   -0.707106781f,  0.707106781f,
+   -0.715730825f,  0.698376249f,
+   -0.724247083f,  0.689540545f,
+   -0.732654272f,  0.680600998f,
+   -0.740951125f,  0.671558955f,
+   -0.749136395f,  0.662415778f,
+   -0.757208847f,  0.653172843f,
+   -0.765167266f,  0.643831543f,
+   -0.773010453f,  0.634393284f,
+   -0.780737229f,  0.624859488f,
+   -0.788346428f,  0.615231591f,
+   -0.795836905f,  0.605511041f,
+   -0.803207531f,  0.595699304f,
+   -0.810457198f,  0.585797857f,
+   -0.817584813f,  0.575808191f,
+   -0.824589303f,  0.565731811f,
+   -0.831469612f,  0.555570233f,
+   -0.838224706f,  0.545324988f,
+   -0.844853565f,  0.534997620f,
+   -0.851355193f,  0.524589683f,
+   -0.857728610f,  0.514102744f,
+   -0.863972856f,  0.503538384f,
+   -0.870086991f,  0.492898192f,
+   -0.876070094f,  0.482183772f,
+   -0.881921264f,  0.471396737f,
+   -0.887639620f,  0.460538711f,
+   -0.893224301f,  0.449611330f,
+   -0.898674466f,  0.438616239f,
+   -0.903989293f,  0.427555093f,
+   -0.909167983f,  0.416429560f,
+   -0.914209756f,  0.405241314f,
+   -0.919113852f,  0.393992040f,
+   -0.923879533f,  0.382683432f,
+   -0.928506080f,  0.371317194f,
+   -0.932992799f,  0.359895037f,
+   -0.937339012f,  0.348418680f,
+   -0.941544065f,  0.336889853f,
+   -0.945607325f,  0.325310292f,
+   -0.949528181f,  0.313681740f,
+   -0.953306040f,  0.302005949f,
+   -0.956940336f,  0.290284677f,
+   -0.960430519f,  0.278519689f,
+   -0.963776066f,  0.266712757f,
+   -0.966976471f,  0.254865660f,
+   -0.970031253f,  0.242980180f,
+   -0.972939952f,  0.231058108f,
+   -0.975702130f,  0.219101240f,
+   -0.978317371f,  0.207111376f,
+   -0.980785280f,  0.195090322f,
+   -0.983105487f,  0.183039888f,
+   -0.985277642f,  0.170961889f,
+   -0.987301418f,  0.158858143f,
+   -0.989176510f,  0.146730474f,
+   -0.990902635f,  0.134580709f,
+   -0.992479535f,  0.122410675f,
+   -0.993906970f,  0.110222207f,
+   -0.995184727f,  0.098017140f,
+   -0.996312612f,  0.085797312f,
+   -0.997290457f,  0.073564564f,
+   -0.998118113f,  0.061320736f,
+   -0.998795456f,  0.049067674f,
+   -0.999322385f,  0.036807223f,
+   -0.999698819f,  0.024541229f,
+   -0.999924702f,  0.012271538f,
+   -1.000000000f,  0.000000000f,
+   -0.999924702f, -0.012271538f,
+   -0.999698819f, -0.024541229f,
+   -0.999322385f, -0.036807223f,
+   -0.998795456f, -0.049067674f,
+   -0.998118113f, -0.061320736f,
+   -0.997290457f, -0.073564564f,
+   -0.996312612f, -0.085797312f,
+   -0.995184727f, -0.098017140f,
+   -0.993906970f, -0.110222207f,
+   -0.992479535f, -0.122410675f,
+   -0.990902635f, -0.134580709f,
+   -0.989176510f, -0.146730474f,
+   -0.987301418f, -0.158858143f,
+   -0.985277642f, -0.170961889f,
+   -0.983105487f, -0.183039888f,
+   -0.980785280f, -0.195090322f,
+   -0.978317371f, -0.207111376f,
+   -0.975702130f, -0.219101240f,
+   -0.972939952f, -0.231058108f,
+   -0.970031253f, -0.242980180f,
+   -0.966976471f, -0.254865660f,
+   -0.963776066f, -0.266712757f,
+   -0.960430519f, -0.278519689f,
+   -0.956940336f, -0.290284677f,
+   -0.953306040f, -0.302005949f,
+   -0.949528181f, -0.313681740f,
+   -0.945607325f, -0.325310292f,
+   -0.941544065f, -0.336889853f,
+   -0.937339012f, -0.348418680f,
+   -0.932992799f, -0.359895037f,
+   -0.928506080f, -0.371317194f,
+   -0.923879533f, -0.382683432f,
+   -0.919113852f, -0.393992040f,
+   -0.914209756f, -0.405241314f,
+   -0.909167983f, -0.416429560f,
+   -0.903989293f, -0.427555093f,
+   -0.898674466f, -0.438616239f,
+   -0.893224301f, -0.449611330f,
+   -0.887639620f, -0.460538711f,
+   -0.881921264f, -0.471396737f,
+   -0.876070094f, -0.482183772f,
+   -0.870086991f, -0.492898192f,
+   -0.863972856f, -0.503538384f,
+   -0.857728610f, -0.514102744f,
+   -0.851355193f, -0.524589683f,
+   -0.844853565f, -0.534997620f,
+   -0.838224706f, -0.545324988f,
+   -0.831469612f, -0.555570233f,
+   -0.824589303f, -0.565731811f,
+   -0.817584813f, -0.575808191f,
+   -0.810457198f, -0.585797857f,
+   -0.803207531f, -0.595699304f,
+   -0.795836905f, -0.605511041f,
+   -0.788346428f, -0.615231591f,
+   -0.780737229f, -0.624859488f,
+   -0.773010453f, -0.634393284f,
+   -0.765167266f, -0.643831543f,
+   -0.757208847f, -0.653172843f,
+   -0.749136395f, -0.662415778f,
+   -0.740951125f, -0.671558955f,
+   -0.732654272f, -0.680600998f,
+   -0.724247083f, -0.689540545f,
+   -0.715730825f, -0.698376249f,
+   -0.707106781f, -0.707106781f,
+   -0.698376249f, -0.715730825f,
+   -0.689540545f, -0.724247083f,
+   -0.680600998f, -0.732654272f,
+   -0.671558955f, -0.740951125f,
+   -0.662415778f, -0.749136395f,
+   -0.653172843f, -0.757208847f,
+   -0.643831543f, -0.765167266f,
+   -0.634393284f, -0.773010453f,
+   -0.624859488f, -0.780737229f,
+   -0.615231591f, -0.788346428f,
+   -0.605511041f, -0.795836905f,
+   -0.595699304f, -0.803207531f,
+   -0.585797857f, -0.810457198f,
+   -0.575808191f, -0.817584813f,
+   -0.565731811f, -0.824589303f,
+   -0.555570233f, -0.831469612f,
+   -0.545324988f, -0.838224706f,
+   -0.534997620f, -0.844853565f,
+   -0.524589683f, -0.851355193f,
+   -0.514102744f, -0.857728610f,
+   -0.503538384f, -0.863972856f,
+   -0.492898192f, -0.870086991f,
+   -0.482183772f, -0.876070094f,
+   -0.471396737f, -0.881921264f,
+   -0.460538711f, -0.887639620f,
+   -0.449611330f, -0.893224301f,
+   -0.438616239f, -0.898674466f,
+   -0.427555093f, -0.903989293f,
+   -0.416429560f, -0.909167983f,
+   -0.405241314f, -0.914209756f,
+   -0.393992040f, -0.919113852f,
+   -0.382683432f, -0.923879533f,
+   -0.371317194f, -0.928506080f,
+   -0.359895037f, -0.932992799f,
+   -0.348418680f, -0.937339012f,
+   -0.336889853f, -0.941544065f,
+   -0.325310292f, -0.945607325f,
+   -0.313681740f, -0.949528181f,
+   -0.302005949f, -0.953306040f,
+   -0.290284677f, -0.956940336f,
+   -0.278519689f, -0.960430519f,
+   -0.266712757f, -0.963776066f,
+   -0.254865660f, -0.966976471f,
+   -0.242980180f, -0.970031253f,
+   -0.231058108f, -0.972939952f,
+   -0.219101240f, -0.975702130f,
+   -0.207111376f, -0.978317371f,
+   -0.195090322f, -0.980785280f,
+   -0.183039888f, -0.983105487f,
+   -0.170961889f, -0.985277642f,
+   -0.158858143f, -0.987301418f,
+   -0.146730474f, -0.989176510f,
+   -0.134580709f, -0.990902635f,
+   -0.122410675f, -0.992479535f,
+   -0.110222207f, -0.993906970f,
+   -0.098017140f, -0.995184727f,
+   -0.085797312f, -0.996312612f,
+   -0.073564564f, -0.997290457f,
+   -0.061320736f, -0.998118113f,
+   -0.049067674f, -0.998795456f,
+   -0.036807223f, -0.999322385f,
+   -0.024541229f, -0.999698819f,
+   -0.012271538f, -0.999924702f,
+   -0.000000000f, -1.000000000f,
+    0.012271538f, -0.999924702f,
+    0.024541229f, -0.999698819f,
+    0.036807223f, -0.999322385f,
+    0.049067674f, -0.998795456f,
+    0.061320736f, -0.998118113f,
+    0.073564564f, -0.997290457f,
+    0.085797312f, -0.996312612f,
+    0.098017140f, -0.995184727f,
+    0.110222207f, -0.993906970f,
+    0.122410675f, -0.992479535f,
+    0.134580709f, -0.990902635f,
+    0.146730474f, -0.989176510f,
+    0.158858143f, -0.987301418f,
+    0.170961889f, -0.985277642f,
+    0.183039888f, -0.983105487f,
+    0.195090322f, -0.980785280f,
+    0.207111376f, -0.978317371f,
+    0.219101240f, -0.975702130f,
+    0.231058108f, -0.972939952f,
+    0.242980180f, -0.970031253f,
+    0.254865660f, -0.966976471f,
+    0.266712757f, -0.963776066f,
+    0.278519689f, -0.960430519f,
+    0.290284677f, -0.956940336f,
+    0.302005949f, -0.953306040f,
+    0.313681740f, -0.949528181f,
+    0.325310292f, -0.945607325f,
+    0.336889853f, -0.941544065f,
+    0.348418680f, -0.937339012f,
+    0.359895037f, -0.932992799f,
+    0.371317194f, -0.928506080f,
+    0.382683432f, -0.923879533f,
+    0.393992040f, -0.919113852f,
+    0.405241314f, -0.914209756f,
+    0.416429560f, -0.909167983f,
+    0.427555093f, -0.903989293f,
+    0.438616239f, -0.898674466f,
+    0.449611330f, -0.893224301f,
+    0.460538711f, -0.887639620f,
+    0.471396737f, -0.881921264f,
+    0.482183772f, -0.876070094f,
+    0.492898192f, -0.870086991f,
+    0.503538384f, -0.863972856f,
+    0.514102744f, -0.857728610f,
+    0.524589683f, -0.851355193f,
+    0.534997620f, -0.844853565f,
+    0.545324988f, -0.838224706f,
+    0.555570233f, -0.831469612f,
+    0.565731811f, -0.824589303f,
+    0.575808191f, -0.817584813f,
+    0.585797857f, -0.810457198f,
+    0.595699304f, -0.803207531f,
+    0.605511041f, -0.795836905f,
+    0.615231591f, -0.788346428f,
+    0.624859488f, -0.780737229f,
+    0.634393284f, -0.773010453f,
+    0.643831543f, -0.765167266f,
+    0.653172843f, -0.757208847f,
+    0.662415778f, -0.749136395f,
+    0.671558955f, -0.740951125f,
+    0.680600998f, -0.732654272f,
+    0.689540545f, -0.724247083f,
+    0.698376249f, -0.715730825f,
+    0.707106781f, -0.707106781f,
+    0.715730825f, -0.698376249f,
+    0.724247083f, -0.689540545f,
+    0.732654272f, -0.680600998f,
+    0.740951125f, -0.671558955f,
+    0.749136395f, -0.662415778f,
+    0.757208847f, -0.653172843f,
+    0.765167266f, -0.643831543f,
+    0.773010453f, -0.634393284f,
+    0.780737229f, -0.624859488f,
+    0.788346428f, -0.615231591f,
+    0.795836905f, -0.605511041f,
+    0.803207531f, -0.595699304f,
+    0.810457198f, -0.585797857f,
+    0.817584813f, -0.575808191f,
+    0.824589303f, -0.565731811f,
+    0.831469612f, -0.555570233f,
+    0.838224706f, -0.545324988f,
+    0.844853565f, -0.534997620f,
+    0.851355193f, -0.524589683f,
+    0.857728610f, -0.514102744f,
+    0.863972856f, -0.503538384f,
+    0.870086991f, -0.492898192f,
+    0.876070094f, -0.482183772f,
+    0.881921264f, -0.471396737f,
+    0.887639620f, -0.460538711f,
+    0.893224301f, -0.449611330f,
+    0.898674466f, -0.438616239f,
+    0.903989293f, -0.427555093f,
+    0.909167983f, -0.416429560f,
+    0.914209756f, -0.405241314f,
+    0.919113852f, -0.393992040f,
+    0.923879533f, -0.382683432f,
+    0.928506080f, -0.371317194f,
+    0.932992799f, -0.359895037f,
+    0.937339012f, -0.348418680f,
+    0.941544065f, -0.336889853f,
+    0.945607325f, -0.325310292f,
+    0.949528181f, -0.313681740f,
+    0.953306040f, -0.302005949f,
+    0.956940336f, -0.290284677f,
+    0.960430519f, -0.278519689f,
+    0.963776066f, -0.266712757f,
+    0.966976471f, -0.254865660f,
+    0.970031253f, -0.242980180f,
+    0.972939952f, -0.231058108f,
+    0.975702130f, -0.219101240f,
+    0.978317371f, -0.207111376f,
+    0.980785280f, -0.195090322f,
+    0.983105487f, -0.183039888f,
+    0.985277642f, -0.170961889f,
+    0.987301418f, -0.158858143f,
+    0.989176510f, -0.146730474f,
+    0.990902635f, -0.134580709f,
+    0.992479535f, -0.122410675f,
+    0.993906970f, -0.110222207f,
+    0.995184727f, -0.098017140f,
+    0.996312612f, -0.085797312f,
+    0.997290457f, -0.073564564f,
+    0.998118113f, -0.061320736f,
+    0.998795456f, -0.049067674f,
+    0.999322385f, -0.036807223f,
+    0.999698819f, -0.024541229f,
+    0.999924702f, -0.012271538f
+};
+/**    
+* \par    
+* Example code for Floating-point Twiddle factors Generation:    
+* \par    
+* <pre>for(i = 0; i < N; i++)
+* {    
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);    
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 1024 and PI = 3.14159265358979
+* \par    
+* Cos and Sin values are in interleaved fashion    
+*     
+*/
+const float32_t twiddleCoef_1024[2048] = {
+1.000000000f	,	0.000000000f	,
+0.999981175f	,	0.006135885f	,
+0.999924702f	,	0.012271538f	,
+0.999830582f	,	0.018406730f	,
+0.999698819f	,	0.024541229f	,
+0.999529418f	,	0.030674803f	,
+0.999322385f	,	0.036807223f	,
+0.999077728f	,	0.042938257f	,
+0.998795456f	,	0.049067674f	,
+0.998475581f	,	0.055195244f	,
+0.998118113f	,	0.061320736f	,
+0.997723067f	,	0.067443920f	,
+0.997290457f	,	0.073564564f	,
+0.996820299f	,	0.079682438f	,
+0.996312612f	,	0.085797312f	,
+0.995767414f	,	0.091908956f	,
+0.995184727f	,	0.098017140f	,
+0.994564571f	,	0.104121634f	,
+0.993906970f	,	0.110222207f	,
+0.993211949f	,	0.116318631f	,
+0.992479535f	,	0.122410675f	,
+0.991709754f	,	0.128498111f	,
+0.990902635f	,	0.134580709f	,
+0.990058210f	,	0.140658239f	,
+0.989176510f	,	0.146730474f	,
+0.988257568f	,	0.152797185f	,
+0.987301418f	,	0.158858143f	,
+0.986308097f	,	0.164913120f	,
+0.985277642f	,	0.170961889f	,
+0.984210092f	,	0.177004220f	,
+0.983105487f	,	0.183039888f	,
+0.981963869f	,	0.189068664f	,
+0.980785280f	,	0.195090322f	,
+0.979569766f	,	0.201104635f	,
+0.978317371f	,	0.207111376f	,
+0.977028143f	,	0.213110320f	,
+0.975702130f	,	0.219101240f	,
+0.974339383f	,	0.225083911f	,
+0.972939952f	,	0.231058108f	,
+0.971503891f	,	0.237023606f	,
+0.970031253f	,	0.242980180f	,
+0.968522094f	,	0.248927606f	,
+0.966976471f	,	0.254865660f	,
+0.965394442f	,	0.260794118f	,
+0.963776066f	,	0.266712757f	,
+0.962121404f	,	0.272621355f	,
+0.960430519f	,	0.278519689f	,
+0.958703475f	,	0.284407537f	,
+0.956940336f	,	0.290284677f	,
+0.955141168f	,	0.296150888f	,
+0.953306040f	,	0.302005949f	,
+0.951435021f	,	0.307849640f	,
+0.949528181f	,	0.313681740f	,
+0.947585591f	,	0.319502031f	,
+0.945607325f	,	0.325310292f	,
+0.943593458f	,	0.331106306f	,
+0.941544065f	,	0.336889853f	,
+0.939459224f	,	0.342660717f	,
+0.937339012f	,	0.348418680f	,
+0.935183510f	,	0.354163525f	,
+0.932992799f	,	0.359895037f	,
+0.930766961f	,	0.365612998f	,
+0.928506080f	,	0.371317194f	,
+0.926210242f	,	0.377007410f	,
+0.923879533f	,	0.382683432f	,
+0.921514039f	,	0.388345047f	,
+0.919113852f	,	0.393992040f	,
+0.916679060f	,	0.399624200f	,
+0.914209756f	,	0.405241314f	,
+0.911706032f	,	0.410843171f	,
+0.909167983f	,	0.416429560f	,
+0.906595705f	,	0.422000271f	,
+0.903989293f	,	0.427555093f	,
+0.901348847f	,	0.433093819f	,
+0.898674466f	,	0.438616239f	,
+0.895966250f	,	0.444122145f	,
+0.893224301f	,	0.449611330f	,
+0.890448723f	,	0.455083587f	,
+0.887639620f	,	0.460538711f	,
+0.884797098f	,	0.465976496f	,
+0.881921264f	,	0.471396737f	,
+0.879012226f	,	0.476799230f	,
+0.876070094f	,	0.482183772f	,
+0.873094978f	,	0.487550160f	,
+0.870086991f	,	0.492898192f	,
+0.867046246f	,	0.498227667f	,
+0.863972856f	,	0.503538384f	,
+0.860866939f	,	0.508830143f	,
+0.857728610f	,	0.514102744f	,
+0.854557988f	,	0.519355990f	,
+0.851355193f	,	0.524589683f	,
+0.848120345f	,	0.529803625f	,
+0.844853565f	,	0.534997620f	,
+0.841554977f	,	0.540171473f	,
+0.838224706f	,	0.545324988f	,
+0.834862875f	,	0.550457973f	,
+0.831469612f	,	0.555570233f	,
+0.828045045f	,	0.560661576f	,
+0.824589303f	,	0.565731811f	,
+0.821102515f	,	0.570780746f	,
+0.817584813f	,	0.575808191f	,
+0.814036330f	,	0.580813958f	,
+0.810457198f	,	0.585797857f	,
+0.806847554f	,	0.590759702f	,
+0.803207531f	,	0.595699304f	,
+0.799537269f	,	0.600616479f	,
+0.795836905f	,	0.605511041f	,
+0.792106577f	,	0.610382806f	,
+0.788346428f	,	0.615231591f	,
+0.784556597f	,	0.620057212f	,
+0.780737229f	,	0.624859488f	,
+0.776888466f	,	0.629638239f	,
+0.773010453f	,	0.634393284f	,
+0.769103338f	,	0.639124445f	,
+0.765167266f	,	0.643831543f	,
+0.761202385f	,	0.648514401f	,
+0.757208847f	,	0.653172843f	,
+0.753186799f	,	0.657806693f	,
+0.749136395f	,	0.662415778f	,
+0.745057785f	,	0.666999922f	,
+0.740951125f	,	0.671558955f	,
+0.736816569f	,	0.676092704f	,
+0.732654272f	,	0.680600998f	,
+0.728464390f	,	0.685083668f	,
+0.724247083f	,	0.689540545f	,
+0.720002508f	,	0.693971461f	,
+0.715730825f	,	0.698376249f	,
+0.711432196f	,	0.702754744f	,
+0.707106781f	,	0.707106781f	,
+0.702754744f	,	0.711432196f	,
+0.698376249f	,	0.715730825f	,
+0.693971461f	,	0.720002508f	,
+0.689540545f	,	0.724247083f	,
+0.685083668f	,	0.728464390f	,
+0.680600998f	,	0.732654272f	,
+0.676092704f	,	0.736816569f	,
+0.671558955f	,	0.740951125f	,
+0.666999922f	,	0.745057785f	,
+0.662415778f	,	0.749136395f	,
+0.657806693f	,	0.753186799f	,
+0.653172843f	,	0.757208847f	,
+0.648514401f	,	0.761202385f	,
+0.643831543f	,	0.765167266f	,
+0.639124445f	,	0.769103338f	,
+0.634393284f	,	0.773010453f	,
+0.629638239f	,	0.776888466f	,
+0.624859488f	,	0.780737229f	,
+0.620057212f	,	0.784556597f	,
+0.615231591f	,	0.788346428f	,
+0.610382806f	,	0.792106577f	,
+0.605511041f	,	0.795836905f	,
+0.600616479f	,	0.799537269f	,
+0.595699304f	,	0.803207531f	,
+0.590759702f	,	0.806847554f	,
+0.585797857f	,	0.810457198f	,
+0.580813958f	,	0.814036330f	,
+0.575808191f	,	0.817584813f	,
+0.570780746f	,	0.821102515f	,
+0.565731811f	,	0.824589303f	,
+0.560661576f	,	0.828045045f	,
+0.555570233f	,	0.831469612f	,
+0.550457973f	,	0.834862875f	,
+0.545324988f	,	0.838224706f	,
+0.540171473f	,	0.841554977f	,
+0.534997620f	,	0.844853565f	,
+0.529803625f	,	0.848120345f	,
+0.524589683f	,	0.851355193f	,
+0.519355990f	,	0.854557988f	,
+0.514102744f	,	0.857728610f	,
+0.508830143f	,	0.860866939f	,
+0.503538384f	,	0.863972856f	,
+0.498227667f	,	0.867046246f	,
+0.492898192f	,	0.870086991f	,
+0.487550160f	,	0.873094978f	,
+0.482183772f	,	0.876070094f	,
+0.476799230f	,	0.879012226f	,
+0.471396737f	,	0.881921264f	,
+0.465976496f	,	0.884797098f	,
+0.460538711f	,	0.887639620f	,
+0.455083587f	,	0.890448723f	,
+0.449611330f	,	0.893224301f	,
+0.444122145f	,	0.895966250f	,
+0.438616239f	,	0.898674466f	,
+0.433093819f	,	0.901348847f	,
+0.427555093f	,	0.903989293f	,
+0.422000271f	,	0.906595705f	,
+0.416429560f	,	0.909167983f	,
+0.410843171f	,	0.911706032f	,
+0.405241314f	,	0.914209756f	,
+0.399624200f	,	0.916679060f	,
+0.393992040f	,	0.919113852f	,
+0.388345047f	,	0.921514039f	,
+0.382683432f	,	0.923879533f	,
+0.377007410f	,	0.926210242f	,
+0.371317194f	,	0.928506080f	,
+0.365612998f	,	0.930766961f	,
+0.359895037f	,	0.932992799f	,
+0.354163525f	,	0.935183510f	,
+0.348418680f	,	0.937339012f	,
+0.342660717f	,	0.939459224f	,
+0.336889853f	,	0.941544065f	,
+0.331106306f	,	0.943593458f	,
+0.325310292f	,	0.945607325f	,
+0.319502031f	,	0.947585591f	,
+0.313681740f	,	0.949528181f	,
+0.307849640f	,	0.951435021f	,
+0.302005949f	,	0.953306040f	,
+0.296150888f	,	0.955141168f	,
+0.290284677f	,	0.956940336f	,
+0.284407537f	,	0.958703475f	,
+0.278519689f	,	0.960430519f	,
+0.272621355f	,	0.962121404f	,
+0.266712757f	,	0.963776066f	,
+0.260794118f	,	0.965394442f	,
+0.254865660f	,	0.966976471f	,
+0.248927606f	,	0.968522094f	,
+0.242980180f	,	0.970031253f	,
+0.237023606f	,	0.971503891f	,
+0.231058108f	,	0.972939952f	,
+0.225083911f	,	0.974339383f	,
+0.219101240f	,	0.975702130f	,
+0.213110320f	,	0.977028143f	,
+0.207111376f	,	0.978317371f	,
+0.201104635f	,	0.979569766f	,
+0.195090322f	,	0.980785280f	,
+0.189068664f	,	0.981963869f	,
+0.183039888f	,	0.983105487f	,
+0.177004220f	,	0.984210092f	,
+0.170961889f	,	0.985277642f	,
+0.164913120f	,	0.986308097f	,
+0.158858143f	,	0.987301418f	,
+0.152797185f	,	0.988257568f	,
+0.146730474f	,	0.989176510f	,
+0.140658239f	,	0.990058210f	,
+0.134580709f	,	0.990902635f	,
+0.128498111f	,	0.991709754f	,
+0.122410675f	,	0.992479535f	,
+0.116318631f	,	0.993211949f	,
+0.110222207f	,	0.993906970f	,
+0.104121634f	,	0.994564571f	,
+0.098017140f	,	0.995184727f	,
+0.091908956f	,	0.995767414f	,
+0.085797312f	,	0.996312612f	,
+0.079682438f	,	0.996820299f	,
+0.073564564f	,	0.997290457f	,
+0.067443920f	,	0.997723067f	,
+0.061320736f	,	0.998118113f	,
+0.055195244f	,	0.998475581f	,
+0.049067674f	,	0.998795456f	,
+0.042938257f	,	0.999077728f	,
+0.036807223f	,	0.999322385f	,
+0.030674803f	,	0.999529418f	,
+0.024541229f	,	0.999698819f	,
+0.018406730f	,	0.999830582f	,
+0.012271538f	,	0.999924702f	,
+0.006135885f	,	0.999981175f	,
+0.000000000f	,	1.000000000f	,
+-0.006135885f	,	0.999981175f	,
+-0.012271538f	,	0.999924702f	,
+-0.018406730f	,	0.999830582f	,
+-0.024541229f	,	0.999698819f	,
+-0.030674803f	,	0.999529418f	,
+-0.036807223f	,	0.999322385f	,
+-0.042938257f	,	0.999077728f	,
+-0.049067674f	,	0.998795456f	,
+-0.055195244f	,	0.998475581f	,
+-0.061320736f	,	0.998118113f	,
+-0.067443920f	,	0.997723067f	,
+-0.073564564f	,	0.997290457f	,
+-0.079682438f	,	0.996820299f	,
+-0.085797312f	,	0.996312612f	,
+-0.091908956f	,	0.995767414f	,
+-0.098017140f	,	0.995184727f	,
+-0.104121634f	,	0.994564571f	,
+-0.110222207f	,	0.993906970f	,
+-0.116318631f	,	0.993211949f	,
+-0.122410675f	,	0.992479535f	,
+-0.128498111f	,	0.991709754f	,
+-0.134580709f	,	0.990902635f	,
+-0.140658239f	,	0.990058210f	,
+-0.146730474f	,	0.989176510f	,
+-0.152797185f	,	0.988257568f	,
+-0.158858143f	,	0.987301418f	,
+-0.164913120f	,	0.986308097f	,
+-0.170961889f	,	0.985277642f	,
+-0.177004220f	,	0.984210092f	,
+-0.183039888f	,	0.983105487f	,
+-0.189068664f	,	0.981963869f	,
+-0.195090322f	,	0.980785280f	,
+-0.201104635f	,	0.979569766f	,
+-0.207111376f	,	0.978317371f	,
+-0.213110320f	,	0.977028143f	,
+-0.219101240f	,	0.975702130f	,
+-0.225083911f	,	0.974339383f	,
+-0.231058108f	,	0.972939952f	,
+-0.237023606f	,	0.971503891f	,
+-0.242980180f	,	0.970031253f	,
+-0.248927606f	,	0.968522094f	,
+-0.254865660f	,	0.966976471f	,
+-0.260794118f	,	0.965394442f	,
+-0.266712757f	,	0.963776066f	,
+-0.272621355f	,	0.962121404f	,
+-0.278519689f	,	0.960430519f	,
+-0.284407537f	,	0.958703475f	,
+-0.290284677f	,	0.956940336f	,
+-0.296150888f	,	0.955141168f	,
+-0.302005949f	,	0.953306040f	,
+-0.307849640f	,	0.951435021f	,
+-0.313681740f	,	0.949528181f	,
+-0.319502031f	,	0.947585591f	,
+-0.325310292f	,	0.945607325f	,
+-0.331106306f	,	0.943593458f	,
+-0.336889853f	,	0.941544065f	,
+-0.342660717f	,	0.939459224f	,
+-0.348418680f	,	0.937339012f	,
+-0.354163525f	,	0.935183510f	,
+-0.359895037f	,	0.932992799f	,
+-0.365612998f	,	0.930766961f	,
+-0.371317194f	,	0.928506080f	,
+-0.377007410f	,	0.926210242f	,
+-0.382683432f	,	0.923879533f	,
+-0.388345047f	,	0.921514039f	,
+-0.393992040f	,	0.919113852f	,
+-0.399624200f	,	0.916679060f	,
+-0.405241314f	,	0.914209756f	,
+-0.410843171f	,	0.911706032f	,
+-0.416429560f	,	0.909167983f	,
+-0.422000271f	,	0.906595705f	,
+-0.427555093f	,	0.903989293f	,
+-0.433093819f	,	0.901348847f	,
+-0.438616239f	,	0.898674466f	,
+-0.444122145f	,	0.895966250f	,
+-0.449611330f	,	0.893224301f	,
+-0.455083587f	,	0.890448723f	,
+-0.460538711f	,	0.887639620f	,
+-0.465976496f	,	0.884797098f	,
+-0.471396737f	,	0.881921264f	,
+-0.476799230f	,	0.879012226f	,
+-0.482183772f	,	0.876070094f	,
+-0.487550160f	,	0.873094978f	,
+-0.492898192f	,	0.870086991f	,
+-0.498227667f	,	0.867046246f	,
+-0.503538384f	,	0.863972856f	,
+-0.508830143f	,	0.860866939f	,
+-0.514102744f	,	0.857728610f	,
+-0.519355990f	,	0.854557988f	,
+-0.524589683f	,	0.851355193f	,
+-0.529803625f	,	0.848120345f	,
+-0.534997620f	,	0.844853565f	,
+-0.540171473f	,	0.841554977f	,
+-0.545324988f	,	0.838224706f	,
+-0.550457973f	,	0.834862875f	,
+-0.555570233f	,	0.831469612f	,
+-0.560661576f	,	0.828045045f	,
+-0.565731811f	,	0.824589303f	,
+-0.570780746f	,	0.821102515f	,
+-0.575808191f	,	0.817584813f	,
+-0.580813958f	,	0.814036330f	,
+-0.585797857f	,	0.810457198f	,
+-0.590759702f	,	0.806847554f	,
+-0.595699304f	,	0.803207531f	,
+-0.600616479f	,	0.799537269f	,
+-0.605511041f	,	0.795836905f	,
+-0.610382806f	,	0.792106577f	,
+-0.615231591f	,	0.788346428f	,
+-0.620057212f	,	0.784556597f	,
+-0.624859488f	,	0.780737229f	,
+-0.629638239f	,	0.776888466f	,
+-0.634393284f	,	0.773010453f	,
+-0.639124445f	,	0.769103338f	,
+-0.643831543f	,	0.765167266f	,
+-0.648514401f	,	0.761202385f	,
+-0.653172843f	,	0.757208847f	,
+-0.657806693f	,	0.753186799f	,
+-0.662415778f	,	0.749136395f	,
+-0.666999922f	,	0.745057785f	,
+-0.671558955f	,	0.740951125f	,
+-0.676092704f	,	0.736816569f	,
+-0.680600998f	,	0.732654272f	,
+-0.685083668f	,	0.728464390f	,
+-0.689540545f	,	0.724247083f	,
+-0.693971461f	,	0.720002508f	,
+-0.698376249f	,	0.715730825f	,
+-0.702754744f	,	0.711432196f	,
+-0.707106781f	,	0.707106781f	,
+-0.711432196f	,	0.702754744f	,
+-0.715730825f	,	0.698376249f	,
+-0.720002508f	,	0.693971461f	,
+-0.724247083f	,	0.689540545f	,
+-0.728464390f	,	0.685083668f	,
+-0.732654272f	,	0.680600998f	,
+-0.736816569f	,	0.676092704f	,
+-0.740951125f	,	0.671558955f	,
+-0.745057785f	,	0.666999922f	,
+-0.749136395f	,	0.662415778f	,
+-0.753186799f	,	0.657806693f	,
+-0.757208847f	,	0.653172843f	,
+-0.761202385f	,	0.648514401f	,
+-0.765167266f	,	0.643831543f	,
+-0.769103338f	,	0.639124445f	,
+-0.773010453f	,	0.634393284f	,
+-0.776888466f	,	0.629638239f	,
+-0.780737229f	,	0.624859488f	,
+-0.784556597f	,	0.620057212f	,
+-0.788346428f	,	0.615231591f	,
+-0.792106577f	,	0.610382806f	,
+-0.795836905f	,	0.605511041f	,
+-0.799537269f	,	0.600616479f	,
+-0.803207531f	,	0.595699304f	,
+-0.806847554f	,	0.590759702f	,
+-0.810457198f	,	0.585797857f	,
+-0.814036330f	,	0.580813958f	,
+-0.817584813f	,	0.575808191f	,
+-0.821102515f	,	0.570780746f	,
+-0.824589303f	,	0.565731811f	,
+-0.828045045f	,	0.560661576f	,
+-0.831469612f	,	0.555570233f	,
+-0.834862875f	,	0.550457973f	,
+-0.838224706f	,	0.545324988f	,
+-0.841554977f	,	0.540171473f	,
+-0.844853565f	,	0.534997620f	,
+-0.848120345f	,	0.529803625f	,
+-0.851355193f	,	0.524589683f	,
+-0.854557988f	,	0.519355990f	,
+-0.857728610f	,	0.514102744f	,
+-0.860866939f	,	0.508830143f	,
+-0.863972856f	,	0.503538384f	,
+-0.867046246f	,	0.498227667f	,
+-0.870086991f	,	0.492898192f	,
+-0.873094978f	,	0.487550160f	,
+-0.876070094f	,	0.482183772f	,
+-0.879012226f	,	0.476799230f	,
+-0.881921264f	,	0.471396737f	,
+-0.884797098f	,	0.465976496f	,
+-0.887639620f	,	0.460538711f	,
+-0.890448723f	,	0.455083587f	,
+-0.893224301f	,	0.449611330f	,
+-0.895966250f	,	0.444122145f	,
+-0.898674466f	,	0.438616239f	,
+-0.901348847f	,	0.433093819f	,
+-0.903989293f	,	0.427555093f	,
+-0.906595705f	,	0.422000271f	,
+-0.909167983f	,	0.416429560f	,
+-0.911706032f	,	0.410843171f	,
+-0.914209756f	,	0.405241314f	,
+-0.916679060f	,	0.399624200f	,
+-0.919113852f	,	0.393992040f	,
+-0.921514039f	,	0.388345047f	,
+-0.923879533f	,	0.382683432f	,
+-0.926210242f	,	0.377007410f	,
+-0.928506080f	,	0.371317194f	,
+-0.930766961f	,	0.365612998f	,
+-0.932992799f	,	0.359895037f	,
+-0.935183510f	,	0.354163525f	,
+-0.937339012f	,	0.348418680f	,
+-0.939459224f	,	0.342660717f	,
+-0.941544065f	,	0.336889853f	,
+-0.943593458f	,	0.331106306f	,
+-0.945607325f	,	0.325310292f	,
+-0.947585591f	,	0.319502031f	,
+-0.949528181f	,	0.313681740f	,
+-0.951435021f	,	0.307849640f	,
+-0.953306040f	,	0.302005949f	,
+-0.955141168f	,	0.296150888f	,
+-0.956940336f	,	0.290284677f	,
+-0.958703475f	,	0.284407537f	,
+-0.960430519f	,	0.278519689f	,
+-0.962121404f	,	0.272621355f	,
+-0.963776066f	,	0.266712757f	,
+-0.965394442f	,	0.260794118f	,
+-0.966976471f	,	0.254865660f	,
+-0.968522094f	,	0.248927606f	,
+-0.970031253f	,	0.242980180f	,
+-0.971503891f	,	0.237023606f	,
+-0.972939952f	,	0.231058108f	,
+-0.974339383f	,	0.225083911f	,
+-0.975702130f	,	0.219101240f	,
+-0.977028143f	,	0.213110320f	,
+-0.978317371f	,	0.207111376f	,
+-0.979569766f	,	0.201104635f	,
+-0.980785280f	,	0.195090322f	,
+-0.981963869f	,	0.189068664f	,
+-0.983105487f	,	0.183039888f	,
+-0.984210092f	,	0.177004220f	,
+-0.985277642f	,	0.170961889f	,
+-0.986308097f	,	0.164913120f	,
+-0.987301418f	,	0.158858143f	,
+-0.988257568f	,	0.152797185f	,
+-0.989176510f	,	0.146730474f	,
+-0.990058210f	,	0.140658239f	,
+-0.990902635f	,	0.134580709f	,
+-0.991709754f	,	0.128498111f	,
+-0.992479535f	,	0.122410675f	,
+-0.993211949f	,	0.116318631f	,
+-0.993906970f	,	0.110222207f	,
+-0.994564571f	,	0.104121634f	,
+-0.995184727f	,	0.098017140f	,
+-0.995767414f	,	0.091908956f	,
+-0.996312612f	,	0.085797312f	,
+-0.996820299f	,	0.079682438f	,
+-0.997290457f	,	0.073564564f	,
+-0.997723067f	,	0.067443920f	,
+-0.998118113f	,	0.061320736f	,
+-0.998475581f	,	0.055195244f	,
+-0.998795456f	,	0.049067674f	,
+-0.999077728f	,	0.042938257f	,
+-0.999322385f	,	0.036807223f	,
+-0.999529418f	,	0.030674803f	,
+-0.999698819f	,	0.024541229f	,
+-0.999830582f	,	0.018406730f	,
+-0.999924702f	,	0.012271538f	,
+-0.999981175f	,	0.006135885f	,
+-1.000000000f	,	0.000000000f	,
+-0.999981175f	,	-0.006135885f	,
+-0.999924702f	,	-0.012271538f	,
+-0.999830582f	,	-0.018406730f	,
+-0.999698819f	,	-0.024541229f	,
+-0.999529418f	,	-0.030674803f	,
+-0.999322385f	,	-0.036807223f	,
+-0.999077728f	,	-0.042938257f	,
+-0.998795456f	,	-0.049067674f	,
+-0.998475581f	,	-0.055195244f	,
+-0.998118113f	,	-0.061320736f	,
+-0.997723067f	,	-0.067443920f	,
+-0.997290457f	,	-0.073564564f	,
+-0.996820299f	,	-0.079682438f	,
+-0.996312612f	,	-0.085797312f	,
+-0.995767414f	,	-0.091908956f	,
+-0.995184727f	,	-0.098017140f	,
+-0.994564571f	,	-0.104121634f	,
+-0.993906970f	,	-0.110222207f	,
+-0.993211949f	,	-0.116318631f	,
+-0.992479535f	,	-0.122410675f	,
+-0.991709754f	,	-0.128498111f	,
+-0.990902635f	,	-0.134580709f	,
+-0.990058210f	,	-0.140658239f	,
+-0.989176510f	,	-0.146730474f	,
+-0.988257568f	,	-0.152797185f	,
+-0.987301418f	,	-0.158858143f	,
+-0.986308097f	,	-0.164913120f	,
+-0.985277642f	,	-0.170961889f	,
+-0.984210092f	,	-0.177004220f	,
+-0.983105487f	,	-0.183039888f	,
+-0.981963869f	,	-0.189068664f	,
+-0.980785280f	,	-0.195090322f	,
+-0.979569766f	,	-0.201104635f	,
+-0.978317371f	,	-0.207111376f	,
+-0.977028143f	,	-0.213110320f	,
+-0.975702130f	,	-0.219101240f	,
+-0.974339383f	,	-0.225083911f	,
+-0.972939952f	,	-0.231058108f	,
+-0.971503891f	,	-0.237023606f	,
+-0.970031253f	,	-0.242980180f	,
+-0.968522094f	,	-0.248927606f	,
+-0.966976471f	,	-0.254865660f	,
+-0.965394442f	,	-0.260794118f	,
+-0.963776066f	,	-0.266712757f	,
+-0.962121404f	,	-0.272621355f	,
+-0.960430519f	,	-0.278519689f	,
+-0.958703475f	,	-0.284407537f	,
+-0.956940336f	,	-0.290284677f	,
+-0.955141168f	,	-0.296150888f	,
+-0.953306040f	,	-0.302005949f	,
+-0.951435021f	,	-0.307849640f	,
+-0.949528181f	,	-0.313681740f	,
+-0.947585591f	,	-0.319502031f	,
+-0.945607325f	,	-0.325310292f	,
+-0.943593458f	,	-0.331106306f	,
+-0.941544065f	,	-0.336889853f	,
+-0.939459224f	,	-0.342660717f	,
+-0.937339012f	,	-0.348418680f	,
+-0.935183510f	,	-0.354163525f	,
+-0.932992799f	,	-0.359895037f	,
+-0.930766961f	,	-0.365612998f	,
+-0.928506080f	,	-0.371317194f	,
+-0.926210242f	,	-0.377007410f	,
+-0.923879533f	,	-0.382683432f	,
+-0.921514039f	,	-0.388345047f	,
+-0.919113852f	,	-0.393992040f	,
+-0.916679060f	,	-0.399624200f	,
+-0.914209756f	,	-0.405241314f	,
+-0.911706032f	,	-0.410843171f	,
+-0.909167983f	,	-0.416429560f	,
+-0.906595705f	,	-0.422000271f	,
+-0.903989293f	,	-0.427555093f	,
+-0.901348847f	,	-0.433093819f	,
+-0.898674466f	,	-0.438616239f	,
+-0.895966250f	,	-0.444122145f	,
+-0.893224301f	,	-0.449611330f	,
+-0.890448723f	,	-0.455083587f	,
+-0.887639620f	,	-0.460538711f	,
+-0.884797098f	,	-0.465976496f	,
+-0.881921264f	,	-0.471396737f	,
+-0.879012226f	,	-0.476799230f	,
+-0.876070094f	,	-0.482183772f	,
+-0.873094978f	,	-0.487550160f	,
+-0.870086991f	,	-0.492898192f	,
+-0.867046246f	,	-0.498227667f	,
+-0.863972856f	,	-0.503538384f	,
+-0.860866939f	,	-0.508830143f	,
+-0.857728610f	,	-0.514102744f	,
+-0.854557988f	,	-0.519355990f	,
+-0.851355193f	,	-0.524589683f	,
+-0.848120345f	,	-0.529803625f	,
+-0.844853565f	,	-0.534997620f	,
+-0.841554977f	,	-0.540171473f	,
+-0.838224706f	,	-0.545324988f	,
+-0.834862875f	,	-0.550457973f	,
+-0.831469612f	,	-0.555570233f	,
+-0.828045045f	,	-0.560661576f	,
+-0.824589303f	,	-0.565731811f	,
+-0.821102515f	,	-0.570780746f	,
+-0.817584813f	,	-0.575808191f	,
+-0.814036330f	,	-0.580813958f	,
+-0.810457198f	,	-0.585797857f	,
+-0.806847554f	,	-0.590759702f	,
+-0.803207531f	,	-0.595699304f	,
+-0.799537269f	,	-0.600616479f	,
+-0.795836905f	,	-0.605511041f	,
+-0.792106577f	,	-0.610382806f	,
+-0.788346428f	,	-0.615231591f	,
+-0.784556597f	,	-0.620057212f	,
+-0.780737229f	,	-0.624859488f	,
+-0.776888466f	,	-0.629638239f	,
+-0.773010453f	,	-0.634393284f	,
+-0.769103338f	,	-0.639124445f	,
+-0.765167266f	,	-0.643831543f	,
+-0.761202385f	,	-0.648514401f	,
+-0.757208847f	,	-0.653172843f	,
+-0.753186799f	,	-0.657806693f	,
+-0.749136395f	,	-0.662415778f	,
+-0.745057785f	,	-0.666999922f	,
+-0.740951125f	,	-0.671558955f	,
+-0.736816569f	,	-0.676092704f	,
+-0.732654272f	,	-0.680600998f	,
+-0.728464390f	,	-0.685083668f	,
+-0.724247083f	,	-0.689540545f	,
+-0.720002508f	,	-0.693971461f	,
+-0.715730825f	,	-0.698376249f	,
+-0.711432196f	,	-0.702754744f	,
+-0.707106781f	,	-0.707106781f	,
+-0.702754744f	,	-0.711432196f	,
+-0.698376249f	,	-0.715730825f	,
+-0.693971461f	,	-0.720002508f	,
+-0.689540545f	,	-0.724247083f	,
+-0.685083668f	,	-0.728464390f	,
+-0.680600998f	,	-0.732654272f	,
+-0.676092704f	,	-0.736816569f	,
+-0.671558955f	,	-0.740951125f	,
+-0.666999922f	,	-0.745057785f	,
+-0.662415778f	,	-0.749136395f	,
+-0.657806693f	,	-0.753186799f	,
+-0.653172843f	,	-0.757208847f	,
+-0.648514401f	,	-0.761202385f	,
+-0.643831543f	,	-0.765167266f	,
+-0.639124445f	,	-0.769103338f	,
+-0.634393284f	,	-0.773010453f	,
+-0.629638239f	,	-0.776888466f	,
+-0.624859488f	,	-0.780737229f	,
+-0.620057212f	,	-0.784556597f	,
+-0.615231591f	,	-0.788346428f	,
+-0.610382806f	,	-0.792106577f	,
+-0.605511041f	,	-0.795836905f	,
+-0.600616479f	,	-0.799537269f	,
+-0.595699304f	,	-0.803207531f	,
+-0.590759702f	,	-0.806847554f	,
+-0.585797857f	,	-0.810457198f	,
+-0.580813958f	,	-0.814036330f	,
+-0.575808191f	,	-0.817584813f	,
+-0.570780746f	,	-0.821102515f	,
+-0.565731811f	,	-0.824589303f	,
+-0.560661576f	,	-0.828045045f	,
+-0.555570233f	,	-0.831469612f	,
+-0.550457973f	,	-0.834862875f	,
+-0.545324988f	,	-0.838224706f	,
+-0.540171473f	,	-0.841554977f	,
+-0.534997620f	,	-0.844853565f	,
+-0.529803625f	,	-0.848120345f	,
+-0.524589683f	,	-0.851355193f	,
+-0.519355990f	,	-0.854557988f	,
+-0.514102744f	,	-0.857728610f	,
+-0.508830143f	,	-0.860866939f	,
+-0.503538384f	,	-0.863972856f	,
+-0.498227667f	,	-0.867046246f	,
+-0.492898192f	,	-0.870086991f	,
+-0.487550160f	,	-0.873094978f	,
+-0.482183772f	,	-0.876070094f	,
+-0.476799230f	,	-0.879012226f	,
+-0.471396737f	,	-0.881921264f	,
+-0.465976496f	,	-0.884797098f	,
+-0.460538711f	,	-0.887639620f	,
+-0.455083587f	,	-0.890448723f	,
+-0.449611330f	,	-0.893224301f	,
+-0.444122145f	,	-0.895966250f	,
+-0.438616239f	,	-0.898674466f	,
+-0.433093819f	,	-0.901348847f	,
+-0.427555093f	,	-0.903989293f	,
+-0.422000271f	,	-0.906595705f	,
+-0.416429560f	,	-0.909167983f	,
+-0.410843171f	,	-0.911706032f	,
+-0.405241314f	,	-0.914209756f	,
+-0.399624200f	,	-0.916679060f	,
+-0.393992040f	,	-0.919113852f	,
+-0.388345047f	,	-0.921514039f	,
+-0.382683432f	,	-0.923879533f	,
+-0.377007410f	,	-0.926210242f	,
+-0.371317194f	,	-0.928506080f	,
+-0.365612998f	,	-0.930766961f	,
+-0.359895037f	,	-0.932992799f	,
+-0.354163525f	,	-0.935183510f	,
+-0.348418680f	,	-0.937339012f	,
+-0.342660717f	,	-0.939459224f	,
+-0.336889853f	,	-0.941544065f	,
+-0.331106306f	,	-0.943593458f	,
+-0.325310292f	,	-0.945607325f	,
+-0.319502031f	,	-0.947585591f	,
+-0.313681740f	,	-0.949528181f	,
+-0.307849640f	,	-0.951435021f	,
+-0.302005949f	,	-0.953306040f	,
+-0.296150888f	,	-0.955141168f	,
+-0.290284677f	,	-0.956940336f	,
+-0.284407537f	,	-0.958703475f	,
+-0.278519689f	,	-0.960430519f	,
+-0.272621355f	,	-0.962121404f	,
+-0.266712757f	,	-0.963776066f	,
+-0.260794118f	,	-0.965394442f	,
+-0.254865660f	,	-0.966976471f	,
+-0.248927606f	,	-0.968522094f	,
+-0.242980180f	,	-0.970031253f	,
+-0.237023606f	,	-0.971503891f	,
+-0.231058108f	,	-0.972939952f	,
+-0.225083911f	,	-0.974339383f	,
+-0.219101240f	,	-0.975702130f	,
+-0.213110320f	,	-0.977028143f	,
+-0.207111376f	,	-0.978317371f	,
+-0.201104635f	,	-0.979569766f	,
+-0.195090322f	,	-0.980785280f	,
+-0.189068664f	,	-0.981963869f	,
+-0.183039888f	,	-0.983105487f	,
+-0.177004220f	,	-0.984210092f	,
+-0.170961889f	,	-0.985277642f	,
+-0.164913120f	,	-0.986308097f	,
+-0.158858143f	,	-0.987301418f	,
+-0.152797185f	,	-0.988257568f	,
+-0.146730474f	,	-0.989176510f	,
+-0.140658239f	,	-0.990058210f	,
+-0.134580709f	,	-0.990902635f	,
+-0.128498111f	,	-0.991709754f	,
+-0.122410675f	,	-0.992479535f	,
+-0.116318631f	,	-0.993211949f	,
+-0.110222207f	,	-0.993906970f	,
+-0.104121634f	,	-0.994564571f	,
+-0.098017140f	,	-0.995184727f	,
+-0.091908956f	,	-0.995767414f	,
+-0.085797312f	,	-0.996312612f	,
+-0.079682438f	,	-0.996820299f	,
+-0.073564564f	,	-0.997290457f	,
+-0.067443920f	,	-0.997723067f	,
+-0.061320736f	,	-0.998118113f	,
+-0.055195244f	,	-0.998475581f	,
+-0.049067674f	,	-0.998795456f	,
+-0.042938257f	,	-0.999077728f	,
+-0.036807223f	,	-0.999322385f	,
+-0.030674803f	,	-0.999529418f	,
+-0.024541229f	,	-0.999698819f	,
+-0.018406730f	,	-0.999830582f	,
+-0.012271538f	,	-0.999924702f	,
+-0.006135885f	,	-0.999981175f	,
+-0.000000000f	,	-1.000000000f	,
+0.006135885f	,	-0.999981175f	,
+0.012271538f	,	-0.999924702f	,
+0.018406730f	,	-0.999830582f	,
+0.024541229f	,	-0.999698819f	,
+0.030674803f	,	-0.999529418f	,
+0.036807223f	,	-0.999322385f	,
+0.042938257f	,	-0.999077728f	,
+0.049067674f	,	-0.998795456f	,
+0.055195244f	,	-0.998475581f	,
+0.061320736f	,	-0.998118113f	,
+0.067443920f	,	-0.997723067f	,
+0.073564564f	,	-0.997290457f	,
+0.079682438f	,	-0.996820299f	,
+0.085797312f	,	-0.996312612f	,
+0.091908956f	,	-0.995767414f	,
+0.098017140f	,	-0.995184727f	,
+0.104121634f	,	-0.994564571f	,
+0.110222207f	,	-0.993906970f	,
+0.116318631f	,	-0.993211949f	,
+0.122410675f	,	-0.992479535f	,
+0.128498111f	,	-0.991709754f	,
+0.134580709f	,	-0.990902635f	,
+0.140658239f	,	-0.990058210f	,
+0.146730474f	,	-0.989176510f	,
+0.152797185f	,	-0.988257568f	,
+0.158858143f	,	-0.987301418f	,
+0.164913120f	,	-0.986308097f	,
+0.170961889f	,	-0.985277642f	,
+0.177004220f	,	-0.984210092f	,
+0.183039888f	,	-0.983105487f	,
+0.189068664f	,	-0.981963869f	,
+0.195090322f	,	-0.980785280f	,
+0.201104635f	,	-0.979569766f	,
+0.207111376f	,	-0.978317371f	,
+0.213110320f	,	-0.977028143f	,
+0.219101240f	,	-0.975702130f	,
+0.225083911f	,	-0.974339383f	,
+0.231058108f	,	-0.972939952f	,
+0.237023606f	,	-0.971503891f	,
+0.242980180f	,	-0.970031253f	,
+0.248927606f	,	-0.968522094f	,
+0.254865660f	,	-0.966976471f	,
+0.260794118f	,	-0.965394442f	,
+0.266712757f	,	-0.963776066f	,
+0.272621355f	,	-0.962121404f	,
+0.278519689f	,	-0.960430519f	,
+0.284407537f	,	-0.958703475f	,
+0.290284677f	,	-0.956940336f	,
+0.296150888f	,	-0.955141168f	,
+0.302005949f	,	-0.953306040f	,
+0.307849640f	,	-0.951435021f	,
+0.313681740f	,	-0.949528181f	,
+0.319502031f	,	-0.947585591f	,
+0.325310292f	,	-0.945607325f	,
+0.331106306f	,	-0.943593458f	,
+0.336889853f	,	-0.941544065f	,
+0.342660717f	,	-0.939459224f	,
+0.348418680f	,	-0.937339012f	,
+0.354163525f	,	-0.935183510f	,
+0.359895037f	,	-0.932992799f	,
+0.365612998f	,	-0.930766961f	,
+0.371317194f	,	-0.928506080f	,
+0.377007410f	,	-0.926210242f	,
+0.382683432f	,	-0.923879533f	,
+0.388345047f	,	-0.921514039f	,
+0.393992040f	,	-0.919113852f	,
+0.399624200f	,	-0.916679060f	,
+0.405241314f	,	-0.914209756f	,
+0.410843171f	,	-0.911706032f	,
+0.416429560f	,	-0.909167983f	,
+0.422000271f	,	-0.906595705f	,
+0.427555093f	,	-0.903989293f	,
+0.433093819f	,	-0.901348847f	,
+0.438616239f	,	-0.898674466f	,
+0.444122145f	,	-0.895966250f	,
+0.449611330f	,	-0.893224301f	,
+0.455083587f	,	-0.890448723f	,
+0.460538711f	,	-0.887639620f	,
+0.465976496f	,	-0.884797098f	,
+0.471396737f	,	-0.881921264f	,
+0.476799230f	,	-0.879012226f	,
+0.482183772f	,	-0.876070094f	,
+0.487550160f	,	-0.873094978f	,
+0.492898192f	,	-0.870086991f	,
+0.498227667f	,	-0.867046246f	,
+0.503538384f	,	-0.863972856f	,
+0.508830143f	,	-0.860866939f	,
+0.514102744f	,	-0.857728610f	,
+0.519355990f	,	-0.854557988f	,
+0.524589683f	,	-0.851355193f	,
+0.529803625f	,	-0.848120345f	,
+0.534997620f	,	-0.844853565f	,
+0.540171473f	,	-0.841554977f	,
+0.545324988f	,	-0.838224706f	,
+0.550457973f	,	-0.834862875f	,
+0.555570233f	,	-0.831469612f	,
+0.560661576f	,	-0.828045045f	,
+0.565731811f	,	-0.824589303f	,
+0.570780746f	,	-0.821102515f	,
+0.575808191f	,	-0.817584813f	,
+0.580813958f	,	-0.814036330f	,
+0.585797857f	,	-0.810457198f	,
+0.590759702f	,	-0.806847554f	,
+0.595699304f	,	-0.803207531f	,
+0.600616479f	,	-0.799537269f	,
+0.605511041f	,	-0.795836905f	,
+0.610382806f	,	-0.792106577f	,
+0.615231591f	,	-0.788346428f	,
+0.620057212f	,	-0.784556597f	,
+0.624859488f	,	-0.780737229f	,
+0.629638239f	,	-0.776888466f	,
+0.634393284f	,	-0.773010453f	,
+0.639124445f	,	-0.769103338f	,
+0.643831543f	,	-0.765167266f	,
+0.648514401f	,	-0.761202385f	,
+0.653172843f	,	-0.757208847f	,
+0.657806693f	,	-0.753186799f	,
+0.662415778f	,	-0.749136395f	,
+0.666999922f	,	-0.745057785f	,
+0.671558955f	,	-0.740951125f	,
+0.676092704f	,	-0.736816569f	,
+0.680600998f	,	-0.732654272f	,
+0.685083668f	,	-0.728464390f	,
+0.689540545f	,	-0.724247083f	,
+0.693971461f	,	-0.720002508f	,
+0.698376249f	,	-0.715730825f	,
+0.702754744f	,	-0.711432196f	,
+0.707106781f	,	-0.707106781f	,
+0.711432196f	,	-0.702754744f	,
+0.715730825f	,	-0.698376249f	,
+0.720002508f	,	-0.693971461f	,
+0.724247083f	,	-0.689540545f	,
+0.728464390f	,	-0.685083668f	,
+0.732654272f	,	-0.680600998f	,
+0.736816569f	,	-0.676092704f	,
+0.740951125f	,	-0.671558955f	,
+0.745057785f	,	-0.666999922f	,
+0.749136395f	,	-0.662415778f	,
+0.753186799f	,	-0.657806693f	,
+0.757208847f	,	-0.653172843f	,
+0.761202385f	,	-0.648514401f	,
+0.765167266f	,	-0.643831543f	,
+0.769103338f	,	-0.639124445f	,
+0.773010453f	,	-0.634393284f	,
+0.776888466f	,	-0.629638239f	,
+0.780737229f	,	-0.624859488f	,
+0.784556597f	,	-0.620057212f	,
+0.788346428f	,	-0.615231591f	,
+0.792106577f	,	-0.610382806f	,
+0.795836905f	,	-0.605511041f	,
+0.799537269f	,	-0.600616479f	,
+0.803207531f	,	-0.595699304f	,
+0.806847554f	,	-0.590759702f	,
+0.810457198f	,	-0.585797857f	,
+0.814036330f	,	-0.580813958f	,
+0.817584813f	,	-0.575808191f	,
+0.821102515f	,	-0.570780746f	,
+0.824589303f	,	-0.565731811f	,
+0.828045045f	,	-0.560661576f	,
+0.831469612f	,	-0.555570233f	,
+0.834862875f	,	-0.550457973f	,
+0.838224706f	,	-0.545324988f	,
+0.841554977f	,	-0.540171473f	,
+0.844853565f	,	-0.534997620f	,
+0.848120345f	,	-0.529803625f	,
+0.851355193f	,	-0.524589683f	,
+0.854557988f	,	-0.519355990f	,
+0.857728610f	,	-0.514102744f	,
+0.860866939f	,	-0.508830143f	,
+0.863972856f	,	-0.503538384f	,
+0.867046246f	,	-0.498227667f	,
+0.870086991f	,	-0.492898192f	,
+0.873094978f	,	-0.487550160f	,
+0.876070094f	,	-0.482183772f	,
+0.879012226f	,	-0.476799230f	,
+0.881921264f	,	-0.471396737f	,
+0.884797098f	,	-0.465976496f	,
+0.887639620f	,	-0.460538711f	,
+0.890448723f	,	-0.455083587f	,
+0.893224301f	,	-0.449611330f	,
+0.895966250f	,	-0.444122145f	,
+0.898674466f	,	-0.438616239f	,
+0.901348847f	,	-0.433093819f	,
+0.903989293f	,	-0.427555093f	,
+0.906595705f	,	-0.422000271f	,
+0.909167983f	,	-0.416429560f	,
+0.911706032f	,	-0.410843171f	,
+0.914209756f	,	-0.405241314f	,
+0.916679060f	,	-0.399624200f	,
+0.919113852f	,	-0.393992040f	,
+0.921514039f	,	-0.388345047f	,
+0.923879533f	,	-0.382683432f	,
+0.926210242f	,	-0.377007410f	,
+0.928506080f	,	-0.371317194f	,
+0.930766961f	,	-0.365612998f	,
+0.932992799f	,	-0.359895037f	,
+0.935183510f	,	-0.354163525f	,
+0.937339012f	,	-0.348418680f	,
+0.939459224f	,	-0.342660717f	,
+0.941544065f	,	-0.336889853f	,
+0.943593458f	,	-0.331106306f	,
+0.945607325f	,	-0.325310292f	,
+0.947585591f	,	-0.319502031f	,
+0.949528181f	,	-0.313681740f	,
+0.951435021f	,	-0.307849640f	,
+0.953306040f	,	-0.302005949f	,
+0.955141168f	,	-0.296150888f	,
+0.956940336f	,	-0.290284677f	,
+0.958703475f	,	-0.284407537f	,
+0.960430519f	,	-0.278519689f	,
+0.962121404f	,	-0.272621355f	,
+0.963776066f	,	-0.266712757f	,
+0.965394442f	,	-0.260794118f	,
+0.966976471f	,	-0.254865660f	,
+0.968522094f	,	-0.248927606f	,
+0.970031253f	,	-0.242980180f	,
+0.971503891f	,	-0.237023606f	,
+0.972939952f	,	-0.231058108f	,
+0.974339383f	,	-0.225083911f	,
+0.975702130f	,	-0.219101240f	,
+0.977028143f	,	-0.213110320f	,
+0.978317371f	,	-0.207111376f	,
+0.979569766f	,	-0.201104635f	,
+0.980785280f	,	-0.195090322f	,
+0.981963869f	,	-0.189068664f	,
+0.983105487f	,	-0.183039888f	,
+0.984210092f	,	-0.177004220f	,
+0.985277642f	,	-0.170961889f	,
+0.986308097f	,	-0.164913120f	,
+0.987301418f	,	-0.158858143f	,
+0.988257568f	,	-0.152797185f	,
+0.989176510f	,	-0.146730474f	,
+0.990058210f	,	-0.140658239f	,
+0.990902635f	,	-0.134580709f	,
+0.991709754f	,	-0.128498111f	,
+0.992479535f	,	-0.122410675f	,
+0.993211949f	,	-0.116318631f	,
+0.993906970f	,	-0.110222207f	,
+0.994564571f	,	-0.104121634f	,
+0.995184727f	,	-0.098017140f	,
+0.995767414f	,	-0.091908956f	,
+0.996312612f	,	-0.085797312f	,
+0.996820299f	,	-0.079682438f	,
+0.997290457f	,	-0.073564564f	,
+0.997723067f	,	-0.067443920f	,
+0.998118113f	,	-0.061320736f	,
+0.998475581f	,	-0.055195244f	,
+0.998795456f	,	-0.049067674f	,
+0.999077728f	,	-0.042938257f	,
+0.999322385f	,	-0.036807223f	,
+0.999529418f	,	-0.030674803f	,
+0.999698819f	,	-0.024541229f	,
+0.999830582f	,	-0.018406730f	,
+0.999924702f	,	-0.012271538f	,
+0.999981175f	,	-0.006135885f
+};
+
+/**    
+* \par    
+* Example code for Floating-point Twiddle factors Generation:    
+* \par    
+* <pre>for(i = 0; i < N; i++)    
+* {    
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);    
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 2048 and PI = 3.14159265358979    
+* \par    
+* Cos and Sin values are interleaved: even indices hold cos, odd indices hold sin    
+*     
+*/
+const float32_t twiddleCoef_2048[4096] = {
+    1.000000000f,  0.000000000f,
+    0.999995294f,  0.003067957f,
+    0.999981175f,  0.006135885f,
+    0.999957645f,  0.009203755f,
+    0.999924702f,  0.012271538f,
+    0.999882347f,  0.015339206f,
+    0.999830582f,  0.018406730f,
+    0.999769405f,  0.021474080f,
+    0.999698819f,  0.024541229f,
+    0.999618822f,  0.027608146f,
+    0.999529418f,  0.030674803f,
+    0.999430605f,  0.033741172f,
+    0.999322385f,  0.036807223f,
+    0.999204759f,  0.039872928f,
+    0.999077728f,  0.042938257f,
+    0.998941293f,  0.046003182f,
+    0.998795456f,  0.049067674f,
+    0.998640218f,  0.052131705f,
+    0.998475581f,  0.055195244f,
+    0.998301545f,  0.058258265f,
+    0.998118113f,  0.061320736f,
+    0.997925286f,  0.064382631f,
+    0.997723067f,  0.067443920f,
+    0.997511456f,  0.070504573f,
+    0.997290457f,  0.073564564f,
+    0.997060070f,  0.076623861f,
+    0.996820299f,  0.079682438f,
+    0.996571146f,  0.082740265f,
+    0.996312612f,  0.085797312f,
+    0.996044701f,  0.088853553f,
+    0.995767414f,  0.091908956f,
+    0.995480755f,  0.094963495f,
+    0.995184727f,  0.098017140f,
+    0.994879331f,  0.101069863f,
+    0.994564571f,  0.104121634f,
+    0.994240449f,  0.107172425f,
+    0.993906970f,  0.110222207f,
+    0.993564136f,  0.113270952f,
+    0.993211949f,  0.116318631f,
+    0.992850414f,  0.119365215f,
+    0.992479535f,  0.122410675f,
+    0.992099313f,  0.125454983f,
+    0.991709754f,  0.128498111f,
+    0.991310860f,  0.131540029f,
+    0.990902635f,  0.134580709f,
+    0.990485084f,  0.137620122f,
+    0.990058210f,  0.140658239f,
+    0.989622017f,  0.143695033f,
+    0.989176510f,  0.146730474f,
+    0.988721692f,  0.149764535f,
+    0.988257568f,  0.152797185f,
+    0.987784142f,  0.155828398f,
+    0.987301418f,  0.158858143f,
+    0.986809402f,  0.161886394f,
+    0.986308097f,  0.164913120f,
+    0.985797509f,  0.167938295f,
+    0.985277642f,  0.170961889f,
+    0.984748502f,  0.173983873f,
+    0.984210092f,  0.177004220f,
+    0.983662419f,  0.180022901f,
+    0.983105487f,  0.183039888f,
+    0.982539302f,  0.186055152f,
+    0.981963869f,  0.189068664f,
+    0.981379193f,  0.192080397f,
+    0.980785280f,  0.195090322f,
+    0.980182136f,  0.198098411f,
+    0.979569766f,  0.201104635f,
+    0.978948175f,  0.204108966f,
+    0.978317371f,  0.207111376f,
+    0.977677358f,  0.210111837f,
+    0.977028143f,  0.213110320f,
+    0.976369731f,  0.216106797f,
+    0.975702130f,  0.219101240f,
+    0.975025345f,  0.222093621f,
+    0.974339383f,  0.225083911f,
+    0.973644250f,  0.228072083f,
+    0.972939952f,  0.231058108f,
+    0.972226497f,  0.234041959f,
+    0.971503891f,  0.237023606f,
+    0.970772141f,  0.240003022f,
+    0.970031253f,  0.242980180f,
+    0.969281235f,  0.245955050f,
+    0.968522094f,  0.248927606f,
+    0.967753837f,  0.251897818f,
+    0.966976471f,  0.254865660f,
+    0.966190003f,  0.257831102f,
+    0.965394442f,  0.260794118f,
+    0.964589793f,  0.263754679f,
+    0.963776066f,  0.266712757f,
+    0.962953267f,  0.269668326f,
+    0.962121404f,  0.272621355f,
+    0.961280486f,  0.275571819f,
+    0.960430519f,  0.278519689f,
+    0.959571513f,  0.281464938f,
+    0.958703475f,  0.284407537f,
+    0.957826413f,  0.287347460f,
+    0.956940336f,  0.290284677f,
+    0.956045251f,  0.293219163f,
+    0.955141168f,  0.296150888f,
+    0.954228095f,  0.299079826f,
+    0.953306040f,  0.302005949f,
+    0.952375013f,  0.304929230f,
+    0.951435021f,  0.307849640f,
+    0.950486074f,  0.310767153f,
+    0.949528181f,  0.313681740f,
+    0.948561350f,  0.316593376f,
+    0.947585591f,  0.319502031f,
+    0.946600913f,  0.322407679f,
+    0.945607325f,  0.325310292f,
+    0.944604837f,  0.328209844f,
+    0.943593458f,  0.331106306f,
+    0.942573198f,  0.333999651f,
+    0.941544065f,  0.336889853f,
+    0.940506071f,  0.339776884f,
+    0.939459224f,  0.342660717f,
+    0.938403534f,  0.345541325f,
+    0.937339012f,  0.348418680f,
+    0.936265667f,  0.351292756f,
+    0.935183510f,  0.354163525f,
+    0.934092550f,  0.357030961f,
+    0.932992799f,  0.359895037f,
+    0.931884266f,  0.362755724f,
+    0.930766961f,  0.365612998f,
+    0.929640896f,  0.368466830f,
+    0.928506080f,  0.371317194f,
+    0.927362526f,  0.374164063f,
+    0.926210242f,  0.377007410f,
+    0.925049241f,  0.379847209f,
+    0.923879533f,  0.382683432f,
+    0.922701128f,  0.385516054f,
+    0.921514039f,  0.388345047f,
+    0.920318277f,  0.391170384f,
+    0.919113852f,  0.393992040f,
+    0.917900776f,  0.396809987f,
+    0.916679060f,  0.399624200f,
+    0.915448716f,  0.402434651f,
+    0.914209756f,  0.405241314f,
+    0.912962190f,  0.408044163f,
+    0.911706032f,  0.410843171f,
+    0.910441292f,  0.413638312f,
+    0.909167983f,  0.416429560f,
+    0.907886116f,  0.419216888f,
+    0.906595705f,  0.422000271f,
+    0.905296759f,  0.424779681f,
+    0.903989293f,  0.427555093f,
+    0.902673318f,  0.430326481f,
+    0.901348847f,  0.433093819f,
+    0.900015892f,  0.435857080f,
+    0.898674466f,  0.438616239f,
+    0.897324581f,  0.441371269f,
+    0.895966250f,  0.444122145f,
+    0.894599486f,  0.446868840f,
+    0.893224301f,  0.449611330f,
+    0.891840709f,  0.452349587f,
+    0.890448723f,  0.455083587f,
+    0.889048356f,  0.457813304f,
+    0.887639620f,  0.460538711f,
+    0.886222530f,  0.463259784f,
+    0.884797098f,  0.465976496f,
+    0.883363339f,  0.468688822f,
+    0.881921264f,  0.471396737f,
+    0.880470889f,  0.474100215f,
+    0.879012226f,  0.476799230f,
+    0.877545290f,  0.479493758f,
+    0.876070094f,  0.482183772f,
+    0.874586652f,  0.484869248f,
+    0.873094978f,  0.487550160f,
+    0.871595087f,  0.490226483f,
+    0.870086991f,  0.492898192f,
+    0.868570706f,  0.495565262f,
+    0.867046246f,  0.498227667f,
+    0.865513624f,  0.500885383f,
+    0.863972856f,  0.503538384f,
+    0.862423956f,  0.506186645f,
+    0.860866939f,  0.508830143f,
+    0.859301818f,  0.511468850f,
+    0.857728610f,  0.514102744f,
+    0.856147328f,  0.516731799f,
+    0.854557988f,  0.519355990f,
+    0.852960605f,  0.521975293f,
+    0.851355193f,  0.524589683f,
+    0.849741768f,  0.527199135f,
+    0.848120345f,  0.529803625f,
+    0.846490939f,  0.532403128f,
+    0.844853565f,  0.534997620f,
+    0.843208240f,  0.537587076f,
+    0.841554977f,  0.540171473f,
+    0.839893794f,  0.542750785f,
+    0.838224706f,  0.545324988f,
+    0.836547727f,  0.547894059f,
+    0.834862875f,  0.550457973f,
+    0.833170165f,  0.553016706f,
+    0.831469612f,  0.555570233f,
+    0.829761234f,  0.558118531f,
+    0.828045045f,  0.560661576f,
+    0.826321063f,  0.563199344f,
+    0.824589303f,  0.565731811f,
+    0.822849781f,  0.568258953f,
+    0.821102515f,  0.570780746f,
+    0.819347520f,  0.573297167f,
+    0.817584813f,  0.575808191f,
+    0.815814411f,  0.578313796f,
+    0.814036330f,  0.580813958f,
+    0.812250587f,  0.583308653f,
+    0.810457198f,  0.585797857f,
+    0.808656182f,  0.588281548f,
+    0.806847554f,  0.590759702f,
+    0.805031331f,  0.593232295f,
+    0.803207531f,  0.595699304f,
+    0.801376172f,  0.598160707f,
+    0.799537269f,  0.600616479f,
+    0.797690841f,  0.603066599f,
+    0.795836905f,  0.605511041f,
+    0.793975478f,  0.607949785f,
+    0.792106577f,  0.610382806f,
+    0.790230221f,  0.612810082f,
+    0.788346428f,  0.615231591f,
+    0.786455214f,  0.617647308f,
+    0.784556597f,  0.620057212f,
+    0.782650596f,  0.622461279f,
+    0.780737229f,  0.624859488f,
+    0.778816512f,  0.627251815f,
+    0.776888466f,  0.629638239f,
+    0.774953107f,  0.632018736f,
+    0.773010453f,  0.634393284f,
+    0.771060524f,  0.636761861f,
+    0.769103338f,  0.639124445f,
+    0.767138912f,  0.641481013f,
+    0.765167266f,  0.643831543f,
+    0.763188417f,  0.646176013f,
+    0.761202385f,  0.648514401f,
+    0.759209189f,  0.650846685f,
+    0.757208847f,  0.653172843f,
+    0.755201377f,  0.655492853f,
+    0.753186799f,  0.657806693f,
+    0.751165132f,  0.660114342f,
+    0.749136395f,  0.662415778f,
+    0.747100606f,  0.664710978f,
+    0.745057785f,  0.666999922f,
+    0.743007952f,  0.669282588f,
+    0.740951125f,  0.671558955f,
+    0.738887324f,  0.673829000f,
+    0.736816569f,  0.676092704f,
+    0.734738878f,  0.678350043f,
+    0.732654272f,  0.680600998f,
+    0.730562769f,  0.682845546f,
+    0.728464390f,  0.685083668f,
+    0.726359155f,  0.687315341f,
+    0.724247083f,  0.689540545f,
+    0.722128194f,  0.691759258f,
+    0.720002508f,  0.693971461f,
+    0.717870045f,  0.696177131f,
+    0.715730825f,  0.698376249f,
+    0.713584869f,  0.700568794f,
+    0.711432196f,  0.702754744f,
+    0.709272826f,  0.704934080f,
+    0.707106781f,  0.707106781f,
+    0.704934080f,  0.709272826f,
+    0.702754744f,  0.711432196f,
+    0.700568794f,  0.713584869f,
+    0.698376249f,  0.715730825f,
+    0.696177131f,  0.717870045f,
+    0.693971461f,  0.720002508f,
+    0.691759258f,  0.722128194f,
+    0.689540545f,  0.724247083f,
+    0.687315341f,  0.726359155f,
+    0.685083668f,  0.728464390f,
+    0.682845546f,  0.730562769f,
+    0.680600998f,  0.732654272f,
+    0.678350043f,  0.734738878f,
+    0.676092704f,  0.736816569f,
+    0.673829000f,  0.738887324f,
+    0.671558955f,  0.740951125f,
+    0.669282588f,  0.743007952f,
+    0.666999922f,  0.745057785f,
+    0.664710978f,  0.747100606f,
+    0.662415778f,  0.749136395f,
+    0.660114342f,  0.751165132f,
+    0.657806693f,  0.753186799f,
+    0.655492853f,  0.755201377f,
+    0.653172843f,  0.757208847f,
+    0.650846685f,  0.759209189f,
+    0.648514401f,  0.761202385f,
+    0.646176013f,  0.763188417f,
+    0.643831543f,  0.765167266f,
+    0.641481013f,  0.767138912f,
+    0.639124445f,  0.769103338f,
+    0.636761861f,  0.771060524f,
+    0.634393284f,  0.773010453f,
+    0.632018736f,  0.774953107f,
+    0.629638239f,  0.776888466f,
+    0.627251815f,  0.778816512f,
+    0.624859488f,  0.780737229f,
+    0.622461279f,  0.782650596f,
+    0.620057212f,  0.784556597f,
+    0.617647308f,  0.786455214f,
+    0.615231591f,  0.788346428f,
+    0.612810082f,  0.790230221f,
+    0.610382806f,  0.792106577f,
+    0.607949785f,  0.793975478f,
+    0.605511041f,  0.795836905f,
+    0.603066599f,  0.797690841f,
+    0.600616479f,  0.799537269f,
+    0.598160707f,  0.801376172f,
+    0.595699304f,  0.803207531f,
+    0.593232295f,  0.805031331f,
+    0.590759702f,  0.806847554f,
+    0.588281548f,  0.808656182f,
+    0.585797857f,  0.810457198f,
+    0.583308653f,  0.812250587f,
+    0.580813958f,  0.814036330f,
+    0.578313796f,  0.815814411f,
+    0.575808191f,  0.817584813f,
+    0.573297167f,  0.819347520f,
+    0.570780746f,  0.821102515f,
+    0.568258953f,  0.822849781f,
+    0.565731811f,  0.824589303f,
+    0.563199344f,  0.826321063f,
+    0.560661576f,  0.828045045f,
+    0.558118531f,  0.829761234f,
+    0.555570233f,  0.831469612f,
+    0.553016706f,  0.833170165f,
+    0.550457973f,  0.834862875f,
+    0.547894059f,  0.836547727f,
+    0.545324988f,  0.838224706f,
+    0.542750785f,  0.839893794f,
+    0.540171473f,  0.841554977f,
+    0.537587076f,  0.843208240f,
+    0.534997620f,  0.844853565f,
+    0.532403128f,  0.846490939f,
+    0.529803625f,  0.848120345f,
+    0.527199135f,  0.849741768f,
+    0.524589683f,  0.851355193f,
+    0.521975293f,  0.852960605f,
+    0.519355990f,  0.854557988f,
+    0.516731799f,  0.856147328f,
+    0.514102744f,  0.857728610f,
+    0.511468850f,  0.859301818f,
+    0.508830143f,  0.860866939f,
+    0.506186645f,  0.862423956f,
+    0.503538384f,  0.863972856f,
+    0.500885383f,  0.865513624f,
+    0.498227667f,  0.867046246f,
+    0.495565262f,  0.868570706f,
+    0.492898192f,  0.870086991f,
+    0.490226483f,  0.871595087f,
+    0.487550160f,  0.873094978f,
+    0.484869248f,  0.874586652f,
+    0.482183772f,  0.876070094f,
+    0.479493758f,  0.877545290f,
+    0.476799230f,  0.879012226f,
+    0.474100215f,  0.880470889f,
+    0.471396737f,  0.881921264f,
+    0.468688822f,  0.883363339f,
+    0.465976496f,  0.884797098f,
+    0.463259784f,  0.886222530f,
+    0.460538711f,  0.887639620f,
+    0.457813304f,  0.889048356f,
+    0.455083587f,  0.890448723f,
+    0.452349587f,  0.891840709f,
+    0.449611330f,  0.893224301f,
+    0.446868840f,  0.894599486f,
+    0.444122145f,  0.895966250f,
+    0.441371269f,  0.897324581f,
+    0.438616239f,  0.898674466f,
+    0.435857080f,  0.900015892f,
+    0.433093819f,  0.901348847f,
+    0.430326481f,  0.902673318f,
+    0.427555093f,  0.903989293f,
+    0.424779681f,  0.905296759f,
+    0.422000271f,  0.906595705f,
+    0.419216888f,  0.907886116f,
+    0.416429560f,  0.909167983f,
+    0.413638312f,  0.910441292f,
+    0.410843171f,  0.911706032f,
+    0.408044163f,  0.912962190f,
+    0.405241314f,  0.914209756f,
+    0.402434651f,  0.915448716f,
+    0.399624200f,  0.916679060f,
+    0.396809987f,  0.917900776f,
+    0.393992040f,  0.919113852f,
+    0.391170384f,  0.920318277f,
+    0.388345047f,  0.921514039f,
+    0.385516054f,  0.922701128f,
+    0.382683432f,  0.923879533f,
+    0.379847209f,  0.925049241f,
+    0.377007410f,  0.926210242f,
+    0.374164063f,  0.927362526f,
+    0.371317194f,  0.928506080f,
+    0.368466830f,  0.929640896f,
+    0.365612998f,  0.930766961f,
+    0.362755724f,  0.931884266f,
+    0.359895037f,  0.932992799f,
+    0.357030961f,  0.934092550f,
+    0.354163525f,  0.935183510f,
+    0.351292756f,  0.936265667f,
+    0.348418680f,  0.937339012f,
+    0.345541325f,  0.938403534f,
+    0.342660717f,  0.939459224f,
+    0.339776884f,  0.940506071f,
+    0.336889853f,  0.941544065f,
+    0.333999651f,  0.942573198f,
+    0.331106306f,  0.943593458f,
+    0.328209844f,  0.944604837f,
+    0.325310292f,  0.945607325f,
+    0.322407679f,  0.946600913f,
+    0.319502031f,  0.947585591f,
+    0.316593376f,  0.948561350f,
+    0.313681740f,  0.949528181f,
+    0.310767153f,  0.950486074f,
+    0.307849640f,  0.951435021f,
+    0.304929230f,  0.952375013f,
+    0.302005949f,  0.953306040f,
+    0.299079826f,  0.954228095f,
+    0.296150888f,  0.955141168f,
+    0.293219163f,  0.956045251f,
+    0.290284677f,  0.956940336f,
+    0.287347460f,  0.957826413f,
+    0.284407537f,  0.958703475f,
+    0.281464938f,  0.959571513f,
+    0.278519689f,  0.960430519f,
+    0.275571819f,  0.961280486f,
+    0.272621355f,  0.962121404f,
+    0.269668326f,  0.962953267f,
+    0.266712757f,  0.963776066f,
+    0.263754679f,  0.964589793f,
+    0.260794118f,  0.965394442f,
+    0.257831102f,  0.966190003f,
+    0.254865660f,  0.966976471f,
+    0.251897818f,  0.967753837f,
+    0.248927606f,  0.968522094f,
+    0.245955050f,  0.969281235f,
+    0.242980180f,  0.970031253f,
+    0.240003022f,  0.970772141f,
+    0.237023606f,  0.971503891f,
+    0.234041959f,  0.972226497f,
+    0.231058108f,  0.972939952f,
+    0.228072083f,  0.973644250f,
+    0.225083911f,  0.974339383f,
+    0.222093621f,  0.975025345f,
+    0.219101240f,  0.975702130f,
+    0.216106797f,  0.976369731f,
+    0.213110320f,  0.977028143f,
+    0.210111837f,  0.977677358f,
+    0.207111376f,  0.978317371f,
+    0.204108966f,  0.978948175f,
+    0.201104635f,  0.979569766f,
+    0.198098411f,  0.980182136f,
+    0.195090322f,  0.980785280f,
+    0.192080397f,  0.981379193f,
+    0.189068664f,  0.981963869f,
+    0.186055152f,  0.982539302f,
+    0.183039888f,  0.983105487f,
+    0.180022901f,  0.983662419f,
+    0.177004220f,  0.984210092f,
+    0.173983873f,  0.984748502f,
+    0.170961889f,  0.985277642f,
+    0.167938295f,  0.985797509f,
+    0.164913120f,  0.986308097f,
+    0.161886394f,  0.986809402f,
+    0.158858143f,  0.987301418f,
+    0.155828398f,  0.987784142f,
+    0.152797185f,  0.988257568f,
+    0.149764535f,  0.988721692f,
+    0.146730474f,  0.989176510f,
+    0.143695033f,  0.989622017f,
+    0.140658239f,  0.990058210f,
+    0.137620122f,  0.990485084f,
+    0.134580709f,  0.990902635f,
+    0.131540029f,  0.991310860f,
+    0.128498111f,  0.991709754f,
+    0.125454983f,  0.992099313f,
+    0.122410675f,  0.992479535f,
+    0.119365215f,  0.992850414f,
+    0.116318631f,  0.993211949f,
+    0.113270952f,  0.993564136f,
+    0.110222207f,  0.993906970f,
+    0.107172425f,  0.994240449f,
+    0.104121634f,  0.994564571f,
+    0.101069863f,  0.994879331f,
+    0.098017140f,  0.995184727f,
+    0.094963495f,  0.995480755f,
+    0.091908956f,  0.995767414f,
+    0.088853553f,  0.996044701f,
+    0.085797312f,  0.996312612f,
+    0.082740265f,  0.996571146f,
+    0.079682438f,  0.996820299f,
+    0.076623861f,  0.997060070f,
+    0.073564564f,  0.997290457f,
+    0.070504573f,  0.997511456f,
+    0.067443920f,  0.997723067f,
+    0.064382631f,  0.997925286f,
+    0.061320736f,  0.998118113f,
+    0.058258265f,  0.998301545f,
+    0.055195244f,  0.998475581f,
+    0.052131705f,  0.998640218f,
+    0.049067674f,  0.998795456f,
+    0.046003182f,  0.998941293f,
+    0.042938257f,  0.999077728f,
+    0.039872928f,  0.999204759f,
+    0.036807223f,  0.999322385f,
+    0.033741172f,  0.999430605f,
+    0.030674803f,  0.999529418f,
+    0.027608146f,  0.999618822f,
+    0.024541229f,  0.999698819f,
+    0.021474080f,  0.999769405f,
+    0.018406730f,  0.999830582f,
+    0.015339206f,  0.999882347f,
+    0.012271538f,  0.999924702f,
+    0.009203755f,  0.999957645f,
+    0.006135885f,  0.999981175f,
+    0.003067957f,  0.999995294f,
+    0.000000000f,  1.000000000f,
+   -0.003067957f,  0.999995294f,
+   -0.006135885f,  0.999981175f,
+   -0.009203755f,  0.999957645f,
+   -0.012271538f,  0.999924702f,
+   -0.015339206f,  0.999882347f,
+   -0.018406730f,  0.999830582f,
+   -0.021474080f,  0.999769405f,
+   -0.024541229f,  0.999698819f,
+   -0.027608146f,  0.999618822f,
+   -0.030674803f,  0.999529418f,
+   -0.033741172f,  0.999430605f,
+   -0.036807223f,  0.999322385f,
+   -0.039872928f,  0.999204759f,
+   -0.042938257f,  0.999077728f,
+   -0.046003182f,  0.998941293f,
+   -0.049067674f,  0.998795456f,
+   -0.052131705f,  0.998640218f,
+   -0.055195244f,  0.998475581f,
+   -0.058258265f,  0.998301545f,
+   -0.061320736f,  0.998118113f,
+   -0.064382631f,  0.997925286f,
+   -0.067443920f,  0.997723067f,
+   -0.070504573f,  0.997511456f,
+   -0.073564564f,  0.997290457f,
+   -0.076623861f,  0.997060070f,
+   -0.079682438f,  0.996820299f,
+   -0.082740265f,  0.996571146f,
+   -0.085797312f,  0.996312612f,
+   -0.088853553f,  0.996044701f,
+   -0.091908956f,  0.995767414f,
+   -0.094963495f,  0.995480755f,
+   -0.098017140f,  0.995184727f,
+   -0.101069863f,  0.994879331f,
+   -0.104121634f,  0.994564571f,
+   -0.107172425f,  0.994240449f,
+   -0.110222207f,  0.993906970f,
+   -0.113270952f,  0.993564136f,
+   -0.116318631f,  0.993211949f,
+   -0.119365215f,  0.992850414f,
+   -0.122410675f,  0.992479535f,
+   -0.125454983f,  0.992099313f,
+   -0.128498111f,  0.991709754f,
+   -0.131540029f,  0.991310860f,
+   -0.134580709f,  0.990902635f,
+   -0.137620122f,  0.990485084f,
+   -0.140658239f,  0.990058210f,
+   -0.143695033f,  0.989622017f,
+   -0.146730474f,  0.989176510f,
+   -0.149764535f,  0.988721692f,
+   -0.152797185f,  0.988257568f,
+   -0.155828398f,  0.987784142f,
+   -0.158858143f,  0.987301418f,
+   -0.161886394f,  0.986809402f,
+   -0.164913120f,  0.986308097f,
+   -0.167938295f,  0.985797509f,
+   -0.170961889f,  0.985277642f,
+   -0.173983873f,  0.984748502f,
+   -0.177004220f,  0.984210092f,
+   -0.180022901f,  0.983662419f,
+   -0.183039888f,  0.983105487f,
+   -0.186055152f,  0.982539302f,
+   -0.189068664f,  0.981963869f,
+   -0.192080397f,  0.981379193f,
+   -0.195090322f,  0.980785280f,
+   -0.198098411f,  0.980182136f,
+   -0.201104635f,  0.979569766f,
+   -0.204108966f,  0.978948175f,
+   -0.207111376f,  0.978317371f,
+   -0.210111837f,  0.977677358f,
+   -0.213110320f,  0.977028143f,
+   -0.216106797f,  0.976369731f,
+   -0.219101240f,  0.975702130f,
+   -0.222093621f,  0.975025345f,
+   -0.225083911f,  0.974339383f,
+   -0.228072083f,  0.973644250f,
+   -0.231058108f,  0.972939952f,
+   -0.234041959f,  0.972226497f,
+   -0.237023606f,  0.971503891f,
+   -0.240003022f,  0.970772141f,
+   -0.242980180f,  0.970031253f,
+   -0.245955050f,  0.969281235f,
+   -0.248927606f,  0.968522094f,
+   -0.251897818f,  0.967753837f,
+   -0.254865660f,  0.966976471f,
+   -0.257831102f,  0.966190003f,
+   -0.260794118f,  0.965394442f,
+   -0.263754679f,  0.964589793f,
+   -0.266712757f,  0.963776066f,
+   -0.269668326f,  0.962953267f,
+   -0.272621355f,  0.962121404f,
+   -0.275571819f,  0.961280486f,
+   -0.278519689f,  0.960430519f,
+   -0.281464938f,  0.959571513f,
+   -0.284407537f,  0.958703475f,
+   -0.287347460f,  0.957826413f,
+   -0.290284677f,  0.956940336f,
+   -0.293219163f,  0.956045251f,
+   -0.296150888f,  0.955141168f,
+   -0.299079826f,  0.954228095f,
+   -0.302005949f,  0.953306040f,
+   -0.304929230f,  0.952375013f,
+   -0.307849640f,  0.951435021f,
+   -0.310767153f,  0.950486074f,
+   -0.313681740f,  0.949528181f,
+   -0.316593376f,  0.948561350f,
+   -0.319502031f,  0.947585591f,
+   -0.322407679f,  0.946600913f,
+   -0.325310292f,  0.945607325f,
+   -0.328209844f,  0.944604837f,
+   -0.331106306f,  0.943593458f,
+   -0.333999651f,  0.942573198f,
+   -0.336889853f,  0.941544065f,
+   -0.339776884f,  0.940506071f,
+   -0.342660717f,  0.939459224f,
+   -0.345541325f,  0.938403534f,
+   -0.348418680f,  0.937339012f,
+   -0.351292756f,  0.936265667f,
+   -0.354163525f,  0.935183510f,
+   -0.357030961f,  0.934092550f,
+   -0.359895037f,  0.932992799f,
+   -0.362755724f,  0.931884266f,
+   -0.365612998f,  0.930766961f,
+   -0.368466830f,  0.929640896f,
+   -0.371317194f,  0.928506080f,
+   -0.374164063f,  0.927362526f,
+   -0.377007410f,  0.926210242f,
+   -0.379847209f,  0.925049241f,
+   -0.382683432f,  0.923879533f,
+   -0.385516054f,  0.922701128f,
+   -0.388345047f,  0.921514039f,
+   -0.391170384f,  0.920318277f,
+   -0.393992040f,  0.919113852f,
+   -0.396809987f,  0.917900776f,
+   -0.399624200f,  0.916679060f,
+   -0.402434651f,  0.915448716f,
+   -0.405241314f,  0.914209756f,
+   -0.408044163f,  0.912962190f,
+   -0.410843171f,  0.911706032f,
+   -0.413638312f,  0.910441292f,
+   -0.416429560f,  0.909167983f,
+   -0.419216888f,  0.907886116f,
+   -0.422000271f,  0.906595705f,
+   -0.424779681f,  0.905296759f,
+   -0.427555093f,  0.903989293f,
+   -0.430326481f,  0.902673318f,
+   -0.433093819f,  0.901348847f,
+   -0.435857080f,  0.900015892f,
+   -0.438616239f,  0.898674466f,
+   -0.441371269f,  0.897324581f,
+   -0.444122145f,  0.895966250f,
+   -0.446868840f,  0.894599486f,
+   -0.449611330f,  0.893224301f,
+   -0.452349587f,  0.891840709f,
+   -0.455083587f,  0.890448723f,
+   -0.457813304f,  0.889048356f,
+   -0.460538711f,  0.887639620f,
+   -0.463259784f,  0.886222530f,
+   -0.465976496f,  0.884797098f,
+   -0.468688822f,  0.883363339f,
+   -0.471396737f,  0.881921264f,
+   -0.474100215f,  0.880470889f,
+   -0.476799230f,  0.879012226f,
+   -0.479493758f,  0.877545290f,
+   -0.482183772f,  0.876070094f,
+   -0.484869248f,  0.874586652f,
+   -0.487550160f,  0.873094978f,
+   -0.490226483f,  0.871595087f,
+   -0.492898192f,  0.870086991f,
+   -0.495565262f,  0.868570706f,
+   -0.498227667f,  0.867046246f,
+   -0.500885383f,  0.865513624f,
+   -0.503538384f,  0.863972856f,
+   -0.506186645f,  0.862423956f,
+   -0.508830143f,  0.860866939f,
+   -0.511468850f,  0.859301818f,
+   -0.514102744f,  0.857728610f,
+   -0.516731799f,  0.856147328f,
+   -0.519355990f,  0.854557988f,
+   -0.521975293f,  0.852960605f,
+   -0.524589683f,  0.851355193f,
+   -0.527199135f,  0.849741768f,
+   -0.529803625f,  0.848120345f,
+   -0.532403128f,  0.846490939f,
+   -0.534997620f,  0.844853565f,
+   -0.537587076f,  0.843208240f,
+   -0.540171473f,  0.841554977f,
+   -0.542750785f,  0.839893794f,
+   -0.545324988f,  0.838224706f,
+   -0.547894059f,  0.836547727f,
+   -0.550457973f,  0.834862875f,
+   -0.553016706f,  0.833170165f,
+   -0.555570233f,  0.831469612f,
+   -0.558118531f,  0.829761234f,
+   -0.560661576f,  0.828045045f,
+   -0.563199344f,  0.826321063f,
+   -0.565731811f,  0.824589303f,
+   -0.568258953f,  0.822849781f,
+   -0.570780746f,  0.821102515f,
+   -0.573297167f,  0.819347520f,
+   -0.575808191f,  0.817584813f,
+   -0.578313796f,  0.815814411f,
+   -0.580813958f,  0.814036330f,
+   -0.583308653f,  0.812250587f,
+   -0.585797857f,  0.810457198f,
+   -0.588281548f,  0.808656182f,
+   -0.590759702f,  0.806847554f,
+   -0.593232295f,  0.805031331f,
+   -0.595699304f,  0.803207531f,
+   -0.598160707f,  0.801376172f,
+   -0.600616479f,  0.799537269f,
+   -0.603066599f,  0.797690841f,
+   -0.605511041f,  0.795836905f,
+   -0.607949785f,  0.793975478f,
+   -0.610382806f,  0.792106577f,
+   -0.612810082f,  0.790230221f,
+   -0.615231591f,  0.788346428f,
+   -0.617647308f,  0.786455214f,
+   -0.620057212f,  0.784556597f,
+   -0.622461279f,  0.782650596f,
+   -0.624859488f,  0.780737229f,
+   -0.627251815f,  0.778816512f,
+   -0.629638239f,  0.776888466f,
+   -0.632018736f,  0.774953107f,
+   -0.634393284f,  0.773010453f,
+   -0.636761861f,  0.771060524f,
+   -0.639124445f,  0.769103338f,
+   -0.641481013f,  0.767138912f,
+   -0.643831543f,  0.765167266f,
+   -0.646176013f,  0.763188417f,
+   -0.648514401f,  0.761202385f,
+   -0.650846685f,  0.759209189f,
+   -0.653172843f,  0.757208847f,
+   -0.655492853f,  0.755201377f,
+   -0.657806693f,  0.753186799f,
+   -0.660114342f,  0.751165132f,
+   -0.662415778f,  0.749136395f,
+   -0.664710978f,  0.747100606f,
+   -0.666999922f,  0.745057785f,
+   -0.669282588f,  0.743007952f,
+   -0.671558955f,  0.740951125f,
+   -0.673829000f,  0.738887324f,
+   -0.676092704f,  0.736816569f,
+   -0.678350043f,  0.734738878f,
+   -0.680600998f,  0.732654272f,
+   -0.682845546f,  0.730562769f,
+   -0.685083668f,  0.728464390f,
+   -0.687315341f,  0.726359155f,
+   -0.689540545f,  0.724247083f,
+   -0.691759258f,  0.722128194f,
+   -0.693971461f,  0.720002508f,
+   -0.696177131f,  0.717870045f,
+   -0.698376249f,  0.715730825f,
+   -0.700568794f,  0.713584869f,
+   -0.702754744f,  0.711432196f,
+   -0.704934080f,  0.709272826f,
+   -0.707106781f,  0.707106781f,
+   -0.709272826f,  0.704934080f,
+   -0.711432196f,  0.702754744f,
+   -0.713584869f,  0.700568794f,
+   -0.715730825f,  0.698376249f,
+   -0.717870045f,  0.696177131f,
+   -0.720002508f,  0.693971461f,
+   -0.722128194f,  0.691759258f,
+   -0.724247083f,  0.689540545f,
+   -0.726359155f,  0.687315341f,
+   -0.728464390f,  0.685083668f,
+   -0.730562769f,  0.682845546f,
+   -0.732654272f,  0.680600998f,
+   -0.734738878f,  0.678350043f,
+   -0.736816569f,  0.676092704f,
+   -0.738887324f,  0.673829000f,
+   -0.740951125f,  0.671558955f,
+   -0.743007952f,  0.669282588f,
+   -0.745057785f,  0.666999922f,
+   -0.747100606f,  0.664710978f,
+   -0.749136395f,  0.662415778f,
+   -0.751165132f,  0.660114342f,
+   -0.753186799f,  0.657806693f,
+   -0.755201377f,  0.655492853f,
+   -0.757208847f,  0.653172843f,
+   -0.759209189f,  0.650846685f,
+   -0.761202385f,  0.648514401f,
+   -0.763188417f,  0.646176013f,
+   -0.765167266f,  0.643831543f,
+   -0.767138912f,  0.641481013f,
+   -0.769103338f,  0.639124445f,
+   -0.771060524f,  0.636761861f,
+   -0.773010453f,  0.634393284f,
+   -0.774953107f,  0.632018736f,
+   -0.776888466f,  0.629638239f,
+   -0.778816512f,  0.627251815f,
+   -0.780737229f,  0.624859488f,
+   -0.782650596f,  0.622461279f,
+   -0.784556597f,  0.620057212f,
+   -0.786455214f,  0.617647308f,
+   -0.788346428f,  0.615231591f,
+   -0.790230221f,  0.612810082f,
+   -0.792106577f,  0.610382806f,
+   -0.793975478f,  0.607949785f,
+   -0.795836905f,  0.605511041f,
+   -0.797690841f,  0.603066599f,
+   -0.799537269f,  0.600616479f,
+   -0.801376172f,  0.598160707f,
+   -0.803207531f,  0.595699304f,
+   -0.805031331f,  0.593232295f,
+   -0.806847554f,  0.590759702f,
+   -0.808656182f,  0.588281548f,
+   -0.810457198f,  0.585797857f,
+   -0.812250587f,  0.583308653f,
+   -0.814036330f,  0.580813958f,
+   -0.815814411f,  0.578313796f,
+   -0.817584813f,  0.575808191f,
+   -0.819347520f,  0.573297167f,
+   -0.821102515f,  0.570780746f,
+   -0.822849781f,  0.568258953f,
+   -0.824589303f,  0.565731811f,
+   -0.826321063f,  0.563199344f,
+   -0.828045045f,  0.560661576f,
+   -0.829761234f,  0.558118531f,
+   -0.831469612f,  0.555570233f,
+   -0.833170165f,  0.553016706f,
+   -0.834862875f,  0.550457973f,
+   -0.836547727f,  0.547894059f,
+   -0.838224706f,  0.545324988f,
+   -0.839893794f,  0.542750785f,
+   -0.841554977f,  0.540171473f,
+   -0.843208240f,  0.537587076f,
+   -0.844853565f,  0.534997620f,
+   -0.846490939f,  0.532403128f,
+   -0.848120345f,  0.529803625f,
+   -0.849741768f,  0.527199135f,
+   -0.851355193f,  0.524589683f,
+   -0.852960605f,  0.521975293f,
+   -0.854557988f,  0.519355990f,
+   -0.856147328f,  0.516731799f,
+   -0.857728610f,  0.514102744f,
+   -0.859301818f,  0.511468850f,
+   -0.860866939f,  0.508830143f,
+   -0.862423956f,  0.506186645f,
+   -0.863972856f,  0.503538384f,
+   -0.865513624f,  0.500885383f,
+   -0.867046246f,  0.498227667f,
+   -0.868570706f,  0.495565262f,
+   -0.870086991f,  0.492898192f,
+   -0.871595087f,  0.490226483f,
+   -0.873094978f,  0.487550160f,
+   -0.874586652f,  0.484869248f,
+   -0.876070094f,  0.482183772f,
+   -0.877545290f,  0.479493758f,
+   -0.879012226f,  0.476799230f,
+   -0.880470889f,  0.474100215f,
+   -0.881921264f,  0.471396737f,
+   -0.883363339f,  0.468688822f,
+   -0.884797098f,  0.465976496f,
+   -0.886222530f,  0.463259784f,
+   -0.887639620f,  0.460538711f,
+   -0.889048356f,  0.457813304f,
+   -0.890448723f,  0.455083587f,
+   -0.891840709f,  0.452349587f,
+   -0.893224301f,  0.449611330f,
+   -0.894599486f,  0.446868840f,
+   -0.895966250f,  0.444122145f,
+   -0.897324581f,  0.441371269f,
+   -0.898674466f,  0.438616239f,
+   -0.900015892f,  0.435857080f,
+   -0.901348847f,  0.433093819f,
+   -0.902673318f,  0.430326481f,
+   -0.903989293f,  0.427555093f,
+   -0.905296759f,  0.424779681f,
+   -0.906595705f,  0.422000271f,
+   -0.907886116f,  0.419216888f,
+   -0.909167983f,  0.416429560f,
+   -0.910441292f,  0.413638312f,
+   -0.911706032f,  0.410843171f,
+   -0.912962190f,  0.408044163f,
+   -0.914209756f,  0.405241314f,
+   -0.915448716f,  0.402434651f,
+   -0.916679060f,  0.399624200f,
+   -0.917900776f,  0.396809987f,
+   -0.919113852f,  0.393992040f,
+   -0.920318277f,  0.391170384f,
+   -0.921514039f,  0.388345047f,
+   -0.922701128f,  0.385516054f,
+   -0.923879533f,  0.382683432f,
+   -0.925049241f,  0.379847209f,
+   -0.926210242f,  0.377007410f,
+   -0.927362526f,  0.374164063f,
+   -0.928506080f,  0.371317194f,
+   -0.929640896f,  0.368466830f,
+   -0.930766961f,  0.365612998f,
+   -0.931884266f,  0.362755724f,
+   -0.932992799f,  0.359895037f,
+   -0.934092550f,  0.357030961f,
+   -0.935183510f,  0.354163525f,
+   -0.936265667f,  0.351292756f,
+   -0.937339012f,  0.348418680f,
+   -0.938403534f,  0.345541325f,
+   -0.939459224f,  0.342660717f,
+   -0.940506071f,  0.339776884f,
+   -0.941544065f,  0.336889853f,
+   -0.942573198f,  0.333999651f,
+   -0.943593458f,  0.331106306f,
+   -0.944604837f,  0.328209844f,
+   -0.945607325f,  0.325310292f,
+   -0.946600913f,  0.322407679f,
+   -0.947585591f,  0.319502031f,
+   -0.948561350f,  0.316593376f,
+   -0.949528181f,  0.313681740f,
+   -0.950486074f,  0.310767153f,
+   -0.951435021f,  0.307849640f,
+   -0.952375013f,  0.304929230f,
+   -0.953306040f,  0.302005949f,
+   -0.954228095f,  0.299079826f,
+   -0.955141168f,  0.296150888f,
+   -0.956045251f,  0.293219163f,
+   -0.956940336f,  0.290284677f,
+   -0.957826413f,  0.287347460f,
+   -0.958703475f,  0.284407537f,
+   -0.959571513f,  0.281464938f,
+   -0.960430519f,  0.278519689f,
+   -0.961280486f,  0.275571819f,
+   -0.962121404f,  0.272621355f,
+   -0.962953267f,  0.269668326f,
+   -0.963776066f,  0.266712757f,
+   -0.964589793f,  0.263754679f,
+   -0.965394442f,  0.260794118f,
+   -0.966190003f,  0.257831102f,
+   -0.966976471f,  0.254865660f,
+   -0.967753837f,  0.251897818f,
+   -0.968522094f,  0.248927606f,
+   -0.969281235f,  0.245955050f,
+   -0.970031253f,  0.242980180f,
+   -0.970772141f,  0.240003022f,
+   -0.971503891f,  0.237023606f,
+   -0.972226497f,  0.234041959f,
+   -0.972939952f,  0.231058108f,
+   -0.973644250f,  0.228072083f,
+   -0.974339383f,  0.225083911f,
+   -0.975025345f,  0.222093621f,
+   -0.975702130f,  0.219101240f,
+   -0.976369731f,  0.216106797f,
+   -0.977028143f,  0.213110320f,
+   -0.977677358f,  0.210111837f,
+   -0.978317371f,  0.207111376f,
+   -0.978948175f,  0.204108966f,
+   -0.979569766f,  0.201104635f,
+   -0.980182136f,  0.198098411f,
+   -0.980785280f,  0.195090322f,
+   -0.981379193f,  0.192080397f,
+   -0.981963869f,  0.189068664f,
+   -0.982539302f,  0.186055152f,
+   -0.983105487f,  0.183039888f,
+   -0.983662419f,  0.180022901f,
+   -0.984210092f,  0.177004220f,
+   -0.984748502f,  0.173983873f,
+   -0.985277642f,  0.170961889f,
+   -0.985797509f,  0.167938295f,
+   -0.986308097f,  0.164913120f,
+   -0.986809402f,  0.161886394f,
+   -0.987301418f,  0.158858143f,
+   -0.987784142f,  0.155828398f,
+   -0.988257568f,  0.152797185f,
+   -0.988721692f,  0.149764535f,
+   -0.989176510f,  0.146730474f,
+   -0.989622017f,  0.143695033f,
+   -0.990058210f,  0.140658239f,
+   -0.990485084f,  0.137620122f,
+   -0.990902635f,  0.134580709f,
+   -0.991310860f,  0.131540029f,
+   -0.991709754f,  0.128498111f,
+   -0.992099313f,  0.125454983f,
+   -0.992479535f,  0.122410675f,
+   -0.992850414f,  0.119365215f,
+   -0.993211949f,  0.116318631f,
+   -0.993564136f,  0.113270952f,
+   -0.993906970f,  0.110222207f,
+   -0.994240449f,  0.107172425f,
+   -0.994564571f,  0.104121634f,
+   -0.994879331f,  0.101069863f,
+   -0.995184727f,  0.098017140f,
+   -0.995480755f,  0.094963495f,
+   -0.995767414f,  0.091908956f,
+   -0.996044701f,  0.088853553f,
+   -0.996312612f,  0.085797312f,
+   -0.996571146f,  0.082740265f,
+   -0.996820299f,  0.079682438f,
+   -0.997060070f,  0.076623861f,
+   -0.997290457f,  0.073564564f,
+   -0.997511456f,  0.070504573f,
+   -0.997723067f,  0.067443920f,
+   -0.997925286f,  0.064382631f,
+   -0.998118113f,  0.061320736f,
+   -0.998301545f,  0.058258265f,
+   -0.998475581f,  0.055195244f,
+   -0.998640218f,  0.052131705f,
+   -0.998795456f,  0.049067674f,
+   -0.998941293f,  0.046003182f,
+   -0.999077728f,  0.042938257f,
+   -0.999204759f,  0.039872928f,
+   -0.999322385f,  0.036807223f,
+   -0.999430605f,  0.033741172f,
+   -0.999529418f,  0.030674803f,
+   -0.999618822f,  0.027608146f,
+   -0.999698819f,  0.024541229f,
+   -0.999769405f,  0.021474080f,
+   -0.999830582f,  0.018406730f,
+   -0.999882347f,  0.015339206f,
+   -0.999924702f,  0.012271538f,
+   -0.999957645f,  0.009203755f,
+   -0.999981175f,  0.006135885f,
+   -0.999995294f,  0.003067957f,
+   -1.000000000f,  0.000000000f,
+   -0.999995294f, -0.003067957f,
+   -0.999981175f, -0.006135885f,
+   -0.999957645f, -0.009203755f,
+   -0.999924702f, -0.012271538f,
+   -0.999882347f, -0.015339206f,
+   -0.999830582f, -0.018406730f,
+   -0.999769405f, -0.021474080f,
+   -0.999698819f, -0.024541229f,
+   -0.999618822f, -0.027608146f,
+   -0.999529418f, -0.030674803f,
+   -0.999430605f, -0.033741172f,
+   -0.999322385f, -0.036807223f,
+   -0.999204759f, -0.039872928f,
+   -0.999077728f, -0.042938257f,
+   -0.998941293f, -0.046003182f,
+   -0.998795456f, -0.049067674f,
+   -0.998640218f, -0.052131705f,
+   -0.998475581f, -0.055195244f,
+   -0.998301545f, -0.058258265f,
+   -0.998118113f, -0.061320736f,
+   -0.997925286f, -0.064382631f,
+   -0.997723067f, -0.067443920f,
+   -0.997511456f, -0.070504573f,
+   -0.997290457f, -0.073564564f,
+   -0.997060070f, -0.076623861f,
+   -0.996820299f, -0.079682438f,
+   -0.996571146f, -0.082740265f,
+   -0.996312612f, -0.085797312f,
+   -0.996044701f, -0.088853553f,
+   -0.995767414f, -0.091908956f,
+   -0.995480755f, -0.094963495f,
+   -0.995184727f, -0.098017140f,
+   -0.994879331f, -0.101069863f,
+   -0.994564571f, -0.104121634f,
+   -0.994240449f, -0.107172425f,
+   -0.993906970f, -0.110222207f,
+   -0.993564136f, -0.113270952f,
+   -0.993211949f, -0.116318631f,
+   -0.992850414f, -0.119365215f,
+   -0.992479535f, -0.122410675f,
+   -0.992099313f, -0.125454983f,
+   -0.991709754f, -0.128498111f,
+   -0.991310860f, -0.131540029f,
+   -0.990902635f, -0.134580709f,
+   -0.990485084f, -0.137620122f,
+   -0.990058210f, -0.140658239f,
+   -0.989622017f, -0.143695033f,
+   -0.989176510f, -0.146730474f,
+   -0.988721692f, -0.149764535f,
+   -0.988257568f, -0.152797185f,
+   -0.987784142f, -0.155828398f,
+   -0.987301418f, -0.158858143f,
+   -0.986809402f, -0.161886394f,
+   -0.986308097f, -0.164913120f,
+   -0.985797509f, -0.167938295f,
+   -0.985277642f, -0.170961889f,
+   -0.984748502f, -0.173983873f,
+   -0.984210092f, -0.177004220f,
+   -0.983662419f, -0.180022901f,
+   -0.983105487f, -0.183039888f,
+   -0.982539302f, -0.186055152f,
+   -0.981963869f, -0.189068664f,
+   -0.981379193f, -0.192080397f,
+   -0.980785280f, -0.195090322f,
+   -0.980182136f, -0.198098411f,
+   -0.979569766f, -0.201104635f,
+   -0.978948175f, -0.204108966f,
+   -0.978317371f, -0.207111376f,
+   -0.977677358f, -0.210111837f,
+   -0.977028143f, -0.213110320f,
+   -0.976369731f, -0.216106797f,
+   -0.975702130f, -0.219101240f,
+   -0.975025345f, -0.222093621f,
+   -0.974339383f, -0.225083911f,
+   -0.973644250f, -0.228072083f,
+   -0.972939952f, -0.231058108f,
+   -0.972226497f, -0.234041959f,
+   -0.971503891f, -0.237023606f,
+   -0.970772141f, -0.240003022f,
+   -0.970031253f, -0.242980180f,
+   -0.969281235f, -0.245955050f,
+   -0.968522094f, -0.248927606f,
+   -0.967753837f, -0.251897818f,
+   -0.966976471f, -0.254865660f,
+   -0.966190003f, -0.257831102f,
+   -0.965394442f, -0.260794118f,
+   -0.964589793f, -0.263754679f,
+   -0.963776066f, -0.266712757f,
+   -0.962953267f, -0.269668326f,
+   -0.962121404f, -0.272621355f,
+   -0.961280486f, -0.275571819f,
+   -0.960430519f, -0.278519689f,
+   -0.959571513f, -0.281464938f,
+   -0.958703475f, -0.284407537f,
+   -0.957826413f, -0.287347460f,
+   -0.956940336f, -0.290284677f,
+   -0.956045251f, -0.293219163f,
+   -0.955141168f, -0.296150888f,
+   -0.954228095f, -0.299079826f,
+   -0.953306040f, -0.302005949f,
+   -0.952375013f, -0.304929230f,
+   -0.951435021f, -0.307849640f,
+   -0.950486074f, -0.310767153f,
+   -0.949528181f, -0.313681740f,
+   -0.948561350f, -0.316593376f,
+   -0.947585591f, -0.319502031f,
+   -0.946600913f, -0.322407679f,
+   -0.945607325f, -0.325310292f,
+   -0.944604837f, -0.328209844f,
+   -0.943593458f, -0.331106306f,
+   -0.942573198f, -0.333999651f,
+   -0.941544065f, -0.336889853f,
+   -0.940506071f, -0.339776884f,
+   -0.939459224f, -0.342660717f,
+   -0.938403534f, -0.345541325f,
+   -0.937339012f, -0.348418680f,
+   -0.936265667f, -0.351292756f,
+   -0.935183510f, -0.354163525f,
+   -0.934092550f, -0.357030961f,
+   -0.932992799f, -0.359895037f,
+   -0.931884266f, -0.362755724f,
+   -0.930766961f, -0.365612998f,
+   -0.929640896f, -0.368466830f,
+   -0.928506080f, -0.371317194f,
+   -0.927362526f, -0.374164063f,
+   -0.926210242f, -0.377007410f,
+   -0.925049241f, -0.379847209f,
+   -0.923879533f, -0.382683432f,
+   -0.922701128f, -0.385516054f,
+   -0.921514039f, -0.388345047f,
+   -0.920318277f, -0.391170384f,
+   -0.919113852f, -0.393992040f,
+   -0.917900776f, -0.396809987f,
+   -0.916679060f, -0.399624200f,
+   -0.915448716f, -0.402434651f,
+   -0.914209756f, -0.405241314f,
+   -0.912962190f, -0.408044163f,
+   -0.911706032f, -0.410843171f,
+   -0.910441292f, -0.413638312f,
+   -0.909167983f, -0.416429560f,
+   -0.907886116f, -0.419216888f,
+   -0.906595705f, -0.422000271f,
+   -0.905296759f, -0.424779681f,
+   -0.903989293f, -0.427555093f,
+   -0.902673318f, -0.430326481f,
+   -0.901348847f, -0.433093819f,
+   -0.900015892f, -0.435857080f,
+   -0.898674466f, -0.438616239f,
+   -0.897324581f, -0.441371269f,
+   -0.895966250f, -0.444122145f,
+   -0.894599486f, -0.446868840f,
+   -0.893224301f, -0.449611330f,
+   -0.891840709f, -0.452349587f,
+   -0.890448723f, -0.455083587f,
+   -0.889048356f, -0.457813304f,
+   -0.887639620f, -0.460538711f,
+   -0.886222530f, -0.463259784f,
+   -0.884797098f, -0.465976496f,
+   -0.883363339f, -0.468688822f,
+   -0.881921264f, -0.471396737f,
+   -0.880470889f, -0.474100215f,
+   -0.879012226f, -0.476799230f,
+   -0.877545290f, -0.479493758f,
+   -0.876070094f, -0.482183772f,
+   -0.874586652f, -0.484869248f,
+   -0.873094978f, -0.487550160f,
+   -0.871595087f, -0.490226483f,
+   -0.870086991f, -0.492898192f,
+   -0.868570706f, -0.495565262f,
+   -0.867046246f, -0.498227667f,
+   -0.865513624f, -0.500885383f,
+   -0.863972856f, -0.503538384f,
+   -0.862423956f, -0.506186645f,
+   -0.860866939f, -0.508830143f,
+   -0.859301818f, -0.511468850f,
+   -0.857728610f, -0.514102744f,
+   -0.856147328f, -0.516731799f,
+   -0.854557988f, -0.519355990f,
+   -0.852960605f, -0.521975293f,
+   -0.851355193f, -0.524589683f,
+   -0.849741768f, -0.527199135f,
+   -0.848120345f, -0.529803625f,
+   -0.846490939f, -0.532403128f,
+   -0.844853565f, -0.534997620f,
+   -0.843208240f, -0.537587076f,
+   -0.841554977f, -0.540171473f,
+   -0.839893794f, -0.542750785f,
+   -0.838224706f, -0.545324988f,
+   -0.836547727f, -0.547894059f,
+   -0.834862875f, -0.550457973f,
+   -0.833170165f, -0.553016706f,
+   -0.831469612f, -0.555570233f,
+   -0.829761234f, -0.558118531f,
+   -0.828045045f, -0.560661576f,
+   -0.826321063f, -0.563199344f,
+   -0.824589303f, -0.565731811f,
+   -0.822849781f, -0.568258953f,
+   -0.821102515f, -0.570780746f,
+   -0.819347520f, -0.573297167f,
+   -0.817584813f, -0.575808191f,
+   -0.815814411f, -0.578313796f,
+   -0.814036330f, -0.580813958f,
+   -0.812250587f, -0.583308653f,
+   -0.810457198f, -0.585797857f,
+   -0.808656182f, -0.588281548f,
+   -0.806847554f, -0.590759702f,
+   -0.805031331f, -0.593232295f,
+   -0.803207531f, -0.595699304f,
+   -0.801376172f, -0.598160707f,
+   -0.799537269f, -0.600616479f,
+   -0.797690841f, -0.603066599f,
+   -0.795836905f, -0.605511041f,
+   -0.793975478f, -0.607949785f,
+   -0.792106577f, -0.610382806f,
+   -0.790230221f, -0.612810082f,
+   -0.788346428f, -0.615231591f,
+   -0.786455214f, -0.617647308f,
+   -0.784556597f, -0.620057212f,
+   -0.782650596f, -0.622461279f,
+   -0.780737229f, -0.624859488f,
+   -0.778816512f, -0.627251815f,
+   -0.776888466f, -0.629638239f,
+   -0.774953107f, -0.632018736f,
+   -0.773010453f, -0.634393284f,
+   -0.771060524f, -0.636761861f,
+   -0.769103338f, -0.639124445f,
+   -0.767138912f, -0.641481013f,
+   -0.765167266f, -0.643831543f,
+   -0.763188417f, -0.646176013f,
+   -0.761202385f, -0.648514401f,
+   -0.759209189f, -0.650846685f,
+   -0.757208847f, -0.653172843f,
+   -0.755201377f, -0.655492853f,
+   -0.753186799f, -0.657806693f,
+   -0.751165132f, -0.660114342f,
+   -0.749136395f, -0.662415778f,
+   -0.747100606f, -0.664710978f,
+   -0.745057785f, -0.666999922f,
+   -0.743007952f, -0.669282588f,
+   -0.740951125f, -0.671558955f,
+   -0.738887324f, -0.673829000f,
+   -0.736816569f, -0.676092704f,
+   -0.734738878f, -0.678350043f,
+   -0.732654272f, -0.680600998f,
+   -0.730562769f, -0.682845546f,
+   -0.728464390f, -0.685083668f,
+   -0.726359155f, -0.687315341f,
+   -0.724247083f, -0.689540545f,
+   -0.722128194f, -0.691759258f,
+   -0.720002508f, -0.693971461f,
+   -0.717870045f, -0.696177131f,
+   -0.715730825f, -0.698376249f,
+   -0.713584869f, -0.700568794f,
+   -0.711432196f, -0.702754744f,
+   -0.709272826f, -0.704934080f,
+   -0.707106781f, -0.707106781f,
+   -0.704934080f, -0.709272826f,
+   -0.702754744f, -0.711432196f,
+   -0.700568794f, -0.713584869f,
+   -0.698376249f, -0.715730825f,
+   -0.696177131f, -0.717870045f,
+   -0.693971461f, -0.720002508f,
+   -0.691759258f, -0.722128194f,
+   -0.689540545f, -0.724247083f,
+   -0.687315341f, -0.726359155f,
+   -0.685083668f, -0.728464390f,
+   -0.682845546f, -0.730562769f,
+   -0.680600998f, -0.732654272f,
+   -0.678350043f, -0.734738878f,
+   -0.676092704f, -0.736816569f,
+   -0.673829000f, -0.738887324f,
+   -0.671558955f, -0.740951125f,
+   -0.669282588f, -0.743007952f,
+   -0.666999922f, -0.745057785f,
+   -0.664710978f, -0.747100606f,
+   -0.662415778f, -0.749136395f,
+   -0.660114342f, -0.751165132f,
+   -0.657806693f, -0.753186799f,
+   -0.655492853f, -0.755201377f,
+   -0.653172843f, -0.757208847f,
+   -0.650846685f, -0.759209189f,
+   -0.648514401f, -0.761202385f,
+   -0.646176013f, -0.763188417f,
+   -0.643831543f, -0.765167266f,
+   -0.641481013f, -0.767138912f,
+   -0.639124445f, -0.769103338f,
+   -0.636761861f, -0.771060524f,
+   -0.634393284f, -0.773010453f,
+   -0.632018736f, -0.774953107f,
+   -0.629638239f, -0.776888466f,
+   -0.627251815f, -0.778816512f,
+   -0.624859488f, -0.780737229f,
+   -0.622461279f, -0.782650596f,
+   -0.620057212f, -0.784556597f,
+   -0.617647308f, -0.786455214f,
+   -0.615231591f, -0.788346428f,
+   -0.612810082f, -0.790230221f,
+   -0.610382806f, -0.792106577f,
+   -0.607949785f, -0.793975478f,
+   -0.605511041f, -0.795836905f,
+   -0.603066599f, -0.797690841f,
+   -0.600616479f, -0.799537269f,
+   -0.598160707f, -0.801376172f,
+   -0.595699304f, -0.803207531f,
+   -0.593232295f, -0.805031331f,
+   -0.590759702f, -0.806847554f,
+   -0.588281548f, -0.808656182f,
+   -0.585797857f, -0.810457198f,
+   -0.583308653f, -0.812250587f,
+   -0.580813958f, -0.814036330f,
+   -0.578313796f, -0.815814411f,
+   -0.575808191f, -0.817584813f,
+   -0.573297167f, -0.819347520f,
+   -0.570780746f, -0.821102515f,
+   -0.568258953f, -0.822849781f,
+   -0.565731811f, -0.824589303f,
+   -0.563199344f, -0.826321063f,
+   -0.560661576f, -0.828045045f,
+   -0.558118531f, -0.829761234f,
+   -0.555570233f, -0.831469612f,
+   -0.553016706f, -0.833170165f,
+   -0.550457973f, -0.834862875f,
+   -0.547894059f, -0.836547727f,
+   -0.545324988f, -0.838224706f,
+   -0.542750785f, -0.839893794f,
+   -0.540171473f, -0.841554977f,
+   -0.537587076f, -0.843208240f,
+   -0.534997620f, -0.844853565f,
+   -0.532403128f, -0.846490939f,
+   -0.529803625f, -0.848120345f,
+   -0.527199135f, -0.849741768f,
+   -0.524589683f, -0.851355193f,
+   -0.521975293f, -0.852960605f,
+   -0.519355990f, -0.854557988f,
+   -0.516731799f, -0.856147328f,
+   -0.514102744f, -0.857728610f,
+   -0.511468850f, -0.859301818f,
+   -0.508830143f, -0.860866939f,
+   -0.506186645f, -0.862423956f,
+   -0.503538384f, -0.863972856f,
+   -0.500885383f, -0.865513624f,
+   -0.498227667f, -0.867046246f,
+   -0.495565262f, -0.868570706f,
+   -0.492898192f, -0.870086991f,
+   -0.490226483f, -0.871595087f,
+   -0.487550160f, -0.873094978f,
+   -0.484869248f, -0.874586652f,
+   -0.482183772f, -0.876070094f,
+   -0.479493758f, -0.877545290f,
+   -0.476799230f, -0.879012226f,
+   -0.474100215f, -0.880470889f,
+   -0.471396737f, -0.881921264f,
+   -0.468688822f, -0.883363339f,
+   -0.465976496f, -0.884797098f,
+   -0.463259784f, -0.886222530f,
+   -0.460538711f, -0.887639620f,
+   -0.457813304f, -0.889048356f,
+   -0.455083587f, -0.890448723f,
+   -0.452349587f, -0.891840709f,
+   -0.449611330f, -0.893224301f,
+   -0.446868840f, -0.894599486f,
+   -0.444122145f, -0.895966250f,
+   -0.441371269f, -0.897324581f,
+   -0.438616239f, -0.898674466f,
+   -0.435857080f, -0.900015892f,
+   -0.433093819f, -0.901348847f,
+   -0.430326481f, -0.902673318f,
+   -0.427555093f, -0.903989293f,
+   -0.424779681f, -0.905296759f,
+   -0.422000271f, -0.906595705f,
+   -0.419216888f, -0.907886116f,
+   -0.416429560f, -0.909167983f,
+   -0.413638312f, -0.910441292f,
+   -0.410843171f, -0.911706032f,
+   -0.408044163f, -0.912962190f,
+   -0.405241314f, -0.914209756f,
+   -0.402434651f, -0.915448716f,
+   -0.399624200f, -0.916679060f,
+   -0.396809987f, -0.917900776f,
+   -0.393992040f, -0.919113852f,
+   -0.391170384f, -0.920318277f,
+   -0.388345047f, -0.921514039f,
+   -0.385516054f, -0.922701128f,
+   -0.382683432f, -0.923879533f,
+   -0.379847209f, -0.925049241f,
+   -0.377007410f, -0.926210242f,
+   -0.374164063f, -0.927362526f,
+   -0.371317194f, -0.928506080f,
+   -0.368466830f, -0.929640896f,
+   -0.365612998f, -0.930766961f,
+   -0.362755724f, -0.931884266f,
+   -0.359895037f, -0.932992799f,
+   -0.357030961f, -0.934092550f,
+   -0.354163525f, -0.935183510f,
+   -0.351292756f, -0.936265667f,
+   -0.348418680f, -0.937339012f,
+   -0.345541325f, -0.938403534f,
+   -0.342660717f, -0.939459224f,
+   -0.339776884f, -0.940506071f,
+   -0.336889853f, -0.941544065f,
+   -0.333999651f, -0.942573198f,
+   -0.331106306f, -0.943593458f,
+   -0.328209844f, -0.944604837f,
+   -0.325310292f, -0.945607325f,
+   -0.322407679f, -0.946600913f,
+   -0.319502031f, -0.947585591f,
+   -0.316593376f, -0.948561350f,
+   -0.313681740f, -0.949528181f,
+   -0.310767153f, -0.950486074f,
+   -0.307849640f, -0.951435021f,
+   -0.304929230f, -0.952375013f,
+   -0.302005949f, -0.953306040f,
+   -0.299079826f, -0.954228095f,
+   -0.296150888f, -0.955141168f,
+   -0.293219163f, -0.956045251f,
+   -0.290284677f, -0.956940336f,
+   -0.287347460f, -0.957826413f,
+   -0.284407537f, -0.958703475f,
+   -0.281464938f, -0.959571513f,
+   -0.278519689f, -0.960430519f,
+   -0.275571819f, -0.961280486f,
+   -0.272621355f, -0.962121404f,
+   -0.269668326f, -0.962953267f,
+   -0.266712757f, -0.963776066f,
+   -0.263754679f, -0.964589793f,
+   -0.260794118f, -0.965394442f,
+   -0.257831102f, -0.966190003f,
+   -0.254865660f, -0.966976471f,
+   -0.251897818f, -0.967753837f,
+   -0.248927606f, -0.968522094f,
+   -0.245955050f, -0.969281235f,
+   -0.242980180f, -0.970031253f,
+   -0.240003022f, -0.970772141f,
+   -0.237023606f, -0.971503891f,
+   -0.234041959f, -0.972226497f,
+   -0.231058108f, -0.972939952f,
+   -0.228072083f, -0.973644250f,
+   -0.225083911f, -0.974339383f,
+   -0.222093621f, -0.975025345f,
+   -0.219101240f, -0.975702130f,
+   -0.216106797f, -0.976369731f,
+   -0.213110320f, -0.977028143f,
+   -0.210111837f, -0.977677358f,
+   -0.207111376f, -0.978317371f,
+   -0.204108966f, -0.978948175f,
+   -0.201104635f, -0.979569766f,
+   -0.198098411f, -0.980182136f,
+   -0.195090322f, -0.980785280f,
+   -0.192080397f, -0.981379193f,
+   -0.189068664f, -0.981963869f,
+   -0.186055152f, -0.982539302f,
+   -0.183039888f, -0.983105487f,
+   -0.180022901f, -0.983662419f,
+   -0.177004220f, -0.984210092f,
+   -0.173983873f, -0.984748502f,
+   -0.170961889f, -0.985277642f,
+   -0.167938295f, -0.985797509f,
+   -0.164913120f, -0.986308097f,
+   -0.161886394f, -0.986809402f,
+   -0.158858143f, -0.987301418f,
+   -0.155828398f, -0.987784142f,
+   -0.152797185f, -0.988257568f,
+   -0.149764535f, -0.988721692f,
+   -0.146730474f, -0.989176510f,
+   -0.143695033f, -0.989622017f,
+   -0.140658239f, -0.990058210f,
+   -0.137620122f, -0.990485084f,
+   -0.134580709f, -0.990902635f,
+   -0.131540029f, -0.991310860f,
+   -0.128498111f, -0.991709754f,
+   -0.125454983f, -0.992099313f,
+   -0.122410675f, -0.992479535f,
+   -0.119365215f, -0.992850414f,
+   -0.116318631f, -0.993211949f,
+   -0.113270952f, -0.993564136f,
+   -0.110222207f, -0.993906970f,
+   -0.107172425f, -0.994240449f,
+   -0.104121634f, -0.994564571f,
+   -0.101069863f, -0.994879331f,
+   -0.098017140f, -0.995184727f,
+   -0.094963495f, -0.995480755f,
+   -0.091908956f, -0.995767414f,
+   -0.088853553f, -0.996044701f,
+   -0.085797312f, -0.996312612f,
+   -0.082740265f, -0.996571146f,
+   -0.079682438f, -0.996820299f,
+   -0.076623861f, -0.997060070f,
+   -0.073564564f, -0.997290457f,
+   -0.070504573f, -0.997511456f,
+   -0.067443920f, -0.997723067f,
+   -0.064382631f, -0.997925286f,
+   -0.061320736f, -0.998118113f,
+   -0.058258265f, -0.998301545f,
+   -0.055195244f, -0.998475581f,
+   -0.052131705f, -0.998640218f,
+   -0.049067674f, -0.998795456f,
+   -0.046003182f, -0.998941293f,
+   -0.042938257f, -0.999077728f,
+   -0.039872928f, -0.999204759f,
+   -0.036807223f, -0.999322385f,
+   -0.033741172f, -0.999430605f,
+   -0.030674803f, -0.999529418f,
+   -0.027608146f, -0.999618822f,
+   -0.024541229f, -0.999698819f,
+   -0.021474080f, -0.999769405f,
+   -0.018406730f, -0.999830582f,
+   -0.015339206f, -0.999882347f,
+   -0.012271538f, -0.999924702f,
+   -0.009203755f, -0.999957645f,
+   -0.006135885f, -0.999981175f,
+   -0.003067957f, -0.999995294f,
+   -0.000000000f, -1.000000000f,
+    0.003067957f, -0.999995294f,
+    0.006135885f, -0.999981175f,
+    0.009203755f, -0.999957645f,
+    0.012271538f, -0.999924702f,
+    0.015339206f, -0.999882347f,
+    0.018406730f, -0.999830582f,
+    0.021474080f, -0.999769405f,
+    0.024541229f, -0.999698819f,
+    0.027608146f, -0.999618822f,
+    0.030674803f, -0.999529418f,
+    0.033741172f, -0.999430605f,
+    0.036807223f, -0.999322385f,
+    0.039872928f, -0.999204759f,
+    0.042938257f, -0.999077728f,
+    0.046003182f, -0.998941293f,
+    0.049067674f, -0.998795456f,
+    0.052131705f, -0.998640218f,
+    0.055195244f, -0.998475581f,
+    0.058258265f, -0.998301545f,
+    0.061320736f, -0.998118113f,
+    0.064382631f, -0.997925286f,
+    0.067443920f, -0.997723067f,
+    0.070504573f, -0.997511456f,
+    0.073564564f, -0.997290457f,
+    0.076623861f, -0.997060070f,
+    0.079682438f, -0.996820299f,
+    0.082740265f, -0.996571146f,
+    0.085797312f, -0.996312612f,
+    0.088853553f, -0.996044701f,
+    0.091908956f, -0.995767414f,
+    0.094963495f, -0.995480755f,
+    0.098017140f, -0.995184727f,
+    0.101069863f, -0.994879331f,
+    0.104121634f, -0.994564571f,
+    0.107172425f, -0.994240449f,
+    0.110222207f, -0.993906970f,
+    0.113270952f, -0.993564136f,
+    0.116318631f, -0.993211949f,
+    0.119365215f, -0.992850414f,
+    0.122410675f, -0.992479535f,
+    0.125454983f, -0.992099313f,
+    0.128498111f, -0.991709754f,
+    0.131540029f, -0.991310860f,
+    0.134580709f, -0.990902635f,
+    0.137620122f, -0.990485084f,
+    0.140658239f, -0.990058210f,
+    0.143695033f, -0.989622017f,
+    0.146730474f, -0.989176510f,
+    0.149764535f, -0.988721692f,
+    0.152797185f, -0.988257568f,
+    0.155828398f, -0.987784142f,
+    0.158858143f, -0.987301418f,
+    0.161886394f, -0.986809402f,
+    0.164913120f, -0.986308097f,
+    0.167938295f, -0.985797509f,
+    0.170961889f, -0.985277642f,
+    0.173983873f, -0.984748502f,
+    0.177004220f, -0.984210092f,
+    0.180022901f, -0.983662419f,
+    0.183039888f, -0.983105487f,
+    0.186055152f, -0.982539302f,
+    0.189068664f, -0.981963869f,
+    0.192080397f, -0.981379193f,
+    0.195090322f, -0.980785280f,
+    0.198098411f, -0.980182136f,
+    0.201104635f, -0.979569766f,
+    0.204108966f, -0.978948175f,
+    0.207111376f, -0.978317371f,
+    0.210111837f, -0.977677358f,
+    0.213110320f, -0.977028143f,
+    0.216106797f, -0.976369731f,
+    0.219101240f, -0.975702130f,
+    0.222093621f, -0.975025345f,
+    0.225083911f, -0.974339383f,
+    0.228072083f, -0.973644250f,
+    0.231058108f, -0.972939952f,
+    0.234041959f, -0.972226497f,
+    0.237023606f, -0.971503891f,
+    0.240003022f, -0.970772141f,
+    0.242980180f, -0.970031253f,
+    0.245955050f, -0.969281235f,
+    0.248927606f, -0.968522094f,
+    0.251897818f, -0.967753837f,
+    0.254865660f, -0.966976471f,
+    0.257831102f, -0.966190003f,
+    0.260794118f, -0.965394442f,
+    0.263754679f, -0.964589793f,
+    0.266712757f, -0.963776066f,
+    0.269668326f, -0.962953267f,
+    0.272621355f, -0.962121404f,
+    0.275571819f, -0.961280486f,
+    0.278519689f, -0.960430519f,
+    0.281464938f, -0.959571513f,
+    0.284407537f, -0.958703475f,
+    0.287347460f, -0.957826413f,
+    0.290284677f, -0.956940336f,
+    0.293219163f, -0.956045251f,
+    0.296150888f, -0.955141168f,
+    0.299079826f, -0.954228095f,
+    0.302005949f, -0.953306040f,
+    0.304929230f, -0.952375013f,
+    0.307849640f, -0.951435021f,
+    0.310767153f, -0.950486074f,
+    0.313681740f, -0.949528181f,
+    0.316593376f, -0.948561350f,
+    0.319502031f, -0.947585591f,
+    0.322407679f, -0.946600913f,
+    0.325310292f, -0.945607325f,
+    0.328209844f, -0.944604837f,
+    0.331106306f, -0.943593458f,
+    0.333999651f, -0.942573198f,
+    0.336889853f, -0.941544065f,
+    0.339776884f, -0.940506071f,
+    0.342660717f, -0.939459224f,
+    0.345541325f, -0.938403534f,
+    0.348418680f, -0.937339012f,
+    0.351292756f, -0.936265667f,
+    0.354163525f, -0.935183510f,
+    0.357030961f, -0.934092550f,
+    0.359895037f, -0.932992799f,
+    0.362755724f, -0.931884266f,
+    0.365612998f, -0.930766961f,
+    0.368466830f, -0.929640896f,
+    0.371317194f, -0.928506080f,
+    0.374164063f, -0.927362526f,
+    0.377007410f, -0.926210242f,
+    0.379847209f, -0.925049241f,
+    0.382683432f, -0.923879533f,
+    0.385516054f, -0.922701128f,
+    0.388345047f, -0.921514039f,
+    0.391170384f, -0.920318277f,
+    0.393992040f, -0.919113852f,
+    0.396809987f, -0.917900776f,
+    0.399624200f, -0.916679060f,
+    0.402434651f, -0.915448716f,
+    0.405241314f, -0.914209756f,
+    0.408044163f, -0.912962190f,
+    0.410843171f, -0.911706032f,
+    0.413638312f, -0.910441292f,
+    0.416429560f, -0.909167983f,
+    0.419216888f, -0.907886116f,
+    0.422000271f, -0.906595705f,
+    0.424779681f, -0.905296759f,
+    0.427555093f, -0.903989293f,
+    0.430326481f, -0.902673318f,
+    0.433093819f, -0.901348847f,
+    0.435857080f, -0.900015892f,
+    0.438616239f, -0.898674466f,
+    0.441371269f, -0.897324581f,
+    0.444122145f, -0.895966250f,
+    0.446868840f, -0.894599486f,
+    0.449611330f, -0.893224301f,
+    0.452349587f, -0.891840709f,
+    0.455083587f, -0.890448723f,
+    0.457813304f, -0.889048356f,
+    0.460538711f, -0.887639620f,
+    0.463259784f, -0.886222530f,
+    0.465976496f, -0.884797098f,
+    0.468688822f, -0.883363339f,
+    0.471396737f, -0.881921264f,
+    0.474100215f, -0.880470889f,
+    0.476799230f, -0.879012226f,
+    0.479493758f, -0.877545290f,
+    0.482183772f, -0.876070094f,
+    0.484869248f, -0.874586652f,
+    0.487550160f, -0.873094978f,
+    0.490226483f, -0.871595087f,
+    0.492898192f, -0.870086991f,
+    0.495565262f, -0.868570706f,
+    0.498227667f, -0.867046246f,
+    0.500885383f, -0.865513624f,
+    0.503538384f, -0.863972856f,
+    0.506186645f, -0.862423956f,
+    0.508830143f, -0.860866939f,
+    0.511468850f, -0.859301818f,
+    0.514102744f, -0.857728610f,
+    0.516731799f, -0.856147328f,
+    0.519355990f, -0.854557988f,
+    0.521975293f, -0.852960605f,
+    0.524589683f, -0.851355193f,
+    0.527199135f, -0.849741768f,
+    0.529803625f, -0.848120345f,
+    0.532403128f, -0.846490939f,
+    0.534997620f, -0.844853565f,
+    0.537587076f, -0.843208240f,
+    0.540171473f, -0.841554977f,
+    0.542750785f, -0.839893794f,
+    0.545324988f, -0.838224706f,
+    0.547894059f, -0.836547727f,
+    0.550457973f, -0.834862875f,
+    0.553016706f, -0.833170165f,
+    0.555570233f, -0.831469612f,
+    0.558118531f, -0.829761234f,
+    0.560661576f, -0.828045045f,
+    0.563199344f, -0.826321063f,
+    0.565731811f, -0.824589303f,
+    0.568258953f, -0.822849781f,
+    0.570780746f, -0.821102515f,
+    0.573297167f, -0.819347520f,
+    0.575808191f, -0.817584813f,
+    0.578313796f, -0.815814411f,
+    0.580813958f, -0.814036330f,
+    0.583308653f, -0.812250587f,
+    0.585797857f, -0.810457198f,
+    0.588281548f, -0.808656182f,
+    0.590759702f, -0.806847554f,
+    0.593232295f, -0.805031331f,
+    0.595699304f, -0.803207531f,
+    0.598160707f, -0.801376172f,
+    0.600616479f, -0.799537269f,
+    0.603066599f, -0.797690841f,
+    0.605511041f, -0.795836905f,
+    0.607949785f, -0.793975478f,
+    0.610382806f, -0.792106577f,
+    0.612810082f, -0.790230221f,
+    0.615231591f, -0.788346428f,
+    0.617647308f, -0.786455214f,
+    0.620057212f, -0.784556597f,
+    0.622461279f, -0.782650596f,
+    0.624859488f, -0.780737229f,
+    0.627251815f, -0.778816512f,
+    0.629638239f, -0.776888466f,
+    0.632018736f, -0.774953107f,
+    0.634393284f, -0.773010453f,
+    0.636761861f, -0.771060524f,
+    0.639124445f, -0.769103338f,
+    0.641481013f, -0.767138912f,
+    0.643831543f, -0.765167266f,
+    0.646176013f, -0.763188417f,
+    0.648514401f, -0.761202385f,
+    0.650846685f, -0.759209189f,
+    0.653172843f, -0.757208847f,
+    0.655492853f, -0.755201377f,
+    0.657806693f, -0.753186799f,
+    0.660114342f, -0.751165132f,
+    0.662415778f, -0.749136395f,
+    0.664710978f, -0.747100606f,
+    0.666999922f, -0.745057785f,
+    0.669282588f, -0.743007952f,
+    0.671558955f, -0.740951125f,
+    0.673829000f, -0.738887324f,
+    0.676092704f, -0.736816569f,
+    0.678350043f, -0.734738878f,
+    0.680600998f, -0.732654272f,
+    0.682845546f, -0.730562769f,
+    0.685083668f, -0.728464390f,
+    0.687315341f, -0.726359155f,
+    0.689540545f, -0.724247083f,
+    0.691759258f, -0.722128194f,
+    0.693971461f, -0.720002508f,
+    0.696177131f, -0.717870045f,
+    0.698376249f, -0.715730825f,
+    0.700568794f, -0.713584869f,
+    0.702754744f, -0.711432196f,
+    0.704934080f, -0.709272826f,
+    0.707106781f, -0.707106781f,
+    0.709272826f, -0.704934080f,
+    0.711432196f, -0.702754744f,
+    0.713584869f, -0.700568794f,
+    0.715730825f, -0.698376249f,
+    0.717870045f, -0.696177131f,
+    0.720002508f, -0.693971461f,
+    0.722128194f, -0.691759258f,
+    0.724247083f, -0.689540545f,
+    0.726359155f, -0.687315341f,
+    0.728464390f, -0.685083668f,
+    0.730562769f, -0.682845546f,
+    0.732654272f, -0.680600998f,
+    0.734738878f, -0.678350043f,
+    0.736816569f, -0.676092704f,
+    0.738887324f, -0.673829000f,
+    0.740951125f, -0.671558955f,
+    0.743007952f, -0.669282588f,
+    0.745057785f, -0.666999922f,
+    0.747100606f, -0.664710978f,
+    0.749136395f, -0.662415778f,
+    0.751165132f, -0.660114342f,
+    0.753186799f, -0.657806693f,
+    0.755201377f, -0.655492853f,
+    0.757208847f, -0.653172843f,
+    0.759209189f, -0.650846685f,
+    0.761202385f, -0.648514401f,
+    0.763188417f, -0.646176013f,
+    0.765167266f, -0.643831543f,
+    0.767138912f, -0.641481013f,
+    0.769103338f, -0.639124445f,
+    0.771060524f, -0.636761861f,
+    0.773010453f, -0.634393284f,
+    0.774953107f, -0.632018736f,
+    0.776888466f, -0.629638239f,
+    0.778816512f, -0.627251815f,
+    0.780737229f, -0.624859488f,
+    0.782650596f, -0.622461279f,
+    0.784556597f, -0.620057212f,
+    0.786455214f, -0.617647308f,
+    0.788346428f, -0.615231591f,
+    0.790230221f, -0.612810082f,
+    0.792106577f, -0.610382806f,
+    0.793975478f, -0.607949785f,
+    0.795836905f, -0.605511041f,
+    0.797690841f, -0.603066599f,
+    0.799537269f, -0.600616479f,
+    0.801376172f, -0.598160707f,
+    0.803207531f, -0.595699304f,
+    0.805031331f, -0.593232295f,
+    0.806847554f, -0.590759702f,
+    0.808656182f, -0.588281548f,
+    0.810457198f, -0.585797857f,
+    0.812250587f, -0.583308653f,
+    0.814036330f, -0.580813958f,
+    0.815814411f, -0.578313796f,
+    0.817584813f, -0.575808191f,
+    0.819347520f, -0.573297167f,
+    0.821102515f, -0.570780746f,
+    0.822849781f, -0.568258953f,
+    0.824589303f, -0.565731811f,
+    0.826321063f, -0.563199344f,
+    0.828045045f, -0.560661576f,
+    0.829761234f, -0.558118531f,
+    0.831469612f, -0.555570233f,
+    0.833170165f, -0.553016706f,
+    0.834862875f, -0.550457973f,
+    0.836547727f, -0.547894059f,
+    0.838224706f, -0.545324988f,
+    0.839893794f, -0.542750785f,
+    0.841554977f, -0.540171473f,
+    0.843208240f, -0.537587076f,
+    0.844853565f, -0.534997620f,
+    0.846490939f, -0.532403128f,
+    0.848120345f, -0.529803625f,
+    0.849741768f, -0.527199135f,
+    0.851355193f, -0.524589683f,
+    0.852960605f, -0.521975293f,
+    0.854557988f, -0.519355990f,
+    0.856147328f, -0.516731799f,
+    0.857728610f, -0.514102744f,
+    0.859301818f, -0.511468850f,
+    0.860866939f, -0.508830143f,
+    0.862423956f, -0.506186645f,
+    0.863972856f, -0.503538384f,
+    0.865513624f, -0.500885383f,
+    0.867046246f, -0.498227667f,
+    0.868570706f, -0.495565262f,
+    0.870086991f, -0.492898192f,
+    0.871595087f, -0.490226483f,
+    0.873094978f, -0.487550160f,
+    0.874586652f, -0.484869248f,
+    0.876070094f, -0.482183772f,
+    0.877545290f, -0.479493758f,
+    0.879012226f, -0.476799230f,
+    0.880470889f, -0.474100215f,
+    0.881921264f, -0.471396737f,
+    0.883363339f, -0.468688822f,
+    0.884797098f, -0.465976496f,
+    0.886222530f, -0.463259784f,
+    0.887639620f, -0.460538711f,
+    0.889048356f, -0.457813304f,
+    0.890448723f, -0.455083587f,
+    0.891840709f, -0.452349587f,
+    0.893224301f, -0.449611330f,
+    0.894599486f, -0.446868840f,
+    0.895966250f, -0.444122145f,
+    0.897324581f, -0.441371269f,
+    0.898674466f, -0.438616239f,
+    0.900015892f, -0.435857080f,
+    0.901348847f, -0.433093819f,
+    0.902673318f, -0.430326481f,
+    0.903989293f, -0.427555093f,
+    0.905296759f, -0.424779681f,
+    0.906595705f, -0.422000271f,
+    0.907886116f, -0.419216888f,
+    0.909167983f, -0.416429560f,
+    0.910441292f, -0.413638312f,
+    0.911706032f, -0.410843171f,
+    0.912962190f, -0.408044163f,
+    0.914209756f, -0.405241314f,
+    0.915448716f, -0.402434651f,
+    0.916679060f, -0.399624200f,
+    0.917900776f, -0.396809987f,
+    0.919113852f, -0.393992040f,
+    0.920318277f, -0.391170384f,
+    0.921514039f, -0.388345047f,
+    0.922701128f, -0.385516054f,
+    0.923879533f, -0.382683432f,
+    0.925049241f, -0.379847209f,
+    0.926210242f, -0.377007410f,
+    0.927362526f, -0.374164063f,
+    0.928506080f, -0.371317194f,
+    0.929640896f, -0.368466830f,
+    0.930766961f, -0.365612998f,
+    0.931884266f, -0.362755724f,
+    0.932992799f, -0.359895037f,
+    0.934092550f, -0.357030961f,
+    0.935183510f, -0.354163525f,
+    0.936265667f, -0.351292756f,
+    0.937339012f, -0.348418680f,
+    0.938403534f, -0.345541325f,
+    0.939459224f, -0.342660717f,
+    0.940506071f, -0.339776884f,
+    0.941544065f, -0.336889853f,
+    0.942573198f, -0.333999651f,
+    0.943593458f, -0.331106306f,
+    0.944604837f, -0.328209844f,
+    0.945607325f, -0.325310292f,
+    0.946600913f, -0.322407679f,
+    0.947585591f, -0.319502031f,
+    0.948561350f, -0.316593376f,
+    0.949528181f, -0.313681740f,
+    0.950486074f, -0.310767153f,
+    0.951435021f, -0.307849640f,
+    0.952375013f, -0.304929230f,
+    0.953306040f, -0.302005949f,
+    0.954228095f, -0.299079826f,
+    0.955141168f, -0.296150888f,
+    0.956045251f, -0.293219163f,
+    0.956940336f, -0.290284677f,
+    0.957826413f, -0.287347460f,
+    0.958703475f, -0.284407537f,
+    0.959571513f, -0.281464938f,
+    0.960430519f, -0.278519689f,
+    0.961280486f, -0.275571819f,
+    0.962121404f, -0.272621355f,
+    0.962953267f, -0.269668326f,
+    0.963776066f, -0.266712757f,
+    0.964589793f, -0.263754679f,
+    0.965394442f, -0.260794118f,
+    0.966190003f, -0.257831102f,
+    0.966976471f, -0.254865660f,
+    0.967753837f, -0.251897818f,
+    0.968522094f, -0.248927606f,
+    0.969281235f, -0.245955050f,
+    0.970031253f, -0.242980180f,
+    0.970772141f, -0.240003022f,
+    0.971503891f, -0.237023606f,
+    0.972226497f, -0.234041959f,
+    0.972939952f, -0.231058108f,
+    0.973644250f, -0.228072083f,
+    0.974339383f, -0.225083911f,
+    0.975025345f, -0.222093621f,
+    0.975702130f, -0.219101240f,
+    0.976369731f, -0.216106797f,
+    0.977028143f, -0.213110320f,
+    0.977677358f, -0.210111837f,
+    0.978317371f, -0.207111376f,
+    0.978948175f, -0.204108966f,
+    0.979569766f, -0.201104635f,
+    0.980182136f, -0.198098411f,
+    0.980785280f, -0.195090322f,
+    0.981379193f, -0.192080397f,
+    0.981963869f, -0.189068664f,
+    0.982539302f, -0.186055152f,
+    0.983105487f, -0.183039888f,
+    0.983662419f, -0.180022901f,
+    0.984210092f, -0.177004220f,
+    0.984748502f, -0.173983873f,
+    0.985277642f, -0.170961889f,
+    0.985797509f, -0.167938295f,
+    0.986308097f, -0.164913120f,
+    0.986809402f, -0.161886394f,
+    0.987301418f, -0.158858143f,
+    0.987784142f, -0.155828398f,
+    0.988257568f, -0.152797185f,
+    0.988721692f, -0.149764535f,
+    0.989176510f, -0.146730474f,
+    0.989622017f, -0.143695033f,
+    0.990058210f, -0.140658239f,
+    0.990485084f, -0.137620122f,
+    0.990902635f, -0.134580709f,
+    0.991310860f, -0.131540029f,
+    0.991709754f, -0.128498111f,
+    0.992099313f, -0.125454983f,
+    0.992479535f, -0.122410675f,
+    0.992850414f, -0.119365215f,
+    0.993211949f, -0.116318631f,
+    0.993564136f, -0.113270952f,
+    0.993906970f, -0.110222207f,
+    0.994240449f, -0.107172425f,
+    0.994564571f, -0.104121634f,
+    0.994879331f, -0.101069863f,
+    0.995184727f, -0.098017140f,
+    0.995480755f, -0.094963495f,
+    0.995767414f, -0.091908956f,
+    0.996044701f, -0.088853553f,
+    0.996312612f, -0.085797312f,
+    0.996571146f, -0.082740265f,
+    0.996820299f, -0.079682438f,
+    0.997060070f, -0.076623861f,
+    0.997290457f, -0.073564564f,
+    0.997511456f, -0.070504573f,
+    0.997723067f, -0.067443920f,
+    0.997925286f, -0.064382631f,
+    0.998118113f, -0.061320736f,
+    0.998301545f, -0.058258265f,
+    0.998475581f, -0.055195244f,
+    0.998640218f, -0.052131705f,
+    0.998795456f, -0.049067674f,
+    0.998941293f, -0.046003182f,
+    0.999077728f, -0.042938257f,
+    0.999204759f, -0.039872928f,
+    0.999322385f, -0.036807223f,
+    0.999430605f, -0.033741172f,
+    0.999529418f, -0.030674803f,
+    0.999618822f, -0.027608146f,
+    0.999698819f, -0.024541229f,
+    0.999769405f, -0.021474080f,
+    0.999830582f, -0.018406730f,
+    0.999882347f, -0.015339206f,
+    0.999924702f, -0.012271538f,
+    0.999957645f, -0.009203755f,
+    0.999981175f, -0.006135885f,
+    0.999995294f, -0.003067957f
+};
+
+/**
+* \par
+* Example code for Floating-point Twiddle factors Generation:
+* \par
+* <pre>for(i = 0; i < N; i++)
+* {
+*	twiddleCoef[2*i]= cos(i * 2*PI/(float)N);
+*	twiddleCoef[2*i+1]= sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 4096 and PI = 3.14159265358979
+* \par
+* Cos and Sin values are in interleaved fashion
+*
+*/
+const float32_t twiddleCoef_4096[8192] = {
+    1.000000000f,  0.000000000f,
+    0.999998823f,  0.001533980f,
+    0.999995294f,  0.003067957f,
+    0.999989411f,  0.004601926f,
+    0.999981175f,  0.006135885f,
+    0.999970586f,  0.007669829f,
+    0.999957645f,  0.009203755f,
+    0.999942350f,  0.010737659f,
+    0.999924702f,  0.012271538f,
+    0.999904701f,  0.013805389f,
+    0.999882347f,  0.015339206f,
+    0.999857641f,  0.016872988f,
+    0.999830582f,  0.018406730f,
+    0.999801170f,  0.019940429f,
+    0.999769405f,  0.021474080f,
+    0.999735288f,  0.023007681f,
+    0.999698819f,  0.024541229f,
+    0.999659997f,  0.026074718f,
+    0.999618822f,  0.027608146f,
+    0.999575296f,  0.029141509f,
+    0.999529418f,  0.030674803f,
+    0.999481187f,  0.032208025f,
+    0.999430605f,  0.033741172f,
+    0.999377670f,  0.035274239f,
+    0.999322385f,  0.036807223f,
+    0.999264747f,  0.038340120f,
+    0.999204759f,  0.039872928f,
+    0.999142419f,  0.041405641f,
+    0.999077728f,  0.042938257f,
+    0.999010686f,  0.044470772f,
+    0.998941293f,  0.046003182f,
+    0.998869550f,  0.047535484f,
+    0.998795456f,  0.049067674f,
+    0.998719012f,  0.050599749f,
+    0.998640218f,  0.052131705f,
+    0.998559074f,  0.053663538f,
+    0.998475581f,  0.055195244f,
+    0.998389737f,  0.056726821f,
+    0.998301545f,  0.058258265f,
+    0.998211003f,  0.059789571f,
+    0.998118113f,  0.061320736f,
+    0.998022874f,  0.062851758f,
+    0.997925286f,  0.064382631f,
+    0.997825350f,  0.065913353f,
+    0.997723067f,  0.067443920f,
+    0.997618435f,  0.068974328f,
+    0.997511456f,  0.070504573f,
+    0.997402130f,  0.072034653f,
+    0.997290457f,  0.073564564f,
+    0.997176437f,  0.075094301f,
+    0.997060070f,  0.076623861f,
+    0.996941358f,  0.078153242f,
+    0.996820299f,  0.079682438f,
+    0.996696895f,  0.081211447f,
+    0.996571146f,  0.082740265f,
+    0.996443051f,  0.084268888f,
+    0.996312612f,  0.085797312f,
+    0.996179829f,  0.087325535f,
+    0.996044701f,  0.088853553f,
+    0.995907229f,  0.090381361f,
+    0.995767414f,  0.091908956f,
+    0.995625256f,  0.093436336f,
+    0.995480755f,  0.094963495f,
+    0.995333912f,  0.096490431f,
+    0.995184727f,  0.098017140f,
+    0.995033199f,  0.099543619f,
+    0.994879331f,  0.101069863f,
+    0.994723121f,  0.102595869f,
+    0.994564571f,  0.104121634f,
+    0.994403680f,  0.105647154f,
+    0.994240449f,  0.107172425f,
+    0.994074879f,  0.108697444f,
+    0.993906970f,  0.110222207f,
+    0.993736722f,  0.111746711f,
+    0.993564136f,  0.113270952f,
+    0.993389211f,  0.114794927f,
+    0.993211949f,  0.116318631f,
+    0.993032350f,  0.117842062f,
+    0.992850414f,  0.119365215f,
+    0.992666142f,  0.120888087f,
+    0.992479535f,  0.122410675f,
+    0.992290591f,  0.123932975f,
+    0.992099313f,  0.125454983f,
+    0.991905700f,  0.126976696f,
+    0.991709754f,  0.128498111f,
+    0.991511473f,  0.130019223f,
+    0.991310860f,  0.131540029f,
+    0.991107914f,  0.133060525f,
+    0.990902635f,  0.134580709f,
+    0.990695025f,  0.136100575f,
+    0.990485084f,  0.137620122f,
+    0.990272812f,  0.139139344f,
+    0.990058210f,  0.140658239f,
+    0.989841278f,  0.142176804f,
+    0.989622017f,  0.143695033f,
+    0.989400428f,  0.145212925f,
+    0.989176510f,  0.146730474f,
+    0.988950265f,  0.148247679f,
+    0.988721692f,  0.149764535f,
+    0.988490793f,  0.151281038f,
+    0.988257568f,  0.152797185f,
+    0.988022017f,  0.154312973f,
+    0.987784142f,  0.155828398f,
+    0.987543942f,  0.157343456f,
+    0.987301418f,  0.158858143f,
+    0.987056571f,  0.160372457f,
+    0.986809402f,  0.161886394f,
+    0.986559910f,  0.163399949f,
+    0.986308097f,  0.164913120f,
+    0.986053963f,  0.166425904f,
+    0.985797509f,  0.167938295f,
+    0.985538735f,  0.169450291f,
+    0.985277642f,  0.170961889f,
+    0.985014231f,  0.172473084f,
+    0.984748502f,  0.173983873f,
+    0.984480455f,  0.175494253f,
+    0.984210092f,  0.177004220f,
+    0.983937413f,  0.178513771f,
+    0.983662419f,  0.180022901f,
+    0.983385110f,  0.181531608f,
+    0.983105487f,  0.183039888f,
+    0.982823551f,  0.184547737f,
+    0.982539302f,  0.186055152f,
+    0.982252741f,  0.187562129f,
+    0.981963869f,  0.189068664f,
+    0.981672686f,  0.190574755f,
+    0.981379193f,  0.192080397f,
+    0.981083391f,  0.193585587f,
+    0.980785280f,  0.195090322f,
+    0.980484862f,  0.196594598f,
+    0.980182136f,  0.198098411f,
+    0.979877104f,  0.199601758f,
+    0.979569766f,  0.201104635f,
+    0.979260123f,  0.202607039f,
+    0.978948175f,  0.204108966f,
+    0.978633924f,  0.205610413f,
+    0.978317371f,  0.207111376f,
+    0.977998515f,  0.208611852f,
+    0.977677358f,  0.210111837f,
+    0.977353900f,  0.211611327f,
+    0.977028143f,  0.213110320f,
+    0.976700086f,  0.214608811f,
+    0.976369731f,  0.216106797f,
+    0.976037079f,  0.217604275f,
+    0.975702130f,  0.219101240f,
+    0.975364885f,  0.220597690f,
+    0.975025345f,  0.222093621f,
+    0.974683511f,  0.223589029f,
+    0.974339383f,  0.225083911f,
+    0.973992962f,  0.226578264f,
+    0.973644250f,  0.228072083f,
+    0.973293246f,  0.229565366f,
+    0.972939952f,  0.231058108f,
+    0.972584369f,  0.232550307f,
+    0.972226497f,  0.234041959f,
+    0.971866337f,  0.235533059f,
+    0.971503891f,  0.237023606f,
+    0.971139158f,  0.238513595f,
+    0.970772141f,  0.240003022f,
+    0.970402839f,  0.241491885f,
+    0.970031253f,  0.242980180f,
+    0.969657385f,  0.244467903f,
+    0.969281235f,  0.245955050f,
+    0.968902805f,  0.247441619f,
+    0.968522094f,  0.248927606f,
+    0.968139105f,  0.250413007f,
+    0.967753837f,  0.251897818f,
+    0.967366292f,  0.253382037f,
+    0.966976471f,  0.254865660f,
+    0.966584374f,  0.256348682f,
+    0.966190003f,  0.257831102f,
+    0.965793359f,  0.259312915f,
+    0.965394442f,  0.260794118f,
+    0.964993253f,  0.262274707f,
+    0.964589793f,  0.263754679f,
+    0.964184064f,  0.265234030f,
+    0.963776066f,  0.266712757f,
+    0.963365800f,  0.268190857f,
+    0.962953267f,  0.269668326f,
+    0.962538468f,  0.271145160f,
+    0.962121404f,  0.272621355f,
+    0.961702077f,  0.274096910f,
+    0.961280486f,  0.275571819f,
+    0.960856633f,  0.277046080f,
+    0.960430519f,  0.278519689f,
+    0.960002146f,  0.279992643f,
+    0.959571513f,  0.281464938f,
+    0.959138622f,  0.282936570f,
+    0.958703475f,  0.284407537f,
+    0.958266071f,  0.285877835f,
+    0.957826413f,  0.287347460f,
+    0.957384501f,  0.288816408f,
+    0.956940336f,  0.290284677f,
+    0.956493919f,  0.291752263f,
+    0.956045251f,  0.293219163f,
+    0.955594334f,  0.294685372f,
+    0.955141168f,  0.296150888f,
+    0.954685755f,  0.297615707f,
+    0.954228095f,  0.299079826f,
+    0.953768190f,  0.300543241f,
+    0.953306040f,  0.302005949f,
+    0.952841648f,  0.303467947f,
+    0.952375013f,  0.304929230f,
+    0.951906137f,  0.306389795f,
+    0.951435021f,  0.307849640f,
+    0.950961666f,  0.309308760f,
+    0.950486074f,  0.310767153f,
+    0.950008245f,  0.312224814f,
+    0.949528181f,  0.313681740f,
+    0.949045882f,  0.315137929f,
+    0.948561350f,  0.316593376f,
+    0.948074586f,  0.318048077f,
+    0.947585591f,  0.319502031f,
+    0.947094366f,  0.320955232f,
+    0.946600913f,  0.322407679f,
+    0.946105232f,  0.323859367f,
+    0.945607325f,  0.325310292f,
+    0.945107193f,  0.326760452f,
+    0.944604837f,  0.328209844f,
+    0.944100258f,  0.329658463f,
+    0.943593458f,  0.331106306f,
+    0.943084437f,  0.332553370f,
+    0.942573198f,  0.333999651f,
+    0.942059740f,  0.335445147f,
+    0.941544065f,  0.336889853f,
+    0.941026175f,  0.338333767f,
+    0.940506071f,  0.339776884f,
+    0.939983753f,  0.341219202f,
+    0.939459224f,  0.342660717f,
+    0.938932484f,  0.344101426f,
+    0.938403534f,  0.345541325f,
+    0.937872376f,  0.346980411f,
+    0.937339012f,  0.348418680f,
+    0.936803442f,  0.349856130f,
+    0.936265667f,  0.351292756f,
+    0.935725689f,  0.352728556f,
+    0.935183510f,  0.354163525f,
+    0.934639130f,  0.355597662f,
+    0.934092550f,  0.357030961f,
+    0.933543773f,  0.358463421f,
+    0.932992799f,  0.359895037f,
+    0.932439629f,  0.361325806f,
+    0.931884266f,  0.362755724f,
+    0.931326709f,  0.364184790f,
+    0.930766961f,  0.365612998f,
+    0.930205023f,  0.367040346f,
+    0.929640896f,  0.368466830f,
+    0.929074581f,  0.369892447f,
+    0.928506080f,  0.371317194f,
+    0.927935395f,  0.372741067f,
+    0.927362526f,  0.374164063f,
+    0.926787474f,  0.375586178f,
+    0.926210242f,  0.377007410f,
+    0.925630831f,  0.378427755f,
+    0.925049241f,  0.379847209f,
+    0.924465474f,  0.381265769f,
+    0.923879533f,  0.382683432f,
+    0.923291417f,  0.384100195f,
+    0.922701128f,  0.385516054f,
+    0.922108669f,  0.386931006f,
+    0.921514039f,  0.388345047f,
+    0.920917242f,  0.389758174f,
+    0.920318277f,  0.391170384f,
+    0.919717146f,  0.392581674f,
+    0.919113852f,  0.393992040f,
+    0.918508394f,  0.395401479f,
+    0.917900776f,  0.396809987f,
+    0.917290997f,  0.398217562f,
+    0.916679060f,  0.399624200f,
+    0.916064966f,  0.401029897f,
+    0.915448716f,  0.402434651f,
+    0.914830312f,  0.403838458f,
+    0.914209756f,  0.405241314f,
+    0.913587048f,  0.406643217f,
+    0.912962190f,  0.408044163f,
+    0.912335185f,  0.409444149f,
+    0.911706032f,  0.410843171f,
+    0.911074734f,  0.412241227f,
+    0.910441292f,  0.413638312f,
+    0.909805708f,  0.415034424f,
+    0.909167983f,  0.416429560f,
+    0.908528119f,  0.417823716f,
+    0.907886116f,  0.419216888f,
+    0.907241978f,  0.420609074f,
+    0.906595705f,  0.422000271f,
+    0.905947298f,  0.423390474f,
+    0.905296759f,  0.424779681f,
+    0.904644091f,  0.426167889f,
+    0.903989293f,  0.427555093f,
+    0.903332368f,  0.428941292f,
+    0.902673318f,  0.430326481f,
+    0.902012144f,  0.431710658f,
+    0.901348847f,  0.433093819f,
+    0.900683429f,  0.434475961f,
+    0.900015892f,  0.435857080f,
+    0.899346237f,  0.437237174f,
+    0.898674466f,  0.438616239f,
+    0.898000580f,  0.439994271f,
+    0.897324581f,  0.441371269f,
+    0.896646470f,  0.442747228f,
+    0.895966250f,  0.444122145f,
+    0.895283921f,  0.445496017f,
+    0.894599486f,  0.446868840f,
+    0.893912945f,  0.448240612f,
+    0.893224301f,  0.449611330f,
+    0.892533555f,  0.450980989f,
+    0.891840709f,  0.452349587f,
+    0.891145765f,  0.453717121f,
+    0.890448723f,  0.455083587f,
+    0.889749586f,  0.456448982f,
+    0.889048356f,  0.457813304f,
+    0.888345033f,  0.459176548f,
+    0.887639620f,  0.460538711f,
+    0.886932119f,  0.461899791f,
+    0.886222530f,  0.463259784f,
+    0.885510856f,  0.464618686f,
+    0.884797098f,  0.465976496f,
+    0.884081259f,  0.467333209f,
+    0.883363339f,  0.468688822f,
+    0.882643340f,  0.470043332f,
+    0.881921264f,  0.471396737f,
+    0.881197113f,  0.472749032f,
+    0.880470889f,  0.474100215f,
+    0.879742593f,  0.475450282f,
+    0.879012226f,  0.476799230f,
+    0.878279792f,  0.478147056f,
+    0.877545290f,  0.479493758f,
+    0.876808724f,  0.480839331f,
+    0.876070094f,  0.482183772f,
+    0.875329403f,  0.483527079f,
+    0.874586652f,  0.484869248f,
+    0.873841843f,  0.486210276f,
+    0.873094978f,  0.487550160f,
+    0.872346059f,  0.488888897f,
+    0.871595087f,  0.490226483f,
+    0.870842063f,  0.491562916f,
+    0.870086991f,  0.492898192f,
+    0.869329871f,  0.494232309f,
+    0.868570706f,  0.495565262f,
+    0.867809497f,  0.496897049f,
+    0.867046246f,  0.498227667f,
+    0.866280954f,  0.499557113f,
+    0.865513624f,  0.500885383f,
+    0.864744258f,  0.502212474f,
+    0.863972856f,  0.503538384f,
+    0.863199422f,  0.504863109f,
+    0.862423956f,  0.506186645f,
+    0.861646461f,  0.507508991f,
+    0.860866939f,  0.508830143f,
+    0.860085390f,  0.510150097f,
+    0.859301818f,  0.511468850f,
+    0.858516224f,  0.512786401f,
+    0.857728610f,  0.514102744f,
+    0.856938977f,  0.515417878f,
+    0.856147328f,  0.516731799f,
+    0.855353665f,  0.518044504f,
+    0.854557988f,  0.519355990f,
+    0.853760301f,  0.520666254f,
+    0.852960605f,  0.521975293f,
+    0.852158902f,  0.523283103f,
+    0.851355193f,  0.524589683f,
+    0.850549481f,  0.525895027f,
+    0.849741768f,  0.527199135f,
+    0.848932055f,  0.528502002f,
+    0.848120345f,  0.529803625f,
+    0.847306639f,  0.531104001f,
+    0.846490939f,  0.532403128f,
+    0.845673247f,  0.533701002f,
+    0.844853565f,  0.534997620f,
+    0.844031895f,  0.536292979f,
+    0.843208240f,  0.537587076f,
+    0.842382600f,  0.538879909f,
+    0.841554977f,  0.540171473f,
+    0.840725375f,  0.541461766f,
+    0.839893794f,  0.542750785f,
+    0.839060237f,  0.544038527f,
+    0.838224706f,  0.545324988f,
+    0.837387202f,  0.546610167f,
+    0.836547727f,  0.547894059f,
+    0.835706284f,  0.549176662f,
+    0.834862875f,  0.550457973f,
+    0.834017501f,  0.551737988f,
+    0.833170165f,  0.553016706f,
+    0.832320868f,  0.554294121f,
+    0.831469612f,  0.555570233f,
+    0.830616400f,  0.556845037f,
+    0.829761234f,  0.558118531f,
+    0.828904115f,  0.559390712f,
+    0.828045045f,  0.560661576f,
+    0.827184027f,  0.561931121f,
+    0.826321063f,  0.563199344f,
+    0.825456154f,  0.564466242f,
+    0.824589303f,  0.565731811f,
+    0.823720511f,  0.566996049f,
+    0.822849781f,  0.568258953f,
+    0.821977115f,  0.569520519f,
+    0.821102515f,  0.570780746f,
+    0.820225983f,  0.572039629f,
+    0.819347520f,  0.573297167f,
+    0.818467130f,  0.574553355f,
+    0.817584813f,  0.575808191f,
+    0.816700573f,  0.577061673f,
+    0.815814411f,  0.578313796f,
+    0.814926329f,  0.579564559f,
+    0.814036330f,  0.580813958f,
+    0.813144415f,  0.582061990f,
+    0.812250587f,  0.583308653f,
+    0.811354847f,  0.584553943f,
+    0.810457198f,  0.585797857f,
+    0.809557642f,  0.587040394f,
+    0.808656182f,  0.588281548f,
+    0.807752818f,  0.589521319f,
+    0.806847554f,  0.590759702f,
+    0.805940391f,  0.591996695f,
+    0.805031331f,  0.593232295f,
+    0.804120377f,  0.594466499f,
+    0.803207531f,  0.595699304f,
+    0.802292796f,  0.596930708f,
+    0.801376172f,  0.598160707f,
+    0.800457662f,  0.599389298f,
+    0.799537269f,  0.600616479f,
+    0.798614995f,  0.601842247f,
+    0.797690841f,  0.603066599f,
+    0.796764810f,  0.604289531f,
+    0.795836905f,  0.605511041f,
+    0.794907126f,  0.606731127f,
+    0.793975478f,  0.607949785f,
+    0.793041960f,  0.609167012f,
+    0.792106577f,  0.610382806f,
+    0.791169330f,  0.611597164f,
+    0.790230221f,  0.612810082f,
+    0.789289253f,  0.614021559f,
+    0.788346428f,  0.615231591f,
+    0.787401747f,  0.616440175f,
+    0.786455214f,  0.617647308f,
+    0.785506830f,  0.618852988f,
+    0.784556597f,  0.620057212f,
+    0.783604519f,  0.621259977f,
+    0.782650596f,  0.622461279f,
+    0.781694832f,  0.623661118f,
+    0.780737229f,  0.624859488f,
+    0.779777788f,  0.626056388f,
+    0.778816512f,  0.627251815f,
+    0.777853404f,  0.628445767f,
+    0.776888466f,  0.629638239f,
+    0.775921699f,  0.630829230f,
+    0.774953107f,  0.632018736f,
+    0.773982691f,  0.633206755f,
+    0.773010453f,  0.634393284f,
+    0.772036397f,  0.635578320f,
+    0.771060524f,  0.636761861f,
+    0.770082837f,  0.637943904f,
+    0.769103338f,  0.639124445f,
+    0.768122029f,  0.640303482f,
+    0.767138912f,  0.641481013f,
+    0.766153990f,  0.642657034f,
+    0.765167266f,  0.643831543f,
+    0.764178741f,  0.645004537f,
+    0.763188417f,  0.646176013f,
+    0.762196298f,  0.647345969f,
+    0.761202385f,  0.648514401f,
+    0.760206682f,  0.649681307f,
+    0.759209189f,  0.650846685f,
+    0.758209910f,  0.652010531f,
+    0.757208847f,  0.653172843f,
+    0.756206001f,  0.654333618f,
+    0.755201377f,  0.655492853f,
+    0.754194975f,  0.656650546f,
+    0.753186799f,  0.657806693f,
+    0.752176850f,  0.658961293f,
+    0.751165132f,  0.660114342f,
+    0.750151646f,  0.661265838f,
+    0.749136395f,  0.662415778f,
+    0.748119380f,  0.663564159f,
+    0.747100606f,  0.664710978f,
+    0.746080074f,  0.665856234f,
+    0.745057785f,  0.666999922f,
+    0.744033744f,  0.668142041f,
+    0.743007952f,  0.669282588f,
+    0.741980412f,  0.670421560f,
+    0.740951125f,  0.671558955f,
+    0.739920095f,  0.672694769f,
+    0.738887324f,  0.673829000f,
+    0.737852815f,  0.674961646f,
+    0.736816569f,  0.676092704f,
+    0.735778589f,  0.677222170f,
+    0.734738878f,  0.678350043f,
+    0.733697438f,  0.679476320f,
+    0.732654272f,  0.680600998f,
+    0.731609381f,  0.681724074f,
+    0.730562769f,  0.682845546f,
+    0.729514438f,  0.683965412f,
+    0.728464390f,  0.685083668f,
+    0.727412629f,  0.686200312f,
+    0.726359155f,  0.687315341f,
+    0.725303972f,  0.688428753f,
+    0.724247083f,  0.689540545f,
+    0.723188489f,  0.690650714f,
+    0.722128194f,  0.691759258f,
+    0.721066199f,  0.692866175f,
+    0.720002508f,  0.693971461f,
+    0.718937122f,  0.695075114f,
+    0.717870045f,  0.696177131f,
+    0.716801279f,  0.697277511f,
+    0.715730825f,  0.698376249f,
+    0.714658688f,  0.699473345f,
+    0.713584869f,  0.700568794f,
+    0.712509371f,  0.701662595f,
+    0.711432196f,  0.702754744f,
+    0.710353347f,  0.703845241f,
+    0.709272826f,  0.704934080f,
+    0.708190637f,  0.706021261f,
+    0.707106781f,  0.707106781f,
+    0.706021261f,  0.708190637f,
+    0.704934080f,  0.709272826f,
+    0.703845241f,  0.710353347f,
+    0.702754744f,  0.711432196f,
+    0.701662595f,  0.712509371f,
+    0.700568794f,  0.713584869f,
+    0.699473345f,  0.714658688f,
+    0.698376249f,  0.715730825f,
+    0.697277511f,  0.716801279f,
+    0.696177131f,  0.717870045f,
+    0.695075114f,  0.718937122f,
+    0.693971461f,  0.720002508f,
+    0.692866175f,  0.721066199f,
+    0.691759258f,  0.722128194f,
+    0.690650714f,  0.723188489f,
+    0.689540545f,  0.724247083f,
+    0.688428753f,  0.725303972f,
+    0.687315341f,  0.726359155f,
+    0.686200312f,  0.727412629f,
+    0.685083668f,  0.728464390f,
+    0.683965412f,  0.729514438f,
+    0.682845546f,  0.730562769f,
+    0.681724074f,  0.731609381f,
+    0.680600998f,  0.732654272f,
+    0.679476320f,  0.733697438f,
+    0.678350043f,  0.734738878f,
+    0.677222170f,  0.735778589f,
+    0.676092704f,  0.736816569f,
+    0.674961646f,  0.737852815f,
+    0.673829000f,  0.738887324f,
+    0.672694769f,  0.739920095f,
+    0.671558955f,  0.740951125f,
+    0.670421560f,  0.741980412f,
+    0.669282588f,  0.743007952f,
+    0.668142041f,  0.744033744f,
+    0.666999922f,  0.745057785f,
+    0.665856234f,  0.746080074f,
+    0.664710978f,  0.747100606f,
+    0.663564159f,  0.748119380f,
+    0.662415778f,  0.749136395f,
+    0.661265838f,  0.750151646f,
+    0.660114342f,  0.751165132f,
+    0.658961293f,  0.752176850f,
+    0.657806693f,  0.753186799f,
+    0.656650546f,  0.754194975f,
+    0.655492853f,  0.755201377f,
+    0.654333618f,  0.756206001f,
+    0.653172843f,  0.757208847f,
+    0.652010531f,  0.758209910f,
+    0.650846685f,  0.759209189f,
+    0.649681307f,  0.760206682f,
+    0.648514401f,  0.761202385f,
+    0.647345969f,  0.762196298f,
+    0.646176013f,  0.763188417f,
+    0.645004537f,  0.764178741f,
+    0.643831543f,  0.765167266f,
+    0.642657034f,  0.766153990f,
+    0.641481013f,  0.767138912f,
+    0.640303482f,  0.768122029f,
+    0.639124445f,  0.769103338f,
+    0.637943904f,  0.770082837f,
+    0.636761861f,  0.771060524f,
+    0.635578320f,  0.772036397f,
+    0.634393284f,  0.773010453f,
+    0.633206755f,  0.773982691f,
+    0.632018736f,  0.774953107f,
+    0.630829230f,  0.775921699f,
+    0.629638239f,  0.776888466f,
+    0.628445767f,  0.777853404f,
+    0.627251815f,  0.778816512f,
+    0.626056388f,  0.779777788f,
+    0.624859488f,  0.780737229f,
+    0.623661118f,  0.781694832f,
+    0.622461279f,  0.782650596f,
+    0.621259977f,  0.783604519f,
+    0.620057212f,  0.784556597f,
+    0.618852988f,  0.785506830f,
+    0.617647308f,  0.786455214f,
+    0.616440175f,  0.787401747f,
+    0.615231591f,  0.788346428f,
+    0.614021559f,  0.789289253f,
+    0.612810082f,  0.790230221f,
+    0.611597164f,  0.791169330f,
+    0.610382806f,  0.792106577f,
+    0.609167012f,  0.793041960f,
+    0.607949785f,  0.793975478f,
+    0.606731127f,  0.794907126f,
+    0.605511041f,  0.795836905f,
+    0.604289531f,  0.796764810f,
+    0.603066599f,  0.797690841f,
+    0.601842247f,  0.798614995f,
+    0.600616479f,  0.799537269f,
+    0.599389298f,  0.800457662f,
+    0.598160707f,  0.801376172f,
+    0.596930708f,  0.802292796f,
+    0.595699304f,  0.803207531f,
+    0.594466499f,  0.804120377f,
+    0.593232295f,  0.805031331f,
+    0.591996695f,  0.805940391f,
+    0.590759702f,  0.806847554f,
+    0.589521319f,  0.807752818f,
+    0.588281548f,  0.808656182f,
+    0.587040394f,  0.809557642f,
+    0.585797857f,  0.810457198f,
+    0.584553943f,  0.811354847f,
+    0.583308653f,  0.812250587f,
+    0.582061990f,  0.813144415f,
+    0.580813958f,  0.814036330f,
+    0.579564559f,  0.814926329f,
+    0.578313796f,  0.815814411f,
+    0.577061673f,  0.816700573f,
+    0.575808191f,  0.817584813f,
+    0.574553355f,  0.818467130f,
+    0.573297167f,  0.819347520f,
+    0.572039629f,  0.820225983f,
+    0.570780746f,  0.821102515f,
+    0.569520519f,  0.821977115f,
+    0.568258953f,  0.822849781f,
+    0.566996049f,  0.823720511f,
+    0.565731811f,  0.824589303f,
+    0.564466242f,  0.825456154f,
+    0.563199344f,  0.826321063f,
+    0.561931121f,  0.827184027f,
+    0.560661576f,  0.828045045f,
+    0.559390712f,  0.828904115f,
+    0.558118531f,  0.829761234f,
+    0.556845037f,  0.830616400f,
+    0.555570233f,  0.831469612f,
+    0.554294121f,  0.832320868f,
+    0.553016706f,  0.833170165f,
+    0.551737988f,  0.834017501f,
+    0.550457973f,  0.834862875f,
+    0.549176662f,  0.835706284f,
+    0.547894059f,  0.836547727f,
+    0.546610167f,  0.837387202f,
+    0.545324988f,  0.838224706f,
+    0.544038527f,  0.839060237f,
+    0.542750785f,  0.839893794f,
+    0.541461766f,  0.840725375f,
+    0.540171473f,  0.841554977f,
+    0.538879909f,  0.842382600f,
+    0.537587076f,  0.843208240f,
+    0.536292979f,  0.844031895f,
+    0.534997620f,  0.844853565f,
+    0.533701002f,  0.845673247f,
+    0.532403128f,  0.846490939f,
+    0.531104001f,  0.847306639f,
+    0.529803625f,  0.848120345f,
+    0.528502002f,  0.848932055f,
+    0.527199135f,  0.849741768f,
+    0.525895027f,  0.850549481f,
+    0.524589683f,  0.851355193f,
+    0.523283103f,  0.852158902f,
+    0.521975293f,  0.852960605f,
+    0.520666254f,  0.853760301f,
+    0.519355990f,  0.854557988f,
+    0.518044504f,  0.855353665f,
+    0.516731799f,  0.856147328f,
+    0.515417878f,  0.856938977f,
+    0.514102744f,  0.857728610f,
+    0.512786401f,  0.858516224f,
+    0.511468850f,  0.859301818f,
+    0.510150097f,  0.860085390f,
+    0.508830143f,  0.860866939f,
+    0.507508991f,  0.861646461f,
+    0.506186645f,  0.862423956f,
+    0.504863109f,  0.863199422f,
+    0.503538384f,  0.863972856f,
+    0.502212474f,  0.864744258f,
+    0.500885383f,  0.865513624f,
+    0.499557113f,  0.866280954f,
+    0.498227667f,  0.867046246f,
+    0.496897049f,  0.867809497f,
+    0.495565262f,  0.868570706f,
+    0.494232309f,  0.869329871f,
+    0.492898192f,  0.870086991f,
+    0.491562916f,  0.870842063f,
+    0.490226483f,  0.871595087f,
+    0.488888897f,  0.872346059f,
+    0.487550160f,  0.873094978f,
+    0.486210276f,  0.873841843f,
+    0.484869248f,  0.874586652f,
+    0.483527079f,  0.875329403f,
+    0.482183772f,  0.876070094f,
+    0.480839331f,  0.876808724f,
+    0.479493758f,  0.877545290f,
+    0.478147056f,  0.878279792f,
+    0.476799230f,  0.879012226f,
+    0.475450282f,  0.879742593f,
+    0.474100215f,  0.880470889f,
+    0.472749032f,  0.881197113f,
+    0.471396737f,  0.881921264f,
+    0.470043332f,  0.882643340f,
+    0.468688822f,  0.883363339f,
+    0.467333209f,  0.884081259f,
+    0.465976496f,  0.884797098f,
+    0.464618686f,  0.885510856f,
+    0.463259784f,  0.886222530f,
+    0.461899791f,  0.886932119f,
+    0.460538711f,  0.887639620f,
+    0.459176548f,  0.888345033f,
+    0.457813304f,  0.889048356f,
+    0.456448982f,  0.889749586f,
+    0.455083587f,  0.890448723f,
+    0.453717121f,  0.891145765f,
+    0.452349587f,  0.891840709f,
+    0.450980989f,  0.892533555f,
+    0.449611330f,  0.893224301f,
+    0.448240612f,  0.893912945f,
+    0.446868840f,  0.894599486f,
+    0.445496017f,  0.895283921f,
+    0.444122145f,  0.895966250f,
+    0.442747228f,  0.896646470f,
+    0.441371269f,  0.897324581f,
+    0.439994271f,  0.898000580f,
+    0.438616239f,  0.898674466f,
+    0.437237174f,  0.899346237f,
+    0.435857080f,  0.900015892f,
+    0.434475961f,  0.900683429f,
+    0.433093819f,  0.901348847f,
+    0.431710658f,  0.902012144f,
+    0.430326481f,  0.902673318f,
+    0.428941292f,  0.903332368f,
+    0.427555093f,  0.903989293f,
+    0.426167889f,  0.904644091f,
+    0.424779681f,  0.905296759f,
+    0.423390474f,  0.905947298f,
+    0.422000271f,  0.906595705f,
+    0.420609074f,  0.907241978f,
+    0.419216888f,  0.907886116f,
+    0.417823716f,  0.908528119f,
+    0.416429560f,  0.909167983f,
+    0.415034424f,  0.909805708f,
+    0.413638312f,  0.910441292f,
+    0.412241227f,  0.911074734f,
+    0.410843171f,  0.911706032f,
+    0.409444149f,  0.912335185f,
+    0.408044163f,  0.912962190f,
+    0.406643217f,  0.913587048f,
+    0.405241314f,  0.914209756f,
+    0.403838458f,  0.914830312f,
+    0.402434651f,  0.915448716f,
+    0.401029897f,  0.916064966f,
+    0.399624200f,  0.916679060f,
+    0.398217562f,  0.917290997f,
+    0.396809987f,  0.917900776f,
+    0.395401479f,  0.918508394f,
+    0.393992040f,  0.919113852f,
+    0.392581674f,  0.919717146f,
+    0.391170384f,  0.920318277f,
+    0.389758174f,  0.920917242f,
+    0.388345047f,  0.921514039f,
+    0.386931006f,  0.922108669f,
+    0.385516054f,  0.922701128f,
+    0.384100195f,  0.923291417f,
+    0.382683432f,  0.923879533f,
+    0.381265769f,  0.924465474f,
+    0.379847209f,  0.925049241f,
+    0.378427755f,  0.925630831f,
+    0.377007410f,  0.926210242f,
+    0.375586178f,  0.926787474f,
+    0.374164063f,  0.927362526f,
+    0.372741067f,  0.927935395f,
+    0.371317194f,  0.928506080f,
+    0.369892447f,  0.929074581f,
+    0.368466830f,  0.929640896f,
+    0.367040346f,  0.930205023f,
+    0.365612998f,  0.930766961f,
+    0.364184790f,  0.931326709f,
+    0.362755724f,  0.931884266f,
+    0.361325806f,  0.932439629f,
+    0.359895037f,  0.932992799f,
+    0.358463421f,  0.933543773f,
+    0.357030961f,  0.934092550f,
+    0.355597662f,  0.934639130f,
+    0.354163525f,  0.935183510f,
+    0.352728556f,  0.935725689f,
+    0.351292756f,  0.936265667f,
+    0.349856130f,  0.936803442f,
+    0.348418680f,  0.937339012f,
+    0.346980411f,  0.937872376f,
+    0.345541325f,  0.938403534f,
+    0.344101426f,  0.938932484f,
+    0.342660717f,  0.939459224f,
+    0.341219202f,  0.939983753f,
+    0.339776884f,  0.940506071f,
+    0.338333767f,  0.941026175f,
+    0.336889853f,  0.941544065f,
+    0.335445147f,  0.942059740f,
+    0.333999651f,  0.942573198f,
+    0.332553370f,  0.943084437f,
+    0.331106306f,  0.943593458f,
+    0.329658463f,  0.944100258f,
+    0.328209844f,  0.944604837f,
+    0.326760452f,  0.945107193f,
+    0.325310292f,  0.945607325f,
+    0.323859367f,  0.946105232f,
+    0.322407679f,  0.946600913f,
+    0.320955232f,  0.947094366f,
+    0.319502031f,  0.947585591f,
+    0.318048077f,  0.948074586f,
+    0.316593376f,  0.948561350f,
+    0.315137929f,  0.949045882f,
+    0.313681740f,  0.949528181f,
+    0.312224814f,  0.950008245f,
+    0.310767153f,  0.950486074f,
+    0.309308760f,  0.950961666f,
+    0.307849640f,  0.951435021f,
+    0.306389795f,  0.951906137f,
+    0.304929230f,  0.952375013f,
+    0.303467947f,  0.952841648f,
+    0.302005949f,  0.953306040f,
+    0.300543241f,  0.953768190f,
+    0.299079826f,  0.954228095f,
+    0.297615707f,  0.954685755f,
+    0.296150888f,  0.955141168f,
+    0.294685372f,  0.955594334f,
+    0.293219163f,  0.956045251f,
+    0.291752263f,  0.956493919f,
+    0.290284677f,  0.956940336f,
+    0.288816408f,  0.957384501f,
+    0.287347460f,  0.957826413f,
+    0.285877835f,  0.958266071f,
+    0.284407537f,  0.958703475f,
+    0.282936570f,  0.959138622f,
+    0.281464938f,  0.959571513f,
+    0.279992643f,  0.960002146f,
+    0.278519689f,  0.960430519f,
+    0.277046080f,  0.960856633f,
+    0.275571819f,  0.961280486f,
+    0.274096910f,  0.961702077f,
+    0.272621355f,  0.962121404f,
+    0.271145160f,  0.962538468f,
+    0.269668326f,  0.962953267f,
+    0.268190857f,  0.963365800f,
+    0.266712757f,  0.963776066f,
+    0.265234030f,  0.964184064f,
+    0.263754679f,  0.964589793f,
+    0.262274707f,  0.964993253f,
+    0.260794118f,  0.965394442f,
+    0.259312915f,  0.965793359f,
+    0.257831102f,  0.966190003f,
+    0.256348682f,  0.966584374f,
+    0.254865660f,  0.966976471f,
+    0.253382037f,  0.967366292f,
+    0.251897818f,  0.967753837f,
+    0.250413007f,  0.968139105f,
+    0.248927606f,  0.968522094f,
+    0.247441619f,  0.968902805f,
+    0.245955050f,  0.969281235f,
+    0.244467903f,  0.969657385f,
+    0.242980180f,  0.970031253f,
+    0.241491885f,  0.970402839f,
+    0.240003022f,  0.970772141f,
+    0.238513595f,  0.971139158f,
+    0.237023606f,  0.971503891f,
+    0.235533059f,  0.971866337f,
+    0.234041959f,  0.972226497f,
+    0.232550307f,  0.972584369f,
+    0.231058108f,  0.972939952f,
+    0.229565366f,  0.973293246f,
+    0.228072083f,  0.973644250f,
+    0.226578264f,  0.973992962f,
+    0.225083911f,  0.974339383f,
+    0.223589029f,  0.974683511f,
+    0.222093621f,  0.975025345f,
+    0.220597690f,  0.975364885f,
+    0.219101240f,  0.975702130f,
+    0.217604275f,  0.976037079f,
+    0.216106797f,  0.976369731f,
+    0.214608811f,  0.976700086f,
+    0.213110320f,  0.977028143f,
+    0.211611327f,  0.977353900f,
+    0.210111837f,  0.977677358f,
+    0.208611852f,  0.977998515f,
+    0.207111376f,  0.978317371f,
+    0.205610413f,  0.978633924f,
+    0.204108966f,  0.978948175f,
+    0.202607039f,  0.979260123f,
+    0.201104635f,  0.979569766f,
+    0.199601758f,  0.979877104f,
+    0.198098411f,  0.980182136f,
+    0.196594598f,  0.980484862f,
+    0.195090322f,  0.980785280f,
+    0.193585587f,  0.981083391f,
+    0.192080397f,  0.981379193f,
+    0.190574755f,  0.981672686f,
+    0.189068664f,  0.981963869f,
+    0.187562129f,  0.982252741f,
+    0.186055152f,  0.982539302f,
+    0.184547737f,  0.982823551f,
+    0.183039888f,  0.983105487f,
+    0.181531608f,  0.983385110f,
+    0.180022901f,  0.983662419f,
+    0.178513771f,  0.983937413f,
+    0.177004220f,  0.984210092f,
+    0.175494253f,  0.984480455f,
+    0.173983873f,  0.984748502f,
+    0.172473084f,  0.985014231f,
+    0.170961889f,  0.985277642f,
+    0.169450291f,  0.985538735f,
+    0.167938295f,  0.985797509f,
+    0.166425904f,  0.986053963f,
+    0.164913120f,  0.986308097f,
+    0.163399949f,  0.986559910f,
+    0.161886394f,  0.986809402f,
+    0.160372457f,  0.987056571f,
+    0.158858143f,  0.987301418f,
+    0.157343456f,  0.987543942f,
+    0.155828398f,  0.987784142f,
+    0.154312973f,  0.988022017f,
+    0.152797185f,  0.988257568f,
+    0.151281038f,  0.988490793f,
+    0.149764535f,  0.988721692f,
+    0.148247679f,  0.988950265f,
+    0.146730474f,  0.989176510f,
+    0.145212925f,  0.989400428f,
+    0.143695033f,  0.989622017f,
+    0.142176804f,  0.989841278f,
+    0.140658239f,  0.990058210f,
+    0.139139344f,  0.990272812f,
+    0.137620122f,  0.990485084f,
+    0.136100575f,  0.990695025f,
+    0.134580709f,  0.990902635f,
+    0.133060525f,  0.991107914f,
+    0.131540029f,  0.991310860f,
+    0.130019223f,  0.991511473f,
+    0.128498111f,  0.991709754f,
+    0.126976696f,  0.991905700f,
+    0.125454983f,  0.992099313f,
+    0.123932975f,  0.992290591f,
+    0.122410675f,  0.992479535f,
+    0.120888087f,  0.992666142f,
+    0.119365215f,  0.992850414f,
+    0.117842062f,  0.993032350f,
+    0.116318631f,  0.993211949f,
+    0.114794927f,  0.993389211f,
+    0.113270952f,  0.993564136f,
+    0.111746711f,  0.993736722f,
+    0.110222207f,  0.993906970f,
+    0.108697444f,  0.994074879f,
+    0.107172425f,  0.994240449f,
+    0.105647154f,  0.994403680f,
+    0.104121634f,  0.994564571f,
+    0.102595869f,  0.994723121f,
+    0.101069863f,  0.994879331f,
+    0.099543619f,  0.995033199f,
+    0.098017140f,  0.995184727f,
+    0.096490431f,  0.995333912f,
+    0.094963495f,  0.995480755f,
+    0.093436336f,  0.995625256f,
+    0.091908956f,  0.995767414f,
+    0.090381361f,  0.995907229f,
+    0.088853553f,  0.996044701f,
+    0.087325535f,  0.996179829f,
+    0.085797312f,  0.996312612f,
+    0.084268888f,  0.996443051f,
+    0.082740265f,  0.996571146f,
+    0.081211447f,  0.996696895f,
+    0.079682438f,  0.996820299f,
+    0.078153242f,  0.996941358f,
+    0.076623861f,  0.997060070f,
+    0.075094301f,  0.997176437f,
+    0.073564564f,  0.997290457f,
+    0.072034653f,  0.997402130f,
+    0.070504573f,  0.997511456f,
+    0.068974328f,  0.997618435f,
+    0.067443920f,  0.997723067f,
+    0.065913353f,  0.997825350f,
+    0.064382631f,  0.997925286f,
+    0.062851758f,  0.998022874f,
+    0.061320736f,  0.998118113f,
+    0.059789571f,  0.998211003f,
+    0.058258265f,  0.998301545f,
+    0.056726821f,  0.998389737f,
+    0.055195244f,  0.998475581f,
+    0.053663538f,  0.998559074f,
+    0.052131705f,  0.998640218f,
+    0.050599749f,  0.998719012f,
+    0.049067674f,  0.998795456f,
+    0.047535484f,  0.998869550f,
+    0.046003182f,  0.998941293f,
+    0.044470772f,  0.999010686f,
+    0.042938257f,  0.999077728f,
+    0.041405641f,  0.999142419f,
+    0.039872928f,  0.999204759f,
+    0.038340120f,  0.999264747f,
+    0.036807223f,  0.999322385f,
+    0.035274239f,  0.999377670f,
+    0.033741172f,  0.999430605f,
+    0.032208025f,  0.999481187f,
+    0.030674803f,  0.999529418f,
+    0.029141509f,  0.999575296f,
+    0.027608146f,  0.999618822f,
+    0.026074718f,  0.999659997f,
+    0.024541229f,  0.999698819f,
+    0.023007681f,  0.999735288f,
+    0.021474080f,  0.999769405f,
+    0.019940429f,  0.999801170f,
+    0.018406730f,  0.999830582f,
+    0.016872988f,  0.999857641f,
+    0.015339206f,  0.999882347f,
+    0.013805389f,  0.999904701f,
+    0.012271538f,  0.999924702f,
+    0.010737659f,  0.999942350f,
+    0.009203755f,  0.999957645f,
+    0.007669829f,  0.999970586f,
+    0.006135885f,  0.999981175f,
+    0.004601926f,  0.999989411f,
+    0.003067957f,  0.999995294f,
+    0.001533980f,  0.999998823f,
+    0.000000000f,  1.000000000f,
+   -0.001533980f,  0.999998823f,
+   -0.003067957f,  0.999995294f,
+   -0.004601926f,  0.999989411f,
+   -0.006135885f,  0.999981175f,
+   -0.007669829f,  0.999970586f,
+   -0.009203755f,  0.999957645f,
+   -0.010737659f,  0.999942350f,
+   -0.012271538f,  0.999924702f,
+   -0.013805389f,  0.999904701f,
+   -0.015339206f,  0.999882347f,
+   -0.016872988f,  0.999857641f,
+   -0.018406730f,  0.999830582f,
+   -0.019940429f,  0.999801170f,
+   -0.021474080f,  0.999769405f,
+   -0.023007681f,  0.999735288f,
+   -0.024541229f,  0.999698819f,
+   -0.026074718f,  0.999659997f,
+   -0.027608146f,  0.999618822f,
+   -0.029141509f,  0.999575296f,
+   -0.030674803f,  0.999529418f,
+   -0.032208025f,  0.999481187f,
+   -0.033741172f,  0.999430605f,
+   -0.035274239f,  0.999377670f,
+   -0.036807223f,  0.999322385f,
+   -0.038340120f,  0.999264747f,
+   -0.039872928f,  0.999204759f,
+   -0.041405641f,  0.999142419f,
+   -0.042938257f,  0.999077728f,
+   -0.044470772f,  0.999010686f,
+   -0.046003182f,  0.998941293f,
+   -0.047535484f,  0.998869550f,
+   -0.049067674f,  0.998795456f,
+   -0.050599749f,  0.998719012f,
+   -0.052131705f,  0.998640218f,
+   -0.053663538f,  0.998559074f,
+   -0.055195244f,  0.998475581f,
+   -0.056726821f,  0.998389737f,
+   -0.058258265f,  0.998301545f,
+   -0.059789571f,  0.998211003f,
+   -0.061320736f,  0.998118113f,
+   -0.062851758f,  0.998022874f,
+   -0.064382631f,  0.997925286f,
+   -0.065913353f,  0.997825350f,
+   -0.067443920f,  0.997723067f,
+   -0.068974328f,  0.997618435f,
+   -0.070504573f,  0.997511456f,
+   -0.072034653f,  0.997402130f,
+   -0.073564564f,  0.997290457f,
+   -0.075094301f,  0.997176437f,
+   -0.076623861f,  0.997060070f,
+   -0.078153242f,  0.996941358f,
+   -0.079682438f,  0.996820299f,
+   -0.081211447f,  0.996696895f,
+   -0.082740265f,  0.996571146f,
+   -0.084268888f,  0.996443051f,
+   -0.085797312f,  0.996312612f,
+   -0.087325535f,  0.996179829f,
+   -0.088853553f,  0.996044701f,
+   -0.090381361f,  0.995907229f,
+   -0.091908956f,  0.995767414f,
+   -0.093436336f,  0.995625256f,
+   -0.094963495f,  0.995480755f,
+   -0.096490431f,  0.995333912f,
+   -0.098017140f,  0.995184727f,
+   -0.099543619f,  0.995033199f,
+   -0.101069863f,  0.994879331f,
+   -0.102595869f,  0.994723121f,
+   -0.104121634f,  0.994564571f,
+   -0.105647154f,  0.994403680f,
+   -0.107172425f,  0.994240449f,
+   -0.108697444f,  0.994074879f,
+   -0.110222207f,  0.993906970f,
+   -0.111746711f,  0.993736722f,
+   -0.113270952f,  0.993564136f,
+   -0.114794927f,  0.993389211f,
+   -0.116318631f,  0.993211949f,
+   -0.117842062f,  0.993032350f,
+   -0.119365215f,  0.992850414f,
+   -0.120888087f,  0.992666142f,
+   -0.122410675f,  0.992479535f,
+   -0.123932975f,  0.992290591f,
+   -0.125454983f,  0.992099313f,
+   -0.126976696f,  0.991905700f,
+   -0.128498111f,  0.991709754f,
+   -0.130019223f,  0.991511473f,
+   -0.131540029f,  0.991310860f,
+   -0.133060525f,  0.991107914f,
+   -0.134580709f,  0.990902635f,
+   -0.136100575f,  0.990695025f,
+   -0.137620122f,  0.990485084f,
+   -0.139139344f,  0.990272812f,
+   -0.140658239f,  0.990058210f,
+   -0.142176804f,  0.989841278f,
+   -0.143695033f,  0.989622017f,
+   -0.145212925f,  0.989400428f,
+   -0.146730474f,  0.989176510f,
+   -0.148247679f,  0.988950265f,
+   -0.149764535f,  0.988721692f,
+   -0.151281038f,  0.988490793f,
+   -0.152797185f,  0.988257568f,
+   -0.154312973f,  0.988022017f,
+   -0.155828398f,  0.987784142f,
+   -0.157343456f,  0.987543942f,
+   -0.158858143f,  0.987301418f,
+   -0.160372457f,  0.987056571f,
+   -0.161886394f,  0.986809402f,
+   -0.163399949f,  0.986559910f,
+   -0.164913120f,  0.986308097f,
+   -0.166425904f,  0.986053963f,
+   -0.167938295f,  0.985797509f,
+   -0.169450291f,  0.985538735f,
+   -0.170961889f,  0.985277642f,
+   -0.172473084f,  0.985014231f,
+   -0.173983873f,  0.984748502f,
+   -0.175494253f,  0.984480455f,
+   -0.177004220f,  0.984210092f,
+   -0.178513771f,  0.983937413f,
+   -0.180022901f,  0.983662419f,
+   -0.181531608f,  0.983385110f,
+   -0.183039888f,  0.983105487f,
+   -0.184547737f,  0.982823551f,
+   -0.186055152f,  0.982539302f,
+   -0.187562129f,  0.982252741f,
+   -0.189068664f,  0.981963869f,
+   -0.190574755f,  0.981672686f,
+   -0.192080397f,  0.981379193f,
+   -0.193585587f,  0.981083391f,
+   -0.195090322f,  0.980785280f,
+   -0.196594598f,  0.980484862f,
+   -0.198098411f,  0.980182136f,
+   -0.199601758f,  0.979877104f,
+   -0.201104635f,  0.979569766f,
+   -0.202607039f,  0.979260123f,
+   -0.204108966f,  0.978948175f,
+   -0.205610413f,  0.978633924f,
+   -0.207111376f,  0.978317371f,
+   -0.208611852f,  0.977998515f,
+   -0.210111837f,  0.977677358f,
+   -0.211611327f,  0.977353900f,
+   -0.213110320f,  0.977028143f,
+   -0.214608811f,  0.976700086f,
+   -0.216106797f,  0.976369731f,
+   -0.217604275f,  0.976037079f,
+   -0.219101240f,  0.975702130f,
+   -0.220597690f,  0.975364885f,
+   -0.222093621f,  0.975025345f,
+   -0.223589029f,  0.974683511f,
+   -0.225083911f,  0.974339383f,
+   -0.226578264f,  0.973992962f,
+   -0.228072083f,  0.973644250f,
+   -0.229565366f,  0.973293246f,
+   -0.231058108f,  0.972939952f,
+   -0.232550307f,  0.972584369f,
+   -0.234041959f,  0.972226497f,
+   -0.235533059f,  0.971866337f,
+   -0.237023606f,  0.971503891f,
+   -0.238513595f,  0.971139158f,
+   -0.240003022f,  0.970772141f,
+   -0.241491885f,  0.970402839f,
+   -0.242980180f,  0.970031253f,
+   -0.244467903f,  0.969657385f,
+   -0.245955050f,  0.969281235f,
+   -0.247441619f,  0.968902805f,
+   -0.248927606f,  0.968522094f,
+   -0.250413007f,  0.968139105f,
+   -0.251897818f,  0.967753837f,
+   -0.253382037f,  0.967366292f,
+   -0.254865660f,  0.966976471f,
+   -0.256348682f,  0.966584374f,
+   -0.257831102f,  0.966190003f,
+   -0.259312915f,  0.965793359f,
+   -0.260794118f,  0.965394442f,
+   -0.262274707f,  0.964993253f,
+   -0.263754679f,  0.964589793f,
+   -0.265234030f,  0.964184064f,
+   -0.266712757f,  0.963776066f,
+   -0.268190857f,  0.963365800f,
+   -0.269668326f,  0.962953267f,
+   -0.271145160f,  0.962538468f,
+   -0.272621355f,  0.962121404f,
+   -0.274096910f,  0.961702077f,
+   -0.275571819f,  0.961280486f,
+   -0.277046080f,  0.960856633f,
+   -0.278519689f,  0.960430519f,
+   -0.279992643f,  0.960002146f,
+   -0.281464938f,  0.959571513f,
+   -0.282936570f,  0.959138622f,
+   -0.284407537f,  0.958703475f,
+   -0.285877835f,  0.958266071f,
+   -0.287347460f,  0.957826413f,
+   -0.288816408f,  0.957384501f,
+   -0.290284677f,  0.956940336f,
+   -0.291752263f,  0.956493919f,
+   -0.293219163f,  0.956045251f,
+   -0.294685372f,  0.955594334f,
+   -0.296150888f,  0.955141168f,
+   -0.297615707f,  0.954685755f,
+   -0.299079826f,  0.954228095f,
+   -0.300543241f,  0.953768190f,
+   -0.302005949f,  0.953306040f,
+   -0.303467947f,  0.952841648f,
+   -0.304929230f,  0.952375013f,
+   -0.306389795f,  0.951906137f,
+   -0.307849640f,  0.951435021f,
+   -0.309308760f,  0.950961666f,
+   -0.310767153f,  0.950486074f,
+   -0.312224814f,  0.950008245f,
+   -0.313681740f,  0.949528181f,
+   -0.315137929f,  0.949045882f,
+   -0.316593376f,  0.948561350f,
+   -0.318048077f,  0.948074586f,
+   -0.319502031f,  0.947585591f,
+   -0.320955232f,  0.947094366f,
+   -0.322407679f,  0.946600913f,
+   -0.323859367f,  0.946105232f,
+   -0.325310292f,  0.945607325f,
+   -0.326760452f,  0.945107193f,
+   -0.328209844f,  0.944604837f,
+   -0.329658463f,  0.944100258f,
+   -0.331106306f,  0.943593458f,
+   -0.332553370f,  0.943084437f,
+   -0.333999651f,  0.942573198f,
+   -0.335445147f,  0.942059740f,
+   -0.336889853f,  0.941544065f,
+   -0.338333767f,  0.941026175f,
+   -0.339776884f,  0.940506071f,
+   -0.341219202f,  0.939983753f,
+   -0.342660717f,  0.939459224f,
+   -0.344101426f,  0.938932484f,
+   -0.345541325f,  0.938403534f,
+   -0.346980411f,  0.937872376f,
+   -0.348418680f,  0.937339012f,
+   -0.349856130f,  0.936803442f,
+   -0.351292756f,  0.936265667f,
+   -0.352728556f,  0.935725689f,
+   -0.354163525f,  0.935183510f,
+   -0.355597662f,  0.934639130f,
+   -0.357030961f,  0.934092550f,
+   -0.358463421f,  0.933543773f,
+   -0.359895037f,  0.932992799f,
+   -0.361325806f,  0.932439629f,
+   -0.362755724f,  0.931884266f,
+   -0.364184790f,  0.931326709f,
+   -0.365612998f,  0.930766961f,
+   -0.367040346f,  0.930205023f,
+   -0.368466830f,  0.929640896f,
+   -0.369892447f,  0.929074581f,
+   -0.371317194f,  0.928506080f,
+   -0.372741067f,  0.927935395f,
+   -0.374164063f,  0.927362526f,
+   -0.375586178f,  0.926787474f,
+   -0.377007410f,  0.926210242f,
+   -0.378427755f,  0.925630831f,
+   -0.379847209f,  0.925049241f,
+   -0.381265769f,  0.924465474f,
+   -0.382683432f,  0.923879533f,
+   -0.384100195f,  0.923291417f,
+   -0.385516054f,  0.922701128f,
+   -0.386931006f,  0.922108669f,
+   -0.388345047f,  0.921514039f,
+   -0.389758174f,  0.920917242f,
+   -0.391170384f,  0.920318277f,
+   -0.392581674f,  0.919717146f,
+   -0.393992040f,  0.919113852f,
+   -0.395401479f,  0.918508394f,
+   -0.396809987f,  0.917900776f,
+   -0.398217562f,  0.917290997f,
+   -0.399624200f,  0.916679060f,
+   -0.401029897f,  0.916064966f,
+   -0.402434651f,  0.915448716f,
+   -0.403838458f,  0.914830312f,
+   -0.405241314f,  0.914209756f,
+   -0.406643217f,  0.913587048f,
+   -0.408044163f,  0.912962190f,
+   -0.409444149f,  0.912335185f,
+   -0.410843171f,  0.911706032f,
+   -0.412241227f,  0.911074734f,
+   -0.413638312f,  0.910441292f,
+   -0.415034424f,  0.909805708f,
+   -0.416429560f,  0.909167983f,
+   -0.417823716f,  0.908528119f,
+   -0.419216888f,  0.907886116f,
+   -0.420609074f,  0.907241978f,
+   -0.422000271f,  0.906595705f,
+   -0.423390474f,  0.905947298f,
+   -0.424779681f,  0.905296759f,
+   -0.426167889f,  0.904644091f,
+   -0.427555093f,  0.903989293f,
+   -0.428941292f,  0.903332368f,
+   -0.430326481f,  0.902673318f,
+   -0.431710658f,  0.902012144f,
+   -0.433093819f,  0.901348847f,
+   -0.434475961f,  0.900683429f,
+   -0.435857080f,  0.900015892f,
+   -0.437237174f,  0.899346237f,
+   -0.438616239f,  0.898674466f,
+   -0.439994271f,  0.898000580f,
+   -0.441371269f,  0.897324581f,
+   -0.442747228f,  0.896646470f,
+   -0.444122145f,  0.895966250f,
+   -0.445496017f,  0.895283921f,
+   -0.446868840f,  0.894599486f,
+   -0.448240612f,  0.893912945f,
+   -0.449611330f,  0.893224301f,
+   -0.450980989f,  0.892533555f,
+   -0.452349587f,  0.891840709f,
+   -0.453717121f,  0.891145765f,
+   -0.455083587f,  0.890448723f,
+   -0.456448982f,  0.889749586f,
+   -0.457813304f,  0.889048356f,
+   -0.459176548f,  0.888345033f,
+   -0.460538711f,  0.887639620f,
+   -0.461899791f,  0.886932119f,
+   -0.463259784f,  0.886222530f,
+   -0.464618686f,  0.885510856f,
+   -0.465976496f,  0.884797098f,
+   -0.467333209f,  0.884081259f,
+   -0.468688822f,  0.883363339f,
+   -0.470043332f,  0.882643340f,
+   -0.471396737f,  0.881921264f,
+   -0.472749032f,  0.881197113f,
+   -0.474100215f,  0.880470889f,
+   -0.475450282f,  0.879742593f,
+   -0.476799230f,  0.879012226f,
+   -0.478147056f,  0.878279792f,
+   -0.479493758f,  0.877545290f,
+   -0.480839331f,  0.876808724f,
+   -0.482183772f,  0.876070094f,
+   -0.483527079f,  0.875329403f,
+   -0.484869248f,  0.874586652f,
+   -0.486210276f,  0.873841843f,
+   -0.487550160f,  0.873094978f,
+   -0.488888897f,  0.872346059f,
+   -0.490226483f,  0.871595087f,
+   -0.491562916f,  0.870842063f,
+   -0.492898192f,  0.870086991f,
+   -0.494232309f,  0.869329871f,
+   -0.495565262f,  0.868570706f,
+   -0.496897049f,  0.867809497f,
+   -0.498227667f,  0.867046246f,
+   -0.499557113f,  0.866280954f,
+   -0.500885383f,  0.865513624f,
+   -0.502212474f,  0.864744258f,
+   -0.503538384f,  0.863972856f,
+   -0.504863109f,  0.863199422f,
+   -0.506186645f,  0.862423956f,
+   -0.507508991f,  0.861646461f,
+   -0.508830143f,  0.860866939f,
+   -0.510150097f,  0.860085390f,
+   -0.511468850f,  0.859301818f,
+   -0.512786401f,  0.858516224f,
+   -0.514102744f,  0.857728610f,
+   -0.515417878f,  0.856938977f,
+   -0.516731799f,  0.856147328f,
+   -0.518044504f,  0.855353665f,
+   -0.519355990f,  0.854557988f,
+   -0.520666254f,  0.853760301f,
+   -0.521975293f,  0.852960605f,
+   -0.523283103f,  0.852158902f,
+   -0.524589683f,  0.851355193f,
+   -0.525895027f,  0.850549481f,
+   -0.527199135f,  0.849741768f,
+   -0.528502002f,  0.848932055f,
+   -0.529803625f,  0.848120345f,
+   -0.531104001f,  0.847306639f,
+   -0.532403128f,  0.846490939f,
+   -0.533701002f,  0.845673247f,
+   -0.534997620f,  0.844853565f,
+   -0.536292979f,  0.844031895f,
+   -0.537587076f,  0.843208240f,
+   -0.538879909f,  0.842382600f,
+   -0.540171473f,  0.841554977f,
+   -0.541461766f,  0.840725375f,
+   -0.542750785f,  0.839893794f,
+   -0.544038527f,  0.839060237f,
+   -0.545324988f,  0.838224706f,
+   -0.546610167f,  0.837387202f,
+   -0.547894059f,  0.836547727f,
+   -0.549176662f,  0.835706284f,
+   -0.550457973f,  0.834862875f,
+   -0.551737988f,  0.834017501f,
+   -0.553016706f,  0.833170165f,
+   -0.554294121f,  0.832320868f,
+   -0.555570233f,  0.831469612f,
+   -0.556845037f,  0.830616400f,
+   -0.558118531f,  0.829761234f,
+   -0.559390712f,  0.828904115f,
+   -0.560661576f,  0.828045045f,
+   -0.561931121f,  0.827184027f,
+   -0.563199344f,  0.826321063f,
+   -0.564466242f,  0.825456154f,
+   -0.565731811f,  0.824589303f,
+   -0.566996049f,  0.823720511f,
+   -0.568258953f,  0.822849781f,
+   -0.569520519f,  0.821977115f,
+   -0.570780746f,  0.821102515f,
+   -0.572039629f,  0.820225983f,
+   -0.573297167f,  0.819347520f,
+   -0.574553355f,  0.818467130f,
+   -0.575808191f,  0.817584813f,
+   -0.577061673f,  0.816700573f,
+   -0.578313796f,  0.815814411f,
+   -0.579564559f,  0.814926329f,
+   -0.580813958f,  0.814036330f,
+   -0.582061990f,  0.813144415f,
+   -0.583308653f,  0.812250587f,
+   -0.584553943f,  0.811354847f,
+   -0.585797857f,  0.810457198f,
+   -0.587040394f,  0.809557642f,
+   -0.588281548f,  0.808656182f,
+   -0.589521319f,  0.807752818f,
+   -0.590759702f,  0.806847554f,
+   -0.591996695f,  0.805940391f,
+   -0.593232295f,  0.805031331f,
+   -0.594466499f,  0.804120377f,
+   -0.595699304f,  0.803207531f,
+   -0.596930708f,  0.802292796f,
+   -0.598160707f,  0.801376172f,
+   -0.599389298f,  0.800457662f,
+   -0.600616479f,  0.799537269f,
+   -0.601842247f,  0.798614995f,
+   -0.603066599f,  0.797690841f,
+   -0.604289531f,  0.796764810f,
+   -0.605511041f,  0.795836905f,
+   -0.606731127f,  0.794907126f,
+   -0.607949785f,  0.793975478f,
+   -0.609167012f,  0.793041960f,
+   -0.610382806f,  0.792106577f,
+   -0.611597164f,  0.791169330f,
+   -0.612810082f,  0.790230221f,
+   -0.614021559f,  0.789289253f,
+   -0.615231591f,  0.788346428f,
+   -0.616440175f,  0.787401747f,
+   -0.617647308f,  0.786455214f,
+   -0.618852988f,  0.785506830f,
+   -0.620057212f,  0.784556597f,
+   -0.621259977f,  0.783604519f,
+   -0.622461279f,  0.782650596f,
+   -0.623661118f,  0.781694832f,
+   -0.624859488f,  0.780737229f,
+   -0.626056388f,  0.779777788f,
+   -0.627251815f,  0.778816512f,
+   -0.628445767f,  0.777853404f,
+   -0.629638239f,  0.776888466f,
+   -0.630829230f,  0.775921699f,
+   -0.632018736f,  0.774953107f,
+   -0.633206755f,  0.773982691f,
+   -0.634393284f,  0.773010453f,
+   -0.635578320f,  0.772036397f,
+   -0.636761861f,  0.771060524f,
+   -0.637943904f,  0.770082837f,
+   -0.639124445f,  0.769103338f,
+   -0.640303482f,  0.768122029f,
+   -0.641481013f,  0.767138912f,
+   -0.642657034f,  0.766153990f,
+   -0.643831543f,  0.765167266f,
+   -0.645004537f,  0.764178741f,
+   -0.646176013f,  0.763188417f,
+   -0.647345969f,  0.762196298f,
+   -0.648514401f,  0.761202385f,
+   -0.649681307f,  0.760206682f,
+   -0.650846685f,  0.759209189f,
+   -0.652010531f,  0.758209910f,
+   -0.653172843f,  0.757208847f,
+   -0.654333618f,  0.756206001f,
+   -0.655492853f,  0.755201377f,
+   -0.656650546f,  0.754194975f,
+   -0.657806693f,  0.753186799f,
+   -0.658961293f,  0.752176850f,
+   -0.660114342f,  0.751165132f,
+   -0.661265838f,  0.750151646f,
+   -0.662415778f,  0.749136395f,
+   -0.663564159f,  0.748119380f,
+   -0.664710978f,  0.747100606f,
+   -0.665856234f,  0.746080074f,
+   -0.666999922f,  0.745057785f,
+   -0.668142041f,  0.744033744f,
+   -0.669282588f,  0.743007952f,
+   -0.670421560f,  0.741980412f,
+   -0.671558955f,  0.740951125f,
+   -0.672694769f,  0.739920095f,
+   -0.673829000f,  0.738887324f,
+   -0.674961646f,  0.737852815f,
+   -0.676092704f,  0.736816569f,
+   -0.677222170f,  0.735778589f,
+   -0.678350043f,  0.734738878f,
+   -0.679476320f,  0.733697438f,
+   -0.680600998f,  0.732654272f,
+   -0.681724074f,  0.731609381f,
+   -0.682845546f,  0.730562769f,
+   -0.683965412f,  0.729514438f,
+   -0.685083668f,  0.728464390f,
+   -0.686200312f,  0.727412629f,
+   -0.687315341f,  0.726359155f,
+   -0.688428753f,  0.725303972f,
+   -0.689540545f,  0.724247083f,
+   -0.690650714f,  0.723188489f,
+   -0.691759258f,  0.722128194f,
+   -0.692866175f,  0.721066199f,
+   -0.693971461f,  0.720002508f,
+   -0.695075114f,  0.718937122f,
+   -0.696177131f,  0.717870045f,
+   -0.697277511f,  0.716801279f,
+   -0.698376249f,  0.715730825f,
+   -0.699473345f,  0.714658688f,
+   -0.700568794f,  0.713584869f,
+   -0.701662595f,  0.712509371f,
+   -0.702754744f,  0.711432196f,
+   -0.703845241f,  0.710353347f,
+   -0.704934080f,  0.709272826f,
+   -0.706021261f,  0.708190637f,
+   -0.707106781f,  0.707106781f,
+   -0.708190637f,  0.706021261f,
+   -0.709272826f,  0.704934080f,
+   -0.710353347f,  0.703845241f,
+   -0.711432196f,  0.702754744f,
+   -0.712509371f,  0.701662595f,
+   -0.713584869f,  0.700568794f,
+   -0.714658688f,  0.699473345f,
+   -0.715730825f,  0.698376249f,
+   -0.716801279f,  0.697277511f,
+   -0.717870045f,  0.696177131f,
+   -0.718937122f,  0.695075114f,
+   -0.720002508f,  0.693971461f,
+   -0.721066199f,  0.692866175f,
+   -0.722128194f,  0.691759258f,
+   -0.723188489f,  0.690650714f,
+   -0.724247083f,  0.689540545f,
+   -0.725303972f,  0.688428753f,
+   -0.726359155f,  0.687315341f,
+   -0.727412629f,  0.686200312f,
+   -0.728464390f,  0.685083668f,
+   -0.729514438f,  0.683965412f,
+   -0.730562769f,  0.682845546f,
+   -0.731609381f,  0.681724074f,
+   -0.732654272f,  0.680600998f,
+   -0.733697438f,  0.679476320f,
+   -0.734738878f,  0.678350043f,
+   -0.735778589f,  0.677222170f,
+   -0.736816569f,  0.676092704f,
+   -0.737852815f,  0.674961646f,
+   -0.738887324f,  0.673829000f,
+   -0.739920095f,  0.672694769f,
+   -0.740951125f,  0.671558955f,
+   -0.741980412f,  0.670421560f,
+   -0.743007952f,  0.669282588f,
+   -0.744033744f,  0.668142041f,
+   -0.745057785f,  0.666999922f,
+   -0.746080074f,  0.665856234f,
+   -0.747100606f,  0.664710978f,
+   -0.748119380f,  0.663564159f,
+   -0.749136395f,  0.662415778f,
+   -0.750151646f,  0.661265838f,
+   -0.751165132f,  0.660114342f,
+   -0.752176850f,  0.658961293f,
+   -0.753186799f,  0.657806693f,
+   -0.754194975f,  0.656650546f,
+   -0.755201377f,  0.655492853f,
+   -0.756206001f,  0.654333618f,
+   -0.757208847f,  0.653172843f,
+   -0.758209910f,  0.652010531f,
+   -0.759209189f,  0.650846685f,
+   -0.760206682f,  0.649681307f,
+   -0.761202385f,  0.648514401f,
+   -0.762196298f,  0.647345969f,
+   -0.763188417f,  0.646176013f,
+   -0.764178741f,  0.645004537f,
+   -0.765167266f,  0.643831543f,
+   -0.766153990f,  0.642657034f,
+   -0.767138912f,  0.641481013f,
+   -0.768122029f,  0.640303482f,
+   -0.769103338f,  0.639124445f,
+   -0.770082837f,  0.637943904f,
+   -0.771060524f,  0.636761861f,
+   -0.772036397f,  0.635578320f,
+   -0.773010453f,  0.634393284f,
+   -0.773982691f,  0.633206755f,
+   -0.774953107f,  0.632018736f,
+   -0.775921699f,  0.630829230f,
+   -0.776888466f,  0.629638239f,
+   -0.777853404f,  0.628445767f,
+   -0.778816512f,  0.627251815f,
+   -0.779777788f,  0.626056388f,
+   -0.780737229f,  0.624859488f,
+   -0.781694832f,  0.623661118f,
+   -0.782650596f,  0.622461279f,
+   -0.783604519f,  0.621259977f,
+   -0.784556597f,  0.620057212f,
+   -0.785506830f,  0.618852988f,
+   -0.786455214f,  0.617647308f,
+   -0.787401747f,  0.616440175f,
+   -0.788346428f,  0.615231591f,
+   -0.789289253f,  0.614021559f,
+   -0.790230221f,  0.612810082f,
+   -0.791169330f,  0.611597164f,
+   -0.792106577f,  0.610382806f,
+   -0.793041960f,  0.609167012f,
+   -0.793975478f,  0.607949785f,
+   -0.794907126f,  0.606731127f,
+   -0.795836905f,  0.605511041f,
+   -0.796764810f,  0.604289531f,
+   -0.797690841f,  0.603066599f,
+   -0.798614995f,  0.601842247f,
+   -0.799537269f,  0.600616479f,
+   -0.800457662f,  0.599389298f,
+   -0.801376172f,  0.598160707f,
+   -0.802292796f,  0.596930708f,
+   -0.803207531f,  0.595699304f,
+   -0.804120377f,  0.594466499f,
+   -0.805031331f,  0.593232295f,
+   -0.805940391f,  0.591996695f,
+   -0.806847554f,  0.590759702f,
+   -0.807752818f,  0.589521319f,
+   -0.808656182f,  0.588281548f,
+   -0.809557642f,  0.587040394f,
+   -0.810457198f,  0.585797857f,
+   -0.811354847f,  0.584553943f,
+   -0.812250587f,  0.583308653f,
+   -0.813144415f,  0.582061990f,
+   -0.814036330f,  0.580813958f,
+   -0.814926329f,  0.579564559f,
+   -0.815814411f,  0.578313796f,
+   -0.816700573f,  0.577061673f,
+   -0.817584813f,  0.575808191f,
+   -0.818467130f,  0.574553355f,
+   -0.819347520f,  0.573297167f,
+   -0.820225983f,  0.572039629f,
+   -0.821102515f,  0.570780746f,
+   -0.821977115f,  0.569520519f,
+   -0.822849781f,  0.568258953f,
+   -0.823720511f,  0.566996049f,
+   -0.824589303f,  0.565731811f,
+   -0.825456154f,  0.564466242f,
+   -0.826321063f,  0.563199344f,
+   -0.827184027f,  0.561931121f,
+   -0.828045045f,  0.560661576f,
+   -0.828904115f,  0.559390712f,
+   -0.829761234f,  0.558118531f,
+   -0.830616400f,  0.556845037f,
+   -0.831469612f,  0.555570233f,
+   -0.832320868f,  0.554294121f,
+   -0.833170165f,  0.553016706f,
+   -0.834017501f,  0.551737988f,
+   -0.834862875f,  0.550457973f,
+   -0.835706284f,  0.549176662f,
+   -0.836547727f,  0.547894059f,
+   -0.837387202f,  0.546610167f,
+   -0.838224706f,  0.545324988f,
+   -0.839060237f,  0.544038527f,
+   -0.839893794f,  0.542750785f,
+   -0.840725375f,  0.541461766f,
+   -0.841554977f,  0.540171473f,
+   -0.842382600f,  0.538879909f,
+   -0.843208240f,  0.537587076f,
+   -0.844031895f,  0.536292979f,
+   -0.844853565f,  0.534997620f,
+   -0.845673247f,  0.533701002f,
+   -0.846490939f,  0.532403128f,
+   -0.847306639f,  0.531104001f,
+   -0.848120345f,  0.529803625f,
+   -0.848932055f,  0.528502002f,
+   -0.849741768f,  0.527199135f,
+   -0.850549481f,  0.525895027f,
+   -0.851355193f,  0.524589683f,
+   -0.852158902f,  0.523283103f,
+   -0.852960605f,  0.521975293f,
+   -0.853760301f,  0.520666254f,
+   -0.854557988f,  0.519355990f,
+   -0.855353665f,  0.518044504f,
+   -0.856147328f,  0.516731799f,
+   -0.856938977f,  0.515417878f,
+   -0.857728610f,  0.514102744f,
+   -0.858516224f,  0.512786401f,
+   -0.859301818f,  0.511468850f,
+   -0.860085390f,  0.510150097f,
+   -0.860866939f,  0.508830143f,
+   -0.861646461f,  0.507508991f,
+   -0.862423956f,  0.506186645f,
+   -0.863199422f,  0.504863109f,
+   -0.863972856f,  0.503538384f,
+   -0.864744258f,  0.502212474f,
+   -0.865513624f,  0.500885383f,
+   -0.866280954f,  0.499557113f,
+   -0.867046246f,  0.498227667f,
+   -0.867809497f,  0.496897049f,
+   -0.868570706f,  0.495565262f,
+   -0.869329871f,  0.494232309f,
+   -0.870086991f,  0.492898192f,
+   -0.870842063f,  0.491562916f,
+   -0.871595087f,  0.490226483f,
+   -0.872346059f,  0.488888897f,
+   -0.873094978f,  0.487550160f,
+   -0.873841843f,  0.486210276f,
+   -0.874586652f,  0.484869248f,
+   -0.875329403f,  0.483527079f,
+   -0.876070094f,  0.482183772f,
+   -0.876808724f,  0.480839331f,
+   -0.877545290f,  0.479493758f,
+   -0.878279792f,  0.478147056f,
+   -0.879012226f,  0.476799230f,
+   -0.879742593f,  0.475450282f,
+   -0.880470889f,  0.474100215f,
+   -0.881197113f,  0.472749032f,
+   -0.881921264f,  0.471396737f,
+   -0.882643340f,  0.470043332f,
+   -0.883363339f,  0.468688822f,
+   -0.884081259f,  0.467333209f,
+   -0.884797098f,  0.465976496f,
+   -0.885510856f,  0.464618686f,
+   -0.886222530f,  0.463259784f,
+   -0.886932119f,  0.461899791f,
+   -0.887639620f,  0.460538711f,
+   -0.888345033f,  0.459176548f,
+   -0.889048356f,  0.457813304f,
+   -0.889749586f,  0.456448982f,
+   -0.890448723f,  0.455083587f,
+   -0.891145765f,  0.453717121f,
+   -0.891840709f,  0.452349587f,
+   -0.892533555f,  0.450980989f,
+   -0.893224301f,  0.449611330f,
+   -0.893912945f,  0.448240612f,
+   -0.894599486f,  0.446868840f,
+   -0.895283921f,  0.445496017f,
+   -0.895966250f,  0.444122145f,
+   -0.896646470f,  0.442747228f,
+   -0.897324581f,  0.441371269f,
+   -0.898000580f,  0.439994271f,
+   -0.898674466f,  0.438616239f,
+   -0.899346237f,  0.437237174f,
+   -0.900015892f,  0.435857080f,
+   -0.900683429f,  0.434475961f,
+   -0.901348847f,  0.433093819f,
+   -0.902012144f,  0.431710658f,
+   -0.902673318f,  0.430326481f,
+   -0.903332368f,  0.428941292f,
+   -0.903989293f,  0.427555093f,
+   -0.904644091f,  0.426167889f,
+   -0.905296759f,  0.424779681f,
+   -0.905947298f,  0.423390474f,
+   -0.906595705f,  0.422000271f,
+   -0.907241978f,  0.420609074f,
+   -0.907886116f,  0.419216888f,
+   -0.908528119f,  0.417823716f,
+   -0.909167983f,  0.416429560f,
+   -0.909805708f,  0.415034424f,
+   -0.910441292f,  0.413638312f,
+   -0.911074734f,  0.412241227f,
+   -0.911706032f,  0.410843171f,
+   -0.912335185f,  0.409444149f,
+   -0.912962190f,  0.408044163f,
+   -0.913587048f,  0.406643217f,
+   -0.914209756f,  0.405241314f,
+   -0.914830312f,  0.403838458f,
+   -0.915448716f,  0.402434651f,
+   -0.916064966f,  0.401029897f,
+   -0.916679060f,  0.399624200f,
+   -0.917290997f,  0.398217562f,
+   -0.917900776f,  0.396809987f,
+   -0.918508394f,  0.395401479f,
+   -0.919113852f,  0.393992040f,
+   -0.919717146f,  0.392581674f,
+   -0.920318277f,  0.391170384f,
+   -0.920917242f,  0.389758174f,
+   -0.921514039f,  0.388345047f,
+   -0.922108669f,  0.386931006f,
+   -0.922701128f,  0.385516054f,
+   -0.923291417f,  0.384100195f,
+   -0.923879533f,  0.382683432f,
+   -0.924465474f,  0.381265769f,
+   -0.925049241f,  0.379847209f,
+   -0.925630831f,  0.378427755f,
+   -0.926210242f,  0.377007410f,
+   -0.926787474f,  0.375586178f,
+   -0.927362526f,  0.374164063f,
+   -0.927935395f,  0.372741067f,
+   -0.928506080f,  0.371317194f,
+   -0.929074581f,  0.369892447f,
+   -0.929640896f,  0.368466830f,
+   -0.930205023f,  0.367040346f,
+   -0.930766961f,  0.365612998f,
+   -0.931326709f,  0.364184790f,
+   -0.931884266f,  0.362755724f,
+   -0.932439629f,  0.361325806f,
+   -0.932992799f,  0.359895037f,
+   -0.933543773f,  0.358463421f,
+   -0.934092550f,  0.357030961f,
+   -0.934639130f,  0.355597662f,
+   -0.935183510f,  0.354163525f,
+   -0.935725689f,  0.352728556f,
+   -0.936265667f,  0.351292756f,
+   -0.936803442f,  0.349856130f,
+   -0.937339012f,  0.348418680f,
+   -0.937872376f,  0.346980411f,
+   -0.938403534f,  0.345541325f,
+   -0.938932484f,  0.344101426f,
+   -0.939459224f,  0.342660717f,
+   -0.939983753f,  0.341219202f,
+   -0.940506071f,  0.339776884f,
+   -0.941026175f,  0.338333767f,
+   -0.941544065f,  0.336889853f,
+   -0.942059740f,  0.335445147f,
+   -0.942573198f,  0.333999651f,
+   -0.943084437f,  0.332553370f,
+   -0.943593458f,  0.331106306f,
+   -0.944100258f,  0.329658463f,
+   -0.944604837f,  0.328209844f,
+   -0.945107193f,  0.326760452f,
+   -0.945607325f,  0.325310292f,
+   -0.946105232f,  0.323859367f,
+   -0.946600913f,  0.322407679f,
+   -0.947094366f,  0.320955232f,
+   -0.947585591f,  0.319502031f,
+   -0.948074586f,  0.318048077f,
+   -0.948561350f,  0.316593376f,
+   -0.949045882f,  0.315137929f,
+   -0.949528181f,  0.313681740f,
+   -0.950008245f,  0.312224814f,
+   -0.950486074f,  0.310767153f,
+   -0.950961666f,  0.309308760f,
+   -0.951435021f,  0.307849640f,
+   -0.951906137f,  0.306389795f,
+   -0.952375013f,  0.304929230f,
+   -0.952841648f,  0.303467947f,
+   -0.953306040f,  0.302005949f,
+   -0.953768190f,  0.300543241f,
+   -0.954228095f,  0.299079826f,
+   -0.954685755f,  0.297615707f,
+   -0.955141168f,  0.296150888f,
+   -0.955594334f,  0.294685372f,
+   -0.956045251f,  0.293219163f,
+   -0.956493919f,  0.291752263f,
+   -0.956940336f,  0.290284677f,
+   -0.957384501f,  0.288816408f,
+   -0.957826413f,  0.287347460f,
+   -0.958266071f,  0.285877835f,
+   -0.958703475f,  0.284407537f,
+   -0.959138622f,  0.282936570f,
+   -0.959571513f,  0.281464938f,
+   -0.960002146f,  0.279992643f,
+   -0.960430519f,  0.278519689f,
+   -0.960856633f,  0.277046080f,
+   -0.961280486f,  0.275571819f,
+   -0.961702077f,  0.274096910f,
+   -0.962121404f,  0.272621355f,
+   -0.962538468f,  0.271145160f,
+   -0.962953267f,  0.269668326f,
+   -0.963365800f,  0.268190857f,
+   -0.963776066f,  0.266712757f,
+   -0.964184064f,  0.265234030f,
+   -0.964589793f,  0.263754679f,
+   -0.964993253f,  0.262274707f,
+   -0.965394442f,  0.260794118f,
+   -0.965793359f,  0.259312915f,
+   -0.966190003f,  0.257831102f,
+   -0.966584374f,  0.256348682f,
+   -0.966976471f,  0.254865660f,
+   -0.967366292f,  0.253382037f,
+   -0.967753837f,  0.251897818f,
+   -0.968139105f,  0.250413007f,
+   -0.968522094f,  0.248927606f,
+   -0.968902805f,  0.247441619f,
+   -0.969281235f,  0.245955050f,
+   -0.969657385f,  0.244467903f,
+   -0.970031253f,  0.242980180f,
+   -0.970402839f,  0.241491885f,
+   -0.970772141f,  0.240003022f,
+   -0.971139158f,  0.238513595f,
+   -0.971503891f,  0.237023606f,
+   -0.971866337f,  0.235533059f,
+   -0.972226497f,  0.234041959f,
+   -0.972584369f,  0.232550307f,
+   -0.972939952f,  0.231058108f,
+   -0.973293246f,  0.229565366f,
+   -0.973644250f,  0.228072083f,
+   -0.973992962f,  0.226578264f,
+   -0.974339383f,  0.225083911f,
+   -0.974683511f,  0.223589029f,
+   -0.975025345f,  0.222093621f,
+   -0.975364885f,  0.220597690f,
+   -0.975702130f,  0.219101240f,
+   -0.976037079f,  0.217604275f,
+   -0.976369731f,  0.216106797f,
+   -0.976700086f,  0.214608811f,
+   -0.977028143f,  0.213110320f,
+   -0.977353900f,  0.211611327f,
+   -0.977677358f,  0.210111837f,
+   -0.977998515f,  0.208611852f,
+   -0.978317371f,  0.207111376f,
+   -0.978633924f,  0.205610413f,
+   -0.978948175f,  0.204108966f,
+   -0.979260123f,  0.202607039f,
+   -0.979569766f,  0.201104635f,
+   -0.979877104f,  0.199601758f,
+   -0.980182136f,  0.198098411f,
+   -0.980484862f,  0.196594598f,
+   -0.980785280f,  0.195090322f,
+   -0.981083391f,  0.193585587f,
+   -0.981379193f,  0.192080397f,
+   -0.981672686f,  0.190574755f,
+   -0.981963869f,  0.189068664f,
+   -0.982252741f,  0.187562129f,
+   -0.982539302f,  0.186055152f,
+   -0.982823551f,  0.184547737f,
+   -0.983105487f,  0.183039888f,
+   -0.983385110f,  0.181531608f,
+   -0.983662419f,  0.180022901f,
+   -0.983937413f,  0.178513771f,
+   -0.984210092f,  0.177004220f,
+   -0.984480455f,  0.175494253f,
+   -0.984748502f,  0.173983873f,
+   -0.985014231f,  0.172473084f,
+   -0.985277642f,  0.170961889f,
+   -0.985538735f,  0.169450291f,
+   -0.985797509f,  0.167938295f,
+   -0.986053963f,  0.166425904f,
+   -0.986308097f,  0.164913120f,
+   -0.986559910f,  0.163399949f,
+   -0.986809402f,  0.161886394f,
+   -0.987056571f,  0.160372457f,
+   -0.987301418f,  0.158858143f,
+   -0.987543942f,  0.157343456f,
+   -0.987784142f,  0.155828398f,
+   -0.988022017f,  0.154312973f,
+   -0.988257568f,  0.152797185f,
+   -0.988490793f,  0.151281038f,
+   -0.988721692f,  0.149764535f,
+   -0.988950265f,  0.148247679f,
+   -0.989176510f,  0.146730474f,
+   -0.989400428f,  0.145212925f,
+   -0.989622017f,  0.143695033f,
+   -0.989841278f,  0.142176804f,
+   -0.990058210f,  0.140658239f,
+   -0.990272812f,  0.139139344f,
+   -0.990485084f,  0.137620122f,
+   -0.990695025f,  0.136100575f,
+   -0.990902635f,  0.134580709f,
+   -0.991107914f,  0.133060525f,
+   -0.991310860f,  0.131540029f,
+   -0.991511473f,  0.130019223f,
+   -0.991709754f,  0.128498111f,
+   -0.991905700f,  0.126976696f,
+   -0.992099313f,  0.125454983f,
+   -0.992290591f,  0.123932975f,
+   -0.992479535f,  0.122410675f,
+   -0.992666142f,  0.120888087f,
+   -0.992850414f,  0.119365215f,
+   -0.993032350f,  0.117842062f,
+   -0.993211949f,  0.116318631f,
+   -0.993389211f,  0.114794927f,
+   -0.993564136f,  0.113270952f,
+   -0.993736722f,  0.111746711f,
+   -0.993906970f,  0.110222207f,
+   -0.994074879f,  0.108697444f,
+   -0.994240449f,  0.107172425f,
+   -0.994403680f,  0.105647154f,
+   -0.994564571f,  0.104121634f,
+   -0.994723121f,  0.102595869f,
+   -0.994879331f,  0.101069863f,
+   -0.995033199f,  0.099543619f,
+   -0.995184727f,  0.098017140f,
+   -0.995333912f,  0.096490431f,
+   -0.995480755f,  0.094963495f,
+   -0.995625256f,  0.093436336f,
+   -0.995767414f,  0.091908956f,
+   -0.995907229f,  0.090381361f,
+   -0.996044701f,  0.088853553f,
+   -0.996179829f,  0.087325535f,
+   -0.996312612f,  0.085797312f,
+   -0.996443051f,  0.084268888f,
+   -0.996571146f,  0.082740265f,
+   -0.996696895f,  0.081211447f,
+   -0.996820299f,  0.079682438f,
+   -0.996941358f,  0.078153242f,
+   -0.997060070f,  0.076623861f,
+   -0.997176437f,  0.075094301f,
+   -0.997290457f,  0.073564564f,
+   -0.997402130f,  0.072034653f,
+   -0.997511456f,  0.070504573f,
+   -0.997618435f,  0.068974328f,
+   -0.997723067f,  0.067443920f,
+   -0.997825350f,  0.065913353f,
+   -0.997925286f,  0.064382631f,
+   -0.998022874f,  0.062851758f,
+   -0.998118113f,  0.061320736f,
+   -0.998211003f,  0.059789571f,
+   -0.998301545f,  0.058258265f,
+   -0.998389737f,  0.056726821f,
+   -0.998475581f,  0.055195244f,
+   -0.998559074f,  0.053663538f,
+   -0.998640218f,  0.052131705f,
+   -0.998719012f,  0.050599749f,
+   -0.998795456f,  0.049067674f,
+   -0.998869550f,  0.047535484f,
+   -0.998941293f,  0.046003182f,
+   -0.999010686f,  0.044470772f,
+   -0.999077728f,  0.042938257f,
+   -0.999142419f,  0.041405641f,
+   -0.999204759f,  0.039872928f,
+   -0.999264747f,  0.038340120f,
+   -0.999322385f,  0.036807223f,
+   -0.999377670f,  0.035274239f,
+   -0.999430605f,  0.033741172f,
+   -0.999481187f,  0.032208025f,
+   -0.999529418f,  0.030674803f,
+   -0.999575296f,  0.029141509f,
+   -0.999618822f,  0.027608146f,
+   -0.999659997f,  0.026074718f,
+   -0.999698819f,  0.024541229f,
+   -0.999735288f,  0.023007681f,
+   -0.999769405f,  0.021474080f,
+   -0.999801170f,  0.019940429f,
+   -0.999830582f,  0.018406730f,
+   -0.999857641f,  0.016872988f,
+   -0.999882347f,  0.015339206f,
+   -0.999904701f,  0.013805389f,
+   -0.999924702f,  0.012271538f,
+   -0.999942350f,  0.010737659f,
+   -0.999957645f,  0.009203755f,
+   -0.999970586f,  0.007669829f,
+   -0.999981175f,  0.006135885f,
+   -0.999989411f,  0.004601926f,
+   -0.999995294f,  0.003067957f,
+   -0.999998823f,  0.001533980f,
+   -1.000000000f,  0.000000000f,
+   -0.999998823f, -0.001533980f,
+   -0.999995294f, -0.003067957f,
+   -0.999989411f, -0.004601926f,
+   -0.999981175f, -0.006135885f,
+   -0.999970586f, -0.007669829f,
+   -0.999957645f, -0.009203755f,
+   -0.999942350f, -0.010737659f,
+   -0.999924702f, -0.012271538f,
+   -0.999904701f, -0.013805389f,
+   -0.999882347f, -0.015339206f,
+   -0.999857641f, -0.016872988f,
+   -0.999830582f, -0.018406730f,
+   -0.999801170f, -0.019940429f,
+   -0.999769405f, -0.021474080f,
+   -0.999735288f, -0.023007681f,
+   -0.999698819f, -0.024541229f,
+   -0.999659997f, -0.026074718f,
+   -0.999618822f, -0.027608146f,
+   -0.999575296f, -0.029141509f,
+   -0.999529418f, -0.030674803f,
+   -0.999481187f, -0.032208025f,
+   -0.999430605f, -0.033741172f,
+   -0.999377670f, -0.035274239f,
+   -0.999322385f, -0.036807223f,
+   -0.999264747f, -0.038340120f,
+   -0.999204759f, -0.039872928f,
+   -0.999142419f, -0.041405641f,
+   -0.999077728f, -0.042938257f,
+   -0.999010686f, -0.044470772f,
+   -0.998941293f, -0.046003182f,
+   -0.998869550f, -0.047535484f,
+   -0.998795456f, -0.049067674f,
+   -0.998719012f, -0.050599749f,
+   -0.998640218f, -0.052131705f,
+   -0.998559074f, -0.053663538f,
+   -0.998475581f, -0.055195244f,
+   -0.998389737f, -0.056726821f,
+   -0.998301545f, -0.058258265f,
+   -0.998211003f, -0.059789571f,
+   -0.998118113f, -0.061320736f,
+   -0.998022874f, -0.062851758f,
+   -0.997925286f, -0.064382631f,
+   -0.997825350f, -0.065913353f,
+   -0.997723067f, -0.067443920f,
+   -0.997618435f, -0.068974328f,
+   -0.997511456f, -0.070504573f,
+   -0.997402130f, -0.072034653f,
+   -0.997290457f, -0.073564564f,
+   -0.997176437f, -0.075094301f,
+   -0.997060070f, -0.076623861f,
+   -0.996941358f, -0.078153242f,
+   -0.996820299f, -0.079682438f,
+   -0.996696895f, -0.081211447f,
+   -0.996571146f, -0.082740265f,
+   -0.996443051f, -0.084268888f,
+   -0.996312612f, -0.085797312f,
+   -0.996179829f, -0.087325535f,
+   -0.996044701f, -0.088853553f,
+   -0.995907229f, -0.090381361f,
+   -0.995767414f, -0.091908956f,
+   -0.995625256f, -0.093436336f,
+   -0.995480755f, -0.094963495f,
+   -0.995333912f, -0.096490431f,
+   -0.995184727f, -0.098017140f,
+   -0.995033199f, -0.099543619f,
+   -0.994879331f, -0.101069863f,
+   -0.994723121f, -0.102595869f,
+   -0.994564571f, -0.104121634f,
+   -0.994403680f, -0.105647154f,
+   -0.994240449f, -0.107172425f,
+   -0.994074879f, -0.108697444f,
+   -0.993906970f, -0.110222207f,
+   -0.993736722f, -0.111746711f,
+   -0.993564136f, -0.113270952f,
+   -0.993389211f, -0.114794927f,
+   -0.993211949f, -0.116318631f,
+   -0.993032350f, -0.117842062f,
+   -0.992850414f, -0.119365215f,
+   -0.992666142f, -0.120888087f,
+   -0.992479535f, -0.122410675f,
+   -0.992290591f, -0.123932975f,
+   -0.992099313f, -0.125454983f,
+   -0.991905700f, -0.126976696f,
+   -0.991709754f, -0.128498111f,
+   -0.991511473f, -0.130019223f,
+   -0.991310860f, -0.131540029f,
+   -0.991107914f, -0.133060525f,
+   -0.990902635f, -0.134580709f,
+   -0.990695025f, -0.136100575f,
+   -0.990485084f, -0.137620122f,
+   -0.990272812f, -0.139139344f,
+   -0.990058210f, -0.140658239f,
+   -0.989841278f, -0.142176804f,
+   -0.989622017f, -0.143695033f,
+   -0.989400428f, -0.145212925f,
+   -0.989176510f, -0.146730474f,
+   -0.988950265f, -0.148247679f,
+   -0.988721692f, -0.149764535f,
+   -0.988490793f, -0.151281038f,
+   -0.988257568f, -0.152797185f,
+   -0.988022017f, -0.154312973f,
+   -0.987784142f, -0.155828398f,
+   -0.987543942f, -0.157343456f,
+   -0.987301418f, -0.158858143f,
+   -0.987056571f, -0.160372457f,
+   -0.986809402f, -0.161886394f,
+   -0.986559910f, -0.163399949f,
+   -0.986308097f, -0.164913120f,
+   -0.986053963f, -0.166425904f,
+   -0.985797509f, -0.167938295f,
+   -0.985538735f, -0.169450291f,
+   -0.985277642f, -0.170961889f,
+   -0.985014231f, -0.172473084f,
+   -0.984748502f, -0.173983873f,
+   -0.984480455f, -0.175494253f,
+   -0.984210092f, -0.177004220f,
+   -0.983937413f, -0.178513771f,
+   -0.983662419f, -0.180022901f,
+   -0.983385110f, -0.181531608f,
+   -0.983105487f, -0.183039888f,
+   -0.982823551f, -0.184547737f,
+   -0.982539302f, -0.186055152f,
+   -0.982252741f, -0.187562129f,
+   -0.981963869f, -0.189068664f,
+   -0.981672686f, -0.190574755f,
+   -0.981379193f, -0.192080397f,
+   -0.981083391f, -0.193585587f,
+   -0.980785280f, -0.195090322f,
+   -0.980484862f, -0.196594598f,
+   -0.980182136f, -0.198098411f,
+   -0.979877104f, -0.199601758f,
+   -0.979569766f, -0.201104635f,
+   -0.979260123f, -0.202607039f,
+   -0.978948175f, -0.204108966f,
+   -0.978633924f, -0.205610413f,
+   -0.978317371f, -0.207111376f,
+   -0.977998515f, -0.208611852f,
+   -0.977677358f, -0.210111837f,
+   -0.977353900f, -0.211611327f,
+   -0.977028143f, -0.213110320f,
+   -0.976700086f, -0.214608811f,
+   -0.976369731f, -0.216106797f,
+   -0.976037079f, -0.217604275f,
+   -0.975702130f, -0.219101240f,
+   -0.975364885f, -0.220597690f,
+   -0.975025345f, -0.222093621f,
+   -0.974683511f, -0.223589029f,
+   -0.974339383f, -0.225083911f,
+   -0.973992962f, -0.226578264f,
+   -0.973644250f, -0.228072083f,
+   -0.973293246f, -0.229565366f,
+   -0.972939952f, -0.231058108f,
+   -0.972584369f, -0.232550307f,
+   -0.972226497f, -0.234041959f,
+   -0.971866337f, -0.235533059f,
+   -0.971503891f, -0.237023606f,
+   -0.971139158f, -0.238513595f,
+   -0.970772141f, -0.240003022f,
+   -0.970402839f, -0.241491885f,
+   -0.970031253f, -0.242980180f,
+   -0.969657385f, -0.244467903f,
+   -0.969281235f, -0.245955050f,
+   -0.968902805f, -0.247441619f,
+   -0.968522094f, -0.248927606f,
+   -0.968139105f, -0.250413007f,
+   -0.967753837f, -0.251897818f,
+   -0.967366292f, -0.253382037f,
+   -0.966976471f, -0.254865660f,
+   -0.966584374f, -0.256348682f,
+   -0.966190003f, -0.257831102f,
+   -0.965793359f, -0.259312915f,
+   -0.965394442f, -0.260794118f,
+   -0.964993253f, -0.262274707f,
+   -0.964589793f, -0.263754679f,
+   -0.964184064f, -0.265234030f,
+   -0.963776066f, -0.266712757f,
+   -0.963365800f, -0.268190857f,
+   -0.962953267f, -0.269668326f,
+   -0.962538468f, -0.271145160f,
+   -0.962121404f, -0.272621355f,
+   -0.961702077f, -0.274096910f,
+   -0.961280486f, -0.275571819f,
+   -0.960856633f, -0.277046080f,
+   -0.960430519f, -0.278519689f,
+   -0.960002146f, -0.279992643f,
+   -0.959571513f, -0.281464938f,
+   -0.959138622f, -0.282936570f,
+   -0.958703475f, -0.284407537f,
+   -0.958266071f, -0.285877835f,
+   -0.957826413f, -0.287347460f,
+   -0.957384501f, -0.288816408f,
+   -0.956940336f, -0.290284677f,
+   -0.956493919f, -0.291752263f,
+   -0.956045251f, -0.293219163f,
+   -0.955594334f, -0.294685372f,
+   -0.955141168f, -0.296150888f,
+   -0.954685755f, -0.297615707f,
+   -0.954228095f, -0.299079826f,
+   -0.953768190f, -0.300543241f,
+   -0.953306040f, -0.302005949f,
+   -0.952841648f, -0.303467947f,
+   -0.952375013f, -0.304929230f,
+   -0.951906137f, -0.306389795f,
+   -0.951435021f, -0.307849640f,
+   -0.950961666f, -0.309308760f,
+   -0.950486074f, -0.310767153f,
+   -0.950008245f, -0.312224814f,
+   -0.949528181f, -0.313681740f,
+   -0.949045882f, -0.315137929f,
+   -0.948561350f, -0.316593376f,
+   -0.948074586f, -0.318048077f,
+   -0.947585591f, -0.319502031f,
+   -0.947094366f, -0.320955232f,
+   -0.946600913f, -0.322407679f,
+   -0.946105232f, -0.323859367f,
+   -0.945607325f, -0.325310292f,
+   -0.945107193f, -0.326760452f,
+   -0.944604837f, -0.328209844f,
+   -0.944100258f, -0.329658463f,
+   -0.943593458f, -0.331106306f,
+   -0.943084437f, -0.332553370f,
+   -0.942573198f, -0.333999651f,
+   -0.942059740f, -0.335445147f,
+   -0.941544065f, -0.336889853f,
+   -0.941026175f, -0.338333767f,
+   -0.940506071f, -0.339776884f,
+   -0.939983753f, -0.341219202f,
+   -0.939459224f, -0.342660717f,
+   -0.938932484f, -0.344101426f,
+   -0.938403534f, -0.345541325f,
+   -0.937872376f, -0.346980411f,
+   -0.937339012f, -0.348418680f,
+   -0.936803442f, -0.349856130f,
+   -0.936265667f, -0.351292756f,
+   -0.935725689f, -0.352728556f,
+   -0.935183510f, -0.354163525f,
+   -0.934639130f, -0.355597662f,
+   -0.934092550f, -0.357030961f,
+   -0.933543773f, -0.358463421f,
+   -0.932992799f, -0.359895037f,
+   -0.932439629f, -0.361325806f,
+   -0.931884266f, -0.362755724f,
+   -0.931326709f, -0.364184790f,
+   -0.930766961f, -0.365612998f,
+   -0.930205023f, -0.367040346f,
+   -0.929640896f, -0.368466830f,
+   -0.929074581f, -0.369892447f,
+   -0.928506080f, -0.371317194f,
+   -0.927935395f, -0.372741067f,
+   -0.927362526f, -0.374164063f,
+   -0.926787474f, -0.375586178f,
+   -0.926210242f, -0.377007410f,
+   -0.925630831f, -0.378427755f,
+   -0.925049241f, -0.379847209f,
+   -0.924465474f, -0.381265769f,
+   -0.923879533f, -0.382683432f,
+   -0.923291417f, -0.384100195f,
+   -0.922701128f, -0.385516054f,
+   -0.922108669f, -0.386931006f,
+   -0.921514039f, -0.388345047f,
+   -0.920917242f, -0.389758174f,
+   -0.920318277f, -0.391170384f,
+   -0.919717146f, -0.392581674f,
+   -0.919113852f, -0.393992040f,
+   -0.918508394f, -0.395401479f,
+   -0.917900776f, -0.396809987f,
+   -0.917290997f, -0.398217562f,
+   -0.916679060f, -0.399624200f,
+   -0.916064966f, -0.401029897f,
+   -0.915448716f, -0.402434651f,
+   -0.914830312f, -0.403838458f,
+   -0.914209756f, -0.405241314f,
+   -0.913587048f, -0.406643217f,
+   -0.912962190f, -0.408044163f,
+   -0.912335185f, -0.409444149f,
+   -0.911706032f, -0.410843171f,
+   -0.911074734f, -0.412241227f,
+   -0.910441292f, -0.413638312f,
+   -0.909805708f, -0.415034424f,
+   -0.909167983f, -0.416429560f,
+   -0.908528119f, -0.417823716f,
+   -0.907886116f, -0.419216888f,
+   -0.907241978f, -0.420609074f,
+   -0.906595705f, -0.422000271f,
+   -0.905947298f, -0.423390474f,
+   -0.905296759f, -0.424779681f,
+   -0.904644091f, -0.426167889f,
+   -0.903989293f, -0.427555093f,
+   -0.903332368f, -0.428941292f,
+   -0.902673318f, -0.430326481f,
+   -0.902012144f, -0.431710658f,
+   -0.901348847f, -0.433093819f,
+   -0.900683429f, -0.434475961f,
+   -0.900015892f, -0.435857080f,
+   -0.899346237f, -0.437237174f,
+   -0.898674466f, -0.438616239f,
+   -0.898000580f, -0.439994271f,
+   -0.897324581f, -0.441371269f,
+   -0.896646470f, -0.442747228f,
+   -0.895966250f, -0.444122145f,
+   -0.895283921f, -0.445496017f,
+   -0.894599486f, -0.446868840f,
+   -0.893912945f, -0.448240612f,
+   -0.893224301f, -0.449611330f,
+   -0.892533555f, -0.450980989f,
+   -0.891840709f, -0.452349587f,
+   -0.891145765f, -0.453717121f,
+   -0.890448723f, -0.455083587f,
+   -0.889749586f, -0.456448982f,
+   -0.889048356f, -0.457813304f,
+   -0.888345033f, -0.459176548f,
+   -0.887639620f, -0.460538711f,
+   -0.886932119f, -0.461899791f,
+   -0.886222530f, -0.463259784f,
+   -0.885510856f, -0.464618686f,
+   -0.884797098f, -0.465976496f,
+   -0.884081259f, -0.467333209f,
+   -0.883363339f, -0.468688822f,
+   -0.882643340f, -0.470043332f,
+   -0.881921264f, -0.471396737f,
+   -0.881197113f, -0.472749032f,
+   -0.880470889f, -0.474100215f,
+   -0.879742593f, -0.475450282f,
+   -0.879012226f, -0.476799230f,
+   -0.878279792f, -0.478147056f,
+   -0.877545290f, -0.479493758f,
+   -0.876808724f, -0.480839331f,
+   -0.876070094f, -0.482183772f,
+   -0.875329403f, -0.483527079f,
+   -0.874586652f, -0.484869248f,
+   -0.873841843f, -0.486210276f,
+   -0.873094978f, -0.487550160f,
+   -0.872346059f, -0.488888897f,
+   -0.871595087f, -0.490226483f,
+   -0.870842063f, -0.491562916f,
+   -0.870086991f, -0.492898192f,
+   -0.869329871f, -0.494232309f,
+   -0.868570706f, -0.495565262f,
+   -0.867809497f, -0.496897049f,
+   -0.867046246f, -0.498227667f,
+   -0.866280954f, -0.499557113f,
+   -0.865513624f, -0.500885383f,
+   -0.864744258f, -0.502212474f,
+   -0.863972856f, -0.503538384f,
+   -0.863199422f, -0.504863109f,
+   -0.862423956f, -0.506186645f,
+   -0.861646461f, -0.507508991f,
+   -0.860866939f, -0.508830143f,
+   -0.860085390f, -0.510150097f,
+   -0.859301818f, -0.511468850f,
+   -0.858516224f, -0.512786401f,
+   -0.857728610f, -0.514102744f,
+   -0.856938977f, -0.515417878f,
+   -0.856147328f, -0.516731799f,
+   -0.855353665f, -0.518044504f,
+   -0.854557988f, -0.519355990f,
+   -0.853760301f, -0.520666254f,
+   -0.852960605f, -0.521975293f,
+   -0.852158902f, -0.523283103f,
+   -0.851355193f, -0.524589683f,
+   -0.850549481f, -0.525895027f,
+   -0.849741768f, -0.527199135f,
+   -0.848932055f, -0.528502002f,
+   -0.848120345f, -0.529803625f,
+   -0.847306639f, -0.531104001f,
+   -0.846490939f, -0.532403128f,
+   -0.845673247f, -0.533701002f,
+   -0.844853565f, -0.534997620f,
+   -0.844031895f, -0.536292979f,
+   -0.843208240f, -0.537587076f,
+   -0.842382600f, -0.538879909f,
+   -0.841554977f, -0.540171473f,
+   -0.840725375f, -0.541461766f,
+   -0.839893794f, -0.542750785f,
+   -0.839060237f, -0.544038527f,
+   -0.838224706f, -0.545324988f,
+   -0.837387202f, -0.546610167f,
+   -0.836547727f, -0.547894059f,
+   -0.835706284f, -0.549176662f,
+   -0.834862875f, -0.550457973f,
+   -0.834017501f, -0.551737988f,
+   -0.833170165f, -0.553016706f,
+   -0.832320868f, -0.554294121f,
+   -0.831469612f, -0.555570233f,
+   -0.830616400f, -0.556845037f,
+   -0.829761234f, -0.558118531f,
+   -0.828904115f, -0.559390712f,
+   -0.828045045f, -0.560661576f,
+   -0.827184027f, -0.561931121f,
+   -0.826321063f, -0.563199344f,
+   -0.825456154f, -0.564466242f,
+   -0.824589303f, -0.565731811f,
+   -0.823720511f, -0.566996049f,
+   -0.822849781f, -0.568258953f,
+   -0.821977115f, -0.569520519f,
+   -0.821102515f, -0.570780746f,
+   -0.820225983f, -0.572039629f,
+   -0.819347520f, -0.573297167f,
+   -0.818467130f, -0.574553355f,
+   -0.817584813f, -0.575808191f,
+   -0.816700573f, -0.577061673f,
+   -0.815814411f, -0.578313796f,
+   -0.814926329f, -0.579564559f,
+   -0.814036330f, -0.580813958f,
+   -0.813144415f, -0.582061990f,
+   -0.812250587f, -0.583308653f,
+   -0.811354847f, -0.584553943f,
+   -0.810457198f, -0.585797857f,
+   -0.809557642f, -0.587040394f,
+   -0.808656182f, -0.588281548f,
+   -0.807752818f, -0.589521319f,
+   -0.806847554f, -0.590759702f,
+   -0.805940391f, -0.591996695f,
+   -0.805031331f, -0.593232295f,
+   -0.804120377f, -0.594466499f,
+   -0.803207531f, -0.595699304f,
+   -0.802292796f, -0.596930708f,
+   -0.801376172f, -0.598160707f,
+   -0.800457662f, -0.599389298f,
+   -0.799537269f, -0.600616479f,
+   -0.798614995f, -0.601842247f,
+   -0.797690841f, -0.603066599f,
+   -0.796764810f, -0.604289531f,
+   -0.795836905f, -0.605511041f,
+   -0.794907126f, -0.606731127f,
+   -0.793975478f, -0.607949785f,
+   -0.793041960f, -0.609167012f,
+   -0.792106577f, -0.610382806f,
+   -0.791169330f, -0.611597164f,
+   -0.790230221f, -0.612810082f,
+   -0.789289253f, -0.614021559f,
+   -0.788346428f, -0.615231591f,
+   -0.787401747f, -0.616440175f,
+   -0.786455214f, -0.617647308f,
+   -0.785506830f, -0.618852988f,
+   -0.784556597f, -0.620057212f,
+   -0.783604519f, -0.621259977f,
+   -0.782650596f, -0.622461279f,
+   -0.781694832f, -0.623661118f,
+   -0.780737229f, -0.624859488f,
+   -0.779777788f, -0.626056388f,
+   -0.778816512f, -0.627251815f,
+   -0.777853404f, -0.628445767f,
+   -0.776888466f, -0.629638239f,
+   -0.775921699f, -0.630829230f,
+   -0.774953107f, -0.632018736f,
+   -0.773982691f, -0.633206755f,
+   -0.773010453f, -0.634393284f,
+   -0.772036397f, -0.635578320f,
+   -0.771060524f, -0.636761861f,
+   -0.770082837f, -0.637943904f,
+   -0.769103338f, -0.639124445f,
+   -0.768122029f, -0.640303482f,
+   -0.767138912f, -0.641481013f,
+   -0.766153990f, -0.642657034f,
+   -0.765167266f, -0.643831543f,
+   -0.764178741f, -0.645004537f,
+   -0.763188417f, -0.646176013f,
+   -0.762196298f, -0.647345969f,
+   -0.761202385f, -0.648514401f,
+   -0.760206682f, -0.649681307f,
+   -0.759209189f, -0.650846685f,
+   -0.758209910f, -0.652010531f,
+   -0.757208847f, -0.653172843f,
+   -0.756206001f, -0.654333618f,
+   -0.755201377f, -0.655492853f,
+   -0.754194975f, -0.656650546f,
+   -0.753186799f, -0.657806693f,
+   -0.752176850f, -0.658961293f,
+   -0.751165132f, -0.660114342f,
+   -0.750151646f, -0.661265838f,
+   -0.749136395f, -0.662415778f,
+   -0.748119380f, -0.663564159f,
+   -0.747100606f, -0.664710978f,
+   -0.746080074f, -0.665856234f,
+   -0.745057785f, -0.666999922f,
+   -0.744033744f, -0.668142041f,
+   -0.743007952f, -0.669282588f,
+   -0.741980412f, -0.670421560f,
+   -0.740951125f, -0.671558955f,
+   -0.739920095f, -0.672694769f,
+   -0.738887324f, -0.673829000f,
+   -0.737852815f, -0.674961646f,
+   -0.736816569f, -0.676092704f,
+   -0.735778589f, -0.677222170f,
+   -0.734738878f, -0.678350043f,
+   -0.733697438f, -0.679476320f,
+   -0.732654272f, -0.680600998f,
+   -0.731609381f, -0.681724074f,
+   -0.730562769f, -0.682845546f,
+   -0.729514438f, -0.683965412f,
+   -0.728464390f, -0.685083668f,
+   -0.727412629f, -0.686200312f,
+   -0.726359155f, -0.687315341f,
+   -0.725303972f, -0.688428753f,
+   -0.724247083f, -0.689540545f,
+   -0.723188489f, -0.690650714f,
+   -0.722128194f, -0.691759258f,
+   -0.721066199f, -0.692866175f,
+   -0.720002508f, -0.693971461f,
+   -0.718937122f, -0.695075114f,
+   -0.717870045f, -0.696177131f,
+   -0.716801279f, -0.697277511f,
+   -0.715730825f, -0.698376249f,
+   -0.714658688f, -0.699473345f,
+   -0.713584869f, -0.700568794f,
+   -0.712509371f, -0.701662595f,
+   -0.711432196f, -0.702754744f,
+   -0.710353347f, -0.703845241f,
+   -0.709272826f, -0.704934080f,
+   -0.708190637f, -0.706021261f,
+   -0.707106781f, -0.707106781f,
+   -0.706021261f, -0.708190637f,
+   -0.704934080f, -0.709272826f,
+   -0.703845241f, -0.710353347f,
+   -0.702754744f, -0.711432196f,
+   -0.701662595f, -0.712509371f,
+   -0.700568794f, -0.713584869f,
+   -0.699473345f, -0.714658688f,
+   -0.698376249f, -0.715730825f,
+   -0.697277511f, -0.716801279f,
+   -0.696177131f, -0.717870045f,
+   -0.695075114f, -0.718937122f,
+   -0.693971461f, -0.720002508f,
+   -0.692866175f, -0.721066199f,
+   -0.691759258f, -0.722128194f,
+   -0.690650714f, -0.723188489f,
+   -0.689540545f, -0.724247083f,
+   -0.688428753f, -0.725303972f,
+   -0.687315341f, -0.726359155f,
+   -0.686200312f, -0.727412629f,
+   -0.685083668f, -0.728464390f,
+   -0.683965412f, -0.729514438f,
+   -0.682845546f, -0.730562769f,
+   -0.681724074f, -0.731609381f,
+   -0.680600998f, -0.732654272f,
+   -0.679476320f, -0.733697438f,
+   -0.678350043f, -0.734738878f,
+   -0.677222170f, -0.735778589f,
+   -0.676092704f, -0.736816569f,
+   -0.674961646f, -0.737852815f,
+   -0.673829000f, -0.738887324f,
+   -0.672694769f, -0.739920095f,
+   -0.671558955f, -0.740951125f,
+   -0.670421560f, -0.741980412f,
+   -0.669282588f, -0.743007952f,
+   -0.668142041f, -0.744033744f,
+   -0.666999922f, -0.745057785f,
+   -0.665856234f, -0.746080074f,
+   -0.664710978f, -0.747100606f,
+   -0.663564159f, -0.748119380f,
+   -0.662415778f, -0.749136395f,
+   -0.661265838f, -0.750151646f,
+   -0.660114342f, -0.751165132f,
+   -0.658961293f, -0.752176850f,
+   -0.657806693f, -0.753186799f,
+   -0.656650546f, -0.754194975f,
+   -0.655492853f, -0.755201377f,
+   -0.654333618f, -0.756206001f,
+   -0.653172843f, -0.757208847f,
+   -0.652010531f, -0.758209910f,
+   -0.650846685f, -0.759209189f,
+   -0.649681307f, -0.760206682f,
+   -0.648514401f, -0.761202385f,
+   -0.647345969f, -0.762196298f,
+   -0.646176013f, -0.763188417f,
+   -0.645004537f, -0.764178741f,
+   -0.643831543f, -0.765167266f,
+   -0.642657034f, -0.766153990f,
+   -0.641481013f, -0.767138912f,
+   -0.640303482f, -0.768122029f,
+   -0.639124445f, -0.769103338f,
+   -0.637943904f, -0.770082837f,
+   -0.636761861f, -0.771060524f,
+   -0.635578320f, -0.772036397f,
+   -0.634393284f, -0.773010453f,
+   -0.633206755f, -0.773982691f,
+   -0.632018736f, -0.774953107f,
+   -0.630829230f, -0.775921699f,
+   -0.629638239f, -0.776888466f,
+   -0.628445767f, -0.777853404f,
+   -0.627251815f, -0.778816512f,
+   -0.626056388f, -0.779777788f,
+   -0.624859488f, -0.780737229f,
+   -0.623661118f, -0.781694832f,
+   -0.622461279f, -0.782650596f,
+   -0.621259977f, -0.783604519f,
+   -0.620057212f, -0.784556597f,
+   -0.618852988f, -0.785506830f,
+   -0.617647308f, -0.786455214f,
+   -0.616440175f, -0.787401747f,
+   -0.615231591f, -0.788346428f,
+   -0.614021559f, -0.789289253f,
+   -0.612810082f, -0.790230221f,
+   -0.611597164f, -0.791169330f,
+   -0.610382806f, -0.792106577f,
+   -0.609167012f, -0.793041960f,
+   -0.607949785f, -0.793975478f,
+   -0.606731127f, -0.794907126f,
+   -0.605511041f, -0.795836905f,
+   -0.604289531f, -0.796764810f,
+   -0.603066599f, -0.797690841f,
+   -0.601842247f, -0.798614995f,
+   -0.600616479f, -0.799537269f,
+   -0.599389298f, -0.800457662f,
+   -0.598160707f, -0.801376172f,
+   -0.596930708f, -0.802292796f,
+   -0.595699304f, -0.803207531f,
+   -0.594466499f, -0.804120377f,
+   -0.593232295f, -0.805031331f,
+   -0.591996695f, -0.805940391f,
+   -0.590759702f, -0.806847554f,
+   -0.589521319f, -0.807752818f,
+   -0.588281548f, -0.808656182f,
+   -0.587040394f, -0.809557642f,
+   -0.585797857f, -0.810457198f,
+   -0.584553943f, -0.811354847f,
+   -0.583308653f, -0.812250587f,
+   -0.582061990f, -0.813144415f,
+   -0.580813958f, -0.814036330f,
+   -0.579564559f, -0.814926329f,
+   -0.578313796f, -0.815814411f,
+   -0.577061673f, -0.816700573f,
+   -0.575808191f, -0.817584813f,
+   -0.574553355f, -0.818467130f,
+   -0.573297167f, -0.819347520f,
+   -0.572039629f, -0.820225983f,
+   -0.570780746f, -0.821102515f,
+   -0.569520519f, -0.821977115f,
+   -0.568258953f, -0.822849781f,
+   -0.566996049f, -0.823720511f,
+   -0.565731811f, -0.824589303f,
+   -0.564466242f, -0.825456154f,
+   -0.563199344f, -0.826321063f,
+   -0.561931121f, -0.827184027f,
+   -0.560661576f, -0.828045045f,
+   -0.559390712f, -0.828904115f,
+   -0.558118531f, -0.829761234f,
+   -0.556845037f, -0.830616400f,
+   -0.555570233f, -0.831469612f,
+   -0.554294121f, -0.832320868f,
+   -0.553016706f, -0.833170165f,
+   -0.551737988f, -0.834017501f,
+   -0.550457973f, -0.834862875f,
+   -0.549176662f, -0.835706284f,
+   -0.547894059f, -0.836547727f,
+   -0.546610167f, -0.837387202f,
+   -0.545324988f, -0.838224706f,
+   -0.544038527f, -0.839060237f,
+   -0.542750785f, -0.839893794f,
+   -0.541461766f, -0.840725375f,
+   -0.540171473f, -0.841554977f,
+   -0.538879909f, -0.842382600f,
+   -0.537587076f, -0.843208240f,
+   -0.536292979f, -0.844031895f,
+   -0.534997620f, -0.844853565f,
+   -0.533701002f, -0.845673247f,
+   -0.532403128f, -0.846490939f,
+   -0.531104001f, -0.847306639f,
+   -0.529803625f, -0.848120345f,
+   -0.528502002f, -0.848932055f,
+   -0.527199135f, -0.849741768f,
+   -0.525895027f, -0.850549481f,
+   -0.524589683f, -0.851355193f,
+   -0.523283103f, -0.852158902f,
+   -0.521975293f, -0.852960605f,
+   -0.520666254f, -0.853760301f,
+   -0.519355990f, -0.854557988f,
+   -0.518044504f, -0.855353665f,
+   -0.516731799f, -0.856147328f,
+   -0.515417878f, -0.856938977f,
+   -0.514102744f, -0.857728610f,
+   -0.512786401f, -0.858516224f,
+   -0.511468850f, -0.859301818f,
+   -0.510150097f, -0.860085390f,
+   -0.508830143f, -0.860866939f,
+   -0.507508991f, -0.861646461f,
+   -0.506186645f, -0.862423956f,
+   -0.504863109f, -0.863199422f,
+   -0.503538384f, -0.863972856f,
+   -0.502212474f, -0.864744258f,
+   -0.500885383f, -0.865513624f,
+   -0.499557113f, -0.866280954f,
+   -0.498227667f, -0.867046246f,
+   -0.496897049f, -0.867809497f,
+   -0.495565262f, -0.868570706f,
+   -0.494232309f, -0.869329871f,
+   -0.492898192f, -0.870086991f,
+   -0.491562916f, -0.870842063f,
+   -0.490226483f, -0.871595087f,
+   -0.488888897f, -0.872346059f,
+   -0.487550160f, -0.873094978f,
+   -0.486210276f, -0.873841843f,
+   -0.484869248f, -0.874586652f,
+   -0.483527079f, -0.875329403f,
+   -0.482183772f, -0.876070094f,
+   -0.480839331f, -0.876808724f,
+   -0.479493758f, -0.877545290f,
+   -0.478147056f, -0.878279792f,
+   -0.476799230f, -0.879012226f,
+   -0.475450282f, -0.879742593f,
+   -0.474100215f, -0.880470889f,
+   -0.472749032f, -0.881197113f,
+   -0.471396737f, -0.881921264f,
+   -0.470043332f, -0.882643340f,
+   -0.468688822f, -0.883363339f,
+   -0.467333209f, -0.884081259f,
+   -0.465976496f, -0.884797098f,
+   -0.464618686f, -0.885510856f,
+   -0.463259784f, -0.886222530f,
+   -0.461899791f, -0.886932119f,
+   -0.460538711f, -0.887639620f,
+   -0.459176548f, -0.888345033f,
+   -0.457813304f, -0.889048356f,
+   -0.456448982f, -0.889749586f,
+   -0.455083587f, -0.890448723f,
+   -0.453717121f, -0.891145765f,
+   -0.452349587f, -0.891840709f,
+   -0.450980989f, -0.892533555f,
+   -0.449611330f, -0.893224301f,
+   -0.448240612f, -0.893912945f,
+   -0.446868840f, -0.894599486f,
+   -0.445496017f, -0.895283921f,
+   -0.444122145f, -0.895966250f,
+   -0.442747228f, -0.896646470f,
+   -0.441371269f, -0.897324581f,
+   -0.439994271f, -0.898000580f,
+   -0.438616239f, -0.898674466f,
+   -0.437237174f, -0.899346237f,
+   -0.435857080f, -0.900015892f,
+   -0.434475961f, -0.900683429f,
+   -0.433093819f, -0.901348847f,
+   -0.431710658f, -0.902012144f,
+   -0.430326481f, -0.902673318f,
+   -0.428941292f, -0.903332368f,
+   -0.427555093f, -0.903989293f,
+   -0.426167889f, -0.904644091f,
+   -0.424779681f, -0.905296759f,
+   -0.423390474f, -0.905947298f,
+   -0.422000271f, -0.906595705f,
+   -0.420609074f, -0.907241978f,
+   -0.419216888f, -0.907886116f,
+   -0.417823716f, -0.908528119f,
+   -0.416429560f, -0.909167983f,
+   -0.415034424f, -0.909805708f,
+   -0.413638312f, -0.910441292f,
+   -0.412241227f, -0.911074734f,
+   -0.410843171f, -0.911706032f,
+   -0.409444149f, -0.912335185f,
+   -0.408044163f, -0.912962190f,
+   -0.406643217f, -0.913587048f,
+   -0.405241314f, -0.914209756f,
+   -0.403838458f, -0.914830312f,
+   -0.402434651f, -0.915448716f,
+   -0.401029897f, -0.916064966f,
+   -0.399624200f, -0.916679060f,
+   -0.398217562f, -0.917290997f,
+   -0.396809987f, -0.917900776f,
+   -0.395401479f, -0.918508394f,
+   -0.393992040f, -0.919113852f,
+   -0.392581674f, -0.919717146f,
+   -0.391170384f, -0.920318277f,
+   -0.389758174f, -0.920917242f,
+   -0.388345047f, -0.921514039f,
+   -0.386931006f, -0.922108669f,
+   -0.385516054f, -0.922701128f,
+   -0.384100195f, -0.923291417f,
+   -0.382683432f, -0.923879533f,
+   -0.381265769f, -0.924465474f,
+   -0.379847209f, -0.925049241f,
+   -0.378427755f, -0.925630831f,
+   -0.377007410f, -0.926210242f,
+   -0.375586178f, -0.926787474f,
+   -0.374164063f, -0.927362526f,
+   -0.372741067f, -0.927935395f,
+   -0.371317194f, -0.928506080f,
+   -0.369892447f, -0.929074581f,
+   -0.368466830f, -0.929640896f,
+   -0.367040346f, -0.930205023f,
+   -0.365612998f, -0.930766961f,
+   -0.364184790f, -0.931326709f,
+   -0.362755724f, -0.931884266f,
+   -0.361325806f, -0.932439629f,
+   -0.359895037f, -0.932992799f,
+   -0.358463421f, -0.933543773f,
+   -0.357030961f, -0.934092550f,
+   -0.355597662f, -0.934639130f,
+   -0.354163525f, -0.935183510f,
+   -0.352728556f, -0.935725689f,
+   -0.351292756f, -0.936265667f,
+   -0.349856130f, -0.936803442f,
+   -0.348418680f, -0.937339012f,
+   -0.346980411f, -0.937872376f,
+   -0.345541325f, -0.938403534f,
+   -0.344101426f, -0.938932484f,
+   -0.342660717f, -0.939459224f,
+   -0.341219202f, -0.939983753f,
+   -0.339776884f, -0.940506071f,
+   -0.338333767f, -0.941026175f,
+   -0.336889853f, -0.941544065f,
+   -0.335445147f, -0.942059740f,
+   -0.333999651f, -0.942573198f,
+   -0.332553370f, -0.943084437f,
+   -0.331106306f, -0.943593458f,
+   -0.329658463f, -0.944100258f,
+   -0.328209844f, -0.944604837f,
+   -0.326760452f, -0.945107193f,
+   -0.325310292f, -0.945607325f,
+   -0.323859367f, -0.946105232f,
+   -0.322407679f, -0.946600913f,
+   -0.320955232f, -0.947094366f,
+   -0.319502031f, -0.947585591f,
+   -0.318048077f, -0.948074586f,
+   -0.316593376f, -0.948561350f,
+   -0.315137929f, -0.949045882f,
+   -0.313681740f, -0.949528181f,
+   -0.312224814f, -0.950008245f,
+   -0.310767153f, -0.950486074f,
+   -0.309308760f, -0.950961666f,
+   -0.307849640f, -0.951435021f,
+   -0.306389795f, -0.951906137f,
+   -0.304929230f, -0.952375013f,
+   -0.303467947f, -0.952841648f,
+   -0.302005949f, -0.953306040f,
+   -0.300543241f, -0.953768190f,
+   -0.299079826f, -0.954228095f,
+   -0.297615707f, -0.954685755f,
+   -0.296150888f, -0.955141168f,
+   -0.294685372f, -0.955594334f,
+   -0.293219163f, -0.956045251f,
+   -0.291752263f, -0.956493919f,
+   -0.290284677f, -0.956940336f,
+   -0.288816408f, -0.957384501f,
+   -0.287347460f, -0.957826413f,
+   -0.285877835f, -0.958266071f,
+   -0.284407537f, -0.958703475f,
+   -0.282936570f, -0.959138622f,
+   -0.281464938f, -0.959571513f,
+   -0.279992643f, -0.960002146f,
+   -0.278519689f, -0.960430519f,
+   -0.277046080f, -0.960856633f,
+   -0.275571819f, -0.961280486f,
+   -0.274096910f, -0.961702077f,
+   -0.272621355f, -0.962121404f,
+   -0.271145160f, -0.962538468f,
+   -0.269668326f, -0.962953267f,
+   -0.268190857f, -0.963365800f,
+   -0.266712757f, -0.963776066f,
+   -0.265234030f, -0.964184064f,
+   -0.263754679f, -0.964589793f,
+   -0.262274707f, -0.964993253f,
+   -0.260794118f, -0.965394442f,
+   -0.259312915f, -0.965793359f,
+   -0.257831102f, -0.966190003f,
+   -0.256348682f, -0.966584374f,
+   -0.254865660f, -0.966976471f,
+   -0.253382037f, -0.967366292f,
+   -0.251897818f, -0.967753837f,
+   -0.250413007f, -0.968139105f,
+   -0.248927606f, -0.968522094f,
+   -0.247441619f, -0.968902805f,
+   -0.245955050f, -0.969281235f,
+   -0.244467903f, -0.969657385f,
+   -0.242980180f, -0.970031253f,
+   -0.241491885f, -0.970402839f,
+   -0.240003022f, -0.970772141f,
+   -0.238513595f, -0.971139158f,
+   -0.237023606f, -0.971503891f,
+   -0.235533059f, -0.971866337f,
+   -0.234041959f, -0.972226497f,
+   -0.232550307f, -0.972584369f,
+   -0.231058108f, -0.972939952f,
+   -0.229565366f, -0.973293246f,
+   -0.228072083f, -0.973644250f,
+   -0.226578264f, -0.973992962f,
+   -0.225083911f, -0.974339383f,
+   -0.223589029f, -0.974683511f,
+   -0.222093621f, -0.975025345f,
+   -0.220597690f, -0.975364885f,
+   -0.219101240f, -0.975702130f,
+   -0.217604275f, -0.976037079f,
+   -0.216106797f, -0.976369731f,
+   -0.214608811f, -0.976700086f,
+   -0.213110320f, -0.977028143f,
+   -0.211611327f, -0.977353900f,
+   -0.210111837f, -0.977677358f,
+   -0.208611852f, -0.977998515f,
+   -0.207111376f, -0.978317371f,
+   -0.205610413f, -0.978633924f,
+   -0.204108966f, -0.978948175f,
+   -0.202607039f, -0.979260123f,
+   -0.201104635f, -0.979569766f,
+   -0.199601758f, -0.979877104f,
+   -0.198098411f, -0.980182136f,
+   -0.196594598f, -0.980484862f,
+   -0.195090322f, -0.980785280f,
+   -0.193585587f, -0.981083391f,
+   -0.192080397f, -0.981379193f,
+   -0.190574755f, -0.981672686f,
+   -0.189068664f, -0.981963869f,
+   -0.187562129f, -0.982252741f,
+   -0.186055152f, -0.982539302f,
+   -0.184547737f, -0.982823551f,
+   -0.183039888f, -0.983105487f,
+   -0.181531608f, -0.983385110f,
+   -0.180022901f, -0.983662419f,
+   -0.178513771f, -0.983937413f,
+   -0.177004220f, -0.984210092f,
+   -0.175494253f, -0.984480455f,
+   -0.173983873f, -0.984748502f,
+   -0.172473084f, -0.985014231f,
+   -0.170961889f, -0.985277642f,
+   -0.169450291f, -0.985538735f,
+   -0.167938295f, -0.985797509f,
+   -0.166425904f, -0.986053963f,
+   -0.164913120f, -0.986308097f,
+   -0.163399949f, -0.986559910f,
+   -0.161886394f, -0.986809402f,
+   -0.160372457f, -0.987056571f,
+   -0.158858143f, -0.987301418f,
+   -0.157343456f, -0.987543942f,
+   -0.155828398f, -0.987784142f,
+   -0.154312973f, -0.988022017f,
+   -0.152797185f, -0.988257568f,
+   -0.151281038f, -0.988490793f,
+   -0.149764535f, -0.988721692f,
+   -0.148247679f, -0.988950265f,
+   -0.146730474f, -0.989176510f,
+   -0.145212925f, -0.989400428f,
+   -0.143695033f, -0.989622017f,
+   -0.142176804f, -0.989841278f,
+   -0.140658239f, -0.990058210f,
+   -0.139139344f, -0.990272812f,
+   -0.137620122f, -0.990485084f,
+   -0.136100575f, -0.990695025f,
+   -0.134580709f, -0.990902635f,
+   -0.133060525f, -0.991107914f,
+   -0.131540029f, -0.991310860f,
+   -0.130019223f, -0.991511473f,
+   -0.128498111f, -0.991709754f,
+   -0.126976696f, -0.991905700f,
+   -0.125454983f, -0.992099313f,
+   -0.123932975f, -0.992290591f,
+   -0.122410675f, -0.992479535f,
+   -0.120888087f, -0.992666142f,
+   -0.119365215f, -0.992850414f,
+   -0.117842062f, -0.993032350f,
+   -0.116318631f, -0.993211949f,
+   -0.114794927f, -0.993389211f,
+   -0.113270952f, -0.993564136f,
+   -0.111746711f, -0.993736722f,
+   -0.110222207f, -0.993906970f,
+   -0.108697444f, -0.994074879f,
+   -0.107172425f, -0.994240449f,
+   -0.105647154f, -0.994403680f,
+   -0.104121634f, -0.994564571f,
+   -0.102595869f, -0.994723121f,
+   -0.101069863f, -0.994879331f,
+   -0.099543619f, -0.995033199f,
+   -0.098017140f, -0.995184727f,
+   -0.096490431f, -0.995333912f,
+   -0.094963495f, -0.995480755f,
+   -0.093436336f, -0.995625256f,
+   -0.091908956f, -0.995767414f,
+   -0.090381361f, -0.995907229f,
+   -0.088853553f, -0.996044701f,
+   -0.087325535f, -0.996179829f,
+   -0.085797312f, -0.996312612f,
+   -0.084268888f, -0.996443051f,
+   -0.082740265f, -0.996571146f,
+   -0.081211447f, -0.996696895f,
+   -0.079682438f, -0.996820299f,
+   -0.078153242f, -0.996941358f,
+   -0.076623861f, -0.997060070f,
+   -0.075094301f, -0.997176437f,
+   -0.073564564f, -0.997290457f,
+   -0.072034653f, -0.997402130f,
+   -0.070504573f, -0.997511456f,
+   -0.068974328f, -0.997618435f,
+   -0.067443920f, -0.997723067f,
+   -0.065913353f, -0.997825350f,
+   -0.064382631f, -0.997925286f,
+   -0.062851758f, -0.998022874f,
+   -0.061320736f, -0.998118113f,
+   -0.059789571f, -0.998211003f,
+   -0.058258265f, -0.998301545f,
+   -0.056726821f, -0.998389737f,
+   -0.055195244f, -0.998475581f,
+   -0.053663538f, -0.998559074f,
+   -0.052131705f, -0.998640218f,
+   -0.050599749f, -0.998719012f,
+   -0.049067674f, -0.998795456f,
+   -0.047535484f, -0.998869550f,
+   -0.046003182f, -0.998941293f,
+   -0.044470772f, -0.999010686f,
+   -0.042938257f, -0.999077728f,
+   -0.041405641f, -0.999142419f,
+   -0.039872928f, -0.999204759f,
+   -0.038340120f, -0.999264747f,
+   -0.036807223f, -0.999322385f,
+   -0.035274239f, -0.999377670f,
+   -0.033741172f, -0.999430605f,
+   -0.032208025f, -0.999481187f,
+   -0.030674803f, -0.999529418f,
+   -0.029141509f, -0.999575296f,
+   -0.027608146f, -0.999618822f,
+   -0.026074718f, -0.999659997f,
+   -0.024541229f, -0.999698819f,
+   -0.023007681f, -0.999735288f,
+   -0.021474080f, -0.999769405f,
+   -0.019940429f, -0.999801170f,
+   -0.018406730f, -0.999830582f,
+   -0.016872988f, -0.999857641f,
+   -0.015339206f, -0.999882347f,
+   -0.013805389f, -0.999904701f,
+   -0.012271538f, -0.999924702f,
+   -0.010737659f, -0.999942350f,
+   -0.009203755f, -0.999957645f,
+   -0.007669829f, -0.999970586f,
+   -0.006135885f, -0.999981175f,
+   -0.004601926f, -0.999989411f,
+   -0.003067957f, -0.999995294f,
+   -0.001533980f, -0.999998823f,
+   -0.000000000f, -1.000000000f,
+    0.001533980f, -0.999998823f,
+    0.003067957f, -0.999995294f,
+    0.004601926f, -0.999989411f,
+    0.006135885f, -0.999981175f,
+    0.007669829f, -0.999970586f,
+    0.009203755f, -0.999957645f,
+    0.010737659f, -0.999942350f,
+    0.012271538f, -0.999924702f,
+    0.013805389f, -0.999904701f,
+    0.015339206f, -0.999882347f,
+    0.016872988f, -0.999857641f,
+    0.018406730f, -0.999830582f,
+    0.019940429f, -0.999801170f,
+    0.021474080f, -0.999769405f,
+    0.023007681f, -0.999735288f,
+    0.024541229f, -0.999698819f,
+    0.026074718f, -0.999659997f,
+    0.027608146f, -0.999618822f,
+    0.029141509f, -0.999575296f,
+    0.030674803f, -0.999529418f,
+    0.032208025f, -0.999481187f,
+    0.033741172f, -0.999430605f,
+    0.035274239f, -0.999377670f,
+    0.036807223f, -0.999322385f,
+    0.038340120f, -0.999264747f,
+    0.039872928f, -0.999204759f,
+    0.041405641f, -0.999142419f,
+    0.042938257f, -0.999077728f,
+    0.044470772f, -0.999010686f,
+    0.046003182f, -0.998941293f,
+    0.047535484f, -0.998869550f,
+    0.049067674f, -0.998795456f,
+    0.050599749f, -0.998719012f,
+    0.052131705f, -0.998640218f,
+    0.053663538f, -0.998559074f,
+    0.055195244f, -0.998475581f,
+    0.056726821f, -0.998389737f,
+    0.058258265f, -0.998301545f,
+    0.059789571f, -0.998211003f,
+    0.061320736f, -0.998118113f,
+    0.062851758f, -0.998022874f,
+    0.064382631f, -0.997925286f,
+    0.065913353f, -0.997825350f,
+    0.067443920f, -0.997723067f,
+    0.068974328f, -0.997618435f,
+    0.070504573f, -0.997511456f,
+    0.072034653f, -0.997402130f,
+    0.073564564f, -0.997290457f,
+    0.075094301f, -0.997176437f,
+    0.076623861f, -0.997060070f,
+    0.078153242f, -0.996941358f,
+    0.079682438f, -0.996820299f,
+    0.081211447f, -0.996696895f,
+    0.082740265f, -0.996571146f,
+    0.084268888f, -0.996443051f,
+    0.085797312f, -0.996312612f,
+    0.087325535f, -0.996179829f,
+    0.088853553f, -0.996044701f,
+    0.090381361f, -0.995907229f,
+    0.091908956f, -0.995767414f,
+    0.093436336f, -0.995625256f,
+    0.094963495f, -0.995480755f,
+    0.096490431f, -0.995333912f,
+    0.098017140f, -0.995184727f,
+    0.099543619f, -0.995033199f,
+    0.101069863f, -0.994879331f,
+    0.102595869f, -0.994723121f,
+    0.104121634f, -0.994564571f,
+    0.105647154f, -0.994403680f,
+    0.107172425f, -0.994240449f,
+    0.108697444f, -0.994074879f,
+    0.110222207f, -0.993906970f,
+    0.111746711f, -0.993736722f,
+    0.113270952f, -0.993564136f,
+    0.114794927f, -0.993389211f,
+    0.116318631f, -0.993211949f,
+    0.117842062f, -0.993032350f,
+    0.119365215f, -0.992850414f,
+    0.120888087f, -0.992666142f,
+    0.122410675f, -0.992479535f,
+    0.123932975f, -0.992290591f,
+    0.125454983f, -0.992099313f,
+    0.126976696f, -0.991905700f,
+    0.128498111f, -0.991709754f,
+    0.130019223f, -0.991511473f,
+    0.131540029f, -0.991310860f,
+    0.133060525f, -0.991107914f,
+    0.134580709f, -0.990902635f,
+    0.136100575f, -0.990695025f,
+    0.137620122f, -0.990485084f,
+    0.139139344f, -0.990272812f,
+    0.140658239f, -0.990058210f,
+    0.142176804f, -0.989841278f,
+    0.143695033f, -0.989622017f,
+    0.145212925f, -0.989400428f,
+    0.146730474f, -0.989176510f,
+    0.148247679f, -0.988950265f,
+    0.149764535f, -0.988721692f,
+    0.151281038f, -0.988490793f,
+    0.152797185f, -0.988257568f,
+    0.154312973f, -0.988022017f,
+    0.155828398f, -0.987784142f,
+    0.157343456f, -0.987543942f,
+    0.158858143f, -0.987301418f,
+    0.160372457f, -0.987056571f,
+    0.161886394f, -0.986809402f,
+    0.163399949f, -0.986559910f,
+    0.164913120f, -0.986308097f,
+    0.166425904f, -0.986053963f,
+    0.167938295f, -0.985797509f,
+    0.169450291f, -0.985538735f,
+    0.170961889f, -0.985277642f,
+    0.172473084f, -0.985014231f,
+    0.173983873f, -0.984748502f,
+    0.175494253f, -0.984480455f,
+    0.177004220f, -0.984210092f,
+    0.178513771f, -0.983937413f,
+    0.180022901f, -0.983662419f,
+    0.181531608f, -0.983385110f,
+    0.183039888f, -0.983105487f,
+    0.184547737f, -0.982823551f,
+    0.186055152f, -0.982539302f,
+    0.187562129f, -0.982252741f,
+    0.189068664f, -0.981963869f,
+    0.190574755f, -0.981672686f,
+    0.192080397f, -0.981379193f,
+    0.193585587f, -0.981083391f,
+    0.195090322f, -0.980785280f,
+    0.196594598f, -0.980484862f,
+    0.198098411f, -0.980182136f,
+    0.199601758f, -0.979877104f,
+    0.201104635f, -0.979569766f,
+    0.202607039f, -0.979260123f,
+    0.204108966f, -0.978948175f,
+    0.205610413f, -0.978633924f,
+    0.207111376f, -0.978317371f,
+    0.208611852f, -0.977998515f,
+    0.210111837f, -0.977677358f,
+    0.211611327f, -0.977353900f,
+    0.213110320f, -0.977028143f,
+    0.214608811f, -0.976700086f,
+    0.216106797f, -0.976369731f,
+    0.217604275f, -0.976037079f,
+    0.219101240f, -0.975702130f,
+    0.220597690f, -0.975364885f,
+    0.222093621f, -0.975025345f,
+    0.223589029f, -0.974683511f,
+    0.225083911f, -0.974339383f,
+    0.226578264f, -0.973992962f,
+    0.228072083f, -0.973644250f,
+    0.229565366f, -0.973293246f,
+    0.231058108f, -0.972939952f,
+    0.232550307f, -0.972584369f,
+    0.234041959f, -0.972226497f,
+    0.235533059f, -0.971866337f,
+    0.237023606f, -0.971503891f,
+    0.238513595f, -0.971139158f,
+    0.240003022f, -0.970772141f,
+    0.241491885f, -0.970402839f,
+    0.242980180f, -0.970031253f,
+    0.244467903f, -0.969657385f,
+    0.245955050f, -0.969281235f,
+    0.247441619f, -0.968902805f,
+    0.248927606f, -0.968522094f,
+    0.250413007f, -0.968139105f,
+    0.251897818f, -0.967753837f,
+    0.253382037f, -0.967366292f,
+    0.254865660f, -0.966976471f,
+    0.256348682f, -0.966584374f,
+    0.257831102f, -0.966190003f,
+    0.259312915f, -0.965793359f,
+    0.260794118f, -0.965394442f,
+    0.262274707f, -0.964993253f,
+    0.263754679f, -0.964589793f,
+    0.265234030f, -0.964184064f,
+    0.266712757f, -0.963776066f,
+    0.268190857f, -0.963365800f,
+    0.269668326f, -0.962953267f,
+    0.271145160f, -0.962538468f,
+    0.272621355f, -0.962121404f,
+    0.274096910f, -0.961702077f,
+    0.275571819f, -0.961280486f,
+    0.277046080f, -0.960856633f,
+    0.278519689f, -0.960430519f,
+    0.279992643f, -0.960002146f,
+    0.281464938f, -0.959571513f,
+    0.282936570f, -0.959138622f,
+    0.284407537f, -0.958703475f,
+    0.285877835f, -0.958266071f,
+    0.287347460f, -0.957826413f,
+    0.288816408f, -0.957384501f,
+    0.290284677f, -0.956940336f,
+    0.291752263f, -0.956493919f,
+    0.293219163f, -0.956045251f,
+    0.294685372f, -0.955594334f,
+    0.296150888f, -0.955141168f,
+    0.297615707f, -0.954685755f,
+    0.299079826f, -0.954228095f,
+    0.300543241f, -0.953768190f,
+    0.302005949f, -0.953306040f,
+    0.303467947f, -0.952841648f,
+    0.304929230f, -0.952375013f,
+    0.306389795f, -0.951906137f,
+    0.307849640f, -0.951435021f,
+    0.309308760f, -0.950961666f,
+    0.310767153f, -0.950486074f,
+    0.312224814f, -0.950008245f,
+    0.313681740f, -0.949528181f,
+    0.315137929f, -0.949045882f,
+    0.316593376f, -0.948561350f,
+    0.318048077f, -0.948074586f,
+    0.319502031f, -0.947585591f,
+    0.320955232f, -0.947094366f,
+    0.322407679f, -0.946600913f,
+    0.323859367f, -0.946105232f,
+    0.325310292f, -0.945607325f,
+    0.326760452f, -0.945107193f,
+    0.328209844f, -0.944604837f,
+    0.329658463f, -0.944100258f,
+    0.331106306f, -0.943593458f,
+    0.332553370f, -0.943084437f,
+    0.333999651f, -0.942573198f,
+    0.335445147f, -0.942059740f,
+    0.336889853f, -0.941544065f,
+    0.338333767f, -0.941026175f,
+    0.339776884f, -0.940506071f,
+    0.341219202f, -0.939983753f,
+    0.342660717f, -0.939459224f,
+    0.344101426f, -0.938932484f,
+    0.345541325f, -0.938403534f,
+    0.346980411f, -0.937872376f,
+    0.348418680f, -0.937339012f,
+    0.349856130f, -0.936803442f,
+    0.351292756f, -0.936265667f,
+    0.352728556f, -0.935725689f,
+    0.354163525f, -0.935183510f,
+    0.355597662f, -0.934639130f,
+    0.357030961f, -0.934092550f,
+    0.358463421f, -0.933543773f,
+    0.359895037f, -0.932992799f,
+    0.361325806f, -0.932439629f,
+    0.362755724f, -0.931884266f,
+    0.364184790f, -0.931326709f,
+    0.365612998f, -0.930766961f,
+    0.367040346f, -0.930205023f,
+    0.368466830f, -0.929640896f,
+    0.369892447f, -0.929074581f,
+    0.371317194f, -0.928506080f,
+    0.372741067f, -0.927935395f,
+    0.374164063f, -0.927362526f,
+    0.375586178f, -0.926787474f,
+    0.377007410f, -0.926210242f,
+    0.378427755f, -0.925630831f,
+    0.379847209f, -0.925049241f,
+    0.381265769f, -0.924465474f,
+    0.382683432f, -0.923879533f,
+    0.384100195f, -0.923291417f,
+    0.385516054f, -0.922701128f,
+    0.386931006f, -0.922108669f,
+    0.388345047f, -0.921514039f,
+    0.389758174f, -0.920917242f,
+    0.391170384f, -0.920318277f,
+    0.392581674f, -0.919717146f,
+    0.393992040f, -0.919113852f,
+    0.395401479f, -0.918508394f,
+    0.396809987f, -0.917900776f,
+    0.398217562f, -0.917290997f,
+    0.399624200f, -0.916679060f,
+    0.401029897f, -0.916064966f,
+    0.402434651f, -0.915448716f,
+    0.403838458f, -0.914830312f,
+    0.405241314f, -0.914209756f,
+    0.406643217f, -0.913587048f,
+    0.408044163f, -0.912962190f,
+    0.409444149f, -0.912335185f,
+    0.410843171f, -0.911706032f,
+    0.412241227f, -0.911074734f,
+    0.413638312f, -0.910441292f,
+    0.415034424f, -0.909805708f,
+    0.416429560f, -0.909167983f,
+    0.417823716f, -0.908528119f,
+    0.419216888f, -0.907886116f,
+    0.420609074f, -0.907241978f,
+    0.422000271f, -0.906595705f,
+    0.423390474f, -0.905947298f,
+    0.424779681f, -0.905296759f,
+    0.426167889f, -0.904644091f,
+    0.427555093f, -0.903989293f,
+    0.428941292f, -0.903332368f,
+    0.430326481f, -0.902673318f,
+    0.431710658f, -0.902012144f,
+    0.433093819f, -0.901348847f,
+    0.434475961f, -0.900683429f,
+    0.435857080f, -0.900015892f,
+    0.437237174f, -0.899346237f,
+    0.438616239f, -0.898674466f,
+    0.439994271f, -0.898000580f,
+    0.441371269f, -0.897324581f,
+    0.442747228f, -0.896646470f,
+    0.444122145f, -0.895966250f,
+    0.445496017f, -0.895283921f,
+    0.446868840f, -0.894599486f,
+    0.448240612f, -0.893912945f,
+    0.449611330f, -0.893224301f,
+    0.450980989f, -0.892533555f,
+    0.452349587f, -0.891840709f,
+    0.453717121f, -0.891145765f,
+    0.455083587f, -0.890448723f,
+    0.456448982f, -0.889749586f,
+    0.457813304f, -0.889048356f,
+    0.459176548f, -0.888345033f,
+    0.460538711f, -0.887639620f,
+    0.461899791f, -0.886932119f,
+    0.463259784f, -0.886222530f,
+    0.464618686f, -0.885510856f,
+    0.465976496f, -0.884797098f,
+    0.467333209f, -0.884081259f,
+    0.468688822f, -0.883363339f,
+    0.470043332f, -0.882643340f,
+    0.471396737f, -0.881921264f,
+    0.472749032f, -0.881197113f,
+    0.474100215f, -0.880470889f,
+    0.475450282f, -0.879742593f,
+    0.476799230f, -0.879012226f,
+    0.478147056f, -0.878279792f,
+    0.479493758f, -0.877545290f,
+    0.480839331f, -0.876808724f,
+    0.482183772f, -0.876070094f,
+    0.483527079f, -0.875329403f,
+    0.484869248f, -0.874586652f,
+    0.486210276f, -0.873841843f,
+    0.487550160f, -0.873094978f,
+    0.488888897f, -0.872346059f,
+    0.490226483f, -0.871595087f,
+    0.491562916f, -0.870842063f,
+    0.492898192f, -0.870086991f,
+    0.494232309f, -0.869329871f,
+    0.495565262f, -0.868570706f,
+    0.496897049f, -0.867809497f,
+    0.498227667f, -0.867046246f,
+    0.499557113f, -0.866280954f,
+    0.500885383f, -0.865513624f,
+    0.502212474f, -0.864744258f,
+    0.503538384f, -0.863972856f,
+    0.504863109f, -0.863199422f,
+    0.506186645f, -0.862423956f,
+    0.507508991f, -0.861646461f,
+    0.508830143f, -0.860866939f,
+    0.510150097f, -0.860085390f,
+    0.511468850f, -0.859301818f,
+    0.512786401f, -0.858516224f,
+    0.514102744f, -0.857728610f,
+    0.515417878f, -0.856938977f,
+    0.516731799f, -0.856147328f,
+    0.518044504f, -0.855353665f,
+    0.519355990f, -0.854557988f,
+    0.520666254f, -0.853760301f,
+    0.521975293f, -0.852960605f,
+    0.523283103f, -0.852158902f,
+    0.524589683f, -0.851355193f,
+    0.525895027f, -0.850549481f,
+    0.527199135f, -0.849741768f,
+    0.528502002f, -0.848932055f,
+    0.529803625f, -0.848120345f,
+    0.531104001f, -0.847306639f,
+    0.532403128f, -0.846490939f,
+    0.533701002f, -0.845673247f,
+    0.534997620f, -0.844853565f,
+    0.536292979f, -0.844031895f,
+    0.537587076f, -0.843208240f,
+    0.538879909f, -0.842382600f,
+    0.540171473f, -0.841554977f,
+    0.541461766f, -0.840725375f,
+    0.542750785f, -0.839893794f,
+    0.544038527f, -0.839060237f,
+    0.545324988f, -0.838224706f,
+    0.546610167f, -0.837387202f,
+    0.547894059f, -0.836547727f,
+    0.549176662f, -0.835706284f,
+    0.550457973f, -0.834862875f,
+    0.551737988f, -0.834017501f,
+    0.553016706f, -0.833170165f,
+    0.554294121f, -0.832320868f,
+    0.555570233f, -0.831469612f,
+    0.556845037f, -0.830616400f,
+    0.558118531f, -0.829761234f,
+    0.559390712f, -0.828904115f,
+    0.560661576f, -0.828045045f,
+    0.561931121f, -0.827184027f,
+    0.563199344f, -0.826321063f,
+    0.564466242f, -0.825456154f,
+    0.565731811f, -0.824589303f,
+    0.566996049f, -0.823720511f,
+    0.568258953f, -0.822849781f,
+    0.569520519f, -0.821977115f,
+    0.570780746f, -0.821102515f,
+    0.572039629f, -0.820225983f,
+    0.573297167f, -0.819347520f,
+    0.574553355f, -0.818467130f,
+    0.575808191f, -0.817584813f,
+    0.577061673f, -0.816700573f,
+    0.578313796f, -0.815814411f,
+    0.579564559f, -0.814926329f,
+    0.580813958f, -0.814036330f,
+    0.582061990f, -0.813144415f,
+    0.583308653f, -0.812250587f,
+    0.584553943f, -0.811354847f,
+    0.585797857f, -0.810457198f,
+    0.587040394f, -0.809557642f,
+    0.588281548f, -0.808656182f,
+    0.589521319f, -0.807752818f,
+    0.590759702f, -0.806847554f,
+    0.591996695f, -0.805940391f,
+    0.593232295f, -0.805031331f,
+    0.594466499f, -0.804120377f,
+    0.595699304f, -0.803207531f,
+    0.596930708f, -0.802292796f,
+    0.598160707f, -0.801376172f,
+    0.599389298f, -0.800457662f,
+    0.600616479f, -0.799537269f,
+    0.601842247f, -0.798614995f,
+    0.603066599f, -0.797690841f,
+    0.604289531f, -0.796764810f,
+    0.605511041f, -0.795836905f,
+    0.606731127f, -0.794907126f,
+    0.607949785f, -0.793975478f,
+    0.609167012f, -0.793041960f,
+    0.610382806f, -0.792106577f,
+    0.611597164f, -0.791169330f,
+    0.612810082f, -0.790230221f,
+    0.614021559f, -0.789289253f,
+    0.615231591f, -0.788346428f,
+    0.616440175f, -0.787401747f,
+    0.617647308f, -0.786455214f,
+    0.618852988f, -0.785506830f,
+    0.620057212f, -0.784556597f,
+    0.621259977f, -0.783604519f,
+    0.622461279f, -0.782650596f,
+    0.623661118f, -0.781694832f,
+    0.624859488f, -0.780737229f,
+    0.626056388f, -0.779777788f,
+    0.627251815f, -0.778816512f,
+    0.628445767f, -0.777853404f,
+    0.629638239f, -0.776888466f,
+    0.630829230f, -0.775921699f,
+    0.632018736f, -0.774953107f,
+    0.633206755f, -0.773982691f,
+    0.634393284f, -0.773010453f,
+    0.635578320f, -0.772036397f,
+    0.636761861f, -0.771060524f,
+    0.637943904f, -0.770082837f,
+    0.639124445f, -0.769103338f,
+    0.640303482f, -0.768122029f,
+    0.641481013f, -0.767138912f,
+    0.642657034f, -0.766153990f,
+    0.643831543f, -0.765167266f,
+    0.645004537f, -0.764178741f,
+    0.646176013f, -0.763188417f,
+    0.647345969f, -0.762196298f,
+    0.648514401f, -0.761202385f,
+    0.649681307f, -0.760206682f,
+    0.650846685f, -0.759209189f,
+    0.652010531f, -0.758209910f,
+    0.653172843f, -0.757208847f,
+    0.654333618f, -0.756206001f,
+    0.655492853f, -0.755201377f,
+    0.656650546f, -0.754194975f,
+    0.657806693f, -0.753186799f,
+    0.658961293f, -0.752176850f,
+    0.660114342f, -0.751165132f,
+    0.661265838f, -0.750151646f,
+    0.662415778f, -0.749136395f,
+    0.663564159f, -0.748119380f,
+    0.664710978f, -0.747100606f,
+    0.665856234f, -0.746080074f,
+    0.666999922f, -0.745057785f,
+    0.668142041f, -0.744033744f,
+    0.669282588f, -0.743007952f,
+    0.670421560f, -0.741980412f,
+    0.671558955f, -0.740951125f,
+    0.672694769f, -0.739920095f,
+    0.673829000f, -0.738887324f,
+    0.674961646f, -0.737852815f,
+    0.676092704f, -0.736816569f,
+    0.677222170f, -0.735778589f,
+    0.678350043f, -0.734738878f,
+    0.679476320f, -0.733697438f,
+    0.680600998f, -0.732654272f,
+    0.681724074f, -0.731609381f,
+    0.682845546f, -0.730562769f,
+    0.683965412f, -0.729514438f,
+    0.685083668f, -0.728464390f,
+    0.686200312f, -0.727412629f,
+    0.687315341f, -0.726359155f,
+    0.688428753f, -0.725303972f,
+    0.689540545f, -0.724247083f,
+    0.690650714f, -0.723188489f,
+    0.691759258f, -0.722128194f,
+    0.692866175f, -0.721066199f,
+    0.693971461f, -0.720002508f,
+    0.695075114f, -0.718937122f,
+    0.696177131f, -0.717870045f,
+    0.697277511f, -0.716801279f,
+    0.698376249f, -0.715730825f,
+    0.699473345f, -0.714658688f,
+    0.700568794f, -0.713584869f,
+    0.701662595f, -0.712509371f,
+    0.702754744f, -0.711432196f,
+    0.703845241f, -0.710353347f,
+    0.704934080f, -0.709272826f,
+    0.706021261f, -0.708190637f,
+    0.707106781f, -0.707106781f,
+    0.708190637f, -0.706021261f,
+    0.709272826f, -0.704934080f,
+    0.710353347f, -0.703845241f,
+    0.711432196f, -0.702754744f,
+    0.712509371f, -0.701662595f,
+    0.713584869f, -0.700568794f,
+    0.714658688f, -0.699473345f,
+    0.715730825f, -0.698376249f,
+    0.716801279f, -0.697277511f,
+    0.717870045f, -0.696177131f,
+    0.718937122f, -0.695075114f,
+    0.720002508f, -0.693971461f,
+    0.721066199f, -0.692866175f,
+    0.722128194f, -0.691759258f,
+    0.723188489f, -0.690650714f,
+    0.724247083f, -0.689540545f,
+    0.725303972f, -0.688428753f,
+    0.726359155f, -0.687315341f,
+    0.727412629f, -0.686200312f,
+    0.728464390f, -0.685083668f,
+    0.729514438f, -0.683965412f,
+    0.730562769f, -0.682845546f,
+    0.731609381f, -0.681724074f,
+    0.732654272f, -0.680600998f,
+    0.733697438f, -0.679476320f,
+    0.734738878f, -0.678350043f,
+    0.735778589f, -0.677222170f,
+    0.736816569f, -0.676092704f,
+    0.737852815f, -0.674961646f,
+    0.738887324f, -0.673829000f,
+    0.739920095f, -0.672694769f,
+    0.740951125f, -0.671558955f,
+    0.741980412f, -0.670421560f,
+    0.743007952f, -0.669282588f,
+    0.744033744f, -0.668142041f,
+    0.745057785f, -0.666999922f,
+    0.746080074f, -0.665856234f,
+    0.747100606f, -0.664710978f,
+    0.748119380f, -0.663564159f,
+    0.749136395f, -0.662415778f,
+    0.750151646f, -0.661265838f,
+    0.751165132f, -0.660114342f,
+    0.752176850f, -0.658961293f,
+    0.753186799f, -0.657806693f,
+    0.754194975f, -0.656650546f,
+    0.755201377f, -0.655492853f,
+    0.756206001f, -0.654333618f,
+    0.757208847f, -0.653172843f,
+    0.758209910f, -0.652010531f,
+    0.759209189f, -0.650846685f,
+    0.760206682f, -0.649681307f,
+    0.761202385f, -0.648514401f,
+    0.762196298f, -0.647345969f,
+    0.763188417f, -0.646176013f,
+    0.764178741f, -0.645004537f,
+    0.765167266f, -0.643831543f,
+    0.766153990f, -0.642657034f,
+    0.767138912f, -0.641481013f,
+    0.768122029f, -0.640303482f,
+    0.769103338f, -0.639124445f,
+    0.770082837f, -0.637943904f,
+    0.771060524f, -0.636761861f,
+    0.772036397f, -0.635578320f,
+    0.773010453f, -0.634393284f,
+    0.773982691f, -0.633206755f,
+    0.774953107f, -0.632018736f,
+    0.775921699f, -0.630829230f,
+    0.776888466f, -0.629638239f,
+    0.777853404f, -0.628445767f,
+    0.778816512f, -0.627251815f,
+    0.779777788f, -0.626056388f,
+    0.780737229f, -0.624859488f,
+    0.781694832f, -0.623661118f,
+    0.782650596f, -0.622461279f,
+    0.783604519f, -0.621259977f,
+    0.784556597f, -0.620057212f,
+    0.785506830f, -0.618852988f,
+    0.786455214f, -0.617647308f,
+    0.787401747f, -0.616440175f,
+    0.788346428f, -0.615231591f,
+    0.789289253f, -0.614021559f,
+    0.790230221f, -0.612810082f,
+    0.791169330f, -0.611597164f,
+    0.792106577f, -0.610382806f,
+    0.793041960f, -0.609167012f,
+    0.793975478f, -0.607949785f,
+    0.794907126f, -0.606731127f,
+    0.795836905f, -0.605511041f,
+    0.796764810f, -0.604289531f,
+    0.797690841f, -0.603066599f,
+    0.798614995f, -0.601842247f,
+    0.799537269f, -0.600616479f,
+    0.800457662f, -0.599389298f,
+    0.801376172f, -0.598160707f,
+    0.802292796f, -0.596930708f,
+    0.803207531f, -0.595699304f,
+    0.804120377f, -0.594466499f,
+    0.805031331f, -0.593232295f,
+    0.805940391f, -0.591996695f,
+    0.806847554f, -0.590759702f,
+    0.807752818f, -0.589521319f,
+    0.808656182f, -0.588281548f,
+    0.809557642f, -0.587040394f,
+    0.810457198f, -0.585797857f,
+    0.811354847f, -0.584553943f,
+    0.812250587f, -0.583308653f,
+    0.813144415f, -0.582061990f,
+    0.814036330f, -0.580813958f,
+    0.814926329f, -0.579564559f,
+    0.815814411f, -0.578313796f,
+    0.816700573f, -0.577061673f,
+    0.817584813f, -0.575808191f,
+    0.818467130f, -0.574553355f,
+    0.819347520f, -0.573297167f,
+    0.820225983f, -0.572039629f,
+    0.821102515f, -0.570780746f,
+    0.821977115f, -0.569520519f,
+    0.822849781f, -0.568258953f,
+    0.823720511f, -0.566996049f,
+    0.824589303f, -0.565731811f,
+    0.825456154f, -0.564466242f,
+    0.826321063f, -0.563199344f,
+    0.827184027f, -0.561931121f,
+    0.828045045f, -0.560661576f,
+    0.828904115f, -0.559390712f,
+    0.829761234f, -0.558118531f,
+    0.830616400f, -0.556845037f,
+    0.831469612f, -0.555570233f,
+    0.832320868f, -0.554294121f,
+    0.833170165f, -0.553016706f,
+    0.834017501f, -0.551737988f,
+    0.834862875f, -0.550457973f,
+    0.835706284f, -0.549176662f,
+    0.836547727f, -0.547894059f,
+    0.837387202f, -0.546610167f,
+    0.838224706f, -0.545324988f,
+    0.839060237f, -0.544038527f,
+    0.839893794f, -0.542750785f,
+    0.840725375f, -0.541461766f,
+    0.841554977f, -0.540171473f,
+    0.842382600f, -0.538879909f,
+    0.843208240f, -0.537587076f,
+    0.844031895f, -0.536292979f,
+    0.844853565f, -0.534997620f,
+    0.845673247f, -0.533701002f,
+    0.846490939f, -0.532403128f,
+    0.847306639f, -0.531104001f,
+    0.848120345f, -0.529803625f,
+    0.848932055f, -0.528502002f,
+    0.849741768f, -0.527199135f,
+    0.850549481f, -0.525895027f,
+    0.851355193f, -0.524589683f,
+    0.852158902f, -0.523283103f,
+    0.852960605f, -0.521975293f,
+    0.853760301f, -0.520666254f,
+    0.854557988f, -0.519355990f,
+    0.855353665f, -0.518044504f,
+    0.856147328f, -0.516731799f,
+    0.856938977f, -0.515417878f,
+    0.857728610f, -0.514102744f,
+    0.858516224f, -0.512786401f,
+    0.859301818f, -0.511468850f,
+    0.860085390f, -0.510150097f,
+    0.860866939f, -0.508830143f,
+    0.861646461f, -0.507508991f,
+    0.862423956f, -0.506186645f,
+    0.863199422f, -0.504863109f,
+    0.863972856f, -0.503538384f,
+    0.864744258f, -0.502212474f,
+    0.865513624f, -0.500885383f,
+    0.866280954f, -0.499557113f,
+    0.867046246f, -0.498227667f,
+    0.867809497f, -0.496897049f,
+    0.868570706f, -0.495565262f,
+    0.869329871f, -0.494232309f,
+    0.870086991f, -0.492898192f,
+    0.870842063f, -0.491562916f,
+    0.871595087f, -0.490226483f,
+    0.872346059f, -0.488888897f,
+    0.873094978f, -0.487550160f,
+    0.873841843f, -0.486210276f,
+    0.874586652f, -0.484869248f,
+    0.875329403f, -0.483527079f,
+    0.876070094f, -0.482183772f,
+    0.876808724f, -0.480839331f,
+    0.877545290f, -0.479493758f,
+    0.878279792f, -0.478147056f,
+    0.879012226f, -0.476799230f,
+    0.879742593f, -0.475450282f,
+    0.880470889f, -0.474100215f,
+    0.881197113f, -0.472749032f,
+    0.881921264f, -0.471396737f,
+    0.882643340f, -0.470043332f,
+    0.883363339f, -0.468688822f,
+    0.884081259f, -0.467333209f,
+    0.884797098f, -0.465976496f,
+    0.885510856f, -0.464618686f,
+    0.886222530f, -0.463259784f,
+    0.886932119f, -0.461899791f,
+    0.887639620f, -0.460538711f,
+    0.888345033f, -0.459176548f,
+    0.889048356f, -0.457813304f,
+    0.889749586f, -0.456448982f,
+    0.890448723f, -0.455083587f,
+    0.891145765f, -0.453717121f,
+    0.891840709f, -0.452349587f,
+    0.892533555f, -0.450980989f,
+    0.893224301f, -0.449611330f,
+    0.893912945f, -0.448240612f,
+    0.894599486f, -0.446868840f,
+    0.895283921f, -0.445496017f,
+    0.895966250f, -0.444122145f,
+    0.896646470f, -0.442747228f,
+    0.897324581f, -0.441371269f,
+    0.898000580f, -0.439994271f,
+    0.898674466f, -0.438616239f,
+    0.899346237f, -0.437237174f,
+    0.900015892f, -0.435857080f,
+    0.900683429f, -0.434475961f,
+    0.901348847f, -0.433093819f,
+    0.902012144f, -0.431710658f,
+    0.902673318f, -0.430326481f,
+    0.903332368f, -0.428941292f,
+    0.903989293f, -0.427555093f,
+    0.904644091f, -0.426167889f,
+    0.905296759f, -0.424779681f,
+    0.905947298f, -0.423390474f,
+    0.906595705f, -0.422000271f,
+    0.907241978f, -0.420609074f,
+    0.907886116f, -0.419216888f,
+    0.908528119f, -0.417823716f,
+    0.909167983f, -0.416429560f,
+    0.909805708f, -0.415034424f,
+    0.910441292f, -0.413638312f,
+    0.911074734f, -0.412241227f,
+    0.911706032f, -0.410843171f,
+    0.912335185f, -0.409444149f,
+    0.912962190f, -0.408044163f,
+    0.913587048f, -0.406643217f,
+    0.914209756f, -0.405241314f,
+    0.914830312f, -0.403838458f,
+    0.915448716f, -0.402434651f,
+    0.916064966f, -0.401029897f,
+    0.916679060f, -0.399624200f,
+    0.917290997f, -0.398217562f,
+    0.917900776f, -0.396809987f,
+    0.918508394f, -0.395401479f,
+    0.919113852f, -0.393992040f,
+    0.919717146f, -0.392581674f,
+    0.920318277f, -0.391170384f,
+    0.920917242f, -0.389758174f,
+    0.921514039f, -0.388345047f,
+    0.922108669f, -0.386931006f,
+    0.922701128f, -0.385516054f,
+    0.923291417f, -0.384100195f,
+    0.923879533f, -0.382683432f,
+    0.924465474f, -0.381265769f,
+    0.925049241f, -0.379847209f,
+    0.925630831f, -0.378427755f,
+    0.926210242f, -0.377007410f,
+    0.926787474f, -0.375586178f,
+    0.927362526f, -0.374164063f,
+    0.927935395f, -0.372741067f,
+    0.928506080f, -0.371317194f,
+    0.929074581f, -0.369892447f,
+    0.929640896f, -0.368466830f,
+    0.930205023f, -0.367040346f,
+    0.930766961f, -0.365612998f,
+    0.931326709f, -0.364184790f,
+    0.931884266f, -0.362755724f,
+    0.932439629f, -0.361325806f,
+    0.932992799f, -0.359895037f,
+    0.933543773f, -0.358463421f,
+    0.934092550f, -0.357030961f,
+    0.934639130f, -0.355597662f,
+    0.935183510f, -0.354163525f,
+    0.935725689f, -0.352728556f,
+    0.936265667f, -0.351292756f,
+    0.936803442f, -0.349856130f,
+    0.937339012f, -0.348418680f,
+    0.937872376f, -0.346980411f,
+    0.938403534f, -0.345541325f,
+    0.938932484f, -0.344101426f,
+    0.939459224f, -0.342660717f,
+    0.939983753f, -0.341219202f,
+    0.940506071f, -0.339776884f,
+    0.941026175f, -0.338333767f,
+    0.941544065f, -0.336889853f,
+    0.942059740f, -0.335445147f,
+    0.942573198f, -0.333999651f,
+    0.943084437f, -0.332553370f,
+    0.943593458f, -0.331106306f,
+    0.944100258f, -0.329658463f,
+    0.944604837f, -0.328209844f,
+    0.945107193f, -0.326760452f,
+    0.945607325f, -0.325310292f,
+    0.946105232f, -0.323859367f,
+    0.946600913f, -0.322407679f,
+    0.947094366f, -0.320955232f,
+    0.947585591f, -0.319502031f,
+    0.948074586f, -0.318048077f,
+    0.948561350f, -0.316593376f,
+    0.949045882f, -0.315137929f,
+    0.949528181f, -0.313681740f,
+    0.950008245f, -0.312224814f,
+    0.950486074f, -0.310767153f,
+    0.950961666f, -0.309308760f,
+    0.951435021f, -0.307849640f,
+    0.951906137f, -0.306389795f,
+    0.952375013f, -0.304929230f,
+    0.952841648f, -0.303467947f,
+    0.953306040f, -0.302005949f,
+    0.953768190f, -0.300543241f,
+    0.954228095f, -0.299079826f,
+    0.954685755f, -0.297615707f,
+    0.955141168f, -0.296150888f,
+    0.955594334f, -0.294685372f,
+    0.956045251f, -0.293219163f,
+    0.956493919f, -0.291752263f,
+    0.956940336f, -0.290284677f,
+    0.957384501f, -0.288816408f,
+    0.957826413f, -0.287347460f,
+    0.958266071f, -0.285877835f,
+    0.958703475f, -0.284407537f,
+    0.959138622f, -0.282936570f,
+    0.959571513f, -0.281464938f,
+    0.960002146f, -0.279992643f,
+    0.960430519f, -0.278519689f,
+    0.960856633f, -0.277046080f,
+    0.961280486f, -0.275571819f,
+    0.961702077f, -0.274096910f,
+    0.962121404f, -0.272621355f,
+    0.962538468f, -0.271145160f,
+    0.962953267f, -0.269668326f,
+    0.963365800f, -0.268190857f,
+    0.963776066f, -0.266712757f,
+    0.964184064f, -0.265234030f,
+    0.964589793f, -0.263754679f,
+    0.964993253f, -0.262274707f,
+    0.965394442f, -0.260794118f,
+    0.965793359f, -0.259312915f,
+    0.966190003f, -0.257831102f,
+    0.966584374f, -0.256348682f,
+    0.966976471f, -0.254865660f,
+    0.967366292f, -0.253382037f,
+    0.967753837f, -0.251897818f,
+    0.968139105f, -0.250413007f,
+    0.968522094f, -0.248927606f,
+    0.968902805f, -0.247441619f,
+    0.969281235f, -0.245955050f,
+    0.969657385f, -0.244467903f,
+    0.970031253f, -0.242980180f,
+    0.970402839f, -0.241491885f,
+    0.970772141f, -0.240003022f,
+    0.971139158f, -0.238513595f,
+    0.971503891f, -0.237023606f,
+    0.971866337f, -0.235533059f,
+    0.972226497f, -0.234041959f,
+    0.972584369f, -0.232550307f,
+    0.972939952f, -0.231058108f,
+    0.973293246f, -0.229565366f,
+    0.973644250f, -0.228072083f,
+    0.973992962f, -0.226578264f,
+    0.974339383f, -0.225083911f,
+    0.974683511f, -0.223589029f,
+    0.975025345f, -0.222093621f,
+    0.975364885f, -0.220597690f,
+    0.975702130f, -0.219101240f,
+    0.976037079f, -0.217604275f,
+    0.976369731f, -0.216106797f,
+    0.976700086f, -0.214608811f,
+    0.977028143f, -0.213110320f,
+    0.977353900f, -0.211611327f,
+    0.977677358f, -0.210111837f,
+    0.977998515f, -0.208611852f,
+    0.978317371f, -0.207111376f,
+    0.978633924f, -0.205610413f,
+    0.978948175f, -0.204108966f,
+    0.979260123f, -0.202607039f,
+    0.979569766f, -0.201104635f,
+    0.979877104f, -0.199601758f,
+    0.980182136f, -0.198098411f,
+    0.980484862f, -0.196594598f,
+    0.980785280f, -0.195090322f,
+    0.981083391f, -0.193585587f,
+    0.981379193f, -0.192080397f,
+    0.981672686f, -0.190574755f,
+    0.981963869f, -0.189068664f,
+    0.982252741f, -0.187562129f,
+    0.982539302f, -0.186055152f,
+    0.982823551f, -0.184547737f,
+    0.983105487f, -0.183039888f,
+    0.983385110f, -0.181531608f,
+    0.983662419f, -0.180022901f,
+    0.983937413f, -0.178513771f,
+    0.984210092f, -0.177004220f,
+    0.984480455f, -0.175494253f,
+    0.984748502f, -0.173983873f,
+    0.985014231f, -0.172473084f,
+    0.985277642f, -0.170961889f,
+    0.985538735f, -0.169450291f,
+    0.985797509f, -0.167938295f,
+    0.986053963f, -0.166425904f,
+    0.986308097f, -0.164913120f,
+    0.986559910f, -0.163399949f,
+    0.986809402f, -0.161886394f,
+    0.987056571f, -0.160372457f,
+    0.987301418f, -0.158858143f,
+    0.987543942f, -0.157343456f,
+    0.987784142f, -0.155828398f,
+    0.988022017f, -0.154312973f,
+    0.988257568f, -0.152797185f,
+    0.988490793f, -0.151281038f,
+    0.988721692f, -0.149764535f,
+    0.988950265f, -0.148247679f,
+    0.989176510f, -0.146730474f,
+    0.989400428f, -0.145212925f,
+    0.989622017f, -0.143695033f,
+    0.989841278f, -0.142176804f,
+    0.990058210f, -0.140658239f,
+    0.990272812f, -0.139139344f,
+    0.990485084f, -0.137620122f,
+    0.990695025f, -0.136100575f,
+    0.990902635f, -0.134580709f,
+    0.991107914f, -0.133060525f,
+    0.991310860f, -0.131540029f,
+    0.991511473f, -0.130019223f,
+    0.991709754f, -0.128498111f,
+    0.991905700f, -0.126976696f,
+    0.992099313f, -0.125454983f,
+    0.992290591f, -0.123932975f,
+    0.992479535f, -0.122410675f,
+    0.992666142f, -0.120888087f,
+    0.992850414f, -0.119365215f,
+    0.993032350f, -0.117842062f,
+    0.993211949f, -0.116318631f,
+    0.993389211f, -0.114794927f,
+    0.993564136f, -0.113270952f,
+    0.993736722f, -0.111746711f,
+    0.993906970f, -0.110222207f,
+    0.994074879f, -0.108697444f,
+    0.994240449f, -0.107172425f,
+    0.994403680f, -0.105647154f,
+    0.994564571f, -0.104121634f,
+    0.994723121f, -0.102595869f,
+    0.994879331f, -0.101069863f,
+    0.995033199f, -0.099543619f,
+    0.995184727f, -0.098017140f,
+    0.995333912f, -0.096490431f,
+    0.995480755f, -0.094963495f,
+    0.995625256f, -0.093436336f,
+    0.995767414f, -0.091908956f,
+    0.995907229f, -0.090381361f,
+    0.996044701f, -0.088853553f,
+    0.996179829f, -0.087325535f,
+    0.996312612f, -0.085797312f,
+    0.996443051f, -0.084268888f,
+    0.996571146f, -0.082740265f,
+    0.996696895f, -0.081211447f,
+    0.996820299f, -0.079682438f,
+    0.996941358f, -0.078153242f,
+    0.997060070f, -0.076623861f,
+    0.997176437f, -0.075094301f,
+    0.997290457f, -0.073564564f,
+    0.997402130f, -0.072034653f,
+    0.997511456f, -0.070504573f,
+    0.997618435f, -0.068974328f,
+    0.997723067f, -0.067443920f,
+    0.997825350f, -0.065913353f,
+    0.997925286f, -0.064382631f,
+    0.998022874f, -0.062851758f,
+    0.998118113f, -0.061320736f,
+    0.998211003f, -0.059789571f,
+    0.998301545f, -0.058258265f,
+    0.998389737f, -0.056726821f,
+    0.998475581f, -0.055195244f,
+    0.998559074f, -0.053663538f,
+    0.998640218f, -0.052131705f,
+    0.998719012f, -0.050599749f,
+    0.998795456f, -0.049067674f,
+    0.998869550f, -0.047535484f,
+    0.998941293f, -0.046003182f,
+    0.999010686f, -0.044470772f,
+    0.999077728f, -0.042938257f,
+    0.999142419f, -0.041405641f,
+    0.999204759f, -0.039872928f,
+    0.999264747f, -0.038340120f,
+    0.999322385f, -0.036807223f,
+    0.999377670f, -0.035274239f,
+    0.999430605f, -0.033741172f,
+    0.999481187f, -0.032208025f,
+    0.999529418f, -0.030674803f,
+    0.999575296f, -0.029141509f,
+    0.999618822f, -0.027608146f,
+    0.999659997f, -0.026074718f,
+    0.999698819f, -0.024541229f,
+    0.999735288f, -0.023007681f,
+    0.999769405f, -0.021474080f,
+    0.999801170f, -0.019940429f,
+    0.999830582f, -0.018406730f,
+    0.999857641f, -0.016872988f,
+    0.999882347f, -0.015339206f,
+    0.999904701f, -0.013805389f,
+    0.999924702f, -0.012271538f,
+    0.999942350f, -0.010737659f,
+    0.999957645f, -0.009203755f,
+    0.999970586f, -0.007669829f,
+    0.999981175f, -0.006135885f,
+    0.999989411f, -0.004601926f,
+    0.999995294f, -0.003067957f,
+    0.999998823f, -0.001533980f
+};
+
+/*
+* @brief  Q31 twiddle factor tables
+*/
+
+
+/**    
+* \par   
+* Example code for Q31 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefQ31[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefQ31[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 16 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved (cos first, then sin).
+* \par
+* Convert floating point to Q31 (fixed point 1.31), saturating to the q31_t range:
+*	round(twiddleCoefQ31(i) * pow(2, 31))
+*    
+*/
+const q31_t twiddleCoef_16_q31[24] = {
+    0x7FFFFFFF, 0x00000000,
+    0x7641AF3C, 0x30FBC54D,
+    0x5A82799A, 0x5A82799A,
+    0x30FBC54D, 0x7641AF3C,
+    0x00000000, 0x7FFFFFFF,
+    0xCF043AB2, 0x7641AF3C,
+    0xA57D8666, 0x5A82799A,
+    0x89BE50C3, 0x30FBC54D,
+    0x80000000, 0x00000000,
+    0x89BE50C3, 0xCF043AB2,
+    0xA57D8666, 0xA57D8666,
+    0xCF043AB2, 0x89BE50C3
+};
+
+/**    
+* \par   
+* Example code for Q31 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefQ31[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefQ31[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 32 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved (cos first, then sin).
+* \par
+* Convert floating point to Q31 (fixed point 1.31), saturating to the q31_t range:
+*	round(twiddleCoefQ31(i) * pow(2, 31))
+*    
+*/
+const q31_t twiddleCoef_32_q31[48] = {
+    0x7FFFFFFF, 0x00000000,
+    0x7D8A5F3F, 0x18F8B83C,
+    0x7641AF3C, 0x30FBC54D,
+    0x6A6D98A4, 0x471CECE6,
+    0x5A82799A, 0x5A82799A,
+    0x471CECE6, 0x6A6D98A4,
+    0x30FBC54D, 0x7641AF3C,
+    0x18F8B83C, 0x7D8A5F3F,
+    0x00000000, 0x7FFFFFFF,
+    0xE70747C3, 0x7D8A5F3F,
+    0xCF043AB2, 0x7641AF3C,
+    0xB8E31319, 0x6A6D98A4,
+    0xA57D8666, 0x5A82799A,
+    0x9592675B, 0x471CECE6,
+    0x89BE50C3, 0x30FBC54D,
+    0x8275A0C0, 0x18F8B83C,
+    0x80000000, 0x00000000,
+    0x8275A0C0, 0xE70747C3,
+    0x89BE50C3, 0xCF043AB2,
+    0x9592675B, 0xB8E31319,
+    0xA57D8666, 0xA57D8666,
+    0xB8E31319, 0x9592675B,
+    0xCF043AB2, 0x89BE50C3,
+    0xE70747C3, 0x8275A0C0
+};
+
+/**    
+* \par   
+* Example code for Q31 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefQ31[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefQ31[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 64 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved (cos first, then sin).
+* \par
+* Convert floating point to Q31 (fixed point 1.31), saturating to the q31_t range:
+*	round(twiddleCoefQ31(i) * pow(2, 31))
+*    
+*/
+const q31_t twiddleCoef_64_q31[96] = {
+    0x7FFFFFFF, 0x00000000,
+    0x7F62368F, 0x0C8BD35E,
+    0x7D8A5F3F, 0x18F8B83C,
+    0x7A7D055B, 0x25280C5D,
+    0x7641AF3C, 0x30FBC54D,
+    0x70E2CBC6, 0x3C56BA70,
+    0x6A6D98A4, 0x471CECE6,
+    0x62F201AC, 0x5133CC94,
+    0x5A82799A, 0x5A82799A,
+    0x5133CC94, 0x62F201AC,
+    0x471CECE6, 0x6A6D98A4,
+    0x3C56BA70, 0x70E2CBC6,
+    0x30FBC54D, 0x7641AF3C,
+    0x25280C5D, 0x7A7D055B,
+    0x18F8B83C, 0x7D8A5F3F,
+    0x0C8BD35E, 0x7F62368F,
+    0x00000000, 0x7FFFFFFF,
+    0xF3742CA1, 0x7F62368F,
+    0xE70747C3, 0x7D8A5F3F,
+    0xDAD7F3A2, 0x7A7D055B,
+    0xCF043AB2, 0x7641AF3C,
+    0xC3A9458F, 0x70E2CBC6,
+    0xB8E31319, 0x6A6D98A4,
+    0xAECC336B, 0x62F201AC,
+    0xA57D8666, 0x5A82799A,
+    0x9D0DFE53, 0x5133CC94,
+    0x9592675B, 0x471CECE6,
+    0x8F1D343A, 0x3C56BA70,
+    0x89BE50C3, 0x30FBC54D,
+    0x8582FAA4, 0x25280C5D,
+    0x8275A0C0, 0x18F8B83C,
+    0x809DC970, 0x0C8BD35E,
+    0x80000000, 0x00000000,
+    0x809DC970, 0xF3742CA1,
+    0x8275A0C0, 0xE70747C3,
+    0x8582FAA4, 0xDAD7F3A2,
+    0x89BE50C3, 0xCF043AB2,
+    0x8F1D343A, 0xC3A9458F,
+    0x9592675B, 0xB8E31319,
+    0x9D0DFE53, 0xAECC336B,
+    0xA57D8666, 0xA57D8666,
+    0xAECC336B, 0x9D0DFE53,
+    0xB8E31319, 0x9592675B,
+    0xC3A9458F, 0x8F1D343A,
+    0xCF043AB2, 0x89BE50C3,
+    0xDAD7F3A2, 0x8582FAA4,
+    0xE70747C3, 0x8275A0C0,
+    0xF3742CA1, 0x809DC970
+};
+
+/**    
+* \par   
+* Example code for Q31 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefQ31[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefQ31[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 128 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved (cos first, then sin).
+* \par
+* Convert floating point to Q31 (fixed point 1.31), saturating to the q31_t range:
+*	round(twiddleCoefQ31(i) * pow(2, 31))
+*    
+*/
+const q31_t twiddleCoef_128_q31[192] = {
+    0x7FFFFFFF, 0x00000000,
+    0x7FD8878D, 0x0647D97C,
+    0x7F62368F, 0x0C8BD35E,
+    0x7E9D55FC, 0x12C8106E,
+    0x7D8A5F3F, 0x18F8B83C,
+    0x7C29FBEE, 0x1F19F97B,
+    0x7A7D055B, 0x25280C5D,
+    0x78848413, 0x2B1F34EB,
+    0x7641AF3C, 0x30FBC54D,
+    0x73B5EBD0, 0x36BA2013,
+    0x70E2CBC6, 0x3C56BA70,
+    0x6DCA0D14, 0x41CE1E64,
+    0x6A6D98A4, 0x471CECE6,
+    0x66CF811F, 0x4C3FDFF3,
+    0x62F201AC, 0x5133CC94,
+    0x5ED77C89, 0x55F5A4D2,
+    0x5A82799A, 0x5A82799A,
+    0x55F5A4D2, 0x5ED77C89,
+    0x5133CC94, 0x62F201AC,
+    0x4C3FDFF3, 0x66CF811F,
+    0x471CECE6, 0x6A6D98A4,
+    0x41CE1E64, 0x6DCA0D14,
+    0x3C56BA70, 0x70E2CBC6,
+    0x36BA2013, 0x73B5EBD0,
+    0x30FBC54D, 0x7641AF3C,
+    0x2B1F34EB, 0x78848413,
+    0x25280C5D, 0x7A7D055B,
+    0x1F19F97B, 0x7C29FBEE,
+    0x18F8B83C, 0x7D8A5F3F,
+    0x12C8106E, 0x7E9D55FC,
+    0x0C8BD35E, 0x7F62368F,
+    0x0647D97C, 0x7FD8878D,
+    0x00000000, 0x7FFFFFFF,
+    0xF9B82683, 0x7FD8878D,
+    0xF3742CA1, 0x7F62368F,
+    0xED37EF91, 0x7E9D55FC,
+    0xE70747C3, 0x7D8A5F3F,
+    0xE0E60684, 0x7C29FBEE,
+    0xDAD7F3A2, 0x7A7D055B,
+    0xD4E0CB14, 0x78848413,
+    0xCF043AB2, 0x7641AF3C,
+    0xC945DFEC, 0x73B5EBD0,
+    0xC3A9458F, 0x70E2CBC6,
+    0xBE31E19B, 0x6DCA0D14,
+    0xB8E31319, 0x6A6D98A4,
+    0xB3C0200C, 0x66CF811F,
+    0xAECC336B, 0x62F201AC,
+    0xAA0A5B2D, 0x5ED77C89,
+    0xA57D8666, 0x5A82799A,
+    0xA1288376, 0x55F5A4D2,
+    0x9D0DFE53, 0x5133CC94,
+    0x99307EE0, 0x4C3FDFF3,
+    0x9592675B, 0x471CECE6,
+    0x9235F2EB, 0x41CE1E64,
+    0x8F1D343A, 0x3C56BA70,
+    0x8C4A142F, 0x36BA2013,
+    0x89BE50C3, 0x30FBC54D,
+    0x877B7BEC, 0x2B1F34EB,
+    0x8582FAA4, 0x25280C5D,
+    0x83D60411, 0x1F19F97B,
+    0x8275A0C0, 0x18F8B83C,
+    0x8162AA03, 0x12C8106E,
+    0x809DC970, 0x0C8BD35E,
+    0x80277872, 0x0647D97C,
+    0x80000000, 0x00000000,
+    0x80277872, 0xF9B82683,
+    0x809DC970, 0xF3742CA1,
+    0x8162AA03, 0xED37EF91,
+    0x8275A0C0, 0xE70747C3,
+    0x83D60411, 0xE0E60684,
+    0x8582FAA4, 0xDAD7F3A2,
+    0x877B7BEC, 0xD4E0CB14,
+    0x89BE50C3, 0xCF043AB2,
+    0x8C4A142F, 0xC945DFEC,
+    0x8F1D343A, 0xC3A9458F,
+    0x9235F2EB, 0xBE31E19B,
+    0x9592675B, 0xB8E31319,
+    0x99307EE0, 0xB3C0200C,
+    0x9D0DFE53, 0xAECC336B,
+    0xA1288376, 0xAA0A5B2D,
+    0xA57D8666, 0xA57D8666,
+    0xAA0A5B2D, 0xA1288376,
+    0xAECC336B, 0x9D0DFE53,
+    0xB3C0200C, 0x99307EE0,
+    0xB8E31319, 0x9592675B,
+    0xBE31E19B, 0x9235F2EB,
+    0xC3A9458F, 0x8F1D343A,
+    0xC945DFEC, 0x8C4A142F,
+    0xCF043AB2, 0x89BE50C3,
+    0xD4E0CB14, 0x877B7BEC,
+    0xDAD7F3A2, 0x8582FAA4,
+    0xE0E60684, 0x83D60411,
+    0xE70747C3, 0x8275A0C0,
+    0xED37EF91, 0x8162AA03,
+    0xF3742CA1, 0x809DC970,
+    0xF9B82683, 0x80277872
+};
+
+/**    
+* \par   
+* Example code for Q31 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefQ31[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefQ31[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 256 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved (cos first, then sin).
+* \par
+* Convert floating point to Q31 (fixed point 1.31), saturating to the q31_t range:
+*	round(twiddleCoefQ31(i) * pow(2, 31))
+*    
+*/
+const q31_t twiddleCoef_256_q31[384] = {
+    0x7FFFFFFF, 0x00000000,
+    0x7FF62182, 0x03242ABF,
+    0x7FD8878D, 0x0647D97C,
+    0x7FA736B4, 0x096A9049,
+    0x7F62368F, 0x0C8BD35E,
+    0x7F0991C3, 0x0FAB272B,
+    0x7E9D55FC, 0x12C8106E,
+    0x7E1D93E9, 0x15E21444,
+    0x7D8A5F3F, 0x18F8B83C,
+    0x7CE3CEB1, 0x1C0B826A,
+    0x7C29FBEE, 0x1F19F97B,
+    0x7B5D039D, 0x2223A4C5,
+    0x7A7D055B, 0x25280C5D,
+    0x798A23B1, 0x2826B928,
+    0x78848413, 0x2B1F34EB,
+    0x776C4EDB, 0x2E110A62,
+    0x7641AF3C, 0x30FBC54D,
+    0x7504D345, 0x33DEF287,
+    0x73B5EBD0, 0x36BA2013,
+    0x72552C84, 0x398CDD32,
+    0x70E2CBC6, 0x3C56BA70,
+    0x6F5F02B1, 0x3F1749B7,
+    0x6DCA0D14, 0x41CE1E64,
+    0x6C242960, 0x447ACD50,
+    0x6A6D98A4, 0x471CECE6,
+    0x68A69E81, 0x49B41533,
+    0x66CF811F, 0x4C3FDFF3,
+    0x64E88926, 0x4EBFE8A4,
+    0x62F201AC, 0x5133CC94,
+    0x60EC3830, 0x539B2AEF,
+    0x5ED77C89, 0x55F5A4D2,
+    0x5CB420DF, 0x5842DD54,
+    0x5A82799A, 0x5A82799A,
+    0x5842DD54, 0x5CB420DF,
+    0x55F5A4D2, 0x5ED77C89,
+    0x539B2AEF, 0x60EC3830,
+    0x5133CC94, 0x62F201AC,
+    0x4EBFE8A4, 0x64E88926,
+    0x4C3FDFF3, 0x66CF811F,
+    0x49B41533, 0x68A69E81,
+    0x471CECE6, 0x6A6D98A4,
+    0x447ACD50, 0x6C242960,
+    0x41CE1E64, 0x6DCA0D14,
+    0x3F1749B7, 0x6F5F02B1,
+    0x3C56BA70, 0x70E2CBC6,
+    0x398CDD32, 0x72552C84,
+    0x36BA2013, 0x73B5EBD0,
+    0x33DEF287, 0x7504D345,
+    0x30FBC54D, 0x7641AF3C,
+    0x2E110A62, 0x776C4EDB,
+    0x2B1F34EB, 0x78848413,
+    0x2826B928, 0x798A23B1,
+    0x25280C5D, 0x7A7D055B,
+    0x2223A4C5, 0x7B5D039D,
+    0x1F19F97B, 0x7C29FBEE,
+    0x1C0B826A, 0x7CE3CEB1,
+    0x18F8B83C, 0x7D8A5F3F,
+    0x15E21444, 0x7E1D93E9,
+    0x12C8106E, 0x7E9D55FC,
+    0x0FAB272B, 0x7F0991C3,
+    0x0C8BD35E, 0x7F62368F,
+    0x096A9049, 0x7FA736B4,
+    0x0647D97C, 0x7FD8878D,
+    0x03242ABF, 0x7FF62182,
+    0x00000000, 0x7FFFFFFF,
+    0xFCDBD541, 0x7FF62182,
+    0xF9B82683, 0x7FD8878D,
+    0xF6956FB6, 0x7FA736B4,
+    0xF3742CA1, 0x7F62368F,
+    0xF054D8D4, 0x7F0991C3,
+    0xED37EF91, 0x7E9D55FC,
+    0xEA1DEBBB, 0x7E1D93E9,
+    0xE70747C3, 0x7D8A5F3F,
+    0xE3F47D95, 0x7CE3CEB1,
+    0xE0E60684, 0x7C29FBEE,
+    0xDDDC5B3A, 0x7B5D039D,
+    0xDAD7F3A2, 0x7A7D055B,
+    0xD7D946D7, 0x798A23B1,
+    0xD4E0CB14, 0x78848413,
+    0xD1EEF59E, 0x776C4EDB,
+    0xCF043AB2, 0x7641AF3C,
+    0xCC210D78, 0x7504D345,
+    0xC945DFEC, 0x73B5EBD0,
+    0xC67322CD, 0x72552C84,
+    0xC3A9458F, 0x70E2CBC6,
+    0xC0E8B648, 0x6F5F02B1,
+    0xBE31E19B, 0x6DCA0D14,
+    0xBB8532AF, 0x6C242960,
+    0xB8E31319, 0x6A6D98A4,
+    0xB64BEACC, 0x68A69E81,
+    0xB3C0200C, 0x66CF811F,
+    0xB140175B, 0x64E88926,
+    0xAECC336B, 0x62F201AC,
+    0xAC64D510, 0x60EC3830,
+    0xAA0A5B2D, 0x5ED77C89,
+    0xA7BD22AB, 0x5CB420DF,
+    0xA57D8666, 0x5A82799A,
+    0xA34BDF20, 0x5842DD54,
+    0xA1288376, 0x55F5A4D2,
+    0x9F13C7D0, 0x539B2AEF,
+    0x9D0DFE53, 0x5133CC94,
+    0x9B1776D9, 0x4EBFE8A4,
+    0x99307EE0, 0x4C3FDFF3,
+    0x9759617E, 0x49B41533,
+    0x9592675B, 0x471CECE6,
+    0x93DBD69F, 0x447ACD50,
+    0x9235F2EB, 0x41CE1E64,
+    0x90A0FD4E, 0x3F1749B7,
+    0x8F1D343A, 0x3C56BA70,
+    0x8DAAD37B, 0x398CDD32,
+    0x8C4A142F, 0x36BA2013,
+    0x8AFB2CBA, 0x33DEF287,
+    0x89BE50C3, 0x30FBC54D,
+    0x8893B124, 0x2E110A62,
+    0x877B7BEC, 0x2B1F34EB,
+    0x8675DC4E, 0x2826B928,
+    0x8582FAA4, 0x25280C5D,
+    0x84A2FC62, 0x2223A4C5,
+    0x83D60411, 0x1F19F97B,
+    0x831C314E, 0x1C0B826A,
+    0x8275A0C0, 0x18F8B83C,
+    0x81E26C16, 0x15E21444,
+    0x8162AA03, 0x12C8106E,
+    0x80F66E3C, 0x0FAB272B,
+    0x809DC970, 0x0C8BD35E,
+    0x8058C94C, 0x096A9049,
+    0x80277872, 0x0647D97C,
+    0x8009DE7D, 0x03242ABF,
+    0x80000000, 0x00000000,
+    0x8009DE7D, 0xFCDBD541,
+    0x80277872, 0xF9B82683,
+    0x8058C94C, 0xF6956FB6,
+    0x809DC970, 0xF3742CA1,
+    0x80F66E3C, 0xF054D8D4,
+    0x8162AA03, 0xED37EF91,
+    0x81E26C16, 0xEA1DEBBB,
+    0x8275A0C0, 0xE70747C3,
+    0x831C314E, 0xE3F47D95,
+    0x83D60411, 0xE0E60684,
+    0x84A2FC62, 0xDDDC5B3A,
+    0x8582FAA4, 0xDAD7F3A2,
+    0x8675DC4E, 0xD7D946D7,
+    0x877B7BEC, 0xD4E0CB14,
+    0x8893B124, 0xD1EEF59E,
+    0x89BE50C3, 0xCF043AB2,
+    0x8AFB2CBA, 0xCC210D78,
+    0x8C4A142F, 0xC945DFEC,
+    0x8DAAD37B, 0xC67322CD,
+    0x8F1D343A, 0xC3A9458F,
+    0x90A0FD4E, 0xC0E8B648,
+    0x9235F2EB, 0xBE31E19B,
+    0x93DBD69F, 0xBB8532AF,
+    0x9592675B, 0xB8E31319,
+    0x9759617E, 0xB64BEACC,
+    0x99307EE0, 0xB3C0200C,
+    0x9B1776D9, 0xB140175B,
+    0x9D0DFE53, 0xAECC336B,
+    0x9F13C7D0, 0xAC64D510,
+    0xA1288376, 0xAA0A5B2D,
+    0xA34BDF20, 0xA7BD22AB,
+    0xA57D8666, 0xA57D8666,
+    0xA7BD22AB, 0xA34BDF20,
+    0xAA0A5B2D, 0xA1288376,
+    0xAC64D510, 0x9F13C7D0,
+    0xAECC336B, 0x9D0DFE53,
+    0xB140175B, 0x9B1776D9,
+    0xB3C0200C, 0x99307EE0,
+    0xB64BEACC, 0x9759617E,
+    0xB8E31319, 0x9592675B,
+    0xBB8532AF, 0x93DBD69F,
+    0xBE31E19B, 0x9235F2EB,
+    0xC0E8B648, 0x90A0FD4E,
+    0xC3A9458F, 0x8F1D343A,
+    0xC67322CD, 0x8DAAD37B,
+    0xC945DFEC, 0x8C4A142F,
+    0xCC210D78, 0x8AFB2CBA,
+    0xCF043AB2, 0x89BE50C3,
+    0xD1EEF59E, 0x8893B124,
+    0xD4E0CB14, 0x877B7BEC,
+    0xD7D946D7, 0x8675DC4E,
+    0xDAD7F3A2, 0x8582FAA4,
+    0xDDDC5B3A, 0x84A2FC62,
+    0xE0E60684, 0x83D60411,
+    0xE3F47D95, 0x831C314E,
+    0xE70747C3, 0x8275A0C0,
+    0xEA1DEBBB, 0x81E26C16,
+    0xED37EF91, 0x8162AA03,
+    0xF054D8D4, 0x80F66E3C,
+    0xF3742CA1, 0x809DC970,
+    0xF6956FB6, 0x8058C94C,
+    0xF9B82683, 0x80277872,
+    0xFCDBD541, 0x8009DE7D
+};
+
+/**    
+* \par   
+* Example code for Q31 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefQ31[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefQ31[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 512 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved (cos first, then sin).
+* \par
+* Convert floating point to Q31 (fixed point 1.31), saturating to the q31_t range:
+*	round(twiddleCoefQ31(i) * pow(2, 31))
+*    
+*/
+const q31_t twiddleCoef_512_q31[768] = {
+    0x7FFFFFFF, 0x00000000,
+    0x7FFD885A, 0x01921D1F,
+    0x7FF62182, 0x03242ABF,
+    0x7FE9CBC0, 0x04B6195D,
+    0x7FD8878D, 0x0647D97C,
+    0x7FC25596, 0x07D95B9E,
+    0x7FA736B4, 0x096A9049,
+    0x7F872BF3, 0x0AFB6805,
+    0x7F62368F, 0x0C8BD35E,
+    0x7F3857F5, 0x0E1BC2E3,
+    0x7F0991C3, 0x0FAB272B,
+    0x7ED5E5C6, 0x1139F0CE,
+    0x7E9D55FC, 0x12C8106E,
+    0x7E5FE493, 0x145576B1,
+    0x7E1D93E9, 0x15E21444,
+    0x7DD6668E, 0x176DD9DE,
+    0x7D8A5F3F, 0x18F8B83C,
+    0x7D3980EC, 0x1A82A025,
+    0x7CE3CEB1, 0x1C0B826A,
+    0x7C894BDD, 0x1D934FE5,
+    0x7C29FBEE, 0x1F19F97B,
+    0x7BC5E28F, 0x209F701C,
+    0x7B5D039D, 0x2223A4C5,
+    0x7AEF6323, 0x23A6887E,
+    0x7A7D055B, 0x25280C5D,
+    0x7A05EEAD, 0x26A82185,
+    0x798A23B1, 0x2826B928,
+    0x7909A92C, 0x29A3C484,
+    0x78848413, 0x2B1F34EB,
+    0x77FAB988, 0x2C98FBBA,
+    0x776C4EDB, 0x2E110A62,
+    0x76D94988, 0x2F875262,
+    0x7641AF3C, 0x30FBC54D,
+    0x75A585CF, 0x326E54C7,
+    0x7504D345, 0x33DEF287,
+    0x745F9DD1, 0x354D9056,
+    0x73B5EBD0, 0x36BA2013,
+    0x7307C3D0, 0x382493B0,
+    0x72552C84, 0x398CDD32,
+    0x719E2CD2, 0x3AF2EEB7,
+    0x70E2CBC6, 0x3C56BA70,
+    0x70231099, 0x3DB832A5,
+    0x6F5F02B1, 0x3F1749B7,
+    0x6E96A99C, 0x4073F21D,
+    0x6DCA0D14, 0x41CE1E64,
+    0x6CF934FB, 0x4325C135,
+    0x6C242960, 0x447ACD50,
+    0x6B4AF278, 0x45CD358F,
+    0x6A6D98A4, 0x471CECE6,
+    0x698C246C, 0x4869E664,
+    0x68A69E81, 0x49B41533,
+    0x67BD0FBC, 0x4AFB6C97,
+    0x66CF811F, 0x4C3FDFF3,
+    0x65DDFBD3, 0x4D8162C4,
+    0x64E88926, 0x4EBFE8A4,
+    0x63EF328F, 0x4FFB654D,
+    0x62F201AC, 0x5133CC94,
+    0x61F1003E, 0x5269126E,
+    0x60EC3830, 0x539B2AEF,
+    0x5FE3B38D, 0x54CA0A4A,
+    0x5ED77C89, 0x55F5A4D2,
+    0x5DC79D7C, 0x571DEEF9,
+    0x5CB420DF, 0x5842DD54,
+    0x5B9D1153, 0x59646497,
+    0x5A82799A, 0x5A82799A,
+    0x59646497, 0x5B9D1153,
+    0x5842DD54, 0x5CB420DF,
+    0x571DEEF9, 0x5DC79D7C,
+    0x55F5A4D2, 0x5ED77C89,
+    0x54CA0A4A, 0x5FE3B38D,
+    0x539B2AEF, 0x60EC3830,
+    0x5269126E, 0x61F1003E,
+    0x5133CC94, 0x62F201AC,
+    0x4FFB654D, 0x63EF328F,
+    0x4EBFE8A4, 0x64E88926,
+    0x4D8162C4, 0x65DDFBD3,
+    0x4C3FDFF3, 0x66CF811F,
+    0x4AFB6C97, 0x67BD0FBC,
+    0x49B41533, 0x68A69E81,
+    0x4869E664, 0x698C246C,
+    0x471CECE6, 0x6A6D98A4,
+    0x45CD358F, 0x6B4AF278,
+    0x447ACD50, 0x6C242960,
+    0x4325C135, 0x6CF934FB,
+    0x41CE1E64, 0x6DCA0D14,
+    0x4073F21D, 0x6E96A99C,
+    0x3F1749B7, 0x6F5F02B1,
+    0x3DB832A5, 0x70231099,
+    0x3C56BA70, 0x70E2CBC6,
+    0x3AF2EEB7, 0x719E2CD2,
+    0x398CDD32, 0x72552C84,
+    0x382493B0, 0x7307C3D0,
+    0x36BA2013, 0x73B5EBD0,
+    0x354D9056, 0x745F9DD1,
+    0x33DEF287, 0x7504D345,
+    0x326E54C7, 0x75A585CF,
+    0x30FBC54D, 0x7641AF3C,
+    0x2F875262, 0x76D94988,
+    0x2E110A62, 0x776C4EDB,
+    0x2C98FBBA, 0x77FAB988,
+    0x2B1F34EB, 0x78848413,
+    0x29A3C484, 0x7909A92C,
+    0x2826B928, 0x798A23B1,
+    0x26A82185, 0x7A05EEAD,
+    0x25280C5D, 0x7A7D055B,
+    0x23A6887E, 0x7AEF6323,
+    0x2223A4C5, 0x7B5D039D,
+    0x209F701C, 0x7BC5E28F,
+    0x1F19F97B, 0x7C29FBEE,
+    0x1D934FE5, 0x7C894BDD,
+    0x1C0B826A, 0x7CE3CEB1,
+    0x1A82A025, 0x7D3980EC,
+    0x18F8B83C, 0x7D8A5F3F,
+    0x176DD9DE, 0x7DD6668E,
+    0x15E21444, 0x7E1D93E9,
+    0x145576B1, 0x7E5FE493,
+    0x12C8106E, 0x7E9D55FC,
+    0x1139F0CE, 0x7ED5E5C6,
+    0x0FAB272B, 0x7F0991C3,
+    0x0E1BC2E3, 0x7F3857F5,
+    0x0C8BD35E, 0x7F62368F,
+    0x0AFB6805, 0x7F872BF3,
+    0x096A9049, 0x7FA736B4,
+    0x07D95B9E, 0x7FC25596,
+    0x0647D97C, 0x7FD8878D,
+    0x04B6195D, 0x7FE9CBC0,
+    0x03242ABF, 0x7FF62182,
+    0x01921D1F, 0x7FFD885A,
+    0x00000000, 0x7FFFFFFF,
+    0xFE6DE2E0, 0x7FFD885A,
+    0xFCDBD541, 0x7FF62182,
+    0xFB49E6A2, 0x7FE9CBC0,
+    0xF9B82683, 0x7FD8878D,
+    0xF826A461, 0x7FC25596,
+    0xF6956FB6, 0x7FA736B4,
+    0xF50497FA, 0x7F872BF3,
+    0xF3742CA1, 0x7F62368F,
+    0xF1E43D1C, 0x7F3857F5,
+    0xF054D8D4, 0x7F0991C3,
+    0xEEC60F31, 0x7ED5E5C6,
+    0xED37EF91, 0x7E9D55FC,
+    0xEBAA894E, 0x7E5FE493,
+    0xEA1DEBBB, 0x7E1D93E9,
+    0xE8922621, 0x7DD6668E,
+    0xE70747C3, 0x7D8A5F3F,
+    0xE57D5FDA, 0x7D3980EC,
+    0xE3F47D95, 0x7CE3CEB1,
+    0xE26CB01A, 0x7C894BDD,
+    0xE0E60684, 0x7C29FBEE,
+    0xDF608FE3, 0x7BC5E28F,
+    0xDDDC5B3A, 0x7B5D039D,
+    0xDC597781, 0x7AEF6323,
+    0xDAD7F3A2, 0x7A7D055B,
+    0xD957DE7A, 0x7A05EEAD,
+    0xD7D946D7, 0x798A23B1,
+    0xD65C3B7B, 0x7909A92C,
+    0xD4E0CB14, 0x78848413,
+    0xD3670445, 0x77FAB988,
+    0xD1EEF59E, 0x776C4EDB,
+    0xD078AD9D, 0x76D94988,
+    0xCF043AB2, 0x7641AF3C,
+    0xCD91AB38, 0x75A585CF,
+    0xCC210D78, 0x7504D345,
+    0xCAB26FA9, 0x745F9DD1,
+    0xC945DFEC, 0x73B5EBD0,
+    0xC7DB6C50, 0x7307C3D0,
+    0xC67322CD, 0x72552C84,
+    0xC50D1148, 0x719E2CD2,
+    0xC3A9458F, 0x70E2CBC6,
+    0xC247CD5A, 0x70231099,
+    0xC0E8B648, 0x6F5F02B1,
+    0xBF8C0DE2, 0x6E96A99C,
+    0xBE31E19B, 0x6DCA0D14,
+    0xBCDA3ECA, 0x6CF934FB,
+    0xBB8532AF, 0x6C242960,
+    0xBA32CA70, 0x6B4AF278,
+    0xB8E31319, 0x6A6D98A4,
+    0xB796199B, 0x698C246C,
+    0xB64BEACC, 0x68A69E81,
+    0xB5049368, 0x67BD0FBC,
+    0xB3C0200C, 0x66CF811F,
+    0xB27E9D3B, 0x65DDFBD3,
+    0xB140175B, 0x64E88926,
+    0xB0049AB2, 0x63EF328F,
+    0xAECC336B, 0x62F201AC,
+    0xAD96ED91, 0x61F1003E,
+    0xAC64D510, 0x60EC3830,
+    0xAB35F5B5, 0x5FE3B38D,
+    0xAA0A5B2D, 0x5ED77C89,
+    0xA8E21106, 0x5DC79D7C,
+    0xA7BD22AB, 0x5CB420DF,
+    0xA69B9B68, 0x5B9D1153,
+    0xA57D8666, 0x5A82799A,
+    0xA462EEAC, 0x59646497,
+    0xA34BDF20, 0x5842DD54,
+    0xA2386283, 0x571DEEF9,
+    0xA1288376, 0x55F5A4D2,
+    0xA01C4C72, 0x54CA0A4A,
+    0x9F13C7D0, 0x539B2AEF,
+    0x9E0EFFC1, 0x5269126E,
+    0x9D0DFE53, 0x5133CC94,
+    0x9C10CD70, 0x4FFB654D,
+    0x9B1776D9, 0x4EBFE8A4,
+    0x9A22042C, 0x4D8162C4,
+    0x99307EE0, 0x4C3FDFF3,
+    0x9842F043, 0x4AFB6C97,
+    0x9759617E, 0x49B41533,
+    0x9673DB94, 0x4869E664,
+    0x9592675B, 0x471CECE6,
+    0x94B50D87, 0x45CD358F,
+    0x93DBD69F, 0x447ACD50,
+    0x9306CB04, 0x4325C135,
+    0x9235F2EB, 0x41CE1E64,
+    0x91695663, 0x4073F21D,
+    0x90A0FD4E, 0x3F1749B7,
+    0x8FDCEF66, 0x3DB832A5,
+    0x8F1D343A, 0x3C56BA70,
+    0x8E61D32D, 0x3AF2EEB7,
+    0x8DAAD37B, 0x398CDD32,
+    0x8CF83C30, 0x382493B0,
+    0x8C4A142F, 0x36BA2013,
+    0x8BA0622F, 0x354D9056,
+    0x8AFB2CBA, 0x33DEF287,
+    0x8A5A7A30, 0x326E54C7,
+    0x89BE50C3, 0x30FBC54D,
+    0x8926B677, 0x2F875262,
+    0x8893B124, 0x2E110A62,
+    0x88054677, 0x2C98FBBA,
+    0x877B7BEC, 0x2B1F34EB,
+    0x86F656D3, 0x29A3C484,
+    0x8675DC4E, 0x2826B928,
+    0x85FA1152, 0x26A82185,
+    0x8582FAA4, 0x25280C5D,
+    0x85109CDC, 0x23A6887E,
+    0x84A2FC62, 0x2223A4C5,
+    0x843A1D70, 0x209F701C,
+    0x83D60411, 0x1F19F97B,
+    0x8376B422, 0x1D934FE5,
+    0x831C314E, 0x1C0B826A,
+    0x82C67F13, 0x1A82A025,
+    0x8275A0C0, 0x18F8B83C,
+    0x82299971, 0x176DD9DE,
+    0x81E26C16, 0x15E21444,
+    0x81A01B6C, 0x145576B1,
+    0x8162AA03, 0x12C8106E,
+    0x812A1A39, 0x1139F0CE,
+    0x80F66E3C, 0x0FAB272B,
+    0x80C7A80A, 0x0E1BC2E3,
+    0x809DC970, 0x0C8BD35E,
+    0x8078D40D, 0x0AFB6805,
+    0x8058C94C, 0x096A9049,
+    0x803DAA69, 0x07D95B9E,
+    0x80277872, 0x0647D97C,
+    0x80163440, 0x04B6195D,
+    0x8009DE7D, 0x03242ABF,
+    0x800277A5, 0x01921D1F,
+    0x80000000, 0x00000000,
+    0x800277A5, 0xFE6DE2E0,
+    0x8009DE7D, 0xFCDBD541,
+    0x80163440, 0xFB49E6A2,
+    0x80277872, 0xF9B82683,
+    0x803DAA69, 0xF826A461,
+    0x8058C94C, 0xF6956FB6,
+    0x8078D40D, 0xF50497FA,
+    0x809DC970, 0xF3742CA1,
+    0x80C7A80A, 0xF1E43D1C,
+    0x80F66E3C, 0xF054D8D4,
+    0x812A1A39, 0xEEC60F31,
+    0x8162AA03, 0xED37EF91,
+    0x81A01B6C, 0xEBAA894E,
+    0x81E26C16, 0xEA1DEBBB,
+    0x82299971, 0xE8922621,
+    0x8275A0C0, 0xE70747C3,
+    0x82C67F13, 0xE57D5FDA,
+    0x831C314E, 0xE3F47D95,
+    0x8376B422, 0xE26CB01A,
+    0x83D60411, 0xE0E60684,
+    0x843A1D70, 0xDF608FE3,
+    0x84A2FC62, 0xDDDC5B3A,
+    0x85109CDC, 0xDC597781,
+    0x8582FAA4, 0xDAD7F3A2,
+    0x85FA1152, 0xD957DE7A,
+    0x8675DC4E, 0xD7D946D7,
+    0x86F656D3, 0xD65C3B7B,
+    0x877B7BEC, 0xD4E0CB14,
+    0x88054677, 0xD3670445,
+    0x8893B124, 0xD1EEF59E,
+    0x8926B677, 0xD078AD9D,
+    0x89BE50C3, 0xCF043AB2,
+    0x8A5A7A30, 0xCD91AB38,
+    0x8AFB2CBA, 0xCC210D78,
+    0x8BA0622F, 0xCAB26FA9,
+    0x8C4A142F, 0xC945DFEC,
+    0x8CF83C30, 0xC7DB6C50,
+    0x8DAAD37B, 0xC67322CD,
+    0x8E61D32D, 0xC50D1148,
+    0x8F1D343A, 0xC3A9458F,
+    0x8FDCEF66, 0xC247CD5A,
+    0x90A0FD4E, 0xC0E8B648,
+    0x91695663, 0xBF8C0DE2,
+    0x9235F2EB, 0xBE31E19B,
+    0x9306CB04, 0xBCDA3ECA,
+    0x93DBD69F, 0xBB8532AF,
+    0x94B50D87, 0xBA32CA70,
+    0x9592675B, 0xB8E31319,
+    0x9673DB94, 0xB796199B,
+    0x9759617E, 0xB64BEACC,
+    0x9842F043, 0xB5049368,
+    0x99307EE0, 0xB3C0200C,
+    0x9A22042C, 0xB27E9D3B,
+    0x9B1776D9, 0xB140175B,
+    0x9C10CD70, 0xB0049AB2,
+    0x9D0DFE53, 0xAECC336B,
+    0x9E0EFFC1, 0xAD96ED91,
+    0x9F13C7D0, 0xAC64D510,
+    0xA01C4C72, 0xAB35F5B5,
+    0xA1288376, 0xAA0A5B2D,
+    0xA2386283, 0xA8E21106,
+    0xA34BDF20, 0xA7BD22AB,
+    0xA462EEAC, 0xA69B9B68,
+    0xA57D8666, 0xA57D8666,
+    0xA69B9B68, 0xA462EEAC,
+    0xA7BD22AB, 0xA34BDF20,
+    0xA8E21106, 0xA2386283,
+    0xAA0A5B2D, 0xA1288376,
+    0xAB35F5B5, 0xA01C4C72,
+    0xAC64D510, 0x9F13C7D0,
+    0xAD96ED91, 0x9E0EFFC1,
+    0xAECC336B, 0x9D0DFE53,
+    0xB0049AB2, 0x9C10CD70,
+    0xB140175B, 0x9B1776D9,
+    0xB27E9D3B, 0x9A22042C,
+    0xB3C0200C, 0x99307EE0,
+    0xB5049368, 0x9842F043,
+    0xB64BEACC, 0x9759617E,
+    0xB796199B, 0x9673DB94,
+    0xB8E31319, 0x9592675B,
+    0xBA32CA70, 0x94B50D87,
+    0xBB8532AF, 0x93DBD69F,
+    0xBCDA3ECA, 0x9306CB04,
+    0xBE31E19B, 0x9235F2EB,
+    0xBF8C0DE2, 0x91695663,
+    0xC0E8B648, 0x90A0FD4E,
+    0xC247CD5A, 0x8FDCEF66,
+    0xC3A9458F, 0x8F1D343A,
+    0xC50D1148, 0x8E61D32D,
+    0xC67322CD, 0x8DAAD37B,
+    0xC7DB6C50, 0x8CF83C30,
+    0xC945DFEC, 0x8C4A142F,
+    0xCAB26FA9, 0x8BA0622F,
+    0xCC210D78, 0x8AFB2CBA,
+    0xCD91AB38, 0x8A5A7A30,
+    0xCF043AB2, 0x89BE50C3,
+    0xD078AD9D, 0x8926B677,
+    0xD1EEF59E, 0x8893B124,
+    0xD3670445, 0x88054677,
+    0xD4E0CB14, 0x877B7BEC,
+    0xD65C3B7B, 0x86F656D3,
+    0xD7D946D7, 0x8675DC4E,
+    0xD957DE7A, 0x85FA1152,
+    0xDAD7F3A2, 0x8582FAA4,
+    0xDC597781, 0x85109CDC,
+    0xDDDC5B3A, 0x84A2FC62,
+    0xDF608FE3, 0x843A1D70,
+    0xE0E60684, 0x83D60411,
+    0xE26CB01A, 0x8376B422,
+    0xE3F47D95, 0x831C314E,
+    0xE57D5FDA, 0x82C67F13,
+    0xE70747C3, 0x8275A0C0,
+    0xE8922621, 0x82299971,
+    0xEA1DEBBB, 0x81E26C16,
+    0xEBAA894E, 0x81A01B6C,
+    0xED37EF91, 0x8162AA03,
+    0xEEC60F31, 0x812A1A39,
+    0xF054D8D4, 0x80F66E3C,
+    0xF1E43D1C, 0x80C7A80A,
+    0xF3742CA1, 0x809DC970,
+    0xF50497FA, 0x8078D40D,
+    0xF6956FB6, 0x8058C94C,
+    0xF826A461, 0x803DAA69,
+    0xF9B82683, 0x80277872,
+    0xFB49E6A2, 0x80163440,
+    0xFCDBD541, 0x8009DE7D,
+    0xFE6DE2E0, 0x800277A5
+};
+
+/**    
+* \par   
+* Example code for Q31 twiddle factor generation:
+* \par    
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {    
+*    twiddleCoefQ31[2*i]= cos(i * 2*PI/(float)N);    
+*    twiddleCoefQ31[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 1024 and PI = 3.14159265358979
+* \par    
+* Cos and Sin values are stored in interleaved fashion
+* \par    
+* Convert floating point to Q31 (fixed point 1.31), saturating to the Q31 range
+* (so 1.0 is stored as 0x7FFFFFFF):
+*	round(twiddleCoefQ31[i] * pow(2, 31))
+*    
+*/
+const q31_t twiddleCoef_1024_q31[1536] = {
+    0x7FFFFFFF, 0x00000000,
+    0x7FFF6216, 0x00C90F88,
+    0x7FFD885A, 0x01921D1F,
+    0x7FFA72D1, 0x025B26D7,
+    0x7FF62182, 0x03242ABF,
+    0x7FF09477, 0x03ED26E6,
+    0x7FE9CBC0, 0x04B6195D,
+    0x7FE1C76B, 0x057F0034,
+    0x7FD8878D, 0x0647D97C,
+    0x7FCE0C3E, 0x0710A344,
+    0x7FC25596, 0x07D95B9E,
+    0x7FB563B2, 0x08A2009A,
+    0x7FA736B4, 0x096A9049,
+    0x7F97CEBC, 0x0A3308BC,
+    0x7F872BF3, 0x0AFB6805,
+    0x7F754E7F, 0x0BC3AC35,
+    0x7F62368F, 0x0C8BD35E,
+    0x7F4DE450, 0x0D53DB92,
+    0x7F3857F5, 0x0E1BC2E3,
+    0x7F2191B4, 0x0EE38765,
+    0x7F0991C3, 0x0FAB272B,
+    0x7EF0585F, 0x1072A047,
+    0x7ED5E5C6, 0x1139F0CE,
+    0x7EBA3A39, 0x120116D4,
+    0x7E9D55FC, 0x12C8106E,
+    0x7E7F3956, 0x138EDBB0,
+    0x7E5FE493, 0x145576B1,
+    0x7E3F57FE, 0x151BDF85,
+    0x7E1D93E9, 0x15E21444,
+    0x7DFA98A7, 0x16A81305,
+    0x7DD6668E, 0x176DD9DE,
+    0x7DB0FDF7, 0x183366E8,
+    0x7D8A5F3F, 0x18F8B83C,
+    0x7D628AC5, 0x19BDCBF2,
+    0x7D3980EC, 0x1A82A025,
+    0x7D0F4218, 0x1B4732EF,
+    0x7CE3CEB1, 0x1C0B826A,
+    0x7CB72724, 0x1CCF8CB3,
+    0x7C894BDD, 0x1D934FE5,
+    0x7C5A3D4F, 0x1E56CA1E,
+    0x7C29FBEE, 0x1F19F97B,
+    0x7BF88830, 0x1FDCDC1A,
+    0x7BC5E28F, 0x209F701C,
+    0x7B920B89, 0x2161B39F,
+    0x7B5D039D, 0x2223A4C5,
+    0x7B26CB4F, 0x22E541AE,
+    0x7AEF6323, 0x23A6887E,
+    0x7AB6CBA3, 0x24677757,
+    0x7A7D055B, 0x25280C5D,
+    0x7A4210D8, 0x25E845B5,
+    0x7A05EEAD, 0x26A82185,
+    0x79C89F6D, 0x27679DF4,
+    0x798A23B1, 0x2826B928,
+    0x794A7C11, 0x28E5714A,
+    0x7909A92C, 0x29A3C484,
+    0x78C7ABA1, 0x2A61B101,
+    0x78848413, 0x2B1F34EB,
+    0x78403328, 0x2BDC4E6F,
+    0x77FAB988, 0x2C98FBBA,
+    0x77B417DF, 0x2D553AFB,
+    0x776C4EDB, 0x2E110A62,
+    0x77235F2D, 0x2ECC681E,
+    0x76D94988, 0x2F875262,
+    0x768E0EA5, 0x3041C760,
+    0x7641AF3C, 0x30FBC54D,
+    0x75F42C0A, 0x31B54A5D,
+    0x75A585CF, 0x326E54C7,
+    0x7555BD4B, 0x3326E2C2,
+    0x7504D345, 0x33DEF287,
+    0x74B2C883, 0x3496824F,
+    0x745F9DD1, 0x354D9056,
+    0x740B53FA, 0x36041AD9,
+    0x73B5EBD0, 0x36BA2013,
+    0x735F6626, 0x376F9E46,
+    0x7307C3D0, 0x382493B0,
+    0x72AF05A6, 0x38D8FE93,
+    0x72552C84, 0x398CDD32,
+    0x71FA3948, 0x3A402DD1,
+    0x719E2CD2, 0x3AF2EEB7,
+    0x71410804, 0x3BA51E29,
+    0x70E2CBC6, 0x3C56BA70,
+    0x708378FE, 0x3D07C1D5,
+    0x70231099, 0x3DB832A5,
+    0x6FC19385, 0x3E680B2C,
+    0x6F5F02B1, 0x3F1749B7,
+    0x6EFB5F12, 0x3FC5EC97,
+    0x6E96A99C, 0x4073F21D,
+    0x6E30E349, 0x4121589A,
+    0x6DCA0D14, 0x41CE1E64,
+    0x6D6227FA, 0x427A41D0,
+    0x6CF934FB, 0x4325C135,
+    0x6C8F351C, 0x43D09AEC,
+    0x6C242960, 0x447ACD50,
+    0x6BB812D0, 0x452456BC,
+    0x6B4AF278, 0x45CD358F,
+    0x6ADCC964, 0x46756827,
+    0x6A6D98A4, 0x471CECE6,
+    0x69FD614A, 0x47C3C22E,
+    0x698C246C, 0x4869E664,
+    0x6919E320, 0x490F57EE,
+    0x68A69E81, 0x49B41533,
+    0x683257AA, 0x4A581C9D,
+    0x67BD0FBC, 0x4AFB6C97,
+    0x6746C7D7, 0x4B9E038F,
+    0x66CF811F, 0x4C3FDFF3,
+    0x66573CBB, 0x4CE10034,
+    0x65DDFBD3, 0x4D8162C4,
+    0x6563BF92, 0x4E210617,
+    0x64E88926, 0x4EBFE8A4,
+    0x646C59BF, 0x4F5E08E3,
+    0x63EF328F, 0x4FFB654D,
+    0x637114CC, 0x5097FC5E,
+    0x62F201AC, 0x5133CC94,
+    0x6271FA69, 0x51CED46E,
+    0x61F1003E, 0x5269126E,
+    0x616F146B, 0x53028517,
+    0x60EC3830, 0x539B2AEF,
+    0x60686CCE, 0x5433027D,
+    0x5FE3B38D, 0x54CA0A4A,
+    0x5F5E0DB3, 0x556040E2,
+    0x5ED77C89, 0x55F5A4D2,
+    0x5E50015D, 0x568A34A9,
+    0x5DC79D7C, 0x571DEEF9,
+    0x5D3E5236, 0x57B0D256,
+    0x5CB420DF, 0x5842DD54,
+    0x5C290ACC, 0x58D40E8C,
+    0x5B9D1153, 0x59646497,
+    0x5B1035CF, 0x59F3DE12,
+    0x5A82799A, 0x5A82799A,
+    0x59F3DE12, 0x5B1035CF,
+    0x59646497, 0x5B9D1153,
+    0x58D40E8C, 0x5C290ACC,
+    0x5842DD54, 0x5CB420DF,
+    0x57B0D256, 0x5D3E5236,
+    0x571DEEF9, 0x5DC79D7C,
+    0x568A34A9, 0x5E50015D,
+    0x55F5A4D2, 0x5ED77C89,
+    0x556040E2, 0x5F5E0DB3,
+    0x54CA0A4A, 0x5FE3B38D,
+    0x5433027D, 0x60686CCE,
+    0x539B2AEF, 0x60EC3830,
+    0x53028517, 0x616F146B,
+    0x5269126E, 0x61F1003E,
+    0x51CED46E, 0x6271FA69,
+    0x5133CC94, 0x62F201AC,
+    0x5097FC5E, 0x637114CC,
+    0x4FFB654D, 0x63EF328F,
+    0x4F5E08E3, 0x646C59BF,
+    0x4EBFE8A4, 0x64E88926,
+    0x4E210617, 0x6563BF92,
+    0x4D8162C4, 0x65DDFBD3,
+    0x4CE10034, 0x66573CBB,
+    0x4C3FDFF3, 0x66CF811F,
+    0x4B9E038F, 0x6746C7D7,
+    0x4AFB6C97, 0x67BD0FBC,
+    0x4A581C9D, 0x683257AA,
+    0x49B41533, 0x68A69E81,
+    0x490F57EE, 0x6919E320,
+    0x4869E664, 0x698C246C,
+    0x47C3C22E, 0x69FD614A,
+    0x471CECE6, 0x6A6D98A4,
+    0x46756827, 0x6ADCC964,
+    0x45CD358F, 0x6B4AF278,
+    0x452456BC, 0x6BB812D0,
+    0x447ACD50, 0x6C242960,
+    0x43D09AEC, 0x6C8F351C,
+    0x4325C135, 0x6CF934FB,
+    0x427A41D0, 0x6D6227FA,
+    0x41CE1E64, 0x6DCA0D14,
+    0x4121589A, 0x6E30E349,
+    0x4073F21D, 0x6E96A99C,
+    0x3FC5EC97, 0x6EFB5F12,
+    0x3F1749B7, 0x6F5F02B1,
+    0x3E680B2C, 0x6FC19385,
+    0x3DB832A5, 0x70231099,
+    0x3D07C1D5, 0x708378FE,
+    0x3C56BA70, 0x70E2CBC6,
+    0x3BA51E29, 0x71410804,
+    0x3AF2EEB7, 0x719E2CD2,
+    0x3A402DD1, 0x71FA3948,
+    0x398CDD32, 0x72552C84,
+    0x38D8FE93, 0x72AF05A6,
+    0x382493B0, 0x7307C3D0,
+    0x376F9E46, 0x735F6626,
+    0x36BA2013, 0x73B5EBD0,
+    0x36041AD9, 0x740B53FA,
+    0x354D9056, 0x745F9DD1,
+    0x3496824F, 0x74B2C883,
+    0x33DEF287, 0x7504D345,
+    0x3326E2C2, 0x7555BD4B,
+    0x326E54C7, 0x75A585CF,
+    0x31B54A5D, 0x75F42C0A,
+    0x30FBC54D, 0x7641AF3C,
+    0x3041C760, 0x768E0EA5,
+    0x2F875262, 0x76D94988,
+    0x2ECC681E, 0x77235F2D,
+    0x2E110A62, 0x776C4EDB,
+    0x2D553AFB, 0x77B417DF,
+    0x2C98FBBA, 0x77FAB988,
+    0x2BDC4E6F, 0x78403328,
+    0x2B1F34EB, 0x78848413,
+    0x2A61B101, 0x78C7ABA1,
+    0x29A3C484, 0x7909A92C,
+    0x28E5714A, 0x794A7C11,
+    0x2826B928, 0x798A23B1,
+    0x27679DF4, 0x79C89F6D,
+    0x26A82185, 0x7A05EEAD,
+    0x25E845B5, 0x7A4210D8,
+    0x25280C5D, 0x7A7D055B,
+    0x24677757, 0x7AB6CBA3,
+    0x23A6887E, 0x7AEF6323,
+    0x22E541AE, 0x7B26CB4F,
+    0x2223A4C5, 0x7B5D039D,
+    0x2161B39F, 0x7B920B89,
+    0x209F701C, 0x7BC5E28F,
+    0x1FDCDC1A, 0x7BF88830,
+    0x1F19F97B, 0x7C29FBEE,
+    0x1E56CA1E, 0x7C5A3D4F,
+    0x1D934FE5, 0x7C894BDD,
+    0x1CCF8CB3, 0x7CB72724,
+    0x1C0B826A, 0x7CE3CEB1,
+    0x1B4732EF, 0x7D0F4218,
+    0x1A82A025, 0x7D3980EC,
+    0x19BDCBF2, 0x7D628AC5,
+    0x18F8B83C, 0x7D8A5F3F,
+    0x183366E8, 0x7DB0FDF7,
+    0x176DD9DE, 0x7DD6668E,
+    0x16A81305, 0x7DFA98A7,
+    0x15E21444, 0x7E1D93E9,
+    0x151BDF85, 0x7E3F57FE,
+    0x145576B1, 0x7E5FE493,
+    0x138EDBB0, 0x7E7F3956,
+    0x12C8106E, 0x7E9D55FC,
+    0x120116D4, 0x7EBA3A39,
+    0x1139F0CE, 0x7ED5E5C6,
+    0x1072A047, 0x7EF0585F,
+    0x0FAB272B, 0x7F0991C3,
+    0x0EE38765, 0x7F2191B4,
+    0x0E1BC2E3, 0x7F3857F5,
+    0x0D53DB92, 0x7F4DE450,
+    0x0C8BD35E, 0x7F62368F,
+    0x0BC3AC35, 0x7F754E7F,
+    0x0AFB6805, 0x7F872BF3,
+    0x0A3308BC, 0x7F97CEBC,
+    0x096A9049, 0x7FA736B4,
+    0x08A2009A, 0x7FB563B2,
+    0x07D95B9E, 0x7FC25596,
+    0x0710A344, 0x7FCE0C3E,
+    0x0647D97C, 0x7FD8878D,
+    0x057F0034, 0x7FE1C76B,
+    0x04B6195D, 0x7FE9CBC0,
+    0x03ED26E6, 0x7FF09477,
+    0x03242ABF, 0x7FF62182,
+    0x025B26D7, 0x7FFA72D1,
+    0x01921D1F, 0x7FFD885A,
+    0x00C90F88, 0x7FFF6216,
+    0x00000000, 0x7FFFFFFF,
+    0xFF36F078, 0x7FFF6216,
+    0xFE6DE2E0, 0x7FFD885A,
+    0xFDA4D928, 0x7FFA72D1,
+    0xFCDBD541, 0x7FF62182,
+    0xFC12D919, 0x7FF09477,
+    0xFB49E6A2, 0x7FE9CBC0,
+    0xFA80FFCB, 0x7FE1C76B,
+    0xF9B82683, 0x7FD8878D,
+    0xF8EF5CBB, 0x7FCE0C3E,
+    0xF826A461, 0x7FC25596,
+    0xF75DFF65, 0x7FB563B2,
+    0xF6956FB6, 0x7FA736B4,
+    0xF5CCF743, 0x7F97CEBC,
+    0xF50497FA, 0x7F872BF3,
+    0xF43C53CA, 0x7F754E7F,
+    0xF3742CA1, 0x7F62368F,
+    0xF2AC246D, 0x7F4DE450,
+    0xF1E43D1C, 0x7F3857F5,
+    0xF11C789A, 0x7F2191B4,
+    0xF054D8D4, 0x7F0991C3,
+    0xEF8D5FB8, 0x7EF0585F,
+    0xEEC60F31, 0x7ED5E5C6,
+    0xEDFEE92B, 0x7EBA3A39,
+    0xED37EF91, 0x7E9D55FC,
+    0xEC71244F, 0x7E7F3956,
+    0xEBAA894E, 0x7E5FE493,
+    0xEAE4207A, 0x7E3F57FE,
+    0xEA1DEBBB, 0x7E1D93E9,
+    0xE957ECFB, 0x7DFA98A7,
+    0xE8922621, 0x7DD6668E,
+    0xE7CC9917, 0x7DB0FDF7,
+    0xE70747C3, 0x7D8A5F3F,
+    0xE642340D, 0x7D628AC5,
+    0xE57D5FDA, 0x7D3980EC,
+    0xE4B8CD10, 0x7D0F4218,
+    0xE3F47D95, 0x7CE3CEB1,
+    0xE330734C, 0x7CB72724,
+    0xE26CB01A, 0x7C894BDD,
+    0xE1A935E1, 0x7C5A3D4F,
+    0xE0E60684, 0x7C29FBEE,
+    0xE02323E5, 0x7BF88830,
+    0xDF608FE3, 0x7BC5E28F,
+    0xDE9E4C60, 0x7B920B89,
+    0xDDDC5B3A, 0x7B5D039D,
+    0xDD1ABE51, 0x7B26CB4F,
+    0xDC597781, 0x7AEF6323,
+    0xDB9888A8, 0x7AB6CBA3,
+    0xDAD7F3A2, 0x7A7D055B,
+    0xDA17BA4A, 0x7A4210D8,
+    0xD957DE7A, 0x7A05EEAD,
+    0xD898620C, 0x79C89F6D,
+    0xD7D946D7, 0x798A23B1,
+    0xD71A8EB5, 0x794A7C11,
+    0xD65C3B7B, 0x7909A92C,
+    0xD59E4EFE, 0x78C7ABA1,
+    0xD4E0CB14, 0x78848413,
+    0xD423B190, 0x78403328,
+    0xD3670445, 0x77FAB988,
+    0xD2AAC504, 0x77B417DF,
+    0xD1EEF59E, 0x776C4EDB,
+    0xD13397E1, 0x77235F2D,
+    0xD078AD9D, 0x76D94988,
+    0xCFBE389F, 0x768E0EA5,
+    0xCF043AB2, 0x7641AF3C,
+    0xCE4AB5A2, 0x75F42C0A,
+    0xCD91AB38, 0x75A585CF,
+    0xCCD91D3D, 0x7555BD4B,
+    0xCC210D78, 0x7504D345,
+    0xCB697DB0, 0x74B2C883,
+    0xCAB26FA9, 0x745F9DD1,
+    0xC9FBE527, 0x740B53FA,
+    0xC945DFEC, 0x73B5EBD0,
+    0xC89061BA, 0x735F6626,
+    0xC7DB6C50, 0x7307C3D0,
+    0xC727016C, 0x72AF05A6,
+    0xC67322CD, 0x72552C84,
+    0xC5BFD22E, 0x71FA3948,
+    0xC50D1148, 0x719E2CD2,
+    0xC45AE1D7, 0x71410804,
+    0xC3A9458F, 0x70E2CBC6,
+    0xC2F83E2A, 0x708378FE,
+    0xC247CD5A, 0x70231099,
+    0xC197F4D3, 0x6FC19385,
+    0xC0E8B648, 0x6F5F02B1,
+    0xC03A1368, 0x6EFB5F12,
+    0xBF8C0DE2, 0x6E96A99C,
+    0xBEDEA765, 0x6E30E349,
+    0xBE31E19B, 0x6DCA0D14,
+    0xBD85BE2F, 0x6D6227FA,
+    0xBCDA3ECA, 0x6CF934FB,
+    0xBC2F6513, 0x6C8F351C,
+    0xBB8532AF, 0x6C242960,
+    0xBADBA943, 0x6BB812D0,
+    0xBA32CA70, 0x6B4AF278,
+    0xB98A97D8, 0x6ADCC964,
+    0xB8E31319, 0x6A6D98A4,
+    0xB83C3DD1, 0x69FD614A,
+    0xB796199B, 0x698C246C,
+    0xB6F0A811, 0x6919E320,
+    0xB64BEACC, 0x68A69E81,
+    0xB5A7E362, 0x683257AA,
+    0xB5049368, 0x67BD0FBC,
+    0xB461FC70, 0x6746C7D7,
+    0xB3C0200C, 0x66CF811F,
+    0xB31EFFCB, 0x66573CBB,
+    0xB27E9D3B, 0x65DDFBD3,
+    0xB1DEF9E8, 0x6563BF92,
+    0xB140175B, 0x64E88926,
+    0xB0A1F71C, 0x646C59BF,
+    0xB0049AB2, 0x63EF328F,
+    0xAF6803A1, 0x637114CC,
+    0xAECC336B, 0x62F201AC,
+    0xAE312B91, 0x6271FA69,
+    0xAD96ED91, 0x61F1003E,
+    0xACFD7AE8, 0x616F146B,
+    0xAC64D510, 0x60EC3830,
+    0xABCCFD82, 0x60686CCE,
+    0xAB35F5B5, 0x5FE3B38D,
+    0xAA9FBF1D, 0x5F5E0DB3,
+    0xAA0A5B2D, 0x5ED77C89,
+    0xA975CB56, 0x5E50015D,
+    0xA8E21106, 0x5DC79D7C,
+    0xA84F2DA9, 0x5D3E5236,
+    0xA7BD22AB, 0x5CB420DF,
+    0xA72BF173, 0x5C290ACC,
+    0xA69B9B68, 0x5B9D1153,
+    0xA60C21ED, 0x5B1035CF,
+    0xA57D8666, 0x5A82799A,
+    0xA4EFCA31, 0x59F3DE12,
+    0xA462EEAC, 0x59646497,
+    0xA3D6F533, 0x58D40E8C,
+    0xA34BDF20, 0x5842DD54,
+    0xA2C1ADC9, 0x57B0D256,
+    0xA2386283, 0x571DEEF9,
+    0xA1AFFEA2, 0x568A34A9,
+    0xA1288376, 0x55F5A4D2,
+    0xA0A1F24C, 0x556040E2,
+    0xA01C4C72, 0x54CA0A4A,
+    0x9F979331, 0x5433027D,
+    0x9F13C7D0, 0x539B2AEF,
+    0x9E90EB94, 0x53028517,
+    0x9E0EFFC1, 0x5269126E,
+    0x9D8E0596, 0x51CED46E,
+    0x9D0DFE53, 0x5133CC94,
+    0x9C8EEB33, 0x5097FC5E,
+    0x9C10CD70, 0x4FFB654D,
+    0x9B93A640, 0x4F5E08E3,
+    0x9B1776D9, 0x4EBFE8A4,
+    0x9A9C406D, 0x4E210617,
+    0x9A22042C, 0x4D8162C4,
+    0x99A8C344, 0x4CE10034,
+    0x99307EE0, 0x4C3FDFF3,
+    0x98B93828, 0x4B9E038F,
+    0x9842F043, 0x4AFB6C97,
+    0x97CDA855, 0x4A581C9D,
+    0x9759617E, 0x49B41533,
+    0x96E61CDF, 0x490F57EE,
+    0x9673DB94, 0x4869E664,
+    0x96029EB5, 0x47C3C22E,
+    0x9592675B, 0x471CECE6,
+    0x9523369B, 0x46756827,
+    0x94B50D87, 0x45CD358F,
+    0x9447ED2F, 0x452456BC,
+    0x93DBD69F, 0x447ACD50,
+    0x9370CAE4, 0x43D09AEC,
+    0x9306CB04, 0x4325C135,
+    0x929DD805, 0x427A41D0,
+    0x9235F2EB, 0x41CE1E64,
+    0x91CF1CB6, 0x4121589A,
+    0x91695663, 0x4073F21D,
+    0x9104A0ED, 0x3FC5EC97,
+    0x90A0FD4E, 0x3F1749B7,
+    0x903E6C7A, 0x3E680B2C,
+    0x8FDCEF66, 0x3DB832A5,
+    0x8F7C8701, 0x3D07C1D5,
+    0x8F1D343A, 0x3C56BA70,
+    0x8EBEF7FB, 0x3BA51E29,
+    0x8E61D32D, 0x3AF2EEB7,
+    0x8E05C6B7, 0x3A402DD1,
+    0x8DAAD37B, 0x398CDD32,
+    0x8D50FA59, 0x38D8FE93,
+    0x8CF83C30, 0x382493B0,
+    0x8CA099D9, 0x376F9E46,
+    0x8C4A142F, 0x36BA2013,
+    0x8BF4AC05, 0x36041AD9,
+    0x8BA0622F, 0x354D9056,
+    0x8B4D377C, 0x3496824F,
+    0x8AFB2CBA, 0x33DEF287,
+    0x8AAA42B4, 0x3326E2C2,
+    0x8A5A7A30, 0x326E54C7,
+    0x8A0BD3F5, 0x31B54A5D,
+    0x89BE50C3, 0x30FBC54D,
+    0x8971F15A, 0x3041C760,
+    0x8926B677, 0x2F875262,
+    0x88DCA0D3, 0x2ECC681E,
+    0x8893B124, 0x2E110A62,
+    0x884BE820, 0x2D553AFB,
+    0x88054677, 0x2C98FBBA,
+    0x87BFCCD7, 0x2BDC4E6F,
+    0x877B7BEC, 0x2B1F34EB,
+    0x8738545E, 0x2A61B101,
+    0x86F656D3, 0x29A3C484,
+    0x86B583EE, 0x28E5714A,
+    0x8675DC4E, 0x2826B928,
+    0x86376092, 0x27679DF4,
+    0x85FA1152, 0x26A82185,
+    0x85BDEF27, 0x25E845B5,
+    0x8582FAA4, 0x25280C5D,
+    0x8549345C, 0x24677757,
+    0x85109CDC, 0x23A6887E,
+    0x84D934B0, 0x22E541AE,
+    0x84A2FC62, 0x2223A4C5,
+    0x846DF476, 0x2161B39F,
+    0x843A1D70, 0x209F701C,
+    0x840777CF, 0x1FDCDC1A,
+    0x83D60411, 0x1F19F97B,
+    0x83A5C2B0, 0x1E56CA1E,
+    0x8376B422, 0x1D934FE5,
+    0x8348D8DB, 0x1CCF8CB3,
+    0x831C314E, 0x1C0B826A,
+    0x82F0BDE8, 0x1B4732EF,
+    0x82C67F13, 0x1A82A025,
+    0x829D753A, 0x19BDCBF2,
+    0x8275A0C0, 0x18F8B83C,
+    0x824F0208, 0x183366E8,
+    0x82299971, 0x176DD9DE,
+    0x82056758, 0x16A81305,
+    0x81E26C16, 0x15E21444,
+    0x81C0A801, 0x151BDF85,
+    0x81A01B6C, 0x145576B1,
+    0x8180C6A9, 0x138EDBB0,
+    0x8162AA03, 0x12C8106E,
+    0x8145C5C6, 0x120116D4,
+    0x812A1A39, 0x1139F0CE,
+    0x810FA7A0, 0x1072A047,
+    0x80F66E3C, 0x0FAB272B,
+    0x80DE6E4C, 0x0EE38765,
+    0x80C7A80A, 0x0E1BC2E3,
+    0x80B21BAF, 0x0D53DB92,
+    0x809DC970, 0x0C8BD35E,
+    0x808AB180, 0x0BC3AC35,
+    0x8078D40D, 0x0AFB6805,
+    0x80683143, 0x0A3308BC,
+    0x8058C94C, 0x096A9049,
+    0x804A9C4D, 0x08A2009A,
+    0x803DAA69, 0x07D95B9E,
+    0x8031F3C1, 0x0710A344,
+    0x80277872, 0x0647D97C,
+    0x801E3894, 0x057F0034,
+    0x80163440, 0x04B6195D,
+    0x800F6B88, 0x03ED26E6,
+    0x8009DE7D, 0x03242ABF,
+    0x80058D2E, 0x025B26D7,
+    0x800277A5, 0x01921D1F,
+    0x80009DE9, 0x00C90F88,
+    0x80000000, 0x00000000,
+    0x80009DE9, 0xFF36F078,
+    0x800277A5, 0xFE6DE2E0,
+    0x80058D2E, 0xFDA4D928,
+    0x8009DE7D, 0xFCDBD541,
+    0x800F6B88, 0xFC12D919,
+    0x80163440, 0xFB49E6A2,
+    0x801E3894, 0xFA80FFCB,
+    0x80277872, 0xF9B82683,
+    0x8031F3C1, 0xF8EF5CBB,
+    0x803DAA69, 0xF826A461,
+    0x804A9C4D, 0xF75DFF65,
+    0x8058C94C, 0xF6956FB6,
+    0x80683143, 0xF5CCF743,
+    0x8078D40D, 0xF50497FA,
+    0x808AB180, 0xF43C53CA,
+    0x809DC970, 0xF3742CA1,
+    0x80B21BAF, 0xF2AC246D,
+    0x80C7A80A, 0xF1E43D1C,
+    0x80DE6E4C, 0xF11C789A,
+    0x80F66E3C, 0xF054D8D4,
+    0x810FA7A0, 0xEF8D5FB8,
+    0x812A1A39, 0xEEC60F31,
+    0x8145C5C6, 0xEDFEE92B,
+    0x8162AA03, 0xED37EF91,
+    0x8180C6A9, 0xEC71244F,
+    0x81A01B6C, 0xEBAA894E,
+    0x81C0A801, 0xEAE4207A,
+    0x81E26C16, 0xEA1DEBBB,
+    0x82056758, 0xE957ECFB,
+    0x82299971, 0xE8922621,
+    0x824F0208, 0xE7CC9917,
+    0x8275A0C0, 0xE70747C3,
+    0x829D753A, 0xE642340D,
+    0x82C67F13, 0xE57D5FDA,
+    0x82F0BDE8, 0xE4B8CD10,
+    0x831C314E, 0xE3F47D95,
+    0x8348D8DB, 0xE330734C,
+    0x8376B422, 0xE26CB01A,
+    0x83A5C2B0, 0xE1A935E1,
+    0x83D60411, 0xE0E60684,
+    0x840777CF, 0xE02323E5,
+    0x843A1D70, 0xDF608FE3,
+    0x846DF476, 0xDE9E4C60,
+    0x84A2FC62, 0xDDDC5B3A,
+    0x84D934B0, 0xDD1ABE51,
+    0x85109CDC, 0xDC597781,
+    0x8549345C, 0xDB9888A8,
+    0x8582FAA4, 0xDAD7F3A2,
+    0x85BDEF27, 0xDA17BA4A,
+    0x85FA1152, 0xD957DE7A,
+    0x86376092, 0xD898620C,
+    0x8675DC4E, 0xD7D946D7,
+    0x86B583EE, 0xD71A8EB5,
+    0x86F656D3, 0xD65C3B7B,
+    0x8738545E, 0xD59E4EFE,
+    0x877B7BEC, 0xD4E0CB14,
+    0x87BFCCD7, 0xD423B190,
+    0x88054677, 0xD3670445,
+    0x884BE820, 0xD2AAC504,
+    0x8893B124, 0xD1EEF59E,
+    0x88DCA0D3, 0xD13397E1,
+    0x8926B677, 0xD078AD9D,
+    0x8971F15A, 0xCFBE389F,
+    0x89BE50C3, 0xCF043AB2,
+    0x8A0BD3F5, 0xCE4AB5A2,
+    0x8A5A7A30, 0xCD91AB38,
+    0x8AAA42B4, 0xCCD91D3D,
+    0x8AFB2CBA, 0xCC210D78,
+    0x8B4D377C, 0xCB697DB0,
+    0x8BA0622F, 0xCAB26FA9,
+    0x8BF4AC05, 0xC9FBE527,
+    0x8C4A142F, 0xC945DFEC,
+    0x8CA099D9, 0xC89061BA,
+    0x8CF83C30, 0xC7DB6C50,
+    0x8D50FA59, 0xC727016C,
+    0x8DAAD37B, 0xC67322CD,
+    0x8E05C6B7, 0xC5BFD22E,
+    0x8E61D32D, 0xC50D1148,
+    0x8EBEF7FB, 0xC45AE1D7,
+    0x8F1D343A, 0xC3A9458F,
+    0x8F7C8701, 0xC2F83E2A,
+    0x8FDCEF66, 0xC247CD5A,
+    0x903E6C7A, 0xC197F4D3,
+    0x90A0FD4E, 0xC0E8B648,
+    0x9104A0ED, 0xC03A1368,
+    0x91695663, 0xBF8C0DE2,
+    0x91CF1CB6, 0xBEDEA765,
+    0x9235F2EB, 0xBE31E19B,
+    0x929DD805, 0xBD85BE2F,
+    0x9306CB04, 0xBCDA3ECA,
+    0x9370CAE4, 0xBC2F6513,
+    0x93DBD69F, 0xBB8532AF,
+    0x9447ED2F, 0xBADBA943,
+    0x94B50D87, 0xBA32CA70,
+    0x9523369B, 0xB98A97D8,
+    0x9592675B, 0xB8E31319,
+    0x96029EB5, 0xB83C3DD1,
+    0x9673DB94, 0xB796199B,
+    0x96E61CDF, 0xB6F0A811,
+    0x9759617E, 0xB64BEACC,
+    0x97CDA855, 0xB5A7E362,
+    0x9842F043, 0xB5049368,
+    0x98B93828, 0xB461FC70,
+    0x99307EE0, 0xB3C0200C,
+    0x99A8C344, 0xB31EFFCB,
+    0x9A22042C, 0xB27E9D3B,
+    0x9A9C406D, 0xB1DEF9E8,
+    0x9B1776D9, 0xB140175B,
+    0x9B93A640, 0xB0A1F71C,
+    0x9C10CD70, 0xB0049AB2,
+    0x9C8EEB33, 0xAF6803A1,
+    0x9D0DFE53, 0xAECC336B,
+    0x9D8E0596, 0xAE312B91,
+    0x9E0EFFC1, 0xAD96ED91,
+    0x9E90EB94, 0xACFD7AE8,
+    0x9F13C7D0, 0xAC64D510,
+    0x9F979331, 0xABCCFD82,
+    0xA01C4C72, 0xAB35F5B5,
+    0xA0A1F24C, 0xAA9FBF1D,
+    0xA1288376, 0xAA0A5B2D,
+    0xA1AFFEA2, 0xA975CB56,
+    0xA2386283, 0xA8E21106,
+    0xA2C1ADC9, 0xA84F2DA9,
+    0xA34BDF20, 0xA7BD22AB,
+    0xA3D6F533, 0xA72BF173,
+    0xA462EEAC, 0xA69B9B68,
+    0xA4EFCA31, 0xA60C21ED,
+    0xA57D8666, 0xA57D8666,
+    0xA60C21ED, 0xA4EFCA31,
+    0xA69B9B68, 0xA462EEAC,
+    0xA72BF173, 0xA3D6F533,
+    0xA7BD22AB, 0xA34BDF20,
+    0xA84F2DA9, 0xA2C1ADC9,
+    0xA8E21106, 0xA2386283,
+    0xA975CB56, 0xA1AFFEA2,
+    0xAA0A5B2D, 0xA1288376,
+    0xAA9FBF1D, 0xA0A1F24C,
+    0xAB35F5B5, 0xA01C4C72,
+    0xABCCFD82, 0x9F979331,
+    0xAC64D510, 0x9F13C7D0,
+    0xACFD7AE8, 0x9E90EB94,
+    0xAD96ED91, 0x9E0EFFC1,
+    0xAE312B91, 0x9D8E0596,
+    0xAECC336B, 0x9D0DFE53,
+    0xAF6803A1, 0x9C8EEB33,
+    0xB0049AB2, 0x9C10CD70,
+    0xB0A1F71C, 0x9B93A640,
+    0xB140175B, 0x9B1776D9,
+    0xB1DEF9E8, 0x9A9C406D,
+    0xB27E9D3B, 0x9A22042C,
+    0xB31EFFCB, 0x99A8C344,
+    0xB3C0200C, 0x99307EE0,
+    0xB461FC70, 0x98B93828,
+    0xB5049368, 0x9842F043,
+    0xB5A7E362, 0x97CDA855,
+    0xB64BEACC, 0x9759617E,
+    0xB6F0A811, 0x96E61CDF,
+    0xB796199B, 0x9673DB94,
+    0xB83C3DD1, 0x96029EB5,
+    0xB8E31319, 0x9592675B,
+    0xB98A97D8, 0x9523369B,
+    0xBA32CA70, 0x94B50D87,
+    0xBADBA943, 0x9447ED2F,
+    0xBB8532AF, 0x93DBD69F,
+    0xBC2F6513, 0x9370CAE4,
+    0xBCDA3ECA, 0x9306CB04,
+    0xBD85BE2F, 0x929DD805,
+    0xBE31E19B, 0x9235F2EB,
+    0xBEDEA765, 0x91CF1CB6,
+    0xBF8C0DE2, 0x91695663,
+    0xC03A1368, 0x9104A0ED,
+    0xC0E8B648, 0x90A0FD4E,
+    0xC197F4D3, 0x903E6C7A,
+    0xC247CD5A, 0x8FDCEF66,
+    0xC2F83E2A, 0x8F7C8701,
+    0xC3A9458F, 0x8F1D343A,
+    0xC45AE1D7, 0x8EBEF7FB,
+    0xC50D1148, 0x8E61D32D,
+    0xC5BFD22E, 0x8E05C6B7,
+    0xC67322CD, 0x8DAAD37B,
+    0xC727016C, 0x8D50FA59,
+    0xC7DB6C50, 0x8CF83C30,
+    0xC89061BA, 0x8CA099D9,
+    0xC945DFEC, 0x8C4A142F,
+    0xC9FBE527, 0x8BF4AC05,
+    0xCAB26FA9, 0x8BA0622F,
+    0xCB697DB0, 0x8B4D377C,
+    0xCC210D78, 0x8AFB2CBA,
+    0xCCD91D3D, 0x8AAA42B4,
+    0xCD91AB38, 0x8A5A7A30,
+    0xCE4AB5A2, 0x8A0BD3F5,
+    0xCF043AB2, 0x89BE50C3,
+    0xCFBE389F, 0x8971F15A,
+    0xD078AD9D, 0x8926B677,
+    0xD13397E1, 0x88DCA0D3,
+    0xD1EEF59E, 0x8893B124,
+    0xD2AAC504, 0x884BE820,
+    0xD3670445, 0x88054677,
+    0xD423B190, 0x87BFCCD7,
+    0xD4E0CB14, 0x877B7BEC,
+    0xD59E4EFE, 0x8738545E,
+    0xD65C3B7B, 0x86F656D3,
+    0xD71A8EB5, 0x86B583EE,
+    0xD7D946D7, 0x8675DC4E,
+    0xD898620C, 0x86376092,
+    0xD957DE7A, 0x85FA1152,
+    0xDA17BA4A, 0x85BDEF27,
+    0xDAD7F3A2, 0x8582FAA4,
+    0xDB9888A8, 0x8549345C,
+    0xDC597781, 0x85109CDC,
+    0xDD1ABE51, 0x84D934B0,
+    0xDDDC5B3A, 0x84A2FC62,
+    0xDE9E4C60, 0x846DF476,
+    0xDF608FE3, 0x843A1D70,
+    0xE02323E5, 0x840777CF,
+    0xE0E60684, 0x83D60411,
+    0xE1A935E1, 0x83A5C2B0,
+    0xE26CB01A, 0x8376B422,
+    0xE330734C, 0x8348D8DB,
+    0xE3F47D95, 0x831C314E,
+    0xE4B8CD10, 0x82F0BDE8,
+    0xE57D5FDA, 0x82C67F13,
+    0xE642340D, 0x829D753A,
+    0xE70747C3, 0x8275A0C0,
+    0xE7CC9917, 0x824F0208,
+    0xE8922621, 0x82299971,
+    0xE957ECFB, 0x82056758,
+    0xEA1DEBBB, 0x81E26C16,
+    0xEAE4207A, 0x81C0A801,
+    0xEBAA894E, 0x81A01B6C,
+    0xEC71244F, 0x8180C6A9,
+    0xED37EF91, 0x8162AA03,
+    0xEDFEE92B, 0x8145C5C6,
+    0xEEC60F31, 0x812A1A39,
+    0xEF8D5FB8, 0x810FA7A0,
+    0xF054D8D4, 0x80F66E3C,
+    0xF11C789A, 0x80DE6E4C,
+    0xF1E43D1C, 0x80C7A80A,
+    0xF2AC246D, 0x80B21BAF,
+    0xF3742CA1, 0x809DC970,
+    0xF43C53CA, 0x808AB180,
+    0xF50497FA, 0x8078D40D,
+    0xF5CCF743, 0x80683143,
+    0xF6956FB6, 0x8058C94C,
+    0xF75DFF65, 0x804A9C4D,
+    0xF826A461, 0x803DAA69,
+    0xF8EF5CBB, 0x8031F3C1,
+    0xF9B82683, 0x80277872,
+    0xFA80FFCB, 0x801E3894,
+    0xFB49E6A2, 0x80163440,
+    0xFC12D919, 0x800F6B88,
+    0xFCDBD541, 0x8009DE7D,
+    0xFDA4D928, 0x80058D2E,
+    0xFE6DE2E0, 0x800277A5,
+    0xFF36F078, 0x80009DE9
+};
+
+/**    
+* \par   
+* Example code for Q31 twiddle factor generation:
+* \par    
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {    
+*    twiddleCoefQ31[2*i]= cos(i * 2*PI/(float)N);    
+*    twiddleCoefQ31[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 2048 and PI = 3.14159265358979
+* \par    
+* Cos and Sin values are stored in interleaved fashion
+* \par    
+* Convert floating point to Q31 (fixed point 1.31), saturating to the Q31 range
+* (so 1.0 is stored as 0x7FFFFFFF):
+*	round(twiddleCoefQ31[i] * pow(2, 31))
+*    
+*/
+const q31_t twiddleCoef_2048_q31[3072] = {
+    0x7FFFFFFF, 0x00000000,
+    0x7FFFD885, 0x006487E3,
+    0x7FFF6216, 0x00C90F88,
+    0x7FFE9CB2, 0x012D96B0,
+    0x7FFD885A, 0x01921D1F,
+    0x7FFC250F, 0x01F6A296,
+    0x7FFA72D1, 0x025B26D7,
+    0x7FF871A1, 0x02BFA9A4,
+    0x7FF62182, 0x03242ABF,
+    0x7FF38273, 0x0388A9E9,
+    0x7FF09477, 0x03ED26E6,
+    0x7FED5790, 0x0451A176,
+    0x7FE9CBC0, 0x04B6195D,
+    0x7FE5F108, 0x051A8E5C,
+    0x7FE1C76B, 0x057F0034,
+    0x7FDD4EEC, 0x05E36EA9,
+    0x7FD8878D, 0x0647D97C,
+    0x7FD37152, 0x06AC406F,
+    0x7FCE0C3E, 0x0710A344,
+    0x7FC85853, 0x077501BE,
+    0x7FC25596, 0x07D95B9E,
+    0x7FBC040A, 0x083DB0A7,
+    0x7FB563B2, 0x08A2009A,
+    0x7FAE7494, 0x09064B3A,
+    0x7FA736B4, 0x096A9049,
+    0x7F9FAA15, 0x09CECF89,
+    0x7F97CEBC, 0x0A3308BC,
+    0x7F8FA4AF, 0x0A973BA5,
+    0x7F872BF3, 0x0AFB6805,
+    0x7F7E648B, 0x0B5F8D9F,
+    0x7F754E7F, 0x0BC3AC35,
+    0x7F6BE9D4, 0x0C27C389,
+    0x7F62368F, 0x0C8BD35E,
+    0x7F5834B6, 0x0CEFDB75,
+    0x7F4DE450, 0x0D53DB92,
+    0x7F434563, 0x0DB7D376,
+    0x7F3857F5, 0x0E1BC2E3,
+    0x7F2D1C0E, 0x0E7FA99D,
+    0x7F2191B4, 0x0EE38765,
+    0x7F15B8EE, 0x0F475BFE,
+    0x7F0991C3, 0x0FAB272B,
+    0x7EFD1C3C, 0x100EE8AD,
+    0x7EF0585F, 0x1072A047,
+    0x7EE34635, 0x10D64DBC,
+    0x7ED5E5C6, 0x1139F0CE,
+    0x7EC8371A, 0x119D8940,
+    0x7EBA3A39, 0x120116D4,
+    0x7EABEF2C, 0x1264994E,
+    0x7E9D55FC, 0x12C8106E,
+    0x7E8E6EB1, 0x132B7BF9,
+    0x7E7F3956, 0x138EDBB0,
+    0x7E6FB5F3, 0x13F22F57,
+    0x7E5FE493, 0x145576B1,
+    0x7E4FC53E, 0x14B8B17F,
+    0x7E3F57FE, 0x151BDF85,
+    0x7E2E9CDF, 0x157F0086,
+    0x7E1D93E9, 0x15E21444,
+    0x7E0C3D29, 0x16451A83,
+    0x7DFA98A7, 0x16A81305,
+    0x7DE8A670, 0x170AFD8D,
+    0x7DD6668E, 0x176DD9DE,
+    0x7DC3D90D, 0x17D0A7BB,
+    0x7DB0FDF7, 0x183366E8,
+    0x7D9DD55A, 0x18961727,
+    0x7D8A5F3F, 0x18F8B83C,
+    0x7D769BB5, 0x195B49E9,
+    0x7D628AC5, 0x19BDCBF2,
+    0x7D4E2C7E, 0x1A203E1B,
+    0x7D3980EC, 0x1A82A025,
+    0x7D24881A, 0x1AE4F1D6,
+    0x7D0F4218, 0x1B4732EF,
+    0x7CF9AEF0, 0x1BA96334,
+    0x7CE3CEB1, 0x1C0B826A,
+    0x7CCDA168, 0x1C6D9053,
+    0x7CB72724, 0x1CCF8CB3,
+    0x7CA05FF1, 0x1D31774D,
+    0x7C894BDD, 0x1D934FE5,
+    0x7C71EAF8, 0x1DF5163F,
+    0x7C5A3D4F, 0x1E56CA1E,
+    0x7C4242F2, 0x1EB86B46,
+    0x7C29FBEE, 0x1F19F97B,
+    0x7C116853, 0x1F7B7480,
+    0x7BF88830, 0x1FDCDC1A,
+    0x7BDF5B94, 0x203E300D,
+    0x7BC5E28F, 0x209F701C,
+    0x7BAC1D31, 0x21009C0B,
+    0x7B920B89, 0x2161B39F,
+    0x7B77ADA8, 0x21C2B69C,
+    0x7B5D039D, 0x2223A4C5,
+    0x7B420D7A, 0x22847DDF,
+    0x7B26CB4F, 0x22E541AE,
+    0x7B0B3D2C, 0x2345EFF7,
+    0x7AEF6323, 0x23A6887E,
+    0x7AD33D45, 0x24070B07,
+    0x7AB6CBA3, 0x24677757,
+    0x7A9A0E4F, 0x24C7CD32,
+    0x7A7D055B, 0x25280C5D,
+    0x7A5FB0D8, 0x2588349D,
+    0x7A4210D8, 0x25E845B5,
+    0x7A24256E, 0x26483F6C,
+    0x7A05EEAD, 0x26A82185,
+    0x79E76CA6, 0x2707EBC6,
+    0x79C89F6D, 0x27679DF4,
+    0x79A98715, 0x27C737D2,
+    0x798A23B1, 0x2826B928,
+    0x796A7554, 0x288621B9,
+    0x794A7C11, 0x28E5714A,
+    0x792A37FE, 0x2944A7A2,
+    0x7909A92C, 0x29A3C484,
+    0x78E8CFB1, 0x2A02C7B8,
+    0x78C7ABA1, 0x2A61B101,
+    0x78A63D10, 0x2AC08025,
+    0x78848413, 0x2B1F34EB,
+    0x786280BF, 0x2B7DCF17,
+    0x78403328, 0x2BDC4E6F,
+    0x781D9B64, 0x2C3AB2B9,
+    0x77FAB988, 0x2C98FBBA,
+    0x77D78DAA, 0x2CF72939,
+    0x77B417DF, 0x2D553AFB,
+    0x7790583D, 0x2DB330C7,
+    0x776C4EDB, 0x2E110A62,
+    0x7747FBCE, 0x2E6EC792,
+    0x77235F2D, 0x2ECC681E,
+    0x76FE790E, 0x2F29EBCC,
+    0x76D94988, 0x2F875262,
+    0x76B3D0B3, 0x2FE49BA6,
+    0x768E0EA5, 0x3041C760,
+    0x76680376, 0x309ED555,
+    0x7641AF3C, 0x30FBC54D,
+    0x761B1211, 0x3158970D,
+    0x75F42C0A, 0x31B54A5D,
+    0x75CCFD42, 0x3211DF03,
+    0x75A585CF, 0x326E54C7,
+    0x757DC5CA, 0x32CAAB6F,
+    0x7555BD4B, 0x3326E2C2,
+    0x752D6C6C, 0x3382FA88,
+    0x7504D345, 0x33DEF287,
+    0x74DBF1EF, 0x343ACA87,
+    0x74B2C883, 0x3496824F,
+    0x7489571B, 0x34F219A7,
+    0x745F9DD1, 0x354D9056,
+    0x74359CBD, 0x35A8E624,
+    0x740B53FA, 0x36041AD9,
+    0x73E0C3A3, 0x365F2E3B,
+    0x73B5EBD0, 0x36BA2013,
+    0x738ACC9E, 0x3714F02A,
+    0x735F6626, 0x376F9E46,
+    0x7333B883, 0x37CA2A30,
+    0x7307C3D0, 0x382493B0,
+    0x72DB8828, 0x387EDA8E,
+    0x72AF05A6, 0x38D8FE93,
+    0x72823C66, 0x3932FF87,
+    0x72552C84, 0x398CDD32,
+    0x7227D61C, 0x39E6975D,
+    0x71FA3948, 0x3A402DD1,
+    0x71CC5626, 0x3A99A057,
+    0x719E2CD2, 0x3AF2EEB7,
+    0x716FBD68, 0x3B4C18BA,
+    0x71410804, 0x3BA51E29,
+    0x71120CC5, 0x3BFDFECD,
+    0x70E2CBC6, 0x3C56BA70,
+    0x70B34524, 0x3CAF50DA,
+    0x708378FE, 0x3D07C1D5,
+    0x70536771, 0x3D600D2B,
+    0x70231099, 0x3DB832A5,
+    0x6FF27496, 0x3E10320D,
+    0x6FC19385, 0x3E680B2C,
+    0x6F906D84, 0x3EBFBDCC,
+    0x6F5F02B1, 0x3F1749B7,
+    0x6F2D532C, 0x3F6EAEB8,
+    0x6EFB5F12, 0x3FC5EC97,
+    0x6EC92682, 0x401D0320,
+    0x6E96A99C, 0x4073F21D,
+    0x6E63E87F, 0x40CAB957,
+    0x6E30E349, 0x4121589A,
+    0x6DFD9A1B, 0x4177CFB0,
+    0x6DCA0D14, 0x41CE1E64,
+    0x6D963C54, 0x42244480,
+    0x6D6227FA, 0x427A41D0,
+    0x6D2DD027, 0x42D0161E,
+    0x6CF934FB, 0x4325C135,
+    0x6CC45697, 0x437B42E1,
+    0x6C8F351C, 0x43D09AEC,
+    0x6C59D0A9, 0x4425C923,
+    0x6C242960, 0x447ACD50,
+    0x6BEE3F62, 0x44CFA73F,
+    0x6BB812D0, 0x452456BC,
+    0x6B81A3CD, 0x4578DB93,
+    0x6B4AF278, 0x45CD358F,
+    0x6B13FEF5, 0x4621647C,
+    0x6ADCC964, 0x46756827,
+    0x6AA551E8, 0x46C9405C,
+    0x6A6D98A4, 0x471CECE6,
+    0x6A359DB9, 0x47706D93,
+    0x69FD614A, 0x47C3C22E,
+    0x69C4E37A, 0x4816EA85,
+    0x698C246C, 0x4869E664,
+    0x69532442, 0x48BCB598,
+    0x6919E320, 0x490F57EE,
+    0x68E06129, 0x4961CD32,
+    0x68A69E81, 0x49B41533,
+    0x686C9B4B, 0x4A062FBD,
+    0x683257AA, 0x4A581C9D,
+    0x67F7D3C4, 0x4AA9DBA1,
+    0x67BD0FBC, 0x4AFB6C97,
+    0x67820BB6, 0x4B4CCF4D,
+    0x6746C7D7, 0x4B9E038F,
+    0x670B4443, 0x4BEF092D,
+    0x66CF811F, 0x4C3FDFF3,
+    0x66937E90, 0x4C9087B1,
+    0x66573CBB, 0x4CE10034,
+    0x661ABBC5, 0x4D31494B,
+    0x65DDFBD3, 0x4D8162C4,
+    0x65A0FD0B, 0x4DD14C6E,
+    0x6563BF92, 0x4E210617,
+    0x6526438E, 0x4E708F8F,
+    0x64E88926, 0x4EBFE8A4,
+    0x64AA907F, 0x4F0F1126,
+    0x646C59BF, 0x4F5E08E3,
+    0x642DE50D, 0x4FACCFAB,
+    0x63EF328F, 0x4FFB654D,
+    0x63B0426D, 0x5049C999,
+    0x637114CC, 0x5097FC5E,
+    0x6331A9D4, 0x50E5FD6C,
+    0x62F201AC, 0x5133CC94,
+    0x62B21C7B, 0x518169A4,
+    0x6271FA69, 0x51CED46E,
+    0x62319B9D, 0x521C0CC1,
+    0x61F1003E, 0x5269126E,
+    0x61B02876, 0x52B5E545,
+    0x616F146B, 0x53028517,
+    0x612DC446, 0x534EF1B5,
+    0x60EC3830, 0x539B2AEF,
+    0x60AA704F, 0x53E73097,
+    0x60686CCE, 0x5433027D,
+    0x60262DD5, 0x547EA073,
+    0x5FE3B38D, 0x54CA0A4A,
+    0x5FA0FE1E, 0x55153FD4,
+    0x5F5E0DB3, 0x556040E2,
+    0x5F1AE273, 0x55AB0D46,
+    0x5ED77C89, 0x55F5A4D2,
+    0x5E93DC1F, 0x56400757,
+    0x5E50015D, 0x568A34A9,
+    0x5E0BEC6E, 0x56D42C99,
+    0x5DC79D7C, 0x571DEEF9,
+    0x5D8314B0, 0x57677B9D,
+    0x5D3E5236, 0x57B0D256,
+    0x5CF95638, 0x57F9F2F7,
+    0x5CB420DF, 0x5842DD54,
+    0x5C6EB258, 0x588B913F,
+    0x5C290ACC, 0x58D40E8C,
+    0x5BE32A67, 0x591C550E,
+    0x5B9D1153, 0x59646497,
+    0x5B56BFBD, 0x59AC3CFD,
+    0x5B1035CF, 0x59F3DE12,
+    0x5AC973B4, 0x5A3B47AA,
+    0x5A82799A, 0x5A82799A,
+    0x5A3B47AA, 0x5AC973B4,
+    0x59F3DE12, 0x5B1035CF,
+    0x59AC3CFD, 0x5B56BFBD,
+    0x59646497, 0x5B9D1153,
+    0x591C550E, 0x5BE32A67,
+    0x58D40E8C, 0x5C290ACC,
+    0x588B913F, 0x5C6EB258,
+    0x5842DD54, 0x5CB420DF,
+    0x57F9F2F7, 0x5CF95638,
+    0x57B0D256, 0x5D3E5236,
+    0x57677B9D, 0x5D8314B0,
+    0x571DEEF9, 0x5DC79D7C,
+    0x56D42C99, 0x5E0BEC6E,
+    0x568A34A9, 0x5E50015D,
+    0x56400757, 0x5E93DC1F,
+    0x55F5A4D2, 0x5ED77C89,
+    0x55AB0D46, 0x5F1AE273,
+    0x556040E2, 0x5F5E0DB3,
+    0x55153FD4, 0x5FA0FE1E,
+    0x54CA0A4A, 0x5FE3B38D,
+    0x547EA073, 0x60262DD5,
+    0x5433027D, 0x60686CCE,
+    0x53E73097, 0x60AA704F,
+    0x539B2AEF, 0x60EC3830,
+    0x534EF1B5, 0x612DC446,
+    0x53028517, 0x616F146B,
+    0x52B5E545, 0x61B02876,
+    0x5269126E, 0x61F1003E,
+    0x521C0CC1, 0x62319B9D,
+    0x51CED46E, 0x6271FA69,
+    0x518169A4, 0x62B21C7B,
+    0x5133CC94, 0x62F201AC,
+    0x50E5FD6C, 0x6331A9D4,
+    0x5097FC5E, 0x637114CC,
+    0x5049C999, 0x63B0426D,
+    0x4FFB654D, 0x63EF328F,
+    0x4FACCFAB, 0x642DE50D,
+    0x4F5E08E3, 0x646C59BF,
+    0x4F0F1126, 0x64AA907F,
+    0x4EBFE8A4, 0x64E88926,
+    0x4E708F8F, 0x6526438E,
+    0x4E210617, 0x6563BF92,
+    0x4DD14C6E, 0x65A0FD0B,
+    0x4D8162C4, 0x65DDFBD3,
+    0x4D31494B, 0x661ABBC5,
+    0x4CE10034, 0x66573CBB,
+    0x4C9087B1, 0x66937E90,
+    0x4C3FDFF3, 0x66CF811F,
+    0x4BEF092D, 0x670B4443,
+    0x4B9E038F, 0x6746C7D7,
+    0x4B4CCF4D, 0x67820BB6,
+    0x4AFB6C97, 0x67BD0FBC,
+    0x4AA9DBA1, 0x67F7D3C4,
+    0x4A581C9D, 0x683257AA,
+    0x4A062FBD, 0x686C9B4B,
+    0x49B41533, 0x68A69E81,
+    0x4961CD32, 0x68E06129,
+    0x490F57EE, 0x6919E320,
+    0x48BCB598, 0x69532442,
+    0x4869E664, 0x698C246C,
+    0x4816EA85, 0x69C4E37A,
+    0x47C3C22E, 0x69FD614A,
+    0x47706D93, 0x6A359DB9,
+    0x471CECE6, 0x6A6D98A4,
+    0x46C9405C, 0x6AA551E8,
+    0x46756827, 0x6ADCC964,
+    0x4621647C, 0x6B13FEF5,
+    0x45CD358F, 0x6B4AF278,
+    0x4578DB93, 0x6B81A3CD,
+    0x452456BC, 0x6BB812D0,
+    0x44CFA73F, 0x6BEE3F62,
+    0x447ACD50, 0x6C242960,
+    0x4425C923, 0x6C59D0A9,
+    0x43D09AEC, 0x6C8F351C,
+    0x437B42E1, 0x6CC45697,
+    0x4325C135, 0x6CF934FB,
+    0x42D0161E, 0x6D2DD027,
+    0x427A41D0, 0x6D6227FA,
+    0x42244480, 0x6D963C54,
+    0x41CE1E64, 0x6DCA0D14,
+    0x4177CFB0, 0x6DFD9A1B,
+    0x4121589A, 0x6E30E349,
+    0x40CAB957, 0x6E63E87F,
+    0x4073F21D, 0x6E96A99C,
+    0x401D0320, 0x6EC92682,
+    0x3FC5EC97, 0x6EFB5F12,
+    0x3F6EAEB8, 0x6F2D532C,
+    0x3F1749B7, 0x6F5F02B1,
+    0x3EBFBDCC, 0x6F906D84,
+    0x3E680B2C, 0x6FC19385,
+    0x3E10320D, 0x6FF27496,
+    0x3DB832A5, 0x70231099,
+    0x3D600D2B, 0x70536771,
+    0x3D07C1D5, 0x708378FE,
+    0x3CAF50DA, 0x70B34524,
+    0x3C56BA70, 0x70E2CBC6,
+    0x3BFDFECD, 0x71120CC5,
+    0x3BA51E29, 0x71410804,
+    0x3B4C18BA, 0x716FBD68,
+    0x3AF2EEB7, 0x719E2CD2,
+    0x3A99A057, 0x71CC5626,
+    0x3A402DD1, 0x71FA3948,
+    0x39E6975D, 0x7227D61C,
+    0x398CDD32, 0x72552C84,
+    0x3932FF87, 0x72823C66,
+    0x38D8FE93, 0x72AF05A6,
+    0x387EDA8E, 0x72DB8828,
+    0x382493B0, 0x7307C3D0,
+    0x37CA2A30, 0x7333B883,
+    0x376F9E46, 0x735F6626,
+    0x3714F02A, 0x738ACC9E,
+    0x36BA2013, 0x73B5EBD0,
+    0x365F2E3B, 0x73E0C3A3,
+    0x36041AD9, 0x740B53FA,
+    0x35A8E624, 0x74359CBD,
+    0x354D9056, 0x745F9DD1,
+    0x34F219A7, 0x7489571B,
+    0x3496824F, 0x74B2C883,
+    0x343ACA87, 0x74DBF1EF,
+    0x33DEF287, 0x7504D345,
+    0x3382FA88, 0x752D6C6C,
+    0x3326E2C2, 0x7555BD4B,
+    0x32CAAB6F, 0x757DC5CA,
+    0x326E54C7, 0x75A585CF,
+    0x3211DF03, 0x75CCFD42,
+    0x31B54A5D, 0x75F42C0A,
+    0x3158970D, 0x761B1211,
+    0x30FBC54D, 0x7641AF3C,
+    0x309ED555, 0x76680376,
+    0x3041C760, 0x768E0EA5,
+    0x2FE49BA6, 0x76B3D0B3,
+    0x2F875262, 0x76D94988,
+    0x2F29EBCC, 0x76FE790E,
+    0x2ECC681E, 0x77235F2D,
+    0x2E6EC792, 0x7747FBCE,
+    0x2E110A62, 0x776C4EDB,
+    0x2DB330C7, 0x7790583D,
+    0x2D553AFB, 0x77B417DF,
+    0x2CF72939, 0x77D78DAA,
+    0x2C98FBBA, 0x77FAB988,
+    0x2C3AB2B9, 0x781D9B64,
+    0x2BDC4E6F, 0x78403328,
+    0x2B7DCF17, 0x786280BF,
+    0x2B1F34EB, 0x78848413,
+    0x2AC08025, 0x78A63D10,
+    0x2A61B101, 0x78C7ABA1,
+    0x2A02C7B8, 0x78E8CFB1,
+    0x29A3C484, 0x7909A92C,
+    0x2944A7A2, 0x792A37FE,
+    0x28E5714A, 0x794A7C11,
+    0x288621B9, 0x796A7554,
+    0x2826B928, 0x798A23B1,
+    0x27C737D2, 0x79A98715,
+    0x27679DF4, 0x79C89F6D,
+    0x2707EBC6, 0x79E76CA6,
+    0x26A82185, 0x7A05EEAD,
+    0x26483F6C, 0x7A24256E,
+    0x25E845B5, 0x7A4210D8,
+    0x2588349D, 0x7A5FB0D8,
+    0x25280C5D, 0x7A7D055B,
+    0x24C7CD32, 0x7A9A0E4F,
+    0x24677757, 0x7AB6CBA3,
+    0x24070B07, 0x7AD33D45,
+    0x23A6887E, 0x7AEF6323,
+    0x2345EFF7, 0x7B0B3D2C,
+    0x22E541AE, 0x7B26CB4F,
+    0x22847DDF, 0x7B420D7A,
+    0x2223A4C5, 0x7B5D039D,
+    0x21C2B69C, 0x7B77ADA8,
+    0x2161B39F, 0x7B920B89,
+    0x21009C0B, 0x7BAC1D31,
+    0x209F701C, 0x7BC5E28F,
+    0x203E300D, 0x7BDF5B94,
+    0x1FDCDC1A, 0x7BF88830,
+    0x1F7B7480, 0x7C116853,
+    0x1F19F97B, 0x7C29FBEE,
+    0x1EB86B46, 0x7C4242F2,
+    0x1E56CA1E, 0x7C5A3D4F,
+    0x1DF5163F, 0x7C71EAF8,
+    0x1D934FE5, 0x7C894BDD,
+    0x1D31774D, 0x7CA05FF1,
+    0x1CCF8CB3, 0x7CB72724,
+    0x1C6D9053, 0x7CCDA168,
+    0x1C0B826A, 0x7CE3CEB1,
+    0x1BA96334, 0x7CF9AEF0,
+    0x1B4732EF, 0x7D0F4218,
+    0x1AE4F1D6, 0x7D24881A,
+    0x1A82A025, 0x7D3980EC,
+    0x1A203E1B, 0x7D4E2C7E,
+    0x19BDCBF2, 0x7D628AC5,
+    0x195B49E9, 0x7D769BB5,
+    0x18F8B83C, 0x7D8A5F3F,
+    0x18961727, 0x7D9DD55A,
+    0x183366E8, 0x7DB0FDF7,
+    0x17D0A7BB, 0x7DC3D90D,
+    0x176DD9DE, 0x7DD6668E,
+    0x170AFD8D, 0x7DE8A670,
+    0x16A81305, 0x7DFA98A7,
+    0x16451A83, 0x7E0C3D29,
+    0x15E21444, 0x7E1D93E9,
+    0x157F0086, 0x7E2E9CDF,
+    0x151BDF85, 0x7E3F57FE,
+    0x14B8B17F, 0x7E4FC53E,
+    0x145576B1, 0x7E5FE493,
+    0x13F22F57, 0x7E6FB5F3,
+    0x138EDBB0, 0x7E7F3956,
+    0x132B7BF9, 0x7E8E6EB1,
+    0x12C8106E, 0x7E9D55FC,
+    0x1264994E, 0x7EABEF2C,
+    0x120116D4, 0x7EBA3A39,
+    0x119D8940, 0x7EC8371A,
+    0x1139F0CE, 0x7ED5E5C6,
+    0x10D64DBC, 0x7EE34635,
+    0x1072A047, 0x7EF0585F,
+    0x100EE8AD, 0x7EFD1C3C,
+    0x0FAB272B, 0x7F0991C3,
+    0x0F475BFE, 0x7F15B8EE,
+    0x0EE38765, 0x7F2191B4,
+    0x0E7FA99D, 0x7F2D1C0E,
+    0x0E1BC2E3, 0x7F3857F5,
+    0x0DB7D376, 0x7F434563,
+    0x0D53DB92, 0x7F4DE450,
+    0x0CEFDB75, 0x7F5834B6,
+    0x0C8BD35E, 0x7F62368F,
+    0x0C27C389, 0x7F6BE9D4,
+    0x0BC3AC35, 0x7F754E7F,
+    0x0B5F8D9F, 0x7F7E648B,
+    0x0AFB6805, 0x7F872BF3,
+    0x0A973BA5, 0x7F8FA4AF,
+    0x0A3308BC, 0x7F97CEBC,
+    0x09CECF89, 0x7F9FAA15,
+    0x096A9049, 0x7FA736B4,
+    0x09064B3A, 0x7FAE7494,
+    0x08A2009A, 0x7FB563B2,
+    0x083DB0A7, 0x7FBC040A,
+    0x07D95B9E, 0x7FC25596,
+    0x077501BE, 0x7FC85853,
+    0x0710A344, 0x7FCE0C3E,
+    0x06AC406F, 0x7FD37152,
+    0x0647D97C, 0x7FD8878D,
+    0x05E36EA9, 0x7FDD4EEC,
+    0x057F0034, 0x7FE1C76B,
+    0x051A8E5C, 0x7FE5F108,
+    0x04B6195D, 0x7FE9CBC0,
+    0x0451A176, 0x7FED5790,
+    0x03ED26E6, 0x7FF09477,
+    0x0388A9E9, 0x7FF38273,
+    0x03242ABF, 0x7FF62182,
+    0x02BFA9A4, 0x7FF871A1,
+    0x025B26D7, 0x7FFA72D1,
+    0x01F6A296, 0x7FFC250F,
+    0x01921D1F, 0x7FFD885A,
+    0x012D96B0, 0x7FFE9CB2,
+    0x00C90F88, 0x7FFF6216,
+    0x006487E3, 0x7FFFD885,
+    0x00000000, 0x7FFFFFFF,
+    0xFF9B781D, 0x7FFFD885,
+    0xFF36F078, 0x7FFF6216,
+    0xFED2694F, 0x7FFE9CB2,
+    0xFE6DE2E0, 0x7FFD885A,
+    0xFE095D69, 0x7FFC250F,
+    0xFDA4D928, 0x7FFA72D1,
+    0xFD40565B, 0x7FF871A1,
+    0xFCDBD541, 0x7FF62182,
+    0xFC775616, 0x7FF38273,
+    0xFC12D919, 0x7FF09477,
+    0xFBAE5E89, 0x7FED5790,
+    0xFB49E6A2, 0x7FE9CBC0,
+    0xFAE571A4, 0x7FE5F108,
+    0xFA80FFCB, 0x7FE1C76B,
+    0xFA1C9156, 0x7FDD4EEC,
+    0xF9B82683, 0x7FD8878D,
+    0xF953BF90, 0x7FD37152,
+    0xF8EF5CBB, 0x7FCE0C3E,
+    0xF88AFE41, 0x7FC85853,
+    0xF826A461, 0x7FC25596,
+    0xF7C24F58, 0x7FBC040A,
+    0xF75DFF65, 0x7FB563B2,
+    0xF6F9B4C5, 0x7FAE7494,
+    0xF6956FB6, 0x7FA736B4,
+    0xF6313076, 0x7F9FAA15,
+    0xF5CCF743, 0x7F97CEBC,
+    0xF568C45A, 0x7F8FA4AF,
+    0xF50497FA, 0x7F872BF3,
+    0xF4A07260, 0x7F7E648B,
+    0xF43C53CA, 0x7F754E7F,
+    0xF3D83C76, 0x7F6BE9D4,
+    0xF3742CA1, 0x7F62368F,
+    0xF310248A, 0x7F5834B6,
+    0xF2AC246D, 0x7F4DE450,
+    0xF2482C89, 0x7F434563,
+    0xF1E43D1C, 0x7F3857F5,
+    0xF1805662, 0x7F2D1C0E,
+    0xF11C789A, 0x7F2191B4,
+    0xF0B8A401, 0x7F15B8EE,
+    0xF054D8D4, 0x7F0991C3,
+    0xEFF11752, 0x7EFD1C3C,
+    0xEF8D5FB8, 0x7EF0585F,
+    0xEF29B243, 0x7EE34635,
+    0xEEC60F31, 0x7ED5E5C6,
+    0xEE6276BF, 0x7EC8371A,
+    0xEDFEE92B, 0x7EBA3A39,
+    0xED9B66B2, 0x7EABEF2C,
+    0xED37EF91, 0x7E9D55FC,
+    0xECD48406, 0x7E8E6EB1,
+    0xEC71244F, 0x7E7F3956,
+    0xEC0DD0A8, 0x7E6FB5F3,
+    0xEBAA894E, 0x7E5FE493,
+    0xEB474E80, 0x7E4FC53E,
+    0xEAE4207A, 0x7E3F57FE,
+    0xEA80FF79, 0x7E2E9CDF,
+    0xEA1DEBBB, 0x7E1D93E9,
+    0xE9BAE57C, 0x7E0C3D29,
+    0xE957ECFB, 0x7DFA98A7,
+    0xE8F50273, 0x7DE8A670,
+    0xE8922621, 0x7DD6668E,
+    0xE82F5844, 0x7DC3D90D,
+    0xE7CC9917, 0x7DB0FDF7,
+    0xE769E8D8, 0x7D9DD55A,
+    0xE70747C3, 0x7D8A5F3F,
+    0xE6A4B616, 0x7D769BB5,
+    0xE642340D, 0x7D628AC5,
+    0xE5DFC1E4, 0x7D4E2C7E,
+    0xE57D5FDA, 0x7D3980EC,
+    0xE51B0E2A, 0x7D24881A,
+    0xE4B8CD10, 0x7D0F4218,
+    0xE4569CCB, 0x7CF9AEF0,
+    0xE3F47D95, 0x7CE3CEB1,
+    0xE3926FAC, 0x7CCDA168,
+    0xE330734C, 0x7CB72724,
+    0xE2CE88B2, 0x7CA05FF1,
+    0xE26CB01A, 0x7C894BDD,
+    0xE20AE9C1, 0x7C71EAF8,
+    0xE1A935E1, 0x7C5A3D4F,
+    0xE14794B9, 0x7C4242F2,
+    0xE0E60684, 0x7C29FBEE,
+    0xE0848B7F, 0x7C116853,
+    0xE02323E5, 0x7BF88830,
+    0xDFC1CFF2, 0x7BDF5B94,
+    0xDF608FE3, 0x7BC5E28F,
+    0xDEFF63F4, 0x7BAC1D31,
+    0xDE9E4C60, 0x7B920B89,
+    0xDE3D4963, 0x7B77ADA8,
+    0xDDDC5B3A, 0x7B5D039D,
+    0xDD7B8220, 0x7B420D7A,
+    0xDD1ABE51, 0x7B26CB4F,
+    0xDCBA1008, 0x7B0B3D2C,
+    0xDC597781, 0x7AEF6323,
+    0xDBF8F4F8, 0x7AD33D45,
+    0xDB9888A8, 0x7AB6CBA3,
+    0xDB3832CD, 0x7A9A0E4F,
+    0xDAD7F3A2, 0x7A7D055B,
+    0xDA77CB62, 0x7A5FB0D8,
+    0xDA17BA4A, 0x7A4210D8,
+    0xD9B7C093, 0x7A24256E,
+    0xD957DE7A, 0x7A05EEAD,
+    0xD8F81439, 0x79E76CA6,
+    0xD898620C, 0x79C89F6D,
+    0xD838C82D, 0x79A98715,
+    0xD7D946D7, 0x798A23B1,
+    0xD779DE46, 0x796A7554,
+    0xD71A8EB5, 0x794A7C11,
+    0xD6BB585D, 0x792A37FE,
+    0xD65C3B7B, 0x7909A92C,
+    0xD5FD3847, 0x78E8CFB1,
+    0xD59E4EFE, 0x78C7ABA1,
+    0xD53F7FDA, 0x78A63D10,
+    0xD4E0CB14, 0x78848413,
+    0xD48230E8, 0x786280BF,
+    0xD423B190, 0x78403328,
+    0xD3C54D46, 0x781D9B64,
+    0xD3670445, 0x77FAB988,
+    0xD308D6C6, 0x77D78DAA,
+    0xD2AAC504, 0x77B417DF,
+    0xD24CCF38, 0x7790583D,
+    0xD1EEF59E, 0x776C4EDB,
+    0xD191386D, 0x7747FBCE,
+    0xD13397E1, 0x77235F2D,
+    0xD0D61433, 0x76FE790E,
+    0xD078AD9D, 0x76D94988,
+    0xD01B6459, 0x76B3D0B3,
+    0xCFBE389F, 0x768E0EA5,
+    0xCF612AAA, 0x76680376,
+    0xCF043AB2, 0x7641AF3C,
+    0xCEA768F2, 0x761B1211,
+    0xCE4AB5A2, 0x75F42C0A,
+    0xCDEE20FC, 0x75CCFD42,
+    0xCD91AB38, 0x75A585CF,
+    0xCD355490, 0x757DC5CA,
+    0xCCD91D3D, 0x7555BD4B,
+    0xCC7D0577, 0x752D6C6C,
+    0xCC210D78, 0x7504D345,
+    0xCBC53578, 0x74DBF1EF,
+    0xCB697DB0, 0x74B2C883,
+    0xCB0DE658, 0x7489571B,
+    0xCAB26FA9, 0x745F9DD1,
+    0xCA5719DB, 0x74359CBD,
+    0xC9FBE527, 0x740B53FA,
+    0xC9A0D1C4, 0x73E0C3A3,
+    0xC945DFEC, 0x73B5EBD0,
+    0xC8EB0FD6, 0x738ACC9E,
+    0xC89061BA, 0x735F6626,
+    0xC835D5D0, 0x7333B883,
+    0xC7DB6C50, 0x7307C3D0,
+    0xC7812571, 0x72DB8828,
+    0xC727016C, 0x72AF05A6,
+    0xC6CD0079, 0x72823C66,
+    0xC67322CD, 0x72552C84,
+    0xC61968A2, 0x7227D61C,
+    0xC5BFD22E, 0x71FA3948,
+    0xC5665FA8, 0x71CC5626,
+    0xC50D1148, 0x719E2CD2,
+    0xC4B3E746, 0x716FBD68,
+    0xC45AE1D7, 0x71410804,
+    0xC4020132, 0x71120CC5,
+    0xC3A9458F, 0x70E2CBC6,
+    0xC350AF25, 0x70B34524,
+    0xC2F83E2A, 0x708378FE,
+    0xC29FF2D4, 0x70536771,
+    0xC247CD5A, 0x70231099,
+    0xC1EFCDF2, 0x6FF27496,
+    0xC197F4D3, 0x6FC19385,
+    0xC1404233, 0x6F906D84,
+    0xC0E8B648, 0x6F5F02B1,
+    0xC0915147, 0x6F2D532C,
+    0xC03A1368, 0x6EFB5F12,
+    0xBFE2FCDF, 0x6EC92682,
+    0xBF8C0DE2, 0x6E96A99C,
+    0xBF3546A8, 0x6E63E87F,
+    0xBEDEA765, 0x6E30E349,
+    0xBE88304F, 0x6DFD9A1B,
+    0xBE31E19B, 0x6DCA0D14,
+    0xBDDBBB7F, 0x6D963C54,
+    0xBD85BE2F, 0x6D6227FA,
+    0xBD2FE9E1, 0x6D2DD027,
+    0xBCDA3ECA, 0x6CF934FB,
+    0xBC84BD1E, 0x6CC45697,
+    0xBC2F6513, 0x6C8F351C,
+    0xBBDA36DC, 0x6C59D0A9,
+    0xBB8532AF, 0x6C242960,
+    0xBB3058C0, 0x6BEE3F62,
+    0xBADBA943, 0x6BB812D0,
+    0xBA87246C, 0x6B81A3CD,
+    0xBA32CA70, 0x6B4AF278,
+    0xB9DE9B83, 0x6B13FEF5,
+    0xB98A97D8, 0x6ADCC964,
+    0xB936BFA3, 0x6AA551E8,
+    0xB8E31319, 0x6A6D98A4,
+    0xB88F926C, 0x6A359DB9,
+    0xB83C3DD1, 0x69FD614A,
+    0xB7E9157A, 0x69C4E37A,
+    0xB796199B, 0x698C246C,
+    0xB7434A67, 0x69532442,
+    0xB6F0A811, 0x6919E320,
+    0xB69E32CD, 0x68E06129,
+    0xB64BEACC, 0x68A69E81,
+    0xB5F9D042, 0x686C9B4B,
+    0xB5A7E362, 0x683257AA,
+    0xB556245E, 0x67F7D3C4,
+    0xB5049368, 0x67BD0FBC,
+    0xB4B330B2, 0x67820BB6,
+    0xB461FC70, 0x6746C7D7,
+    0xB410F6D2, 0x670B4443,
+    0xB3C0200C, 0x66CF811F,
+    0xB36F784E, 0x66937E90,
+    0xB31EFFCB, 0x66573CBB,
+    0xB2CEB6B5, 0x661ABBC5,
+    0xB27E9D3B, 0x65DDFBD3,
+    0xB22EB392, 0x65A0FD0B,
+    0xB1DEF9E8, 0x6563BF92,
+    0xB18F7070, 0x6526438E,
+    0xB140175B, 0x64E88926,
+    0xB0F0EEDA, 0x64AA907F,
+    0xB0A1F71C, 0x646C59BF,
+    0xB0533055, 0x642DE50D,
+    0xB0049AB2, 0x63EF328F,
+    0xAFB63667, 0x63B0426D,
+    0xAF6803A1, 0x637114CC,
+    0xAF1A0293, 0x6331A9D4,
+    0xAECC336B, 0x62F201AC,
+    0xAE7E965B, 0x62B21C7B,
+    0xAE312B91, 0x6271FA69,
+    0xADE3F33E, 0x62319B9D,
+    0xAD96ED91, 0x61F1003E,
+    0xAD4A1ABA, 0x61B02876,
+    0xACFD7AE8, 0x616F146B,
+    0xACB10E4A, 0x612DC446,
+    0xAC64D510, 0x60EC3830,
+    0xAC18CF68, 0x60AA704F,
+    0xABCCFD82, 0x60686CCE,
+    0xAB815F8C, 0x60262DD5,
+    0xAB35F5B5, 0x5FE3B38D,
+    0xAAEAC02B, 0x5FA0FE1E,
+    0xAA9FBF1D, 0x5F5E0DB3,
+    0xAA54F2B9, 0x5F1AE273,
+    0xAA0A5B2D, 0x5ED77C89,
+    0xA9BFF8A8, 0x5E93DC1F,
+    0xA975CB56, 0x5E50015D,
+    0xA92BD366, 0x5E0BEC6E,
+    0xA8E21106, 0x5DC79D7C,
+    0xA8988463, 0x5D8314B0,
+    0xA84F2DA9, 0x5D3E5236,
+    0xA8060D08, 0x5CF95638,
+    0xA7BD22AB, 0x5CB420DF,
+    0xA7746EC0, 0x5C6EB258,
+    0xA72BF173, 0x5C290ACC,
+    0xA6E3AAF2, 0x5BE32A67,
+    0xA69B9B68, 0x5B9D1153,
+    0xA653C302, 0x5B56BFBD,
+    0xA60C21ED, 0x5B1035CF,
+    0xA5C4B855, 0x5AC973B4,
+    0xA57D8666, 0x5A82799A,
+    0xA5368C4B, 0x5A3B47AA,
+    0xA4EFCA31, 0x59F3DE12,
+    0xA4A94042, 0x59AC3CFD,
+    0xA462EEAC, 0x59646497,
+    0xA41CD598, 0x591C550E,
+    0xA3D6F533, 0x58D40E8C,
+    0xA3914DA7, 0x588B913F,
+    0xA34BDF20, 0x5842DD54,
+    0xA306A9C7, 0x57F9F2F7,
+    0xA2C1ADC9, 0x57B0D256,
+    0xA27CEB4F, 0x57677B9D,
+    0xA2386283, 0x571DEEF9,
+    0xA1F41391, 0x56D42C99,
+    0xA1AFFEA2, 0x568A34A9,
+    0xA16C23E1, 0x56400757,
+    0xA1288376, 0x55F5A4D2,
+    0xA0E51D8C, 0x55AB0D46,
+    0xA0A1F24C, 0x556040E2,
+    0xA05F01E1, 0x55153FD4,
+    0xA01C4C72, 0x54CA0A4A,
+    0x9FD9D22A, 0x547EA073,
+    0x9F979331, 0x5433027D,
+    0x9F558FB0, 0x53E73097,
+    0x9F13C7D0, 0x539B2AEF,
+    0x9ED23BB9, 0x534EF1B5,
+    0x9E90EB94, 0x53028517,
+    0x9E4FD789, 0x52B5E545,
+    0x9E0EFFC1, 0x5269126E,
+    0x9DCE6462, 0x521C0CC1,
+    0x9D8E0596, 0x51CED46E,
+    0x9D4DE384, 0x518169A4,
+    0x9D0DFE53, 0x5133CC94,
+    0x9CCE562B, 0x50E5FD6C,
+    0x9C8EEB33, 0x5097FC5E,
+    0x9C4FBD92, 0x5049C999,
+    0x9C10CD70, 0x4FFB654D,
+    0x9BD21AF2, 0x4FACCFAB,
+    0x9B93A640, 0x4F5E08E3,
+    0x9B556F80, 0x4F0F1126,
+    0x9B1776D9, 0x4EBFE8A4,
+    0x9AD9BC71, 0x4E708F8F,
+    0x9A9C406D, 0x4E210617,
+    0x9A5F02F5, 0x4DD14C6E,
+    0x9A22042C, 0x4D8162C4,
+    0x99E5443A, 0x4D31494B,
+    0x99A8C344, 0x4CE10034,
+    0x996C816F, 0x4C9087B1,
+    0x99307EE0, 0x4C3FDFF3,
+    0x98F4BBBC, 0x4BEF092D,
+    0x98B93828, 0x4B9E038F,
+    0x987DF449, 0x4B4CCF4D,
+    0x9842F043, 0x4AFB6C97,
+    0x98082C3B, 0x4AA9DBA1,
+    0x97CDA855, 0x4A581C9D,
+    0x979364B5, 0x4A062FBD,
+    0x9759617E, 0x49B41533,
+    0x971F9ED6, 0x4961CD32,
+    0x96E61CDF, 0x490F57EE,
+    0x96ACDBBD, 0x48BCB598,
+    0x9673DB94, 0x4869E664,
+    0x963B1C85, 0x4816EA85,
+    0x96029EB5, 0x47C3C22E,
+    0x95CA6246, 0x47706D93,
+    0x9592675B, 0x471CECE6,
+    0x955AAE17, 0x46C9405C,
+    0x9523369B, 0x46756827,
+    0x94EC010B, 0x4621647C,
+    0x94B50D87, 0x45CD358F,
+    0x947E5C32, 0x4578DB93,
+    0x9447ED2F, 0x452456BC,
+    0x9411C09D, 0x44CFA73F,
+    0x93DBD69F, 0x447ACD50,
+    0x93A62F56, 0x4425C923,
+    0x9370CAE4, 0x43D09AEC,
+    0x933BA968, 0x437B42E1,
+    0x9306CB04, 0x4325C135,
+    0x92D22FD8, 0x42D0161E,
+    0x929DD805, 0x427A41D0,
+    0x9269C3AC, 0x42244480,
+    0x9235F2EB, 0x41CE1E64,
+    0x920265E4, 0x4177CFB0,
+    0x91CF1CB6, 0x4121589A,
+    0x919C1780, 0x40CAB957,
+    0x91695663, 0x4073F21D,
+    0x9136D97D, 0x401D0320,
+    0x9104A0ED, 0x3FC5EC97,
+    0x90D2ACD3, 0x3F6EAEB8,
+    0x90A0FD4E, 0x3F1749B7,
+    0x906F927B, 0x3EBFBDCC,
+    0x903E6C7A, 0x3E680B2C,
+    0x900D8B69, 0x3E10320D,
+    0x8FDCEF66, 0x3DB832A5,
+    0x8FAC988E, 0x3D600D2B,
+    0x8F7C8701, 0x3D07C1D5,
+    0x8F4CBADB, 0x3CAF50DA,
+    0x8F1D343A, 0x3C56BA70,
+    0x8EEDF33B, 0x3BFDFECD,
+    0x8EBEF7FB, 0x3BA51E29,
+    0x8E904298, 0x3B4C18BA,
+    0x8E61D32D, 0x3AF2EEB7,
+    0x8E33A9D9, 0x3A99A057,
+    0x8E05C6B7, 0x3A402DD1,
+    0x8DD829E4, 0x39E6975D,
+    0x8DAAD37B, 0x398CDD32,
+    0x8D7DC399, 0x3932FF87,
+    0x8D50FA59, 0x38D8FE93,
+    0x8D2477D8, 0x387EDA8E,
+    0x8CF83C30, 0x382493B0,
+    0x8CCC477D, 0x37CA2A30,
+    0x8CA099D9, 0x376F9E46,
+    0x8C753361, 0x3714F02A,
+    0x8C4A142F, 0x36BA2013,
+    0x8C1F3C5C, 0x365F2E3B,
+    0x8BF4AC05, 0x36041AD9,
+    0x8BCA6342, 0x35A8E624,
+    0x8BA0622F, 0x354D9056,
+    0x8B76A8E4, 0x34F219A7,
+    0x8B4D377C, 0x3496824F,
+    0x8B240E10, 0x343ACA87,
+    0x8AFB2CBA, 0x33DEF287,
+    0x8AD29393, 0x3382FA88,
+    0x8AAA42B4, 0x3326E2C2,
+    0x8A823A35, 0x32CAAB6F,
+    0x8A5A7A30, 0x326E54C7,
+    0x8A3302BD, 0x3211DF03,
+    0x8A0BD3F5, 0x31B54A5D,
+    0x89E4EDEE, 0x3158970D,
+    0x89BE50C3, 0x30FBC54D,
+    0x8997FC89, 0x309ED555,
+    0x8971F15A, 0x3041C760,
+    0x894C2F4C, 0x2FE49BA6,
+    0x8926B677, 0x2F875262,
+    0x890186F1, 0x2F29EBCC,
+    0x88DCA0D3, 0x2ECC681E,
+    0x88B80431, 0x2E6EC792,
+    0x8893B124, 0x2E110A62,
+    0x886FA7C2, 0x2DB330C7,
+    0x884BE820, 0x2D553AFB,
+    0x88287255, 0x2CF72939,
+    0x88054677, 0x2C98FBBA,
+    0x87E2649B, 0x2C3AB2B9,
+    0x87BFCCD7, 0x2BDC4E6F,
+    0x879D7F40, 0x2B7DCF17,
+    0x877B7BEC, 0x2B1F34EB,
+    0x8759C2EF, 0x2AC08025,
+    0x8738545E, 0x2A61B101,
+    0x8717304E, 0x2A02C7B8,
+    0x86F656D3, 0x29A3C484,
+    0x86D5C802, 0x2944A7A2,
+    0x86B583EE, 0x28E5714A,
+    0x86958AAB, 0x288621B9,
+    0x8675DC4E, 0x2826B928,
+    0x865678EA, 0x27C737D2,
+    0x86376092, 0x27679DF4,
+    0x86189359, 0x2707EBC6,
+    0x85FA1152, 0x26A82185,
+    0x85DBDA91, 0x26483F6C,
+    0x85BDEF27, 0x25E845B5,
+    0x85A04F28, 0x2588349D,
+    0x8582FAA4, 0x25280C5D,
+    0x8565F1B0, 0x24C7CD32,
+    0x8549345C, 0x24677757,
+    0x852CC2BA, 0x24070B07,
+    0x85109CDC, 0x23A6887E,
+    0x84F4C2D3, 0x2345EFF7,
+    0x84D934B0, 0x22E541AE,
+    0x84BDF285, 0x22847DDF,
+    0x84A2FC62, 0x2223A4C5,
+    0x84885257, 0x21C2B69C,
+    0x846DF476, 0x2161B39F,
+    0x8453E2CE, 0x21009C0B,
+    0x843A1D70, 0x209F701C,
+    0x8420A46B, 0x203E300D,
+    0x840777CF, 0x1FDCDC1A,
+    0x83EE97AC, 0x1F7B7480,
+    0x83D60411, 0x1F19F97B,
+    0x83BDBD0D, 0x1EB86B46,
+    0x83A5C2B0, 0x1E56CA1E,
+    0x838E1507, 0x1DF5163F,
+    0x8376B422, 0x1D934FE5,
+    0x835FA00E, 0x1D31774D,
+    0x8348D8DB, 0x1CCF8CB3,
+    0x83325E97, 0x1C6D9053,
+    0x831C314E, 0x1C0B826A,
+    0x8306510F, 0x1BA96334,
+    0x82F0BDE8, 0x1B4732EF,
+    0x82DB77E5, 0x1AE4F1D6,
+    0x82C67F13, 0x1A82A025,
+    0x82B1D381, 0x1A203E1B,
+    0x829D753A, 0x19BDCBF2,
+    0x8289644A, 0x195B49E9,
+    0x8275A0C0, 0x18F8B83C,
+    0x82622AA5, 0x18961727,
+    0x824F0208, 0x183366E8,
+    0x823C26F2, 0x17D0A7BB,
+    0x82299971, 0x176DD9DE,
+    0x8217598F, 0x170AFD8D,
+    0x82056758, 0x16A81305,
+    0x81F3C2D7, 0x16451A83,
+    0x81E26C16, 0x15E21444,
+    0x81D16320, 0x157F0086,
+    0x81C0A801, 0x151BDF85,
+    0x81B03AC1, 0x14B8B17F,
+    0x81A01B6C, 0x145576B1,
+    0x81904A0C, 0x13F22F57,
+    0x8180C6A9, 0x138EDBB0,
+    0x8171914E, 0x132B7BF9,
+    0x8162AA03, 0x12C8106E,
+    0x815410D3, 0x1264994E,
+    0x8145C5C6, 0x120116D4,
+    0x8137C8E6, 0x119D8940,
+    0x812A1A39, 0x1139F0CE,
+    0x811CB9CA, 0x10D64DBC,
+    0x810FA7A0, 0x1072A047,
+    0x8102E3C3, 0x100EE8AD,
+    0x80F66E3C, 0x0FAB272B,
+    0x80EA4712, 0x0F475BFE,
+    0x80DE6E4C, 0x0EE38765,
+    0x80D2E3F1, 0x0E7FA99D,
+    0x80C7A80A, 0x0E1BC2E3,
+    0x80BCBA9C, 0x0DB7D376,
+    0x80B21BAF, 0x0D53DB92,
+    0x80A7CB49, 0x0CEFDB75,
+    0x809DC970, 0x0C8BD35E,
+    0x8094162B, 0x0C27C389,
+    0x808AB180, 0x0BC3AC35,
+    0x80819B74, 0x0B5F8D9F,
+    0x8078D40D, 0x0AFB6805,
+    0x80705B50, 0x0A973BA5,
+    0x80683143, 0x0A3308BC,
+    0x806055EA, 0x09CECF89,
+    0x8058C94C, 0x096A9049,
+    0x80518B6B, 0x09064B3A,
+    0x804A9C4D, 0x08A2009A,
+    0x8043FBF6, 0x083DB0A7,
+    0x803DAA69, 0x07D95B9E,
+    0x8037A7AC, 0x077501BE,
+    0x8031F3C1, 0x0710A344,
+    0x802C8EAD, 0x06AC406F,
+    0x80277872, 0x0647D97C,
+    0x8022B113, 0x05E36EA9,
+    0x801E3894, 0x057F0034,
+    0x801A0EF7, 0x051A8E5C,
+    0x80163440, 0x04B6195D,
+    0x8012A86F, 0x0451A176,
+    0x800F6B88, 0x03ED26E6,
+    0x800C7D8C, 0x0388A9E9,
+    0x8009DE7D, 0x03242ABF,
+    0x80078E5E, 0x02BFA9A4,
+    0x80058D2E, 0x025B26D7,
+    0x8003DAF0, 0x01F6A296,
+    0x800277A5, 0x01921D1F,
+    0x8001634D, 0x012D96B0,
+    0x80009DE9, 0x00C90F88,
+    0x8000277A, 0x006487E3,
+    0x80000000, 0x00000000,
+    0x8000277A, 0xFF9B781D,
+    0x80009DE9, 0xFF36F078,
+    0x8001634D, 0xFED2694F,
+    0x800277A5, 0xFE6DE2E0,
+    0x8003DAF0, 0xFE095D69,
+    0x80058D2E, 0xFDA4D928,
+    0x80078E5E, 0xFD40565B,
+    0x8009DE7D, 0xFCDBD541,
+    0x800C7D8C, 0xFC775616,
+    0x800F6B88, 0xFC12D919,
+    0x8012A86F, 0xFBAE5E89,
+    0x80163440, 0xFB49E6A2,
+    0x801A0EF7, 0xFAE571A4,
+    0x801E3894, 0xFA80FFCB,
+    0x8022B113, 0xFA1C9156,
+    0x80277872, 0xF9B82683,
+    0x802C8EAD, 0xF953BF90,
+    0x8031F3C1, 0xF8EF5CBB,
+    0x8037A7AC, 0xF88AFE41,
+    0x803DAA69, 0xF826A461,
+    0x8043FBF6, 0xF7C24F58,
+    0x804A9C4D, 0xF75DFF65,
+    0x80518B6B, 0xF6F9B4C5,
+    0x8058C94C, 0xF6956FB6,
+    0x806055EA, 0xF6313076,
+    0x80683143, 0xF5CCF743,
+    0x80705B50, 0xF568C45A,
+    0x8078D40D, 0xF50497FA,
+    0x80819B74, 0xF4A07260,
+    0x808AB180, 0xF43C53CA,
+    0x8094162B, 0xF3D83C76,
+    0x809DC970, 0xF3742CA1,
+    0x80A7CB49, 0xF310248A,
+    0x80B21BAF, 0xF2AC246D,
+    0x80BCBA9C, 0xF2482C89,
+    0x80C7A80A, 0xF1E43D1C,
+    0x80D2E3F1, 0xF1805662,
+    0x80DE6E4C, 0xF11C789A,
+    0x80EA4712, 0xF0B8A401,
+    0x80F66E3C, 0xF054D8D4,
+    0x8102E3C3, 0xEFF11752,
+    0x810FA7A0, 0xEF8D5FB8,
+    0x811CB9CA, 0xEF29B243,
+    0x812A1A39, 0xEEC60F31,
+    0x8137C8E6, 0xEE6276BF,
+    0x8145C5C6, 0xEDFEE92B,
+    0x815410D3, 0xED9B66B2,
+    0x8162AA03, 0xED37EF91,
+    0x8171914E, 0xECD48406,
+    0x8180C6A9, 0xEC71244F,
+    0x81904A0C, 0xEC0DD0A8,
+    0x81A01B6C, 0xEBAA894E,
+    0x81B03AC1, 0xEB474E80,
+    0x81C0A801, 0xEAE4207A,
+    0x81D16320, 0xEA80FF79,
+    0x81E26C16, 0xEA1DEBBB,
+    0x81F3C2D7, 0xE9BAE57C,
+    0x82056758, 0xE957ECFB,
+    0x8217598F, 0xE8F50273,
+    0x82299971, 0xE8922621,
+    0x823C26F2, 0xE82F5844,
+    0x824F0208, 0xE7CC9917,
+    0x82622AA5, 0xE769E8D8,
+    0x8275A0C0, 0xE70747C3,
+    0x8289644A, 0xE6A4B616,
+    0x829D753A, 0xE642340D,
+    0x82B1D381, 0xE5DFC1E4,
+    0x82C67F13, 0xE57D5FDA,
+    0x82DB77E5, 0xE51B0E2A,
+    0x82F0BDE8, 0xE4B8CD10,
+    0x8306510F, 0xE4569CCB,
+    0x831C314E, 0xE3F47D95,
+    0x83325E97, 0xE3926FAC,
+    0x8348D8DB, 0xE330734C,
+    0x835FA00E, 0xE2CE88B2,
+    0x8376B422, 0xE26CB01A,
+    0x838E1507, 0xE20AE9C1,
+    0x83A5C2B0, 0xE1A935E1,
+    0x83BDBD0D, 0xE14794B9,
+    0x83D60411, 0xE0E60684,
+    0x83EE97AC, 0xE0848B7F,
+    0x840777CF, 0xE02323E5,
+    0x8420A46B, 0xDFC1CFF2,
+    0x843A1D70, 0xDF608FE3,
+    0x8453E2CE, 0xDEFF63F4,
+    0x846DF476, 0xDE9E4C60,
+    0x84885257, 0xDE3D4963,
+    0x84A2FC62, 0xDDDC5B3A,
+    0x84BDF285, 0xDD7B8220,
+    0x84D934B0, 0xDD1ABE51,
+    0x84F4C2D3, 0xDCBA1008,
+    0x85109CDC, 0xDC597781,
+    0x852CC2BA, 0xDBF8F4F8,
+    0x8549345C, 0xDB9888A8,
+    0x8565F1B0, 0xDB3832CD,
+    0x8582FAA4, 0xDAD7F3A2,
+    0x85A04F28, 0xDA77CB62,
+    0x85BDEF27, 0xDA17BA4A,
+    0x85DBDA91, 0xD9B7C093,
+    0x85FA1152, 0xD957DE7A,
+    0x86189359, 0xD8F81439,
+    0x86376092, 0xD898620C,
+    0x865678EA, 0xD838C82D,
+    0x8675DC4E, 0xD7D946D7,
+    0x86958AAB, 0xD779DE46,
+    0x86B583EE, 0xD71A8EB5,
+    0x86D5C802, 0xD6BB585D,
+    0x86F656D3, 0xD65C3B7B,
+    0x8717304E, 0xD5FD3847,
+    0x8738545E, 0xD59E4EFE,
+    0x8759C2EF, 0xD53F7FDA,
+    0x877B7BEC, 0xD4E0CB14,
+    0x879D7F40, 0xD48230E8,
+    0x87BFCCD7, 0xD423B190,
+    0x87E2649B, 0xD3C54D46,
+    0x88054677, 0xD3670445,
+    0x88287255, 0xD308D6C6,
+    0x884BE820, 0xD2AAC504,
+    0x886FA7C2, 0xD24CCF38,
+    0x8893B124, 0xD1EEF59E,
+    0x88B80431, 0xD191386D,
+    0x88DCA0D3, 0xD13397E1,
+    0x890186F1, 0xD0D61433,
+    0x8926B677, 0xD078AD9D,
+    0x894C2F4C, 0xD01B6459,
+    0x8971F15A, 0xCFBE389F,
+    0x8997FC89, 0xCF612AAA,
+    0x89BE50C3, 0xCF043AB2,
+    0x89E4EDEE, 0xCEA768F2,
+    0x8A0BD3F5, 0xCE4AB5A2,
+    0x8A3302BD, 0xCDEE20FC,
+    0x8A5A7A30, 0xCD91AB38,
+    0x8A823A35, 0xCD355490,
+    0x8AAA42B4, 0xCCD91D3D,
+    0x8AD29393, 0xCC7D0577,
+    0x8AFB2CBA, 0xCC210D78,
+    0x8B240E10, 0xCBC53578,
+    0x8B4D377C, 0xCB697DB0,
+    0x8B76A8E4, 0xCB0DE658,
+    0x8BA0622F, 0xCAB26FA9,
+    0x8BCA6342, 0xCA5719DB,
+    0x8BF4AC05, 0xC9FBE527,
+    0x8C1F3C5C, 0xC9A0D1C4,
+    0x8C4A142F, 0xC945DFEC,
+    0x8C753361, 0xC8EB0FD6,
+    0x8CA099D9, 0xC89061BA,
+    0x8CCC477D, 0xC835D5D0,
+    0x8CF83C30, 0xC7DB6C50,
+    0x8D2477D8, 0xC7812571,
+    0x8D50FA59, 0xC727016C,
+    0x8D7DC399, 0xC6CD0079,
+    0x8DAAD37B, 0xC67322CD,
+    0x8DD829E4, 0xC61968A2,
+    0x8E05C6B7, 0xC5BFD22E,
+    0x8E33A9D9, 0xC5665FA8,
+    0x8E61D32D, 0xC50D1148,
+    0x8E904298, 0xC4B3E746,
+    0x8EBEF7FB, 0xC45AE1D7,
+    0x8EEDF33B, 0xC4020132,
+    0x8F1D343A, 0xC3A9458F,
+    0x8F4CBADB, 0xC350AF25,
+    0x8F7C8701, 0xC2F83E2A,
+    0x8FAC988E, 0xC29FF2D4,
+    0x8FDCEF66, 0xC247CD5A,
+    0x900D8B69, 0xC1EFCDF2,
+    0x903E6C7A, 0xC197F4D3,
+    0x906F927B, 0xC1404233,
+    0x90A0FD4E, 0xC0E8B648,
+    0x90D2ACD3, 0xC0915147,
+    0x9104A0ED, 0xC03A1368,
+    0x9136D97D, 0xBFE2FCDF,
+    0x91695663, 0xBF8C0DE2,
+    0x919C1780, 0xBF3546A8,
+    0x91CF1CB6, 0xBEDEA765,
+    0x920265E4, 0xBE88304F,
+    0x9235F2EB, 0xBE31E19B,
+    0x9269C3AC, 0xBDDBBB7F,
+    0x929DD805, 0xBD85BE2F,
+    0x92D22FD8, 0xBD2FE9E1,
+    0x9306CB04, 0xBCDA3ECA,
+    0x933BA968, 0xBC84BD1E,
+    0x9370CAE4, 0xBC2F6513,
+    0x93A62F56, 0xBBDA36DC,
+    0x93DBD69F, 0xBB8532AF,
+    0x9411C09D, 0xBB3058C0,
+    0x9447ED2F, 0xBADBA943,
+    0x947E5C32, 0xBA87246C,
+    0x94B50D87, 0xBA32CA70,
+    0x94EC010B, 0xB9DE9B83,
+    0x9523369B, 0xB98A97D8,
+    0x955AAE17, 0xB936BFA3,
+    0x9592675B, 0xB8E31319,
+    0x95CA6246, 0xB88F926C,
+    0x96029EB5, 0xB83C3DD1,
+    0x963B1C85, 0xB7E9157A,
+    0x9673DB94, 0xB796199B,
+    0x96ACDBBD, 0xB7434A67,
+    0x96E61CDF, 0xB6F0A811,
+    0x971F9ED6, 0xB69E32CD,
+    0x9759617E, 0xB64BEACC,
+    0x979364B5, 0xB5F9D042,
+    0x97CDA855, 0xB5A7E362,
+    0x98082C3B, 0xB556245E,
+    0x9842F043, 0xB5049368,
+    0x987DF449, 0xB4B330B2,
+    0x98B93828, 0xB461FC70,
+    0x98F4BBBC, 0xB410F6D2,
+    0x99307EE0, 0xB3C0200C,
+    0x996C816F, 0xB36F784E,
+    0x99A8C344, 0xB31EFFCB,
+    0x99E5443A, 0xB2CEB6B5,
+    0x9A22042C, 0xB27E9D3B,
+    0x9A5F02F5, 0xB22EB392,
+    0x9A9C406D, 0xB1DEF9E8,
+    0x9AD9BC71, 0xB18F7070,
+    0x9B1776D9, 0xB140175B,
+    0x9B556F80, 0xB0F0EEDA,
+    0x9B93A640, 0xB0A1F71C,
+    0x9BD21AF2, 0xB0533055,
+    0x9C10CD70, 0xB0049AB2,
+    0x9C4FBD92, 0xAFB63667,
+    0x9C8EEB33, 0xAF6803A1,
+    0x9CCE562B, 0xAF1A0293,
+    0x9D0DFE53, 0xAECC336B,
+    0x9D4DE384, 0xAE7E965B,
+    0x9D8E0596, 0xAE312B91,
+    0x9DCE6462, 0xADE3F33E,
+    0x9E0EFFC1, 0xAD96ED91,
+    0x9E4FD789, 0xAD4A1ABA,
+    0x9E90EB94, 0xACFD7AE8,
+    0x9ED23BB9, 0xACB10E4A,
+    0x9F13C7D0, 0xAC64D510,
+    0x9F558FB0, 0xAC18CF68,
+    0x9F979331, 0xABCCFD82,
+    0x9FD9D22A, 0xAB815F8C,
+    0xA01C4C72, 0xAB35F5B5,
+    0xA05F01E1, 0xAAEAC02B,
+    0xA0A1F24C, 0xAA9FBF1D,
+    0xA0E51D8C, 0xAA54F2B9,
+    0xA1288376, 0xAA0A5B2D,
+    0xA16C23E1, 0xA9BFF8A8,
+    0xA1AFFEA2, 0xA975CB56,
+    0xA1F41391, 0xA92BD366,
+    0xA2386283, 0xA8E21106,
+    0xA27CEB4F, 0xA8988463,
+    0xA2C1ADC9, 0xA84F2DA9,
+    0xA306A9C7, 0xA8060D08,
+    0xA34BDF20, 0xA7BD22AB,
+    0xA3914DA7, 0xA7746EC0,
+    0xA3D6F533, 0xA72BF173,
+    0xA41CD598, 0xA6E3AAF2,
+    0xA462EEAC, 0xA69B9B68,
+    0xA4A94042, 0xA653C302,
+    0xA4EFCA31, 0xA60C21ED,
+    0xA5368C4B, 0xA5C4B855,
+    0xA57D8666, 0xA57D8666,
+    0xA5C4B855, 0xA5368C4B,
+    0xA60C21ED, 0xA4EFCA31,
+    0xA653C302, 0xA4A94042,
+    0xA69B9B68, 0xA462EEAC,
+    0xA6E3AAF2, 0xA41CD598,
+    0xA72BF173, 0xA3D6F533,
+    0xA7746EC0, 0xA3914DA7,
+    0xA7BD22AB, 0xA34BDF20,
+    0xA8060D08, 0xA306A9C7,
+    0xA84F2DA9, 0xA2C1ADC9,
+    0xA8988463, 0xA27CEB4F,
+    0xA8E21106, 0xA2386283,
+    0xA92BD366, 0xA1F41391,
+    0xA975CB56, 0xA1AFFEA2,
+    0xA9BFF8A8, 0xA16C23E1,
+    0xAA0A5B2D, 0xA1288376,
+    0xAA54F2B9, 0xA0E51D8C,
+    0xAA9FBF1D, 0xA0A1F24C,
+    0xAAEAC02B, 0xA05F01E1,
+    0xAB35F5B5, 0xA01C4C72,
+    0xAB815F8C, 0x9FD9D22A,
+    0xABCCFD82, 0x9F979331,
+    0xAC18CF68, 0x9F558FB0,
+    0xAC64D510, 0x9F13C7D0,
+    0xACB10E4A, 0x9ED23BB9,
+    0xACFD7AE8, 0x9E90EB94,
+    0xAD4A1ABA, 0x9E4FD789,
+    0xAD96ED91, 0x9E0EFFC1,
+    0xADE3F33E, 0x9DCE6462,
+    0xAE312B91, 0x9D8E0596,
+    0xAE7E965B, 0x9D4DE384,
+    0xAECC336B, 0x9D0DFE53,
+    0xAF1A0293, 0x9CCE562B,
+    0xAF6803A1, 0x9C8EEB33,
+    0xAFB63667, 0x9C4FBD92,
+    0xB0049AB2, 0x9C10CD70,
+    0xB0533055, 0x9BD21AF2,
+    0xB0A1F71C, 0x9B93A640,
+    0xB0F0EEDA, 0x9B556F80,
+    0xB140175B, 0x9B1776D9,
+    0xB18F7070, 0x9AD9BC71,
+    0xB1DEF9E8, 0x9A9C406D,
+    0xB22EB392, 0x9A5F02F5,
+    0xB27E9D3B, 0x9A22042C,
+    0xB2CEB6B5, 0x99E5443A,
+    0xB31EFFCB, 0x99A8C344,
+    0xB36F784E, 0x996C816F,
+    0xB3C0200C, 0x99307EE0,
+    0xB410F6D2, 0x98F4BBBC,
+    0xB461FC70, 0x98B93828,
+    0xB4B330B2, 0x987DF449,
+    0xB5049368, 0x9842F043,
+    0xB556245E, 0x98082C3B,
+    0xB5A7E362, 0x97CDA855,
+    0xB5F9D042, 0x979364B5,
+    0xB64BEACC, 0x9759617E,
+    0xB69E32CD, 0x971F9ED6,
+    0xB6F0A811, 0x96E61CDF,
+    0xB7434A67, 0x96ACDBBD,
+    0xB796199B, 0x9673DB94,
+    0xB7E9157A, 0x963B1C85,
+    0xB83C3DD1, 0x96029EB5,
+    0xB88F926C, 0x95CA6246,
+    0xB8E31319, 0x9592675B,
+    0xB936BFA3, 0x955AAE17,
+    0xB98A97D8, 0x9523369B,
+    0xB9DE9B83, 0x94EC010B,
+    0xBA32CA70, 0x94B50D87,
+    0xBA87246C, 0x947E5C32,
+    0xBADBA943, 0x9447ED2F,
+    0xBB3058C0, 0x9411C09D,
+    0xBB8532AF, 0x93DBD69F,
+    0xBBDA36DC, 0x93A62F56,
+    0xBC2F6513, 0x9370CAE4,
+    0xBC84BD1E, 0x933BA968,
+    0xBCDA3ECA, 0x9306CB04,
+    0xBD2FE9E1, 0x92D22FD8,
+    0xBD85BE2F, 0x929DD805,
+    0xBDDBBB7F, 0x9269C3AC,
+    0xBE31E19B, 0x9235F2EB,
+    0xBE88304F, 0x920265E4,
+    0xBEDEA765, 0x91CF1CB6,
+    0xBF3546A8, 0x919C1780,
+    0xBF8C0DE2, 0x91695663,
+    0xBFE2FCDF, 0x9136D97D,
+    0xC03A1368, 0x9104A0ED,
+    0xC0915147, 0x90D2ACD3,
+    0xC0E8B648, 0x90A0FD4E,
+    0xC1404233, 0x906F927B,
+    0xC197F4D3, 0x903E6C7A,
+    0xC1EFCDF2, 0x900D8B69,
+    0xC247CD5A, 0x8FDCEF66,
+    0xC29FF2D4, 0x8FAC988E,
+    0xC2F83E2A, 0x8F7C8701,
+    0xC350AF25, 0x8F4CBADB,
+    0xC3A9458F, 0x8F1D343A,
+    0xC4020132, 0x8EEDF33B,
+    0xC45AE1D7, 0x8EBEF7FB,
+    0xC4B3E746, 0x8E904298,
+    0xC50D1148, 0x8E61D32D,
+    0xC5665FA8, 0x8E33A9D9,
+    0xC5BFD22E, 0x8E05C6B7,
+    0xC61968A2, 0x8DD829E4,
+    0xC67322CD, 0x8DAAD37B,
+    0xC6CD0079, 0x8D7DC399,
+    0xC727016C, 0x8D50FA59,
+    0xC7812571, 0x8D2477D8,
+    0xC7DB6C50, 0x8CF83C30,
+    0xC835D5D0, 0x8CCC477D,
+    0xC89061BA, 0x8CA099D9,
+    0xC8EB0FD6, 0x8C753361,
+    0xC945DFEC, 0x8C4A142F,
+    0xC9A0D1C4, 0x8C1F3C5C,
+    0xC9FBE527, 0x8BF4AC05,
+    0xCA5719DB, 0x8BCA6342,
+    0xCAB26FA9, 0x8BA0622F,
+    0xCB0DE658, 0x8B76A8E4,
+    0xCB697DB0, 0x8B4D377C,
+    0xCBC53578, 0x8B240E10,
+    0xCC210D78, 0x8AFB2CBA,
+    0xCC7D0577, 0x8AD29393,
+    0xCCD91D3D, 0x8AAA42B4,
+    0xCD355490, 0x8A823A35,
+    0xCD91AB38, 0x8A5A7A30,
+    0xCDEE20FC, 0x8A3302BD,
+    0xCE4AB5A2, 0x8A0BD3F5,
+    0xCEA768F2, 0x89E4EDEE,
+    0xCF043AB2, 0x89BE50C3,
+    0xCF612AAA, 0x8997FC89,
+    0xCFBE389F, 0x8971F15A,
+    0xD01B6459, 0x894C2F4C,
+    0xD078AD9D, 0x8926B677,
+    0xD0D61433, 0x890186F1,
+    0xD13397E1, 0x88DCA0D3,
+    0xD191386D, 0x88B80431,
+    0xD1EEF59E, 0x8893B124,
+    0xD24CCF38, 0x886FA7C2,
+    0xD2AAC504, 0x884BE820,
+    0xD308D6C6, 0x88287255,
+    0xD3670445, 0x88054677,
+    0xD3C54D46, 0x87E2649B,
+    0xD423B190, 0x87BFCCD7,
+    0xD48230E8, 0x879D7F40,
+    0xD4E0CB14, 0x877B7BEC,
+    0xD53F7FDA, 0x8759C2EF,
+    0xD59E4EFE, 0x8738545E,
+    0xD5FD3847, 0x8717304E,
+    0xD65C3B7B, 0x86F656D3,
+    0xD6BB585D, 0x86D5C802,
+    0xD71A8EB5, 0x86B583EE,
+    0xD779DE46, 0x86958AAB,
+    0xD7D946D7, 0x8675DC4E,
+    0xD838C82D, 0x865678EA,
+    0xD898620C, 0x86376092,
+    0xD8F81439, 0x86189359,
+    0xD957DE7A, 0x85FA1152,
+    0xD9B7C093, 0x85DBDA91,
+    0xDA17BA4A, 0x85BDEF27,
+    0xDA77CB62, 0x85A04F28,
+    0xDAD7F3A2, 0x8582FAA4,
+    0xDB3832CD, 0x8565F1B0,
+    0xDB9888A8, 0x8549345C,
+    0xDBF8F4F8, 0x852CC2BA,
+    0xDC597781, 0x85109CDC,
+    0xDCBA1008, 0x84F4C2D3,
+    0xDD1ABE51, 0x84D934B0,
+    0xDD7B8220, 0x84BDF285,
+    0xDDDC5B3A, 0x84A2FC62,
+    0xDE3D4963, 0x84885257,
+    0xDE9E4C60, 0x846DF476,
+    0xDEFF63F4, 0x8453E2CE,
+    0xDF608FE3, 0x843A1D70,
+    0xDFC1CFF2, 0x8420A46B,
+    0xE02323E5, 0x840777CF,
+    0xE0848B7F, 0x83EE97AC,
+    0xE0E60684, 0x83D60411,
+    0xE14794B9, 0x83BDBD0D,
+    0xE1A935E1, 0x83A5C2B0,
+    0xE20AE9C1, 0x838E1507,
+    0xE26CB01A, 0x8376B422,
+    0xE2CE88B2, 0x835FA00E,
+    0xE330734C, 0x8348D8DB,
+    0xE3926FAC, 0x83325E97,
+    0xE3F47D95, 0x831C314E,
+    0xE4569CCB, 0x8306510F,
+    0xE4B8CD10, 0x82F0BDE8,
+    0xE51B0E2A, 0x82DB77E5,
+    0xE57D5FDA, 0x82C67F13,
+    0xE5DFC1E4, 0x82B1D381,
+    0xE642340D, 0x829D753A,
+    0xE6A4B616, 0x8289644A,
+    0xE70747C3, 0x8275A0C0,
+    0xE769E8D8, 0x82622AA5,
+    0xE7CC9917, 0x824F0208,
+    0xE82F5844, 0x823C26F2,
+    0xE8922621, 0x82299971,
+    0xE8F50273, 0x8217598F,
+    0xE957ECFB, 0x82056758,
+    0xE9BAE57C, 0x81F3C2D7,
+    0xEA1DEBBB, 0x81E26C16,
+    0xEA80FF79, 0x81D16320,
+    0xEAE4207A, 0x81C0A801,
+    0xEB474E80, 0x81B03AC1,
+    0xEBAA894E, 0x81A01B6C,
+    0xEC0DD0A8, 0x81904A0C,
+    0xEC71244F, 0x8180C6A9,
+    0xECD48406, 0x8171914E,
+    0xED37EF91, 0x8162AA03,
+    0xED9B66B2, 0x815410D3,
+    0xEDFEE92B, 0x8145C5C6,
+    0xEE6276BF, 0x8137C8E6,
+    0xEEC60F31, 0x812A1A39,
+    0xEF29B243, 0x811CB9CA,
+    0xEF8D5FB8, 0x810FA7A0,
+    0xEFF11752, 0x8102E3C3,
+    0xF054D8D4, 0x80F66E3C,
+    0xF0B8A401, 0x80EA4712,
+    0xF11C789A, 0x80DE6E4C,
+    0xF1805662, 0x80D2E3F1,
+    0xF1E43D1C, 0x80C7A80A,
+    0xF2482C89, 0x80BCBA9C,
+    0xF2AC246D, 0x80B21BAF,
+    0xF310248A, 0x80A7CB49,
+    0xF3742CA1, 0x809DC970,
+    0xF3D83C76, 0x8094162B,
+    0xF43C53CA, 0x808AB180,
+    0xF4A07260, 0x80819B74,
+    0xF50497FA, 0x8078D40D,
+    0xF568C45A, 0x80705B50,
+    0xF5CCF743, 0x80683143,
+    0xF6313076, 0x806055EA,
+    0xF6956FB6, 0x8058C94C,
+    0xF6F9B4C5, 0x80518B6B,
+    0xF75DFF65, 0x804A9C4D,
+    0xF7C24F58, 0x8043FBF6,
+    0xF826A461, 0x803DAA69,
+    0xF88AFE41, 0x8037A7AC,
+    0xF8EF5CBB, 0x8031F3C1,
+    0xF953BF90, 0x802C8EAD,
+    0xF9B82683, 0x80277872,
+    0xFA1C9156, 0x8022B113,
+    0xFA80FFCB, 0x801E3894,
+    0xFAE571A4, 0x801A0EF7,
+    0xFB49E6A2, 0x80163440,
+    0xFBAE5E89, 0x8012A86F,
+    0xFC12D919, 0x800F6B88,
+    0xFC775616, 0x800C7D8C,
+    0xFCDBD541, 0x8009DE7D,
+    0xFD40565B, 0x80078E5E,
+    0xFDA4D928, 0x80058D2E,
+    0xFE095D69, 0x8003DAF0,
+    0xFE6DE2E0, 0x800277A5,
+    0xFED2694F, 0x8001634D,
+    0xFF36F078, 0x80009DE9,
+    0xFF9B781D, 0x8000277A
+};
+
+/**
+* \par
+* Example code for Q31 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefQ31[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefQ31[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 4096 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored in interleaved fashion.
+* \par
+* Convert floating point to Q31 (fixed point 1.31):
+*	round(twiddleCoefQ31(i) * pow(2, 31))
+* \par
+* The first entry, cos(0), is stored as 0x7FFFFFFF because 2^31 does not fit in Q31.
+*
+*/
+const q31_t twiddleCoef_4096_q31[6144] = 
+{
+    0x7FFFFFFF, 0x00000000,
+    0x7FFFF621, 0x003243F5,
+    0x7FFFD885, 0x006487E3,
+    0x7FFFA72C, 0x0096CBC1,
+    0x7FFF6216, 0x00C90F88,
+    0x7FFF0942, 0x00FB532F,
+    0x7FFE9CB2, 0x012D96B0,
+    0x7FFE1C64, 0x015FDA03,
+    0x7FFD885A, 0x01921D1F,
+    0x7FFCE093, 0x01C45FFE,
+    0x7FFC250F, 0x01F6A296,
+    0x7FFB55CE, 0x0228E4E1,
+    0x7FFA72D1, 0x025B26D7,
+    0x7FF97C17, 0x028D6870,
+    0x7FF871A1, 0x02BFA9A4,
+    0x7FF7536F, 0x02F1EA6B,
+    0x7FF62182, 0x03242ABF,
+    0x7FF4DBD8, 0x03566A96,
+    0x7FF38273, 0x0388A9E9,
+    0x7FF21553, 0x03BAE8B1,
+    0x7FF09477, 0x03ED26E6,
+    0x7FEEFFE1, 0x041F647F,
+    0x7FED5790, 0x0451A176,
+    0x7FEB9B85, 0x0483DDC3,
+    0x7FE9CBC0, 0x04B6195D,
+    0x7FE7E840, 0x04E8543D,
+    0x7FE5F108, 0x051A8E5C,
+    0x7FE3E616, 0x054CC7B0,
+    0x7FE1C76B, 0x057F0034,
+    0x7FDF9508, 0x05B137DF,
+    0x7FDD4EEC, 0x05E36EA9,
+    0x7FDAF518, 0x0615A48A,
+    0x7FD8878D, 0x0647D97C,
+    0x7FD6064B, 0x067A0D75,
+    0x7FD37152, 0x06AC406F,
+    0x7FD0C8A3, 0x06DE7261,
+    0x7FCE0C3E, 0x0710A344,
+    0x7FCB3C23, 0x0742D310,
+    0x7FC85853, 0x077501BE,
+    0x7FC560CF, 0x07A72F45,
+    0x7FC25596, 0x07D95B9E,
+    0x7FBF36A9, 0x080B86C1,
+    0x7FBC040A, 0x083DB0A7,
+    0x7FB8BDB7, 0x086FD947,
+    0x7FB563B2, 0x08A2009A,
+    0x7FB1F5FC, 0x08D42698,
+    0x7FAE7494, 0x09064B3A,
+    0x7FAADF7C, 0x09386E77,
+    0x7FA736B4, 0x096A9049,
+    0x7FA37A3C, 0x099CB0A7,
+    0x7F9FAA15, 0x09CECF89,
+    0x7F9BC63F, 0x0A00ECE8,
+    0x7F97CEBC, 0x0A3308BC,
+    0x7F93C38C, 0x0A6522FE,
+    0x7F8FA4AF, 0x0A973BA5,
+    0x7F8B7226, 0x0AC952AA,
+    0x7F872BF3, 0x0AFB6805,
+    0x7F82D214, 0x0B2D7BAE,
+    0x7F7E648B, 0x0B5F8D9F,
+    0x7F79E35A, 0x0B919DCE,
+    0x7F754E7F, 0x0BC3AC35,
+    0x7F70A5FD, 0x0BF5B8CB,
+    0x7F6BE9D4, 0x0C27C389,
+    0x7F671A04, 0x0C59CC67,
+    0x7F62368F, 0x0C8BD35E,
+    0x7F5D3F75, 0x0CBDD865,
+    0x7F5834B6, 0x0CEFDB75,
+    0x7F531654, 0x0D21DC87,
+    0x7F4DE450, 0x0D53DB92,
+    0x7F489EAA, 0x0D85D88F,
+    0x7F434563, 0x0DB7D376,
+    0x7F3DD87C, 0x0DE9CC3F,
+    0x7F3857F5, 0x0E1BC2E3,
+    0x7F32C3D0, 0x0E4DB75B,
+    0x7F2D1C0E, 0x0E7FA99D,
+    0x7F2760AF, 0x0EB199A3,
+    0x7F2191B4, 0x0EE38765,
+    0x7F1BAF1E, 0x0F1572DC,
+    0x7F15B8EE, 0x0F475BFE,
+    0x7F0FAF24, 0x0F7942C6,
+    0x7F0991C3, 0x0FAB272B,
+    0x7F0360CB, 0x0FDD0925,
+    0x7EFD1C3C, 0x100EE8AD,
+    0x7EF6C418, 0x1040C5BB,
+    0x7EF0585F, 0x1072A047,
+    0x7EE9D913, 0x10A4784A,
+    0x7EE34635, 0x10D64DBC,
+    0x7EDC9FC6, 0x11082096,
+    0x7ED5E5C6, 0x1139F0CE,
+    0x7ECF1837, 0x116BBE5F,
+    0x7EC8371A, 0x119D8940,
+    0x7EC1426F, 0x11CF516A,
+    0x7EBA3A39, 0x120116D4,
+    0x7EB31E77, 0x1232D978,
+    0x7EABEF2C, 0x1264994E,
+    0x7EA4AC58, 0x1296564D,
+    0x7E9D55FC, 0x12C8106E,
+    0x7E95EC19, 0x12F9C7AA,
+    0x7E8E6EB1, 0x132B7BF9,
+    0x7E86DDC5, 0x135D2D53,
+    0x7E7F3956, 0x138EDBB0,
+    0x7E778165, 0x13C0870A,
+    0x7E6FB5F3, 0x13F22F57,
+    0x7E67D702, 0x1423D492,
+    0x7E5FE493, 0x145576B1,
+    0x7E57DEA6, 0x148715AD,
+    0x7E4FC53E, 0x14B8B17F,
+    0x7E47985B, 0x14EA4A1F,
+    0x7E3F57FE, 0x151BDF85,
+    0x7E37042A, 0x154D71AA,
+    0x7E2E9CDF, 0x157F0086,
+    0x7E26221E, 0x15B08C11,
+    0x7E1D93E9, 0x15E21444,
+    0x7E14F242, 0x16139917,
+    0x7E0C3D29, 0x16451A83,
+    0x7E03749F, 0x1676987F,
+    0x7DFA98A7, 0x16A81305,
+    0x7DF1A942, 0x16D98A0C,
+    0x7DE8A670, 0x170AFD8D,
+    0x7DDF9034, 0x173C6D80,
+    0x7DD6668E, 0x176DD9DE,
+    0x7DCD2981, 0x179F429F,
+    0x7DC3D90D, 0x17D0A7BB,
+    0x7DBA7534, 0x1802092C,
+    0x7DB0FDF7, 0x183366E8,
+    0x7DA77359, 0x1864C0E9,
+    0x7D9DD55A, 0x18961727,
+    0x7D9423FB, 0x18C7699B,
+    0x7D8A5F3F, 0x18F8B83C,
+    0x7D808727, 0x192A0303,
+    0x7D769BB5, 0x195B49E9,
+    0x7D6C9CE9, 0x198C8CE6,
+    0x7D628AC5, 0x19BDCBF2,
+    0x7D58654C, 0x19EF0706,
+    0x7D4E2C7E, 0x1A203E1B,
+    0x7D43E05E, 0x1A517127,
+    0x7D3980EC, 0x1A82A025,
+    0x7D2F0E2A, 0x1AB3CB0C,
+    0x7D24881A, 0x1AE4F1D6,
+    0x7D19EEBE, 0x1B161479,
+    0x7D0F4218, 0x1B4732EF,
+    0x7D048228, 0x1B784D30,
+    0x7CF9AEF0, 0x1BA96334,
+    0x7CEEC873, 0x1BDA74F5,
+    0x7CE3CEB1, 0x1C0B826A,
+    0x7CD8C1AD, 0x1C3C8B8C,
+    0x7CCDA168, 0x1C6D9053,
+    0x7CC26DE5, 0x1C9E90B8,
+    0x7CB72724, 0x1CCF8CB3,
+    0x7CABCD27, 0x1D00843C,
+    0x7CA05FF1, 0x1D31774D,
+    0x7C94DF82, 0x1D6265DD,
+    0x7C894BDD, 0x1D934FE5,
+    0x7C7DA504, 0x1DC4355D,
+    0x7C71EAF8, 0x1DF5163F,
+    0x7C661DBB, 0x1E25F281,
+    0x7C5A3D4F, 0x1E56CA1E,
+    0x7C4E49B6, 0x1E879D0C,
+    0x7C4242F2, 0x1EB86B46,
+    0x7C362904, 0x1EE934C2,
+    0x7C29FBEE, 0x1F19F97B,
+    0x7C1DBBB2, 0x1F4AB967,
+    0x7C116853, 0x1F7B7480,
+    0x7C0501D1, 0x1FAC2ABF,
+    0x7BF88830, 0x1FDCDC1A,
+    0x7BEBFB70, 0x200D888C,
+    0x7BDF5B94, 0x203E300D,
+    0x7BD2A89E, 0x206ED295,
+    0x7BC5E28F, 0x209F701C,
+    0x7BB9096A, 0x20D0089B,
+    0x7BAC1D31, 0x21009C0B,
+    0x7B9F1DE5, 0x21312A65,
+    0x7B920B89, 0x2161B39F,
+    0x7B84E61E, 0x219237B4,
+    0x7B77ADA8, 0x21C2B69C,
+    0x7B6A6227, 0x21F3304E,
+    0x7B5D039D, 0x2223A4C5,
+    0x7B4F920E, 0x225413F8,
+    0x7B420D7A, 0x22847DDF,
+    0x7B3475E4, 0x22B4E274,
+    0x7B26CB4F, 0x22E541AE,
+    0x7B190DBB, 0x23159B87,
+    0x7B0B3D2C, 0x2345EFF7,
+    0x7AFD59A3, 0x23763EF7,
+    0x7AEF6323, 0x23A6887E,
+    0x7AE159AE, 0x23D6CC86,
+    0x7AD33D45, 0x24070B07,
+    0x7AC50DEB, 0x243743FA,
+    0x7AB6CBA3, 0x24677757,
+    0x7AA8766E, 0x2497A517,
+    0x7A9A0E4F, 0x24C7CD32,
+    0x7A8B9348, 0x24F7EFA1,
+    0x7A7D055B, 0x25280C5D,
+    0x7A6E648A, 0x2558235E,
+    0x7A5FB0D8, 0x2588349D,
+    0x7A50EA46, 0x25B84012,
+    0x7A4210D8, 0x25E845B5,
+    0x7A33248F, 0x26184581,
+    0x7A24256E, 0x26483F6C,
+    0x7A151377, 0x26783370,
+    0x7A05EEAD, 0x26A82185,
+    0x79F6B711, 0x26D809A5,
+    0x79E76CA6, 0x2707EBC6,
+    0x79D80F6F, 0x2737C7E3,
+    0x79C89F6D, 0x27679DF4,
+    0x79B91CA4, 0x27976DF1,
+    0x79A98715, 0x27C737D2,
+    0x7999DEC3, 0x27F6FB92,
+    0x798A23B1, 0x2826B928,
+    0x797A55E0, 0x2856708C,
+    0x796A7554, 0x288621B9,
+    0x795A820E, 0x28B5CCA5,
+    0x794A7C11, 0x28E5714A,
+    0x793A6360, 0x29150FA1,
+    0x792A37FE, 0x2944A7A2,
+    0x7919F9EB, 0x29743945,
+    0x7909A92C, 0x29A3C484,
+    0x78F945C3, 0x29D34958,
+    0x78E8CFB1, 0x2A02C7B8,
+    0x78D846FB, 0x2A323F9D,
+    0x78C7ABA1, 0x2A61B101,
+    0x78B6FDA8, 0x2A911BDB,
+    0x78A63D10, 0x2AC08025,
+    0x789569DE, 0x2AEFDDD8,
+    0x78848413, 0x2B1F34EB,
+    0x78738BB3, 0x2B4E8558,
+    0x786280BF, 0x2B7DCF17,
+    0x7851633B, 0x2BAD1221,
+    0x78403328, 0x2BDC4E6F,
+    0x782EF08B, 0x2C0B83F9,
+    0x781D9B64, 0x2C3AB2B9,
+    0x780C33B8, 0x2C69DAA6,
+    0x77FAB988, 0x2C98FBBA,
+    0x77E92CD8, 0x2CC815ED,
+    0x77D78DAA, 0x2CF72939,
+    0x77C5DC01, 0x2D263595,
+    0x77B417DF, 0x2D553AFB,
+    0x77A24148, 0x2D843963,
+    0x7790583D, 0x2DB330C7,
+    0x777E5CC3, 0x2DE2211E,
+    0x776C4EDB, 0x2E110A62,
+    0x775A2E88, 0x2E3FEC8B,
+    0x7747FBCE, 0x2E6EC792,
+    0x7735B6AE, 0x2E9D9B70,
+    0x77235F2D, 0x2ECC681E,
+    0x7710F54B, 0x2EFB2D94,
+    0x76FE790E, 0x2F29EBCC,
+    0x76EBEA77, 0x2F58A2BD,
+    0x76D94988, 0x2F875262,
+    0x76C69646, 0x2FB5FAB2,
+    0x76B3D0B3, 0x2FE49BA6,
+    0x76A0F8D2, 0x30133538,
+    0x768E0EA5, 0x3041C760,
+    0x767B1230, 0x30705217,
+    0x76680376, 0x309ED555,
+    0x7654E279, 0x30CD5114,
+    0x7641AF3C, 0x30FBC54D,
+    0x762E69C3, 0x312A31F8,
+    0x761B1211, 0x3158970D,
+    0x7607A827, 0x3186F487,
+    0x75F42C0A, 0x31B54A5D,
+    0x75E09DBD, 0x31E39889,
+    0x75CCFD42, 0x3211DF03,
+    0x75B94A9C, 0x32401DC5,
+    0x75A585CF, 0x326E54C7,
+    0x7591AEDD, 0x329C8402,
+    0x757DC5CA, 0x32CAAB6F,
+    0x7569CA98, 0x32F8CB07,
+    0x7555BD4B, 0x3326E2C2,
+    0x75419DE6, 0x3354F29A,
+    0x752D6C6C, 0x3382FA88,
+    0x751928E0, 0x33B0FA84,
+    0x7504D345, 0x33DEF287,
+    0x74F06B9E, 0x340CE28A,
+    0x74DBF1EF, 0x343ACA87,
+    0x74C7663A, 0x3468AA76,
+    0x74B2C883, 0x3496824F,
+    0x749E18CD, 0x34C4520D,
+    0x7489571B, 0x34F219A7,
+    0x74748371, 0x351FD917,
+    0x745F9DD1, 0x354D9056,
+    0x744AA63E, 0x357B3F5D,
+    0x74359CBD, 0x35A8E624,
+    0x74208150, 0x35D684A5,
+    0x740B53FA, 0x36041AD9,
+    0x73F614C0, 0x3631A8B7,
+    0x73E0C3A3, 0x365F2E3B,
+    0x73CB60A7, 0x368CAB5C,
+    0x73B5EBD0, 0x36BA2013,
+    0x73A06522, 0x36E78C5A,
+    0x738ACC9E, 0x3714F02A,
+    0x73752249, 0x37424B7A,
+    0x735F6626, 0x376F9E46,
+    0x73499838, 0x379CE884,
+    0x7333B883, 0x37CA2A30,
+    0x731DC709, 0x37F76340,
+    0x7307C3D0, 0x382493B0,
+    0x72F1AED8, 0x3851BB76,
+    0x72DB8828, 0x387EDA8E,
+    0x72C54FC0, 0x38ABF0EF,
+    0x72AF05A6, 0x38D8FE93,
+    0x7298A9DC, 0x39060372,
+    0x72823C66, 0x3932FF87,
+    0x726BBD48, 0x395FF2C9,
+    0x72552C84, 0x398CDD32,
+    0x723E8A1F, 0x39B9BEBB,
+    0x7227D61C, 0x39E6975D,
+    0x7211107D, 0x3A136712,
+    0x71FA3948, 0x3A402DD1,
+    0x71E3507F, 0x3A6CEB95,
+    0x71CC5626, 0x3A99A057,
+    0x71B54A40, 0x3AC64C0F,
+    0x719E2CD2, 0x3AF2EEB7,
+    0x7186FDDE, 0x3B1F8847,
+    0x716FBD68, 0x3B4C18BA,
+    0x71586B73, 0x3B78A007,
+    0x71410804, 0x3BA51E29,
+    0x7129931E, 0x3BD19317,
+    0x71120CC5, 0x3BFDFECD,
+    0x70FA74FB, 0x3C2A6142,
+    0x70E2CBC6, 0x3C56BA70,
+    0x70CB1127, 0x3C830A4F,
+    0x70B34524, 0x3CAF50DA,
+    0x709B67C0, 0x3CDB8E09,
+    0x708378FE, 0x3D07C1D5,
+    0x706B78E3, 0x3D33EC39,
+    0x70536771, 0x3D600D2B,
+    0x703B44AC, 0x3D8C24A7,
+    0x70231099, 0x3DB832A5,
+    0x700ACB3B, 0x3DE4371F,
+    0x6FF27496, 0x3E10320D,
+    0x6FDA0CAD, 0x3E3C2369,
+    0x6FC19385, 0x3E680B2C,
+    0x6FA90920, 0x3E93E94F,
+    0x6F906D84, 0x3EBFBDCC,
+    0x6F77C0B3, 0x3EEB889C,
+    0x6F5F02B1, 0x3F1749B7,
+    0x6F463383, 0x3F430118,
+    0x6F2D532C, 0x3F6EAEB8,
+    0x6F1461AF, 0x3F9A528F,
+    0x6EFB5F12, 0x3FC5EC97,
+    0x6EE24B57, 0x3FF17CCA,
+    0x6EC92682, 0x401D0320,
+    0x6EAFF098, 0x40487F93,
+    0x6E96A99C, 0x4073F21D,
+    0x6E7D5193, 0x409F5AB6,
+    0x6E63E87F, 0x40CAB957,
+    0x6E4A6E65, 0x40F60DFB,
+    0x6E30E349, 0x4121589A,
+    0x6E17472F, 0x414C992E,
+    0x6DFD9A1B, 0x4177CFB0,
+    0x6DE3DC11, 0x41A2FC1A,
+    0x6DCA0D14, 0x41CE1E64,
+    0x6DB02D29, 0x41F93688,
+    0x6D963C54, 0x42244480,
+    0x6D7C3A98, 0x424F4845,
+    0x6D6227FA, 0x427A41D0,
+    0x6D48047E, 0x42A5311A,
+    0x6D2DD027, 0x42D0161E,
+    0x6D138AFA, 0x42FAF0D4,
+    0x6CF934FB, 0x4325C135,
+    0x6CDECE2E, 0x4350873C,
+    0x6CC45697, 0x437B42E1,
+    0x6CA9CE3A, 0x43A5F41E,
+    0x6C8F351C, 0x43D09AEC,
+    0x6C748B3F, 0x43FB3745,
+    0x6C59D0A9, 0x4425C923,
+    0x6C3F055D, 0x4450507E,
+    0x6C242960, 0x447ACD50,
+    0x6C093CB6, 0x44A53F93,
+    0x6BEE3F62, 0x44CFA73F,
+    0x6BD3316A, 0x44FA044F,
+    0x6BB812D0, 0x452456BC,
+    0x6B9CE39B, 0x454E9E80,
+    0x6B81A3CD, 0x4578DB93,
+    0x6B66536A, 0x45A30DF0,
+    0x6B4AF278, 0x45CD358F,
+    0x6B2F80FA, 0x45F7526B,
+    0x6B13FEF5, 0x4621647C,
+    0x6AF86C6C, 0x464B6BBD,
+    0x6ADCC964, 0x46756827,
+    0x6AC115E1, 0x469F59B4,
+    0x6AA551E8, 0x46C9405C,
+    0x6A897D7D, 0x46F31C1A,
+    0x6A6D98A4, 0x471CECE6,
+    0x6A51A361, 0x4746B2BC,
+    0x6A359DB9, 0x47706D93,
+    0x6A1987B0, 0x479A1D66,
+    0x69FD614A, 0x47C3C22E,
+    0x69E12A8C, 0x47ED5BE6,
+    0x69C4E37A, 0x4816EA85,
+    0x69A88C18, 0x48406E07,
+    0x698C246C, 0x4869E664,
+    0x696FAC78, 0x48935397,
+    0x69532442, 0x48BCB598,
+    0x69368BCE, 0x48E60C62,
+    0x6919E320, 0x490F57EE,
+    0x68FD2A3D, 0x49389836,
+    0x68E06129, 0x4961CD32,
+    0x68C387E9, 0x498AF6DE,
+    0x68A69E81, 0x49B41533,
+    0x6889A4F5, 0x49DD282A,
+    0x686C9B4B, 0x4A062FBD,
+    0x684F8186, 0x4A2F2BE5,
+    0x683257AA, 0x4A581C9D,
+    0x68151DBE, 0x4A8101DE,
+    0x67F7D3C4, 0x4AA9DBA1,
+    0x67DA79C2, 0x4AD2A9E1,
+    0x67BD0FBC, 0x4AFB6C97,
+    0x679F95B7, 0x4B2423BD,
+    0x67820BB6, 0x4B4CCF4D,
+    0x676471C0, 0x4B756F3F,
+    0x6746C7D7, 0x4B9E038F,
+    0x67290E02, 0x4BC68C36,
+    0x670B4443, 0x4BEF092D,
+    0x66ED6AA1, 0x4C177A6E,
+    0x66CF811F, 0x4C3FDFF3,
+    0x66B187C3, 0x4C6839B6,
+    0x66937E90, 0x4C9087B1,
+    0x6675658C, 0x4CB8C9DD,
+    0x66573CBB, 0x4CE10034,
+    0x66390422, 0x4D092AB0,
+    0x661ABBC5, 0x4D31494B,
+    0x65FC63A9, 0x4D595BFE,
+    0x65DDFBD3, 0x4D8162C4,
+    0x65BF8447, 0x4DA95D96,
+    0x65A0FD0B, 0x4DD14C6E,
+    0x65826622, 0x4DF92F45,
+    0x6563BF92, 0x4E210617,
+    0x6545095F, 0x4E48D0DC,
+    0x6526438E, 0x4E708F8F,
+    0x65076E24, 0x4E984229,
+    0x64E88926, 0x4EBFE8A4,
+    0x64C99498, 0x4EE782FA,
+    0x64AA907F, 0x4F0F1126,
+    0x648B7CDF, 0x4F369320,
+    0x646C59BF, 0x4F5E08E3,
+    0x644D2722, 0x4F857268,
+    0x642DE50D, 0x4FACCFAB,
+    0x640E9385, 0x4FD420A3,
+    0x63EF328F, 0x4FFB654D,
+    0x63CFC230, 0x50229DA0,
+    0x63B0426D, 0x5049C999,
+    0x6390B34A, 0x5070E92F,
+    0x637114CC, 0x5097FC5E,
+    0x635166F8, 0x50BF031F,
+    0x6331A9D4, 0x50E5FD6C,
+    0x6311DD63, 0x510CEB40,
+    0x62F201AC, 0x5133CC94,
+    0x62D216B2, 0x515AA162,
+    0x62B21C7B, 0x518169A4,
+    0x6292130C, 0x51A82555,
+    0x6271FA69, 0x51CED46E,
+    0x6251D297, 0x51F576E9,
+    0x62319B9D, 0x521C0CC1,
+    0x6211557D, 0x524295EF,
+    0x61F1003E, 0x5269126E,
+    0x61D09BE5, 0x528F8237,
+    0x61B02876, 0x52B5E545,
+    0x618FA5F6, 0x52DC3B92,
+    0x616F146B, 0x53028517,
+    0x614E73D9, 0x5328C1D0,
+    0x612DC446, 0x534EF1B5,
+    0x610D05B7, 0x537514C1,
+    0x60EC3830, 0x539B2AEF,
+    0x60CB5BB6, 0x53C13438,
+    0x60AA704F, 0x53E73097,
+    0x60897600, 0x540D2005,
+    0x60686CCE, 0x5433027D,
+    0x604754BE, 0x5458D7F9,
+    0x60262DD5, 0x547EA073,
+    0x6004F818, 0x54A45BE5,
+    0x5FE3B38D, 0x54CA0A4A,
+    0x5FC26038, 0x54EFAB9C,
+    0x5FA0FE1E, 0x55153FD4,
+    0x5F7F8D46, 0x553AC6ED,
+    0x5F5E0DB3, 0x556040E2,
+    0x5F3C7F6B, 0x5585ADAC,
+    0x5F1AE273, 0x55AB0D46,
+    0x5EF936D1, 0x55D05FAA,
+    0x5ED77C89, 0x55F5A4D2,
+    0x5EB5B3A1, 0x561ADCB8,
+    0x5E93DC1F, 0x56400757,
+    0x5E71F606, 0x566524AA,
+    0x5E50015D, 0x568A34A9,
+    0x5E2DFE28, 0x56AF3750,
+    0x5E0BEC6E, 0x56D42C99,
+    0x5DE9CC32, 0x56F9147E,
+    0x5DC79D7C, 0x571DEEF9,
+    0x5DA5604E, 0x5742BC05,
+    0x5D8314B0, 0x57677B9D,
+    0x5D60BAA6, 0x578C2DB9,
+    0x5D3E5236, 0x57B0D256,
+    0x5D1BDB65, 0x57D5696C,
+    0x5CF95638, 0x57F9F2F7,
+    0x5CD6C2B4, 0x581E6EF1,
+    0x5CB420DF, 0x5842DD54,
+    0x5C9170BF, 0x58673E1B,
+    0x5C6EB258, 0x588B913F,
+    0x5C4BE5B0, 0x58AFD6BC,
+    0x5C290ACC, 0x58D40E8C,
+    0x5C0621B2, 0x58F838A9,
+    0x5BE32A67, 0x591C550E,
+    0x5BC024F0, 0x594063B4,
+    0x5B9D1153, 0x59646497,
+    0x5B79EF96, 0x598857B1,
+    0x5B56BFBD, 0x59AC3CFD,
+    0x5B3381CE, 0x59D01474,
+    0x5B1035CF, 0x59F3DE12,
+    0x5AECDBC4, 0x5A1799D0,
+    0x5AC973B4, 0x5A3B47AA,
+    0x5AA5FDA4, 0x5A5EE79A,
+    0x5A82799A, 0x5A82799A,
+    0x5A5EE79A, 0x5AA5FDA4,
+    0x5A3B47AA, 0x5AC973B4,
+    0x5A1799D0, 0x5AECDBC4,
+    0x59F3DE12, 0x5B1035CF,
+    0x59D01474, 0x5B3381CE,
+    0x59AC3CFD, 0x5B56BFBD,
+    0x598857B1, 0x5B79EF96,
+    0x59646497, 0x5B9D1153,
+    0x594063B4, 0x5BC024F0,
+    0x591C550E, 0x5BE32A67,
+    0x58F838A9, 0x5C0621B2,
+    0x58D40E8C, 0x5C290ACC,
+    0x58AFD6BC, 0x5C4BE5B0,
+    0x588B913F, 0x5C6EB258,
+    0x58673E1B, 0x5C9170BF,
+    0x5842DD54, 0x5CB420DF,
+    0x581E6EF1, 0x5CD6C2B4,
+    0x57F9F2F7, 0x5CF95638,
+    0x57D5696C, 0x5D1BDB65,
+    0x57B0D256, 0x5D3E5236,
+    0x578C2DB9, 0x5D60BAA6,
+    0x57677B9D, 0x5D8314B0,
+    0x5742BC05, 0x5DA5604E,
+    0x571DEEF9, 0x5DC79D7C,
+    0x56F9147E, 0x5DE9CC32,
+    0x56D42C99, 0x5E0BEC6E,
+    0x56AF3750, 0x5E2DFE28,
+    0x568A34A9, 0x5E50015D,
+    0x566524AA, 0x5E71F606,
+    0x56400757, 0x5E93DC1F,
+    0x561ADCB8, 0x5EB5B3A1,
+    0x55F5A4D2, 0x5ED77C89,
+    0x55D05FAA, 0x5EF936D1,
+    0x55AB0D46, 0x5F1AE273,
+    0x5585ADAC, 0x5F3C7F6B,
+    0x556040E2, 0x5F5E0DB3,
+    0x553AC6ED, 0x5F7F8D46,
+    0x55153FD4, 0x5FA0FE1E,
+    0x54EFAB9C, 0x5FC26038,
+    0x54CA0A4A, 0x5FE3B38D,
+    0x54A45BE5, 0x6004F818,
+    0x547EA073, 0x60262DD5,
+    0x5458D7F9, 0x604754BE,
+    0x5433027D, 0x60686CCE,
+    0x540D2005, 0x60897600,
+    0x53E73097, 0x60AA704F,
+    0x53C13438, 0x60CB5BB6,
+    0x539B2AEF, 0x60EC3830,
+    0x537514C1, 0x610D05B7,
+    0x534EF1B5, 0x612DC446,
+    0x5328C1D0, 0x614E73D9,
+    0x53028517, 0x616F146B,
+    0x52DC3B92, 0x618FA5F6,
+    0x52B5E545, 0x61B02876,
+    0x528F8237, 0x61D09BE5,
+    0x5269126E, 0x61F1003E,
+    0x524295EF, 0x6211557D,
+    0x521C0CC1, 0x62319B9D,
+    0x51F576E9, 0x6251D297,
+    0x51CED46E, 0x6271FA69,
+    0x51A82555, 0x6292130C,
+    0x518169A4, 0x62B21C7B,
+    0x515AA162, 0x62D216B2,
+    0x5133CC94, 0x62F201AC,
+    0x510CEB40, 0x6311DD63,
+    0x50E5FD6C, 0x6331A9D4,
+    0x50BF031F, 0x635166F8,
+    0x5097FC5E, 0x637114CC,
+    0x5070E92F, 0x6390B34A,
+    0x5049C999, 0x63B0426D,
+    0x50229DA0, 0x63CFC230,
+    0x4FFB654D, 0x63EF328F,
+    0x4FD420A3, 0x640E9385,
+    0x4FACCFAB, 0x642DE50D,
+    0x4F857268, 0x644D2722,
+    0x4F5E08E3, 0x646C59BF,
+    0x4F369320, 0x648B7CDF,
+    0x4F0F1126, 0x64AA907F,
+    0x4EE782FA, 0x64C99498,
+    0x4EBFE8A4, 0x64E88926,
+    0x4E984229, 0x65076E24,
+    0x4E708F8F, 0x6526438E,
+    0x4E48D0DC, 0x6545095F,
+    0x4E210617, 0x6563BF92,
+    0x4DF92F45, 0x65826622,
+    0x4DD14C6E, 0x65A0FD0B,
+    0x4DA95D96, 0x65BF8447,
+    0x4D8162C4, 0x65DDFBD3,
+    0x4D595BFE, 0x65FC63A9,
+    0x4D31494B, 0x661ABBC5,
+    0x4D092AB0, 0x66390422,
+    0x4CE10034, 0x66573CBB,
+    0x4CB8C9DD, 0x6675658C,
+    0x4C9087B1, 0x66937E90,
+    0x4C6839B6, 0x66B187C3,
+    0x4C3FDFF3, 0x66CF811F,
+    0x4C177A6E, 0x66ED6AA1,
+    0x4BEF092D, 0x670B4443,
+    0x4BC68C36, 0x67290E02,
+    0x4B9E038F, 0x6746C7D7,
+    0x4B756F3F, 0x676471C0,
+    0x4B4CCF4D, 0x67820BB6,
+    0x4B2423BD, 0x679F95B7,
+    0x4AFB6C97, 0x67BD0FBC,
+    0x4AD2A9E1, 0x67DA79C2,
+    0x4AA9DBA1, 0x67F7D3C4,
+    0x4A8101DE, 0x68151DBE,
+    0x4A581C9D, 0x683257AA,
+    0x4A2F2BE5, 0x684F8186,
+    0x4A062FBD, 0x686C9B4B,
+    0x49DD282A, 0x6889A4F5,
+    0x49B41533, 0x68A69E81,
+    0x498AF6DE, 0x68C387E9,
+    0x4961CD32, 0x68E06129,
+    0x49389836, 0x68FD2A3D,
+    0x490F57EE, 0x6919E320,
+    0x48E60C62, 0x69368BCE,
+    0x48BCB598, 0x69532442,
+    0x48935397, 0x696FAC78,
+    0x4869E664, 0x698C246C,
+    0x48406E07, 0x69A88C18,
+    0x4816EA85, 0x69C4E37A,
+    0x47ED5BE6, 0x69E12A8C,
+    0x47C3C22E, 0x69FD614A,
+    0x479A1D66, 0x6A1987B0,
+    0x47706D93, 0x6A359DB9,
+    0x4746B2BC, 0x6A51A361,
+    0x471CECE6, 0x6A6D98A4,
+    0x46F31C1A, 0x6A897D7D,
+    0x46C9405C, 0x6AA551E8,
+    0x469F59B4, 0x6AC115E1,
+    0x46756827, 0x6ADCC964,
+    0x464B6BBD, 0x6AF86C6C,
+    0x4621647C, 0x6B13FEF5,
+    0x45F7526B, 0x6B2F80FA,
+    0x45CD358F, 0x6B4AF278,
+    0x45A30DF0, 0x6B66536A,
+    0x4578DB93, 0x6B81A3CD,
+    0x454E9E80, 0x6B9CE39B,
+    0x452456BC, 0x6BB812D0,
+    0x44FA044F, 0x6BD3316A,
+    0x44CFA73F, 0x6BEE3F62,
+    0x44A53F93, 0x6C093CB6,
+    0x447ACD50, 0x6C242960,
+    0x4450507E, 0x6C3F055D,
+    0x4425C923, 0x6C59D0A9,
+    0x43FB3745, 0x6C748B3F,
+    0x43D09AEC, 0x6C8F351C,
+    0x43A5F41E, 0x6CA9CE3A,
+    0x437B42E1, 0x6CC45697,
+    0x4350873C, 0x6CDECE2E,
+    0x4325C135, 0x6CF934FB,
+    0x42FAF0D4, 0x6D138AFA,
+    0x42D0161E, 0x6D2DD027,
+    0x42A5311A, 0x6D48047E,
+    0x427A41D0, 0x6D6227FA,
+    0x424F4845, 0x6D7C3A98,
+    0x42244480, 0x6D963C54,
+    0x41F93688, 0x6DB02D29,
+    0x41CE1E64, 0x6DCA0D14,
+    0x41A2FC1A, 0x6DE3DC11,
+    0x4177CFB0, 0x6DFD9A1B,
+    0x414C992E, 0x6E17472F,
+    0x4121589A, 0x6E30E349,
+    0x40F60DFB, 0x6E4A6E65,
+    0x40CAB957, 0x6E63E87F,
+    0x409F5AB6, 0x6E7D5193,
+    0x4073F21D, 0x6E96A99C,
+    0x40487F93, 0x6EAFF098,
+    0x401D0320, 0x6EC92682,
+    0x3FF17CCA, 0x6EE24B57,
+    0x3FC5EC97, 0x6EFB5F12,
+    0x3F9A528F, 0x6F1461AF,
+    0x3F6EAEB8, 0x6F2D532C,
+    0x3F430118, 0x6F463383,
+    0x3F1749B7, 0x6F5F02B1,
+    0x3EEB889C, 0x6F77C0B3,
+    0x3EBFBDCC, 0x6F906D84,
+    0x3E93E94F, 0x6FA90920,
+    0x3E680B2C, 0x6FC19385,
+    0x3E3C2369, 0x6FDA0CAD,
+    0x3E10320D, 0x6FF27496,
+    0x3DE4371F, 0x700ACB3B,
+    0x3DB832A5, 0x70231099,
+    0x3D8C24A7, 0x703B44AC,
+    0x3D600D2B, 0x70536771,
+    0x3D33EC39, 0x706B78E3,
+    0x3D07C1D5, 0x708378FE,
+    0x3CDB8E09, 0x709B67C0,
+    0x3CAF50DA, 0x70B34524,
+    0x3C830A4F, 0x70CB1127,
+    0x3C56BA70, 0x70E2CBC6,
+    0x3C2A6142, 0x70FA74FB,
+    0x3BFDFECD, 0x71120CC5,
+    0x3BD19317, 0x7129931E,
+    0x3BA51E29, 0x71410804,
+    0x3B78A007, 0x71586B73,
+    0x3B4C18BA, 0x716FBD68,
+    0x3B1F8847, 0x7186FDDE,
+    0x3AF2EEB7, 0x719E2CD2,
+    0x3AC64C0F, 0x71B54A40,
+    0x3A99A057, 0x71CC5626,
+    0x3A6CEB95, 0x71E3507F,
+    0x3A402DD1, 0x71FA3948,
+    0x3A136712, 0x7211107D,
+    0x39E6975D, 0x7227D61C,
+    0x39B9BEBB, 0x723E8A1F,
+    0x398CDD32, 0x72552C84,
+    0x395FF2C9, 0x726BBD48,
+    0x3932FF87, 0x72823C66,
+    0x39060372, 0x7298A9DC,
+    0x38D8FE93, 0x72AF05A6,
+    0x38ABF0EF, 0x72C54FC0,
+    0x387EDA8E, 0x72DB8828,
+    0x3851BB76, 0x72F1AED8,
+    0x382493B0, 0x7307C3D0,
+    0x37F76340, 0x731DC709,
+    0x37CA2A30, 0x7333B883,
+    0x379CE884, 0x73499838,
+    0x376F9E46, 0x735F6626,
+    0x37424B7A, 0x73752249,
+    0x3714F02A, 0x738ACC9E,
+    0x36E78C5A, 0x73A06522,
+    0x36BA2013, 0x73B5EBD0,
+    0x368CAB5C, 0x73CB60A7,
+    0x365F2E3B, 0x73E0C3A3,
+    0x3631A8B7, 0x73F614C0,
+    0x36041AD9, 0x740B53FA,
+    0x35D684A5, 0x74208150,
+    0x35A8E624, 0x74359CBD,
+    0x357B3F5D, 0x744AA63E,
+    0x354D9056, 0x745F9DD1,
+    0x351FD917, 0x74748371,
+    0x34F219A7, 0x7489571B,
+    0x34C4520D, 0x749E18CD,
+    0x3496824F, 0x74B2C883,
+    0x3468AA76, 0x74C7663A,
+    0x343ACA87, 0x74DBF1EF,
+    0x340CE28A, 0x74F06B9E,
+    0x33DEF287, 0x7504D345,
+    0x33B0FA84, 0x751928E0,
+    0x3382FA88, 0x752D6C6C,
+    0x3354F29A, 0x75419DE6,
+    0x3326E2C2, 0x7555BD4B,
+    0x32F8CB07, 0x7569CA98,
+    0x32CAAB6F, 0x757DC5CA,
+    0x329C8402, 0x7591AEDD,
+    0x326E54C7, 0x75A585CF,
+    0x32401DC5, 0x75B94A9C,
+    0x3211DF03, 0x75CCFD42,
+    0x31E39889, 0x75E09DBD,
+    0x31B54A5D, 0x75F42C0A,
+    0x3186F487, 0x7607A827,
+    0x3158970D, 0x761B1211,
+    0x312A31F8, 0x762E69C3,
+    0x30FBC54D, 0x7641AF3C,
+    0x30CD5114, 0x7654E279,
+    0x309ED555, 0x76680376,
+    0x30705217, 0x767B1230,
+    0x3041C760, 0x768E0EA5,
+    0x30133538, 0x76A0F8D2,
+    0x2FE49BA6, 0x76B3D0B3,
+    0x2FB5FAB2, 0x76C69646,
+    0x2F875262, 0x76D94988,
+    0x2F58A2BD, 0x76EBEA77,
+    0x2F29EBCC, 0x76FE790E,
+    0x2EFB2D94, 0x7710F54B,
+    0x2ECC681E, 0x77235F2D,
+    0x2E9D9B70, 0x7735B6AE,
+    0x2E6EC792, 0x7747FBCE,
+    0x2E3FEC8B, 0x775A2E88,
+    0x2E110A62, 0x776C4EDB,
+    0x2DE2211E, 0x777E5CC3,
+    0x2DB330C7, 0x7790583D,
+    0x2D843963, 0x77A24148,
+    0x2D553AFB, 0x77B417DF,
+    0x2D263595, 0x77C5DC01,
+    0x2CF72939, 0x77D78DAA,
+    0x2CC815ED, 0x77E92CD8,
+    0x2C98FBBA, 0x77FAB988,
+    0x2C69DAA6, 0x780C33B8,
+    0x2C3AB2B9, 0x781D9B64,
+    0x2C0B83F9, 0x782EF08B,
+    0x2BDC4E6F, 0x78403328,
+    0x2BAD1221, 0x7851633B,
+    0x2B7DCF17, 0x786280BF,
+    0x2B4E8558, 0x78738BB3,
+    0x2B1F34EB, 0x78848413,
+    0x2AEFDDD8, 0x789569DE,
+    0x2AC08025, 0x78A63D10,
+    0x2A911BDB, 0x78B6FDA8,
+    0x2A61B101, 0x78C7ABA1,
+    0x2A323F9D, 0x78D846FB,
+    0x2A02C7B8, 0x78E8CFB1,
+    0x29D34958, 0x78F945C3,
+    0x29A3C484, 0x7909A92C,
+    0x29743945, 0x7919F9EB,
+    0x2944A7A2, 0x792A37FE,
+    0x29150FA1, 0x793A6360,
+    0x28E5714A, 0x794A7C11,
+    0x28B5CCA5, 0x795A820E,
+    0x288621B9, 0x796A7554,
+    0x2856708C, 0x797A55E0,
+    0x2826B928, 0x798A23B1,
+    0x27F6FB92, 0x7999DEC3,
+    0x27C737D2, 0x79A98715,
+    0x27976DF1, 0x79B91CA4,
+    0x27679DF4, 0x79C89F6D,
+    0x2737C7E3, 0x79D80F6F,
+    0x2707EBC6, 0x79E76CA6,
+    0x26D809A5, 0x79F6B711,
+    0x26A82185, 0x7A05EEAD,
+    0x26783370, 0x7A151377,
+    0x26483F6C, 0x7A24256E,
+    0x26184581, 0x7A33248F,
+    0x25E845B5, 0x7A4210D8,
+    0x25B84012, 0x7A50EA46,
+    0x2588349D, 0x7A5FB0D8,
+    0x2558235E, 0x7A6E648A,
+    0x25280C5D, 0x7A7D055B,
+    0x24F7EFA1, 0x7A8B9348,
+    0x24C7CD32, 0x7A9A0E4F,
+    0x2497A517, 0x7AA8766E,
+    0x24677757, 0x7AB6CBA3,
+    0x243743FA, 0x7AC50DEB,
+    0x24070B07, 0x7AD33D45,
+    0x23D6CC86, 0x7AE159AE,
+    0x23A6887E, 0x7AEF6323,
+    0x23763EF7, 0x7AFD59A3,
+    0x2345EFF7, 0x7B0B3D2C,
+    0x23159B87, 0x7B190DBB,
+    0x22E541AE, 0x7B26CB4F,
+    0x22B4E274, 0x7B3475E4,
+    0x22847DDF, 0x7B420D7A,
+    0x225413F8, 0x7B4F920E,
+    0x2223A4C5, 0x7B5D039D,
+    0x21F3304E, 0x7B6A6227,
+    0x21C2B69C, 0x7B77ADA8,
+    0x219237B4, 0x7B84E61E,
+    0x2161B39F, 0x7B920B89,
+    0x21312A65, 0x7B9F1DE5,
+    0x21009C0B, 0x7BAC1D31,
+    0x20D0089B, 0x7BB9096A,
+    0x209F701C, 0x7BC5E28F,
+    0x206ED295, 0x7BD2A89E,
+    0x203E300D, 0x7BDF5B94,
+    0x200D888C, 0x7BEBFB70,
+    0x1FDCDC1A, 0x7BF88830,
+    0x1FAC2ABF, 0x7C0501D1,
+    0x1F7B7480, 0x7C116853,
+    0x1F4AB967, 0x7C1DBBB2,
+    0x1F19F97B, 0x7C29FBEE,
+    0x1EE934C2, 0x7C362904,
+    0x1EB86B46, 0x7C4242F2,
+    0x1E879D0C, 0x7C4E49B6,
+    0x1E56CA1E, 0x7C5A3D4F,
+    0x1E25F281, 0x7C661DBB,
+    0x1DF5163F, 0x7C71EAF8,
+    0x1DC4355D, 0x7C7DA504,
+    0x1D934FE5, 0x7C894BDD,
+    0x1D6265DD, 0x7C94DF82,
+    0x1D31774D, 0x7CA05FF1,
+    0x1D00843C, 0x7CABCD27,
+    0x1CCF8CB3, 0x7CB72724,
+    0x1C9E90B8, 0x7CC26DE5,
+    0x1C6D9053, 0x7CCDA168,
+    0x1C3C8B8C, 0x7CD8C1AD,
+    0x1C0B826A, 0x7CE3CEB1,
+    0x1BDA74F5, 0x7CEEC873,
+    0x1BA96334, 0x7CF9AEF0,
+    0x1B784D30, 0x7D048228,
+    0x1B4732EF, 0x7D0F4218,
+    0x1B161479, 0x7D19EEBE,
+    0x1AE4F1D6, 0x7D24881A,
+    0x1AB3CB0C, 0x7D2F0E2A,
+    0x1A82A025, 0x7D3980EC,
+    0x1A517127, 0x7D43E05E,
+    0x1A203E1B, 0x7D4E2C7E,
+    0x19EF0706, 0x7D58654C,
+    0x19BDCBF2, 0x7D628AC5,
+    0x198C8CE6, 0x7D6C9CE9,
+    0x195B49E9, 0x7D769BB5,
+    0x192A0303, 0x7D808727,
+    0x18F8B83C, 0x7D8A5F3F,
+    0x18C7699B, 0x7D9423FB,
+    0x18961727, 0x7D9DD55A,
+    0x1864C0E9, 0x7DA77359,
+    0x183366E8, 0x7DB0FDF7,
+    0x1802092C, 0x7DBA7534,
+    0x17D0A7BB, 0x7DC3D90D,
+    0x179F429F, 0x7DCD2981,
+    0x176DD9DE, 0x7DD6668E,
+    0x173C6D80, 0x7DDF9034,
+    0x170AFD8D, 0x7DE8A670,
+    0x16D98A0C, 0x7DF1A942,
+    0x16A81305, 0x7DFA98A7,
+    0x1676987F, 0x7E03749F,
+    0x16451A83, 0x7E0C3D29,
+    0x16139917, 0x7E14F242,
+    0x15E21444, 0x7E1D93E9,
+    0x15B08C11, 0x7E26221E,
+    0x157F0086, 0x7E2E9CDF,
+    0x154D71AA, 0x7E37042A,
+    0x151BDF85, 0x7E3F57FE,
+    0x14EA4A1F, 0x7E47985B,
+    0x14B8B17F, 0x7E4FC53E,
+    0x148715AD, 0x7E57DEA6,
+    0x145576B1, 0x7E5FE493,
+    0x1423D492, 0x7E67D702,
+    0x13F22F57, 0x7E6FB5F3,
+    0x13C0870A, 0x7E778165,
+    0x138EDBB0, 0x7E7F3956,
+    0x135D2D53, 0x7E86DDC5,
+    0x132B7BF9, 0x7E8E6EB1,
+    0x12F9C7AA, 0x7E95EC19,
+    0x12C8106E, 0x7E9D55FC,
+    0x1296564D, 0x7EA4AC58,
+    0x1264994E, 0x7EABEF2C,
+    0x1232D978, 0x7EB31E77,
+    0x120116D4, 0x7EBA3A39,
+    0x11CF516A, 0x7EC1426F,
+    0x119D8940, 0x7EC8371A,
+    0x116BBE5F, 0x7ECF1837,
+    0x1139F0CE, 0x7ED5E5C6,
+    0x11082096, 0x7EDC9FC6,
+    0x10D64DBC, 0x7EE34635,
+    0x10A4784A, 0x7EE9D913,
+    0x1072A047, 0x7EF0585F,
+    0x1040C5BB, 0x7EF6C418,
+    0x100EE8AD, 0x7EFD1C3C,
+    0x0FDD0925, 0x7F0360CB,
+    0x0FAB272B, 0x7F0991C3,
+    0x0F7942C6, 0x7F0FAF24,
+    0x0F475BFE, 0x7F15B8EE,
+    0x0F1572DC, 0x7F1BAF1E,
+    0x0EE38765, 0x7F2191B4,
+    0x0EB199A3, 0x7F2760AF,
+    0x0E7FA99D, 0x7F2D1C0E,
+    0x0E4DB75B, 0x7F32C3D0,
+    0x0E1BC2E3, 0x7F3857F5,
+    0x0DE9CC3F, 0x7F3DD87C,
+    0x0DB7D376, 0x7F434563,
+    0x0D85D88F, 0x7F489EAA,
+    0x0D53DB92, 0x7F4DE450,
+    0x0D21DC87, 0x7F531654,
+    0x0CEFDB75, 0x7F5834B6,
+    0x0CBDD865, 0x7F5D3F75,
+    0x0C8BD35E, 0x7F62368F,
+    0x0C59CC67, 0x7F671A04,
+    0x0C27C389, 0x7F6BE9D4,
+    0x0BF5B8CB, 0x7F70A5FD,
+    0x0BC3AC35, 0x7F754E7F,
+    0x0B919DCE, 0x7F79E35A,
+    0x0B5F8D9F, 0x7F7E648B,
+    0x0B2D7BAE, 0x7F82D214,
+    0x0AFB6805, 0x7F872BF3,
+    0x0AC952AA, 0x7F8B7226,
+    0x0A973BA5, 0x7F8FA4AF,
+    0x0A6522FE, 0x7F93C38C,
+    0x0A3308BC, 0x7F97CEBC,
+    0x0A00ECE8, 0x7F9BC63F,
+    0x09CECF89, 0x7F9FAA15,
+    0x099CB0A7, 0x7FA37A3C,
+    0x096A9049, 0x7FA736B4,
+    0x09386E77, 0x7FAADF7C,
+    0x09064B3A, 0x7FAE7494,
+    0x08D42698, 0x7FB1F5FC,
+    0x08A2009A, 0x7FB563B2,
+    0x086FD947, 0x7FB8BDB7,
+    0x083DB0A7, 0x7FBC040A,
+    0x080B86C1, 0x7FBF36A9,
+    0x07D95B9E, 0x7FC25596,
+    0x07A72F45, 0x7FC560CF,
+    0x077501BE, 0x7FC85853,
+    0x0742D310, 0x7FCB3C23,
+    0x0710A344, 0x7FCE0C3E,
+    0x06DE7261, 0x7FD0C8A3,
+    0x06AC406F, 0x7FD37152,
+    0x067A0D75, 0x7FD6064B,
+    0x0647D97C, 0x7FD8878D,
+    0x0615A48A, 0x7FDAF518,
+    0x05E36EA9, 0x7FDD4EEC,
+    0x05B137DF, 0x7FDF9508,
+    0x057F0034, 0x7FE1C76B,
+    0x054CC7B0, 0x7FE3E616,
+    0x051A8E5C, 0x7FE5F108,
+    0x04E8543D, 0x7FE7E840,
+    0x04B6195D, 0x7FE9CBC0,
+    0x0483DDC3, 0x7FEB9B85,
+    0x0451A176, 0x7FED5790,
+    0x041F647F, 0x7FEEFFE1,
+    0x03ED26E6, 0x7FF09477,
+    0x03BAE8B1, 0x7FF21553,
+    0x0388A9E9, 0x7FF38273,
+    0x03566A96, 0x7FF4DBD8,
+    0x03242ABF, 0x7FF62182,
+    0x02F1EA6B, 0x7FF7536F,
+    0x02BFA9A4, 0x7FF871A1,
+    0x028D6870, 0x7FF97C17,
+    0x025B26D7, 0x7FFA72D1,
+    0x0228E4E1, 0x7FFB55CE,
+    0x01F6A296, 0x7FFC250F,
+    0x01C45FFE, 0x7FFCE093,
+    0x01921D1F, 0x7FFD885A,
+    0x015FDA03, 0x7FFE1C64,
+    0x012D96B0, 0x7FFE9CB2,
+    0x00FB532F, 0x7FFF0942,
+    0x00C90F88, 0x7FFF6216,
+    0x0096CBC1, 0x7FFFA72C,
+    0x006487E3, 0x7FFFD885,
+    0x003243F5, 0x7FFFF621,
+    0x00000000, 0x7FFFFFFF,
+    0xFFCDBC0A, 0x7FFFF621,
+    0xFF9B781D, 0x7FFFD885,
+    0xFF69343E, 0x7FFFA72C,
+    0xFF36F078, 0x7FFF6216,
+    0xFF04ACD0, 0x7FFF0942,
+    0xFED2694F, 0x7FFE9CB2,
+    0xFEA025FC, 0x7FFE1C64,
+    0xFE6DE2E0, 0x7FFD885A,
+    0xFE3BA001, 0x7FFCE093,
+    0xFE095D69, 0x7FFC250F,
+    0xFDD71B1E, 0x7FFB55CE,
+    0xFDA4D928, 0x7FFA72D1,
+    0xFD72978F, 0x7FF97C17,
+    0xFD40565B, 0x7FF871A1,
+    0xFD0E1594, 0x7FF7536F,
+    0xFCDBD541, 0x7FF62182,
+    0xFCA99569, 0x7FF4DBD8,
+    0xFC775616, 0x7FF38273,
+    0xFC45174E, 0x7FF21553,
+    0xFC12D919, 0x7FF09477,
+    0xFBE09B80, 0x7FEEFFE1,
+    0xFBAE5E89, 0x7FED5790,
+    0xFB7C223C, 0x7FEB9B85,
+    0xFB49E6A2, 0x7FE9CBC0,
+    0xFB17ABC2, 0x7FE7E840,
+    0xFAE571A4, 0x7FE5F108,
+    0xFAB3384F, 0x7FE3E616,
+    0xFA80FFCB, 0x7FE1C76B,
+    0xFA4EC820, 0x7FDF9508,
+    0xFA1C9156, 0x7FDD4EEC,
+    0xF9EA5B75, 0x7FDAF518,
+    0xF9B82683, 0x7FD8878D,
+    0xF985F28A, 0x7FD6064B,
+    0xF953BF90, 0x7FD37152,
+    0xF9218D9E, 0x7FD0C8A3,
+    0xF8EF5CBB, 0x7FCE0C3E,
+    0xF8BD2CEF, 0x7FCB3C23,
+    0xF88AFE41, 0x7FC85853,
+    0xF858D0BA, 0x7FC560CF,
+    0xF826A461, 0x7FC25596,
+    0xF7F4793E, 0x7FBF36A9,
+    0xF7C24F58, 0x7FBC040A,
+    0xF79026B8, 0x7FB8BDB7,
+    0xF75DFF65, 0x7FB563B2,
+    0xF72BD967, 0x7FB1F5FC,
+    0xF6F9B4C5, 0x7FAE7494,
+    0xF6C79188, 0x7FAADF7C,
+    0xF6956FB6, 0x7FA736B4,
+    0xF6634F58, 0x7FA37A3C,
+    0xF6313076, 0x7F9FAA15,
+    0xF5FF1317, 0x7F9BC63F,
+    0xF5CCF743, 0x7F97CEBC,
+    0xF59ADD01, 0x7F93C38C,
+    0xF568C45A, 0x7F8FA4AF,
+    0xF536AD55, 0x7F8B7226,
+    0xF50497FA, 0x7F872BF3,
+    0xF4D28451, 0x7F82D214,
+    0xF4A07260, 0x7F7E648B,
+    0xF46E6231, 0x7F79E35A,
+    0xF43C53CA, 0x7F754E7F,
+    0xF40A4734, 0x7F70A5FD,
+    0xF3D83C76, 0x7F6BE9D4,
+    0xF3A63398, 0x7F671A04,
+    0xF3742CA1, 0x7F62368F,
+    0xF342279A, 0x7F5D3F75,
+    0xF310248A, 0x7F5834B6,
+    0xF2DE2378, 0x7F531654,
+    0xF2AC246D, 0x7F4DE450,
+    0xF27A2770, 0x7F489EAA,
+    0xF2482C89, 0x7F434563,
+    0xF21633C0, 0x7F3DD87C,
+    0xF1E43D1C, 0x7F3857F5,
+    0xF1B248A5, 0x7F32C3D0,
+    0xF1805662, 0x7F2D1C0E,
+    0xF14E665C, 0x7F2760AF,
+    0xF11C789A, 0x7F2191B4,
+    0xF0EA8D23, 0x7F1BAF1E,
+    0xF0B8A401, 0x7F15B8EE,
+    0xF086BD39, 0x7F0FAF24,
+    0xF054D8D4, 0x7F0991C3,
+    0xF022F6DA, 0x7F0360CB,
+    0xEFF11752, 0x7EFD1C3C,
+    0xEFBF3A44, 0x7EF6C418,
+    0xEF8D5FB8, 0x7EF0585F,
+    0xEF5B87B5, 0x7EE9D913,
+    0xEF29B243, 0x7EE34635,
+    0xEEF7DF6A, 0x7EDC9FC6,
+    0xEEC60F31, 0x7ED5E5C6,
+    0xEE9441A0, 0x7ECF1837,
+    0xEE6276BF, 0x7EC8371A,
+    0xEE30AE95, 0x7EC1426F,
+    0xEDFEE92B, 0x7EBA3A39,
+    0xEDCD2687, 0x7EB31E77,
+    0xED9B66B2, 0x7EABEF2C,
+    0xED69A9B2, 0x7EA4AC58,
+    0xED37EF91, 0x7E9D55FC,
+    0xED063855, 0x7E95EC19,
+    0xECD48406, 0x7E8E6EB1,
+    0xECA2D2AC, 0x7E86DDC5,
+    0xEC71244F, 0x7E7F3956,
+    0xEC3F78F5, 0x7E778165,
+    0xEC0DD0A8, 0x7E6FB5F3,
+    0xEBDC2B6D, 0x7E67D702,
+    0xEBAA894E, 0x7E5FE493,
+    0xEB78EA52, 0x7E57DEA6,
+    0xEB474E80, 0x7E4FC53E,
+    0xEB15B5E0, 0x7E47985B,
+    0xEAE4207A, 0x7E3F57FE,
+    0xEAB28E55, 0x7E37042A,
+    0xEA80FF79, 0x7E2E9CDF,
+    0xEA4F73EE, 0x7E26221E,
+    0xEA1DEBBB, 0x7E1D93E9,
+    0xE9EC66E8, 0x7E14F242,
+    0xE9BAE57C, 0x7E0C3D29,
+    0xE9896780, 0x7E03749F,
+    0xE957ECFB, 0x7DFA98A7,
+    0xE92675F4, 0x7DF1A942,
+    0xE8F50273, 0x7DE8A670,
+    0xE8C3927F, 0x7DDF9034,
+    0xE8922621, 0x7DD6668E,
+    0xE860BD60, 0x7DCD2981,
+    0xE82F5844, 0x7DC3D90D,
+    0xE7FDF6D3, 0x7DBA7534,
+    0xE7CC9917, 0x7DB0FDF7,
+    0xE79B3F16, 0x7DA77359,
+    0xE769E8D8, 0x7D9DD55A,
+    0xE7389664, 0x7D9423FB,
+    0xE70747C3, 0x7D8A5F3F,
+    0xE6D5FCFC, 0x7D808727,
+    0xE6A4B616, 0x7D769BB5,
+    0xE6737319, 0x7D6C9CE9,
+    0xE642340D, 0x7D628AC5,
+    0xE610F8F9, 0x7D58654C,
+    0xE5DFC1E4, 0x7D4E2C7E,
+    0xE5AE8ED8, 0x7D43E05E,
+    0xE57D5FDA, 0x7D3980EC,
+    0xE54C34F3, 0x7D2F0E2A,
+    0xE51B0E2A, 0x7D24881A,
+    0xE4E9EB86, 0x7D19EEBE,
+    0xE4B8CD10, 0x7D0F4218,
+    0xE487B2CF, 0x7D048228,
+    0xE4569CCB, 0x7CF9AEF0,
+    0xE4258B0A, 0x7CEEC873,
+    0xE3F47D95, 0x7CE3CEB1,
+    0xE3C37473, 0x7CD8C1AD,
+    0xE3926FAC, 0x7CCDA168,
+    0xE3616F47, 0x7CC26DE5,
+    0xE330734C, 0x7CB72724,
+    0xE2FF7BC3, 0x7CABCD27,
+    0xE2CE88B2, 0x7CA05FF1,
+    0xE29D9A22, 0x7C94DF82,
+    0xE26CB01A, 0x7C894BDD,
+    0xE23BCAA2, 0x7C7DA504,
+    0xE20AE9C1, 0x7C71EAF8,
+    0xE1DA0D7E, 0x7C661DBB,
+    0xE1A935E1, 0x7C5A3D4F,
+    0xE17862F3, 0x7C4E49B6,
+    0xE14794B9, 0x7C4242F2,
+    0xE116CB3D, 0x7C362904,
+    0xE0E60684, 0x7C29FBEE,
+    0xE0B54698, 0x7C1DBBB2,
+    0xE0848B7F, 0x7C116853,
+    0xE053D541, 0x7C0501D1,
+    0xE02323E5, 0x7BF88830,
+    0xDFF27773, 0x7BEBFB70,
+    0xDFC1CFF2, 0x7BDF5B94,
+    0xDF912D6A, 0x7BD2A89E,
+    0xDF608FE3, 0x7BC5E28F,
+    0xDF2FF764, 0x7BB9096A,
+    0xDEFF63F4, 0x7BAC1D31,
+    0xDECED59B, 0x7B9F1DE5,
+    0xDE9E4C60, 0x7B920B89,
+    0xDE6DC84B, 0x7B84E61E,
+    0xDE3D4963, 0x7B77ADA8,
+    0xDE0CCFB1, 0x7B6A6227,
+    0xDDDC5B3A, 0x7B5D039D,
+    0xDDABEC07, 0x7B4F920E,
+    0xDD7B8220, 0x7B420D7A,
+    0xDD4B1D8B, 0x7B3475E4,
+    0xDD1ABE51, 0x7B26CB4F,
+    0xDCEA6478, 0x7B190DBB,
+    0xDCBA1008, 0x7B0B3D2C,
+    0xDC89C108, 0x7AFD59A3,
+    0xDC597781, 0x7AEF6323,
+    0xDC293379, 0x7AE159AE,
+    0xDBF8F4F8, 0x7AD33D45,
+    0xDBC8BC05, 0x7AC50DEB,
+    0xDB9888A8, 0x7AB6CBA3,
+    0xDB685AE8, 0x7AA8766E,
+    0xDB3832CD, 0x7A9A0E4F,
+    0xDB08105E, 0x7A8B9348,
+    0xDAD7F3A2, 0x7A7D055B,
+    0xDAA7DCA1, 0x7A6E648A,
+    0xDA77CB62, 0x7A5FB0D8,
+    0xDA47BFED, 0x7A50EA46,
+    0xDA17BA4A, 0x7A4210D8,
+    0xD9E7BA7E, 0x7A33248F,
+    0xD9B7C093, 0x7A24256E,
+    0xD987CC8F, 0x7A151377,
+    0xD957DE7A, 0x7A05EEAD,
+    0xD927F65B, 0x79F6B711,
+    0xD8F81439, 0x79E76CA6,
+    0xD8C8381C, 0x79D80F6F,
+    0xD898620C, 0x79C89F6D,
+    0xD868920F, 0x79B91CA4,
+    0xD838C82D, 0x79A98715,
+    0xD809046D, 0x7999DEC3,
+    0xD7D946D7, 0x798A23B1,
+    0xD7A98F73, 0x797A55E0,
+    0xD779DE46, 0x796A7554,
+    0xD74A335A, 0x795A820E,
+    0xD71A8EB5, 0x794A7C11,
+    0xD6EAF05E, 0x793A6360,
+    0xD6BB585D, 0x792A37FE,
+    0xD68BC6BA, 0x7919F9EB,
+    0xD65C3B7B, 0x7909A92C,
+    0xD62CB6A7, 0x78F945C3,
+    0xD5FD3847, 0x78E8CFB1,
+    0xD5CDC062, 0x78D846FB,
+    0xD59E4EFE, 0x78C7ABA1,
+    0xD56EE424, 0x78B6FDA8,
+    0xD53F7FDA, 0x78A63D10,
+    0xD5102227, 0x789569DE,
+    0xD4E0CB14, 0x78848413,
+    0xD4B17AA7, 0x78738BB3,
+    0xD48230E8, 0x786280BF,
+    0xD452EDDE, 0x7851633B,
+    0xD423B190, 0x78403328,
+    0xD3F47C06, 0x782EF08B,
+    0xD3C54D46, 0x781D9B64,
+    0xD3962559, 0x780C33B8,
+    0xD3670445, 0x77FAB988,
+    0xD337EA12, 0x77E92CD8,
+    0xD308D6C6, 0x77D78DAA,
+    0xD2D9CA6A, 0x77C5DC01,
+    0xD2AAC504, 0x77B417DF,
+    0xD27BC69C, 0x77A24148,
+    0xD24CCF38, 0x7790583D,
+    0xD21DDEE1, 0x777E5CC3,
+    0xD1EEF59E, 0x776C4EDB,
+    0xD1C01374, 0x775A2E88,
+    0xD191386D, 0x7747FBCE,
+    0xD162648F, 0x7735B6AE,
+    0xD13397E1, 0x77235F2D,
+    0xD104D26B, 0x7710F54B,
+    0xD0D61433, 0x76FE790E,
+    0xD0A75D42, 0x76EBEA77,
+    0xD078AD9D, 0x76D94988,
+    0xD04A054D, 0x76C69646,
+    0xD01B6459, 0x76B3D0B3,
+    0xCFECCAC7, 0x76A0F8D2,
+    0xCFBE389F, 0x768E0EA5,
+    0xCF8FADE8, 0x767B1230,
+    0xCF612AAA, 0x76680376,
+    0xCF32AEEB, 0x7654E279,
+    0xCF043AB2, 0x7641AF3C,
+    0xCED5CE08, 0x762E69C3,
+    0xCEA768F2, 0x761B1211,
+    0xCE790B78, 0x7607A827,
+    0xCE4AB5A2, 0x75F42C0A,
+    0xCE1C6776, 0x75E09DBD,
+    0xCDEE20FC, 0x75CCFD42,
+    0xCDBFE23A, 0x75B94A9C,
+    0xCD91AB38, 0x75A585CF,
+    0xCD637BFD, 0x7591AEDD,
+    0xCD355490, 0x757DC5CA,
+    0xCD0734F8, 0x7569CA98,
+    0xCCD91D3D, 0x7555BD4B,
+    0xCCAB0D65, 0x75419DE6,
+    0xCC7D0577, 0x752D6C6C,
+    0xCC4F057B, 0x751928E0,
+    0xCC210D78, 0x7504D345,
+    0xCBF31D75, 0x74F06B9E,
+    0xCBC53578, 0x74DBF1EF,
+    0xCB975589, 0x74C7663A,
+    0xCB697DB0, 0x74B2C883,
+    0xCB3BADF2, 0x749E18CD,
+    0xCB0DE658, 0x7489571B,
+    0xCAE026E8, 0x74748371,
+    0xCAB26FA9, 0x745F9DD1,
+    0xCA84C0A2, 0x744AA63E,
+    0xCA5719DB, 0x74359CBD,
+    0xCA297B5A, 0x74208150,
+    0xC9FBE527, 0x740B53FA,
+    0xC9CE5748, 0x73F614C0,
+    0xC9A0D1C4, 0x73E0C3A3,
+    0xC97354A3, 0x73CB60A7,
+    0xC945DFEC, 0x73B5EBD0,
+    0xC91873A5, 0x73A06522,
+    0xC8EB0FD6, 0x738ACC9E,
+    0xC8BDB485, 0x73752249,
+    0xC89061BA, 0x735F6626,
+    0xC863177B, 0x73499838,
+    0xC835D5D0, 0x7333B883,
+    0xC8089CBF, 0x731DC709,
+    0xC7DB6C50, 0x7307C3D0,
+    0xC7AE4489, 0x72F1AED8,
+    0xC7812571, 0x72DB8828,
+    0xC7540F10, 0x72C54FC0,
+    0xC727016C, 0x72AF05A6,
+    0xC6F9FC8D, 0x7298A9DC,
+    0xC6CD0079, 0x72823C66,
+    0xC6A00D36, 0x726BBD48,
+    0xC67322CD, 0x72552C84,
+    0xC6464144, 0x723E8A1F,
+    0xC61968A2, 0x7227D61C,
+    0xC5EC98ED, 0x7211107D,
+    0xC5BFD22E, 0x71FA3948,
+    0xC593146A, 0x71E3507F,
+    0xC5665FA8, 0x71CC5626,
+    0xC539B3F0, 0x71B54A40,
+    0xC50D1148, 0x719E2CD2,
+    0xC4E077B8, 0x7186FDDE,
+    0xC4B3E746, 0x716FBD68,
+    0xC4875FF8, 0x71586B73,
+    0xC45AE1D7, 0x71410804,
+    0xC42E6CE8, 0x7129931E,
+    0xC4020132, 0x71120CC5,
+    0xC3D59EBD, 0x70FA74FB,
+    0xC3A9458F, 0x70E2CBC6,
+    0xC37CF5B0, 0x70CB1127,
+    0xC350AF25, 0x70B34524,
+    0xC32471F6, 0x709B67C0,
+    0xC2F83E2A, 0x708378FE,
+    0xC2CC13C7, 0x706B78E3,
+    0xC29FF2D4, 0x70536771,
+    0xC273DB58, 0x703B44AC,
+    0xC247CD5A, 0x70231099,
+    0xC21BC8E0, 0x700ACB3B,
+    0xC1EFCDF2, 0x6FF27496,
+    0xC1C3DC96, 0x6FDA0CAD,
+    0xC197F4D3, 0x6FC19385,
+    0xC16C16B0, 0x6FA90920,
+    0xC1404233, 0x6F906D84,
+    0xC1147763, 0x6F77C0B3,
+    0xC0E8B648, 0x6F5F02B1,
+    0xC0BCFEE7, 0x6F463383,
+    0xC0915147, 0x6F2D532C,
+    0xC065AD70, 0x6F1461AF,
+    0xC03A1368, 0x6EFB5F12,
+    0xC00E8335, 0x6EE24B57,
+    0xBFE2FCDF, 0x6EC92682,
+    0xBFB7806C, 0x6EAFF098,
+    0xBF8C0DE2, 0x6E96A99C,
+    0xBF60A54A, 0x6E7D5193,
+    0xBF3546A8, 0x6E63E87F,
+    0xBF09F204, 0x6E4A6E65,
+    0xBEDEA765, 0x6E30E349,
+    0xBEB366D1, 0x6E17472F,
+    0xBE88304F, 0x6DFD9A1B,
+    0xBE5D03E5, 0x6DE3DC11,
+    0xBE31E19B, 0x6DCA0D14,
+    0xBE06C977, 0x6DB02D29,
+    0xBDDBBB7F, 0x6D963C54,
+    0xBDB0B7BA, 0x6D7C3A98,
+    0xBD85BE2F, 0x6D6227FA,
+    0xBD5ACEE5, 0x6D48047E,
+    0xBD2FE9E1, 0x6D2DD027,
+    0xBD050F2C, 0x6D138AFA,
+    0xBCDA3ECA, 0x6CF934FB,
+    0xBCAF78C3, 0x6CDECE2E,
+    0xBC84BD1E, 0x6CC45697,
+    0xBC5A0BE1, 0x6CA9CE3A,
+    0xBC2F6513, 0x6C8F351C,
+    0xBC04C8BA, 0x6C748B3F,
+    0xBBDA36DC, 0x6C59D0A9,
+    0xBBAFAF81, 0x6C3F055D,
+    0xBB8532AF, 0x6C242960,
+    0xBB5AC06C, 0x6C093CB6,
+    0xBB3058C0, 0x6BEE3F62,
+    0xBB05FBB0, 0x6BD3316A,
+    0xBADBA943, 0x6BB812D0,
+    0xBAB1617F, 0x6B9CE39B,
+    0xBA87246C, 0x6B81A3CD,
+    0xBA5CF210, 0x6B66536A,
+    0xBA32CA70, 0x6B4AF278,
+    0xBA08AD94, 0x6B2F80FA,
+    0xB9DE9B83, 0x6B13FEF5,
+    0xB9B49442, 0x6AF86C6C,
+    0xB98A97D8, 0x6ADCC964,
+    0xB960A64B, 0x6AC115E1,
+    0xB936BFA3, 0x6AA551E8,
+    0xB90CE3E6, 0x6A897D7D,
+    0xB8E31319, 0x6A6D98A4,
+    0xB8B94D44, 0x6A51A361,
+    0xB88F926C, 0x6A359DB9,
+    0xB865E299, 0x6A1987B0,
+    0xB83C3DD1, 0x69FD614A,
+    0xB812A419, 0x69E12A8C,
+    0xB7E9157A, 0x69C4E37A,
+    0xB7BF91F8, 0x69A88C18,
+    0xB796199B, 0x698C246C,
+    0xB76CAC68, 0x696FAC78,
+    0xB7434A67, 0x69532442,
+    0xB719F39D, 0x69368BCE,
+    0xB6F0A811, 0x6919E320,
+    0xB6C767CA, 0x68FD2A3D,
+    0xB69E32CD, 0x68E06129,
+    0xB6750921, 0x68C387E9,
+    0xB64BEACC, 0x68A69E81,
+    0xB622D7D5, 0x6889A4F5,
+    0xB5F9D042, 0x686C9B4B,
+    0xB5D0D41A, 0x684F8186,
+    0xB5A7E362, 0x683257AA,
+    0xB57EFE21, 0x68151DBE,
+    0xB556245E, 0x67F7D3C4,
+    0xB52D561E, 0x67DA79C2,
+    0xB5049368, 0x67BD0FBC,
+    0xB4DBDC42, 0x679F95B7,
+    0xB4B330B2, 0x67820BB6,
+    0xB48A90C0, 0x676471C0,
+    0xB461FC70, 0x6746C7D7,
+    0xB43973C9, 0x67290E02,
+    0xB410F6D2, 0x670B4443,
+    0xB3E88591, 0x66ED6AA1,
+    0xB3C0200C, 0x66CF811F,
+    0xB397C649, 0x66B187C3,
+    0xB36F784E, 0x66937E90,
+    0xB3473622, 0x6675658C,
+    0xB31EFFCB, 0x66573CBB,
+    0xB2F6D54F, 0x66390422,
+    0xB2CEB6B5, 0x661ABBC5,
+    0xB2A6A401, 0x65FC63A9,
+    0xB27E9D3B, 0x65DDFBD3,
+    0xB256A26A, 0x65BF8447,
+    0xB22EB392, 0x65A0FD0B,
+    0xB206D0BA, 0x65826622,
+    0xB1DEF9E8, 0x6563BF92,
+    0xB1B72F23, 0x6545095F,
+    0xB18F7070, 0x6526438E,
+    0xB167BDD6, 0x65076E24,
+    0xB140175B, 0x64E88926,
+    0xB1187D05, 0x64C99498,
+    0xB0F0EEDA, 0x64AA907F,
+    0xB0C96CDF, 0x648B7CDF,
+    0xB0A1F71C, 0x646C59BF,
+    0xB07A8D97, 0x644D2722,
+    0xB0533055, 0x642DE50D,
+    0xB02BDF5C, 0x640E9385,
+    0xB0049AB2, 0x63EF328F,
+    0xAFDD625F, 0x63CFC230,
+    0xAFB63667, 0x63B0426D,
+    0xAF8F16D0, 0x6390B34A,
+    0xAF6803A1, 0x637114CC,
+    0xAF40FCE0, 0x635166F8,
+    0xAF1A0293, 0x6331A9D4,
+    0xAEF314BF, 0x6311DD63,
+    0xAECC336B, 0x62F201AC,
+    0xAEA55E9D, 0x62D216B2,
+    0xAE7E965B, 0x62B21C7B,
+    0xAE57DAAA, 0x6292130C,
+    0xAE312B91, 0x6271FA69,
+    0xAE0A8916, 0x6251D297,
+    0xADE3F33E, 0x62319B9D,
+    0xADBD6A10, 0x6211557D,
+    0xAD96ED91, 0x61F1003E,
+    0xAD707DC8, 0x61D09BE5,
+    0xAD4A1ABA, 0x61B02876,
+    0xAD23C46D, 0x618FA5F6,
+    0xACFD7AE8, 0x616F146B,
+    0xACD73E30, 0x614E73D9,
+    0xACB10E4A, 0x612DC446,
+    0xAC8AEB3E, 0x610D05B7,
+    0xAC64D510, 0x60EC3830,
+    0xAC3ECBC7, 0x60CB5BB6,
+    0xAC18CF68, 0x60AA704F,
+    0xABF2DFFA, 0x60897600,
+    0xABCCFD82, 0x60686CCE,
+    0xABA72806, 0x604754BE,
+    0xAB815F8C, 0x60262DD5,
+    0xAB5BA41A, 0x6004F818,
+    0xAB35F5B5, 0x5FE3B38D,
+    0xAB105464, 0x5FC26038,
+    0xAAEAC02B, 0x5FA0FE1E,
+    0xAAC53912, 0x5F7F8D46,
+    0xAA9FBF1D, 0x5F5E0DB3,
+    0xAA7A5253, 0x5F3C7F6B,
+    0xAA54F2B9, 0x5F1AE273,
+    0xAA2FA055, 0x5EF936D1,
+    0xAA0A5B2D, 0x5ED77C89,
+    0xA9E52347, 0x5EB5B3A1,
+    0xA9BFF8A8, 0x5E93DC1F,
+    0xA99ADB56, 0x5E71F606,
+    0xA975CB56, 0x5E50015D,
+    0xA950C8AF, 0x5E2DFE28,
+    0xA92BD366, 0x5E0BEC6E,
+    0xA906EB81, 0x5DE9CC32,
+    0xA8E21106, 0x5DC79D7C,
+    0xA8BD43FA, 0x5DA5604E,
+    0xA8988463, 0x5D8314B0,
+    0xA873D246, 0x5D60BAA6,
+    0xA84F2DA9, 0x5D3E5236,
+    0xA82A9693, 0x5D1BDB65,
+    0xA8060D08, 0x5CF95638,
+    0xA7E1910E, 0x5CD6C2B4,
+    0xA7BD22AB, 0x5CB420DF,
+    0xA798C1E4, 0x5C9170BF,
+    0xA7746EC0, 0x5C6EB258,
+    0xA7502943, 0x5C4BE5B0,
+    0xA72BF173, 0x5C290ACC,
+    0xA707C756, 0x5C0621B2,
+    0xA6E3AAF2, 0x5BE32A67,
+    0xA6BF9C4B, 0x5BC024F0,
+    0xA69B9B68, 0x5B9D1153,
+    0xA677A84E, 0x5B79EF96,
+    0xA653C302, 0x5B56BFBD,
+    0xA62FEB8B, 0x5B3381CE,
+    0xA60C21ED, 0x5B1035CF,
+    0xA5E8662F, 0x5AECDBC4,
+    0xA5C4B855, 0x5AC973B4,
+    0xA5A11865, 0x5AA5FDA4,
+    0xA57D8666, 0x5A82799A,
+    0xA55A025B, 0x5A5EE79A,
+    0xA5368C4B, 0x5A3B47AA,
+    0xA513243B, 0x5A1799D0,
+    0xA4EFCA31, 0x59F3DE12,
+    0xA4CC7E31, 0x59D01474,
+    0xA4A94042, 0x59AC3CFD,
+    0xA4861069, 0x598857B1,
+    0xA462EEAC, 0x59646497,
+    0xA43FDB0F, 0x594063B4,
+    0xA41CD598, 0x591C550E,
+    0xA3F9DE4D, 0x58F838A9,
+    0xA3D6F533, 0x58D40E8C,
+    0xA3B41A4F, 0x58AFD6BC,
+    0xA3914DA7, 0x588B913F,
+    0xA36E8F40, 0x58673E1B,
+    0xA34BDF20, 0x5842DD54,
+    0xA3293D4B, 0x581E6EF1,
+    0xA306A9C7, 0x57F9F2F7,
+    0xA2E4249A, 0x57D5696C,
+    0xA2C1ADC9, 0x57B0D256,
+    0xA29F4559, 0x578C2DB9,
+    0xA27CEB4F, 0x57677B9D,
+    0xA25A9FB1, 0x5742BC05,
+    0xA2386283, 0x571DEEF9,
+    0xA21633CD, 0x56F9147E,
+    0xA1F41391, 0x56D42C99,
+    0xA1D201D7, 0x56AF3750,
+    0xA1AFFEA2, 0x568A34A9,
+    0xA18E09F9, 0x566524AA,
+    0xA16C23E1, 0x56400757,
+    0xA14A4C5E, 0x561ADCB8,
+    0xA1288376, 0x55F5A4D2,
+    0xA106C92E, 0x55D05FAA,
+    0xA0E51D8C, 0x55AB0D46,
+    0xA0C38094, 0x5585ADAC,
+    0xA0A1F24C, 0x556040E2,
+    0xA08072BA, 0x553AC6ED,
+    0xA05F01E1, 0x55153FD4,
+    0xA03D9FC7, 0x54EFAB9C,
+    0xA01C4C72, 0x54CA0A4A,
+    0x9FFB07E7, 0x54A45BE5,
+    0x9FD9D22A, 0x547EA073,
+    0x9FB8AB41, 0x5458D7F9,
+    0x9F979331, 0x5433027D,
+    0x9F7689FF, 0x540D2005,
+    0x9F558FB0, 0x53E73097,
+    0x9F34A449, 0x53C13438,
+    0x9F13C7D0, 0x539B2AEF,
+    0x9EF2FA48, 0x537514C1,
+    0x9ED23BB9, 0x534EF1B5,
+    0x9EB18C26, 0x5328C1D0,
+    0x9E90EB94, 0x53028517,
+    0x9E705A09, 0x52DC3B92,
+    0x9E4FD789, 0x52B5E545,
+    0x9E2F641A, 0x528F8237,
+    0x9E0EFFC1, 0x5269126E,
+    0x9DEEAA82, 0x524295EF,
+    0x9DCE6462, 0x521C0CC1,
+    0x9DAE2D68, 0x51F576E9,
+    0x9D8E0596, 0x51CED46E,
+    0x9D6DECF4, 0x51A82555,
+    0x9D4DE384, 0x518169A4,
+    0x9D2DE94D, 0x515AA162,
+    0x9D0DFE53, 0x5133CC94,
+    0x9CEE229C, 0x510CEB40,
+    0x9CCE562B, 0x50E5FD6C,
+    0x9CAE9907, 0x50BF031F,
+    0x9C8EEB33, 0x5097FC5E,
+    0x9C6F4CB5, 0x5070E92F,
+    0x9C4FBD92, 0x5049C999,
+    0x9C303DCF, 0x50229DA0,
+    0x9C10CD70, 0x4FFB654D,
+    0x9BF16C7A, 0x4FD420A3,
+    0x9BD21AF2, 0x4FACCFAB,
+    0x9BB2D8DD, 0x4F857268,
+    0x9B93A640, 0x4F5E08E3,
+    0x9B748320, 0x4F369320,
+    0x9B556F80, 0x4F0F1126,
+    0x9B366B67, 0x4EE782FA,
+    0x9B1776D9, 0x4EBFE8A4,
+    0x9AF891DB, 0x4E984229,
+    0x9AD9BC71, 0x4E708F8F,
+    0x9ABAF6A0, 0x4E48D0DC,
+    0x9A9C406D, 0x4E210617,
+    0x9A7D99DD, 0x4DF92F45,
+    0x9A5F02F5, 0x4DD14C6E,
+    0x9A407BB8, 0x4DA95D96,
+    0x9A22042C, 0x4D8162C4,
+    0x9A039C56, 0x4D595BFE,
+    0x99E5443A, 0x4D31494B,
+    0x99C6FBDE, 0x4D092AB0,
+    0x99A8C344, 0x4CE10034,
+    0x998A9A73, 0x4CB8C9DD,
+    0x996C816F, 0x4C9087B1,
+    0x994E783C, 0x4C6839B6,
+    0x99307EE0, 0x4C3FDFF3,
+    0x9912955E, 0x4C177A6E,
+    0x98F4BBBC, 0x4BEF092D,
+    0x98D6F1FE, 0x4BC68C36,
+    0x98B93828, 0x4B9E038F,
+    0x989B8E3F, 0x4B756F3F,
+    0x987DF449, 0x4B4CCF4D,
+    0x98606A48, 0x4B2423BD,
+    0x9842F043, 0x4AFB6C97,
+    0x9825863D, 0x4AD2A9E1,
+    0x98082C3B, 0x4AA9DBA1,
+    0x97EAE241, 0x4A8101DE,
+    0x97CDA855, 0x4A581C9D,
+    0x97B07E7A, 0x4A2F2BE5,
+    0x979364B5, 0x4A062FBD,
+    0x97765B0A, 0x49DD282A,
+    0x9759617E, 0x49B41533,
+    0x973C7816, 0x498AF6DE,
+    0x971F9ED6, 0x4961CD32,
+    0x9702D5C2, 0x49389836,
+    0x96E61CDF, 0x490F57EE,
+    0x96C97431, 0x48E60C62,
+    0x96ACDBBD, 0x48BCB598,
+    0x96905387, 0x48935397,
+    0x9673DB94, 0x4869E664,
+    0x965773E7, 0x48406E07,
+    0x963B1C85, 0x4816EA85,
+    0x961ED573, 0x47ED5BE6,
+    0x96029EB5, 0x47C3C22E,
+    0x95E6784F, 0x479A1D66,
+    0x95CA6246, 0x47706D93,
+    0x95AE5C9E, 0x4746B2BC,
+    0x9592675B, 0x471CECE6,
+    0x95768282, 0x46F31C1A,
+    0x955AAE17, 0x46C9405C,
+    0x953EEA1E, 0x469F59B4,
+    0x9523369B, 0x46756827,
+    0x95079393, 0x464B6BBD,
+    0x94EC010B, 0x4621647C,
+    0x94D07F05, 0x45F7526B,
+    0x94B50D87, 0x45CD358F,
+    0x9499AC95, 0x45A30DF0,
+    0x947E5C32, 0x4578DB93,
+    0x94631C64, 0x454E9E80,
+    0x9447ED2F, 0x452456BC,
+    0x942CCE95, 0x44FA044F,
+    0x9411C09D, 0x44CFA73F,
+    0x93F6C34A, 0x44A53F93,
+    0x93DBD69F, 0x447ACD50,
+    0x93C0FAA2, 0x4450507E,
+    0x93A62F56, 0x4425C923,
+    0x938B74C0, 0x43FB3745,
+    0x9370CAE4, 0x43D09AEC,
+    0x935631C5, 0x43A5F41E,
+    0x933BA968, 0x437B42E1,
+    0x932131D1, 0x4350873C,
+    0x9306CB04, 0x4325C135,
+    0x92EC7505, 0x42FAF0D4,
+    0x92D22FD8, 0x42D0161E,
+    0x92B7FB82, 0x42A5311A,
+    0x929DD805, 0x427A41D0,
+    0x9283C567, 0x424F4845,
+    0x9269C3AC, 0x42244480,
+    0x924FD2D6, 0x41F93688,
+    0x9235F2EB, 0x41CE1E64,
+    0x921C23EE, 0x41A2FC1A,
+    0x920265E4, 0x4177CFB0,
+    0x91E8B8D0, 0x414C992E,
+    0x91CF1CB6, 0x4121589A,
+    0x91B5919A, 0x40F60DFB,
+    0x919C1780, 0x40CAB957,
+    0x9182AE6C, 0x409F5AB6,
+    0x91695663, 0x4073F21D,
+    0x91500F67, 0x40487F93,
+    0x9136D97D, 0x401D0320,
+    0x911DB4A8, 0x3FF17CCA,
+    0x9104A0ED, 0x3FC5EC97,
+    0x90EB9E50, 0x3F9A528F,
+    0x90D2ACD3, 0x3F6EAEB8,
+    0x90B9CC7C, 0x3F430118,
+    0x90A0FD4E, 0x3F1749B7,
+    0x90883F4C, 0x3EEB889C,
+    0x906F927B, 0x3EBFBDCC,
+    0x9056F6DF, 0x3E93E94F,
+    0x903E6C7A, 0x3E680B2C,
+    0x9025F352, 0x3E3C2369,
+    0x900D8B69, 0x3E10320D,
+    0x8FF534C4, 0x3DE4371F,
+    0x8FDCEF66, 0x3DB832A5,
+    0x8FC4BB53, 0x3D8C24A7,
+    0x8FAC988E, 0x3D600D2B,
+    0x8F94871D, 0x3D33EC39,
+    0x8F7C8701, 0x3D07C1D5,
+    0x8F64983F, 0x3CDB8E09,
+    0x8F4CBADB, 0x3CAF50DA,
+    0x8F34EED8, 0x3C830A4F,
+    0x8F1D343A, 0x3C56BA70,
+    0x8F058B04, 0x3C2A6142,
+    0x8EEDF33B, 0x3BFDFECD,
+    0x8ED66CE1, 0x3BD19317,
+    0x8EBEF7FB, 0x3BA51E29,
+    0x8EA7948C, 0x3B78A007,
+    0x8E904298, 0x3B4C18BA,
+    0x8E790222, 0x3B1F8847,
+    0x8E61D32D, 0x3AF2EEB7,
+    0x8E4AB5BF, 0x3AC64C0F,
+    0x8E33A9D9, 0x3A99A057,
+    0x8E1CAF80, 0x3A6CEB95,
+    0x8E05C6B7, 0x3A402DD1,
+    0x8DEEEF82, 0x3A136712,
+    0x8DD829E4, 0x39E6975D,
+    0x8DC175E0, 0x39B9BEBB,
+    0x8DAAD37B, 0x398CDD32,
+    0x8D9442B7, 0x395FF2C9,
+    0x8D7DC399, 0x3932FF87,
+    0x8D675623, 0x39060372,
+    0x8D50FA59, 0x38D8FE93,
+    0x8D3AB03F, 0x38ABF0EF,
+    0x8D2477D8, 0x387EDA8E,
+    0x8D0E5127, 0x3851BB76,
+    0x8CF83C30, 0x382493B0,
+    0x8CE238F6, 0x37F76340,
+    0x8CCC477D, 0x37CA2A30,
+    0x8CB667C7, 0x379CE884,
+    0x8CA099D9, 0x376F9E46,
+    0x8C8ADDB6, 0x37424B7A,
+    0x8C753361, 0x3714F02A,
+    0x8C5F9ADD, 0x36E78C5A,
+    0x8C4A142F, 0x36BA2013,
+    0x8C349F58, 0x368CAB5C,
+    0x8C1F3C5C, 0x365F2E3B,
+    0x8C09EB40, 0x3631A8B7,
+    0x8BF4AC05, 0x36041AD9,
+    0x8BDF7EAF, 0x35D684A5,
+    0x8BCA6342, 0x35A8E624,
+    0x8BB559C1, 0x357B3F5D,
+    0x8BA0622F, 0x354D9056,
+    0x8B8B7C8F, 0x351FD917,
+    0x8B76A8E4, 0x34F219A7,
+    0x8B61E732, 0x34C4520D,
+    0x8B4D377C, 0x3496824F,
+    0x8B3899C5, 0x3468AA76,
+    0x8B240E10, 0x343ACA87,
+    0x8B0F9461, 0x340CE28A,
+    0x8AFB2CBA, 0x33DEF287,
+    0x8AE6D71F, 0x33B0FA84,
+    0x8AD29393, 0x3382FA88,
+    0x8ABE6219, 0x3354F29A,
+    0x8AAA42B4, 0x3326E2C2,
+    0x8A963567, 0x32F8CB07,
+    0x8A823A35, 0x32CAAB6F,
+    0x8A6E5122, 0x329C8402,
+    0x8A5A7A30, 0x326E54C7,
+    0x8A46B563, 0x32401DC5,
+    0x8A3302BD, 0x3211DF03,
+    0x8A1F6242, 0x31E39889,
+    0x8A0BD3F5, 0x31B54A5D,
+    0x89F857D8, 0x3186F487,
+    0x89E4EDEE, 0x3158970D,
+    0x89D1963C, 0x312A31F8,
+    0x89BE50C3, 0x30FBC54D,
+    0x89AB1D86, 0x30CD5114,
+    0x8997FC89, 0x309ED555,
+    0x8984EDCF, 0x30705217,
+    0x8971F15A, 0x3041C760,
+    0x895F072D, 0x30133538,
+    0x894C2F4C, 0x2FE49BA6,
+    0x893969B9, 0x2FB5FAB2,
+    0x8926B677, 0x2F875262,
+    0x89141589, 0x2F58A2BD,
+    0x890186F1, 0x2F29EBCC,
+    0x88EF0AB4, 0x2EFB2D94,
+    0x88DCA0D3, 0x2ECC681E,
+    0x88CA4951, 0x2E9D9B70,
+    0x88B80431, 0x2E6EC792,
+    0x88A5D177, 0x2E3FEC8B,
+    0x8893B124, 0x2E110A62,
+    0x8881A33C, 0x2DE2211E,
+    0x886FA7C2, 0x2DB330C7,
+    0x885DBEB7, 0x2D843963,
+    0x884BE820, 0x2D553AFB,
+    0x883A23FE, 0x2D263595,
+    0x88287255, 0x2CF72939,
+    0x8816D327, 0x2CC815ED,
+    0x88054677, 0x2C98FBBA,
+    0x87F3CC47, 0x2C69DAA6,
+    0x87E2649B, 0x2C3AB2B9,
+    0x87D10F75, 0x2C0B83F9,
+    0x87BFCCD7, 0x2BDC4E6F,
+    0x87AE9CC5, 0x2BAD1221,
+    0x879D7F40, 0x2B7DCF17,
+    0x878C744C, 0x2B4E8558,
+    0x877B7BEC, 0x2B1F34EB,
+    0x876A9621, 0x2AEFDDD8,
+    0x8759C2EF, 0x2AC08025,
+    0x87490257, 0x2A911BDB,
+    0x8738545E, 0x2A61B101,
+    0x8727B904, 0x2A323F9D,
+    0x8717304E, 0x2A02C7B8,
+    0x8706BA3C, 0x29D34958,
+    0x86F656D3, 0x29A3C484,
+    0x86E60614, 0x29743945,
+    0x86D5C802, 0x2944A7A2,
+    0x86C59C9F, 0x29150FA1,
+    0x86B583EE, 0x28E5714A,
+    0x86A57DF1, 0x28B5CCA5,
+    0x86958AAB, 0x288621B9,
+    0x8685AA1F, 0x2856708C,
+    0x8675DC4E, 0x2826B928,
+    0x8666213C, 0x27F6FB92,
+    0x865678EA, 0x27C737D2,
+    0x8646E35B, 0x27976DF1,
+    0x86376092, 0x27679DF4,
+    0x8627F090, 0x2737C7E3,
+    0x86189359, 0x2707EBC6,
+    0x860948EE, 0x26D809A5,
+    0x85FA1152, 0x26A82185,
+    0x85EAEC88, 0x26783370,
+    0x85DBDA91, 0x26483F6C,
+    0x85CCDB70, 0x26184581,
+    0x85BDEF27, 0x25E845B5,
+    0x85AF15B9, 0x25B84012,
+    0x85A04F28, 0x2588349D,
+    0x85919B75, 0x2558235E,
+    0x8582FAA4, 0x25280C5D,
+    0x85746CB7, 0x24F7EFA1,
+    0x8565F1B0, 0x24C7CD32,
+    0x85578991, 0x2497A517,
+    0x8549345C, 0x24677757,
+    0x853AF214, 0x243743FA,
+    0x852CC2BA, 0x24070B07,
+    0x851EA652, 0x23D6CC86,
+    0x85109CDC, 0x23A6887E,
+    0x8502A65C, 0x23763EF7,
+    0x84F4C2D3, 0x2345EFF7,
+    0x84E6F244, 0x23159B87,
+    0x84D934B0, 0x22E541AE,
+    0x84CB8A1B, 0x22B4E274,
+    0x84BDF285, 0x22847DDF,
+    0x84B06DF1, 0x225413F8,
+    0x84A2FC62, 0x2223A4C5,
+    0x84959DD9, 0x21F3304E,
+    0x84885257, 0x21C2B69C,
+    0x847B19E1, 0x219237B4,
+    0x846DF476, 0x2161B39F,
+    0x8460E21A, 0x21312A65,
+    0x8453E2CE, 0x21009C0B,
+    0x8446F695, 0x20D0089B,
+    0x843A1D70, 0x209F701C,
+    0x842D5761, 0x206ED295,
+    0x8420A46B, 0x203E300D,
+    0x8414048F, 0x200D888C,
+    0x840777CF, 0x1FDCDC1A,
+    0x83FAFE2E, 0x1FAC2ABF,
+    0x83EE97AC, 0x1F7B7480,
+    0x83E2444D, 0x1F4AB967,
+    0x83D60411, 0x1F19F97B,
+    0x83C9D6FB, 0x1EE934C2,
+    0x83BDBD0D, 0x1EB86B46,
+    0x83B1B649, 0x1E879D0C,
+    0x83A5C2B0, 0x1E56CA1E,
+    0x8399E244, 0x1E25F281,
+    0x838E1507, 0x1DF5163F,
+    0x83825AFB, 0x1DC4355D,
+    0x8376B422, 0x1D934FE5,
+    0x836B207D, 0x1D6265DD,
+    0x835FA00E, 0x1D31774D,
+    0x835432D8, 0x1D00843C,
+    0x8348D8DB, 0x1CCF8CB3,
+    0x833D921A, 0x1C9E90B8,
+    0x83325E97, 0x1C6D9053,
+    0x83273E52, 0x1C3C8B8C,
+    0x831C314E, 0x1C0B826A,
+    0x8311378C, 0x1BDA74F5,
+    0x8306510F, 0x1BA96334,
+    0x82FB7DD8, 0x1B784D30,
+    0x82F0BDE8, 0x1B4732EF,
+    0x82E61141, 0x1B161479,
+    0x82DB77E5, 0x1AE4F1D6,
+    0x82D0F1D5, 0x1AB3CB0C,
+    0x82C67F13, 0x1A82A025,
+    0x82BC1FA1, 0x1A517127,
+    0x82B1D381, 0x1A203E1B,
+    0x82A79AB3, 0x19EF0706,
+    0x829D753A, 0x19BDCBF2,
+    0x82936316, 0x198C8CE6,
+    0x8289644A, 0x195B49E9,
+    0x827F78D8, 0x192A0303,
+    0x8275A0C0, 0x18F8B83C,
+    0x826BDC04, 0x18C7699B,
+    0x82622AA5, 0x18961727,
+    0x82588CA6, 0x1864C0E9,
+    0x824F0208, 0x183366E8,
+    0x82458ACB, 0x1802092C,
+    0x823C26F2, 0x17D0A7BB,
+    0x8232D67E, 0x179F429F,
+    0x82299971, 0x176DD9DE,
+    0x82206FCB, 0x173C6D80,
+    0x8217598F, 0x170AFD8D,
+    0x820E56BE, 0x16D98A0C,
+    0x82056758, 0x16A81305,
+    0x81FC8B60, 0x1676987F,
+    0x81F3C2D7, 0x16451A83,
+    0x81EB0DBD, 0x16139917,
+    0x81E26C16, 0x15E21444,
+    0x81D9DDE1, 0x15B08C11,
+    0x81D16320, 0x157F0086,
+    0x81C8FBD5, 0x154D71AA,
+    0x81C0A801, 0x151BDF85,
+    0x81B867A4, 0x14EA4A1F,
+    0x81B03AC1, 0x14B8B17F,
+    0x81A82159, 0x148715AD,
+    0x81A01B6C, 0x145576B1,
+    0x819828FD, 0x1423D492,
+    0x81904A0C, 0x13F22F57,
+    0x81887E9A, 0x13C0870A,
+    0x8180C6A9, 0x138EDBB0,
+    0x8179223A, 0x135D2D53,
+    0x8171914E, 0x132B7BF9,
+    0x816A13E6, 0x12F9C7AA,
+    0x8162AA03, 0x12C8106E,
+    0x815B53A8, 0x1296564D,
+    0x815410D3, 0x1264994E,
+    0x814CE188, 0x1232D978,
+    0x8145C5C6, 0x120116D4,
+    0x813EBD90, 0x11CF516A,
+    0x8137C8E6, 0x119D8940,
+    0x8130E7C8, 0x116BBE5F,
+    0x812A1A39, 0x1139F0CE,
+    0x81236039, 0x11082096,
+    0x811CB9CA, 0x10D64DBC,
+    0x811626EC, 0x10A4784A,
+    0x810FA7A0, 0x1072A047,
+    0x81093BE8, 0x1040C5BB,
+    0x8102E3C3, 0x100EE8AD,
+    0x80FC9F35, 0x0FDD0925,
+    0x80F66E3C, 0x0FAB272B,
+    0x80F050DB, 0x0F7942C6,
+    0x80EA4712, 0x0F475BFE,
+    0x80E450E2, 0x0F1572DC,
+    0x80DE6E4C, 0x0EE38765,
+    0x80D89F51, 0x0EB199A3,
+    0x80D2E3F1, 0x0E7FA99D,
+    0x80CD3C2F, 0x0E4DB75B,
+    0x80C7A80A, 0x0E1BC2E3,
+    0x80C22783, 0x0DE9CC3F,
+    0x80BCBA9C, 0x0DB7D376,
+    0x80B76155, 0x0D85D88F,
+    0x80B21BAF, 0x0D53DB92,
+    0x80ACE9AB, 0x0D21DC87,
+    0x80A7CB49, 0x0CEFDB75,
+    0x80A2C08B, 0x0CBDD865,
+    0x809DC970, 0x0C8BD35E,
+    0x8098E5FB, 0x0C59CC67,
+    0x8094162B, 0x0C27C389,
+    0x808F5A02, 0x0BF5B8CB,
+    0x808AB180, 0x0BC3AC35,
+    0x80861CA5, 0x0B919DCE,
+    0x80819B74, 0x0B5F8D9F,
+    0x807D2DEB, 0x0B2D7BAE,
+    0x8078D40D, 0x0AFB6805,
+    0x80748DD9, 0x0AC952AA,
+    0x80705B50, 0x0A973BA5,
+    0x806C3C73, 0x0A6522FE,
+    0x80683143, 0x0A3308BC,
+    0x806439C0, 0x0A00ECE8,
+    0x806055EA, 0x09CECF89,
+    0x805C85C3, 0x099CB0A7,
+    0x8058C94C, 0x096A9049,
+    0x80552083, 0x09386E77,
+    0x80518B6B, 0x09064B3A,
+    0x804E0A03, 0x08D42698,
+    0x804A9C4D, 0x08A2009A,
+    0x80474248, 0x086FD947,
+    0x8043FBF6, 0x083DB0A7,
+    0x8040C956, 0x080B86C1,
+    0x803DAA69, 0x07D95B9E,
+    0x803A9F31, 0x07A72F45,
+    0x8037A7AC, 0x077501BE,
+    0x8034C3DC, 0x0742D310,
+    0x8031F3C1, 0x0710A344,
+    0x802F375C, 0x06DE7261,
+    0x802C8EAD, 0x06AC406F,
+    0x8029F9B4, 0x067A0D75,
+    0x80277872, 0x0647D97C,
+    0x80250AE7, 0x0615A48A,
+    0x8022B113, 0x05E36EA9,
+    0x80206AF8, 0x05B137DF,
+    0x801E3894, 0x057F0034,
+    0x801C19E9, 0x054CC7B0,
+    0x801A0EF7, 0x051A8E5C,
+    0x801817BF, 0x04E8543D,
+    0x80163440, 0x04B6195D,
+    0x8014647A, 0x0483DDC3,
+    0x8012A86F, 0x0451A176,
+    0x8011001E, 0x041F647F,
+    0x800F6B88, 0x03ED26E6,
+    0x800DEAAC, 0x03BAE8B1,
+    0x800C7D8C, 0x0388A9E9,
+    0x800B2427, 0x03566A96,
+    0x8009DE7D, 0x03242ABF,
+    0x8008AC90, 0x02F1EA6B,
+    0x80078E5E, 0x02BFA9A4,
+    0x800683E8, 0x028D6870,
+    0x80058D2E, 0x025B26D7,
+    0x8004AA31, 0x0228E4E1,
+    0x8003DAF0, 0x01F6A296,
+    0x80031F6C, 0x01C45FFE,
+    0x800277A5, 0x01921D1F,
+    0x8001E39B, 0x015FDA03,
+    0x8001634D, 0x012D96B0,
+    0x8000F6BD, 0x00FB532F,
+    0x80009DE9, 0x00C90F88,
+    0x800058D3, 0x0096CBC1,
+    0x8000277A, 0x006487E3,
+    0x800009DE, 0x003243F5,
+    0x80000000, 0x00000000,
+    0x800009DE, 0xFFCDBC0A,
+    0x8000277A, 0xFF9B781D,
+    0x800058D3, 0xFF69343E,
+    0x80009DE9, 0xFF36F078,
+    0x8000F6BD, 0xFF04ACD0,
+    0x8001634D, 0xFED2694F,
+    0x8001E39B, 0xFEA025FC,
+    0x800277A5, 0xFE6DE2E0,
+    0x80031F6C, 0xFE3BA001,
+    0x8003DAF0, 0xFE095D69,
+    0x8004AA31, 0xFDD71B1E,
+    0x80058D2E, 0xFDA4D928,
+    0x800683E8, 0xFD72978F,
+    0x80078E5E, 0xFD40565B,
+    0x8008AC90, 0xFD0E1594,
+    0x8009DE7D, 0xFCDBD541,
+    0x800B2427, 0xFCA99569,
+    0x800C7D8C, 0xFC775616,
+    0x800DEAAC, 0xFC45174E,
+    0x800F6B88, 0xFC12D919,
+    0x8011001E, 0xFBE09B80,
+    0x8012A86F, 0xFBAE5E89,
+    0x8014647A, 0xFB7C223C,
+    0x80163440, 0xFB49E6A2,
+    0x801817BF, 0xFB17ABC2,
+    0x801A0EF7, 0xFAE571A4,
+    0x801C19E9, 0xFAB3384F,
+    0x801E3894, 0xFA80FFCB,
+    0x80206AF8, 0xFA4EC820,
+    0x8022B113, 0xFA1C9156,
+    0x80250AE7, 0xF9EA5B75,
+    0x80277872, 0xF9B82683,
+    0x8029F9B4, 0xF985F28A,
+    0x802C8EAD, 0xF953BF90,
+    0x802F375C, 0xF9218D9E,
+    0x8031F3C1, 0xF8EF5CBB,
+    0x8034C3DC, 0xF8BD2CEF,
+    0x8037A7AC, 0xF88AFE41,
+    0x803A9F31, 0xF858D0BA,
+    0x803DAA69, 0xF826A461,
+    0x8040C956, 0xF7F4793E,
+    0x8043FBF6, 0xF7C24F58,
+    0x80474248, 0xF79026B8,
+    0x804A9C4D, 0xF75DFF65,
+    0x804E0A03, 0xF72BD967,
+    0x80518B6B, 0xF6F9B4C5,
+    0x80552083, 0xF6C79188,
+    0x8058C94C, 0xF6956FB6,
+    0x805C85C3, 0xF6634F58,
+    0x806055EA, 0xF6313076,
+    0x806439C0, 0xF5FF1317,
+    0x80683143, 0xF5CCF743,
+    0x806C3C73, 0xF59ADD01,
+    0x80705B50, 0xF568C45A,
+    0x80748DD9, 0xF536AD55,
+    0x8078D40D, 0xF50497FA,
+    0x807D2DEB, 0xF4D28451,
+    0x80819B74, 0xF4A07260,
+    0x80861CA5, 0xF46E6231,
+    0x808AB180, 0xF43C53CA,
+    0x808F5A02, 0xF40A4734,
+    0x8094162B, 0xF3D83C76,
+    0x8098E5FB, 0xF3A63398,
+    0x809DC970, 0xF3742CA1,
+    0x80A2C08B, 0xF342279A,
+    0x80A7CB49, 0xF310248A,
+    0x80ACE9AB, 0xF2DE2378,
+    0x80B21BAF, 0xF2AC246D,
+    0x80B76155, 0xF27A2770,
+    0x80BCBA9C, 0xF2482C89,
+    0x80C22783, 0xF21633C0,
+    0x80C7A80A, 0xF1E43D1C,
+    0x80CD3C2F, 0xF1B248A5,
+    0x80D2E3F1, 0xF1805662,
+    0x80D89F51, 0xF14E665C,
+    0x80DE6E4C, 0xF11C789A,
+    0x80E450E2, 0xF0EA8D23,
+    0x80EA4712, 0xF0B8A401,
+    0x80F050DB, 0xF086BD39,
+    0x80F66E3C, 0xF054D8D4,
+    0x80FC9F35, 0xF022F6DA,
+    0x8102E3C3, 0xEFF11752,
+    0x81093BE8, 0xEFBF3A44,
+    0x810FA7A0, 0xEF8D5FB8,
+    0x811626EC, 0xEF5B87B5,
+    0x811CB9CA, 0xEF29B243,
+    0x81236039, 0xEEF7DF6A,
+    0x812A1A39, 0xEEC60F31,
+    0x8130E7C8, 0xEE9441A0,
+    0x8137C8E6, 0xEE6276BF,
+    0x813EBD90, 0xEE30AE95,
+    0x8145C5C6, 0xEDFEE92B,
+    0x814CE188, 0xEDCD2687,
+    0x815410D3, 0xED9B66B2,
+    0x815B53A8, 0xED69A9B2,
+    0x8162AA03, 0xED37EF91,
+    0x816A13E6, 0xED063855,
+    0x8171914E, 0xECD48406,
+    0x8179223A, 0xECA2D2AC,
+    0x8180C6A9, 0xEC71244F,
+    0x81887E9A, 0xEC3F78F5,
+    0x81904A0C, 0xEC0DD0A8,
+    0x819828FD, 0xEBDC2B6D,
+    0x81A01B6C, 0xEBAA894E,
+    0x81A82159, 0xEB78EA52,
+    0x81B03AC1, 0xEB474E80,
+    0x81B867A4, 0xEB15B5E0,
+    0x81C0A801, 0xEAE4207A,
+    0x81C8FBD5, 0xEAB28E55,
+    0x81D16320, 0xEA80FF79,
+    0x81D9DDE1, 0xEA4F73EE,
+    0x81E26C16, 0xEA1DEBBB,
+    0x81EB0DBD, 0xE9EC66E8,
+    0x81F3C2D7, 0xE9BAE57C,
+    0x81FC8B60, 0xE9896780,
+    0x82056758, 0xE957ECFB,
+    0x820E56BE, 0xE92675F4,
+    0x8217598F, 0xE8F50273,
+    0x82206FCB, 0xE8C3927F,
+    0x82299971, 0xE8922621,
+    0x8232D67E, 0xE860BD60,
+    0x823C26F2, 0xE82F5844,
+    0x82458ACB, 0xE7FDF6D3,
+    0x824F0208, 0xE7CC9917,
+    0x82588CA6, 0xE79B3F16,
+    0x82622AA5, 0xE769E8D8,
+    0x826BDC04, 0xE7389664,
+    0x8275A0C0, 0xE70747C3,
+    0x827F78D8, 0xE6D5FCFC,
+    0x8289644A, 0xE6A4B616,
+    0x82936316, 0xE6737319,
+    0x829D753A, 0xE642340D,
+    0x82A79AB3, 0xE610F8F9,
+    0x82B1D381, 0xE5DFC1E4,
+    0x82BC1FA1, 0xE5AE8ED8,
+    0x82C67F13, 0xE57D5FDA,
+    0x82D0F1D5, 0xE54C34F3,
+    0x82DB77E5, 0xE51B0E2A,
+    0x82E61141, 0xE4E9EB86,
+    0x82F0BDE8, 0xE4B8CD10,
+    0x82FB7DD8, 0xE487B2CF,
+    0x8306510F, 0xE4569CCB,
+    0x8311378C, 0xE4258B0A,
+    0x831C314E, 0xE3F47D95,
+    0x83273E52, 0xE3C37473,
+    0x83325E97, 0xE3926FAC,
+    0x833D921A, 0xE3616F47,
+    0x8348D8DB, 0xE330734C,
+    0x835432D8, 0xE2FF7BC3,
+    0x835FA00E, 0xE2CE88B2,
+    0x836B207D, 0xE29D9A22,
+    0x8376B422, 0xE26CB01A,
+    0x83825AFB, 0xE23BCAA2,
+    0x838E1507, 0xE20AE9C1,
+    0x8399E244, 0xE1DA0D7E,
+    0x83A5C2B0, 0xE1A935E1,
+    0x83B1B649, 0xE17862F3,
+    0x83BDBD0D, 0xE14794B9,
+    0x83C9D6FB, 0xE116CB3D,
+    0x83D60411, 0xE0E60684,
+    0x83E2444D, 0xE0B54698,
+    0x83EE97AC, 0xE0848B7F,
+    0x83FAFE2E, 0xE053D541,
+    0x840777CF, 0xE02323E5,
+    0x8414048F, 0xDFF27773,
+    0x8420A46B, 0xDFC1CFF2,
+    0x842D5761, 0xDF912D6A,
+    0x843A1D70, 0xDF608FE3,
+    0x8446F695, 0xDF2FF764,
+    0x8453E2CE, 0xDEFF63F4,
+    0x8460E21A, 0xDECED59B,
+    0x846DF476, 0xDE9E4C60,
+    0x847B19E1, 0xDE6DC84B,
+    0x84885257, 0xDE3D4963,
+    0x84959DD9, 0xDE0CCFB1,
+    0x84A2FC62, 0xDDDC5B3A,
+    0x84B06DF1, 0xDDABEC07,
+    0x84BDF285, 0xDD7B8220,
+    0x84CB8A1B, 0xDD4B1D8B,
+    0x84D934B0, 0xDD1ABE51,
+    0x84E6F244, 0xDCEA6478,
+    0x84F4C2D3, 0xDCBA1008,
+    0x8502A65C, 0xDC89C108,
+    0x85109CDC, 0xDC597781,
+    0x851EA652, 0xDC293379,
+    0x852CC2BA, 0xDBF8F4F8,
+    0x853AF214, 0xDBC8BC05,
+    0x8549345C, 0xDB9888A8,
+    0x85578991, 0xDB685AE8,
+    0x8565F1B0, 0xDB3832CD,
+    0x85746CB7, 0xDB08105E,
+    0x8582FAA4, 0xDAD7F3A2,
+    0x85919B75, 0xDAA7DCA1,
+    0x85A04F28, 0xDA77CB62,
+    0x85AF15B9, 0xDA47BFED,
+    0x85BDEF27, 0xDA17BA4A,
+    0x85CCDB70, 0xD9E7BA7E,
+    0x85DBDA91, 0xD9B7C093,
+    0x85EAEC88, 0xD987CC8F,
+    0x85FA1152, 0xD957DE7A,
+    0x860948EE, 0xD927F65B,
+    0x86189359, 0xD8F81439,
+    0x8627F090, 0xD8C8381C,
+    0x86376092, 0xD898620C,
+    0x8646E35B, 0xD868920F,
+    0x865678EA, 0xD838C82D,
+    0x8666213C, 0xD809046D,
+    0x8675DC4E, 0xD7D946D7,
+    0x8685AA1F, 0xD7A98F73,
+    0x86958AAB, 0xD779DE46,
+    0x86A57DF1, 0xD74A335A,
+    0x86B583EE, 0xD71A8EB5,
+    0x86C59C9F, 0xD6EAF05E,
+    0x86D5C802, 0xD6BB585D,
+    0x86E60614, 0xD68BC6BA,
+    0x86F656D3, 0xD65C3B7B,
+    0x8706BA3C, 0xD62CB6A7,
+    0x8717304E, 0xD5FD3847,
+    0x8727B904, 0xD5CDC062,
+    0x8738545E, 0xD59E4EFE,
+    0x87490257, 0xD56EE424,
+    0x8759C2EF, 0xD53F7FDA,
+    0x876A9621, 0xD5102227,
+    0x877B7BEC, 0xD4E0CB14,
+    0x878C744C, 0xD4B17AA7,
+    0x879D7F40, 0xD48230E8,
+    0x87AE9CC5, 0xD452EDDE,
+    0x87BFCCD7, 0xD423B190,
+    0x87D10F75, 0xD3F47C06,
+    0x87E2649B, 0xD3C54D46,
+    0x87F3CC47, 0xD3962559,
+    0x88054677, 0xD3670445,
+    0x8816D327, 0xD337EA12,
+    0x88287255, 0xD308D6C6,
+    0x883A23FE, 0xD2D9CA6A,
+    0x884BE820, 0xD2AAC504,
+    0x885DBEB7, 0xD27BC69C,
+    0x886FA7C2, 0xD24CCF38,
+    0x8881A33C, 0xD21DDEE1,
+    0x8893B124, 0xD1EEF59E,
+    0x88A5D177, 0xD1C01374,
+    0x88B80431, 0xD191386D,
+    0x88CA4951, 0xD162648F,
+    0x88DCA0D3, 0xD13397E1,
+    0x88EF0AB4, 0xD104D26B,
+    0x890186F1, 0xD0D61433,
+    0x89141589, 0xD0A75D42,
+    0x8926B677, 0xD078AD9D,
+    0x893969B9, 0xD04A054D,
+    0x894C2F4C, 0xD01B6459,
+    0x895F072D, 0xCFECCAC7,
+    0x8971F15A, 0xCFBE389F,
+    0x8984EDCF, 0xCF8FADE8,
+    0x8997FC89, 0xCF612AAA,
+    0x89AB1D86, 0xCF32AEEB,
+    0x89BE50C3, 0xCF043AB2,
+    0x89D1963C, 0xCED5CE08,
+    0x89E4EDEE, 0xCEA768F2,
+    0x89F857D8, 0xCE790B78,
+    0x8A0BD3F5, 0xCE4AB5A2,
+    0x8A1F6242, 0xCE1C6776,
+    0x8A3302BD, 0xCDEE20FC,
+    0x8A46B563, 0xCDBFE23A,
+    0x8A5A7A30, 0xCD91AB38,
+    0x8A6E5122, 0xCD637BFD,
+    0x8A823A35, 0xCD355490,
+    0x8A963567, 0xCD0734F8,
+    0x8AAA42B4, 0xCCD91D3D,
+    0x8ABE6219, 0xCCAB0D65,
+    0x8AD29393, 0xCC7D0577,
+    0x8AE6D71F, 0xCC4F057B,
+    0x8AFB2CBA, 0xCC210D78,
+    0x8B0F9461, 0xCBF31D75,
+    0x8B240E10, 0xCBC53578,
+    0x8B3899C5, 0xCB975589,
+    0x8B4D377C, 0xCB697DB0,
+    0x8B61E732, 0xCB3BADF2,
+    0x8B76A8E4, 0xCB0DE658,
+    0x8B8B7C8F, 0xCAE026E8,
+    0x8BA0622F, 0xCAB26FA9,
+    0x8BB559C1, 0xCA84C0A2,
+    0x8BCA6342, 0xCA5719DB,
+    0x8BDF7EAF, 0xCA297B5A,
+    0x8BF4AC05, 0xC9FBE527,
+    0x8C09EB40, 0xC9CE5748,
+    0x8C1F3C5C, 0xC9A0D1C4,
+    0x8C349F58, 0xC97354A3,
+    0x8C4A142F, 0xC945DFEC,
+    0x8C5F9ADD, 0xC91873A5,
+    0x8C753361, 0xC8EB0FD6,
+    0x8C8ADDB6, 0xC8BDB485,
+    0x8CA099D9, 0xC89061BA,
+    0x8CB667C7, 0xC863177B,
+    0x8CCC477D, 0xC835D5D0,
+    0x8CE238F6, 0xC8089CBF,
+    0x8CF83C30, 0xC7DB6C50,
+    0x8D0E5127, 0xC7AE4489,
+    0x8D2477D8, 0xC7812571,
+    0x8D3AB03F, 0xC7540F10,
+    0x8D50FA59, 0xC727016C,
+    0x8D675623, 0xC6F9FC8D,
+    0x8D7DC399, 0xC6CD0079,
+    0x8D9442B7, 0xC6A00D36,
+    0x8DAAD37B, 0xC67322CD,
+    0x8DC175E0, 0xC6464144,
+    0x8DD829E4, 0xC61968A2,
+    0x8DEEEF82, 0xC5EC98ED,
+    0x8E05C6B7, 0xC5BFD22E,
+    0x8E1CAF80, 0xC593146A,
+    0x8E33A9D9, 0xC5665FA8,
+    0x8E4AB5BF, 0xC539B3F0,
+    0x8E61D32D, 0xC50D1148,
+    0x8E790222, 0xC4E077B8,
+    0x8E904298, 0xC4B3E746,
+    0x8EA7948C, 0xC4875FF8,
+    0x8EBEF7FB, 0xC45AE1D7,
+    0x8ED66CE1, 0xC42E6CE8,
+    0x8EEDF33B, 0xC4020132,
+    0x8F058B04, 0xC3D59EBD,
+    0x8F1D343A, 0xC3A9458F,
+    0x8F34EED8, 0xC37CF5B0,
+    0x8F4CBADB, 0xC350AF25,
+    0x8F64983F, 0xC32471F6,
+    0x8F7C8701, 0xC2F83E2A,
+    0x8F94871D, 0xC2CC13C7,
+    0x8FAC988E, 0xC29FF2D4,
+    0x8FC4BB53, 0xC273DB58,
+    0x8FDCEF66, 0xC247CD5A,
+    0x8FF534C4, 0xC21BC8E0,
+    0x900D8B69, 0xC1EFCDF2,
+    0x9025F352, 0xC1C3DC96,
+    0x903E6C7A, 0xC197F4D3,
+    0x9056F6DF, 0xC16C16B0,
+    0x906F927B, 0xC1404233,
+    0x90883F4C, 0xC1147763,
+    0x90A0FD4E, 0xC0E8B648,
+    0x90B9CC7C, 0xC0BCFEE7,
+    0x90D2ACD3, 0xC0915147,
+    0x90EB9E50, 0xC065AD70,
+    0x9104A0ED, 0xC03A1368,
+    0x911DB4A8, 0xC00E8335,
+    0x9136D97D, 0xBFE2FCDF,
+    0x91500F67, 0xBFB7806C,
+    0x91695663, 0xBF8C0DE2,
+    0x9182AE6C, 0xBF60A54A,
+    0x919C1780, 0xBF3546A8,
+    0x91B5919A, 0xBF09F204,
+    0x91CF1CB6, 0xBEDEA765,
+    0x91E8B8D0, 0xBEB366D1,
+    0x920265E4, 0xBE88304F,
+    0x921C23EE, 0xBE5D03E5,
+    0x9235F2EB, 0xBE31E19B,
+    0x924FD2D6, 0xBE06C977,
+    0x9269C3AC, 0xBDDBBB7F,
+    0x9283C567, 0xBDB0B7BA,
+    0x929DD805, 0xBD85BE2F,
+    0x92B7FB82, 0xBD5ACEE5,
+    0x92D22FD8, 0xBD2FE9E1,
+    0x92EC7505, 0xBD050F2C,
+    0x9306CB04, 0xBCDA3ECA,
+    0x932131D1, 0xBCAF78C3,
+    0x933BA968, 0xBC84BD1E,
+    0x935631C5, 0xBC5A0BE1,
+    0x9370CAE4, 0xBC2F6513,
+    0x938B74C0, 0xBC04C8BA,
+    0x93A62F56, 0xBBDA36DC,
+    0x93C0FAA2, 0xBBAFAF81,
+    0x93DBD69F, 0xBB8532AF,
+    0x93F6C34A, 0xBB5AC06C,
+    0x9411C09D, 0xBB3058C0,
+    0x942CCE95, 0xBB05FBB0,
+    0x9447ED2F, 0xBADBA943,
+    0x94631C64, 0xBAB1617F,
+    0x947E5C32, 0xBA87246C,
+    0x9499AC95, 0xBA5CF210,
+    0x94B50D87, 0xBA32CA70,
+    0x94D07F05, 0xBA08AD94,
+    0x94EC010B, 0xB9DE9B83,
+    0x95079393, 0xB9B49442,
+    0x9523369B, 0xB98A97D8,
+    0x953EEA1E, 0xB960A64B,
+    0x955AAE17, 0xB936BFA3,
+    0x95768282, 0xB90CE3E6,
+    0x9592675B, 0xB8E31319,
+    0x95AE5C9E, 0xB8B94D44,
+    0x95CA6246, 0xB88F926C,
+    0x95E6784F, 0xB865E299,
+    0x96029EB5, 0xB83C3DD1,
+    0x961ED573, 0xB812A419,
+    0x963B1C85, 0xB7E9157A,
+    0x965773E7, 0xB7BF91F8,
+    0x9673DB94, 0xB796199B,
+    0x96905387, 0xB76CAC68,
+    0x96ACDBBD, 0xB7434A67,
+    0x96C97431, 0xB719F39D,
+    0x96E61CDF, 0xB6F0A811,
+    0x9702D5C2, 0xB6C767CA,
+    0x971F9ED6, 0xB69E32CD,
+    0x973C7816, 0xB6750921,
+    0x9759617E, 0xB64BEACC,
+    0x97765B0A, 0xB622D7D5,
+    0x979364B5, 0xB5F9D042,
+    0x97B07E7A, 0xB5D0D41A,
+    0x97CDA855, 0xB5A7E362,
+    0x97EAE241, 0xB57EFE21,
+    0x98082C3B, 0xB556245E,
+    0x9825863D, 0xB52D561E,
+    0x9842F043, 0xB5049368,
+    0x98606A48, 0xB4DBDC42,
+    0x987DF449, 0xB4B330B2,
+    0x989B8E3F, 0xB48A90C0,
+    0x98B93828, 0xB461FC70,
+    0x98D6F1FE, 0xB43973C9,
+    0x98F4BBBC, 0xB410F6D2,
+    0x9912955E, 0xB3E88591,
+    0x99307EE0, 0xB3C0200C,
+    0x994E783C, 0xB397C649,
+    0x996C816F, 0xB36F784E,
+    0x998A9A73, 0xB3473622,
+    0x99A8C344, 0xB31EFFCB,
+    0x99C6FBDE, 0xB2F6D54F,
+    0x99E5443A, 0xB2CEB6B5,
+    0x9A039C56, 0xB2A6A401,
+    0x9A22042C, 0xB27E9D3B,
+    0x9A407BB8, 0xB256A26A,
+    0x9A5F02F5, 0xB22EB392,
+    0x9A7D99DD, 0xB206D0BA,
+    0x9A9C406D, 0xB1DEF9E8,
+    0x9ABAF6A0, 0xB1B72F23,
+    0x9AD9BC71, 0xB18F7070,
+    0x9AF891DB, 0xB167BDD6,
+    0x9B1776D9, 0xB140175B,
+    0x9B366B67, 0xB1187D05,
+    0x9B556F80, 0xB0F0EEDA,
+    0x9B748320, 0xB0C96CDF,
+    0x9B93A640, 0xB0A1F71C,
+    0x9BB2D8DD, 0xB07A8D97,
+    0x9BD21AF2, 0xB0533055,
+    0x9BF16C7A, 0xB02BDF5C,
+    0x9C10CD70, 0xB0049AB2,
+    0x9C303DCF, 0xAFDD625F,
+    0x9C4FBD92, 0xAFB63667,
+    0x9C6F4CB5, 0xAF8F16D0,
+    0x9C8EEB33, 0xAF6803A1,
+    0x9CAE9907, 0xAF40FCE0,
+    0x9CCE562B, 0xAF1A0293,
+    0x9CEE229C, 0xAEF314BF,
+    0x9D0DFE53, 0xAECC336B,
+    0x9D2DE94D, 0xAEA55E9D,
+    0x9D4DE384, 0xAE7E965B,
+    0x9D6DECF4, 0xAE57DAAA,
+    0x9D8E0596, 0xAE312B91,
+    0x9DAE2D68, 0xAE0A8916,
+    0x9DCE6462, 0xADE3F33E,
+    0x9DEEAA82, 0xADBD6A10,
+    0x9E0EFFC1, 0xAD96ED91,
+    0x9E2F641A, 0xAD707DC8,
+    0x9E4FD789, 0xAD4A1ABA,
+    0x9E705A09, 0xAD23C46D,
+    0x9E90EB94, 0xACFD7AE8,
+    0x9EB18C26, 0xACD73E30,
+    0x9ED23BB9, 0xACB10E4A,
+    0x9EF2FA48, 0xAC8AEB3E,
+    0x9F13C7D0, 0xAC64D510,
+    0x9F34A449, 0xAC3ECBC7,
+    0x9F558FB0, 0xAC18CF68,
+    0x9F7689FF, 0xABF2DFFA,
+    0x9F979331, 0xABCCFD82,
+    0x9FB8AB41, 0xABA72806,
+    0x9FD9D22A, 0xAB815F8C,
+    0x9FFB07E7, 0xAB5BA41A,
+    0xA01C4C72, 0xAB35F5B5,
+    0xA03D9FC7, 0xAB105464,
+    0xA05F01E1, 0xAAEAC02B,
+    0xA08072BA, 0xAAC53912,
+    0xA0A1F24C, 0xAA9FBF1D,
+    0xA0C38094, 0xAA7A5253,
+    0xA0E51D8C, 0xAA54F2B9,
+    0xA106C92E, 0xAA2FA055,
+    0xA1288376, 0xAA0A5B2D,
+    0xA14A4C5E, 0xA9E52347,
+    0xA16C23E1, 0xA9BFF8A8,
+    0xA18E09F9, 0xA99ADB56,
+    0xA1AFFEA2, 0xA975CB56,
+    0xA1D201D7, 0xA950C8AF,
+    0xA1F41391, 0xA92BD366,
+    0xA21633CD, 0xA906EB81,
+    0xA2386283, 0xA8E21106,
+    0xA25A9FB1, 0xA8BD43FA,
+    0xA27CEB4F, 0xA8988463,
+    0xA29F4559, 0xA873D246,
+    0xA2C1ADC9, 0xA84F2DA9,
+    0xA2E4249A, 0xA82A9693,
+    0xA306A9C7, 0xA8060D08,
+    0xA3293D4B, 0xA7E1910E,
+    0xA34BDF20, 0xA7BD22AB,
+    0xA36E8F40, 0xA798C1E4,
+    0xA3914DA7, 0xA7746EC0,
+    0xA3B41A4F, 0xA7502943,
+    0xA3D6F533, 0xA72BF173,
+    0xA3F9DE4D, 0xA707C756,
+    0xA41CD598, 0xA6E3AAF2,
+    0xA43FDB0F, 0xA6BF9C4B,
+    0xA462EEAC, 0xA69B9B68,
+    0xA4861069, 0xA677A84E,
+    0xA4A94042, 0xA653C302,
+    0xA4CC7E31, 0xA62FEB8B,
+    0xA4EFCA31, 0xA60C21ED,
+    0xA513243B, 0xA5E8662F,
+    0xA5368C4B, 0xA5C4B855,
+    0xA55A025B, 0xA5A11865,
+    0xA57D8666, 0xA57D8666,
+    0xA5A11865, 0xA55A025B,
+    0xA5C4B855, 0xA5368C4B,
+    0xA5E8662F, 0xA513243B,
+    0xA60C21ED, 0xA4EFCA31,
+    0xA62FEB8B, 0xA4CC7E31,
+    0xA653C302, 0xA4A94042,
+    0xA677A84E, 0xA4861069,
+    0xA69B9B68, 0xA462EEAC,
+    0xA6BF9C4B, 0xA43FDB0F,
+    0xA6E3AAF2, 0xA41CD598,
+    0xA707C756, 0xA3F9DE4D,
+    0xA72BF173, 0xA3D6F533,
+    0xA7502943, 0xA3B41A4F,
+    0xA7746EC0, 0xA3914DA7,
+    0xA798C1E4, 0xA36E8F40,
+    0xA7BD22AB, 0xA34BDF20,
+    0xA7E1910E, 0xA3293D4B,
+    0xA8060D08, 0xA306A9C7,
+    0xA82A9693, 0xA2E4249A,
+    0xA84F2DA9, 0xA2C1ADC9,
+    0xA873D246, 0xA29F4559,
+    0xA8988463, 0xA27CEB4F,
+    0xA8BD43FA, 0xA25A9FB1,
+    0xA8E21106, 0xA2386283,
+    0xA906EB81, 0xA21633CD,
+    0xA92BD366, 0xA1F41391,
+    0xA950C8AF, 0xA1D201D7,
+    0xA975CB56, 0xA1AFFEA2,
+    0xA99ADB56, 0xA18E09F9,
+    0xA9BFF8A8, 0xA16C23E1,
+    0xA9E52347, 0xA14A4C5E,
+    0xAA0A5B2D, 0xA1288376,
+    0xAA2FA055, 0xA106C92E,
+    0xAA54F2B9, 0xA0E51D8C,
+    0xAA7A5253, 0xA0C38094,
+    0xAA9FBF1D, 0xA0A1F24C,
+    0xAAC53912, 0xA08072BA,
+    0xAAEAC02B, 0xA05F01E1,
+    0xAB105464, 0xA03D9FC7,
+    0xAB35F5B5, 0xA01C4C72,
+    0xAB5BA41A, 0x9FFB07E7,
+    0xAB815F8C, 0x9FD9D22A,
+    0xABA72806, 0x9FB8AB41,
+    0xABCCFD82, 0x9F979331,
+    0xABF2DFFA, 0x9F7689FF,
+    0xAC18CF68, 0x9F558FB0,
+    0xAC3ECBC7, 0x9F34A449,
+    0xAC64D510, 0x9F13C7D0,
+    0xAC8AEB3E, 0x9EF2FA48,
+    0xACB10E4A, 0x9ED23BB9,
+    0xACD73E30, 0x9EB18C26,
+    0xACFD7AE8, 0x9E90EB94,
+    0xAD23C46D, 0x9E705A09,
+    0xAD4A1ABA, 0x9E4FD789,
+    0xAD707DC8, 0x9E2F641A,
+    0xAD96ED91, 0x9E0EFFC1,
+    0xADBD6A10, 0x9DEEAA82,
+    0xADE3F33E, 0x9DCE6462,
+    0xAE0A8916, 0x9DAE2D68,
+    0xAE312B91, 0x9D8E0596,
+    0xAE57DAAA, 0x9D6DECF4,
+    0xAE7E965B, 0x9D4DE384,
+    0xAEA55E9D, 0x9D2DE94D,
+    0xAECC336B, 0x9D0DFE53,
+    0xAEF314BF, 0x9CEE229C,
+    0xAF1A0293, 0x9CCE562B,
+    0xAF40FCE0, 0x9CAE9907,
+    0xAF6803A1, 0x9C8EEB33,
+    0xAF8F16D0, 0x9C6F4CB5,
+    0xAFB63667, 0x9C4FBD92,
+    0xAFDD625F, 0x9C303DCF,
+    0xB0049AB2, 0x9C10CD70,
+    0xB02BDF5C, 0x9BF16C7A,
+    0xB0533055, 0x9BD21AF2,
+    0xB07A8D97, 0x9BB2D8DD,
+    0xB0A1F71C, 0x9B93A640,
+    0xB0C96CDF, 0x9B748320,
+    0xB0F0EEDA, 0x9B556F80,
+    0xB1187D05, 0x9B366B67,
+    0xB140175B, 0x9B1776D9,
+    0xB167BDD6, 0x9AF891DB,
+    0xB18F7070, 0x9AD9BC71,
+    0xB1B72F23, 0x9ABAF6A0,
+    0xB1DEF9E8, 0x9A9C406D,
+    0xB206D0BA, 0x9A7D99DD,
+    0xB22EB392, 0x9A5F02F5,
+    0xB256A26A, 0x9A407BB8,
+    0xB27E9D3B, 0x9A22042C,
+    0xB2A6A401, 0x9A039C56,
+    0xB2CEB6B5, 0x99E5443A,
+    0xB2F6D54F, 0x99C6FBDE,
+    0xB31EFFCB, 0x99A8C344,
+    0xB3473622, 0x998A9A73,
+    0xB36F784E, 0x996C816F,
+    0xB397C649, 0x994E783C,
+    0xB3C0200C, 0x99307EE0,
+    0xB3E88591, 0x9912955E,
+    0xB410F6D2, 0x98F4BBBC,
+    0xB43973C9, 0x98D6F1FE,
+    0xB461FC70, 0x98B93828,
+    0xB48A90C0, 0x989B8E3F,
+    0xB4B330B2, 0x987DF449,
+    0xB4DBDC42, 0x98606A48,
+    0xB5049368, 0x9842F043,
+    0xB52D561E, 0x9825863D,
+    0xB556245E, 0x98082C3B,
+    0xB57EFE21, 0x97EAE241,
+    0xB5A7E362, 0x97CDA855,
+    0xB5D0D41A, 0x97B07E7A,
+    0xB5F9D042, 0x979364B5,
+    0xB622D7D5, 0x97765B0A,
+    0xB64BEACC, 0x9759617E,
+    0xB6750921, 0x973C7816,
+    0xB69E32CD, 0x971F9ED6,
+    0xB6C767CA, 0x9702D5C2,
+    0xB6F0A811, 0x96E61CDF,
+    0xB719F39D, 0x96C97431,
+    0xB7434A67, 0x96ACDBBD,
+    0xB76CAC68, 0x96905387,
+    0xB796199B, 0x9673DB94,
+    0xB7BF91F8, 0x965773E7,
+    0xB7E9157A, 0x963B1C85,
+    0xB812A419, 0x961ED573,
+    0xB83C3DD1, 0x96029EB5,
+    0xB865E299, 0x95E6784F,
+    0xB88F926C, 0x95CA6246,
+    0xB8B94D44, 0x95AE5C9E,
+    0xB8E31319, 0x9592675B,
+    0xB90CE3E6, 0x95768282,
+    0xB936BFA3, 0x955AAE17,
+    0xB960A64B, 0x953EEA1E,
+    0xB98A97D8, 0x9523369B,
+    0xB9B49442, 0x95079393,
+    0xB9DE9B83, 0x94EC010B,
+    0xBA08AD94, 0x94D07F05,
+    0xBA32CA70, 0x94B50D87,
+    0xBA5CF210, 0x9499AC95,
+    0xBA87246C, 0x947E5C32,
+    0xBAB1617F, 0x94631C64,
+    0xBADBA943, 0x9447ED2F,
+    0xBB05FBB0, 0x942CCE95,
+    0xBB3058C0, 0x9411C09D,
+    0xBB5AC06C, 0x93F6C34A,
+    0xBB8532AF, 0x93DBD69F,
+    0xBBAFAF81, 0x93C0FAA2,
+    0xBBDA36DC, 0x93A62F56,
+    0xBC04C8BA, 0x938B74C0,
+    0xBC2F6513, 0x9370CAE4,
+    0xBC5A0BE1, 0x935631C5,
+    0xBC84BD1E, 0x933BA968,
+    0xBCAF78C3, 0x932131D1,
+    0xBCDA3ECA, 0x9306CB04,
+    0xBD050F2C, 0x92EC7505,
+    0xBD2FE9E1, 0x92D22FD8,
+    0xBD5ACEE5, 0x92B7FB82,
+    0xBD85BE2F, 0x929DD805,
+    0xBDB0B7BA, 0x9283C567,
+    0xBDDBBB7F, 0x9269C3AC,
+    0xBE06C977, 0x924FD2D6,
+    0xBE31E19B, 0x9235F2EB,
+    0xBE5D03E5, 0x921C23EE,
+    0xBE88304F, 0x920265E4,
+    0xBEB366D1, 0x91E8B8D0,
+    0xBEDEA765, 0x91CF1CB6,
+    0xBF09F204, 0x91B5919A,
+    0xBF3546A8, 0x919C1780,
+    0xBF60A54A, 0x9182AE6C,
+    0xBF8C0DE2, 0x91695663,
+    0xBFB7806C, 0x91500F67,
+    0xBFE2FCDF, 0x9136D97D,
+    0xC00E8335, 0x911DB4A8,
+    0xC03A1368, 0x9104A0ED,
+    0xC065AD70, 0x90EB9E50,
+    0xC0915147, 0x90D2ACD3,
+    0xC0BCFEE7, 0x90B9CC7C,
+    0xC0E8B648, 0x90A0FD4E,
+    0xC1147763, 0x90883F4C,
+    0xC1404233, 0x906F927B,
+    0xC16C16B0, 0x9056F6DF,
+    0xC197F4D3, 0x903E6C7A,
+    0xC1C3DC96, 0x9025F352,
+    0xC1EFCDF2, 0x900D8B69,
+    0xC21BC8E0, 0x8FF534C4,
+    0xC247CD5A, 0x8FDCEF66,
+    0xC273DB58, 0x8FC4BB53,
+    0xC29FF2D4, 0x8FAC988E,
+    0xC2CC13C7, 0x8F94871D,
+    0xC2F83E2A, 0x8F7C8701,
+    0xC32471F6, 0x8F64983F,
+    0xC350AF25, 0x8F4CBADB,
+    0xC37CF5B0, 0x8F34EED8,
+    0xC3A9458F, 0x8F1D343A,
+    0xC3D59EBD, 0x8F058B04,
+    0xC4020132, 0x8EEDF33B,
+    0xC42E6CE8, 0x8ED66CE1,
+    0xC45AE1D7, 0x8EBEF7FB,
+    0xC4875FF8, 0x8EA7948C,
+    0xC4B3E746, 0x8E904298,
+    0xC4E077B8, 0x8E790222,
+    0xC50D1148, 0x8E61D32D,
+    0xC539B3F0, 0x8E4AB5BF,
+    0xC5665FA8, 0x8E33A9D9,
+    0xC593146A, 0x8E1CAF80,
+    0xC5BFD22E, 0x8E05C6B7,
+    0xC5EC98ED, 0x8DEEEF82,
+    0xC61968A2, 0x8DD829E4,
+    0xC6464144, 0x8DC175E0,
+    0xC67322CD, 0x8DAAD37B,
+    0xC6A00D36, 0x8D9442B7,
+    0xC6CD0079, 0x8D7DC399,
+    0xC6F9FC8D, 0x8D675623,
+    0xC727016C, 0x8D50FA59,
+    0xC7540F10, 0x8D3AB03F,
+    0xC7812571, 0x8D2477D8,
+    0xC7AE4489, 0x8D0E5127,
+    0xC7DB6C50, 0x8CF83C30,
+    0xC8089CBF, 0x8CE238F6,
+    0xC835D5D0, 0x8CCC477D,
+    0xC863177B, 0x8CB667C7,
+    0xC89061BA, 0x8CA099D9,
+    0xC8BDB485, 0x8C8ADDB6,
+    0xC8EB0FD6, 0x8C753361,
+    0xC91873A5, 0x8C5F9ADD,
+    0xC945DFEC, 0x8C4A142F,
+    0xC97354A3, 0x8C349F58,
+    0xC9A0D1C4, 0x8C1F3C5C,
+    0xC9CE5748, 0x8C09EB40,
+    0xC9FBE527, 0x8BF4AC05,
+    0xCA297B5A, 0x8BDF7EAF,
+    0xCA5719DB, 0x8BCA6342,
+    0xCA84C0A2, 0x8BB559C1,
+    0xCAB26FA9, 0x8BA0622F,
+    0xCAE026E8, 0x8B8B7C8F,
+    0xCB0DE658, 0x8B76A8E4,
+    0xCB3BADF2, 0x8B61E732,
+    0xCB697DB0, 0x8B4D377C,
+    0xCB975589, 0x8B3899C5,
+    0xCBC53578, 0x8B240E10,
+    0xCBF31D75, 0x8B0F9461,
+    0xCC210D78, 0x8AFB2CBA,
+    0xCC4F057B, 0x8AE6D71F,
+    0xCC7D0577, 0x8AD29393,
+    0xCCAB0D65, 0x8ABE6219,
+    0xCCD91D3D, 0x8AAA42B4,
+    0xCD0734F8, 0x8A963567,
+    0xCD355490, 0x8A823A35,
+    0xCD637BFD, 0x8A6E5122,
+    0xCD91AB38, 0x8A5A7A30,
+    0xCDBFE23A, 0x8A46B563,
+    0xCDEE20FC, 0x8A3302BD,
+    0xCE1C6776, 0x8A1F6242,
+    0xCE4AB5A2, 0x8A0BD3F5,
+    0xCE790B78, 0x89F857D8,
+    0xCEA768F2, 0x89E4EDEE,
+    0xCED5CE08, 0x89D1963C,
+    0xCF043AB2, 0x89BE50C3,
+    0xCF32AEEB, 0x89AB1D86,
+    0xCF612AAA, 0x8997FC89,
+    0xCF8FADE8, 0x8984EDCF,
+    0xCFBE389F, 0x8971F15A,
+    0xCFECCAC7, 0x895F072D,
+    0xD01B6459, 0x894C2F4C,
+    0xD04A054D, 0x893969B9,
+    0xD078AD9D, 0x8926B677,
+    0xD0A75D42, 0x89141589,
+    0xD0D61433, 0x890186F1,
+    0xD104D26B, 0x88EF0AB4,
+    0xD13397E1, 0x88DCA0D3,
+    0xD162648F, 0x88CA4951,
+    0xD191386D, 0x88B80431,
+    0xD1C01374, 0x88A5D177,
+    0xD1EEF59E, 0x8893B124,
+    0xD21DDEE1, 0x8881A33C,
+    0xD24CCF38, 0x886FA7C2,
+    0xD27BC69C, 0x885DBEB7,
+    0xD2AAC504, 0x884BE820,
+    0xD2D9CA6A, 0x883A23FE,
+    0xD308D6C6, 0x88287255,
+    0xD337EA12, 0x8816D327,
+    0xD3670445, 0x88054677,
+    0xD3962559, 0x87F3CC47,
+    0xD3C54D46, 0x87E2649B,
+    0xD3F47C06, 0x87D10F75,
+    0xD423B190, 0x87BFCCD7,
+    0xD452EDDE, 0x87AE9CC5,
+    0xD48230E8, 0x879D7F40,
+    0xD4B17AA7, 0x878C744C,
+    0xD4E0CB14, 0x877B7BEC,
+    0xD5102227, 0x876A9621,
+    0xD53F7FDA, 0x8759C2EF,
+    0xD56EE424, 0x87490257,
+    0xD59E4EFE, 0x8738545E,
+    0xD5CDC062, 0x8727B904,
+    0xD5FD3847, 0x8717304E,
+    0xD62CB6A7, 0x8706BA3C,
+    0xD65C3B7B, 0x86F656D3,
+    0xD68BC6BA, 0x86E60614,
+    0xD6BB585D, 0x86D5C802,
+    0xD6EAF05E, 0x86C59C9F,
+    0xD71A8EB5, 0x86B583EE,
+    0xD74A335A, 0x86A57DF1,
+    0xD779DE46, 0x86958AAB,
+    0xD7A98F73, 0x8685AA1F,
+    0xD7D946D7, 0x8675DC4E,
+    0xD809046D, 0x8666213C,
+    0xD838C82D, 0x865678EA,
+    0xD868920F, 0x8646E35B,
+    0xD898620C, 0x86376092,
+    0xD8C8381C, 0x8627F090,
+    0xD8F81439, 0x86189359,
+    0xD927F65B, 0x860948EE,
+    0xD957DE7A, 0x85FA1152,
+    0xD987CC8F, 0x85EAEC88,
+    0xD9B7C093, 0x85DBDA91,
+    0xD9E7BA7E, 0x85CCDB70,
+    0xDA17BA4A, 0x85BDEF27,
+    0xDA47BFED, 0x85AF15B9,
+    0xDA77CB62, 0x85A04F28,
+    0xDAA7DCA1, 0x85919B75,
+    0xDAD7F3A2, 0x8582FAA4,
+    0xDB08105E, 0x85746CB7,
+    0xDB3832CD, 0x8565F1B0,
+    0xDB685AE8, 0x85578991,
+    0xDB9888A8, 0x8549345C,
+    0xDBC8BC05, 0x853AF214,
+    0xDBF8F4F8, 0x852CC2BA,
+    0xDC293379, 0x851EA652,
+    0xDC597781, 0x85109CDC,
+    0xDC89C108, 0x8502A65C,
+    0xDCBA1008, 0x84F4C2D3,
+    0xDCEA6478, 0x84E6F244,
+    0xDD1ABE51, 0x84D934B0,
+    0xDD4B1D8B, 0x84CB8A1B,
+    0xDD7B8220, 0x84BDF285,
+    0xDDABEC07, 0x84B06DF1,
+    0xDDDC5B3A, 0x84A2FC62,
+    0xDE0CCFB1, 0x84959DD9,
+    0xDE3D4963, 0x84885257,
+    0xDE6DC84B, 0x847B19E1,
+    0xDE9E4C60, 0x846DF476,
+    0xDECED59B, 0x8460E21A,
+    0xDEFF63F4, 0x8453E2CE,
+    0xDF2FF764, 0x8446F695,
+    0xDF608FE3, 0x843A1D70,
+    0xDF912D6A, 0x842D5761,
+    0xDFC1CFF2, 0x8420A46B,
+    0xDFF27773, 0x8414048F,
+    0xE02323E5, 0x840777CF,
+    0xE053D541, 0x83FAFE2E,
+    0xE0848B7F, 0x83EE97AC,
+    0xE0B54698, 0x83E2444D,
+    0xE0E60684, 0x83D60411,
+    0xE116CB3D, 0x83C9D6FB,
+    0xE14794B9, 0x83BDBD0D,
+    0xE17862F3, 0x83B1B649,
+    0xE1A935E1, 0x83A5C2B0,
+    0xE1DA0D7E, 0x8399E244,
+    0xE20AE9C1, 0x838E1507,
+    0xE23BCAA2, 0x83825AFB,
+    0xE26CB01A, 0x8376B422,
+    0xE29D9A22, 0x836B207D,
+    0xE2CE88B2, 0x835FA00E,
+    0xE2FF7BC3, 0x835432D8,
+    0xE330734C, 0x8348D8DB,
+    0xE3616F47, 0x833D921A,
+    0xE3926FAC, 0x83325E97,
+    0xE3C37473, 0x83273E52,
+    0xE3F47D95, 0x831C314E,
+    0xE4258B0A, 0x8311378C,
+    0xE4569CCB, 0x8306510F,
+    0xE487B2CF, 0x82FB7DD8,
+    0xE4B8CD10, 0x82F0BDE8,
+    0xE4E9EB86, 0x82E61141,
+    0xE51B0E2A, 0x82DB77E5,
+    0xE54C34F3, 0x82D0F1D5,
+    0xE57D5FDA, 0x82C67F13,
+    0xE5AE8ED8, 0x82BC1FA1,
+    0xE5DFC1E4, 0x82B1D381,
+    0xE610F8F9, 0x82A79AB3,
+    0xE642340D, 0x829D753A,
+    0xE6737319, 0x82936316,
+    0xE6A4B616, 0x8289644A,
+    0xE6D5FCFC, 0x827F78D8,
+    0xE70747C3, 0x8275A0C0,
+    0xE7389664, 0x826BDC04,
+    0xE769E8D8, 0x82622AA5,
+    0xE79B3F16, 0x82588CA6,
+    0xE7CC9917, 0x824F0208,
+    0xE7FDF6D3, 0x82458ACB,
+    0xE82F5844, 0x823C26F2,
+    0xE860BD60, 0x8232D67E,
+    0xE8922621, 0x82299971,
+    0xE8C3927F, 0x82206FCB,
+    0xE8F50273, 0x8217598F,
+    0xE92675F4, 0x820E56BE,
+    0xE957ECFB, 0x82056758,
+    0xE9896780, 0x81FC8B60,
+    0xE9BAE57C, 0x81F3C2D7,
+    0xE9EC66E8, 0x81EB0DBD,
+    0xEA1DEBBB, 0x81E26C16,
+    0xEA4F73EE, 0x81D9DDE1,
+    0xEA80FF79, 0x81D16320,
+    0xEAB28E55, 0x81C8FBD5,
+    0xEAE4207A, 0x81C0A801,
+    0xEB15B5E0, 0x81B867A4,
+    0xEB474E80, 0x81B03AC1,
+    0xEB78EA52, 0x81A82159,
+    0xEBAA894E, 0x81A01B6C,
+    0xEBDC2B6D, 0x819828FD,
+    0xEC0DD0A8, 0x81904A0C,
+    0xEC3F78F5, 0x81887E9A,
+    0xEC71244F, 0x8180C6A9,
+    0xECA2D2AC, 0x8179223A,
+    0xECD48406, 0x8171914E,
+    0xED063855, 0x816A13E6,
+    0xED37EF91, 0x8162AA03,
+    0xED69A9B2, 0x815B53A8,
+    0xED9B66B2, 0x815410D3,
+    0xEDCD2687, 0x814CE188,
+    0xEDFEE92B, 0x8145C5C6,
+    0xEE30AE95, 0x813EBD90,
+    0xEE6276BF, 0x8137C8E6,
+    0xEE9441A0, 0x8130E7C8,
+    0xEEC60F31, 0x812A1A39,
+    0xEEF7DF6A, 0x81236039,
+    0xEF29B243, 0x811CB9CA,
+    0xEF5B87B5, 0x811626EC,
+    0xEF8D5FB8, 0x810FA7A0,
+    0xEFBF3A44, 0x81093BE8,
+    0xEFF11752, 0x8102E3C3,
+    0xF022F6DA, 0x80FC9F35,
+    0xF054D8D4, 0x80F66E3C,
+    0xF086BD39, 0x80F050DB,
+    0xF0B8A401, 0x80EA4712,
+    0xF0EA8D23, 0x80E450E2,
+    0xF11C789A, 0x80DE6E4C,
+    0xF14E665C, 0x80D89F51,
+    0xF1805662, 0x80D2E3F1,
+    0xF1B248A5, 0x80CD3C2F,
+    0xF1E43D1C, 0x80C7A80A,
+    0xF21633C0, 0x80C22783,
+    0xF2482C89, 0x80BCBA9C,
+    0xF27A2770, 0x80B76155,
+    0xF2AC246D, 0x80B21BAF,
+    0xF2DE2378, 0x80ACE9AB,
+    0xF310248A, 0x80A7CB49,
+    0xF342279A, 0x80A2C08B,
+    0xF3742CA1, 0x809DC970,
+    0xF3A63398, 0x8098E5FB,
+    0xF3D83C76, 0x8094162B,
+    0xF40A4734, 0x808F5A02,
+    0xF43C53CA, 0x808AB180,
+    0xF46E6231, 0x80861CA5,
+    0xF4A07260, 0x80819B74,
+    0xF4D28451, 0x807D2DEB,
+    0xF50497FA, 0x8078D40D,
+    0xF536AD55, 0x80748DD9,
+    0xF568C45A, 0x80705B50,
+    0xF59ADD01, 0x806C3C73,
+    0xF5CCF743, 0x80683143,
+    0xF5FF1317, 0x806439C0,
+    0xF6313076, 0x806055EA,
+    0xF6634F58, 0x805C85C3,
+    0xF6956FB6, 0x8058C94C,
+    0xF6C79188, 0x80552083,
+    0xF6F9B4C5, 0x80518B6B,
+    0xF72BD967, 0x804E0A03,
+    0xF75DFF65, 0x804A9C4D,
+    0xF79026B8, 0x80474248,
+    0xF7C24F58, 0x8043FBF6,
+    0xF7F4793E, 0x8040C956,
+    0xF826A461, 0x803DAA69,
+    0xF858D0BA, 0x803A9F31,
+    0xF88AFE41, 0x8037A7AC,
+    0xF8BD2CEF, 0x8034C3DC,
+    0xF8EF5CBB, 0x8031F3C1,
+    0xF9218D9E, 0x802F375C,
+    0xF953BF90, 0x802C8EAD,
+    0xF985F28A, 0x8029F9B4,
+    0xF9B82683, 0x80277872,
+    0xF9EA5B75, 0x80250AE7,
+    0xFA1C9156, 0x8022B113,
+    0xFA4EC820, 0x80206AF8,
+    0xFA80FFCB, 0x801E3894,
+    0xFAB3384F, 0x801C19E9,
+    0xFAE571A4, 0x801A0EF7,
+    0xFB17ABC2, 0x801817BF,
+    0xFB49E6A2, 0x80163440,
+    0xFB7C223C, 0x8014647A,
+    0xFBAE5E89, 0x8012A86F,
+    0xFBE09B80, 0x8011001E,
+    0xFC12D919, 0x800F6B88,
+    0xFC45174E, 0x800DEAAC,
+    0xFC775616, 0x800C7D8C,
+    0xFCA99569, 0x800B2427,
+    0xFCDBD541, 0x8009DE7D,
+    0xFD0E1594, 0x8008AC90,
+    0xFD40565B, 0x80078E5E,
+    0xFD72978F, 0x800683E8,
+    0xFDA4D928, 0x80058D2E,
+    0xFDD71B1E, 0x8004AA31,
+    0xFE095D69, 0x8003DAF0,
+    0xFE3BA001, 0x80031F6C,
+    0xFE6DE2E0, 0x800277A5,
+    0xFEA025FC, 0x8001E39B,
+    0xFED2694F, 0x8001634D,
+    0xFF04ACD0, 0x8000F6BD,
+    0xFF36F078, 0x80009DE9,
+    0xFF69343E, 0x800058D3,
+    0xFF9B781D, 0x8000277A,
+    0xFFCDBC0A, 0x800009DE
+};
+
+
+
+/*    
+* @brief  q15 Twiddle factors Table    
+*/
+
+
+/**    
+* \par   
+* Example code for q15 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefq15[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefq15[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 16 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved: cos at even indices, sin at odd indices.
+* \par    
+* Convert floating point to q15 (fixed point 1.15):
+*    round(twiddleCoefq15(i) * pow(2, 15)), saturated to the q15 range (1.0 is stored as 0x7FFF)
+*    
+*/
+const q15_t twiddleCoef_16_q15[24] = {
+    0x7FFF, 0x0000,
+    0x7641, 0x30FB,
+    0x5A82, 0x5A82,
+    0x30FB, 0x7641,
+    0x0000, 0x7FFF,
+    0xCF04, 0x7641,
+    0xA57D, 0x5A82,
+    0x89BE, 0x30FB,
+    0x8000, 0x0000,
+    0x89BE, 0xCF04,
+    0xA57D, 0xA57D,
+    0xCF04, 0x89BE
+};
+
+/**    
+* \par   
+* Example code for q15 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefq15[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefq15[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 32 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved: cos at even indices, sin at odd indices.
+* \par    
+* Convert floating point to q15 (fixed point 1.15):
+*    round(twiddleCoefq15(i) * pow(2, 15)), saturated to the q15 range (1.0 is stored as 0x7FFF)
+*    
+*/
+const q15_t twiddleCoef_32_q15[48] = {
+    0x7FFF, 0x0000,
+    0x7D8A, 0x18F8,
+    0x7641, 0x30FB,
+    0x6A6D, 0x471C,
+    0x5A82, 0x5A82,
+    0x471C, 0x6A6D,
+    0x30FB, 0x7641,
+    0x18F8, 0x7D8A,
+    0x0000, 0x7FFF,
+    0xE707, 0x7D8A,
+    0xCF04, 0x7641,
+    0xB8E3, 0x6A6D,
+    0xA57D, 0x5A82,
+    0x9592, 0x471C,
+    0x89BE, 0x30FB,
+    0x8275, 0x18F8,
+    0x8000, 0x0000,
+    0x8275, 0xE707,
+    0x89BE, 0xCF04,
+    0x9592, 0xB8E3,
+    0xA57D, 0xA57D,
+    0xB8E3, 0x9592,
+    0xCF04, 0x89BE,
+    0xE707, 0x8275
+};
+
+/**    
+* \par   
+* Example code for q15 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefq15[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefq15[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 64 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved: cos at even indices, sin at odd indices.
+* \par    
+* Convert Floating point to q15(Fixed point 1.15):    
+*	round(twiddleCoefq15(i) * pow(2, 15))    
+*    
+*/
+const q15_t twiddleCoef_64_q15[96] = {
+    0x7FFF, 0x0000,
+    0x7F62, 0x0C8B,
+    0x7D8A, 0x18F8,
+    0x7A7D, 0x2528,
+    0x7641, 0x30FB,
+    0x70E2, 0x3C56,
+    0x6A6D, 0x471C,
+    0x62F2, 0x5133,
+    0x5A82, 0x5A82,
+    0x5133, 0x62F2,
+    0x471C, 0x6A6D,
+    0x3C56, 0x70E2,
+    0x30FB, 0x7641,
+    0x2528, 0x7A7D,
+    0x18F8, 0x7D8A,
+    0x0C8B, 0x7F62,
+    0x0000, 0x7FFF,
+    0xF374, 0x7F62,
+    0xE707, 0x7D8A,
+    0xDAD7, 0x7A7D,
+    0xCF04, 0x7641,
+    0xC3A9, 0x70E2,
+    0xB8E3, 0x6A6D,
+    0xAECC, 0x62F2,
+    0xA57D, 0x5A82,
+    0x9D0D, 0x5133,
+    0x9592, 0x471C,
+    0x8F1D, 0x3C56,
+    0x89BE, 0x30FB,
+    0x8582, 0x2528,
+    0x8275, 0x18F8,
+    0x809D, 0x0C8B,
+    0x8000, 0x0000,
+    0x809D, 0xF374,
+    0x8275, 0xE707,
+    0x8582, 0xDAD7,
+    0x89BE, 0xCF04,
+    0x8F1D, 0xC3A9,
+    0x9592, 0xB8E3,
+    0x9D0D, 0xAECC,
+    0xA57D, 0xA57D,
+    0xAECC, 0x9D0D,
+    0xB8E3, 0x9592,
+    0xC3A9, 0x8F1D,
+    0xCF04, 0x89BE,
+    0xDAD7, 0x8582,
+    0xE707, 0x8275,
+    0xF374, 0x809D
+};
+
+/**    
+* \par   
+* Example code for q15 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefq15[2*i] = cos(i * 2*PI/(float)N);
+*    twiddleCoefq15[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 128 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored in interleaved fashion
+* \par
+* Convert floating point to q15 (fixed point 1.15):
+*	round(twiddleCoefq15[i] * pow(2, 15))
+*    
+*/
+const q15_t twiddleCoef_128_q15[192] = {
+    0x7FFF, 0x0000,
+    0x7FD8, 0x0647,
+    0x7F62, 0x0C8B,
+    0x7E9D, 0x12C8,
+    0x7D8A, 0x18F8,
+    0x7C29, 0x1F19,
+    0x7A7D, 0x2528,
+    0x7884, 0x2B1F,
+    0x7641, 0x30FB,
+    0x73B5, 0x36BA,
+    0x70E2, 0x3C56,
+    0x6DCA, 0x41CE,
+    0x6A6D, 0x471C,
+    0x66CF, 0x4C3F,
+    0x62F2, 0x5133,
+    0x5ED7, 0x55F5,
+    0x5A82, 0x5A82,
+    0x55F5, 0x5ED7,
+    0x5133, 0x62F2,
+    0x4C3F, 0x66CF,
+    0x471C, 0x6A6D,
+    0x41CE, 0x6DCA,
+    0x3C56, 0x70E2,
+    0x36BA, 0x73B5,
+    0x30FB, 0x7641,
+    0x2B1F, 0x7884,
+    0x2528, 0x7A7D,
+    0x1F19, 0x7C29,
+    0x18F8, 0x7D8A,
+    0x12C8, 0x7E9D,
+    0x0C8B, 0x7F62,
+    0x0647, 0x7FD8,
+    0x0000, 0x7FFF,
+    0xF9B8, 0x7FD8,
+    0xF374, 0x7F62,
+    0xED37, 0x7E9D,
+    0xE707, 0x7D8A,
+    0xE0E6, 0x7C29,
+    0xDAD7, 0x7A7D,
+    0xD4E0, 0x7884,
+    0xCF04, 0x7641,
+    0xC945, 0x73B5,
+    0xC3A9, 0x70E2,
+    0xBE31, 0x6DCA,
+    0xB8E3, 0x6A6D,
+    0xB3C0, 0x66CF,
+    0xAECC, 0x62F2,
+    0xAA0A, 0x5ED7,
+    0xA57D, 0x5A82,
+    0xA128, 0x55F5,
+    0x9D0D, 0x5133,
+    0x9930, 0x4C3F,
+    0x9592, 0x471C,
+    0x9235, 0x41CE,
+    0x8F1D, 0x3C56,
+    0x8C4A, 0x36BA,
+    0x89BE, 0x30FB,
+    0x877B, 0x2B1F,
+    0x8582, 0x2528,
+    0x83D6, 0x1F19,
+    0x8275, 0x18F8,
+    0x8162, 0x12C8,
+    0x809D, 0x0C8B,
+    0x8027, 0x0647,
+    0x8000, 0x0000,
+    0x8027, 0xF9B8,
+    0x809D, 0xF374,
+    0x8162, 0xED37,
+    0x8275, 0xE707,
+    0x83D6, 0xE0E6,
+    0x8582, 0xDAD7,
+    0x877B, 0xD4E0,
+    0x89BE, 0xCF04,
+    0x8C4A, 0xC945,
+    0x8F1D, 0xC3A9,
+    0x9235, 0xBE31,
+    0x9592, 0xB8E3,
+    0x9930, 0xB3C0,
+    0x9D0D, 0xAECC,
+    0xA128, 0xAA0A,
+    0xA57D, 0xA57D,
+    0xAA0A, 0xA128,
+    0xAECC, 0x9D0D,
+    0xB3C0, 0x9930,
+    0xB8E3, 0x9592,
+    0xBE31, 0x9235,
+    0xC3A9, 0x8F1D,
+    0xC945, 0x8C4A,
+    0xCF04, 0x89BE,
+    0xD4E0, 0x877B,
+    0xDAD7, 0x8582,
+    0xE0E6, 0x83D6,
+    0xE707, 0x8275,
+    0xED37, 0x8162,
+    0xF374, 0x809D,
+    0xF9B8, 0x8027
+};
+
+/**    
+* \par   
+* Example code for q15 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefq15[2*i] = cos(i * 2*PI/(float)N);
+*    twiddleCoefq15[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 256 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored in interleaved fashion
+* \par
+* Convert floating point to q15 (fixed point 1.15):
+*	round(twiddleCoefq15[i] * pow(2, 15))
+*    
+*/
+const q15_t twiddleCoef_256_q15[384] = {
+    0x7FFF, 0x0000,
+    0x7FF6, 0x0324,
+    0x7FD8, 0x0647,
+    0x7FA7, 0x096A,
+    0x7F62, 0x0C8B,
+    0x7F09, 0x0FAB,
+    0x7E9D, 0x12C8,
+    0x7E1D, 0x15E2,
+    0x7D8A, 0x18F8,
+    0x7CE3, 0x1C0B,
+    0x7C29, 0x1F19,
+    0x7B5D, 0x2223,
+    0x7A7D, 0x2528,
+    0x798A, 0x2826,
+    0x7884, 0x2B1F,
+    0x776C, 0x2E11,
+    0x7641, 0x30FB,
+    0x7504, 0x33DE,
+    0x73B5, 0x36BA,
+    0x7255, 0x398C,
+    0x70E2, 0x3C56,
+    0x6F5F, 0x3F17,
+    0x6DCA, 0x41CE,
+    0x6C24, 0x447A,
+    0x6A6D, 0x471C,
+    0x68A6, 0x49B4,
+    0x66CF, 0x4C3F,
+    0x64E8, 0x4EBF,
+    0x62F2, 0x5133,
+    0x60EC, 0x539B,
+    0x5ED7, 0x55F5,
+    0x5CB4, 0x5842,
+    0x5A82, 0x5A82,
+    0x5842, 0x5CB4,
+    0x55F5, 0x5ED7,
+    0x539B, 0x60EC,
+    0x5133, 0x62F2,
+    0x4EBF, 0x64E8,
+    0x4C3F, 0x66CF,
+    0x49B4, 0x68A6,
+    0x471C, 0x6A6D,
+    0x447A, 0x6C24,
+    0x41CE, 0x6DCA,
+    0x3F17, 0x6F5F,
+    0x3C56, 0x70E2,
+    0x398C, 0x7255,
+    0x36BA, 0x73B5,
+    0x33DE, 0x7504,
+    0x30FB, 0x7641,
+    0x2E11, 0x776C,
+    0x2B1F, 0x7884,
+    0x2826, 0x798A,
+    0x2528, 0x7A7D,
+    0x2223, 0x7B5D,
+    0x1F19, 0x7C29,
+    0x1C0B, 0x7CE3,
+    0x18F8, 0x7D8A,
+    0x15E2, 0x7E1D,
+    0x12C8, 0x7E9D,
+    0x0FAB, 0x7F09,
+    0x0C8B, 0x7F62,
+    0x096A, 0x7FA7,
+    0x0647, 0x7FD8,
+    0x0324, 0x7FF6,
+    0x0000, 0x7FFF,
+    0xFCDB, 0x7FF6,
+    0xF9B8, 0x7FD8,
+    0xF695, 0x7FA7,
+    0xF374, 0x7F62,
+    0xF054, 0x7F09,
+    0xED37, 0x7E9D,
+    0xEA1D, 0x7E1D,
+    0xE707, 0x7D8A,
+    0xE3F4, 0x7CE3,
+    0xE0E6, 0x7C29,
+    0xDDDC, 0x7B5D,
+    0xDAD7, 0x7A7D,
+    0xD7D9, 0x798A,
+    0xD4E0, 0x7884,
+    0xD1EE, 0x776C,
+    0xCF04, 0x7641,
+    0xCC21, 0x7504,
+    0xC945, 0x73B5,
+    0xC673, 0x7255,
+    0xC3A9, 0x70E2,
+    0xC0E8, 0x6F5F,
+    0xBE31, 0x6DCA,
+    0xBB85, 0x6C24,
+    0xB8E3, 0x6A6D,
+    0xB64B, 0x68A6,
+    0xB3C0, 0x66CF,
+    0xB140, 0x64E8,
+    0xAECC, 0x62F2,
+    0xAC64, 0x60EC,
+    0xAA0A, 0x5ED7,
+    0xA7BD, 0x5CB4,
+    0xA57D, 0x5A82,
+    0xA34B, 0x5842,
+    0xA128, 0x55F5,
+    0x9F13, 0x539B,
+    0x9D0D, 0x5133,
+    0x9B17, 0x4EBF,
+    0x9930, 0x4C3F,
+    0x9759, 0x49B4,
+    0x9592, 0x471C,
+    0x93DB, 0x447A,
+    0x9235, 0x41CE,
+    0x90A0, 0x3F17,
+    0x8F1D, 0x3C56,
+    0x8DAA, 0x398C,
+    0x8C4A, 0x36BA,
+    0x8AFB, 0x33DE,
+    0x89BE, 0x30FB,
+    0x8893, 0x2E11,
+    0x877B, 0x2B1F,
+    0x8675, 0x2826,
+    0x8582, 0x2528,
+    0x84A2, 0x2223,
+    0x83D6, 0x1F19,
+    0x831C, 0x1C0B,
+    0x8275, 0x18F8,
+    0x81E2, 0x15E2,
+    0x8162, 0x12C8,
+    0x80F6, 0x0FAB,
+    0x809D, 0x0C8B,
+    0x8058, 0x096A,
+    0x8027, 0x0647,
+    0x8009, 0x0324,
+    0x8000, 0x0000,
+    0x8009, 0xFCDB,
+    0x8027, 0xF9B8,
+    0x8058, 0xF695,
+    0x809D, 0xF374,
+    0x80F6, 0xF054,
+    0x8162, 0xED37,
+    0x81E2, 0xEA1D,
+    0x8275, 0xE707,
+    0x831C, 0xE3F4,
+    0x83D6, 0xE0E6,
+    0x84A2, 0xDDDC,
+    0x8582, 0xDAD7,
+    0x8675, 0xD7D9,
+    0x877B, 0xD4E0,
+    0x8893, 0xD1EE,
+    0x89BE, 0xCF04,
+    0x8AFB, 0xCC21,
+    0x8C4A, 0xC945,
+    0x8DAA, 0xC673,
+    0x8F1D, 0xC3A9,
+    0x90A0, 0xC0E8,
+    0x9235, 0xBE31,
+    0x93DB, 0xBB85,
+    0x9592, 0xB8E3,
+    0x9759, 0xB64B,
+    0x9930, 0xB3C0,
+    0x9B17, 0xB140,
+    0x9D0D, 0xAECC,
+    0x9F13, 0xAC64,
+    0xA128, 0xAA0A,
+    0xA34B, 0xA7BD,
+    0xA57D, 0xA57D,
+    0xA7BD, 0xA34B,
+    0xAA0A, 0xA128,
+    0xAC64, 0x9F13,
+    0xAECC, 0x9D0D,
+    0xB140, 0x9B17,
+    0xB3C0, 0x9930,
+    0xB64B, 0x9759,
+    0xB8E3, 0x9592,
+    0xBB85, 0x93DB,
+    0xBE31, 0x9235,
+    0xC0E8, 0x90A0,
+    0xC3A9, 0x8F1D,
+    0xC673, 0x8DAA,
+    0xC945, 0x8C4A,
+    0xCC21, 0x8AFB,
+    0xCF04, 0x89BE,
+    0xD1EE, 0x8893,
+    0xD4E0, 0x877B,
+    0xD7D9, 0x8675,
+    0xDAD7, 0x8582,
+    0xDDDC, 0x84A2,
+    0xE0E6, 0x83D6,
+    0xE3F4, 0x831C,
+    0xE707, 0x8275,
+    0xEA1D, 0x81E2,
+    0xED37, 0x8162,
+    0xF054, 0x80F6,
+    0xF374, 0x809D,
+    0xF695, 0x8058,
+    0xF9B8, 0x8027,
+    0xFCDB, 0x8009
+};
+
+/**    
+* \par   
+* Example code for q15 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefq15[2*i] = cos(i * 2*PI/(float)N);
+*    twiddleCoefq15[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 512 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored in interleaved fashion
+* \par
+* Convert floating point to q15 (fixed point 1.15):
+*	round(twiddleCoefq15[i] * pow(2, 15))
+*    
+*/
+const q15_t twiddleCoef_512_q15[768] = {
+    0x7FFF, 0x0000,
+    0x7FFD, 0x0192,
+    0x7FF6, 0x0324,
+    0x7FE9, 0x04B6,
+    0x7FD8, 0x0647,
+    0x7FC2, 0x07D9,
+    0x7FA7, 0x096A,
+    0x7F87, 0x0AFB,
+    0x7F62, 0x0C8B,
+    0x7F38, 0x0E1B,
+    0x7F09, 0x0FAB,
+    0x7ED5, 0x1139,
+    0x7E9D, 0x12C8,
+    0x7E5F, 0x1455,
+    0x7E1D, 0x15E2,
+    0x7DD6, 0x176D,
+    0x7D8A, 0x18F8,
+    0x7D39, 0x1A82,
+    0x7CE3, 0x1C0B,
+    0x7C89, 0x1D93,
+    0x7C29, 0x1F19,
+    0x7BC5, 0x209F,
+    0x7B5D, 0x2223,
+    0x7AEF, 0x23A6,
+    0x7A7D, 0x2528,
+    0x7A05, 0x26A8,
+    0x798A, 0x2826,
+    0x7909, 0x29A3,
+    0x7884, 0x2B1F,
+    0x77FA, 0x2C98,
+    0x776C, 0x2E11,
+    0x76D9, 0x2F87,
+    0x7641, 0x30FB,
+    0x75A5, 0x326E,
+    0x7504, 0x33DE,
+    0x745F, 0x354D,
+    0x73B5, 0x36BA,
+    0x7307, 0x3824,
+    0x7255, 0x398C,
+    0x719E, 0x3AF2,
+    0x70E2, 0x3C56,
+    0x7023, 0x3DB8,
+    0x6F5F, 0x3F17,
+    0x6E96, 0x4073,
+    0x6DCA, 0x41CE,
+    0x6CF9, 0x4325,
+    0x6C24, 0x447A,
+    0x6B4A, 0x45CD,
+    0x6A6D, 0x471C,
+    0x698C, 0x4869,
+    0x68A6, 0x49B4,
+    0x67BD, 0x4AFB,
+    0x66CF, 0x4C3F,
+    0x65DD, 0x4D81,
+    0x64E8, 0x4EBF,
+    0x63EF, 0x4FFB,
+    0x62F2, 0x5133,
+    0x61F1, 0x5269,
+    0x60EC, 0x539B,
+    0x5FE3, 0x54CA,
+    0x5ED7, 0x55F5,
+    0x5DC7, 0x571D,
+    0x5CB4, 0x5842,
+    0x5B9D, 0x5964,
+    0x5A82, 0x5A82,
+    0x5964, 0x5B9D,
+    0x5842, 0x5CB4,
+    0x571D, 0x5DC7,
+    0x55F5, 0x5ED7,
+    0x54CA, 0x5FE3,
+    0x539B, 0x60EC,
+    0x5269, 0x61F1,
+    0x5133, 0x62F2,
+    0x4FFB, 0x63EF,
+    0x4EBF, 0x64E8,
+    0x4D81, 0x65DD,
+    0x4C3F, 0x66CF,
+    0x4AFB, 0x67BD,
+    0x49B4, 0x68A6,
+    0x4869, 0x698C,
+    0x471C, 0x6A6D,
+    0x45CD, 0x6B4A,
+    0x447A, 0x6C24,
+    0x4325, 0x6CF9,
+    0x41CE, 0x6DCA,
+    0x4073, 0x6E96,
+    0x3F17, 0x6F5F,
+    0x3DB8, 0x7023,
+    0x3C56, 0x70E2,
+    0x3AF2, 0x719E,
+    0x398C, 0x7255,
+    0x3824, 0x7307,
+    0x36BA, 0x73B5,
+    0x354D, 0x745F,
+    0x33DE, 0x7504,
+    0x326E, 0x75A5,
+    0x30FB, 0x7641,
+    0x2F87, 0x76D9,
+    0x2E11, 0x776C,
+    0x2C98, 0x77FA,
+    0x2B1F, 0x7884,
+    0x29A3, 0x7909,
+    0x2826, 0x798A,
+    0x26A8, 0x7A05,
+    0x2528, 0x7A7D,
+    0x23A6, 0x7AEF,
+    0x2223, 0x7B5D,
+    0x209F, 0x7BC5,
+    0x1F19, 0x7C29,
+    0x1D93, 0x7C89,
+    0x1C0B, 0x7CE3,
+    0x1A82, 0x7D39,
+    0x18F8, 0x7D8A,
+    0x176D, 0x7DD6,
+    0x15E2, 0x7E1D,
+    0x1455, 0x7E5F,
+    0x12C8, 0x7E9D,
+    0x1139, 0x7ED5,
+    0x0FAB, 0x7F09,
+    0x0E1B, 0x7F38,
+    0x0C8B, 0x7F62,
+    0x0AFB, 0x7F87,
+    0x096A, 0x7FA7,
+    0x07D9, 0x7FC2,
+    0x0647, 0x7FD8,
+    0x04B6, 0x7FE9,
+    0x0324, 0x7FF6,
+    0x0192, 0x7FFD,
+    0x0000, 0x7FFF,
+    0xFE6D, 0x7FFD,
+    0xFCDB, 0x7FF6,
+    0xFB49, 0x7FE9,
+    0xF9B8, 0x7FD8,
+    0xF826, 0x7FC2,
+    0xF695, 0x7FA7,
+    0xF504, 0x7F87,
+    0xF374, 0x7F62,
+    0xF1E4, 0x7F38,
+    0xF054, 0x7F09,
+    0xEEC6, 0x7ED5,
+    0xED37, 0x7E9D,
+    0xEBAA, 0x7E5F,
+    0xEA1D, 0x7E1D,
+    0xE892, 0x7DD6,
+    0xE707, 0x7D8A,
+    0xE57D, 0x7D39,
+    0xE3F4, 0x7CE3,
+    0xE26C, 0x7C89,
+    0xE0E6, 0x7C29,
+    0xDF60, 0x7BC5,
+    0xDDDC, 0x7B5D,
+    0xDC59, 0x7AEF,
+    0xDAD7, 0x7A7D,
+    0xD957, 0x7A05,
+    0xD7D9, 0x798A,
+    0xD65C, 0x7909,
+    0xD4E0, 0x7884,
+    0xD367, 0x77FA,
+    0xD1EE, 0x776C,
+    0xD078, 0x76D9,
+    0xCF04, 0x7641,
+    0xCD91, 0x75A5,
+    0xCC21, 0x7504,
+    0xCAB2, 0x745F,
+    0xC945, 0x73B5,
+    0xC7DB, 0x7307,
+    0xC673, 0x7255,
+    0xC50D, 0x719E,
+    0xC3A9, 0x70E2,
+    0xC247, 0x7023,
+    0xC0E8, 0x6F5F,
+    0xBF8C, 0x6E96,
+    0xBE31, 0x6DCA,
+    0xBCDA, 0x6CF9,
+    0xBB85, 0x6C24,
+    0xBA32, 0x6B4A,
+    0xB8E3, 0x6A6D,
+    0xB796, 0x698C,
+    0xB64B, 0x68A6,
+    0xB504, 0x67BD,
+    0xB3C0, 0x66CF,
+    0xB27E, 0x65DD,
+    0xB140, 0x64E8,
+    0xB004, 0x63EF,
+    0xAECC, 0x62F2,
+    0xAD96, 0x61F1,
+    0xAC64, 0x60EC,
+    0xAB35, 0x5FE3,
+    0xAA0A, 0x5ED7,
+    0xA8E2, 0x5DC7,
+    0xA7BD, 0x5CB4,
+    0xA69B, 0x5B9D,
+    0xA57D, 0x5A82,
+    0xA462, 0x5964,
+    0xA34B, 0x5842,
+    0xA238, 0x571D,
+    0xA128, 0x55F5,
+    0xA01C, 0x54CA,
+    0x9F13, 0x539B,
+    0x9E0E, 0x5269,
+    0x9D0D, 0x5133,
+    0x9C10, 0x4FFB,
+    0x9B17, 0x4EBF,
+    0x9A22, 0x4D81,
+    0x9930, 0x4C3F,
+    0x9842, 0x4AFB,
+    0x9759, 0x49B4,
+    0x9673, 0x4869,
+    0x9592, 0x471C,
+    0x94B5, 0x45CD,
+    0x93DB, 0x447A,
+    0x9306, 0x4325,
+    0x9235, 0x41CE,
+    0x9169, 0x4073,
+    0x90A0, 0x3F17,
+    0x8FDC, 0x3DB8,
+    0x8F1D, 0x3C56,
+    0x8E61, 0x3AF2,
+    0x8DAA, 0x398C,
+    0x8CF8, 0x3824,
+    0x8C4A, 0x36BA,
+    0x8BA0, 0x354D,
+    0x8AFB, 0x33DE,
+    0x8A5A, 0x326E,
+    0x89BE, 0x30FB,
+    0x8926, 0x2F87,
+    0x8893, 0x2E11,
+    0x8805, 0x2C98,
+    0x877B, 0x2B1F,
+    0x86F6, 0x29A3,
+    0x8675, 0x2826,
+    0x85FA, 0x26A8,
+    0x8582, 0x2528,
+    0x8510, 0x23A6,
+    0x84A2, 0x2223,
+    0x843A, 0x209F,
+    0x83D6, 0x1F19,
+    0x8376, 0x1D93,
+    0x831C, 0x1C0B,
+    0x82C6, 0x1A82,
+    0x8275, 0x18F8,
+    0x8229, 0x176D,
+    0x81E2, 0x15E2,
+    0x81A0, 0x1455,
+    0x8162, 0x12C8,
+    0x812A, 0x1139,
+    0x80F6, 0x0FAB,
+    0x80C7, 0x0E1B,
+    0x809D, 0x0C8B,
+    0x8078, 0x0AFB,
+    0x8058, 0x096A,
+    0x803D, 0x07D9,
+    0x8027, 0x0647,
+    0x8016, 0x04B6,
+    0x8009, 0x0324,
+    0x8002, 0x0192,
+    0x8000, 0x0000,
+    0x8002, 0xFE6D,
+    0x8009, 0xFCDB,
+    0x8016, 0xFB49,
+    0x8027, 0xF9B8,
+    0x803D, 0xF826,
+    0x8058, 0xF695,
+    0x8078, 0xF504,
+    0x809D, 0xF374,
+    0x80C7, 0xF1E4,
+    0x80F6, 0xF054,
+    0x812A, 0xEEC6,
+    0x8162, 0xED37,
+    0x81A0, 0xEBAA,
+    0x81E2, 0xEA1D,
+    0x8229, 0xE892,
+    0x8275, 0xE707,
+    0x82C6, 0xE57D,
+    0x831C, 0xE3F4,
+    0x8376, 0xE26C,
+    0x83D6, 0xE0E6,
+    0x843A, 0xDF60,
+    0x84A2, 0xDDDC,
+    0x8510, 0xDC59,
+    0x8582, 0xDAD7,
+    0x85FA, 0xD957,
+    0x8675, 0xD7D9,
+    0x86F6, 0xD65C,
+    0x877B, 0xD4E0,
+    0x8805, 0xD367,
+    0x8893, 0xD1EE,
+    0x8926, 0xD078,
+    0x89BE, 0xCF04,
+    0x8A5A, 0xCD91,
+    0x8AFB, 0xCC21,
+    0x8BA0, 0xCAB2,
+    0x8C4A, 0xC945,
+    0x8CF8, 0xC7DB,
+    0x8DAA, 0xC673,
+    0x8E61, 0xC50D,
+    0x8F1D, 0xC3A9,
+    0x8FDC, 0xC247,
+    0x90A0, 0xC0E8,
+    0x9169, 0xBF8C,
+    0x9235, 0xBE31,
+    0x9306, 0xBCDA,
+    0x93DB, 0xBB85,
+    0x94B5, 0xBA32,
+    0x9592, 0xB8E3,
+    0x9673, 0xB796,
+    0x9759, 0xB64B,
+    0x9842, 0xB504,
+    0x9930, 0xB3C0,
+    0x9A22, 0xB27E,
+    0x9B17, 0xB140,
+    0x9C10, 0xB004,
+    0x9D0D, 0xAECC,
+    0x9E0E, 0xAD96,
+    0x9F13, 0xAC64,
+    0xA01C, 0xAB35,
+    0xA128, 0xAA0A,
+    0xA238, 0xA8E2,
+    0xA34B, 0xA7BD,
+    0xA462, 0xA69B,
+    0xA57D, 0xA57D,
+    0xA69B, 0xA462,
+    0xA7BD, 0xA34B,
+    0xA8E2, 0xA238,
+    0xAA0A, 0xA128,
+    0xAB35, 0xA01C,
+    0xAC64, 0x9F13,
+    0xAD96, 0x9E0E,
+    0xAECC, 0x9D0D,
+    0xB004, 0x9C10,
+    0xB140, 0x9B17,
+    0xB27E, 0x9A22,
+    0xB3C0, 0x9930,
+    0xB504, 0x9842,
+    0xB64B, 0x9759,
+    0xB796, 0x9673,
+    0xB8E3, 0x9592,
+    0xBA32, 0x94B5,
+    0xBB85, 0x93DB,
+    0xBCDA, 0x9306,
+    0xBE31, 0x9235,
+    0xBF8C, 0x9169,
+    0xC0E8, 0x90A0,
+    0xC247, 0x8FDC,
+    0xC3A9, 0x8F1D,
+    0xC50D, 0x8E61,
+    0xC673, 0x8DAA,
+    0xC7DB, 0x8CF8,
+    0xC945, 0x8C4A,
+    0xCAB2, 0x8BA0,
+    0xCC21, 0x8AFB,
+    0xCD91, 0x8A5A,
+    0xCF04, 0x89BE,
+    0xD078, 0x8926,
+    0xD1EE, 0x8893,
+    0xD367, 0x8805,
+    0xD4E0, 0x877B,
+    0xD65C, 0x86F6,
+    0xD7D9, 0x8675,
+    0xD957, 0x85FA,
+    0xDAD7, 0x8582,
+    0xDC59, 0x8510,
+    0xDDDC, 0x84A2,
+    0xDF60, 0x843A,
+    0xE0E6, 0x83D6,
+    0xE26C, 0x8376,
+    0xE3F4, 0x831C,
+    0xE57D, 0x82C6,
+    0xE707, 0x8275,
+    0xE892, 0x8229,
+    0xEA1D, 0x81E2,
+    0xEBAA, 0x81A0,
+    0xED37, 0x8162,
+    0xEEC6, 0x812A,
+    0xF054, 0x80F6,
+    0xF1E4, 0x80C7,
+    0xF374, 0x809D,
+    0xF504, 0x8078,
+    0xF695, 0x8058,
+    0xF826, 0x803D,
+    0xF9B8, 0x8027,
+    0xFB49, 0x8016,
+    0xFCDB, 0x8009,
+    0xFE6D, 0x8002
+};
+
+/**    
+* \par   
+* Example code for q15 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefq15[2*i] = cos(i * 2*PI/(float)N);
+*    twiddleCoefq15[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 1024 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored in interleaved fashion
+* \par
+* Convert floating point to q15 (fixed point 1.15):
+*	round(twiddleCoefq15[i] * pow(2, 15))
+*    
+*/
+const q15_t twiddleCoef_1024_q15[1536] = {
+    0x7FFF, 0x0000,
+    0x7FFF, 0x00C9,
+    0x7FFD, 0x0192,
+    0x7FFA, 0x025B,
+    0x7FF6, 0x0324,
+    0x7FF0, 0x03ED,
+    0x7FE9, 0x04B6,
+    0x7FE1, 0x057F,
+    0x7FD8, 0x0647,
+    0x7FCE, 0x0710,
+    0x7FC2, 0x07D9,
+    0x7FB5, 0x08A2,
+    0x7FA7, 0x096A,
+    0x7F97, 0x0A33,
+    0x7F87, 0x0AFB,
+    0x7F75, 0x0BC3,
+    0x7F62, 0x0C8B,
+    0x7F4D, 0x0D53,
+    0x7F38, 0x0E1B,
+    0x7F21, 0x0EE3,
+    0x7F09, 0x0FAB,
+    0x7EF0, 0x1072,
+    0x7ED5, 0x1139,
+    0x7EBA, 0x1201,
+    0x7E9D, 0x12C8,
+    0x7E7F, 0x138E,
+    0x7E5F, 0x1455,
+    0x7E3F, 0x151B,
+    0x7E1D, 0x15E2,
+    0x7DFA, 0x16A8,
+    0x7DD6, 0x176D,
+    0x7DB0, 0x1833,
+    0x7D8A, 0x18F8,
+    0x7D62, 0x19BD,
+    0x7D39, 0x1A82,
+    0x7D0F, 0x1B47,
+    0x7CE3, 0x1C0B,
+    0x7CB7, 0x1CCF,
+    0x7C89, 0x1D93,
+    0x7C5A, 0x1E56,
+    0x7C29, 0x1F19,
+    0x7BF8, 0x1FDC,
+    0x7BC5, 0x209F,
+    0x7B92, 0x2161,
+    0x7B5D, 0x2223,
+    0x7B26, 0x22E5,
+    0x7AEF, 0x23A6,
+    0x7AB6, 0x2467,
+    0x7A7D, 0x2528,
+    0x7A42, 0x25E8,
+    0x7A05, 0x26A8,
+    0x79C8, 0x2767,
+    0x798A, 0x2826,
+    0x794A, 0x28E5,
+    0x7909, 0x29A3,
+    0x78C7, 0x2A61,
+    0x7884, 0x2B1F,
+    0x7840, 0x2BDC,
+    0x77FA, 0x2C98,
+    0x77B4, 0x2D55,
+    0x776C, 0x2E11,
+    0x7723, 0x2ECC,
+    0x76D9, 0x2F87,
+    0x768E, 0x3041,
+    0x7641, 0x30FB,
+    0x75F4, 0x31B5,
+    0x75A5, 0x326E,
+    0x7555, 0x3326,
+    0x7504, 0x33DE,
+    0x74B2, 0x3496,
+    0x745F, 0x354D,
+    0x740B, 0x3604,
+    0x73B5, 0x36BA,
+    0x735F, 0x376F,
+    0x7307, 0x3824,
+    0x72AF, 0x38D8,
+    0x7255, 0x398C,
+    0x71FA, 0x3A40,
+    0x719E, 0x3AF2,
+    0x7141, 0x3BA5,
+    0x70E2, 0x3C56,
+    0x7083, 0x3D07,
+    0x7023, 0x3DB8,
+    0x6FC1, 0x3E68,
+    0x6F5F, 0x3F17,
+    0x6EFB, 0x3FC5,
+    0x6E96, 0x4073,
+    0x6E30, 0x4121,
+    0x6DCA, 0x41CE,
+    0x6D62, 0x427A,
+    0x6CF9, 0x4325,
+    0x6C8F, 0x43D0,
+    0x6C24, 0x447A,
+    0x6BB8, 0x4524,
+    0x6B4A, 0x45CD,
+    0x6ADC, 0x4675,
+    0x6A6D, 0x471C,
+    0x69FD, 0x47C3,
+    0x698C, 0x4869,
+    0x6919, 0x490F,
+    0x68A6, 0x49B4,
+    0x6832, 0x4A58,
+    0x67BD, 0x4AFB,
+    0x6746, 0x4B9E,
+    0x66CF, 0x4C3F,
+    0x6657, 0x4CE1,
+    0x65DD, 0x4D81,
+    0x6563, 0x4E21,
+    0x64E8, 0x4EBF,
+    0x646C, 0x4F5E,
+    0x63EF, 0x4FFB,
+    0x6371, 0x5097,
+    0x62F2, 0x5133,
+    0x6271, 0x51CE,
+    0x61F1, 0x5269,
+    0x616F, 0x5302,
+    0x60EC, 0x539B,
+    0x6068, 0x5433,
+    0x5FE3, 0x54CA,
+    0x5F5E, 0x5560,
+    0x5ED7, 0x55F5,
+    0x5E50, 0x568A,
+    0x5DC7, 0x571D,
+    0x5D3E, 0x57B0,
+    0x5CB4, 0x5842,
+    0x5C29, 0x58D4,
+    0x5B9D, 0x5964,
+    0x5B10, 0x59F3,
+    0x5A82, 0x5A82,
+    0x59F3, 0x5B10,
+    0x5964, 0x5B9D,
+    0x58D4, 0x5C29,
+    0x5842, 0x5CB4,
+    0x57B0, 0x5D3E,
+    0x571D, 0x5DC7,
+    0x568A, 0x5E50,
+    0x55F5, 0x5ED7,
+    0x5560, 0x5F5E,
+    0x54CA, 0x5FE3,
+    0x5433, 0x6068,
+    0x539B, 0x60EC,
+    0x5302, 0x616F,
+    0x5269, 0x61F1,
+    0x51CE, 0x6271,
+    0x5133, 0x62F2,
+    0x5097, 0x6371,
+    0x4FFB, 0x63EF,
+    0x4F5E, 0x646C,
+    0x4EBF, 0x64E8,
+    0x4E21, 0x6563,
+    0x4D81, 0x65DD,
+    0x4CE1, 0x6657,
+    0x4C3F, 0x66CF,
+    0x4B9E, 0x6746,
+    0x4AFB, 0x67BD,
+    0x4A58, 0x6832,
+    0x49B4, 0x68A6,
+    0x490F, 0x6919,
+    0x4869, 0x698C,
+    0x47C3, 0x69FD,
+    0x471C, 0x6A6D,
+    0x4675, 0x6ADC,
+    0x45CD, 0x6B4A,
+    0x4524, 0x6BB8,
+    0x447A, 0x6C24,
+    0x43D0, 0x6C8F,
+    0x4325, 0x6CF9,
+    0x427A, 0x6D62,
+    0x41CE, 0x6DCA,
+    0x4121, 0x6E30,
+    0x4073, 0x6E96,
+    0x3FC5, 0x6EFB,
+    0x3F17, 0x6F5F,
+    0x3E68, 0x6FC1,
+    0x3DB8, 0x7023,
+    0x3D07, 0x7083,
+    0x3C56, 0x70E2,
+    0x3BA5, 0x7141,
+    0x3AF2, 0x719E,
+    0x3A40, 0x71FA,
+    0x398C, 0x7255,
+    0x38D8, 0x72AF,
+    0x3824, 0x7307,
+    0x376F, 0x735F,
+    0x36BA, 0x73B5,
+    0x3604, 0x740B,
+    0x354D, 0x745F,
+    0x3496, 0x74B2,
+    0x33DE, 0x7504,
+    0x3326, 0x7555,
+    0x326E, 0x75A5,
+    0x31B5, 0x75F4,
+    0x30FB, 0x7641,
+    0x3041, 0x768E,
+    0x2F87, 0x76D9,
+    0x2ECC, 0x7723,
+    0x2E11, 0x776C,
+    0x2D55, 0x77B4,
+    0x2C98, 0x77FA,
+    0x2BDC, 0x7840,
+    0x2B1F, 0x7884,
+    0x2A61, 0x78C7,
+    0x29A3, 0x7909,
+    0x28E5, 0x794A,
+    0x2826, 0x798A,
+    0x2767, 0x79C8,
+    0x26A8, 0x7A05,
+    0x25E8, 0x7A42,
+    0x2528, 0x7A7D,
+    0x2467, 0x7AB6,
+    0x23A6, 0x7AEF,
+    0x22E5, 0x7B26,
+    0x2223, 0x7B5D,
+    0x2161, 0x7B92,
+    0x209F, 0x7BC5,
+    0x1FDC, 0x7BF8,
+    0x1F19, 0x7C29,
+    0x1E56, 0x7C5A,
+    0x1D93, 0x7C89,
+    0x1CCF, 0x7CB7,
+    0x1C0B, 0x7CE3,
+    0x1B47, 0x7D0F,
+    0x1A82, 0x7D39,
+    0x19BD, 0x7D62,
+    0x18F8, 0x7D8A,
+    0x1833, 0x7DB0,
+    0x176D, 0x7DD6,
+    0x16A8, 0x7DFA,
+    0x15E2, 0x7E1D,
+    0x151B, 0x7E3F,
+    0x1455, 0x7E5F,
+    0x138E, 0x7E7F,
+    0x12C8, 0x7E9D,
+    0x1201, 0x7EBA,
+    0x1139, 0x7ED5,
+    0x1072, 0x7EF0,
+    0x0FAB, 0x7F09,
+    0x0EE3, 0x7F21,
+    0x0E1B, 0x7F38,
+    0x0D53, 0x7F4D,
+    0x0C8B, 0x7F62,
+    0x0BC3, 0x7F75,
+    0x0AFB, 0x7F87,
+    0x0A33, 0x7F97,
+    0x096A, 0x7FA7,
+    0x08A2, 0x7FB5,
+    0x07D9, 0x7FC2,
+    0x0710, 0x7FCE,
+    0x0647, 0x7FD8,
+    0x057F, 0x7FE1,
+    0x04B6, 0x7FE9,
+    0x03ED, 0x7FF0,
+    0x0324, 0x7FF6,
+    0x025B, 0x7FFA,
+    0x0192, 0x7FFD,
+    0x00C9, 0x7FFF,
+    0x0000, 0x7FFF,
+    0xFF36, 0x7FFF,
+    0xFE6D, 0x7FFD,
+    0xFDA4, 0x7FFA,
+    0xFCDB, 0x7FF6,
+    0xFC12, 0x7FF0,
+    0xFB49, 0x7FE9,
+    0xFA80, 0x7FE1,
+    0xF9B8, 0x7FD8,
+    0xF8EF, 0x7FCE,
+    0xF826, 0x7FC2,
+    0xF75D, 0x7FB5,
+    0xF695, 0x7FA7,
+    0xF5CC, 0x7F97,
+    0xF504, 0x7F87,
+    0xF43C, 0x7F75,
+    0xF374, 0x7F62,
+    0xF2AC, 0x7F4D,
+    0xF1E4, 0x7F38,
+    0xF11C, 0x7F21,
+    0xF054, 0x7F09,
+    0xEF8D, 0x7EF0,
+    0xEEC6, 0x7ED5,
+    0xEDFE, 0x7EBA,
+    0xED37, 0x7E9D,
+    0xEC71, 0x7E7F,
+    0xEBAA, 0x7E5F,
+    0xEAE4, 0x7E3F,
+    0xEA1D, 0x7E1D,
+    0xE957, 0x7DFA,
+    0xE892, 0x7DD6,
+    0xE7CC, 0x7DB0,
+    0xE707, 0x7D8A,
+    0xE642, 0x7D62,
+    0xE57D, 0x7D39,
+    0xE4B8, 0x7D0F,
+    0xE3F4, 0x7CE3,
+    0xE330, 0x7CB7,
+    0xE26C, 0x7C89,
+    0xE1A9, 0x7C5A,
+    0xE0E6, 0x7C29,
+    0xE023, 0x7BF8,
+    0xDF60, 0x7BC5,
+    0xDE9E, 0x7B92,
+    0xDDDC, 0x7B5D,
+    0xDD1A, 0x7B26,
+    0xDC59, 0x7AEF,
+    0xDB98, 0x7AB6,
+    0xDAD7, 0x7A7D,
+    0xDA17, 0x7A42,
+    0xD957, 0x7A05,
+    0xD898, 0x79C8,
+    0xD7D9, 0x798A,
+    0xD71A, 0x794A,
+    0xD65C, 0x7909,
+    0xD59E, 0x78C7,
+    0xD4E0, 0x7884,
+    0xD423, 0x7840,
+    0xD367, 0x77FA,
+    0xD2AA, 0x77B4,
+    0xD1EE, 0x776C,
+    0xD133, 0x7723,
+    0xD078, 0x76D9,
+    0xCFBE, 0x768E,
+    0xCF04, 0x7641,
+    0xCE4A, 0x75F4,
+    0xCD91, 0x75A5,
+    0xCCD9, 0x7555,
+    0xCC21, 0x7504,
+    0xCB69, 0x74B2,
+    0xCAB2, 0x745F,
+    0xC9FB, 0x740B,
+    0xC945, 0x73B5,
+    0xC890, 0x735F,
+    0xC7DB, 0x7307,
+    0xC727, 0x72AF,
+    0xC673, 0x7255,
+    0xC5BF, 0x71FA,
+    0xC50D, 0x719E,
+    0xC45A, 0x7141,
+    0xC3A9, 0x70E2,
+    0xC2F8, 0x7083,
+    0xC247, 0x7023,
+    0xC197, 0x6FC1,
+    0xC0E8, 0x6F5F,
+    0xC03A, 0x6EFB,
+    0xBF8C, 0x6E96,
+    0xBEDE, 0x6E30,
+    0xBE31, 0x6DCA,
+    0xBD85, 0x6D62,
+    0xBCDA, 0x6CF9,
+    0xBC2F, 0x6C8F,
+    0xBB85, 0x6C24,
+    0xBADB, 0x6BB8,
+    0xBA32, 0x6B4A,
+    0xB98A, 0x6ADC,
+    0xB8E3, 0x6A6D,
+    0xB83C, 0x69FD,
+    0xB796, 0x698C,
+    0xB6F0, 0x6919,
+    0xB64B, 0x68A6,
+    0xB5A7, 0x6832,
+    0xB504, 0x67BD,
+    0xB461, 0x6746,
+    0xB3C0, 0x66CF,
+    0xB31E, 0x6657,
+    0xB27E, 0x65DD,
+    0xB1DE, 0x6563,
+    0xB140, 0x64E8,
+    0xB0A1, 0x646C,
+    0xB004, 0x63EF,
+    0xAF68, 0x6371,
+    0xAECC, 0x62F2,
+    0xAE31, 0x6271,
+    0xAD96, 0x61F1,
+    0xACFD, 0x616F,
+    0xAC64, 0x60EC,
+    0xABCC, 0x6068,
+    0xAB35, 0x5FE3,
+    0xAA9F, 0x5F5E,
+    0xAA0A, 0x5ED7,
+    0xA975, 0x5E50,
+    0xA8E2, 0x5DC7,
+    0xA84F, 0x5D3E,
+    0xA7BD, 0x5CB4,
+    0xA72B, 0x5C29,
+    0xA69B, 0x5B9D,
+    0xA60C, 0x5B10,
+    0xA57D, 0x5A82,
+    0xA4EF, 0x59F3,
+    0xA462, 0x5964,
+    0xA3D6, 0x58D4,
+    0xA34B, 0x5842,
+    0xA2C1, 0x57B0,
+    0xA238, 0x571D,
+    0xA1AF, 0x568A,
+    0xA128, 0x55F5,
+    0xA0A1, 0x5560,
+    0xA01C, 0x54CA,
+    0x9F97, 0x5433,
+    0x9F13, 0x539B,
+    0x9E90, 0x5302,
+    0x9E0E, 0x5269,
+    0x9D8E, 0x51CE,
+    0x9D0D, 0x5133,
+    0x9C8E, 0x5097,
+    0x9C10, 0x4FFB,
+    0x9B93, 0x4F5E,
+    0x9B17, 0x4EBF,
+    0x9A9C, 0x4E21,
+    0x9A22, 0x4D81,
+    0x99A8, 0x4CE1,
+    0x9930, 0x4C3F,
+    0x98B9, 0x4B9E,
+    0x9842, 0x4AFB,
+    0x97CD, 0x4A58,
+    0x9759, 0x49B4,
+    0x96E6, 0x490F,
+    0x9673, 0x4869,
+    0x9602, 0x47C3,
+    0x9592, 0x471C,
+    0x9523, 0x4675,
+    0x94B5, 0x45CD,
+    0x9447, 0x4524,
+    0x93DB, 0x447A,
+    0x9370, 0x43D0,
+    0x9306, 0x4325,
+    0x929D, 0x427A,
+    0x9235, 0x41CE,
+    0x91CF, 0x4121,
+    0x9169, 0x4073,
+    0x9104, 0x3FC5,
+    0x90A0, 0x3F17,
+    0x903E, 0x3E68,
+    0x8FDC, 0x3DB8,
+    0x8F7C, 0x3D07,
+    0x8F1D, 0x3C56,
+    0x8EBE, 0x3BA5,
+    0x8E61, 0x3AF2,
+    0x8E05, 0x3A40,
+    0x8DAA, 0x398C,
+    0x8D50, 0x38D8,
+    0x8CF8, 0x3824,
+    0x8CA0, 0x376F,
+    0x8C4A, 0x36BA,
+    0x8BF4, 0x3604,
+    0x8BA0, 0x354D,
+    0x8B4D, 0x3496,
+    0x8AFB, 0x33DE,
+    0x8AAA, 0x3326,
+    0x8A5A, 0x326E,
+    0x8A0B, 0x31B5,
+    0x89BE, 0x30FB,
+    0x8971, 0x3041,
+    0x8926, 0x2F87,
+    0x88DC, 0x2ECC,
+    0x8893, 0x2E11,
+    0x884B, 0x2D55,
+    0x8805, 0x2C98,
+    0x87BF, 0x2BDC,
+    0x877B, 0x2B1F,
+    0x8738, 0x2A61,
+    0x86F6, 0x29A3,
+    0x86B5, 0x28E5,
+    0x8675, 0x2826,
+    0x8637, 0x2767,
+    0x85FA, 0x26A8,
+    0x85BD, 0x25E8,
+    0x8582, 0x2528,
+    0x8549, 0x2467,
+    0x8510, 0x23A6,
+    0x84D9, 0x22E5,
+    0x84A2, 0x2223,
+    0x846D, 0x2161,
+    0x843A, 0x209F,
+    0x8407, 0x1FDC,
+    0x83D6, 0x1F19,
+    0x83A5, 0x1E56,
+    0x8376, 0x1D93,
+    0x8348, 0x1CCF,
+    0x831C, 0x1C0B,
+    0x82F0, 0x1B47,
+    0x82C6, 0x1A82,
+    0x829D, 0x19BD,
+    0x8275, 0x18F8,
+    0x824F, 0x1833,
+    0x8229, 0x176D,
+    0x8205, 0x16A8,
+    0x81E2, 0x15E2,
+    0x81C0, 0x151B,
+    0x81A0, 0x1455,
+    0x8180, 0x138E,
+    0x8162, 0x12C8,
+    0x8145, 0x1201,
+    0x812A, 0x1139,
+    0x810F, 0x1072,
+    0x80F6, 0x0FAB,
+    0x80DE, 0x0EE3,
+    0x80C7, 0x0E1B,
+    0x80B2, 0x0D53,
+    0x809D, 0x0C8B,
+    0x808A, 0x0BC3,
+    0x8078, 0x0AFB,
+    0x8068, 0x0A33,
+    0x8058, 0x096A,
+    0x804A, 0x08A2,
+    0x803D, 0x07D9,
+    0x8031, 0x0710,
+    0x8027, 0x0647,
+    0x801E, 0x057F,
+    0x8016, 0x04B6,
+    0x800F, 0x03ED,
+    0x8009, 0x0324,
+    0x8005, 0x025B,
+    0x8002, 0x0192,
+    0x8000, 0x00C9,
+    0x8000, 0x0000,
+    0x8000, 0xFF36,
+    0x8002, 0xFE6D,
+    0x8005, 0xFDA4,
+    0x8009, 0xFCDB,
+    0x800F, 0xFC12,
+    0x8016, 0xFB49,
+    0x801E, 0xFA80,
+    0x8027, 0xF9B8,
+    0x8031, 0xF8EF,
+    0x803D, 0xF826,
+    0x804A, 0xF75D,
+    0x8058, 0xF695,
+    0x8068, 0xF5CC,
+    0x8078, 0xF504,
+    0x808A, 0xF43C,
+    0x809D, 0xF374,
+    0x80B2, 0xF2AC,
+    0x80C7, 0xF1E4,
+    0x80DE, 0xF11C,
+    0x80F6, 0xF054,
+    0x810F, 0xEF8D,
+    0x812A, 0xEEC6,
+    0x8145, 0xEDFE,
+    0x8162, 0xED37,
+    0x8180, 0xEC71,
+    0x81A0, 0xEBAA,
+    0x81C0, 0xEAE4,
+    0x81E2, 0xEA1D,
+    0x8205, 0xE957,
+    0x8229, 0xE892,
+    0x824F, 0xE7CC,
+    0x8275, 0xE707,
+    0x829D, 0xE642,
+    0x82C6, 0xE57D,
+    0x82F0, 0xE4B8,
+    0x831C, 0xE3F4,
+    0x8348, 0xE330,
+    0x8376, 0xE26C,
+    0x83A5, 0xE1A9,
+    0x83D6, 0xE0E6,
+    0x8407, 0xE023,
+    0x843A, 0xDF60,
+    0x846D, 0xDE9E,
+    0x84A2, 0xDDDC,
+    0x84D9, 0xDD1A,
+    0x8510, 0xDC59,
+    0x8549, 0xDB98,
+    0x8582, 0xDAD7,
+    0x85BD, 0xDA17,
+    0x85FA, 0xD957,
+    0x8637, 0xD898,
+    0x8675, 0xD7D9,
+    0x86B5, 0xD71A,
+    0x86F6, 0xD65C,
+    0x8738, 0xD59E,
+    0x877B, 0xD4E0,
+    0x87BF, 0xD423,
+    0x8805, 0xD367,
+    0x884B, 0xD2AA,
+    0x8893, 0xD1EE,
+    0x88DC, 0xD133,
+    0x8926, 0xD078,
+    0x8971, 0xCFBE,
+    0x89BE, 0xCF04,
+    0x8A0B, 0xCE4A,
+    0x8A5A, 0xCD91,
+    0x8AAA, 0xCCD9,
+    0x8AFB, 0xCC21,
+    0x8B4D, 0xCB69,
+    0x8BA0, 0xCAB2,
+    0x8BF4, 0xC9FB,
+    0x8C4A, 0xC945,
+    0x8CA0, 0xC890,
+    0x8CF8, 0xC7DB,
+    0x8D50, 0xC727,
+    0x8DAA, 0xC673,
+    0x8E05, 0xC5BF,
+    0x8E61, 0xC50D,
+    0x8EBE, 0xC45A,
+    0x8F1D, 0xC3A9,
+    0x8F7C, 0xC2F8,
+    0x8FDC, 0xC247,
+    0x903E, 0xC197,
+    0x90A0, 0xC0E8,
+    0x9104, 0xC03A,
+    0x9169, 0xBF8C,
+    0x91CF, 0xBEDE,
+    0x9235, 0xBE31,
+    0x929D, 0xBD85,
+    0x9306, 0xBCDA,
+    0x9370, 0xBC2F,
+    0x93DB, 0xBB85,
+    0x9447, 0xBADB,
+    0x94B5, 0xBA32,
+    0x9523, 0xB98A,
+    0x9592, 0xB8E3,
+    0x9602, 0xB83C,
+    0x9673, 0xB796,
+    0x96E6, 0xB6F0,
+    0x9759, 0xB64B,
+    0x97CD, 0xB5A7,
+    0x9842, 0xB504,
+    0x98B9, 0xB461,
+    0x9930, 0xB3C0,
+    0x99A8, 0xB31E,
+    0x9A22, 0xB27E,
+    0x9A9C, 0xB1DE,
+    0x9B17, 0xB140,
+    0x9B93, 0xB0A1,
+    0x9C10, 0xB004,
+    0x9C8E, 0xAF68,
+    0x9D0D, 0xAECC,
+    0x9D8E, 0xAE31,
+    0x9E0E, 0xAD96,
+    0x9E90, 0xACFD,
+    0x9F13, 0xAC64,
+    0x9F97, 0xABCC,
+    0xA01C, 0xAB35,
+    0xA0A1, 0xAA9F,
+    0xA128, 0xAA0A,
+    0xA1AF, 0xA975,
+    0xA238, 0xA8E2,
+    0xA2C1, 0xA84F,
+    0xA34B, 0xA7BD,
+    0xA3D6, 0xA72B,
+    0xA462, 0xA69B,
+    0xA4EF, 0xA60C,
+    0xA57D, 0xA57D,
+    0xA60C, 0xA4EF,
+    0xA69B, 0xA462,
+    0xA72B, 0xA3D6,
+    0xA7BD, 0xA34B,
+    0xA84F, 0xA2C1,
+    0xA8E2, 0xA238,
+    0xA975, 0xA1AF,
+    0xAA0A, 0xA128,
+    0xAA9F, 0xA0A1,
+    0xAB35, 0xA01C,
+    0xABCC, 0x9F97,
+    0xAC64, 0x9F13,
+    0xACFD, 0x9E90,
+    0xAD96, 0x9E0E,
+    0xAE31, 0x9D8E,
+    0xAECC, 0x9D0D,
+    0xAF68, 0x9C8E,
+    0xB004, 0x9C10,
+    0xB0A1, 0x9B93,
+    0xB140, 0x9B17,
+    0xB1DE, 0x9A9C,
+    0xB27E, 0x9A22,
+    0xB31E, 0x99A8,
+    0xB3C0, 0x9930,
+    0xB461, 0x98B9,
+    0xB504, 0x9842,
+    0xB5A7, 0x97CD,
+    0xB64B, 0x9759,
+    0xB6F0, 0x96E6,
+    0xB796, 0x9673,
+    0xB83C, 0x9602,
+    0xB8E3, 0x9592,
+    0xB98A, 0x9523,
+    0xBA32, 0x94B5,
+    0xBADB, 0x9447,
+    0xBB85, 0x93DB,
+    0xBC2F, 0x9370,
+    0xBCDA, 0x9306,
+    0xBD85, 0x929D,
+    0xBE31, 0x9235,
+    0xBEDE, 0x91CF,
+    0xBF8C, 0x9169,
+    0xC03A, 0x9104,
+    0xC0E8, 0x90A0,
+    0xC197, 0x903E,
+    0xC247, 0x8FDC,
+    0xC2F8, 0x8F7C,
+    0xC3A9, 0x8F1D,
+    0xC45A, 0x8EBE,
+    0xC50D, 0x8E61,
+    0xC5BF, 0x8E05,
+    0xC673, 0x8DAA,
+    0xC727, 0x8D50,
+    0xC7DB, 0x8CF8,
+    0xC890, 0x8CA0,
+    0xC945, 0x8C4A,
+    0xC9FB, 0x8BF4,
+    0xCAB2, 0x8BA0,
+    0xCB69, 0x8B4D,
+    0xCC21, 0x8AFB,
+    0xCCD9, 0x8AAA,
+    0xCD91, 0x8A5A,
+    0xCE4A, 0x8A0B,
+    0xCF04, 0x89BE,
+    0xCFBE, 0x8971,
+    0xD078, 0x8926,
+    0xD133, 0x88DC,
+    0xD1EE, 0x8893,
+    0xD2AA, 0x884B,
+    0xD367, 0x8805,
+    0xD423, 0x87BF,
+    0xD4E0, 0x877B,
+    0xD59E, 0x8738,
+    0xD65C, 0x86F6,
+    0xD71A, 0x86B5,
+    0xD7D9, 0x8675,
+    0xD898, 0x8637,
+    0xD957, 0x85FA,
+    0xDA17, 0x85BD,
+    0xDAD7, 0x8582,
+    0xDB98, 0x8549,
+    0xDC59, 0x8510,
+    0xDD1A, 0x84D9,
+    0xDDDC, 0x84A2,
+    0xDE9E, 0x846D,
+    0xDF60, 0x843A,
+    0xE023, 0x8407,
+    0xE0E6, 0x83D6,
+    0xE1A9, 0x83A5,
+    0xE26C, 0x8376,
+    0xE330, 0x8348,
+    0xE3F4, 0x831C,
+    0xE4B8, 0x82F0,
+    0xE57D, 0x82C6,
+    0xE642, 0x829D,
+    0xE707, 0x8275,
+    0xE7CC, 0x824F,
+    0xE892, 0x8229,
+    0xE957, 0x8205,
+    0xEA1D, 0x81E2,
+    0xEAE4, 0x81C0,
+    0xEBAA, 0x81A0,
+    0xEC71, 0x8180,
+    0xED37, 0x8162,
+    0xEDFE, 0x8145,
+    0xEEC6, 0x812A,
+    0xEF8D, 0x810F,
+    0xF054, 0x80F6,
+    0xF11C, 0x80DE,
+    0xF1E4, 0x80C7,
+    0xF2AC, 0x80B2,
+    0xF374, 0x809D,
+    0xF43C, 0x808A,
+    0xF504, 0x8078,
+    0xF5CC, 0x8068,
+    0xF695, 0x8058,
+    0xF75D, 0x804A,
+    0xF826, 0x803D,
+    0xF8EF, 0x8031,
+    0xF9B8, 0x8027,
+    0xFA80, 0x801E,
+    0xFB49, 0x8016,
+    0xFC12, 0x800F,
+    0xFCDB, 0x8009,
+    0xFDA4, 0x8005,
+    0xFE6D, 0x8002,
+    0xFF36, 0x8000
+};
+
+/**
+* \par
+* Example code for q15 twiddle factor generation:
+* \par
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {
+*    twiddleCoefq15[2*i]   = cos(i * 2*PI/(float)N);
+*    twiddleCoefq15[2*i+1] = sin(i * 2*PI/(float)N);
+* } </pre>
+* \par
+* where N = 2048 and PI = 3.14159265358979
+* \par
+* Cos and sin values are stored interleaved (cos at even indices, sin at odd indices).
+* \par
+* Convert floating point to q15 (fixed point 1.15):
+*	round(twiddleCoefq15(i) * pow(2, 15))
+* \par
+* The result is saturated to the q15 range, so 1.0 is stored as 0x7FFF.
+*
+*/
+const q15_t twiddleCoef_2048_q15[3072] = {
+    0x7FFF, 0x0000,
+    0x7FFF, 0x0064,
+    0x7FFF, 0x00C9,
+    0x7FFE, 0x012D,
+    0x7FFD, 0x0192,
+    0x7FFC, 0x01F6,
+    0x7FFA, 0x025B,
+    0x7FF8, 0x02BF,
+    0x7FF6, 0x0324,
+    0x7FF3, 0x0388,
+    0x7FF0, 0x03ED,
+    0x7FED, 0x0451,
+    0x7FE9, 0x04B6,
+    0x7FE5, 0x051A,
+    0x7FE1, 0x057F,
+    0x7FDD, 0x05E3,
+    0x7FD8, 0x0647,
+    0x7FD3, 0x06AC,
+    0x7FCE, 0x0710,
+    0x7FC8, 0x0775,
+    0x7FC2, 0x07D9,
+    0x7FBC, 0x083D,
+    0x7FB5, 0x08A2,
+    0x7FAE, 0x0906,
+    0x7FA7, 0x096A,
+    0x7F9F, 0x09CE,
+    0x7F97, 0x0A33,
+    0x7F8F, 0x0A97,
+    0x7F87, 0x0AFB,
+    0x7F7E, 0x0B5F,
+    0x7F75, 0x0BC3,
+    0x7F6B, 0x0C27,
+    0x7F62, 0x0C8B,
+    0x7F58, 0x0CEF,
+    0x7F4D, 0x0D53,
+    0x7F43, 0x0DB7,
+    0x7F38, 0x0E1B,
+    0x7F2D, 0x0E7F,
+    0x7F21, 0x0EE3,
+    0x7F15, 0x0F47,
+    0x7F09, 0x0FAB,
+    0x7EFD, 0x100E,
+    0x7EF0, 0x1072,
+    0x7EE3, 0x10D6,
+    0x7ED5, 0x1139,
+    0x7EC8, 0x119D,
+    0x7EBA, 0x1201,
+    0x7EAB, 0x1264,
+    0x7E9D, 0x12C8,
+    0x7E8E, 0x132B,
+    0x7E7F, 0x138E,
+    0x7E6F, 0x13F2,
+    0x7E5F, 0x1455,
+    0x7E4F, 0x14B8,
+    0x7E3F, 0x151B,
+    0x7E2E, 0x157F,
+    0x7E1D, 0x15E2,
+    0x7E0C, 0x1645,
+    0x7DFA, 0x16A8,
+    0x7DE8, 0x170A,
+    0x7DD6, 0x176D,
+    0x7DC3, 0x17D0,
+    0x7DB0, 0x1833,
+    0x7D9D, 0x1896,
+    0x7D8A, 0x18F8,
+    0x7D76, 0x195B,
+    0x7D62, 0x19BD,
+    0x7D4E, 0x1A20,
+    0x7D39, 0x1A82,
+    0x7D24, 0x1AE4,
+    0x7D0F, 0x1B47,
+    0x7CF9, 0x1BA9,
+    0x7CE3, 0x1C0B,
+    0x7CCD, 0x1C6D,
+    0x7CB7, 0x1CCF,
+    0x7CA0, 0x1D31,
+    0x7C89, 0x1D93,
+    0x7C71, 0x1DF5,
+    0x7C5A, 0x1E56,
+    0x7C42, 0x1EB8,
+    0x7C29, 0x1F19,
+    0x7C11, 0x1F7B,
+    0x7BF8, 0x1FDC,
+    0x7BDF, 0x203E,
+    0x7BC5, 0x209F,
+    0x7BAC, 0x2100,
+    0x7B92, 0x2161,
+    0x7B77, 0x21C2,
+    0x7B5D, 0x2223,
+    0x7B42, 0x2284,
+    0x7B26, 0x22E5,
+    0x7B0B, 0x2345,
+    0x7AEF, 0x23A6,
+    0x7AD3, 0x2407,
+    0x7AB6, 0x2467,
+    0x7A9A, 0x24C7,
+    0x7A7D, 0x2528,
+    0x7A5F, 0x2588,
+    0x7A42, 0x25E8,
+    0x7A24, 0x2648,
+    0x7A05, 0x26A8,
+    0x79E7, 0x2707,
+    0x79C8, 0x2767,
+    0x79A9, 0x27C7,
+    0x798A, 0x2826,
+    0x796A, 0x2886,
+    0x794A, 0x28E5,
+    0x792A, 0x2944,
+    0x7909, 0x29A3,
+    0x78E8, 0x2A02,
+    0x78C7, 0x2A61,
+    0x78A6, 0x2AC0,
+    0x7884, 0x2B1F,
+    0x7862, 0x2B7D,
+    0x7840, 0x2BDC,
+    0x781D, 0x2C3A,
+    0x77FA, 0x2C98,
+    0x77D7, 0x2CF7,
+    0x77B4, 0x2D55,
+    0x7790, 0x2DB3,
+    0x776C, 0x2E11,
+    0x7747, 0x2E6E,
+    0x7723, 0x2ECC,
+    0x76FE, 0x2F29,
+    0x76D9, 0x2F87,
+    0x76B3, 0x2FE4,
+    0x768E, 0x3041,
+    0x7668, 0x309E,
+    0x7641, 0x30FB,
+    0x761B, 0x3158,
+    0x75F4, 0x31B5,
+    0x75CC, 0x3211,
+    0x75A5, 0x326E,
+    0x757D, 0x32CA,
+    0x7555, 0x3326,
+    0x752D, 0x3382,
+    0x7504, 0x33DE,
+    0x74DB, 0x343A,
+    0x74B2, 0x3496,
+    0x7489, 0x34F2,
+    0x745F, 0x354D,
+    0x7435, 0x35A8,
+    0x740B, 0x3604,
+    0x73E0, 0x365F,
+    0x73B5, 0x36BA,
+    0x738A, 0x3714,
+    0x735F, 0x376F,
+    0x7333, 0x37CA,
+    0x7307, 0x3824,
+    0x72DB, 0x387E,
+    0x72AF, 0x38D8,
+    0x7282, 0x3932,
+    0x7255, 0x398C,
+    0x7227, 0x39E6,
+    0x71FA, 0x3A40,
+    0x71CC, 0x3A99,
+    0x719E, 0x3AF2,
+    0x716F, 0x3B4C,
+    0x7141, 0x3BA5,
+    0x7112, 0x3BFD,
+    0x70E2, 0x3C56,
+    0x70B3, 0x3CAF,
+    0x7083, 0x3D07,
+    0x7053, 0x3D60,
+    0x7023, 0x3DB8,
+    0x6FF2, 0x3E10,
+    0x6FC1, 0x3E68,
+    0x6F90, 0x3EBF,
+    0x6F5F, 0x3F17,
+    0x6F2D, 0x3F6E,
+    0x6EFB, 0x3FC5,
+    0x6EC9, 0x401D,
+    0x6E96, 0x4073,
+    0x6E63, 0x40CA,
+    0x6E30, 0x4121,
+    0x6DFD, 0x4177,
+    0x6DCA, 0x41CE,
+    0x6D96, 0x4224,
+    0x6D62, 0x427A,
+    0x6D2D, 0x42D0,
+    0x6CF9, 0x4325,
+    0x6CC4, 0x437B,
+    0x6C8F, 0x43D0,
+    0x6C59, 0x4425,
+    0x6C24, 0x447A,
+    0x6BEE, 0x44CF,
+    0x6BB8, 0x4524,
+    0x6B81, 0x4578,
+    0x6B4A, 0x45CD,
+    0x6B13, 0x4621,
+    0x6ADC, 0x4675,
+    0x6AA5, 0x46C9,
+    0x6A6D, 0x471C,
+    0x6A35, 0x4770,
+    0x69FD, 0x47C3,
+    0x69C4, 0x4816,
+    0x698C, 0x4869,
+    0x6953, 0x48BC,
+    0x6919, 0x490F,
+    0x68E0, 0x4961,
+    0x68A6, 0x49B4,
+    0x686C, 0x4A06,
+    0x6832, 0x4A58,
+    0x67F7, 0x4AA9,
+    0x67BD, 0x4AFB,
+    0x6782, 0x4B4C,
+    0x6746, 0x4B9E,
+    0x670B, 0x4BEF,
+    0x66CF, 0x4C3F,
+    0x6693, 0x4C90,
+    0x6657, 0x4CE1,
+    0x661A, 0x4D31,
+    0x65DD, 0x4D81,
+    0x65A0, 0x4DD1,
+    0x6563, 0x4E21,
+    0x6526, 0x4E70,
+    0x64E8, 0x4EBF,
+    0x64AA, 0x4F0F,
+    0x646C, 0x4F5E,
+    0x642D, 0x4FAC,
+    0x63EF, 0x4FFB,
+    0x63B0, 0x5049,
+    0x6371, 0x5097,
+    0x6331, 0x50E5,
+    0x62F2, 0x5133,
+    0x62B2, 0x5181,
+    0x6271, 0x51CE,
+    0x6231, 0x521C,
+    0x61F1, 0x5269,
+    0x61B0, 0x52B5,
+    0x616F, 0x5302,
+    0x612D, 0x534E,
+    0x60EC, 0x539B,
+    0x60AA, 0x53E7,
+    0x6068, 0x5433,
+    0x6026, 0x547E,
+    0x5FE3, 0x54CA,
+    0x5FA0, 0x5515,
+    0x5F5E, 0x5560,
+    0x5F1A, 0x55AB,
+    0x5ED7, 0x55F5,
+    0x5E93, 0x5640,
+    0x5E50, 0x568A,
+    0x5E0B, 0x56D4,
+    0x5DC7, 0x571D,
+    0x5D83, 0x5767,
+    0x5D3E, 0x57B0,
+    0x5CF9, 0x57F9,
+    0x5CB4, 0x5842,
+    0x5C6E, 0x588B,
+    0x5C29, 0x58D4,
+    0x5BE3, 0x591C,
+    0x5B9D, 0x5964,
+    0x5B56, 0x59AC,
+    0x5B10, 0x59F3,
+    0x5AC9, 0x5A3B,
+    0x5A82, 0x5A82,
+    0x5A3B, 0x5AC9,
+    0x59F3, 0x5B10,
+    0x59AC, 0x5B56,
+    0x5964, 0x5B9D,
+    0x591C, 0x5BE3,
+    0x58D4, 0x5C29,
+    0x588B, 0x5C6E,
+    0x5842, 0x5CB4,
+    0x57F9, 0x5CF9,
+    0x57B0, 0x5D3E,
+    0x5767, 0x5D83,
+    0x571D, 0x5DC7,
+    0x56D4, 0x5E0B,
+    0x568A, 0x5E50,
+    0x5640, 0x5E93,
+    0x55F5, 0x5ED7,
+    0x55AB, 0x5F1A,
+    0x5560, 0x5F5E,
+    0x5515, 0x5FA0,
+    0x54CA, 0x5FE3,
+    0x547E, 0x6026,
+    0x5433, 0x6068,
+    0x53E7, 0x60AA,
+    0x539B, 0x60EC,
+    0x534E, 0x612D,
+    0x5302, 0x616F,
+    0x52B5, 0x61B0,
+    0x5269, 0x61F1,
+    0x521C, 0x6231,
+    0x51CE, 0x6271,
+    0x5181, 0x62B2,
+    0x5133, 0x62F2,
+    0x50E5, 0x6331,
+    0x5097, 0x6371,
+    0x5049, 0x63B0,
+    0x4FFB, 0x63EF,
+    0x4FAC, 0x642D,
+    0x4F5E, 0x646C,
+    0x4F0F, 0x64AA,
+    0x4EBF, 0x64E8,
+    0x4E70, 0x6526,
+    0x4E21, 0x6563,
+    0x4DD1, 0x65A0,
+    0x4D81, 0x65DD,
+    0x4D31, 0x661A,
+    0x4CE1, 0x6657,
+    0x4C90, 0x6693,
+    0x4C3F, 0x66CF,
+    0x4BEF, 0x670B,
+    0x4B9E, 0x6746,
+    0x4B4C, 0x6782,
+    0x4AFB, 0x67BD,
+    0x4AA9, 0x67F7,
+    0x4A58, 0x6832,
+    0x4A06, 0x686C,
+    0x49B4, 0x68A6,
+    0x4961, 0x68E0,
+    0x490F, 0x6919,
+    0x48BC, 0x6953,
+    0x4869, 0x698C,
+    0x4816, 0x69C4,
+    0x47C3, 0x69FD,
+    0x4770, 0x6A35,
+    0x471C, 0x6A6D,
+    0x46C9, 0x6AA5,
+    0x4675, 0x6ADC,
+    0x4621, 0x6B13,
+    0x45CD, 0x6B4A,
+    0x4578, 0x6B81,
+    0x4524, 0x6BB8,
+    0x44CF, 0x6BEE,
+    0x447A, 0x6C24,
+    0x4425, 0x6C59,
+    0x43D0, 0x6C8F,
+    0x437B, 0x6CC4,
+    0x4325, 0x6CF9,
+    0x42D0, 0x6D2D,
+    0x427A, 0x6D62,
+    0x4224, 0x6D96,
+    0x41CE, 0x6DCA,
+    0x4177, 0x6DFD,
+    0x4121, 0x6E30,
+    0x40CA, 0x6E63,
+    0x4073, 0x6E96,
+    0x401D, 0x6EC9,
+    0x3FC5, 0x6EFB,
+    0x3F6E, 0x6F2D,
+    0x3F17, 0x6F5F,
+    0x3EBF, 0x6F90,
+    0x3E68, 0x6FC1,
+    0x3E10, 0x6FF2,
+    0x3DB8, 0x7023,
+    0x3D60, 0x7053,
+    0x3D07, 0x7083,
+    0x3CAF, 0x70B3,
+    0x3C56, 0x70E2,
+    0x3BFD, 0x7112,
+    0x3BA5, 0x7141,
+    0x3B4C, 0x716F,
+    0x3AF2, 0x719E,
+    0x3A99, 0x71CC,
+    0x3A40, 0x71FA,
+    0x39E6, 0x7227,
+    0x398C, 0x7255,
+    0x3932, 0x7282,
+    0x38D8, 0x72AF,
+    0x387E, 0x72DB,
+    0x3824, 0x7307,
+    0x37CA, 0x7333,
+    0x376F, 0x735F,
+    0x3714, 0x738A,
+    0x36BA, 0x73B5,
+    0x365F, 0x73E0,
+    0x3604, 0x740B,
+    0x35A8, 0x7435,
+    0x354D, 0x745F,
+    0x34F2, 0x7489,
+    0x3496, 0x74B2,
+    0x343A, 0x74DB,
+    0x33DE, 0x7504,
+    0x3382, 0x752D,
+    0x3326, 0x7555,
+    0x32CA, 0x757D,
+    0x326E, 0x75A5,
+    0x3211, 0x75CC,
+    0x31B5, 0x75F4,
+    0x3158, 0x761B,
+    0x30FB, 0x7641,
+    0x309E, 0x7668,
+    0x3041, 0x768E,
+    0x2FE4, 0x76B3,
+    0x2F87, 0x76D9,
+    0x2F29, 0x76FE,
+    0x2ECC, 0x7723,
+    0x2E6E, 0x7747,
+    0x2E11, 0x776C,
+    0x2DB3, 0x7790,
+    0x2D55, 0x77B4,
+    0x2CF7, 0x77D7,
+    0x2C98, 0x77FA,
+    0x2C3A, 0x781D,
+    0x2BDC, 0x7840,
+    0x2B7D, 0x7862,
+    0x2B1F, 0x7884,
+    0x2AC0, 0x78A6,
+    0x2A61, 0x78C7,
+    0x2A02, 0x78E8,
+    0x29A3, 0x7909,
+    0x2944, 0x792A,
+    0x28E5, 0x794A,
+    0x2886, 0x796A,
+    0x2826, 0x798A,
+    0x27C7, 0x79A9,
+    0x2767, 0x79C8,
+    0x2707, 0x79E7,
+    0x26A8, 0x7A05,
+    0x2648, 0x7A24,
+    0x25E8, 0x7A42,
+    0x2588, 0x7A5F,
+    0x2528, 0x7A7D,
+    0x24C7, 0x7A9A,
+    0x2467, 0x7AB6,
+    0x2407, 0x7AD3,
+    0x23A6, 0x7AEF,
+    0x2345, 0x7B0B,
+    0x22E5, 0x7B26,
+    0x2284, 0x7B42,
+    0x2223, 0x7B5D,
+    0x21C2, 0x7B77,
+    0x2161, 0x7B92,
+    0x2100, 0x7BAC,
+    0x209F, 0x7BC5,
+    0x203E, 0x7BDF,
+    0x1FDC, 0x7BF8,
+    0x1F7B, 0x7C11,
+    0x1F19, 0x7C29,
+    0x1EB8, 0x7C42,
+    0x1E56, 0x7C5A,
+    0x1DF5, 0x7C71,
+    0x1D93, 0x7C89,
+    0x1D31, 0x7CA0,
+    0x1CCF, 0x7CB7,
+    0x1C6D, 0x7CCD,
+    0x1C0B, 0x7CE3,
+    0x1BA9, 0x7CF9,
+    0x1B47, 0x7D0F,
+    0x1AE4, 0x7D24,
+    0x1A82, 0x7D39,
+    0x1A20, 0x7D4E,
+    0x19BD, 0x7D62,
+    0x195B, 0x7D76,
+    0x18F8, 0x7D8A,
+    0x1896, 0x7D9D,
+    0x1833, 0x7DB0,
+    0x17D0, 0x7DC3,
+    0x176D, 0x7DD6,
+    0x170A, 0x7DE8,
+    0x16A8, 0x7DFA,
+    0x1645, 0x7E0C,
+    0x15E2, 0x7E1D,
+    0x157F, 0x7E2E,
+    0x151B, 0x7E3F,
+    0x14B8, 0x7E4F,
+    0x1455, 0x7E5F,
+    0x13F2, 0x7E6F,
+    0x138E, 0x7E7F,
+    0x132B, 0x7E8E,
+    0x12C8, 0x7E9D,
+    0x1264, 0x7EAB,
+    0x1201, 0x7EBA,
+    0x119D, 0x7EC8,
+    0x1139, 0x7ED5,
+    0x10D6, 0x7EE3,
+    0x1072, 0x7EF0,
+    0x100E, 0x7EFD,
+    0x0FAB, 0x7F09,
+    0x0F47, 0x7F15,
+    0x0EE3, 0x7F21,
+    0x0E7F, 0x7F2D,
+    0x0E1B, 0x7F38,
+    0x0DB7, 0x7F43,
+    0x0D53, 0x7F4D,
+    0x0CEF, 0x7F58,
+    0x0C8B, 0x7F62,
+    0x0C27, 0x7F6B,
+    0x0BC3, 0x7F75,
+    0x0B5F, 0x7F7E,
+    0x0AFB, 0x7F87,
+    0x0A97, 0x7F8F,
+    0x0A33, 0x7F97,
+    0x09CE, 0x7F9F,
+    0x096A, 0x7FA7,
+    0x0906, 0x7FAE,
+    0x08A2, 0x7FB5,
+    0x083D, 0x7FBC,
+    0x07D9, 0x7FC2,
+    0x0775, 0x7FC8,
+    0x0710, 0x7FCE,
+    0x06AC, 0x7FD3,
+    0x0647, 0x7FD8,
+    0x05E3, 0x7FDD,
+    0x057F, 0x7FE1,
+    0x051A, 0x7FE5,
+    0x04B6, 0x7FE9,
+    0x0451, 0x7FED,
+    0x03ED, 0x7FF0,
+    0x0388, 0x7FF3,
+    0x0324, 0x7FF6,
+    0x02BF, 0x7FF8,
+    0x025B, 0x7FFA,
+    0x01F6, 0x7FFC,
+    0x0192, 0x7FFD,
+    0x012D, 0x7FFE,
+    0x00C9, 0x7FFF,
+    0x0064, 0x7FFF,
+    0x0000, 0x7FFF,
+    0xFF9B, 0x7FFF,
+    0xFF36, 0x7FFF,
+    0xFED2, 0x7FFE,
+    0xFE6D, 0x7FFD,
+    0xFE09, 0x7FFC,
+    0xFDA4, 0x7FFA,
+    0xFD40, 0x7FF8,
+    0xFCDB, 0x7FF6,
+    0xFC77, 0x7FF3,
+    0xFC12, 0x7FF0,
+    0xFBAE, 0x7FED,
+    0xFB49, 0x7FE9,
+    0xFAE5, 0x7FE5,
+    0xFA80, 0x7FE1,
+    0xFA1C, 0x7FDD,
+    0xF9B8, 0x7FD8,
+    0xF953, 0x7FD3,
+    0xF8EF, 0x7FCE,
+    0xF88A, 0x7FC8,
+    0xF826, 0x7FC2,
+    0xF7C2, 0x7FBC,
+    0xF75D, 0x7FB5,
+    0xF6F9, 0x7FAE,
+    0xF695, 0x7FA7,
+    0xF631, 0x7F9F,
+    0xF5CC, 0x7F97,
+    0xF568, 0x7F8F,
+    0xF504, 0x7F87,
+    0xF4A0, 0x7F7E,
+    0xF43C, 0x7F75,
+    0xF3D8, 0x7F6B,
+    0xF374, 0x7F62,
+    0xF310, 0x7F58,
+    0xF2AC, 0x7F4D,
+    0xF248, 0x7F43,
+    0xF1E4, 0x7F38,
+    0xF180, 0x7F2D,
+    0xF11C, 0x7F21,
+    0xF0B8, 0x7F15,
+    0xF054, 0x7F09,
+    0xEFF1, 0x7EFD,
+    0xEF8D, 0x7EF0,
+    0xEF29, 0x7EE3,
+    0xEEC6, 0x7ED5,
+    0xEE62, 0x7EC8,
+    0xEDFE, 0x7EBA,
+    0xED9B, 0x7EAB,
+    0xED37, 0x7E9D,
+    0xECD4, 0x7E8E,
+    0xEC71, 0x7E7F,
+    0xEC0D, 0x7E6F,
+    0xEBAA, 0x7E5F,
+    0xEB47, 0x7E4F,
+    0xEAE4, 0x7E3F,
+    0xEA80, 0x7E2E,
+    0xEA1D, 0x7E1D,
+    0xE9BA, 0x7E0C,
+    0xE957, 0x7DFA,
+    0xE8F5, 0x7DE8,
+    0xE892, 0x7DD6,
+    0xE82F, 0x7DC3,
+    0xE7CC, 0x7DB0,
+    0xE769, 0x7D9D,
+    0xE707, 0x7D8A,
+    0xE6A4, 0x7D76,
+    0xE642, 0x7D62,
+    0xE5DF, 0x7D4E,
+    0xE57D, 0x7D39,
+    0xE51B, 0x7D24,
+    0xE4B8, 0x7D0F,
+    0xE456, 0x7CF9,
+    0xE3F4, 0x7CE3,
+    0xE392, 0x7CCD,
+    0xE330, 0x7CB7,
+    0xE2CE, 0x7CA0,
+    0xE26C, 0x7C89,
+    0xE20A, 0x7C71,
+    0xE1A9, 0x7C5A,
+    0xE147, 0x7C42,
+    0xE0E6, 0x7C29,
+    0xE084, 0x7C11,
+    0xE023, 0x7BF8,
+    0xDFC1, 0x7BDF,
+    0xDF60, 0x7BC5,
+    0xDEFF, 0x7BAC,
+    0xDE9E, 0x7B92,
+    0xDE3D, 0x7B77,
+    0xDDDC, 0x7B5D,
+    0xDD7B, 0x7B42,
+    0xDD1A, 0x7B26,
+    0xDCBA, 0x7B0B,
+    0xDC59, 0x7AEF,
+    0xDBF8, 0x7AD3,
+    0xDB98, 0x7AB6,
+    0xDB38, 0x7A9A,
+    0xDAD7, 0x7A7D,
+    0xDA77, 0x7A5F,
+    0xDA17, 0x7A42,
+    0xD9B7, 0x7A24,
+    0xD957, 0x7A05,
+    0xD8F8, 0x79E7,
+    0xD898, 0x79C8,
+    0xD838, 0x79A9,
+    0xD7D9, 0x798A,
+    0xD779, 0x796A,
+    0xD71A, 0x794A,
+    0xD6BB, 0x792A,
+    0xD65C, 0x7909,
+    0xD5FD, 0x78E8,
+    0xD59E, 0x78C7,
+    0xD53F, 0x78A6,
+    0xD4E0, 0x7884,
+    0xD482, 0x7862,
+    0xD423, 0x7840,
+    0xD3C5, 0x781D,
+    0xD367, 0x77FA,
+    0xD308, 0x77D7,
+    0xD2AA, 0x77B4,
+    0xD24C, 0x7790,
+    0xD1EE, 0x776C,
+    0xD191, 0x7747,
+    0xD133, 0x7723,
+    0xD0D6, 0x76FE,
+    0xD078, 0x76D9,
+    0xD01B, 0x76B3,
+    0xCFBE, 0x768E,
+    0xCF61, 0x7668,
+    0xCF04, 0x7641,
+    0xCEA7, 0x761B,
+    0xCE4A, 0x75F4,
+    0xCDEE, 0x75CC,
+    0xCD91, 0x75A5,
+    0xCD35, 0x757D,
+    0xCCD9, 0x7555,
+    0xCC7D, 0x752D,
+    0xCC21, 0x7504,
+    0xCBC5, 0x74DB,
+    0xCB69, 0x74B2,
+    0xCB0D, 0x7489,
+    0xCAB2, 0x745F,
+    0xCA57, 0x7435,
+    0xC9FB, 0x740B,
+    0xC9A0, 0x73E0,
+    0xC945, 0x73B5,
+    0xC8EB, 0x738A,
+    0xC890, 0x735F,
+    0xC835, 0x7333,
+    0xC7DB, 0x7307,
+    0xC781, 0x72DB,
+    0xC727, 0x72AF,
+    0xC6CD, 0x7282,
+    0xC673, 0x7255,
+    0xC619, 0x7227,
+    0xC5BF, 0x71FA,
+    0xC566, 0x71CC,
+    0xC50D, 0x719E,
+    0xC4B3, 0x716F,
+    0xC45A, 0x7141,
+    0xC402, 0x7112,
+    0xC3A9, 0x70E2,
+    0xC350, 0x70B3,
+    0xC2F8, 0x7083,
+    0xC29F, 0x7053,
+    0xC247, 0x7023,
+    0xC1EF, 0x6FF2,
+    0xC197, 0x6FC1,
+    0xC140, 0x6F90,
+    0xC0E8, 0x6F5F,
+    0xC091, 0x6F2D,
+    0xC03A, 0x6EFB,
+    0xBFE2, 0x6EC9,
+    0xBF8C, 0x6E96,
+    0xBF35, 0x6E63,
+    0xBEDE, 0x6E30,
+    0xBE88, 0x6DFD,
+    0xBE31, 0x6DCA,
+    0xBDDB, 0x6D96,
+    0xBD85, 0x6D62,
+    0xBD2F, 0x6D2D,
+    0xBCDA, 0x6CF9,
+    0xBC84, 0x6CC4,
+    0xBC2F, 0x6C8F,
+    0xBBDA, 0x6C59,
+    0xBB85, 0x6C24,
+    0xBB30, 0x6BEE,
+    0xBADB, 0x6BB8,
+    0xBA87, 0x6B81,
+    0xBA32, 0x6B4A,
+    0xB9DE, 0x6B13,
+    0xB98A, 0x6ADC,
+    0xB936, 0x6AA5,
+    0xB8E3, 0x6A6D,
+    0xB88F, 0x6A35,
+    0xB83C, 0x69FD,
+    0xB7E9, 0x69C4,
+    0xB796, 0x698C,
+    0xB743, 0x6953,
+    0xB6F0, 0x6919,
+    0xB69E, 0x68E0,
+    0xB64B, 0x68A6,
+    0xB5F9, 0x686C,
+    0xB5A7, 0x6832,
+    0xB556, 0x67F7,
+    0xB504, 0x67BD,
+    0xB4B3, 0x6782,
+    0xB461, 0x6746,
+    0xB410, 0x670B,
+    0xB3C0, 0x66CF,
+    0xB36F, 0x6693,
+    0xB31E, 0x6657,
+    0xB2CE, 0x661A,
+    0xB27E, 0x65DD,
+    0xB22E, 0x65A0,
+    0xB1DE, 0x6563,
+    0xB18F, 0x6526,
+    0xB140, 0x64E8,
+    0xB0F0, 0x64AA,
+    0xB0A1, 0x646C,
+    0xB053, 0x642D,
+    0xB004, 0x63EF,
+    0xAFB6, 0x63B0,
+    0xAF68, 0x6371,
+    0xAF1A, 0x6331,
+    0xAECC, 0x62F2,
+    0xAE7E, 0x62B2,
+    0xAE31, 0x6271,
+    0xADE3, 0x6231,
+    0xAD96, 0x61F1,
+    0xAD4A, 0x61B0,
+    0xACFD, 0x616F,
+    0xACB1, 0x612D,
+    0xAC64, 0x60EC,
+    0xAC18, 0x60AA,
+    0xABCC, 0x6068,
+    0xAB81, 0x6026,
+    0xAB35, 0x5FE3,
+    0xAAEA, 0x5FA0,
+    0xAA9F, 0x5F5E,
+    0xAA54, 0x5F1A,
+    0xAA0A, 0x5ED7,
+    0xA9BF, 0x5E93,
+    0xA975, 0x5E50,
+    0xA92B, 0x5E0B,
+    0xA8E2, 0x5DC7,
+    0xA898, 0x5D83,
+    0xA84F, 0x5D3E,
+    0xA806, 0x5CF9,
+    0xA7BD, 0x5CB4,
+    0xA774, 0x5C6E,
+    0xA72B, 0x5C29,
+    0xA6E3, 0x5BE3,
+    0xA69B, 0x5B9D,
+    0xA653, 0x5B56,
+    0xA60C, 0x5B10,
+    0xA5C4, 0x5AC9,
+    0xA57D, 0x5A82,
+    0xA536, 0x5A3B,
+    0xA4EF, 0x59F3,
+    0xA4A9, 0x59AC,
+    0xA462, 0x5964,
+    0xA41C, 0x591C,
+    0xA3D6, 0x58D4,
+    0xA391, 0x588B,
+    0xA34B, 0x5842,
+    0xA306, 0x57F9,
+    0xA2C1, 0x57B0,
+    0xA27C, 0x5767,
+    0xA238, 0x571D,
+    0xA1F4, 0x56D4,
+    0xA1AF, 0x568A,
+    0xA16C, 0x5640,
+    0xA128, 0x55F5,
+    0xA0E5, 0x55AB,
+    0xA0A1, 0x5560,
+    0xA05F, 0x5515,
+    0xA01C, 0x54CA,
+    0x9FD9, 0x547E,
+    0x9F97, 0x5433,
+    0x9F55, 0x53E7,
+    0x9F13, 0x539B,
+    0x9ED2, 0x534E,
+    0x9E90, 0x5302,
+    0x9E4F, 0x52B5,
+    0x9E0E, 0x5269,
+    0x9DCE, 0x521C,
+    0x9D8E, 0x51CE,
+    0x9D4D, 0x5181,
+    0x9D0D, 0x5133,
+    0x9CCE, 0x50E5,
+    0x9C8E, 0x5097,
+    0x9C4F, 0x5049,
+    0x9C10, 0x4FFB,
+    0x9BD2, 0x4FAC,
+    0x9B93, 0x4F5E,
+    0x9B55, 0x4F0F,
+    0x9B17, 0x4EBF,
+    0x9AD9, 0x4E70,
+    0x9A9C, 0x4E21,
+    0x9A5F, 0x4DD1,
+    0x9A22, 0x4D81,
+    0x99E5, 0x4D31,
+    0x99A8, 0x4CE1,
+    0x996C, 0x4C90,
+    0x9930, 0x4C3F,
+    0x98F4, 0x4BEF,
+    0x98B9, 0x4B9E,
+    0x987D, 0x4B4C,
+    0x9842, 0x4AFB,
+    0x9808, 0x4AA9,
+    0x97CD, 0x4A58,
+    0x9793, 0x4A06,
+    0x9759, 0x49B4,
+    0x971F, 0x4961,
+    0x96E6, 0x490F,
+    0x96AC, 0x48BC,
+    0x9673, 0x4869,
+    0x963B, 0x4816,
+    0x9602, 0x47C3,
+    0x95CA, 0x4770,
+    0x9592, 0x471C,
+    0x955A, 0x46C9,
+    0x9523, 0x4675,
+    0x94EC, 0x4621,
+    0x94B5, 0x45CD,
+    0x947E, 0x4578,
+    0x9447, 0x4524,
+    0x9411, 0x44CF,
+    0x93DB, 0x447A,
+    0x93A6, 0x4425,
+    0x9370, 0x43D0,
+    0x933B, 0x437B,
+    0x9306, 0x4325,
+    0x92D2, 0x42D0,
+    0x929D, 0x427A,
+    0x9269, 0x4224,
+    0x9235, 0x41CE,
+    0x9202, 0x4177,
+    0x91CF, 0x4121,
+    0x919C, 0x40CA,
+    0x9169, 0x4073,
+    0x9136, 0x401D,
+    0x9104, 0x3FC5,
+    0x90D2, 0x3F6E,
+    0x90A0, 0x3F17,
+    0x906F, 0x3EBF,
+    0x903E, 0x3E68,
+    0x900D, 0x3E10,
+    0x8FDC, 0x3DB8,
+    0x8FAC, 0x3D60,
+    0x8F7C, 0x3D07,
+    0x8F4C, 0x3CAF,
+    0x8F1D, 0x3C56,
+    0x8EED, 0x3BFD,
+    0x8EBE, 0x3BA5,
+    0x8E90, 0x3B4C,
+    0x8E61, 0x3AF2,
+    0x8E33, 0x3A99,
+    0x8E05, 0x3A40,
+    0x8DD8, 0x39E6,
+    0x8DAA, 0x398C,
+    0x8D7D, 0x3932,
+    0x8D50, 0x38D8,
+    0x8D24, 0x387E,
+    0x8CF8, 0x3824,
+    0x8CCC, 0x37CA,
+    0x8CA0, 0x376F,
+    0x8C75, 0x3714,
+    0x8C4A, 0x36BA,
+    0x8C1F, 0x365F,
+    0x8BF4, 0x3604,
+    0x8BCA, 0x35A8,
+    0x8BA0, 0x354D,
+    0x8B76, 0x34F2,
+    0x8B4D, 0x3496,
+    0x8B24, 0x343A,
+    0x8AFB, 0x33DE,
+    0x8AD2, 0x3382,
+    0x8AAA, 0x3326,
+    0x8A82, 0x32CA,
+    0x8A5A, 0x326E,
+    0x8A33, 0x3211,
+    0x8A0B, 0x31B5,
+    0x89E4, 0x3158,
+    0x89BE, 0x30FB,
+    0x8997, 0x309E,
+    0x8971, 0x3041,
+    0x894C, 0x2FE4,
+    0x8926, 0x2F87,
+    0x8901, 0x2F29,
+    0x88DC, 0x2ECC,
+    0x88B8, 0x2E6E,
+    0x8893, 0x2E11,
+    0x886F, 0x2DB3,
+    0x884B, 0x2D55,
+    0x8828, 0x2CF7,
+    0x8805, 0x2C98,
+    0x87E2, 0x2C3A,
+    0x87BF, 0x2BDC,
+    0x879D, 0x2B7D,
+    0x877B, 0x2B1F,
+    0x8759, 0x2AC0,
+    0x8738, 0x2A61,
+    0x8717, 0x2A02,
+    0x86F6, 0x29A3,
+    0x86D5, 0x2944,
+    0x86B5, 0x28E5,
+    0x8695, 0x2886,
+    0x8675, 0x2826,
+    0x8656, 0x27C7,
+    0x8637, 0x2767,
+    0x8618, 0x2707,
+    0x85FA, 0x26A8,
+    0x85DB, 0x2648,
+    0x85BD, 0x25E8,
+    0x85A0, 0x2588,
+    0x8582, 0x2528,
+    0x8565, 0x24C7,
+    0x8549, 0x2467,
+    0x852C, 0x2407,
+    0x8510, 0x23A6,
+    0x84F4, 0x2345,
+    0x84D9, 0x22E5,
+    0x84BD, 0x2284,
+    0x84A2, 0x2223,
+    0x8488, 0x21C2,
+    0x846D, 0x2161,
+    0x8453, 0x2100,
+    0x843A, 0x209F,
+    0x8420, 0x203E,
+    0x8407, 0x1FDC,
+    0x83EE, 0x1F7B,
+    0x83D6, 0x1F19,
+    0x83BD, 0x1EB8,
+    0x83A5, 0x1E56,
+    0x838E, 0x1DF5,
+    0x8376, 0x1D93,
+    0x835F, 0x1D31,
+    0x8348, 0x1CCF,
+    0x8332, 0x1C6D,
+    0x831C, 0x1C0B,
+    0x8306, 0x1BA9,
+    0x82F0, 0x1B47,
+    0x82DB, 0x1AE4,
+    0x82C6, 0x1A82,
+    0x82B1, 0x1A20,
+    0x829D, 0x19BD,
+    0x8289, 0x195B,
+    0x8275, 0x18F8,
+    0x8262, 0x1896,
+    0x824F, 0x1833,
+    0x823C, 0x17D0,
+    0x8229, 0x176D,
+    0x8217, 0x170A,
+    0x8205, 0x16A8,
+    0x81F3, 0x1645,
+    0x81E2, 0x15E2,
+    0x81D1, 0x157F,
+    0x81C0, 0x151B,
+    0x81B0, 0x14B8,
+    0x81A0, 0x1455,
+    0x8190, 0x13F2,
+    0x8180, 0x138E,
+    0x8171, 0x132B,
+    0x8162, 0x12C8,
+    0x8154, 0x1264,
+    0x8145, 0x1201,
+    0x8137, 0x119D,
+    0x812A, 0x1139,
+    0x811C, 0x10D6,
+    0x810F, 0x1072,
+    0x8102, 0x100E,
+    0x80F6, 0x0FAB,
+    0x80EA, 0x0F47,
+    0x80DE, 0x0EE3,
+    0x80D2, 0x0E7F,
+    0x80C7, 0x0E1B,
+    0x80BC, 0x0DB7,
+    0x80B2, 0x0D53,
+    0x80A7, 0x0CEF,
+    0x809D, 0x0C8B,
+    0x8094, 0x0C27,
+    0x808A, 0x0BC3,
+    0x8081, 0x0B5F,
+    0x8078, 0x0AFB,
+    0x8070, 0x0A97,
+    0x8068, 0x0A33,
+    0x8060, 0x09CE,
+    0x8058, 0x096A,
+    0x8051, 0x0906,
+    0x804A, 0x08A2,
+    0x8043, 0x083D,
+    0x803D, 0x07D9,
+    0x8037, 0x0775,
+    0x8031, 0x0710,
+    0x802C, 0x06AC,
+    0x8027, 0x0647,
+    0x8022, 0x05E3,
+    0x801E, 0x057F,
+    0x801A, 0x051A,
+    0x8016, 0x04B6,
+    0x8012, 0x0451,
+    0x800F, 0x03ED,
+    0x800C, 0x0388,
+    0x8009, 0x0324,
+    0x8007, 0x02BF,
+    0x8005, 0x025B,
+    0x8003, 0x01F6,
+    0x8002, 0x0192,
+    0x8001, 0x012D,
+    0x8000, 0x00C9,
+    0x8000, 0x0064,
+    0x8000, 0x0000,
+    0x8000, 0xFF9B,
+    0x8000, 0xFF36,
+    0x8001, 0xFED2,
+    0x8002, 0xFE6D,
+    0x8003, 0xFE09,
+    0x8005, 0xFDA4,
+    0x8007, 0xFD40,
+    0x8009, 0xFCDB,
+    0x800C, 0xFC77,
+    0x800F, 0xFC12,
+    0x8012, 0xFBAE,
+    0x8016, 0xFB49,
+    0x801A, 0xFAE5,
+    0x801E, 0xFA80,
+    0x8022, 0xFA1C,
+    0x8027, 0xF9B8,
+    0x802C, 0xF953,
+    0x8031, 0xF8EF,
+    0x8037, 0xF88A,
+    0x803D, 0xF826,
+    0x8043, 0xF7C2,
+    0x804A, 0xF75D,
+    0x8051, 0xF6F9,
+    0x8058, 0xF695,
+    0x8060, 0xF631,
+    0x8068, 0xF5CC,
+    0x8070, 0xF568,
+    0x8078, 0xF504,
+    0x8081, 0xF4A0,
+    0x808A, 0xF43C,
+    0x8094, 0xF3D8,
+    0x809D, 0xF374,
+    0x80A7, 0xF310,
+    0x80B2, 0xF2AC,
+    0x80BC, 0xF248,
+    0x80C7, 0xF1E4,
+    0x80D2, 0xF180,
+    0x80DE, 0xF11C,
+    0x80EA, 0xF0B8,
+    0x80F6, 0xF054,
+    0x8102, 0xEFF1,
+    0x810F, 0xEF8D,
+    0x811C, 0xEF29,
+    0x812A, 0xEEC6,
+    0x8137, 0xEE62,
+    0x8145, 0xEDFE,
+    0x8154, 0xED9B,
+    0x8162, 0xED37,
+    0x8171, 0xECD4,
+    0x8180, 0xEC71,
+    0x8190, 0xEC0D,
+    0x81A0, 0xEBAA,
+    0x81B0, 0xEB47,
+    0x81C0, 0xEAE4,
+    0x81D1, 0xEA80,
+    0x81E2, 0xEA1D,
+    0x81F3, 0xE9BA,
+    0x8205, 0xE957,
+    0x8217, 0xE8F5,
+    0x8229, 0xE892,
+    0x823C, 0xE82F,
+    0x824F, 0xE7CC,
+    0x8262, 0xE769,
+    0x8275, 0xE707,
+    0x8289, 0xE6A4,
+    0x829D, 0xE642,
+    0x82B1, 0xE5DF,
+    0x82C6, 0xE57D,
+    0x82DB, 0xE51B,
+    0x82F0, 0xE4B8,
+    0x8306, 0xE456,
+    0x831C, 0xE3F4,
+    0x8332, 0xE392,
+    0x8348, 0xE330,
+    0x835F, 0xE2CE,
+    0x8376, 0xE26C,
+    0x838E, 0xE20A,
+    0x83A5, 0xE1A9,
+    0x83BD, 0xE147,
+    0x83D6, 0xE0E6,
+    0x83EE, 0xE084,
+    0x8407, 0xE023,
+    0x8420, 0xDFC1,
+    0x843A, 0xDF60,
+    0x8453, 0xDEFF,
+    0x846D, 0xDE9E,
+    0x8488, 0xDE3D,
+    0x84A2, 0xDDDC,
+    0x84BD, 0xDD7B,
+    0x84D9, 0xDD1A,
+    0x84F4, 0xDCBA,
+    0x8510, 0xDC59,
+    0x852C, 0xDBF8,
+    0x8549, 0xDB98,
+    0x8565, 0xDB38,
+    0x8582, 0xDAD7,
+    0x85A0, 0xDA77,
+    0x85BD, 0xDA17,
+    0x85DB, 0xD9B7,
+    0x85FA, 0xD957,
+    0x8618, 0xD8F8,
+    0x8637, 0xD898,
+    0x8656, 0xD838,
+    0x8675, 0xD7D9,
+    0x8695, 0xD779,
+    0x86B5, 0xD71A,
+    0x86D5, 0xD6BB,
+    0x86F6, 0xD65C,
+    0x8717, 0xD5FD,
+    0x8738, 0xD59E,
+    0x8759, 0xD53F,
+    0x877B, 0xD4E0,
+    0x879D, 0xD482,
+    0x87BF, 0xD423,
+    0x87E2, 0xD3C5,
+    0x8805, 0xD367,
+    0x8828, 0xD308,
+    0x884B, 0xD2AA,
+    0x886F, 0xD24C,
+    0x8893, 0xD1EE,
+    0x88B8, 0xD191,
+    0x88DC, 0xD133,
+    0x8901, 0xD0D6,
+    0x8926, 0xD078,
+    0x894C, 0xD01B,
+    0x8971, 0xCFBE,
+    0x8997, 0xCF61,
+    0x89BE, 0xCF04,
+    0x89E4, 0xCEA7,
+    0x8A0B, 0xCE4A,
+    0x8A33, 0xCDEE,
+    0x8A5A, 0xCD91,
+    0x8A82, 0xCD35,
+    0x8AAA, 0xCCD9,
+    0x8AD2, 0xCC7D,
+    0x8AFB, 0xCC21,
+    0x8B24, 0xCBC5,
+    0x8B4D, 0xCB69,
+    0x8B76, 0xCB0D,
+    0x8BA0, 0xCAB2,
+    0x8BCA, 0xCA57,
+    0x8BF4, 0xC9FB,
+    0x8C1F, 0xC9A0,
+    0x8C4A, 0xC945,
+    0x8C75, 0xC8EB,
+    0x8CA0, 0xC890,
+    0x8CCC, 0xC835,
+    0x8CF8, 0xC7DB,
+    0x8D24, 0xC781,
+    0x8D50, 0xC727,
+    0x8D7D, 0xC6CD,
+    0x8DAA, 0xC673,
+    0x8DD8, 0xC619,
+    0x8E05, 0xC5BF,
+    0x8E33, 0xC566,
+    0x8E61, 0xC50D,
+    0x8E90, 0xC4B3,
+    0x8EBE, 0xC45A,
+    0x8EED, 0xC402,
+    0x8F1D, 0xC3A9,
+    0x8F4C, 0xC350,
+    0x8F7C, 0xC2F8,
+    0x8FAC, 0xC29F,
+    0x8FDC, 0xC247,
+    0x900D, 0xC1EF,
+    0x903E, 0xC197,
+    0x906F, 0xC140,
+    0x90A0, 0xC0E8,
+    0x90D2, 0xC091,
+    0x9104, 0xC03A,
+    0x9136, 0xBFE2,
+    0x9169, 0xBF8C,
+    0x919C, 0xBF35,
+    0x91CF, 0xBEDE,
+    0x9202, 0xBE88,
+    0x9235, 0xBE31,
+    0x9269, 0xBDDB,
+    0x929D, 0xBD85,
+    0x92D2, 0xBD2F,
+    0x9306, 0xBCDA,
+    0x933B, 0xBC84,
+    0x9370, 0xBC2F,
+    0x93A6, 0xBBDA,
+    0x93DB, 0xBB85,
+    0x9411, 0xBB30,
+    0x9447, 0xBADB,
+    0x947E, 0xBA87,
+    0x94B5, 0xBA32,
+    0x94EC, 0xB9DE,
+    0x9523, 0xB98A,
+    0x955A, 0xB936,
+    0x9592, 0xB8E3,
+    0x95CA, 0xB88F,
+    0x9602, 0xB83C,
+    0x963B, 0xB7E9,
+    0x9673, 0xB796,
+    0x96AC, 0xB743,
+    0x96E6, 0xB6F0,
+    0x971F, 0xB69E,
+    0x9759, 0xB64B,
+    0x9793, 0xB5F9,
+    0x97CD, 0xB5A7,
+    0x9808, 0xB556,
+    0x9842, 0xB504,
+    0x987D, 0xB4B3,
+    0x98B9, 0xB461,
+    0x98F4, 0xB410,
+    0x9930, 0xB3C0,
+    0x996C, 0xB36F,
+    0x99A8, 0xB31E,
+    0x99E5, 0xB2CE,
+    0x9A22, 0xB27E,
+    0x9A5F, 0xB22E,
+    0x9A9C, 0xB1DE,
+    0x9AD9, 0xB18F,
+    0x9B17, 0xB140,
+    0x9B55, 0xB0F0,
+    0x9B93, 0xB0A1,
+    0x9BD2, 0xB053,
+    0x9C10, 0xB004,
+    0x9C4F, 0xAFB6,
+    0x9C8E, 0xAF68,
+    0x9CCE, 0xAF1A,
+    0x9D0D, 0xAECC,
+    0x9D4D, 0xAE7E,
+    0x9D8E, 0xAE31,
+    0x9DCE, 0xADE3,
+    0x9E0E, 0xAD96,
+    0x9E4F, 0xAD4A,
+    0x9E90, 0xACFD,
+    0x9ED2, 0xACB1,
+    0x9F13, 0xAC64,
+    0x9F55, 0xAC18,
+    0x9F97, 0xABCC,
+    0x9FD9, 0xAB81,
+    0xA01C, 0xAB35,
+    0xA05F, 0xAAEA,
+    0xA0A1, 0xAA9F,
+    0xA0E5, 0xAA54,
+    0xA128, 0xAA0A,
+    0xA16C, 0xA9BF,
+    0xA1AF, 0xA975,
+    0xA1F4, 0xA92B,
+    0xA238, 0xA8E2,
+    0xA27C, 0xA898,
+    0xA2C1, 0xA84F,
+    0xA306, 0xA806,
+    0xA34B, 0xA7BD,
+    0xA391, 0xA774,
+    0xA3D6, 0xA72B,
+    0xA41C, 0xA6E3,
+    0xA462, 0xA69B,
+    0xA4A9, 0xA653,
+    0xA4EF, 0xA60C,
+    0xA536, 0xA5C4,
+    0xA57D, 0xA57D,
+    0xA5C4, 0xA536,
+    0xA60C, 0xA4EF,
+    0xA653, 0xA4A9,
+    0xA69B, 0xA462,
+    0xA6E3, 0xA41C,
+    0xA72B, 0xA3D6,
+    0xA774, 0xA391,
+    0xA7BD, 0xA34B,
+    0xA806, 0xA306,
+    0xA84F, 0xA2C1,
+    0xA898, 0xA27C,
+    0xA8E2, 0xA238,
+    0xA92B, 0xA1F4,
+    0xA975, 0xA1AF,
+    0xA9BF, 0xA16C,
+    0xAA0A, 0xA128,
+    0xAA54, 0xA0E5,
+    0xAA9F, 0xA0A1,
+    0xAAEA, 0xA05F,
+    0xAB35, 0xA01C,
+    0xAB81, 0x9FD9,
+    0xABCC, 0x9F97,
+    0xAC18, 0x9F55,
+    0xAC64, 0x9F13,
+    0xACB1, 0x9ED2,
+    0xACFD, 0x9E90,
+    0xAD4A, 0x9E4F,
+    0xAD96, 0x9E0E,
+    0xADE3, 0x9DCE,
+    0xAE31, 0x9D8E,
+    0xAE7E, 0x9D4D,
+    0xAECC, 0x9D0D,
+    0xAF1A, 0x9CCE,
+    0xAF68, 0x9C8E,
+    0xAFB6, 0x9C4F,
+    0xB004, 0x9C10,
+    0xB053, 0x9BD2,
+    0xB0A1, 0x9B93,
+    0xB0F0, 0x9B55,
+    0xB140, 0x9B17,
+    0xB18F, 0x9AD9,
+    0xB1DE, 0x9A9C,
+    0xB22E, 0x9A5F,
+    0xB27E, 0x9A22,
+    0xB2CE, 0x99E5,
+    0xB31E, 0x99A8,
+    0xB36F, 0x996C,
+    0xB3C0, 0x9930,
+    0xB410, 0x98F4,
+    0xB461, 0x98B9,
+    0xB4B3, 0x987D,
+    0xB504, 0x9842,
+    0xB556, 0x9808,
+    0xB5A7, 0x97CD,
+    0xB5F9, 0x9793,
+    0xB64B, 0x9759,
+    0xB69E, 0x971F,
+    0xB6F0, 0x96E6,
+    0xB743, 0x96AC,
+    0xB796, 0x9673,
+    0xB7E9, 0x963B,
+    0xB83C, 0x9602,
+    0xB88F, 0x95CA,
+    0xB8E3, 0x9592,
+    0xB936, 0x955A,
+    0xB98A, 0x9523,
+    0xB9DE, 0x94EC,
+    0xBA32, 0x94B5,
+    0xBA87, 0x947E,
+    0xBADB, 0x9447,
+    0xBB30, 0x9411,
+    0xBB85, 0x93DB,
+    0xBBDA, 0x93A6,
+    0xBC2F, 0x9370,
+    0xBC84, 0x933B,
+    0xBCDA, 0x9306,
+    0xBD2F, 0x92D2,
+    0xBD85, 0x929D,
+    0xBDDB, 0x9269,
+    0xBE31, 0x9235,
+    0xBE88, 0x9202,
+    0xBEDE, 0x91CF,
+    0xBF35, 0x919C,
+    0xBF8C, 0x9169,
+    0xBFE2, 0x9136,
+    0xC03A, 0x9104,
+    0xC091, 0x90D2,
+    0xC0E8, 0x90A0,
+    0xC140, 0x906F,
+    0xC197, 0x903E,
+    0xC1EF, 0x900D,
+    0xC247, 0x8FDC,
+    0xC29F, 0x8FAC,
+    0xC2F8, 0x8F7C,
+    0xC350, 0x8F4C,
+    0xC3A9, 0x8F1D,
+    0xC402, 0x8EED,
+    0xC45A, 0x8EBE,
+    0xC4B3, 0x8E90,
+    0xC50D, 0x8E61,
+    0xC566, 0x8E33,
+    0xC5BF, 0x8E05,
+    0xC619, 0x8DD8,
+    0xC673, 0x8DAA,
+    0xC6CD, 0x8D7D,
+    0xC727, 0x8D50,
+    0xC781, 0x8D24,
+    0xC7DB, 0x8CF8,
+    0xC835, 0x8CCC,
+    0xC890, 0x8CA0,
+    0xC8EB, 0x8C75,
+    0xC945, 0x8C4A,
+    0xC9A0, 0x8C1F,
+    0xC9FB, 0x8BF4,
+    0xCA57, 0x8BCA,
+    0xCAB2, 0x8BA0,
+    0xCB0D, 0x8B76,
+    0xCB69, 0x8B4D,
+    0xCBC5, 0x8B24,
+    0xCC21, 0x8AFB,
+    0xCC7D, 0x8AD2,
+    0xCCD9, 0x8AAA,
+    0xCD35, 0x8A82,
+    0xCD91, 0x8A5A,
+    0xCDEE, 0x8A33,
+    0xCE4A, 0x8A0B,
+    0xCEA7, 0x89E4,
+    0xCF04, 0x89BE,
+    0xCF61, 0x8997,
+    0xCFBE, 0x8971,
+    0xD01B, 0x894C,
+    0xD078, 0x8926,
+    0xD0D6, 0x8901,
+    0xD133, 0x88DC,
+    0xD191, 0x88B8,
+    0xD1EE, 0x8893,
+    0xD24C, 0x886F,
+    0xD2AA, 0x884B,
+    0xD308, 0x8828,
+    0xD367, 0x8805,
+    0xD3C5, 0x87E2,
+    0xD423, 0x87BF,
+    0xD482, 0x879D,
+    0xD4E0, 0x877B,
+    0xD53F, 0x8759,
+    0xD59E, 0x8738,
+    0xD5FD, 0x8717,
+    0xD65C, 0x86F6,
+    0xD6BB, 0x86D5,
+    0xD71A, 0x86B5,
+    0xD779, 0x8695,
+    0xD7D9, 0x8675,
+    0xD838, 0x8656,
+    0xD898, 0x8637,
+    0xD8F8, 0x8618,
+    0xD957, 0x85FA,
+    0xD9B7, 0x85DB,
+    0xDA17, 0x85BD,
+    0xDA77, 0x85A0,
+    0xDAD7, 0x8582,
+    0xDB38, 0x8565,
+    0xDB98, 0x8549,
+    0xDBF8, 0x852C,
+    0xDC59, 0x8510,
+    0xDCBA, 0x84F4,
+    0xDD1A, 0x84D9,
+    0xDD7B, 0x84BD,
+    0xDDDC, 0x84A2,
+    0xDE3D, 0x8488,
+    0xDE9E, 0x846D,
+    0xDEFF, 0x8453,
+    0xDF60, 0x843A,
+    0xDFC1, 0x8420,
+    0xE023, 0x8407,
+    0xE084, 0x83EE,
+    0xE0E6, 0x83D6,
+    0xE147, 0x83BD,
+    0xE1A9, 0x83A5,
+    0xE20A, 0x838E,
+    0xE26C, 0x8376,
+    0xE2CE, 0x835F,
+    0xE330, 0x8348,
+    0xE392, 0x8332,
+    0xE3F4, 0x831C,
+    0xE456, 0x8306,
+    0xE4B8, 0x82F0,
+    0xE51B, 0x82DB,
+    0xE57D, 0x82C6,
+    0xE5DF, 0x82B1,
+    0xE642, 0x829D,
+    0xE6A4, 0x8289,
+    0xE707, 0x8275,
+    0xE769, 0x8262,
+    0xE7CC, 0x824F,
+    0xE82F, 0x823C,
+    0xE892, 0x8229,
+    0xE8F5, 0x8217,
+    0xE957, 0x8205,
+    0xE9BA, 0x81F3,
+    0xEA1D, 0x81E2,
+    0xEA80, 0x81D1,
+    0xEAE4, 0x81C0,
+    0xEB47, 0x81B0,
+    0xEBAA, 0x81A0,
+    0xEC0D, 0x8190,
+    0xEC71, 0x8180,
+    0xECD4, 0x8171,
+    0xED37, 0x8162,
+    0xED9B, 0x8154,
+    0xEDFE, 0x8145,
+    0xEE62, 0x8137,
+    0xEEC6, 0x812A,
+    0xEF29, 0x811C,
+    0xEF8D, 0x810F,
+    0xEFF1, 0x8102,
+    0xF054, 0x80F6,
+    0xF0B8, 0x80EA,
+    0xF11C, 0x80DE,
+    0xF180, 0x80D2,
+    0xF1E4, 0x80C7,
+    0xF248, 0x80BC,
+    0xF2AC, 0x80B2,
+    0xF310, 0x80A7,
+    0xF374, 0x809D,
+    0xF3D8, 0x8094,
+    0xF43C, 0x808A,
+    0xF4A0, 0x8081,
+    0xF504, 0x8078,
+    0xF568, 0x8070,
+    0xF5CC, 0x8068,
+    0xF631, 0x8060,
+    0xF695, 0x8058,
+    0xF6F9, 0x8051,
+    0xF75D, 0x804A,
+    0xF7C2, 0x8043,
+    0xF826, 0x803D,
+    0xF88A, 0x8037,
+    0xF8EF, 0x8031,
+    0xF953, 0x802C,
+    0xF9B8, 0x8027,
+    0xFA1C, 0x8022,
+    0xFA80, 0x801E,
+    0xFAE5, 0x801A,
+    0xFB49, 0x8016,
+    0xFBAE, 0x8012,
+    0xFC12, 0x800F,
+    0xFC77, 0x800C,
+    0xFCDB, 0x8009,
+    0xFD40, 0x8007,
+    0xFDA4, 0x8005,
+    0xFE09, 0x8003,
+    0xFE6D, 0x8002,
+    0xFED2, 0x8001,
+    0xFF36, 0x8000,
+    0xFF9B, 0x8000
+};
+
+/**    
+* \par   
+* Example code for q15 twiddle factor generation:
+* \par    
+* <pre>for (i = 0; i < 3*N/4; i++)
+* {    
+*    twiddleCoefq15[2*i]= cos(i * 2*PI/(float)N);    
+*    twiddleCoefq15[2*i+1]= sin(i * 2*PI/(float)N);    
+* } </pre>    
+* \par    
+* where N = 4096 and PI = 3.14159265358979
+* \par    
+* Cos and sin values are stored interleaved (cos at even indices, sin at odd indices).
+* \par    
+* Convert floating point to q15 (fixed-point 1.15 format):
+*	round(twiddleCoefq15(i) * pow(2, 15))
+* Results that round to 32768 (e.g. cos(0)) are stored as 0x7FFF, the largest q15 value.
+*    
+*/
+const q15_t twiddleCoef_4096_q15[6144] = 
+{
+    0x7FFF, 0x0000,
+    0x7FFF, 0x0032,
+    0x7FFF, 0x0064,
+    0x7FFF, 0x0096,
+    0x7FFF, 0x00C9,
+    0x7FFF, 0x00FB,
+    0x7FFE, 0x012D,
+    0x7FFE, 0x015F,
+    0x7FFD, 0x0192,
+    0x7FFC, 0x01C4,
+    0x7FFC, 0x01F6,
+    0x7FFB, 0x0228,
+    0x7FFA, 0x025B,
+    0x7FF9, 0x028D,
+    0x7FF8, 0x02BF,
+    0x7FF7, 0x02F1,
+    0x7FF6, 0x0324,
+    0x7FF4, 0x0356,
+    0x7FF3, 0x0388,
+    0x7FF2, 0x03BA,
+    0x7FF0, 0x03ED,
+    0x7FEE, 0x041F,
+    0x7FED, 0x0451,
+    0x7FEB, 0x0483,
+    0x7FE9, 0x04B6,
+    0x7FE7, 0x04E8,
+    0x7FE5, 0x051A,
+    0x7FE3, 0x054C,
+    0x7FE1, 0x057F,
+    0x7FDF, 0x05B1,
+    0x7FDD, 0x05E3,
+    0x7FDA, 0x0615,
+    0x7FD8, 0x0647,
+    0x7FD6, 0x067A,
+    0x7FD3, 0x06AC,
+    0x7FD0, 0x06DE,
+    0x7FCE, 0x0710,
+    0x7FCB, 0x0742,
+    0x7FC8, 0x0775,
+    0x7FC5, 0x07A7,
+    0x7FC2, 0x07D9,
+    0x7FBF, 0x080B,
+    0x7FBC, 0x083D,
+    0x7FB8, 0x086F,
+    0x7FB5, 0x08A2,
+    0x7FB1, 0x08D4,
+    0x7FAE, 0x0906,
+    0x7FAA, 0x0938,
+    0x7FA7, 0x096A,
+    0x7FA3, 0x099C,
+    0x7F9F, 0x09CE,
+    0x7F9B, 0x0A00,
+    0x7F97, 0x0A33,
+    0x7F93, 0x0A65,
+    0x7F8F, 0x0A97,
+    0x7F8B, 0x0AC9,
+    0x7F87, 0x0AFB,
+    0x7F82, 0x0B2D,
+    0x7F7E, 0x0B5F,
+    0x7F79, 0x0B91,
+    0x7F75, 0x0BC3,
+    0x7F70, 0x0BF5,
+    0x7F6B, 0x0C27,
+    0x7F67, 0x0C59,
+    0x7F62, 0x0C8B,
+    0x7F5D, 0x0CBD,
+    0x7F58, 0x0CEF,
+    0x7F53, 0x0D21,
+    0x7F4D, 0x0D53,
+    0x7F48, 0x0D85,
+    0x7F43, 0x0DB7,
+    0x7F3D, 0x0DE9,
+    0x7F38, 0x0E1B,
+    0x7F32, 0x0E4D,
+    0x7F2D, 0x0E7F,
+    0x7F27, 0x0EB1,
+    0x7F21, 0x0EE3,
+    0x7F1B, 0x0F15,
+    0x7F15, 0x0F47,
+    0x7F0F, 0x0F79,
+    0x7F09, 0x0FAB,
+    0x7F03, 0x0FDD,
+    0x7EFD, 0x100E,
+    0x7EF6, 0x1040,
+    0x7EF0, 0x1072,
+    0x7EE9, 0x10A4,
+    0x7EE3, 0x10D6,
+    0x7EDC, 0x1108,
+    0x7ED5, 0x1139,
+    0x7ECF, 0x116B,
+    0x7EC8, 0x119D,
+    0x7EC1, 0x11CF,
+    0x7EBA, 0x1201,
+    0x7EB3, 0x1232,
+    0x7EAB, 0x1264,
+    0x7EA4, 0x1296,
+    0x7E9D, 0x12C8,
+    0x7E95, 0x12F9,
+    0x7E8E, 0x132B,
+    0x7E86, 0x135D,
+    0x7E7F, 0x138E,
+    0x7E77, 0x13C0,
+    0x7E6F, 0x13F2,
+    0x7E67, 0x1423,
+    0x7E5F, 0x1455,
+    0x7E57, 0x1487,
+    0x7E4F, 0x14B8,
+    0x7E47, 0x14EA,
+    0x7E3F, 0x151B,
+    0x7E37, 0x154D,
+    0x7E2E, 0x157F,
+    0x7E26, 0x15B0,
+    0x7E1D, 0x15E2,
+    0x7E14, 0x1613,
+    0x7E0C, 0x1645,
+    0x7E03, 0x1676,
+    0x7DFA, 0x16A8,
+    0x7DF1, 0x16D9,
+    0x7DE8, 0x170A,
+    0x7DDF, 0x173C,
+    0x7DD6, 0x176D,
+    0x7DCD, 0x179F,
+    0x7DC3, 0x17D0,
+    0x7DBA, 0x1802,
+    0x7DB0, 0x1833,
+    0x7DA7, 0x1864,
+    0x7D9D, 0x1896,
+    0x7D94, 0x18C7,
+    0x7D8A, 0x18F8,
+    0x7D80, 0x192A,
+    0x7D76, 0x195B,
+    0x7D6C, 0x198C,
+    0x7D62, 0x19BD,
+    0x7D58, 0x19EF,
+    0x7D4E, 0x1A20,
+    0x7D43, 0x1A51,
+    0x7D39, 0x1A82,
+    0x7D2F, 0x1AB3,
+    0x7D24, 0x1AE4,
+    0x7D19, 0x1B16,
+    0x7D0F, 0x1B47,
+    0x7D04, 0x1B78,
+    0x7CF9, 0x1BA9,
+    0x7CEE, 0x1BDA,
+    0x7CE3, 0x1C0B,
+    0x7CD8, 0x1C3C,
+    0x7CCD, 0x1C6D,
+    0x7CC2, 0x1C9E,
+    0x7CB7, 0x1CCF,
+    0x7CAB, 0x1D00,
+    0x7CA0, 0x1D31,
+    0x7C94, 0x1D62,
+    0x7C89, 0x1D93,
+    0x7C7D, 0x1DC4,
+    0x7C71, 0x1DF5,
+    0x7C66, 0x1E25,
+    0x7C5A, 0x1E56,
+    0x7C4E, 0x1E87,
+    0x7C42, 0x1EB8,
+    0x7C36, 0x1EE9,
+    0x7C29, 0x1F19,
+    0x7C1D, 0x1F4A,
+    0x7C11, 0x1F7B,
+    0x7C05, 0x1FAC,
+    0x7BF8, 0x1FDC,
+    0x7BEB, 0x200D,
+    0x7BDF, 0x203E,
+    0x7BD2, 0x206E,
+    0x7BC5, 0x209F,
+    0x7BB9, 0x20D0,
+    0x7BAC, 0x2100,
+    0x7B9F, 0x2131,
+    0x7B92, 0x2161,
+    0x7B84, 0x2192,
+    0x7B77, 0x21C2,
+    0x7B6A, 0x21F3,
+    0x7B5D, 0x2223,
+    0x7B4F, 0x2254,
+    0x7B42, 0x2284,
+    0x7B34, 0x22B4,
+    0x7B26, 0x22E5,
+    0x7B19, 0x2315,
+    0x7B0B, 0x2345,
+    0x7AFD, 0x2376,
+    0x7AEF, 0x23A6,
+    0x7AE1, 0x23D6,
+    0x7AD3, 0x2407,
+    0x7AC5, 0x2437,
+    0x7AB6, 0x2467,
+    0x7AA8, 0x2497,
+    0x7A9A, 0x24C7,
+    0x7A8B, 0x24F7,
+    0x7A7D, 0x2528,
+    0x7A6E, 0x2558,
+    0x7A5F, 0x2588,
+    0x7A50, 0x25B8,
+    0x7A42, 0x25E8,
+    0x7A33, 0x2618,
+    0x7A24, 0x2648,
+    0x7A15, 0x2678,
+    0x7A05, 0x26A8,
+    0x79F6, 0x26D8,
+    0x79E7, 0x2707,
+    0x79D8, 0x2737,
+    0x79C8, 0x2767,
+    0x79B9, 0x2797,
+    0x79A9, 0x27C7,
+    0x7999, 0x27F6,
+    0x798A, 0x2826,
+    0x797A, 0x2856,
+    0x796A, 0x2886,
+    0x795A, 0x28B5,
+    0x794A, 0x28E5,
+    0x793A, 0x2915,
+    0x792A, 0x2944,
+    0x7919, 0x2974,
+    0x7909, 0x29A3,
+    0x78F9, 0x29D3,
+    0x78E8, 0x2A02,
+    0x78D8, 0x2A32,
+    0x78C7, 0x2A61,
+    0x78B6, 0x2A91,
+    0x78A6, 0x2AC0,
+    0x7895, 0x2AEF,
+    0x7884, 0x2B1F,
+    0x7873, 0x2B4E,
+    0x7862, 0x2B7D,
+    0x7851, 0x2BAD,
+    0x7840, 0x2BDC,
+    0x782E, 0x2C0B,
+    0x781D, 0x2C3A,
+    0x780C, 0x2C69,
+    0x77FA, 0x2C98,
+    0x77E9, 0x2CC8,
+    0x77D7, 0x2CF7,
+    0x77C5, 0x2D26,
+    0x77B4, 0x2D55,
+    0x77A2, 0x2D84,
+    0x7790, 0x2DB3,
+    0x777E, 0x2DE2,
+    0x776C, 0x2E11,
+    0x775A, 0x2E3F,
+    0x7747, 0x2E6E,
+    0x7735, 0x2E9D,
+    0x7723, 0x2ECC,
+    0x7710, 0x2EFB,
+    0x76FE, 0x2F29,
+    0x76EB, 0x2F58,
+    0x76D9, 0x2F87,
+    0x76C6, 0x2FB5,
+    0x76B3, 0x2FE4,
+    0x76A0, 0x3013,
+    0x768E, 0x3041,
+    0x767B, 0x3070,
+    0x7668, 0x309E,
+    0x7654, 0x30CD,
+    0x7641, 0x30FB,
+    0x762E, 0x312A,
+    0x761B, 0x3158,
+    0x7607, 0x3186,
+    0x75F4, 0x31B5,
+    0x75E0, 0x31E3,
+    0x75CC, 0x3211,
+    0x75B9, 0x3240,
+    0x75A5, 0x326E,
+    0x7591, 0x329C,
+    0x757D, 0x32CA,
+    0x7569, 0x32F8,
+    0x7555, 0x3326,
+    0x7541, 0x3354,
+    0x752D, 0x3382,
+    0x7519, 0x33B0,
+    0x7504, 0x33DE,
+    0x74F0, 0x340C,
+    0x74DB, 0x343A,
+    0x74C7, 0x3468,
+    0x74B2, 0x3496,
+    0x749E, 0x34C4,
+    0x7489, 0x34F2,
+    0x7474, 0x351F,
+    0x745F, 0x354D,
+    0x744A, 0x357B,
+    0x7435, 0x35A8,
+    0x7420, 0x35D6,
+    0x740B, 0x3604,
+    0x73F6, 0x3631,
+    0x73E0, 0x365F,
+    0x73CB, 0x368C,
+    0x73B5, 0x36BA,
+    0x73A0, 0x36E7,
+    0x738A, 0x3714,
+    0x7375, 0x3742,
+    0x735F, 0x376F,
+    0x7349, 0x379C,
+    0x7333, 0x37CA,
+    0x731D, 0x37F7,
+    0x7307, 0x3824,
+    0x72F1, 0x3851,
+    0x72DB, 0x387E,
+    0x72C5, 0x38AB,
+    0x72AF, 0x38D8,
+    0x7298, 0x3906,
+    0x7282, 0x3932,
+    0x726B, 0x395F,
+    0x7255, 0x398C,
+    0x723E, 0x39B9,
+    0x7227, 0x39E6,
+    0x7211, 0x3A13,
+    0x71FA, 0x3A40,
+    0x71E3, 0x3A6C,
+    0x71CC, 0x3A99,
+    0x71B5, 0x3AC6,
+    0x719E, 0x3AF2,
+    0x7186, 0x3B1F,
+    0x716F, 0x3B4C,
+    0x7158, 0x3B78,
+    0x7141, 0x3BA5,
+    0x7129, 0x3BD1,
+    0x7112, 0x3BFD,
+    0x70FA, 0x3C2A,
+    0x70E2, 0x3C56,
+    0x70CB, 0x3C83,
+    0x70B3, 0x3CAF,
+    0x709B, 0x3CDB,
+    0x7083, 0x3D07,
+    0x706B, 0x3D33,
+    0x7053, 0x3D60,
+    0x703B, 0x3D8C,
+    0x7023, 0x3DB8,
+    0x700A, 0x3DE4,
+    0x6FF2, 0x3E10,
+    0x6FDA, 0x3E3C,
+    0x6FC1, 0x3E68,
+    0x6FA9, 0x3E93,
+    0x6F90, 0x3EBF,
+    0x6F77, 0x3EEB,
+    0x6F5F, 0x3F17,
+    0x6F46, 0x3F43,
+    0x6F2D, 0x3F6E,
+    0x6F14, 0x3F9A,
+    0x6EFB, 0x3FC5,
+    0x6EE2, 0x3FF1,
+    0x6EC9, 0x401D,
+    0x6EAF, 0x4048,
+    0x6E96, 0x4073,
+    0x6E7D, 0x409F,
+    0x6E63, 0x40CA,
+    0x6E4A, 0x40F6,
+    0x6E30, 0x4121,
+    0x6E17, 0x414C,
+    0x6DFD, 0x4177,
+    0x6DE3, 0x41A2,
+    0x6DCA, 0x41CE,
+    0x6DB0, 0x41F9,
+    0x6D96, 0x4224,
+    0x6D7C, 0x424F,
+    0x6D62, 0x427A,
+    0x6D48, 0x42A5,
+    0x6D2D, 0x42D0,
+    0x6D13, 0x42FA,
+    0x6CF9, 0x4325,
+    0x6CDE, 0x4350,
+    0x6CC4, 0x437B,
+    0x6CA9, 0x43A5,
+    0x6C8F, 0x43D0,
+    0x6C74, 0x43FB,
+    0x6C59, 0x4425,
+    0x6C3F, 0x4450,
+    0x6C24, 0x447A,
+    0x6C09, 0x44A5,
+    0x6BEE, 0x44CF,
+    0x6BD3, 0x44FA,
+    0x6BB8, 0x4524,
+    0x6B9C, 0x454E,
+    0x6B81, 0x4578,
+    0x6B66, 0x45A3,
+    0x6B4A, 0x45CD,
+    0x6B2F, 0x45F7,
+    0x6B13, 0x4621,
+    0x6AF8, 0x464B,
+    0x6ADC, 0x4675,
+    0x6AC1, 0x469F,
+    0x6AA5, 0x46C9,
+    0x6A89, 0x46F3,
+    0x6A6D, 0x471C,
+    0x6A51, 0x4746,
+    0x6A35, 0x4770,
+    0x6A19, 0x479A,
+    0x69FD, 0x47C3,
+    0x69E1, 0x47ED,
+    0x69C4, 0x4816,
+    0x69A8, 0x4840,
+    0x698C, 0x4869,
+    0x696F, 0x4893,
+    0x6953, 0x48BC,
+    0x6936, 0x48E6,
+    0x6919, 0x490F,
+    0x68FD, 0x4938,
+    0x68E0, 0x4961,
+    0x68C3, 0x498A,
+    0x68A6, 0x49B4,
+    0x6889, 0x49DD,
+    0x686C, 0x4A06,
+    0x684F, 0x4A2F,
+    0x6832, 0x4A58,
+    0x6815, 0x4A81,
+    0x67F7, 0x4AA9,
+    0x67DA, 0x4AD2,
+    0x67BD, 0x4AFB,
+    0x679F, 0x4B24,
+    0x6782, 0x4B4C,
+    0x6764, 0x4B75,
+    0x6746, 0x4B9E,
+    0x6729, 0x4BC6,
+    0x670B, 0x4BEF,
+    0x66ED, 0x4C17,
+    0x66CF, 0x4C3F,
+    0x66B1, 0x4C68,
+    0x6693, 0x4C90,
+    0x6675, 0x4CB8,
+    0x6657, 0x4CE1,
+    0x6639, 0x4D09,
+    0x661A, 0x4D31,
+    0x65FC, 0x4D59,
+    0x65DD, 0x4D81,
+    0x65BF, 0x4DA9,
+    0x65A0, 0x4DD1,
+    0x6582, 0x4DF9,
+    0x6563, 0x4E21,
+    0x6545, 0x4E48,
+    0x6526, 0x4E70,
+    0x6507, 0x4E98,
+    0x64E8, 0x4EBF,
+    0x64C9, 0x4EE7,
+    0x64AA, 0x4F0F,
+    0x648B, 0x4F36,
+    0x646C, 0x4F5E,
+    0x644D, 0x4F85,
+    0x642D, 0x4FAC,
+    0x640E, 0x4FD4,
+    0x63EF, 0x4FFB,
+    0x63CF, 0x5022,
+    0x63B0, 0x5049,
+    0x6390, 0x5070,
+    0x6371, 0x5097,
+    0x6351, 0x50BF,
+    0x6331, 0x50E5,
+    0x6311, 0x510C,
+    0x62F2, 0x5133,
+    0x62D2, 0x515A,
+    0x62B2, 0x5181,
+    0x6292, 0x51A8,
+    0x6271, 0x51CE,
+    0x6251, 0x51F5,
+    0x6231, 0x521C,
+    0x6211, 0x5242,
+    0x61F1, 0x5269,
+    0x61D0, 0x528F,
+    0x61B0, 0x52B5,
+    0x618F, 0x52DC,
+    0x616F, 0x5302,
+    0x614E, 0x5328,
+    0x612D, 0x534E,
+    0x610D, 0x5375,
+    0x60EC, 0x539B,
+    0x60CB, 0x53C1,
+    0x60AA, 0x53E7,
+    0x6089, 0x540D,
+    0x6068, 0x5433,
+    0x6047, 0x5458,
+    0x6026, 0x547E,
+    0x6004, 0x54A4,
+    0x5FE3, 0x54CA,
+    0x5FC2, 0x54EF,
+    0x5FA0, 0x5515,
+    0x5F7F, 0x553A,
+    0x5F5E, 0x5560,
+    0x5F3C, 0x5585,
+    0x5F1A, 0x55AB,
+    0x5EF9, 0x55D0,
+    0x5ED7, 0x55F5,
+    0x5EB5, 0x561A,
+    0x5E93, 0x5640,
+    0x5E71, 0x5665,
+    0x5E50, 0x568A,
+    0x5E2D, 0x56AF,
+    0x5E0B, 0x56D4,
+    0x5DE9, 0x56F9,
+    0x5DC7, 0x571D,
+    0x5DA5, 0x5742,
+    0x5D83, 0x5767,
+    0x5D60, 0x578C,
+    0x5D3E, 0x57B0,
+    0x5D1B, 0x57D5,
+    0x5CF9, 0x57F9,
+    0x5CD6, 0x581E,
+    0x5CB4, 0x5842,
+    0x5C91, 0x5867,
+    0x5C6E, 0x588B,
+    0x5C4B, 0x58AF,
+    0x5C29, 0x58D4,
+    0x5C06, 0x58F8,
+    0x5BE3, 0x591C,
+    0x5BC0, 0x5940,
+    0x5B9D, 0x5964,
+    0x5B79, 0x5988,
+    0x5B56, 0x59AC,
+    0x5B33, 0x59D0,
+    0x5B10, 0x59F3,
+    0x5AEC, 0x5A17,
+    0x5AC9, 0x5A3B,
+    0x5AA5, 0x5A5E,
+    0x5A82, 0x5A82,
+    0x5A5E, 0x5AA5,
+    0x5A3B, 0x5AC9,
+    0x5A17, 0x5AEC,
+    0x59F3, 0x5B10,
+    0x59D0, 0x5B33,
+    0x59AC, 0x5B56,
+    0x5988, 0x5B79,
+    0x5964, 0x5B9D,
+    0x5940, 0x5BC0,
+    0x591C, 0x5BE3,
+    0x58F8, 0x5C06,
+    0x58D4, 0x5C29,
+    0x58AF, 0x5C4B,
+    0x588B, 0x5C6E,
+    0x5867, 0x5C91,
+    0x5842, 0x5CB4,
+    0x581E, 0x5CD6,
+    0x57F9, 0x5CF9,
+    0x57D5, 0x5D1B,
+    0x57B0, 0x5D3E,
+    0x578C, 0x5D60,
+    0x5767, 0x5D83,
+    0x5742, 0x5DA5,
+    0x571D, 0x5DC7,
+    0x56F9, 0x5DE9,
+    0x56D4, 0x5E0B,
+    0x56AF, 0x5E2D,
+    0x568A, 0x5E50,
+    0x5665, 0x5E71,
+    0x5640, 0x5E93,
+    0x561A, 0x5EB5,
+    0x55F5, 0x5ED7,
+    0x55D0, 0x5EF9,
+    0x55AB, 0x5F1A,
+    0x5585, 0x5F3C,
+    0x5560, 0x5F5E,
+    0x553A, 0x5F7F,
+    0x5515, 0x5FA0,
+    0x54EF, 0x5FC2,
+    0x54CA, 0x5FE3,
+    0x54A4, 0x6004,
+    0x547E, 0x6026,
+    0x5458, 0x6047,
+    0x5433, 0x6068,
+    0x540D, 0x6089,
+    0x53E7, 0x60AA,
+    0x53C1, 0x60CB,
+    0x539B, 0x60EC,
+    0x5375, 0x610D,
+    0x534E, 0x612D,
+    0x5328, 0x614E,
+    0x5302, 0x616F,
+    0x52DC, 0x618F,
+    0x52B5, 0x61B0,
+    0x528F, 0x61D0,
+    0x5269, 0x61F1,
+    0x5242, 0x6211,
+    0x521C, 0x6231,
+    0x51F5, 0x6251,
+    0x51CE, 0x6271,
+    0x51A8, 0x6292,
+    0x5181, 0x62B2,
+    0x515A, 0x62D2,
+    0x5133, 0x62F2,
+    0x510C, 0x6311,
+    0x50E5, 0x6331,
+    0x50BF, 0x6351,
+    0x5097, 0x6371,
+    0x5070, 0x6390,
+    0x5049, 0x63B0,
+    0x5022, 0x63CF,
+    0x4FFB, 0x63EF,
+    0x4FD4, 0x640E,
+    0x4FAC, 0x642D,
+    0x4F85, 0x644D,
+    0x4F5E, 0x646C,
+    0x4F36, 0x648B,
+    0x4F0F, 0x64AA,
+    0x4EE7, 0x64C9,
+    0x4EBF, 0x64E8,
+    0x4E98, 0x6507,
+    0x4E70, 0x6526,
+    0x4E48, 0x6545,
+    0x4E21, 0x6563,
+    0x4DF9, 0x6582,
+    0x4DD1, 0x65A0,
+    0x4DA9, 0x65BF,
+    0x4D81, 0x65DD,
+    0x4D59, 0x65FC,
+    0x4D31, 0x661A,
+    0x4D09, 0x6639,
+    0x4CE1, 0x6657,
+    0x4CB8, 0x6675,
+    0x4C90, 0x6693,
+    0x4C68, 0x66B1,
+    0x4C3F, 0x66CF,
+    0x4C17, 0x66ED,
+    0x4BEF, 0x670B,
+    0x4BC6, 0x6729,
+    0x4B9E, 0x6746,
+    0x4B75, 0x6764,
+    0x4B4C, 0x6782,
+    0x4B24, 0x679F,
+    0x4AFB, 0x67BD,
+    0x4AD2, 0x67DA,
+    0x4AA9, 0x67F7,
+    0x4A81, 0x6815,
+    0x4A58, 0x6832,
+    0x4A2F, 0x684F,
+    0x4A06, 0x686C,
+    0x49DD, 0x6889,
+    0x49B4, 0x68A6,
+    0x498A, 0x68C3,
+    0x4961, 0x68E0,
+    0x4938, 0x68FD,
+    0x490F, 0x6919,
+    0x48E6, 0x6936,
+    0x48BC, 0x6953,
+    0x4893, 0x696F,
+    0x4869, 0x698C,
+    0x4840, 0x69A8,
+    0x4816, 0x69C4,
+    0x47ED, 0x69E1,
+    0x47C3, 0x69FD,
+    0x479A, 0x6A19,
+    0x4770, 0x6A35,
+    0x4746, 0x6A51,
+    0x471C, 0x6A6D,
+    0x46F3, 0x6A89,
+    0x46C9, 0x6AA5,
+    0x469F, 0x6AC1,
+    0x4675, 0x6ADC,
+    0x464B, 0x6AF8,
+    0x4621, 0x6B13,
+    0x45F7, 0x6B2F,
+    0x45CD, 0x6B4A,
+    0x45A3, 0x6B66,
+    0x4578, 0x6B81,
+    0x454E, 0x6B9C,
+    0x4524, 0x6BB8,
+    0x44FA, 0x6BD3,
+    0x44CF, 0x6BEE,
+    0x44A5, 0x6C09,
+    0x447A, 0x6C24,
+    0x4450, 0x6C3F,
+    0x4425, 0x6C59,
+    0x43FB, 0x6C74,
+    0x43D0, 0x6C8F,
+    0x43A5, 0x6CA9,
+    0x437B, 0x6CC4,
+    0x4350, 0x6CDE,
+    0x4325, 0x6CF9,
+    0x42FA, 0x6D13,
+    0x42D0, 0x6D2D,
+    0x42A5, 0x6D48,
+    0x427A, 0x6D62,
+    0x424F, 0x6D7C,
+    0x4224, 0x6D96,
+    0x41F9, 0x6DB0,
+    0x41CE, 0x6DCA,
+    0x41A2, 0x6DE3,
+    0x4177, 0x6DFD,
+    0x414C, 0x6E17,
+    0x4121, 0x6E30,
+    0x40F6, 0x6E4A,
+    0x40CA, 0x6E63,
+    0x409F, 0x6E7D,
+    0x4073, 0x6E96,
+    0x4048, 0x6EAF,
+    0x401D, 0x6EC9,
+    0x3FF1, 0x6EE2,
+    0x3FC5, 0x6EFB,
+    0x3F9A, 0x6F14,
+    0x3F6E, 0x6F2D,
+    0x3F43, 0x6F46,
+    0x3F17, 0x6F5F,
+    0x3EEB, 0x6F77,
+    0x3EBF, 0x6F90,
+    0x3E93, 0x6FA9,
+    0x3E68, 0x6FC1,
+    0x3E3C, 0x6FDA,
+    0x3E10, 0x6FF2,
+    0x3DE4, 0x700A,
+    0x3DB8, 0x7023,
+    0x3D8C, 0x703B,
+    0x3D60, 0x7053,
+    0x3D33, 0x706B,
+    0x3D07, 0x7083,
+    0x3CDB, 0x709B,
+    0x3CAF, 0x70B3,
+    0x3C83, 0x70CB,
+    0x3C56, 0x70E2,
+    0x3C2A, 0x70FA,
+    0x3BFD, 0x7112,
+    0x3BD1, 0x7129,
+    0x3BA5, 0x7141,
+    0x3B78, 0x7158,
+    0x3B4C, 0x716F,
+    0x3B1F, 0x7186,
+    0x3AF2, 0x719E,
+    0x3AC6, 0x71B5,
+    0x3A99, 0x71CC,
+    0x3A6C, 0x71E3,
+    0x3A40, 0x71FA,
+    0x3A13, 0x7211,
+    0x39E6, 0x7227,
+    0x39B9, 0x723E,
+    0x398C, 0x7255,
+    0x395F, 0x726B,
+    0x3932, 0x7282,
+    0x3906, 0x7298,
+    0x38D8, 0x72AF,
+    0x38AB, 0x72C5,
+    0x387E, 0x72DB,
+    0x3851, 0x72F1,
+    0x3824, 0x7307,
+    0x37F7, 0x731D,
+    0x37CA, 0x7333,
+    0x379C, 0x7349,
+    0x376F, 0x735F,
+    0x3742, 0x7375,
+    0x3714, 0x738A,
+    0x36E7, 0x73A0,
+    0x36BA, 0x73B5,
+    0x368C, 0x73CB,
+    0x365F, 0x73E0,
+    0x3631, 0x73F6,
+    0x3604, 0x740B,
+    0x35D6, 0x7420,
+    0x35A8, 0x7435,
+    0x357B, 0x744A,
+    0x354D, 0x745F,
+    0x351F, 0x7474,
+    0x34F2, 0x7489,
+    0x34C4, 0x749E,
+    0x3496, 0x74B2,
+    0x3468, 0x74C7,
+    0x343A, 0x74DB,
+    0x340C, 0x74F0,
+    0x33DE, 0x7504,
+    0x33B0, 0x7519,
+    0x3382, 0x752D,
+    0x3354, 0x7541,
+    0x3326, 0x7555,
+    0x32F8, 0x7569,
+    0x32CA, 0x757D,
+    0x329C, 0x7591,
+    0x326E, 0x75A5,
+    0x3240, 0x75B9,
+    0x3211, 0x75CC,
+    0x31E3, 0x75E0,
+    0x31B5, 0x75F4,
+    0x3186, 0x7607,
+    0x3158, 0x761B,
+    0x312A, 0x762E,
+    0x30FB, 0x7641,
+    0x30CD, 0x7654,
+    0x309E, 0x7668,
+    0x3070, 0x767B,
+    0x3041, 0x768E,
+    0x3013, 0x76A0,
+    0x2FE4, 0x76B3,
+    0x2FB5, 0x76C6,
+    0x2F87, 0x76D9,
+    0x2F58, 0x76EB,
+    0x2F29, 0x76FE,
+    0x2EFB, 0x7710,
+    0x2ECC, 0x7723,
+    0x2E9D, 0x7735,
+    0x2E6E, 0x7747,
+    0x2E3F, 0x775A,
+    0x2E11, 0x776C,
+    0x2DE2, 0x777E,
+    0x2DB3, 0x7790,
+    0x2D84, 0x77A2,
+    0x2D55, 0x77B4,
+    0x2D26, 0x77C5,
+    0x2CF7, 0x77D7,
+    0x2CC8, 0x77E9,
+    0x2C98, 0x77FA,
+    0x2C69, 0x780C,
+    0x2C3A, 0x781D,
+    0x2C0B, 0x782E,
+    0x2BDC, 0x7840,
+    0x2BAD, 0x7851,
+    0x2B7D, 0x7862,
+    0x2B4E, 0x7873,
+    0x2B1F, 0x7884,
+    0x2AEF, 0x7895,
+    0x2AC0, 0x78A6,
+    0x2A91, 0x78B6,
+    0x2A61, 0x78C7,
+    0x2A32, 0x78D8,
+    0x2A02, 0x78E8,
+    0x29D3, 0x78F9,
+    0x29A3, 0x7909,
+    0x2974, 0x7919,
+    0x2944, 0x792A,
+    0x2915, 0x793A,
+    0x28E5, 0x794A,
+    0x28B5, 0x795A,
+    0x2886, 0x796A,
+    0x2856, 0x797A,
+    0x2826, 0x798A,
+    0x27F6, 0x7999,
+    0x27C7, 0x79A9,
+    0x2797, 0x79B9,
+    0x2767, 0x79C8,
+    0x2737, 0x79D8,
+    0x2707, 0x79E7,
+    0x26D8, 0x79F6,
+    0x26A8, 0x7A05,
+    0x2678, 0x7A15,
+    0x2648, 0x7A24,
+    0x2618, 0x7A33,
+    0x25E8, 0x7A42,
+    0x25B8, 0x7A50,
+    0x2588, 0x7A5F,
+    0x2558, 0x7A6E,
+    0x2528, 0x7A7D,
+    0x24F7, 0x7A8B,
+    0x24C7, 0x7A9A,
+    0x2497, 0x7AA8,
+    0x2467, 0x7AB6,
+    0x2437, 0x7AC5,
+    0x2407, 0x7AD3,
+    0x23D6, 0x7AE1,
+    0x23A6, 0x7AEF,
+    0x2376, 0x7AFD,
+    0x2345, 0x7B0B,
+    0x2315, 0x7B19,
+    0x22E5, 0x7B26,
+    0x22B4, 0x7B34,
+    0x2284, 0x7B42,
+    0x2254, 0x7B4F,
+    0x2223, 0x7B5D,
+    0x21F3, 0x7B6A,
+    0x21C2, 0x7B77,
+    0x2192, 0x7B84,
+    0x2161, 0x7B92,
+    0x2131, 0x7B9F,
+    0x2100, 0x7BAC,
+    0x20D0, 0x7BB9,
+    0x209F, 0x7BC5,
+    0x206E, 0x7BD2,
+    0x203E, 0x7BDF,
+    0x200D, 0x7BEB,
+    0x1FDC, 0x7BF8,
+    0x1FAC, 0x7C05,
+    0x1F7B, 0x7C11,
+    0x1F4A, 0x7C1D,
+    0x1F19, 0x7C29,
+    0x1EE9, 0x7C36,
+    0x1EB8, 0x7C42,
+    0x1E87, 0x7C4E,
+    0x1E56, 0x7C5A,
+    0x1E25, 0x7C66,
+    0x1DF5, 0x7C71,
+    0x1DC4, 0x7C7D,
+    0x1D93, 0x7C89,
+    0x1D62, 0x7C94,
+    0x1D31, 0x7CA0,
+    0x1D00, 0x7CAB,
+    0x1CCF, 0x7CB7,
+    0x1C9E, 0x7CC2,
+    0x1C6D, 0x7CCD,
+    0x1C3C, 0x7CD8,
+    0x1C0B, 0x7CE3,
+    0x1BDA, 0x7CEE,
+    0x1BA9, 0x7CF9,
+    0x1B78, 0x7D04,
+    0x1B47, 0x7D0F,
+    0x1B16, 0x7D19,
+    0x1AE4, 0x7D24,
+    0x1AB3, 0x7D2F,
+    0x1A82, 0x7D39,
+    0x1A51, 0x7D43,
+    0x1A20, 0x7D4E,
+    0x19EF, 0x7D58,
+    0x19BD, 0x7D62,
+    0x198C, 0x7D6C,
+    0x195B, 0x7D76,
+    0x192A, 0x7D80,
+    0x18F8, 0x7D8A,
+    0x18C7, 0x7D94,
+    0x1896, 0x7D9D,
+    0x1864, 0x7DA7,
+    0x1833, 0x7DB0,
+    0x1802, 0x7DBA,
+    0x17D0, 0x7DC3,
+    0x179F, 0x7DCD,
+    0x176D, 0x7DD6,
+    0x173C, 0x7DDF,
+    0x170A, 0x7DE8,
+    0x16D9, 0x7DF1,
+    0x16A8, 0x7DFA,
+    0x1676, 0x7E03,
+    0x1645, 0x7E0C,
+    0x1613, 0x7E14,
+    0x15E2, 0x7E1D,
+    0x15B0, 0x7E26,
+    0x157F, 0x7E2E,
+    0x154D, 0x7E37,
+    0x151B, 0x7E3F,
+    0x14EA, 0x7E47,
+    0x14B8, 0x7E4F,
+    0x1487, 0x7E57,
+    0x1455, 0x7E5F,
+    0x1423, 0x7E67,
+    0x13F2, 0x7E6F,
+    0x13C0, 0x7E77,
+    0x138E, 0x7E7F,
+    0x135D, 0x7E86,
+    0x132B, 0x7E8E,
+    0x12F9, 0x7E95,
+    0x12C8, 0x7E9D,
+    0x1296, 0x7EA4,
+    0x1264, 0x7EAB,
+    0x1232, 0x7EB3,
+    0x1201, 0x7EBA,
+    0x11CF, 0x7EC1,
+    0x119D, 0x7EC8,
+    0x116B, 0x7ECF,
+    0x1139, 0x7ED5,
+    0x1108, 0x7EDC,
+    0x10D6, 0x7EE3,
+    0x10A4, 0x7EE9,
+    0x1072, 0x7EF0,
+    0x1040, 0x7EF6,
+    0x100E, 0x7EFD,
+    0x0FDD, 0x7F03,
+    0x0FAB, 0x7F09,
+    0x0F79, 0x7F0F,
+    0x0F47, 0x7F15,
+    0x0F15, 0x7F1B,
+    0x0EE3, 0x7F21,
+    0x0EB1, 0x7F27,
+    0x0E7F, 0x7F2D,
+    0x0E4D, 0x7F32,
+    0x0E1B, 0x7F38,
+    0x0DE9, 0x7F3D,
+    0x0DB7, 0x7F43,
+    0x0D85, 0x7F48,
+    0x0D53, 0x7F4D,
+    0x0D21, 0x7F53,
+    0x0CEF, 0x7F58,
+    0x0CBD, 0x7F5D,
+    0x0C8B, 0x7F62,
+    0x0C59, 0x7F67,
+    0x0C27, 0x7F6B,
+    0x0BF5, 0x7F70,
+    0x0BC3, 0x7F75,
+    0x0B91, 0x7F79,
+    0x0B5F, 0x7F7E,
+    0x0B2D, 0x7F82,
+    0x0AFB, 0x7F87,
+    0x0AC9, 0x7F8B,
+    0x0A97, 0x7F8F,
+    0x0A65, 0x7F93,
+    0x0A33, 0x7F97,
+    0x0A00, 0x7F9B,
+    0x09CE, 0x7F9F,
+    0x099C, 0x7FA3,
+    0x096A, 0x7FA7,
+    0x0938, 0x7FAA,
+    0x0906, 0x7FAE,
+    0x08D4, 0x7FB1,
+    0x08A2, 0x7FB5,
+    0x086F, 0x7FB8,
+    0x083D, 0x7FBC,
+    0x080B, 0x7FBF,
+    0x07D9, 0x7FC2,
+    0x07A7, 0x7FC5,
+    0x0775, 0x7FC8,
+    0x0742, 0x7FCB,
+    0x0710, 0x7FCE,
+    0x06DE, 0x7FD0,
+    0x06AC, 0x7FD3,
+    0x067A, 0x7FD6,
+    0x0647, 0x7FD8,
+    0x0615, 0x7FDA,
+    0x05E3, 0x7FDD,
+    0x05B1, 0x7FDF,
+    0x057F, 0x7FE1,
+    0x054C, 0x7FE3,
+    0x051A, 0x7FE5,
+    0x04E8, 0x7FE7,
+    0x04B6, 0x7FE9,
+    0x0483, 0x7FEB,
+    0x0451, 0x7FED,
+    0x041F, 0x7FEE,
+    0x03ED, 0x7FF0,
+    0x03BA, 0x7FF2,
+    0x0388, 0x7FF3,
+    0x0356, 0x7FF4,
+    0x0324, 0x7FF6,
+    0x02F1, 0x7FF7,
+    0x02BF, 0x7FF8,
+    0x028D, 0x7FF9,
+    0x025B, 0x7FFA,
+    0x0228, 0x7FFB,
+    0x01F6, 0x7FFC,
+    0x01C4, 0x7FFC,
+    0x0192, 0x7FFD,
+    0x015F, 0x7FFE,
+    0x012D, 0x7FFE,
+    0x00FB, 0x7FFF,
+    0x00C9, 0x7FFF,
+    0x0096, 0x7FFF,
+    0x0064, 0x7FFF,
+    0x0032, 0x7FFF,
+    0x0000, 0x7FFF,
+    0xFFCD, 0x7FFF,
+    0xFF9B, 0x7FFF,
+    0xFF69, 0x7FFF,
+    0xFF36, 0x7FFF,
+    0xFF04, 0x7FFF,
+    0xFED2, 0x7FFE,
+    0xFEA0, 0x7FFE,
+    0xFE6D, 0x7FFD,
+    0xFE3B, 0x7FFC,
+    0xFE09, 0x7FFC,
+    0xFDD7, 0x7FFB,
+    0xFDA4, 0x7FFA,
+    0xFD72, 0x7FF9,
+    0xFD40, 0x7FF8,
+    0xFD0E, 0x7FF7,
+    0xFCDB, 0x7FF6,
+    0xFCA9, 0x7FF4,
+    0xFC77, 0x7FF3,
+    0xFC45, 0x7FF2,
+    0xFC12, 0x7FF0,
+    0xFBE0, 0x7FEE,
+    0xFBAE, 0x7FED,
+    0xFB7C, 0x7FEB,
+    0xFB49, 0x7FE9,
+    0xFB17, 0x7FE7,
+    0xFAE5, 0x7FE5,
+    0xFAB3, 0x7FE3,
+    0xFA80, 0x7FE1,
+    0xFA4E, 0x7FDF,
+    0xFA1C, 0x7FDD,
+    0xF9EA, 0x7FDA,
+    0xF9B8, 0x7FD8,
+    0xF985, 0x7FD6,
+    0xF953, 0x7FD3,
+    0xF921, 0x7FD0,
+    0xF8EF, 0x7FCE,
+    0xF8BD, 0x7FCB,
+    0xF88A, 0x7FC8,
+    0xF858, 0x7FC5,
+    0xF826, 0x7FC2,
+    0xF7F4, 0x7FBF,
+    0xF7C2, 0x7FBC,
+    0xF790, 0x7FB8,
+    0xF75D, 0x7FB5,
+    0xF72B, 0x7FB1,
+    0xF6F9, 0x7FAE,
+    0xF6C7, 0x7FAA,
+    0xF695, 0x7FA7,
+    0xF663, 0x7FA3,
+    0xF631, 0x7F9F,
+    0xF5FF, 0x7F9B,
+    0xF5CC, 0x7F97,
+    0xF59A, 0x7F93,
+    0xF568, 0x7F8F,
+    0xF536, 0x7F8B,
+    0xF504, 0x7F87,
+    0xF4D2, 0x7F82,
+    0xF4A0, 0x7F7E,
+    0xF46E, 0x7F79,
+    0xF43C, 0x7F75,
+    0xF40A, 0x7F70,
+    0xF3D8, 0x7F6B,
+    0xF3A6, 0x7F67,
+    0xF374, 0x7F62,
+    0xF342, 0x7F5D,
+    0xF310, 0x7F58,
+    0xF2DE, 0x7F53,
+    0xF2AC, 0x7F4D,
+    0xF27A, 0x7F48,
+    0xF248, 0x7F43,
+    0xF216, 0x7F3D,
+    0xF1E4, 0x7F38,
+    0xF1B2, 0x7F32,
+    0xF180, 0x7F2D,
+    0xF14E, 0x7F27,
+    0xF11C, 0x7F21,
+    0xF0EA, 0x7F1B,
+    0xF0B8, 0x7F15,
+    0xF086, 0x7F0F,
+    0xF054, 0x7F09,
+    0xF022, 0x7F03,
+    0xEFF1, 0x7EFD,
+    0xEFBF, 0x7EF6,
+    0xEF8D, 0x7EF0,
+    0xEF5B, 0x7EE9,
+    0xEF29, 0x7EE3,
+    0xEEF7, 0x7EDC,
+    0xEEC6, 0x7ED5,
+    0xEE94, 0x7ECF,
+    0xEE62, 0x7EC8,
+    0xEE30, 0x7EC1,
+    0xEDFE, 0x7EBA,
+    0xEDCD, 0x7EB3,
+    0xED9B, 0x7EAB,
+    0xED69, 0x7EA4,
+    0xED37, 0x7E9D,
+    0xED06, 0x7E95,
+    0xECD4, 0x7E8E,
+    0xECA2, 0x7E86,
+    0xEC71, 0x7E7F,
+    0xEC3F, 0x7E77,
+    0xEC0D, 0x7E6F,
+    0xEBDC, 0x7E67,
+    0xEBAA, 0x7E5F,
+    0xEB78, 0x7E57,
+    0xEB47, 0x7E4F,
+    0xEB15, 0x7E47,
+    0xEAE4, 0x7E3F,
+    0xEAB2, 0x7E37,
+    0xEA80, 0x7E2E,
+    0xEA4F, 0x7E26,
+    0xEA1D, 0x7E1D,
+    0xE9EC, 0x7E14,
+    0xE9BA, 0x7E0C,
+    0xE989, 0x7E03,
+    0xE957, 0x7DFA,
+    0xE926, 0x7DF1,
+    0xE8F5, 0x7DE8,
+    0xE8C3, 0x7DDF,
+    0xE892, 0x7DD6,
+    0xE860, 0x7DCD,
+    0xE82F, 0x7DC3,
+    0xE7FD, 0x7DBA,
+    0xE7CC, 0x7DB0,
+    0xE79B, 0x7DA7,
+    0xE769, 0x7D9D,
+    0xE738, 0x7D94,
+    0xE707, 0x7D8A,
+    0xE6D5, 0x7D80,
+    0xE6A4, 0x7D76,
+    0xE673, 0x7D6C,
+    0xE642, 0x7D62,
+    0xE610, 0x7D58,
+    0xE5DF, 0x7D4E,
+    0xE5AE, 0x7D43,
+    0xE57D, 0x7D39,
+    0xE54C, 0x7D2F,
+    0xE51B, 0x7D24,
+    0xE4E9, 0x7D19,
+    0xE4B8, 0x7D0F,
+    0xE487, 0x7D04,
+    0xE456, 0x7CF9,
+    0xE425, 0x7CEE,
+    0xE3F4, 0x7CE3,
+    0xE3C3, 0x7CD8,
+    0xE392, 0x7CCD,
+    0xE361, 0x7CC2,
+    0xE330, 0x7CB7,
+    0xE2FF, 0x7CAB,
+    0xE2CE, 0x7CA0,
+    0xE29D, 0x7C94,
+    0xE26C, 0x7C89,
+    0xE23B, 0x7C7D,
+    0xE20A, 0x7C71,
+    0xE1DA, 0x7C66,
+    0xE1A9, 0x7C5A,
+    0xE178, 0x7C4E,
+    0xE147, 0x7C42,
+    0xE116, 0x7C36,
+    0xE0E6, 0x7C29,
+    0xE0B5, 0x7C1D,
+    0xE084, 0x7C11,
+    0xE053, 0x7C05,
+    0xE023, 0x7BF8,
+    0xDFF2, 0x7BEB,
+    0xDFC1, 0x7BDF,
+    0xDF91, 0x7BD2,
+    0xDF60, 0x7BC5,
+    0xDF2F, 0x7BB9,
+    0xDEFF, 0x7BAC,
+    0xDECE, 0x7B9F,
+    0xDE9E, 0x7B92,
+    0xDE6D, 0x7B84,
+    0xDE3D, 0x7B77,
+    0xDE0C, 0x7B6A,
+    0xDDDC, 0x7B5D,
+    0xDDAB, 0x7B4F,
+    0xDD7B, 0x7B42,
+    0xDD4B, 0x7B34,
+    0xDD1A, 0x7B26,
+    0xDCEA, 0x7B19,
+    0xDCBA, 0x7B0B,
+    0xDC89, 0x7AFD,
+    0xDC59, 0x7AEF,
+    0xDC29, 0x7AE1,
+    0xDBF8, 0x7AD3,
+    0xDBC8, 0x7AC5,
+    0xDB98, 0x7AB6,
+    0xDB68, 0x7AA8,
+    0xDB38, 0x7A9A,
+    0xDB08, 0x7A8B,
+    0xDAD7, 0x7A7D,
+    0xDAA7, 0x7A6E,
+    0xDA77, 0x7A5F,
+    0xDA47, 0x7A50,
+    0xDA17, 0x7A42,
+    0xD9E7, 0x7A33,
+    0xD9B7, 0x7A24,
+    0xD987, 0x7A15,
+    0xD957, 0x7A05,
+    0xD927, 0x79F6,
+    0xD8F8, 0x79E7,
+    0xD8C8, 0x79D8,
+    0xD898, 0x79C8,
+    0xD868, 0x79B9,
+    0xD838, 0x79A9,
+    0xD809, 0x7999,
+    0xD7D9, 0x798A,
+    0xD7A9, 0x797A,
+    0xD779, 0x796A,
+    0xD74A, 0x795A,
+    0xD71A, 0x794A,
+    0xD6EA, 0x793A,
+    0xD6BB, 0x792A,
+    0xD68B, 0x7919,
+    0xD65C, 0x7909,
+    0xD62C, 0x78F9,
+    0xD5FD, 0x78E8,
+    0xD5CD, 0x78D8,
+    0xD59E, 0x78C7,
+    0xD56E, 0x78B6,
+    0xD53F, 0x78A6,
+    0xD510, 0x7895,
+    0xD4E0, 0x7884,
+    0xD4B1, 0x7873,
+    0xD482, 0x7862,
+    0xD452, 0x7851,
+    0xD423, 0x7840,
+    0xD3F4, 0x782E,
+    0xD3C5, 0x781D,
+    0xD396, 0x780C,
+    0xD367, 0x77FA,
+    0xD337, 0x77E9,
+    0xD308, 0x77D7,
+    0xD2D9, 0x77C5,
+    0xD2AA, 0x77B4,
+    0xD27B, 0x77A2,
+    0xD24C, 0x7790,
+    0xD21D, 0x777E,
+    0xD1EE, 0x776C,
+    0xD1C0, 0x775A,
+    0xD191, 0x7747,
+    0xD162, 0x7735,
+    0xD133, 0x7723,
+    0xD104, 0x7710,
+    0xD0D6, 0x76FE,
+    0xD0A7, 0x76EB,
+    0xD078, 0x76D9,
+    0xD04A, 0x76C6,
+    0xD01B, 0x76B3,
+    0xCFEC, 0x76A0,
+    0xCFBE, 0x768E,
+    0xCF8F, 0x767B,
+    0xCF61, 0x7668,
+    0xCF32, 0x7654,
+    0xCF04, 0x7641,
+    0xCED5, 0x762E,
+    0xCEA7, 0x761B,
+    0xCE79, 0x7607,
+    0xCE4A, 0x75F4,
+    0xCE1C, 0x75E0,
+    0xCDEE, 0x75CC,
+    0xCDBF, 0x75B9,
+    0xCD91, 0x75A5,
+    0xCD63, 0x7591,
+    0xCD35, 0x757D,
+    0xCD07, 0x7569,
+    0xCCD9, 0x7555,
+    0xCCAB, 0x7541,
+    0xCC7D, 0x752D,
+    0xCC4F, 0x7519,
+    0xCC21, 0x7504,
+    0xCBF3, 0x74F0,
+    0xCBC5, 0x74DB,
+    0xCB97, 0x74C7,
+    0xCB69, 0x74B2,
+    0xCB3B, 0x749E,
+    0xCB0D, 0x7489,
+    0xCAE0, 0x7474,
+    0xCAB2, 0x745F,
+    0xCA84, 0x744A,
+    0xCA57, 0x7435,
+    0xCA29, 0x7420,
+    0xC9FB, 0x740B,
+    0xC9CE, 0x73F6,
+    0xC9A0, 0x73E0,
+    0xC973, 0x73CB,
+    0xC945, 0x73B5,
+    0xC918, 0x73A0,
+    0xC8EB, 0x738A,
+    0xC8BD, 0x7375,
+    0xC890, 0x735F,
+    0xC863, 0x7349,
+    0xC835, 0x7333,
+    0xC808, 0x731D,
+    0xC7DB, 0x7307,
+    0xC7AE, 0x72F1,
+    0xC781, 0x72DB,
+    0xC754, 0x72C5,
+    0xC727, 0x72AF,
+    0xC6F9, 0x7298,
+    0xC6CD, 0x7282,
+    0xC6A0, 0x726B,
+    0xC673, 0x7255,
+    0xC646, 0x723E,
+    0xC619, 0x7227,
+    0xC5EC, 0x7211,
+    0xC5BF, 0x71FA,
+    0xC593, 0x71E3,
+    0xC566, 0x71CC,
+    0xC539, 0x71B5,
+    0xC50D, 0x719E,
+    0xC4E0, 0x7186,
+    0xC4B3, 0x716F,
+    0xC487, 0x7158,
+    0xC45A, 0x7141,
+    0xC42E, 0x7129,
+    0xC402, 0x7112,
+    0xC3D5, 0x70FA,
+    0xC3A9, 0x70E2,
+    0xC37C, 0x70CB,
+    0xC350, 0x70B3,
+    0xC324, 0x709B,
+    0xC2F8, 0x7083,
+    0xC2CC, 0x706B,
+    0xC29F, 0x7053,
+    0xC273, 0x703B,
+    0xC247, 0x7023,
+    0xC21B, 0x700A,
+    0xC1EF, 0x6FF2,
+    0xC1C3, 0x6FDA,
+    0xC197, 0x6FC1,
+    0xC16C, 0x6FA9,
+    0xC140, 0x6F90,
+    0xC114, 0x6F77,
+    0xC0E8, 0x6F5F,
+    0xC0BC, 0x6F46,
+    0xC091, 0x6F2D,
+    0xC065, 0x6F14,
+    0xC03A, 0x6EFB,
+    0xC00E, 0x6EE2,
+    0xBFE2, 0x6EC9,
+    0xBFB7, 0x6EAF,
+    0xBF8C, 0x6E96,
+    0xBF60, 0x6E7D,
+    0xBF35, 0x6E63,
+    0xBF09, 0x6E4A,
+    0xBEDE, 0x6E30,
+    0xBEB3, 0x6E17,
+    0xBE88, 0x6DFD,
+    0xBE5D, 0x6DE3,
+    0xBE31, 0x6DCA,
+    0xBE06, 0x6DB0,
+    0xBDDB, 0x6D96,
+    0xBDB0, 0x6D7C,
+    0xBD85, 0x6D62,
+    0xBD5A, 0x6D48,
+    0xBD2F, 0x6D2D,
+    0xBD05, 0x6D13,
+    0xBCDA, 0x6CF9,
+    0xBCAF, 0x6CDE,
+    0xBC84, 0x6CC4,
+    0xBC5A, 0x6CA9,
+    0xBC2F, 0x6C8F,
+    0xBC04, 0x6C74,
+    0xBBDA, 0x6C59,
+    0xBBAF, 0x6C3F,
+    0xBB85, 0x6C24,
+    0xBB5A, 0x6C09,
+    0xBB30, 0x6BEE,
+    0xBB05, 0x6BD3,
+    0xBADB, 0x6BB8,
+    0xBAB1, 0x6B9C,
+    0xBA87, 0x6B81,
+    0xBA5C, 0x6B66,
+    0xBA32, 0x6B4A,
+    0xBA08, 0x6B2F,
+    0xB9DE, 0x6B13,
+    0xB9B4, 0x6AF8,
+    0xB98A, 0x6ADC,
+    0xB960, 0x6AC1,
+    0xB936, 0x6AA5,
+    0xB90C, 0x6A89,
+    0xB8E3, 0x6A6D,
+    0xB8B9, 0x6A51,
+    0xB88F, 0x6A35,
+    0xB865, 0x6A19,
+    0xB83C, 0x69FD,
+    0xB812, 0x69E1,
+    0xB7E9, 0x69C4,
+    0xB7BF, 0x69A8,
+    0xB796, 0x698C,
+    0xB76C, 0x696F,
+    0xB743, 0x6953,
+    0xB719, 0x6936,
+    0xB6F0, 0x6919,
+    0xB6C7, 0x68FD,
+    0xB69E, 0x68E0,
+    0xB675, 0x68C3,
+    0xB64B, 0x68A6,
+    0xB622, 0x6889,
+    0xB5F9, 0x686C,
+    0xB5D0, 0x684F,
+    0xB5A7, 0x6832,
+    0xB57E, 0x6815,
+    0xB556, 0x67F7,
+    0xB52D, 0x67DA,
+    0xB504, 0x67BD,
+    0xB4DB, 0x679F,
+    0xB4B3, 0x6782,
+    0xB48A, 0x6764,
+    0xB461, 0x6746,
+    0xB439, 0x6729,
+    0xB410, 0x670B,
+    0xB3E8, 0x66ED,
+    0xB3C0, 0x66CF,
+    0xB397, 0x66B1,
+    0xB36F, 0x6693,
+    0xB347, 0x6675,
+    0xB31E, 0x6657,
+    0xB2F6, 0x6639,
+    0xB2CE, 0x661A,
+    0xB2A6, 0x65FC,
+    0xB27E, 0x65DD,
+    0xB256, 0x65BF,
+    0xB22E, 0x65A0,
+    0xB206, 0x6582,
+    0xB1DE, 0x6563,
+    0xB1B7, 0x6545,
+    0xB18F, 0x6526,
+    0xB167, 0x6507,
+    0xB140, 0x64E8,
+    0xB118, 0x64C9,
+    0xB0F0, 0x64AA,
+    0xB0C9, 0x648B,
+    0xB0A1, 0x646C,
+    0xB07A, 0x644D,
+    0xB053, 0x642D,
+    0xB02B, 0x640E,
+    0xB004, 0x63EF,
+    0xAFDD, 0x63CF,
+    0xAFB6, 0x63B0,
+    0xAF8F, 0x6390,
+    0xAF68, 0x6371,
+    0xAF40, 0x6351,
+    0xAF1A, 0x6331,
+    0xAEF3, 0x6311,
+    0xAECC, 0x62F2,
+    0xAEA5, 0x62D2,
+    0xAE7E, 0x62B2,
+    0xAE57, 0x6292,
+    0xAE31, 0x6271,
+    0xAE0A, 0x6251,
+    0xADE3, 0x6231,
+    0xADBD, 0x6211,
+    0xAD96, 0x61F1,
+    0xAD70, 0x61D0,
+    0xAD4A, 0x61B0,
+    0xAD23, 0x618F,
+    0xACFD, 0x616F,
+    0xACD7, 0x614E,
+    0xACB1, 0x612D,
+    0xAC8A, 0x610D,
+    0xAC64, 0x60EC,
+    0xAC3E, 0x60CB,
+    0xAC18, 0x60AA,
+    0xABF2, 0x6089,
+    0xABCC, 0x6068,
+    0xABA7, 0x6047,
+    0xAB81, 0x6026,
+    0xAB5B, 0x6004,
+    0xAB35, 0x5FE3,
+    0xAB10, 0x5FC2,
+    0xAAEA, 0x5FA0,
+    0xAAC5, 0x5F7F,
+    0xAA9F, 0x5F5E,
+    0xAA7A, 0x5F3C,
+    0xAA54, 0x5F1A,
+    0xAA2F, 0x5EF9,
+    0xAA0A, 0x5ED7,
+    0xA9E5, 0x5EB5,
+    0xA9BF, 0x5E93,
+    0xA99A, 0x5E71,
+    0xA975, 0x5E50,
+    0xA950, 0x5E2D,
+    0xA92B, 0x5E0B,
+    0xA906, 0x5DE9,
+    0xA8E2, 0x5DC7,
+    0xA8BD, 0x5DA5,
+    0xA898, 0x5D83,
+    0xA873, 0x5D60,
+    0xA84F, 0x5D3E,
+    0xA82A, 0x5D1B,
+    0xA806, 0x5CF9,
+    0xA7E1, 0x5CD6,
+    0xA7BD, 0x5CB4,
+    0xA798, 0x5C91,
+    0xA774, 0x5C6E,
+    0xA750, 0x5C4B,
+    0xA72B, 0x5C29,
+    0xA707, 0x5C06,
+    0xA6E3, 0x5BE3,
+    0xA6BF, 0x5BC0,
+    0xA69B, 0x5B9D,
+    0xA677, 0x5B79,
+    0xA653, 0x5B56,
+    0xA62F, 0x5B33,
+    0xA60C, 0x5B10,
+    0xA5E8, 0x5AEC,
+    0xA5C4, 0x5AC9,
+    0xA5A1, 0x5AA5,
+    0xA57D, 0x5A82,
+    0xA55A, 0x5A5E,
+    0xA536, 0x5A3B,
+    0xA513, 0x5A17,
+    0xA4EF, 0x59F3,
+    0xA4CC, 0x59D0,
+    0xA4A9, 0x59AC,
+    0xA486, 0x5988,
+    0xA462, 0x5964,
+    0xA43F, 0x5940,
+    0xA41C, 0x591C,
+    0xA3F9, 0x58F8,
+    0xA3D6, 0x58D4,
+    0xA3B4, 0x58AF,
+    0xA391, 0x588B,
+    0xA36E, 0x5867,
+    0xA34B, 0x5842,
+    0xA329, 0x581E,
+    0xA306, 0x57F9,
+    0xA2E4, 0x57D5,
+    0xA2C1, 0x57B0,
+    0xA29F, 0x578C,
+    0xA27C, 0x5767,
+    0xA25A, 0x5742,
+    0xA238, 0x571D,
+    0xA216, 0x56F9,
+    0xA1F4, 0x56D4,
+    0xA1D2, 0x56AF,
+    0xA1AF, 0x568A,
+    0xA18E, 0x5665,
+    0xA16C, 0x5640,
+    0xA14A, 0x561A,
+    0xA128, 0x55F5,
+    0xA106, 0x55D0,
+    0xA0E5, 0x55AB,
+    0xA0C3, 0x5585,
+    0xA0A1, 0x5560,
+    0xA080, 0x553A,
+    0xA05F, 0x5515,
+    0xA03D, 0x54EF,
+    0xA01C, 0x54CA,
+    0x9FFB, 0x54A4,
+    0x9FD9, 0x547E,
+    0x9FB8, 0x5458,
+    0x9F97, 0x5433,
+    0x9F76, 0x540D,
+    0x9F55, 0x53E7,
+    0x9F34, 0x53C1,
+    0x9F13, 0x539B,
+    0x9EF2, 0x5375,
+    0x9ED2, 0x534E,
+    0x9EB1, 0x5328,
+    0x9E90, 0x5302,
+    0x9E70, 0x52DC,
+    0x9E4F, 0x52B5,
+    0x9E2F, 0x528F,
+    0x9E0E, 0x5269,
+    0x9DEE, 0x5242,
+    0x9DCE, 0x521C,
+    0x9DAE, 0x51F5,
+    0x9D8E, 0x51CE,
+    0x9D6D, 0x51A8,
+    0x9D4D, 0x5181,
+    0x9D2D, 0x515A,
+    0x9D0D, 0x5133,
+    0x9CEE, 0x510C,
+    0x9CCE, 0x50E5,
+    0x9CAE, 0x50BF,
+    0x9C8E, 0x5097,
+    0x9C6F, 0x5070,
+    0x9C4F, 0x5049,
+    0x9C30, 0x5022,
+    0x9C10, 0x4FFB,
+    0x9BF1, 0x4FD4,
+    0x9BD2, 0x4FAC,
+    0x9BB2, 0x4F85,
+    0x9B93, 0x4F5E,
+    0x9B74, 0x4F36,
+    0x9B55, 0x4F0F,
+    0x9B36, 0x4EE7,
+    0x9B17, 0x4EBF,
+    0x9AF8, 0x4E98,
+    0x9AD9, 0x4E70,
+    0x9ABA, 0x4E48,
+    0x9A9C, 0x4E21,
+    0x9A7D, 0x4DF9,
+    0x9A5F, 0x4DD1,
+    0x9A40, 0x4DA9,
+    0x9A22, 0x4D81,
+    0x9A03, 0x4D59,
+    0x99E5, 0x4D31,
+    0x99C6, 0x4D09,
+    0x99A8, 0x4CE1,
+    0x998A, 0x4CB8,
+    0x996C, 0x4C90,
+    0x994E, 0x4C68,
+    0x9930, 0x4C3F,
+    0x9912, 0x4C17,
+    0x98F4, 0x4BEF,
+    0x98D6, 0x4BC6,
+    0x98B9, 0x4B9E,
+    0x989B, 0x4B75,
+    0x987D, 0x4B4C,
+    0x9860, 0x4B24,
+    0x9842, 0x4AFB,
+    0x9825, 0x4AD2,
+    0x9808, 0x4AA9,
+    0x97EA, 0x4A81,
+    0x97CD, 0x4A58,
+    0x97B0, 0x4A2F,
+    0x9793, 0x4A06,
+    0x9776, 0x49DD,
+    0x9759, 0x49B4,
+    0x973C, 0x498A,
+    0x971F, 0x4961,
+    0x9702, 0x4938,
+    0x96E6, 0x490F,
+    0x96C9, 0x48E6,
+    0x96AC, 0x48BC,
+    0x9690, 0x4893,
+    0x9673, 0x4869,
+    0x9657, 0x4840,
+    0x963B, 0x4816,
+    0x961E, 0x47ED,
+    0x9602, 0x47C3,
+    0x95E6, 0x479A,
+    0x95CA, 0x4770,
+    0x95AE, 0x4746,
+    0x9592, 0x471C,
+    0x9576, 0x46F3,
+    0x955A, 0x46C9,
+    0x953E, 0x469F,
+    0x9523, 0x4675,
+    0x9507, 0x464B,
+    0x94EC, 0x4621,
+    0x94D0, 0x45F7,
+    0x94B5, 0x45CD,
+    0x9499, 0x45A3,
+    0x947E, 0x4578,
+    0x9463, 0x454E,
+    0x9447, 0x4524,
+    0x942C, 0x44FA,
+    0x9411, 0x44CF,
+    0x93F6, 0x44A5,
+    0x93DB, 0x447A,
+    0x93C0, 0x4450,
+    0x93A6, 0x4425,
+    0x938B, 0x43FB,
+    0x9370, 0x43D0,
+    0x9356, 0x43A5,
+    0x933B, 0x437B,
+    0x9321, 0x4350,
+    0x9306, 0x4325,
+    0x92EC, 0x42FA,
+    0x92D2, 0x42D0,
+    0x92B7, 0x42A5,
+    0x929D, 0x427A,
+    0x9283, 0x424F,
+    0x9269, 0x4224,
+    0x924F, 0x41F9,
+    0x9235, 0x41CE,
+    0x921C, 0x41A2,
+    0x9202, 0x4177,
+    0x91E8, 0x414C,
+    0x91CF, 0x4121,
+    0x91B5, 0x40F6,
+    0x919C, 0x40CA,
+    0x9182, 0x409F,
+    0x9169, 0x4073,
+    0x9150, 0x4048,
+    0x9136, 0x401D,
+    0x911D, 0x3FF1,
+    0x9104, 0x3FC5,
+    0x90EB, 0x3F9A,
+    0x90D2, 0x3F6E,
+    0x90B9, 0x3F43,
+    0x90A0, 0x3F17,
+    0x9088, 0x3EEB,
+    0x906F, 0x3EBF,
+    0x9056, 0x3E93,
+    0x903E, 0x3E68,
+    0x9025, 0x3E3C,
+    0x900D, 0x3E10,
+    0x8FF5, 0x3DE4,
+    0x8FDC, 0x3DB8,
+    0x8FC4, 0x3D8C,
+    0x8FAC, 0x3D60,
+    0x8F94, 0x3D33,
+    0x8F7C, 0x3D07,
+    0x8F64, 0x3CDB,
+    0x8F4C, 0x3CAF,
+    0x8F34, 0x3C83,
+    0x8F1D, 0x3C56,
+    0x8F05, 0x3C2A,
+    0x8EED, 0x3BFD,
+    0x8ED6, 0x3BD1,
+    0x8EBE, 0x3BA5,
+    0x8EA7, 0x3B78,
+    0x8E90, 0x3B4C,
+    0x8E79, 0x3B1F,
+    0x8E61, 0x3AF2,
+    0x8E4A, 0x3AC6,
+    0x8E33, 0x3A99,
+    0x8E1C, 0x3A6C,
+    0x8E05, 0x3A40,
+    0x8DEE, 0x3A13,
+    0x8DD8, 0x39E6,
+    0x8DC1, 0x39B9,
+    0x8DAA, 0x398C,
+    0x8D94, 0x395F,
+    0x8D7D, 0x3932,
+    0x8D67, 0x3906,
+    0x8D50, 0x38D8,
+    0x8D3A, 0x38AB,
+    0x8D24, 0x387E,
+    0x8D0E, 0x3851,
+    0x8CF8, 0x3824,
+    0x8CE2, 0x37F7,
+    0x8CCC, 0x37CA,
+    0x8CB6, 0x379C,
+    0x8CA0, 0x376F,
+    0x8C8A, 0x3742,
+    0x8C75, 0x3714,
+    0x8C5F, 0x36E7,
+    0x8C4A, 0x36BA,
+    0x8C34, 0x368C,
+    0x8C1F, 0x365F,
+    0x8C09, 0x3631,
+    0x8BF4, 0x3604,
+    0x8BDF, 0x35D6,
+    0x8BCA, 0x35A8,
+    0x8BB5, 0x357B,
+    0x8BA0, 0x354D,
+    0x8B8B, 0x351F,
+    0x8B76, 0x34F2,
+    0x8B61, 0x34C4,
+    0x8B4D, 0x3496,
+    0x8B38, 0x3468,
+    0x8B24, 0x343A,
+    0x8B0F, 0x340C,
+    0x8AFB, 0x33DE,
+    0x8AE6, 0x33B0,
+    0x8AD2, 0x3382,
+    0x8ABE, 0x3354,
+    0x8AAA, 0x3326,
+    0x8A96, 0x32F8,
+    0x8A82, 0x32CA,
+    0x8A6E, 0x329C,
+    0x8A5A, 0x326E,
+    0x8A46, 0x3240,
+    0x8A33, 0x3211,
+    0x8A1F, 0x31E3,
+    0x8A0B, 0x31B5,
+    0x89F8, 0x3186,
+    0x89E4, 0x3158,
+    0x89D1, 0x312A,
+    0x89BE, 0x30FB,
+    0x89AB, 0x30CD,
+    0x8997, 0x309E,
+    0x8984, 0x3070,
+    0x8971, 0x3041,
+    0x895F, 0x3013,
+    0x894C, 0x2FE4,
+    0x8939, 0x2FB5,
+    0x8926, 0x2F87,
+    0x8914, 0x2F58,
+    0x8901, 0x2F29,
+    0x88EF, 0x2EFB,
+    0x88DC, 0x2ECC,
+    0x88CA, 0x2E9D,
+    0x88B8, 0x2E6E,
+    0x88A5, 0x2E3F,
+    0x8893, 0x2E11,
+    0x8881, 0x2DE2,
+    0x886F, 0x2DB3,
+    0x885D, 0x2D84,
+    0x884B, 0x2D55,
+    0x883A, 0x2D26,
+    0x8828, 0x2CF7,
+    0x8816, 0x2CC8,
+    0x8805, 0x2C98,
+    0x87F3, 0x2C69,
+    0x87E2, 0x2C3A,
+    0x87D1, 0x2C0B,
+    0x87BF, 0x2BDC,
+    0x87AE, 0x2BAD,
+    0x879D, 0x2B7D,
+    0x878C, 0x2B4E,
+    0x877B, 0x2B1F,
+    0x876A, 0x2AEF,
+    0x8759, 0x2AC0,
+    0x8749, 0x2A91,
+    0x8738, 0x2A61,
+    0x8727, 0x2A32,
+    0x8717, 0x2A02,
+    0x8706, 0x29D3,
+    0x86F6, 0x29A3,
+    0x86E6, 0x2974,
+    0x86D5, 0x2944,
+    0x86C5, 0x2915,
+    0x86B5, 0x28E5,
+    0x86A5, 0x28B5,
+    0x8695, 0x2886,
+    0x8685, 0x2856,
+    0x8675, 0x2826,
+    0x8666, 0x27F6,
+    0x8656, 0x27C7,
+    0x8646, 0x2797,
+    0x8637, 0x2767,
+    0x8627, 0x2737,
+    0x8618, 0x2707,
+    0x8609, 0x26D8,
+    0x85FA, 0x26A8,
+    0x85EA, 0x2678,
+    0x85DB, 0x2648,
+    0x85CC, 0x2618,
+    0x85BD, 0x25E8,
+    0x85AF, 0x25B8,
+    0x85A0, 0x2588,
+    0x8591, 0x2558,
+    0x8582, 0x2528,
+    0x8574, 0x24F7,
+    0x8565, 0x24C7,
+    0x8557, 0x2497,
+    0x8549, 0x2467,
+    0x853A, 0x2437,
+    0x852C, 0x2407,
+    0x851E, 0x23D6,
+    0x8510, 0x23A6,
+    0x8502, 0x2376,
+    0x84F4, 0x2345,
+    0x84E6, 0x2315,
+    0x84D9, 0x22E5,
+    0x84CB, 0x22B4,
+    0x84BD, 0x2284,
+    0x84B0, 0x2254,
+    0x84A2, 0x2223,
+    0x8495, 0x21F3,
+    0x8488, 0x21C2,
+    0x847B, 0x2192,
+    0x846D, 0x2161,
+    0x8460, 0x2131,
+    0x8453, 0x2100,
+    0x8446, 0x20D0,
+    0x843A, 0x209F,
+    0x842D, 0x206E,
+    0x8420, 0x203E,
+    0x8414, 0x200D,
+    0x8407, 0x1FDC,
+    0x83FA, 0x1FAC,
+    0x83EE, 0x1F7B,
+    0x83E2, 0x1F4A,
+    0x83D6, 0x1F19,
+    0x83C9, 0x1EE9,
+    0x83BD, 0x1EB8,
+    0x83B1, 0x1E87,
+    0x83A5, 0x1E56,
+    0x8399, 0x1E25,
+    0x838E, 0x1DF5,
+    0x8382, 0x1DC4,
+    0x8376, 0x1D93,
+    0x836B, 0x1D62,
+    0x835F, 0x1D31,
+    0x8354, 0x1D00,
+    0x8348, 0x1CCF,
+    0x833D, 0x1C9E,
+    0x8332, 0x1C6D,
+    0x8327, 0x1C3C,
+    0x831C, 0x1C0B,
+    0x8311, 0x1BDA,
+    0x8306, 0x1BA9,
+    0x82FB, 0x1B78,
+    0x82F0, 0x1B47,
+    0x82E6, 0x1B16,
+    0x82DB, 0x1AE4,
+    0x82D0, 0x1AB3,
+    0x82C6, 0x1A82,
+    0x82BC, 0x1A51,
+    0x82B1, 0x1A20,
+    0x82A7, 0x19EF,
+    0x829D, 0x19BD,
+    0x8293, 0x198C,
+    0x8289, 0x195B,
+    0x827F, 0x192A,
+    0x8275, 0x18F8,
+    0x826B, 0x18C7,
+    0x8262, 0x1896,
+    0x8258, 0x1864,
+    0x824F, 0x1833,
+    0x8245, 0x1802,
+    0x823C, 0x17D0,
+    0x8232, 0x179F,
+    0x8229, 0x176D,
+    0x8220, 0x173C,
+    0x8217, 0x170A,
+    0x820E, 0x16D9,
+    0x8205, 0x16A8,
+    0x81FC, 0x1676,
+    0x81F3, 0x1645,
+    0x81EB, 0x1613,
+    0x81E2, 0x15E2,
+    0x81D9, 0x15B0,
+    0x81D1, 0x157F,
+    0x81C8, 0x154D,
+    0x81C0, 0x151B,
+    0x81B8, 0x14EA,
+    0x81B0, 0x14B8,
+    0x81A8, 0x1487,
+    0x81A0, 0x1455,
+    0x8198, 0x1423,
+    0x8190, 0x13F2,
+    0x8188, 0x13C0,
+    0x8180, 0x138E,
+    0x8179, 0x135D,
+    0x8171, 0x132B,
+    0x816A, 0x12F9,
+    0x8162, 0x12C8,
+    0x815B, 0x1296,
+    0x8154, 0x1264,
+    0x814C, 0x1232,
+    0x8145, 0x1201,
+    0x813E, 0x11CF,
+    0x8137, 0x119D,
+    0x8130, 0x116B,
+    0x812A, 0x1139,
+    0x8123, 0x1108,
+    0x811C, 0x10D6,
+    0x8116, 0x10A4,
+    0x810F, 0x1072,
+    0x8109, 0x1040,
+    0x8102, 0x100E,
+    0x80FC, 0x0FDD,
+    0x80F6, 0x0FAB,
+    0x80F0, 0x0F79,
+    0x80EA, 0x0F47,
+    0x80E4, 0x0F15,
+    0x80DE, 0x0EE3,
+    0x80D8, 0x0EB1,
+    0x80D2, 0x0E7F,
+    0x80CD, 0x0E4D,
+    0x80C7, 0x0E1B,
+    0x80C2, 0x0DE9,
+    0x80BC, 0x0DB7,
+    0x80B7, 0x0D85,
+    0x80B2, 0x0D53,
+    0x80AC, 0x0D21,
+    0x80A7, 0x0CEF,
+    0x80A2, 0x0CBD,
+    0x809D, 0x0C8B,
+    0x8098, 0x0C59,
+    0x8094, 0x0C27,
+    0x808F, 0x0BF5,
+    0x808A, 0x0BC3,
+    0x8086, 0x0B91,
+    0x8081, 0x0B5F,
+    0x807D, 0x0B2D,
+    0x8078, 0x0AFB,
+    0x8074, 0x0AC9,
+    0x8070, 0x0A97,
+    0x806C, 0x0A65,
+    0x8068, 0x0A33,
+    0x8064, 0x0A00,
+    0x8060, 0x09CE,
+    0x805C, 0x099C,
+    0x8058, 0x096A,
+    0x8055, 0x0938,
+    0x8051, 0x0906,
+    0x804E, 0x08D4,
+    0x804A, 0x08A2,
+    0x8047, 0x086F,
+    0x8043, 0x083D,
+    0x8040, 0x080B,
+    0x803D, 0x07D9,
+    0x803A, 0x07A7,
+    0x8037, 0x0775,
+    0x8034, 0x0742,
+    0x8031, 0x0710,
+    0x802F, 0x06DE,
+    0x802C, 0x06AC,
+    0x8029, 0x067A,
+    0x8027, 0x0647,
+    0x8025, 0x0615,
+    0x8022, 0x05E3,
+    0x8020, 0x05B1,
+    0x801E, 0x057F,
+    0x801C, 0x054C,
+    0x801A, 0x051A,
+    0x8018, 0x04E8,
+    0x8016, 0x04B6,
+    0x8014, 0x0483,
+    0x8012, 0x0451,
+    0x8011, 0x041F,
+    0x800F, 0x03ED,
+    0x800D, 0x03BA,
+    0x800C, 0x0388,
+    0x800B, 0x0356,
+    0x8009, 0x0324,
+    0x8008, 0x02F1,
+    0x8007, 0x02BF,
+    0x8006, 0x028D,
+    0x8005, 0x025B,
+    0x8004, 0x0228,
+    0x8003, 0x01F6,
+    0x8003, 0x01C4,
+    0x8002, 0x0192,
+    0x8001, 0x015F,
+    0x8001, 0x012D,
+    0x8000, 0x00FB,
+    0x8000, 0x00C9,
+    0x8000, 0x0096,
+    0x8000, 0x0064,
+    0x8000, 0x0032,
+    0x8000, 0x0000,
+    0x8000, 0xFFCD,
+    0x8000, 0xFF9B,
+    0x8000, 0xFF69,
+    0x8000, 0xFF36,
+    0x8000, 0xFF04,
+    0x8001, 0xFED2,
+    0x8001, 0xFEA0,
+    0x8002, 0xFE6D,
+    0x8003, 0xFE3B,
+    0x8003, 0xFE09,
+    0x8004, 0xFDD7,
+    0x8005, 0xFDA4,
+    0x8006, 0xFD72,
+    0x8007, 0xFD40,
+    0x8008, 0xFD0E,
+    0x8009, 0xFCDB,
+    0x800B, 0xFCA9,
+    0x800C, 0xFC77,
+    0x800D, 0xFC45,
+    0x800F, 0xFC12,
+    0x8011, 0xFBE0,
+    0x8012, 0xFBAE,
+    0x8014, 0xFB7C,
+    0x8016, 0xFB49,
+    0x8018, 0xFB17,
+    0x801A, 0xFAE5,
+    0x801C, 0xFAB3,
+    0x801E, 0xFA80,
+    0x8020, 0xFA4E,
+    0x8022, 0xFA1C,
+    0x8025, 0xF9EA,
+    0x8027, 0xF9B8,
+    0x8029, 0xF985,
+    0x802C, 0xF953,
+    0x802F, 0xF921,
+    0x8031, 0xF8EF,
+    0x8034, 0xF8BD,
+    0x8037, 0xF88A,
+    0x803A, 0xF858,
+    0x803D, 0xF826,
+    0x8040, 0xF7F4,
+    0x8043, 0xF7C2,
+    0x8047, 0xF790,
+    0x804A, 0xF75D,
+    0x804E, 0xF72B,
+    0x8051, 0xF6F9,
+    0x8055, 0xF6C7,
+    0x8058, 0xF695,
+    0x805C, 0xF663,
+    0x8060, 0xF631,
+    0x8064, 0xF5FF,
+    0x8068, 0xF5CC,
+    0x806C, 0xF59A,
+    0x8070, 0xF568,
+    0x8074, 0xF536,
+    0x8078, 0xF504,
+    0x807D, 0xF4D2,
+    0x8081, 0xF4A0,
+    0x8086, 0xF46E,
+    0x808A, 0xF43C,
+    0x808F, 0xF40A,
+    0x8094, 0xF3D8,
+    0x8098, 0xF3A6,
+    0x809D, 0xF374,
+    0x80A2, 0xF342,
+    0x80A7, 0xF310,
+    0x80AC, 0xF2DE,
+    0x80B2, 0xF2AC,
+    0x80B7, 0xF27A,
+    0x80BC, 0xF248,
+    0x80C2, 0xF216,
+    0x80C7, 0xF1E4,
+    0x80CD, 0xF1B2,
+    0x80D2, 0xF180,
+    0x80D8, 0xF14E,
+    0x80DE, 0xF11C,
+    0x80E4, 0xF0EA,
+    0x80EA, 0xF0B8,
+    0x80F0, 0xF086,
+    0x80F6, 0xF054,
+    0x80FC, 0xF022,
+    0x8102, 0xEFF1,
+    0x8109, 0xEFBF,
+    0x810F, 0xEF8D,
+    0x8116, 0xEF5B,
+    0x811C, 0xEF29,
+    0x8123, 0xEEF7,
+    0x812A, 0xEEC6,
+    0x8130, 0xEE94,
+    0x8137, 0xEE62,
+    0x813E, 0xEE30,
+    0x8145, 0xEDFE,
+    0x814C, 0xEDCD,
+    0x8154, 0xED9B,
+    0x815B, 0xED69,
+    0x8162, 0xED37,
+    0x816A, 0xED06,
+    0x8171, 0xECD4,
+    0x8179, 0xECA2,
+    0x8180, 0xEC71,
+    0x8188, 0xEC3F,
+    0x8190, 0xEC0D,
+    0x8198, 0xEBDC,
+    0x81A0, 0xEBAA,
+    0x81A8, 0xEB78,
+    0x81B0, 0xEB47,
+    0x81B8, 0xEB15,
+    0x81C0, 0xEAE4,
+    0x81C8, 0xEAB2,
+    0x81D1, 0xEA80,
+    0x81D9, 0xEA4F,
+    0x81E2, 0xEA1D,
+    0x81EB, 0xE9EC,
+    0x81F3, 0xE9BA,
+    0x81FC, 0xE989,
+    0x8205, 0xE957,
+    0x820E, 0xE926,
+    0x8217, 0xE8F5,
+    0x8220, 0xE8C3,
+    0x8229, 0xE892,
+    0x8232, 0xE860,
+    0x823C, 0xE82F,
+    0x8245, 0xE7FD,
+    0x824F, 0xE7CC,
+    0x8258, 0xE79B,
+    0x8262, 0xE769,
+    0x826B, 0xE738,
+    0x8275, 0xE707,
+    0x827F, 0xE6D5,
+    0x8289, 0xE6A4,
+    0x8293, 0xE673,
+    0x829D, 0xE642,
+    0x82A7, 0xE610,
+    0x82B1, 0xE5DF,
+    0x82BC, 0xE5AE,
+    0x82C6, 0xE57D,
+    0x82D0, 0xE54C,
+    0x82DB, 0xE51B,
+    0x82E6, 0xE4E9,
+    0x82F0, 0xE4B8,
+    0x82FB, 0xE487,
+    0x8306, 0xE456,
+    0x8311, 0xE425,
+    0x831C, 0xE3F4,
+    0x8327, 0xE3C3,
+    0x8332, 0xE392,
+    0x833D, 0xE361,
+    0x8348, 0xE330,
+    0x8354, 0xE2FF,
+    0x835F, 0xE2CE,
+    0x836B, 0xE29D,
+    0x8376, 0xE26C,
+    0x8382, 0xE23B,
+    0x838E, 0xE20A,
+    0x8399, 0xE1DA,
+    0x83A5, 0xE1A9,
+    0x83B1, 0xE178,
+    0x83BD, 0xE147,
+    0x83C9, 0xE116,
+    0x83D6, 0xE0E6,
+    0x83E2, 0xE0B5,
+    0x83EE, 0xE084,
+    0x83FA, 0xE053,
+    0x8407, 0xE023,
+    0x8414, 0xDFF2,
+    0x8420, 0xDFC1,
+    0x842D, 0xDF91,
+    0x843A, 0xDF60,
+    0x8446, 0xDF2F,
+    0x8453, 0xDEFF,
+    0x8460, 0xDECE,
+    0x846D, 0xDE9E,
+    0x847B, 0xDE6D,
+    0x8488, 0xDE3D,
+    0x8495, 0xDE0C,
+    0x84A2, 0xDDDC,
+    0x84B0, 0xDDAB,
+    0x84BD, 0xDD7B,
+    0x84CB, 0xDD4B,
+    0x84D9, 0xDD1A,
+    0x84E6, 0xDCEA,
+    0x84F4, 0xDCBA,
+    0x8502, 0xDC89,
+    0x8510, 0xDC59,
+    0x851E, 0xDC29,
+    0x852C, 0xDBF8,
+    0x853A, 0xDBC8,
+    0x8549, 0xDB98,
+    0x8557, 0xDB68,
+    0x8565, 0xDB38,
+    0x8574, 0xDB08,
+    0x8582, 0xDAD7,
+    0x8591, 0xDAA7,
+    0x85A0, 0xDA77,
+    0x85AF, 0xDA47,
+    0x85BD, 0xDA17,
+    0x85CC, 0xD9E7,
+    0x85DB, 0xD9B7,
+    0x85EA, 0xD987,
+    0x85FA, 0xD957,
+    0x8609, 0xD927,
+    0x8618, 0xD8F8,
+    0x8627, 0xD8C8,
+    0x8637, 0xD898,
+    0x8646, 0xD868,
+    0x8656, 0xD838,
+    0x8666, 0xD809,
+    0x8675, 0xD7D9,
+    0x8685, 0xD7A9,
+    0x8695, 0xD779,
+    0x86A5, 0xD74A,
+    0x86B5, 0xD71A,
+    0x86C5, 0xD6EA,
+    0x86D5, 0xD6BB,
+    0x86E6, 0xD68B,
+    0x86F6, 0xD65C,
+    0x8706, 0xD62C,
+    0x8717, 0xD5FD,
+    0x8727, 0xD5CD,
+    0x8738, 0xD59E,
+    0x8749, 0xD56E,
+    0x8759, 0xD53F,
+    0x876A, 0xD510,
+    0x877B, 0xD4E0,
+    0x878C, 0xD4B1,
+    0x879D, 0xD482,
+    0x87AE, 0xD452,
+    0x87BF, 0xD423,
+    0x87D1, 0xD3F4,
+    0x87E2, 0xD3C5,
+    0x87F3, 0xD396,
+    0x8805, 0xD367,
+    0x8816, 0xD337,
+    0x8828, 0xD308,
+    0x883A, 0xD2D9,
+    0x884B, 0xD2AA,
+    0x885D, 0xD27B,
+    0x886F, 0xD24C,
+    0x8881, 0xD21D,
+    0x8893, 0xD1EE,
+    0x88A5, 0xD1C0,
+    0x88B8, 0xD191,
+    0x88CA, 0xD162,
+    0x88DC, 0xD133,
+    0x88EF, 0xD104,
+    0x8901, 0xD0D6,
+    0x8914, 0xD0A7,
+    0x8926, 0xD078,
+    0x8939, 0xD04A,
+    0x894C, 0xD01B,
+    0x895F, 0xCFEC,
+    0x8971, 0xCFBE,
+    0x8984, 0xCF8F,
+    0x8997, 0xCF61,
+    0x89AB, 0xCF32,
+    0x89BE, 0xCF04,
+    0x89D1, 0xCED5,
+    0x89E4, 0xCEA7,
+    0x89F8, 0xCE79,
+    0x8A0B, 0xCE4A,
+    0x8A1F, 0xCE1C,
+    0x8A33, 0xCDEE,
+    0x8A46, 0xCDBF,
+    0x8A5A, 0xCD91,
+    0x8A6E, 0xCD63,
+    0x8A82, 0xCD35,
+    0x8A96, 0xCD07,
+    0x8AAA, 0xCCD9,
+    0x8ABE, 0xCCAB,
+    0x8AD2, 0xCC7D,
+    0x8AE6, 0xCC4F,
+    0x8AFB, 0xCC21,
+    0x8B0F, 0xCBF3,
+    0x8B24, 0xCBC5,
+    0x8B38, 0xCB97,
+    0x8B4D, 0xCB69,
+    0x8B61, 0xCB3B,
+    0x8B76, 0xCB0D,
+    0x8B8B, 0xCAE0,
+    0x8BA0, 0xCAB2,
+    0x8BB5, 0xCA84,
+    0x8BCA, 0xCA57,
+    0x8BDF, 0xCA29,
+    0x8BF4, 0xC9FB,
+    0x8C09, 0xC9CE,
+    0x8C1F, 0xC9A0,
+    0x8C34, 0xC973,
+    0x8C4A, 0xC945,
+    0x8C5F, 0xC918,
+    0x8C75, 0xC8EB,
+    0x8C8A, 0xC8BD,
+    0x8CA0, 0xC890,
+    0x8CB6, 0xC863,
+    0x8CCC, 0xC835,
+    0x8CE2, 0xC808,
+    0x8CF8, 0xC7DB,
+    0x8D0E, 0xC7AE,
+    0x8D24, 0xC781,
+    0x8D3A, 0xC754,
+    0x8D50, 0xC727,
+    0x8D67, 0xC6F9,
+    0x8D7D, 0xC6CD,
+    0x8D94, 0xC6A0,
+    0x8DAA, 0xC673,
+    0x8DC1, 0xC646,
+    0x8DD8, 0xC619,
+    0x8DEE, 0xC5EC,
+    0x8E05, 0xC5BF,
+    0x8E1C, 0xC593,
+    0x8E33, 0xC566,
+    0x8E4A, 0xC539,
+    0x8E61, 0xC50D,
+    0x8E79, 0xC4E0,
+    0x8E90, 0xC4B3,
+    0x8EA7, 0xC487,
+    0x8EBE, 0xC45A,
+    0x8ED6, 0xC42E,
+    0x8EED, 0xC402,
+    0x8F05, 0xC3D5,
+    0x8F1D, 0xC3A9,
+    0x8F34, 0xC37C,
+    0x8F4C, 0xC350,
+    0x8F64, 0xC324,
+    0x8F7C, 0xC2F8,
+    0x8F94, 0xC2CC,
+    0x8FAC, 0xC29F,
+    0x8FC4, 0xC273,
+    0x8FDC, 0xC247,
+    0x8FF5, 0xC21B,
+    0x900D, 0xC1EF,
+    0x9025, 0xC1C3,
+    0x903E, 0xC197,
+    0x9056, 0xC16C,
+    0x906F, 0xC140,
+    0x9088, 0xC114,
+    0x90A0, 0xC0E8,
+    0x90B9, 0xC0BC,
+    0x90D2, 0xC091,
+    0x90EB, 0xC065,
+    0x9104, 0xC03A,
+    0x911D, 0xC00E,
+    0x9136, 0xBFE2,
+    0x9150, 0xBFB7,
+    0x9169, 0xBF8C,
+    0x9182, 0xBF60,
+    0x919C, 0xBF35,
+    0x91B5, 0xBF09,
+    0x91CF, 0xBEDE,
+    0x91E8, 0xBEB3,
+    0x9202, 0xBE88,
+    0x921C, 0xBE5D,
+    0x9235, 0xBE31,
+    0x924F, 0xBE06,
+    0x9269, 0xBDDB,
+    0x9283, 0xBDB0,
+    0x929D, 0xBD85,
+    0x92B7, 0xBD5A,
+    0x92D2, 0xBD2F,
+    0x92EC, 0xBD05,
+    0x9306, 0xBCDA,
+    0x9321, 0xBCAF,
+    0x933B, 0xBC84,
+    0x9356, 0xBC5A,
+    0x9370, 0xBC2F,
+    0x938B, 0xBC04,
+    0x93A6, 0xBBDA,
+    0x93C0, 0xBBAF,
+    0x93DB, 0xBB85,
+    0x93F6, 0xBB5A,
+    0x9411, 0xBB30,
+    0x942C, 0xBB05,
+    0x9447, 0xBADB,
+    0x9463, 0xBAB1,
+    0x947E, 0xBA87,
+    0x9499, 0xBA5C,
+    0x94B5, 0xBA32,
+    0x94D0, 0xBA08,
+    0x94EC, 0xB9DE,
+    0x9507, 0xB9B4,
+    0x9523, 0xB98A,
+    0x953E, 0xB960,
+    0x955A, 0xB936,
+    0x9576, 0xB90C,
+    0x9592, 0xB8E3,
+    0x95AE, 0xB8B9,
+    0x95CA, 0xB88F,
+    0x95E6, 0xB865,
+    0x9602, 0xB83C,
+    0x961E, 0xB812,
+    0x963B, 0xB7E9,
+    0x9657, 0xB7BF,
+    0x9673, 0xB796,
+    0x9690, 0xB76C,
+    0x96AC, 0xB743,
+    0x96C9, 0xB719,
+    0x96E6, 0xB6F0,
+    0x9702, 0xB6C7,
+    0x971F, 0xB69E,
+    0x973C, 0xB675,
+    0x9759, 0xB64B,
+    0x9776, 0xB622,
+    0x9793, 0xB5F9,
+    0x97B0, 0xB5D0,
+    0x97CD, 0xB5A7,
+    0x97EA, 0xB57E,
+    0x9808, 0xB556,
+    0x9825, 0xB52D,
+    0x9842, 0xB504,
+    0x9860, 0xB4DB,
+    0x987D, 0xB4B3,
+    0x989B, 0xB48A,
+    0x98B9, 0xB461,
+    0x98D6, 0xB439,
+    0x98F4, 0xB410,
+    0x9912, 0xB3E8,
+    0x9930, 0xB3C0,
+    0x994E, 0xB397,
+    0x996C, 0xB36F,
+    0x998A, 0xB347,
+    0x99A8, 0xB31E,
+    0x99C6, 0xB2F6,
+    0x99E5, 0xB2CE,
+    0x9A03, 0xB2A6,
+    0x9A22, 0xB27E,
+    0x9A40, 0xB256,
+    0x9A5F, 0xB22E,
+    0x9A7D, 0xB206,
+    0x9A9C, 0xB1DE,
+    0x9ABA, 0xB1B7,
+    0x9AD9, 0xB18F,
+    0x9AF8, 0xB167,
+    0x9B17, 0xB140,
+    0x9B36, 0xB118,
+    0x9B55, 0xB0F0,
+    0x9B74, 0xB0C9,
+    0x9B93, 0xB0A1,
+    0x9BB2, 0xB07A,
+    0x9BD2, 0xB053,
+    0x9BF1, 0xB02B,
+    0x9C10, 0xB004,
+    0x9C30, 0xAFDD,
+    0x9C4F, 0xAFB6,
+    0x9C6F, 0xAF8F,
+    0x9C8E, 0xAF68,
+    0x9CAE, 0xAF40,
+    0x9CCE, 0xAF1A,
+    0x9CEE, 0xAEF3,
+    0x9D0D, 0xAECC,
+    0x9D2D, 0xAEA5,
+    0x9D4D, 0xAE7E,
+    0x9D6D, 0xAE57,
+    0x9D8E, 0xAE31,
+    0x9DAE, 0xAE0A,
+    0x9DCE, 0xADE3,
+    0x9DEE, 0xADBD,
+    0x9E0E, 0xAD96,
+    0x9E2F, 0xAD70,
+    0x9E4F, 0xAD4A,
+    0x9E70, 0xAD23,
+    0x9E90, 0xACFD,
+    0x9EB1, 0xACD7,
+    0x9ED2, 0xACB1,
+    0x9EF2, 0xAC8A,
+    0x9F13, 0xAC64,
+    0x9F34, 0xAC3E,
+    0x9F55, 0xAC18,
+    0x9F76, 0xABF2,
+    0x9F97, 0xABCC,
+    0x9FB8, 0xABA7,
+    0x9FD9, 0xAB81,
+    0x9FFB, 0xAB5B,
+    0xA01C, 0xAB35,
+    0xA03D, 0xAB10,
+    0xA05F, 0xAAEA,
+    0xA080, 0xAAC5,
+    0xA0A1, 0xAA9F,
+    0xA0C3, 0xAA7A,
+    0xA0E5, 0xAA54,
+    0xA106, 0xAA2F,
+    0xA128, 0xAA0A,
+    0xA14A, 0xA9E5,
+    0xA16C, 0xA9BF,
+    0xA18E, 0xA99A,
+    0xA1AF, 0xA975,
+    0xA1D2, 0xA950,
+    0xA1F4, 0xA92B,
+    0xA216, 0xA906,
+    0xA238, 0xA8E2,
+    0xA25A, 0xA8BD,
+    0xA27C, 0xA898,
+    0xA29F, 0xA873,
+    0xA2C1, 0xA84F,
+    0xA2E4, 0xA82A,
+    0xA306, 0xA806,
+    0xA329, 0xA7E1,
+    0xA34B, 0xA7BD,
+    0xA36E, 0xA798,
+    0xA391, 0xA774,
+    0xA3B4, 0xA750,
+    0xA3D6, 0xA72B,
+    0xA3F9, 0xA707,
+    0xA41C, 0xA6E3,
+    0xA43F, 0xA6BF,
+    0xA462, 0xA69B,
+    0xA486, 0xA677,
+    0xA4A9, 0xA653,
+    0xA4CC, 0xA62F,
+    0xA4EF, 0xA60C,
+    0xA513, 0xA5E8,
+    0xA536, 0xA5C4,
+    0xA55A, 0xA5A1,
+    0xA57D, 0xA57D,
+    0xA5A1, 0xA55A,
+    0xA5C4, 0xA536,
+    0xA5E8, 0xA513,
+    0xA60C, 0xA4EF,
+    0xA62F, 0xA4CC,
+    0xA653, 0xA4A9,
+    0xA677, 0xA486,
+    0xA69B, 0xA462,
+    0xA6BF, 0xA43F,
+    0xA6E3, 0xA41C,
+    0xA707, 0xA3F9,
+    0xA72B, 0xA3D6,
+    0xA750, 0xA3B4,
+    0xA774, 0xA391,
+    0xA798, 0xA36E,
+    0xA7BD, 0xA34B,
+    0xA7E1, 0xA329,
+    0xA806, 0xA306,
+    0xA82A, 0xA2E4,
+    0xA84F, 0xA2C1,
+    0xA873, 0xA29F,
+    0xA898, 0xA27C,
+    0xA8BD, 0xA25A,
+    0xA8E2, 0xA238,
+    0xA906, 0xA216,
+    0xA92B, 0xA1F4,
+    0xA950, 0xA1D2,
+    0xA975, 0xA1AF,
+    0xA99A, 0xA18E,
+    0xA9BF, 0xA16C,
+    0xA9E5, 0xA14A,
+    0xAA0A, 0xA128,
+    0xAA2F, 0xA106,
+    0xAA54, 0xA0E5,
+    0xAA7A, 0xA0C3,
+    0xAA9F, 0xA0A1,
+    0xAAC5, 0xA080,
+    0xAAEA, 0xA05F,
+    0xAB10, 0xA03D,
+    0xAB35, 0xA01C,
+    0xAB5B, 0x9FFB,
+    0xAB81, 0x9FD9,
+    0xABA7, 0x9FB8,
+    0xABCC, 0x9F97,
+    0xABF2, 0x9F76,
+    0xAC18, 0x9F55,
+    0xAC3E, 0x9F34,
+    0xAC64, 0x9F13,
+    0xAC8A, 0x9EF2,
+    0xACB1, 0x9ED2,
+    0xACD7, 0x9EB1,
+    0xACFD, 0x9E90,
+    0xAD23, 0x9E70,
+    0xAD4A, 0x9E4F,
+    0xAD70, 0x9E2F,
+    0xAD96, 0x9E0E,
+    0xADBD, 0x9DEE,
+    0xADE3, 0x9DCE,
+    0xAE0A, 0x9DAE,
+    0xAE31, 0x9D8E,
+    0xAE57, 0x9D6D,
+    0xAE7E, 0x9D4D,
+    0xAEA5, 0x9D2D,
+    0xAECC, 0x9D0D,
+    0xAEF3, 0x9CEE,
+    0xAF1A, 0x9CCE,
+    0xAF40, 0x9CAE,
+    0xAF68, 0x9C8E,
+    0xAF8F, 0x9C6F,
+    0xAFB6, 0x9C4F,
+    0xAFDD, 0x9C30,
+    0xB004, 0x9C10,
+    0xB02B, 0x9BF1,
+    0xB053, 0x9BD2,
+    0xB07A, 0x9BB2,
+    0xB0A1, 0x9B93,
+    0xB0C9, 0x9B74,
+    0xB0F0, 0x9B55,
+    0xB118, 0x9B36,
+    0xB140, 0x9B17,
+    0xB167, 0x9AF8,
+    0xB18F, 0x9AD9,
+    0xB1B7, 0x9ABA,
+    0xB1DE, 0x9A9C,
+    0xB206, 0x9A7D,
+    0xB22E, 0x9A5F,
+    0xB256, 0x9A40,
+    0xB27E, 0x9A22,
+    0xB2A6, 0x9A03,
+    0xB2CE, 0x99E5,
+    0xB2F6, 0x99C6,
+    0xB31E, 0x99A8,
+    0xB347, 0x998A,
+    0xB36F, 0x996C,
+    0xB397, 0x994E,
+    0xB3C0, 0x9930,
+    0xB3E8, 0x9912,
+    0xB410, 0x98F4,
+    0xB439, 0x98D6,
+    0xB461, 0x98B9,
+    0xB48A, 0x989B,
+    0xB4B3, 0x987D,
+    0xB4DB, 0x9860,
+    0xB504, 0x9842,
+    0xB52D, 0x9825,
+    0xB556, 0x9808,
+    0xB57E, 0x97EA,
+    0xB5A7, 0x97CD,
+    0xB5D0, 0x97B0,
+    0xB5F9, 0x9793,
+    0xB622, 0x9776,
+    0xB64B, 0x9759,
+    0xB675, 0x973C,
+    0xB69E, 0x971F,
+    0xB6C7, 0x9702,
+    0xB6F0, 0x96E6,
+    0xB719, 0x96C9,
+    0xB743, 0x96AC,
+    0xB76C, 0x9690,
+    0xB796, 0x9673,
+    0xB7BF, 0x9657,
+    0xB7E9, 0x963B,
+    0xB812, 0x961E,
+    0xB83C, 0x9602,
+    0xB865, 0x95E6,
+    0xB88F, 0x95CA,
+    0xB8B9, 0x95AE,
+    0xB8E3, 0x9592,
+    0xB90C, 0x9576,
+    0xB936, 0x955A,
+    0xB960, 0x953E,
+    0xB98A, 0x9523,
+    0xB9B4, 0x9507,
+    0xB9DE, 0x94EC,
+    0xBA08, 0x94D0,
+    0xBA32, 0x94B5,
+    0xBA5C, 0x9499,
+    0xBA87, 0x947E,
+    0xBAB1, 0x9463,
+    0xBADB, 0x9447,
+    0xBB05, 0x942C,
+    0xBB30, 0x9411,
+    0xBB5A, 0x93F6,
+    0xBB85, 0x93DB,
+    0xBBAF, 0x93C0,
+    0xBBDA, 0x93A6,
+    0xBC04, 0x938B,
+    0xBC2F, 0x9370,
+    0xBC5A, 0x9356,
+    0xBC84, 0x933B,
+    0xBCAF, 0x9321,
+    0xBCDA, 0x9306,
+    0xBD05, 0x92EC,
+    0xBD2F, 0x92D2,
+    0xBD5A, 0x92B7,
+    0xBD85, 0x929D,
+    0xBDB0, 0x9283,
+    0xBDDB, 0x9269,
+    0xBE06, 0x924F,
+    0xBE31, 0x9235,
+    0xBE5D, 0x921C,
+    0xBE88, 0x9202,
+    0xBEB3, 0x91E8,
+    0xBEDE, 0x91CF,
+    0xBF09, 0x91B5,
+    0xBF35, 0x919C,
+    0xBF60, 0x9182,
+    0xBF8C, 0x9169,
+    0xBFB7, 0x9150,
+    0xBFE2, 0x9136,
+    0xC00E, 0x911D,
+    0xC03A, 0x9104,
+    0xC065, 0x90EB,
+    0xC091, 0x90D2,
+    0xC0BC, 0x90B9,
+    0xC0E8, 0x90A0,
+    0xC114, 0x9088,
+    0xC140, 0x906F,
+    0xC16C, 0x9056,
+    0xC197, 0x903E,
+    0xC1C3, 0x9025,
+    0xC1EF, 0x900D,
+    0xC21B, 0x8FF5,
+    0xC247, 0x8FDC,
+    0xC273, 0x8FC4,
+    0xC29F, 0x8FAC,
+    0xC2CC, 0x8F94,
+    0xC2F8, 0x8F7C,
+    0xC324, 0x8F64,
+    0xC350, 0x8F4C,
+    0xC37C, 0x8F34,
+    0xC3A9, 0x8F1D,
+    0xC3D5, 0x8F05,
+    0xC402, 0x8EED,
+    0xC42E, 0x8ED6,
+    0xC45A, 0x8EBE,
+    0xC487, 0x8EA7,
+    0xC4B3, 0x8E90,
+    0xC4E0, 0x8E79,
+    0xC50D, 0x8E61,
+    0xC539, 0x8E4A,
+    0xC566, 0x8E33,
+    0xC593, 0x8E1C,
+    0xC5BF, 0x8E05,
+    0xC5EC, 0x8DEE,
+    0xC619, 0x8DD8,
+    0xC646, 0x8DC1,
+    0xC673, 0x8DAA,
+    0xC6A0, 0x8D94,
+    0xC6CD, 0x8D7D,
+    0xC6F9, 0x8D67,
+    0xC727, 0x8D50,
+    0xC754, 0x8D3A,
+    0xC781, 0x8D24,
+    0xC7AE, 0x8D0E,
+    0xC7DB, 0x8CF8,
+    0xC808, 0x8CE2,
+    0xC835, 0x8CCC,
+    0xC863, 0x8CB6,
+    0xC890, 0x8CA0,
+    0xC8BD, 0x8C8A,
+    0xC8EB, 0x8C75,
+    0xC918, 0x8C5F,
+    0xC945, 0x8C4A,
+    0xC973, 0x8C34,
+    0xC9A0, 0x8C1F,
+    0xC9CE, 0x8C09,
+    0xC9FB, 0x8BF4,
+    0xCA29, 0x8BDF,
+    0xCA57, 0x8BCA,
+    0xCA84, 0x8BB5,
+    0xCAB2, 0x8BA0,
+    0xCAE0, 0x8B8B,
+    0xCB0D, 0x8B76,
+    0xCB3B, 0x8B61,
+    0xCB69, 0x8B4D,
+    0xCB97, 0x8B38,
+    0xCBC5, 0x8B24,
+    0xCBF3, 0x8B0F,
+    0xCC21, 0x8AFB,
+    0xCC4F, 0x8AE6,
+    0xCC7D, 0x8AD2,
+    0xCCAB, 0x8ABE,
+    0xCCD9, 0x8AAA,
+    0xCD07, 0x8A96,
+    0xCD35, 0x8A82,
+    0xCD63, 0x8A6E,
+    0xCD91, 0x8A5A,
+    0xCDBF, 0x8A46,
+    0xCDEE, 0x8A33,
+    0xCE1C, 0x8A1F,
+    0xCE4A, 0x8A0B,
+    0xCE79, 0x89F8,
+    0xCEA7, 0x89E4,
+    0xCED5, 0x89D1,
+    0xCF04, 0x89BE,
+    0xCF32, 0x89AB,
+    0xCF61, 0x8997,
+    0xCF8F, 0x8984,
+    0xCFBE, 0x8971,
+    0xCFEC, 0x895F,
+    0xD01B, 0x894C,
+    0xD04A, 0x8939,
+    0xD078, 0x8926,
+    0xD0A7, 0x8914,
+    0xD0D6, 0x8901,
+    0xD104, 0x88EF,
+    0xD133, 0x88DC,
+    0xD162, 0x88CA,
+    0xD191, 0x88B8,
+    0xD1C0, 0x88A5,
+    0xD1EE, 0x8893,
+    0xD21D, 0x8881,
+    0xD24C, 0x886F,
+    0xD27B, 0x885D,
+    0xD2AA, 0x884B,
+    0xD2D9, 0x883A,
+    0xD308, 0x8828,
+    0xD337, 0x8816,
+    0xD367, 0x8805,
+    0xD396, 0x87F3,
+    0xD3C5, 0x87E2,
+    0xD3F4, 0x87D1,
+    0xD423, 0x87BF,
+    0xD452, 0x87AE,
+    0xD482, 0x879D,
+    0xD4B1, 0x878C,
+    0xD4E0, 0x877B,
+    0xD510, 0x876A,
+    0xD53F, 0x8759,
+    0xD56E, 0x8749,
+    0xD59E, 0x8738,
+    0xD5CD, 0x8727,
+    0xD5FD, 0x8717,
+    0xD62C, 0x8706,
+    0xD65C, 0x86F6,
+    0xD68B, 0x86E6,
+    0xD6BB, 0x86D5,
+    0xD6EA, 0x86C5,
+    0xD71A, 0x86B5,
+    0xD74A, 0x86A5,
+    0xD779, 0x8695,
+    0xD7A9, 0x8685,
+    0xD7D9, 0x8675,
+    0xD809, 0x8666,
+    0xD838, 0x8656,
+    0xD868, 0x8646,
+    0xD898, 0x8637,
+    0xD8C8, 0x8627,
+    0xD8F8, 0x8618,
+    0xD927, 0x8609,
+    0xD957, 0x85FA,
+    0xD987, 0x85EA,
+    0xD9B7, 0x85DB,
+    0xD9E7, 0x85CC,
+    0xDA17, 0x85BD,
+    0xDA47, 0x85AF,
+    0xDA77, 0x85A0,
+    0xDAA7, 0x8591,
+    0xDAD7, 0x8582,
+    0xDB08, 0x8574,
+    0xDB38, 0x8565,
+    0xDB68, 0x8557,
+    0xDB98, 0x8549,
+    0xDBC8, 0x853A,
+    0xDBF8, 0x852C,
+    0xDC29, 0x851E,
+    0xDC59, 0x8510,
+    0xDC89, 0x8502,
+    0xDCBA, 0x84F4,
+    0xDCEA, 0x84E6,
+    0xDD1A, 0x84D9,
+    0xDD4B, 0x84CB,
+    0xDD7B, 0x84BD,
+    0xDDAB, 0x84B0,
+    0xDDDC, 0x84A2,
+    0xDE0C, 0x8495,
+    0xDE3D, 0x8488,
+    0xDE6D, 0x847B,
+    0xDE9E, 0x846D,
+    0xDECE, 0x8460,
+    0xDEFF, 0x8453,
+    0xDF2F, 0x8446,
+    0xDF60, 0x843A,
+    0xDF91, 0x842D,
+    0xDFC1, 0x8420,
+    0xDFF2, 0x8414,
+    0xE023, 0x8407,
+    0xE053, 0x83FA,
+    0xE084, 0x83EE,
+    0xE0B5, 0x83E2,
+    0xE0E6, 0x83D6,
+    0xE116, 0x83C9,
+    0xE147, 0x83BD,
+    0xE178, 0x83B1,
+    0xE1A9, 0x83A5,
+    0xE1DA, 0x8399,
+    0xE20A, 0x838E,
+    0xE23B, 0x8382,
+    0xE26C, 0x8376,
+    0xE29D, 0x836B,
+    0xE2CE, 0x835F,
+    0xE2FF, 0x8354,
+    0xE330, 0x8348,
+    0xE361, 0x833D,
+    0xE392, 0x8332,
+    0xE3C3, 0x8327,
+    0xE3F4, 0x831C,
+    0xE425, 0x8311,
+    0xE456, 0x8306,
+    0xE487, 0x82FB,
+    0xE4B8, 0x82F0,
+    0xE4E9, 0x82E6,
+    0xE51B, 0x82DB,
+    0xE54C, 0x82D0,
+    0xE57D, 0x82C6,
+    0xE5AE, 0x82BC,
+    0xE5DF, 0x82B1,
+    0xE610, 0x82A7,
+    0xE642, 0x829D,
+    0xE673, 0x8293,
+    0xE6A4, 0x8289,
+    0xE6D5, 0x827F,
+    0xE707, 0x8275,
+    0xE738, 0x826B,
+    0xE769, 0x8262,
+    0xE79B, 0x8258,
+    0xE7CC, 0x824F,
+    0xE7FD, 0x8245,
+    0xE82F, 0x823C,
+    0xE860, 0x8232,
+    0xE892, 0x8229,
+    0xE8C3, 0x8220,
+    0xE8F5, 0x8217,
+    0xE926, 0x820E,
+    0xE957, 0x8205,
+    0xE989, 0x81FC,
+    0xE9BA, 0x81F3,
+    0xE9EC, 0x81EB,
+    0xEA1D, 0x81E2,
+    0xEA4F, 0x81D9,
+    0xEA80, 0x81D1,
+    0xEAB2, 0x81C8,
+    0xEAE4, 0x81C0,
+    0xEB15, 0x81B8,
+    0xEB47, 0x81B0,
+    0xEB78, 0x81A8,
+    0xEBAA, 0x81A0,
+    0xEBDC, 0x8198,
+    0xEC0D, 0x8190,
+    0xEC3F, 0x8188,
+    0xEC71, 0x8180,
+    0xECA2, 0x8179,
+    0xECD4, 0x8171,
+    0xED06, 0x816A,
+    0xED37, 0x8162,
+    0xED69, 0x815B,
+    0xED9B, 0x8154,
+    0xEDCD, 0x814C,
+    0xEDFE, 0x8145,
+    0xEE30, 0x813E,
+    0xEE62, 0x8137,
+    0xEE94, 0x8130,
+    0xEEC6, 0x812A,
+    0xEEF7, 0x8123,
+    0xEF29, 0x811C,
+    0xEF5B, 0x8116,
+    0xEF8D, 0x810F,
+    0xEFBF, 0x8109,
+    0xEFF1, 0x8102,
+    0xF022, 0x80FC,
+    0xF054, 0x80F6,
+    0xF086, 0x80F0,
+    0xF0B8, 0x80EA,
+    0xF0EA, 0x80E4,
+    0xF11C, 0x80DE,
+    0xF14E, 0x80D8,
+    0xF180, 0x80D2,
+    0xF1B2, 0x80CD,
+    0xF1E4, 0x80C7,
+    0xF216, 0x80C2,
+    0xF248, 0x80BC,
+    0xF27A, 0x80B7,
+    0xF2AC, 0x80B2,
+    0xF2DE, 0x80AC,
+    0xF310, 0x80A7,
+    0xF342, 0x80A2,
+    0xF374, 0x809D,
+    0xF3A6, 0x8098,
+    0xF3D8, 0x8094,
+    0xF40A, 0x808F,
+    0xF43C, 0x808A,
+    0xF46E, 0x8086,
+    0xF4A0, 0x8081,
+    0xF4D2, 0x807D,
+    0xF504, 0x8078,
+    0xF536, 0x8074,
+    0xF568, 0x8070,
+    0xF59A, 0x806C,
+    0xF5CC, 0x8068,
+    0xF5FF, 0x8064,
+    0xF631, 0x8060,
+    0xF663, 0x805C,
+    0xF695, 0x8058,
+    0xF6C7, 0x8055,
+    0xF6F9, 0x8051,
+    0xF72B, 0x804E,
+    0xF75D, 0x804A,
+    0xF790, 0x8047,
+    0xF7C2, 0x8043,
+    0xF7F4, 0x8040,
+    0xF826, 0x803D,
+    0xF858, 0x803A,
+    0xF88A, 0x8037,
+    0xF8BD, 0x8034,
+    0xF8EF, 0x8031,
+    0xF921, 0x802F,
+    0xF953, 0x802C,
+    0xF985, 0x8029,
+    0xF9B8, 0x8027,
+    0xF9EA, 0x8025,
+    0xFA1C, 0x8022,
+    0xFA4E, 0x8020,
+    0xFA80, 0x801E,
+    0xFAB3, 0x801C,
+    0xFAE5, 0x801A,
+    0xFB17, 0x8018,
+    0xFB49, 0x8016,
+    0xFB7C, 0x8014,
+    0xFBAE, 0x8012,
+    0xFBE0, 0x8011,
+    0xFC12, 0x800F,
+    0xFC45, 0x800D,
+    0xFC77, 0x800C,
+    0xFCA9, 0x800B,
+    0xFCDB, 0x8009,
+    0xFD0E, 0x8008,
+    0xFD40, 0x8007,
+    0xFD72, 0x8006,
+    0xFDA4, 0x8005,
+    0xFDD7, 0x8004,
+    0xFE09, 0x8003,
+    0xFE3B, 0x8003,
+    0xFE6D, 0x8002,
+    0xFEA0, 0x8001,
+    0xFED2, 0x8001,
+    0xFF04, 0x8000,
+    0xFF36, 0x8000,
+    0xFF69, 0x8000,
+    0xFF9B, 0x8000,
+    0xFFCD, 0x8000
+};
+
+
+/**    
+* @} end of CFFT_CIFFT group    
+*/
+
+/*    
+* @brief  Q15 table for reciprocal    
+*/
+const q15_t ALIGN4 armRecipTableQ15[64] = {
+ 0x7F03, 0x7D13, 0x7B31, 0x795E, 0x7798, 0x75E0,
+ 0x7434, 0x7294, 0x70FF, 0x6F76, 0x6DF6, 0x6C82,
+ 0x6B16, 0x69B5, 0x685C, 0x670C, 0x65C4, 0x6484,
+ 0x634C, 0x621C, 0x60F3, 0x5FD0, 0x5EB5, 0x5DA0,
+ 0x5C91, 0x5B88, 0x5A85, 0x5988, 0x5890, 0x579E,
+ 0x56B0, 0x55C8, 0x54E4, 0x5405, 0x532B, 0x5255,
+ 0x5183, 0x50B6, 0x4FEC, 0x4F26, 0x4E64, 0x4DA6,
+ 0x4CEC, 0x4C34, 0x4B81, 0x4AD0, 0x4A23, 0x4978,
+ 0x48D1, 0x482D, 0x478C, 0x46ED, 0x4651, 0x45B8,
+ 0x4521, 0x448D, 0x43FC, 0x436C, 0x42DF, 0x4255,
+ 0x41CC, 0x4146, 0x40C2, 0x4040
+};
+
+/*    
+* @brief  Q31 table for reciprocal    
+*/
+const q31_t armRecipTableQ31[64] = {
+  0x7F03F03F, 0x7D137420, 0x7B31E739, 0x795E9F94, 0x7798FD29, 0x75E06928,
+  0x7434554D, 0x72943B4B, 0x70FF9C40, 0x6F760031, 0x6DF6F593, 0x6C8210E3,
+  0x6B16EC3A, 0x69B526F6, 0x685C655F, 0x670C505D, 0x65C4952D, 0x6484E519,
+  0x634CF53E, 0x621C7E4F, 0x60F33C61, 0x5FD0EEB3, 0x5EB55785, 0x5DA03BEB,
+  0x5C9163A1, 0x5B8898E6, 0x5A85A85A, 0x598860DF, 0x58909373, 0x579E1318,
+  0x56B0B4B8, 0x55C84F0B, 0x54E4BA80, 0x5405D124, 0x532B6E8F, 0x52556FD0,
+  0x5183B35A, 0x50B618F3, 0x4FEC81A2, 0x4F26CFA2, 0x4E64E64E, 0x4DA6AA1D,
+  0x4CEC008B, 0x4C34D010, 0x4B810016, 0x4AD078EF, 0x4A2323C4, 0x4978EA96,
+  0x48D1B827, 0x482D77FE, 0x478C1657, 0x46ED801D, 0x4651A2E5, 0x45B86CE2,
+  0x4521CCE1, 0x448DB244, 0x43FC0CFA, 0x436CCD78, 0x42DFE4B4, 0x42554426,
+  0x41CCDDB6, 0x4146A3C6, 0x40C28923, 0x40408102
+};
+
+const uint16_t armBitRevIndexTable16[ARMBITREVINDEXTABLE__16_TABLE_LENGTH] = 
+{
+   //8x2, size 20
+   8,64, 24,72, 16,64, 40,80, 32,64, 56,88, 48,72, 88,104, 72,96, 104,112
+};
+
+const uint16_t armBitRevIndexTable32[ARMBITREVINDEXTABLE__32_TABLE_LENGTH] = 
+{
+   //8x4, size 48
+   8,64, 16,128, 24,192, 32,64, 40,72, 48,136, 56,200, 64,128, 72,80, 88,208,
+   80,144, 96,192, 104,208, 112,152, 120,216, 136,192, 144,160, 168,208,
+   152,224, 176,208, 184,232, 216,240, 200,224, 232,240
+};
+
+const uint16_t armBitRevIndexTable64[ARMBITREVINDEXTABLE__64_TABLE_LENGTH] = 
+{   
+   //radix 8, size 56
+   8,64, 16,128, 24,192, 32,256, 40,320, 48,384, 56,448, 80,136, 88,200, 
+   96,264, 104,328, 112,392, 120,456, 152,208, 160,272, 168,336, 176,400, 
+   184,464, 224,280, 232,344, 240,408, 248,472, 296,352, 304,416, 312,480, 
+   368,424, 376,488, 440,496
+};
+
+const uint16_t armBitRevIndexTable128[ARMBITREVINDEXTABLE_128_TABLE_LENGTH] = 
+{
+   //8x2, size 208
+   8,512, 16,64, 24,576, 32,128, 40,640, 48,192, 56,704, 64,256, 72,768, 
+   80,320, 88,832, 96,384, 104,896, 112,448, 120,960, 128,512, 136,520, 
+   144,768, 152,584, 160,520, 168,648, 176,200, 184,712, 192,264, 200,776, 
+   208,328, 216,840, 224,392, 232,904, 240,456, 248,968, 264,528, 272,320, 
+   280,592, 288,768, 296,656, 304,328, 312,720, 328,784, 344,848, 352,400, 
+   360,912, 368,464, 376,976, 384,576, 392,536, 400,832, 408,600, 416,584, 
+   424,664, 432,840, 440,728, 448,592, 456,792, 464,848, 472,856, 480,600, 
+   488,920, 496,856, 504,984, 520,544, 528,576, 536,608, 552,672, 560,608, 
+   568,736, 576,768, 584,800, 592,832, 600,864, 608,800, 616,928, 624,864, 
+   632,992, 648,672, 656,896, 664,928, 688,904, 696,744, 704,896, 712,808, 
+   720,912, 728,872, 736,928, 744,936, 752,920, 760,1000, 776,800, 784,832, 
+   792,864, 808,904, 816,864, 824,920, 840,864, 856,880, 872,944, 888,1008, 
+   904,928, 912,960, 920,992, 944,968, 952,1000, 968,992, 984,1008
+};
+
+const uint16_t armBitRevIndexTable256[ARMBITREVINDEXTABLE_256_TABLE_LENGTH] = 
+{
+   //8x4, size 440
+   8,512, 16,1024, 24,1536, 32,64, 40,576, 48,1088, 56,1600, 64,128, 72,640, 
+   80,1152, 88,1664, 96,192, 104,704, 112,1216, 120,1728, 128,256, 136,768, 
+   144,1280, 152,1792, 160,320, 168,832, 176,1344, 184,1856, 192,384, 
+   200,896, 208,1408, 216,1920, 224,448, 232,960, 240,1472, 248,1984, 
+   256,512, 264,520, 272,1032, 280,1544, 288,640, 296,584, 304,1096, 312,1608, 
+   320,768, 328,648, 336,1160, 344,1672, 352,896, 360,712, 368,1224, 376,1736, 
+   384,520, 392,776, 400,1288, 408,1800, 416,648, 424,840, 432,1352, 440,1864, 
+   448,776, 456,904, 464,1416, 472,1928, 480,904, 488,968, 496,1480, 504,1992, 
+   520,528, 512,1024, 528,1040, 536,1552, 544,1152, 552,592, 560,1104, 
+   568,1616, 576,1280, 584,656, 592,1168, 600,1680, 608,1408, 616,720, 
+   624,1232, 632,1744, 640,1032, 648,784, 656,1296, 664,1808, 672,1160, 
+   680,848, 688,1360, 696,1872, 704,1288, 712,912, 720,1424, 728,1936, 
+   736,1416, 744,976, 752,1488, 760,2000, 768,1536, 776,1552, 784,1048, 
+   792,1560, 800,1664, 808,1680, 816,1112, 824,1624, 832,1792, 840,1808, 
+   848,1176, 856,1688, 864,1920, 872,1936, 880,1240, 888,1752, 896,1544, 
+   904,1560, 912,1304, 920,1816, 928,1672, 936,1688, 944,1368, 952,1880, 
+   960,1800, 968,1816, 976,1432, 984,1944, 992,1928, 1000,1944, 1008,1496, 
+   1016,2008, 1032,1152, 1040,1056, 1048,1568, 1064,1408, 1072,1120, 
+   1080,1632, 1088,1536, 1096,1160, 1104,1184, 1112,1696, 1120,1552, 
+   1128,1416, 1136,1248, 1144,1760, 1160,1664, 1168,1312, 1176,1824, 
+   1184,1544, 1192,1920, 1200,1376, 1208,1888, 1216,1568, 1224,1672, 
+   1232,1440, 1240,1952, 1248,1560, 1256,1928, 1264,1504, 1272,2016, 
+   1288,1312, 1296,1408, 1304,1576, 1320,1424, 1328,1416, 1336,1640, 
+   1344,1792, 1352,1824, 1360,1920, 1368,1704, 1376,1800, 1384,1432, 
+   1392,1928, 1400,1768, 1416,1680, 1432,1832, 1440,1576, 1448,1936, 
+   1456,1832, 1464,1896, 1472,1808, 1480,1688, 1488,1936, 1496,1960, 
+   1504,1816, 1512,1944, 1520,1944, 1528,2024, 1560,1584, 1592,1648, 
+   1600,1792, 1608,1920, 1616,1800, 1624,1712, 1632,1808, 1640,1936, 
+   1648,1816, 1656,1776, 1672,1696, 1688,1840, 1704,1952, 1712,1928, 
+   1720,1904, 1728,1824, 1736,1952, 1744,1832, 1752,1968, 1760,1840, 
+   1768,1960, 1776,1944, 1784,2032, 1864,1872, 1848,1944, 1872,1888, 
+   1880,1904, 1888,1984, 1896,2000, 1912,2032, 1904,2016, 1976,2032,
+   1960,1968, 2008,2032, 1992,2016, 2024,2032
+};
+
+const uint16_t armBitRevIndexTable512[ARMBITREVINDEXTABLE_512_TABLE_LENGTH] = 
+{
+   //radix 8, size 448
+   8,512, 16,1024, 24,1536, 32,2048, 40,2560, 48,3072, 56,3584, 72,576, 
+   80,1088, 88,1600, 96,2112, 104,2624, 112,3136, 120,3648, 136,640, 144,1152, 
+   152,1664, 160,2176, 168,2688, 176,3200, 184,3712, 200,704, 208,1216, 
+   216,1728, 224,2240, 232,2752, 240,3264, 248,3776, 264,768, 272,1280, 
+   280,1792, 288,2304, 296,2816, 304,3328, 312,3840, 328,832, 336,1344, 
+   344,1856, 352,2368, 360,2880, 368,3392, 376,3904, 392,896, 400,1408, 
+   408,1920, 416,2432, 424,2944, 432,3456, 440,3968, 456,960, 464,1472, 
+   472,1984, 480,2496, 488,3008, 496,3520, 504,4032, 528,1032, 536,1544, 
+   544,2056, 552,2568, 560,3080, 568,3592, 592,1096, 600,1608, 608,2120, 
+   616,2632, 624,3144, 632,3656, 656,1160, 664,1672, 672,2184, 680,2696, 
+   688,3208, 696,3720, 720,1224, 728,1736, 736,2248, 744,2760, 752,3272, 
+   760,3784, 784,1288, 792,1800, 800,2312, 808,2824, 816,3336, 824,3848, 
+   848,1352, 856,1864, 864,2376, 872,2888, 880,3400, 888,3912, 912,1416, 
+   920,1928, 928,2440, 936,2952, 944,3464, 952,3976, 976,1480, 984,1992, 
+   992,2504, 1000,3016, 1008,3528, 1016,4040, 1048,1552, 1056,2064, 1064,2576, 
+   1072,3088, 1080,3600, 1112,1616, 1120,2128, 1128,2640, 1136,3152, 
+   1144,3664, 1176,1680, 1184,2192, 1192,2704, 1200,3216, 1208,3728, 
+   1240,1744, 1248,2256, 1256,2768, 1264,3280, 1272,3792, 1304,1808, 
+   1312,2320, 1320,2832, 1328,3344, 1336,3856, 1368,1872, 1376,2384, 
+   1384,2896, 1392,3408, 1400,3920, 1432,1936, 1440,2448, 1448,2960, 
+   1456,3472, 1464,3984, 1496,2000, 1504,2512, 1512,3024, 1520,3536, 
+   1528,4048, 1568,2072, 1576,2584, 1584,3096, 1592,3608, 1632,2136, 
+   1640,2648, 1648,3160, 1656,3672, 1696,2200, 1704,2712, 1712,3224, 
+   1720,3736, 1760,2264, 1768,2776, 1776,3288, 1784,3800, 1824,2328, 
+   1832,2840, 1840,3352, 1848,3864, 1888,2392, 1896,2904, 1904,3416, 
+   1912,3928, 1952,2456, 1960,2968, 1968,3480, 1976,3992, 2016,2520, 
+   2024,3032, 2032,3544, 2040,4056, 2088,2592, 2096,3104, 2104,3616, 
+   2152,2656, 2160,3168, 2168,3680, 2216,2720, 2224,3232, 2232,3744, 
+   2280,2784, 2288,3296, 2296,3808, 2344,2848, 2352,3360, 2360,3872, 
+   2408,2912, 2416,3424, 2424,3936, 2472,2976, 2480,3488, 2488,4000, 
+   2536,3040, 2544,3552, 2552,4064, 2608,3112, 2616,3624, 2672,3176, 
+   2680,3688, 2736,3240, 2744,3752, 2800,3304, 2808,3816, 2864,3368, 
+   2872,3880, 2928,3432, 2936,3944, 2992,3496, 3000,4008, 3056,3560, 
+   3064,4072, 3128,3632, 3192,3696, 3256,3760, 3320,3824, 3384,3888, 
+   3448,3952, 3512,4016, 3576,4080
+};
+
+const uint16_t armBitRevIndexTable1024[ARMBITREVINDEXTABLE1024_TABLE_LENGTH] = 
+{
+   //8x2, size 1800
+   8,4096, 16,512, 24,4608, 32,1024, 40,5120, 48,1536, 56,5632, 64,2048, 
+   72,6144, 80,2560, 88,6656, 96,3072, 104,7168, 112,3584, 120,7680, 128,2048, 
+   136,4160, 144,576, 152,4672, 160,1088, 168,5184, 176,1600, 184,5696, 
+   192,2112, 200,6208, 208,2624, 216,6720, 224,3136, 232,7232, 240,3648, 
+   248,7744, 256,2048, 264,4224, 272,640, 280,4736, 288,1152, 296,5248, 
+   304,1664, 312,5760, 320,2176, 328,6272, 336,2688, 344,6784, 352,3200, 
+   360,7296, 368,3712, 376,7808, 384,2112, 392,4288, 400,704, 408,4800, 
+   416,1216, 424,5312, 432,1728, 440,5824, 448,2240, 456,6336, 464,2752, 
+   472,6848, 480,3264, 488,7360, 496,3776, 504,7872, 512,2048, 520,4352, 
+   528,768, 536,4864, 544,1280, 552,5376, 560,1792, 568,5888, 576,2304, 
+   584,6400, 592,2816, 600,6912, 608,3328, 616,7424, 624,3840, 632,7936, 
+   640,2176, 648,4416, 656,832, 664,4928, 672,1344, 680,5440, 688,1856, 
+   696,5952, 704,2368, 712,6464, 720,2880, 728,6976, 736,3392, 744,7488, 
+   752,3904, 760,8000, 768,2112, 776,4480, 784,896, 792,4992, 800,1408, 
+   808,5504, 816,1920, 824,6016, 832,2432, 840,6528, 848,2944, 856,7040, 
+   864,3456, 872,7552, 880,3968, 888,8064, 896,2240, 904,4544, 912,960, 
+   920,5056, 928,1472, 936,5568, 944,1984, 952,6080, 960,2496, 968,6592, 
+   976,3008, 984,7104, 992,3520, 1000,7616, 1008,4032, 1016,8128, 1024,4096, 
+   1032,4104, 1040,4352, 1048,4616, 1056,4104, 1064,5128, 1072,1544, 
+   1080,5640, 1088,2056, 1096,6152, 1104,2568, 1112,6664, 1120,3080, 
+   1128,7176, 1136,3592, 1144,7688, 1152,6144, 1160,4168, 1168,6400, 
+   1176,4680, 1184,6152, 1192,5192, 1200,1608, 1208,5704, 1216,2120, 
+   1224,6216, 1232,2632, 1240,6728, 1248,3144, 1256,7240, 1264,3656, 
+   1272,7752, 1280,4160, 1288,4232, 1296,4416, 1304,4744, 1312,4168, 
+   1320,5256, 1328,1672, 1336,5768, 1344,2184, 1352,6280, 1360,2696, 
+   1368,6792, 1376,3208, 1384,7304, 1392,3720, 1400,7816, 1408,6208, 
+   1416,4296, 1424,6464, 1432,4808, 1440,6216, 1448,5320, 1456,1736, 
+   1464,5832, 1472,2248, 1480,6344, 1488,2760, 1496,6856, 1504,3272, 
+   1512,7368, 1520,3784, 1528,7880, 1536,4224, 1544,4360, 1552,4480, 
+   1560,4872, 1568,4232, 1576,5384, 1584,1800, 1592,5896, 1600,2312, 
+   1608,6408, 1616,2824, 1624,6920, 1632,3336, 1640,7432, 1648,3848, 
+   1656,7944, 1664,6272, 1672,4424, 1680,6528, 1688,4936, 1696,6280, 
+   1704,5448, 1712,1864, 1720,5960, 1728,2376, 1736,6472, 1744,2888, 
+   1752,6984, 1760,3400, 1768,7496, 1776,3912, 1784,8008, 1792,4288, 
+   1800,4488, 1808,4544, 1816,5000, 1824,4296, 1832,5512, 1840,1928, 
+   1848,6024, 1856,2440, 1864,6536, 1872,2952, 1880,7048, 1888,3464, 
+   1896,7560, 1904,3976, 1912,8072, 1920,6336, 1928,4552, 1936,6592, 
+   1944,5064, 1952,6344, 1960,5576, 1968,1992, 1976,6088, 1984,2504, 
+   1992,6600, 2000,3016, 2008,7112, 2016,3528, 2024,7624, 2032,4040, 
+   2040,8136, 2056,4112, 2064,2112, 2072,4624, 2080,4352, 2088,5136, 
+   2096,4480, 2104,5648, 2120,6160, 2128,2576, 2136,6672, 2144,3088, 
+   2152,7184, 2160,3600, 2168,7696, 2176,2560, 2184,4176, 2192,2816, 
+   2200,4688, 2208,2568, 2216,5200, 2224,2824, 2232,5712, 2240,2576, 
+   2248,6224, 2256,2640, 2264,6736, 2272,3152, 2280,7248, 2288,3664, 
+   2296,7760, 2312,4240, 2320,2432, 2328,4752, 2336,6400, 2344,5264, 
+   2352,6528, 2360,5776, 2368,2816, 2376,6288, 2384,2704, 2392,6800, 
+   2400,3216, 2408,7312, 2416,3728, 2424,7824, 2432,2624, 2440,4304, 
+   2448,2880, 2456,4816, 2464,2632, 2472,5328, 2480,2888, 2488,5840, 
+   2496,2640, 2504,6352, 2512,2768, 2520,6864, 2528,3280, 2536,7376, 
+   2544,3792, 2552,7888, 2568,4368, 2584,4880, 2592,4416, 2600,5392, 
+   2608,4544, 2616,5904, 2632,6416, 2640,2832, 2648,6928, 2656,3344, 
+   2664,7440, 2672,3856, 2680,7952, 2696,4432, 2704,2944, 2712,4944, 
+   2720,4432, 2728,5456, 2736,2952, 2744,5968, 2752,2944, 2760,6480, 
+   2768,2896, 2776,6992, 2784,3408, 2792,7504, 2800,3920, 2808,8016, 
+   2824,4496, 2840,5008, 2848,6464, 2856,5520, 2864,6592, 2872,6032, 
+   2888,6544, 2896,2960, 2904,7056, 2912,3472, 2920,7568, 2928,3984, 
+   2936,8080, 2952,4560, 2960,3008, 2968,5072, 2976,6480, 2984,5584, 
+   2992,3016, 3000,6096, 3016,6608, 3032,7120, 3040,3536, 3048,7632, 
+   3056,4048, 3064,8144, 3072,4608, 3080,4120, 3088,4864, 3096,4632, 
+   3104,4616, 3112,5144, 3120,4872, 3128,5656, 3136,4624, 3144,6168, 
+   3152,4880, 3160,6680, 3168,4632, 3176,7192, 3184,3608, 3192,7704, 
+   3200,6656, 3208,4184, 3216,6912, 3224,4696, 3232,6664, 3240,5208, 
+   3248,6920, 3256,5720, 3264,6672, 3272,6232, 3280,6928, 3288,6744, 
+   3296,6680, 3304,7256, 3312,3672, 3320,7768, 3328,4672, 3336,4248, 
+   3344,4928, 3352,4760, 3360,4680, 3368,5272, 3376,4936, 3384,5784, 
+   3392,4688, 3400,6296, 3408,4944, 3416,6808, 3424,4696, 3432,7320, 
+   3440,3736, 3448,7832, 3456,6720, 3464,4312, 3472,6976, 3480,4824, 
+   3488,6728, 3496,5336, 3504,6984, 3512,5848, 3520,6736, 3528,6360, 
+   3536,6992, 3544,6872, 3552,6744, 3560,7384, 3568,3800, 3576,7896, 
+   3584,4736, 3592,4376, 3600,4992, 3608,4888, 3616,4744, 3624,5400, 
+   3632,5000, 3640,5912, 3648,4752, 3656,6424, 3664,5008, 3672,6936, 
+   3680,4760, 3688,7448, 3696,3864, 3704,7960, 3712,6784, 3720,4440, 
+   3728,7040, 3736,4952, 3744,6792, 3752,5464, 3760,7048, 3768,5976, 
+   3776,6800, 3784,6488, 3792,7056, 3800,7000, 3808,6808, 3816,7512, 
+   3824,3928, 3832,8024, 3840,4800, 3848,4504, 3856,5056, 3864,5016, 
+   3872,4808, 3880,5528, 3888,5064, 3896,6040, 3904,4816, 3912,6552, 
+   3920,5072, 3928,7064, 3936,4824, 3944,7576, 3952,3992, 3960,8088, 
+   3968,6848, 3976,4568, 3984,7104, 3992,5080, 4000,6856, 4008,5592, 
+   4016,7112, 4024,6104, 4032,6864, 4040,6616, 4048,7120, 4056,7128, 
+   4064,6872, 4072,7640, 4080,7128, 4088,8152, 4104,4128, 4112,4160, 
+   4120,4640, 4136,5152, 4144,4232, 4152,5664, 4160,4352, 4168,6176, 
+   4176,4416, 4184,6688, 4192,4616, 4200,7200, 4208,4744, 4216,7712, 
+   4224,4608, 4232,4616, 4240,4672, 4248,4704, 4256,4640, 4264,5216, 
+   4272,4704, 4280,5728, 4288,4864, 4296,6240, 4304,4928, 4312,6752, 
+   4320,4632, 4328,7264, 4336,4760, 4344,7776, 4360,4640, 4368,4416, 
+   4376,4768, 4384,6152, 4392,5280, 4400,6280, 4408,5792, 4424,6304, 
+   4440,6816, 4448,6664, 4456,7328, 4464,6792, 4472,7840, 4480,4624, 
+   4488,4632, 4496,4688, 4504,4832, 4512,6168, 4520,5344, 4528,6296, 
+   4536,5856, 4544,4880, 4552,6368, 4560,4944, 4568,6880, 4576,6680, 
+   4584,7392, 4592,6808, 4600,7904, 4608,6144, 4616,6152, 4624,6208, 
+   4632,4896, 4640,6176, 4648,5408, 4656,6240, 4664,5920, 4672,6400, 
+   4680,6432, 4688,6464, 4696,6944, 4704,6432, 4712,7456, 4720,4808, 
+   4728,7968, 4736,6656, 4744,6664, 4752,6720, 4760,4960, 4768,6688, 
+   4776,5472, 4784,6752, 4792,5984, 4800,6912, 4808,6496, 4816,6976, 
+   4824,7008, 4832,6944, 4840,7520, 4848,7008, 4856,8032, 4864,6160, 
+   4872,6168, 4880,6224, 4888,5024, 4896,6216, 4904,5536, 4912,6344, 
+   4920,6048, 4928,6416, 4936,6560, 4944,6480, 4952,7072, 4960,6728, 
+   4968,7584, 4976,6856, 4984,8096, 4992,6672, 5000,6680, 5008,6736, 
+   5016,5088, 5024,6232, 5032,5600, 5040,6360, 5048,6112, 5056,6928, 
+   5064,6624, 5072,6992, 5080,7136, 5088,6744, 5096,7648, 5104,6872, 
+   5112,8160, 5128,5152, 5136,5376, 5144,5408, 5168,5384, 5176,5672, 
+   5184,5376, 5192,6184, 5200,5392, 5208,6696, 5216,5408, 5224,7208, 
+   5232,5400, 5240,7720, 5248,7168, 5256,7200, 5264,7424, 5272,7456, 
+   5280,7176, 5288,7208, 5296,7432, 5304,5736, 5312,7184, 5320,6248, 
+   5328,7440, 5336,6760, 5344,7192, 5352,7272, 5360,7448, 5368,7784, 
+   5384,5408, 5392,5440, 5400,5472, 5408,6184, 5416,7208, 5424,5448, 
+   5432,5800, 5448,6312, 5464,6824, 5472,6696, 5480,7336, 5488,6824, 
+   5496,7848, 5504,7232, 5512,7264, 5520,7488, 5528,7520, 5536,7240, 
+   5544,7272, 5552,7496, 5560,5864, 5568,7248, 5576,6376, 5584,7504, 
+   5592,6888, 5600,7256, 5608,7400, 5616,7512, 5624,7912, 5632,7168, 
+   5640,7176, 5648,7232, 5656,7240, 5664,7200, 5672,7208, 5680,7264, 
+   5688,5928, 5696,7424, 5704,6440, 5712,7488, 5720,6952, 5728,7456, 
+   5736,7464, 5744,7520, 5752,7976, 5760,7296, 5768,7328, 5776,7552, 
+   5784,7584, 5792,7304, 5800,7336, 5808,7560, 5816,5992, 5824,7312, 
+   5832,6504, 5840,7568, 5848,7016, 5856,7320, 5864,7528, 5872,7576, 
+   5880,8040, 5888,7184, 5896,7192, 5904,7248, 5912,7256, 5920,6248, 
+   5928,7272, 5936,6376, 5944,6056, 5952,7440, 5960,6568, 5968,7504, 
+   5976,7080, 5984,6760, 5992,7592, 6000,6888, 6008,8104, 6016,7360, 
+   6024,7392, 6032,7616, 6040,7648, 6048,7368, 6056,7400, 6064,7624, 
+   6072,6120, 6080,7376, 6088,6632, 6096,7632, 6104,7144, 6112,7384, 
+   6120,7656, 6128,7640, 6136,8168, 6168,6240, 6192,6216, 6200,7264, 
+   6232,6704, 6248,7216, 6256,6680, 6264,7728, 6272,6656, 6280,6664, 
+   6288,6912, 6296,6496, 6304,6688, 6312,6696, 6320,6944, 6328,7520, 
+   6336,6672, 6344,6680, 6352,6928, 6360,6768, 6368,6704, 6376,7280, 
+   6384,6744, 6392,7792, 6408,6432, 6424,6752, 6440,7432, 6448,6536, 
+   6456,7560, 6472,6944, 6488,6832, 6496,6920, 6504,7344, 6512,7048, 
+   6520,7856, 6528,6720, 6536,6728, 6544,6976, 6552,7008, 6560,6752, 
+   6568,7448, 6576,7008, 6584,7576, 6592,6736, 6600,6744, 6608,6992, 
+   6616,6896, 6624,6936, 6632,7408, 6640,7064, 6648,7920, 6712,7280, 
+   6744,6960, 6760,7472, 6768,6936, 6776,7984, 6800,6848, 6808,6856, 
+   6832,6880, 6840,6888, 6848,7040, 6856,7048, 6864,7104, 6872,7024, 
+   6880,7072, 6888,7536, 6896,7136, 6904,8048, 6952,7496, 6968,7624, 
+   6984,7008, 7000,7088, 7016,7600, 7024,7112, 7032,8112, 7056,7104, 
+   7064,7112, 7080,7512, 7088,7136, 7096,7640, 7128,7152, 7144,7664, 
+   7160,8176, 7176,7200, 7192,7216, 7224,7272, 7240,7264, 7256,7280, 
+   7288,7736, 7296,7680, 7304,7712, 7312,7936, 7320,7968, 7328,7688, 
+   7336,7720, 7344,7944, 7352,7976, 7360,7696, 7368,7728, 7376,7952, 
+   7384,7984, 7392,7704, 7400,7736, 7408,7960, 7416,7800, 7432,7456, 
+   7448,7472, 7480,7592, 7496,7520, 7512,7536, 7528,7976, 7544,7864, 
+   7552,7744, 7560,7776, 7568,8000, 7576,8032, 7584,7752, 7592,7784, 
+   7600,8008, 7608,8040, 7616,7760, 7624,7792, 7632,8016, 7640,8048, 
+   7648,7768, 7656,7800, 7664,8024, 7672,7928, 7688,7712, 7704,7728, 
+   7752,7776, 7768,7792, 7800,7992, 7816,7840, 7824,8064, 7832,8096, 
+   7856,8072, 7864,8104, 7872,8064, 7880,8072, 7888,8080, 7896,8112, 
+   7904,8096, 7912,8104, 7920,8088, 7928,8056, 7944,7968, 7960,7984, 
+   8008,8032, 8024,8048, 8056,8120, 8072,8096, 8080,8128, 8088,8160, 
+   8112,8136, 8120,8168, 8136,8160, 8152,8176
+};
+
+const uint16_t armBitRevIndexTable2048[ARMBITREVINDEXTABLE2048_TABLE_LENGTH] = 
+{
+   //8x4, size 3808
+   8,4096, 16,8192, 24,12288, 32,512, 40,4608, 48,8704, 56,12800, 64,1024, 
+   72,5120, 80,9216, 88,13312, 96,1536, 104,5632, 112,9728, 120,13824, 
+   128,2048, 136,6144, 144,10240, 152,14336, 160,2560, 168,6656, 176,10752, 
+   184,14848, 192,3072, 200,7168, 208,11264, 216,15360, 224,3584, 232,7680, 
+   240,11776, 248,15872, 256,1024, 264,4160, 272,8256, 280,12352, 288,576, 
+   296,4672, 304,8768, 312,12864, 320,1088, 328,5184, 336,9280, 344,13376, 
+   352,1600, 360,5696, 368,9792, 376,13888, 384,2112, 392,6208, 400,10304, 
+   408,14400, 416,2624, 424,6720, 432,10816, 440,14912, 448,3136, 456,7232, 
+   464,11328, 472,15424, 480,3648, 488,7744, 496,11840, 504,15936, 512,2048, 
+   520,4224, 528,8320, 536,12416, 544,640, 552,4736, 560,8832, 568,12928, 
+   576,1152, 584,5248, 592,9344, 600,13440, 608,1664, 616,5760, 624,9856, 
+   632,13952, 640,2176, 648,6272, 656,10368, 664,14464, 672,2688, 680,6784, 
+   688,10880, 696,14976, 704,3200, 712,7296, 720,11392, 728,15488, 736,3712, 
+   744,7808, 752,11904, 760,16000, 768,3072, 776,4288, 784,8384, 792,12480, 
+   800,3200, 808,4800, 816,8896, 824,12992, 832,1216, 840,5312, 848,9408, 
+   856,13504, 864,1728, 872,5824, 880,9920, 888,14016, 896,2240, 904,6336, 
+   912,10432, 920,14528, 928,2752, 936,6848, 944,10944, 952,15040, 960,3264, 
+   968,7360, 976,11456, 984,15552, 992,3776, 1000,7872, 1008,11968, 1016,16064, 
+   1032,4352, 1040,8448, 1048,12544, 1056,3072, 1064,4864, 1072,8960, 
+   1080,13056, 1088,1280, 1096,5376, 1104,9472, 1112,13568, 1120,1792, 
+   1128,5888, 1136,9984, 1144,14080, 1152,2304, 1160,6400, 1168,10496, 
+   1176,14592, 1184,2816, 1192,6912, 1200,11008, 1208,15104, 1216,3328, 
+   1224,7424, 1232,11520, 1240,15616, 1248,3840, 1256,7936, 1264,12032, 
+   1272,16128, 1288,4416, 1296,8512, 1304,12608, 1312,3328, 1320,4928, 
+   1328,9024, 1336,13120, 1352,5440, 1360,9536, 1368,13632, 1376,1856, 
+   1384,5952, 1392,10048, 1400,14144, 1408,2368, 1416,6464, 1424,10560, 
+   1432,14656, 1440,2880, 1448,6976, 1456,11072, 1464,15168, 1472,3392, 
+   1480,7488, 1488,11584, 1496,15680, 1504,3904, 1512,8000, 1520,12096, 
+   1528,16192, 1536,2112, 1544,4480, 1552,8576, 1560,12672, 1568,2240, 
+   1576,4992, 1584,9088, 1592,13184, 1600,2368, 1608,5504, 1616,9600, 
+   1624,13696, 1632,1920, 1640,6016, 1648,10112, 1656,14208, 1664,2432, 
+   1672,6528, 1680,10624, 1688,14720, 1696,2944, 1704,7040, 1712,11136, 
+   1720,15232, 1728,3456, 1736,7552, 1744,11648, 1752,15744, 1760,3968, 
+   1768,8064, 1776,12160, 1784,16256, 1792,3136, 1800,4544, 1808,8640, 
+   1816,12736, 1824,3264, 1832,5056, 1840,9152, 1848,13248, 1856,3392, 
+   1864,5568, 1872,9664, 1880,13760, 1888,1984, 1896,6080, 1904,10176, 
+   1912,14272, 1920,2496, 1928,6592, 1936,10688, 1944,14784, 1952,3008, 
+   1960,7104, 1968,11200, 1976,15296, 1984,3520, 1992,7616, 2000,11712, 
+   2008,15808, 2016,4032, 2024,8128, 2032,12224, 2040,16320, 2048,4096, 
+   2056,4104, 2064,8200, 2072,12296, 2080,4224, 2088,4616, 2096,8712, 
+   2104,12808, 2112,4352, 2120,5128, 2128,9224, 2136,13320, 2144,4480, 
+   2152,5640, 2160,9736, 2168,13832, 2176,4104, 2184,6152, 2192,10248, 
+   2200,14344, 2208,2568, 2216,6664, 2224,10760, 2232,14856, 2240,3080, 
+   2248,7176, 2256,11272, 2264,15368, 2272,3592, 2280,7688, 2288,11784, 
+   2296,15880, 2304,5120, 2312,4168, 2320,8264, 2328,12360, 2336,5248, 
+   2344,4680, 2352,8776, 2360,12872, 2368,5376, 2376,5192, 2384,9288, 
+   2392,13384, 2400,5504, 2408,5704, 2416,9800, 2424,13896, 2432,5128, 
+   2440,6216, 2448,10312, 2456,14408, 2464,2632, 2472,6728, 2480,10824, 
+   2488,14920, 2496,3144, 2504,7240, 2512,11336, 2520,15432, 2528,3656, 
+   2536,7752, 2544,11848, 2552,15944, 2560,6144, 2568,4232, 2576,8328, 
+   2584,12424, 2592,6272, 2600,4744, 2608,8840, 2616,12936, 2624,6400, 
+   2632,5256, 2640,9352, 2648,13448, 2656,6528, 2664,5768, 2672,9864, 
+   2680,13960, 2688,6152, 2696,6280, 2704,10376, 2712,14472, 2720,6280, 
+   2728,6792, 2736,10888, 2744,14984, 2752,3208, 2760,7304, 2768,11400, 
+   2776,15496, 2784,3720, 2792,7816, 2800,11912, 2808,16008, 2816,7168, 
+   2824,4296, 2832,8392, 2840,12488, 2848,7296, 2856,4808, 2864,8904, 
+   2872,13000, 2880,7424, 2888,5320, 2896,9416, 2904,13512, 2912,7552, 
+   2920,5832, 2928,9928, 2936,14024, 2944,7176, 2952,6344, 2960,10440, 
+   2968,14536, 2976,7304, 2984,6856, 2992,10952, 3000,15048, 3008,3272, 
+   3016,7368, 3024,11464, 3032,15560, 3040,3784, 3048,7880, 3056,11976, 
+   3064,16072, 3072,4160, 3080,4360, 3088,8456, 3096,12552, 3104,4288, 
+   3112,4872, 3120,8968, 3128,13064, 3136,4416, 3144,5384, 3152,9480, 
+   3160,13576, 3168,4544, 3176,5896, 3184,9992, 3192,14088, 3200,4168, 
+   3208,6408, 3216,10504, 3224,14600, 3232,4296, 3240,6920, 3248,11016, 
+   3256,15112, 3264,3336, 3272,7432, 3280,11528, 3288,15624, 3296,3848, 
+   3304,7944, 3312,12040, 3320,16136, 3328,5184, 3336,4424, 3344,8520, 
+   3352,12616, 3360,5312, 3368,4936, 3376,9032, 3384,13128, 3392,5440, 
+   3400,5448, 3408,9544, 3416,13640, 3424,5568, 3432,5960, 3440,10056, 
+   3448,14152, 3456,5192, 3464,6472, 3472,10568, 3480,14664, 3488,5320, 
+   3496,6984, 3504,11080, 3512,15176, 3520,5448, 3528,7496, 3536,11592, 
+   3544,15688, 3552,3912, 3560,8008, 3568,12104, 3576,16200, 3584,6208, 
+   3592,4488, 3600,8584, 3608,12680, 3616,6336, 3624,5000, 3632,9096, 
+   3640,13192, 3648,6464, 3656,5512, 3664,9608, 3672,13704, 3680,6592, 
+   3688,6024, 3696,10120, 3704,14216, 3712,6216, 3720,6536, 3728,10632, 
+   3736,14728, 3744,6344, 3752,7048, 3760,11144, 3768,15240, 3776,6472, 
+   3784,7560, 3792,11656, 3800,15752, 3808,3976, 3816,8072, 3824,12168, 
+   3832,16264, 3840,7232, 3848,4552, 3856,8648, 3864,12744, 3872,7360, 
+   3880,5064, 3888,9160, 3896,13256, 3904,7488, 3912,5576, 3920,9672, 
+   3928,13768, 3936,7616, 3944,6088, 3952,10184, 3960,14280, 3968,7240, 
+   3976,6600, 3984,10696, 3992,14792, 4000,7368, 4008,7112, 4016,11208, 
+   4024,15304, 4032,7496, 4040,7624, 4048,11720, 4056,15816, 4064,7624, 
+   4072,8136, 4080,12232, 4088,16328, 4096,8192, 4104,4112, 4112,8208, 
+   4120,12304, 4128,8320, 4136,4624, 4144,8720, 4152,12816, 4160,8448, 
+   4168,5136, 4176,9232, 4184,13328, 4192,8576, 4200,5648, 4208,9744, 
+   4216,13840, 4224,8200, 4232,6160, 4240,10256, 4248,14352, 4256,8328, 
+   4264,6672, 4272,10768, 4280,14864, 4288,8456, 4296,7184, 4304,11280, 
+   4312,15376, 4320,8584, 4328,7696, 4336,11792, 4344,15888, 4352,9216, 
+   4360,9232, 4368,8272, 4376,12368, 4384,9344, 4392,4688, 4400,8784, 
+   4408,12880, 4416,9472, 4424,5200, 4432,9296, 4440,13392, 4448,9600, 
+   4456,5712, 4464,9808, 4472,13904, 4480,9224, 4488,6224, 4496,10320, 
+   4504,14416, 4512,9352, 4520,6736, 4528,10832, 4536,14928, 4544,9480, 
+   4552,7248, 4560,11344, 4568,15440, 4576,9608, 4584,7760, 4592,11856, 
+   4600,15952, 4608,10240, 4616,10256, 4624,8336, 4632,12432, 4640,10368, 
+   4648,4752, 4656,8848, 4664,12944, 4672,10496, 4680,5264, 4688,9360, 
+   4696,13456, 4704,10624, 4712,5776, 4720,9872, 4728,13968, 4736,10248, 
+   4744,6288, 4752,10384, 4760,14480, 4768,10376, 4776,6800, 4784,10896, 
+   4792,14992, 4800,10504, 4808,7312, 4816,11408, 4824,15504, 4832,10632, 
+   4840,7824, 4848,11920, 4856,16016, 4864,11264, 4872,11280, 4880,8400, 
+   4888,12496, 4896,11392, 4904,11408, 4912,8912, 4920,13008, 4928,11520, 
+   4936,5328, 4944,9424, 4952,13520, 4960,11648, 4968,5840, 4976,9936, 
+   4984,14032, 4992,11272, 5000,6352, 5008,10448, 5016,14544, 5024,11400, 
+   5032,6864, 5040,10960, 5048,15056, 5056,11528, 5064,7376, 5072,11472, 
+   5080,15568, 5088,11656, 5096,7888, 5104,11984, 5112,16080, 5120,8256, 
+   5128,8272, 5136,8464, 5144,12560, 5152,8384, 5160,8400, 5168,8976, 
+   5176,13072, 5184,8512, 5192,5392, 5200,9488, 5208,13584, 5216,8640, 
+   5224,5904, 5232,10000, 5240,14096, 5248,8264, 5256,6416, 5264,10512, 
+   5272,14608, 5280,8392, 5288,6928, 5296,11024, 5304,15120, 5312,8520, 
+   5320,7440, 5328,11536, 5336,15632, 5344,8648, 5352,7952, 5360,12048, 
+   5368,16144, 5376,9280, 5384,9296, 5392,8528, 5400,12624, 5408,9408, 
+   5416,9424, 5424,9040, 5432,13136, 5440,9536, 5448,5456, 5456,9552, 
+   5464,13648, 5472,9664, 5480,5968, 5488,10064, 5496,14160, 5504,9288, 
+   5512,6480, 5520,10576, 5528,14672, 5536,9416, 5544,6992, 5552,11088, 
+   5560,15184, 5568,9544, 5576,7504, 5584,11600, 5592,15696, 5600,9672, 
+   5608,8016, 5616,12112, 5624,16208, 5632,10304, 5640,10320, 5648,8592, 
+   5656,12688, 5664,10432, 5672,10448, 5680,9104, 5688,13200, 5696,10560, 
+   5704,10576, 5712,9616, 5720,13712, 5728,10688, 5736,6032, 5744,10128, 
+   5752,14224, 5760,10312, 5768,6544, 5776,10640, 5784,14736, 5792,10440, 
+   5800,7056, 5808,11152, 5816,15248, 5824,10568, 5832,7568, 5840,11664, 
+   5848,15760, 5856,10696, 5864,8080, 5872,12176, 5880,16272, 5888,11328, 
+   5896,11344, 5904,8656, 5912,12752, 5920,11456, 5928,11472, 5936,9168, 
+   5944,13264, 5952,11584, 5960,11600, 5968,9680, 5976,13776, 5984,11712, 
+   5992,6096, 6000,10192, 6008,14288, 6016,11336, 6024,6608, 6032,10704, 
+   6040,14800, 6048,11464, 6056,7120, 6064,11216, 6072,15312, 6080,11592, 
+   6088,7632, 6096,11728, 6104,15824, 6112,11720, 6120,8144, 6128,12240, 
+   6136,16336, 6144,12288, 6152,12304, 6160,8216, 6168,12312, 6176,12416, 
+   6184,12432, 6192,8728, 6200,12824, 6208,12544, 6216,12560, 6224,9240, 
+   6232,13336, 6240,12672, 6248,12688, 6256,9752, 6264,13848, 6272,12296, 
+   6280,12312, 6288,10264, 6296,14360, 6304,12424, 6312,6680, 6320,10776, 
+   6328,14872, 6336,12552, 6344,7192, 6352,11288, 6360,15384, 6368,12680, 
+   6376,7704, 6384,11800, 6392,15896, 6400,13312, 6408,13328, 6416,8280, 
+   6424,12376, 6432,13440, 6440,13456, 6448,8792, 6456,12888, 6464,13568, 
+   6472,13584, 6480,9304, 6488,13400, 6496,13696, 6504,13712, 6512,9816, 
+   6520,13912, 6528,13320, 6536,13336, 6544,10328, 6552,14424, 6560,13448, 
+   6568,6744, 6576,10840, 6584,14936, 6592,13576, 6600,7256, 6608,11352, 
+   6616,15448, 6624,13704, 6632,7768, 6640,11864, 6648,15960, 6656,14336, 
+   6664,14352, 6672,8344, 6680,12440, 6688,14464, 6696,14480, 6704,8856, 
+   6712,12952, 6720,14592, 6728,14608, 6736,9368, 6744,13464, 6752,14720, 
+   6760,14736, 6768,9880, 6776,13976, 6784,14344, 6792,14360, 6800,10392, 
+   6808,14488, 6816,14472, 6824,14488, 6832,10904, 6840,15000, 6848,14600, 
+   6856,7320, 6864,11416, 6872,15512, 6880,14728, 6888,7832, 6896,11928, 
+   6904,16024, 6912,15360, 6920,15376, 6928,8408, 6936,12504, 6944,15488, 
+   6952,15504, 6960,8920, 6968,13016, 6976,15616, 6984,15632, 6992,9432, 
+   7000,13528, 7008,15744, 7016,15760, 7024,9944, 7032,14040, 7040,15368, 
+   7048,15384, 7056,10456, 7064,14552, 7072,15496, 7080,15512, 7088,10968, 
+   7096,15064, 7104,15624, 7112,7384, 7120,11480, 7128,15576, 7136,15752, 
+   7144,7896, 7152,11992, 7160,16088, 7168,12352, 7176,12368, 7184,8472, 
+   7192,12568, 7200,12480, 7208,12496, 7216,8984, 7224,13080, 7232,12608, 
+   7240,12624, 7248,9496, 7256,13592, 7264,12736, 7272,12752, 7280,10008, 
+   7288,14104, 7296,12360, 7304,12376, 7312,10520, 7320,14616, 7328,12488, 
+   7336,12504, 7344,11032, 7352,15128, 7360,12616, 7368,7448, 7376,11544, 
+   7384,15640, 7392,12744, 7400,7960, 7408,12056, 7416,16152, 7424,13376, 
+   7432,13392, 7440,8536, 7448,12632, 7456,13504, 7464,13520, 7472,9048, 
+   7480,13144, 7488,13632, 7496,13648, 7504,9560, 7512,13656, 7520,13760, 
+   7528,13776, 7536,10072, 7544,14168, 7552,13384, 7560,13400, 7568,10584, 
+   7576,14680, 7584,13512, 7592,13528, 7600,11096, 7608,15192, 7616,13640, 
+   7624,13656, 7632,11608, 7640,15704, 7648,13768, 7656,8024, 7664,12120, 
+   7672,16216, 7680,14400, 7688,14416, 7696,8600, 7704,12696, 7712,14528, 
+   7720,14544, 7728,9112, 7736,13208, 7744,14656, 7752,14672, 7760,9624, 
+   7768,13720, 7776,14784, 7784,14800, 7792,10136, 7800,14232, 7808,14408, 
+   7816,14424, 7824,10648, 7832,14744, 7840,14536, 7848,14552, 7856,11160, 
+   7864,15256, 7872,14664, 7880,14680, 7888,11672, 7896,15768, 7904,14792, 
+   7912,8088, 7920,12184, 7928,16280, 7936,15424, 7944,15440, 7952,8664, 
+   7960,12760, 7968,15552, 7976,15568, 7984,9176, 7992,13272, 8000,15680, 
+   8008,15696, 8016,9688, 8024,13784, 8032,15808, 8040,15824, 8048,10200, 
+   8056,14296, 8064,15432, 8072,15448, 8080,10712, 8088,14808, 8096,15560, 
+   8104,15576, 8112,11224, 8120,15320, 8128,15688, 8136,15704, 8144,11736, 
+   8152,15832, 8160,15816, 8168,15832, 8176,12248, 8184,16344, 8200,8320, 
+   8208,8224, 8216,12320, 8232,10368, 8240,8736, 8248,12832, 8256,8448, 
+   8264,8384, 8272,9248, 8280,13344, 8288,9232, 8296,10432, 8304,9760, 
+   8312,13856, 8328,12416, 8336,10272, 8344,14368, 8352,12296, 8360,14464, 
+   8368,10784, 8376,14880, 8384,8456, 8392,12480, 8400,11296, 8408,15392, 
+   8416,12552, 8424,14528, 8432,11808, 8440,15904, 8448,9216, 8456,8576, 
+   8464,9232, 8472,12384, 8480,9248, 8488,10624, 8496,8800, 8504,12896, 
+   8512,9472, 8520,8640, 8528,9312, 8536,13408, 8544,9296, 8552,10688, 
+   8560,9824, 8568,13920, 8576,9224, 8584,12672, 8592,10336, 8600,14432, 
+   8608,13320, 8616,14720, 8624,10848, 8632,14944, 8640,9480, 8648,12736, 
+   8656,11360, 8664,15456, 8672,13576, 8680,14784, 8688,11872, 8696,15968, 
+   8704,12288, 8712,12416, 8720,12296, 8728,12448, 8736,12304, 8744,10376, 
+   8752,8864, 8760,12960, 8768,12352, 8776,12480, 8784,9376, 8792,13472, 
+   8800,12368, 8808,10440, 8816,9888, 8824,13984, 8832,12320, 8840,12424, 
+   8848,10400, 8856,14496, 8864,12312, 8872,14472, 8880,10912, 8888,15008, 
+   8896,12384, 8904,12488, 8912,11424, 8920,15520, 8928,12568, 8936,14536, 
+   8944,11936, 8952,16032, 8960,12544, 8968,12672, 8976,12552, 8984,12512, 
+   8992,12560, 9000,10632, 9008,12568, 9016,13024, 9024,12608, 9032,12736, 
+   9040,9440, 9048,13536, 9056,12624, 9064,10696, 9072,9952, 9080,14048, 
+   9088,9240, 9096,12680, 9104,10464, 9112,14560, 9120,13336, 9128,14728, 
+   9136,10976, 9144,15072, 9152,9496, 9160,12744, 9168,11488, 9176,15584, 
+   9184,13592, 9192,14792, 9200,12000, 9208,16096, 9224,9344, 9232,9248, 
+   9240,12576, 9256,11392, 9264,12560, 9272,13088, 9280,9472, 9288,9408, 
+   9296,9504, 9304,13600, 9312,9488, 9320,11456, 9328,10016, 9336,14112, 
+   9352,13440, 9360,10528, 9368,14624, 9376,12360, 9384,15488, 9392,11040, 
+   9400,15136, 9408,9480, 9416,13504, 9424,11552, 9432,15648, 9440,12616, 
+   9448,15552, 9456,12064, 9464,16160, 9480,9600, 9488,9504, 9496,12640, 
+   9512,11648, 9520,12624, 9528,13152, 9544,9664, 9552,9568, 9560,13664, 
+   9576,11712, 9584,10080, 9592,14176, 9608,13696, 9616,10592, 9624,14688, 
+   9632,13384, 9640,15744, 9648,11104, 9656,15200, 9672,13760, 9680,11616, 
+   9688,15712, 9696,13640, 9704,15808, 9712,12128, 9720,16224, 9728,13312, 
+   9736,13440, 9744,13320, 9752,12704, 9760,13328, 9768,11400, 9776,13336, 
+   9784,13216, 9792,13376, 9800,13504, 9808,13384, 9816,13728, 9824,13392, 
+   9832,11464, 9840,10144, 9848,14240, 9856,13344, 9864,13448, 9872,10656, 
+   9880,14752, 9888,12376, 9896,15496, 9904,11168, 9912,15264, 9920,13408, 
+   9928,13512, 9936,11680, 9944,15776, 9952,12632, 9960,15560, 9968,12192, 
+   9976,16288, 9984,13568, 9992,13696, 10000,13576, 10008,12768, 10016,13584, 
+   10024,11656, 10032,13592, 10040,13280, 10048,13632, 10056,13760, 
+   10064,13640, 10072,13792, 10080,13648, 10088,11720, 10096,10208, 
+   10104,14304, 10112,13600, 10120,13704, 10128,10720, 10136,14816, 
+   10144,13400, 10152,15752, 10160,11232, 10168,15328, 10176,13664, 
+   10184,13768, 10192,11744, 10200,15840, 10208,13656, 10216,15816, 
+   10224,12256, 10232,16352, 10248,10272, 10256,10368, 10264,12328, 
+   10280,10384, 10288,10376, 10296,12840, 10304,11264, 10312,11296, 
+   10320,11392, 10328,13352, 10336,11272, 10344,10448, 10352,11400, 
+   10360,13864, 10376,12432, 10392,14376, 10400,12328, 10408,14480, 
+   10416,10792, 10424,14888, 10432,11280, 10440,12496, 10448,11304, 
+   10456,15400, 10464,11288, 10472,14544, 10480,11816, 10488,15912, 
+   10496,11264, 10504,11272, 10512,11280, 10520,12392, 10528,11296, 
+   10536,10640, 10544,12496, 10552,12904, 10560,11328, 10568,11360, 
+   10576,11456, 10584,13416, 10592,11336, 10600,10704, 10608,11464, 
+   10616,13928, 10624,11392, 10632,12688, 10640,11304, 10648,14440, 
+   10656,13352, 10664,14736, 10672,10856, 10680,14952, 10688,11344, 
+   10696,12752, 10704,11368, 10712,15464, 10720,11352, 10728,14800, 
+   10736,11880, 10744,15976, 10752,14336, 10760,14368, 10768,14464, 
+   10776,12456, 10784,14344, 10792,14376, 10800,14472, 10808,12968, 
+   10816,15360, 10824,15392, 10832,15488, 10840,13480, 10848,15368, 
+   10856,15400, 10864,15496, 10872,13992, 10880,14352, 10888,12440, 
+   10896,14480, 10904,14504, 10912,14360, 10920,14488, 10928,14488, 
+   10936,15016, 10944,15376, 10952,12504, 10960,11432, 10968,15528, 
+   10976,15384, 10984,14552, 10992,11944, 11000,16040, 11008,14400, 
+   11016,14432, 11024,14528, 11032,12520, 11040,14408, 11048,14440, 
+   11056,14536, 11064,13032, 11072,15424, 11080,15456, 11088,15552, 
+   11096,13544, 11104,15432, 11112,15464, 11120,15560, 11128,14056, 
+   11136,14416, 11144,12696, 11152,14544, 11160,14568, 11168,14424, 
+   11176,14744, 11184,14552, 11192,15080, 11200,15440, 11208,12760, 
+   11216,11496, 11224,15592, 11232,15448, 11240,14808, 11248,12008, 
+   11256,16104, 11272,11296, 11280,11392, 11288,12584, 11304,11408, 
+   11312,12688, 11320,13096, 11328,11520, 11336,11552, 11344,11648, 
+   11352,13608, 11360,11528, 11368,11472, 11376,11656, 11384,14120, 
+   11400,13456, 11416,14632, 11424,12392, 11432,15504, 11440,14440, 
+   11448,15144, 11456,11536, 11464,13520, 11472,11560, 11480,15656, 
+   11488,11544, 11496,15568, 11504,12072, 11512,16168, 11528,11552, 
+   11536,11648, 11544,12648, 11560,11664, 11568,12752, 11576,13160, 
+   11592,11616, 11600,11712, 11608,13672, 11624,11728, 11632,11720, 
+   11640,14184, 11656,13712, 11672,14696, 11680,13416, 11688,15760, 
+   11696,15464, 11704,15208, 11720,13776, 11736,15720, 11744,13672, 
+   11752,15824, 11760,12136, 11768,16232, 11776,14592, 11784,14624, 
+   11792,14720, 11800,12712, 11808,14600, 11816,14632, 11824,14728, 
+   11832,13224, 11840,15616, 11848,15648, 11856,15744, 11864,13736, 
+   11872,15624, 11880,15656, 11888,15752, 11896,14248, 11904,14608, 
+   11912,13464, 11920,14736, 11928,14760, 11936,14616, 11944,15512, 
+   11952,14744, 11960,15272, 11968,15632, 11976,13528, 11984,15760, 
+   11992,15784, 12000,15640, 12008,15576, 12016,12200, 12024,16296, 
+   12032,14656, 12040,14688, 12048,14784, 12056,12776, 12064,14664, 
+   12072,14696, 12080,14792, 12088,13288, 12096,15680, 12104,15712, 
+   12112,15808, 12120,13800, 12128,15688, 12136,15720, 12144,15816, 
+   12152,14312, 12160,14672, 12168,13720, 12176,14800, 12184,14824, 
+   12192,14680, 12200,15768, 12208,14808, 12216,15336, 12224,15696, 
+   12232,13784, 12240,15824, 12248,15848, 12256,15704, 12264,15832, 
+   12272,15832, 12280,16360, 12312,12336, 12344,12848, 12352,12544, 
+   12360,12552, 12368,12560, 12376,13360, 12384,12576, 12392,12584, 
+   12400,13336, 12408,13872, 12424,12448, 12440,14384, 12456,14496, 
+   12464,14472, 12472,14896, 12480,12672, 12488,12512, 12496,12688, 
+   12504,15408, 12512,12680, 12520,14560, 12528,14728, 12536,15920, 
+   12544,13312, 12552,13320, 12560,13328, 12568,13336, 12576,13344, 
+   12584,13352, 12592,13360, 12600,12912, 12608,13568, 12616,13576, 
+   12624,13584, 12632,13424, 12640,13600, 12648,13608, 12656,13400, 
+   12664,13936, 12672,13440, 12680,12704, 12688,13456, 12696,14448, 
+   12704,13448, 12712,14752, 12720,15496, 12728,14960, 12736,13696, 
+   12744,12768, 12752,13712, 12760,15472, 12768,13704, 12776,14816, 
+   12784,15752, 12792,15984, 12800,14336, 12808,14464, 12816,14344, 
+   12824,14472, 12832,14352, 12840,14480, 12848,14360, 12856,12976, 
+   12864,14400, 12872,14528, 12880,14408, 12888,13488, 12896,14416, 
+   12904,14544, 12912,14424, 12920,14000, 12928,14368, 12936,14496, 
+   12944,14376, 12952,14512, 12960,14384, 12968,14504, 12976,14488, 
+   12984,15024, 12992,14432, 13000,14560, 13008,14440, 13016,15536, 
+   13024,14448, 13032,14568, 13040,14744, 13048,16048, 13056,14592, 
+   13064,14720, 13072,14600, 13080,14728, 13088,14608, 13096,14736, 
+   13104,14616, 13112,14744, 13120,14656, 13128,14784, 13136,14664, 
+   13144,13552, 13152,14672, 13160,14800, 13168,14680, 13176,14064, 
+   13184,14624, 13192,14752, 13200,14632, 13208,14576, 13216,13464, 
+   13224,14760, 13232,15512, 13240,15088, 13248,14688, 13256,14816, 
+   13264,14696, 13272,15600, 13280,13720, 13288,14824, 13296,15768, 
+   13304,16112, 13336,13360, 13368,14616, 13376,13568, 13384,13576, 
+   13392,13584, 13400,13616, 13408,13600, 13416,13608, 13424,13592, 
+   13432,14128, 13448,13472, 13464,14640, 13480,15520, 13488,14536, 
+   13496,15152, 13504,13696, 13512,13536, 13520,13712, 13528,15664, 
+   13536,13704, 13544,15584, 13552,14792, 13560,16176, 13592,13616, 
+   13624,14680, 13656,13680, 13688,14192, 13704,13728, 13720,14704, 
+   13736,15776, 13744,15560, 13752,15216, 13768,13792, 13784,15728, 
+   13800,15840, 13808,15816, 13816,16240, 13824,15360, 13832,15488, 
+   13840,15368, 13848,15496, 13856,15376, 13864,15504, 13872,15384, 
+   13880,15512, 13888,15424, 13896,15552, 13904,15432, 13912,15560, 
+   13920,15440, 13928,15568, 13936,15448, 13944,14256, 13952,15392, 
+   13960,15520, 13968,15400, 13976,14768, 13984,15408, 13992,15528, 
+   14000,14552, 14008,15280, 14016,15456, 14024,15584, 14032,15464, 
+   14040,15792, 14048,15472, 14056,15592, 14064,14808, 14072,16304, 
+   14080,15616, 14088,15744, 14096,15624, 14104,15752, 14112,15632, 
+   14120,15760, 14128,15640, 14136,15768, 14144,15680, 14152,15808, 
+   14160,15688, 14168,15816, 14176,15696, 14184,15824, 14192,15704, 
+   14200,14320, 14208,15648, 14216,15776, 14224,15656, 14232,14832, 
+   14240,15664, 14248,15784, 14256,15576, 14264,15344, 14272,15712, 
+   14280,15840, 14288,15720, 14296,15856, 14304,15728, 14312,15848, 
+   14320,15832, 14328,16368, 14392,14488, 14400,14592, 14408,14600, 
+   14416,14608, 14424,14616, 14432,14624, 14440,14632, 14448,14640, 
+   14456,15512, 14504,14512, 14520,14904, 14528,14720, 14536,14728, 
+   14544,14736, 14552,15416, 14560,14752, 14568,14576, 14584,15928,
+   14576,14760, 14592,15360, 14600,15368, 14608,15376, 14616,15384, 
+   14624,15392, 14632,15400, 14640,15408, 14648,15416, 14656,15616, 
+   14664,15624, 14672,15632, 14680,15640, 14688,15648, 14696,15656, 
+   14704,15664, 14712,15576, 14720,15488, 14728,15496, 14736,15504, 
+   14744,15512, 14752,15520, 14760,14768, 14776,14968, 14768,15528, 
+   14784,15744, 14792,15752, 14800,15760, 14808,15480, 14816,15776, 
+   14824,14832, 14840,15992, 14832,15784, 14856,14864, 14864,14880, 
+   14872,14896, 14880,14976, 14888,14992, 14896,15008, 14904,15024, 
+   14912,15104, 14920,15120, 14928,15136, 14936,15152, 14944,15232, 
+   14952,15248, 14960,15264, 14968,15280, 14984,15008, 15000,15024, 
+   15016,15024, 15040,15112, 15048,15128, 15056,15144, 15064,15544, 
+   15072,15240, 15080,15256, 15088,15272, 15096,16056, 15104,15872, 
+   15112,15888, 15120,15904, 15128,15920, 15136,16000, 15144,16016, 
+   15152,16032, 15160,16048, 15168,16128, 15176,16144, 15184,16160, 
+   15192,16176, 15200,16256, 15208,16272, 15216,16288, 15224,16304, 
+   15232,15880, 15240,15896, 15248,15912, 15256,15928, 15264,16008, 
+   15272,16024, 15280,16040, 15288,16056, 15296,16136, 15304,16152, 
+   15312,16168, 15320,15608, 15328,16264, 15336,16280, 15344,16296, 
+   15352,16120, 15416,15512, 15424,15616, 15432,15624, 15440,15632, 
+   15448,15640, 15456,15648, 15464,15656, 15472,15664, 15480,15768, 
+   15528,15536, 15544,16048, 15552,15744, 15560,15752, 15568,15760, 
+   15576,15672, 15584,15776, 15592,15600, 15600,15784, 15608,16184, 
+   15672,15768, 15736,15832, 15784,15792, 15800,16304, 15848,15856, 
+   15880,16000, 15864,16248, 15888,16000, 15896,16008, 15904,16000, 
+   15912,16016, 15920,16008, 15928,16024, 15936,16128, 15944,16160, 
+   15952,16256, 15960,16288, 15968,16136, 15976,16168, 15984,16264, 
+   15992,16296, 16008,16032, 16024,16040, 16064,16144, 16040,16048, 
+   16072,16176, 16080,16272, 16088,16304, 16096,16152, 16104,16184, 
+   16112,16280, 16136,16256, 16120,16312, 16144,16256, 16152,16264, 
+   16160,16256, 16168,16272, 16176,16264, 16184,16280, 16200,16208, 
+   16208,16224, 16216,16240, 16224,16320, 16232,16336, 16240,16352, 
+   16248,16368, 16264,16288, 16280,16296, 16296,16304, 16344,16368,
+   16328,16352, 16360,16368
+};
+
+const uint16_t armBitRevIndexTable4096[ARMBITREVINDEXTABLE4096_TABLE_LENGTH] = 
+{
+   //radix 8, size 4032
+   8,4096, 16,8192, 24,12288, 32,16384, 40,20480, 48,24576, 56,28672, 64,512, 
+   72,4608, 80,8704, 88,12800, 96,16896, 104,20992, 112,25088, 120,29184, 
+   128,1024, 136,5120, 144,9216, 152,13312, 160,17408, 168,21504, 176,25600, 
+   184,29696, 192,1536, 200,5632, 208,9728, 216,13824, 224,17920, 232,22016, 
+   240,26112, 248,30208, 256,2048, 264,6144, 272,10240, 280,14336, 288,18432, 
+   296,22528, 304,26624, 312,30720, 320,2560, 328,6656, 336,10752, 344,14848, 
+   352,18944, 360,23040, 368,27136, 376,31232, 384,3072, 392,7168, 400,11264, 
+   408,15360, 416,19456, 424,23552, 432,27648, 440,31744, 448,3584, 456,7680, 
+   464,11776, 472,15872, 480,19968, 488,24064, 496,28160, 504,32256, 520,4160, 
+   528,8256, 536,12352, 544,16448, 552,20544, 560,24640, 568,28736, 584,4672, 
+   592,8768, 600,12864, 608,16960, 616,21056, 624,25152, 632,29248, 640,1088, 
+   648,5184, 656,9280, 664,13376, 672,17472, 680,21568, 688,25664, 696,29760, 
+   704,1600, 712,5696, 720,9792, 728,13888, 736,17984, 744,22080, 752,26176, 
+   760,30272, 768,2112, 776,6208, 784,10304, 792,14400, 800,18496, 808,22592, 
+   816,26688, 824,30784, 832,2624, 840,6720, 848,10816, 856,14912, 864,19008, 
+   872,23104, 880,27200, 888,31296, 896,3136, 904,7232, 912,11328, 920,15424, 
+   928,19520, 936,23616, 944,27712, 952,31808, 960,3648, 968,7744, 976,11840, 
+   984,15936, 992,20032, 1000,24128, 1008,28224, 1016,32320, 1032,4224, 
+   1040,8320, 1048,12416, 1056,16512, 1064,20608, 1072,24704, 1080,28800, 
+   1096,4736, 1104,8832, 1112,12928, 1120,17024, 1128,21120, 1136,25216, 
+   1144,29312, 1160,5248, 1168,9344, 1176,13440, 1184,17536, 1192,21632, 
+   1200,25728, 1208,29824, 1216,1664, 1224,5760, 1232,9856, 1240,13952, 
+   1248,18048, 1256,22144, 1264,26240, 1272,30336, 1280,2176, 1288,6272, 
+   1296,10368, 1304,14464, 1312,18560, 1320,22656, 1328,26752, 1336,30848, 
+   1344,2688, 1352,6784, 1360,10880, 1368,14976, 1376,19072, 1384,23168, 
+   1392,27264, 1400,31360, 1408,3200, 1416,7296, 1424,11392, 1432,15488, 
+   1440,19584, 1448,23680, 1456,27776, 1464,31872, 1472,3712, 1480,7808, 
+   1488,11904, 1496,16000, 1504,20096, 1512,24192, 1520,28288, 1528,32384, 
+   1544,4288, 1552,8384, 1560,12480, 1568,16576, 1576,20672, 1584,24768, 
+   1592,28864, 1608,4800, 1616,8896, 1624,12992, 1632,17088, 1640,21184, 
+   1648,25280, 1656,29376, 1672,5312, 1680,9408, 1688,13504, 1696,17600, 
+   1704,21696, 1712,25792, 1720,29888, 1736,5824, 1744,9920, 1752,14016, 
+   1760,18112, 1768,22208, 1776,26304, 1784,30400, 1792,2240, 1800,6336, 
+   1808,10432, 1816,14528, 1824,18624, 1832,22720, 1840,26816, 1848,30912, 
+   1856,2752, 1864,6848, 1872,10944, 1880,15040, 1888,19136, 1896,23232, 
+   1904,27328, 1912,31424, 1920,3264, 1928,7360, 1936,11456, 1944,15552, 
+   1952,19648, 1960,23744, 1968,27840, 1976,31936, 1984,3776, 1992,7872, 
+   2000,11968, 2008,16064, 2016,20160, 2024,24256, 2032,28352, 2040,32448, 
+   2056,4352, 2064,8448, 2072,12544, 2080,16640, 2088,20736, 2096,24832, 
+   2104,28928, 2120,4864, 2128,8960, 2136,13056, 2144,17152, 2152,21248, 
+   2160,25344, 2168,29440, 2184,5376, 2192,9472, 2200,13568, 2208,17664, 
+   2216,21760, 2224,25856, 2232,29952, 2248,5888, 2256,9984, 2264,14080, 
+   2272,18176, 2280,22272, 2288,26368, 2296,30464, 2312,6400, 2320,10496, 
+   2328,14592, 2336,18688, 2344,22784, 2352,26880, 2360,30976, 2368,2816, 
+   2376,6912, 2384,11008, 2392,15104, 2400,19200, 2408,23296, 2416,27392, 
+   2424,31488, 2432,3328, 2440,7424, 2448,11520, 2456,15616, 2464,19712, 
+   2472,23808, 2480,27904, 2488,32000, 2496,3840, 2504,7936, 2512,12032, 
+   2520,16128, 2528,20224, 2536,24320, 2544,28416, 2552,32512, 2568,4416, 
+   2576,8512, 2584,12608, 2592,16704, 2600,20800, 2608,24896, 2616,28992, 
+   2632,4928, 2640,9024, 2648,13120, 2656,17216, 2664,21312, 2672,25408, 
+   2680,29504, 2696,5440, 2704,9536, 2712,13632, 2720,17728, 2728,21824, 
+   2736,25920, 2744,30016, 2760,5952, 2768,10048, 2776,14144, 2784,18240, 
+   2792,22336, 2800,26432, 2808,30528, 2824,6464, 2832,10560, 2840,14656, 
+   2848,18752, 2856,22848, 2864,26944, 2872,31040, 2888,6976, 2896,11072, 
+   2904,15168, 2912,19264, 2920,23360, 2928,27456, 2936,31552, 2944,3392, 
+   2952,7488, 2960,11584, 2968,15680, 2976,19776, 2984,23872, 2992,27968, 
+   3000,32064, 3008,3904, 3016,8000, 3024,12096, 3032,16192, 3040,20288, 
+   3048,24384, 3056,28480, 3064,32576, 3080,4480, 3088,8576, 3096,12672, 
+   3104,16768, 3112,20864, 3120,24960, 3128,29056, 3144,4992, 3152,9088, 
+   3160,13184, 3168,17280, 3176,21376, 3184,25472, 3192,29568, 3208,5504, 
+   3216,9600, 3224,13696, 3232,17792, 3240,21888, 3248,25984, 3256,30080, 
+   3272,6016, 3280,10112, 3288,14208, 3296,18304, 3304,22400, 3312,26496, 
+   3320,30592, 3336,6528, 3344,10624, 3352,14720, 3360,18816, 3368,22912, 
+   3376,27008, 3384,31104, 3400,7040, 3408,11136, 3416,15232, 3424,19328, 
+   3432,23424, 3440,27520, 3448,31616, 3464,7552, 3472,11648, 3480,15744, 
+   3488,19840, 3496,23936, 3504,28032, 3512,32128, 3520,3968, 3528,8064, 
+   3536,12160, 3544,16256, 3552,20352, 3560,24448, 3568,28544, 3576,32640, 
+   3592,4544, 3600,8640, 3608,12736, 3616,16832, 3624,20928, 3632,25024, 
+   3640,29120, 3656,5056, 3664,9152, 3672,13248, 3680,17344, 3688,21440, 
+   3696,25536, 3704,29632, 3720,5568, 3728,9664, 3736,13760, 3744,17856, 
+   3752,21952, 3760,26048, 3768,30144, 3784,6080, 3792,10176, 3800,14272, 
+   3808,18368, 3816,22464, 3824,26560, 3832,30656, 3848,6592, 3856,10688, 
+   3864,14784, 3872,18880, 3880,22976, 3888,27072, 3896,31168, 3912,7104, 
+   3920,11200, 3928,15296, 3936,19392, 3944,23488, 3952,27584, 3960,31680, 
+   3976,7616, 3984,11712, 3992,15808, 4000,19904, 4008,24000, 4016,28096, 
+   4024,32192, 4040,8128, 4048,12224, 4056,16320, 4064,20416, 4072,24512, 
+   4080,28608, 4088,32704, 4112,8200, 4120,12296, 4128,16392, 4136,20488, 
+   4144,24584, 4152,28680, 4168,4616, 4176,8712, 4184,12808, 4192,16904, 
+   4200,21000, 4208,25096, 4216,29192, 4232,5128, 4240,9224, 4248,13320, 
+   4256,17416, 4264,21512, 4272,25608, 4280,29704, 4296,5640, 4304,9736, 
+   4312,13832, 4320,17928, 4328,22024, 4336,26120, 4344,30216, 4360,6152, 
+   4368,10248, 4376,14344, 4384,18440, 4392,22536, 4400,26632, 4408,30728, 
+   4424,6664, 4432,10760, 4440,14856, 4448,18952, 4456,23048, 4464,27144, 
+   4472,31240, 4488,7176, 4496,11272, 4504,15368, 4512,19464, 4520,23560, 
+   4528,27656, 4536,31752, 4552,7688, 4560,11784, 4568,15880, 4576,19976, 
+   4584,24072, 4592,28168, 4600,32264, 4624,8264, 4632,12360, 4640,16456, 
+   4648,20552, 4656,24648, 4664,28744, 4688,8776, 4696,12872, 4704,16968, 
+   4712,21064, 4720,25160, 4728,29256, 4744,5192, 4752,9288, 4760,13384, 
+   4768,17480, 4776,21576, 4784,25672, 4792,29768, 4808,5704, 4816,9800, 
+   4824,13896, 4832,17992, 4840,22088, 4848,26184, 4856,30280, 4872,6216, 
+   4880,10312, 4888,14408, 4896,18504, 4904,22600, 4912,26696, 4920,30792, 
+   4936,6728, 4944,10824, 4952,14920, 4960,19016, 4968,23112, 4976,27208, 
+   4984,31304, 5000,7240, 5008,11336, 5016,15432, 5024,19528, 5032,23624, 
+   5040,27720, 5048,31816, 5064,7752, 5072,11848, 5080,15944, 5088,20040, 
+   5096,24136, 5104,28232, 5112,32328, 5136,8328, 5144,12424, 5152,16520, 
+   5160,20616, 5168,24712, 5176,28808, 5200,8840, 5208,12936, 5216,17032, 
+   5224,21128, 5232,25224, 5240,29320, 5264,9352, 5272,13448, 5280,17544, 
+   5288,21640, 5296,25736, 5304,29832, 5320,5768, 5328,9864, 5336,13960, 
+   5344,18056, 5352,22152, 5360,26248, 5368,30344, 5384,6280, 5392,10376, 
+   5400,14472, 5408,18568, 5416,22664, 5424,26760, 5432,30856, 5448,6792, 
+   5456,10888, 5464,14984, 5472,19080, 5480,23176, 5488,27272, 5496,31368, 
+   5512,7304, 5520,11400, 5528,15496, 5536,19592, 5544,23688, 5552,27784, 
+   5560,31880, 5576,7816, 5584,11912, 5592,16008, 5600,20104, 5608,24200, 
+   5616,28296, 5624,32392, 5648,8392, 5656,12488, 5664,16584, 5672,20680, 
+   5680,24776, 5688,28872, 5712,8904, 5720,13000, 5728,17096, 5736,21192, 
+   5744,25288, 5752,29384, 5776,9416, 5784,13512, 5792,17608, 5800,21704, 
+   5808,25800, 5816,29896, 5840,9928, 5848,14024, 5856,18120, 5864,22216, 
+   5872,26312, 5880,30408, 5896,6344, 5904,10440, 5912,14536, 5920,18632, 
+   5928,22728, 5936,26824, 5944,30920, 5960,6856, 5968,10952, 5976,15048, 
+   5984,19144, 5992,23240, 6000,27336, 6008,31432, 6024,7368, 6032,11464, 
+   6040,15560, 6048,19656, 6056,23752, 6064,27848, 6072,31944, 6088,7880, 
+   6096,11976, 6104,16072, 6112,20168, 6120,24264, 6128,28360, 6136,32456, 
+   6160,8456, 6168,12552, 6176,16648, 6184,20744, 6192,24840, 6200,28936, 
+   6224,8968, 6232,13064, 6240,17160, 6248,21256, 6256,25352, 6264,29448, 
+   6288,9480, 6296,13576, 6304,17672, 6312,21768, 6320,25864, 6328,29960, 
+   6352,9992, 6360,14088, 6368,18184, 6376,22280, 6384,26376, 6392,30472, 
+   6416,10504, 6424,14600, 6432,18696, 6440,22792, 6448,26888, 6456,30984, 
+   6472,6920, 6480,11016, 6488,15112, 6496,19208, 6504,23304, 6512,27400, 
+   6520,31496, 6536,7432, 6544,11528, 6552,15624, 6560,19720, 6568,23816, 
+   6576,27912, 6584,32008, 6600,7944, 6608,12040, 6616,16136, 6624,20232, 
+   6632,24328, 6640,28424, 6648,32520, 6672,8520, 6680,12616, 6688,16712, 
+   6696,20808, 6704,24904, 6712,29000, 6736,9032, 6744,13128, 6752,17224, 
+   6760,21320, 6768,25416, 6776,29512, 6800,9544, 6808,13640, 6816,17736, 
+   6824,21832, 6832,25928, 6840,30024, 6864,10056, 6872,14152, 6880,18248, 
+   6888,22344, 6896,26440, 6904,30536, 6928,10568, 6936,14664, 6944,18760, 
+   6952,22856, 6960,26952, 6968,31048, 6992,11080, 7000,15176, 7008,19272, 
+   7016,23368, 7024,27464, 7032,31560, 7048,7496, 7056,11592, 7064,15688, 
+   7072,19784, 7080,23880, 7088,27976, 7096,32072, 7112,8008, 7120,12104, 
+   7128,16200, 7136,20296, 7144,24392, 7152,28488, 7160,32584, 7184,8584, 
+   7192,12680, 7200,16776, 7208,20872, 7216,24968, 7224,29064, 7248,9096, 
+   7256,13192, 7264,17288, 7272,21384, 7280,25480, 7288,29576, 7312,9608, 
+   7320,13704, 7328,17800, 7336,21896, 7344,25992, 7352,30088, 7376,10120, 
+   7384,14216, 7392,18312, 7400,22408, 7408,26504, 7416,30600, 7440,10632, 
+   7448,14728, 7456,18824, 7464,22920, 7472,27016, 7480,31112, 7504,11144, 
+   7512,15240, 7520,19336, 7528,23432, 7536,27528, 7544,31624, 7568,11656, 
+   7576,15752, 7584,19848, 7592,23944, 7600,28040, 7608,32136, 7624,8072, 
+   7632,12168, 7640,16264, 7648,20360, 7656,24456, 7664,28552, 7672,32648, 
+   7696,8648, 7704,12744, 7712,16840, 7720,20936, 7728,25032, 7736,29128, 
+   7760,9160, 7768,13256, 7776,17352, 7784,21448, 7792,25544, 7800,29640, 
+   7824,9672, 7832,13768, 7840,17864, 7848,21960, 7856,26056, 7864,30152, 
+   7888,10184, 7896,14280, 7904,18376, 7912,22472, 7920,26568, 7928,30664, 
+   7952,10696, 7960,14792, 7968,18888, 7976,22984, 7984,27080, 7992,31176, 
+   8016,11208, 8024,15304, 8032,19400, 8040,23496, 8048,27592, 8056,31688, 
+   8080,11720, 8088,15816, 8096,19912, 8104,24008, 8112,28104, 8120,32200, 
+   8144,12232, 8152,16328, 8160,20424, 8168,24520, 8176,28616, 8184,32712, 
+   8216,12304, 8224,16400, 8232,20496, 8240,24592, 8248,28688, 8272,8720, 
+   8280,12816, 8288,16912, 8296,21008, 8304,25104, 8312,29200, 8336,9232, 
+   8344,13328, 8352,17424, 8360,21520, 8368,25616, 8376,29712, 8400,9744, 
+   8408,13840, 8416,17936, 8424,22032, 8432,26128, 8440,30224, 8464,10256, 
+   8472,14352, 8480,18448, 8488,22544, 8496,26640, 8504,30736, 8528,10768, 
+   8536,14864, 8544,18960, 8552,23056, 8560,27152, 8568,31248, 8592,11280, 
+   8600,15376, 8608,19472, 8616,23568, 8624,27664, 8632,31760, 8656,11792, 
+   8664,15888, 8672,19984, 8680,24080, 8688,28176, 8696,32272, 8728,12368, 
+   8736,16464, 8744,20560, 8752,24656, 8760,28752, 8792,12880, 8800,16976, 
+   8808,21072, 8816,25168, 8824,29264, 8848,9296, 8856,13392, 8864,17488, 
+   8872,21584, 8880,25680, 8888,29776, 8912,9808, 8920,13904, 8928,18000, 
+   8936,22096, 8944,26192, 8952,30288, 8976,10320, 8984,14416, 8992,18512, 
+   9000,22608, 9008,26704, 9016,30800, 9040,10832, 9048,14928, 9056,19024, 
+   9064,23120, 9072,27216, 9080,31312, 9104,11344, 9112,15440, 9120,19536, 
+   9128,23632, 9136,27728, 9144,31824, 9168,11856, 9176,15952, 9184,20048, 
+   9192,24144, 9200,28240, 9208,32336, 9240,12432, 9248,16528, 9256,20624, 
+   9264,24720, 9272,28816, 9304,12944, 9312,17040, 9320,21136, 9328,25232, 
+   9336,29328, 9368,13456, 9376,17552, 9384,21648, 9392,25744, 9400,29840, 
+   9424,9872, 9432,13968, 9440,18064, 9448,22160, 9456,26256, 9464,30352, 
+   9488,10384, 9496,14480, 9504,18576, 9512,22672, 9520,26768, 9528,30864, 
+   9552,10896, 9560,14992, 9568,19088, 9576,23184, 9584,27280, 9592,31376, 
+   9616,11408, 9624,15504, 9632,19600, 9640,23696, 9648,27792, 9656,31888, 
+   9680,11920, 9688,16016, 9696,20112, 9704,24208, 9712,28304, 9720,32400, 
+   9752,12496, 9760,16592, 9768,20688, 9776,24784, 9784,28880, 9816,13008, 
+   9824,17104, 9832,21200, 9840,25296, 9848,29392, 9880,13520, 9888,17616, 
+   9896,21712, 9904,25808, 9912,29904, 9944,14032, 9952,18128, 9960,22224, 
+   9968,26320, 9976,30416, 10000,10448, 10008,14544, 10016,18640, 10024,22736, 
+   10032,26832, 10040,30928, 10064,10960, 10072,15056, 10080,19152, 
+   10088,23248, 10096,27344, 10104,31440, 10128,11472, 10136,15568, 
+   10144,19664, 10152,23760, 10160,27856, 10168,31952, 10192,11984, 
+   10200,16080, 10208,20176, 10216,24272, 10224,28368, 10232,32464, 
+   10264,12560, 10272,16656, 10280,20752, 10288,24848, 10296,28944, 
+   10328,13072, 10336,17168, 10344,21264, 10352,25360, 10360,29456, 
+   10392,13584, 10400,17680, 10408,21776, 10416,25872, 10424,29968, 
+   10456,14096, 10464,18192, 10472,22288, 10480,26384, 10488,30480, 
+   10520,14608, 10528,18704, 10536,22800, 10544,26896, 10552,30992, 
+   10576,11024, 10584,15120, 10592,19216, 10600,23312, 10608,27408, 
+   10616,31504, 10640,11536, 10648,15632, 10656,19728, 10664,23824, 
+   10672,27920, 10680,32016, 10704,12048, 10712,16144, 10720,20240, 
+   10728,24336, 10736,28432, 10744,32528, 10776,12624, 10784,16720, 
+   10792,20816, 10800,24912, 10808,29008, 10840,13136, 10848,17232, 
+   10856,21328, 10864,25424, 10872,29520, 10904,13648, 10912,17744, 
+   10920,21840, 10928,25936, 10936,30032, 10968,14160, 10976,18256, 
+   10984,22352, 10992,26448, 11000,30544, 11032,14672, 11040,18768, 
+   11048,22864, 11056,26960, 11064,31056, 11096,15184, 11104,19280, 
+   11112,23376, 11120,27472, 11128,31568, 11152,11600, 11160,15696, 
+   11168,19792, 11176,23888, 11184,27984, 11192,32080, 11216,12112, 
+   11224,16208, 11232,20304, 11240,24400, 11248,28496, 11256,32592, 
+   11288,12688, 11296,16784, 11304,20880, 11312,24976, 11320,29072, 
+   11352,13200, 11360,17296, 11368,21392, 11376,25488, 11384,29584, 
+   11416,13712, 11424,17808, 11432,21904, 11440,26000, 11448,30096, 
+   11480,14224, 11488,18320, 11496,22416, 11504,26512, 11512,30608, 
+   11544,14736, 11552,18832, 11560,22928, 11568,27024, 11576,31120, 
+   11608,15248, 11616,19344, 11624,23440, 11632,27536, 11640,31632, 
+   11672,15760, 11680,19856, 11688,23952, 11696,28048, 11704,32144, 
+   11728,12176, 11736,16272, 11744,20368, 11752,24464, 11760,28560, 
+   11768,32656, 11800,12752, 11808,16848, 11816,20944, 11824,25040, 
+   11832,29136, 11864,13264, 11872,17360, 11880,21456, 11888,25552, 
+   11896,29648, 11928,13776, 11936,17872, 11944,21968, 11952,26064, 
+   11960,30160, 11992,14288, 12000,18384, 12008,22480, 12016,26576, 
+   12024,30672, 12056,14800, 12064,18896, 12072,22992, 12080,27088, 
+   12088,31184, 12120,15312, 12128,19408, 12136,23504, 12144,27600, 
+   12152,31696, 12184,15824, 12192,19920, 12200,24016, 12208,28112, 
+   12216,32208, 12248,16336, 12256,20432, 12264,24528, 12272,28624, 
+   12280,32720, 12320,16408, 12328,20504, 12336,24600, 12344,28696, 
+   12376,12824, 12384,16920, 12392,21016, 12400,25112, 12408,29208, 
+   12440,13336, 12448,17432, 12456,21528, 12464,25624, 12472,29720, 
+   12504,13848, 12512,17944, 12520,22040, 12528,26136, 12536,30232, 
+   12568,14360, 12576,18456, 12584,22552, 12592,26648, 12600,30744, 
+   12632,14872, 12640,18968, 12648,23064, 12656,27160, 12664,31256, 
+   12696,15384, 12704,19480, 12712,23576, 12720,27672, 12728,31768, 
+   12760,15896, 12768,19992, 12776,24088, 12784,28184, 12792,32280, 
+   12832,16472, 12840,20568, 12848,24664, 12856,28760, 12896,16984, 
+   12904,21080, 12912,25176, 12920,29272, 12952,13400, 12960,17496, 
+   12968,21592, 12976,25688, 12984,29784, 13016,13912, 13024,18008, 
+   13032,22104, 13040,26200, 13048,30296, 13080,14424, 13088,18520, 
+   13096,22616, 13104,26712, 13112,30808, 13144,14936, 13152,19032, 
+   13160,23128, 13168,27224, 13176,31320, 13208,15448, 13216,19544, 
+   13224,23640, 13232,27736, 13240,31832, 13272,15960, 13280,20056, 
+   13288,24152, 13296,28248, 13304,32344, 13344,16536, 13352,20632, 
+   13360,24728, 13368,28824, 13408,17048, 13416,21144, 13424,25240, 
+   13432,29336, 13472,17560, 13480,21656, 13488,25752, 13496,29848, 
+   13528,13976, 13536,18072, 13544,22168, 13552,26264, 13560,30360, 
+   13592,14488, 13600,18584, 13608,22680, 13616,26776, 13624,30872, 
+   13656,15000, 13664,19096, 13672,23192, 13680,27288, 13688,31384, 
+   13720,15512, 13728,19608, 13736,23704, 13744,27800, 13752,31896, 
+   13784,16024, 13792,20120, 13800,24216, 13808,28312, 13816,32408, 
+   13856,16600, 13864,20696, 13872,24792, 13880,28888, 13920,17112, 
+   13928,21208, 13936,25304, 13944,29400, 13984,17624, 13992,21720, 
+   14000,25816, 14008,29912, 14048,18136, 14056,22232, 14064,26328, 
+   14072,30424, 14104,14552, 14112,18648, 14120,22744, 14128,26840, 
+   14136,30936, 14168,15064, 14176,19160, 14184,23256, 14192,27352, 
+   14200,31448, 14232,15576, 14240,19672, 14248,23768, 14256,27864, 
+   14264,31960, 14296,16088, 14304,20184, 14312,24280, 14320,28376, 
+   14328,32472, 14368,16664, 14376,20760, 14384,24856, 14392,28952, 
+   14432,17176, 14440,21272, 14448,25368, 14456,29464, 14496,17688, 
+   14504,21784, 14512,25880, 14520,29976, 14560,18200, 14568,22296, 
+   14576,26392, 14584,30488, 14624,18712, 14632,22808, 14640,26904, 
+   14648,31000, 14680,15128, 14688,19224, 14696,23320, 14704,27416, 
+   14712,31512, 14744,15640, 14752,19736, 14760,23832, 14768,27928, 
+   14776,32024, 14808,16152, 14816,20248, 14824,24344, 14832,28440, 
+   14840,32536, 14880,16728, 14888,20824, 14896,24920, 14904,29016, 
+   14944,17240, 14952,21336, 14960,25432, 14968,29528, 15008,17752, 
+   15016,21848, 15024,25944, 15032,30040, 15072,18264, 15080,22360, 
+   15088,26456, 15096,30552, 15136,18776, 15144,22872, 15152,26968, 
+   15160,31064, 15200,19288, 15208,23384, 15216,27480, 15224,31576, 
+   15256,15704, 15264,19800, 15272,23896, 15280,27992, 15288,32088, 
+   15320,16216, 15328,20312, 15336,24408, 15344,28504, 15352,32600, 
+   15392,16792, 15400,20888, 15408,24984, 15416,29080, 15456,17304, 
+   15464,21400, 15472,25496, 15480,29592, 15520,17816, 15528,21912, 
+   15536,26008, 15544,30104, 15584,18328, 15592,22424, 15600,26520, 
+   15608,30616, 15648,18840, 15656,22936, 15664,27032, 15672,31128, 
+   15712,19352, 15720,23448, 15728,27544, 15736,31640, 15776,19864, 
+   15784,23960, 15792,28056, 15800,32152, 15832,16280, 15840,20376, 
+   15848,24472, 15856,28568, 15864,32664, 15904,16856, 15912,20952, 
+   15920,25048, 15928,29144, 15968,17368, 15976,21464, 15984,25560, 
+   15992,29656, 16032,17880, 16040,21976, 16048,26072, 16056,30168, 
+   16096,18392, 16104,22488, 16112,26584, 16120,30680, 16160,18904, 
+   16168,23000, 16176,27096, 16184,31192, 16224,19416, 16232,23512, 
+   16240,27608, 16248,31704, 16288,19928, 16296,24024, 16304,28120, 
+   16312,32216, 16352,20440, 16360,24536, 16368,28632, 16376,32728, 
+   16424,20512, 16432,24608, 16440,28704, 16480,16928, 16488,21024, 
+   16496,25120, 16504,29216, 16544,17440, 16552,21536, 16560,25632, 
+   16568,29728, 16608,17952, 16616,22048, 16624,26144, 16632,30240, 
+   16672,18464, 16680,22560, 16688,26656, 16696,30752, 16736,18976, 
+   16744,23072, 16752,27168, 16760,31264, 16800,19488, 16808,23584, 
+   16816,27680, 16824,31776, 16864,20000, 16872,24096, 16880,28192, 
+   16888,32288, 16936,20576, 16944,24672, 16952,28768, 17000,21088, 
+   17008,25184, 17016,29280, 17056,17504, 17064,21600, 17072,25696, 
+   17080,29792, 17120,18016, 17128,22112, 17136,26208, 17144,30304, 
+   17184,18528, 17192,22624, 17200,26720, 17208,30816, 17248,19040, 
+   17256,23136, 17264,27232, 17272,31328, 17312,19552, 17320,23648, 
+   17328,27744, 17336,31840, 17376,20064, 17384,24160, 17392,28256, 
+   17400,32352, 17448,20640, 17456,24736, 17464,28832, 17512,21152, 
+   17520,25248, 17528,29344, 17576,21664, 17584,25760, 17592,29856, 
+   17632,18080, 17640,22176, 17648,26272, 17656,30368, 17696,18592, 
+   17704,22688, 17712,26784, 17720,30880, 17760,19104, 17768,23200, 
+   17776,27296, 17784,31392, 17824,19616, 17832,23712, 17840,27808, 
+   17848,31904, 17888,20128, 17896,24224, 17904,28320, 17912,32416, 
+   17960,20704, 17968,24800, 17976,28896, 18024,21216, 18032,25312, 
+   18040,29408, 18088,21728, 18096,25824, 18104,29920, 18152,22240, 
+   18160,26336, 18168,30432, 18208,18656, 18216,22752, 18224,26848, 
+   18232,30944, 18272,19168, 18280,23264, 18288,27360, 18296,31456, 
+   18336,19680, 18344,23776, 18352,27872, 18360,31968, 18400,20192, 
+   18408,24288, 18416,28384, 18424,32480, 18472,20768, 18480,24864, 
+   18488,28960, 18536,21280, 18544,25376, 18552,29472, 18600,21792, 
+   18608,25888, 18616,29984, 18664,22304, 18672,26400, 18680,30496, 
+   18728,22816, 18736,26912, 18744,31008, 18784,19232, 18792,23328, 
+   18800,27424, 18808,31520, 18848,19744, 18856,23840, 18864,27936, 
+   18872,32032, 18912,20256, 18920,24352, 18928,28448, 18936,32544, 
+   18984,20832, 18992,24928, 19000,29024, 19048,21344, 19056,25440, 
+   19064,29536, 19112,21856, 19120,25952, 19128,30048, 19176,22368, 
+   19184,26464, 19192,30560, 19240,22880, 19248,26976, 19256,31072, 
+   19304,23392, 19312,27488, 19320,31584, 19360,19808, 19368,23904, 
+   19376,28000, 19384,32096, 19424,20320, 19432,24416, 19440,28512, 
+   19448,32608, 19496,20896, 19504,24992, 19512,29088, 19560,21408, 
+   19568,25504, 19576,29600, 19624,21920, 19632,26016, 19640,30112, 
+   19688,22432, 19696,26528, 19704,30624, 19752,22944, 19760,27040, 
+   19768,31136, 19816,23456, 19824,27552, 19832,31648, 19880,23968, 
+   19888,28064, 19896,32160, 19936,20384, 19944,24480, 19952,28576, 
+   19960,32672, 20008,20960, 20016,25056, 20024,29152, 20072,21472, 
+   20080,25568, 20088,29664, 20136,21984, 20144,26080, 20152,30176, 
+   20200,22496, 20208,26592, 20216,30688, 20264,23008, 20272,27104, 
+   20280,31200, 20328,23520, 20336,27616, 20344,31712, 20392,24032, 
+   20400,28128, 20408,32224, 20456,24544, 20464,28640, 20472,32736, 
+   20528,24616, 20536,28712, 20584,21032, 20592,25128, 20600,29224, 
+   20648,21544, 20656,25640, 20664,29736, 20712,22056, 20720,26152, 
+   20728,30248, 20776,22568, 20784,26664, 20792,30760, 20840,23080, 
+   20848,27176, 20856,31272, 20904,23592, 20912,27688, 20920,31784, 
+   20968,24104, 20976,28200, 20984,32296, 21040,24680, 21048,28776, 
+   21104,25192, 21112,29288, 21160,21608, 21168,25704, 21176,29800, 
+   21224,22120, 21232,26216, 21240,30312, 21288,22632, 21296,26728, 
+   21304,30824, 21352,23144, 21360,27240, 21368,31336, 21416,23656, 
+   21424,27752, 21432,31848, 21480,24168, 21488,28264, 21496,32360, 
+   21552,24744, 21560,28840, 21616,25256, 21624,29352, 21680,25768, 
+   21688,29864, 21736,22184, 21744,26280, 21752,30376, 21800,22696, 
+   21808,26792, 21816,30888, 21864,23208, 21872,27304, 21880,31400, 
+   21928,23720, 21936,27816, 21944,31912, 21992,24232, 22000,28328, 
+   22008,32424, 22064,24808, 22072,28904, 22128,25320, 22136,29416, 
+   22192,25832, 22200,29928, 22256,26344, 22264,30440, 22312,22760, 
+   22320,26856, 22328,30952, 22376,23272, 22384,27368, 22392,31464, 
+   22440,23784, 22448,27880, 22456,31976, 22504,24296, 22512,28392, 
+   22520,32488, 22576,24872, 22584,28968, 22640,25384, 22648,29480, 
+   22704,25896, 22712,29992, 22768,26408, 22776,30504, 22832,26920, 
+   22840,31016, 22888,23336, 22896,27432, 22904,31528, 22952,23848, 
+   22960,27944, 22968,32040, 23016,24360, 23024,28456, 23032,32552, 
+   23088,24936, 23096,29032, 23152,25448, 23160,29544, 23216,25960, 
+   23224,30056, 23280,26472, 23288,30568, 23344,26984, 23352,31080, 
+   23408,27496, 23416,31592, 23464,23912, 23472,28008, 23480,32104, 
+   23528,24424, 23536,28520, 23544,32616, 23600,25000, 23608,29096, 
+   23664,25512, 23672,29608, 23728,26024, 23736,30120, 23792,26536, 
+   23800,30632, 23856,27048, 23864,31144, 23920,27560, 23928,31656, 
+   23984,28072, 23992,32168, 24040,24488, 24048,28584, 24056,32680, 
+   24112,25064, 24120,29160, 24176,25576, 24184,29672, 24240,26088, 
+   24248,30184, 24304,26600, 24312,30696, 24368,27112, 24376,31208, 
+   24432,27624, 24440,31720, 24496,28136, 24504,32232, 24560,28648, 
+   24568,32744, 24632,28720, 24688,25136, 24696,29232, 24752,25648, 
+   24760,29744, 24816,26160, 24824,30256, 24880,26672, 24888,30768, 
+   24944,27184, 24952,31280, 25008,27696, 25016,31792, 25072,28208, 
+   25080,32304, 25144,28784, 25208,29296, 25264,25712, 25272,29808, 
+   25328,26224, 25336,30320, 25392,26736, 25400,30832, 25456,27248, 
+   25464,31344, 25520,27760, 25528,31856, 25584,28272, 25592,32368, 
+   25656,28848, 25720,29360, 25784,29872, 25840,26288, 25848,30384, 
+   25904,26800, 25912,30896, 25968,27312, 25976,31408, 26032,27824, 
+   26040,31920, 26096,28336, 26104,32432, 26168,28912, 26232,29424, 
+   26296,29936, 26360,30448, 26416,26864, 26424,30960, 26480,27376, 
+   26488,31472, 26544,27888, 26552,31984, 26608,28400, 26616,32496, 
+   26680,28976, 26744,29488, 26808,30000, 26872,30512, 26936,31024, 
+   26992,27440, 27000,31536, 27056,27952, 27064,32048, 27120,28464, 
+   27128,32560, 27192,29040, 27256,29552, 27320,30064, 27384,30576, 
+   27448,31088, 27512,31600, 27568,28016, 27576,32112, 27632,28528, 
+   27640,32624, 27704,29104, 27768,29616, 27832,30128, 27896,30640, 
+   27960,31152, 28024,31664, 28088,32176, 28144,28592, 28152,32688, 
+   28216,29168, 28280,29680, 28344,30192, 28408,30704, 28472,31216, 
+   28536,31728, 28600,32240, 28664,32752, 28792,29240, 28856,29752, 
+   28920,30264, 28984,30776, 29048,31288, 29112,31800, 29176,32312, 
+   29368,29816, 29432,30328, 29496,30840, 29560,31352, 29624,31864, 
+   29688,32376, 29944,30392, 30008,30904, 30072,31416, 30136,31928, 
+   30200,32440, 30520,30968, 30584,31480, 30648,31992, 30712,32504, 
+   31096,31544, 31160,32056, 31224,32568, 31672,32120, 31736,32632, 
+   32248,32696
+};
+
+
+const uint16_t armBitRevIndexTable_fixed_16[ARMBITREVINDEXTABLE_FIXED___16_TABLE_LENGTH] = 
+{
+   //radix 4, size 12
+   8,64, 16,32, 24,96, 40,80, 56,112, 88,104
+};
+
+const uint16_t armBitRevIndexTable_fixed_32[ARMBITREVINDEXTABLE_FIXED___32_TABLE_LENGTH] = 
+{
+   //4x2, size 24
+   8,128, 16,64, 24,192, 40,160, 48,96, 56,224, 72,144,
+   88,208, 104,176, 120,240, 152,200, 184,232
+};
+
+const uint16_t armBitRevIndexTable_fixed_64[ARMBITREVINDEXTABLE_FIXED___64_TABLE_LENGTH] = 
+{
+   //radix 4, size 56
+   8,256, 16,128, 24,384, 32,64, 40,320, 48,192, 56,448, 72,288, 80,160, 88,416, 104,352,
+   112,224, 120,480, 136,272, 152,400, 168,336, 176,208, 184,464, 200,304, 216,432,
+   232,368, 248,496, 280,392, 296,328, 312,456, 344,424, 376,488, 440,472
+};
+
+const uint16_t armBitRevIndexTable_fixed_128[ARMBITREVINDEXTABLE_FIXED__128_TABLE_LENGTH] = 
+{
+   //4x2, size 112
+   8,512, 16,256, 24,768, 32,128, 40,640, 48,384, 56,896, 72,576, 80,320, 88,832, 96,192,
+   104,704, 112,448, 120,960, 136,544, 144,288, 152,800, 168,672, 176,416, 184,928, 200,608,
+   208,352, 216,864, 232,736, 240,480, 248,992, 264,528, 280,784, 296,656, 304,400, 312,912,
+   328,592, 344,848, 360,720, 368,464, 376,976, 392,560, 408,816, 424,688, 440,944, 456,624,
+   472,880, 488,752, 504,1008, 536,776, 552,648, 568,904, 600,840, 616,712, 632,968,
+   664,808, 696,936, 728,872, 760,1000, 824,920, 888,984
+};
+
+const uint16_t armBitRevIndexTable_fixed_256[ARMBITREVINDEXTABLE_FIXED__256_TABLE_LENGTH] = 
+{
+   //radix 4, size 240
+   8,1024, 16,512, 24,1536, 32,256, 40,1280, 48,768, 56,1792, 64,128, 72,1152, 80,640,
+   88,1664, 96,384, 104,1408, 112,896, 120,1920, 136,1088, 144,576, 152,1600, 160,320,
+   168,1344, 176,832, 184,1856, 200,1216, 208,704, 216,1728, 224,448, 232,1472, 240,960,
+   248,1984, 264,1056, 272,544, 280,1568, 296,1312, 304,800, 312,1824, 328,1184, 336,672,
+   344,1696, 352,416, 360,1440, 368,928, 376,1952, 392,1120, 400,608, 408,1632, 424,1376,
+   432,864, 440,1888, 456,1248, 464,736, 472,1760, 488,1504, 496,992, 504,2016, 520,1040,
+   536,1552, 552,1296, 560,784, 568,1808, 584,1168, 592,656, 600,1680, 616,1424, 624,912,
+   632,1936, 648,1104, 664,1616, 680,1360, 688,848, 696,1872, 712,1232, 728,1744, 744,1488,
+   752,976, 760,2000, 776,1072, 792,1584, 808,1328, 824,1840, 840,1200, 856,1712, 872,1456,
+   880,944, 888,1968, 904,1136, 920,1648, 936,1392, 952,1904, 968,1264, 984,1776, 1000,1520,
+   1016,2032, 1048,1544, 1064,1288, 1080,1800, 1096,1160, 1112,1672, 1128,1416, 1144,1928,
+   1176,1608, 1192,1352, 1208,1864, 1240,1736, 1256,1480, 1272,1992, 1304,1576, 1336,1832,
+   1368,1704, 1384,1448, 1400,1960, 1432,1640, 1464,1896, 1496,1768, 1528,2024, 1592,1816,
+   1624,1688, 1656,1944, 1720,1880, 1784,2008, 1912,1976
+};
+
+const uint16_t armBitRevIndexTable_fixed_512[ARMBITREVINDEXTABLE_FIXED__512_TABLE_LENGTH] = 
+{
+   //4x2, size 480
+   8,2048, 16,1024, 24,3072, 32,512, 40,2560, 48,1536, 56,3584, 64,256, 72,2304, 80,1280,
+   88,3328, 96,768, 104,2816, 112,1792, 120,3840, 136,2176, 144,1152, 152,3200, 160,640,
+   168,2688, 176,1664, 184,3712, 192,384, 200,2432, 208,1408, 216,3456, 224,896, 232,2944,
+   240,1920, 248,3968, 264,2112, 272,1088, 280,3136, 288,576, 296,2624, 304,1600, 312,3648,
+   328,2368, 336,1344, 344,3392, 352,832, 360,2880, 368,1856, 376,3904, 392,2240, 400,1216,
+   408,3264, 416,704, 424,2752, 432,1728, 440,3776, 456,2496, 464,1472, 472,3520, 480,960,
+   488,3008, 496,1984, 504,4032, 520,2080, 528,1056, 536,3104, 552,2592, 560,1568, 568,3616,
+   584,2336, 592,1312, 600,3360, 608,800, 616,2848, 624,1824, 632,3872, 648,2208, 656,1184,
+   664,3232, 680,2720, 688,1696, 696,3744, 712,2464, 720,1440, 728,3488, 736,928, 744,2976,
+   752,1952, 760,4000, 776,2144, 784,1120, 792,3168, 808,2656, 816,1632, 824,3680, 840,2400,
+   848,1376, 856,3424, 872,2912, 880,1888, 888,3936, 904,2272, 912,1248, 920,3296, 936,2784,
+   944,1760, 952,3808, 968,2528, 976,1504, 984,3552, 1000,3040, 1008,2016, 1016,4064,
+   1032,2064, 1048,3088, 1064,2576, 1072,1552, 1080,3600, 1096,2320, 1104,1296, 1112,3344,
+   1128,2832, 1136,1808, 1144,3856, 1160,2192, 1176,3216, 1192,2704, 1200,1680, 1208,3728,
+   1224,2448, 1232,1424, 1240,3472, 1256,2960, 1264,1936, 1272,3984, 1288,2128, 1304,3152,
+   1320,2640, 1328,1616, 1336,3664, 1352,2384, 1368,3408, 1384,2896, 1392,1872, 1400,3920,
+   1416,2256, 1432,3280, 1448,2768, 1456,1744, 1464,3792, 1480,2512, 1496,3536, 1512,3024,
+   1520,2000, 1528,4048, 1544,2096, 1560,3120, 1576,2608, 1592,3632, 1608,2352, 1624,3376,
+   1640,2864, 1648,1840, 1656,3888, 1672,2224, 1688,3248, 1704,2736, 1720,3760, 1736,2480,
+   1752,3504, 1768,2992, 1776,1968, 1784,4016, 1800,2160, 1816,3184, 1832,2672, 1848,3696,
+   1864,2416, 1880,3440, 1896,2928, 1912,3952, 1928,2288, 1944,3312, 1960,2800, 1976,3824,
+   1992,2544, 2008,3568, 2024,3056, 2040,4080, 2072,3080, 2088,2568, 2104,3592, 2120,2312,
+   2136,3336, 2152,2824, 2168,3848, 2200,3208, 2216,2696, 2232,3720, 2248,2440, 2264,3464,
+   2280,2952, 2296,3976, 2328,3144, 2344,2632, 2360,3656, 2392,3400, 2408,2888, 2424,3912,
+   2456,3272, 2472,2760, 2488,3784, 2520,3528, 2536,3016, 2552,4040, 2584,3112, 2616,3624,
+   2648,3368, 2664,2856, 2680,3880, 2712,3240, 2744,3752, 2776,3496, 2792,2984, 2808,4008,
+   2840,3176, 2872,3688, 2904,3432, 2936,3944, 2968,3304, 3000,3816, 3032,3560, 3064,4072,
+   3128,3608, 3160,3352, 3192,3864, 3256,3736, 3288,3480, 3320,3992, 3384,3672, 3448,3928,
+   3512,3800, 3576,4056, 3704,3896, 3832,4024
+};
+
+const uint16_t armBitRevIndexTable_fixed_1024[ARMBITREVINDEXTABLE_FIXED_1024_TABLE_LENGTH] = 
+{
+    //radix 4, size 992
+    8,4096, 16,2048, 24,6144, 32,1024, 40,5120, 48,3072, 56,7168, 64,512, 72,4608, 
+    80,2560, 88,6656, 96,1536, 104,5632, 112,3584, 120,7680, 128,256, 136,4352, 
+    144,2304, 152,6400, 160,1280, 168,5376, 176,3328, 184,7424, 192,768, 200,4864, 
+    208,2816, 216,6912, 224,1792, 232,5888, 240,3840, 248,7936, 264,4224, 272,2176, 
+    280,6272, 288,1152, 296,5248, 304,3200, 312,7296, 320,640, 328,4736, 336,2688, 
+    344,6784, 352,1664, 360,5760, 368,3712, 376,7808, 392,4480, 400,2432, 408,6528, 
+    416,1408, 424,5504, 432,3456, 440,7552, 448,896, 456,4992, 464,2944, 472,7040, 
+    480,1920, 488,6016, 496,3968, 504,8064, 520,4160, 528,2112, 536,6208, 544,1088, 
+    552,5184, 560,3136, 568,7232, 584,4672, 592,2624, 600,6720, 608,1600, 616,5696, 
+    624,3648, 632,7744, 648,4416, 656,2368, 664,6464, 672,1344, 680,5440, 688,3392, 
+    696,7488, 704,832, 712,4928, 720,2880, 728,6976, 736,1856, 744,5952, 752,3904, 
+    760,8000, 776,4288, 784,2240, 792,6336, 800,1216, 808,5312, 816,3264, 824,7360, 
+    840,4800, 848,2752, 856,6848, 864,1728, 872,5824, 880,3776, 888,7872, 904,4544, 
+    912,2496, 920,6592, 928,1472, 936,5568, 944,3520, 952,7616, 968,5056, 976,3008, 
+    984,7104, 992,1984, 1000,6080, 1008,4032, 1016,8128, 1032,4128, 1040,2080, 
+    1048,6176, 1064,5152, 1072,3104, 1080,7200, 1096,4640, 1104,2592, 1112,6688, 
+    1120,1568, 1128,5664, 1136,3616, 1144,7712, 1160,4384, 1168,2336, 1176,6432, 
+    1184,1312, 1192,5408, 1200,3360, 1208,7456, 1224,4896, 1232,2848, 1240,6944, 
+    1248,1824, 1256,5920, 1264,3872, 1272,7968, 1288,4256, 1296,2208, 1304,6304, 
+    1320,5280, 1328,3232, 1336,7328, 1352,4768, 1360,2720, 1368,6816, 1376,1696, 
+    1384,5792, 1392,3744, 1400,7840, 1416,4512, 1424,2464, 1432,6560, 1448,5536, 
+    1456,3488, 1464,7584, 1480,5024, 1488,2976, 1496,7072, 1504,1952, 1512,6048, 
+    1520,4000, 1528,8096, 1544,4192, 1552,2144, 1560,6240, 1576,5216, 1584,3168, 
+    1592,7264, 1608,4704, 1616,2656, 1624,6752, 1640,5728, 1648,3680, 1656,7776, 
+    1672,4448, 1680,2400, 1688,6496, 1704,5472, 1712,3424, 1720,7520, 1736,4960, 
+    1744,2912, 1752,7008, 1760,1888, 1768,5984, 1776,3936, 1784,8032, 1800,4320, 
+    1808,2272, 1816,6368, 1832,5344, 1840,3296, 1848,7392, 1864,4832, 1872,2784, 
+    1880,6880, 1896,5856, 1904,3808, 1912,7904, 1928,4576, 1936,2528, 1944,6624, 
+    1960,5600, 1968,3552, 1976,7648, 1992,5088, 2000,3040, 2008,7136, 2024,6112, 
+    2032,4064, 2040,8160, 2056,4112, 2072,6160, 2088,5136, 2096,3088, 2104,7184, 
+    2120,4624, 2128,2576, 2136,6672, 2152,5648, 2160,3600, 2168,7696, 2184,4368, 
+    2192,2320, 2200,6416, 2216,5392, 2224,3344, 2232,7440, 2248,4880, 2256,2832, 
+    2264,6928, 2280,5904, 2288,3856, 2296,7952, 2312,4240, 2328,6288, 2344,5264, 
+    2352,3216, 2360,7312, 2376,4752, 2384,2704, 2392,6800, 2408,5776, 2416,3728, 
+    2424,7824, 2440,4496, 2456,6544, 2472,5520, 2480,3472, 2488,7568, 2504,5008, 
+    2512,2960, 2520,7056, 2536,6032, 2544,3984, 2552,8080, 2568,4176, 2584,6224, 
+    2600,5200, 2608,3152, 2616,7248, 2632,4688, 2648,6736, 2664,5712, 2672,3664, 
+    2680,7760, 2696,4432, 2712,6480, 2728,5456, 2736,3408, 2744,7504, 2760,4944, 
+    2768,2896, 2776,6992, 2792,5968, 2800,3920, 2808,8016, 2824,4304, 2840,6352, 
+    2856,5328, 2864,3280, 2872,7376, 2888,4816, 2904,6864, 2920,5840, 2928,3792, 
+    2936,7888, 2952,4560, 2968,6608, 2984,5584, 2992,3536, 3000,7632, 3016,5072, 
+    3032,7120, 3048,6096, 3056,4048, 3064,8144, 3080,4144, 3096,6192, 3112,5168, 
+    3128,7216, 3144,4656, 3160,6704, 3176,5680, 3184,3632, 3192,7728, 3208,4400, 
+    3224,6448, 3240,5424, 3248,3376, 3256,7472, 3272,4912, 3288,6960, 3304,5936, 
+    3312,3888, 3320,7984, 3336,4272, 3352,6320, 3368,5296, 3384,7344, 3400,4784, 
+    3416,6832, 3432,5808, 3440,3760, 3448,7856, 3464,4528, 3480,6576, 3496,5552, 
+    3512,7600, 3528,5040, 3544,7088, 3560,6064, 3568,4016, 3576,8112, 3592,4208, 
+    3608,6256, 3624,5232, 3640,7280, 3656,4720, 3672,6768, 3688,5744, 3704,7792, 
+    3720,4464, 3736,6512, 3752,5488, 3768,7536, 3784,4976, 3800,7024, 3816,6000, 
+    3824,3952, 3832,8048, 3848,4336, 3864,6384, 3880,5360, 3896,7408, 3912,4848, 
+    3928,6896, 3944,5872, 3960,7920, 3976,4592, 3992,6640, 4008,5616, 4024,7664, 
+    4040,5104, 4056,7152, 4072,6128, 4088,8176, 4120,6152, 4136,5128, 4152,7176, 
+    4168,4616, 4184,6664, 4200,5640, 4216,7688, 4232,4360, 4248,6408, 4264,5384, 
+    4280,7432, 4296,4872, 4312,6920, 4328,5896, 4344,7944, 4376,6280, 4392,5256, 
+    4408,7304, 4424,4744, 4440,6792, 4456,5768, 4472,7816, 4504,6536, 4520,5512, 
+    4536,7560, 4552,5000, 4568,7048, 4584,6024, 4600,8072, 4632,6216, 4648,5192, 
+    4664,7240, 4696,6728, 4712,5704, 4728,7752, 4760,6472, 4776,5448, 4792,7496, 
+    4808,4936, 4824,6984, 4840,5960, 4856,8008, 4888,6344, 4904,5320, 4920,7368, 
+    4952,6856, 4968,5832, 4984,7880, 5016,6600, 5032,5576, 5048,7624, 5080,7112, 
+    5096,6088, 5112,8136, 5144,6184, 5176,7208, 5208,6696, 5224,5672, 5240,7720, 
+    5272,6440, 5288,5416, 5304,7464, 5336,6952, 5352,5928, 5368,7976, 5400,6312, 
+    5432,7336, 5464,6824, 5480,5800, 5496,7848, 5528,6568, 5560,7592, 5592,7080, 
+    5608,6056, 5624,8104, 5656,6248, 5688,7272, 5720,6760, 5752,7784, 5784,6504, 
+    5816,7528, 5848,7016, 5864,5992, 5880,8040, 5912,6376, 5944,7400, 5976,6888, 
+    6008,7912, 6040,6632, 6072,7656, 6104,7144, 6136,8168, 6200,7192, 6232,6680, 
+    6264,7704, 6296,6424, 6328,7448, 6360,6936, 6392,7960, 6456,7320, 6488,6808, 
+    6520,7832, 6584,7576, 6616,7064, 6648,8088, 6712,7256, 6776,7768, 6840,7512, 
+    6872,7000, 6904,8024, 6968,7384, 7032,7896, 7096,7640, 7160,8152, 7288,7736, 
+    7352,7480, 7416,7992, 7544,7864, 7672,8120, 7928,8056 
+};
+
+const uint16_t armBitRevIndexTable_fixed_2048[ARMBITREVINDEXTABLE_FIXED_2048_TABLE_LENGTH] = 
+{
+    //4x2, size 1984
+    8,8192, 16,4096, 24,12288, 32,2048, 40,10240, 48,6144, 56,14336, 64,1024, 
+    72,9216, 80,5120, 88,13312, 96,3072, 104,11264, 112,7168, 120,15360, 128,512, 
+    136,8704, 144,4608, 152,12800, 160,2560, 168,10752, 176,6656, 184,14848, 
+    192,1536, 200,9728, 208,5632, 216,13824, 224,3584, 232,11776, 240,7680, 
+    248,15872, 264,8448, 272,4352, 280,12544, 288,2304, 296,10496, 304,6400, 
+    312,14592, 320,1280, 328,9472, 336,5376, 344,13568, 352,3328, 360,11520, 
+    368,7424, 376,15616, 384,768, 392,8960, 400,4864, 408,13056, 416,2816, 
+    424,11008, 432,6912, 440,15104, 448,1792, 456,9984, 464,5888, 472,14080, 
+    480,3840, 488,12032, 496,7936, 504,16128, 520,8320, 528,4224, 536,12416, 
+    544,2176, 552,10368, 560,6272, 568,14464, 576,1152, 584,9344, 592,5248, 
+    600,13440, 608,3200, 616,11392, 624,7296, 632,15488, 648,8832, 656,4736, 
+    664,12928, 672,2688, 680,10880, 688,6784, 696,14976, 704,1664, 712,9856, 
+    720,5760, 728,13952, 736,3712, 744,11904, 752,7808, 760,16000, 776,8576, 
+    784,4480, 792,12672, 800,2432, 808,10624, 816,6528, 824,14720, 832,1408, 
+    840,9600, 848,5504, 856,13696, 864,3456, 872,11648, 880,7552, 888,15744, 
+    904,9088, 912,4992, 920,13184, 928,2944, 936,11136, 944,7040, 952,15232, 
+    960,1920, 968,10112, 976,6016, 984,14208, 992,3968, 1000,12160, 1008,8064, 
+    1016,16256, 1032,8256, 1040,4160, 1048,12352, 1056,2112, 1064,10304, 1072,6208, 
+    1080,14400, 1096,9280, 1104,5184, 1112,13376, 1120,3136, 1128,11328, 1136,7232, 
+    1144,15424, 1160,8768, 1168,4672, 1176,12864, 1184,2624, 1192,10816, 1200,6720, 
+    1208,14912, 1216,1600, 1224,9792, 1232,5696, 1240,13888, 1248,3648, 1256,11840, 
+    1264,7744, 1272,15936, 1288,8512, 1296,4416, 1304,12608, 1312,2368, 1320,10560, 
+    1328,6464, 1336,14656, 1352,9536, 1360,5440, 1368,13632, 1376,3392, 1384,11584, 
+    1392,7488, 1400,15680, 1416,9024, 1424,4928, 1432,13120, 1440,2880, 1448,11072, 
+    1456,6976, 1464,15168, 1472,1856, 1480,10048, 1488,5952, 1496,14144, 1504,3904, 
+    1512,12096, 1520,8000, 1528,16192, 1544,8384, 1552,4288, 1560,12480, 1568,2240, 
+    1576,10432, 1584,6336, 1592,14528, 1608,9408, 1616,5312, 1624,13504, 1632,3264, 
+    1640,11456, 1648,7360, 1656,15552, 1672,8896, 1680,4800, 1688,12992, 1696,2752, 
+    1704,10944, 1712,6848, 1720,15040, 1736,9920, 1744,5824, 1752,14016, 1760,3776, 
+    1768,11968, 1776,7872, 1784,16064, 1800,8640, 1808,4544, 1816,12736, 1824,2496, 
+    1832,10688, 1840,6592, 1848,14784, 1864,9664, 1872,5568, 1880,13760, 1888,3520, 
+    1896,11712, 1904,7616, 1912,15808, 1928,9152, 1936,5056, 1944,13248, 1952,3008, 
+    1960,11200, 1968,7104, 1976,15296, 1992,10176, 2000,6080, 2008,14272, 2016,4032, 
+    2024,12224, 2032,8128, 2040,16320, 2056,8224, 2064,4128, 2072,12320, 2088,10272, 
+    2096,6176, 2104,14368, 2120,9248, 2128,5152, 2136,13344, 2144,3104, 2152,11296, 
+    2160,7200, 2168,15392, 2184,8736, 2192,4640, 2200,12832, 2208,2592, 2216,10784, 
+    2224,6688, 2232,14880, 2248,9760, 2256,5664, 2264,13856, 2272,3616, 2280,11808, 
+    2288,7712, 2296,15904, 2312,8480, 2320,4384, 2328,12576, 2344,10528, 2352,6432, 
+    2360,14624, 2376,9504, 2384,5408, 2392,13600, 2400,3360, 2408,11552, 2416,7456, 
+    2424,15648, 2440,8992, 2448,4896, 2456,13088, 2464,2848, 2472,11040, 2480,6944, 
+    2488,15136, 2504,10016, 2512,5920, 2520,14112, 2528,3872, 2536,12064, 2544,7968, 
+    2552,16160, 2568,8352, 2576,4256, 2584,12448, 2600,10400, 2608,6304, 2616,14496, 
+    2632,9376, 2640,5280, 2648,13472, 2656,3232, 2664,11424, 2672,7328, 2680,15520, 
+    2696,8864, 2704,4768, 2712,12960, 2728,10912, 2736,6816, 2744,15008, 2760,9888, 
+    2768,5792, 2776,13984, 2784,3744, 2792,11936, 2800,7840, 2808,16032, 2824,8608, 
+    2832,4512, 2840,12704, 2856,10656, 2864,6560, 2872,14752, 2888,9632, 2896,5536, 
+    2904,13728, 2912,3488, 2920,11680, 2928,7584, 2936,15776, 2952,9120, 2960,5024, 
+    2968,13216, 2984,11168, 2992,7072, 3000,15264, 3016,10144, 3024,6048, 
+    3032,14240, 3040,4000, 3048,12192, 3056,8096, 3064,16288, 3080,8288, 3088,4192, 
+    3096,12384, 3112,10336, 3120,6240, 3128,14432, 3144,9312, 3152,5216, 3160,13408, 
+    3176,11360, 3184,7264, 3192,15456, 3208,8800, 3216,4704, 3224,12896, 3240,10848, 
+    3248,6752, 3256,14944, 3272,9824, 3280,5728, 3288,13920, 3296,3680, 3304,11872, 
+    3312,7776, 3320,15968, 3336,8544, 3344,4448, 3352,12640, 3368,10592, 3376,6496, 
+    3384,14688, 3400,9568, 3408,5472, 3416,13664, 3432,11616, 3440,7520, 3448,15712, 
+    3464,9056, 3472,4960, 3480,13152, 3496,11104, 3504,7008, 3512,15200, 3528,10080, 
+    3536,5984, 3544,14176, 3552,3936, 3560,12128, 3568,8032, 3576,16224, 3592,8416, 
+    3600,4320, 3608,12512, 3624,10464, 3632,6368, 3640,14560, 3656,9440, 3664,5344, 
+    3672,13536, 3688,11488, 3696,7392, 3704,15584, 3720,8928, 3728,4832, 3736,13024, 
+    3752,10976, 3760,6880, 3768,15072, 3784,9952, 3792,5856, 3800,14048, 3816,12000, 
+    3824,7904, 3832,16096, 3848,8672, 3856,4576, 3864,12768, 3880,10720, 3888,6624, 
+    3896,14816, 3912,9696, 3920,5600, 3928,13792, 3944,11744, 3952,7648, 3960,15840, 
+    3976,9184, 3984,5088, 3992,13280, 4008,11232, 4016,7136, 4024,15328, 4040,10208, 
+    4048,6112, 4056,14304, 4072,12256, 4080,8160, 4088,16352, 4104,8208, 4120,12304, 
+    4136,10256, 4144,6160, 4152,14352, 4168,9232, 4176,5136, 4184,13328, 4200,11280, 
+    4208,7184, 4216,15376, 4232,8720, 4240,4624, 4248,12816, 4264,10768, 4272,6672, 
+    4280,14864, 4296,9744, 4304,5648, 4312,13840, 4328,11792, 4336,7696, 4344,15888, 
+    4360,8464, 4376,12560, 4392,10512, 4400,6416, 4408,14608, 4424,9488, 4432,5392, 
+    4440,13584, 4456,11536, 4464,7440, 4472,15632, 4488,8976, 4496,4880, 4504,13072, 
+    4520,11024, 4528,6928, 4536,15120, 4552,10000, 4560,5904, 4568,14096, 
+    4584,12048, 4592,7952, 4600,16144, 4616,8336, 4632,12432, 4648,10384, 4656,6288, 
+    4664,14480, 4680,9360, 4688,5264, 4696,13456, 4712,11408, 4720,7312, 4728,15504, 
+    4744,8848, 4760,12944, 4776,10896, 4784,6800, 4792,14992, 4808,9872, 4816,5776, 
+    4824,13968, 4840,11920, 4848,7824, 4856,16016, 4872,8592, 4888,12688, 
+    4904,10640, 4912,6544, 4920,14736, 4936,9616, 4944,5520, 4952,13712, 4968,11664, 
+    4976,7568, 4984,15760, 5000,9104, 5016,13200, 5032,11152, 5040,7056, 5048,15248, 
+    5064,10128, 5072,6032, 5080,14224, 5096,12176, 5104,8080, 5112,16272, 5128,8272, 
+    5144,12368, 5160,10320, 5168,6224, 5176,14416, 5192,9296, 5208,13392, 
+    5224,11344, 5232,7248, 5240,15440, 5256,8784, 5272,12880, 5288,10832, 5296,6736, 
+    5304,14928, 5320,9808, 5328,5712, 5336,13904, 5352,11856, 5360,7760, 5368,15952, 
+    5384,8528, 5400,12624, 5416,10576, 5424,6480, 5432,14672, 5448,9552, 5464,13648, 
+    5480,11600, 5488,7504, 5496,15696, 5512,9040, 5528,13136, 5544,11088, 5552,6992, 
+    5560,15184, 5576,10064, 5584,5968, 5592,14160, 5608,12112, 5616,8016, 
+    5624,16208, 5640,8400, 5656,12496, 5672,10448, 5680,6352, 5688,14544, 5704,9424, 
+    5720,13520, 5736,11472, 5744,7376, 5752,15568, 5768,8912, 5784,13008, 
+    5800,10960, 5808,6864, 5816,15056, 5832,9936, 5848,14032, 5864,11984, 5872,7888, 
+    5880,16080, 5896,8656, 5912,12752, 5928,10704, 5936,6608, 5944,14800, 5960,9680, 
+    5976,13776, 5992,11728, 6000,7632, 6008,15824, 6024,9168, 6040,13264, 
+    6056,11216, 6064,7120, 6072,15312, 6088,10192, 6104,14288, 6120,12240, 
+    6128,8144, 6136,16336, 6152,8240, 6168,12336, 6184,10288, 6200,14384, 6216,9264, 
+    6232,13360, 6248,11312, 6256,7216, 6264,15408, 6280,8752, 6296,12848, 
+    6312,10800, 6320,6704, 6328,14896, 6344,9776, 6360,13872, 6376,11824, 6384,7728, 
+    6392,15920, 6408,8496, 6424,12592, 6440,10544, 6456,14640, 6472,9520, 
+    6488,13616, 6504,11568, 6512,7472, 6520,15664, 6536,9008, 6552,13104, 
+    6568,11056, 6576,6960, 6584,15152, 6600,10032, 6616,14128, 6632,12080, 
+    6640,7984, 6648,16176, 6664,8368, 6680,12464, 6696,10416, 6712,14512, 6728,9392, 
+    6744,13488, 6760,11440, 6768,7344, 6776,15536, 6792,8880, 6808,12976, 
+    6824,10928, 6840,15024, 6856,9904, 6872,14000, 6888,11952, 6896,7856, 
+    6904,16048, 6920,8624, 6936,12720, 6952,10672, 6968,14768, 6984,9648, 
+    7000,13744, 7016,11696, 7024,7600, 7032,15792, 7048,9136, 7064,13232, 
+    7080,11184, 7096,15280, 7112,10160, 7128,14256, 7144,12208, 7152,8112, 
+    7160,16304, 7176,8304, 7192,12400, 7208,10352, 7224,14448, 7240,9328, 
+    7256,13424, 7272,11376, 7288,15472, 7304,8816, 7320,12912, 7336,10864, 
+    7352,14960, 7368,9840, 7384,13936, 7400,11888, 7408,7792, 7416,15984, 7432,8560, 
+    7448,12656, 7464,10608, 7480,14704, 7496,9584, 7512,13680, 7528,11632, 
+    7544,15728, 7560,9072, 7576,13168, 7592,11120, 7608,15216, 7624,10096, 
+    7640,14192, 7656,12144, 7664,8048, 7672,16240, 7688,8432, 7704,12528, 
+    7720,10480, 7736,14576, 7752,9456, 7768,13552, 7784,11504, 7800,15600, 
+    7816,8944, 7832,13040, 7848,10992, 7864,15088, 7880,9968, 7896,14064, 
+    7912,12016, 7928,16112, 7944,8688, 7960,12784, 7976,10736, 7992,14832, 
+    8008,9712, 8024,13808, 8040,11760, 8056,15856, 8072,9200, 8088,13296, 
+    8104,11248, 8120,15344, 8136,10224, 8152,14320, 8168,12272, 8184,16368, 
+    8216,12296, 8232,10248, 8248,14344, 8264,9224, 8280,13320, 8296,11272, 
+    8312,15368, 8328,8712, 8344,12808, 8360,10760, 8376,14856, 8392,9736, 
+    8408,13832, 8424,11784, 8440,15880, 8472,12552, 8488,10504, 8504,14600, 
+    8520,9480, 8536,13576, 8552,11528, 8568,15624, 8584,8968, 8600,13064, 
+    8616,11016, 8632,15112, 8648,9992, 8664,14088, 8680,12040, 8696,16136, 
+    8728,12424, 8744,10376, 8760,14472, 8776,9352, 8792,13448, 8808,11400, 
+    8824,15496, 8856,12936, 8872,10888, 8888,14984, 8904,9864, 8920,13960, 
+    8936,11912, 8952,16008, 8984,12680, 9000,10632, 9016,14728, 9032,9608, 
+    9048,13704, 9064,11656, 9080,15752, 9112,13192, 9128,11144, 9144,15240, 
+    9160,10120, 9176,14216, 9192,12168, 9208,16264, 9240,12360, 9256,10312, 
+    9272,14408, 9304,13384, 9320,11336, 9336,15432, 9368,12872, 9384,10824, 
+    9400,14920, 9416,9800, 9432,13896, 9448,11848, 9464,15944, 9496,12616, 
+    9512,10568, 9528,14664, 9560,13640, 9576,11592, 9592,15688, 9624,13128, 
+    9640,11080, 9656,15176, 9672,10056, 9688,14152, 9704,12104, 9720,16200, 
+    9752,12488, 9768,10440, 9784,14536, 9816,13512, 9832,11464, 9848,15560, 
+    9880,13000, 9896,10952, 9912,15048, 9944,14024, 9960,11976, 9976,16072, 
+    10008,12744, 10024,10696, 10040,14792, 10072,13768, 10088,11720, 10104,15816, 
+    10136,13256, 10152,11208, 10168,15304, 10200,14280, 10216,12232, 10232,16328, 
+    10264,12328, 10296,14376, 10328,13352, 10344,11304, 10360,15400, 10392,12840, 
+    10408,10792, 10424,14888, 10456,13864, 10472,11816, 10488,15912, 10520,12584, 
+    10552,14632, 10584,13608, 10600,11560, 10616,15656, 10648,13096, 10664,11048, 
+    10680,15144, 10712,14120, 10728,12072, 10744,16168, 10776,12456, 10808,14504, 
+    10840,13480, 10856,11432, 10872,15528, 10904,12968, 10936,15016, 10968,13992, 
+    10984,11944, 11000,16040, 11032,12712, 11064,14760, 11096,13736, 11112,11688, 
+    11128,15784, 11160,13224, 11192,15272, 11224,14248, 11240,12200, 11256,16296, 
+    11288,12392, 11320,14440, 11352,13416, 11384,15464, 11416,12904, 11448,14952, 
+    11480,13928, 11496,11880, 11512,15976, 11544,12648, 11576,14696, 11608,13672, 
+    11640,15720, 11672,13160, 11704,15208, 11736,14184, 11752,12136, 11768,16232, 
+    11800,12520, 11832,14568, 11864,13544, 11896,15592, 11928,13032, 11960,15080, 
+    11992,14056, 12024,16104, 12056,12776, 12088,14824, 12120,13800, 12152,15848, 
+    12184,13288, 12216,15336, 12248,14312, 12280,16360, 12344,14360, 12376,13336, 
+    12408,15384, 12440,12824, 12472,14872, 12504,13848, 12536,15896, 12600,14616, 
+    12632,13592, 12664,15640, 12696,13080, 12728,15128, 12760,14104, 12792,16152, 
+    12856,14488, 12888,13464, 12920,15512, 12984,15000, 13016,13976, 13048,16024, 
+    13112,14744, 13144,13720, 13176,15768, 13240,15256, 13272,14232, 13304,16280, 
+    13368,14424, 13432,15448, 13496,14936, 13528,13912, 13560,15960, 13624,14680, 
+    13688,15704, 13752,15192, 13784,14168, 13816,16216, 13880,14552, 13944,15576, 
+    14008,15064, 14072,16088, 14136,14808, 14200,15832, 14264,15320, 14328,16344, 
+    14456,15416, 14520,14904, 14584,15928, 14712,15672, 14776,15160, 14840,16184, 
+    14968,15544, 15096,16056, 15224,15800, 15352,16312, 15608,15992, 15864,16248 
+};
+
+const uint16_t armBitRevIndexTable_fixed_4096[ARMBITREVINDEXTABLE_FIXED_4096_TABLE_LENGTH] = 
+{
+    //radix 4, size 4032
+    8,16384, 16,8192, 24,24576, 32,4096, 40,20480, 48,12288, 56,28672, 64,2048, 
+    72,18432, 80,10240, 88,26624, 96,6144, 104,22528, 112,14336, 120,30720, 
+    128,1024, 136,17408, 144,9216, 152,25600, 160,5120, 168,21504, 176,13312, 
+    184,29696, 192,3072, 200,19456, 208,11264, 216,27648, 224,7168, 232,23552, 
+    240,15360, 248,31744, 256,512, 264,16896, 272,8704, 280,25088, 288,4608, 
+    296,20992, 304,12800, 312,29184, 320,2560, 328,18944, 336,10752, 344,27136, 
+    352,6656, 360,23040, 368,14848, 376,31232, 384,1536, 392,17920, 400,9728, 
+    408,26112, 416,5632, 424,22016, 432,13824, 440,30208, 448,3584, 456,19968, 
+    464,11776, 472,28160, 480,7680, 488,24064, 496,15872, 504,32256, 520,16640, 
+    528,8448, 536,24832, 544,4352, 552,20736, 560,12544, 568,28928, 576,2304, 
+    584,18688, 592,10496, 600,26880, 608,6400, 616,22784, 624,14592, 632,30976, 
+    640,1280, 648,17664, 656,9472, 664,25856, 672,5376, 680,21760, 688,13568, 
+    696,29952, 704,3328, 712,19712, 720,11520, 728,27904, 736,7424, 744,23808, 
+    752,15616, 760,32000, 776,17152, 784,8960, 792,25344, 800,4864, 808,21248, 
+    816,13056, 824,29440, 832,2816, 840,19200, 848,11008, 856,27392, 864,6912, 
+    872,23296, 880,15104, 888,31488, 896,1792, 904,18176, 912,9984, 920,26368, 
+    928,5888, 936,22272, 944,14080, 952,30464, 960,3840, 968,20224, 976,12032, 
+    984,28416, 992,7936, 1000,24320, 1008,16128, 1016,32512, 1032,16512, 1040,8320, 
+    1048,24704, 1056,4224, 1064,20608, 1072,12416, 1080,28800, 1088,2176, 
+    1096,18560, 1104,10368, 1112,26752, 1120,6272, 1128,22656, 1136,14464, 
+    1144,30848, 1160,17536, 1168,9344, 1176,25728, 1184,5248, 1192,21632, 
+    1200,13440, 1208,29824, 1216,3200, 1224,19584, 1232,11392, 1240,27776, 
+    1248,7296, 1256,23680, 1264,15488, 1272,31872, 1288,17024, 1296,8832, 
+    1304,25216, 1312,4736, 1320,21120, 1328,12928, 1336,29312, 1344,2688, 
+    1352,19072, 1360,10880, 1368,27264, 1376,6784, 1384,23168, 1392,14976, 
+    1400,31360, 1408,1664, 1416,18048, 1424,9856, 1432,26240, 1440,5760, 1448,22144, 
+    1456,13952, 1464,30336, 1472,3712, 1480,20096, 1488,11904, 1496,28288, 
+    1504,7808, 1512,24192, 1520,16000, 1528,32384, 1544,16768, 1552,8576, 
+    1560,24960, 1568,4480, 1576,20864, 1584,12672, 1592,29056, 1600,2432, 
+    1608,18816, 1616,10624, 1624,27008, 1632,6528, 1640,22912, 1648,14720, 
+    1656,31104, 1672,17792, 1680,9600, 1688,25984, 1696,5504, 1704,21888, 
+    1712,13696, 1720,30080, 1728,3456, 1736,19840, 1744,11648, 1752,28032, 
+    1760,7552, 1768,23936, 1776,15744, 1784,32128, 1800,17280, 1808,9088, 
+    1816,25472, 1824,4992, 1832,21376, 1840,13184, 1848,29568, 1856,2944, 
+    1864,19328, 1872,11136, 1880,27520, 1888,7040, 1896,23424, 1904,15232, 
+    1912,31616, 1928,18304, 1936,10112, 1944,26496, 1952,6016, 1960,22400, 
+    1968,14208, 1976,30592, 1984,3968, 1992,20352, 2000,12160, 2008,28544, 
+    2016,8064, 2024,24448, 2032,16256, 2040,32640, 2056,16448, 2064,8256, 
+    2072,24640, 2080,4160, 2088,20544, 2096,12352, 2104,28736, 2120,18496, 
+    2128,10304, 2136,26688, 2144,6208, 2152,22592, 2160,14400, 2168,30784, 
+    2184,17472, 2192,9280, 2200,25664, 2208,5184, 2216,21568, 2224,13376, 
+    2232,29760, 2240,3136, 2248,19520, 2256,11328, 2264,27712, 2272,7232, 
+    2280,23616, 2288,15424, 2296,31808, 2312,16960, 2320,8768, 2328,25152, 
+    2336,4672, 2344,21056, 2352,12864, 2360,29248, 2368,2624, 2376,19008, 
+    2384,10816, 2392,27200, 2400,6720, 2408,23104, 2416,14912, 2424,31296, 
+    2440,17984, 2448,9792, 2456,26176, 2464,5696, 2472,22080, 2480,13888, 
+    2488,30272, 2496,3648, 2504,20032, 2512,11840, 2520,28224, 2528,7744, 
+    2536,24128, 2544,15936, 2552,32320, 2568,16704, 2576,8512, 2584,24896, 
+    2592,4416, 2600,20800, 2608,12608, 2616,28992, 2632,18752, 2640,10560, 
+    2648,26944, 2656,6464, 2664,22848, 2672,14656, 2680,31040, 2696,17728, 
+    2704,9536, 2712,25920, 2720,5440, 2728,21824, 2736,13632, 2744,30016, 2752,3392, 
+    2760,19776, 2768,11584, 2776,27968, 2784,7488, 2792,23872, 2800,15680, 
+    2808,32064, 2824,17216, 2832,9024, 2840,25408, 2848,4928, 2856,21312, 
+    2864,13120, 2872,29504, 2888,19264, 2896,11072, 2904,27456, 2912,6976, 
+    2920,23360, 2928,15168, 2936,31552, 2952,18240, 2960,10048, 2968,26432, 
+    2976,5952, 2984,22336, 2992,14144, 3000,30528, 3008,3904, 3016,20288, 
+    3024,12096, 3032,28480, 3040,8000, 3048,24384, 3056,16192, 3064,32576, 
+    3080,16576, 3088,8384, 3096,24768, 3104,4288, 3112,20672, 3120,12480, 
+    3128,28864, 3144,18624, 3152,10432, 3160,26816, 3168,6336, 3176,22720, 
+    3184,14528, 3192,30912, 3208,17600, 3216,9408, 3224,25792, 3232,5312, 
+    3240,21696, 3248,13504, 3256,29888, 3272,19648, 3280,11456, 3288,27840, 
+    3296,7360, 3304,23744, 3312,15552, 3320,31936, 3336,17088, 3344,8896, 
+    3352,25280, 3360,4800, 3368,21184, 3376,12992, 3384,29376, 3400,19136, 
+    3408,10944, 3416,27328, 3424,6848, 3432,23232, 3440,15040, 3448,31424, 
+    3464,18112, 3472,9920, 3480,26304, 3488,5824, 3496,22208, 3504,14016, 
+    3512,30400, 3520,3776, 3528,20160, 3536,11968, 3544,28352, 3552,7872, 
+    3560,24256, 3568,16064, 3576,32448, 3592,16832, 3600,8640, 3608,25024, 
+    3616,4544, 3624,20928, 3632,12736, 3640,29120, 3656,18880, 3664,10688, 
+    3672,27072, 3680,6592, 3688,22976, 3696,14784, 3704,31168, 3720,17856, 
+    3728,9664, 3736,26048, 3744,5568, 3752,21952, 3760,13760, 3768,30144, 
+    3784,19904, 3792,11712, 3800,28096, 3808,7616, 3816,24000, 3824,15808, 
+    3832,32192, 3848,17344, 3856,9152, 3864,25536, 3872,5056, 3880,21440, 
+    3888,13248, 3896,29632, 3912,19392, 3920,11200, 3928,27584, 3936,7104, 
+    3944,23488, 3952,15296, 3960,31680, 3976,18368, 3984,10176, 3992,26560, 
+    4000,6080, 4008,22464, 4016,14272, 4024,30656, 4040,20416, 4048,12224, 
+    4056,28608, 4064,8128, 4072,24512, 4080,16320, 4088,32704, 4104,16416, 
+    4112,8224, 4120,24608, 4136,20512, 4144,12320, 4152,28704, 4168,18464, 
+    4176,10272, 4184,26656, 4192,6176, 4200,22560, 4208,14368, 4216,30752, 
+    4232,17440, 4240,9248, 4248,25632, 4256,5152, 4264,21536, 4272,13344, 
+    4280,29728, 4296,19488, 4304,11296, 4312,27680, 4320,7200, 4328,23584, 
+    4336,15392, 4344,31776, 4360,16928, 4368,8736, 4376,25120, 4384,4640, 
+    4392,21024, 4400,12832, 4408,29216, 4424,18976, 4432,10784, 4440,27168, 
+    4448,6688, 4456,23072, 4464,14880, 4472,31264, 4488,17952, 4496,9760, 
+    4504,26144, 4512,5664, 4520,22048, 4528,13856, 4536,30240, 4552,20000, 
+    4560,11808, 4568,28192, 4576,7712, 4584,24096, 4592,15904, 4600,32288, 
+    4616,16672, 4624,8480, 4632,24864, 4648,20768, 4656,12576, 4664,28960, 
+    4680,18720, 4688,10528, 4696,26912, 4704,6432, 4712,22816, 4720,14624, 
+    4728,31008, 4744,17696, 4752,9504, 4760,25888, 4768,5408, 4776,21792, 
+    4784,13600, 4792,29984, 4808,19744, 4816,11552, 4824,27936, 4832,7456, 
+    4840,23840, 4848,15648, 4856,32032, 4872,17184, 4880,8992, 4888,25376, 
+    4904,21280, 4912,13088, 4920,29472, 4936,19232, 4944,11040, 4952,27424, 
+    4960,6944, 4968,23328, 4976,15136, 4984,31520, 5000,18208, 5008,10016, 
+    5016,26400, 5024,5920, 5032,22304, 5040,14112, 5048,30496, 5064,20256, 
+    5072,12064, 5080,28448, 5088,7968, 5096,24352, 5104,16160, 5112,32544, 
+    5128,16544, 5136,8352, 5144,24736, 5160,20640, 5168,12448, 5176,28832, 
+    5192,18592, 5200,10400, 5208,26784, 5216,6304, 5224,22688, 5232,14496, 
+    5240,30880, 5256,17568, 5264,9376, 5272,25760, 5288,21664, 5296,13472, 
+    5304,29856, 5320,19616, 5328,11424, 5336,27808, 5344,7328, 5352,23712, 
+    5360,15520, 5368,31904, 5384,17056, 5392,8864, 5400,25248, 5416,21152, 
+    5424,12960, 5432,29344, 5448,19104, 5456,10912, 5464,27296, 5472,6816, 
+    5480,23200, 5488,15008, 5496,31392, 5512,18080, 5520,9888, 5528,26272, 
+    5536,5792, 5544,22176, 5552,13984, 5560,30368, 5576,20128, 5584,11936, 
+    5592,28320, 5600,7840, 5608,24224, 5616,16032, 5624,32416, 5640,16800, 
+    5648,8608, 5656,24992, 5672,20896, 5680,12704, 5688,29088, 5704,18848, 
+    5712,10656, 5720,27040, 5728,6560, 5736,22944, 5744,14752, 5752,31136, 
+    5768,17824, 5776,9632, 5784,26016, 5800,21920, 5808,13728, 5816,30112, 
+    5832,19872, 5840,11680, 5848,28064, 5856,7584, 5864,23968, 5872,15776, 
+    5880,32160, 5896,17312, 5904,9120, 5912,25504, 5928,21408, 5936,13216, 
+    5944,29600, 5960,19360, 5968,11168, 5976,27552, 5984,7072, 5992,23456, 
+    6000,15264, 6008,31648, 6024,18336, 6032,10144, 6040,26528, 6056,22432, 
+    6064,14240, 6072,30624, 6088,20384, 6096,12192, 6104,28576, 6112,8096, 
+    6120,24480, 6128,16288, 6136,32672, 6152,16480, 6160,8288, 6168,24672, 
+    6184,20576, 6192,12384, 6200,28768, 6216,18528, 6224,10336, 6232,26720, 
+    6248,22624, 6256,14432, 6264,30816, 6280,17504, 6288,9312, 6296,25696, 
+    6312,21600, 6320,13408, 6328,29792, 6344,19552, 6352,11360, 6360,27744, 
+    6368,7264, 6376,23648, 6384,15456, 6392,31840, 6408,16992, 6416,8800, 
+    6424,25184, 6440,21088, 6448,12896, 6456,29280, 6472,19040, 6480,10848, 
+    6488,27232, 6496,6752, 6504,23136, 6512,14944, 6520,31328, 6536,18016, 
+    6544,9824, 6552,26208, 6568,22112, 6576,13920, 6584,30304, 6600,20064, 
+    6608,11872, 6616,28256, 6624,7776, 6632,24160, 6640,15968, 6648,32352, 
+    6664,16736, 6672,8544, 6680,24928, 6696,20832, 6704,12640, 6712,29024, 
+    6728,18784, 6736,10592, 6744,26976, 6760,22880, 6768,14688, 6776,31072, 
+    6792,17760, 6800,9568, 6808,25952, 6824,21856, 6832,13664, 6840,30048, 
+    6856,19808, 6864,11616, 6872,28000, 6880,7520, 6888,23904, 6896,15712, 
+    6904,32096, 6920,17248, 6928,9056, 6936,25440, 6952,21344, 6960,13152, 
+    6968,29536, 6984,19296, 6992,11104, 7000,27488, 7016,23392, 7024,15200, 
+    7032,31584, 7048,18272, 7056,10080, 7064,26464, 7080,22368, 7088,14176, 
+    7096,30560, 7112,20320, 7120,12128, 7128,28512, 7136,8032, 7144,24416, 
+    7152,16224, 7160,32608, 7176,16608, 7184,8416, 7192,24800, 7208,20704, 
+    7216,12512, 7224,28896, 7240,18656, 7248,10464, 7256,26848, 7272,22752, 
+    7280,14560, 7288,30944, 7304,17632, 7312,9440, 7320,25824, 7336,21728, 
+    7344,13536, 7352,29920, 7368,19680, 7376,11488, 7384,27872, 7400,23776, 
+    7408,15584, 7416,31968, 7432,17120, 7440,8928, 7448,25312, 7464,21216, 
+    7472,13024, 7480,29408, 7496,19168, 7504,10976, 7512,27360, 7528,23264, 
+    7536,15072, 7544,31456, 7560,18144, 7568,9952, 7576,26336, 7592,22240, 
+    7600,14048, 7608,30432, 7624,20192, 7632,12000, 7640,28384, 7648,7904, 
+    7656,24288, 7664,16096, 7672,32480, 7688,16864, 7696,8672, 7704,25056, 
+    7720,20960, 7728,12768, 7736,29152, 7752,18912, 7760,10720, 7768,27104, 
+    7784,23008, 7792,14816, 7800,31200, 7816,17888, 7824,9696, 7832,26080, 
+    7848,21984, 7856,13792, 7864,30176, 7880,19936, 7888,11744, 7896,28128, 
+    7912,24032, 7920,15840, 7928,32224, 7944,17376, 7952,9184, 7960,25568, 
+    7976,21472, 7984,13280, 7992,29664, 8008,19424, 8016,11232, 8024,27616, 
+    8040,23520, 8048,15328, 8056,31712, 8072,18400, 8080,10208, 8088,26592, 
+    8104,22496, 8112,14304, 8120,30688, 8136,20448, 8144,12256, 8152,28640, 
+    8168,24544, 8176,16352, 8184,32736, 8200,16400, 8216,24592, 8232,20496, 
+    8240,12304, 8248,28688, 8264,18448, 8272,10256, 8280,26640, 8296,22544, 
+    8304,14352, 8312,30736, 8328,17424, 8336,9232, 8344,25616, 8360,21520, 
+    8368,13328, 8376,29712, 8392,19472, 8400,11280, 8408,27664, 8424,23568, 
+    8432,15376, 8440,31760, 8456,16912, 8464,8720, 8472,25104, 8488,21008, 
+    8496,12816, 8504,29200, 8520,18960, 8528,10768, 8536,27152, 8552,23056, 
+    8560,14864, 8568,31248, 8584,17936, 8592,9744, 8600,26128, 8616,22032, 
+    8624,13840, 8632,30224, 8648,19984, 8656,11792, 8664,28176, 8680,24080, 
+    8688,15888, 8696,32272, 8712,16656, 8728,24848, 8744,20752, 8752,12560, 
+    8760,28944, 8776,18704, 8784,10512, 8792,26896, 8808,22800, 8816,14608, 
+    8824,30992, 8840,17680, 8848,9488, 8856,25872, 8872,21776, 8880,13584, 
+    8888,29968, 8904,19728, 8912,11536, 8920,27920, 8936,23824, 8944,15632, 
+    8952,32016, 8968,17168, 8984,25360, 9000,21264, 9008,13072, 9016,29456, 
+    9032,19216, 9040,11024, 9048,27408, 9064,23312, 9072,15120, 9080,31504, 
+    9096,18192, 9104,10000, 9112,26384, 9128,22288, 9136,14096, 9144,30480, 
+    9160,20240, 9168,12048, 9176,28432, 9192,24336, 9200,16144, 9208,32528, 
+    9224,16528, 9240,24720, 9256,20624, 9264,12432, 9272,28816, 9288,18576, 
+    9296,10384, 9304,26768, 9320,22672, 9328,14480, 9336,30864, 9352,17552, 
+    9368,25744, 9384,21648, 9392,13456, 9400,29840, 9416,19600, 9424,11408, 
+    9432,27792, 9448,23696, 9456,15504, 9464,31888, 9480,17040, 9496,25232, 
+    9512,21136, 9520,12944, 9528,29328, 9544,19088, 9552,10896, 9560,27280, 
+    9576,23184, 9584,14992, 9592,31376, 9608,18064, 9616,9872, 9624,26256, 
+    9640,22160, 9648,13968, 9656,30352, 9672,20112, 9680,11920, 9688,28304, 
+    9704,24208, 9712,16016, 9720,32400, 9736,16784, 9752,24976, 9768,20880, 
+    9776,12688, 9784,29072, 9800,18832, 9808,10640, 9816,27024, 9832,22928, 
+    9840,14736, 9848,31120, 9864,17808, 9880,26000, 9896,21904, 9904,13712, 
+    9912,30096, 9928,19856, 9936,11664, 9944,28048, 9960,23952, 9968,15760, 
+    9976,32144, 9992,17296, 10008,25488, 10024,21392, 10032,13200, 10040,29584, 
+    10056,19344, 10064,11152, 10072,27536, 10088,23440, 10096,15248, 10104,31632, 
+    10120,18320, 10136,26512, 10152,22416, 10160,14224, 10168,30608, 10184,20368, 
+    10192,12176, 10200,28560, 10216,24464, 10224,16272, 10232,32656, 10248,16464, 
+    10264,24656, 10280,20560, 10288,12368, 10296,28752, 10312,18512, 10328,26704, 
+    10344,22608, 10352,14416, 10360,30800, 10376,17488, 10392,25680, 10408,21584, 
+    10416,13392, 10424,29776, 10440,19536, 10448,11344, 10456,27728, 10472,23632, 
+    10480,15440, 10488,31824, 10504,16976, 10520,25168, 10536,21072, 10544,12880, 
+    10552,29264, 10568,19024, 10576,10832, 10584,27216, 10600,23120, 10608,14928, 
+    10616,31312, 10632,18000, 10648,26192, 10664,22096, 10672,13904, 10680,30288, 
+    10696,20048, 10704,11856, 10712,28240, 10728,24144, 10736,15952, 10744,32336, 
+    10760,16720, 10776,24912, 10792,20816, 10800,12624, 10808,29008, 10824,18768, 
+    10840,26960, 10856,22864, 10864,14672, 10872,31056, 10888,17744, 10904,25936, 
+    10920,21840, 10928,13648, 10936,30032, 10952,19792, 10960,11600, 10968,27984, 
+    10984,23888, 10992,15696, 11000,32080, 11016,17232, 11032,25424, 11048,21328, 
+    11056,13136, 11064,29520, 11080,19280, 11096,27472, 11112,23376, 11120,15184, 
+    11128,31568, 11144,18256, 11160,26448, 11176,22352, 11184,14160, 11192,30544, 
+    11208,20304, 11216,12112, 11224,28496, 11240,24400, 11248,16208, 11256,32592, 
+    11272,16592, 11288,24784, 11304,20688, 11312,12496, 11320,28880, 11336,18640, 
+    11352,26832, 11368,22736, 11376,14544, 11384,30928, 11400,17616, 11416,25808, 
+    11432,21712, 11440,13520, 11448,29904, 11464,19664, 11480,27856, 11496,23760, 
+    11504,15568, 11512,31952, 11528,17104, 11544,25296, 11560,21200, 11568,13008, 
+    11576,29392, 11592,19152, 11608,27344, 11624,23248, 11632,15056, 11640,31440, 
+    11656,18128, 11672,26320, 11688,22224, 11696,14032, 11704,30416, 11720,20176, 
+    11728,11984, 11736,28368, 11752,24272, 11760,16080, 11768,32464, 11784,16848, 
+    11800,25040, 11816,20944, 11824,12752, 11832,29136, 11848,18896, 11864,27088, 
+    11880,22992, 11888,14800, 11896,31184, 11912,17872, 11928,26064, 11944,21968, 
+    11952,13776, 11960,30160, 11976,19920, 11992,28112, 12008,24016, 12016,15824, 
+    12024,32208, 12040,17360, 12056,25552, 12072,21456, 12080,13264, 12088,29648, 
+    12104,19408, 12120,27600, 12136,23504, 12144,15312, 12152,31696, 12168,18384, 
+    12184,26576, 12200,22480, 12208,14288, 12216,30672, 12232,20432, 12248,28624, 
+    12264,24528, 12272,16336, 12280,32720, 12296,16432, 12312,24624, 12328,20528, 
+    12344,28720, 12360,18480, 12376,26672, 12392,22576, 12400,14384, 12408,30768, 
+    12424,17456, 12440,25648, 12456,21552, 12464,13360, 12472,29744, 12488,19504, 
+    12504,27696, 12520,23600, 12528,15408, 12536,31792, 12552,16944, 12568,25136, 
+    12584,21040, 12592,12848, 12600,29232, 12616,18992, 12632,27184, 12648,23088, 
+    12656,14896, 12664,31280, 12680,17968, 12696,26160, 12712,22064, 12720,13872, 
+    12728,30256, 12744,20016, 12760,28208, 12776,24112, 12784,15920, 12792,32304, 
+    12808,16688, 12824,24880, 12840,20784, 12856,28976, 12872,18736, 12888,26928, 
+    12904,22832, 12912,14640, 12920,31024, 12936,17712, 12952,25904, 12968,21808, 
+    12976,13616, 12984,30000, 13000,19760, 13016,27952, 13032,23856, 13040,15664, 
+    13048,32048, 13064,17200, 13080,25392, 13096,21296, 13112,29488, 13128,19248, 
+    13144,27440, 13160,23344, 13168,15152, 13176,31536, 13192,18224, 13208,26416, 
+    13224,22320, 13232,14128, 13240,30512, 13256,20272, 13272,28464, 13288,24368, 
+    13296,16176, 13304,32560, 13320,16560, 13336,24752, 13352,20656, 13368,28848, 
+    13384,18608, 13400,26800, 13416,22704, 13424,14512, 13432,30896, 13448,17584, 
+    13464,25776, 13480,21680, 13496,29872, 13512,19632, 13528,27824, 13544,23728, 
+    13552,15536, 13560,31920, 13576,17072, 13592,25264, 13608,21168, 13624,29360, 
+    13640,19120, 13656,27312, 13672,23216, 13680,15024, 13688,31408, 13704,18096, 
+    13720,26288, 13736,22192, 13744,14000, 13752,30384, 13768,20144, 13784,28336, 
+    13800,24240, 13808,16048, 13816,32432, 13832,16816, 13848,25008, 13864,20912, 
+    13880,29104, 13896,18864, 13912,27056, 13928,22960, 13936,14768, 13944,31152, 
+    13960,17840, 13976,26032, 13992,21936, 14008,30128, 14024,19888, 14040,28080, 
+    14056,23984, 14064,15792, 14072,32176, 14088,17328, 14104,25520, 14120,21424, 
+    14136,29616, 14152,19376, 14168,27568, 14184,23472, 14192,15280, 14200,31664, 
+    14216,18352, 14232,26544, 14248,22448, 14264,30640, 14280,20400, 14296,28592, 
+    14312,24496, 14320,16304, 14328,32688, 14344,16496, 14360,24688, 14376,20592, 
+    14392,28784, 14408,18544, 14424,26736, 14440,22640, 14456,30832, 14472,17520, 
+    14488,25712, 14504,21616, 14520,29808, 14536,19568, 14552,27760, 14568,23664, 
+    14576,15472, 14584,31856, 14600,17008, 14616,25200, 14632,21104, 14648,29296, 
+    14664,19056, 14680,27248, 14696,23152, 14704,14960, 14712,31344, 14728,18032, 
+    14744,26224, 14760,22128, 14776,30320, 14792,20080, 14808,28272, 14824,24176, 
+    14832,15984, 14840,32368, 14856,16752, 14872,24944, 14888,20848, 14904,29040, 
+    14920,18800, 14936,26992, 14952,22896, 14968,31088, 14984,17776, 15000,25968, 
+    15016,21872, 15032,30064, 15048,19824, 15064,28016, 15080,23920, 15088,15728, 
+    15096,32112, 15112,17264, 15128,25456, 15144,21360, 15160,29552, 15176,19312, 
+    15192,27504, 15208,23408, 15224,31600, 15240,18288, 15256,26480, 15272,22384, 
+    15288,30576, 15304,20336, 15320,28528, 15336,24432, 15344,16240, 15352,32624, 
+    15368,16624, 15384,24816, 15400,20720, 15416,28912, 15432,18672, 15448,26864, 
+    15464,22768, 15480,30960, 15496,17648, 15512,25840, 15528,21744, 15544,29936, 
+    15560,19696, 15576,27888, 15592,23792, 15608,31984, 15624,17136, 15640,25328, 
+    15656,21232, 15672,29424, 15688,19184, 15704,27376, 15720,23280, 15736,31472, 
+    15752,18160, 15768,26352, 15784,22256, 15800,30448, 15816,20208, 15832,28400, 
+    15848,24304, 15856,16112, 15864,32496, 15880,16880, 15896,25072, 15912,20976, 
+    15928,29168, 15944,18928, 15960,27120, 15976,23024, 15992,31216, 16008,17904, 
+    16024,26096, 16040,22000, 16056,30192, 16072,19952, 16088,28144, 16104,24048, 
+    16120,32240, 16136,17392, 16152,25584, 16168,21488, 16184,29680, 16200,19440, 
+    16216,27632, 16232,23536, 16248,31728, 16264,18416, 16280,26608, 16296,22512, 
+    16312,30704, 16328,20464, 16344,28656, 16360,24560, 16376,32752, 16408,24584, 
+    16424,20488, 16440,28680, 16456,18440, 16472,26632, 16488,22536, 16504,30728, 
+    16520,17416, 16536,25608, 16552,21512, 16568,29704, 16584,19464, 16600,27656, 
+    16616,23560, 16632,31752, 16648,16904, 16664,25096, 16680,21000, 16696,29192, 
+    16712,18952, 16728,27144, 16744,23048, 16760,31240, 16776,17928, 16792,26120, 
+    16808,22024, 16824,30216, 16840,19976, 16856,28168, 16872,24072, 16888,32264, 
+    16920,24840, 16936,20744, 16952,28936, 16968,18696, 16984,26888, 17000,22792, 
+    17016,30984, 17032,17672, 17048,25864, 17064,21768, 17080,29960, 17096,19720, 
+    17112,27912, 17128,23816, 17144,32008, 17176,25352, 17192,21256, 17208,29448, 
+    17224,19208, 17240,27400, 17256,23304, 17272,31496, 17288,18184, 17304,26376, 
+    17320,22280, 17336,30472, 17352,20232, 17368,28424, 17384,24328, 17400,32520, 
+    17432,24712, 17448,20616, 17464,28808, 17480,18568, 17496,26760, 17512,22664, 
+    17528,30856, 17560,25736, 17576,21640, 17592,29832, 17608,19592, 17624,27784, 
+    17640,23688, 17656,31880, 17688,25224, 17704,21128, 17720,29320, 17736,19080, 
+    17752,27272, 17768,23176, 17784,31368, 17800,18056, 17816,26248, 17832,22152, 
+    17848,30344, 17864,20104, 17880,28296, 17896,24200, 17912,32392, 17944,24968, 
+    17960,20872, 17976,29064, 17992,18824, 18008,27016, 18024,22920, 18040,31112, 
+    18072,25992, 18088,21896, 18104,30088, 18120,19848, 18136,28040, 18152,23944, 
+    18168,32136, 18200,25480, 18216,21384, 18232,29576, 18248,19336, 18264,27528, 
+    18280,23432, 18296,31624, 18328,26504, 18344,22408, 18360,30600, 18376,20360, 
+    18392,28552, 18408,24456, 18424,32648, 18456,24648, 18472,20552, 18488,28744, 
+    18520,26696, 18536,22600, 18552,30792, 18584,25672, 18600,21576, 18616,29768, 
+    18632,19528, 18648,27720, 18664,23624, 18680,31816, 18712,25160, 18728,21064, 
+    18744,29256, 18760,19016, 18776,27208, 18792,23112, 18808,31304, 18840,26184, 
+    18856,22088, 18872,30280, 18888,20040, 18904,28232, 18920,24136, 18936,32328, 
+    18968,24904, 18984,20808, 19000,29000, 19032,26952, 19048,22856, 19064,31048, 
+    19096,25928, 19112,21832, 19128,30024, 19144,19784, 19160,27976, 19176,23880, 
+    19192,32072, 19224,25416, 19240,21320, 19256,29512, 19288,27464, 19304,23368, 
+    19320,31560, 19352,26440, 19368,22344, 19384,30536, 19400,20296, 19416,28488, 
+    19432,24392, 19448,32584, 19480,24776, 19496,20680, 19512,28872, 19544,26824, 
+    19560,22728, 19576,30920, 19608,25800, 19624,21704, 19640,29896, 19672,27848, 
+    19688,23752, 19704,31944, 19736,25288, 19752,21192, 19768,29384, 19800,27336, 
+    19816,23240, 19832,31432, 19864,26312, 19880,22216, 19896,30408, 19912,20168, 
+    19928,28360, 19944,24264, 19960,32456, 19992,25032, 20008,20936, 20024,29128, 
+    20056,27080, 20072,22984, 20088,31176, 20120,26056, 20136,21960, 20152,30152, 
+    20184,28104, 20200,24008, 20216,32200, 20248,25544, 20264,21448, 20280,29640, 
+    20312,27592, 20328,23496, 20344,31688, 20376,26568, 20392,22472, 20408,30664, 
+    20440,28616, 20456,24520, 20472,32712, 20504,24616, 20536,28712, 20568,26664, 
+    20584,22568, 20600,30760, 20632,25640, 20648,21544, 20664,29736, 20696,27688, 
+    20712,23592, 20728,31784, 20760,25128, 20776,21032, 20792,29224, 20824,27176, 
+    20840,23080, 20856,31272, 20888,26152, 20904,22056, 20920,30248, 20952,28200, 
+    20968,24104, 20984,32296, 21016,24872, 21048,28968, 21080,26920, 21096,22824, 
+    21112,31016, 21144,25896, 21160,21800, 21176,29992, 21208,27944, 21224,23848, 
+    21240,32040, 21272,25384, 21304,29480, 21336,27432, 21352,23336, 21368,31528, 
+    21400,26408, 21416,22312, 21432,30504, 21464,28456, 21480,24360, 21496,32552, 
+    21528,24744, 21560,28840, 21592,26792, 21608,22696, 21624,30888, 21656,25768, 
+    21688,29864, 21720,27816, 21736,23720, 21752,31912, 21784,25256, 21816,29352, 
+    21848,27304, 21864,23208, 21880,31400, 21912,26280, 21928,22184, 21944,30376, 
+    21976,28328, 21992,24232, 22008,32424, 22040,25000, 22072,29096, 22104,27048, 
+    22120,22952, 22136,31144, 22168,26024, 22200,30120, 22232,28072, 22248,23976, 
+    22264,32168, 22296,25512, 22328,29608, 22360,27560, 22376,23464, 22392,31656, 
+    22424,26536, 22456,30632, 22488,28584, 22504,24488, 22520,32680, 22552,24680, 
+    22584,28776, 22616,26728, 22648,30824, 22680,25704, 22712,29800, 22744,27752, 
+    22760,23656, 22776,31848, 22808,25192, 22840,29288, 22872,27240, 22888,23144, 
+    22904,31336, 22936,26216, 22968,30312, 23000,28264, 23016,24168, 23032,32360, 
+    23064,24936, 23096,29032, 23128,26984, 23160,31080, 23192,25960, 23224,30056, 
+    23256,28008, 23272,23912, 23288,32104, 23320,25448, 23352,29544, 23384,27496, 
+    23416,31592, 23448,26472, 23480,30568, 23512,28520, 23528,24424, 23544,32616, 
+    23576,24808, 23608,28904, 23640,26856, 23672,30952, 23704,25832, 23736,29928, 
+    23768,27880, 23800,31976, 23832,25320, 23864,29416, 23896,27368, 23928,31464, 
+    23960,26344, 23992,30440, 24024,28392, 24040,24296, 24056,32488, 24088,25064, 
+    24120,29160, 24152,27112, 24184,31208, 24216,26088, 24248,30184, 24280,28136, 
+    24312,32232, 24344,25576, 24376,29672, 24408,27624, 24440,31720, 24472,26600, 
+    24504,30696, 24536,28648, 24568,32744, 24632,28696, 24664,26648, 24696,30744, 
+    24728,25624, 24760,29720, 24792,27672, 24824,31768, 24856,25112, 24888,29208, 
+    24920,27160, 24952,31256, 24984,26136, 25016,30232, 25048,28184, 25080,32280, 
+    25144,28952, 25176,26904, 25208,31000, 25240,25880, 25272,29976, 25304,27928, 
+    25336,32024, 25400,29464, 25432,27416, 25464,31512, 25496,26392, 25528,30488, 
+    25560,28440, 25592,32536, 25656,28824, 25688,26776, 25720,30872, 25784,29848, 
+    25816,27800, 25848,31896, 25912,29336, 25944,27288, 25976,31384, 26008,26264, 
+    26040,30360, 26072,28312, 26104,32408, 26168,29080, 26200,27032, 26232,31128, 
+    26296,30104, 26328,28056, 26360,32152, 26424,29592, 26456,27544, 26488,31640, 
+    26552,30616, 26584,28568, 26616,32664, 26680,28760, 26744,30808, 26808,29784, 
+    26840,27736, 26872,31832, 26936,29272, 26968,27224, 27000,31320, 27064,30296, 
+    27096,28248, 27128,32344, 27192,29016, 27256,31064, 27320,30040, 27352,27992, 
+    27384,32088, 27448,29528, 27512,31576, 27576,30552, 27608,28504, 27640,32600, 
+    27704,28888, 27768,30936, 27832,29912, 27896,31960, 27960,29400, 28024,31448, 
+    28088,30424, 28120,28376, 28152,32472, 28216,29144, 28280,31192, 28344,30168, 
+    28408,32216, 28472,29656, 28536,31704, 28600,30680, 28664,32728, 28792,30776, 
+    28856,29752, 28920,31800, 28984,29240, 29048,31288, 29112,30264, 29176,32312, 
+    29304,31032, 29368,30008, 29432,32056, 29560,31544, 29624,30520, 29688,32568, 
+    29816,30904, 29944,31928, 30072,31416, 30136,30392, 30200,32440, 30328,31160, 
+    30456,32184, 30584,31672, 30712,32696, 30968,31864, 31096,31352, 31224,32376, 
+    31480,32120, 31736,32632, 32248,32504 
+};
+
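+/**
+* \par
+* Example code for Floating-point RFFT Twiddle factors Generation:
+* \par
+* <pre>TW = exp(2*pi*i*[0:L/2-1]/L - pi/2*i).' </pre>
+* \par
+* Real and Imag values are stored interleaved, one (real, imag) pair per
+* index k = 0..L/2-1, so each table holds L values.
+* The tables below store the pair (sin(2*pi*k/L), cos(2*pi*k/L)).
+*/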
+const float32_t twiddleCoef_rfft_32[32] = {
+0.0f			,	1.0f			,
+0.195090322f	,	0.98078528f 	,
+0.382683432f	,	0.923879533f	,
+0.555570233f	,	0.831469612f	,
+0.707106781f	,	0.707106781f	,
+0.831469612f	,	0.555570233f	,
+0.923879533f	,	0.382683432f    ,	
+0.98078528f		,	0.195090322f	,
+1.0f			,	0.0f			,
+0.98078528f		,	-0.195090322f	,
+0.923879533f	,	-0.382683432f	,
+0.831469612f	,	-0.555570233f	,
+0.707106781f	,	-0.707106781f	,
+0.555570233f	,	-0.831469612f	,
+0.382683432f	,	-0.923879533f	,
+0.195090322f	,	-0.98078528f	
+};
+
+const float32_t twiddleCoef_rfft_64[64] = {
+0.0f,	1.0f,
+0.098017140329561f,	0.995184726672197f,
+0.195090322016128f,	0.98078528040323f,
+0.290284677254462f,	0.956940335732209f,
+0.38268343236509f,	0.923879532511287f,
+0.471396736825998f,	0.881921264348355f,
+0.555570233019602f,	0.831469612302545f,
+0.634393284163645f,	0.773010453362737f,
+0.707106781186547f,	0.707106781186548f,
+0.773010453362737f,	0.634393284163645f,
+0.831469612302545f,	0.555570233019602f,
+0.881921264348355f,	0.471396736825998f,
+0.923879532511287f,	0.38268343236509f,
+0.956940335732209f,	0.290284677254462f,
+0.98078528040323f,	0.195090322016128f,
+0.995184726672197f,	0.098017140329561f,
+1.0f,	0.0f,
+0.995184726672197f,	-0.098017140329561f,
+0.98078528040323f,	-0.195090322016128f,
+0.956940335732209f,	-0.290284677254462f,
+0.923879532511287f,	-0.38268343236509f,
+0.881921264348355f,	-0.471396736825998f,
+0.831469612302545f,	-0.555570233019602f,
+0.773010453362737f,	-0.634393284163645f,
+0.707106781186548f,	-0.707106781186547f,
+0.634393284163645f,	-0.773010453362737f,
+0.555570233019602f,	-0.831469612302545f,
+0.471396736825998f,	-0.881921264348355f,
+0.38268343236509f,	-0.923879532511287f,
+0.290284677254462f,	-0.956940335732209f,
+0.195090322016129f,	-0.98078528040323f,
+0.098017140329561f,	-0.995184726672197f
+};
+
+const float32_t twiddleCoef_rfft_128[128] = {
+    0.000000000f,  1.000000000f,
+    0.049067674f,  0.998795456f,
+    0.098017140f,  0.995184727f,
+    0.146730474f,  0.989176510f,
+    0.195090322f,  0.980785280f,
+    0.242980180f,  0.970031253f,
+    0.290284677f,  0.956940336f,
+    0.336889853f,  0.941544065f,
+    0.382683432f,  0.923879533f,
+    0.427555093f,  0.903989293f,
+    0.471396737f,  0.881921264f,
+    0.514102744f,  0.857728610f,
+    0.555570233f,  0.831469612f,
+    0.595699304f,  0.803207531f,
+    0.634393284f,  0.773010453f,
+    0.671558955f,  0.740951125f,
+    0.707106781f,  0.707106781f,
+    0.740951125f,  0.671558955f,
+    0.773010453f,  0.634393284f,
+    0.803207531f,  0.595699304f,
+    0.831469612f,  0.555570233f,
+    0.857728610f,  0.514102744f,
+    0.881921264f,  0.471396737f,
+    0.903989293f,  0.427555093f,
+    0.923879533f,  0.382683432f,
+    0.941544065f,  0.336889853f,
+    0.956940336f,  0.290284677f,
+    0.970031253f,  0.242980180f,
+    0.980785280f,  0.195090322f,
+    0.989176510f,  0.146730474f,
+    0.995184727f,  0.098017140f,
+    0.998795456f,  0.049067674f,
+    1.000000000f,  0.000000000f,
+    0.998795456f, -0.049067674f,
+    0.995184727f, -0.098017140f,
+    0.989176510f, -0.146730474f,
+    0.980785280f, -0.195090322f,
+    0.970031253f, -0.242980180f,
+    0.956940336f, -0.290284677f,
+    0.941544065f, -0.336889853f,
+    0.923879533f, -0.382683432f,
+    0.903989293f, -0.427555093f,
+    0.881921264f, -0.471396737f,
+    0.857728610f, -0.514102744f,
+    0.831469612f, -0.555570233f,
+    0.803207531f, -0.595699304f,
+    0.773010453f, -0.634393284f,
+    0.740951125f, -0.671558955f,
+    0.707106781f, -0.707106781f,
+    0.671558955f, -0.740951125f,
+    0.634393284f, -0.773010453f,
+    0.595699304f, -0.803207531f,
+    0.555570233f, -0.831469612f,
+    0.514102744f, -0.857728610f,
+    0.471396737f, -0.881921264f,
+    0.427555093f, -0.903989293f,
+    0.382683432f, -0.923879533f,
+    0.336889853f, -0.941544065f,
+    0.290284677f, -0.956940336f,
+    0.242980180f, -0.970031253f,
+    0.195090322f, -0.980785280f,
+    0.146730474f, -0.989176510f,
+    0.098017140f, -0.995184727f,
+    0.049067674f, -0.998795456f
+};
+
+const float32_t twiddleCoef_rfft_256[256] = {
+    0.000000000f,  1.000000000f,
+    0.024541229f,  0.999698819f,
+    0.049067674f,  0.998795456f,
+    0.073564564f,  0.997290457f,
+    0.098017140f,  0.995184727f,
+    0.122410675f,  0.992479535f,
+    0.146730474f,  0.989176510f,
+    0.170961889f,  0.985277642f,
+    0.195090322f,  0.980785280f,
+    0.219101240f,  0.975702130f,
+    0.242980180f,  0.970031253f,
+    0.266712757f,  0.963776066f,
+    0.290284677f,  0.956940336f,
+    0.313681740f,  0.949528181f,
+    0.336889853f,  0.941544065f,
+    0.359895037f,  0.932992799f,
+    0.382683432f,  0.923879533f,
+    0.405241314f,  0.914209756f,
+    0.427555093f,  0.903989293f,
+    0.449611330f,  0.893224301f,
+    0.471396737f,  0.881921264f,
+    0.492898192f,  0.870086991f,
+    0.514102744f,  0.857728610f,
+    0.534997620f,  0.844853565f,
+    0.555570233f,  0.831469612f,
+    0.575808191f,  0.817584813f,
+    0.595699304f,  0.803207531f,
+    0.615231591f,  0.788346428f,
+    0.634393284f,  0.773010453f,
+    0.653172843f,  0.757208847f,
+    0.671558955f,  0.740951125f,
+    0.689540545f,  0.724247083f,
+    0.707106781f,  0.707106781f,
+    0.724247083f,  0.689540545f,
+    0.740951125f,  0.671558955f,
+    0.757208847f,  0.653172843f,
+    0.773010453f,  0.634393284f,
+    0.788346428f,  0.615231591f,
+    0.803207531f,  0.595699304f,
+    0.817584813f,  0.575808191f,
+    0.831469612f,  0.555570233f,
+    0.844853565f,  0.534997620f,
+    0.857728610f,  0.514102744f,
+    0.870086991f,  0.492898192f,
+    0.881921264f,  0.471396737f,
+    0.893224301f,  0.449611330f,
+    0.903989293f,  0.427555093f,
+    0.914209756f,  0.405241314f,
+    0.923879533f,  0.382683432f,
+    0.932992799f,  0.359895037f,
+    0.941544065f,  0.336889853f,
+    0.949528181f,  0.313681740f,
+    0.956940336f,  0.290284677f,
+    0.963776066f,  0.266712757f,
+    0.970031253f,  0.242980180f,
+    0.975702130f,  0.219101240f,
+    0.980785280f,  0.195090322f,
+    0.985277642f,  0.170961889f,
+    0.989176510f,  0.146730474f,
+    0.992479535f,  0.122410675f,
+    0.995184727f,  0.098017140f,
+    0.997290457f,  0.073564564f,
+    0.998795456f,  0.049067674f,
+    0.999698819f,  0.024541229f,
+    1.000000000f,  0.000000000f,
+    0.999698819f, -0.024541229f,
+    0.998795456f, -0.049067674f,
+    0.997290457f, -0.073564564f,
+    0.995184727f, -0.098017140f,
+    0.992479535f, -0.122410675f,
+    0.989176510f, -0.146730474f,
+    0.985277642f, -0.170961889f,
+    0.980785280f, -0.195090322f,
+    0.975702130f, -0.219101240f,
+    0.970031253f, -0.242980180f,
+    0.963776066f, -0.266712757f,
+    0.956940336f, -0.290284677f,
+    0.949528181f, -0.313681740f,
+    0.941544065f, -0.336889853f,
+    0.932992799f, -0.359895037f,
+    0.923879533f, -0.382683432f,
+    0.914209756f, -0.405241314f,
+    0.903989293f, -0.427555093f,
+    0.893224301f, -0.449611330f,
+    0.881921264f, -0.471396737f,
+    0.870086991f, -0.492898192f,
+    0.857728610f, -0.514102744f,
+    0.844853565f, -0.534997620f,
+    0.831469612f, -0.555570233f,
+    0.817584813f, -0.575808191f,
+    0.803207531f, -0.595699304f,
+    0.788346428f, -0.615231591f,
+    0.773010453f, -0.634393284f,
+    0.757208847f, -0.653172843f,
+    0.740951125f, -0.671558955f,
+    0.724247083f, -0.689540545f,
+    0.707106781f, -0.707106781f,
+    0.689540545f, -0.724247083f,
+    0.671558955f, -0.740951125f,
+    0.653172843f, -0.757208847f,
+    0.634393284f, -0.773010453f,
+    0.615231591f, -0.788346428f,
+    0.595699304f, -0.803207531f,
+    0.575808191f, -0.817584813f,
+    0.555570233f, -0.831469612f,
+    0.534997620f, -0.844853565f,
+    0.514102744f, -0.857728610f,
+    0.492898192f, -0.870086991f,
+    0.471396737f, -0.881921264f,
+    0.449611330f, -0.893224301f,
+    0.427555093f, -0.903989293f,
+    0.405241314f, -0.914209756f,
+    0.382683432f, -0.923879533f,
+    0.359895037f, -0.932992799f,
+    0.336889853f, -0.941544065f,
+    0.313681740f, -0.949528181f,
+    0.290284677f, -0.956940336f,
+    0.266712757f, -0.963776066f,
+    0.242980180f, -0.970031253f,
+    0.219101240f, -0.975702130f,
+    0.195090322f, -0.980785280f,
+    0.170961889f, -0.985277642f,
+    0.146730474f, -0.989176510f,
+    0.122410675f, -0.992479535f,
+    0.098017140f, -0.995184727f,
+    0.073564564f, -0.997290457f,
+    0.049067674f, -0.998795456f,
+    0.024541229f, -0.999698819f
+};
+
+const float32_t twiddleCoef_rfft_512[512] = {
+    0.000000000f,  1.000000000f,
+    0.012271538f,  0.999924702f,
+    0.024541229f,  0.999698819f,
+    0.036807223f,  0.999322385f,
+    0.049067674f,  0.998795456f,
+    0.061320736f,  0.998118113f,
+    0.073564564f,  0.997290457f,
+    0.085797312f,  0.996312612f,
+    0.098017140f,  0.995184727f,
+    0.110222207f,  0.993906970f,
+    0.122410675f,  0.992479535f,
+    0.134580709f,  0.990902635f,
+    0.146730474f,  0.989176510f,
+    0.158858143f,  0.987301418f,
+    0.170961889f,  0.985277642f,
+    0.183039888f,  0.983105487f,
+    0.195090322f,  0.980785280f,
+    0.207111376f,  0.978317371f,
+    0.219101240f,  0.975702130f,
+    0.231058108f,  0.972939952f,
+    0.242980180f,  0.970031253f,
+    0.254865660f,  0.966976471f,
+    0.266712757f,  0.963776066f,
+    0.278519689f,  0.960430519f,
+    0.290284677f,  0.956940336f,
+    0.302005949f,  0.953306040f,
+    0.313681740f,  0.949528181f,
+    0.325310292f,  0.945607325f,
+    0.336889853f,  0.941544065f,
+    0.348418680f,  0.937339012f,
+    0.359895037f,  0.932992799f,
+    0.371317194f,  0.928506080f,
+    0.382683432f,  0.923879533f,
+    0.393992040f,  0.919113852f,
+    0.405241314f,  0.914209756f,
+    0.416429560f,  0.909167983f,
+    0.427555093f,  0.903989293f,
+    0.438616239f,  0.898674466f,
+    0.449611330f,  0.893224301f,
+    0.460538711f,  0.887639620f,
+    0.471396737f,  0.881921264f,
+    0.482183772f,  0.876070094f,
+    0.492898192f,  0.870086991f,
+    0.503538384f,  0.863972856f,
+    0.514102744f,  0.857728610f,
+    0.524589683f,  0.851355193f,
+    0.534997620f,  0.844853565f,
+    0.545324988f,  0.838224706f,
+    0.555570233f,  0.831469612f,
+    0.565731811f,  0.824589303f,
+    0.575808191f,  0.817584813f,
+    0.585797857f,  0.810457198f,
+    0.595699304f,  0.803207531f,
+    0.605511041f,  0.795836905f,
+    0.615231591f,  0.788346428f,
+    0.624859488f,  0.780737229f,
+    0.634393284f,  0.773010453f,
+    0.643831543f,  0.765167266f,
+    0.653172843f,  0.757208847f,
+    0.662415778f,  0.749136395f,
+    0.671558955f,  0.740951125f,
+    0.680600998f,  0.732654272f,
+    0.689540545f,  0.724247083f,
+    0.698376249f,  0.715730825f,
+    0.707106781f,  0.707106781f,
+    0.715730825f,  0.698376249f,
+    0.724247083f,  0.689540545f,
+    0.732654272f,  0.680600998f,
+    0.740951125f,  0.671558955f,
+    0.749136395f,  0.662415778f,
+    0.757208847f,  0.653172843f,
+    0.765167266f,  0.643831543f,
+    0.773010453f,  0.634393284f,
+    0.780737229f,  0.624859488f,
+    0.788346428f,  0.615231591f,
+    0.795836905f,  0.605511041f,
+    0.803207531f,  0.595699304f,
+    0.810457198f,  0.585797857f,
+    0.817584813f,  0.575808191f,
+    0.824589303f,  0.565731811f,
+    0.831469612f,  0.555570233f,
+    0.838224706f,  0.545324988f,
+    0.844853565f,  0.534997620f,
+    0.851355193f,  0.524589683f,
+    0.857728610f,  0.514102744f,
+    0.863972856f,  0.503538384f,
+    0.870086991f,  0.492898192f,
+    0.876070094f,  0.482183772f,
+    0.881921264f,  0.471396737f,
+    0.887639620f,  0.460538711f,
+    0.893224301f,  0.449611330f,
+    0.898674466f,  0.438616239f,
+    0.903989293f,  0.427555093f,
+    0.909167983f,  0.416429560f,
+    0.914209756f,  0.405241314f,
+    0.919113852f,  0.393992040f,
+    0.923879533f,  0.382683432f,
+    0.928506080f,  0.371317194f,
+    0.932992799f,  0.359895037f,
+    0.937339012f,  0.348418680f,
+    0.941544065f,  0.336889853f,
+    0.945607325f,  0.325310292f,
+    0.949528181f,  0.313681740f,
+    0.953306040f,  0.302005949f,
+    0.956940336f,  0.290284677f,
+    0.960430519f,  0.278519689f,
+    0.963776066f,  0.266712757f,
+    0.966976471f,  0.254865660f,
+    0.970031253f,  0.242980180f,
+    0.972939952f,  0.231058108f,
+    0.975702130f,  0.219101240f,
+    0.978317371f,  0.207111376f,
+    0.980785280f,  0.195090322f,
+    0.983105487f,  0.183039888f,
+    0.985277642f,  0.170961889f,
+    0.987301418f,  0.158858143f,
+    0.989176510f,  0.146730474f,
+    0.990902635f,  0.134580709f,
+    0.992479535f,  0.122410675f,
+    0.993906970f,  0.110222207f,
+    0.995184727f,  0.098017140f,
+    0.996312612f,  0.085797312f,
+    0.997290457f,  0.073564564f,
+    0.998118113f,  0.061320736f,
+    0.998795456f,  0.049067674f,
+    0.999322385f,  0.036807223f,
+    0.999698819f,  0.024541229f,
+    0.999924702f,  0.012271538f,
+    1.000000000f,  0.000000000f,
+    0.999924702f, -0.012271538f,
+    0.999698819f, -0.024541229f,
+    0.999322385f, -0.036807223f,
+    0.998795456f, -0.049067674f,
+    0.998118113f, -0.061320736f,
+    0.997290457f, -0.073564564f,
+    0.996312612f, -0.085797312f,
+    0.995184727f, -0.098017140f,
+    0.993906970f, -0.110222207f,
+    0.992479535f, -0.122410675f,
+    0.990902635f, -0.134580709f,
+    0.989176510f, -0.146730474f,
+    0.987301418f, -0.158858143f,
+    0.985277642f, -0.170961889f,
+    0.983105487f, -0.183039888f,
+    0.980785280f, -0.195090322f,
+    0.978317371f, -0.207111376f,
+    0.975702130f, -0.219101240f,
+    0.972939952f, -0.231058108f,
+    0.970031253f, -0.242980180f,
+    0.966976471f, -0.254865660f,
+    0.963776066f, -0.266712757f,
+    0.960430519f, -0.278519689f,
+    0.956940336f, -0.290284677f,
+    0.953306040f, -0.302005949f,
+    0.949528181f, -0.313681740f,
+    0.945607325f, -0.325310292f,
+    0.941544065f, -0.336889853f,
+    0.937339012f, -0.348418680f,
+    0.932992799f, -0.359895037f,
+    0.928506080f, -0.371317194f,
+    0.923879533f, -0.382683432f,
+    0.919113852f, -0.393992040f,
+    0.914209756f, -0.405241314f,
+    0.909167983f, -0.416429560f,
+    0.903989293f, -0.427555093f,
+    0.898674466f, -0.438616239f,
+    0.893224301f, -0.449611330f,
+    0.887639620f, -0.460538711f,
+    0.881921264f, -0.471396737f,
+    0.876070094f, -0.482183772f,
+    0.870086991f, -0.492898192f,
+    0.863972856f, -0.503538384f,
+    0.857728610f, -0.514102744f,
+    0.851355193f, -0.524589683f,
+    0.844853565f, -0.534997620f,
+    0.838224706f, -0.545324988f,
+    0.831469612f, -0.555570233f,
+    0.824589303f, -0.565731811f,
+    0.817584813f, -0.575808191f,
+    0.810457198f, -0.585797857f,
+    0.803207531f, -0.595699304f,
+    0.795836905f, -0.605511041f,
+    0.788346428f, -0.615231591f,
+    0.780737229f, -0.624859488f,
+    0.773010453f, -0.634393284f,
+    0.765167266f, -0.643831543f,
+    0.757208847f, -0.653172843f,
+    0.749136395f, -0.662415778f,
+    0.740951125f, -0.671558955f,
+    0.732654272f, -0.680600998f,
+    0.724247083f, -0.689540545f,
+    0.715730825f, -0.698376249f,
+    0.707106781f, -0.707106781f,
+    0.698376249f, -0.715730825f,
+    0.689540545f, -0.724247083f,
+    0.680600998f, -0.732654272f,
+    0.671558955f, -0.740951125f,
+    0.662415778f, -0.749136395f,
+    0.653172843f, -0.757208847f,
+    0.643831543f, -0.765167266f,
+    0.634393284f, -0.773010453f,
+    0.624859488f, -0.780737229f,
+    0.615231591f, -0.788346428f,
+    0.605511041f, -0.795836905f,
+    0.595699304f, -0.803207531f,
+    0.585797857f, -0.810457198f,
+    0.575808191f, -0.817584813f,
+    0.565731811f, -0.824589303f,
+    0.555570233f, -0.831469612f,
+    0.545324988f, -0.838224706f,
+    0.534997620f, -0.844853565f,
+    0.524589683f, -0.851355193f,
+    0.514102744f, -0.857728610f,
+    0.503538384f, -0.863972856f,
+    0.492898192f, -0.870086991f,
+    0.482183772f, -0.876070094f,
+    0.471396737f, -0.881921264f,
+    0.460538711f, -0.887639620f,
+    0.449611330f, -0.893224301f,
+    0.438616239f, -0.898674466f,
+    0.427555093f, -0.903989293f,
+    0.416429560f, -0.909167983f,
+    0.405241314f, -0.914209756f,
+    0.393992040f, -0.919113852f,
+    0.382683432f, -0.923879533f,
+    0.371317194f, -0.928506080f,
+    0.359895037f, -0.932992799f,
+    0.348418680f, -0.937339012f,
+    0.336889853f, -0.941544065f,
+    0.325310292f, -0.945607325f,
+    0.313681740f, -0.949528181f,
+    0.302005949f, -0.953306040f,
+    0.290284677f, -0.956940336f,
+    0.278519689f, -0.960430519f,
+    0.266712757f, -0.963776066f,
+    0.254865660f, -0.966976471f,
+    0.242980180f, -0.970031253f,
+    0.231058108f, -0.972939952f,
+    0.219101240f, -0.975702130f,
+    0.207111376f, -0.978317371f,
+    0.195090322f, -0.980785280f,
+    0.183039888f, -0.983105487f,
+    0.170961889f, -0.985277642f,
+    0.158858143f, -0.987301418f,
+    0.146730474f, -0.989176510f,
+    0.134580709f, -0.990902635f,
+    0.122410675f, -0.992479535f,
+    0.110222207f, -0.993906970f,
+    0.098017140f, -0.995184727f,
+    0.085797312f, -0.996312612f,
+    0.073564564f, -0.997290457f,
+    0.061320736f, -0.998118113f,
+    0.049067674f, -0.998795456f,
+    0.036807223f, -0.999322385f,
+    0.024541229f, -0.999698819f,
+    0.012271538f, -0.999924702f
+};
+
+const float32_t twiddleCoef_rfft_1024[1024] = {
+    0.000000000f,  1.000000000f,
+    0.006135885f,  0.999981175f,
+    0.012271538f,  0.999924702f,
+    0.018406730f,  0.999830582f,
+    0.024541229f,  0.999698819f,
+    0.030674803f,  0.999529418f,
+    0.036807223f,  0.999322385f,
+    0.042938257f,  0.999077728f,
+    0.049067674f,  0.998795456f,
+    0.055195244f,  0.998475581f,
+    0.061320736f,  0.998118113f,
+    0.067443920f,  0.997723067f,
+    0.073564564f,  0.997290457f,
+    0.079682438f,  0.996820299f,
+    0.085797312f,  0.996312612f,
+    0.091908956f,  0.995767414f,
+    0.098017140f,  0.995184727f,
+    0.104121634f,  0.994564571f,
+    0.110222207f,  0.993906970f,
+    0.116318631f,  0.993211949f,
+    0.122410675f,  0.992479535f,
+    0.128498111f,  0.991709754f,
+    0.134580709f,  0.990902635f,
+    0.140658239f,  0.990058210f,
+    0.146730474f,  0.989176510f,
+    0.152797185f,  0.988257568f,
+    0.158858143f,  0.987301418f,
+    0.164913120f,  0.986308097f,
+    0.170961889f,  0.985277642f,
+    0.177004220f,  0.984210092f,
+    0.183039888f,  0.983105487f,
+    0.189068664f,  0.981963869f,
+    0.195090322f,  0.980785280f,
+    0.201104635f,  0.979569766f,
+    0.207111376f,  0.978317371f,
+    0.213110320f,  0.977028143f,
+    0.219101240f,  0.975702130f,
+    0.225083911f,  0.974339383f,
+    0.231058108f,  0.972939952f,
+    0.237023606f,  0.971503891f,
+    0.242980180f,  0.970031253f,
+    0.248927606f,  0.968522094f,
+    0.254865660f,  0.966976471f,
+    0.260794118f,  0.965394442f,
+    0.266712757f,  0.963776066f,
+    0.272621355f,  0.962121404f,
+    0.278519689f,  0.960430519f,
+    0.284407537f,  0.958703475f,
+    0.290284677f,  0.956940336f,
+    0.296150888f,  0.955141168f,
+    0.302005949f,  0.953306040f,
+    0.307849640f,  0.951435021f,
+    0.313681740f,  0.949528181f,
+    0.319502031f,  0.947585591f,
+    0.325310292f,  0.945607325f,
+    0.331106306f,  0.943593458f,
+    0.336889853f,  0.941544065f,
+    0.342660717f,  0.939459224f,
+    0.348418680f,  0.937339012f,
+    0.354163525f,  0.935183510f,
+    0.359895037f,  0.932992799f,
+    0.365612998f,  0.930766961f,
+    0.371317194f,  0.928506080f,
+    0.377007410f,  0.926210242f,
+    0.382683432f,  0.923879533f,
+    0.388345047f,  0.921514039f,
+    0.393992040f,  0.919113852f,
+    0.399624200f,  0.916679060f,
+    0.405241314f,  0.914209756f,
+    0.410843171f,  0.911706032f,
+    0.416429560f,  0.909167983f,
+    0.422000271f,  0.906595705f,
+    0.427555093f,  0.903989293f,
+    0.433093819f,  0.901348847f,
+    0.438616239f,  0.898674466f,
+    0.444122145f,  0.895966250f,
+    0.449611330f,  0.893224301f,
+    0.455083587f,  0.890448723f,
+    0.460538711f,  0.887639620f,
+    0.465976496f,  0.884797098f,
+    0.471396737f,  0.881921264f,
+    0.476799230f,  0.879012226f,
+    0.482183772f,  0.876070094f,
+    0.487550160f,  0.873094978f,
+    0.492898192f,  0.870086991f,
+    0.498227667f,  0.867046246f,
+    0.503538384f,  0.863972856f,
+    0.508830143f,  0.860866939f,
+    0.514102744f,  0.857728610f,
+    0.519355990f,  0.854557988f,
+    0.524589683f,  0.851355193f,
+    0.529803625f,  0.848120345f,
+    0.534997620f,  0.844853565f,
+    0.540171473f,  0.841554977f,
+    0.545324988f,  0.838224706f,
+    0.550457973f,  0.834862875f,
+    0.555570233f,  0.831469612f,
+    0.560661576f,  0.828045045f,
+    0.565731811f,  0.824589303f,
+    0.570780746f,  0.821102515f,
+    0.575808191f,  0.817584813f,
+    0.580813958f,  0.814036330f,
+    0.585797857f,  0.810457198f,
+    0.590759702f,  0.806847554f,
+    0.595699304f,  0.803207531f,
+    0.600616479f,  0.799537269f,
+    0.605511041f,  0.795836905f,
+    0.610382806f,  0.792106577f,
+    0.615231591f,  0.788346428f,
+    0.620057212f,  0.784556597f,
+    0.624859488f,  0.780737229f,
+    0.629638239f,  0.776888466f,
+    0.634393284f,  0.773010453f,
+    0.639124445f,  0.769103338f,
+    0.643831543f,  0.765167266f,
+    0.648514401f,  0.761202385f,
+    0.653172843f,  0.757208847f,
+    0.657806693f,  0.753186799f,
+    0.662415778f,  0.749136395f,
+    0.666999922f,  0.745057785f,
+    0.671558955f,  0.740951125f,
+    0.676092704f,  0.736816569f,
+    0.680600998f,  0.732654272f,
+    0.685083668f,  0.728464390f,
+    0.689540545f,  0.724247083f,
+    0.693971461f,  0.720002508f,
+    0.698376249f,  0.715730825f,
+    0.702754744f,  0.711432196f,
+    0.707106781f,  0.707106781f,
+    0.711432196f,  0.702754744f,
+    0.715730825f,  0.698376249f,
+    0.720002508f,  0.693971461f,
+    0.724247083f,  0.689540545f,
+    0.728464390f,  0.685083668f,
+    0.732654272f,  0.680600998f,
+    0.736816569f,  0.676092704f,
+    0.740951125f,  0.671558955f,
+    0.745057785f,  0.666999922f,
+    0.749136395f,  0.662415778f,
+    0.753186799f,  0.657806693f,
+    0.757208847f,  0.653172843f,
+    0.761202385f,  0.648514401f,
+    0.765167266f,  0.643831543f,
+    0.769103338f,  0.639124445f,
+    0.773010453f,  0.634393284f,
+    0.776888466f,  0.629638239f,
+    0.780737229f,  0.624859488f,
+    0.784556597f,  0.620057212f,
+    0.788346428f,  0.615231591f,
+    0.792106577f,  0.610382806f,
+    0.795836905f,  0.605511041f,
+    0.799537269f,  0.600616479f,
+    0.803207531f,  0.595699304f,
+    0.806847554f,  0.590759702f,
+    0.810457198f,  0.585797857f,
+    0.814036330f,  0.580813958f,
+    0.817584813f,  0.575808191f,
+    0.821102515f,  0.570780746f,
+    0.824589303f,  0.565731811f,
+    0.828045045f,  0.560661576f,
+    0.831469612f,  0.555570233f,
+    0.834862875f,  0.550457973f,
+    0.838224706f,  0.545324988f,
+    0.841554977f,  0.540171473f,
+    0.844853565f,  0.534997620f,
+    0.848120345f,  0.529803625f,
+    0.851355193f,  0.524589683f,
+    0.854557988f,  0.519355990f,
+    0.857728610f,  0.514102744f,
+    0.860866939f,  0.508830143f,
+    0.863972856f,  0.503538384f,
+    0.867046246f,  0.498227667f,
+    0.870086991f,  0.492898192f,
+    0.873094978f,  0.487550160f,
+    0.876070094f,  0.482183772f,
+    0.879012226f,  0.476799230f,
+    0.881921264f,  0.471396737f,
+    0.884797098f,  0.465976496f,
+    0.887639620f,  0.460538711f,
+    0.890448723f,  0.455083587f,
+    0.893224301f,  0.449611330f,
+    0.895966250f,  0.444122145f,
+    0.898674466f,  0.438616239f,
+    0.901348847f,  0.433093819f,
+    0.903989293f,  0.427555093f,
+    0.906595705f,  0.422000271f,
+    0.909167983f,  0.416429560f,
+    0.911706032f,  0.410843171f,
+    0.914209756f,  0.405241314f,
+    0.916679060f,  0.399624200f,
+    0.919113852f,  0.393992040f,
+    0.921514039f,  0.388345047f,
+    0.923879533f,  0.382683432f,
+    0.926210242f,  0.377007410f,
+    0.928506080f,  0.371317194f,
+    0.930766961f,  0.365612998f,
+    0.932992799f,  0.359895037f,
+    0.935183510f,  0.354163525f,
+    0.937339012f,  0.348418680f,
+    0.939459224f,  0.342660717f,
+    0.941544065f,  0.336889853f,
+    0.943593458f,  0.331106306f,
+    0.945607325f,  0.325310292f,
+    0.947585591f,  0.319502031f,
+    0.949528181f,  0.313681740f,
+    0.951435021f,  0.307849640f,
+    0.953306040f,  0.302005949f,
+    0.955141168f,  0.296150888f,
+    0.956940336f,  0.290284677f,
+    0.958703475f,  0.284407537f,
+    0.960430519f,  0.278519689f,
+    0.962121404f,  0.272621355f,
+    0.963776066f,  0.266712757f,
+    0.965394442f,  0.260794118f,
+    0.966976471f,  0.254865660f,
+    0.968522094f,  0.248927606f,
+    0.970031253f,  0.242980180f,
+    0.971503891f,  0.237023606f,
+    0.972939952f,  0.231058108f,
+    0.974339383f,  0.225083911f,
+    0.975702130f,  0.219101240f,
+    0.977028143f,  0.213110320f,
+    0.978317371f,  0.207111376f,
+    0.979569766f,  0.201104635f,
+    0.980785280f,  0.195090322f,
+    0.981963869f,  0.189068664f,
+    0.983105487f,  0.183039888f,
+    0.984210092f,  0.177004220f,
+    0.985277642f,  0.170961889f,
+    0.986308097f,  0.164913120f,
+    0.987301418f,  0.158858143f,
+    0.988257568f,  0.152797185f,
+    0.989176510f,  0.146730474f,
+    0.990058210f,  0.140658239f,
+    0.990902635f,  0.134580709f,
+    0.991709754f,  0.128498111f,
+    0.992479535f,  0.122410675f,
+    0.993211949f,  0.116318631f,
+    0.993906970f,  0.110222207f,
+    0.994564571f,  0.104121634f,
+    0.995184727f,  0.098017140f,
+    0.995767414f,  0.091908956f,
+    0.996312612f,  0.085797312f,
+    0.996820299f,  0.079682438f,
+    0.997290457f,  0.073564564f,
+    0.997723067f,  0.067443920f,
+    0.998118113f,  0.061320736f,
+    0.998475581f,  0.055195244f,
+    0.998795456f,  0.049067674f,
+    0.999077728f,  0.042938257f,
+    0.999322385f,  0.036807223f,
+    0.999529418f,  0.030674803f,
+    0.999698819f,  0.024541229f,
+    0.999830582f,  0.018406730f,
+    0.999924702f,  0.012271538f,
+    0.999981175f,  0.006135885f,
+    1.000000000f,  0.000000000f,
+    0.999981175f, -0.006135885f,
+    0.999924702f, -0.012271538f,
+    0.999830582f, -0.018406730f,
+    0.999698819f, -0.024541229f,
+    0.999529418f, -0.030674803f,
+    0.999322385f, -0.036807223f,
+    0.999077728f, -0.042938257f,
+    0.998795456f, -0.049067674f,
+    0.998475581f, -0.055195244f,
+    0.998118113f, -0.061320736f,
+    0.997723067f, -0.067443920f,
+    0.997290457f, -0.073564564f,
+    0.996820299f, -0.079682438f,
+    0.996312612f, -0.085797312f,
+    0.995767414f, -0.091908956f,
+    0.995184727f, -0.098017140f,
+    0.994564571f, -0.104121634f,
+    0.993906970f, -0.110222207f,
+    0.993211949f, -0.116318631f,
+    0.992479535f, -0.122410675f,
+    0.991709754f, -0.128498111f,
+    0.990902635f, -0.134580709f,
+    0.990058210f, -0.140658239f,
+    0.989176510f, -0.146730474f,
+    0.988257568f, -0.152797185f,
+    0.987301418f, -0.158858143f,
+    0.986308097f, -0.164913120f,
+    0.985277642f, -0.170961889f,
+    0.984210092f, -0.177004220f,
+    0.983105487f, -0.183039888f,
+    0.981963869f, -0.189068664f,
+    0.980785280f, -0.195090322f,
+    0.979569766f, -0.201104635f,
+    0.978317371f, -0.207111376f,
+    0.977028143f, -0.213110320f,
+    0.975702130f, -0.219101240f,
+    0.974339383f, -0.225083911f,
+    0.972939952f, -0.231058108f,
+    0.971503891f, -0.237023606f,
+    0.970031253f, -0.242980180f,
+    0.968522094f, -0.248927606f,
+    0.966976471f, -0.254865660f,
+    0.965394442f, -0.260794118f,
+    0.963776066f, -0.266712757f,
+    0.962121404f, -0.272621355f,
+    0.960430519f, -0.278519689f,
+    0.958703475f, -0.284407537f,
+    0.956940336f, -0.290284677f,
+    0.955141168f, -0.296150888f,
+    0.953306040f, -0.302005949f,
+    0.951435021f, -0.307849640f,
+    0.949528181f, -0.313681740f,
+    0.947585591f, -0.319502031f,
+    0.945607325f, -0.325310292f,
+    0.943593458f, -0.331106306f,
+    0.941544065f, -0.336889853f,
+    0.939459224f, -0.342660717f,
+    0.937339012f, -0.348418680f,
+    0.935183510f, -0.354163525f,
+    0.932992799f, -0.359895037f,
+    0.930766961f, -0.365612998f,
+    0.928506080f, -0.371317194f,
+    0.926210242f, -0.377007410f,
+    0.923879533f, -0.382683432f,
+    0.921514039f, -0.388345047f,
+    0.919113852f, -0.393992040f,
+    0.916679060f, -0.399624200f,
+    0.914209756f, -0.405241314f,
+    0.911706032f, -0.410843171f,
+    0.909167983f, -0.416429560f,
+    0.906595705f, -0.422000271f,
+    0.903989293f, -0.427555093f,
+    0.901348847f, -0.433093819f,
+    0.898674466f, -0.438616239f,
+    0.895966250f, -0.444122145f,
+    0.893224301f, -0.449611330f,
+    0.890448723f, -0.455083587f,
+    0.887639620f, -0.460538711f,
+    0.884797098f, -0.465976496f,
+    0.881921264f, -0.471396737f,
+    0.879012226f, -0.476799230f,
+    0.876070094f, -0.482183772f,
+    0.873094978f, -0.487550160f,
+    0.870086991f, -0.492898192f,
+    0.867046246f, -0.498227667f,
+    0.863972856f, -0.503538384f,
+    0.860866939f, -0.508830143f,
+    0.857728610f, -0.514102744f,
+    0.854557988f, -0.519355990f,
+    0.851355193f, -0.524589683f,
+    0.848120345f, -0.529803625f,
+    0.844853565f, -0.534997620f,
+    0.841554977f, -0.540171473f,
+    0.838224706f, -0.545324988f,
+    0.834862875f, -0.550457973f,
+    0.831469612f, -0.555570233f,
+    0.828045045f, -0.560661576f,
+    0.824589303f, -0.565731811f,
+    0.821102515f, -0.570780746f,
+    0.817584813f, -0.575808191f,
+    0.814036330f, -0.580813958f,
+    0.810457198f, -0.585797857f,
+    0.806847554f, -0.590759702f,
+    0.803207531f, -0.595699304f,
+    0.799537269f, -0.600616479f,
+    0.795836905f, -0.605511041f,
+    0.792106577f, -0.610382806f,
+    0.788346428f, -0.615231591f,
+    0.784556597f, -0.620057212f,
+    0.780737229f, -0.624859488f,
+    0.776888466f, -0.629638239f,
+    0.773010453f, -0.634393284f,
+    0.769103338f, -0.639124445f,
+    0.765167266f, -0.643831543f,
+    0.761202385f, -0.648514401f,
+    0.757208847f, -0.653172843f,
+    0.753186799f, -0.657806693f,
+    0.749136395f, -0.662415778f,
+    0.745057785f, -0.666999922f,
+    0.740951125f, -0.671558955f,
+    0.736816569f, -0.676092704f,
+    0.732654272f, -0.680600998f,
+    0.728464390f, -0.685083668f,
+    0.724247083f, -0.689540545f,
+    0.720002508f, -0.693971461f,
+    0.715730825f, -0.698376249f,
+    0.711432196f, -0.702754744f,
+    0.707106781f, -0.707106781f,
+    0.702754744f, -0.711432196f,
+    0.698376249f, -0.715730825f,
+    0.693971461f, -0.720002508f,
+    0.689540545f, -0.724247083f,
+    0.685083668f, -0.728464390f,
+    0.680600998f, -0.732654272f,
+    0.676092704f, -0.736816569f,
+    0.671558955f, -0.740951125f,
+    0.666999922f, -0.745057785f,
+    0.662415778f, -0.749136395f,
+    0.657806693f, -0.753186799f,
+    0.653172843f, -0.757208847f,
+    0.648514401f, -0.761202385f,
+    0.643831543f, -0.765167266f,
+    0.639124445f, -0.769103338f,
+    0.634393284f, -0.773010453f,
+    0.629638239f, -0.776888466f,
+    0.624859488f, -0.780737229f,
+    0.620057212f, -0.784556597f,
+    0.615231591f, -0.788346428f,
+    0.610382806f, -0.792106577f,
+    0.605511041f, -0.795836905f,
+    0.600616479f, -0.799537269f,
+    0.595699304f, -0.803207531f,
+    0.590759702f, -0.806847554f,
+    0.585797857f, -0.810457198f,
+    0.580813958f, -0.814036330f,
+    0.575808191f, -0.817584813f,
+    0.570780746f, -0.821102515f,
+    0.565731811f, -0.824589303f,
+    0.560661576f, -0.828045045f,
+    0.555570233f, -0.831469612f,
+    0.550457973f, -0.834862875f,
+    0.545324988f, -0.838224706f,
+    0.540171473f, -0.841554977f,
+    0.534997620f, -0.844853565f,
+    0.529803625f, -0.848120345f,
+    0.524589683f, -0.851355193f,
+    0.519355990f, -0.854557988f,
+    0.514102744f, -0.857728610f,
+    0.508830143f, -0.860866939f,
+    0.503538384f, -0.863972856f,
+    0.498227667f, -0.867046246f,
+    0.492898192f, -0.870086991f,
+    0.487550160f, -0.873094978f,
+    0.482183772f, -0.876070094f,
+    0.476799230f, -0.879012226f,
+    0.471396737f, -0.881921264f,
+    0.465976496f, -0.884797098f,
+    0.460538711f, -0.887639620f,
+    0.455083587f, -0.890448723f,
+    0.449611330f, -0.893224301f,
+    0.444122145f, -0.895966250f,
+    0.438616239f, -0.898674466f,
+    0.433093819f, -0.901348847f,
+    0.427555093f, -0.903989293f,
+    0.422000271f, -0.906595705f,
+    0.416429560f, -0.909167983f,
+    0.410843171f, -0.911706032f,
+    0.405241314f, -0.914209756f,
+    0.399624200f, -0.916679060f,
+    0.393992040f, -0.919113852f,
+    0.388345047f, -0.921514039f,
+    0.382683432f, -0.923879533f,
+    0.377007410f, -0.926210242f,
+    0.371317194f, -0.928506080f,
+    0.365612998f, -0.930766961f,
+    0.359895037f, -0.932992799f,
+    0.354163525f, -0.935183510f,
+    0.348418680f, -0.937339012f,
+    0.342660717f, -0.939459224f,
+    0.336889853f, -0.941544065f,
+    0.331106306f, -0.943593458f,
+    0.325310292f, -0.945607325f,
+    0.319502031f, -0.947585591f,
+    0.313681740f, -0.949528181f,
+    0.307849640f, -0.951435021f,
+    0.302005949f, -0.953306040f,
+    0.296150888f, -0.955141168f,
+    0.290284677f, -0.956940336f,
+    0.284407537f, -0.958703475f,
+    0.278519689f, -0.960430519f,
+    0.272621355f, -0.962121404f,
+    0.266712757f, -0.963776066f,
+    0.260794118f, -0.965394442f,
+    0.254865660f, -0.966976471f,
+    0.248927606f, -0.968522094f,
+    0.242980180f, -0.970031253f,
+    0.237023606f, -0.971503891f,
+    0.231058108f, -0.972939952f,
+    0.225083911f, -0.974339383f,
+    0.219101240f, -0.975702130f,
+    0.213110320f, -0.977028143f,
+    0.207111376f, -0.978317371f,
+    0.201104635f, -0.979569766f,
+    0.195090322f, -0.980785280f,
+    0.189068664f, -0.981963869f,
+    0.183039888f, -0.983105487f,
+    0.177004220f, -0.984210092f,
+    0.170961889f, -0.985277642f,
+    0.164913120f, -0.986308097f,
+    0.158858143f, -0.987301418f,
+    0.152797185f, -0.988257568f,
+    0.146730474f, -0.989176510f,
+    0.140658239f, -0.990058210f,
+    0.134580709f, -0.990902635f,
+    0.128498111f, -0.991709754f,
+    0.122410675f, -0.992479535f,
+    0.116318631f, -0.993211949f,
+    0.110222207f, -0.993906970f,
+    0.104121634f, -0.994564571f,
+    0.098017140f, -0.995184727f,
+    0.091908956f, -0.995767414f,
+    0.085797312f, -0.996312612f,
+    0.079682438f, -0.996820299f,
+    0.073564564f, -0.997290457f,
+    0.067443920f, -0.997723067f,
+    0.061320736f, -0.998118113f,
+    0.055195244f, -0.998475581f,
+    0.049067674f, -0.998795456f,
+    0.042938257f, -0.999077728f,
+    0.036807223f, -0.999322385f,
+    0.030674803f, -0.999529418f,
+    0.024541229f, -0.999698819f,
+    0.018406730f, -0.999830582f,
+    0.012271538f, -0.999924702f,
+    0.006135885f, -0.999981175f
+};
+
+const float32_t twiddleCoef_rfft_2048[2048] = {
+    0.000000000f,  1.000000000f,
+    0.003067957f,  0.999995294f,
+    0.006135885f,  0.999981175f,
+    0.009203755f,  0.999957645f,
+    0.012271538f,  0.999924702f,
+    0.015339206f,  0.999882347f,
+    0.018406730f,  0.999830582f,
+    0.021474080f,  0.999769405f,
+    0.024541229f,  0.999698819f,
+    0.027608146f,  0.999618822f,
+    0.030674803f,  0.999529418f,
+    0.033741172f,  0.999430605f,
+    0.036807223f,  0.999322385f,
+    0.039872928f,  0.999204759f,
+    0.042938257f,  0.999077728f,
+    0.046003182f,  0.998941293f,
+    0.049067674f,  0.998795456f,
+    0.052131705f,  0.998640218f,
+    0.055195244f,  0.998475581f,
+    0.058258265f,  0.998301545f,
+    0.061320736f,  0.998118113f,
+    0.064382631f,  0.997925286f,
+    0.067443920f,  0.997723067f,
+    0.070504573f,  0.997511456f,
+    0.073564564f,  0.997290457f,
+    0.076623861f,  0.997060070f,
+    0.079682438f,  0.996820299f,
+    0.082740265f,  0.996571146f,
+    0.085797312f,  0.996312612f,
+    0.088853553f,  0.996044701f,
+    0.091908956f,  0.995767414f,
+    0.094963495f,  0.995480755f,
+    0.098017140f,  0.995184727f,
+    0.101069863f,  0.994879331f,
+    0.104121634f,  0.994564571f,
+    0.107172425f,  0.994240449f,
+    0.110222207f,  0.993906970f,
+    0.113270952f,  0.993564136f,
+    0.116318631f,  0.993211949f,
+    0.119365215f,  0.992850414f,
+    0.122410675f,  0.992479535f,
+    0.125454983f,  0.992099313f,
+    0.128498111f,  0.991709754f,
+    0.131540029f,  0.991310860f,
+    0.134580709f,  0.990902635f,
+    0.137620122f,  0.990485084f,
+    0.140658239f,  0.990058210f,
+    0.143695033f,  0.989622017f,
+    0.146730474f,  0.989176510f,
+    0.149764535f,  0.988721692f,
+    0.152797185f,  0.988257568f,
+    0.155828398f,  0.987784142f,
+    0.158858143f,  0.987301418f,
+    0.161886394f,  0.986809402f,
+    0.164913120f,  0.986308097f,
+    0.167938295f,  0.985797509f,
+    0.170961889f,  0.985277642f,
+    0.173983873f,  0.984748502f,
+    0.177004220f,  0.984210092f,
+    0.180022901f,  0.983662419f,
+    0.183039888f,  0.983105487f,
+    0.186055152f,  0.982539302f,
+    0.189068664f,  0.981963869f,
+    0.192080397f,  0.981379193f,
+    0.195090322f,  0.980785280f,
+    0.198098411f,  0.980182136f,
+    0.201104635f,  0.979569766f,
+    0.204108966f,  0.978948175f,
+    0.207111376f,  0.978317371f,
+    0.210111837f,  0.977677358f,
+    0.213110320f,  0.977028143f,
+    0.216106797f,  0.976369731f,
+    0.219101240f,  0.975702130f,
+    0.222093621f,  0.975025345f,
+    0.225083911f,  0.974339383f,
+    0.228072083f,  0.973644250f,
+    0.231058108f,  0.972939952f,
+    0.234041959f,  0.972226497f,
+    0.237023606f,  0.971503891f,
+    0.240003022f,  0.970772141f,
+    0.242980180f,  0.970031253f,
+    0.245955050f,  0.969281235f,
+    0.248927606f,  0.968522094f,
+    0.251897818f,  0.967753837f,
+    0.254865660f,  0.966976471f,
+    0.257831102f,  0.966190003f,
+    0.260794118f,  0.965394442f,
+    0.263754679f,  0.964589793f,
+    0.266712757f,  0.963776066f,
+    0.269668326f,  0.962953267f,
+    0.272621355f,  0.962121404f,
+    0.275571819f,  0.961280486f,
+    0.278519689f,  0.960430519f,
+    0.281464938f,  0.959571513f,
+    0.284407537f,  0.958703475f,
+    0.287347460f,  0.957826413f,
+    0.290284677f,  0.956940336f,
+    0.293219163f,  0.956045251f,
+    0.296150888f,  0.955141168f,
+    0.299079826f,  0.954228095f,
+    0.302005949f,  0.953306040f,
+    0.304929230f,  0.952375013f,
+    0.307849640f,  0.951435021f,
+    0.310767153f,  0.950486074f,
+    0.313681740f,  0.949528181f,
+    0.316593376f,  0.948561350f,
+    0.319502031f,  0.947585591f,
+    0.322407679f,  0.946600913f,
+    0.325310292f,  0.945607325f,
+    0.328209844f,  0.944604837f,
+    0.331106306f,  0.943593458f,
+    0.333999651f,  0.942573198f,
+    0.336889853f,  0.941544065f,
+    0.339776884f,  0.940506071f,
+    0.342660717f,  0.939459224f,
+    0.345541325f,  0.938403534f,
+    0.348418680f,  0.937339012f,
+    0.351292756f,  0.936265667f,
+    0.354163525f,  0.935183510f,
+    0.357030961f,  0.934092550f,
+    0.359895037f,  0.932992799f,
+    0.362755724f,  0.931884266f,
+    0.365612998f,  0.930766961f,
+    0.368466830f,  0.929640896f,
+    0.371317194f,  0.928506080f,
+    0.374164063f,  0.927362526f,
+    0.377007410f,  0.926210242f,
+    0.379847209f,  0.925049241f,
+    0.382683432f,  0.923879533f,
+    0.385516054f,  0.922701128f,
+    0.388345047f,  0.921514039f,
+    0.391170384f,  0.920318277f,
+    0.393992040f,  0.919113852f,
+    0.396809987f,  0.917900776f,
+    0.399624200f,  0.916679060f,
+    0.402434651f,  0.915448716f,
+    0.405241314f,  0.914209756f,
+    0.408044163f,  0.912962190f,
+    0.410843171f,  0.911706032f,
+    0.413638312f,  0.910441292f,
+    0.416429560f,  0.909167983f,
+    0.419216888f,  0.907886116f,
+    0.422000271f,  0.906595705f,
+    0.424779681f,  0.905296759f,
+    0.427555093f,  0.903989293f,
+    0.430326481f,  0.902673318f,
+    0.433093819f,  0.901348847f,
+    0.435857080f,  0.900015892f,
+    0.438616239f,  0.898674466f,
+    0.441371269f,  0.897324581f,
+    0.444122145f,  0.895966250f,
+    0.446868840f,  0.894599486f,
+    0.449611330f,  0.893224301f,
+    0.452349587f,  0.891840709f,
+    0.455083587f,  0.890448723f,
+    0.457813304f,  0.889048356f,
+    0.460538711f,  0.887639620f,
+    0.463259784f,  0.886222530f,
+    0.465976496f,  0.884797098f,
+    0.468688822f,  0.883363339f,
+    0.471396737f,  0.881921264f,
+    0.474100215f,  0.880470889f,
+    0.476799230f,  0.879012226f,
+    0.479493758f,  0.877545290f,
+    0.482183772f,  0.876070094f,
+    0.484869248f,  0.874586652f,
+    0.487550160f,  0.873094978f,
+    0.490226483f,  0.871595087f,
+    0.492898192f,  0.870086991f,
+    0.495565262f,  0.868570706f,
+    0.498227667f,  0.867046246f,
+    0.500885383f,  0.865513624f,
+    0.503538384f,  0.863972856f,
+    0.506186645f,  0.862423956f,
+    0.508830143f,  0.860866939f,
+    0.511468850f,  0.859301818f,
+    0.514102744f,  0.857728610f,
+    0.516731799f,  0.856147328f,
+    0.519355990f,  0.854557988f,
+    0.521975293f,  0.852960605f,
+    0.524589683f,  0.851355193f,
+    0.527199135f,  0.849741768f,
+    0.529803625f,  0.848120345f,
+    0.532403128f,  0.846490939f,
+    0.534997620f,  0.844853565f,
+    0.537587076f,  0.843208240f,
+    0.540171473f,  0.841554977f,
+    0.542750785f,  0.839893794f,
+    0.545324988f,  0.838224706f,
+    0.547894059f,  0.836547727f,
+    0.550457973f,  0.834862875f,
+    0.553016706f,  0.833170165f,
+    0.555570233f,  0.831469612f,
+    0.558118531f,  0.829761234f,
+    0.560661576f,  0.828045045f,
+    0.563199344f,  0.826321063f,
+    0.565731811f,  0.824589303f,
+    0.568258953f,  0.822849781f,
+    0.570780746f,  0.821102515f,
+    0.573297167f,  0.819347520f,
+    0.575808191f,  0.817584813f,
+    0.578313796f,  0.815814411f,
+    0.580813958f,  0.814036330f,
+    0.583308653f,  0.812250587f,
+    0.585797857f,  0.810457198f,
+    0.588281548f,  0.808656182f,
+    0.590759702f,  0.806847554f,
+    0.593232295f,  0.805031331f,
+    0.595699304f,  0.803207531f,
+    0.598160707f,  0.801376172f,
+    0.600616479f,  0.799537269f,
+    0.603066599f,  0.797690841f,
+    0.605511041f,  0.795836905f,
+    0.607949785f,  0.793975478f,
+    0.610382806f,  0.792106577f,
+    0.612810082f,  0.790230221f,
+    0.615231591f,  0.788346428f,
+    0.617647308f,  0.786455214f,
+    0.620057212f,  0.784556597f,
+    0.622461279f,  0.782650596f,
+    0.624859488f,  0.780737229f,
+    0.627251815f,  0.778816512f,
+    0.629638239f,  0.776888466f,
+    0.632018736f,  0.774953107f,
+    0.634393284f,  0.773010453f,
+    0.636761861f,  0.771060524f,
+    0.639124445f,  0.769103338f,
+    0.641481013f,  0.767138912f,
+    0.643831543f,  0.765167266f,
+    0.646176013f,  0.763188417f,
+    0.648514401f,  0.761202385f,
+    0.650846685f,  0.759209189f,
+    0.653172843f,  0.757208847f,
+    0.655492853f,  0.755201377f,
+    0.657806693f,  0.753186799f,
+    0.660114342f,  0.751165132f,
+    0.662415778f,  0.749136395f,
+    0.664710978f,  0.747100606f,
+    0.666999922f,  0.745057785f,
+    0.669282588f,  0.743007952f,
+    0.671558955f,  0.740951125f,
+    0.673829000f,  0.738887324f,
+    0.676092704f,  0.736816569f,
+    0.678350043f,  0.734738878f,
+    0.680600998f,  0.732654272f,
+    0.682845546f,  0.730562769f,
+    0.685083668f,  0.728464390f,
+    0.687315341f,  0.726359155f,
+    0.689540545f,  0.724247083f,
+    0.691759258f,  0.722128194f,
+    0.693971461f,  0.720002508f,
+    0.696177131f,  0.717870045f,
+    0.698376249f,  0.715730825f,
+    0.700568794f,  0.713584869f,
+    0.702754744f,  0.711432196f,
+    0.704934080f,  0.709272826f,
+    0.707106781f,  0.707106781f,
+    0.709272826f,  0.704934080f,
+    0.711432196f,  0.702754744f,
+    0.713584869f,  0.700568794f,
+    0.715730825f,  0.698376249f,
+    0.717870045f,  0.696177131f,
+    0.720002508f,  0.693971461f,
+    0.722128194f,  0.691759258f,
+    0.724247083f,  0.689540545f,
+    0.726359155f,  0.687315341f,
+    0.728464390f,  0.685083668f,
+    0.730562769f,  0.682845546f,
+    0.732654272f,  0.680600998f,
+    0.734738878f,  0.678350043f,
+    0.736816569f,  0.676092704f,
+    0.738887324f,  0.673829000f,
+    0.740951125f,  0.671558955f,
+    0.743007952f,  0.669282588f,
+    0.745057785f,  0.666999922f,
+    0.747100606f,  0.664710978f,
+    0.749136395f,  0.662415778f,
+    0.751165132f,  0.660114342f,
+    0.753186799f,  0.657806693f,
+    0.755201377f,  0.655492853f,
+    0.757208847f,  0.653172843f,
+    0.759209189f,  0.650846685f,
+    0.761202385f,  0.648514401f,
+    0.763188417f,  0.646176013f,
+    0.765167266f,  0.643831543f,
+    0.767138912f,  0.641481013f,
+    0.769103338f,  0.639124445f,
+    0.771060524f,  0.636761861f,
+    0.773010453f,  0.634393284f,
+    0.774953107f,  0.632018736f,
+    0.776888466f,  0.629638239f,
+    0.778816512f,  0.627251815f,
+    0.780737229f,  0.624859488f,
+    0.782650596f,  0.622461279f,
+    0.784556597f,  0.620057212f,
+    0.786455214f,  0.617647308f,
+    0.788346428f,  0.615231591f,
+    0.790230221f,  0.612810082f,
+    0.792106577f,  0.610382806f,
+    0.793975478f,  0.607949785f,
+    0.795836905f,  0.605511041f,
+    0.797690841f,  0.603066599f,
+    0.799537269f,  0.600616479f,
+    0.801376172f,  0.598160707f,
+    0.803207531f,  0.595699304f,
+    0.805031331f,  0.593232295f,
+    0.806847554f,  0.590759702f,
+    0.808656182f,  0.588281548f,
+    0.810457198f,  0.585797857f,
+    0.812250587f,  0.583308653f,
+    0.814036330f,  0.580813958f,
+    0.815814411f,  0.578313796f,
+    0.817584813f,  0.575808191f,
+    0.819347520f,  0.573297167f,
+    0.821102515f,  0.570780746f,
+    0.822849781f,  0.568258953f,
+    0.824589303f,  0.565731811f,
+    0.826321063f,  0.563199344f,
+    0.828045045f,  0.560661576f,
+    0.829761234f,  0.558118531f,
+    0.831469612f,  0.555570233f,
+    0.833170165f,  0.553016706f,
+    0.834862875f,  0.550457973f,
+    0.836547727f,  0.547894059f,
+    0.838224706f,  0.545324988f,
+    0.839893794f,  0.542750785f,
+    0.841554977f,  0.540171473f,
+    0.843208240f,  0.537587076f,
+    0.844853565f,  0.534997620f,
+    0.846490939f,  0.532403128f,
+    0.848120345f,  0.529803625f,
+    0.849741768f,  0.527199135f,
+    0.851355193f,  0.524589683f,
+    0.852960605f,  0.521975293f,
+    0.854557988f,  0.519355990f,
+    0.856147328f,  0.516731799f,
+    0.857728610f,  0.514102744f,
+    0.859301818f,  0.511468850f,
+    0.860866939f,  0.508830143f,
+    0.862423956f,  0.506186645f,
+    0.863972856f,  0.503538384f,
+    0.865513624f,  0.500885383f,
+    0.867046246f,  0.498227667f,
+    0.868570706f,  0.495565262f,
+    0.870086991f,  0.492898192f,
+    0.871595087f,  0.490226483f,
+    0.873094978f,  0.487550160f,
+    0.874586652f,  0.484869248f,
+    0.876070094f,  0.482183772f,
+    0.877545290f,  0.479493758f,
+    0.879012226f,  0.476799230f,
+    0.880470889f,  0.474100215f,
+    0.881921264f,  0.471396737f,
+    0.883363339f,  0.468688822f,
+    0.884797098f,  0.465976496f,
+    0.886222530f,  0.463259784f,
+    0.887639620f,  0.460538711f,
+    0.889048356f,  0.457813304f,
+    0.890448723f,  0.455083587f,
+    0.891840709f,  0.452349587f,
+    0.893224301f,  0.449611330f,
+    0.894599486f,  0.446868840f,
+    0.895966250f,  0.444122145f,
+    0.897324581f,  0.441371269f,
+    0.898674466f,  0.438616239f,
+    0.900015892f,  0.435857080f,
+    0.901348847f,  0.433093819f,
+    0.902673318f,  0.430326481f,
+    0.903989293f,  0.427555093f,
+    0.905296759f,  0.424779681f,
+    0.906595705f,  0.422000271f,
+    0.907886116f,  0.419216888f,
+    0.909167983f,  0.416429560f,
+    0.910441292f,  0.413638312f,
+    0.911706032f,  0.410843171f,
+    0.912962190f,  0.408044163f,
+    0.914209756f,  0.405241314f,
+    0.915448716f,  0.402434651f,
+    0.916679060f,  0.399624200f,
+    0.917900776f,  0.396809987f,
+    0.919113852f,  0.393992040f,
+    0.920318277f,  0.391170384f,
+    0.921514039f,  0.388345047f,
+    0.922701128f,  0.385516054f,
+    0.923879533f,  0.382683432f,
+    0.925049241f,  0.379847209f,
+    0.926210242f,  0.377007410f,
+    0.927362526f,  0.374164063f,
+    0.928506080f,  0.371317194f,
+    0.929640896f,  0.368466830f,
+    0.930766961f,  0.365612998f,
+    0.931884266f,  0.362755724f,
+    0.932992799f,  0.359895037f,
+    0.934092550f,  0.357030961f,
+    0.935183510f,  0.354163525f,
+    0.936265667f,  0.351292756f,
+    0.937339012f,  0.348418680f,
+    0.938403534f,  0.345541325f,
+    0.939459224f,  0.342660717f,
+    0.940506071f,  0.339776884f,
+    0.941544065f,  0.336889853f,
+    0.942573198f,  0.333999651f,
+    0.943593458f,  0.331106306f,
+    0.944604837f,  0.328209844f,
+    0.945607325f,  0.325310292f,
+    0.946600913f,  0.322407679f,
+    0.947585591f,  0.319502031f,
+    0.948561350f,  0.316593376f,
+    0.949528181f,  0.313681740f,
+    0.950486074f,  0.310767153f,
+    0.951435021f,  0.307849640f,
+    0.952375013f,  0.304929230f,
+    0.953306040f,  0.302005949f,
+    0.954228095f,  0.299079826f,
+    0.955141168f,  0.296150888f,
+    0.956045251f,  0.293219163f,
+    0.956940336f,  0.290284677f,
+    0.957826413f,  0.287347460f,
+    0.958703475f,  0.284407537f,
+    0.959571513f,  0.281464938f,
+    0.960430519f,  0.278519689f,
+    0.961280486f,  0.275571819f,
+    0.962121404f,  0.272621355f,
+    0.962953267f,  0.269668326f,
+    0.963776066f,  0.266712757f,
+    0.964589793f,  0.263754679f,
+    0.965394442f,  0.260794118f,
+    0.966190003f,  0.257831102f,
+    0.966976471f,  0.254865660f,
+    0.967753837f,  0.251897818f,
+    0.968522094f,  0.248927606f,
+    0.969281235f,  0.245955050f,
+    0.970031253f,  0.242980180f,
+    0.970772141f,  0.240003022f,
+    0.971503891f,  0.237023606f,
+    0.972226497f,  0.234041959f,
+    0.972939952f,  0.231058108f,
+    0.973644250f,  0.228072083f,
+    0.974339383f,  0.225083911f,
+    0.975025345f,  0.222093621f,
+    0.975702130f,  0.219101240f,
+    0.976369731f,  0.216106797f,
+    0.977028143f,  0.213110320f,
+    0.977677358f,  0.210111837f,
+    0.978317371f,  0.207111376f,
+    0.978948175f,  0.204108966f,
+    0.979569766f,  0.201104635f,
+    0.980182136f,  0.198098411f,
+    0.980785280f,  0.195090322f,
+    0.981379193f,  0.192080397f,
+    0.981963869f,  0.189068664f,
+    0.982539302f,  0.186055152f,
+    0.983105487f,  0.183039888f,
+    0.983662419f,  0.180022901f,
+    0.984210092f,  0.177004220f,
+    0.984748502f,  0.173983873f,
+    0.985277642f,  0.170961889f,
+    0.985797509f,  0.167938295f,
+    0.986308097f,  0.164913120f,
+    0.986809402f,  0.161886394f,
+    0.987301418f,  0.158858143f,
+    0.987784142f,  0.155828398f,
+    0.988257568f,  0.152797185f,
+    0.988721692f,  0.149764535f,
+    0.989176510f,  0.146730474f,
+    0.989622017f,  0.143695033f,
+    0.990058210f,  0.140658239f,
+    0.990485084f,  0.137620122f,
+    0.990902635f,  0.134580709f,
+    0.991310860f,  0.131540029f,
+    0.991709754f,  0.128498111f,
+    0.992099313f,  0.125454983f,
+    0.992479535f,  0.122410675f,
+    0.992850414f,  0.119365215f,
+    0.993211949f,  0.116318631f,
+    0.993564136f,  0.113270952f,
+    0.993906970f,  0.110222207f,
+    0.994240449f,  0.107172425f,
+    0.994564571f,  0.104121634f,
+    0.994879331f,  0.101069863f,
+    0.995184727f,  0.098017140f,
+    0.995480755f,  0.094963495f,
+    0.995767414f,  0.091908956f,
+    0.996044701f,  0.088853553f,
+    0.996312612f,  0.085797312f,
+    0.996571146f,  0.082740265f,
+    0.996820299f,  0.079682438f,
+    0.997060070f,  0.076623861f,
+    0.997290457f,  0.073564564f,
+    0.997511456f,  0.070504573f,
+    0.997723067f,  0.067443920f,
+    0.997925286f,  0.064382631f,
+    0.998118113f,  0.061320736f,
+    0.998301545f,  0.058258265f,
+    0.998475581f,  0.055195244f,
+    0.998640218f,  0.052131705f,
+    0.998795456f,  0.049067674f,
+    0.998941293f,  0.046003182f,
+    0.999077728f,  0.042938257f,
+    0.999204759f,  0.039872928f,
+    0.999322385f,  0.036807223f,
+    0.999430605f,  0.033741172f,
+    0.999529418f,  0.030674803f,
+    0.999618822f,  0.027608146f,
+    0.999698819f,  0.024541229f,
+    0.999769405f,  0.021474080f,
+    0.999830582f,  0.018406730f,
+    0.999882347f,  0.015339206f,
+    0.999924702f,  0.012271538f,
+    0.999957645f,  0.009203755f,
+    0.999981175f,  0.006135885f,
+    0.999995294f,  0.003067957f,
+    1.000000000f,  0.000000000f,
+    0.999995294f, -0.003067957f,
+    0.999981175f, -0.006135885f,
+    0.999957645f, -0.009203755f,
+    0.999924702f, -0.012271538f,
+    0.999882347f, -0.015339206f,
+    0.999830582f, -0.018406730f,
+    0.999769405f, -0.021474080f,
+    0.999698819f, -0.024541229f,
+    0.999618822f, -0.027608146f,
+    0.999529418f, -0.030674803f,
+    0.999430605f, -0.033741172f,
+    0.999322385f, -0.036807223f,
+    0.999204759f, -0.039872928f,
+    0.999077728f, -0.042938257f,
+    0.998941293f, -0.046003182f,
+    0.998795456f, -0.049067674f,
+    0.998640218f, -0.052131705f,
+    0.998475581f, -0.055195244f,
+    0.998301545f, -0.058258265f,
+    0.998118113f, -0.061320736f,
+    0.997925286f, -0.064382631f,
+    0.997723067f, -0.067443920f,
+    0.997511456f, -0.070504573f,
+    0.997290457f, -0.073564564f,
+    0.997060070f, -0.076623861f,
+    0.996820299f, -0.079682438f,
+    0.996571146f, -0.082740265f,
+    0.996312612f, -0.085797312f,
+    0.996044701f, -0.088853553f,
+    0.995767414f, -0.091908956f,
+    0.995480755f, -0.094963495f,
+    0.995184727f, -0.098017140f,
+    0.994879331f, -0.101069863f,
+    0.994564571f, -0.104121634f,
+    0.994240449f, -0.107172425f,
+    0.993906970f, -0.110222207f,
+    0.993564136f, -0.113270952f,
+    0.993211949f, -0.116318631f,
+    0.992850414f, -0.119365215f,
+    0.992479535f, -0.122410675f,
+    0.992099313f, -0.125454983f,
+    0.991709754f, -0.128498111f,
+    0.991310860f, -0.131540029f,
+    0.990902635f, -0.134580709f,
+    0.990485084f, -0.137620122f,
+    0.990058210f, -0.140658239f,
+    0.989622017f, -0.143695033f,
+    0.989176510f, -0.146730474f,
+    0.988721692f, -0.149764535f,
+    0.988257568f, -0.152797185f,
+    0.987784142f, -0.155828398f,
+    0.987301418f, -0.158858143f,
+    0.986809402f, -0.161886394f,
+    0.986308097f, -0.164913120f,
+    0.985797509f, -0.167938295f,
+    0.985277642f, -0.170961889f,
+    0.984748502f, -0.173983873f,
+    0.984210092f, -0.177004220f,
+    0.983662419f, -0.180022901f,
+    0.983105487f, -0.183039888f,
+    0.982539302f, -0.186055152f,
+    0.981963869f, -0.189068664f,
+    0.981379193f, -0.192080397f,
+    0.980785280f, -0.195090322f,
+    0.980182136f, -0.198098411f,
+    0.979569766f, -0.201104635f,
+    0.978948175f, -0.204108966f,
+    0.978317371f, -0.207111376f,
+    0.977677358f, -0.210111837f,
+    0.977028143f, -0.213110320f,
+    0.976369731f, -0.216106797f,
+    0.975702130f, -0.219101240f,
+    0.975025345f, -0.222093621f,
+    0.974339383f, -0.225083911f,
+    0.973644250f, -0.228072083f,
+    0.972939952f, -0.231058108f,
+    0.972226497f, -0.234041959f,
+    0.971503891f, -0.237023606f,
+    0.970772141f, -0.240003022f,
+    0.970031253f, -0.242980180f,
+    0.969281235f, -0.245955050f,
+    0.968522094f, -0.248927606f,
+    0.967753837f, -0.251897818f,
+    0.966976471f, -0.254865660f,
+    0.966190003f, -0.257831102f,
+    0.965394442f, -0.260794118f,
+    0.964589793f, -0.263754679f,
+    0.963776066f, -0.266712757f,
+    0.962953267f, -0.269668326f,
+    0.962121404f, -0.272621355f,
+    0.961280486f, -0.275571819f,
+    0.960430519f, -0.278519689f,
+    0.959571513f, -0.281464938f,
+    0.958703475f, -0.284407537f,
+    0.957826413f, -0.287347460f,
+    0.956940336f, -0.290284677f,
+    0.956045251f, -0.293219163f,
+    0.955141168f, -0.296150888f,
+    0.954228095f, -0.299079826f,
+    0.953306040f, -0.302005949f,
+    0.952375013f, -0.304929230f,
+    0.951435021f, -0.307849640f,
+    0.950486074f, -0.310767153f,
+    0.949528181f, -0.313681740f,
+    0.948561350f, -0.316593376f,
+    0.947585591f, -0.319502031f,
+    0.946600913f, -0.322407679f,
+    0.945607325f, -0.325310292f,
+    0.944604837f, -0.328209844f,
+    0.943593458f, -0.331106306f,
+    0.942573198f, -0.333999651f,
+    0.941544065f, -0.336889853f,
+    0.940506071f, -0.339776884f,
+    0.939459224f, -0.342660717f,
+    0.938403534f, -0.345541325f,
+    0.937339012f, -0.348418680f,
+    0.936265667f, -0.351292756f,
+    0.935183510f, -0.354163525f,
+    0.934092550f, -0.357030961f,
+    0.932992799f, -0.359895037f,
+    0.931884266f, -0.362755724f,
+    0.930766961f, -0.365612998f,
+    0.929640896f, -0.368466830f,
+    0.928506080f, -0.371317194f,
+    0.927362526f, -0.374164063f,
+    0.926210242f, -0.377007410f,
+    0.925049241f, -0.379847209f,
+    0.923879533f, -0.382683432f,
+    0.922701128f, -0.385516054f,
+    0.921514039f, -0.388345047f,
+    0.920318277f, -0.391170384f,
+    0.919113852f, -0.393992040f,
+    0.917900776f, -0.396809987f,
+    0.916679060f, -0.399624200f,
+    0.915448716f, -0.402434651f,
+    0.914209756f, -0.405241314f,
+    0.912962190f, -0.408044163f,
+    0.911706032f, -0.410843171f,
+    0.910441292f, -0.413638312f,
+    0.909167983f, -0.416429560f,
+    0.907886116f, -0.419216888f,
+    0.906595705f, -0.422000271f,
+    0.905296759f, -0.424779681f,
+    0.903989293f, -0.427555093f,
+    0.902673318f, -0.430326481f,
+    0.901348847f, -0.433093819f,
+    0.900015892f, -0.435857080f,
+    0.898674466f, -0.438616239f,
+    0.897324581f, -0.441371269f,
+    0.895966250f, -0.444122145f,
+    0.894599486f, -0.446868840f,
+    0.893224301f, -0.449611330f,
+    0.891840709f, -0.452349587f,
+    0.890448723f, -0.455083587f,
+    0.889048356f, -0.457813304f,
+    0.887639620f, -0.460538711f,
+    0.886222530f, -0.463259784f,
+    0.884797098f, -0.465976496f,
+    0.883363339f, -0.468688822f,
+    0.881921264f, -0.471396737f,
+    0.880470889f, -0.474100215f,
+    0.879012226f, -0.476799230f,
+    0.877545290f, -0.479493758f,
+    0.876070094f, -0.482183772f,
+    0.874586652f, -0.484869248f,
+    0.873094978f, -0.487550160f,
+    0.871595087f, -0.490226483f,
+    0.870086991f, -0.492898192f,
+    0.868570706f, -0.495565262f,
+    0.867046246f, -0.498227667f,
+    0.865513624f, -0.500885383f,
+    0.863972856f, -0.503538384f,
+    0.862423956f, -0.506186645f,
+    0.860866939f, -0.508830143f,
+    0.859301818f, -0.511468850f,
+    0.857728610f, -0.514102744f,
+    0.856147328f, -0.516731799f,
+    0.854557988f, -0.519355990f,
+    0.852960605f, -0.521975293f,
+    0.851355193f, -0.524589683f,
+    0.849741768f, -0.527199135f,
+    0.848120345f, -0.529803625f,
+    0.846490939f, -0.532403128f,
+    0.844853565f, -0.534997620f,
+    0.843208240f, -0.537587076f,
+    0.841554977f, -0.540171473f,
+    0.839893794f, -0.542750785f,
+    0.838224706f, -0.545324988f,
+    0.836547727f, -0.547894059f,
+    0.834862875f, -0.550457973f,
+    0.833170165f, -0.553016706f,
+    0.831469612f, -0.555570233f,
+    0.829761234f, -0.558118531f,
+    0.828045045f, -0.560661576f,
+    0.826321063f, -0.563199344f,
+    0.824589303f, -0.565731811f,
+    0.822849781f, -0.568258953f,
+    0.821102515f, -0.570780746f,
+    0.819347520f, -0.573297167f,
+    0.817584813f, -0.575808191f,
+    0.815814411f, -0.578313796f,
+    0.814036330f, -0.580813958f,
+    0.812250587f, -0.583308653f,
+    0.810457198f, -0.585797857f,
+    0.808656182f, -0.588281548f,
+    0.806847554f, -0.590759702f,
+    0.805031331f, -0.593232295f,
+    0.803207531f, -0.595699304f,
+    0.801376172f, -0.598160707f,
+    0.799537269f, -0.600616479f,
+    0.797690841f, -0.603066599f,
+    0.795836905f, -0.605511041f,
+    0.793975478f, -0.607949785f,
+    0.792106577f, -0.610382806f,
+    0.790230221f, -0.612810082f,
+    0.788346428f, -0.615231591f,
+    0.786455214f, -0.617647308f,
+    0.784556597f, -0.620057212f,
+    0.782650596f, -0.622461279f,
+    0.780737229f, -0.624859488f,
+    0.778816512f, -0.627251815f,
+    0.776888466f, -0.629638239f,
+    0.774953107f, -0.632018736f,
+    0.773010453f, -0.634393284f,
+    0.771060524f, -0.636761861f,
+    0.769103338f, -0.639124445f,
+    0.767138912f, -0.641481013f,
+    0.765167266f, -0.643831543f,
+    0.763188417f, -0.646176013f,
+    0.761202385f, -0.648514401f,
+    0.759209189f, -0.650846685f,
+    0.757208847f, -0.653172843f,
+    0.755201377f, -0.655492853f,
+    0.753186799f, -0.657806693f,
+    0.751165132f, -0.660114342f,
+    0.749136395f, -0.662415778f,
+    0.747100606f, -0.664710978f,
+    0.745057785f, -0.666999922f,
+    0.743007952f, -0.669282588f,
+    0.740951125f, -0.671558955f,
+    0.738887324f, -0.673829000f,
+    0.736816569f, -0.676092704f,
+    0.734738878f, -0.678350043f,
+    0.732654272f, -0.680600998f,
+    0.730562769f, -0.682845546f,
+    0.728464390f, -0.685083668f,
+    0.726359155f, -0.687315341f,
+    0.724247083f, -0.689540545f,
+    0.722128194f, -0.691759258f,
+    0.720002508f, -0.693971461f,
+    0.717870045f, -0.696177131f,
+    0.715730825f, -0.698376249f,
+    0.713584869f, -0.700568794f,
+    0.711432196f, -0.702754744f,
+    0.709272826f, -0.704934080f,
+    0.707106781f, -0.707106781f,
+    0.704934080f, -0.709272826f,
+    0.702754744f, -0.711432196f,
+    0.700568794f, -0.713584869f,
+    0.698376249f, -0.715730825f,
+    0.696177131f, -0.717870045f,
+    0.693971461f, -0.720002508f,
+    0.691759258f, -0.722128194f,
+    0.689540545f, -0.724247083f,
+    0.687315341f, -0.726359155f,
+    0.685083668f, -0.728464390f,
+    0.682845546f, -0.730562769f,
+    0.680600998f, -0.732654272f,
+    0.678350043f, -0.734738878f,
+    0.676092704f, -0.736816569f,
+    0.673829000f, -0.738887324f,
+    0.671558955f, -0.740951125f,
+    0.669282588f, -0.743007952f,
+    0.666999922f, -0.745057785f,
+    0.664710978f, -0.747100606f,
+    0.662415778f, -0.749136395f,
+    0.660114342f, -0.751165132f,
+    0.657806693f, -0.753186799f,
+    0.655492853f, -0.755201377f,
+    0.653172843f, -0.757208847f,
+    0.650846685f, -0.759209189f,
+    0.648514401f, -0.761202385f,
+    0.646176013f, -0.763188417f,
+    0.643831543f, -0.765167266f,
+    0.641481013f, -0.767138912f,
+    0.639124445f, -0.769103338f,
+    0.636761861f, -0.771060524f,
+    0.634393284f, -0.773010453f,
+    0.632018736f, -0.774953107f,
+    0.629638239f, -0.776888466f,
+    0.627251815f, -0.778816512f,
+    0.624859488f, -0.780737229f,
+    0.622461279f, -0.782650596f,
+    0.620057212f, -0.784556597f,
+    0.617647308f, -0.786455214f,
+    0.615231591f, -0.788346428f,
+    0.612810082f, -0.790230221f,
+    0.610382806f, -0.792106577f,
+    0.607949785f, -0.793975478f,
+    0.605511041f, -0.795836905f,
+    0.603066599f, -0.797690841f,
+    0.600616479f, -0.799537269f,
+    0.598160707f, -0.801376172f,
+    0.595699304f, -0.803207531f,
+    0.593232295f, -0.805031331f,
+    0.590759702f, -0.806847554f,
+    0.588281548f, -0.808656182f,
+    0.585797857f, -0.810457198f,
+    0.583308653f, -0.812250587f,
+    0.580813958f, -0.814036330f,
+    0.578313796f, -0.815814411f,
+    0.575808191f, -0.817584813f,
+    0.573297167f, -0.819347520f,
+    0.570780746f, -0.821102515f,
+    0.568258953f, -0.822849781f,
+    0.565731811f, -0.824589303f,
+    0.563199344f, -0.826321063f,
+    0.560661576f, -0.828045045f,
+    0.558118531f, -0.829761234f,
+    0.555570233f, -0.831469612f,
+    0.553016706f, -0.833170165f,
+    0.550457973f, -0.834862875f,
+    0.547894059f, -0.836547727f,
+    0.545324988f, -0.838224706f,
+    0.542750785f, -0.839893794f,
+    0.540171473f, -0.841554977f,
+    0.537587076f, -0.843208240f,
+    0.534997620f, -0.844853565f,
+    0.532403128f, -0.846490939f,
+    0.529803625f, -0.848120345f,
+    0.527199135f, -0.849741768f,
+    0.524589683f, -0.851355193f,
+    0.521975293f, -0.852960605f,
+    0.519355990f, -0.854557988f,
+    0.516731799f, -0.856147328f,
+    0.514102744f, -0.857728610f,
+    0.511468850f, -0.859301818f,
+    0.508830143f, -0.860866939f,
+    0.506186645f, -0.862423956f,
+    0.503538384f, -0.863972856f,
+    0.500885383f, -0.865513624f,
+    0.498227667f, -0.867046246f,
+    0.495565262f, -0.868570706f,
+    0.492898192f, -0.870086991f,
+    0.490226483f, -0.871595087f,
+    0.487550160f, -0.873094978f,
+    0.484869248f, -0.874586652f,
+    0.482183772f, -0.876070094f,
+    0.479493758f, -0.877545290f,
+    0.476799230f, -0.879012226f,
+    0.474100215f, -0.880470889f,
+    0.471396737f, -0.881921264f,
+    0.468688822f, -0.883363339f,
+    0.465976496f, -0.884797098f,
+    0.463259784f, -0.886222530f,
+    0.460538711f, -0.887639620f,
+    0.457813304f, -0.889048356f,
+    0.455083587f, -0.890448723f,
+    0.452349587f, -0.891840709f,
+    0.449611330f, -0.893224301f,
+    0.446868840f, -0.894599486f,
+    0.444122145f, -0.895966250f,
+    0.441371269f, -0.897324581f,
+    0.438616239f, -0.898674466f,
+    0.435857080f, -0.900015892f,
+    0.433093819f, -0.901348847f,
+    0.430326481f, -0.902673318f,
+    0.427555093f, -0.903989293f,
+    0.424779681f, -0.905296759f,
+    0.422000271f, -0.906595705f,
+    0.419216888f, -0.907886116f,
+    0.416429560f, -0.909167983f,
+    0.413638312f, -0.910441292f,
+    0.410843171f, -0.911706032f,
+    0.408044163f, -0.912962190f,
+    0.405241314f, -0.914209756f,
+    0.402434651f, -0.915448716f,
+    0.399624200f, -0.916679060f,
+    0.396809987f, -0.917900776f,
+    0.393992040f, -0.919113852f,
+    0.391170384f, -0.920318277f,
+    0.388345047f, -0.921514039f,
+    0.385516054f, -0.922701128f,
+    0.382683432f, -0.923879533f,
+    0.379847209f, -0.925049241f,
+    0.377007410f, -0.926210242f,
+    0.374164063f, -0.927362526f,
+    0.371317194f, -0.928506080f,
+    0.368466830f, -0.929640896f,
+    0.365612998f, -0.930766961f,
+    0.362755724f, -0.931884266f,
+    0.359895037f, -0.932992799f,
+    0.357030961f, -0.934092550f,
+    0.354163525f, -0.935183510f,
+    0.351292756f, -0.936265667f,
+    0.348418680f, -0.937339012f,
+    0.345541325f, -0.938403534f,
+    0.342660717f, -0.939459224f,
+    0.339776884f, -0.940506071f,
+    0.336889853f, -0.941544065f,
+    0.333999651f, -0.942573198f,
+    0.331106306f, -0.943593458f,
+    0.328209844f, -0.944604837f,
+    0.325310292f, -0.945607325f,
+    0.322407679f, -0.946600913f,
+    0.319502031f, -0.947585591f,
+    0.316593376f, -0.948561350f,
+    0.313681740f, -0.949528181f,
+    0.310767153f, -0.950486074f,
+    0.307849640f, -0.951435021f,
+    0.304929230f, -0.952375013f,
+    0.302005949f, -0.953306040f,
+    0.299079826f, -0.954228095f,
+    0.296150888f, -0.955141168f,
+    0.293219163f, -0.956045251f,
+    0.290284677f, -0.956940336f,
+    0.287347460f, -0.957826413f,
+    0.284407537f, -0.958703475f,
+    0.281464938f, -0.959571513f,
+    0.278519689f, -0.960430519f,
+    0.275571819f, -0.961280486f,
+    0.272621355f, -0.962121404f,
+    0.269668326f, -0.962953267f,
+    0.266712757f, -0.963776066f,
+    0.263754679f, -0.964589793f,
+    0.260794118f, -0.965394442f,
+    0.257831102f, -0.966190003f,
+    0.254865660f, -0.966976471f,
+    0.251897818f, -0.967753837f,
+    0.248927606f, -0.968522094f,
+    0.245955050f, -0.969281235f,
+    0.242980180f, -0.970031253f,
+    0.240003022f, -0.970772141f,
+    0.237023606f, -0.971503891f,
+    0.234041959f, -0.972226497f,
+    0.231058108f, -0.972939952f,
+    0.228072083f, -0.973644250f,
+    0.225083911f, -0.974339383f,
+    0.222093621f, -0.975025345f,
+    0.219101240f, -0.975702130f,
+    0.216106797f, -0.976369731f,
+    0.213110320f, -0.977028143f,
+    0.210111837f, -0.977677358f,
+    0.207111376f, -0.978317371f,
+    0.204108966f, -0.978948175f,
+    0.201104635f, -0.979569766f,
+    0.198098411f, -0.980182136f,
+    0.195090322f, -0.980785280f,
+    0.192080397f, -0.981379193f,
+    0.189068664f, -0.981963869f,
+    0.186055152f, -0.982539302f,
+    0.183039888f, -0.983105487f,
+    0.180022901f, -0.983662419f,
+    0.177004220f, -0.984210092f,
+    0.173983873f, -0.984748502f,
+    0.170961889f, -0.985277642f,
+    0.167938295f, -0.985797509f,
+    0.164913120f, -0.986308097f,
+    0.161886394f, -0.986809402f,
+    0.158858143f, -0.987301418f,
+    0.155828398f, -0.987784142f,
+    0.152797185f, -0.988257568f,
+    0.149764535f, -0.988721692f,
+    0.146730474f, -0.989176510f,
+    0.143695033f, -0.989622017f,
+    0.140658239f, -0.990058210f,
+    0.137620122f, -0.990485084f,
+    0.134580709f, -0.990902635f,
+    0.131540029f, -0.991310860f,
+    0.128498111f, -0.991709754f,
+    0.125454983f, -0.992099313f,
+    0.122410675f, -0.992479535f,
+    0.119365215f, -0.992850414f,
+    0.116318631f, -0.993211949f,
+    0.113270952f, -0.993564136f,
+    0.110222207f, -0.993906970f,
+    0.107172425f, -0.994240449f,
+    0.104121634f, -0.994564571f,
+    0.101069863f, -0.994879331f,
+    0.098017140f, -0.995184727f,
+    0.094963495f, -0.995480755f,
+    0.091908956f, -0.995767414f,
+    0.088853553f, -0.996044701f,
+    0.085797312f, -0.996312612f,
+    0.082740265f, -0.996571146f,
+    0.079682438f, -0.996820299f,
+    0.076623861f, -0.997060070f,
+    0.073564564f, -0.997290457f,
+    0.070504573f, -0.997511456f,
+    0.067443920f, -0.997723067f,
+    0.064382631f, -0.997925286f,
+    0.061320736f, -0.998118113f,
+    0.058258265f, -0.998301545f,
+    0.055195244f, -0.998475581f,
+    0.052131705f, -0.998640218f,
+    0.049067674f, -0.998795456f,
+    0.046003182f, -0.998941293f,
+    0.042938257f, -0.999077728f,
+    0.039872928f, -0.999204759f,
+    0.036807223f, -0.999322385f,
+    0.033741172f, -0.999430605f,
+    0.030674803f, -0.999529418f,
+    0.027608146f, -0.999618822f,
+    0.024541229f, -0.999698819f,
+    0.021474080f, -0.999769405f,
+    0.018406730f, -0.999830582f,
+    0.015339206f, -0.999882347f,
+    0.012271538f, -0.999924702f,
+    0.009203755f, -0.999957645f,
+    0.006135885f, -0.999981175f,
+    0.003067957f, -0.999995294f
+};
+
+const float32_t twiddleCoef_rfft_4096[4096] = {
+    0.000000000f,  1.000000000f,
+    0.001533980f,  0.999998823f,
+    0.003067957f,  0.999995294f,
+    0.004601926f,  0.999989411f,
+    0.006135885f,  0.999981175f,
+    0.007669829f,  0.999970586f,
+    0.009203755f,  0.999957645f,
+    0.010737659f,  0.999942350f,
+    0.012271538f,  0.999924702f,
+    0.013805389f,  0.999904701f,
+    0.015339206f,  0.999882347f,
+    0.016872988f,  0.999857641f,
+    0.018406730f,  0.999830582f,
+    0.019940429f,  0.999801170f,
+    0.021474080f,  0.999769405f,
+    0.023007681f,  0.999735288f,
+    0.024541229f,  0.999698819f,
+    0.026074718f,  0.999659997f,
+    0.027608146f,  0.999618822f,
+    0.029141509f,  0.999575296f,
+    0.030674803f,  0.999529418f,
+    0.032208025f,  0.999481187f,
+    0.033741172f,  0.999430605f,
+    0.035274239f,  0.999377670f,
+    0.036807223f,  0.999322385f,
+    0.038340120f,  0.999264747f,
+    0.039872928f,  0.999204759f,
+    0.041405641f,  0.999142419f,
+    0.042938257f,  0.999077728f,
+    0.044470772f,  0.999010686f,
+    0.046003182f,  0.998941293f,
+    0.047535484f,  0.998869550f,
+    0.049067674f,  0.998795456f,
+    0.050599749f,  0.998719012f,
+    0.052131705f,  0.998640218f,
+    0.053663538f,  0.998559074f,
+    0.055195244f,  0.998475581f,
+    0.056726821f,  0.998389737f,
+    0.058258265f,  0.998301545f,
+    0.059789571f,  0.998211003f,
+    0.061320736f,  0.998118113f,
+    0.062851758f,  0.998022874f,
+    0.064382631f,  0.997925286f,
+    0.065913353f,  0.997825350f,
+    0.067443920f,  0.997723067f,
+    0.068974328f,  0.997618435f,
+    0.070504573f,  0.997511456f,
+    0.072034653f,  0.997402130f,
+    0.073564564f,  0.997290457f,
+    0.075094301f,  0.997176437f,
+    0.076623861f,  0.997060070f,
+    0.078153242f,  0.996941358f,
+    0.079682438f,  0.996820299f,
+    0.081211447f,  0.996696895f,
+    0.082740265f,  0.996571146f,
+    0.084268888f,  0.996443051f,
+    0.085797312f,  0.996312612f,
+    0.087325535f,  0.996179829f,
+    0.088853553f,  0.996044701f,
+    0.090381361f,  0.995907229f,
+    0.091908956f,  0.995767414f,
+    0.093436336f,  0.995625256f,
+    0.094963495f,  0.995480755f,
+    0.096490431f,  0.995333912f,
+    0.098017140f,  0.995184727f,
+    0.099543619f,  0.995033199f,
+    0.101069863f,  0.994879331f,
+    0.102595869f,  0.994723121f,
+    0.104121634f,  0.994564571f,
+    0.105647154f,  0.994403680f,
+    0.107172425f,  0.994240449f,
+    0.108697444f,  0.994074879f,
+    0.110222207f,  0.993906970f,
+    0.111746711f,  0.993736722f,
+    0.113270952f,  0.993564136f,
+    0.114794927f,  0.993389211f,
+    0.116318631f,  0.993211949f,
+    0.117842062f,  0.993032350f,
+    0.119365215f,  0.992850414f,
+    0.120888087f,  0.992666142f,
+    0.122410675f,  0.992479535f,
+    0.123932975f,  0.992290591f,
+    0.125454983f,  0.992099313f,
+    0.126976696f,  0.991905700f,
+    0.128498111f,  0.991709754f,
+    0.130019223f,  0.991511473f,
+    0.131540029f,  0.991310860f,
+    0.133060525f,  0.991107914f,
+    0.134580709f,  0.990902635f,
+    0.136100575f,  0.990695025f,
+    0.137620122f,  0.990485084f,
+    0.139139344f,  0.990272812f,
+    0.140658239f,  0.990058210f,
+    0.142176804f,  0.989841278f,
+    0.143695033f,  0.989622017f,
+    0.145212925f,  0.989400428f,
+    0.146730474f,  0.989176510f,
+    0.148247679f,  0.988950265f,
+    0.149764535f,  0.988721692f,
+    0.151281038f,  0.988490793f,
+    0.152797185f,  0.988257568f,
+    0.154312973f,  0.988022017f,
+    0.155828398f,  0.987784142f,
+    0.157343456f,  0.987543942f,
+    0.158858143f,  0.987301418f,
+    0.160372457f,  0.987056571f,
+    0.161886394f,  0.986809402f,
+    0.163399949f,  0.986559910f,
+    0.164913120f,  0.986308097f,
+    0.166425904f,  0.986053963f,
+    0.167938295f,  0.985797509f,
+    0.169450291f,  0.985538735f,
+    0.170961889f,  0.985277642f,
+    0.172473084f,  0.985014231f,
+    0.173983873f,  0.984748502f,
+    0.175494253f,  0.984480455f,
+    0.177004220f,  0.984210092f,
+    0.178513771f,  0.983937413f,
+    0.180022901f,  0.983662419f,
+    0.181531608f,  0.983385110f,
+    0.183039888f,  0.983105487f,
+    0.184547737f,  0.982823551f,
+    0.186055152f,  0.982539302f,
+    0.187562129f,  0.982252741f,
+    0.189068664f,  0.981963869f,
+    0.190574755f,  0.981672686f,
+    0.192080397f,  0.981379193f,
+    0.193585587f,  0.981083391f,
+    0.195090322f,  0.980785280f,
+    0.196594598f,  0.980484862f,
+    0.198098411f,  0.980182136f,
+    0.199601758f,  0.979877104f,
+    0.201104635f,  0.979569766f,
+    0.202607039f,  0.979260123f,
+    0.204108966f,  0.978948175f,
+    0.205610413f,  0.978633924f,
+    0.207111376f,  0.978317371f,
+    0.208611852f,  0.977998515f,
+    0.210111837f,  0.977677358f,
+    0.211611327f,  0.977353900f,
+    0.213110320f,  0.977028143f,
+    0.214608811f,  0.976700086f,
+    0.216106797f,  0.976369731f,
+    0.217604275f,  0.976037079f,
+    0.219101240f,  0.975702130f,
+    0.220597690f,  0.975364885f,
+    0.222093621f,  0.975025345f,
+    0.223589029f,  0.974683511f,
+    0.225083911f,  0.974339383f,
+    0.226578264f,  0.973992962f,
+    0.228072083f,  0.973644250f,
+    0.229565366f,  0.973293246f,
+    0.231058108f,  0.972939952f,
+    0.232550307f,  0.972584369f,
+    0.234041959f,  0.972226497f,
+    0.235533059f,  0.971866337f,
+    0.237023606f,  0.971503891f,
+    0.238513595f,  0.971139158f,
+    0.240003022f,  0.970772141f,
+    0.241491885f,  0.970402839f,
+    0.242980180f,  0.970031253f,
+    0.244467903f,  0.969657385f,
+    0.245955050f,  0.969281235f,
+    0.247441619f,  0.968902805f,
+    0.248927606f,  0.968522094f,
+    0.250413007f,  0.968139105f,
+    0.251897818f,  0.967753837f,
+    0.253382037f,  0.967366292f,
+    0.254865660f,  0.966976471f,
+    0.256348682f,  0.966584374f,
+    0.257831102f,  0.966190003f,
+    0.259312915f,  0.965793359f,
+    0.260794118f,  0.965394442f,
+    0.262274707f,  0.964993253f,
+    0.263754679f,  0.964589793f,
+    0.265234030f,  0.964184064f,
+    0.266712757f,  0.963776066f,
+    0.268190857f,  0.963365800f,
+    0.269668326f,  0.962953267f,
+    0.271145160f,  0.962538468f,
+    0.272621355f,  0.962121404f,
+    0.274096910f,  0.961702077f,
+    0.275571819f,  0.961280486f,
+    0.277046080f,  0.960856633f,
+    0.278519689f,  0.960430519f,
+    0.279992643f,  0.960002146f,
+    0.281464938f,  0.959571513f,
+    0.282936570f,  0.959138622f,
+    0.284407537f,  0.958703475f,
+    0.285877835f,  0.958266071f,
+    0.287347460f,  0.957826413f,
+    0.288816408f,  0.957384501f,
+    0.290284677f,  0.956940336f,
+    0.291752263f,  0.956493919f,
+    0.293219163f,  0.956045251f,
+    0.294685372f,  0.955594334f,
+    0.296150888f,  0.955141168f,
+    0.297615707f,  0.954685755f,
+    0.299079826f,  0.954228095f,
+    0.300543241f,  0.953768190f,
+    0.302005949f,  0.953306040f,
+    0.303467947f,  0.952841648f,
+    0.304929230f,  0.952375013f,
+    0.306389795f,  0.951906137f,
+    0.307849640f,  0.951435021f,
+    0.309308760f,  0.950961666f,
+    0.310767153f,  0.950486074f,
+    0.312224814f,  0.950008245f,
+    0.313681740f,  0.949528181f,
+    0.315137929f,  0.949045882f,
+    0.316593376f,  0.948561350f,
+    0.318048077f,  0.948074586f,
+    0.319502031f,  0.947585591f,
+    0.320955232f,  0.947094366f,
+    0.322407679f,  0.946600913f,
+    0.323859367f,  0.946105232f,
+    0.325310292f,  0.945607325f,
+    0.326760452f,  0.945107193f,
+    0.328209844f,  0.944604837f,
+    0.329658463f,  0.944100258f,
+    0.331106306f,  0.943593458f,
+    0.332553370f,  0.943084437f,
+    0.333999651f,  0.942573198f,
+    0.335445147f,  0.942059740f,
+    0.336889853f,  0.941544065f,
+    0.338333767f,  0.941026175f,
+    0.339776884f,  0.940506071f,
+    0.341219202f,  0.939983753f,
+    0.342660717f,  0.939459224f,
+    0.344101426f,  0.938932484f,
+    0.345541325f,  0.938403534f,
+    0.346980411f,  0.937872376f,
+    0.348418680f,  0.937339012f,
+    0.349856130f,  0.936803442f,
+    0.351292756f,  0.936265667f,
+    0.352728556f,  0.935725689f,
+    0.354163525f,  0.935183510f,
+    0.355597662f,  0.934639130f,
+    0.357030961f,  0.934092550f,
+    0.358463421f,  0.933543773f,
+    0.359895037f,  0.932992799f,
+    0.361325806f,  0.932439629f,
+    0.362755724f,  0.931884266f,
+    0.364184790f,  0.931326709f,
+    0.365612998f,  0.930766961f,
+    0.367040346f,  0.930205023f,
+    0.368466830f,  0.929640896f,
+    0.369892447f,  0.929074581f,
+    0.371317194f,  0.928506080f,
+    0.372741067f,  0.927935395f,
+    0.374164063f,  0.927362526f,
+    0.375586178f,  0.926787474f,
+    0.377007410f,  0.926210242f,
+    0.378427755f,  0.925630831f,
+    0.379847209f,  0.925049241f,
+    0.381265769f,  0.924465474f,
+    0.382683432f,  0.923879533f,
+    0.384100195f,  0.923291417f,
+    0.385516054f,  0.922701128f,
+    0.386931006f,  0.922108669f,
+    0.388345047f,  0.921514039f,
+    0.389758174f,  0.920917242f,
+    0.391170384f,  0.920318277f,
+    0.392581674f,  0.919717146f,
+    0.393992040f,  0.919113852f,
+    0.395401479f,  0.918508394f,
+    0.396809987f,  0.917900776f,
+    0.398217562f,  0.917290997f,
+    0.399624200f,  0.916679060f,
+    0.401029897f,  0.916064966f,
+    0.402434651f,  0.915448716f,
+    0.403838458f,  0.914830312f,
+    0.405241314f,  0.914209756f,
+    0.406643217f,  0.913587048f,
+    0.408044163f,  0.912962190f,
+    0.409444149f,  0.912335185f,
+    0.410843171f,  0.911706032f,
+    0.412241227f,  0.911074734f,
+    0.413638312f,  0.910441292f,
+    0.415034424f,  0.909805708f,
+    0.416429560f,  0.909167983f,
+    0.417823716f,  0.908528119f,
+    0.419216888f,  0.907886116f,
+    0.420609074f,  0.907241978f,
+    0.422000271f,  0.906595705f,
+    0.423390474f,  0.905947298f,
+    0.424779681f,  0.905296759f,
+    0.426167889f,  0.904644091f,
+    0.427555093f,  0.903989293f,
+    0.428941292f,  0.903332368f,
+    0.430326481f,  0.902673318f,
+    0.431710658f,  0.902012144f,
+    0.433093819f,  0.901348847f,
+    0.434475961f,  0.900683429f,
+    0.435857080f,  0.900015892f,
+    0.437237174f,  0.899346237f,
+    0.438616239f,  0.898674466f,
+    0.439994271f,  0.898000580f,
+    0.441371269f,  0.897324581f,
+    0.442747228f,  0.896646470f,
+    0.444122145f,  0.895966250f,
+    0.445496017f,  0.895283921f,
+    0.446868840f,  0.894599486f,
+    0.448240612f,  0.893912945f,
+    0.449611330f,  0.893224301f,
+    0.450980989f,  0.892533555f,
+    0.452349587f,  0.891840709f,
+    0.453717121f,  0.891145765f,
+    0.455083587f,  0.890448723f,
+    0.456448982f,  0.889749586f,
+    0.457813304f,  0.889048356f,
+    0.459176548f,  0.888345033f,
+    0.460538711f,  0.887639620f,
+    0.461899791f,  0.886932119f,
+    0.463259784f,  0.886222530f,
+    0.464618686f,  0.885510856f,
+    0.465976496f,  0.884797098f,
+    0.467333209f,  0.884081259f,
+    0.468688822f,  0.883363339f,
+    0.470043332f,  0.882643340f,
+    0.471396737f,  0.881921264f,
+    0.472749032f,  0.881197113f,
+    0.474100215f,  0.880470889f,
+    0.475450282f,  0.879742593f,
+    0.476799230f,  0.879012226f,
+    0.478147056f,  0.878279792f,
+    0.479493758f,  0.877545290f,
+    0.480839331f,  0.876808724f,
+    0.482183772f,  0.876070094f,
+    0.483527079f,  0.875329403f,
+    0.484869248f,  0.874586652f,
+    0.486210276f,  0.873841843f,
+    0.487550160f,  0.873094978f,
+    0.488888897f,  0.872346059f,
+    0.490226483f,  0.871595087f,
+    0.491562916f,  0.870842063f,
+    0.492898192f,  0.870086991f,
+    0.494232309f,  0.869329871f,
+    0.495565262f,  0.868570706f,
+    0.496897049f,  0.867809497f,
+    0.498227667f,  0.867046246f,
+    0.499557113f,  0.866280954f,
+    0.500885383f,  0.865513624f,
+    0.502212474f,  0.864744258f,
+    0.503538384f,  0.863972856f,
+    0.504863109f,  0.863199422f,
+    0.506186645f,  0.862423956f,
+    0.507508991f,  0.861646461f,
+    0.508830143f,  0.860866939f,
+    0.510150097f,  0.860085390f,
+    0.511468850f,  0.859301818f,
+    0.512786401f,  0.858516224f,
+    0.514102744f,  0.857728610f,
+    0.515417878f,  0.856938977f,
+    0.516731799f,  0.856147328f,
+    0.518044504f,  0.855353665f,
+    0.519355990f,  0.854557988f,
+    0.520666254f,  0.853760301f,
+    0.521975293f,  0.852960605f,
+    0.523283103f,  0.852158902f,
+    0.524589683f,  0.851355193f,
+    0.525895027f,  0.850549481f,
+    0.527199135f,  0.849741768f,
+    0.528502002f,  0.848932055f,
+    0.529803625f,  0.848120345f,
+    0.531104001f,  0.847306639f,
+    0.532403128f,  0.846490939f,
+    0.533701002f,  0.845673247f,
+    0.534997620f,  0.844853565f,
+    0.536292979f,  0.844031895f,
+    0.537587076f,  0.843208240f,
+    0.538879909f,  0.842382600f,
+    0.540171473f,  0.841554977f,
+    0.541461766f,  0.840725375f,
+    0.542750785f,  0.839893794f,
+    0.544038527f,  0.839060237f,
+    0.545324988f,  0.838224706f,
+    0.546610167f,  0.837387202f,
+    0.547894059f,  0.836547727f,
+    0.549176662f,  0.835706284f,
+    0.550457973f,  0.834862875f,
+    0.551737988f,  0.834017501f,
+    0.553016706f,  0.833170165f,
+    0.554294121f,  0.832320868f,
+    0.555570233f,  0.831469612f,
+    0.556845037f,  0.830616400f,
+    0.558118531f,  0.829761234f,
+    0.559390712f,  0.828904115f,
+    0.560661576f,  0.828045045f,
+    0.561931121f,  0.827184027f,
+    0.563199344f,  0.826321063f,
+    0.564466242f,  0.825456154f,
+    0.565731811f,  0.824589303f,
+    0.566996049f,  0.823720511f,
+    0.568258953f,  0.822849781f,
+    0.569520519f,  0.821977115f,
+    0.570780746f,  0.821102515f,
+    0.572039629f,  0.820225983f,
+    0.573297167f,  0.819347520f,
+    0.574553355f,  0.818467130f,
+    0.575808191f,  0.817584813f,
+    0.577061673f,  0.816700573f,
+    0.578313796f,  0.815814411f,
+    0.579564559f,  0.814926329f,
+    0.580813958f,  0.814036330f,
+    0.582061990f,  0.813144415f,
+    0.583308653f,  0.812250587f,
+    0.584553943f,  0.811354847f,
+    0.585797857f,  0.810457198f,
+    0.587040394f,  0.809557642f,
+    0.588281548f,  0.808656182f,
+    0.589521319f,  0.807752818f,
+    0.590759702f,  0.806847554f,
+    0.591996695f,  0.805940391f,
+    0.593232295f,  0.805031331f,
+    0.594466499f,  0.804120377f,
+    0.595699304f,  0.803207531f,
+    0.596930708f,  0.802292796f,
+    0.598160707f,  0.801376172f,
+    0.599389298f,  0.800457662f,
+    0.600616479f,  0.799537269f,
+    0.601842247f,  0.798614995f,
+    0.603066599f,  0.797690841f,
+    0.604289531f,  0.796764810f,
+    0.605511041f,  0.795836905f,
+    0.606731127f,  0.794907126f,
+    0.607949785f,  0.793975478f,
+    0.609167012f,  0.793041960f,
+    0.610382806f,  0.792106577f,
+    0.611597164f,  0.791169330f,
+    0.612810082f,  0.790230221f,
+    0.614021559f,  0.789289253f,
+    0.615231591f,  0.788346428f,
+    0.616440175f,  0.787401747f,
+    0.617647308f,  0.786455214f,
+    0.618852988f,  0.785506830f,
+    0.620057212f,  0.784556597f,
+    0.621259977f,  0.783604519f,
+    0.622461279f,  0.782650596f,
+    0.623661118f,  0.781694832f,
+    0.624859488f,  0.780737229f,
+    0.626056388f,  0.779777788f,
+    0.627251815f,  0.778816512f,
+    0.628445767f,  0.777853404f,
+    0.629638239f,  0.776888466f,
+    0.630829230f,  0.775921699f,
+    0.632018736f,  0.774953107f,
+    0.633206755f,  0.773982691f,
+    0.634393284f,  0.773010453f,
+    0.635578320f,  0.772036397f,
+    0.636761861f,  0.771060524f,
+    0.637943904f,  0.770082837f,
+    0.639124445f,  0.769103338f,
+    0.640303482f,  0.768122029f,
+    0.641481013f,  0.767138912f,
+    0.642657034f,  0.766153990f,
+    0.643831543f,  0.765167266f,
+    0.645004537f,  0.764178741f,
+    0.646176013f,  0.763188417f,
+    0.647345969f,  0.762196298f,
+    0.648514401f,  0.761202385f,
+    0.649681307f,  0.760206682f,
+    0.650846685f,  0.759209189f,
+    0.652010531f,  0.758209910f,
+    0.653172843f,  0.757208847f,
+    0.654333618f,  0.756206001f,
+    0.655492853f,  0.755201377f,
+    0.656650546f,  0.754194975f,
+    0.657806693f,  0.753186799f,
+    0.658961293f,  0.752176850f,
+    0.660114342f,  0.751165132f,
+    0.661265838f,  0.750151646f,
+    0.662415778f,  0.749136395f,
+    0.663564159f,  0.748119380f,
+    0.664710978f,  0.747100606f,
+    0.665856234f,  0.746080074f,
+    0.666999922f,  0.745057785f,
+    0.668142041f,  0.744033744f,
+    0.669282588f,  0.743007952f,
+    0.670421560f,  0.741980412f,
+    0.671558955f,  0.740951125f,
+    0.672694769f,  0.739920095f,
+    0.673829000f,  0.738887324f,
+    0.674961646f,  0.737852815f,
+    0.676092704f,  0.736816569f,
+    0.677222170f,  0.735778589f,
+    0.678350043f,  0.734738878f,
+    0.679476320f,  0.733697438f,
+    0.680600998f,  0.732654272f,
+    0.681724074f,  0.731609381f,
+    0.682845546f,  0.730562769f,
+    0.683965412f,  0.729514438f,
+    0.685083668f,  0.728464390f,
+    0.686200312f,  0.727412629f,
+    0.687315341f,  0.726359155f,
+    0.688428753f,  0.725303972f,
+    0.689540545f,  0.724247083f,
+    0.690650714f,  0.723188489f,
+    0.691759258f,  0.722128194f,
+    0.692866175f,  0.721066199f,
+    0.693971461f,  0.720002508f,
+    0.695075114f,  0.718937122f,
+    0.696177131f,  0.717870045f,
+    0.697277511f,  0.716801279f,
+    0.698376249f,  0.715730825f,
+    0.699473345f,  0.714658688f,
+    0.700568794f,  0.713584869f,
+    0.701662595f,  0.712509371f,
+    0.702754744f,  0.711432196f,
+    0.703845241f,  0.710353347f,
+    0.704934080f,  0.709272826f,
+    0.706021261f,  0.708190637f,
+    0.707106781f,  0.707106781f,
+    0.708190637f,  0.706021261f,
+    0.709272826f,  0.704934080f,
+    0.710353347f,  0.703845241f,
+    0.711432196f,  0.702754744f,
+    0.712509371f,  0.701662595f,
+    0.713584869f,  0.700568794f,
+    0.714658688f,  0.699473345f,
+    0.715730825f,  0.698376249f,
+    0.716801279f,  0.697277511f,
+    0.717870045f,  0.696177131f,
+    0.718937122f,  0.695075114f,
+    0.720002508f,  0.693971461f,
+    0.721066199f,  0.692866175f,
+    0.722128194f,  0.691759258f,
+    0.723188489f,  0.690650714f,
+    0.724247083f,  0.689540545f,
+    0.725303972f,  0.688428753f,
+    0.726359155f,  0.687315341f,
+    0.727412629f,  0.686200312f,
+    0.728464390f,  0.685083668f,
+    0.729514438f,  0.683965412f,
+    0.730562769f,  0.682845546f,
+    0.731609381f,  0.681724074f,
+    0.732654272f,  0.680600998f,
+    0.733697438f,  0.679476320f,
+    0.734738878f,  0.678350043f,
+    0.735778589f,  0.677222170f,
+    0.736816569f,  0.676092704f,
+    0.737852815f,  0.674961646f,
+    0.738887324f,  0.673829000f,
+    0.739920095f,  0.672694769f,
+    0.740951125f,  0.671558955f,
+    0.741980412f,  0.670421560f,
+    0.743007952f,  0.669282588f,
+    0.744033744f,  0.668142041f,
+    0.745057785f,  0.666999922f,
+    0.746080074f,  0.665856234f,
+    0.747100606f,  0.664710978f,
+    0.748119380f,  0.663564159f,
+    0.749136395f,  0.662415778f,
+    0.750151646f,  0.661265838f,
+    0.751165132f,  0.660114342f,
+    0.752176850f,  0.658961293f,
+    0.753186799f,  0.657806693f,
+    0.754194975f,  0.656650546f,
+    0.755201377f,  0.655492853f,
+    0.756206001f,  0.654333618f,
+    0.757208847f,  0.653172843f,
+    0.758209910f,  0.652010531f,
+    0.759209189f,  0.650846685f,
+    0.760206682f,  0.649681307f,
+    0.761202385f,  0.648514401f,
+    0.762196298f,  0.647345969f,
+    0.763188417f,  0.646176013f,
+    0.764178741f,  0.645004537f,
+    0.765167266f,  0.643831543f,
+    0.766153990f,  0.642657034f,
+    0.767138912f,  0.641481013f,
+    0.768122029f,  0.640303482f,
+    0.769103338f,  0.639124445f,
+    0.770082837f,  0.637943904f,
+    0.771060524f,  0.636761861f,
+    0.772036397f,  0.635578320f,
+    0.773010453f,  0.634393284f,
+    0.773982691f,  0.633206755f,
+    0.774953107f,  0.632018736f,
+    0.775921699f,  0.630829230f,
+    0.776888466f,  0.629638239f,
+    0.777853404f,  0.628445767f,
+    0.778816512f,  0.627251815f,
+    0.779777788f,  0.626056388f,
+    0.780737229f,  0.624859488f,
+    0.781694832f,  0.623661118f,
+    0.782650596f,  0.622461279f,
+    0.783604519f,  0.621259977f,
+    0.784556597f,  0.620057212f,
+    0.785506830f,  0.618852988f,
+    0.786455214f,  0.617647308f,
+    0.787401747f,  0.616440175f,
+    0.788346428f,  0.615231591f,
+    0.789289253f,  0.614021559f,
+    0.790230221f,  0.612810082f,
+    0.791169330f,  0.611597164f,
+    0.792106577f,  0.610382806f,
+    0.793041960f,  0.609167012f,
+    0.793975478f,  0.607949785f,
+    0.794907126f,  0.606731127f,
+    0.795836905f,  0.605511041f,
+    0.796764810f,  0.604289531f,
+    0.797690841f,  0.603066599f,
+    0.798614995f,  0.601842247f,
+    0.799537269f,  0.600616479f,
+    0.800457662f,  0.599389298f,
+    0.801376172f,  0.598160707f,
+    0.802292796f,  0.596930708f,
+    0.803207531f,  0.595699304f,
+    0.804120377f,  0.594466499f,
+    0.805031331f,  0.593232295f,
+    0.805940391f,  0.591996695f,
+    0.806847554f,  0.590759702f,
+    0.807752818f,  0.589521319f,
+    0.808656182f,  0.588281548f,
+    0.809557642f,  0.587040394f,
+    0.810457198f,  0.585797857f,
+    0.811354847f,  0.584553943f,
+    0.812250587f,  0.583308653f,
+    0.813144415f,  0.582061990f,
+    0.814036330f,  0.580813958f,
+    0.814926329f,  0.579564559f,
+    0.815814411f,  0.578313796f,
+    0.816700573f,  0.577061673f,
+    0.817584813f,  0.575808191f,
+    0.818467130f,  0.574553355f,
+    0.819347520f,  0.573297167f,
+    0.820225983f,  0.572039629f,
+    0.821102515f,  0.570780746f,
+    0.821977115f,  0.569520519f,
+    0.822849781f,  0.568258953f,
+    0.823720511f,  0.566996049f,
+    0.824589303f,  0.565731811f,
+    0.825456154f,  0.564466242f,
+    0.826321063f,  0.563199344f,
+    0.827184027f,  0.561931121f,
+    0.828045045f,  0.560661576f,
+    0.828904115f,  0.559390712f,
+    0.829761234f,  0.558118531f,
+    0.830616400f,  0.556845037f,
+    0.831469612f,  0.555570233f,
+    0.832320868f,  0.554294121f,
+    0.833170165f,  0.553016706f,
+    0.834017501f,  0.551737988f,
+    0.834862875f,  0.550457973f,
+    0.835706284f,  0.549176662f,
+    0.836547727f,  0.547894059f,
+    0.837387202f,  0.546610167f,
+    0.838224706f,  0.545324988f,
+    0.839060237f,  0.544038527f,
+    0.839893794f,  0.542750785f,
+    0.840725375f,  0.541461766f,
+    0.841554977f,  0.540171473f,
+    0.842382600f,  0.538879909f,
+    0.843208240f,  0.537587076f,
+    0.844031895f,  0.536292979f,
+    0.844853565f,  0.534997620f,
+    0.845673247f,  0.533701002f,
+    0.846490939f,  0.532403128f,
+    0.847306639f,  0.531104001f,
+    0.848120345f,  0.529803625f,
+    0.848932055f,  0.528502002f,
+    0.849741768f,  0.527199135f,
+    0.850549481f,  0.525895027f,
+    0.851355193f,  0.524589683f,
+    0.852158902f,  0.523283103f,
+    0.852960605f,  0.521975293f,
+    0.853760301f,  0.520666254f,
+    0.854557988f,  0.519355990f,
+    0.855353665f,  0.518044504f,
+    0.856147328f,  0.516731799f,
+    0.856938977f,  0.515417878f,
+    0.857728610f,  0.514102744f,
+    0.858516224f,  0.512786401f,
+    0.859301818f,  0.511468850f,
+    0.860085390f,  0.510150097f,
+    0.860866939f,  0.508830143f,
+    0.861646461f,  0.507508991f,
+    0.862423956f,  0.506186645f,
+    0.863199422f,  0.504863109f,
+    0.863972856f,  0.503538384f,
+    0.864744258f,  0.502212474f,
+    0.865513624f,  0.500885383f,
+    0.866280954f,  0.499557113f,
+    0.867046246f,  0.498227667f,
+    0.867809497f,  0.496897049f,
+    0.868570706f,  0.495565262f,
+    0.869329871f,  0.494232309f,
+    0.870086991f,  0.492898192f,
+    0.870842063f,  0.491562916f,
+    0.871595087f,  0.490226483f,
+    0.872346059f,  0.488888897f,
+    0.873094978f,  0.487550160f,
+    0.873841843f,  0.486210276f,
+    0.874586652f,  0.484869248f,
+    0.875329403f,  0.483527079f,
+    0.876070094f,  0.482183772f,
+    0.876808724f,  0.480839331f,
+    0.877545290f,  0.479493758f,
+    0.878279792f,  0.478147056f,
+    0.879012226f,  0.476799230f,
+    0.879742593f,  0.475450282f,
+    0.880470889f,  0.474100215f,
+    0.881197113f,  0.472749032f,
+    0.881921264f,  0.471396737f,
+    0.882643340f,  0.470043332f,
+    0.883363339f,  0.468688822f,
+    0.884081259f,  0.467333209f,
+    0.884797098f,  0.465976496f,
+    0.885510856f,  0.464618686f,
+    0.886222530f,  0.463259784f,
+    0.886932119f,  0.461899791f,
+    0.887639620f,  0.460538711f,
+    0.888345033f,  0.459176548f,
+    0.889048356f,  0.457813304f,
+    0.889749586f,  0.456448982f,
+    0.890448723f,  0.455083587f,
+    0.891145765f,  0.453717121f,
+    0.891840709f,  0.452349587f,
+    0.892533555f,  0.450980989f,
+    0.893224301f,  0.449611330f,
+    0.893912945f,  0.448240612f,
+    0.894599486f,  0.446868840f,
+    0.895283921f,  0.445496017f,
+    0.895966250f,  0.444122145f,
+    0.896646470f,  0.442747228f,
+    0.897324581f,  0.441371269f,
+    0.898000580f,  0.439994271f,
+    0.898674466f,  0.438616239f,
+    0.899346237f,  0.437237174f,
+    0.900015892f,  0.435857080f,
+    0.900683429f,  0.434475961f,
+    0.901348847f,  0.433093819f,
+    0.902012144f,  0.431710658f,
+    0.902673318f,  0.430326481f,
+    0.903332368f,  0.428941292f,
+    0.903989293f,  0.427555093f,
+    0.904644091f,  0.426167889f,
+    0.905296759f,  0.424779681f,
+    0.905947298f,  0.423390474f,
+    0.906595705f,  0.422000271f,
+    0.907241978f,  0.420609074f,
+    0.907886116f,  0.419216888f,
+    0.908528119f,  0.417823716f,
+    0.909167983f,  0.416429560f,
+    0.909805708f,  0.415034424f,
+    0.910441292f,  0.413638312f,
+    0.911074734f,  0.412241227f,
+    0.911706032f,  0.410843171f,
+    0.912335185f,  0.409444149f,
+    0.912962190f,  0.408044163f,
+    0.913587048f,  0.406643217f,
+    0.914209756f,  0.405241314f,
+    0.914830312f,  0.403838458f,
+    0.915448716f,  0.402434651f,
+    0.916064966f,  0.401029897f,
+    0.916679060f,  0.399624200f,
+    0.917290997f,  0.398217562f,
+    0.917900776f,  0.396809987f,
+    0.918508394f,  0.395401479f,
+    0.919113852f,  0.393992040f,
+    0.919717146f,  0.392581674f,
+    0.920318277f,  0.391170384f,
+    0.920917242f,  0.389758174f,
+    0.921514039f,  0.388345047f,
+    0.922108669f,  0.386931006f,
+    0.922701128f,  0.385516054f,
+    0.923291417f,  0.384100195f,
+    0.923879533f,  0.382683432f,
+    0.924465474f,  0.381265769f,
+    0.925049241f,  0.379847209f,
+    0.925630831f,  0.378427755f,
+    0.926210242f,  0.377007410f,
+    0.926787474f,  0.375586178f,
+    0.927362526f,  0.374164063f,
+    0.927935395f,  0.372741067f,
+    0.928506080f,  0.371317194f,
+    0.929074581f,  0.369892447f,
+    0.929640896f,  0.368466830f,
+    0.930205023f,  0.367040346f,
+    0.930766961f,  0.365612998f,
+    0.931326709f,  0.364184790f,
+    0.931884266f,  0.362755724f,
+    0.932439629f,  0.361325806f,
+    0.932992799f,  0.359895037f,
+    0.933543773f,  0.358463421f,
+    0.934092550f,  0.357030961f,
+    0.934639130f,  0.355597662f,
+    0.935183510f,  0.354163525f,
+    0.935725689f,  0.352728556f,
+    0.936265667f,  0.351292756f,
+    0.936803442f,  0.349856130f,
+    0.937339012f,  0.348418680f,
+    0.937872376f,  0.346980411f,
+    0.938403534f,  0.345541325f,
+    0.938932484f,  0.344101426f,
+    0.939459224f,  0.342660717f,
+    0.939983753f,  0.341219202f,
+    0.940506071f,  0.339776884f,
+    0.941026175f,  0.338333767f,
+    0.941544065f,  0.336889853f,
+    0.942059740f,  0.335445147f,
+    0.942573198f,  0.333999651f,
+    0.943084437f,  0.332553370f,
+    0.943593458f,  0.331106306f,
+    0.944100258f,  0.329658463f,
+    0.944604837f,  0.328209844f,
+    0.945107193f,  0.326760452f,
+    0.945607325f,  0.325310292f,
+    0.946105232f,  0.323859367f,
+    0.946600913f,  0.322407679f,
+    0.947094366f,  0.320955232f,
+    0.947585591f,  0.319502031f,
+    0.948074586f,  0.318048077f,
+    0.948561350f,  0.316593376f,
+    0.949045882f,  0.315137929f,
+    0.949528181f,  0.313681740f,
+    0.950008245f,  0.312224814f,
+    0.950486074f,  0.310767153f,
+    0.950961666f,  0.309308760f,
+    0.951435021f,  0.307849640f,
+    0.951906137f,  0.306389795f,
+    0.952375013f,  0.304929230f,
+    0.952841648f,  0.303467947f,
+    0.953306040f,  0.302005949f,
+    0.953768190f,  0.300543241f,
+    0.954228095f,  0.299079826f,
+    0.954685755f,  0.297615707f,
+    0.955141168f,  0.296150888f,
+    0.955594334f,  0.294685372f,
+    0.956045251f,  0.293219163f,
+    0.956493919f,  0.291752263f,
+    0.956940336f,  0.290284677f,
+    0.957384501f,  0.288816408f,
+    0.957826413f,  0.287347460f,
+    0.958266071f,  0.285877835f,
+    0.958703475f,  0.284407537f,
+    0.959138622f,  0.282936570f,
+    0.959571513f,  0.281464938f,
+    0.960002146f,  0.279992643f,
+    0.960430519f,  0.278519689f,
+    0.960856633f,  0.277046080f,
+    0.961280486f,  0.275571819f,
+    0.961702077f,  0.274096910f,
+    0.962121404f,  0.272621355f,
+    0.962538468f,  0.271145160f,
+    0.962953267f,  0.269668326f,
+    0.963365800f,  0.268190857f,
+    0.963776066f,  0.266712757f,
+    0.964184064f,  0.265234030f,
+    0.964589793f,  0.263754679f,
+    0.964993253f,  0.262274707f,
+    0.965394442f,  0.260794118f,
+    0.965793359f,  0.259312915f,
+    0.966190003f,  0.257831102f,
+    0.966584374f,  0.256348682f,
+    0.966976471f,  0.254865660f,
+    0.967366292f,  0.253382037f,
+    0.967753837f,  0.251897818f,
+    0.968139105f,  0.250413007f,
+    0.968522094f,  0.248927606f,
+    0.968902805f,  0.247441619f,
+    0.969281235f,  0.245955050f,
+    0.969657385f,  0.244467903f,
+    0.970031253f,  0.242980180f,
+    0.970402839f,  0.241491885f,
+    0.970772141f,  0.240003022f,
+    0.971139158f,  0.238513595f,
+    0.971503891f,  0.237023606f,
+    0.971866337f,  0.235533059f,
+    0.972226497f,  0.234041959f,
+    0.972584369f,  0.232550307f,
+    0.972939952f,  0.231058108f,
+    0.973293246f,  0.229565366f,
+    0.973644250f,  0.228072083f,
+    0.973992962f,  0.226578264f,
+    0.974339383f,  0.225083911f,
+    0.974683511f,  0.223589029f,
+    0.975025345f,  0.222093621f,
+    0.975364885f,  0.220597690f,
+    0.975702130f,  0.219101240f,
+    0.976037079f,  0.217604275f,
+    0.976369731f,  0.216106797f,
+    0.976700086f,  0.214608811f,
+    0.977028143f,  0.213110320f,
+    0.977353900f,  0.211611327f,
+    0.977677358f,  0.210111837f,
+    0.977998515f,  0.208611852f,
+    0.978317371f,  0.207111376f,
+    0.978633924f,  0.205610413f,
+    0.978948175f,  0.204108966f,
+    0.979260123f,  0.202607039f,
+    0.979569766f,  0.201104635f,
+    0.979877104f,  0.199601758f,
+    0.980182136f,  0.198098411f,
+    0.980484862f,  0.196594598f,
+    0.980785280f,  0.195090322f,
+    0.981083391f,  0.193585587f,
+    0.981379193f,  0.192080397f,
+    0.981672686f,  0.190574755f,
+    0.981963869f,  0.189068664f,
+    0.982252741f,  0.187562129f,
+    0.982539302f,  0.186055152f,
+    0.982823551f,  0.184547737f,
+    0.983105487f,  0.183039888f,
+    0.983385110f,  0.181531608f,
+    0.983662419f,  0.180022901f,
+    0.983937413f,  0.178513771f,
+    0.984210092f,  0.177004220f,
+    0.984480455f,  0.175494253f,
+    0.984748502f,  0.173983873f,
+    0.985014231f,  0.172473084f,
+    0.985277642f,  0.170961889f,
+    0.985538735f,  0.169450291f,
+    0.985797509f,  0.167938295f,
+    0.986053963f,  0.166425904f,
+    0.986308097f,  0.164913120f,
+    0.986559910f,  0.163399949f,
+    0.986809402f,  0.161886394f,
+    0.987056571f,  0.160372457f,
+    0.987301418f,  0.158858143f,
+    0.987543942f,  0.157343456f,
+    0.987784142f,  0.155828398f,
+    0.988022017f,  0.154312973f,
+    0.988257568f,  0.152797185f,
+    0.988490793f,  0.151281038f,
+    0.988721692f,  0.149764535f,
+    0.988950265f,  0.148247679f,
+    0.989176510f,  0.146730474f,
+    0.989400428f,  0.145212925f,
+    0.989622017f,  0.143695033f,
+    0.989841278f,  0.142176804f,
+    0.990058210f,  0.140658239f,
+    0.990272812f,  0.139139344f,
+    0.990485084f,  0.137620122f,
+    0.990695025f,  0.136100575f,
+    0.990902635f,  0.134580709f,
+    0.991107914f,  0.133060525f,
+    0.991310860f,  0.131540029f,
+    0.991511473f,  0.130019223f,
+    0.991709754f,  0.128498111f,
+    0.991905700f,  0.126976696f,
+    0.992099313f,  0.125454983f,
+    0.992290591f,  0.123932975f,
+    0.992479535f,  0.122410675f,
+    0.992666142f,  0.120888087f,
+    0.992850414f,  0.119365215f,
+    0.993032350f,  0.117842062f,
+    0.993211949f,  0.116318631f,
+    0.993389211f,  0.114794927f,
+    0.993564136f,  0.113270952f,
+    0.993736722f,  0.111746711f,
+    0.993906970f,  0.110222207f,
+    0.994074879f,  0.108697444f,
+    0.994240449f,  0.107172425f,
+    0.994403680f,  0.105647154f,
+    0.994564571f,  0.104121634f,
+    0.994723121f,  0.102595869f,
+    0.994879331f,  0.101069863f,
+    0.995033199f,  0.099543619f,
+    0.995184727f,  0.098017140f,
+    0.995333912f,  0.096490431f,
+    0.995480755f,  0.094963495f,
+    0.995625256f,  0.093436336f,
+    0.995767414f,  0.091908956f,
+    0.995907229f,  0.090381361f,
+    0.996044701f,  0.088853553f,
+    0.996179829f,  0.087325535f,
+    0.996312612f,  0.085797312f,
+    0.996443051f,  0.084268888f,
+    0.996571146f,  0.082740265f,
+    0.996696895f,  0.081211447f,
+    0.996820299f,  0.079682438f,
+    0.996941358f,  0.078153242f,
+    0.997060070f,  0.076623861f,
+    0.997176437f,  0.075094301f,
+    0.997290457f,  0.073564564f,
+    0.997402130f,  0.072034653f,
+    0.997511456f,  0.070504573f,
+    0.997618435f,  0.068974328f,
+    0.997723067f,  0.067443920f,
+    0.997825350f,  0.065913353f,
+    0.997925286f,  0.064382631f,
+    0.998022874f,  0.062851758f,
+    0.998118113f,  0.061320736f,
+    0.998211003f,  0.059789571f,
+    0.998301545f,  0.058258265f,
+    0.998389737f,  0.056726821f,
+    0.998475581f,  0.055195244f,
+    0.998559074f,  0.053663538f,
+    0.998640218f,  0.052131705f,
+    0.998719012f,  0.050599749f,
+    0.998795456f,  0.049067674f,
+    0.998869550f,  0.047535484f,
+    0.998941293f,  0.046003182f,
+    0.999010686f,  0.044470772f,
+    0.999077728f,  0.042938257f,
+    0.999142419f,  0.041405641f,
+    0.999204759f,  0.039872928f,
+    0.999264747f,  0.038340120f,
+    0.999322385f,  0.036807223f,
+    0.999377670f,  0.035274239f,
+    0.999430605f,  0.033741172f,
+    0.999481187f,  0.032208025f,
+    0.999529418f,  0.030674803f,
+    0.999575296f,  0.029141509f,
+    0.999618822f,  0.027608146f,
+    0.999659997f,  0.026074718f,
+    0.999698819f,  0.024541229f,
+    0.999735288f,  0.023007681f,
+    0.999769405f,  0.021474080f,
+    0.999801170f,  0.019940429f,
+    0.999830582f,  0.018406730f,
+    0.999857641f,  0.016872988f,
+    0.999882347f,  0.015339206f,
+    0.999904701f,  0.013805389f,
+    0.999924702f,  0.012271538f,
+    0.999942350f,  0.010737659f,
+    0.999957645f,  0.009203755f,
+    0.999970586f,  0.007669829f,
+    0.999981175f,  0.006135885f,
+    0.999989411f,  0.004601926f,
+    0.999995294f,  0.003067957f,
+    0.999998823f,  0.001533980f,
+    1.000000000f,  0.000000000f,
+    0.999998823f, -0.001533980f,
+    0.999995294f, -0.003067957f,
+    0.999989411f, -0.004601926f,
+    0.999981175f, -0.006135885f,
+    0.999970586f, -0.007669829f,
+    0.999957645f, -0.009203755f,
+    0.999942350f, -0.010737659f,
+    0.999924702f, -0.012271538f,
+    0.999904701f, -0.013805389f,
+    0.999882347f, -0.015339206f,
+    0.999857641f, -0.016872988f,
+    0.999830582f, -0.018406730f,
+    0.999801170f, -0.019940429f,
+    0.999769405f, -0.021474080f,
+    0.999735288f, -0.023007681f,
+    0.999698819f, -0.024541229f,
+    0.999659997f, -0.026074718f,
+    0.999618822f, -0.027608146f,
+    0.999575296f, -0.029141509f,
+    0.999529418f, -0.030674803f,
+    0.999481187f, -0.032208025f,
+    0.999430605f, -0.033741172f,
+    0.999377670f, -0.035274239f,
+    0.999322385f, -0.036807223f,
+    0.999264747f, -0.038340120f,
+    0.999204759f, -0.039872928f,
+    0.999142419f, -0.041405641f,
+    0.999077728f, -0.042938257f,
+    0.999010686f, -0.044470772f,
+    0.998941293f, -0.046003182f,
+    0.998869550f, -0.047535484f,
+    0.998795456f, -0.049067674f,
+    0.998719012f, -0.050599749f,
+    0.998640218f, -0.052131705f,
+    0.998559074f, -0.053663538f,
+    0.998475581f, -0.055195244f,
+    0.998389737f, -0.056726821f,
+    0.998301545f, -0.058258265f,
+    0.998211003f, -0.059789571f,
+    0.998118113f, -0.061320736f,
+    0.998022874f, -0.062851758f,
+    0.997925286f, -0.064382631f,
+    0.997825350f, -0.065913353f,
+    0.997723067f, -0.067443920f,
+    0.997618435f, -0.068974328f,
+    0.997511456f, -0.070504573f,
+    0.997402130f, -0.072034653f,
+    0.997290457f, -0.073564564f,
+    0.997176437f, -0.075094301f,
+    0.997060070f, -0.076623861f,
+    0.996941358f, -0.078153242f,
+    0.996820299f, -0.079682438f,
+    0.996696895f, -0.081211447f,
+    0.996571146f, -0.082740265f,
+    0.996443051f, -0.084268888f,
+    0.996312612f, -0.085797312f,
+    0.996179829f, -0.087325535f,
+    0.996044701f, -0.088853553f,
+    0.995907229f, -0.090381361f,
+    0.995767414f, -0.091908956f,
+    0.995625256f, -0.093436336f,
+    0.995480755f, -0.094963495f,
+    0.995333912f, -0.096490431f,
+    0.995184727f, -0.098017140f,
+    0.995033199f, -0.099543619f,
+    0.994879331f, -0.101069863f,
+    0.994723121f, -0.102595869f,
+    0.994564571f, -0.104121634f,
+    0.994403680f, -0.105647154f,
+    0.994240449f, -0.107172425f,
+    0.994074879f, -0.108697444f,
+    0.993906970f, -0.110222207f,
+    0.993736722f, -0.111746711f,
+    0.993564136f, -0.113270952f,
+    0.993389211f, -0.114794927f,
+    0.993211949f, -0.116318631f,
+    0.993032350f, -0.117842062f,
+    0.992850414f, -0.119365215f,
+    0.992666142f, -0.120888087f,
+    0.992479535f, -0.122410675f,
+    0.992290591f, -0.123932975f,
+    0.992099313f, -0.125454983f,
+    0.991905700f, -0.126976696f,
+    0.991709754f, -0.128498111f,
+    0.991511473f, -0.130019223f,
+    0.991310860f, -0.131540029f,
+    0.991107914f, -0.133060525f,
+    0.990902635f, -0.134580709f,
+    0.990695025f, -0.136100575f,
+    0.990485084f, -0.137620122f,
+    0.990272812f, -0.139139344f,
+    0.990058210f, -0.140658239f,
+    0.989841278f, -0.142176804f,
+    0.989622017f, -0.143695033f,
+    0.989400428f, -0.145212925f,
+    0.989176510f, -0.146730474f,
+    0.988950265f, -0.148247679f,
+    0.988721692f, -0.149764535f,
+    0.988490793f, -0.151281038f,
+    0.988257568f, -0.152797185f,
+    0.988022017f, -0.154312973f,
+    0.987784142f, -0.155828398f,
+    0.987543942f, -0.157343456f,
+    0.987301418f, -0.158858143f,
+    0.987056571f, -0.160372457f,
+    0.986809402f, -0.161886394f,
+    0.986559910f, -0.163399949f,
+    0.986308097f, -0.164913120f,
+    0.986053963f, -0.166425904f,
+    0.985797509f, -0.167938295f,
+    0.985538735f, -0.169450291f,
+    0.985277642f, -0.170961889f,
+    0.985014231f, -0.172473084f,
+    0.984748502f, -0.173983873f,
+    0.984480455f, -0.175494253f,
+    0.984210092f, -0.177004220f,
+    0.983937413f, -0.178513771f,
+    0.983662419f, -0.180022901f,
+    0.983385110f, -0.181531608f,
+    0.983105487f, -0.183039888f,
+    0.982823551f, -0.184547737f,
+    0.982539302f, -0.186055152f,
+    0.982252741f, -0.187562129f,
+    0.981963869f, -0.189068664f,
+    0.981672686f, -0.190574755f,
+    0.981379193f, -0.192080397f,
+    0.981083391f, -0.193585587f,
+    0.980785280f, -0.195090322f,
+    0.980484862f, -0.196594598f,
+    0.980182136f, -0.198098411f,
+    0.979877104f, -0.199601758f,
+    0.979569766f, -0.201104635f,
+    0.979260123f, -0.202607039f,
+    0.978948175f, -0.204108966f,
+    0.978633924f, -0.205610413f,
+    0.978317371f, -0.207111376f,
+    0.977998515f, -0.208611852f,
+    0.977677358f, -0.210111837f,
+    0.977353900f, -0.211611327f,
+    0.977028143f, -0.213110320f,
+    0.976700086f, -0.214608811f,
+    0.976369731f, -0.216106797f,
+    0.976037079f, -0.217604275f,
+    0.975702130f, -0.219101240f,
+    0.975364885f, -0.220597690f,
+    0.975025345f, -0.222093621f,
+    0.974683511f, -0.223589029f,
+    0.974339383f, -0.225083911f,
+    0.973992962f, -0.226578264f,
+    0.973644250f, -0.228072083f,
+    0.973293246f, -0.229565366f,
+    0.972939952f, -0.231058108f,
+    0.972584369f, -0.232550307f,
+    0.972226497f, -0.234041959f,
+    0.971866337f, -0.235533059f,
+    0.971503891f, -0.237023606f,
+    0.971139158f, -0.238513595f,
+    0.970772141f, -0.240003022f,
+    0.970402839f, -0.241491885f,
+    0.970031253f, -0.242980180f,
+    0.969657385f, -0.244467903f,
+    0.969281235f, -0.245955050f,
+    0.968902805f, -0.247441619f,
+    0.968522094f, -0.248927606f,
+    0.968139105f, -0.250413007f,
+    0.967753837f, -0.251897818f,
+    0.967366292f, -0.253382037f,
+    0.966976471f, -0.254865660f,
+    0.966584374f, -0.256348682f,
+    0.966190003f, -0.257831102f,
+    0.965793359f, -0.259312915f,
+    0.965394442f, -0.260794118f,
+    0.964993253f, -0.262274707f,
+    0.964589793f, -0.263754679f,
+    0.964184064f, -0.265234030f,
+    0.963776066f, -0.266712757f,
+    0.963365800f, -0.268190857f,
+    0.962953267f, -0.269668326f,
+    0.962538468f, -0.271145160f,
+    0.962121404f, -0.272621355f,
+    0.961702077f, -0.274096910f,
+    0.961280486f, -0.275571819f,
+    0.960856633f, -0.277046080f,
+    0.960430519f, -0.278519689f,
+    0.960002146f, -0.279992643f,
+    0.959571513f, -0.281464938f,
+    0.959138622f, -0.282936570f,
+    0.958703475f, -0.284407537f,
+    0.958266071f, -0.285877835f,
+    0.957826413f, -0.287347460f,
+    0.957384501f, -0.288816408f,
+    0.956940336f, -0.290284677f,
+    0.956493919f, -0.291752263f,
+    0.956045251f, -0.293219163f,
+    0.955594334f, -0.294685372f,
+    0.955141168f, -0.296150888f,
+    0.954685755f, -0.297615707f,
+    0.954228095f, -0.299079826f,
+    0.953768190f, -0.300543241f,
+    0.953306040f, -0.302005949f,
+    0.952841648f, -0.303467947f,
+    0.952375013f, -0.304929230f,
+    0.951906137f, -0.306389795f,
+    0.951435021f, -0.307849640f,
+    0.950961666f, -0.309308760f,
+    0.950486074f, -0.310767153f,
+    0.950008245f, -0.312224814f,
+    0.949528181f, -0.313681740f,
+    0.949045882f, -0.315137929f,
+    0.948561350f, -0.316593376f,
+    0.948074586f, -0.318048077f,
+    0.947585591f, -0.319502031f,
+    0.947094366f, -0.320955232f,
+    0.946600913f, -0.322407679f,
+    0.946105232f, -0.323859367f,
+    0.945607325f, -0.325310292f,
+    0.945107193f, -0.326760452f,
+    0.944604837f, -0.328209844f,
+    0.944100258f, -0.329658463f,
+    0.943593458f, -0.331106306f,
+    0.943084437f, -0.332553370f,
+    0.942573198f, -0.333999651f,
+    0.942059740f, -0.335445147f,
+    0.941544065f, -0.336889853f,
+    0.941026175f, -0.338333767f,
+    0.940506071f, -0.339776884f,
+    0.939983753f, -0.341219202f,
+    0.939459224f, -0.342660717f,
+    0.938932484f, -0.344101426f,
+    0.938403534f, -0.345541325f,
+    0.937872376f, -0.346980411f,
+    0.937339012f, -0.348418680f,
+    0.936803442f, -0.349856130f,
+    0.936265667f, -0.351292756f,
+    0.935725689f, -0.352728556f,
+    0.935183510f, -0.354163525f,
+    0.934639130f, -0.355597662f,
+    0.934092550f, -0.357030961f,
+    0.933543773f, -0.358463421f,
+    0.932992799f, -0.359895037f,
+    0.932439629f, -0.361325806f,
+    0.931884266f, -0.362755724f,
+    0.931326709f, -0.364184790f,
+    0.930766961f, -0.365612998f,
+    0.930205023f, -0.367040346f,
+    0.929640896f, -0.368466830f,
+    0.929074581f, -0.369892447f,
+    0.928506080f, -0.371317194f,
+    0.927935395f, -0.372741067f,
+    0.927362526f, -0.374164063f,
+    0.926787474f, -0.375586178f,
+    0.926210242f, -0.377007410f,
+    0.925630831f, -0.378427755f,
+    0.925049241f, -0.379847209f,
+    0.924465474f, -0.381265769f,
+    0.923879533f, -0.382683432f,
+    0.923291417f, -0.384100195f,
+    0.922701128f, -0.385516054f,
+    0.922108669f, -0.386931006f,
+    0.921514039f, -0.388345047f,
+    0.920917242f, -0.389758174f,
+    0.920318277f, -0.391170384f,
+    0.919717146f, -0.392581674f,
+    0.919113852f, -0.393992040f,
+    0.918508394f, -0.395401479f,
+    0.917900776f, -0.396809987f,
+    0.917290997f, -0.398217562f,
+    0.916679060f, -0.399624200f,
+    0.916064966f, -0.401029897f,
+    0.915448716f, -0.402434651f,
+    0.914830312f, -0.403838458f,
+    0.914209756f, -0.405241314f,
+    0.913587048f, -0.406643217f,
+    0.912962190f, -0.408044163f,
+    0.912335185f, -0.409444149f,
+    0.911706032f, -0.410843171f,
+    0.911074734f, -0.412241227f,
+    0.910441292f, -0.413638312f,
+    0.909805708f, -0.415034424f,
+    0.909167983f, -0.416429560f,
+    0.908528119f, -0.417823716f,
+    0.907886116f, -0.419216888f,
+    0.907241978f, -0.420609074f,
+    0.906595705f, -0.422000271f,
+    0.905947298f, -0.423390474f,
+    0.905296759f, -0.424779681f,
+    0.904644091f, -0.426167889f,
+    0.903989293f, -0.427555093f,
+    0.903332368f, -0.428941292f,
+    0.902673318f, -0.430326481f,
+    0.902012144f, -0.431710658f,
+    0.901348847f, -0.433093819f,
+    0.900683429f, -0.434475961f,
+    0.900015892f, -0.435857080f,
+    0.899346237f, -0.437237174f,
+    0.898674466f, -0.438616239f,
+    0.898000580f, -0.439994271f,
+    0.897324581f, -0.441371269f,
+    0.896646470f, -0.442747228f,
+    0.895966250f, -0.444122145f,
+    0.895283921f, -0.445496017f,
+    0.894599486f, -0.446868840f,
+    0.893912945f, -0.448240612f,
+    0.893224301f, -0.449611330f,
+    0.892533555f, -0.450980989f,
+    0.891840709f, -0.452349587f,
+    0.891145765f, -0.453717121f,
+    0.890448723f, -0.455083587f,
+    0.889749586f, -0.456448982f,
+    0.889048356f, -0.457813304f,
+    0.888345033f, -0.459176548f,
+    0.887639620f, -0.460538711f,
+    0.886932119f, -0.461899791f,
+    0.886222530f, -0.463259784f,
+    0.885510856f, -0.464618686f,
+    0.884797098f, -0.465976496f,
+    0.884081259f, -0.467333209f,
+    0.883363339f, -0.468688822f,
+    0.882643340f, -0.470043332f,
+    0.881921264f, -0.471396737f,
+    0.881197113f, -0.472749032f,
+    0.880470889f, -0.474100215f,
+    0.879742593f, -0.475450282f,
+    0.879012226f, -0.476799230f,
+    0.878279792f, -0.478147056f,
+    0.877545290f, -0.479493758f,
+    0.876808724f, -0.480839331f,
+    0.876070094f, -0.482183772f,
+    0.875329403f, -0.483527079f,
+    0.874586652f, -0.484869248f,
+    0.873841843f, -0.486210276f,
+    0.873094978f, -0.487550160f,
+    0.872346059f, -0.488888897f,
+    0.871595087f, -0.490226483f,
+    0.870842063f, -0.491562916f,
+    0.870086991f, -0.492898192f,
+    0.869329871f, -0.494232309f,
+    0.868570706f, -0.495565262f,
+    0.867809497f, -0.496897049f,
+    0.867046246f, -0.498227667f,
+    0.866280954f, -0.499557113f,
+    0.865513624f, -0.500885383f,
+    0.864744258f, -0.502212474f,
+    0.863972856f, -0.503538384f,
+    0.863199422f, -0.504863109f,
+    0.862423956f, -0.506186645f,
+    0.861646461f, -0.507508991f,
+    0.860866939f, -0.508830143f,
+    0.860085390f, -0.510150097f,
+    0.859301818f, -0.511468850f,
+    0.858516224f, -0.512786401f,
+    0.857728610f, -0.514102744f,
+    0.856938977f, -0.515417878f,
+    0.856147328f, -0.516731799f,
+    0.855353665f, -0.518044504f,
+    0.854557988f, -0.519355990f,
+    0.853760301f, -0.520666254f,
+    0.852960605f, -0.521975293f,
+    0.852158902f, -0.523283103f,
+    0.851355193f, -0.524589683f,
+    0.850549481f, -0.525895027f,
+    0.849741768f, -0.527199135f,
+    0.848932055f, -0.528502002f,
+    0.848120345f, -0.529803625f,
+    0.847306639f, -0.531104001f,
+    0.846490939f, -0.532403128f,
+    0.845673247f, -0.533701002f,
+    0.844853565f, -0.534997620f,
+    0.844031895f, -0.536292979f,
+    0.843208240f, -0.537587076f,
+    0.842382600f, -0.538879909f,
+    0.841554977f, -0.540171473f,
+    0.840725375f, -0.541461766f,
+    0.839893794f, -0.542750785f,
+    0.839060237f, -0.544038527f,
+    0.838224706f, -0.545324988f,
+    0.837387202f, -0.546610167f,
+    0.836547727f, -0.547894059f,
+    0.835706284f, -0.549176662f,
+    0.834862875f, -0.550457973f,
+    0.834017501f, -0.551737988f,
+    0.833170165f, -0.553016706f,
+    0.832320868f, -0.554294121f,
+    0.831469612f, -0.555570233f,
+    0.830616400f, -0.556845037f,
+    0.829761234f, -0.558118531f,
+    0.828904115f, -0.559390712f,
+    0.828045045f, -0.560661576f,
+    0.827184027f, -0.561931121f,
+    0.826321063f, -0.563199344f,
+    0.825456154f, -0.564466242f,
+    0.824589303f, -0.565731811f,
+    0.823720511f, -0.566996049f,
+    0.822849781f, -0.568258953f,
+    0.821977115f, -0.569520519f,
+    0.821102515f, -0.570780746f,
+    0.820225983f, -0.572039629f,
+    0.819347520f, -0.573297167f,
+    0.818467130f, -0.574553355f,
+    0.817584813f, -0.575808191f,
+    0.816700573f, -0.577061673f,
+    0.815814411f, -0.578313796f,
+    0.814926329f, -0.579564559f,
+    0.814036330f, -0.580813958f,
+    0.813144415f, -0.582061990f,
+    0.812250587f, -0.583308653f,
+    0.811354847f, -0.584553943f,
+    0.810457198f, -0.585797857f,
+    0.809557642f, -0.587040394f,
+    0.808656182f, -0.588281548f,
+    0.807752818f, -0.589521319f,
+    0.806847554f, -0.590759702f,
+    0.805940391f, -0.591996695f,
+    0.805031331f, -0.593232295f,
+    0.804120377f, -0.594466499f,
+    0.803207531f, -0.595699304f,
+    0.802292796f, -0.596930708f,
+    0.801376172f, -0.598160707f,
+    0.800457662f, -0.599389298f,
+    0.799537269f, -0.600616479f,
+    0.798614995f, -0.601842247f,
+    0.797690841f, -0.603066599f,
+    0.796764810f, -0.604289531f,
+    0.795836905f, -0.605511041f,
+    0.794907126f, -0.606731127f,
+    0.793975478f, -0.607949785f,
+    0.793041960f, -0.609167012f,
+    0.792106577f, -0.610382806f,
+    0.791169330f, -0.611597164f,
+    0.790230221f, -0.612810082f,
+    0.789289253f, -0.614021559f,
+    0.788346428f, -0.615231591f,
+    0.787401747f, -0.616440175f,
+    0.786455214f, -0.617647308f,
+    0.785506830f, -0.618852988f,
+    0.784556597f, -0.620057212f,
+    0.783604519f, -0.621259977f,
+    0.782650596f, -0.622461279f,
+    0.781694832f, -0.623661118f,
+    0.780737229f, -0.624859488f,
+    0.779777788f, -0.626056388f,
+    0.778816512f, -0.627251815f,
+    0.777853404f, -0.628445767f,
+    0.776888466f, -0.629638239f,
+    0.775921699f, -0.630829230f,
+    0.774953107f, -0.632018736f,
+    0.773982691f, -0.633206755f,
+    0.773010453f, -0.634393284f,
+    0.772036397f, -0.635578320f,
+    0.771060524f, -0.636761861f,
+    0.770082837f, -0.637943904f,
+    0.769103338f, -0.639124445f,
+    0.768122029f, -0.640303482f,
+    0.767138912f, -0.641481013f,
+    0.766153990f, -0.642657034f,
+    0.765167266f, -0.643831543f,
+    0.764178741f, -0.645004537f,
+    0.763188417f, -0.646176013f,
+    0.762196298f, -0.647345969f,
+    0.761202385f, -0.648514401f,
+    0.760206682f, -0.649681307f,
+    0.759209189f, -0.650846685f,
+    0.758209910f, -0.652010531f,
+    0.757208847f, -0.653172843f,
+    0.756206001f, -0.654333618f,
+    0.755201377f, -0.655492853f,
+    0.754194975f, -0.656650546f,
+    0.753186799f, -0.657806693f,
+    0.752176850f, -0.658961293f,
+    0.751165132f, -0.660114342f,
+    0.750151646f, -0.661265838f,
+    0.749136395f, -0.662415778f,
+    0.748119380f, -0.663564159f,
+    0.747100606f, -0.664710978f,
+    0.746080074f, -0.665856234f,
+    0.745057785f, -0.666999922f,
+    0.744033744f, -0.668142041f,
+    0.743007952f, -0.669282588f,
+    0.741980412f, -0.670421560f,
+    0.740951125f, -0.671558955f,
+    0.739920095f, -0.672694769f,
+    0.738887324f, -0.673829000f,
+    0.737852815f, -0.674961646f,
+    0.736816569f, -0.676092704f,
+    0.735778589f, -0.677222170f,
+    0.734738878f, -0.678350043f,
+    0.733697438f, -0.679476320f,
+    0.732654272f, -0.680600998f,
+    0.731609381f, -0.681724074f,
+    0.730562769f, -0.682845546f,
+    0.729514438f, -0.683965412f,
+    0.728464390f, -0.685083668f,
+    0.727412629f, -0.686200312f,
+    0.726359155f, -0.687315341f,
+    0.725303972f, -0.688428753f,
+    0.724247083f, -0.689540545f,
+    0.723188489f, -0.690650714f,
+    0.722128194f, -0.691759258f,
+    0.721066199f, -0.692866175f,
+    0.720002508f, -0.693971461f,
+    0.718937122f, -0.695075114f,
+    0.717870045f, -0.696177131f,
+    0.716801279f, -0.697277511f,
+    0.715730825f, -0.698376249f,
+    0.714658688f, -0.699473345f,
+    0.713584869f, -0.700568794f,
+    0.712509371f, -0.701662595f,
+    0.711432196f, -0.702754744f,
+    0.710353347f, -0.703845241f,
+    0.709272826f, -0.704934080f,
+    0.708190637f, -0.706021261f,
+    0.707106781f, -0.707106781f,
+    0.706021261f, -0.708190637f,
+    0.704934080f, -0.709272826f,
+    0.703845241f, -0.710353347f,
+    0.702754744f, -0.711432196f,
+    0.701662595f, -0.712509371f,
+    0.700568794f, -0.713584869f,
+    0.699473345f, -0.714658688f,
+    0.698376249f, -0.715730825f,
+    0.697277511f, -0.716801279f,
+    0.696177131f, -0.717870045f,
+    0.695075114f, -0.718937122f,
+    0.693971461f, -0.720002508f,
+    0.692866175f, -0.721066199f,
+    0.691759258f, -0.722128194f,
+    0.690650714f, -0.723188489f,
+    0.689540545f, -0.724247083f,
+    0.688428753f, -0.725303972f,
+    0.687315341f, -0.726359155f,
+    0.686200312f, -0.727412629f,
+    0.685083668f, -0.728464390f,
+    0.683965412f, -0.729514438f,
+    0.682845546f, -0.730562769f,
+    0.681724074f, -0.731609381f,
+    0.680600998f, -0.732654272f,
+    0.679476320f, -0.733697438f,
+    0.678350043f, -0.734738878f,
+    0.677222170f, -0.735778589f,
+    0.676092704f, -0.736816569f,
+    0.674961646f, -0.737852815f,
+    0.673829000f, -0.738887324f,
+    0.672694769f, -0.739920095f,
+    0.671558955f, -0.740951125f,
+    0.670421560f, -0.741980412f,
+    0.669282588f, -0.743007952f,
+    0.668142041f, -0.744033744f,
+    0.666999922f, -0.745057785f,
+    0.665856234f, -0.746080074f,
+    0.664710978f, -0.747100606f,
+    0.663564159f, -0.748119380f,
+    0.662415778f, -0.749136395f,
+    0.661265838f, -0.750151646f,
+    0.660114342f, -0.751165132f,
+    0.658961293f, -0.752176850f,
+    0.657806693f, -0.753186799f,
+    0.656650546f, -0.754194975f,
+    0.655492853f, -0.755201377f,
+    0.654333618f, -0.756206001f,
+    0.653172843f, -0.757208847f,
+    0.652010531f, -0.758209910f,
+    0.650846685f, -0.759209189f,
+    0.649681307f, -0.760206682f,
+    0.648514401f, -0.761202385f,
+    0.647345969f, -0.762196298f,
+    0.646176013f, -0.763188417f,
+    0.645004537f, -0.764178741f,
+    0.643831543f, -0.765167266f,
+    0.642657034f, -0.766153990f,
+    0.641481013f, -0.767138912f,
+    0.640303482f, -0.768122029f,
+    0.639124445f, -0.769103338f,
+    0.637943904f, -0.770082837f,
+    0.636761861f, -0.771060524f,
+    0.635578320f, -0.772036397f,
+    0.634393284f, -0.773010453f,
+    0.633206755f, -0.773982691f,
+    0.632018736f, -0.774953107f,
+    0.630829230f, -0.775921699f,
+    0.629638239f, -0.776888466f,
+    0.628445767f, -0.777853404f,
+    0.627251815f, -0.778816512f,
+    0.626056388f, -0.779777788f,
+    0.624859488f, -0.780737229f,
+    0.623661118f, -0.781694832f,
+    0.622461279f, -0.782650596f,
+    0.621259977f, -0.783604519f,
+    0.620057212f, -0.784556597f,
+    0.618852988f, -0.785506830f,
+    0.617647308f, -0.786455214f,
+    0.616440175f, -0.787401747f,
+    0.615231591f, -0.788346428f,
+    0.614021559f, -0.789289253f,
+    0.612810082f, -0.790230221f,
+    0.611597164f, -0.791169330f,
+    0.610382806f, -0.792106577f,
+    0.609167012f, -0.793041960f,
+    0.607949785f, -0.793975478f,
+    0.606731127f, -0.794907126f,
+    0.605511041f, -0.795836905f,
+    0.604289531f, -0.796764810f,
+    0.603066599f, -0.797690841f,
+    0.601842247f, -0.798614995f,
+    0.600616479f, -0.799537269f,
+    0.599389298f, -0.800457662f,
+    0.598160707f, -0.801376172f,
+    0.596930708f, -0.802292796f,
+    0.595699304f, -0.803207531f,
+    0.594466499f, -0.804120377f,
+    0.593232295f, -0.805031331f,
+    0.591996695f, -0.805940391f,
+    0.590759702f, -0.806847554f,
+    0.589521319f, -0.807752818f,
+    0.588281548f, -0.808656182f,
+    0.587040394f, -0.809557642f,
+    0.585797857f, -0.810457198f,
+    0.584553943f, -0.811354847f,
+    0.583308653f, -0.812250587f,
+    0.582061990f, -0.813144415f,
+    0.580813958f, -0.814036330f,
+    0.579564559f, -0.814926329f,
+    0.578313796f, -0.815814411f,
+    0.577061673f, -0.816700573f,
+    0.575808191f, -0.817584813f,
+    0.574553355f, -0.818467130f,
+    0.573297167f, -0.819347520f,
+    0.572039629f, -0.820225983f,
+    0.570780746f, -0.821102515f,
+    0.569520519f, -0.821977115f,
+    0.568258953f, -0.822849781f,
+    0.566996049f, -0.823720511f,
+    0.565731811f, -0.824589303f,
+    0.564466242f, -0.825456154f,
+    0.563199344f, -0.826321063f,
+    0.561931121f, -0.827184027f,
+    0.560661576f, -0.828045045f,
+    0.559390712f, -0.828904115f,
+    0.558118531f, -0.829761234f,
+    0.556845037f, -0.830616400f,
+    0.555570233f, -0.831469612f,
+    0.554294121f, -0.832320868f,
+    0.553016706f, -0.833170165f,
+    0.551737988f, -0.834017501f,
+    0.550457973f, -0.834862875f,
+    0.549176662f, -0.835706284f,
+    0.547894059f, -0.836547727f,
+    0.546610167f, -0.837387202f,
+    0.545324988f, -0.838224706f,
+    0.544038527f, -0.839060237f,
+    0.542750785f, -0.839893794f,
+    0.541461766f, -0.840725375f,
+    0.540171473f, -0.841554977f,
+    0.538879909f, -0.842382600f,
+    0.537587076f, -0.843208240f,
+    0.536292979f, -0.844031895f,
+    0.534997620f, -0.844853565f,
+    0.533701002f, -0.845673247f,
+    0.532403128f, -0.846490939f,
+    0.531104001f, -0.847306639f,
+    0.529803625f, -0.848120345f,
+    0.528502002f, -0.848932055f,
+    0.527199135f, -0.849741768f,
+    0.525895027f, -0.850549481f,
+    0.524589683f, -0.851355193f,
+    0.523283103f, -0.852158902f,
+    0.521975293f, -0.852960605f,
+    0.520666254f, -0.853760301f,
+    0.519355990f, -0.854557988f,
+    0.518044504f, -0.855353665f,
+    0.516731799f, -0.856147328f,
+    0.515417878f, -0.856938977f,
+    0.514102744f, -0.857728610f,
+    0.512786401f, -0.858516224f,
+    0.511468850f, -0.859301818f,
+    0.510150097f, -0.860085390f,
+    0.508830143f, -0.860866939f,
+    0.507508991f, -0.861646461f,
+    0.506186645f, -0.862423956f,
+    0.504863109f, -0.863199422f,
+    0.503538384f, -0.863972856f,
+    0.502212474f, -0.864744258f,
+    0.500885383f, -0.865513624f,
+    0.499557113f, -0.866280954f,
+    0.498227667f, -0.867046246f,
+    0.496897049f, -0.867809497f,
+    0.495565262f, -0.868570706f,
+    0.494232309f, -0.869329871f,
+    0.492898192f, -0.870086991f,
+    0.491562916f, -0.870842063f,
+    0.490226483f, -0.871595087f,
+    0.488888897f, -0.872346059f,
+    0.487550160f, -0.873094978f,
+    0.486210276f, -0.873841843f,
+    0.484869248f, -0.874586652f,
+    0.483527079f, -0.875329403f,
+    0.482183772f, -0.876070094f,
+    0.480839331f, -0.876808724f,
+    0.479493758f, -0.877545290f,
+    0.478147056f, -0.878279792f,
+    0.476799230f, -0.879012226f,
+    0.475450282f, -0.879742593f,
+    0.474100215f, -0.880470889f,
+    0.472749032f, -0.881197113f,
+    0.471396737f, -0.881921264f,
+    0.470043332f, -0.882643340f,
+    0.468688822f, -0.883363339f,
+    0.467333209f, -0.884081259f,
+    0.465976496f, -0.884797098f,
+    0.464618686f, -0.885510856f,
+    0.463259784f, -0.886222530f,
+    0.461899791f, -0.886932119f,
+    0.460538711f, -0.887639620f,
+    0.459176548f, -0.888345033f,
+    0.457813304f, -0.889048356f,
+    0.456448982f, -0.889749586f,
+    0.455083587f, -0.890448723f,
+    0.453717121f, -0.891145765f,
+    0.452349587f, -0.891840709f,
+    0.450980989f, -0.892533555f,
+    0.449611330f, -0.893224301f,
+    0.448240612f, -0.893912945f,
+    0.446868840f, -0.894599486f,
+    0.445496017f, -0.895283921f,
+    0.444122145f, -0.895966250f,
+    0.442747228f, -0.896646470f,
+    0.441371269f, -0.897324581f,
+    0.439994271f, -0.898000580f,
+    0.438616239f, -0.898674466f,
+    0.437237174f, -0.899346237f,
+    0.435857080f, -0.900015892f,
+    0.434475961f, -0.900683429f,
+    0.433093819f, -0.901348847f,
+    0.431710658f, -0.902012144f,
+    0.430326481f, -0.902673318f,
+    0.428941292f, -0.903332368f,
+    0.427555093f, -0.903989293f,
+    0.426167889f, -0.904644091f,
+    0.424779681f, -0.905296759f,
+    0.423390474f, -0.905947298f,
+    0.422000271f, -0.906595705f,
+    0.420609074f, -0.907241978f,
+    0.419216888f, -0.907886116f,
+    0.417823716f, -0.908528119f,
+    0.416429560f, -0.909167983f,
+    0.415034424f, -0.909805708f,
+    0.413638312f, -0.910441292f,
+    0.412241227f, -0.911074734f,
+    0.410843171f, -0.911706032f,
+    0.409444149f, -0.912335185f,
+    0.408044163f, -0.912962190f,
+    0.406643217f, -0.913587048f,
+    0.405241314f, -0.914209756f,
+    0.403838458f, -0.914830312f,
+    0.402434651f, -0.915448716f,
+    0.401029897f, -0.916064966f,
+    0.399624200f, -0.916679060f,
+    0.398217562f, -0.917290997f,
+    0.396809987f, -0.917900776f,
+    0.395401479f, -0.918508394f,
+    0.393992040f, -0.919113852f,
+    0.392581674f, -0.919717146f,
+    0.391170384f, -0.920318277f,
+    0.389758174f, -0.920917242f,
+    0.388345047f, -0.921514039f,
+    0.386931006f, -0.922108669f,
+    0.385516054f, -0.922701128f,
+    0.384100195f, -0.923291417f,
+    0.382683432f, -0.923879533f,
+    0.381265769f, -0.924465474f,
+    0.379847209f, -0.925049241f,
+    0.378427755f, -0.925630831f,
+    0.377007410f, -0.926210242f,
+    0.375586178f, -0.926787474f,
+    0.374164063f, -0.927362526f,
+    0.372741067f, -0.927935395f,
+    0.371317194f, -0.928506080f,
+    0.369892447f, -0.929074581f,
+    0.368466830f, -0.929640896f,
+    0.367040346f, -0.930205023f,
+    0.365612998f, -0.930766961f,
+    0.364184790f, -0.931326709f,
+    0.362755724f, -0.931884266f,
+    0.361325806f, -0.932439629f,
+    0.359895037f, -0.932992799f,
+    0.358463421f, -0.933543773f,
+    0.357030961f, -0.934092550f,
+    0.355597662f, -0.934639130f,
+    0.354163525f, -0.935183510f,
+    0.352728556f, -0.935725689f,
+    0.351292756f, -0.936265667f,
+    0.349856130f, -0.936803442f,
+    0.348418680f, -0.937339012f,
+    0.346980411f, -0.937872376f,
+    0.345541325f, -0.938403534f,
+    0.344101426f, -0.938932484f,
+    0.342660717f, -0.939459224f,
+    0.341219202f, -0.939983753f,
+    0.339776884f, -0.940506071f,
+    0.338333767f, -0.941026175f,
+    0.336889853f, -0.941544065f,
+    0.335445147f, -0.942059740f,
+    0.333999651f, -0.942573198f,
+    0.332553370f, -0.943084437f,
+    0.331106306f, -0.943593458f,
+    0.329658463f, -0.944100258f,
+    0.328209844f, -0.944604837f,
+    0.326760452f, -0.945107193f,
+    0.325310292f, -0.945607325f,
+    0.323859367f, -0.946105232f,
+    0.322407679f, -0.946600913f,
+    0.320955232f, -0.947094366f,
+    0.319502031f, -0.947585591f,
+    0.318048077f, -0.948074586f,
+    0.316593376f, -0.948561350f,
+    0.315137929f, -0.949045882f,
+    0.313681740f, -0.949528181f,
+    0.312224814f, -0.950008245f,
+    0.310767153f, -0.950486074f,
+    0.309308760f, -0.950961666f,
+    0.307849640f, -0.951435021f,
+    0.306389795f, -0.951906137f,
+    0.304929230f, -0.952375013f,
+    0.303467947f, -0.952841648f,
+    0.302005949f, -0.953306040f,
+    0.300543241f, -0.953768190f,
+    0.299079826f, -0.954228095f,
+    0.297615707f, -0.954685755f,
+    0.296150888f, -0.955141168f,
+    0.294685372f, -0.955594334f,
+    0.293219163f, -0.956045251f,
+    0.291752263f, -0.956493919f,
+    0.290284677f, -0.956940336f,
+    0.288816408f, -0.957384501f,
+    0.287347460f, -0.957826413f,
+    0.285877835f, -0.958266071f,
+    0.284407537f, -0.958703475f,
+    0.282936570f, -0.959138622f,
+    0.281464938f, -0.959571513f,
+    0.279992643f, -0.960002146f,
+    0.278519689f, -0.960430519f,
+    0.277046080f, -0.960856633f,
+    0.275571819f, -0.961280486f,
+    0.274096910f, -0.961702077f,
+    0.272621355f, -0.962121404f,
+    0.271145160f, -0.962538468f,
+    0.269668326f, -0.962953267f,
+    0.268190857f, -0.963365800f,
+    0.266712757f, -0.963776066f,
+    0.265234030f, -0.964184064f,
+    0.263754679f, -0.964589793f,
+    0.262274707f, -0.964993253f,
+    0.260794118f, -0.965394442f,
+    0.259312915f, -0.965793359f,
+    0.257831102f, -0.966190003f,
+    0.256348682f, -0.966584374f,
+    0.254865660f, -0.966976471f,
+    0.253382037f, -0.967366292f,
+    0.251897818f, -0.967753837f,
+    0.250413007f, -0.968139105f,
+    0.248927606f, -0.968522094f,
+    0.247441619f, -0.968902805f,
+    0.245955050f, -0.969281235f,
+    0.244467903f, -0.969657385f,
+    0.242980180f, -0.970031253f,
+    0.241491885f, -0.970402839f,
+    0.240003022f, -0.970772141f,
+    0.238513595f, -0.971139158f,
+    0.237023606f, -0.971503891f,
+    0.235533059f, -0.971866337f,
+    0.234041959f, -0.972226497f,
+    0.232550307f, -0.972584369f,
+    0.231058108f, -0.972939952f,
+    0.229565366f, -0.973293246f,
+    0.228072083f, -0.973644250f,
+    0.226578264f, -0.973992962f,
+    0.225083911f, -0.974339383f,
+    0.223589029f, -0.974683511f,
+    0.222093621f, -0.975025345f,
+    0.220597690f, -0.975364885f,
+    0.219101240f, -0.975702130f,
+    0.217604275f, -0.976037079f,
+    0.216106797f, -0.976369731f,
+    0.214608811f, -0.976700086f,
+    0.213110320f, -0.977028143f,
+    0.211611327f, -0.977353900f,
+    0.210111837f, -0.977677358f,
+    0.208611852f, -0.977998515f,
+    0.207111376f, -0.978317371f,
+    0.205610413f, -0.978633924f,
+    0.204108966f, -0.978948175f,
+    0.202607039f, -0.979260123f,
+    0.201104635f, -0.979569766f,
+    0.199601758f, -0.979877104f,
+    0.198098411f, -0.980182136f,
+    0.196594598f, -0.980484862f,
+    0.195090322f, -0.980785280f,
+    0.193585587f, -0.981083391f,
+    0.192080397f, -0.981379193f,
+    0.190574755f, -0.981672686f,
+    0.189068664f, -0.981963869f,
+    0.187562129f, -0.982252741f,
+    0.186055152f, -0.982539302f,
+    0.184547737f, -0.982823551f,
+    0.183039888f, -0.983105487f,
+    0.181531608f, -0.983385110f,
+    0.180022901f, -0.983662419f,
+    0.178513771f, -0.983937413f,
+    0.177004220f, -0.984210092f,
+    0.175494253f, -0.984480455f,
+    0.173983873f, -0.984748502f,
+    0.172473084f, -0.985014231f,
+    0.170961889f, -0.985277642f,
+    0.169450291f, -0.985538735f,
+    0.167938295f, -0.985797509f,
+    0.166425904f, -0.986053963f,
+    0.164913120f, -0.986308097f,
+    0.163399949f, -0.986559910f,
+    0.161886394f, -0.986809402f,
+    0.160372457f, -0.987056571f,
+    0.158858143f, -0.987301418f,
+    0.157343456f, -0.987543942f,
+    0.155828398f, -0.987784142f,
+    0.154312973f, -0.988022017f,
+    0.152797185f, -0.988257568f,
+    0.151281038f, -0.988490793f,
+    0.149764535f, -0.988721692f,
+    0.148247679f, -0.988950265f,
+    0.146730474f, -0.989176510f,
+    0.145212925f, -0.989400428f,
+    0.143695033f, -0.989622017f,
+    0.142176804f, -0.989841278f,
+    0.140658239f, -0.990058210f,
+    0.139139344f, -0.990272812f,
+    0.137620122f, -0.990485084f,
+    0.136100575f, -0.990695025f,
+    0.134580709f, -0.990902635f,
+    0.133060525f, -0.991107914f,
+    0.131540029f, -0.991310860f,
+    0.130019223f, -0.991511473f,
+    0.128498111f, -0.991709754f,
+    0.126976696f, -0.991905700f,
+    0.125454983f, -0.992099313f,
+    0.123932975f, -0.992290591f,
+    0.122410675f, -0.992479535f,
+    0.120888087f, -0.992666142f,
+    0.119365215f, -0.992850414f,
+    0.117842062f, -0.993032350f,
+    0.116318631f, -0.993211949f,
+    0.114794927f, -0.993389211f,
+    0.113270952f, -0.993564136f,
+    0.111746711f, -0.993736722f,
+    0.110222207f, -0.993906970f,
+    0.108697444f, -0.994074879f,
+    0.107172425f, -0.994240449f,
+    0.105647154f, -0.994403680f,
+    0.104121634f, -0.994564571f,
+    0.102595869f, -0.994723121f,
+    0.101069863f, -0.994879331f,
+    0.099543619f, -0.995033199f,
+    0.098017140f, -0.995184727f,
+    0.096490431f, -0.995333912f,
+    0.094963495f, -0.995480755f,
+    0.093436336f, -0.995625256f,
+    0.091908956f, -0.995767414f,
+    0.090381361f, -0.995907229f,
+    0.088853553f, -0.996044701f,
+    0.087325535f, -0.996179829f,
+    0.085797312f, -0.996312612f,
+    0.084268888f, -0.996443051f,
+    0.082740265f, -0.996571146f,
+    0.081211447f, -0.996696895f,
+    0.079682438f, -0.996820299f,
+    0.078153242f, -0.996941358f,
+    0.076623861f, -0.997060070f,
+    0.075094301f, -0.997176437f,
+    0.073564564f, -0.997290457f,
+    0.072034653f, -0.997402130f,
+    0.070504573f, -0.997511456f,
+    0.068974328f, -0.997618435f,
+    0.067443920f, -0.997723067f,
+    0.065913353f, -0.997825350f,
+    0.064382631f, -0.997925286f,
+    0.062851758f, -0.998022874f,
+    0.061320736f, -0.998118113f,
+    0.059789571f, -0.998211003f,
+    0.058258265f, -0.998301545f,
+    0.056726821f, -0.998389737f,
+    0.055195244f, -0.998475581f,
+    0.053663538f, -0.998559074f,
+    0.052131705f, -0.998640218f,
+    0.050599749f, -0.998719012f,
+    0.049067674f, -0.998795456f,
+    0.047535484f, -0.998869550f,
+    0.046003182f, -0.998941293f,
+    0.044470772f, -0.999010686f,
+    0.042938257f, -0.999077728f,
+    0.041405641f, -0.999142419f,
+    0.039872928f, -0.999204759f,
+    0.038340120f, -0.999264747f,
+    0.036807223f, -0.999322385f,
+    0.035274239f, -0.999377670f,
+    0.033741172f, -0.999430605f,
+    0.032208025f, -0.999481187f,
+    0.030674803f, -0.999529418f,
+    0.029141509f, -0.999575296f,
+    0.027608146f, -0.999618822f,
+    0.026074718f, -0.999659997f,
+    0.024541229f, -0.999698819f,
+    0.023007681f, -0.999735288f,
+    0.021474080f, -0.999769405f,
+    0.019940429f, -0.999801170f,
+    0.018406730f, -0.999830582f,
+    0.016872988f, -0.999857641f,
+    0.015339206f, -0.999882347f,
+    0.013805389f, -0.999904701f,
+    0.012271538f, -0.999924702f,
+    0.010737659f, -0.999942350f,
+    0.009203755f, -0.999957645f,
+    0.007669829f, -0.999970586f,
+    0.006135885f, -0.999981175f,
+    0.004601926f, -0.999989411f,
+    0.003067957f, -0.999995294f,
+    0.001533980f, -0.999998823f
+};
+
+
+/**
+ * \par
+ * Example code for generating the floating-point sine table:
+ * <pre>
+ * tableSize = 512;
+ * for(n = 0; n < (tableSize + 1); n++)
+ * {
+ *	sinTable[n] = sin(2*pi*n/tableSize);
+ * }</pre>
+ * \par
+ * where pi is 3.14159265358979; the last entry (n = tableSize) closes the period.
+ */
+
+const float32_t sinTable_f32[FAST_MATH_TABLE_SIZE + 1] = {
+   0.00000000f, 0.01227154f, 0.02454123f, 0.03680722f, 0.04906767f, 0.06132074f,
+   0.07356456f, 0.08579731f, 0.09801714f, 0.11022221f, 0.12241068f, 0.13458071f,
+   0.14673047f, 0.15885814f, 0.17096189f, 0.18303989f, 0.19509032f, 0.20711138f,
+   0.21910124f, 0.23105811f, 0.24298018f, 0.25486566f, 0.26671276f, 0.27851969f,
+   0.29028468f, 0.30200595f, 0.31368174f, 0.32531029f, 0.33688985f, 0.34841868f,
+   0.35989504f, 0.37131719f, 0.38268343f, 0.39399204f, 0.40524131f, 0.41642956f,
+   0.42755509f, 0.43861624f, 0.44961133f, 0.46053871f, 0.47139674f, 0.48218377f,
+   0.49289819f, 0.50353838f, 0.51410274f, 0.52458968f, 0.53499762f, 0.54532499f,
+   0.55557023f, 0.56573181f, 0.57580819f, 0.58579786f, 0.59569930f, 0.60551104f,
+   0.61523159f, 0.62485949f, 0.63439328f, 0.64383154f, 0.65317284f, 0.66241578f,
+   0.67155895f, 0.68060100f, 0.68954054f, 0.69837625f, 0.70710678f, 0.71573083f,
+   0.72424708f, 0.73265427f, 0.74095113f, 0.74913639f, 0.75720885f, 0.76516727f,
+   0.77301045f, 0.78073723f, 0.78834643f, 0.79583690f, 0.80320753f, 0.81045720f,
+   0.81758481f, 0.82458930f, 0.83146961f, 0.83822471f, 0.84485357f, 0.85135519f,
+   0.85772861f, 0.86397286f, 0.87008699f, 0.87607009f, 0.88192126f, 0.88763962f,
+   0.89322430f, 0.89867447f, 0.90398929f, 0.90916798f, 0.91420976f, 0.91911385f,
+   0.92387953f, 0.92850608f, 0.93299280f, 0.93733901f, 0.94154407f, 0.94560733f,
+   0.94952818f, 0.95330604f, 0.95694034f, 0.96043052f, 0.96377607f, 0.96697647f,
+   0.97003125f, 0.97293995f, 0.97570213f, 0.97831737f, 0.98078528f, 0.98310549f,
+   0.98527764f, 0.98730142f, 0.98917651f, 0.99090264f, 0.99247953f, 0.99390697f,
+   0.99518473f, 0.99631261f, 0.99729046f, 0.99811811f, 0.99879546f, 0.99932238f,
+   0.99969882f, 0.99992470f, 1.00000000f, 0.99992470f, 0.99969882f, 0.99932238f,
+   0.99879546f, 0.99811811f, 0.99729046f, 0.99631261f, 0.99518473f, 0.99390697f,
+   0.99247953f, 0.99090264f, 0.98917651f, 0.98730142f, 0.98527764f, 0.98310549f,
+   0.98078528f, 0.97831737f, 0.97570213f, 0.97293995f, 0.97003125f, 0.96697647f,
+   0.96377607f, 0.96043052f, 0.95694034f, 0.95330604f, 0.94952818f, 0.94560733f,
+   0.94154407f, 0.93733901f, 0.93299280f, 0.92850608f, 0.92387953f, 0.91911385f,
+   0.91420976f, 0.90916798f, 0.90398929f, 0.89867447f, 0.89322430f, 0.88763962f,
+   0.88192126f, 0.87607009f, 0.87008699f, 0.86397286f, 0.85772861f, 0.85135519f,
+   0.84485357f, 0.83822471f, 0.83146961f, 0.82458930f, 0.81758481f, 0.81045720f,
+   0.80320753f, 0.79583690f, 0.78834643f, 0.78073723f, 0.77301045f, 0.76516727f,
+   0.75720885f, 0.74913639f, 0.74095113f, 0.73265427f, 0.72424708f, 0.71573083f,
+   0.70710678f, 0.69837625f, 0.68954054f, 0.68060100f, 0.67155895f, 0.66241578f,
+   0.65317284f, 0.64383154f, 0.63439328f, 0.62485949f, 0.61523159f, 0.60551104f,
+   0.59569930f, 0.58579786f, 0.57580819f, 0.56573181f, 0.55557023f, 0.54532499f,
+   0.53499762f, 0.52458968f, 0.51410274f, 0.50353838f, 0.49289819f, 0.48218377f,
+   0.47139674f, 0.46053871f, 0.44961133f, 0.43861624f, 0.42755509f, 0.41642956f,
+   0.40524131f, 0.39399204f, 0.38268343f, 0.37131719f, 0.35989504f, 0.34841868f,
+   0.33688985f, 0.32531029f, 0.31368174f, 0.30200595f, 0.29028468f, 0.27851969f,
+   0.26671276f, 0.25486566f, 0.24298018f, 0.23105811f, 0.21910124f, 0.20711138f,
+   0.19509032f, 0.18303989f, 0.17096189f, 0.15885814f, 0.14673047f, 0.13458071f,
+   0.12241068f, 0.11022221f, 0.09801714f, 0.08579731f, 0.07356456f, 0.06132074f,
+   0.04906767f, 0.03680722f, 0.02454123f, 0.01227154f, 0.00000000f, -0.01227154f,
+   -0.02454123f, -0.03680722f, -0.04906767f, -0.06132074f, -0.07356456f,
+   -0.08579731f, -0.09801714f, -0.11022221f, -0.12241068f, -0.13458071f,
+   -0.14673047f, -0.15885814f, -0.17096189f, -0.18303989f, -0.19509032f, 
+   -0.20711138f, -0.21910124f, -0.23105811f, -0.24298018f, -0.25486566f, 
+   -0.26671276f, -0.27851969f, -0.29028468f, -0.30200595f, -0.31368174f, 
+   -0.32531029f, -0.33688985f, -0.34841868f, -0.35989504f, -0.37131719f, 
+   -0.38268343f, -0.39399204f, -0.40524131f, -0.41642956f, -0.42755509f, 
+   -0.43861624f, -0.44961133f, -0.46053871f, -0.47139674f, -0.48218377f, 
+   -0.49289819f, -0.50353838f, -0.51410274f, -0.52458968f, -0.53499762f, 
+   -0.54532499f, -0.55557023f, -0.56573181f, -0.57580819f, -0.58579786f, 
+   -0.59569930f, -0.60551104f, -0.61523159f, -0.62485949f, -0.63439328f, 
+   -0.64383154f, -0.65317284f, -0.66241578f, -0.67155895f, -0.68060100f, 
+   -0.68954054f, -0.69837625f, -0.70710678f, -0.71573083f, -0.72424708f, 
+   -0.73265427f, -0.74095113f, -0.74913639f, -0.75720885f, -0.76516727f, 
+   -0.77301045f, -0.78073723f, -0.78834643f, -0.79583690f, -0.80320753f, 
+   -0.81045720f, -0.81758481f, -0.82458930f, -0.83146961f, -0.83822471f, 
+   -0.84485357f, -0.85135519f, -0.85772861f, -0.86397286f, -0.87008699f, 
+   -0.87607009f, -0.88192126f, -0.88763962f, -0.89322430f, -0.89867447f, 
+   -0.90398929f, -0.90916798f, -0.91420976f, -0.91911385f, -0.92387953f, 
+   -0.92850608f, -0.93299280f, -0.93733901f, -0.94154407f, -0.94560733f, 
+   -0.94952818f, -0.95330604f, -0.95694034f, -0.96043052f, -0.96377607f, 
+   -0.96697647f, -0.97003125f, -0.97293995f, -0.97570213f, -0.97831737f, 
+   -0.98078528f, -0.98310549f, -0.98527764f, -0.98730142f, -0.98917651f, 
+   -0.99090264f, -0.99247953f, -0.99390697f, -0.99518473f, -0.99631261f, 
+   -0.99729046f, -0.99811811f, -0.99879546f, -0.99932238f, -0.99969882f, 
+   -0.99992470f, -1.00000000f, -0.99992470f, -0.99969882f, -0.99932238f, 
+   -0.99879546f, -0.99811811f, -0.99729046f, -0.99631261f, -0.99518473f, 
+   -0.99390697f, -0.99247953f, -0.99090264f, -0.98917651f, -0.98730142f, 
+   -0.98527764f, -0.98310549f, -0.98078528f, -0.97831737f, -0.97570213f, 
+   -0.97293995f, -0.97003125f, -0.96697647f, -0.96377607f, -0.96043052f, 
+   -0.95694034f, -0.95330604f, -0.94952818f, -0.94560733f, -0.94154407f, 
+   -0.93733901f, -0.93299280f, -0.92850608f, -0.92387953f, -0.91911385f, 
+   -0.91420976f, -0.90916798f, -0.90398929f, -0.89867447f, -0.89322430f, 
+   -0.88763962f, -0.88192126f, -0.87607009f, -0.87008699f, -0.86397286f, 
+   -0.85772861f, -0.85135519f, -0.84485357f, -0.83822471f, -0.83146961f, 
+   -0.82458930f, -0.81758481f, -0.81045720f, -0.80320753f, -0.79583690f, 
+   -0.78834643f, -0.78073723f, -0.77301045f, -0.76516727f, -0.75720885f, 
+   -0.74913639f, -0.74095113f, -0.73265427f, -0.72424708f, -0.71573083f, 
+   -0.70710678f, -0.69837625f, -0.68954054f, -0.68060100f, -0.67155895f, 
+   -0.66241578f, -0.65317284f, -0.64383154f, -0.63439328f, -0.62485949f, 
+   -0.61523159f, -0.60551104f, -0.59569930f, -0.58579786f, -0.57580819f, 
+   -0.56573181f, -0.55557023f, -0.54532499f, -0.53499762f, -0.52458968f, 
+   -0.51410274f, -0.50353838f, -0.49289819f, -0.48218377f, -0.47139674f, 
+   -0.46053871f, -0.44961133f, -0.43861624f, -0.42755509f, -0.41642956f, 
+   -0.40524131f, -0.39399204f, -0.38268343f, -0.37131719f, -0.35989504f, 
+   -0.34841868f, -0.33688985f, -0.32531029f, -0.31368174f, -0.30200595f, 
+   -0.29028468f, -0.27851969f, -0.26671276f, -0.25486566f, -0.24298018f, 
+   -0.23105811f, -0.21910124f, -0.20711138f, -0.19509032f, -0.18303989f, 
+   -0.17096189f, -0.15885814f, -0.14673047f, -0.13458071f, -0.12241068f, 
+   -0.11022221f, -0.09801714f, -0.08579731f, -0.07356456f, -0.06132074f, 
+   -0.04906767f, -0.03680722f, -0.02454123f, -0.01227154f, -0.00000000f
+};
+
+/**
+ * \par
+ * Table values are in Q31 (1.31 fixed-point format) and are generated in
+ * three steps. First, generate the sine values in floating point:
+ * <pre>
+ * tableSize = 512;
+ * for(n = 0; n < (tableSize + 1); n++)
+ * {
+ *	sinTable[n] = sin(2*pi*n/tableSize);
+ * } </pre>
+ * where pi is 3.14159265358979.
+ * \par
+ * Second, convert floating point to Q31 (fixed point):
+ *	(sinTable[i] * pow(2, 31))
+ * \par
+ * Finally, round to the nearest integer (the +1.0 peak is stored as 0x7FFFFFFF):
+ * 	sinTable[i] += (sinTable[i] > 0 ? 0.5 : -0.5);
+ */
+const q31_t sinTable_q31[FAST_MATH_TABLE_SIZE + 1] = {
+   0x00000000, 0x01921D20, 0x03242ABF, 0x04B6195D, 0x0647D97C, 0x07D95B9E, 
+   0x096A9049, 0x0AFB6805, 0x0C8BD35E, 0x0E1BC2E4, 0x0FAB272B, 0x1139F0CF, 
+   0x12C8106F, 0x145576B1, 0x15E21445, 0x176DD9DE, 0x18F8B83C, 0x1A82A026, 
+   0x1C0B826A, 0x1D934FE5, 0x1F19F97B, 0x209F701C, 0x2223A4C5, 0x23A6887F, 
+   0x25280C5E, 0x26A82186, 0x2826B928, 0x29A3C485, 0x2B1F34EB, 0x2C98FBBA, 
+   0x2E110A62, 0x2F875262, 0x30FBC54D, 0x326E54C7, 0x33DEF287, 0x354D9057, 
+   0x36BA2014, 0x382493B0, 0x398CDD32, 0x3AF2EEB7, 0x3C56BA70, 0x3DB832A6, 
+   0x3F1749B8, 0x4073F21D, 0x41CE1E65, 0x4325C135, 0x447ACD50, 0x45CD358F, 
+   0x471CECE7, 0x4869E665, 0x49B41533, 0x4AFB6C98, 0x4C3FDFF4, 0x4D8162C4, 
+   0x4EBFE8A5, 0x4FFB654D, 0x5133CC94, 0x5269126E, 0x539B2AF0, 0x54CA0A4B, 
+   0x55F5A4D2, 0x571DEEFA, 0x5842DD54, 0x59646498, 0x5A82799A, 0x5B9D1154, 
+   0x5CB420E0, 0x5DC79D7C, 0x5ED77C8A, 0x5FE3B38D, 0x60EC3830, 0x61F1003F, 
+   0x62F201AC, 0x63EF3290, 0x64E88926, 0x65DDFBD3, 0x66CF8120, 0x67BD0FBD, 
+   0x68A69E81, 0x698C246C, 0x6A6D98A4, 0x6B4AF279, 0x6C242960, 0x6CF934FC, 
+   0x6DCA0D14, 0x6E96A99D, 0x6F5F02B2, 0x7023109A, 0x70E2CBC6, 0x719E2CD2, 
+   0x72552C85, 0x7307C3D0, 0x73B5EBD1, 0x745F9DD1, 0x7504D345, 0x75A585CF, 
+   0x7641AF3D, 0x76D94989, 0x776C4EDB, 0x77FAB989, 0x78848414, 0x7909A92D, 
+   0x798A23B1, 0x7A05EEAD, 0x7A7D055B, 0x7AEF6323, 0x7B5D039E, 0x7BC5E290, 
+   0x7C29FBEE, 0x7C894BDE, 0x7CE3CEB2, 0x7D3980EC, 0x7D8A5F40, 0x7DD6668F, 
+   0x7E1D93EA, 0x7E5FE493, 0x7E9D55FC, 0x7ED5E5C6, 0x7F0991C4, 0x7F3857F6, 
+   0x7F62368F, 0x7F872BF3, 0x7FA736B4, 0x7FC25596, 0x7FD8878E, 0x7FE9CBC0, 
+   0x7FF62182, 0x7FFD885A, 0x7FFFFFFF, 0x7FFD885A, 0x7FF62182, 0x7FE9CBC0, 
+   0x7FD8878E, 0x7FC25596, 0x7FA736B4, 0x7F872BF3, 0x7F62368F, 0x7F3857F6, 
+   0x7F0991C4, 0x7ED5E5C6, 0x7E9D55FC, 0x7E5FE493, 0x7E1D93EA, 0x7DD6668F, 
+   0x7D8A5F40, 0x7D3980EC, 0x7CE3CEB2, 0x7C894BDE, 0x7C29FBEE, 0x7BC5E290, 
+   0x7B5D039E, 0x7AEF6323, 0x7A7D055B, 0x7A05EEAD, 0x798A23B1, 0x7909A92D, 
+   0x78848414, 0x77FAB989, 0x776C4EDB, 0x76D94989, 0x7641AF3D, 0x75A585CF, 
+   0x7504D345, 0x745F9DD1, 0x73B5EBD1, 0x7307C3D0, 0x72552C85, 0x719E2CD2, 
+   0x70E2CBC6, 0x7023109A, 0x6F5F02B2, 0x6E96A99D, 0x6DCA0D14, 0x6CF934FC, 
+   0x6C242960, 0x6B4AF279, 0x6A6D98A4, 0x698C246C, 0x68A69E81, 0x67BD0FBD, 
+   0x66CF8120, 0x65DDFBD3, 0x64E88926, 0x63EF3290, 0x62F201AC, 0x61F1003F, 
+   0x60EC3830, 0x5FE3B38D, 0x5ED77C8A, 0x5DC79D7C, 0x5CB420E0, 0x5B9D1154, 
+   0x5A82799A, 0x59646498, 0x5842DD54, 0x571DEEFA, 0x55F5A4D2, 0x54CA0A4B, 
+   0x539B2AF0, 0x5269126E, 0x5133CC94, 0x4FFB654D, 0x4EBFE8A5, 0x4D8162C4, 
+   0x4C3FDFF4, 0x4AFB6C98, 0x49B41533, 0x4869E665, 0x471CECE7, 0x45CD358F, 
+   0x447ACD50, 0x4325C135, 0x41CE1E65, 0x4073F21D, 0x3F1749B8, 0x3DB832A6, 
+   0x3C56BA70, 0x3AF2EEB7, 0x398CDD32, 0x382493B0, 0x36BA2014, 0x354D9057, 
+   0x33DEF287, 0x326E54C7, 0x30FBC54D, 0x2F875262, 0x2E110A62, 0x2C98FBBA, 
+   0x2B1F34EB, 0x29A3C485, 0x2826B928, 0x26A82186, 0x25280C5E, 0x23A6887F, 
+   0x2223A4C5, 0x209F701C, 0x1F19F97B, 0x1D934FE5, 0x1C0B826A, 0x1A82A026, 
+   0x18F8B83C, 0x176DD9DE, 0x15E21445, 0x145576B1, 0x12C8106F, 0x1139F0CF, 
+   0x0FAB272B, 0x0E1BC2E4, 0x0C8BD35E, 0x0AFB6805, 0x096A9049, 0x07D95B9E, 
+   0x0647D97C, 0x04B6195D, 0x03242ABF, 0x01921D20, 0x00000000, 0xFE6DE2E0, 
+   0xFCDBD541, 0xFB49E6A3, 0xF9B82684, 0xF826A462, 0xF6956FB7, 0xF50497FB, 
+   0xF3742CA2, 0xF1E43D1C, 0xF054D8D5, 0xEEC60F31, 0xED37EF91, 0xEBAA894F, 
+   0xEA1DEBBB, 0xE8922622, 0xE70747C4, 0xE57D5FDA, 0xE3F47D96, 0xE26CB01B, 
+   0xE0E60685, 0xDF608FE4, 0xDDDC5B3B, 0xDC597781, 0xDAD7F3A2, 0xD957DE7A, 
+   0xD7D946D8, 0xD65C3B7B, 0xD4E0CB15, 0xD3670446, 0xD1EEF59E, 0xD078AD9E, 
+   0xCF043AB3, 0xCD91AB39, 0xCC210D79, 0xCAB26FA9, 0xC945DFEC, 0xC7DB6C50, 
+   0xC67322CE, 0xC50D1149, 0xC3A94590, 0xC247CD5A, 0xC0E8B648, 0xBF8C0DE3, 
+   0xBE31E19B, 0xBCDA3ECB, 0xBB8532B0, 0xBA32CA71, 0xB8E31319, 0xB796199B, 
+   0xB64BEACD, 0xB5049368, 0xB3C0200C, 0xB27E9D3C, 0xB140175B, 0xB0049AB3, 
+   0xAECC336C, 0xAD96ED92, 0xAC64D510, 0xAB35F5B5, 0xAA0A5B2E, 0xA8E21106, 
+   0xA7BD22AC, 0xA69B9B68, 0xA57D8666, 0xA462EEAC, 0xA34BDF20, 0xA2386284, 
+   0xA1288376, 0xA01C4C73, 0x9F13C7D0, 0x9E0EFFC1, 0x9D0DFE54, 0x9C10CD70, 
+   0x9B1776DA, 0x9A22042D, 0x99307EE0, 0x9842F043, 0x9759617F, 0x9673DB94, 
+   0x9592675C, 0x94B50D87, 0x93DBD6A0, 0x9306CB04, 0x9235F2EC, 0x91695663, 
+   0x90A0FD4E, 0x8FDCEF66, 0x8F1D343A, 0x8E61D32E, 0x8DAAD37B, 0x8CF83C30, 
+   0x8C4A142F, 0x8BA0622F, 0x8AFB2CBB, 0x8A5A7A31, 0x89BE50C3, 0x8926B677, 
+   0x8893B125, 0x88054677, 0x877B7BEC, 0x86F656D3, 0x8675DC4F, 0x85FA1153, 
+   0x8582FAA5, 0x85109CDD, 0x84A2FC62, 0x843A1D70, 0x83D60412, 0x8376B422, 
+   0x831C314E, 0x82C67F14, 0x8275A0C0, 0x82299971, 0x81E26C16, 0x81A01B6D, 
+   0x8162AA04, 0x812A1A3A, 0x80F66E3C, 0x80C7A80A, 0x809DC971, 0x8078D40D, 
+   0x8058C94C, 0x803DAA6A, 0x80277872, 0x80163440, 0x8009DE7E, 0x800277A6, 
+   0x80000000, 0x800277A6, 0x8009DE7E, 0x80163440, 0x80277872, 0x803DAA6A, 
+   0x8058C94C, 0x8078D40D, 0x809DC971, 0x80C7A80A, 0x80F66E3C, 0x812A1A3A, 
+   0x8162AA04, 0x81A01B6D, 0x81E26C16, 0x82299971, 0x8275A0C0, 0x82C67F14, 
+   0x831C314E, 0x8376B422, 0x83D60412, 0x843A1D70, 0x84A2FC62, 0x85109CDD, 
+   0x8582FAA5, 0x85FA1153, 0x8675DC4F, 0x86F656D3, 0x877B7BEC, 0x88054677, 
+   0x8893B125, 0x8926B677, 0x89BE50C3, 0x8A5A7A31, 0x8AFB2CBB, 0x8BA0622F, 
+   0x8C4A142F, 0x8CF83C30, 0x8DAAD37B, 0x8E61D32E, 0x8F1D343A, 0x8FDCEF66, 
+   0x90A0FD4E, 0x91695663, 0x9235F2EC, 0x9306CB04, 0x93DBD6A0, 0x94B50D87, 
+   0x9592675C, 0x9673DB94, 0x9759617F, 0x9842F043, 0x99307EE0, 0x9A22042D, 
+   0x9B1776DA, 0x9C10CD70, 0x9D0DFE54, 0x9E0EFFC1, 0x9F13C7D0, 0xA01C4C73, 
+   0xA1288376, 0xA2386284, 0xA34BDF20, 0xA462EEAC, 0xA57D8666, 0xA69B9B68, 
+   0xA7BD22AC, 0xA8E21106, 0xAA0A5B2E, 0xAB35F5B5, 0xAC64D510, 0xAD96ED92, 
+   0xAECC336C, 0xB0049AB3, 0xB140175B, 0xB27E9D3C, 0xB3C0200C, 0xB5049368, 
+   0xB64BEACD, 0xB796199B, 0xB8E31319, 0xBA32CA71, 0xBB8532B0, 0xBCDA3ECB, 
+   0xBE31E19B, 0xBF8C0DE3, 0xC0E8B648, 0xC247CD5A, 0xC3A94590, 0xC50D1149, 
+   0xC67322CE, 0xC7DB6C50, 0xC945DFEC, 0xCAB26FA9, 0xCC210D79, 0xCD91AB39, 
+   0xCF043AB3, 0xD078AD9E, 0xD1EEF59E, 0xD3670446, 0xD4E0CB15, 0xD65C3B7B, 
+   0xD7D946D8, 0xD957DE7A, 0xDAD7F3A2, 0xDC597781, 0xDDDC5B3B, 0xDF608FE4, 
+   0xE0E60685, 0xE26CB01B, 0xE3F47D96, 0xE57D5FDA, 0xE70747C4, 0xE8922622, 
+   0xEA1DEBBB, 0xEBAA894F, 0xED37EF91, 0xEEC60F31, 0xF054D8D5, 0xF1E43D1C, 
+   0xF3742CA2, 0xF50497FB, 0xF6956FB7, 0xF826A462, 0xF9B82684, 0xFB49E6A3, 
+   0xFCDBD541, 0xFE6DE2E0, 0x00000000 
+};
+
+/**
+ * \par
+ * Table values are in Q15 (1.15 fixed-point format) and are generated in
+ * three steps. First, generate the sine values in floating point:
+ * <pre>
+ * tableSize = 512;
+ * for(n = 0; n < (tableSize + 1); n++)
+ * {
+ *	sinTable[n] = sin(2*pi*n/tableSize);
+ * } </pre>
+ * where pi is 3.14159265358979.
+ * \par
+ * Second, convert floating point to Q15 (fixed point):
+ *	(sinTable[i] * pow(2, 15))
+ * \par
+ * Finally, round to the nearest integer (the +1.0 peak is stored as 0x7FFF):
+ * 	sinTable[i] += (sinTable[i] > 0 ? 0.5 : -0.5);
+ */
+const q15_t sinTable_q15[FAST_MATH_TABLE_SIZE + 1] = {
+   0x0000, 0x0192, 0x0324, 0x04B6, 0x0648, 0x07D9, 0x096B, 0x0AFB, 0x0C8C, 0x0E1C, 0x0FAB, 0x113A, 0x12C8,
+   0x1455, 0x15E2, 0x176E, 0x18F9, 0x1A83, 0x1C0C, 0x1D93, 0x1F1A, 0x209F, 0x2224, 0x23A7, 0x2528, 0x26A8,
+   0x2827, 0x29A4, 0x2B1F, 0x2C99, 0x2E11, 0x2F87, 0x30FC, 0x326E, 0x33DF, 0x354E, 0x36BA, 0x3825, 0x398D,
+   0x3AF3, 0x3C57, 0x3DB8, 0x3F17, 0x4074, 0x41CE, 0x4326, 0x447B, 0x45CD, 0x471D, 0x486A, 0x49B4, 0x4AFB,
+   0x4C40, 0x4D81, 0x4EC0, 0x4FFB, 0x5134, 0x5269, 0x539B, 0x54CA, 0x55F6, 0x571E, 0x5843, 0x5964, 0x5A82,
+   0x5B9D, 0x5CB4, 0x5DC8, 0x5ED7, 0x5FE4, 0x60EC, 0x61F1, 0x62F2, 0x63EF, 0x64E9, 0x65DE, 0x66D0, 0x67BD,
+   0x68A7, 0x698C, 0x6A6E, 0x6B4B, 0x6C24, 0x6CF9, 0x6DCA, 0x6E97, 0x6F5F, 0x7023, 0x70E3, 0x719E, 0x7255,
+   0x7308, 0x73B6, 0x7460, 0x7505, 0x75A6, 0x7642, 0x76D9, 0x776C, 0x77FB, 0x7885, 0x790A, 0x798A, 0x7A06,
+   0x7A7D, 0x7AEF, 0x7B5D, 0x7BC6, 0x7C2A, 0x7C89, 0x7CE4, 0x7D3A, 0x7D8A, 0x7DD6, 0x7E1E, 0x7E60, 0x7E9D,
+   0x7ED6, 0x7F0A, 0x7F38, 0x7F62, 0x7F87, 0x7FA7, 0x7FC2, 0x7FD9, 0x7FEA, 0x7FF6, 0x7FFE, 0x7FFF, 0x7FFE,
+   0x7FF6, 0x7FEA, 0x7FD9, 0x7FC2, 0x7FA7, 0x7F87, 0x7F62, 0x7F38, 0x7F0A, 0x7ED6, 0x7E9D, 0x7E60, 0x7E1E,
+   0x7DD6, 0x7D8A, 0x7D3A, 0x7CE4, 0x7C89, 0x7C2A, 0x7BC6, 0x7B5D, 0x7AEF, 0x7A7D, 0x7A06, 0x798A, 0x790A,
+   0x7885, 0x77FB, 0x776C, 0x76D9, 0x7642, 0x75A6, 0x7505, 0x7460, 0x73B6, 0x7308, 0x7255, 0x719E, 0x70E3,
+   0x7023, 0x6F5F, 0x6E97, 0x6DCA, 0x6CF9, 0x6C24, 0x6B4B, 0x6A6E, 0x698C, 0x68A7, 0x67BD, 0x66D0, 0x65DE,
+   0x64E9, 0x63EF, 0x62F2, 0x61F1, 0x60EC, 0x5FE4, 0x5ED7, 0x5DC8, 0x5CB4, 0x5B9D, 0x5A82, 0x5964, 0x5843,
+   0x571E, 0x55F6, 0x54CA, 0x539B, 0x5269, 0x5134, 0x4FFB, 0x4EC0, 0x4D81, 0x4C40, 0x4AFB, 0x49B4, 0x486A,
+   0x471D, 0x45CD, 0x447B, 0x4326, 0x41CE, 0x4074, 0x3F17, 0x3DB8, 0x3C57, 0x3AF3, 0x398D, 0x3825, 0x36BA,
+   0x354E, 0x33DF, 0x326E, 0x30FC, 0x2F87, 0x2E11, 0x2C99, 0x2B1F, 0x29A4, 0x2827, 0x26A8, 0x2528, 0x23A7,
+   0x2224, 0x209F, 0x1F1A, 0x1D93, 0x1C0C, 0x1A83, 0x18F9, 0x176E, 0x15E2, 0x1455, 0x12C8, 0x113A, 0x0FAB,
+   0x0E1C, 0x0C8C, 0x0AFB, 0x096B, 0x07D9, 0x0648, 0x04B6, 0x0324, 0x0192, 0x0000, 0xFE6E, 0xFCDC, 0xFB4A,
+   0xF9B8, 0xF827, 0xF695, 0xF505, 0xF374, 0xF1E4, 0xF055, 0xEEC6, 0xED38, 0xEBAB, 0xEA1E, 0xE892, 0xE707,
+   0xE57D, 0xE3F4, 0xE26D, 0xE0E6, 0xDF61, 0xDDDC, 0xDC59, 0xDAD8, 0xD958, 0xD7D9, 0xD65C, 0xD4E1, 0xD367,
+   0xD1EF, 0xD079, 0xCF04, 0xCD92, 0xCC21, 0xCAB2, 0xC946, 0xC7DB, 0xC673, 0xC50D, 0xC3A9, 0xC248, 0xC0E9,
+   0xBF8C, 0xBE32, 0xBCDA, 0xBB85, 0xBA33, 0xB8E3, 0xB796, 0xB64C, 0xB505, 0xB3C0, 0xB27F, 0xB140, 0xB005,
+   0xAECC, 0xAD97, 0xAC65, 0xAB36, 0xAA0A, 0xA8E2, 0xA7BD, 0xA69C, 0xA57E, 0xA463, 0xA34C, 0xA238, 0xA129,
+   0xA01C, 0x9F14, 0x9E0F, 0x9D0E, 0x9C11, 0x9B17, 0x9A22, 0x9930, 0x9843, 0x9759, 0x9674, 0x9592, 0x94B5,
+   0x93DC, 0x9307, 0x9236, 0x9169, 0x90A1, 0x8FDD, 0x8F1D, 0x8E62, 0x8DAB, 0x8CF8, 0x8C4A, 0x8BA0, 0x8AFB,
+   0x8A5A, 0x89BE, 0x8927, 0x8894, 0x8805, 0x877B, 0x86F6, 0x8676, 0x85FA, 0x8583, 0x8511, 0x84A3, 0x843A,
+   0x83D6, 0x8377, 0x831C, 0x82C6, 0x8276, 0x822A, 0x81E2, 0x81A0, 0x8163, 0x812A, 0x80F6, 0x80C8, 0x809E,
+   0x8079, 0x8059, 0x803E, 0x8027, 0x8016, 0x800A, 0x8002, 0x8000, 0x8002, 0x800A, 0x8016, 0x8027, 0x803E,
+   0x8059, 0x8079, 0x809E, 0x80C8, 0x80F6, 0x812A, 0x8163, 0x81A0, 0x81E2, 0x822A, 0x8276, 0x82C6, 0x831C,
+   0x8377, 0x83D6, 0x843A, 0x84A3, 0x8511, 0x8583, 0x85FA, 0x8676, 0x86F6, 0x877B, 0x8805, 0x8894, 0x8927,
+   0x89BE, 0x8A5A, 0x8AFB, 0x8BA0, 0x8C4A, 0x8CF8, 0x8DAB, 0x8E62, 0x8F1D, 0x8FDD, 0x90A1, 0x9169, 0x9236,
+   0x9307, 0x93DC, 0x94B5, 0x9592, 0x9674, 0x9759, 0x9843, 0x9930, 0x9A22, 0x9B17, 0x9C11, 0x9D0E, 0x9E0F,
+   0x9F14, 0xA01C, 0xA129, 0xA238, 0xA34C, 0xA463, 0xA57E, 0xA69C, 0xA7BD, 0xA8E2, 0xAA0A, 0xAB36, 0xAC65,
+   0xAD97, 0xAECC, 0xB005, 0xB140, 0xB27F, 0xB3C0, 0xB505, 0xB64C, 0xB796, 0xB8E3, 0xBA33, 0xBB85, 0xBCDA,
+   0xBE32, 0xBF8C, 0xC0E9, 0xC248, 0xC3A9, 0xC50D, 0xC673, 0xC7DB, 0xC946, 0xCAB2, 0xCC21, 0xCD92, 0xCF04,
+   0xD079, 0xD1EF, 0xD367, 0xD4E1, 0xD65C, 0xD7D9, 0xD958, 0xDAD8, 0xDC59, 0xDDDC, 0xDF61, 0xE0E6, 0xE26D,
+   0xE3F4, 0xE57D, 0xE707, 0xE892, 0xEA1E, 0xEBAB, 0xED38, 0xEEC6, 0xF055, 0xF1E4, 0xF374, 0xF505, 0xF695,
+   0xF827, 0xF9B8, 0xFB4A, 0xFCDC, 0xFE6E, 0x0000
+};

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/CommonTables/arm_const_structs.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/CommonTables/arm_const_structs.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,156 @@
+/* ---------------------------------------------------------------------- 
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved. 
+* 
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5
+* 
+* Project: 	    CMSIS DSP Library 
+* Title:	    arm_const_structs.c 
+* 
+* Description:	This file has constant structs that are initialized for
+*              user convenience.  For example, some can be given as 
+*              arguments to the arm_cfft_f32() function.
+* 
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_const_structs.h"
+
+//Floating-point structs
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len16 = {
+	16, twiddleCoef_16, armBitRevIndexTable16, ARMBITREVINDEXTABLE__16_TABLE_LENGTH
+};
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len32 = {
+	32, twiddleCoef_32, armBitRevIndexTable32, ARMBITREVINDEXTABLE__32_TABLE_LENGTH
+};
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len64 = {
+	64, twiddleCoef_64, armBitRevIndexTable64, ARMBITREVINDEXTABLE__64_TABLE_LENGTH
+};
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len128 = {
+	128, twiddleCoef_128, armBitRevIndexTable128, ARMBITREVINDEXTABLE_128_TABLE_LENGTH
+};
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len256 = {
+	256, twiddleCoef_256, armBitRevIndexTable256, ARMBITREVINDEXTABLE_256_TABLE_LENGTH
+};
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len512 = {
+	512, twiddleCoef_512, armBitRevIndexTable512, ARMBITREVINDEXTABLE_512_TABLE_LENGTH
+};
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len1024 = {
+	1024, twiddleCoef_1024, armBitRevIndexTable1024, ARMBITREVINDEXTABLE1024_TABLE_LENGTH
+};
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len2048 = {
+	2048, twiddleCoef_2048, armBitRevIndexTable2048, ARMBITREVINDEXTABLE2048_TABLE_LENGTH
+};
+
+const arm_cfft_instance_f32 arm_cfft_sR_f32_len4096 = {
+	4096, twiddleCoef_4096, armBitRevIndexTable4096, ARMBITREVINDEXTABLE4096_TABLE_LENGTH
+};
+
+//Fixed-point structs
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len16 = {
+	16, twiddleCoef_16_q31, armBitRevIndexTable_fixed_16, ARMBITREVINDEXTABLE_FIXED___16_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len32 = {
+	32, twiddleCoef_32_q31, armBitRevIndexTable_fixed_32, ARMBITREVINDEXTABLE_FIXED___32_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len64 = {
+	64, twiddleCoef_64_q31, armBitRevIndexTable_fixed_64, ARMBITREVINDEXTABLE_FIXED___64_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len128 = {
+	128, twiddleCoef_128_q31, armBitRevIndexTable_fixed_128, ARMBITREVINDEXTABLE_FIXED__128_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len256 = {
+	256, twiddleCoef_256_q31, armBitRevIndexTable_fixed_256, ARMBITREVINDEXTABLE_FIXED__256_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len512 = {
+	512, twiddleCoef_512_q31, armBitRevIndexTable_fixed_512, ARMBITREVINDEXTABLE_FIXED__512_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len1024 = {
+	1024, twiddleCoef_1024_q31, armBitRevIndexTable_fixed_1024, ARMBITREVINDEXTABLE_FIXED_1024_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len2048 = {
+	2048, twiddleCoef_2048_q31, armBitRevIndexTable_fixed_2048, ARMBITREVINDEXTABLE_FIXED_2048_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q31 arm_cfft_sR_q31_len4096 = {
+	4096, twiddleCoef_4096_q31, armBitRevIndexTable_fixed_4096, ARMBITREVINDEXTABLE_FIXED_4096_TABLE_LENGTH
+};
+
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len16 = {
+	16, twiddleCoef_16_q15, armBitRevIndexTable_fixed_16, ARMBITREVINDEXTABLE_FIXED___16_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len32 = {
+	32, twiddleCoef_32_q15, armBitRevIndexTable_fixed_32, ARMBITREVINDEXTABLE_FIXED___32_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len64 = {
+	64, twiddleCoef_64_q15, armBitRevIndexTable_fixed_64, ARMBITREVINDEXTABLE_FIXED___64_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len128 = {
+	128, twiddleCoef_128_q15, armBitRevIndexTable_fixed_128, ARMBITREVINDEXTABLE_FIXED__128_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len256 = {
+	256, twiddleCoef_256_q15, armBitRevIndexTable_fixed_256, ARMBITREVINDEXTABLE_FIXED__256_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len512 = {
+	512, twiddleCoef_512_q15, armBitRevIndexTable_fixed_512, ARMBITREVINDEXTABLE_FIXED__512_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len1024 = {
+	1024, twiddleCoef_1024_q15, armBitRevIndexTable_fixed_1024, ARMBITREVINDEXTABLE_FIXED_1024_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len2048 = {
+	2048, twiddleCoef_2048_q15, armBitRevIndexTable_fixed_2048, ARMBITREVINDEXTABLE_FIXED_2048_TABLE_LENGTH
+};
+
+const arm_cfft_instance_q15 arm_cfft_sR_q15_len4096 = {
+	4096, twiddleCoef_4096_q15, armBitRevIndexTable_fixed_4096, ARMBITREVINDEXTABLE_FIXED_4096_TABLE_LENGTH
+};

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,182 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_conj_f32.c    
+*    
+* Description:	Floating-point complex conjugate.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupCmplxMath        
+ */
+
+/**        
+ * @defgroup cmplx_conj Complex Conjugate        
+ *        
+ * Conjugates the elements of a complex data vector.        
+ *       
+ * The <code>pSrc</code> points to the source data and        
+ * <code>pDst</code> points to where the result should be written.
+ * <code>numSamples</code> specifies the number of complex samples        
+ * and the data in each array is stored in an interleaved fashion        
+ * (real, imag, real, imag, ...).        
+ * Each array has a total of <code>2*numSamples</code> values.        
+ * The underlying algorithm is:
+ *
+ * <pre>
+ * for(n=0; n<numSamples; n++) {
+ *     pDst[(2*n)+0] = pSrc[(2*n)+0];     // real part
+ *     pDst[(2*n)+1] = -pSrc[(2*n)+1];    // imag part
+ * }        
+ * </pre>        
+ *        
+ * There are separate functions for floating-point, Q15, and Q31 data types.        
+ */
+
+/**        
+ * @addtogroup cmplx_conj        
+ * @{        
+ */
+
+/**        
+ * @brief  Floating-point complex conjugate.        
+ * @param  *pSrc points to the input vector        
+ * @param  *pDst points to the output vector        
+ * @param  numSamples number of complex samples in each vector        
+ * @return none.        
+ */
+
+void arm_cmplx_conj_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t numSamples)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t inR1, inR2, inR3, inR4;
+  float32_t inI1, inI2, inI3, inI4;
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[0]+jC[1] = A[0]+ j (-1) A[1] */
+    /* Calculate Complex Conjugate and then store the results in the destination buffer. */
+    /* read real input samples */
+    inR1 = pSrc[0];
+    /* store real samples to destination */
+    pDst[0] = inR1;
+    inR2 = pSrc[2];
+    pDst[2] = inR2;
+    inR3 = pSrc[4];
+    pDst[4] = inR3;
+    inR4 = pSrc[6];
+    pDst[6] = inR4;
+
+    /* read imaginary input samples */
+    inI1 = pSrc[1];
+    inI2 = pSrc[3];
+
+    /* conjugate input */
+    inI1 = -inI1;
+
+    /* read imaginary input samples */
+    inI3 = pSrc[5];
+
+    /* conjugate input */
+    inI2 = -inI2;
+
+    /* read imaginary input samples */
+    inI4 = pSrc[7];
+
+    /* conjugate input */
+    inI3 = -inI3;
+
+    /* store imaginary samples to destination */
+    pDst[1] = inI1;
+    pDst[3] = inI2;
+
+    /* conjugate input */
+    inI4 = -inI4;
+
+    /* store imaginary samples to destination */
+    pDst[5] = inI3;
+
+    /* increment source pointer by 8 to process next samples */
+    pSrc += 8u;
+
+    /* store imaginary sample to destination */
+    pDst[7] = inI4;
+
+    /* increment destination pointer by 8 to store next samples */
+    pDst += 8u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* realOut + j (imagOut) = realIn + j (-1) imagIn */
+    /* Calculate Complex Conjugate and then store the results in the destination buffer. */
+    *pDst++ = *pSrc++;
+    *pDst++ = -*pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of cmplx_conj group        
+ */
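Note on arm_cmplx_conj_f32.c: the file above documents a one-sample-per-iteration algorithm but implements it as groups of four samples plus a tail loop. The sketch below is a portable illustration of that split, not the library code. The names cmplx_conj_ref and cmplx_conj_blocked are hypothetical, and the instruction interleaving of the original is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not a CMSIS-DSP function): one complex
   sample per iteration, as in the documented pseudocode. */
void cmplx_conj_ref(const float *src, float *dst, size_t numSamples)
{
    assert(src != NULL && dst != NULL);
    for (size_t n = 0; n < numSamples; n++) {
        dst[2 * n]     = src[2 * n];       /* real part is copied */
        dst[2 * n + 1] = -src[2 * n + 1];  /* imaginary part changes sign */
    }
}

/* Hypothetical sketch of the block/tail split used by the unrolled path. */
void cmplx_conj_blocked(const float *src, float *dst, size_t numSamples)
{
    assert(src != NULL && dst != NULL);

    size_t blkCnt = numSamples >> 2;       /* number of full groups of four */
    while (blkCnt > 0) {
        dst[0] = src[0];  dst[1] = -src[1];
        dst[2] = src[2];  dst[3] = -src[3];
        dst[4] = src[4];  dst[5] = -src[5];
        dst[6] = src[6];  dst[7] = -src[7];
        src += 8;                          /* 4 samples = 8 floats */
        dst += 8;
        blkCnt--;
    }

    blkCnt = numSamples % 4;               /* 0 to 3 leftover samples */
    while (blkCnt > 0) {
        *dst++ = *src++;
        *dst++ = -*src++;
        blkCnt--;
    }
}
```

Both helpers read each element before writing the element at the same index, so passing the same buffer as source and destination should also work in this sketch.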

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,161 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. October 2015
+* $Revision: 	V.1.4.5 a
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_conj_q15.c    
+*    
+* Description:	Q15 complex conjugate.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup cmplx_conj    
+ * @{    
+ */
+
+/**    
+ * @brief  Q15 complex conjugate.    
+ * @param  *pSrc points to the input vector    
+ * @param  *pDst points to the output vector    
+ * @param  numSamples number of complex samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * The Q15 value -1 (0x8000) will be saturated to the maximum allowable positive value 0x7FFF.    
+ */
+
+void arm_cmplx_conj_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t numSamples)
+{
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counter */
+  q31_t in1, in2, in3, in4;
+  q31_t zero = 0;
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[0]+jC[1] = A[0]+ j (-1) A[1] */
+    /* Calculate Complex Conjugate and then store the results in the destination buffer. */
+    in1 = *__SIMD32(pSrc)++;
+    in2 = *__SIMD32(pSrc)++;
+    in3 = *__SIMD32(pSrc)++;
+    in4 = *__SIMD32(pSrc)++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    in1 = __QASX(zero, in1);
+    in2 = __QASX(zero, in2);
+    in3 = __QASX(zero, in3);
+    in4 = __QASX(zero, in4);
+
+#else
+
+    in1 = __QSAX(zero, in1);
+    in2 = __QSAX(zero, in2);
+    in3 = __QSAX(zero, in3);
+    in4 = __QSAX(zero, in4);
+
+#endif /* #ifndef ARM_MATH_BIG_ENDIAN */
+
+    in1 = ((uint32_t) in1 >> 16) | ((uint32_t) in1 << 16);
+    in2 = ((uint32_t) in2 >> 16) | ((uint32_t) in2 << 16);
+    in3 = ((uint32_t) in3 >> 16) | ((uint32_t) in3 << 16);
+    in4 = ((uint32_t) in4 >> 16) | ((uint32_t) in4 << 16);
+
+    *__SIMD32(pDst)++ = in1;
+    *__SIMD32(pDst)++ = in2;
+    *__SIMD32(pDst)++ = in3;
+    *__SIMD32(pDst)++ = in4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[0]+jC[1] = A[0]+ j (-1) A[1] */
+    /* Calculate Complex Conjugate and then store the results in the destination buffer. */
+    *pDst++ = *pSrc++;
+    *pDst++ = __SSAT(-*pSrc++, 16);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  q15_t in;
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+    /* realOut + j (imagOut) = realIn+ j (-1) imagIn */
+    /* Calculate Complex Conjugate and then store the results in the destination buffer. */
+    *pDst++ = *pSrc++;
+    in = *pSrc++;
+    *pDst++ = (in == (q15_t) 0x8000) ? 0x7fff : -in;
+
+    /* Decrement the loop counter */
+    numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of cmplx_conj group    
+ */
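Note on arm_cmplx_conj_q15.c: the unrolled path above negates the imaginary halfword with __QASX(zero, in) followed by a halfword swap, which is not obvious at first read. The sketch below emulates that step in portable C for the little-endian layout (real part in the low halfword). It assumes the QASX semantics high = sat(a.high + b.low), low = sat(a.low - b.high). The names sat16, qasx_emul, conj_packed_q15 and conj_packed_q15_buf are hypothetical and are not CMSIS APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical emulation of QASX under the assumption stated above. */
uint32_t qasx_emul(uint32_t a, uint32_t b)
{
    /* The uint16_t to int16_t conversions assume two's complement wraparound. */
    int16_t a_lo = (int16_t)(uint16_t)(a & 0xFFFFu);
    int16_t a_hi = (int16_t)(uint16_t)(a >> 16);
    int16_t b_lo = (int16_t)(uint16_t)(b & 0xFFFFu);
    int16_t b_hi = (int16_t)(uint16_t)(b >> 16);

    uint16_t hi = (uint16_t)sat16((int32_t)a_hi + b_lo);
    uint16_t lo = (uint16_t)sat16((int32_t)a_lo - b_hi);
    return ((uint32_t)hi << 16) | lo;
}

/* Conjugate one packed complex Q15 sample: real in the low halfword,
   imaginary in the high halfword. */
uint32_t conj_packed_q15(uint32_t in)
{
    uint32_t t = qasx_emul(0u, in);   /* high = real, low = sat(-imag) */
    return (t >> 16) | (t << 16);     /* swap halves back into place */
}

/* Apply the packed conjugate to a buffer of packed samples. */
void conj_packed_q15_buf(const uint32_t *src, uint32_t *dst, size_t numSamples)
{
    assert(src != NULL && dst != NULL);
    for (size_t n = 0; n < numSamples; n++) {
        dst[n] = conj_packed_q15(src[n]);
    }
}
```

With real = 0x1234 and imag = 0x8000 (Q15 -1), the packed input 0x80001234 maps to 0x7FFF1234, which matches the saturation behavior described in the function's documentation.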

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_conj_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,180 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_conj_q31.c    
+*    
+* Description:	Q31 complex conjugate.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupCmplxMath        
+ */
+
+/**        
+ * @addtogroup cmplx_conj        
+ * @{        
+ */
+
+/**        
+ * @brief  Q31 complex conjugate.        
+ * @param  *pSrc points to the input vector        
+ * @param  *pDst points to the output vector        
+ * @param  numSamples number of complex samples in each vector        
+ * @return none.        
+ *        
+ * <b>Scaling and Overflow Behavior:</b>        
+ * \par        
+ * The function uses saturating arithmetic.        
+ * The Q31 value -1 (0x80000000) will be saturated to the maximum allowable positive value 0x7FFFFFFF.        
+ */
+
+void arm_cmplx_conj_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t numSamples)
+{
+  uint32_t blkCnt;                               /* loop counter */
+  q31_t in;                                      /* Input value */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t inR1, inR2, inR3, inR4;                  /* Temporary real variables */
+  q31_t inI1, inI2, inI3, inI4;                  /* Temporary imaginary variables */
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[0]+jC[1] = A[0]+ j (-1) A[1] */
+    /* Calculate Complex Conjugate and then store the results in the destination buffer. */
+    /* Saturated to 0x7fffffff if the input is -1(0x80000000) */
+    /* read real input sample */
+    inR1 = pSrc[0];
+    /* store real input sample */
+    pDst[0] = inR1;
+
+    /* read imaginary input sample */
+    inI1 = pSrc[1];
+
+    /* read real input sample */
+    inR2 = pSrc[2];
+    /* store real input sample */
+    pDst[2] = inR2;
+
+    /* read imaginary input sample */
+    inI2 = pSrc[3];
+
+    /* negate imaginary input sample */
+    inI1 = __QSUB(0, inI1);
+
+    /* read real input sample */
+    inR3 = pSrc[4];
+    /* store real input sample */
+    pDst[4] = inR3;
+
+    /* read imaginary input sample */
+    inI3 = pSrc[5];
+
+    /* negate imaginary input sample */
+    inI2 = __QSUB(0, inI2);
+
+    /* read real input sample */
+    inR4 = pSrc[6];
+    /* store real input sample */
+    pDst[6] = inR4;
+
+    /* negate imaginary input sample */
+    inI3 = __QSUB(0, inI3);
+
+    /* read imaginary input sample */
+    inI4 = pSrc[7];
+
+    /* store imaginary input samples */
+    pDst[1] = inI1;
+
+    /* negate imaginary input sample */
+    inI4 = __QSUB(0, inI4);
+
+    /* store imaginary input samples */
+    pDst[3] = inI2;
+
+    /* increment source pointer by 8 to process next samples */
+    pSrc += 8u;
+
+    /* store imaginary input samples */
+    pDst[5] = inI3;
+    pDst[7] = inI4;
+
+    /* increment destination pointer by 8 to process next samples */
+    pDst += 8u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  blkCnt = numSamples;
+
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C[0]+jC[1] = A[0]+ j (-1) A[1] */
+    /* Calculate Complex Conjugate and then store the results in the destination buffer. */
+    /* Saturated to 0x7fffffff if the input is -1(0x80000000) */
+    *pDst++ = *pSrc++;
+    in = *pSrc++;
+    *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of cmplx_conj group        
+ */
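Note on arm_cmplx_conj_q31.c: the unrolled path negates with __QSUB(0, x) while the tail loop uses an explicit INT32_MIN check. The sketch below is a portable illustration of why the two agree. It assumes that __QSUB is a saturating 32-bit subtract, and the names qsub_emul and cmplx_conj_q31_ref are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulation of a saturating 32-bit subtract: compute in
   64 bits, then clamp to the int32_t range. */
int32_t qsub_emul(int32_t a, int32_t b)
{
    int64_t d = (int64_t)a - (int64_t)b;
    if (d > INT32_MAX) return INT32_MAX;
    if (d < INT32_MIN) return INT32_MIN;
    return (int32_t)d;
}

/* Hypothetical reference conjugate for interleaved Q31 data. */
void cmplx_conj_q31_ref(const int32_t *src, int32_t *dst, size_t numSamples)
{
    assert(src != NULL && dst != NULL);
    for (size_t n = 0; n < numSamples; n++) {
        dst[2 * n]     = src[2 * n];                    /* real part copied as-is */
        dst[2 * n + 1] = qsub_emul(0, src[2 * n + 1]);  /* saturating negate */
    }
}
```

Only the imaginary part is saturated; a real part equal to INT32_MIN is copied unchanged, as in the original loops.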

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,203 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_dot_prod_f32.c    
+*    
+* Description:	Floating-point complex dot product    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @defgroup cmplx_dot_prod Complex Dot Product    
+ *    
+ * Computes the dot product of two complex vectors.    
+ * The vectors are multiplied element-by-element and then summed.    
+ *   
+ * The <code>pSrcA</code> points to the first complex input vector and    
+ * <code>pSrcB</code> points to the second complex input vector.    
+ * <code>numSamples</code> specifies the number of complex samples    
+ * and the data in each array is stored in an interleaved fashion    
+ * (real, imag, real, imag, ...).    
+ * Each array has a total of <code>2*numSamples</code> values.    
+ *    
+ * The underlying algorithm is:
+ * <pre>    
+ * realResult=0;    
+ * imagResult=0;    
+ * for(n=0; n<numSamples; n++) {    
+ *     realResult += pSrcA[(2*n)+0]*pSrcB[(2*n)+0] - pSrcA[(2*n)+1]*pSrcB[(2*n)+1];    
+ *     imagResult += pSrcA[(2*n)+0]*pSrcB[(2*n)+1] + pSrcA[(2*n)+1]*pSrcB[(2*n)+0];    
+ * }    
+ * </pre>    
+ *    
+ * There are separate functions for floating-point, Q15, and Q31 data types.    
+ */
+
+/**    
+ * @addtogroup cmplx_dot_prod    
+ * @{    
+ */
+
+/**    
+ * @brief  Floating-point complex dot product    
+ * @param  *pSrcA points to the first input vector    
+ * @param  *pSrcB points to the second input vector    
+ * @param  numSamples number of complex samples in each vector    
+ * @param  *realResult real part of the result returned here    
+ * @param  *imagResult imaginary part of the result returned here    
+ * @return none.    
+ */
+
+void arm_cmplx_dot_prod_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  uint32_t numSamples,
+  float32_t * realResult,
+  float32_t * imagResult)
+{
+  float32_t real_sum = 0.0f, imag_sum = 0.0f;    /* Temporary result storage */
+  float32_t a0,b0,c0,d0;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counter */
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Process 4 complex samples per iteration.
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += a0 * c0;
+      imag_sum += a0 * d0;
+      real_sum -= b0 * d0;
+      imag_sum += b0 * c0;
+    
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++; 
+  
+      real_sum += a0 * c0;
+      imag_sum += a0 * d0;
+      real_sum -= b0 * d0;
+      imag_sum += b0 * c0;
+      
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += a0 * c0;
+      imag_sum += a0 * d0;
+      real_sum -= b0 * d0;
+      imag_sum += b0 * c0;
+    
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++; 
+  
+      real_sum += a0 * c0;
+      imag_sum += a0 * d0;
+      real_sum -= b0 * d0;
+      imag_sum += b0 * c0;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = numSamples & 0x3u;
+
+  while(blkCnt > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += a0 * c0;
+      imag_sum += a0 * d0;
+      real_sum -= b0 * d0;
+      imag_sum += b0 * c0;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += a0 * c0;
+      imag_sum += a0 * d0;
+      real_sum -= b0 * d0;
+      imag_sum += b0 * c0;
+
+      /* Decrement the loop counter */
+      numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  /* Store the real and imaginary results in the destination buffers */
+  *realResult = real_sum;
+  *imagResult = imag_sum;
+}
+
+/**    
+ * @} end of cmplx_dot_prod group    
+ */
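Note on arm_cmplx_dot_prod_f32.c: the documented algorithm multiplies the two vectors element by element without conjugating either operand, so it is not a Hermitian inner product. The sketch below is a minimal portable reference for checking results by hand; the name cmplx_dot_prod_ref is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper: sum over n of A[n] * B[n] for interleaved
   (real, imag) data, with no conjugation of either operand. */
void cmplx_dot_prod_ref(const float *a, const float *b, size_t numSamples,
                        float *realResult, float *imagResult)
{
    assert(a != NULL && b != NULL);
    assert(realResult != NULL && imagResult != NULL);

    float real_sum = 0.0f;
    float imag_sum = 0.0f;

    for (size_t n = 0; n < numSamples; n++) {
        float ar = a[2 * n], ai = a[2 * n + 1];
        float br = b[2 * n], bi = b[2 * n + 1];

        /* (ar + j*ai) * (br + j*bi) */
        real_sum += ar * br - ai * bi;
        imag_sum += ar * bi + ai * br;
    }

    *realResult = real_sum;
    *imagResult = imag_sum;
}
```

For A = {1+2j, 3+4j} and B = {5+6j, 7+8j} the products are -7+16j and -11+52j, so the result is -18+68j.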

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,189 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_dot_prod_q15.c    
+*    
+* Description:	Processing function for the Q15 Complex Dot product    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup cmplx_dot_prod    
+ * @{    
+ */
+
+/**    
+ * @brief  Q15 complex dot product    
+ * @param  *pSrcA points to the first input vector    
+ * @param  *pSrcB points to the second input vector    
+ * @param  numSamples number of complex samples in each vector    
+ * @param  *realResult real part of the result returned here    
+ * @param  *imagResult imaginary part of the result returned here    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The intermediate 1.15 by 1.15 multiplications are performed with full precision and yield a 2.30 result.    
+ * These are accumulated in a 64-bit accumulator with 34.30 precision.    
+ * As a final step, the accumulators are converted to 8.24 format.    
+ * The return results <code>realResult</code> and <code>imagResult</code> are in 8.24 format.    
+ */
+
+void arm_cmplx_dot_prod_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  uint32_t numSamples,
+  q31_t * realResult,
+  q31_t * imagResult)
+{
+  q63_t real_sum = 0, imag_sum = 0;              /* Temporary result storage */
+  q15_t a0,b0,c0,d0;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Process 4 complex samples per iteration.
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += (q31_t)a0 * c0;
+      imag_sum += (q31_t)a0 * d0;
+      real_sum -= (q31_t)b0 * d0;
+      imag_sum += (q31_t)b0 * c0;
+      
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += (q31_t)a0 * c0;
+      imag_sum += (q31_t)a0 * d0;
+      real_sum -= (q31_t)b0 * d0;
+      imag_sum += (q31_t)b0 * c0;
+      
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += (q31_t)a0 * c0;
+      imag_sum += (q31_t)a0 * d0;
+      real_sum -= (q31_t)b0 * d0;
+      imag_sum += (q31_t)b0 * c0;
+      
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += (q31_t)a0 * c0;
+      imag_sum += (q31_t)a0 * d0;
+      real_sum -= (q31_t)b0 * d0;
+      imag_sum += (q31_t)b0 * c0;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += (q31_t)a0 * c0;
+      imag_sum += (q31_t)a0 * d0;
+      real_sum -= (q31_t)b0 * d0;
+      imag_sum += (q31_t)b0 * c0;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += a0 * c0;
+      imag_sum += a0 * d0;
+      real_sum -= b0 * d0;
+      imag_sum += b0 * c0;
+
+
+      /* Decrement the loop counter */
+      numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  /* Store the real and imaginary results in 8.24 format  */
+  /* Convert real data in 34.30 to 8.24 by 6 right shifts */
+  *realResult = (q31_t) (real_sum >> 6);
+  /* Convert imaginary data in 34.30 to 8.24 by 6 right shifts */
+  *imagResult = (q31_t) (imag_sum >> 6);
+}
+
+/**    
+ * @} end of cmplx_dot_prod group    
+ */
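Reviewer note (not part of the changeset): the 34.30 to 8.24 scaling described above can be checked against a plain-C reference. This is a minimal sketch that assumes interleaved (real, imag) input, as the routine above reads it. The names ref_cmplx_dot_prod_q15 and q8_24_to_double are hypothetical helpers, not CMSIS-DSP APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not a CMSIS-DSP API).
 * Accumulates sum(a[n] * b[n]) over complex Q15 samples in 34.30 format,
 * then shifts right by 6 to return 8.24 results, as the routine above does. */
void ref_cmplx_dot_prod_q15(const int16_t *a, const int16_t *b,
                            uint32_t num_samples,
                            int32_t *real_out, int32_t *imag_out)
{
    int64_t real_sum = 0, imag_sum = 0;   /* 34.30 accumulators */

    assert(a != NULL && b != NULL && real_out != NULL && imag_out != NULL);

    for (uint32_t n = 0; n < num_samples; n++) {
        int32_t ar = a[2 * n], ai = a[2 * n + 1];
        int32_t br = b[2 * n], bi = b[2 * n + 1];

        /* (ar + j*ai) * (br + j*bi) */
        real_sum += (int64_t)ar * br - (int64_t)ai * bi;
        imag_sum += (int64_t)ar * bi + (int64_t)ai * br;
    }

    /* Right shift of a negative value is implementation-defined in C;
     * this sketch assumes an arithmetic shift, as the original code does. */
    *real_out = (int32_t)(real_sum >> 6);
    *imag_out = (int32_t)(imag_sum >> 6);
}

/* Hypothetical helper: interpret a raw 8.24 value as a real number. */
double q8_24_to_double(int32_t x)
{
    return (double)x / 16777216.0;   /* divide by 2^24 */
}
```

For a = 0.5 + 0.5j and b = 0.5 - 0.5j (raw Q15 values 16384, 16384 and 16384, -16384) the product is 0.5 + 0j, so the real result is the raw 8.24 value 8388608 (2^23) and the imaginary result is 0.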

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_dot_prod_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,187 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_dot_prod_q31.c    
+*    
+* Description:	Q31 complex dot product    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup cmplx_dot_prod    
+ * @{    
+ */
+
+/**    
+ * @brief  Q31 complex dot product    
+ * @param  *pSrcA points to the first input vector    
+ * @param  *pSrcB points to the second input vector    
+ * @param  numSamples number of complex samples in each vector    
+ * @param  *realResult real part of the result returned here    
+ * @param  *imagResult imaginary part of the result returned here    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The intermediate 1.31 by 1.31 multiplications are performed with 64-bit precision and then shifted to 16.48 format.    
+ * The internal real and imaginary accumulators are in 16.48 format and provide 15 guard bits.    
+ * Additions are nonsaturating and no overflow will occur as long as <code>numSamples</code> is less than 32768.    
+ * The return results <code>realResult</code> and <code>imagResult</code> are in 16.48 format.    
+ * Input down scaling is not required.    
+ */
+
+void arm_cmplx_dot_prod_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  uint32_t numSamples,
+  q63_t * realResult,
+  q63_t * imagResult)
+{
+  q63_t real_sum = 0, imag_sum = 0;              /* Temporary result storage */
+  q31_t a0,b0,c0,d0;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling: accumulate 4 complex samples per iteration.
+   ** A second loop below handles the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += ((q63_t)a0 * c0) >> 14;
+      imag_sum += ((q63_t)a0 * d0) >> 14;
+      real_sum -= ((q63_t)b0 * d0) >> 14;
+      imag_sum += ((q63_t)b0 * c0) >> 14;
+      
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += ((q63_t)a0 * c0) >> 14;
+      imag_sum += ((q63_t)a0 * d0) >> 14;
+      real_sum -= ((q63_t)b0 * d0) >> 14;
+      imag_sum += ((q63_t)b0 * c0) >> 14;
+      
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += ((q63_t)a0 * c0) >> 14;
+      imag_sum += ((q63_t)a0 * d0) >> 14;
+      real_sum -= ((q63_t)b0 * d0) >> 14;
+      imag_sum += ((q63_t)b0 * c0) >> 14;
+      
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += ((q63_t)a0 * c0) >> 14;
+      imag_sum += ((q63_t)a0 * d0) >> 14;
+      real_sum -= ((q63_t)b0 * d0) >> 14;
+      imag_sum += ((q63_t)b0 * c0) >> 14;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+  }
+
+  /* If numSamples is not a multiple of 4, process the remaining samples here.
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += ((q63_t)a0 * c0) >> 14;
+      imag_sum += ((q63_t)a0 * d0) >> 14;
+      real_sum -= ((q63_t)b0 * d0) >> 14;
+      imag_sum += ((q63_t)b0 * c0) >> 14;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+      a0 = *pSrcA++;
+      b0 = *pSrcA++;
+      c0 = *pSrcB++;
+      d0 = *pSrcB++;  
+  
+      real_sum += ((q63_t)a0 * c0) >> 14;
+      imag_sum += ((q63_t)a0 * d0) >> 14;
+      real_sum -= ((q63_t)b0 * d0) >> 14;
+      imag_sum += ((q63_t)b0 * c0) >> 14;
+
+      /* Decrement the loop counter */
+      numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  /* Store the real and imaginary results in 16.48 format  */
+  *realResult = real_sum;
+  *imagResult = imag_sum;
+}
+
+/**    
+ * @} end of cmplx_dot_prod group    
+ */
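Reviewer note (not part of the changeset): a caller has to interpret the 16.48 results itself. The sketch below assumes the same interleaved layout and the same per-product shift by 14 as the routine above. The names ref_cmplx_dot_prod_q31 and q16_48_to_double are hypothetical, not CMSIS-DSP APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not a CMSIS-DSP API).
 * Each 1.31 x 1.31 product is a 2.62 value; shifting right by 14 gives
 * the 16.48 format used by the accumulators and by the results. */
void ref_cmplx_dot_prod_q31(const int32_t *a, const int32_t *b,
                            uint32_t num_samples,
                            int64_t *real_out, int64_t *imag_out)
{
    int64_t real_sum = 0, imag_sum = 0;   /* 16.48 accumulators */

    assert(a != NULL && b != NULL && real_out != NULL && imag_out != NULL);

    for (uint32_t n = 0; n < num_samples; n++) {
        int64_t ar = a[2 * n], ai = a[2 * n + 1];
        int64_t br = b[2 * n], bi = b[2 * n + 1];

        /* Shifting a negative product assumes an arithmetic right shift,
         * which is implementation-defined in C, as in the original code. */
        real_sum += (ar * br) >> 14;
        imag_sum += (ar * bi) >> 14;
        real_sum -= (ai * bi) >> 14;
        imag_sum += (ai * br) >> 14;
    }

    *real_out = real_sum;
    *imag_out = imag_sum;
}

/* Hypothetical helper: interpret a raw 16.48 value as a real number. */
double q16_48_to_double(int64_t x)
{
    return (double)x / 281474976710656.0;   /* divide by 2^48 */
}
```

With a = b = 0.5 + 0j (raw Q31 value 0x40000000) the real result is the raw value 2^46, which q16_48_to_double maps to 0.25.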

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,165 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_mag_f32.c    
+*    
+* Description:	Floating-point complex magnitude.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @defgroup cmplx_mag Complex Magnitude    
+ *    
+ * Computes the magnitude of the elements of a complex data vector.    
+ *   
+ * The <code>pSrc</code> points to the source data and    
+ * <code>pDst</code> points to where the result should be written.
+ * <code>numSamples</code> specifies the number of complex samples    
+ * in the input array and the data is stored in an interleaved fashion    
+ * (real, imag, real, imag, ...).    
+ * The input array has a total of <code>2*numSamples</code> values;    
+ * the output array has a total of <code>numSamples</code> values.    
+ * The underlying algorithm is:
+ *    
+ * <pre>    
+ * for(n=0; n<numSamples; n++) {    
+ *     pDst[n] = sqrt(pSrc[(2*n)+0]^2 + pSrc[(2*n)+1]^2);    
+ * }    
+ * </pre>    
+ *    
+ * There are separate functions for floating-point, Q15, and Q31 data types.    
+ */
+
+/**    
+ * @addtogroup cmplx_mag    
+ * @{    
+ */
+/**    
+ * @brief Floating-point complex magnitude.    
+ * @param[in]       *pSrc points to complex input buffer    
+ * @param[out]      *pDst points to real output buffer    
+ * @param[in]       numSamples number of complex samples in the input vector    
+ * @return none.    
+ *    
+ */
+
+
+void arm_cmplx_mag_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t numSamples)
+{
+  float32_t realIn, imagIn;                      /* Temporary variables to hold input values */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counter */
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+
+    /* C[0] = sqrt(A[0] * A[0] + A[1] * A[1]) */
+    realIn = *pSrc++;
+    imagIn = *pSrc++;
+    /* store the result in the destination buffer. */
+    arm_sqrt_f32((realIn * realIn) + (imagIn * imagIn), pDst++);
+
+    realIn = *pSrc++;
+    imagIn = *pSrc++;
+    arm_sqrt_f32((realIn * realIn) + (imagIn * imagIn), pDst++);
+
+    realIn = *pSrc++;
+    imagIn = *pSrc++;
+    arm_sqrt_f32((realIn * realIn) + (imagIn * imagIn), pDst++);
+
+    realIn = *pSrc++;
+    imagIn = *pSrc++;
+    arm_sqrt_f32((realIn * realIn) + (imagIn * imagIn), pDst++);
+
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[0] = sqrt(A[0] * A[0] + A[1] * A[1]) */
+    realIn = *pSrc++;
+    imagIn = *pSrc++;
+    /* store the result in the destination buffer. */
+    arm_sqrt_f32((realIn * realIn) + (imagIn * imagIn), pDst++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+    /* out = sqrt((real * real) + (imag * imag)) */
+    realIn = *pSrc++;
+    imagIn = *pSrc++;
+    /* store the result in the destination buffer. */
+    arm_sqrt_f32((realIn * realIn) + (imagIn * imagIn), pDst++);
+
+    /* Decrement the loop counter */
+    numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of cmplx_mag group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,153 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_mag_q15.c    
+*    
+* Description:	Q15 complex magnitude.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup cmplx_mag    
+ * @{    
+ */
+
+
+/**    
+ * @brief  Q15 complex magnitude    
+ * @param  *pSrc points to the complex input vector    
+ * @param  *pDst points to the real output vector    
+ * @param  numSamples number of complex samples in the input vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function performs 1.15 by 1.15 multiplications, and the final output is converted to 2.14 format.
+ */
+
+void arm_cmplx_mag_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t numSamples)
+{
+  q31_t acc0, acc1;                              /* Accumulators */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counter */
+  q31_t in1, in2, in3, in4;
+  q31_t acc2, acc3;
+
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+
+    /* C[0] = sqrt(A[0] * A[0] + A[1] * A[1]) */
+    in1 = *__SIMD32(pSrc)++;
+    in2 = *__SIMD32(pSrc)++;
+    in3 = *__SIMD32(pSrc)++;
+    in4 = *__SIMD32(pSrc)++;
+
+    acc0 = __SMUAD(in1, in1);
+    acc1 = __SMUAD(in2, in2);
+    acc2 = __SMUAD(in3, in3);
+    acc3 = __SMUAD(in4, in4);
+
+    /* store the result in 2.14 format in the destination buffer. */
+    arm_sqrt_q15((q15_t) ((acc0) >> 17), pDst++);
+    arm_sqrt_q15((q15_t) ((acc1) >> 17), pDst++);
+    arm_sqrt_q15((q15_t) ((acc2) >> 17), pDst++);
+    arm_sqrt_q15((q15_t) ((acc3) >> 17), pDst++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[0] = sqrt(A[0] * A[0] + A[1] * A[1]) */
+    in1 = *__SIMD32(pSrc)++;
+    acc0 = __SMUAD(in1, in1);
+
+    /* store the result in 2.14 format in the destination buffer. */
+    arm_sqrt_q15((q15_t) (acc0 >> 17), pDst++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q15_t real, imag;                              /* Temporary variables to hold input values */
+
+  while(numSamples > 0u)
+  {
+    /* out = sqrt(real * real + imag * imag) */
+    real = *pSrc++;
+    imag = *pSrc++;
+
+    acc0 = (real * real);
+    acc1 = (imag * imag);
+
+    /* store the result in 2.14 format in the destination buffer. */
+    arm_sqrt_q15((q15_t) (((q63_t) acc0 + acc1) >> 17), pDst++);
+
+    /* Decrement the loop counter */
+    numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of cmplx_mag group    
+ */
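Reviewer note (not part of the changeset): the output is in 2.14 format because the magnitude of a full-scale complex Q15 sample can reach sqrt(2), which does not fit in 1.15. The sketch below illustrates the scaling with an integer square root swapped in for arm_sqrt_q15, so its low-order bits may differ from the library result. The sum of squares is a 2.30 value acc, and sqrt(acc) / 2 is the magnitude in 2.14. The names isqrt_u32 and ref_cmplx_mag_q15 are hypothetical, not CMSIS-DSP APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: floor(sqrt(x)) by the bitwise digit-by-digit method. */
uint32_t isqrt_u32(uint32_t x)
{
    uint32_t r = 0;
    uint32_t bit = 1u << 30;          /* highest power of 4 in 32 bits */

    while (bit > x) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (x >= r + bit) {
            x -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}

/* Hypothetical reference (not a CMSIS-DSP API): Q15 complex magnitude
 * with 2.14 output, for interleaved (real, imag) input. */
void ref_cmplx_mag_q15(const int16_t *src, int16_t *dst, uint32_t num_samples)
{
    assert(src != NULL && dst != NULL);

    for (uint32_t n = 0; n < num_samples; n++) {
        int32_t re = src[2 * n], im = src[2 * n + 1];

        /* Each square is at most 2^30, so the 2.30 sum fits in 32 bits. */
        uint32_t acc = (uint32_t)(re * re) + (uint32_t)(im * im);

        /* sqrt of a 2.30 value has 15 fractional bits; halve for 2.14. */
        dst[n] = (int16_t)(isqrt_u32(acc) >> 1);
    }
}
```

For the input 0.375 + 0.5j (raw 12288, 16384) the magnitude is 0.625, which is the raw 2.14 value 10240. Dividing a raw 2.14 value by 16384 recovers the real number.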

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,185 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_mag_q31.c    
+*    
+* Description:	Q31 complex magnitude    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupCmplxMath        
+ */
+
+/**        
+ * @addtogroup cmplx_mag        
+ * @{        
+ */
+
+/**        
+ * @brief  Q31 complex magnitude        
+ * @param  *pSrc points to the complex input vector        
+ * @param  *pDst points to the real output vector        
+ * @param  numSamples number of complex samples in the input vector        
+ * @return none.        
+ *        
+ * <b>Scaling and Overflow Behavior:</b>        
+ * \par        
+ * The function performs 1.31 by 1.31 multiplications, and the final output is converted to 2.30 format.
+ * Input down scaling is not required.        
+ */
+
+void arm_cmplx_mag_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t numSamples)
+{
+  q31_t real, imag;                              /* Temporary variables to hold input values */
+  q31_t acc0, acc1;                              /* Accumulators */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t real1, real2, imag1, imag2;              /* Temporary variables to hold input values */
+  q31_t out1, out2, out3, out4;                  /* Accumulators */
+  q63_t mul1, mul2, mul3, mul4;                  /* Temporary variables */
+
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* read complex input from source buffer */
+    real1 = pSrc[0];
+    imag1 = pSrc[1];
+    real2 = pSrc[2];
+    imag2 = pSrc[3];
+
+    /* calculate power of input values */
+    mul1 = (q63_t) real1 *real1;
+    mul2 = (q63_t) imag1 *imag1;
+    mul3 = (q63_t) real2 *real2;
+    mul4 = (q63_t) imag2 *imag2;
+
+    /* convert the results to 3.29 format */
+    out1 = (q31_t) (mul1 >> 33);
+    out2 = (q31_t) (mul2 >> 33);
+    out3 = (q31_t) (mul3 >> 33);
+    out4 = (q31_t) (mul4 >> 33);
+
+    /* add real and imaginary accumulators */
+    out1 = out1 + out2;
+    out3 = out3 + out4;
+
+    /* read complex input from source buffer */
+    real1 = pSrc[4];
+    imag1 = pSrc[5];
+    real2 = pSrc[6];
+    imag2 = pSrc[7];
+
+    /* calculate square root */
+    arm_sqrt_q31(out1, &pDst[0]);
+
+    /* calculate power of input values */
+    mul1 = (q63_t) real1 *real1;
+
+    /* calculate square root */
+    arm_sqrt_q31(out3, &pDst[1]);
+
+    /* calculate power of input values */
+    mul2 = (q63_t) imag1 *imag1;
+    mul3 = (q63_t) real2 *real2;
+    mul4 = (q63_t) imag2 *imag2;
+
+    /* convert the results to 3.29 format */
+    out1 = (q31_t) (mul1 >> 33);
+    out2 = (q31_t) (mul2 >> 33);
+    out3 = (q31_t) (mul3 >> 33);
+    out4 = (q31_t) (mul4 >> 33);
+
+    /* add real and imaginary accumulators */
+    out1 = out1 + out2;
+    out3 = out3 + out4;
+
+    /* calculate square root */
+    arm_sqrt_q31(out1, &pDst[2]);
+
+    /* increment source pointer by 8 to process next samples */
+    pSrc += 8u;
+
+    /* calculate square root */
+    arm_sqrt_q31(out3, &pDst[3]);
+
+    /* increment destination pointer by 4 to process next samples */
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C[0] = sqrt(A[0] * A[0] + A[1] * A[1]) */
+    real = *pSrc++;
+    imag = *pSrc++;
+    acc0 = (q31_t) (((q63_t) real * real) >> 33);
+    acc1 = (q31_t) (((q63_t) imag * imag) >> 33);
+    /* store the result in 2.30 format in the destination buffer. */
+    arm_sqrt_q31(acc0 + acc1, pDst++);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of cmplx_mag group        
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,215 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_mag_squared_f32.c    
+*    
+* Description:	Floating-point complex magnitude squared.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupCmplxMath        
+ */
+
+/**        
+ * @defgroup cmplx_mag_squared Complex Magnitude Squared        
+ *        
+ * Computes the magnitude squared of the elements of a complex data vector.        
+ *       
+ * The <code>pSrc</code> points to the source data and        
+ * <code>pDst</code> points to where the result should be written.
+ * <code>numSamples</code> specifies the number of complex samples        
+ * in the input array and the data is stored in an interleaved fashion        
+ * (real, imag, real, imag, ...).        
+ * The input array has a total of <code>2*numSamples</code> values;        
+ * the output array has a total of <code>numSamples</code> values.        
+ *        
+ * The underlying algorithm is:
+ *        
+ * <pre>        
+ * for(n=0; n<numSamples; n++) {        
+ *     pDst[n] = pSrc[(2*n)+0]^2 + pSrc[(2*n)+1]^2;        
+ * }        
+ * </pre>        
+ *        
+ * There are separate functions for floating-point, Q15, and Q31 data types.        
+ */
+
+/**        
+ * @addtogroup cmplx_mag_squared        
+ * @{        
+ */
+
+
+/**        
+ * @brief  Floating-point complex magnitude squared        
+ * @param[in]  *pSrc points to the complex input vector        
+ * @param[out]  *pDst points to the real output vector        
+ * @param[in]  numSamples number of complex samples in the input vector        
+ * @return none.        
+ */
+
+void arm_cmplx_mag_squared_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t numSamples)
+{
+  float32_t real, imag;                          /* Temporary variables to store real and imaginary values */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+  float32_t real1, real2, real3, real4;          /* Temporary variables to hold real values */
+  float32_t imag1, imag2, imag3, imag4;          /* Temporary variables to hold imaginary values */
+  float32_t mul1, mul2, mul3, mul4;              /* Temporary variables */
+  float32_t mul5, mul6, mul7, mul8;              /* Temporary variables */
+  float32_t out1, out2, out3, out4;              /* Temporary variables to hold output values */
+
+  /*loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[0] = (A[0] * A[0] + A[1] * A[1]) */
+    /* read real input sample from source buffer */
+    real1 = pSrc[0];
+    /* read imaginary input sample from source buffer */
+    imag1 = pSrc[1];
+
+    /* calculate power of real value */
+    mul1 = real1 * real1;
+
+    /* read real input sample from source buffer */
+    real2 = pSrc[2];
+
+    /* calculate power of imaginary value */
+    mul2 = imag1 * imag1;
+
+    /* read imaginary input sample from source buffer */
+    imag2 = pSrc[3];
+
+    /* calculate power of real value */
+    mul3 = real2 * real2;
+
+    /* read real input sample from source buffer */
+    real3 = pSrc[4];
+
+    /* calculate power of imaginary value */
+    mul4 = imag2 * imag2;
+
+    /* read imaginary input sample from source buffer */
+    imag3 = pSrc[5];
+
+    /* calculate power of real value */
+    mul5 = real3 * real3;
+    /* calculate power of imaginary value */
+    mul6 = imag3 * imag3;
+
+    /* read real input sample from source buffer */
+    real4 = pSrc[6];
+
+    /* accumulate real and imaginary powers */
+    out1 = mul1 + mul2;
+
+    /* read imaginary input sample from source buffer */
+    imag4 = pSrc[7];
+
+    /* accumulate real and imaginary powers */
+    out2 = mul3 + mul4;
+
+    /* calculate power of real value */
+    mul7 = real4 * real4;
+    /* calculate power of imaginary value */
+    mul8 = imag4 * imag4;
+
+    /* store output to destination */
+    pDst[0] = out1;
+
+    /* accumulate real and imaginary powers */
+    out3 = mul5 + mul6;
+
+    /* store output to destination */
+    pDst[1] = out2;
+
+    /* accumulate real and imaginary powers */
+    out4 = mul7 + mul8;
+
+    /* store output to destination */
+    pDst[2] = out3;
+
+    /* increment source pointer by 8 to process next samples */
+    pSrc += 8u;
+
+    /* store output to destination */
+    pDst[3] = out4;
+
+    /* increment destination pointer by 4 to process next samples */
+    pDst += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C[0] = (A[0] * A[0] + A[1] * A[1]) */
+    real = *pSrc++;
+    imag = *pSrc++;
+
+    /* out = (real * real) + (imag * imag) */
+    /* store the result in the destination buffer. */
+    *pDst++ = (real * real) + (imag * imag);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of cmplx_mag_squared group        
+ */
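Note on the loop structure in the file above: the Cortex-M3/M4 path handles samples in groups of four (`numSamples >> 2`) and leaves the remaining `numSamples % 4` samples to the shared tail loop. The following is a minimal portable sketch of that split, assuming plain `float` in place of `float32_t`. The name `cmplx_mag_squared_ref` is hypothetical and not a CMSIS-DSP function, and the sketch does not reproduce the interleaved load/multiply scheduling of the original.

```c
#include <stdint.h>

/* Hypothetical reference routine (not part of CMSIS-DSP):
 * pDst[n] = re[n]^2 + im[n]^2 for interleaved (re, im) input. */
void cmplx_mag_squared_ref(const float *pSrc, float *pDst, uint32_t numSamples)
{
    /* Unrolled part: four complex samples (eight floats) per iteration */
    uint32_t blkCnt = numSamples >> 2u;

    while (blkCnt > 0u) {
        pDst[0] = pSrc[0] * pSrc[0] + pSrc[1] * pSrc[1];
        pDst[1] = pSrc[2] * pSrc[2] + pSrc[3] * pSrc[3];
        pDst[2] = pSrc[4] * pSrc[4] + pSrc[5] * pSrc[5];
        pDst[3] = pSrc[6] * pSrc[6] + pSrc[7] * pSrc[7];

        pSrc += 8u;   /* advance the source pointer by 8 floats */
        pDst += 4u;   /* advance the destination pointer by 4 floats */
        blkCnt--;
    }

    /* Tail: the remaining 0 to 3 samples, one at a time */
    blkCnt = numSamples % 4u;

    while (blkCnt > 0u) {
        float real = *pSrc++;
        float imag = *pSrc++;

        *pDst++ = (real * real) + (imag * imag);
        blkCnt--;
    }
}
```

With five input samples, one unrolled iteration produces four outputs and the tail loop produces the fifth. The input pair (3, 4), for example, yields 25.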

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,148 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_mag_squared_q15.c    
+*    
+* Description:	Q15 complex magnitude squared.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup cmplx_mag_squared    
+ * @{    
+ */
+
+/**    
+ * @brief  Q15 complex magnitude squared    
+ * @param  *pSrc points to the complex input vector    
+ * @param  *pDst points to the real output vector    
+ * @param  numSamples number of complex samples in the input vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function implements 1.15 by 1.15 multiplications and finally output is converted into 3.13 format.    
+ */
+
+void arm_cmplx_mag_squared_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t numSamples)
+{
+  q31_t acc0, acc1;                              /* Accumulators */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counter */
+  q31_t in1, in2, in3, in4;
+  q31_t acc2, acc3;
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[0] = (A[0] * A[0] + A[1] * A[1]) */
+    in1 = *__SIMD32(pSrc)++;
+    in2 = *__SIMD32(pSrc)++;
+    in3 = *__SIMD32(pSrc)++;
+    in4 = *__SIMD32(pSrc)++;
+
+    acc0 = __SMUAD(in1, in1);
+    acc1 = __SMUAD(in2, in2);
+    acc2 = __SMUAD(in3, in3);
+    acc3 = __SMUAD(in4, in4);
+
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ = (q15_t) (acc0 >> 17);
+    *pDst++ = (q15_t) (acc1 >> 17);
+    *pDst++ = (q15_t) (acc2 >> 17);
+    *pDst++ = (q15_t) (acc3 >> 17);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[0] = (A[0] * A[0] + A[1] * A[1]) */
+    in1 = *__SIMD32(pSrc)++;
+    acc0 = __SMUAD(in1, in1);
+
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ = (q15_t) (acc0 >> 17);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q15_t real, imag;                              /* Temporary variables to store real and imaginary values */
+
+  while(numSamples > 0u)
+  {
+    /* out = ((real * real) + (imag * imag)) */
+    real = *pSrc++;
+    imag = *pSrc++;
+    acc0 = (real * real);
+    acc1 = (imag * imag);
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ = (q15_t) (((q63_t) acc0 + acc1) >> 17);
+
+    /* Decrement the loop counter */
+    numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of cmplx_mag_squared group    
+ */
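The 1.15 to 3.13 conversion described in the function comment above can be checked by hand: each 1.15 by 1.15 product has 30 fractional bits, and shifting the sum right by 17 leaves 13. The following is a minimal sketch of the Cortex-M0 path for a single sample, assuming `int16_t`, `int32_t` and `int64_t` in place of `q15_t`, `q31_t` and `q63_t`. Both helper names are hypothetical and not part of CMSIS-DSP.

```c
#include <stdint.h>

/* Hypothetical helper (not part of CMSIS-DSP): one Q15 complex sample
 * to its magnitude squared in 3.13 format. */
int16_t mag_squared_q15_ref(int16_t real, int16_t imag)
{
    int32_t acc0 = (int32_t)real * real;   /* 30 fractional bits */
    int32_t acc1 = (int32_t)imag * imag;   /* 30 fractional bits */

    /* Sum in 64 bits: (-1.0)^2 + (-1.0)^2 does not fit in a signed 32-bit value.
     * 30 - 17 = 13 fractional bits remain after the shift. */
    return (int16_t)(((int64_t)acc0 + acc1) >> 17);
}

/* Hypothetical helper: interpret a 3.13 fixed-point value as a real number. */
double q3_13_to_double(int16_t x)
{
    return x / 8192.0;   /* 8192 = 2^13 */
}
```

For real = imag = 0x4000 (0.5 in Q15) the result is 4096, which is 0.5 in 3.13 format and matches 0.25 + 0.25. The extreme input real = imag = -32768 gives 16384, that is 2.0, which is why the output needs the extra integer bits.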

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mag_squared_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,161 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_cmplx_mag_squared_q31.c    
+*    
+* Description:	Q31 complex magnitude squared.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup cmplx_mag_squared    
+ * @{    
+ */
+
+
+/**    
+ * @brief  Q31 complex magnitude squared    
+ * @param  *pSrc points to the complex input vector    
+ * @param  *pDst points to the real output vector    
+ * @param  numSamples number of complex samples in the input vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function implements 1.31 by 1.31 multiplications and finally output is converted into 3.29 format.    
+ * Input down scaling is not required.    
+ */
+
+void arm_cmplx_mag_squared_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t numSamples)
+{
+  q31_t real, imag;                              /* Temporary variables to store real and imaginary values */
+  q31_t acc0, acc1;                              /* Accumulators */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counter */
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[0] = (A[0] * A[0] + A[1] * A[1]) */
+    real = *pSrc++;
+    imag = *pSrc++;
+    acc0 = (q31_t) (((q63_t) real * real) >> 33);
+    acc1 = (q31_t) (((q63_t) imag * imag) >> 33);
+    /* store the result in 3.29 format in the destination buffer. */
+    *pDst++ = acc0 + acc1;
+
+    real = *pSrc++;
+    imag = *pSrc++;
+    acc0 = (q31_t) (((q63_t) real * real) >> 33);
+    acc1 = (q31_t) (((q63_t) imag * imag) >> 33);
+    /* store the result in 3.29 format in the destination buffer. */
+    *pDst++ = acc0 + acc1;
+
+    real = *pSrc++;
+    imag = *pSrc++;
+    acc0 = (q31_t) (((q63_t) real * real) >> 33);
+    acc1 = (q31_t) (((q63_t) imag * imag) >> 33);
+    /* store the result in 3.29 format in the destination buffer. */
+    *pDst++ = acc0 + acc1;
+
+    real = *pSrc++;
+    imag = *pSrc++;
+    acc0 = (q31_t) (((q63_t) real * real) >> 33);
+    acc1 = (q31_t) (((q63_t) imag * imag) >> 33);
+    /* store the result in 3.29 format in the destination buffer. */
+    *pDst++ = acc0 + acc1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[0] = (A[0] * A[0] + A[1] * A[1]) */
+    real = *pSrc++;
+    imag = *pSrc++;
+    acc0 = (q31_t) (((q63_t) real * real) >> 33);
+    acc1 = (q31_t) (((q63_t) imag * imag) >> 33);
+    /* store the result in 3.29 format in the destination buffer. */
+    *pDst++ = acc0 + acc1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+    /* out = ((real * real) + (imag * imag)) */
+    real = *pSrc++;
+    imag = *pSrc++;
+    acc0 = (q31_t) (((q63_t) real * real) >> 33);
+    acc1 = (q31_t) (((q63_t) imag * imag) >> 33);
+    /* store the result in 3.29 format in the destination buffer. */
+    *pDst++ = acc0 + acc1;
+
+    /* Decrement the loop counter */
+    numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of cmplx_mag_squared group    
+ */
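The Q31 variant above follows the same pattern with wider types: a 1.31 by 1.31 product held in 64 bits has 62 fractional bits, and the shift by 33 leaves 29, giving the 3.29 output format. The following is a minimal single-sample sketch, assuming `int32_t` and `int64_t` in place of `q31_t` and `q63_t`. Both helper names are hypothetical and not part of CMSIS-DSP.

```c
#include <stdint.h>

/* Hypothetical helper (not part of CMSIS-DSP): one Q31 complex sample
 * to its magnitude squared in 3.29 format. */
int32_t mag_squared_q31_ref(int32_t real, int32_t imag)
{
    /* 62 fractional bits in the 64-bit product; 62 - 33 = 29 remain */
    int32_t acc0 = (int32_t)(((int64_t)real * real) >> 33);
    int32_t acc1 = (int32_t)(((int64_t)imag * imag) >> 33);

    return acc0 + acc1;
}

/* Hypothetical helper: interpret a 3.29 fixed-point value as a real number. */
double q3_29_to_double(int32_t x)
{
    return x / 536870912.0;   /* 536870912 = 2^29 */
}
```

For real = 0x40000000 (0.5 in Q31) and imag = 0 the result is 2^27, which is 0.25 in 3.29 format.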

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,207 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_cmplx_mult_cmplx_f32.c    
+*    
+* Description:	Floating-point complex-by-complex multiplication    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupCmplxMath        
+ */
+
+/**        
+ * @defgroup CmplxByCmplxMult Complex-by-Complex Multiplication        
+ *        
+ * Multiplies a complex vector by another complex vector and generates a complex result.        
+ * The data in the complex arrays is stored in an interleaved fashion        
+ * (real, imag, real, imag, ...).        
+ * The parameter <code>numSamples</code> represents the number of complex        
+ * samples processed.  The complex arrays have a total of <code>2*numSamples</code>        
+ * real values.        
+ *        
+ * The underlying algorithm is:
+ *        
+ * <pre>        
+ * for(n=0; n<numSamples; n++) {        
+ *     pDst[(2*n)+0] = pSrcA[(2*n)+0] * pSrcB[(2*n)+0] - pSrcA[(2*n)+1] * pSrcB[(2*n)+1];        
+ *     pDst[(2*n)+1] = pSrcA[(2*n)+0] * pSrcB[(2*n)+1] + pSrcA[(2*n)+1] * pSrcB[(2*n)+0];        
+ * }        
+ * </pre>        
+ *        
+ * There are separate functions for floating-point, Q15, and Q31 data types.        
+ */
+
+/**        
+ * @addtogroup CmplxByCmplxMult        
+ * @{        
+ */
+
+
+/**        
+ * @brief  Floating-point complex-by-complex multiplication        
+ * @param[in]  *pSrcA points to the first input vector        
+ * @param[in]  *pSrcB points to the second input vector        
+ * @param[out]  *pDst  points to the output vector        
+ * @param[in]  numSamples number of complex samples in each vector        
+ * @return none.        
+ */
+
+void arm_cmplx_mult_cmplx_f32(
+  float32_t * pSrcA,
+  float32_t * pSrcB,
+  float32_t * pDst,
+  uint32_t numSamples)
+{
+  float32_t a1, b1, c1, d1;                      /* Temporary variables to store real and imaginary values */
+  uint32_t blkCnt;                               /* loop counters */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t a2, b2, c2, d2;                      /* Temporary variables to store real and imaginary values */
+  float32_t acc1, acc2, acc3, acc4;
+
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a1 = *pSrcA;                /* A[2 * i] */
+    c1 = *pSrcB;                /* B[2 * i] */
+
+    b1 = *(pSrcA + 1);          /* A[2 * i + 1] */
+    acc1 = a1 * c1;             /* acc1 = A[2 * i] * B[2 * i] */
+
+    a2 = *(pSrcA + 2);          /* A[2 * i + 2] */
+    acc2 = (b1 * c1);           /* acc2 = A[2 * i + 1] * B[2 * i] */
+
+    d1 = *(pSrcB + 1);          /* B[2 * i + 1] */
+    c2 = *(pSrcB + 2);          /* B[2 * i + 2] */
+    acc1 -= b1 * d1;            /* acc1 =      A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1] */
+
+    d2 = *(pSrcB + 3);          /* B[2 * i + 3] */
+    acc3 = a2 * c2;             /* acc3 =       A[2 * i + 2] * B[2 * i + 2] */
+
+    b2 = *(pSrcA + 3);          /* A[2 * i + 3] */
+    acc2 += (a1 * d1);          /* acc2 =      A[2 * i + 1] * B[2 * i] + A[2 * i] * B[2 * i + 1] */
+
+    a1 = *(pSrcA + 4);          /* A[2 * i + 4] */
+    acc4 = (a2 * d2);           /* acc4 =   A[2 * i + 2] * B[2 * i + 3] */
+
+    c1 = *(pSrcB + 4);          /* B[2 * i + 4] */
+    acc3 -= (b2 * d2);          /* acc3 =       A[2 * i + 2] * B[2 * i + 2] - A[2 * i + 3] * B[2 * i + 3] */
+    *pDst = acc1;               /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1] */
+
+    b1 = *(pSrcA + 5);          /* A[2 * i + 5] */
+    acc4 += b2 * c2;            /* acc4 =   A[2 * i + 2] * B[2 * i + 3] + A[2 * i + 3] * B[2 * i + 2] */
+
+    *(pDst + 1) = acc2;         /* C[2 * i + 1] = A[2 * i + 1] * B[2 * i] + A[2 * i] * B[2 * i + 1]  */
+    acc1 = (a1 * c1);
+
+    d1 = *(pSrcB + 5);
+    acc2 = (b1 * c1);
+
+    *(pDst + 2) = acc3;
+    *(pDst + 3) = acc4;
+
+    a2 = *(pSrcA + 6);
+    acc1 -= (b1 * d1);
+
+    c2 = *(pSrcB + 6);
+    acc2 += (a1 * d1);
+
+    b2 = *(pSrcA + 7);
+    acc3 = (a2 * c2);
+
+    d2 = *(pSrcB + 7);
+    acc4 = (b2 * c2);
+
+    *(pDst + 4) = acc1;
+    pSrcA += 8u;
+
+    acc3 -= (b2 * d2);
+    acc4 += (a2 * d2);
+
+    *(pDst + 5) = acc2;
+    pSrcB += 8u;
+
+    *(pDst + 6) = acc3;
+    *(pDst + 7) = acc4;
+
+    pDst += 8u;
+
+    /* Decrement the numSamples loop counter */
+    blkCnt--;
+  }
+
+  /* If the numSamples is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a1 = *pSrcA++;
+    b1 = *pSrcA++;
+    c1 = *pSrcB++;
+    d1 = *pSrcB++;
+
+    /* store the result in the destination buffer. */
+    *pDst++ = (a1 * c1) - (b1 * d1);
+    *pDst++ = (a1 * d1) + (b1 * c1);
+
+    /* Decrement the numSamples loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of CmplxByCmplxMult group        
+ */
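The algorithm given in the `CmplxByCmplxMult` group comment above can be written as a standalone reference loop, which is handy for checking the unrolled version on a host machine. This is a minimal sketch, assuming plain `float` in place of `float32_t`. The name `cmplx_mult_cmplx_ref` is hypothetical and not a CMSIS-DSP function.

```c
#include <stdint.h>

/* Hypothetical reference routine (not part of CMSIS-DSP):
 * element-wise product of two interleaved (re, im) complex vectors. */
void cmplx_mult_cmplx_ref(const float *pSrcA, const float *pSrcB,
                          float *pDst, uint32_t numSamples)
{
    for (uint32_t n = 0u; n < numSamples; n++) {
        float a = pSrcA[2u * n];        /* real part of A[n] */
        float b = pSrcA[2u * n + 1u];   /* imaginary part of A[n] */
        float c = pSrcB[2u * n];        /* real part of B[n] */
        float d = pSrcB[2u * n + 1u];   /* imaginary part of B[n] */

        pDst[2u * n]      = (a * c) - (b * d);   /* real part of the product */
        pDst[2u * n + 1u] = (a * d) + (b * c);   /* imaginary part of the product */
    }
}
```

As a check, (1 + 2i)(3 + 4i) = -5 + 10i, so the inputs {1, 2} and {3, 4} produce {-5, 10}.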

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,193 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_cmplx_mult_cmplx_q15.c    
+*    
+* Description:	Q15 complex-by-complex multiplication    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup CmplxByCmplxMult    
+ * @{    
+ */
+
+/**    
+ * @brief  Q15 complex-by-complex multiplication    
+ * @param[in]  *pSrcA points to the first input vector    
+ * @param[in]  *pSrcB points to the second input vector    
+ * @param[out]  *pDst  points to the output vector    
+ * @param[in]  numSamples number of complex samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function implements 1.15 by 1.15 multiplications and finally output is converted into 3.13 format.    
+ */
+
+void arm_cmplx_mult_cmplx_q15(
+  q15_t * pSrcA,
+  q15_t * pSrcB,
+  q15_t * pDst,
+  uint32_t numSamples)
+{
+  q15_t a, b, c, d;                              /* Temporary variables to store real and imaginary values */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counters */
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * c) >> 17) - (((q31_t) b * d) >> 17);
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * d) >> 17) + (((q31_t) b * c) >> 17);
+
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * c) >> 17) - (((q31_t) b * d) >> 17);
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * d) >> 17) + (((q31_t) b * c) >> 17);
+
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * c) >> 17) - (((q31_t) b * d) >> 17);
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * d) >> 17) + (((q31_t) b * c) >> 17);
+
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * c) >> 17) - (((q31_t) b * d) >> 17);
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * d) >> 17) + (((q31_t) b * c) >> 17);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * c) >> 17) - (((q31_t) b * d) >> 17);
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * d) >> 17) + (((q31_t) b * c) >> 17);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * c) >> 17) - (((q31_t) b * d) >> 17);
+    /* store the result in 3.13 format in the destination buffer. */
+    *pDst++ =
+      (q15_t) (q31_t) (((q31_t) a * d) >> 17) + (((q31_t) b * c) >> 17);
+
+    /* Decrement the numSamples loop counter */
+    numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of CmplxByCmplxMult group    
+ */
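In the Q15 multiplication above, each 1.15 by 1.15 product is shifted right by 17 on its own before the two terms are combined, so every term is already in 3.13 format when it is added or subtracted. The following is a minimal sketch of the non-unrolled loop, assuming `int16_t` and `int32_t` in place of `q15_t` and `q31_t`. The name `cmplx_mult_cmplx_q15_ref` is hypothetical and not a CMSIS-DSP function, and the final cast is parenthesised around the whole expression here, which is a choice of this sketch.

```c
#include <stdint.h>

/* Hypothetical reference routine (not part of CMSIS-DSP):
 * Q15 complex-by-complex multiplication with 3.13 output. */
void cmplx_mult_cmplx_q15_ref(const int16_t *pSrcA, const int16_t *pSrcB,
                              int16_t *pDst, uint32_t numSamples)
{
    while (numSamples > 0u) {
        int16_t a = *pSrcA++;   /* real part of A */
        int16_t b = *pSrcA++;   /* imaginary part of A */
        int16_t c = *pSrcB++;   /* real part of B */
        int16_t d = *pSrcB++;   /* imaginary part of B */

        /* Each product has 30 fractional bits; >> 17 leaves 13 */
        *pDst++ = (int16_t)((((int32_t)a * c) >> 17) - (((int32_t)b * d) >> 17));
        *pDst++ = (int16_t)((((int32_t)a * d) >> 17) + (((int32_t)b * c) >> 17));

        numSamples--;
    }
}
```

For A = B = 0.5 + 0.5i (0x4000 in both parts) the output is {0, 4096}, that is 0 + 0.5i in 3.13 format. In the original source the `(q15_t) (q31_t)` casts bind only to the first shifted product, because a cast has higher precedence than binary `-` or `+`. Each shifted product already fits in 16 bits, so the stored value is the same either way.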

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_cmplx_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,326 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_cmplx_mult_cmplx_q31.c    
+*    
+* Description:	Q31 complex-by-complex multiplication    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup CmplxByCmplxMult    
+ * @{    
+ */
+
+
+/**    
+ * @brief  Q31 complex-by-complex multiplication    
+ * @param[in]  *pSrcA points to the first input vector    
+ * @param[in]  *pSrcB points to the second input vector    
+ * @param[out]  *pDst  points to the output vector    
+ * @param[in]  numSamples number of complex samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function implements 1.31 by 1.31 multiplications and finally output is converted into 3.29 format.    
+ * Input down scaling is not required.    
+ */
+
+void arm_cmplx_mult_cmplx_q31(
+  q31_t * pSrcA,
+  q31_t * pSrcB,
+  q31_t * pDst,
+  uint32_t numSamples)
+{
+  q31_t a, b, c, d;                              /* Temporary variables to store real and imaginary values */
+  uint32_t blkCnt;                               /* loop counters */
+  q31_t mul1, mul2, mul3, mul4;
+  q31_t out1, out2;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    mul1 = (q31_t) (((q63_t) a * c) >> 32);
+    mul2 = (q31_t) (((q63_t) b * d) >> 32);
+    mul3 = (q31_t) (((q63_t) a * d) >> 32);
+    mul4 = (q31_t) (((q63_t) b * c) >> 32);
+
+    mul1 = (mul1 >> 1);
+    mul2 = (mul2 >> 1);
+    mul3 = (mul3 >> 1);
+    mul4 = (mul4 >> 1);
+
+    out1 = mul1 - mul2;
+    out2 = mul3 + mul4;
+
+    /* store the real result in 3.29 format in the destination buffer. */
+    *pDst++ = out1;
+    /* store the imag result in 3.29 format in the destination buffer. */
+    *pDst++ = out2;
+
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    mul1 = (q31_t) (((q63_t) a * c) >> 32);
+    mul2 = (q31_t) (((q63_t) b * d) >> 32);
+    mul3 = (q31_t) (((q63_t) a * d) >> 32);
+    mul4 = (q31_t) (((q63_t) b * c) >> 32);
+
+    mul1 = (mul1 >> 1);
+    mul2 = (mul2 >> 1);
+    mul3 = (mul3 >> 1);
+    mul4 = (mul4 >> 1);
+
+    out1 = mul1 - mul2;
+    out2 = mul3 + mul4;
+
+    /* store the real result in 3.29 format in the destination buffer. */
+    *pDst++ = out1;
+    /* store the imag result in 3.29 format in the destination buffer. */
+    *pDst++ = out2;
+
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    mul1 = (q31_t) (((q63_t) a * c) >> 32);
+    mul2 = (q31_t) (((q63_t) b * d) >> 32);
+    mul3 = (q31_t) (((q63_t) a * d) >> 32);
+    mul4 = (q31_t) (((q63_t) b * c) >> 32);
+
+    mul1 = (mul1 >> 1);
+    mul2 = (mul2 >> 1);
+    mul3 = (mul3 >> 1);
+    mul4 = (mul4 >> 1);
+
+    out1 = mul1 - mul2;
+    out2 = mul3 + mul4;
+
+    /* store the real result in 3.29 format in the destination buffer. */
+    *pDst++ = out1;
+    /* store the imag result in 3.29 format in the destination buffer. */
+    *pDst++ = out2;
+
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    mul1 = (q31_t) (((q63_t) a * c) >> 32);
+    mul2 = (q31_t) (((q63_t) b * d) >> 32);
+    mul3 = (q31_t) (((q63_t) a * d) >> 32);
+    mul4 = (q31_t) (((q63_t) b * c) >> 32);
+
+    mul1 = (mul1 >> 1);
+    mul2 = (mul2 >> 1);
+    mul3 = (mul3 >> 1);
+    mul4 = (mul4 >> 1);
+
+    out1 = mul1 - mul2;
+    out2 = mul3 + mul4;
+
+    /* store the real result in 3.29 format in the destination buffer. */
+    *pDst++ = out1;
+    /* store the imag result in 3.29 format in the destination buffer. */
+    *pDst++ = out2;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    mul1 = (q31_t) (((q63_t) a * c) >> 32);
+    mul2 = (q31_t) (((q63_t) b * d) >> 32);
+    mul3 = (q31_t) (((q63_t) a * d) >> 32);
+    mul4 = (q31_t) (((q63_t) b * c) >> 32);
+
+    mul1 = (mul1 >> 1);
+    mul2 = (mul2 >> 1);
+    mul3 = (mul3 >> 1);
+    mul4 = (mul4 >> 1);
+
+    out1 = mul1 - mul2;
+    out2 = mul3 + mul4;
+
+    /* store the real result in 3.29 format in the destination buffer. */
+    *pDst++ = out1;
+    /* store the imag result in 3.29 format in the destination buffer. */
+    *pDst++ = out2;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 1u;
+
+  /* First part of the processing with loop unrolling.  Compute 2 outputs at a time.     
+   ** a second loop below computes the remaining 1 sample. */
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    mul1 = (q31_t) (((q63_t) a * c) >> 32);
+    mul2 = (q31_t) (((q63_t) b * d) >> 32);
+    mul3 = (q31_t) (((q63_t) a * d) >> 32);
+    mul4 = (q31_t) (((q63_t) b * c) >> 32);
+
+    mul1 = (mul1 >> 1);
+    mul2 = (mul2 >> 1);
+    mul3 = (mul3 >> 1);
+    mul4 = (mul4 >> 1);
+
+    out1 = mul1 - mul2;
+    out2 = mul3 + mul4;
+
+    /* store the real result in 3.29 format in the destination buffer. */
+    *pDst++ = out1;
+    /* store the imag result in 3.29 format in the destination buffer. */
+    *pDst++ = out2;
+
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    mul1 = (q31_t) (((q63_t) a * c) >> 32);
+    mul2 = (q31_t) (((q63_t) b * d) >> 32);
+    mul3 = (q31_t) (((q63_t) a * d) >> 32);
+    mul4 = (q31_t) (((q63_t) b * c) >> 32);
+
+    mul1 = (mul1 >> 1);
+    mul2 = (mul2 >> 1);
+    mul3 = (mul3 >> 1);
+    mul4 = (mul4 >> 1);
+
+    out1 = mul1 - mul2;
+    out2 = mul3 + mul4;
+
+    /* store the real result in 3.29 format in the destination buffer. */
+    *pDst++ = out1;
+    /* store the imag result in 3.29 format in the destination buffer. */
+    *pDst++ = out2;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If numSamples is not a multiple of 2, compute the remaining output sample here.
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x2u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[2 * i] - A[2 * i + 1] * B[2 * i + 1].  */
+    /* C[2 * i + 1] = A[2 * i] * B[2 * i + 1] + A[2 * i + 1] * B[2 * i].  */
+    a = *pSrcA++;
+    b = *pSrcA++;
+    c = *pSrcB++;
+    d = *pSrcB++;
+
+    mul1 = (q31_t) (((q63_t) a * c) >> 32);
+    mul2 = (q31_t) (((q63_t) b * d) >> 32);
+    mul3 = (q31_t) (((q63_t) a * d) >> 32);
+    mul4 = (q31_t) (((q63_t) b * c) >> 32);
+
+    mul1 = (mul1 >> 1);
+    mul2 = (mul2 >> 1);
+    mul3 = (mul3 >> 1);
+    mul4 = (mul4 >> 1);
+
+    out1 = mul1 - mul2;
+    out2 = mul3 + mul4;
+
+    /* store the real result in 3.29 format in the destination buffer. */
+    *pDst++ = out1;
+    /* store the imag result in 3.29 format in the destination buffer. */
+    *pDst++ = out2;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of CmplxByCmplxMult group    
+ */
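Reviewer note: the unrolled loops above all repeat one arithmetic pattern, so a plain reference loop is handy for checking them. The sketch below is not part of CMSIS-DSP; the names `q31_mul_to_q3_29` and `cmplx_mult_cmplx_q31_ref` are hypothetical, and it assumes `>>` on a negative signed value is an arithmetic shift (true for the usual Arm toolchains, but implementation-defined in C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply two 1.31 values and return a 3.29 result.
 * 1.31 * 1.31 = 2.62; keeping the top 32 bits gives 2.30; one more
 * right shift gives 3.29, leaving headroom for the add/subtract below. */
static int32_t q31_mul_to_q3_29(int32_t x, int32_t y)
{
    int32_t hi = (int32_t)(((int64_t)x * y) >> 32);
    return hi >> 1;
}

/* Hypothetical reference loop: interleaved (real, imag) inputs,
 * n complex samples, 2 * n values per buffer. */
static void cmplx_mult_cmplx_q31_ref(const int32_t *a, const int32_t *b,
                                     int32_t *dst, uint32_t n)
{
    assert(a != NULL && b != NULL && dst != NULL);

    for (uint32_t i = 0; i < n; i++) {
        int32_t ar = a[2 * i], ai = a[2 * i + 1];
        int32_t br = b[2 * i], bi = b[2 * i + 1];

        /* real = ar*br - ai*bi, imag = ar*bi + ai*br, both in 3.29 */
        dst[2 * i]     = q31_mul_to_q3_29(ar, br) - q31_mul_to_q3_29(ai, bi);
        dst[2 * i + 1] = q31_mul_to_q3_29(ar, bi) + q31_mul_to_q3_29(ai, br);
    }
}
```

Because the output is 3.29 rather than 1.31, a caller that needs Q31 results has to rescale (and saturate) afterwards.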

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,225 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_cmplx_mult_real_f32.c    
+*    
+* Description:	Floating-point complex by real multiplication    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupCmplxMath        
+ */
+
+/**        
+ * @defgroup CmplxByRealMult Complex-by-Real Multiplication        
+ *        
+ * Multiplies a complex vector by a real vector and generates a complex result.        
+ * The data in the complex arrays is stored in an interleaved fashion        
+ * (real, imag, real, imag, ...).        
+ * The parameter <code>numSamples</code> represents the number of complex        
+ * samples processed.  The complex arrays have a total of <code>2*numSamples</code>        
+ * real values while the real array has a total of <code>numSamples</code>        
+ * real values.        
+ *        
+ * The underlying algorithm is:
+ *        
+ * <pre>        
+ * for(n=0; n<numSamples; n++) {        
+ *     pCmplxDst[(2*n)+0] = pSrcCmplx[(2*n)+0] * pSrcReal[n];        
+ *     pCmplxDst[(2*n)+1] = pSrcCmplx[(2*n)+1] * pSrcReal[n];        
+ * }        
+ * </pre>        
+ *        
+ * There are separate functions for floating-point, Q15, and Q31 data types.        
+ */
+
+/**        
+ * @addtogroup CmplxByRealMult        
+ * @{        
+ */
+
+
+/**        
+ * @brief  Floating-point complex-by-real multiplication        
+ * @param[in]  *pSrcCmplx points to the complex input vector        
+ * @param[in]  *pSrcReal points to the real input vector        
+ * @param[out]  *pCmplxDst points to the complex output vector        
+ * @param[in]  numSamples number of complex samples (the real vector holds numSamples values)
+ * @return none.        
+ */
+
+void arm_cmplx_mult_real_f32(
+  float32_t * pSrcCmplx,
+  float32_t * pSrcReal,
+  float32_t * pCmplxDst,
+  uint32_t numSamples)
+{
+  float32_t in;                                  /* Temporary variable to store input value */
+  uint32_t blkCnt;                               /* loop counters */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t inA1, inA2, inA3, inA4;              /* Temporary variables to hold input data */
+  float32_t inA5, inA6, inA7, inA8;              /* Temporary variables to hold input data */
+  float32_t inB1, inB2, inB3, inB4;              /* Temporary variables to hold input data */
+  float32_t out1, out2, out3, out4;              /* Temporary variables to hold output data */
+  float32_t out5, out6, out7, out8;              /* Temporary variables to hold output data */
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[i].            */
+    /* C[2 * i + 1] = A[2 * i + 1] * B[i].        */
+    /* read input from complex input buffer */
+    inA1 = pSrcCmplx[0];
+    inA2 = pSrcCmplx[1];
+    /* read input from real input buffer */
+    inB1 = pSrcReal[0];
+
+    /* read input from complex input buffer */
+    inA3 = pSrcCmplx[2];
+
+    /* multiply complex buffer real input with real buffer input */
+    out1 = inA1 * inB1;
+
+    /* read input from complex input buffer */
+    inA4 = pSrcCmplx[3];
+
+    /* multiply complex buffer imaginary input with real buffer input */
+    out2 = inA2 * inB1;
+
+    /* read input from real input buffer */
+    inB2 = pSrcReal[1];
+    /* read input from complex input buffer */
+    inA5 = pSrcCmplx[4];
+
+    /* multiply complex buffer real input with real buffer input */
+    out3 = inA3 * inB2;
+
+    /* read input from complex input buffer */
+    inA6 = pSrcCmplx[5];
+    /* read input from real input buffer */
+    inB3 = pSrcReal[2];
+
+    /* multiply complex buffer imaginary input with real buffer input */
+    out4 = inA4 * inB2;
+
+    /* read input from complex input buffer */
+    inA7 = pSrcCmplx[6];
+
+    /* multiply complex buffer real input with real buffer input */
+    out5 = inA5 * inB3;
+
+    /* read input from complex input buffer */
+    inA8 = pSrcCmplx[7];
+
+    /* multiply complex buffer imaginary input with real buffer input */
+    out6 = inA6 * inB3;
+
+    /* read input from real input buffer */
+    inB4 = pSrcReal[3];
+
+    /* store result to destination buffer */
+    pCmplxDst[0] = out1;
+
+    /* multiply complex buffer real input with real buffer input */
+    out7 = inA7 * inB4;
+
+    /* store result to destination buffer */
+    pCmplxDst[1] = out2;
+
+    /* multiply complex buffer imaginary input with real buffer input */
+    out8 = inA8 * inB4;
+
+    /* store result to destination buffer */
+    pCmplxDst[2] = out3;
+    pCmplxDst[3] = out4;
+    pCmplxDst[4] = out5;
+
+    /* increment complex input buffer by 8 to process next samples */
+    pSrcCmplx += 8u;
+
+    /* store result to destination buffer */
+    pCmplxDst[5] = out6;
+
+    /* increment real input buffer by 4 to process next samples */
+    pSrcReal += 4u;
+
+    /* store result to destination buffer */
+    pCmplxDst[6] = out7;
+    pCmplxDst[7] = out8;
+
+    /* increment destination buffer by 8 to process next samples */
+    pCmplxDst += 8u;
+
+    /* Decrement the numSamples loop counter */
+    blkCnt--;
+  }
+
+  /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[i].            */
+    /* C[2 * i + 1] = A[2 * i + 1] * B[i].        */
+    in = *pSrcReal++;
+    /* store the result in the destination buffer. */
+    *pCmplxDst++ = (*pSrcCmplx++) * (in);
+    *pCmplxDst++ = (*pSrcCmplx++) * (in);
+
+    /* Decrement the numSamples loop counter */
+    blkCnt--;
+  }
+}
+
+/**        
+ * @} end of CmplxByRealMult group        
+ */
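Reviewer note: a straight-line version of the documented algorithm makes the buffer sizes explicit and can serve as a cross-check for the unrolled Cortex-M4/M3 path. This is a minimal sketch, not the library routine; `cmplx_mult_real_f32_ref` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: cmplx and dst hold 2 * n floats (interleaved
 * real, imag); real holds n floats. Each complex sample is scaled by
 * the real value at the same index. */
static void cmplx_mult_real_f32_ref(const float *cmplx, const float *real,
                                    float *dst, uint32_t n)
{
    assert(cmplx != NULL && real != NULL && dst != NULL);

    for (uint32_t i = 0; i < n; i++) {
        float r = real[i];
        dst[2 * i]     = cmplx[2 * i] * r;      /* real part */
        dst[2 * i + 1] = cmplx[2 * i + 1] * r;  /* imaginary part */
    }
}
```

Comparing this loop against the library output for lengths 1 through 9 exercises both the 4-sample unrolled body and the tail loop.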

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,203 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. October 2015
+* $Revision: 	V.1.4.5 a
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_cmplx_mult_real_q15.c    
+*    
+* Description:	Q15 complex by real multiplication    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup CmplxByRealMult    
+ * @{    
+ */
+
+
+/**    
+ * @brief  Q15 complex-by-real multiplication    
+ * @param[in]  *pSrcCmplx points to the complex input vector    
+ * @param[in]  *pSrcReal points to the real input vector    
+ * @param[out]  *pCmplxDst points to the complex output vector    
+ * @param[in]  numSamples number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.    
+ */
+
+void arm_cmplx_mult_real_q15(
+  q15_t * pSrcCmplx,
+  q15_t * pSrcReal,
+  q15_t * pCmplxDst,
+  uint32_t numSamples)
+{
+  q15_t in;                                      /* Temporary variable to store input value */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counters */
+  q31_t inA1, inA2;                              /* Temporary variables to hold input data */
+  q31_t inB1;                                    /* Temporary variables to hold input data */
+  q15_t out1, out2, out3, out4;                  /* Temporary variables to hold output data */
+  q31_t mul1, mul2, mul3, mul4;                  /* Temporary variables to hold intermediate data */
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[i].            */
+    /* C[2 * i + 1] = A[2 * i + 1] * B[i].        */
+    /* read complex number both real and imaginary from complex input buffer */
+    inA1 = *__SIMD32(pSrcCmplx)++;
+    /* read two real values at a time from real input buffer */
+    inB1 = *__SIMD32(pSrcReal)++;
+    /* read complex number both real and imaginary from complex input buffer */
+    inA2 = *__SIMD32(pSrcCmplx)++;
+
+    /* multiply complex number with real numbers */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    mul1 = (q31_t) ((q15_t) (inA1) * (q15_t) (inB1));
+    mul2 = (q31_t) ((q15_t) (inA1 >> 16) * (q15_t) (inB1));
+    mul3 = (q31_t) ((q15_t) (inA2) * (q15_t) (inB1 >> 16));
+    mul4 = (q31_t) ((q15_t) (inA2 >> 16) * (q15_t) (inB1 >> 16));
+
+#else
+
+    mul2 = (q31_t) ((q15_t) (inA1 >> 16) * (q15_t) (inB1 >> 16));
+    mul1 = (q31_t) ((q15_t) inA1 * (q15_t) (inB1 >> 16));
+    mul4 = (q31_t) ((q15_t) (inA2 >> 16) * (q15_t) inB1);
+    mul3 = (q31_t) ((q15_t) inA2 * (q15_t) inB1);
+
+#endif /* #ifndef ARM_MATH_BIG_ENDIAN */
+
+    /* saturate the result */
+    out1 = (q15_t) __SSAT(mul1 >> 15u, 16);
+    out2 = (q15_t) __SSAT(mul2 >> 15u, 16);
+    out3 = (q15_t) __SSAT(mul3 >> 15u, 16);
+    out4 = (q15_t) __SSAT(mul4 >> 15u, 16);
+
+    /* pack real and imaginary outputs and store them to destination */
+    *__SIMD32(pCmplxDst)++ = __PKHBT(out1, out2, 16);
+    *__SIMD32(pCmplxDst)++ = __PKHBT(out3, out4, 16);
+
+    inA1 = *__SIMD32(pSrcCmplx)++;
+    inB1 = *__SIMD32(pSrcReal)++;
+    inA2 = *__SIMD32(pSrcCmplx)++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    mul1 = (q31_t) ((q15_t) (inA1) * (q15_t) (inB1));
+    mul2 = (q31_t) ((q15_t) (inA1 >> 16) * (q15_t) (inB1));
+    mul3 = (q31_t) ((q15_t) (inA2) * (q15_t) (inB1 >> 16));
+    mul4 = (q31_t) ((q15_t) (inA2 >> 16) * (q15_t) (inB1 >> 16));
+
+#else
+
+    mul2 = (q31_t) ((q15_t) (inA1 >> 16) * (q15_t) (inB1 >> 16));
+    mul1 = (q31_t) ((q15_t) inA1 * (q15_t) (inB1 >> 16));
+    mul4 = (q31_t) ((q15_t) (inA2 >> 16) * (q15_t) inB1);
+    mul3 = (q31_t) ((q15_t) inA2 * (q15_t) inB1);
+
+#endif /* #ifndef ARM_MATH_BIG_ENDIAN */
+
+    out1 = (q15_t) __SSAT(mul1 >> 15u, 16);
+    out2 = (q15_t) __SSAT(mul2 >> 15u, 16);
+    out3 = (q15_t) __SSAT(mul3 >> 15u, 16);
+    out4 = (q15_t) __SSAT(mul4 >> 15u, 16);
+
+    *__SIMD32(pCmplxDst)++ = __PKHBT(out1, out2, 16);
+    *__SIMD32(pCmplxDst)++ = __PKHBT(out3, out4, 16);
+
+    /* Decrement the numSamples loop counter */
+    blkCnt--;
+  }
+
+  /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[i].            */
+    /* C[2 * i + 1] = A[2 * i + 1] * B[i].        */
+    in = *pSrcReal++;
+    /* store the result in the destination buffer. */
+    *pCmplxDst++ =
+      (q15_t) __SSAT((((q31_t) (*pSrcCmplx++) * (in)) >> 15), 16);
+    *pCmplxDst++ =
+      (q15_t) __SSAT((((q31_t) (*pSrcCmplx++) * (in)) >> 15), 16);
+
+    /* Decrement the numSamples loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+    /* realOut = realA * realB.            */
+    /* imagOut = imagA * realB.                */
+    in = *pSrcReal++;
+    /* store the result in the destination buffer. */
+    *pCmplxDst++ =
+      (q15_t) __SSAT((((q31_t) (*pSrcCmplx++) * (in)) >> 15), 16);
+    *pCmplxDst++ =
+      (q15_t) __SSAT((((q31_t) (*pSrcCmplx++) * (in)) >> 15), 16);
+
+    /* Decrement the numSamples loop counter */
+    numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of CmplxByRealMult group    
+ */
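Reviewer note: the Q15 routine relies on `__SSAT(x >> 15, 16)` after a 16x16 multiply. The portable sketch below spells out that step without Arm intrinsics. The names `sat_q15`, `q15_mul_sat`, and `cmplx_mult_real_q15_ref` are hypothetical, and the code assumes an arithmetic right shift for negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __SSAT(x, 16): clamp to the int16_t range. */
static int16_t sat_q15(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* 1.15 * 1.15 = 2.30; shifting right by 15 returns to 1.15.
 * Only (-1.0) * (-1.0) overflows, and it saturates to 0x7FFF. */
static int16_t q15_mul_sat(int16_t a, int16_t b)
{
    return sat_q15(((int32_t)a * b) >> 15);
}

/* Hypothetical reference loop with the same layout as the routine above. */
static void cmplx_mult_real_q15_ref(const int16_t *cmplx, const int16_t *real,
                                    int16_t *dst, uint32_t n)
{
    assert(cmplx != NULL && real != NULL && dst != NULL);

    for (uint32_t i = 0; i < n; i++) {
        dst[2 * i]     = q15_mul_sat(cmplx[2 * i], real[i]);
        dst[2 * i + 1] = q15_mul_sat(cmplx[2 * i + 1], real[i]);
    }
}
```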

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ComplexMathFunctions/arm_cmplx_mult_real_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,223 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_cmplx_mult_real_q31.c    
+*    
+* Description:	Q31 complex by real multiplication    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupCmplxMath    
+ */
+
+/**    
+ * @addtogroup CmplxByRealMult    
+ * @{    
+ */
+
+
+/**    
+ * @brief  Q31 complex-by-real multiplication    
+ * @param[in]  *pSrcCmplx points to the complex input vector    
+ * @param[in]  *pSrcReal points to the real input vector    
+ * @param[out]  *pCmplxDst points to the complex output vector    
+ * @param[in]  numSamples number of samples in each vector    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q31 range[0x80000000 0x7FFFFFFF] will be saturated.    
+ */
+
+void arm_cmplx_mult_real_q31(
+  q31_t * pSrcCmplx,
+  q31_t * pSrcReal,
+  q31_t * pCmplxDst,
+  uint32_t numSamples)
+{
+  q31_t inA1;                                    /* Temporary variable to store input value */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  uint32_t blkCnt;                               /* loop counters */
+  q31_t inA2, inA3, inA4;                        /* Temporary variables to hold input data */
+  q31_t inB1, inB2;                              /* Temporary variables to hold input data */
+  q31_t out1, out2, out3, out4;                  /* Temporary variables to hold output data */
+
+  /* loop Unrolling */
+  blkCnt = numSamples >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[i].            */
+    /* C[2 * i + 1] = A[2 * i + 1] * B[i].        */
+    /* read first complex sample (real, imag) from complex input buffer */
+    inA1 = *pSrcCmplx++;
+    inA2 = *pSrcCmplx++;
+    /* read two inputs from real input buffer */
+    inB1 = *pSrcReal++;
+    inB2 = *pSrcReal++;
+    /* read second complex sample (real, imag) from complex input buffer */
+    inA3 = *pSrcCmplx++;
+    inA4 = *pSrcCmplx++;
+
+    /* multiply complex input with real input */
+    out1 = ((q63_t) inA1 * inB1) >> 32;
+    out2 = ((q63_t) inA2 * inB1) >> 32;
+    out3 = ((q63_t) inA3 * inB2) >> 32;
+    out4 = ((q63_t) inA4 * inB2) >> 32;
+
+    /* saturate the result */
+    out1 = __SSAT(out1, 31);
+    out2 = __SSAT(out2, 31);
+    out3 = __SSAT(out3, 31);
+    out4 = __SSAT(out4, 31);
+
+    /* get result in 1.31 format */
+    out1 = out1 << 1;
+    out2 = out2 << 1;
+    out3 = out3 << 1;
+    out4 = out4 << 1;
+
+    /* store the result to destination buffer */
+    *pCmplxDst++ = out1;
+    *pCmplxDst++ = out2;
+    *pCmplxDst++ = out3;
+    *pCmplxDst++ = out4;
+
+    /* read first complex sample (real, imag) from complex input buffer */
+    inA1 = *pSrcCmplx++;
+    inA2 = *pSrcCmplx++;
+    /* read two inputs from real input buffer */
+    inB1 = *pSrcReal++;
+    inB2 = *pSrcReal++;
+    /* read second complex sample (real, imag) from complex input buffer */
+    inA3 = *pSrcCmplx++;
+    inA4 = *pSrcCmplx++;
+
+    /* multiply complex input with real input */
+    out1 = ((q63_t) inA1 * inB1) >> 32;
+    out2 = ((q63_t) inA2 * inB1) >> 32;
+    out3 = ((q63_t) inA3 * inB2) >> 32;
+    out4 = ((q63_t) inA4 * inB2) >> 32;
+
+    /* saturate the result */
+    out1 = __SSAT(out1, 31);
+    out2 = __SSAT(out2, 31);
+    out3 = __SSAT(out3, 31);
+    out4 = __SSAT(out4, 31);
+
+    /* get result in 1.31 format */
+    out1 = out1 << 1;
+    out2 = out2 << 1;
+    out3 = out3 << 1;
+    out4 = out4 << 1;
+
+    /* store the result to destination buffer */
+    *pCmplxDst++ = out1;
+    *pCmplxDst++ = out2;
+    *pCmplxDst++ = out3;
+    *pCmplxDst++ = out4;
+
+    /* Decrement the numSamples loop counter */
+    blkCnt--;
+  }
+
+  /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+   ** No loop unrolling is used. */
+  blkCnt = numSamples % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C[2 * i] = A[2 * i] * B[i].            */
+    /* C[2 * i + 1] = A[2 * i + 1] * B[i].        */
+    /* read one complex sample (real, imag) from complex input buffer */
+    inA1 = *pSrcCmplx++;
+    inA2 = *pSrcCmplx++;
+    /* read input from real input buffer */
+    inB1 = *pSrcReal++;
+
+    /* multiply complex input with real input */
+    out1 = ((q63_t) inA1 * inB1) >> 32;
+    out2 = ((q63_t) inA2 * inB1) >> 32;
+
+    /* saturate the result */
+    out1 = __SSAT(out1, 31);
+    out2 = __SSAT(out2, 31);
+
+    /* get result in 1.31 format */
+    out1 = out1 << 1;
+    out2 = out2 << 1;
+
+    /* store the result to destination buffer */
+    *pCmplxDst++ = out1;
+    *pCmplxDst++ = out2;
+
+    /* Decrement the numSamples loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(numSamples > 0u)
+  {
+    /* realOut = realA * realB.            */
+    /* imagOut = imagA * realB.                */
+    inA1 = *pSrcReal++;
+    /* store the result in the destination buffer. */
+    *pCmplxDst++ =
+      (q31_t) clip_q63_to_q31(((q63_t) * pSrcCmplx++ * inA1) >> 31);
+    *pCmplxDst++ =
+      (q31_t) clip_q63_to_q31(((q63_t) * pSrcCmplx++ * inA1) >> 31);
+
+    /* Decrement the numSamples loop counter */
+    numSamples--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of CmplxByRealMult group    
+ */
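Reviewer note: the two build paths in this file scale the product differently. The Cortex-M4/M3 path keeps the high word (`>> 32`), saturates to 31 bits, and shifts left by one; the Cortex-M0 path shifts by 31 and clips to 32 bits. The sketch below models both in portable C to show where they can disagree. The names `q31_mul_hi_path` and `q31_mul_clip_path` are hypothetical, and an arithmetic right shift for negative values is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Models the Cortex-M4/M3 path: >> 32, saturate to 31 bits, then << 1.
 * The left shift is written as a multiply by 2, which cannot overflow
 * after the clamp and avoids shifting a negative value. */
static int32_t q31_mul_hi_path(int32_t a, int32_t b)
{
    int32_t hi = (int32_t)(((int64_t)a * b) >> 32);   /* 2.30 */
    if (hi > 0x3FFFFFFF)  hi = 0x3FFFFFFF;            /* __SSAT(hi, 31) */
    if (hi < -0x40000000) hi = -0x40000000;
    return hi * 2;                                    /* back to 1.31, LSB = 0 */
}

/* Models the Cortex-M0 path: >> 31, then clip the 64-bit value to 32 bits. */
static int32_t q31_mul_clip_path(int32_t a, int32_t b)
{
    int64_t p = ((int64_t)a * b) >> 31;
    if (p > INT32_MAX) p = INT32_MAX;
    if (p < INT32_MIN) p = INT32_MIN;
    return (int32_t)p;
}
```

Under these assumptions the first path always produces an even result, so the two can differ in the least significant bit; tests comparing targets may want a 1 LSB tolerance.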

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,87 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_pid_init_f32.c    
+*    
+* Description:	Floating-point PID Control initialization function    
+*				   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+ /**    
+ * @addtogroup PID    
+ * @{    
+ */
+
+/**    
+ * @brief  Initialization function for the floating-point PID Control.   
+ * @param[in,out] *S points to an instance of the PID structure.   
+ * @param[in]     resetStateFlag  flag to reset the state. 0 = no change in state & 1 = reset the state.   
+ * @return none.   
+ * \par Description:   
+ * \par    
+ * The <code>resetStateFlag</code> specifies whether to set state to zero or not. \n   
+ * The function computes the structure fields <code>A0</code>, <code>A1</code>, and <code>A2</code>
+ * from the proportional gain (\c Kp), integral gain (\c Ki), and derivative gain (\c Kd).
+ * When <code>resetStateFlag</code> is nonzero, it also sets the state variables to zero.
+ */
+
+void arm_pid_init_f32(
+  arm_pid_instance_f32 * S,
+  int32_t resetStateFlag)
+{
+
+  /* Derived coefficient A0 */
+  S->A0 = S->Kp + S->Ki + S->Kd;
+
+  /* Derived coefficient A1 */
+  S->A1 = (-S->Kp) - ((float32_t) 2.0 * S->Kd);
+
+  /* Derived coefficient A2 */
+  S->A2 = S->Kd;
+
+  /* Check whether state needs reset or not */
+  if(resetStateFlag)
+  {
+    /* Clear the state buffer. Its size is always 3 samples. */
+    memset(S->state, 0, 3u * sizeof(float32_t));
+  }
+
+}
+
+/**    
+ * @} end of PID group    
+ */
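Reviewer note: the derived coefficients are easier to sanity-check with a small stand-alone model. The sketch below mirrors the initialization above and pairs it with an update step. The struct `pid_f32` and the functions `pid_init_ref` and `pid_step_ref` are hypothetical; the update step is an assumption (the incremental form y[n] = y[n-1] + A0*x[n] + A1*x[n-1] + A2*x[n-2]) and does not appear in this changeset excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the PID instance: derived coefficients,
 * three state values (x[n-1], x[n-2], y[n-1]), and the three gains. */
typedef struct {
    float A0, A1, A2;
    float state[3];
    float Kp, Ki, Kd;
} pid_f32;

/* Same coefficient derivation as the initialization function above. */
static void pid_init_ref(pid_f32 *s, int reset_state)
{
    assert(s != NULL);
    s->A0 = s->Kp + s->Ki + s->Kd;
    s->A1 = -s->Kp - 2.0f * s->Kd;
    s->A2 = s->Kd;
    if (reset_state) {
        memset(s->state, 0, sizeof s->state);
    }
}

/* Assumed update step: adds the weighted error history to the previous output. */
static float pid_step_ref(pid_f32 *s, float err)
{
    assert(s != NULL);
    float out = s->A0 * err + s->A1 * s->state[0]
              + s->A2 * s->state[1] + s->state[2];
    s->state[1] = s->state[0];   /* x[n-2] <- x[n-1] */
    s->state[0] = err;           /* x[n-1] <- x[n]   */
    s->state[2] = out;           /* y[n-1] <- y[n]   */
    return out;
}
```

With Kp = 1, Ki = 0.5, Kd = 0.25 and a constant error of 1, the first step gives 1.75 (P + I + D kick) and the second gives 2.0 (P plus two integral steps, no derivative term), which matches a textbook discrete PID.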

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,122 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_pid_init_q15.c    
+*    
+* Description:	Q15 PID Control initialization function    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+ /**    
+ * @addtogroup PID    
+ * @{    
+ */
+
+/**    
+ * @brief  Initialization function for the Q15 PID Control.    
+ * @param[in,out] *S points to an instance of the Q15 PID structure.    
+ * @param[in]     resetStateFlag  flag to reset the state. 0 = no change in state & 1 = reset the state.    
+ * @return none.    
+ * \par Description:   
+ * \par    
+ * The <code>resetStateFlag</code> specifies whether to set state to zero or not. \n   
+ * The function computes the structure fields: <code>A0</code>, <code>A1</code>, <code>A2</code>    
+ * using the proportional gain( \c Kp), integral gain( \c Ki) and derivative gain( \c Kd),    
+ * and sets the state variables to all zeros when <code>resetStateFlag</code> is nonzero.    
+ */
+
+void arm_pid_init_q15(
+  arm_pid_instance_q15 * S,
+  int32_t resetStateFlag)
+{
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Derived coefficient A0 */
+  S->A0 = __QADD16(__QADD16(S->Kp, S->Ki), S->Kd);
+
+  /* Derived coefficients and pack into A1 */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+  S->A1 = __PKHBT(-__QADD16(__QADD16(S->Kd, S->Kd), S->Kp), S->Kd, 16);
+
+#else
+
+  S->A1 = __PKHBT(S->Kd, -__QADD16(__QADD16(S->Kd, S->Kd), S->Kp), 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+  /* Check whether state needs reset or not */
+  if(resetStateFlag)
+  {
+    /* Clear the state buffer.  The size will be always 3 samples */
+    memset(S->state, 0, 3u * sizeof(q15_t));
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q31_t temp;                                    /*to store the sum */
+
+  /* Derived coefficient A0 */
+  temp = S->Kp + S->Ki + S->Kd;
+  S->A0 = (q15_t) __SSAT(temp, 16);
+
+  /* Derived coefficients A1 and A2 (stored as separate fields in this branch) */
+  temp = -(S->Kd + S->Kd + S->Kp);
+  S->A1 = (q15_t) __SSAT(temp, 16);
+  S->A2 = S->Kd;
+
+
+
+  /* Check whether state needs reset or not */
+  if(resetStateFlag)
+  {
+    /* Clear the state buffer.  The size will be always 3 samples */
+    memset(S->state, 0, 3u * sizeof(q15_t));
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of PID group    
+ */
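
The Cortex-M0 branch of arm_pid_init_q15 widens the Q15 gains to 32 bits, sums them, and saturates back to 16 bits. A portable sketch of that pattern follows; `sat16`, `q15_a0` and `q15_a1` are hypothetical helper names, and `sat16` is only an assumed plain-C equivalent of `__SSAT(x, 16)`, not the intrinsic itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed portable equivalent of __SSAT(x, 16): clamp to the int16_t range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* A0 = sat(Kp + Ki + Kd), computed in 32 bits so the sum cannot wrap. */
static int16_t q15_a0(int16_t kp, int16_t ki, int16_t kd)
{
    return sat16((int32_t)kp + ki + kd);
}

/* A1 = sat(-(Kd + Kd + Kp)), again widened before the negation. */
static int16_t q15_a1(int16_t kp, int16_t kd)
{
    return sat16(-((int32_t)kd + kd + kp));
}
```

Gains whose sum exceeds the Q15 range clamp to 0x7FFF or 0x8000 instead of wrapping around, so an oversized gain set degrades to the largest representable coefficient rather than flipping sign.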

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,107 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_pid_init_q31.c    
+*    
+* Description:	Q31 PID Control initialization function     
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+ /**    
+ * @addtogroup PID    
+ * @{    
+ */
+
+/**    
+ * @brief  Initialization function for the Q31 PID Control.   
+ * @param[in,out] *S points to an instance of the Q31 PID structure.   
+ * @param[in]     resetStateFlag  flag to reset the state. 0 = no change in state & 1 = reset the state.   
+ * @return none.    
+ * \par Description:   
+ * \par    
+ * The <code>resetStateFlag</code> specifies whether to set state to zero or not. \n   
+ * The function computes the structure fields: <code>A0</code>, <code>A1</code>, <code>A2</code>    
+ * using the proportional gain( \c Kp), integral gain( \c Ki) and derivative gain( \c Kd),    
+ * and sets the state variables to all zeros when <code>resetStateFlag</code> is nonzero.    
+ */
+
+void arm_pid_init_q31(
+  arm_pid_instance_q31 * S,
+  int32_t resetStateFlag)
+{
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Derived coefficient A0 */
+  S->A0 = __QADD(__QADD(S->Kp, S->Ki), S->Kd);
+
+  /* Derived coefficient A1 */
+  S->A1 = -__QADD(__QADD(S->Kd, S->Kd), S->Kp);
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q31_t temp;
+
+  /* Derived coefficient A0 */
+  temp = clip_q63_to_q31((q63_t) S->Kp + S->Ki);
+  S->A0 = clip_q63_to_q31((q63_t) temp + S->Kd);
+
+  /* Derived coefficient A1 */
+  temp = clip_q63_to_q31((q63_t) S->Kd + S->Kd);
+  S->A1 = -clip_q63_to_q31((q63_t) temp + S->Kp);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  /* Derived coefficient A2 */
+  S->A2 = S->Kd;
+
+  /* Check whether state needs reset or not */
+  if(resetStateFlag)
+  {
+    /* Clear the state buffer.  The size will be always 3 samples */
+    memset(S->state, 0, 3u * sizeof(q31_t));
+  }
+
+}
+
+/**    
+ * @} end of PID group    
+ */
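
In the Q31 variant the Cortex-M0 branch performs each addition in 64 bits and clips the result back to 32 bits. The sketch below shows that widening pattern in plain C; `clip64_to_32` and `q31_sat_add` are hypothetical names standing in for `clip_q63_to_q31` and the saturating add, and they are an assumption about the behaviour rather than a copy of the library helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of clip_q63_to_q31: clamp a 64-bit value to int32_t. */
static int32_t clip64_to_32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Saturating Q31 add: widen first so the intermediate sum cannot overflow. */
static int32_t q31_sat_add(int32_t a, int32_t b)
{
    return clip64_to_32((int64_t)a + b);
}

/* A0 = sat(sat(Kp + Ki) + Kd), mirroring the two-step sum in the diff. */
static int32_t q31_a0(int32_t kp, int32_t ki, int32_t kd)
{
    return q31_sat_add(q31_sat_add(kp, ki), kd);
}
```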

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,65 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_pid_reset_f32.c    
+*    
+* Description:	Floating-point PID Control reset function   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+ /**    
+ * @addtogroup PID    
+ * @{    
+ */
+
+/**    
+* @brief  Reset function for the floating-point PID Control.   
+* @param[in] *S	Instance pointer of PID control data structure.   
+* @return none.    
+* \par Description:   
+* The function resets the state buffer to zeros.    
+*/
+void arm_pid_reset_f32(
+  arm_pid_instance_f32 * S)
+{
+
+  /* Clear the state buffer.  The size will be always 3 samples */
+  memset(S->state, 0, 3u * sizeof(float32_t));
+}
+
+/**    
+ * @} end of PID group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,64 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_pid_reset_q15.c    
+*    
+* Description:	Q15 PID Control reset function   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+ /**    
+ * @addtogroup PID    
+ * @{    
+ */
+
+/**    
+* @brief  Reset function for the Q15 PID Control.   
+* @param[in] *S		Instance pointer of PID control data structure.   
+* @return none.    
+* \par Description:   
+* The function resets the state buffer to zeros.    
+*/
+void arm_pid_reset_q15(
+  arm_pid_instance_q15 * S)
+{
+  /* Reset the state buffer to zero. The size is always 3 samples */
+  memset(S->state, 0, 3u * sizeof(q15_t));
+}
+
+/**    
+ * @} end of PID group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ControllerFunctions/arm_pid_reset_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,65 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_pid_reset_q31.c    
+*    
+* Description:	Q31 PID Control reset function   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+ /**    
+ * @addtogroup PID    
+ * @{    
+ */
+
+/**    
+* @brief  Reset function for the Q31 PID Control.   
+* @param[in] *S	Instance pointer of PID control data structure.   
+* @return none.    
+* \par Description:   
+* The function resets the state buffer to zeros.    
+*/
+void arm_pid_reset_q31(
+  arm_pid_instance_q31 * S)
+{
+
+  /* Clear the state buffer.  The size will be always 3 samples */
+  memset(S->state, 0, 3u * sizeof(q31_t));
+}
+
+/**    
+ * @} end of PID group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ControllerFunctions/arm_sin_cos_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ControllerFunctions/arm_sin_cos_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,149 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_sin_cos_f32.c    
+*    
+* Description:	Sine and Cosine calculation for floating-point values.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**    
+ * @ingroup groupController    
+ */
+
+/**    
+ * @defgroup SinCos Sine Cosine   
+ *    
+ * Computes the trigonometric sine and cosine values using a combination of table lookup   
+ * and cubic interpolation.     
+ * There are separate functions for Q31 and floating-point data types.   
+ * The input to the floating-point version is in degrees while the   
+ * fixed-point Q31 have a scaled input with the range   
+ * [-1 0.9999] mapping to [-180 +180] degrees.   
+ *
+ * The floating point function also allows values that are out of the usual range. When this happens, the function will
+ * take extra time to adjust the input value to the range of [-180 180].
+ *   
+ * The implementation is based on table lookup using FAST_MATH_TABLE_SIZE (512) values together with cubic interpolation.   
+ * The steps used are:   
+ *  -# Calculation of the nearest integer table index.   
+ *  -# Compute the fractional portion (fract) of the input.   
+ *  -# Fetch the sine table values at \c index and \c index+1 as the endpoint values \c f1 and \c f2.      
+ *  -# Fetch the cosine values at the same two points (sine table offset by a quarter period) as the endpoint slopes \c d1 and \c d2.    
+ *  -# Sine value is computed as the cubic through these endpoint values and slopes, evaluated at \c fract.      
+ *  -# Cosine value is computed the same way, with cosine values as endpoints and negated sine values as slopes.    
+ */
+
+ /**    
+ * @addtogroup SinCos    
+ * @{    
+ */
+
+/**    
+ * @brief  Floating-point sin_cos function.   
+ * @param[in]  theta    input value in degrees    
+ * @param[out] *pSinVal points to the processed sine output.    
+ * @param[out] *pCosVal points to the processed cos output.    
+ * @return none.   
+ */
+
+void arm_sin_cos_f32(
+  float32_t theta,
+  float32_t * pSinVal,
+  float32_t * pCosVal)
+{
+  float32_t fract, in;                             /* Temporary variables for input, output */
+  uint16_t indexS, indexC;                         /* Index variable */
+  float32_t f1, f2, d1, d2;                        /* Two nearest output values */
+  int32_t n;
+  float32_t findex, Dn, Df, temp;
+
+  /* input x is in degrees */
+  /* Scale the input: divide by 360. The cosine is read from the sine table at a quarter-period index offset (indexC) */
+  in = theta * 0.00277777777778f;
+
+  /* Calculation of floor value of input */
+  n = (int32_t) in;
+
+  /* Make negative values towards -infinity */
+  if(in < 0.0f)
+  {
+    n--;
+  }
+  /* Map input value to [0 1] */
+  in = in - (float32_t) n;
+
+  /* Calculation of index of the table */
+  findex = (float32_t) FAST_MATH_TABLE_SIZE * in;
+  indexS = ((uint16_t)findex) & 0x1ff;
+  indexC = (indexS + (FAST_MATH_TABLE_SIZE / 4)) & 0x1ff;
+
+  /* fractional value calculation */
+  fract = findex - (float32_t) indexS;
+
+  /* Read two nearest values of input value from the cos & sin tables */
+  f1 = sinTable_f32[indexC+0];
+  f2 = sinTable_f32[indexC+1];
+  d1 = -sinTable_f32[indexS+0];
+  d2 = -sinTable_f32[indexS+1];
+
+  Dn = 0.0122718463030f; // delta between the two points (fixed), in this case 2*pi/FAST_MATH_TABLE_SIZE
+  Df = f2 - f1; // delta between the values of the functions
+  temp = Dn*(d1 + d2) - 2*Df;
+  temp = fract*temp + (3*Df - (d2 + 2*d1)*Dn);
+  temp = fract*temp + d1*Dn;
+
+  /* Calculation of cosine value */
+  *pCosVal = fract*temp + f1;
+  
+  /* Read two nearest values of input value from the cos & sin tables */
+  f1 = sinTable_f32[indexS+0];
+  f2 = sinTable_f32[indexS+1];
+  d1 = sinTable_f32[indexC+0];
+  d2 = sinTable_f32[indexC+1];
+
+  Df = f2 - f1; // delta between the values of the functions
+  temp = Dn*(d1 + d2) - 2*Df;
+  temp = fract*temp + (3*Df - (d2 + 2*d1)*Dn);
+  temp = fract*temp + d1*Dn;
+  
+  /* Calculation of sine value */
+  *pSinVal = fract*temp + f1;
+}
+/**    
+ * @} end of SinCos group    
+ */
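
The three `temp` lines in arm_sin_cos_f32 evaluate a cubic in Horner form from two table values and two slopes. A standalone sketch of that evaluation is given below; `hermite_sketch` is a hypothetical name, and reading the formula as a cubic Hermite interpolant (endpoint values f1, f2, endpoint derivatives d1, d2, interval width dn) is an interpretation of the code rather than something the source states.

```c
#include <assert.h>

/*
 * Cubic through (0, f1) and (1, f2) with slopes d1*dn and d2*dn at the ends,
 * evaluated at t in [0, 1]. The coefficient expressions and the Horner order
 * follow the temp computations in the diff above.
 */
static float hermite_sketch(float f1, float f2, float d1, float d2,
                            float dn, float t)
{
    float df = f2 - f1;                           /* change in value over the interval */
    float c3 = dn * (d1 + d2) - 2.0f * df;        /* cubic coefficient */
    float c2 = 3.0f * df - (d2 + 2.0f * d1) * dn; /* quadratic coefficient */
    float c1 = d1 * dn;                           /* linear coefficient */

    assert(t >= 0.0f && t <= 1.0f);
    return ((c3 * t + c2) * t + c1) * t + f1;
}
```

The interpolant reproduces both endpoints, and it is exact for low-order polynomials: feeding it the values and slopes of x*x on [0, 1] returns 0.25 at the midpoint.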

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/ControllerFunctions/arm_sin_cos_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/ControllerFunctions/arm_sin_cos_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,122 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_sin_cos_q31.c    
+*    
+* Description:	Cosine & Sine calculation for Q31 values.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**    
+ * @ingroup groupController    
+ */
+
+ /**    
+ * @addtogroup SinCos    
+ * @{    
+ */
+
+/**    
+ * @brief  Q31 sin_cos function.   
+ * @param[in]  theta    scaled input value in degrees    
+ * @param[out] *pSinVal points to the processed sine output.    
+ * @param[out] *pCosVal points to the processed cosine output.    
+ * @return none.   
+ *    
+ * The Q31 input value is in the range [-1 0.999999] and is mapped to a degree value in the range [-180 179].   
+ *    
+ */
+
+void arm_sin_cos_q31(
+  q31_t theta,
+  q31_t * pSinVal,
+  q31_t * pCosVal)
+{
+  q31_t fract;                                 /* Temporary variables for input, output */
+  uint16_t indexS, indexC;                     /* Index variable */
+  q31_t f1, f2, d1, d2;                        /* Two nearest output values */
+  q31_t Dn, Df;
+  q63_t temp;
+  
+  /* Calculate the nearest index */
+  indexS = (uint32_t)theta >> CONTROLLER_Q31_SHIFT;
+  indexC = (indexS + 128) & 0x1ff;
+
+  /* Calculation of fractional value */
+  fract = (theta - (indexS << CONTROLLER_Q31_SHIFT)) << 8;
+  
+  /* Read two nearest values of input value from the cos & sin tables */
+  f1 = sinTable_q31[indexC+0];
+  f2 = sinTable_q31[indexC+1];
+  d1 = -sinTable_q31[indexS+0];
+  d2 = -sinTable_q31[indexS+1];
+
+  Dn = 0x1921FB5; // delta between the two points (fixed), in this case 2*pi/FAST_MATH_TABLE_SIZE
+  Df = f2 - f1; // delta between the values of the functions
+  temp = Dn*((q63_t)d1 + d2);
+  temp = temp - ((q63_t)Df << 32);
+  temp = (q63_t)fract*(temp >> 31);
+  temp = temp + ((3*(q63_t)Df << 31) - (d2 + ((q63_t)d1 << 1))*Dn);
+  temp = (q63_t)fract*(temp >> 31);
+  temp = temp + (q63_t)d1*Dn;
+  temp = (q63_t)fract*(temp >> 31);
+
+  /* Calculation of cosine value */
+  *pCosVal = clip_q63_to_q31((temp >> 31) + (q63_t)f1);
+  
+  /* Read two nearest values of input value from the cos & sin tables */
+  f1 = sinTable_q31[indexS+0];
+  f2 = sinTable_q31[indexS+1];
+  d1 = sinTable_q31[indexC+0];
+  d2 = sinTable_q31[indexC+1];
+
+  Df = f2 - f1; // delta between the values of the functions
+  temp = Dn*((q63_t)d1 + d2);
+  temp = temp - ((q63_t)Df << 32);
+  temp = (q63_t)fract*(temp >> 31);
+  temp = temp + ((3*(q63_t)Df << 31) - (d2 + ((q63_t)d1 << 1))*Dn);
+  temp = (q63_t)fract*(temp >> 31);
+  temp = temp + (q63_t)d1*Dn;
+  temp = (q63_t)fract*(temp >> 31);
+  
+  /* Calculation of sine value */
+  *pSinVal = clip_q63_to_q31((temp >> 31) + (q63_t)f1);
+}
+
+/**    
+ * @} end of SinCos group    
+ */
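
The Q31 routine splits the scaled angle into a table index (the top bits) and an interpolation fraction (the remaining bits, rescaled to Q31). The sketch below isolates that split; `SKETCH_SHIFT`, `q31_index` and `q31_fract` are hypothetical names, and the value 23 is an assumption about `CONTROLLER_Q31_SHIFT` inferred from the 9-bit `0x1ff` index mask, since the macro is defined outside this diff. The sketch uses unsigned arithmetic throughout to keep the shifts well defined.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SHIFT 23u   /* assumed value of CONTROLLER_Q31_SHIFT (32 - 9) */

/* Top 9 bits of the angle select one of 512 table intervals. */
static uint16_t q31_index(int32_t theta)
{
    return (uint16_t)((uint32_t)theta >> SKETCH_SHIFT);
}

/* Low 23 bits are the position inside the interval, rescaled to Q31. */
static int32_t q31_fract(int32_t theta)
{
    uint32_t low = (uint32_t)theta & ((1u << SKETCH_SHIFT) - 1u);
    return (int32_t)(low << 8);
}

/* Cosine index: a quarter period (128 entries) ahead, wrapped to 9 bits. */
static uint16_t q31_cos_index(uint16_t index_s)
{
    return (uint16_t)((index_s + 128u) & 0x1ffu);
}
```

Under these assumptions an input of 0x40000000 (0.5, that is 90 degrees) selects index 128, a quarter of the way through the table, and the most negative input selects index 256, half a period in.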

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,127 @@
+/* ----------------------------------------------------------------------
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.
+*
+* $Date:        21. September 2015
+* $Revision:    V.1.4.5 a
+*
+* Project:      CMSIS DSP Library
+* Title:        arm_cos_f32.c
+*
+* Description:  Fast cosine calculation for floating-point values.
+*
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+/**
+ * @ingroup groupFastMath
+ */
+
+/**
+ * @defgroup cos Cosine
+ *
+ * Computes the trigonometric cosine function using a combination of table lookup
+ * and linear interpolation.  There are separate functions for
+ * Q15, Q31, and floating-point data types.
+ * The input to the floating-point version is in radians while the
+ * fixed-point Q15 and Q31 have a scaled input with the range
+ * [0 +0.9999] mapping to [0 2*pi).  The fixed-point range is chosen so that a
+ * value of 2*pi wraps around to 0.
+ *
+ * The implementation is based on table lookup using FAST_MATH_TABLE_SIZE (512) values together with linear interpolation.
+ * The steps used are:
+ *  -# Calculation of the nearest integer table index
+ *  -# Compute the fractional portion (fract) of the table index.
+ *  -# The final result equals <code>(1.0f-fract)*a + fract*b;</code>
+ *
+ * where
+ * <pre>
+ *    a=Table[index+0];
+ *    b=Table[index+1];
+ * </pre>
+ */
+
+ /**
+ * @addtogroup cos
+ * @{
+ */
+
+/**
+ * @brief  Fast approximation to the trigonometric cosine function for floating-point data.
+ * @param[in] x input value in radians.
+ * @return cos(x).
+ */
+
+float32_t arm_cos_f32(
+  float32_t x)
+{
+  float32_t cosVal, fract, in;                   /* Temporary variables for input, output */
+  uint16_t index;                                /* Index variable */
+  float32_t a, b;                                /* Two nearest output values */
+  int32_t n;
+  float32_t findex;
+
+  /* input x is in radians */
+  /* Scale the input from [0 2*PI] to the [0 1] range: divide by 2*pi, then add 0.25 (a quarter period, pi/2) to read the sine table */
+  in = x * 0.159154943092f + 0.25f;
+
+  /* Calculation of floor value of input */
+  n = (int32_t) in;
+
+  /* Make negative values towards -infinity */
+  if(in < 0.0f)
+  {
+    n--;
+  }
+
+  /* Map input value to [0 1] */
+  in = in - (float32_t) n;
+
+  /* Calculation of index of the table */
+  findex = (float32_t) FAST_MATH_TABLE_SIZE * in;
+  index = ((uint16_t)findex) & 0x1ff;
+
+  /* fractional value calculation */
+  fract = findex - (float32_t) index;
+
+  /* Read two nearest values of input value from the cos table */
+  a = sinTable_f32[index];
+  b = sinTable_f32[index+1];
+
+  /* Linear interpolation process */
+  cosVal = (1.0f-fract)*a + fract*b;
+
+  /* Return the output value */
+  return (cosVal);
+}
+
+/**
+ * @} end of cos group
+ */
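
Two small steps carry most of arm_cos_f32: reducing the scaled input to a single period, and blending two neighbouring table entries. The sketch below reproduces both in isolation; `wrap_unit` and `lerp_sketch` are hypothetical names, and no lookup table is included, so this illustrates the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reduce a period-scaled input to [0, 1]. The cast truncates toward zero,
 * so negative inputs are stepped down by one to emulate a floor.
 */
static float wrap_unit(float in)
{
    int32_t n = (int32_t)in;
    if (in < 0.0f) {
        n--;
    }
    return in - (float)n;
}

/* Linear interpolation between neighbouring table entries a and b. */
static float lerp_sketch(float a, float b, float fract)
{
    assert(fract >= 0.0f && fract <= 1.0f);
    return (1.0f - fract) * a + fract * b;
}
```

In this sketch an exactly integral negative input such as -1.0f lands on 1.0 rather than 0.0, which is why the interval is closed at both ends.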

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,96 @@
+/* ----------------------------------------------------------------------
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.
+*
+* $Date:        07. September 2015
+* $Revision:    V.1.4.5 a
+*
+* Project:      CMSIS DSP Library
+* Title:        arm_cos_q15.c
+*
+* Description:  Fast cosine calculation for Q15 values.
+*
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**
+ * @ingroup groupFastMath
+ */
+
+ /**
+ * @addtogroup cos
+ * @{
+ */
+
+/**
+ * @brief Fast approximation to the trigonometric cosine function for Q15 data.
+ * @param[in] x Scaled input value in radians.
+ * @return  cos(x).
+ *
+ * The Q15 input value is in the range [0 +0.9999] and is mapped to a radian
+ * value in the range [0 2*pi).
+ */
+
+q15_t arm_cos_q15(
+  q15_t x)
+{
+  q15_t cosVal;                                  /* Temporary variables for input, output */
+  int32_t index;                                 /* Index variables */
+  q15_t a, b;                                    /* Two nearest output values */
+  q15_t fract;                                   /* Temporary values for fractional values */
+
+  /* add 0.25 (pi/2) to read sine table */
+  x = (uint16_t)x + 0x2000;
+  if(x < 0)
+  {   /* convert negative numbers to corresponding positive ones */
+      x = (uint16_t)x + 0x8000;
+  }
+
+  /* Calculate the nearest index */
+  index = (uint32_t)x >> FAST_MATH_Q15_SHIFT;
+
+  /* Calculation of fractional value */
+  fract = (x - (index << FAST_MATH_Q15_SHIFT)) << 9;
+
+  /* Read two nearest values of input value from the sin table */
+  a = sinTable_q15[index];
+  b = sinTable_q15[index+1];
+
+  /* Linear interpolation process */
+  cosVal = (q31_t)(0x8000-fract)*a >> 16;
+  cosVal = (q15_t)((((q31_t)cosVal << 16) + ((q31_t)fract*b)) >> 16);
+
+  return cosVal << 1;
+}
+
+/**
+ * @} end of cos group
+ */
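
Reviewer note: the quarter-period offset used by `arm_cos_q15` can be illustrated in isolation. The sketch below is an assumption-labelled restatement, not library code: `sketch_cos_phase_q15` is a hypothetical name, and it replaces the add-then-fix-sign sequence with a single 15-bit mask, which should give the same wrapped phase for inputs in 0x0000 to 0x7FFF.

```c
#include <stdint.h>

/*
 * Map a Q15 cosine phase (0x0000..0x7FFF covers [0, 2*pi)) to the phase
 * that would be looked up in a sine table: add a quarter period (0x2000)
 * and wrap the sum back into 15 bits.
 */
uint16_t sketch_cos_phase_q15(int16_t x)
{
  uint32_t sum = (uint32_t)(uint16_t)x + 0x2000u;
  return (uint16_t)(sum & 0x7FFFu);
}
```

For example, a phase of 0x6000 (three quarters of a period) wraps to 0x0000, which matches cos(3*pi/2) = sin(0) = 0.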

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FastMathFunctions/arm_cos_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,96 @@
+/* ----------------------------------------------------------------------
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.
+*
+* $Date:        07. September 2015
+* $Revision:    V.1.4.5 a
+*
+* Project:      CMSIS DSP Library
+* Title:        arm_cos_q31.c
+*
+* Description: Fast cosine calculation for Q31 values.
+*
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**
+ * @ingroup groupFastMath
+ */
+
+ /**
+ * @addtogroup cos
+ * @{
+ */
+
+/**
+ * @brief Fast approximation to the trigonometric cosine function for Q31 data.
+ * @param[in] x Scaled input value in radians.
+ * @return  cos(x).
+ *
+ * The Q31 input value is in the range [0 +0.9999] and is mapped to a radian
+ * value in the range [0 2*pi).
+ */
+
+q31_t arm_cos_q31(
+  q31_t x)
+{
+  q31_t cosVal;                                  /* Temporary variables for input, output */
+  int32_t index;                                 /* Index variables */
+  q31_t a, b;                                    /* Two nearest output values */
+  q31_t fract;                                   /* Temporary values for fractional values */
+
+  /* add 0.25 (pi/2) to read sine table */
+  x = (uint32_t)x + 0x20000000;
+  if(x < 0)
+  {   /* convert negative numbers to corresponding positive ones */
+      x = (uint32_t)x + 0x80000000;
+  }
+
+  /* Calculate the nearest index */
+  index = (uint32_t)x >> FAST_MATH_Q31_SHIFT;
+
+  /* Calculation of fractional value */
+  fract = (x - (index << FAST_MATH_Q31_SHIFT)) << 9;
+
+  /* Read two nearest values of input value from the sin table */
+  a = sinTable_q31[index];
+  b = sinTable_q31[index+1];
+
+  /* Linear interpolation process */
+  cosVal = (q63_t)(0x80000000-fract)*a >> 32;
+  cosVal = (q31_t)((((q63_t)cosVal << 32) + ((q63_t)fract*b)) >> 32);
+
+  return cosVal << 1;
+}
+
+/**
+ * @} end of cos group
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,133 @@
+/* ----------------------------------------------------------------------
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.
+*
+* $Date:        21. September 2015
+* $Revision:    V.1.4.5 a
+*
+* Project:      CMSIS DSP Library
+* Title:        arm_sin_f32.c
+*
+* Description:  Fast sine calculation for floating-point values.
+*
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+#include <math.h>
+
+/**
+ * @ingroup groupFastMath
+ */
+
+/**
+ * @defgroup sin Sine
+ *
+ * Computes the trigonometric sine function using a combination of table lookup
+ * and linear interpolation.  There are separate functions for
+ * Q15, Q31, and floating-point data types.
+ * The input to the floating-point version is in radians while the
+ * fixed-point Q15 and Q31 have a scaled input with the range
+ * [0 +0.9999] mapping to [0 2*pi).  The fixed-point range is chosen so that a
+ * value of 2*pi wraps around to 0.
+ *
+ * The implementation is based on table lookup using 512 values together with linear interpolation.
+ * The steps used are:
+ *  -# Calculation of the nearest integer table index
+ *  -# Compute the fractional portion (fract) of the table index.
+ *  -# The final result equals <code>(1.0f-fract)*a + fract*b;</code>
+ *
+ * where
+ * <pre>
+ *    a=Table[index];
+ *    b=Table[index+1];
+ * </pre>
+ */
+
+/**
+ * @addtogroup sin
+ * @{
+ */
+
+/**
+ * @brief  Fast approximation to the trigonometric sine function for floating-point data.
+ * @param[in] x input value in radians.
+ * @return  sin(x).
+ */
+
+float32_t arm_sin_f32(
+  float32_t x)
+{
+  float32_t sinVal, fract, in;                           /* Temporary variables for input, output */
+  uint16_t index;                                        /* Index variable */
+  float32_t a, b;                                        /* Two nearest output values */
+  int32_t n;
+  float32_t findex;
+
+  /* input x is in radians */
+  /* Scale the input to [0 1] range from [0 2*PI] , divide input by 2*pi */
+  in = x * 0.159154943092f;
+
+  /* Calculation of floor value of input */
+  n = (int32_t) in;
+
+  /* Make negative values towards -infinity */
+  if(x < 0.0f)
+  {
+    n--;
+  }
+
+  /* Map input value to [0 1] */
+  in = in - (float32_t) n;
+
+  /* Calculation of index of the table */
+  findex = (float32_t) FAST_MATH_TABLE_SIZE * in;
+  if (findex >= 512.0f) {
+    findex -= 512.0f;
+  }
+
+  index = ((uint16_t)findex) & 0x1ff;
+
+  /* fractional value calculation */
+  fract = findex - (float32_t) index;
+
+  /* Read two nearest values of input value from the sin table */
+  a = sinTable_f32[index];
+  b = sinTable_f32[index+1];
+
+  /* Linear interpolation process */
+  sinVal = (1.0f-fract)*a + fract*b;
+
+  /* Return the output value */
+  return (sinVal);
+}
+
+/**
+ * @} end of sin group
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,88 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_sin_q15.c    
+*    
+* Description:	Fast sine calculation for Q15 values.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**    
+ * @ingroup groupFastMath    
+ */
+
+ /**    
+ * @addtogroup sin    
+ * @{    
+ */
+
+/**   
+ * @brief Fast approximation to the trigonometric sine function for Q15 data.   
+ * @param[in] x Scaled input value in radians.   
+ * @return  sin(x).   
+ *   
+ * The Q15 input value is in the range [0 +0.9999] and is mapped to a radian value in the range [0 2*pi).
+ */
+
+q15_t arm_sin_q15(
+  q15_t x)
+{
+  q15_t sinVal;                                  /* Temporary variables for input, output */
+  int32_t index;                                 /* Index variables */
+  q15_t a, b;                                    /* Two nearest output values */
+  q15_t fract;                                   /* Temporary values for fractional values */
+
+  /* Calculate the nearest index */
+  index = (uint32_t)x >> FAST_MATH_Q15_SHIFT;
+
+  /* Calculation of fractional value */
+  fract = (x - (index << FAST_MATH_Q15_SHIFT)) << 9;
+
+  /* Read two nearest values of input value from the sin table */
+  a = sinTable_q15[index];
+  b = sinTable_q15[index+1];
+
+  /* Linear interpolation process */
+  sinVal = (q31_t)(0x8000-fract)*a >> 16;
+  sinVal = (q15_t)((((q31_t)sinVal << 16) + ((q31_t)fract*b)) >> 16);
+
+  return sinVal << 1;
+}
+
+/**    
+ * @} end of sin group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sin_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,87 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_sin_q31.c    
+*    
+* Description:	Fast sine calculation for Q31 values.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**    
+ * @ingroup groupFastMath    
+ */
+
+ /**    
+ * @addtogroup sin    
+ * @{    
+ */
+
+/**   
+ * @brief Fast approximation to the trigonometric sine function for Q31 data.
+ * @param[in] x Scaled input value in radians.
+ * @return  sin(x).
+ *
+ * The Q31 input value is in the range [0 +0.9999] and is mapped to a radian value in the range [0 2*pi). */
+
+q31_t arm_sin_q31(
+  q31_t x)
+{
+  q31_t sinVal;                                  /* Temporary variables for input, output */
+  int32_t index;                                 /* Index variables */
+  q31_t a, b;                                    /* Two nearest output values */
+  q31_t fract;                                   /* Temporary values for fractional values */
+
+  /* Calculate the nearest index */
+  index = (uint32_t)x >> FAST_MATH_Q31_SHIFT;
+
+  /* Calculation of fractional value */
+  fract = (x - (index << FAST_MATH_Q31_SHIFT)) << 9;
+
+  /* Read two nearest values of input value from the sin table */
+  a = sinTable_q31[index];
+  b = sinTable_q31[index+1];
+
+  /* Linear interpolation process */
+  sinVal = (q63_t)(0x80000000-fract)*a >> 32;
+  sinVal = (q31_t)((((q63_t)sinVal << 32) + ((q63_t)fract*b)) >> 32);
+
+  return sinVal << 1;
+}
+
+/**    
+ * @} end of sin group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sqrt_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sqrt_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,155 @@
+/* ----------------------------------------------------------------------     
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.  
+*     
+* $Date:        19. October 2015
+* $Revision: 	V.1.4.5 a
+*     
+* Project:      CMSIS DSP Library  
+* Title:		arm_sqrt_q15.c     
+*     
+* Description:	Q15 square root function.    
+*     
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+
+/**     
+ * @ingroup groupFastMath     
+ */
+
+/**     
+ * @addtogroup SQRT     
+ * @{     
+ */
+
+  /**    
+   * @brief  Q15 square root function.    
+   * @param[in]   in     input value.  The range of the input value is [0 +1) or 0x0000 to 0x7FFF.    
+   * @param[out]  *pOut  square root of input value.    
+   * @return The function returns ARM_MATH_SUCCESS if the input value is positive
+   * and ARM_MATH_ARGUMENT_ERROR if the input is zero or negative.  For
+   * zero or negative inputs, the function sets *pOut = 0.
+   */
+
+arm_status arm_sqrt_q15(
+  q15_t in,
+  q15_t * pOut)
+{
+  q15_t number, temp1, var1, signBits1, half;
+  q31_t bits_val1;
+  float32_t temp_float1;
+  union
+  {
+    q31_t fracval;
+    float32_t floatval;
+  } tempconv;
+
+  number = in;
+
+  /* If the input is a positive number then compute the signBits. */
+  if(number > 0)
+  {
+    signBits1 = __CLZ(number) - 17;
+
+    /* Shift by the number of signBits1 */
+    if((signBits1 % 2) == 0)
+    {
+      number = number << signBits1;
+    }
+    else
+    {
+      number = number << (signBits1 - 1);
+    }
+
+    /* Calculate half value of the number */
+    half = number >> 1;
+    /* Store the number for later use */
+    temp1 = number;
+
+    /* Convert to float */
+    temp_float1 = number * 3.051757812500000e-005f;
+    /* Reinterpret the float bit pattern as an integer */
+    tempconv.floatval = temp_float1;
+    bits_val1 = tempconv.fracval;
+    /* Subtract the shifted value from the magic number to give initial guess */
+    bits_val1 = 0x5f3759df - (bits_val1 >> 1);  /* gives initial guess */
+    /* Store as float */
+    tempconv.fracval = bits_val1;
+    temp_float1 = tempconv.floatval;
+    /* Convert to integer format */
+    var1 = (q31_t) (temp_float1 * 16384);
+
+    /* 1st iteration */
+    var1 = ((q15_t) ((q31_t) var1 * (0x3000 -
+                                     ((q15_t)
+                                      ((((q15_t)
+                                         (((q31_t) var1 * var1) >> 15)) *
+                                        (q31_t) half) >> 15))) >> 15)) << 2;
+    /* 2nd iteration */
+    var1 = ((q15_t) ((q31_t) var1 * (0x3000 -
+                                     ((q15_t)
+                                      ((((q15_t)
+                                         (((q31_t) var1 * var1) >> 15)) *
+                                        (q31_t) half) >> 15))) >> 15)) << 2;
+    /* 3rd iteration */
+    var1 = ((q15_t) ((q31_t) var1 * (0x3000 -
+                                     ((q15_t)
+                                      ((((q15_t)
+                                         (((q31_t) var1 * var1) >> 15)) *
+                                        (q31_t) half) >> 15))) >> 15)) << 2;
+
+    /* Multiply the inverse square root with the original value */
+    var1 = ((q15_t) (((q31_t) temp1 * var1) >> 15)) << 1;
+
+    /* Shift the output down accordingly */
+    if((signBits1 % 2) == 0)
+    {
+      var1 = var1 >> (signBits1 / 2);
+    }
+    else
+    {
+      var1 = var1 >> ((signBits1 - 1) / 2);
+    }
+    *pOut = var1;
+
+    return (ARM_MATH_SUCCESS);
+  }
+  /* If the number is zero or negative then store zero as its square root value */
+  else
+  {
+    *pOut = 0;
+    return (ARM_MATH_ARGUMENT_ERROR);
+  }
+}
+
+/**     
+ * @} end of SQRT group     
+ */
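
Reviewer note: `arm_sqrt_q15` computes the square root indirectly. It forms an initial guess for 1/sqrt(x) from the float bit pattern and the constant 0x5f3759df, refines it with three Newton-Raphson iterations of y = y * (1.5 - 0.5 * x * y * y), then multiplies by x. The sketch below shows the same idea in plain floating point. It is an illustration only: `sketch_sqrt_via_rsqrt` is a hypothetical name, it uses `memcpy` instead of a union for the bit reinterpretation, and it omits the Q15 normalisation and shifting that the library code performs.

```c
#include <stdint.h>
#include <string.h>

/* Square root via a refined inverse-square-root estimate. */
float sketch_sqrt_via_rsqrt(float x)
{
  /* Follow the library convention: non-positive inputs yield 0. */
  if (x <= 0.0f) {
    return 0.0f;
  }

  /* Reinterpret the float bits as an integer for the initial guess. */
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);

  float y;
  memcpy(&y, &bits, sizeof y);          /* y is roughly 1/sqrt(x) */

  /* Three Newton-Raphson steps on f(y) = 1/(y*y) - x. */
  float half = 0.5f * x;
  for (int i = 0; i < 3; i++) {
    y = y * (1.5f - half * y * y);
  }

  /* sqrt(x) = x * (1/sqrt(x)) */
  return x * y;
}
```

The initial guess is within a few percent of the true value and each iteration roughly squares the relative error, so three iterations are enough to reach single-precision accuracy in this float-only sketch.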

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sqrt_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FastMathFunctions/arm_sqrt_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,153 @@
+/* ----------------------------------------------------------------------     
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.  
+*     
+* $Date:        19. October 2015
+* $Revision: 	V.1.4.5 a
+*     
+* Project:      CMSIS DSP Library  
+* Title:		arm_sqrt_q31.c     
+*     
+* Description:	Q31 square root function.    
+*     
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**     
+ * @ingroup groupFastMath     
+ */
+
+/**     
+ * @addtogroup SQRT     
+ * @{     
+ */
+
+/**    
+ * @brief Q31 square root function.    
+ * @param[in]   in    input value.  The range of the input value is [0 +1) or 0x00000000 to 0x7FFFFFFF.    
+ * @param[out]  *pOut square root of input value.    
+ * @return The function returns ARM_MATH_SUCCESS if the input value is positive
+ * and ARM_MATH_ARGUMENT_ERROR if the input is zero or negative.  For
+ * zero or negative inputs, the function sets *pOut = 0.
+ */
+
+arm_status arm_sqrt_q31(
+  q31_t in,
+  q31_t * pOut)
+{
+  q31_t number, temp1, bits_val1, var1, signBits1, half;
+  float32_t temp_float1;
+  union
+  {
+      q31_t fracval;
+      float32_t floatval;
+  } tempconv;
+
+  number = in;
+
+  /* If the input is a positive number then compute the signBits. */
+  if(number > 0)
+  {
+    signBits1 = __CLZ(number) - 1;
+
+    /* Shift by the number of signBits1 */
+    if((signBits1 % 2) == 0)
+    {
+      number = number << signBits1;
+    }
+    else
+    {
+      number = number << (signBits1 - 1);
+    }
+
+    /* Calculate half value of the number */
+    half = number >> 1;
+    /* Store the number for later use */
+    temp1 = number;
+
+    /* Convert to float */
+    temp_float1 = number * 4.6566128731e-010f;
+    /* Reinterpret the float bit pattern as an integer */
+    tempconv.floatval = temp_float1;
+    bits_val1 = tempconv.fracval;
+    /* Subtract the shifted value from the magic number to give initial guess */
+    bits_val1 = 0x5f3759df - (bits_val1 >> 1);  /* gives initial guess */
+    /* Store as float */
+    tempconv.fracval = bits_val1;
+    temp_float1 = tempconv.floatval;
+    /* Convert to integer format */
+    var1 = (q31_t) (temp_float1 * 1073741824);
+
+    /* 1st iteration */
+    var1 = ((q31_t) ((q63_t) var1 * (0x30000000 -
+                                     ((q31_t)
+                                      ((((q31_t)
+                                         (((q63_t) var1 * var1) >> 31)) *
+                                        (q63_t) half) >> 31))) >> 31)) << 2;
+    /* 2nd iteration */
+    var1 = ((q31_t) ((q63_t) var1 * (0x30000000 -
+                                     ((q31_t)
+                                      ((((q31_t)
+                                         (((q63_t) var1 * var1) >> 31)) *
+                                        (q63_t) half) >> 31))) >> 31)) << 2;
+    /* 3rd iteration */
+    var1 = ((q31_t) ((q63_t) var1 * (0x30000000 -
+                                     ((q31_t)
+                                      ((((q31_t)
+                                         (((q63_t) var1 * var1) >> 31)) *
+                                        (q63_t) half) >> 31))) >> 31)) << 2;
+
+    /* Multiply the inverse square root with the original value */
+    var1 = ((q31_t) (((q63_t) temp1 * var1) >> 31)) << 1;
+
+    /* Shift the output down accordingly */
+    if((signBits1 % 2) == 0)
+    {
+      var1 = var1 >> (signBits1 / 2);
+    }
+    else
+    {
+      var1 = var1 >> ((signBits1 - 1) / 2);
+    }
+    *pOut = var1;
+
+    return (ARM_MATH_SUCCESS);
+  }
+  /* If the number is zero or negative then store zero as its square root value */
+  else
+  {
+    *pOut = 0;
+    return (ARM_MATH_ARGUMENT_ERROR);
+  }
+}
+
+/**     
+ * @} end of SQRT group     
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_32x64_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_32x64_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,110 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df1_32x64_init_q31.c    
+*    
+* Description:	High precision Q31 Biquad cascade filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1_32x64    
+ * @{    
+ */
+
+/**    
+ * @details    
+ *    
+ * @param[in,out] *S           	points to an instance of the high precision Q31 Biquad cascade filter structure.    
+ * @param[in]     numStages     number of 2nd order stages in the filter.    
+ * @param[in]     *pCoeffs      points to the filter coefficients.    
+ * @param[in]     *pState       points to the state buffer.    
+ * @param[in]     postShift     Shift to be applied after the accumulator.  Varies according to the coefficients format.    
+ * @return        none    
+ *    
+ * <b>Coefficient and State Ordering:</b>    
+ *    
+ * \par    
+ * The coefficients are stored in the array <code>pCoeffs</code> in the following order:    
+ * <pre>    
+ *     {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}    
+ * </pre>    
+ * where <code>b1x</code> and <code>a1x</code> are the coefficients for the first stage,    
+ * <code>b2x</code> and <code>a2x</code> are the coefficients for the second stage,    
+ * and so on.  The <code>pCoeffs</code> array contains a total of <code>5*numStages</code> values.    
+ *    
+ * \par    
+ * <code>pState</code> points to the state variable array; each state variable is in 1.63 format.    
+ * Each Biquad stage has 4 state variables <code>x[n-1], x[n-2], y[n-1],</code> and <code>y[n-2]</code>.    
+ * The state variables are arranged in the state array as:    
+ * <pre>    
+ *     {x[n-1], x[n-2], y[n-1], y[n-2]}    
+ * </pre>    
+ * The 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on.    
+ * The state array has a total length of <code>4*numStages</code> values.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ */
+
+void arm_biquad_cas_df1_32x64_init_q31(
+  arm_biquad_cas_df1_32x64_ins_q31 * S,
+  uint8_t numStages,
+  q31_t * pCoeffs,
+  q63_t * pState,
+  uint8_t postShift)
+{
+  /* Assign filter stages */
+  S->numStages = numStages;
+
+  /* Assign postShift to be applied to the output */
+  S->postShift = postShift;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear state buffer and size is always 4 * numStages */
+  memset(pState, 0, (4u * (uint32_t) numStages) * sizeof(q63_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+}
+
+/**    
+ * @} end of BiquadCascadeDF1_32x64 group    
+ */
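
The initialization steps documented above (store the stage count, shift, and coefficient pointer, then zero `4*numStages` state words) can be sketched without the CMSIS headers. This is only an illustration: `demo_biquad_32x64`, `demo_biquad_32x64_init`, `DEMO_NUM_STAGES`, and the two static buffers are hypothetical stand-ins, and the struct field order is an assumption rather than a copy of the real declaration in `arm_math.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in types for illustration; the real ones come from arm_math.h. */
typedef int32_t q31_t;
typedef int64_t q63_t;

/* Hypothetical mirror of the instance structure (field order assumed). */
typedef struct {
    uint8_t numStages;  /* number of 2nd order stages */
    q63_t  *pState;     /* 4 * numStages state values, 1.63 format */
    q31_t  *pCoeffs;    /* 5 * numStages coefficients, 1.31 format */
    uint8_t postShift;  /* shift applied after the accumulator */
} demo_biquad_32x64;

#define DEMO_NUM_STAGES 2u

/* Buffer sizes follow the documented layout: 5 coefficients and
 * 4 state variables per stage. */
static q31_t demo_coeffs[5u * DEMO_NUM_STAGES];
static q63_t demo_state[4u * DEMO_NUM_STAGES];

/* Performs the same assignments as the init function in the diff above. */
static void demo_biquad_32x64_init(demo_biquad_32x64 *S, uint8_t numStages,
                                   q31_t *pCoeffs, q63_t *pState,
                                   uint8_t postShift)
{
    assert(S != NULL && pCoeffs != NULL && pState != NULL);

    S->numStages = numStages;
    S->postShift = postShift;
    S->pCoeffs = pCoeffs;

    /* The state buffer always holds 4 * numStages 64-bit values. */
    memset(pState, 0, (4u * (uint32_t) numStages) * sizeof(q63_t));
    S->pState = pState;
}
```

A caller would declare one `demo_biquad_32x64` per filter and pass `demo_coeffs` and `demo_state` to the init routine. Coefficient arrays may be shared between instances, but each instance needs its own state buffer.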

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_32x64_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_32x64_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,561 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. October 2015
+* $Revision: 	V.1.4.5 a
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df1_32x64_q31.c    
+*    
+* Description:	High precision Q31 Biquad cascade filter processing function    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup BiquadCascadeDF1_32x64 High Precision Q31 Biquad Cascade Filter    
+ *    
+ * This function implements a high precision Biquad cascade filter which operates on    
+ * Q31 data values.  The filter coefficients are in 1.31 format and the state variables    
+ * are in 1.63 format.  The double precision state variables reduce quantization noise    
+ * in the filter and provide a cleaner output.    
+ * These filters are particularly useful when implementing filters in which the    
+ * singularities are close to the unit circle.  This is common for low pass or high    
+ * pass filters with very low cutoff frequencies.    
+ *    
+ * The function operates on blocks of input and output data    
+ * and each call to the function processes <code>blockSize</code> samples through    
+ * the filter. <code>pSrc</code> and <code>pDst</code> point to the input and output arrays    
+ * containing <code>blockSize</code> Q31 values.    
+ *    
+ * \par Algorithm    
+ * Each Biquad stage implements a second order filter using the difference equation:    
+ * <pre>    
+ *     y[n] = b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]    
+ * </pre>    
+ * A Direct Form I algorithm is used with 5 coefficients and 4 state variables per stage.    
+ * \image html Biquad.gif "Single Biquad filter stage"    
+ * Coefficients <code>b0, b1, and b2 </code> multiply the input signal <code>x[n]</code> and are referred to as the feedforward coefficients.    
+ * Coefficients <code>a1</code> and <code>a2</code> multiply the output signal <code>y[n]</code> and are referred to as the feedback coefficients.    
+ * Pay careful attention to the sign of the feedback coefficients.    
+ * Some design tools use the difference equation    
+ * <pre>    
+ *     y[n] = b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] - a1 * y[n-1] - a2 * y[n-2]    
+ * </pre>    
+ * In this case the feedback coefficients <code>a1</code> and <code>a2</code> must be negated when used with the CMSIS DSP Library.    
+ *    
+ * \par    
+ * Higher order filters are realized as a cascade of second order sections.    
+ * <code>numStages</code> refers to the number of second order stages used.    
+ * For example, an 8th order filter would be realized with <code>numStages=4</code> second order stages.    
+ * \image html BiquadCascade.gif "8th order filter using a cascade of Biquad stages"    
+ * A 9th order filter would be realized with <code>numStages=5</code> second order stages with the coefficients for one of the stages configured as a first order filter (<code>b2=0</code> and <code>a2=0</code>).    
+ *    
+ * \par    
+ * <code>pState</code> points to the state variable array.    
+ * Each Biquad stage has 4 state variables <code>x[n-1], x[n-2], y[n-1],</code> and <code>y[n-2]</code>, and each state variable is in 1.63 format to improve precision.    
+ * The state variables are arranged in the array as:    
+ * <pre>    
+ *     {x[n-1], x[n-2], y[n-1], y[n-2]}    
+ * </pre>    
+ *    
+ * \par    
+ * The 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on.    
+ * The state array has a total length of <code>4*numStages</code> values of data in 1.63 format.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ *    
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter.    
+ * Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.    
+ *    
+ * \par Init Function    
+ * There is also an associated initialization function which performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.    
+ * To do this manually without calling the init function, assign the following subfields of the instance structure:
+ * numStages, pCoeffs, postShift, pState. Also set all of the values in pState to zero. 
+ *
+ * \par    
+ * Use of the initialization function is optional.    
+ * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.    
+ * To place an instance structure into a const data section, the instance structure must be manually initialized.    
+ * Set the values in the state buffer to zeros before static initialization.    
+ * For example, to statically initialize the filter instance structure use    
+ * <pre>    
+ *     arm_biquad_cas_df1_32x64_ins_q31 S1 = {numStages, pState, pCoeffs, postShift};    
+ * </pre>    
+ * where <code>numStages</code> is the number of Biquad stages in the filter; <code>pState</code> is the address of the state buffer;    
+ * <code>pCoeffs</code> is the address of the coefficient buffer; <code>postShift</code> is the shift to be applied, which is described in detail below.    
+ * \par Fixed-Point Behavior    
+ * Care must be taken when using the Biquad Cascade 32x64 filter function.    
+ * The following issues must be considered:    
+ * - Scaling of coefficients    
+ * - Filter gain    
+ * - Overflow and saturation    
+ *    
+ * \par    
+ * Filter coefficients are represented as fractional values and    
+ * restricted to lie in the range <code>[-1 +1)</code>.    
+ * The processing function has an additional scaling parameter <code>postShift</code>    
+ * which allows the filter coefficients to exceed the range <code>[-1 +1)</code>.    
+ * At the output of the filter's accumulator is a shift register which shifts the result by <code>postShift</code> bits.    
+ * \image html BiquadPostshift.gif "Fixed-point Biquad with shift by postShift bits after accumulator"    
+ * This essentially scales the filter coefficients by <code>2^postShift</code>.    
+ * For example, to realize the coefficients    
+ * <pre>    
+ *    {1.5, -0.8, 1.2, 1.6, -0.9}    
+ * </pre>    
+ * set the Coefficient array to:    
+ * <pre>    
+ *    {0.75, -0.4, 0.6, 0.8, -0.45}    
+ * </pre>    
+ * and set <code>postShift=1</code>    
+ *    
+ * \par    
+ * The second thing to keep in mind is the gain through the filter.    
+ * The frequency response of a Biquad filter is a function of its coefficients.    
+ * It is possible for the gain through the filter to exceed 1.0 meaning that the filter increases the amplitude of certain frequencies.    
+ * This means that an input signal with amplitude < 1.0 may result in an output > 1.0 and these are saturated or overflowed based on the implementation of the filter.    
+ * To avoid this behavior, either the filter must be scaled down so that its peak gain is < 1.0, or the input signal must be scaled down so that the combination of input and filter never overflows.    
+ *    
+ * \par    
+ * The third item to consider is the overflow and saturation behavior of the fixed-point Q31 version.    
+ * This is described in the function specific documentation below.    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1_32x64    
+ * @{    
+ */
+
+/**    
+ * @details    
+    
+ * @param[in]  *S points to an instance of the high precision Q31 Biquad cascade filter.    
+ * @param[in]  *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data.    
+ * @param[in]  blockSize number of samples to process.    
+ * @return none.    
+ *    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows, it wraps around rather than clipping.    
+ * In order to avoid overflows completely the input signal must be scaled down by 2 bits and lie in the range [-0.25 +0.25).    
+ * After all 5 multiply-accumulates are performed, the 2.62 accumulator is shifted by <code>postShift</code> bits and the result truncated to    
+ * 1.31 format by discarding the low 32 bits.    
+ *    
+ * \par    
+ * Two related functions are provided in the CMSIS DSP library.    
+ * <code>arm_biquad_cascade_df1_q31()</code> implements a Biquad cascade with 32-bit coefficients and state variables with a Q63 accumulator.    
+ * <code>arm_biquad_cascade_df1_fast_q31()</code> implements a Biquad cascade with 32-bit coefficients and state variables with a Q31 accumulator.    
+ */
+
+void arm_biquad_cas_df1_32x64_q31(
+  const arm_biquad_cas_df1_32x64_ins_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pIn = pSrc;                             /*  input pointer initialization  */
+  q31_t *pOut = pDst;                            /*  output pointer initialization */
+  q63_t *pState = S->pState;                     /*  state pointer initialization  */
+  q31_t *pCoeffs = S->pCoeffs;                   /*  coeff pointer initialization  */
+  q63_t acc;                                     /*  accumulator                   */
+  q31_t Xn1, Xn2;                                /*  Input Filter state variables        */
+  q63_t Yn1, Yn2;                                /*  Output Filter state variables        */
+  q31_t b0, b1, b2, a1, a2;                      /*  Filter coefficients           */
+  q31_t Xn;                                      /*  temporary input               */
+  int32_t shift = (int32_t) S->postShift + 1;    /*  Shift to be applied to the output */
+  uint32_t sample, stage = S->numStages;         /*  loop counters                     */
+  q31_t acc_l, acc_h;                            /*  temporary output               */
+  uint32_t uShift = ((uint32_t) S->postShift + 1u);
+  uint32_t lShift = 32u - uShift;                /*  Shift to be applied to the output */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  do
+  {
+    /* Reading the coefficients */
+    b0 = *pCoeffs++;
+    b1 = *pCoeffs++;
+    b2 = *pCoeffs++;
+    a1 = *pCoeffs++;
+    a2 = *pCoeffs++;
+
+    /* Reading the state values */
+    Xn1 = (q31_t) (pState[0]);
+    Xn2 = (q31_t) (pState[1]);
+    Yn1 = pState[2];
+    Yn2 = pState[3];
+
+    /* Apply loop unrolling and compute 4 output values simultaneously. */
+    /* The variable acc holds the output value that is being computed and    
+     * stored in the destination buffer    
+     * acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]    
+     */
+
+    sample = blockSize >> 2u;
+
+    /* First part of the processing with loop unrolling. Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) Xn *b0;
+
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) Xn1 *b1;
+
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) Xn2 *b2;
+
+      /* acc +=  a1 * y[n-1] */
+      acc += mult32x64(Yn1, a1);
+
+      /* acc +=  a2 * y[n-2] */
+      acc += mult32x64(Yn2, a2);
+
+      /* The result is converted to 1.63 , Yn2 variable is reused */
+      Yn2 = acc << shift;
+
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      acc_h = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Store the output in the destination buffer in 1.31 format. */
+      *pOut = acc_h;
+
+      /* Read the second input into Xn2, to reuse the value */
+      Xn2 = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b1 * x[n-1] */
+      acc = (q63_t) Xn *b1;
+
+      /* acc +=  b0 * x[n] */
+      acc += (q63_t) Xn2 *b0;
+
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) Xn1 *b2;
+
+      /* acc +=  a1 * y[n-1] */
+      acc += mult32x64(Yn2, a1);
+
+      /* acc +=  a2 * y[n-2] */
+      acc += mult32x64(Yn1, a2);
+
+      /* The result is converted to 1.63, Yn1 variable is reused */
+      Yn1 = acc << shift;
+
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      acc_h = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Read the third input into Xn1, to reuse the value */
+      Xn1 = *pIn++;
+
+      /* The result is converted to 1.31 */
+      /* Store the output in the destination buffer. */
+      *(pOut + 1u) = acc_h;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) Xn1 *b0;
+
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) Xn2 *b1;
+
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) Xn *b2;
+
+      /* acc +=  a1 * y[n-1] */
+      acc += mult32x64(Yn1, a1);
+
+      /* acc +=  a2 * y[n-2] */
+      acc += mult32x64(Yn2, a2);
+
+      /* The result is converted to 1.63, Yn2 variable is reused  */
+      Yn2 = acc << shift;
+
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      acc_h = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Store the output in the destination buffer in 1.31 format. */
+      *(pOut + 2u) = acc_h;
+
+      /* Read the fourth input into Xn, to reuse the value */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) Xn *b0;
+
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) Xn1 *b1;
+
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) Xn2 *b2;
+
+      /* acc +=  a1 * y[n-1] */
+      acc += mult32x64(Yn2, a1);
+
+      /* acc +=  a2 * y[n-2] */
+      acc += mult32x64(Yn1, a2);
+
+      /* The result is converted to 1.63, Yn1 variable is reused  */
+      Yn1 = acc << shift;
+
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      acc_h = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Store the output in the destination buffer in 1.31 format. */
+      *(pOut + 3u) = acc_h;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc    */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+
+      /* update output pointer */
+      pOut += 4u;
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    sample = (blockSize & 0x3u);
+
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) Xn *b0;
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) Xn1 *b1;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) Xn2 *b2;
+      /* acc +=  a1 * y[n-1] */
+      acc += mult32x64(Yn1, a1);
+      /* acc +=  a2 * y[n-2] */
+      acc += mult32x64(Yn2, a2);
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc    */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+      Yn2 = Yn1;
+      /* The result is converted to 1.63, Yn1 variable is reused  */
+      Yn1 = acc << shift;
+
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      acc_h = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Store the output in the destination buffer in 1.31 format. */
+      *pOut++ = acc_h;
+      /* Yn1 = acc << shift; */
+
+      /* Store the output in the destination buffer in 1.31 format. */
+/*      *pOut++ = (q31_t) (acc >> (32 - shift));  */
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /*  The first stage output is given as input to the second stage. */
+    pIn = pDst;
+
+    /* Reset to destination buffer working pointer */
+    pOut = pDst;
+
+    /*  Store the updated state variables back into the pState array */
+    /*  Store the updated state variables back into the pState array */
+    *pState++ = (q63_t) Xn1;
+    *pState++ = (q63_t) Xn2;
+    *pState++ = Yn1;
+    *pState++ = Yn2;
+
+  } while(--stage);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  do
+  {
+    /* Reading the coefficients */
+    b0 = *pCoeffs++;
+    b1 = *pCoeffs++;
+    b2 = *pCoeffs++;
+    a1 = *pCoeffs++;
+    a2 = *pCoeffs++;
+
+    /* Reading the state values */
+    Xn1 = (q31_t) (pState[0]);
+    Xn2 = (q31_t) (pState[1]);
+    Yn1 = pState[2];
+    Yn2 = pState[3];
+
+    /* The variable acc holds the output value that is being computed and        
+     * stored in the destination buffer            
+     * acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]            
+     */
+
+    sample = blockSize;
+
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) Xn *b0;
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) Xn1 *b1;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) Xn2 *b2;
+      /* acc +=  a1 * y[n-1] */
+      acc += mult32x64(Yn1, a1);
+      /* acc +=  a2 * y[n-2] */
+      acc += mult32x64(Yn2, a2);
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc    */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+      Yn2 = Yn1;
+
+      /* The result is converted to 1.63, Yn1 variable is reused  */
+      Yn1 = acc << shift;
+
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      acc_h = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Store the output in the destination buffer in 1.31 format. */
+      *pOut++ = acc_h;
+
+      /* Yn1 = acc << shift; */
+
+      /* Store the output in the destination buffer in 1.31 format. */
+      /* *pOut++ = (q31_t) (acc >> (32 - shift)); */
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /*  The first stage output is given as input to the second stage. */
+    pIn = pDst;
+
+    /* Reset to destination buffer working pointer */
+    pOut = pDst;
+
+    /*  Store the updated state variables back into the pState array */
+    *pState++ = (q63_t) Xn1;
+    *pState++ = (q63_t) Xn2;
+    *pState++ = Yn1;
+    *pState++ = Yn2;
+
+  } while(--stage);
+
+#endif /*    #ifndef ARM_MATH_CM0_FAMILY     */
+}
+
+  /**    
+   * @} end of BiquadCascadeDF1_32x64 group    
+   */
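
The coefficient scaling and sign convention described in the comments above can be sketched as a small host-side helper. This is a rough illustration, not part of the CMSIS DSP Library: `demo_scale_coeffs` and its `negateFeedback` flag are hypothetical names, and the choice of the smallest shift that brings every coefficient inside `[-1 +1)` is an assumption about how one might pick `postShift`.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q31_t;  /* stand-in for the arm_math.h type */

/* Scales `count` coefficients (5 per stage: b0, b1, b2, a1, a2) into
 * [-1, +1), quantizes them to 1.31 format, and returns the postShift
 * that undoes the scaling. If negateFeedback is nonzero, a1 and a2 are
 * negated first, for design tools that subtract the feedback terms. */
static uint8_t demo_scale_coeffs(const double *in, uint32_t count,
                                 int negateFeedback, q31_t *out)
{
    double peak = 0.0;
    double scale = 1.0;
    uint8_t postShift = 0u;
    uint32_t i;

    /* Find the largest coefficient magnitude. */
    for (i = 0u; i < count; i++) {
        double mag = (in[i] < 0.0) ? -in[i] : in[i];
        if (mag > peak) {
            peak = mag;
        }
    }

    /* Smallest power of two that brings the peak below 1.0. */
    while (peak / scale >= 1.0) {
        assert(postShift < 31u);
        scale *= 2.0;
        postShift++;
    }

    for (i = 0u; i < count; i++) {
        double c = in[i];
        double scaled;
        int64_t q;

        if (negateFeedback && (i % 5u) >= 3u) {
            c = -c;  /* a1 and a2 sit at positions 3 and 4 of each stage */
        }

        /* Convert to 1.31: multiply by 2^31, round to nearest. */
        scaled = (c / scale) * 2147483648.0;
        q = (int64_t) (scaled + ((scaled >= 0.0) ? 0.5 : -0.5));

        /* Guard against rounding up to 2^31, which does not fit. */
        if (q > INT32_MAX) {
            q = INT32_MAX;
        }
        if (q < INT32_MIN) {
            q = INT32_MIN;
        }
        out[i] = (q31_t) q;
    }

    return postShift;
}
```

For the documented example `{1.5, -0.8, 1.2, 1.6, -0.9}` this helper picks a shift of 1 and quantizes `{0.75, -0.4, 0.6, 0.8, -0.45}`, matching the values given in the comment block.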

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,425 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df1_f32.c    
+*    
+* Description:	Processing function for the    
+*               floating-point Biquad cascade DirectFormI(DF1) filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup BiquadCascadeDF1 Biquad Cascade IIR Filters Using Direct Form I Structure    
+ *    
+ * This set of functions implements arbitrary order recursive (IIR) filters.    
+ * The filters are implemented as a cascade of second order Biquad sections.    
+ * The functions support Q15, Q31 and floating-point data types.  
+ * Fast versions of the Q15 and Q31 functions are also supported on Cortex-M4 and Cortex-M3.    
+ *    
+ * The functions operate on blocks of input and output data and each call to the function    
+ * processes <code>blockSize</code> samples through the filter.    
+ * <code>pSrc</code> points to the array of input data and    
+ * <code>pDst</code> points to the array of output data.    
+ * Both arrays contain <code>blockSize</code> values.    
+ *    
+ * \par Algorithm    
+ * Each Biquad stage implements a second order filter using the difference equation:    
+ * <pre>    
+ *     y[n] = b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]    
+ * </pre>    
+ * A Direct Form I algorithm is used with 5 coefficients and 4 state variables per stage.    
+ * \image html Biquad.gif "Single Biquad filter stage"    
+ * Coefficients <code>b0, b1 and b2 </code> multiply the input signal <code>x[n]</code> and are referred to as the feedforward coefficients.    
+ * Coefficients <code>a1</code> and <code>a2</code> multiply the output signal <code>y[n]</code> and are referred to as the feedback coefficients.    
+ * Pay careful attention to the sign of the feedback coefficients.    
+ * Some design tools use the difference equation    
+ * <pre>    
+ *     y[n] = b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] - a1 * y[n-1] - a2 * y[n-2]    
+ * </pre>    
+ * In this case the feedback coefficients <code>a1</code> and <code>a2</code> must be negated when used with the CMSIS DSP Library.    
+ *    
+ * \par    
+ * Higher order filters are realized as a cascade of second order sections.    
+ * <code>numStages</code> refers to the number of second order stages used.    
+ * For example, an 8th order filter would be realized with <code>numStages=4</code> second order stages.    
+ * \image html BiquadCascade.gif "8th order filter using a cascade of Biquad stages"    
+ * A 9th order filter would be realized with <code>numStages=5</code> second order stages with the coefficients for one of the stages configured as a first order filter (<code>b2=0</code> and <code>a2=0</code>).    
+ *    
+ * \par    
+ * <code>pState</code> points to the state variable array.    
+ * Each Biquad stage has 4 state variables <code>x[n-1], x[n-2], y[n-1],</code> and <code>y[n-2]</code>.    
+ * The state variables are arranged in the <code>pState</code> array as:    
+ * <pre>    
+ *     {x[n-1], x[n-2], y[n-1], y[n-2]}    
+ * </pre>    
+ *    
+ * \par    
+ * The 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on.    
+ * The state array has a total length of <code>4*numStages</code> values.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ *    
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter.    
+ * Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.    
+ * There are separate instance structure declarations for each of the 3 supported data types.    
+ *    
+ * \par Init Functions    
+ * There is also an associated initialization function for each data type.    
+ * The initialization function performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.    
+ * To do this manually without calling the init function, assign the following subfields of the instance structure:
+ * numStages, pCoeffs, pState. Also set all of the values in pState to zero. 
+ *    
+ * \par    
+ * Use of the initialization function is optional.    
+ * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.    
+ * To place an instance structure into a const data section, the instance structure must be manually initialized.    
+ * Set the values in the state buffer to zeros before static initialization.    
+ * The code below statically initializes each of the 3 different data type filter instance structures:
+ * <pre>    
+ *     arm_biquad_casd_df1_inst_f32 S1 = {numStages, pState, pCoeffs};    
+ *     arm_biquad_casd_df1_inst_q15 S2 = {numStages, pState, pCoeffs, postShift};    
+ *     arm_biquad_casd_df1_inst_q31 S3 = {numStages, pState, pCoeffs, postShift};    
+ * </pre>    
+ * where <code>numStages</code> is the number of Biquad stages in the filter; <code>pState</code> is the address of the state buffer;    
+ * <code>pCoeffs</code> is the address of the coefficient buffer; <code>postShift</code> is the shift to be applied.
+ *    
+ * \par Fixed-Point Behavior    
+ * Care must be taken when using the Q15 and Q31 versions of the Biquad Cascade filter functions.    
+ * The following issues must be considered:
+ * - Scaling of coefficients    
+ * - Filter gain    
+ * - Overflow and saturation    
+ *    
+ * \par    
+ * <b>Scaling of coefficients: </b>    
+ * Filter coefficients are represented as fractional values and    
+ * coefficients are restricted to lie in the range <code>[-1 +1)</code>.    
+ * The fixed-point functions have an additional scaling parameter <code>postShift</code>    
+ * which allows the filter coefficients to exceed the range <code>[-1 +1)</code>.
+ * At the output of the filter's accumulator is a shift register which shifts the result by <code>postShift</code> bits.    
+ * \image html BiquadPostshift.gif "Fixed-point Biquad with shift by postShift bits after accumulator"    
+ * This essentially scales the filter coefficients by <code>2^postShift</code>.    
+ * For example, to realize the coefficients    
+ * <pre>    
+ *    {1.5, -0.8, 1.2, 1.6, -0.9}    
+ * </pre>    
+ * set the pCoeffs array to:    
+ * <pre>    
+ *    {0.75, -0.4, 0.6, 0.8, -0.45}    
+ * </pre>    
+ * and set <code>postShift=1</code>.
+ *    
+ * \par    
+ * <b>Filter gain: </b>    
+ * The frequency response of a Biquad filter is a function of its coefficients.    
+ * It is possible for the gain through the filter to exceed 1.0 meaning that the filter increases the amplitude of certain frequencies.    
+ * This means that an input signal with amplitude < 1.0 may result in an output with amplitude > 1.0, which is then saturated or wraps around depending on the implementation of the filter.
+ * To avoid this behavior, either scale the filter down so that its peak gain is < 1.0, or scale the input signal down so that the combination of input and filter never overflows.
+ *    
+ * \par    
+ * <b>Overflow and saturation: </b>    
+ * For the Q15 and Q31 versions, overflow and saturation behavior is described separately as part of the function-specific documentation below.
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1    
+ * @{    
+ */
+
+/**    
+ * @param[in]  *S         points to an instance of the floating-point Biquad cascade structure.    
+ * @param[in]  *pSrc      points to the block of input data.    
+ * @param[out] *pDst      points to the block of output data.    
+ * @param[in]  blockSize  number of samples to process per call.    
+ * @return     none.    
+ *    
+ */
+
+void arm_biquad_cascade_df1_f32(
+  const arm_biquad_casd_df1_inst_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t *pIn = pSrc;                         /*  source pointer            */
+  float32_t *pOut = pDst;                        /*  destination pointer       */
+  float32_t *pState = S->pState;                 /*  pState pointer            */
+  float32_t *pCoeffs = S->pCoeffs;               /*  coefficient pointer       */
+  float32_t acc;                                 /*  Simulates the accumulator */
+  float32_t b0, b1, b2, a1, a2;                  /*  Filter coefficients       */
+  float32_t Xn1, Xn2, Yn1, Yn2;                  /*  Filter pState variables   */
+  float32_t Xn;                                  /*  temporary input           */
+  uint32_t sample, stage = S->numStages;         /*  loop counters             */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  do
+  {
+    /* Reading the coefficients */
+    b0 = *pCoeffs++;
+    b1 = *pCoeffs++;
+    b2 = *pCoeffs++;
+    a1 = *pCoeffs++;
+    a2 = *pCoeffs++;
+
+    /* Reading the pState values */
+    Xn1 = pState[0];
+    Xn2 = pState[1];
+    Yn1 = pState[2];
+    Yn2 = pState[3];
+
+    /* Apply loop unrolling and compute 4 output values simultaneously. */
+    /*      The variable acc holds the output values that are being computed:
+     *    
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1]   + a2 * y[n-2]    
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1]   + a2 * y[n-2]    
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1]   + a2 * y[n-2]    
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1]   + a2 * y[n-2]    
+     */
+
+    sample = blockSize >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(sample > 0u)
+    {
+      /* Read the first input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      Yn2 = (b0 * Xn) + (b1 * Xn1) + (b2 * Xn2) + (a1 * Yn1) + (a2 * Yn2);
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = Yn2;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+
+      /* Read the second input */
+      Xn2 = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      Yn1 = (b0 * Xn2) + (b1 * Xn) + (b2 * Xn1) + (a1 * Yn2) + (a2 * Yn1);
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = Yn1;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+
+      /* Read the third input */
+      Xn1 = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      Yn2 = (b0 * Xn1) + (b1 * Xn2) + (b2 * Xn) + (a1 * Yn1) + (a2 * Yn2);
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = Yn2;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as: */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+
+      /* Read the fourth input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      Yn1 = (b0 * Xn) + (b1 * Xn1) + (b2 * Xn2) + (a1 * Yn2) + (a2 * Yn1);
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = Yn1;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+
+      /* decrement the loop counter */
+      sample--;
+
+    }
+
+    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    sample = blockSize & 0x3u;
+
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      acc = (b0 * Xn) + (b1 * Xn1) + (b2 * Xn2) + (a1 * Yn1) + (a2 * Yn2);
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = acc;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:    */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+      Yn2 = Yn1;
+      Yn1 = acc;
+
+      /* decrement the loop counter */
+      sample--;
+
+    }
+
+    /*  Store the updated state variables back into the pState array */
+    *pState++ = Xn1;
+    *pState++ = Xn2;
+    *pState++ = Yn1;
+    *pState++ = Yn2;
+
+    /*  The first stage goes from the input buffer to the output buffer. */
+    /*  Subsequent (numStages - 1) stages occur in-place in the output buffer */
+    pIn = pDst;
+
+    /* Reset the output pointer */
+    pOut = pDst;
+
+    /* decrement the loop counter */
+    stage--;
+
+  } while(stage > 0u);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  do
+  {
+    /* Reading the coefficients */
+    b0 = *pCoeffs++;
+    b1 = *pCoeffs++;
+    b2 = *pCoeffs++;
+    a1 = *pCoeffs++;
+    a2 = *pCoeffs++;
+
+    /* Reading the pState values */
+    Xn1 = pState[0];
+    Xn2 = pState[1];
+    Yn1 = pState[2];
+    Yn2 = pState[3];
+
+    /*      The variable acc holds the output value that is computed:
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1]   + a2 * y[n-2]        
+     */
+
+    sample = blockSize;
+
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      acc = (b0 * Xn) + (b1 * Xn1) + (b2 * Xn2) + (a1 * Yn1) + (a2 * Yn2);
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = acc;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:    */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+      Yn2 = Yn1;
+      Yn1 = acc;
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /*  Store the updated state variables back into the pState array */
+    *pState++ = Xn1;
+    *pState++ = Xn2;
+    *pState++ = Yn1;
+    *pState++ = Yn2;
+
+    /*  The first stage goes from the input buffer to the output buffer. */
+    /*  Subsequent (numStages - 1) stages occur in-place in the output buffer */
+    pIn = pDst;
+
+    /* Reset the output pointer */
+    pOut = pDst;
+
+    /* decrement the loop counter */
+    stage--;
+
+  } while(stage > 0u);
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY         */
+
+}
+
+
+  /**    
+   * @} end of BiquadCascadeDF1 group    
+   */
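
For readers who want to check the arithmetic off-target, the following is a minimal portable sketch of the Direct Form I cascade that arm_biquad_cascade_df1_f32 computes, modeled on the non-unrolled Cortex-M0 path above. The name biquad_df1_ref and its signature are hypothetical and not part of CMSIS-DSP. It assumes 5 coefficients {b0, b1, b2, a1, a2} and 4 state values {x[n-1], x[n-2], y[n-1], y[n-2]} per stage, as the documentation above describes, and it adds the feedback terms just as the library code does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable reference, not part of CMSIS-DSP.
 * coeffs: 5 values per stage {b0, b1, b2, a1, a2}
 * state:  4 values per stage {x[n-1], x[n-2], y[n-1], y[n-2]}
 * The feedback terms are added, so a1 and a2 must already carry
 * the sign expected by y[n] = b0*x[n] + ... + a1*y[n-1] + a2*y[n-2]. */
void biquad_df1_ref(const float *coeffs, float *state, size_t num_stages,
                    const float *src, float *dst, size_t block_size)
{
    assert(num_stages > 0);

    const float *in = src;

    for (size_t s = 0; s < num_stages; s++) {
        const float *c = &coeffs[5 * s];
        float *st = &state[4 * s];
        float xn1 = st[0], xn2 = st[1], yn1 = st[2], yn2 = st[3];

        for (size_t n = 0; n < block_size; n++) {
            float xn = in[n];
            float acc = c[0] * xn + c[1] * xn1 + c[2] * xn2
                      + c[3] * yn1 + c[4] * yn2;

            dst[n] = acc;

            /* Shift the delay line by one sample. */
            xn2 = xn1;
            xn1 = xn;
            yn2 = yn1;
            yn1 = acc;
        }

        /* Persist the state so the next block continues seamlessly. */
        st[0] = xn1;
        st[1] = xn2;
        st[2] = yn1;
        st[3] = yn2;

        /* Stages after the first run in place on the output buffer. */
        in = dst;
    }
}
```

With a single stage whose coefficients are {0.5, 0, 0, 0.5, 0} and a unit impulse as input, each output is half the input plus half the previous output, so the sketch yields 0.5, 0.25, 0.125 and so on.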

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_fast_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_fast_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,286 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df1_fast_q15.c    
+*    
+* Description:	Fast processing function for the    
+*				Q15 Biquad cascade filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1    
+ * @{    
+ */
+
+/**    
+ * @details    
+ * @param[in]  *S points to an instance of the Q15 Biquad cascade structure.    
+ * @param[in]  *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data.    
+ * @param[in]  blockSize number of samples to process per call.    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * This fast version uses a 32-bit accumulator with 2.30 format.    
+ * The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows it wraps around and distorts the result.    
+ * In order to avoid overflows completely the input signal must be scaled down by two bits and lie in the range [-0.25 +0.25).    
+ * The 2.30 accumulator is then shifted by <code>postShift</code> bits and the result truncated to 1.15 format by discarding the low 16 bits.    
+ *    
+ * \par    
+ * Refer to the function <code>arm_biquad_cascade_df1_q15()</code> for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion.  Both the slow and the fast versions use the same instance structure.    
+ * Use the function <code>arm_biquad_cascade_df1_init_q15()</code> to initialize the filter structure.    
+ *    
+ */
+
+void arm_biquad_cascade_df1_fast_q15(
+  const arm_biquad_casd_df1_inst_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pIn = pSrc;                             /*  Source pointer                               */
+  q15_t *pOut = pDst;                            /*  Destination pointer                          */
+  q31_t in;                                      /*  Temporary variable to hold input value       */
+  q31_t out;                                     /*  Temporary variable to hold output value      */
+  q31_t b0;                                      /*  Temporary variable to hold b0 value          */
+  q31_t b1, a1;                                  /*  Filter coefficients                          */
+  q31_t state_in, state_out;                     /*  Filter state variables                       */
+  q31_t acc;                                     /*  Accumulator                                  */
+  int32_t shift = (int32_t) (15 - S->postShift); /*  Post shift                                   */
+  q15_t *pState = S->pState;                     /*  State pointer                                */
+  q15_t *pCoeffs = S->pCoeffs;                   /*  Coefficient pointer                          */
+  uint32_t sample, stage = S->numStages;         /*  Stage loop counter                           */
+
+
+
+  do
+  {
+
+    /* Read the b0 and 0 coefficients using SIMD  */
+    b0 = *__SIMD32(pCoeffs)++;
+
+    /* Read the b1 and b2 coefficients using SIMD */
+    b1 = *__SIMD32(pCoeffs)++;
+
+    /* Read the a1 and a2 coefficients using SIMD */
+    a1 = *__SIMD32(pCoeffs)++;
+
+    /* Read the input state values from the state buffer:  x[n-1], x[n-2] */
+    state_in = *__SIMD32(pState)++;
+
+    /* Read the output state values from the state buffer:  y[n-1], y[n-2] */
+    state_out = *__SIMD32(pState)--;
+
+    /* Apply loop unrolling and compute 2 output values simultaneously. */
+    /*      The variable acc holds the output values that are being computed:
+     *    
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]       
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]       
+     */
+    sample = blockSize >> 1u;
+
+    /* First part of the processing with loop unrolling.  Compute 2 outputs at a time.    
+     ** a second loop below computes the remaining 1 sample. */
+    while(sample > 0u)
+    {
+
+      /* Read the input */
+      in = *__SIMD32(pIn)++;
+
+      /* out =  b0 * x[n] + 0 * 0 */
+      out = __SMUAD(b0, in);
+      /* acc =  b1 * x[n-1] + b2 * x[n-2] + out */
+      acc = __SMLAD(b1, state_in, out);
+      /* acc +=  a1 * y[n-1] + a2 * y[n-2] */
+      acc = __SMLAD(a1, state_out, acc);
+
+      /* The result is converted from 2.30 to 1.15 and then saturation is applied */
+      out = __SSAT((acc >> shift), 16);
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      /* x[n-N], x[n-N-1] are packed together to make state_in of type q31 */
+      /* y[n-N], y[n-N-1] are packed together to make state_out of type q31 */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      state_in = __PKHBT(in, state_in, 16);
+      state_out = __PKHBT(out, state_out, 16);
+
+#else
+
+      state_in = __PKHBT(state_in >> 16, (in >> 16), 16);
+      state_out = __PKHBT(state_out >> 16, (out), 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* out =  b0 * x[n] + 0 * 0 */
+      out = __SMUADX(b0, in);
+      /* acc =  b1 * x[n-1] + b2 * x[n-2] + out */
+      acc = __SMLAD(b1, state_in, out);
+      /* acc +=  a1 * y[n-1] + a2 * y[n-2] */
+      acc = __SMLAD(a1, state_out, acc);
+
+      /* The result is converted from 2.30 to 1.15 and then saturation is applied */
+      out = __SSAT((acc >> shift), 16);
+
+
+      /* Store the output in the destination buffer. */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pOut)++ = __PKHBT(state_out, out, 16);
+
+#else
+
+      *__SIMD32(pOut)++ = __PKHBT(out, state_out >> 16, 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      /* x[n-N], x[n-N-1] are packed together to make state_in of type q31 */
+      /* y[n-N], y[n-N-1] are packed together to make state_out of type q31 */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      state_in = __PKHBT(in >> 16, state_in, 16);
+      state_out = __PKHBT(out, state_out, 16);
+
+#else
+
+      state_in = __PKHBT(state_in >> 16, in, 16);
+      state_out = __PKHBT(state_out >> 16, out, 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+
+      /* Decrement the loop counter */
+      sample--;
+
+    }
+
+    /* If the blockSize is not a multiple of 2, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+
+    if((blockSize & 0x1u) != 0u)
+    {
+      /* Read the input */
+      in = *pIn++;
+
+      /* out =  b0 * x[n] + 0 * 0 */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      out = __SMUAD(b0, in);
+
+#else
+
+      out = __SMUADX(b0, in);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* acc =  b1 * x[n-1] + b2 * x[n-2] + out */
+      acc = __SMLAD(b1, state_in, out);
+      /* acc +=  a1 * y[n-1] + a2 * y[n-2] */
+      acc = __SMLAD(a1, state_out, acc);
+
+      /* The result is converted from 2.30 to 1.15 and then saturation is applied */
+      out = __SSAT((acc >> shift), 16);
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = (q15_t) out;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      /* x[n-N], x[n-N-1] are packed together to make state_in of type q31 */
+      /* y[n-N], y[n-N-1] are packed together to make state_out of type q31 */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      state_in = __PKHBT(in, state_in, 16);
+      state_out = __PKHBT(out, state_out, 16);
+
+#else
+
+      state_in = __PKHBT(state_in >> 16, in, 16);
+      state_out = __PKHBT(state_out >> 16, out, 16);
+
+#endif /*   #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+    }
+
+    /*  The first stage goes from the input buffer to the output buffer.  */
+    /*  Subsequent (numStages - 1) stages occur in-place in the output buffer  */
+    pIn = pDst;
+
+    /* Reset the output pointer */
+    pOut = pDst;
+
+    /*  Store the updated state variables back into the state array */
+    *__SIMD32(pState)++ = state_in;
+    *__SIMD32(pState)++ = state_out;
+
+
+    /* Decrement the loop counter */
+    stage--;
+
+  } while(stage > 0u);
+}
+
+
+/**    
+ * @} end of BiquadCascadeDF1 group    
+ */
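
The postShift scaling and the 2.30 accumulator described above can be tried out on a host machine with the sketch below. It is an illustration only: sat16, coeff_to_q15 and biquad_q15_fast_step are hypothetical names, not CMSIS-DSP functions, and the sketch takes 5 coefficients per stage instead of the padded {b0, 0, b1, b2, a1, a2} layout that the SIMD code reads. It assumes that right-shifting a negative value is an arithmetic shift and that converting an out-of-range unsigned value to int32_t wraps, which holds on common compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the signed 16-bit range, as __SSAT(x, 16) does. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Convert a real coefficient to Q15 after dividing it by 2^post_shift.
 * Intended for coefficients of modest magnitude; very large inputs are
 * outside the scope of this sketch. */
int16_t coeff_to_q15(float coeff, int post_shift)
{
    assert(post_shift >= 0 && post_shift <= 15);

    float scaled = coeff / (float)(1 << post_shift) * 32768.0f;
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;   /* round to nearest */
    return sat16((int32_t)scaled);
}

/* Compute one output sample of a single Q15 stage.
 * c:  {b0, b1, b2, a1, a2}, each already divided by 2^post_shift
 * st: {x[n-1], x[n-2], y[n-1], y[n-2]} */
int16_t biquad_q15_fast_step(const int16_t c[5], int16_t st[4],
                             int16_t x, int post_shift)
{
    assert(post_shift >= 0 && post_shift <= 15);

    /* Each 1.15 x 1.15 product is 2.30. Summing in unsigned arithmetic
     * models the wrap-around of a 32-bit accumulator without invoking
     * signed-overflow undefined behavior. */
    uint32_t acc = 0;
    acc += (uint32_t)((int32_t)c[0] * x);
    acc += (uint32_t)((int32_t)c[1] * st[0]);
    acc += (uint32_t)((int32_t)c[2] * st[1]);
    acc += (uint32_t)((int32_t)c[3] * st[2]);
    acc += (uint32_t)((int32_t)c[4] * st[3]);

    /* Undo the coefficient scaling and drop to 1.15, then saturate. */
    int16_t y = sat16((int32_t)acc >> (15 - post_shift));

    st[1] = st[0];
    st[0] = x;
    st[3] = st[2];
    st[2] = y;
    return y;
}
```

For instance, the coefficient 1.5 from the documentation example converts to 24576 with post_shift = 1, and feeding 8192 (0.25 in Q15) through a stage that has only b0 set yields 12288, which is 0.375 in Q15.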

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_fast_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_fast_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,305 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. October 2015
+* $Revision: 	V.1.4.5 a
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df1_fast_q31.c    
+*    
+* Description:	Processing function for the    
+*				Q31 Fast Biquad cascade DirectFormI(DF1) filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1    
+ * @{    
+ */
+
+/**    
+ * @details    
+ *    
+ * @param[in]  *S        points to an instance of the Q31 Biquad cascade structure.    
+ * @param[in]  *pSrc     points to the block of input data.    
+ * @param[out] *pDst     points to the block of output data.    
+ * @param[in]  blockSize number of samples to process per call.    
+ * @return 	   none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * This function is optimized for speed at the expense of fixed-point precision and overflow protection.    
+ * The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format.    
+ * These intermediate results are added to a 2.30 accumulator.    
+ * Finally, the accumulator is saturated and converted to a 1.31 result.    
+ * The fast version has the same overflow behavior as the standard version and provides less precision since it discards the low 32 bits of each multiplication result.    
+ * In order to avoid overflows completely the input signal must be scaled down by two bits and lie in the range [-0.25 +0.25).
+ *    
+ * \par    
+ * Refer to the function <code>arm_biquad_cascade_df1_q31()</code> for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.  Both the slow and the fast versions use the same instance structure.    
+ * Use the function <code>arm_biquad_cascade_df1_init_q31()</code> to initialize the filter structure.    
+ */
+
+void arm_biquad_cascade_df1_fast_q31(
+  const arm_biquad_casd_df1_inst_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t acc = 0;                                 /*  accumulator                   */
+  q31_t Xn1, Xn2, Yn1, Yn2;                      /*  Filter state variables        */
+  q31_t b0, b1, b2, a1, a2;                      /*  Filter coefficients           */
+  q31_t *pIn = pSrc;                             /*  input pointer initialization  */
+  q31_t *pOut = pDst;                            /*  output pointer initialization */
+  q31_t *pState = S->pState;                     /*  pState pointer initialization */
+  q31_t *pCoeffs = S->pCoeffs;                   /*  coeff pointer initialization  */
+  q31_t Xn;                                      /*  temporary input               */
+  int32_t shift = (int32_t) S->postShift + 1;    /*  Shift to be applied to the output */
+  uint32_t sample, stage = S->numStages;         /*  loop counters                     */
+
+
+  do
+  {
+    /* Reading the coefficients */
+    b0 = *pCoeffs++;
+    b1 = *pCoeffs++;
+    b2 = *pCoeffs++;
+    a1 = *pCoeffs++;
+    a2 = *pCoeffs++;
+
+    /* Reading the state values */
+    Xn1 = pState[0];
+    Xn2 = pState[1];
+    Yn1 = pState[2];
+    Yn2 = pState[3];
+
+    /* Apply loop unrolling and compute 4 output values simultaneously. */
+    /*      The variable acc holds the output value that is being computed:
+     *       
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]       
+     */
+
+    sample = blockSize >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.       
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b1 * x[n-1] */
+      /*acc = (q31_t) (((q63_t) b1 * Xn1) >> 32);*/
+      mult_32x32_keep32_R(acc, b1, Xn1);
+      /* acc +=  b0 * x[n] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b0 * (Xn))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b0, Xn);
+      /* acc +=  b[2] * x[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b2 * (Xn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b2, Xn2);
+      /* acc +=  a1 * y[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a1 * (Yn1))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a1, Yn1);
+      /* acc +=  a2 * y[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a2 * (Yn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a2, Yn2);
+
+      /* The result is converted to 1.31 , Yn2 variable is reused */
+      Yn2 = acc << shift;
+
+      /* Read the second input */
+      Xn2 = *(pIn + 1u);
+
+      /* Store the output in the destination buffer. */
+      *pOut = Yn2;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b0 * x[n] */
+      /*acc = (q31_t) (((q63_t) b0 * (Xn2)) >> 32);*/
+      mult_32x32_keep32_R(acc, b0, Xn2);
+      /* acc +=  b1 * x[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b1 * (Xn))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b1, Xn);
+      /* acc +=  b[2] * x[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b2 * (Xn1))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b2, Xn1);
+      /* acc +=  a1 * y[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a1 * (Yn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a1, Yn2);
+      /* acc +=  a2 * y[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a2 * (Yn1))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a2, Yn1);
+
+      /* The result is converted to 1.31, Yn1 variable is reused  */
+      Yn1 = acc << shift;
+
+      /* Read the third input  */
+      Xn1 = *(pIn + 2u);
+
+      /* Store the output in the destination buffer. */
+      *(pOut + 1u) = Yn1;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b0 * x[n] */
+      /*acc = (q31_t) (((q63_t) b0 * (Xn1)) >> 32);*/
+      mult_32x32_keep32_R(acc, b0, Xn1);
+      /* acc +=  b1 * x[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b1 * (Xn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b1, Xn2);
+      /* acc +=  b[2] * x[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b2 * (Xn))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b2, Xn);
+      /* acc +=  a1 * y[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a1 * (Yn1))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a1, Yn1);
+      /* acc +=  a2 * y[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a2 * (Yn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a2, Yn2);
+
+      /* The result is converted to 1.31, Yn2 variable is reused  */
+      Yn2 = acc << shift;
+
+      /* Read the fourth input */
+      Xn = *(pIn + 3u);
+
+      /* Store the output in the destination buffer. */
+      *(pOut + 2u) = Yn2;
+      pIn += 4u;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b0 * x[n] */
+      /*acc = (q31_t) (((q63_t) b0 * (Xn)) >> 32);*/
+      mult_32x32_keep32_R(acc, b0, Xn);
+      /* acc +=  b1 * x[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b1 * (Xn1))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b1, Xn1);
+      /* acc +=  b[2] * x[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b2 * (Xn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b2, Xn2);
+      /* acc +=  a1 * y[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a1 * (Yn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a1, Yn2);
+      /* acc +=  a2 * y[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a2 * (Yn1))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a2, Yn1);
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      Xn2 = Xn1;
+
+      /* The result is converted to 1.31, Yn1 variable is reused  */
+      Yn1 = acc << shift;
+
+      /* Xn1 = Xn     */
+      Xn1 = Xn;
+
+      /* Store the output in the destination buffer. */
+      *(pOut + 3u) = Yn1;
+      pOut += 4u;
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.       
+     ** No loop unrolling is used. */
+    sample = (blockSize & 0x3u);
+
+   while(sample > 0u)
+   {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b0 * x[n] */
+      /*acc = (q31_t) (((q63_t) b0 * (Xn)) >> 32);*/
+      mult_32x32_keep32_R(acc, b0, Xn);
+      /* acc +=  b1 * x[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b1 * (Xn1))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b1, Xn1);
+      /* acc +=  b[2] * x[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) b2 * (Xn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, b2, Xn2);
+      /* acc +=  a1 * y[n-1] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a1 * (Yn1))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a1, Yn1);
+      /* acc +=  a2 * y[n-2] */
+      /*acc = (q31_t) ((((q63_t) acc << 32) + ((q63_t) a2 * (Yn2))) >> 32);*/
+      multAcc_32x32_keep32_R(acc, a2, Yn2);
+
+      /* The result is converted to 1.31  */
+      acc = acc << shift;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc    */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+      Yn2 = Yn1;
+      Yn1 = acc;
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = acc;
+
+      /* decrement the loop counter */
+      sample--;
+   }
+
+    /*  The first stage goes from the input buffer to the output buffer. */
+    /*  Subsequent stages occur in-place in the output buffer */
+    pIn = pDst;
+
+    /* Reset to destination pointer */
+    pOut = pDst;
+
+    /*  Store the updated state variables back into the pState array */
+    *pState++ = Xn1;
+    *pState++ = Xn2;
+    *pState++ = Yn1;
+    *pState++ = Yn2;
+
+  } while(--stage);
+}
+
+/**    
+  * @} end of BiquadCascadeDF1 group    
+  */

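The commented-out expressions in the unrolled loop above describe each `mult_32x32_keep32_R` / `multAcc_32x32_keep32_R` step as keeping the upper 32 bits of a 64-bit product. A minimal portable sketch of that arithmetic follows. The function names are hypothetical, and the rounding constant is an assumption based on the `_R` suffix; the commented-out forms in the source truncate instead. The `shift` argument is taken as given, since its definition lies outside this part of the file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mult_32x32_keep32_R:
 * upper 32 bits of the 64-bit product x*y, with rounding (assumed). */
int32_t mul_hi32_round(int32_t x, int32_t y)
{
  return (int32_t) (((int64_t) x * y + 0x80000000LL) >> 32);
}

/* Hypothetical stand-in for multAcc_32x32_keep32_R:
 * acc plus the upper 32 bits of x*y, with rounding (assumed).
 * acc is multiplied by 2^32 rather than left-shifted, because a left
 * shift of a negative signed value is undefined in C. */
int32_t mac_hi32_round(int32_t acc, int32_t x, int32_t y)
{
  return (int32_t) (((int64_t) acc * 4294967296LL
                     + (int64_t) x * y + 0x80000000LL) >> 32);
}

/* One biquad output sample:
 * acc = b0*x[n] + b1*x[n-1] + b2*x[n-2] + a1*y[n-1] + a2*y[n-2]
 * c[] holds {b0, b1, b2, a1, a2}; the result is shifted left by `shift`. */
int32_t biquad_q31_step(const int32_t c[5], int32_t xn, int32_t xn1,
                        int32_t xn2, int32_t yn1, int32_t yn2, unsigned shift)
{
  int32_t acc;

  assert(shift < 32u);
  acc = mul_hi32_round(c[0], xn);
  acc = mac_hi32_round(acc, c[1], xn1);
  acc = mac_hi32_round(acc, c[2], xn2);
  acc = mac_hi32_round(acc, c[3], yn1);
  acc = mac_hi32_round(acc, c[4], yn2);
  /* Shift through uint32_t to avoid undefined behaviour on negative values. */
  return (int32_t) ((uint32_t) acc << shift);
}
```

Because only the upper 32 bits of each product are kept, this style trades precision for speed compared with a full 64-bit accumulator; intermediate overflow is not detected.
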
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,109 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_biquad_cascade_df1_init_f32.c    
+*    
+* Description:  floating-point Biquad cascade DirectFormI(DF1) filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1    
+ * @{    
+ */
+
+/**    
+ * @details    
+ * @brief  Initialization function for the floating-point Biquad cascade filter.    
+ * @param[in,out] *S           points to an instance of the floating-point Biquad cascade structure.    
+ * @param[in]     numStages    number of 2nd order stages in the filter.    
+ * @param[in]     *pCoeffs     points to the filter coefficients array.    
+ * @param[in]     *pState      points to the state array.    
+ * @return        none    
+ *    
+ *    
+ * <b>Coefficient and State Ordering:</b>    
+ *    
+ * \par    
+ * The coefficients are stored in the array <code>pCoeffs</code> in the following order:    
+ * <pre>    
+ *     {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}    
+ * </pre>    
+ *    
+ * \par    
+ * where <code>b1x</code> and <code>a1x</code> are the coefficients for the first stage,    
+ * <code>b2x</code> and <code>a2x</code> are the coefficients for the second stage,    
+ * and so on.  The <code>pCoeffs</code> array contains a total of <code>5*numStages</code> values.    
+ *    
+ * \par    
+ * <code>pState</code> points to the state array.
+ * Each Biquad stage has 4 state variables <code>x[n-1], x[n-2], y[n-1],</code> and <code>y[n-2]</code>.    
+ * The state variables are arranged in the <code>pState</code> array as:    
+ * <pre>    
+ *     {x[n-1], x[n-2], y[n-1], y[n-2]}    
+ * </pre>    
+ * The 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on.    
+ * The state array has a total length of <code>4*numStages</code> values.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ *    
+ */
+
+void arm_biquad_cascade_df1_init_f32(
+  arm_biquad_casd_df1_inst_f32 * S,
+  uint8_t numStages,
+  float32_t * pCoeffs,
+  float32_t * pState)
+{
+  /* Assign filter stages */
+  S->numStages = numStages;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear state buffer and size is always 4 * numStages */
+  memset(pState, 0, (4u * (uint32_t) numStages) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+}
+
+/**    
+ * @} end of BiquadCascadeDF1 group    
+ */

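The coefficient and state ordering documented above can be exercised with a plain-C reference that does not depend on `arm_math.h`. This is a sketch under stated assumptions, not the library implementation: the type and function names (`biquad_df1_ref`, `biquad_df1_ref_init`, `biquad_df1_ref_run`) are hypothetical, and the feedback terms are added, following the `acc = ... + a1 * y[n-1] + a2 * y[n-2]` comments in the processing functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference instance mirroring the documented layout. */
typedef struct {
  unsigned num_stages;  /* number of 2nd order stages           */
  const float *coeffs;  /* {b0, b1, b2, a1, a2} per stage       */
  float *state;         /* {x[n-1], x[n-2], y[n-1], y[n-2]} per stage */
} biquad_df1_ref;

void biquad_df1_ref_init(biquad_df1_ref *f, unsigned num_stages,
                         const float *coeffs, float *state)
{
  assert(f != NULL && coeffs != NULL && state != NULL);
  f->num_stages = num_stages;
  f->coeffs = coeffs;
  /* The state buffer always holds 4 values per stage and starts cleared. */
  memset(state, 0, 4u * num_stages * sizeof(float));
  f->state = state;
}

void biquad_df1_ref_run(const biquad_df1_ref *f, const float *src,
                        float *dst, size_t n)
{
  const float *in = src;
  unsigned s;
  size_t i;

  for (s = 0; s < f->num_stages; s++) {
    const float *c = &f->coeffs[5u * s];
    float *st = &f->state[4u * s];

    for (i = 0; i < n; i++) {
      float x = in[i];
      float y = c[0] * x + c[1] * st[0] + c[2] * st[1]
              + c[3] * st[2] + c[4] * st[3];
      st[1] = st[0];  /* x[n-2] = x[n-1] */
      st[0] = x;      /* x[n-1] = x[n]   */
      st[3] = st[2];  /* y[n-2] = y[n-1] */
      st[2] = y;      /* y[n-1] = y[n]   */
      dst[i] = y;
    }
    /* Later stages run in place on the output buffer. */
    in = dst;
  }
}
```

A single stage with coefficients `{0.5, 0.5, 0, 0, 0}` acts as a two-tap average, which makes the layout easy to check by hand.
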
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,111 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_biquad_cascade_df1_init_q15.c    
+*    
+* Description:  Q15 Biquad cascade DirectFormI(DF1) filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1    
+ * @{    
+ */
+
+/**    
+ * @details    
+ * @brief  Initialization function for the Q15 Biquad cascade filter.
+ * @param[in,out] *S           points to an instance of the Q15 Biquad cascade structure.    
+ * @param[in]     numStages    number of 2nd order stages in the filter.    
+ * @param[in]     *pCoeffs     points to the filter coefficients.    
+ * @param[in]     *pState      points to the state buffer.    
+ * @param[in]     postShift    Shift to be applied to the accumulator result. Varies according to the coefficients format    
+ * @return        none    
+ *    
+ * <b>Coefficient and State Ordering:</b>    
+ *    
+ * \par    
+ * The coefficients are stored in the array <code>pCoeffs</code> in the following order:    
+ * <pre>    
+ *     {b10, 0, b11, b12, a11, a12, b20, 0, b21, b22, a21, a22, ...}    
+ * </pre>    
+ * where <code>b1x</code> and <code>a1x</code> are the coefficients for the first stage,    
+ * <code>b2x</code> and <code>a2x</code> are the coefficients for the second stage,    
+ * and so on.  The <code>pCoeffs</code> array contains a total of <code>6*numStages</code> values.    
+ * The zero coefficient between <code>b0</code> and <code>b1</code> facilitates use of 16-bit SIMD instructions on the Cortex-M4.
+ *    
+ * \par    
+ * The state variables are stored in the array <code>pState</code>.    
+ * Each Biquad stage has 4 state variables <code>x[n-1], x[n-2], y[n-1],</code> and <code>y[n-2]</code>.    
+ * The state variables are arranged in the <code>pState</code> array as:    
+ * <pre>    
+ *     {x[n-1], x[n-2], y[n-1], y[n-2]}    
+ * </pre>    
+ * The 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on.    
+ * The state array has a total length of <code>4*numStages</code> values.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ */
+
+void arm_biquad_cascade_df1_init_q15(
+  arm_biquad_casd_df1_inst_q15 * S,
+  uint8_t numStages,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  int8_t postShift)
+{
+  /* Assign filter stages */
+  S->numStages = numStages;
+
+  /* Assign postShift to be applied to the output */
+  S->postShift = postShift;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear state buffer and size is always 4 * numStages */
+  memset(pState, 0, (4u * (uint32_t) numStages) * sizeof(q15_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+}
+
+/**    
+ * @} end of BiquadCascadeDF1 group    
+ */

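The Q15 layout above differs from the floating-point and Q31 layouts by one padding zero per stage. A small sketch of a packing helper follows, assuming the coefficients start out as five values per stage in `{b0, b1, b2, a1, a2}` order. The name `pack_q15_biquad_coeffs` is hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand 5 coefficients per stage
 * {b0, b1, b2, a1, a2} into the 6-value Q15 layout
 * {b0, 0, b1, b2, a1, a2} described above. */
void pack_q15_biquad_coeffs(const int16_t *in5, int16_t *out6,
                            unsigned num_stages)
{
  unsigned s;

  assert(in5 != NULL && out6 != NULL);
  for (s = 0; s < num_stages; s++) {
    const int16_t *src = &in5[5u * s];
    int16_t *dst = &out6[6u * s];

    dst[0] = src[0];  /* b0                                 */
    dst[1] = 0;       /* padding so b0 pairs with a zero    */
    dst[2] = src[1];  /* b1                                 */
    dst[3] = src[2];  /* b2                                 */
    dst[4] = src[3];  /* a1                                 */
    dst[5] = src[4];  /* a2                                 */
  }
}
```

The output buffer must hold `6 * num_stages` values, matching the total stated in the comment block above.
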
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,111 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df1_init_q31.c    
+*    
+* Description:	Q31 Biquad cascade DirectFormI(DF1) filter initialization function.    
+*    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1    
+ * @{    
+ */
+
+/**    
+ * @details    
+ * @brief  Initialization function for the Q31 Biquad cascade filter.
+ * @param[in,out] *S           points to an instance of the Q31 Biquad cascade structure.    
+ * @param[in]     numStages    number of 2nd order stages in the filter.    
+ * @param[in]     *pCoeffs     points to the filter coefficients buffer.    
+ * @param[in]     *pState      points to the state buffer.    
+ * @param[in]     postShift    Shift to be applied after the accumulator.  Varies according to the coefficients format    
+ * @return        none    
+ *    
+ * <b>Coefficient and State Ordering:</b>    
+ *    
+ * \par    
+ * The coefficients are stored in the array <code>pCoeffs</code> in the following order:    
+ * <pre>    
+ *     {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}    
+ * </pre>    
+ * where <code>b1x</code> and <code>a1x</code> are the coefficients for the first stage,    
+ * <code>b2x</code> and <code>a2x</code> are the coefficients for the second stage,    
+ * and so on.  The <code>pCoeffs</code> array contains a total of <code>5*numStages</code> values.    
+ *    
+ * \par    
+ * <code>pState</code> points to the state variables array.
+ * Each Biquad stage has 4 state variables <code>x[n-1], x[n-2], y[n-1],</code> and <code>y[n-2]</code>.    
+ * The state variables are arranged in the <code>pState</code> array as:    
+ * <pre>    
+ *     {x[n-1], x[n-2], y[n-1], y[n-2]}    
+ * </pre>    
+ * The 4 state variables for stage 1 are first, then the 4 state variables for stage 2, and so on.    
+ * The state array has a total length of <code>4*numStages</code> values.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ */
+
+void arm_biquad_cascade_df1_init_q31(
+  arm_biquad_casd_df1_inst_q31 * S,
+  uint8_t numStages,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  int8_t postShift)
+{
+  /* Assign filter stages */
+  S->numStages = numStages;
+
+  /* Assign postShift to be applied to the output */
+  S->postShift = postShift;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear state buffer and size is always 4 * numStages */
+  memset(pState, 0, (4u * (uint32_t) numStages) * sizeof(q31_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+}
+
+/**    
+ * @} end of BiquadCascadeDF1 group    
+ */

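The `postShift` parameter is described above only as varying with the coefficient format. One common reading, taken here as an assumption, is that coefficients whose magnitude reaches 1.0 or more are scaled down by `2^postShift` before conversion to Q31, and the filter shifts the accumulator back up by the same amount. The sketch below converts one coefficient on that basis; `coeff_to_q31` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a real-valued coefficient to Q31 after
 * scaling it down by 2^post_shift (assumed meaning of postShift).
 * Values that do not fit are saturated. */
int32_t coeff_to_q31(double coeff, int post_shift)
{
  double scaled;

  assert(post_shift >= 0 && post_shift <= 31);
  /* coeff * 2^31 / 2^post_shift */
  scaled = coeff * (double) (1LL << (31 - post_shift));

  if (scaled >= 2147483647.0) {
    return INT32_MAX;
  }
  if (scaled <= -2147483648.0) {
    return INT32_MIN;
  }
  /* Round half away from zero without needing the math library. */
  return (int32_t) (scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

With `post_shift = 1`, a coefficient of 1.5 becomes `0x60000000`, whereas with `post_shift = 0` any coefficient of 1.0 or more saturates.
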
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,411 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df1_q15.c    
+*    
+* Description:	Processing function for the    
+*				Q15 Biquad cascade DirectFormI(DF1) filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q15 Biquad cascade filter.    
+ * @param[in]  *S points to an instance of the Q15 Biquad cascade structure.    
+ * @param[in]  *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the location where the output result is written.    
+ * @param[in]  blockSize number of samples to process per call.    
+ * @return none.    
+ *    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result.    
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.    
+ * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.    
+ * The accumulator is then shifted right by <code>15 - postShift</code> bits, discarding the low-order bits, to bring the result to 1.15 format.
+ * Finally, the result is saturated to 1.15 format.    
+ *    
+ * \par    
+ * Refer to the function <code>arm_biquad_cascade_df1_fast_q15()</code> for a faster but less precise implementation of this filter for Cortex-M3 and Cortex-M4.    
+ */
+
+void arm_biquad_cascade_df1_q15(
+  const arm_biquad_casd_df1_inst_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q15_t *pIn = pSrc;                             /*  Source pointer                               */
+  q15_t *pOut = pDst;                            /*  Destination pointer                          */
+  q31_t in;                                      /*  Temporary variable to hold input value       */
+  q31_t out;                                     /*  Temporary variable to hold output value      */
+  q31_t b0;                                      /*  Temporary variable to hold b0 value          */
+  q31_t b1, a1;                                  /*  Filter coefficients                          */
+  q31_t state_in, state_out;                     /*  Filter state variables                       */
+  q31_t acc_l, acc_h;
+  q63_t acc;                                     /*  Accumulator                                  */
+  int32_t lShift = (15 - (int32_t) S->postShift);       /*  Post shift                                   */
+  q15_t *pState = S->pState;                     /*  State pointer                                */
+  q15_t *pCoeffs = S->pCoeffs;                   /*  Coefficient pointer                          */
+  uint32_t sample, stage = (uint32_t) S->numStages;     /*  Stage loop counter                           */
+  int32_t uShift = (32 - lShift);
+
+  do
+  {
+    /* Read the b0 and 0 coefficients using SIMD  */
+    b0 = *__SIMD32(pCoeffs)++;
+
+    /* Read the b1 and b2 coefficients using SIMD */
+    b1 = *__SIMD32(pCoeffs)++;
+
+    /* Read the a1 and a2 coefficients using SIMD */
+    a1 = *__SIMD32(pCoeffs)++;
+
+    /* Read the input state values from the state buffer:  x[n-1], x[n-2] */
+    state_in = *__SIMD32(pState)++;
+
+    /* Read the output state values from the state buffer:  y[n-1], y[n-2] */
+    state_out = *__SIMD32(pState)--;
+
+    /* Apply loop unrolling and compute 2 output values simultaneously. */
+    /*      The variable acc holds the output values that are being computed:
+     *    
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]    
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]    
+     */
+    sample = blockSize >> 1u;
+
+    /* First part of the processing with loop unrolling.  Compute 2 outputs at a time.
+     ** The code after the loop computes the one remaining sample, if any. */
+    while(sample > 0u)
+    {
+
+      /* Read the input */
+      in = *__SIMD32(pIn)++;
+
+      /* out =  b0 * x[n] + 0 * 0 */
+      out = __SMUAD(b0, in);
+
+      /* acc =  b1 * x[n-1] +  b2 * x[n-2] + out */
+      acc = __SMLALD(b1, state_in, out);
+      /* acc +=  a1 * y[n-1] +  a2 * y[n-2] */
+      acc = __SMLALD(a1, state_out, acc);
+
+      /* The accumulator is shifted right by lShift = (15 - postShift) bits to 1.15 format, and then saturation is applied */
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      out = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      out = __SSAT(out, 16);
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      /* x[n-N], x[n-N-1] are packed together to make state_in of type q31 */
+      /* y[n-N], y[n-N-1] are packed together to make state_out of type q31 */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      state_in = __PKHBT(in, state_in, 16);
+      state_out = __PKHBT(out, state_out, 16);
+
+#else
+
+      state_in = __PKHBT(state_in >> 16, (in >> 16), 16);
+      state_out = __PKHBT(state_out >> 16, (out), 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* out =  b0 * x[n] + 0 * 0 */
+      out = __SMUADX(b0, in);
+      /* acc =  b1 * x[n-1] +  b2 * x[n-2] + out */
+      acc = __SMLALD(b1, state_in, out);
+      /* acc +=  a1 * y[n-1] + a2 * y[n-2] */
+      acc = __SMLALD(a1, state_out, acc);
+
+      /* The accumulator is shifted right by lShift = (15 - postShift) bits to 1.15 format, and then saturation is applied */
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      out = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      out = __SSAT(out, 16);
+
+      /* Store the output in the destination buffer. */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pOut)++ = __PKHBT(state_out, out, 16);
+
+#else
+
+      *__SIMD32(pOut)++ = __PKHBT(out, state_out >> 16, 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      /* x[n-N], x[n-N-1] are packed together to make state_in of type q31 */
+      /* y[n-N], y[n-N-1] are packed together to make state_out of type q31 */
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      state_in = __PKHBT(in >> 16, state_in, 16);
+      state_out = __PKHBT(out, state_out, 16);
+
+#else
+
+      state_in = __PKHBT(state_in >> 16, in, 16);
+      state_out = __PKHBT(state_out >> 16, out, 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+
+      /* Decrement the loop counter */
+      sample--;
+
+    }
+
+    /* If the blockSize is not a multiple of 2, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+
+    if((blockSize & 0x1u) != 0u)
+    {
+      /* Read the input */
+      in = *pIn++;
+
+      /* out =  b0 * x[n] + 0 * 0 */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      out = __SMUAD(b0, in);
+
+#else
+
+      out = __SMUADX(b0, in);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* acc =  b1 * x[n-1] + b2 * x[n-2] + out */
+      acc = __SMLALD(b1, state_in, out);
+      /* acc +=  a1 * y[n-1] + a2 * y[n-2] */
+      acc = __SMLALD(a1, state_out, acc);
+
+      /* The accumulator is shifted right by lShift = (15 - postShift) bits to 1.15 format, and then saturation is applied */
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      out = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      out = __SSAT(out, 16);
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = (q15_t) out;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc   */
+      /* x[n-N], x[n-N-1] are packed together to make state_in of type q31 */
+      /* y[n-N], y[n-N-1] are packed together to make state_out of type q31 */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      state_in = __PKHBT(in, state_in, 16);
+      state_out = __PKHBT(out, state_out, 16);
+
+#else
+
+      state_in = __PKHBT(state_in >> 16, in, 16);
+      state_out = __PKHBT(state_out >> 16, out, 16);
+
+#endif /*   #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+    }
+
+    /*  The first stage goes from the input buffer to the output buffer.  */
+    /*  Subsequent stages occur in-place in the output buffer  */
+    pIn = pDst;
+
+    /* Reset the output pointer */
+    pOut = pDst;
+
+    /*  Store the updated state variables back into the state array */
+    *__SIMD32(pState)++ = state_in;
+    *__SIMD32(pState)++ = state_out;
+
+
+    /* Decrement the loop counter */
+    stage--;
+
+  } while(stage > 0u);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q15_t *pIn = pSrc;                             /*  Source pointer                               */
+  q15_t *pOut = pDst;                            /*  Destination pointer                          */
+  q15_t b0, b1, b2, a1, a2;                      /*  Filter coefficients           */
+  q15_t Xn1, Xn2, Yn1, Yn2;                      /*  Filter state variables        */
+  q15_t Xn;                                      /*  temporary input               */
+  q63_t acc;                                     /*  Accumulator                                  */
+  int32_t shift = (15 - (int32_t) S->postShift); /*  Post shift                                   */
+  q15_t *pState = S->pState;                     /*  State pointer                                */
+  q15_t *pCoeffs = S->pCoeffs;                   /*  Coefficient pointer                          */
+  uint32_t sample, stage = (uint32_t) S->numStages;     /*  Stage loop counter                           */
+
+  do
+  {
+    /* Reading the coefficients */
+    b0 = *pCoeffs++;
+    pCoeffs++;  // skip the 0 coefficient
+    b1 = *pCoeffs++;
+    b2 = *pCoeffs++;
+    a1 = *pCoeffs++;
+    a2 = *pCoeffs++;
+
+    /* Reading the state values */
+    Xn1 = pState[0];
+    Xn2 = pState[1];
+    Yn1 = pState[2];
+    Yn2 = pState[3];
+
+    /*      The variable acc holds the output value that is computed:
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]         
+     */
+
+    sample = blockSize;
+
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b0 * x[n] */
+      acc = (q31_t) b0 *Xn;
+
+      /* acc +=  b1 * x[n-1] */
+      acc += (q31_t) b1 *Xn1;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q31_t) b2 *Xn2;
+      /* acc +=  a1 * y[n-1] */
+      acc += (q31_t) a1 *Yn1;
+      /* acc +=  a2 * y[n-2] */
+      acc += (q31_t) a2 *Yn2;
+
+      /* The result is converted to 1.15 and saturated  */
+      acc = __SSAT((acc >> shift), 16);
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc    */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+      Yn2 = Yn1;
+      Yn1 = (q15_t) acc;
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = (q15_t) acc;
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /*  The first stage goes from the input buffer to the output buffer. */
+    /*  Subsequent stages occur in-place in the output buffer */
+    pIn = pDst;
+
+    /* Reset to destination pointer */
+    pOut = pDst;
+
+    /*  Store the updated state variables back into the pState array */
+    *pState++ = Xn1;
+    *pState++ = Xn2;
+    *pState++ = Yn1;
+    *pState++ = Yn2;
+
+  } while(--stage);
+
+#endif /*     #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+
+/**    
+ * @} end of BiquadCascadeDF1 group    
+ */
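
The Cortex-M0 path of the Q15 filter above narrows the shifted accumulator with `__SSAT(..., 16)`. As a rough illustration of what that saturation does, here is a portable sketch that can be run on a host. The name `ssat` is hypothetical and is not part of CMSIS; the sketch only assumes that saturating to `bits` bits means clamping to the signed range of that width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for a signed saturate to `bits` bits.
 * Clamps x to [-2^(bits-1), 2^(bits-1) - 1]. */
static int32_t ssat(int32_t x, uint32_t bits)
{
    assert(bits >= 1u && bits <= 32u);

    /* Largest and smallest values representable in `bits` signed bits. */
    const int32_t max = (int32_t)((UINT32_C(1) << (bits - 1u)) - 1u);
    const int32_t min = -max - 1;

    if (x > max) {
        return max;
    }
    if (x < min) {
        return min;
    }
    return x;
}
```

With `bits` set to 16, values above 32767 or below -32768 are clamped, so the later `(q15_t)` cast in the filter never discards significant bits.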

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df1_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,405 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df1_q31.c    
+*    
+* Description:	Processing function for the    
+*				Q31 Biquad cascade filter    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF1    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q31 Biquad cascade filter.    
+ * @param[in]  *S         points to an instance of the Q31 Biquad cascade structure.    
+ * @param[in]  *pSrc      points to the block of input data.    
+ * @param[out] *pDst      points to the block of output data.    
+ * @param[in]  blockSize  number of samples to process per call.    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows it wraps around rather than clipping.
+ * In order to avoid overflows completely the input signal must be scaled down by 2 bits and lie in the range [-0.25 +0.25).    
+ * After all 5 multiply-accumulates are performed, the 2.62 accumulator is shifted by <code>postShift</code> bits and the result truncated to    
+ * 1.31 format by discarding the low 32 bits.    
+ *    
+ * \par    
+ * Refer to the function <code>arm_biquad_cascade_df1_fast_q31()</code> for a faster but less precise implementation of this filter for Cortex-M3 and Cortex-M4.    
+ */
+
+void arm_biquad_cascade_df1_q31(
+  const arm_biquad_casd_df1_inst_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q63_t acc;                                     /*  accumulator                   */
+  uint32_t uShift = ((uint32_t) S->postShift + 1u);
+  uint32_t lShift = 32u - uShift;                /*  Shift to be applied to the output */
+  q31_t *pIn = pSrc;                             /*  input pointer initialization  */
+  q31_t *pOut = pDst;                            /*  output pointer initialization */
+  q31_t *pState = S->pState;                     /*  pState pointer initialization */
+  q31_t *pCoeffs = S->pCoeffs;                   /*  coeff pointer initialization  */
+  q31_t Xn1, Xn2, Yn1, Yn2;                      /*  Filter state variables        */
+  q31_t b0, b1, b2, a1, a2;                      /*  Filter coefficients           */
+  q31_t Xn;                                      /*  temporary input               */
+  uint32_t sample, stage = S->numStages;         /*  loop counters                     */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  q31_t acc_l, acc_h;                            /*  temporary output variables    */
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  do
+  {
+    /* Reading the coefficients */
+    b0 = *pCoeffs++;
+    b1 = *pCoeffs++;
+    b2 = *pCoeffs++;
+    a1 = *pCoeffs++;
+    a2 = *pCoeffs++;
+
+    /* Reading the state values */
+    Xn1 = pState[0];
+    Xn2 = pState[1];
+    Yn1 = pState[2];
+    Yn2 = pState[3];
+
+    /* Apply loop unrolling and compute 4 output values simultaneously. */
+    /*      The variable acc holds the output value that is being computed:
+     *    
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]    
+     */
+
+    sample = blockSize >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) b0 *Xn;
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) b1 *Xn1;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) b2 *Xn2;
+      /* acc +=  a1 * y[n-1] */
+      acc += (q63_t) a1 *Yn1;
+      /* acc +=  a2 * y[n-2] */
+      acc += (q63_t) a2 *Yn2;
+
+      /* The result is converted to 1.31 , Yn2 variable is reused */
+
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      Yn2 = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = Yn2;
+
+      /* Read the second input */
+      Xn2 = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) b0 *Xn2;
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) b1 *Xn;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) b2 *Xn1;
+      /* acc +=  a1 * y[n-1] */
+      acc += (q63_t) a1 *Yn2;
+      /* acc +=  a2 * y[n-2] */
+      acc += (q63_t) a2 *Yn1;
+
+
+      /* The result is converted to 1.31, Yn1 variable is reused  */
+
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      Yn1 = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = Yn1;
+
+      /* Read the third input  */
+      Xn1 = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) b0 *Xn1;
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) b1 *Xn2;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) b2 *Xn;
+      /* acc +=  a1 * y[n-1] */
+      acc += (q63_t) a1 *Yn1;
+      /* acc +=  a2 * y[n-2] */
+      acc += (q63_t) a2 *Yn2;
+
+      /* The result is converted to 1.31, Yn2 variable is reused  */
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      Yn2 = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = Yn2;
+
+      /* Read the fourth input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) b0 *Xn;
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) b1 *Xn1;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) b2 *Xn2;
+      /* acc +=  a1 * y[n-1] */
+      acc += (q63_t) a1 *Yn2;
+      /* acc +=  a2 * y[n-2] */
+      acc += (q63_t) a2 *Yn1;
+
+      /* The result is converted to 1.31, Yn1 variable is reused  */
+      /* Calc lower part of acc */
+      acc_l = acc & 0xffffffff;
+
+      /* Calc upper part of acc */
+      acc_h = (acc >> 32) & 0xffffffff;
+
+      /* Apply shift for lower part of acc and upper part of acc */
+      Yn1 = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+      /* After the fourth output the state must be updated for the next iteration. */
+      /* Yn1 and Yn2 already hold the two most recent outputs (y[n-1] and y[n-2]), */
+      /* so only the input history has to be shifted:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = Yn1;
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    sample = (blockSize & 0x3u);
+
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) b0 *Xn;
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) b1 *Xn1;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) b2 *Xn2;
+      /* acc +=  a1 * y[n-1] */
+      acc += (q63_t) a1 *Yn1;
+      /* acc +=  a2 * y[n-2] */
+      acc += (q63_t) a2 *Yn2;
+
+      /* The result is converted to 1.31  */
+      acc = acc >> lShift;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc    */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+      Yn2 = Yn1;
+      Yn1 = (q31_t) acc;
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = (q31_t) acc;
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /*  The first stage goes from the input buffer to the output buffer. */
+    /*  Subsequent stages occur in-place in the output buffer */
+    pIn = pDst;
+
+    /* Reset to destination pointer */
+    pOut = pDst;
+
+    /*  Store the updated state variables back into the pState array */
+    *pState++ = Xn1;
+    *pState++ = Xn2;
+    *pState++ = Yn1;
+    *pState++ = Yn2;
+
+  } while(--stage);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  do
+  {
+    /* Reading the coefficients */
+    b0 = *pCoeffs++;
+    b1 = *pCoeffs++;
+    b2 = *pCoeffs++;
+    a1 = *pCoeffs++;
+    a2 = *pCoeffs++;
+
+    /* Reading the state values */
+    Xn1 = pState[0];
+    Xn2 = pState[1];
+    Yn1 = pState[2];
+    Yn2 = pState[3];
+
+    /*      The variable acc holds the output value that is computed:
+     *    acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2]         
+     */
+
+    sample = blockSize;
+
+    while(sample > 0u)
+    {
+      /* Read the input */
+      Xn = *pIn++;
+
+      /* acc =  b0 * x[n] + b1 * x[n-1] + b2 * x[n-2] + a1 * y[n-1] + a2 * y[n-2] */
+      /* acc =  b0 * x[n] */
+      acc = (q63_t) b0 *Xn;
+
+      /* acc +=  b1 * x[n-1] */
+      acc += (q63_t) b1 *Xn1;
+      /* acc +=  b[2] * x[n-2] */
+      acc += (q63_t) b2 *Xn2;
+      /* acc +=  a1 * y[n-1] */
+      acc += (q63_t) a1 *Yn1;
+      /* acc +=  a2 * y[n-2] */
+      acc += (q63_t) a2 *Yn2;
+
+      /* The result is converted to 1.31  */
+      acc = acc >> lShift;
+
+      /* Every time after the output is computed state should be updated. */
+      /* The states should be updated as:  */
+      /* Xn2 = Xn1    */
+      /* Xn1 = Xn     */
+      /* Yn2 = Yn1    */
+      /* Yn1 = acc    */
+      Xn2 = Xn1;
+      Xn1 = Xn;
+      Yn2 = Yn1;
+      Yn1 = (q31_t) acc;
+
+      /* Store the output in the destination buffer. */
+      *pOut++ = (q31_t) acc;
+
+      /* decrement the loop counter */
+      sample--;
+    }
+
+    /*  The first stage goes from the input buffer to the output buffer. */
+    /*  Subsequent stages occur in-place in the output buffer */
+    pIn = pDst;
+
+    /* Reset to destination pointer */
+    pOut = pDst;
+
+    /*  Store the updated state variables back into the pState array */
+    *pState++ = Xn1;
+    *pState++ = Xn2;
+    *pState++ = Yn1;
+    *pState++ = Yn2;
+
+  } while(--stage);
+
+#endif /*  #ifndef ARM_MATH_CM0_FAMILY */
+}
+
+
+
+
+/**    
+  * @} end of BiquadCascadeDF1 group    
+  */
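
The scaling behaviour documented for `arm_biquad_cascade_df1_q31()` above (64-bit accumulator in 2.62 format, shift by `postShift`, truncation to 1.31) can be sketched for a single stage and a single sample as follows. This is a host-side illustration under stated assumptions, not the library code: the names `biquad_q31_state` and `biquad_q31_step` are hypothetical, plain `<stdint.h>` types stand in for `q31_t` and `q63_t`, and the compiler is assumed to implement right shifts of negative values as arithmetic shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-stage state: x[n-1], x[n-2], y[n-1], y[n-2]. */
typedef struct {
    int32_t xn1, xn2, yn1, yn2;
} biquad_q31_state;

/* Computes one output sample of one Q31 Direct Form I stage.
 * coeffs = {b0, b1, b2, a1, a2} in 1.31 format, pre-scaled by 2^-post_shift. */
static int32_t biquad_q31_step(biquad_q31_state *st, const int32_t coeffs[5],
                               uint32_t post_shift, int32_t xn)
{
    assert(post_shift <= 30u);

    /* Same quantity as lShift in the source: 32 - (postShift + 1). */
    const uint32_t l_shift = 32u - (post_shift + 1u);

    /* Five multiply-accumulates in a 64-bit (2.62) accumulator. */
    int64_t acc = (int64_t)coeffs[0] * xn;
    acc += (int64_t)coeffs[1] * st->xn1;
    acc += (int64_t)coeffs[2] * st->xn2;
    acc += (int64_t)coeffs[3] * st->yn1;
    acc += (int64_t)coeffs[4] * st->yn2;

    /* 2.62 -> 1.31: shift and keep the low 32 bits (truncation, no saturation). */
    const int32_t yn = (int32_t)(acc >> l_shift);

    /* Shift the input and output history by one sample. */
    st->xn2 = st->xn1;
    st->xn1 = xn;
    st->yn2 = st->yn1;
    st->yn1 = yn;
    return yn;
}
```

For example, with `b0 = a1 = 0x40000000` (0.5), `post_shift = 0`, and an input of 0.5 followed by 0, the outputs are 0.25 and then 0.125.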

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,603 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df2T_f32.c    
+*    
+* Description:  Processing function for the floating-point transposed    
+*               direct form II Biquad cascade filter.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**       
+* @ingroup groupFilters       
+*/
+
+/**       
+* @defgroup BiquadCascadeDF2T Biquad Cascade IIR Filters Using a Direct Form II Transposed Structure       
+*       
+* This set of functions implements arbitrary order recursive (IIR) filters using a transposed direct form II structure.       
+* The filters are implemented as a cascade of second order Biquad sections.       
+* These functions provide a slight memory savings as compared to the direct form I Biquad filter functions.      
+* Only floating-point data is supported.       
+*       
+* These functions operate on blocks of input and output data and each call to the function
+* processes <code>blockSize</code> samples through the filter.       
+* <code>pSrc</code> points to the array of input data and       
+* <code>pDst</code> points to the array of output data.       
+* Both arrays contain <code>blockSize</code> values.       
+*       
+* \par Algorithm       
+* Each Biquad stage implements a second order filter using the difference equation:       
+* <pre>       
+*    y[n] = b0 * x[n] + d1       
+*    d1 = b1 * x[n] + a1 * y[n] + d2       
+*    d2 = b2 * x[n] + a2 * y[n]       
+* </pre>       
+* where d1 and d2 represent the two state values.       
+*       
+* \par       
+* A Biquad filter using a transposed Direct Form II structure is shown below.       
+* \image html BiquadDF2Transposed.gif "Single transposed Direct Form II Biquad"       
+* Coefficients <code>b0, b1, and b2 </code> multiply the input signal <code>x[n]</code> and are referred to as the feedforward coefficients.       
+* Coefficients <code>a1</code> and <code>a2</code> multiply the output signal <code>y[n]</code> and are referred to as the feedback coefficients.       
+* Pay careful attention to the sign of the feedback coefficients.       
+* Some design tools flip the sign of the feedback coefficients:       
+* <pre>       
+*    y[n] = b0 * x[n] + d1;       
+*    d1 = b1 * x[n] - a1 * y[n] + d2;       
+*    d2 = b2 * x[n] - a2 * y[n];       
+* </pre>       
+* In this case the feedback coefficients <code>a1</code> and <code>a2</code> must be negated when used with the CMSIS DSP Library.       
+*       
+* \par       
+* Higher order filters are realized as a cascade of second order sections.       
+* <code>numStages</code> refers to the number of second order stages used.       
+* For example, an 8th order filter would be realized with <code>numStages=4</code> second order stages.       
+* A 9th order filter would be realized with <code>numStages=5</code> second order stages with the       
+* coefficients for one of the stages configured as a first order filter (<code>b2=0</code> and <code>a2=0</code>).       
+*       
+* \par       
+* <code>pState</code> points to the state variable array.       
+* Each Biquad stage has 2 state variables <code>d1</code> and <code>d2</code>.       
+* The state variables are arranged in the <code>pState</code> array as:       
+* <pre>       
+*     {d11, d12, d21, d22, ...}       
+* </pre>       
+* where <code>d1x</code> refers to the state variables for the first Biquad and       
+* <code>d2x</code> refers to the state variables for the second Biquad.       
+* The state array has a total length of <code>2*numStages</code> values.       
+* The state variables are updated after each block of data is processed; the coefficients are untouched.       
+*       
+* \par       
+* The CMSIS library contains Biquad filters in both Direct Form I and transposed Direct Form II.    
+* The advantage of the Direct Form I structure is that it is numerically more robust for fixed-point data types.    
+* That is why the Direct Form I structure supports Q15 and Q31 data types.    
+* The transposed Direct Form II structure, on the other hand, requires a wide dynamic range for the state variables <code>d1</code> and <code>d2</code>.    
+* Because of this, the CMSIS library only has a floating-point version of the Direct Form II Biquad.    
+* The advantage of the Direct Form II Biquad is that it requires half the number of state variables, 2 rather than 4, per Biquad stage.    
+*       
+* \par Instance Structure       
+* The coefficients and state variables for a filter are stored together in an instance data structure.       
+* A separate instance structure must be defined for each filter.       
+* Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.       
+*       
+* \par Init Functions       
+* There is also an associated initialization function.      
+* The initialization function performs the following operations:
+* - Sets the values of the internal structure fields.
+* - Zeros out the values in the state buffer.
+* To do this manually without calling the init function, assign the following subfields of the instance structure:
+* numStages, pCoeffs, pState. Also set all of the values in pState to zero.
+*       
+* \par       
+* Use of the initialization function is optional.       
+* However, if the initialization function is used, then the instance structure cannot be placed into a const data section.       
+* To place an instance structure into a const data section, the instance structure must be manually initialized.       
+* Set the values in the state buffer to zeros before static initialization.       
+* For example, to statically initialize the instance structure use       
+* <pre>       
+*     arm_biquad_cascade_df2T_instance_f32 S1 = {numStages, pState, pCoeffs};       
+* </pre>       
+* where <code>numStages</code> is the number of Biquad stages in the filter, <code>pState</code> is the address of the state buffer,
+* and <code>pCoeffs</code> is the address of the coefficient buffer.
+*       
+*/
+
+/**       
+* @addtogroup BiquadCascadeDF2T       
+* @{       
+*/
+
+/**      
+* @brief Processing function for the floating-point transposed direct form II Biquad cascade filter.      
+* @param[in]  *S        points to an instance of the filter data structure.      
+* @param[in]  *pSrc     points to the block of input data.      
+* @param[out] *pDst     points to the block of output data      
+* @param[in]  blockSize number of samples to process.      
+* @return none.      
+*/
+
+
+LOW_OPTIMIZATION_ENTER
+void arm_biquad_cascade_df2T_f32(
+const arm_biquad_cascade_df2T_instance_f32 * S,
+float32_t * pSrc,
+float32_t * pDst,
+uint32_t blockSize)
+{
+
+   float32_t *pIn = pSrc;                         /*  source pointer            */
+   float32_t *pOut = pDst;                        /*  destination pointer       */
+   float32_t *pState = S->pState;                 /*  State pointer             */
+   float32_t *pCoeffs = S->pCoeffs;               /*  coefficient pointer       */
+   float32_t acc1;                                /*  accumulator               */
+   float32_t b0, b1, b2, a1, a2;                  /*  Filter coefficients       */
+   float32_t Xn1;                                 /*  temporary input           */
+   float32_t d1, d2;                              /*  state variables           */
+   uint32_t sample, stage = S->numStages;         /*  loop counters             */
+
+#if defined(ARM_MATH_CM7)
+	
+   float32_t Xn2, Xn3, Xn4, Xn5, Xn6, Xn7, Xn8;   /*  Input State variables     */
+   float32_t Xn9, Xn10, Xn11, Xn12, Xn13, Xn14, Xn15, Xn16;
+   float32_t acc2, acc3, acc4, acc5, acc6, acc7;  /*  Simulates the accumulator */
+   float32_t acc8, acc9, acc10, acc11, acc12, acc13, acc14, acc15, acc16;
+
+   do
+   {
+      /* Reading the coefficients */ 
+      b0 = pCoeffs[0]; 
+      b1 = pCoeffs[1]; 
+      b2 = pCoeffs[2]; 
+      a1 = pCoeffs[3]; 
+      /* Apply loop unrolling and compute 16 output values simultaneously. */ 
+      sample = blockSize >> 4u; 
+      a2 = pCoeffs[4]; 
+
+      /*Reading the state values */ 
+      d1 = pState[0]; 
+      d2 = pState[1]; 
+
+      pCoeffs += 5u;
+
+      
+      /* First part of the processing with loop unrolling.  Compute 16 outputs at a time.       
+       ** a second loop below computes the remaining 1 to 15 samples. */
+      while(sample > 0u) {
+
+         /* y[n] = b0 * x[n] + d1 */
+         /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+         /* d2 = b2 * x[n] + a2 * y[n] */
+
+         /* Read the first 2 inputs. 2 cycles */
+         Xn1  = pIn[0 ];
+         Xn2  = pIn[1 ];
+
+         /* Sample 1. 5 cycles */
+         Xn3  = pIn[2 ];
+         acc1 = b0 * Xn1 + d1;
+         
+         Xn4  = pIn[3 ];
+         d1 = b1 * Xn1 + d2;
+         
+         Xn5  = pIn[4 ];
+         d2 = b2 * Xn1;
+         
+         Xn6  = pIn[5 ];
+         d1 += a1 * acc1;
+         
+         Xn7  = pIn[6 ];
+         d2 += a2 * acc1;
+
+         /* Sample 2. 5 cycles */
+         Xn8  = pIn[7 ];
+         acc2 = b0 * Xn2 + d1;
+         
+         Xn9  = pIn[8 ];
+         d1 = b1 * Xn2 + d2;
+         
+         Xn10 = pIn[9 ];
+         d2 = b2 * Xn2;
+         
+         Xn11 = pIn[10];
+         d1 += a1 * acc2;
+         
+         Xn12 = pIn[11];
+         d2 += a2 * acc2;
+
+         /* Sample 3. 5 cycles */
+         Xn13 = pIn[12];
+         acc3 = b0 * Xn3 + d1;
+         
+         Xn14 = pIn[13];
+         d1 = b1 * Xn3 + d2;
+         
+         Xn15 = pIn[14];
+         d2 = b2 * Xn3;
+         
+         Xn16 = pIn[15];
+         d1 += a1 * acc3;
+         
+         pIn += 16;
+         d2 += a2 * acc3;
+
+         /* Sample 4. 5 cycles */
+         acc4 = b0 * Xn4 + d1;
+         d1 = b1 * Xn4 + d2;
+         d2 = b2 * Xn4;
+         d1 += a1 * acc4;
+         d2 += a2 * acc4;
+
+         /* Sample 5. 5 cycles */
+         acc5 = b0 * Xn5 + d1;
+         d1 = b1 * Xn5 + d2;
+         d2 = b2 * Xn5;
+         d1 += a1 * acc5;
+         d2 += a2 * acc5;
+
+         /* Sample 6. 5 cycles */
+         acc6 = b0 * Xn6 + d1;
+         d1 = b1 * Xn6 + d2;
+         d2 = b2 * Xn6;
+         d1 += a1 * acc6;
+         d2 += a2 * acc6;
+
+         /* Sample 7. 5 cycles */
+         acc7 = b0 * Xn7 + d1;
+         d1 = b1 * Xn7 + d2;
+         d2 = b2 * Xn7;
+         d1 += a1 * acc7;
+         d2 += a2 * acc7;
+
+         /* Sample 8. 5 cycles */
+         acc8 = b0 * Xn8 + d1;
+         d1 = b1 * Xn8 + d2;
+         d2 = b2 * Xn8;
+         d1 += a1 * acc8;
+         d2 += a2 * acc8;
+
+         /* Sample 9. 5 cycles */
+         acc9 = b0 * Xn9 + d1;
+         d1 = b1 * Xn9 + d2;
+         d2 = b2 * Xn9;
+         d1 += a1 * acc9;
+         d2 += a2 * acc9;
+
+         /* Sample 10. 5 cycles */
+         acc10 = b0 * Xn10 + d1;
+         d1 = b1 * Xn10 + d2;
+         d2 = b2 * Xn10;
+         d1 += a1 * acc10;
+         d2 += a2 * acc10;
+
+         /* Sample 11. 5 cycles */
+         acc11 = b0 * Xn11 + d1;
+         d1 = b1 * Xn11 + d2;
+         d2 = b2 * Xn11;
+         d1 += a1 * acc11;
+         d2 += a2 * acc11;
+
+         /* Sample 12. 5 cycles */
+         acc12 = b0 * Xn12 + d1;
+         d1 = b1 * Xn12 + d2;
+         d2 = b2 * Xn12;
+         d1 += a1 * acc12;
+         d2 += a2 * acc12;
+
+         /* Sample 13. 5 cycles */
+         acc13 = b0 * Xn13 + d1;         
+         d1 = b1 * Xn13 + d2;         
+         d2 = b2 * Xn13;
+         
+         pOut[0 ] = acc1 ;
+         d1 += a1 * acc13;
+         
+         pOut[1 ] = acc2 ;	
+         d2 += a2 * acc13;
+
+         /* Sample 14. 5 cycles */
+         pOut[2 ] = acc3 ;	
+         acc14 = b0 * Xn14 + d1;
+             
+         pOut[3 ] = acc4 ;
+         d1 = b1 * Xn14 + d2;
+          
+         pOut[4 ] = acc5 ; 
+         d2 = b2 * Xn14;
+         
+         pOut[5 ] = acc6 ;	  
+         d1 += a1 * acc14;
+         
+         pOut[6 ] = acc7 ;	
+         d2 += a2 * acc14;
+
+         /* Sample 15. 5 cycles */
+         pOut[7 ] = acc8 ;
+         pOut[8 ] = acc9 ;  
+         acc15 = b0 * Xn15 + d1;
+              
+         pOut[9 ] = acc10;	
+         d1 = b1 * Xn15 + d2;
+         
+         pOut[10] = acc11;	
+         d2 = b2 * Xn15;
+         
+         pOut[11] = acc12;
+         d1 += a1 * acc15;
+         
+         pOut[12] = acc13;
+         d2 += a2 * acc15;
+
+         /* Sample 16. 5 cycles */
+         pOut[13] = acc14;	
+         acc16 = b0 * Xn16 + d1;
+         
+         pOut[14] = acc15;	
+         d1 = b1 * Xn16 + d2;
+         
+         pOut[15] = acc16;
+         d2 = b2 * Xn16;
+         
+         sample--;	 
+         d1 += a1 * acc16;
+         
+         pOut += 16;
+         d2 += a2 * acc16;
+      }
+
+      sample = blockSize & 0xFu;
+      while(sample > 0u) {
+         Xn1 = *pIn;         
+         acc1 = b0 * Xn1 + d1;
+         
+         pIn++;
+         d1 = b1 * Xn1 + d2;
+         
+         *pOut = acc1; 
+         d2 = b2 * Xn1;
+         
+         pOut++;
+         d1 += a1 * acc1;
+         
+         sample--;	
+         d2 += a2 * acc1; 
+      }
+
+      /* Store the updated state variables back into the state array */ 
+      pState[0] = d1; 
+      /* The current stage input is given as the output to the next stage */ 
+      pIn = pDst; 
+      
+      pState[1] = d2; 
+      /* decrement the loop counter */ 
+      stage--; 
+
+      pState += 2u;
+
+      /*Reset the output working pointer */ 
+      pOut = pDst; 
+
+   } while(stage > 0u);
+	
+#elif defined(ARM_MATH_CM0_FAMILY)
+
+   /* Run the below code for Cortex-M0 */
+
+   do
+   {
+      /* Reading the coefficients */
+      b0 = *pCoeffs++;
+      b1 = *pCoeffs++;
+      b2 = *pCoeffs++;
+      a1 = *pCoeffs++;
+      a2 = *pCoeffs++;
+
+      /*Reading the state values */
+      d1 = pState[0];
+      d2 = pState[1];
+
+
+      sample = blockSize;
+
+      while(sample > 0u)
+      {
+         /* Read the input */
+         Xn1 = *pIn++;
+
+         /* y[n] = b0 * x[n] + d1 */
+         acc1 = (b0 * Xn1) + d1;
+
+         /* Store the result in the accumulator in the destination buffer. */
+         *pOut++ = acc1;
+
+         /* Every time after the output is computed state should be updated. */
+         /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+         d1 = ((b1 * Xn1) + (a1 * acc1)) + d2;
+
+         /* d2 = b2 * x[n] + a2 * y[n] */
+         d2 = (b2 * Xn1) + (a2 * acc1);
+
+         /* decrement the loop counter */
+         sample--;
+      }
+
+      /* Store the updated state variables back into the state array */
+      *pState++ = d1;
+      *pState++ = d2;
+
+      /* The current stage input is given as the output to the next stage */
+      pIn = pDst;
+
+      /*Reset the output working pointer */
+      pOut = pDst;
+
+      /* decrement the loop counter */
+      stage--;
+
+   } while(stage > 0u);
+	 
+#else
+
+   float32_t Xn2, Xn3, Xn4;                  	  /*  Input State variables     */
+   float32_t acc2, acc3, acc4;              		  /*  accumulator               */
+
+
+   float32_t p0, p1, p2, p3, p4, A1;
+
+   /* Run the below code for Cortex-M4 and Cortex-M3 */
+   do
+   {
+      /* Reading the coefficients */     
+      b0 = *pCoeffs++;
+      b1 = *pCoeffs++;
+      b2 = *pCoeffs++;
+      a1 = *pCoeffs++;
+      a2 = *pCoeffs++;
+      
+
+      /*Reading the state values */
+      d1 = pState[0];
+      d2 = pState[1];
+
+      /* Apply loop unrolling and compute 4 output values simultaneously. */
+      sample = blockSize >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
+       ** a second loop below computes the remaining 1 to 3 samples. */
+      while(sample > 0u) {
+
+         /* y[n] = b0 * x[n] + d1 */
+         /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+         /* d2 = b2 * x[n] + a2 * y[n] */
+
+         /* Read the four inputs */
+         Xn1 = pIn[0];
+         Xn2 = pIn[1];
+         Xn3 = pIn[2];
+         Xn4 = pIn[3];
+         pIn += 4;     
+
+         p0 = b0 * Xn1; 
+         p1 = b1 * Xn1;
+         acc1 = p0 + d1;
+         p0 = b0 * Xn2; 
+         p3 = a1 * acc1;
+         p2 = b2 * Xn1;
+         A1 = p1 + p3;
+         p4 = a2 * acc1;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+
+         p1 = b1 * Xn2;
+         acc2 = p0 + d1;
+         p0 = b0 * Xn3;	 
+         p3 = a1 * acc2; 
+         p2 = b2 * Xn2;                                 
+         A1 = p1 + p3;
+         p4 = a2 * acc2;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+
+         p1 = b1 * Xn3;
+         acc3 = p0 + d1;
+         p0 = b0 * Xn4;	
+         p3 = a1 * acc3;
+         p2 = b2 * Xn3;
+         A1 = p1 + p3;
+         p4 = a2 * acc3;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+
+         acc4 = p0 + d1;
+         p1 = b1 * Xn4;
+         p3 = a1 * acc4;
+         p2 = b2 * Xn4;
+         A1 = p1 + p3;
+         p4 = a2 * acc4;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+
+         pOut[0] = acc1;	
+         pOut[1] = acc2;	
+         pOut[2] = acc3;	
+         pOut[3] = acc4;
+		 pOut += 4;
+				 
+         sample--;	       
+      }
+
+      sample = blockSize & 0x3u;
+      while(sample > 0u) {
+         Xn1 = *pIn++;
+
+         p0 = b0 * Xn1; 
+         p1 = b1 * Xn1;
+         acc1 = p0 + d1;
+         p3 = a1 * acc1;
+         p2 = b2 * Xn1;
+         A1 = p1 + p3;
+         p4 = a2 * acc1;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+	
+         *pOut++ = acc1;
+         
+         sample--;	       
+      }
+
+      /* Store the updated state variables back into the state array */
+      *pState++ = d1;
+      *pState++ = d2;
+
+      /* The current stage input is given as the output to the next stage */
+      pIn = pDst;
+
+      /*Reset the output working pointer */
+      pOut = pDst;
+
+      /* decrement the loop counter */
+      stage--;
+
+   } while(stage > 0u);
+
+#endif 
+
+}
+LOW_OPTIMIZATION_EXIT
+
+/**       
+   * @} end of BiquadCascadeDF2T group       
+   */
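The Cortex-M4/Cortex-M3 branch above splits each block into `blockSize >> 2` groups of four samples, followed by a tail of `blockSize & 0x3` samples. The sketch below shows only that split, applied to a plain gain instead of the biquad recurrence. `scale_unrolled4` is a hypothetical name used for illustration and is not part of CMSIS-DSP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CMSIS-DSP): multiplies n samples by gain,
 * four per iteration, then finishes the 0 to 3 leftover samples one by one. */
static void scale_unrolled4(const float *src, float *dst, size_t n, float gain)
{
    assert(src != NULL && dst != NULL);

    size_t groups = n >> 2;      /* number of full groups of four samples */
    size_t tail   = n & 0x3u;    /* samples left after the full groups    */

    while (groups > 0u) {
        dst[0] = gain * src[0];
        dst[1] = gain * src[1];
        dst[2] = gain * src[2];
        dst[3] = gain * src[3];
        src += 4;
        dst += 4;
        groups--;
    }

    while (tail > 0u) {
        *dst++ = gain * (*src++);
        tail--;
    }
}
```

With `n = 6` the first loop runs once and the second loop handles the remaining two samples. When `n` is a multiple of four the tail loop does not run at all.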

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_f64.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_f64.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,603 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_df2T_f64.c    
+*    
+* Description:  Processing function for the floating-point transposed    
+*               direct form II Biquad cascade filter.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**       
+* @ingroup groupFilters       
+*/
+
+/**       
+* @defgroup BiquadCascadeDF2T Biquad Cascade IIR Filters Using a Direct Form II Transposed Structure       
+*       
+* This set of functions implements arbitrary order recursive (IIR) filters using a transposed direct form II structure.       
+* The filters are implemented as a cascade of second order Biquad sections.       
+* These functions provide a slight memory savings as compared to the direct form I Biquad filter functions.      
+* Only floating-point data is supported.       
+*       
+* These functions operate on blocks of input and output data, and each call to a function
+* processes <code>blockSize</code> samples through the filter.       
+* <code>pSrc</code> points to the array of input data and       
+* <code>pDst</code> points to the array of output data.       
+* Both arrays contain <code>blockSize</code> values.       
+*       
+* \par Algorithm       
+* Each Biquad stage implements a second order filter using the difference equation:       
+* <pre>       
+*    y[n] = b0 * x[n] + d1       
+*    d1 = b1 * x[n] + a1 * y[n] + d2       
+*    d2 = b2 * x[n] + a2 * y[n]       
+* </pre>       
+* where d1 and d2 represent the two state values.       
+*       
+* \par       
+* A Biquad filter using a transposed Direct Form II structure is shown below.       
+* \image html BiquadDF2Transposed.gif "Single transposed Direct Form II Biquad"       
+* Coefficients <code>b0, b1, and b2 </code> multiply the input signal <code>x[n]</code> and are referred to as the feedforward coefficients.       
+* Coefficients <code>a1</code> and <code>a2</code> multiply the output signal <code>y[n]</code> and are referred to as the feedback coefficients.       
+* Pay careful attention to the sign of the feedback coefficients.       
+* Some design tools flip the sign of the feedback coefficients:       
+* <pre>       
+*    y[n] = b0 * x[n] + d1;       
+*    d1 = b1 * x[n] - a1 * y[n] + d2;       
+*    d2 = b2 * x[n] - a2 * y[n];       
+* </pre>       
+* In this case the feedback coefficients <code>a1</code> and <code>a2</code> must be negated when used with the CMSIS DSP Library.       
+*       
+* \par       
+* Higher order filters are realized as a cascade of second order sections.       
+* <code>numStages</code> refers to the number of second order stages used.       
+* For example, an 8th order filter would be realized with <code>numStages=4</code> second order stages.       
+* A 9th order filter would be realized with <code>numStages=5</code> second order stages with the       
+* coefficients for one of the stages configured as a first order filter (<code>b2=0</code> and <code>a2=0</code>).       
+*       
+* \par       
+* <code>pState</code> points to the state variable array.       
+* Each Biquad stage has 2 state variables <code>d1</code> and <code>d2</code>.       
+* The state variables are arranged in the <code>pState</code> array as:       
+* <pre>       
+*     {d11, d12, d21, d22, ...}       
+* </pre>       
+* where <code>d1x</code> refers to the state variables for the first Biquad and       
+* <code>d2x</code> refers to the state variables for the second Biquad.       
+* The state array has a total length of <code>2*numStages</code> values.       
+* The state variables are updated after each block of data is processed; the coefficients are untouched.       
+*       
+* \par       
+* The CMSIS library contains Biquad filters in both Direct Form I and transposed Direct Form II.    
+* The advantage of the Direct Form I structure is that it is numerically more robust for fixed-point data types.    
+* That is why the Direct Form I structure supports Q15 and Q31 data types.    
+* The transposed Direct Form II structure, on the other hand, requires a wide dynamic range for the state variables <code>d1</code> and <code>d2</code>.    
+* Because of this, the CMSIS library only has a floating-point version of the Direct Form II Biquad.    
+* The advantage of the Direct Form II Biquad is that it requires half the number of state variables, 2 rather than 4, per Biquad stage.    
+*       
+* \par Instance Structure       
+* The coefficients and state variables for a filter are stored together in an instance data structure.       
+* A separate instance structure must be defined for each filter.       
+* Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.       
+*       
+* \par Init Functions       
+* There is also an associated initialization function.
+* The initialization function performs the following operations:
+* - Sets the values of the internal structure fields.
+* - Zeros out the values in the state buffer.
+* To do this manually without calling the init function, assign the following subfields of the instance structure:
+* numStages, pCoeffs, pState. Also set all of the values in pState to zero.
+*       
+* \par       
+* Use of the initialization function is optional.       
+* However, if the initialization function is used, then the instance structure cannot be placed into a const data section.       
+* To place an instance structure into a const data section, the instance structure must be manually initialized.       
+* Set the values in the state buffer to zeros before static initialization.       
+* For example, to statically initialize the instance structure use       
+* <pre>       
+*     arm_biquad_cascade_df2T_instance_f64 S1 = {numStages, pState, pCoeffs};       
+* </pre>       
+* where <code>numStages</code> is the number of Biquad stages in the filter, <code>pState</code> is the address of the state buffer,
+* and <code>pCoeffs</code> is the address of the coefficient buffer.
+*       
+*/
+
+/**       
+* @addtogroup BiquadCascadeDF2T       
+* @{       
+*/
+
+/**      
+* @brief Processing function for the double-precision floating-point transposed direct form II Biquad cascade filter.
+* @param[in]  *S        points to an instance of the filter data structure.      
+* @param[in]  *pSrc     points to the block of input data.      
+* @param[out] *pDst     points to the block of output data      
+* @param[in]  blockSize number of samples to process.      
+* @return none.      
+*/
+
+
+LOW_OPTIMIZATION_ENTER
+void arm_biquad_cascade_df2T_f64(
+const arm_biquad_cascade_df2T_instance_f64 * S,
+float64_t * pSrc,
+float64_t * pDst,
+uint32_t blockSize)
+{
+
+   float64_t *pIn = pSrc;                         /*  source pointer            */
+   float64_t *pOut = pDst;                        /*  destination pointer       */
+   float64_t *pState = S->pState;                 /*  State pointer             */
+   float64_t *pCoeffs = S->pCoeffs;               /*  coefficient pointer       */
+   float64_t acc1;                                /*  accumulator               */
+   float64_t b0, b1, b2, a1, a2;                  /*  Filter coefficients       */
+   float64_t Xn1;                                 /*  temporary input           */
+   float64_t d1, d2;                              /*  state variables           */
+   uint32_t sample, stage = S->numStages;         /*  loop counters             */
+
+#if defined(ARM_MATH_CM7)
+	
+   float64_t Xn2, Xn3, Xn4, Xn5, Xn6, Xn7, Xn8;   /*  Input State variables     */
+   float64_t Xn9, Xn10, Xn11, Xn12, Xn13, Xn14, Xn15, Xn16;
+   float64_t acc2, acc3, acc4, acc5, acc6, acc7;  /*  Simulates the accumulator */
+   float64_t acc8, acc9, acc10, acc11, acc12, acc13, acc14, acc15, acc16;
+
+   do
+   {
+      /* Reading the coefficients */ 
+      b0 = pCoeffs[0]; 
+      b1 = pCoeffs[1]; 
+      b2 = pCoeffs[2]; 
+      a1 = pCoeffs[3]; 
+      /* Apply loop unrolling and compute 16 output values simultaneously. */ 
+      sample = blockSize >> 4u; 
+      a2 = pCoeffs[4]; 
+
+      /*Reading the state values */ 
+      d1 = pState[0]; 
+      d2 = pState[1]; 
+
+      pCoeffs += 5u;
+
+      
+      /* First part of the processing with loop unrolling.  Compute 16 outputs at a time.
+       ** A second loop below computes the remaining 0 to 15 samples. */
+      while(sample > 0u) {
+
+         /* y[n] = b0 * x[n] + d1 */
+         /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+         /* d2 = b2 * x[n] + a2 * y[n] */
+
+         /* Read the first 2 inputs. 2 cycles */
+         Xn1  = pIn[0 ];
+         Xn2  = pIn[1 ];
+
+         /* Sample 1. 5 cycles */
+         Xn3  = pIn[2 ];
+         acc1 = b0 * Xn1 + d1;
+         
+         Xn4  = pIn[3 ];
+         d1 = b1 * Xn1 + d2;
+         
+         Xn5  = pIn[4 ];
+         d2 = b2 * Xn1;
+         
+         Xn6  = pIn[5 ];
+         d1 += a1 * acc1;
+         
+         Xn7  = pIn[6 ];
+         d2 += a2 * acc1;
+
+         /* Sample 2. 5 cycles */
+         Xn8  = pIn[7 ];
+         acc2 = b0 * Xn2 + d1;
+         
+         Xn9  = pIn[8 ];
+         d1 = b1 * Xn2 + d2;
+         
+         Xn10 = pIn[9 ];
+         d2 = b2 * Xn2;
+         
+         Xn11 = pIn[10];
+         d1 += a1 * acc2;
+         
+         Xn12 = pIn[11];
+         d2 += a2 * acc2;
+
+         /* Sample 3. 5 cycles */
+         Xn13 = pIn[12];
+         acc3 = b0 * Xn3 + d1;
+         
+         Xn14 = pIn[13];
+         d1 = b1 * Xn3 + d2;
+         
+         Xn15 = pIn[14];
+         d2 = b2 * Xn3;
+         
+         Xn16 = pIn[15];
+         d1 += a1 * acc3;
+         
+         pIn += 16;
+         d2 += a2 * acc3;
+
+         /* Sample 4. 5 cycles */
+         acc4 = b0 * Xn4 + d1;
+         d1 = b1 * Xn4 + d2;
+         d2 = b2 * Xn4;
+         d1 += a1 * acc4;
+         d2 += a2 * acc4;
+
+         /* Sample 5. 5 cycles */
+         acc5 = b0 * Xn5 + d1;
+         d1 = b1 * Xn5 + d2;
+         d2 = b2 * Xn5;
+         d1 += a1 * acc5;
+         d2 += a2 * acc5;
+
+         /* Sample 6. 5 cycles */
+         acc6 = b0 * Xn6 + d1;
+         d1 = b1 * Xn6 + d2;
+         d2 = b2 * Xn6;
+         d1 += a1 * acc6;
+         d2 += a2 * acc6;
+
+         /* Sample 7. 5 cycles */
+         acc7 = b0 * Xn7 + d1;
+         d1 = b1 * Xn7 + d2;
+         d2 = b2 * Xn7;
+         d1 += a1 * acc7;
+         d2 += a2 * acc7;
+
+         /* Sample 8. 5 cycles */
+         acc8 = b0 * Xn8 + d1;
+         d1 = b1 * Xn8 + d2;
+         d2 = b2 * Xn8;
+         d1 += a1 * acc8;
+         d2 += a2 * acc8;
+
+         /* Sample 9. 5 cycles */
+         acc9 = b0 * Xn9 + d1;
+         d1 = b1 * Xn9 + d2;
+         d2 = b2 * Xn9;
+         d1 += a1 * acc9;
+         d2 += a2 * acc9;
+
+         /* Sample 10. 5 cycles */
+         acc10 = b0 * Xn10 + d1;
+         d1 = b1 * Xn10 + d2;
+         d2 = b2 * Xn10;
+         d1 += a1 * acc10;
+         d2 += a2 * acc10;
+
+         /* Sample 11. 5 cycles */
+         acc11 = b0 * Xn11 + d1;
+         d1 = b1 * Xn11 + d2;
+         d2 = b2 * Xn11;
+         d1 += a1 * acc11;
+         d2 += a2 * acc11;
+
+         /* Sample 12. 5 cycles */
+         acc12 = b0 * Xn12 + d1;
+         d1 = b1 * Xn12 + d2;
+         d2 = b2 * Xn12;
+         d1 += a1 * acc12;
+         d2 += a2 * acc12;
+
+         /* Sample 13. 5 cycles */
+         acc13 = b0 * Xn13 + d1;         
+         d1 = b1 * Xn13 + d2;         
+         d2 = b2 * Xn13;
+         
+         pOut[0 ] = acc1 ;
+         d1 += a1 * acc13;
+         
+         pOut[1 ] = acc2 ;	
+         d2 += a2 * acc13;
+
+         /* Sample 14. 5 cycles */
+         pOut[2 ] = acc3 ;	
+         acc14 = b0 * Xn14 + d1;
+             
+         pOut[3 ] = acc4 ;
+         d1 = b1 * Xn14 + d2;
+          
+         pOut[4 ] = acc5 ; 
+         d2 = b2 * Xn14;
+         
+         pOut[5 ] = acc6 ;	  
+         d1 += a1 * acc14;
+         
+         pOut[6 ] = acc7 ;	
+         d2 += a2 * acc14;
+
+         /* Sample 15. 5 cycles */
+         pOut[7 ] = acc8 ;
+         pOut[8 ] = acc9 ;  
+         acc15 = b0 * Xn15 + d1;
+              
+         pOut[9 ] = acc10;	
+         d1 = b1 * Xn15 + d2;
+         
+         pOut[10] = acc11;	
+         d2 = b2 * Xn15;
+         
+         pOut[11] = acc12;
+         d1 += a1 * acc15;
+         
+         pOut[12] = acc13;
+         d2 += a2 * acc15;
+
+         /* Sample 16. 5 cycles */
+         pOut[13] = acc14;	
+         acc16 = b0 * Xn16 + d1;
+         
+         pOut[14] = acc15;	
+         d1 = b1 * Xn16 + d2;
+         
+         pOut[15] = acc16;
+         d2 = b2 * Xn16;
+         
+         sample--;	 
+         d1 += a1 * acc16;
+         
+         pOut += 16;
+         d2 += a2 * acc16;
+      }
+
+      sample = blockSize & 0xFu;
+      while(sample > 0u) {
+         Xn1 = *pIn;         
+         acc1 = b0 * Xn1 + d1;
+         
+         pIn++;
+         d1 = b1 * Xn1 + d2;
+         
+         *pOut = acc1; 
+         d2 = b2 * Xn1;
+         
+         pOut++;
+         d1 += a1 * acc1;
+         
+         sample--;	
+         d2 += a2 * acc1; 
+      }
+
+      /* Store the updated state variables back into the state array */ 
+      pState[0] = d1; 
+      /* The current stage input is given as the output to the next stage */ 
+      pIn = pDst; 
+      
+      pState[1] = d2; 
+      /* decrement the loop counter */ 
+      stage--; 
+
+      pState += 2u;
+
+      /*Reset the output working pointer */ 
+      pOut = pDst; 
+
+   } while(stage > 0u);
+	
+#elif defined(ARM_MATH_CM0_FAMILY)
+
+   /* Run the below code for Cortex-M0 */
+
+   do
+   {
+      /* Reading the coefficients */
+      b0 = *pCoeffs++;
+      b1 = *pCoeffs++;
+      b2 = *pCoeffs++;
+      a1 = *pCoeffs++;
+      a2 = *pCoeffs++;
+
+      /*Reading the state values */
+      d1 = pState[0];
+      d2 = pState[1];
+
+
+      sample = blockSize;
+
+      while(sample > 0u)
+      {
+         /* Read the input */
+         Xn1 = *pIn++;
+
+         /* y[n] = b0 * x[n] + d1 */
+         acc1 = (b0 * Xn1) + d1;
+
+         /* Store the result in the accumulator in the destination buffer. */
+         *pOut++ = acc1;
+
+         /* Every time after the output is computed state should be updated. */
+         /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+         d1 = ((b1 * Xn1) + (a1 * acc1)) + d2;
+
+         /* d2 = b2 * x[n] + a2 * y[n] */
+         d2 = (b2 * Xn1) + (a2 * acc1);
+
+         /* decrement the loop counter */
+         sample--;
+      }
+
+      /* Store the updated state variables back into the state array */
+      *pState++ = d1;
+      *pState++ = d2;
+
+      /* The current stage input is given as the output to the next stage */
+      pIn = pDst;
+
+      /*Reset the output working pointer */
+      pOut = pDst;
+
+      /* decrement the loop counter */
+      stage--;
+
+   } while(stage > 0u);
+	 
+#else
+
+   float64_t Xn2, Xn3, Xn4;                  	  /*  Input State variables     */
+   float64_t acc2, acc3, acc4;              		  /*  accumulator               */
+
+
+   float64_t p0, p1, p2, p3, p4, A1;
+
+   /* Run the below code for Cortex-M4 and Cortex-M3 */
+   do
+   {
+      /* Reading the coefficients */     
+      b0 = *pCoeffs++;
+      b1 = *pCoeffs++;
+      b2 = *pCoeffs++;
+      a1 = *pCoeffs++;
+      a2 = *pCoeffs++;
+      
+
+      /*Reading the state values */
+      d1 = pState[0];
+      d2 = pState[1];
+
+      /* Apply loop unrolling and compute 4 output values simultaneously. */
+      sample = blockSize >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
+       ** A second loop below computes the remaining 0 to 3 samples. */
+      while(sample > 0u) {
+
+         /* y[n] = b0 * x[n] + d1 */
+         /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+         /* d2 = b2 * x[n] + a2 * y[n] */
+
+         /* Read the four inputs */
+         Xn1 = pIn[0];
+         Xn2 = pIn[1];
+         Xn3 = pIn[2];
+         Xn4 = pIn[3];
+         pIn += 4;     
+
+         p0 = b0 * Xn1; 
+         p1 = b1 * Xn1;
+         acc1 = p0 + d1;
+         p0 = b0 * Xn2; 
+         p3 = a1 * acc1;
+         p2 = b2 * Xn1;
+         A1 = p1 + p3;
+         p4 = a2 * acc1;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+
+         p1 = b1 * Xn2;
+         acc2 = p0 + d1;
+         p0 = b0 * Xn3;	 
+         p3 = a1 * acc2; 
+         p2 = b2 * Xn2;                                 
+         A1 = p1 + p3;
+         p4 = a2 * acc2;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+
+         p1 = b1 * Xn3;
+         acc3 = p0 + d1;
+         p0 = b0 * Xn4;	
+         p3 = a1 * acc3;
+         p2 = b2 * Xn3;
+         A1 = p1 + p3;
+         p4 = a2 * acc3;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+
+         acc4 = p0 + d1;
+         p1 = b1 * Xn4;
+         p3 = a1 * acc4;
+         p2 = b2 * Xn4;
+         A1 = p1 + p3;
+         p4 = a2 * acc4;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+
+         pOut[0] = acc1;	
+         pOut[1] = acc2;	
+         pOut[2] = acc3;	
+         pOut[3] = acc4;
+				 pOut += 4;
+				 
+         sample--;	       
+      }
+
+      sample = blockSize & 0x3u;
+      while(sample > 0u) {
+         Xn1 = *pIn++;
+
+         p0 = b0 * Xn1; 
+         p1 = b1 * Xn1;
+         acc1 = p0 + d1;
+         p3 = a1 * acc1;
+         p2 = b2 * Xn1;
+         A1 = p1 + p3;
+         p4 = a2 * acc1;
+         d1 = A1 + d2;
+         d2 = p2 + p4;
+	
+         *pOut++ = acc1;
+         
+         sample--;	       
+      }
+
+      /* Store the updated state variables back into the state array */
+      *pState++ = d1;
+      *pState++ = d2;
+
+      /* The current stage input is given as the output to the next stage */
+      pIn = pDst;
+
+      /*Reset the output working pointer */
+      pOut = pDst;
+
+      /* decrement the loop counter */
+      stage--;
+
+   } while(stage > 0u);
+
+#endif 
+
+}
+LOW_OPTIMIZATION_EXIT
+
+/**       
+   * @} end of BiquadCascadeDF2T group       
+   */
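The difference equations documented in this file can be exercised with one section and a short impulse. The following is a minimal sketch in plain C, assuming double-precision data and the same sign convention as the documentation (feedback terms are added). The `biquad_df2t` type and `biquad_df2t_run` function are hypothetical names and are not part of CMSIS-DSP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-section filter (not the CMSIS-DSP instance structure). */
typedef struct {
    double b0, b1, b2;   /* feedforward coefficients           */
    double a1, a2;       /* feedback coefficients (added)      */
    double d1, d2;       /* the two state variables per stage  */
} biquad_df2t;

/* Runs n samples through one transposed Direct Form II section. */
static void biquad_df2t_run(biquad_df2t *s, const double *x, double *y, size_t n)
{
    assert(s != NULL && x != NULL && y != NULL);

    for (size_t i = 0; i < n; i++) {
        double xn = x[i];
        double yn = s->b0 * xn + s->d1;             /* y[n] = b0*x[n] + d1           */
        s->d1 = s->b1 * xn + s->a1 * yn + s->d2;    /* d1 = b1*x[n] + a1*y[n] + d2   */
        s->d2 = s->b2 * xn + s->a2 * yn;            /* d2 = b2*x[n] + a2*y[n]        */
        y[i] = yn;
    }
}
```

With `b0 = 1`, `a1 = 0.5` and all other coefficients zero, a unit impulse produces an output that halves on every sample. This matches a one-pole recursion whose feedback coefficient is added rather than subtracted.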

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,102 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_biquad_cascade_df2T_init_f32.c    
+*    
+* Description:  Initialization function for the floating-point transposed   
+*               direct form II Biquad cascade filter.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF2T    
+ * @{    
+ */
+
+/**   
+ * @brief  Initialization function for the floating-point transposed direct form II Biquad cascade filter.   
+ * @param[in,out] *S           points to an instance of the filter data structure.   
+ * @param[in]     numStages    number of 2nd order stages in the filter.   
+ * @param[in]     *pCoeffs     points to the filter coefficients.   
+ * @param[in]     *pState      points to the state buffer.   
+ * @return        none   
+ *    
+ * <b>Coefficient and State Ordering:</b>    
+ * \par    
+ * The coefficients are stored in the array <code>pCoeffs</code> in the following order:    
+ * <pre>    
+ *     {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}    
+ * </pre>    
+ *    
+ * \par    
+ * where <code>b1x</code> and <code>a1x</code> are the coefficients for the first stage,    
+ * <code>b2x</code> and <code>a2x</code> are the coefficients for the second stage,    
+ * and so on.  The <code>pCoeffs</code> array contains a total of <code>5*numStages</code> values.    
+ *    
+ * \par    
+ * <code>pState</code> points to the state array.
+ * Each Biquad stage has 2 state variables, <code>d1</code> and <code>d2</code>.
+ * The 2 state variables for stage 1 are first, then the 2 state variables for stage 2, and so on.    
+ * The state array has a total length of <code>2*numStages</code> values.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ */
+
+void arm_biquad_cascade_df2T_init_f32(
+  arm_biquad_cascade_df2T_instance_f32 * S,
+  uint8_t numStages,
+  float32_t * pCoeffs,
+  float32_t * pState)
+{
+  /* Assign filter stages */
+  S->numStages = numStages;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always 2 * numStages */
+  memset(pState, 0, (2u * (uint32_t) numStages) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+}
+
+/**    
+ * @} end of BiquadCascadeDF2T group    
+ */
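The coefficient ordering described above, together with the sign note in the BiquadCascadeDF2T group documentation, means that coefficients from a design tool may need repacking before use. The sketch below assumes the tool emits one row of six values `{b0, b1, b2, a0, a1, a2}` per stage, for a denominator of the form `a0 + a1*z^-1 + a2*z^-2`. That layout is an assumption, and `pack_sos_coeffs` is a hypothetical helper that is not part of CMSIS-DSP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts num_stages rows of {b0, b1, b2, a0, a1, a2}
 * into the packed {b0, b1, b2, a1, a2} layout with 5 values per stage.
 * Every value is normalized by a0, and the feedback terms are negated
 * because the filter adds a1*y[n] and a2*y[n] instead of subtracting them. */
static void pack_sos_coeffs(const double sos[][6], size_t num_stages, double *coeffs)
{
    assert(sos != NULL && coeffs != NULL);

    for (size_t s = 0; s < num_stages; s++) {
        double a0 = sos[s][3];
        assert(a0 != 0.0);

        coeffs[5 * s + 0] =  sos[s][0] / a0;   /* b0  */
        coeffs[5 * s + 1] =  sos[s][1] / a0;   /* b1  */
        coeffs[5 * s + 2] =  sos[s][2] / a0;   /* b2  */
        coeffs[5 * s + 3] = -sos[s][4] / a0;   /* -a1 */
        coeffs[5 * s + 4] = -sos[s][5] / a0;   /* -a2 */
    }
}
```

If the design tool already uses the added-feedback convention, the two negations must be dropped. Checking one stage by hand against the difference equation is a cheap way to confirm which convention applies.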

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_init_f64.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_df2T_init_f64.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,102 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_biquad_cascade_df2T_init_f64.c    
+*    
+* Description:  Initialization function for the floating-point transposed   
+*               direct form II Biquad cascade filter.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF2T    
+ * @{    
+ */
+
+/**   
+ * @brief  Initialization function for the double-precision floating-point transposed direct form II Biquad cascade filter.
+ * @param[in,out] *S           points to an instance of the filter data structure.   
+ * @param[in]     numStages    number of 2nd order stages in the filter.   
+ * @param[in]     *pCoeffs     points to the filter coefficients.   
+ * @param[in]     *pState      points to the state buffer.   
+ * @return        none   
+ *    
+ * <b>Coefficient and State Ordering:</b>    
+ * \par    
+ * The coefficients are stored in the array <code>pCoeffs</code> in the following order:    
+ * <pre>    
+ *     {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}    
+ * </pre>    
+ *    
+ * \par    
+ * where <code>b1x</code> and <code>a1x</code> are the coefficients for the first stage,    
+ * <code>b2x</code> and <code>a2x</code> are the coefficients for the second stage,    
+ * and so on.  The <code>pCoeffs</code> array contains a total of <code>5*numStages</code> values.    
+ *    
+ * \par    
+ * <code>pState</code> points to the state array.
+ * Each Biquad stage has 2 state variables, <code>d1</code> and <code>d2</code>.
+ * The 2 state variables for stage 1 are first, then the 2 state variables for stage 2, and so on.    
+ * The state array has a total length of <code>2*numStages</code> values.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ */
+
+void arm_biquad_cascade_df2T_init_f64(
+  arm_biquad_cascade_df2T_instance_f64 * S,
+  uint8_t numStages,
+  float64_t * pCoeffs,
+  float64_t * pState)
+{
+  /* Assign filter stages */
+  S->numStages = numStages;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always 2 * numStages */
+  memset(pState, 0, (2u * (uint32_t) numStages) * sizeof(float64_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+}
+
+/**    
+ * @} end of BiquadCascadeDF2T group    
+ */
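The state and coefficient layouts documented above (2 state values and 5 coefficients per stage) can be seen in a compact reference loop. This is a sketch in plain C without loop unrolling, assuming double-precision data. It follows the same stage order as the processing function, where the first stage reads the source block and each later stage filters the destination block in place. `cascade_df2t` is a hypothetical name and is not part of CMSIS-DSP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference cascade:
 *   coeffs: {b10, b11, b12, a11, a12, b20, ...}, 5 values per stage
 *   state:  {d11, d12, d21, d22, ...},           2 values per stage */
static void cascade_df2t(size_t num_stages, const double *coeffs, double *state,
                         const double *src, double *dst, size_t block_size)
{
    assert(coeffs != NULL && state != NULL && src != NULL && dst != NULL);

    const double *in = src;

    for (size_t s = 0; s < num_stages; s++) {
        const double *c = &coeffs[5 * s];
        double d1 = state[2 * s];
        double d2 = state[2 * s + 1];

        for (size_t n = 0; n < block_size; n++) {
            double xn = in[n];
            double yn = c[0] * xn + d1;        /* y[n] = b0*x[n] + d1          */
            d1 = c[1] * xn + c[3] * yn + d2;   /* d1 = b1*x[n] + a1*y[n] + d2  */
            d2 = c[2] * xn + c[4] * yn;        /* d2 = b2*x[n] + a2*y[n]       */
            dst[n] = yn;
        }

        /* Store the updated state so the next block continues seamlessly. */
        state[2 * s]     = d1;
        state[2 * s + 1] = d2;

        /* The output of this stage is the input of the next stage. */
        in = dst;
    }
}
```

As in the init function, the state array should be cleared (for example with `memset`) before the first block. It must then be left untouched between blocks, because it carries the filter history from one call to the next.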

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_stereo_df2T_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_stereo_df2T_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,683 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_biquad_cascade_stereo_df2T_f32.c    
+*    
+* Description:  Processing function for the floating-point transposed    
+*               direct form II Biquad cascade filter. 2 channels  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**       
+* @ingroup groupFilters       
+*/
+
+/**       
+* @defgroup BiquadCascadeDF2T Biquad Cascade IIR Filters Using a Direct Form II Transposed Structure       
+*       
+* This set of functions implements arbitrary order recursive (IIR) filters using a transposed direct form II structure.       
+* The filters are implemented as a cascade of second order Biquad sections.       
+* These functions provide a slight memory savings as compared to the direct form I Biquad filter functions.      
+* Only floating-point data is supported.       
+*       
+* These functions operate on blocks of input and output data, and each call
+* processes <code>blockSize</code> samples through the filter.
+* <code>pSrc</code> points to the array of input data and       
+* <code>pDst</code> points to the array of output data.       
+* Both arrays contain <code>blockSize</code> values (<code>2*blockSize</code> interleaved values for the stereo variant).
+*       
+* \par Algorithm       
+* Each Biquad stage implements a second order filter using the difference equation:       
+* <pre>       
+*    y[n] = b0 * x[n] + d1       
+*    d1 = b1 * x[n] + a1 * y[n] + d2       
+*    d2 = b2 * x[n] + a2 * y[n]       
+* </pre>       
+* where d1 and d2 represent the two state values.       
+*       
+* \par       
+* A Biquad filter using a transposed Direct Form II structure is shown below.       
+* \image html BiquadDF2Transposed.gif "Single transposed Direct Form II Biquad"       
+* Coefficients <code>b0</code>, <code>b1</code>, and <code>b2</code> multiply the input signal <code>x[n]</code> and are referred to as the feedforward coefficients.
+* Coefficients <code>a1</code> and <code>a2</code> multiply the output signal <code>y[n]</code> and are referred to as the feedback coefficients.       
+* Pay careful attention to the sign of the feedback coefficients.       
+* Some design tools flip the sign of the feedback coefficients:       
+* <pre>       
+*    y[n] = b0 * x[n] + d1;       
+*    d1 = b1 * x[n] - a1 * y[n] + d2;       
+*    d2 = b2 * x[n] - a2 * y[n];       
+* </pre>       
+* In this case the feedback coefficients <code>a1</code> and <code>a2</code> must be negated when used with the CMSIS DSP Library.       
+*       
+* \par       
+* Higher order filters are realized as a cascade of second order sections.       
+* <code>numStages</code> refers to the number of second order stages used.       
+* For example, an 8th order filter would be realized with <code>numStages=4</code> second order stages.       
+* A 9th order filter would be realized with <code>numStages=5</code> second order stages with the       
+* coefficients for one of the stages configured as a first order filter (<code>b2=0</code> and <code>a2=0</code>).       
+*       
+* \par       
+* <code>pState</code> points to the state variable array.       
+* Each Biquad stage has 2 state variables <code>d1</code> and <code>d2</code>.       
+* The state variables are arranged in the <code>pState</code> array as:       
+* <pre>       
+*     {d11, d12, d21, d22, ...}       
+* </pre>       
+* where <code>d1x</code> refers to the state variables for the first Biquad and       
+* <code>d2x</code> refers to the state variables for the second Biquad.       
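+* The state array has a total length of <code>2*numStages</code> values; the stereo variant keeps <code>d1</code> and <code>d2</code> per channel, so its array holds <code>4*numStages</code> values.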
+* The state variables are updated after each block of data is processed; the coefficients are untouched.       
+*       
+* \par       
+* The CMSIS library contains Biquad filters in both Direct Form I and transposed Direct Form II.    
+* The advantage of the Direct Form I structure is that it is numerically more robust for fixed-point data types.    
+* That is why the Direct Form I structure supports Q15 and Q31 data types.    
+* The transposed Direct Form II structure, on the other hand, requires a wide dynamic range for the state variables <code>d1</code> and <code>d2</code>.    
+* Because of this, the CMSIS library only has a floating-point version of the Direct Form II Biquad.    
+* The advantage of the Direct Form II Biquad is that it requires half the number of state variables, 2 rather than 4, per Biquad stage.    
+*       
+* \par Instance Structure       
+* The coefficients and state variables for a filter are stored together in an instance data structure.       
+* A separate instance structure must be defined for each filter.       
+* Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.       
+*       
+* \par Init Functions       
+* There is also an associated initialization function.      
+* The initialization function performs the following operations:
+* - Sets the values of the internal structure fields.
+* - Zeros out the values in the state buffer.
+* To do this manually without calling the init function, assign the following fields of the instance structure:
+* numStages, pCoeffs, pState. Also set all of the values in pState to zero. 
+*       
+* \par       
+* Use of the initialization function is optional.       
+* However, if the initialization function is used, then the instance structure cannot be placed into a const data section.       
+* To place an instance structure into a const data section, the instance structure must be manually initialized.       
+* Set the values in the state buffer to zeros before static initialization.       
+* For example, to statically initialize the instance structure use       
+* <pre>       
+*     arm_biquad_cascade_df2T_instance_f32 S1 = {numStages, pState, pCoeffs};       
+* </pre>       
+* where <code>numStages</code> is the number of Biquad stages in the filter, <code>pState</code> is the address of the state buffer,
+* and <code>pCoeffs</code> is the address of the coefficient buffer.
+*       
+*/
+
+/**       
+* @addtogroup BiquadCascadeDF2T       
+* @{       
+*/
+
+/**      
+* @brief Processing function for the floating-point transposed direct form II Biquad cascade filter.      
+* @param[in]  *S        points to an instance of the filter data structure.      
+* @param[in]  *pSrc     points to the block of input data (2 channels, interleaved).
+* @param[out] *pDst     points to the block of output data (2 channels, interleaved).
+* @param[in]  blockSize number of samples to process per channel.
+* @return none.      
+*/
+
+
+LOW_OPTIMIZATION_ENTER
+void arm_biquad_cascade_stereo_df2T_f32(
+const arm_biquad_cascade_stereo_df2T_instance_f32 * S,
+float32_t * pSrc,
+float32_t * pDst,
+uint32_t blockSize)
+{
+
+    float32_t *pIn = pSrc;                         /*  source pointer            */
+    float32_t *pOut = pDst;                        /*  destination pointer       */
+    float32_t *pState = S->pState;                 /*  State pointer             */
+    float32_t *pCoeffs = S->pCoeffs;               /*  coefficient pointer       */
+    float32_t acc1a, acc1b;                        /*  accumulator               */
+    float32_t b0, b1, b2, a1, a2;                  /*  Filter coefficients       */
+    float32_t Xn1a, Xn1b;                          /*  temporary input           */
+    float32_t d1a, d2a, d1b, d2b;                  /*  state variables           */
+    uint32_t sample, stage = S->numStages;         /*  loop counters             */
+
+#if defined(ARM_MATH_CM7)
+	
+    float32_t Xn2a, Xn3a, Xn4a, Xn5a, Xn6a, Xn7a, Xn8a;         /*  Input State variables     */
+    float32_t Xn2b, Xn3b, Xn4b, Xn5b, Xn6b, Xn7b, Xn8b;         /*  Input State variables     */
+    float32_t acc2a, acc3a, acc4a, acc5a, acc6a, acc7a, acc8a;  /*  Simulates the accumulator */
+    float32_t acc2b, acc3b, acc4b, acc5b, acc6b, acc7b, acc8b;  /*  Simulates the accumulator */
+
+    do
+    {
+        /* Reading the coefficients */ 
+        b0 = pCoeffs[0]; 
+        b1 = pCoeffs[1]; 
+        b2 = pCoeffs[2]; 
+        a1 = pCoeffs[3]; 
+        /* Apply loop unrolling and compute 8 output values simultaneously. */ 
+        sample = blockSize >> 3u; 
+        a2 = pCoeffs[4]; 
+
+        /*Reading the state values */ 
+        d1a = pState[0]; 
+        d2a = pState[1]; 
+        d1b = pState[2]; 
+        d2b = pState[3]; 
+
+        pCoeffs += 5u;
+
+        /* First part of the processing with loop unrolling.  Compute 8 outputs at a time.       
+        ** a second loop below computes the remaining 1 to 7 samples. */
+        while(sample > 0u) {
+
+            /* y[n] = b0 * x[n] + d1 */
+            /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+            /* d2 = b2 * x[n] + a2 * y[n] */
+
+            /* Read the first 2 inputs. 2 cycles */
+            Xn1a  = pIn[0 ];
+            Xn1b  = pIn[1 ];
+
+            /* Sample 1. 5 cycles */
+            Xn2a  = pIn[2 ];
+            acc1a = b0 * Xn1a + d1a;
+
+            Xn2b  = pIn[3 ];
+            d1a = b1 * Xn1a + d2a;
+
+            Xn3a  = pIn[4 ];
+            d2a = b2 * Xn1a;
+
+            Xn3b  = pIn[5 ];
+            d1a += a1 * acc1a;
+
+            Xn4a  = pIn[6 ];
+            d2a += a2 * acc1a;
+
+            /* Sample 2. 5 cycles */
+            Xn4b  = pIn[7 ];
+            acc1b = b0 * Xn1b + d1b;
+
+            Xn5a  = pIn[8 ];
+            d1b = b1 * Xn1b + d2b;
+
+            Xn5b = pIn[9 ];
+            d2b = b2 * Xn1b;
+
+            Xn6a = pIn[10];
+            d1b += a1 * acc1b;
+
+            Xn6b = pIn[11];
+            d2b += a2 * acc1b;
+
+            /* Sample 3. 5 cycles */
+            Xn7a = pIn[12];
+            acc2a = b0 * Xn2a + d1a;
+
+            Xn7b = pIn[13];
+            d1a = b1 * Xn2a + d2a;
+
+            Xn8a = pIn[14];
+            d2a = b2 * Xn2a;
+
+            Xn8b = pIn[15];
+            d1a += a1 * acc2a;
+
+            pIn += 16;
+            d2a += a2 * acc2a;
+
+            /* Sample 4. 5 cycles */
+            acc2b = b0 * Xn2b + d1b;
+            d1b = b1 * Xn2b + d2b;
+            d2b = b2 * Xn2b;
+            d1b += a1 * acc2b;
+            d2b += a2 * acc2b;
+
+            /* Sample 5. 5 cycles */
+            acc3a = b0 * Xn3a + d1a;
+            d1a = b1 * Xn3a + d2a;
+            d2a = b2 * Xn3a;
+            d1a += a1 * acc3a;
+            d2a += a2 * acc3a;
+
+            /* Sample 6. 5 cycles */
+            acc3b = b0 * Xn3b + d1b;
+            d1b = b1 * Xn3b + d2b;
+            d2b = b2 * Xn3b;
+            d1b += a1 * acc3b;
+            d2b += a2 * acc3b;
+
+            /* Sample 7. 5 cycles */
+            acc4a = b0 * Xn4a + d1a;
+            d1a = b1 * Xn4a + d2a;
+            d2a = b2 * Xn4a;
+            d1a += a1 * acc4a;
+            d2a += a2 * acc4a;
+
+            /* Sample 8. 5 cycles */
+            acc4b = b0 * Xn4b + d1b;
+            d1b = b1 * Xn4b + d2b;
+            d2b = b2 * Xn4b;
+            d1b += a1 * acc4b;
+            d2b += a2 * acc4b;
+
+            /* Sample 9. 5 cycles */
+            acc5a = b0 * Xn5a + d1a;
+            d1a = b1 * Xn5a + d2a;
+            d2a = b2 * Xn5a;
+            d1a += a1 * acc5a;
+            d2a += a2 * acc5a;
+
+            /* Sample 10. 5 cycles */
+            acc5b = b0 * Xn5b + d1b;
+            d1b = b1 * Xn5b + d2b;
+            d2b = b2 * Xn5b;
+            d1b += a1 * acc5b;
+            d2b += a2 * acc5b;
+
+            /* Sample 11. 5 cycles */
+            acc6a = b0 * Xn6a + d1a;
+            d1a = b1 * Xn6a + d2a;
+            d2a = b2 * Xn6a;
+            d1a += a1 * acc6a;
+            d2a += a2 * acc6a;
+
+            /* Sample 12. 5 cycles */
+            acc6b = b0 * Xn6b + d1b;
+            d1b = b1 * Xn6b + d2b;
+            d2b = b2 * Xn6b;
+            d1b += a1 * acc6b;
+            d2b += a2 * acc6b;
+
+            /* Sample 13. 5 cycles */
+            acc7a = b0 * Xn7a + d1a;         
+            d1a = b1 * Xn7a + d2a;   
+            
+            pOut[0 ] = acc1a ;      
+            d2a = b2 * Xn7a;
+
+            pOut[1 ] = acc1b ;	
+            d1a += a1 * acc7a;
+
+            pOut[2 ] = acc2a ;	
+            d2a += a2 * acc7a;
+
+            /* Sample 14. 5 cycles */
+            pOut[3 ] = acc2b ;
+            acc7b = b0 * Xn7b + d1b;
+
+            pOut[4 ] = acc3a ; 
+            d1b = b1 * Xn7b + d2b;
+
+            pOut[5 ] = acc3b ;	
+            d2b = b2 * Xn7b;
+
+            pOut[6 ] = acc4a ;	  
+            d1b += a1 * acc7b;
+
+            pOut[7 ] = acc4b ;
+            d2b += a2 * acc7b;
+
+            /* Sample 15. 5 cycles */
+            pOut[8 ] = acc5a ;  
+            acc8a = b0 * Xn8a + d1a;
+
+            pOut[9 ] = acc5b;	
+            d1a = b1 * Xn8a + d2a;
+
+            pOut[10] = acc6a;	
+            d2a = b2 * Xn8a;
+
+            pOut[11] = acc6b;
+            d1a += a1 * acc8a;
+
+            pOut[12] = acc7a;
+            d2a += a2 * acc8a;
+
+            /* Sample 16. 5 cycles */
+            pOut[13] = acc7b;	
+            acc8b = b0 * Xn8b + d1b;
+
+            pOut[14] = acc8a;	
+            d1b = b1 * Xn8b + d2b;
+
+            pOut[15] = acc8b;
+            d2b = b2 * Xn8b;
+
+            sample--;	 
+            d1b += a1 * acc8b;
+
+            pOut += 16;
+            d2b += a2 * acc8b;
+        }
+
+        sample = blockSize & 0x7u;
+        while(sample > 0u) {
+            /* Read the input */
+            Xn1a = *pIn++; //Channel a
+            Xn1b = *pIn++; //Channel b
+
+            /* y[n] = b0 * x[n] + d1 */
+            acc1a = (b0 * Xn1a) + d1a;
+            acc1b = (b0 * Xn1b) + d1b;
+
+            /* Store the result in the accumulator in the destination buffer. */
+            *pOut++ = acc1a;
+            *pOut++ = acc1b;
+
+            /* Every time after the output is computed state should be updated. */
+            /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+            d1a = ((b1 * Xn1a) + (a1 * acc1a)) + d2a;
+            d1b = ((b1 * Xn1b) + (a1 * acc1b)) + d2b;
+
+            /* d2 = b2 * x[n] + a2 * y[n] */
+            d2a = (b2 * Xn1a) + (a2 * acc1a);
+            d2b = (b2 * Xn1b) + (a2 * acc1b);
+
+            sample--;	
+        }
+
+        /* Store the updated state variables back into the state array */ 
+        pState[0] = d1a; 
+        pState[1] = d2a;         
+
+        pState[2] = d1b; 
+        pState[3] = d2b; 
+        
+        /* The current stage input is given as the output to the next stage */ 
+        pIn = pDst; 
+        /* decrement the loop counter */ 
+        stage--; 
+
+        pState += 4u;
+        /*Reset the output working pointer */ 
+        pOut = pDst; 
+
+    } while(stage > 0u);
+	
+#elif defined(ARM_MATH_CM0_FAMILY)
+
+    /* Run the below code for Cortex-M0 */
+
+    do
+    {
+        /* Reading the coefficients */
+        b0 = *pCoeffs++;
+        b1 = *pCoeffs++;
+        b2 = *pCoeffs++;
+        a1 = *pCoeffs++;
+        a2 = *pCoeffs++;
+
+        /*Reading the state values */
+        d1a = pState[0];
+        d2a = pState[1];
+        d1b = pState[2];
+        d2b = pState[3];
+
+
+        sample = blockSize;
+
+        while(sample > 0u)
+        {
+            /* Read the input */
+            Xn1a = *pIn++; //Channel a
+            Xn1b = *pIn++; //Channel b
+
+            /* y[n] = b0 * x[n] + d1 */
+            acc1a = (b0 * Xn1a) + d1a;
+            acc1b = (b0 * Xn1b) + d1b;
+
+            /* Store the result in the accumulator in the destination buffer. */
+            *pOut++ = acc1a;
+            *pOut++ = acc1b;
+
+            /* Every time after the output is computed state should be updated. */
+            /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+            d1a = ((b1 * Xn1a) + (a1 * acc1a)) + d2a;
+            d1b = ((b1 * Xn1b) + (a1 * acc1b)) + d2b;
+
+            /* d2 = b2 * x[n] + a2 * y[n] */
+            d2a = (b2 * Xn1a) + (a2 * acc1a);
+            d2b = (b2 * Xn1b) + (a2 * acc1b);
+
+            /* decrement the loop counter */
+            sample--;
+        }
+
+        /* Store the updated state variables back into the state array */
+        *pState++ = d1a;
+        *pState++ = d2a;
+        *pState++ = d1b;
+        *pState++ = d2b;
+
+        /* The current stage input is given as the output to the next stage */
+        pIn = pDst;
+
+        /*Reset the output working pointer */
+        pOut = pDst;
+
+        /* decrement the loop counter */
+        stage--;
+
+    } while(stage > 0u);
+	 
+#else
+
+    float32_t Xn2a, Xn3a, Xn4a;                          /*  Input State variables     */
+    float32_t Xn2b, Xn3b, Xn4b;                          /*  Input State variables     */
+    float32_t acc2a, acc3a, acc4a;                       /*  accumulator               */
+    float32_t acc2b, acc3b, acc4b;                       /*  accumulator               */
+    float32_t p0a, p1a, p2a, p3a, p4a, A1a;
+    float32_t p0b, p1b, p2b, p3b, p4b, A1b;
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+    do
+    {
+        /* Reading the coefficients */     
+        b0 = *pCoeffs++;
+        b1 = *pCoeffs++;
+        b2 = *pCoeffs++;
+        a1 = *pCoeffs++;
+        a2 = *pCoeffs++;      
+
+        /*Reading the state values */
+        d1a = pState[0];
+        d2a = pState[1];
+        d1b = pState[2];
+        d2b = pState[3];
+
+        /* Apply loop unrolling and compute 4 output values simultaneously. */
+        sample = blockSize >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.       
+        ** a second loop below computes the remaining 1 to 3 samples. */
+        while(sample > 0u) {
+
+            /* y[n] = b0 * x[n] + d1 */
+            /* d1 = b1 * x[n] + a1 * y[n] + d2 */
+            /* d2 = b2 * x[n] + a2 * y[n] */
+
+            /* Read the four inputs */
+            Xn1a = pIn[0];
+            Xn1b = pIn[1];
+            Xn2a = pIn[2];
+            Xn2b = pIn[3];
+            Xn3a = pIn[4];
+            Xn3b = pIn[5];
+            Xn4a = pIn[6];
+            Xn4b = pIn[7];
+            pIn += 8;     
+            
+            p0a = b0 * Xn1a; 
+            p0b = b0 * Xn1b; 
+            p1a = b1 * Xn1a;
+            p1b = b1 * Xn1b;
+            acc1a = p0a + d1a;
+            acc1b = p0b + d1b;
+            p0a = b0 * Xn2a; 
+            p0b = b0 * Xn2b; 
+            p3a = a1 * acc1a;
+            p3b = a1 * acc1b;
+            p2a = b2 * Xn1a;
+            p2b = b2 * Xn1b;
+            A1a = p1a + p3a;
+            A1b = p1b + p3b;
+            p4a = a2 * acc1a;
+            p4b = a2 * acc1b;
+            d1a = A1a + d2a;
+            d1b = A1b + d2b;
+            d2a = p2a + p4a;
+            d2b = p2b + p4b;
+            
+            p1a = b1 * Xn2a;
+            p1b = b1 * Xn2b;
+            acc2a = p0a + d1a;
+            acc2b = p0b + d1b;
+            p0a = b0 * Xn3a; 
+            p0b = b0 * Xn3b; 
+            p3a = a1 * acc2a;
+            p3b = a1 * acc2b;
+            p2a = b2 * Xn2a;
+            p2b = b2 * Xn2b;
+            A1a = p1a + p3a;
+            A1b = p1b + p3b;
+            p4a = a2 * acc2a;
+            p4b = a2 * acc2b;
+            d1a = A1a + d2a;
+            d1b = A1b + d2b;
+            d2a = p2a + p4a;
+            d2b = p2b + p4b;
+            
+            p1a = b1 * Xn3a;
+            p1b = b1 * Xn3b;
+            acc3a = p0a + d1a;
+            acc3b = p0b + d1b;
+            p0a = b0 * Xn4a; 
+            p0b = b0 * Xn4b; 
+            p3a = a1 * acc3a;
+            p3b = a1 * acc3b;
+            p2a = b2 * Xn3a;
+            p2b = b2 * Xn3b;
+            A1a = p1a + p3a;
+            A1b = p1b + p3b;
+            p4a = a2 * acc3a;
+            p4b = a2 * acc3b;
+            d1a = A1a + d2a;
+            d1b = A1b + d2b;
+            d2a = p2a + p4a;
+            d2b = p2b + p4b;
+            
+            acc4a = p0a + d1a;
+            acc4b = p0b + d1b;
+            p1a = b1 * Xn4a;
+            p1b = b1 * Xn4b;
+            p3a = a1 * acc4a;
+            p3b = a1 * acc4b;
+            p2a = b2 * Xn4a;
+            p2b = b2 * Xn4b;
+            A1a = p1a + p3a;
+            A1b = p1b + p3b;
+            p4a = a2 * acc4a;
+            p4b = a2 * acc4b;
+            d1a = A1a + d2a;
+            d1b = A1b + d2b;
+            d2a = p2a + p4a;
+            d2b = p2b + p4b;
+
+            pOut[0] = acc1a;	
+            pOut[1] = acc1b;	
+            pOut[2] = acc2a;	
+            pOut[3] = acc2b;
+            pOut[4] = acc3a;	
+            pOut[5] = acc3b;	
+            pOut[6] = acc4a;	
+            pOut[7] = acc4b;
+            pOut += 8;
+             
+            sample--;	       
+        }
+
+        sample = blockSize & 0x3u;
+        while(sample > 0u) {
+            Xn1a = *pIn++;
+            Xn1b = *pIn++;
+
+            p0a = b0 * Xn1a; 
+            p0b = b0 * Xn1b; 
+            p1a = b1 * Xn1a;
+            p1b = b1 * Xn1b;
+            acc1a = p0a + d1a;
+            acc1b = p0b + d1b;
+            p3a = a1 * acc1a;
+            p3b = a1 * acc1b;
+            p2a = b2 * Xn1a;
+            p2b = b2 * Xn1b;
+            A1a = p1a + p3a;
+            A1b = p1b + p3b;
+            p4a = a2 * acc1a;
+            p4b = a2 * acc1b;
+            d1a = A1a + d2a;
+            d1b = A1b + d2b;
+            d2a = p2a + p4a;
+            d2b = p2b + p4b;
+
+            *pOut++ = acc1a;
+            *pOut++ = acc1b;
+
+            sample--;	       
+        }
+
+        /* Store the updated state variables back into the state array */
+        *pState++ = d1a;
+        *pState++ = d2a;
+        *pState++ = d1b;
+        *pState++ = d2b;
+
+        /* The current stage input is given as the output to the next stage */
+        pIn = pDst;
+
+        /*Reset the output working pointer */
+        pOut = pDst;
+
+        /* decrement the loop counter */
+        stage--;
+
+    } while(stage > 0u);
+
+#endif 
+
+}
+LOW_OPTIMIZATION_EXIT
+
+/**       
+   * @} end of BiquadCascadeDF2T group       
+   */
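
The three unrolled branches above all compute the same per-sample recurrence on interleaved two-channel data. A minimal scalar sketch of that recurrence follows; `stereo_df2t_ref` and `stereo_df2t_demo` are hypothetical names introduced here for illustration, and the code is not the library implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for the stereo case (not the CMSIS code).
 * coeffs: {b0, b1, b2, a1, a2} per stage, shared by both channels.
 * state:  {d1a, d2a, d1b, d2b} per stage, 4 * numStages values.
 * src/dst: interleaved {a0, b0, a1, b1, ...}, 2 * blockSize values each. */
static void stereo_df2t_ref(size_t numStages, const float *coeffs,
                            float *state, const float *src, float *dst,
                            size_t blockSize)
{
    const float *in = src;

    for (size_t s = 0; s < numStages; s++) {
        const float *c = &coeffs[5 * s];
        float *st = &state[4 * s];
        float d1a = st[0], d2a = st[1], d1b = st[2], d2b = st[3];

        for (size_t n = 0; n < blockSize; n++) {
            float xa = in[2 * n];              /* channel a sample */
            float xb = in[2 * n + 1];          /* channel b sample */

            float ya = c[0] * xa + d1a;        /* y[n] = b0*x[n] + d1 */
            float yb = c[0] * xb + d1b;

            d1a = c[1] * xa + c[3] * ya + d2a; /* d1 = b1*x + a1*y + d2 */
            d1b = c[1] * xb + c[3] * yb + d2b;

            d2a = c[2] * xa + c[4] * ya;       /* d2 = b2*x + a2*y */
            d2b = c[2] * xb + c[4] * yb;

            dst[2 * n] = ya;
            dst[2 * n + 1] = yb;
        }

        st[0] = d1a; st[1] = d2a; st[2] = d1b; st[3] = d2b;

        /* Later stages filter the output buffer in place. */
        in = dst;
    }
}

/* Returns 1 when a one-stage example produces the expected decay. */
static int stereo_df2t_demo(void)
{
    /* b0 = 0.5, a1 = 0.5: each output is half of the previous one. */
    const float coeffs[5] = { 0.5f, 0.0f, 0.0f, 0.5f, 0.0f };
    const float src[6] = { 1.0f, 2.0f,  0.0f, 0.0f,  0.0f, 0.0f };
    float dst[6];
    float state[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

    stereo_df2t_ref(1, coeffs, state, src, dst, 3);

    assert(dst[0] == 0.5f   && dst[1] == 1.0f);
    assert(dst[2] == 0.25f  && dst[3] == 0.5f);
    assert(dst[4] == 0.125f && dst[5] == 0.25f);
    assert(state[0] == 0.0625f && state[2] == 0.125f);
    return 1;
}
```

A reference of this kind can serve as a cross-check for the unrolled paths, since each channel should match a mono filter run with the same coefficients.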

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_stereo_df2T_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_biquad_cascade_stereo_df2T_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,102 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_biquad_cascade_stereo_df2T_init_f32.c    
+*    
+* Description:  Initialization function for the floating-point transposed   
+*               direct form II Biquad cascade filter.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup BiquadCascadeDF2T    
+ * @{    
+ */
+
+/**   
+ * @brief  Initialization function for the floating-point transposed direct form II Biquad cascade filter.   
+ * @param[in,out] *S           points to an instance of the filter data structure.   
+ * @param[in]     numStages    number of 2nd order stages in the filter.   
+ * @param[in]     *pCoeffs     points to the filter coefficients.   
+ * @param[in]     *pState      points to the state buffer.   
+ * @return        none   
+ *    
+ * <b>Coefficient and State Ordering:</b>    
+ * \par    
+ * The coefficients are stored in the array <code>pCoeffs</code> in the following order:    
+ * <pre>    
+ *     {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ...}    
+ * </pre>    
+ *    
+ * \par    
+ * where <code>b1x</code> and <code>a1x</code> are the coefficients for the first stage,    
+ * <code>b2x</code> and <code>a2x</code> are the coefficients for the second stage,    
+ * and so on.  The <code>pCoeffs</code> array contains a total of <code>5*numStages</code> values.    
+ *    
+ * \par    
+ * The <code>pState</code> is a pointer to state array.    
+ * Each Biquad stage has 2 state variables <code>d1</code> and <code>d2</code> for each channel.
+ * The 4 state variables for stage 1 are first (<code>d1</code> and <code>d2</code> for the first channel, then for the second), then the 4 state variables for stage 2, and so on.
+ * The state array has a total length of <code>4*numStages</code> values.
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ */
+
+void arm_biquad_cascade_stereo_df2T_init_f32(
+  arm_biquad_cascade_stereo_df2T_instance_f32 * S,
+  uint8_t numStages,
+  float32_t * pCoeffs,
+  float32_t * pState)
+{
+  /* Assign filter stages */
+  S->numStages = numStages;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always 4 * numStages */
+  memset(pState, 0, (4u * (uint32_t) numStages) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+}
+
+/**    
+ * @} end of BiquadCascadeDF2T group    
+ */
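
When coefficients come from a design tool that writes the denominator as a0 + a1*z^-1 + a2*z^-2, the feedback terms need normalizing and negating before they fit the {b0, b1, b2, a1, a2} ordering documented above. The sketch below assumes that input convention; `sos_to_df2t_coeffs`, `stereo_state_len`, `coeff_len`, and `sos_demo` are hypothetical helpers, not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert one section {b0, b1, b2, a0, a1, a2}
 * (assumed denominator a0 + a1*z^-1 + a2*z^-2) into {b0, b1, b2, a1, a2}
 * with a0 normalized to 1 and the feedback coefficients negated.
 * Returns 0 on success, -1 if a0 is zero. */
static int sos_to_df2t_coeffs(const double sos[6], float out[5])
{
    double a0 = sos[3];

    if (a0 == 0.0) {
        return -1;
    }
    out[0] = (float)(sos[0] / a0);
    out[1] = (float)(sos[1] / a0);
    out[2] = (float)(sos[2] / a0);
    out[3] = (float)(-sos[4] / a0);   /* sign flipped for this library */
    out[4] = (float)(-sos[5] / a0);
    return 0;
}

/* Buffer lengths, in values, for the stereo variant. */
static size_t stereo_state_len(size_t numStages) { return 4u * numStages; }
static size_t coeff_len(size_t numStages)        { return 5u * numStages; }

/* Returns 1 when the conversion and the length helpers behave as expected. */
static int sos_demo(void)
{
    const double sos[6] = { 1.0, 2.0, 1.0, 2.0, -1.0, 0.5 };
    float c[5];

    assert(sos_to_df2t_coeffs(sos, c) == 0);
    assert(c[0] == 0.5f && c[1] == 1.0f && c[2] == 0.5f);
    assert(c[3] == 0.5f && c[4] == -0.25f);
    assert(stereo_state_len(3) == 12 && coeff_len(3) == 15);
    return 1;
}
```

Whether a given tool needs this negation depends on its own convention, so it is worth checking one known filter against the tool's reported frequency response before relying on the conversion.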

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,647 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_f32.c    
+*    
+* Description:	Convolution of floating-point sequences.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup Conv Convolution    
+ *    
+ * Convolution is a mathematical operation that operates on two finite length vectors to generate a finite length output vector.    
+ * Convolution is similar to correlation and is frequently used in filtering and data analysis.    
+ * The CMSIS DSP library contains functions for convolving Q7, Q15, Q31, and floating-point data types.    
+ * The library also provides fast versions of the Q15 and Q31 functions on Cortex-M4 and Cortex-M3.    
+ *    
+ * \par Algorithm    
+ * Let <code>a[n]</code> and <code>b[n]</code> be sequences of length <code>srcALen</code> and <code>srcBLen</code> samples respectively.    
+ * Then the convolution    
+ *    
+ * <pre>    
+ *                   c[n] = a[n] * b[n]    
+ * </pre>    
+ *    
+ * \par    
+ * is defined as    
+ * \image html ConvolutionEquation.gif    
+ * \par    
+ * Note that <code>c[n]</code> is of length <code>srcALen + srcBLen - 1</code> and is defined over the interval <code>n=0, 1, 2, ..., srcALen + srcBLen - 2</code>.    
+ * <code>pSrcA</code> points to the first input vector of length <code>srcALen</code> and    
+ * <code>pSrcB</code> points to the second input vector of length <code>srcBLen</code>.    
+ * The output result is written to <code>pDst</code> and the calling function must allocate <code>srcALen+srcBLen-1</code> words for the result.    
+ *    
+ * \par    
+ * Conceptually, when two signals <code>a[n]</code> and <code>b[n]</code> are convolved,    
+ * the signal <code>b[n]</code> slides over <code>a[n]</code>.    
+ * For each offset \c n, the overlapping portions of a[n] and b[n] are multiplied and summed together.    
+ *    
+ * \par    
+ * Note that convolution is a commutative operation:    
+ *    
+ * <pre>    
+ *                   a[n] * b[n] = b[n] * a[n].    
+ * </pre>    
+ *    
+ * \par    
+ * This means that switching the A and B arguments to the convolution functions has no effect.    
+ *    
+ * <b>Fixed-Point Behavior</b>    
+ *    
+ * \par    
+ * Convolution requires summing up a large number of intermediate products.    
+ * As such, the Q7, Q15, and Q31 functions run a risk of overflow and saturation.    
+ * Refer to the function specific documentation below for further details of the particular algorithm used.    
+ *
+ *
+ * <b>Fast Versions</b>
+ *
+ * \par 
+ * Fast versions are supported for Q31 and Q15.  They take fewer cycles than the standard Q31 and Q15 versions of conv,
+ * but the design requires the input signals to be scaled down to avoid intermediate overflows.   
+ *
+ *
+ * <b>Opt Versions</b>
+ *
+ * \par 
+ * Opt versions are supported for Q15 and Q7.  The design uses an internal scratch buffer to achieve better optimisation.
+ * These versions are optimised for cycles and consume more memory (scratch memory) than the standard Q15 and Q7 versions.
+ */
+
+/**    
+ * @addtogroup Conv    
+ * @{    
+ */
+
+/**    
+ * @brief Convolution of floating-point sequences.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.    
+ * @return none.    
+ */
+
+void arm_conv_f32(
+  float32_t * pSrcA,
+  uint32_t srcALen,
+  float32_t * pSrcB,
+  uint32_t srcBLen,
+  float32_t * pDst)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t *pIn1;                               /* inputA pointer */
+  float32_t *pIn2;                               /* inputB pointer */
+  float32_t *pOut = pDst;                        /* output pointer */
+  float32_t *px;                                 /* Intermediate inputA pointer */
+  float32_t *py;                                 /* Intermediate inputB pointer */
+  float32_t *pSrc1, *pSrc2;                      /* Intermediate pointers */
+  float32_t sum, acc0, acc1, acc2, acc3;         /* Accumulator */
+  float32_t x0, x1, x2, x3, c0;                  /* Temporary variables to hold state and coefficient values */
+  uint32_t j, k, count, blkCnt, blockSize1, blockSize2, blockSize3;     /* loop counters */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+  /* The function is internally    
+   * divided into three stages according to the number of multiplications that have to    
+   * take place between inputA samples and inputB samples. In the first stage of the    
+   * algorithm, the multiplications increase by one for every iteration.    
+   * In the second stage of the algorithm, srcBLen number of multiplications are done.    
+   * In the third stage of the algorithm, the multiplications decrease by one    
+   * for every iteration. */
+
+  /* The algorithm is implemented in three stages.    
+     The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------    
+   * initializations of stage1    
+   * -------------------------*/
+
+  /* sum = x[0] * y[0]    
+   * sum = x[0] * y[1] + x[1] * y[0]    
+   * ....    
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 3] +...+ x[srcBLen - 2] * y[0]    
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+
+  /* ------------------------    
+   * Stage1 process    
+   * ----------------------*/
+
+  /* The first stage starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0.0f;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[count - 1] */
+      sum += *px++ * *py--;
+
+      /* x[1] * y[count - 2] */
+      sum += *px++ * *py--;
+
+      /* x[2] * y[count - 3] */
+      sum += *px++ * *py--;
+
+      /* x[3] * y[count - 4] */
+      sum += *px++ * *py--;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += *px++ * *py--;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = sum;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------    
+   * Initializations of stage2    
+   * ------------------------*/
+
+  /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]    
+   * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]    
+   * ....    
+   * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]    
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------    
+   * Stage2 process    
+   * ------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.    
+   * So, to loop unroll over blockSize2,    
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0.0f;
+      acc1 = 0.0f;
+      acc2 = 0.0f;
+      acc3 = 0.0f;
+
+      /* read x[0], x[1], x[2] samples */
+      x0 = *(px++);
+      x1 = *(px++);
+      x2 = *(px++);
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read y[srcBLen - 1] sample */
+        c0 = *(py--);
+
+        /* Read x[3] sample */
+        x3 = *(px);
+
+        /* Perform the multiply-accumulate */
+        /* acc0 +=  x[0] * y[srcBLen - 1] */
+        acc0 += x0 * c0;
+
+        /* acc1 +=  x[1] * y[srcBLen - 1] */
+        acc1 += x1 * c0;
+
+        /* acc2 +=  x[2] * y[srcBLen - 1] */
+        acc2 += x2 * c0;
+
+        /* acc3 +=  x[3] * y[srcBLen - 1] */
+        acc3 += x3 * c0;
+
+        /* Read y[srcBLen - 2] sample */
+        c0 = *(py--);
+
+        /* Read x[4] sample */
+        x0 = *(px + 1u);
+
+        /* Perform the multiply-accumulate */
+        /* acc0 +=  x[1] * y[srcBLen - 2] */
+        acc0 += x1 * c0;
+        /* acc1 +=  x[2] * y[srcBLen - 2] */
+        acc1 += x2 * c0;
+        /* acc2 +=  x[3] * y[srcBLen - 2] */
+        acc2 += x3 * c0;
+        /* acc3 +=  x[4] * y[srcBLen - 2] */
+        acc3 += x0 * c0;
+
+        /* Read y[srcBLen - 3] sample */
+        c0 = *(py--);
+
+        /* Read x[5] sample */
+        x1 = *(px + 2u);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[2] * y[srcBLen - 3] */
+        acc0 += x2 * c0;
+        /* acc1 +=  x[3] * y[srcBLen - 3] */
+        acc1 += x3 * c0;
+        /* acc2 +=  x[4] * y[srcBLen - 3] */
+        acc2 += x0 * c0;
+        /* acc3 +=  x[5] * y[srcBLen - 3] */
+        acc3 += x1 * c0;
+
+        /* Read y[srcBLen - 4] sample */
+        c0 = *(py--);
+
+        /* Read x[6] sample */
+        x2 = *(px + 3u);
+        px += 4u;
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[3] * y[srcBLen - 4] */
+        acc0 += x3 * c0;
+        /* acc1 +=  x[4] * y[srcBLen - 4] */
+        acc1 += x0 * c0;
+        /* acc2 +=  x[5] * y[srcBLen - 4] */
+        acc2 += x1 * c0;
+        /* acc3 +=  x[6] * y[srcBLen - 4] */
+        acc3 += x2 * c0;
+
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Read y[srcBLen - 5] sample */
+        c0 = *(py--);
+
+        /* Read x[7] sample */
+        x3 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[4] * y[srcBLen - 5] */
+        acc0 += x0 * c0;
+        /* acc1 +=  x[5] * y[srcBLen - 5] */
+        acc1 += x1 * c0;
+        /* acc2 +=  x[6] * y[srcBLen - 5] */
+        acc2 += x2 * c0;
+        /* acc3 +=  x[7] * y[srcBLen - 5] */
+        acc3 += x3 * c0;
+
+        /* Reuse the present samples for the next MAC */
+        x0 = x1;
+        x1 = x2;
+        x2 = x3;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = acc0;
+      *pOut++ = acc1;
+      *pOut++ = acc2;
+      *pOut++ = acc3;
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0.0f;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += *px++ * *py--;
+        sum += *px++ * *py--;
+        sum += *px++ * *py--;
+        sum += *px++ * *py--;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += *px++ * *py--;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = sum;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,    
+     * the blockSize2 loop is not unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0.0f;
+
+      /* srcBLen number of MACS should be performed */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += *px++ * *py--;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = sum;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+
+  /* --------------------------    
+   * Initializations of stage3    
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]    
+   * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]    
+   * ....    
+   * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]    
+   * sum +=  x[srcALen-1] * y[srcBLen-1]    
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.    
+     The blockSize3 variable holds the number of MAC operations performed */
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* -------------------    
+   * Stage3 process    
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0.0f;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = blockSize3 >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* sum += x[srcALen - srcBLen + 1] * y[srcBLen - 1] */
+      sum += *px++ * *py--;
+
+      /* sum += x[srcALen - srcBLen + 2] * y[srcBLen - 2] */
+      sum += *px++ * *py--;
+
+      /* sum += x[srcALen - srcBLen + 3] * y[srcBLen - 3] */
+      sum += *px++ * *py--;
+
+      /* sum += x[srcALen - srcBLen + 4] * y[srcBLen - 4] */
+      sum += *px++ * *py--;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the blockSize3 is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = blockSize3 % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+      sum += *px++ * *py--;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = sum;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pSrc2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  float32_t *pIn1 = pSrcA;                       /* inputA pointer */
+  float32_t *pIn2 = pSrcB;                       /* inputB pointer */
+  float32_t sum;                                 /* Accumulator */
+  uint32_t i, j;                                 /* loop counters */
+
+  /* Loop to calculate convolution for output length number of times */
+  for (i = 0u; i < ((srcALen + srcBLen) - 1u); i++)
+  {
+    /* Initialize sum with zero to carry out MAC operations */
+    sum = 0.0f;
+
+    /* Loop to perform MAC operations according to convolution equation */
+    for (j = 0u; j <= i; j++)
+    {
+      /* Check the array limitations */
+      if((((i - j) < srcBLen) && (j < srcALen)))
+      {
+        /* z[i] += x[j] * y[i-j] */
+        sum += pIn1[j] * pIn2[i - j];
+      }
+    }
+    /* Store the output in the destination buffer */
+    pDst[i] = sum;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY        */
+
+}
+
+/**    
+ * @} end of Conv group    
+ */
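The convolution definition and the commutativity note in the Conv group documentation above can be checked against a small direct-form reference. The following is a minimal sketch in plain C; `conv_ref` is a hypothetical helper written for illustration and is not part of CMSIS-DSP. It mirrors the bounds-checked loop of the Cortex-M0 path rather than the three-stage unrolled path.

```c
#include <stdint.h>

/* Hypothetical reference helper (not a CMSIS-DSP function).
 * Direct-form convolution: dst[n] = sum over k of a[k] * b[n - k].
 * dst must provide room for a_len + b_len - 1 samples. */
static void conv_ref(const float *a, uint32_t a_len,
                     const float *b, uint32_t b_len,
                     float *dst)
{
    uint32_t n, k;

    for (n = 0u; n < (a_len + b_len) - 1u; n++)
    {
        float sum = 0.0f;

        for (k = 0u; k <= n; k++)
        {
            /* Skip terms whose indices fall outside either input */
            if ((k < a_len) && ((n - k) < b_len))
            {
                sum += a[k] * b[n - k];
            }
        }

        dst[n] = sum;
    }
}
```

Calling `conv_ref` with a = {1, 2, 3, 4} and b = {1, 1} fills five outputs, and swapping the two inputs yields the same five values, which is the commutativity property stated in the group documentation. For these lengths the three-stage split used by the Cortex-M3/M4 path would produce 1, 3 and 1 outputs respectively (srcBLen - 1, srcALen - srcBLen + 1, srcBLen - 1).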

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_opt_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_opt_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,543 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_fast_opt_q15.c    
+*    
+* Description:	Fast Q15 Convolution.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Conv    
+ * @{    
+ */
+
+/**    
+ * @brief Convolution of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.    
+ * @param[in]  *pScratch1 points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.   
+ * @param[in]  *pScratch2 points to scratch buffer of size min(srcALen, srcBLen).   
+ * @return none.    
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.    
+ *  In this case the input, output, scratch1 and scratch2 buffers should be 32-bit aligned.    
+ *     
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * This fast version uses a 32-bit accumulator with 2.30 format.    
+ * The accumulator maintains full precision of the intermediate multiplication results    
+ * but provides only a single guard bit. There is no saturation on intermediate additions.    
+ * Thus, if the accumulator overflows it wraps around and distorts the result.    
+ * The input signals should be scaled down to avoid intermediate overflows.    
+ * Scale down the inputs by log2(min(srcALen, srcBLen)) bits (log2 is the base-2 logarithm) to avoid overflows,    
+ * as a maximum of min(srcALen, srcBLen) additions are carried out internally.    
+ * The 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format to yield the final result.    
+ *    
+ * \par    
+ * See <code>arm_conv_q15()</code> for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion.    
+ */
+
+void arm_conv_fast_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulators */
+  q31_t x1, x2, x3;                              /* Temporary variables to hold state and coefficient values */
+  q31_t y1, y2;                                  /* State variables */
+  q15_t *pOut = pDst;                            /* output pointer */
+  q15_t *pScr1 = pScratch1;                      /* Temporary pointer for scratch1 */
+  q15_t *pScr2 = pScratch2;                      /* Temporary pointer for scratch2 */
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  uint32_t j, k, blkCnt;                         /* loop counter */
+  uint32_t tapCnt;                               /* loop count */
+#ifdef UNALIGNED_SUPPORT_DISABLE
+
+  q15_t a, b;
+
+#endif	/*	#ifdef UNALIGNED_SUPPORT_DISABLE	*/
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* Pointer to take end of scratch2 buffer */
+  pScr2 = pScratch2 + srcBLen - 1;
+
+  /* points to smaller length sequence */
+  px = pIn2;
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = srcBLen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+
+  /* Copy smaller length input sequence in reverse order into second scratch buffer */
+  while(k > 0u)
+  {
+    /* copy second buffer in reversal manner */
+    *pScr2-- = *px++;
+    *pScr2-- = *px++;
+    *pScr2-- = *px++;
+    *pScr2-- = *px++;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = srcBLen % 0x4u;
+
+  while(k > 0u)
+  {
+    /* copy second buffer in reversal manner for remaining samples */
+    *pScr2-- = *px++;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* Initialize temporary scratch pointer */
+  pScr1 = pScratch1;
+
+  /* Assuming scratch1 buffer is aligned by 32-bit */
+  /* Fill (srcBLen - 1u) zeros in scratch1 buffer */
+  arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+  /* Update temporary scratch pointer */
+  pScr1 += (srcBLen - 1u);
+
+  /* Copy bigger length sequence(srcALen) samples in scratch1 buffer */
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Copy (srcALen) samples in scratch buffer */
+  arm_copy_q15(pIn1, pScr1, srcALen);
+
+  /* Update pointers */
+  pScr1 += srcALen;
+
+#else
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = srcALen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* copy the longer sequence into scratch1 in forward order */
+    *pScr1++ = *pIn1++;
+    *pScr1++ = *pIn1++;
+    *pScr1++ = *pIn1++;
+    *pScr1++ = *pIn1++;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = srcALen % 0x4u;
+
+  while(k > 0u)
+  {
+    /* copy the remaining samples of the longer sequence in forward order */
+    *pScr1++ = *pIn1++;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+  arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+  /* Update pointer */
+  pScr1 += (srcBLen - 1u);
+
+#else
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = (srcBLen - 1u) >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* write zeros at the end of the scratch1 buffer */
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, write the remaining zeros here.       
+   ** No loop unrolling is used. */
+  k = (srcBLen - 1u) % 0x4u;
+
+  while(k > 0u)
+  {
+    /* write the remaining zeros */
+    *pScr1++ = 0;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+  /* Temporary pointer for scratch2 */
+  py = pScratch2;
+
+
+  /* Initialization of pIn2 pointer */
+  pIn2 = py;
+
+  /* First part of the processing with loop unrolling process 4 data points at a time.       
+   ** a second loop below process for the remaining 1 to 3 samples. */
+
+  /* Actual convolution process starts here */
+  blkCnt = (srcALen + srcBLen - 1u) >> 2;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer as scratch1 */
+    pScr1 = pScratch1;
+
+    /* Clear Accumulators */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Read two samples from scratch1 buffer */
+    x1 = *__SIMD32(pScr1)++;
+
+    /* Read next two samples from scratch1 buffer */
+    x2 = *__SIMD32(pScr1)++;
+
+    tapCnt = (srcBLen) >> 2u;
+
+    while(tapCnt > 0u)
+    {
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+      /* Read four samples from smaller buffer */
+      y1 = _SIMD32_OFFSET(pIn2);
+      y2 = _SIMD32_OFFSET(pIn2 + 2u);
+
+      /* multiply and accumulate */
+      acc0 = __SMLAD(x1, y1, acc0);
+      acc2 = __SMLAD(x2, y1, acc2);
+
+      /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      /* multiply and accumulate */
+      acc1 = __SMLADX(x3, y1, acc1);
+
+      /* Read next two samples from scratch1 buffer */
+      x1 = _SIMD32_OFFSET(pScr1);
+
+      /* multiply and accumulate */
+      acc0 = __SMLAD(x2, y2, acc0);
+      acc2 = __SMLAD(x1, y2, acc2);
+
+      /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y1, acc3);
+      acc1 = __SMLADX(x3, y2, acc1);
+
+      x2 = _SIMD32_OFFSET(pScr1 + 2u);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y2, acc3);
+
+#else	 
+
+      /* Read four samples from smaller buffer */
+	  a = *pIn2;
+	  b = *(pIn2 + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      y1 = __PKHBT(a, b, 16);
+#else
+      y1 = __PKHBT(b, a, 16);
+#endif
+	  
+	  a = *(pIn2 + 2);
+	  b = *(pIn2 + 3);
+#ifndef ARM_MATH_BIG_ENDIAN
+      y2 = __PKHBT(a, b, 16);
+#else
+      y2 = __PKHBT(b, a, 16);
+#endif				
+
+      acc0 = __SMLAD(x1, y1, acc0);
+
+      acc2 = __SMLAD(x2, y1, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc1 = __SMLADX(x3, y1, acc1);
+
+	  a = *pScr1;
+	  b = *(pScr1 + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(a, b, 16);
+#else
+      x1 = __PKHBT(b, a, 16);
+#endif
+
+      acc0 = __SMLAD(x2, y2, acc0);
+
+      acc2 = __SMLAD(x1, y2, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y1, acc3);
+
+      acc1 = __SMLADX(x3, y2, acc1);
+
+	  a = *(pScr1 + 2);
+	  b = *(pScr1 + 3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x2 = __PKHBT(a, b, 16);
+#else
+      x2 = __PKHBT(b, a, 16);
+#endif
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y2, acc3);
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+      /* update scratch pointers */
+      pIn2 += 4u;
+      pScr1 += 4u;
+
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Update scratch pointer for remaining samples of smaller length sequence */
+    pScr1 -= 4u;
+
+    /* apply same above for remaining samples of smaller length sequence */
+    tapCnt = (srcBLen) & 3u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr1++ * *pIn2);
+      acc1 += (*pScr1++ * *pIn2);
+      acc2 += (*pScr1++ * *pIn2);
+      acc3 += (*pScr1++ * *pIn2++);
+
+      pScr1 -= 3u;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+
+    /* Store the results in the accumulators in the destination buffer. */
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pOut)++ =
+      __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+
+    *__SIMD32(pOut)++ =
+      __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+
+#else
+
+    *__SIMD32(pOut)++ =
+      __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+
+    *__SIMD32(pOut)++ =
+      __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+
+
+#endif /*      #ifndef ARM_MATH_BIG_ENDIAN       */
+
+    /* Initialization of inputB pointer */
+    pIn2 = py;
+
+    pScratch1 += 4u;
+
+  }
+
+
+  blkCnt = (srcALen + srcBLen - 1u) & 0x3;
+
+  /* Calculate convolution for the remaining output samples */
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer as scratch1 */
+    pScr1 = pScratch1;
+
+    /* Clear Accumulators */
+    acc0 = 0;
+
+    tapCnt = (srcBLen) >> 1u;
+
+    while(tapCnt > 0u)
+    {
+
+      acc0 += (*pScr1++ * *pIn2++);
+      acc0 += (*pScr1++ * *pIn2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    tapCnt = (srcBLen) & 1u;
+
+    /* apply same above for remaining samples of smaller length sequence */
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr1++ * *pIn2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+    /* The result is in 2.30 format.  Convert to 1.15 with saturation.       
+     ** Then store the output in the destination buffer. */
+    *pOut++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+    /* Initialization of inputB pointer */
+    pIn2 = py;
+
+    pScratch1 += 1u;
+
+  }
+
+}
+
+/**    
+ * @} end of Conv group    
+ */
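The scaling note for arm_conv_fast_opt_q15() says the 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format, and the parameter list gives the scratch1 size as max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2. The sketch below restates both in portable C for illustration only. The helper names `sat16`, `acc_to_q15` and `scratch1_len` are hypothetical and are not CMSIS-DSP functions; `sat16` is assumed to behave like `__SSAT(x, 16)` as used in the source.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for __SSAT(x, 16):
 * clamp a 32-bit value to the signed 16-bit range. */
static int32_t sat16(int32_t x)
{
    if (x > 32767)
    {
        return 32767;
    }
    if (x < -32768)
    {
        return -32768;
    }
    return x;
}

/* Hypothetical helper: convert a 2.30 accumulator to 1.15 with saturation.
 * As in the source, this assumes >> on a negative value is an arithmetic shift. */
static int16_t acc_to_q15(int32_t acc)
{
    return (int16_t) sat16(acc >> 15);
}

/* Hypothetical helper: number of q15_t elements documented for pScratch1,
 * max(srcALen, srcBLen) + 2 * min(srcALen, srcBLen) - 2. */
static uint32_t scratch1_len(uint32_t srcALen, uint32_t srcBLen)
{
    uint32_t longer = (srcALen >= srcBLen) ? srcALen : srcBLen;
    uint32_t shorter = (srcALen >= srcBLen) ? srcBLen : srcALen;

    return longer + (2u * shorter) - 2u;
}
```

For example, 0.5 times 0.5 in Q15 is 16384 * 16384, which `acc_to_q15` maps to 8192 (0.25), while a single product of -1.0 by -1.0 (2^30) saturates to 32767. A second such product would already exceed the 32-bit accumulator, which is why the inputs have to be scaled down. With srcALen = 8 and srcBLen = 3, `scratch1_len` returns 12, matching the layout built in the function: 2 leading zeros, 8 samples, 2 trailing zeros.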

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,1410 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_conv_fast_q15.c   
+*   
+* Description:	Fast Q15 Convolution.   
+*   
+* Target Processor: Cortex-M4/Cortex-M3
+* 
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup Conv   
+ * @{   
+ */
+
+/**   
+ * @brief Convolution of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.   
+ * @param[in] *pSrcA points to the first input sequence.   
+ * @param[in] srcALen length of the first input sequence.   
+ * @param[in] *pSrcB points to the second input sequence.   
+ * @param[in] srcBLen length of the second input sequence.   
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.   
+ * @return none.   
+ *   
+ * <b>Scaling and Overflow Behavior:</b>   
+ *   
+ * \par   
+ * This fast version uses a 32-bit accumulator with 2.30 format.   
+ * The accumulator maintains full precision of the intermediate multiplication results   
+ * but provides only a single guard bit. There is no saturation on intermediate additions.   
+ * Thus, if the accumulator overflows it wraps around and distorts the result.   
+ * The input signals should be scaled down to avoid intermediate overflows.   
+ * Scale down the inputs by log2(min(srcALen, srcBLen)) bits (log2 is the logarithm to base 2) to avoid overflows,   
+ * since at most min(srcALen, srcBLen) additions are carried out internally.   
+ * The 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format to yield the final result.   
+ *   
+ * \par   
+ * See <code>arm_conv_q15()</code> for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion.   
+ */
+
+void arm_conv_fast_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst)
+{
+#ifndef UNALIGNED_SUPPORT_DISABLE
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *pOut = pDst;                            /* output pointer */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulator */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1, *pSrc2;                          /* Intermediate pointers */
+  q31_t x0, x1, x2, x3, c0;                      /* Temporary variables to hold state and coefficient values */
+  uint32_t blockSize1, blockSize2, blockSize3, j, k, count, blkCnt;     /* loop counter */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+  /* The function is internally   
+   * divided into three stages according to the number of multiplications that have to   
+   * take place between inputA samples and inputB samples. In the first stage of the   
+   * algorithm, the multiplications increase by one for every iteration.   
+   * In the second stage of the algorithm, srcBLen number of multiplications are done.   
+   * In the third stage of the algorithm, the multiplications decrease by one   
+   * for every iteration. */
+
+  /* The algorithm is implemented in three stages.   
+     The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------   
+   * Initializations of stage1   
+   * -------------------------*/
+
+  /* sum = x[0] * y[0]   
+   * sum = x[0] * y[1] + x[1] * y[0]   
+   * ....   
+   * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]   
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+
+  /* ------------------------   
+   * Stage1 process   
+   * ----------------------*/
+
+  /* For loop unrolling by 4, this stage is divided into two. */
+  /* First part of this stage computes the MAC operations less than 4 */
+  /* Second part of this stage computes the MAC operations greater than or equal to 4 */
+
+  /* The first part of the stage starts here */
+  while((count < 4u) && (blockSize1 > 0u))
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Loop over number of MAC operations between   
+     * inputA samples and inputB samples */
+    k = count;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum = __SMLAD(*px++, *py--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (sum >> 15);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* The second part of the stage starts here */
+  /* The internal loop, over count, is unrolled by 4 */
+  /* To read the last two inputB samples using SIMD:   
+   * y[srcBLen] and y[srcBLen-1] coefficients, py is decremented by 1 */
+  py = py - 1;
+
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* x[0], x[1] are multiplied with y[srcBLen - 1], y[srcBLen - 2] respectively */
+      sum = __SMLADX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+      /* x[2], x[3] are multiplied with y[srcBLen - 3], y[srcBLen - 4] respectively */
+      sum = __SMLADX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* For the next MAC operations, the pointer py is used without SIMD   
+     * So, py is incremented by 1 */
+    py = py + 1u;
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum = __SMLAD(*px++, *py--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (sum >> 15);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + (count - 1u);
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------   
+   * Initializations of stage2   
+   * ------------------------*/
+
+  /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]   
+   * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]   
+   * ....   
+   * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]   
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+
+  /* --------------------   
+   * Stage2 process   
+   * -------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+   * So, to loop unroll over blockSize2,   
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      py = py - 1u;
+
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+
+      /* read x[0], x[1] samples */
+      x0 = *__SIMD32(px);
+      /* read x[1], x[2] samples */
+      x1 = _SIMD32_OFFSET(px+1);
+	  px+= 2u;
+
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read the last two inputB samples using SIMD:   
+         * y[srcBLen - 1] and y[srcBLen - 2] */
+        c0 = *__SIMD32(py)--;
+
+        /* acc0 +=  x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] */
+        acc0 = __SMLADX(x0, c0, acc0);
+
+        /* acc1 +=  x[1] * y[srcBLen - 1] + x[2] * y[srcBLen - 2] */
+        acc1 = __SMLADX(x1, c0, acc1);
+
+        /* Read x[2], x[3] */
+        x2 = *__SIMD32(px);
+
+        /* Read x[3], x[4] */
+        x3 = _SIMD32_OFFSET(px+1);
+
+        /* acc2 +=  x[2] * y[srcBLen - 1] + x[3] * y[srcBLen - 2] */
+        acc2 = __SMLADX(x2, c0, acc2);
+
+        /* acc3 +=  x[3] * y[srcBLen - 1] + x[4] * y[srcBLen - 2] */
+        acc3 = __SMLADX(x3, c0, acc3);
+
+        /* Read y[srcBLen - 3] and y[srcBLen - 4] */
+        c0 = *__SIMD32(py)--;
+
+        /* acc0 +=  x[2] * y[srcBLen - 3] + x[3] * y[srcBLen - 4] */
+        acc0 = __SMLADX(x2, c0, acc0);
+
+        /* acc1 +=  x[3] * y[srcBLen - 3] + x[4] * y[srcBLen - 4] */
+        acc1 = __SMLADX(x3, c0, acc1);
+
+        /* Read x[4], x[5] */
+        x0 = _SIMD32_OFFSET(px+2);
+
+        /* Read x[5], x[6] */
+        x1 = _SIMD32_OFFSET(px+3);
+		px += 4u;
+
+        /* acc2 +=  x[4] * y[srcBLen - 3] + x[5] * y[srcBLen - 4] */
+        acc2 = __SMLADX(x0, c0, acc2);
+
+        /* acc3 +=  x[5] * y[srcBLen - 3] + x[6] * y[srcBLen - 4] */
+        acc3 = __SMLADX(x1, c0, acc3);
+
+      } while(--k);
+
+      /* For the next MAC operations, SIMD is not used.   
+       * So, the 16-bit pointer of inputB, py, is updated */
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      if(k == 1u)
+      {
+        /* Read y[srcBLen - 5] */
+        c0 = *(py+1);
+
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+
+#else
+
+        c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[7] */
+        x3 = *__SIMD32(px);
+		px++;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLADX(x1, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      if(k == 2u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+        c0 = _SIMD32_OFFSET(py);
+
+        /* Read x[7], x[8] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[9] */
+        x2 = _SIMD32_OFFSET(px+1);
+		px += 2u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x0, c0, acc0);
+        acc1 = __SMLADX(x1, c0, acc1);
+        acc2 = __SMLADX(x3, c0, acc2);
+        acc3 = __SMLADX(x2, c0, acc3);
+      }
+
+      if(k == 3u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+        c0 = _SIMD32_OFFSET(py);
+
+        /* Read x[7], x[8] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[9] */
+        x2 = _SIMD32_OFFSET(px+1);
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x0, c0, acc0);
+        acc1 = __SMLADX(x1, c0, acc1);
+        acc2 = __SMLADX(x3, c0, acc2);
+        acc3 = __SMLADX(x2, c0, acc3);
+
+        /* Read y[srcBLen - 7] */
+		c0 = *(py-1);
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+#else
+
+        c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[10] */
+        x3 =  _SIMD32_OFFSET(px+2);
+		px += 3u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x1, c0, acc0);
+        acc1 = __SMLAD(x2, c0, acc1);
+        acc2 = __SMLADX(x2, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      /* Store the results in the accumulators in the destination buffer. */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pOut)++ = __PKHBT((acc0 >> 15), (acc1 >> 15), 16);
+      *__SIMD32(pOut)++ = __PKHBT((acc2 >> 15), (acc3 >> 15), 16);
+
+#else
+
+      *__SIMD32(pOut)++ = __PKHBT((acc1 >> 15), (acc0 >> 15), 16);
+      *__SIMD32(pOut)++ = __PKHBT((acc3 >> 15), (acc2 >> 15), 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,   
+     * the blockSize2 loop cannot be unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* srcBLen number of MACS should be performed */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+
+  /* --------------------------   
+   * Initializations of stage3   
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]   
+   * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]   
+   * ....   
+   * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]   
+   * sum +=  x[srcALen-1] * y[srcBLen-1]   
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.   
+     The blockSize3 variable holds the number of MAC operations performed */
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  pIn2 = pSrc2 - 1u;
+  py = pIn2;
+
+  /* -------------------   
+   * Stage3 process   
+   * ------------------*/
+
+  /* For loop unrolling by 4, this stage is divided into two. */
+  /* First part of this stage computes the MAC operations greater than 4 */
+  /* Second part of this stage computes the MAC operations less than or equal to 4 */
+
+  /* The first part of the stage starts here */
+  j = blockSize3 >> 2u;
+
+  while((j > 0u) && (blockSize3 > 0u))
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = blockSize3 >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[srcALen - srcBLen + 1], x[srcALen - srcBLen + 2] are multiplied   
+       * with y[srcBLen - 1], y[srcBLen - 2] respectively */
+      sum = __SMLADX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+      /* x[srcALen - srcBLen + 3], x[srcALen - srcBLen + 4] are multiplied   
+       * with y[srcBLen - 3], y[srcBLen - 4] respectively */
+      sum = __SMLADX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* For the next MAC operations, the pointer py is used without SIMD   
+     * So, py is incremented by 1 */
+    py = py + 1u;
+
+    /* If the blockSize3 is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = blockSize3 % 0x4u;
+
+    while(k > 0u)
+    {
+      /* sum += x[srcALen - srcBLen + 5] * y[srcBLen - 5] */
+      sum = __SMLAD(*px++, *py--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (sum >> 15);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+
+    j--;
+  }
+
+  /* The second part of the stage starts here */
+  /* SIMD is not used for the next MAC operations,   
+   * so pointer py is updated to read only one sample at a time */
+  py = py + 1u;
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Loop over the remaining blockSize3 MACs. No loop unrolling is used. */
+    k = blockSize3;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+      sum = __SMLAD(*px++, *py--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (sum >> 15);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pSrc2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *pOut = pDst;                            /* output pointer */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulator */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1, *pSrc2;                          /* Intermediate pointers */
+  q31_t x0, x1, x2, x3, c0;                      /* Temporary variables to hold state and coefficient values */
+  uint32_t blockSize1, blockSize2, blockSize3, j, k, count, blkCnt;     /* loop counter */
+  q15_t a, b;
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+  /* The function is internally   
+   * divided into three stages according to the number of multiplications that have to   
+   * take place between inputA samples and inputB samples. In the first stage of the   
+   * algorithm, the multiplications increase by one for every iteration.   
+   * In the second stage of the algorithm, srcBLen number of multiplications are done.   
+   * In the third stage of the algorithm, the multiplications decrease by one   
+   * for every iteration. */
+
+  /* The algorithm is implemented in three stages.   
+     The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------   
+   * Initializations of stage1   
+   * -------------------------*/
+
+  /* sum = x[0] * y[0]   
+   * sum = x[0] * y[1] + x[1] * y[0]   
+   * ....   
+   * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]   
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+
+  /* ------------------------   
+   * Stage1 process   
+   * ----------------------*/
+
+  /* For loop unrolling by 4, this stage is divided into two. */
+  /* First part of this stage computes the MAC operations less than 4 */
+  /* Second part of this stage computes the MAC operations greater than or equal to 4 */
+
+  /* The first part of the stage starts here */
+  while((count < 4u) && (blockSize1 > 0u))
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Loop over number of MAC operations between   
+     * inputA samples and inputB samples */
+    k = count;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum += ((q31_t) * px++ * *py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (sum >> 15);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* The second part of the stage starts here */
+  /* The internal loop, over count, is unrolled by 4 */
+  /* To read the last two inputB samples using SIMD:   
+   * y[srcBLen] and y[srcBLen-1] coefficients, py is decremented by 1 */
+  py = py - 1;
+
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+	py++;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum += ((q31_t) * px++ * *py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (sum >> 15);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + (count - 1u);
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------   
+   * Initializations of stage2   
+   * ------------------------*/
+
+  /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]   
+   * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]   
+   * ....   
+   * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]   
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+
+  /* --------------------   
+   * Stage2 process   
+   * -------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+   * So, to loop unroll over blockSize2,   
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      py = py - 1u;
+
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;	  
+
+      /* read x[0], x[1] samples */
+	  a = *px++;
+	  b = *px++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x0 = __PKHBT(a, b, 16);
+	  a = *px;
+	  x1 = __PKHBT(b, a, 16);
+
+#else
+
+	  x0 = __PKHBT(b, a, 16);
+	  a = *px;
+	  x1 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read the last two inputB samples using SIMD:   
+         * y[srcBLen - 1] and y[srcBLen - 2] */
+		a = *py;
+		b = *(py+1);
+		py -= 2;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		c0 = __PKHBT(a, b, 16);
+
+#else
+
+ 		c0 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* acc0 +=  x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] */
+        acc0 = __SMLADX(x0, c0, acc0);
+
+        /* acc1 +=  x[1] * y[srcBLen - 1] + x[2] * y[srcBLen - 2] */
+        acc1 = __SMLADX(x1, c0, acc1);
+
+	  a = *px;
+	  b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x2 = __PKHBT(a, b, 16);
+	  a = *(px + 2);
+	  x3 = __PKHBT(b, a, 16);
+
+#else
+
+	  x2 = __PKHBT(b, a, 16);
+	  a = *(px + 2);
+	  x3 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+
+        /* acc2 +=  x[2] * y[srcBLen - 1] + x[3] * y[srcBLen - 2] */
+        acc2 = __SMLADX(x2, c0, acc2);
+
+        /* acc3 +=  x[3] * y[srcBLen - 1] + x[4] * y[srcBLen - 2] */
+        acc3 = __SMLADX(x3, c0, acc3);
+
+        /* Read y[srcBLen - 3] and y[srcBLen - 4] */
+		a = *py;
+		b = *(py+1);
+		py -= 2;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		c0 = __PKHBT(a, b, 16);
+
+#else
+
+ 		c0 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* acc0 +=  x[2] * y[srcBLen - 3] + x[3] * y[srcBLen - 4] */
+        acc0 = __SMLADX(x2, c0, acc0);
+
+        /* acc1 +=  x[3] * y[srcBLen - 3] + x[4] * y[srcBLen - 4] */
+        acc1 = __SMLADX(x3, c0, acc1);
+
+        /* Read x[4], x[5], x[6] */
+	  a = *(px + 2);
+	  b = *(px + 3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x0 = __PKHBT(a, b, 16);
+	  a = *(px + 4);
+	  x1 = __PKHBT(b, a, 16);
+
+#else
+
+	  x0 = __PKHBT(b, a, 16);
+	  a = *(px + 4);
+	  x1 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+
+		px += 4u;
+
+        /* acc2 +=  x[4] * y[srcBLen - 3] + x[5] * y[srcBLen - 4] */
+        acc2 = __SMLADX(x0, c0, acc2);
+
+        /* acc3 +=  x[5] * y[srcBLen - 3] + x[6] * y[srcBLen - 4] */
+        acc3 = __SMLADX(x1, c0, acc3);
+
+      } while(--k);
+
+      /* For the next MAC operations, SIMD is not used.   
+       * So, the 16-bit pointer of inputB, py, is updated */
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      if(k == 1u)
+      {
+        /* Read y[srcBLen - 5] */
+        c0 = *(py+1);
+
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+
+#else
+
+        c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[7] */
+		a = *px;
+		b = *(px+1);
+		px++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		x3 = __PKHBT(a, b, 16);
+
+#else
+
+ 		x3 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLADX(x1, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      if(k == 2u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+		a = *py;
+		b = *(py+1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		c0 = __PKHBT(a, b, 16);
+
+#else
+
+ 		c0 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* Read x[7], x[8], x[9] */
+	  a = *px;
+	  b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x3 = __PKHBT(a, b, 16);
+	  a = *(px + 2);
+	  x2 = __PKHBT(b, a, 16);
+
+#else
+
+	  x3 = __PKHBT(b, a, 16);
+	  a = *(px + 2);
+	  x2 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+		px += 2u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x0, c0, acc0);
+        acc1 = __SMLADX(x1, c0, acc1);
+        acc2 = __SMLADX(x3, c0, acc2);
+        acc3 = __SMLADX(x2, c0, acc3);
+      }
+
+      if(k == 3u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+		a = *py;
+		b = *(py+1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		c0 = __PKHBT(a, b, 16);
+
+#else
+
+ 		c0 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* Read x[7], x[8], x[9] */
+	  a = *px;
+	  b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x3 = __PKHBT(a, b, 16);
+	  a = *(px + 2);
+	  x2 = __PKHBT(b, a, 16);
+
+#else
+
+	  x3 = __PKHBT(b, a, 16);
+	  a = *(px + 2);
+	  x2 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x0, c0, acc0);
+        acc1 = __SMLADX(x1, c0, acc1);
+        acc2 = __SMLADX(x3, c0, acc2);
+        acc3 = __SMLADX(x2, c0, acc3);
+
+        /* Read y[srcBLen - 7] */
+		c0 = *(py-1);
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+#else
+
+        c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[10] */
+		a = *(px+2);
+		b = *(px+3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		x3 = __PKHBT(a, b, 16);
+
+#else
+
+ 		x3 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+		px += 3u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x1, c0, acc0);
+        acc1 = __SMLAD(x2, c0, acc1);
+        acc2 = __SMLADX(x2, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      /* Store the results in the accumulators in the destination buffer. */
+	  *pOut++ = (q15_t)(acc0 >> 15);
+	  *pOut++ = (q15_t)(acc1 >> 15);
+	  *pOut++ = (q15_t)(acc2 >> 15);
+	  *pOut++ = (q15_t)(acc3 >> 15);
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,
+     * the blockSize2 loop cannot be unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* srcBLen number of MACS should be performed */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+
+  /* --------------------------   
+   * Initializations of stage3   
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]   
+   * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]   
+   * ....   
+   * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]   
+   * sum +=  x[srcALen-1] * y[srcBLen-1]   
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.   
+     The blockSize3 variable holds the number of MAC operations performed */
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  pIn2 = pSrc2 - 1u;
+  py = pIn2;
+
+  /* -------------------   
+   * Stage3 process   
+   * ------------------*/
+
+  /* For loop unrolling by 4, this stage is divided into two. */
+  /* First part of this stage computes the MAC operations greater than 4 */
+  /* Second part of this stage computes the MAC operations less than or equal to 4 */
+
+  /* The first part of the stage starts here */
+  j = blockSize3 >> 2u;
+
+  while((j > 0u) && (blockSize3 > 0u))
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = blockSize3 >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+	py++;
+
+    while(k > 0u)
+    {	
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the blockSize3 is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = blockSize3 % 0x4u;
+
+    while(k > 0u)
+    {
+      /* sum += x[srcALen - srcBLen + 5] * y[srcBLen - 5] */
+        sum += ((q31_t) * px++ * *py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (sum >> 15);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+
+    j--;
+  }
+
+  /* The second part of the stage starts here */
+  /* SIMD is not used for the next MAC operations,   
+   * so pointer py is updated to read only one sample at a time */
+  py = py + 1u;
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* blockSize3 MACs are performed; no loop unrolling is used here. */
+    k = blockSize3;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+        sum += ((q31_t) * px++ * *py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (sum >> 15);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pSrc2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+}
+
+/**   
+ * @} end of Conv group   
+ */
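
Note on the scalar loops in the Q15 routine whose diff ends above: they accumulate Q15 products in a 32-bit `sum` and store `(q15_t) (sum >> 15)`. The sketch below shows that arithmetic as a plain direct-form convolution. It is only an illustration: `ref_conv_q15` and its parameter names are hypothetical and not part of CMSIS-DSP, it does none of the SIMD packing or stage splitting, and it assumes the inputs are small enough that the 32-bit sum cannot overflow (the fast routine offers no overflow protection either).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference (not a CMSIS-DSP function): direct-form Q15
 * convolution with a 32-bit accumulator and a truncating >> 15, mirroring
 * the scalar "sum += x * y; out = sum >> 15" loops in the diff above.
 * dst must hold a_len + b_len - 1 samples. */
void ref_conv_q15(const int16_t *a, uint32_t a_len,
                  const int16_t *b, uint32_t b_len,
                  int16_t *dst)
{
    assert(a_len > 0u && b_len > 0u);

    for (uint32_t n = 0u; n < a_len + b_len - 1u; n++) {
        int32_t sum = 0;

        /* k indexes b and (n - k) indexes a; clamp k so both stay in range. */
        uint32_t k_min = (n >= a_len) ? (n - a_len + 1u) : 0u;
        uint32_t k_max = (n < b_len) ? n : (b_len - 1u);

        for (uint32_t k = k_min; k <= k_max; k++) {
            /* 1.15 x 1.15 product is 2.30; no overflow check, by assumption. */
            sum += (int32_t)a[n - k] * b[k];
        }

        /* 2.30 -> 1.15 by discarding the low 15 bits (no saturation). */
        dst[n] = (int16_t)(sum >> 15);
    }
}
```

A reference like this can serve as a comparison point when checking the unrolled code paths on small inputs, with the caveat that it was written for this note and not taken from the library.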

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_fast_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,577 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_fast_q31.c    
+*    
+* Description:	Q31 Convolution (fast version).    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Conv    
+ * @{    
+ */
+
+/**    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * This function is optimized for speed at the expense of fixed-point precision and overflow protection.    
+ * The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format.    
+ * These intermediate results are accumulated in a 32-bit register in 2.30 format.    
+ * Finally, the accumulator is converted to a 1.31 result by a 1-bit left shift (the code below does not saturate).
+ *    
+ * \par    
+ * The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 32 bits of each multiplication result.    
+ * In order to avoid overflows completely the input signals must be scaled down.    
+ * Scale down the inputs by log2(min(srcALen, srcBLen)) bits (log2 is the base-2 logarithm) to avoid overflows,
+ * as at most min(srcALen, srcBLen) additions are carried out internally.
+ *    
+ * \par    
+ * See <code>arm_conv_q31()</code> for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.    
+ */
+
+void arm_conv_fast_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst)
+{
+  q31_t *pIn1;                                   /* inputA pointer */
+  q31_t *pIn2;                                   /* inputB pointer */
+  q31_t *pOut = pDst;                            /* output pointer */
+  q31_t *px;                                     /* Intermediate inputA pointer  */
+  q31_t *py;                                     /* Intermediate inputB pointer  */
+  q31_t *pSrc1, *pSrc2;                          /* Intermediate pointers */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulator */
+  q31_t x0, x1, x2, x3, c0;                      /* Temporary variables to hold state and coefficient values */
+  uint32_t j, k, count, blkCnt, blockSize1, blockSize2, blockSize3;     /* loop counter */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+  /* The function is internally
+   * divided into three stages according to the number of multiplications that
+   * take place between inputA samples and inputB samples. In the first stage of the
+   * algorithm, the number of multiplications increases by one for every iteration.
+   * In the second stage of the algorithm, srcBLen multiplications are done.
+   * In the third stage of the algorithm, the number of multiplications decreases by one
+   * for every iteration. */
+
+  /* The algorithm is implemented in three stages.
+     The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------    
+   * Initializations of stage1    
+   * -------------------------*/
+
+  /* sum = x[0] * y[0]    
+   * sum = x[0] * y[1] + x[1] * y[0]    
+   * ....    
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 3] +...+ x[srcBLen - 2] * y[0]
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+
+  /* ------------------------    
+   * Stage1 process    
+   * ----------------------*/
+
+  /* The first stage starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[srcBLen - 1] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* x[1] * y[srcBLen - 2] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* x[2] * y[srcBLen - 3] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* x[3] * y[srcBLen - 4] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = sum << 1;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------    
+   * Initializations of stage2    
+   * ------------------------*/
+
+  /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]    
+   * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]    
+   * ....    
+   * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------    
+   * Stage2 process    
+   * ------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.    
+   * So, to loop unroll over blockSize2,    
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* read x[0], x[1], x[2] samples */
+      x0 = *(px++);
+      x1 = *(px++);
+      x2 = *(px++);
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read y[srcBLen - 1] sample */
+        c0 = *(py--);
+
+        /* Read x[3] sample */
+        x3 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[0] * y[srcBLen - 1] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+        /* acc1 +=  x[1] * y[srcBLen - 1] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+        /* acc2 +=  x[2] * y[srcBLen - 1] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x2 * c0)) >> 32);
+
+        /* acc3 +=  x[3] * y[srcBLen - 1] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x3 * c0)) >> 32);
+
+        /* Read y[srcBLen - 2] sample */
+        c0 = *(py--);
+
+        /* Read x[4] sample */
+        x0 = *(px++);
+
+        /* Perform the multiply-accumulate */
+        /* acc0 +=  x[1] * y[srcBLen - 2] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x1 * c0)) >> 32);
+        /* acc1 +=  x[2] * y[srcBLen - 2] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x2 * c0)) >> 32);
+        /* acc2 +=  x[3] * y[srcBLen - 2] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x3 * c0)) >> 32);
+        /* acc3 +=  x[4] * y[srcBLen - 2] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+        /* Read y[srcBLen - 3] sample */
+        c0 = *(py--);
+
+        /* Read x[5] sample */
+        x1 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[2] * y[srcBLen - 3] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x2 * c0)) >> 32);
+        /* acc1 +=  x[3] * y[srcBLen - 3] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x3 * c0)) >> 32);
+        /* acc2 +=  x[4] * y[srcBLen - 3] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x0 * c0)) >> 32);
+        /* acc3 +=  x[5] * y[srcBLen - 3] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+        /* Read y[srcBLen - 4] sample */
+        c0 = *(py--);
+
+        /* Read x[6] sample */
+        x2 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[3] * y[srcBLen - 4] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x3 * c0)) >> 32);
+        /* acc1 +=  x[4] * y[srcBLen - 4] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x0 * c0)) >> 32);
+        /* acc2 +=  x[5] * y[srcBLen - 4] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x1 * c0)) >> 32);
+        /* acc3 +=  x[6] * y[srcBLen - 4] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x2 * c0)) >> 32);
+
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Read y[srcBLen - 5] sample */
+        c0 = *(py--);
+
+        /* Read x[7] sample */
+        x3 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[4] * y[srcBLen - 5] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+        /* acc1 +=  x[5] * y[srcBLen - 5] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+        /* acc2 +=  x[6] * y[srcBLen - 5] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x2 * c0)) >> 32);
+        /* acc3 +=  x[7] * y[srcBLen - 5] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x3 * c0)) >> 32);
+
+        /* Reuse the present samples for the next MAC */
+        x0 = x1;
+        x1 = x2;
+        x2 = x3;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the results in the accumulators in the destination buffer. */
+      *pOut++ = (q31_t) (acc0 << 1);
+      *pOut++ = (q31_t) (acc1 << 1);
+      *pOut++ = (q31_t) (acc2 << 1);
+      *pOut++ = (q31_t) (acc3 << 1);
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = sum << 1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,
+     * the blockSize2 loop cannot be unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* srcBLen number of MACS should be performed */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = sum << 1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+
+  /* --------------------------    
+   * Initializations of stage3    
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]    
+   * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]    
+   * ....    
+   * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]    
+   * sum +=  x[srcALen-1] * y[srcBLen-1]    
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.    
+     The blockSize3 variable holds the number of MAC operations performed */
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* -------------------    
+   * Stage3 process    
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = blockSize3 >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* sum += x[srcALen - srcBLen + 1] * y[srcBLen - 1] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* sum += x[srcALen - srcBLen + 2] * y[srcBLen - 2] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* sum += x[srcALen - srcBLen + 3] * y[srcBLen - 3] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* sum += x[srcALen - srcBLen + 4] * y[srcBLen - 4] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the blockSize3 is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = blockSize3 % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py--))) >> 32);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = sum << 1;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pSrc2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+}
+
+/**    
+ * @} end of Conv group    
+ */
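
Note on the accumulation in arm_conv_fast_q31() above: each step computes `(((q63_t) sum << 32) + ((q63_t) x * y)) >> 32`, so only the high 32 bits of the running total survive, and the result is shifted left by one bit on output. The sketch below isolates that step and compares it with a 64-bit accumulation to show where precision is lost. All three function names are hypothetical and not part of CMSIS-DSP; `dot_wide_q31` is only a comparison point, not the library's arm_conv_q31(). The sketch multiplies by 2^32 where the original shifts left by 32, to avoid left-shifting a negative value, and it assumes inputs small enough that nothing overflows.

```c
#include <assert.h>
#include <stdint.h>

/* One MAC step of the fast path: add the 64-bit product to the accumulator
 * (placed in the high word) and keep only the high 32 bits of the total. */
int32_t mac_high32_q31(int32_t sum, int32_t x, int32_t y)
{
    int64_t acc = (int64_t)sum * 4294967296LL + (int64_t)x * y;
    return (int32_t)(acc >> 32);
}

/* Dot product using the fast step; the low 32 bits are dropped every step. */
int32_t dot_fast_q31(const int32_t *x, const int32_t *y, uint32_t n)
{
    assert(x && y);
    int32_t sum = 0;
    for (uint32_t i = 0u; i < n; i++) {
        sum = mac_high32_q31(sum, x[i], y[i]);
    }
    /* Output conversion as in the source: shift left by 1, no saturation. */
    return (int32_t)((uint32_t)sum << 1);
}

/* Comparison point: accumulate full 64-bit products, convert once at the end. */
int32_t dot_wide_q31(const int32_t *x, const int32_t *y, uint32_t n)
{
    assert(x && y);
    int64_t acc = 0;
    for (uint32_t i = 0u; i < n; i++) {
        acc += (int64_t)x[i] * y[i];
    }
    return (int32_t)(acc >> 31);
}
```

With two products of 2^31 each (for example x = 65536 and y = 32768, twice), the fast path returns 0 because each product vanishes when the low 32 bits are dropped, while the 64-bit accumulation returns 2. For larger operands such as 0.5 x 0.5 both paths agree.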

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_opt_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_opt_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,545 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_opt_q15.c    
+*    
+* Description:	Convolution of Q15 sequences.      
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Conv    
+ * @{    
+ */
+
+/**    
+ * @brief Convolution of Q15 sequences.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.    
+ * @param[in]  *pScratch1 points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.    
+ * @param[in]  *pScratch2 points to scratch buffer of size min(srcALen, srcBLen).    
+ * @return none.    
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output, scratch1 and scratch2 buffers should be 32-bit aligned.
+ *    
+ *       
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * Both inputs are in 1.15 format and multiplications yield a 2.30 result.    
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.    
+ * This approach provides 33 guard bits and there is no risk of overflow.    
+ * The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.    
+ *  
+ *   
+ * \par    
+ * Refer to <code>arm_conv_fast_q15()</code> for a faster but less precise version of this function for Cortex-M3 and Cortex-M4.     
+ * 
+ *  
+ */
+
+void arm_conv_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+  q63_t acc0, acc1, acc2, acc3;                  /* Accumulator */
+  q31_t x1, x2, x3;                              /* Temporary variables to hold state and coefficient values */
+  q31_t y1, y2;                                  /* State variables */
+  q15_t *pOut = pDst;                            /* output pointer */
+  q15_t *pScr1 = pScratch1;                      /* Temporary pointer for scratch1 */
+  q15_t *pScr2 = pScratch2;                      /* Temporary pointer for scratch2 */
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  uint32_t j, k, blkCnt;                         /* loop counter */
+  uint32_t tapCnt;                               /* loop count */
+#ifdef UNALIGNED_SUPPORT_DISABLE
+
+  q15_t a, b;
+
+#endif	/*	#ifdef UNALIGNED_SUPPORT_DISABLE	*/
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* pointer to take end of scratch2 buffer */
+  pScr2 = pScratch2 + srcBLen - 1;
+
+  /* points to smaller length sequence */
+  px = pIn2;
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = srcBLen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  /* Copy smaller length input sequence in reverse order into second scratch buffer */
+  while(k > 0u)
+  {
+    /* copy second buffer in reversal manner */
+    *pScr2-- = *px++;
+    *pScr2-- = *px++;
+    *pScr2-- = *px++;
+    *pScr2-- = *px++;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = srcBLen % 0x4u;
+
+  while(k > 0u)
+  {
+    /* copy second buffer in reversal manner for remaining samples */
+    *pScr2-- = *px++;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* Initialize temporary scratch pointer */
+  pScr1 = pScratch1;
+
+  /* Assuming scratch1 buffer is aligned by 32-bit */
+  /* Fill (srcBLen - 1u) zeros in scratch buffer */
+  arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+  /* Update temporary scratch pointer */
+  pScr1 += (srcBLen - 1u);
+
+  /* Copy bigger length sequence(srcALen) samples in scratch1 buffer */
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Copy (srcALen) samples in scratch buffer */
+  arm_copy_q15(pIn1, pScr1, srcALen);
+
+  /* Update pointers */
+  pScr1 += srcALen;
+
+#else
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = srcALen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* copy the longer sequence into scratch1 in forward order */
+    *pScr1++ = *pIn1++;
+    *pScr1++ = *pIn1++;
+    *pScr1++ = *pIn1++;
+    *pScr1++ = *pIn1++;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = srcALen % 0x4u;
+
+  while(k > 0u)
+  {
+    /* copy the remaining samples of the longer sequence in forward order */
+    *pScr1++ = *pIn1++;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+#endif
+
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+  arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+  /* Update pointer */
+  pScr1 += (srcBLen - 1u);
+
+#else
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = (srcBLen - 1u) >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* zero-fill the tail of the scratch1 buffer */
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = (srcBLen - 1u) % 0x4u;
+
+  while(k > 0u)
+  {
+    /* zero-fill the remaining tail samples of the scratch1 buffer */
+    *pScr1++ = 0;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+#endif
+
+  /* Temporary pointer for scratch2 */
+  py = pScratch2;
+
+
+  /* Initialization of pIn2 pointer */
+  pIn2 = py;
+
+  /* First part of the processing with loop unrolling process 4 data points at a time.       
+   ** a second loop below process for the remaining 1 to 3 samples. */
+
+  /* Actual convolution process starts here */
+  blkCnt = (srcALen + srcBLen - 1u) >> 2;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer as scratch1 */
+    pScr1 = pScratch1;
+
+    /* Clear Accumulators */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Read two samples from scratch1 buffer */
+    x1 = *__SIMD32(pScr1)++;
+
+    /* Read next two samples from scratch1 buffer */
+    x2 = *__SIMD32(pScr1)++;
+
+    tapCnt = (srcBLen) >> 2u;
+
+    while(tapCnt > 0u)
+    {
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+      /* Read four samples from smaller buffer */
+      y1 = _SIMD32_OFFSET(pIn2);
+      y2 = _SIMD32_OFFSET(pIn2 + 2u);
+
+      /* multiply and accumulate */
+      acc0 = __SMLALD(x1, y1, acc0);
+      acc2 = __SMLALD(x2, y1, acc2);
+
+      /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      /* multiply and accumulate */
+      acc1 = __SMLALDX(x3, y1, acc1);
+
+      /* Read next two samples from scratch1 buffer */
+      x1 = _SIMD32_OFFSET(pScr1);
+
+      /* multiply and accumulate */
+      acc0 = __SMLALD(x2, y2, acc0);
+      acc2 = __SMLALD(x1, y2, acc2);
+
+      /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLALDX(x3, y1, acc3);
+      acc1 = __SMLALDX(x3, y2, acc1);
+
+      x2 = _SIMD32_OFFSET(pScr1 + 2u);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLALDX(x3, y2, acc3);
+
+#else	 
+
+      /* Read four samples from smaller buffer */
+	  a = *pIn2;
+	  b = *(pIn2 + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      y1 = __PKHBT(a, b, 16);
+#else
+      y1 = __PKHBT(b, a, 16);
+#endif
+	  
+	  a = *(pIn2 + 2);
+	  b = *(pIn2 + 3);
+#ifndef ARM_MATH_BIG_ENDIAN
+      y2 = __PKHBT(a, b, 16);
+#else
+      y2 = __PKHBT(b, a, 16);
+#endif				
+
+      acc0 = __SMLALD(x1, y1, acc0);
+
+      acc2 = __SMLALD(x2, y1, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc1 = __SMLALDX(x3, y1, acc1);
+
+	  a = *pScr1;
+	  b = *(pScr1 + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(a, b, 16);
+#else
+      x1 = __PKHBT(b, a, 16);
+#endif
+
+      acc0 = __SMLALD(x2, y2, acc0);
+
+      acc2 = __SMLALD(x1, y2, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLALDX(x3, y1, acc3);
+
+      acc1 = __SMLALDX(x3, y2, acc1);
+
+	  a = *(pScr1 + 2);
+	  b = *(pScr1 + 3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x2 = __PKHBT(a, b, 16);
+#else
+      x2 = __PKHBT(b, a, 16);
+#endif
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLALDX(x3, y2, acc3);
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+      pIn2 += 4u;
+      pScr1 += 4u;
+
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Update scratch pointer for remaining samples of smaller length sequence */
+    pScr1 -= 4u;
+
+    /* apply same above for remaining samples of smaller length sequence */
+    tapCnt = (srcBLen) & 3u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr1++ * *pIn2);
+      acc1 += (*pScr1++ * *pIn2);
+      acc2 += (*pScr1++ * *pIn2);
+      acc3 += (*pScr1++ * *pIn2++);
+
+      pScr1 -= 3u;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+
+    /* Store the results in the accumulators in the destination buffer. */
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pOut)++ =
+      __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+
+    *__SIMD32(pOut)++ =
+      __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+#else
+
+    *__SIMD32(pOut)++ =
+      __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+
+    *__SIMD32(pOut)++ =
+      __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+
+#endif /*      #ifndef ARM_MATH_BIG_ENDIAN       */
+
+    /* Initialization of inputB pointer */
+    pIn2 = py;
+
+    pScratch1 += 4u;
+
+  }
+
+
+  blkCnt = (srcALen + srcBLen - 1u) & 0x3;
+
+  /* Calculate convolution for remaining samples of Bigger length sequence */
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer as scratch1 */
+    pScr1 = pScratch1;
+
+    /* Clear Accumulators */
+    acc0 = 0;
+
+    tapCnt = (srcBLen) >> 1u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* Read next two samples from scratch1 buffer */
+      acc0 += (*pScr1++ * *pIn2++);
+      acc0 += (*pScr1++ * *pIn2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    tapCnt = (srcBLen) & 1u;
+
+    /* apply same above for remaining samples of smaller length sequence */
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr1++ * *pIn2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+    /* The result is in 2.30 format.  Convert to 1.15 with saturation.       
+     ** Then store the output in the destination buffer. */
+    *pOut++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+
+    /* Initialization of inputB pointer */
+    pIn2 = py;
+
+    pScratch1 += 1u;
+
+  }
+
+}
+
+
+/**    
+ * @} end of Conv group    
+ */
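
The scaling rules documented for arm_conv_opt_q15() (1.15 inputs, 2.30 products, a 64-bit accumulator, a 15-bit right shift, then saturation to 1.15) can be checked against a plain, unoptimised reference. The sketch below is not part of the changeset or of CMSIS-DSP: conv_q15_ref and sat_q15 are hypothetical names, plain int16_t stands in for q15_t, and it assumes the compiler performs an arithmetic right shift on negative values, as the library code itself does.

```c
#include <stdint.h>

/* Portable stand-in for __SSAT(x, 16): clamp to the signed 16-bit range. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical reference convolution; dst must hold a_len + b_len - 1 samples. */
static void conv_q15_ref(const int16_t *a, uint32_t a_len,
                         const int16_t *b, uint32_t b_len,
                         int16_t *dst)
{
    for (uint32_t n = 0; n < a_len + b_len - 1u; n++) {
        int64_t acc = 0;                         /* 34.30 accumulator */
        for (uint32_t k = 0; k < b_len; k++) {
            if (n >= k && (n - k) < a_len) {
                /* 1.15 * 1.15 -> 2.30; the product always fits in 32 bits */
                acc += (int32_t)a[n - k] * b[k];
            }
        }
        /* Drop the low 15 bits (assumes arithmetic shift), then saturate to 1.15. */
        dst[n] = sat_q15(acc >> 15);
    }
}
```

Comparing this reference with the optimised routine on the same inputs is one way to confirm that the scratch-buffer and SIMD paths preserve the documented rounding and saturation.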

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_opt_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_opt_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,435 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_opt_q7.c    
+*    
+* Description:	Convolution of Q7 sequences.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Conv    
+ * @{    
+ */
+
+/**    
+ * @brief Convolution of Q7 sequences.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.    
+ * @param[in]  *pScratch1 points to scratch buffer(of type q15_t) of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.   
+ * @param[in]  *pScratch2 points to scratch buffer (of type q15_t) of size min(srcALen, srcBLen).   
+ * @return none.    
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output, scratch1 and scratch2 buffers should be 32-bit aligned.
+ *       
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 32-bit internal accumulator.    
+ * Both the inputs are represented in 1.7 format and multiplications yield a 2.14 result.    
+ * The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format.    
+ * This approach provides 17 guard bits and there is no risk of overflow as long as <code>max(srcALen, srcBLen)<131072</code>.    
+ * The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and then saturated to 1.7 format.    
+ *
+ */
+
+void arm_conv_opt_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+
+  q15_t *pScr2, *pScr1;                          /* Intermediate pointers for scratch pointers */
+  q15_t x4;                                      /* Temporary input variable */
+  q7_t *pIn1, *pIn2;                             /* inputA and inputB pointer */
+  uint32_t j, k, blkCnt, tapCnt;                 /* loop counter */
+  q7_t *px;                                      /* Temporary input1 pointer */
+  q15_t *py;                                     /* Temporary input2 pointer */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulator */
+  q31_t x1, x2, x3, y1;                          /* Temporary input variables */
+  q7_t *pOut = pDst;                             /* output pointer */
+  q7_t out0, out1, out2, out3;                   /* temporary variables */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* pointer to the start of scratch2 buffer */
+  pScr2 = pScratch2;
+
+  /* points to smaller length sequence */
+  px = pIn2 + srcBLen - 1;
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = srcBLen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* copy second buffer in reversal manner */
+    x4 = (q15_t) * px--;
+    *pScr2++ = x4;
+    x4 = (q15_t) * px--;
+    *pScr2++ = x4;
+    x4 = (q15_t) * px--;
+    *pScr2++ = x4;
+    x4 = (q15_t) * px--;
+    *pScr2++ = x4;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = srcBLen % 0x4u;
+
+  while(k > 0u)
+  {
+    /* copy second buffer in reversal manner for remaining samples */
+    x4 = (q15_t) * px--;
+    *pScr2++ = x4;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* Initialize temporary scratch pointer */
+  pScr1 = pScratch1;
+
+  /* Fill (srcBLen - 1u) zeros in scratch buffer */
+  arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+  /* Update temporary scratch pointer */
+  pScr1 += (srcBLen - 1u);
+
+  /* Copy (srcALen) samples in scratch buffer */
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = srcALen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* copy the longer sequence into scratch1 in forward order, widening to q15 */
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = srcALen % 0x4u;
+
+  while(k > 0u)
+  {
+    /* copy the remaining samples of the longer sequence in forward order */
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+  arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+  /* Update pointer */
+  pScr1 += (srcBLen - 1u);
+
+#else
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  k = (srcBLen - 1u) >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* zero-fill the tail of the scratch1 buffer */
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = (srcBLen - 1u) % 0x4u;
+
+  while(k > 0u)
+  {
+    /* zero-fill the remaining tail samples of the scratch1 buffer */
+    *pScr1++ = 0;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+#endif
+
+  /* Temporary pointer for scratch2 */
+  py = pScratch2;
+
+  /* Initialization of pIn2 pointer */
+  pIn2 = (q7_t *) py;
+
+  pScr2 = py;
+
+  /* Actual convolution process starts here */
+  blkCnt = (srcALen + srcBLen - 1u) >> 2;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer as scratch1 */
+    pScr1 = pScratch1;
+
+    /* Clear Accumulators */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Read two samples from scratch1 buffer */
+    x1 = *__SIMD32(pScr1)++;
+
+    /* Read next two samples from scratch1 buffer */
+    x2 = *__SIMD32(pScr1)++;
+
+    tapCnt = (srcBLen) >> 2u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* Read two samples from smaller buffer */
+      y1 = _SIMD32_OFFSET(pScr2);
+
+      /* multiply and accumulate */
+      acc0 = __SMLAD(x1, y1, acc0);
+      acc2 = __SMLAD(x2, y1, acc2);
+
+      /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      /* multiply and accumulate */
+      acc1 = __SMLADX(x3, y1, acc1);
+
+      /* Read next two samples from scratch1 buffer */
+      x1 = *__SIMD32(pScr1)++;
+
+      /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y1, acc3);
+
+      /* Read next two samples from smaller buffer */
+      y1 = _SIMD32_OFFSET(pScr2 + 2u);
+
+      acc0 = __SMLAD(x2, y1, acc0);
+
+      acc2 = __SMLAD(x1, y1, acc2);
+
+      acc1 = __SMLADX(x3, y1, acc1);
+
+      x2 = *__SIMD32(pScr1)++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y1, acc3);
+
+      pScr2 += 4u;
+
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+
+
+    /* Update scratch pointer for remaining samples of smaller length sequence */
+    pScr1 -= 4u;
+
+
+    /* apply same above for remaining samples of smaller length sequence */
+    tapCnt = (srcBLen) & 3u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr1++ * *pScr2);
+      acc1 += (*pScr1++ * *pScr2);
+      acc2 += (*pScr1++ * *pScr2);
+      acc3 += (*pScr1++ * *pScr2++);
+
+      pScr1 -= 3u;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+    /* Store the result in the accumulator in the destination buffer. */
+    out0 = (q7_t) (__SSAT(acc0 >> 7u, 8));
+    out1 = (q7_t) (__SSAT(acc1 >> 7u, 8));
+    out2 = (q7_t) (__SSAT(acc2 >> 7u, 8));
+    out3 = (q7_t) (__SSAT(acc3 >> 7u, 8));
+
+    *__SIMD32(pOut)++ = __PACKq7(out0, out1, out2, out3);
+
+    /* Initialization of inputB pointer */
+    pScr2 = py;
+
+    pScratch1 += 4u;
+
+  }
+
+
+  blkCnt = (srcALen + srcBLen - 1u) & 0x3;
+
+  /* Calculate convolution for remaining samples of Bigger length sequence */
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer as scratch1 */
+    pScr1 = pScratch1;
+
+    /* Clear Accumulators */
+    acc0 = 0;
+
+    tapCnt = (srcBLen) >> 1u;
+
+    while(tapCnt > 0u)
+    {
+      acc0 += (*pScr1++ * *pScr2++);
+      acc0 += (*pScr1++ * *pScr2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    tapCnt = (srcBLen) & 1u;
+
+    /* apply same above for remaining samples of smaller length sequence */
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr1++ * *pScr2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q7_t) (__SSAT(acc0 >> 7u, 8));
+
+    /* Initialization of inputB pointer */
+    pScr2 = py;
+
+    pScratch1 += 1u;
+
+  }
+
+}
+
+
+/**    
+ * @} end of Conv group    
+ */
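
The scratch-buffer layout used by arm_conv_opt_q7() reduces every output sample to a straight dot product: scratch2 holds the shorter sequence reversed and widened to 16 bits, and scratch1 holds the longer sequence with (srcBLen - 1) zeros on each side. The sketch below shows that layout without loop unrolling or SIMD. It is an illustration under stated assumptions rather than the library code: conv_q7_scratch and sat_q7 are hypothetical names, the caller is assumed to pass a_len >= b_len >= 1 (the real function swaps the inputs itself), and negative values are assumed to shift arithmetically.

```c
#include <stdint.h>

/* Portable stand-in for __SSAT(x, 8): clamp to the signed 8-bit range. */
static int8_t sat_q7(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Hypothetical sketch. Requires a_len >= b_len >= 1.
 * scratch1: a_len + 2*b_len - 2 entries, scratch2: b_len entries,
 * dst: a_len + b_len - 1 entries. */
static void conv_q7_scratch(const int8_t *a, uint32_t a_len,
                            const int8_t *b, uint32_t b_len,
                            int8_t *dst, int16_t *scratch1, int16_t *scratch2)
{
    uint32_t i, n;

    /* scratch2: shorter sequence, reversed and widened to 16 bits */
    for (i = 0; i < b_len; i++)
        scratch2[i] = b[b_len - 1u - i];

    /* scratch1: (b_len - 1) zeros, the longer sequence, (b_len - 1) zeros */
    for (i = 0; i < b_len - 1u; i++)
        scratch1[i] = 0;
    for (i = 0; i < a_len; i++)
        scratch1[b_len - 1u + i] = a[i];
    for (i = 0; i < b_len - 1u; i++)
        scratch1[b_len - 1u + a_len + i] = 0;

    /* Each output is a dot product of a sliding window with scratch2. */
    for (n = 0; n < a_len + b_len - 1u; n++) {
        int32_t acc = 0;                     /* 18.14 accumulator */
        for (i = 0; i < b_len; i++)
            acc += scratch1[n + i] * scratch2[i];
        dst[n] = sat_q7(acc >> 7);           /* 18.14 -> 18.7 -> saturate to 1.7 */
    }
}
```

Because the zero padding removes every boundary check from the inner loop, the optimised routine can process four outputs per pass with dual 16-bit multiply-accumulate instructions, at the cost of the extra scratch memory.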

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,669 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_partial_f32.c    
+*    
+* Description:	Partial convolution of floating-point sequences.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup PartialConv Partial Convolution    
+ *    
+ * Partial Convolution is equivalent to Convolution except that a subset of the output samples is generated.    
+ * Each function has two additional arguments.    
+ * <code>firstIndex</code> specifies the starting index of the subset of output samples.    
+ * <code>numPoints</code> is the number of output samples to compute.    
+ * The function computes the output in the range    
+ * <code>[firstIndex, ..., firstIndex+numPoints-1]</code>.    
+ * The output array <code>pDst</code> contains <code>numPoints</code> values.    
+ *    
+ * The allowable range of output indices is [0 srcALen+srcBLen-2].    
+ * If the requested subset does not fall in this range then the functions return ARM_MATH_ARGUMENT_ERROR.    
+ * Otherwise the functions return ARM_MATH_SUCCESS.    
+ * \note Refer to arm_conv_f32() for details on fixed point behavior.
+ *
+ * 
+ * <b>Fast Versions</b>
+ *
+ * \par 
+ * Fast versions of partial convolution are provided for Q31 and Q15.  They take fewer cycles than the standard Q31 and Q15 versions, but
+ * the input signals must be scaled down to avoid intermediate overflows.
+ *
+ *
+ * <b>Opt Versions</b>
+ *
+ * \par 
+ * Opt versions are provided for Q15 and Q7.  They use scratch buffers to enable better optimisation.
+ * These versions take fewer cycles but consume more memory (scratch memory) than the standard Q15 and Q7 versions of partial convolution.
+ */
+
+/**    
+ * @addtogroup PartialConv    
+ * @{    
+ */
+
+/**    
+ * @brief Partial convolution of floating-point sequences.    
+ * @param[in]       *pSrcA points to the first input sequence.    
+ * @param[in]       srcALen length of the first input sequence.    
+ * @param[in]       *pSrcB points to the second input sequence.    
+ * @param[in]       srcBLen length of the second input sequence.    
+ * @param[out]      *pDst points to the location where the output result is written.    
+ * @param[in]       firstIndex is the first output sample to start with.    
+ * @param[in]       numPoints is the number of output points to be computed.    
+ * @return  Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0 srcALen+srcBLen-2].    
+ */
+
+arm_status arm_conv_partial_f32(
+  float32_t * pSrcA,
+  uint32_t srcALen,
+  float32_t * pSrcB,
+  uint32_t srcBLen,
+  float32_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t *pIn1 = pSrcA;                       /* inputA pointer */
+  float32_t *pIn2 = pSrcB;                       /* inputB pointer */
+  float32_t *pOut = pDst;                        /* output pointer */
+  float32_t *px;                                 /* Intermediate inputA pointer */
+  float32_t *py;                                 /* Intermediate inputB pointer */
+  float32_t *pSrc1, *pSrc2;                      /* Intermediate pointers */
+  float32_t sum, acc0, acc1, acc2, acc3;         /* Accumulator */
+  float32_t x0, x1, x2, x3, c0;                  /* Temporary variables to hold state and coefficient values */
+  uint32_t j, k, count = 0u, blkCnt, check;
+  int32_t blockSize1, blockSize2, blockSize3;    /* loop counters */
+  arm_status status;                             /* status of Partial convolution */
+
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Conditions to check which loopCounter holds    
+     * the first and last indices of the output samples to be calculated. */
+    check = firstIndex + numPoints;
+    blockSize3 = ((int32_t)check > (int32_t)srcALen) ? (int32_t)check - (int32_t)srcALen : 0;
+    blockSize3 = ((int32_t)firstIndex > (int32_t)srcALen - 1) ? blockSize3 - (int32_t)firstIndex + (int32_t)srcALen : blockSize3;
+    blockSize1 = ((int32_t) srcBLen - 1) - (int32_t) firstIndex;
+    blockSize1 = (blockSize1 > 0) ? ((check > (srcBLen - 1u)) ? blockSize1 :
+                                     (int32_t) numPoints) : 0;
+    blockSize2 = ((int32_t) check - blockSize3) -
+      (blockSize1 + (int32_t) firstIndex);
+    blockSize2 = (blockSize2 > 0) ? blockSize2 : 0;
+
+    /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+    /* The function is internally
+     * divided into three stages according to the number of multiplications that
+     * take place between inputA samples and inputB samples. In the first stage of the
+     * algorithm, the number of multiplications increases by one for every iteration.
+     * In the second stage of the algorithm, srcBLen multiplications are done per output sample.
+     * In the third stage of the algorithm, the number of multiplications decreases by one
+     * for every iteration. */
+
+    /* Set the output pointer to point to the firstIndex    
+     * of the output sample to be calculated. */
+    pOut = pDst + firstIndex;
+
+    /* --------------------------    
+     * Initializations of stage1    
+     * -------------------------*/
+
+    /* sum = x[0] * y[0]    
+     * sum = x[0] * y[1] + x[1] * y[0]    
+     * ....    
+     * sum = x[0] * y[srcBlen - 1] + x[1] * y[srcBlen - 2] +...+ x[srcBLen - 1] * y[0]    
+     */
+
+    /* In this stage the MAC operations are increased by 1 for every iteration.
+       The count variable holds the number of MAC operations performed.
+       Since the partial convolution starts from firstIndex,
+       the number of MACs to be performed is firstIndex + 1 */
+    count = 1u + firstIndex;
+
+    /* Working pointer of inputA */
+    px = pIn1;
+
+    /* Working pointer of inputB */
+    pSrc1 = pIn2 + firstIndex;
+    py = pSrc1;
+
+    /* ------------------------    
+     * Stage1 process    
+     * ----------------------*/
+
+    /* The first stage starts here */
+    while(blockSize1 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0.0f;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* x[0] * y[srcBLen - 1] */
+        sum += *px++ * *py--;
+
+        /* x[1] * y[srcBLen - 2] */
+        sum += *px++ * *py--;
+
+        /* x[2] * y[srcBLen - 3] */
+        sum += *px++ * *py--;
+
+        /* x[3] * y[srcBLen - 4] */
+        sum += *px++ * *py--;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += *px++ * *py--;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = sum;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc1;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* --------------------------    
+     * Initializations of stage2    
+     * ------------------------*/
+
+    /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]    
+     * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]    
+     * ....    
+     * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]
+     */
+
+    /* Working pointer of inputA */
+    if((int32_t)firstIndex - (int32_t)srcBLen + 1 > 0)
+    {
+      px = pIn1 + firstIndex - srcBLen + 1;
+    }
+    else
+    {
+      px = pIn1;
+    }
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* count is the index by which the pointer pIn1 is to be incremented */
+    count = 0u;
+
+    /* -------------------    
+     * Stage2 process    
+     * ------------------*/
+
+    /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.    
+     * So, to loop unroll over blockSize2,    
+     * srcBLen should be greater than or equal to 4 */
+    if(srcBLen >= 4u)
+    {
+      /* Loop unroll over blockSize2, by 4 */
+      blkCnt = ((uint32_t) blockSize2 >> 2u);
+
+      while(blkCnt > 0u)
+      {
+        /* Set all accumulators to zero */
+        acc0 = 0.0f;
+        acc1 = 0.0f;
+        acc2 = 0.0f;
+        acc3 = 0.0f;
+
+        /* read x[0], x[1], x[2] samples */
+        x0 = *(px++);
+        x1 = *(px++);
+        x2 = *(px++);
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        do
+        {
+          /* Read y[srcBLen - 1] sample */
+          c0 = *(py--);
+
+          /* Read x[3] sample */
+          x3 = *(px++);
+
+          /* Perform the multiply-accumulate */
+          /* acc0 +=  x[0] * y[srcBLen - 1] */
+          acc0 += x0 * c0;
+
+          /* acc1 +=  x[1] * y[srcBLen - 1] */
+          acc1 += x1 * c0;
+
+          /* acc2 +=  x[2] * y[srcBLen - 1] */
+          acc2 += x2 * c0;
+
+          /* acc3 +=  x[3] * y[srcBLen - 1] */
+          acc3 += x3 * c0;
+
+          /* Read y[srcBLen - 2] sample */
+          c0 = *(py--);
+
+          /* Read x[4] sample */
+          x0 = *(px++);
+
+          /* Perform the multiply-accumulate */
+          /* acc0 +=  x[1] * y[srcBLen - 2] */
+          acc0 += x1 * c0;
+          /* acc1 +=  x[2] * y[srcBLen - 2] */
+          acc1 += x2 * c0;
+          /* acc2 +=  x[3] * y[srcBLen - 2] */
+          acc2 += x3 * c0;
+          /* acc3 +=  x[4] * y[srcBLen - 2] */
+          acc3 += x0 * c0;
+
+          /* Read y[srcBLen - 3] sample */
+          c0 = *(py--);
+
+          /* Read x[5] sample */
+          x1 = *(px++);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[2] * y[srcBLen - 3] */
+          acc0 += x2 * c0;
+          /* acc1 +=  x[3] * y[srcBLen - 3] */
+          acc1 += x3 * c0;
+          /* acc2 +=  x[4] * y[srcBLen - 3] */
+          acc2 += x0 * c0;
+          /* acc3 +=  x[5] * y[srcBLen - 3] */
+          acc3 += x1 * c0;
+
+          /* Read y[srcBLen - 4] sample */
+          c0 = *(py--);
+
+          /* Read x[6] sample */
+          x2 = *(px++);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[3] * y[srcBLen - 4] */
+          acc0 += x3 * c0;
+          /* acc1 +=  x[4] * y[srcBLen - 4] */
+          acc1 += x0 * c0;
+          /* acc2 +=  x[5] * y[srcBLen - 4] */
+          acc2 += x1 * c0;
+          /* acc3 +=  x[6] * y[srcBLen - 4] */
+          acc3 += x2 * c0;
+
+
+        } while(--k);
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Read y[srcBLen - 5] sample */
+          c0 = *(py--);
+
+          /* Read x[7] sample */
+          x3 = *(px++);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[4] * y[srcBLen - 5] */
+          acc0 += x0 * c0;
+          /* acc1 +=  x[5] * y[srcBLen - 5] */
+          acc1 += x1 * c0;
+          /* acc2 +=  x[6] * y[srcBLen - 5] */
+          acc2 += x2 * c0;
+          /* acc3 +=  x[7] * y[srcBLen - 5] */
+          acc3 += x3 * c0;
+
+          /* Reuse the present samples for the next MAC */
+          x0 = x1;
+          x1 = x2;
+          x2 = x3;
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = acc0;
+        *pOut++ = acc1;
+        *pOut++ = acc2;
+        *pOut++ = acc3;
+
+        /* Increment the pointer pIn1 index, count by 4 */
+        count += 4u;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+
+      /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.    
+       ** No loop unrolling is used. */
+      blkCnt = (uint32_t) blockSize2 % 0x4u;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0.0f;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += *px++ * *py--;
+          sum += *px++ * *py--;
+          sum += *px++ * *py--;
+          sum += *px++ * *py--;
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum += *px++ * *py--;
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = sum;
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+    else
+    {
+      /* If srcBLen is less than 4,
+       * the blockSize2 loop cannot be unrolled by 4 */
+      blkCnt = (uint32_t) blockSize2;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0.0f;
+
+        /* srcBLen number of MACS should be performed */
+        k = srcBLen;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum += *px++ * *py--;
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = sum;
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+
+
+    /* --------------------------    
+     * Initializations of stage3    
+     * -------------------------*/
+
+    /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]    
+     * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]    
+     * ....    
+     * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]    
+     * sum +=  x[srcALen-1] * y[srcBLen-1]    
+     */
+
+    /* In this stage the MAC operations are decreased by 1 for every iteration.    
+       The count variable holds the number of MAC operations performed */
+    count = srcBLen - 1u;
+
+    /* Working pointer of inputA */
+    pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+    px = pSrc1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    while(blockSize3 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0.0f;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* sum += x[srcALen - srcBLen + 1] * y[srcBLen - 1] */
+        sum += *px++ * *py--;
+
+        /* sum += x[srcALen - srcBLen + 2] * y[srcBLen - 2] */
+        sum += *px++ * *py--;
+
+        /* sum += x[srcALen - srcBLen + 3] * y[srcBLen - 3] */
+        sum += *px++ * *py--;
+
+        /* sum += x[srcALen - srcBLen + 4] * y[srcBLen - 4] */
+        sum += *px++ * *py--;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+        sum += *px++ * *py--;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = sum;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pSrc2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  float32_t *pIn1 = pSrcA;                       /* inputA pointer */
+  float32_t *pIn2 = pSrcB;                       /* inputB pointer */
+  float32_t sum;                                 /* Accumulator */
+  uint32_t i, j;                                 /* loop counters */
+  arm_status status;                             /* status of Partial convolution */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+    /* Loop to calculate convolution for output length number of values */
+    for (i = firstIndex; i <= (firstIndex + numPoints - 1); i++)
+    {
+      /* Initialize sum with zero to carry on MAC operations */
+      sum = 0.0f;
+
+      /* Loop to perform MAC operations according to convolution equation */
+      for (j = 0u; j <= i; j++)
+      {
+        /* Check the array limitations for inputs */
+        if((((i - j) < srcBLen) && (j < srcALen)))
+        {
+          /* z[i] += x[j] * y[i-j] */
+          sum += pIn1[j] * pIn2[i - j];
+        }
+      }
+      /* Store the output in the destination buffer */
+      pDst[i] = sum;
+    }
+    /* set status as ARM_MATH_SUCCESS as there are no argument errors */
+    status = ARM_MATH_SUCCESS;
+  }
+  return (status);
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of PartialConv group    
+ */
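
Reviewer note: when checking the three-stage unrolled path above, a direct reference implementation is useful for comparison. The sketch below restates the Cortex-M0 branch of this function in plain C. The name ref_conv_partial_f32 and its 0 / -1 return convention are hypothetical and are not part of CMSIS-DSP. It assumes both inputs are non-empty and uses a strict upper loop bound so that a zero-length request writes nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper (not part of CMSIS-DSP).
 * Computes dst[n] = sum over j of a[j] * b[n - j]
 * for n in [first, first + count).
 * Returns 0 on success, -1 if either input is empty or the requested
 * range exceeds the full convolution length aLen + bLen - 1. */
static int ref_conv_partial_f32(const float *a, uint32_t aLen,
                                const float *b, uint32_t bLen,
                                float *dst, uint32_t first, uint32_t count)
{
  assert(a != 0 && b != 0 && dst != 0);

  if (aLen == 0u || bLen == 0u)
  {
    return -1;
  }

  /* Same range check as the library code above */
  if ((first + count) > (aLen + (bLen - 1u)))
  {
    return -1;
  }

  for (uint32_t n = first; n < first + count; n++)
  {
    float sum = 0.0f;

    for (uint32_t j = 0u; j <= n; j++)
    {
      /* Skip terms that fall outside either input sequence */
      if ((j < aLen) && ((n - j) < bLen))
      {
        sum += a[j] * b[n - j];
      }
    }

    /* Outputs are written at their absolute index, as in the library */
    dst[n] = sum;
  }

  return 0;
}
```

As in the library routine, only dst[first] through dst[first + count - 1] are written; the rest of the destination buffer is left untouched.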

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_opt_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_opt_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,768 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_partial_fast_opt_q15.c    
+*    
+* Description:	Fast Q15 Partial convolution.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup PartialConv    
+ * @{    
+ */
+
+/**    
+ * @brief Partial convolution of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.    
+ * @param[in]       *pSrcA points to the first input sequence.    
+ * @param[in]       srcALen length of the first input sequence.    
+ * @param[in]       *pSrcB points to the second input sequence.    
+ * @param[in]       srcBLen length of the second input sequence.    
+ * @param[out]      *pDst points to the location where the output result is written.    
+ * @param[in]       firstIndex is the first output sample to start with.    
+ * @param[in]       numPoints is the number of output points to be computed.    
+ * @param[in]       *pScratch1 points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.   
+ * @param[in]       *pScratch2 points to scratch buffer of size min(srcALen, srcBLen).   
+ * @return Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0 srcALen+srcBLen-2].    
+ *    
+ * See <code>arm_conv_partial_q15()</code> for a slower implementation of this function which uses a 64-bit accumulator to avoid wrap around distortion.    
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output, scratch1 and scratch2 buffers should be 32-bit aligned.
+ *     
+ */
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+arm_status arm_conv_partial_fast_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+
+  q15_t *pOut = pDst;                            /* output pointer */
+  q15_t *pScr1 = pScratch1;                      /* Temporary pointer for scratch1 */
+  q15_t *pScr2 = pScratch2;                      /* Temporary pointer for scratch2 */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulator */
+  q31_t x1, x2, x3;                              /* Temporary variables to hold state and coefficient values */
+  q31_t y1, y2;                                  /* State variables */
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  uint32_t j, k, blkCnt;                         /* loop counter */
+  arm_status status;
+
+  uint32_t tapCnt;                               /* loop count */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Temporary pointer for scratch2 */
+    py = pScratch2;
+
+    /* Pointer to the end of the scratch2 buffer */
+    pScr2 = pScratch2 + srcBLen - 1;
+
+    /* points to smaller length sequence */
+    px = pIn2;
+
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcBLen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+
+    /* Copy smaller length input sequence in reverse order into second scratch buffer */
+    while(k > 0u)
+    {
+      /* copy second buffer in reversal manner */
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcBLen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* copy second buffer in reversal manner for remaining samples */
+      *pScr2-- = *px++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Initialize temporary scratch pointer */
+    pScr1 = pScratch1;
+
+    /* Assuming scratch1 buffer is aligned by 32-bit */
+    /* Fill (srcBLen - 1u) zeros in scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update temporary scratch pointer */
+    pScr1 += (srcBLen - 1u);
+
+    /* Copy bigger length sequence(srcALen) samples in scratch1 buffer */
+
+    /* Copy (srcALen) samples in scratch buffer */
+    arm_copy_q15(pIn1, pScr1, srcALen);
+
+    /* Update pointers */
+    pScr1 += srcALen;
+
+    /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update pointer */
+    pScr1 += (srcBLen - 1u);
+
+    /* Initialization of pIn2 pointer */
+    pIn2 = py;
+
+    pScratch1 += firstIndex;
+
+    pOut = pDst + firstIndex;
+
+    /* First part of the processing with loop unrolling processes 4 data points at a time.
+     ** A second loop below processes the remaining 1 to 3 samples. */
+
+    /* Actual convolution process starts here */
+    blkCnt = (numPoints) >> 2;
+
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulators */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* Read two samples from scratch1 buffer */
+      x1 = *__SIMD32(pScr1)++;
+
+      /* Read next two samples from scratch1 buffer */
+      x2 = *__SIMD32(pScr1)++;
+
+      tapCnt = (srcBLen) >> 2u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read four samples from smaller buffer */
+        y1 = _SIMD32_OFFSET(pIn2);
+        y2 = _SIMD32_OFFSET(pIn2 + 2u);
+
+        /* multiply and accumulate */
+        acc0 = __SMLAD(x1, y1, acc0);
+        acc2 = __SMLAD(x2, y1, acc2);
+
+        /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x2, x1, 0);
+#else
+        x3 = __PKHBT(x1, x2, 0);
+#endif
+
+        /* multiply and accumulate */
+        acc1 = __SMLADX(x3, y1, acc1);
+
+        /* Read next two samples from scratch1 buffer */
+        x1 = _SIMD32_OFFSET(pScr1);
+
+        /* multiply and accumulate */
+        acc0 = __SMLAD(x2, y2, acc0);
+
+        acc2 = __SMLAD(x1, y2, acc2);
+
+        /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x1, x2, 0);
+#else
+        x3 = __PKHBT(x2, x1, 0);
+#endif
+
+        acc3 = __SMLADX(x3, y1, acc3);
+        acc1 = __SMLADX(x3, y2, acc1);
+
+        x2 = _SIMD32_OFFSET(pScr1 + 2u);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x2, x1, 0);
+#else
+        x3 = __PKHBT(x1, x2, 0);
+#endif
+
+        acc3 = __SMLADX(x3, y2, acc3);
+
+        /* update scratch pointers */
+        pIn2 += 4u;
+        pScr1 += 4u;
+
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* Update scratch pointer for remaining samples of smaller length sequence */
+      pScr1 -= 4u;
+
+      /* apply same above for remaining samples of smaller length sequence */
+      tapCnt = (srcBLen) & 3u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pIn2);
+        acc1 += (*pScr1++ * *pIn2);
+        acc2 += (*pScr1++ * *pIn2);
+        acc3 += (*pScr1++ * *pIn2++);
+
+        pScr1 -= 3u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+
+      /* Store the results in the accumulators in the destination buffer. */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+#else
+
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Initialization of inputB pointer */
+      pIn2 = py;
+
+      pScratch1 += 4u;
+
+    }
+
+
+    blkCnt = numPoints & 0x3;
+
+    /* Calculate convolution for remaining samples of Bigger length sequence */
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulator */
+      acc0 = 0;
+
+      tapCnt = (srcBLen) >> 1u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read next two samples from scratch1 buffer */
+        x1 = *__SIMD32(pScr1)++;
+
+        /* Read two samples from smaller buffer */
+        y1 = *__SIMD32(pIn2)++;
+
+        acc0 = __SMLAD(x1, y1, acc0);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      tapCnt = (srcBLen) & 1u;
+
+      /* apply same above for remaining samples of smaller length sequence */
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pIn2++);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+      /* The result is in 2.30 format.  Convert to 1.15 with saturation.       
+       ** Then store the output in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+      /* Initialization of inputB pointer */
+      pIn2 = py;
+
+      pScratch1 += 1u;
+
+    }
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+  /* Return to application */
+  return (status);
+}
+
+#else
+
+arm_status arm_conv_partial_fast_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+
+  q15_t *pOut = pDst;                            /* output pointer */
+  q15_t *pScr1 = pScratch1;                      /* Temporary pointer for scratch1 */
+  q15_t *pScr2 = pScratch2;                      /* Temporary pointer for scratch2 */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulator */
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  uint32_t j, k, blkCnt;                         /* loop counter */
+  arm_status status;                             /* Status variable */
+  uint32_t tapCnt;                               /* loop count */
+  q15_t x10, x11, x20, x21;                      /* Temporary variables to hold srcA buffer */
+  q15_t y10, y11;                                /* Temporary variables to hold srcB buffer */
+
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Temporary pointer for scratch2 */
+    py = pScratch2;
+
+    /* pointer to take end of scratch2 buffer */
+    pScr2 = pScratch2 + srcBLen - 1;
+
+    /* points to smaller length sequence */
+    px = pIn2;
+
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcBLen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* copy second buffer in reversal manner */
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcBLen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* copy second buffer in reversal manner for remaining samples */
+      *pScr2-- = *px++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Initialize temporary scratch pointer */
+    pScr1 = pScratch1;
+
+    /* Fill (srcBLen - 1u) zeros in scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update temporary scratch pointer */
+    pScr1 += (srcBLen - 1u);
+
+    /* Copy bigger length sequence(srcALen) samples in scratch1 buffer */
+
+
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcALen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Copy the longer sequence into scratch1 in forward order */
+      *pScr1++ = *pIn1++;
+      *pScr1++ = *pIn1++;
+      *pScr1++ = *pIn1++;
+      *pScr1++ = *pIn1++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcALen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Copy the remaining samples of the longer sequence in forward order */
+      *pScr1++ = *pIn1++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+
+    /* Apply loop unrolling and write 4 zeros at a time. */
+    k = (srcBLen - 1u) >> 2u;
+
+    /* First part of the processing with loop unrolling writes 4 zeros at a time.
+     ** A second loop below writes the remaining 1 to 3 zeros. */
+    while(k > 0u)
+    {
+      /* zero-pad the end of the scratch1 buffer */
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = (srcBLen - 1u) % 0x4u;
+
+    while(k > 0u)
+    {
+      /* zero-pad the remaining samples */
+      *pScr1++ = 0;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+
+    /* Initialization of pIn2 pointer */
+    pIn2 = py;
+
+    pScratch1 += firstIndex;
+
+    pOut = pDst + firstIndex;
+
+    /* Actual convolution process starts here */
+    blkCnt = (numPoints) >> 2;
+
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulators */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* Read two samples from scratch1 buffer */
+      x10 = *pScr1++;
+      x11 = *pScr1++;
+
+      /* Read next two samples from scratch1 buffer */
+      x20 = *pScr1++;
+      x21 = *pScr1++;
+
+      tapCnt = (srcBLen) >> 2u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read two samples from smaller buffer */
+        y10 = *pIn2;
+        y11 = *(pIn2 + 1u);
+
+        /* multiply and accumulate */
+        acc0 += (q31_t) x10 *y10;
+        acc0 += (q31_t) x11 *y11;
+        acc2 += (q31_t) x20 *y10;
+        acc2 += (q31_t) x21 *y11;
+
+        /* multiply and accumulate */
+        acc1 += (q31_t) x11 *y10;
+        acc1 += (q31_t) x20 *y11;
+
+        /* Read next two samples from scratch1 buffer */
+        x10 = *pScr1;
+        x11 = *(pScr1 + 1u);
+
+        /* multiply and accumulate */
+        acc3 += (q31_t) x21 *y10;
+        acc3 += (q31_t) x10 *y11;
+
+        /* Read next two samples from scratch2 buffer */
+        y10 = *(pIn2 + 2u);
+        y11 = *(pIn2 + 3u);
+
+        /* multiply and accumulate */
+        acc0 += (q31_t) x20 *y10;
+        acc0 += (q31_t) x21 *y11;
+        acc2 += (q31_t) x10 *y10;
+        acc2 += (q31_t) x11 *y11;
+        acc1 += (q31_t) x21 *y10;
+        acc1 += (q31_t) x10 *y11;
+
+        /* Read next two samples from scratch1 buffer */
+        x20 = *(pScr1 + 2);
+        x21 = *(pScr1 + 3);
+
+        /* multiply and accumulate */
+        acc3 += (q31_t) x11 *y10;
+        acc3 += (q31_t) x20 *y11;
+
+        /* update scratch pointers */
+        pIn2 += 4u;
+        pScr1 += 4u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* Update scratch pointer for remaining samples of smaller length sequence */
+      pScr1 -= 4u;
+
+      /* apply same above for remaining samples of smaller length sequence */
+      tapCnt = (srcBLen) & 3u;
+
+      while(tapCnt > 0u)
+      {
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pIn2);
+        acc1 += (*pScr1++ * *pIn2);
+        acc2 += (*pScr1++ * *pIn2);
+        acc3 += (*pScr1++ * *pIn2++);
+
+        pScr1 -= 3u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+
+      /* Store the results in the accumulators in the destination buffer. */
+      *pOut++ = __SSAT((acc0 >> 15), 16);
+      *pOut++ = __SSAT((acc1 >> 15), 16);
+      *pOut++ = __SSAT((acc2 >> 15), 16);
+      *pOut++ = __SSAT((acc3 >> 15), 16);
+
+      /* Initialization of inputB pointer */
+      pIn2 = py;
+
+      pScratch1 += 4u;
+
+    }
+
+
+    blkCnt = numPoints & 0x3;
+
+    /* Calculate convolution for remaining samples of Bigger length sequence */
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulator */
+      acc0 = 0;
+
+      tapCnt = (srcBLen) >> 1u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read next two samples from scratch1 buffer */
+        x10 = *pScr1++;
+        x11 = *pScr1++;
+
+        /* Read two samples from smaller buffer */
+        y10 = *pIn2++;
+        y11 = *pIn2++;
+
+        /* multiply and accumulate */
+        acc0 += (q31_t) x10 *y10;
+        acc0 += (q31_t) x11 *y11;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      tapCnt = (srcBLen) & 1u;
+
+      /* apply same above for remaining samples of smaller length sequence */
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pIn2++);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+      /* Initialization of inputB pointer */
+      pIn2 = py;
+
+      pScratch1 += 1u;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+/**    
+ * @} end of PartialConv group    
+ */
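For cross-checking the optimized routine that ends above, a plain-C reference can be handy. The sketch below is not part of CMSIS-DSP: `ref_conv_partial_q15` and `sat_q15` are hypothetical names, and it assumes a 64-bit accumulator, a right shift by 15 and saturation to 16 bits, mirroring the `__SSAT((acc >> 15), 16)` store used in the loops above. It applies the same range check on `firstIndex + numPoints` and writes only `dst[firstIndex]` through `dst[firstIndex + numPoints - 1]`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a wide value to the Q15 range. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical reference routine (illustration only, not a CMSIS API).
 * Returns 0 on success, -1 if the requested output range is invalid. */
int ref_conv_partial_q15(const int16_t *a, uint32_t aLen,
                         const int16_t *b, uint32_t bLen,
                         int16_t *dst,
                         uint32_t firstIndex, uint32_t numPoints)
{
    assert(a && b && dst);

    if (aLen == 0u || bLen == 0u) return -1;
    if (firstIndex + numPoints > aLen + bLen - 1u) return -1;

    for (uint32_t n = firstIndex; n < firstIndex + numPoints; n++) {
        int64_t acc = 0;

        /* Only k with 0 <= k < aLen and 0 <= n - k < bLen contributes. */
        uint32_t kLo = (n >= bLen) ? (n - bLen + 1u) : 0u;
        uint32_t kHi = (n < aLen) ? n : (aLen - 1u);

        for (uint32_t k = kLo; k <= kHi; k++) {
            acc += (int32_t)a[k] * (int32_t)b[n - k];
        }

        /* Q30 product sum back to Q15; assumes an arithmetic right
         * shift for negative values, as the library code does. */
        dst[n] = sat_q15(acc >> 15);
    }
    return 0;
}
```

Comparing its output with the optimized routines over random inputs, including lengths that are not multiples of 4, exercises the remainder loops as well as the unrolled ones.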

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,1492 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_conv_partial_fast_q15.c   
+*   
+* Description:	Fast Q15 Partial convolution.   
+*   
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup PartialConv   
+ * @{   
+ */
+
+/**   
+ * @brief Partial convolution of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.   
+ * @param[in]       *pSrcA points to the first input sequence.   
+ * @param[in]       srcALen length of the first input sequence.   
+ * @param[in]       *pSrcB points to the second input sequence.   
+ * @param[in]       srcBLen length of the second input sequence.   
+ * @param[out]      *pDst points to the location where the output result is written.   
+ * @param[in]       firstIndex is the first output sample to start with.   
+ * @param[in]       numPoints is the number of output points to be computed.   
+ * @return Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0 srcALen+srcBLen-2].   
+ *   
+ * See <code>arm_conv_partial_q15()</code> for a slower implementation of this function which uses a 64-bit accumulator to avoid wrap around distortion.   
+ */
+
+
+arm_status arm_conv_partial_fast_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints)
+{
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  q15_t *pIn1;                                   /* inputA pointer               */
+  q15_t *pIn2;                                   /* inputB pointer               */
+  q15_t *pOut = pDst;                            /* output pointer               */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulator                  */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1, *pSrc2;                          /* Intermediate pointers        */
+  q31_t x0, x1, x2, x3, c0;
+  uint32_t j, k, count, check, blkCnt;
+  int32_t blockSize1, blockSize2, blockSize3;    /* loop counters                 */
+  arm_status status;                             /* status of Partial convolution */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >=srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Conditions to check which loopCounter holds   
+     * the first and last indices of the output samples to be calculated. */
+    check = firstIndex + numPoints;
+    blockSize3 = ((int32_t)check > (int32_t)srcALen) ? (int32_t)check - (int32_t)srcALen : 0;
+    blockSize3 = ((int32_t)firstIndex > (int32_t)srcALen - 1) ? blockSize3 - (int32_t)firstIndex + (int32_t)srcALen : blockSize3;
+    blockSize1 = (((int32_t) srcBLen - 1) - (int32_t) firstIndex);
+    blockSize1 = (blockSize1 > 0) ? ((check > (srcBLen - 1u)) ? blockSize1 :
+                                     (int32_t) numPoints) : 0;
+    blockSize2 = (int32_t) check - ((blockSize3 + blockSize1) +
+                                    (int32_t) firstIndex);
+    blockSize2 = (blockSize2 > 0) ? blockSize2 : 0;
+
+    /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+    /* The function is internally   
+     * divided into three stages according to the number of multiplications that   
+     * take place between inputA samples and inputB samples. In the first stage of the   
+     * algorithm, the number of multiplications increases by one for every iteration.   
+     * In the second stage of the algorithm, srcBLen multiplications are done.   
+     * In the third stage of the algorithm, the number of multiplications decreases by one   
+     * for every iteration. */
+
+    /* Set the output pointer to point to the firstIndex   
+     * of the output sample to be calculated. */
+    pOut = pDst + firstIndex;
+
+    /* --------------------------   
+     * Initializations of stage1   
+     * -------------------------*/
+
+    /* sum = x[0] * y[0]   
+     * sum = x[0] * y[1] + x[1] * y[0]   
+     * ....   
+     * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]   
+     */
+
+    /* In this stage the MAC operations are increased by 1 for every iteration.   
+       The count variable holds the number of MAC operations performed.   
+       Since the partial convolution starts from firstIndex   
+       Number of Macs to be performed is firstIndex + 1 */
+    count = 1u + firstIndex;
+
+    /* Working pointer of inputA */
+    px = pIn1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + firstIndex;
+    py = pSrc2;
+
+    /* ------------------------   
+     * Stage1 process   
+     * ----------------------*/
+
+    /* For loop unrolling by 4, this stage is divided into two. */
+    /* First part of this stage computes the MAC operations less than 4 */
+    /* Second part of this stage computes the MAC operations greater than or equal to 4 */
+
+    /* The first part of the stage starts here */
+    while((count < 4u) && (blockSize1 > 0))
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over number of MAC operations between   
+       * inputA samples and inputB samples */
+      k = count;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum = __SMLAD(*px++, *py--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* The second part of the stage starts here */
+    /* The internal loop, over count, is unrolled by 4 */
+    /* To read the last two inputB samples using SIMD:   
+     * y[srcBLen] and y[srcBLen-1] coefficients, py is decremented by 1 */
+    py = py - 1;
+
+    while(blockSize1 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        /* x[0], x[1] are multiplied with y[srcBLen - 1], y[srcBLen - 2] respectively */
+        sum = __SMLADX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+        /* x[2], x[3] are multiplied with y[srcBLen - 3], y[srcBLen - 4] respectively */
+        sum = __SMLADX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* For the next MAC operations, the pointer py is used without SIMD   
+       * So, py is incremented by 1 */
+      py = py + 1u;
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum = __SMLAD(*px++, *py--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2 - 1u;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* --------------------------   
+     * Initializations of stage2   
+     * ------------------------*/
+
+    /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]   
+     * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]   
+     * ....   
+     * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]   
+     */
+
+    /* Working pointer of inputA */
+    if((int32_t)firstIndex - (int32_t)srcBLen + 1 > 0)
+    {
+      px = pIn1 + firstIndex - srcBLen + 1;
+    }
+    else
+    {
+      px = pIn1;
+    }
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* count is the index by which the pointer pIn1 is to be incremented */
+    count = 0u;
+
+
+    /* --------------------   
+     * Stage2 process   
+     * -------------------*/
+
+    /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+     * So, to loop unroll over blockSize2,   
+     * srcBLen should be greater than or equal to 4 */
+    if(srcBLen >= 4u)
+    {
+      /* Loop unroll over blockSize2, by 4 */
+      blkCnt = ((uint32_t) blockSize2 >> 2u);
+
+      while(blkCnt > 0u)
+      {
+        py = py - 1u;
+
+        /* Set all accumulators to zero */
+        acc0 = 0;
+        acc1 = 0;
+        acc2 = 0;
+        acc3 = 0;
+
+
+        /* read x[0], x[1] samples */
+        x0 = *__SIMD32(px);
+        /* read x[1], x[2] samples */
+        x1 = _SIMD32_OFFSET(px+1);
+        px += 2u;
+
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        do
+        {
+          /* Read the last two inputB samples using SIMD:   
+           * y[srcBLen - 1] and y[srcBLen - 2] */
+          c0 = *__SIMD32(py)--;
+
+          /* acc0 +=  x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] */
+          acc0 = __SMLADX(x0, c0, acc0);
+
+          /* acc1 +=  x[1] * y[srcBLen - 1] + x[2] * y[srcBLen - 2] */
+          acc1 = __SMLADX(x1, c0, acc1);
+
+          /* Read x[2], x[3] */
+          x2 = *__SIMD32(px);
+
+          /* Read x[3], x[4] */
+          x3 = _SIMD32_OFFSET(px+1);
+
+          /* acc2 +=  x[2] * y[srcBLen - 1] + x[3] * y[srcBLen - 2] */
+          acc2 = __SMLADX(x2, c0, acc2);
+
+          /* acc3 +=  x[3] * y[srcBLen - 1] + x[4] * y[srcBLen - 2] */
+          acc3 = __SMLADX(x3, c0, acc3);
+
+          /* Read y[srcBLen - 3] and y[srcBLen - 4] */
+          c0 = *__SIMD32(py)--;
+
+          /* acc0 +=  x[2] * y[srcBLen - 3] + x[3] * y[srcBLen - 4] */
+          acc0 = __SMLADX(x2, c0, acc0);
+
+          /* acc1 +=  x[3] * y[srcBLen - 3] + x[4] * y[srcBLen - 4] */
+          acc1 = __SMLADX(x3, c0, acc1);
+
+          /* Read x[4], x[5] */
+          x0 = _SIMD32_OFFSET(px+2);
+
+          /* Read x[5], x[6] */
+          x1 = _SIMD32_OFFSET(px+3);
+          px += 4u;
+
+          /* acc2 +=  x[4] * y[srcBLen - 3] + x[5] * y[srcBLen - 4] */
+          acc2 = __SMLADX(x0, c0, acc2);
+
+          /* acc3 +=  x[5] * y[srcBLen - 3] + x[6] * y[srcBLen - 4] */
+          acc3 = __SMLADX(x1, c0, acc3);
+
+        } while(--k);
+
+        /* For the next MAC operations, SIMD is not used   
+         * So, the 16 bit pointer of inputB, py is updated */
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        if(k == 1u)
+        {
+          /* Read y[srcBLen - 5] */
+          c0 = *(py+1);
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+          c0 = c0 << 16u;
+
+#else
+
+          c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+          /* Read x[7] */
+          x3 = *__SIMD32(px);
+          px++;
+
+          /* Perform the multiply-accumulates */
+          acc0 = __SMLAD(x0, c0, acc0);
+          acc1 = __SMLAD(x1, c0, acc1);
+          acc2 = __SMLADX(x1, c0, acc2);
+          acc3 = __SMLADX(x3, c0, acc3);
+        }
+
+        if(k == 2u)
+        {
+          /* Read y[srcBLen - 5], y[srcBLen - 6] */
+          c0 = _SIMD32_OFFSET(py);
+
+          /* Read x[7], x[8] */
+          x3 = *__SIMD32(px);
+
+          /* Read x[9] */
+          x2 = _SIMD32_OFFSET(px+1);
+          px += 2u;
+
+          /* Perform the multiply-accumulates */
+          acc0 = __SMLADX(x0, c0, acc0);
+          acc1 = __SMLADX(x1, c0, acc1);
+          acc2 = __SMLADX(x3, c0, acc2);
+          acc3 = __SMLADX(x2, c0, acc3);
+        }
+
+        if(k == 3u)
+        {
+          /* Read y[srcBLen - 5], y[srcBLen - 6] */
+          c0 = _SIMD32_OFFSET(py);
+
+          /* Read x[7], x[8] */
+          x3 = *__SIMD32(px);
+
+          /* Read x[9] */
+          x2 = _SIMD32_OFFSET(px+1);
+
+          /* Perform the multiply-accumulates */
+          acc0 = __SMLADX(x0, c0, acc0);
+          acc1 = __SMLADX(x1, c0, acc1);
+          acc2 = __SMLADX(x3, c0, acc2);
+          acc3 = __SMLADX(x2, c0, acc3);
+
+          c0 = *(py-1);
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+          c0 = c0 << 16u;
+#else
+
+          c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+          /* Read x[10] */
+          x3 = _SIMD32_OFFSET(px+2);
+          px += 3u;
+
+          /* Perform the multiply-accumulates */
+          acc0 = __SMLADX(x1, c0, acc0);
+          acc1 = __SMLAD(x2, c0, acc1);
+          acc2 = __SMLADX(x2, c0, acc2);
+          acc3 = __SMLADX(x3, c0, acc3);
+        }
+
+        /* Store the results in the accumulators in the destination buffer. */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *__SIMD32(pOut)++ = __PKHBT(acc0 >> 15, acc1 >> 15, 16);
+        *__SIMD32(pOut)++ = __PKHBT(acc2 >> 15, acc3 >> 15, 16);
+
+#else
+
+        *__SIMD32(pOut)++ = __PKHBT(acc1 >> 15, acc0 >> 15, 16);
+        *__SIMD32(pOut)++ = __PKHBT(acc3 >> 15, acc2 >> 15, 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+        /* Increment the pointer pIn1 index, count by 4 */
+        count += 4u;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+
+      /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+       ** No loop unrolling is used. */
+      blkCnt = (uint32_t) blockSize2 % 0x4u;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += ((q31_t) * px++ * *py--);
+          sum += ((q31_t) * px++ * *py--);
+          sum += ((q31_t) * px++ * *py--);
+          sum += ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q15_t) (sum >> 15);
+
+        /* Increment the pointer pIn1 index, count by 1 */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+    else
+    {
+      /* If srcBLen is less than 4,   
+       * the blockSize2 loop is not unrolled by 4 */
+      blkCnt = (uint32_t) blockSize2;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* srcBLen number of MACS should be performed */
+        k = srcBLen;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum += ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q15_t) (sum >> 15);
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+
+
+    /* --------------------------   
+     * Initializations of stage3   
+     * -------------------------*/
+
+    /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]   
+     * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]   
+     * ....   
+     * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]   
+     * sum +=  x[srcALen-1] * y[srcBLen-1]   
+     */
+
+    /* In this stage the MAC operations are decreased by 1 for every iteration.   
+       The count variable holds the number of MAC operations performed */
+    count = srcBLen - 1u;
+
+    /* Working pointer of inputA */
+    pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+    px = pSrc1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    pIn2 = pSrc2 - 1u;
+    py = pIn2;
+
+    /* -------------------   
+     * Stage3 process   
+     * ------------------*/
+
+    /* For loop unrolling by 4, this stage is divided into two. */
+    /* First part of this stage computes the MAC operations greater than 4 */
+    /* Second part of this stage computes the MAC operations less than or equal to 4 */
+
+    /* The first part of the stage starts here */
+    j = count >> 2u;
+
+    while((j > 0u) && (blockSize3 > 0))
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* x[srcALen - srcBLen + 1], x[srcALen - srcBLen + 2] are multiplied   
+         * with y[srcBLen - 1], y[srcBLen - 2] respectively */
+        sum = __SMLADX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+        /* x[srcALen - srcBLen + 3], x[srcALen - srcBLen + 4] are multiplied   
+         * with y[srcBLen - 3], y[srcBLen - 4] respectively */
+        sum = __SMLADX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* For the next MAC operations, the pointer py is used without SIMD   
+       * So, py is incremented by 1 */
+      py = py + 1u;
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* sum += x[srcALen - srcBLen + 5] * y[srcBLen - 5] */
+        sum = __SMLAD(*px++, *py--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pIn2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+
+      j--;
+    }
+
+    /* The second part of the stage starts here */
+    /* SIMD is not used for the next MAC operations,   
+     * so pointer py is updated to read only one sample at a time */
+    py = py + 1u;
+
+    while(blockSize3 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over the count MACs remaining for this output; no unrolling is used. */
+      k = count;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+        sum = __SMLAD(*px++, *py--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pSrc2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+
+#else
+
+  q15_t *pIn1;                                   /* inputA pointer               */
+  q15_t *pIn2;                                   /* inputB pointer               */
+  q15_t *pOut = pDst;                            /* output pointer               */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulator                  */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1, *pSrc2;                          /* Intermediate pointers        */
+  q31_t x0, x1, x2, x3, c0;
+  uint32_t j, k, count, check, blkCnt;
+  int32_t blockSize1, blockSize2, blockSize3;    /* loop counters                 */
+  arm_status status;                             /* status of Partial convolution */
+  q15_t a, b;
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >=srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Conditions to check which loopCounter holds   
+     * the first and last indices of the output samples to be calculated. */
+    check = firstIndex + numPoints;
+    blockSize3 = ((int32_t)check > (int32_t)srcALen) ? (int32_t)check - (int32_t)srcALen : 0;
+    blockSize3 = ((int32_t)firstIndex > (int32_t)srcALen - 1) ? blockSize3 - (int32_t)firstIndex + (int32_t)srcALen : blockSize3;
+    blockSize1 = ((int32_t) srcBLen - 1) - (int32_t) firstIndex;
+    blockSize1 = (blockSize1 > 0) ? ((check > (srcBLen - 1u)) ? blockSize1 :
+                                     (int32_t) numPoints) : 0;
+    blockSize2 = ((int32_t) check - blockSize3) -
+      (blockSize1 + (int32_t) firstIndex);
+    blockSize2 = (blockSize2 > 0) ? blockSize2 : 0;
+
+    /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+    /* The function is internally   
+     * divided into three stages according to the number of multiplications that   
+     * take place between inputA samples and inputB samples. In the first stage of the   
+     * algorithm, the number of multiplications increases by one for every iteration.   
+     * In the second stage of the algorithm, srcBLen multiplications are done.   
+     * In the third stage of the algorithm, the number of multiplications decreases by one   
+     * for every iteration. */
+
+    /* Set the output pointer to point to the firstIndex   
+     * of the output sample to be calculated. */
+    pOut = pDst + firstIndex;
+
+    /* --------------------------   
+     * Initializations of stage1   
+     * -------------------------*/
+
+    /* sum = x[0] * y[0]   
+     * sum = x[0] * y[1] + x[1] * y[0]   
+     * ....   
+     * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]
+     */
+
+    /* In this stage the MAC operations are increased by 1 for every iteration.   
+       The count variable holds the number of MAC operations performed.   
+       Since the partial convolution starts from firstIndex   
+       Number of Macs to be performed is firstIndex + 1 */
+    count = 1u + firstIndex;
+
+    /* Working pointer of inputA */
+    px = pIn1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + firstIndex;
+    py = pSrc2;
+
+    /* ------------------------   
+     * Stage1 process   
+     * ----------------------*/
+
+    /* For loop unrolling by 4, this stage is divided into two. */
+    /* First part of this stage computes the MAC operations less than 4 */
+    /* Second part of this stage computes the MAC operations greater than or equal to 4 */
+
+    /* The first part of the stage starts here */
+    while((count < 4u) && (blockSize1 > 0))
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over number of MAC operations between   
+       * inputA samples and inputB samples */
+      k = count;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+      sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* The second part of the stage starts here */
+    /* The internal loop, over count, is unrolled by 4 */
+    /* To read the last two inputB samples using SIMD,
+     * y[srcBLen] and y[srcBLen-1] coefficients, py is decremented by 1 */
+    py = py - 1;
+
+    while(blockSize1 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+	py++;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+      sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2 - 1u;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* --------------------------   
+     * Initializations of stage2   
+     * ------------------------*/
+
+    /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]   
+     * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]   
+     * ....   
+     * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]
+     */
+
+    /* Working pointer of inputA */
+    if((int32_t)firstIndex - (int32_t)srcBLen + 1 > 0)
+    {
+      px = pIn1 + firstIndex - srcBLen + 1;
+    }
+    else
+    {
+      px = pIn1;
+    }
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* count is the index by which the pointer pIn1 is to be incremented */
+    count = 0u;
+
+
+    /* --------------------   
+     * Stage2 process   
+     * -------------------*/
+
+    /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+     * So, to loop unroll over blockSize2,   
+     * srcBLen should be greater than or equal to 4 */
+    if(srcBLen >= 4u)
+    {
+      /* Loop unroll over blockSize2, by 4 */
+      blkCnt = ((uint32_t) blockSize2 >> 2u);
+
+      while(blkCnt > 0u)
+      {
+      py = py - 1u;
+
+        /* Set all accumulators to zero */
+        acc0 = 0;
+        acc1 = 0;
+        acc2 = 0;
+        acc3 = 0;
+
+      /* read x[0], x[1] samples */
+	  a = *px++;
+	  b = *px++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x0 = __PKHBT(a, b, 16);
+	  a = *px;
+	  x1 = __PKHBT(b, a, 16);
+
+#else
+
+	  x0 = __PKHBT(b, a, 16);
+	  a = *px;
+	  x1 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read the last two inputB samples using SIMD:   
+         * y[srcBLen - 1] and y[srcBLen - 2] */
+		a = *py;
+		b = *(py+1);
+		py -= 2;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		c0 = __PKHBT(a, b, 16);
+
+#else
+
+ 		c0 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* acc0 +=  x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] */
+        acc0 = __SMLADX(x0, c0, acc0);
+
+        /* acc1 +=  x[1] * y[srcBLen - 1] + x[2] * y[srcBLen - 2] */
+        acc1 = __SMLADX(x1, c0, acc1);
+
+	  a = *px;
+	  b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x2 = __PKHBT(a, b, 16);
+	  a = *(px + 2);
+	  x3 = __PKHBT(b, a, 16);
+
+#else
+
+	  x2 = __PKHBT(b, a, 16);
+	  a = *(px + 2);
+	  x3 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+
+        /* acc2 +=  x[2] * y[srcBLen - 1] + x[3] * y[srcBLen - 2] */
+        acc2 = __SMLADX(x2, c0, acc2);
+
+        /* acc3 +=  x[3] * y[srcBLen - 1] + x[4] * y[srcBLen - 2] */
+        acc3 = __SMLADX(x3, c0, acc3);
+
+        /* Read y[srcBLen - 3] and y[srcBLen - 4] */
+		a = *py;
+		b = *(py+1);
+		py -= 2;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		c0 = __PKHBT(a, b, 16);
+
+#else
+
+ 		c0 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* acc0 +=  x[2] * y[srcBLen - 3] + x[3] * y[srcBLen - 4] */
+        acc0 = __SMLADX(x2, c0, acc0);
+
+        /* acc1 +=  x[3] * y[srcBLen - 3] + x[4] * y[srcBLen - 4] */
+        acc1 = __SMLADX(x3, c0, acc1);
+
+        /* Read x[4], x[5], x[6] */
+	  a = *(px + 2);
+	  b = *(px + 3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x0 = __PKHBT(a, b, 16);
+	  a = *(px + 4);
+	  x1 = __PKHBT(b, a, 16);
+
+#else
+
+	  x0 = __PKHBT(b, a, 16);
+	  a = *(px + 4);
+	  x1 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+
+		px += 4u;
+
+        /* acc2 +=  x[4] * y[srcBLen - 3] + x[5] * y[srcBLen - 4] */
+        acc2 = __SMLADX(x0, c0, acc2);
+
+        /* acc3 +=  x[5] * y[srcBLen - 3] + x[6] * y[srcBLen - 4] */
+        acc3 = __SMLADX(x1, c0, acc3);
+
+      } while(--k);
+
+      /* For the next MAC operations, SIMD is not used,
+       * so the 16-bit inputB pointer, py, is updated */
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      if(k == 1u)
+      {
+        /* Read y[srcBLen - 5] */
+        c0 = *(py+1);
+
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+
+#else
+
+        c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[7] */
+		a = *px;
+		b = *(px+1);
+		px++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		x3 = __PKHBT(a, b, 16);
+
+#else
+
+ 		x3 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLADX(x1, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      if(k == 2u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+		a = *py;
+		b = *(py+1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		c0 = __PKHBT(a, b, 16);
+
+#else
+
+ 		c0 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* Read x[7], x[8], x[9] */
+	  a = *px;
+	  b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x3 = __PKHBT(a, b, 16);
+	  a = *(px + 2);
+	  x2 = __PKHBT(b, a, 16);
+
+#else
+
+	  x3 = __PKHBT(b, a, 16);
+	  a = *(px + 2);
+	  x2 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+		px += 2u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x0, c0, acc0);
+        acc1 = __SMLADX(x1, c0, acc1);
+        acc2 = __SMLADX(x3, c0, acc2);
+        acc3 = __SMLADX(x2, c0, acc3);
+      }
+
+      if(k == 3u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+		a = *py;
+		b = *(py+1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		c0 = __PKHBT(a, b, 16);
+
+#else
+
+ 		c0 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* Read x[7], x[8], x[9] */
+	  a = *px;
+	  b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+	  x3 = __PKHBT(a, b, 16);
+	  a = *(px + 2);
+	  x2 = __PKHBT(b, a, 16);
+
+#else
+
+	  x3 = __PKHBT(b, a, 16);
+	  a = *(px + 2);
+	  x2 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	   */
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x0, c0, acc0);
+        acc1 = __SMLADX(x1, c0, acc1);
+        acc2 = __SMLADX(x3, c0, acc2);
+        acc3 = __SMLADX(x2, c0, acc3);
+
+        /* Read y[srcBLen - 7] */
+		c0 = *(py-1);
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+#else
+
+        c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[10] */
+		a = *(px+2);
+		b = *(px+3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+		x3 = __PKHBT(a, b, 16);
+
+#else
+
+ 		x3 = __PKHBT(b, a, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+		px += 3u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x1, c0, acc0);
+        acc1 = __SMLAD(x2, c0, acc1);
+        acc2 = __SMLADX(x2, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      /* Store the results in the accumulators in the destination buffer. */
+	  *pOut++ = (q15_t)(acc0 >> 15);
+	  *pOut++ = (q15_t)(acc1 >> 15);
+	  *pOut++ = (q15_t)(acc2 >> 15);
+	  *pOut++ = (q15_t)(acc3 >> 15);
+
+        /* Increment the pointer pIn1 index, count by 4 */
+        count += 4u;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+
+      /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+       ** No loop unrolling is used. */
+      blkCnt = (uint32_t) blockSize2 % 0x4u;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += ((q31_t) * px++ * *py--);
+          sum += ((q31_t) * px++ * *py--);
+          sum += ((q31_t) * px++ * *py--);
+          sum += ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q15_t) (sum >> 15);
+
+        /* Increment the pointer pIn1 index, count by 1 */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+    else
+    {
+      /* If srcBLen is less than 4,
+       * the blockSize2 loop cannot be unrolled by 4 */
+      blkCnt = (uint32_t) blockSize2;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* srcBLen number of MACS should be performed */
+        k = srcBLen;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum += ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q15_t) (sum >> 15);
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+
+
+    /* --------------------------   
+     * Initializations of stage3   
+     * -------------------------*/
+
+    /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]   
+     * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]   
+     * ....   
+     * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]   
+     * sum +=  x[srcALen-1] * y[srcBLen-1]   
+     */
+
+    /* In this stage the MAC operations are decreased by 1 for every iteration.   
+       The count variable holds the number of MAC operations performed */
+    count = srcBLen - 1u;
+
+    /* Working pointer of inputA */
+    pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+    px = pSrc1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    pIn2 = pSrc2 - 1u;
+    py = pIn2;
+
+    /* -------------------   
+     * Stage3 process   
+     * ------------------*/
+
+    /* For loop unrolling by 4, this stage is divided into two. */
+    /* First part of this stage computes the MAC operations greater than 4 */
+    /* Second part of this stage computes the MAC operations less than or equal to 4 */
+
+    /* The first part of the stage starts here */
+    j = count >> 2u;
+
+    while((j > 0u) && (blockSize3 > 0))
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+	py++;
+
+    while(k > 0u)
+    {	
+      /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+        sum += ((q31_t) * px++ * *py--);
+      /* Decrement the loop counter */
+      k--;
+    }
+
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+      /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pIn2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+
+      j--;
+    }
+
+    /* The second part of the stage starts here */
+    /* SIMD is not used for the next MAC operations,   
+     * so pointer py is updated to read only one sample at a time */
+    py = py + 1u;
+
+    while(blockSize3 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over the remaining count MAC operations. No loop unrolling is used. */
+      k = count;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (sum >> 15);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pSrc2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+
+#endif /*     #ifndef UNALIGNED_SUPPORT_DISABLE      */
+}
+
+/**   
+ * @} end of PartialConv group   
+ */
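
Note on the stage split used above: the blockSize1/blockSize2/blockSize3 expressions decide how many of the requested output samples fall into each of the three stages. The following is a minimal sketch of that arithmetic, lifted out of the function so it can be checked in isolation. The names partial_blocks_t and partial_blocks are hypothetical and are not part of CMSIS-DSP; the sketch assumes the inputs have already been swapped so that srcALen >= srcBLen, and that the requested range is valid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of CMSIS-DSP: holds the number of output
 * samples handled by stage 1, stage 2 and stage 3 respectively. */
typedef struct
{
  int32_t b1;
  int32_t b2;
  int32_t b3;
} partial_blocks_t;

/* Mirrors the blockSize computation in the function above.
 * Assumes srcALen >= srcBLen >= 1 (i.e. after the swap) and
 * firstIndex + numPoints <= srcALen + srcBLen - 1. */
static partial_blocks_t partial_blocks(uint32_t srcALen, uint32_t srcBLen,
                                       uint32_t firstIndex, uint32_t numPoints)
{
  partial_blocks_t r;
  uint32_t check = firstIndex + numPoints;

  assert(srcBLen >= 1u && srcALen >= srcBLen);
  assert(check <= srcALen + (srcBLen - 1u));

  /* Stage 3: outputs with index >= srcALen (MAC count decreasing). */
  r.b3 = ((int32_t) check > (int32_t) srcALen) ?
      (int32_t) check - (int32_t) srcALen : 0;
  r.b3 = ((int32_t) firstIndex > (int32_t) srcALen - 1) ?
      r.b3 - (int32_t) firstIndex + (int32_t) srcALen : r.b3;

  /* Stage 1: outputs with index < srcBLen - 1 (MAC count increasing). */
  r.b1 = ((int32_t) srcBLen - 1) - (int32_t) firstIndex;
  r.b1 = (r.b1 > 0) ?
      ((check > (srcBLen - 1u)) ? r.b1 : (int32_t) numPoints) : 0;

  /* Stage 2: whatever remains in between (srcBLen MACs per output). */
  r.b2 = ((int32_t) check - r.b3) - (r.b1 + (int32_t) firstIndex);
  r.b2 = (r.b2 > 0) ? r.b2 : 0;

  return r;
}
```

For example, with srcALen = 8 and srcBLen = 4, requesting the full output (firstIndex = 0, numPoints = 11) gives 3, 5 and 3 samples for the three stages, while requesting only firstIndex = 2, numPoints = 3 gives 1, 2 and 0. In each valid case the three counts add up to numPoints.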

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_fast_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,611 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_partial_fast_q31.c    
+*    
+* Description:	Fast Q31 Partial convolution.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup PartialConv    
+ * @{    
+ */
+
+/**    
+ * @brief Partial convolution of Q31 sequences (fast version) for Cortex-M3 and Cortex-M4.    
+ * @param[in]       *pSrcA points to the first input sequence.    
+ * @param[in]       srcALen length of the first input sequence.    
+ * @param[in]       *pSrcB points to the second input sequence.    
+ * @param[in]       srcBLen length of the second input sequence.    
+ * @param[out]      *pDst points to the location where the output result is written.    
+ * @param[in]       firstIndex is the first output sample to start with.    
+ * @param[in]       numPoints is the number of output points to be computed.    
+ * @return Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0 srcALen+srcBLen-2].    
+ *    
+ * \par    
+ * See <code>arm_conv_partial_q31()</code> for a slower implementation of this function which uses a 64-bit accumulator to provide higher precision.    
+ */
+
+arm_status arm_conv_partial_fast_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints)
+{
+  q31_t *pIn1;                                   /* inputA pointer               */
+  q31_t *pIn2;                                   /* inputB pointer               */
+  q31_t *pOut = pDst;                            /* output pointer               */
+  q31_t *px;                                     /* Intermediate inputA pointer  */
+  q31_t *py;                                     /* Intermediate inputB pointer  */
+  q31_t *pSrc1, *pSrc2;                          /* Intermediate pointers        */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulators                  */
+  q31_t x0, x1, x2, x3, c0;
+  uint32_t j, k, count, check, blkCnt;
+  int32_t blockSize1, blockSize2, blockSize3;    /* loop counters                 */
+  arm_status status;                             /* status of Partial convolution */
+
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Conditions to check which loopCounter holds    
+     * the first and last indices of the output samples to be calculated. */
+    check = firstIndex + numPoints;
+    blockSize3 = ((int32_t)check > (int32_t)srcALen) ? (int32_t)check - (int32_t)srcALen : 0;
+    blockSize3 = ((int32_t)firstIndex > (int32_t)srcALen - 1) ? blockSize3 - (int32_t)firstIndex + (int32_t)srcALen : blockSize3;
+    blockSize1 = (((int32_t) srcBLen - 1) - (int32_t) firstIndex);
+    blockSize1 = (blockSize1 > 0) ? ((check > (srcBLen - 1u)) ? blockSize1 :
+                                     (int32_t) numPoints) : 0;
+    blockSize2 = (int32_t) check - ((blockSize3 + blockSize1) +
+                                    (int32_t) firstIndex);
+    blockSize2 = (blockSize2 > 0) ? blockSize2 : 0;
+
+    /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+    /* The function is internally
+     * divided into three stages according to the number of multiplications that
+     * take place between inputA samples and inputB samples. In the first stage of the
+     * algorithm, the number of multiplications increases by one for every iteration.
+     * In the second stage of the algorithm, srcBLen multiplications are done per output.
+     * In the third stage of the algorithm, the number of multiplications decreases by one
+     * for every iteration. */
+
+    /* Set the output pointer to point to the firstIndex    
+     * of the output sample to be calculated. */
+    pOut = pDst + firstIndex;
+
+    /* --------------------------    
+     * Initializations of stage1    
+     * -------------------------*/
+
+    /* sum = x[0] * y[0]    
+     * sum = x[0] * y[1] + x[1] * y[0]    
+     * ....    
+     * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]
+     */
+
+    /* In this stage the MAC operations are increased by 1 for every iteration.    
+       The count variable holds the number of MAC operations performed.    
+       Since the partial convolution starts from firstIndex    
+       Number of Macs to be performed is firstIndex + 1 */
+    count = 1u + firstIndex;
+
+    /* Working pointer of inputA */
+    px = pIn1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + firstIndex;
+    py = pSrc2;
+
+    /* ------------------------    
+     * Stage1 process    
+     * ----------------------*/
+
+    /* The first loop starts here */
+    while(blockSize1 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* x[0] * y[srcBLen - 1] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* x[1] * y[srcBLen - 2] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* x[2] * y[srcBLen - 3] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* x[3] * y[srcBLen - 4] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = sum << 1;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* --------------------------    
+     * Initializations of stage2    
+     * ------------------------*/
+
+    /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]    
+     * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]    
+     * ....    
+     * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]
+     */
+
+    /* Working pointer of inputA */
+    if((int32_t)firstIndex - (int32_t)srcBLen + 1 > 0)
+    {
+      px = pIn1 + firstIndex - srcBLen + 1;
+    }
+    else
+    {
+      px = pIn1;
+    }
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* count is the index by which the pointer pIn1 is to be incremented */
+    count = 0u;
+
+    /* -------------------    
+     * Stage2 process    
+     * ------------------*/
+
+    /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.    
+     * So, to loop unroll over blockSize2,    
+     * srcBLen should be greater than or equal to 4 */
+    if(srcBLen >= 4u)
+    {
+      /* Loop unroll over blockSize2 */
+      blkCnt = ((uint32_t) blockSize2 >> 2u);
+
+      while(blkCnt > 0u)
+      {
+        /* Set all accumulators to zero */
+        acc0 = 0;
+        acc1 = 0;
+        acc2 = 0;
+        acc3 = 0;
+
+        /* read x[0], x[1], x[2] samples */
+        x0 = *(px++);
+        x1 = *(px++);
+        x2 = *(px++);
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        do
+        {
+          /* Read y[srcBLen - 1] sample */
+          c0 = *(py--);
+
+          /* Read x[3] sample */
+          x3 = *(px++);
+
+          /* Perform the multiply-accumulate */
+          /* acc0 +=  x[0] * y[srcBLen - 1] */
+          acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+          /* acc1 +=  x[1] * y[srcBLen - 1] */
+          acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+          /* acc2 +=  x[2] * y[srcBLen - 1] */
+          acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x2 * c0)) >> 32);
+
+          /* acc3 +=  x[3] * y[srcBLen - 1] */
+          acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x3 * c0)) >> 32);
+
+          /* Read y[srcBLen - 2] sample */
+          c0 = *(py--);
+
+          /* Read x[4] sample */
+          x0 = *(px++);
+
+          /* Perform the multiply-accumulate */
+          /* acc0 +=  x[1] * y[srcBLen - 2] */
+          acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x1 * c0)) >> 32);
+          /* acc1 +=  x[2] * y[srcBLen - 2] */
+          acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x2 * c0)) >> 32);
+          /* acc2 +=  x[3] * y[srcBLen - 2] */
+          acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x3 * c0)) >> 32);
+          /* acc3 +=  x[4] * y[srcBLen - 2] */
+          acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+          /* Read y[srcBLen - 3] sample */
+          c0 = *(py--);
+
+          /* Read x[5] sample */
+          x1 = *(px++);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[2] * y[srcBLen - 3] */
+          acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x2 * c0)) >> 32);
+          /* acc1 +=  x[3] * y[srcBLen - 3] */
+          acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x3 * c0)) >> 32);
+          /* acc2 +=  x[4] * y[srcBLen - 3] */
+          acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x0 * c0)) >> 32);
+          /* acc3 +=  x[5] * y[srcBLen - 3] */
+          acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+          /* Read y[srcBLen - 4] sample */
+          c0 = *(py--);
+
+          /* Read x[6] sample */
+          x2 = *(px++);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[3] * y[srcBLen - 4] */
+          acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x3 * c0)) >> 32);
+          /* acc1 +=  x[4] * y[srcBLen - 4] */
+          acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x0 * c0)) >> 32);
+          /* acc2 +=  x[5] * y[srcBLen - 4] */
+          acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x1 * c0)) >> 32);
+          /* acc3 +=  x[6] * y[srcBLen - 4] */
+          acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x2 * c0)) >> 32);
+
+
+        } while(--k);
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Read y[srcBLen - 5] sample */
+          c0 = *(py--);
+
+          /* Read x[7] sample */
+          x3 = *(px++);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[4] * y[srcBLen - 5] */
+          acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+          /* acc1 +=  x[5] * y[srcBLen - 5] */
+          acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+          /* acc2 +=  x[6] * y[srcBLen - 5] */
+          acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x2 * c0)) >> 32);
+          /* acc3 +=  x[7] * y[srcBLen - 5] */
+          acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x3 * c0)) >> 32);
+
+          /* Reuse the present samples for the next MAC */
+          x0 = x1;
+          x1 = x2;
+          x2 = x3;
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q31_t) (acc0 << 1);
+        *pOut++ = (q31_t) (acc1 << 1);
+        *pOut++ = (q31_t) (acc2 << 1);
+        *pOut++ = (q31_t) (acc3 << 1);
+
+        /* Advance count, the index into pIn1, by 4 */
+        count += 4u;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+
+      /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.    
+       ** No loop unrolling is used. */
+      blkCnt = (uint32_t) blockSize2 % 0x4u;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum = (q31_t) ((((q63_t) sum << 32) +
+                          ((q63_t) * px++ * (*py--))) >> 32);
+          sum = (q31_t) ((((q63_t) sum << 32) +
+                          ((q63_t) * px++ * (*py--))) >> 32);
+          sum = (q31_t) ((((q63_t) sum << 32) +
+                          ((q63_t) * px++ * (*py--))) >> 32);
+          sum = (q31_t) ((((q63_t) sum << 32) +
+                          ((q63_t) * px++ * (*py--))) >> 32);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum = (q31_t) ((((q63_t) sum << 32) +
+                          ((q63_t) * px++ * (*py--))) >> 32);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = sum << 1;
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+    else
+    {
+      /* If srcBLen is less than 4,
+       * the blockSize2 loop cannot be unrolled by 4 */
+      blkCnt = (uint32_t) blockSize2;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* srcBLen number of MACS should be performed */
+        k = srcBLen;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum = (q31_t) ((((q63_t) sum << 32) +
+                          ((q63_t) * px++ * (*py--))) >> 32);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = sum << 1;
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+
+
+    /* --------------------------    
+     * Initializations of stage3    
+     * -------------------------*/
+
+    /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]    
+     * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]    
+     * ....    
+     * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]    
+     * sum +=  x[srcALen-1] * y[srcBLen-1]    
+     */
+
+    /* In this stage the MAC operations are decreased by 1 for every iteration.    
+       The count variable holds the number of MAC operations performed */
+    count = srcBLen - 1u;
+
+    /* Working pointer of inputA */
+    pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+    px = pSrc1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* -------------------    
+     * Stage3 process    
+     * ------------------*/
+
+    while(blockSize3 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* sum += x[srcALen - srcBLen + 1] * y[srcBLen - 1] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* sum += x[srcALen - srcBLen + 2] * y[srcBLen - 2] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* sum += x[srcALen - srcBLen + 3] * y[srcBLen - 3] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* sum += x[srcALen - srcBLen + 4] * y[srcBLen - 4] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py--))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = sum << 1;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pSrc2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+
+}
+
+/**    
+ * @} end of PartialConv group    
+ */
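
Note on the Q31 kernel that ends above: every multiply-accumulate keeps only the upper 32 bits of the 64-bit sum, and each result is shifted left by one bit before it is stored. The sketch below restates that arithmetic in portable C so the scaling can be checked by hand. The names `mac_hi32` and `dot_q31_fast` are hypothetical and not part of the library, and the code assumes that `>>` on a negative `int64_t` is an arithmetic shift, as the original code does.

```c
#include <stdint.h>

/* One MAC step: the accumulator holds the high word of a Q62 sum.
 * Multiplying by 2^32 instead of shifting avoids left-shifting a
 * negative value. Like the original, this can overflow 64 bits for
 * extreme inputs, so it is only a sketch of the scaling. */
int32_t mac_hi32(int32_t acc, int32_t x, int32_t y)
{
    int64_t wide = (int64_t)acc * 4294967296LL + (int64_t)x * y;
    return (int32_t)(wide >> 32);   /* low 32 bits of precision are dropped */
}

/* Dot product of two Q31 vectors using the truncating MAC above.
 * The final << 1 converts the Q30 high word back to Q31. */
int32_t dot_q31_fast(const int32_t *x, const int32_t *y, uint32_t n)
{
    int32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
    {
        acc = mac_hi32(acc, x[i], y[i]);
    }
    return (int32_t)((uint32_t)acc << 1);
}
```

With x = {0.5, 0.5} and y = {0.5, 0.25} in Q31, the exact result is 0.375, which is 0x30000000 in Q31.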

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_opt_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_opt_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,765 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_partial_opt_q15.c    
+*    
+* Description:	Partial convolution of Q15 sequences.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup PartialConv    
+ * @{    
+ */
+
+/**    
+ * @brief Partial convolution of Q15 sequences.    
+ * @param[in]       *pSrcA points to the first input sequence.    
+ * @param[in]       srcALen length of the first input sequence.    
+ * @param[in]       *pSrcB points to the second input sequence.    
+ * @param[in]       srcBLen length of the second input sequence.    
+ * @param[out]      *pDst points to the location where the output result is written.    
+ * @param[in]       firstIndex is the first output sample to start with.    
+ * @param[in]       numPoints is the number of output points to be computed.    
+ * @param[in]       *pScratch1 points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.   
+ * @param[in]       *pScratch2 points to scratch buffer of size min(srcALen, srcBLen).   
+ * @return  Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0 srcALen+srcBLen-2].    
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output, and scratch buffers should be 32-bit aligned.
+ *    
+ * Refer to <code>arm_conv_partial_fast_q15()</code> for a faster but less precise version of this function for Cortex-M3 and Cortex-M4.   
+ *  
+ * 
+ */
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+arm_status arm_conv_partial_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+
+  q15_t *pOut = pDst;                            /* output pointer */
+  q15_t *pScr1 = pScratch1;                      /* Temporary pointer for scratch1 */
+  q15_t *pScr2 = pScratch2;                      /* Temporary pointer for scratch2 */
+  q63_t acc0, acc1, acc2, acc3;                  /* Accumulator */
+  q31_t x1, x2, x3;                              /* Temporary variables to hold state and coefficient values */
+  q31_t y1, y2;                                  /* State variables */
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  uint32_t j, k, blkCnt;                         /* loop counter */
+  arm_status status;                             /* Status variable */
+  uint32_t tapCnt;                               /* loop count */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Temporary pointer for scratch2 */
+    py = pScratch2;
+
+    /* pointer to take end of scratch2 buffer */
+    pScr2 = pScratch2 + srcBLen - 1;
+
+    /* points to smaller length sequence */
+    px = pIn2;
+
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcBLen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* copy second buffer in reversal manner */
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcBLen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* copy second buffer in reversal manner for remaining samples */
+      *pScr2-- = *px++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Initialize temporary scratch pointer */
+    pScr1 = pScratch1;
+
+    /* Fill (srcBLen - 1u) zeros in scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update temporary scratch pointer */
+    pScr1 += (srcBLen - 1u);
+
+    /* Copy bigger length sequence(srcALen) samples in scratch1 buffer */
+
+    /* Copy (srcALen) samples in scratch buffer */
+    arm_copy_q15(pIn1, pScr1, srcALen);
+
+    /* Update pointers */
+    pScr1 += srcALen;
+
+    /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update pointer */
+    pScr1 += (srcBLen - 1u);
+
+    /* Initialization of pIn2 pointer */
+    pIn2 = py;
+
+    pScratch1 += firstIndex;
+
+    pOut = pDst + firstIndex;
+
+    /* Actual convolution process starts here */
+    blkCnt = (numPoints) >> 2;
+
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulators */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* Read two samples from scratch1 buffer */
+      x1 = *__SIMD32(pScr1)++;
+
+      /* Read next two samples from scratch1 buffer */
+      x2 = *__SIMD32(pScr1)++;
+
+      tapCnt = (srcBLen) >> 2u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read four samples from smaller buffer */
+        y1 = _SIMD32_OFFSET(pIn2);
+        y2 = _SIMD32_OFFSET(pIn2 + 2u);
+
+        /* multiply and accumulate */
+        acc0 = __SMLALD(x1, y1, acc0);
+        acc2 = __SMLALD(x2, y1, acc2);
+
+        /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x2, x1, 0);
+#else
+        x3 = __PKHBT(x1, x2, 0);
+#endif
+
+        /* multiply and accumulate */
+        acc1 = __SMLALDX(x3, y1, acc1);
+
+        /* Read next two samples from scratch1 buffer */
+        x1 = _SIMD32_OFFSET(pScr1);
+
+        /* multiply and accumulate */
+        acc0 = __SMLALD(x2, y2, acc0);
+        acc2 = __SMLALD(x1, y2, acc2);
+
+        /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x1, x2, 0);
+#else
+        x3 = __PKHBT(x2, x1, 0);
+#endif
+
+        acc3 = __SMLALDX(x3, y1, acc3);
+        acc1 = __SMLALDX(x3, y2, acc1);
+
+        x2 = _SIMD32_OFFSET(pScr1 + 2u);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x2, x1, 0);
+#else
+        x3 = __PKHBT(x1, x2, 0);
+#endif
+
+        acc3 = __SMLALDX(x3, y2, acc3);
+
+        /* update scratch pointers */
+        pIn2 += 4u;
+        pScr1 += 4u;
+
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* Update scratch pointer for remaining samples of smaller length sequence */
+      pScr1 -= 4u;
+
+      /* apply same above for remaining samples of smaller length sequence */
+      tapCnt = (srcBLen) & 3u;
+
+      while(tapCnt > 0u)
+      {
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pIn2);
+        acc1 += (*pScr1++ * *pIn2);
+        acc2 += (*pScr1++ * *pIn2);
+        acc3 += (*pScr1++ * *pIn2++);
+
+        pScr1 -= 3u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+
+      /* Store the results in the accumulators in the destination buffer. */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+#else
+
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Initialization of inputB pointer */
+      pIn2 = py;
+
+      pScratch1 += 4u;
+
+    }
+
+
+    blkCnt = numPoints & 0x3;
+
+    /* Calculate convolution for remaining samples of Bigger length sequence */
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulator */
+      acc0 = 0;
+
+      tapCnt = (srcBLen) >> 1u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read next two samples from scratch1 buffer */
+        x1 = *__SIMD32(pScr1)++;
+
+        /* Read two samples from smaller buffer */
+        y1 = *__SIMD32(pIn2)++;
+
+        acc0 = __SMLALD(x1, y1, acc0);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      tapCnt = (srcBLen) & 1u;
+
+      /* apply same above for remaining samples of smaller length sequence */
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pIn2++);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+      /* Initialization of inputB pointer */
+      pIn2 = py;
+
+      pScratch1 += 1u;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+#else
+
+arm_status arm_conv_partial_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+
+  q15_t *pOut = pDst;                            /* output pointer */
+  q15_t *pScr1 = pScratch1;                      /* Temporary pointer for scratch1 */
+  q15_t *pScr2 = pScratch2;                      /* Temporary pointer for scratch2 */
+  q63_t acc0, acc1, acc2, acc3;                  /* Accumulator */
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  uint32_t j, k, blkCnt;                         /* loop counter */
+  arm_status status;                             /* Status variable */
+  uint32_t tapCnt;                               /* loop count */
+  q15_t x10, x11, x20, x21;                      /* Temporary variables to hold srcA buffer */
+  q15_t y10, y11;                                /* Temporary variables to hold srcB buffer */
+
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Temporary pointer for scratch2 */
+    py = pScratch2;
+
+    /* pointer to take end of scratch2 buffer */
+    pScr2 = pScratch2 + srcBLen - 1;
+
+    /* points to smaller length sequence */
+    px = pIn2;
+
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcBLen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* copy second buffer in reversal manner */
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+      *pScr2-- = *px++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcBLen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* copy second buffer in reversal manner for remaining samples */
+      *pScr2-- = *px++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Initialize temporary scratch pointer */
+    pScr1 = pScratch1;
+
+    /* Fill (srcBLen - 1u) zeros in scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update temporary scratch pointer */
+    pScr1 += (srcBLen - 1u);
+
+    /* Copy bigger length sequence(srcALen) samples in scratch1 buffer */
+
+
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcALen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* copy the longer sequence into scratch1 in forward order */
+      *pScr1++ = *pIn1++;
+      *pScr1++ = *pIn1++;
+      *pScr1++ = *pIn1++;
+      *pScr1++ = *pIn1++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcALen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* copy the remaining samples of the longer sequence */
+      *pScr1++ = *pIn1++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+
+    /* Apply loop unrolling and write 4 zeros at a time. */
+    k = (srcBLen - 1u) >> 2u;
+
+    /* First part of the processing with loop unrolling writes 4 zeros at a time.
+     ** a second loop below writes the remaining 1 to 3 zeros. */
+    while(k > 0u)
+    {
+      /* fill (srcBLen - 1u) zeros at the end of the scratch1 buffer */
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, write the remaining zeros here.
+     ** No loop unrolling is used. */
+    k = (srcBLen - 1u) % 0x4u;
+
+    while(k > 0u)
+    {
+      /* write the remaining zeros */
+      *pScr1++ = 0;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+
+    /* Initialization of pIn2 pointer */
+    pIn2 = py;
+
+    pScratch1 += firstIndex;
+
+    pOut = pDst + firstIndex;
+
+    /* Actual convolution process starts here */
+    blkCnt = (numPoints) >> 2;
+
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulators */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* Read two samples from scratch1 buffer */
+      x10 = *pScr1++;
+      x11 = *pScr1++;
+
+      /* Read next two samples from scratch1 buffer */
+      x20 = *pScr1++;
+      x21 = *pScr1++;
+
+      tapCnt = (srcBLen) >> 2u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read two samples from smaller buffer */
+        y10 = *pIn2;
+        y11 = *(pIn2 + 1u);
+
+        /* multiply and accumulate */
+        acc0 += (q63_t) x10 *y10;
+        acc0 += (q63_t) x11 *y11;
+        acc2 += (q63_t) x20 *y10;
+        acc2 += (q63_t) x21 *y11;
+
+        /* multiply and accumulate */
+        acc1 += (q63_t) x11 *y10;
+        acc1 += (q63_t) x20 *y11;
+
+        /* Read next two samples from scratch1 buffer */
+        x10 = *pScr1;
+        x11 = *(pScr1 + 1u);
+
+        /* multiply and accumulate */
+        acc3 += (q63_t) x21 *y10;
+        acc3 += (q63_t) x10 *y11;
+
+        /* Read next two samples from scratch2 buffer */
+        y10 = *(pIn2 + 2u);
+        y11 = *(pIn2 + 3u);
+
+        /* multiply and accumulate */
+        acc0 += (q63_t) x20 *y10;
+        acc0 += (q63_t) x21 *y11;
+        acc2 += (q63_t) x10 *y10;
+        acc2 += (q63_t) x11 *y11;
+        acc1 += (q63_t) x21 *y10;
+        acc1 += (q63_t) x10 *y11;
+
+        /* Read next two samples from scratch1 buffer */
+        x20 = *(pScr1 + 2);
+        x21 = *(pScr1 + 3);
+
+        /* multiply and accumulate */
+        acc3 += (q63_t) x11 *y10;
+        acc3 += (q63_t) x20 *y11;
+
+        /* update scratch pointers */
+        pIn2 += 4u;
+        pScr1 += 4u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* Update scratch pointer for remaining samples of smaller length sequence */
+      pScr1 -= 4u;
+
+      /* apply same above for remaining samples of smaller length sequence */
+      tapCnt = (srcBLen) & 3u;
+
+      while(tapCnt > 0u)
+      {
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pIn2);
+        acc1 += (*pScr1++ * *pIn2);
+        acc2 += (*pScr1++ * *pIn2);
+        acc3 += (*pScr1++ * *pIn2++);
+
+        pScr1 -= 3u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+
+      /* Store the results in the accumulators in the destination buffer. */
+      *pOut++ = __SSAT((acc0 >> 15), 16);
+      *pOut++ = __SSAT((acc1 >> 15), 16);
+      *pOut++ = __SSAT((acc2 >> 15), 16);
+      *pOut++ = __SSAT((acc3 >> 15), 16);
+
+
+      /* Initialization of inputB pointer */
+      pIn2 = py;
+
+      pScratch1 += 4u;
+
+    }
+
+
+    blkCnt = numPoints & 0x3;
+
+    /* Calculate convolution for remaining samples of Bigger length sequence */
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulator */
+      acc0 = 0;
+
+      tapCnt = (srcBLen) >> 1u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read next two samples from scratch1 buffer */
+        x10 = *pScr1++;
+        x11 = *pScr1++;
+
+        /* Read two samples from smaller buffer */
+        y10 = *pIn2++;
+        y11 = *pIn2++;
+
+        /* multiply and accumulate */
+        acc0 += (q63_t) x10 *y10;
+        acc0 += (q63_t) x11 *y11;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      tapCnt = (srcBLen) & 1u;
+
+      /* apply same above for remaining samples of smaller length sequence */
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pIn2++);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+
+      /* Initialization of inputB pointer */
+      pIn2 = py;
+
+      pScratch1 += 1u;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+
+/**    
+ * @} end of PartialConv group    
+ */
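
Note on arm_conv_partial_opt_q15() above: the function reverses the shorter sequence into scratch2, lays out scratch1 as (srcBLen - 1) zeros, then the longer sequence, then (srcBLen - 1) zeros, and computes each requested output as a plain dot product over a sliding window. The following unoptimized sketch shows the same layout without SIMD or loop unrolling. `partial_conv_q15_ref` and `sat_q15` are hypothetical names, not library functions. The sketch assumes a_len >= b_len >= 1 (the library swaps the inputs itself) and an arithmetic right shift for negative values.

```c
#include <stdint.h>
#include <string.h>

/* Saturate to the Q15 range; stands in for __SSAT(v, 16). */
int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Writes dst[first] .. dst[first + count - 1] of the convolution of a and b.
 * scratch1 needs a_len + 2 * (b_len - 1) samples, scratch2 needs b_len.
 * Returns 0 on success, -1 if the requested range is out of bounds. */
int partial_conv_q15_ref(const int16_t *a, uint32_t a_len,
                         const int16_t *b, uint32_t b_len,
                         int16_t *dst, uint32_t first, uint32_t count,
                         int16_t *scratch1, int16_t *scratch2)
{
    uint32_t pad = b_len - 1u;

    if (first + count > a_len + pad) return -1;

    /* scratch2 = b in reverse order */
    for (uint32_t i = 0; i < b_len; i++) scratch2[pad - i] = b[i];

    /* scratch1 = zeros | a | zeros */
    memset(scratch1, 0, pad * sizeof *scratch1);
    memcpy(scratch1 + pad, a, a_len * sizeof *a);
    memset(scratch1 + pad + a_len, 0, pad * sizeof *scratch1);

    for (uint32_t n = first; n < first + count; n++)
    {
        int64_t acc = 0;                       /* 64-bit accumulator */
        for (uint32_t k = 0; k < b_len; k++)
        {
            acc += (int64_t)scratch1[n + k] * scratch2[k];
        }
        dst[n] = sat_q15(acc >> 15);           /* Q30 product sum back to Q15 */
    }
    return 0;
}
```

As in the library routine, outputs are written at dst + first, so entries outside the requested range are left untouched.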

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_opt_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_opt_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,803 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_partial_opt_q7.c    
+*    
+* Description:	Partial convolution of Q7 sequences.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup PartialConv    
+ * @{    
+ */
+
+/**    
+ * @brief Partial convolution of Q7 sequences.    
+ * @param[in]       *pSrcA points to the first input sequence.    
+ * @param[in]       srcALen length of the first input sequence.    
+ * @param[in]       *pSrcB points to the second input sequence.    
+ * @param[in]       srcBLen length of the second input sequence.    
+ * @param[out]      *pDst points to the location where the output result is written.    
+ * @param[in]       firstIndex is the first output sample to start with.    
+ * @param[in]       numPoints is the number of output points to be computed.    
+ * @param[in]      *pScratch1 points to scratch buffer (of type q15_t) of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
+ * @param[in]      *pScratch2 points to scratch buffer (of type q15_t) of size min(srcALen, srcBLen).
+ * @return  Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output, scratch1 and scratch2 buffers should be 32-bit aligned.
+ * 
+ *
+ * 
+ */
+
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+arm_status arm_conv_partial_opt_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+
+  q15_t *pScr2, *pScr1;                          /* Intermediate pointers for scratch pointers */
+  q15_t x4;                                      /* Temporary input variable */
+  q7_t *pIn1, *pIn2;                             /* inputA and inputB pointer */
+  uint32_t j, k, blkCnt, tapCnt;                 /* loop counter */
+  q7_t *px;                                      /* Temporary input1 pointer */
+  q15_t *py;                                     /* Temporary input2 pointer */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulator */
+  q31_t x1, x2, x3, y1;                          /* Temporary input variables */
+  arm_status status;
+  q7_t *pOut = pDst;                             /* output pointer */
+  q7_t out0, out1, out2, out3;                   /* temporary variables */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Working pointer to the start of the scratch2 buffer */
+    pScr2 = pScratch2;
+
+    /* px points to the last sample of the shorter sequence */
+    px = pIn2 + srcBLen - 1;
+
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcBLen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Copy the shorter sequence into scratch2 in reverse order */
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcBLen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Copy the remaining samples of the shorter sequence in reverse order */
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Initialize temporary scratch pointer */
+    pScr1 = pScratch1;
+
+    /* Fill (srcBLen - 1u) zeros in scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update temporary scratch pointer */
+    pScr1 += (srcBLen - 1u);
+
+    /* Copy (srcALen) samples in scratch buffer */
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcALen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Copy the longer sequence into scratch1 in forward order */
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcALen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Copy the remaining samples of the longer sequence in forward order */
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update pointer */
+    pScr1 += (srcBLen - 1u);
+
+
+    /* Temporary pointer for scratch2 */
+    py = pScratch2;
+
+    /* Initialization of pIn2 pointer */
+    pIn2 = (q7_t *) py;
+
+    pScr2 = py;
+
+    pOut = pDst + firstIndex;
+
+    pScratch1 += firstIndex;
+
+    /* Actual convolution process starts here */
+    blkCnt = (numPoints) >> 2;
+
+
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulators */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* Read two samples from scratch1 buffer */
+      x1 = *__SIMD32(pScr1)++;
+
+      /* Read next two samples from scratch1 buffer */
+      x2 = *__SIMD32(pScr1)++;
+
+      tapCnt = (srcBLen) >> 2u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read two samples from smaller buffer */
+        y1 = _SIMD32_OFFSET(pScr2);
+
+        /* multiply and accumulate */
+        acc0 = __SMLAD(x1, y1, acc0);
+        acc2 = __SMLAD(x2, y1, acc2);
+
+        /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x2, x1, 0);
+#else
+        x3 = __PKHBT(x1, x2, 0);
+#endif
+
+        /* multiply and accumulate */
+        acc1 = __SMLADX(x3, y1, acc1);
+
+        /* Read next two samples from scratch1 buffer */
+        x1 = *__SIMD32(pScr1)++;
+
+        /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x1, x2, 0);
+#else
+        x3 = __PKHBT(x2, x1, 0);
+#endif
+
+        acc3 = __SMLADX(x3, y1, acc3);
+
+        /* Read next two samples from smaller buffer */
+        y1 = _SIMD32_OFFSET(pScr2 + 2u);
+
+        acc0 = __SMLAD(x2, y1, acc0);
+
+        acc2 = __SMLAD(x1, y1, acc2);
+
+        acc1 = __SMLADX(x3, y1, acc1);
+
+        x2 = *__SIMD32(pScr1)++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+        x3 = __PKHBT(x2, x1, 0);
+#else
+        x3 = __PKHBT(x1, x2, 0);
+#endif
+
+        acc3 = __SMLADX(x3, y1, acc3);
+
+        pScr2 += 4u;
+
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+
+
+      /* Update scratch pointer for remaining samples of smaller length sequence */
+      pScr1 -= 4u;
+
+
+      /* Process the remaining (srcBLen & 3) taps of the shorter sequence */
+      tapCnt = (srcBLen) & 3u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pScr2);
+        acc1 += (*pScr1++ * *pScr2);
+        acc2 += (*pScr1++ * *pScr2);
+        acc3 += (*pScr1++ * *pScr2++);
+
+        pScr1 -= 3u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+      /* Store the result in the accumulator in the destination buffer. */
+      out0 = (q7_t) (__SSAT(acc0 >> 7u, 8));
+      out1 = (q7_t) (__SSAT(acc1 >> 7u, 8));
+      out2 = (q7_t) (__SSAT(acc2 >> 7u, 8));
+      out3 = (q7_t) (__SSAT(acc3 >> 7u, 8));
+
+      *__SIMD32(pOut)++ = __PACKq7(out0, out1, out2, out3);
+
+      /* Initialization of inputB pointer */
+      pScr2 = py;
+
+      pScratch1 += 4u;
+
+    }
+
+    blkCnt = (numPoints) & 0x3;
+
+    /* Compute the remaining (numPoints & 3) output samples one at a time */
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulator */
+      acc0 = 0;
+
+      tapCnt = (srcBLen) >> 1u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read next two samples from scratch1 buffer */
+        x1 = *__SIMD32(pScr1)++;
+
+        /* Read two samples from smaller buffer */
+        y1 = *__SIMD32(pScr2)++;
+
+        acc0 = __SMLAD(x1, y1, acc0);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      tapCnt = (srcBLen) & 1u;
+
+      /* Process the last tap when srcBLen is odd */
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pScr2++);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q7_t) (__SSAT(acc0 >> 7u, 8));
+
+      /* Initialization of inputB pointer */
+      pScr2 = py;
+
+      pScratch1 += 1u;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+
+
+  }
+
+  return (status);
+
+}
+
+#else
+
+arm_status arm_conv_partial_opt_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+
+  q15_t *pScr2, *pScr1;                          /* Intermediate pointers for scratch pointers */
+  q15_t x4;                                      /* Temporary input variable */
+  q7_t *pIn1, *pIn2;                             /* inputA and inputB pointer */
+  uint32_t j, k, blkCnt, tapCnt;                 /* loop counter */
+  q7_t *px;                                      /* Temporary input1 pointer */
+  q15_t *py;                                     /* Temporary input2 pointer */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulator */
+  arm_status status;
+  q7_t *pOut = pDst;                             /* output pointer */
+  q15_t x10, x11, x20, x21;                      /* Temporary input variables */
+  q15_t y10, y11;                                /* Temporary input variables */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Working pointer to the start of the scratch2 buffer */
+    pScr2 = pScratch2;
+
+    /* px points to the last sample of the shorter sequence */
+    px = pIn2 + srcBLen - 1;
+
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcBLen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Copy the shorter sequence into scratch2 in reverse order */
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcBLen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Copy the remaining samples of the shorter sequence in reverse order */
+      x4 = (q15_t) * px--;
+      *pScr2++ = x4;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Initialize temporary scratch pointer */
+    pScr1 = pScratch1;
+
+    /* Fill (srcBLen - 1u) zeros in scratch buffer */
+    arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+    /* Update temporary scratch pointer */
+    pScr1 += (srcBLen - 1u);
+
+    /* Copy (srcALen) samples in scratch buffer */
+    /* Apply loop unrolling and do 4 Copies simultaneously. */
+    k = srcALen >> 2u;
+
+    /* First part of the processing with loop unrolling copies 4 data points at a time.       
+     ** a second loop below copies for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Copy the longer sequence into scratch1 in forward order */
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, copy remaining samples here.       
+     ** No loop unrolling is used. */
+    k = srcALen % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Copy the remaining samples of the longer sequence in forward order */
+      x4 = (q15_t) * pIn1++;
+      *pScr1++ = x4;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Fill (srcBLen - 1u) zeros at end of scratch buffer, unrolled by 4 */
+    k = (srcBLen - 1u) >> 2u;
+
+    /* First part of the processing with loop unrolling writes 4 zeros at a time.
+     ** a second loop below writes the remaining 1 to 3 zeros. */
+    while(k > 0u)
+    {
+      /* Write four zeros */
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+      *pScr1++ = 0;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, write the remaining zeros here.
+     ** No loop unrolling is used. */
+    k = (srcBLen - 1u) % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Write one zero */
+      *pScr1++ = 0;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+
+    /* Temporary pointer for scratch2 */
+    py = pScratch2;
+
+    /* Initialization of pIn2 pointer */
+    pIn2 = (q7_t *) py;
+
+    pScr2 = py;
+
+    pOut = pDst + firstIndex;
+
+    pScratch1 += firstIndex;
+
+    /* Actual convolution process starts here */
+    blkCnt = (numPoints) >> 2;
+
+
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulators */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* Read two samples from scratch1 buffer */
+      x10 = *pScr1++;
+      x11 = *pScr1++;
+
+      /* Read next two samples from scratch1 buffer */
+      x20 = *pScr1++;
+      x21 = *pScr1++;
+
+      tapCnt = (srcBLen) >> 2u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read two samples from smaller buffer */
+        y10 = *pScr2;
+        y11 = *(pScr2 + 1u);
+
+        /* multiply and accumulate */
+        acc0 += (q31_t) x10 *y10;
+        acc0 += (q31_t) x11 *y11;
+        acc2 += (q31_t) x20 *y10;
+        acc2 += (q31_t) x21 *y11;
+
+
+        acc1 += (q31_t) x11 *y10;
+        acc1 += (q31_t) x20 *y11;
+
+        /* Read next two samples from scratch1 buffer */
+        x10 = *pScr1;
+        x11 = *(pScr1 + 1u);
+
+        /* multiply and accumulate */
+        acc3 += (q31_t) x21 *y10;
+        acc3 += (q31_t) x10 *y11;
+
+        /* Read next two samples from scratch2 buffer */
+        y10 = *(pScr2 + 2u);
+        y11 = *(pScr2 + 3u);
+
+        /* multiply and accumulate */
+        acc0 += (q31_t) x20 *y10;
+        acc0 += (q31_t) x21 *y11;
+        acc2 += (q31_t) x10 *y10;
+        acc2 += (q31_t) x11 *y11;
+        acc1 += (q31_t) x21 *y10;
+        acc1 += (q31_t) x10 *y11;
+
+        /* Read next two samples from scratch1 buffer */
+        x20 = *(pScr1 + 2);
+        x21 = *(pScr1 + 3);
+
+        /* multiply and accumulate */
+        acc3 += (q31_t) x11 *y10;
+        acc3 += (q31_t) x20 *y11;
+
+        /* update scratch pointers */
+
+        pScr1 += 4u;
+        pScr2 += 4u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+
+
+      /* Update scratch pointer for remaining samples of smaller length sequence */
+      pScr1 -= 4u;
+
+
+      /* Process the remaining (srcBLen & 3) taps of the shorter sequence */
+      tapCnt = (srcBLen) & 3u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pScr2);
+        acc1 += (*pScr1++ * *pScr2);
+        acc2 += (*pScr1++ * *pScr2);
+        acc3 += (*pScr1++ * *pScr2++);
+
+        pScr1 -= 3u;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q7_t) (__SSAT(acc0 >> 7u, 8));
+      *pOut++ = (q7_t) (__SSAT(acc1 >> 7u, 8));
+      *pOut++ = (q7_t) (__SSAT(acc2 >> 7u, 8));
+      *pOut++ = (q7_t) (__SSAT(acc3 >> 7u, 8));
+
+      /* Initialization of inputB pointer */
+      pScr2 = py;
+
+      pScratch1 += 4u;
+
+    }
+
+    blkCnt = (numPoints) & 0x3;
+
+    /* Compute the remaining (numPoints & 3) output samples one at a time */
+    while(blkCnt > 0)
+    {
+      /* Initialize temporary scratch pointer as scratch1 */
+      pScr1 = pScratch1;
+
+      /* Clear accumulator */
+      acc0 = 0;
+
+      tapCnt = (srcBLen) >> 1u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read next two samples from scratch1 buffer */
+        x10 = *pScr1++;
+        x11 = *pScr1++;
+
+        /* Read two samples from smaller buffer */
+        y10 = *pScr2++;
+        y11 = *pScr2++;
+
+        /* multiply and accumulate */
+        acc0 += (q31_t) x10 *y10;
+        acc0 += (q31_t) x11 *y11;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      tapCnt = (srcBLen) & 1u;
+
+      /* Process the last tap when srcBLen is odd */
+      while(tapCnt > 0u)
+      {
+
+        /* accumulate the results */
+        acc0 += (*pScr1++ * *pScr2++);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      blkCnt--;
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q7_t) (__SSAT(acc0 >> 7u, 8));
+
+      /* Initialization of inputB pointer */
+      pScr2 = py;
+
+      pScratch1 += 1u;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+
+  }
+
+  return (status);
+
+}
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+
+
+/**    
+ * @} end of PartialConv group    
+ */

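Review note on arm_conv_partial_opt_q7.c: the optimized routine above can be cross-checked against a direct evaluation of the convolution sum. The block below is a minimal reference sketch under stated assumptions, not part of the CMSIS-DSP sources. The names `ref_conv_partial_q7`, `sat_q7` and `scratch1_len` are hypothetical helpers introduced here for illustration. Like the code above, it writes results at `dst + firstIndex`, shifts the accumulator right by 7 and saturates to 8 bits. It assumes that `>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined in C but is what common ARM and x86 compilers do.

```c
#include <stdint.h>
#include <stddef.h>

/* Saturate a 32-bit value to the signed 8-bit range (same effect as __SSAT(x, 8)). */
static int8_t sat_q7(int32_t x)
{
    if (x > 127)  return 127;
    if (x < -128) return -128;
    return (int8_t)x;
}

/* Hypothetical reference: direct-form partial convolution of Q7 sequences.
 * Returns 0 on success, -1 if the requested subset is out of range. */
static int ref_conv_partial_q7(const int8_t *a, uint32_t aLen,
                               const int8_t *b, uint32_t bLen,
                               int8_t *dst,
                               uint32_t firstIndex, uint32_t numPoints)
{
    /* Same range check as the optimized routine. */
    if (firstIndex + numPoints > aLen + bLen - 1u) {
        return -1;
    }

    for (uint32_t n = firstIndex; n < firstIndex + numPoints; n++) {
        int32_t acc = 0;

        /* y[n] = sum over k of a[n - k] * b[k], skipping out-of-range terms. */
        for (uint32_t k = 0; k < bLen; k++) {
            if (n >= k && (n - k) < aLen) {
                acc += (int32_t)a[n - k] * (int32_t)b[k];
            }
        }

        /* Q7 * Q7 gives Q14; shift back to Q7 and saturate. */
        dst[n] = sat_q7(acc >> 7);
    }
    return 0;
}

/* Number of q15_t elements documented for pScratch1:
 * max(srcALen, srcBLen) + 2 * min(srcALen, srcBLen) - 2. */
static uint32_t scratch1_len(uint32_t aLen, uint32_t bLen)
{
    uint32_t lo = (aLen < bLen) ? aLen : bLen;
    uint32_t hi = (aLen < bLen) ? bLen : aLen;
    return hi + 2u * lo - 2u;
}
```

Comparing the output of such a reference with `arm_conv_partial_opt_q7()` over several `firstIndex`/`numPoints` combinations is one way to exercise both the aligned and the `UNALIGNED_SUPPORT_DISABLE` builds. Samples of `dst` outside the requested window are left untouched in both.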
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,786 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_conv_partial_q15.c   
+*   
+* Description:	Partial convolution of Q15 sequences.  
+*   
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+* 
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup PartialConv   
+ * @{   
+ */
+
+/**   
+ * @brief Partial convolution of Q15 sequences.   
+ * @param[in]       *pSrcA points to the first input sequence.   
+ * @param[in]       srcALen length of the first input sequence.   
+ * @param[in]       *pSrcB points to the second input sequence.   
+ * @param[in]       srcBLen length of the second input sequence.   
+ * @param[out]      *pDst points to the location where the output result is written.   
+ * @param[in]       firstIndex is the first output sample to start with.   
+ * @param[in]       numPoints is the number of output points to be computed.   
+ * @return  Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].
+ *   
+ * Refer to <code>arm_conv_partial_fast_q15()</code> for a faster but less precise version of this function for Cortex-M3 and Cortex-M4.  
+ * 
+ * \par    
+ * Refer to the function <code>arm_conv_partial_opt_q15()</code> for a faster implementation of this function using scratch buffers.
+ * 
+ */
+
+
+arm_status arm_conv_partial_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints)
+{
+
+#if (defined(ARM_MATH_CM4) || defined(ARM_MATH_CM3)) && !defined(UNALIGNED_SUPPORT_DISABLE)
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q15_t *pIn1;                                   /* inputA pointer               */
+  q15_t *pIn2;                                   /* inputB pointer               */
+  q15_t *pOut = pDst;                            /* output pointer               */
+  q63_t sum, acc0, acc1, acc2, acc3;             /* Accumulator                  */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1, *pSrc2;                          /* Intermediate pointers        */
+  q31_t x0, x1, x2, x3, c0;                      /* Temporary input variables */
+  uint32_t j, k, count, check, blkCnt;
+  int32_t blockSize1, blockSize2, blockSize3;    /* loop counter                 */
+  arm_status status;                             /* status of Partial convolution */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Conditions to check which loopCounter holds   
+     * the first and last indices of the output samples to be calculated. */
+    check = firstIndex + numPoints;
+    blockSize3 = ((int32_t)check > (int32_t)srcALen) ? (int32_t)check - (int32_t)srcALen : 0;
+    blockSize3 = ((int32_t)firstIndex > (int32_t)srcALen - 1) ? blockSize3 - (int32_t)firstIndex + (int32_t)srcALen : blockSize3;
+    blockSize1 = (((int32_t) srcBLen - 1) - (int32_t) firstIndex);
+    blockSize1 = (blockSize1 > 0) ? ((check > (srcBLen - 1u)) ? blockSize1 :
+                                     (int32_t) numPoints) : 0;
+    blockSize2 = (int32_t) check - ((blockSize3 + blockSize1) +
+                                    (int32_t) firstIndex);
+    blockSize2 = (blockSize2 > 0) ? blockSize2 : 0;
+
+    /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+    /* The function is internally
+     * divided into three stages according to the number of multiplications that take
+     * place between inputA samples and inputB samples. In the first stage of the
+     * algorithm, the multiplications increase by one for every iteration.
+     * In the second stage of the algorithm, srcBLen multiplications are done per output.
+     * In the third stage of the algorithm, the multiplications decrease by one
+     * for every iteration. */
+
+    /* Set the output pointer to point to the firstIndex   
+     * of the output sample to be calculated. */
+    pOut = pDst + firstIndex;
+
+    /* --------------------------   
+     * Initializations of stage1   
+     * -------------------------*/
+
+    /* sum = x[0] * y[0]   
+     * sum = x[0] * y[1] + x[1] * y[0]   
+     * ....   
+     * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]
+     */
+
+    /* In this stage the MAC operations are increased by 1 for every iteration.   
+       The count variable holds the number of MAC operations performed.   
+       Since the partial convolution starts from firstIndex   
+       Number of Macs to be performed is firstIndex + 1 */
+    count = 1u + firstIndex;
+
+    /* Working pointer of inputA */
+    px = pIn1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + firstIndex;
+    py = pSrc2;
+
+    /* ------------------------   
+     * Stage1 process   
+     * ----------------------*/
+
+    /* For loop unrolling by 4, this stage is divided into two. */
+    /* First part of this stage computes the MAC operations less than 4 */
+    /* Second part of this stage computes the MAC operations greater than or equal to 4 */
+
+    /* The first part of the stage starts here */
+    while((count < 4u) && (blockSize1 > 0))
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over number of MAC operations between   
+       * inputA samples and inputB samples */
+      k = count;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum = __SMLALD(*px++, *py--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* The second part of the stage starts here */
+    /* The internal loop, over count, is unrolled by 4 */
+    /* To read the last two inputB samples using SIMD:
+     * y[srcBLen] and y[srcBLen-1] coefficients, py is decremented by 1 */
+    py = py - 1;
+
+    while(blockSize1 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        /* x[0], x[1] are multiplied with y[srcBLen - 1], y[srcBLen - 2] respectively */
+        sum = __SMLALDX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+        /* x[2], x[3] are multiplied with y[srcBLen - 3], y[srcBLen - 4] respectively */
+        sum = __SMLALDX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* For the next MAC operations, the pointer py is used without SIMD   
+       * So, py is incremented by 1 */
+      py = py + 1u;
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum = __SMLALD(*px++, *py--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2 - 1u;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* --------------------------   
+     * Initializations of stage2   
+     * ------------------------*/
+
+    /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]   
+     * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]   
+     * ....   
+     * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]
+     */
+
+    /* Working pointer of inputA */
+    if((int32_t)firstIndex - (int32_t)srcBLen + 1 > 0)
+    {
+      px = pIn1 + firstIndex - srcBLen + 1;
+    }
+    else
+    {
+      px = pIn1;
+    }
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+
+  /* --------------------   
+   * Stage2 process   
+   * -------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+   * So, to loop unroll over blockSize2,   
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      py = py - 1u;
+
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+
+      /* read x[0], x[1] samples */
+      x0 = *__SIMD32(px);
+      /* read x[1], x[2] samples */
+      x1 = _SIMD32_OFFSET(px+1);
+	  px+= 2u;
+
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read the last two inputB samples using SIMD:   
+         * y[srcBLen - 1] and y[srcBLen - 2] */
+        c0 = *__SIMD32(py)--;
+
+        /* acc0 +=  x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] */
+        acc0 = __SMLALDX(x0, c0, acc0);
+
+        /* acc1 +=  x[1] * y[srcBLen - 1] + x[2] * y[srcBLen - 2] */
+        acc1 = __SMLALDX(x1, c0, acc1);
+
+        /* Read x[2], x[3] */
+        x2 = *__SIMD32(px);
+
+        /* Read x[3], x[4] */
+        x3 = _SIMD32_OFFSET(px+1);
+
+        /* acc2 +=  x[2] * y[srcBLen - 1] + x[3] * y[srcBLen - 2] */
+        acc2 = __SMLALDX(x2, c0, acc2);
+
+        /* acc3 +=  x[3] * y[srcBLen - 1] + x[4] * y[srcBLen - 2] */
+        acc3 = __SMLALDX(x3, c0, acc3);
+
+        /* Read y[srcBLen - 3] and y[srcBLen - 4] */
+        c0 = *__SIMD32(py)--;
+
+        /* acc0 +=  x[2] * y[srcBLen - 3] + x[3] * y[srcBLen - 4] */
+        acc0 = __SMLALDX(x2, c0, acc0);
+
+        /* acc1 +=  x[3] * y[srcBLen - 3] + x[4] * y[srcBLen - 4] */
+        acc1 = __SMLALDX(x3, c0, acc1);
+
+        /* Read x[4], x[5] */
+        x0 = _SIMD32_OFFSET(px+2);
+
+        /* Read x[5], x[6] */
+        x1 = _SIMD32_OFFSET(px+3);
+		px += 4u;
+
+        /* acc2 +=  x[4] * y[srcBLen - 3] + x[5] * y[srcBLen - 4] */
+        acc2 = __SMLALDX(x0, c0, acc2);
+
+        /* acc3 +=  x[5] * y[srcBLen - 3] + x[6] * y[srcBLen - 4] */
+        acc3 = __SMLALDX(x1, c0, acc3);
+
+      } while(--k);
+
+      /* For the next MAC operations, SIMD is not used   
+       * So, the 16-bit pointer of inputB, py, is updated */
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      if(k == 1u)
+      {
+        /* Read y[srcBLen - 5] */
+        c0 = *(py+1);
+
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+
+#else
+
+        c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[7] */
+        x3 = *__SIMD32(px);
+		px++;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALD(x0, c0, acc0);
+        acc1 = __SMLALD(x1, c0, acc1);
+        acc2 = __SMLALDX(x1, c0, acc2);
+        acc3 = __SMLALDX(x3, c0, acc3);
+      }
+
+      if(k == 2u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+        c0 = _SIMD32_OFFSET(py);
+
+        /* Read x[7], x[8] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[9] */
+        x2 = _SIMD32_OFFSET(px+1);
+		px += 2u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALDX(x0, c0, acc0);
+        acc1 = __SMLALDX(x1, c0, acc1);
+        acc2 = __SMLALDX(x3, c0, acc2);
+        acc3 = __SMLALDX(x2, c0, acc3);
+      }
+
+      if(k == 3u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+        c0 = _SIMD32_OFFSET(py);
+
+        /* Read x[7], x[8] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[9] */
+        x2 = _SIMD32_OFFSET(px+1);
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALDX(x0, c0, acc0);
+        acc1 = __SMLALDX(x1, c0, acc1);
+        acc2 = __SMLALDX(x3, c0, acc2);
+        acc3 = __SMLALDX(x2, c0, acc3);
+
+		c0 = *(py-1);
+
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+#else
+
+        c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[10] */
+        x3 =  _SIMD32_OFFSET(px+2);
+		px += 3u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALDX(x1, c0, acc0);
+        acc1 = __SMLALD(x2, c0, acc1);
+        acc2 = __SMLALDX(x2, c0, acc2);
+        acc3 = __SMLALDX(x3, c0, acc3);
+      }
+
+
+      /* Store the results in the accumulators in the destination buffer. */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+#else
+
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+
+      /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+       ** No loop unrolling is used. */
+      blkCnt = (uint32_t) blockSize2 % 0x4u;
+  	  
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += (q63_t) ((q31_t) * px++ * *py--);
+          sum += (q63_t) ((q31_t) * px++ * *py--);
+          sum += (q63_t) ((q31_t) * px++ * *py--);
+          sum += (q63_t) ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += (q63_t) ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q15_t) (__SSAT(sum >> 15, 16));
+
+        /* Increment the pointer pIn1 index, count by 1 */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+    else
+    {
+      /* If srcBLen is less than 4,   
+       * the blockSize2 loop is not unrolled */
+      blkCnt = (uint32_t) blockSize2;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* srcBLen number of MACS should be performed */
+        k = srcBLen;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum += (q63_t) ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q15_t) (__SSAT(sum >> 15, 16));
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+  
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+
+
+    /* --------------------------   
+     * Initializations of stage3   
+     * -------------------------*/
+
+    /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]   
+     * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]   
+     * ....   
+     * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]   
+     * sum +=  x[srcALen-1] * y[srcBLen-1]   
+     */
+
+    /* In this stage the MAC operations are decreased by 1 for every iteration.   
+       The count variable holds the number of MAC operations performed */
+    count = srcBLen - 1u;
+
+    /* Working pointer of inputA */
+    pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+    px = pSrc1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    pIn2 = pSrc2 - 1u;
+    py = pIn2;
+
+    /* -------------------   
+     * Stage3 process   
+     * ------------------*/
+
+    /* For loop unrolling by 4, this stage is divided into two. */
+    /* First part of this stage computes the MAC operations greater than 4 */
+    /* Second part of this stage computes the MAC operations less than or equal to 4 */
+
+    /* The first part of the stage starts here */
+    j = count >> 2u;
+
+    while((j > 0u) && (blockSize3 > 0))
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* x[srcALen - srcBLen + 1], x[srcALen - srcBLen + 2] are multiplied   
+         * with y[srcBLen - 1], y[srcBLen - 2] respectively */
+        sum = __SMLALDX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+        /* x[srcALen - srcBLen + 3], x[srcALen - srcBLen + 4] are multiplied   
+         * with y[srcBLen - 3], y[srcBLen - 4] respectively */
+        sum = __SMLALDX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* For the next MAC operations, the pointer py is used without SIMD   
+       * So, py is incremented by 1 */
+      py = py + 1u;
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* sum += x[srcALen - srcBLen + 5] * y[srcBLen - 5] */
+        sum = __SMLALD(*px++, *py--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pIn2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+
+      j--;
+    }
+
+    /* The second part of the stage starts here */
+    /* SIMD is not used for the next MAC operations,   
+     * so pointer py is updated to read only one sample at a time */
+    py = py + 1u;
+
+    while(blockSize3 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* No loop unrolling here: count MACs are computed one at a time. */
+      k = count;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+        sum = __SMLALD(*px++, *py--, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pSrc2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q15_t *pIn1 = pSrcA;                           /* inputA pointer */
+  q15_t *pIn2 = pSrcB;                           /* inputB pointer */
+  q63_t sum;                                     /* Accumulator */
+  uint32_t i, j;                                 /* loop counters */
+  arm_status status;                             /* status of Partial convolution */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+    /* Loop to calculate convolution for output length number of values */
+    for (i = firstIndex; i <= (firstIndex + numPoints - 1); i++)
+    {
+      /* Initialize sum with zero to carry on MAC operations */
+      sum = 0;
+
+      /* Loop to perform MAC operations according to convolution equation */
+      for (j = 0; j <= i; j++)
+      {
+        /* Check the array limitations */
+        if(((i - j) < srcBLen) && (j < srcALen))
+        {
+          /* z[i] += x[j] * y[i-j] */
+          sum += ((q31_t) pIn1[j] * (pIn2[i - j]));
+        }
+      }
+
+      /* Store the output in the destination buffer */
+      pDst[i] = (q15_t) __SSAT((sum >> 15u), 16u);
+    }
+    /* Set status as ARM_MATH_SUCCESS as there are no argument errors */
+    status = ARM_MATH_SUCCESS;
+  }
+  return (status);
+
+#endif /* #if (defined(ARM_MATH_CM4) || defined(ARM_MATH_CM3)) && !defined(UNALIGNED_SUPPORT_DISABLE)  */
+
+}
+
+/**   
+ * @} end of PartialConv group   
+ */
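
For reference, the Cortex-M0 fallback path of arm_conv_partial_q15() above reduces to a direct-form loop over the requested output range. The following is a minimal standalone sketch of that loop, not the CMSIS implementation: it assumes plain int16_t/int64_t in place of q15_t/q63_t, and the names conv_partial_q15_ref and sat16 are hypothetical (sat16 stands in for the __SSAT(x, 16) intrinsic).

```c
#include <stdint.h>

/* Hypothetical stand-in for __SSAT(v, 16): clamp to the int16_t range. */
static int16_t sat16(int64_t v)
{
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t) v;
}

/* Hypothetical reference routine: computes output samples
 * firstIndex .. firstIndex + numPoints - 1 of conv(a, b) in Q15.
 * Returns 0 on success, -1 if the requested range lies outside
 * [0, aLen + bLen - 2]. Only the requested entries of dst are written. */
static int conv_partial_q15_ref(const int16_t *a, uint32_t aLen,
                                const int16_t *b, uint32_t bLen,
                                int16_t *dst,
                                uint32_t firstIndex, uint32_t numPoints)
{
  uint32_t i, j;

  /* Same range check as the library code */
  if ((firstIndex + numPoints) > (aLen + (bLen - 1u)))
  {
    return -1;
  }

  for (i = firstIndex; i < firstIndex + numPoints; i++)
  {
    int64_t sum = 0;

    for (j = 0; j <= i; j++)
    {
      /* Skip index pairs that fall outside either input */
      if ((j < aLen) && ((i - j) < bLen))
      {
        /* z[i] += x[j] * y[i-j]; each product fits in 32 bits */
        sum += (int32_t) a[j] * b[i - j];
      }
    }

    /* Q30 accumulator back to Q15, then saturate.
     * Assumes an arithmetic right shift for negative sums. */
    dst[i] = sat16(sum >> 15);
  }

  return 0;
}
```

With a = {0.5, 0.5} and b = {0.5, 0.25} in Q15 (16384, 16384 and 16384, 8192), a call with firstIndex = 1 and numPoints = 2 writes dst[1] and dst[2] and leaves dst[0] untouched, which matches how the library version offsets pOut by firstIndex.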

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,607 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_partial_q31.c    
+*    
+* Description:	Partial convolution of Q31 sequences.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup PartialConv    
+ * @{    
+ */
+
+/**    
+ * @brief Partial convolution of Q31 sequences.    
+ * @param[in]       *pSrcA points to the first input sequence.    
+ * @param[in]       srcALen length of the first input sequence.    
+ * @param[in]       *pSrcB points to the second input sequence.    
+ * @param[in]       srcBLen length of the second input sequence.    
+ * @param[out]      *pDst points to the location where the output result is written.    
+ * @param[in]       firstIndex is the first output sample to start with.    
+ * @param[in]       numPoints is the number of output points to be computed.    
+ * @return Returns either ARM_MATH_SUCCESS if the function completed correctly or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].    
+ *    
+ * See <code>arm_conv_partial_fast_q31()</code> for a faster but less precise implementation of this function for Cortex-M3 and Cortex-M4.    
+ */
+
+arm_status arm_conv_partial_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t *pIn1;                                   /* inputA pointer               */
+  q31_t *pIn2;                                   /* inputB pointer               */
+  q31_t *pOut = pDst;                            /* output pointer               */
+  q31_t *px;                                     /* Intermediate inputA pointer  */
+  q31_t *py;                                     /* Intermediate inputB pointer  */
+  q31_t *pSrc1, *pSrc2;                          /* Intermediate pointers        */
+  q63_t sum, acc0, acc1, acc2;                   /* Accumulator                  */
+  q31_t x0, x1, x2, c0;
+  uint32_t j, k, count, check, blkCnt;
+  int32_t blockSize1, blockSize2, blockSize3;    /* loop counter                 */
+  arm_status status;                             /* status of Partial convolution */
+
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Conditions to check which loopCounter holds    
+     * the first and last indices of the output samples to be calculated. */
+    check = firstIndex + numPoints;
+    blockSize3 = ((int32_t)check > (int32_t)srcALen) ? (int32_t)check - (int32_t)srcALen : 0;
+    blockSize3 = ((int32_t)firstIndex > (int32_t)srcALen - 1) ? blockSize3 - (int32_t)firstIndex + (int32_t)srcALen : blockSize3;
+    blockSize1 = (((int32_t) srcBLen - 1) - (int32_t) firstIndex);
+    blockSize1 = (blockSize1 > 0) ? ((check > (srcBLen - 1u)) ? blockSize1 :
+                                     (int32_t) numPoints) : 0;
+    blockSize2 = (int32_t) check - ((blockSize3 + blockSize1) +
+                                    (int32_t) firstIndex);
+    blockSize2 = (blockSize2 > 0) ? blockSize2 : 0;
+
+    /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+    /* The function is internally    
+     * divided into three stages according to the number of multiplications that have to    
+     * take place between inputA samples and inputB samples. In the first stage of the    
+     * algorithm, the multiplications increase by one for every iteration.    
+     * In the second stage of the algorithm, srcBLen number of multiplications are done.    
+     * In the third stage of the algorithm, the multiplications decrease by one    
+     * for every iteration. */
+
+    /* Set the output pointer to point to the firstIndex    
+     * of the output sample to be calculated. */
+    pOut = pDst + firstIndex;
+
+    /* --------------------------    
+     * Initializations of stage1    
+     * -------------------------*/
+
+    /* sum = x[0] * y[0]    
+     * sum = x[0] * y[1] + x[1] * y[0]    
+     * ....    
+     * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]    
+     */
+
+    /* In this stage the MAC operations are increased by 1 for every iteration.    
+       The count variable holds the number of MAC operations performed.    
+       Since the partial convolution starts from firstIndex    
+       Number of Macs to be performed is firstIndex + 1 */
+    count = 1u + firstIndex;
+
+    /* Working pointer of inputA */
+    px = pIn1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + firstIndex;
+    py = pSrc2;
+
+    /* ------------------------    
+     * Stage1 process    
+     * ----------------------*/
+
+    /* The first loop starts here */
+    while(blockSize1 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* x[0] * y[srcBLen - 1] */
+        sum += (q63_t) * px++ * (*py--);
+        /* x[1] * y[srcBLen - 2] */
+        sum += (q63_t) * px++ * (*py--);
+        /* x[2] * y[srcBLen - 3] */
+        sum += (q63_t) * px++ * (*py--);
+        /* x[3] * y[srcBLen - 4] */
+        sum += (q63_t) * px++ * (*py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += (q63_t) * px++ * (*py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q31_t) (sum >> 31);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* --------------------------    
+     * Initializations of stage2    
+     * ------------------------*/
+
+    /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]    
+     * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]    
+     * ....    
+     * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]    
+     */
+
+    /* Working pointer of inputA */
+    if((int32_t)firstIndex - (int32_t)srcBLen + 1 > 0)
+    {
+      px = pIn1 + firstIndex - srcBLen + 1;
+    }
+    else
+    {
+      px = pIn1;
+    }
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* count is the index by which the pointer pIn1 is to be incremented */
+    count = 0u;
+
+    /* -------------------    
+     * Stage2 process    
+     * ------------------*/
+
+    /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.    
+     * So, to loop unroll over blockSize2,    
+     * srcBLen should be greater than or equal to 4 */
+    if(srcBLen >= 4u)
+    {
+      /* Loop unroll over blkCnt */
+
+      blkCnt = blockSize2 / 3;
+      while(blkCnt > 0u)
+      {
+        /* Set all accumulators to zero */
+        acc0 = 0;
+        acc1 = 0;
+        acc2 = 0;
+
+        /* read x[0], x[1] samples */
+        x0 = *(px++);
+        x1 = *(px++);
+
+        /* Apply loop unrolling and compute 3 MACs simultaneously. */
+        k = srcBLen / 3;
+
+        /* First part of the processing with loop unrolling.  Compute 3 MACs at a time.        
+         ** a second loop below computes MACs for the remaining 1 to 2 samples. */
+        do
+        {
+          /* Read y[srcBLen - 1] sample */
+          c0 = *(py);
+
+          /* Read x[2] sample */
+          x2 = *(px);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[0] * y[srcBLen - 1] */
+          acc0 += (q63_t) x0 *c0;
+          /* acc1 +=  x[1] * y[srcBLen - 1] */
+          acc1 += (q63_t) x1 *c0;
+          /* acc2 +=  x[2] * y[srcBLen - 1] */
+          acc2 += (q63_t) x2 *c0;
+
+          /* Read y[srcBLen - 2] sample */
+          c0 = *(py - 1u);
+
+          /* Read x[3] sample */
+          x0 = *(px + 1u);
+
+          /* Perform the multiply-accumulate */
+          /* acc0 +=  x[1] * y[srcBLen - 2] */
+          acc0 += (q63_t) x1 *c0;
+          /* acc1 +=  x[2] * y[srcBLen - 2] */
+          acc1 += (q63_t) x2 *c0;
+          /* acc2 +=  x[3] * y[srcBLen - 2] */
+          acc2 += (q63_t) x0 *c0;
+
+          /* Read y[srcBLen - 3] sample */
+          c0 = *(py - 2u);
+
+          /* Read x[4] sample */
+          x1 = *(px + 2u);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[2] * y[srcBLen - 3] */
+          acc0 += (q63_t) x2 *c0;
+          /* acc1 +=  x[3] * y[srcBLen - 3] */
+          acc1 += (q63_t) x0 *c0;
+          /* acc2 +=  x[4] * y[srcBLen - 3] */
+          acc2 += (q63_t) x1 *c0;
+
+
+          px += 3u;
+
+          py -= 3u;
+
+        } while(--k);
+
+        /* If the srcBLen is not a multiple of 3, compute any remaining MACs here.        
+         ** No loop unrolling is used. */
+        k = srcBLen - (3 * (srcBLen / 3));
+
+        while(k > 0u)
+        {
+          /* Read y[srcBLen - 5] sample */
+          c0 = *(py--);
+
+          /* Read x[7] sample */
+          x2 = *(px++);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[4] * y[srcBLen - 5] */
+          acc0 += (q63_t) x0 *c0;
+          /* acc1 +=  x[5] * y[srcBLen - 5] */
+          acc1 += (q63_t) x1 *c0;
+          /* acc2 +=  x[6] * y[srcBLen - 5] */
+          acc2 += (q63_t) x2 *c0;
+
+          /* Reuse the present samples for the next MAC */
+          x0 = x1;
+          x1 = x2;
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q31_t) (acc0 >> 31);
+        *pOut++ = (q31_t) (acc1 >> 31);
+        *pOut++ = (q31_t) (acc2 >> 31);
+
+        /* Increment the pointer pIn1 index, count by 3 */
+        count += 3u;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+
+      /* If the blockSize2 is not a multiple of 3, compute any remaining output samples here.        
+       ** No loop unrolling is used. */
+      blkCnt = blockSize2 - 3 * (blockSize2 / 3);
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += (q63_t) * px++ * (*py--);
+          sum += (q63_t) * px++ * (*py--);
+          sum += (q63_t) * px++ * (*py--);
+          sum += (q63_t) * px++ * (*py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum += (q63_t) * px++ * (*py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q31_t) (sum >> 31);
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+    else
+    {
+      /* If srcBLen is less than 4,    
+       * the blockSize2 loop is not unrolled */
+      blkCnt = (uint32_t) blockSize2;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* srcBLen number of MACS should be performed */
+        k = srcBLen;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum += (q63_t) * px++ * (*py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q31_t) (sum >> 31);
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+
+
+    /* --------------------------    
+     * Initializations of stage3    
+     * -------------------------*/
+
+    /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]    
+     * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]    
+     * ....    
+     * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]    
+     * sum +=  x[srcALen-1] * y[srcBLen-1]    
+     */
+
+    /* In this stage the MAC operations are decreased by 1 for every iteration.    
+       The count variable holds the number of MAC operations performed */
+    count = srcBLen - 1u;
+
+    /* Working pointer of inputA */
+    pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+    px = pSrc1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* -------------------    
+     * Stage3 process    
+     * ------------------*/
+
+    while(blockSize3 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        sum += (q63_t) * px++ * (*py--);
+        sum += (q63_t) * px++ * (*py--);
+        sum += (q63_t) * px++ * (*py--);
+        sum += (q63_t) * px++ * (*py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += (q63_t) * px++ * (*py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q31_t) (sum >> 31);
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pSrc2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q31_t *pIn1 = pSrcA;                           /* inputA pointer */
+  q31_t *pIn2 = pSrcB;                           /* inputB pointer */
+  q63_t sum;                                     /* Accumulator */
+  uint32_t i, j;                                 /* loop counters */
+  arm_status status;                             /* status of Partial convolution */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+    /* Loop to calculate convolution for output length number of values */
+    for (i = firstIndex; i <= (firstIndex + numPoints - 1); i++)
+    {
+      /* Initialize sum with zero to carry on MAC operations */
+      sum = 0;
+
+      /* Loop to perform MAC operations according to convolution equation */
+      for (j = 0; j <= i; j++)
+      {
+        /* Check the array limitations */
+        if(((i - j) < srcBLen) && (j < srcALen))
+        {
+          /* z[i] += x[j] * y[i-j] */
+          sum += ((q63_t) pIn1[j] * (pIn2[i - j]));
+        }
+      }
+
+      /* Store the output in the destination buffer */
+      pDst[i] = (q31_t) (sum >> 31u);
+    }
+    /* Set status as ARM_MATH_SUCCESS as there are no argument errors */
+    status = ARM_MATH_SUCCESS;
+  }
+  return (status);
+
+#endif /*    #ifndef ARM_MATH_CM0_FAMILY      */
+
+}
+
+/**    
+ * @} end of PartialConv group    
+ */
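
The Cortex-M0 branch above is the easiest place to see what "partial" convolution means: only output samples firstIndex through firstIndex + numPoints - 1 are written, and each 64-bit sum is scaled back to Q31 with a right shift by 31. The following is a minimal portable sketch of that same loop, not the library routine itself. The name `conv_partial_q31_ref` and the 0 / -1 return codes are hypothetical and stand in for `arm_conv_partial_q31()` and `arm_status`. It assumes arithmetic right shift of negative values and non-empty inputs, and, like the source, it does not saturate the result.

```c
#include <stdint.h>

/* Hypothetical reference routine (not part of CMSIS-DSP).
 * Writes dst[first_index .. first_index + num_points - 1] of conv(x, y) in Q31.
 * Returns 0 on success, -1 if the requested range exceeds the output length.
 * Assumes x_len >= 1 and y_len >= 1. */
static int conv_partial_q31_ref(const int32_t *x, uint32_t x_len,
                                const int32_t *y, uint32_t y_len,
                                int32_t *dst,
                                uint32_t first_index, uint32_t num_points)
{
    /* Full convolution has x_len + y_len - 1 output samples. */
    if (first_index + num_points > x_len + y_len - 1u) {
        return -1;
    }

    for (uint32_t i = first_index; i < first_index + num_points; i++) {
        int64_t sum = 0;

        /* z[i] = sum over j of x[j] * y[i - j], skipping out-of-range terms. */
        for (uint32_t j = 0; j <= i; j++) {
            if (j < x_len && (i - j) < y_len) {
                sum += (int64_t)x[j] * y[i - j];
            }
        }

        /* Q31 * Q31 gives a Q62 product; shifting by 31 returns to Q31.
         * Assumption: >> on a negative int64_t is an arithmetic shift. */
        dst[i] = (int32_t)(sum >> 31);
    }
    return 0;
}
```

Note that, as in the source, the output index is absolute: asking for first_index = 1 leaves dst[0] untouched rather than packing results at the start of the buffer.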

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_partial_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,741 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_conv_partial_q7.c   
+*   
+* Description:	Partial convolution of Q7 sequences.   
+*   
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup PartialConv   
+ * @{   
+ */
+
+/**   
+ * @brief Partial convolution of Q7 sequences.   
+ * @param[in]       *pSrcA points to the first input sequence.   
+ * @param[in]       srcALen length of the first input sequence.   
+ * @param[in]       *pSrcB points to the second input sequence.   
+ * @param[in]       srcBLen length of the second input sequence.   
+ * @param[out]      *pDst points to the location where the output result is written.   
+ * @param[in]       firstIndex is the first output sample to start with.   
+ * @param[in]       numPoints is the number of output points to be computed.   
+ * @return  Returns ARM_MATH_SUCCESS if the function completed correctly, or ARM_MATH_ARGUMENT_ERROR if the requested subset is not in the range [0, srcALen+srcBLen-2].   
+ *  
+ * \par    
+ * Refer to the function <code>arm_conv_partial_opt_q7()</code> for a faster implementation of this function.
+ *  
+ */
+
+arm_status arm_conv_partial_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  uint32_t firstIndex,
+  uint32_t numPoints)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q7_t *pIn1;                                    /* inputA pointer */
+  q7_t *pIn2;                                    /* inputB pointer */
+  q7_t *pOut = pDst;                             /* output pointer */
+  q7_t *px;                                      /* Intermediate inputA pointer */
+  q7_t *py;                                      /* Intermediate inputB pointer */
+  q7_t *pSrc1, *pSrc2;                           /* Intermediate pointers */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulator */
+  q31_t input1, input2;
+  q15_t in1, in2;
+  q7_t x0, x1, x2, x3, c0, c1;
+  uint32_t j, k, count, check, blkCnt;
+  int32_t blockSize1, blockSize2, blockSize3;    /* loop counter */
+  arm_status status;
+
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+
+    /* The algorithm implementation is based on the lengths of the inputs. */
+    /* srcB is always made to slide across srcA. */
+    /* So srcBLen is always considered as shorter or equal to srcALen */
+    if(srcALen >= srcBLen)
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcA;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcB;
+    }
+    else
+    {
+      /* Initialization of inputA pointer */
+      pIn1 = pSrcB;
+
+      /* Initialization of inputB pointer */
+      pIn2 = pSrcA;
+
+      /* srcBLen is always considered as shorter or equal to srcALen */
+      j = srcBLen;
+      srcBLen = srcALen;
+      srcALen = j;
+    }
+
+    /* Conditions to check which loopCounter holds   
+     * the first and last indices of the output samples to be calculated. */
+    check = firstIndex + numPoints;
+    blockSize3 = ((int32_t)check > (int32_t)srcALen) ? (int32_t)check - (int32_t)srcALen : 0;
+    blockSize3 = ((int32_t)firstIndex > (int32_t)srcALen - 1) ? blockSize3 - (int32_t)firstIndex + (int32_t)srcALen : blockSize3;
+    blockSize1 = (((int32_t) srcBLen - 1) - (int32_t) firstIndex);
+    blockSize1 = (blockSize1 > 0) ? ((check > (srcBLen - 1u)) ? blockSize1 :
+                                     (int32_t) numPoints) : 0;
+    blockSize2 = (int32_t) check - ((blockSize3 + blockSize1) +
+                                    (int32_t) firstIndex);
+    blockSize2 = (blockSize2 > 0) ? blockSize2 : 0;
+
+    /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+    /* The function is internally   
+     * divided into three stages according to the number of multiplications that have to   
+     * take place between inputA samples and inputB samples. In the first stage of the   
+     * algorithm, the multiplications increase by one for every iteration.   
+     * In the second stage of the algorithm, srcBLen number of multiplications are done.   
+     * In the third stage of the algorithm, the multiplications decrease by one   
+     * for every iteration. */
+
+    /* Set the output pointer to point to the firstIndex   
+     * of the output sample to be calculated. */
+    pOut = pDst + firstIndex;
+
+    /* --------------------------   
+     * Initializations of stage1   
+     * -------------------------*/
+
+    /* sum = x[0] * y[0]   
+     * sum = x[0] * y[1] + x[1] * y[0]   
+     * ....   
+     * sum = x[0] * y[srcBlen - 1] + x[1] * y[srcBlen - 2] +...+ x[srcBLen - 1] * y[0]   
+     */
+
+    /* In this stage the MAC operations are increased by 1 for every iteration.   
+       The count variable holds the number of MAC operations performed.   
+       Since the partial convolution starts from firstIndex,   
+       the number of MACs to be performed is firstIndex + 1 */
+    count = 1u + firstIndex;
+
+    /* Working pointer of inputA */
+    px = pIn1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + firstIndex;
+    py = pSrc2;
+
+    /* ------------------------   
+     * Stage1 process   
+     * ----------------------*/
+
+    /* The first stage starts here */
+    while(blockSize1 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* x[0] , x[1] */
+        in1 = (q15_t) * px++;
+        in2 = (q15_t) * px++;
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* y[srcBLen - 1] , y[srcBLen - 2] */
+        in1 = (q15_t) * py--;
+        in2 = (q15_t) * py--;
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* x[0] * y[srcBLen - 1] */
+        /* x[1] * y[srcBLen - 2] */
+        sum = __SMLAD(input1, input2, sum);
+
+        /* x[2] , x[3] */
+        in1 = (q15_t) * px++;
+        in2 = (q15_t) * px++;
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* y[srcBLen - 3] , y[srcBLen - 4] */
+        in1 = (q15_t) * py--;
+        in2 = (q15_t) * py--;
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* x[2] * y[srcBLen - 3] */
+        /* x[3] * y[srcBLen - 4] */
+        sum = __SMLAD(input1, input2, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q7_t) (__SSAT(sum >> 7, 8));
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      py = ++pSrc2;
+      px = pIn1;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Decrement the loop counter */
+      blockSize1--;
+    }
+
+    /* --------------------------   
+     * Initializations of stage2   
+     * ------------------------*/
+
+    /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]   
+     * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]   
+     * ....   
+     * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]   
+     */
+
+    /* Working pointer of inputA */
+    if((int32_t)firstIndex - (int32_t)srcBLen + 1 > 0)
+    {
+      px = pIn1 + firstIndex - srcBLen + 1;
+    }
+    else
+    {
+      px = pIn1;
+    }
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* count is the index by which the pointer pIn1 is to be incremented */
+    count = 0u;
+
+    /* -------------------   
+     * Stage2 process   
+     * ------------------*/
+
+    /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+     * So, to loop unroll over blockSize2,   
+     * srcBLen should be greater than or equal to 4 */
+    if(srcBLen >= 4u)
+    {
+      /* Loop unroll over blockSize2, by 4 */
+      blkCnt = ((uint32_t) blockSize2 >> 2u);
+
+      while(blkCnt > 0u)
+      {
+        /* Set all accumulators to zero */
+        acc0 = 0;
+        acc1 = 0;
+        acc2 = 0;
+        acc3 = 0;
+
+        /* read x[0], x[1], x[2] samples */
+        x0 = *(px++);
+        x1 = *(px++);
+        x2 = *(px++);
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        do
+        {
+          /* Read y[srcBLen - 1] sample */
+          c0 = *(py--);
+          /* Read y[srcBLen - 2] sample */
+          c1 = *(py--);
+
+          /* Read x[3] sample */
+          x3 = *(px++);
+
+          /* x[0] and x[1] are packed */
+          in1 = (q15_t) x0;
+          in2 = (q15_t) x1;
+
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* y[srcBLen - 1]   and y[srcBLen - 2] are packed */
+          in1 = (q15_t) c0;
+          in2 = (q15_t) c1;
+
+          input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* acc0 += x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2]  */
+          acc0 = __SMLAD(input1, input2, acc0);
+
+          /* x[1] and x[2] are packed */
+          in1 = (q15_t) x1;
+          in2 = (q15_t) x2;
+
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* acc1 += x[1] * y[srcBLen - 1] + x[2] * y[srcBLen - 2]  */
+          acc1 = __SMLAD(input1, input2, acc1);
+
+          /* x[2] and x[3] are packed */
+          in1 = (q15_t) x2;
+          in2 = (q15_t) x3;
+
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* acc2 += x[2] * y[srcBLen - 1] + x[3] * y[srcBLen - 2]  */
+          acc2 = __SMLAD(input1, input2, acc2);
+
+          /* Read x[4] sample */
+          x0 = *(px++);
+
+          /* x[3] and x[4] are packed */
+          in1 = (q15_t) x3;
+          in2 = (q15_t) x0;
+
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* acc3 += x[3] * y[srcBLen - 1] + x[4] * y[srcBLen - 2]  */
+          acc3 = __SMLAD(input1, input2, acc3);
+
+          /* Read y[srcBLen - 3] sample */
+          c0 = *(py--);
+          /* Read y[srcBLen - 4] sample */
+          c1 = *(py--);
+
+          /* Read x[5] sample */
+          x1 = *(px++);
+
+          /* x[2] and x[3] are packed */
+          in1 = (q15_t) x2;
+          in2 = (q15_t) x3;
+
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* y[srcBLen - 3] and y[srcBLen - 4] are packed */
+          in1 = (q15_t) c0;
+          in2 = (q15_t) c1;
+
+          input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* acc0 += x[2] * y[srcBLen - 3] + x[3] * y[srcBLen - 4]  */
+          acc0 = __SMLAD(input1, input2, acc0);
+
+          /* x[3] and x[4] are packed */
+          in1 = (q15_t) x3;
+          in2 = (q15_t) x0;
+
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* acc1 += x[3] * y[srcBLen - 3] + x[4] * y[srcBLen - 4]  */
+          acc1 = __SMLAD(input1, input2, acc1);
+
+          /* x[4] and x[5] are packed */
+          in1 = (q15_t) x0;
+          in2 = (q15_t) x1;
+
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* acc2 += x[4] * y[srcBLen - 3] + x[5] * y[srcBLen - 4]  */
+          acc2 = __SMLAD(input1, input2, acc2);
+
+          /* Read x[6] sample */
+          x2 = *(px++);
+
+          /* x[5] and x[6] are packed */
+          in1 = (q15_t) x1;
+          in2 = (q15_t) x2;
+
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* acc3 += x[5] * y[srcBLen - 3] + x[6] * y[srcBLen - 4]  */
+          acc3 = __SMLAD(input1, input2, acc3);
+
+        } while(--k);
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Read y[srcBLen - 5] sample */
+          c0 = *(py--);
+
+          /* Read x[7] sample */
+          x3 = *(px++);
+
+          /* Perform the multiply-accumulates */
+          /* acc0 +=  x[4] * y[srcBLen - 5] */
+          acc0 += ((q31_t) x0 * c0);
+          /* acc1 +=  x[5] * y[srcBLen - 5] */
+          acc1 += ((q31_t) x1 * c0);
+          /* acc2 +=  x[6] * y[srcBLen - 5] */
+          acc2 += ((q31_t) x2 * c0);
+          /* acc3 +=  x[7] * y[srcBLen - 5] */
+          acc3 += ((q31_t) x3 * c0);
+
+          /* Reuse the present samples for the next MAC */
+          x0 = x1;
+          x1 = x2;
+          x2 = x3;
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q7_t) (__SSAT(acc0 >> 7, 8));
+        *pOut++ = (q7_t) (__SSAT(acc1 >> 7, 8));
+        *pOut++ = (q7_t) (__SSAT(acc2 >> 7, 8));
+        *pOut++ = (q7_t) (__SSAT(acc3 >> 7, 8));
+
+        /* Increment the pointer pIn1 index, count by 4 */
+        count += 4u;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+
+      /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+       ** No loop unrolling is used. */
+      blkCnt = (uint32_t) blockSize2 % 0x4u;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        k = srcBLen >> 2u;
+
+        /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+         ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+        while(k > 0u)
+        {
+
+          /* Reading two inputs of SrcA buffer and packing */
+          in1 = (q15_t) * px++;
+          in2 = (q15_t) * px++;
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* Reading two inputs of SrcB buffer and packing */
+          in1 = (q15_t) * py--;
+          in2 = (q15_t) * py--;
+          input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* Perform the multiply-accumulates */
+          sum = __SMLAD(input1, input2, sum);
+
+          /* Reading two inputs of SrcA buffer and packing */
+          in1 = (q15_t) * px++;
+          in2 = (q15_t) * px++;
+          input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* Reading two inputs of SrcB buffer and packing */
+          in1 = (q15_t) * py--;
+          in2 = (q15_t) * py--;
+          input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+          /* Perform the multiply-accumulates */
+          sum = __SMLAD(input1, input2, sum);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+         ** No loop unrolling is used. */
+        k = srcBLen % 0x4u;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulates */
+          sum += ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q7_t) (__SSAT(sum >> 7, 8));
+
+        /* Increment the pointer pIn1 index, count by 1 */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+    else
+    {
+      /* If srcBLen is less than 4,   
+       * the blockSize2 loop cannot be unrolled by 4 */
+      blkCnt = (uint32_t) blockSize2;
+
+      while(blkCnt > 0u)
+      {
+        /* Accumulator is made zero for every iteration */
+        sum = 0;
+
+        /* srcBLen number of MACS should be performed */
+        k = srcBLen;
+
+        while(k > 0u)
+        {
+          /* Perform the multiply-accumulate */
+          sum += ((q31_t) * px++ * *py--);
+
+          /* Decrement the loop counter */
+          k--;
+        }
+
+        /* Store the result in the accumulator in the destination buffer. */
+        *pOut++ = (q7_t) (__SSAT(sum >> 7, 8));
+
+        /* Increment the MAC count */
+        count++;
+
+        /* Update the inputA and inputB pointers for next MAC calculation */
+        px = pIn1 + count;
+        py = pSrc2;
+
+        /* Decrement the loop counter */
+        blkCnt--;
+      }
+    }
+
+
+    /* --------------------------   
+     * Initializations of stage3   
+     * -------------------------*/
+
+    /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]   
+     * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]   
+     * ....   
+     * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]   
+     * sum +=  x[srcALen-1] * y[srcBLen-1]   
+     */
+
+    /* In this stage the MAC operations are decreased by 1 for every iteration.   
+       The count variable holds the number of MAC operations performed */
+    count = srcBLen - 1u;
+
+    /* Working pointer of inputA */
+    pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+    px = pSrc1;
+
+    /* Working pointer of inputB */
+    pSrc2 = pIn2 + (srcBLen - 1u);
+    py = pSrc2;
+
+    /* -------------------   
+     * Stage3 process   
+     * ------------------*/
+
+    while(blockSize3 > 0)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = count >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Reading two inputs, x[srcALen - srcBLen + 1] and x[srcALen - srcBLen + 2] of SrcA buffer and packing */
+        in1 = (q15_t) * px++;
+        in2 = (q15_t) * px++;
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* Reading two inputs, y[srcBLen - 1] and y[srcBLen - 2] of SrcB buffer and packing */
+        in1 = (q15_t) * py--;
+        in2 = (q15_t) * py--;
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* sum += x[srcALen - srcBLen + 1] * y[srcBLen - 1] */
+        /* sum += x[srcALen - srcBLen + 2] * y[srcBLen - 2] */
+        sum = __SMLAD(input1, input2, sum);
+
+        /* Reading two inputs, x[srcALen - srcBLen + 3] and x[srcALen - srcBLen + 4] of SrcA buffer and packing */
+        in1 = (q15_t) * px++;
+        in2 = (q15_t) * px++;
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* Reading two inputs, y[srcBLen - 3] and y[srcBLen - 4] of SrcB buffer and packing */
+        in1 = (q15_t) * py--;
+        in2 = (q15_t) * py--;
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* sum += x[srcALen - srcBLen + 3] * y[srcBLen - 3] */
+        /* sum += x[srcALen - srcBLen + 4] * y[srcBLen - 4] */
+        sum = __SMLAD(input1, input2, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the count is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = count % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+        sum += ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q7_t) (__SSAT(sum >> 7, 8));
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = ++pSrc1;
+      py = pSrc2;
+
+      /* Decrement the MAC count */
+      count--;
+
+      /* Decrement the loop counter */
+      blockSize3--;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q7_t *pIn1 = pSrcA;                            /* inputA pointer */
+  q7_t *pIn2 = pSrcB;                            /* inputB pointer */
+  q31_t sum;                                     /* Accumulator */
+  uint32_t i, j;                                 /* loop counters */
+  arm_status status;                             /* status of Partial convolution */
+
+  /* Check for range of output samples to be calculated */
+  if((firstIndex + numPoints) > ((srcALen + (srcBLen - 1u))))
+  {
+    /* Set status as ARM_MATH_ARGUMENT_ERROR */
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+    /* Loop to calculate convolution for output length number of values */
+    for (i = firstIndex; i <= (firstIndex + numPoints - 1); i++)
+    {
+      /* Initialize sum with zero to carry on MAC operations */
+      sum = 0;
+
+      /* Loop to perform MAC operations according to convolution equation */
+      for (j = 0; j <= i; j++)
+      {
+        /* Check the array limitations */
+        if(((i - j) < srcBLen) && (j < srcALen))
+        {
+          /* z[i] += x[j] * y[i-j] */
+          sum += ((q15_t) pIn1[j] * (pIn2[i - j]));
+        }
+      }
+
+      /* Store the output in the destination buffer */
+      pDst[i] = (q7_t) __SSAT((sum >> 7u), 8u);
+    }
+    /* Set status as ARM_MATH_SUCCESS as there are no argument errors */
+    status = ARM_MATH_SUCCESS;
+  }
+  return (status);
+
+#endif /*  #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**   
+ * @} end of PartialConv group   
+ */
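
The Cortex-M3/M4 branch of the Q7 routine above repeatedly sign-extends two Q7 samples to 16 bits, packs them into one 32-bit word, feeds two such words to `__SMLAD`, and finally stores `__SSAT(sum >> 7, 8)`. The sketch below mirrors those three steps in portable C so the arithmetic can be checked on a host machine. All function names here (`halfword_to_signed`, `pack_q7_pair`, `dual_mac`, `saturate_q7`) are hypothetical helpers for illustration, and `dual_mac` is only a stand-in for the dual 16x16 multiply-accumulate that the source obtains from `__SMLAD`; it does not model the instruction's Q flag.

```c
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed halfword without relying on
 * implementation-defined narrowing conversions. */
static int32_t halfword_to_signed(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Pack two Q7 samples, each sign-extended to 16 bits, into one word:
 * lo goes to bits 15..0 and hi to bits 31..16, matching
 * ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16) in the source. */
static uint32_t pack_q7_pair(int8_t lo, int8_t hi)
{
    uint32_t l = (uint32_t)(int32_t)lo & 0xFFFFu;
    uint32_t h = (uint32_t)(int32_t)hi & 0xFFFFu;
    return l | (h << 16);
}

/* Portable stand-in for the dual multiply-accumulate:
 * acc + lo(a) * lo(b) + hi(a) * hi(b). */
static int32_t dual_mac(uint32_t a, uint32_t b, int32_t acc)
{
    int32_t lo = halfword_to_signed(a) * halfword_to_signed(b);
    int32_t hi = halfword_to_signed(a >> 16) * halfword_to_signed(b >> 16);
    return acc + lo + hi;
}

/* Clamp a value to the signed 8-bit range, as the 8-bit saturation in the
 * source does before the result is stored as Q7. */
static int8_t saturate_q7(int32_t v)
{
    if (v > 127) {
        return 127;
    }
    if (v < -128) {
        return -128;
    }
    return (int8_t)v;
}
```

For example, two Q7 samples of 0.5 (raw value 64) multiply to 4096 in Q14; shifting right by 7 gives 32, which is 0.25 in Q7, so `saturate_q7(dual_mac(pack_q7_pair(64, 0), pack_q7_pair(64, 0), 0) >> 7)` yields 32.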

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,734 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_conv_q15.c   
+*   
+* Description:	Convolution of Q15 sequences.     
+*   
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup Conv   
+ * @{   
+ */
+
+/**   
+ * @brief Convolution of Q15 sequences.   
+ * @param[in] *pSrcA points to the first input sequence.   
+ * @param[in] srcALen length of the first input sequence.   
+ * @param[in] *pSrcB points to the second input sequence.   
+ * @param[in] srcBLen length of the second input sequence.   
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.   
+ * @return none.   
+ *   
+ * @details   
+ * <b>Scaling and Overflow Behavior:</b>   
+ *   
+ * \par   
+ * The function is implemented using a 64-bit internal accumulator.   
+ * Both inputs are in 1.15 format and multiplications yield a 2.30 result.   
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.   
+ * This approach provides 33 guard bits and there is no risk of overflow.   
+ * The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.   
+ *   
+ * \par   
+ * Refer to <code>arm_conv_fast_q15()</code> for a faster but less precise version of this function for Cortex-M3 and Cortex-M4. 
+ *
+ * \par    
+ * Refer to the function <code>arm_conv_opt_q15()</code> for a faster implementation of this function using scratch buffers.
+ *  
+ */
+
+void arm_conv_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst)
+{
+
+#if (defined(ARM_MATH_CM4) || defined(ARM_MATH_CM3)) && !defined(UNALIGNED_SUPPORT_DISABLE)
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q15_t *pIn1;                                   /* inputA pointer */
+  q15_t *pIn2;                                   /* inputB pointer */
+  q15_t *pOut = pDst;                            /* output pointer */
+  q63_t sum, acc0, acc1, acc2, acc3;             /* Accumulator */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1, *pSrc2;                          /* Intermediate pointers */
+  q31_t x0, x1, x2, x3, c0;                      /* Temporary variables to hold state and coefficient values */
+  uint32_t blockSize1, blockSize2, blockSize3, j, k, count, blkCnt;     /* loop counter */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+  /* The function is internally
+   * divided into three stages according to the number of multiplications that take
+   * place between inputA samples and inputB samples. In the first stage of the
+   * algorithm, the multiplications increase by one for every iteration.
+   * In the second stage of the algorithm, srcBLen multiplications are done.
+   * In the third stage of the algorithm, the multiplications decrease by one
+   * for every iteration. */
+
+  /* The algorithm is implemented in three stages.
+     The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+
+  /* --------------------------   
+   * Initializations of stage1   
+   * -------------------------*/
+
+  /* sum = x[0] * y[0]   
+   * sum = x[0] * y[1] + x[1] * y[0]   
+   * ....   
+   * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+
+  /* ------------------------   
+   * Stage1 process   
+   * ----------------------*/
+
+  /* For loop unrolling by 4, this stage is divided into two. */
+  /* First part of this stage computes the MAC operations less than 4 */
+  /* Second part of this stage computes the MAC operations greater than or equal to 4 */
+
+  /* The first part of the stage starts here */
+  while((count < 4u) && (blockSize1 > 0u))
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Loop over number of MAC operations between   
+     * inputA samples and inputB samples */
+    k = count;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum = __SMLALD(*px++, *py--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* The second part of the stage starts here */
+  /* The internal loop, over count, is unrolled by 4 */
+  /* To read two inputB samples at a time using SIMD
+   * (y[count - 1] and y[count - 2]), py is decremented by 1 */
+  py = py - 1;
+
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* x[0], x[1] are multiplied with y[count - 1], y[count - 2] respectively */
+      sum = __SMLALDX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+      /* x[2], x[3] are multiplied with y[count - 3], y[count - 4] respectively */
+      sum = __SMLALDX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* For the next MAC operations, the pointer py is used without SIMD   
+     * So, py is incremented by 1 */
+    py = py + 1u;
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum = __SMLALD(*px++, *py--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + (count - 1u);
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------   
+   * Initializations of stage2   
+   * ------------------------*/
+
+  /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]   
+   * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]   
+   * ....   
+   * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+
+  /* --------------------   
+   * Stage2 process   
+   * -------------------*/
+
+  /* Stage2 depends on srcBLen, because srcBLen MACs are performed per output sample.
+   * So, to unroll the loop over blockSize2,
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      py = py - 1u;
+
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+
+      /* read x[0], x[1] samples */
+      x0 = *__SIMD32(px);
+      /* read x[1], x[2] samples */
+      x1 = _SIMD32_OFFSET(px+1);
+	  px+= 2u;
+
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read the last two inputB samples using SIMD:   
+         * y[srcBLen - 1] and y[srcBLen - 2] */
+        c0 = *__SIMD32(py)--;
+
+        /* acc0 +=  x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] */
+        acc0 = __SMLALDX(x0, c0, acc0);
+
+        /* acc1 +=  x[1] * y[srcBLen - 1] + x[2] * y[srcBLen - 2] */
+        acc1 = __SMLALDX(x1, c0, acc1);
+
+        /* Read x[2], x[3] */
+        x2 = *__SIMD32(px);
+
+        /* Read x[3], x[4] */
+        x3 = _SIMD32_OFFSET(px+1);
+
+        /* acc2 +=  x[2] * y[srcBLen - 1] + x[3] * y[srcBLen - 2] */
+        acc2 = __SMLALDX(x2, c0, acc2);
+
+        /* acc3 +=  x[3] * y[srcBLen - 1] + x[4] * y[srcBLen - 2] */
+        acc3 = __SMLALDX(x3, c0, acc3);
+
+        /* Read y[srcBLen - 3] and y[srcBLen - 4] */
+        c0 = *__SIMD32(py)--;
+
+        /* acc0 +=  x[2] * y[srcBLen - 3] + x[3] * y[srcBLen - 4] */
+        acc0 = __SMLALDX(x2, c0, acc0);
+
+        /* acc1 +=  x[3] * y[srcBLen - 3] + x[4] * y[srcBLen - 4] */
+        acc1 = __SMLALDX(x3, c0, acc1);
+
+        /* Read x[4], x[5] */
+        x0 = _SIMD32_OFFSET(px+2);
+
+        /* Read x[5], x[6] */
+        x1 = _SIMD32_OFFSET(px+3);
+		px += 4u;
+
+        /* acc2 +=  x[4] * y[srcBLen - 3] + x[5] * y[srcBLen - 4] */
+        acc2 = __SMLALDX(x0, c0, acc2);
+
+        /* acc3 +=  x[5] * y[srcBLen - 3] + x[6] * y[srcBLen - 4] */
+        acc3 = __SMLALDX(x1, c0, acc3);
+
+      } while(--k);
+
+      /* For the next MAC operations, SIMD is not used.
+       * So, the 16-bit pointer of inputB, py, is updated */
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      if(k == 1u)
+      {
+        /* Read y[srcBLen - 5] */
+        c0 = *(py+1);
+
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+
+#else
+
+        c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+        /* Read x[7] */
+        x3 = *__SIMD32(px);
+		px++;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALD(x0, c0, acc0);
+        acc1 = __SMLALD(x1, c0, acc1);
+        acc2 = __SMLALDX(x1, c0, acc2);
+        acc3 = __SMLALDX(x3, c0, acc3);
+      }
+
+      if(k == 2u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+        c0 = _SIMD32_OFFSET(py);
+
+        /* Read x[7], x[8] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[9] */
+        x2 = _SIMD32_OFFSET(px+1);
+		px += 2u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALDX(x0, c0, acc0);
+        acc1 = __SMLALDX(x1, c0, acc1);
+        acc2 = __SMLALDX(x3, c0, acc2);
+        acc3 = __SMLALDX(x2, c0, acc3);
+      }
+
+      if(k == 3u)
+      {
+        /* Read y[srcBLen - 5], y[srcBLen - 6] */
+        c0 = _SIMD32_OFFSET(py);
+
+        /* Read x[7], x[8] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[9] */
+        x2 = _SIMD32_OFFSET(px+1);
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALDX(x0, c0, acc0);
+        acc1 = __SMLALDX(x1, c0, acc1);
+        acc2 = __SMLALDX(x3, c0, acc2);
+        acc3 = __SMLALDX(x2, c0, acc3);
+
+		c0 = *(py-1);
+
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+#else
+
+        c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+        /* Read x[10] */
+        x3 =  _SIMD32_OFFSET(px+2);
+		px += 3u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALDX(x1, c0, acc0);
+        acc1 = __SMLALD(x2, c0, acc1);
+        acc2 = __SMLALDX(x2, c0, acc2);
+        acc3 = __SMLALDX(x3, c0, acc3);
+      }
+
+
+      /* Store the results in the accumulators in the destination buffer. */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+#else
+
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+      *__SIMD32(pOut)++ =
+        __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+       /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += (q63_t) ((q31_t) * px++ * *py--);
+        sum += (q63_t) ((q31_t) * px++ * *py--);
+        sum += (q63_t) ((q31_t) * px++ * *py--);
+        sum += (q63_t) ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += (q63_t) ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT(sum >> 15, 16));
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,
+     * the blockSize2 loop is not unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* srcBLen number of MACS should be performed */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += (q63_t) ((q31_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q15_t) (__SSAT(sum >> 15, 16));
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+
+  /* --------------------------   
+   * Initializations of stage3   
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]   
+   * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]   
+   * ....   
+   * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]   
+   * sum +=  x[srcALen-1] * y[srcBLen-1]   
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.   
+     The blockSize3 variable holds the number of MAC operations performed */
+
+  blockSize3 = srcBLen - 1u;
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  pIn2 = pSrc2 - 1u;
+  py = pIn2;
+
+  /* -------------------   
+   * Stage3 process   
+   * ------------------*/
+
+  /* For loop unrolling by 4, this stage is divided into two. */
+  /* First part of this stage computes the MAC operations greater than 4 */
+  /* Second part of this stage computes the MAC operations less than or equal to 4 */
+
+  /* The first part of the stage starts here */
+  j = blockSize3 >> 2u;
+
+  while((j > 0u) && (blockSize3 > 0u))
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = blockSize3 >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[srcALen - srcBLen + 1], x[srcALen - srcBLen + 2] are multiplied   
+       * with y[srcBLen - 1], y[srcBLen - 2] respectively */
+      sum = __SMLALDX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+      /* x[srcALen - srcBLen + 3], x[srcALen - srcBLen + 4] are multiplied   
+       * with y[srcBLen - 3], y[srcBLen - 4] respectively */
+      sum = __SMLALDX(*__SIMD32(px)++, *__SIMD32(py)--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* For the next MAC operations, the pointer py is used without SIMD   
+     * So, py is incremented by 1 */
+    py = py + 1u;
+
+    /* If the blockSize3 is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = blockSize3 % 0x4u;
+
+    while(k > 0u)
+    {
+      /* sum += x[srcALen - srcBLen + 5] * y[srcBLen - 5] */
+      sum = __SMLALD(*px++, *py--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+
+    j--;
+  }
+
+  /* The second part of the stage starts here */
+  /* SIMD is not used for the next MAC operations,   
+   * so pointer py is updated to read only one sample at a time */
+  py = py + 1u;
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Loop over the blockSize3 remaining MACs. No loop unrolling is used. */
+    k = blockSize3;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum +=  x[srcALen-1] * y[srcBLen-1] */
+      sum = __SMLALD(*px++, *py--, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pSrc2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+/* Run the below code for Cortex-M0 */
+
+  q15_t *pIn1 = pSrcA;                           /* input pointer */
+  q15_t *pIn2 = pSrcB;                           /* coefficient pointer */
+  q63_t sum;                                     /* Accumulator */
+  uint32_t i, j;                                 /* loop counter */
+
+  /* Loop to calculate output of convolution for output length number of times */
+  for (i = 0; i < (srcALen + srcBLen - 1); i++)
+  {
+    /* Initialize sum with zero to carry on MAC operations */
+    sum = 0;
+
+    /* Loop to perform MAC operations according to convolution equation */
+    for (j = 0; j <= i; j++)
+    {
+      /* Check the array limitations */
+      if(((i - j) < srcBLen) && (j < srcALen))
+      {
+        /* z[i] += x[j] * y[i-j] */
+        sum += (q31_t) pIn1[j] * (pIn2[i - j]);
+      }
+    }
+
+    /* Store the output in the destination buffer */
+    pDst[i] = (q15_t) __SSAT((sum >> 15u), 16u);
+  }
+
+#endif /*  #if (defined(ARM_MATH_CM4) || defined(ARM_MATH_CM3)) && !defined(UNALIGNED_SUPPORT_DISABLE)*/
+
+}
+
+/**   
+ * @} end of Conv group   
+ */
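
Note on the Q15 scaling path above: the sketch below is a minimal, portable C reference for the behaviour the arm_conv_q15() header comment describes (1.15 inputs, a 64-bit accumulator, discard the low 15 bits, then saturate to 1.15). It mirrors the plain double-loop path in the #else branch and is not part of this changeset. The names sat_q15 and ref_conv_q15 are hypothetical, and plain stdint types stand in for q15_t and q63_t so the block compiles without arm_math.h.

```c
#include <stdint.h>

/* Hypothetical helper: saturate a 64-bit value to the Q15 range. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical reference convolution.
 * dst must have room for aLen + bLen - 1 samples. */
static void ref_conv_q15(const int16_t *a, uint32_t aLen,
                         const int16_t *b, uint32_t bLen,
                         int16_t *dst)
{
    for (uint32_t i = 0; i < aLen + bLen - 1u; i++) {
        int64_t sum = 0;              /* wide accumulator for 2.30 products */

        for (uint32_t j = 0; j <= i; j++) {
            /* Skip index pairs that fall outside either input. */
            if (j < aLen && (i - j) < bLen) {
                /* dst[i] += a[j] * b[i - j]; each 1.15 x 1.15 product is 2.30 */
                sum += (int32_t)a[j] * b[i - j];
            }
        }

        /* Drop the low 15 bits, then saturate to 1.15.
         * Assumes >> on a negative value is an arithmetic shift,
         * which holds on mainstream compilers. */
        dst[i] = sat_q15(sum >> 15);
    }
}
```

For example, convolving {0.5, 0.5} with {0.5, 0.25} (16384, 16384 and 16384, 8192 in Q15) gives 8192, 12288, 4096, and convolving the single sample -1.0 with itself saturates to 32767 rather than wrapping. A reference of this kind can be useful for comparing against the output of the unrolled SIMD path on small inputs.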

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,565 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_conv_q31.c    
+*    
+* Description:	Convolution of Q31 sequences.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Conv    
+ * @{    
+ */
+
+/**    
+ * @brief Convolution of Q31 sequences.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * There is no saturation on intermediate additions.    
+ * Thus, if the accumulator overflows it wraps around and distorts the result.    
+ * The input signals should be scaled down to avoid intermediate overflows.    
+ * Scale down the inputs by log2(min(srcALen, srcBLen)) bits (log2 is the base-2 logarithm) to avoid overflows,
+ * since at most min(srcALen, srcBLen) additions are carried out internally.
+ * The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.    
+ *    
+ * \par    
+ * See <code>arm_conv_fast_q31()</code> for a faster but less precise implementation of this function for Cortex-M3 and Cortex-M4.    
+ */
+
+void arm_conv_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t *pIn1;                                   /* inputA pointer */
+  q31_t *pIn2;                                   /* inputB pointer */
+  q31_t *pOut = pDst;                            /* output pointer */
+  q31_t *px;                                     /* Intermediate inputA pointer  */
+  q31_t *py;                                     /* Intermediate inputB pointer  */
+  q31_t *pSrc1, *pSrc2;                          /* Intermediate pointers */
+  q63_t sum;                                     /* Accumulator */
+  q63_t acc0, acc1, acc2;                        /* Accumulator */
+  q31_t x0, x1, x2, c0;                          /* Temporary variables to hold state and coefficient values */
+  uint32_t j, k, count, blkCnt, blockSize1, blockSize2, blockSize3;     /* loop counter */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (q31_t *) pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = (q31_t *) pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+  /* The function is internally
+   * divided into three stages according to the number of multiplications that take
+   * place between inputA samples and inputB samples. In the first stage of the
+   * algorithm, the multiplications increase by one for every iteration.
+   * In the second stage of the algorithm, srcBLen multiplications are done.
+   * In the third stage of the algorithm, the multiplications decrease by one
+   * for every iteration. */
+
+  /* The algorithm is implemented in three stages.
+     The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------    
+   * Initializations of stage1    
+   * -------------------------*/
+
+  /* sum = x[0] * y[0]    
+   * sum = x[0] * y[1] + x[1] * y[0]    
+   * ....    
+   * sum = x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2] +...+ x[srcBLen - 1] * y[0]
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+
+  /* ------------------------    
+   * Stage1 process    
+   * ----------------------*/
+
+  /* The first stage starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[count - 1] */
+      sum += (q63_t) * px++ * (*py--);
+      /* x[1] * y[count - 2] */
+      sum += (q63_t) * px++ * (*py--);
+      /* x[2] * y[count - 3] */
+      sum += (q63_t) * px++ * (*py--);
+      /* x[3] * y[count - 4] */
+      sum += (q63_t) * px++ * (*py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += (q63_t) * px++ * (*py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q31_t) (sum >> 31);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------    
+   * Initializations of stage2    
+   * ------------------------*/
+
+  /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]    
+   * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]    
+   * ....    
+   * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------    
+   * Stage2 process    
+   * ------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.    
+   * So, to loop unroll over blockSize2,    
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll by 3 */
+    blkCnt = blockSize2 / 3;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+
+      /* Read x[0], x[1] samples */
+      x0 = *(px++);
+      x1 = *(px++);
+
+      /* Apply loop unrolling and compute 3 MACs simultaneously. */
+      k = srcBLen / 3;
+
+      /* First part of the processing with loop unrolling.  Compute 3 MACs at a time.        
+       ** a second loop below computes MACs for the remaining 1 to 2 samples. */
+      do
+      {
+        /* Read y[srcBLen - 1] sample */
+        c0 = *(py);
+
+        /* Read x[2] sample */
+        x2 = *(px);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[0] * y[srcBLen - 1] */
+        acc0 += ((q63_t) x0 * c0);
+        /* acc1 +=  x[1] * y[srcBLen - 1] */
+        acc1 += ((q63_t) x1 * c0);
+        /* acc2 +=  x[2] * y[srcBLen - 1] */
+        acc2 += ((q63_t) x2 * c0);
+
+        /* Read y[srcBLen - 2] sample */
+        c0 = *(py - 1u);
+
+        /* Read x[3] sample */
+        x0 = *(px + 1u);
+
+        /* Perform the multiply-accumulate */
+        /* acc0 +=  x[1] * y[srcBLen - 2] */
+        acc0 += ((q63_t) x1 * c0);
+        /* acc1 +=  x[2] * y[srcBLen - 2] */
+        acc1 += ((q63_t) x2 * c0);
+        /* acc2 +=  x[3] * y[srcBLen - 2] */
+        acc2 += ((q63_t) x0 * c0);
+
+        /* Read y[srcBLen - 3] sample */
+        c0 = *(py - 2u);
+
+        /* Read x[4] sample */
+        x1 = *(px + 2u);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[2] * y[srcBLen - 3] */
+        acc0 += ((q63_t) x2 * c0);
+        /* acc1 +=  x[3] * y[srcBLen - 3] */
+        acc1 += ((q63_t) x0 * c0);
+        /* acc2 +=  x[4] * y[srcBLen - 3] */
+        acc2 += ((q63_t) x1 * c0);
+
+        /* update scratch pointers */
+        px += 3u;
+        py -= 3u;
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 3, compute any remaining MACs here.        
+       ** No loop unrolling is used. */
+      k = srcBLen - (3 * (srcBLen / 3));
+
+      while(k > 0u)
+      {
+        /* Read y[srcBLen - 5] sample */
+        c0 = *(py--);
+
+        /* Read x[7] sample */
+        x2 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[4] * y[srcBLen - 5] */
+        acc0 += ((q63_t) x0 * c0);
+        /* acc1 +=  x[5] * y[srcBLen - 5] */
+        acc1 += ((q63_t) x1 * c0);
+        /* acc2 +=  x[6] * y[srcBLen - 5] */
+        acc2 += ((q63_t) x2 * c0);
+
+        /* Reuse the present samples for the next MAC */
+        x0 = x1;
+        x1 = x2;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the results in the accumulators in the destination buffer. */
+      *pOut++ = (q31_t) (acc0 >> 31);
+      *pOut++ = (q31_t) (acc1 >> 31);
+      *pOut++ = (q31_t) (acc2 >> 31);
+
+      /* Increment the pointer pIn1 index, count by 3 */
+      count += 3u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 3, compute any remaining output samples here.        
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 - 3 * (blockSize2 / 3);
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += (q63_t) * px++ * (*py--);
+        sum += (q63_t) * px++ * (*py--);
+        sum += (q63_t) * px++ * (*py--);
+        sum += (q63_t) * px++ * (*py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += (q63_t) * px++ * (*py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q31_t) (sum >> 31);
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* srcBLen is too short for the unrolled path above,
+     * so the blockSize2 loop is not unrolled here */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* srcBLen number of MACS should be performed */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += (q63_t) * px++ * (*py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q31_t) (sum >> 31);
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+
+  /* --------------------------    
+   * Initializations of stage3    
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]    
+   * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]    
+   * ....    
+   * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]    
+   * sum +=  x[srcALen-1] * y[srcBLen-1]    
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.    
+     The blockSize3 variable holds the number of MAC operations performed */
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* -------------------    
+   * Stage3 process    
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = blockSize3 >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* sum += x[srcALen - srcBLen + 1] * y[srcBLen - 1] */
+      sum += (q63_t) * px++ * (*py--);
+      /* sum += x[srcALen - srcBLen + 2] * y[srcBLen - 2] */
+      sum += (q63_t) * px++ * (*py--);
+      /* sum += x[srcALen - srcBLen + 3] * y[srcBLen - 3] */
+      sum += (q63_t) * px++ * (*py--);
+      /* sum += x[srcALen - srcBLen + 4] * y[srcBLen - 4] */
+      sum += (q63_t) * px++ * (*py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the blockSize3 is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = blockSize3 % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += (q63_t) * px++ * (*py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q31_t) (sum >> 31);
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pSrc2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q31_t *pIn1 = pSrcA;                           /* input pointer */
+  q31_t *pIn2 = pSrcB;                           /* coefficient pointer */
+  q63_t sum;                                     /* Accumulator */
+  uint32_t i, j;                                 /* loop counter */
+
+  /* Loop to calculate output of convolution for output length number of times */
+  for (i = 0; i < (srcALen + srcBLen - 1); i++)
+  {
+    /* Initialize sum with zero to carry on MAC operations */
+    sum = 0;
+
+    /* Loop to perform MAC operations according to convolution equation */
+    for (j = 0; j <= i; j++)
+    {
+      /* Check the array limitations */
+      if(((i - j) < srcBLen) && (j < srcALen))
+      {
+        /* z[i] += x[j] * y[i-j] */
+        sum += ((q63_t) pIn1[j] * (pIn2[i - j]));
+      }
+    }
+
+    /* Store the output in the destination buffer */
+    pDst[i] = (q31_t) (sum >> 31u);
+  }
+
+#endif /*     #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of Conv group    
+ */
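
The Cortex-M0 branch above is the direct form of the convolution sum, and it is a convenient reference when checking the three-stage optimized path. Below is a minimal standalone sketch of that direct form. It assumes plain C99 with locally defined `q31_t`/`q63_t` typedefs, and `conv_q31_ref` is a hypothetical helper name, not a CMSIS function.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q31_t;   /* 1.31 fixed point, same width as the CMSIS q31_t */
typedef int64_t q63_t;   /* wide accumulator for the 2.62 products */

/* Hypothetical reference helper (not part of CMSIS): direct-form Q31
 * convolution. dst must hold aLen + bLen - 1 samples. */
static void conv_q31_ref(const q31_t *a, uint32_t aLen,
                         const q31_t *b, uint32_t bLen,
                         q31_t *dst)
{
  uint32_t i, j;

  /* Both sequences must be non-empty, otherwise aLen + bLen - 1 wraps */
  assert(aLen > 0u && bLen > 0u);

  for (i = 0u; i < aLen + bLen - 1u; i++)
  {
    q63_t sum = 0;

    for (j = 0u; j <= i; j++)
    {
      /* Only accumulate where both indices are inside their arrays */
      if ((j < aLen) && ((i - j) < bLen))
      {
        /* z[i] += a[j] * b[i - j] */
        sum += (q63_t) a[j] * b[i - j];
      }
    }

    /* Convert the 2.62 accumulator back to 1.31. This assumes an
     * arithmetic right shift for negative values, as the code above does. */
    dst[i] = (q31_t) (sum >> 31);
  }
}
```

For example, convolving {0.5, 0.25} with {0.5, 0.5} in Q31 gives {0.25, 0.375, 0.125}, which is what the direct sum predicts.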

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_conv_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,690 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_conv_q7.c   
+*   
+* Description:	Convolution of Q7 sequences. 
+*   
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup Conv   
+ * @{   
+ */
+
+/**   
+ * @brief Convolution of Q7 sequences.   
+ * @param[in] *pSrcA points to the first input sequence.   
+ * @param[in] srcALen length of the first input sequence.   
+ * @param[in] *pSrcB points to the second input sequence.   
+ * @param[in] srcBLen length of the second input sequence.   
+ * @param[out] *pDst points to the location where the output result is written.  Length srcALen+srcBLen-1.   
+ * @return none.   
+ *   
+ * @details   
+ * <b>Scaling and Overflow Behavior:</b>   
+ *   
+ * \par   
+ * The function is implemented using a 32-bit internal accumulator.   
+ * Both the inputs are represented in 1.7 format and multiplications yield a 2.14 result.   
+ * The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format.   
+ * This approach provides 17 guard bits and there is no risk of overflow as long as <code>max(srcALen, srcBLen)<131072</code>.   
+ * The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and then saturated to 1.7 format.   
+ *
+ * \par    
+ * Refer to the function <code>arm_conv_opt_q7()</code> for a faster implementation of this function.
+ * 
+ */
+
+void arm_conv_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q7_t *pIn1;                                    /* inputA pointer */
+  q7_t *pIn2;                                    /* inputB pointer */
+  q7_t *pOut = pDst;                             /* output pointer */
+  q7_t *px;                                      /* Intermediate inputA pointer */
+  q7_t *py;                                      /* Intermediate inputB pointer */
+  q7_t *pSrc1, *pSrc2;                           /* Intermediate pointers */
+  q7_t x0, x1, x2, x3, c0, c1;                   /* Temporary variables to hold state and coefficient values */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulator */
+  q31_t input1, input2;                          /* Temporary input variables */
+  q15_t in1, in2;                                /* Temporary input variables */
+  uint32_t j, k, count, blkCnt, blockSize1, blockSize2, blockSize3;     /* loop counter */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+  }
+
+  /* conv(x,y) at n = x[n] * y[0] + x[n-1] * y[1] + x[n-2] * y[2] + ...+ x[n-N+1] * y[N -1] */
+  /* The function is internally
+   * divided into three stages according to the number of multiplications that
+   * take place between inputA samples and inputB samples. In the first stage of the
+   * algorithm, the number of multiplications increases by one for every iteration.
+   * In the second stage of the algorithm, srcBLen multiplications are done per iteration.
+   * In the third stage of the algorithm, the number of multiplications decreases by one
+   * for every iteration. */
+
+  /* The algorithm is implemented in three stages.
+     The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = (srcALen - srcBLen) + 1u;
+  blockSize3 = blockSize1;
+
+  /* --------------------------   
+   * Initializations of stage1   
+   * -------------------------*/
+
+  /* sum = x[0] * y[0]   
+   * sum = x[0] * y[1] + x[1] * y[0]   
+   * ....   
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 3] +...+ x[srcBLen - 2] * y[0]
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+
+  /* ------------------------   
+   * Stage1 process   
+   * ----------------------*/
+
+  /* The first stage starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] , x[1] */
+      in1 = (q15_t) * px++;
+      in2 = (q15_t) * px++;
+      input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+      /* y[count - 1] , y[count - 2] */
+      in1 = (q15_t) * py--;
+      in2 = (q15_t) * py--;
+      input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+      /* x[0] * y[count - 1] */
+      /* x[1] * y[count - 2] */
+      sum = __SMLAD(input1, input2, sum);
+
+      /* x[2] , x[3] */
+      in1 = (q15_t) * px++;
+      in2 = (q15_t) * px++;
+      input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+      /* y[count - 3] , y[count - 4] */
+      in1 = (q15_t) * py--;
+      in2 = (q15_t) * py--;
+      input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+      /* x[2] * y[count - 3] */
+      /* x[3] * y[count - 4] */
+      sum = __SMLAD(input1, input2, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum += ((q15_t) * px++ * *py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q7_t) (__SSAT(sum >> 7u, 8));
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pIn2 + count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------   
+   * Initializations of stage2   
+   * ------------------------*/
+
+  /* sum = x[0] * y[srcBLen-1] + x[1] * y[srcBLen-2] +...+ x[srcBLen-1] * y[0]   
+   * sum = x[1] * y[srcBLen-1] + x[2] * y[srcBLen-2] +...+ x[srcBLen] * y[0]   
+   * ....   
+   * sum = x[srcALen-srcBLen] * y[srcBLen-1] + x[srcALen-srcBLen+1] * y[srcBLen-2] +...+ x[srcALen-1] * y[0]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* count is the index by which the pointer pIn1 is incremented */
+  count = 0u;
+
+  /* -------------------   
+   * Stage2 process   
+   * ------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+   * So, to loop unroll over blockSize2,   
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* read x[0], x[1], x[2] samples */
+      x0 = *(px++);
+      x1 = *(px++);
+      x2 = *(px++);
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read y[srcBLen - 1] sample */
+        c0 = *(py--);
+        /* Read y[srcBLen - 2] sample */
+        c1 = *(py--);
+
+        /* Read x[3] sample */
+        x3 = *(px++);
+
+        /* x[0] and x[1] are packed */
+        in1 = (q15_t) x0;
+        in2 = (q15_t) x1;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* y[srcBLen - 1]   and y[srcBLen - 2] are packed */
+        in1 = (q15_t) c0;
+        in2 = (q15_t) c1;
+
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* acc0 += x[0] * y[srcBLen - 1] + x[1] * y[srcBLen - 2]  */
+        acc0 = __SMLAD(input1, input2, acc0);
+
+        /* x[1] and x[2] are packed */
+        in1 = (q15_t) x1;
+        in2 = (q15_t) x2;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* acc1 += x[1] * y[srcBLen - 1] + x[2] * y[srcBLen - 2]  */
+        acc1 = __SMLAD(input1, input2, acc1);
+
+        /* x[2] and x[3] are packed */
+        in1 = (q15_t) x2;
+        in2 = (q15_t) x3;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* acc2 += x[2] * y[srcBLen - 1] + x[3] * y[srcBLen - 2]  */
+        acc2 = __SMLAD(input1, input2, acc2);
+
+        /* Read x[4] sample */
+        x0 = *(px++);
+
+        /* x[3] and x[4] are packed */
+        in1 = (q15_t) x3;
+        in2 = (q15_t) x0;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* acc3 += x[3] * y[srcBLen - 1] + x[4] * y[srcBLen - 2]  */
+        acc3 = __SMLAD(input1, input2, acc3);
+
+        /* Read y[srcBLen - 3] sample */
+        c0 = *(py--);
+        /* Read y[srcBLen - 4] sample */
+        c1 = *(py--);
+
+        /* Read x[5] sample */
+        x1 = *(px++);
+
+        /* x[2] and x[3] are packed */
+        in1 = (q15_t) x2;
+        in2 = (q15_t) x3;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* y[srcBLen - 3] and y[srcBLen - 4] are packed */
+        in1 = (q15_t) c0;
+        in2 = (q15_t) c1;
+
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* acc0 += x[2] * y[srcBLen - 3] + x[3] * y[srcBLen - 4]  */
+        acc0 = __SMLAD(input1, input2, acc0);
+
+        /* x[3] and x[4] are packed */
+        in1 = (q15_t) x3;
+        in2 = (q15_t) x0;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* acc1 += x[3] * y[srcBLen - 3] + x[4] * y[srcBLen - 4]  */
+        acc1 = __SMLAD(input1, input2, acc1);
+
+        /* x[4] and x[5] are packed */
+        in1 = (q15_t) x0;
+        in2 = (q15_t) x1;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* acc2 += x[4] * y[srcBLen - 3] + x[5] * y[srcBLen - 4]  */
+        acc2 = __SMLAD(input1, input2, acc2);
+
+        /* Read x[6] sample */
+        x2 = *(px++);
+
+        /* x[5] and x[6] are packed */
+        in1 = (q15_t) x1;
+        in2 = (q15_t) x2;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* acc3 += x[5] * y[srcBLen - 3] + x[6] * y[srcBLen - 4]  */
+        acc3 = __SMLAD(input1, input2, acc3);
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Read y[srcBLen - 5] sample */
+        c0 = *(py--);
+
+        /* Read x[7] sample */
+        x3 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[4] * y[srcBLen - 5] */
+        acc0 += ((q15_t) x0 * c0);
+        /* acc1 +=  x[5] * y[srcBLen - 5] */
+        acc1 += ((q15_t) x1 * c0);
+        /* acc2 +=  x[6] * y[srcBLen - 5] */
+        acc2 += ((q15_t) x2 * c0);
+        /* acc3 +=  x[7] * y[srcBLen - 5] */
+        acc3 += ((q15_t) x3 * c0);
+
+        /* Reuse the present samples for the next MAC */
+        x0 = x1;
+        x1 = x2;
+        x2 = x3;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q7_t) (__SSAT(acc0 >> 7u, 8));
+      *pOut++ = (q7_t) (__SSAT(acc1 >> 7u, 8));
+      *pOut++ = (q7_t) (__SSAT(acc2 >> 7u, 8));
+      *pOut++ = (q7_t) (__SSAT(acc3 >> 7u, 8));
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+
+        /* Reading two inputs of SrcA buffer and packing */
+        in1 = (q15_t) * px++;
+        in2 = (q15_t) * px++;
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* Reading two inputs of SrcB buffer and packing */
+        in1 = (q15_t) * py--;
+        in2 = (q15_t) * py--;
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* Perform the multiply-accumulates */
+        sum = __SMLAD(input1, input2, sum);
+
+        /* Reading two inputs of SrcA buffer and packing */
+        in1 = (q15_t) * px++;
+        in2 = (q15_t) * px++;
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* Reading two inputs of SrcB buffer and packing */
+        in1 = (q15_t) * py--;
+        in2 = (q15_t) * py--;
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+        /* Perform the multiply-accumulates */
+        sum = __SMLAD(input1, input2, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q15_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q7_t) (__SSAT(sum >> 7u, 8));
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,
+     * the blockSize2 loop is not unrolled */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* srcBLen number of MACS should be performed */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += ((q15_t) * px++ * *py--);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut++ = (q7_t) (__SSAT(sum >> 7u, 8));
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pSrc2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+
+  /* --------------------------   
+   * Initializations of stage3   
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[srcBLen-1] + x[srcALen-srcBLen+2] * y[srcBLen-2] +...+ x[srcALen-1] * y[1]   
+   * sum += x[srcALen-srcBLen+2] * y[srcBLen-1] + x[srcALen-srcBLen+3] * y[srcBLen-2] +...+ x[srcALen-1] * y[2]   
+   * ....   
+   * sum +=  x[srcALen-2] * y[srcBLen-1] + x[srcALen-1] * y[srcBLen-2]   
+   * sum +=  x[srcALen-1] * y[srcBLen-1]   
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.   
+     The blockSize3 variable holds the number of MAC operations performed */
+
+  /* Working pointer of inputA */
+  pSrc1 = pIn1 + (srcALen - (srcBLen - 1u));
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  pSrc2 = pIn2 + (srcBLen - 1u);
+  py = pSrc2;
+
+  /* -------------------   
+   * Stage3 process   
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = blockSize3 >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Reading two inputs, x[srcALen - srcBLen + 1] and x[srcALen - srcBLen + 2] of SrcA buffer and packing */
+      in1 = (q15_t) * px++;
+      in2 = (q15_t) * px++;
+      input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+      /* Reading two inputs, y[srcBLen - 1] and y[srcBLen - 2] of SrcB buffer and packing */
+      in1 = (q15_t) * py--;
+      in2 = (q15_t) * py--;
+      input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+      /* sum += x[srcALen - srcBLen + 1] * y[srcBLen - 1] */
+      /* sum += x[srcALen - srcBLen + 2] * y[srcBLen - 2] */
+      sum = __SMLAD(input1, input2, sum);
+
+      /* Reading two inputs, x[srcALen - srcBLen + 3] and x[srcALen - srcBLen + 4] of SrcA buffer and packing */
+      in1 = (q15_t) * px++;
+      in2 = (q15_t) * px++;
+      input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+      /* Reading two inputs, y[srcBLen - 3] and y[srcBLen - 4] of SrcB buffer and packing */
+      in1 = (q15_t) * py--;
+      in2 = (q15_t) * py--;
+      input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16u);
+
+      /* sum += x[srcALen - srcBLen + 3] * y[srcBLen - 3] */
+      /* sum += x[srcALen - srcBLen + 4] * y[srcBLen - 4] */
+      sum = __SMLAD(input1, input2, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the blockSize3 is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = blockSize3 % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum += ((q15_t) * px++ * *py--);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut++ = (q7_t) (__SSAT(sum >> 7u, 8));
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pSrc2;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q7_t *pIn1 = pSrcA;                            /* input pointer */
+  q7_t *pIn2 = pSrcB;                            /* coefficient pointer */
+  q31_t sum;                                     /* Accumulator */
+  uint32_t i, j;                                 /* loop counter */
+
+  /* Loop to calculate output of convolution for output length number of times */
+  for (i = 0; i < (srcALen + srcBLen - 1); i++)
+  {
+    /* Initialize sum with zero to carry on MAC operations */
+    sum = 0;
+
+    /* Loop to perform MAC operations according to convolution equation */
+    for (j = 0; j <= i; j++)
+    {
+      /* Check the array limitations */
+      if(((i - j) < srcBLen) && (j < srcALen))
+      {
+        /* z[i] += x[j] * y[i-j] */
+        sum += (q15_t) pIn1[j] * (pIn2[i - j]);
+      }
+    }
+
+    /* Store the output in the destination buffer */
+    pDst[i] = (q7_t) __SSAT((sum >> 7u), 8u);
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY        */
+
+}
+
+/**   
+ * @} end of Conv group   
+ */
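
The Cortex-M3/M4 path of arm_conv_q7.c above sign-extends pairs of Q7 samples to 16 bits, packs them into one 32-bit word, and feeds them to `__SMLAD`, which performs two 16-bit multiply-accumulates at once. The following is a portable sketch of that idea for checking the arithmetic on a host machine. All helper names (`pack_q7_pair`, `smlad_model`, `ssat8`, `mac4_packed`) are hypothetical, and `smlad_model` only models the arithmetic result of the intrinsic; it does not model the Q flag or wrap-around on overflow.

```c
#include <stdint.h>

typedef int8_t  q7_t;
typedef int16_t q15_t;
typedef int32_t q31_t;

/* Sign-extend the low 16 bits of v to a 32-bit signed value */
static q31_t sext16(uint32_t v)
{
  v &= 0xFFFFu;
  return (v & 0x8000u) ? (q31_t) v - 65536 : (q31_t) v;
}

/* Pack two Q7 samples, each sign-extended to 16 bits, into one word:
 * lo in bits 0..15, hi in bits 16..31 */
static uint32_t pack_q7_pair(q7_t lo, q7_t hi)
{
  return ((uint32_t) (uint16_t) (q15_t) lo) |
         (((uint32_t) (uint16_t) (q15_t) hi) << 16);
}

/* Model of the dual MAC: acc + x.lo * y.lo + x.hi * y.hi */
static q31_t smlad_model(uint32_t x, uint32_t y, q31_t acc)
{
  return acc + sext16(x) * sext16(y) + sext16(x >> 16) * sext16(y >> 16);
}

/* Saturate a 32-bit value to the signed 8-bit range, like __SSAT(v, 8) */
static q7_t ssat8(q31_t v)
{
  if (v > 127)  { return (q7_t) 127; }
  if (v < -128) { return (q7_t) -128; }
  return (q7_t) v;
}

/* Four MACs as in the unrolled loops: px walks forward through inputA,
 * py walks backward through inputB (py points at the last sample used) */
static q31_t mac4_packed(const q7_t *px, const q7_t *py)
{
  q31_t sum = 0;

  sum = smlad_model(pack_q7_pair(px[0], px[1]),
                    pack_q7_pair(py[0], py[-1]), sum);
  sum = smlad_model(pack_q7_pair(px[2], px[3]),
                    pack_q7_pair(py[-2], py[-3]), sum);
  return sum;
}
```

The 18.14 sum returned by `mac4_packed` would then be written out as `ssat8(sum >> 7)`, matching the `__SSAT(sum >> 7u, 8)` store in the code above (an arithmetic right shift is assumed for negative sums).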

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,739 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_correlate_f32.c    
+*    
+* Description:	 Correlation of floating-point sequences.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup Corr Correlation    
+ *    
+ * Correlation is a mathematical operation that is similar to convolution.    
+ * As with convolution, correlation uses two signals to produce a third signal.    
+ * The underlying algorithms in correlation and convolution are identical except that one of the inputs is flipped in convolution.    
+ * Correlation is commonly used to measure the similarity between two signals.    
+ * It has applications in pattern recognition, cryptanalysis, and searching.    
+ * The CMSIS library provides correlation functions for Q7, Q15, Q31 and floating-point data types.    
+ * Fast versions of the Q15 and Q31 functions are also provided.    
+ *    
+ * \par Algorithm    
+ * Let <code>a[n]</code> and <code>b[n]</code> be sequences of length <code>srcALen</code> and <code>srcBLen</code> samples respectively.    
+ * The convolution of the two signals is denoted by    
+ * <pre>    
+ *                   c[n] = a[n] * b[n]    
+ * </pre>    
+ * In correlation, one of the signals is flipped in time    
+ * <pre>    
+ *                   c[n] = a[n] * b[-n]    
+ * </pre>    
+ *    
+ * \par    
+ * and this is mathematically defined as    
+ * \image html CorrelateEquation.gif    
+ * \par    
+ * The <code>pSrcA</code> points to the first input vector of length <code>srcALen</code> and <code>pSrcB</code> points to the second input vector of length <code>srcBLen</code>.    
+ * The result <code>c[n]</code> is of length <code>2 * max(srcALen, srcBLen) - 1</code> and is defined over the interval <code>n=0, 1, 2, ..., (2 * max(srcALen, srcBLen) - 2)</code>.    
+ * The output result is written to <code>pDst</code> and the calling function must allocate <code>2 * max(srcALen, srcBLen) - 1</code> words for the result.    
+ *    
+ * <b>Note</b>   
+ * \par  
+ * The <code>pDst</code> should be initialized to all zeros before being used.  
+ *  
+ * <b>Fixed-Point Behavior</b>    
+ * \par    
+ * Correlation requires summing up a large number of intermediate products.    
+ * As such, the Q7, Q15, and Q31 functions run a risk of overflow and saturation.    
+ * Refer to the function specific documentation below for further details of the particular algorithm used.    
+ *
+ *
+ * <b>Fast Versions</b>
+ *
+ * \par 
+ * Fast versions are provided for Q31 and Q15.  They take fewer cycles than the standard Q31 and Q15 correlate functions, but the design requires
+ * that the input signals be scaled down to avoid intermediate overflows.
+ *
+ *
+ * <b>Opt Versions</b>
+ *
+ * \par 
+ * Opt versions are provided for Q15 and Q7.  The design uses a scratch buffer to achieve better optimisation.
+ * These versions take fewer cycles but consume more memory (scratch memory) than the standard Q15 and Q7 versions of correlate.
+ */
+
+/**    
+ * @addtogroup Corr    
+ * @{    
+ */
+/**    
+ * @brief Correlation of floating-point sequences.    
+ * @param[in]  *pSrcA points to the first input sequence.    
+ * @param[in]  srcALen length of the first input sequence.    
+ * @param[in]  *pSrcB points to the second input sequence.    
+ * @param[in]  srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.    
+ * @return none.    
+ */
+
+void arm_correlate_f32(
+  float32_t * pSrcA,
+  uint32_t srcALen,
+  float32_t * pSrcB,
+  uint32_t srcBLen,
+  float32_t * pDst)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t *pIn1;                               /* inputA pointer */
+  float32_t *pIn2;                               /* inputB pointer */
+  float32_t *pOut = pDst;                        /* output pointer */
+  float32_t *px;                                 /* Intermediate inputA pointer */
+  float32_t *py;                                 /* Intermediate inputB pointer */
+  float32_t *pSrc1;                              /* Intermediate pointers */
+  float32_t sum, acc0, acc1, acc2, acc3;         /* Accumulators */
+  float32_t x0, x1, x2, x3, c0;                  /* temporary variables for holding input and coefficient values */
+  uint32_t j, k = 0u, count, blkCnt, outBlockSize, blockSize1, blockSize2, blockSize3;  /* loop counters */
+  int32_t inc = 1;                               /* Destination address modifier */
+
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc is set to -1 */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
+  /* But to improve the performance,    
+   * we assume zeroes in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen,    
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
+  /* If srcALen < srcBLen,    
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcA;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcB;
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding has to be done to srcB    
+     * to make their lengths equal.    
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))    
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Updating the pointer position to non zero value */
+    pOut += j;
+
+    //while(j > 0u)   
+    //{   
+    //  /* Zero is stored in the destination buffer */   
+    //  *pOut++ = 0.0f;   
+
+    //  /* Decrement the loop counter */   
+    //  j--;   
+    //}   
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization of inputB pointer */
+    pIn2 = pSrcA;
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  /* The function is internally    
+   * divided into three parts according to the number of multiplications that take    
+   * place between inputA samples and inputB samples. In the first part of the    
+   * algorithm, the multiplications increase by one for every iteration.    
+   * In the second part of the algorithm, srcBLen number of multiplications are done.    
+   * In the third part of the algorithm, the multiplications decrease by one    
+   * for every iteration.*/
+  /* The algorithm is implemented in three stages.    
+   * The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------    
+   * Initializations of stage1    
+   * -------------------------*/
+
+  /* sum = x[0] * y[srcBLen - 1]    
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 1]    
+   * ....    
+   * sum = x[0] * y[1] + x[1] * y[2] +...+ x[srcBLen - 2] * y[srcBLen - 1]    
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc1 = pIn2 + (srcBLen - 1u);
+  py = pSrc1;
+
+  /* ------------------------    
+   * Stage1 process    
+   * ----------------------*/
+
+  /* The first stage starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0.0f;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[srcBLen - 4] */
+      sum += *px++ * *py++;
+      /* x[1] * y[srcBLen - 3] */
+      sum += *px++ * *py++;
+      /* x[2] * y[srcBLen - 2] */
+      sum += *px++ * *py++;
+      /* x[3] * y[srcBLen - 1] */
+      sum += *px++ * *py++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      /* x[0] * y[srcBLen - 1] */
+      sum += *px++ * *py++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = sum;
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pSrc1 - count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------    
+   * Initializations of stage2    
+   * ------------------------*/
+
+  /* sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen-1] * y[srcBLen-1]    
+   * sum = x[1] * y[0] + x[2] * y[1] +...+ x[srcBLen] * y[srcBLen-1]    
+   * ....    
+   * sum = x[srcALen-srcBLen] * y[0] + x[srcALen-srcBLen+1] * y[1] +...+ x[srcALen-1] * y[srcBLen-1]    
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------    
+   * Stage2 process    
+   * ------------------*/
+
+  /* Stage2 performs srcBLen MACs per output sample.    
+   * So the blockSize2 loop is unrolled by 4 only when    
+   * srcBLen is greater than or equal to 4, which also lets the srcBLen loop be unrolled */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0.0f;
+      acc1 = 0.0f;
+      acc2 = 0.0f;
+      acc3 = 0.0f;
+
+      /* read x[0], x[1], x[2] samples */
+      x0 = *(px++);
+      x1 = *(px++);
+      x2 = *(px++);
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read y[0] sample */
+        c0 = *(py++);
+
+        /* Read x[3] sample */
+        x3 = *(px++);
+
+        /* Perform the multiply-accumulate */
+        /* acc0 +=  x[0] * y[0] */
+        acc0 += x0 * c0;
+        /* acc1 +=  x[1] * y[0] */
+        acc1 += x1 * c0;
+        /* acc2 +=  x[2] * y[0] */
+        acc2 += x2 * c0;
+        /* acc3 +=  x[3] * y[0] */
+        acc3 += x3 * c0;
+
+        /* Read y[1] sample */
+        c0 = *(py++);
+
+        /* Read x[4] sample */
+        x0 = *(px++);
+
+        /* Perform the multiply-accumulate */
+        /* acc0 +=  x[1] * y[1] */
+        acc0 += x1 * c0;
+        /* acc1 +=  x[2] * y[1] */
+        acc1 += x2 * c0;
+        /* acc2 +=  x[3] * y[1] */
+        acc2 += x3 * c0;
+        /* acc3 +=  x[4] * y[1] */
+        acc3 += x0 * c0;
+
+        /* Read y[2] sample */
+        c0 = *(py++);
+
+        /* Read x[5] sample */
+        x1 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[2] * y[2] */
+        acc0 += x2 * c0;
+        /* acc1 +=  x[3] * y[2] */
+        acc1 += x3 * c0;
+        /* acc2 +=  x[4] * y[2] */
+        acc2 += x0 * c0;
+        /* acc3 +=  x[5] * y[2] */
+        acc3 += x1 * c0;
+
+        /* Read y[3] sample */
+        c0 = *(py++);
+
+        /* Read x[6] sample */
+        x2 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[3] * y[3] */
+        acc0 += x3 * c0;
+        /* acc1 +=  x[4] * y[3] */
+        acc1 += x0 * c0;
+        /* acc2 +=  x[5] * y[3] */
+        acc2 += x1 * c0;
+        /* acc3 +=  x[6] * y[3] */
+        acc3 += x2 * c0;
+
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Read y[4] sample */
+        c0 = *(py++);
+
+        /* Read x[7] sample */
+        x3 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[4] * y[4] */
+        acc0 += x0 * c0;
+        /* acc1 +=  x[5] * y[4] */
+        acc1 += x1 * c0;
+        /* acc2 +=  x[6] * y[4] */
+        acc2 += x2 * c0;
+        /* acc3 +=  x[7] * y[4] */
+        acc3 += x3 * c0;
+
+        /* Reuse the present samples for the next MAC */
+        x0 = x1;
+        x1 = x2;
+        x2 = x3;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = acc0;
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      *pOut = acc1;
+      pOut += inc;
+
+      *pOut = acc2;
+      pOut += inc;
+
+      *pOut = acc3;
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0.0f;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += *px++ * *py++;
+        sum += *px++ * *py++;
+        sum += *px++ * *py++;
+        sum += *px++ * *py++;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += *px++ * *py++;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = sum;
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,    
+     * the blockSize2 loop is not unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0.0f;
+
+      /* Loop over srcBLen */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += *px++ * *py++;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = sum;
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+  /* --------------------------    
+   * Initializations of stage3    
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[0] + x[srcALen-srcBLen+2] * y[1] +...+ x[srcALen-1] * y[srcBLen-2]    
+   * sum += x[srcALen-srcBLen+2] * y[0] + x[srcALen-srcBLen+3] * y[1] +...+ x[srcALen-1] * y[srcBLen-3]    
+   * ....    
+   * sum +=  x[srcALen-2] * y[0] + x[srcALen-1] * y[1]    
+   * sum +=  x[srcALen-1] * y[0]    
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = srcBLen - 1u;
+
+  /* Working pointer of inputA */
+  pSrc1 = pIn1 + (srcALen - (srcBLen - 1u));
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* -------------------    
+   * Stage3 process    
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0.0f;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum += x[srcALen - srcBLen + 1] * y[0] */
+      sum += *px++ * *py++;
+      /* sum += x[srcALen - srcBLen + 2] * y[1] */
+      sum += *px++ * *py++;
+      /* sum += x[srcALen - srcBLen + 3] * y[2] */
+      sum += *px++ * *py++;
+      /* sum += x[srcALen - srcBLen + 4] * y[3] */
+      sum += *px++ * *py++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum += *px++ * *py++;
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = sum;
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the MAC count */
+    count--;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  float32_t *pIn1 = pSrcA;                       /* inputA pointer */
+  float32_t *pIn2 = pSrcB + (srcBLen - 1u);      /* inputB pointer */
+  float32_t sum;                                 /* Accumulator */
+  uint32_t i = 0u, j;                            /* loop counters */
+  uint32_t inv = 0u;                             /* Reverse order flag */
+  uint32_t tot = 0u;                             /* Length */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and a variable, inv is set to 1 */
+  /* If lengths are not equal then zero padding would be needed to make the two    
+   * inputs the same length. But to improve the performance, we assume zeroes    
+   * in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen, (srcALen - srcBLen) zeroes have to be included at the    
+   * start of the output buffer */
+  /* If srcALen < srcBLen, (srcBLen - srcALen) zeroes have to be included at the   
+   * end of the output buffer */
+  /* Once the zero padding is accounted for, the rest of the output is calculated   
+   * using convolution but with the shorter signal time reversed. */
+
+  /* Calculate the length of the remaining sequence */
+  tot = ((srcALen + srcBLen) - 2u);
+
+  if(srcALen > srcBLen)
+  {
+    /* Calculating the number of zeros to be padded to the output */
+    j = srcALen - srcBLen;
+
+    /* Initialise the pointer after zero padding */
+    pDst += j;
+  }
+
+  else if(srcALen < srcBLen)
+  {
+    /* Initialization to inputB pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization to the end of inputA pointer */
+    pIn2 = pSrcA + (srcALen - 1u);
+
+    /* Initialisation of the pointer after zero padding */
+    pDst = pDst + tot;
+
+    /* Swapping the lengths */
+    j = srcALen;
+    srcALen = srcBLen;
+    srcBLen = j;
+
+    /* Setting the reverse flag */
+    inv = 1;
+
+  }
+
+  /* Loop to calculate convolution for output length number of times */
+  for (i = 0u; i <= tot; i++)
+  {
+    /* Initialize sum with zero to carry on MAC operations */
+    sum = 0.0f;
+
+    /* Loop to perform MAC operations according to convolution equation */
+    for (j = 0u; j <= i; j++)
+    {
+      /* Check the array limitations */
+      if((((i - j) < srcBLen) && (j < srcALen)))
+      {
+        /* z[i] += x[i-j] * y[j] */
+        sum += pIn1[j] * pIn2[-((int32_t) i - j)];
+      }
+    }
+    /* Store the output in the destination buffer */
+    if(inv == 1)
+      *pDst-- = sum;
+    else
+      *pDst++ = sum;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of Corr group    
+ */
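The output layout described in the comments of arm_correlate_f32.c above (2 * max(srcALen, srcBLen) - 1 samples, with leading zeros when srcALen > srcBLen and trailing zeros when srcALen < srcBLen) can be cross-checked with a plain double loop. The following is a minimal sketch, not part of this changeset or of CMSIS-DSP: the name corr_ref_f32 is hypothetical, and unlike the library routine it writes the zero samples explicitly instead of relying on a pre-zeroed pDst.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not a CMSIS-DSP function).
 * Computes dst[(maxLen - 1) + m] = sum over k of a[k] * b[k - m]
 * for lags m = -(maxLen - 1) .. (maxLen - 1), treating samples
 * outside either input as zero.
 * dst must hold 2 * max(aLen, bLen) - 1 floats. */
static void corr_ref_f32(const float *a, size_t aLen,
                         const float *b, size_t bLen,
                         float *dst)
{
    assert(aLen > 0u && bLen > 0u);

    size_t maxLen = (aLen > bLen) ? aLen : bLen;
    size_t outLen = 2u * maxLen - 1u;

    for (size_t n = 0u; n < outLen; n++) {
        /* Lag associated with output index n */
        long m = (long)n - (long)(maxLen - 1u);
        float sum = 0.0f;

        for (size_t k = 0u; k < aLen; k++) {
            long jb = (long)k - m;          /* index into b */
            if (jb >= 0 && jb < (long)bLen) {
                sum += a[k] * b[jb];
            }
        }
        dst[n] = sum;                       /* zero lags are written too */
    }
}
```

For a = {1, 2, 3} and b = {1, 1} this yields {0, 1, 3, 5, 3}; swapping the arguments yields the same samples in reverse order, which matches the CORR(x, y) = Reverse order(CORR(y, x)) note in the source. Such a helper is only meant for checking results on a host, since it does none of the loop unrolling used above.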

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_opt_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_opt_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,512 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_correlate_fast_opt_q15.c    
+*    
+* Description:	Fast Q15 Correlation.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Corr    
+ * @{    
+ */
+
+/**    
+ * @brief Correlation of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.    
+ * @param[in]  *pScratch points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.   
+ * @return none.    
+ *    
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.    
+ *	In this case the input, output and scratch buffers should be 32-bit aligned.    
+ *    
+ *     
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * This fast version uses a 32-bit accumulator with 2.30 format.    
+ * The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * There is no saturation on intermediate additions.    
+ * Thus, if the accumulator overflows it wraps around and distorts the result.    
+ * The input signals should be scaled down to avoid intermediate overflows.    
+ * Scale down one of the inputs by 1/min(srcALen, srcBLen) to avoid overflow since a    
+ * maximum of min(srcALen, srcBLen) number of additions is carried internally.    
+ * The 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format to yield the final result.    
+ *    
+ * \par    
+ * See <code>arm_correlate_q15()</code> for a slower implementation of this function which uses a 64-bit accumulator to avoid wrap around distortion.    
+ */
+
+void arm_correlate_fast_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  q15_t * pScratch)
+{
+  q15_t *pIn1;                                   /* inputA pointer               */
+  q15_t *pIn2;                                   /* inputB pointer               */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulators                  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q31_t x1, x2, x3;                              /* temporary variables for holding input and coefficient values */
+  uint32_t j, blkCnt, outBlockSize;              /* loop counter                 */
+  int32_t inc = 1;                               /* Destination address modifier */
+  uint32_t tapCnt;
+  q31_t y1, y2;
+  q15_t *pScr;                                   /* Intermediate pointers        */
+  q15_t *pOut = pDst;                            /* output pointer               */
+#ifdef UNALIGNED_SUPPORT_DISABLE
+
+  q15_t a, b;
+
+#endif	/*	#ifdef UNALIGNED_SUPPORT_DISABLE	*/
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc is set to -1 */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
+  /* But to improve the performance,        
+   * we include zeroes in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen,        
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
+  /* If srcALen < srcBLen,        
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB        
+     * to make their lengths equal.        
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))        
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Updating the pointer position to non zero value */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  pScr = pScratch;
+
+  /* Fill (srcBLen - 1u) zeros in scratch buffer */
+  arm_fill_q15(0, pScr, (srcBLen - 1u));
+
+  /* Update temporary scratch pointer */
+  pScr += (srcBLen - 1u);
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Copy (srcALen) samples in scratch buffer */
+  arm_copy_q15(pIn1, pScr, srcALen);
+
+  /* Update pointers */
+  pScr += srcALen;
+
+#else
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  j = srcALen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(j > 0u)
+  {
+    /* Copy four samples of the longer sequence into the scratch buffer */
+    *pScr++ = *pIn1++;
+    *pScr++ = *pIn1++;
+    *pScr++ = *pIn1++;
+    *pScr++ = *pIn1++;
+
+    /* Decrement the loop counter */
+    j--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  j = srcALen % 0x4u;
+
+  while(j > 0u)
+  {
+    /* Copy the remaining samples of the longer sequence into the scratch buffer */
+    *pScr++ = *pIn1++;
+
+    /* Decrement the loop counter */
+    j--;
+  }
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+  arm_fill_q15(0, pScr, (srcBLen - 1u));
+
+  /* Update pointer */
+  pScr += (srcBLen - 1u);
+
+#else
+
+  /* Apply loop unrolling and write 4 zeros at a time. */
+  j = (srcBLen - 1u) >> 2u;
+
+  /* First part of the processing with loop unrolling writes 4 zeros at a time.       
+   ** a second loop below writes the remaining 1 to 3 zeros. */
+  while(j > 0u)
+  {
+    /* Fill zeros at the end of the scratch buffer */
+    *pScr++ = 0;
+    *pScr++ = 0;
+    *pScr++ = 0;
+    *pScr++ = 0;
+
+    /* Decrement the loop counter */
+    j--;
+  }
+
+  /* If the count is not a multiple of 4, write the remaining zeros here.       
+   ** No loop unrolling is used. */
+  j = (srcBLen - 1u) % 0x4u;
+
+  while(j > 0u)
+  {
+    /* Fill the remaining zeros at the end of the scratch buffer */
+    *pScr++ = 0;
+
+    /* Decrement the loop counter */
+    j--;
+  }
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+  /* Save the start of inputB so it can be restored after each output block */
+  py = pIn2;
+
+
+  /* Actual correlation process starts here */
+  blkCnt = (srcALen + srcBLen - 1u) >> 2;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer to the current scratch position */
+    pScr = pScratch;
+
+    /* Clear accumulators */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Read two samples from the scratch buffer */
+    x1 = *__SIMD32(pScr)++;
+
+    /* Read the next two samples from the scratch buffer */
+    x2 = *__SIMD32(pScr)++;
+
+    tapCnt = (srcBLen) >> 2u;
+
+    while(tapCnt > 0u)
+    {
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+      /* Read four samples from smaller buffer */
+      y1 = _SIMD32_OFFSET(pIn2);
+      y2 = _SIMD32_OFFSET(pIn2 + 2u);
+
+      acc0 = __SMLAD(x1, y1, acc0);
+
+      acc2 = __SMLAD(x2, y1, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc1 = __SMLADX(x3, y1, acc1);
+
+      x1 = _SIMD32_OFFSET(pScr);
+
+      acc0 = __SMLAD(x2, y2, acc0);
+
+      acc2 = __SMLAD(x1, y2, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y1, acc3);
+
+      acc1 = __SMLADX(x3, y2, acc1);
+
+      x2 = _SIMD32_OFFSET(pScr + 2u);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y2, acc3);
+#else	 
+
+      /* Read four samples from smaller buffer */
+	  a = *pIn2;
+	  b = *(pIn2 + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      y1 = __PKHBT(a, b, 16);
+#else
+      y1 = __PKHBT(b, a, 16);
+#endif
+	  
+	  a = *(pIn2 + 2);
+	  b = *(pIn2 + 3);
+#ifndef ARM_MATH_BIG_ENDIAN
+      y2 = __PKHBT(a, b, 16);
+#else
+      y2 = __PKHBT(b, a, 16);
+#endif				
+
+      acc0 = __SMLAD(x1, y1, acc0);
+
+      acc2 = __SMLAD(x2, y1, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc1 = __SMLADX(x3, y1, acc1);
+
+	  a = *pScr;
+	  b = *(pScr + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(a, b, 16);
+#else
+      x1 = __PKHBT(b, a, 16);
+#endif
+
+      acc0 = __SMLAD(x2, y2, acc0);
+
+      acc2 = __SMLAD(x1, y2, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y1, acc3);
+
+      acc1 = __SMLADX(x3, y2, acc1);
+
+	  a = *(pScr + 2);
+	  b = *(pScr + 3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x2 = __PKHBT(a, b, 16);
+#else
+      x2 = __PKHBT(b, a, 16);
+#endif
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y2, acc3);
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+      pIn2 += 4u;
+
+      pScr += 4u;
+
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+
+
+    /* Update scratch pointer for remaining samples of smaller length sequence */
+    pScr -= 4u;
+
+
+    /* Apply the same computation to the remaining samples of the shorter sequence */
+    tapCnt = (srcBLen) & 3u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* Accumulate the results */
+      acc0 += (*pScr++ * *pIn2);
+      acc1 += (*pScr++ * *pIn2);
+      acc2 += (*pScr++ * *pIn2);
+      acc3 += (*pScr++ * *pIn2++);
+
+      pScr -= 3u;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+
+    /* Store the results in the accumulators in the destination buffer. */
+    *pOut = (__SSAT(acc0 >> 15u, 16));
+    pOut += inc;
+    *pOut = (__SSAT(acc1 >> 15u, 16));
+    pOut += inc;
+    *pOut = (__SSAT(acc2 >> 15u, 16));
+    pOut += inc;
+    *pOut = (__SSAT(acc3 >> 15u, 16));
+    pOut += inc;
+
+
+    /* Initialization of inputB pointer */
+    pIn2 = py;
+
+    pScratch += 4u;
+
+  }
+
+
+  blkCnt = (srcALen + srcBLen - 1u) & 0x3;
+
+  /* Calculate correlation for remaining samples of the longer sequence */
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer as scratch1 */
+    pScr = pScratch;
+
+    /* Clear accumulator */
+    acc0 = 0;
+
+    tapCnt = (srcBLen) >> 1u;
+
+    while(tapCnt > 0u)
+    {
+
+      acc0 += (*pScr++ * *pIn2++);
+      acc0 += (*pScr++ * *pIn2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    tapCnt = (srcBLen) & 1u;
+
+    /* Apply the same computation to the remaining sample of the shorter sequence */
+    while(tapCnt > 0u)
+    {
+
+      /* Accumulate the results */
+      acc0 += (*pScr++ * *pIn2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+    /* Store the result in the accumulator in the destination buffer. */
+
+    *pOut = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+    pOut += inc;
+
+    /* Initialization of inputB pointer */
+    pIn2 = py;
+
+    pScratch += 1u;
+
+  }
+}
+
+/**    
+ * @} end of Corr group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,1319 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_correlate_fast_q15.c   
+*   
+* Description:	Fast Q15 Correlation.   
+*   
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup Corr   
+ * @{   
+ */
+
+/**   
+ * @brief Correlation of Q15 sequences (fast version) for Cortex-M3 and Cortex-M4.   
+ * @param[in] *pSrcA points to the first input sequence.   
+ * @param[in] srcALen length of the first input sequence.   
+ * @param[in] *pSrcB points to the second input sequence.   
+ * @param[in] srcBLen length of the second input sequence.   
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.   
+ * @return none.   
+ *   
+ * <b>Scaling and Overflow Behavior:</b>   
+ *   
+ * \par   
+ * This fast version uses a 32-bit accumulator with 2.30 format.   
+ * The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit.   
+ * There is no saturation on intermediate additions.   
+ * Thus, if the accumulator overflows it wraps around and distorts the result.   
+ * The input signals should be scaled down to avoid intermediate overflows.   
+ * Scale down one of the inputs by 1/min(srcALen, srcBLen) to avoid overflow, since a
+ * maximum of min(srcALen, srcBLen) additions is carried out internally.
+ * The 2.30 accumulator is right shifted by 15 bits and then saturated to 1.15 format to yield the final result.   
+ *   
+ * \par   
+ * See <code>arm_correlate_q15()</code> for a slower implementation of this function which uses a 64-bit accumulator to avoid wrap around distortion.   
+ */
+
+void arm_correlate_fast_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst)
+{
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  q15_t *pIn1;                                   /* inputA pointer               */
+  q15_t *pIn2;                                   /* inputB pointer               */
+  q15_t *pOut = pDst;                            /* output pointer               */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulators                  */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1;                                  /* Intermediate pointers        */
+  q31_t x0, x1, x2, x3, c0;                      /* temporary variables for holding input and coefficient values */
+  uint32_t j, k = 0u, count, blkCnt, outBlockSize, blockSize1, blockSize2, blockSize3;  /* loop counter                 */
+  int32_t inc = 1;                               /* Destination address modifier */
+
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc is set to -1 */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
+  /* But to improve the performance,
+   * we include zeroes in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen,
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
+  /* If srcALen < srcBLen,
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB   
+     * to make their lengths equal.   
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))   
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Advance the output pointer past the j output samples that correspond to the zero padding */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  /* The function is internally
+   * divided into three parts according to the number of multiplications that have to
+   * take place between inputA samples and inputB samples. In the first part of the
+   * algorithm, the multiplications increase by one for every iteration.
+   * In the second part of the algorithm, srcBLen number of multiplications are done.
+   * In the third part of the algorithm, the multiplications decrease by one
+   * for every iteration.*/
+  /* The algorithm is implemented in three stages.
+   * The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------   
+   * Initializations of stage1   
+   * -------------------------*/
+
+  /* sum = x[0] * y[srcBLen - 1]
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 1]
+   * ....
+   * sum = x[0] * y[1] + x[1] * y[2] +...+ x[srcBLen - 2] * y[srcBLen - 1]
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc1 = pIn2 + (srcBLen - 1u);
+  py = pSrc1;
+
+  /* ------------------------   
+   * Stage1 process   
+   * ----------------------*/
+
+  /* The first loop starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[srcBLen - 4] , x[1] * y[srcBLen - 3] */
+      sum = __SMLAD(*__SIMD32(px)++, *__SIMD32(py)++, sum);
+      /* x[3] * y[srcBLen - 1] , x[2] * y[srcBLen - 2] */
+      sum = __SMLAD(*__SIMD32(px)++, *__SIMD32(py)++, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* x[0] * y[srcBLen - 1] */
+      sum = __SMLAD(*px++, *py++, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q15_t) (sum >> 15);
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pSrc1 - count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------   
+   * Initializations of stage2   
+   * ------------------------*/
+
+  /* sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen-1] * y[srcBLen-1]   
+   * sum = x[1] * y[0] + x[2] * y[1] +...+ x[srcBLen] * y[srcBLen-1]   
+   * ....   
+   * sum = x[srcALen-srcBLen] * y[0] + x[srcALen-srcBLen+1] * y[1] +...+ x[srcALen-1] * y[srcBLen-1]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------   
+   * Stage2 process   
+   * ------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+   * So, to loop unroll over blockSize2,   
+   * srcBLen should be greater than or equal to 4, to loop unroll the srcBLen loop */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* read x[0], x[1] samples */
+      x0 = *__SIMD32(px);
+      /* read x[1], x[2] samples */
+      x1 = _SIMD32_OFFSET(px + 1);
+	  px += 2u;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read the first two inputB samples using SIMD:   
+         * y[0] and y[1] */
+        c0 = *__SIMD32(py)++;
+
+        /* acc0 +=  x[0] * y[0] + x[1] * y[1] */
+        acc0 = __SMLAD(x0, c0, acc0);
+
+        /* acc1 +=  x[1] * y[0] + x[2] * y[1] */
+        acc1 = __SMLAD(x1, c0, acc1);
+
+        /* Read x[2], x[3] */
+        x2 = *__SIMD32(px);
+
+        /* Read x[3], x[4] */
+        x3 = _SIMD32_OFFSET(px + 1);
+
+        /* acc2 +=  x[2] * y[0] + x[3] * y[1] */
+        acc2 = __SMLAD(x2, c0, acc2);
+
+        /* acc3 +=  x[3] * y[0] + x[4] * y[1] */
+        acc3 = __SMLAD(x3, c0, acc3);
+
+        /* Read y[2] and y[3] */
+        c0 = *__SIMD32(py)++;
+
+        /* acc0 +=  x[2] * y[2] + x[3] * y[3] */
+        acc0 = __SMLAD(x2, c0, acc0);
+
+        /* acc1 +=  x[3] * y[2] + x[4] * y[3] */
+        acc1 = __SMLAD(x3, c0, acc1);
+
+        /* Read x[4], x[5] */
+        x0 = _SIMD32_OFFSET(px + 2);
+
+        /* Read x[5], x[6] */
+        x1 = _SIMD32_OFFSET(px + 3);
+		px += 4u;
+
+        /* acc2 +=  x[4] * y[2] + x[5] * y[3] */
+        acc2 = __SMLAD(x0, c0, acc2);
+
+        /* acc3 +=  x[5] * y[2] + x[6] * y[3] */
+        acc3 = __SMLAD(x1, c0, acc3);
+
+      } while(--k);
+
+      /* For the next MAC operations, SIMD is not used.
+       * So, the 16-bit pointer of inputB, py, is updated */
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      if(k == 1u)
+      {
+        /* Read y[4] */
+        c0 = *py;
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+
+#else
+
+        c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[7] */
+        x3 = *__SIMD32(px);
+		px++;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLADX(x1, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      if(k == 2u)
+      {
+        /* Read y[4], y[5] */
+        c0 = *__SIMD32(py);
+
+        /* Read x[7], x[8] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[9] */
+        x2 = _SIMD32_OFFSET(px + 1);
+		px += 2u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLAD(x3, c0, acc2);
+        acc3 = __SMLAD(x2, c0, acc3);
+      }
+
+      if(k == 3u)
+      {
+        /* Read y[4], y[5] */
+        c0 = *__SIMD32(py)++;
+
+        /* Read x[7], x[8] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[9] */
+        x2 = _SIMD32_OFFSET(px + 1);
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLAD(x3, c0, acc2);
+        acc3 = __SMLAD(x2, c0, acc3);
+
+        c0 = (*py);
+        /* Read y[6] */
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+#else
+
+        c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[10] */
+        x3 = _SIMD32_OFFSET(px + 2);
+		px += 3u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x1, c0, acc0);
+        acc1 = __SMLAD(x2, c0, acc1);
+        acc2 = __SMLADX(x2, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (acc0 >> 15);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      *pOut = (q15_t) (acc1 >> 15);
+      pOut += inc;
+
+      *pOut = (q15_t) (acc2 >> 15);
+      pOut += inc;
+
+      *pOut = (q15_t) (acc3 >> 15);
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (sum >> 15);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,
+     * the blockSize2 loop is not unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over srcBLen */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += ((q31_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (sum >> 15);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+  /* --------------------------   
+   * Initializations of stage3   
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[0] + x[srcALen-srcBLen+2] * y[1] +...+ x[srcALen-1] * y[srcBLen-2]
+   * sum += x[srcALen-srcBLen+2] * y[0] + x[srcALen-srcBLen+3] * y[1] +...+ x[srcALen-1] * y[srcBLen-3]
+   * ....   
+   * sum +=  x[srcALen-2] * y[0] + x[srcALen-1] * y[1]   
+   * sum +=  x[srcALen-1] * y[0]   
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = srcBLen - 1u;
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* -------------------   
+   * Stage3 process   
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum += x[srcALen - srcBLen + 1] * y[0] , sum += x[srcALen - srcBLen + 2] * y[1] */
+      sum = __SMLAD(*__SIMD32(px)++, *__SIMD32(py)++, sum);
+      /* sum += x[srcALen - srcBLen + 3] * y[2] , sum += x[srcALen - srcBLen + 4] * y[3] */
+      sum = __SMLAD(*__SIMD32(px)++, *__SIMD32(py)++, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum = __SMLAD(*px++, *py++, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q15_t) (sum >> 15);
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the MAC count */
+    count--;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+  q15_t *pIn1;                                   /* inputA pointer               */
+  q15_t *pIn2;                                   /* inputB pointer               */
+  q15_t *pOut = pDst;                            /* output pointer               */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulators                  */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1;                                  /* Intermediate pointers        */
+  q31_t x0, x1, x2, x3, c0;                      /* temporary variables for holding input and coefficient values */
+  uint32_t j, k = 0u, count, blkCnt, outBlockSize, blockSize1, blockSize2, blockSize3;  /* loop counter                 */
+  int32_t inc = 1;                               /* Destination address modifier */
+  q15_t a, b;
+
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc is set to -1 */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
+  /* But to improve the performance,
+   * we include zeroes in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen,
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
+  /* If srcALen < srcBLen,
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB   
+     * to make their lengths equal.   
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))   
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Advance the output pointer past the j output samples that correspond to the zero padding */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  /* The function is internally
+   * divided into three parts according to the number of multiplications that have to
+   * take place between inputA samples and inputB samples. In the first part of the
+   * algorithm, the multiplications increase by one for every iteration.
+   * In the second part of the algorithm, srcBLen number of multiplications are done.
+   * In the third part of the algorithm, the multiplications decrease by one
+   * for every iteration.*/
+  /* The algorithm is implemented in three stages.
+   * The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------   
+   * Initializations of stage1   
+   * -------------------------*/
+
+  /* sum = x[0] * y[srcBLen - 1]
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 1]
+   * ....
+   * sum = x[0] * y[1] + x[1] * y[2] +...+ x[srcBLen - 2] * y[srcBLen - 1]
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc1 = pIn2 + (srcBLen - 1u);
+  py = pSrc1;
+
+  /* ------------------------   
+   * Stage1 process   
+   * ----------------------*/
+
+  /* The first loop starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[srcBLen - 4] , x[1] * y[srcBLen - 3] */
+      sum += ((q31_t) * px++ * *py++);
+      sum += ((q31_t) * px++ * *py++);
+      sum += ((q31_t) * px++ * *py++);
+      sum += ((q31_t) * px++ * *py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* x[0] * y[srcBLen - 1] */
+      sum += ((q31_t) * px++ * *py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q15_t) (sum >> 15);
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pSrc1 - count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------   
+   * Initializations of stage2   
+   * ------------------------*/
+
+  /* sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen-1] * y[srcBLen-1]   
+   * sum = x[1] * y[0] + x[2] * y[1] +...+ x[srcBLen] * y[srcBLen-1]   
+   * ....   
+   * sum = x[srcALen-srcBLen] * y[0] + x[srcALen-srcBLen+1] * y[1] +...+ x[srcALen-1] * y[srcBLen-1]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------   
+   * Stage2 process   
+   * ------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+   * So, to loop unroll over blockSize2,   
+   * srcBLen should be greater than or equal to 4, to loop unroll the srcBLen loop */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* read x[0], x[1], x[2] samples */
+	  a = *px;
+	  b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+	  x0 = __PKHBT(a, b, 16);
+	  a = *(px + 2);
+	  x1 = __PKHBT(b, a, 16);
+
+#else
+
+	  x0 = __PKHBT(b, a, 16);
+	  a = *(px + 2);
+	  x1 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+	  px += 2u;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read the first two inputB samples and pack them into one word:
+         * y[0] and y[1] */
+		  a = *py;
+		  b = *(py + 1);
+	
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+		  c0 = __PKHBT(a, b, 16);
+	
+#else
+	
+		  c0 = __PKHBT(b, a, 16);
+	
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* acc0 +=  x[0] * y[0] + x[1] * y[1] */
+        acc0 = __SMLAD(x0, c0, acc0);
+
+        /* acc1 +=  x[1] * y[0] + x[2] * y[1] */
+        acc1 = __SMLAD(x1, c0, acc1);
+
+        /* Read x[2], x[3], x[4] */
+	  	a = *px;
+	  	b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+	  	x2 = __PKHBT(a, b, 16);
+	  	a = *(px + 2);
+	  	x3 = __PKHBT(b, a, 16);
+
+#else
+
+	  	x2 = __PKHBT(b, a, 16);
+	  	a = *(px + 2);
+	  	x3 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* acc2 +=  x[2] * y[0] + x[3] * y[1] */
+        acc2 = __SMLAD(x2, c0, acc2);
+
+        /* acc3 +=  x[3] * y[0] + x[4] * y[1] */
+        acc3 = __SMLAD(x3, c0, acc3);
+
+        /* Read y[2] and y[3] */
+		  a = *(py + 2);
+		  b = *(py + 3);
+
+		  py += 4u;
+	
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+		  c0 = __PKHBT(a, b, 16);
+	
+#else
+	
+		  c0 = __PKHBT(b, a, 16);
+	
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* acc0 +=  x[2] * y[2] + x[3] * y[3] */
+        acc0 = __SMLAD(x2, c0, acc0);
+
+        /* acc1 +=  x[3] * y[2] + x[4] * y[3] */
+        acc1 = __SMLAD(x3, c0, acc1);
+
+        /* Read x[4], x[5], x[6] */
+	  	a = *(px + 2);
+	  	b = *(px + 3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+	  	x0 = __PKHBT(a, b, 16);
+	  	a = *(px + 4);
+	  	x1 = __PKHBT(b, a, 16);
+
+#else
+
+	  	x0 = __PKHBT(b, a, 16);
+	  	a = *(px + 4);
+	  	x1 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+		px += 4u;
+
+        /* acc2 +=  x[4] * y[2] + x[5] * y[3] */
+        acc2 = __SMLAD(x0, c0, acc2);
+
+        /* acc3 +=  x[5] * y[2] + x[6] * y[3] */
+        acc3 = __SMLAD(x1, c0, acc3);
+
+      } while(--k);
+
+      /* For the next MAC operations, SIMD is not used.   
+       * So, the 16 bit pointer of inputB, py, is updated */
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      if(k == 1u)
+      {
+        /* Read y[4] */
+        c0 = *py;
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+
+#else
+
+        c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[7] */
+		a = *px;
+		b = *(px + 1);
+
+		px++;
+	
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+		x3 = __PKHBT(a, b, 16);
+	
+#else
+	
+		x3 = __PKHBT(b, a, 16);
+	
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+		px++;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLADX(x1, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      if(k == 2u)
+      {
+        /* Read y[4], y[5] */
+		  a = *py;
+		  b = *(py + 1);
+	
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+		  c0 = __PKHBT(a, b, 16);
+	
+#else
+	
+		  c0 = __PKHBT(b, a, 16);
+	
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* Read x[7], x[8], x[9] */
+	  	a = *px;
+	  	b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+	  	x3 = __PKHBT(a, b, 16);
+	  	a = *(px + 2);
+	  	x2 = __PKHBT(b, a, 16);
+
+#else
+
+	  	x3 = __PKHBT(b, a, 16);
+	  	a = *(px + 2);
+	  	x2 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+		px += 2u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLAD(x3, c0, acc2);
+        acc3 = __SMLAD(x2, c0, acc3);
+      }
+
+      if(k == 3u)
+      {
+        /* Read y[4], y[5] */
+		  a = *py;
+		  b = *(py + 1);
+	
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+		  c0 = __PKHBT(a, b, 16);
+	
+#else
+	
+		  c0 = __PKHBT(b, a, 16);
+	
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+		py += 2u;
+
+        /* Read x[7], x[8], x[9] */
+	  	a = *px;
+	  	b = *(px + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+	  	x3 = __PKHBT(a, b, 16);
+	  	a = *(px + 2);
+	  	x2 = __PKHBT(b, a, 16);
+
+#else
+
+	  	x3 = __PKHBT(b, a, 16);
+	  	a = *(px + 2);
+	  	x2 = __PKHBT(a, b, 16);
+
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLAD(x0, c0, acc0);
+        acc1 = __SMLAD(x1, c0, acc1);
+        acc2 = __SMLAD(x3, c0, acc2);
+        acc3 = __SMLAD(x2, c0, acc3);
+
+        c0 = (*py);
+        /* Read y[6] */
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+#else
+
+        c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+
+        /* Read x[10] */
+		b = *(px + 3);
+	
+#ifndef ARM_MATH_BIG_ENDIAN
+	
+		x3 = __PKHBT(a, b, 16);
+	
+#else
+	
+		x3 = __PKHBT(b, a, 16);
+	
+#endif	/*	#ifndef ARM_MATH_BIG_ENDIAN	*/
+
+		px += 3u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLADX(x1, c0, acc0);
+        acc1 = __SMLAD(x2, c0, acc1);
+        acc2 = __SMLADX(x2, c0, acc2);
+        acc3 = __SMLADX(x3, c0, acc3);
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (acc0 >> 15);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      *pOut = (q15_t) (acc1 >> 15);
+      pOut += inc;
+
+      *pOut = (q15_t) (acc2 >> 15);
+      pOut += inc;
+
+      *pOut = (q15_t) (acc3 >> 15);
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (sum >> 15);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,   
+     * the blockSize2 loop is not unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over srcBLen */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += ((q31_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (sum >> 15);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+  /* --------------------------   
+   * Initializations of stage3   
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[0] + x[srcALen-srcBLen+2] * y[1] +...+ x[srcALen-1] * y[srcBLen-2]   
+   * sum += x[srcALen-srcBLen+2] * y[0] + x[srcALen-srcBLen+3] * y[1] +...+ x[srcALen-1] * y[srcBLen-3]   
+   * ....   
+   * sum +=  x[srcALen-2] * y[0] + x[srcALen-1] * y[1]   
+   * sum +=  x[srcALen-1] * y[0]   
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = srcBLen - 1u;
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* -------------------   
+   * Stage3 process   
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+        sum += ((q31_t) * px++ * *py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+        sum += ((q31_t) * px++ * *py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q15_t) (sum >> 15);
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the MAC count */
+    count--;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#endif /*   #ifndef UNALIGNED_SUPPORT_DISABLE */
+
+}
+
+/**   
+ * @} end of Corr group   
+ */
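
The stage 2 loop above packs pairs of 16-bit samples into one 32-bit word with `__PKHBT` and then issues dual multiply-accumulates with `__SMLAD`. The following is a minimal portable sketch of that idea, assuming the little-endian packing order (the `#ifndef ARM_MATH_BIG_ENDIAN` branch). The helpers `pack_q15_pair`, `dual_mac`, and `corr_pair` are hypothetical names that only emulate the intrinsics; they are not part of CMSIS.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates __PKHBT(lo, hi, 16): low halfword from lo, high halfword from hi. */
static uint32_t pack_q15_pair(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Emulates __SMLAD(x, y, acc): acc + x.lo * y.lo + x.hi * y.hi.
 * Assumption: the running sum stays inside int32_t (the real instruction wraps). */
static int32_t dual_mac(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t xl = (int16_t)(x & 0xFFFFu), xh = (int16_t)(x >> 16);
    int16_t yl = (int16_t)(y & 0xFFFFu), yh = (int16_t)(y >> 16);
    return acc + (int32_t)xl * yl + (int32_t)xh * yh;
}

/* Two neighbouring correlation outputs that share one packed coefficient word,
 * mirroring acc0/acc1 in the unrolled loop.
 * n must be even, y needs n samples, and x needs n + 1 samples. */
static void corr_pair(const int16_t *x, const int16_t *y, uint32_t n,
                      int32_t *out0, int32_t *out1)
{
    int32_t acc0 = 0, acc1 = 0;
    assert(n % 2u == 0u);
    for (uint32_t i = 0; i < n; i += 2u) {
        uint32_t c0 = pack_q15_pair(y[i], y[i + 1]);
        /* acc0 += x[i] * y[i] + x[i+1] * y[i+1] */
        acc0 = dual_mac(pack_q15_pair(x[i], x[i + 1]), c0, acc0);
        /* acc1 += x[i+1] * y[i] + x[i+2] * y[i+1] */
        acc1 = dual_mac(pack_q15_pair(x[i + 1], x[i + 2]), c0, acc1);
    }
    *out0 = acc0;
    *out1 = acc1;
}
```

Each packed coefficient word is reused for both accumulators, which is why the unrolled loop reads the B samples only once per pair of MACs.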

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_fast_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,612 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_correlate_fast_q31.c    
+*    
+* Description:	Fast Q31 Correlation.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Corr    
+ * @{    
+ */
+
+/**    
+ * @brief Correlation of Q31 sequences (fast version) for Cortex-M3 and Cortex-M4.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * This function is optimized for speed at the expense of fixed-point precision and overflow protection.    
+ * The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format.    
+ * These intermediate results are accumulated in a 32-bit register in 2.30 format.    
+ * Finally, the accumulator is shifted left by one bit to convert it to a 1.31 result; the code below applies no saturation.    
+ *    
+ * \par    
+ * The fast version has the same overflow behavior as the standard version but provides less precision since it discards the low 32 bits of each multiplication result.    
+ * To avoid overflows completely, the input signals must be scaled down.    
+ * Scale down one of the inputs by 1/min(srcALen, srcBLen),    
+ * since a maximum of min(srcALen, srcBLen) additions    
+ * are carried out internally.    
+ *    
+ * \par    
+ * See <code>arm_correlate_q31()</code> for a slower implementation of this function which uses 64-bit accumulation to provide higher precision.    
+ */
+
+void arm_correlate_fast_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst)
+{
+  q31_t *pIn1;                                   /* inputA pointer               */
+  q31_t *pIn2;                                   /* inputB pointer               */
+  q31_t *pOut = pDst;                            /* output pointer               */
+  q31_t *px;                                     /* Intermediate inputA pointer  */
+  q31_t *py;                                     /* Intermediate inputB pointer  */
+  q31_t *pSrc1;                                  /* Intermediate pointers        */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulators                  */
+  q31_t x0, x1, x2, x3, c0;                      /* temporary variables for holding input and coefficient values */
+  uint32_t j, k = 0u, count, blkCnt, outBlockSize, blockSize1, blockSize2, blockSize3;  /* loop counter                 */
+  int32_t inc = 1;                               /* Destination address modifier */
+
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB    
+     * to make their lengths equal.    
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))    
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Advance the output pointer past those leading samples */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  /* The function is internally    
+   * divided into three parts according to the number of multiplications that    
+   * take place between inputA samples and inputB samples. In the first part of the    
+   * algorithm, the multiplications increase by one for every iteration.    
+   * In the second part of the algorithm, srcBLen number of multiplications are done.    
+   * In the third part of the algorithm, the multiplications decrease by one    
+   * for every iteration.*/
+  /* The algorithm is implemented in three stages.    
+   * The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------    
+   * Initializations of stage1    
+   * -------------------------*/
+
+  /* sum = x[0] * y[srcBLen - 1]    
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 1]    
+   * ....    
+   * sum = x[0] * y[1] + x[1] * y[2] +...+ x[srcBLen - 2] * y[srcBLen - 1]    
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc1 = pIn2 + (srcBLen - 1u);
+  py = pSrc1;
+
+  /* ------------------------    
+   * Stage1 process    
+   * ----------------------*/
+
+  /* The first stage starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[srcBLen - 4] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+      /* x[1] * y[srcBLen - 3] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+      /* x[2] * y[srcBLen - 2] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+      /* x[3] * y[srcBLen - 1] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* x[0] * y[srcBLen - 1] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = sum << 1;
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pSrc1 - count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------    
+   * Initializations of stage2    
+   * ------------------------*/
+
+  /* sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen-1] * y[srcBLen-1]    
+   * sum = x[1] * y[0] + x[2] * y[1] +...+ x[srcBLen] * y[srcBLen-1]    
+   * ....    
+   * sum = x[srcALen-srcBLen] * y[0] + x[srcALen-srcBLen+1] * y[1] +...+ x[srcALen-1] * y[srcBLen-1]    
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------    
+   * Stage2 process    
+   * ------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.    
+   * So, to loop unroll over blockSize2,    
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* read x[0], x[1], x[2] samples */
+      x0 = *(px++);
+      x1 = *(px++);
+      x2 = *(px++);
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read y[0] sample */
+        c0 = *(py++);
+
+        /* Read x[3] sample */
+        x3 = *(px++);
+
+        /* Perform the multiply-accumulate */
+        /* acc0 +=  x[0] * y[0] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+        /* acc1 +=  x[1] * y[0] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+        /* acc2 +=  x[2] * y[0] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x2 * c0)) >> 32);
+        /* acc3 +=  x[3] * y[0] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x3 * c0)) >> 32);
+
+        /* Read y[1] sample */
+        c0 = *(py++);
+
+        /* Read x[4] sample */
+        x0 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[1] * y[1] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x1 * c0)) >> 32);
+        /* acc1 +=  x[2] * y[1] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x2 * c0)) >> 32);
+        /* acc2 +=  x[3] * y[1] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x3 * c0)) >> 32);
+        /* acc3 +=  x[4] * y[1] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+        /* Read y[2] sample */
+        c0 = *(py++);
+
+        /* Read x[5] sample */
+        x1 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[2] * y[2] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x2 * c0)) >> 32);
+        /* acc1 +=  x[3] * y[2] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x3 * c0)) >> 32);
+        /* acc2 +=  x[4] * y[2] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x0 * c0)) >> 32);
+        /* acc3 +=  x[5] * y[2] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+        /* Read y[3] sample */
+        c0 = *(py++);
+
+        /* Read x[6] sample */
+        x2 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[3] * y[3] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x3 * c0)) >> 32);
+        /* acc1 +=  x[4] * y[3] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x0 * c0)) >> 32);
+        /* acc2 +=  x[5] * y[3] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x1 * c0)) >> 32);
+        /* acc3 +=  x[6] * y[3] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x2 * c0)) >> 32);
+
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Read y[4] sample */
+        c0 = *(py++);
+
+        /* Read x[7] sample */
+        x3 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[4] * y[4] */
+        acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+        /* acc1 +=  x[5] * y[4] */
+        acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+        /* acc2 +=  x[6] * y[4] */
+        acc2 = (q31_t) ((((q63_t) acc2 << 32) + ((q63_t) x2 * c0)) >> 32);
+        /* acc3 +=  x[7] * y[4] */
+        acc3 = (q31_t) ((((q63_t) acc3 << 32) + ((q63_t) x3 * c0)) >> 32);
+
+        /* Reuse the present samples for the next MAC */
+        x0 = x1;
+        x1 = x2;
+        x2 = x3;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q31_t) (acc0 << 1);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      *pOut = (q31_t) (acc1 << 1);
+      pOut += inc;
+
+      *pOut = (q31_t) (acc2 << 1);
+      pOut += inc;
+
+      *pOut = (q31_t) (acc3 << 1);
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 4 */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py++))) >> 32);
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py++))) >> 32);
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py++))) >> 32);
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py++))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py++))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = sum << 1;
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,    
+     * the blockSize2 loop is not unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over srcBLen */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum = (q31_t) ((((q63_t) sum << 32) +
+                        ((q63_t) * px++ * (*py++))) >> 32);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = sum << 1;
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+  /* --------------------------    
+   * Initializations of stage3    
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[0] + x[srcALen-srcBLen+2] * y[1] +...+ x[srcALen-1] * y[srcBLen-2]    
+   * sum += x[srcALen-srcBLen+2] * y[0] + x[srcALen-srcBLen+3] * y[1] +...+ x[srcALen-1] * y[srcBLen-3]    
+   * ....    
+   * sum +=  x[srcALen-2] * y[0] + x[srcALen-1] * y[1]    
+   * sum +=  x[srcALen-1] * y[0]    
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = srcBLen - 1u;
+
+  /* Working pointer of inputA */
+  pSrc1 = ((pIn1 + srcALen) - srcBLen) + 1u;
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* -------------------    
+   * Stage3 process    
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum += x[srcALen - srcBLen + 1] * y[0] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+      /* sum += x[srcALen - srcBLen + 2] * y[1] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+      /* sum += x[srcALen - srcBLen + 3] * y[2] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+      /* sum += x[srcALen - srcBLen + 4] * y[3] */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum = (q31_t) ((((q63_t) sum << 32) +
+                      ((q63_t) * px++ * (*py++))) >> 32);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = sum << 1;
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the MAC count */
+    count--;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+}
+
+/**    
+ * @} end of Corr group    
+ */
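
The fast Q31 routine above keeps only the upper 32 bits of every 64-bit product and shifts the final sum left by one bit. The sketch below illustrates what that costs in precision compared with a 64-bit accumulator. The names `mac_high_word`, `dot_fast`, and `dot_wide` are hypothetical and exist only for this illustration. The sketch also assumes non-negative inputs and sums that stay in range, so that every shift is well defined in portable C.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q31_t;
typedef int64_t q63_t;

/* One MAC step as in the fast routine: ((acc << 32) + x * y) >> 32,
 * which equals acc plus the high 32 bits of the product. */
static q31_t mac_high_word(q31_t acc, q31_t x, q31_t y)
{
    assert(acc >= 0 && x >= 0 && y >= 0);   /* sketch-only restriction */
    return (q31_t)((((q63_t)acc << 32) + ((q63_t)x * y)) >> 32);
}

/* Dot product the fast way: truncate every product, then shift to 1.31. */
static q31_t dot_fast(const q31_t *x, const q31_t *y, uint32_t n)
{
    q31_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum = mac_high_word(sum, x[i], y[i]);
    }
    assert(sum < 0x40000000);               /* keep the final shift in range */
    return (q31_t)(sum << 1);
}

/* Same dot product with a 64-bit accumulator and a single final shift. */
static q31_t dot_wide(const q31_t *x, const q31_t *y, uint32_t n)
{
    q63_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += (q63_t)x[i] * y[i];
    }
    return (q31_t)(sum >> 31);
}
```

For large operands the two agree: 0x40000000 times 0x40000000 gives 0x20000000 either way. For four products of 0x8000 times 0x8000, each product's high word is zero, so `dot_fast` returns 0 while `dot_wide` returns 2.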

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_opt_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_opt_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,513 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_correlate_opt_q15.c    
+*    
+* Description:	Correlation of Q15 sequences.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Corr    
+ * @{    
+ */
+
+/**    
+ * @brief Correlation of Q15 sequences.  
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.    
+ * @param[in]  *pScratch points to scratch buffer of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.    
+ * @return none.    
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output and scratch buffers should be 32-bit aligned.
+ *     
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * Both inputs are in 1.15 format and multiplications yield a 2.30 result.    
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.    
+ * This approach provides 33 guard bits and there is no risk of overflow.    
+ * The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.    
+ *    
+ * \par    
+ * Refer to <code>arm_correlate_fast_q15()</code> for a faster but less precise version of this function for Cortex-M3 and Cortex-M4.   
+ *  
+ * 
+ */
+
+
+void arm_correlate_opt_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst,
+  q15_t * pScratch)
+{
+  q15_t *pIn1;                                   /* inputA pointer               */
+  q15_t *pIn2;                                   /* inputB pointer               */
+  q63_t acc0, acc1, acc2, acc3;                  /* Accumulators                  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q31_t x1, x2, x3;                              /* temporary variables for holding input1 and input2 values */
+  uint32_t j, blkCnt, outBlockSize;              /* loop counter                 */
+  int32_t inc = 1;                               /* output pointer increment     */
+  uint32_t tapCnt;
+  q31_t y1, y2;
+  q15_t *pScr;                                   /* Intermediate pointers        */
+  q15_t *pOut = pDst;                            /* output pointer               */
+#ifdef UNALIGNED_SUPPORT_DISABLE
+
+  q15_t a, b;
+
+#endif	/*	#ifdef UNALIGNED_SUPPORT_DISABLE	*/
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc is set to -1 */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
+  /* But to improve performance,
+   * zeroes are included in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen,
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
+  /* If srcALen < srcBLen,
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB        
+     * to make their lengths equal.        
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))        
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Updating the pointer position to non zero value */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  pScr = pScratch;
+
+  /* Fill (srcBLen - 1u) zeros in scratch buffer */
+  arm_fill_q15(0, pScr, (srcBLen - 1u));
+
+  /* Update temporary scratch pointer */
+  pScr += (srcBLen - 1u);
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Copy (srcALen) samples in scratch buffer */
+  arm_copy_q15(pIn1, pScr, srcALen);
+
+  /* Update pointers */
+  //pIn1 += srcALen;    
+  pScr += srcALen;
+
+#else
+
+  /* Apply loop unrolling and do 4 Copies simultaneously. */
+  j = srcALen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(j > 0u)
+  {
+    /* Copy four samples of the longer sequence into the scratch buffer */
+    *pScr++ = *pIn1++;
+    *pScr++ = *pIn1++;
+    *pScr++ = *pIn1++;
+    *pScr++ = *pIn1++;
+
+    /* Decrement the loop counter */
+    j--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  j = srcALen % 0x4u;
+
+  while(j > 0u)
+  {
+    /* Copy the remaining samples of the longer sequence */
+    *pScr++ = *pIn1++;
+
+    /* Decrement the loop counter */
+    j--;
+  }
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+  arm_fill_q15(0, pScr, (srcBLen - 1u));
+
+  /* Update pointer */
+  pScr += (srcBLen - 1u);
+
+#else
+
+  /* Apply loop unrolling and write 4 zeros per iteration. */
+  j = (srcBLen - 1u) >> 2u;
+
+  /* First part of the processing with loop unrolling writes 4 zeros at a time.
+   ** A second loop below writes the remaining 1 to 3 zeros. */
+  while(j > 0u)
+  {
+    /* Zero four scratch entries */
+    *pScr++ = 0;
+    *pScr++ = 0;
+    *pScr++ = 0;
+    *pScr++ = 0;
+
+    /* Decrement the loop counter */
+    j--;
+  }
+
+  /* If the count is not a multiple of 4, write the remaining zeros here.
+   ** No loop unrolling is used. */
+  j = (srcBLen - 1u) % 0x4u;
+
+  while(j > 0u)
+  {
+    /* Zero one remaining scratch entry */
+    *pScr++ = 0;
+
+    /* Decrement the loop counter */
+    j--;
+  }
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+  /* Save the start of the shorter sequence */
+  py = pIn2;
+
+
+  /* Actual correlation process starts here */
+  blkCnt = (srcALen + srcBLen - 1u) >> 2;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer to the current scratch position */
+    pScr = pScratch;
+
+    /* Clear accumulators */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Read two samples from scratch buffer */
+    x1 = *__SIMD32(pScr)++;
+
+    /* Read next two samples from scratch buffer */
+    x2 = *__SIMD32(pScr)++;
+
+    tapCnt = (srcBLen) >> 2u;
+
+    while(tapCnt > 0u)
+    {
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+      /* Read four samples from smaller buffer */
+      y1 = _SIMD32_OFFSET(pIn2);
+      y2 = _SIMD32_OFFSET(pIn2 + 2u);
+
+      acc0 = __SMLALD(x1, y1, acc0);
+
+      acc2 = __SMLALD(x2, y1, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc1 = __SMLALDX(x3, y1, acc1);
+
+      x1 = _SIMD32_OFFSET(pScr);
+
+      acc0 = __SMLALD(x2, y2, acc0);
+
+      acc2 = __SMLALD(x1, y2, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLALDX(x3, y1, acc3);
+
+      acc1 = __SMLALDX(x3, y2, acc1);
+
+      x2 = _SIMD32_OFFSET(pScr + 2u);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLALDX(x3, y2, acc3);
+
+#else	 
+
+      /* Read four samples from smaller buffer */
+	  a = *pIn2;
+	  b = *(pIn2 + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      y1 = __PKHBT(a, b, 16);
+#else
+      y1 = __PKHBT(b, a, 16);
+#endif
+	  
+	  a = *(pIn2 + 2);
+	  b = *(pIn2 + 3);
+#ifndef ARM_MATH_BIG_ENDIAN
+      y2 = __PKHBT(a, b, 16);
+#else
+      y2 = __PKHBT(b, a, 16);
+#endif				
+
+      acc0 = __SMLALD(x1, y1, acc0);
+
+      acc2 = __SMLALD(x2, y1, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc1 = __SMLALDX(x3, y1, acc1);
+
+	  a = *pScr;
+	  b = *(pScr + 1);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(a, b, 16);
+#else
+      x1 = __PKHBT(b, a, 16);
+#endif
+
+      acc0 = __SMLALD(x2, y2, acc0);
+
+      acc2 = __SMLALD(x1, y2, acc2);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLALDX(x3, y1, acc3);
+
+      acc1 = __SMLALDX(x3, y2, acc1);
+
+	  a = *(pScr + 2);
+	  b = *(pScr + 3);
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x2 = __PKHBT(a, b, 16);
+#else
+      x2 = __PKHBT(b, a, 16);
+#endif
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLALDX(x3, y2, acc3);
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+      pIn2 += 4u;
+
+      pScr += 4u;
+
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+
+
+    /* Update scratch pointer for remaining samples of smaller length sequence */
+    pScr -= 4u;
+
+
+    /* apply same above for remaining samples of smaller length sequence */
+    tapCnt = (srcBLen) & 3u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr++ * *pIn2);
+      acc1 += (*pScr++ * *pIn2);
+      acc2 += (*pScr++ * *pIn2);
+      acc3 += (*pScr++ * *pIn2++);
+
+      pScr -= 3u;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+
+    /* Store the results in the accumulators in the destination buffer. */
+    *pOut = (__SSAT(acc0 >> 15u, 16));
+    pOut += inc;
+    *pOut = (__SSAT(acc1 >> 15u, 16));
+    pOut += inc;
+    *pOut = (__SSAT(acc2 >> 15u, 16));
+    pOut += inc;
+    *pOut = (__SSAT(acc3 >> 15u, 16));
+    pOut += inc;
+
+    /* Initialization of inputB pointer */
+    pIn2 = py;
+
+    pScratch += 4u;
+
+  }
+
+
+  blkCnt = (srcALen + srcBLen - 1u) & 0x3;
+
+  /* Calculate correlation for remaining samples of Bigger length sequence */
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer to the current scratch position */
+    pScr = pScratch;
+
+    /* Clear accumulator */
+    acc0 = 0;
+
+    tapCnt = (srcBLen) >> 1u;
+
+    while(tapCnt > 0u)
+    {
+
+      acc0 += (*pScr++ * *pIn2++);
+      acc0 += (*pScr++ * *pIn2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    tapCnt = (srcBLen) & 1u;
+
+    /* apply same above for remaining samples of smaller length sequence */
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr++ * *pIn2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+    pOut += inc;
+
+    /* Initialization of inputB pointer */
+    pIn2 = py;
+
+    pScratch += 1u;
+
+  }
+
+
+}
+
+/**    
+ * @} end of Corr group    
+ */

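Reviewer note on arm_correlate_opt_q15.c: the sketch below is a plain C reference for the output layout and scaling that the comments above describe (shorter sequence slides across the longer one, 64-bit accumulation, shift right by 15, saturate to 16 bits, reversed write order when srcBLen > srcALen). It is not the library code. The names corr_q15_ref, sat_q15 and floor_shift are hypothetical, and unlike the code in the diff, which only steps the output pointer past the padding positions, this sketch zero-fills the whole destination first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(v / 2^n), equivalent to an arithmetic right
 * shift but written without implementation-defined behaviour. */
static int64_t floor_shift(int64_t v, unsigned n)
{
    int64_t round = ((int64_t)1 << n) - 1;
    return (v >= 0) ? (v >> n) : -((-v + round) >> n);
}

/* Hypothetical helper: saturate to the signed 16-bit (1.15) range. */
static int16_t sat_q15(int64_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Reference correlation; dst must hold 2*max(lenA, lenB) - 1 entries. */
void corr_q15_ref(const int16_t *a, uint32_t lenA,
                  const int16_t *b, uint32_t lenB, int16_t *dst)
{
    uint32_t maxLen = (lenA >= lenB) ? lenA : lenB;
    uint32_t outLen = 2u * maxLen - 1u;
    const int16_t *x = a, *y = b;      /* x: longer, y: shorter sequence */
    uint32_t lx = lenA, ly = lenB, i, k;
    int32_t pos, inc = 1;

    assert(lenA > 0u && lenB > 0u);

    /* Assumption of this sketch: padding positions are written as zero. */
    for (i = 0; i < outLen; i++) dst[i] = 0;

    if (lenA >= lenB) {
        pos = (int32_t)(lenA - lenB);  /* skip the leading padding */
    } else {
        x = b; y = a; lx = lenB; ly = lenA;
        pos = (int32_t)(lx + ly - 2u); /* last computed sample */
        inc = -1;                      /* write in reverse order */
    }

    for (i = 0; i < lx + ly - 1u; i++) {
        int64_t acc = 0;               /* 34.30 accumulator */
        for (k = 0; k < ly; k++) {
            /* Index into x as if it were padded with ly-1 zeros per side */
            int64_t idx = (int64_t)i + (int64_t)k - (int64_t)(ly - 1u);
            if (idx >= 0 && idx < (int64_t)lx)
                acc += (int64_t)x[idx] * (int64_t)y[k];
        }
        dst[pos] = sat_q15(floor_shift(acc, 15));
        pos += inc;
    }
}
```

With a = {0x4000, 0x2000, 0x1000} and b = {0x4000, 0x4000}, this fills a five-entry destination with {0, 8192, 12288, 6144, 2048}; swapping the arguments gives the reversed sequence followed by a trailing zero. A reference of this kind can be useful for checking the optimized routine on a host build.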
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_opt_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_opt_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,464 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_correlate_opt_q7.c    
+*    
+* Description:	Correlation of Q7 sequences.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Corr    
+ * @{    
+ */
+
+/**    
+ * @brief Correlation of Q7 sequences.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.    
+ * @param[in]  *pScratch1 points to scratch buffer(of type q15_t) of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.    
+ * @param[in]  *pScratch2 points to scratch buffer (of type q15_t) of size min(srcALen, srcBLen).    
+ * @return none.    
+ *    
+ *    
+ * \par Restrictions    
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output, scratch1 and scratch2 buffers should be 32-bit aligned.
+ *        
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 32-bit internal accumulator.    
+ * Both the inputs are represented in 1.7 format and multiplications yield a 2.14 result.    
+ * The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format.    
+ * This approach provides 17 guard bits and there is no risk of overflow as long as <code>max(srcALen, srcBLen)<131072</code>.    
+ * The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and saturated to 1.7 format.  
+ *  
+ * 
+ */
+
+
+
+void arm_correlate_opt_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst,
+  q15_t * pScratch1,
+  q15_t * pScratch2)
+{
+  q7_t *pOut = pDst;                             /* output pointer                */
+  q15_t *pScr1 = pScratch1;                      /* Temporary pointer for scratch */
+  q15_t *pScr2 = pScratch2;                      /* Temporary pointer for scratch */
+  q7_t *pIn1;                                    /* inputA pointer                */
+  q7_t *pIn2;                                    /* inputB pointer                */
+  q15_t *py;                                     /* Intermediate inputB pointer   */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulators                  */
+  uint32_t j, k = 0u, blkCnt;                    /* loop counter                  */
+  int32_t inc = 1;                               /* output pointer increment          */
+  uint32_t outBlockSize;                         /* loop counter                  */
+  q15_t x4;                                      /* Temporary input variable      */
+  uint32_t tapCnt;                               /* loop counter                  */
+  q31_t x1, x2, x3, y1;                          /* Temporary input variables     */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc is set to -1 */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
+  /* But to improve performance,
+   * zeroes are included in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen,
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
+  /* If srcALen < srcBLen,
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB        
+     * to make their lengths equal.        
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))        
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Updating the pointer position to non zero value */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+
+  /* Copy (srcBLen) samples in scratch buffer */
+  k = srcBLen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* Copy four samples of the shorter sequence, widened to q15 */
+    x4 = (q15_t) * pIn2++;
+    *pScr2++ = x4;
+    x4 = (q15_t) * pIn2++;
+    *pScr2++ = x4;
+    x4 = (q15_t) * pIn2++;
+    *pScr2++ = x4;
+    x4 = (q15_t) * pIn2++;
+    *pScr2++ = x4;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = srcBLen % 0x4u;
+
+  while(k > 0u)
+  {
+    /* Copy the remaining samples of the shorter sequence, widened to q15 */
+    x4 = (q15_t) * pIn2++;
+    *pScr2++ = x4;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* Fill (srcBLen - 1u) zeros in scratch buffer */
+  arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+  /* Update temporary scratch pointer */
+  pScr1 += (srcBLen - 1u);
+
+  /* Copy (srcALen) samples in scratch buffer */
+  k = srcALen >> 2u;
+
+  /* First part of the processing with loop unrolling copies 4 data points at a time.       
+   ** a second loop below copies for the remaining 1 to 3 samples. */
+  while(k > 0u)
+  {
+    /* Copy four samples of the longer sequence, widened to q15 */
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, copy remaining samples here.       
+   ** No loop unrolling is used. */
+  k = srcALen % 0x4u;
+
+  while(k > 0u)
+  {
+    /* Copy the remaining samples of the longer sequence, widened to q15 */
+    x4 = (q15_t) * pIn1++;
+    *pScr1++ = x4;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  /* Fill (srcBLen - 1u) zeros at end of scratch buffer */
+  arm_fill_q15(0, pScr1, (srcBLen - 1u));
+
+  /* Update pointer */
+  pScr1 += (srcBLen - 1u);
+
+#else
+
+  /* Apply loop unrolling and write 4 zeros per iteration. */
+  k = (srcBLen - 1u) >> 2u;
+
+  /* First part of the processing with loop unrolling writes 4 zeros at a time.
+   ** A second loop below writes the remaining 1 to 3 zeros. */
+  while(k > 0u)
+  {
+    /* Zero four scratch entries */
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+    *pScr1++ = 0;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+  /* If the count is not a multiple of 4, write the remaining zeros here.
+   ** No loop unrolling is used. */
+  k = (srcBLen - 1u) % 0x4u;
+
+  while(k > 0u)
+  {
+    /* Zero one remaining scratch entry */
+    *pScr1++ = 0;
+
+    /* Decrement the loop counter */
+    k--;
+  }
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+  /* Temporary pointer for second sequence */
+  py = pScratch2;
+
+  /* Initialization of pScr2 pointer */
+  pScr2 = pScratch2;
+
+  /* Actual correlation process starts here */
+  blkCnt = (srcALen + srcBLen - 1u) >> 2;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer to the current scratch1 position */
+    pScr1 = pScratch1;
+
+    /* Clear accumulators */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Read two samples from scratch1 buffer */
+    x1 = *__SIMD32(pScr1)++;
+
+    /* Read next two samples from scratch1 buffer */
+    x2 = *__SIMD32(pScr1)++;
+
+    tapCnt = (srcBLen) >> 2u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* Read two samples from smaller buffer */
+      y1 = _SIMD32_OFFSET(pScr2);
+
+      /* multiply and accumulate */
+      acc0 = __SMLAD(x1, y1, acc0);
+      acc2 = __SMLAD(x2, y1, acc2);
+
+      /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      /* multiply and accumulate */
+      acc1 = __SMLADX(x3, y1, acc1);
+
+      /* Read next two samples from scratch1 buffer */
+      x1 = *__SIMD32(pScr1)++;
+
+      /* pack input data */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x1, x2, 0);
+#else
+      x3 = __PKHBT(x2, x1, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y1, acc3);
+
+      /* Read next two samples from smaller buffer */
+      y1 = _SIMD32_OFFSET(pScr2 + 2u);
+
+      acc0 = __SMLAD(x2, y1, acc0);
+
+      acc2 = __SMLAD(x1, y1, acc2);
+
+      acc1 = __SMLADX(x3, y1, acc1);
+
+      x2 = *__SIMD32(pScr1)++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+      x3 = __PKHBT(x2, x1, 0);
+#else
+      x3 = __PKHBT(x1, x2, 0);
+#endif
+
+      acc3 = __SMLADX(x3, y1, acc3);
+
+      pScr2 += 4u;
+
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+
+
+    /* Update scratch pointer for remaining samples of smaller length sequence */
+    pScr1 -= 4u;
+
+
+    /* apply same above for remaining samples of smaller length sequence */
+    tapCnt = (srcBLen) & 3u;
+
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr1++ * *pScr2);
+      acc1 += (*pScr1++ * *pScr2);
+      acc2 += (*pScr1++ * *pScr2);
+      acc3 += (*pScr1++ * *pScr2++);
+
+      pScr1 -= 3u;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q7_t) (__SSAT(acc0 >> 7u, 8));
+    pOut += inc;
+    *pOut = (q7_t) (__SSAT(acc1 >> 7u, 8));
+    pOut += inc;
+    *pOut = (q7_t) (__SSAT(acc2 >> 7u, 8));
+    pOut += inc;
+    *pOut = (q7_t) (__SSAT(acc3 >> 7u, 8));
+    pOut += inc;
+
+    /* Initialization of inputB pointer */
+    pScr2 = py;
+
+    pScratch1 += 4u;
+
+  }
+
+
+  blkCnt = (srcALen + srcBLen - 1u) & 0x3;
+
+  /* Calculate correlation for remaining samples of Bigger length sequence */
+  while(blkCnt > 0)
+  {
+    /* Initialize temporary scratch pointer to the current scratch1 position */
+    pScr1 = pScratch1;
+
+    /* Clear accumulator */
+    acc0 = 0;
+
+    tapCnt = (srcBLen) >> 1u;
+
+    while(tapCnt > 0u)
+    {
+      acc0 += (*pScr1++ * *pScr2++);
+      acc0 += (*pScr1++ * *pScr2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    tapCnt = (srcBLen) & 1u;
+
+    /* apply same above for remaining samples of smaller length sequence */
+    while(tapCnt > 0u)
+    {
+
+      /* accumulate the results */
+      acc0 += (*pScr1++ * *pScr2++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    blkCnt--;
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q7_t) (__SSAT(acc0 >> 7u, 8));
+
+    pOut += inc;
+
+    /* Initialization of inputB pointer */
+    pScr2 = py;
+
+    pScratch1 += 1u;
+
+  }
+
+}
+
+/**    
+ * @} end of Corr group    
+ */

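Reviewer note on arm_correlate_opt_q7.c: the buffer sizes and the final scaling step documented above can be sketched as two small helpers. This is an illustration only, assuming the sizes stated in the doc comment (pDst: 2*max - 1, pScratch1: max + 2*min - 2, pScratch2: min) and the store step shown in the code (shift right by 7, saturate to 8 bits). The names corr_q7_sizes_t, corr_q7_sizes and q7_from_acc are hypothetical and are not part of CMSIS-DSP.

```c
#include <stdint.h>

/* Hypothetical container for the element counts the caller must allocate. */
typedef struct {
    uint32_t dstLen;       /* q7_t entries in pDst       */
    uint32_t scratch1Len;  /* q15_t entries in pScratch1 */
    uint32_t scratch2Len;  /* q15_t entries in pScratch2 */
} corr_q7_sizes_t;

/* Compute buffer sizes from the two input lengths (both assumed > 0). */
corr_q7_sizes_t corr_q7_sizes(uint32_t lenA, uint32_t lenB)
{
    uint32_t maxLen = (lenA >= lenB) ? lenA : lenB;
    uint32_t minLen = (lenA >= lenB) ? lenB : lenA;
    corr_q7_sizes_t s;

    s.dstLen      = 2u * maxLen - 1u;
    /* (min - 1) zeros, max samples, (min - 1) zeros */
    s.scratch1Len = maxLen + 2u * minLen - 2u;
    s.scratch2Len = minLen;
    return s;
}

/* Convert an 18.14 accumulator to 1.7: drop the low 7 bits, then saturate.
 * The shift is written as floor division to avoid relying on the
 * implementation-defined right shift of a negative value. */
int8_t q7_from_acc(int32_t acc)
{
    int32_t v = (acc >= 0) ? (acc >> 7) : -((-acc + 127) >> 7);

    if (v > 127)  return 127;
    if (v < -128) return -128;
    return (int8_t)v;
}
```

For example, srcALen = 5 and srcBLen = 3 call for 9 output entries, 9 scratch1 entries and 3 scratch2 entries, and an accumulator holding 64 * 64 (0.5 times 0.5 in 1.7 format) converts to 32, which is 0.25.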
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,719 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_correlate_q15.c   
+*   
+* Description:	Correlation of Q15 sequences. 
+*   
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup Corr   
+ * @{   
+ */
+
+/**   
+ * @brief Correlation of Q15 sequences. 
+ * @param[in] *pSrcA points to the first input sequence.   
+ * @param[in] srcALen length of the first input sequence.   
+ * @param[in] *pSrcB points to the second input sequence.   
+ * @param[in] srcBLen length of the second input sequence.   
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.   
+ * @return none.   
+ *   
+ * @details   
+ * <b>Scaling and Overflow Behavior:</b>   
+ *   
+ * \par   
+ * The function is implemented using a 64-bit internal accumulator.   
+ * Both inputs are in 1.15 format and multiplications yield a 2.30 result.   
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.   
+ * This approach provides 33 guard bits and there is no risk of overflow.   
+ * The 34.30 result is then truncated to 34.15 format by discarding the low 15 bits and then saturated to 1.15 format.   
+ *   
+ * \par   
+ * Refer to <code>arm_correlate_fast_q15()</code> for a faster but less precise version of this function for Cortex-M3 and Cortex-M4. 
+ *
+ * \par    
+ * Refer to the function <code>arm_correlate_opt_q15()</code> for a faster implementation of this function using scratch buffers.
+ * 
+ */
+
+void arm_correlate_q15(
+  q15_t * pSrcA,
+  uint32_t srcALen,
+  q15_t * pSrcB,
+  uint32_t srcBLen,
+  q15_t * pDst)
+{
+
+#if (defined(ARM_MATH_CM4) || defined(ARM_MATH_CM3)) && !defined(UNALIGNED_SUPPORT_DISABLE)
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q15_t *pIn1;                                   /* inputA pointer               */
+  q15_t *pIn2;                                   /* inputB pointer               */
+  q15_t *pOut = pDst;                            /* output pointer               */
+  q63_t sum, acc0, acc1, acc2, acc3;             /* Accumulators                  */
+  q15_t *px;                                     /* Intermediate inputA pointer  */
+  q15_t *py;                                     /* Intermediate inputB pointer  */
+  q15_t *pSrc1;                                  /* Intermediate pointers        */
+  q31_t x0, x1, x2, x3, c0;                      /* temporary variables for holding input and coefficient values */
+  uint32_t j, k = 0u, count, blkCnt, outBlockSize, blockSize1, blockSize2, blockSize3;  /* loop counter                 */
+  int32_t inc = 1;                               /* Destination address modifier */
+
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc is set to -1 */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
+  /* But to improve the performance,
+   * we include zeroes in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen,
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
+  /* If srcALen < srcBLen,
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB   
+     * to make their lengths equal.   
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))   
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Advance the output pointer past the leading zero samples */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  /* The function is internally
+   * divided into three parts according to the number of multiplications that have to
+   * take place between inputA samples and inputB samples. In the first part of the
+   * algorithm, the multiplications increase by one for every iteration.
+   * In the second part of the algorithm, srcBLen number of multiplications are done.
+   * In the third part of the algorithm, the multiplications decrease by one
+   * for every iteration.*/
+  /* The algorithm is implemented in three stages.
+   * The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------   
+   * Initializations of stage1   
+   * -------------------------*/
+
+  /* sum = x[0] * y[srcBLen - 1]
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 1]
+   * ....   
+   * sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen - 1] * y[srcBLen - 1]   
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc1 = pIn2 + (srcBLen - 1u);
+  py = pSrc1;
+
+  /* ------------------------   
+   * Stage1 process   
+   * ----------------------*/
+
+  /* The first loop starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[srcBLen - 4] , x[1] * y[srcBLen - 3] */
+      sum = __SMLALD(*__SIMD32(px)++, *__SIMD32(py)++, sum);
+      /* x[3] * y[srcBLen - 1] , x[2] * y[srcBLen - 2] */
+      sum = __SMLALD(*__SIMD32(px)++, *__SIMD32(py)++, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* x[0] * y[srcBLen - 1] */
+      sum = __SMLALD(*px++, *py++, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q15_t) (__SSAT((sum >> 15), 16));
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pSrc1 - count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------   
+   * Initializations of stage2   
+   * ------------------------*/
+
+  /* sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen-1] * y[srcBLen-1]   
+   * sum = x[1] * y[0] + x[2] * y[1] +...+ x[srcBLen] * y[srcBLen-1]   
+   * ....   
+   * sum = x[srcALen-srcBLen] * y[0] + x[srcALen-srcBLen+1] * y[1] +...+ x[srcALen-1] * y[srcBLen-1]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------   
+   * Stage2 process   
+   * ------------------*/
+
+  /* Stage2 performs srcBLen MACs per output sample.
+   * The loop over blockSize2 is unrolled by 4 only when
+   * srcBLen is at least 4, so that the inner srcBLen loop can be unrolled as well */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* read x[0], x[1] samples */
+      x0 = *__SIMD32(px);
+      /* read x[1], x[2] samples */
+      x1 = _SIMD32_OFFSET(px + 1);
+      px += 2u;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read the first two inputB samples using SIMD:   
+         * y[0] and y[1] */
+        c0 = *__SIMD32(py)++;
+
+        /* acc0 +=  x[0] * y[0] + x[1] * y[1] */
+        acc0 = __SMLALD(x0, c0, acc0);
+
+        /* acc1 +=  x[1] * y[0] + x[2] * y[1] */
+        acc1 = __SMLALD(x1, c0, acc1);
+
+        /* Read x[2], x[3] */
+        x2 = *__SIMD32(px);
+
+        /* Read x[3], x[4] */
+        x3 = _SIMD32_OFFSET(px + 1);
+
+        /* acc2 +=  x[2] * y[0] + x[3] * y[1] */
+        acc2 = __SMLALD(x2, c0, acc2);
+
+        /* acc3 +=  x[3] * y[0] + x[4] * y[1] */
+        acc3 = __SMLALD(x3, c0, acc3);
+
+        /* Read y[2] and y[3] */
+        c0 = *__SIMD32(py)++;
+
+        /* acc0 +=  x[2] * y[2] + x[3] * y[3] */
+        acc0 = __SMLALD(x2, c0, acc0);
+
+        /* acc1 +=  x[3] * y[2] + x[4] * y[3] */
+        acc1 = __SMLALD(x3, c0, acc1);
+
+        /* Read x[4], x[5] */
+        x0 = _SIMD32_OFFSET(px + 2);
+
+        /* Read x[5], x[6] */
+        x1 = _SIMD32_OFFSET(px + 3);
+
+        px += 4u;
+
+        /* acc2 +=  x[4] * y[2] + x[5] * y[3] */
+        acc2 = __SMLALD(x0, c0, acc2);
+
+        /* acc3 +=  x[5] * y[2] + x[6] * y[3] */
+        acc3 = __SMLALD(x1, c0, acc3);
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      if(k == 1u)
+      {
+        /* Read y[4] */
+        c0 = *py;
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+
+#else
+
+        c0 = c0 & 0x0000FFFF;
+
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+        /* Read x[6], x[7] */
+        x3 = *__SIMD32(px);
+        px++;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALD(x0, c0, acc0);
+        acc1 = __SMLALD(x1, c0, acc1);
+        acc2 = __SMLALDX(x1, c0, acc2);
+        acc3 = __SMLALDX(x3, c0, acc3);
+      }
+
+      if(k == 2u)
+      {
+        /* Read y[4], y[5] */
+        c0 = *__SIMD32(py);
+
+        /* Read x[6], x[7] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[7], x[8] */
+        x2 = _SIMD32_OFFSET(px + 1);
+        px += 2u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALD(x0, c0, acc0);
+        acc1 = __SMLALD(x1, c0, acc1);
+        acc2 = __SMLALD(x3, c0, acc2);
+        acc3 = __SMLALD(x2, c0, acc3);
+      }
+
+      if(k == 3u)
+      {
+        /* Read y[4], y[5] */
+        c0 = *__SIMD32(py)++;
+
+        /* Read x[6], x[7] */
+        x3 = *__SIMD32(px);
+
+        /* Read x[7], x[8] */
+        x2 = _SIMD32_OFFSET(px + 1);
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALD(x0, c0, acc0);
+        acc1 = __SMLALD(x1, c0, acc1);
+        acc2 = __SMLALD(x3, c0, acc2);
+        acc3 = __SMLALD(x2, c0, acc3);
+
+        /* Read y[6] */
+        c0 = (*py);
+
+#ifdef  ARM_MATH_BIG_ENDIAN
+
+        c0 = c0 << 16u;
+#else
+
+        c0 = c0 & 0x0000FFFF;
+#endif /*      #ifdef  ARM_MATH_BIG_ENDIAN     */
+        /* Read x[8], x[9] */
+        x3 = _SIMD32_OFFSET(px + 2);
+        px += 3u;
+
+        /* Perform the multiply-accumulates */
+        acc0 = __SMLALDX(x1, c0, acc0);
+        acc1 = __SMLALD(x2, c0, acc1);
+        acc2 = __SMLALDX(x2, c0, acc2);
+        acc3 = __SMLALDX(x3, c0, acc3);
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (__SSAT(acc0 >> 15, 16));
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      *pOut = (q15_t) (__SSAT(acc1 >> 15, 16));
+      pOut += inc;
+
+      *pOut = (q15_t) (__SSAT(acc2 >> 15, 16));
+      pOut += inc;
+
+      *pOut = (q15_t) (__SSAT(acc3 >> 15, 16));
+      pOut += inc;
+
+      /* Increment the count by 4 as 4 output values are computed */
+      count += 4u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q63_t) * px++ * *py++);
+        sum += ((q63_t) * px++ * *py++);
+        sum += ((q63_t) * px++ * *py++);
+        sum += ((q63_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q63_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (__SSAT(sum >> 15, 16));
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment count by 1, as one output value is computed */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If srcBLen is less than 4,
+     * the blockSize2 loop is not unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over srcBLen */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += ((q63_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q15_t) (__SSAT(sum >> 15, 16));
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+  /* --------------------------   
+   * Initializations of stage3   
+   * -------------------------*/
+
+  /* sum = x[srcALen-srcBLen+1] * y[0] + x[srcALen-srcBLen+2] * y[1] +...+ x[srcALen-1] * y[srcBLen-2]
+   * sum = x[srcALen-srcBLen+2] * y[0] + x[srcALen-srcBLen+3] * y[1] +...+ x[srcALen-1] * y[srcBLen-3]
+   * ....
+   * sum = x[srcALen-2] * y[0] + x[srcALen-1] * y[1]
+   * sum = x[srcALen-1] * y[0]
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = srcBLen - 1u;
+
+  /* Working pointer of inputA */
+  pSrc1 = (pIn1 + srcALen) - (srcBLen - 1u);
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* -------------------   
+   * Stage3 process   
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum += x[srcALen - srcBLen + 1] * y[0] , sum += x[srcALen - srcBLen + 2] * y[1] */
+      sum = __SMLALD(*__SIMD32(px)++, *__SIMD32(py)++, sum);
+      /* sum += x[srcALen - srcBLen + 3] * y[2] , sum += x[srcALen - srcBLen + 4] * y[3] */
+      sum = __SMLALD(*__SIMD32(px)++, *__SIMD32(py)++, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum = __SMLALD(*px++, *py++, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q15_t) (__SSAT((sum >> 15), 16));
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the MAC count */
+    count--;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+/* Run the below code for all other configurations (for example Cortex-M0, or when UNALIGNED_SUPPORT_DISABLE is defined) */
+
+  q15_t *pIn1 = pSrcA;                           /* inputA pointer               */
+  q15_t *pIn2 = pSrcB + (srcBLen - 1u);          /* inputB pointer               */
+  q63_t sum;                                     /* Accumulators                  */
+  uint32_t i = 0u, j;                            /* loop counters */
+  uint32_t inv = 0u;                             /* Reverse order flag */
+  uint32_t tot = 0u;                             /* Length */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and a variable, inv is set to 1 */
+  /* If lengths are not equal then zero pad has to be done to make the two
+   * inputs of same length. But to improve the performance, we include zeroes
+   * in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen, (srcALen - srcBLen) zeroes have to be included at the
+   * start of the output buffer */
+  /* If srcALen < srcBLen, (srcBLen - srcALen) zeroes have to be included at the
+   * end of the output buffer */
+  /* Once the zero padding is done, the remainder of the output is calculated
+   * using convolution but with the shorter signal time shifted. */
+
+  /* Calculate the length of the remaining sequence */
+  tot = ((srcALen + srcBLen) - 2u);
+
+  if(srcALen > srcBLen)
+  {
+    /* Calculating the number of zeros to be padded to the output */
+    j = srcALen - srcBLen;
+
+    /* Initialise the pointer after zero padding */
+    pDst += j;
+  }
+
+  else if(srcALen < srcBLen)
+  {
+    /* Initialization to inputB pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization to the end of inputA pointer */
+    pIn2 = pSrcA + (srcALen - 1u);
+
+    /* Initialisation of the pointer after zero padding */
+    pDst = pDst + tot;
+
+    /* Swapping the lengths */
+    j = srcALen;
+    srcALen = srcBLen;
+    srcBLen = j;
+
+    /* Setting the reverse flag */
+    inv = 1;
+
+  }
+
+  /* Loop to calculate convolution for output length number of times */
+  for (i = 0u; i <= tot; i++)
+  {
+    /* Initialize sum with zero to carry on MAC operations */
+    sum = 0;
+
+    /* Loop to perform MAC operations according to convolution equation */
+    for (j = 0u; j <= i; j++)
+    {
+      /* Check the array limitations */
+      if((((i - j) < srcBLen) && (j < srcALen)))
+      {
+        /* z[i] += x[i-j] * y[j] */
+        sum += ((q31_t) pIn1[j] * pIn2[-((int32_t) i - j)]);
+      }
+    }
+    /* Store the output in the destination buffer */
+    if(inv == 1)
+      *pDst-- = (q15_t) __SSAT((sum >> 15u), 16u);
+    else
+      *pDst++ = (q15_t) __SSAT((sum >> 15u), 16u);
+  }
+
+#endif /*#if (defined(ARM_MATH_CM4) || defined(ARM_MATH_CM3)) && !defined(UNALIGNED_SUPPORT_DISABLE) */
+
+}
+
+/**   
+ * @} end of Corr group   
+ */
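
The scaling rules documented for arm_correlate_q15() (1.15 inputs, 2.30 products, a 64-bit accumulator, a 15-bit right shift, then saturation to 1.15) can be sketched in portable C as below. This is only an illustrative reference, not the library's implementation: the names acc_to_q15 and ref_correlate_q15 are hypothetical, the sketch handles only the case srcALen >= srcBLen, and it writes the leading zero samples itself instead of leaving them to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: narrow a 34.30 accumulator to a saturated 1.15 value. */
static int16_t acc_to_q15(int64_t acc)
{
    /* Drop the low 15 fractional bits. Right-shifting a negative value is
     * implementation-defined in C; this sketch assumes an arithmetic shift. */
    int64_t v = acc >> 15;

    if (v > 32767)  return 32767;    /* clamp to the largest 1.15 value  */
    if (v < -32768) return -32768;   /* clamp to the smallest 1.15 value */
    return (int16_t)v;
}

/* Hypothetical reference correlation for a_len >= b_len >= 1.
 * Writes 2 * a_len - 1 samples to dst; b is treated as zero outside
 * [0, b_len), so the first (a_len - b_len) outputs come out as zero. */
static void ref_correlate_q15(const int16_t *a, uint32_t a_len,
                              const int16_t *b, uint32_t b_len,
                              int16_t *dst)
{
    assert(b_len >= 1u && a_len >= b_len);

    for (uint32_t n = 0; n < 2u * a_len - 1u; n++) {
        int64_t acc = 0;                           /* 34.30 accumulator */

        for (uint32_t k = 0; k < a_len; k++) {
            /* Index into b for output sample n: k + (a_len - 1) - n */
            int64_t idx = (int64_t)k + (int64_t)(a_len - 1u) - (int64_t)n;

            if (idx >= 0 && idx < (int64_t)b_len) {
                acc += (int64_t)a[k] * b[idx];     /* 1.15 * 1.15 -> 2.30 */
            }
        }
        dst[n] = acc_to_q15(acc);
    }
}
```

With a = {16384, 16384, 16384} and b = {16384, -32768}, this sketch yields {0, -16384, -8192, -8192, 8192}: one leading zero followed by the srcALen + srcBLen - 1 correlation samples, which matches the stage 1, stage 2 and stage 3 sums described in the comments above.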

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,665 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_correlate_q31.c    
+*    
+* Description:	Correlation of Q31 sequences.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup Corr    
+ * @{    
+ */
+
+/**    
+ * @brief Correlation of Q31 sequences.    
+ * @param[in] *pSrcA points to the first input sequence.    
+ * @param[in] srcALen length of the first input sequence.    
+ * @param[in] *pSrcB points to the second input sequence.    
+ * @param[in] srcBLen length of the second input sequence.    
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * There is no saturation on intermediate additions.    
+ * Thus, if the accumulator overflows it wraps around and distorts the result.    
+ * The input signals should be scaled down to avoid intermediate overflows.    
+ * Scale down one of the inputs by 1/min(srcALen, srcBLen) to avoid overflows since a    
+ * maximum of min(srcALen, srcBLen) number of additions is carried internally.    
+ * The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.    
+ *    
+ * \par    
+ * See <code>arm_correlate_fast_q31()</code> for a faster but less precise implementation of this function for Cortex-M3 and Cortex-M4.    
+ */
+
+void arm_correlate_q31(
+  q31_t * pSrcA,
+  uint32_t srcALen,
+  q31_t * pSrcB,
+  uint32_t srcBLen,
+  q31_t * pDst)
+{
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t *pIn1;                                   /* inputA pointer               */
+  q31_t *pIn2;                                   /* inputB pointer               */
+  q31_t *pOut = pDst;                            /* output pointer               */
+  q31_t *px;                                     /* Intermediate inputA pointer  */
+  q31_t *py;                                     /* Intermediate inputB pointer  */
+  q31_t *pSrc1;                                  /* Intermediate pointers        */
+  q63_t sum, acc0, acc1, acc2;                   /* Accumulators                  */
+  q31_t x0, x1, x2, c0;                          /* temporary variables for holding input and coefficient values */
+  uint32_t j, k = 0u, count, blkCnt, outBlockSize, blockSize1, blockSize2, blockSize3;  /* loop counter                 */
+  int32_t inc = 1;                               /* Destination address modifier */
+
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc is set to -1 */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length */
+  /* But to improve the performance,
+   * we include zeroes in the output instead of zero padding either of the inputs */
+  /* If srcALen > srcBLen,
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer */
+  /* If srcALen < srcBLen,
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB    
+     * to make their lengths equal.    
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))    
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Advance the output pointer past the leading zero samples */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  /* The function is internally
+   * divided into three parts according to the number of multiplications that have to
+   * take place between inputA samples and inputB samples. In the first part of the
+   * algorithm, the multiplications increase by one for every iteration.
+   * In the second part of the algorithm, srcBLen number of multiplications are done.
+   * In the third part of the algorithm, the multiplications decrease by one
+   * for every iteration.*/
+  /* The algorithm is implemented in three stages.
+   * The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------    
+   * Initializations of stage1    
+   * -------------------------*/
+
+  /* sum = x[0] * y[srcBLen - 1]
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 1]
+   * ....    
+   * sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen - 1] * y[srcBLen - 1]    
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc1 = pIn2 + (srcBLen - 1u);
+  py = pSrc1;
+
+  /* ------------------------    
+   * Stage1 process    
+   * ----------------------*/
+
+  /* The first stage starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] * y[srcBLen - 4] */
+      sum += (q63_t) * px++ * (*py++);
+      /* x[1] * y[srcBLen - 3] */
+      sum += (q63_t) * px++ * (*py++);
+      /* x[2] * y[srcBLen - 2] */
+      sum += (q63_t) * px++ * (*py++);
+      /* x[3] * y[srcBLen - 1] */
+      sum += (q63_t) * px++ * (*py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* x[0] * y[srcBLen - 1] */
+      sum += (q63_t) * px++ * (*py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q31_t) (sum >> 31);
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pSrc1 - count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------    
+   * Initializations of stage2    
+   * ------------------------*/
+
+  /* sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen-1] * y[srcBLen-1]    
+   * sum = x[1] * y[0] + x[2] * y[1] +...+ x[srcBLen] * y[srcBLen-1]    
+   * ....    
+   * sum = x[srcALen-srcBLen] * y[0] + x[srcALen-srcBLen+1] * y[1] +...+ x[srcALen-1] * y[srcBLen-1]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* count is the index by which the pointer pIn1 is to be incremented */
+  count = 0u;
+
+  /* -------------------    
+   * Stage2 process    
+   * ------------------*/
+
+  /* Stage2 performs srcBLen MACs per output sample.
+   * The loop over blockSize2 is unrolled (by 3) only when
+   * srcBLen is greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll by 3 */
+    blkCnt = blockSize2 / 3;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+
+      /* read x[0], x[1] samples */
+      x0 = *(px++);
+      x1 = *(px++);
+
+      /* Apply loop unrolling and compute 3 MACs simultaneously. */
+      k = srcBLen / 3;
+
+      /* First part of the processing with loop unrolling.  Compute 3 MACs at a time.        
+       ** a second loop below computes MACs for the remaining 1 to 2 samples. */
+      do
+      {
+        /* Read y[0] sample */
+        c0 = *(py);
+
+        /* Read x[2] sample */
+        x2 = *(px);
+
+        /* Perform the multiply-accumulate */
+        /* acc0 +=  x[0] * y[0] */
+        acc0 += ((q63_t) x0 * c0);
+        /* acc1 +=  x[1] * y[0] */
+        acc1 += ((q63_t) x1 * c0);
+        /* acc2 +=  x[2] * y[0] */
+        acc2 += ((q63_t) x2 * c0);
+
+        /* Read y[1] sample */
+        c0 = *(py + 1u);
+
+        /* Read x[3] sample */
+        x0 = *(px + 1u);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[1] * y[1] */
+        acc0 += ((q63_t) x1 * c0);
+        /* acc1 +=  x[2] * y[1] */
+        acc1 += ((q63_t) x2 * c0);
+        /* acc2 +=  x[3] * y[1] */
+        acc2 += ((q63_t) x0 * c0);
+
+        /* Read y[2] sample */
+        c0 = *(py + 2u);
+
+        /* Read x[4] sample */
+        x1 = *(px + 2u);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[2] * y[2] */
+        acc0 += ((q63_t) x2 * c0);
+        /* acc1 +=  x[3] * y[2] */
+        acc1 += ((q63_t) x0 * c0);
+        /* acc2 +=  x[4] * y[2] */
+        acc2 += ((q63_t) x1 * c0);
+
+        /* update scratch pointers */
+        px += 3u;
+        py += 3u;
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 3, compute any remaining MACs here.        
+       ** No loop unrolling is used. */
+      k = srcBLen - (3 * (srcBLen / 3));
+
+      while(k > 0u)
+      {
+        /* Read y[3] sample */
+        c0 = *(py++);
+
+        /* Read x[5] sample */
+        x2 = *(px++);
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[3] * y[3] */
+        acc0 += ((q63_t) x0 * c0);
+        /* acc1 +=  x[4] * y[3] */
+        acc1 += ((q63_t) x1 * c0);
+        /* acc2 +=  x[5] * y[3] */
+        acc2 += ((q63_t) x2 * c0);
+
+        /* Reuse the present samples for the next MAC */
+        x0 = x1;
+        x1 = x2;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q31_t) (acc0 >> 31);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      *pOut = (q31_t) (acc1 >> 31);
+      pOut += inc;
+
+      *pOut = (q31_t) (acc2 >> 31);
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 3 */
+      count += 3u;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 3, compute any remaining output samples here.        
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 - 3 * (blockSize2 / 3);
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += (q63_t) * px++ * (*py++);
+        sum += (q63_t) * px++ * (*py++);
+        sum += (q63_t) * px++ * (*py++);
+        sum += (q63_t) * px++ * (*py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.    
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += (q63_t) * px++ * (*py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q31_t) (sum >> 31);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If the srcBLen is less than 4,
+     * the blockSize2 loop is not unrolled */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over srcBLen */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += (q63_t) * px++ * (*py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q31_t) (sum >> 31);
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+  /* --------------------------    
+   * Initializations of stage3    
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[0] + x[srcALen-srcBLen+2] * y[1] +...+ x[srcALen-1] * y[srcBLen-2]
+   * sum += x[srcALen-srcBLen+2] * y[0] + x[srcALen-srcBLen+3] * y[1] +...+ x[srcALen-1] * y[srcBLen-3]
+   * ....    
+   * sum +=  x[srcALen-2] * y[0] + x[srcALen-1] * y[1]    
+   * sum +=  x[srcALen-1] * y[0]    
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.    
+     The count variable holds the number of MAC operations performed */
+  count = srcBLen - 1u;
+
+  /* Working pointer of inputA */
+  pSrc1 = pIn1 + (srcALen - (srcBLen - 1u));
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* -------------------    
+   * Stage3 process    
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.    
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* sum += x[srcALen - srcBLen + 1] * y[0] */
+      sum += (q63_t) * px++ * (*py++);
+      /* sum += x[srcALen - srcBLen + 2] * y[1] */
+      sum += (q63_t) * px++ * (*py++);
+      /* sum += x[srcALen - srcBLen + 3] * y[2] */
+      sum += (q63_t) * px++ * (*py++);
+      /* sum += x[srcALen - srcBLen + 4] * y[3] */
+      sum += (q63_t) * px++ * (*py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.    
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum += (q63_t) * px++ * (*py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q31_t) (sum >> 31);
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the MAC count */
+    count--;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q31_t *pIn1 = pSrcA;                           /* inputA pointer               */
+  q31_t *pIn2 = pSrcB + (srcBLen - 1u);          /* inputB pointer               */
+  q63_t sum;                                     /* Accumulators                  */
+  uint32_t i = 0u, j;                            /* loop counters */
+  uint32_t inv = 0u;                             /* Reverse order flag */
+  uint32_t tot = 0u;                             /* Length */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered to be shorter than or equal to srcALen. */
+  /* But CORR(x, y) is the reverse of CORR(y, x). */
+  /* So, when srcBLen > srcALen, the output pointer is made to point to the end of the output buffer */
+  /* and a variable, inv, is set to 1. */
+  /* If the lengths are not equal, zero padding would be needed to make the two
+   * inputs the same length. But to improve the performance, we include zeroes
+   * in the output instead of zero padding either of the inputs. */
+  /* If srcALen > srcBLen, (srcALen - srcBLen) zeroes have to be included at the
+   * start of the output buffer. */
+  /* If srcALen < srcBLen, (srcBLen - srcALen) zeroes have to be included at the
+   * end of the output buffer. */
+  /* Once the zero padding is accounted for, the remainder of the output is calculated
+   * using correlation but with the shorter signal time shifted. */
+
+  /* Calculate the length of the remaining sequence */
+  tot = ((srcALen + srcBLen) - 2u);
+
+  if(srcALen > srcBLen)
+  {
+    /* Calculating the number of zeros to be padded to the output */
+    j = srcALen - srcBLen;
+
+    /* Initialise the pointer after zero padding */
+    pDst += j;
+  }
+
+  else if(srcALen < srcBLen)
+  {
+    /* Initialization to inputB pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization to the end of inputA pointer */
+    pIn2 = pSrcA + (srcALen - 1u);
+
+    /* Initialisation of the pointer after zero padding */
+    pDst = pDst + tot;
+
+    /* Swapping the lengths */
+    j = srcALen;
+    srcALen = srcBLen;
+    srcBLen = j;
+
+    /* Setting the reverse flag */
+    inv = 1;
+
+  }
+
+  /* Loop to calculate correlation for output length number of times */
+  for (i = 0u; i <= tot; i++)
+  {
+    /* Initialize sum with zero to carry on MAC operations */
+    sum = 0;
+
+    /* Loop to perform MAC operations according to correlation equation */
+    for (j = 0u; j <= i; j++)
+    {
+      /* Check the array limitations */
+      if((((i - j) < srcBLen) && (j < srcALen)))
+      {
+        /* z[i] += x[j] * y[(srcBLen - 1) - (i - j)] */
+        sum += ((q63_t) pIn1[j] * pIn2[-((int32_t) i - j)]);
+      }
+    }
+    /* Store the output in the destination buffer */
+    if(inv == 1)
+      *pDst-- = (q31_t) (sum >> 31u);
+    else
+      *pDst++ = (q31_t) (sum >> 31u);
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of Corr group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_correlate_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,790 @@
+/* ----------------------------------------------------------------------   
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.   
+*   
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*   
+* Project: 	    CMSIS DSP Library   
+* Title:		arm_correlate_q7.c   
+*   
+* Description:	Correlation of Q7 sequences. 
+*   
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**   
+ * @ingroup groupFilters   
+ */
+
+/**   
+ * @addtogroup Corr   
+ * @{   
+ */
+
+/**   
+ * @brief Correlation of Q7 sequences.   
+ * @param[in] *pSrcA points to the first input sequence.   
+ * @param[in] srcALen length of the first input sequence.   
+ * @param[in] *pSrcB points to the second input sequence.   
+ * @param[in] srcBLen length of the second input sequence.   
+ * @param[out] *pDst points to the location where the output result is written.  Length 2 * max(srcALen, srcBLen) - 1.   
+ * @return none.   
+ *   
+ * @details   
+ * <b>Scaling and Overflow Behavior:</b>   
+ *   
+ * \par   
+ * The function is implemented using a 32-bit internal accumulator.   
+ * Both the inputs are represented in 1.7 format and multiplications yield a 2.14 result.   
+ * The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format.   
+ * This approach provides 17 guard bits and there is no risk of overflow as long as <code>max(srcALen, srcBLen)<131072</code>.   
+ * The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and saturated to 1.7 format.   
+ *
+ * \par    
+ * Refer to the function <code>arm_correlate_opt_q7()</code> for a faster implementation of this function.
+ * 
+ */
+
+void arm_correlate_q7(
+  q7_t * pSrcA,
+  uint32_t srcALen,
+  q7_t * pSrcB,
+  uint32_t srcBLen,
+  q7_t * pDst)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q7_t *pIn1;                                    /* inputA pointer               */
+  q7_t *pIn2;                                    /* inputB pointer               */
+  q7_t *pOut = pDst;                             /* output pointer               */
+  q7_t *px;                                      /* Intermediate inputA pointer  */
+  q7_t *py;                                      /* Intermediate inputB pointer  */
+  q7_t *pSrc1;                                   /* Intermediate pointers        */
+  q31_t sum, acc0, acc1, acc2, acc3;             /* Accumulators                  */
+  q31_t input1, input2;                          /* temporary variables */
+  q15_t in1, in2;                                /* temporary variables */
+  q7_t x0, x1, x2, x3, c0, c1;                   /* temporary variables for holding input and coefficient values */
+  uint32_t j, k = 0u, count, blkCnt, outBlockSize, blockSize1, blockSize2, blockSize3;  /* loop counter                 */
+  int32_t inc = 1;
+
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered to be shorter than or equal to srcALen. */
+  /* But CORR(x, y) is the reverse of CORR(y, x). */
+  /* So, when srcBLen > srcALen, the output pointer is made to point to the end of the output buffer */
+  /* and the destination pointer modifier, inc, is set to -1. */
+  /* If srcALen > srcBLen, srcB would have to be zero padded to make the two inputs the same length. */
+  /* But to improve the performance,
+   * we include zeroes in the output instead of zero padding either of the inputs. */
+  /* If srcALen > srcBLen,
+   * (srcALen - srcBLen) zeroes have to be included at the start of the output buffer. */
+  /* If srcALen < srcBLen,
+   * (srcBLen - srcALen) zeroes have to be included at the end of the output buffer. */
+  if(srcALen >= srcBLen)
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcA);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcB);
+
+    /* Number of output samples is calculated */
+    outBlockSize = (2u * srcALen) - 1u;
+
+    /* When srcALen > srcBLen, zero padding is done to srcB   
+     * to make their lengths equal.   
+     * Instead, (outBlockSize - (srcALen + srcBLen - 1))   
+     * number of output samples are made zero */
+    j = outBlockSize - (srcALen + (srcBLen - 1u));
+
+    /* Advance the output pointer past the leading zero samples */
+    pOut += j;
+
+  }
+  else
+  {
+    /* Initialization of inputA pointer */
+    pIn1 = (pSrcB);
+
+    /* Initialization of inputB pointer */
+    pIn2 = (pSrcA);
+
+    /* srcBLen is always considered as shorter or equal to srcALen */
+    j = srcBLen;
+    srcBLen = srcALen;
+    srcALen = j;
+
+    /* CORR(x, y) = Reverse order(CORR(y, x)) */
+    /* Hence set the destination pointer to point to the last output sample */
+    pOut = pDst + ((srcALen + srcBLen) - 2u);
+
+    /* Destination address modifier is set to -1 */
+    inc = -1;
+
+  }
+
+  /* The function is internally
+   * divided into three parts according to the number of multiplications that
+   * take place between inputA samples and inputB samples. In the first part of the
+   * algorithm, the multiplications increase by one for every iteration.
+   * In the second part of the algorithm, srcBLen multiplications are done.
+   * In the third part of the algorithm, the multiplications decrease by one
+   * for every iteration. */
+  /* The algorithm is implemented in three stages.
+   * The loop counters of each stage are initialized here. */
+  blockSize1 = srcBLen - 1u;
+  blockSize2 = srcALen - (srcBLen - 1u);
+  blockSize3 = blockSize1;
+
+  /* --------------------------   
+   * Initializations of stage1   
+   * -------------------------*/
+
+  /* sum = x[0] * y[srcBLen - 1]
+   * sum = x[0] * y[srcBLen - 2] + x[1] * y[srcBLen - 1]
+   * ....
+   * sum = x[0] * y[1] + x[1] * y[2] +...+ x[srcBLen - 2] * y[srcBLen - 1]
+   */
+
+  /* In this stage the MAC operations are increased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = 1u;
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  pSrc1 = pIn2 + (srcBLen - 1u);
+  py = pSrc1;
+
+  /* ------------------------   
+   * Stage1 process   
+   * ----------------------*/
+
+  /* The first stage starts here */
+  while(blockSize1 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[0] , x[1] */
+      in1 = (q15_t) * px++;
+      in2 = (q15_t) * px++;
+      input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+      /* y[srcBLen - 4] , y[srcBLen - 3] */
+      in1 = (q15_t) * py++;
+      in2 = (q15_t) * py++;
+      input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+      /* x[0] * y[srcBLen - 4] */
+      /* x[1] * y[srcBLen - 3] */
+      sum = __SMLAD(input1, input2, sum);
+
+      /* x[2] , x[3] */
+      in1 = (q15_t) * px++;
+      in2 = (q15_t) * px++;
+      input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+      /* y[srcBLen - 2] , y[srcBLen - 1] */
+      in1 = (q15_t) * py++;
+      in2 = (q15_t) * py++;
+      input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+      /* x[2] * y[srcBLen - 2] */
+      /* x[3] * y[srcBLen - 1] */
+      sum = __SMLAD(input1, input2, sum);
+
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      /* x[0] * y[srcBLen - 1] */
+      sum += (q31_t) ((q15_t) * px++ * *py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q7_t) (__SSAT(sum >> 7, 8));
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    py = pSrc1 - count;
+    px = pIn1;
+
+    /* Increment the MAC count */
+    count++;
+
+    /* Decrement the loop counter */
+    blockSize1--;
+  }
+
+  /* --------------------------   
+   * Initializations of stage2   
+   * ------------------------*/
+
+  /* sum = x[0] * y[0] + x[1] * y[1] +...+ x[srcBLen-1] * y[srcBLen-1]   
+   * sum = x[1] * y[0] + x[2] * y[1] +...+ x[srcBLen] * y[srcBLen-1]   
+   * ....   
+   * sum = x[srcALen-srcBLen] * y[0] + x[srcALen-srcBLen+1] * y[1] +...+ x[srcALen-1] * y[srcBLen-1]
+   */
+
+  /* Working pointer of inputA */
+  px = pIn1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* count is the index by which the pointer pIn1 is incremented */
+  count = 0u;
+
+  /* -------------------   
+   * Stage2 process   
+   * ------------------*/
+
+  /* Stage2 depends on srcBLen as in this stage srcBLen number of MACS are performed.   
+   * So, to loop unroll over blockSize2,   
+   * srcBLen should be greater than or equal to 4 */
+  if(srcBLen >= 4u)
+  {
+    /* Loop unroll over blockSize2, by 4 */
+    blkCnt = blockSize2 >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Set all accumulators to zero */
+      acc0 = 0;
+      acc1 = 0;
+      acc2 = 0;
+      acc3 = 0;
+
+      /* read x[0], x[1], x[2] samples */
+      x0 = *px++;
+      x1 = *px++;
+      x2 = *px++;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      do
+      {
+        /* Read y[0] sample */
+        c0 = *py++;
+        /* Read y[1] sample */
+        c1 = *py++;
+
+        /* Read x[3] sample */
+        x3 = *px++;
+
+        /* x[0] and x[1] are packed */
+        in1 = (q15_t) x0;
+        in2 = (q15_t) x1;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* y[0] and y[1] are packed */
+        in1 = (q15_t) c0;
+        in2 = (q15_t) c1;
+
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* acc0 += x[0] * y[0] + x[1] * y[1]  */
+        acc0 = __SMLAD(input1, input2, acc0);
+
+        /* x[1] and x[2] are packed */
+        in1 = (q15_t) x1;
+        in2 = (q15_t) x2;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* acc1 += x[1] * y[0] + x[2] * y[1] */
+        acc1 = __SMLAD(input1, input2, acc1);
+
+        /* x[2] and x[3] are packed */
+        in1 = (q15_t) x2;
+        in2 = (q15_t) x3;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* acc2 += x[2] * y[0] + x[3] * y[1]  */
+        acc2 = __SMLAD(input1, input2, acc2);
+
+        /* Read x[4] sample */
+        x0 = *(px++);
+
+        /* x[3] and x[4] are packed */
+        in1 = (q15_t) x3;
+        in2 = (q15_t) x0;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* acc3 += x[3] * y[0] + x[4] * y[1]  */
+        acc3 = __SMLAD(input1, input2, acc3);
+
+        /* Read y[2] sample */
+        c0 = *py++;
+        /* Read y[3] sample */
+        c1 = *py++;
+
+        /* Read x[5] sample */
+        x1 = *px++;
+
+        /* x[2] and x[3] are packed */
+        in1 = (q15_t) x2;
+        in2 = (q15_t) x3;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* y[2] and y[3] are packed */
+        in1 = (q15_t) c0;
+        in2 = (q15_t) c1;
+
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* acc0 += x[2] * y[2] + x[3] * y[3]  */
+        acc0 = __SMLAD(input1, input2, acc0);
+
+        /* x[3] and x[4] are packed */
+        in1 = (q15_t) x3;
+        in2 = (q15_t) x0;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* acc1 += x[3] * y[2] + x[4] * y[3]  */
+        acc1 = __SMLAD(input1, input2, acc1);
+
+        /* x[4] and x[5] are packed */
+        in1 = (q15_t) x0;
+        in2 = (q15_t) x1;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* acc2 += x[4] * y[2] + x[5] * y[3]  */
+        acc2 = __SMLAD(input1, input2, acc2);
+
+        /* Read x[6] sample */
+        x2 = *px++;
+
+        /* x[5] and x[6] are packed */
+        in1 = (q15_t) x1;
+        in2 = (q15_t) x2;
+
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* acc3 += x[5] * y[2] + x[6] * y[3]  */
+        acc3 = __SMLAD(input1, input2, acc3);
+
+      } while(--k);
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Read y[4] sample */
+        c0 = *py++;
+
+        /* Read x[7] sample */
+        x3 = *px++;
+
+        /* Perform the multiply-accumulates */
+        /* acc0 +=  x[4] * y[4] */
+        acc0 += ((q15_t) x0 * c0);
+        /* acc1 +=  x[5] * y[4] */
+        acc1 += ((q15_t) x1 * c0);
+        /* acc2 +=  x[6] * y[4] */
+        acc2 += ((q15_t) x2 * c0);
+        /* acc3 +=  x[7] * y[4] */
+        acc3 += ((q15_t) x3 * c0);
+
+        /* Reuse the present samples for the next MAC */
+        x0 = x1;
+        x1 = x2;
+        x2 = x3;
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q7_t) (__SSAT(acc0 >> 7, 8));
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      *pOut = (q7_t) (__SSAT(acc1 >> 7, 8));
+      pOut += inc;
+
+      *pOut = (q7_t) (__SSAT(acc2 >> 7, 8));
+      pOut += inc;
+
+      *pOut = (q7_t) (__SSAT(acc3 >> 7, 8));
+      pOut += inc;
+
+      count += 4u;
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize2 is not a multiple of 4, compute any remaining output samples here.   
+     ** No loop unrolling is used. */
+    blkCnt = blockSize2 % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Apply loop unrolling and compute 4 MACs simultaneously. */
+      k = srcBLen >> 2u;
+
+      /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+       ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+      while(k > 0u)
+      {
+        /* Reading two inputs of SrcA buffer and packing */
+        in1 = (q15_t) * px++;
+        in2 = (q15_t) * px++;
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* Reading two inputs of SrcB buffer and packing */
+        in1 = (q15_t) * py++;
+        in2 = (q15_t) * py++;
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* Perform the multiply-accumulates */
+        sum = __SMLAD(input1, input2, sum);
+
+        /* Reading two inputs of SrcA buffer and packing */
+        in1 = (q15_t) * px++;
+        in2 = (q15_t) * px++;
+        input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* Reading two inputs of SrcB buffer and packing */
+        in1 = (q15_t) * py++;
+        in2 = (q15_t) * py++;
+        input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+        /* Perform the multiply-accumulates */
+        sum = __SMLAD(input1, input2, sum);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* If the srcBLen is not a multiple of 4, compute any remaining MACs here.   
+       ** No loop unrolling is used. */
+      k = srcBLen % 0x4u;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulates */
+        sum += ((q15_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q7_t) (__SSAT(sum >> 7, 8));
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the pointer pIn1 index, count by 1 */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+  else
+  {
+    /* If the srcBLen is less than 4,
+     * the blockSize2 loop is not unrolled by 4 */
+    blkCnt = blockSize2;
+
+    while(blkCnt > 0u)
+    {
+      /* Accumulator is made zero for every iteration */
+      sum = 0;
+
+      /* Loop over srcBLen */
+      k = srcBLen;
+
+      while(k > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += ((q15_t) * px++ * *py++);
+
+        /* Decrement the loop counter */
+        k--;
+      }
+
+      /* Store the result in the accumulator in the destination buffer. */
+      *pOut = (q7_t) (__SSAT(sum >> 7, 8));
+      /* Destination pointer is updated according to the address modifier, inc */
+      pOut += inc;
+
+      /* Increment the MAC count */
+      count++;
+
+      /* Update the inputA and inputB pointers for next MAC calculation */
+      px = pIn1 + count;
+      py = pIn2;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+  }
+
+  /* --------------------------   
+   * Initializations of stage3   
+   * -------------------------*/
+
+  /* sum += x[srcALen-srcBLen+1] * y[0] + x[srcALen-srcBLen+2] * y[1] +...+ x[srcALen-1] * y[srcBLen-2]
+   * sum += x[srcALen-srcBLen+2] * y[0] + x[srcALen-srcBLen+3] * y[1] +...+ x[srcALen-1] * y[srcBLen-3]
+   * ....   
+   * sum +=  x[srcALen-2] * y[0] + x[srcALen-1] * y[1]   
+   * sum +=  x[srcALen-1] * y[0]   
+   */
+
+  /* In this stage the MAC operations are decreased by 1 for every iteration.   
+     The count variable holds the number of MAC operations performed */
+  count = srcBLen - 1u;
+
+  /* Working pointer of inputA */
+  pSrc1 = pIn1 + (srcALen - (srcBLen - 1u));
+  px = pSrc1;
+
+  /* Working pointer of inputB */
+  py = pIn2;
+
+  /* -------------------   
+   * Stage3 process   
+   * ------------------*/
+
+  while(blockSize3 > 0u)
+  {
+    /* Accumulator is made zero for every iteration */
+    sum = 0;
+
+    /* Apply loop unrolling and compute 4 MACs simultaneously. */
+    k = count >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 MACs at a time.   
+     ** a second loop below computes MACs for the remaining 1 to 3 samples. */
+    while(k > 0u)
+    {
+      /* x[srcALen - srcBLen + 1] , x[srcALen - srcBLen + 2]  */
+      in1 = (q15_t) * px++;
+      in2 = (q15_t) * px++;
+      input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+      /* y[0] , y[1] */
+      in1 = (q15_t) * py++;
+      in2 = (q15_t) * py++;
+      input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+      /* sum += x[srcALen - srcBLen + 1] * y[0] */
+      /* sum += x[srcALen - srcBLen + 2] * y[1] */
+      sum = __SMLAD(input1, input2, sum);
+
+      /* x[srcALen - srcBLen + 3] , x[srcALen - srcBLen + 4] */
+      in1 = (q15_t) * px++;
+      in2 = (q15_t) * px++;
+      input1 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+      /* y[2] , y[3] */
+      in1 = (q15_t) * py++;
+      in2 = (q15_t) * py++;
+      input2 = ((q31_t) in1 & 0x0000FFFF) | ((q31_t) in2 << 16);
+
+      /* sum += x[srcALen - srcBLen + 3] * y[2] */
+      /* sum += x[srcALen - srcBLen + 4] * y[3] */
+      sum = __SMLAD(input1, input2, sum);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* If the count is not a multiple of 4, compute any remaining MACs here.   
+     ** No loop unrolling is used. */
+    k = count % 0x4u;
+
+    while(k > 0u)
+    {
+      /* Perform the multiply-accumulates */
+      sum += ((q15_t) * px++ * *py++);
+
+      /* Decrement the loop counter */
+      k--;
+    }
+
+    /* Store the result in the accumulator in the destination buffer. */
+    *pOut = (q7_t) (__SSAT(sum >> 7, 8));
+    /* Destination pointer is updated according to the address modifier, inc */
+    pOut += inc;
+
+    /* Update the inputA and inputB pointers for next MAC calculation */
+    px = ++pSrc1;
+    py = pIn2;
+
+    /* Decrement the MAC count */
+    count--;
+
+    /* Decrement the loop counter */
+    blockSize3--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q7_t *pIn1 = pSrcA;                            /* inputA pointer */
+  q7_t *pIn2 = pSrcB + (srcBLen - 1u);           /* inputB pointer */
+  q31_t sum;                                     /* Accumulator */
+  uint32_t i = 0u, j;                            /* loop counters */
+  uint32_t inv = 0u;                             /* Reverse order flag */
+  uint32_t tot = 0u;                             /* Length */
+
+  /* The algorithm implementation is based on the lengths of the inputs. */
+  /* srcB is always made to slide across srcA. */
+  /* So srcBLen is always considered as shorter or equal to srcALen */
+  /* But CORR(x, y) is reverse of CORR(y, x) */
+  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
+  /* and a variable, inv, is set to 1 */
+  /* If the lengths are not equal, zero padding is needed to make the two
+   * inputs the same length. To improve performance, the zeroes are placed
+   * in the output instead of padding either of the inputs. */
+  /* If srcALen > srcBLen, (srcALen - srcBLen) zeroes have to be included at
+   * the start of the output buffer */
+  /* If srcALen < srcBLen, (srcBLen - srcALen) zeroes have to be included at
+   * the end of the output buffer */
+  /* Once the zero padding is done, the remainder of the output is calculated
+   * using convolution but with the shorter signal time shifted. */
+
+  /* Calculate the length of the remaining sequence */
+  tot = ((srcALen + srcBLen) - 2u);
+
+  if(srcALen > srcBLen)
+  {
+    /* Calculating the number of zeros to be padded to the output */
+    j = srcALen - srcBLen;
+
+    /* Initialise the pointer after zero padding */
+    pDst += j;
+  }
+
+  else if(srcALen < srcBLen)
+  {
+    /* Initialization to inputB pointer */
+    pIn1 = pSrcB;
+
+    /* Initialization to the end of inputA pointer */
+    pIn2 = pSrcA + (srcALen - 1u);
+
+    /* Initialisation of the pointer after zero padding */
+    pDst = pDst + tot;
+
+    /* Swapping the lengths */
+    j = srcALen;
+    srcALen = srcBLen;
+    srcBLen = j;
+
+    /* Setting the reverse flag */
+    inv = 1;
+
+  }
+
+  /* Loop to calculate convolution for output length number of times */
+  for (i = 0u; i <= tot; i++)
+  {
+    /* Initialize sum with zero to carry on MAC operations */
+    sum = 0;
+
+    /* Loop to perform MAC operations according to convolution equation */
+    for (j = 0u; j <= i; j++)
+    {
+      /* Check the array limitations */
+      if((((i - j) < srcBLen) && (j < srcALen)))
+      {
+        /* z[i] += x[i-j] * y[j] */
+        sum += ((q15_t) pIn1[j] * pIn2[-((int32_t) i - j)]);
+      }
+    }
+    /* Store the output in the destination buffer */
+    if(inv == 1)
+      *pDst-- = (q7_t) __SSAT((sum >> 7u), 8u);
+    else
+      *pDst++ = (q7_t) __SSAT((sum >> 7u), 8u);
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**   
+ * @} end of Corr group   
+ */
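
The store step in both code paths above scales the 32-bit sum back to Q7 with `sum >> 7` and saturates it to 8 bits. A minimal portable sketch of that step follows. `sat8` and `q7_dot` are hypothetical names standing in for `__SSAT(x, 8)` and the inner multiply-accumulate loop, and the code assumes an arithmetic right shift for negative values, as the original does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for __SSAT(v, 8): clamp to [-128, 127]. */
static int32_t sat8(int32_t v)
{
    if (v > 127) {
        return 127;
    }
    if (v < -128) {
        return -128;
    }
    return v;
}

/* Multiply-accumulate n pairs of Q7 samples and return a Q7 result.
 * Each product of two 1.7 values is in 2.14 format; the sum is kept in
 * 32 bits, shifted right by 7 and then saturated to 8 bits. */
static int8_t q7_dot(const int8_t *x, const int8_t *y, uint32_t n)
{
    int32_t sum = 0;

    assert(x != NULL && y != NULL);

    for (uint32_t i = 0; i < n; i++) {
        sum += (int16_t) x[i] * y[i];
    }

    /* Assumes >> on a negative value is an arithmetic shift. */
    return (int8_t) sat8(sum >> 7);
}
```

For example, two pairs of 0.5 (64 in Q7) accumulate to 8192, which scales to 64, while three pairs of 127 overflow the Q7 range and are clamped to 127.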

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,524 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_decimate_f32.c    
+*    
+* Description:	FIR decimation for floating-point sequences.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup FIR_decimate Finite Impulse Response (FIR) Decimator    
+ *    
+ * These functions combine an FIR filter together with a decimator.    
+ * They are used in multirate systems for reducing the sample rate of a signal without introducing aliasing distortion.    
+ * Conceptually, the functions are equivalent to the block diagram below:    
+ * \image html FIRDecimator.gif "Components included in the FIR Decimator functions"    
+ * When decimating by a factor of <code>M</code>, the signal should be prefiltered by a lowpass filter with a normalized    
+ * cutoff frequency of <code>1/M</code> in order to prevent aliasing distortion.    
+ * The user of the function is responsible for providing the filter coefficients.    
+ *    
+ * The FIR decimator functions provided in the CMSIS DSP Library combine the FIR filter and the decimator in an efficient manner.    
+ * Instead of calculating all of the FIR filter outputs and discarding <code>M-1</code> out of every <code>M</code>, only the    
+ * samples output by the decimator are computed.    
+ * The functions operate on blocks of input and output data.    
+ * <code>pSrc</code> points to an array of <code>blockSize</code> input values and    
+ * <code>pDst</code> points to an array of <code>blockSize/M</code> output values.    
+ * In order to have an integer number of output samples <code>blockSize</code>    
+ * must always be a multiple of the decimation factor <code>M</code>.    
+ *    
+ * The library provides separate functions for Q15, Q31 and floating-point data types.    
+ *    
+ * \par Algorithm:    
+ * The FIR portion of the algorithm uses the standard form filter:    
+ * <pre>    
+ *    y[n] = b[0] * x[n] + b[1] * x[n-1] + b[2] * x[n-2] + ...+ b[numTaps-1] * x[n-numTaps+1]    
+ * </pre>    
+ * where, <code>b[n]</code> are the filter coefficients.    
+ * \par   
+ * The <code>pCoeffs</code> points to a coefficient array of size <code>numTaps</code>.    
+ * Coefficients are stored in time reversed order.    
+ * \par    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to a state array of size <code>numTaps + blockSize - 1</code>.    
+ * Samples in the state buffer are stored in the order:    
+ * \par    
+ * <pre>    
+ *    {x[n-numTaps+1], x[n-numTaps], x[n-numTaps-1], x[n-numTaps-2]....x[0], x[1], ..., x[blockSize-1]}    
+ * </pre>    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.
+ *    
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter.    
+ * Coefficient arrays may be shared among several instances, while state variable arrays should be allocated separately.
+ * There are separate instance structure declarations for each of the 3 supported data types.    
+ *    
+ * \par Initialization Functions    
+ * There is also an associated initialization function for each data type.    
+ * The initialization function performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.    
+ * - Checks to make sure that the size of the input is a multiple of the decimation factor.    
+ * To do this manually without calling the init function, assign the following subfields of the instance structure:
+ * numTaps, pCoeffs, M (decimation factor), pState. Also set all of the values in pState to zero. 
+ *    
+ * \par    
+ * Use of the initialization function is optional.    
+ * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.    
+ * To place an instance structure into a const data section, the instance structure must be manually initialized.    
+ * The code below statically initializes each of the 3 different data type filter instance structures    
+ * <pre>    
+ *arm_fir_decimate_instance_f32 S = {M, numTaps, pCoeffs, pState};    
+ *arm_fir_decimate_instance_q31 S = {M, numTaps, pCoeffs, pState};    
+ *arm_fir_decimate_instance_q15 S = {M, numTaps, pCoeffs, pState};    
+ * </pre>    
+ * where <code>M</code> is the decimation factor; <code>numTaps</code> is the number of filter coefficients in the filter;    
+ * <code>pCoeffs</code> is the address of the coefficient buffer;    
+ * <code>pState</code> is the address of the state buffer.    
+ * Be sure to set the values in the state buffer to zeros when doing static initialization.    
+ *    
+ * \par Fixed-Point Behavior    
+ * Care must be taken when using the fixed-point versions of the FIR decimate filter functions.    
+ * In particular, the overflow and saturation behavior of the accumulator used in each function must be considered.    
+ * Refer to the function specific documentation below for usage guidelines.    
+ */
+
+/**    
+ * @addtogroup FIR_decimate    
+ * @{    
+ */
+
+  /**    
+   * @brief Processing function for the floating-point FIR decimator.    
+   * @param[in] *S        points to an instance of the floating-point FIR decimator structure.    
+   * @param[in] *pSrc     points to the block of input data.    
+   * @param[out] *pDst    points to the block of output data.    
+   * @param[in] blockSize number of input samples to process per call.    
+   * @return none.    
+   */
+
+void arm_fir_decimate_f32(
+  const arm_fir_decimate_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t *pState = S->pState;                 /* State pointer */
+  float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+  float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+  float32_t *px, *pb;                            /* Temporary pointers for state and coefficient buffers */
+  float32_t sum0;                                /* Accumulator */
+  float32_t x0, c0;                              /* Temporary variables to hold state and coefficient values */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  uint32_t i, tapCnt, blkCnt, outBlockSize = blockSize / S->M;  /* Loop counters */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  uint32_t blkCntN4;
+  float32_t *px0, *px1, *px2, *px3;
+  float32_t acc0, acc1, acc2, acc3;
+  float32_t x1, x2, x3;
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+  /* Total number of output samples to be computed */
+  blkCnt = outBlockSize / 4;
+  blkCntN4 = outBlockSize - (4 * blkCnt);
+
+  while(blkCnt > 0u)
+  {
+    /* Copy 4 * decimation factor number of new input samples into the state buffer */
+    i = 4 * S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulators to zero */
+    acc0 = 0.0f;
+    acc1 = 0.0f;
+    acc2 = 0.0f;
+    acc3 = 0.0f;
+
+    /* Initialize state pointer for all the samples */
+    px0 = pState;
+    px1 = pState + S->M;
+    px2 = pState + 2 * S->M;
+    px3 = pState + 3 * S->M;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.       
+     ** Repeat until we've computed numTaps-4 coefficients. */
+
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-1] sample for acc0 */
+      x0 = *(px0++);
+      /* Read x[n-numTaps-1] sample for acc1 */
+      x1 = *(px1++);
+      /* Read x[n-numTaps-1] sample for acc2 */
+      x2 = *(px2++);
+      /* Read x[n-numTaps-1] sample for acc3 */
+      x3 = *(px3++);
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+      acc2 += x2 * c0;
+      acc3 += x3 * c0;
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-2] sample for acc0, acc1, acc2, acc3 */
+      x0 = *(px0++);
+      x1 = *(px1++);
+      x2 = *(px2++);
+      x3 = *(px3++);
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+      acc2 += x2 * c0;
+      acc3 += x3 * c0;
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-3] sample acc0, acc1, acc2, acc3 */
+      x0 = *(px0++);
+      x1 = *(px1++);
+      x2 = *(px2++);
+      x3 = *(px3++);
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+      acc2 += x2 * c0;
+      acc3 += x3 * c0;
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-4] sample acc0, acc1, acc2, acc3 */
+      x0 = *(px0++);
+      x1 = *(px1++);
+      x2 = *(px2++);
+      x3 = *(px3++);
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+      acc2 += x2 * c0;
+      acc3 += x3 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *(pb++);
+
+      /* Fetch  state variables for acc0, acc1, acc2, acc3 */
+      x0 = *(px0++);
+      x1 = *(px1++);
+      x2 = *(px2++);
+      x3 = *(px3++);
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+      acc2 += x2 * c0;
+      acc3 += x3 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + 4 * S->M;
+
+    /* The result is in the accumulator, store in the destination buffer. */
+    *pDst++ = acc0;
+    *pDst++ = acc1;
+    *pDst++ = acc2;
+    *pDst++ = acc3;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  while(blkCntN4 > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    sum0 = 0.0f;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.       
+     ** Repeat until we've computed numTaps-4 coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-1] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-2] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-3] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-4] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *(pb++);
+
+      /* Fetch 1 state variable */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* The result is in the accumulator, store in the destination buffer. */
+    *pDst++ = sum0;
+
+    /* Decrement the loop counter */
+    blkCntN4--;
+  }
+
+  /* Processing is complete.    
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = (numTaps - 1u) >> 2;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+  i = (numTaps - 1u) % 0x04u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+#else
+
+/* Run the below code for Cortex-M0 */
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+  /* Total number of output samples to be computed */
+  blkCnt = outBlockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    sum0 = 0.0f;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor           
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* The result is in the accumulator, store in the destination buffer. */
+    *pDst++ = sum0;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.       
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  /* Copy numTaps - 1 values */
+  i = (numTaps - 1u);
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY        */
+
+}
+
+/**    
+ * @} end of FIR_decimate group    
+ */
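
The state-buffer handling described in the FIR_decimate documentation above can be sketched in portable C without loop unrolling. This is an illustrative reference under stated assumptions, not the library's implementation: `ref_decimator` and `ref_fir_decimate` are hypothetical names, the coefficients are assumed to be stored time reversed, and the state buffer is assumed to hold `numTaps + blockSize - 1` zero-initialized floats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference decimator; not a CMSIS-DSP type. */
typedef struct {
    size_t M;             /* decimation factor */
    size_t numTaps;       /* number of filter coefficients */
    const float *coeffs;  /* time reversed: {b[numTaps-1], ..., b[0]} */
    float *state;         /* numTaps + blockSize - 1 floats, zeroed before first use */
} ref_decimator;

static void ref_fir_decimate(const ref_decimator *d, const float *src,
                             float *dst, size_t blockSize)
{
    /* blockSize must be a multiple of M so the output count is an integer. */
    assert(d->M > 0 && blockSize % d->M == 0);

    /* New input goes after the numTaps - 1 samples kept from the last call. */
    memcpy(d->state + (d->numTaps - 1), src, blockSize * sizeof(float));

    /* Compute only every M-th filter output: one dot product per M inputs. */
    for (size_t n = 0; n < blockSize / d->M; n++) {
        const float *x = d->state + n * d->M;
        float acc = 0.0f;

        for (size_t k = 0; k < d->numTaps; k++) {
            acc += x[k] * d->coeffs[k];
        }
        dst[n] = acc;
    }

    /* Keep the last numTaps - 1 samples as history for the next call. */
    memmove(d->state, d->state + blockSize,
            (d->numTaps - 1) * sizeof(float));
}
```

With b = {1, 2, 3} stored as {3, 2, 1}, M = 2 and input {1, 2, 3, 4}, the full-rate filter outputs would be 1, 4, 10, 16, so the sketch writes 1 and 10. A second call with {5, 6, 7, 8} continues from the saved history and writes 22 and 34.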

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_fast_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_fast_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,598 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_decimate_fast_q15.c    
+*    
+* Description:	Fast Q15 FIR Decimator.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_decimate    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q15 FIR decimator (fast variant) for Cortex-M3 and Cortex-M4.    
+ * @param[in] *S points to an instance of the Q15 FIR decimator structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return none    
+ *    
+ * \par Restrictions   
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output, and state buffers should be 32-bit aligned.
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * This fast version uses a 32-bit accumulator with 2.30 format.    
+ * The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows it wraps around and distorts the result.    
+ * In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits (log2 is read as log to the base 2).    
+ * The 2.30 accumulator is then truncated to 2.15 format and saturated to yield the 1.15 result.    
+ *    
+ * \par    
+ * Refer to the function <code>arm_fir_decimate_q15()</code> for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion.    
+ * Both the slow and the fast versions use the same instance structure.    
+ * Use the function <code>arm_fir_decimate_init_q15()</code> to initialize the filter structure.    
+ */
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+void arm_fir_decimate_fast_q15(
+  const arm_fir_decimate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q15_t *px;                                     /* Temporary pointer for state buffer */
+  q15_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q31_t x0, x1, c0, c1;                          /* Temporary variables to hold state and coefficient values */
+  q31_t sum0;                                    /* Accumulators */
+  q31_t acc0, acc1;
+  q15_t *px0, *px1;
+  uint32_t blkCntN3;
+  uint32_t numTaps = S->numTaps;                 /* Number of taps */
+  uint32_t i, blkCnt, tapCnt, outBlockSize = blockSize / S->M;  /* Loop counters */
+
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+
+  /* Total number of output samples to be computed */
+  blkCnt = outBlockSize / 2;
+  blkCntN3 = outBlockSize - (2 * blkCnt);
+
+
+  while(blkCnt > 0u)
+  {
+    /* Copy 2 * decimation factor number of new input samples into the state buffer */
+    i = 2 * S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    acc0 = 0;
+    acc1 = 0;
+
+    /* Initialize state pointer */
+    px0 = pState;
+
+    px1 = pState + S->M;
+
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.       
+     ** Repeat until we've computed numTaps-4 coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] and b[numTaps-2] coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* Read x[n-numTaps-1] and x[n-numTaps-2] samples */
+      x0 = *__SIMD32(px0)++;
+
+      x1 = *__SIMD32(px1)++;
+
+      /* Perform the multiply-accumulate */
+      acc0 = __SMLAD(x0, c0, acc0);
+
+      acc1 = __SMLAD(x1, c0, acc1);
+
+      /* Read the b[numTaps-3] and b[numTaps-4] coefficient */
+      c0 = *__SIMD32(pb)++;
+
+      /* Read x[n-numTaps-2] and x[n-numTaps-3] sample */
+      x0 = *__SIMD32(px0)++;
+
+      x1 = *__SIMD32(px1)++;
+
+      /* Perform the multiply-accumulate */
+      acc0 = __SMLAD(x0, c0, acc0);
+
+      acc1 = __SMLAD(x1, c0, acc1);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px0++;
+
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 = __SMLAD(x0, c0, acc0);
+      acc1 = __SMLAD(x1, c0, acc1);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M * 2;
+
+    /* Store filter output, smlad returns the values in 2.30 format */
+    /* so downscale by 15 to get output in 1.15 */
+    *pDst++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+    *pDst++ = (q15_t) (__SSAT((acc1 >> 15), 16));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+
+  while(blkCntN3 > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /*Set sum to zero */
+    sum0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.       
+     ** Repeat until we've computed numTaps-4 coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] and b[numTaps-2] coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* Read x[n-numTaps-1] and x[n-numTaps-2] samples */
+      x0 = *__SIMD32(px)++;
+
+      /* Read the b[numTaps-3] and b[numTaps-4] coefficient */
+      c1 = *__SIMD32(pb)++;
+
+      /* Perform the multiply-accumulate */
+      sum0 = __SMLAD(x0, c0, sum0);
+
+      /* Read x[n-numTaps-2] and x[n-numTaps-3] sample */
+      x0 = *__SIMD32(px)++;
+
+      /* Perform the multiply-accumulate */
+      sum0 = __SMLAD(x0, c1, sum0);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 = __SMLAD(x0, c0, sum0);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* Store filter output, smlad returns the values in 2.30 format */
+    /* so downscale by 15 to get output in 1.15 */
+    *pDst++ = (q15_t) (__SSAT((sum0 >> 15), 16));
+
+    /* Decrement the loop counter */
+    blkCntN3--;
+  }
+
+  /* Processing is complete.       
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+  i = (numTaps - 1u) % 0x04u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+}
+
+#else
+
+
+void arm_fir_decimate_fast_q15(
+  const arm_fir_decimate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q15_t *px;                                     /* Temporary pointer for state buffer */
+  q15_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q15_t x0, x1, c0;                              /* Temporary variables to hold state and coefficient values */
+  q31_t sum0;                                    /* Accumulators */
+  q31_t acc0, acc1;
+  q15_t *px0, *px1;
+  uint32_t blkCntN3;
+  uint32_t numTaps = S->numTaps;                 /* Number of taps */
+  uint32_t i, blkCnt, tapCnt, outBlockSize = blockSize / S->M;  /* Loop counters */
+
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+
+  /* Total number of output samples to be computed */
+  blkCnt = outBlockSize / 2;
+  blkCntN3 = outBlockSize - (2 * blkCnt);
+
+  while(blkCnt > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = 2 * S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    acc0 = 0;
+    acc1 = 0;
+
+    /* Initialize state pointer */
+    px0 = pState;
+
+    px1 = pState + S->M;
+
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.
+     ** Repeat until we've computed numTaps-(numTaps%4) coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-1] for sample 0 and for sample 1 */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-2] for sample 0 and sample 1 */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-3] for sample 0 and sample 1 */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-4] for sample 0 and sample 1 */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M * 2;
+
+    /* Store filter output. The accumulators hold 1.15 x 1.15 products in 2.30 format, */
+    /* so downscale by 15 bits and saturate to get the output in 1.15 format */
+
+    *pDst++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+    *pDst++ = (q15_t) (__SSAT((acc1 >> 15), 16));
+
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  while(blkCntN3 > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set sum to zero */
+    sum0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.
+     ** Repeat until we've computed numTaps-(numTaps%4) coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-1] sample */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-2] sample */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-3] sample */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-4] sample */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* Store filter output. The accumulator holds 1.15 x 1.15 products in 2.30 format, */
+    /* so downscale by 15 bits and saturate to get the output in 1.15 format */
+    *pDst++ = (q15_t) (__SSAT((sum0 >> 15), 16));
+
+    /* Decrement the loop counter */
+    blkCntN3--;
+  }
+
+  /* Processing is complete.
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+  i = (numTaps - 1u) % 0x04u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+}
+
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+/**    
+ * @} end of FIR_decimate group    
+ */
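
The store step in both branches of arm_fir_decimate_fast_q15() shifts a 32-bit accumulator right by 15 bits and saturates the result to 16 bits. The following is a minimal portable sketch of that step, assuming plain C99 without the CMSIS `__SSAT` intrinsic. The names `ssat16_sketch` and `q15_dot_sketch` are hypothetical and are not part of CMSIS-DSP.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for __SSAT(x, 16): clamp to the signed 16-bit range. */
static int16_t ssat16_sketch(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t) x;
}

/* Hypothetical helper: dot product of Q15 vectors with a 32-bit accumulator.
 * Each 1.15 x 1.15 product is in 2.30 format. As in the fast variant, the
 * accumulator has no guard bits, so callers must scale inputs to keep the
 * running sum inside the int32_t range.
 * The right shift assumes an arithmetic shift for negative values, which is
 * implementation-defined in C but usual on ARM toolchains. */
static int16_t q15_dot_sketch(const int16_t *x, const int16_t *c, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t) x[i] * c[i];
    }
    return ssat16_sketch(acc >> 15);
}
```

In this sketch, 0.5 x 0.5 (16384 x 16384) yields 8192, which is 0.25 in Q15, while -1.0 x -1.0 (-32768 x -32768) exceeds the Q15 range and saturates to 32767.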

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_fast_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_fast_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,351 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_decimate_fast_q31.c    
+*    
+* Description:	Fast Q31 FIR Decimator.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_decimate    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q31 FIR decimator (fast variant) for Cortex-M3 and Cortex-M4.    
+ * @param[in] *S points to an instance of the Q31 FIR decimator structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return none    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * This function is optimized for speed at the expense of fixed-point precision and overflow protection.    
+ * The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format.    
+ * These intermediate results are added to a 2.30 accumulator.    
+ * Finally, the accumulator is saturated and converted to a 1.31 result.    
+ * The fast version has the same overflow behavior as the standard version and provides less precision since it discards the low 32 bits of each multiplication result.    
+ * In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits (where log2 is read as log to the base 2).    
+ *    
+ * \par    
+ * Refer to the function <code>arm_fir_decimate_q31()</code> for a slower implementation of this function which uses a 64-bit accumulator to provide higher precision.    
+ * Both the slow and the fast versions use the same instance structure.    
+ * Use the function <code>arm_fir_decimate_init_q31()</code> to initialize the filter structure.    
+ */
+
+void arm_fir_decimate_fast_q31(
+  arm_fir_decimate_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pState = S->pState;                     /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q31_t x0, c0;                                  /* Temporary variables to hold state and coefficient values */
+  q31_t *px;                                     /* Temporary pointers for state buffer */
+  q31_t *pb;                                     /* Temporary pointers for coefficient buffer */
+  q31_t sum0;                                    /* Accumulator */
+  uint32_t numTaps = S->numTaps;                 /* Number of taps */
+  uint32_t i, tapCnt, blkCnt, outBlockSize = blockSize / S->M;  /* Loop counters */
+  uint32_t blkCntN2;
+  q31_t x1;
+  q31_t acc0, acc1;
+  q31_t *px0, *px1;
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+  /* Total number of output samples to be computed */
+
+  blkCnt = outBlockSize / 2;
+  blkCntN2 = outBlockSize - (2 * blkCnt);
+
+  while(blkCnt > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = 2 * S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    acc0 = 0;
+    acc1 = 0;
+
+    /* Initialize state pointer */
+    px0 = pState;
+    px1 = pState + S->M;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.
+     ** Repeat until we've computed numTaps-(numTaps%4) coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *(pb);
+
+      /* Read x[n-numTaps-1] for sample 0 and sample 1 */
+      x0 = *(px0);
+      x1 = *(px1);
+
+      /* Perform the multiply-accumulate */
+      acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+      acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *(pb + 1u);
+
+      /* Read x[n-numTaps-2] for sample 0 and sample 1 */
+      x0 = *(px0 + 1u);
+      x1 = *(px1 + 1u);
+
+      /* Perform the multiply-accumulate */
+      acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+      acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *(pb + 2u);
+
+      /* Read x[n-numTaps-3] for sample 0 and sample 1 */
+      x0 = *(px0 + 2u);
+      x1 = *(px1 + 2u);
+      pb += 4u;
+
+      /* Perform the multiply-accumulate */
+      acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+      acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *(pb - 1u);
+
+      /* Read x[n-numTaps-4] for sample 0 and sample 1 */
+      x0 = *(px0 + 3u);
+      x1 = *(px1 + 3u);
+
+
+      /* Perform the multiply-accumulate */
+      acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+      acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+      /* update state pointers */
+      px0 += 4u;
+      px1 += 4u;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *(pb++);
+
+      /* Fetch 1 state variable */
+      x0 = *(px0++);
+      x1 = *(px1++);
+
+      /* Perform the multiply-accumulate */
+      acc0 = (q31_t) ((((q63_t) acc0 << 32) + ((q63_t) x0 * c0)) >> 32);
+      acc1 = (q31_t) ((((q63_t) acc1 << 32) + ((q63_t) x1 * c0)) >> 32);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M * 2;
+
+    /* The result is in the 2.30 accumulator; shift left by 1 to store it as 1.31 in the destination buffer. */
+    *pDst++ = (q31_t) (acc0 << 1);
+    *pDst++ = (q31_t) (acc1 << 1);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  while(blkCntN2 > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    sum0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.
+     ** Repeat until we've computed numTaps-(numTaps%4) coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-1] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 = (q31_t) ((((q63_t) sum0 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-2] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 = (q31_t) ((((q63_t) sum0 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-3] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 = (q31_t) ((((q63_t) sum0 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-4] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 = (q31_t) ((((q63_t) sum0 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *(pb++);
+
+      /* Fetch 1 state variable */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 = (q31_t) ((((q63_t) sum0 << 32) + ((q63_t) x0 * c0)) >> 32);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* The result is in the 2.30 accumulator; shift left by 1 to store it as 1.31 in the destination buffer. */
+    *pDst++ = (q31_t) (sum0 << 1);
+
+    /* Decrement the loop counter */
+    blkCntN2--;
+  }
+
+  /* Processing is complete.
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+  i = (numTaps - 1u) % 0x04u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+}
+
+/**    
+ * @} end of FIR_decimate group    
+ */
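
The multiply-accumulate expression in arm_fir_decimate_fast_q31() keeps only the high 32 bits of each 64-bit product, which is where the precision loss described in the function's documentation comes from. Below is a minimal standalone sketch of one such step and of the final conversion to 1.31, assuming plain C99. The helper names `fast_mac_q31_sketch` and `acc_to_q31_sketch` are hypothetical and are not part of CMSIS-DSP.

```c
#include <stdint.h>

/* Hypothetical helper: one multiply-accumulate step that keeps only the high
 * 32 bits of the 64-bit product, mirroring the expression used in the diff.
 * Multiplying by 2^32 replaces the left shift so that a negative accumulator
 * does not trigger undefined behaviour. The right shift assumes an
 * arithmetic shift for negative values (implementation-defined in C).
 * Inputs must be scaled so that the 64-bit sum does not overflow. */
static int32_t fast_mac_q31_sketch(int32_t acc, int32_t x, int32_t c)
{
    int64_t wide = (int64_t) acc * 4294967296LL + (int64_t) x * c;
    return (int32_t) (wide >> 32);
}

/* Hypothetical helper: convert the 2.30 accumulator to a 1.31 result.
 * No saturation is applied, matching the plain "<< 1" in the code. */
static int32_t acc_to_q31_sketch(int32_t acc)
{
    return (int32_t) ((uint32_t) acc << 1);
}
```

For 0.5 x 0.5 (0x40000000 x 0x40000000) the step returns 0x10000000, and the final shift gives 0x20000000, which is 0.25 in Q31. A product as small as 1 x 1 falls entirely in the discarded low word and contributes nothing to the accumulator.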

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,117 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_decimate_init_f32.c    
+*    
+* Description:  Floating-point FIR Decimator initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_decimate    
+ * @{    
+ */
+
+/**    
+ * @brief  Initialization function for the floating-point FIR decimator.    
+ * @param[in,out] *S points to an instance of the floating-point FIR decimator structure.    
+ * @param[in] numTaps  number of coefficients in the filter.    
+ * @param[in] M  decimation factor.    
+ * @param[in] *pCoeffs points to the filter coefficients.    
+ * @param[in] *pState points to the state buffer.    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return    The function returns ARM_MATH_SUCCESS if initialization was successful or ARM_MATH_LENGTH_ERROR if    
+ * <code>blockSize</code> is not a multiple of <code>M</code>.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>numTaps+blockSize-1</code> words where <code>blockSize</code> is the number of input samples passed to <code>arm_fir_decimate_f32()</code>.    
+ * <code>M</code> is the decimation factor.    
+ */
+
+arm_status arm_fir_decimate_init_f32(
+  arm_fir_decimate_instance_f32 * S,
+  uint16_t numTaps,
+  uint8_t M,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  uint32_t blockSize)
+{
+  arm_status status;
+
+  /* The size of the input block must be a multiple of the decimation factor */
+  if((blockSize % M) != 0u)
+  {
+    /* Set status as ARM_MATH_LENGTH_ERROR */
+    status = ARM_MATH_LENGTH_ERROR;
+  }
+  else
+  {
+    /* Assign filter taps */
+    S->numTaps = numTaps;
+
+    /* Assign coefficient pointer */
+    S->pCoeffs = pCoeffs;
+
+    /* Clear state buffer and size is always (blockSize + numTaps - 1) */
+    memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(float32_t));
+
+    /* Assign state pointer */
+    S->pState = pState;
+
+    /* Assign Decimation Factor */
+    S->M = M;
+
+    status = ARM_MATH_SUCCESS;
+  }
+
+  return (status);
+
+}
+
+/**    
+ * @} end of FIR_decimate group    
+ */
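
The documentation of arm_fir_decimate_init_f32() fixes three conventions: coefficients are stored in time-reversed order, the state buffer holds numTaps+blockSize-1 samples, and blockSize must be a multiple of M. The following is a minimal reference sketch of a decimator that follows those conventions, assuming plain C99 and a state buffer zeroed by the caller. `fir_decimate_ref` is a hypothetical name and is not the CMSIS-DSP implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reference decimator, not the CMSIS-DSP implementation.
 * coeffsRev: numTaps coefficients in time-reversed order {b[numTaps-1], ..., b[0]}
 * state:     numTaps + blockSize - 1 floats; the first numTaps - 1 hold history
 * Returns 0 on success, -1 if blockSize is not a multiple of M. */
static int fir_decimate_ref(const float *coeffsRev, size_t numTaps, size_t M,
                            float *state, const float *src, float *dst,
                            size_t blockSize)
{
    if (numTaps == 0 || M == 0 || (blockSize % M) != 0) {
        return -1;
    }

    /* Append the new block after the numTaps - 1 history samples. */
    memcpy(state + (numTaps - 1), src, blockSize * sizeof(float));

    /* One output per M inputs; each output sees a window of numTaps samples. */
    for (size_t k = 0; k < blockSize / M; k++) {
        const float *window = state + k * M;
        float sum = 0.0f;
        for (size_t i = 0; i < numTaps; i++) {
            sum += window[i] * coeffsRev[i];
        }
        dst[k] = sum;
    }

    /* Keep the last numTaps - 1 samples as history for the next call. */
    memmove(state, state + blockSize, (numTaps - 1) * sizeof(float));
    return 0;
}
```

With b = {1, 2, 3} stored as {3, 2, 1}, M = 2, and a block of four ones on a zeroed six-element state buffer, this sketch produces the outputs 1 and 6, and a blockSize of 3 is rejected.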

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,119 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_decimate_init_q15.c    
+*    
+* Description:  Initialization function for the Q15 FIR Decimator.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_decimate    
+ * @{    
+ */
+
+/**    
+ * @brief  Initialization function for the Q15 FIR decimator.    
+ * @param[in,out] *S points to an instance of the Q15 FIR decimator structure.    
+ * @param[in] numTaps  number of coefficients in the filter.    
+ * @param[in] M  decimation factor.    
+ * @param[in] *pCoeffs points to the filter coefficients.    
+ * @param[in] *pState points to the state buffer.    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return    The function returns ARM_MATH_SUCCESS if initialization was successful or ARM_MATH_LENGTH_ERROR if    
+ * <code>blockSize</code> is not a multiple of <code>M</code>.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>numTaps+blockSize-1</code> words where <code>blockSize</code> is the number of input samples
+ * passed to <code>arm_fir_decimate_q15()</code>.
+ * <code>M</code> is the decimation factor.    
+ */
+
+arm_status arm_fir_decimate_init_q15(
+  arm_fir_decimate_instance_q15 * S,
+  uint16_t numTaps,
+  uint8_t M,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  uint32_t blockSize)
+{
+
+  arm_status status;
+
+  /* The size of the input block must be a multiple of the decimation factor */
+  if((blockSize % M) != 0u)
+  {
+    /* Set status as ARM_MATH_LENGTH_ERROR */
+    status = ARM_MATH_LENGTH_ERROR;
+  }
+  else
+  {
+    /* Assign filter taps */
+    S->numTaps = numTaps;
+
+    /* Assign coefficient pointer */
+    S->pCoeffs = pCoeffs;
+
+    /* Clear the state buffer.  The size of buffer is always (blockSize + numTaps - 1) */
+    memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(q15_t));
+
+    /* Assign state pointer */
+    S->pState = pState;
+
+    /* Assign Decimation factor */
+    S->M = M;
+
+    status = ARM_MATH_SUCCESS;
+  }
+
+  return (status);
+
+}
+
+/**    
+ * @} end of FIR_decimate group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,117 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_decimate_init_q31.c    
+*    
+* Description:  Initialization function for Q31 FIR Decimation filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_decimate    
+ * @{    
+ */
+
+/**    
+ * @brief  Initialization function for the Q31 FIR decimator.    
+ * @param[in,out] *S points to an instance of the Q31 FIR decimator structure.    
+ * @param[in] numTaps  number of coefficients in the filter.    
+ * @param[in] M  decimation factor.    
+ * @param[in] *pCoeffs points to the filter coefficients.    
+ * @param[in] *pState points to the state buffer.    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return    The function returns ARM_MATH_SUCCESS if initialization was successful or ARM_MATH_LENGTH_ERROR if    
+ * <code>blockSize</code> is not a multiple of <code>M</code>.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>numTaps+blockSize-1</code> words where <code>blockSize</code> is the number of input samples passed to <code>arm_fir_decimate_q31()</code>.    
+ * <code>M</code> is the decimation factor.    
+ */
+
+arm_status arm_fir_decimate_init_q31(
+  arm_fir_decimate_instance_q31 * S,
+  uint16_t numTaps,
+  uint8_t M,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  uint32_t blockSize)
+{
+  arm_status status;
+
+  /* The size of the input block must be a multiple of the decimation factor */
+  if((blockSize % M) != 0u)
+  {
+    /* Set status as ARM_MATH_LENGTH_ERROR */
+    status = ARM_MATH_LENGTH_ERROR;
+  }
+  else
+  {
+    /* Assign filter taps */
+    S->numTaps = numTaps;
+
+    /* Assign coefficient pointer */
+    S->pCoeffs = pCoeffs;
+
+    /* Clear the state buffer.  The size is always (blockSize + numTaps - 1) */
+    memset(pState, 0, (numTaps + (blockSize - 1)) * sizeof(q31_t));
+
+    /* Assign state pointer */
+    S->pState = pState;
+
+    /* Assign Decimation factor */
+    S->M = M;
+
+    status = ARM_MATH_SUCCESS;
+  }
+
+  return (status);
+
+}
+
+/**    
+ * @} end of FIR_decimate group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,696 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_decimate_q15.c    
+*    
+* Description:	Q15 FIR Decimator.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_decimate    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q15 FIR decimator.    
+ * @param[in] *S points to an instance of the Q15 FIR decimator structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the location where the output result is written.    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result.    
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.    
+ * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.    
+ * After all additions have been performed, the accumulator is truncated to 34.15 format by discarding low 15 bits.    
+ * Lastly, the accumulator is saturated to yield a result in 1.15 format.    
+ *    
+ * \par    
+ * Refer to the function <code>arm_fir_decimate_fast_q15()</code> for a faster but less precise implementation of this function for Cortex-M3 and Cortex-M4.    
+ */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+void arm_fir_decimate_q15(
+  const arm_fir_decimate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q15_t *px;                                     /* Temporary pointer for state buffer */
+  q15_t *pb;                                     /* Temporary pointer coefficient buffer */
+  q31_t x0, x1, c0, c1;                          /* Temporary variables to hold state and coefficient values */
+  q63_t sum0;                                    /* Accumulators */
+  q63_t acc0, acc1;
+  q15_t *px0, *px1;
+  uint32_t blkCntN3;
+  uint32_t numTaps = S->numTaps;                 /* Number of taps */
+  uint32_t i, blkCnt, tapCnt, outBlockSize = blockSize / S->M;  /* Loop counters */
+
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+
+  /* Output samples are computed in pairs; blkCntN3 is the odd leftover (0 or 1) */
+  blkCnt = outBlockSize / 2;
+  blkCntN3 = outBlockSize - (2 * blkCnt);
+
+
+  while(blkCnt > 0u)
+  {
+    /* Copy 2 * M new input samples (enough for two outputs) into the state buffer */
+    i = 2 * S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    acc0 = 0;
+    acc1 = 0;
+
+    /* Initialize state pointer */
+    px0 = pState;
+
+    px1 = pState + S->M;
+
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.       
+     ** Repeat until we've computed numTaps-4 coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] and b[numTaps-2] coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* Read the x[n-numTaps-1] and x[n-numTaps-2] samples */
+      x0 = *__SIMD32(px0)++;
+
+      x1 = *__SIMD32(px1)++;
+
+      /* Perform the multiply-accumulate */
+      acc0 = __SMLALD(x0, c0, acc0);
+
+      acc1 = __SMLALD(x1, c0, acc1);
+
+      /* Read the b[numTaps-3] and b[numTaps-4] coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* Read the x[n-numTaps-3] and x[n-numTaps-4] samples */
+      x0 = *__SIMD32(px0)++;
+
+      x1 = *__SIMD32(px1)++;
+
+      /* Perform the multiply-accumulate */
+      acc0 = __SMLALD(x0, c0, acc0);
+
+      acc1 = __SMLALD(x1, c0, acc1);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px0++;
+
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 = __SMLALD(x0, c0, acc0);
+      acc1 = __SMLALD(x1, c0, acc1);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by twice the decimation factor
+     * to process the next pair of output samples */
+    pState = pState + S->M * 2;
+
+    /* Store filter output: the 64-bit accumulators hold 34.30 values, */
+    /* so downscale by 15 and saturate to get the output in 1.15 */
+    *pDst++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+    *pDst++ = (q15_t) (__SSAT((acc1 >> 15), 16));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+
+  while(blkCntN3 > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set sum to zero */
+    sum0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.       
+     ** Repeat until we've computed numTaps-4 coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] and b[numTaps-2] coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* Read the x[n-numTaps-1] and x[n-numTaps-2] samples */
+      x0 = *__SIMD32(px)++;
+
+      /* Read the b[numTaps-3] and b[numTaps-4] coefficients */
+      c1 = *__SIMD32(pb)++;
+
+      /* Perform the multiply-accumulate */
+      sum0 = __SMLALD(x0, c0, sum0);
+
+      /* Read the x[n-numTaps-3] and x[n-numTaps-4] samples */
+      x0 = *__SIMD32(px)++;
+
+      /* Perform the multiply-accumulate */
+      sum0 = __SMLALD(x0, c1, sum0);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 = __SMLALD(x0, c0, sum0);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* Store filter output: the 64-bit accumulator holds a 34.30 value, */
+    /* so downscale by 15 and saturate to get the output in 1.15 */
+    *pDst++ = (q15_t) (__SSAT((sum0 >> 15), 16));
+
+    /* Decrement the loop counter */
+    blkCntN3--;
+  }
+
+  /* Processing is complete.       
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+  i = (numTaps - 1u) % 0x04u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+}
+
+#else
+
+
+void arm_fir_decimate_q15(
+  const arm_fir_decimate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q15_t *px;                                     /* Temporary pointer for state buffer */
+  q15_t *pb;                                     /* Temporary pointer coefficient buffer */
+  q15_t x0, x1, c0;                              /* Temporary variables to hold state and coefficient values */
+  q63_t sum0;                                    /* Accumulators */
+  q63_t acc0, acc1;
+  q15_t *px0, *px1;
+  uint32_t blkCntN3;
+  uint32_t numTaps = S->numTaps;                 /* Number of taps */
+  uint32_t i, blkCnt, tapCnt, outBlockSize = blockSize / S->M;  /* Loop counters */
+
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+
+  /* Output samples are computed in pairs; blkCntN3 is the odd leftover (0 or 1) */
+  blkCnt = outBlockSize / 2;
+  blkCntN3 = outBlockSize - (2 * blkCnt);
+
+  while(blkCnt > 0u)
+  {
+    /* Copy 2 * M new input samples (enough for two outputs) into the state buffer */
+    i = 2 * S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    acc0 = 0;
+    acc1 = 0;
+
+    /* Initialize state pointer */
+    px0 = pState;
+
+    px1 = pState + S->M;
+
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.       
+     ** Repeat until we've computed numTaps-4 coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-1] for sample 0 and for sample 1 */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-2] for sample 0 and sample 1 */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-3] for sample 0 and sample 1 */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-4] for sample 0 and sample 1 */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px0++;
+      x1 = *px1++;
+
+      /* Perform the multiply-accumulate */
+      acc0 += x0 * c0;
+      acc1 += x1 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by twice the decimation factor
+     * to process the next pair of output samples */
+    pState = pState + S->M * 2;
+
+    /* Store filter output: the 64-bit accumulators hold 34.30 values, */
+    /* so downscale by 15 and saturate to get the output in 1.15 */
+
+    *pDst++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+    *pDst++ = (q15_t) (__SSAT((acc1 >> 15), 16));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  while(blkCntN3 > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set sum to zero */
+    sum0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.       
+     ** Repeat until we've computed numTaps-4 coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-1] sample */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-2] sample */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-3] sample */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *pb++;
+
+      /* Read x[n-numTaps-4] sample */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += x0 * c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor       
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* Store filter output: the 64-bit accumulator holds a 34.30 value, */
+    /* so downscale by 15 and saturate to get the output in 1.15 */
+    *pDst++ = (q15_t) (__SSAT((sum0 >> 15), 16));
+
+    /* Decrement the loop counter */
+    blkCntN3--;
+  }
+
+  /* Processing is complete.       
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+  i = (numTaps - 1u) % 0x04u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+}
+
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+#else
+
+
+void arm_fir_decimate_q15(
+  const arm_fir_decimate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q15_t *px;                                     /* Temporary pointer for state buffer */
+  q15_t *pb;                                     /* Temporary pointer coefficient buffer */
+  q31_t x0, c0;                                  /* Temporary variables to hold state and coefficient values */
+  q63_t sum0;                                    /* Accumulators */
+  uint32_t numTaps = S->numTaps;                 /* Number of taps */
+  uint32_t i, blkCnt, tapCnt, outBlockSize = blockSize / S->M;  /* Loop counters */
+
+
+
+/* Run the below code for Cortex-M0 */
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+  /* Total number of output samples to be computed */
+  blkCnt = outBlockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set sum to zero */
+    sum0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += (q31_t) x0 *c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor           
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* Store filter output: the 64-bit accumulator holds a 34.30 value, */
+    /* so downscale by 15 and saturate to get the output in 1.15 */
+    *pDst++ = (q15_t) (__SSAT((sum0 >> 15), 16));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.       
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = numTaps - 1u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+
+}
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+
+/**    
+ * @} end of FIR_decimate group    
+ */
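
The three build variants above (SIMD, unaligned-access disabled, and Cortex-M0) all follow the same steps: append M new samples to the state buffer, run the taps over a window that starts M samples later for each output, shift the 64-bit sum right by 15, saturate to 16 bits, and finally move the last numTaps - 1 samples to the front of the state buffer. A minimal portable sketch of those steps, assuming plain `int16_t` data in place of `q15_t` and no CMSIS headers, might look as follows. The names `ref_fir_decimate_q15` and `sat_q15` are hypothetical, and the function is meant as a reference model for checking results, not as a replacement for the optimized code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Saturate a 64-bit value to the signed 16-bit (1.15) range. */
static int16_t sat_q15(int64_t v)
{
  if (v > 32767) { return 32767; }
  if (v < -32768) { return -32768; }
  return (int16_t) v;
}

/* Hypothetical reference model of a Q15 FIR decimator.
 * coeffs : numTaps coefficients in stored (time-reversed) order
 * state  : numTaps + blockSize - 1 samples; the first numTaps - 1 are history
 * src    : blockSize input samples, blockSize being a multiple of M
 * dst    : blockSize / M output samples */
static void ref_fir_decimate_q15(const int16_t *coeffs, uint32_t numTaps,
                                 uint32_t M, int16_t *state,
                                 const int16_t *src, int16_t *dst,
                                 uint32_t blockSize)
{
  int16_t *cur = state + (numTaps - 1u);  /* where new samples are written */
  int16_t *win = state;                   /* start of the current tap window */
  uint32_t nOut, o, i, t;

  assert(numTaps >= 1u && M != 0u && (blockSize % M) == 0u);
  nOut = blockSize / M;

  for (o = 0u; o < nOut; o++)
  {
    /* Append M new input samples to the state buffer */
    for (i = 0u; i < M; i++) { *cur++ = *src++; }

    /* Accumulate 2.30 products in a 64-bit (34.30) accumulator */
    int64_t acc = 0;
    for (t = 0u; t < numTaps; t++) { acc += (int32_t) win[t] * coeffs[t]; }

    /* Drop 15 fractional bits, then saturate to 1.15.
     * Right-shifting a negative value is assumed to be arithmetic here. */
    *dst++ = sat_q15(acc >> 15);

    /* The next output starts M samples later */
    win += M;
  }

  /* Keep the last numTaps - 1 samples as history for the next call */
  memmove(state, win, (numTaps - 1u) * sizeof(int16_t));
}
```

With two taps of 0.5 (0x4000), M = 2 and a constant input of 0x2000, the first output sees one zero history sample and the second sees two real samples, which makes the state handling easy to check by hand.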

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_decimate_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,311 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_decimate_q31.c    
+*    
+* Description:	Q31 FIR Decimator.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_decimate    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q31 FIR decimator.    
+ * @param[in] *S points to an instance of the Q31 FIR decimator structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return none    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows it wraps around rather than clipping.
+ * In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits (where log2 is read as log to the base 2).    
+ * After all multiply-accumulates are performed, the 2.62 accumulator is shifted right by 31 bits and the result is truncated to 1.31 format; the code below applies no explicit saturation at that step.
+ *    
+ * \par    
+ * Refer to the function <code>arm_fir_decimate_fast_q31()</code> for a faster but less precise implementation of this function for Cortex-M3 and Cortex-M4.    
+ */
+
+void arm_fir_decimate_q31(
+  const arm_fir_decimate_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pState = S->pState;                     /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q31_t x0, c0;                                  /* Temporary variables to hold state and coefficient values */
+  q31_t *px;                                     /* Temporary pointers for state buffer */
+  q31_t *pb;                                     /* Temporary pointers for coefficient buffer */
+  q63_t sum0;                                    /* Accumulator */
+  uint32_t numTaps = S->numTaps;                 /* Number of taps */
+  uint32_t i, tapCnt, blkCnt, outBlockSize = blockSize / S->M;  /* Loop counters */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+  /* Total number of output samples to be computed */
+  blkCnt = outBlockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    sum0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.    
+     ** Repeat until we've computed numTaps-4 coefficients. */
+    while(tapCnt > 0u)
+    {
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-1] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += (q63_t) x0 *c0;
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-2] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += (q63_t) x0 *c0;
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-3] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += (q63_t) x0 *c0;
+
+      /* Read the b[numTaps-4] coefficient */
+      c0 = *(pb++);
+
+      /* Read x[n-numTaps-4] sample */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += (q63_t) x0 *c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *(pb++);
+
+      /* Fetch 1 state variable */
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulate */
+      sum0 += (q63_t) x0 *c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor    
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* The result is in the accumulator, store in the destination buffer. */
+    *pDst++ = (q31_t) (sum0 >> 31);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.    
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+  i = (numTaps - 1u) % 0x04u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+#else
+
+/* Run the below code for Cortex-M0 */
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+  /* Total number of output samples to be computed */
+  blkCnt = outBlockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy decimation factor number of new input samples into the state buffer */
+    i = S->M;
+
+    do
+    {
+      *pStateCurnt++ = *pSrc++;
+
+    } while(--i);
+
+    /* Set accumulator to zero */
+    sum0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = pCoeffs;
+
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Read coefficients */
+      c0 = *pb++;
+
+      /* Fetch 1 state variable */
+      x0 = *px++;
+
+      /* Perform the multiply-accumulate */
+      sum0 += (q63_t) x0 *c0;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance the state pointer by the decimation factor           
+     * to process the next group of decimation factor number samples */
+    pState = pState + S->M;
+
+    /* The result is in the accumulator, store in the destination buffer. */
+    *pDst++ = (q31_t) (sum0 >> 31);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.       
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = numTaps - 1u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR_decimate group    
+ */
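
The Q31 documentation above notes that the 2.62 accumulator has a single guard bit and that the input should be scaled down by log2(numTaps) bits to rule out overflow. A small sketch of that arithmetic, assuming plain `int32_t`/`int64_t` in place of `q31_t`/`q63_t`, is given below. The helper names are hypothetical, and rounding log2(numTaps) up for tap counts that are not powers of two is an assumption made here, not something the source states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest s with 2^s >= numTaps, i.e. ceil(log2(numTaps)).
 * Intended as the right shift to apply to the input before filtering. */
static uint32_t q31_guard_shift(uint16_t numTaps)
{
  uint32_t s = 0u;
  assert(numTaps >= 1u);
  while ((1u << s) < numTaps) { s++; }
  return s;
}

/* One multiply-accumulate step: a 1.31 by 1.31 product is a 2.62 value. */
static int64_t q31_mac(int64_t acc, int32_t x, int32_t c)
{
  return acc + (int64_t) x * c;
}

/* Convert the 2.62 accumulator to 1.31 the way the listing does:
 * shift right by 31 and keep the low 32 bits (no saturation). */
static int32_t q31_from_acc(int64_t acc)
{
  return (int32_t) (acc >> 31);
}
```

For example, 0.5 times 0.5 (0x40000000 in both operands) accumulates to 2^60, which converts back to 0x20000000, i.e. 0.25 in 1.31 format.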

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,997 @@
+/* ----------------------------------------------------------------------  
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.  
+*  
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*  
+* Project: 	    CMSIS DSP Library  
+* Title:	    arm_fir_f32.c  
+*  
+* Description:	Floating-point FIR filter processing function.  
+*  
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**  
+* @ingroup groupFilters  
+*/
+
+/**  
+* @defgroup FIR Finite Impulse Response (FIR) Filters  
+*  
+* This set of functions implements Finite Impulse Response (FIR) filters  
+* for Q7, Q15, Q31, and floating-point data types.  Fast versions of Q15 and Q31 are also provided.  
+* The functions operate on blocks of input and output data and each call to the function processes  
+* <code>blockSize</code> samples through the filter.  <code>pSrc</code> and  
+* <code>pDst</code> point to input and output arrays containing <code>blockSize</code> values.  
+*  
+* \par Algorithm:  
+* The FIR filter algorithm is based upon a sequence of multiply-accumulate (MAC) operations.  
+* Each filter coefficient <code>b[n]</code> is multiplied by a state variable which equals a previous input sample <code>x[n]</code>.  
+* <pre>  
+*    y[n] = b[0] * x[n] + b[1] * x[n-1] + b[2] * x[n-2] + ...+ b[numTaps-1] * x[n-numTaps+1]  
+* </pre>  
+* \par  
+* \image html FIR.gif "Finite Impulse Response filter"  
+* \par  
+* <code>pCoeffs</code> points to a coefficient array of size <code>numTaps</code>.  
+* Coefficients are stored in time reversed order.  
+* \par  
+* <pre>  
+*    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}  
+* </pre>  
+* \par  
+* <code>pState</code> points to a state array of size <code>numTaps + blockSize - 1</code>.  
+* Samples in the state buffer are stored in the following order.  
+* \par  
+* <pre>  
+*    {x[n-numTaps+1], x[n-numTaps+2], ..., x[n-1], x[n], x[n+1], ..., x[n+blockSize-1]}  
+* </pre>  
+* \par  
+* Note that the length of the state buffer exceeds the length of the coefficient array by <code>blockSize-1</code>.  
+* The increased state buffer length allows circular addressing, which is traditionally used in the FIR filters,  
+* to be avoided and yields a significant speed improvement.  
+* The state variables are updated after each block of data is processed; the coefficients are untouched.  
+* \par Instance Structure  
+* The coefficients and state variables for a filter are stored together in an instance data structure.  
+* A separate instance structure must be defined for each filter.  
+* Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.  
+* There are separate instance structure declarations for each of the 4 supported data types.  
+*  
+* \par Initialization Functions  
+* There is also an associated initialization function for each data type.  
+* The initialization function performs the following operations:  
+* - Sets the values of the internal structure fields.  
+* - Zeros out the values in the state buffer.  
+* To do this manually without calling the init function, assign the following subfields of the instance structure:
+* numTaps, pCoeffs, pState. Also set all of the values in pState to zero. 
+*  
+* \par  
+* Use of the initialization function is optional.  
+* However, if the initialization function is used, then the instance structure cannot be placed into a const data section.  
+* To place an instance structure into a const data section, the instance structure must be manually initialized.  
+* Set the values in the state buffer to zeros before static initialization.  
+* The code below statically initializes the instance structure for each of the 4 supported data types:  
+* <pre>  
+*arm_fir_instance_f32 S = {numTaps, pState, pCoeffs};  
+*arm_fir_instance_q31 S = {numTaps, pState, pCoeffs};  
+*arm_fir_instance_q15 S = {numTaps, pState, pCoeffs};  
+*arm_fir_instance_q7 S =  {numTaps, pState, pCoeffs};  
+* </pre>  
+*  
+* where <code>numTaps</code> is the number of filter coefficients in the filter; <code>pState</code> is the address of the state buffer;  
+* <code>pCoeffs</code> is the address of the coefficient buffer.  
+*  
+* \par Fixed-Point Behavior  
+* Care must be taken when using the fixed-point versions of the FIR filter functions.  
+* In particular, the overflow and saturation behavior of the accumulator used in each function must be considered.  
+* Refer to the function specific documentation below for usage guidelines.  
+*/
+
+/**  
+* @addtogroup FIR  
+* @{  
+*/
+
+/**  
+*  
+* @param[in]  *S points to an instance of the floating-point FIR filter structure.  
+* @param[in]  *pSrc points to the block of input data.  
+* @param[out] *pDst points to the block of output data.  
+* @param[in]  blockSize number of samples to process per call.  
+* @return     none.  
+*  
+*/
+
+#if defined(ARM_MATH_CM7)
+
+void arm_fir_f32(
+const arm_fir_instance_f32 * S,
+float32_t * pSrc,
+float32_t * pDst,
+uint32_t blockSize)
+{
+   float32_t *pState = S->pState;                 /* State pointer */
+   float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+   float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+   float32_t *px, *pb;                            /* Temporary pointers for state and coefficient buffers */
+   float32_t acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7;     /* Accumulators */
+   float32_t x0, x1, x2, x3, x4, x5, x6, x7, c0;  /* Temporary variables to hold state and coefficient values */
+   uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+   uint32_t i, tapCnt, blkCnt;                    /* Loop counters */
+
+   /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+   /* pStateCurnt points to the location where the new input data should be written */
+   pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+   /* Apply loop unrolling and compute 8 output values simultaneously.  
+    * The variables acc0 ... acc7 hold output values that are being computed:  
+    *  
+    *    acc0 =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps+2] + b[numTaps-3] * x[n-numTaps+3] +...+ b[0] * x[n]  
+    *    acc1 =  b[numTaps-1] * x[n-numTaps+2] + b[numTaps-2] * x[n-numTaps+3] + b[numTaps-3] * x[n-numTaps+4] +...+ b[0] * x[n+1]  
+    *    acc2 =  b[numTaps-1] * x[n-numTaps+3] + b[numTaps-2] * x[n-numTaps+4] + b[numTaps-3] * x[n-numTaps+5] +...+ b[0] * x[n+2]  
+    *    acc3 =  b[numTaps-1] * x[n-numTaps+4] + b[numTaps-2] * x[n-numTaps+5] + b[numTaps-3] * x[n-numTaps+6] +...+ b[0] * x[n+3]  
+    */
+   blkCnt = blockSize >> 3;
+
+   /* First part of the processing with loop unrolling.  Compute 8 outputs at a time.  
+   ** a second loop below computes the remaining 1 to 7 samples. */
+   while(blkCnt > 0u)
+   {
+      /* Copy four new input samples into the state buffer */
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+
+      /* Set all accumulators to zero */
+      acc0 = 0.0f;
+      acc1 = 0.0f;
+      acc2 = 0.0f;
+      acc3 = 0.0f;
+      acc4 = 0.0f;
+      acc5 = 0.0f;
+      acc6 = 0.0f;
+      acc7 = 0.0f;		
+
+      /* Initialize state pointer */
+      px = pState;
+
+      /* Initialize coeff pointer */
+      pb = (pCoeffs);		
+   
+      /* This is separated from the others to avoid 
+       * a call to __aeabi_memmove which would be slower
+       */
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+
+      /* Read the first seven samples from the state buffer:  x[n-numTaps], x[n-numTaps-1], x[n-numTaps-2] */
+      x0 = *px++;
+      x1 = *px++;
+      x2 = *px++;
+      x3 = *px++;
+      x4 = *px++;
+      x5 = *px++;
+      x6 = *px++;
+
+      /* Loop unrolling.  Process 8 taps at a time. */
+      tapCnt = numTaps >> 3u;
+      
+      /* Loop over the number of taps.  Unroll by a factor of 8.  
+       ** Repeat until numTaps - (numTaps % 8) coefficients have been processed. */
+      while(tapCnt > 0u)
+      {
+         /* Read the b[numTaps-1] coefficient */
+         c0 = *(pb++);
+
+         /* Read x[n-numTaps-3] sample */
+         x7 = *(px++);
+
+         /* acc0 +=  b[numTaps-1] * x[n-numTaps] */
+         acc0 += x0 * c0;
+
+         /* acc1 +=  b[numTaps-1] * x[n-numTaps-1] */
+         acc1 += x1 * c0;
+
+         /* acc2 +=  b[numTaps-1] * x[n-numTaps-2] */
+         acc2 += x2 * c0;
+
+         /* acc3 +=  b[numTaps-1] * x[n-numTaps-3] */
+         acc3 += x3 * c0;
+
+         /* acc4 +=  b[numTaps-1] * x[n-numTaps-4] */
+         acc4 += x4 * c0;
+
+         /* acc5 +=  b[numTaps-1] * x[n-numTaps-5] */
+         acc5 += x5 * c0;
+
+         /* acc6 +=  b[numTaps-1] * x[n-numTaps-6] */
+         acc6 += x6 * c0;
+
+         /* acc7 +=  b[numTaps-1] * x[n-numTaps-7] */
+         acc7 += x7 * c0;
+         
+         /* Read the b[numTaps-2] coefficient */
+         c0 = *(pb++);
+
+         /* Read x[n-numTaps-4] sample */
+         x0 = *(px++);
+
+         /* Perform the multiply-accumulate */
+         acc0 += x1 * c0;
+         acc1 += x2 * c0;   
+         acc2 += x3 * c0;   
+         acc3 += x4 * c0;   
+         acc4 += x5 * c0;   
+         acc5 += x6 * c0;   
+         acc6 += x7 * c0;   
+         acc7 += x0 * c0;   
+         
+         /* Read the b[numTaps-3] coefficient */
+         c0 = *(pb++);
+
+         /* Read x[n-numTaps-5] sample */
+         x1 = *(px++);
+
+         /* Perform the multiply-accumulates */      
+         acc0 += x2 * c0;
+         acc1 += x3 * c0;   
+         acc2 += x4 * c0;   
+         acc3 += x5 * c0;   
+         acc4 += x6 * c0;   
+         acc5 += x7 * c0;   
+         acc6 += x0 * c0;   
+         acc7 += x1 * c0;   
+
+         /* Read the b[numTaps-4] coefficient */
+         c0 = *(pb++);
+
+         /* Read x[n-numTaps-6] sample */
+         x2 = *(px++);
+
+         /* Perform the multiply-accumulates */      
+         acc0 += x3 * c0;
+         acc1 += x4 * c0;   
+         acc2 += x5 * c0;   
+         acc3 += x6 * c0;   
+         acc4 += x7 * c0;   
+         acc5 += x0 * c0;   
+         acc6 += x1 * c0;   
+         acc7 += x2 * c0;   
+
+         /* Read the b[numTaps-5] coefficient */
+         c0 = *(pb++);
+
+         /* Read the next sample from the state buffer */
+         x3 = *(px++);
+         /* Perform the multiply-accumulates */      
+         acc0 += x4 * c0;
+         acc1 += x5 * c0;   
+         acc2 += x6 * c0;   
+         acc3 += x7 * c0;   
+         acc4 += x0 * c0;   
+         acc5 += x1 * c0;   
+         acc6 += x2 * c0;   
+         acc7 += x3 * c0;   
+
+         /* Read the b[numTaps-6] coefficient */
+         c0 = *(pb++);
+
+         /* Read the next sample from the state buffer */
+         x4 = *(px++);
+
+         /* Perform the multiply-accumulates */      
+         acc0 += x5 * c0;
+         acc1 += x6 * c0;   
+         acc2 += x7 * c0;   
+         acc3 += x0 * c0;   
+         acc4 += x1 * c0;   
+         acc5 += x2 * c0;   
+         acc6 += x3 * c0;   
+         acc7 += x4 * c0;   
+
+         /* Read the b[numTaps-7] coefficient */
+         c0 = *(pb++);
+
+         /* Read the next sample from the state buffer */
+         x5 = *(px++);
+
+         /* Perform the multiply-accumulates */      
+         acc0 += x6 * c0;
+         acc1 += x7 * c0;   
+         acc2 += x0 * c0;   
+         acc3 += x1 * c0;   
+         acc4 += x2 * c0;   
+         acc5 += x3 * c0;   
+         acc6 += x4 * c0;   
+         acc7 += x5 * c0;   
+
+         /* Read the b[numTaps-8] coefficient */
+         c0 = *(pb++);
+
+         /* Read the next sample from the state buffer */
+         x6 = *(px++);
+
+         /* Perform the multiply-accumulates */      
+         acc0 += x7 * c0;
+         acc1 += x0 * c0;   
+         acc2 += x1 * c0;   
+         acc3 += x2 * c0;   
+         acc4 += x3 * c0;   
+         acc5 += x4 * c0;   
+         acc6 += x5 * c0;   
+         acc7 += x6 * c0;   
+
+         tapCnt--;
+      }
+
+      /* If the filter length is not a multiple of 8, compute the remaining filter taps */
+      tapCnt = numTaps % 0x8u;
+
+      while(tapCnt > 0u)
+      {
+         /* Read coefficients */
+         c0 = *(pb++);
+
+         /* Fetch 1 state variable */
+         x7 = *(px++);
+
+         /* Perform the multiply-accumulates */      
+         acc0 += x0 * c0;
+         acc1 += x1 * c0;   
+         acc2 += x2 * c0;   
+         acc3 += x3 * c0;   
+         acc4 += x4 * c0;   
+         acc5 += x5 * c0;   
+         acc6 += x6 * c0;   
+         acc7 += x7 * c0;   
+
+         /* Reuse the present sample states for next sample */
+         x0 = x1;
+         x1 = x2;
+         x2 = x3;
+         x3 = x4;
+         x4 = x5;
+         x5 = x6;
+         x6 = x7;
+
+         /* Decrement the loop counter */
+         tapCnt--;
+      }
+
+      /* Advance the state pointer by 8 to process the next group of 8 samples */
+      pState = pState + 8;
+
+      /* Store the results from the 8 accumulators in the destination buffer. */
+      *pDst++ = acc0;
+      *pDst++ = acc1;
+      *pDst++ = acc2;
+      *pDst++ = acc3;
+      *pDst++ = acc4;
+      *pDst++ = acc5;
+      *pDst++ = acc6;
+      *pDst++ = acc7;
+
+      blkCnt--;
+   }
+
+   /* If the blockSize is not a multiple of 8, compute any remaining output samples here.  
+   ** No loop unrolling is used. */
+   blkCnt = blockSize % 0x8u;
+
+   while(blkCnt > 0u)
+   {
+      /* Copy one sample at a time into state buffer */
+      *pStateCurnt++ = *pSrc++;
+
+      /* Set the accumulator to zero */
+      acc0 = 0.0f;
+
+      /* Initialize state pointer */
+      px = pState;
+
+      /* Initialize Coefficient pointer */
+      pb = (pCoeffs);
+
+      i = numTaps;
+
+      /* Perform the multiply-accumulates */
+      do
+      {
+         acc0 += *px++ * *pb++;
+         i--;
+
+      } while(i > 0u);
+
+      /* Store the result in the destination buffer. */
+      *pDst++ = acc0;
+
+      /* Advance state pointer by 1 for the next sample */
+      pState = pState + 1;
+
+      blkCnt--;
+   }
+
+   /* Processing is complete.  
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.  
+   ** This prepares the state buffer for the next function call. */
+
+   /* Points to the start of the state buffer */
+   pStateCurnt = S->pState;
+
+   tapCnt = (numTaps - 1u) >> 2u;
+
+   /* copy data */
+   while(tapCnt > 0u)
+   {
+      *pStateCurnt++ = *pState++;
+      *pStateCurnt++ = *pState++;
+      *pStateCurnt++ = *pState++;
+      *pStateCurnt++ = *pState++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+   }
+
+   /* Calculate remaining number of copies */
+   tapCnt = (numTaps - 1u) % 0x4u;
+
+   /* Copy the remaining float32_t data */
+   while(tapCnt > 0u)
+   {
+      *pStateCurnt++ = *pState++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+   }
+}
+
+#elif defined(ARM_MATH_CM0_FAMILY)
+
+void arm_fir_f32(
+const arm_fir_instance_f32 * S,
+float32_t * pSrc,
+float32_t * pDst,
+uint32_t blockSize)
+{
+   float32_t *pState = S->pState;                 /* State pointer */
+   float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+   float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+   float32_t *px, *pb;                            /* Temporary pointers for state and coefficient buffers */
+   uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+   uint32_t i, tapCnt, blkCnt;                    /* Loop counters */
+
+   /* Run the below code for Cortex-M0 */
+
+   float32_t acc;
+
+   /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+   /* pStateCurnt points to the location where the new input data should be written */
+   pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+   /* Initialize blkCnt with blockSize */
+   blkCnt = blockSize;
+
+   while(blkCnt > 0u)
+   {
+      /* Copy one sample at a time into state buffer */
+      *pStateCurnt++ = *pSrc++;
+
+      /* Set the accumulator to zero */
+      acc = 0.0f;
+
+      /* Initialize state pointer */
+      px = pState;
+
+      /* Initialize Coefficient pointer */
+      pb = pCoeffs;
+
+      i = numTaps;
+
+      /* Perform the multiply-accumulates */
+      do
+      {
+         /* acc =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps+2] + b[numTaps-3] * x[n-numTaps+3] +...+ b[0] * x[n] */
+         acc += *px++ * *pb++;
+         i--;
+
+      } while(i > 0u);
+
+      /* Store the result in the destination buffer. */
+      *pDst++ = acc;
+
+      /* Advance state pointer by 1 for the next sample */
+      pState = pState + 1;
+
+      blkCnt--;
+   }
+
+   /* Processing is complete.         
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+   /* Points to the start of the state buffer */
+   pStateCurnt = S->pState;
+
+   /* Copy (numTaps - 1) values */
+   tapCnt = numTaps - 1u;
+
+   /* Copy data */
+   while(tapCnt > 0u)
+   {
+      *pStateCurnt++ = *pState++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+   }
+
+}
+
+#else
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+void arm_fir_f32(
+const arm_fir_instance_f32 * S,
+float32_t * pSrc,
+float32_t * pDst,
+uint32_t blockSize)
+{
+   float32_t *pState = S->pState;                 /* State pointer */
+   float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+   float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+   float32_t *px, *pb;                            /* Temporary pointers for state and coefficient buffers */
+   float32_t acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7;     /* Accumulators */
+   float32_t x0, x1, x2, x3, x4, x5, x6, x7, c0;  /* Temporary variables to hold state and coefficient values */
+   uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+   uint32_t i, tapCnt, blkCnt;                    /* Loop counters */
+   float32_t p0,p1,p2,p3,p4,p5,p6,p7;             /* Temporary product values */
+
+   /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+   /* pStateCurnt points to the location where the new input data should be written */
+   pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+   /* Apply loop unrolling and compute 8 output values simultaneously.  
+    * The variables acc0 ... acc7 hold output values that are being computed:  
+    *  
+    *    acc0 =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps+2] + b[numTaps-3] * x[n-numTaps+3] +...+ b[0] * x[n]  
+    *    acc1 =  b[numTaps-1] * x[n-numTaps+2] + b[numTaps-2] * x[n-numTaps+3] + b[numTaps-3] * x[n-numTaps+4] +...+ b[0] * x[n+1]  
+    *    acc2 =  b[numTaps-1] * x[n-numTaps+3] + b[numTaps-2] * x[n-numTaps+4] + b[numTaps-3] * x[n-numTaps+5] +...+ b[0] * x[n+2]  
+    *    acc3 =  b[numTaps-1] * x[n-numTaps+4] + b[numTaps-2] * x[n-numTaps+5] + b[numTaps-3] * x[n-numTaps+6] +...+ b[0] * x[n+3]  
+    */
+   blkCnt = blockSize >> 3;
+
+   /* First part of the processing with loop unrolling.  Compute 8 outputs at a time.  
+   ** a second loop below computes the remaining 1 to 7 samples. */
+   while(blkCnt > 0u)
+   {
+      /* Copy four new input samples into the state buffer */
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+
+      /* Set all accumulators to zero */
+      acc0 = 0.0f;
+      acc1 = 0.0f;
+      acc2 = 0.0f;
+      acc3 = 0.0f;
+      acc4 = 0.0f;
+      acc5 = 0.0f;
+      acc6 = 0.0f;
+      acc7 = 0.0f;		
+
+      /* Initialize state pointer */
+      px = pState;
+
+      /* Initialize coeff pointer */
+      pb = (pCoeffs);		
+   
+      /* This is separated from the others to avoid 
+       * a call to __aeabi_memmove which would be slower
+       */
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+      *pStateCurnt++ = *pSrc++;
+
+      /* Read the first seven samples from the state buffer:  x[n-numTaps], x[n-numTaps-1], x[n-numTaps-2] */
+      x0 = *px++;
+      x1 = *px++;
+      x2 = *px++;
+      x3 = *px++;
+      x4 = *px++;
+      x5 = *px++;
+      x6 = *px++;
+
+      /* Loop unrolling.  Process 8 taps at a time. */
+      tapCnt = numTaps >> 3u;
+      
+      /* Loop over the number of taps.  Unroll by a factor of 8.  
+       ** Repeat until numTaps - (numTaps % 8) coefficients have been processed. */
+      while(tapCnt > 0u)
+      {
+         /* Read the b[numTaps-1] coefficient */
+         c0 = *(pb++);
+
+         /* Read x[n-numTaps-3] sample */
+         x7 = *(px++);
+
+         /* acc0 +=  b[numTaps-1] * x[n-numTaps] */
+         p0 = x0 * c0;
+
+         /* acc1 +=  b[numTaps-1] * x[n-numTaps-1] */
+         p1 = x1 * c0;
+
+         /* acc2 +=  b[numTaps-1] * x[n-numTaps-2] */
+         p2 = x2 * c0;
+
+         /* acc3 +=  b[numTaps-1] * x[n-numTaps-3] */
+         p3 = x3 * c0;
+
+         /* acc4 +=  b[numTaps-1] * x[n-numTaps-4] */
+         p4 = x4 * c0;
+
+         /* acc5 +=  b[numTaps-1] * x[n-numTaps-5] */
+         p5 = x5 * c0;
+
+         /* acc6 +=  b[numTaps-1] * x[n-numTaps-6] */
+         p6 = x6 * c0;
+
+         /* acc7 +=  b[numTaps-1] * x[n-numTaps-7] */
+         p7 = x7 * c0;
+         
+         /* Read the b[numTaps-2] coefficient */
+         c0 = *(pb++);
+
+         /* Read x[n-numTaps-4] sample */
+         x0 = *(px++);
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+
+
+         /* Perform the multiply-accumulate */
+         p0 = x1 * c0;
+         p1 = x2 * c0;   
+         p2 = x3 * c0;   
+         p3 = x4 * c0;   
+         p4 = x5 * c0;   
+         p5 = x6 * c0;   
+         p6 = x7 * c0;   
+         p7 = x0 * c0;   
+         
+         /* Read the b[numTaps-3] coefficient */
+         c0 = *(pb++);
+
+         /* Read x[n-numTaps-5] sample */
+         x1 = *(px++);
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+
+         /* Perform the multiply-accumulates */      
+         p0 = x2 * c0;
+         p1 = x3 * c0;   
+         p2 = x4 * c0;   
+         p3 = x5 * c0;   
+         p4 = x6 * c0;   
+         p5 = x7 * c0;   
+         p6 = x0 * c0;   
+         p7 = x1 * c0;   
+
+         /* Read the b[numTaps-4] coefficient */
+         c0 = *(pb++);
+
+         /* Read x[n-numTaps-6] sample */
+         x2 = *(px++);
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+
+         /* Perform the multiply-accumulates */      
+         p0 = x3 * c0;
+         p1 = x4 * c0;   
+         p2 = x5 * c0;   
+         p3 = x6 * c0;   
+         p4 = x7 * c0;   
+         p5 = x0 * c0;   
+         p6 = x1 * c0;   
+         p7 = x2 * c0;   
+
+         /* Read the b[numTaps-5] coefficient */
+         c0 = *(pb++);
+
+         /* Read the next sample from the state buffer */
+         x3 = *(px++);
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+
+         /* Perform the multiply-accumulates */      
+         p0 = x4 * c0;
+         p1 = x5 * c0;   
+         p2 = x6 * c0;   
+         p3 = x7 * c0;   
+         p4 = x0 * c0;   
+         p5 = x1 * c0;   
+         p6 = x2 * c0;   
+         p7 = x3 * c0;   
+
+         /* Read the b[numTaps-6] coefficient */
+         c0 = *(pb++);
+
+         /* Read the next sample from the state buffer */
+         x4 = *(px++);
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+
+         /* Perform the multiply-accumulates */      
+         p0 = x5 * c0;
+         p1 = x6 * c0;   
+         p2 = x7 * c0;   
+         p3 = x0 * c0;   
+         p4 = x1 * c0;   
+         p5 = x2 * c0;   
+         p6 = x3 * c0;   
+         p7 = x4 * c0;   
+
+         /* Read the b[numTaps-7] coefficient */
+         c0 = *(pb++);
+
+         /* Read the next sample from the state buffer */
+         x5 = *(px++);
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+
+         /* Perform the multiply-accumulates */      
+         p0 = x6 * c0;
+         p1 = x7 * c0;   
+         p2 = x0 * c0;   
+         p3 = x1 * c0;   
+         p4 = x2 * c0;   
+         p5 = x3 * c0;   
+         p6 = x4 * c0;   
+         p7 = x5 * c0;   
+
+         /* Read the b[numTaps-8] coefficient */
+         c0 = *(pb++);
+
+         /* Read the next sample from the state buffer */
+         x6 = *(px++);
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+
+         /* Perform the multiply-accumulates */      
+         p0 = x7 * c0;
+         p1 = x0 * c0;   
+         p2 = x1 * c0;   
+         p3 = x2 * c0;   
+         p4 = x3 * c0;   
+         p5 = x4 * c0;   
+         p6 = x5 * c0;   
+         p7 = x6 * c0;   
+
+         tapCnt--;
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+      }
+
+      /* If the filter length is not a multiple of 8, compute the remaining filter taps */
+      tapCnt = numTaps % 0x8u;
+
+      while(tapCnt > 0u)
+      {
+         /* Read coefficients */
+         c0 = *(pb++);
+
+         /* Fetch 1 state variable */
+         x7 = *(px++);
+
+         /* Perform the multiply-accumulates */      
+         p0 = x0 * c0;
+         p1 = x1 * c0;   
+         p2 = x2 * c0;   
+         p3 = x3 * c0;   
+         p4 = x4 * c0;   
+         p5 = x5 * c0;   
+         p6 = x6 * c0;   
+         p7 = x7 * c0;   
+
+         /* Reuse the present sample states for next sample */
+         x0 = x1;
+         x1 = x2;
+         x2 = x3;
+         x3 = x4;
+         x4 = x5;
+         x5 = x6;
+         x6 = x7;
+         
+         acc0 += p0;
+         acc1 += p1;
+         acc2 += p2;
+         acc3 += p3;
+         acc4 += p4;
+         acc5 += p5;
+         acc6 += p6;
+         acc7 += p7;
+
+         /* Decrement the loop counter */
+         tapCnt--;
+      }
+
+      /* Advance the state pointer by 8 to process the next group of 8 samples */
+      pState = pState + 8;
+
+      /* Store the results from the 8 accumulators in the destination buffer. */
+      *pDst++ = acc0;
+      *pDst++ = acc1;
+      *pDst++ = acc2;
+      *pDst++ = acc3;
+      *pDst++ = acc4;
+      *pDst++ = acc5;
+      *pDst++ = acc6;
+      *pDst++ = acc7;
+
+      blkCnt--;
+   }
+
+   /* If the blockSize is not a multiple of 8, compute any remaining output samples here.  
+   ** No loop unrolling is used. */
+   blkCnt = blockSize % 0x8u;
+
+   while(blkCnt > 0u)
+   {
+      /* Copy one sample at a time into state buffer */
+      *pStateCurnt++ = *pSrc++;
+
+      /* Set the accumulator to zero */
+      acc0 = 0.0f;
+
+      /* Initialize state pointer */
+      px = pState;
+
+      /* Initialize Coefficient pointer */
+      pb = (pCoeffs);
+
+      i = numTaps;
+
+      /* Perform the multiply-accumulates */
+      do
+      {
+         acc0 += *px++ * *pb++;
+         i--;
+
+      } while(i > 0u);
+
+      /* Store the result in the destination buffer. */
+      *pDst++ = acc0;
+
+      /* Advance state pointer by 1 for the next sample */
+      pState = pState + 1;
+
+      blkCnt--;
+   }
+
+   /* Processing is complete.  
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.  
+   ** This prepares the state buffer for the next function call. */
+
+   /* Points to the start of the state buffer */
+   pStateCurnt = S->pState;
+
+   tapCnt = (numTaps - 1u) >> 2u;
+
+   /* copy data */
+   while(tapCnt > 0u)
+   {
+      *pStateCurnt++ = *pState++;
+      *pStateCurnt++ = *pState++;
+      *pStateCurnt++ = *pState++;
+      *pStateCurnt++ = *pState++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+   }
+
+   /* Calculate remaining number of copies */
+   tapCnt = (numTaps - 1u) % 0x4u;
+
+   /* Copy the remaining float32_t data */
+   while(tapCnt > 0u)
+   {
+      *pStateCurnt++ = *pState++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+   }
+}
+
+#endif 
+
+/**  
+* @} end of FIR group  
+*/
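
The linear state-buffer scheme documented in arm_fir_f32.c above (new samples appended after the retained numTaps - 1 samples, coefficients stored in time-reversed order, tail copied back to the front after each block) can be sketched without any loop unrolling. This is a minimal reference sketch, not the library implementation: `ref_fir_t` and `ref_fir_f32` are hypothetical names chosen for illustration, and plain `float` stands in for `float32_t` so the block compiles without arm_math.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for arm_fir_instance_f32; same field order. */
typedef struct {
    size_t numTaps;        /* number of filter coefficients */
    float *pState;         /* numTaps + blockSize - 1 floats, zeroed before first use */
    const float *pCoeffs;  /* time-reversed: {b[numTaps-1], ..., b[1], b[0]} */
} ref_fir_t;

/* Hypothetical unoptimized FIR that uses the same buffer layout. */
static void ref_fir_f32(const ref_fir_t *S, const float *pSrc,
                        float *pDst, size_t blockSize)
{
    assert(S != NULL && S->numTaps > 0);

    float *history = S->pState;                      /* oldest retained sample */
    float *incoming = S->pState + (S->numTaps - 1);  /* first slot for new samples */

    /* Append the new block after the numTaps - 1 samples kept from the last call. */
    memcpy(incoming, pSrc, blockSize * sizeof(float));

    for (size_t n = 0; n < blockSize; n++) {
        float acc = 0.0f;

        /* The window for output n starts at history[n]; because the
         * coefficients are reversed, both arrays are walked forward. */
        for (size_t k = 0; k < S->numTaps; k++) {
            acc += history[n + k] * S->pCoeffs[k];
        }
        pDst[n] = acc;
    }

    /* Slide the last numTaps - 1 samples to the front for the next call. */
    memmove(S->pState, S->pState + blockSize,
            (S->numTaps - 1) * sizeof(float));
}
```

Because the sample window only ever moves forward through a buffer that is numTaps + blockSize - 1 long, no wrap-around test is needed inside the inner loop; the cost is one copy of numTaps - 1 samples per block.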

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_fast_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_fast_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,345 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_fast_q15.c    
+*    
+* Description:  Q15 Fast FIR filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR    
+ * @{    
+ */
+
+/**    
+ * @param[in] *S points to an instance of the Q15 FIR filter structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data.    
+ * @param[in] blockSize number of samples to process per call.    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * This fast version uses a 32-bit accumulator with 2.30 format.    
+ * The accumulator maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows it wraps around and distorts the result.    
+ * In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits.    
+ * The 2.30 accumulator is then truncated to 2.15 format and saturated to yield the 1.15 result.    
+ *    
+ * \par    
+ * Refer to the function <code>arm_fir_q15()</code> for a slower implementation of this function which uses 64-bit accumulation to avoid wrap around distortion.  Both the slow and the fast versions use the same instance structure.    
+ * Use the function <code>arm_fir_init_q15()</code> to initialize the filter structure.    
+ */
+
+void arm_fir_fast_q15(
+  const arm_fir_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulators */
+  q15_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q15_t *px;                                     /* Temporary pointer for state buffer; read two samples at a time via __SIMD32 */
+  q31_t x0, x1, x2, c0;                          /* Temporary variables to hold SIMD state and coefficient values */
+  uint32_t numTaps = S->numTaps;                 /* Number of taps in the filter */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+
+
+  /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Apply loop unrolling and compute 4 output values simultaneously.      
+   * The variables acc0 ... acc3 hold output values that are being computed:      
+   *      
+   *    acc0 =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0]      
+   *    acc1 =  b[numTaps-1] * x[n-numTaps] +   b[numTaps-2] * x[n-numTaps-1] + b[numTaps-3] * x[n-numTaps-2] +...+ b[0] * x[1]      
+   *    acc2 =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps] +   b[numTaps-3] * x[n-numTaps-1] +...+ b[0] * x[2]      
+   *    acc3 =  b[numTaps-1] * x[n-numTaps+2] + b[numTaps-2] * x[n-numTaps+1] + b[numTaps-3] * x[n-numTaps]   +...+ b[0] * x[3]      
+   */
+
+  blkCnt = blockSize >> 2;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.      
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Copy four new input samples into the state buffer,
+     ** one 16-bit sample at a time. */
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+
+
+    /* Set all accumulators to zero */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Initialize the state pointer; samples are read in pairs through __SIMD32 below */
+    px = pState;
+
+    /* Initialize the coefficient pointer; coefficients are read in pairs through __SIMD32 below */
+    pb = pCoeffs;
+
+    /* Read the first two samples from the state buffer:  x[n-N], x[n-N-1] */
+    x0 = *__SIMD32(px)++;
+
+    /* Read the third and fourth samples from the state buffer: x[n-N-2], x[n-N-3] */
+    x2 = *__SIMD32(px)++;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.      
+     ** Repeat until we've computed numTaps-(numTaps%4) coefficients. */
+    tapCnt = numTaps >> 2;
+
+    while(tapCnt > 0)
+    {
+      /* Read the first two coefficients using SIMD:  b[N] and b[N-1] coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* acc0 +=  b[N] * x[n-N] + b[N-1] * x[n-N-1] */
+      acc0 = __SMLAD(x0, c0, acc0);
+
+      /* acc2 +=  b[N] * x[n-N-2] + b[N-1] * x[n-N-3] */
+      acc2 = __SMLAD(x2, c0, acc2);
+
+      /* pack  x[n-N-1] and x[n-N-2] */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x2, x0, 0);
+#else
+      x1 = __PKHBT(x0, x2, 0);
+#endif
+
+      /* Read state x[n-N-4], x[n-N-5] */
+      x0 = _SIMD32_OFFSET(px);
+
+      /* acc1 +=  b[N] * x[n-N-1] + b[N-1] * x[n-N-2] */
+      acc1 = __SMLADX(x1, c0, acc1);
+
+      /* pack  x[n-N-3] and x[n-N-4] */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x0, x2, 0);
+#else
+      x1 = __PKHBT(x2, x0, 0);
+#endif
+
+      /* acc3 +=  b[N] * x[n-N-3] + b[N-1] * x[n-N-4] */
+      acc3 = __SMLADX(x1, c0, acc3);
+
+      /* Read coefficients b[N-2], b[N-3] */
+      c0 = *__SIMD32(pb)++;
+
+      /* acc0 +=  b[N-2] * x[n-N-2] + b[N-3] * x[n-N-3] */
+      acc0 = __SMLAD(x2, c0, acc0);
+
+      /* Read state x[n-N-6], x[n-N-7] with offset */
+      x2 = _SIMD32_OFFSET(px + 2u);
+
+      /* acc2 +=  b[N-2] * x[n-N-4] + b[N-3] * x[n-N-5] */
+      acc2 = __SMLAD(x0, c0, acc2);
+
+      /* acc1 +=  b[N-2] * x[n-N-3] + b[N-3] * x[n-N-4] */
+      acc1 = __SMLADX(x1, c0, acc1);
+
+      /* pack  x[n-N-5] and x[n-N-6] */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x2, x0, 0);
+#else
+      x1 = __PKHBT(x0, x2, 0);
+#endif
+
+      /* acc3 +=  b[N-2] * x[n-N-5] + b[N-3] * x[n-N-6] */
+      acc3 = __SMLADX(x1, c0, acc3);
+
+      /* Update state pointer for next state reading */
+      px += 4u;
+
+      /* Decrement tap count */
+      tapCnt--;
+
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps.
+     ** This is always 2 taps since the filter length is even. */
+    if((numTaps & 0x3u) != 0u)
+    {
+
+      /* Read last two coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* Perform the multiply-accumulates */
+      acc0 = __SMLAD(x0, c0, acc0);
+      acc2 = __SMLAD(x2, c0, acc2);
+
+      /* pack state variables */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x2, x0, 0);
+#else
+      x1 = __PKHBT(x0, x2, 0);
+#endif
+
+      /* Read last state variables */
+      x0 = *__SIMD32(px);
+
+      /* Perform the multiply-accumulates */
+      acc1 = __SMLADX(x1, c0, acc1);
+
+      /* pack state variables */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x0, x2, 0);
+#else
+      x1 = __PKHBT(x2, x0, 0);
+#endif
+
+      /* Perform the multiply-accumulates */
+      acc3 = __SMLADX(x1, c0, acc3);
+    }
+
+    /* The results in the 4 accumulators are in 2.30 format.  Convert to 1.15 with saturation.       
+     ** Then store the 4 outputs in the destination buffer. */
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+#else
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+
+#endif /*      #ifndef ARM_MATH_BIG_ENDIAN       */
+
+    /* Advance the state pointer by 4 to process the next group of 4 samples */
+    pState = pState + 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.      
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+  while(blkCnt > 0u)
+  {
+    /* Copy one sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set the accumulator to zero */
+    acc0 = 0;
+
+    /* Initialize state and coefficient pointers (scalar accesses in this loop) */
+    px = pState;
+    pb = pCoeffs;
+
+    tapCnt = numTaps >> 1u;
+
+    do
+    {
+
+      acc0 += (q31_t) *px++ * *pb++;
+      acc0 += (q31_t) *px++ * *pb++;
+
+      tapCnt--;
+    }
+    while(tapCnt > 0u);
+
+    /* The result is in 2.30 format.  Convert to 1.15 with saturation.      
+     ** Then store the output in the destination buffer. */
+    *pDst++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.      
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  /* Calculation of count for copying integer writes */
+  tapCnt = (numTaps - 1u) >> 2;
+
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    tapCnt--;
+
+  }
+
+  /* Calculation of count for remaining q15_t data */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* copy remaining data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+}
+
+/**    
+ * @} end of FIR group    
+ */
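
Note on the Q15 scaling described above: the following is a minimal scalar sketch in portable C, not the library's implementation. It shows one output sample accumulated in a 32-bit 2.30 accumulator and then shifted and saturated to 1.15. The names `sat16` and `fir_q15_one_output` are hypothetical; `sat16` stands in for `__SSAT(x, 16)`. The sketch assumes the accumulator does not overflow (the real function wraps around in that case) and that `>>` on a negative value is an arithmetic shift, as on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturate a 32-bit value to the signed 16-bit range,
 * standing in for __SSAT(x, 16). */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar reference: one FIR output computed from numTaps state
 * samples and numTaps coefficients (both Q15, i.e. 1.15 format).
 * Assumes the 32-bit accumulator does not overflow. */
static int16_t fir_q15_one_output(const int16_t *state,
                                  const int16_t *coeffs,
                                  uint32_t numTaps)
{
    int32_t acc = 0;

    assert(numTaps > 0u);

    for (uint32_t k = 0u; k < numTaps; k++) {
        /* 1.15 x 1.15 product is in 2.30 format */
        acc += (int32_t)state[k] * coeffs[k];
    }

    /* 2.30 -> 2.15 by dropping 15 fractional bits, then saturate to 1.15 */
    return sat16(acc >> 15);
}
```

With two taps of 0.5 (16384 in Q15) applied to two samples of 0.5, the sketch returns 16384, that is 0.5. A single tap of -1.0 applied to a sample of -1.0 produces +1.0 in the accumulator, which is not representable in 1.15 and saturates to 32767.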

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_fast_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_fast_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,305 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_fast_q31.c    
+*    
+* Description:	Processing function for the Q31 Fast FIR filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR    
+ * @{    
+ */
+
+/**    
+ * @param[in] *S points to an instance of the Q31 FIR filter structure.
+ * @param[in] *pSrc points to the block of input data.
+ * @param[out] *pDst points to the block of output data.
+ * @param[in] blockSize number of samples to process per call.    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * This function is optimized for speed at the expense of fixed-point precision and overflow protection.    
+ * The result of each 1.31 x 1.31 multiplication is truncated to 2.30 format.    
+ * These intermediate results are added to a 2.30 accumulator.    
+ * Finally, the accumulator is saturated and converted to a 1.31 result.    
+ * The fast version has the same overflow behavior as the standard version and provides less precision since it discards the low 32 bits of each multiplication result.    
+ * In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits.    
+ *    
+ * \par    
+ * Refer to the function <code>arm_fir_q31()</code> for a slower implementation of this function which uses a 64-bit accumulator to provide higher precision.  Both the slow and the fast versions use the same instance structure.    
+ * Use the function <code>arm_fir_init_q31()</code> to initialize the filter structure.    
+ */
+
+IAR_ONLY_LOW_OPTIMIZATION_ENTER
+void arm_fir_fast_q31(
+  const arm_fir_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pState = S->pState;                     /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q31_t x0, x1, x2, x3;                          /* Temporary variables to hold state */
+  q31_t c0;                                      /* Temporary variable to hold coefficient value */
+  q31_t *px;                                     /* Temporary pointer for state */
+  q31_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulators */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  uint32_t i, tapCnt, blkCnt;                    /* Loop counters */
+
+  /* S->pState points to buffer which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Apply loop unrolling and compute 4 output values simultaneously.    
+   * The variables acc0 ... acc3 hold output values that are being computed:    
+   *    
+   *    acc0 =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0]    
+   *    acc1 =  b[numTaps-1] * x[n-numTaps] +   b[numTaps-2] * x[n-numTaps-1] + b[numTaps-3] * x[n-numTaps-2] +...+ b[0] * x[1]    
+   *    acc2 =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps] +   b[numTaps-3] * x[n-numTaps-1] +...+ b[0] * x[2]    
+   *    acc3 =  b[numTaps-1] * x[n-numTaps+2] + b[numTaps-2] * x[n-numTaps+1] + b[numTaps-3] * x[n-numTaps]   +...+ b[0] * x[3]    
+   */
+  blkCnt = blockSize >> 2;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Copy four new input samples into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set all accumulators to zero */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+    /* Read the first three samples from the state buffer:    
+     *  x[n-numTaps], x[n-numTaps-1], x[n-numTaps-2] */
+    x0 = *(px++);
+    x1 = *(px++);
+    x2 = *(px++);
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+    i = tapCnt;
+
+    while(i > 0u)
+    {
+      /* Read the b[numTaps] coefficient */
+      c0 = *pb;
+
+      /* Read x[n-numTaps-3] sample */
+      x3 = *px;
+
+      /* acc0 +=  b[numTaps] * x[n-numTaps] */
+      multAcc_32x32_keep32_R(acc0, x0, c0);
+
+      /* acc1 +=  b[numTaps] * x[n-numTaps-1] */
+      multAcc_32x32_keep32_R(acc1, x1, c0);
+
+      /* acc2 +=  b[numTaps] * x[n-numTaps-2] */
+      multAcc_32x32_keep32_R(acc2, x2, c0);
+
+      /* acc3 +=  b[numTaps] * x[n-numTaps-3] */
+      multAcc_32x32_keep32_R(acc3, x3, c0);
+
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *(pb + 1u);
+
+      /* Read x[n-numTaps-4] sample */
+      x0 = *(px + 1u);
+
+      /* Perform the multiply-accumulates */      
+      multAcc_32x32_keep32_R(acc0, x1, c0);
+      multAcc_32x32_keep32_R(acc1, x2, c0);
+      multAcc_32x32_keep32_R(acc2, x3, c0);
+      multAcc_32x32_keep32_R(acc3, x0, c0);
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *(pb + 2u);
+
+      /* Read x[n-numTaps-5] sample */
+      x1 = *(px + 2u);
+
+      /* Perform the multiply-accumulates */      
+      multAcc_32x32_keep32_R(acc0, x2, c0);
+      multAcc_32x32_keep32_R(acc1, x3, c0);
+      multAcc_32x32_keep32_R(acc2, x0, c0);
+      multAcc_32x32_keep32_R(acc3, x1, c0);
+
+      /* Read the b[numTaps-3] coefficient */
+      c0 = *(pb + 3u);
+
+      /* Read x[n-numTaps-6] sample */
+      x2 = *(px + 3u);
+
+      /* Perform the multiply-accumulates */      
+      multAcc_32x32_keep32_R(acc0, x3, c0);
+      multAcc_32x32_keep32_R(acc1, x0, c0);
+      multAcc_32x32_keep32_R(acc2, x1, c0);
+      multAcc_32x32_keep32_R(acc3, x2, c0);
+
+      /* Update coefficient and state pointers */
+      pb += 4u;
+      px += 4u;
+      
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+
+    i = numTaps - (tapCnt * 4u);
+    while(i > 0u)
+    {
+      /* Read coefficients */
+      c0 = *(pb++);
+
+      /* Fetch 1 state variable */
+      x3 = *(px++);
+
+      /* Perform the multiply-accumulates */      
+      multAcc_32x32_keep32_R(acc0, x0, c0);
+      multAcc_32x32_keep32_R(acc1, x1, c0);
+      multAcc_32x32_keep32_R(acc2, x2, c0);
+      multAcc_32x32_keep32_R(acc3, x3, c0);
+
+      /* Reuse the present sample states for next sample */
+      x0 = x1;
+      x1 = x2;
+      x2 = x3;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 4 to process the next group of 4 samples */
+    pState = pState + 4;
+
+    /* The results in the 4 accumulators are in 2.30 format.  Convert to 1.31.
+     ** Then store the 4 outputs in the destination buffer. */
+    *pDst++ = (q31_t) (acc0 << 1);
+    *pDst++ = (q31_t) (acc1 << 1);
+    *pDst++ = (q31_t) (acc2 << 1);
+    *pDst++ = (q31_t) (acc3 << 1);
+
+    /* Decrement the samples loop counter */
+    blkCnt--;
+  }
+
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 4u;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy one sample at a time into state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set the accumulator to zero */
+    acc0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize Coefficient pointer */
+    pb = (pCoeffs);
+
+    i = numTaps;
+
+    /* Perform the multiply-accumulates */
+    do
+    {
+      multAcc_32x32_keep32_R(acc0, (*px++), (*(pb++)));
+      i--;
+    } while(i > 0u);
+
+    /* The result is in 2.30 format.  Convert to 1.31.
+     ** Then store the output in the destination buffer. */
+    *pDst++ = (q31_t) (acc0 << 1);
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the samples loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.    
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.    
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numTaps - 1u);
+
+  /* Copy the remaining q31_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+
+}
+IAR_ONLY_LOW_OPTIMIZATION_EXIT
+/**    
+ * @} end of FIR group    
+ */
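
Note on the Q31 scaling described above: the following is a minimal scalar sketch in portable C, not the library's implementation. It keeps only the high 32 bits of each 1.31 x 1.31 product, adds them to a 2.30 accumulator, and shifts left by one to return to 1.31, as the processing loop does with `acc << 1`. The names `mac_hi32` and `fir_q31_one_output` are hypothetical. The sketch does not reproduce `multAcc_32x32_keep32_R` itself; any rounding that macro applies is left out here, which is an assumption of this example. It also assumes arithmetic right shift of negative values and two's-complement conversion, as on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add the high 32 bits of a 1.31 x 1.31 product
 * to a 2.30 accumulator. The low 32 bits of the product are discarded
 * (no rounding in this sketch). */
static int32_t mac_hi32(int32_t acc, int32_t x, int32_t c)
{
    int64_t prod = (int64_t)x * c;          /* 2.62 format */
    return acc + (int32_t)(prod >> 32);     /* keep the high word: 2.30 */
}

/* Hypothetical scalar reference: one FIR output from numTaps Q31 state
 * samples and coefficients. Assumes the accumulator does not overflow. */
static int32_t fir_q31_one_output(const int32_t *state,
                                  const int32_t *coeffs,
                                  uint32_t numTaps)
{
    int32_t acc = 0;

    assert(numTaps > 0u);

    for (uint32_t k = 0u; k < numTaps; k++) {
        acc = mac_hi32(acc, state[k], coeffs[k]);
    }

    /* 2.30 -> 1.31; shift through an unsigned type to avoid undefined
     * behavior when acc is negative. No saturation is applied here. */
    return (int32_t)((uint32_t)acc << 1);
}
```

For a single tap of 0.5 (0x40000000) applied to a sample of 0.5, the sketch returns 0x20000000, that is 0.25 in Q31. Because only 32 of the 64 product bits are kept, small products lose precision compared with a 64-bit accumulator.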

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,96 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_init_f32.c    
+*    
+* Description:  Floating-point FIR filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR    
+ * @{    
+ */
+
+/**    
+ * @details    
+ *    
+ * @param[in,out] *S points to an instance of the floating-point FIR filter structure.    
+ * @param[in] 	  numTaps  Number of filter coefficients in the filter.    
+ * @param[in]     *pCoeffs points to the filter coefficients buffer.    
+ * @param[in]     *pState points to the state buffer.    
+ * @param[in] 	  blockSize number of samples that are processed per call.    
+ * @return 		  none.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>numTaps+blockSize-1</code> samples, where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_fir_f32()</code>.    
+ */
+
+void arm_fir_init_f32(
+  arm_fir_instance_f32 * S,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is (blockSize + numTaps - 1) samples */
+  memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR group    
+ */
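
Note on the state buffer and coefficient layout described above: the following is a minimal sketch in portable C, not the library's `arm_fir_f32()`. It illustrates why the state buffer holds `numTaps+blockSize-1` samples and why the coefficients are stored in time-reversed order. The function name `fir_block_f32` and its parameter order are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block FIR sketch.
 * coeffs: numTaps coefficients in time-reversed order
 *         {b[numTaps-1], ..., b[1], b[0]}
 * state:  numTaps + blockSize - 1 samples, zeroed before the first call */
static void fir_block_f32(const float *coeffs, float *state, size_t numTaps,
                          const float *src, float *dst, size_t blockSize)
{
    assert(numTaps > 0 && blockSize > 0);

    /* New samples go after the numTaps - 1 history samples kept from the
     * previous call. */
    memcpy(&state[numTaps - 1], src, blockSize * sizeof(float));

    for (size_t n = 0; n < blockSize; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < numTaps; k++) {
            /* coeffs[0] = b[numTaps-1] multiplies the oldest sample */
            acc += coeffs[k] * state[n + k];
        }
        dst[n] = acc;
    }

    /* Keep the last numTaps - 1 samples as history for the next call. */
    memmove(state, &state[blockSize], (numTaps - 1) * sizeof(float));
}
```

Feeding a unit impulse through a filter stored as {3, 2, 1} (that is, b[0]=1, b[1]=2, b[2]=3) yields 1, 2, 3, 0, whether the four samples are processed in one block of 4 or in two blocks of 2 that share the same state buffer.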

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,154 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_init_q15.c    
+*    
+* Description:  Q15 FIR filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR    
+ * @{    
+ */
+
+/**    
+ * @param[in,out]  *S points to an instance of the Q15 FIR filter structure.    
+ * @param[in] 	   numTaps  Number of filter coefficients in the filter. Must be even and greater than or equal to 4.    
+ * @param[in]      *pCoeffs points to the filter coefficients buffer.    
+ * @param[in]      *pState points to the state buffer.    
+ * @param[in]      blockSize is number of samples processed per call.    
+ * @return The function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_ARGUMENT_ERROR if    
+ * <code>numTaps</code> is not greater than or equal to 4 and even.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * Note that <code>numTaps</code> must be even and greater than or equal to 4.    
+ * To implement an odd length filter simply increase <code>numTaps</code> by 1 and set the last coefficient to zero.    
+ * For example, to implement a filter with <code>numTaps=3</code> and coefficients    
+ * <pre>    
+ *     {0.3, -0.8, 0.3}    
+ * </pre>    
+ * set <code>numTaps=4</code> and use the coefficients:    
+ * <pre>    
+ *     {0.3, -0.8, 0.3, 0}.    
+ * </pre>    
+ * Similarly, to implement a two point filter    
+ * <pre>    
+ *     {0.3, -0.3}    
+ * </pre>    
+ * set <code>numTaps=4</code> and use the coefficients:    
+ * <pre>    
+ *     {0.3, -0.3, 0, 0}.    
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>numTaps+blockSize</code> when running on Cortex-M4 and Cortex-M3, and of length <code>numTaps+blockSize-1</code> when running on Cortex-M0, where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_fir_q15()</code>.
+ */
+
+arm_status arm_fir_init_q15(
+  arm_fir_instance_q15 * S,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  uint32_t blockSize)
+{
+  arm_status status;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* The number of filter coefficients must be even and at least 4; only evenness is checked here */
+  if(numTaps & 0x1u)
+  {
+    status = ARM_MATH_ARGUMENT_ERROR;
+  }
+  else
+  {
+    /* Assign filter taps */
+    S->numTaps = numTaps;
+
+    /* Assign coefficient pointer */
+    S->pCoeffs = pCoeffs;
+
+    /* Clear the state buffer.  The size is always (blockSize + numTaps ) */
+    memset(pState, 0, (numTaps + (blockSize)) * sizeof(q15_t));
+
+    /* Assign state pointer */
+    S->pState = pState;
+
+    status = ARM_MATH_SUCCESS;
+  }
+
+  return (status);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer.  The size is always (blockSize + numTaps - 1) */
+  memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(q15_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+  status = ARM_MATH_SUCCESS;
+
+  return (status);
+
+#endif /*  #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR group    
+ */
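
Note on the padding rule described above: the following is a minimal sketch in portable C of how a caller might pad a Q15 coefficient array before initialization. The helper `pad_q15_taps` is hypothetical and not part of CMSIS-DSP. It appends zeros to the array as stored until the tap count is even and at least 4, mirroring the two examples in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy numTaps Q15 coefficients from src to dst and
 * append zeros until the count is even and at least 4.
 * dst must have room for dstLen coefficients.
 * Returns the padded tap count to use as numTaps. */
static size_t pad_q15_taps(const int16_t *src, size_t numTaps,
                           int16_t *dst, size_t dstLen)
{
    size_t padded = numTaps;

    if (padded < 4) {
        padded = 4;          /* minimum length stated in the documentation */
    }
    if (padded & 1u) {
        padded++;            /* odd length: add one zero coefficient */
    }

    assert(dstLen >= padded);

    memcpy(dst, src, numTaps * sizeof(int16_t));
    memset(dst + numTaps, 0, (padded - numTaps) * sizeof(int16_t));

    return padded;
}
```

With the three coefficients {0.3, -0.8, 0.3} expressed in Q15 as {9830, -26214, 9830}, the helper returns 4 and produces {9830, -26214, 9830, 0}. A two-point filter {9830, -9830} also pads to 4 taps, with two trailing zeros.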

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,96 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_init_q31.c    
+*    
+* Description:	Q31 FIR filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR    
+ * @{    
+ */
+
+/**    
+ * @details    
+ *    
+ * @param[in,out] *S points to an instance of the Q31 FIR filter structure.    
+ * @param[in] 	  numTaps  Number of filter coefficients in the filter.    
+ * @param[in] 	  *pCoeffs points to the filter coefficients buffer.    
+ * @param[in] 	  *pState points to the state buffer.    
+ * @param[in] 	  blockSize number of samples that are processed per call.    
+ * @return        none.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>numTaps+blockSize-1</code> samples, where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_fir_q31()</code>.    
+ */
+
+void arm_fir_init_q31(
+  arm_fir_instance_q31 * S,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear state buffer and state array size is (blockSize + numTaps - 1) */
+  memset(pState, 0, (blockSize + ((uint32_t) numTaps - 1u)) * sizeof(q31_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR group    
+ */

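Reviewer note on the time-reversed coefficient order documented above: a minimal sketch, assuming the filter was designed in natural order b[0..numTaps-1] and has to be rearranged before it is handed to arm_fir_init_q31(). The helper name reverse_coeffs_q31 is hypothetical and not part of CMSIS-DSP; int32_t stands in for q31_t so the snippet builds without arm_math.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CMSIS-DSP function).
 * Copies b[0..num_taps-1] into p_coeffs in time-reversed order:
 *   p_coeffs = { b[num_taps-1], b[num_taps-2], ..., b[1], b[0] }
 * The two arrays must not overlap. */
void reverse_coeffs_q31(const int32_t *b, int32_t *p_coeffs, size_t num_taps)
{
    for (size_t i = 0; i < num_taps; i++) {
        p_coeffs[i] = b[num_taps - 1u - i];
    }
}
```

For a symmetric (linear-phase) filter the reversed array equals the original, so the step only matters for asymmetric designs.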
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_init_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,94 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_init_q7.c    
+*    
+* Description:  Q7 FIR filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR    
+ * @{    
+ */
+/**    
+ * @param[in,out] *S points to an instance of the Q7 FIR filter structure.    
+ * @param[in] 	  numTaps  Number of filter coefficients in the filter.    
+ * @param[in] 	  *pCoeffs points to the filter coefficients buffer.    
+ * @param[in]     *pState points to the state buffer.    
+ * @param[in]     blockSize number of samples that are processed per call.    
+ * @return     	  none    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>numTaps+blockSize-1</code> samples, where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_fir_q7()</code>.    
+ */
+
+void arm_fir_init_q7(
+  arm_fir_instance_q7 * S,
+  uint16_t numTaps,
+  q7_t * pCoeffs,
+  q7_t * pState,
+  uint32_t blockSize)
+{
+
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer.  The size is always (blockSize + numTaps - 1) */
+  memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(q7_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR group    
+ */

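Reviewer note on the state buffer size used by the init functions above: a small sketch of the numTaps + blockSize - 1 rule and the zeroing step, assuming numTaps is at least 1. The names fir_state_len and fir_state_clear_q7 are hypothetical and not part of CMSIS-DSP; int8_t stands in for q7_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of samples the caller must allocate for
 * pState. Assumes num_taps >= 1 so the subtraction cannot wrap. */
size_t fir_state_len(uint16_t num_taps, uint32_t block_size)
{
    return (size_t)num_taps + (size_t)block_size - 1u;
}

/* Hypothetical helper: zero exactly the region that the init function
 * clears, and nothing beyond it. */
void fir_state_clear_q7(int8_t *state, uint16_t num_taps, uint32_t block_size)
{
    memset(state, 0, fir_state_len(num_taps, block_size) * sizeof(int8_t));
}
```

Because the init function clears the full computed length, an undersized pState array would be overrun at init time, before any filtering happens.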
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,581 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_interpolate_f32.c    
+*    
+* Description:	FIR interpolation for floating-point sequences.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @defgroup FIR_Interpolate Finite Impulse Response (FIR) Interpolator    
+ *    
+ * These functions combine an upsampler (zero stuffer) and an FIR filter.    
+ * They are used in multirate systems for increasing the sample rate of a signal without introducing high frequency images.    
+ * Conceptually, the functions are equivalent to the block diagram below:    
+ * \image html FIRInterpolator.gif "Components included in the FIR Interpolator functions"    
+ * After upsampling by a factor of <code>L</code>, the signal should be filtered by a lowpass filter with a normalized    
+ * cutoff frequency of <code>1/L</code> in order to eliminate high frequency copies of the spectrum.    
+ * The user of the function is responsible for providing the filter coefficients.    
+ *    
+ * The FIR interpolator functions provided in the CMSIS DSP Library combine the upsampler and FIR filter in an efficient manner.    
+ * The upsampler inserts <code>L-1</code> zeros between each sample.    
+ * Instead of multiplying by these zero values, the FIR filter is designed to skip them.    
+ * This leads to an efficient implementation without any wasted effort.    
+ * The functions operate on blocks of input and output data.    
+ * <code>pSrc</code> points to an array of <code>blockSize</code> input values and    
+ * <code>pDst</code> points to an array of <code>blockSize*L</code> output values.    
+ *    
+ * The library provides separate functions for Q15, Q31, and floating-point data types.    
+ *    
+ * \par Algorithm:    
+ * The functions use a polyphase filter structure:    
+ * <pre>    
+ *    y[n] = b[0] * x[n] + b[L]   * x[n-1] + ... + b[L*(phaseLength-1)] * x[n-phaseLength+1]    
+ *    y[n+1] = b[1] * x[n] + b[L+1] * x[n-1] + ... + b[L*(phaseLength-1)+1] * x[n-phaseLength+1]    
+ *    ...    
+ *    y[n+(L-1)] = b[L-1] * x[n] + b[2*L-1] * x[n-1] + ....+ b[L*(phaseLength-1)+(L-1)] * x[n-phaseLength+1]    
+ * </pre>    
+ * This approach is more efficient than straightforward upsample-then-filter algorithms.    
+ * With this method the computation is reduced by a factor of <code>L</code> when compared to using a standard FIR filter.
+ * \par    
+ * <code>pCoeffs</code> points to a coefficient array of size <code>numTaps</code>.    
+ * <code>numTaps</code> must be a multiple of the interpolation factor <code>L</code> and this is checked by the    
+ * initialization functions.    
+ * Internally, the function divides the FIR filter's impulse response into shorter filters of length    
+ * <code>phaseLength=numTaps/L</code>.    
+ * Coefficients are stored in time reversed order.    
+ * \par    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to a state array of size <code>blockSize + phaseLength - 1</code>.    
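+ * Samples in the state buffer are stored oldest first: the last <code>phaseLength-1</code> samples of the previous block
+ * (negative indices below), followed by the <code>blockSize</code> samples of the current block:
+ * \par
+ * <pre>
+ *    {x[-phaseLength+1], x[-phaseLength+2], ..., x[-1], x[0], x[1], ..., x[blockSize-1]}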
+ * </pre>    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.
+ *    
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter.    
+ * Coefficient arrays may be shared among several instances, while each instance must have its own state variable array.
+ * There are separate instance structure declarations for each of the 3 supported data types.    
+ *    
+ * \par Initialization Functions    
+ * There is also an associated initialization function for each data type.    
+ * The initialization function performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.    
+ * - Checks to make sure that the length of the filter is a multiple of the interpolation factor.    
+ * To do this manually without calling the init function, assign the following fields of the instance structure:
+ * L (interpolation factor), pCoeffs, phaseLength (numTaps / L), pState. Also set all of the values in pState to zero.
+ *    
+ * \par    
+ * Use of the initialization function is optional.    
+ * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.    
+ * To place an instance structure into a const data section, the instance structure must be manually initialized.    
+ * The code below statically initializes the filter instance structure for each of the 3 data types:
+ * <pre>    
+ * arm_fir_interpolate_instance_f32 S = {L, phaseLength, pCoeffs, pState};    
+ * arm_fir_interpolate_instance_q31 S = {L, phaseLength, pCoeffs, pState};    
+ * arm_fir_interpolate_instance_q15 S = {L, phaseLength, pCoeffs, pState};    
+ * </pre>    
+ * where <code>L</code> is the interpolation factor; <code>phaseLength=numTaps/L</code> is the    
+ * length of each of the shorter FIR filters used internally;
+ * <code>pCoeffs</code> is the address of the coefficient buffer;    
+ * <code>pState</code> is the address of the state buffer.    
+ * Be sure to set the values in the state buffer to zeros when doing static initialization.    
+ *    
+ * \par Fixed-Point Behavior    
+ * Care must be taken when using the fixed-point versions of the FIR interpolate filter functions.    
+ * In particular, the overflow and saturation behavior of the accumulator used in each function must be considered.    
+ * Refer to the function specific documentation below for usage guidelines.    
+ */
+
+/**    
+ * @addtogroup FIR_Interpolate    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the floating-point FIR interpolator.    
+ * @param[in] *S        points to an instance of the floating-point FIR interpolator structure.    
+ * @param[in] *pSrc     points to the block of input data.    
+ * @param[out] *pDst    points to the block of output data.    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return none.    
+ */
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+void arm_fir_interpolate_f32(
+  const arm_fir_interpolate_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t *pState = S->pState;                 /* State pointer */
+  float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+  float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+  float32_t *ptr1, *ptr2;                        /* Temporary pointers for state and coefficient buffers */
+  float32_t sum0;                                /* Accumulators */
+  float32_t x0, c0;                              /* Temporary variables to hold state and coefficient values */
+  uint32_t i, blkCnt, j;                         /* Loop counters */
+  uint16_t phaseLen = S->phaseLength, tapCnt;    /* Length of each polyphase filter component */
+  float32_t acc0, acc1, acc2, acc3;
+  float32_t x1, x2, x3;
+  uint32_t blkCntN4;
+  float32_t c1, c2, c3;
+
+  /* S->pState buffer contains previous frame (phaseLen - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (phaseLen - 1u);
+
+  /* Initialise  blkCnt */
+  blkCnt = blockSize / 4;
+  blkCntN4 = blockSize - (4 * blkCnt);
+
+  /* Samples loop unrolled by 4 */
+  while(blkCnt > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+
+    /* Address modifier index of coefficient buffer */
+    j = 1u;
+
+    /* Loop over the Interpolation factor. */
+    i = (S->L);
+
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      acc0 = 0.0f;
+      acc1 = 0.0f;
+      acc2 = 0.0f;
+      acc3 = 0.0f;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (S->L - j);
+
+      /* Loop over the polyPhase length. Unroll by a factor of 4.        
+       ** Repeat until we've computed numTaps-(4*S->L) coefficients. */
+      tapCnt = phaseLen >> 2u;
+
+      x0 = *(ptr1++);
+      x1 = *(ptr1++);
+      x2 = *(ptr1++);
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read the input sample */
+        x3 = *(ptr1++);
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Perform the multiply-accumulate */
+        acc0 += x0 * c0;
+        acc1 += x1 * c0;
+        acc2 += x2 * c0;
+        acc3 += x3 * c0;
+
+        /* Read the coefficient */
+        c1 = *(ptr2 + S->L);
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += x1 * c1;
+        acc1 += x2 * c1;
+        acc2 += x3 * c1;
+        acc3 += x0 * c1;
+
+        /* Read the coefficient */
+        c2 = *(ptr2 + S->L * 2);
+
+        /* Read the input sample */
+        x1 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += x2 * c2;
+        acc1 += x3 * c2;
+        acc2 += x0 * c2;
+        acc3 += x1 * c2;
+
+        /* Read the coefficient */
+        c3 = *(ptr2 + S->L * 3);
+
+        /* Read the input sample */
+        x2 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += x3 * c3;
+        acc1 += x0 * c3;
+        acc2 += x1 * c3;
+        acc3 += x2 * c3;
+
+
+        /* Upsampling is done by stuffing L-1 zeros between each sample.
+         * Instead of multiplying those zeros by coefficients,
+         * advance the coefficient pointer by 4 * L (four taps of this phase). */
+        ptr2 += 4 * S->L;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* If the polyPhase length is not a multiple of 4, compute the remaining filter taps */
+      tapCnt = phaseLen % 0x4u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read the input sample */
+        x3 = *(ptr1++);
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Perform the multiply-accumulate */
+        acc0 += x0 * c0;
+        acc1 += x1 * c0;
+        acc2 += x2 * c0;
+        acc3 += x3 * c0;
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* update states for next sample processing */
+        x0 = x1;
+        x1 = x2;
+        x2 = x3;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* The result is in the accumulator, store in the destination buffer. */
+      *pDst = acc0;
+      *(pDst + S->L) = acc1;
+      *(pDst + 2 * S->L) = acc2;
+      *(pDst + 3 * S->L) = acc3;
+
+      pDst++;
+
+      /* Increment the address modifier index of coefficient buffer */
+      j++;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 4
+     * to process the next group of four input samples */
+    pState = pState + 4;
+
+    pDst += S->L * 3;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+
+  while(blkCntN4 > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Address modifier index of coefficient buffer */
+    j = 1u;
+
+    /* Loop over the Interpolation factor. */
+    i = S->L;
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      sum0 = 0.0f;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (S->L - j);
+
+      /* Loop over the polyPhase length. Unroll by a factor of 4.        
+       ** Repeat until we've computed numTaps-(4*S->L) coefficients. */
+      tapCnt = phaseLen >> 2u;
+      while(tapCnt > 0u)
+      {
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Upsampling is done by stuffing L-1 zeros between each sample.
+         * Instead of multiplying those zeros by coefficients,
+         * advance the coefficient pointer by the interpolation factor L. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += x0 * c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += x0 * c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += x0 * c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += x0 * c0;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* If the polyPhase length is not a multiple of 4, compute the remaining filter taps */
+      tapCnt = phaseLen % 0x4u;
+
+      while(tapCnt > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum0 += *(ptr1++) * (*ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* The result is in the accumulator, store in the destination buffer. */
+      *pDst++ = sum0;
+
+      /* Increment the address modifier index of coefficient buffer */
+      j++;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 1        
+     * to process the next group of interpolation factor number samples */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCntN4--;
+  }
+
+  /* Processing is complete.        
+   ** Now copy the last phaseLen - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  tapCnt = (phaseLen - 1u) >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+  tapCnt = (phaseLen - 1u) % 0x04u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+}
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+void arm_fir_interpolate_f32(
+  const arm_fir_interpolate_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t *pState = S->pState;                 /* State pointer */
+  float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+  float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+  float32_t *ptr1, *ptr2;                        /* Temporary pointers for state and coefficient buffers */
+
+
+  float32_t sum;                                 /* Accumulator */
+  uint32_t i, blkCnt;                            /* Loop counters */
+  uint16_t phaseLen = S->phaseLength, tapCnt;    /* Length of each polyphase filter component */
+
+
+  /* S->pState buffer contains previous frame (phaseLen - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (phaseLen - 1u);
+
+  /* Total number of input samples */
+  blkCnt = blockSize;
+
+  /* Loop over the blockSize. */
+  while(blkCnt > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Loop over the Interpolation factor. */
+    i = S->L;
+
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      sum = 0.0f;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (i - 1u);
+
+      /* Loop over the polyPhase length */
+      tapCnt = phaseLen;
+
+      while(tapCnt > 0u)
+      {
+        /* Perform the multiply-accumulate */
+        sum += *ptr1++ * *ptr2;
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* The result is in the accumulator, store in the destination buffer. */
+      *pDst++ = sum;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 1           
+     * to process the next group of interpolation factor number samples */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last phaseLen - 1 samples to the start of the state buffer.       
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  tapCnt = phaseLen - 1u;
+
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+}
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+
+
+ /**    
+  * @} end of FIR_Interpolate group    
+  */

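Reviewer note on the polyphase structure used in arm_fir_interpolate_f32.c above: a plain-C sketch that contrasts the naive upsample-then-filter form with the polyphase form from the file's Algorithm section. It is not the library implementation. It assumes coefficients in natural order b[0..numTaps-1] (the library stores them time-reversed), treats samples before x[0] as zero instead of carrying a state buffer across calls, and uses the hypothetical names interp_naive and interp_polyphase.

```c
#include <stddef.h>

/* Naive reference: conceptually zero-stuff x by L, then run a direct-form
 * FIR over the upsampled signal. y receives n_in * L samples. */
void interp_naive(const float *b, size_t num_taps, size_t L,
                  const float *x, size_t n_in, float *y)
{
    for (size_t m = 0; m < n_in * L; m++) {
        float acc = 0.0f;
        for (size_t k = 0; k < num_taps && k <= m; k++) {
            size_t idx = m - k;          /* index into the upsampled signal */
            if (idx % L == 0) {          /* only every L-th sample is non-zero */
                acc += b[k] * x[idx / L];
            }
        }
        y[m] = acc;
    }
}

/* Polyphase form: output phase p of input sample n uses only the taps
 * b[p], b[L+p], b[2L+p], ... so the stuffed zeros are never visited.
 * Assumes num_taps is a multiple of L. */
void interp_polyphase(const float *b, size_t num_taps, size_t L,
                      const float *x, size_t n_in, float *y)
{
    size_t phase_len = num_taps / L;
    for (size_t n = 0; n < n_in; n++) {
        for (size_t p = 0; p < L; p++) {
            float acc = 0.0f;
            for (size_t j = 0; j < phase_len && j <= n; j++) {
                acc += b[j * L + p] * x[n - j];
            }
            y[n * L + p] = acc;
        }
    }
}
```

Both functions produce the same output sequence; the polyphase loop performs numTaps/L multiply-accumulates per output sample, whereas the naive loop walks all numTaps taps and discards most of them.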
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,121 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_interpolate_init_f32.c    
+*    
+* Description:  Floating-point FIR interpolator initialization function    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Interpolate    
+ * @{    
+ */
+
+/**    
+ * @brief  Initialization function for the floating-point FIR interpolator.    
+ * @param[in,out] *S        points to an instance of the floating-point FIR interpolator structure.    
+ * @param[in]     L         upsample factor.    
+ * @param[in]     numTaps   number of filter coefficients in the filter.    
+ * @param[in]     *pCoeffs  points to the filter coefficient buffer.    
+ * @param[in]     *pState   points to the state buffer.    
+ * @param[in]     blockSize number of input samples to process per call.    
+ * @return        The function returns ARM_MATH_SUCCESS if initialization was successful or ARM_MATH_LENGTH_ERROR if    
+ * the filter length <code>numTaps</code> is not a multiple of the interpolation factor <code>L</code>.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * The length of the filter <code>numTaps</code> must be a multiple of the interpolation factor <code>L</code>.    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>(numTaps/L)+blockSize-1</code> words    
+ * where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_fir_interpolate_f32()</code>.    
+ */
+
+arm_status arm_fir_interpolate_init_f32(
+  arm_fir_interpolate_instance_f32 * S,
+  uint8_t L,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  uint32_t blockSize)
+{
+  arm_status status;
+
+  /* The filter length must be a multiple of the interpolation factor */
+  if((numTaps % L) != 0u)
+  {
+    /* Set status as ARM_MATH_LENGTH_ERROR */
+    status = ARM_MATH_LENGTH_ERROR;
+  }
+  else
+  {
+
+    /* Assign coefficient pointer */
+    S->pCoeffs = pCoeffs;
+
+    /* Assign Interpolation factor */
+    S->L = L;
+
+    /* Assign polyPhaseLength */
+    S->phaseLength = numTaps / L;
+
+    /* Clear state buffer and size of state array is always phaseLength + blockSize - 1 */
+    memset(pState, 0,
+           (blockSize +
+            ((uint32_t) S->phaseLength - 1u)) * sizeof(float32_t));
+
+    /* Assign state pointer */
+    S->pState = pState;
+
+    status = ARM_MATH_SUCCESS;
+  }
+
+  return (status);
+
+}
+
+ /**    
+  * @} end of FIR_Interpolate group    
+  */

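Reviewer note on the length check in arm_fir_interpolate_init_f32() above: a sketch of the same bookkeeping (multiple-of-L check, phaseLength, state buffer length) as a standalone helper that a caller could run before allocating buffers. The name interp_plan and its return convention are hypothetical. The extra guards against L == 0 and numTaps == 0 are assumptions added here; the init function in this diff does not perform them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CMSIS-DSP function).
 * Returns 0 and fills *phase_len and *state_len on success.
 * Returns -1 if num_taps is zero, L is zero, or num_taps is not a
 * multiple of L; the outputs are left untouched in that case. */
int interp_plan(uint8_t L, uint16_t num_taps, uint32_t block_size,
                uint16_t *phase_len, size_t *state_len)
{
    if (L == 0u || num_taps == 0u || (num_taps % L) != 0u) {
        return -1;
    }
    *phase_len = (uint16_t)(num_taps / L);
    /* Same size the init function clears: blockSize + phaseLength - 1 */
    *state_len = (size_t)block_size + (size_t)(*phase_len) - 1u;
    return 0;
}
```

For example, L = 4 with numTaps = 16 and blockSize = 32 gives phaseLength = 4 and a state buffer of 35 samples.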
diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,120 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_interpolate_init_q15.c    
+*    
+* Description:  Q15 FIR interpolator initialization function    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Interpolate    
+ * @{    
+ */
+
+/**    
+ * @brief  Initialization function for the Q15 FIR interpolator.    
+ * @param[in,out] *S        points to an instance of the Q15 FIR interpolator structure.    
+ * @param[in]     L         upsample factor.    
+ * @param[in]     numTaps   number of filter coefficients in the filter.    
+ * @param[in]     *pCoeffs  points to the filter coefficient buffer.    
+ * @param[in]     *pState   points to the state buffer.    
+ * @param[in]     blockSize number of input samples to process per call.    
+ * @return        The function returns ARM_MATH_SUCCESS if initialization was successful or ARM_MATH_LENGTH_ERROR if    
+ * the filter length <code>numTaps</code> is not a multiple of the interpolation factor <code>L</code>.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}    
+ * </pre>    
+ * The length of the filter <code>numTaps</code> must be a multiple of the interpolation factor <code>L</code>.    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>(numTaps/L)+blockSize-1</code> samples (each a <code>q15_t</code>)    
+ * where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_fir_interpolate_q15()</code>.    
+ */
+
+arm_status arm_fir_interpolate_init_q15(
+  arm_fir_interpolate_instance_q15 * S,
+  uint8_t L,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  uint32_t blockSize)
+{
+  arm_status status;
+
+  /* The filter length must be a multiple of the interpolation factor */
+  if((numTaps % L) != 0u)
+  {
+    /* Set status as ARM_MATH_LENGTH_ERROR */
+    status = ARM_MATH_LENGTH_ERROR;
+  }
+  else
+  {
+
+    /* Assign coefficient pointer */
+    S->pCoeffs = pCoeffs;
+
+    /* Assign Interpolation factor */
+    S->L = L;
+
+    /* Assign polyPhaseLength */
+    S->phaseLength = numTaps / L;
+
+    /* Clear state buffer and size of buffer is always phaseLength + blockSize - 1 */
+    memset(pState, 0,
+           (blockSize + ((uint32_t) S->phaseLength - 1u)) * sizeof(q15_t));
+
+    /* Assign state pointer */
+    S->pState = pState;
+
+    status = ARM_MATH_SUCCESS;
+  }
+
+  return (status);
+
+}
+
+ /**    
+  * @} end of FIR_Interpolate group    
+  */
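
The two rules stated in the documentation above (the filter length must be a multiple of `L`, and the state buffer holds `(numTaps/L)+blockSize-1` samples) can be sketched in plain C. This is a minimal illustration, not CMSIS-DSP code: the helper name `interp_state_len` is hypothetical, and returning 0 for an invalid length is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of CMSIS-DSP): returns the number of state
 * samples the interpolator needs, or 0 when numTaps is not a multiple of L,
 * which is the case the init function reports as ARM_MATH_LENGTH_ERROR. */
static uint32_t interp_state_len(uint8_t L, uint16_t numTaps, uint32_t blockSize)
{
    /* Assumed preconditions for this sketch. */
    assert(L != 0u && blockSize != 0u);

    if ((numTaps % L) != 0u) {
        return 0u;
    }

    /* phaseLength + blockSize - 1, matching the size cleared by memset. */
    return (uint32_t)(numTaps / L) + blockSize - 1u;
}
```

A caller could use such a helper to size the buffer before calling the init function, for example `q15_t state[35]` for `L = 4`, `numTaps = 16`, `blockSize = 32`.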

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,121 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_interpolate_init_q31.c    
+*    
+* Description:  Q31 FIR interpolator initialization function    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Interpolate    
+ * @{    
+ */
+
+
+/**    
+ * @brief  Initialization function for the Q31 FIR interpolator.    
+ * @param[in,out] *S        points to an instance of the Q31 FIR interpolator structure.    
+ * @param[in]     L         upsample factor.    
+ * @param[in]     numTaps   number of filter coefficients in the filter.    
+ * @param[in]     *pCoeffs  points to the filter coefficient buffer.    
+ * @param[in]     *pState   points to the state buffer.    
+ * @param[in]     blockSize number of input samples to process per call.    
+ * @return        The function returns ARM_MATH_SUCCESS if initialization was successful or ARM_MATH_LENGTH_ERROR if    
+ * the filter length <code>numTaps</code> is not a multiple of the interpolation factor <code>L</code>.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}    
+ * </pre>    
+ * The length of the filter <code>numTaps</code> must be a multiple of the interpolation factor <code>L</code>.    
+ * \par    
+ * <code>pState</code> points to the array of state variables.    
+ * <code>pState</code> is of length <code>(numTaps/L)+blockSize-1</code> words    
+ * where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_fir_interpolate_q31()</code>.    
+ */
+
+arm_status arm_fir_interpolate_init_q31(
+  arm_fir_interpolate_instance_q31 * S,
+  uint8_t L,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  uint32_t blockSize)
+{
+  arm_status status;
+
+  /* The filter length must be a multiple of the interpolation factor */
+  if((numTaps % L) != 0u)
+  {
+    /* Set status as ARM_MATH_LENGTH_ERROR */
+    status = ARM_MATH_LENGTH_ERROR;
+  }
+  else
+  {
+
+    /* Assign coefficient pointer */
+    S->pCoeffs = pCoeffs;
+
+    /* Assign Interpolation factor */
+    S->L = L;
+
+    /* Assign polyPhaseLength */
+    S->phaseLength = numTaps / L;
+
+    /* Clear state buffer and size of buffer is always phaseLength + blockSize - 1 */
+    memset(pState, 0,
+           (blockSize + ((uint32_t) S->phaseLength - 1u)) * sizeof(q31_t));
+
+    /* Assign state pointer */
+    S->pState = pState;
+
+    status = ARM_MATH_SUCCESS;
+  }
+
+  return (status);
+
+}
+
+ /**    
+  * @} end of FIR_Interpolate group    
+  */
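
To see how the time-reversed coefficient order and the state buffer layout fit together, the following reference model may help. It is a sketch under stated assumptions, not the library implementation: it uses plain integers instead of Q formats, does no unrolling, and the function name `interp_ref` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference model (not CMSIS-DSP code).
 * coeffsRev holds b[] in time-reversed order: {b[numTaps-1], ..., b[0]}.
 * state holds phaseLen - 1 history samples followed by room for blockSize
 * new inputs, i.e. (numTaps / L) + blockSize - 1 entries in total. */
static void interp_ref(const int32_t *coeffsRev, uint32_t L, uint32_t numTaps,
                       int32_t *state, const int32_t *src, int32_t *dst,
                       uint32_t blockSize)
{
    uint32_t phaseLen = numTaps / L;

    assert(L != 0u && (numTaps % L) == 0u && phaseLen != 0u);

    /* Append the new inputs after the phaseLen - 1 history samples. */
    memcpy(state + (phaseLen - 1u), src, blockSize * sizeof(int32_t));

    for (uint32_t n = 0u; n < blockSize; n++) {
        /* Each input sample produces L output samples, one per phase. */
        for (uint32_t j = 1u; j <= L; j++) {
            const int32_t *x = state + n;            /* oldest sample first */
            const int32_t *c = coeffsRev + (L - j);  /* first tap of phase  */
            int64_t acc = 0;

            for (uint32_t k = 0u; k < phaseLen; k++) {
                /* Stepping by L skips the taps that would meet stuffed zeros. */
                acc += (int64_t)x[k] * c[k * L];
            }
            *dst++ = (int32_t)acc;
        }
    }

    /* Keep the last phaseLen - 1 inputs as history for the next call. */
    memmove(state, state + blockSize, (phaseLen - 1u) * sizeof(int32_t));
}
```

With `b = {1, 2, 3, 4}` stored as `{4, 3, 2, 1}`, `L = 2`, a zeroed state and inputs `{10, 20}`, this model produces `{10, 20, 50, 80}`, the same values obtained by inserting a zero after each input and running an ordinary 4-tap FIR.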

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,508 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_fir_interpolate_q15.c    
+*    
+* Description:	Q15 FIR interpolation.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Interpolate    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q15 FIR interpolator.    
+ * @param[in] *S        points to an instance of the Q15 FIR interpolator structure.    
+ * @param[in] *pSrc     points to the block of input data.    
+ * @param[out] *pDst    points to the block of output data.    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result.    
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.    
+ * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.    
+ * After all additions have been performed, the accumulator is truncated to 34.15 format by discarding low 15 bits.    
+ * Lastly, the accumulator is saturated to yield a result in 1.15 format.    
+ */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+void arm_fir_interpolate_q15(
+  const arm_fir_interpolate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer                                            */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer                                      */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state                */
+  q15_t *ptr1, *ptr2;                            /* Temporary pointers for state and coefficient buffers     */
+  q63_t sum0;                                    /* Accumulators                                             */
+  q15_t x0, c0;                                  /* Temporary variables to hold state and coefficient values */
+  uint32_t i, blkCnt, j, tapCnt;                 /* Loop counters                                            */
+  uint16_t phaseLen = S->phaseLength;            /* Length of each polyphase filter component */
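+  uint32_t blkCntN2;                             /* Leftover input samples after unrolling by 2 (0 or 1)     */
+  q63_t acc0, acc1;                              /* Accumulators for the two outputs computed in parallel    */
+  q15_t x1;                                      /* Second state sample used by the unrolled loop            */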
+
+  /* S->pState buffer contains previous frame (phaseLen - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + ((q31_t) phaseLen - 1);
+
+  /* Initialise blkCnt (pairs of input samples) and blkCntN2 (leftover sample count) */
+  blkCnt = blockSize / 2;
+  blkCntN2 = blockSize - (2 * blkCnt);
+
+  /* Samples loop unrolled by 2 */
+  while(blkCnt > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+
+    /* Address modifier index of coefficient buffer */
+    j = 1u;
+
+    /* Loop over the Interpolation factor. */
+    i = (S->L);
+
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      acc0 = 0;
+      acc1 = 0;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (S->L - j);
+
+      /* Loop over the polyPhase length. Unroll by a factor of 4.        
+       ** Repeat until we've computed numTaps-(4*S->L) coefficients. */
+      tapCnt = phaseLen >> 2u;
+
+      x0 = *(ptr1++);
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read the input sample */
+        x1 = *(ptr1++);
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x0 *c0;
+        acc1 += (q63_t) x1 *c0;
+
+
+        /* Read the coefficient */
+        c0 = *(ptr2 + S->L);
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x1 *c0;
+        acc1 += (q63_t) x0 *c0;
+
+
+        /* Read the coefficient */
+        c0 = *(ptr2 + S->L * 2);
+
+        /* Read the input sample */
+        x1 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x0 *c0;
+        acc1 += (q63_t) x1 *c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2 + S->L * 3);
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x1 *c0;
+        acc1 += (q63_t) x0 *c0;
+
+
+        /* Upsampling is done by stuffing L-1 zeros between each sample.        
+         * Rather than multiplying those zeros by coefficients,        
+         * advance the coefficient pointer by 4 * L (four taps were processed). */
+        ptr2 += 4 * S->L;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* If the polyPhase length is not a multiple of 4, compute the remaining filter taps */
+      tapCnt = phaseLen % 0x4u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read the input sample */
+        x1 = *(ptr1++);
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x0 *c0;
+        acc1 += (q63_t) x1 *c0;
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* update states for next sample processing */
+        x0 = x1;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* The result is in the accumulator, store in the destination buffer. */
+      *pDst = (q15_t) (__SSAT((acc0 >> 15), 16));
+      *(pDst + S->L) = (q15_t) (__SSAT((acc1 >> 15), 16));
+
+      pDst++;
+
+      /* Increment the address modifier index of coefficient buffer */
+      j++;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 2 (two input samples were consumed)        
+     * to process the next group of interpolation factor number samples */
+    pState = pState + 2;
+
+    pDst += S->L;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 2, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blkCntN2;
+
+  /* Loop over the blockSize. */
+  while(blkCnt > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Address modifier index of coefficient buffer */
+    j = 1u;
+
+    /* Loop over the Interpolation factor. */
+    i = S->L;
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      sum0 = 0;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (S->L - j);
+
+      /* Loop over the polyPhase length. Unroll by a factor of 4.        
+       ** Repeat until we've computed numTaps-(4*S->L) coefficients. */
+      tapCnt = phaseLen >> 2;
+      while(tapCnt > 0u)
+      {
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Upsampling is done by stuffing L-1 zeros between each sample.        
+         * So instead of multiplying zeros with coefficients,        
+         * Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* If the polyPhase length is not a multiple of 4, compute the remaining filter taps */
+      tapCnt = phaseLen & 0x3u;
+
+      while(tapCnt > 0u)
+      {
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* The result is in the accumulator, store in the destination buffer. */
+      *pDst++ = (q15_t) (__SSAT((sum0 >> 15), 16));
+
+      j++;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 1        
+     * to process the next group of interpolation factor number samples */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+  /* Processing is complete.    
+   ** Now copy the last phaseLen - 1 samples to the start of the state buffer.    
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = ((uint32_t) phaseLen - 1u) >> 2u;
+
+  /* copy data */
+  while(i > 0u)
+  {
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+
+#else
+
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+#endif /* #ifndef UNALIGNED_SUPPORT_DISABLE */
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+  i = ((uint32_t) phaseLen - 1u) % 0x04u;
+
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+}
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+void arm_fir_interpolate_q15(
+  const arm_fir_interpolate_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer                                            */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer                                      */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state                */
+  q15_t *ptr1, *ptr2;                            /* Temporary pointers for state and coefficient buffers     */
+  q63_t sum;                                     /* Accumulator */
+  q15_t x0, c0;                                  /* Temporary variables to hold state and coefficient values */
+  uint32_t i, blkCnt, tapCnt;                    /* Loop counters                                            */
+  uint16_t phaseLen = S->phaseLength;            /* Length of each polyphase filter component */
+
+
+  /* S->pState buffer contains previous frame (phaseLen - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (phaseLen - 1u);
+
+  /* Total number of input samples */
+  blkCnt = blockSize;
+
+  /* Loop over the blockSize. */
+  while(blkCnt > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Loop over the Interpolation factor. */
+    i = S->L;
+
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      sum = 0;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (i - 1u);
+
+      /* Loop over the polyPhase length */
+      tapCnt = (uint32_t) phaseLen;
+
+      while(tapCnt > 0u)
+      {
+        /* Read the coefficient */
+        c0 = *ptr2;
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *ptr1++;
+
+        /* Perform the multiply-accumulate */
+        sum += ((q31_t) x0 * c0);
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* Store the result after converting to 1.15 format in the destination buffer */
+      *pDst++ = (q15_t) (__SSAT((sum >> 15), 16));
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 1           
+     * to process the next group of interpolation factor number samples */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last phaseLen - 1 samples to the start of the state buffer.       
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  i = (uint32_t) phaseLen - 1u;
+
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    i--;
+  }
+
+}
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+
+ /**    
+  * @} end of FIR_Interpolate group    
+  */
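
The scaling described for the Q15 interpolator (1.15 by 1.15 products giving 2.30 results, accumulated in 64 bits, then shifted right by 15 and saturated to 1.15) can be illustrated with a small dot product. This is a sketch only: `sat_q15` is a hypothetical portable stand-in for the saturation step, and it clamps the full 64-bit value, which may differ from how the `__SSAT` intrinsic treats its argument on a given toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the saturation step: clamp to the q15 range. */
static int16_t sat_q15(int64_t v)
{
    if (v > 32767) {
        return 32767;
    }
    if (v < -32768) {
        return -32768;
    }
    return (int16_t)v;
}

/* Dot product of n Q15 pairs. Each product is a 2.30 value; the 64-bit
 * accumulator leaves ample headroom, so the sum itself cannot overflow for
 * any realistic n. The final shift assumes an arithmetic right shift of
 * negative values, as provided by common compilers. */
static int16_t dot_q15(const int16_t *x, const int16_t *c, uint32_t n)
{
    int64_t acc = 0;

    assert(x != 0 && c != 0);

    for (uint32_t k = 0u; k < n; k++) {
        acc += (int64_t)x[k] * c[k];
    }

    /* Drop the low 15 bits, then saturate to 1.15. */
    return sat_q15(acc >> 15);
}
```

Two taps of 0.5 (16384) applied to two samples of 0.5 give 0.5 again, while three full-scale products exceed the 1.15 range and are clamped to 32767.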

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_interpolate_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,504 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_fir_interpolate_q31.c    
+*    
+* Description:	Q31 FIR interpolation.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Interpolate    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q31 FIR interpolator.    
+ * @param[in] *S        points to an instance of the Q31 FIR interpolator structure.    
+ * @param[in] *pSrc     points to the block of input data.    
+ * @param[out] *pDst    points to the block of output data.    
+ * @param[in] blockSize number of input samples to process per call.    
+ * @return none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows, it wraps around rather than clipping.    
+ * To avoid overflows completely, the input signal must be scaled down by <code>1/(numTaps/L)</code>,    
+ * since <code>numTaps/L</code> additions occur per output sample.    
+ * After all multiply-accumulates are performed, the 2.62 accumulator is shifted right by 31 bits and the low 32 bits are stored as the 1.31 result.    
+ */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+void arm_fir_interpolate_q31(
+  const arm_fir_interpolate_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pState = S->pState;                     /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q31_t *ptr1, *ptr2;                            /* Temporary pointers for state and coefficient buffers */
+  q63_t sum0;                                    /* Accumulators */
+  q31_t x0, c0;                                  /* Temporary variables to hold state and coefficient values */
+  uint32_t i, blkCnt, j;                         /* Loop counters */
+  uint16_t phaseLen = S->phaseLength, tapCnt;    /* Length of each polyphase filter component */
+
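+  uint32_t blkCntN2;                             /* Leftover input samples after unrolling by 2 (0 or 1) */
+  q63_t acc0, acc1;                              /* Accumulators for the two outputs computed in parallel */
+  q31_t x1;                                      /* Second state sample used by the unrolled loop */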
+
+  /* S->pState buffer contains previous frame (phaseLen - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + ((q31_t) phaseLen - 1);
+
+  /* Initialise blkCnt (pairs of input samples) and blkCntN2 (leftover sample count) */
+  blkCnt = blockSize / 2;
+  blkCntN2 = blockSize - (2 * blkCnt);
+
+  /* Samples loop unrolled by 2 */
+  while(blkCnt > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+
+    /* Address modifier index of coefficient buffer */
+    j = 1u;
+
+    /* Loop over the Interpolation factor. */
+    i = (S->L);
+
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      acc0 = 0;
+      acc1 = 0;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (S->L - j);
+
+      /* Loop over the polyPhase length. Unroll by a factor of 4.        
+       ** Repeat until we've computed numTaps-(4*S->L) coefficients. */
+      tapCnt = phaseLen >> 2u;
+
+      x0 = *(ptr1++);
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read the input sample */
+        x1 = *(ptr1++);
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x0 *c0;
+        acc1 += (q63_t) x1 *c0;
+
+
+        /* Read the coefficient */
+        c0 = *(ptr2 + S->L);
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x1 *c0;
+        acc1 += (q63_t) x0 *c0;
+
+
+        /* Read the coefficient */
+        c0 = *(ptr2 + S->L * 2);
+
+        /* Read the input sample */
+        x1 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x0 *c0;
+        acc1 += (q63_t) x1 *c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2 + S->L * 3);
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x1 *c0;
+        acc1 += (q63_t) x0 *c0;
+
+
+        /* Upsampling is done by stuffing L-1 zeros between each sample.        
+         * Rather than multiplying those zeros by coefficients,        
+         * advance the coefficient pointer by 4 * L (four taps were processed). */
+        ptr2 += 4 * S->L;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* If the polyPhase length is not a multiple of 4, compute the remaining filter taps */
+      tapCnt = phaseLen % 0x4u;
+
+      while(tapCnt > 0u)
+      {
+
+        /* Read the input sample */
+        x1 = *(ptr1++);
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Perform the multiply-accumulate */
+        acc0 += (q63_t) x0 *c0;
+        acc1 += (q63_t) x1 *c0;
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* update states for next sample processing */
+        x0 = x1;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* The result is in the accumulator, store in the destination buffer. */
+      *pDst = (q31_t) (acc0 >> 31);
+      *(pDst + S->L) = (q31_t) (acc1 >> 31);
+
+
+      pDst++;
+
+      /* Increment the address modifier index of coefficient buffer */
+      j++;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 2 (two input samples were consumed)        
+     * to process the next group of interpolation factor number samples */
+    pState = pState + 2;
+
+    pDst += S->L;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 2, compute any remaining output samples here.        
+   ** No loop unrolling is used. */
+  blkCnt = blkCntN2;
+
+  /* Loop over the blockSize. */
+  while(blkCnt > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Address modifier index of coefficient buffer */
+    j = 1u;
+
+    /* Loop over the Interpolation factor. */
+    i = S->L;
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      sum0 = 0;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (S->L - j);
+
+      /* Loop over the polyPhase length. Unroll by a factor of 4.        
+       ** Repeat until we've computed numTaps-(4*S->L) coefficients. */
+      tapCnt = phaseLen >> 2;
+      while(tapCnt > 0u)
+      {
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Upsampling is done by stuffing L-1 zeros between each sample.        
+         * So instead of multiplying zeros with coefficients,        
+         * Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* If the polyPhase length is not a multiple of 4, compute the remaining filter taps */
+      tapCnt = phaseLen & 0x3u;
+
+      while(tapCnt > 0u)
+      {
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *(ptr1++);
+
+        /* Perform the multiply-accumulate */
+        sum0 += (q63_t) x0 *c0;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* The result is in the accumulator, store in the destination buffer. */
+      *pDst++ = (q31_t) (sum0 >> 31);
+
+      /* Increment the address modifier index of coefficient buffer */
+      j++;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 1        
+     * to process the next group of interpolation factor number samples */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.        
+   ** Now copy the last phaseLen - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  tapCnt = (phaseLen - 1u) >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+  tapCnt = (phaseLen - 1u) % 0x04u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+}
+
+
+#else
+
+void arm_fir_interpolate_q31(
+  const arm_fir_interpolate_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pState = S->pState;                     /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q31_t *ptr1, *ptr2;                            /* Temporary pointers for state and coefficient buffers */
+
+  /* Run the below code for Cortex-M0 */
+
+  q63_t sum;                                     /* Accumulator */
+  q31_t x0, c0;                                  /* Temporary variables to hold state and coefficient values */
+  uint32_t i, blkCnt;                            /* Loop counters */
+  uint16_t phaseLen = S->phaseLength, tapCnt;    /* Length of each polyphase filter component */
+
+
+  /* S->pState buffer contains previous frame (phaseLen - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + ((q31_t) phaseLen - 1);
+
+  /* Total number of input samples */
+  blkCnt = blockSize;
+
+  /* Loop over the blockSize. */
+  while(blkCnt > 0u)
+  {
+    /* Copy new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Loop over the Interpolation factor. */
+    i = S->L;
+
+    while(i > 0u)
+    {
+      /* Set accumulator to zero */
+      sum = 0;
+
+      /* Initialize state pointer */
+      ptr1 = pState;
+
+      /* Initialize coefficient pointer */
+      ptr2 = pCoeffs + (i - 1u);
+
+      tapCnt = phaseLen;
+
+      while(tapCnt > 0u)
+      {
+        /* Read the coefficient */
+        c0 = *(ptr2);
+
+        /* Increment the coefficient pointer by interpolation factor times. */
+        ptr2 += S->L;
+
+        /* Read the input sample */
+        x0 = *ptr1++;
+
+        /* Perform the multiply-accumulate */
+        sum += (q63_t) x0 *c0;
+
+        /* Decrement the loop counter */
+        tapCnt--;
+      }
+
+      /* The result is in the accumulator, store in the destination buffer. */
+      *pDst++ = (q31_t) (sum >> 31);
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 1           
+     * to process the next group of interpolation factor number samples */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last phaseLen - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  tapCnt = phaseLen - 1u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+}
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+ /**    
+  * @} end of FIR_Interpolate group    
+  */
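
The Cortex-M0 branch above can be condensed into a standalone sketch of the same polyphase idea. This is an illustration only, not the CMSIS API: the function name `interp_q31_sketch`, its argument list, and the use of plain `int32_t`/`int64_t` in place of `q31_t`/`q63_t` are assumptions made here so the block compiles without `arm_math.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone sketch (not the CMSIS API) of the polyphase
 * interpolation loop, using plain int32_t/int64_t in place of q31_t/q63_t.
 * coeffs: L * phaseLen entries, in the order the loops above read them.
 * state:  (phaseLen - 1) + blockSize entries; the first phaseLen - 1
 *         entries carry history from the previous call. */
static void interp_q31_sketch(uint32_t L, uint32_t phaseLen,
                              const int32_t *coeffs, int32_t *state,
                              const int32_t *src, int32_t *dst,
                              uint32_t blockSize)
{
    assert(L > 0u && phaseLen > 0u);

    int32_t *window = state;                   /* oldest sample in the window */
    int32_t *write = state + (phaseLen - 1u);  /* slot for the next new input */

    for (uint32_t n = 0u; n < blockSize; n++) {
        *write++ = src[n];

        /* One output per polyphase branch, counting down as the source does. */
        for (uint32_t i = L; i > 0u; i--) {
            const int32_t *c = coeffs + (i - 1u);
            int64_t sum = 0;

            /* Step through the coefficients L at a time: the skipped entries
             * would only have multiplied the stuffed zeros. */
            for (uint32_t t = 0u; t < phaseLen; t++) {
                sum += (int64_t)window[t] * c[t * L];
            }

            /* Q31 * Q31 gives Q62; shifting by 31 returns to Q31. Right-shifting
             * a negative value is implementation-defined in C (arithmetic on
             * common compilers), as it is in the original. */
            *dst++ = (int32_t)(sum >> 31);
        }
        window++;
    }

    /* Move the last phaseLen - 1 samples to the start for the next call. */
    for (uint32_t t = 0u; t + 1u < phaseLen; t++) {
        state[t] = window[t];
    }
}
```

With `L = 2`, `phaseLen = 2` and the coefficients `{0, 0x20000000, 0x40000000, 0x20000000}` (a half-scale linear interpolator stored in the order the loop reads it), the inputs `{400, 800, 1200}` on a zeroed state give the ramp `{100, 200, 300, 400, 500, 600}`.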

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,506 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_lattice_f32.c    
+*    
+* Description:	Processing function for the floating-point FIR Lattice filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup FIR_Lattice Finite Impulse Response (FIR) Lattice Filters    
+ *    
+ * This set of functions implements Finite Impulse Response (FIR) lattice filters    
+ * for Q15, Q31 and floating-point data types.  Lattice filters are used in a     
+ * variety of adaptive filter applications.  The filter structure is feedforward and    
+ * the net impulse response is finite length.    
+ * The functions operate on blocks    
+ * of input and output data and each call to the function processes    
+ * <code>blockSize</code> samples through the filter.  <code>pSrc</code> and    
+ * <code>pDst</code> point to input and output arrays containing <code>blockSize</code> values.    
+ *    
+ * \par Algorithm:    
+ * \image html FIRLattice.gif "Finite Impulse Response Lattice filter"    
+ * The following difference equation is implemented:    
+ * <pre>    
+ *    f0[n] = g0[n] = x[n]    
+ *    fm[n] = fm-1[n] + km * gm-1[n-1] for m = 1, 2, ...M    
+ *    gm[n] = km * fm-1[n] + gm-1[n-1] for m = 1, 2, ...M    
+ *    y[n] = fM[n]    
+ * </pre>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of reflection coefficients of size <code>numStages</code>.
+ * Reflection Coefficients are stored in the following order.    
+ * \par    
+ * <pre>    
+ *    {k1, k2, ..., kM}    
+ * </pre>    
+ * where M is number of stages    
+ * \par    
+ * <code>pState</code> points to a state array of size <code>numStages</code>.    
+ * The state variables (g values) hold previous inputs and are stored in the following order.    
+ * <pre>    
+ *    {g0[n], g1[n], g2[n] ...gM-1[n]}    
+ * </pre>    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter.    
+ * Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.    
+ * There are separate instance structure declarations for each of the 3 supported data types.    
+ *    
+ * \par Initialization Functions    
+ * There is also an associated initialization function for each data type.    
+ * The initialization function performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.    
+ * To do this manually without calling the init function, assign the following subfields of the instance structure:
+ * numStages, pCoeffs, pState. Also set all of the values in pState to zero. 
+ *    
+ * \par    
+ * Use of the initialization function is optional.    
+ * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.    
+ * To place an instance structure into a const data section, the instance structure must be manually initialized.    
+ * Set the values in the state buffer to zeros and then manually initialize the instance structure as follows:    
+ * <pre>    
+ *arm_fir_lattice_instance_f32 S = {numStages, pState, pCoeffs};    
+ *arm_fir_lattice_instance_q31 S = {numStages, pState, pCoeffs};    
+ *arm_fir_lattice_instance_q15 S = {numStages, pState, pCoeffs};    
+ * </pre>    
+ * \par    
+ * where <code>numStages</code> is the number of stages in the filter; <code>pState</code> is the address of the state buffer;    
+ * <code>pCoeffs</code> is the address of the coefficient buffer.    
+ * \par Fixed-Point Behavior    
+ * Care must be taken when using the fixed-point versions of the FIR Lattice filter functions.    
+ * In particular, the overflow and saturation behavior of the accumulator used in each function must be considered.    
+ * Refer to the function specific documentation below for usage guidelines.    
+ */
+
+/**    
+ * @addtogroup FIR_Lattice    
+ * @{    
+ */
+
+
+  /**    
+   * @brief Processing function for the floating-point FIR lattice filter.    
+   * @param[in]  *S        points to an instance of the floating-point FIR lattice structure.    
+   * @param[in]  *pSrc     points to the block of input data.    
+   * @param[out] *pDst     points to the block of output data    
+   * @param[in]  blockSize number of samples to process.    
+   * @return none.    
+   */
+
+void arm_fir_lattice_f32(
+  const arm_fir_lattice_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t *pState;                             /* State pointer */
+  float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+  float32_t *px;                                 /* temporary state pointer */
+  float32_t *pk;                                 /* temporary coefficient pointer */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t fcurr1, fnext1, gcurr1, gnext1;      /* temporary variables for first sample in loop unrolling */
+  float32_t fcurr2, fnext2, gnext2;              /* temporary variables for second sample in loop unrolling */
+  float32_t fcurr3, fnext3, gnext3;              /* temporary variables for third sample in loop unrolling */
+  float32_t fcurr4, fnext4, gnext4;              /* temporary variables for fourth sample in loop unrolling */
+  uint32_t numStages = S->numStages;             /* Number of stages in the filter */
+  uint32_t blkCnt, stageCnt;                     /* temporary variables for counts */
+
+  gcurr1 = 0.0f;
+  pState = &S->pState[0];
+
+  blkCnt = blockSize >> 2;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     A second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+
+    /* Read two samples from input buffer */
+    /* f0(n) = x(n) */
+    fcurr1 = *pSrc++;
+    fcurr2 = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = (pCoeffs);
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Read g0(n-1) from state */
+    gcurr1 = *px;
+
+    /* Process first sample for first tap */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext1 = fcurr1 + ((*pk) * gcurr1);
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext1 = (fcurr1 * (*pk)) + gcurr1;
+
+    /* Process second sample for first tap */
+    /* for sample 2 processing */
+    fnext2 = fcurr2 + ((*pk) * fcurr1);
+    gnext2 = (fcurr2 * (*pk)) + fcurr1;
+
+    /* Read next two samples from input buffer */
+    /* f0(n+2) = x(n+2) */
+    fcurr3 = *pSrc++;
+    fcurr4 = *pSrc++;
+
+    /* Copy only last input samples into the state buffer    
+       which will be used for next four samples processing */
+    *px++ = fcurr4;
+
+    /* Process third sample for first tap */
+    fnext3 = fcurr3 + ((*pk) * fcurr2);
+    gnext3 = (fcurr3 * (*pk)) + fcurr2;
+
+    /* Process fourth sample for first tap */
+    fnext4 = fcurr4 + ((*pk) * fcurr3);
+    gnext4 = (fcurr4 * (*pk++)) + fcurr3;
+
+    /* Update of f values for next coefficient set processing */
+    fcurr1 = fnext1;
+    fcurr2 = fnext2;
+    fcurr3 = fnext3;
+    fcurr4 = fnext4;
+
+    /* Loop unrolling.  Process 4 taps at a time . */
+    stageCnt = (numStages - 1u) >> 2u;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.    
+     ** Repeat until we've computed numStages-3 coefficients. */
+
+    /* Process 2nd, 3rd, 4th and 5th taps ... here */
+    while(stageCnt > 0u)
+    {
+      /* Read g1(n-1), g5(n-1) .... from state */
+      gcurr1 = *px;
+
+      /* save g1(n) in state buffer */
+      *px++ = gnext4;
+
+      /* Process first sample for 2nd, 6th .. tap */
+      /* Sample processing for K2, K6.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext1 = fcurr1 + ((*pk) * gcurr1);
+      /* Process second sample for 2nd, 6th .. tap */
+      /* for sample 2 processing */
+      fnext2 = fcurr2 + ((*pk) * gnext1);
+      /* Process third sample for 2nd, 6th .. tap */
+      fnext3 = fcurr3 + ((*pk) * gnext2);
+      /* Process fourth sample for 2nd, 6th .. tap */
+      fnext4 = fcurr4 + ((*pk) * gnext3);
+
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      /* Calculation of state values for next stage */
+      gnext4 = (fcurr4 * (*pk)) + gnext3;
+      gnext3 = (fcurr3 * (*pk)) + gnext2;
+      gnext2 = (fcurr2 * (*pk)) + gnext1;
+      gnext1 = (fcurr1 * (*pk++)) + gcurr1;
+
+
+      /* Read g2(n-1), g6(n-1) .... from state */
+      gcurr1 = *px;
+
+      /* save g2(n) in state buffer */
+      *px++ = gnext4;
+
+      /* Sample processing for K3, K7.... */
+      /* Process first sample for 3rd, 7th .. tap */
+      /* f3(n) = f2(n) +  K3 * g2(n-1) */
+      fcurr1 = fnext1 + ((*pk) * gcurr1);
+      /* Process second sample for 3rd, 7th .. tap */
+      fcurr2 = fnext2 + ((*pk) * gnext1);
+      /* Process third sample for 3rd, 7th .. tap */
+      fcurr3 = fnext3 + ((*pk) * gnext2);
+      /* Process fourth sample for 3rd, 7th .. tap */
+      fcurr4 = fnext4 + ((*pk) * gnext3);
+
+      /* Calculation of state values for next stage */
+      /* g3(n) = f2(n) * K3  +  g2(n-1) */
+      gnext4 = (fnext4 * (*pk)) + gnext3;
+      gnext3 = (fnext3 * (*pk)) + gnext2;
+      gnext2 = (fnext2 * (*pk)) + gnext1;
+      gnext1 = (fnext1 * (*pk++)) + gcurr1;
+
+
+      /* Read g3(n-1), g7(n-1) .... from state */
+      gcurr1 = *px;
+
+      /* save g3(n) in state buffer */
+      *px++ = gnext4;
+
+      /* Sample processing for K4, K8.... */
+      /* Process first sample for 4th, 8th .. tap */
+      /* f4(n) = f3(n) +  K4 * g3(n-1) */
+      fnext1 = fcurr1 + ((*pk) * gcurr1);
+      /* Process second sample for 4th, 8th .. tap */
+      /* for sample 2 processing */
+      fnext2 = fcurr2 + ((*pk) * gnext1);
+      /* Process third sample for 4th, 8th .. tap */
+      fnext3 = fcurr3 + ((*pk) * gnext2);
+      /* Process fourth sample for 4th, 8th .. tap */
+      fnext4 = fcurr4 + ((*pk) * gnext3);
+
+      /* g4(n) = f3(n) * K4  +  g3(n-1) */
+      /* Calculation of state values for next stage */
+      gnext4 = (fcurr4 * (*pk)) + gnext3;
+      gnext3 = (fcurr3 * (*pk)) + gnext2;
+      gnext2 = (fcurr2 * (*pk)) + gnext1;
+      gnext1 = (fcurr1 * (*pk++)) + gcurr1;
+
+      /* Read g4(n-1), g8(n-1) .... from state */
+      gcurr1 = *px;
+
+      /* save g4(n) in state buffer */
+      *px++ = gnext4;
+
+      /* Sample processing for K5, K9.... */
+      /* Process first sample for 5th, 9th .. tap */
+      /* f5(n) = f4(n) +  K5 * g4(n-1) */
+      fcurr1 = fnext1 + ((*pk) * gcurr1);
+      /* Process second sample for 5th, 9th .. tap */
+      fcurr2 = fnext2 + ((*pk) * gnext1);
+      /* Process third sample for 5th, 9th .. tap */
+      fcurr3 = fnext3 + ((*pk) * gnext2);
+      /* Process fourth sample for 5th, 9th .. tap */
+      fcurr4 = fnext4 + ((*pk) * gnext3);
+
+      /* Calculation of state values for next stage */
+      /* g5(n) = f4(n) * K5  +  g4(n-1) */
+      gnext4 = (fnext4 * (*pk)) + gnext3;
+      gnext3 = (fnext3 * (*pk)) + gnext2;
+      gnext2 = (fnext2 * (*pk)) + gnext1;
+      gnext1 = (fnext1 * (*pk++)) + gcurr1;
+
+      stageCnt--;
+    }
+
+    /* If the (filter length -1) is not a multiple of 4, compute the remaining filter taps */
+    stageCnt = (numStages - 1u) % 0x4u;
+
+    while(stageCnt > 0u)
+    {
+      gcurr1 = *px;
+
+      /* save g value in state buffer */
+      *px++ = gnext4;
+
+      /* Process four samples for each of the remaining (up to three) stages here */
+      fnext1 = fcurr1 + ((*pk) * gcurr1);
+      fnext2 = fcurr2 + ((*pk) * gnext1);
+      fnext3 = fcurr3 + ((*pk) * gnext2);
+      fnext4 = fcurr4 + ((*pk) * gnext3);
+
+      /* g1(n) = f0(n) * K1  +  g0(n-1) */
+      gnext4 = (fcurr4 * (*pk)) + gnext3;
+      gnext3 = (fcurr3 * (*pk)) + gnext2;
+      gnext2 = (fcurr2 * (*pk)) + gnext1;
+      gnext1 = (fcurr1 * (*pk++)) + gcurr1;
+
+      /* Update of f values for next coefficient set processing */
+      fcurr1 = fnext1;
+      fcurr2 = fnext2;
+      fcurr3 = fnext3;
+      fcurr4 = fnext4;
+
+      stageCnt--;
+
+    }
+
+    /* The results are in the 4 accumulators; store them in the destination buffer. */
+    /* y(n) = fN(n) */
+    *pDst++ = fcurr1;
+    *pDst++ = fcurr2;
+    *pDst++ = fcurr3;
+    *pDst++ = fcurr4;
+
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* f0(n) = x(n) */
+    fcurr1 = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = (pCoeffs);
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* read g0(n-1) from state buffer */
+    gcurr1 = *px;
+
+    /* for sample 1 processing */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext1 = fcurr1 + ((*pk) * gcurr1);
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext1 = (fcurr1 * (*pk++)) + gcurr1;
+
+    /* save f0(n) in state buffer */
+    *px++ = fcurr1;
+
+    /* f1(n) is saved in fcurr1    
+       for next stage processing */
+    fcurr1 = fnext1;
+
+    stageCnt = (numStages - 1u);
+
+    /* stage loop */
+    while(stageCnt > 0u)
+    {
+      /* read g1(n-1) from state buffer */
+      gcurr1 = *px;
+
+      /* save g1(n) in state buffer */
+      *px++ = gnext1;
+
+      /* Sample processing for K2, K3.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext1 = fcurr1 + ((*pk) * gcurr1);
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      gnext1 = (fcurr1 * (*pk++)) + gcurr1;
+
+      /* f2(n) is saved in fcurr1
+         for next stage processing */
+      fcurr1 = fnext1;
+
+      stageCnt--;
+
+    }
+
+    /* y(n) = fN(n) */
+    *pDst++ = fcurr1;
+
+    blkCnt--;
+
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  float32_t fcurr, fnext, gcurr, gnext;          /* temporary variables */
+  uint32_t numStages = S->numStages;             /* Length of the filter */
+  uint32_t blkCnt, stageCnt;                     /* temporary variables for counts */
+
+  pState = &S->pState[0];
+
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* f0(n) = x(n) */
+    fcurr = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = pCoeffs;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* read g0(n-1) from state buffer */
+    gcurr = *px;
+
+    /* for sample 1 processing */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext = fcurr + ((*pk) * gcurr);
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext = (fcurr * (*pk++)) + gcurr;
+
+    /* save f0(n) in state buffer */
+    *px++ = fcurr;
+
+    /* f1(n) is saved in fcurr            
+       for next stage processing */
+    fcurr = fnext;
+
+    stageCnt = (numStages - 1u);
+
+    /* stage loop */
+    while(stageCnt > 0u)
+    {
+      /* read g1(n-1) from state buffer */
+      gcurr = *px;
+
+      /* save g1(n) in state buffer */
+      *px++ = gnext;
+
+      /* Sample processing for K2, K3.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext = fcurr + ((*pk) * gcurr);
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      gnext = (fcurr * (*pk++)) + gcurr;
+
+      /* f2(n) is saved in fcurr
+         for next stage processing */
+      fcurr = fnext;
+
+      stageCnt--;
+
+    }
+
+    /* y(n) = fN(n) */
+    *pDst++ = fcurr;
+
+    blkCnt--;
+
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR_Lattice group    
+ */
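
The difference equations in the FIR_Lattice documentation above can be written as a short, unoptimized reference loop, which may help when checking the unrolled Cortex-M4/M3 path against the recurrence. This is a sketch under assumptions: `fir_lattice_ref` and its parameter list are hypothetical names chosen here, plain `float` stands in for `float32_t`, and the state array is assumed to be zeroed before the first call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference form of the FIR lattice recurrence (not the CMSIS API).
 * k holds the reflection coefficients {k1, ..., kM}.
 * g holds {g0[n-1], ..., gM-1[n-1]} between samples and must start zeroed. */
static void fir_lattice_ref(size_t numStages, const float *k, float *g,
                            const float *src, float *dst, size_t blockSize)
{
    assert(numStages > 0u);

    for (size_t n = 0u; n < blockSize; n++) {
        float f = src[n];      /* f0[n] = x[n] */
        float gNow = src[n];   /* g0[n] = x[n] */

        for (size_t m = 0u; m < numStages; m++) {
            float gDelayed = g[m];   /* output of the previous stage, one sample ago */
            g[m] = gNow;             /* keep this sample's value for the next sample */

            float fNext = f + k[m] * gDelayed;   /* fm[n] = fm-1[n] + km * gm-1[n-1] */
            gNow = k[m] * f + gDelayed;          /* gm[n] = km * fm-1[n] + gm-1[n-1] */
            f = fNext;
        }

        dst[n] = f;            /* y[n] = fM[n] */
    }
}
```

For two stages with `k = {0.5, 0.25}`, expanding the recurrence gives `y[n] = x[n] + 0.625 x[n-1] + 0.25 x[n-2]`, so a unit impulse produces `{1, 0.625, 0.25}`.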

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,83 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_lattice_init_f32.c    
+*    
+* Description:  Floating-point FIR Lattice filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Lattice    
+ * @{    
+ */
+
+/**    
+ * @brief Initialization function for the floating-point FIR lattice filter.    
+ * @param[in] *S points to an instance of the floating-point FIR lattice structure.    
+ * @param[in] numStages  number of filter stages.    
+ * @param[in] *pCoeffs points to the coefficient buffer.  The array is of length numStages.    
+ * @param[in] *pState points to the state buffer.  The array is of length numStages.    
+ * @return none.    
+ */
+
+void arm_fir_lattice_init_f32(
+  arm_fir_lattice_instance_f32 * S,
+  uint16_t numStages,
+  float32_t * pCoeffs,
+  float32_t * pState)
+{
+  /* Assign number of filter stages */
+  S->numStages = numStages;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always numStages */
+  memset(pState, 0, (numStages) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR_Lattice group    
+ */
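
The FIR_Lattice documentation notes that an instance can either be filled in by the init function or initialized manually so that it can live in a const data section. The sketch below contrasts the two. It is an assumption-laden illustration: `lattice_instance_sketch` is a hypothetical mirror of the `{numStages, pState, pCoeffs}` layout quoted in that documentation, and `lattice_init_sketch` only imitates the steps shown in the diff above; real code would include `arm_math.h` and use the library types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the documented {numStages, pState, pCoeffs} layout. */
typedef struct {
    uint16_t numStages;
    float *pState;
    float *pCoeffs;
} lattice_instance_sketch;

#define NUM_STAGES 3u

static float coeffs[NUM_STAGES] = {0.5f, 0.25f, 0.125f};
static float state[NUM_STAGES];   /* static storage starts zeroed */

/* Manual initialization: the instance itself can be const, because only the
 * state buffer it points to is modified while filtering. */
static const lattice_instance_sketch S_const = {NUM_STAGES, state, coeffs};

/* Run-time initialization, following the same steps as the init function. */
static void lattice_init_sketch(lattice_instance_sketch *S, uint16_t numStages,
                                float *pCoeffs, float *pState)
{
    assert(S != NULL);

    S->numStages = numStages;
    S->pCoeffs = pCoeffs;
    memset(pState, 0, (size_t)numStages * sizeof(float));   /* clear the state */
    S->pState = pState;
}
```

The run-time variant needs a writable instance, which is why the documentation rules out a const section in that case.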

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,83 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_lattice_init_q15.c    
+*    
+* Description:  Q15 FIR Lattice filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Lattice    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for the Q15 FIR lattice filter.    
+   * @param[in] *S points to an instance of the Q15 FIR lattice structure.    
+   * @param[in] numStages  number of filter stages.    
+   * @param[in] *pCoeffs points to the coefficient buffer.  The array is of length numStages.     
+   * @param[in] *pState points to the state buffer.  The array is of length numStages.     
+   * @return none.    
+   */
+
+void arm_fir_lattice_init_q15(
+  arm_fir_lattice_instance_q15 * S,
+  uint16_t numStages,
+  q15_t * pCoeffs,
+  q15_t * pState)
+{
+  /* Assign number of filter stages */
+  S->numStages = numStages;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always numStages */
+  memset(pState, 0, (numStages) * sizeof(q15_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR_Lattice group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,83 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_lattice_init_q31.c    
+*    
+* Description:  Q31 FIR lattice filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Lattice    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for the Q31 FIR lattice filter.    
+   * @param[in] *S points to an instance of the Q31 FIR lattice structure.    
+   * @param[in] numStages  number of filter stages.    
+   * @param[in] *pCoeffs points to the coefficient buffer.  The array is of length numStages.    
+   * @param[in] *pState points to the state buffer.   The array is of length numStages.    
+   * @return none.    
+   */
+
+void arm_fir_lattice_init_q31(
+  arm_fir_lattice_instance_q31 * S,
+  uint16_t numStages,
+  q31_t * pCoeffs,
+  q31_t * pState)
+{
+  /* Assign number of filter stages */
+  S->numStages = numStages;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always numStages */
+  memset(pState, 0, (numStages) * sizeof(q31_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR_Lattice group    
+ */
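
Reviewer note: both init functions above store the caller's `pCoeffs` and `pState` pointers rather than copying the data, so the buffers need to outlive the filter instance. A minimal sketch of that contract in portable C, assuming a hypothetical stand-in struct `lattice_q15_sketch` (not the CMSIS `arm_fir_lattice_instance_q15` type, whose field layout is not shown in this diff):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the filter instance; names are illustrative only. */
typedef struct {
  uint16_t num_stages;    /* number of lattice stages */
  int16_t *state;         /* caller-owned state buffer, num_stages entries */
  const int16_t *coeffs;  /* caller-owned reflection coefficients, num_stages entries */
} lattice_q15_sketch;

/* Mirrors the init steps: record the stage count, keep the caller's
   buffers by pointer (no copy), and zero the state buffer. */
static void lattice_q15_sketch_init(lattice_q15_sketch *s, uint16_t num_stages,
                                    const int16_t *coeffs, int16_t *state)
{
  assert(s != NULL && coeffs != NULL && state != NULL);
  s->num_stages = num_stages;
  s->coeffs = coeffs;
  memset(state, 0, (size_t) num_stages * sizeof(int16_t));
  s->state = state;
}
```

Because only pointers are stored, declaring the coefficient or state array as a local in a function that returns before the filter is used would leave the instance pointing at dead storage.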

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,536 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_lattice_q15.c    
+*    
+* Description:	Q15 FIR lattice filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Lattice    
+ * @{    
+ */
+
+
+/**    
+ * @brief Processing function for the Q15 FIR lattice filter.    
+ * @param[in]  *S        points to an instance of the Q15 FIR lattice structure.    
+ * @param[in]  *pSrc     points to the block of input data.    
+ * @param[out] *pDst     points to the block of output data    
+ * @param[in]  blockSize number of samples to process.    
+ * @return none.    
+ */
+
+void arm_fir_lattice_q15(
+  const arm_fir_lattice_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState;                                 /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *px;                                     /* temporary state pointer */
+  q15_t *pk;                                     /* temporary coefficient pointer */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t fcurnt1, fnext1, gcurnt1 = 0, gnext1;    /* temporary variables for first sample in loop unrolling */
+  q31_t fcurnt2, fnext2, gnext2;                 /* temporary variables for second sample in loop unrolling */
+  q31_t fcurnt3, fnext3, gnext3;                 /* temporary variables for third sample in loop unrolling */
+  q31_t fcurnt4, fnext4, gnext4;                 /* temporary variables for fourth sample in loop unrolling */
+  uint32_t numStages = S->numStages;             /* Number of stages in the filter */
+  uint32_t blkCnt, stageCnt;                     /* temporary variables for counts */
+
+  pState = &S->pState[0];
+
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+
+    /* Read two samples from input buffer */
+    /* f0(n) = x(n) */
+    fcurnt1 = *pSrc++;
+    fcurnt2 = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = (pCoeffs);
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Read g0(n-1) from state */
+    gcurnt1 = *px;
+
+    /* Process first sample for first tap */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext1 = (q31_t) ((gcurnt1 * (*pk)) >> 15u) + fcurnt1;
+    fnext1 = __SSAT(fnext1, 16);
+
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext1 = (q31_t) ((fcurnt1 * (*pk)) >> 15u) + gcurnt1;
+    gnext1 = __SSAT(gnext1, 16);
+
+    /* Process second sample for first tap */
+    /* f1(n+1) = f0(n+1) +  K1 * g0(n), where g0(n) = f0(n) */
+    fnext2 = (q31_t) ((fcurnt1 * (*pk)) >> 15u) + fcurnt2;
+    fnext2 = __SSAT(fnext2, 16);
+
+    gnext2 = (q31_t) ((fcurnt2 * (*pk)) >> 15u) + fcurnt1;
+    gnext2 = __SSAT(gnext2, 16);
+
+
+    /* Read next two samples from input buffer */
+    /* f0(n+2) = x(n+2) */
+    fcurnt3 = *pSrc++;
+    fcurnt4 = *pSrc++;
+
+    /* Copy only last input samples into the state buffer    
+       which is used for next four samples processing */
+    *px++ = (q15_t) fcurnt4;
+
+    /* Process third sample for first tap */
+    fnext3 = (q31_t) ((fcurnt2 * (*pk)) >> 15u) + fcurnt3;
+    fnext3 = __SSAT(fnext3, 16);
+    gnext3 = (q31_t) ((fcurnt3 * (*pk)) >> 15u) + fcurnt2;
+    gnext3 = __SSAT(gnext3, 16);
+
+    /* Process fourth sample for first tap */
+    fnext4 = (q31_t) ((fcurnt3 * (*pk)) >> 15u) + fcurnt4;
+    fnext4 = __SSAT(fnext4, 16);
+    gnext4 = (q31_t) ((fcurnt4 * (*pk++)) >> 15u) + fcurnt3;
+    gnext4 = __SSAT(gnext4, 16);
+
+    /* Update of f values for next coefficient set processing */
+    fcurnt1 = fnext1;
+    fcurnt2 = fnext2;
+    fcurnt3 = fnext3;
+    fcurnt4 = fnext4;
+
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    stageCnt = (numStages - 1u) >> 2;
+
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.    
+     ** Repeat until we've computed numStages-3 coefficients. */
+
+    /* Process 2nd, 3rd, 4th and 5th taps ... here */
+    while(stageCnt > 0u)
+    {
+      /* Read g1(n-1), g5(n-1) .... from state */
+      gcurnt1 = *px;
+
+      /* save g1(n) in state buffer */
+      *px++ = (q15_t) gnext4;
+
+      /* Process first sample for 2nd, 6th .. tap */
+      /* Sample processing for K2, K6.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext1 = (q31_t) ((gcurnt1 * (*pk)) >> 15u) + fcurnt1;
+      fnext1 = __SSAT(fnext1, 16);
+
+
+      /* Process second sample for 2nd, 6th .. tap */
+      /* for sample 2 processing */
+      fnext2 = (q31_t) ((gnext1 * (*pk)) >> 15u) + fcurnt2;
+      fnext2 = __SSAT(fnext2, 16);
+      /* Process third sample for 2nd, 6th .. tap */
+      fnext3 = (q31_t) ((gnext2 * (*pk)) >> 15u) + fcurnt3;
+      fnext3 = __SSAT(fnext3, 16);
+      /* Process fourth sample for 2nd, 6th .. tap */
+      /* fnext4 = fcurnt4 + (*pk) * gnext3; */
+      fnext4 = (q31_t) ((gnext3 * (*pk)) >> 15u) + fcurnt4;
+      fnext4 = __SSAT(fnext4, 16);
+
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      /* Calculation of state values for next stage */
+      gnext4 = (q31_t) ((fcurnt4 * (*pk)) >> 15u) + gnext3;
+      gnext4 = __SSAT(gnext4, 16);
+      gnext3 = (q31_t) ((fcurnt3 * (*pk)) >> 15u) + gnext2;
+      gnext3 = __SSAT(gnext3, 16);
+
+      gnext2 = (q31_t) ((fcurnt2 * (*pk)) >> 15u) + gnext1;
+      gnext2 = __SSAT(gnext2, 16);
+
+      gnext1 = (q31_t) ((fcurnt1 * (*pk++)) >> 15u) + gcurnt1;
+      gnext1 = __SSAT(gnext1, 16);
+
+
+      /* Read g2(n-1), g6(n-1) .... from state */
+      gcurnt1 = *px;
+
+      /* save g2(n) in state buffer */
+      *px++ = (q15_t) gnext4;
+
+      /* Sample processing for K3, K7.... */
+      /* Process first sample for 3rd, 7th .. tap */
+      /* f3(n) = f2(n) +  K3 * g2(n-1) */
+      fcurnt1 = (q31_t) ((gcurnt1 * (*pk)) >> 15u) + fnext1;
+      fcurnt1 = __SSAT(fcurnt1, 16);
+
+      /* Process second sample for 3rd, 7th .. tap */
+      fcurnt2 = (q31_t) ((gnext1 * (*pk)) >> 15u) + fnext2;
+      fcurnt2 = __SSAT(fcurnt2, 16);
+
+      /* Process third sample for 3rd, 7th .. tap */
+      fcurnt3 = (q31_t) ((gnext2 * (*pk)) >> 15u) + fnext3;
+      fcurnt3 = __SSAT(fcurnt3, 16);
+
+      /* Process fourth sample for 3rd, 7th .. tap */
+      fcurnt4 = (q31_t) ((gnext3 * (*pk)) >> 15u) + fnext4;
+      fcurnt4 = __SSAT(fcurnt4, 16);
+
+      /* Calculation of state values for next stage */
+      /* g3(n) = f2(n) * K3  +  g2(n-1) */
+      gnext4 = (q31_t) ((fnext4 * (*pk)) >> 15u) + gnext3;
+      gnext4 = __SSAT(gnext4, 16);
+
+      gnext3 = (q31_t) ((fnext3 * (*pk)) >> 15u) + gnext2;
+      gnext3 = __SSAT(gnext3, 16);
+
+      gnext2 = (q31_t) ((fnext2 * (*pk)) >> 15u) + gnext1;
+      gnext2 = __SSAT(gnext2, 16);
+
+      gnext1 = (q31_t) ((fnext1 * (*pk++)) >> 15u) + gcurnt1;
+      gnext1 = __SSAT(gnext1, 16);
+
+      /* Read g3(n-1), g7(n-1) .... from state */
+      gcurnt1 = *px;
+
+      /* save g3(n) in state buffer */
+      *px++ = (q15_t) gnext4;
+
+      /* Sample processing for K4, K8.... */
+      /* Process first sample for 4th, 8th .. tap */
+      /* f4(n) = f3(n) +  K4 * g3(n-1) */
+      fnext1 = (q31_t) ((gcurnt1 * (*pk)) >> 15u) + fcurnt1;
+      fnext1 = __SSAT(fnext1, 16);
+
+      /* Process second sample for 4th, 8th .. tap */
+      /* for sample 2 processing */
+      fnext2 = (q31_t) ((gnext1 * (*pk)) >> 15u) + fcurnt2;
+      fnext2 = __SSAT(fnext2, 16);
+
+      /* Process third sample for 4th, 8th .. tap */
+      fnext3 = (q31_t) ((gnext2 * (*pk)) >> 15u) + fcurnt3;
+      fnext3 = __SSAT(fnext3, 16);
+
+      /* Process fourth sample for 4th, 8th .. tap */
+      fnext4 = (q31_t) ((gnext3 * (*pk)) >> 15u) + fcurnt4;
+      fnext4 = __SSAT(fnext4, 16);
+
+      /* g4(n) = f3(n) * K4  +  g3(n-1) */
+      /* Calculation of state values for next stage */
+      gnext4 = (q31_t) ((fcurnt4 * (*pk)) >> 15u) + gnext3;
+      gnext4 = __SSAT(gnext4, 16);
+
+      gnext3 = (q31_t) ((fcurnt3 * (*pk)) >> 15u) + gnext2;
+      gnext3 = __SSAT(gnext3, 16);
+
+      gnext2 = (q31_t) ((fcurnt2 * (*pk)) >> 15u) + gnext1;
+      gnext2 = __SSAT(gnext2, 16);
+      gnext1 = (q31_t) ((fcurnt1 * (*pk++)) >> 15u) + gcurnt1;
+      gnext1 = __SSAT(gnext1, 16);
+
+
+      /* Read g4(n-1), g8(n-1) .... from state */
+      gcurnt1 = *px;
+
+      /* save g4(n) in state buffer */
+      *px++ = (q15_t) gnext4;
+
+      /* Sample processing for K5, K9.... */
+      /* Process first sample for 5th, 9th .. tap */
+      /* f5(n) = f4(n) +  K5 * g4(n-1) */
+      fcurnt1 = (q31_t) ((gcurnt1 * (*pk)) >> 15u) + fnext1;
+      fcurnt1 = __SSAT(fcurnt1, 16);
+
+      /* Process second sample for 5th, 9th .. tap */
+      fcurnt2 = (q31_t) ((gnext1 * (*pk)) >> 15u) + fnext2;
+      fcurnt2 = __SSAT(fcurnt2, 16);
+
+      /* Process third sample for 5th, 9th .. tap */
+      fcurnt3 = (q31_t) ((gnext2 * (*pk)) >> 15u) + fnext3;
+      fcurnt3 = __SSAT(fcurnt3, 16);
+
+      /* Process fourth sample for 5th, 9th .. tap */
+      fcurnt4 = (q31_t) ((gnext3 * (*pk)) >> 15u) + fnext4;
+      fcurnt4 = __SSAT(fcurnt4, 16);
+
+      /* Calculation of state values for next stage */
+      /* g5(n) = f4(n) * K5  +  g4(n-1) */
+      gnext4 = (q31_t) ((fnext4 * (*pk)) >> 15u) + gnext3;
+      gnext4 = __SSAT(gnext4, 16);
+      gnext3 = (q31_t) ((fnext3 * (*pk)) >> 15u) + gnext2;
+      gnext3 = __SSAT(gnext3, 16);
+      gnext2 = (q31_t) ((fnext2 * (*pk)) >> 15u) + gnext1;
+      gnext2 = __SSAT(gnext2, 16);
+      gnext1 = (q31_t) ((fnext1 * (*pk++)) >> 15u) + gcurnt1;
+      gnext1 = __SSAT(gnext1, 16);
+
+      stageCnt--;
+    }
+
+    /* If the (filter length -1) is not a multiple of 4, compute the remaining filter taps */
+    stageCnt = (numStages - 1u) % 0x4u;
+
+    while(stageCnt > 0u)
+    {
+      gcurnt1 = *px;
+
+      /* save g value in state buffer */
+      *px++ = (q15_t) gnext4;
+
+      /* Process four samples for the remaining (up to three) taps here */
+      fnext1 = (q31_t) ((gcurnt1 * (*pk)) >> 15u) + fcurnt1;
+      fnext1 = __SSAT(fnext1, 16);
+      fnext2 = (q31_t) ((gnext1 * (*pk)) >> 15u) + fcurnt2;
+      fnext2 = __SSAT(fnext2, 16);
+
+      fnext3 = (q31_t) ((gnext2 * (*pk)) >> 15u) + fcurnt3;
+      fnext3 = __SSAT(fnext3, 16);
+
+      fnext4 = (q31_t) ((gnext3 * (*pk)) >> 15u) + fcurnt4;
+      fnext4 = __SSAT(fnext4, 16);
+
+      /* gm(n) = fm-1(n) * Km  +  gm-1(n-1) */
+      gnext4 = (q31_t) ((fcurnt4 * (*pk)) >> 15u) + gnext3;
+      gnext4 = __SSAT(gnext4, 16);
+      gnext3 = (q31_t) ((fcurnt3 * (*pk)) >> 15u) + gnext2;
+      gnext3 = __SSAT(gnext3, 16);
+      gnext2 = (q31_t) ((fcurnt2 * (*pk)) >> 15u) + gnext1;
+      gnext2 = __SSAT(gnext2, 16);
+      gnext1 = (q31_t) ((fcurnt1 * (*pk++)) >> 15u) + gcurnt1;
+      gnext1 = __SSAT(gnext1, 16);
+
+      /* Update of f values for next coefficient set processing */
+      fcurnt1 = fnext1;
+      fcurnt2 = fnext2;
+      fcurnt3 = fnext3;
+      fcurnt4 = fnext4;
+
+      stageCnt--;
+
+    }
+
+    /* Store the results held in the 4 accumulators in the destination buffer. */
+    /* y(n) = fN(n) */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pDst)++ = __PKHBT(fcurnt1, fcurnt2, 16);
+    *__SIMD32(pDst)++ = __PKHBT(fcurnt3, fcurnt4, 16);
+
+#else
+
+    *__SIMD32(pDst)++ = __PKHBT(fcurnt2, fcurnt1, 16);
+    *__SIMD32(pDst)++ = __PKHBT(fcurnt4, fcurnt3, 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* f0(n) = x(n) */
+    fcurnt1 = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = (pCoeffs);
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* read g0(n-1) from state buffer */
+    gcurnt1 = *px;
+
+    /* for sample 1 processing */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext1 = (((q31_t) gcurnt1 * (*pk)) >> 15u) + fcurnt1;
+    fnext1 = __SSAT(fnext1, 16);
+
+
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext1 = (((q31_t) fcurnt1 * (*pk++)) >> 15u) + gcurnt1;
+    gnext1 = __SSAT(gnext1, 16);
+
+    /* save f0(n) in state buffer */
+    *px++ = (q15_t) fcurnt1;
+
+    /* f1(n) is saved in fcurnt1    
+       for next stage processing */
+    fcurnt1 = fnext1;
+
+    stageCnt = (numStages - 1u);
+
+    /* stage loop */
+    while(stageCnt > 0u)
+    {
+      /* read g1(n-1) from state buffer */
+      gcurnt1 = *px;
+
+      /* save g1(n) in state buffer */
+      *px++ = (q15_t) gnext1;
+
+      /* Sample processing for K2, K3.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext1 = (((q31_t) gcurnt1 * (*pk)) >> 15u) + fcurnt1;
+      fnext1 = __SSAT(fnext1, 16);
+
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      gnext1 = (((q31_t) fcurnt1 * (*pk++)) >> 15u) + gcurnt1;
+      gnext1 = __SSAT(gnext1, 16);
+
+
+      /* f1(n) is saved in fcurnt1    
+         for next stage processing */
+      fcurnt1 = fnext1;
+
+      stageCnt--;
+
+    }
+
+    /* y(n) = fN(n) */
+    *pDst++ = __SSAT(fcurnt1, 16);
+
+
+    blkCnt--;
+
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q31_t fcurnt, fnext, gcurnt, gnext;            /* temporary variables */
+  uint32_t numStages = S->numStages;             /* Length of the filter */
+  uint32_t blkCnt, stageCnt;                     /* temporary variables for counts */
+
+  pState = &S->pState[0];
+
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* f0(n) = x(n) */
+    fcurnt = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = (pCoeffs);
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* read g0(n-1) from state buffer */
+    gcurnt = *px;
+
+    /* for sample 1 processing */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext = ((gcurnt * (*pk)) >> 15u) + fcurnt;
+    fnext = __SSAT(fnext, 16);
+
+
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext = ((fcurnt * (*pk++)) >> 15u) + gcurnt;
+    gnext = __SSAT(gnext, 16);
+
+    /* save f0(n) in state buffer */
+    *px++ = (q15_t) fcurnt;
+
+    /* f1(n) is saved in fcurnt            
+       for next stage processing */
+    fcurnt = fnext;
+
+    stageCnt = (numStages - 1u);
+
+    /* stage loop */
+    while(stageCnt > 0u)
+    {
+      /* read g1(n-1) from state buffer */
+      gcurnt = *px;
+
+      /* save g1(n) in state buffer */
+      *px++ = (q15_t) gnext;
+
+      /* Sample processing for K2, K3.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext = ((gcurnt * (*pk)) >> 15u) + fcurnt;
+      fnext = __SSAT(fnext, 16);
+
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      gnext = ((fcurnt * (*pk++)) >> 15u) + gcurnt;
+      gnext = __SSAT(gnext, 16);
+
+
+      /* f1(n) is saved in fcurnt            
+         for next stage processing */
+      fcurnt = fnext;
+
+      stageCnt--;
+
+    }
+
+    /* y(n) = fN(n) */
+    *pDst++ = __SSAT(fcurnt, 16);
+
+
+    blkCnt--;
+
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR_Lattice group    
+ */
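
Reviewer note: the unrolled Cortex-M3/M4 path above is easier to check against a plain scalar version of the same recurrence. The following is a sketch in portable C, modelled on the Cortex-M0 branch; the names `sat16` and `fir_lattice_q15_ref` are hypothetical and are not part of CMSIS-DSP. It assumes, as the original code does, that `>>` on a negative signed value is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for __SSAT(x, 16): clamp to the signed 16-bit range. */
static int32_t sat16(int32_t x)
{
  if (x > INT16_MAX) return INT16_MAX;
  if (x < INT16_MIN) return INT16_MIN;
  return x;
}

/* Hypothetical scalar reference for the Q15 FIR lattice:
     f_m(n) = f_{m-1}(n) + K_m * g_{m-1}(n-1)
     g_m(n) = K_m * f_{m-1}(n) + g_{m-1}(n-1)
   with f_0(n) = g_0(n) = x(n). state[m] keeps g_m from the previous sample. */
static void fir_lattice_q15_ref(const int16_t *k, int16_t *state,
                                uint32_t num_stages, const int16_t *src,
                                int16_t *dst, uint32_t block_size)
{
  assert(num_stages > 0u);

  for (uint32_t n = 0; n < block_size; n++) {
    int32_t f = src[n];                /* f_0(n) = x(n) */
    int32_t g = src[n];                /* g_0(n) = x(n) */

    for (uint32_t m = 0; m < num_stages; m++) {
      int32_t g_prev = state[m];       /* g_m(n-1) */
      state[m] = (int16_t) g;          /* store g_m(n) for the next sample */

      /* Q15 multiply, then saturate, as in the original code. */
      int32_t f_next = sat16(((g_prev * k[m]) >> 15) + f);
      g = sat16(((f * k[m]) >> 15) + g_prev);
      f = f_next;
    }

    dst[n] = (int16_t) f;              /* y(n) = f_N(n) */
  }
}
```

For a single stage with K1 = 0.5 (16384 in Q15) and zeroed state, the input 16384, 0, 0 yields 16384, 8192, 0, which is the expected response of 1 + K1 z^-1 scaled by 0.5.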

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_lattice_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,353 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_lattice_q31.c    
+*    
+* Description:	Q31 FIR lattice filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Lattice    
+ * @{    
+ */
+
+
+/**    
+ * @brief Processing function for the Q31 FIR lattice filter.    
+ * @param[in]  *S        points to an instance of the Q31 FIR lattice structure.    
+ * @param[in]  *pSrc     points to the block of input data.    
+ * @param[out] *pDst     points to the block of output data    
+ * @param[in]  blockSize number of samples to process.    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * In order to avoid overflows the input signal must be scaled down by 2*log2(numStages) bits.    
+ */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+void arm_fir_lattice_q31(
+  const arm_fir_lattice_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pState;                                 /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *px;                                     /* temporary state pointer */
+  q31_t *pk;                                     /* temporary coefficient pointer */
+  q31_t fcurr1, fnext1, gcurr1 = 0, gnext1;      /* temporary variables for first sample in loop unrolling */
+  q31_t fcurr2, fnext2, gnext2;                  /* temporary variables for second sample in loop unrolling */
+  uint32_t numStages = S->numStages;             /* Length of the filter */
+  uint32_t blkCnt, stageCnt;                     /* temporary variables for counts */
+  q31_t k;
+
+  pState = &S->pState[0];
+
+  blkCnt = blockSize >> 1u;
+
+  /* First part of the processing with loop unrolling.  Compute 2 outputs at a time.        
+     a second loop below computes the remaining 1 sample. */
+  while(blkCnt > 0u)
+  {
+    /* f0(n) = x(n) */
+    fcurr1 = *pSrc++;
+
+    /* f0(n+1) = x(n+1) */
+    fcurr2 = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = (pCoeffs);
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* read g0(n - 1) from state buffer */
+    gcurr1 = *px;
+
+    /* Read the reflection coefficient */
+    k = *pk++;
+
+    /* for sample 1 processing */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext1 = (q31_t) (((q63_t) gcurr1 * k) >> 32);
+
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext1 = (q31_t) (((q63_t) fcurr1 * (k)) >> 32);
+    fnext1 = fcurr1 + (fnext1 << 1u);
+    gnext1 = gcurr1 + (gnext1 << 1u);
+
+    /* for sample 2 processing */
+    /* f1(n+1) = f0(n+1) +  K1 * g0(n), where g0(n) = f0(n) */
+    fnext2 = (q31_t) (((q63_t) fcurr1 * k) >> 32);
+
+    /* g1(n+1) = f0(n+1) * K1  +  g0(n) */
+    gnext2 = (q31_t) (((q63_t) fcurr2 * (k)) >> 32);
+    fnext2 = fcurr2 + (fnext2 << 1u);
+    gnext2 = fcurr1 + (gnext2 << 1u);
+
+    /* save f0(n+1) in state buffer */
+    *px++ = fcurr2;
+
+    /* f1(n) is saved in fcurr1        
+       for next stage processing */
+    fcurr1 = fnext1;
+    fcurr2 = fnext2;
+
+    stageCnt = (numStages - 1u);
+
+    /* stage loop */
+    while(stageCnt > 0u)
+    {
+
+      /* Read the reflection coefficient */
+      k = *pk++;
+
+      /* read g1(n-1) from state buffer */
+      gcurr1 = *px;
+
+      /* save g1(n+1) in state buffer */
+      *px++ = gnext2;
+
+      /* Sample processing for K2, K3.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext1 = (q31_t) (((q63_t) gcurr1 * k) >> 32);
+      fnext2 = (q31_t) (((q63_t) gnext1 * k) >> 32);
+
+      fnext1 = fcurr1 + (fnext1 << 1u);
+      fnext2 = fcurr2 + (fnext2 << 1u);
+
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      gnext2 = (q31_t) (((q63_t) fcurr2 * (k)) >> 32);
+      gnext2 = gnext1 + (gnext2 << 1u);
+
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      gnext1 = (q31_t) (((q63_t) fcurr1 * (k)) >> 32);
+      gnext1 = gcurr1 + (gnext1 << 1u);
+
+      /* f1(n) is saved in fcurr1        
+         for next stage processing */
+      fcurr1 = fnext1;
+      fcurr2 = fnext2;
+
+      stageCnt--;
+
+    }
+
+    /* y(n) = fN(n) */
+    *pDst++ = fcurr1;
+    *pDst++ = fcurr2;
+
+    blkCnt--;
+
+  }
+
+  /* If the blockSize is not a multiple of 2, compute any remaining output samples here.
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x2u;
+
+  while(blkCnt > 0u)
+  {
+    /* f0(n) = x(n) */
+    fcurr1 = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = (pCoeffs);
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* read g0(n - 1) from state buffer */
+    gcurr1 = *px;
+
+    /* Read the reflection coefficient */
+    k = *pk++;
+
+    /* for sample 1 processing */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext1 = (q31_t) (((q63_t) gcurr1 * k) >> 32);
+    fnext1 = fcurr1 + (fnext1 << 1u);
+
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext1 = (q31_t) (((q63_t) fcurr1 * (k)) >> 32);
+    gnext1 = gcurr1 + (gnext1 << 1u);
+
+    /* save f0(n) in state buffer */
+    *px++ = fcurr1;
+
+    /* f1(n) is saved in fcurr1        
+       for next stage processing */
+    fcurr1 = fnext1;
+
+    stageCnt = (numStages - 1u);
+
+    /* stage loop */
+    while(stageCnt > 0u)
+    {
+      /* Read the reflection coefficient */
+      k = *pk++;
+
+      /* read g1(n-1) from state buffer */
+      gcurr1 = *px;
+
+      /* save g1(n) in state buffer */
+      *px++ = gnext1;
+
+      /* Sample processing for K2, K3.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext1 = (q31_t) (((q63_t) gcurr1 * k) >> 32);
+      fnext1 = fcurr1 + (fnext1 << 1u);
+
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      gnext1 = (q31_t) (((q63_t) fcurr1 * (k)) >> 32);
+      gnext1 = gcurr1 + (gnext1 << 1u);
+
+      /* f1(n) is saved in fcurr1        
+         for next stage processing */
+      fcurr1 = fnext1;
+
+      stageCnt--;
+
+    }
+
+
+    /* y(n) = fN(n) */
+    *pDst++ = fcurr1;
+
+    blkCnt--;
+
+  }
+
+
+}
+
+
+#else
+
+/* Run the below code for Cortex-M0 */
+
+void arm_fir_lattice_q31(
+  const arm_fir_lattice_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pState;                                 /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *px;                                     /* temporary state pointer */
+  q31_t *pk;                                     /* temporary coefficient pointer */
+  q31_t fcurr, fnext, gcurr, gnext;              /* temporary variables */
+  uint32_t numStages = S->numStages;             /* Length of the filter */
+  uint32_t blkCnt, stageCnt;                     /* temporary variables for counts */
+
+  pState = &S->pState[0];
+
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* f0(n) = x(n) */
+    fcurr = *pSrc++;
+
+    /* Initialize coeff pointer */
+    pk = (pCoeffs);
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* read g0(n-1) from state buffer */
+    gcurr = *px;
+
+    /* for sample 1 processing */
+    /* f1(n) = f0(n) +  K1 * g0(n-1) */
+    fnext = (q31_t) (((q63_t) gcurr * (*pk)) >> 31) + fcurr;
+    /* g1(n) = f0(n) * K1  +  g0(n-1) */
+    gnext = (q31_t) (((q63_t) fcurr * (*pk++)) >> 31) + gcurr;
+    /* save f0(n) in state buffer */
+    *px++ = fcurr;
+
+    /* f1(n) is saved in fcurr
+       for next stage processing */
+    fcurr = fnext;
+
+    stageCnt = (numStages - 1u);
+
+    /* stage loop */
+    while(stageCnt > 0u)
+    {
+      /* read g1(n-1) from state buffer */
+      gcurr = *px;
+
+      /* save g1(n) in state buffer */
+      *px++ = gnext;
+
+      /* Sample processing for K2, K3.... */
+      /* f2(n) = f1(n) +  K2 * g1(n-1) */
+      fnext = (q31_t) (((q63_t) gcurr * (*pk)) >> 31) + fcurr;
+      /* g2(n) = f1(n) * K2  +  g1(n-1) */
+      gnext = (q31_t) (((q63_t) fcurr * (*pk++)) >> 31) + gcurr;
+
+      /* f1(n) is saved in fcurr
+         for next stage processing */
+      fcurr = fnext;
+
+      stageCnt--;
+
+    }
+
+    /* y(n) = fN(n) */
+    *pDst++ = fcurr;
+
+    blkCnt--;
+
+  }
+
+}
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+
+/**    
+ * @} end of FIR_Lattice group    
+ */
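
Reviewer note: the two Q31 branches above scale the 64-bit product differently. The Cortex-M3/M4 branch keeps the high word (`>> 32`) and then doubles it (`<< 1u`), while the Cortex-M0 branch shifts once by 31. The sketch below isolates the two forms so the difference can be checked; the helper names are hypothetical, multiplication by 2 stands in for the left shift, and arithmetic right shift of negative values is assumed, as in the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Form used on the Cortex-M3/M4 path: high word of the product, doubled.
   The least significant bit of the result is always zero. */
static int32_t mul_q31_high_word(int32_t v, int32_t k)
{
  /* (-1.0) * (-1.0) is not representable in Q31; excluded in this sketch. */
  assert(!(v == INT32_MIN && k == INT32_MIN));
  int32_t hi = (int32_t) (((int64_t) v * k) >> 32);
  return hi * 2;
}

/* Form used on the Cortex-M0 path: a single shift by 31 keeps the LSB. */
static int32_t mul_q31_shift31(int32_t v, int32_t k)
{
  assert(!(v == INT32_MIN && k == INT32_MIN));
  return (int32_t) (((int64_t) v * k) >> 31);
}
```

Under these assumptions the high-word form equals the shift-by-31 form with its least significant bit cleared, so outputs of the two branches can differ by one LSB per multiply.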

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,691 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_q15.c    
+*    
+* Description:  Q15 FIR filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**       
+ * @ingroup groupFilters       
+ */
+
+/**       
+ * @addtogroup FIR       
+ * @{       
+ */
+
+/**       
+ * @brief Processing function for the Q15 FIR filter.       
+ * @param[in] *S points to an instance of the Q15 FIR structure.       
+ * @param[in] *pSrc points to the block of input data.       
+ * @param[out] *pDst points to the block of output data.       
+ * @param[in]  blockSize number of samples to process per call.       
+ * @return none.       
+ *   
+ *   
+ * \par Restrictions   
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
+ *  In this case the input, output, and state buffers should be 32-bit aligned.
+ *   
+ * <b>Scaling and Overflow Behavior:</b>       
+ * \par       
+ * The function is implemented using a 64-bit internal accumulator.       
+ * Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result.       
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.       
+ * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.       
+ * After all additions have been performed, the accumulator is truncated to 34.15 format by discarding low 15 bits.       
+ * Lastly, the accumulator is saturated to yield a result in 1.15 format.       
+ *       
+ * \par       
+ * Refer to the function <code>arm_fir_fast_q15()</code> for a faster but less precise implementation of this function.       
+ */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+/* Run the below code for Cortex-M4 and Cortex-M3 */
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+
+void arm_fir_q15(
+  const arm_fir_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q15_t *px1;                                    /* Temporary q15 pointer for state buffer */
+  q15_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q31_t x0, x1, x2, x3, c0;                      /* Temporary variables to hold SIMD state and coefficient values */
+  q63_t acc0, acc1, acc2, acc3;                  /* Accumulators */
+  uint32_t numTaps = S->numTaps;                 /* Number of taps in the filter */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+
+
+  /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Apply loop unrolling and compute 4 output values simultaneously.       
+   * The variables acc0 ... acc3 hold output values that are being computed:       
+   *       
+   *    acc0 =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0]       
+   *    acc1 =  b[numTaps-1] * x[n-numTaps] +   b[numTaps-2] * x[n-numTaps-1] + b[numTaps-3] * x[n-numTaps-2] +...+ b[0] * x[1]       
+   *    acc2 =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps] +   b[numTaps-3] * x[n-numTaps-1] +...+ b[0] * x[2]       
+   *    acc3 =  b[numTaps-1] * x[n-numTaps+2] + b[numTaps-2] * x[n-numTaps+1] + b[numTaps-3] * x[n-numTaps]   +...+ b[0] * x[3]       
+   */
+
+  blkCnt = blockSize >> 2;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.       
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Copy four new input samples into the state buffer.       
+     ** Use 32-bit SIMD to move the 16-bit data.  Only requires two copies. */
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pSrc)++;
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pSrc)++;
+
+    /* Set all accumulators to zero */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Initialize state pointer of type q15 */
+    px1 = pState;
+
+    /* Initialize coeff pointer of type q15 */
+    pb = pCoeffs;
+
+    /* Read the first two samples from the state buffer:  x[n-N], x[n-N-1] */
+    x0 = _SIMD32_OFFSET(px1);
+
+    /* Read the second and third samples from the state buffer: x[n-N-1], x[n-N-2] */
+    x1 = _SIMD32_OFFSET(px1 + 1u);
+
+    px1 += 2u;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.
+     ** Repeat until we've computed numTaps-(numTaps%4) coefficients. */
+    tapCnt = numTaps >> 2;
+
+    while(tapCnt > 0u)
+    {
+      /* Read the first two coefficients using SIMD:  b[N] and b[N-1] coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* acc0 +=  b[N] * x[n-N] + b[N-1] * x[n-N-1] */
+      acc0 = __SMLALD(x0, c0, acc0);
+
+      /* acc1 +=  b[N] * x[n-N-1] + b[N-1] * x[n-N-2] */
+      acc1 = __SMLALD(x1, c0, acc1);
+
+      /* Read state x[n-N-2], x[n-N-3] */
+      x2 = _SIMD32_OFFSET(px1);
+
+      /* Read state x[n-N-3], x[n-N-4] */
+      x3 = _SIMD32_OFFSET(px1 + 1u);
+
+      /* acc2 +=  b[N] * x[n-N-2] + b[N-1] * x[n-N-3] */
+      acc2 = __SMLALD(x2, c0, acc2);
+
+      /* acc3 +=  b[N] * x[n-N-3] + b[N-1] * x[n-N-4] */
+      acc3 = __SMLALD(x3, c0, acc3);
+
+      /* Read coefficients b[N-2], b[N-3] */
+      c0 = *__SIMD32(pb)++;
+
+      /* acc0 +=  b[N-2] * x[n-N-2] + b[N-3] * x[n-N-3] */
+      acc0 = __SMLALD(x2, c0, acc0);
+
+      /* acc1 +=  b[N-2] * x[n-N-3] + b[N-3] * x[n-N-4] */
+      acc1 = __SMLALD(x3, c0, acc1);
+
+      /* Read state x[n-N-4], x[n-N-5] */
+      x0 = _SIMD32_OFFSET(px1 + 2u);
+
+      /* Read state x[n-N-5], x[n-N-6] */
+      x1 = _SIMD32_OFFSET(px1 + 3u);
+
+      /* acc2 +=  b[N-2] * x[n-N-4] + b[N-3] * x[n-N-5] */
+      acc2 = __SMLALD(x0, c0, acc2);
+
+      /* acc3 +=  b[N-2] * x[n-N-5] + b[N-3] * x[n-N-6] */
+      acc3 = __SMLALD(x1, c0, acc3);
+
+      px1 += 4u;
+
+      tapCnt--;
+
+    }
+
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps.       
+     ** This is always 2 taps since the filter length is even. */
+    if((numTaps & 0x3u) != 0u)
+    {
+      /* Read 2 coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* Fetch 4 state variables */
+      x2 = _SIMD32_OFFSET(px1);
+
+      x3 = _SIMD32_OFFSET(px1 + 1u);
+
+      /* Perform the multiply-accumulates */
+      acc0 = __SMLALD(x0, c0, acc0);
+
+      px1 += 2u;
+
+      acc1 = __SMLALD(x1, c0, acc1);
+      acc2 = __SMLALD(x2, c0, acc2);
+      acc3 = __SMLALD(x3, c0, acc3);
+    }
+
+    /* The results in the 4 accumulators are in 34.30 format.  Convert to 1.15 with saturation.
+     ** Then store the 4 outputs in the destination buffer. */
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+#else
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+#endif /*      #ifndef ARM_MATH_BIG_ENDIAN       */
+
+
+
+    /* Advance the state pointer by 4 to process the next group of 4 samples */
+    pState = pState + 4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.       
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+  while(blkCnt > 0u)
+  {
+    /* Copy one sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set the accumulator to zero */
+    acc0 = 0;
+
+    /* Initialize state pointer of type q15 */
+    px1 = pState;
+
+    /* Initialize coeff pointer of type q15 */
+    pb = pCoeffs;
+
+    tapCnt = numTaps >> 1;
+
+    do
+    {
+
+      c0 = *__SIMD32(pb)++;
+      x0 = *__SIMD32(px1)++;
+
+      acc0 = __SMLALD(x0, c0, acc0);
+      tapCnt--;
+    }
+    while(tapCnt > 0u);
+
+    /* The result is in 34.30 format.  Convert to 1.15 with saturation.
+     ** Then store the output in the destination buffer. */
+    *pDst++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.       
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  /* Calculation of count for copying integer writes */
+  tapCnt = (numTaps - 1u) >> 2;
+
+  while(tapCnt > 0u)
+  {
+
+    /* Copy state values to start of state buffer */
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+
+    tapCnt--;
+
+  }
+
+  /* Calculation of count for remaining q15_t data */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* copy remaining data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+}
+
+#else /* UNALIGNED_SUPPORT_DISABLE */
+
+void arm_fir_q15(
+  const arm_fir_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q63_t acc0, acc1, acc2, acc3;                  /* Accumulators */
+  q15_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q15_t *px;                                     /* Temporary q15 pointer for SIMD state buffer accesses */
+  q31_t x0, x1, x2, c0;                          /* Temporary variables to hold SIMD state and coefficient values */
+  uint32_t numTaps = S->numTaps;                 /* Number of taps in the filter */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+
+
+  /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Apply loop unrolling and compute 4 output values simultaneously.      
+   * The variables acc0 ... acc3 hold output values that are being computed:      
+   *      
+   *    acc0 =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0]      
+   *    acc1 =  b[numTaps-1] * x[n-numTaps] +   b[numTaps-2] * x[n-numTaps-1] + b[numTaps-3] * x[n-numTaps-2] +...+ b[0] * x[1]      
+   *    acc2 =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps] +   b[numTaps-3] * x[n-numTaps-1] +...+ b[0] * x[2]      
+   *    acc3 =  b[numTaps-1] * x[n-numTaps+2] + b[numTaps-2] * x[n-numTaps+1] + b[numTaps-3] * x[n-numTaps]   +...+ b[0] * x[3]      
+   */
+
+  blkCnt = blockSize >> 2;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.      
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Copy four new input samples into the state buffer.
+     ** Unaligned SIMD access is disabled here, so copy one 16-bit sample at a time. */
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+
+
+    /* Set all accumulators to zero */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Initialize state pointer of type q15 */
+    px = pState;
+
+    /* Initialize coeff pointer of type q15 */
+    pb = pCoeffs;
+
+    /* Read the first two samples from the state buffer:  x[n-N], x[n-N-1] */
+    x0 = *__SIMD32(px)++;
+
+    /* Read the third and fourth samples from the state buffer: x[n-N-2], x[n-N-3] */
+    x2 = *__SIMD32(px)++;
+
+    /* Loop over the number of taps.  Unroll by a factor of 4.      
+     ** Repeat until we've computed numTaps-(numTaps%4) coefficients. */
+    tapCnt = numTaps >> 2;
+
+    while(tapCnt > 0)
+    {
+      /* Read the first two coefficients using SIMD:  b[N] and b[N-1] coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* acc0 +=  b[N] * x[n-N] + b[N-1] * x[n-N-1] */
+      acc0 = __SMLALD(x0, c0, acc0);
+
+      /* acc2 +=  b[N] * x[n-N-2] + b[N-1] * x[n-N-3] */
+      acc2 = __SMLALD(x2, c0, acc2);
+
+      /* pack  x[n-N-1] and x[n-N-2] */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x2, x0, 0);
+#else
+      x1 = __PKHBT(x0, x2, 0);
+#endif
+
+      /* Read state x[n-N-4], x[n-N-5] */
+      x0 = _SIMD32_OFFSET(px);
+
+      /* acc1 +=  b[N] * x[n-N-1] + b[N-1] * x[n-N-2] */
+      acc1 = __SMLALDX(x1, c0, acc1);
+
+      /* pack  x[n-N-3] and x[n-N-4] */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x0, x2, 0);
+#else
+      x1 = __PKHBT(x2, x0, 0);
+#endif
+
+      /* acc3 +=  b[N] * x[n-N-3] + b[N-1] * x[n-N-4] */
+      acc3 = __SMLALDX(x1, c0, acc3);
+
+      /* Read coefficients b[N-2], b[N-3] */
+      c0 = *__SIMD32(pb)++;
+
+      /* acc0 +=  b[N-2] * x[n-N-2] + b[N-3] * x[n-N-3] */
+      acc0 = __SMLALD(x2, c0, acc0);
+
+      /* Read state x[n-N-6], x[n-N-7] with offset */
+      x2 = _SIMD32_OFFSET(px + 2u);
+
+      /* acc2 +=  b[N-2] * x[n-N-4] + b[N-3] * x[n-N-5] */
+      acc2 = __SMLALD(x0, c0, acc2);
+
+      /* acc1 +=  b[N-2] * x[n-N-3] + b[N-3] * x[n-N-4] */
+      acc1 = __SMLALDX(x1, c0, acc1);
+
+      /* pack  x[n-N-5] and x[n-N-6] */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x2, x0, 0);
+#else
+      x1 = __PKHBT(x0, x2, 0);
+#endif
+
+      /* acc3 +=  b[N-2] * x[n-N-5] + b[N-3] * x[n-N-6] */
+      acc3 = __SMLALDX(x1, c0, acc3);
+
+      /* Update state pointer for next state reading */
+      px += 4u;
+
+      /* Decrement tap count */
+      tapCnt--;
+
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps.       
+     ** This is always 2 taps since the filter length is even. */
+    if((numTaps & 0x3u) != 0u)
+    {
+
+      /* Read last two coefficients */
+      c0 = *__SIMD32(pb)++;
+
+      /* Perform the multiply-accumulates */
+      acc0 = __SMLALD(x0, c0, acc0);
+      acc2 = __SMLALD(x2, c0, acc2);
+
+      /* pack state variables */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x2, x0, 0);
+#else
+      x1 = __PKHBT(x0, x2, 0);
+#endif
+
+      /* Read last state variables */
+      x0 = *__SIMD32(px);
+
+      /* Perform the multiply-accumulates */
+      acc1 = __SMLALDX(x1, c0, acc1);
+
+      /* pack state variables */
+#ifndef ARM_MATH_BIG_ENDIAN
+      x1 = __PKHBT(x0, x2, 0);
+#else
+      x1 = __PKHBT(x2, x0, 0);
+#endif
+
+      /* Perform the multiply-accumulates */
+      acc3 = __SMLALDX(x1, c0, acc3);
+    }
+
+    /* The results in the 4 accumulators are in 34.30 format.  Convert to 1.15 with saturation.
+     ** Then store the 4 outputs in the destination buffer. */
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc0 >> 15), 16), __SSAT((acc1 >> 15), 16), 16);
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc2 >> 15), 16), __SSAT((acc3 >> 15), 16), 16);
+
+#else
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc1 >> 15), 16), __SSAT((acc0 >> 15), 16), 16);
+
+    *__SIMD32(pDst)++ =
+      __PKHBT(__SSAT((acc3 >> 15), 16), __SSAT((acc2 >> 15), 16), 16);
+
+#endif /*      #ifndef ARM_MATH_BIG_ENDIAN       */
+
+    /* Advance the state pointer by 4 to process the next group of 4 samples */
+    pState = pState + 4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.      
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+  while(blkCnt > 0u)
+  {
+    /* Copy one sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set the accumulator to zero */
+    acc0 = 0;
+
+    /* Initialize state and coefficient pointers */
+    px = pState;
+    pb = pCoeffs;
+
+    tapCnt = numTaps >> 1u;
+
+    do
+    {
+      acc0 += (q31_t) *px++ * *pb++;
+      acc0 += (q31_t) *px++ * *pb++;
+      tapCnt--;
+    }
+    while(tapCnt > 0u);
+
+    /* The result is in 34.30 format.  Convert to 1.15 with saturation.
+     ** Then store the output in the destination buffer. */
+    *pDst++ = (q15_t) (__SSAT((acc0 >> 15), 16));
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.      
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  /* Calculation of count for copying integer writes */
+  tapCnt = (numTaps - 1u) >> 2;
+
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    tapCnt--;
+
+  }
+
+  /* Calculation of count for remaining q15_t data */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* copy remaining data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+}
+
+
+#endif /* #ifndef UNALIGNED_SUPPORT_DISABLE */
+
+#else /* ARM_MATH_CM0_FAMILY */
+
+
+/* Run the below code for Cortex-M0 */
+
+void arm_fir_q15(
+  const arm_fir_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+
+
+
+  q15_t *px;                                     /* Temporary pointer for state buffer */
+  q15_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q63_t acc;                                     /* Accumulator */
+  uint32_t numTaps = S->numTaps;                 /* Number of taps in the filter */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Initialize blkCnt with blockSize */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy one sample at a time into state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize Coefficient pointer */
+    pb = pCoeffs;
+
+    tapCnt = numTaps;
+
+    /* Perform the multiply-accumulates */
+    do
+    {
+      /* acc =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0] */
+      acc += (q31_t) * px++ * *pb++;
+      tapCnt--;
+    } while(tapCnt > 0u);
+
+    /* The result is in 34.30 format.  Convert to 1.15 with saturation.
+     ** Then store the output in the destination buffer. */
+    *pDst++ = (q15_t) __SSAT((acc >> 15u), 16);
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the samples loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  /* Copy (numTaps - 1) values */
+  tapCnt = (numTaps - 1u);
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+}
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+
+
+/**       
+ * @} end of FIR group       
+ */
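The scaling described in the arm_fir_q15.c header (1.15 inputs, 2.30 products, a 64-bit 34.30 accumulator, then a 15-bit shift and saturation to 1.15) can be sketched in portable C without the SIMD intrinsics. The names `sat_q15` and `fir_q15_ref` are hypothetical and are not part of CMSIS-DSP. The sketch assumes the same state layout as the code above: numTaps - 1 history samples followed by room for blockSize new samples, with coefficients indexed in the order the inner loop walks them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a wide value to the signed 16-bit (1.15) range. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar reference, not the CMSIS API.
 * state must hold (numTaps - 1 + blockSize) samples. */
static void fir_q15_ref(const int16_t *coeffs, size_t numTaps,
                        int16_t *state, const int16_t *src,
                        int16_t *dst, size_t blockSize)
{
    assert(numTaps > 0);

    int16_t *in = state + (numTaps - 1);   /* where new samples are written */

    for (size_t n = 0; n < blockSize; n++) {
        int64_t acc = 0;                   /* 34.30 accumulator */

        in[n] = src[n];
        for (size_t k = 0; k < numTaps; k++)
            acc += (int32_t)state[n + k] * coeffs[k];   /* 2.30 product */

        /* Drop 15 fractional bits, then saturate to 1.15.
         * Assumes >> on a negative value is an arithmetic shift. */
        dst[n] = sat_q15(acc >> 15);
    }

    /* Move the last numTaps - 1 samples to the start for the next call. */
    for (size_t k = 0; k + 1 < numTaps; k++)
        state[k] = state[blockSize + k];
}
```

A two-tap averaging filter with both coefficients set to 16384 (0.5 in 1.15) turns the inputs 1000, 2000, 3000, 4000 into 500, 1500, 2500, 3500 when the state starts at zero.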

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,365 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_q31.c    
+*    
+* Description:	Q31 FIR filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR    
+ * @{    
+ */
+
+/**    
+ * @param[in] *S points to an instance of the Q31 FIR filter structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data.    
+ * @param[in] blockSize number of samples to process per call.    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows, it wraps around rather than clips.
+ * In order to avoid overflows completely the input signal must be scaled down by log2(numTaps) bits.    
+ * After all multiply-accumulates are performed, the 2.62 accumulator is right shifted by 31 bits and cast to 1.31 format to yield the final result; no saturation is applied.
+ *    
+ * \par    
+ * Refer to the function <code>arm_fir_fast_q31()</code> for a faster but less precise implementation of this filter for Cortex-M3 and Cortex-M4.    
+ */
+
+void arm_fir_q31(
+  const arm_fir_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pState = S->pState;                     /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *pStateCurnt;                            /* Points to the current sample of the state */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t x0, x1, x2;                              /* Temporary variables to hold state */
+  q31_t c0;                                      /* Temporary variable to hold coefficient value */
+  q31_t *px;                                     /* Temporary pointer for state */
+  q31_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q63_t acc0, acc1, acc2;                        /* Accumulators */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  uint32_t i, tapCnt, blkCnt, tapCntN3;          /* Loop counters */
+
+  /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Apply loop unrolling and compute 3 output values simultaneously.
+   * The variables acc0 ... acc2 hold output values that are being computed:
+   *
+   *    acc0 =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0]
+   *    acc1 =  b[numTaps-1] * x[n-numTaps] +   b[numTaps-2] * x[n-numTaps-1] + b[numTaps-3] * x[n-numTaps-2] +...+ b[0] * x[1]
+   *    acc2 =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps] +   b[numTaps-3] * x[n-numTaps-1] +...+ b[0] * x[2]
+   *
+   */
+  blkCnt = blockSize / 3;
+  blockSize = blockSize - (3 * blkCnt);
+
+  tapCnt = numTaps / 3;
+  tapCntN3 = numTaps - (3 * tapCnt);
+
+  /* First part of the processing with loop unrolling.  Compute 3 outputs at a time.
+   ** A second loop below computes the remaining 1 to 2 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Copy three new input samples into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set all accumulators to zero */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+    /* Read the first two samples from the state buffer:    
+     *  x[n-numTaps], x[n-numTaps-1] */
+    x0 = *(px++);
+    x1 = *(px++);
+
+    /* Loop unrolling.  Process 3 taps at a time. */
+    i = tapCnt;
+
+    while(i > 0u)
+    {
+      /* Read the b[numTaps] coefficient */
+      c0 = *pb;
+
+      /* Read x[n-numTaps-2] sample */
+      x2 = *(px++);
+
+      /* Perform the multiply-accumulates */
+      acc0 += ((q63_t) x0 * c0);
+      acc1 += ((q63_t) x1 * c0);
+      acc2 += ((q63_t) x2 * c0);
+
+      /* Read the coefficient and state */
+      c0 = *(pb + 1u);
+      x0 = *(px++);
+
+      /* Perform the multiply-accumulates */
+      acc0 += ((q63_t) x1 * c0);
+      acc1 += ((q63_t) x2 * c0);
+      acc2 += ((q63_t) x0 * c0);
+
+      /* Read the coefficient and state */
+      c0 = *(pb + 2u);
+      x1 = *(px++);
+
+      /* update coefficient pointer */
+      pb += 3u;
+
+      /* Perform the multiply-accumulates */
+      acc0 += ((q63_t) x2 * c0);
+      acc1 += ((q63_t) x0 * c0);
+      acc2 += ((q63_t) x1 * c0);
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* If the filter length is not a multiple of 3, compute the remaining filter taps */
+
+    i = tapCntN3;
+
+    while(i > 0u)
+    {
+      /* Read coefficients */
+      c0 = *(pb++);
+
+      /* Fetch 1 state variable */
+      x2 = *(px++);
+
+      /* Perform the multiply-accumulates */
+      acc0 += ((q63_t) x0 * c0);
+      acc1 += ((q63_t) x1 * c0);
+      acc2 += ((q63_t) x2 * c0);
+
+      /* Reuse the present sample states for next sample */
+      x0 = x1;
+      x1 = x2;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 3 to process the next group of 3 samples */
+    pState = pState + 3;
+
+    /* The results in the 3 accumulators are in 2.62 format.  Convert to 1.31
+     ** Then store the 3 outputs in the destination buffer. */
+    *pDst++ = (q31_t) (acc0 >> 31u);
+    *pDst++ = (q31_t) (acc1 >> 31u);
+    *pDst++ = (q31_t) (acc2 >> 31u);
+
+    /* Decrement the samples loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 3, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+
+  while(blockSize > 0u)
+  {
+    /* Copy one sample at a time into state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set the accumulator to zero */
+    acc0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize Coefficient pointer */
+    pb = (pCoeffs);
+
+    i = numTaps;
+
+    /* Perform the multiply-accumulates */
+    do
+    {
+      acc0 += (q63_t) * (px++) * (*(pb++));
+      i--;
+    } while(i > 0u);
+
+    /* The result is in 2.62 format.  Convert to 1.31    
+     ** Then store the output in the destination buffer. */
+    *pDst++ = (q31_t) (acc0 >> 31u);
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the samples loop counter */
+    blockSize--;
+  }
+
+  /* Processing is complete.    
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  tapCnt = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* Copy the remaining q31_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+/* Run the below code for Cortex-M0 */
+
+  q31_t *px;                                     /* Temporary pointer for state */
+  q31_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  q63_t acc;                                     /* Accumulator */
+  uint32_t numTaps = S->numTaps;                 /* Length of the filter */
+  uint32_t i, tapCnt, blkCnt;                    /* Loop counters */
+
+  /* S->pState buffer contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Initialize blkCnt with blockSize */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy one sample at a time into state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize Coefficient pointer */
+    pb = pCoeffs;
+
+    i = numTaps;
+
+    /* Perform the multiply-accumulates */
+    do
+    {
+      /* acc =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0] */
+      acc += (q63_t) * px++ * *pb++;
+      i--;
+    } while(i > 0u);
+
+    /* The result is in 2.62 format.  Convert to 1.31         
+     ** Then store the output in the destination buffer. */
+    *pDst++ = (q31_t) (acc >> 31u);
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the samples loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  /* Copy numTaps - 1 values */
+  tapCnt = numTaps - 1u;
+
+  /* Copy the data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+
+#endif /*  #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR group    
+ */
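The scalar multiply-accumulate loop that closes arm_fir_q31.c can be read in isolation as a fixed-point dot product. The following is a minimal sketch under stated assumptions, not the library's code: `q31_dot_to_q31` is a hypothetical helper name, plain `int32_t`/`int64_t` stand in for `q31_t`/`q63_t`, and the right shift of a negative value is assumed to be arithmetic, as it is on the usual ARM toolchains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CMSIS function): one FIR output sample from
 * n state/coefficient pairs, both in 1.31 fixed point. */
static int32_t q31_dot_to_q31(const int32_t *x, const int32_t *b, size_t n)
{
    int64_t acc = 0;

    assert(x != NULL && b != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so each 1.31 * 1.31 product keeps
         * its full 2.62 precision. */
        acc += (int64_t)x[i] * b[i];
    }

    /* Drop 31 fractional bits to return to 1.31. The narrowing cast does
     * not saturate, so the caller must keep the sum within range. */
    return (int32_t)(acc >> 31);
}
```

Widening to 64 bits before the multiply is the important step: multiplying two 32-bit operands first and casting afterwards would lose the upper half of the product.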

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,397 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_q7.c    
+*    
+* Description:  Q7 FIR filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR    
+ * @{    
+ */
+
+/**    
+ * @param[in]   *S points to an instance of the Q7 FIR filter structure.    
+ * @param[in]   *pSrc points to the block of input data.    
+ * @param[out]  *pDst points to the block of output data.    
+ * @param[in]   blockSize number of samples to process per call.    
+ * @return 	none.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using a 32-bit internal accumulator.    
+ * Both coefficients and state variables are represented in 1.7 format and multiplications yield a 2.14 result.    
+ * The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format.    
+ * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.    
+ * The accumulator is converted to 18.7 format by discarding the low 7 bits.    
+ * Finally, the result is saturated to 1.7 format.
+ */
+
+void arm_fir_q7(
+  const arm_fir_instance_q7 * S,
+  q7_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q7_t *pState = S->pState;                      /* State pointer */
+  q7_t *pCoeffs = S->pCoeffs;                    /* Coefficient pointer */
+  q7_t *pStateCurnt;                             /* Points to the current sample of the state */
+  q7_t x0, x1, x2, x3;                           /* Temporary variables to hold state */
+  q7_t c0;                                       /* Temporary variable to hold coefficient value */
+  q7_t *px;                                      /* Temporary pointer for state */
+  q7_t *pb;                                      /* Temporary pointer for coefficient buffer */
+  q31_t acc0, acc1, acc2, acc3;                  /* Accumulators */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  uint32_t i, tapCnt, blkCnt;                    /* Loop counters */
+
+  /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Apply loop unrolling and compute 4 output values simultaneously.    
+   * The variables acc0 ... acc3 hold output values that are being computed:    
+   *    
+   *    acc0 =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0]    
+   *    acc1 =  b[numTaps-1] * x[n-numTaps] +   b[numTaps-2] * x[n-numTaps-1] + b[numTaps-3] * x[n-numTaps-2] +...+ b[0] * x[1]    
+   *    acc2 =  b[numTaps-1] * x[n-numTaps+1] + b[numTaps-2] * x[n-numTaps] +   b[numTaps-3] * x[n-numTaps-1] +...+ b[0] * x[2]    
+   *    acc3 =  b[numTaps-1] * x[n-numTaps+2] + b[numTaps-2] * x[n-numTaps+1] + b[numTaps-3] * x[n-numTaps]   +...+ b[0] * x[3]    
+   */
+  blkCnt = blockSize >> 2;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Copy four new input samples into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set all accumulators to zero */
+    acc0 = 0;
+    acc1 = 0;
+    acc2 = 0;
+    acc3 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+    /* Read the first three samples from the state buffer:    
+     *  x[n-numTaps], x[n-numTaps-1], x[n-numTaps-2] */
+    x0 = *(px++);
+    x1 = *(px++);
+    x2 = *(px++);
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+    i = tapCnt;
+
+    while(i > 0u)
+    {
+      /* Read the b[numTaps] coefficient */
+      c0 = *pb;
+
+      /* Read x[n-numTaps-3] sample */
+      x3 = *px;
+      
+      /* acc0 +=  b[numTaps] * x[n-numTaps] */
+      acc0 += ((q15_t) x0 * c0);
+
+      /* acc1 +=  b[numTaps] * x[n-numTaps-1] */
+      acc1 += ((q15_t) x1 * c0);
+
+      /* acc2 +=  b[numTaps] * x[n-numTaps-2] */
+      acc2 += ((q15_t) x2 * c0);
+
+      /* acc3 +=  b[numTaps] * x[n-numTaps-3] */
+      acc3 += ((q15_t) x3 * c0);
+
+      /* Read the b[numTaps-1] coefficient */
+      c0 = *(pb + 1u);
+
+      /* Read x[n-numTaps-4] sample */
+      x0 = *(px + 1u);
+
+      /* Perform the multiply-accumulates */
+      acc0 += ((q15_t) x1 * c0);
+      acc1 += ((q15_t) x2 * c0);
+      acc2 += ((q15_t) x3 * c0);
+      acc3 += ((q15_t) x0 * c0);
+
+      /* Read the b[numTaps-2] coefficient */
+      c0 = *(pb + 2u);
+
+      /* Read x[n-numTaps-5] sample */
+      x1 = *(px + 2u);
+
+      /* Perform the multiply-accumulates */
+      acc0 += ((q15_t) x2 * c0);
+      acc1 += ((q15_t) x3 * c0);
+      acc2 += ((q15_t) x0 * c0);
+      acc3 += ((q15_t) x1 * c0);
+
+      /* Read the b[numTaps-3] coefficients */
+      c0 = *(pb + 3u);
+
+      /* Read x[n-numTaps-6] sample */
+      x2 = *(px + 3u);
+      
+      /* Perform the multiply-accumulates */
+      acc0 += ((q15_t) x3 * c0);
+      acc1 += ((q15_t) x0 * c0);
+      acc2 += ((q15_t) x1 * c0);
+      acc3 += ((q15_t) x2 * c0);
+
+      /* Update the coefficient and state pointers */
+      pb += 4u;
+      px += 4u;
+      
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+
+    i = numTaps - (tapCnt * 4u);
+    while(i > 0u)
+    {
+      /* Read coefficients */
+      c0 = *(pb++);
+
+      /* Fetch 1 state variable */
+      x3 = *(px++);
+
+      /* Perform the multiply-accumulates */
+      acc0 += ((q15_t) x0 * c0);
+      acc1 += ((q15_t) x1 * c0);
+      acc2 += ((q15_t) x2 * c0);
+      acc3 += ((q15_t) x3 * c0);
+
+      /* Reuse the present sample states for next sample */
+      x0 = x1;
+      x1 = x2;
+      x2 = x3;
+
+      /* Decrement the loop counter */
+      i--;
+    }
+
+    /* Advance the state pointer by 4 to process the next group of 4 samples */
+    pState = pState + 4;
+
+    /* The 4 accumulators hold 18.14 results.  Shift right by 7 and saturate to 1.7.
+     ** Then store the 4 outputs in the destination buffer. */
+    acc0 = __SSAT((acc0 >> 7u), 8);
+    *pDst++ = acc0;
+    acc1 = __SSAT((acc1 >> 7u), 8);
+    *pDst++ = acc1;
+    acc2 = __SSAT((acc2 >> 7u), 8);
+    *pDst++ = acc2;
+    acc3 = __SSAT((acc3 >> 7u), 8);
+    *pDst++ = acc3;
+
+    /* Decrement the samples loop counter */
+    blkCnt--;
+  }
+
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 4u;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy one sample at a time into state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set the accumulator to zero */
+    acc0 = 0;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize Coefficient pointer */
+    pb = (pCoeffs);
+
+    i = numTaps;
+
+    /* Perform the multiply-accumulates */
+    do
+    {
+      acc0 += (q15_t) * (px++) * (*(pb++));
+      i--;
+    } while(i > 0u);
+
+    /* The result is in 2.14 format.  Convert to 1.7    
+     ** Then store the output in the destination buffer. */
+    *pDst++ = __SSAT((acc0 >> 7u), 8);
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the samples loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.    
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+  tapCnt = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* Copy the remaining q7_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+/* Run the below code for Cortex-M0 */
+
+  uint32_t numTaps = S->numTaps;                 /* Number of taps in the filter */
+  uint32_t i, blkCnt;                            /* Loop counters */
+  q7_t *pState = S->pState;                      /* State pointer */
+  q7_t *pCoeffs = S->pCoeffs;                    /* Coefficient pointer */
+  q7_t *px, *pb;                                 /* Temporary pointers to state and coeff */
+  q31_t acc = 0;                                 /* Accumulator */
+  q7_t *pStateCurnt;                             /* Points to the current sample of the state */
+
+
+  /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = S->pState + (numTaps - 1u);
+
+  /* Initialize blkCnt with blockSize */
+  blkCnt = blockSize;
+
+  /* Process all blockSize samples, one output per iteration */
+  while(blkCnt > 0u)
+  {
+    /* Copy one sample at a time into state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Set accumulator to zero */
+    acc = 0;
+
+    /* Initialize state pointer of type q7 */
+    px = pState;
+
+    /* Initialize coeff pointer of type q7 */
+    pb = pCoeffs;
+
+
+    i = numTaps;
+
+    while(i > 0u)
+    {
+      /* acc =  b[numTaps-1] * x[n-numTaps-1] + b[numTaps-2] * x[n-numTaps-2] + b[numTaps-3] * x[n-numTaps-3] +...+ b[0] * x[0] */
+      acc += (q15_t) * px++ * *pb++;
+      i--;
+    }
+
+    /* Store the 1.7 format filter output in destination buffer */
+    *pDst++ = (q7_t) __SSAT((acc >> 7), 8);
+
+    /* Advance the state pointer by 1 to process the next sample */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.         
+   ** Now copy the last numTaps - 1 samples to the start of the state buffer.
+   ** This prepares the state buffer for the next function call. */
+
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = S->pState;
+
+
+  /* Copy numTaps - 1 values */
+  i = (numTaps - 1u);
+
+  /* Copy q7_t data */
+  while(i > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    i--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR group    
+ */
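The scaling described in the arm_fir_q7.c header (1.7 operands, 2.14 products, an 18.14 accumulator, then a shift by 7 and saturation to 8 bits) can be checked against a small portable reference. This is a sketch under stated assumptions rather than the library routine: `fir_q7_ref` and `sat_q7` are hypothetical names, `sat_q7` stands in for `__SSAT(x, 8)`, samples before the start of the input are taken as zero, `b[0]` is applied to the newest sample (which may differ from the storage order the library expects), and the right shift of a negative value is assumed to be arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a 32-bit value to the signed 8-bit range; a portable stand-in
 * for __SSAT(x, 8). */
static int8_t sat_q7(int32_t x)
{
    if (x > 127) {
        return 127;
    }
    if (x < -128) {
        return -128;
    }
    return (int8_t)x;
}

/* Hypothetical reference FIR in Q7: y[i] = sat((sum_k b[k] * x[i-k]) >> 7).
 * Samples before x[0] are treated as zero. */
static void fir_q7_ref(const int8_t *b, size_t num_taps,
                       const int8_t *x, size_t n, int8_t *y)
{
    assert(b != NULL && x != NULL && y != NULL);

    for (size_t i = 0; i < n; i++) {
        int32_t acc = 0;                     /* 18.14 accumulator */

        for (size_t k = 0; k < num_taps && k <= i; k++) {
            acc += (int32_t)b[k] * x[i - k]; /* 1.7 * 1.7 -> 2.14 */
        }

        /* Discard 7 fractional bits (18.7), then clamp to 1.7. */
        y[i] = sat_q7(acc >> 7);
    }
}
```

With two taps of 0.5 (64 in Q7), an input that steps from full-scale positive to full-scale negative stays in range; with two taps of 127 the second output would exceed 1.7 and is clamped to 127 instead of wrapping.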

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,444 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_sparse_f32.c    
+*    
+* Description:	Floating-point sparse FIR filter processing function.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup FIR_Sparse Finite Impulse Response (FIR) Sparse Filters    
+ *    
+ * This group of functions implements sparse FIR filters.     
+ * Sparse FIR filters are equivalent to standard FIR filters except that most of the coefficients are equal to zero.   
+ * Sparse filters are used for simulating reflections in communications and audio applications.   
+ *   
+ * There are separate functions for Q7, Q15, Q31, and floating-point data types.    
+ * The functions operate on blocks of input and output data and each call to the function processes
+ * <code>blockSize</code> samples through the filter.  <code>pSrc</code> and
+ * <code>pDst</code> point to input and output arrays respectively containing <code>blockSize</code> values.
+ *    
+ * \par Algorithm:    
+ * The sparse filter instance structure contains an array of tap indices <code>pTapDelay</code> which specifies the locations of the non-zero coefficients.
+ * This is in addition to the coefficient array <code>b</code>.   
+ * The implementation essentially skips the multiplications by zero and leads to an efficient realization.   
+ * <pre>   
+ *     y[n] = b[0] * x[n-pTapDelay[0]] + b[1] * x[n-pTapDelay[1]] + b[2] * x[n-pTapDelay[2]] + ...+ b[numTaps-1] * x[n-pTapDelay[numTaps-1]]    
+ * </pre>    
+ * \par    
+ * \image html FIRSparse.gif "Sparse FIR filter.  b[n] represents the filter coefficients"   
+ * \par    
+ * <code>pCoeffs</code> points to a coefficient array of size <code>numTaps</code>;    
+ * <code>pTapDelay</code> points to an array of nonzero indices and is also of size <code>numTaps</code>;   
+ * <code>pState</code> points to a state array of size <code>maxDelay + blockSize</code>, where   
+ * <code>maxDelay</code> is the largest offset value that is ever used in the <code>pTapDelay</code> array.   
+ * Some of the processing functions also require temporary working buffers.   
+ *   
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter.    
+ * Coefficient and offset arrays may be shared among several instances while state variable arrays cannot be shared.    
+ * There are separate instance structure declarations for each of the 4 supported data types.    
+ *    
+ * \par Initialization Functions    
+ * There is also an associated initialization function for each data type.    
+ * The initialization function performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.    
+ * To do this manually without calling the init function, assign the following subfields of the instance structure:
+ * numTaps, pCoeffs, pTapDelay, maxDelay, stateIndex, pState. Also set all of the values in pState to zero. 
+ *    
+ * \par    
+ * Use of the initialization function is optional.    
+ * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.    
+ * To place an instance structure into a const data section, the instance structure must be manually initialized.    
+ * Set the values in the state buffer to zeros before static initialization.    
+ * The code below statically initializes each of the 4 different data type filter instance structures:
+ * <pre>    
+ *arm_fir_sparse_instance_f32 S = {numTaps, 0, pState, pCoeffs, maxDelay, pTapDelay};    
+ *arm_fir_sparse_instance_q31 S = {numTaps, 0, pState, pCoeffs, maxDelay, pTapDelay};    
+ *arm_fir_sparse_instance_q15 S = {numTaps, 0, pState, pCoeffs, maxDelay, pTapDelay};    
+ *arm_fir_sparse_instance_q7 S =  {numTaps, 0, pState, pCoeffs, maxDelay, pTapDelay};    
+ * </pre>    
+ * \par    
+ *    
+ * \par Fixed-Point Behavior    
+ * Care must be taken when using the fixed-point versions of the sparse FIR filter functions.    
+ * In particular, the overflow and saturation behavior of the accumulator used in each function must be considered.    
+ * Refer to the function specific documentation below for usage guidelines.    
+ */
+
+/**    
+ * @addtogroup FIR_Sparse    
+ * @{    
+ */
+
+/**   
+ * @brief Processing function for the floating-point sparse FIR filter.   
+ * @param[in]  *S          points to an instance of the floating-point sparse FIR structure.   
+ * @param[in]  *pSrc       points to the block of input data.   
+ * @param[out] *pDst       points to the block of output data.
+ * @param[in]  *pScratchIn points to a temporary buffer of size blockSize.   
+ * @param[in]  blockSize   number of input samples to process per call.   
+ * @return none.   
+ */
+
+void arm_fir_sparse_f32(
+  arm_fir_sparse_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  float32_t * pScratchIn,
+  uint32_t blockSize)
+{
+
+  float32_t *pState = S->pState;                 /* State pointer */
+  float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+  float32_t *px;                                 /* Scratch buffer pointer */
+  float32_t *py = pState;                        /* Temporary pointers for state buffer */
+  float32_t *pb = pScratchIn;                    /* Temporary pointers for scratch buffer */
+  float32_t *pOut;                               /* Destination pointer */
+  int32_t *pTapDelay = S->pTapDelay;             /* Pointer to the array containing offset of the non-zero tap values. */
+  uint32_t delaySize = S->maxDelay + blockSize;  /* state length */
+  uint16_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter  */
+  int32_t readIndex;                             /* Read index of the state buffer */
+  uint32_t tapCnt, blkCnt;                       /* loop counters */
+  float32_t coeff = *pCoeffs++;                  /* Read the first coefficient value */
+
+
+
+  /* BlockSize of Input samples are copied into the state buffer */
+  /* StateIndex points to the starting position to write in the state buffer */
+  arm_circularWrite_f32((int32_t *) py, delaySize, &S->stateIndex, 1,
+                        (int32_t *) pSrc, 1, blockSize);
+
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Working pointer for state buffer is updated */
+  py = pState;
+
+  /* blockSize samples are read from the state buffer */
+  arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+                       (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+                       blockSize);
+
+  /* Working pointer for the scratch buffer */
+  px = pb;
+
+  /* Working pointer for destination buffer */
+  pOut = pDst;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Loop over the blockSize. Unroll by a factor of 4.    
+   * Compute 4 Multiplications at a time. */
+  blkCnt = blockSize >> 2u;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiplications and store in destination buffer */
+    *pOut++ = *px++ * coeff;
+    *pOut++ = *px++ * coeff;
+    *pOut++ = *px++ * coeff;
+    *pOut++ = *px++ * coeff;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4,    
+   * compute the remaining samples */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiplications and store in destination buffer */
+    *pOut++ = *px++ * coeff;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Load the coefficient value and    
+   * increment the coefficient buffer for the next set of state values */
+  coeff = *pCoeffs++;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Loop over the middle taps; the first and last taps are handled outside this loop, so numTaps must be at least 2. */
+  tapCnt = (uint32_t) numTaps - 2u;
+
+  while(tapCnt > 0u)
+  {
+
+    /* Working pointer for state buffer is updated */
+    py = pState;
+
+    /* blockSize samples are read from the state buffer */
+    arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+                         (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+                         blockSize);
+
+    /* Working pointer for the scratch buffer */
+    px = pb;
+
+    /* Working pointer for destination buffer */
+    pOut = pDst;
+
+    /* Loop over the blockSize. Unroll by a factor of 4.    
+     * Compute 4 MACS at a time. */
+    blkCnt = blockSize >> 2u;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      *pOut++ += *px++ * coeff;
+      *pOut++ += *px++ * coeff;
+      *pOut++ += *px++ * coeff;
+      *pOut++ += *px++ * coeff;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize is not a multiple of 4,    
+     * compute the remaining samples */
+    blkCnt = blockSize % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      *pOut++ += *px++ * coeff;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Load the coefficient value and    
+     * increment the coefficient buffer for the next set of state values */
+    coeff = *pCoeffs++;
+
+    /* Read Index, from where the state buffer should be read, is calculated. */
+    readIndex = ((int32_t) S->stateIndex -
+                 (int32_t) blockSize) - *pTapDelay++;
+
+    /* Wraparound of readIndex */
+    if(readIndex < 0)
+    {
+      readIndex += (int32_t) delaySize;
+    }
+
+    /* Decrement the tap loop counter */
+    tapCnt--;
+  }
+	
+	/* Compute last tap without the final read of pTapDelay */
+
+	/* Working pointer for state buffer is updated */
+	py = pState;
+
+	/* blockSize samples are read from the state buffer */
+	arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+											 (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+											 blockSize);
+
+	/* Working pointer for the scratch buffer */
+	px = pb;
+
+	/* Working pointer for destination buffer */
+	pOut = pDst;
+
+	/* Loop over the blockSize. Unroll by a factor of 4.    
+	 * Compute 4 MACS at a time. */
+	blkCnt = blockSize >> 2u;
+
+	while(blkCnt > 0u)
+	{
+		/* Perform Multiply-Accumulate */
+		*pOut++ += *px++ * coeff;
+		*pOut++ += *px++ * coeff;
+		*pOut++ += *px++ * coeff;
+		*pOut++ += *px++ * coeff;
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}
+
+	/* If the blockSize is not a multiple of 4,    
+	 * compute the remaining samples */
+	blkCnt = blockSize % 0x4u;
+
+	while(blkCnt > 0u)
+	{
+		/* Perform Multiply-Accumulate */
+		*pOut++ += *px++ * coeff;
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}
+
+#else
+
+/* Run the below code for Cortex-M0 */
+
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiplications and store in destination buffer */
+    *pOut++ = *px++ * coeff;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Load the coefficient value and           
+   * increment the coefficient buffer for the next set of state values */
+  coeff = *pCoeffs++;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Loop over the middle taps; the first and last taps are handled outside this loop, so numTaps must be at least 2. */
+  tapCnt = (uint32_t) numTaps - 2u;
+
+  while(tapCnt > 0u)
+  {
+
+    /* Working pointer for state buffer is updated */
+    py = pState;
+
+    /* blockSize samples are read from the state buffer */
+    arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+                         (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+                         blockSize);
+
+    /* Working pointer for the scratch buffer */
+    px = pb;
+
+    /* Working pointer for destination buffer */
+    pOut = pDst;
+
+    blkCnt = blockSize;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      *pOut++ += *px++ * coeff;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Load the coefficient value and           
+     * increment the coefficient buffer for the next set of state values */
+    coeff = *pCoeffs++;
+
+    /* Read Index, from where the state buffer should be read, is calculated. */
+    readIndex =
+      ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+    /* Wraparound of readIndex */
+    if(readIndex < 0)
+    {
+      readIndex += (int32_t) delaySize;
+    }
+
+    /* Decrement the tap loop counter */
+    tapCnt--;
+  }
+	
+	/* Compute last tap without the final read of pTapDelay */	
+	
+	/* Working pointer for state buffer is updated */
+	py = pState;
+
+	/* blockSize samples are read from the state buffer */
+	arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+											 (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+											 blockSize);
+
+	/* Working pointer for the scratch buffer */
+	px = pb;
+
+	/* Working pointer for destination buffer */
+	pOut = pDst;
+
+	blkCnt = blockSize;
+
+	while(blkCnt > 0u)
+	{
+		/* Perform Multiply-Accumulate */
+		*pOut++ += *px++ * coeff;
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY        */
+
+}
+
+/**    
+ * @} end of FIR_Sparse group    
+ */
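The sparse FIR equation in the arm_fir_sparse_f32.c header, y[n] = b[0] * x[n-pTapDelay[0]] + ... + b[numTaps-1] * x[n-pTapDelay[numTaps-1]], can be written directly as a per-sample reference, which is useful for checking the block-oriented routine. This is only a sketch under stated assumptions: `fir_sparse_ref` is a hypothetical name, it keeps no circular state buffer, and samples before the start of the input are treated as zero. Because it sums taps per output sample rather than per tap across the block, its floating-point rounding may differ slightly from the library's in general.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference for a sparse FIR:
 * y[i] = sum over k of b[k] * x[i - tap_delay[k]].
 * Only the num_taps non-zero coefficients are visited. */
static void fir_sparse_ref(const float *b, const int32_t *tap_delay,
                           size_t num_taps,
                           const float *x, size_t n, float *y)
{
    assert(b != NULL && tap_delay != NULL && x != NULL && y != NULL);

    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;

        for (size_t k = 0; k < num_taps; k++) {
            assert(tap_delay[k] >= 0);
            size_t d = (size_t)tap_delay[k];

            /* Skip taps that would read before the first input sample. */
            if (d <= i) {
                acc += b[k] * x[i - d];
            }
        }

        y[i] = acc;
    }
}
```

Feeding a unit impulse through taps {1.0, 0.5} at delays {0, 3} returns the coefficients at exactly those delays, which is a quick way to confirm that a `pTapDelay` table is laid out as intended.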

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,107 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_sparse_init_f32.c    
+*    
+* Description:	Floating-point sparse FIR filter initialization function.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Sparse    
+ * @{    
+ */
+
+/**   
+ * @brief  Initialization function for the floating-point sparse FIR filter.   
+ * @param[in,out] *S         points to an instance of the floating-point sparse FIR structure.   
+ * @param[in]     numTaps    number of nonzero coefficients in the filter.   
+ * @param[in]     *pCoeffs   points to the array of filter coefficients.   
+ * @param[in]     *pState    points to the state buffer.   
+ * @param[in]     *pTapDelay points to the array of offset times.   
+ * @param[in]     maxDelay   maximum offset time supported.   
+ * @param[in]     blockSize  number of samples that will be processed per block.   
+ * @return none   
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> holds the filter coefficients and has length <code>numTaps</code>.    
+ * <code>pState</code> holds the filter's state variables and must be of length    
+ * <code>maxDelay + blockSize</code>, where <code>maxDelay</code>    
+ * is the maximum number of delay line values.    
+ * <code>blockSize</code> is the    
+ * number of samples processed by the <code>arm_fir_sparse_f32()</code> function.    
+ */
+
+void arm_fir_sparse_init_f32(
+  arm_fir_sparse_instance_f32 * S,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  int32_t * pTapDelay,
+  uint16_t maxDelay,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Assign TapDelay pointer */
+  S->pTapDelay = pTapDelay;
+
+  /* Assign MaxDelay */
+  S->maxDelay = maxDelay;
+
+  /* reset the stateIndex to 0 */
+  S->stateIndex = 0u;
+
+  /* Clear state buffer and size is always maxDelay + blockSize */
+  memset(pState, 0, (maxDelay + blockSize) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR_Sparse group    
+ */
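
The description above requires a state buffer of `maxDelay + blockSize` elements, which the init function clears. The following is a minimal host-side sketch of that sizing rule, not the library code itself. The struct `sparse_fir_f32_sketch`, the `SKETCH_*` constants, and `sparse_fir_init_sketch` are hypothetical stand-ins for `arm_fir_sparse_instance_f32` and `arm_fir_sparse_init_f32()`. The range check on the tap delays is an added assumption and does not appear in the original function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in mirroring the fields assigned by the init function;
 * the real type is arm_fir_sparse_instance_f32 from arm_math.h. */
typedef struct {
  uint16_t numTaps;
  uint16_t stateIndex;
  float *pState;
  float *pCoeffs;
  uint16_t maxDelay;
  int32_t *pTapDelay;
} sparse_fir_f32_sketch;

#define SKETCH_NUM_TAPS   3u
#define SKETCH_MAX_DELAY  20u
#define SKETCH_BLOCK_SIZE 8u

static float   sketch_coeffs[SKETCH_NUM_TAPS] = {0.5f, 0.3f, 0.2f};
static int32_t sketch_delays[SKETCH_NUM_TAPS] = {0, 7, 20};

/* State buffer sized as maxDelay + blockSize, as the description requires */
static float sketch_state[SKETCH_MAX_DELAY + SKETCH_BLOCK_SIZE];

static void sparse_fir_init_sketch(sparse_fir_f32_sketch *S, uint16_t numTaps,
                                   float *pCoeffs, float *pState,
                                   int32_t *pTapDelay, uint16_t maxDelay,
                                   uint32_t blockSize)
{
  /* Added sanity check (assumption): each delay lies in [0, maxDelay] */
  for (uint16_t k = 0; k < numTaps; k++) {
    assert(pTapDelay[k] >= 0 && pTapDelay[k] <= (int32_t) maxDelay);
  }

  S->numTaps = numTaps;
  S->pCoeffs = pCoeffs;
  S->pTapDelay = pTapDelay;
  S->maxDelay = maxDelay;
  S->stateIndex = 0u;

  /* Clear maxDelay + blockSize elements, matching the original memset */
  memset(pState, 0, ((size_t) maxDelay + blockSize) * sizeof(float));
  S->pState = pState;
}
```

Undersizing the state buffer makes the `memset` write past its end, so deriving the array length from the same constants passed to the init call keeps the two in step.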

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,107 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_sparse_init_q15.c    
+*    
+* Description:	Q15 sparse FIR filter initialization function.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Sparse    
+ * @{    
+ */
+
+/**   
+ * @brief  Initialization function for the Q15 sparse FIR filter.   
+ * @param[in,out] *S         points to an instance of the Q15 sparse FIR structure.   
+ * @param[in]     numTaps    number of nonzero coefficients in the filter.   
+ * @param[in]     *pCoeffs   points to the array of filter coefficients.   
+ * @param[in]     *pState    points to the state buffer.   
+ * @param[in]     *pTapDelay points to the array of offset times.   
+ * @param[in]     maxDelay   maximum offset time supported.   
+ * @param[in]     blockSize  number of samples that will be processed per block.   
+ * @return none   
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> holds the filter coefficients and has length <code>numTaps</code>.    
+ * <code>pState</code> holds the filter's state variables and must be of length    
+ * <code>maxDelay + blockSize</code>, where <code>maxDelay</code>    
+ * is the maximum number of delay line values.    
+ * <code>blockSize</code> is the
+ * number of samples processed by the <code>arm_fir_sparse_q15()</code> function.
+ */
+
+void arm_fir_sparse_init_q15(
+  arm_fir_sparse_instance_q15 * S,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  int32_t * pTapDelay,
+  uint16_t maxDelay,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Assign TapDelay pointer */
+  S->pTapDelay = pTapDelay;
+
+  /* Assign MaxDelay */
+  S->maxDelay = maxDelay;
+
+  /* reset the stateIndex to 0 */
+  S->stateIndex = 0u;
+
+  /* Clear state buffer and size is always maxDelay + blockSize */
+  memset(pState, 0, (maxDelay + blockSize) * sizeof(q15_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR_Sparse group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,106 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_sparse_init_q31.c    
+*    
+* Description:	Q31 sparse FIR filter initialization function.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Sparse    
+ * @{    
+ */
+
+/**   
+ * @brief  Initialization function for the Q31 sparse FIR filter.   
+ * @param[in,out] *S         points to an instance of the Q31 sparse FIR structure.   
+ * @param[in]     numTaps    number of nonzero coefficients in the filter.   
+ * @param[in]     *pCoeffs   points to the array of filter coefficients.   
+ * @param[in]     *pState    points to the state buffer.   
+ * @param[in]     *pTapDelay points to the array of offset times.   
+ * @param[in]     maxDelay   maximum offset time supported.   
+ * @param[in]     blockSize  number of samples that will be processed per block.   
+ * @return none   
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> holds the filter coefficients and has length <code>numTaps</code>.    
+ * <code>pState</code> holds the filter's state variables and must be of length    
+ * <code>maxDelay + blockSize</code>, where <code>maxDelay</code>    
+ * is the maximum number of delay line values.    
+ * <code>blockSize</code> is the number of samples processed by the <code>arm_fir_sparse_q31()</code> function.
+ */
+
+void arm_fir_sparse_init_q31(
+  arm_fir_sparse_instance_q31 * S,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  int32_t * pTapDelay,
+  uint16_t maxDelay,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Assign TapDelay pointer */
+  S->pTapDelay = pTapDelay;
+
+  /* Assign MaxDelay */
+  S->maxDelay = maxDelay;
+
+  /* reset the stateIndex to 0 */
+  S->stateIndex = 0u;
+
+  /* Clear state buffer and size is always maxDelay + blockSize */
+  memset(pState, 0, (maxDelay + blockSize) * sizeof(q31_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR_Sparse group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_init_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,107 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_fir_sparse_init_q7.c    
+*    
+* Description:	Q7 sparse FIR filter initialization function.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE. 
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Sparse    
+ * @{    
+ */
+
+/**   
+ * @brief  Initialization function for the Q7 sparse FIR filter.   
+ * @param[in,out] *S         points to an instance of the Q7 sparse FIR structure.   
+ * @param[in]     numTaps    number of nonzero coefficients in the filter.   
+ * @param[in]     *pCoeffs   points to the array of filter coefficients.   
+ * @param[in]     *pState    points to the state buffer.   
+ * @param[in]     *pTapDelay points to the array of offset times.   
+ * @param[in]     maxDelay   maximum offset time supported.   
+ * @param[in]     blockSize  number of samples that will be processed per block.   
+ * @return none   
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> holds the filter coefficients and has length <code>numTaps</code>.    
+ * <code>pState</code> holds the filter's state variables and must be of length    
+ * <code>maxDelay + blockSize</code>, where <code>maxDelay</code>    
+ * is the maximum number of delay line values.    
+ * <code>blockSize</code> is the    
+ * number of samples processed by the <code>arm_fir_sparse_q7()</code> function.    
+ */
+
+void arm_fir_sparse_init_q7(
+  arm_fir_sparse_instance_q7 * S,
+  uint16_t numTaps,
+  q7_t * pCoeffs,
+  q7_t * pState,
+  int32_t * pTapDelay,
+  uint16_t maxDelay,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Assign TapDelay pointer */
+  S->pTapDelay = pTapDelay;
+
+  /* Assign MaxDelay */
+  S->maxDelay = maxDelay;
+
+  /* reset the stateIndex to 0 */
+  S->stateIndex = 0u;
+
+  /* Clear state buffer and size is always maxDelay + blockSize */
+  memset(pState, 0, (maxDelay + blockSize) * sizeof(q7_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+}
+
+/**    
+ * @} end of FIR_Sparse group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,481 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_sparse_q15.c    
+*    
+* Description:	Q15 sparse FIR filter processing function.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**    
+ * @addtogroup FIR_Sparse    
+ * @{    
+ */
+
+/**   
+ * @brief Processing function for the Q15 sparse FIR filter.   
+ * @param[in]  *S           points to an instance of the Q15 sparse FIR structure.   
+ * @param[in]  *pSrc        points to the block of input data.   
+ * @param[out] *pDst        points to the block of output data.
+ * @param[in]  *pScratchIn  points to a temporary buffer of size blockSize.   
+ * @param[in]  *pScratchOut points to a temporary buffer of size blockSize.   
+ * @param[in]  blockSize    number of input samples to process per call.   
+ * @return none.   
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 32-bit accumulator.   
+ * The 1.15 x 1.15 multiplications yield a 2.30 result and these are added to a 2.30 accumulator.   
+ * Thus the full precision of the multiplications is maintained but there is only a single guard bit in the accumulator.   
+ * If the accumulator result overflows it will wrap around rather than saturate.   
+ * After all multiply-accumulates are performed, the 2.30 accumulator is truncated to 2.15 format and then saturated to 1.15 format.    
+ * In order to avoid overflows the input signal or coefficients must be scaled down by log2(numTaps) bits.   
+ */
+
+
+void arm_fir_sparse_q15(
+  arm_fir_sparse_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  q15_t * pScratchIn,
+  q31_t * pScratchOut,
+  uint32_t blockSize)
+{
+
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pIn = pSrc;                             /* Working pointer for input */
+  q15_t *pOut = pDst;                            /* Working pointer for output */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *px;                                     /* Temporary pointers for scratch buffer */
+  q15_t *pb = pScratchIn;                        /* Temporary pointers for scratch buffer */
+  q15_t *py = pState;                            /* Temporary pointers for state buffer */
+  int32_t *pTapDelay = S->pTapDelay;             /* Pointer to the array containing offset of the non-zero tap values. */
+  uint32_t delaySize = S->maxDelay + blockSize;  /* state length */
+  uint16_t numTaps = S->numTaps;                 /* Number of nonzero filter taps */
+  int32_t readIndex;                             /* Read index of the state buffer */
+  uint32_t tapCnt, blkCnt;                       /* loop counters */
+  q15_t coeff = *pCoeffs++;                      /* Read the first coefficient value */
+  q31_t *pScr2 = pScratchOut;                    /* Working pointer for pScratchOut */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t in1, in2;                                /* Temporary variables */
+
+
+  /* blockSize input samples are copied into the state buffer */
+  /* StateIndex points to the starting position to write in the state buffer */
+  arm_circularWrite_q15(py, delaySize, &S->stateIndex, 1, pIn, 1, blockSize);
+
+  /* Loop over the number of taps. */
+  tapCnt = numTaps;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = (S->stateIndex - blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Working pointer for state buffer is updated */
+  py = pState;
+
+  /* blockSize samples are read from the state buffer */
+  arm_circularRead_q15(py, delaySize, &readIndex, 1,
+                       pb, pb, blockSize, 1, blockSize);
+
+  /* Working pointer for the scratch buffer of state values */
+  px = pb;
+
+  /* Working pointer for scratch buffer of output values */
+  pScratchOut = pScr2;
+
+  /* Loop over the blockSize. Unroll by a factor of 4.    
+   * Compute 4 multiplications at a time. */
+  blkCnt = blockSize >> 2;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform multiplication and store in the scratch buffer */
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4,    
+   * compute the remaining samples */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform multiplication and store in the scratch buffer */
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Load the coefficient value and    
+   * increment the coefficient buffer for the next set of state values */
+  coeff = *pCoeffs++;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = (S->stateIndex - blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Loop over the number of taps. */
+  tapCnt = (uint32_t) numTaps - 2u;
+
+  while(tapCnt > 0u)
+  {
+    /* Working pointer for state buffer is updated */
+    py = pState;
+
+    /* blockSize samples are read from the state buffer */
+    arm_circularRead_q15(py, delaySize, &readIndex, 1,
+                         pb, pb, blockSize, 1, blockSize);
+
+    /* Working pointer for the scratch buffer of state values */
+    px = pb;
+
+    /* Working pointer for scratch buffer of output values */
+    pScratchOut = pScr2;
+
+    /* Loop over the blockSize. Unroll by a factor of 4.    
+     * Compute 4 MACS at a time. */
+    blkCnt = blockSize >> 2;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      *pScratchOut++ += (q31_t) * px++ * coeff;
+      *pScratchOut++ += (q31_t) * px++ * coeff;
+      *pScratchOut++ += (q31_t) * px++ * coeff;
+      *pScratchOut++ += (q31_t) * px++ * coeff;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize is not a multiple of 4,    
+     * compute the remaining samples */
+    blkCnt = blockSize % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      *pScratchOut++ += (q31_t) * px++ * coeff;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Load the coefficient value and    
+     * increment the coefficient buffer for the next set of state values */
+    coeff = *pCoeffs++;
+
+    /* Read Index, from where the state buffer should be read, is calculated. */
+    readIndex = (S->stateIndex - blockSize) - *pTapDelay++;
+
+    /* Wraparound of readIndex */
+    if(readIndex < 0)
+    {
+      readIndex += (int32_t) delaySize;
+    }
+
+    /* Decrement the tap loop counter */
+    tapCnt--;
+  }
+
+  /* Compute last tap without the final read of pTapDelay */
+
+  /* Working pointer for state buffer is updated */
+  py = pState;
+
+  /* blockSize samples are read from the state buffer */
+  arm_circularRead_q15(py, delaySize, &readIndex, 1,
+                       pb, pb, blockSize, 1, blockSize);
+
+  /* Working pointer for the scratch buffer of state values */
+  px = pb;
+
+  /* Working pointer for scratch buffer of output values */
+  pScratchOut = pScr2;
+
+  /* Loop over the blockSize. Unroll by a factor of 4.
+   * Compute 4 MACS at a time. */
+  blkCnt = blockSize >> 2;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiply-Accumulate */
+    *pScratchOut++ += (q31_t) * px++ * coeff;
+    *pScratchOut++ += (q31_t) * px++ * coeff;
+    *pScratchOut++ += (q31_t) * px++ * coeff;
+    *pScratchOut++ += (q31_t) * px++ * coeff;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4,
+   * compute the remaining samples */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiply-Accumulate */
+    *pScratchOut++ += (q31_t) * px++ * coeff;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* All the output values are in pScratchOut buffer.    
+     Convert them into 1.15 format, saturate and store in the destination buffer. */
+  /* Loop over the blockSize. */
+  blkCnt = blockSize >> 2;
+
+  while(blkCnt > 0u)
+  {
+    in1 = *pScr2++;
+    in2 = *pScr2++;
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pOut)++ =
+      __PKHBT((q15_t) __SSAT(in1 >> 15, 16), (q15_t) __SSAT(in2 >> 15, 16),
+              16);
+
+#else
+    *__SIMD32(pOut)++ =
+      __PKHBT((q15_t) __SSAT(in2 >> 15, 16), (q15_t) __SSAT(in1 >> 15, 16),
+              16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+    in1 = *pScr2++;
+
+    in2 = *pScr2++;
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+    *__SIMD32(pOut)++ =
+      __PKHBT((q15_t) __SSAT(in1 >> 15, 16), (q15_t) __SSAT(in2 >> 15, 16),
+              16);
+
+#else
+
+    *__SIMD32(pOut)++ =
+      __PKHBT((q15_t) __SSAT(in2 >> 15, 16), (q15_t) __SSAT(in1 >> 15, 16),
+              16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+
+    blkCnt--;
+
+  }
+
+  /* If the blockSize is not a multiple of 4,    
+     remaining samples are processed in the below loop */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    *pOut++ = (q15_t) __SSAT(*pScr2++ >> 15, 16);
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* blockSize input samples are copied into the state buffer */
+  /* StateIndex points to the starting position to write in the state buffer */
+  arm_circularWrite_q15(py, delaySize, &S->stateIndex, 1, pIn, 1, blockSize);
+
+  /* Loop over the number of taps. */
+  tapCnt = numTaps;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = (S->stateIndex - blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Working pointer for state buffer is updated */
+  py = pState;
+
+  /* blockSize samples are read from the state buffer */
+  arm_circularRead_q15(py, delaySize, &readIndex, 1,
+                       pb, pb, blockSize, 1, blockSize);
+
+  /* Working pointer for the scratch buffer of state values */
+  px = pb;
+
+  /* Working pointer for scratch buffer of output values */
+  pScratchOut = pScr2;
+
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform multiplication and store in the scratch buffer */
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Load the coefficient value and           
+   * increment the coefficient buffer for the next set of state values */
+  coeff = *pCoeffs++;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = (S->stateIndex - blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Loop over the number of taps. */
+  tapCnt = (uint32_t) numTaps - 2u;
+
+  while(tapCnt > 0u)
+  {
+    /* Working pointer for state buffer is updated */
+    py = pState;
+
+    /* blockSize samples are read from the state buffer */
+    arm_circularRead_q15(py, delaySize, &readIndex, 1,
+                         pb, pb, blockSize, 1, blockSize);
+
+    /* Working pointer for the scratch buffer of state values */
+    px = pb;
+
+    /* Working pointer for scratch buffer of output values */
+    pScratchOut = pScr2;
+
+    blkCnt = blockSize;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      *pScratchOut++ += (q31_t) * px++ * coeff;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Load the coefficient value and           
+     * increment the coefficient buffer for the next set of state values */
+    coeff = *pCoeffs++;
+
+    /* Read Index, from where the state buffer should be read, is calculated. */
+    readIndex = (S->stateIndex - blockSize) - *pTapDelay++;
+
+    /* Wraparound of readIndex */
+    if(readIndex < 0)
+    {
+      readIndex += (int32_t) delaySize;
+    }
+
+    /* Decrement the tap loop counter */
+    tapCnt--;
+  }
+
+  /* Compute last tap without the final read of pTapDelay */
+
+  /* Working pointer for state buffer is updated */
+  py = pState;
+
+  /* blockSize samples are read from the state buffer */
+  arm_circularRead_q15(py, delaySize, &readIndex, 1,
+                       pb, pb, blockSize, 1, blockSize);
+
+  /* Working pointer for the scratch buffer of state values */
+  px = pb;
+
+  /* Working pointer for scratch buffer of output values */
+  pScratchOut = pScr2;
+
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiply-Accumulate */
+    *pScratchOut++ += (q31_t) * px++ * coeff;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* All the output values are in pScratchOut buffer.       
+     Convert them into 1.15 format, saturate and store in the destination buffer. */
+  /* Loop over the blockSize. */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    *pOut++ = (q15_t) __SSAT(*pScr2++ >> 15, 16);
+    blkCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR_Sparse group    
+ */
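
The final stage of the Q15 routine above shifts each 2.30 accumulator value right by 15 bits and saturates the result to 1.15 with `__SSAT`. A minimal portable sketch of that conversion follows, for checking expected values on a host without the Cortex-M intrinsic. The helpers `sat_q15` and `acc_to_q15` are hypothetical names, not part of CMSIS-DSP, and the sketch assumes the compiler implements `>>` on negative signed values as an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for __SSAT(x, 16): clamp to the signed 16-bit range */
static int16_t sat_q15(int32_t x)
{
  if (x > INT16_MAX) {
    return INT16_MAX;
  }
  if (x < INT16_MIN) {
    return INT16_MIN;
  }
  return (int16_t) x;
}

/* Hypothetical helper: convert one 2.30 accumulator value to 1.15.
 * Assumes an arithmetic right shift for negative values, which is
 * implementation-defined in C but the common behavior in practice. */
static int16_t acc_to_q15(int32_t acc)
{
  return sat_q15(acc >> 15);
}

/* Hypothetical helper: one q15 x q15 product in 2.30 format */
static int32_t mul_q15(int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}
```

For example, 0.5 x 0.5 in Q15 is `mul_q15(16384, 16384)`, which converts to 8192 (0.25), while (-1.0) x (-1.0) gives 2^30 and saturates to 32767 because +1.0 is not representable in 1.15.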

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,461 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_sparse_q31.c    
+*    
+* Description:	Q31 sparse FIR filter processing function.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ------------------------------------------------------------------- */
+#include "arm_math.h"
+
+
+/**    
+ * @addtogroup FIR_Sparse    
+ * @{    
+ */
+
+/**   
+ * @brief Processing function for the Q31 sparse FIR filter.   
+ * @param[in]  *S          points to an instance of the Q31 sparse FIR structure.   
+ * @param[in]  *pSrc       points to the block of input data.   
+ * @param[out] *pDst       points to the block of output data   
+ * @param[in]  *pScratchIn points to a temporary buffer of size blockSize.   
+ * @param[in]  blockSize   number of input samples to process per call.   
+ * @return none.   
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 32-bit accumulator.   
+ * The 1.31 x 1.31 multiplications are truncated to 2.30 format.   
+ * This leads to loss of precision on the intermediate multiplications and provides only a single guard bit.    
+ * If the accumulator result overflows, it wraps around rather than saturating.
+ * In order to avoid overflows the input signal or coefficients must be scaled down by log2(numTaps) bits.   
+ */
+
+void arm_fir_sparse_q31(
+  arm_fir_sparse_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  q31_t * pScratchIn,
+  uint32_t blockSize)
+{
+
+  q31_t *pState = S->pState;                     /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *px;                                     /* Scratch buffer pointer */
+  q31_t *py = pState;                            /* Temporary pointers for state buffer */
+  q31_t *pb = pScratchIn;                        /* Temporary pointers for scratch buffer */
+  q31_t *pOut;                                   /* Destination pointer */
+  q63_t out;                                     /* Temporary output variable */
+  int32_t *pTapDelay = S->pTapDelay;             /* Pointer to the array containing offset of the non-zero tap values. */
+  uint32_t delaySize = S->maxDelay + blockSize;  /* state length */
+  uint16_t numTaps = S->numTaps;                 /* Number of filter taps */
+  int32_t readIndex;                             /* Read index of the state buffer */
+  uint32_t tapCnt, blkCnt;                       /* loop counters */
+  q31_t coeff = *pCoeffs++;                      /* Read the first coefficient value */
+  q31_t in;
+
+
+  /* blockSize input samples are copied into the state buffer. */
+  /* S->stateIndex gives the position in the state buffer at which writing starts. */
+  arm_circularWrite_f32((int32_t *) py, delaySize, &S->stateIndex, 1,
+                        (int32_t *) pSrc, 1, blockSize);
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = (int32_t) (S->stateIndex - blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Working pointer for state buffer is updated */
+  py = pState;
+
+  /* blockSize samples are read from the state buffer */
+  arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+                       (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+                       blockSize);
+
+  /* Working pointer for the scratch buffer of state values */
+  px = pb;
+
+  /* Working pointer for the destination buffer, which accumulates the output values */
+  pOut = pDst;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Loop over the blockSize. Unroll by a factor of 4.    
+   * Compute 4 Multiplications at a time. */
+  blkCnt = blockSize >> 2;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiplications and store in the destination buffer */
+    *pOut++ = (q31_t) (((q63_t) * px++ * coeff) >> 32);
+    *pOut++ = (q31_t) (((q63_t) * px++ * coeff) >> 32);
+    *pOut++ = (q31_t) (((q63_t) * px++ * coeff) >> 32);
+    *pOut++ = (q31_t) (((q63_t) * px++ * coeff) >> 32);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4,    
+   * compute the remaining samples */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiplications and store in the destination buffer */
+    *pOut++ = (q31_t) (((q63_t) * px++ * coeff) >> 32);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Load the coefficient value and    
+   * increment the coefficient buffer for the next set of state values */
+  coeff = *pCoeffs++;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = (int32_t) (S->stateIndex - blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Loop over the number of taps. */
+  tapCnt = (uint32_t) numTaps - 2u;
+
+  while(tapCnt > 0u)
+  {
+    /* Working pointer for state buffer is updated */
+    py = pState;
+
+    /* blockSize samples are read from the state buffer */
+    arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+                         (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+                         blockSize);
+
+    /* Working pointer for the scratch buffer of state values */
+    px = pb;
+
+    /* Working pointer for the destination buffer, which accumulates the output values */
+    pOut = pDst;
+
+    /* Loop over the blockSize. Unroll by a factor of 4.    
+     * Compute 4 MACS at a time. */
+    blkCnt = blockSize >> 2;
+
+    while(blkCnt > 0u)
+    {
+      out = *pOut;
+      out += ((q63_t) * px++ * coeff) >> 32;
+      *pOut++ = (q31_t) (out);
+
+      out = *pOut;
+      out += ((q63_t) * px++ * coeff) >> 32;
+      *pOut++ = (q31_t) (out);
+
+      out = *pOut;
+      out += ((q63_t) * px++ * coeff) >> 32;
+      *pOut++ = (q31_t) (out);
+
+      out = *pOut;
+      out += ((q63_t) * px++ * coeff) >> 32;
+      *pOut++ = (q31_t) (out);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize is not a multiple of 4,    
+     * compute the remaining samples */
+    blkCnt = blockSize % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      out = *pOut;
+      out += ((q63_t) * px++ * coeff) >> 32;
+      *pOut++ = (q31_t) (out);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Load the coefficient value and    
+     * increment the coefficient buffer for the next set of state values */
+    coeff = *pCoeffs++;
+
+    /* Read Index, from where the state buffer should be read, is calculated. */
+    readIndex = (int32_t) (S->stateIndex - blockSize) - *pTapDelay++;
+
+    /* Wraparound of readIndex */
+    if(readIndex < 0)
+    {
+      readIndex += (int32_t) delaySize;
+    }
+
+    /* Decrement the tap loop counter */
+    tapCnt--;
+  }
+	
+	/* Compute last tap without the final read of pTapDelay */
+	
+	/* Working pointer for state buffer is updated */
+	py = pState;
+
+	/* blockSize samples are read from the state buffer */
+	arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+											 (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+											 blockSize);
+
+	/* Working pointer for the scratch buffer of state values */
+	px = pb;
+
+	/* Working pointer for the destination buffer, which accumulates the output values */
+	pOut = pDst;
+
+	/* Loop over the blockSize. Unroll by a factor of 4.    
+	 * Compute 4 MACS at a time. */
+	blkCnt = blockSize >> 2;
+
+	while(blkCnt > 0u)
+	{
+		out = *pOut;
+		out += ((q63_t) * px++ * coeff) >> 32;
+		*pOut++ = (q31_t) (out);
+
+		out = *pOut;
+		out += ((q63_t) * px++ * coeff) >> 32;
+		*pOut++ = (q31_t) (out);
+
+		out = *pOut;
+		out += ((q63_t) * px++ * coeff) >> 32;
+		*pOut++ = (q31_t) (out);
+
+		out = *pOut;
+		out += ((q63_t) * px++ * coeff) >> 32;
+		*pOut++ = (q31_t) (out);
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}
+
+	/* If the blockSize is not a multiple of 4,    
+	 * compute the remaining samples */
+	blkCnt = blockSize % 0x4u;
+
+	while(blkCnt > 0u)
+	{
+		/* Perform Multiply-Accumulate */
+		out = *pOut;
+		out += ((q63_t) * px++ * coeff) >> 32;
+		*pOut++ = (q31_t) (out);
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}	
+
+  /* Working output pointer is updated */
+  pOut = pDst;
+
+  /* Output is converted into 1.31 format. */
+  /* Loop over the blockSize. Unroll by a factor of 4.    
+   * process 4 output samples at a time. */
+  blkCnt = blockSize >> 2;
+
+  while(blkCnt > 0u)
+  {
+    in = *pOut << 1;
+    *pOut++ = in;
+    in = *pOut << 1;
+    *pOut++ = in;
+    in = *pOut << 1;
+    *pOut++ = in;
+    in = *pOut << 1;
+    *pOut++ = in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4,    
+   * process the remaining output samples */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    in = *pOut << 1;
+    *pOut++ = in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform Multiplications and store in the destination buffer */
+    *pOut++ = (q31_t) (((q63_t) * px++ * coeff) >> 32);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Load the coefficient value and           
+   * increment the coefficient buffer for the next set of state values */
+  coeff = *pCoeffs++;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = (int32_t) (S->stateIndex - blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Loop over the number of taps. */
+  tapCnt = (uint32_t) numTaps - 2u;
+
+  while(tapCnt > 0u)
+  {
+    /* Working pointer for state buffer is updated */
+    py = pState;
+
+    /* blockSize samples are read from the state buffer */
+    arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+                         (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+                         blockSize);
+
+    /* Working pointer for the scratch buffer of state values */
+    px = pb;
+
+    /* Working pointer for the destination buffer, which accumulates the output values */
+    pOut = pDst;
+
+    blkCnt = blockSize;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      out = *pOut;
+      out += ((q63_t) * px++ * coeff) >> 32;
+      *pOut++ = (q31_t) (out);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Load the coefficient value and           
+     * increment the coefficient buffer for the next set of state values */
+    coeff = *pCoeffs++;
+
+    /* Read Index, from where the state buffer should be read, is calculated. */
+    readIndex = (int32_t) (S->stateIndex - blockSize) - *pTapDelay++;
+
+    /* Wraparound of readIndex */
+    if(readIndex < 0)
+    {
+      readIndex += (int32_t) delaySize;
+    }
+
+    /* Decrement the tap loop counter */
+    tapCnt--;
+  }
+	
+	/* Compute last tap without the final read of pTapDelay */	
+	
+	/* Working pointer for state buffer is updated */
+	py = pState;
+
+	/* blockSize samples are read from the state buffer */
+	arm_circularRead_f32((int32_t *) py, delaySize, &readIndex, 1,
+											 (int32_t *) pb, (int32_t *) pb, blockSize, 1,
+											 blockSize);
+
+	/* Working pointer for the scratch buffer of state values */
+	px = pb;
+
+	/* Working pointer for the destination buffer, which accumulates the output values */
+	pOut = pDst;
+
+	blkCnt = blockSize;
+
+	while(blkCnt > 0u)
+	{
+		/* Perform Multiply-Accumulate */
+		out = *pOut;
+		out += ((q63_t) * px++ * coeff) >> 32;
+		*pOut++ = (q31_t) (out);
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}
+
+  /* Working output pointer is updated */
+  pOut = pDst;
+
+  /* Output is converted into 1.31 format. */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    in = *pOut << 1;
+    *pOut++ = in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR_Sparse group    
+ */
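
The Q31 function above keeps the high 32 bits of each 1.31 x 1.31 product (2.30 format), accumulates them with wraparound, and shifts the sum left by one to return to 1.31. The following is a minimal reference sketch of one output sample, not part of the changeset. It assumes a linear `history` array in which `history[d]` holds x[n - d], rather than the circular state buffer used by the library, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of a 1.31 x 1.31 product, i.e. the product in 2.30 format.
 * Assumes >> on a negative int64_t is an arithmetic shift. */
static int32_t q31_mul_hi_sketch(int32_t x, int32_t coeff)
{
  return (int32_t) (((int64_t) x * coeff) >> 32);
}

/* One output sample of a sparse FIR filter.
 * history[d] is assumed to hold x[n - d], so tapDelay[k] indexes it directly. */
static int32_t sparse_fir_q31_sample_sketch(const int32_t *history,
                                            const int32_t *coeffs,
                                            const int32_t *tapDelay,
                                            uint16_t numTaps)
{
  uint32_t acc = 0u;  /* unsigned, so overflow wraps with defined behavior */
  uint16_t k;

  assert(numTaps >= 1u);

  for (k = 0u; k < numTaps; k++)
  {
    acc += (uint32_t) q31_mul_hi_sketch(history[tapDelay[k]], coeffs[k]);
  }

  /* Convert 2.30 to 1.31. The conversion back to int32_t is
   * implementation-defined when the top bit is set. */
  return (int32_t) (acc << 1);
}
```

For example, with history {0.5, 0, 0.25}, coefficients {0.5, 0.5} and tap delays {0, 2}, the sketch returns `0x30000000`, which is 0.375 in 1.31 format.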

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_fir_sparse_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,480 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_fir_sparse_q7.c    
+*    
+* Description:	Q7 sparse FIR filter processing function.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ------------------------------------------------------------------- */
+#include "arm_math.h"
+
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup FIR_Sparse    
+ * @{    
+ */
+
+
+/**   
+ * @brief Processing function for the Q7 sparse FIR filter.   
+ * @param[in]  *S           points to an instance of the Q7 sparse FIR structure.   
+ * @param[in]  *pSrc        points to the block of input data.   
+ * @param[out] *pDst        points to the block of output data   
+ * @param[in]  *pScratchIn  points to a temporary buffer of size blockSize.   
+ * @param[in]  *pScratchOut points to a temporary buffer of size blockSize.   
+ * @param[in]  blockSize    number of input samples to process per call.   
+ * @return none.   
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using a 32-bit internal accumulator.    
+ * Both coefficients and state variables are represented in 1.7 format and multiplications yield a 2.14 result.    
+ * The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format.    
+ * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.    
+ * The accumulator is then converted to 18.7 format by discarding the low 7 bits.   
+ * Finally, the result is saturated to 1.7 format.
+ */
+
+void arm_fir_sparse_q7(
+  arm_fir_sparse_instance_q7 * S,
+  q7_t * pSrc,
+  q7_t * pDst,
+  q7_t * pScratchIn,
+  q31_t * pScratchOut,
+  uint32_t blockSize)
+{
+
+  q7_t *pState = S->pState;                      /* State pointer */
+  q7_t *pCoeffs = S->pCoeffs;                    /* Coefficient pointer */
+  q7_t *px;                                      /* Scratch buffer pointer */
+  q7_t *py = pState;                             /* Temporary pointers for state buffer */
+  q7_t *pb = pScratchIn;                         /* Temporary pointers for scratch buffer */
+  q7_t *pOut = pDst;                             /* Destination pointer */
+  int32_t *pTapDelay = S->pTapDelay;             /* Pointer to the array containing offset of the non-zero tap values. */
+  uint32_t delaySize = S->maxDelay + blockSize;  /* state length */
+  uint16_t numTaps = S->numTaps;                 /* Number of filter taps */
+  int32_t readIndex;                             /* Read index of the state buffer */
+  uint32_t tapCnt, blkCnt;                       /* loop counters */
+  q7_t coeff = *pCoeffs++;                       /* Read the coefficient value */
+  q31_t *pScr2 = pScratchOut;                    /* Working pointer for scratch buffer of output values */
+  q31_t in;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q7_t in1, in2, in3, in4;
+
+  /* blockSize input samples are copied into the state buffer. */
+  /* S->stateIndex gives the position in the state buffer at which writing starts. */
+  arm_circularWrite_q7(py, (int32_t) delaySize, &S->stateIndex, 1, pSrc, 1,
+                       blockSize);
+
+  /* Loop over the number of taps. */
+  tapCnt = numTaps;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Working pointer for state buffer is updated */
+  py = pState;
+
+  /* blockSize samples are read from the state buffer */
+  arm_circularRead_q7(py, (int32_t) delaySize, &readIndex, 1, pb, pb,
+                      (int32_t) blockSize, 1, blockSize);
+
+  /* Working pointer for the scratch buffer of state values */
+  px = pb;
+
+  /* Working pointer for scratch buffer of output values */
+  pScratchOut = pScr2;
+
+  /* Loop over the blockSize. Unroll by a factor of 4.    
+   * Compute 4 multiplications at a time. */
+  blkCnt = blockSize >> 2;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform multiplication and store in the scratch buffer */
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4,    
+   * compute the remaining samples */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform multiplication and store in the scratch buffer */
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Load the coefficient value and    
+   * increment the coefficient buffer for the next set of state values */
+  coeff = *pCoeffs++;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Loop over the number of taps. */
+  tapCnt = (uint32_t) numTaps - 2u;
+
+  while(tapCnt > 0u)
+  {
+    /* Working pointer for state buffer is updated */
+    py = pState;
+
+    /* blockSize samples are read from the state buffer */
+    arm_circularRead_q7(py, (int32_t) delaySize, &readIndex, 1, pb, pb,
+                        (int32_t) blockSize, 1, blockSize);
+
+    /* Working pointer for the scratch buffer of state values */
+    px = pb;
+
+    /* Working pointer for scratch buffer of output values */
+    pScratchOut = pScr2;
+
+    /* Loop over the blockSize. Unroll by a factor of 4.    
+     * Compute 4 MACS at a time. */
+    blkCnt = blockSize >> 2;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      in = *pScratchOut + ((q31_t) * px++ * coeff);
+      *pScratchOut++ = in;
+      in = *pScratchOut + ((q31_t) * px++ * coeff);
+      *pScratchOut++ = in;
+      in = *pScratchOut + ((q31_t) * px++ * coeff);
+      *pScratchOut++ = in;
+      in = *pScratchOut + ((q31_t) * px++ * coeff);
+      *pScratchOut++ = in;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the blockSize is not a multiple of 4,    
+     * compute the remaining samples */
+    blkCnt = blockSize % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      in = *pScratchOut + ((q31_t) * px++ * coeff);
+      *pScratchOut++ = in;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Load the coefficient value and    
+     * increment the coefficient buffer for the next set of state values */
+    coeff = *pCoeffs++;
+
+    /* Read Index, from where the state buffer should be read, is calculated. */
+    readIndex = ((int32_t) S->stateIndex -
+                 (int32_t) blockSize) - *pTapDelay++;
+
+    /* Wraparound of readIndex */
+    if(readIndex < 0)
+    {
+      readIndex += (int32_t) delaySize;
+    }
+
+    /* Decrement the tap loop counter */
+    tapCnt--;
+  }
+	
+	/* Compute last tap without the final read of pTapDelay */	
+	
+	/* Working pointer for state buffer is updated */
+	py = pState;
+
+	/* blockSize samples are read from the state buffer */
+	arm_circularRead_q7(py, (int32_t) delaySize, &readIndex, 1, pb, pb,
+											(int32_t) blockSize, 1, blockSize);
+
+	/* Working pointer for the scratch buffer of state values */
+	px = pb;
+
+	/* Working pointer for scratch buffer of output values */
+	pScratchOut = pScr2;
+
+	/* Loop over the blockSize. Unroll by a factor of 4.    
+	 * Compute 4 MACS at a time. */
+	blkCnt = blockSize >> 2;
+
+	while(blkCnt > 0u)
+	{
+		/* Perform Multiply-Accumulate */
+		in = *pScratchOut + ((q31_t) * px++ * coeff);
+		*pScratchOut++ = in;
+		in = *pScratchOut + ((q31_t) * px++ * coeff);
+		*pScratchOut++ = in;
+		in = *pScratchOut + ((q31_t) * px++ * coeff);
+		*pScratchOut++ = in;
+		in = *pScratchOut + ((q31_t) * px++ * coeff);
+		*pScratchOut++ = in;
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}
+
+	/* If the blockSize is not a multiple of 4,    
+	 * compute the remaining samples */
+	blkCnt = blockSize % 0x4u;
+
+	while(blkCnt > 0u)
+	{
+		/* Perform Multiply-Accumulate */
+		in = *pScratchOut + ((q31_t) * px++ * coeff);
+		*pScratchOut++ = in;
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}
+
+  /* All the output values are in pScratchOut buffer.
+     Convert them into 1.7 format, saturate and store in the destination buffer. */
+  /* Loop over the blockSize. */
+  blkCnt = blockSize >> 2;
+
+  while(blkCnt > 0u)
+  {
+    in1 = (q7_t) __SSAT(*pScr2++ >> 7, 8);
+    in2 = (q7_t) __SSAT(*pScr2++ >> 7, 8);
+    in3 = (q7_t) __SSAT(*pScr2++ >> 7, 8);
+    in4 = (q7_t) __SSAT(*pScr2++ >> 7, 8);
+
+    *__SIMD32(pOut)++ = __PACKq7(in1, in2, in3, in4);
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4,    
+     remaining samples are processed in the below loop */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    *pOut++ = (q7_t) __SSAT(*pScr2++ >> 7, 8);
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* blockSize input samples are copied into the state buffer. */
+  /* S->stateIndex gives the position in the state buffer at which writing starts. */
+  arm_circularWrite_q7(py, (int32_t) delaySize, &S->stateIndex, 1, pSrc, 1,
+                       blockSize);
+
+  /* Loop over the number of taps. */
+  tapCnt = numTaps;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Working pointer for state buffer is updated */
+  py = pState;
+
+  /* blockSize samples are read from the state buffer */
+  arm_circularRead_q7(py, (int32_t) delaySize, &readIndex, 1, pb, pb,
+                      (int32_t) blockSize, 1, blockSize);
+
+  /* Working pointer for the scratch buffer of state values */
+  px = pb;
+
+  /* Working pointer for scratch buffer of output values */
+  pScratchOut = pScr2;
+
+  /* Loop over the blockSize */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Perform multiplication and store in the scratch buffer */
+    *pScratchOut++ = ((q31_t) * px++ * coeff);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Load the coefficient value and           
+   * increment the coefficient buffer for the next set of state values */
+  coeff = *pCoeffs++;
+
+  /* Read Index, from where the state buffer should be read, is calculated. */
+  readIndex = ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+  /* Wraparound of readIndex */
+  if(readIndex < 0)
+  {
+    readIndex += (int32_t) delaySize;
+  }
+
+  /* Loop over the number of taps. */
+  tapCnt = (uint32_t) numTaps - 2u;
+
+  while(tapCnt > 0u)
+  {
+    /* Working pointer for state buffer is updated */
+    py = pState;
+
+    /* blockSize samples are read from the state buffer */
+    arm_circularRead_q7(py, (int32_t) delaySize, &readIndex, 1, pb, pb,
+                        (int32_t) blockSize, 1, blockSize);
+
+    /* Working pointer for the scratch buffer of state values */
+    px = pb;
+
+    /* Working pointer for scratch buffer of output values */
+    pScratchOut = pScr2;
+
+    /* Loop over the blockSize */
+    blkCnt = blockSize;
+
+    while(blkCnt > 0u)
+    {
+      /* Perform Multiply-Accumulate */
+      in = *pScratchOut + ((q31_t) * px++ * coeff);
+      *pScratchOut++ = in;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Load the coefficient value and           
+     * increment the coefficient buffer for the next set of state values */
+    coeff = *pCoeffs++;
+
+    /* Read Index, from where the state buffer should be read, is calculated. */
+    readIndex =
+      ((int32_t) S->stateIndex - (int32_t) blockSize) - *pTapDelay++;
+
+    /* Wraparound of readIndex */
+    if(readIndex < 0)
+    {
+      readIndex += (int32_t) delaySize;
+    }
+
+    /* Decrement the tap loop counter */
+    tapCnt--;
+  }
+	
+	/* Compute last tap without the final read of pTapDelay */	
+	
+	/* Working pointer for state buffer is updated */
+	py = pState;
+
+	/* blockSize samples are read from the state buffer */
+	arm_circularRead_q7(py, (int32_t) delaySize, &readIndex, 1, pb, pb,
+											(int32_t) blockSize, 1, blockSize);
+
+	/* Working pointer for the scratch buffer of state values */
+	px = pb;
+
+	/* Working pointer for scratch buffer of output values */
+	pScratchOut = pScr2;
+
+	/* Loop over the blockSize */
+	blkCnt = blockSize;
+
+	while(blkCnt > 0u)
+	{
+		/* Perform Multiply-Accumulate */
+		in = *pScratchOut + ((q31_t) * px++ * coeff);
+		*pScratchOut++ = in;
+
+		/* Decrement the loop counter */
+		blkCnt--;
+	}
+
+  /* All the output values are in pScratchOut buffer.
+     Convert them into 1.7 format, saturate and store in the destination buffer. */
+  /* Loop over the blockSize. */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    *pOut++ = (q7_t) __SSAT(*pScr2++ >> 7, 8);
+
+    /* Decrement the blockSize loop counter */
+    blkCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of FIR_Sparse group    
+ */
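
Each sparse FIR variant above recomputes `readIndex` before every tap: it steps back from `S->stateIndex` by `blockSize` and by the tap delay, then wraps once by `delaySize`. The following is a minimal sketch of that calculation in isolation, not part of the changeset. The function name is hypothetical, and it assumes `stateIndex` has already been advanced past the newly written block and that the tap delay does not exceed `maxDelay`, so a single correction is enough.

```c
#include <assert.h>
#include <stdint.h>

/* Index in the circular state buffer of the first sample to read for a tap.
 * delaySize is maxDelay + blockSize, as in the functions above. */
static int32_t sparse_read_index_sketch(uint16_t stateIndex,
                                        uint32_t blockSize,
                                        int32_t tapDelay,
                                        uint32_t delaySize)
{
  int32_t readIndex;

  /* Assumed preconditions: 0 <= tapDelay <= maxDelay, stateIndex < delaySize */
  assert(tapDelay >= 0 && (uint32_t) tapDelay + blockSize <= delaySize);
  assert(stateIndex < delaySize);

  readIndex = ((int32_t) stateIndex - (int32_t) blockSize) - tapDelay;

  /* Wraparound of readIndex */
  if (readIndex < 0)
  {
    readIndex += (int32_t) delaySize;
  }

  return readIndex;
}
```

With `maxDelay = 10` and `blockSize = 4` (so `delaySize = 14`), a `stateIndex` of 2 means the newest block occupies indices 12, 13, 0 and 1. A tap delay of 0 then reads from index 12, and a tap delay of 10 reads from index 2.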

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,447 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_iir_lattice_f32.c    
+*    
+* Description:	Floating-point IIR Lattice filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup IIR_Lattice Infinite Impulse Response (IIR) Lattice Filters    
+ *    
+ * This set of functions implements lattice filters    
+ * for Q15, Q31 and floating-point data types.  Lattice filters are used in a     
+ * variety of adaptive filter applications.  The filter structure has feedforward and    
+ * feedback components and the net impulse response is infinite length.    
+ * The functions operate on blocks    
+ * of input and output data and each call to the function processes    
+ * <code>blockSize</code> samples through the filter.  <code>pSrc</code> and    
+ * <code>pDst</code> point to input and output arrays containing <code>blockSize</code> values.    
+ *
+ * \par Algorithm:    
+ * \image html IIRLattice.gif "Infinite Impulse Response Lattice filter"    
+ * <pre>    
+ *    fN(n)   =  x(n)    
+ *    fm-1(n) = fm(n) - km * gm-1(n-1)   for m = N, N-1, ...1    
+ *    gm(n)   = km * fm-1(n) + gm-1(n-1) for m = N, N-1, ...1    
+ *    y(n)    = vN * gN(n) + vN-1 * gN-1(n) + ...+ v0 * g0(n)    
+ * </pre>    
+ * \par    
+ * <code>pkCoeffs</code> points to the array of reflection coefficients of size <code>numStages</code>.
+ * Reflection coefficients are stored in time-reversed order.    
+ * \par    
+ * <pre>    
+ *    {kN, kN-1, ....k1}    
+ * </pre>    
+ * <code>pvCoeffs</code> points to the array of ladder coefficients of size <code>(numStages+1)</code>.     
+ * Ladder coefficients are stored in time-reversed order.    
+ * \par    
+ * <pre>    
+ *    {vN, vN-1, ...v0}    
+ * </pre>    
+ * <code>pState</code> points to a state array of size <code>numStages + blockSize</code>.    
+ * The state variables shown in the figure above (the g values) are stored in the <code>pState</code> array.    
+ * The state variables are updated after each block of data is processed; the coefficients are untouched.    
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter.    
+ * Coefficient arrays may be shared among several instances while state variable arrays cannot be shared.    
+ * There are separate instance structure declarations for each of the 3 supported data types.    
+ *
+ * \par Initialization Functions    
+ * There is also an associated initialization function for each data type.    
+ * The initialization function performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.   
+ * To do this manually without calling the init function, assign the following subfields of the instance structure:
+ * numStages, pkCoeffs, pvCoeffs, pState. Also set all of the values in pState to zero.
+ *    
+ * \par    
+ * Use of the initialization function is optional.    
+ * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.    
+ * To place an instance structure into a const data section, the instance structure must be manually initialized.    
+ * Set the values in the state buffer to zeros and then manually initialize the instance structure as follows:    
+ * <pre>    
+ *arm_iir_lattice_instance_f32 S = {numStages, pState, pkCoeffs, pvCoeffs};    
+ *arm_iir_lattice_instance_q31 S = {numStages, pState, pkCoeffs, pvCoeffs};    
+ *arm_iir_lattice_instance_q15 S = {numStages, pState, pkCoeffs, pvCoeffs};    
+ * </pre>    
+ * \par    
+ * where <code>numStages</code> is the number of stages in the filter; <code>pState</code> points to the state buffer array;    
+ * <code>pkCoeffs</code> points to the array of reflection coefficients; <code>pvCoeffs</code> points to the array of ladder coefficients.
+ * \par Fixed-Point Behavior    
+ * Care must be taken when using the fixed-point versions of the IIR lattice filter functions.    
+ * In particular, the overflow and saturation behavior of the accumulator used in each function must be considered.    
+ * Refer to the function specific documentation below for usage guidelines.    
+ */
+
+/**    
+ * @addtogroup IIR_Lattice    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the floating-point IIR lattice filter.    
+ * @param[in] *S points to an instance of the floating-point IIR lattice structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data.    
+ * @param[in] blockSize number of samples to process.    
+ * @return none.    
+ */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+void arm_iir_lattice_f32(
+  const arm_iir_lattice_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t fnext1, gcurr1, gnext;               /* Temporary variables for lattice stages */
+  float32_t acc;                                 /* Accumulator */
+  uint32_t blkCnt, tapCnt;                       /* temporary variables for counts */
+  float32_t *px1, *px2, *pk, *pv;                /* temporary pointers for state and coef */
+  uint32_t numStages = S->numStages;             /* number of stages */
+  float32_t *pState;                             /* State pointer */
+  float32_t *pStateCurnt;                        /* State current pointer */
+  float32_t k1, k2;
+  float32_t v1, v2, v3, v4;
+  float32_t gcurr2;
+  float32_t fnext2;
+
+  /* initialise loop count */
+  blkCnt = blockSize;
+
+  /* initialise state pointer */
+  pState = &S->pState[0];
+
+  /* Sample processing */
+  while(blkCnt > 0u)
+  {
+    /* Read Sample from input buffer */
+    /* fN(n) = x(n) */
+    fnext2 = *pSrc++;
+
+    /* Initialize Ladder coeff pointer */
+    pv = &S->pvCoeffs[0];
+    /* Initialize Reflection coeff pointer */
+    pk = &S->pkCoeffs[0];
+
+    /* Initialize state read pointer */
+    px1 = pState;
+    /* Initialize state write pointer */
+    px2 = pState;
+
+    /* Set accumulator to zero */
+    acc = 0.0;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = (numStages) >> 2;
+
+    while(tapCnt > 0u)
+    {
+      /* Read gN-1(n-1) from state buffer */
+      gcurr1 = *px1;
+
+      /* read reflection coefficient kN */
+      k1 = *pk;
+
+      /* fN-1(n) = fN(n) - kN * gN-1(n-1) */
+      fnext1 = fnext2 - (k1 * gcurr1);
+
+      /* read ladder coefficient vN */
+      v1 = *pv;
+
+      /* read next reflection coefficient kN-1 */
+      k2 = *(pk + 1u);
+
+      /* Read gN-2(n-1) from state buffer */
+      gcurr2 = *(px1 + 1u);
+
+      /* read next ladder coefficient vN-1 */
+      v2 = *(pv + 1u);
+
+      /* fN-2(n) = fN-1(n) - kN-1 * gN-2(n-1) */
+      fnext2 = fnext1 - (k2 * gcurr2);
+
+      /* gN(n)   = kN * fN-1(n) + gN-1(n-1) */
+      gnext = gcurr1 + (k1 * fnext1);
+
+      /* read reflection coefficient kN-2 */
+      k1 = *(pk + 2u);
+
+      /* write gN(n) into state for next sample processing */
+      *px2++ = gnext;
+
+      /* Read gN-3(n-1) from state buffer */
+      gcurr1 = *(px1 + 2u);
+
+      /* y(n) += gN(n) * vN  */
+      acc += (gnext * v1);
+
+      /* fN-3(n) = fN-2(n) - kN-2 * gN-3(n-1) */
+      fnext1 = fnext2 - (k1 * gcurr1);
+
+      /* gN-1(n)   = kN-1 * fN-2(n) + gN-2(n-1) */
+      gnext = gcurr2 + (k2 * fnext2);
+
+      /* Read gN-4(n-1) from state buffer */
+      gcurr2 = *(px1 + 3u);
+
+      /* y(n) += gN-1(n) * vN-1  */
+      acc += (gnext * v2);
+
+      /* read reflection coefficient kN-3 */
+      k2 = *(pk + 3u);
+
+      /* write gN-1(n) into state for next sample processing */
+      *px2++ = gnext;
+
+      /* fN-4(n) = fN-3(n) - kN-3 * gN-4(n-1) */
+      fnext2 = fnext1 - (k2 * gcurr2);
+
+      /* gN-2(n) = kN-2 * fN-3(n) + gN-3(n-1) */
+      gnext = gcurr1 + (k1 * fnext1);
+
+      /* read ladder coefficient vN-2 */
+      v3 = *(pv + 2u);
+
+      /* y(n) += gN-2(n) * vN-2  */
+      acc += (gnext * v3);
+
+      /* write gN-2(n) into state for next sample processing */
+      *px2++ = gnext;
+
+      /* update pointer */
+      pk += 4u;
+
+      /* gN-3(n) = kN-3 * fN-4(n) + gN-4(n-1) */
+      gnext = (fnext2 * k2) + gcurr2;
+
+      /* read next ladder coefficient vN-3 */
+      v4 = *(pv + 3u);
+
+      /* y(n) += gN-3(n) * vN-3  */
+      acc += (gnext * v4);
+
+      /* write gN-3(n) into state for next sample processing */
+      *px2++ = gnext;
+
+      /* update pointers */
+      px1 += 4u;
+      pv += 4u;
+
+      tapCnt--;
+
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = (numStages) % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      gcurr1 = *px1++;
+      /* Process sample for last taps */
+      fnext1 = fnext2 - ((*pk) * gcurr1);
+      gnext = (fnext1 * (*pk++)) + gcurr1;
+      /* Output samples for last taps */
+      acc += (gnext * (*pv++));
+      *px2++ = gnext;
+      fnext2 = fnext1;
+
+      tapCnt--;
+
+    }
+
+    /* y(n) += g0(n) * v0 */
+    acc += (fnext2 * (*pv));
+
+    *px2++ = fnext2;
+
+    /* write out into pDst */
+    *pDst++ = acc;
+
+    /* Advance the state pointer by 1 to process the next sample */
+    pState = pState + 1u;
+
+    blkCnt--;
+
+  }
+
+  /* Processing is complete. Now copy the last S->numStages samples to the start of the buffer
+     in preparation for processing the next frame */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = &S->pState[0];
+  pState = &S->pState[blockSize];
+
+  tapCnt = numStages >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+
+  }
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numStages) % 0x4u;
+
+  /* Copy the remaining float32_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+}
+
+#else
+
+void arm_iir_lattice_f32(
+  const arm_iir_lattice_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t fcurr, fnext = 0, gcurr, gnext;      /* Temporary variables for lattice stages */
+  float32_t acc;                                 /* Accumulator */
+  uint32_t blkCnt, tapCnt;                       /* temporary variables for counts */
+  float32_t *px1, *px2, *pk, *pv;                /* temporary pointers for state and coef */
+  uint32_t numStages = S->numStages;             /* number of stages */
+  float32_t *pState;                             /* State pointer */
+  float32_t *pStateCurnt;                        /* State current pointer */
+
+
+  /* Run the below code for Cortex-M0 */
+
+  blkCnt = blockSize;
+
+  pState = &S->pState[0];
+
+  /* Sample processing */
+  while(blkCnt > 0u)
+  {
+    /* Read Sample from input buffer */
+    /* fN(n) = x(n) */
+    fcurr = *pSrc++;
+
+    /* Initialize state read pointer */
+    px1 = pState;
+    /* Initialize state write pointer */
+    px2 = pState;
+    /* Set accumulator to zero */
+    acc = 0.0f;
+    /* Initialize Ladder coeff pointer */
+    pv = &S->pvCoeffs[0];
+    /* Initialize Reflection coeff pointer */
+    pk = &S->pkCoeffs[0];
+
+
+    /* Process sample for numStages */
+    tapCnt = numStages;
+
+    while(tapCnt > 0u)
+    {
+      gcurr = *px1++;
+      /* Process sample for the current stage */
+      fnext = fcurr - ((*pk) * gcurr);
+      gnext = (fnext * (*pk++)) + gcurr;
+
+      /* Accumulate the ladder output for the current stage */
+      acc += (gnext * (*pv++));
+      *px2++ = gnext;
+      fcurr = fnext;
+
+      /* Decrementing loop counter */
+      tapCnt--;
+
+    }
+
+    /* y(n) += g0(n) * v0 */
+    acc += (fnext * (*pv));
+
+    *px2++ = fnext;
+
+    /* write out into pDst */
+    *pDst++ = acc;
+
+    /* Advance the state pointer by 1 to process the next group of samples */
+    pState = pState + 1u;
+    blkCnt--;
+
+  }
+
+  /* Processing is complete. Now copy the last S->numStages samples to the start of the buffer
+     in preparation for processing the next frame */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = &S->pState[0];
+  pState = &S->pState[blockSize];
+
+  tapCnt = numStages;
+
+  /* Copy the data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+}
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+
+/**    
+ * @} end of IIR_Lattice group    
+ */
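
Note on arm_iir_lattice_f32.c: the recurrence in the file's header comment is easier to follow without the 4x loop unrolling. The sketch below is a minimal scalar version in plain C. It assumes a hypothetical helper named `lattice_step_sketch` and a compact state array that holds only the N previous g values, whereas the library slides a pointer through a `numStages + blockSize` buffer. It is an illustration of the documented equations, not the CMSIS-DSP implementation.

```c
#include <stddef.h>

/* Hypothetical helper (not part of CMSIS-DSP): process one input sample.
 *   k : reflection coefficients in time-reversed order {kN, ..., k1}
 *   v : ladder coefficients in time-reversed order {vN, ..., v0}
 *   g : previous g values {gN-1(n-1), ..., g0(n-1)}, numStages entries,
 *       updated in place to {gN-1(n), ..., g0(n)}
 */
static float lattice_step_sketch(float x, const float *k, const float *v,
                                 float *g, size_t numStages)
{
  float f = x;        /* fN(n) = x(n) */
  float acc = 0.0f;   /* y(n) accumulator */
  size_t i;

  for (i = 0; i < numStages; i++)
  {
    float gprev = g[i];            /* gm-1(n-1) */
    float gm;

    f = f - k[i] * gprev;          /* fm-1(n) = fm(n) - km * gm-1(n-1) */
    gm = k[i] * f + gprev;         /* gm(n)   = km * fm-1(n) + gm-1(n-1) */
    acc += v[i] * gm;              /* y(n)   += vm * gm(n) */

    /* gm(n) is read as gm(n-1) one slot earlier on the next sample;
       gN(n) is never read again, so it is not stored. */
    if (i > 0)
    {
      g[i - 1] = gm;
    }
  }

  /* g0(n) = f0(n) */
  acc += v[numStages] * f;
  if (numStages > 0)
  {
    g[numStages - 1] = f;
  }

  return acc;
}
```

With a single stage, k1 = 0.5 and ladder coefficients {v1, v0} = {0, 1}, the filter reduces to y(n) = x(n) - 0.5 * y(n-1), so a unit impulse produces 1, -0.5, 0.25.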

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,91 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_iir_lattice_init_f32.c    
+*    
+* Description:  Floating-point IIR lattice filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup IIR_Lattice    
+ * @{    
+ */
+
+/**    
+ * @brief Initialization function for the floating-point IIR lattice filter.    
+ * @param[in] *S points to an instance of the floating-point IIR lattice structure.    
+ * @param[in] numStages number of stages in the filter.    
+ * @param[in] *pkCoeffs points to the reflection coefficient buffer.  The array is of length numStages.    
+ * @param[in] *pvCoeffs points to the ladder coefficient buffer.  The array is of length numStages+1.    
+ * @param[in] *pState points to the state buffer.  The array is of length numStages+blockSize.    
+ * @param[in] blockSize number of samples to process.    
+ * @return none.    
+ */
+
+void arm_iir_lattice_init_f32(
+  arm_iir_lattice_instance_f32 * S,
+  uint16_t numStages,
+  float32_t * pkCoeffs,
+  float32_t * pvCoeffs,
+  float32_t * pState,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numStages = numStages;
+
+  /* Assign reflection coefficient pointer */
+  S->pkCoeffs = pkCoeffs;
+
+  /* Assign ladder coefficient pointer */
+  S->pvCoeffs = pvCoeffs;
+
+  /* Clear the state buffer; its size is always blockSize + numStages */
+  memset(pState, 0, (numStages + blockSize) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+
+}
+
+  /**    
+   * @} end of IIR_Lattice group    
+   */
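
Note on arm_iir_lattice_init_f32.c: the buffer sizes the init function expects can be made explicit with a small setup sketch. To stay self-contained, the block below does not include arm_math.h. It uses a stand-in struct (`lattice_f32_sketch`) and a stand-in init routine (`lattice_init_sketch`), both hypothetical names that mirror the field order and the steps shown in the diff above. The coefficient values are arbitrary placeholders, not a designed filter.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for arm_iir_lattice_instance_f32 (assumed field order:
   numStages, pState, pkCoeffs, pvCoeffs, as in the manual-init example). */
typedef struct
{
  uint16_t numStages;
  float *pState;
  float *pkCoeffs;
  float *pvCoeffs;
} lattice_f32_sketch;

#define NUM_STAGES 3u
#define BLOCK_SIZE 8u

/* Reflection coefficients {k3, k2, k1}: numStages entries (placeholder values). */
static float kCoeffs[NUM_STAGES] = { 0.1f, -0.2f, 0.3f };
/* Ladder coefficients {v3, v2, v1, v0}: numStages + 1 entries (placeholder values). */
static float vCoeffs[NUM_STAGES + 1u] = { 0.0f, 0.0f, 0.0f, 1.0f };
/* State buffer: numStages + blockSize entries. */
static float state[NUM_STAGES + BLOCK_SIZE];

/* Mirrors the steps of the init function above: store the fields,
   then zero the whole state buffer. */
static void lattice_init_sketch(lattice_f32_sketch *S, uint16_t numStages,
                                float *pkCoeffs, float *pvCoeffs,
                                float *pState, uint32_t blockSize)
{
  S->numStages = numStages;
  S->pkCoeffs = pkCoeffs;
  S->pvCoeffs = pvCoeffs;
  memset(pState, 0, (numStages + blockSize) * sizeof(float));
  S->pState = pState;
}
```

Because the state buffer is sized for one particular blockSize, later processing calls should not pass a larger block than the one used here.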

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,91 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_iir_lattice_init_q15.c    
+*    
+* Description:  Q15 IIR lattice filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup IIR_Lattice    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for the Q15 IIR lattice filter.    
+   * @param[in] *S points to an instance of the Q15 IIR lattice structure.    
+   * @param[in] numStages  number of stages in the filter.    
+   * @param[in] *pkCoeffs points to reflection coefficient buffer.  The array is of length numStages.    
+   * @param[in] *pvCoeffs points to ladder coefficient buffer.  The array is of length numStages+1.    
+   * @param[in] *pState points to state buffer.  The array is of length numStages+blockSize.    
+   * @param[in] blockSize number of samples to process per call.    
+   * @return none.    
+   */
+
+void arm_iir_lattice_init_q15(
+  arm_iir_lattice_instance_q15 * S,
+  uint16_t numStages,
+  q15_t * pkCoeffs,
+  q15_t * pvCoeffs,
+  q15_t * pState,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numStages = numStages;
+
+  /* Assign reflection coefficient pointer */
+  S->pkCoeffs = pkCoeffs;
+
+  /* Assign ladder coefficient pointer */
+  S->pvCoeffs = pvCoeffs;
+
+  /* Clear the state buffer; its size is always blockSize + numStages */
+  memset(pState, 0, (numStages + blockSize) * sizeof(q15_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+
+}
+
+/**    
+ * @} end of IIR_Lattice group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,91 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_iir_lattice_init_q31.c    
+*    
+* Description:  Initialization function for the Q31 IIR lattice filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup IIR_Lattice    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for the Q31 IIR lattice filter.    
+   * @param[in] *S points to an instance of the Q31 IIR lattice structure.    
+   * @param[in] numStages number of stages in the filter.    
+   * @param[in] *pkCoeffs points to the reflection coefficient buffer.  The array is of length numStages.    
+   * @param[in] *pvCoeffs points to the ladder coefficient buffer.  The array is of length numStages+1.    
+   * @param[in] *pState points to the state buffer.  The array is of length numStages+blockSize.    
+   * @param[in] blockSize number of samples to process.    
+   * @return none.    
+   */
+
+void arm_iir_lattice_init_q31(
+  arm_iir_lattice_instance_q31 * S,
+  uint16_t numStages,
+  q31_t * pkCoeffs,
+  q31_t * pvCoeffs,
+  q31_t * pState,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numStages = numStages;
+
+  /* Assign reflection coefficient pointer */
+  S->pkCoeffs = pkCoeffs;
+
+  /* Assign ladder coefficient pointer */
+  S->pvCoeffs = pvCoeffs;
+
+  /* Clear the state buffer; its size is always blockSize + numStages */
+  memset(pState, 0, (numStages + blockSize) * sizeof(q31_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+
+}
+
+/**    
+ * @} end of IIR_Lattice group    
+ */
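
Note on the fixed-point variants: the Q15 processing function in arm_iir_lattice_q15.c computes each lattice stage in 1.15 arithmetic, shifting the 2.30 product right by 15 and saturating to 16 bits. The sketch below reproduces one such stage in portable C. The names `sat16_sketch` and `q15_stage_sketch` are hypothetical, and the plain-C clamp stands in for the `__SSAT(x, 16)` intrinsic used by the library. It also assumes that right-shifting a negative value is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int32_t sat16_sketch(int32_t x)
{
  if (x > 32767)
  {
    return 32767;
  }
  if (x < -32768)
  {
    return -32768;
  }
  return x;
}

/* One lattice stage in 1.15 format.
 *   k     : reflection coefficient km
 *   gprev : gm-1(n-1) read from the state buffer
 *   f     : in: fm(n), out: fm-1(n), saturated to 16 bits
 *   gout  : out: gm(n), saturated to 16 bits
 */
static void q15_stage_sketch(int16_t k, int16_t gprev, int32_t *f, int16_t *gout)
{
  /* fm-1(n) = fm(n) - km * gm-1(n-1); the 2.30 product is scaled back to 1.15 */
  int32_t fnext = *f - (((int32_t) gprev * k) >> 15);
  int32_t gnext;

  fnext = sat16_sketch(fnext);

  /* gm(n) = km * fm-1(n) + gm-1(n-1) */
  gnext = (((int32_t) fnext * k) >> 15) + gprev;

  *f = fnext;
  *gout = (int16_t) sat16_sketch(gnext);
}
```

For example, with k = 0.5 (16384), gm-1(n-1) = 0.25 (8192) and fm(n) = 0.5 (16384), the stage yields fm-1(n) = 12288 (0.375) and gm(n) = 14336 (0.4375).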

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,464 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_iir_lattice_q15.c    
+*    
+* Description:	Q15 IIR lattice filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup IIR_Lattice    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q15 IIR lattice filter.    
+ * @param[in] *S points to an instance of the Q15 IIR lattice structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data.    
+ * @param[in] blockSize number of samples to process.    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result.    
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.    
+ * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.    
+ * After all additions have been performed, the accumulator is truncated to 34.15 format by discarding low 15 bits.    
+ * Lastly, the accumulator is saturated to yield a result in 1.15 format.    
+ */
+
+void arm_iir_lattice_q15(
+  const arm_iir_lattice_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t fcurr, fnext, gcurr = 0, gnext;          /* Temporary variables for lattice stages */
+  q15_t gnext1, gnext2;                          /* Temporary variables for lattice stages */
+  uint32_t stgCnt;                               /* Temporary variables for counts */
+  q63_t acc;                                     /* Accumulator */
+  uint32_t blkCnt, tapCnt;                       /* Temporary variables for counts */
+  q15_t *px1, *px2, *pk, *pv;                    /* temporary pointers for state and coef */
+  uint32_t numStages = S->numStages;             /* number of stages */
+  q15_t *pState;                                 /* State pointer */
+  q15_t *pStateCurnt;                            /* State current pointer */
+  q15_t out;                                     /* Temporary variable for output */
+  q31_t v;                                       /* Temporary variable for ladder coefficient */
+#ifdef UNALIGNED_SUPPORT_DISABLE
+	q15_t v1, v2;
+#endif
+
+
+  blkCnt = blockSize;
+
+  pState = &S->pState[0];
+
+  /* Sample processing */
+  while(blkCnt > 0u)
+  {
+    /* Read Sample from input buffer */
+    /* fN(n) = x(n) */
+    fcurr = *pSrc++;
+
+    /* Initialize state read pointer */
+    px1 = pState;
+    /* Initialize state write pointer */
+    px2 = pState;
+    /* Set accumulator to zero */
+    acc = 0;
+    /* Initialize Ladder coeff pointer */
+    pv = &S->pvCoeffs[0];
+    /* Initialize Reflection coeff pointer */
+    pk = &S->pkCoeffs[0];
+
+
+    /* Process sample for first tap */
+    gcurr = *px1++;
+    /* fN-1(n) = fN(n) - kN * gN-1(n-1) */
+    fnext = fcurr - (((q31_t) gcurr * (*pk)) >> 15);
+    fnext = __SSAT(fnext, 16);
+    /* gN(n) = kN * fN-1(n) + gN-1(n-1) */
+    gnext = (((q31_t) fnext * (*pk++)) >> 15) + gcurr;
+    gnext = __SSAT(gnext, 16);
+    /* write gN(n) into state for next sample processing */
+    *px2++ = (q15_t) gnext;
+    /* y(n) += gN(n) * vN  */
+    acc += (q31_t) ((gnext * (*pv++)));
+
+
+    /* Update f values for next coefficient processing */
+    fcurr = fnext;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = (numStages - 1u) >> 2;
+
+    while(tapCnt > 0u)
+    {
+
+      /* Process sample for 2nd, 6th ...taps */
+      /* Read gN-2(n-1) from state buffer */
+      gcurr = *px1++;
+      /* Process sample for 2nd, 6th .. taps */
+      /* fN-2(n) = fN-1(n) - kN-1 * gN-2(n-1) */
+      fnext = fcurr - (((q31_t) gcurr * (*pk)) >> 15);
+      fnext = __SSAT(fnext, 16);
+      /* gN-1(n) = kN-1 * fN-2(n) + gN-2(n-1) */
+      gnext = (((q31_t) fnext * (*pk++)) >> 15) + gcurr;
+      gnext1 = (q15_t) __SSAT(gnext, 16);
+      /* write gN-1(n) into state */
+      *px2++ = (q15_t) gnext1;
+
+
+      /* Process sample for 3rd, 7th ...taps */
+      /* Read gN-3(n-1) from state */
+      gcurr = *px1++;
+      /* Process sample for 3rd, 7th .. taps */
+      /* fN-3(n) = fN-2(n) - kN-2 * gN-3(n-1) */
+      fcurr = fnext - (((q31_t) gcurr * (*pk)) >> 15);
+      fcurr = __SSAT(fcurr, 16);
+      /* gN-2(n) = kN-2 * fN-3(n) + gN-3(n-1) */
+      gnext = (((q31_t) fcurr * (*pk++)) >> 15) + gcurr;
+      gnext2 = (q15_t) __SSAT(gnext, 16);
+      /* write gN-2(n) into state */
+      *px2++ = (q15_t) gnext2;
+
+      /* Read vN-1 and vN-2 at a time */
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+      v = *__SIMD32(pv)++;
+
+#else
+
+	  v1 = *pv++;
+	  v2 = *pv++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+	  v = __PKHBT(v1, v2, 16);
+
+#else
+
+	  v = __PKHBT(v2, v1, 16);
+
+#endif	/* 	#ifndef ARM_MATH_BIG_ENDIAN		*/
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE */
+
+
+      /* Pack gN-1(n) and gN-2(n) */
+
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      gnext = __PKHBT(gnext1, gnext2, 16);
+
+#else
+
+      gnext = __PKHBT(gnext2, gnext1, 16);
+
+#endif /*   #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* y(n) += gN-1(n) * vN-1  */
+      /* process for gN-5(n) * vN-5, gN-9(n) * vN-9 ... */
+      /* y(n) += gN-2(n) * vN-2  */
+      /* process for gN-6(n) * vN-6, gN-10(n) * vN-10 ... */
+      acc = __SMLALD(gnext, v, acc);
+
+
+      /* Process sample for 4th, 8th ...taps */
+      /* Read gN-4(n-1) from state */
+      gcurr = *px1++;
+      /* Process sample for 4th, 8th .. taps */
+      /* fN-4(n) = fN-3(n) - kN-3 * gN-4(n-1) */
+      fnext = fcurr - (((q31_t) gcurr * (*pk)) >> 15);
+      fnext = __SSAT(fnext, 16);
+      /* gN-3(n) = kN-3 * fN-4(n) + gN-4(n-1) */
+      gnext = (((q31_t) fnext * (*pk++)) >> 15) + gcurr;
+      gnext1 = (q15_t) __SSAT(gnext, 16);
+      /* write  gN-3(n) for the next sample process */
+      *px2++ = (q15_t) gnext1;
+
+
+      /* Process sample for 5th, 9th ...taps */
+      /* Read gN-5(n-1) from state */
+      gcurr = *px1++;
+      /* Process sample for 5th, 9th .. taps */
+      /* fN-5(n) = fN-4(n) - kN-4 * gN-5(n-1) */
+      fcurr = fnext - (((q31_t) gcurr * (*pk)) >> 15);
+      fcurr = __SSAT(fcurr, 16);
+      /* gN-4(n) = kN-4 * fN-5(n) + gN-5(n-1) */
+      gnext = (((q31_t) fcurr * (*pk++)) >> 15) + gcurr;
+      gnext2 = (q15_t) __SSAT(gnext, 16);
+      /* write      gN-4(n) for the next sample process */
+      *px2++ = (q15_t) gnext2;
+
+      /* Read vN-3 and vN-4 at a time */
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+      v = *__SIMD32(pv)++;
+
+#else
+
+	  v1 = *pv++;
+	  v2 = *pv++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+	  v = __PKHBT(v1, v2, 16);
+
+#else
+
+	  v = __PKHBT(v2, v1, 16);
+
+#endif	/* #ifndef ARM_MATH_BIG_ENDIAN	 */
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE */
+
+
+      /* Pack gN-3(n) and gN-4(n) */
+#ifndef  ARM_MATH_BIG_ENDIAN
+
+      gnext = __PKHBT(gnext1, gnext2, 16);
+
+#else
+
+      gnext = __PKHBT(gnext2, gnext1, 16);
+
+#endif /*      #ifndef  ARM_MATH_BIG_ENDIAN    */
+
+      /* y(n) += gN-4(n) * vN-4  */
+      /* process for gN-8(n) * vN-8, gN-12(n) * vN-12 ... */
+      /* y(n) += gN-3(n) * vN-3  */
+      /* process for gN-7(n) * vN-7, gN-11(n) * vN-11 ... */
+      acc = __SMLALD(gnext, v, acc);
+
+      tapCnt--;
+
+    }
+
+    fnext = fcurr;
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = (numStages - 1u) % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      gcurr = *px1++;
+      /* Process sample for last taps */
+      fnext = fcurr - (((q31_t) gcurr * (*pk)) >> 15);
+      fnext = __SSAT(fnext, 16);
+      gnext = (((q31_t) fnext * (*pk++)) >> 15) + gcurr;
+      gnext = __SSAT(gnext, 16);
+      /* Output samples for last taps */
+      acc += (q31_t) (((q31_t) gnext * (*pv++)));
+      *px2++ = (q15_t) gnext;
+      fcurr = fnext;
+
+      tapCnt--;
+    }
+
+    /* y(n) += g0(n) * v0 */
+    acc += (q31_t) (((q31_t) fnext * (*pv++)));
+
+    out = (q15_t) __SSAT(acc >> 15, 16);
+    *px2++ = (q15_t) fnext;
+
+    /* write out into pDst */
+    *pDst++ = out;
+
+    /* Advance the state pointer by 1 to process the next sample */
+    pState = pState + 1u;
+    blkCnt--;
+
+  }
+
+  /* Processing is complete. Now copy last S->numStages samples to start of the buffer    
+     in preparation for processing the next frame */
+  /* Points to the start of the state buffer */
+  pStateCurnt = &S->pState[0];
+  pState = &S->pState[blockSize];
+
+  stgCnt = (numStages >> 2u);
+
+  /* copy data */
+  while(stgCnt > 0u)
+  {
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+
+#else
+
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+#endif /*	#ifndef UNALIGNED_SUPPORT_DISABLE */
+
+    /* Decrement the loop counter */
+    stgCnt--;
+
+  }
+
+  /* Calculation of count for remaining q15_t data */
+  stgCnt = (numStages) % 0x4u;
+
+  /* copy data */
+  while(stgCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    stgCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q31_t fcurr, fnext = 0, gcurr = 0, gnext;      /* Temporary variables for lattice stages */
+  uint32_t stgCnt;                               /* Temporary variables for counts */
+  q63_t acc;                                     /* Accumulator */
+  uint32_t blkCnt, tapCnt;                       /* Temporary variables for counts */
+  q15_t *px1, *px2, *pk, *pv;                    /* temporary pointers for state and coef */
+  uint32_t numStages = S->numStages;             /* number of stages */
+  q15_t *pState;                                 /* State pointer */
+  q15_t *pStateCurnt;                            /* State current pointer */
+  q15_t out;                                     /* Temporary variable for output */
+
+
+  blkCnt = blockSize;
+
+  pState = &S->pState[0];
+
+  /* Sample processing */
+  while(blkCnt > 0u)
+  {
+    /* Read Sample from input buffer */
+    /* fN(n) = x(n) */
+    fcurr = *pSrc++;
+
+    /* Initialize state read pointer */
+    px1 = pState;
+    /* Initialize state write pointer */
+    px2 = pState;
+    /* Set accumulator to zero */
+    acc = 0;
+    /* Initialize Ladder coeff pointer */
+    pv = &S->pvCoeffs[0];
+    /* Initialize Reflection coeff pointer */
+    pk = &S->pkCoeffs[0];
+
+    tapCnt = numStages;
+
+    while(tapCnt > 0u)
+    {
+      gcurr = *px1++;
+      /* Process sample */
+      /* fN-1(n) = fN(n) - kN * gN-1(n-1) */
+      fnext = fcurr - ((gcurr * (*pk)) >> 15);
+      fnext = __SSAT(fnext, 16);
+      /* gN(n) = kN * fN-1(n) + gN-1(n-1) */
+      gnext = ((fnext * (*pk++)) >> 15) + gcurr;
+      gnext = __SSAT(gnext, 16);
+      /* Output samples */
+      /* y(n) += gN(n) * vN */
+      acc += (q31_t) ((gnext * (*pv++)));
+      /* write gN(n) into state for next sample processing */
+      *px2++ = (q15_t) gnext;
+      /* Update f values for next coefficient processing */
+      fcurr = fnext;
+
+      tapCnt--;
+    }
+
+    /* y(n) += g0(n) * v0 */
+    acc += (q31_t) ((fnext * (*pv++)));
+
+    out = (q15_t) __SSAT(acc >> 15, 16);
+    *px2++ = (q15_t) fnext;
+
+    /* write out into pDst */
+    *pDst++ = out;
+
+    /* Advance the state pointer by 1 to process the next group of samples */
+    pState = pState + 1u;
+    blkCnt--;
+
+  }
+
+  /* Processing is complete. Now copy last S->numStages samples to start of the buffer           
+     in preparation for processing the next frame */
+  /* Points to the start of the state buffer */
+  pStateCurnt = &S->pState[0];
+  pState = &S->pState[blockSize];
+
+  stgCnt = numStages;
+
+  /* copy data */
+  while(stgCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    stgCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+
+
+
+/**    
+ * @} end of IIR_Lattice group    
+ */
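Reviewer note: the Cortex-M3/M4 path of arm_iir_lattice_q15 above packs two 16-bit g values into one 32-bit word with __PKHBT and then accumulates both products with a single __SMLALD. The block below is a minimal portable sketch of that arithmetic, assuming the little-endian packing order. The names `pack_halfwords` and `dual_mac` are hypothetical, chosen only for illustration, and are not part of CMSIS.

```c
#include <stdint.h>

/* Hypothetical helper: place `lo` in bits [15:0] and `hi` in bits [31:16],
   mirroring what __PKHBT(lo, hi, 16) produces for 16-bit operands. */
uint32_t pack_halfwords(int16_t lo, int16_t hi)
{
  return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Hypothetical helper: acc + x.lo * y.lo + x.hi * y.hi, treating each
   half-word as signed 16-bit and accumulating in 64 bits, which is the
   arithmetic __SMLALD performs. */
int64_t dual_mac(uint32_t x, uint32_t y, int64_t acc)
{
  /* Converting an out-of-range uint16_t to int16_t is implementation-defined
     before C23; two's complement wrap-around is assumed here. */
  int16_t xl = (int16_t)(uint16_t)(x & 0xFFFFu);
  int16_t xh = (int16_t)(uint16_t)(x >> 16);
  int16_t yl = (int16_t)(uint16_t)(y & 0xFFFFu);
  int16_t yh = (int16_t)(uint16_t)(y >> 16);

  /* Widen before multiplying so the sum of two products cannot overflow. */
  return acc + (int64_t)xl * yl + (int64_t)xh * yh;
}
```

With this model, `dual_mac(pack_halfwords(g1, g2), pack_halfwords(v1, v2), acc)` corresponds to the two ladder updates `acc += g1 * v1` and `acc += g2 * v2` that the unrolled loop performs per packed pair.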

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_iir_lattice_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,350 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_iir_lattice_q31.c    
+*    
+* Description:	Q31 IIR lattice filter processing function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup IIR_Lattice    
+ * @{    
+ */
+
+/**    
+ * @brief Processing function for the Q31 IIR lattice filter.    
+ * @param[in] *S points to an instance of the Q31 IIR lattice structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[out] *pDst points to the block of output data.    
+ * @param[in] blockSize number of samples to process.    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate multiplication results but provides only a single guard bit.    
+ * Thus, if the accumulator result overflows, it wraps around rather than clipping.
+ * In order to avoid overflows completely the input signal must be scaled down by 2*log2(numStages) bits.    
+ * After all multiply-accumulates are performed, the 2.62 accumulator is saturated to 1.32 format and then truncated to 1.31 format.    
+ */
+
+void arm_iir_lattice_q31(
+  const arm_iir_lattice_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t fcurr, fnext = 0, gcurr = 0, gnext;      /* Temporary variables for lattice stages */
+  q63_t acc;                                     /* Accumulator */
+  uint32_t blkCnt, tapCnt;                       /* Temporary variables for counts */
+  q31_t *px1, *px2, *pk, *pv;                    /* Temporary pointers for state and coef */
+  uint32_t numStages = S->numStages;             /* number of stages */
+  q31_t *pState;                                 /* State pointer */
+  q31_t *pStateCurnt;                            /* State current pointer */
+
+  blkCnt = blockSize;
+
+  pState = &S->pState[0];
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Sample processing */
+  while(blkCnt > 0u)
+  {
+    /* Read Sample from input buffer */
+    /* fN(n) = x(n) */
+    fcurr = *pSrc++;
+
+    /* Initialize state read pointer */
+    px1 = pState;
+    /* Initialize state write pointer */
+    px2 = pState;
+    /* Set accumulator to zero */
+    acc = 0;
+    /* Initialize Ladder coeff pointer */
+    pv = &S->pvCoeffs[0];
+    /* Initialize Reflection coeff pointer */
+    pk = &S->pkCoeffs[0];
+
+
+    /* Process sample for first tap */
+    gcurr = *px1++;
+    /* fN-1(n) = fN(n) - kN * gN-1(n-1) */
+    fnext = __QSUB(fcurr, (q31_t) (((q63_t) gcurr * (*pk)) >> 31));
+    /* gN(n) = kN * fN-1(n) + gN-1(n-1) */
+    gnext = __QADD(gcurr, (q31_t) (((q63_t) fnext * (*pk++)) >> 31));
+    /* write gN(n) into state for next sample processing */
+    *px2++ = gnext;
+    /* y(n) += gN(n) * vN  */
+    acc += ((q63_t) gnext * *pv++);
+
+    /* Update f values for next coefficient processing */
+    fcurr = fnext;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = (numStages - 1u) >> 2;
+
+    while(tapCnt > 0u)
+    {
+
+      /* Process sample for 2nd, 6th .. taps */
+      /* Read gN-2(n-1) from state buffer */
+      gcurr = *px1++;
+      /* fN-2(n) = fN-1(n) - kN-1 * gN-2(n-1) */
+      fnext = __QSUB(fcurr, (q31_t) (((q63_t) gcurr * (*pk)) >> 31));
+      /* gN-1(n) = kN-1 * fN-2(n) + gN-2(n-1) */
+      gnext = __QADD(gcurr, (q31_t) (((q63_t) fnext * (*pk++)) >> 31));
+      /* y(n) += gN-1(n) * vN-1  */
+      /* process for gN-5(n) * vN-5, gN-9(n) * vN-9 ... */
+      acc += ((q63_t) gnext * *pv++);
+      /* write gN-1(n) into state for next sample processing */
+      *px2++ = gnext;
+
+      /* Process sample for 3rd, 7th ...taps */
+      /* Read gN-3(n-1) from state buffer */
+      gcurr = *px1++;
+      /* Process sample for 3rd, 7th .. taps */
+      /* fN-3(n) = fN-2(n) - kN-2 * gN-3(n-1) */
+      fcurr = __QSUB(fnext, (q31_t) (((q63_t) gcurr * (*pk)) >> 31));
+      /* gN-2(n) = kN-2 * fN-3(n) + gN-3(n-1) */
+      gnext = __QADD(gcurr, (q31_t) (((q63_t) fcurr * (*pk++)) >> 31));
+      /* y(n) += gN-2(n) * vN-2  */
+      /* process for gN-6(n) * vN-6, gN-10(n) * vN-10 ... */
+      acc += ((q63_t) gnext * *pv++);
+      /* write gN-2(n) into state for next sample processing */
+      *px2++ = gnext;
+
+
+      /* Process sample for 4th, 8th ...taps */
+      /* Read gN-4(n-1) from state buffer */
+      gcurr = *px1++;
+      /* Process sample for 4th, 8th .. taps */
+      /* fN-4(n) = fN-3(n) - kN-3 * gN-4(n-1) */
+      fnext = __QSUB(fcurr, (q31_t) (((q63_t) gcurr * (*pk)) >> 31));
+      /* gN-3(n) = kN-3 * fN-4(n) + gN-4(n-1) */
+      gnext = __QADD(gcurr, (q31_t) (((q63_t) fnext * (*pk++)) >> 31));
+      /* y(n) += gN-3(n) * vN-3  */
+      /* process for gN-7(n) * vN-7, gN-11(n) * vN-11 ... */
+      acc += ((q63_t) gnext * *pv++);
+      /* write gN-3(n) into state for next sample processing */
+      *px2++ = gnext;
+
+
+      /* Process sample for 5th, 9th ...taps */
+      /* Read gN-5(n-1) from state buffer */
+      gcurr = *px1++;
+      /* Process sample for 5th, 9th .. taps */
+      /* fN-5(n) = fN-4(n) - kN-4 * gN-5(n-1) */
+      fcurr = __QSUB(fnext, (q31_t) (((q63_t) gcurr * (*pk)) >> 31));
+      /* gN-4(n) = kN-4 * fN-5(n) + gN-5(n-1) */
+      gnext = __QADD(gcurr, (q31_t) (((q63_t) fcurr * (*pk++)) >> 31));
+      /* y(n) += gN-4(n) * vN-4  */
+      /* process for gN-8(n) * vN-8, gN-12(n) * vN-12 ... */
+      acc += ((q63_t) gnext * *pv++);
+      /* write gN-4(n) into state for next sample processing */
+      *px2++ = gnext;
+
+      tapCnt--;
+
+    }
+
+    fnext = fcurr;
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = (numStages - 1u) % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      gcurr = *px1++;
+      /* Process sample for last taps */
+      fnext = __QSUB(fcurr, (q31_t) (((q63_t) gcurr * (*pk)) >> 31));
+      gnext = __QADD(gcurr, (q31_t) (((q63_t) fnext * (*pk++)) >> 31));
+      /* Output samples for last taps */
+      acc += ((q63_t) gnext * *pv++);
+      *px2++ = gnext;
+      fcurr = fnext;
+
+      tapCnt--;
+
+    }
+
+    /* y(n) += g0(n) * v0 */
+    acc += (q63_t) fnext * (*pv++);
+
+
+    *px2++ = fnext;
+
+    /* write out into pDst */
+    *pDst++ = (q31_t) (acc >> 31u);
+
+    /* Advance the state pointer by 1 to process the next sample */
+    pState = pState + 1u;
+    blkCnt--;
+
+  }
+
+  /* Processing is complete. Now copy last S->numStages samples to start of the buffer    
+     in preparation for processing the next frame */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = &S->pState[0];
+  pState = &S->pState[blockSize];
+
+  tapCnt = numStages >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+
+  }
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numStages) % 0x4u;
+
+  /* Copy the remaining q31_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  /* Sample processing */
+  while(blkCnt > 0u)
+  {
+    /* Read Sample from input buffer */
+    /* fN(n) = x(n) */
+    fcurr = *pSrc++;
+
+    /* Initialize state read pointer */
+    px1 = pState;
+    /* Initialize state write pointer */
+    px2 = pState;
+    /* Set accumulator to zero */
+    acc = 0;
+    /* Initialize Ladder coeff pointer */
+    pv = &S->pvCoeffs[0];
+    /* Initialize Reflection coeff pointer */
+    pk = &S->pkCoeffs[0];
+
+    tapCnt = numStages;
+
+    while(tapCnt > 0u)
+    {
+      gcurr = *px1++;
+      /* Process sample */
+      /* fN-1(n) = fN(n) - kN * gN-1(n-1) */
+      fnext =
+        clip_q63_to_q31(((q63_t) fcurr -
+                         ((q31_t) (((q63_t) gcurr * (*pk)) >> 31))));
+      /* gN(n) = kN * fN-1(n) + gN-1(n-1) */
+      gnext =
+        clip_q63_to_q31(((q63_t) gcurr +
+                         ((q31_t) (((q63_t) fnext * (*pk++)) >> 31))));
+      /* Output samples */
+      /* y(n) += gN(n) * vN  */
+      acc += ((q63_t) gnext * *pv++);
+      /* write gN(n) into state for next sample processing */
+      *px2++ = gnext;
+      /* Update f values for next coefficient processing */
+      fcurr = fnext;
+
+      tapCnt--;
+    }
+
+    /* y(n) += g0(n) * v0 */
+    acc += (q63_t) fnext * (*pv++);
+
+
+    *px2++ = fnext;
+
+    /* write out into pDst */
+    *pDst++ = (q31_t) (acc >> 31u);
+
+    /* Advance the state pointer by 1 to process the next group of samples */
+    pState = pState + 1u;
+    blkCnt--;
+
+  }
+
+  /* Processing is complete. Now copy last S->numStages samples to start of the buffer           
+     in preparation for processing the next frame */
+
+  /* Points to the start of the state buffer */
+  pStateCurnt = &S->pState[0];
+  pState = &S->pState[blockSize];
+
+  tapCnt = numStages;
+
+  /* Copy the q31_t state data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+
+
+
+/**    
+ * @} end of IIR_Lattice group    
+ */
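Reviewer note: the per-sample recurrence in arm_iir_lattice_q31 above can be hard to follow through the unrolled loop and the sliding state pointer. The block below is a minimal plain-C sketch of one output sample. It is not the CMSIS implementation: it keeps the delayed g values in a fixed array `d` of `numStages` entries instead of a sliding window, and it saturates on the full 64-bit intermediate. The names `sat_q31` and `lattice_sample_q31` are hypothetical. As in the code above, `k` holds `numStages` reflection coefficients starting with kN, and `v` holds `numStages + 1` ladder coefficients starting with vN.

```c
#include <stdint.h>

/* Saturate a 64-bit value to the q31 range, the role __QADD/__QSUB and
   clip_q63_to_q31 play in the CMSIS code. */
static int32_t sat_q31(int64_t x)
{
  if (x > INT32_MAX) return INT32_MAX;
  if (x < INT32_MIN) return INT32_MIN;
  return (int32_t)x;
}

/* Hypothetical reference: compute one output sample of the IIR lattice.
   d[i] holds the delayed g value consumed by stage i. Right-shifting a
   negative value is implementation-defined in C; an arithmetic shift is
   assumed. */
int32_t lattice_sample_q31(int32_t x, int32_t *d, const int32_t *k,
                           const int32_t *v, uint32_t numStages)
{
  int32_t f = x;      /* fN(n) = x(n) */
  int64_t acc = 0;    /* 2.62 ladder accumulator */
  uint32_t i;

  for (i = 0; i < numStages; i++) {
    int32_t gprev = d[i];
    int32_t g;

    /* f_next = f - k * g(n-1) */
    f = sat_q31((int64_t)f - (((int64_t)gprev * k[i]) >> 31));
    /* g = k * f_next + g(n-1) */
    g = sat_q31((int64_t)gprev + (((int64_t)f * k[i]) >> 31));
    /* y(n) += g * v */
    acc += (int64_t)g * v[i];
    /* This g is what stage i-1 reads as its delayed input next sample. */
    if (i > 0) d[i - 1] = g;
  }

  /* y(n) += g0(n) * v0, with g0(n) = f0(n) */
  acc += (int64_t)f * v[numStages];
  if (numStages > 0) d[numStages - 1] = f;

  return (int32_t)(acc >> 31);
}
```

Keeping the state in place avoids the end-of-block copy, at the cost of the contiguous reads that the sliding-window layout in the diff gives the unrolled loop.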

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,442 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_lms_f32.c    
+*    
+* Description:	Processing function for the floating-point LMS filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup LMS Least Mean Square (LMS) Filters    
+ *    
+ * LMS filters are a class of adaptive filters that are able to "learn" an unknown transfer function.
+ * LMS filters use a gradient descent method in which the filter coefficients are updated based on the instantaneous error signal.    
+ * Adaptive filters are often used in communication systems, equalizers, and noise removal.    
+ * The CMSIS DSP Library contains LMS filter functions that operate on Q15, Q31, and floating-point data types.    
+ * The library also contains normalized LMS filters in which the filter coefficient adaptation is independent of the level of the input signal.
+ *    
+ * An LMS filter consists of two components as shown below.    
+ * The first component is a standard transversal or FIR filter.    
+ * The second component is a coefficient update mechanism.    
+ * The LMS filter has two input signals.    
+ * The "input" feeds the FIR filter while the "reference input" corresponds to the desired output of the FIR filter.    
+ * That is, the FIR filter coefficients are updated so that the output of the FIR filter matches the reference input.    
+ * The filter coefficient update mechanism is based on the difference between the FIR filter output and the reference input.    
+ * This "error signal" tends towards zero as the filter adapts.    
+ * The LMS processing functions accept the input and reference input signals and generate the filter output and error signal.    
+ * \image html LMS.gif "Internal structure of the Least Mean Square filter"    
+ *    
+ * The functions operate on blocks of data and each call to the function processes    
+ * <code>blockSize</code> samples through the filter.    
+ * <code>pSrc</code> points to input signal, <code>pRef</code> points to reference signal,    
+ * <code>pOut</code> points to output signal and <code>pErr</code> points to error signal.    
+ * All arrays contain <code>blockSize</code> values.    
+ *    
+ * The functions operate on a block-by-block basis.    
+ * Internally, the filter coefficients <code>b[n]</code> are updated on a sample-by-sample basis.    
+ * The convergence of the LMS filter is slower compared to the normalized LMS algorithm.    
+ *    
+ * \par Algorithm:    
+ * The output signal <code>y[n]</code> is computed by a standard FIR filter:    
+ * <pre>    
+ *     y[n] = b[0] * x[n] + b[1] * x[n-1] + b[2] * x[n-2] + ...+ b[numTaps-1] * x[n-numTaps+1]    
+ * </pre>    
+ *    
+ * \par    
+ * The error signal equals the difference between the reference signal <code>d[n]</code> and the filter output:    
+ * <pre>    
+ *     e[n] = d[n] - y[n].    
+ * </pre>    
+ *    
+ * \par    
+ * After each sample of the error signal is computed, the filter coefficients <code>b[k]</code> are updated on a sample-by-sample basis:    
+ * <pre>    
+ *     b[k] = b[k] + e[n] * mu * x[n-k],  for k=0, 1, ..., numTaps-1    
+ * </pre>    
+ * where <code>mu</code> is the step size and controls the rate of coefficient convergence.    
+ *\par    
+ * In the APIs, <code>pCoeffs</code> points to a coefficient array of size <code>numTaps</code>.    
+ * Coefficients are stored in time reversed order.    
+ * \par    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to a state array of size <code>numTaps + blockSize - 1</code>.    
+ * Samples in the state buffer are stored oldest first, where <code>x[n]</code> is the first sample of the current block:
+ * \par
+ * <pre>
+ *    {x[n-numTaps+1], x[n-numTaps+2], ..., x[n-1], x[n], x[n+1], ..., x[n+blockSize-1]}
+ * </pre>    
+ * \par    
+ * Note that the length of the state buffer exceeds the length of the coefficient array by <code>blockSize-1</code> samples.    
+ * The increased state buffer length allows circular addressing, which is traditionally used in FIR filters,    
+ * to be avoided and yields a significant speed improvement.    
+ * The state variables are updated after each block of data is processed.    
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter and    
+ * coefficient and state arrays cannot be shared among instances.    
+ * There are separate instance structure declarations for each of the 3 supported data types.    
+ *    
+ * \par Initialization Functions    
+ * There is also an associated initialization function for each data type.    
+ * The initialization function performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.    
+ * To do this manually without calling the init function, assign the following fields of the instance structure:
+ * numTaps, pCoeffs, mu, postShift (not for f32), pState. Also set all of the values in pState to zero. 
+ *
+ * \par    
+ * Use of the initialization function is optional.    
+ * However, if the initialization function is used, then the instance structure cannot be placed into a const data section.    
+ * To place an instance structure into a const data section, the instance structure must be manually initialized.    
+ * Set the values in the state buffer to zeros before static initialization.    
+ * The code below statically initializes the instance structure for each of the 3 supported data types:
+ * <pre>    
+ *    arm_lms_instance_f32 S = {numTaps, pState, pCoeffs, mu};    
+ *    arm_lms_instance_q31 S = {numTaps, pState, pCoeffs, mu, postShift};    
+ *    arm_lms_instance_q15 S = {numTaps, pState, pCoeffs, mu, postShift};    
+ * </pre>    
+ * where <code>numTaps</code> is the number of filter coefficients in the filter; <code>pState</code> is the address of the state buffer;    
+ * <code>pCoeffs</code> is the address of the coefficient buffer; <code>mu</code> is the step size parameter; and <code>postShift</code> is the shift applied to coefficients.    
+ *    
+ * \par Fixed-Point Behavior:    
+ * Care must be taken when using the Q15 and Q31 versions of the LMS filter.    
+ * The following issues must be considered:    
+ * - Scaling of coefficients    
+ * - Overflow and saturation    
+ *    
+ * \par Scaling of Coefficients:    
+ * Filter coefficients are represented as fractional values and    
+ * coefficients are restricted to lie in the range <code>[-1, +1)</code>.
+ * The fixed-point functions have an additional scaling parameter <code>postShift</code>.    
+ * At the output of the filter's accumulator is a shift register which shifts the result by <code>postShift</code> bits.    
+ * This essentially scales the filter coefficients by <code>2^postShift</code> and    
+ * allows the filter coefficients to exceed the range <code>[-1, +1)</code>.
+ * The value of <code>postShift</code> is set by the user based on the expected gain through the system being modeled.    
+ *    
+ * \par Overflow and Saturation:    
+ * Overflow and saturation behavior of the fixed-point Q15 and Q31 versions are    
+ * described separately as part of the function specific documentation below.    
+ */
+
+/**    
+ * @addtogroup LMS    
+ * @{    
+ */
+
+/**           
+ * @details           
+ * This function operates on floating-point data types.       
+ *    
+ * @brief Processing function for floating-point LMS filter.    
+ * @param[in]  *S points to an instance of the floating-point LMS filter structure.    
+ * @param[in]  *pSrc points to the block of input data.    
+ * @param[in]  *pRef points to the block of reference data.    
+ * @param[out] *pOut points to the block of output data.    
+ * @param[out] *pErr points to the block of error data.    
+ * @param[in]  blockSize number of samples to process.    
+ * @return     none.    
+ */
+
+void arm_lms_f32(
+  const arm_lms_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pRef,
+  float32_t * pOut,
+  float32_t * pErr,
+  uint32_t blockSize)
+{
+  float32_t *pState = S->pState;                 /* State pointer */
+  float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+  float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+  float32_t *px, *pb;                            /* Temporary pointers for state and coefficient buffers */
+  float32_t mu = S->mu;                          /* Adaptive factor */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+  float32_t sum, e, d;                           /* accumulator, error, reference data sample */
+  float32_t w = 0.0f;                            /* weight factor */
+
+  e = 0.0f;
+  d = 0.0f;
+
+  /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  blkCnt = blockSize;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Set the accumulator to zero */
+    sum = 0.0f;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += (*px++) * (*pb++);
+      sum += (*px++) * (*pb++);
+      sum += (*px++) * (*pb++);
+      sum += (*px++) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += (*px++) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* The result in the accumulator, store in the destination buffer. */
+    *pOut++ = sum;
+
+    /* Compute and store error */
+    d = (float32_t) (*pRef++);
+    e = d - sum;
+    *pErr++ = e;
+
+    /* Calculation of Weighting factor for the updating filter coefficients */
+    w = e * mu;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Update filter coefficients */
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      *pb = *pb + (w * (*px++));
+      pb++;
+
+      *pb = *pb + (w * (*px++));
+      pb++;
+
+      *pb = *pb + (w * (*px++));
+      pb++;
+
+      *pb = *pb + (w * (*px++));
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      *pb = *pb + (w * (*px++));
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the    
+     start of the state buffer. This prepares the state buffer for the
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* Loop unrolling for (numTaps - 1u) samples copy */
+  tapCnt = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* Copy the remaining float32_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Set the accumulator to zero */
+    sum = 0.0f;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += (*px++) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* The result is stored in the destination buffer. */
+    *pOut++ = sum;
+
+    /* Compute and store error */
+    d = (float32_t) (*pRef++);
+    e = d - sum;
+    *pErr++ = e;
+
+    /* Weighting factor for the LMS version */
+    w = e * mu;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      *pb = *pb + (w * (*px++));
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the        
+   * start of the state buffer. This prepares the state buffer for the        
+   * next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /*  Copy (numTaps - 1u) samples  */
+  tapCnt = (numTaps - 1u);
+
+  /* Copy the data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+   * @} end of LMS group    
+   */
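
The Cortex-M0 branch above is the plain per-sample LMS recursion: filter, form the error, then nudge every coefficient by `mu * e * x`. The following is a minimal standalone sketch of that recursion, not the library code. The names `lms_step` and `lms_demo_identify` are hypothetical, the sketch uses a shifting delay line instead of the linear state buffer in the listing, and it stores coefficients in natural order rather than time-reversed order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a CMSIS function): one LMS iteration.
 *   x        delay line of num_taps samples, x[0] is the newest
 *   b        coefficients in natural order, updated in place
 *   in, ref  new input sample and reference (desired) sample
 *   mu       step size
 * Returns the error e = ref - y. */
float lms_step(float *x, float *b, size_t num_taps,
               float in, float ref, float mu)
{
    assert(num_taps > 0);

    /* Shift the delay line and insert the newest sample */
    for (size_t k = num_taps - 1; k > 0; k--) {
        x[k] = x[k - 1];
    }
    x[0] = in;

    /* FIR output: y = sum of b[k] * x[n-k] */
    float y = 0.0f;
    for (size_t k = 0; k < num_taps; k++) {
        y += b[k] * x[k];
    }

    /* Error and weighting factor, as in the listing (w = e * mu) */
    float e = ref - y;
    float w = e * mu;

    /* Coefficient update: b[k] += w * x[n-k] */
    for (size_t k = 0; k < num_taps; k++) {
        b[k] += w * x[k];
    }
    return e;
}

/* Hypothetical demo: adapt a 2-tap filter towards the assumed
 * "unknown" system h = {0.5, -0.25} using pseudo-random input. */
void lms_demo_identify(float *b_out)
{
    const float h[2] = {0.5f, -0.25f};
    float x[2] = {0.0f, 0.0f};
    float b[2] = {0.0f, 0.0f};
    float prev = 0.0f;
    unsigned seed = 1u;

    for (int n = 0; n < 2000; n++) {
        /* Small linear congruential generator, input in [-1, 1) */
        seed = seed * 1103515245u + 12345u;
        float in = ((float)((seed >> 16) & 0xFFu) - 128.0f) / 128.0f;
        float ref = h[0] * in + h[1] * prev;
        (void)lms_step(x, b, 2, in, ref, 0.1f);
        prev = in;
    }
    b_out[0] = b[0];
    b_out[1] = b[1];
}
```

With a noise-free reference and a small step size, the adapted coefficients should approach the assumed system response; a larger `mu` converges faster but can become unstable.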

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,95 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_lms_init_f32.c    
+*    
+* Description:  Floating-point LMS filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @addtogroup LMS    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for floating-point LMS filter.    
+   * @param[in] *S points to an instance of the floating-point LMS filter structure.    
+   * @param[in] numTaps  number of filter coefficients.    
+   * @param[in] *pCoeffs points to the coefficient buffer.    
+   * @param[in] *pState points to state buffer.    
+   * @param[in] mu step size that controls filter coefficient updates.    
+   * @param[in] blockSize number of samples to process.    
+   * @return none.    
+   */
+
+/**    
+ * \par Description:    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * The initial filter coefficients serve as a starting point for the adaptive filter.    
+ * <code>pState</code> points to an array of length <code>numTaps+blockSize-1</code> samples, where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_lms_f32()</code>.    
+ */
+
+void arm_lms_init_f32(
+  arm_lms_instance_f32 * S,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  float32_t mu,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear state buffer and size is always blockSize + numTaps - 1 */
+  memset(pState, 0, (numTaps + (blockSize - 1)) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+  /* Assign Step size value */
+  S->mu = mu;
+}
+
+/**    
+ * @} end of LMS group    
+ */
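
The state buffer holds `numTaps + blockSize - 1` samples: `numTaps - 1` history samples followed by one block of new input. After each block, the newest `numTaps - 1` samples are copied back to the start. The sketch below illustrates only that buffer discipline. `lms_state_t` and the three functions are hypothetical stand-ins, not the CMSIS instance structure or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the filter instance (illustrative fields only) */
typedef struct {
    size_t num_taps;
    size_t block_size;
    float *state;   /* caller-provided: num_taps + block_size - 1 floats */
} lms_state_t;

/* Record the sizes and zero the whole state buffer, as the init function does */
void lms_state_init(lms_state_t *s, float *state,
                    size_t num_taps, size_t block_size)
{
    assert(num_taps > 0 && block_size > 0);
    s->num_taps = num_taps;
    s->block_size = block_size;
    s->state = state;
    memset(state, 0, (num_taps + block_size - 1) * sizeof(float));
}

/* Place one block of input after the num_taps - 1 history samples */
void lms_state_load_block(lms_state_t *s, const float *src)
{
    memcpy(&s->state[s->num_taps - 1], src, s->block_size * sizeof(float));
}

/* After processing, keep the newest num_taps - 1 samples as history.
 * memmove is used because source and destination may overlap. */
void lms_state_retire_block(lms_state_t *s)
{
    memmove(s->state, &s->state[s->block_size],
            (s->num_taps - 1) * sizeof(float));
}
```

For example, with 3 taps and a block size of 4 the caller would provide a 6-element buffer; after loading the block `{1, 2, 3, 4}` and retiring it, the first two elements hold `3` and `4`, ready for the next call.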

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,105 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_lms_init_q15.c    
+*    
+* Description:  Q15 LMS filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup LMS    
+ * @{    
+ */
+
+/**    
+* @brief Initialization function for the Q15 LMS filter.    
+* @param[in] *S points to an instance of the Q15 LMS filter structure.    
+* @param[in] numTaps  number of filter coefficients.    
+* @param[in] *pCoeffs points to the coefficient buffer.    
+* @param[in] *pState points to the state buffer.    
+* @param[in] mu step size that controls filter coefficient updates.    
+* @param[in] blockSize number of samples to process.    
+* @param[in] postShift bit shift applied to coefficients.    
+* @return    none.    
+*    
+* \par Description:    
+* <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+* <pre>    
+*    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+* </pre>    
+* The initial filter coefficients serve as a starting point for the adaptive filter.    
+* <code>pState</code> points to the array of state variables and size of array is    
+* <code>numTaps+blockSize-1</code> samples, where <code>blockSize</code> is the number of    
+* input samples processed by each call to <code>arm_lms_q15()</code>.    
+*/
+
+void arm_lms_init_q15(
+  arm_lms_instance_q15 * S,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  q15_t mu,
+  uint32_t blockSize,
+  uint32_t postShift)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear state buffer and size is always blockSize + numTaps - 1 */
+  memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(q15_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+  /* Assign Step size value */
+  S->mu = mu;
+
+  /* Assign postShift value to be applied */
+  S->postShift = postShift;
+
+}
+
+/**    
+ * @} end of LMS group    
+ */
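
The `postShift` parameter exists because Q15 coefficients can only represent values in `[-1, +1)`. One way to use it, sketched below under the assumption that coefficients are stored pre-divided by `2^postShift`, is to pick the smallest shift that brings the largest expected coefficient into range and then quantize. Both helpers are hypothetical and are not part of CMSIS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest shift such that max_abs_coeff / 2^shift < 1 */
uint32_t pick_post_shift(float max_abs_coeff)
{
    assert(max_abs_coeff >= 0.0f);
    uint32_t shift = 0u;
    while (max_abs_coeff >= 1.0f) {
        max_abs_coeff *= 0.5f;
        shift++;
    }
    return shift;
}

/* Hypothetical helper: quantize coeff / 2^post_shift to Q15,
 * rounding half away from zero and saturating at the Q15 limits. */
int16_t coeff_to_q15(float coeff, uint32_t post_shift)
{
    float scaled = coeff;
    for (uint32_t i = 0u; i < post_shift; i++) {
        scaled *= 0.5f;
    }

    float v = scaled * 32768.0f;
    v += (v >= 0.0f) ? 0.5f : -0.5f;

    if (v > 32767.0f) {
        return INT16_MAX;
    }
    if (v < -32768.0f) {
        return INT16_MIN;
    }
    return (int16_t)v;
}
```

Under this assumption, a system whose largest coefficient is about 1.5 would use a shift of 1 and store 0.75 in Q15; the shift applied after the accumulator then restores the intended gain.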

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,105 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_lms_init_q31.c    
+*    
+* Description:  Q31 LMS filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup LMS    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for Q31 LMS filter.    
+   * @param[in] *S points to an instance of the Q31 LMS filter structure.    
+   * @param[in] numTaps  number of filter coefficients.    
+   * @param[in] *pCoeffs points to coefficient buffer.    
+   * @param[in] *pState points to state buffer.    
+   * @param[in] mu step size that controls filter coefficient updates.    
+   * @param[in] blockSize number of samples to process.    
+   * @param[in] postShift bit shift applied to coefficients.    
+   * @return none.    
+ *    
+ * \par Description:    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * The initial filter coefficients serve as a starting point for the adaptive filter.    
+ * <code>pState</code> points to an array of length <code>numTaps+blockSize-1</code> samples,    
+ * where <code>blockSize</code> is the number of input samples processed by each call to    
+ * <code>arm_lms_q31()</code>.    
+ */
+
+void arm_lms_init_q31(
+  arm_lms_instance_q31 * S,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  q31_t mu,
+  uint32_t blockSize,
+  uint32_t postShift)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear state buffer and size is always blockSize + numTaps - 1 */
+  memset(pState, 0, ((uint32_t) numTaps + (blockSize - 1u)) * sizeof(q31_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+  /* Assign Step size value */
+  S->mu = mu;
+
+  /* Assign postShift value to be applied */
+  S->postShift = postShift;
+
+}
+
+/**    
+ * @} end of LMS group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,466 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_lms_norm_f32.c    
+*    
+* Description:	Processing function for the floating-point Normalised LMS.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @defgroup LMS_NORM Normalized LMS Filters    
+ *    
+ * This set of functions implements a commonly used adaptive filter.    
+ * It is related to the Least Mean Square (LMS) adaptive filter and includes an additional normalization    
+ * factor which increases the adaptation rate of the filter.    
+ * The CMSIS DSP Library contains normalized LMS filter functions that operate on Q15, Q31, and floating-point data types.    
+ *    
+ * A normalized least mean square (NLMS) filter consists of two components as shown below.    
+ * The first component is a standard transversal or FIR filter.    
+ * The second component is a coefficient update mechanism.    
+ * The NLMS filter has two input signals.    
+ * The "input" feeds the FIR filter while the "reference input" corresponds to the desired output of the FIR filter.    
+ * That is, the FIR filter coefficients are updated so that the output of the FIR filter matches the reference input.    
+ * The filter coefficient update mechanism is based on the difference between the FIR filter output and the reference input.    
+ * This "error signal" tends towards zero as the filter adapts.    
+ * The NLMS processing functions accept the input and reference input signals and generate the filter output and error signal.    
+ * \image html LMS.gif "Internal structure of the NLMS adaptive filter"    
+ *    
+ * The functions operate on blocks of data and each call to the function processes    
+ * <code>blockSize</code> samples through the filter.    
+ * <code>pSrc</code> points to input signal, <code>pRef</code> points to reference signal,    
+ * <code>pOut</code> points to output signal and <code>pErr</code> points to error signal.    
+ * All arrays contain <code>blockSize</code> values.    
+ *    
+ * The functions operate on a block-by-block basis.    
+ * Internally, the filter coefficients <code>b[n]</code> are updated on a sample-by-sample basis.    
+ * The convergence of the LMS filter is slower compared to the normalized LMS algorithm.    
+ *    
+ * \par Algorithm:    
+ * The output signal <code>y[n]</code> is computed by a standard FIR filter:    
+ * <pre>    
+ *     y[n] = b[0] * x[n] + b[1] * x[n-1] + b[2] * x[n-2] + ...+ b[numTaps-1] * x[n-numTaps+1]    
+ * </pre>    
+ *    
+ * \par    
+ * The error signal equals the difference between the reference signal <code>d[n]</code> and the filter output:    
+ * <pre>    
+ *     e[n] = d[n] - y[n].    
+ * </pre>    
+ *    
+ * \par    
+ * After each sample of the error signal is computed, the instantaneous energy of the filter state variables is calculated:
+ * <pre>    
+ *    E = x[n]^2 + x[n-1]^2 + ... + x[n-numTaps+1]^2.    
+ * </pre>    
+ * The filter coefficients <code>b[k]</code> are then updated on a sample-by-sample basis:    
+ * <pre>    
+ *     b[k] = b[k] + e[n] * (mu/E) * x[n-k],  for k=0, 1, ..., numTaps-1    
+ * </pre>    
+ * where <code>mu</code> is the step size and controls the rate of coefficient convergence.    
+ *\par    
+ * In the APIs, <code>pCoeffs</code> points to a coefficient array of size <code>numTaps</code>.    
+ * Coefficients are stored in time reversed order.    
+ * \par    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * \par    
+ * <code>pState</code> points to a state array of size <code>numTaps + blockSize - 1</code>.    
+ * Samples in the state buffer are stored in the order:    
+ * \par    
+ * <pre>    
+ *    {x[n-numTaps+1], x[n-numTaps+2], ..., x[n-1], x[n], x[n+1], ..., x[n+blockSize-1]}
+ * </pre>    
+ * \par    
+ * Note that the length of the state buffer exceeds the length of the coefficient array by <code>blockSize-1</code> samples.    
+ * The increased state buffer length allows circular addressing, which is traditionally used in FIR filters,    
+ * to be avoided and yields a significant speed improvement.    
+ * The state variables are updated after each block of data is processed.    
+ * \par Instance Structure    
+ * The coefficients and state variables for a filter are stored together in an instance data structure.    
+ * A separate instance structure must be defined for each filter and    
+ * coefficient and state arrays cannot be shared among instances.    
+ * There are separate instance structure declarations for each of the 3 supported data types.    
+ *    
+ * \par Initialization Functions    
+ * There is also an associated initialization function for each data type.    
+ * The initialization function performs the following operations:    
+ * - Sets the values of the internal structure fields.    
+ * - Zeros out the values in the state buffer.    
+ * To do this manually without calling the init function, assign the following fields of the instance structure:
+ * numTaps, pCoeffs, mu, energy, x0, pState. Also set all of the values in pState to zero.
+ * For Q15 and Q31, the following fields must also be initialized:
+ * recipTable, postShift
+ *
+ * \par    
+ * Instance structure cannot be placed into a const data section and it is recommended to use the initialization function.    
+ * \par Fixed-Point Behavior:    
+ * Care must be taken when using the Q15 and Q31 versions of the normalised LMS filter.    
+ * The following issues must be considered:    
+ * - Scaling of coefficients    
+ * - Overflow and saturation    
+ *    
+ * \par Scaling of Coefficients:    
+ * Filter coefficients are represented as fractional values and    
+ * coefficients are restricted to lie in the range <code>[-1 +1)</code>.    
+ * The fixed-point functions have an additional scaling parameter <code>postShift</code>.    
+ * At the output of the filter's accumulator is a shift register which shifts the result by <code>postShift</code> bits.    
+ * This essentially scales the filter coefficients by <code>2^postShift</code> and    
+ * allows the filter coefficients to exceed the range <code>[-1 +1)</code>.
+ * The value of <code>postShift</code> is set by the user based on the expected gain through the system being modeled.    
+ *    
+ * \par Overflow and Saturation:    
+ * Overflow and saturation behavior of the fixed-point Q15 and Q31 versions are    
+ * described separately as part of the function specific documentation below.    
+ */
+
+
+/**    
+ * @addtogroup LMS_NORM    
+ * @{    
+ */
+
+
+  /**    
+   * @brief Processing function for floating-point normalized LMS filter.    
+   * @param[in] *S points to an instance of the floating-point normalized LMS filter structure.    
+   * @param[in] *pSrc points to the block of input data.    
+   * @param[in] *pRef points to the block of reference data.    
+   * @param[out] *pOut points to the block of output data.    
+   * @param[out] *pErr points to the block of error data.    
+   * @param[in] blockSize number of samples to process.    
+   * @return none.    
+   */
+
+void arm_lms_norm_f32(
+  arm_lms_norm_instance_f32 * S,
+  float32_t * pSrc,
+  float32_t * pRef,
+  float32_t * pOut,
+  float32_t * pErr,
+  uint32_t blockSize)
+{
+  float32_t *pState = S->pState;                 /* State pointer */
+  float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+  float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+  float32_t *px, *pb;                            /* Temporary pointers for state and coefficient buffers */
+  float32_t mu = S->mu;                          /* Adaptive factor */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+  float32_t energy;                              /* Energy of the input */
+  float32_t sum, e, d;                           /* accumulator, error, reference data sample */
+  float32_t w, x0, in;                           /* weight factor, temporary variable to hold input sample and state */
+
+  /* Initializations of error,  difference, Coefficient update */
+  e = 0.0f;
+  d = 0.0f;
+  w = 0.0f;
+
+  energy = S->energy;
+  x0 = S->x0;
+
+  /* S->pState points to buffer which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Read the sample from input buffer */
+    in = *pSrc++;
+
+    /* Update the energy calculation */
+    energy -= x0 * x0;
+    energy += in * in;
+
+    /* Set the accumulator to zero */
+    sum = 0.0f;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += (*px++) * (*pb++);
+      sum += (*px++) * (*pb++);
+      sum += (*px++) * (*pb++);
+      sum += (*px++) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += (*px++) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* The result in the accumulator is stored in the destination buffer. */
+    *pOut++ = sum;
+
+    /* Compute and store error */
+    d = (float32_t) (*pRef++);
+    e = d - sum;
+    *pErr++ = e;
+
+    /* Calculation of Weighting factor for updating filter coefficients */
+    /* epsilon value 0.000000119209289f */
+    w = (e * mu) / (energy + 0.000000119209289f);
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Update filter coefficients */
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      *pb += w * (*px++);
+      pb++;
+
+      *pb += w * (*px++);
+      pb++;
+
+      *pb += w * (*px++);
+      pb++;
+
+      *pb += w * (*px++);
+      pb++;
+
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      *pb += w * (*px++);
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    x0 = *pState;
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  S->energy = energy;
+  S->x0 = x0;
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the    
+     start of the state buffer. This prepares the state buffer for the
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* Loop unrolling for (numTaps - 1u)/4 samples copy */
+  tapCnt = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* Copy the remaining float32_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Read the sample from input buffer */
+    in = *pSrc++;
+
+    /* Update the energy calculation */
+    energy -= x0 * x0;
+    energy += in * in;
+
+    /* Set the accumulator to zero */
+    sum = 0.0f;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      sum += (*px++) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* The result in the accumulator is stored in the destination buffer. */
+    *pOut++ = sum;
+
+    /* Compute and store error */
+    d = (float32_t) (*pRef++);
+    e = d - sum;
+    *pErr++ = e;
+
+    /* Calculation of Weighting factor for updating filter coefficients */
+    /* epsilon value 0.000000119209289f */
+    w = (e * mu) / (energy + 0.000000119209289f);
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      *pb += w * (*px++);
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    x0 = *pState;
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  S->energy = energy;
+  S->x0 = x0;
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the        
+     start of the state buffer. This prepares the state buffer for the
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* Copy (numTaps - 1u) samples  */
+  tapCnt = (numTaps - 1u);
+
+  /* Copy the data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+   * @} end of LMS_NORM group    
+   */
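
The normalized variant differs from plain LMS in two places: the energy of the samples in the filter window is maintained as a running sum (subtract the sample that leaves, add the one that enters), and the step is divided by that energy plus a small epsilon. The sketch below illustrates both steps on a shifting delay line. `nlms_t`, `nlms_step` and `nlms_demo_identify` are hypothetical names, coefficients are kept in natural order, and only the epsilon constant is taken from the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Same epsilon constant as in the listing above */
#define NLMS_EPSILON 0.000000119209289f

/* Hypothetical stand-in for the filter instance (illustrative fields only) */
typedef struct {
    size_t num_taps;
    float mu;       /* step size */
    float energy;   /* running sum of x[k]^2 over the delay line */
    float *x;       /* delay line, x[0] is the newest sample */
    float *b;       /* coefficients in natural order */
} nlms_t;

/* One NLMS iteration; returns the error e = ref - y */
float nlms_step(nlms_t *f, float in, float ref)
{
    assert(f->num_taps > 0);
    size_t n = f->num_taps;

    /* Sliding energy update: drop the oldest sample, add the newest */
    float oldest = f->x[n - 1];
    f->energy += in * in - oldest * oldest;

    /* Shift the delay line and insert the newest sample */
    for (size_t k = n - 1; k > 0; k--) {
        f->x[k] = f->x[k - 1];
    }
    f->x[0] = in;

    /* FIR output */
    float y = 0.0f;
    for (size_t k = 0; k < n; k++) {
        y += f->b[k] * f->x[k];
    }

    /* Normalized weighting factor: w = e * mu / (E + epsilon) */
    float e = ref - y;
    float w = (e * f->mu) / (f->energy + NLMS_EPSILON);

    for (size_t k = 0; k < n; k++) {
        f->b[k] += w * f->x[k];
    }
    return e;
}

/* Hypothetical demo: adapt towards the assumed system h = {0.5, -0.25} */
void nlms_demo_identify(float *b_out)
{
    const float h[2] = {0.5f, -0.25f};
    float x[2] = {0.0f, 0.0f};
    float b[2] = {0.0f, 0.0f};
    nlms_t f = {2, 0.5f, 0.0f, x, b};
    float prev = 0.0f;
    unsigned seed = 1u;

    for (int i = 0; i < 2000; i++) {
        /* Small linear congruential generator, input in [-1, 1) */
        seed = seed * 1103515245u + 12345u;
        float in = ((float)((seed >> 16) & 0xFFu) - 128.0f) / 128.0f;
        (void)nlms_step(&f, in, h[0] * in + h[1] * prev);
        prev = in;
    }
    b_out[0] = b[0];
    b_out[1] = b[1];
}
```

Because the step is scaled by the input energy, the same `mu` behaves similarly for loud and quiet inputs, which is the practical reason to prefer this variant when the input level is not known in advance.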

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,105 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_lms_norm_init_f32.c    
+*    
+* Description:  Floating-point NLMS filter initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup LMS_NORM    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for floating-point normalized LMS filter.    
+   * @param[in] *S points to an instance of the floating-point normalized LMS filter structure.
+   * @param[in] numTaps  number of filter coefficients.    
+   * @param[in] *pCoeffs points to coefficient buffer.    
+   * @param[in] *pState points to state buffer.    
+   * @param[in] mu step size that controls filter coefficient updates.    
+   * @param[in] blockSize number of samples to process.    
+   * @return none.    
+   *    
+ * \par Description:    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * The initial filter coefficients serve as a starting point for the adaptive filter.    
+ * <code>pState</code> points to an array of length <code>numTaps+blockSize-1</code> samples,    
+ * where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_lms_norm_f32()</code>.    
+ */
+
+void arm_lms_norm_init_f32(
+  arm_lms_norm_instance_f32 * S,
+  uint16_t numTaps,
+  float32_t * pCoeffs,
+  float32_t * pState,
+  float32_t mu,
+  uint32_t blockSize)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always blockSize + numTaps - 1 */
+  memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(float32_t));
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+  /* Assign Step size value */
+  S->mu = mu;
+
+  /* Initialise Energy to zero */
+  S->energy = 0.0f;
+
+  /* Initialise x0 to zero */
+  S->x0 = 0.0f;
+
+}
+
+/**    
+ * @} end of LMS_NORM group    
+ */
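The buffer length stated in the description above can be computed by the caller before allocating `pState`. The following is a minimal sketch under stated assumptions: `nlms_state_len` is a hypothetical helper, not a CMSIS-DSP function; only the `numTaps + blockSize - 1` length comes from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of CMSIS-DSP): number of samples the
   pState buffer must hold for a given filter length and block size. */
size_t nlms_state_len(uint16_t numTaps, uint32_t blockSize)
{
  /* The init function evaluates numTaps + (blockSize - 1u) in unsigned
     arithmetic, so both arguments should be at least 1 to avoid
     wrap-around. */
  assert(numTaps >= 1u && blockSize >= 1u);
  return (size_t) numTaps + (size_t) blockSize - 1u;
}
```

For a 32-tap filter processing 64-sample blocks this gives 95 samples, so a caller would declare something like `static float32_t state[95];` and pass it as `pState` to `arm_lms_norm_init_f32()`.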

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,112 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_lms_norm_init_q15.c    
+*    
+* Description:  Q15 NLMS initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**    
+ * @addtogroup LMS_NORM    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for Q15 normalized LMS filter.    
+   * @param[in] *S points to an instance of the Q15 normalized LMS filter structure.    
+   * @param[in] numTaps  number of filter coefficients.    
+   * @param[in] *pCoeffs points to coefficient buffer.    
+   * @param[in] *pState points to state buffer.    
+   * @param[in] mu step size that controls filter coefficient updates.    
+   * @param[in] blockSize number of samples to process.    
+   * @param[in] postShift bit shift applied to coefficients.    
+   * @return none.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * The initial filter coefficients serve as a starting point for the adaptive filter.    
+ * <code>pState</code> points to the array of state variables and size of array is    
+ * <code>numTaps+blockSize-1</code> samples, where <code>blockSize</code> is the number of input samples processed    
+ * by each call to <code>arm_lms_norm_q15()</code>.    
+ */
+
+void arm_lms_norm_init_q15(
+  arm_lms_norm_instance_q15 * S,
+  uint16_t numTaps,
+  q15_t * pCoeffs,
+  q15_t * pState,
+  q15_t mu,
+  uint32_t blockSize,
+  uint8_t postShift)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always blockSize + numTaps - 1 */
+  memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(q15_t));
+
+  /* Assign post Shift value applied to coefficients */
+  S->postShift = postShift;
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+  /* Assign Step size value */
+  S->mu = mu;
+
+  /* Initialize reciprocal pointer table */
+  S->recipTable = (q15_t *) armRecipTableQ15;
+
+  /* Initialise Energy to zero */
+  S->energy = 0;
+
+  /* Initialise x0 to zero */
+  S->x0 = 0;
+
+}
+
+/**    
+ * @} end of LMS_NORM group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,111 @@
+/*-----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_lms_norm_init_q31.c    
+*    
+* Description:  Q31 NLMS initialization function.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------*/
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**    
+ * @addtogroup LMS_NORM    
+ * @{    
+ */
+
+  /**    
+   * @brief Initialization function for Q31 normalized LMS filter.    
+   * @param[in] *S points to an instance of the Q31 normalized LMS filter structure.    
+   * @param[in] numTaps  number of filter coefficients.    
+   * @param[in] *pCoeffs points to coefficient buffer.    
+   * @param[in] *pState points to state buffer.    
+   * @param[in] mu step size that controls filter coefficient updates.    
+   * @param[in] blockSize number of samples to process.    
+   * @param[in] postShift bit shift applied to coefficients.    
+   * @return none.    
+ *    
+ * <b>Description:</b>    
+ * \par    
+ * <code>pCoeffs</code> points to the array of filter coefficients stored in time reversed order:    
+ * <pre>    
+ *    {b[numTaps-1], b[numTaps-2], b[numTaps-3], ..., b[1], b[0]}
+ * </pre>    
+ * The initial filter coefficients serve as a starting point for the adaptive filter.    
+ * <code>pState</code> points to an array of length <code>numTaps+blockSize-1</code> samples,    
+ * where <code>blockSize</code> is the number of input samples processed by each call to <code>arm_lms_norm_q31()</code>.    
+ */
+
+void arm_lms_norm_init_q31(
+  arm_lms_norm_instance_q31 * S,
+  uint16_t numTaps,
+  q31_t * pCoeffs,
+  q31_t * pState,
+  q31_t mu,
+  uint32_t blockSize,
+  uint8_t postShift)
+{
+  /* Assign filter taps */
+  S->numTaps = numTaps;
+
+  /* Assign coefficient pointer */
+  S->pCoeffs = pCoeffs;
+
+  /* Clear the state buffer; its size is always blockSize + numTaps - 1 */
+  memset(pState, 0, (numTaps + (blockSize - 1u)) * sizeof(q31_t));
+
+  /* Assign post Shift value applied to coefficients */
+  S->postShift = postShift;
+
+  /* Assign state pointer */
+  S->pState = pState;
+
+  /* Assign Step size value */
+  S->mu = mu;
+
+  /* Initialize reciprocal pointer table */
+  S->recipTable = (q31_t *) armRecipTableQ31;
+
+  /* Initialise Energy to zero */
+  S->energy = 0;
+
+  /* Initialise x0 to zero */
+  S->x0 = 0;
+
+}
+
+/**    
+ * @} end of LMS_NORM group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,440 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_lms_norm_q15.c    
+*    
+* Description:	Q15 NLMS filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup LMS_NORM    
+ * @{    
+ */
+
+/**    
+* @brief Processing function for Q15 normalized LMS filter.    
+* @param[in] *S points to an instance of the Q15 normalized LMS filter structure.    
+* @param[in] *pSrc points to the block of input data.    
+* @param[in] *pRef points to the block of reference data.    
+* @param[out] *pOut points to the block of output data.    
+* @param[out] *pErr points to the block of error data.    
+* @param[in] blockSize number of samples to process.    
+* @return none.    
+*    
+* <b>Scaling and Overflow Behavior:</b>     
+* \par     
+* The function is implemented using a 64-bit internal accumulator.     
+* Both coefficients and state variables are represented in 1.15 format and    
+* multiplications yield a 2.30 result. The 2.30 intermediate results are    
+* accumulated in a 64-bit accumulator in 34.30 format.     
+* There is no risk of internal overflow with this approach and the full    
+* precision of intermediate multiplications is preserved. After all additions    
+* have been performed, the accumulator is truncated to 34.15 format by    
+* discarding low 15 bits. Lastly, the accumulator is saturated to yield a    
+* result in 1.15 format.    
+*    
+* \par   
+* 	In this filter, the coefficients are updated for each sample, and the coefficient updates are saturated.
+*    
+ */
+
+void arm_lms_norm_q15(
+  arm_lms_norm_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pRef,
+  q15_t * pOut,
+  q15_t * pErr,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q15_t *px, *pb;                                /* Temporary pointers for state and coefficient buffers */
+  q15_t mu = S->mu;                              /* Adaptive factor */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+  q31_t energy;                                  /* Energy of the input */
+  q63_t acc;                                     /* Accumulator */
+  q15_t e = 0, d = 0;                            /* error, reference data sample */
+  q15_t w = 0, in;                               /* weight factor and state */
+  q15_t x0;                                      /* temporary variable to hold input sample */
+  //uint32_t shift = (uint32_t) S->postShift + 1u; /* Shift to be applied to the output */ 
+  q15_t errorXmu, oneByEnergy;                   /* Temporary variables to store error and mu product and reciprocal of energy */
+  q15_t postShift;                               /* Post shift to be applied to weight after reciprocal calculation */
+  q31_t coef;                                    /* Temporary variable for coefficient */
+  q31_t acc_l, acc_h;
+  int32_t lShift = (15 - (int32_t) S->postShift);       /*  Post shift  */
+  int32_t uShift = (32 - lShift);
+
+  energy = S->energy;
+  x0 = S->x0;
+
+  /* S->pState points to buffer which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Read the sample from input buffer */
+    in = *pSrc++;
+
+    /* Update the energy calculation */
+    energy -= (((q31_t) x0 * (x0)) >> 15);
+    energy += (((q31_t) in * (in)) >> 15);
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    while(tapCnt > 0u)
+    {
+
+      /* Perform the multiply-accumulate */
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+      acc = __SMLALD(*__SIMD32(px)++, (*__SIMD32(pb)++), acc);
+      acc = __SMLALD(*__SIMD32(px)++, (*__SIMD32(pb)++), acc);
+
+#else
+
+      acc += (((q31_t) * px++ * (*pb++)));
+      acc += (((q31_t) * px++ * (*pb++)));
+      acc += (((q31_t) * px++ * (*pb++)));
+      acc += (((q31_t) * px++ * (*pb++)));
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += (((q31_t) * px++ * (*pb++)));
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Calc lower part of acc */
+    acc_l = acc & 0xffffffff;
+
+    /* Calc upper part of acc */
+    acc_h = (acc >> 32) & 0xffffffff;
+
+    /* Apply shift for lower part of acc and upper part of acc */
+    acc = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+    /* Converting the result to 1.15 format and saturate the output */
+    acc = __SSAT(acc, 16u);
+
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = (q15_t) acc;
+
+    /* Compute and store error */
+    d = *pRef++;
+    e = d - (q15_t) acc;
+    *pErr++ = e;
+
+    /* Calculation of 1/energy */
+    postShift = arm_recip_q15((q15_t) energy + DELTA_Q15,
+                              &oneByEnergy, S->recipTable);
+
+    /* Calculation of e * mu value */
+    errorXmu = (q15_t) (((q31_t) e * mu) >> 15);
+
+    /* Calculation of (e * mu) * (1/energy) value */
+    acc = (((q31_t) errorXmu * oneByEnergy) >> (15 - postShift));
+
+    /* Weighting factor for the normalized version */
+    w = (q15_t) __SSAT((q31_t) acc, 16);
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Update filter coefficients */
+    while(tapCnt > 0u)
+    {
+      coef = *pb + (((q31_t) w * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+      coef = *pb + (((q31_t) w * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+      coef = *pb + (((q31_t) w * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+      coef = *pb + (((q31_t) w * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Update the coefficient, saturating the result to 16 bits */
+      coef = *pb + (((q31_t) w * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Read the sample from state buffer */
+    x0 = *pState;
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Save energy and x0 values for the next frame */
+  S->energy = (q15_t) energy;
+  S->x0 = x0;
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the    
+     start of the state buffer. This prepares the state buffer for the
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* Calculation of count for copying integer writes */
+  tapCnt = (numTaps - 1u) >> 2;
+
+  while(tapCnt > 0u)
+  {
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+
+#else
+
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+#endif
+
+    tapCnt--;
+
+  }
+
+  /* Calculation of count for remaining q15_t data */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Read the sample from input buffer */
+    in = *pSrc++;
+
+    /* Update the energy calculation */
+    energy -= (((q31_t) x0 * (x0)) >> 15);
+    energy += (((q31_t) in * (in)) >> 15);
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += (((q31_t) * px++ * (*pb++)));
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Calc lower part of acc */
+    acc_l = acc & 0xffffffff;
+
+    /* Calc upper part of acc */
+    acc_h = (acc >> 32) & 0xffffffff;
+
+    /* Apply shift for lower part of acc and upper part of acc */
+    acc = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+    /* Converting the result to 1.15 format and saturate the output */
+    acc = __SSAT(acc, 16u);
+
+    /* Converting the result to 1.15 format */
+    //acc = __SSAT((acc >> (16u - shift)), 16u); 
+
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = (q15_t) acc;
+
+    /* Compute and store error */
+    d = *pRef++;
+    e = d - (q15_t) acc;
+    *pErr++ = e;
+
+    /* Calculation of 1/energy */
+    postShift = arm_recip_q15((q15_t) energy + DELTA_Q15,
+                              &oneByEnergy, S->recipTable);
+
+    /* Calculation of e * mu value */
+    errorXmu = (q15_t) (((q31_t) e * mu) >> 15);
+
+    /* Calculation of (e * mu) * (1/energy) value */
+    acc = (((q31_t) errorXmu * oneByEnergy) >> (15 - postShift));
+
+    /* Weighting factor for the normalized version */
+    w = (q15_t) __SSAT((q31_t) acc, 16);
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Update the coefficient, saturating the result to 16 bits */
+      coef = *pb + (((q31_t) w * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Read the sample from state buffer */
+    x0 = *pState;
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Save energy and x0 values for the next frame */
+  S->energy = (q15_t) energy;
+  S->x0 = x0;
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the        
+     start of the state buffer. This prepares the state buffer for the
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* copy (numTaps - 1u) data */
+  tapCnt = (numTaps - 1u);
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+
+/**    
+   * @} end of LMS_NORM group    
+   */
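Two fixed-point steps from the Q15 processing function above can be sketched in portable C: the sliding-window energy update and the conversion of the 34.30 accumulator to a saturated 1.15 result. This is an illustration under stated assumptions, not the library's implementation. The helper names are hypothetical, `sat_q15` stands in for the `__SSAT(x, 16)` intrinsic, and the conversion follows the documented behaviour (shift, then saturate) on the full 64-bit value rather than reproducing the split `acc_l`/`acc_h` arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __SSAT(x, 16): clamp to the int16_t range. */
int16_t sat_q15(int64_t v)
{
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t) v;
}

/* Sliding-window energy: remove the square of the sample leaving the
   window (x0) and add the square of the sample entering it (in). */
int32_t energy_update_q15(int32_t energy, int16_t x0, int16_t in)
{
  energy -= ((int32_t) x0 * x0) >> 15;
  energy += ((int32_t) in * in) >> 15;
  return energy;
}

/* Convert a 34.30 accumulator to 1.15, applying postShift as gain.
   Right-shifting a negative value is implementation-defined in C; this
   sketch assumes an arithmetic shift, as common compilers provide. */
int16_t acc_to_q15(int64_t acc, uint32_t postShift)
{
  assert(postShift <= 15u);
  return sat_q15(acc >> (15u - postShift));
}
```

As a check on the formats: 0.5 in Q15 is 16384, its square contributes 8192 (0.25) to the energy, and an accumulator holding 0.25 in 2.30 format (`1 << 28`) converts to 8192 with `postShift = 0`.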

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_norm_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,431 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_lms_norm_q31.c    
+*    
+* Description:	Processing function for the Q31 NLMS filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup LMS_NORM    
+ * @{    
+ */
+
+/**    
+* @brief Processing function for Q31 normalized LMS filter.    
+* @param[in] *S points to an instance of the Q31 normalized LMS filter structure.    
+* @param[in] *pSrc points to the block of input data.    
+* @param[in] *pRef points to the block of reference data.    
+* @param[out] *pOut points to the block of output data.    
+* @param[out] *pErr points to the block of error data.    
+* @param[in] blockSize number of samples to process.    
+* @return none.    
+*    
+* <b>Scaling and Overflow Behavior:</b>     
+* \par     
+* The function is implemented using an internal 64-bit accumulator.     
+* The accumulator has a 2.62 format and maintains full precision of the intermediate   
+* multiplication results but provides only a single guard bit.     
+* Thus, if the accumulator result overflows, it wraps around rather than clipping.
+* In order to avoid overflows completely the input signal must be scaled down by    
+* log2(numTaps) bits. The reference signal should not be scaled down.     
+* After all multiply-accumulates are performed, the 2.62 accumulator is shifted    
+* and saturated to 1.31 format to yield the final result.     
+* The output signal and error signal are in 1.31 format.     
+*    
+* \par    
+* 	In this filter, the coefficients are updated for each sample, and the
+* coefficient updates are saturated.
+*     
+*/
+
+void arm_lms_norm_q31(
+  arm_lms_norm_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pRef,
+  q31_t * pOut,
+  q31_t * pErr,
+  uint32_t blockSize)
+{
+  q31_t *pState = S->pState;                     /* State pointer */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q31_t *px, *pb;                                /* Temporary pointers for state and coefficient buffers */
+  q31_t mu = S->mu;                              /* Adaptive factor */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+  q63_t energy;                                  /* Energy of the input */
+  q63_t acc;                                     /* Accumulator */
+  q31_t e = 0, d = 0;                            /* error, reference data sample */
+  q31_t w = 0, in;                               /* weight factor and state */
+  q31_t x0;                                      /* temporary variable to hold input sample */
+//  uint32_t shift = 32u - ((uint32_t) S->postShift + 1u);        /* Shift to be applied to the output */      
+  q31_t errorXmu, oneByEnergy;                   /* Temporary variables to store error and mu product and reciprocal of energy */
+  q31_t postShift;                               /* Post shift to be applied to weight after reciprocal calculation */
+  q31_t coef;                                    /* Temporary variable for coef */
+  q31_t acc_l, acc_h;                            /*  temporary input */
+  uint32_t uShift = ((uint32_t) S->postShift + 1u);
+  uint32_t lShift = 32u - uShift;                /*  Shift to be applied to the output */
+
+  energy = S->energy;
+  x0 = S->x0;
+
+  /* S->pState points to buffer which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  while(blkCnt > 0u)
+  {
+
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Read the sample from input buffer */
+    in = *pSrc++;
+
+    /* Update the energy calculation */
+    energy = (q31_t) ((((q63_t) energy << 32) -
+                       (((q63_t) x0 * x0) << 1)) >> 32);
+    energy = (q31_t) (((((q63_t) in * in) << 1) + (energy << 32)) >> 32);
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += ((q63_t) (*px++)) * (*pb++);
+      acc += ((q63_t) (*px++)) * (*pb++);
+      acc += ((q63_t) (*px++)) * (*pb++);
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Converting the result to 1.31 format */
+    /* Calc lower part of acc */
+    acc_l = acc & 0xffffffff;
+
+    /* Calc upper part of acc */
+    acc_h = (acc >> 32) & 0xffffffff;
+
+    acc = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = (q31_t) acc;
+
+    /* Compute and store error */
+    d = *pRef++;
+    e = d - (q31_t) acc;
+    *pErr++ = e;
+
+    /* Calculates the reciprocal of energy */
+    postShift = arm_recip_q31(energy + DELTA_Q31,
+                              &oneByEnergy, &S->recipTable[0]);
+
+    /* Calculation of product of (e * mu) */
+    errorXmu = (q31_t) (((q63_t) e * mu) >> 31);
+
+    /* Weighting factor for the normalized version */
+    w = clip_q63_to_q31(((q63_t) errorXmu * oneByEnergy) >> (31 - postShift));
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Update filter coefficients */
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+
+      /* coef is in 2.30 format */
+      coef = (q31_t) (((q63_t) w * (*px++)) >> (32));
+      /* get coef in 1.31 format by left shifting */
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      /* update coefficient buffer to next coefficient */
+      pb++;
+
+      coef = (q31_t) (((q63_t) w * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      coef = (q31_t) (((q63_t) w * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      coef = (q31_t) (((q63_t) w * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      coef = (q31_t) (((q63_t) w * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Read the sample from state buffer */
+    x0 = *pState;
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Save energy and x0 values for the next frame */
+  S->energy = (q31_t) energy;
+  S->x0 = x0;
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the
+     start of the state buffer. This prepares the state buffer for the
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* Loop unrolling for (numTaps - 1u) samples copy */
+  tapCnt = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* Copy the remaining q31_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(blkCnt > 0u)
+  {
+
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Read the sample from input buffer */
+    in = *pSrc++;
+
+    /* Update the energy calculation */
+    energy =
+      (q31_t) ((((q63_t) energy << 32) - (((q63_t) x0 * x0) << 1)) >> 32);
+    energy = (q31_t) (((((q63_t) in * in) << 1) + (energy << 32)) >> 32);
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Converting the result to 1.31 format */
+    /* Calc lower part of acc */
+    acc_l = acc & 0xffffffff;
+
+    /* Calc upper part of acc */
+    acc_h = (acc >> 32) & 0xffffffff;
+
+    acc = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+
+    //acc = (q31_t) (acc >> shift); 
+
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = (q31_t) acc;
+
+    /* Compute and store error */
+    d = *pRef++;
+    e = d - (q31_t) acc;
+    *pErr++ = e;
+
+    /* Calculates the reciprocal of energy */
+    postShift =
+      arm_recip_q31(energy + DELTA_Q31, &oneByEnergy, &S->recipTable[0]);
+
+    /* Calculation of product of (e * mu) */
+    errorXmu = (q31_t) (((q63_t) e * mu) >> 31);
+
+    /* Weighting factor for the normalized version */
+    w = clip_q63_to_q31(((q63_t) errorXmu * oneByEnergy) >> (31 - postShift));
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coeff pointer */
+    pb = (pCoeffs);
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      /* coef is in 2.30 format */
+      coef = (q31_t) (((q63_t) w * (*px++)) >> (32));
+      /* get coef in 1.31 format by left shifting */
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      /* update coefficient buffer to next coefficient */
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Read the sample from state buffer */
+    x0 = *pState;
+
+    /* Advance state pointer by 1 for the next sample */
+    pState = pState + 1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Save energy and x0 values for the next frame */
+  S->energy = (q31_t) energy;
+  S->x0 = x0;
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the     
+     start of the state buffer. This prepares the state buffer for the        
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* Loop for (numTaps - 1u) samples copy */
+  tapCnt = (numTaps - 1u);
+
+  /* Copy the q31_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of LMS_NORM group    
+ */
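
The processing functions in this changeset convert the 64-bit accumulator to 1.31 format by splitting it into `acc_l` and `acc_h` and recombining the halves with `lShift` and `uShift`. The sketch below is a minimal illustration of that idea, not the library code: `split_shift` and `direct_shift` are hypothetical helper names, the halves are held as unsigned values to avoid shifting negative numbers, and the shift count is assumed to lie in the range 1 to 31.

```c
#include <stdint.h>

/* Hypothetical helper (not part of CMSIS-DSP): take bits
 * [lshift, lshift + 31] of a 64-bit accumulator using only 32-bit
 * shifts on its two halves. Assumes 1 <= lshift <= 31. */
static int32_t split_shift(int64_t acc, uint32_t lshift)
{
    uint32_t lo = (uint32_t)((uint64_t)acc & 0xffffffffu); /* lower half */
    uint32_t hi = (uint32_t)((uint64_t)acc >> 32);         /* upper half */
    uint32_t ushift = 32u - lshift;

    /* Low half supplies the bottom bits, high half the top bits. */
    return (int32_t)((lo >> lshift) | (hi << ushift));
}

/* Reference: the same result from a single 64-bit shift, truncated
 * to 32 bits. Assumes 1 <= lshift <= 31. */
static int32_t direct_shift(int64_t acc, uint32_t lshift)
{
    return (int32_t)(uint32_t)((uint64_t)acc >> lshift);
}
```

With `postShift` equal to 0 the diff uses `uShift = 1` and `lShift = 31`, so a 2.62 product such as 0.5 * 0.5 comes back as 0.25 in 1.31 format. The final unsigned-to-signed conversion relies on the two's complement behavior that common compilers provide.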

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,380 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_lms_q15.c    
+*    
+* Description:	Processing function for the Q15 LMS filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup LMS    
+ * @{    
+ */
+
+ /**    
+ * @brief Processing function for Q15 LMS filter.    
+ * @param[in] *S points to an instance of the Q15 LMS filter structure.    
+ * @param[in] *pSrc points to the block of input data.    
+ * @param[in] *pRef points to the block of reference data.    
+ * @param[out] *pOut points to the block of output data.    
+ * @param[out] *pErr points to the block of error data.    
+ * @param[in] blockSize number of samples to process.    
+ * @return none.    
+ *    
+ * \par Scaling and Overflow Behavior:    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * Both coefficients and state variables are represented in 1.15 format and multiplications yield a 2.30 result.    
+ * The 2.30 intermediate results are accumulated in a 64-bit accumulator in 34.30 format.    
+ * There is no risk of internal overflow with this approach and the full precision of intermediate multiplications is preserved.    
+ * After all additions have been performed, the accumulator is truncated to 34.15 format by discarding low 15 bits.    
+ * Lastly, the accumulator is saturated to yield a result in 1.15 format.    
+ *   
+ * \par   
+ * 	In this filter, the filter coefficients are updated for each sample and the coefficient updates are saturated.
+ *    
+ */
+
+void arm_lms_q15(
+  const arm_lms_instance_q15 * S,
+  q15_t * pSrc,
+  q15_t * pRef,
+  q15_t * pOut,
+  q15_t * pErr,
+  uint32_t blockSize)
+{
+  q15_t *pState = S->pState;                     /* State pointer */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  q15_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q15_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q15_t mu = S->mu;                              /* Adaptive factor */
+  q15_t *px;                                     /* Temporary pointer for state */
+  q15_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+  q63_t acc;                                     /* Accumulator */
+  q15_t e = 0;                                   /* error of data sample */
+  q15_t alpha;                                   /* Intermediate constant for taps update */
+  q31_t coef;                                    /* Temporary variable for coefficient */
+  q31_t acc_l, acc_h;
+  int32_t lShift = (15 - (int32_t) S->postShift);       /*  Post shift  */
+  int32_t uShift = (32 - lShift);
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+
+  /* S->pState points to buffer which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Initializing blkCnt with blockSize */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2u;
+
+    while(tapCnt > 0u)
+    {
+      /* acc +=  b[N] * x[n-N] + b[N-1] * x[n-N-1] */
+      /* Perform the multiply-accumulate */
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+      acc = __SMLALD(*__SIMD32(px)++, (*__SIMD32(pb)++), acc);
+      acc = __SMLALD(*__SIMD32(px)++, (*__SIMD32(pb)++), acc);
+
+#else
+
+      acc += (q63_t) (((q31_t) (*px++) * (*pb++)));
+      acc += (q63_t) (((q31_t) (*px++) * (*pb++)));
+      acc += (q63_t) (((q31_t) (*px++) * (*pb++)));
+      acc += (q63_t) (((q31_t) (*px++) * (*pb++)));
+
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += (q63_t) (((q31_t) (*px++) * (*pb++)));
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Calc lower part of acc */
+    acc_l = acc & 0xffffffff;
+
+    /* Calc upper part of acc */
+    acc_h = (acc >> 32) & 0xffffffff;
+
+    /* Apply shift for lower part of acc and upper part of acc */
+    acc = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+    /* Converting the result to 1.15 format and saturate the output */
+    acc = __SSAT(acc, 16);
+
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = (q15_t) acc;
+
+    /* Compute and store error */
+    e = *pRef++ - (q15_t) acc;
+
+    *pErr++ = (q15_t) e;
+
+    /* Compute alpha i.e. intermediate constant for taps update */
+    alpha = (q15_t) (((q31_t) e * (mu)) >> 15);
+
+    /* Initialize state pointer */
+    /* Advance state pointer by 1 for the next sample */
+    px = pState++;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2u;
+
+    /* Update filter coefficients */
+    while(tapCnt > 0u)
+    {
+      coef = (q31_t) * pb + (((q31_t) alpha * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+      coef = (q31_t) * pb + (((q31_t) alpha * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+      coef = (q31_t) * pb + (((q31_t) alpha * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+      coef = (q31_t) * pb + (((q31_t) alpha * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      coef = (q31_t) * pb + (((q31_t) alpha * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Decrement the loop counter */
+    blkCnt--;
+
+  }
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the
+     start of the state buffer. This prepares the state buffer for the
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* Number of 4-sample groups to copy (loop unrolled by 4) */
+  tapCnt = (numTaps - 1u) >> 2;
+
+  while(tapCnt > 0u)
+  {
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+    *__SIMD32(pStateCurnt)++ = *__SIMD32(pState)++;
+#else
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+#endif
+
+    tapCnt--;
+
+  }
+
+  /* Calculation of count for remaining q15_t data */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* S->pState points to buffer which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += (q63_t) ((q31_t) (*px++) * (*pb++));
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Calc lower part of acc */
+    acc_l = acc & 0xffffffff;
+
+    /* Calc upper part of acc */
+    acc_h = (acc >> 32) & 0xffffffff;
+
+    /* Apply shift for lower part of acc and upper part of acc */
+    acc = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+    /* Converting the result to 1.15 format and saturate the output */
+    acc = __SSAT(acc, 16);
+
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = (q15_t) acc;
+
+    /* Compute and store error */
+    e = *pRef++ - (q15_t) acc;
+
+    *pErr++ = (q15_t) e;
+
+    /* Compute alpha i.e. intermediate constant for taps update */
+    alpha = (q15_t) (((q31_t) e * (mu)) >> 15);
+
+    /* Initialize pState pointer */
+    /* Advance state pointer by 1 for the next sample */
+    px = pState++;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      coef = (q31_t) * pb + (((q31_t) alpha * (*px++)) >> 15);
+      *pb++ = (q15_t) __SSAT((coef), 16);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Decrement the loop counter */
+    blkCnt--;
+
+  }
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the        
+     start of the state buffer. This prepares the state buffer for the   
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /*  Copy (numTaps - 1u) samples  */
+  tapCnt = (numTaps - 1u);
+
+  /* Copy the data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+   * @} end of LMS group    
+   */
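
The per-sample sequence in `arm_lms_q15` (filter output, error, `alpha = e * mu`, saturated coefficient update) can be sketched in plain C as follows. This is a simplified illustration under stated assumptions, not the library routine: `sat_q15` and `lms_q15_step` are hypothetical names, `postShift` is taken as 0, the caller passes the `n` current state samples directly instead of using the library's state buffer, and the error is saturated here whereas the diff simply narrows it to `q15_t`.

```c
#include <stdint.h>

/* Hypothetical helper: saturate a wide value to the Q15 range. */
static int16_t sat_q15(int64_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Hypothetical single LMS iteration in Q15 (postShift assumed 0).
 * b: n coefficients, updated in place; x: n state samples aligned with b;
 * d: reference sample; mu: step size; y_out: receives the filter output.
 * Returns the error e = d - y.
 * Right shifts of negative values are assumed to be arithmetic, as on
 * the compilers usually used for Cortex-M targets. */
static int16_t lms_q15_step(int16_t *b, const int16_t *x, uint32_t n,
                            int16_t d, int16_t mu, int16_t *y_out)
{
    int64_t acc = 0;

    /* 1.15 * 1.15 products accumulate in 34.30 format. */
    for (uint32_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * b[i];
    }

    /* Drop 15 fractional bits and saturate to 1.15. */
    int16_t y = sat_q15(acc >> 15);
    int16_t e = sat_q15((int64_t)d - y);

    /* alpha = e * mu, back in 1.15 format. */
    int16_t alpha = (int16_t)(((int32_t)e * mu) >> 15);

    /* b[i] += alpha * x[i], saturated to 1.15. */
    for (uint32_t i = 0; i < n; i++) {
        b[i] = sat_q15((int64_t)b[i] + (((int32_t)alpha * x[i]) >> 15));
    }

    *y_out = y;
    return e;
}
```

Calling the step repeatedly with the same input and reference shows the error shrinking as the coefficient aligned with the non-zero sample grows, which is the expected qualitative behavior of an LMS update.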

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/FilteringFunctions/arm_lms_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,369 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_lms_q31.c    
+*    
+* Description:	Processing function for the Q31 LMS filter.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+/**    
+ * @ingroup groupFilters    
+ */
+
+/**    
+ * @addtogroup LMS    
+ * @{    
+ */
+
+ /**    
+ * @brief Processing function for Q31 LMS filter.    
+ * @param[in]  *S points to an instance of the Q31 LMS filter structure.
+ * @param[in]  *pSrc points to the block of input data.    
+ * @param[in]  *pRef points to the block of reference data.    
+ * @param[out] *pOut points to the block of output data.    
+ * @param[out] *pErr points to the block of error data.    
+ * @param[in]  blockSize number of samples to process.    
+ * @return     none.    
+ *    
+ * \par Scaling and Overflow Behavior:     
+ * The function is implemented using an internal 64-bit accumulator.     
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate    
+ * multiplication results but provides only a single guard bit.     
+ * Thus, if the accumulator result overflows it wraps around rather than clips.     
+ * In order to avoid overflows completely the input signal must be scaled down by    
+ * log2(numTaps) bits.     
+ * The reference signal should not be scaled down.     
+ * After all multiply-accumulates are performed, the 2.62 accumulator is shifted    
+ * and saturated to 1.31 format to yield the final result.     
+ * The output signal and error signal are in 1.31 format.     
+ *    
+ * \par    
+ * 	In this filter, the filter coefficients are updated for each sample and the coefficient updates are saturated.
+ */
+
+void arm_lms_q31(
+  const arm_lms_instance_q31 * S,
+  q31_t * pSrc,
+  q31_t * pRef,
+  q31_t * pOut,
+  q31_t * pErr,
+  uint32_t blockSize)
+{
+  q31_t *pState = S->pState;                     /* State pointer */
+  uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+  q31_t *pCoeffs = S->pCoeffs;                   /* Coefficient pointer */
+  q31_t *pStateCurnt;                            /* Points to the current sample of the state */
+  q31_t mu = S->mu;                              /* Adaptive factor */
+  q31_t *px;                                     /* Temporary pointer for state */
+  q31_t *pb;                                     /* Temporary pointer for coefficient buffer */
+  uint32_t tapCnt, blkCnt;                       /* Loop counters */
+  q63_t acc;                                     /* Accumulator */
+  q31_t e = 0;                                   /* error of data sample */
+  q31_t alpha;                                   /* Intermediate constant for taps update */
+  q31_t coef;                                    /* Temporary variable for coef */
+  q31_t acc_l, acc_h;                            /* Lower and upper 32-bit halves of the accumulator */
+  uint32_t uShift = ((uint32_t) S->postShift + 1u);
+  uint32_t lShift = 32u - uShift;                /*  Shift to be applied to the output */
+
+  /* S->pState points to buffer which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1u)]);
+
+  /* Initializing blkCnt with blockSize */
+  blkCnt = blockSize;
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Initialize state pointer */
+    px = pState;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      /* acc +=  b[N] * x[n-N] */
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* acc +=  b[N-1] * x[n-N-1] */
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* acc +=  b[N-2] * x[n-N-2] */
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* acc +=  b[N-3] * x[n-N-3] */
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Converting the result to 1.31 format */
+    /* Calc lower part of acc */
+    acc_l = acc & 0xffffffff;
+
+    /* Calc upper part of acc */
+    acc_h = (acc >> 32) & 0xffffffff;
+
+    acc = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = (q31_t) acc;
+
+    /* Compute and store error */
+    e = *pRef++ - (q31_t) acc;
+
+    *pErr++ = (q31_t) e;
+
+    /* Compute alpha i.e. intermediate constant for taps update */
+    alpha = (q31_t) (((q63_t) e * mu) >> 31);
+
+    /* Initialize state pointer */
+    /* Advance state pointer by 1 for the next sample */
+    px = pState++;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+    /* Loop unrolling.  Process 4 taps at a time. */
+    tapCnt = numTaps >> 2;
+
+    /* Update filter coefficients */
+    while(tapCnt > 0u)
+    {
+      /* coef is in 2.30 format */
+      coef = (q31_t) (((q63_t) alpha * (*px++)) >> (32));
+      /* get coef in 1.31 format by left shifting */
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      /* update coefficient buffer to next coefficient */
+      pb++;
+
+      coef = (q31_t) (((q63_t) alpha * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      coef = (q31_t) (((q63_t) alpha * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      coef = (q31_t) (((q63_t) alpha * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* If the filter length is not a multiple of 4, compute the remaining filter taps */
+    tapCnt = numTaps % 0x4u;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      coef = (q31_t) (((q63_t) alpha * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the
+     start of the state buffer. This prepares the state buffer for the
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* Loop unrolling for (numTaps - 1u) samples copy */
+  tapCnt = (numTaps - 1u) >> 2u;
+
+  /* copy data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+  /* Calculate remaining number of copies */
+  tapCnt = (numTaps - 1u) % 0x4u;
+
+  /* Copy the remaining q31_t data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  while(blkCnt > 0u)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Set the accumulator to zero */
+    acc = 0;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      acc += ((q63_t) (*px++)) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Converting the result to 1.31 format */
+    /* Calc lower part of acc */
+    acc_l = acc & 0xffffffff;
+
+    /* Calc upper part of acc */
+    acc_h = (acc >> 32) & 0xffffffff;
+
+    acc = (uint32_t) acc_l >> lShift | acc_h << uShift;
+
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = (q31_t) acc;
+
+    /* Compute and store error */
+    e = *pRef++ - (q31_t) acc;
+
+    *pErr++ = (q31_t) e;
+
+    /* Weighting factor for the LMS version */
+    alpha = (q31_t) (((q63_t) e * mu) >> 31);
+
+    /* Initialize pState pointer */
+    /* Advance state pointer by 1 for the next sample */
+    px = pState++;
+
+    /* Initialize pCoeffs pointer */
+    pb = pCoeffs;
+
+    /* Loop over numTaps number of values */
+    tapCnt = numTaps;
+
+    while(tapCnt > 0u)
+    {
+      /* Perform the multiply-accumulate */
+      coef = (q31_t) (((q63_t) alpha * (*px++)) >> (32));
+      *pb = clip_q63_to_q31((q63_t) * pb + (coef << 1u));
+      pb++;
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete. Now copy the last numTaps - 1 samples to the     
+     start of the state buffer. This prepares the state buffer for the   
+     next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /*  Copy (numTaps - 1u) samples  */
+  tapCnt = (numTaps - 1u);
+
+  /* Copy the data */
+  while(tapCnt > 0u)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement the loop counter */
+    tapCnt--;
+  }
+
+#endif /*   #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+   * @} end of LMS group    
+   */
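
The fixed-point loop above follows the usual LMS pattern: filter the current state, form the error against the reference, then nudge every coefficient by the step size times error times state. The following is a minimal floating-point sketch of that pattern, not the CMSIS implementation. The function name `lms_step` and its argument layout are assumptions made for illustration, and the Q31 scaling and saturation of the original are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical helper (not part of CMSIS-DSP): one LMS step in float.
 * x[k] is the state sample paired with coeffs[k].
 * Returns the filter output and writes the error to *err. */
float lms_step(float *coeffs, const float *x, size_t numTaps,
               float ref, float mu, float *err)
{
    float y = 0.0f;

    /* Filter: multiply-accumulate over all taps */
    for (size_t k = 0; k < numTaps; k++) {
        y += x[k] * coeffs[k];
    }

    /* Error between the reference and the filter output */
    float e = ref - y;

    /* Coefficient update: coeffs[k] += mu * e * x[k] */
    for (size_t k = 0; k < numTaps; k++) {
        coeffs[k] += mu * e * x[k];
    }

    *err = e;
    return y;
}
```

In the Q31 code, `alpha` plays the role of `mu * e`, and `clip_q63_to_q31` keeps each updated coefficient inside the Q31 range.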

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,208 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_mat_add_f32.c    
+*    
+* Description:	Floating-point matrix addition    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* -------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMatrix        
+ */
+
+/**        
+ * @defgroup MatrixAdd Matrix Addition        
+ *        
+ * Adds two matrices.        
+ * \image html MatrixAddition.gif "Addition of two 3 x 3 matrices"        
+ *        
+ * The functions check to make sure that        
+ * <code>pSrcA</code>, <code>pSrcB</code>, and <code>pDst</code> have the same        
+ * number of rows and columns.        
+ */
+
+/**        
+ * @addtogroup MatrixAdd        
+ * @{        
+ */
+
+
+/**        
+ * @brief Floating-point matrix addition.        
+ * @param[in]       *pSrcA points to the first input matrix structure        
+ * @param[in]       *pSrcB points to the second input matrix structure        
+ * @param[out]      *pDst points to output matrix structure        
+ * @return     		The function returns either        
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.        
+ */
+
+arm_status arm_mat_add_f32(
+  const arm_matrix_instance_f32 * pSrcA,
+  const arm_matrix_instance_f32 * pSrcB,
+  arm_matrix_instance_f32 * pDst)
+{
+  float32_t *pIn1 = pSrcA->pData;                /* input data matrix pointer A  */
+  float32_t *pIn2 = pSrcB->pData;                /* input data matrix pointer B  */
+  float32_t *pOut = pDst->pData;                 /* output data matrix pointer   */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  float32_t inA1, inA2, inB1, inB2, out1, out2;  /* temporary variables */
+
+#endif //      #ifndef ARM_MATH_CM0_FAMILY
+
+  uint32_t numSamples;                           /* total number of elements in the matrix  */
+  uint32_t blkCnt;                               /* loop counters */
+  arm_status status;                             /* status of matrix addition */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numRows != pSrcB->numRows) ||
+     (pSrcA->numCols != pSrcB->numCols) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcA->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif
+  {
+
+    /* Total number of samples in the input matrix */
+    numSamples = (uint32_t) pSrcA->numRows * pSrcA->numCols;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Loop unrolling */
+    blkCnt = numSamples >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) + B(m,n) */
+      /* Add and then store the results in the destination buffer. */
+      /* Read values from source A */
+      inA1 = pIn1[0];
+
+      /* Read values from source B */
+      inB1 = pIn2[0];
+
+      /* Read values from source A */
+      inA2 = pIn1[1];
+
+      /* out = sourceA + sourceB */
+      out1 = inA1 + inB1;
+
+      /* Read values from source B */
+      inB2 = pIn2[1];
+
+      /* Read values from source A */
+      inA1 = pIn1[2];
+
+      /* out = sourceA + sourceB */
+      out2 = inA2 + inB2;
+
+      /* Read values from source B */
+      inB1 = pIn2[2];
+
+      /* Store result in destination */
+      pOut[0] = out1;
+      pOut[1] = out2;
+
+      /* Read values from source A */
+      inA2 = pIn1[3];
+
+      /* Read values from source B */
+      inB2 = pIn2[3];
+
+      /* out = sourceA + sourceB */
+      out1 = inA1 + inB1;
+
+      /* out = sourceA + sourceB */
+      out2 = inA2 + inB2;
+
+      /* Store result in destination */
+      pOut[2] = out1;
+
+      /* Store result in destination */
+      pOut[3] = out2;
+
+
+      /* Update pointers to process the next samples */
+      pIn1 += 4u;
+      pIn2 += 4u;
+      pOut += 4u;
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+     ** No loop unrolling is used. */
+    blkCnt = numSamples % 0x4u;
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) + B(m,n) */
+      /* Add and then store the results in the destination buffer. */
+      *pOut++ = (*pIn1++) + (*pIn2++);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**        
+ * @} end of MatrixAdd group        
+ */
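
As a plain-C illustration of what `arm_mat_add_f32` does when `ARM_MATH_MATRIX_CHECK` is defined, the sketch below checks that all three matrices share the same shape and then adds element by element. The struct `mat_f32`, the function `mat_add_f32`, and the `0` / `-1` return codes are stand-ins invented for this example. They mirror the fields used in the diff (`numRows`, `numCols`, `pData`) but are not the CMSIS types, and the 4x loop unrolling is omitted.

```c
#include <stdint.h>

/* Hypothetical stand-in for arm_matrix_instance_f32 (same three fields) */
typedef struct {
    uint16_t numRows;
    uint16_t numCols;
    float *pData;
} mat_f32;

/* Returns 0 on success, -1 on a size mismatch (illustrative codes only) */
int mat_add_f32(const mat_f32 *a, const mat_f32 *b, mat_f32 *dst)
{
    /* All three matrices must have identical dimensions */
    if (a->numRows != b->numRows || a->numCols != b->numCols ||
        a->numRows != dst->numRows || a->numCols != dst->numCols) {
        return -1;
    }

    /* Widen before multiplying so the element count cannot wrap at 16 bits */
    uint32_t n = (uint32_t)a->numRows * a->numCols;

    /* C(m,n) = A(m,n) + B(m,n) over the flattened row-major data */
    for (uint32_t i = 0; i < n; i++) {
        dst->pData[i] = a->pData[i] + b->pData[i];
    }
    return 0;
}
```

Because the matrices are stored contiguously in row-major order, the addition never needs row or column indices, which is why the library code can treat the data as one flat block of `numSamples` values.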

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,163 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_add_q15.c    
+*    
+* Description:	Q15 matrix addition    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixAdd    
+ * @{    
+ */
+
+/**    
+ * @brief Q15 matrix addition.    
+ * @param[in]       *pSrcA points to the first input matrix structure    
+ * @param[in]       *pSrcB points to the second input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @return     		The function returns either    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.    
+ */
+
+arm_status arm_mat_add_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst)
+{
+  q15_t *pInA = pSrcA->pData;                    /* input data matrix pointer A  */
+  q15_t *pInB = pSrcB->pData;                    /* input data matrix pointer B */
+  q15_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  uint16_t numSamples;                           /* total number of elements in the matrix  */
+  uint32_t blkCnt;                               /* loop counters  */
+  arm_status status;                             /* status of matrix addition  */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numRows != pSrcB->numRows) ||
+     (pSrcA->numCols != pSrcB->numCols) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcA->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* Total number of samples in the input matrix */
+    numSamples = (uint16_t) (pSrcA->numRows * pSrcA->numCols);
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+    /* Loop unrolling */
+    blkCnt = (uint32_t) numSamples >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) + B(m,n) */
+      /* Add, Saturate and then store the results in the destination buffer. */
+      *__SIMD32(pOut)++ = __QADD16(*__SIMD32(pInA)++, *__SIMD32(pInB)++);
+      *__SIMD32(pOut)++ = __QADD16(*__SIMD32(pInA)++, *__SIMD32(pInB)++);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+     ** No loop unrolling is used. */
+    blkCnt = (uint32_t) numSamples % 0x4u;
+
+    /* q15 pointers of input and output are initialized */
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) + B(m,n) */
+      /* Add, Saturate and then store the results in the destination buffer. */
+      *pOut++ = (q15_t) __QADD16(*pInA++, *pInB++);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = (uint32_t) numSamples;
+
+
+    /* q15 pointers of input and output are initialized */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) + B(m,n) */
+      /* Add, Saturate and then store the results in the destination buffer. */
+      *pOut++ = (q15_t) __SSAT(((q31_t) * pInA++ + *pInB++), 16);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixAdd group    
+ */
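
The Cortex-M0 branch above saturates by widening to 32 bits and clamping with `__SSAT(..., 16)`. A portable sketch of the same idea, assuming no CMSIS intrinsics are available, is shown below. The name `sat_add_q15` is hypothetical and used only for this example.

```c
#include <stdint.h>

/* Hypothetical portable equivalent of a saturating Q15 add:
 * widen to 32 bits, add, then clamp to [INT16_MIN, INT16_MAX]. */
int16_t sat_add_q15(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;

    if (sum > INT16_MAX) {
        return INT16_MAX;   /* 0x7FFF */
    }
    if (sum < INT16_MIN) {
        return INT16_MIN;   /* 0x8000 */
    }
    return (int16_t)sum;
}
```

The same widen-then-clamp approach extends to Q31 by using a 64-bit intermediate in place of the 32-bit one.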

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_add_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,207 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_add_q31.c    
+*    
+* Description:	Q31 matrix addition    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**      
+ * @ingroup groupMatrix      
+ */
+
+/**      
+ * @addtogroup MatrixAdd      
+ * @{      
+ */
+
+/**      
+ * @brief Q31 matrix addition.      
+ * @param[in]       *pSrcA points to the first input matrix structure      
+ * @param[in]       *pSrcB points to the second input matrix structure      
+ * @param[out]      *pDst points to output matrix structure      
+ * @return     		The function returns either      
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.      
+ *      
+ * <b>Scaling and Overflow Behavior:</b>      
+ * \par      
+ * The function uses saturating arithmetic.      
+ * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] will be saturated.      
+ */
+
+arm_status arm_mat_add_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst)
+{
+  q31_t *pIn1 = pSrcA->pData;                    /* input data matrix pointer A */
+  q31_t *pIn2 = pSrcB->pData;                    /* input data matrix pointer B */
+  q31_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  q31_t inA1, inB1;                              /* temporary variables */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  q31_t inA2, inB2;                              /* temporary variables */
+  q31_t out1, out2;                              /* temporary variables */
+
+#endif //      #ifndef ARM_MATH_CM0_FAMILY
+
+  uint32_t numSamples;                           /* total number of elements in the matrix  */
+  uint32_t blkCnt;                               /* loop counters */
+  arm_status status;                             /* status of matrix addition */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numRows != pSrcB->numRows) ||
+     (pSrcA->numCols != pSrcB->numCols) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcA->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif
+  {
+    /* Total number of samples in the input matrix */
+    numSamples = (uint32_t) pSrcA->numRows * pSrcA->numCols;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+    /* Loop Unrolling */
+    blkCnt = numSamples >> 2u;
+
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) + B(m,n) */
+      /* Add, saturate and then store the results in the destination buffer. */
+      /* Read values from source A */
+      inA1 = pIn1[0];
+
+      /* Read values from source B */
+      inB1 = pIn2[0];
+
+      /* Read values from source A */
+      inA2 = pIn1[1];
+
+      /* Add and saturate */
+      out1 = __QADD(inA1, inB1);
+
+      /* Read values from source B */
+      inB2 = pIn2[1];
+
+      /* Read values from source A */
+      inA1 = pIn1[2];
+
+      /* Add and saturate */
+      out2 = __QADD(inA2, inB2);
+
+      /* Read values from source B */
+      inB1 = pIn2[2];
+
+      /* Store result in destination */
+      pOut[0] = out1;
+      pOut[1] = out2;
+
+      /* Read values from source A */
+      inA2 = pIn1[3];
+
+      /* Read values from source B */
+      inB2 = pIn2[3];
+
+      /* Add and saturate */
+      out1 = __QADD(inA1, inB1);
+      out2 = __QADD(inA2, inB2);
+
+      /* Store result in destination */
+      pOut[2] = out1;
+      pOut[3] = out2;
+
+      /* Update pointers to process the next samples */
+      pIn1 += 4u;
+      pIn2 += 4u;
+      pOut += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+     ** No loop unrolling is used. */
+    blkCnt = numSamples % 0x4u;
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = numSamples;
+
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) + B(m,n) */
+      /* Add, saturate and then store the results in the destination buffer. */
+      inA1 = *pIn1++;
+      inB1 = *pIn2++;
+
+      inA1 = __QADD(inA1, inB1);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+
+      *pOut++ = inA1;
+
+    }
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**      
+ * @} end of MatrixAdd group      
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,283 @@
+/* ----------------------------------------------------------------------      
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved. 
+*      
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*      
+* Project:      CMSIS DSP Library 
+* Title:	    arm_mat_cmplx_mult_f32.c      
+*      
+* Description:  Floating-point complex matrix multiplication.
+*      
+* Target Processor:          Cortex-M4/Cortex-M3/Cortex-M0
+*
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**      
+ * @ingroup groupMatrix      
+ */
+
+/**      
+ * @defgroup CmplxMatrixMult  Complex Matrix Multiplication     
+ *     
+ * Complex Matrix multiplication is only defined if the number of columns of the      
+ * first matrix equals the number of rows of the second matrix.      
+ * Multiplying an <code>M x N</code> matrix with an <code>N x P</code> matrix results      
+ * in an <code>M x P</code> matrix.      
+ * When matrix size checking is enabled, the functions check: (1) that the inner dimensions of      
+ * <code>pSrcA</code> and <code>pSrcB</code> are equal; and (2) that the size of the output      
+ * matrix equals the outer dimensions of <code>pSrcA</code> and <code>pSrcB</code>.      
+ */
+
+
+/**      
+ * @addtogroup CmplxMatrixMult      
+ * @{      
+ */
+
+/**      
+ * @brief Floating-point Complex matrix multiplication.      
+ * @param[in]       *pSrcA points to the first input complex matrix structure      
+ * @param[in]       *pSrcB points to the second input complex matrix structure      
+ * @param[out]      *pDst points to output complex matrix structure      
+ * @return     		The function returns either      
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.      
+ */
+
+arm_status arm_mat_cmplx_mult_f32(
+  const arm_matrix_instance_f32 * pSrcA,
+  const arm_matrix_instance_f32 * pSrcB,
+  arm_matrix_instance_f32 * pDst)
+{
+  float32_t *pIn1 = pSrcA->pData;                /* input data matrix pointer A */
+  float32_t *pIn2 = pSrcB->pData;                /* input data matrix pointer B */
+  float32_t *pInA = pSrcA->pData;                /* input data matrix pointer A  */
+  float32_t *pOut = pDst->pData;                 /* output data matrix pointer */
+  float32_t *px;                                 /* Temporary output data matrix pointer */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+  float32_t sumReal1, sumImag1;                  /* accumulator */
+  float32_t a0, b0, c0, d0;
+  float32_t a1, b1, c1, d1;
+  float32_t sumReal2, sumImag2;                  /* accumulator */
+
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  uint16_t col, i = 0u, j, row = numRowsA, colCnt;      /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* Output pointer is set to starting address of the row being processed */
+      px = pOut + 2 * i;
+
+      /* For each row, initialize the column loop counter */
+      col = numColsB;
+
+      /* For every row wise process, the pIn2 pointer is set      
+       ** to the starting address of the pSrcB data */
+      pIn2 = pSrcB->pData;
+
+      j = 0u;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sumReal1 = 0.0f;
+        sumImag1 = 0.0f;
+
+        sumReal2 = 0.0f;
+        sumImag2 = 0.0f;
+
+        /* Set the pointer pIn1 to the starting address of the row of pSrcA being processed */
+        pIn1 = pInA;
+
+        /* Apply loop unrolling and compute 4 complex MACs per iteration. */
+        colCnt = numColsA >> 2;
+
+        /* matrix multiplication        */
+        while(colCnt > 0u)
+        {
+
+          /* Reading real part of complex matrix A */
+          a0 = *pIn1;
+
+          /* Reading real part of complex matrix B */
+          c0 = *pIn2;
+
+          /* Reading imaginary part of complex matrix A */
+          b0 = *(pIn1 + 1u);
+
+          /* Reading imaginary part of complex matrix B */
+          d0 = *(pIn2 + 1u);
+
+          sumReal1 += a0 * c0;
+          sumImag1 += b0 * c0;
+
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          sumReal2 -= b0 * d0;
+          sumImag2 += a0 * d0;
+
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+
+          a1 = *pIn1;
+          c1 = *pIn2;
+
+          b1 = *(pIn1 + 1u);
+          d1 = *(pIn2 + 1u);
+
+          sumReal1 += a1 * c1;
+          sumImag1 += b1 * c1;
+
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          sumReal2 -= b1 * d1;
+          sumImag2 += a1 * d1;
+
+          a0 = *pIn1;
+          c0 = *pIn2;
+
+          b0 = *(pIn1 + 1u);
+          d0 = *(pIn2 + 1u);
+
+          sumReal1 += a0 * c0;
+          sumImag1 += b0 * c0;
+
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          sumReal2 -= b0 * d0;
+          sumImag2 += a0 * d0;
+
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+
+          a1 = *pIn1;
+          c1 = *pIn2;
+
+          b1 = *(pIn1 + 1u);
+          d1 = *(pIn2 + 1u);
+
+          sumReal1 += a1 * c1;
+          sumImag1 += b1 * c1;
+
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          sumReal2 -= b1 * d1;
+          sumImag2 += a1 * d1;
+
+          /* Decrement the loop count */
+          colCnt--;
+        }
+
+        /* If the number of columns of pSrcA is not a multiple of 4, compute any remaining MACs here.
+         ** No loop unrolling is used. */
+        colCnt = numColsA % 0x4u;
+
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          a1 = *pIn1;
+          c1 = *pIn2;
+
+          b1 = *(pIn1 + 1u);
+          d1 = *(pIn2 + 1u);
+
+          sumReal1 += a1 * c1;
+          sumImag1 += b1 * c1;
+
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          sumReal2 -= b1 * d1;
+          sumImag2 += a1 * d1;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        sumReal1 += sumReal2;
+        sumImag1 += sumImag2;
+
+        /* Store the result in the destination buffer */
+        *px++ = sumReal1;
+        *px++ = sumImag1;
+
+        /* Update the pointer pIn2 to point to the  starting address of the next column */
+        j++;
+        pIn2 = pSrcB->pData + 2u * j;
+
+        /* Decrement the column loop counter */
+        col--;
+
+      } while(col > 0u);
+
+      /* Update the pointer pInA to point to the  starting address of the next row */
+      i = i + numColsB;
+      pInA = pInA + 2 * numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**      
+ * @} end of CmplxMatrixMult group
+ */
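
The unrolled loop above is easier to follow next to a straightforward reference. The sketch below multiplies an M x N complex matrix by an N x P complex matrix, assuming the same interleaved layout the diff uses (real part followed by imaginary part, row-major). The function name `cmplx_mat_mult_ref` and its raw-pointer signature are assumptions for illustration. It does no size checking and no unrolling.

```c
#include <stddef.h>

/* Hypothetical reference: c = a * b for interleaved complex data.
 * a is m x n, b is n x p, c is m x p. Element (r, col) of a matrix with
 * `cols` columns has its real part at index 2 * (r * cols + col). */
void cmplx_mat_mult_ref(const float *a, const float *b, float *c,
                        size_t m, size_t n, size_t p)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < p; j++) {
            float re = 0.0f;
            float im = 0.0f;

            /* Dot product of row i of a with column j of b */
            for (size_t k = 0; k < n; k++) {
                float ar = a[2 * (i * n + k)];
                float ai = a[2 * (i * n + k) + 1];
                float br = b[2 * (k * p + j)];
                float bi = b[2 * (k * p + j) + 1];

                /* (ar + j*ai) * (br + j*bi) */
                re += ar * br - ai * bi;
                im += ar * bi + ai * br;
            }

            c[2 * (i * p + j)]     = re;
            c[2 * (i * p + j) + 1] = im;
        }
    }
}
```

The library version keeps the `ar * br` and `-ai * bi` terms in separate accumulators (`sumReal1`, `sumReal2`) and only combines them after the inner loop, which changes the order of floating-point additions but not the mathematical result.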

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,424 @@
+/* ----------------------------------------------------------------------      
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved. 
+*      
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*      
+* Project:      CMSIS DSP Library 
+* Title:	    arm_mat_cmplx_mult_q15.c
+*      
+* Description:	 Q15 complex matrix multiplication.      
+*      
+* Target Processor:          Cortex-M4/Cortex-M3/Cortex-M0
+*
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**      
+ * @ingroup groupMatrix      
+ */
+
+/**      
+ * @addtogroup CmplxMatrixMult      
+ * @{      
+ */
+
+
+/**      
+ * @brief Q15 Complex matrix multiplication      
+ * @param[in]       *pSrcA points to the first input complex matrix structure      
+ * @param[in]       *pSrcB points to the second input complex matrix structure      
+ * @param[out]      *pDst points to output complex matrix structure      
+ * @param[in]		*pScratch points to the array for storing intermediate results     
+ * @return     		The function returns either      
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.         
+ *  
+ * \par Conditions for optimum performance  
+ *  Input, output and scratch buffers should be 32-bit aligned.  
+ *  
+ * \par Restrictions  
+ *  If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.  
+ *  In this case the input, output and scratch buffers should be 32-bit aligned.  
+ *  
+ * @details      
+ * <b>Scaling and Overflow Behavior:</b>      
+ *      
+ * \par      
+ * The function is implemented using a 64-bit internal accumulator. The inputs to the      
+ * multiplications are in 1.15 format and multiplications yield a 2.30 result.      
+ * The 2.30 intermediate      
+ * results are accumulated in a 64-bit accumulator in 34.30 format. This approach      
+ * provides 33 guard bits and there is no risk of overflow. The 34.30 result is then      
+ * truncated to 34.15 format by discarding the low 15 bits and then saturated to      
+ * 1.15 format.      
+ *      
+ * \par      
+ * Refer to <code>arm_mat_mult_fast_q15()</code> for a faster but less precise version of this function.      
+ *      
+ */
+
+
+
+
+arm_status arm_mat_cmplx_mult_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst,
+  q15_t * pScratch)
+{
+  /* Local pointers, matrix dimensions and loop counters */
+  q15_t *pSrcBT = pScratch;                      /* input data matrix pointer for transpose */
+  q15_t *pInA = pSrcA->pData;                    /* input data matrix pointer A of Q15 type */
+  q15_t *pInB = pSrcB->pData;                    /* input data matrix pointer B of Q15 type */
+  q15_t *px;                                     /* Temporary output data matrix pointer */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A    */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+  uint16_t numRowsB = pSrcB->numRows;            /* number of rows of input matrix B    */
+  uint16_t col, i = 0u, row = numRowsB, colCnt;  /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+  q63_t sumReal, sumImag;                        /* 64-bit accumulators (real and imaginary parts) */
+
+#ifdef UNALIGNED_SUPPORT_DISABLE
+  q15_t in;                                      /* Temporary variable to hold the input value */
+  q15_t a, b, c, d;
+#else
+  q31_t in;                                      /* Temporary variable to hold the input value */
+  q31_t prod1, prod2;
+  q31_t pSourceA, pSourceB;
+#endif
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif
+  {
+    /* Matrix transpose */
+    do
+    {
+      /* Apply loop unrolling and exchange the columns with row elements */
+      col = numColsB >> 2;
+
+      /* The pointer px is set to starting address of the column being processed */
+      px = pSrcBT + i;
+
+      /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.      
+       ** a second loop below computes the remaining 1 to 3 samples. */
+      while(col > 0u)
+      {
+#ifdef UNALIGNED_SUPPORT_DISABLE
+        /* Read two elements from the row */
+        in = *pInB++;
+        *px = in;
+        in = *pInB++;
+        px[1] = in;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+        /* Read two elements from the row */
+        in = *pInB++;
+        *px = in;
+        in = *pInB++;
+        px[1] = in;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+        /* Read two elements from the row */
+        in = *pInB++;
+        *px = in;
+        in = *pInB++;
+        px[1] = in;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+        /* Read two elements from the row */
+        in = *pInB++;
+        *px = in;
+        in = *pInB++;
+        px[1] = in;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+      /* If the number of columns of pSrcB is not a multiple of 4, copy the remaining elements here.      
+       ** No loop unrolling is used. */
+      col = numColsB % 0x4u;
+
+      while(col > 0u)
+      {
+        /* Read two elements from the row */
+        in = *pInB++;
+        *px = in;
+        in = *pInB++;
+        px[1] = in;
+#else
+
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        *__SIMD32(px) = in;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        *__SIMD32(px) = in;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        *__SIMD32(px) = in;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        *__SIMD32(px) = in;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+      /* If the number of columns of pSrcB is not a multiple of 4, copy the remaining elements here.      
+       ** No loop unrolling is used. */
+      col = numColsB % 0x4u;
+
+      while(col > 0u)
+      {
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        *__SIMD32(px) = in;
+#endif
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB * 2;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+      i = i + 2u;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* Reset the variables for the usage in the following multiplication process */
+    row = numRowsA;
+    i = 0u;
+    px = pDst->pData;
+
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* For each row, the column loop counter is re-initialized */
+      col = numColsB;
+
+      /* For each row, the pInB pointer is reset      
+       ** to the starting address of the transposed pSrcB data */
+      pInB = pSrcBT;
+
+      /* column loop */
+      do
+      {
+        /* Set the accumulators sumReal and sumImag to zero */
+        sumReal = 0;
+        sumImag = 0;
+
+        /* Apply loop unrolling and compute 2 MACs simultaneously. */
+        colCnt = numColsA >> 1;
+
+        /* Set pInA to the starting address of the row of pSrcA being processed */
+        pInA = pSrcA->pData + i * 2;
+
+
+        /* matrix multiplication */
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+
+#ifdef UNALIGNED_SUPPORT_DISABLE
+
+          /* read real and imag values from pSrcA buffer */
+          a = *pInA;
+          b = *(pInA + 1u);
+          /* read real and imag values from pSrcB buffer */
+          c = *pInB;
+          d = *(pInB + 1u);
+
+          /* Multiply and accumulate */
+          sumReal += (q31_t) a *c;
+          sumImag += (q31_t) a *d;
+          sumReal -= (q31_t) b *d;
+          sumImag += (q31_t) b *c;
+
+          /* read next real and imag values from pSrcA buffer */
+          a = *(pInA + 2u);
+          b = *(pInA + 3u);
+          /* read next real and imag values from pSrcB buffer */
+          c = *(pInB + 2u);
+          d = *(pInB + 3u);
+
+          /* update pointer */
+          pInA += 4u;
+
+          /* Multiply and accumulate */
+          sumReal += (q31_t) a *c;
+          sumImag += (q31_t) a *d;
+          sumReal -= (q31_t) b *d;
+          sumImag += (q31_t) b *c;
+          /* update pointer */
+          pInB += 4u;
+#else
+          /* read real and imag values from pSrcA and pSrcB buffer */
+          pSourceA = *__SIMD32(pInA)++;
+          pSourceB = *__SIMD32(pInB)++;
+
+          /* Multiply and accumulate */
+#ifdef ARM_MATH_BIG_ENDIAN
+          prod1 = -__SMUSD(pSourceA, pSourceB);
+#else
+          prod1 = __SMUSD(pSourceA, pSourceB);
+#endif
+          prod2 = __SMUADX(pSourceA, pSourceB);
+          sumReal += (q63_t) prod1;
+          sumImag += (q63_t) prod2;
+
+          /* read real and imag values from pSrcA and pSrcB buffer */
+          pSourceA = *__SIMD32(pInA)++;
+          pSourceB = *__SIMD32(pInB)++;
+
+          /* Multiply and accumulate */
+#ifdef ARM_MATH_BIG_ENDIAN
+          prod1 = -__SMUSD(pSourceA, pSourceB);
+#else
+          prod1 = __SMUSD(pSourceA, pSourceB);
+#endif
+          prod2 = __SMUADX(pSourceA, pSourceB);
+          sumReal += (q63_t) prod1;
+          sumImag += (q63_t) prod2;
+
+#endif /*      #ifdef UNALIGNED_SUPPORT_DISABLE */
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* process odd column samples */
+        if((numColsA & 0x1u) > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+
+#ifdef UNALIGNED_SUPPORT_DISABLE
+
+          /* read real and imag values from pSrcA and pSrcB buffer */
+          a = *pInA++;
+          b = *pInA++;
+          c = *pInB++;
+          d = *pInB++;
+
+          /* Multiply and accumulate */
+          sumReal += (q31_t) a *c;
+          sumImag += (q31_t) a *d;
+          sumReal -= (q31_t) b *d;
+          sumImag += (q31_t) b *c;
+
+#else
+          /* read real and imag values from pSrcA and pSrcB buffer */
+          pSourceA = *__SIMD32(pInA)++;
+          pSourceB = *__SIMD32(pInB)++;
+
+          /* Multiply and accumulate */
+#ifdef ARM_MATH_BIG_ENDIAN
+          prod1 = -__SMUSD(pSourceA, pSourceB);
+#else
+          prod1 = __SMUSD(pSourceA, pSourceB);
+#endif
+          prod2 = __SMUADX(pSourceA, pSourceB);
+          sumReal += (q63_t) prod1;
+          sumImag += (q63_t) prod2;
+
+#endif /*      #ifdef UNALIGNED_SUPPORT_DISABLE */
+
+        }
+
+        /* Saturate and store the result in the destination buffer */
+
+        *px++ = (q15_t) (__SSAT(sumReal >> 15, 16));
+        *px++ = (q15_t) (__SSAT(sumImag >> 15, 16));
+
+        /* Decrement the column loop counter */
+        col--;
+
+      } while(col > 0u);
+
+      i = i + numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**      
+ * @} end of CmplxMatrixMult group      
+ */
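
The scaling described for the Q15 routine above (1.15 inputs, 2.30 products, a 64-bit accumulator, then a 15-bit right shift and saturation to 1.15) can be sketched in portable C without the `__SIMD32`, `__SMUSD`, `__SMUADX` or `__SSAT` intrinsics. This is a minimal illustration rather than the CMSIS implementation: `sat_q15` and `cmplx_dot_q15` are hypothetical names, the data is assumed to be interleaved as real, imaginary pairs, and the sketch assumes that `>>` on a negative `int64_t` is an arithmetic shift, as the original code also does.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of CMSIS-DSP. */

/* Saturate a 64-bit value to the signed 16-bit (Q15) range. */
static int16_t sat_q15(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Complex dot product of n Q15 values stored as {re, im, re, im, ...}.
 * Each 1.15 x 1.15 product is a 2.30 value; the sums are kept in 64 bits,
 * so the accumulation itself cannot overflow for any realistic n. */
static void cmplx_dot_q15(const int16_t *a, const int16_t *b, uint32_t n,
                          int16_t *outRe, int16_t *outIm)
{
    int64_t sumRe = 0;
    int64_t sumIm = 0;

    for (uint32_t k = 0; k < n; k++) {
        int64_t ar = a[2 * k], ai = a[2 * k + 1];
        int64_t br = b[2 * k], bi = b[2 * k + 1];

        /* (ar + j*ai) * (br + j*bi) */
        sumRe += ar * br - ai * bi;
        sumIm += ar * bi + ai * br;
    }

    /* Drop the low 15 fractional bits, then saturate to 1.15.
     * Assumes arithmetic right shift for negative values. */
    *outRe = sat_q15(sumRe >> 15);
    *outIm = sat_q15(sumIm >> 15);
}
```

For example, multiplying 0.5 + 0j by 0.5 + 0.5j (raw values `0x4000`) gives `0x2000` in both parts, that is 0.25 + 0.25j in Q15.
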

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_cmplx_mult_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,293 @@
+/* ----------------------------------------------------------------------      
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved. 
+*      
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*      
+* Project:      CMSIS DSP Library 
+* Title:	    arm_mat_cmplx_mult_q31.c      
+*      
+* Description:  Q31 complex matrix multiplication.      
+*      
+* Target Processor:          Cortex-M4/Cortex-M3/Cortex-M0
+*
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* -------------------------------------------------------------------- */
+#include "arm_math.h"
+
+/**     
+ * @ingroup groupMatrix     
+ */
+
+/**      
+ * @addtogroup CmplxMatrixMult      
+ * @{      
+ */
+
+/**      
+ * @brief Q31 Complex matrix multiplication      
+ * @param[in]       *pSrcA points to the first input complex matrix structure      
+ * @param[in]       *pSrcB points to the second input complex matrix structure      
+ * @param[out]      *pDst points to output complex matrix structure      
+ * @return     		The function returns either      
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.      
+ *      
+ * @details      
+ * <b>Scaling and Overflow Behavior:</b>      
+ *      
+ * \par      
+ * The function is implemented using an internal 64-bit accumulator.      
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate      
+ * multiplication results but provides only a single guard bit. There is no saturation      
+ * on intermediate additions. Thus, if the accumulator overflows it wraps around and      
+ * distorts the result. To avoid intermediate overflows, the caller should scale      
+ * the input down by log2(numColsA) bits beforehand,      
+ * as a total of numColsA additions are performed internally.      
+ * The 2.62 accumulator is right shifted by 31 bits and saturated to 1.31 format to yield the final result.      
+ *      
+ *      
+ */
+
+arm_status arm_mat_cmplx_mult_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst)
+{
+  q31_t *pIn1 = pSrcA->pData;                    /* input data matrix pointer A */
+  q31_t *pIn2 = pSrcB->pData;                    /* input data matrix pointer B */
+  q31_t *pInA = pSrcA->pData;                    /* input data matrix pointer A  */
+  q31_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  q31_t *px;                                     /* Temporary output data matrix pointer */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+  q63_t sumReal1, sumImag1;                      /* accumulator */
+  q31_t a0, b0, c0, d0;
+  q31_t a1, b1, c1, d1;
+
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  uint16_t col, i = 0u, j, row = numRowsA, colCnt;      /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* Output pointer is set to starting address of the row being processed */
+      px = pOut + 2 * i;
+
+      /* For each row, the column loop counter is re-initialized */
+      col = numColsB;
+
+      /* For every row wise process, the pIn2 pointer is set     
+       ** to the starting address of the pSrcB data */
+      pIn2 = pSrcB->pData;
+
+      j = 0u;
+
+      /* column loop */
+      do
+      {
+        /* Set the accumulators sumReal1 and sumImag1 to zero */
+        sumReal1 = 0;
+        sumImag1 = 0;
+
+        /* Set pIn1 to the starting address of the row of pSrcA being processed */
+        pIn1 = pInA;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        colCnt = numColsA >> 2;
+
+        /* matrix multiplication        */
+        while(colCnt > 0u)
+        {
+
+          /* Reading real part of complex matrix A */
+          a0 = *pIn1;
+
+          /* Reading real part of complex matrix B */
+          c0 = *pIn2;
+
+          /* Reading imaginary part of complex matrix A */
+          b0 = *(pIn1 + 1u);
+
+          /* Reading imaginary part of complex matrix B */
+          d0 = *(pIn2 + 1u);
+
+          /* Multiply and accumulate */
+          sumReal1 += (q63_t) a0 *c0;
+          sumImag1 += (q63_t) b0 *c0;
+
+          /* update pointers */
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          /* Multiply and accumulate */
+          sumReal1 -= (q63_t) b0 *d0;
+          sumImag1 += (q63_t) a0 *d0;
+
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+
+          /* read real and imag values from pSrcA and pSrcB buffer */
+          a1 = *pIn1;
+          c1 = *pIn2;
+          b1 = *(pIn1 + 1u);
+          d1 = *(pIn2 + 1u);
+
+          /* Multiply and accumulate */
+          sumReal1 += (q63_t) a1 *c1;
+          sumImag1 += (q63_t) b1 *c1;
+
+          /* update pointers */
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          /* Multiply and accumulate */
+          sumReal1 -= (q63_t) b1 *d1;
+          sumImag1 += (q63_t) a1 *d1;
+
+          a0 = *pIn1;
+          c0 = *pIn2;
+
+          b0 = *(pIn1 + 1u);
+          d0 = *(pIn2 + 1u);
+
+          /* Multiply and accumulate */
+          sumReal1 += (q63_t) a0 *c0;
+          sumImag1 += (q63_t) b0 *c0;
+
+          /* update pointers */
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          /* Multiply and accumulate */
+          sumReal1 -= (q63_t) b0 *d0;
+          sumImag1 += (q63_t) a0 *d0;
+
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+
+          a1 = *pIn1;
+          c1 = *pIn2;
+
+          b1 = *(pIn1 + 1u);
+          d1 = *(pIn2 + 1u);
+
+          /* Multiply and accumulate */
+          sumReal1 += (q63_t) a1 *c1;
+          sumImag1 += (q63_t) b1 *c1;
+
+          /* update pointers */
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          /* Multiply and accumulate */
+          sumReal1 -= (q63_t) b1 *d1;
+          sumImag1 += (q63_t) a1 *d1;
+
+          /* Decrement the loop count */
+          colCnt--;
+        }
+
+        /* If the number of columns of pSrcA is not a multiple of 4, compute any remaining MACs here.     
+         ** No loop unrolling is used. */
+        colCnt = numColsA % 0x4u;
+
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          a1 = *pIn1;
+          c1 = *pIn2;
+
+          b1 = *(pIn1 + 1u);
+          d1 = *(pIn2 + 1u);
+
+          /* Multiply and accumulate */
+          sumReal1 += (q63_t) a1 *c1;
+          sumImag1 += (q63_t) b1 *c1;
+
+          /* update pointers */
+          pIn1 += 2u;
+          pIn2 += 2 * numColsB;
+
+          /* Multiply and accumulate */
+          sumReal1 -= (q63_t) b1 *d1;
+          sumImag1 += (q63_t) a1 *d1;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* Store the result in the destination buffer */
+        *px++ = (q31_t) clip_q63_to_q31(sumReal1 >> 31);
+        *px++ = (q31_t) clip_q63_to_q31(sumImag1 >> 31);
+        
+        /* Update the pointer pIn2 to point to the starting address of the next column */
+        j++;
+        pIn2 = pSrcB->pData + 2u * j;
+
+        /* Decrement the column loop counter */
+        col--;
+
+      } while(col > 0u);
+
+      /* Update the pointer pInA to point to the starting address of the next row */
+      i = i + numColsB;
+      pInA = pInA + 2 * numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**     
+ * @} end of CmplxMatrixMult group     
+ */
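
The Q31 routine above accumulates 1.31 x 1.31 products in a 2.62 accumulator with a single guard bit, so unlike the Q15 version the caller has to pre-scale the inputs. A minimal sketch of that arithmetic follows, assuming interleaved real, imaginary data and an arithmetic right shift on negative values. The names `clip_to_q31`, `prescale_bits` and `cmplx_dot_q31` are hypothetical, and rounding log2(numColsA) up to a whole number of bits is an assumption of this sketch, not something the source specifies.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of CMSIS-DSP. */

/* Clamp a 64-bit value to the signed 32-bit (Q31) range. */
static int32_t clip_to_q31(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Bits to shift the inputs down by before multiplying:
 * log2(numColsA), rounded up (rounding is an assumption of this sketch). */
static uint32_t prescale_bits(uint16_t numColsA)
{
    uint32_t bits = 0;
    while ((1u << bits) < numColsA) {
        bits++;
    }
    return bits;
}

/* Complex dot product of n Q31 values stored as {re, im, re, im, ...}.
 * The 64-bit sums are in 2.62 format with one guard bit; there is no
 * saturation on the additions, so inputs must be pre-scaled by the caller
 * to keep the sums in range. */
static void cmplx_dot_q31(const int32_t *a, const int32_t *b, uint32_t n,
                          int32_t *outRe, int32_t *outIm)
{
    int64_t sumRe = 0;
    int64_t sumIm = 0;

    for (uint32_t k = 0; k < n; k++) {
        int64_t ar = a[2 * k], ai = a[2 * k + 1];
        int64_t br = b[2 * k], bi = b[2 * k + 1];

        sumRe += ar * br;
        sumRe -= ai * bi;
        sumIm += ar * bi;
        sumIm += ai * br;
    }

    /* 2.62 -> shift right by 31 -> saturate to 1.31 */
    *outRe = clip_to_q31(sumRe >> 31);
    *outIm = clip_to_q31(sumIm >> 31);
}
```

With four columns in pSrcA, `prescale_bits` suggests shifting each input right by 2 bits before calling the multiplication.
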

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,88 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_mat_init_f32.c    
+*    
+* Description:	Floating-point matrix initialization.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @defgroup MatrixInit Matrix Initialization    
+ *    
+ * Initializes the underlying matrix data structure.    
+ * The functions set the <code>numRows</code>,    
+ * <code>numCols</code>, and <code>pData</code> fields    
+ * of the matrix data structure.    
+ */
+
+/**    
+ * @addtogroup MatrixInit    
+ * @{    
+ */
+
+/**    
+   * @brief  Floating-point matrix initialization.    
+   * @param[in,out] *S             points to an instance of the floating-point matrix structure.    
+   * @param[in]     nRows          number of rows in the matrix.    
+   * @param[in]     nColumns       number of columns in the matrix.    
+   * @param[in]     *pData	   points to the matrix data array.    
+   * @return        none    
+   */
+
+void arm_mat_init_f32(
+  arm_matrix_instance_f32 * S,
+  uint16_t nRows,
+  uint16_t nColumns,
+  float32_t * pData)
+{
+  /* Assign Number of Rows */
+  S->numRows = nRows;
+
+  /* Assign Number of Columns */
+  S->numCols = nColumns;
+
+  /* Assign Data pointer */
+  S->pData = pData;
+}
+
+/**    
+ * @} end of MatrixInit group    
+ */
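
As the function body above shows, initialization only records the dimensions and the data pointer; it does not copy or allocate the matrix data. The sketch below illustrates that behaviour with a local stand-in type so it compiles without `arm_math.h`. `matrix_f32_sketch`, `mat_init_sketch` and `mat_at` are hypothetical names, and the row-major indexing in `mat_at` is an assumption based on how the multiplication routines in this changeset step through their buffers.

```c
#include <stdint.h>

/* Local stand-in mirroring the numRows, numCols and pData fields;
 * the real type is declared in arm_math.h. */
typedef struct {
    uint16_t numRows;
    uint16_t numCols;
    float *pData;
} matrix_f32_sketch;

/* Same three assignments as the init function above. */
static void mat_init_sketch(matrix_f32_sketch *S, uint16_t nRows,
                            uint16_t nColumns, float *pData)
{
    S->numRows = nRows;
    S->numCols = nColumns;
    S->pData = pData;   /* pointer is stored, data is not copied */
}

/* Read element (r, c), assuming row-major storage. */
static float mat_at(const matrix_f32_sketch *S, uint16_t r, uint16_t c)
{
    return S->pData[(uint32_t)r * S->numCols + c];
}
```

Because only the pointer is stored, the caller's buffer must outlive the matrix instance, and later writes to the buffer are visible through the instance.
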

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,80 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_mat_init_q15.c    
+*    
+* Description:	Q15 matrix initialization.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------------- */
+
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixInit    
+ * @{    
+ */
+
+  /**    
+   * @brief  Q15 matrix initialization.    
+   * @param[in,out] *S             points to an instance of the Q15 matrix structure.    
+   * @param[in]     nRows          number of rows in the matrix.    
+   * @param[in]     nColumns       number of columns in the matrix.    
+   * @param[in]     *pData	   points to the matrix data array.    
+   * @return        none    
+   */
+
+void arm_mat_init_q15(
+  arm_matrix_instance_q15 * S,
+  uint16_t nRows,
+  uint16_t nColumns,
+  q15_t * pData)
+{
+  /* Assign Number of Rows */
+  S->numRows = nRows;
+
+  /* Assign Number of Columns */
+  S->numCols = nColumns;
+
+  /* Assign Data pointer */
+  S->pData = pData;
+}
+
+/**    
+ * @} end of MatrixInit group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_init_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,84 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_mat_init_q31.c    
+*    
+* Description:	Q31 matrix initialization.    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------------- */
+
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @defgroup MatrixInit Matrix Initialization    
+ *    
+ */
+
+/**    
+ * @addtogroup MatrixInit    
+ * @{    
+ */
+
+  /**    
+   * @brief  Q31 matrix initialization.    
+   * @param[in,out] *S             points to an instance of the Q31 matrix structure.    
+   * @param[in]     nRows          number of rows in the matrix.    
+   * @param[in]     nColumns       number of columns in the matrix.    
+   * @param[in]     *pData	   points to the matrix data array.    
+   * @return        none    
+   */
+
+void arm_mat_init_q31(
+  arm_matrix_instance_q31 * S,
+  uint16_t nRows,
+  uint16_t nColumns,
+  q31_t * pData)
+{
+  /* Assign Number of Rows */
+  S->numRows = nRows;
+
+  /* Assign Number of Columns */
+  S->numCols = nColumns;
+
+  /* Assign Data pointer */
+  S->pData = pData;
+}
+
+/**    
+ * @} end of MatrixInit group    
+ */
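
As a minimal sketch of what the initializer above does, the following standalone C snippet mirrors its three assignments. The names `demo_matrix_q31`, `demo_mat_init_q31` and `demo_mat_get_q31` are hypothetical stand-ins so the snippet compiles without `arm_math.h`; row-major storage is assumed here because the inverse routines in this changeset index rows as `pData + row * numCols`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t q31_t;            /* stand-in for the CMSIS q31_t typedef */

/* Hypothetical mirror of arm_matrix_instance_q31, for illustration only. */
typedef struct
{
  uint16_t numRows;               /* number of rows */
  uint16_t numCols;               /* number of columns */
  q31_t *pData;                   /* caller-owned data array */
} demo_matrix_q31;

/* Performs the same three assignments as arm_mat_init_q31.
 * The data array is referenced, not copied. */
void demo_mat_init_q31(demo_matrix_q31 *S, uint16_t nRows,
                       uint16_t nColumns, q31_t *pData)
{
  assert(S != NULL);
  S->numRows = nRows;
  S->numCols = nColumns;
  S->pData = pData;
}

/* Reads element (row, col), assuming row-major storage. */
q31_t demo_mat_get_q31(const demo_matrix_q31 *S, uint16_t row, uint16_t col)
{
  assert(S != NULL && row < S->numRows && col < S->numCols);
  return S->pData[(uint32_t)row * S->numCols + col];
}
```

Because only the pointer is stored, the buffer passed in must outlive the matrix instance, and later writes to the buffer are visible through the instance.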

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_inverse_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_inverse_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,703 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_inverse_f32.c    
+*    
+* Description:	Floating-point matrix inverse.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @defgroup MatrixInv Matrix Inverse    
+ *    
+ * Computes the inverse of a matrix.    
+ *    
+ * The inverse is defined only if the input matrix is square and non-singular (the determinant    
+ * is non-zero). The function checks that the input and output matrices are square and of the    
+ * same size.    
+ *    
+ * Matrix inversion is numerically sensitive and the CMSIS DSP library only supports matrix    
+ * inversion of floating-point matrices.    
+ *    
+ * \par Algorithm    
+ * The Gauss-Jordan method is used to find the inverse.    
+ * The algorithm performs a sequence of elementary row-operations until it    
+ * reduces the input matrix to an identity matrix. Applying the same sequence    
+ * of elementary row-operations to an identity matrix yields the inverse matrix.    
+ * If the input matrix is singular, then the algorithm terminates and returns error status    
+ * <code>ARM_MATH_SINGULAR</code>.    
+ * \image html MatrixInverse.gif "Matrix Inverse of a 3 x 3 matrix using Gauss-Jordan Method"    
+ */
+
+/**    
+ * @addtogroup MatrixInv    
+ * @{    
+ */
+
+/**    
+ * @brief Floating-point matrix inverse.    
+ * @param[in]       *pSrc points to input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @return     		The function returns    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> if the input matrix is not square or if the size    
+ * of the output matrix does not match the size of the input matrix.    
+ * If the input matrix is found to be singular (non-invertible), then the function returns    
+ * <code>ARM_MATH_SINGULAR</code>.  Otherwise, the function returns <code>ARM_MATH_SUCCESS</code>.    
+ */
+
+arm_status arm_mat_inverse_f32(
+  const arm_matrix_instance_f32 * pSrc,
+  arm_matrix_instance_f32 * pDst)
+{
+  float32_t *pIn = pSrc->pData;                  /* input data matrix pointer */
+  float32_t *pOut = pDst->pData;                 /* output data matrix pointer */
+  float32_t *pInT1, *pInT2;                      /* Temporary input data matrix pointer */
+  float32_t *pOutT1, *pOutT2;                    /* Temporary output data matrix pointer */
+  float32_t *pPivotRowIn, *pPRT_in, *pPivotRowDst, *pPRT_pDst;  /* Temporary input and output data matrix pointer */
+  uint32_t numRows = pSrc->numRows;              /* Number of rows in the matrix  */
+  uint32_t numCols = pSrc->numCols;              /* Number of Cols in the matrix  */
+
+#ifndef ARM_MATH_CM0_FAMILY
+  float32_t maxC;                                /* maximum value in the column */
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t Xchg, in = 0.0f, in1;                /* Temporary input values  */
+  uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l;      /* loop counters */
+  arm_status status;                             /* status of matrix inverse */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols)
+     || (pSrc->numRows != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+
+    /*--------------------------------------------------------------------------------------------------------------
+     * Matrix Inverse can be solved using elementary row operations.
+     *
+     *  Gauss-Jordan Method:
+     *
+     *      1. First combine the identity matrix and the input matrix separated by a bar to form an
+     *         augmented matrix as follows:
+     *                      _                  _         _         _
+     *                     |  a11  a12 | 1   0  |       |  X11 X12  |
+     *                     |           |        |   =   |           |
+     *                     |_ a21  a22 | 0   1 _|       |_ X21 X22 _|
+     *
+     *      2. In our implementation, the pDst matrix is used as the identity matrix.
+     *
+     *      3. Begin with the first row. Let i = 1.
+     *
+     *      4. Check whether the pivot for column i has the greatest magnitude in that column.
+     *         The pivot is the element of the main diagonal that is on the current row.
+     *         For instance, if working with row i, then the pivot element is aii.
+     *         If the pivot is not the most significant value of the column, exchange that row with a row
+     *         below it that does contain the most significant value in column i. If the most
+     *         significant value of the column is zero, then an inverse to that matrix does not exist.
+     *         The most significant value of the column is the absolute maximum.
+     *
+     *      5. Divide every element of row i by the pivot.
+     *
+     *      6. For every row above and below row i, replace that row with the sum of that row and
+     *         a multiple of row i so that each new element in column i outside row i is zero.
+     *
+     *      7. Move to the next row and column and repeat steps 4 through 6 until you have zeros
+     *         for every element below and above the main diagonal.
+     *
+     *      8. Now an identity matrix is formed to the left of the bar (input matrix, pSrc).
+     *         Therefore, the matrix to the right of the bar is our solution (output matrix, pDst).
+     *----------------------------------------------------------------------------------------------------------------*/
+
+    /* Working pointer for destination matrix */
+    pOutT1 = pOut;
+
+    /* Loop over the number of rows */
+    rowCnt = numRows;
+
+    /* Making the destination matrix as identity matrix */
+    while(rowCnt > 0u)
+    {
+      /* Writing all zeroes in lower triangle of the destination matrix */
+      j = numRows - rowCnt;
+      while(j > 0u)
+      {
+        *pOutT1++ = 0.0f;
+        j--;
+      }
+
+      /* Writing all ones in the diagonal of the destination matrix */
+      *pOutT1++ = 1.0f;
+
+      /* Writing all zeroes in upper triangle of the destination matrix */
+      j = rowCnt - 1u;
+      while(j > 0u)
+      {
+        *pOutT1++ = 0.0f;
+        j--;
+      }
+
+      /* Decrement the loop counter */
+      rowCnt--;
+    }
+
+    /* Loop over the number of columns of the input matrix.    
+       All the elements in each column are processed by the row operations */
+    loopCnt = numCols;
+
+    /* Index modifier to navigate through the columns */
+    l = 0u;
+
+    while(loopCnt > 0u)
+    {
+      /* Check whether the pivot element has the largest magnitude in its column.
+       * If it does not, interchange the row with the row below that does.
+       * If the largest magnitude in the column is zero,
+       * then the matrix is singular. */
+
+      /* Working pointer for the input matrix that points    
+       * to the pivot element of the particular row  */
+      pInT1 = pIn + (l * numCols);
+
+      /* Working pointer for the destination matrix that points    
+       * to the pivot element of the particular row  */
+      pOutT1 = pOut + (l * numCols);
+
+      /* Temporary variable to hold the pivot value */
+      in = *pInT1;
+
+      /* Grab the most significant value from column l */
+      maxC = 0;
+      for (i = l; i < numRows; i++)
+      {
+        maxC = *pInT1 > 0 ? (*pInT1 > maxC ? *pInT1 : maxC) : (-*pInT1 > maxC ? -*pInT1 : maxC);
+        pInT1 += numCols;
+      }
+
+      /* Update the status if the matrix is singular */
+      if(maxC == 0.0f)
+      {
+        return ARM_MATH_SINGULAR;
+      }
+
+      /* Restore pInT1 to the pivot element of row l */
+      pInT1 = pIn + (l * numCols);
+
+      /* Destination pointer modifier */
+      k = 1u;
+      
+      /* Check if the pivot element is the most significant of the column */
+      if( (in > 0.0f ? in : -in) != maxC)
+      {
+        /* Loop over the rows below the pivot row */
+        i = numRows - (l + 1u);
+
+        while(i > 0u)
+        {
+          /* Update the input and destination pointers to row (l + k) */
+          pInT2 = pInT1 + (numCols * k);
+          pOutT2 = pOutT1 + (numCols * k);
+
+          /* Look for the most significant element to    
+           * replace in the rows below */
+          if((*pInT2 > 0.0f ? *pInT2: -*pInT2) == maxC)
+          {
+            /* Loop over number of columns    
+             * to the right of the pivot element */
+            j = numCols - l;
+
+            while(j > 0u)
+            {
+              /* Exchange the row elements of the input matrix */
+              Xchg = *pInT2;
+              *pInT2++ = *pInT1;
+              *pInT1++ = Xchg;
+
+              /* Decrement the loop counter */
+              j--;
+            }
+
+            /* Loop over number of columns of the destination matrix */
+            j = numCols;
+
+            while(j > 0u)
+            {
+              /* Exchange the row elements of the destination matrix */
+              Xchg = *pOutT2;
+              *pOutT2++ = *pOutT1;
+              *pOutT1++ = Xchg;
+
+              /* Decrement the loop counter */
+              j--;
+            }
+
+            /* Flag to indicate whether exchange is done or not */
+            flag = 1u;
+
+            /* Break after exchange is done */
+            break;
+          }
+
+          /* Update the destination pointer modifier */
+          k++;
+
+          /* Decrement the loop counter */
+          i--;
+        }
+      }
+
+      /* Update the status if the matrix is singular */
+      if((flag != 1u) && (in == 0.0f))
+      {
+        return ARM_MATH_SINGULAR;
+      }
+
+      /* Points to the pivot row of input and destination matrices */
+      pPivotRowIn = pIn + (l * numCols);
+      pPivotRowDst = pOut + (l * numCols);
+
+      /* Temporary pointers to the pivot row pointers */
+      pInT1 = pPivotRowIn;
+      pInT2 = pPivotRowDst;
+
+      /* Pivot element of the row */
+      in = *pPivotRowIn;
+
+      /* Loop over number of columns    
+       * to the right of the pivot element */
+      j = (numCols - l);
+
+      while(j > 0u)
+      {
+        /* Divide each element of the row of the input matrix    
+         * by the pivot element */
+        in1 = *pInT1;
+        *pInT1++ = in1 / in;
+
+        /* Decrement the loop counter */
+        j--;
+      }
+
+      /* Loop over number of columns of the destination matrix */
+      j = numCols;
+
+      while(j > 0u)
+      {
+        /* Divide each element of the row of the destination matrix    
+         * by the pivot element */
+        in1 = *pInT2;
+        *pInT2++ = in1 / in;
+
+        /* Decrement the loop counter */
+        j--;
+      }
+
+      /* Replace every other row with the sum of that row and a multiple of the pivot row
+       * so that each new element in the pivot column outside the pivot row is zero. */
+
+      /* Temporary pointers for input and destination matrices */
+      pInT1 = pIn;
+      pInT2 = pOut;
+
+      /* index used to check for pivot element */
+      i = 0u;
+
+      /* Loop over number of rows */
+      /*  to be replaced by the sum of that row and a multiple of row i */
+      k = numRows;
+
+      while(k > 0u)
+      {
+        /* Check for the pivot element */
+        if(i == l)
+        {
+          /* If the current row is the pivot row, leave it unchanged:
+             advance both pointers past the row without processing it */
+          pInT1 += numCols - l;
+
+          pInT2 += numCols;
+        }
+        else
+        {
+          /* Element of the reference row */
+          in = *pInT1;
+
+          /* Working pointers for input and destination pivot rows */
+          pPRT_in = pPivotRowIn;
+          pPRT_pDst = pPivotRowDst;
+
+          /* Loop over the number of columns to the right of the pivot element,    
+             to replace the elements in the input matrix */
+          j = (numCols - l);
+
+          while(j > 0u)
+          {
+            /* Replace the element by the sum of that row    
+               and a multiple of the reference row  */
+            in1 = *pInT1;
+            *pInT1++ = in1 - (in * *pPRT_in++);
+
+            /* Decrement the loop counter */
+            j--;
+          }
+
+          /* Loop over the number of columns to    
+             replace the elements in the destination matrix */
+          j = numCols;
+
+          while(j > 0u)
+          {
+            /* Replace the element by the sum of that row    
+               and a multiple of the reference row  */
+            in1 = *pInT2;
+            *pInT2++ = in1 - (in * *pPRT_pDst++);
+
+            /* Decrement the loop counter */
+            j--;
+          }
+
+        }
+
+        /* Increment the temporary input pointer */
+        pInT1 = pInT1 + l;
+
+        /* Decrement the loop counter */
+        k--;
+
+        /* Increment the pivot index */
+        i++;
+      }
+
+      /* Increment the input pointer */
+      pIn++;
+
+      /* Decrement the loop counter */
+      loopCnt--;
+
+      /* Increment the index modifier */
+      l++;
+    }
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  float32_t Xchg, in = 0.0f;                     /* Temporary input values  */
+  uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l;      /* loop counters */
+  arm_status status;                             /* status of matrix inverse */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols)
+     || (pSrc->numRows != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+  {
+
+    /*--------------------------------------------------------------------------------------------------------------
+     * Matrix Inverse can be solved using elementary row operations.
+     *
+     *  Gauss-Jordan Method:
+     *
+     *      1. First combine the identity matrix and the input matrix separated by a bar to form an
+     *         augmented matrix as follows:
+     *                      _                  _         _         _
+     *                     |  a11  a12 | 1   0  |       |  X11 X12  |
+     *                     |           |        |   =   |           |
+     *                     |_ a21  a22 | 0   1 _|       |_ X21 X22 _|
+     *
+     *      2. In our implementation, the pDst matrix is used as the identity matrix.
+     *
+     *      3. Begin with the first row. Let i = 1.
+     *
+     *      4. Check to see if the pivot for row i is zero.
+     *         The pivot is the element of the main diagonal that is on the current row.
+     *         For instance, if working with row i, then the pivot element is aii.
+     *         If the pivot is zero, exchange that row with a row below it that does not
+     *         contain a zero in column i. If this is not possible, then an inverse
+     *         to that matrix does not exist.
+     *
+     *      5. Divide every element of row i by the pivot.
+     *
+     *      6. For every row above and below row i, replace that row with the sum of that row and
+     *         a multiple of row i so that each new element in column i outside row i is zero.
+     *
+     *      7. Move to the next row and column and repeat steps 4 through 6 until you have zeros
+     *         for every element below and above the main diagonal.
+     *
+     *      8. Now an identity matrix is formed to the left of the bar (input matrix, pSrc).
+     *         Therefore, the matrix to the right of the bar is our solution (output matrix, pDst).
+     *----------------------------------------------------------------------------------------------------------------*/
+
+    /* Working pointer for destination matrix */
+    pOutT1 = pOut;
+
+    /* Loop over the number of rows */
+    rowCnt = numRows;
+
+    /* Making the destination matrix as identity matrix */
+    while(rowCnt > 0u)
+    {
+      /* Writing all zeroes in lower triangle of the destination matrix */
+      j = numRows - rowCnt;
+      while(j > 0u)
+      {
+        *pOutT1++ = 0.0f;
+        j--;
+      }
+
+      /* Writing all ones in the diagonal of the destination matrix */
+      *pOutT1++ = 1.0f;
+
+      /* Writing all zeroes in upper triangle of the destination matrix */
+      j = rowCnt - 1u;
+      while(j > 0u)
+      {
+        *pOutT1++ = 0.0f;
+        j--;
+      }
+
+      /* Decrement the loop counter */
+      rowCnt--;
+    }
+
+    /* Loop over the number of columns of the input matrix.     
+       All the elements in each column are processed by the row operations */
+    loopCnt = numCols;
+
+    /* Index modifier to navigate through the columns */
+    l = 0u;
+    //for(loopCnt = 0u; loopCnt < numCols; loopCnt++)   
+    while(loopCnt > 0u)
+    {
+      /* Check if the pivot element is zero.
+       * If it is zero, interchange the row with a non-zero row below.
+       * If there is no non-zero element to replace it in the rows below,
+       * then the matrix is singular. */
+
+      /* Working pointer for the input matrix that points     
+       * to the pivot element of the particular row  */
+      pInT1 = pIn + (l * numCols);
+
+      /* Working pointer for the destination matrix that points     
+       * to the pivot element of the particular row  */
+      pOutT1 = pOut + (l * numCols);
+
+      /* Temporary variable to hold the pivot value */
+      in = *pInT1;
+
+      /* Destination pointer modifier */
+      k = 1u;
+
+      /* Check if the pivot element is zero */
+      if(*pInT1 == 0.0f)
+      {
+        /* Loop over the rows below the pivot row */
+        for (i = (l + 1u); i < numRows; i++)
+        {
+          /* Update the input and destination pointers to row (l + k) */
+          pInT2 = pInT1 + (numCols * k);
+          pOutT2 = pOutT1 + (numCols * k);
+
+          /* Check if there is a non zero pivot element to     
+           * replace in the rows below */
+          if(*pInT2 != 0.0f)
+          {
+            /* Loop over number of columns     
+             * to the right of the pivot element */
+            for (j = 0u; j < (numCols - l); j++)
+            {
+              /* Exchange the row elements of the input matrix */
+              Xchg = *pInT2;
+              *pInT2++ = *pInT1;
+              *pInT1++ = Xchg;
+            }
+
+            for (j = 0u; j < numCols; j++)
+            {
+              Xchg = *pOutT2;
+              *pOutT2++ = *pOutT1;
+              *pOutT1++ = Xchg;
+            }
+
+            /* Flag to indicate whether exchange is done or not */
+            flag = 1u;
+
+            /* Break after exchange is done */
+            break;
+          }
+
+          /* Update the destination pointer modifier */
+          k++;
+        }
+      }
+
+      /* Update the status if the matrix is singular */
+      if((flag != 1u) && (in == 0.0f))
+      {
+        return ARM_MATH_SINGULAR;
+      }
+
+      /* Points to the pivot row of input and destination matrices */
+      pPivotRowIn = pIn + (l * numCols);
+      pPivotRowDst = pOut + (l * numCols);
+
+      /* Temporary pointers to the pivot row pointers */
+      pInT1 = pPivotRowIn;
+      pOutT1 = pPivotRowDst;
+
+      /* Pivot element of the row */
+      in = *(pIn + (l * numCols));
+
+      /* Loop over number of columns     
+       * to the right of the pivot element */
+      for (j = 0u; j < (numCols - l); j++)
+      {
+        /* Divide each element of the row of the input matrix     
+         * by the pivot element */
+        *pInT1 = *pInT1 / in;
+        pInT1++;
+      }
+      for (j = 0u; j < numCols; j++)
+      {
+        /* Divide each element of the row of the destination matrix     
+         * by the pivot element */
+        *pOutT1 = *pOutT1 / in;
+        pOutT1++;
+      }
+
+      /* Replace every other row with the sum of that row and a multiple of the pivot row
+       * so that each new element in the pivot column outside the pivot row is zero. */
+
+      /* Temporary pointers for input and destination matrices */
+      pInT1 = pIn;
+      pOutT1 = pOut;
+
+      for (i = 0u; i < numRows; i++)
+      {
+        /* Check for the pivot element */
+        if(i == l)
+        {
+          /* If the current row is the pivot row, leave it unchanged:
+             advance both pointers past the row without processing it */
+          pInT1 += numCols - l;
+          pOutT1 += numCols;
+        }
+        else
+        {
+          /* Element of the reference row */
+          in = *pInT1;
+
+          /* Working pointers for input and destination pivot rows */
+          pPRT_in = pPivotRowIn;
+          pPRT_pDst = pPivotRowDst;
+
+          /* Loop over the number of columns to the right of the pivot element,     
+             to replace the elements in the input matrix */
+          for (j = 0u; j < (numCols - l); j++)
+          {
+            /* Replace the element by the sum of that row     
+               and a multiple of the reference row  */
+            *pInT1 = *pInT1 - (in * *pPRT_in++);
+            pInT1++;
+          }
+          /* Loop over the number of columns to     
+             replace the elements in the destination matrix */
+          for (j = 0u; j < numCols; j++)
+          {
+            /* Replace the element by the sum of that row     
+               and a multiple of the reference row  */
+            *pOutT1 = *pOutT1 - (in * *pPRT_pDst++);
+            pOutT1++;
+          }
+
+        }
+        /* Increment the temporary input pointer */
+        pInT1 = pInT1 + l;
+      }
+      /* Increment the input pointer */
+      pIn++;
+
+      /* Decrement the loop counter */
+      loopCnt--;
+      /* Increment the index modifier */
+      l++;
+    }
+
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+
+    if((flag != 1u) && (in == 0.0f))
+    {
+      pIn = pSrc->pData;
+      for (i = 0; i < numRows * numCols; i++)
+      {
+        if (pIn[i] != 0.0f)
+            break;
+      }
+      
+      if (i == numRows * numCols)
+        status = ARM_MATH_SINGULAR;
+    }
+  }
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixInv group    
+ */
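
The Gauss-Jordan steps described in the file above can be sketched with plain array indexing instead of walking pointers. This is an illustrative sketch only, not the CMSIS implementation: `demo_gauss_jordan_inverse` is a hypothetical name, the matrices are assumed to be square and row-major, and the return codes 0 and -1 stand in for `ARM_MATH_SUCCESS` and `ARM_MATH_SINGULAR`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of CMSIS-DSP.
 * Inverts the n x n row-major matrix a into inv; a is overwritten.
 * Returns 0 on success, -1 if a pivot column is entirely zero (singular). */
int demo_gauss_jordan_inverse(float *a, float *inv, size_t n)
{
  size_t r, c, p;

  assert(a != NULL && inv != NULL && n > 0u);

  /* Step 2: start with inv holding the identity matrix. */
  for (r = 0u; r < n; r++)
    for (c = 0u; c < n; c++)
      inv[r * n + c] = (r == c) ? 1.0f : 0.0f;

  for (p = 0u; p < n; p++)
  {
    /* Step 4: find the row at or below p with the largest magnitude in column p. */
    size_t best = p;
    float maxC = (a[p * n + p] < 0.0f) ? -a[p * n + p] : a[p * n + p];
    float pivot;

    for (r = p + 1u; r < n; r++)
    {
      float v = (a[r * n + p] < 0.0f) ? -a[r * n + p] : a[r * n + p];
      if (v > maxC) { maxC = v; best = r; }
    }
    if (maxC == 0.0f)
      return -1;                        /* no usable pivot: singular */

    /* Exchange rows p and best in both halves of the augmented matrix. */
    if (best != p)
    {
      for (c = 0u; c < n; c++)
      {
        float t = a[p * n + c];
        a[p * n + c] = a[best * n + c];
        a[best * n + c] = t;
        t = inv[p * n + c];
        inv[p * n + c] = inv[best * n + c];
        inv[best * n + c] = t;
      }
    }

    /* Step 5: divide the pivot row by the pivot. */
    pivot = a[p * n + p];
    for (c = 0u; c < n; c++)
    {
      a[p * n + c] /= pivot;
      inv[p * n + c] /= pivot;
    }

    /* Step 6: clear column p in every other row. */
    for (r = 0u; r < n; r++)
    {
      float f;
      if (r == p)
        continue;
      f = a[r * n + p];
      for (c = 0u; c < n; c++)
      {
        a[r * n + c] -= f * a[p * n + c];
        inv[r * n + c] -= f * inv[p * n + c];
      }
    }
  }
  return 0;
}
```

For clarity the sketch processes full rows. The library code instead skips the columns to the left of the pivot in the input matrix, since those are already zero.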

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_inverse_f64.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_inverse_f64.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,703 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_inverse_f64.c    
+*    
+* Description:	Floating-point matrix inverse.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @defgroup MatrixInv Matrix Inverse    
+ *    
+ * Computes the inverse of a matrix.    
+ *    
+ * The inverse is defined only if the input matrix is square and non-singular (the determinant    
+ * is non-zero). The function checks that the input and output matrices are square and of the    
+ * same size.    
+ *    
+ * Matrix inversion is numerically sensitive and the CMSIS DSP library only supports matrix    
+ * inversion of floating-point matrices.    
+ *    
+ * \par Algorithm    
+ * The Gauss-Jordan method is used to find the inverse.    
+ * The algorithm performs a sequence of elementary row-operations until it    
+ * reduces the input matrix to an identity matrix. Applying the same sequence    
+ * of elementary row-operations to an identity matrix yields the inverse matrix.    
+ * If the input matrix is singular, then the algorithm terminates and returns error status    
+ * <code>ARM_MATH_SINGULAR</code>.    
+ * \image html MatrixInverse.gif "Matrix Inverse of a 3 x 3 matrix using Gauss-Jordan Method"    
+ */
+
+/**    
+ * @addtogroup MatrixInv    
+ * @{    
+ */
+
+/**    
+ * @brief Floating-point matrix inverse.    
+ * @param[in]       *pSrc points to input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @return     		The function returns    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> if the input matrix is not square or if the size    
+ * of the output matrix does not match the size of the input matrix.    
+ * If the input matrix is found to be singular (non-invertible), then the function returns    
+ * <code>ARM_MATH_SINGULAR</code>.  Otherwise, the function returns <code>ARM_MATH_SUCCESS</code>.    
+ */
+
+arm_status arm_mat_inverse_f64(
+  const arm_matrix_instance_f64 * pSrc,
+  arm_matrix_instance_f64 * pDst)
+{
+  float64_t *pIn = pSrc->pData;                  /* input data matrix pointer */
+  float64_t *pOut = pDst->pData;                 /* output data matrix pointer */
+  float64_t *pInT1, *pInT2;                      /* Temporary input data matrix pointer */
+  float64_t *pOutT1, *pOutT2;                    /* Temporary output data matrix pointer */
+  float64_t *pPivotRowIn, *pPRT_in, *pPivotRowDst, *pPRT_pDst;  /* Temporary input and output data matrix pointer */
+  uint32_t numRows = pSrc->numRows;              /* Number of rows in the matrix  */
+  uint32_t numCols = pSrc->numCols;              /* Number of Cols in the matrix  */
+
+#ifndef ARM_MATH_CM0_FAMILY
+  float64_t maxC;                                /* maximum value in the column */
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float64_t Xchg, in = 0.0f, in1;                /* Temporary input values  */
+  uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l;      /* loop counters */
+  arm_status status;                             /* status of matrix inverse */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols)
+     || (pSrc->numRows != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+
+    /*--------------------------------------------------------------------------------------------------------------
+     * Matrix Inverse can be solved using elementary row operations.
+     *
+     *  Gauss-Jordan Method:
+     *
+     *      1. First combine the identity matrix and the input matrix separated by a bar to form an
+     *         augmented matrix as follows:
+     *                      _                  _         _         _
+     *                     |  a11  a12 | 1   0  |       |  X11 X12  |
+     *                     |           |        |   =   |           |
+     *                     |_ a21  a22 | 0   1 _|       |_ X21 X22 _|
+     *
+     *      2. In our implementation, the pDst matrix is used as the identity matrix.
+     *
+     *      3. Begin with the first row. Let i = 1.
+     *
+     *      4. Check whether the pivot for column i has the greatest magnitude in that column.
+     *         The pivot is the element of the main diagonal that is on the current row.
+     *         For instance, if working with row i, then the pivot element is aii.
+     *         If the pivot is not the most significant value of the column, exchange that row with a row
+     *         below it that does contain the most significant value in column i. If the most
+     *         significant value of the column is zero, then an inverse to that matrix does not exist.
+     *         The most significant value of the column is the absolute maximum.
+     *
+     *      5. Divide every element of row i by the pivot.
+     *
+     *      6. For every row above and below row i, replace that row with the sum of that row and
+     *         a multiple of row i so that each new element in column i outside row i is zero.
+     *
+     *      7. Move to the next row and column and repeat steps 4 through 6 until you have zeros
+     *         for every element below and above the main diagonal.
+     *
+     *      8. Now an identity matrix is formed to the left of the bar (input matrix, pSrc).
+     *         Therefore, the matrix to the right of the bar is our solution (output matrix, pDst).
+     *----------------------------------------------------------------------------------------------------------------*/
+
+    /* Working pointer for destination matrix */
+    pOutT1 = pOut;
+
+    /* Loop over the number of rows */
+    rowCnt = numRows;
+
+    /* Making the destination matrix as identity matrix */
+    while(rowCnt > 0u)
+    {
+      /* Writing all zeroes in lower triangle of the destination matrix */
+      j = numRows - rowCnt;
+      while(j > 0u)
+      {
+        *pOutT1++ = 0.0f;
+        j--;
+      }
+
+      /* Writing all ones in the diagonal of the destination matrix */
+      *pOutT1++ = 1.0f;
+
+      /* Writing all zeroes in upper triangle of the destination matrix */
+      j = rowCnt - 1u;
+      while(j > 0u)
+      {
+        *pOutT1++ = 0.0f;
+        j--;
+      }
+
+      /* Decrement the loop counter */
+      rowCnt--;
+    }
+
+    /* Loop over the number of columns of the input matrix.    
+       All the elements in each column are processed by the row operations */
+    loopCnt = numCols;
+
+    /* Index modifier to navigate through the columns */
+    l = 0u;
+
+    while(loopCnt > 0u)
+    {
+      /* Find the most significant (largest absolute) value in the pivot column,
+       * looking at the pivot row and the rows below it.
+       * If the pivot is not that value, interchange the row with the row that holds it.
+       * If that value is zero, then the matrix is Singular. */
+
+      /* Working pointer for the input matrix that points    
+       * to the pivot element of the particular row  */
+      pInT1 = pIn + (l * numCols);
+
+      /* Working pointer for the destination matrix that points    
+       * to the pivot element of the particular row  */
+      pOutT1 = pOut + (l * numCols);
+
+      /* Temporary variable to hold the pivot value */
+      in = *pInT1;
+
+      /* Grab the most significant value from column l */
+      maxC = 0;
+      for (i = l; i < numRows; i++)
+      {
+        maxC = *pInT1 > 0 ? (*pInT1 > maxC ? *pInT1 : maxC) : (-*pInT1 > maxC ? -*pInT1 : maxC);
+        pInT1 += numCols;
+      }
+
+      /* Update the status if the matrix is singular */
+      if(maxC == 0.0f)
+      {
+        return ARM_MATH_SINGULAR;
+      }
+
+      /* Restore pInT1  */
+      pInT1 = pIn;
+
+      /* Destination pointer modifier */
+      k = 1u;
+      
+      /* Check if the pivot element is the most significant of the column */
+      if( (in > 0.0f ? in : -in) != maxC)
+      {
+        /* Loop over the number of rows present below */
+        i = numRows - (l + 1u);
+
+        while(i > 0u)
+        {
+          /* Update the input and destination pointers */
+          pInT2 = pInT1 + (numCols * l);
+          pOutT2 = pOutT1 + (numCols * k);
+
+          /* Look for the most significant element to    
+           * replace in the rows below */
+          if((*pInT2 > 0.0f ? *pInT2: -*pInT2) == maxC)
+          {
+            /* Loop over number of columns
+             * to the right of the pivot element */
+            j = numCols - l;
+
+            while(j > 0u)
+            {
+              /* Exchange the row elements of the input matrix */
+              Xchg = *pInT2;
+              *pInT2++ = *pInT1;
+              *pInT1++ = Xchg;
+
+              /* Decrement the loop counter */
+              j--;
+            }
+
+            /* Loop over number of columns of the destination matrix */
+            j = numCols;
+
+            while(j > 0u)
+            {
+              /* Exchange the row elements of the destination matrix */
+              Xchg = *pOutT2;
+              *pOutT2++ = *pOutT1;
+              *pOutT1++ = Xchg;
+
+              /* Decrement the loop counter */
+              j--;
+            }
+
+            /* Flag to indicate whether exchange is done or not */
+            flag = 1u;
+
+            /* Break after exchange is done */
+            break;
+          }
+
+          /* Update the destination pointer modifier */
+          k++;
+
+          /* Decrement the loop counter */
+          i--;
+        }
+      }
+
+      /* Update the status if the matrix is singular */
+      if((flag != 1u) && (in == 0.0f))
+      {
+        return ARM_MATH_SINGULAR;
+      }
+
+      /* Points to the pivot row of input and destination matrices */
+      pPivotRowIn = pIn + (l * numCols);
+      pPivotRowDst = pOut + (l * numCols);
+
+      /* Temporary pointers to the pivot row pointers */
+      pInT1 = pPivotRowIn;
+      pInT2 = pPivotRowDst;
+
+      /* Pivot element of the row */
+      in = *pPivotRowIn;
+
+      /* Loop over number of columns
+       * to the right of the pivot element */
+      j = (numCols - l);
+
+      while(j > 0u)
+      {
+        /* Divide each element of the row of the input matrix    
+         * by the pivot element */
+        in1 = *pInT1;
+        *pInT1++ = in1 / in;
+
+        /* Decrement the loop counter */
+        j--;
+      }
+
+      /* Loop over number of columns of the destination matrix */
+      j = numCols;
+
+      while(j > 0u)
+      {
+        /* Divide each element of the row of the destination matrix    
+         * by the pivot element */
+        in1 = *pInT2;
+        *pInT2++ = in1 / in;
+
+        /* Decrement the loop counter */
+        j--;
+      }
+
+      /* Replace the rows with the sum of that row and a multiple of row i
+       * so that each new element in column i above and below row i is zero.*/
+
+      /* Temporary pointers for input and destination matrices */
+      pInT1 = pIn;
+      pInT2 = pOut;
+
+      /* index used to check for pivot element */
+      i = 0u;
+
+      /* Loop over number of rows */
+      /*  to be replaced by the sum of that row and a multiple of row i */
+      k = numRows;
+
+      while(k > 0u)
+      {
+        /* Check for the pivot element */
+        if(i == l)
+        {
+          /* If the processing element is the pivot element,    
+             only the columns to the right are to be processed */
+          pInT1 += numCols - l;
+
+          pInT2 += numCols;
+        }
+        else
+        {
+          /* Element of the reference row */
+          in = *pInT1;
+
+          /* Working pointers for input and destination pivot rows */
+          pPRT_in = pPivotRowIn;
+          pPRT_pDst = pPivotRowDst;
+
+          /* Loop over the number of columns to the right of the pivot element,    
+             to replace the elements in the input matrix */
+          j = (numCols - l);
+
+          while(j > 0u)
+          {
+            /* Replace the element by the sum of that row    
+               and a multiple of the reference row  */
+            in1 = *pInT1;
+            *pInT1++ = in1 - (in * *pPRT_in++);
+
+            /* Decrement the loop counter */
+            j--;
+          }
+
+          /* Loop over the number of columns to    
+             replace the elements in the destination matrix */
+          j = numCols;
+
+          while(j > 0u)
+          {
+            /* Replace the element by the sum of that row    
+               and a multiple of the reference row  */
+            in1 = *pInT2;
+            *pInT2++ = in1 - (in * *pPRT_pDst++);
+
+            /* Decrement the loop counter */
+            j--;
+          }
+
+        }
+
+        /* Increment the temporary input pointer */
+        pInT1 = pInT1 + l;
+
+        /* Decrement the loop counter */
+        k--;
+
+        /* Increment the pivot index */
+        i++;
+      }
+
+      /* Increment the input pointer */
+      pIn++;
+
+      /* Decrement the loop counter */
+      loopCnt--;
+
+      /* Increment the index modifier */
+      l++;
+    }
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  float64_t Xchg, in = 0.0f;                     /* Temporary input values  */
+  uint32_t i, rowCnt, flag = 0u, j, loopCnt, k, l;      /* loop counters */
+  arm_status status;                             /* status of matrix inverse */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pSrc->numCols) || (pDst->numRows != pDst->numCols)
+     || (pSrc->numRows != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+  {
+
+    /*--------------------------------------------------------------------------------------------------------------       
+	 * Matrix Inverse can be solved using elementary row operations.        
+	 *        
+	 *	Gauss-Jordan Method:       
+	 *	 	       
+	 *	   1. First combine the identity matrix and the input matrix separated by a bar to form an        
+	 *        augmented matrix as follows:        
+	 *				        _  _	      _	    _	   _   _         _	       _       
+	 *					   |  |  a11  a12  | | | 1   0  |   |       |  X11 X12  |         
+	 *					   |  |            | | |        |   |   =   |           |        
+	 *					   |_ |_ a21  a22 _| | |_0   1 _|  _|       |_ X21 X22 _|
+	 *					          
+	 *		2. In our implementation, pDst Matrix is used as identity matrix.    
+	 *       
+	 *		3. Begin with the first row. Let i = 1.       
+	 *       
+	 *	    4. Check to see if the pivot for row i is zero.       
+	 *		   The pivot is the element of the main diagonal that is on the current row.       
+	 *		   For instance, if working with row i, then the pivot element is aii.       
+	 *		   If the pivot is zero, exchange that row with a row below it that does not        
+	 *		   contain a zero in column i. If this is not possible, then an inverse        
+	 *		   to that matrix does not exist.       
+	 *	       
+	 *	    5. Divide every element of row i by the pivot.       
+	 *	       
+	 *	    6. For every row above and below row i, replace that row with the sum of that row and
+	 *		   a multiple of row i so that each new element in column i outside row i is zero.
+	 *
+	 *	    7. Move to the next row and column and repeat steps 4 through 6 until you have zeros
+	 *		   for every element below and above the main diagonal.
+	 *
+	 *		8. Now an identity matrix is formed to the left of the bar(input matrix, src).
+	 *		   Therefore, the matrix to the right of the bar is our solution(dst matrix, dst).         
+	 *----------------------------------------------------------------------------------------------------------------*/
+
+    /* Working pointer for destination matrix */
+    pOutT1 = pOut;
+
+    /* Loop over the number of rows */
+    rowCnt = numRows;
+
+    /* Making the destination matrix as identity matrix */
+    while(rowCnt > 0u)
+    {
+      /* Writing all zeroes in lower triangle of the destination matrix */
+      j = numRows - rowCnt;
+      while(j > 0u)
+      {
+        *pOutT1++ = 0.0f;
+        j--;
+      }
+
+      /* Writing all ones in the diagonal of the destination matrix */
+      *pOutT1++ = 1.0f;
+
+      /* Writing all zeroes in upper triangle of the destination matrix */
+      j = rowCnt - 1u;
+      while(j > 0u)
+      {
+        *pOutT1++ = 0.0f;
+        j--;
+      }
+
+      /* Decrement the loop counter */
+      rowCnt--;
+    }
+
+    /* Loop over the number of columns of the input matrix.     
+       All the elements in each column are processed by the row operations */
+    loopCnt = numCols;
+
+    /* Index modifier to navigate through the columns */
+    l = 0u;
+    //for(loopCnt = 0u; loopCnt < numCols; loopCnt++)   
+    while(loopCnt > 0u)
+    {
+      /* Check if the pivot element is zero.
+       * If it is zero then interchange the row with non zero row below.   
+       * If there is no non zero element to replace in the rows below,   
+       * then the matrix is Singular. */
+
+      /* Working pointer for the input matrix that points     
+       * to the pivot element of the particular row  */
+      pInT1 = pIn + (l * numCols);
+
+      /* Working pointer for the destination matrix that points     
+       * to the pivot element of the particular row  */
+      pOutT1 = pOut + (l * numCols);
+
+      /* Temporary variable to hold the pivot value */
+      in = *pInT1;
+
+      /* Destination pointer modifier */
+      k = 1u;
+
+      /* Check if the pivot element is zero */
+      if(*pInT1 == 0.0f)
+      {
+        /* Loop over the number of rows present below */
+        for (i = (l + 1u); i < numRows; i++)
+        {
+          /* Update the input and destination pointers */
+          pInT2 = pInT1 + (numCols * l);
+          pOutT2 = pOutT1 + (numCols * k);
+
+          /* Check if there is a non zero pivot element to     
+           * replace in the rows below */
+          if(*pInT2 != 0.0f)
+          {
+            /* Loop over number of columns
+             * to the right of the pivot element */
+            for (j = 0u; j < (numCols - l); j++)
+            {
+              /* Exchange the row elements of the input matrix */
+              Xchg = *pInT2;
+              *pInT2++ = *pInT1;
+              *pInT1++ = Xchg;
+            }
+
+            for (j = 0u; j < numCols; j++)
+            {
+              Xchg = *pOutT2;
+              *pOutT2++ = *pOutT1;
+              *pOutT1++ = Xchg;
+            }
+
+            /* Flag to indicate whether exchange is done or not */
+            flag = 1u;
+
+            /* Break after exchange is done */
+            break;
+          }
+
+          /* Update the destination pointer modifier */
+          k++;
+        }
+      }
+
+      /* Update the status if the matrix is singular */
+      if((flag != 1u) && (in == 0.0f))
+      {
+        return ARM_MATH_SINGULAR;
+      }
+
+      /* Points to the pivot row of input and destination matrices */
+      pPivotRowIn = pIn + (l * numCols);
+      pPivotRowDst = pOut + (l * numCols);
+
+      /* Temporary pointers to the pivot row pointers */
+      pInT1 = pPivotRowIn;
+      pOutT1 = pPivotRowDst;
+
+      /* Pivot element of the row */
+      in = *(pIn + (l * numCols));
+
+      /* Loop over number of columns
+       * to the right of the pivot element */
+      for (j = 0u; j < (numCols - l); j++)
+      {
+        /* Divide each element of the row of the input matrix     
+         * by the pivot element */
+        *pInT1 = *pInT1 / in;
+        pInT1++;
+      }
+      for (j = 0u; j < numCols; j++)
+      {
+        /* Divide each element of the row of the destination matrix     
+         * by the pivot element */
+        *pOutT1 = *pOutT1 / in;
+        pOutT1++;
+      }
+
+      /* Replace the rows with the sum of that row and a multiple of row i
+       * so that each new element in column i above and below row i is zero.*/
+
+      /* Temporary pointers for input and destination matrices */
+      pInT1 = pIn;
+      pOutT1 = pOut;
+
+      for (i = 0u; i < numRows; i++)
+      {
+        /* Check for the pivot element */
+        if(i == l)
+        {
+          /* If the processing element is the pivot element,     
+             only the columns to the right are to be processed */
+          pInT1 += numCols - l;
+          pOutT1 += numCols;
+        }
+        else
+        {
+          /* Element of the reference row */
+          in = *pInT1;
+
+          /* Working pointers for input and destination pivot rows */
+          pPRT_in = pPivotRowIn;
+          pPRT_pDst = pPivotRowDst;
+
+          /* Loop over the number of columns to the right of the pivot element,     
+             to replace the elements in the input matrix */
+          for (j = 0u; j < (numCols - l); j++)
+          {
+            /* Replace the element by the sum of that row     
+               and a multiple of the reference row  */
+            *pInT1 = *pInT1 - (in * *pPRT_in++);
+            pInT1++;
+          }
+          /* Loop over the number of columns to     
+             replace the elements in the destination matrix */
+          for (j = 0u; j < numCols; j++)
+          {
+            /* Replace the element by the sum of that row     
+               and a multiple of the reference row  */
+            *pOutT1 = *pOutT1 - (in * *pPRT_pDst++);
+            pOutT1++;
+          }
+
+        }
+        /* Increment the temporary input pointer */
+        pInT1 = pInT1 + l;
+      }
+      /* Increment the input pointer */
+      pIn++;
+
+      /* Decrement the loop counter */
+      loopCnt--;
+      /* Increment the index modifier */
+      l++;
+    }
+
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+
+    if((flag != 1u) && (in == 0.0f))
+    {
+      pIn = pSrc->pData;
+      for (i = 0; i < numRows * numCols; i++)
+      {
+        if (pIn[i] != 0.0f)
+            break;
+      }
+      
+      if (i == numRows * numCols)
+        status = ARM_MATH_SINGULAR;
+    }
+  }
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixInv group    
+ */
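
Reviewer note: the Gauss-Jordan steps described in the comment block of the file above can be sketched in a few lines of plain C. This is a minimal illustration under stated assumptions, not the CMSIS-DSP implementation: `gauss_jordan_inverse` and `abs_d` are hypothetical names, the matrices are assumed to be square, row-major `double` arrays, and the input is overwritten (it ends up as the identity), just as the library routine modifies its source matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: absolute value without pulling in <math.h>. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical reference routine, not part of CMSIS-DSP.
 * Inverts the n x n row-major matrix `a` into `inv`.
 * `a` is overwritten and holds the identity on success.
 * Returns 0 on success, -1 if the matrix is singular. */
int gauss_jordan_inverse(double *a, double *inv, size_t n)
{
    assert(a != NULL && inv != NULL);

    /* Steps 1-2: the destination starts out as the identity matrix. */
    for (size_t r = 0; r < n; r++)
        for (size_t c = 0; c < n; c++)
            inv[r * n + c] = (r == c) ? 1.0 : 0.0;

    for (size_t p = 0; p < n; p++) {
        /* Step 4: pick the row at or below p with the largest |a[row][p]|. */
        size_t best = p;
        for (size_t r = p + 1; r < n; r++)
            if (abs_d(a[r * n + p]) > abs_d(a[best * n + p]))
                best = r;

        /* An all-zero pivot column means no inverse exists. */
        if (a[best * n + p] == 0.0)
            return -1;

        /* Exchange the pivot row with the chosen row in both matrices. */
        if (best != p) {
            for (size_t c = 0; c < n; c++) {
                double t = a[p * n + c];
                a[p * n + c] = a[best * n + c];
                a[best * n + c] = t;
                t = inv[p * n + c];
                inv[p * n + c] = inv[best * n + c];
                inv[best * n + c] = t;
            }
        }

        /* Step 5: divide the pivot row by the pivot. */
        double pivot = a[p * n + p];
        for (size_t c = 0; c < n; c++) {
            a[p * n + c] /= pivot;
            inv[p * n + c] /= pivot;
        }

        /* Step 6: clear column p in every other row, above and below. */
        for (size_t r = 0; r < n; r++) {
            if (r == p)
                continue;
            double f = a[r * n + p];
            for (size_t c = 0; c < n; c++) {
                a[r * n + c] -= f * a[p * n + c];
                inv[r * n + c] -= f * inv[p * n + c];
            }
        }
    }
    return 0;
}
```

For a 2 x 2 input such as `{4, 7, 2, 6}` the routine should fill `inv` with approximately `{0.6, -0.7, -0.2, 0.4}`. The sketch always processes full rows for clarity, whereas the library code only touches the columns to the right of the pivot in the source matrix.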

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,286 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_mult_f32.c    
+*    
+* Description:  Floating-point matrix multiplication.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @defgroup MatrixMult Matrix Multiplication    
+ *    
+ * Multiplies two matrices.    
+ *    
+ * \image html MatrixMultiplication.gif "Multiplication of two 3 x 3 matrices"    
+    
+ * Matrix multiplication is only defined if the number of columns of the    
+ * first matrix equals the number of rows of the second matrix.    
+ * Multiplying an <code>M x N</code> matrix with an <code>N x P</code> matrix results    
+ * in an <code>M x P</code> matrix.    
+ * When matrix size checking is enabled, the functions check: (1) that the inner dimensions of    
+ * <code>pSrcA</code> and <code>pSrcB</code> are equal; and (2) that the size of the output    
+ * matrix equals the outer dimensions of <code>pSrcA</code> and <code>pSrcB</code>.    
+ */
+
+
+/**    
+ * @addtogroup MatrixMult    
+ * @{    
+ */
+
+/**    
+ * @brief Floating-point matrix multiplication.    
+ * @param[in]       *pSrcA points to the first input matrix structure    
+ * @param[in]       *pSrcB points to the second input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @return     		The function returns either    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ */
+
+arm_status arm_mat_mult_f32(
+  const arm_matrix_instance_f32 * pSrcA,
+  const arm_matrix_instance_f32 * pSrcB,
+  arm_matrix_instance_f32 * pDst)
+{
+  float32_t *pIn1 = pSrcA->pData;                /* input data matrix pointer A */
+  float32_t *pIn2 = pSrcB->pData;                /* input data matrix pointer B */
+  float32_t *pInA = pSrcA->pData;                /* input data matrix pointer A  */
+  float32_t *pOut = pDst->pData;                 /* output data matrix pointer */
+  float32_t *px;                                 /* Temporary output data matrix pointer */
+  float32_t sum;                                 /* Accumulator */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t in1, in2, in3, in4;
+  uint16_t col, i = 0u, j, row = numRowsA, colCnt;      /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* Output pointer is set to starting address of the row being processed */
+      px = pOut + i;
+
+      /* For each row processed, initialize the column loop counter */
+      col = numColsB;
+
+      /* For every row wise process, the pIn2 pointer is set    
+       ** to the starting address of the pSrcB data */
+      pIn2 = pSrcB->pData;
+
+      j = 0u;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sum = 0.0f;
+
+        /* Initialize the pointer pIn1 to point to the starting address of the row being processed */
+        pIn1 = pInA;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        colCnt = numColsA >> 2u;
+
+        /* matrix multiplication        */
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          in3 = *pIn2;
+          pIn2 += numColsB;
+          in1 = pIn1[0];
+          in2 = pIn1[1];
+          sum += in1 * in3;
+          in4 = *pIn2;
+          pIn2 += numColsB;
+          sum += in2 * in4;
+
+          in3 = *pIn2;
+          pIn2 += numColsB;
+          in1 = pIn1[2];
+          in2 = pIn1[3];
+          sum += in1 * in3;
+          in4 = *pIn2;
+          pIn2 += numColsB;
+          sum += in2 * in4;
+          pIn1 += 4u;
+
+          /* Decrement the loop count */
+          colCnt--;
+        }
+
+        /* If the number of columns of pSrcA is not a multiple of 4, compute any remaining MACs here.
+         ** No loop unrolling is used. */
+        colCnt = numColsA % 0x4u;
+
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          sum += *pIn1++ * (*pIn2);
+          pIn2 += numColsB;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* Store the result in the destination buffer */
+        *px++ = sum;
+
+        /* Update the pointer pIn2 to point to the  starting address of the next column */
+        j++;
+        pIn2 = pSrcB->pData + j;
+
+        /* Decrement the column loop counter */
+        col--;
+
+      } while(col > 0u);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  float32_t *pInB = pSrcB->pData;                /* input data matrix pointer B */
+  uint16_t col, i = 0u, row = numRowsA, colCnt;  /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* The following loop performs the dot-product of each row in pInA with each column in pInB */
+    /* row loop */
+    do
+    {
+      /* Output pointer is set to starting address of the row being processed */
+      px = pOut + i;
+
+      /* For each row processed, initialize the column loop counter */
+      col = numColsB;
+
+      /* For every row wise process, the pIn2 pointer is set     
+       ** to the starting address of the pSrcB data */
+      pIn2 = pSrcB->pData;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sum = 0.0f;
+
+        /* Initialize the pointer pIn1 to point to the starting address of the row being processed */
+        pIn1 = pInA;
+
+        /* Perform numColsA MAC operations, one per column of matrix A */
+        colCnt = numColsA;
+
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          sum += *pIn1++ * (*pIn2);
+          pIn2 += numColsB;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* Store the result in the destination buffer */
+        *px++ = sum;
+
+        /* Decrement the column loop counter */
+        col--;
+
+        /* Update the pointer pIn2 to point to the  starting address of the next column */
+        pIn2 = pInB + (numColsB - col);
+
+      } while(col > 0u);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+      /* Update the pointer pInA to point to the  starting address of the next row */
+      i = i + numColsB;
+      pInA = pInA + numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixMult group    
+ */
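
Reviewer note: the dot-product formula quoted in the comments of the file above, together with the size check described in its documentation block, can be written as a straightforward triple loop. The following is only a reference sketch under stated assumptions: `mat_f32` and `mat_mult_ref` are hypothetical names that mirror the `numRows`, `numCols` and `pData` fields used in the diff, storage is assumed to be row-major, and the return codes 0 and -1 stand in for the library's status values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's matrix instance structure. */
typedef struct {
    uint16_t numRows;
    uint16_t numCols;
    float *pData;    /* row-major element storage */
} mat_f32;

/* Hypothetical reference multiply, not part of CMSIS-DSP.
 * Computes C = A * B. Returns 0 on success, -1 on a size mismatch. */
int mat_mult_ref(const mat_f32 *A, const mat_f32 *B, mat_f32 *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    /* Inner dimensions must agree, and C must be numRowsA x numColsB. */
    if (A->numCols != B->numRows ||
        A->numRows != C->numRows ||
        B->numCols != C->numCols)
        return -1;

    for (size_t row = 0; row < A->numRows; row++) {
        for (size_t col = 0; col < B->numCols; col++) {
            /* Dot product of row `row` of A with column `col` of B. */
            float sum = 0.0f;
            for (size_t k = 0; k < A->numCols; k++)
                sum += A->pData[row * A->numCols + k] *
                       B->pData[k * B->numCols + col];
            C->pData[row * C->numCols + col] = sum;
        }
    }
    return 0;
}
```

Unlike the library routine, this sketch always performs the size check and does no loop unrolling, which makes it convenient as a slow cross-check for the optimized code paths.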

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_fast_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_fast_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,369 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_mult_fast_q15.c    
+*    
+* Description:	 Q15 matrix multiplication (fast variant)    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixMult    
+ * @{    
+ */
+
+
+/**    
+ * @brief Q15 matrix multiplication (fast variant) for Cortex-M3 and Cortex-M4    
+ * @param[in]       *pSrcA points to the first input matrix structure    
+ * @param[in]       *pSrcB points to the second input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @param[in]		*pState points to the array for storing intermediate results    
+ * @return     		The function returns either    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The difference between the function arm_mat_mult_q15() and this fast variant is that    
+ * the fast variant uses a 32-bit rather than a 64-bit accumulator.
+ * The result of each 1.15 x 1.15 multiplication is truncated to        
+ * 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30        
+ * format. Finally, the accumulator is saturated and converted to a 1.15 result.        
+ *        
+ * \par        
+ * The fast version has the same overflow behavior as the standard version but provides        
+ * less precision since it discards the low 16 bits of each multiplication result.        
+ * In order to avoid overflows completely the input signals must be scaled down.        
+ * Scale down one of the input matrices by log2(numColsA) bits to        
+ * avoid overflows, as a total of numColsA additions are computed internally for each        
+ * output element.        
+ *        
+ * \par    
+ * See <code>arm_mat_mult_q15()</code> for a slower implementation of this function    
+ * which uses 64-bit accumulation to provide higher precision.    
+ */
+
+arm_status arm_mat_mult_fast_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst,
+  q15_t * pState)
+{
+  q31_t sum;                                     /* accumulator */
+  q15_t *pSrcBT = pState;                        /* input data matrix pointer for transpose */
+  q15_t *pInA = pSrcA->pData;                    /* input data matrix pointer A of Q15 type */
+  q15_t *pInB = pSrcB->pData;                    /* input data matrix pointer B of Q15 type */
+  q15_t *px;                                     /* Temporary output data matrix pointer */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A    */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+  uint16_t numRowsB = pSrcB->numRows;            /* number of rows of input matrix B    */
+  uint16_t col, i = 0u, row = numRowsB, colCnt;  /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  q31_t in;                                      /* Temporary variable to hold the input value */
+  q31_t inA1, inA2, inB1, inB2;
+
+#else
+
+  q15_t in;                                      /* Temporary variable to hold the input value */
+  q15_t inA1, inA2, inB1, inB2;
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif
+  {
+    /* Matrix transpose */
+    do
+    {
+      /* Apply loop unrolling and exchange the columns with row elements */
+      col = numColsB >> 2;
+
+      /* The pointer px is set to starting address of the column being processed */
+      px = pSrcBT + i;
+
+      /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+       ** a second loop below computes the remaining 1 to 3 samples. */
+      while(col > 0u)
+      {
+#ifndef UNALIGNED_SUPPORT_DISABLE
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        /* Unpack and store one element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *px = (q15_t) in;
+
+#else
+
+        *px = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Unpack and store the second element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *px = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#else
+
+        *px = (q15_t) in;
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        /* Unpack and store one element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *px = (q15_t) in;
+
+#else
+
+        *px = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Unpack and store the second element in the destination */
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *px = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#else
+
+        *px = (q15_t) in;
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+#else
+
+        /* Read one element from the row */
+        in = *pInB++;
+
+        /* Store one element in the destination */
+        *px = in;
+ 
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Read one element from the row */
+        in = *pInB++;
+
+        /* Store one element in the destination */
+        *px = in;
+ 
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Read one element from the row */
+        in = *pInB++;
+
+        /* Store one element in the destination */
+        *px = in;
+ 
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Read one element from the row */
+        in = *pInB++;
+
+        /* Store one element in the destination */
+        *px = in;
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+        
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+      /* If the number of columns of pSrcB is not a multiple of 4, copy the remaining elements here.
+       ** No loop unrolling is used. */
+      col = numColsB % 0x4u;
+
+      while(col > 0u)
+      {
+        /* Read and store the input element in the destination */
+        *px = *pInB++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+      i++;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* Reset the variables for the usage in the following multiplication process */
+    row = numRowsA;
+    i = 0u;
+    px = pDst->pData;
+
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* For every row wise process, the column loop counter is to be initiated */
+      col = numColsB;
+
+      /* For each row of pSrcA, the pInB pointer is reset
+       ** to the starting address of the transposed pSrcB data */
+      pInB = pSrcBT;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sum = 0;
+
+        /* Apply loop unrolling and compute 4 MACs per iteration. */
+        colCnt = numColsA >> 2;
+
+        /* Set the pointer pInA to the starting address of the row of pSrcA being processed */
+        pInA = pSrcA->pData + i;
+
+        /* matrix multiplication */
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+          inA1 = *__SIMD32(pInA)++;
+          inB1 = *__SIMD32(pInB)++;
+          inA2 = *__SIMD32(pInA)++;
+          inB2 = *__SIMD32(pInB)++;
+
+          sum = __SMLAD(inA1, inB1, sum);
+          sum = __SMLAD(inA2, inB2, sum);
+
+#else
+
+          inA1 = *pInA++;
+          inB1 = *pInB++;
+          inA2 = *pInA++;
+          sum += inA1 * inB1;
+          inB2 = *pInB++;
+
+          inA1 = *pInA++;
+          inB1 = *pInB++;
+          sum += inA2 * inB2;
+          inA2 = *pInA++;
+          inB2 = *pInB++;
+
+          sum += inA1 * inB1;
+          sum += inA2 * inB2;
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* process the remaining 1 to 3 column samples */
+        colCnt = numColsA % 0x4u;
+
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          sum += (q31_t) (*pInA++) * (*pInB++);
+
+          colCnt--;
+        }
+
+        /* Convert the 2.30 accumulator to 1.15 by truncation (no saturation) and store the result */
+        *px = (q15_t) (sum >> 15);
+        px++;
+
+        /* Decrement the column loop counter */
+        col--;
+
+      } while(col > 0u);
+
+      i = i + numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**        
+ * @} end of MatrixMult group        
+ */
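The scaling notes for arm_mat_mult_fast_q15() can be illustrated with a portable sketch. This is not the library code: the helper names sat_q15, dot_q15_wide and dot_q15_narrow are hypothetical, the intrinsics are replaced by plain C, and the sketch assumes a two's-complement target where right-shifting a negative value is an arithmetic shift and narrowing conversions wrap (true of common compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide value to the Q15 range, modelling what __SSAT(x, 16) does. */
static int16_t sat_q15(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* 64-bit accumulation: 2.30 products summed in 34.30, then saturated to 1.15. */
static int16_t dot_q15_wide(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int64_t sum = 0;
    for (size_t k = 0; k < n; k++) {
        sum += (int32_t)a[k] * b[k];
    }
    return sat_q15(sum >> 15);
}

/* 32-bit accumulation: the sum may wrap, and the result is only truncated. */
static int16_t dot_q15_narrow(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    uint32_t sum = 0;   /* unsigned so that wrap-around is well defined */
    for (size_t k = 0; k < n; k++) {
        sum += (uint32_t)((int32_t)a[k] * b[k]);
    }
    /* Reinterpret as signed 2.30 and drop the low 15 bits; no saturation. */
    return (int16_t)((int32_t)sum >> 15);
}
```

With two elements of 0x4000 (0.5) in each vector, both functions return 0x4000. With four elements of 0x7FFF in each vector, the wide version saturates to 0x7FFF while the narrow version wraps to a negative value, which is why the comment above recommends scaling one input down by log2(numColsA) bits.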

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_fast_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_fast_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,226 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_mult_fast_q31.c    
+*    
+* Description:	 Q31 matrix multiplication (fast variant).    
+*    
+* Target Processor: Cortex-M4/Cortex-M3
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixMult    
+ * @{    
+ */
+
+/**    
+ * @brief Q31 matrix multiplication (fast variant) for Cortex-M3 and Cortex-M4    
+ * @param[in]       *pSrcA points to the first input matrix structure    
+ * @param[in]       *pSrcB points to the second input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @return     		The function returns either    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The difference between the function arm_mat_mult_q31() and this fast variant is that    
+ * the fast variant uses a 32-bit rather than a 64-bit accumulator.
+ * The result of each 1.31 x 1.31 multiplication is truncated to    
+ * 2.30 format. These intermediate results are accumulated in a 32-bit register in 2.30    
+ * format. Finally, the accumulator is shifted left by one bit to give a 1.31 result (without saturation).
+ *    
+ * \par    
+ * The fast version has the same overflow behavior as the standard version but provides    
+ * less precision since it discards the low 32 bits of each multiplication result.    
+ * In order to avoid overflows completely the input signals must be scaled down.    
+ * Scale down one of the input matrices by log2(numColsA) bits to    
+ * avoid overflows, as a total of numColsA additions are computed internally for each    
+ * output element.    
+ *    
+ * \par    
+ * See <code>arm_mat_mult_q31()</code> for a slower implementation of this function    
+ * which uses 64-bit accumulation to provide higher precision.    
+ */
+
+arm_status arm_mat_mult_fast_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst)
+{
+  q31_t *pIn1 = pSrcA->pData;                    /* input data matrix pointer A */
+  q31_t *pIn2 = pSrcB->pData;                    /* input data matrix pointer B */
+  q31_t *pInA = pSrcA->pData;                    /* input data matrix pointer A */
+//  q31_t *pSrcB = pSrcB->pData;                    /* input data matrix pointer B */    
+  q31_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  q31_t *px;                                     /* Temporary output data matrix pointer */
+  q31_t sum;                                     /* Accumulator */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A    */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+  uint16_t col, i = 0u, j, row = numRowsA, colCnt;      /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+  q31_t inA1, inA2, inA3, inA4, inB1, inB2, inB3, inB4;
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* Output pointer is set to starting address of the row being processed */
+      px = pOut + i;
+
+      /* For every row wise process, the column loop counter is to be initiated */
+      col = numColsB;
+
+      /* For every row wise process, the pIn2 pointer is set    
+       ** to the starting address of the pSrcB data */
+      pIn2 = pSrcB->pData;
+
+      j = 0u;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sum = 0;
+
+        /* Initiate the pointer pIn1 to point to the starting address of pInA */
+        pIn1 = pInA;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        colCnt = numColsA >> 2;
+
+
+        /* matrix multiplication */
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          /* Perform the multiply-accumulates */
+          inB1 = *pIn2;
+          pIn2 += numColsB;
+
+          inA1 = pIn1[0];
+          inA2 = pIn1[1];
+
+          inB2 = *pIn2;
+          pIn2 += numColsB;
+
+          inB3 = *pIn2;
+          pIn2 += numColsB;
+
+          sum = (q31_t) ((((q63_t) sum << 32) + ((q63_t) inA1 * inB1)) >> 32);
+          sum = (q31_t) ((((q63_t) sum << 32) + ((q63_t) inA2 * inB2)) >> 32);
+
+          inA3 = pIn1[2];
+          inA4 = pIn1[3];
+
+          inB4 = *pIn2;
+          pIn2 += numColsB;
+
+          sum = (q31_t) ((((q63_t) sum << 32) + ((q63_t) inA3 * inB3)) >> 32);
+          sum = (q31_t) ((((q63_t) sum << 32) + ((q63_t) inA4 * inB4)) >> 32);
+
+          pIn1 += 4u;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* If the number of columns of pSrcA is not a multiple of 4, compute the remaining MACs here.
+         ** No loop unrolling is used. */
+        colCnt = numColsA % 0x4u;
+
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          /* Perform the multiply-accumulates */
+          sum = (q31_t) ((((q63_t) sum << 32) +
+                          ((q63_t) * pIn1++ * (*pIn2))) >> 32);
+          pIn2 += numColsB;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* Convert the result from 2.30 to 1.31 format and store in destination buffer */
+        *px++ = sum << 1;
+
+        /* Update the pointer pIn2 to point to the starting address of the next column */
+        j++;
+        pIn2 = pSrcB->pData + j;
+
+        /* Decrement the column loop counter */
+        col--;
+
+      } while(col > 0u);
+
+      /* Update the pointer pInA to point to the starting address of the next row */
+      i = i + numColsB;
+      pInA = pInA + numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixMult group    
+ */
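The multiply-accumulate step used by arm_mat_mult_fast_q31() keeps only the high 32 bits of each 64-bit product, which is where the precision loss described in its comment comes from. The sketch below is a plain-C illustration, not the library code: mac_hi32, dot_q31_narrow and dot_q31_wide are hypothetical names, dot_q31_wide is only a 64-bit reference for comparison, and the code assumes an arithmetic right shift of negative values, wrapping narrowing conversions, and inputs scaled so the accumulator stays in range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One fast MAC step: add the product to the accumulator, keep the high 32 bits. */
static int32_t mac_hi32(int32_t acc, int32_t a, int32_t b)
{
    /* acc * 2^32 stands in for (q63_t) sum << 32 without left-shifting a negative value. */
    int64_t wide = (int64_t)acc * 4294967296LL + (int64_t)a * b;
    return (int32_t)(wide >> 32);
}

/* 32-bit accumulation in 2.30 format, converted to 1.31 at the end. */
static int32_t dot_q31_narrow(const int32_t *a, const int32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int32_t acc = 0;
    for (size_t k = 0; k < n; k++) {
        acc = mac_hi32(acc, a[k], b[k]);
    }
    /* Shift through unsigned to avoid undefined behaviour on negative values. */
    return (int32_t)((uint32_t)acc << 1);
}

/* 64-bit reference: full 2.62 products are summed before the final shift. */
static int32_t dot_q31_wide(const int32_t *a, const int32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int64_t sum = 0;
    for (size_t k = 0; k < n; k++) {
        sum += (int64_t)a[k] * b[k];
    }
    return (int32_t)(sum >> 31);
}
```

For a single product of 0x40000000 (0.5) with itself, both versions give 0x20000000 (0.25). For the vectors {3, 3} and {0x40000000, 0x40000000}, the narrow version returns 0 because each product loses its low 32 bits before it is accumulated, while the wide reference returns 3.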

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,469 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_mult_q15.c    
+*    
+* Description:	 Q15 matrix multiplication.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixMult    
+ * @{    
+ */
+
+
+/**    
+ * @brief Q15 matrix multiplication    
+ * @param[in]       *pSrcA points to the first input matrix structure    
+ * @param[in]       *pSrcB points to the second input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @param[in]		*pState points to the array for storing intermediate results (unused on Cortex-M0; holds the transpose of pSrcB otherwise)
+ * @return     		The function returns either    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator. The inputs to the    
+ * multiplications are in 1.15 format and multiplications yield a 2.30 result.    
+ * The 2.30 intermediate    
+ * results are accumulated in a 64-bit accumulator in 34.30 format. This approach    
+ * provides 33 guard bits and there is no risk of overflow. The 34.30 result is then    
+ * truncated to 34.15 format by discarding the low 15 bits and then saturated to    
+ * 1.15 format.    
+ *    
+ * \par    
+ * Refer to <code>arm_mat_mult_fast_q15()</code> for a faster but less precise version of this function for Cortex-M3 and Cortex-M4.    
+ *    
+ */
+
+arm_status arm_mat_mult_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst,
+  q15_t * pState CMSIS_UNUSED)
+{
+  q63_t sum;                                     /* accumulator */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q15_t *pSrcBT = pState;                        /* input data matrix pointer for transpose */
+  q15_t *pInA = pSrcA->pData;                    /* input data matrix pointer A of Q15 type */
+  q15_t *pInB = pSrcB->pData;                    /* input data matrix pointer B of Q15 type */
+  q15_t *px;                                     /* Temporary output data matrix pointer */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A    */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+  uint16_t numRowsB = pSrcB->numRows;            /* number of rows of input matrix B    */
+  uint16_t col, i = 0u, row = numRowsB, colCnt;  /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  q31_t in;                                      /* Temporary variable to hold the input value */
+  q31_t pSourceA1, pSourceB1, pSourceA2, pSourceB2;
+
+#else
+
+  q15_t in;                                      /* Temporary variable to hold the input value */
+  q15_t inA1, inB1, inA2, inB2;
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+  {
+    /* Matrix transpose */
+    do
+    {
+      /* Apply loop unrolling and exchange the columns with row elements */
+      col = numColsB >> 2;
+
+      /* The pointer px is set to starting address of the column being processed */
+      px = pSrcBT + i;
+
+      /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.        
+       ** a second loop below computes the remaining 1 to 3 samples. */
+      while(col > 0u)
+      {
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        /* Unpack and store one element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *px = (q15_t) in;
+
+#else
+
+        *px = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Unpack and store the second element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *px = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#else
+
+        *px = (q15_t) in;
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Read two elements from the row */
+        in = *__SIMD32(pInB)++;
+
+        /* Unpack and store one element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *px = (q15_t) in;
+
+#else
+
+        *px = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Unpack and store the second element in the destination */
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *px = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#else
+
+        *px = (q15_t) in;
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+#else
+
+        /* Read one element from the row */
+        in = *pInB++;
+
+        /* Store one element in the destination */
+        *px = in;
+ 
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Read one element from the row */
+        in = *pInB++;
+
+        /* Store one element in the destination */
+        *px = in;
+ 
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Read one element from the row */
+        in = *pInB++;
+
+        /* Store one element in the destination */
+        *px = in;
+ 
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Read one element from the row */
+        in = *pInB++;
+
+        /* Store one element in the destination */
+        *px = in;
+ 
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+       /* Decrement the column loop counter */
+        col--;
+      }
+
+      /* If the columns of pSrcB is not a multiple of 4, compute any remaining output samples here.        
+       ** No loop unrolling is used. */
+      col = numColsB % 0x4u;
+
+      while(col > 0u)
+      {
+        /* Read and store the input element in the destination */
+        *px = *pInB++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += numRowsB;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+      i++;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* Reset the variables for the usage in the following multiplication process */
+    row = numRowsA;
+    i = 0u;
+    px = pDst->pData;
+
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* For every row wise process, the column loop counter is to be initiated */
+      col = numColsB;
+
+      /* For each row of pSrcA, the pInB pointer is reset
+       ** to the starting address of the transposed pSrcB data */
+      pInB = pSrcBT;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sum = 0;
+
+        /* Apply loop unrolling and compute 4 MACs per iteration. */
+        colCnt = numColsA >> 2;
+
+        /* Set the pointer pInA to the starting address of the row of pSrcA being processed */
+        pInA = pSrcA->pData + i;
+
+
+        /* matrix multiplication */
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+          /* Read two Q15 elements at a time from pSrcA and from the transposed pSrcB */
+          pSourceA1 = *__SIMD32(pInA)++;
+          pSourceB1 = *__SIMD32(pInB)++;
+
+          pSourceA2 = *__SIMD32(pInA)++;
+          pSourceB2 = *__SIMD32(pInB)++;
+
+          /* Multiply and accumulate */
+          sum = __SMLALD(pSourceA1, pSourceB1, sum);
+          sum = __SMLALD(pSourceA2, pSourceB2, sum);
+
+#else
+          /* Read one Q15 element at a time from pSrcA and from the transposed pSrcB */
+          inA1 = *pInA++;
+          inB1 = *pInB++;
+          inA2 = *pInA++;
+          /* Multiply and accumulate */
+          sum += inA1 * inB1;
+          inB2 = *pInB++;
+
+          inA1 = *pInA++;
+          inB1 = *pInB++;
+          /* Multiply and accumulate */
+          sum += inA2 * inB2;
+          inA2 = *pInA++;
+          inB2 = *pInB++;
+
+          /* Multiply and accumulate */
+          sum += inA1 * inB1;
+          sum += inA2 * inB2;
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* process remaining column samples */
+        colCnt = numColsA & 3u;
+
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          sum += *pInA++ * *pInB++;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* Saturate and store the result in the destination buffer */
+        *px = (q15_t) (__SSAT((sum >> 15), 16));
+        px++;
+
+        /* Decrement the column loop counter */
+        col--;
+
+      } while(col > 0u);
+
+      i = i + numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q15_t *pIn1 = pSrcA->pData;                    /* input data matrix pointer A */
+  q15_t *pIn2 = pSrcB->pData;                    /* input data matrix pointer B */
+  q15_t *pInA = pSrcA->pData;                    /* input data matrix pointer A of Q15 type */
+  q15_t *pInB = pSrcB->pData;                    /* input data matrix pointer B of Q15 type */
+  q15_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  q15_t *px;                                     /* Temporary output data matrix pointer */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A    */
+  uint16_t col, i = 0u, row = numRowsA, colCnt;  /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* Output pointer is set to starting address of the row being processed */
+      px = pOut + i;
+
+      /* For every row wise process, the column loop counter is to be initiated */
+      col = numColsB;
+
+      /* For every row wise process, the pIn2 pointer is set          
+       ** to the starting address of the pSrcB data */
+      pIn2 = pSrcB->pData;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sum = 0;
+
+        /* Initiate the pointer pIn1 to point to the starting address of pSrcA */
+        pIn1 = pInA;
+
+        /* Matrix A columns number of MAC operations are to be performed */
+        colCnt = numColsA;
+
+        /* matrix multiplication */
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          /* Perform the multiply-accumulates */
+          sum += (q31_t) * pIn1++ * *pIn2;
+          pIn2 += numColsB;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* Convert the result from 34.30 to 34.15 format, saturate it to 1.15,
+         ** and store it in the destination buffer */
+        *px++ = (q15_t) __SSAT((sum >> 15), 16);
+
+        /* Decrement the column loop counter */
+        col--;
+
+        /* Update the pointer pIn2 to point to the starting address of the next column */
+        pIn2 = pInB + (numColsB - col);
+
+      } while(col > 0u);
+
+      /* Update the pointer pInA to point to the starting address of the next row */
+      i = i + numColsB;
+      pInA = pInA + numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**        
+ * @} end of MatrixMult group        
+ */
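
The Q15 row-by-column loop above can be modelled in portable C. This is a minimal sketch for checking results on a host, not the library implementation: the names ref_mat_mult_q15 and sat_q15 are hypothetical, matrices are assumed to be row-major, and the right shift of a negative value is assumed to be arithmetic (true for the usual ARM and x86 compilers, but implementation-defined in C).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: saturate a wide value to the signed 16-bit (1.15) range. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Reference C = A * B for row-major Q15 matrices.
 * A is rows_a x cols_a, B is cols_a x cols_b, C is rows_a x cols_b. */
static void ref_mat_mult_q15(const int16_t *a, const int16_t *b, int16_t *c,
                             size_t rows_a, size_t cols_a, size_t cols_b)
{
    for (size_t r = 0; r < rows_a; r++) {
        for (size_t n = 0; n < cols_b; n++) {
            int64_t sum = 0;                       /* 34.30 accumulator */
            for (size_t k = 0; k < cols_a; k++) {
                /* 1.15 * 1.15 gives a 2.30 product */
                sum += (int32_t)a[r * cols_a + k] * b[k * cols_b + n];
            }
            /* 34.30 -> 1.15, assuming an arithmetic right shift */
            c[r * cols_b + n] = sat_q15(sum >> 15);
        }
    }
}
```

Unlike the diffed code, this model saturates directly from the 64-bit value, so it can be used to spot cases where an intermediate result would not fit in 32 bits.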

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_mult_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,294 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_mult_q31.c    
+*    
+* Description:	 Q31 matrix multiplication.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixMult    
+ * @{    
+ */
+
+/**    
+ * @brief Q31 matrix multiplication    
+ * @param[in]       *pSrcA points to the first input matrix structure    
+ * @param[in]       *pSrcB points to the second input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @return     		The function returns either    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using an internal 64-bit accumulator.    
+ * The accumulator has a 2.62 format and maintains full precision of the intermediate    
+ * multiplication results but provides only a single guard bit. There is no saturation    
+ * on intermediate additions. Thus, if the accumulator overflows it wraps around and    
+ * distorts the result. To avoid intermediate overflows, the caller should scale the    
+ * input data down by log2(numColsA) bits, because numColsA products are accumulated    
+ * for each output element. The 2.62 accumulator is then right shifted by 31 bits to give    
+ * the 1.31 result; the Cortex-M0 path saturates it, the Cortex-M3/M4 path truncates it.    
+ *    
+ * \par    
+ * See <code>arm_mat_mult_fast_q31()</code> for a faster but less precise implementation of this function for Cortex-M3 and Cortex-M4.    
+ *    
+ */
+
+arm_status arm_mat_mult_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst)
+{
+  q31_t *pIn1 = pSrcA->pData;                    /* input data matrix pointer A */
+  q31_t *pIn2 = pSrcB->pData;                    /* input data matrix pointer B */
+  q31_t *pInA = pSrcA->pData;                    /* input data matrix pointer A */
+  q31_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  q31_t *px;                                     /* Temporary output data matrix pointer */
+  q63_t sum;                                     /* Accumulator */
+  uint16_t numRowsA = pSrcA->numRows;            /* number of rows of input matrix A    */
+  uint16_t numColsB = pSrcB->numCols;            /* number of columns of input matrix B */
+  uint16_t numColsA = pSrcA->numCols;            /* number of columns of input matrix A */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  uint16_t col, i = 0u, j, row = numRowsA, colCnt;      /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+  q31_t a0, a1, a2, a3, b0, b1, b2, b3;
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* Output pointer is set to starting address of the row being processed */
+      px = pOut + i;
+
+      /* For every row wise process, the column loop counter is to be initiated */
+      col = numColsB;
+
+      /* For every row wise process, the pIn2 pointer is set    
+       ** to the starting address of the pSrcB data */
+      pIn2 = pSrcB->pData;
+
+      j = 0u;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sum = 0;
+
+        /* Initiate the pointer pIn1 to point to the starting address of pInA */
+        pIn1 = pInA;
+
+        /* Apply loop unrolling and compute 4 MACs simultaneously. */
+        colCnt = numColsA >> 2;
+
+
+        /* matrix multiplication */
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          /* Perform the multiply-accumulates */
+          b0 = *pIn2;
+          pIn2 += numColsB;
+
+          a0 = *pIn1++;
+          a1 = *pIn1++;
+
+          b1 = *pIn2;
+          pIn2 += numColsB;
+          b2 = *pIn2;
+          pIn2 += numColsB;
+
+          sum += (q63_t) a0 *b0;
+          sum += (q63_t) a1 *b1;
+
+          a2 = *pIn1++;
+          a3 = *pIn1++;
+
+          b3 = *pIn2;
+          pIn2 += numColsB;
+
+          sum += (q63_t) a2 *b2;
+          sum += (q63_t) a3 *b3;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* If the number of columns of pSrcA is not a multiple of 4, perform the remaining MACs here.    
+         ** No loop unrolling is used. */
+        colCnt = numColsA % 0x4u;
+
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          /* Perform the multiply-accumulates */
+          sum += (q63_t) * pIn1++ * *pIn2;
+          pIn2 += numColsB;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* Convert the result from 2.62 to 1.31 format and store in destination buffer */
+        *px++ = (q31_t) (sum >> 31);
+
+        /* Update the pointer pIn2 to point to the  starting address of the next column */
+        j++;
+        pIn2 = (pSrcB->pData) + j;
+
+        /* Decrement the column loop counter */
+        col--;
+
+      } while(col > 0u);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q31_t *pInB = pSrcB->pData;                    /* input data matrix pointer B */
+  uint16_t col, i = 0u, row = numRowsA, colCnt;  /* loop counters */
+  arm_status status;                             /* status of matrix multiplication */
+
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numCols != pSrcB->numRows) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcB->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* The following loop performs the dot-product of each row in pSrcA with each column in pSrcB */
+    /* row loop */
+    do
+    {
+      /* Output pointer is set to starting address of the row being processed */
+      px = pOut + i;
+
+      /* For every row wise process, the column loop counter is to be initiated */
+      col = numColsB;
+
+      /* For every row wise process, the pIn2 pointer is set          
+       ** to the starting address of the pSrcB data */
+      pIn2 = pSrcB->pData;
+
+      /* column loop */
+      do
+      {
+        /* Set the variable sum, that acts as accumulator, to zero */
+        sum = 0;
+
+        /* Initiate the pointer pIn1 to point to the starting address of pInA */
+        pIn1 = pInA;
+
+        /* Matrix A columns number of MAC operations are to be performed */
+        colCnt = numColsA;
+
+        /* matrix multiplication */
+        while(colCnt > 0u)
+        {
+          /* c(m,n) = a(1,1)*b(1,1) + a(1,2) * b(2,1) + .... + a(m,p)*b(p,n) */
+          /* Perform the multiply-accumulates */
+          sum += (q63_t) * pIn1++ * *pIn2;
+          pIn2 += numColsB;
+
+          /* Decrement the loop counter */
+          colCnt--;
+        }
+
+        /* Convert the result from 2.62 to 1.31 format and store in destination buffer */
+        *px++ = (q31_t) clip_q63_to_q31(sum >> 31);
+
+        /* Decrement the column loop counter */
+        col--;
+
+        /* Update the pointer pIn2 to point to the  starting address of the next column */
+        pIn2 = pInB + (numColsB - col);
+
+      } while(col > 0u);
+
+#endif
+
+      /* Update the pointer pInA to point to the starting address of the next row */
+      i = i + numColsB;
+      pInA = pInA + numColsA;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixMult group    
+ */
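
The documentation of arm_mat_mult_q31 asks for the input to be scaled down by log2(numColsA) bits before the call. One way a caller might do that is sketched below. The helpers headroom_bits and prescale_q31 are hypothetical and not part of CMSIS-DSP; the sketch assumes that pre-scaling one operand in place is acceptable, rounds log2 up for column counts that are not powers of two, and assumes an arithmetic right shift for negative values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest s with (1 << s) >= n, i.e. ceil(log2(n)). */
static uint32_t headroom_bits(uint32_t n)
{
    uint32_t s = 0;
    while (((uint32_t)1 << s) < n) {
        s++;
    }
    return s;
}

/* Hypothetical helper: scale a Q31 buffer down in place so that num_cols_a
 * products can be accumulated without overflowing the 2.62 accumulator. */
static void prescale_q31(int32_t *data, size_t count, uint32_t num_cols_a)
{
    uint32_t s = headroom_bits(num_cols_a);
    for (size_t i = 0; i < count; i++) {
        data[i] >>= s;   /* arithmetic shift assumed for negative samples */
    }
}
```

The result of the multiplication is then smaller by the same factor, so the caller has to account for the extra shift when interpreting the output.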

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,181 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:        arm_mat_scale_f32.c    
+*    
+* Description:	Multiplies a floating-point matrix by a scalar.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMatrix        
+ */
+
+/**        
+ * @defgroup MatrixScale Matrix Scale        
+ *        
+ * Multiplies a matrix by a scalar.  This is accomplished by multiplying each element in the        
+ * matrix by the scalar.  For example:        
+ * \image html MatrixScale.gif "Matrix Scaling of a 3 x 3 matrix"        
+ *        
+ * The function checks to make sure that the input and output matrices are of the same size.        
+ *        
+ * In the fixed-point Q15 and Q31 functions, <code>scale</code> is represented by        
+ * a fractional multiplication <code>scaleFract</code> and an arithmetic shift <code>shift</code>.        
+ * The shift allows the gain of the scaling operation to exceed 1.0.        
+ * The overall scale factor applied to the fixed-point data is        
+ * <pre>        
+ *     scale = scaleFract * 2^shift.        
+ * </pre>        
+ */
+
+/**        
+ * @addtogroup MatrixScale        
+ * @{        
+ */
+
+/**        
+ * @brief Floating-point matrix scaling.        
+ * @param[in]       *pSrc points to input matrix structure        
+ * @param[in]       scale scale factor to be applied         
+ * @param[out]      *pDst points to output matrix structure        
+ * @return     		The function returns either <code>ARM_MATH_SIZE_MISMATCH</code>         
+ * or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.        
+ *        
+ */
+
+arm_status arm_mat_scale_f32(
+  const arm_matrix_instance_f32 * pSrc,
+  float32_t scale,
+  arm_matrix_instance_f32 * pDst)
+{
+  float32_t *pIn = pSrc->pData;                  /* input data matrix pointer */
+  float32_t *pOut = pDst->pData;                 /* output data matrix pointer */
+  uint32_t numSamples;                           /* total number of elements in the matrix */
+  uint32_t blkCnt;                               /* loop counters */
+  arm_status status;                             /* status of matrix scaling     */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  float32_t in1, in2, in3, in4;                  /* temporary variables */
+  float32_t out1, out2, out3, out4;              /* temporary variables */
+
+#endif //      #ifndef ARM_MATH_CM0_FAMILY
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pDst->numRows) || (pSrc->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+  {
+    /* Total number of samples in the input matrix */
+    numSamples = (uint32_t) pSrc->numRows * pSrc->numCols;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+    /* Loop Unrolling */
+    blkCnt = numSamples >> 2;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) * scale */
+      /* Scaling and results are stored in the destination buffer. */
+      in1 = pIn[0];
+      in2 = pIn[1];
+      in3 = pIn[2];
+      in4 = pIn[3];
+
+      out1 = in1 * scale;
+      out2 = in2 * scale;
+      out3 = in3 * scale;
+      out4 = in4 * scale;
+
+
+      pOut[0] = out1;
+      pOut[1] = out2;
+      pOut[2] = out3;
+      pOut[3] = out4;
+
+      /* Update pointers to process the next samples */
+      pIn += 4u;
+      pOut += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = numSamples % 0x4u;
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) * scale */
+      /* The results are stored in the destination buffer. */
+      *pOut++ = (*pIn++) * scale;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**        
+ * @} end of MatrixScale group        
+ */
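
The MatrixScale group documentation gives the fixed-point scale factor as scale = scaleFract * 2^shift. A sketch of how a caller might split a floating-point gain into that pair for the Q15 function follows. The function split_scale_q15 is hypothetical (it is not a CMSIS-DSP API), it assumes a finite input, and it normalises the magnitude into [0.5, 1) so that scaleFract keeps as many significant bits as possible.

```c
#include <stdint.h>

/* Hypothetical helper: find scale_fract (1.15) and shift such that
 * scale is approximately scale_fract / 32768 * 2^shift.
 * Assumes scale is finite (no NaN or infinity). */
static void split_scale_q15(float scale, int16_t *scale_fract, int32_t *shift)
{
    float mag = (scale < 0.0f) ? -scale : scale;
    int32_t e = 0;

    if (mag == 0.0f) {
        *scale_fract = 0;
        *shift = 0;
        return;
    }

    /* Normalise the magnitude into [0.5, 1) and track the power of two. */
    while (mag >= 1.0f) { mag *= 0.5f; e++; }
    while (mag < 0.5f)  { mag *= 2.0f; e--; }

    /* Round to the nearest 1.15 value. */
    int32_t q = (int32_t)(mag * 32768.0f + 0.5f);
    if (q > 32767) {       /* rounding reached 1.0: renormalise */
        q = 16384;
        e++;
    }

    *scale_fract = (int16_t)((scale < 0.0f) ? -q : q);
    *shift = e;
}
```

For example, a gain of 2.5 splits into scale_fract = 20480 (0.625) and shift = 2. Whether a given shift is usable still depends on the limits of the fixed-point function that consumes it.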

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,183 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_scale_q15.c    
+*    
+* Description:	Multiplies a Q15 matrix by a scalar.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixScale    
+ * @{    
+ */
+
+/**    
+ * @brief Q15 matrix scaling.    
+ * @param[in]       *pSrc points to input matrix    
+ * @param[in]       scaleFract fractional portion of the scale factor    
+ * @param[in]       shift number of bits to shift the result by    
+ * @param[out]      *pDst points to output matrix structure    
+ * @return     		The function returns either    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The input data <code>*pSrc</code> and <code>scaleFract</code> are in 1.15 format.    
+ * These are multiplied to yield a 2.30 intermediate result and this is shifted with saturation to 1.15 format.    
+ */
+
+arm_status arm_mat_scale_q15(
+  const arm_matrix_instance_q15 * pSrc,
+  q15_t scaleFract,
+  int32_t shift,
+  arm_matrix_instance_q15 * pDst)
+{
+  q15_t *pIn = pSrc->pData;                      /* input data matrix pointer */
+  q15_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  uint32_t numSamples;                           /* total number of elements in the matrix */
+  int32_t totShift = 15 - shift;                 /* total shift to apply after scaling */
+  uint32_t blkCnt;                               /* loop counters */
+  arm_status status;                             /* status of matrix scaling     */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  q15_t in1, in2, in3, in4;
+  q31_t out1, out2, out3, out4;
+  q31_t inA1, inA2;
+
+#endif //     #ifndef ARM_MATH_CM0_FAMILY
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch */
+  if((pSrc->numRows != pDst->numRows) || (pSrc->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif //    #ifdef ARM_MATH_MATRIX_CHECK
+  {
+    /* Total number of samples in the input matrix */
+    numSamples = (uint32_t) pSrc->numRows * pSrc->numCols;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+    /* Loop Unrolling */
+    blkCnt = numSamples >> 2;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) * k */
+      /* Scale, saturate and then store the results in the destination buffer. */
+      /* Read 4 Q15 inputs from memory as two packed 32-bit words */
+      inA1 = _SIMD32_OFFSET(pIn);
+      inA2 = _SIMD32_OFFSET(pIn + 2);
+
+      /* C = A * scale */
+      /* Scale the inputs and then store the 2 results in the destination buffer        
+       * in single cycle by packing the outputs */
+      out1 = (q31_t) ((q15_t) (inA1 >> 16) * scaleFract);
+      out2 = (q31_t) ((q15_t) inA1 * scaleFract);
+      out3 = (q31_t) ((q15_t) (inA2 >> 16) * scaleFract);
+      out4 = (q31_t) ((q15_t) inA2 * scaleFract);
+
+      /* Shift the products down to 1.15. inA1 and inA2 are reloaded at the top of
+       * the loop, so no look-ahead read past the end of the input is needed here. */
+      out1 = out1 >> totShift;
+      out2 = out2 >> totShift;
+      out3 = out3 >> totShift;
+      out4 = out4 >> totShift;
+
+      in1 = (q15_t) (__SSAT(out1, 16));
+      in2 = (q15_t) (__SSAT(out2, 16));
+      in3 = (q15_t) (__SSAT(out3, 16));
+      in4 = (q15_t) (__SSAT(out4, 16));
+
+      _SIMD32_OFFSET(pOut) = __PKHBT(in2, in1, 16);
+      _SIMD32_OFFSET(pOut + 2) = __PKHBT(in4, in3, 16);
+
+      /* Update pointers to process the next samples */
+      pIn += 4u;
+      pOut += 4u;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the numSamples is not a multiple of 4, compute any remaining output samples here.        
+     ** No loop unrolling is used. */
+    blkCnt = numSamples % 0x4u;
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) * k */
+      /* Scale, saturate and then store the results in the destination buffer. */
+      *pOut++ =
+        (q15_t) (__SSAT(((q31_t) (*pIn++) * scaleFract) >> totShift, 16));
+
+      /* Decrement the numSamples loop counter */
+      blkCnt--;
+    }
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**        
+ * @} end of MatrixScale group        
+ */
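
The per-element arithmetic of the Q15 scaling function (multiply to 2.30, shift by 15 - shift, saturate to 16 bits) can be written as a scalar model. This is a sketch for host-side checking only: scale_q15_one is a hypothetical name, the total shift 15 - shift is assumed to lie in the range 0 to 31, and the right shift of a negative value is assumed to be arithmetic.

```c
#include <stdint.h>

/* Hypothetical scalar model of one Q15 scaling step.
 * Assumes 0 <= 15 - shift <= 31. */
static int16_t scale_q15_one(int16_t in, int16_t scale_fract, int32_t shift)
{
    int32_t prod = (int32_t)in * scale_fract;   /* 1.15 * 1.15 -> 2.30 */
    int32_t v = prod >> (15 - shift);           /* back to 1.15, gain 2^shift */

    /* Saturate to the signed 16-bit range, as __SSAT(v, 16) does. */
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}
```

With in = 16384 (0.5) and scale_fract = 16384 (0.5), a shift of 0 gives 8192 (0.25), while a shift of 2 would give 1.0 and therefore saturates to 32767.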

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_scale_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,202 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_scale_q31.c    
+*    
+* Description:	Multiplies a Q31 matrix by a scalar.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  ------------------------------------------------ */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMatrix        
+ */
+
+/**        
+ * @addtogroup MatrixScale        
+ * @{        
+ */
+
+/**        
+ * @brief Q31 matrix scaling.        
+ * @param[in]       *pSrc points to input matrix        
+ * @param[in]       scaleFract fractional portion of the scale factor        
+ * @param[in]       shift number of bits to shift the result by        
+ * @param[out]      *pDst points to output matrix structure        
+ * @return     		The function returns either        
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.        
+ *        
+ * @details        
+ * <b>Scaling and Overflow Behavior:</b>        
+ * \par        
+ * The input data <code>*pSrc</code> and <code>scaleFract</code> are in 1.31 format.        
+ * These are multiplied to yield a 2.62 intermediate result and this is shifted with saturation to 1.31 format.        
+ */
+
+arm_status arm_mat_scale_q31(
+  const arm_matrix_instance_q31 * pSrc,
+  q31_t scaleFract,
+  int32_t shift,
+  arm_matrix_instance_q31 * pDst)
+{
+  q31_t *pIn = pSrc->pData;                      /* input data matrix pointer */
+  q31_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  uint32_t numSamples;                           /* total number of elements in the matrix */
+  int32_t totShift = shift + 1;                  /* shift to apply after scaling */
+  uint32_t blkCnt;                               /* loop counters  */
+  arm_status status;                             /* status of matrix scaling      */
+  q31_t in1, in2, out1;                          /* temporary variables */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  q31_t in3, in4, out2, out3, out4;              /* temporary variables */
+
+#endif //      #ifndef ARM_MATH_CM0_FAMILY
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch  */
+  if((pSrc->numRows != pDst->numRows) || (pSrc->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif //    #ifdef ARM_MATH_MATRIX_CHECK
+  {
+    /* Total number of samples in the input matrix */
+    numSamples = (uint32_t) pSrc->numRows * pSrc->numCols;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+    /* Loop Unrolling */
+    blkCnt = numSamples >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) * k */
+      /* Read values from input */
+      in1 = *pIn;
+      in2 = *(pIn + 1);
+      in3 = *(pIn + 2);
+      in4 = *(pIn + 3);
+
+      /* multiply input by the scalar value */
+      in1 = ((q63_t) in1 * scaleFract) >> 32;
+      in2 = ((q63_t) in2 * scaleFract) >> 32;
+      in3 = ((q63_t) in3 * scaleFract) >> 32;
+      in4 = ((q63_t) in4 * scaleFract) >> 32;
+
+      /* apply shifting */
+      out1 = in1 << totShift;
+      out2 = in2 << totShift;
+
+      /* saturate the results. */
+      if(in1 != (out1 >> totShift))
+        out1 = 0x7FFFFFFF ^ (in1 >> 31);
+
+      if(in2 != (out2 >> totShift))
+        out2 = 0x7FFFFFFF ^ (in2 >> 31);
+
+      out3 = in3 << totShift;
+      out4 = in4 << totShift;
+
+      *pOut = out1;
+      *(pOut + 1) = out2;
+
+      if(in3 != (out3 >> totShift))
+        out3 = 0x7FFFFFFF ^ (in3 >> 31);
+
+      if(in4 != (out4 >> totShift))
+        out4 = 0x7FFFFFFF ^ (in4 >> 31);
+
+
+      *(pOut + 2) = out3;
+      *(pOut + 3) = out4;
+
+      /* Update pointers to process the next samples */
+      pIn += 4u;
+      pOut += 4u;
+
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = numSamples % 0x4u;
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) * k */
+      /* Scale, saturate and then store the results in the destination buffer. */
+      in1 = *pIn++;
+
+      in2 = ((q63_t) in1 * scaleFract) >> 32;
+
+      out1 = in2 << totShift;
+
+      if(in2 != (out1 >> totShift))
+        out1 = 0x7FFFFFFF ^ (in2 >> 31);
+
+      *pOut++ = out1;
+
+      /* Decrement the numSamples loop counter */
+      blkCnt--;
+    }
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**        
+ * @} end of MatrixScale group        
+ */
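
The Q31 scaling code keeps the upper 32 bits of the 2.62 product, shifts left by shift + 1, and detects overflow by shifting back and comparing. The scalar sketch below reaches the same saturated result by widening to 64 bits instead, which avoids left-shifting a negative signed value. The name scale_q31_one is hypothetical; the sketch assumes 0 <= shift + 1 <= 31 and an arithmetic right shift for negative values.

```c
#include <stdint.h>

/* Hypothetical scalar model of one Q31 scaling step.
 * Assumes 0 <= shift + 1 <= 31. */
static int32_t scale_q31_one(int32_t in, int32_t scale_fract, int32_t shift)
{
    /* 1.31 * 1.31 -> 2.62, keep the upper 32 bits (a 2.30 value) */
    int32_t hi = (int32_t)(((int64_t)in * scale_fract) >> 32);

    /* Apply the total left shift of shift + 1 as a 64-bit multiplication. */
    int64_t wide = (int64_t)hi * ((int64_t)1 << (shift + 1));

    /* Saturate to the signed 32-bit (1.31) range. */
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
```

With in = 0x40000000 (0.5) and scale_fract = 0x40000000 (0.5), a shift of 0 gives 0x20000000 (0.25); a shift of 3 would give 2.0 and saturates to 0x7FFFFFFF.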

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,209 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_sub_f32.c    
+*    
+* Description:	Floating-point matrix subtraction.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMatrix        
+ */
+
+/**        
+ * @defgroup MatrixSub Matrix Subtraction        
+ *        
+ * Subtract two matrices.        
+ * \image html MatrixSubtraction.gif "Subtraction of two 3 x 3 matrices"
+ *        
+ * The functions check to make sure that        
+ * <code>pSrcA</code>, <code>pSrcB</code>, and <code>pDst</code> have the same        
+ * number of rows and columns.        
+ */
+
+/**        
+ * @addtogroup MatrixSub        
+ * @{        
+ */
+
+/**        
+ * @brief Floating-point matrix subtraction        
+ * @param[in]       *pSrcA points to the first input matrix structure        
+ * @param[in]       *pSrcB points to the second input matrix structure        
+ * @param[out]      *pDst points to output matrix structure        
+ * @return     		The function returns either        
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.        
+ */
+
+arm_status arm_mat_sub_f32(
+  const arm_matrix_instance_f32 * pSrcA,
+  const arm_matrix_instance_f32 * pSrcB,
+  arm_matrix_instance_f32 * pDst)
+{
+  float32_t *pIn1 = pSrcA->pData;                /* input data matrix pointer A */
+  float32_t *pIn2 = pSrcB->pData;                /* input data matrix pointer B */
+  float32_t *pOut = pDst->pData;                 /* output data matrix pointer  */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  float32_t inA1, inA2, inB1, inB2, out1, out2;  /* temporary variables */
+
+#endif //      #ifndef ARM_MATH_CM0_FAMILY
+
+  uint32_t numSamples;                           /* total number of elements in the matrix  */
+  uint32_t blkCnt;                               /* loop counters */
+  arm_status status;                             /* status of matrix subtraction */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numRows != pSrcB->numRows) ||
+     (pSrcA->numCols != pSrcB->numCols) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcA->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+  {
+    /* Total number of samples in the input matrix */
+    numSamples = (uint32_t) pSrcA->numRows * pSrcA->numCols;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+    /* Loop Unrolling */
+    blkCnt = numSamples >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) - B(m,n) */
+      /* Subtract and then store the results in the destination buffer. */
+      /* Read values from source A */
+      inA1 = pIn1[0];
+
+      /* Read values from source B */
+      inB1 = pIn2[0];
+
+      /* Read values from source A */
+      inA2 = pIn1[1];
+
+      /* out = sourceA - sourceB */
+      out1 = inA1 - inB1;
+
+      /* Read values from source B */
+      inB2 = pIn2[1];
+
+      /* Read values from source A */
+      inA1 = pIn1[2];
+
+      /* out = sourceA - sourceB */
+      out2 = inA2 - inB2;
+
+      /* Read values from source B */
+      inB1 = pIn2[2];
+
+      /* Store result in destination */
+      pOut[0] = out1;
+      pOut[1] = out2;
+
+      /* Read values from source A */
+      inA2 = pIn1[3];
+
+      /* Read values from source B */
+      inB2 = pIn2[3];
+
+      /* out = sourceA - sourceB */
+      out1 = inA1 - inB1;
+
+
+      /* out = sourceA - sourceB */
+      out2 = inA2 - inB2;
+
+      /* Store result in destination */
+      pOut[2] = out1;
+
+      /* Store result in destination */
+      pOut[3] = out2;
+
+
+      /* Update pointers to process the next samples */
+      pIn1 += 4u;
+      pIn2 += 4u;
+      pOut += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the numSamples is not a multiple of 4, compute any remaining output samples here.    
+     ** No loop unrolling is used. */
+    blkCnt = numSamples % 0x4u;
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) - B(m,n) */
+      /* Subtract and then store the results in the destination buffer. */
+      *pOut++ = (*pIn1++) - (*pIn2++);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**        
+ * @} end of MatrixSub group        
+ */
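
The Cortex-M3/M4 path above splits the work into `numSamples >> 2` unrolled iterations of four elements, followed by a tail loop over the `numSamples % 4` leftovers. The sketch below shows only that loop structure on plain arrays. It is an illustrative stand-in, not the library function: the name `sub_f32_unrolled` is an assumption, and it omits the matrix instance structures and the size check.

```c
#include <stddef.h>

/* Illustrative stand-in for the loop structure only (hypothetical name). */
void sub_f32_unrolled(const float *a, const float *b, float *dst, size_t n)
{
    size_t blk = n >> 2;            /* number of complete groups of four */

    while (blk > 0u) {
        dst[0] = a[0] - b[0];
        dst[1] = a[1] - b[1];
        dst[2] = a[2] - b[2];
        dst[3] = a[3] - b[3];

        /* Advance all three pointers past the group just processed. */
        a += 4;
        b += 4;
        dst += 4;
        blk--;
    }

    blk = n % 4u;                   /* 0 to 3 leftover elements */

    while (blk > 0u) {
        *dst++ = *a++ - *b++;
        blk--;
    }
}
```

With `n = 6`, for example, the first loop runs once and the tail loop handles the last two elements.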

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,160 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_sub_q15.c    
+*    
+* Description:	Q15 Matrix subtraction    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixSub    
+ * @{    
+ */
+
+/**    
+ * @brief Q15 matrix subtraction.    
+ * @param[in]       *pSrcA points to the first input matrix structure    
+ * @param[in]       *pSrcB points to the second input matrix structure    
+ * @param[out]      *pDst points to output matrix structure    
+ * @return     		The function returns either    
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ *    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.    
+ */
+
+arm_status arm_mat_sub_q15(
+  const arm_matrix_instance_q15 * pSrcA,
+  const arm_matrix_instance_q15 * pSrcB,
+  arm_matrix_instance_q15 * pDst)
+{
+  q15_t *pInA = pSrcA->pData;                    /* input data matrix pointer A */
+  q15_t *pInB = pSrcB->pData;                    /* input data matrix pointer B */
+  q15_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  uint32_t numSamples;                           /* total number of elements in the matrix */
+  uint32_t blkCnt;                               /* loop counters  */
+  arm_status status;                             /* status of matrix subtraction  */
+
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrcA->numRows != pSrcB->numRows) ||
+     (pSrcA->numCols != pSrcB->numCols) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcA->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* Total number of samples in the input matrix */
+    numSamples = (uint32_t) pSrcA->numRows * pSrcA->numCols;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+    /* Apply loop unrolling */
+    blkCnt = numSamples >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) - B(m,n) */
+      /* Subtract, Saturate and then store the results in the destination buffer. */
+      *__SIMD32(pOut)++ = __QSUB16(*__SIMD32(pInA)++, *__SIMD32(pInB)++);
+      *__SIMD32(pOut)++ = __QSUB16(*__SIMD32(pInA)++, *__SIMD32(pInB)++);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If numSamples is not a multiple of 4, compute any remaining output samples here.
+     ** No loop unrolling is used. */
+    blkCnt = numSamples % 0x4u;
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) - B(m,n) */
+      /* Subtract and then store the results in the destination buffer. */
+      *pOut++ = (q15_t) __QSUB16(*pInA++, *pInB++);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = numSamples;
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) - B(m,n) */
+      /* Subtract and then store the results in the destination buffer. */
+      *pOut++ = (q15_t) __SSAT(((q31_t) * pInA++ - *pInB++), 16);
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixSub group    
+ */
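
The Cortex-M0 path above computes each difference in 32 bits and clamps it to 16 bits with `__SSAT(..., 16)`, so results outside the Q15 range saturate instead of wrapping. A minimal portable sketch of that single-element operation follows. The name `sub_sat_q15` is hypothetical, and the explicit comparisons stand in for the `__SSAT` intrinsic.

```c
#include <stdint.h>

/* Hypothetical helper, not a CMSIS-DSP function: saturating Q15 subtraction. */
int16_t sub_sat_q15(int16_t a, int16_t b)
{
    /* The difference of two 16-bit values always fits in 32 bits. */
    int32_t diff = (int32_t)a - (int32_t)b;

    if (diff > INT16_MAX) {
        return INT16_MAX;   /* 0x7FFF, the largest Q15 value */
    }
    if (diff < INT16_MIN) {
        return INT16_MIN;   /* 0x8000, the smallest Q15 value */
    }
    return (int16_t)diff;
}
```

For example, subtracting -1 from 0x7FFF would wrap to 0x8000 in plain 16-bit arithmetic. With saturation the result stays at 0x7FFF.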

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_sub_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,208 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_sub_q31.c    
+*    
+* Description:	Q31 matrix subtraction    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupMatrix        
+ */
+
+/**        
+ * @addtogroup MatrixSub        
+ * @{        
+ */
+
+/**        
+ * @brief Q31 matrix subtraction.        
+ * @param[in]       *pSrcA points to the first input matrix structure        
+ * @param[in]       *pSrcB points to the second input matrix structure        
+ * @param[out]      *pDst points to output matrix structure        
+ * @return     		The function returns either        
+ * <code>ARM_MATH_SIZE_MISMATCH</code> or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.        
+ *        
+ * <b>Scaling and Overflow Behavior:</b>        
+ * \par        
+ * The function uses saturating arithmetic.        
+ * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] will be saturated.        
+ */
+
+
+arm_status arm_mat_sub_q31(
+  const arm_matrix_instance_q31 * pSrcA,
+  const arm_matrix_instance_q31 * pSrcB,
+  arm_matrix_instance_q31 * pDst)
+{
+  q31_t *pIn1 = pSrcA->pData;                    /* input data matrix pointer A */
+  q31_t *pIn2 = pSrcB->pData;                    /* input data matrix pointer B */
+  q31_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  q31_t inA1, inB1;                              /* temporary variables */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  q31_t inA2, inB2;                              /* temporary variables */
+  q31_t out1, out2;                              /* temporary variables */
+
+#endif //      #ifndef ARM_MATH_CM0_FAMILY
+
+  uint32_t numSamples;                           /* total number of elements in the matrix  */
+  uint32_t blkCnt;                               /* loop counters */
+  arm_status status;                             /* status of matrix subtraction */
+
+
+#ifdef ARM_MATH_MATRIX_CHECK
+  /* Check for matrix mismatch condition  */
+  if((pSrcA->numRows != pSrcB->numRows) ||
+     (pSrcA->numCols != pSrcB->numCols) ||
+     (pSrcA->numRows != pDst->numRows) || (pSrcA->numCols != pDst->numCols))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif
+  {
+    /* Total number of samples in the input matrix */
+    numSamples = (uint32_t) pSrcA->numRows * pSrcA->numCols;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+    /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+    /* Loop Unrolling */
+    blkCnt = numSamples >> 2u;
+
+    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+     ** a second loop below computes the remaining 1 to 3 samples. */
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) - B(m,n) */
+      /* Subtract, saturate and then store the results in the destination buffer. */
+      /* Read values from source A */
+      inA1 = pIn1[0];
+
+      /* Read values from source B */
+      inB1 = pIn2[0];
+
+      /* Read values from source A */
+      inA2 = pIn1[1];
+
+      /* Subtract and saturate */
+      out1 = __QSUB(inA1, inB1);
+
+      /* Read values from source B */
+      inB2 = pIn2[1];
+
+      /* Read values from source A */
+      inA1 = pIn1[2];
+
+      /* Subtract and saturate */
+      out2 = __QSUB(inA2, inB2);
+
+      /* Read values from source B */
+      inB1 = pIn2[2];
+
+      /* Store result in destination */
+      pOut[0] = out1;
+      pOut[1] = out2;
+
+      /* Read values from source A */
+      inA2 = pIn1[3];
+
+      /* Read values from source B */
+      inB2 = pIn2[3];
+
+      /* Subtract and saturate */
+      out1 = __QSUB(inA1, inB1);
+
+      /* Subtract and saturate */
+      out2 = __QSUB(inA2, inB2);
+
+      /* Store result in destination */
+      pOut[2] = out1;
+      pOut[3] = out2;
+
+      /* update pointers to process next samples */
+      pIn1 += 4u;
+      pIn2 += 4u;
+      pOut += 4u;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* If the numSamples is not a multiple of 4, compute any remaining output samples here.        
+     ** No loop unrolling is used. */
+    blkCnt = numSamples % 0x4u;
+
+#else
+
+    /* Run the below code for Cortex-M0 */
+
+    /* Initialize blkCnt with number of samples */
+    blkCnt = numSamples;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+    while(blkCnt > 0u)
+    {
+      /* C(m,n) = A(m,n) - B(m,n) */
+      /* Subtract, saturate and then store the results in the destination buffer. */
+      inA1 = *pIn1++;
+      inB1 = *pIn2++;
+
+      inA1 = __QSUB(inA1, inB1);
+
+      *pOut++ = inA1;
+
+      /* Decrement the loop counter */
+      blkCnt--;
+    }
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**        
+ * @} end of MatrixSub group        
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,218 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_trans_f32.c    
+*    
+* Description:	Floating-point matrix transpose.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+/**    
+ * @defgroup MatrixTrans Matrix Transpose    
+ *    
+ * Transposes a matrix.
+ * Transposing an <code>M x N</code> matrix flips it around the center diagonal and results in an <code>N x M</code> matrix.    
+ * \image html MatrixTranspose.gif "Transpose of a 3 x 3 matrix"    
+ */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixTrans    
+ * @{    
+ */
+
+/**    
+  * @brief Floating-point matrix transpose.    
+  * @param[in]  *pSrc points to the input matrix    
+  * @param[out] *pDst points to the output matrix    
+  * @return 	The function returns either  <code>ARM_MATH_SIZE_MISMATCH</code>    
+  * or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+  */
+
+
+arm_status arm_mat_trans_f32(
+  const arm_matrix_instance_f32 * pSrc,
+  arm_matrix_instance_f32 * pDst)
+{
+  float32_t *pIn = pSrc->pData;                  /* input data matrix pointer */
+  float32_t *pOut = pDst->pData;                 /* output data matrix pointer */
+  float32_t *px;                                 /* Temporary output data matrix pointer */
+  uint16_t nRows = pSrc->numRows;                /* number of rows */
+  uint16_t nColumns = pSrc->numCols;             /* number of columns */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  uint16_t blkCnt, i = 0u, row = nRows;          /* loop counters */
+  arm_status status;                             /* status of matrix transpose  */
+
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pDst->numCols) || (pSrc->numCols != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* Matrix transpose by exchanging the rows with columns */
+    /* row loop     */
+    do
+    {
+      /* Loop Unrolling */
+      blkCnt = nColumns >> 2;
+
+      /* The pointer px is set to starting address of the column being processed */
+      px = pOut + i;
+
+      /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+       ** a second loop below computes the remaining 1 to 3 samples. */
+      while(blkCnt > 0u)        /* column loop */
+      {
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Decrement the column loop counter */
+        blkCnt--;
+      }
+
+      /* Perform matrix transpose for the remaining 0 to 3 columns here. */
+      blkCnt = nColumns % 0x4u;
+
+      while(blkCnt > 0u)
+      {
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Decrement the column loop counter */
+        blkCnt--;
+      }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  uint16_t col, i = 0u, row = nRows;             /* loop counters */
+  arm_status status;                             /* status of matrix transpose  */
+
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pDst->numCols) || (pSrc->numCols != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* Matrix transpose by exchanging the rows with columns */
+    /* row loop     */
+    do
+    {
+      /* The pointer px is set to starting address of the column being processed */
+      px = pOut + i;
+
+      /* Initialize column loop counter */
+      col = nColumns;
+
+      while(col > 0u)
+      {
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+      i++;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);          /* row loop end  */
+
+    /* Set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixTrans group    
+ */
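
The transpose above reads the source row by row and writes each element through `px`, which starts at column `i` of the output and advances by `nRows` after every store, because the transposed matrix has `nRows` columns. The sketch below shows that addressing scheme on plain row-major arrays. The name `transpose_f32` is an assumption, and the sketch leaves out the loop unrolling and the size check.

```c
#include <stddef.h>

/* Illustrative sketch (hypothetical name): transpose a rows x cols row-major
 * matrix into a cols x rows row-major matrix using a strided output pointer. */
void transpose_f32(const float *src, float *dst, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        /* Source row r becomes output column r. */
        float *px = dst + r;

        for (size_t c = 0; c < cols; c++) {
            *px = *src++;
            px += rows;     /* step down one output row (output has `rows` columns) */
        }
    }
}
```

For a 2 x 3 input `{1, 2, 3, 4, 5, 6}` the output buffer holds the 3 x 2 matrix `{1, 4, 2, 5, 3, 6}`.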

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,284 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_trans_q15.c    
+*    
+* Description:	Q15 matrix transpose.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixTrans    
+ * @{    
+ */
+
+/**
+ * @brief Q15 matrix transpose.    
+ * @param[in]  *pSrc points to the input matrix    
+ * @param[out] *pDst points to the output matrix    
+ * @return 	The function returns either  <code>ARM_MATH_SIZE_MISMATCH</code>    
+ * or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ */
+
+arm_status arm_mat_trans_q15(
+  const arm_matrix_instance_q15 * pSrc,
+  arm_matrix_instance_q15 * pDst)
+{
+  q15_t *pSrcA = pSrc->pData;                    /* input data matrix pointer */
+  q15_t *pOut = pDst->pData;                     /* output data matrix pointer */
+  uint16_t nRows = pSrc->numRows;                /* number of rows */
+  uint16_t nColumns = pSrc->numCols;             /* number of columns */
+  uint16_t col, row = nRows, i = 0u;             /* row and column loop counters */
+  arm_status status;                             /* status of matrix transpose */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+  q31_t in;                                      /* variable to hold temporary output  */
+
+#else
+
+  q15_t in;
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pDst->numCols) || (pSrc->numCols != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*      #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* Matrix transpose by exchanging the rows with columns */
+    /* row loop     */
+    do
+    {
+
+      /* Apply loop unrolling and exchange the columns with row elements */
+      col = nColumns >> 2u;
+
+      /* The pointer pOut is set to starting address of the column being processed */
+      pOut = pDst->pData + i;
+
+      /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+       ** a second loop below computes the remaining 1 to 3 samples. */
+      while(col > 0u)
+      {
+#ifndef UNALIGNED_SUPPORT_DISABLE
+
+        /* Read two elements from the row */
+        in = *__SIMD32(pSrcA)++;
+
+        /* Unpack and store one element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *pOut = (q15_t) in;
+
+#else
+
+        *pOut = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer pOut to point to the next row of the transposed matrix */
+        pOut += nRows;
+
+        /* Unpack and store the second element in the destination */
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *pOut = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#else
+
+        *pOut = (q15_t) in;
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer pOut to point to the next row of the transposed matrix */
+        pOut += nRows;
+
+        /* Read two elements from the row */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        in = *__SIMD32(pSrcA)++;
+
+#else
+
+        in = *__SIMD32(pSrcA)++;
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Unpack and store one element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *pOut = (q15_t) in;
+
+#else
+
+        *pOut = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+        /* Update the pointer pOut to point to the next row of the transposed matrix */
+        pOut += nRows;
+
+        /* Unpack and store the second element in the destination */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+        *pOut = (q15_t) ((in & (q31_t) 0xffff0000) >> 16);
+
+#else
+
+        *pOut = (q15_t) in;
+
+#endif /*    #ifndef ARM_MATH_BIG_ENDIAN    */
+
+#else	 
+        /* Read one element from the row */
+        in = *pSrcA++;
+
+        /* Store one element in the destination */
+        *pOut = in;
+ 
+        /* Update the pointer pOut to point to the next row of the transposed matrix */
+        pOut += nRows;
+
+        /* Read one element from the row */
+        in = *pSrcA++;
+
+        /* Store one element in the destination */
+        *pOut = in;
+ 
+        /* Update the pointer pOut to point to the next row of the transposed matrix */
+        pOut += nRows;
+
+        /* Read one element from the row */
+        in = *pSrcA++;
+
+        /* Store one element in the destination */
+        *pOut = in;
+ 
+        /* Update the pointer pOut to point to the next row of the transposed matrix */
+        pOut += nRows;
+
+        /* Read one element from the row */
+        in = *pSrcA++;
+
+        /* Store one element in the destination */
+        *pOut = in;
+
+#endif	/*	#ifndef UNALIGNED_SUPPORT_DISABLE	*/
+
+        /* Update the pointer pOut to point to the next row of the transposed matrix */
+        pOut += nRows;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+      /* Transpose the remaining columns (at most 3) here. */
+      col = nColumns % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pDst->numCols) || (pSrc->numCols != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* Matrix transpose by exchanging the rows with columns */
+    /* row loop     */
+    do
+    {
+      /* The pointer pOut is set to starting address of the column being processed */
+      pOut = pDst->pData + i;
+
+      /* Initialize column loop counter */
+      col = nColumns;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+      while(col > 0u)
+      {
+        /* Read and store the input element in the destination */
+        *pOut = *pSrcA++;
+
+        /* Update the pointer pOut to point to the next row of the transposed matrix */
+        pOut += nRows;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+      i++;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    } while(row > 0u);
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixTrans group    
+ */
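
Note on the unrolled Cortex-M3/M4 path of the q15 transpose above: it reads two q15 elements with one 32-bit load and then splits the word into its halves. The following is a minimal standalone sketch of that split for the little-endian case, written in portable C. The names `unpack_low`, `unpack_high` and `unpack_pair` are hypothetical and not part of CMSIS-DSP, and the sketch assumes the usual two's-complement narrowing conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not CMSIS-DSP functions. */

/* Low 16 bits of the word: the first element on a little-endian target. */
int16_t unpack_low(uint32_t word)
{
    return (int16_t)(uint16_t)(word & 0xFFFFu);
}

/* High 16 bits of the word: the second element on a little-endian target. */
int16_t unpack_high(uint32_t word)
{
    return (int16_t)(uint16_t)(word >> 16);
}

/* Split one packed word into the two elements in memory order
 * (little-endian layout assumed). */
void unpack_pair(uint32_t word, int16_t *first, int16_t *second)
{
    assert(first != NULL && second != NULL);
    *first = unpack_low(word);
    *second = unpack_high(word);
}
```

On a big-endian target the roles of the two halves swap, which is what the ARM_MATH_BIG_ENDIAN branches in the listing account for.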

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/MatrixFunctions/arm_mat_trans_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,210 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_mat_trans_q31.c    
+*    
+* Description:	Q31 matrix transpose.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupMatrix    
+ */
+
+/**    
+ * @addtogroup MatrixTrans    
+ * @{    
+ */
+
+/**
+  * @brief Q31 matrix transpose.    
+  * @param[in]  *pSrc points to the input matrix    
+  * @param[out] *pDst points to the output matrix    
+  * @return 	The function returns either  <code>ARM_MATH_SIZE_MISMATCH</code>    
+  * or <code>ARM_MATH_SUCCESS</code> based on the outcome of size checking.    
+ */
+
+arm_status arm_mat_trans_q31(
+  const arm_matrix_instance_q31 * pSrc,
+  arm_matrix_instance_q31 * pDst)
+{
+  q31_t *pIn = pSrc->pData;                      /* input data matrix pointer  */
+  q31_t *pOut = pDst->pData;                     /* output data matrix pointer  */
+  q31_t *px;                                     /* Temporary output data matrix pointer */
+  uint16_t nRows = pSrc->numRows;                /* number of rows */
+  uint16_t nColumns = pSrc->numCols;             /* number of columns */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  uint16_t blkCnt, i = 0u, row = nRows;          /* loop counters */
+  arm_status status;                             /* status of matrix transpose */
+
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pDst->numCols) || (pSrc->numCols != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* Matrix transpose by exchanging the rows with columns */
+    /* row loop     */
+    do
+    {
+      /* Apply loop unrolling and exchange the columns with row elements */
+      blkCnt = nColumns >> 2u;
+
+      /* The pointer px is set to starting address of the column being processed */
+      px = pOut + i;
+
+      /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+       ** a second loop below computes the remaining 1 to 3 samples. */
+      while(blkCnt > 0u)
+      {
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Decrement the column loop counter */
+        blkCnt--;
+      }
+
+      /* Transpose the remaining columns (at most 3) here. */
+      blkCnt = nColumns % 0x4u;
+
+      while(blkCnt > 0u)
+      {
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Decrement the column loop counter */
+        blkCnt--;
+      }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  uint16_t col, i = 0u, row = nRows;             /* loop counters */
+  arm_status status;                             /* status of matrix transpose */
+
+
+#ifdef ARM_MATH_MATRIX_CHECK
+
+  /* Check for matrix mismatch condition */
+  if((pSrc->numRows != pDst->numCols) || (pSrc->numCols != pDst->numRows))
+  {
+    /* Set status as ARM_MATH_SIZE_MISMATCH */
+    status = ARM_MATH_SIZE_MISMATCH;
+  }
+  else
+#endif /*    #ifdef ARM_MATH_MATRIX_CHECK    */
+
+  {
+    /* Matrix transpose by exchanging the rows with columns */
+    /* row loop     */
+    do
+    {
+      /* The pointer px is set to starting address of the column being processed */
+      px = pOut + i;
+
+      /* Initialize column loop counter */
+      col = nColumns;
+
+      while(col > 0u)
+      {
+        /* Read and store the input element in the destination */
+        *px = *pIn++;
+
+        /* Update the pointer px to point to the next row of the transposed matrix */
+        px += nRows;
+
+        /* Decrement the column loop counter */
+        col--;
+      }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+      i++;
+
+      /* Decrement the row loop counter */
+      row--;
+
+    }
+    while(row > 0u);            /* row loop end */
+
+    /* set status as ARM_MATH_SUCCESS */
+    status = ARM_MATH_SUCCESS;
+  }
+
+  /* Return to application */
+  return (status);
+}
+
+/**    
+ * @} end of MatrixTrans group    
+ */
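
Note on the pointer arithmetic in arm_mat_trans_q31 above: each input row is read sequentially while the output pointer starts at column i of the destination and advances by nRows per element. The following is a minimal reference sketch of the same index mapping without loop unrolling or size checking. `ref_transpose_i32` is a hypothetical name, not a CMSIS-DSP function, and plain `int32_t` stands in for `q31_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CMSIS-DSP: transpose a row-major
 * n_rows x n_cols matrix into dst, which must hold n_cols x n_rows values. */
void ref_transpose_i32(const int32_t *src, int32_t *dst,
                       uint16_t n_rows, uint16_t n_cols)
{
    assert(src != NULL && dst != NULL);

    for (uint16_t r = 0; r < n_rows; r++) {
        /* Input row r becomes output column r, like px = pOut + i. */
        int32_t *px = dst + r;

        for (uint16_t c = 0; c < n_cols; c++) {
            *px = *src++;   /* read the input row sequentially */
            px += n_rows;   /* step to the next row of the output */
        }
    }
}
```

For a 2 x 3 input {1, 2, 3, 4, 5, 6} this writes the 3 x 2 output {1, 4, 2, 5, 3, 6}.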

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,186 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_max_f32.c    
+*    
+* Description:	Maximum value of a floating-point vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @defgroup Max Maximum    
+ *    
+ * Computes the maximum value of an array of data.     
+ * The function returns both the maximum value and its position within the array.     
+ * There are separate functions for floating-point, Q31, Q15, and Q7 data types.    
+ */
+
+/**    
+ * @addtogroup Max    
+ * @{    
+ */
+
+
+/**    
+ * @brief Maximum value of a floating-point vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult maximum value returned here    
+ * @param[out]      *pIndex index of maximum value returned here    
+ * @return none.    
+ */
+
+void arm_max_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult,
+  uint32_t * pIndex)
+{
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t maxVal1, maxVal2, out;               /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex, count;              /* loop counter */
+
+  /* Initialise the count value. */
+  count = 0u;
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  /* Loop unrolling */
+  blkCnt = (blockSize - 1u) >> 2u;
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  while(blkCnt > 0u)
+  {
+    /* Initialize maxVal to the next consecutive values one by one */
+    maxVal1 = *pSrc++;
+
+    maxVal2 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = count + 1u;
+    }
+
+    maxVal1 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal2)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal2;
+      outIndex = count + 2u;
+    }
+
+    maxVal2 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = count + 3u;
+    }
+
+    /* compare for the maximum value */
+    if(out < maxVal2)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal2;
+      outIndex = count + 4u;
+    }
+
+    count += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Handle the remaining values when (blockSize - 1u) is not a multiple of 4 */
+  blkCnt = (blockSize - 1u) % 4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  float32_t maxVal1, out;                        /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex;                     /* loop counter */
+
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  blkCnt = (blockSize - 1u);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* Initialize maxVal to the next consecutive values one by one */
+    maxVal1 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = blockSize - blkCnt;
+    }
+
+
+    /* Decrement the loop counter */
+    blkCnt--;
+
+  }
+
+  /* Store the maximum value and its index into destination pointers */
+  *pResult = out;
+  *pIndex = outIndex;
+}
+
+/**    
+ * @} end of Max group    
+ */
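
Note on the behaviour of arm_max_f32 above: the first element is loaded unconditionally as the reference, and later elements replace it only on a strict `out < value` comparison, so the listing appears to assume blockSize >= 1 and reports the first occurrence when the maximum is repeated. The following is a minimal reference sketch of that behaviour without unrolling. `ref_max_f32` is a hypothetical name, not a CMSIS-DSP function, and plain `float` stands in for `float32_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CMSIS-DSP: maximum value and its index.
 * Requires block_size >= 1, mirroring the unconditional first load above. */
void ref_max_f32(const float *src, uint32_t block_size,
                 float *result, uint32_t *index)
{
    assert(src != NULL && result != NULL && index != NULL);
    assert(block_size > 0u);

    float out = src[0];          /* reference value for the comparison */
    uint32_t out_index = 0u;

    for (uint32_t i = 1u; i < block_size; i++) {
        /* Strict comparison: ties keep the earlier index. */
        if (out < src[i]) {
            out = src[i];
            out_index = i;
        }
    }

    *result = out;
    *index = out_index;
}
```

Such a reference can be compared against the optimized routine on small test vectors, including inputs with repeated maxima.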

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,176 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_max_q15.c    
+*    
+* Description:	Maximum value of a Q15 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup Max    
+ * @{    
+ */
+
+
+/**    
+ * @brief Maximum value of a Q15 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult maximum value returned here    
+ * @param[out]      *pIndex index of maximum value returned here    
+ * @return none.    
+ */
+
+void arm_max_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult,
+  uint32_t * pIndex)
+{
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q15_t maxVal1, maxVal2, out;                   /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex, count;              /* loop counter */
+
+  /* Initialise the count value. */
+  count = 0u;
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  /* Loop unrolling */
+  blkCnt = (blockSize - 1u) >> 2u;
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  while(blkCnt > 0u)
+  {
+    /* Initialize maxVal to the next consecutive values one by one */
+    maxVal1 = *pSrc++;
+
+    maxVal2 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = count + 1u;
+    }
+
+    maxVal1 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal2)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal2;
+      outIndex = count + 2u;
+    }
+
+    maxVal2 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = count + 3u;
+    }
+
+    /* compare for the maximum value */
+    if(out < maxVal2)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal2;
+      outIndex = count + 4u;
+    }
+
+    count += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Handle the remaining values when (blockSize - 1u) is not a multiple of 4 */
+  blkCnt = (blockSize - 1u) % 4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q15_t maxVal1, out;                            /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex;                     /* loop counter */
+
+  blkCnt = (blockSize - 1u);
+
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* Initialize maxVal to the next consecutive values one by one */
+    maxVal1 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = blockSize - blkCnt;
+    }
+    /* Decrement the loop counter */
+    blkCnt--;
+
+  }
+
+  /* Store the maximum value and its index into destination pointers */
+  *pResult = out;
+  *pIndex = outIndex;
+}
+
+/**    
+ * @} end of Max group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,177 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_max_q31.c    
+*    
+* Description:	Maximum value of a Q31 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup Max    
+ * @{    
+ */
+
+
+/**    
+ * @brief Maximum value of a Q31 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult maximum value returned here    
+ * @param[out]      *pIndex index of maximum value returned here    
+ * @return none.    
+ */
+
+void arm_max_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult,
+  uint32_t * pIndex)
+{
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t maxVal1, maxVal2, out;                   /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex, count;              /* loop counter */
+
+  /* Initialise the count value. */
+  count = 0u;
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  /* Loop unrolling */
+  blkCnt = (blockSize - 1u) >> 2u;
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  while(blkCnt > 0u)
+  {
+    /* Initialize maxVal to the next consecutive values one by one */
+    maxVal1 = *pSrc++;
+
+    maxVal2 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = count + 1u;
+    }
+
+    maxVal1 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal2)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal2;
+      outIndex = count + 2u;
+    }
+
+    maxVal2 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = count + 3u;
+    }
+
+    /* compare for the maximum value */
+    if(out < maxVal2)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal2;
+      outIndex = count + 4u;
+    }
+
+    count += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Handle the remaining values when (blockSize - 1u) is not a multiple of 4 */
+  blkCnt = (blockSize - 1u) % 4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q31_t maxVal1, out;                            /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex;                     /* loop counter */
+
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  blkCnt = (blockSize - 1u);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* Initialize maxVal to the next consecutive values one by one */
+    maxVal1 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = blockSize - blkCnt;
+    }
+
+    /* Decrement the loop counter */
+    blkCnt--;
+
+  }
+
+  /* Store the maximum value and its index into destination pointers */
+  *pResult = out;
+  *pIndex = outIndex;
+}
+
+/**    
+ * @} end of Max group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_max_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,177 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_max_q7.c    
+*    
+* Description:	Maximum value of a Q7 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup Max    
+ * @{    
+ */
+
+
+/**    
+ * @brief Maximum value of a Q7 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult maximum value returned here    
+ * @param[out]      *pIndex index of maximum value returned here    
+ * @return none.
+ */
+
+void arm_max_q7(
+  q7_t * pSrc,
+  uint32_t blockSize,
+  q7_t * pResult,
+  uint32_t * pIndex)
+{
+#ifndef ARM_MATH_CM0_FAMILY
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q7_t maxVal1, maxVal2, out;                    /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex, count;              /* loop counter */
+
+  /* Initialise the count value. */
+  count = 0u;
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  /* Loop unrolling */
+  blkCnt = (blockSize - 1u) >> 2u;
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  while(blkCnt > 0u)
+  {
+    /* Initialize maxVal to the next consecutive values one by one */
+    maxVal1 = *pSrc++;
+
+    maxVal2 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = count + 1u;
+    }
+
+    maxVal1 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal2)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal2;
+      outIndex = count + 2u;
+    }
+
+    maxVal2 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = count + 3u;
+    }
+
+    /* compare for the maximum value */
+    if(out < maxVal2)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal2;
+      outIndex = count + 4u;
+    }
+
+    count += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Handle the remaining values when (blockSize - 1u) is not a multiple of 4 */
+  blkCnt = (blockSize - 1u) % 4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q7_t maxVal1, out;                             /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex;                     /* loop counter */
+
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  blkCnt = (blockSize - 1u);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* Initialize maxVal to the next consecutive values one by one */
+    maxVal1 = *pSrc++;
+
+    /* compare for the maximum value */
+    if(out < maxVal1)
+    {
+      /* Update the maximum value and its index */
+      out = maxVal1;
+      outIndex = blockSize - blkCnt;
+    }
+    /* Decrement the loop counter */
+    blkCnt--;
+
+  }
+
+  /* Store the maximum value and its index into destination pointers */
+  *pResult = out;
+  *pIndex = outIndex;
+
+}
+
+/**    
+ * @} end of Max group    
+ */
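
The comparisons in arm_max_q7 are strict (out < maxVal), so when the maximum occurs more than once the index of its first occurrence is reported. A minimal portable sketch of that behaviour, assuming blockSize is at least 1 as the original does; ref_max_q7 is a hypothetical name and not part of CMSIS-DSP:

```c
#include <stdint.h>

/* Hypothetical reference helper for illustration; not part of CMSIS-DSP. */
static void ref_max_q7(const int8_t *src, uint32_t n,
                       int8_t *result, uint32_t *index)
{
    int8_t best = src[0];        /* first sample is the initial reference */
    uint32_t best_i = 0u;

    for (uint32_t i = 1u; i < n; i++) {
        if (best < src[i]) {     /* strict compare: earliest maximum wins ties */
            best = src[i];
            best_i = i;
        }
    }

    *result = best;
    *index = best_i;
}
```

For the input { 3, -5, 7, 7, 1, 0 } the sketch reports the value 7 at index 2, not index 3.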

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,139 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_mean_f32.c    
+*    
+* Description:	Mean value of a floating-point vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @defgroup mean Mean    
+ *    
+ * Calculates the mean of the input vector. Mean is defined as the average of the elements in the vector.    
+ * The following algorithm is used:
+ *    
+ * <pre>    
+ * 	Result = (pSrc[0] + pSrc[1] + pSrc[2] + ... + pSrc[blockSize-1]) / blockSize;    
+ * </pre>    
+ *    
+ * There are separate functions for floating-point, Q31, Q15, and Q7 data types.    
+ */
+
+/**    
+ * @addtogroup mean    
+ * @{    
+ */
+
+
+/**    
+ * @brief Mean value of a floating-point vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult mean value returned here    
+ * @return none.    
+ */
+
+
+void arm_mean_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult)
+{
+  float32_t sum = 0.0f;                          /* Temporary result storage */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t in1, in2, in3, in4;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 samples at a time.
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+    in3 = *pSrc++;
+    in4 = *pSrc++;
+
+    sum += in1;
+    sum += in2;
+    sum += in3;
+    sum += in4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If blockSize is not a multiple of 4, accumulate the remaining samples here.
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    sum += *pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) / blockSize  */
+  /* Store the result to the destination */
+  *pResult = sum / (float32_t) blockSize;
+}
+
+/**    
+ * @} end of mean group    
+ */
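
The structure of arm_mean_f32 on Cortex-M3/M4 is a four-way unrolled accumulation followed by a tail loop for the 0 to 3 leftover samples. A minimal sketch of that pattern in plain C, assuming n is at least 1; ref_mean_f32 is a hypothetical name, and the result is returned by value rather than through a pointer to keep the example short:

```c
#include <stdint.h>

/* Hypothetical sketch of the unrolled-plus-tail accumulation pattern. */
static float ref_mean_f32(const float *src, uint32_t n)
{
    float sum = 0.0f;
    uint32_t blocks = n >> 2u;   /* number of full groups of four */
    uint32_t tail = n & 3u;      /* 0 to 3 leftover samples */

    while (blocks > 0u) {
        sum += src[0];
        sum += src[1];
        sum += src[2];
        sum += src[3];
        src += 4;
        blocks--;
    }

    while (tail > 0u) {
        sum += *src++;
        tail--;
    }

    return sum / (float)n;
}
```

Because the additions happen in the same order as in a plain loop, the unrolled form gives the same floating-point result; the unrolling only reduces loop overhead.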

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,133 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_mean_q15.c    
+*    
+* Description:	Mean value of a Q15 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup mean    
+ * @{    
+ */
+
+/**    
+ * @brief Mean value of a Q15 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult mean value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using a 32-bit internal accumulator.    
+ * The input is represented in 1.15 format and is accumulated in a 32-bit     
+ * accumulator in 17.15 format.     
+ * With 16 guard bits there is no risk of internal overflow for blockSize up to 2^16,
+ * and the full precision of the intermediate result is preserved.
+ * Finally, the accumulator is divided by blockSize and truncated to yield a result in 1.15 format.
+ *    
+ */
+
+
+void arm_mean_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult)
+{
+  q31_t sum = 0;                                 /* Temporary result storage */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 samples at a time.
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    in = *__SIMD32(pSrc)++;
+    sum += ((in << 16) >> 16);
+    sum += (in >> 16);
+    in = *__SIMD32(pSrc)++;
+    sum += ((in << 16) >> 16);
+    sum += (in >> 16);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If blockSize is not a multiple of 4, accumulate the remaining samples here.
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    sum += *pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) / blockSize  */
+  /* Store the result to the destination */
+  *pResult = (q15_t) (sum / (q31_t)blockSize);
+}
+
+/**    
+ * @} end of mean group    
+ */
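
The unrolled loop in arm_mean_q15 reads two Q15 samples as one 32-bit word and separates them with (in << 16) >> 16 for the low lane and in >> 16 for the high lane, which depends on how the compiler treats shifts of signed values. A minimal sketch of the same lane extraction written without signed shifts; sext16 and sum_packed_q15 are hypothetical helper names, and the packed word is passed in as an unsigned value:

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x without relying on signed shifts. */
static int32_t sext16(uint32_t x)
{
    return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Hypothetical helper: add both Q15 lanes of one packed 32-bit word. */
static int32_t sum_packed_q15(uint32_t packed)
{
    int32_t lo = sext16(packed);         /* corresponds to (in << 16) >> 16 */
    int32_t hi = sext16(packed >> 16);   /* corresponds to in >> 16 */
    return lo + hi;
}
```

For example, the word 0x0003FFFE holds the lanes 3 (high) and -2 (low), so the helper returns 1.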

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,136 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_mean_q31.c    
+*    
+* Description:	Mean value of a Q31 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup mean    
+ * @{    
+ */
+
+/**    
+ * @brief Mean value of a Q31 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult mean value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par
+ * The function is implemented using a 64-bit internal accumulator.    
+ * The input is represented in 1.31 format and is accumulated in a 64-bit    
+ * accumulator in 33.31 format.    
+ * There is no risk of internal overflow with this approach, and the     
+ * full precision of the intermediate result is preserved.
+ * Finally, the accumulator is divided by blockSize and truncated to yield a result in 1.31 format.
+ *    
+ */
+
+
+void arm_mean_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult)
+{
+  q63_t sum = 0;                                 /* Temporary result storage */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2, in3, in4;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 samples at a time.
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+    in3 = *pSrc++;
+    in4 = *pSrc++;
+
+    sum += in1;
+    sum += in2;
+    sum += in3;
+    sum += in4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If blockSize is not a multiple of 4, accumulate the remaining samples here.
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    sum += *pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) / blockSize  */
+  /* Store the result to the destination */
+  *pResult = (q31_t) (sum / (int32_t) blockSize);
+}
+
+/**    
+ * @} end of mean group    
+ */
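
arm_mean_q31 accumulates into a 64-bit value, so a block of samples near full scale cannot overflow the sum the way a 32-bit accumulator would, and the final integer division truncates toward zero. A minimal sketch of both points, assuming n is at least 1; ref_mean_q31 is a hypothetical name, and unlike the original it casts the divisor to int64_t rather than int32_t:

```c
#include <stdint.h>

/* Hypothetical reference helper: Q31 mean with a 64-bit accumulator. */
static int32_t ref_mean_q31(const int32_t *src, uint32_t n)
{
    int64_t sum = 0;             /* 33.31 accumulator, as in the original */

    for (uint32_t i = 0u; i < n; i++) {
        sum += src[i];
    }

    /* C integer division truncates toward zero, so -3 / 2 gives -1. */
    return (int32_t)(sum / (int64_t)n);
}
```

The mean of samples that each fit in 32 bits also fits in 32 bits, so the final narrowing cast does not lose range.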

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_mean_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,133 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_mean_q7.c    
+*    
+* Description:	Mean value of a Q7 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup mean    
+ * @{    
+ */
+
+/**    
+ * @brief Mean value of a Q7 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult mean value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function is implemented using a 32-bit internal accumulator.     
+ * The input is represented in 1.7 format and is accumulated in a 32-bit    
+ * accumulator in 25.7 format.    
+ * With 24 guard bits there is no risk of internal overflow for blockSize up to 2^24,
+ * and the full precision of the intermediate result is preserved.
+ * Finally, the accumulator is divided by blockSize and truncated to yield a result in 1.7 format.
+ *    
+ */
+
+
+void arm_mean_q7(
+  q7_t * pSrc,
+  uint32_t blockSize,
+  q7_t * pResult)
+{
+  q31_t sum = 0;                                 /* Temporary result storage */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 samples at a time.
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    in = *__SIMD32(pSrc)++;
+
+    sum += ((in << 24) >> 24);
+    sum += ((in << 16) >> 24);
+    sum += ((in << 8) >> 24);
+    sum += (in >> 24);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If blockSize is not a multiple of 4, accumulate the remaining samples here.
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    sum += *pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) / blockSize  */
+  /* Store the result to the destination */
+  *pResult = (q7_t) (sum / (int32_t) blockSize);
+}
+
+/**    
+ * @} end of mean group    
+ */
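
The Q7 and Q15 mean functions both sum into a 32-bit accumulator, so the headroom depends on the sample width: a b-bit sample leaves 32 - b guard bits. A minimal sketch of that arithmetic, assuming the worst case in which every sample holds the most negative value; max_safe_block is a hypothetical helper and sample_bits is assumed to be between 1 and 32:

```c
#include <stdint.h>

/* Hypothetical helper: the largest block length whose worst-case sum of
 * signed sample_bits-wide values still fits in a 32-bit accumulator. */
static uint32_t max_safe_block(unsigned sample_bits)
{
    return (uint32_t)1u << (32u - sample_bits);   /* 2^(guard bits) */
}
```

Under these assumptions the limit is 2^24 samples for Q7 (25.7 accumulator) and 2^16 samples for Q15 (17.15 accumulator); at exactly that length a block of all-minimum samples sums to INT32_MIN, which is still representable.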

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,183 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_min_f32.c    
+*    
+* Description:	Minimum value of a floating-point vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @defgroup Min Minimum    
+ *    
+ * Computes the minimum value of an array of data.     
+ * The function returns both the minimum value and its position within the array.     
+ * There are separate functions for floating-point, Q31, Q15, and Q7 data types.    
+ */
+
+/**    
+ * @addtogroup Min    
+ * @{    
+ */
+
+
+/**    
+ * @brief Minimum value of a floating-point vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult minimum value returned here    
+ * @param[out]      *pIndex index of minimum value returned here    
+ * @return none.
+ *    
+ */
+
+void arm_min_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult,
+  uint32_t * pIndex)
+{
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t minVal1, minVal2, out;               /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex, count;              /* loop counter */
+
+  /* Initialise the count value. */
+  count = 0u;
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  /* Loop unrolling */
+  blkCnt = (blockSize - 1u) >> 2u;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize minVal to the next consecutive values one by one */
+    minVal1 = *pSrc++;
+    minVal2 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = count + 1u;
+    }
+
+    minVal1 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal2)
+    {
+      /* Update the minimum value and its index */
+      out = minVal2;
+      outIndex = count + 2u;
+    }
+
+    minVal2 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = count + 3u;
+    }
+
+    /* compare for the minimum value */
+    if(out > minVal2)
+    {
+      /* Update the minimum value and its index */
+      out = minVal2;
+      outIndex = count + 4u;
+    }
+
+    count += 4u;
+
+    blkCnt--;
+  }
+
+  /* If (blockSize - 1u) is not a multiple of 4, handle the remaining samples below */
+  blkCnt = (blockSize - 1u) % 4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  float32_t minVal1, out;                        /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex;                     /* loop counter */
+
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  blkCnt = (blockSize - 1u);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0)
+  {
+    /* Initialize minVal to the next consecutive values one by one */
+    minVal1 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = blockSize - blkCnt;
+    }
+
+    blkCnt--;
+
+  }
+
+  /* Store the minimum value and its index into destination pointers */
+  *pResult = out;
+  *pIndex = outIndex;
+}
+
+/**    
+ * @} end of Min group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,177 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_min_q15.c    
+*    
+* Description:	Minimum value of a Q15 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+
+/**    
+ * @addtogroup Min    
+ * @{    
+ */
+
+
+/**    
+ * @brief Minimum value of a Q15 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult minimum value returned here    
+ * @param[out]      *pIndex index of minimum value returned here    
+ * @return none.    
+ *    
+ */
+
+void arm_min_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult,
+  uint32_t * pIndex)
+{
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q15_t minVal1, minVal2, out;                   /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex, count;              /* loop counter */
+
+  /* Initialise the count value. */
+  count = 0u;
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  /* Loop unrolling */
+  blkCnt = (blockSize - 1u) >> 2u;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize minVal to the next consecutive values one by one */
+    minVal1 = *pSrc++;
+    minVal2 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = count + 1u;
+    }
+
+    minVal1 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal2)
+    {
+      /* Update the minimum value and its index */
+      out = minVal2;
+      outIndex = count + 2u;
+    }
+
+    minVal2 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = count + 3u;
+    }
+
+    /* compare for the minimum value */
+    if(out > minVal2)
+    {
+      /* Update the minimum value and its index */
+      out = minVal2;
+      outIndex = count + 4u;
+    }
+
+    count += 4u;
+
+    blkCnt--;
+  }
+
+  /* If (blockSize - 1u) is not a multiple of 4, handle the remaining samples below */
+  blkCnt = (blockSize - 1u) % 4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q15_t minVal1, out;                            /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex;                     /* loop counter */
+
+  blkCnt = (blockSize - 1u);
+
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0)
+  {
+    /* Initialize minVal to the next consecutive values one by one */
+    minVal1 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = blockSize - blkCnt;
+    }
+
+    blkCnt--;
+
+  }
+
+
+
+  /* Store the minimum value and its index into destination pointers */
+  *pResult = out;
+  *pIndex = outIndex;
+}
+
+/**    
+ * @} end of Min group    
+ */
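
The index bookkeeping in arm_min_q15 uses two schemes: inside the unrolled loop the index is count + k for the k-th sample of the current group of four, and in the tail loop it is blockSize - blkCnt, because blkCnt samples are still unread when a sample is examined. A compact sketch of that bookkeeping, assuming n is at least 1; sketch_min_q15 is a hypothetical name, and the inner for loop stands in for the four hand-unrolled comparisons of the original:

```c
#include <stdint.h>

/* Hypothetical sketch of the unrolled-plus-tail index bookkeeping. */
static void sketch_min_q15(const int16_t *src, uint32_t n,
                           int16_t *result, uint32_t *index)
{
    int16_t best = *src++;             /* sample 0 is the initial reference */
    uint32_t best_i = 0u;
    uint32_t count = 0u;               /* index of the last sample consumed so far */
    uint32_t blocks = (n - 1u) >> 2u;  /* groups of four after sample 0 */

    while (blocks > 0u) {
        for (uint32_t k = 1u; k <= 4u; k++) {
            int16_t v = *src++;
            if (best > v) {
                best = v;
                best_i = count + k;    /* position within the whole vector */
            }
        }
        count += 4u;
        blocks--;
    }

    uint32_t tail = (n - 1u) % 4u;     /* 0 to 3 samples left */
    while (tail > 0u) {
        int16_t v = *src++;
        if (best > v) {
            best = v;
            best_i = n - tail;         /* tail samples remain, this one included */
        }
        tail--;
    }

    *result = best;
    *index = best_i;
}
```

With seven samples, one group of four covers indices 1 to 4 and the tail loop covers indices 5 and 6, which are computed as 7 - 2 and 7 - 1.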

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,176 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_min_q31.c    
+*    
+* Description:	Minimum value of a Q31 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+
+/**    
+ * @addtogroup Min    
+ * @{    
+ */
+
+
+/**    
+ * @brief Minimum value of a Q31 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult minimum value returned here    
+ * @param[out]      *pIndex index of minimum value returned here    
+ * @return none.    
+ *    
+ */
+
+void arm_min_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult,
+  uint32_t * pIndex)
+{
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t minVal1, minVal2, out;                   /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex, count;              /* loop counter */
+
+  /* Initialise the count value. */
+  count = 0u;
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+
+  /* Loop unrolling */
+  blkCnt = (blockSize - 1u) >> 2u;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize minVal to the next consecutive values one by one */
+    minVal1 = *pSrc++;
+    minVal2 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = count + 1u;
+    }
+
+    minVal1 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal2)
+    {
+      /* Update the minimum value and its index */
+      out = minVal2;
+      outIndex = count + 2u;
+    }
+
+    minVal2 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = count + 3u;
+    }
+
+    /* compare for the minimum value */
+    if(out > minVal2)
+    {
+      /* Update the minimum value and its index */
+      out = minVal2;
+      outIndex = count + 4u;
+    }
+
+    count += 4u;
+
+    blkCnt--;
+  }
+
+  /* If (blockSize - 1u) is not a multiple of 4, the loop below handles the remaining values */
+  blkCnt = (blockSize - 1u) % 4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q31_t minVal1, out;                            /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex;                     /* loop counter */
+
+  blkCnt = (blockSize - 1u);
+
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+#endif //      #ifndef ARM_MATH_CM0_FAMILY
+
+  while(blkCnt > 0)
+  {
+    /* Initialize minVal to the next consecutive values one by one */
+    minVal1 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = blockSize - blkCnt;
+    }
+
+    blkCnt--;
+
+  }
+
+  /* Store the minimum value and its index into destination pointers */
+  *pResult = out;
+  *pIndex = outIndex;
+}
+
+/**    
+ * @} end of Min group    
+ */
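
The arm_min_q31 routine above returns both the smallest value and the index of its first occurrence, because the comparison `out > minVal1` is strict. A minimal portable sketch of that behaviour follows. The name `ref_min_q31` and the local `q31_t` typedef are hypothetical stand-ins for illustration, not part of CMSIS-DSP, and the sketch assumes the block holds at least one element.

```c
#include <stdint.h>

typedef int32_t q31_t;   /* stand-in for the arm_math.h typedef */

/* Hypothetical reference helper: minimum value and index of its first
 * occurrence. Assumes n >= 1, since the first element seeds the search. */
void ref_min_q31(const q31_t *src, uint32_t n, q31_t *result, uint32_t *index)
{
    q31_t best = src[0];
    uint32_t best_i = 0u;

    for (uint32_t i = 1u; i < n; i++) {
        /* Strict comparison: a later equal value does not replace the
         * earlier one, so ties resolve to the lowest index. */
        if (best > src[i]) {
            best = src[i];
            best_i = i;
        }
    }

    *result = best;
    *index = best_i;
}
```

Calling it on `{5, -3, 7, -3, 0}` reports the value -3 at index 1 rather than index 3, which is the tie-breaking a caller can expect from the unrolled version as well.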

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_min_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,178 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_min_q7.c    
+*    
+* Description:	Minimum value of a Q7 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup Min    
+ * @{    
+ */
+
+
+/**    
+ * @brief Minimum value of a Q7 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult minimum value returned here    
+ * @param[out]      *pIndex index of minimum value returned here    
+ * @return none.    
+ *    
+ */
+
+void arm_min_q7(
+  q7_t * pSrc,
+  uint32_t blockSize,
+  q7_t * pResult,
+  uint32_t * pIndex)
+{
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q7_t minVal1, minVal2, out;                    /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex, count;              /* loop counter */
+
+  /* Initialise the count value. */
+  count = 0u;
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  /* Loop unrolling */
+  blkCnt = (blockSize - 1u) >> 2u;
+
+  while(blkCnt > 0)
+  {
+    /* Initialize minVal to the next consecutive values one by one */
+    minVal1 = *pSrc++;
+    minVal2 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = count + 1u;
+    }
+
+    minVal1 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal2)
+    {
+      /* Update the minimum value and its index */
+      out = minVal2;
+      outIndex = count + 2u;
+    }
+
+    minVal2 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = count + 3u;
+    }
+
+    /* compare for the minimum value */
+    if(out > minVal2)
+    {
+      /* Update the minimum value and its index */
+      out = minVal2;
+      outIndex = count + 4u;
+    }
+
+    count += 4u;
+
+    blkCnt--;
+  }
+
+  /* If (blockSize - 1u) is not a multiple of 4, the loop below handles the remaining values */
+  blkCnt = (blockSize - 1u) % 4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q7_t minVal1, out;                             /* Temporary variables to store the output value. */
+  uint32_t blkCnt, outIndex;                     /* loop counter */
+
+  /* Initialise the index value to zero. */
+  outIndex = 0u;
+  /* Load the first input value, which acts as the reference value for comparison */
+  out = *pSrc++;
+
+  blkCnt = (blockSize - 1u);
+
+#endif //      #ifndef ARM_MATH_CM0_FAMILY
+
+  while(blkCnt > 0)
+  {
+    /* Initialize minVal to the next consecutive values one by one */
+    minVal1 = *pSrc++;
+
+    /* compare for the minimum value */
+    if(out > minVal1)
+    {
+      /* Update the minimum value and its index */
+      out = minVal1;
+      outIndex = blockSize - blkCnt;
+    }
+
+    blkCnt--;
+
+  }
+
+  /* Store the minimum value and its index into destination pointers */
+  *pResult = out;
+  *pIndex = outIndex;
+
+
+}
+
+/**    
+ * @} end of Min group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,143 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_power_f32.c    
+*    
+* Description:	Sum of the squares of the elements of a floating-point vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @defgroup power Power    
+ *    
+ * Calculates the sum of the squares of the elements in the input vector.    
+ * The following algorithm is used:    
+ *    
+ * <pre>    
+ * 	Result = pSrc[0] * pSrc[0] + pSrc[1] * pSrc[1] + pSrc[2] * pSrc[2] + ... + pSrc[blockSize-1] * pSrc[blockSize-1];    
+ * </pre>    
+ *   
+ * There are separate functions for floating point, Q31, Q15, and Q7 data types.     
+ */
+
+/**    
+ * @addtogroup power    
+ * @{    
+ */
+
+
+/**    
+ * @brief Sum of the squares of the elements of a floating-point vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult sum of the squares value returned here    
+ * @return none.    
+ *    
+ */
+
+
+void arm_power_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult)
+{
+  float32_t sum = 0.0f;                          /* accumulator */
+  float32_t in;                                  /* Temporary variable to store input value */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 samples per iteration.    
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute Power and then store the result in a temporary variable, sum. */
+    in = *pSrc++;
+    sum += in * in;
+    in = *pSrc++;
+    sum += in * in;
+    in = *pSrc++;
+    sum += in * in;
+    in = *pSrc++;
+    sum += in * in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* compute power and then store the result in a temporary variable, sum. */
+    in = *pSrc++;
+    sum += in * in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Store the result to the destination */
+  *pResult = sum;
+}
+
+/**    
+ * @} end of power group    
+ */
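
The unroll-by-four structure of arm_power_f32 above can be sketched in portable C as a main loop over groups of four samples followed by a tail loop over the 0 to 3 leftovers. The name `ref_power_f32` is a hypothetical helper for illustration only, and the sketch uses plain `float` in place of `float32_t`.

```c
#include <stdint.h>

/* Hypothetical reference helper: sum of squares with the same
 * "groups of four, then a tail" split as the Cortex-M3/M4 path. */
float ref_power_f32(const float *src, uint32_t n)
{
    float sum = 0.0f;
    uint32_t groups = n >> 2u;   /* number of complete groups of four */
    uint32_t tail = n % 4u;      /* 0 to 3 samples left over */

    while (groups > 0u) {
        sum += src[0] * src[0];
        sum += src[1] * src[1];
        sum += src[2] * src[2];
        sum += src[3] * src[3];
        src += 4;
        groups--;
    }

    while (tail > 0u) {
        sum += (*src) * (*src);
        src++;
        tail--;
    }

    return sum;
}
```

Because the samples are still accumulated in order, this split produces the same sum as a single plain loop; the unrolling only reduces loop overhead.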

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,152 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_power_q15.c    
+*    
+* Description:	Sum of the squares of the elements of a Q15 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup power    
+ * @{    
+ */
+
+/**    
+ * @brief Sum of the squares of the elements of a Q15 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult sum of the squares value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.     
+ * The input is represented in 1.15 format.   
+ * Intermediate multiplication yields a 2.30 format, and this    
+ * result is added without saturation to a 64-bit accumulator in 34.30 format.    
+ * With 33 guard bits in the accumulator, there is no risk of overflow, and the    
+ * full precision of the intermediate multiplication is preserved.    
+ * Finally, the return result is in 34.30 format.     
+ *    
+ */
+
+void arm_power_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q63_t * pResult)
+{
+  q63_t sum = 0;                                 /* Temporary result storage */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t in32;                                    /* Temporary variable to store input value */
+  q15_t in16;                                    /* Temporary variable to store input value */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+  /* loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 samples per iteration.    
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute Power and then store the result in a temporary variable, sum. */
+    in32 = *__SIMD32(pSrc)++;
+    sum = __SMLALD(in32, in32, sum);
+    in32 = *__SIMD32(pSrc)++;
+    sum = __SMLALD(in32, in32, sum);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Square the sample with a plain multiply: __SMLALD would also square the sign-extension halfword of a negative sample. */
+    in16 = *pSrc++;
+    sum += ((q31_t) in16 * in16);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q15_t in;                                      /* Temporary variable to store input value */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute Power and then store the result in a temporary variable, sum. */
+    in = *pSrc++;
+    sum += ((q31_t) in * in);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  /* Store the results in 34.30 format  */
+  *pResult = sum;
+}
+
+/**    
+ * @} end of power group    
+ */
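
The 34.30 result format described for arm_power_q15 above can be made concrete with a small portable sketch. The helper names `ref_power_q15` and `q34_30_to_double`, and the local typedefs, are hypothetical and only illustrate the arithmetic; they are not CMSIS-DSP functions.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* stand-ins for the arm_math.h typedefs */
typedef int64_t q63_t;

/* Hypothetical reference helper: each 1.15 x 1.15 product is a 2.30
 * value, accumulated without scaling into a 64-bit 34.30 sum. */
q63_t ref_power_q15(const q15_t *src, uint32_t n)
{
    q63_t sum = 0;

    for (uint32_t i = 0u; i < n; i++) {
        /* Widen before multiplying so the product cannot overflow. */
        sum += (int32_t)src[i] * (int32_t)src[i];
    }

    return sum;
}

/* Interpret a 34.30 value as a real number: divide by 2^30. */
double q34_30_to_double(q63_t x)
{
    return (double)x / 1073741824.0;
}
```

For the inputs 0.5 and -0.5 (16384 and -16384 in Q15) the accumulator holds 2^29, which reads back as 0.25 + 0.25 = 0.5 once divided by 2^30.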

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,143 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_power_q31.c    
+*    
+* Description:	Sum of the squares of the elements of a Q31 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup power    
+ * @{    
+ */
+
+/**    
+ * @brief Sum of the squares of the elements of a Q31 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult sum of the squares value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * The input is represented in 1.31 format.    
+ * Intermediate multiplication yields a 2.62 format, and this    
+ * result is truncated to 2.48 format by discarding the lower 14 bits.    
+ * The 2.48 result is then added without saturation to a 64-bit accumulator in 16.48 format.    
+ * With 15 guard bits in the accumulator, overflow cannot occur for fewer than 2^15 input samples;    
+ * the 14 discarded bits are the only precision lost relative to the full 2.62 product.    
+ * Finally, the return result is in 16.48 format.     
+ *    
+ */
+
+void arm_power_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q63_t * pResult)
+{
+  q63_t sum = 0;                                 /* Temporary result storage */
+  q31_t in;
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 samples per iteration.    
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute Power then shift intermediate results by 14 bits to maintain 16.48 format and then store the result in a temporary variable sum, providing 15 guard bits. */
+    in = *pSrc++;
+    sum += ((q63_t) in * in) >> 14u;
+
+    in = *pSrc++;
+    sum += ((q63_t) in * in) >> 14u;
+
+    in = *pSrc++;
+    sum += ((q63_t) in * in) >> 14u;
+
+    in = *pSrc++;
+    sum += ((q63_t) in * in) >> 14u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute Power and then store the result in a temporary variable, sum. */
+    in = *pSrc++;
+    sum += ((q63_t) in * in) >> 14u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Store the results in 16.48 format  */
+  *pResult = sum;
+}
+
+/**    
+ * @} end of power group    
+ */
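
The effect of the 14-bit right shift in arm_power_q31 above is easier to see with numbers. The sketch below is a portable illustration under stated assumptions: `ref_power_q31` and the local typedefs are hypothetical names, not library code.

```c
#include <stdint.h>

typedef int32_t q31_t;   /* stand-ins for the arm_math.h typedefs */
typedef int64_t q63_t;

/* Hypothetical reference helper: square each 1.31 sample into a 2.62
 * product, drop the low 14 bits to get 2.48, and accumulate in 16.48. */
q63_t ref_power_q31(const q31_t *src, uint32_t n)
{
    q63_t sum = 0;

    for (uint32_t i = 0u; i < n; i++) {
        q63_t sq = (q63_t)src[i] * src[i];   /* 2.62, never negative */
        sum += sq >> 14;                     /* truncate to 2.48 */
    }

    return sum;
}
```

A sample of 0.5 (2^30) contributes 2^46 and a sample of -1.0 contributes 2^48, so the pair sums to 1.25 in 16.48 format. Raw sample values with magnitude below 128 square to less than 2^14 and therefore contribute nothing after the shift.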

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_power_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,141 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_power_q7.c    
+*    
+* Description:	Sum of the squares of the elements of a Q7 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup power    
+ * @{    
+ */
+
+/**    
+ * @brief Sum of the squares of the elements of a Q7 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult sum of the squares value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 32-bit internal accumulator.     
+ * The input is represented in 1.7 format.   
+ * Intermediate multiplication yields a 2.14 format, and this    
+ * result is added without saturation to an accumulator in 18.14 format.    
+ * With 17 guard bits in the accumulator, there is no risk of overflow, and the    
+ * full precision of the intermediate multiplication is preserved.    
+ * Finally, the return result is in 18.14 format.     
+ *    
+ */
+
+void arm_power_q7(
+  q7_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult)
+{
+  q31_t sum = 0;                                 /* Temporary result storage */
+  q7_t in;                                       /* Temporary variable to store input */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t input1;                                  /* Temporary variable to store packed input */
+  q31_t in1, in2;                                /* Temporary variables to store input */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Accumulate 4 samples per iteration.    
+   ** A second loop below handles the remaining 0 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* Read four q7 inputs from pSrc as one packed 32-bit word */
+    input1 = *__SIMD32(pSrc)++;
+
+    in1 = __SXTB16(__ROR(input1, 8));
+    in2 = __SXTB16(input1);
+
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* calculate power and accumulate to accumulator */
+    sum = __SMLAD(in1, in1, sum);
+    sum = __SMLAD(in2, in2, sum);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute Power and then store the result in a temporary variable, sum. */
+    in = *pSrc++;
+    sum += ((q15_t) in * in);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Store the result in 18.14 format  */
+  *pResult = sum;
+}
+
+/**    
+ * @} end of power group    
+ */
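
The Cortex-M4 path of arm_power_q7 above relies on the `__ROR`, `__SXTB16` and `__SMLAD` intrinsics to square four q7 samples from one 32-bit word. The sketch below emulates them in portable C to show which bytes land in which 16-bit lane. All `emu_*` names and `power_of_four_q7` are hypothetical helpers written for this illustration; the byte packing is done explicitly and assumes the little-endian layout a Cortex-M word load would see.

```c
#include <stdint.h>

/* Sign-extend the low 8 or 16 bits of x without implementation-defined casts. */
static int32_t sext8(uint32_t x)
{
    int32_t b = (int32_t)(x & 0xFFu);
    return (b >= 128) ? b - 256 : b;
}

static int32_t sext16(uint32_t x)
{
    int32_t h = (int32_t)(x & 0xFFFFu);
    return (h >= 32768) ? h - 65536 : h;
}

/* Rotate right, as __ROR does. */
uint32_t emu_ror(uint32_t x, uint32_t r)
{
    r &= 31u;
    return (r != 0u) ? ((x >> r) | (x << (32u - r))) : x;
}

/* __SXTB16: sign-extend byte 0 into bits [15:0] and byte 2 into bits [31:16]. */
uint32_t emu_sxtb16(uint32_t x)
{
    uint32_t lo = (uint32_t)sext8(x) & 0xFFFFu;
    uint32_t hi = (uint32_t)sext8(x >> 16) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* __SMLAD: multiply matching signed 16-bit lanes and add both products to acc. */
int32_t emu_smlad(uint32_t a, uint32_t b, int32_t acc)
{
    return acc + sext16(a) * sext16(b) + sext16(a >> 16) * sext16(b >> 16);
}

/* Sum of squares of four q7 samples, following the same steps as the loop body. */
int32_t power_of_four_q7(const int8_t v[4])
{
    uint32_t word = (uint32_t)(uint8_t)v[0]
                  | ((uint32_t)(uint8_t)v[1] << 8)
                  | ((uint32_t)(uint8_t)v[2] << 16)
                  | ((uint32_t)(uint8_t)v[3] << 24);
    uint32_t in1 = emu_sxtb16(emu_ror(word, 8u));   /* lanes hold v[1], v[3] */
    uint32_t in2 = emu_sxtb16(word);                /* lanes hold v[0], v[2] */
    int32_t sum = 0;

    sum = emu_smlad(in1, in1, sum);
    sum = emu_smlad(in2, in2, sum);
    return sum;
}
```

The rotate by 8 moves the odd-numbered bytes into the positions `__SXTB16` reads, so two extend operations cover all four samples and two dual multiply-accumulates finish the group.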

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,141 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_rms_f32.c    
+*    
+* Description:	Root mean square value of an array of F32 type    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @defgroup RMS Root mean square (RMS)    
+ *    
+ *     
+ * Calculates the Root Mean Square of the elements in the input vector.
+ * The following algorithm is used:
+ *    
+ * <pre>    
+ * 	Result = sqrt(((pSrc[0] * pSrc[0] + pSrc[1] * pSrc[1] + ... + pSrc[blockSize-1] * pSrc[blockSize-1]) / blockSize));    
+ * </pre>    
+ *   
+ * There are separate functions for floating point, Q31, and Q15 data types.     
+ */
+
+/**    
+ * @addtogroup RMS    
+ * @{    
+ */
+
+
+/**    
+ * @brief Root Mean Square of the elements of a floating-point vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult rms value returned here    
+ * @return none.    
+ *    
+ */
+
+void arm_rms_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult)
+{
+  float32_t sum = 0.0f;                          /* Accumulator */
+  float32_t in;                                  /* Temporary variable to store input value */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute sum of the squares and then store the result in a temporary variable, sum  */
+    in = *pSrc++;
+    sum += in * in;
+    in = *pSrc++;
+    sum += in * in;
+    in = *pSrc++;
+    sum += in * in;
+    in = *pSrc++;
+    sum += in * in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute sum of the squares and then store the results in a temporary variable, sum  */
+    in = *pSrc++;
+    sum += in * in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Rms and store the result in the destination */
+  arm_sqrt_f32(sum / (float32_t) blockSize, pResult);
+}
+
+/**    
+ * @} end of RMS group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,153 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_rms_q15.c    
+*    
+* Description:	Root Mean Square of the elements of a Q15 vector.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @addtogroup RMS    
+ * @{    
+ */
+
+/**    
+ * @brief Root Mean Square of the elements of a Q15 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult rms value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * The input is represented in 1.15 format.    
+ * Intermediate multiplication yields a 2.30 format, and this    
+ * result is added without saturation to a 64-bit accumulator in 34.30 format.    
+ * With 33 guard bits in the accumulator, there is no risk of overflow, and the    
+ * full precision of the intermediate multiplication is preserved.    
+ * Finally, the 34.30 result is truncated to 34.15 format by discarding the lower     
+ * 15 bits, and then saturated to yield a result in 1.15 format.    
+ *    
+ */
+
+void arm_rms_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult)
+{
+  q63_t sum = 0;                                 /* accumulator */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t in;                                      /* temporary variable to store the input value */
+  q15_t in1;                                     /* temporary variable to store the input value */
+  uint32_t blkCnt;                               /* loop counter */
+
+  /* loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute sum of the squares and then store the results in a temporary variable, sum */
+    in = *__SIMD32(pSrc)++;
+    sum = __SMLALD(in, in, sum);
+    in = *__SIMD32(pSrc)++;
+    sum = __SMLALD(in, in, sum);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute sum of the squares and then store the results in a temporary variable, sum */
+    in1 = *pSrc++;
+    sum = __SMLALD(in1, in1, sum);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Truncating and saturating the accumulator to 1.15 format */
+  /* Store the result in the destination */
+  arm_sqrt_q15(__SSAT((sum / (q63_t)blockSize) >> 15, 16), pResult);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  q15_t in;                                      /* temporary variable to store the input value */
+  uint32_t blkCnt;                               /* loop counter */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute sum of the squares and then store the results in a temporary variable, sum */
+    in = *pSrc++;
+    sum += ((q31_t) in * in);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Truncating and saturating the accumulator to 1.15 format */
+  /* Store the result in the destination */
+  arm_sqrt_q15(__SSAT((sum / (q63_t)blockSize) >> 15, 16), pResult);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of RMS group    
+ */
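
The scaling steps described in the arm_rms_q15 comment can be sketched in portable C. This is a minimal sketch, not the library code: `rms_q15_sketch` and `isqrt_u32` are hypothetical names, and the integer square root is an assumed stand-in for `arm_sqrt_q15`, so the last bit of the result may differ from what the library returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for arm_sqrt_q15: floor(sqrt(v)) by the
 * classic bit-by-bit method. */
static uint32_t isqrt_u32(uint32_t v)
{
    uint32_t root = 0;
    uint32_t bit = 1u << 30;               /* highest power of four in 32 bits */

    while (bit > v) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (v >= root + bit) {
            v -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* Hypothetical helper: RMS of n Q15 samples, following the 1.15 -> 2.30 ->
 * 34.30 -> 1.15 steps listed in the comment above. */
static int16_t rms_q15_sketch(const int16_t *src, size_t n)
{
    int64_t sum = 0;                        /* 34.30 accumulator */
    int64_t mean;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        sum += (int32_t)src[i] * src[i];    /* 1.15 * 1.15 -> 2.30 */
    }
    mean = (sum / (int64_t)n) >> 15;        /* drop the low 15 bits -> x.15 */
    if (mean > INT16_MAX) {
        mean = INT16_MAX;                   /* saturate to 1.15 */
    }
    /* Square root in Q15: sqrt(m / 2^15) * 2^15 == sqrt(m * 2^15). */
    return (int16_t)isqrt_u32((uint32_t)mean << 15);
}
```

A block of samples at +/-0.5 (0x4000) has a mean square of 0.25 (8192 in Q15), so the sketch returns 16384, which is 0.5 in Q15.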

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_rms_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,150 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_rms_q31.c    
+*    
+* Description:	Root Mean Square of the elements of a Q31 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @addtogroup RMS        
+ * @{        
+ */
+
+
+/**        
+ * @brief Root Mean Square of the elements of a Q31 vector.        
+ * @param[in]       *pSrc points to the input vector        
+ * @param[in]       blockSize length of the input vector        
+ * @param[out]      *pResult rms value returned here        
+ * @return none.        
+ *        
+ * @details        
+ * <b>Scaling and Overflow Behavior:</b>        
+ *        
+ *\par        
+ * The function is implemented using an internal 64-bit accumulator.        
+ * The input is represented in 1.31 format, and intermediate multiplication        
+ * yields a 2.62 format.        
+ * The accumulator maintains full precision of the intermediate multiplication results,         
+ * but provides only a single guard bit.        
+ * There is no saturation on intermediate additions.        
+ * If the accumulator overflows, it wraps around and distorts the result.         
+ * In order to avoid overflows completely, the input signal must be scaled down by         
+ * log2(blockSize) bits, as a total of blockSize additions are performed internally.         
+ * Finally, the 2.62 accumulator is right shifted by 31 bits to yield a 1.31 format value.        
+ *        
+ */
+
+void arm_rms_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult)
+{
+  q63_t sum = 0;                                 /* accumulator */
+  q31_t in;                                      /* Temporary variable to store the input */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t in1, in2, in3, in4;                      /* Temporary input variables */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute sum of the squares and then store the result in a temporary variable, sum */
+    /* read two samples from source buffer */
+    in1 = pSrc[0];
+    in2 = pSrc[1];
+
+    /* calculate power and accumulate to accumulator */
+    sum += (q63_t) in1 *in1;
+    sum += (q63_t) in2 *in2;
+
+    /* read two samples from source buffer */
+    in3 = pSrc[2];
+    in4 = pSrc[3];
+
+    /* calculate power and accumulate to accumulator */
+    sum += (q63_t) in3 *in3;
+    sum += (q63_t) in4 *in4;
+
+
+    /* update source buffer to process next samples */
+    pSrc += 4u;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A[0] * A[0] + A[1] * A[1] + A[2] * A[2] + ... + A[blockSize-1] * A[blockSize-1] */
+    /* Compute sum of the squares and then store the results in a temporary variable, sum */
+    in = *pSrc++;
+    sum += (q63_t) in *in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Convert data in 2.62 to 1.31 by 31 right shifts and saturate */
+  /* Compute Rms and store the result in the destination vector */
+  arm_sqrt_q31(clip_q63_to_q31((sum / (q63_t) blockSize) >> 31), pResult);
+}
+
+/**        
+ * @} end of RMS group        
+ */
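
The arm_rms_q31 comment asks the caller to scale the input down by log2(blockSize) bits to rule out accumulator overflow. A minimal sketch of that pre-scaling step follows; `ceil_log2_u32` and `prescale_q31` are hypothetical helpers, not CMSIS-DSP functions, and rounding log2 up for block sizes that are not powers of two is an assumption. Because squaring doubles the effect of the shift, this amount is conservative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest s with 2^s >= n, i.e. ceil(log2(n)). */
static uint32_t ceil_log2_u32(uint32_t n)
{
    uint32_t s = 0;

    assert(n > 0);
    while (s < 32u && ((uint64_t)1 << s) < n) {
        s++;
    }
    return s;
}

/* Hypothetical helper: scale a Q31 block down in place by ceil(log2(n)) bits.
 * Division is used instead of >> so that negative samples are handled
 * portably (the result truncates toward zero). */
static void prescale_q31(int32_t *buf, uint32_t n)
{
    int64_t divisor = (int64_t)1 << ceil_log2_u32(n);

    for (uint32_t i = 0; i < n; i++) {
        buf[i] = (int32_t)((int64_t)buf[i] / divisor);
    }
}
```

After this step every sample has magnitude at most 2^(31-s), so the sum of n <= 2^s squares stays at or below 2^(62-s) and fits in the 64-bit accumulator. The RMS computed from the scaled block is then smaller than the true value by the same factor of 2^s.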

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,208 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_std_f32.c    
+*    
+* Description:	Standard deviation of the elements of a floating-point vector.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @defgroup STD Standard deviation    
+ *    
+ * Calculates the standard deviation of the elements in the input vector.
+ * The following algorithm is used:
+ *   
+ * <pre>    
+ * 	Result = sqrt((sumOfSquares - sum<sup>2</sup> / blockSize) / (blockSize - 1))   
+ *   
+ *	   where, sumOfSquares = pSrc[0] * pSrc[0] + pSrc[1] * pSrc[1] + ... + pSrc[blockSize-1] * pSrc[blockSize-1]   
+ *   
+ *	                   sum = pSrc[0] + pSrc[1] + pSrc[2] + ... + pSrc[blockSize-1]   
+ * </pre>   
+ *    
+ * There are separate functions for floating point, Q31, and Q15 data types.    
+ */
+
+/**    
+ * @addtogroup STD    
+ * @{    
+ */
+
+
+/**    
+ * @brief Standard deviation of the elements of a floating-point vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult standard deviation value returned here    
+ * @return none.    
+ *    
+ */
+
+
+void arm_std_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult)
+{
+  float32_t sum = 0.0f;                          /* Temporary result storage */
+  float32_t sumOfSquares = 0.0f;                 /* Sum of squares */
+  float32_t in;                                  /* input value */
+  uint32_t blkCnt;                               /* loop counter */
+   
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t meanOfSquares, mean, squareOfMean;
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1])  */
+    /* Accumulate the sum and the sum of squares of the input samples
+     * in the temporary variables sum and sumOfSquares. */
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Accumulate the sum and the sum of squares of the input samples
+     * in the temporary variables sum and sumOfSquares. */
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples    
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = sumOfSquares / ((float32_t) blockSize - 1.0f);
+
+  /* Compute mean of all input values */
+  mean = sum / (float32_t) blockSize;
+
+  /* Compute square of mean */
+  squareOfMean = (mean * mean) * (((float32_t) blockSize) /
+                                  ((float32_t) blockSize - 1.0f));
+
+  /* Compute standard deviation and then store the result to the destination */
+  arm_sqrt_f32((meanOfSquares - squareOfMean), pResult);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  float32_t squareOfSum;                         /* Square of Sum */
+  float32_t var;                                 /* Temporary variance storage */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute Sum of squares of the input samples     
+     * and then store the result in a temporary variable, sumOfSquares. */
+    in = *pSrc++;
+    sumOfSquares += in * in;
+
+    /* C = (A[0] + A[1] + ... + A[blockSize-1]) */
+    /* Compute Sum of the input samples     
+     * and then store the result in a temporary variable, sum. */
+    sum += in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute the square of sum */
+  squareOfSum = ((sum * sum) / (float32_t) blockSize);
+
+  /* Compute the variance */
+  var = ((sumOfSquares - squareOfSum) / (float32_t) (blockSize - 1.0f));
+
+  /* Compute standard deviation and then store the result to the destination */
+  arm_sqrt_f32(var, pResult);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of STD group    
+ */
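
The two branches of arm_std_f32 arrange the arithmetic differently but compute the same sample variance before the final square root. The sketch below restates both forms side by side so the equivalence can be checked numerically; `var_mean_form` and `var_sum_form` are hypothetical names introduced only for this illustration, and the square root is left out so the example needs no math library.

```c
#include <assert.h>
#include <stddef.h>

/* Form used in the Cortex-M4/M3 branch:
 * sumOfSquares/(N-1) - mean^2 * N/(N-1). */
static float var_mean_form(const float *src, size_t n)
{
    float sum = 0.0f, sumOfSquares = 0.0f;
    float meanOfSquares, mean, squareOfMean;

    assert(n > 1);
    for (size_t i = 0; i < n; i++) {
        sum += src[i];
        sumOfSquares += src[i] * src[i];
    }
    meanOfSquares = sumOfSquares / ((float)n - 1.0f);
    mean = sum / (float)n;
    squareOfMean = (mean * mean) * ((float)n / ((float)n - 1.0f));
    return meanOfSquares - squareOfMean;
}

/* Form used in the Cortex-M0 branch:
 * (sumOfSquares - sum^2/N) / (N-1). */
static float var_sum_form(const float *src, size_t n)
{
    float sum = 0.0f, sumOfSquares = 0.0f;

    assert(n > 1);
    for (size_t i = 0; i < n; i++) {
        sum += src[i];
        sumOfSquares += src[i] * src[i];
    }
    return (sumOfSquares - (sum * sum) / (float)n) / ((float)n - 1.0f);
}
```

For the samples {2, 4, 4, 4, 5, 5, 7, 9} both forms give 32/7, up to floating-point rounding. Both subtract two nearly equal quantities when the mean is large relative to the spread, so either form can lose precision on such data.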

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,195 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_std_q15.c    
+*    
+* Description:	Standard deviation of an array of Q15 type.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup STD    
+ * @{    
+ */
+
+/**    
+ * @brief Standard deviation of the elements of a Q15 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult standard deviation value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * The input is represented in 1.15 format.   
+ * Intermediate multiplication yields a 2.30 format, and this    
+ * result is added without saturation to a 64-bit accumulator in 34.30 format.    
+ * With 33 guard bits in the accumulator, there is no risk of overflow, and the    
+ * full precision of the intermediate multiplication is preserved.    
+ * Finally, the 34.30 result is truncated to 34.15 format by discarding the lower     
+ * 15 bits, and then saturated to yield a result in 1.15 format.    
+ */
+
+void arm_std_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult)
+{
+  q31_t sum = 0;                                 /* Accumulator */
+  q31_t meanOfSquares, squareOfMean;             /* mean of squares and square of mean */
+  uint32_t blkCnt;                               /* loop counter */
+  q63_t sumOfSquares = 0;                        /* Accumulator */
+   
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t in;                                      /* input value */
+  q15_t in1;                                     /* input value */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1])  */
+    /* Accumulate the sum and the sum of squares of the input samples
+     * in the temporary variables sum and sumOfSquares. */
+    in = *__SIMD32(pSrc)++;
+    sum += ((in << 16) >> 16);
+    sum += (in >> 16);
+    sumOfSquares = __SMLALD(in, in, sumOfSquares);
+    in = *__SIMD32(pSrc)++;
+    sum += ((in << 16) >> 16);
+    sum += (in >> 16);
+    sumOfSquares = __SMLALD(in, in, sumOfSquares);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Accumulate the sum and the sum of squares of the input samples
+     * in the temporary variables sum and sumOfSquares. */
+    in1 = *pSrc++;
+    sumOfSquares = __SMLALD(in1, in1, sumOfSquares);
+    sum += in1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples    
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = (q31_t)(sumOfSquares / (q63_t)(blockSize - 1));
+
+  /* Compute square of mean */
+  squareOfMean = (q31_t) ((q63_t)sum * sum / (q63_t)(blockSize * (blockSize - 1)));
+
+  /* mean of the squares minus the square of the mean. */
+  /* Compute standard deviation and store the result to the destination */
+  arm_sqrt_q15(__SSAT((meanOfSquares - squareOfMean) >> 15, 16u), pResult);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q15_t in;                                      /* input value */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute Sum of squares of the input samples     
+     * and then store the result in a temporary variable, sumOfSquares. */
+    in = *pSrc++;
+    sumOfSquares += (in * in);
+
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    /* Compute sum of all input values and then store the result in a temporary variable, sum. */
+    sum += in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples     
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = (q31_t)(sumOfSquares / (q63_t)(blockSize - 1));
+
+  /* Compute square of mean */
+  squareOfMean = (q31_t) ((q63_t)sum * sum / (q63_t)(blockSize * (blockSize - 1)));
+
+  /* mean of the squares minus the square of the mean. */
+  /* Compute standard deviation and store the result to the destination */
+  arm_sqrt_q15(__SSAT((meanOfSquares - squareOfMean) >> 15, 16u), pResult);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+}
+
+/**    
+ * @} end of STD group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_std_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,186 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_std_q31.c    
+*    
+* Description:	Standard deviation of an array of Q31 type.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup STD    
+ * @{    
+ */
+
+
+/**    
+ * @brief Standard deviation of the elements of a Q31 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult standard deviation value returned here    
+ * @return none.    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ *\par    
+ * The function is implemented using an internal 64-bit accumulator.        
+ * The input is represented in 1.31 format, which is then downshifted by 8 bits
+ * which yields 1.23, and intermediate multiplication yields a 2.46 format.        
+ * The accumulator maintains full precision of the intermediate multiplication results,         
+ * but provides only 16 guard bits.
+ * There is no saturation on intermediate additions.        
+ * If the accumulator overflows it wraps around and distorts the result.        
+ * In order to avoid overflows completely the input signal must be scaled down by         
+ * log2(blockSize)-8 bits, as a total of blockSize additions are performed internally.  
+ * After division, internal variables should be in Q18.46 format.
+ * Finally, the 18.46 accumulator is right shifted by 15 bits to yield a 1.31 format value. 
+ *    
+ */
+
+
+void arm_std_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult)
+{
+  q63_t sum = 0;                                 /* Accumulator */
+  q63_t meanOfSquares, squareOfMean;             /* mean of squares and square of mean */
+  q31_t in;                                      /* input value */
+  uint32_t blkCnt;                               /* loop counter */
+  q63_t sumOfSquares = 0;                        /* Accumulator */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+   
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Process 4 samples at a time.
+   ** A second loop below processes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1])  */
+    /* Compute the sum and the sum of squares of the input samples
+     * and store them in the temporary variables sum and sumOfSquares. */
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute the sum and the sum of squares of the input samples
+     * and store them in the temporary variables sum and sumOfSquares. */
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples    
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = sumOfSquares / (q63_t)(blockSize - 1);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute Sum of squares of the input samples     
+     * and then store the result in a temporary variable, sumOfSquares. */
+    in = *pSrc++ >> 8;
+    sumOfSquares += ((q63_t) (in) * (in));
+
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    /* Compute sum of all input values and then store the result in a temporary variable, sum. */
+    sum += in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples     
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = sumOfSquares / (q63_t)(blockSize - 1);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  /* Compute square of mean */
+  squareOfMean = sum * sum / (q63_t)(blockSize * (blockSize - 1u));
+
+  /* Compute standard deviation and then store the result to the destination */
+  arm_sqrt_q31((meanOfSquares - squareOfMean) >> 15, pResult);
+
+}
+
+/**    
+ * @} end of STD group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,204 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_var_f32.c    
+*    
+* Description:	Variance of the elements of a floating-point vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @defgroup variance  Variance    
+ *    
+ * Calculates the variance of the elements in the input vector.    
+ * The following algorithm is used:
+ *    
+ * <pre>    
+ * 	Result = (sumOfSquares - sum<sup>2</sup> / blockSize) / (blockSize - 1)   
+ *   
+ *	   where, sumOfSquares = pSrc[0] * pSrc[0] + pSrc[1] * pSrc[1] + ... + pSrc[blockSize-1] * pSrc[blockSize-1]   
+ *   
+ *	                   sum = pSrc[0] + pSrc[1] + pSrc[2] + ... + pSrc[blockSize-1]   
+ * </pre>   
+ *    
+ * There are separate functions for floating point, Q31, and Q15 data types.    
+ */
+
+/**    
+ * @addtogroup variance    
+ * @{    
+ */
+
+
+/**    
+ * @brief Variance of the elements of a floating-point vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult variance value returned here    
+ * @return none.    
+ *    
+ */
+
+
+void arm_var_f32(
+  float32_t * pSrc,
+  uint32_t blockSize,
+  float32_t * pResult)
+{
+
+  float32_t sum = 0.0f;                          /* Temporary result storage */
+  float32_t sumOfSquares = 0.0f;                 /* Sum of squares */
+  float32_t in;                                  /* input value */
+  uint32_t blkCnt;                               /* loop counter */
+  
+#ifndef ARM_MATH_CM0_FAMILY
+   
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  float32_t meanOfSquares, mean, squareOfMean;   /* Temporary variables */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Process 4 samples at a time.
+   ** A second loop below processes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1])  */
+    /* Compute the sum and the sum of squares of the input samples
+     * and store them in the temporary variables sum and sumOfSquares. */
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute the sum and the sum of squares of the input samples
+     * and store them in the temporary variables sum and sumOfSquares. */
+    in = *pSrc++;
+    sum += in;
+    sumOfSquares += in * in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples    
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = sumOfSquares / ((float32_t) blockSize - 1.0f);
+
+  /* Compute mean of all input values */
+  mean = sum / (float32_t) blockSize;
+
+  /* Compute square of mean */
+  squareOfMean = (mean * mean) * (((float32_t) blockSize) /
+                                  ((float32_t) blockSize - 1.0f));
+
+  /* Compute variance and then store the result to the destination */
+  *pResult = meanOfSquares - squareOfMean;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  float32_t squareOfSum;                         /* Square of Sum */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute Sum of squares of the input samples     
+     * and then store the result in a temporary variable, sumOfSquares. */
+    in = *pSrc++;
+    sumOfSquares += in * in;
+
+    /* C = (A[0] + A[1] + ... + A[blockSize-1]) */
+    /* Compute Sum of the input samples     
+     * and then store the result in a temporary variable, sum. */
+    sum += in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute the square of sum */
+  squareOfSum = ((sum * sum) / (float32_t) blockSize);
+
+  /* Compute the variance */
+  *pResult = ((sumOfSquares - squareOfSum) / (float32_t) (blockSize - 1.0f));
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of variance group    
+ */
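
The formula from the variance group documentation above can be checked with a small portable sketch. This is not the CMSIS implementation: `sample_variance` is a hypothetical helper, plain `float` stands in for `float32_t`, and the code assumes only the standard C library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of CMSIS-DSP: unbiased sample variance
 * computed as (sumOfSquares - sum^2 / n) / (n - 1). */
float sample_variance(const float *src, size_t n)
{
    float sum = 0.0f;            /* running sum of the samples */
    float sum_of_squares = 0.0f; /* running sum of the squared samples */
    size_t i;

    assert(src != NULL && n > 0);

    if (n == 1) {
        return 0.0f;             /* same special case as arm_var_f32 */
    }

    for (i = 0; i < n; i++) {
        sum += src[i];
        sum_of_squares += src[i] * src[i];
    }

    return (sum_of_squares - (sum * sum) / (float) n) / (float) (n - 1);
}
```

For the input {1, 2, 3, 4} the sum is 10 and the sum of squares is 30, so the result is (30 - 100/4) / 3, roughly 1.667.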

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,195 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_var_q15.c    
+*    
+* Description:	Variance of an array of Q15 type.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupStats    
+ */
+
+/**    
+ * @addtogroup variance    
+ * @{    
+ */
+
+/**    
+ * @brief Variance of the elements of a Q15 vector.    
+ * @param[in]       *pSrc points to the input vector    
+ * @param[in]       blockSize length of the input vector    
+ * @param[out]      *pResult variance value returned here    
+ * @return none.    
+ *    
+ * @details    
+ * <b>Scaling and Overflow Behavior:</b>    
+ *    
+ * \par    
+ * The function is implemented using a 64-bit internal accumulator.    
+ * The input is represented in 1.15 format.   
+ * Intermediate multiplication yields a 2.30 format, and this    
+ * result is added without saturation to a 64-bit accumulator in 34.30 format.    
+ * With 33 guard bits in the accumulator, there is no risk of overflow, and the    
+ * full precision of the intermediate multiplication is preserved.    
+ * Finally, the 34.30 result is truncated to 34.15 format by discarding the lower     
+ * 15 bits, and then saturated to yield a result in 1.15 format.    
+ *    
+ */
+
+
+void arm_var_q15(
+  q15_t * pSrc,
+  uint32_t blockSize,
+  q15_t * pResult)
+{
+
+  q31_t sum = 0;                                 /* Accumulator */
+  q31_t meanOfSquares, squareOfMean;             /* mean of squares and square of mean */
+  uint32_t blkCnt;                               /* loop counter */
+  q63_t sumOfSquares = 0;                        /* Accumulator */
+   
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t in;                                      /* input value */
+  q15_t in1;                                     /* input value */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Process 4 samples at a time.
+   ** A second loop below processes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1])  */
+    /* Compute the sum and the sum of squares of the input samples
+     * and store them in the temporary variables sum and sumOfSquares. */
+    in = *__SIMD32(pSrc)++;
+    sum += ((in << 16) >> 16);
+    sum += (in >> 16);
+    sumOfSquares = __SMLALD(in, in, sumOfSquares);
+    in = *__SIMD32(pSrc)++;
+    sum += ((in << 16) >> 16);
+    sum += (in >> 16);
+    sumOfSquares = __SMLALD(in, in, sumOfSquares);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute the sum and the sum of squares of the input samples
+     * and store them in the temporary variables sum and sumOfSquares. */
+    in1 = *pSrc++;
+    sumOfSquares += (q63_t) ((q31_t) in1 * in1);   /* single sample: plain multiply; __SMLALD would also square the sign-extended upper halfword */
+    sum += in1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples    
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = (q31_t) (sumOfSquares / (q63_t)(blockSize - 1));
+
+  /* Compute square of mean */
+  squareOfMean = (q31_t)((q63_t)sum * sum / (q63_t)(blockSize * (blockSize - 1)));
+
+  /* mean of the squares minus the square of the mean. */
+  *pResult = (meanOfSquares - squareOfMean) >> 15;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+  q15_t in;                                      /* input value */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute Sum of squares of the input samples     
+     * and then store the result in a temporary variable, sumOfSquares. */
+    in = *pSrc++;
+    sumOfSquares += (in * in);
+
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    /* Compute sum of all input values and then store the result in a temporary variable, sum. */
+    sum += in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples     
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = (q31_t) (sumOfSquares / (q63_t)(blockSize - 1));
+
+  /* Compute square of mean */
+  squareOfMean = (q31_t)((q63_t)sum * sum / (q63_t)(blockSize * (blockSize - 1)));
+
+  /* mean of the squares minus the square of the mean. */
+  *pResult = (meanOfSquares - squareOfMean) >> 15;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of variance group    
+ */
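
The Q15 scaling described above (Q15 inputs, Q30 squares, a final shift by 15 back to Q15) can be traced with a portable reference. This is a sketch under stated assumptions, not the CMSIS code: `var_q15_ref` is a hypothetical name, `int16_t`/`int32_t`/`int64_t` stand in for `q15_t`/`q31_t`/`q63_t`, no SIMD intrinsics are used, and the divisor `n * (n - 1)` is formed in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable reference, not part of CMSIS-DSP. */
int16_t var_q15_ref(const int16_t *src, uint32_t n)
{
    int32_t sum = 0;             /* sum of the Q15 samples */
    int64_t sum_of_squares = 0;  /* sum of the Q30 squares */
    int32_t mean_of_squares;
    int32_t square_of_mean;
    uint32_t i;

    assert(src != NULL && n > 0u);

    if (n == 1u) {
        return 0;                /* same special case as arm_var_q15 */
    }

    for (i = 0u; i < n; i++) {
        sum += src[i];
        sum_of_squares += (int32_t) src[i] * src[i];
    }

    /* Both terms are in Q30 and share the (n - 1) divisor. */
    mean_of_squares = (int32_t) (sum_of_squares / (int64_t) (n - 1u));
    square_of_mean = (int32_t) (((int64_t) sum * sum) / ((int64_t) n * (n - 1u)));

    /* Q30 -> Q15 */
    return (int16_t) ((mean_of_squares - square_of_mean) >> 15);
}
```

With the two samples 0.5 and -0.5 (16384 and -16384 in Q15) the sum is 0 and the sum of squares is 2^29, so the result is 2^29 >> 15 = 16384, which is 0.5 in Q15 and matches the sample variance of the real-valued inputs.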

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/StatisticsFunctions/arm_var_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,187 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_var_q31.c    
+*    
+* Description:	Variance of an array of Q31 type.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**        
+ * @ingroup groupStats        
+ */
+
+/**        
+ * @addtogroup variance        
+ * @{        
+ */
+
+/**        
+ * @brief Variance of the elements of a Q31 vector.        
+ * @param[in]       *pSrc points to the input vector        
+ * @param[in]       blockSize length of the input vector        
+ * @param[out]      *pResult variance value returned here        
+ * @return none.        
+ *        
+ * @details        
+ * <b>Scaling and Overflow Behavior:</b>        
+ *        
+ *\par        
+ * The function is implemented using an internal 64-bit accumulator.        
+ * The input is represented in 1.31 format, which is then downshifted by 8 bits
+ * which yields 1.23, and intermediate multiplication yields a 2.46 format.        
+ * The accumulator maintains full precision of the intermediate multiplication results,         
+ * but provides only 16 guard bits.
+ * There is no saturation on intermediate additions.        
+ * If the accumulator overflows it wraps around and distorts the result.        
+ * In order to avoid overflows completely the input signal must be scaled down by         
+ * log2(blockSize)-8 bits, as a total of blockSize additions are performed internally.  
+ * After division, internal variables should be in Q18.46 format.
+ * Finally, the 18.46 accumulator is right shifted by 15 bits to yield a 1.31 format value.        
+ *        
+ */
+
+
+void arm_var_q31(
+  q31_t * pSrc,
+  uint32_t blockSize,
+  q31_t * pResult)
+{
+  q63_t sum = 0;                                 /* Accumulator */
+  q63_t meanOfSquares, squareOfMean;             /* mean of squares and square of mean */
+  q31_t in;                                      /* input value */
+  uint32_t blkCnt;                               /* loop counter */
+  q63_t sumOfSquares = 0;                        /* Accumulator */
+
+	if(blockSize == 1)
+	{
+		*pResult = 0;
+		return;
+	}
+   
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Process 4 samples at a time.
+   ** A second loop below processes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1])  */
+    /* Compute the sum and the sum of squares of the input samples
+     * and store them in the temporary variables sum and sumOfSquares. */
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute the sum and the sum of squares of the input samples
+     * and store them in the temporary variables sum and sumOfSquares. */
+    in = *pSrc++ >> 8;
+    sum += in;
+    sumOfSquares += ((q63_t) (in) * (in));
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples    
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = sumOfSquares / (q63_t)(blockSize - 1);
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+    /* C = (A[0] * A[0] + A[1] * A[1] + ... + A[blockSize-1] * A[blockSize-1]) */
+    /* Compute Sum of squares of the input samples     
+     * and then store the result in a temporary variable, sumOfSquares. */
+    in = *pSrc++ >> 8;
+    sumOfSquares += ((q63_t) (in) * (in));
+
+    /* C = (A[0] + A[1] + A[2] + ... + A[blockSize-1]) */
+    /* Compute sum of all input values and then store the result in a temporary variable, sum. */
+    sum += in;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* Compute Mean of squares of the input samples     
+   * and then store the result in a temporary variable, meanOfSquares. */
+  meanOfSquares = sumOfSquares / (q63_t)(blockSize - 1);
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  /* Compute square of mean */
+  squareOfMean = sum * sum / (q63_t)(blockSize * (blockSize - 1u));
+
+
+  /* Compute variance and then store the result to the destination */
+  *pResult = (meanOfSquares - squareOfMean) >> 15;
+
+}
+
+/**        
+ * @} end of variance group        
+ */
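
The format chain in the Q31 documentation above (1.31 input, shifted down to 1.23, squared to 2.46, shifted right by 15 to land on 31 fractional bits) can be followed in a portable reference. This is only a sketch: `var_q31_ref` is a hypothetical name, `int32_t`/`int64_t` stand in for `q31_t`/`q63_t`, and it assumes, as the original code does, that `>>` on a negative value is an arithmetic shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable reference, not part of CMSIS-DSP. */
int32_t var_q31_ref(const int32_t *src, uint32_t n)
{
    int64_t sum = 0;             /* sum of the 1.23 samples */
    int64_t sum_of_squares = 0;  /* sum of the 2.46 squares */
    int64_t mean_of_squares;
    int64_t square_of_mean;
    uint32_t i;

    assert(src != NULL && n > 0u);

    if (n == 1u) {
        return 0;                /* same special case as arm_var_q31 */
    }

    for (i = 0u; i < n; i++) {
        int32_t in = src[i] >> 8;            /* 1.31 -> 1.23 */
        sum += in;
        sum_of_squares += (int64_t) in * in; /* 46 fractional bits */
    }

    mean_of_squares = sum_of_squares / (int64_t) (n - 1u);
    square_of_mean = (sum * sum) / ((int64_t) n * (n - 1u));

    /* 46 fractional bits -> 31 fractional bits */
    return (int32_t) ((mean_of_squares - square_of_mean) >> 15);
}
```

For the samples 0.5 and -0.5 (0x40000000 and -0x40000000) each shifted input has magnitude 2^22, the sum of squares is 2^45, and the result is 2^45 >> 15 = 2^30, which is 0.5 in Q31.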

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,135 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_copy_f32.c    
+*    
+* Description:	Copies the elements of a floating-point vector.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @defgroup copy Vector Copy    
+ *    
+ * Copies sample by sample from source vector to destination vector.    
+ *    
+ * <pre>    
+ * 	pDst[n] = pSrc[n];   0 <= n < blockSize.    
+ * </pre>    
+ *   
+ * There are separate functions for floating point, Q31, Q15, and Q7 data types.     
+ */
+
+/**    
+ * @addtogroup copy    
+ * @{    
+ */
+
+/**    
+ * @brief Copies the elements of a floating-point vector.     
+ * @param[in]       *pSrc points to input vector    
+ * @param[out]      *pDst points to output vector    
+ * @param[in]       blockSize length of the input vector   
+ * @return none.    
+ *    
+ */
+
+
+void arm_copy_f32(
+  float32_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t in1, in2, in3, in4;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Copy 4 samples at a time.
+   ** A second loop below copies the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A */
+    /* Copy and then store the results in the destination buffer */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+    in3 = *pSrc++;
+    in4 = *pSrc++;
+
+    *pDst++ = in1;
+    *pDst++ = in2;
+    *pDst++ = in3;
+    *pDst++ = in4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A */
+    /* Copy and then store the results in the destination buffer */
+    *pDst++ = *pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of copy group
+ */
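
The unroll-by-four pattern used throughout these files (a main loop over `blockSize >> 2` groups, then a tail loop over the remaining `blockSize % 4` samples) can be isolated in a portable sketch. `copy_unrolled` is a hypothetical name, and plain `float` is assumed in place of `float32_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of CMSIS-DSP: copies n floats from src to
 * dst, four per iteration, then handles the 0 to 3 leftover samples. */
void copy_unrolled(const float *src, float *dst, uint32_t n)
{
    uint32_t blocks = n >> 2u;   /* number of complete groups of four */
    uint32_t tail = n & 3u;      /* leftover samples, same as n % 4 */

    assert(n == 0u || (src != NULL && dst != NULL));

    while (blocks > 0u) {
        /* Read four samples first, then write them. */
        float in1 = *src++;
        float in2 = *src++;
        float in3 = *src++;
        float in4 = *src++;

        *dst++ = in1;
        *dst++ = in2;
        *dst++ = in3;
        *dst++ = in4;

        blocks--;
    }

    while (tail > 0u) {
        *dst++ = *src++;
        tail--;
    }
}
```

With n = 7 the main loop runs once and copies four samples, and the tail loop copies the remaining three.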

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,114 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_copy_q15.c    
+*    
+* Description:	Copies the elements of a Q15 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup copy    
+ * @{    
+ */
+/**    
+ * @brief Copies the elements of a Q15 vector.     
+ * @param[in]       *pSrc points to input vector    
+ * @param[out]      *pDst points to output vector    
+ * @param[in]       blockSize length of the input vector   
+ * @return none.    
+ *    
+ */
+
+void arm_copy_q15(
+  q15_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A */
+    /* Copy four Q15 samples, two per 32-bit SIMD access */
+    *__SIMD32(pDst)++ = *__SIMD32(pSrc)++;
+    *__SIMD32(pDst)++ = *__SIMD32(pSrc)++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A */
+    /* Copy and then store the value in the destination buffer */
+    *pDst++ = *pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of copy group
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,123 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_copy_q31.c    
+*    
+* Description:	Copies the elements of a Q31 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup copy    
+ * @{    
+ */
+
+/**    
+ * @brief Copies the elements of a Q31 vector.     
+ * @param[in]       *pSrc points to input vector    
+ * @param[out]      *pDst points to output vector    
+ * @param[in]       blockSize length of the input vector   
+ * @return none.    
+ *    
+ */
+
+void arm_copy_q31(
+  q31_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2, in3, in4;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A */
+    /* Copy and then store the values in the destination buffer */
+    in1 = *pSrc++;
+    in2 = *pSrc++;
+    in3 = *pSrc++;
+    in4 = *pSrc++;
+
+    *pDst++ = in1;
+    *pDst++ = in2;
+    *pDst++ = in3;
+    *pDst++ = in4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = A */
+    /* Copy and then store the value in the destination buffer */
+    *pDst++ = *pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of copy group
+ */
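
The unrolled structure used by arm_copy_q31 above (groups of four samples, then a tail loop for the remainder) can be illustrated with a portable sketch. The name `copy_q31_sketch` is hypothetical and not part of CMSIS-DSP; `int32_t` stands in for `q31_t`, and the `const` qualifier on the source is an addition of this sketch.

```c
#include <stdint.h>

/* Hypothetical portable sketch of the unrolled copy; not the library code. */
static void copy_q31_sketch(const int32_t *src, int32_t *dst, uint32_t n)
{
    uint32_t blk = n >> 2;              /* number of full groups of four */

    while (blk > 0u) {
        /* Load four samples first, then store them */
        int32_t a = *src++;
        int32_t b = *src++;
        int32_t c = *src++;
        int32_t d = *src++;

        *dst++ = a;
        *dst++ = b;
        *dst++ = c;
        *dst++ = d;

        blk--;
    }

    blk = n % 4u;                       /* 0 to 3 leftover samples */

    while (blk > 0u) {
        *dst++ = *src++;
        blk--;
    }
}
```

With n = 6, the first loop runs once (four samples) and the tail loop copies the remaining two.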

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_copy_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,115 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_copy_q7.c    
+*    
+* Description:	Copies the elements of a Q7 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup copy    
+ * @{    
+ */
+
+/**    
+ * @brief Copies the elements of a Q7 vector.    
+ * @param[in]       *pSrc points to input vector    
+ * @param[out]      *pDst points to output vector    
+ * @param[in]       blockSize length of the input vector   
+ * @return none.    
+ *    
+ */
+
+void arm_copy_q7(
+  q7_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = A */
+    /* Copy and then store the results in the destination buffer */
+    /* 4 samples are copied and stored at a time using SIMD */
+    *__SIMD32(pDst)++ = *__SIMD32(pSrc)++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = A */
+    /* Copy and then store the results in the destination buffer */
+    *pDst++ = *pSrc++;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of copy group
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,134 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_fill_f32.c    
+*    
+* Description:	Fills a constant value into a floating-point vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @defgroup Fill Vector Fill    
+ *    
+ * Fills the destination vector with a constant value.    
+ *    
+ * <pre>    
+ * 	pDst[n] = value;   0 <= n < blockSize.    
+ * </pre>    
+ *   
+ * There are separate functions for floating point, Q31, Q15, and Q7 data types.     
+ */
+
+/**    
+ * @addtogroup Fill    
+ * @{    
+ */
+
+/**    
+ * @brief Fills a constant value into a floating-point vector.     
+ * @param[in]       value input value to be filled   
+ * @param[out]      *pDst points to output vector    
+ * @param[in]       blockSize length of the output vector   
+ * @return none.    
+ *    
+ */
+
+
+void arm_fill_f32(
+  float32_t value,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  float32_t in1 = value;
+  float32_t in2 = value;
+  float32_t in3 = value;
+  float32_t in4 = value;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = value */
+    /* Fill the value in the destination buffer */
+    *pDst++ = in1;
+    *pDst++ = in2;
+    *pDst++ = in3;
+    *pDst++ = in4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+
+  while(blkCnt > 0u)
+  {
+    /* C = value */
+    /* Fill the value in the destination buffer */
+    *pDst++ = value;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of Fill group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,120 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_fill_q15.c    
+*    
+* Description:	Fills a constant value into a Q15 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup Fill    
+ * @{    
+ */
+
+/**    
+ * @brief Fills a constant value into a Q15 vector.    
+ * @param[in]       value input value to be filled   
+ * @param[out]      *pDst points to output vector    
+ * @param[in]       blockSize length of the output vector   
+ * @return none.    
+ *    
+ */
+
+void arm_fill_q15(
+  q15_t value,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t packedValue;                             /* value packed to 32 bits */
+
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* Packing two 16 bit values to 32 bit value in order to use SIMD */
+  packedValue = __PKHBT(value, value, 16u);
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = value */
+    /* Fill the value in the destination buffer */
+    *__SIMD32(pDst)++ = packedValue;
+    *__SIMD32(pDst)++ = packedValue;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = value */
+    /* Fill the value in the destination buffer */
+    *pDst++ = value;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of Fill group    
+ */
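
As a rough illustration of the packing step in arm_fill_q15 above, the sketch below builds the 32-bit word by hand instead of using `__PKHBT` and writes it with `memcpy` instead of `__SIMD32`, so it runs on any host. `pack_q15_pair` and `fill_q15_sketch` are hypothetical names, and `int16_t` stands in for `q15_t`; this is a sketch under those assumptions, not the library implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __PKHBT(lo, hi, 16): lo in bits 0..15, hi in bits 16..31. */
static uint32_t pack_q15_pair(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t) lo) | ((uint32_t)(uint16_t) hi << 16);
}

/* Hypothetical portable sketch of the fill loop; not the library code. */
static void fill_q15_sketch(int16_t value, int16_t *dst, uint32_t n)
{
    uint32_t packed = pack_q15_pair(value, value);  /* two copies of value */
    uint32_t blk = n >> 2;                          /* groups of four samples */

    while (blk > 0u) {
        /* Two 32-bit stores write four Q15 samples */
        memcpy(dst, &packed, sizeof packed);
        dst += 2;
        memcpy(dst, &packed, sizeof packed);
        dst += 2;
        blk--;
    }

    blk = n % 4u;                                   /* 0 to 3 leftover samples */

    while (blk > 0u) {
        *dst++ = value;
        blk--;
    }
}
```

Because both halves of the packed word hold the same value, the result does not depend on byte order.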

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,121 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_fill_q31.c    
+*    
+* Description:	Fills a constant value into a Q31 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup Fill    
+ * @{    
+ */
+
+/**    
+ * @brief Fills a constant value into a Q31 vector.    
+ * @param[in]       value input value to be filled   
+ * @param[out]      *pDst points to output vector    
+ * @param[in]       blockSize length of the output vector   
+ * @return none.    
+ *    
+ */
+
+void arm_fill_q31(
+  q31_t value,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1 = value;
+  q31_t in2 = value;
+  q31_t in3 = value;
+  q31_t in4 = value;
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = value */
+    /* Fill the value in the destination buffer */
+    *pDst++ = in1;
+    *pDst++ = in2;
+    *pDst++ = in3;
+    *pDst++ = in4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = value */
+    /* Fill the value in the destination buffer */
+    *pDst++ = value;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of Fill group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_fill_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,118 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_fill_q7.c    
+*    
+* Description:	Fills a constant value into a Q7 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup Fill    
+ * @{    
+ */
+
+/**    
+ * @brief Fills a constant value into a Q7 vector.    
+ * @param[in]       value input value to be filled   
+ * @param[out]      *pDst points to output vector    
+ * @param[in]       blockSize length of the output vector   
+ * @return none.    
+ *    
+ */
+
+void arm_fill_q7(
+  q7_t value,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  q31_t packedValue;                             /* value packed to 32 bits */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* Packing four 8 bit values to 32 bit value in order to use SIMD */
+  packedValue = __PACKq7(value, value, value, value);
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = value */
+    /* Fill the value in the destination buffer */
+    *__SIMD32(pDst)++ = packedValue;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = value */
+    /* Fill the value in the destination buffer */
+    *pDst++ = value;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of Fill group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,204 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_float_to_q15.c    
+*    
+* Description:	Converts the elements of the floating-point vector to Q15 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup float_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the floating-point vector to Q15 vector.    
+ * @param[in]       *pSrc points to the floating-point input vector    
+ * @param[out]      *pDst points to the Q15 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ * \par Description:    
+ * \par   
+ * The equation used for the conversion process is:    
+ * <pre>    
+ * 	pDst[n] = (q15_t)(pSrc[n] * 32768);   0 <= n < blockSize.    
+ * </pre>    
+ * \par Scaling and Overflow Behavior:    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.    
+ * \note   
+ * In order to apply rounding, the library should be rebuilt with the ARM_MATH_ROUNDING macro
+ * defined in the preprocessor section of project options.
+ *    
+ */
+
+
+void arm_float_to_q15(
+  float32_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t *pIn = pSrc;                         /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifdef ARM_MATH_ROUNDING
+
+  float32_t in;
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /* Loop unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+
+#ifdef ARM_MATH_ROUNDING
+    /* C = A * 32768 */
+    /* convert from float to q15 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 32768.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q15_t) (__SSAT((q31_t) (in), 16));
+
+    in = *pIn++;
+    in = (in * 32768.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q15_t) (__SSAT((q31_t) (in), 16));
+
+    in = *pIn++;
+    in = (in * 32768.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q15_t) (__SSAT((q31_t) (in), 16));
+
+    in = *pIn++;
+    in = (in * 32768.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q15_t) (__SSAT((q31_t) (in), 16));
+
+#else
+
+    /* C = A * 32768 */
+    /* convert from float to q15 and then store the results in the destination buffer */
+    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
+    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
+    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
+    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+
+#ifdef ARM_MATH_ROUNDING
+    /* C = A * 32768 */
+    /* convert from float to q15 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 32768.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q15_t) (__SSAT((q31_t) (in), 16));
+
+#else
+
+    /* C = A * 32768 */
+    /* convert from float to q15 and then store the results in the destination buffer */
+    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+
+#ifdef ARM_MATH_ROUNDING
+    /* C = A * 32768 */
+    /* convert from float to q15 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 32768.0f);
+    in += in > 0 ? 0.5f : -0.5f;
+    *pDst++ = (q15_t) (__SSAT((q31_t) (in), 16));
+
+#else
+
+    /* C = A * 32768 */
+    /* convert from float to q15 and then store the results in the destination buffer */
+    *pDst++ = (q15_t) __SSAT((q31_t) (*pIn++ * 32768.0f), 16);
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of float_to_x group    
+ */
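The float-to-Q15 conversion above relies on the `__SSAT` intrinsic, which is only available on ARM targets. As a rough illustration of the same arithmetic on a host machine, the sketch below scales by 32768, optionally rounds half away from zero (mirroring the `ARM_MATH_ROUNDING` path), and saturates. The name `ref_float_to_q15` and the `round_nearest` flag are hypothetical and not part of CMSIS-DSP; clamping in the float domain is a substitute for `__SSAT`, and NaN inputs are not handled.

```c
#include <stdint.h>

/* Hypothetical portable helper for illustration; not a CMSIS-DSP function. */
static int16_t ref_float_to_q15(float x, int round_nearest)
{
    float scaled = x * 32768.0f;            /* C = A * 32768 */

    if (round_nearest) {
        /* Round half away from zero, as in the ARM_MATH_ROUNDING path. */
        scaled += (scaled > 0.0f) ? 0.5f : -0.5f;
    }

    /* Clamp before the cast so the float-to-integer conversion is always
     * in range (this stands in for __SSAT(..., 16)). NaN is not handled. */
    if (scaled >= 32767.0f) {
        return INT16_MAX;
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;
    }
    return (int16_t)scaled;                 /* truncates toward zero */
}
```

Under these assumptions, `ref_float_to_q15(0.5f, 0)` yields 16384, and any input at or above 1.0 saturates to 32767 because +1.0 is not representable in Q15.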

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,211 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_float_to_q31.c    
+*    
+* Description:	Converts the elements of the floating-point vector to Q31 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @defgroup float_to_x  Convert 32-bit floating point value    
+ */
+
+/**    
+ * @addtogroup float_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the floating-point vector to Q31 vector.    
+ * @param[in]       *pSrc points to the floating-point input vector    
+ * @param[out]      *pDst points to the Q31 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ *\par Description:    
+ * \par   
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (q31_t)(pSrc[n] * 2147483648);   0 <= n < blockSize.    
+ * </pre>    
+ * <b>Scaling and Overflow Behavior:</b>    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q31 range[0x80000000 0x7FFFFFFF] will be saturated.    
+ *   
+ * \note In order to apply rounding, the library should be rebuilt with the ARM_MATH_ROUNDING macro     
+ * defined in the preprocessor section of project options.     
+ */
+
+
+void arm_float_to_q31(
+  float32_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t *pIn = pSrc;                         /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifdef ARM_MATH_ROUNDING
+
+  float32_t in;
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+
+#ifdef ARM_MATH_ROUNDING
+
+    /* C = A * 2147483648 */
+    /* convert from float to Q31 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 2147483648.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = clip_q63_to_q31((q63_t) (in));
+
+    in = *pIn++;
+    in = (in * 2147483648.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = clip_q63_to_q31((q63_t) (in));
+
+    in = *pIn++;
+    in = (in * 2147483648.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = clip_q63_to_q31((q63_t) (in));
+
+    in = *pIn++;
+    in = (in * 2147483648.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = clip_q63_to_q31((q63_t) (in));
+
+#else
+
+    /* C = A * 2147483648 */
+    /* convert from float to Q31 and then store the results in the destination buffer */
+    *pDst++ = clip_q63_to_q31((q63_t) (*pIn++ * 2147483648.0f));
+    *pDst++ = clip_q63_to_q31((q63_t) (*pIn++ * 2147483648.0f));
+    *pDst++ = clip_q63_to_q31((q63_t) (*pIn++ * 2147483648.0f));
+    *pDst++ = clip_q63_to_q31((q63_t) (*pIn++ * 2147483648.0f));
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+
+#ifdef ARM_MATH_ROUNDING
+
+    /* C = A * 2147483648 */
+    /* convert from float to Q31 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 2147483648.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = clip_q63_to_q31((q63_t) (in));
+
+#else
+
+    /* C = A * 2147483648 */
+    /* convert from float to Q31 and then store the results in the destination buffer */
+    *pDst++ = clip_q63_to_q31((q63_t) (*pIn++ * 2147483648.0f));
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+
+#ifdef ARM_MATH_ROUNDING
+
+    /* C = A * 2147483648 */
+    /* convert from float to Q31 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 2147483648.0f);
+    in += in > 0 ? 0.5f : -0.5f;
+    *pDst++ = clip_q63_to_q31((q63_t) (in));
+
+#else
+
+    /* C = A * 2147483648 */
+    /* convert from float to Q31 and then store the results in the destination buffer */
+    *pDst++ = clip_q63_to_q31((q63_t) (*pIn++ * 2147483648.0f));
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of float_to_x group    
+ */
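The Q31 variant above widens to a 64-bit intermediate and then clips with `clip_q63_to_q31`. A minimal host-side sketch of that idea follows; `ref_clip_i64_to_i32` and `ref_float_to_q31` are hypothetical names, not CMSIS-DSP functions. The extra range check before the cast is an addition of this sketch so that very large inputs do not overflow the 64-bit conversion; NaN inputs are not handled.

```c
#include <stdint.h>

/* Hypothetical stand-in for clip_q63_to_q31: saturate 64-bit to 32-bit. */
static int32_t ref_clip_i64_to_i32(int64_t v)
{
    if (v > INT32_MAX) {
        return INT32_MAX;
    }
    if (v < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)v;
}

/* Hypothetical portable helper for illustration (no rounding). */
static int32_t ref_float_to_q31(float x)
{
    float scaled = x * 2147483648.0f;       /* C = A * 2^31 */

    /* 2^63 is exactly representable as a float; values at or beyond it
     * cannot be converted to int64_t, so saturate them up front. */
    if (scaled >= 9223372036854775808.0f) {
        return INT32_MAX;
    }
    if (scaled <= -9223372036854775808.0f) {
        return INT32_MIN;
    }
    return ref_clip_i64_to_i32((int64_t)scaled);
}
```

Because a `float` carries a 24-bit significand, the low bits of the Q31 result are zero for inputs close to full scale; the conversion cannot add precision that the input does not have.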

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_float_to_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,203 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_float_to_q7.c    
+*    
+* Description:	Converts the elements of the floating-point vector to Q7 vector.   
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup float_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the floating-point vector to Q7 vector.    
+ * @param[in]       *pSrc points to the floating-point input vector    
+ * @param[out]      *pDst points to the Q7 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ *\par Description:    
+ * \par   
+ * The equation used for the conversion process is:    
+ * <pre>    
+ * 	pDst[n] = (q7_t)(pSrc[n] * 128);   0 <= n < blockSize.    
+ * </pre>    
+ * \par Scaling and Overflow Behavior:    
+ * \par    
+ * The function uses saturating arithmetic.    
+ * Results outside of the allowable Q7 range [0x80 0x7F] will be saturated.    
+ * \note   
+ * In order to apply rounding, the library should be rebuilt with the ARM_MATH_ROUNDING macro     
+ * defined in the preprocessor section of project options.     
+ */
+
+
+void arm_float_to_q7(
+  float32_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  float32_t *pIn = pSrc;                         /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifdef ARM_MATH_ROUNDING
+
+  float32_t in;
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+
+#ifdef ARM_MATH_ROUNDING
+    /* C = A * 128 */
+    /* convert from float to q7 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 128.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q7_t) (__SSAT((q31_t) (in), 8));
+
+    in = *pIn++;
+    in = (in * 128.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q7_t) (__SSAT((q31_t) (in), 8));
+
+    in = *pIn++;
+    in = (in * 128.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q7_t) (__SSAT((q31_t) (in), 8));
+
+    in = *pIn++;
+    in = (in * 128.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q7_t) (__SSAT((q31_t) (in), 8));
+
+#else
+
+    /* C = A * 128 */
+    /* convert from float to q7 and then store the results in the destination buffer */
+    *pDst++ = __SSAT((q31_t) (*pIn++ * 128.0f), 8);
+    *pDst++ = __SSAT((q31_t) (*pIn++ * 128.0f), 8);
+    *pDst++ = __SSAT((q31_t) (*pIn++ * 128.0f), 8);
+    *pDst++ = __SSAT((q31_t) (*pIn++ * 128.0f), 8);
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+  while(blkCnt > 0u)
+  {
+
+#ifdef ARM_MATH_ROUNDING
+    /* C = A * 128 */
+    /* convert from float to q7 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 128.0f);
+    in += in > 0.0f ? 0.5f : -0.5f;
+    *pDst++ = (q7_t) (__SSAT((q31_t) (in), 8));
+
+#else
+
+    /* C = A * 128 */
+    /* convert from float to q7 and then store the results in the destination buffer */
+    *pDst++ = __SSAT((q31_t) (*pIn++ * 128.0f), 8);
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+  while(blkCnt > 0u)
+  {
+#ifdef ARM_MATH_ROUNDING
+    /* C = A * 128 */
+    /* convert from float to q7 and then store the results in the destination buffer */
+    in = *pIn++;
+    in = (in * 128.0f);
+    in += in > 0 ? 0.5f : -0.5f;
+    *pDst++ = (q7_t) (__SSAT((q31_t) (in), 8));
+
+#else
+
+    /* C = A * 128 */
+    /* convert from float to q7 and then store the results in the destination buffer */
+    *pDst++ = (q7_t) __SSAT((q31_t) (*pIn++ * 128.0f), 8);
+
+#endif /*      #ifdef ARM_MATH_ROUNDING        */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+}
+
+/**    
+ * @} end of float_to_x group    
+ */
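All of the conversion routines in this changeset share the same loop shape on Cortex-M3/M4: a main loop that handles four samples per iteration (`blockSize >> 2`), followed by a tail loop for the remaining `blockSize % 4` samples. The sketch below reproduces that structure for float-to-Q7 in portable C. The names `ref_float_to_q7` and `ref_float_to_q7_block` are hypothetical, and the float-domain clamp again stands in for `__SSAT(..., 8)`; NaN inputs are not handled.

```c
#include <stdint.h>

/* Hypothetical scalar helper: scale by 128, truncate, saturate to Q7. */
static int8_t ref_float_to_q7(float x)
{
    float scaled = x * 128.0f;              /* C = A * 128 */

    if (scaled >= 127.0f) {
        return INT8_MAX;
    }
    if (scaled <= -128.0f) {
        return INT8_MIN;
    }
    return (int8_t)scaled;
}

/* Hypothetical block version using the unroll-by-4 plus tail pattern. */
static void ref_float_to_q7_block(const float *src, int8_t *dst,
                                  uint32_t block_size)
{
    uint32_t blk_cnt = block_size >> 2u;    /* number of groups of four */

    while (blk_cnt > 0u) {
        *dst++ = ref_float_to_q7(*src++);
        *dst++ = ref_float_to_q7(*src++);
        *dst++ = ref_float_to_q7(*src++);
        *dst++ = ref_float_to_q7(*src++);
        blk_cnt--;
    }

    blk_cnt = block_size % 4u;              /* 0 to 3 leftover samples */

    while (blk_cnt > 0u) {
        *dst++ = ref_float_to_q7(*src++);
        blk_cnt--;
    }
}
```

With a block size of 6, the first loop runs once and the tail loop handles the last two samples, so every element is converted exactly once.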

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_float.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_float.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,134 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q15_to_float.c    
+*    
+* Description:	Converts the elements of the Q15 vector to floating-point vector.     
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @defgroup q15_to_x  Convert 16-bit Integer value    
+ */
+
+/**    
+ * @addtogroup q15_to_x    
+ * @{    
+ */
+
+
+
+
+/**    
+ * @brief  Converts the elements of the Q15 vector to floating-point vector.     
+ * @param[in]       *pSrc points to the Q15 input vector    
+ * @param[out]      *pDst points to the floating-point output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (float32_t) pSrc[n] / 32768;   0 <= n < blockSize.    
+ * </pre>    
+ *   
+ */
+
+
+void arm_q15_to_float(
+  q15_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pIn = pSrc;                             /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (float32_t) A / 32768 */
+    /* convert from q15 to float and then store the results in the destination buffer */
+    *pDst++ = ((float32_t) * pIn++ / 32768.0f);
+    *pDst++ = ((float32_t) * pIn++ / 32768.0f);
+    *pDst++ = ((float32_t) * pIn++ / 32768.0f);
+    *pDst++ = ((float32_t) * pIn++ / 32768.0f);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (float32_t) A / 32768 */
+    /* convert from q15 to float and then store the results in the destination buffer */
+    *pDst++ = ((float32_t) * pIn++ / 32768.0f);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of q15_to_x group    
+ */
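The reverse direction, Q15 to float, is a plain division by 32768 with no saturation needed, since every Q15 value maps into [-1, 1). A minimal host-side sketch, using the hypothetical name `ref_q15_to_float` (not a CMSIS-DSP function):

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not a CMSIS-DSP function. */
static float ref_q15_to_float(int16_t q)
{
    /* Dividing by a power of two only adjusts the exponent, so the
     * result is exact for every 16-bit input. */
    return (float)q / 32768.0f;             /* C = (float) A / 32768 */
}
```

The largest Q15 value, 32767, converts to 0.999969482421875, which is why a float input of exactly 1.0 has to saturate when going the other way.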

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,156 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q15_to_q31.c    
+*    
+* Description:	Converts the elements of the Q15 vector to Q31 vector.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.     
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup q15_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the Q15 vector to Q31 vector.     
+ * @param[in]       *pSrc points to the Q15 input vector    
+ * @param[out]      *pDst points to the Q31 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:   
+ *   
+ * <pre>    
+ * 	pDst[n] = (q31_t) pSrc[n] << 16;   0 <= n < blockSize.    
+ * </pre>    
+ *   
+ */
+
+
+void arm_q15_to_q31(
+  q15_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pIn = pSrc;                             /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2;
+  q31_t out1, out2, out3, out4;
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (q31_t)A << 16 */
+    /* convert from q15 to q31 and then store the results in the destination buffer */
+    in1 = *__SIMD32(pIn)++;
+    in2 = *__SIMD32(pIn)++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    /* extract lower 16 bits to 32 bit result */
+    out1 = in1 << 16u;
+    /* extract upper 16 bits to 32 bit result */
+    out2 = in1 & 0xFFFF0000;
+    /* extract lower 16 bits to 32 bit result */
+    out3 = in2 << 16u;
+    /* extract upper 16 bits to 32 bit result */
+    out4 = in2 & 0xFFFF0000;
+
+#else
+
+    /* extract upper 16 bits to 32 bit result */
+    out1 = in1 & 0xFFFF0000;
+    /* extract lower 16 bits to 32 bit result */
+    out2 = in1 << 16u;
+    /* extract upper 16 bits to 32 bit result */
+    out3 = in2 & 0xFFFF0000;
+    /* extract lower 16 bits to 32 bit result */
+    out4 = in2 << 16u;
+
+#endif //      #ifndef ARM_MATH_BIG_ENDIAN
+
+    *pDst++ = out1;
+    *pDst++ = out2;
+    *pDst++ = out3;
+    *pDst++ = out4;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (q31_t)A << 16 */
+    /* convert from q15 to q31 and then store the results in the destination buffer */
+    *pDst++ = (q31_t) * pIn++ << 16;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+}
+
+/**    
+ * @} end of q15_to_x group    
+ */
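The unrolled path above reads two Q15 samples per 32-bit load and splits them with a shift and a mask. The sketch below imitates the little-endian branch on a host machine, using unsigned arithmetic to keep the shifts well defined. All three function names are hypothetical, and the final conversion from `uint32_t` to `int32_t` assumes the usual two's complement wrap-around, which is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical scalar reference: Q15 to Q31 is a multiply by 2^16. */
static int32_t ref_q15_to_q31(int16_t q)
{
    return (int32_t)q * 65536;
}

/* Pack two Q15 samples the way a little-endian 32-bit load sees them:
 * the first sample in the low half-word, the second in the high one. */
static uint32_t ref_pack_q15_pair(int16_t first, int16_t second)
{
    return (uint32_t)(uint16_t)first | ((uint32_t)(uint16_t)second << 16);
}

/* Split a packed pair into two Q31 values (little-endian layout). */
static void ref_unpack_q15_pair_to_q31(uint32_t packed,
                                       int32_t *first, int32_t *second)
{
    /* Low half-word moves up by 16 bits; high half-word is already in
     * place and only needs the low bits cleared. */
    *first  = (int32_t)(packed << 16);
    *second = (int32_t)(packed & 0xFFFF0000u);
}
```

Both outputs should match the scalar reference, which is a convenient cross-check when porting the packed version to a different endianness.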

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q15_to_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,154 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q15_to_q7.c    
+*    
+* Description:	Converts the elements of the Q15 vector to Q7 vector.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup q15_to_x    
+ * @{    
+ */
+
+
+/**    
+ * @brief Converts the elements of the Q15 vector to Q7 vector.     
+ * @param[in]       *pSrc points to the Q15 input vector    
+ * @param[out]      *pDst points to the Q7 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (q7_t) (pSrc[n] >> 8);   0 <= n < blockSize.    
+ * </pre>   
+ *   
+ */
+
+
+void arm_q15_to_q7(
+  q15_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  q15_t *pIn = pSrc;                             /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2;
+  q31_t out1, out2;
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (q7_t) (A >> 8) */
+    /* convert from q15 to q7 and then store the results in the destination buffer */
+    in1 = *__SIMD32(pIn)++;
+    in2 = *__SIMD32(pIn)++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    out1 = __PKHTB(in2, in1, 16);
+    out2 = __PKHBT(in2, in1, 16);
+
+#else
+
+    out1 = __PKHTB(in1, in2, 16);
+    out2 = __PKHBT(in1, in2, 16);
+
+#endif //      #ifndef ARM_MATH_BIG_ENDIAN
+
+    /* rotate packed value right by 24 (equivalently, left by 8) */
+    out2 = ((uint32_t) out2 << 8) | ((uint32_t) out2 >> 24);
+
+    /* anding with 0xff00ff00 to get two 8 bit values */
+    out1 = out1 & 0xFF00FF00;
+    /* anding with 0x00ff00ff to get two 8 bit values */
+    out2 = out2 & 0x00FF00FF;
+
+    /* OR the two values (each holding two 8-bit results) to get four packed 8-bit values */
+    out1 = out1 | out2;
+
+    /* store 4 samples at a time to the destination buffer */
+    *__SIMD32(pDst)++ = out1;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (q7_t) (A >> 8) */
+    /* convert from q15 to q7 and then store the results in the destination buffer */
+    *pDst++ = (q7_t) (*pIn++ >> 8);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+}
+
+/**    
+ * @} end of q15_to_x group    
+ */
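The packed Q15-to-Q7 path above keeps the high byte of each of four 16-bit samples and gathers them into one 32-bit word. The sketch below walks through the same steps for the little-endian branch with plain shifts and masks in place of `__PKHTB` and `__PKHBT`. The function names are hypothetical, and the scalar helper assumes that `>>` on a negative value is an arithmetic shift, which is implementation-defined in C but is what common compilers do.

```c
#include <stdint.h>

/* Hypothetical scalar reference: keep the top 8 bits of a Q15 sample.
 * Assumes arithmetic right shift for negative values. */
static int8_t ref_q15_to_q7(int16_t q)
{
    return (int8_t)(q >> 8);
}

/* Hypothetical packed version. in1 holds samples x1:x0 and in2 holds
 * x3:x2 (high half-word : low half-word). The result holds the high
 * bytes of x3, x2, x1, x0 from most to least significant byte. */
static uint32_t ref_pack4_q15_to_q7(uint32_t in1, uint32_t in2)
{
    /* Same effect as __PKHTB(in2, in1, 16): top of in2, top of in1. */
    uint32_t out1 = (in2 & 0xFFFF0000u) | (in1 >> 16);      /* x3:x1 */
    /* Same effect as __PKHBT(in2, in1, 16): bottom of in1, bottom of in2. */
    uint32_t out2 = (in2 & 0x0000FFFFu) | (in1 << 16);      /* x0:x2 */

    /* Rotate left by 8 so the high bytes of x2 and x0 land in byte
     * positions 2 and 0. */
    out2 = (out2 << 8) | (out2 >> 24);

    /* Keep the high bytes of x3 and x1, then merge in x2 and x0. */
    return (out1 & 0xFF00FF00u) | (out2 & 0x00FF00FFu);
}
```

For the samples 0x1234, 0x5678, 0x9ABC and 0xDEF0 the packed result is 0xDE9A5612, which on a little-endian target is stored as the bytes 0x12, 0x56, 0x9A, 0xDE, in the same order as the input samples.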

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_float.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_float.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,131 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q31_to_float.c    
+*    
+* Description:	Converts the elements of the Q31 vector to floating-point vector.      
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @defgroup q31_to_x  Convert 32-bit Integer value    
+ */
+
+/**    
+ * @addtogroup q31_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the Q31 vector to floating-point vector.    
+ * @param[in]       *pSrc points to the Q31 input vector    
+ * @param[out]      *pDst points to the floating-point output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (float32_t) pSrc[n] / 2147483648;   0 <= n < blockSize.    
+ * </pre>    
+ *   
+ */
+
+
+void arm_q31_to_float(
+  q31_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pIn = pSrc;                             /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (float32_t) A / 2147483648 */
+    /* convert from q31 to float and then store the results in the destination buffer */
+    *pDst++ = ((float32_t) * pIn++ / 2147483648.0f);
+    *pDst++ = ((float32_t) * pIn++ / 2147483648.0f);
+    *pDst++ = ((float32_t) * pIn++ / 2147483648.0f);
+    *pDst++ = ((float32_t) * pIn++ / 2147483648.0f);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (float32_t) A / 2147483648 */
+    /* convert from q31 to float and then store the results in the destination buffer */
+    *pDst++ = ((float32_t) * pIn++ / 2147483648.0f);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of q31_to_x group    
+ */
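
The scaling used by `arm_q31_to_float` can be tried off-target with a minimal host-side sketch. It assumes only a C99 compiler; `ref_q31_to_float` is a hypothetical name chosen for illustration and is not part of CMSIS-DSP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not a CMSIS-DSP function):
 * scalar Q31 -> float conversion without loop unrolling. */
static void ref_q31_to_float(const int32_t *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Same scaling as the file above: divide by 2^31. */
        dst[i] = (float)src[i] / 2147483648.0f;
    }
}
```

With this scaling, `INT32_MIN` maps to -1.0f and `0x40000000` maps to 0.5f. A float carries only 24 bits of mantissa, so Q31 inputs that need more significant bits are rounded during the conversion.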

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,145 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q31_to_q15.c    
+*    
+* Description:	Converts the elements of the Q31 vector to Q15 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup q31_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the Q31 vector to Q15 vector.    
+ * @param[in]       *pSrc points to the Q31 input vector    
+ * @param[out]      *pDst points to the Q15 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *     
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (q15_t) (pSrc[n] >> 16);   0 <= n < blockSize.
+ * </pre>    
+ *   
+ */
+
+
+void arm_q31_to_q15(
+  q31_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pIn = pSrc;                             /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2, in3, in4;
+  q31_t out1, out2;
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (q15_t) (A >> 16) */
+    /* convert from q31 to q15 and then store the results in the destination buffer */
+    in1 = *pIn++;
+    in2 = *pIn++;
+    in3 = *pIn++;
+    in4 = *pIn++;
+
+    /* pack two higher 16-bit values from two 32-bit values */
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    out1 = __PKHTB(in2, in1, 16);
+    out2 = __PKHTB(in4, in3, 16);
+
+#else
+
+    out1 = __PKHTB(in1, in2, 16);
+    out2 = __PKHTB(in3, in4, 16);
+
+#endif /* #ifndef ARM_MATH_BIG_ENDIAN */
+
+    *__SIMD32(pDst)++ = out1;
+    *__SIMD32(pDst)++ = out2;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (q15_t) (A >> 16) */
+    /* convert from q31 to q15 and then store the results in the destination buffer */
+    *pDst++ = (q15_t) (*pIn++ >> 16);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+}
+
+/**    
+ * @} end of q31_to_x group    
+ */
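
As a rough host-side illustration of the narrowing done by `arm_q31_to_q15`, the sketch below keeps the upper 16 bits of each Q31 sample. `ref_q31_to_q15` is a hypothetical name, not a CMSIS-DSP function, and the sketch assumes the compiler implements `>>` on negative values as an arithmetic shift (implementation-defined in ISO C, but the usual behaviour of GCC, Clang and Arm compilers).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not a CMSIS-DSP function):
 * scalar Q31 -> Q15 conversion that keeps the upper 16 bits. */
static void ref_q31_to_q15(const int32_t *src, int16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Assumes an arithmetic right shift for negative inputs.
         * The low 16 bits are discarded, so the result is truncated
         * toward negative infinity rather than rounded. */
        dst[i] = (int16_t)(src[i] >> 16);
    }
}
```

The shift must happen before the cast: casting to `int16_t` first would keep the low half of the word and discard the significant bits.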

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_q7.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q31_to_q7.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,136 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q31_to_q7.c    
+*    
+* Description:	Converts the elements of the Q31 vector to Q7 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup q31_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the Q31 vector to Q7 vector.    
+ * @param[in]       *pSrc points to the Q31 input vector    
+ * @param[out]      *pDst points to the Q7 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (q7_t) (pSrc[n] >> 24);   0 <= n < blockSize.
+ * </pre>    
+ *   
+ */
+
+
+void arm_q31_to_q7(
+  q31_t * pSrc,
+  q7_t * pDst,
+  uint32_t blockSize)
+{
+  q31_t *pIn = pSrc;                             /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+  q31_t in1, in2, in3, in4;
+  q7_t out1, out2, out3, out4;
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (q7_t) (A >> 24) */
+    /* convert from q31 to q7 and then store the results in the destination buffer */
+    in1 = *pIn++;
+    in2 = *pIn++;
+    in3 = *pIn++;
+    in4 = *pIn++;
+
+    out1 = (q7_t) (in1 >> 24);
+    out2 = (q7_t) (in2 >> 24);
+    out3 = (q7_t) (in3 >> 24);
+    out4 = (q7_t) (in4 >> 24);
+
+    *__SIMD32(pDst)++ = __PACKq7(out1, out2, out3, out4);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (q7_t) (A >> 24) */
+    /* convert from q31 to q7 and then store the results in the destination buffer */
+    *pDst++ = (q7_t) (*pIn++ >> 24);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+}
+
+/**    
+ * @} end of q31_to_x group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_float.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_float.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,131 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q7_to_float.c    
+*    
+* Description:	Converts the elements of the Q7 vector to floating-point vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @defgroup q7_to_x  Convert 8-bit Integer value    
+ */
+
+/**    
+ * @addtogroup q7_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the Q7 vector to floating-point vector.    
+ * @param[in]       *pSrc points to the Q7 input vector    
+ * @param[out]      *pDst points to the floating-point output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *		     
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (float32_t) pSrc[n] / 128;   0 <= n < blockSize.    
+ * </pre>    
+ *   
+ */
+
+
+void arm_q7_to_float(
+  q7_t * pSrc,
+  float32_t * pDst,
+  uint32_t blockSize)
+{
+  q7_t *pIn = pSrc;                              /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (float32_t) A / 128 */
+    /* convert from q7 to float and then store the results in the destination buffer */
+    *pDst++ = ((float32_t) * pIn++ / 128.0f);
+    *pDst++ = ((float32_t) * pIn++ / 128.0f);
+    *pDst++ = ((float32_t) * pIn++ / 128.0f);
+    *pDst++ = ((float32_t) * pIn++ / 128.0f);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (float32_t) A / 128 */
+    /* convert from q7 to float and then store the results in the destination buffer */
+    *pDst++ = ((float32_t) * pIn++ / 128.0f);
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+}
+
+/**    
+ * @} end of q7_to_x group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_q15.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_q15.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,157 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q7_to_q15.c    
+*    
+* Description:	Converts the elements of the Q7 vector to Q15 vector.    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup q7_to_x    
+ * @{    
+ */
+
+
+
+
+/**    
+ * @brief Converts the elements of the Q7 vector to Q15 vector.    
+ * @param[in]       *pSrc points to the Q7 input vector    
+ * @param[out]      *pDst points to the Q15 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (q15_t) pSrc[n] << 8;   0 <= n < blockSize.    
+ * </pre>    
+ *   
+ */
+
+
+void arm_q7_to_q15(
+  q7_t * pSrc,
+  q15_t * pDst,
+  uint32_t blockSize)
+{
+  q7_t *pIn = pSrc;                              /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+  q31_t in;
+  q31_t in1, in2;
+  q31_t out1, out2;
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (q15_t) A << 8 */
+    /* convert from q7 to q15 and then store the results in the destination buffer */
+    in = *__SIMD32(pIn)++;
+
+    /* rotate in by 8 and extend two q7_t values to q15_t values */
+    in1 = __SXTB16(__ROR(in, 8));
+
+    /* extend remaining two q7_t values to q15_t values */
+    in2 = __SXTB16(in);
+
+    in1 = in1 << 8u;
+    in2 = in2 << 8u;
+
+    in1 = in1 & 0xFF00FF00;
+    in2 = in2 & 0xFF00FF00;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    out2 = __PKHTB(in1, in2, 16);
+    out1 = __PKHBT(in2, in1, 16);
+
+#else
+
+    out1 = __PKHTB(in1, in2, 16);
+    out2 = __PKHBT(in2, in1, 16);
+
+#endif
+
+    *__SIMD32(pDst)++ = out1;
+    *__SIMD32(pDst)++ = out2;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (q15_t) A << 8 */
+    /* convert from q7 to q15 and then store the results in the destination buffer */
+    *pDst++ = (q15_t) * pIn++ << 8;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+}
+
+/**    
+ * @} end of q7_to_x group    
+ */
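
The packed little-endian path of `arm_q7_to_q15` can be traced on a host with the sketch below. The `emu_*` helpers are hypothetical portable stand-ins for `__ROR`, `__SXTB16`, `__PKHBT` and `__PKHTB`, written from the Arm instruction definitions as I understand them; they are not CMSIS code, and `widen4_q7_to_q15_le` is likewise an illustrative name.

```c
#include <stdint.h>

/* Hypothetical stand-in for __ROR: rotate right, valid for n in 1..31. */
static uint32_t emu_ror(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Hypothetical stand-in for __SXTB16: sign-extend bytes 0 and 2
 * of x into the two 16-bit halves of the result. */
static uint32_t emu_sxtb16(uint32_t x)
{
    uint32_t lo = x & 0xFFu;
    uint32_t hi = (x >> 16) & 0xFFu;
    if (lo & 0x80u) lo |= 0xFF00u;
    if (hi & 0x80u) hi |= 0xFF00u;
    return (hi << 16) | lo;
}

/* Hypothetical stand-in for __PKHBT: bottom half of x, top half of (y << n). */
static uint32_t emu_pkhbt(uint32_t x, uint32_t y, unsigned n)
{
    return (x & 0x0000FFFFu) | ((y << n) & 0xFFFF0000u);
}

/* Hypothetical stand-in for __PKHTB: top half of x, bottom half of (y >> n).
 * For n == 16 a logical shift gives the same low 16 bits as an arithmetic one. */
static uint32_t emu_pkhtb(uint32_t x, uint32_t y, unsigned n)
{
    return (x & 0xFFFF0000u) | ((y >> n) & 0x0000FFFFu);
}

/* Widen four packed q7 values (byte 0 = first sample) into two words
 * holding four q15 values, following the little-endian branch above. */
static void widen4_q7_to_q15_le(uint32_t in, uint32_t *out1, uint32_t *out2)
{
    uint32_t in1 = emu_sxtb16(emu_ror(in, 8));  /* samples 1 and 3 */
    uint32_t in2 = emu_sxtb16(in);              /* samples 0 and 2 */

    in1 = (in1 << 8) & 0xFF00FF00u;             /* scale by 2^8, clear spill */
    in2 = (in2 << 8) & 0xFF00FF00u;

    *out2 = emu_pkhtb(in1, in2, 16);            /* samples 3:2 */
    *out1 = emu_pkhbt(in2, in1, 16);            /* samples 1:0 */
}
```

For the packed input `0x80FF7F01` (samples 0x01, 0x7F, 0xFF, 0x80) the sketch yields `out1 = 0x7F000100` and `out2 = 0x8000FF00`, which is each q7 sample shifted left by 8 and stored in order.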

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_q31.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/SupportFunctions/arm_q7_to_q31.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,142 @@
+/* ----------------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:		arm_q7_to_q31.c    
+*    
+* Description:	Converts the elements of the Q7 vector to Q31 vector.  
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.   
+* ---------------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+/**    
+ * @ingroup groupSupport    
+ */
+
+/**    
+ * @addtogroup q7_to_x    
+ * @{    
+ */
+
+/**    
+ * @brief Converts the elements of the Q7 vector to Q31 vector.    
+ * @param[in]       *pSrc points to the Q7 input vector    
+ * @param[out]      *pDst points to the Q31 output vector   
+ * @param[in]       blockSize length of the input vector    
+ * @return none.    
+ *    
+ * \par Description:    
+ *    
+ * The equation used for the conversion process is:    
+ *   
+ * <pre>    
+ * 	pDst[n] = (q31_t) pSrc[n] << 24;   0 <= n < blockSize.   
+ * </pre>     
+ *   
+ */
+
+
+void arm_q7_to_q31(
+  q7_t * pSrc,
+  q31_t * pDst,
+  uint32_t blockSize)
+{
+  q7_t *pIn = pSrc;                              /* Src pointer */
+  uint32_t blkCnt;                               /* loop counter */
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+  q31_t in;
+
+  /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+  /*loop Unrolling */
+  blkCnt = blockSize >> 2u;
+
+  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.    
+   ** a second loop below computes the remaining 1 to 3 samples. */
+  while(blkCnt > 0u)
+  {
+    /* C = (q31_t) A << 24 */
+    /* convert from q7 to q31 and then store the results in the destination buffer */
+    in = *__SIMD32(pIn)++;
+
+#ifndef ARM_MATH_BIG_ENDIAN
+
+    *pDst++ = (__ROR(in, 8)) & 0xFF000000;
+    *pDst++ = (__ROR(in, 16)) & 0xFF000000;
+    *pDst++ = (__ROR(in, 24)) & 0xFF000000;
+    *pDst++ = (in & 0xFF000000);
+
+#else
+
+    *pDst++ = (in & 0xFF000000);
+    *pDst++ = (__ROR(in, 24)) & 0xFF000000;
+    *pDst++ = (__ROR(in, 16)) & 0xFF000000;
+    *pDst++ = (__ROR(in, 8)) & 0xFF000000;
+
+#endif /* #ifndef ARM_MATH_BIG_ENDIAN */
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.    
+   ** No loop unrolling is used. */
+  blkCnt = blockSize % 0x4u;
+
+#else
+
+  /* Run the below code for Cortex-M0 */
+
+  /* Loop over blockSize number of values */
+  blkCnt = blockSize;
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY */
+
+  while(blkCnt > 0u)
+  {
+    /* C = (q31_t) A << 24 */
+    /* convert from q7 to q31 and then store the results in the destination buffer */
+    *pDst++ = (q31_t) * pIn++ << 24;
+
+    /* Decrement the loop counter */
+    blkCnt--;
+  }
+
+}
+
+/**    
+ * @} end of q7_to_x group    
+ */
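
A minimal host-side sketch of the widening done by `arm_q7_to_q31` follows. Both function names are hypothetical and not part of CMSIS-DSP. The scalar helper multiplies by 2^24 instead of shifting, because left-shifting a negative signed value is undefined in ISO C; the second helper mirrors the rotate-and-mask step that the little-endian branch applies to the first packed sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not a CMSIS-DSP function):
 * scalar Q7 -> Q31 conversion, written as a multiply by 2^24. */
static void ref_q7_to_q31(const int8_t *src, int32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* -128 * 2^24 == INT32_MIN and 127 * 2^24 < INT32_MAX,
         * so the product always fits in 32 bits. */
        dst[i] = (int32_t)src[i] * 16777216;
    }
}

/* Hypothetical helper: move byte 0 of a packed word into bits 31:24,
 * equivalent to (__ROR(in, 8)) & 0xFF000000 in the code above. */
static uint32_t first_lane_le(uint32_t in)
{
    uint32_t rotated = (in >> 8) | (in << 24);
    return rotated & 0xFF000000u;
}
```

Because the q7 value lands in the top byte and the lower 24 bits are zero, converting back with a right shift by 24 recovers the original sample exactly.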

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/TransformFunctions/arm_bitreversal.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/TransformFunctions/arm_bitreversal.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,242 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_bitreversal.c    
+*    
+* Description:	In-place bit reversal functions (f32, Q31 and Q15) used by the FFT routines
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.  
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/*    
+* @brief  In-place bit reversal function.   
+* @param[in, out] *pSrc        points to the in-place buffer of floating-point data type.   
+* @param[in]      fftSize      length of the FFT.   
+* @param[in]      bitRevFactor bit reversal modifier that supports different size FFTs with the same bit reversal table.   
+* @param[in]      *pBitRevTab  points to the bit reversal table.   
+* @return none.   
+*/
+
+void arm_bitreversal_f32(
+float32_t * pSrc,
+uint16_t fftSize,
+uint16_t bitRevFactor,
+uint16_t * pBitRevTab)
+{
+   uint16_t fftLenBy2, fftLenBy2p1;
+   uint16_t i, j;
+   float32_t in;
+
+   /*  Initializations */
+   j = 0u;
+   fftLenBy2 = fftSize >> 1u;
+   fftLenBy2p1 = (fftSize >> 1u) + 1u;
+
+   /* Bit Reversal Implementation */
+   for (i = 0u; i <= (fftLenBy2 - 2u); i += 2u)
+   {
+      if(i < j)
+      {
+         /*  pSrc[i] <-> pSrc[j]; */
+         in = pSrc[2u * i];
+         pSrc[2u * i] = pSrc[2u * j];
+         pSrc[2u * j] = in;
+
+         /*  pSrc[i+1u] <-> pSrc[j+1u] */
+         in = pSrc[(2u * i) + 1u];
+         pSrc[(2u * i) + 1u] = pSrc[(2u * j) + 1u];
+         pSrc[(2u * j) + 1u] = in;
+
+         /*  pSrc[i+fftLenBy2p1] <-> pSrc[j+fftLenBy2p1] */
+         in = pSrc[2u * (i + fftLenBy2p1)];
+         pSrc[2u * (i + fftLenBy2p1)] = pSrc[2u * (j + fftLenBy2p1)];
+         pSrc[2u * (j + fftLenBy2p1)] = in;
+
+         /*  pSrc[i+fftLenBy2p1+1u] <-> pSrc[j+fftLenBy2p1+1u] */
+         in = pSrc[(2u * (i + fftLenBy2p1)) + 1u];
+         pSrc[(2u * (i + fftLenBy2p1)) + 1u] =
+         pSrc[(2u * (j + fftLenBy2p1)) + 1u];
+         pSrc[(2u * (j + fftLenBy2p1)) + 1u] = in;
+
+      }
+
+      /*  complex element i+1u <-> complex element j+fftLenBy2 (real part) */
+      in = pSrc[2u * (i + 1u)];
+      pSrc[2u * (i + 1u)] = pSrc[2u * (j + fftLenBy2)];
+      pSrc[2u * (j + fftLenBy2)] = in;
+
+      /*  complex element i+1u <-> complex element j+fftLenBy2 (imaginary part) */
+      in = pSrc[(2u * (i + 1u)) + 1u];
+      pSrc[(2u * (i + 1u)) + 1u] = pSrc[(2u * (j + fftLenBy2)) + 1u];
+      pSrc[(2u * (j + fftLenBy2)) + 1u] = in;
+
+      /*  Reading the index for the bit reversal */
+      j = *pBitRevTab;
+
+      /*  Updating the bit reversal index depending on the fft length  */
+      pBitRevTab += bitRevFactor;
+   }
+}
+
+
+
+/*    
+* @brief  In-place bit reversal function.   
+* @param[in, out] *pSrc        points to the in-place buffer of Q31 data type.   
+* @param[in]      fftLen       length of the FFT.   
+* @param[in]      bitRevFactor bit reversal modifier that supports different size FFTs with the same bit reversal table   
+* @param[in]      *pBitRevTable points to bit reversal table.
+* @return none.   
+*/
+
+void arm_bitreversal_q31(
+q31_t * pSrc,
+uint32_t fftLen,
+uint16_t bitRevFactor,
+uint16_t * pBitRevTable)
+{
+   uint32_t fftLenBy2, fftLenBy2p1, i, j;
+   q31_t in;
+
+   /*  Initializations      */
+   j = 0u;
+   fftLenBy2 = fftLen / 2u;
+   fftLenBy2p1 = (fftLen / 2u) + 1u;
+
+   /* Bit Reversal Implementation */
+   for (i = 0u; i <= (fftLenBy2 - 2u); i += 2u)
+   {
+      if(i < j)
+      {
+         /*  pSrc[i] <-> pSrc[j]; */
+         in = pSrc[2u * i];
+         pSrc[2u * i] = pSrc[2u * j];
+         pSrc[2u * j] = in;
+
+         /*  pSrc[i+1u] <-> pSrc[j+1u] */
+         in = pSrc[(2u * i) + 1u];
+         pSrc[(2u * i) + 1u] = pSrc[(2u * j) + 1u];
+         pSrc[(2u * j) + 1u] = in;
+
+         /*  pSrc[i+fftLenBy2p1] <-> pSrc[j+fftLenBy2p1] */
+         in = pSrc[2u * (i + fftLenBy2p1)];
+         pSrc[2u * (i + fftLenBy2p1)] = pSrc[2u * (j + fftLenBy2p1)];
+         pSrc[2u * (j + fftLenBy2p1)] = in;
+
+         /*  pSrc[i+fftLenBy2p1+1u] <-> pSrc[j+fftLenBy2p1+1u] */
+         in = pSrc[(2u * (i + fftLenBy2p1)) + 1u];
+         pSrc[(2u * (i + fftLenBy2p1)) + 1u] =
+         pSrc[(2u * (j + fftLenBy2p1)) + 1u];
+         pSrc[(2u * (j + fftLenBy2p1)) + 1u] = in;
+
+      }
+
+      /*  pSrc[i+1u] <-> pSrc[j+fftLenBy2] (real part) */
+      in = pSrc[2u * (i + 1u)];
+      pSrc[2u * (i + 1u)] = pSrc[2u * (j + fftLenBy2)];
+      pSrc[2u * (j + fftLenBy2)] = in;
+
+      /*  pSrc[i+1u] <-> pSrc[j+fftLenBy2] (imaginary part) */
+      in = pSrc[(2u * (i + 1u)) + 1u];
+      pSrc[(2u * (i + 1u)) + 1u] = pSrc[(2u * (j + fftLenBy2)) + 1u];
+      pSrc[(2u * (j + fftLenBy2)) + 1u] = in;
+
+      /*  Reading the index for the bit reversal */
+      j = *pBitRevTable;
+
+      /*  Updating the bit reversal index depending on the fft length */
+      pBitRevTable += bitRevFactor;
+   }
+}
+
+
+
+/*    
+   * @brief  In-place bit reversal function.   
+   * @param[in, out] *pSrc        points to the in-place buffer of Q15 data type.   
+   * @param[in]      fftLen       length of the FFT.   
+   * @param[in]      bitRevFactor bit reversal modifier that supports different size FFTs with the same bit reversal table   
+   * @param[in]      *pBitRevTab  points to bit reversal table.   
+   * @return none.   
+*/
+
+void arm_bitreversal_q15(
+q15_t * pSrc16,
+uint32_t fftLen,
+uint16_t bitRevFactor,
+uint16_t * pBitRevTab)
+{
+   q31_t *pSrc = (q31_t *) pSrc16;
+   q31_t in;
+   uint32_t fftLenBy2, fftLenBy2p1;
+   uint32_t i, j;
+
+   /*  Initializations */
+   j = 0u;
+   fftLenBy2 = fftLen / 2u;
+   fftLenBy2p1 = (fftLen / 2u) + 1u;
+
+   /* Bit Reversal Implementation */
+   for (i = 0u; i <= (fftLenBy2 - 2u); i += 2u)
+   {
+      if(i < j)
+      {
+         /*  pSrc[i] <-> pSrc[j]; */
+         /*  pSrc[i+1u] <-> pSrc[j+1u] */
+         in = pSrc[i];
+         pSrc[i] = pSrc[j];
+         pSrc[j] = in;
+
+         /*  pSrc[i + fftLenBy2p1] <-> pSrc[j + fftLenBy2p1];  */
+         /*  pSrc[i + fftLenBy2p1+1u] <-> pSrc[j + fftLenBy2p1+1u] */
+         in = pSrc[i + fftLenBy2p1];
+         pSrc[i + fftLenBy2p1] = pSrc[j + fftLenBy2p1];
+         pSrc[j + fftLenBy2p1] = in;
+      }
+
+      /*  pSrc[i+1u] <-> pSrc[j+fftLenBy2];         */
+      /*  pSrc[i+2] <-> pSrc[j+fftLenBy2+1u]  */
+      in = pSrc[i + 1u];
+      pSrc[i + 1u] = pSrc[j + fftLenBy2];
+      pSrc[j + fftLenBy2] = in;
+
+      /*  Reading the index for the bit reversal */
+      j = *pBitRevTab;
+
+      /*  Updating the bit reversal index depending on the fft length  */
+      pBitRevTab += bitRevFactor;
+   }
+}
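
Reviewer note on the radix-4 file added in the next diff: the last stage of `arm_radix4_butterfly_f32` applies no twiddle factors, so each group of four interleaved complex samples goes through a plain 4-point DFT, and the results are written back in the order X[0], X[2], X[1], X[3]. The following is only a minimal sketch of that one step, not the library code; the helper name `radix4_last_stage_sketch` is hypothetical and does not exist in CMSIS-DSP.

```c
/* Hypothetical helper (not part of CMSIS-DSP): one twiddle-free radix-4
 * butterfly on interleaved complex data {xa, ya, xb, yb, xc, yc, xd, yd}.
 * The arithmetic mirrors the a0..a7 assignments of the forward last stage. */
static void radix4_last_stage_sketch(float v[8])
{
    float xa = v[0], ya = v[1], xb = v[2], yb = v[3];
    float xc = v[4], yc = v[5], xd = v[6], yd = v[7];

    v[0] = (xa + xc) + (xb + xd);   /* Re X[0] */
    v[1] = (ya + yc) + (yb + yd);   /* Im X[0] */
    v[2] = (xa + xc) - (xb + xd);   /* Re X[2] */
    v[3] = (ya + yc) - (yb + yd);   /* Im X[2] */
    v[4] = (xa - xc) + (yb - yd);   /* Re X[1] */
    v[5] = (ya - yc) - (xb - xd);   /* Im X[1] */
    v[6] = (xa - xc) - (yb - yd);   /* Re X[3] */
    v[7] = (ya - yc) + (xb - xd);   /* Im X[3] */
}
```

For the real input sequence 1, 2, 3, 4 (imaginary parts zero) the sketch leaves 10, -2, -2+2j and -2-2j in the four complex slots, which are X[0], X[2], X[1] and X[3] of the 4-point DFT.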

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/TransformFunctions/arm_cfft_radix4_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/TransformFunctions/arm_cfft_radix4_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,1210 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_cfft_radix4_f32.c    
+*    
+* Description:	Radix-4 Decimation in Frequency CFFT & CIFFT Floating point processing function    
+*    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+#include "arm_math.h"
+
+extern void arm_bitreversal_f32(
+float32_t * pSrc,
+uint16_t fftSize,
+uint16_t bitRevFactor,
+uint16_t * pBitRevTab);
+
+/**    
+* @ingroup groupTransforms    
+*/
+
+/* ----------------------------------------------------------------------    
+** Internal helper function used by the FFTs    
+** ------------------------------------------------------------------- */
+
+/*    
+* @brief  Core function for the floating-point CFFT butterfly process.   
+* @param[in, out] *pSrc            points to the in-place buffer of floating-point data type.   
+* @param[in]      fftLen           length of the FFT.   
+* @param[in]      *pCoef           points to the twiddle coefficient buffer.   
+* @param[in]      twidCoefModifier twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table.   
+* @return none.   
+*/
+
+void arm_radix4_butterfly_f32(
+float32_t * pSrc,
+uint16_t fftLen,
+float32_t * pCoef,
+uint16_t twidCoefModifier)
+{
+
+   float32_t co1, co2, co3, si1, si2, si3;
+   uint32_t ia1, ia2, ia3;
+   uint32_t i0, i1, i2, i3;
+   uint32_t n1, n2, j, k;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+   /* Run the below code for Cortex-M4 and Cortex-M3 */
+
+   float32_t xaIn, yaIn, xbIn, ybIn, xcIn, ycIn, xdIn, ydIn;
+   float32_t Xaplusc, Xbplusd, Yaplusc, Ybplusd, Xaminusc, Xbminusd, Yaminusc,
+   Ybminusd;
+   float32_t Xb12C_out, Yb12C_out, Xc12C_out, Yc12C_out, Xd12C_out, Yd12C_out;
+   float32_t Xb12_out, Yb12_out, Xc12_out, Yc12_out, Xd12_out, Yd12_out;
+   float32_t *ptr1;
+   float32_t p0,p1,p2,p3,p4,p5;
+   float32_t a0,a1,a2,a3,a4,a5,a6,a7;
+
+   /*  Initializations for the first stage */
+   n2 = fftLen;
+   n1 = n2;
+
+   /* n2 = fftLen/4 */
+   n2 >>= 2u;
+   i0 = 0u;
+   ia1 = 0u;
+
+   j = n2;
+
+   /*  Calculation of first stage */
+   do
+   {
+      /*  index calculation for the input as, */
+      /*  pSrc[i0 + 0], pSrc[i0 + fftLen/4], pSrc[i0 + fftLen/2], pSrc[i0 + 3fftLen/4] */
+      i1 = i0 + n2;
+      i2 = i1 + n2;
+      i3 = i2 + n2;
+
+      xaIn = pSrc[(2u * i0)];
+      yaIn = pSrc[(2u * i0) + 1u];
+
+      xbIn = pSrc[(2u * i1)];
+      ybIn = pSrc[(2u * i1) + 1u];
+
+      xcIn = pSrc[(2u * i2)];
+      ycIn = pSrc[(2u * i2) + 1u];
+
+      xdIn = pSrc[(2u * i3)];
+      ydIn = pSrc[(2u * i3) + 1u];
+
+      /* xa + xc */
+      Xaplusc = xaIn + xcIn;
+      /* xb + xd */
+      Xbplusd = xbIn + xdIn;
+      /* ya + yc */
+      Yaplusc = yaIn + ycIn;
+      /* yb + yd */
+      Ybplusd = ybIn + ydIn;
+
+      /*  index calculation for the coefficients */
+      ia2 = ia1 + ia1;
+      co2 = pCoef[ia2 * 2u];
+      si2 = pCoef[(ia2 * 2u) + 1u];
+
+      /* xa - xc */
+      Xaminusc = xaIn - xcIn;
+      /* xb - xd */
+      Xbminusd = xbIn - xdIn;
+      /* ya - yc */
+      Yaminusc = yaIn - ycIn;
+      /* yb - yd */
+      Ybminusd = ybIn - ydIn;
+
+      /* xa' = xa + xb + xc + xd */
+      pSrc[(2u * i0)] = Xaplusc + Xbplusd;
+      /* ya' = ya + yb + yc + yd */
+      pSrc[(2u * i0) + 1u] = Yaplusc + Ybplusd;
+
+      /* (xa - xc) + (yb - yd) */
+      Xb12C_out = (Xaminusc + Ybminusd);
+      /* (ya - yc) - (xb - xd) */
+      Yb12C_out = (Yaminusc - Xbminusd);
+      /* (xa + xc) - (xb + xd) */
+      Xc12C_out = (Xaplusc - Xbplusd);
+      /* (ya + yc) - (yb + yd) */
+      Yc12C_out = (Yaplusc - Ybplusd);
+      /* (xa - xc) - (yb - yd) */
+      Xd12C_out = (Xaminusc - Ybminusd);
+      /* (ya - yc) + (xb - xd) */
+      Yd12C_out = (Xbminusd + Yaminusc);
+
+      co1 = pCoef[ia1 * 2u];
+      si1 = pCoef[(ia1 * 2u) + 1u];
+
+      /*  index calculation for the coefficients */
+      ia3 = ia2 + ia1;
+      co3 = pCoef[ia3 * 2u];
+      si3 = pCoef[(ia3 * 2u) + 1u];
+
+      Xb12_out = Xb12C_out * co1;
+      Yb12_out = Yb12C_out * co1;
+      Xc12_out = Xc12C_out * co2;
+      Yc12_out = Yc12C_out * co2;
+      Xd12_out = Xd12C_out * co3;
+      Yd12_out = Yd12C_out * co3;
+         
+      /* xb' = (xa+yb-xc-yd)co1 + (ya-xb-yc+xd)(si1) */
+      //Xb12_out += Yb12C_out * si1;
+      p0 = Yb12C_out * si1;
+      /* yb' = (ya-xb-yc+xd)co1 - (xa+yb-xc-yd)(si1) */
+      //Yb12_out -= Xb12C_out * si1;
+      p1 = Xb12C_out * si1;
+      /* xc' = (xa-xb+xc-xd)co2 + (ya-yb+yc-yd)(si2) */
+      //Xc12_out += Yc12C_out * si2;
+      p2 = Yc12C_out * si2;
+      /* yc' = (ya-yb+yc-yd)co2 - (xa-xb+xc-xd)(si2) */
+      //Yc12_out -= Xc12C_out * si2;
+      p3 = Xc12C_out * si2;
+      /* xd' = (xa-yb-xc+yd)co3 + (ya+xb-yc-xd)(si3) */
+      //Xd12_out += Yd12C_out * si3;
+      p4 = Yd12C_out * si3;
+      /* yd' = (ya+xb-yc-xd)co3 - (xa-yb-xc+yd)(si3) */
+      //Yd12_out -= Xd12C_out * si3;
+      p5 = Xd12C_out * si3;
+      
+      Xb12_out += p0;
+      Yb12_out -= p1;
+      Xc12_out += p2;
+      Yc12_out -= p3;
+      Xd12_out += p4;
+      Yd12_out -= p5;
+
+      /* xc' = (xa-xb+xc-xd)co2 + (ya-yb+yc-yd)(si2) */
+      pSrc[2u * i1] = Xc12_out;
+
+      /* yc' = (ya-yb+yc-yd)co2 - (xa-xb+xc-xd)(si2) */
+      pSrc[(2u * i1) + 1u] = Yc12_out;
+
+      /* xb' = (xa+yb-xc-yd)co1 + (ya-xb-yc+xd)(si1) */
+      pSrc[2u * i2] = Xb12_out;
+
+      /* yb' = (ya-xb-yc+xd)co1 - (xa+yb-xc-yd)(si1) */
+      pSrc[(2u * i2) + 1u] = Yb12_out;
+
+      /* xd' = (xa-yb-xc+yd)co3 + (ya+xb-yc-xd)(si3) */
+      pSrc[2u * i3] = Xd12_out;
+
+      /* yd' = (ya+xb-yc-xd)co3 - (xa-yb-xc+yd)(si3) */
+      pSrc[(2u * i3) + 1u] = Yd12_out;
+
+      /*  Twiddle coefficients index modifier */
+      ia1 += twidCoefModifier;
+
+      /*  Updating input index */
+      i0++;
+
+   }
+   while(--j);
+
+   twidCoefModifier <<= 2u;
+
+   /*  Calculation of the second stage up to, but excluding, the last stage */
+   for (k = fftLen >> 2u; k > 4u; k >>= 2u)
+   {
+      /*  Initializations for the current stage */
+      n1 = n2;
+      n2 >>= 2u;
+      ia1 = 0u;
+
+      /*  Calculation of the current stage */
+      j = 0;
+      do
+      {
+         /*  index calculation for the coefficients */
+         ia2 = ia1 + ia1;
+         ia3 = ia2 + ia1;
+         co1 = pCoef[ia1 * 2u];
+         si1 = pCoef[(ia1 * 2u) + 1u];
+         co2 = pCoef[ia2 * 2u];
+         si2 = pCoef[(ia2 * 2u) + 1u];
+         co3 = pCoef[ia3 * 2u];
+         si3 = pCoef[(ia3 * 2u) + 1u];
+
+         /*  Twiddle coefficients index modifier */
+         ia1 += twidCoefModifier;
+      
+         i0 = j;
+         do
+         {
+            /*  index calculation for the input as, */
+            /*  pSrc[i0 + 0], pSrc[i0 + fftLen/4], pSrc[i0 + fftLen/2], pSrc[i0 + 3fftLen/4] */
+            i1 = i0 + n2;
+            i2 = i1 + n2;
+            i3 = i2 + n2;
+
+            xaIn = pSrc[(2u * i0)];
+            yaIn = pSrc[(2u * i0) + 1u];
+
+            xbIn = pSrc[(2u * i1)];
+            ybIn = pSrc[(2u * i1) + 1u];
+
+            xcIn = pSrc[(2u * i2)];
+            ycIn = pSrc[(2u * i2) + 1u];
+
+            xdIn = pSrc[(2u * i3)];
+            ydIn = pSrc[(2u * i3) + 1u];
+
+            /* xa - xc */
+            Xaminusc = xaIn - xcIn;
+            /* (xb - xd) */
+            Xbminusd = xbIn - xdIn;
+            /* ya - yc */
+            Yaminusc = yaIn - ycIn;
+            /* (yb - yd) */
+            Ybminusd = ybIn - ydIn;
+
+            /* xa + xc */
+            Xaplusc = xaIn + xcIn;
+            /* xb + xd */
+            Xbplusd = xbIn + xdIn;
+            /* ya + yc */
+            Yaplusc = yaIn + ycIn;
+            /* yb + yd */
+            Ybplusd = ybIn + ydIn;
+
+            /* (xa - xc) + (yb - yd) */
+            Xb12C_out = (Xaminusc + Ybminusd);
+            /* (ya - yc) -  (xb - xd) */
+            Yb12C_out = (Yaminusc - Xbminusd);
+            /* xa + xc -(xb + xd) */
+            Xc12C_out = (Xaplusc - Xbplusd);
+            /* (ya + yc) - (yb + yd) */
+            Yc12C_out = (Yaplusc - Ybplusd);
+            /* (xa - xc) - (yb - yd) */
+            Xd12C_out = (Xaminusc - Ybminusd);
+            /* (ya - yc) +  (xb - xd) */
+            Yd12C_out = (Xbminusd + Yaminusc);
+
+            pSrc[(2u * i0)] = Xaplusc + Xbplusd;
+            pSrc[(2u * i0) + 1u] = Yaplusc + Ybplusd;
+
+            Xb12_out = Xb12C_out * co1;
+            Yb12_out = Yb12C_out * co1;
+            Xc12_out = Xc12C_out * co2;
+            Yc12_out = Yc12C_out * co2;
+            Xd12_out = Xd12C_out * co3;
+            Yd12_out = Yd12C_out * co3;
+         
+            /* xb' = (xa+yb-xc-yd)co1 + (ya-xb-yc+xd)(si1) */
+            //Xb12_out += Yb12C_out * si1;
+            p0 = Yb12C_out * si1;
+            /* yb' = (ya-xb-yc+xd)co1 - (xa+yb-xc-yd)(si1) */
+            //Yb12_out -= Xb12C_out * si1;
+            p1 = Xb12C_out * si1;
+            /* xc' = (xa-xb+xc-xd)co2 + (ya-yb+yc-yd)(si2) */
+            //Xc12_out += Yc12C_out * si2;
+            p2 = Yc12C_out * si2;
+            /* yc' = (ya-yb+yc-yd)co2 - (xa-xb+xc-xd)(si2) */
+            //Yc12_out -= Xc12C_out * si2;
+            p3 = Xc12C_out * si2;
+            /* xd' = (xa-yb-xc+yd)co3 + (ya+xb-yc-xd)(si3) */
+            //Xd12_out += Yd12C_out * si3;
+            p4 = Yd12C_out * si3;
+            /* yd' = (ya+xb-yc-xd)co3 - (xa-yb-xc+yd)(si3) */
+            //Yd12_out -= Xd12C_out * si3;
+            p5 = Xd12C_out * si3;
+            
+            Xb12_out += p0;
+            Yb12_out -= p1;
+            Xc12_out += p2;
+            Yc12_out -= p3;
+            Xd12_out += p4;
+            Yd12_out -= p5;
+
+            /* xc' = (xa-xb+xc-xd)co2 + (ya-yb+yc-yd)(si2) */
+            pSrc[2u * i1] = Xc12_out;
+
+            /* yc' = (ya-yb+yc-yd)co2 - (xa-xb+xc-xd)(si2) */
+            pSrc[(2u * i1) + 1u] = Yc12_out;
+
+            /* xb' = (xa+yb-xc-yd)co1 + (ya-xb-yc+xd)(si1) */
+            pSrc[2u * i2] = Xb12_out;
+
+            /* yb' = (ya-xb-yc+xd)co1 - (xa+yb-xc-yd)(si1) */
+            pSrc[(2u * i2) + 1u] = Yb12_out;
+
+            /* xd' = (xa-yb-xc+yd)co3 + (ya+xb-yc-xd)(si3) */
+            pSrc[2u * i3] = Xd12_out;
+
+            /* yd' = (ya+xb-yc-xd)co3 - (xa-yb-xc+yd)(si3) */
+            pSrc[(2u * i3) + 1u] = Yd12_out;
+
+            i0 += n1;
+         } while(i0 < fftLen);
+         j++;
+      } while(j <= (n2 - 1u));
+      twidCoefModifier <<= 2u;
+   }
+
+   j = fftLen >> 2;
+   ptr1 = &pSrc[0];
+
+   /*  Calculations of last stage */
+   do
+   {
+      xaIn = ptr1[0];
+      yaIn = ptr1[1];
+      xbIn = ptr1[2];
+      ybIn = ptr1[3];
+      xcIn = ptr1[4];
+      ycIn = ptr1[5];
+      xdIn = ptr1[6];
+      ydIn = ptr1[7];
+
+      /* xa + xc */
+      Xaplusc = xaIn + xcIn;
+
+      /* xa - xc */
+      Xaminusc = xaIn - xcIn;
+
+      /* ya + yc */
+      Yaplusc = yaIn + ycIn;
+
+      /* ya - yc */
+      Yaminusc = yaIn - ycIn;
+
+      /* xb + xd */
+      Xbplusd = xbIn + xdIn;
+
+      /* yb + yd */
+      Ybplusd = ybIn + ydIn;
+
+      /* (xb-xd) */
+      Xbminusd = xbIn - xdIn;
+
+      /* (yb-yd) */
+      Ybminusd = ybIn - ydIn;
+
+      /* xa' = xa + xb + xc + xd */
+      a0 = (Xaplusc + Xbplusd);
+      /* ya' = ya + yb + yc + yd */
+      a1 = (Yaplusc + Ybplusd);
+      /* xc' = (xa-xb+xc-xd) */
+      a2 = (Xaplusc - Xbplusd);
+      /* yc' = (ya-yb+yc-yd) */
+      a3 = (Yaplusc - Ybplusd);
+      /* xb' = (xa+yb-xc-yd) */
+      a4 = (Xaminusc + Ybminusd);
+      /* yb' = (ya-xb-yc+xd) */
+      a5 = (Yaminusc - Xbminusd);
+      /* xd' = (xa-yb-xc+yd) */
+      a6 = (Xaminusc - Ybminusd);
+      /* yd' = (ya+xb-yc-xd) */
+      a7 = (Xbminusd + Yaminusc);
+   
+      ptr1[0] = a0;
+      ptr1[1] = a1;
+      ptr1[2] = a2;
+      ptr1[3] = a3;
+      ptr1[4] = a4;
+      ptr1[5] = a5;
+      ptr1[6] = a6;
+      ptr1[7] = a7;
+
+      /* increment pointer by 8 */
+      ptr1 += 8u;
+   } while(--j);
+
+#else
+
+   float32_t t1, t2, r1, r2, s1, s2;
+
+   /* Run the below code for Cortex-M0 */
+
+   /*  Initializations for the fft calculation */
+   n2 = fftLen;
+   n1 = n2;
+   for (k = fftLen; k > 1u; k >>= 2u)
+   {
+      /*  Initializations for the fft calculation */
+      n1 = n2;
+      n2 >>= 2u;
+      ia1 = 0u;
+
+      /*  FFT Calculation */
+      j = 0;
+      do
+      {
+         /*  index calculation for the coefficients */
+         ia2 = ia1 + ia1;
+         ia3 = ia2 + ia1;
+         co1 = pCoef[ia1 * 2u];
+         si1 = pCoef[(ia1 * 2u) + 1u];
+         co2 = pCoef[ia2 * 2u];
+         si2 = pCoef[(ia2 * 2u) + 1u];
+         co3 = pCoef[ia3 * 2u];
+         si3 = pCoef[(ia3 * 2u) + 1u];
+
+         /*  Twiddle coefficients index modifier */
+         ia1 = ia1 + twidCoefModifier;
+
+         i0 = j;
+         do
+         {
+            /*  index calculation for the input as, */
+            /*  pSrc[i0 + 0], pSrc[i0 + fftLen/4], pSrc[i0 + fftLen/2], pSrc[i0 + 3fftLen/4] */
+            i1 = i0 + n2;
+            i2 = i1 + n2;
+            i3 = i2 + n2;
+
+            /* xa + xc */
+            r1 = pSrc[(2u * i0)] + pSrc[(2u * i2)];
+
+            /* xa - xc */
+            r2 = pSrc[(2u * i0)] - pSrc[(2u * i2)];
+
+            /* ya + yc */
+            s1 = pSrc[(2u * i0) + 1u] + pSrc[(2u * i2) + 1u];
+
+            /* ya - yc */
+            s2 = pSrc[(2u * i0) + 1u] - pSrc[(2u * i2) + 1u];
+
+            /* xb + xd */
+            t1 = pSrc[2u * i1] + pSrc[2u * i3];
+
+            /* xa' = xa + xb + xc + xd */
+            pSrc[2u * i0] = r1 + t1;
+
+            /* xa + xc -(xb + xd) */
+            r1 = r1 - t1;
+
+            /* yb + yd */
+            t2 = pSrc[(2u * i1) + 1u] + pSrc[(2u * i3) + 1u];
+
+            /* ya' = ya + yb + yc + yd */
+            pSrc[(2u * i0) + 1u] = s1 + t2;
+
+            /* (ya + yc) - (yb + yd) */
+            s1 = s1 - t2;
+
+            /* (yb - yd) */
+            t1 = pSrc[(2u * i1) + 1u] - pSrc[(2u * i3) + 1u];
+
+            /* (xb - xd) */
+            t2 = pSrc[2u * i1] - pSrc[2u * i3];
+
+            /* xc' = (xa-xb+xc-xd)co2 + (ya-yb+yc-yd)(si2) */
+            pSrc[2u * i1] = (r1 * co2) + (s1 * si2);
+
+            /* yc' = (ya-yb+yc-yd)co2 - (xa-xb+xc-xd)(si2) */
+            pSrc[(2u * i1) + 1u] = (s1 * co2) - (r1 * si2);
+
+            /* (xa - xc) + (yb - yd) */
+            r1 = r2 + t1;
+
+            /* (xa - xc) - (yb - yd) */
+            r2 = r2 - t1;
+
+            /* (ya - yc) -  (xb - xd) */
+            s1 = s2 - t2;
+
+            /* (ya - yc) +  (xb - xd) */
+            s2 = s2 + t2;
+
+            /* xb' = (xa+yb-xc-yd)co1 + (ya-xb-yc+xd)(si1) */
+            pSrc[2u * i2] = (r1 * co1) + (s1 * si1);
+
+            /* yb' = (ya-xb-yc+xd)co1 - (xa+yb-xc-yd)(si1) */
+            pSrc[(2u * i2) + 1u] = (s1 * co1) - (r1 * si1);
+
+            /* xd' = (xa-yb-xc+yd)co3 + (ya+xb-yc-xd)(si3) */
+            pSrc[2u * i3] = (r2 * co3) + (s2 * si3);
+
+            /* yd' = (ya+xb-yc-xd)co3 - (xa-yb-xc+yd)(si3) */
+            pSrc[(2u * i3) + 1u] = (s2 * co3) - (r2 * si3);
+         
+            i0 += n1;
+         } while( i0 < fftLen);
+         j++;
+      } while(j <= (n2 - 1u));
+      twidCoefModifier <<= 2u;
+   }
+
+#endif /* end of Cortex-M4/M3 vs. Cortex-M0 code paths */
+
+}
+
+/*    
+* @brief  Core function for the floating-point CIFFT butterfly process.   
+* @param[in, out] *pSrc            points to the in-place buffer of floating-point data type.   
+* @param[in]      fftLen           length of the FFT.   
+* @param[in]      *pCoef           points to twiddle coefficient buffer.   
+* @param[in]      twidCoefModifier twiddle coefficient modifier that supports different size FFTs with the same twiddle factor table.   
+* @param[in]      onebyfftLen      value of 1/fftLen.   
+* @return none.   
+*/
+
+void arm_radix4_butterfly_inverse_f32(
+float32_t * pSrc,
+uint16_t fftLen,
+float32_t * pCoef,
+uint16_t twidCoefModifier,
+float32_t onebyfftLen)
+{
+   float32_t co1, co2, co3, si1, si2, si3;
+   uint32_t ia1, ia2, ia3;
+   uint32_t i0, i1, i2, i3;
+   uint32_t n1, n2, j, k;
+
+#ifndef ARM_MATH_CM0_FAMILY
+
+   float32_t xaIn, yaIn, xbIn, ybIn, xcIn, ycIn, xdIn, ydIn;
+   float32_t Xaplusc, Xbplusd, Yaplusc, Ybplusd, Xaminusc, Xbminusd, Yaminusc,
+   Ybminusd;
+   float32_t Xb12C_out, Yb12C_out, Xc12C_out, Yc12C_out, Xd12C_out, Yd12C_out;
+   float32_t Xb12_out, Yb12_out, Xc12_out, Yc12_out, Xd12_out, Yd12_out;
+   float32_t *ptr1;
+   float32_t p0,p1,p2,p3,p4,p5,p6,p7;
+   float32_t a0,a1,a2,a3,a4,a5,a6,a7;
+
+
+   /*  Initializations for the first stage */
+   n2 = fftLen;
+   n1 = n2;
+
+   /* n2 = fftLen/4 */
+   n2 >>= 2u;
+   i0 = 0u;
+   ia1 = 0u;
+
+   j = n2;
+
+   /*  Calculation of first stage */
+   do
+   {
+      /*  index calculation for the input as, */
+      /*  pSrc[i0 + 0], pSrc[i0 + fftLen/4], pSrc[i0 + fftLen/2], pSrc[i0 + 3fftLen/4] */
+      i1 = i0 + n2;
+      i2 = i1 + n2;
+      i3 = i2 + n2;
+
+      /*  Butterfly implementation */
+      xaIn = pSrc[(2u * i0)];
+      yaIn = pSrc[(2u * i0) + 1u];
+
+      xcIn = pSrc[(2u * i2)];
+      ycIn = pSrc[(2u * i2) + 1u];
+
+      xbIn = pSrc[(2u * i1)];
+      ybIn = pSrc[(2u * i1) + 1u];
+
+      xdIn = pSrc[(2u * i3)];
+      ydIn = pSrc[(2u * i3) + 1u];
+
+      /* xa + xc */
+      Xaplusc = xaIn + xcIn;
+      /* xb + xd */
+      Xbplusd = xbIn + xdIn;
+      /* ya + yc */
+      Yaplusc = yaIn + ycIn;
+      /* yb + yd */
+      Ybplusd = ybIn + ydIn;
+
+      /*  index calculation for the coefficients */
+      ia2 = ia1 + ia1;
+      co2 = pCoef[ia2 * 2u];
+      si2 = pCoef[(ia2 * 2u) + 1u];
+
+      /* xa - xc */
+      Xaminusc = xaIn - xcIn;
+      /* xb - xd */
+      Xbminusd = xbIn - xdIn;
+      /* ya - yc */
+      Yaminusc = yaIn - ycIn;
+      /* yb - yd */
+      Ybminusd = ybIn - ydIn;
+
+      /* xa' = xa + xb + xc + xd */
+      pSrc[(2u * i0)] = Xaplusc + Xbplusd;
+
+      /* ya' = ya + yb + yc + yd */
+      pSrc[(2u * i0) + 1u] = Yaplusc + Ybplusd;
+
+      /* (xa - xc) - (yb - yd) */
+      Xb12C_out = (Xaminusc - Ybminusd);
+      /* (ya - yc) + (xb - xd) */
+      Yb12C_out = (Yaminusc + Xbminusd);
+      /* (xa + xc) - (xb + xd) */
+      Xc12C_out = (Xaplusc - Xbplusd);
+      /* (ya + yc) - (yb + yd) */
+      Yc12C_out = (Yaplusc - Ybplusd);
+      /* (xa - xc) + (yb - yd) */
+      Xd12C_out = (Xaminusc + Ybminusd);
+      /* (ya - yc) - (xb - xd) */
+      Yd12C_out = (Yaminusc - Xbminusd);
+
+      co1 = pCoef[ia1 * 2u];
+      si1 = pCoef[(ia1 * 2u) + 1u];
+
+      /*  index calculation for the coefficients */
+      ia3 = ia2 + ia1;
+      co3 = pCoef[ia3 * 2u];
+      si3 = pCoef[(ia3 * 2u) + 1u];
+
+      Xb12_out = Xb12C_out * co1;
+      Yb12_out = Yb12C_out * co1;
+      Xc12_out = Xc12C_out * co2;
+      Yc12_out = Yc12C_out * co2;
+      Xd12_out = Xd12C_out * co3;
+      Yd12_out = Yd12C_out * co3;
+   
+      /* xb' = (xa-yb-xc+yd)co1 - (ya+xb-yc-xd)(si1) */
+      //Xb12_out -= Yb12C_out * si1;
+      p0 = Yb12C_out * si1;
+      /* yb' = (ya+xb-yc-xd)co1 + (xa-yb-xc+yd)(si1) */
+      //Yb12_out += Xb12C_out * si1;
+      p1 = Xb12C_out * si1;
+      /* xc' = (xa-xb+xc-xd)co2 - (ya-yb+yc-yd)(si2) */
+      //Xc12_out -= Yc12C_out * si2;
+      p2 = Yc12C_out * si2;
+      /* yc' = (ya-yb+yc-yd)co2 + (xa-xb+xc-xd)(si2) */
+      //Yc12_out += Xc12C_out * si2;
+      p3 = Xc12C_out * si2;
+      /* xd' = (xa+yb-xc-yd)co3 - (ya-xb-yc+xd)(si3) */
+      //Xd12_out -= Yd12C_out * si3;
+      p4 = Yd12C_out * si3;
+      /* yd' = (ya-xb-yc+xd)co3 + (xa+yb-xc-yd)(si3) */
+      //Yd12_out += Xd12C_out * si3;
+      p5 = Xd12C_out * si3;
+      
+      Xb12_out -= p0;
+      Yb12_out += p1;
+      Xc12_out -= p2;
+      Yc12_out += p3;
+      Xd12_out -= p4;
+      Yd12_out += p5;
+
+      /* xc' = (xa-xb+xc-xd)co2 - (ya-yb+yc-yd)(si2) */
+      pSrc[2u * i1] = Xc12_out;
+
+      /* yc' = (ya-yb+yc-yd)co2 + (xa-xb+xc-xd)(si2) */
+      pSrc[(2u * i1) + 1u] = Yc12_out;
+
+      /* xb' = (xa-yb-xc+yd)co1 - (ya+xb-yc-xd)(si1) */
+      pSrc[2u * i2] = Xb12_out;
+
+      /* yb' = (ya+xb-yc-xd)co1 + (xa-yb-xc+yd)(si1) */
+      pSrc[(2u * i2) + 1u] = Yb12_out;
+
+      /* xd' = (xa+yb-xc-yd)co3 - (ya-xb-yc+xd)(si3) */
+      pSrc[2u * i3] = Xd12_out;
+
+      /* yd' = (ya-xb-yc+xd)co3 + (xa+yb-xc-yd)(si3) */
+      pSrc[(2u * i3) + 1u] = Yd12_out;
+
+      /*  Twiddle coefficients index modifier */
+      ia1 = ia1 + twidCoefModifier;
+
+      /*  Updating input index */
+      i0 = i0 + 1u;
+
+   } while(--j);
+
+   twidCoefModifier <<= 2u;
+
+   /*  Calculation of the second stage up to, but excluding, the last stage */
+   for (k = fftLen >> 2u; k > 4u; k >>= 2u)
+   {
+      /*  Initializations for the current stage */
+      n1 = n2;
+      n2 >>= 2u;
+      ia1 = 0u;
+
+      /*  Calculation of the current stage */
+      j = 0;
+      do
+      {
+         /*  index calculation for the coefficients */
+         ia2 = ia1 + ia1;
+         ia3 = ia2 + ia1;
+         co1 = pCoef[ia1 * 2u];
+         si1 = pCoef[(ia1 * 2u) + 1u];
+         co2 = pCoef[ia2 * 2u];
+         si2 = pCoef[(ia2 * 2u) + 1u];
+         co3 = pCoef[ia3 * 2u];
+         si3 = pCoef[(ia3 * 2u) + 1u];
+
+         /*  Twiddle coefficients index modifier */
+         ia1 = ia1 + twidCoefModifier;
+
+         i0 = j;
+         do
+         {
+            /*  index calculation for the input as, */
+            /*  pSrc[i0 + 0], pSrc[i0 + fftLen/4], pSrc[i0 + fftLen/2], pSrc[i0 + 3fftLen/4] */
+            i1 = i0 + n2;
+            i2 = i1 + n2;
+            i3 = i2 + n2;
+
+            xaIn = pSrc[(2u * i0)];
+            yaIn = pSrc[(2u * i0) + 1u];
+
+            xbIn = pSrc[(2u * i1)];
+            ybIn = pSrc[(2u * i1) + 1u];
+
+            xcIn = pSrc[(2u * i2)];
+            ycIn = pSrc[(2u * i2) + 1u];
+
+            xdIn = pSrc[(2u * i3)];
+            ydIn = pSrc[(2u * i3) + 1u];
+
+            /* xa - xc */
+            Xaminusc = xaIn - xcIn;
+            /* (xb - xd) */
+            Xbminusd = xbIn - xdIn;
+            /* ya - yc */
+            Yaminusc = yaIn - ycIn;
+            /* (yb - yd) */
+            Ybminusd = ybIn - ydIn;
+
+            /* xa + xc */
+            Xaplusc = xaIn + xcIn;
+            /* xb + xd */
+            Xbplusd = xbIn + xdIn;
+            /* ya + yc */
+            Yaplusc = yaIn + ycIn;
+            /* yb + yd */
+            Ybplusd = ybIn + ydIn;
+
+            /* (xa - xc) - (yb - yd) */
+            Xb12C_out = (Xaminusc - Ybminusd);
+            /* (ya - yc) +  (xb - xd) */
+            Yb12C_out = (Yaminusc + Xbminusd);
+            /* xa + xc -(xb + xd) */
+            Xc12C_out = (Xaplusc - Xbplusd);
+            /* (ya + yc) - (yb + yd) */
+            Yc12C_out = (Yaplusc - Ybplusd);
+            /* (xa - xc) + (yb - yd) */
+            Xd12C_out = (Xaminusc + Ybminusd);
+            /* (ya - yc) -  (xb - xd) */
+            Yd12C_out = (Yaminusc - Xbminusd);
+
+            pSrc[(2u * i0)] = Xaplusc + Xbplusd;
+            pSrc[(2u * i0) + 1u] = Yaplusc + Ybplusd;
+
+            Xb12_out = Xb12C_out * co1;
+            Yb12_out = Yb12C_out * co1;
+            Xc12_out = Xc12C_out * co2;
+            Yc12_out = Yc12C_out * co2;
+            Xd12_out = Xd12C_out * co3;
+            Yd12_out = Yd12C_out * co3;
+
+            /* xb' = (xa-yb-xc+yd)co1 - (ya+xb-yc-xd)(si1) */
+            //Xb12_out -= Yb12C_out * si1;
+            p0 = Yb12C_out * si1;
+            /* yb' = (ya+xb-yc-xd)co1 + (xa-yb-xc+yd)(si1) */
+            //Yb12_out += Xb12C_out * si1;
+            p1 = Xb12C_out * si1;
+            /* xc' = (xa-xb+xc-xd)co2 - (ya-yb+yc-yd)(si2) */
+            //Xc12_out -= Yc12C_out * si2;
+            p2 = Yc12C_out * si2;
+            /* yc' = (ya-yb+yc-yd)co2 + (xa-xb+xc-xd)(si2) */
+            //Yc12_out += Xc12C_out * si2;
+            p3 = Xc12C_out * si2;
+            /* xd' = (xa+yb-xc-yd)co3 - (ya-xb-yc+xd)(si3) */
+            //Xd12_out -= Yd12C_out * si3;
+            p4 = Yd12C_out * si3;
+            /* yd' = (ya-xb-yc+xd)co3 + (xa+yb-xc-yd)(si3) */
+            //Yd12_out += Xd12C_out * si3;
+            p5 = Xd12C_out * si3;
+            
+            Xb12_out -= p0;
+            Yb12_out += p1;
+            Xc12_out -= p2;
+            Yc12_out += p3;
+            Xd12_out -= p4;
+            Yd12_out += p5;
+
+            /* xc' = (xa-xb+xc-xd)co2 - (ya-yb+yc-yd)(si2) */
+            pSrc[2u * i1] = Xc12_out;
+
+            /* yc' = (ya-yb+yc-yd)co2 + (xa-xb+xc-xd)(si2) */
+            pSrc[(2u * i1) + 1u] = Yc12_out;
+
+            /* xb' = (xa-yb-xc+yd)co1 - (ya+xb-yc-xd)(si1) */
+            pSrc[2u * i2] = Xb12_out;
+
+            /* yb' = (ya+xb-yc-xd)co1 + (xa-yb-xc+yd)(si1) */
+            pSrc[(2u * i2) + 1u] = Yb12_out;
+
+            /* xd' = (xa+yb-xc-yd)co3 - (ya-xb-yc+xd)(si3) */
+            pSrc[2u * i3] = Xd12_out;
+
+            /* yd' = (ya-xb-yc+xd)co3 + (xa+yb-xc-yd)(si3) */
+            pSrc[(2u * i3) + 1u] = Yd12_out;
+
+            i0 += n1;
+         } while(i0 < fftLen);
+         j++;
+      } while(j <= (n2 - 1u));
+      twidCoefModifier <<= 2u;
+   }
+   /*  Initializations of last stage */
+
+   j = fftLen >> 2;
+   ptr1 = &pSrc[0];
+
+   /*  Calculations of last stage */
+   do
+   {
+      xaIn = ptr1[0];
+      yaIn = ptr1[1];
+      xbIn = ptr1[2];
+      ybIn = ptr1[3];
+      xcIn = ptr1[4];
+      ycIn = ptr1[5];
+      xdIn = ptr1[6];
+      ydIn = ptr1[7];
+
+      /*  Butterfly implementation */
+      /* xa + xc */
+      Xaplusc = xaIn + xcIn;
+
+      /* xa - xc */
+      Xaminusc = xaIn - xcIn;
+
+      /* ya + yc */
+      Yaplusc = yaIn + ycIn;
+
+      /* ya - yc */
+      Yaminusc = yaIn - ycIn;
+
+      /* xb + xd */
+      Xbplusd = xbIn + xdIn;
+
+      /* yb + yd */
+      Ybplusd = ybIn + ydIn;
+
+      /* (xb-xd) */
+      Xbminusd = xbIn - xdIn;
+
+      /* (yb-yd) */
+      Ybminusd = ybIn - ydIn;
+      
+      /* xa' = (xa+xb+xc+xd) * onebyfftLen */
+      a0 = (Xaplusc + Xbplusd);
+      /* ya' = (ya+yb+yc+yd) * onebyfftLen */
+      a1 = (Yaplusc + Ybplusd);
+      /* xc' = (xa-xb+xc-xd) * onebyfftLen */
+      a2 = (Xaplusc - Xbplusd);
+      /* yc' = (ya-yb+yc-yd) * onebyfftLen  */
+      a3 = (Yaplusc - Ybplusd);
+      /* xb' = (xa-yb-xc+yd) * onebyfftLen */
+      a4 = (Xaminusc - Ybminusd);
+      /* yb' = (ya+xb-yc-xd) * onebyfftLen */
+      a5 = (Yaminusc + Xbminusd);
+      /* xd' = (xa+yb-xc-yd) * onebyfftLen */
+      a6 = (Xaminusc + Ybminusd);
+      /* yd' = (ya-xb-yc+xd) * onebyfftLen */
+      a7 = (Yaminusc - Xbminusd);
+   
+      p0 = a0 * onebyfftLen;
+      p1 = a1 * onebyfftLen;
+      p2 = a2 * onebyfftLen;
+      p3 = a3 * onebyfftLen;
+      p4 = a4 * onebyfftLen;
+      p5 = a5 * onebyfftLen;
+      p6 = a6 * onebyfftLen;
+      p7 = a7 * onebyfftLen;
+   
+      /* xa' = (xa+xb+xc+xd) * onebyfftLen */
+      ptr1[0] = p0;
+      /* ya' = (ya+yb+yc+yd) * onebyfftLen */
+      ptr1[1] = p1;
+      /* xc' = (xa-xb+xc-xd) * onebyfftLen */
+      ptr1[2] = p2;
+      /* yc' = (ya-yb+yc-yd) * onebyfftLen  */
+      ptr1[3] = p3;
+      /* xb' = (xa-yb-xc+yd) * onebyfftLen */
+      ptr1[4] = p4;
+      /* yb' = (ya+xb-yc-xd) * onebyfftLen */
+      ptr1[5] = p5;
+      /* xd' = (xa+yb-xc-yd) * onebyfftLen */
+      ptr1[6] = p6;
+      /* yd' = (ya-xb-yc+xd) * onebyfftLen */
+      ptr1[7] = p7;
+
+      /* increment source pointer by 8 for next calculations */
+      ptr1 = ptr1 + 8u;
+
+   } while(--j);
+
+#else
+
+   float32_t t1, t2, r1, r2, s1, s2;
+
+   /* Run the below code for Cortex-M0 */
+
+   /*  Initializations for the first stage */
+   n2 = fftLen;
+   n1 = n2;
+
+   /*  Calculation of first stage */
+   for (k = fftLen; k > 4u; k >>= 2u)
+   {
+      /*  Initializations for the first stage */
+      n1 = n2;
+      n2 >>= 2u;
+      ia1 = 0u;
+
+      /*  Calculation of first stage */
+      j = 0;
+      do
+      {
+         /*  index calculation for the coefficients */
+         ia2 = ia1 + ia1;
+         ia3 = ia2 + ia1;
+         co1 = pCoef[ia1 * 2u];
+         si1 = pCoef[(ia1 * 2u) + 1u];
+         co2 = pCoef[ia2 * 2u];
+         si2 = pCoef[(ia2 * 2u) + 1u];
+         co3 = pCoef[ia3 * 2u];
+         si3 = pCoef[(ia3 * 2u) + 1u];
+
+         /*  Twiddle coefficients index modifier */
+         ia1 = ia1 + twidCoefModifier;
+
+         i0 = j;
+         do
+         {
+            /*  index calculation for the input as, */
+            /*  pSrc[i0 + 0], pSrc[i0 + fftLen/4], pSrc[i0 + fftLen/2], pSrc[i0 + 3fftLen/4] */
+            i1 = i0 + n2;
+            i2 = i1 + n2;
+            i3 = i2 + n2;
+
+            /* xa + xc */
+            r1 = pSrc[(2u * i0)] + pSrc[(2u * i2)];
+
+            /* xa - xc */
+            r2 = pSrc[(2u * i0)] - pSrc[(2u * i2)];
+
+            /* ya + yc */
+            s1 = pSrc[(2u * i0) + 1u] + pSrc[(2u * i2) + 1u];
+
+            /* ya - yc */
+            s2 = pSrc[(2u * i0) + 1u] - pSrc[(2u * i2) + 1u];
+
+            /* xb + xd */
+            t1 = pSrc[2u * i1] + pSrc[2u * i3];
+
+            /* xa' = xa + xb + xc + xd */
+            pSrc[2u * i0] = r1 + t1;
+
+            /* xa + xc -(xb + xd) */
+            r1 = r1 - t1;
+
+            /* yb + yd */
+            t2 = pSrc[(2u * i1) + 1u] + pSrc[(2u * i3) + 1u];
+
+            /* ya' = ya + yb + yc + yd */
+            pSrc[(2u * i0) + 1u] = s1 + t2;
+
+            /* (ya + yc) - (yb + yd) */
+            s1 = s1 - t2;
+
+            /* (yb - yd) */
+            t1 = pSrc[(2u * i1) + 1u] - pSrc[(2u * i3) + 1u];
+
+            /* (xb - xd) */
+            t2 = pSrc[2u * i1] - pSrc[2u * i3];
+
+            /* xc' = (xa-xb+xc-xd)co2 - (ya-yb+yc-yd)(si2) */
+            pSrc[2u * i1] = (r1 * co2) - (s1 * si2);
+
+            /* yc' = (ya-yb+yc-yd)co2 + (xa-xb+xc-xd)(si2) */
+            pSrc[(2u * i1) + 1u] = (s1 * co2) + (r1 * si2);
+
+            /* (xa - xc) - (yb - yd) */
+            r1 = r2 - t1;
+
+            /* (xa - xc) + (yb - yd) */
+            r2 = r2 + t1;
+
+            /* (ya - yc) +  (xb - xd) */
+            s1 = s2 + t2;
+
+            /* (ya - yc) -  (xb - xd) */
+            s2 = s2 - t2;
+
+            /* xb' = (xa-yb-xc+yd)co1 - (ya+xb-yc-xd)(si1) */
+            pSrc[2u * i2] = (r1 * co1) - (s1 * si1);
+
+            /* yb' = (ya+xb-yc-xd)co1 + (xa-yb-xc+yd)(si1) */
+            pSrc[(2u * i2) + 1u] = (s1 * co1) + (r1 * si1);
+
+            /* xd' = (xa+yb-xc-yd)co3 - (ya-xb-yc+xd)(si3) */
+            pSrc[2u * i3] = (r2 * co3) - (s2 * si3);
+
+            /* yd' = (ya-xb-yc+xd)co3 + (xa+yb-xc-yd)(si3) */
+            pSrc[(2u * i3) + 1u] = (s2 * co3) + (r2 * si3);
+         
+            i0 += n1;
+         } while( i0 < fftLen);
+         j++;
+      } while(j <= (n2 - 1u));
+      twidCoefModifier <<= 2u;
+   }
+   /*  Initializations of last stage */
+   n1 = n2;
+   n2 >>= 2u;
+
+   /*  Calculations of last stage */
+   for (i0 = 0u; i0 <= (fftLen - n1); i0 += n1)
+   {
+      /*  index calculation for the input as, */
+      /*  pSrc[i0 + 0], pSrc[i0 + fftLen/4], pSrc[i0 + fftLen/2], pSrc[i0 + 3fftLen/4] */
+      i1 = i0 + n2;
+      i2 = i1 + n2;
+      i3 = i2 + n2;
+
+      /*  Butterfly implementation */
+      /* xa + xc */
+      r1 = pSrc[2u * i0] + pSrc[2u * i2];
+
+      /* xa - xc */
+      r2 = pSrc[2u * i0] - pSrc[2u * i2];
+
+      /* ya + yc */
+      s1 = pSrc[(2u * i0) + 1u] + pSrc[(2u * i2) + 1u];
+
+      /* ya - yc */
+      s2 = pSrc[(2u * i0) + 1u] - pSrc[(2u * i2) + 1u];
+
+      /* xb + xd */
+      t1 = pSrc[2u * i1] + pSrc[2u * i3];
+
+      /* xa' = (xa + xb + xc + xd) * onebyfftLen */
+      pSrc[2u * i0] = (r1 + t1) * onebyfftLen;
+
+      /* (xa + xc) - (xb + xd) */
+      r1 = r1 - t1;
+
+      /* yb + yd */
+      t2 = pSrc[(2u * i1) + 1u] + pSrc[(2u * i3) + 1u];
+
+      /* ya' = (ya + yb + yc + yd) * onebyfftLen */
+      pSrc[(2u * i0) + 1u] = (s1 + t2) * onebyfftLen;
+
+      /* (ya + yc) - (yb + yd) */
+      s1 = s1 - t2;
+
+      /* (yb-yd) */
+      t1 = pSrc[(2u * i1) + 1u] - pSrc[(2u * i3) + 1u];
+
+      /* (xb-xd) */
+      t2 = pSrc[2u * i1] - pSrc[2u * i3];
+
+      /* xc' = (xa-xb+xc-xd) * onebyfftLen */
+      pSrc[2u * i1] = r1 * onebyfftLen;
+
+      /* yc' = (ya-yb+yc-yd) * onebyfftLen */
+      pSrc[(2u * i1) + 1u] = s1 * onebyfftLen;
+
+      /* (xa - xc) - (yb-yd) */
+      r1 = r2 - t1;
+
+      /* (xa - xc) + (yb-yd) */
+      r2 = r2 + t1;
+
+      /* (ya - yc) + (xb-xd) */
+      s1 = s2 + t2;
+
+      /* (ya - yc) - (xb-xd) */
+      s2 = s2 - t2;
+
+      /* xb' = (xa-yb-xc+yd) * onebyfftLen */
+      pSrc[2u * i2] = r1 * onebyfftLen;
+
+      /* yb' = (ya+xb-yc-xd) * onebyfftLen */
+      pSrc[(2u * i2) + 1u] = s1 * onebyfftLen;
+
+      /* xd' = (xa+yb-xc-yd) * onebyfftLen */
+      pSrc[2u * i3] = r2 * onebyfftLen;
+
+      /* yd' = (ya-xb-yc+xd) * onebyfftLen */
+      pSrc[(2u * i3) + 1u] = s2 * onebyfftLen;
+   }
+
+#endif /* #ifndef ARM_MATH_CM0_FAMILY_FAMILY */
+}
+
+/**    
+* @addtogroup ComplexFFT    
+* @{    
+*/
+
+/**    
+* @details    
+* @brief Processing function for the floating-point Radix-4 CFFT/CIFFT.   
+* @deprecated Do not use this function.  It has been superseded by \ref arm_cfft_f32 and will be removed
+* in the future.
+* @param[in]      *S    points to an instance of the floating-point Radix-4 CFFT/CIFFT structure.   
+* @param[in, out] *pSrc points to the complex data buffer of size <code>2*fftLen</code>. Processing occurs in-place.   
+* @return none.   
+*/
+
+void arm_cfft_radix4_f32(
+const arm_cfft_radix4_instance_f32 * S,
+float32_t * pSrc)
+{
+
+   if(S->ifftFlag == 1u)
+   {
+      /*  Complex IFFT radix-4  */
+      arm_radix4_butterfly_inverse_f32(pSrc, S->fftLen, S->pTwiddle,
+      S->twidCoefModifier, S->onebyfftLen);
+   }
+   else
+   {
+      /*  Complex FFT radix-4  */
+      arm_radix4_butterfly_f32(pSrc, S->fftLen, S->pTwiddle,
+      S->twidCoefModifier);
+   }
+
+   if(S->bitReverseFlag == 1u)
+   {
+      /*  Bit Reversal */
+      arm_bitreversal_f32(pSrc, S->fftLen, S->bitRevFactor, S->pBitRevTable);
+   }
+
+}
+
+/**    
+* @} end of ComplexFFT group    
+*/
+

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/Source/TransformFunctions/arm_cfft_radix4_init_f32.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/Source/TransformFunctions/arm_cfft_radix4_init_f32.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,165 @@
+/* ----------------------------------------------------------------------    
+* Copyright (C) 2010-2014 ARM Limited. All rights reserved.    
+*    
+* $Date:        19. March 2015 
+* $Revision: 	V.1.4.5  
+*    
+* Project: 	    CMSIS DSP Library    
+* Title:	    arm_cfft_radix4_init_f32.c    
+*    
+* Description:	Radix-4 Decimation in Frequency Floating-point CFFT & CIFFT Initialization function    
+*    
+* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
+*  
+* Redistribution and use in source and binary forms, with or without 
+* modification, are permitted provided that the following conditions
+* are met:
+*   - Redistributions of source code must retain the above copyright
+*     notice, this list of conditions and the following disclaimer.
+*   - Redistributions in binary form must reproduce the above copyright
+*     notice, this list of conditions and the following disclaimer in
+*     the documentation and/or other materials provided with the 
+*     distribution.
+*   - Neither the name of ARM LIMITED nor the names of its contributors
+*     may be used to endorse or promote products derived from this
+*     software without specific prior written permission.
+*
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
+* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.    
+* -------------------------------------------------------------------- */
+
+
+#include "arm_math.h"
+#include "arm_common_tables.h"
+
+/**    
+ * @ingroup groupTransforms    
+ */
+
+/**    
+ * @addtogroup ComplexFFT    
+ * @{    
+ */
+
+/**    
+* @brief  Initialization function for the floating-point CFFT/CIFFT.   
+* @deprecated Do not use this function.  It has been superseded by \ref arm_cfft_f32 and will be removed
+* in the future.
+* @param[in,out] *S             points to an instance of the floating-point CFFT/CIFFT structure.   
+* @param[in]     fftLen         length of the FFT.   
+* @param[in]     ifftFlag       flag that selects forward (ifftFlag=0) or inverse (ifftFlag=1) transform.   
+* @param[in]     bitReverseFlag flag that enables (bitReverseFlag=1) or disables (bitReverseFlag=0) bit reversal of output.   
+* @return        The function returns ARM_MATH_SUCCESS if initialization is successful or ARM_MATH_ARGUMENT_ERROR if <code>fftLen</code> is not a supported value.   
+*    
+* \par Description:   
+* \par    
+* The parameter <code>ifftFlag</code> controls whether a forward or inverse transform is computed.    
+* Set ifftFlag (=1) to calculate the CIFFT; otherwise the CFFT is calculated.   
+* \par    
+* The parameter <code>bitReverseFlag</code> controls whether output is in normal order or bit reversed order.    
+* Set bitReverseFlag (=1) for output in normal order; otherwise output is in bit reversed order.    
+* \par    
+* The parameter <code>fftLen</code>	Specifies length of CFFT/CIFFT process. Supported FFT Lengths are 16, 64, 256, 1024, 4096.    
+* \par    
+* This Function also initializes Twiddle factor table pointer and Bit reversal table pointer.    
+*/
+
+arm_status arm_cfft_radix4_init_f32(
+  arm_cfft_radix4_instance_f32 * S,
+  uint16_t fftLen,
+  uint8_t ifftFlag,
+  uint8_t bitReverseFlag)
+{
+  /*  Initialise the default arm status */
+  arm_status status = ARM_MATH_SUCCESS;
+
+  /*  Initialise the FFT length */
+  S->fftLen = fftLen;
+
+  /*  Initialise the Twiddle coefficient pointer */
+  S->pTwiddle = (float32_t *) twiddleCoef;
+
+  /*  Initialise the Flag for selection of CFFT or CIFFT */
+  S->ifftFlag = ifftFlag;
+
+  /*  Initialise the Flag for calculation Bit reversal or not */
+  S->bitReverseFlag = bitReverseFlag;
+
+  /*  Initializations of structure parameters depending on the FFT length */
+  switch (S->fftLen)
+  {
+
+  case 4096u:
+    /*  Initializations of structure parameters for 4096 point FFT */
+
+    /*  Initialise the twiddle coef modifier value */
+    S->twidCoefModifier = 1u;
+    /*  Initialise the bit reversal table modifier */
+    S->bitRevFactor = 1u;
+    /*  Initialise the bit reversal table pointer */
+    S->pBitRevTable = (uint16_t *) armBitRevTable;
+    /*  Initialise the 1/fftLen Value */
+    S->onebyfftLen = 0.000244140625f;
+    break;
+
+  case 1024u:
+    /*  Initializations of structure parameters for 1024 point FFT */
+
+    /*  Initialise the twiddle coef modifier value */
+    S->twidCoefModifier = 4u;
+    /*  Initialise the bit reversal table modifier */
+    S->bitRevFactor = 4u;
+    /*  Initialise the bit reversal table pointer */
+    S->pBitRevTable = (uint16_t *) & armBitRevTable[3];
+    /*  Initialise the 1/fftLen Value */
+    S->onebyfftLen = 0.0009765625f;
+    break;
+
+
+  case 256u:
+    /*  Initializations of structure parameters for 256 point FFT */
+    S->twidCoefModifier = 16u;
+    S->bitRevFactor = 16u;
+    S->pBitRevTable = (uint16_t *) & armBitRevTable[15];
+    S->onebyfftLen = 0.00390625f;
+    break;
+
+  case 64u:
+    /*  Initializations of structure parameters for 64 point FFT */
+    S->twidCoefModifier = 64u;
+    S->bitRevFactor = 64u;
+    S->pBitRevTable = (uint16_t *) & armBitRevTable[63];
+    S->onebyfftLen = 0.015625f;
+    break;
+
+  case 16u:
+    /*  Initializations of structure parameters for 16 point FFT */
+    S->twidCoefModifier = 256u;
+    S->bitRevFactor = 256u;
+    S->pBitRevTable = (uint16_t *) & armBitRevTable[255];
+    S->onebyfftLen = 0.0625f;
+    break;
+
+
+  default:
+    /*  Reporting argument error if fftLen is not a valid value */
+    status = ARM_MATH_ARGUMENT_ERROR;
+    break;
+  }
+
+  return (status);
+}
+
+/**    
+ * @} end of ComplexFFT group    
+ */

diff -r b9debc14d077 -r 9dd7c64b4a64 CMSIS-DSP_Lib/license.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CMSIS-DSP_Lib/license.txt	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,29 @@
+All files contained in the folders "CMSIS\DSP-Lib\Source" and "CMSIS\DSP-Lib\Examples"
+are guided by the following license:
+
+Copyright (C) 2009-2015 ARM Limited. 
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+ - Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+ - Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+ - Neither the name of ARM nor the names of its contributors may be used 
+   to endorse or promote products derived from this software without 
+   specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS AND CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+

diff -r b9debc14d077 -r 9dd7c64b4a64 CN0540_FFT/cn0540_adi_fft.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CN0540_FFT/cn0540_adi_fft.c	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,483 @@
+/******************************************************************************
+ *Copyright (c)2020 Analog Devices, Inc.  
+ *
+ * Licensed under the 2020-04-27-CN0540EC License(the "License");
+ * you may not use this file except in compliance with the License.
+ *
+ ****************************************************************************/
+ 
+#include "cn0540_adi_fft.h"
+#include "cn0540_windowing.h"
+#include "stdio.h"
+#include "stdlib.h"
+#include <math.h>
+#include <stdbool.h>
+
+static arm_cfft_radix4_instance_f32 S;
+
+/**
+ * Initialize the FFT structure
+ * @param **fft_entry_init - the FFT data structure
+ * @param **fft_meas - the FFT measurements structure
+ */
+int32_t FFT_init_params(struct fft_entry **fft_entry_init, struct fft_measurements **fft_meas)
+{
+	struct fft_entry *fft_data_init;
+	struct fft_measurements *fft_meas_init;
+
+	fft_data_init = (struct fft_entry *)malloc(sizeof(*fft_data_init));
+	fft_meas_init = (struct fft_measurements *)malloc(sizeof(*fft_meas_init));
+	if (!fft_data_init || !fft_meas_init) {
+		/* free(NULL) is a no-op, so both pointers can be released unconditionally */
+		free(fft_data_init);
+		free(fft_meas_init);
+		return -1;
+	}
+	fft_data_init->window = BLACKMAN_HARRIS_7TERM;
+	fft_data_init->vref = (float)(4096);
+	fft_data_init->sample_rate = 32000;
+	fft_data_init->mclk = 16384;
+	fft_data_init->fft_length = 4096;
+	fft_data_init->bin_width = 0.0;
+	fft_data_init->fft_done = false;
+	
+	*fft_entry_init = fft_data_init;
+	
+	fft_meas_init->fundamental = 0.0;
+	fft_meas_init->pk_spurious_noise = 0.0;
+	fft_meas_init->pk_spurious_freq = 0;	
+	fft_meas_init->THD = 0.0;
+	fft_meas_init->SNR = 0.0;
+	fft_meas_init->DR = 0.0;
+	fft_meas_init->SINAD = 0.0;
+	fft_meas_init->SFDR_dbc = 0.0;
+	fft_meas_init->SFDR_dbfs = 0.0;
+	fft_meas_init->ENOB = 0.0;
+	fft_meas_init->RMS_noise = 0.0;
+	fft_meas_init->average_bin_noise = 0.0;
+	fft_meas_init->max_amplitude = 0.0;
+	fft_meas_init->min_amplitude = 0.0; 
+	fft_meas_init->pk_pk_amplitude = 0.0;
+	fft_meas_init->DC = 0.0;
+	fft_meas_init->transition_noise = 0.0;
+	fft_meas_init->max_amplitude_LSB = 0;
+	fft_meas_init->min_amplitude_LSB = 0; 
+	fft_meas_init->pk_pk_amplitude_LSB = 0;
+	fft_meas_init->DC_LSB = 0;
+	fft_meas_init->transition_noise_LSB = 0.0;
+	
+	for (uint8_t i = 0; i < 7; i++) {
+		fft_meas_init->harmonics_mag_dbfs[i] = 0.0;
+		fft_meas_init->harmonics_freq[i] = 0;
+		fft_meas_init->harmonics_power[i] = 0.0;
+	}
+	
+	*fft_meas = fft_meas_init;
+	
+	return 0;
+}
+
+/**
+ * Initialize the FFT module
+ * The function has to be called every time the user wants to change the number of samples
+ * @param sample_count - desired FFT samples
+ * @param *fft_data - fft_data structure
+ */
+void FFT_init(uint16_t sample_count, struct fft_entry *fft_data)
+{
+	arm_cfft_radix4_init_f32(&S, sample_count, 0, 1);
+	fft_data->fft_length = sample_count / 2;
+}
+
+/**
+ * Update reference voltage and Master clock
+ * The function has to be called every time the user wants to change Vref or Mclk
+ * @param reference - The reference voltage in mV
+ * @param master_clock - The master clock frequency in kHz
+ * @param sampling_rate - The sampling rate in kHz
+ * @param *fft_data - fft_data structure
+ */
+void update_FFT_enviroment(uint16_t reference, uint16_t master_clock, uint16_t sampling_rate, struct fft_entry *fft_data)
+{
+	fft_data->vref = ((float)(reference)) / ((float)(1000.0)); 			// Convert reference voltage to V
+	fft_data->mclk = master_clock;											    // MCLK in kHz
+	fft_data->sample_rate = sampling_rate;										// Sampling rate
+}
+
+
+
+
+/**
+ * Perform the FFT
+ * @param *data - pointer to sampled data
+ * @param *fft_data		-	fft_data structure
+ * @param *fft_meas		-	fft_meas structure
+ * @param sample_rate	-	sample rate based on MCLK, MCLK_DIV and Digital filter settings
+ */
+void perform_FFT(uint32_t *data, struct fft_entry *fft_data, struct fft_measurements *fft_meas, uint32_t sample_rate)
+{
+	uint32_t i;
+	int32_t shifted_data = 0;	
+	double coeffs_sum = 0.0;
+	
+	fft_data->fft_done = false;
+	fft_data->sample_rate = sample_rate;														// get sample rate
+	fft_data->bin_width = (float)(fft_data->sample_rate) / ((float)(fft_data->fft_length)*2);	// get bin width
+	
+	
+	//Converting RAW adc data to codes
+	for(i = 0 ; i < fft_data->fft_length * 2 ; i++) {
+		if (data[i] & 0x800000)
+			shifted_data = (int32_t)(0xFF000000u | data[i]);	// Sign-extend the 24-bit code; unsigned mask avoids overflowing a signed shift
+		else
+			shifted_data = (int32_t)((0x00 << 24) | data[i]);	
+		
+		fft_data->codes[i] = shifted_data;  													// Codes in range 0 +/- ZERO SCALE
+		fft_data->zero_scale_codes[i] = shifted_data + ADC_ZERO_SCALE;							// Codes shifted up by ZERO SCALE - range ZERO SCALE +/- ZERO SCALE
+	}	
+	// Find max, min, pk-pk amplitude, DC offset, Transition noise
+	FFT_waveform_stat(fft_data, fft_meas);	
+	
+	// Converting codes without DC offset to "volts" without respect to Vref voltage
+	for (i = 0; i < fft_data->fft_length * 4; i++) {
+		// Filling array for FFT, conversion to voltage without respect to Vref voltage		
+		fft_data->fft_input[i] = ((((float)(fft_data->codes[i / 2])) / ADC_ZERO_SCALE));
+				
+		// Voltage array with respect to full scale voltage, Vpp
+		fft_data->voltage[i / 2] = (float)((2*fft_data->vref / ADC_ZERO_SCALE) * fft_data->codes[i / 2]);
+				
+		// Imaginary part
+		fft_data->fft_input[++i] = 0;
+	}	
+	// Apply windowing
+	FFT_windowing(fft_data, &coeffs_sum);	
+	//perform FFT, passing the input array, complex FFT values will be stored to the input array
+	arm_cfft_radix4_f32(&S, fft_data->fft_input);	
+	//transform from complex FFT to magnitude
+	arm_cmplx_mag_f32(fft_data->fft_input, fft_data->fft_magnitude, fft_data->fft_length);	
+	// Convert FFT magnitude to dB for plot
+	FFT_maginutde_do_dB(fft_data, coeffs_sum);	
+	//Calculate THD
+	FFT_calculate_THD(fft_data, fft_meas);	
+	// Calculate noise and its parameters from FFT points
+	FFT_calculate_noise(fft_data, fft_meas);	
+	// FFT finish flag
+	fft_data->fft_done = true;
+}
+
+/**
+ * Windowing function
+ * 7-term Blackman-Harris and rectangular windows for now, prepared for more windowing functions if needed
+ * Precalculated coefficients are used for a 4096-sample length
+ * @param *fft_data			-	fft_data structure; fft_input holds 2 * fft_length complex samples,
+ *								because the window is applied to the samples, not to the FFT output yet
+ * @param *sum				-	pointer to sum of all the coeffs
+ * 
+ */
+void static FFT_windowing(struct fft_entry *fft_data, double *sum)
+{
+	uint8_t j;
+	uint16_t i;
+	double term = 0.0;
+	const double sample_count = (double)((fft_data->fft_length * 2) - 1);
+	
+	for (i = 0; i < fft_data->fft_length * 4; i++)
+	{
+		switch (fft_data->window) {																	// Switch prepared for other windowing functions
+		case BLACKMAN_HARRIS_7TERM:		
+			if (fft_data->fft_length == 2048)													// For 4096 samples, precalculated coefficients are used
+				term = Seven_Term_Blackman_Harris_4096[i / 2];			
+			else {																			// 7-term BH windowing formula 
+				for (j = 0; j < 7; j++)
+					term += seven_term_BH_coefs[j] * cos((double)((2.0 * PI * j * (i / 2))) / sample_count);
+			}
+			break;
+		case RECTANGULAR:																			// No window, all terms = 1
+			term = 1;
+			break;			
+		default:
+			break;
+		}
+		*sum += term;																				// Getting sum of all terms, which will be used for amplitude correction
+		fft_data->fft_input[i] *= (float)(term);												// Multiplying each (real) sample by windowing term
+		term = 0.0;
+		i++;																						// +1, to consider only real (not imaginary) samples
+	}
+}
+
+
+/**
+ * Convert magnitude to dB
+ * @param *fft_data -	fft_data structure
+ * @param sum		-	sum of all windowing coeffs
+ */
+void static FFT_maginutde_do_dB(struct fft_entry *fft_data, double sum)
+{
+	uint16_t i;
+	float correction = 0;
+	
+	// Getting sum of coeffs
+	// If rectangular window is chosen = no windowing, sum of coeffs is number of samples
+	const float coeff_sum = ((fft_data->fft_length == 2048) && 
+							(fft_data->window == BLACKMAN_HARRIS_7TERM)) ? ((float)(Seven_Term_Blackman_Harris_4096_sum)) :
+							((fft_data->window == RECTANGULAR) ? ((float)(fft_data->fft_length * 2.0)) : (float)(sum));
+	
+	for (i = 0; i < fft_data->fft_length; i++) {
+		// Apply a correction factor
+		// Divide magnitude by a sum of the windowing function coefficients
+		// Multiply by 2 because of power spread over spectrum below and above the Nyquist frequency
+		correction = (fft_data->fft_magnitude[i] * 2.0) / coeff_sum;
+		
+		// FFT magnitude with windowing correction
+		fft_data->fft_magnitude_corrected[i] = correction;
+		
+		//Convert to dB without respect to Vref
+		fft_data->fft_dB[i] = 20.0 * (log10f(correction));
+	}
+}
+
+/**
+ * THD calculation with support for harmonics folding into the first Nyquist zone
+ * @param *fft_data - fft_data structure
+ * @param *fft_meas - fft_meas structure
+ */
+void static FFT_calculate_THD(struct fft_entry *fft_data, struct fft_measurements *fft_meas)
+{
+	const uint16_t first_nyquist_zone = fft_data->fft_length;
+	uint16_t i, j, k = 0, fund_freq = 0, harmonic_position;
+	int8_t m, nyquist_zone;
+	float mag_helper = -200.0, freq_helper = 0.0, sum = 0.0, fund_mag = -200.0; 
+	float fund_pow_bins[21], harm_pow_bins[5][7];
+	
+	// Looking for the fundamental frequency and amplitude
+	for(i = DC_BINS ; i < fft_data->fft_length ; i++) {												// Not counting DC bins
+		if (fft_data->fft_dB[i] > fund_mag) {
+			fund_mag = fft_data->fft_dB[i];
+			fund_freq = i;
+		}
+	}
+	
+	fft_meas->harmonics_freq[0] = fund_freq; 													// Fundamental frequency bin
+	fft_meas->harmonics_mag_dbfs[0] = fund_mag;  												// Fundamental magnitude in dBFS
+	fft_meas->fundamental = dbfs_to_volts(fft_data->vref, fund_mag); 							// Fundamental magnitude in V
+
+	for(i = 1 ; i < 6 ; i++) {		
+		if (fft_meas->harmonics_freq[0] * (i + 1) < first_nyquist_zone)							// Checking whether the (i+1)-th harmonic lies inside the first Nyquist zone
+			harmonic_position = fft_meas->harmonics_freq[0] * (i + 1);
+		else {
+			nyquist_zone = 1 + (fft_meas->harmonics_freq[0] * (i + 1) / first_nyquist_zone); 	// Determine the nyquist zone
+			if(nyquist_zone % 2)																// Odd nyquist zones: 3, 5, 7...
+				harmonic_position = first_nyquist_zone - (first_nyquist_zone * nyquist_zone - fft_meas->harmonics_freq[0] * (i + 1));
+			else																				// Even nyquist zones: 2, 4, 6...
+				harmonic_position = first_nyquist_zone * nyquist_zone - fft_meas->harmonics_freq[0] * (i + 1);
+		}
+		// Extend searching range by 3 bins around expected position of the harmonic
+		for(m = -HARM_BINS ; m <= HARM_BINS ; m++) {
+			if (fft_data->fft_dB[harmonic_position + m] > mag_helper) {
+				mag_helper = fft_data->fft_dB[harmonic_position + m];
+				freq_helper = (harmonic_position + m);
+			}	
+		}
+		
+		fft_meas->harmonics_freq[i] = freq_helper;
+		fft_meas->harmonics_mag_dbfs[i]  = mag_helper;
+		mag_helper = -200.0;
+	}	
+	// Power leakage of the fundamental
+	for(i = fft_meas->harmonics_freq[0] - FUND_BINS ; i <= fft_meas->harmonics_freq[0] + FUND_BINS ; i++) {
+		sum += powf(((fft_data->fft_magnitude_corrected[i] / (2.0*SQRT_2))), 2.0);
+		fund_pow_bins[k] = fft_data->fft_magnitude_corrected[i];
+		k++;
+	}		
+	// Finishing the RSS of power-leaked fundamental
+	sum = sqrt(sum);
+	fft_meas->harmonics_power[0] = sum * 2.0 * SQRT_2;
+	sum = 0.0;
+	k = 0;	
+	// Power leakage of the harmonics
+	for(j = 1 ; j <= 5 ; j++) {
+		for (i = fft_meas->harmonics_freq[j] - HARM_BINS; i <= fft_meas->harmonics_freq[j] + HARM_BINS; i++) {
+			sum += powf(((fft_data->fft_magnitude_corrected[i] / (2.0*SQRT_2))), 2.0);
+			harm_pow_bins[j - 1][k] = fft_data->fft_magnitude_corrected[i];
+			k++;
+		}		
+		// Finishing the RSS of power-leaked harmonics
+		k = 0;
+		sum = sqrt(sum);
+		fft_meas->harmonics_power[j] = sum * 2.0 * SQRT_2;
+		sum = 0.0;
+	}	
+	// The THD formula
+	fft_meas->THD = sqrtf(powf(fft_meas->harmonics_power[1], 2.0) + powf(fft_meas->harmonics_power[2], 2.0) + powf(fft_meas->harmonics_power[3], 2.0) + powf(fft_meas->harmonics_power[4], 2.0) + powf(fft_meas->harmonics_power[5], 2.0)) / fft_meas->harmonics_power[0];
+	// Back from volts to dB
+	fft_meas->THD = 20.0 * log10f(fft_meas->THD);
+}
+
+/**
+ * Calculate amplitudes: min, max, pk-pk amplitude and DC part
+ * @param *fft_data - fft_data structure
+ * @param *fft_meas - fft_meas structure
+ */
+void static FFT_waveform_stat(struct fft_entry *fft_data, struct fft_measurements *fft_meas)
+{
+	uint16_t i;
+	int16_t max_position = 0, min_position = 0;
+	int32_t max = -ADC_ZERO_SCALE, min = ADC_ZERO_SCALE, offset_correction;
+	int64_t sum = 0;
+	double deviation = 0.0, mean;
+	
+	// Sum of all codes, to find the mean value
+	for(i = 0; i < fft_data->fft_length * 2; i++)
+		sum += fft_data->codes[i];
+	
+	// Calculating mean value = DC offset
+	mean = (double)sum / (double)(fft_data->fft_length * 2);	// Floating-point division keeps the fractional part of the mean
+	
+	// DC part in LSBs
+	fft_meas->DC_LSB = (int32_t)(mean) + ADC_ZERO_SCALE;
+	offset_correction = (int32_t)(mean);
+	
+	// Min, Max amplitudes + Deviation
+	for (i = 0; i < fft_data->fft_length * 2; i++) {						// Working with codes = fft_length * 2
+		// Calculating the Deviation for Transition noise
+		deviation += pow(fft_data->codes[i] - mean, 2.0);
+		
+		// Looking for MAX value
+		if (fft_data->codes[i] > max) {
+			max = fft_data->codes[i];
+			max_position = i;
+		}
+		// Looking for MIN value
+		if (fft_data->codes[i] < min) {
+			min = fft_data->codes[i];
+			min_position = i;
+		}	
+	}	
+	// Amplitudes in Volts
+	fft_meas->max_amplitude = (2.0 * fft_data->vref * fft_data->codes[max_position]) / ADC_FULL_SCALE; 
+	fft_meas->min_amplitude = (2.0 * fft_data->vref * fft_data->codes[min_position]) / ADC_FULL_SCALE; 
+	fft_meas->pk_pk_amplitude = fft_meas->max_amplitude - fft_meas->min_amplitude;
+	fft_meas->DC = (2.0 * fft_data->vref * ((float)(((int32_t)(fft_meas->DC_LSB) - ADC_ZERO_SCALE)))) / ADC_FULL_SCALE;
+	
+	// Amplitudes in LSBs
+	fft_meas->max_amplitude_LSB = fft_data->codes[max_position] + ADC_ZERO_SCALE;
+	fft_meas->min_amplitude_LSB = fft_data->codes[min_position] + ADC_ZERO_SCALE;
+	fft_meas->pk_pk_amplitude_LSB = fft_meas->max_amplitude_LSB - fft_meas->min_amplitude_LSB;
+	
+	// Transition noise
+	deviation = (sqrt(deviation / (fft_data->fft_length * 2.0)));
+	fft_meas->transition_noise_LSB = (uint32_t)(deviation);
+	fft_meas->transition_noise = (2.0 * fft_data->vref * fft_meas->transition_noise_LSB) / ADC_FULL_SCALE;
+	
+	// RMS noise
+	fft_meas->RMS_noise = fft_meas->transition_noise;
+	
+	// Applying mean value to each sample = removing DC offset
+	for(i = 0 ; i < fft_data->fft_length * 2 ; i++)
+		fft_data->codes[i] -= offset_correction;
+	
+}
+
+/**
+ * Calculate the RMS noise from the FFT plot
+ * @param *fft_data - fft_data structure
+ * @param *fft_meas - fft_meas structure
+ */
+void static FFT_calculate_noise(struct fft_entry *fft_data, struct fft_measurements *fft_meas)
+{
+	const float LW_DR_correction_const = 4.48;																		// Magic constant from the LabView FFT core correcting only dynamic range
+	uint16_t i, j;
+	float biggest_spur = -300;
+	double RSS = 0.0, mean = 0.0;
+	
+	// Initializing pk_spurious variables
+	fft_meas->pk_spurious_noise = -200.0;
+	fft_meas->pk_spurious_freq = 0;
+	
+	for (i = 0; i < DC_BINS; i++)																						// Ignoring DC bins
+		fft_data->noise_bins[i] = 0.0;
+	for (i = DC_BINS; i < fft_data->fft_length; i++) {
+		// Ignoring spread near the fundamental
+		if ((i <= fft_meas->harmonics_freq[0] + FUND_BINS) && (i >= fft_meas->harmonics_freq[0] - FUND_BINS))			
+			fft_data->noise_bins[i] = 0.0;
+			
+		else if((i <= fft_meas->harmonics_freq[1] + HARM_BINS) && (i >= fft_meas->harmonics_freq[1] - HARM_BINS))		// Ignoring spread near harmonics
+			fft_data->noise_bins[i] = 0.0;
+		
+		else if((i <= fft_meas->harmonics_freq[2] + HARM_BINS) && (i >= fft_meas->harmonics_freq[2] - HARM_BINS))
+			fft_data->noise_bins[i] = 0.0;
+
+		else if((i <= fft_meas->harmonics_freq[3] + HARM_BINS) && (i >= fft_meas->harmonics_freq[3] - HARM_BINS))
+			fft_data->noise_bins[i] = 0.0;
+		
+		else if((i <= fft_meas->harmonics_freq[4] + HARM_BINS) && (i >= fft_meas->harmonics_freq[4] - HARM_BINS))
+			fft_data->noise_bins[i] = 0.0;
+
+		else if((i <= fft_meas->harmonics_freq[5] + HARM_BINS) && (i >= fft_meas->harmonics_freq[5] - HARM_BINS))
+			fft_data->noise_bins[i] = 0.0;
+
+		else {		
+			// Root Sum Square = RSS for noise calculations
+			fft_data->noise_bins[i] = fft_data->fft_magnitude_corrected[i];
+			RSS += pow(((double)(fft_data->fft_magnitude_corrected[i] / (2.0*SQRT_2))), 2.0);
+			
+			// Average bin noise
+			mean += fft_data->fft_magnitude_corrected[i];
+			
+			// Peak spurious amplitude
+			if(fft_data->fft_magnitude_corrected[i] > fft_meas->pk_spurious_noise) {
+				fft_meas->pk_spurious_noise = fft_data->fft_magnitude_corrected[i];
+				fft_meas->pk_spurious_freq = i;
+			}
+			
+		}	
+	}	
+	mean /= (double)(fft_data->fft_length);
+	
+	// RSS of FFT spectrum without DC, Fund. and Harmonics
+	RSS = sqrt(RSS);
+	RSS = RSS * 2.0 * SQRT_2;
+	
+	// Peak spurious amplitude = Highest amplitude excluding DC, the Fundamental and the Harmonics
+	fft_meas->pk_spurious_noise = 20.0 * log10f(1.0 / fft_meas->pk_spurious_noise);
+	
+	// Looking for the biggest spur among harmonics
+	for(i = 1 ; i < 6 ; i++) {
+		if (fft_meas->harmonics_mag_dbfs[i] > biggest_spur)
+			biggest_spur = fft_meas->harmonics_mag_dbfs[i]; 
+	}	
+	// Looking for the biggest spur among harmonics and pk_spurious_noise
+	if(biggest_spur > fft_meas->pk_spurious_noise)
+		biggest_spur = fft_meas->pk_spurious_noise;
+	
+	// Spurious Free Dynamic Range (SFDR) relative to the carrier = biggest spur - the fundamental [dBc]
+	fft_meas->SFDR_dbc = biggest_spur - fft_meas->harmonics_mag_dbfs[0];
+
+	// Spurious Free Dynamic Range (SFDR) relative to full scale = biggest spur - full scale [dBFS], where full scale is 0 dBFS
+	fft_meas->SFDR_dbfs = biggest_spur;
+	
+	// Average bin noise = Mean value of FFT spectrum excluding DC, the Fundamental and the Harmonics
+	fft_meas->average_bin_noise = (float)(20.0 * log10(mean));
+	
+	// DR = 1 / RSS of FFT spectrum without DC, Fund. and Harmonics + empirical constant from the LabVIEW FFT core
+	fft_meas->DR = (20.0 * log10f(1.0 / (float)(RSS))) + LW_DR_correction_const;
+	
+	// SNR = Power of the fundamental / RSS of FFT spectrum without DC, Fund. and Harmonics
+	fft_meas->SNR = 20.0 * log10f((fft_meas->harmonics_power[0]) / (RSS));
+	
+	// SINAD
+	fft_meas->SINAD = -10.0 * log10f(powf(10.0, (fabs(fft_meas->SNR))*(-1.0) / 10.0) + powf(10.0, fabs(fft_meas->THD)*(-1.0) / 10.0));
+	
+	// ENOB - Effective number of bits = (SINAD - 1.76 dB + level of the fundamental below full scale) / 6.02
+	fft_meas->ENOB = (fft_meas->SINAD - 1.76 + fabs(fft_meas->harmonics_mag_dbfs[0])) / 6.02; 
+}
+
+/**
+ * Convert a magnitude in dBFS to peak-to-peak volts
+ * @param vref - reference voltage in volts
+ * @param value - magnitude in dBFS
+ */
+float static dbfs_to_volts(float vref, float value)
+{
+	return ( 2 * vref * powf(10.0, value / 20.0) );
+}

diff -r b9debc14d077 -r 9dd7c64b4a64 CN0540_FFT/cn0540_adi_fft.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CN0540_FFT/cn0540_adi_fft.h	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,114 @@
+/******************************************************************************
+ *Copyright (c)2020 Analog Devices, Inc.  
+ *
+ * Licensed under the 2020-04-27-CN0540EC License(the "License");
+ * you may not use this file except in compliance with the License.
+ *
+ ****************************************************************************/
+
+#ifndef CN0540_ADI_FFT_H_
+#define CN0540_ADI_FFT_H_
+
+#include "stdint.h"
+#include "arm_math.h"
+#include "stdbool.h"
+
+/**
+ ******************************************************************************************************************************************************************
+ ******************************************************************************************************************************************************************
+ * 
+ *			MACROS AND CONSTANT DEFINITIONS
+ */
+
+#define N_BITS					24								// Define resolution of the ADC
+
+
+#define SQRT_2					sqrt(2)
+#define ADC_FULL_SCALE			(1 << N_BITS)					// Full scale of the ADC depends on the N_BITS macro, for 24-bit ADC, ADC_FULL_SCALE = 16777216
+#define ADC_ZERO_SCALE			(1 << (N_BITS - 1))				// Zero scale of the ADC depends on the N_BITS macro, for 24-bit ADC, ADC_ZERO_SCALE = 8388608
+
+#define DC_BINS					10								// Number of DC bins ignored in noise and other calculations
+#define FUND_BINS				10								// Power spread of the fundamental, 10 bins on either side of the fundamental
+#define HARM_BINS				3								// Power spread of a harmonic, 3 bins on either side of the harmonic
+
+
+/**
+ ******************************************************************************************************************************************************************
+ ******************************************************************************************************************************************************************
+ * 
+ *			TYPES DECLARATIONS
+ */
+
+enum fft_windowing_type
+{
+	BLACKMAN_HARRIS_7TERM,
+	RECTANGULAR
+};
+
+struct fft_entry									// Structure carrying all the data the FFT works with
+{
+	float	vref;
+	uint16_t	mclk;
+	float	bin_width;
+	uint32_t	sample_rate;						// Sample rate based on MCLK, MCLK_DIV and Digital filter settings
+	uint16_t	fft_length;							// Length of fft = sample_count / 2
+	int32_t		codes[4096];						// Codes in range 0 +/- ZERO SCALE
+	int32_t		zero_scale_codes[4096];				// Codes shifted up by ZERO SCALE - range ZERO SCALE +/- ZERO SCALE
+	float	voltage[4096];						// Voltage before windowing
+	float	fft_magnitude[2048];				// Maximum length of FFT magnitude supported by the on-chip DSP = 4096 samples
+	float	fft_magnitude_corrected[2048];		// Magnitude with windowing correction
+	float	fft_dB[2048];						// dB for plot
+	float	fft_input[8192];					// Maximum length of FFT input array supported by the on-chip DSP = 4096 Real + 4096 Imaginary samples
+	float	noise_bins[2048];					// FFT bins excluding DC, fundamental and Harmonics
+	enum fft_windowing_type window;					// Window type
+	bool		fft_done;
+};
+
+struct fft_measurements								// Structure carrying all the FFT measurements
+{
+	float	harmonics_power[6]; 				// Harmonics, including their power leakage
+	float	harmonics_mag_dbfs[6]; 				// Harmonic magnitudes for THD
+	uint16_t	harmonics_freq[6]; 					// Harmonic frequencies for THD
+	float	fundamental;  						// Fundamental in volts
+	float	pk_spurious_noise; 					// Peak spurious noise (amplitude)
+	uint16_t	pk_spurious_freq; 					// Peak Spurious Frequency
+	float	THD; 								// Total Harmonic Distortion
+	float	SNR; 								// Signal to Noise Ratio
+	float	DR; 								// Dynamic Range
+	float	SINAD; 								// Signal to Noise And Distortion ratio
+	float	SFDR_dbc; 							// Spurious Free Dynamic Range, dBc
+	float	SFDR_dbfs; 							// Spurious Free Dynamic Range, dBFS
+	float	ENOB; 								// ENOB - Effective Number Of Bits
+	float	RMS_noise; 							// same as transition noise
+	float	average_bin_noise;
+	float	max_amplitude;
+	float	min_amplitude;
+	float	pk_pk_amplitude;
+	float	DC;
+	float	transition_noise;
+	uint32_t	max_amplitude_LSB;
+	uint32_t	min_amplitude_LSB;
+	uint32_t	pk_pk_amplitude_LSB;
+	int32_t		DC_LSB;
+	float	transition_noise_LSB;
+};
+
+/**
+ ******************************************************************************************************************************************************************
+ ******************************************************************************************************************************************************************
+ * 
+ *			FUNCTION DECLARATIONS
+ */
+
+int32_t FFT_init_params(struct fft_entry **fft_entry_init, struct fft_measurements **fft_meas);
+void perform_FFT(uint32_t *data, struct fft_entry *fft_data, struct fft_measurements *fft_meas, uint32_t sample_rate);
+void FFT_init(uint16_t sample_count, struct fft_entry *fft_data);
+void update_FFT_enviroment(uint16_t reference, uint16_t master_clock, uint16_t sampling_rate, struct fft_entry *fft_data);
+void static FFT_maginutde_do_dB(struct fft_entry *fft_data, double sum);
+void static FFT_calculate_THD(struct fft_entry *fft_data, struct fft_measurements *fft_meas);
+void static FFT_calculate_noise(struct fft_entry *fft_data, struct fft_measurements *fft_meas);
+float static dbfs_to_volts(float vref, float value);
+void static FFT_windowing(struct fft_entry *fft_data, double *sum);
+void static FFT_waveform_stat(struct fft_entry *fft_data, struct fft_measurements *fft_meas);
+#endif // CN0540_ADI_FFT_H_
+

diff -r b9debc14d077 -r 9dd7c64b4a64 CN0540_FFT/cn0540_windowing.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CN0540_FFT/cn0540_windowing.h	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,4123 @@
+/******************************************************************************
+ *Copyright (c)2020 Analog Devices, Inc.  
+ *
+ * Licensed under the 2020-04-27-CN0540EC License(the "License");
+ * you may not use this file except in compliance with the License.
+ *
+ ****************************************************************************/
+ 
+#ifndef _WINDOWING_H_
+#define _WINDOWING_H_
+
+#include <arm_math.h>
+
+//Windowing function coefficients for 7-term Blackman-Harris window
+const double seven_term_BH_coefs[7] = { 0.27105140069342, -0.43329793923448, 0.21812299954311, -0.06592544638803, 0.01081174209837, -0.00077658482522, 0.00001388721735 };
+
+//Sum of all 4096 coefficients, used as a gain factor after applying the windowing and the FFT
+const double Seven_Term_Blackman_Harris_4096_sum = 1109.9554550113426;
+
+const float32_t Seven_Term_Blackman_Harris_4096[4096] = 
+{
+	0.0000000591045186,
+	0.0000000591772285,
+	0.0000000593954077,
+	0.0000000597591949,
+	0.0000000602688317,
+	0.0000000609246413,
+	0.0000000617270643,
+	0.0000000626766195,
+	0.0000000637739319,
+	0.0000000650196981,
+	0.0000000664147422,
+	0.0000000679599736,
+	0.0000000696563802,
+	0.0000000715050774,
+	0.0000000735072447,
+	0.0000000756641896,
+	0.0000000779772833,
+	0.0000000804480322,
+	0.0000000830780138,
+	0.0000000858689049,
+	0.0000000888224960,
+	0.0000000919406702,
+	0.0000000952253956,
+	0.0000000986787683,
+	0.0000001023029625,
+	0.0000001061002592,
+	0.0000001100730458,
+	0.0000001142238020,
+	0.0000001185551284,
+	0.0000001230697109,
+	0.0000001277703348,
+	0.0000001326599204,
+	0.0000001377414662,
+	0.0000001430180845,
+	0.0000001484929868,
+	0.0000001541695127,
+	0.0000001600510871,
+	0.0000001661412625,
+	0.0000001724436771,
+	0.0000001789620967,
+	0.0000001857004150,
+	0.0000001926625970,
+	0.0000001998527637,
+	0.0000002072751215,
+	0.0000002149339906,
+	0.0000002228338332,
+	0.0000002309792109,
+	0.0000002393747991,
+	0.0000002480254295,
+	0.0000002569359765,
+	0.0000002661114991,
+	0.0000002755571700,
+	0.0000002852782757,
+	0.0000002952802447,
+	0.0000003055685909,
+	0.0000003161490270,
+	0.0000003270272941,
+	0.0000003382093325,
+	0.0000003497012528,
+	0.0000003615091657,
+	0.0000003736394660,
+	0.0000003860985487,
+	0.0000003988930644,
+	0.0000004120297206,
+	0.0000004255153669,
+	0.0000004393570805,
+	0.0000004535619667,
+	0.0000004681373582,
+	0.0000004830906732,
+	0.0000004984295288,
+	0.0000005141616271,
+	0.0000005302949262,
+	0.0000005468373843,
+	0.0000005637973004,
+	0.0000005811829737,
+	0.0000005990029308,
+	0.0000006172658118,
+	0.0000006359804274,
+	0.0000006551558727,
+	0.0000006748012424,
+	0.0000006949258022,
+	0.0000007155391586,
+	0.0000007366509180,
+	0.0000007582709145,
+	0.0000007804091524,
+	0.0000008030758067,
+	0.0000008262812798,
+	0.0000008500360877,
+	0.0000008743509738,
+	0.0000008992367952,
+	0.0000009247047501,
+	0.0000009507660366,
+	0.0000009774321370,
+	0.0000010047147043,
+	0.0000010326257325,
+	0.0000010611771586,
+	0.0000010903812608,
+	0.0000011202505448,
+	0.0000011507976296,
+	0.0000011820354757,
+	0.0000012139770433,
+	0.0000012466357475,
+	0.0000012800250033,
+	0.0000013141585669,
+	0.0000013490505353,
+	0.0000013847148921,
+	0.0000014211661892,
+	0.0000014584188648,
+	0.0000014964879256,
+	0.0000015353883782,
+	0.0000015751356841,
+	0.0000016157451910,
+	0.0000016572329287,
+	0.0000016996148133,
+	0.0000017429071022,
+	0.0000017871265072,
+	0.0000018322896267,
+	0.0000018784136273,
+	0.0000019255157895,
+	0.0000019736139620,
+	0.0000020227255391,
+	0.0000020728687105,
+	0.0000021240621209,
+	0.0000021763244149,
+	0.0000022296742372,
+	0.0000022841309146,
+	0.0000023397142286,
+	0.0000023964435059,
+	0.0000024543392101,
+	0.0000025134213502,
+	0.0000025737108444,
+	0.0000026352286113,
+	0.0000026979960239,
+	0.0000027620346827,
+	0.0000028273661883,
+	0.0000028940128232,
+	0.0000029619975521,
+	0.0000030313426578,
+	0.0000031020715596,
+	0.0000031742079045,
+	0.0000032477753393,
+	0.0000033227979657,
+	0.0000033993005673,
+	0.0000034773079278,
+	0.0000035568450585,
+	0.0000036379378798,
+	0.0000037206118577,
+	0.0000038048938222,
+	0.0000038908101487,
+	0.0000039783876673,
+	0.0000040676541175,
+	0.0000041586367843,
+	0.0000042513643166,
+	0.0000043458653636,
+	0.0000044421681196,
+	0.0000045403025979,
+	0.0000046402983571,
+	0.0000047421854106,
+	0.0000048459946811,
+	0.0000049517561820,
+	0.0000050595022003,
+	0.0000051692641136,
+	0.0000052810742091,
+	0.0000053949652283,
+	0.0000055109703681,
+	0.0000056291228248,
+	0.0000057494567045,
+	0.0000058720061133,
+	0.0000059968065216,
+	0.0000061238929447,
+	0.0000062533008531,
+	0.0000063850670813,
+	0.0000065192284637,
+	0.0000066558213803,
+	0.0000067948840297,
+	0.0000069364550654,
+	0.0000070805726864,
+	0.0000072272755460,
+	0.0000073766045716,
+	0.0000075285988714,
+	0.0000076833002822,
+	0.0000078407483670,
+	0.0000080009867816,
+	0.0000081640564531,
+	0.0000083300010374,
+	0.0000084988632807,
+	0.0000086706877482,
+	0.0000088455190053,
+	0.0000090234016170,
+	0.0000092043810582,
+	0.0000093885037131,
+	0.0000095758168754,
+	0.0000097663669294,
+	0.0000099602029877,
+	0.0000101573732536,
+	0.0000103579259303,
+	0.0000105619110400,
+	0.0000107693804239,
+	0.0000109803841042,
+	0.0000111949730126,
+	0.0000114131998998,
+	0.0000116351184261,
+	0.0000118607813420,
+	0.0000120902423077,
+	0.0000123235577121,
+	0.0000125607821246,
+	0.0000128019719341,
+	0.0000130471835291,
+	0.0000132964751174,
+	0.0000135499049065,
+	0.0000138075311042,
+	0.0000140694146467,
+	0.0000143356155604,
+	0.0000146061938722,
+	0.0000148812123371,
+	0.0000151607337102,
+	0.0000154448207468,
+	0.0000157335380209,
+	0.0000160269482876,
+	0.0000163251206686,
+	0.0000166281188285,
+	0.0000169360118889,
+	0.0000172488653334,
+	0.0000175667482836,
+	0.0000178897334990,
+	0.0000182178882824,
+	0.0000185512835742,
+	0.0000188899921341,
+	0.0000192340867216,
+	0.0000195836419152,
+	0.0000199387304747,
+	0.0000202994287974,
+	0.0000206658114621,
+	0.0000210379585042,
+	0.0000214159463212,
+	0.0000217998531298,
+	0.0000221897589654,
+	0.0000225857456826,
+	0.0000229878914979,
+	0.0000233962837228,
+	0.0000238110023929,
+	0.0000242321330006,
+	0.0000246597592195,
+	0.0000250939701800,
+	0.0000255348495557,
+	0.0000259824901150,
+	0.0000264369755314,
+	0.0000268984003924,
+	0.0000273668538284,
+	0.0000278424286080,
+	0.0000283252174995,
+	0.0000288153150905,
+	0.0000293128177873,
+	0.0000298178183584,
+	0.0000303304168483,
+	0.0000308507114823,
+	0.0000313788004860,
+	0.0000319147875416,
+	0.0000324587708747,
+	0.0000330108523485,
+	0.0000335711411026,
+	0.0000341397353623,
+	0.0000347167479049,
+	0.0000353022805939,
+	0.0000358964462066,
+	0.0000364993502444,
+	0.0000371111091226,
+	0.0000377318283427,
+	0.0000383616243198,
+	0.0000390006098314,
+	0.0000396489012928,
+	0.0000403066151193,
+	0.0000409738713643,
+	0.0000416507864429,
+	0.0000423374804086,
+	0.0000430340769526,
+	0.0000437407034042,
+	0.0000444574725407,
+	0.0000451845226053,
+	0.0000459219700133,
+	0.0000466699493700,
+	0.0000474285880045,
+	0.0000481980205222,
+	0.0000489783706143,
+	0.0000497697801620,
+	0.0000505723801325,
+	0.0000513863087690,
+	0.0000522117043147,
+	0.0000530487013748,
+	0.0000538974454685,
+	0.0000547580784769,
+	0.0000556307386432,
+	0.0000565155751246,
+	0.0000574127370783,
+	0.0000583223663853,
+	0.0000592446158407,
+	0.0000601796346018,
+	0.0000611275754636,
+	0.0000620885912213,
+	0.0000630628419458,
+	0.0000640504804323,
+	0.0000650516667520,
+	0.0000660665609757,
+	0.0000670953231747,
+	0.0000681381206959,
+	0.0000691951136105,
+	0.0000702664692653,
+	0.0000713523622835,
+	0.0000724529527361,
+	0.0000735684152460,
+	0.0000746989317122,
+	0.0000758446622058,
+	0.0000770057959016,
+	0.0000781825001468,
+	0.0000793749641161,
+	0.0000805833697086,
+	0.0000818078915472,
+	0.0000830487260828,
+	0.0000843060552143,
+	0.0000855800608406,
+	0.0000868709466886,
+	0.0000881788946572,
+	0.0000895041084732,
+	0.0000908467700356,
+	0.0000922070976230,
+	0.0000935852731345,
+	0.0000949815075728,
+	0.0000963960046647,
+	0.0000978289681370,
+	0.0000992806017166,
+	0.0001007511236821,
+	0.0001022407377604,
+	0.0001037496549543,
+	0.0001052781080944,
+	0.0001068262936315,
+	0.0001083944443963,
+	0.0001099827786675,
+	0.0001115915219998,
+	0.0001132208926720,
+	0.0001148711307906,
+	0.0001165424473584,
+	0.0001182350970339,
+	0.0001199492980959,
+	0.0001216852906509,
+	0.0001234433148056,
+	0.0001252236252185,
+	0.0001270264328923,
+	0.0001288519997615,
+	0.0001307005877607,
+	0.0001325724151684,
+	0.0001344677730231,
+	0.0001363868796034,
+	0.0001383299968438,
+	0.0001402974012308,
+	0.0001422893546987,
+	0.0001443060900783,
+	0.0001463479129598,
+	0.0001484150561737,
+	0.0001505078107584,
+	0.0001526264532004,
+	0.0001547712308820,
+	0.0001569424493937,
+	0.0001591403852217,
+	0.0001613653148524,
+	0.0001636175293243,
+	0.0001658972905716,
+	0.0001682049332885,
+	0.0001705407048576,
+	0.0001729049399728,
+	0.0001752979151206,
+	0.0001777199213393,
+	0.0001801712933229,
+	0.0001826523075579,
+	0.0001851632841863,
+	0.0001877045287983,
+	0.0001902763615362,
+	0.0001928790879901,
+	0.0001955130574061,
+	0.0001981785462704,
+	0.0002008759183809,
+	0.0002036054647760,
+	0.0002063675492536,
+	0.0002091624919558,
+	0.0002119906275766,
+	0.0002148522908101,
+	0.0002177478309022,
+	0.0002206775825471,
+	0.0002236419095425,
+	0.0002266411465826,
+	0.0002296756429132,
+	0.0002327457768843,
+	0.0002358518831898,
+	0.0002389943401795,
+	0.0002421734970994,
+	0.0002453897322994,
+	0.0002486434241291,
+	0.0002519349509384,
+	0.0002552646619733,
+	0.0002586329355836,
+	0.0002620402083267,
+	0.0002654868003447,
+	0.0002689731481951,
+	0.0002724996302277,
+	0.0002760666538961,
+	0.0002796745975502,
+	0.0002833238686435,
+	0.0002870148746297,
+	0.0002907480229624,
+	0.0002945237501990,
+	0.0002983424346894,
+	0.0003022045129910,
+	0.0003061104216613,
+	0.0003100605681539,
+	0.0003140553599223,
+	0.0003180952917319,
+	0.0003221807419322,
+	0.0003263122052886,
+	0.0003304900601506,
+	0.0003347148012836,
+	0.0003389868652448,
+	0.0003433066885918,
+	0.0003476747660898,
+	0.0003520915051922,
+	0.0003565574297681,
+	0.0003610729763750,
+	0.0003656386106741,
+	0.0003702548274305,
+	0.0003749220923055,
+	0.0003796409000643,
+	0.0003844117163680,
+	0.0003892350650858,
+	0.0003941114118788,
+	0.0003990412515122,
+	0.0004040251078550,
+	0.0004090634756722,
+	0.0004141568788327,
+	0.0004193058121018,
+	0.0004245107993484,
+	0.0004297723644413,
+	0.0004350910312496,
+	0.0004404673236422,
+	0.0004459017945919,
+	0.0004513949388638,
+	0.0004569473385345,
+	0.0004625595465768,
+	0.0004682320577558,
+	0.0004739654832520,
+	0.0004797603760380,
+	0.0004856172599830,
+	0.0004915367462672,
+	0.0004975193296559,
+	0.0005035657086410,
+	0.0005096763488837,
+	0.0005158518324606,
+	0.0005220928578638,
+	0.0005283999489620,
+	0.0005347736878321,
+	0.0005412146565504,
+	0.0005477235536091,
+	0.0005543009028770,
+	0.0005609473446384,
+	0.0005676634609699,
+	0.0005744499503635,
+	0.0005813073948957,
+	0.0005882364348508,
+	0.0005952377105132,
+	0.0006023118621670,
+	0.0006094595300965,
+	0.0006166813545860,
+	0.0006239779759198,
+	0.0006313501507975,
+	0.0006387984030880,
+	0.0006463235476986,
+	0.0006539261084981,
+	0.0006616069003940,
+	0.0006693665054627,
+	0.0006772056221962,
+	0.0006851250655018,
+	0.0006931253592484,
+	0.0007012073183432,
+	0.0007093716412783,
+	0.0007176190265454,
+	0.0007259501726367,
+	0.0007343657780439,
+	0.0007428666576743,
+	0.0007514535100199,
+	0.0007601270335726,
+	0.0007688879850321,
+	0.0007777371210977,
+	0.0007866752566770,
+	0.0007957030902617,
+	0.0008048213203438,
+	0.0008140308200382,
+	0.0008233323460445,
+	0.0008327266550623,
+	0.0008422145619988,
+	0.0008517968235537,
+	0.0008614742546342,
+	0.0008712476119399,
+	0.0008811177685857,
+	0.0008910854812711,
+	0.0009011516231112,
+	0.0009113169508055,
+	0.0009215823374689,
+	0.0009319486562163,
+	0.0009424166637473,
+	0.0009529872331768,
+	0.0009636612376198,
+	0.0009744394919835,
+	0.0009853228693828,
+	0.0009963123593479,
+	0.0010074086021632,
+	0.0010186126455665,
+	0.0010299254208803,
+	0.0010413476265967,
+	0.0010528803104535,
+	0.0010645242873579,
+	0.0010762804886326,
+	0.0010881498456001,
+	0.0011001334059983,
+	0.0011122318683192,
+	0.0011244462803006,
+	0.0011367775732651,
+	0.0011492265621200,
+	0.0011617945274338,
+	0.0011744820512831,
+	0.0011872902978212,
+	0.0012002201983705,
+	0.0012132728006691,
+	0.0012264489196241,
+	0.0012397497193888,
+	0.0012531761312857,
+	0.0012667289702222,
+	0.0012804095167667,
+	0.0012942187022418,
+	0.0013081574579701,
+	0.0013222268316895,
+	0.0013364279875532,
+	0.0013507617404684,
+	0.0013652293710038,
+	0.0013798316940665,
+	0.0013945699902251,
+	0.0014094451908022,
+	0.0014244584599510,
+	0.0014396107289940,
+	0.0014549031620845,
+	0.0014703368069604,
+	0.0014859129441902,
+	0.0015016323886812,
+	0.0015174965374172,
+	0.0015335063217208,
+	0.0015496629057452,
+	0.0015659673372284,
+	0.0015824208967388,
+	0.0015990247484297,
+	0.0016157799400389,
+	0.0016326875193045,
+	0.0016497489996254,
+	0.0016669651959091,
+	0.0016843375051394,
+	0.0017018669750541,
+	0.0017195548862219,
+	0.0017374025192112,
+	0.0017554108053446,
+	0.0017735812580213,
+	0.0017919149249792,
+	0.0018104130867869,
+	0.0018290770240128,
+	0.0018479077843949,
+	0.0018669068813324,
+	0.0018860754789785,
+	0.0019054147414863,
+	0.0019249260658398,
+	0.0019446106161922,
+	0.0019644696731120,
+	0.0019845047499985,
+	0.0020047170110047,
+	0.0020251076202840,
+	0.0020456782076508,
+	0.0020664297044277,
+	0.0020873637404293,
+	0.0021084817126393,
+	0.0021297845523804,
+	0.0021512741222978,
+	0.0021729513537139,
+	0.0021948178764433,
+	0.0022168750874698,
+	0.0022391241509467,
+	0.0022615666966885,
+	0.0022842038888484,
+	0.0023070373572409,
+	0.0023300682660192,
+	0.0023532982449979,
+	0.0023767289239913,
+	0.0024003612343222,
+	0.0024241970386356,
+	0.0024482375010848,
+	0.0024724842514843,
+	0.0024969386868179,
+	0.0025216024369001,
+	0.0025464768987149,
+	0.0025715634692460,
+	0.0025968637783080,
+	0.0026223792228848,
+	0.0026481114327908,
+	0.0026740620378405,
+	0.0027002322021872,
+	0.0027266237884760,
+	0.0027532381936908,
+	0.0027800772804767,
+	0.0028071419801563,
+	0.0028344346210361,
+	0.0028619563672692,
+	0.0028897086158395,
+	0.0029176934622228,
+	0.0029459123034030,
+	0.0029743667691946,
+	0.0030030582565814,
+	0.0030319888610393,
+	0.0030611597467214,
+	0.0030905727762729,
+	0.0031202298123389,
+	0.0031501322519034,
+	0.0031802819576114,
+	0.0032106803264469,
+	0.0032413294538856,
+	0.0032722307369113,
+	0.0033033858053386,
+	0.0033347967546433,
+	0.0033664652146399,
+	0.0033983925823122,
+	0.0034305811859667,
+	0.0034630321897566,
+	0.0034957476891577,
+	0.0035287293139845,
+	0.0035619791597128,
+	0.0035954986233264,
+	0.0036292895674706,
+	0.0036633540876210,
+	0.0036976938135922,
+	0.0037323106080294,
+	0.0037672061007470,
+	0.0038023823872209,
+	0.0038378413300961,
+	0.0038735845591873,
+	0.0039096144028008,
+	0.0039459322579205,
+	0.0039825402200222,
+	0.0040194396860898,
+	0.0040566339157522,
+	0.0040941233746707,
+	0.0041319108568132,
+	0.0041699977591634,
+	0.0042083864100277,
+	0.0042470786720514,
+	0.0042860764078796,
+	0.0043253819458187,
+	0.0043649966828525,
+	0.0044049229472876,
+	0.0044451630674303,
+	0.0044857184402645,
+	0.0045265913940966,
+	0.0045677837915719,
+	0.0046092974953353,
+	0.0046511352993548,
+	0.0046932990662754,
+	0.0047357901930809,
+	0.0047786110080779,
+	0.0048217643052340,
+	0.0048652514815331,
+	0.0049090743996203,
+	0.0049532363191247,
+	0.0049977381713688,
+	0.0050425822846591,
+	0.0050877714529634,
+	0.0051333075389266,
+	0.0051791924051940,
+	0.0052254283800721,
+	0.0052720177918673,
+	0.0053189629688859,
+	0.0053662657737732,
+	0.0054139285348356,
+	0.0054619535803795,
+	0.0055103427730501,
+	0.0055590989068151,
+	0.0056082233786583,
+	0.0056577194482088,
+	0.0057075889781117,
+	0.0057578342966735,
+	0.0058084577322006,
+	0.0058594611473382,
+	0.0059108473360538,
+	0.0059626186266541,
+	0.0060147773474455,
+	0.0060673253610730,
+	0.0061202654615045,
+	0.0061735995113850,
+	0.0062273307703435,
+	0.0062814611010253,
+	0.0063359928317368,
+	0.0063909282907844,
+	0.0064462702721357,
+	0.0065020206384361,
+	0.0065581826493144,
+	0.0066147577017546,
+	0.0066717490553856,
+	0.0067291590385139,
+	0.0067869899794459,
+	0.0068452442064881,
+	0.0069039245136082,
+	0.0069630332291126,
+	0.0070225726813078,
+	0.0070825461298227,
+	0.0071429549716413,
+	0.0072038029320538,
+	0.0072650918737054,
+	0.0073268241249025,
+	0.0073890029452741,
+	0.0074516306631267,
+	0.0075147096067667,
+	0.0075782425701618,
+	0.0076422323472798,
+	0.0077066812664270,
+	0.0077715921215713,
+	0.0078369677066803,
+	0.0079028103500605,
+	0.0079691223800182,
+	0.0080359075218439,
+	0.0081031676381826,
+	0.0081709064543247,
+	0.0082391249015927,
+	0.0083078276365995,
+	0.0083770155906677,
+	0.0084466924890876,
+	0.0085168611258268,
+	0.0085875242948532,
+	0.0086586847901344,
+	0.0087303444743156,
+	0.0088025070726871,
+	0.0088751753792167,
+	0.0089483512565494,
+	0.0090220384299755,
+	0.0090962396934628,
+	0.0091709578409791,
+	0.0092461956664920,
+	0.0093219559639692,
+	0.0093982415273786,
+	0.0094750560820103,
+	0.0095524005591869,
+	0.0096302796155214,
+	0.0097086951136589,
+	0.0097876517102122,
+	0.0098671494051814,
+	0.0099471937865019,
+	0.0100277876481414,
+	0.0101089319214225,
+	0.0101906312629580,
+	0.0102728884667158,
+	0.0103557063266635,
+	0.0104390867054462,
+	0.0105230351909995,
+	0.0106075527146459,
+	0.0106926430016756,
+	0.0107783088460565,
+	0.0108645539730787,
+	0.0109513811767101,
+	0.0110387923195958,
+	0.0111267920583487,
+	0.0112153831869364,
+	0.0113045684993267,
+	0.0113943517208099,
+	0.0114847347140312,
+	0.0115757212042809,
+	0.0116673149168491,
+	0.0117595186457038,
+	0.0118523361161351,
+	0.0119457691907883,
+	0.0120398215949535,
+	0.0121344970539212,
+	0.0122297983616590,
+	0.0123257292434573,
+	0.0124222924932837,
+	0.0125194909051061,
+	0.0126173282042146,
+	0.0127158081158996,
+	0.0128149334341288,
+	0.0129147069528699,
+	0.0130151333287358,
+	0.0131162144243717,
+	0.0132179548963904,
+	0.0133203566074371,
+	0.0134234242141247,
+	0.0135271595790982,
+	0.0136315673589706,
+	0.0137366512790322,
+	0.0138424132019281,
+	0.0139488577842712,
+	0.0140559868887067,
+	0.0141638061031699,
+	0.0142723172903061,
+	0.0143815241754055,
+	0.0144914304837584,
+	0.0146020390093327,
+	0.0147133544087410,
+	0.0148253785446286,
+	0.0149381160736084,
+	0.0150515707209706,
+	0.0151657452806830,
+	0.0152806425467134,
+	0.0153962681069970,
+	0.0155126238241792,
+	0.0156297124922276,
+	0.0157475396990776,
+	0.0158661101013422,
+	0.0159854236990213,
+	0.0161054842174053,
+	0.0162262991070747,
+	0.0163478683680296,
+	0.0164701975882053,
+	0.0165932886302471,
+	0.0167171452194452,
+	0.0168417748063803,
+	0.0169671755284071,
+	0.0170933548361063,
+	0.0172203145921230,
+	0.0173480585217476,
+	0.0174765922129154,
+	0.0176059175282717,
+	0.0177360381931067,
+	0.0178669579327106,
+	0.0179986823350191,
+	0.0181312132626772,
+	0.0182645544409752,
+	0.0183987095952034,
+	0.0185336824506521,
+	0.0186694785952568,
+	0.0188060998916626,
+	0.0189435500651598,
+	0.0190818347036839,
+	0.0192209556698799,
+	0.0193609185516834,
+	0.0195017252117395,
+	0.0196433793753386,
+	0.0197858866304159,
+	0.0199292507022619,
+	0.0200734753161669,
+	0.0202185641974211,
+	0.0203645192086697,
+	0.0205113459378481,
+	0.0206590499728918,
+	0.0208076313138008,
+	0.0209570974111557,
+	0.0211074519902468,
+	0.0212586969137192,
+	0.0214108359068632,
+	0.0215638745576143,
+	0.0217178165912628,
+	0.0218726657330990,
+	0.0220284257084131,
+	0.0221851021051407,
+	0.0223426949232817,
+	0.0225012134760618,
+	0.0226606559008360,
+	0.0228210315108299,
+	0.0229823421686888,
+	0.0231445915997028,
+	0.0233077835291624,
+	0.0234719235450029,
+	0.0236370135098696,
+	0.0238030590116978,
+	0.0239700656384230,
+	0.0241380333900452,
+	0.0243069697171450,
+	0.0244768783450127,
+	0.0246477611362934,
+	0.0248196236789227,
+	0.0249924715608358,
+	0.0251663066446781,
+	0.0253411345183849,
+	0.0255169589072466,
+	0.0256937816739082,
+	0.0258716102689505,
+	0.0260504484176636,
+	0.0262302998453379,
+	0.0264111664146185,
+	0.0265930555760860,
+	0.0267759710550308,
+	0.0269599147140980,
+	0.0271448921412230,
+	0.0273309089243412,
+	0.0275179669260979,
+	0.0277060717344284,
+	0.0278952289372683,
+	0.0280854385346174,
+	0.0282767098397017,
+	0.0284690428525209,
+	0.0286624450236559,
+	0.0288569182157516,
+	0.0290524680167437,
+	0.0292491000145674,
+	0.0294468160718679,
+	0.0296456199139357,
+	0.0298455189913511,
+	0.0300465151667595,
+	0.0302486140280962,
+	0.0304518174380064,
+	0.0306561347097158,
+	0.0308615639805794,
+	0.0310681145638227,
+	0.0312757901847363,
+	0.0314845927059650,
+	0.0316945277154446,
+	0.0319055989384651,
+	0.0321178101003170,
+	0.0323311686515808,
+	0.0325456783175468,
+	0.0327613428235054,
+	0.0329781621694565,
+	0.0331961475312710,
+	0.0334152989089489,
+	0.0336356237530708,
+	0.0338571257889271,
+	0.0340798087418079,
+	0.0343036763370037,
+	0.0345287322998047,
+	0.0347549840807915,
+	0.0349824316799641,
+	0.0352110862731934,
+	0.0354409441351891,
+	0.0356720164418221,
+	0.0359043031930923,
+	0.0361378118395805,
+	0.0363725461065769,
+	0.0366085134446621,
+	0.0368457101285458,
+	0.0370841473340988,
+	0.0373238250613213,
+	0.0375647544860840,
+	0.0378069356083870,
+	0.0380503721535206,
+	0.0382950678467751,
+	0.0385410338640213,
+	0.0387882664799690,
+	0.0390367731451988,
+	0.0392865613102913,
+	0.0395376347005367,
+	0.0397899933159351,
+	0.0400436446070671,
+	0.0402985960245132,
+	0.0405548438429832,
+	0.0408124029636383,
+	0.0410712733864784,
+	0.0413314551115036,
+	0.0415929593145847,
+	0.0418557897210121,
+	0.0421199463307858,
+	0.0423854365944862,
+	0.0426522679626942,
+	0.0429204404354095,
+	0.0431899577379227,
+	0.0434608310461044,
+	0.0437330566346645,
+	0.0440066456794739,
+	0.0442816019058228,
+	0.0445579253137112,
+	0.0448356270790100,
+	0.0451147034764290,
+	0.0453951656818390,
+	0.0456770174205303,
+	0.0459602624177933,
+	0.0462449043989182,
+	0.0465309470891953,
+	0.0468183979392052,
+	0.0471072606742382,
+	0.0473975390195847,
+	0.0476892367005348,
+	0.0479823574423790,
+	0.0482769124209881,
+	0.0485728979110718,
+	0.0488703250885010,
+	0.0491691976785660,
+	0.0494695156812668,
+	0.0497712828218937,
+	0.0500745140016079,
+	0.0503792017698288,
+	0.0506853573024273,
+	0.0509929843246937,
+	0.0513020865619183,
+	0.0516126714646816,
+	0.0519247390329838,
+	0.0522382967174053,
+	0.0525533482432365,
+	0.0528698973357677,
+	0.0531879514455795,
+	0.0535075105726719,
+	0.0538285858929157,
+	0.0541511736810207,
+	0.0544752851128578,
+	0.0548009239137173,
+	0.0551280938088894,
+	0.0554567947983742,
+	0.0557870380580425,
+	0.0561188273131847,
+	0.0564521625638008,
+	0.0567870549857616,
+	0.0571235045790672,
+	0.0574615150690079,
+	0.0578010939061642,
+	0.0581422448158264,
+	0.0584849715232849,
+	0.0588292814791203,
+	0.0591751746833324,
+	0.0595226585865021,
+	0.0598717369139194,
+	0.0602224133908749,
+	0.0605746954679489,
+	0.0609285868704319,
+	0.0612840875983238,
+	0.0616412088274956,
+	0.0619999505579472,
+	0.0623603202402592,
+	0.0627223178744316,
+	0.0630859583616257,
+	0.0634512305259705,
+	0.0638181492686272,
+	0.0641867220401764,
+	0.0645569413900375,
+	0.0649288222193718,
+	0.0653023645281792,
+	0.0656775757670403,
+	0.0660544633865356,
+	0.0664330199360847,
+	0.0668132603168488,
+	0.0671951845288277,
+	0.0675787925720215,
+	0.0679640993475914,
+	0.0683511123061180,
+	0.0687398165464401,
+	0.0691302344202995,
+	0.0695223584771156,
+	0.0699162036180496,
+	0.0703117698431015,
+	0.0707090646028519,
+	0.0711080804467201,
+	0.0715088322758675,
+	0.0719113275408745,
+	0.0723155662417412,
+	0.0727215483784676,
+	0.0731292814016342,
+	0.0735387727618217,
+	0.0739500224590302,
+	0.0743630379438400,
+	0.0747778192162514,
+	0.0751943737268448,
+	0.0756127089262009,
+	0.0760328248143196,
+	0.0764547288417816,
+	0.0768784284591675,
+	0.0773039162158966,
+	0.0777312070131302,
+	0.0781603008508682,
+	0.0785911977291107,
+	0.0790239125490189,
+	0.0794584378600121,
+	0.0798947885632515,
+	0.0803329646587372,
+	0.0807729735970497,
+	0.0812148079276085,
+	0.0816584825515747,
+	0.0821040049195290,
+	0.0825513675808907,
+	0.0830005854368210,
+	0.0834516510367393,
+	0.0839045792818069,
+	0.0843593701720238,
+	0.0848160311579704,
+	0.0852745622396469,
+	0.0857349634170532,
+	0.0861972495913506,
+	0.0866614207625389,
+	0.0871274769306183,
+	0.0875954255461693,
+	0.0880652666091919,
+	0.0885370150208473,
+	0.0890106633305550,
+	0.0894862189888954,
+	0.0899636894464493,
+	0.0904430747032166,
+	0.0909243822097778,
+	0.0914076119661331,
+	0.0918927714228630,
+	0.0923798605799675,
+	0.0928688868880272,
+	0.0933598503470421,
+	0.0938527658581734,
+	0.0943476259708405,
+	0.0948444381356239,
+	0.0953432023525238,
+	0.0958439335227013,
+	0.0963466241955757,
+	0.0968512818217278,
+	0.0973579138517380,
+	0.0978665202856064,
+	0.0983771011233330,
+	0.0988896712660789,
+	0.0994042232632637,
+	0.0999207720160484,
+	0.1004393100738525,
+	0.1009598448872566,
+	0.1014823839068413,
+	0.1020069271326065,
+	0.1025334820151329,
+	0.1030620485544205,
+	0.1035926342010498,
+	0.1041252389550209,
+	0.1046598702669144,
+	0.1051965206861496,
+	0.1057352125644684,
+	0.1062759310007095,
+	0.1068186908960342,
+	0.1073634922504425,
+	0.1079103425145149,
+	0.1084592342376709,
+	0.1090101897716522,
+	0.1095631942152977,
+	0.1101182550191879,
+	0.1106753870844841,
+	0.1112345829606056,
+	0.1117958426475525,
+	0.1123591810464859,
+	0.1129245981574059,
+	0.1134920939803123,
+	0.1140616759657860,
+	0.1146333366632462,
+	0.1152070984244347,
+	0.1157829463481903,
+	0.1163608878850937,
+	0.1169409379363060,
+	0.1175230890512466,
+	0.1181073486804962,
+	0.1186937093734741,
+	0.1192821934819221,
+	0.1198727861046791,
+	0.1204655021429062,
+	0.1210603415966034,
+	0.1216573044657707,
+	0.1222563982009888,
+	0.1228576228022575,
+	0.1234609857201576,
+	0.1240664795041084,
+	0.1246741190552711,
+	0.1252839118242264,
+	0.1258958280086517,
+	0.1265099197626114,
+	0.1271261423826218,
+	0.1277445405721664,
+	0.1283650845289230,
+	0.1289877891540527,
+	0.1296126693487167,
+	0.1302397102117538,
+	0.1308689266443253,
+	0.1315003037452698,
+	0.1321338713169098,
+	0.1327696144580841,
+	0.1334075331687927,
+	0.1340476274490356,
+	0.1346899271011353,
+	0.1353344023227692,
+	0.1359810829162598,
+	0.1366299390792847,
+	0.1372810155153275,
+	0.1379342675209045,
+	0.1385897397994995,
+	0.1392474025487900,
+	0.1399072855710983,
+	0.1405693739652634,
+	0.1412336677312851,
+	0.1419001966714859,
+	0.1425689160823822,
+	0.1432398706674576,
+	0.1439130455255508,
+	0.1445884406566620,
+	0.1452660709619522,
+	0.1459459215402603,
+	0.1466280072927475,
+	0.1473123133182526,
+	0.1479988694190979,
+	0.1486876606941223,
+	0.1493786871433258,
+	0.1500719636678696,
+	0.1507674753665924,
+	0.1514652371406555,
+	0.1521652489900589,
+	0.1528675109148026,
+	0.1535720229148865,
+	0.1542787849903107,
+	0.1549877971410751,
+	0.1556990742683411,
+	0.1564126163721085,
+	0.1571284234523773,
+	0.1578464806079865,
+	0.1585668176412582,
+	0.1592894047498703,
+	0.1600142717361450,
+	0.1607414186000824,
+	0.1614708155393600,
+	0.1622025072574615,
+	0.1629364639520645,
+	0.1636727005243301,
+	0.1644112020730972,
+	0.1651519984006882,
+	0.1658950746059418,
+	0.1666404306888580,
+	0.1673880815505981,
+	0.1681380122900009,
+	0.1688902229070663,
+	0.1696447432041168,
+	0.1704015284776688,
+	0.1711606234312057,
+	0.1719219982624054,
+	0.1726856827735901,
+	0.1734516471624374,
+	0.1742199212312698,
+	0.1749904900789261,
+	0.1757633537054062,
+	0.1765385121107101,
+	0.1773159801959991,
+	0.1780957579612732,
+	0.1788778156042099,
+	0.1796621978282928,
+	0.1804488748311996,
+	0.1812378615140915,
+	0.1820291578769684,
+	0.1828227639198303,
+	0.1836186796426773,
+	0.1844168901443481,
+	0.1852174252271652,
+	0.1860202699899673,
+	0.1868254244327545,
+	0.1876328885555267,
+	0.1884426623582840,
+	0.1892547607421875,
+	0.1900691539049149,
+	0.1908858865499496,
+	0.1917049139738083,
+	0.1925262659788132,
+	0.1933499276638031,
+	0.1941759139299393,
+	0.1950042247772217,
+	0.1958348304033279,
+	0.1966677755117416,
+	0.1975030303001404,
+	0.1983405947685242,
+	0.1991804838180542,
+	0.2000226974487305,
+	0.2008672207593918,
+	0.2017140537500381,
+	0.2025632262229919,
+	0.2034147083759308,
+	0.2042685002088547,
+	0.2051246166229248,
+	0.2059830576181412,
+	0.2068438082933426,
+	0.2077068835496902,
+	0.2085722833871841,
+	0.2094399929046631,
+	0.2103100121021271,
+	0.2111823558807373,
+	0.2120570242404938,
+	0.2129340022802353,
+	0.2138132899999619,
+	0.2146949023008347,
+	0.2155788242816925,
+	0.2164650708436966,
+	0.2173536121845245,
+	0.2182444781064987,
+	0.2191376686096191,
+	0.2200331538915634,
+	0.2209309637546539,
+	0.2218310683965683,
+	0.2227334976196289,
+	0.2236382365226746,
+	0.2245452702045441,
+	0.2254546284675598,
+	0.2263662815093994,
+	0.2272802442312241,
+	0.2281965166330338,
+	0.2291150838136673,
+	0.2300359457731247,
+	0.2309591323137283,
+	0.2318845987319946,
+	0.2328123748302460,
+	0.2337424457073212,
+	0.2346748113632202,
+	0.2356094717979431,
+	0.2365464419126511,
+	0.2374856919050217,
+	0.2384272217750549,
+	0.2393710613250732,
+	0.2403171807527542,
+	0.2412655800580978,
+	0.2422162741422653,
+	0.2431692481040955,
+	0.2441245019435883,
+	0.2450820356607437,
+	0.2460418492555618,
+	0.2470039427280426,
+	0.2479683011770248,
+	0.2489349395036697,
+	0.2499038577079773,
+	0.2508750259876251,
+	0.2518484890460968,
+	0.2528241872787476,
+	0.2538021504878998,
+	0.2547823786735535,
+	0.2557648718357086,
+	0.2567496299743652,
+	0.2577366530895233,
+	0.2587258815765381,
+	0.2597174048423767,
+	0.2607111632823944,
+	0.2617071568965912,
+	0.2627053856849670,
+	0.2637058794498444,
+	0.2647085785865784,
+	0.2657135426998138,
+	0.2667207121849060,
+	0.2677301466464996,
+	0.2687417864799500,
+	0.2697556614875793,
+	0.2707717418670654,
+	0.2717900574207306,
+	0.2728105783462524,
+	0.2738333046436310,
+	0.2748582363128662,
+	0.2758854031562805,
+	0.2769147455692291,
+	0.2779463231563568,
+	0.2789800763130188,
+	0.2800160348415375,
+	0.2810541689395905,
+	0.2820945084095001,
+	0.2831370234489441,
+	0.2841817140579224,
+	0.2852285802364349,
+	0.2862776219844818,
+	0.2873288691043854,
+	0.2883822619915009,
+	0.2894378006458282,
+	0.2904955148696899,
+	0.2915554046630859,
+	0.2926174104213715,
+	0.2936815917491913,
+	0.2947479188442230,
+	0.2958163917064667,
+	0.2968869805335999,
+	0.2979597151279449,
+	0.2990345954895020,
+	0.3001115918159485,
+	0.3011907041072845,
+	0.3022719621658325,
+	0.3033553063869476,
+	0.3044407665729523,
+	0.3055283427238464,
+	0.3066180050373077,
+	0.3077097833156586,
+	0.3088036477565765,
+	0.3098995983600616,
+	0.3109976351261139,
+	0.3120977282524109,
+	0.3131999373435974,
+	0.3143042027950287,
+	0.3154105246067047,
+	0.3165189027786255,
+	0.3176293671131134,
+	0.3187418580055237,
+	0.3198564052581787,
+	0.3209730088710785,
+	0.3220916390419006,
+	0.3232122957706451,
+	0.3243349790573120,
+	0.3254597187042236,
+	0.3265864253044128,
+	0.3277151882648468,
+	0.3288459479808807,
+	0.3299787044525146,
+	0.3311134576797485,
+	0.3322502076625824,
+	0.3333889245986938,
+	0.3345296382904053,
+	0.3356723487377167,
+	0.3368169963359833,
+	0.3379636108875275,
+	0.3391122221946716,
+	0.3402627408504486,
+	0.3414152562618256,
+	0.3425696790218353,
+	0.3437260389328003,
+	0.3448843359947205,
+	0.3460445702075958,
+	0.3472067117691040,
+	0.3483707904815674,
+	0.3495367467403412,
+	0.3507046401500702,
+	0.3518743813037872,
+	0.3530460596084595,
+	0.3542195856571198,
+	0.3553950190544128,
+	0.3565723001956940,
+	0.3577514588832855,
+	0.3589324653148651,
+	0.3601153492927551,
+	0.3613000512123108,
+	0.3624866008758545,
+	0.3636749982833862,
+	0.3648651838302612,
+	0.3660572171211243,
+	0.3672510683536530,
+	0.3684467375278473,
+	0.3696441650390625,
+	0.3708434402942657,
+	0.3720444440841675,
+	0.3732472658157349,
+	0.3744518458843231,
+	0.3756582140922546,
+	0.3768663108348846,
+	0.3780761659145355,
+	0.3792877793312073,
+	0.3805011212825775,
+	0.3817161917686462,
+	0.3829329907894135,
+	0.3841515183448792,
+	0.3853717446327209,
+	0.3865936696529388,
+	0.3878172636032104,
+	0.3890425860881805,
+	0.3902695477008820,
+	0.3914982080459595,
+	0.3927285373210907,
+	0.3939605057239532,
+	0.3951941430568695,
+	0.3964293897151947,
+	0.3976663053035736,
+	0.3989048302173615,
+	0.4001449644565582,
+	0.4013867080211639,
+	0.4026300609111786,
+	0.4038750231266022,
+	0.4051215350627899,
+	0.4063696563243866,
+	0.4076193273067474,
+	0.4088705778121948,
+	0.4101233780384064,
+	0.4113776981830597,
+	0.4126335978507996,
+	0.4138909876346588,
+	0.4151499271392822,
+	0.4164103567600250,
+	0.4176723062992096,
+	0.4189357161521912,
+	0.4202006459236145,
+	0.4214670360088348,
+	0.4227349162101746,
+	0.4240042567253113,
+	0.4252750277519226,
+	0.4265472590923309,
+	0.4278208911418915,
+	0.4290959835052490,
+	0.4303724765777588,
+	0.4316504001617432,
+	0.4329296946525574,
+	0.4342103898525238,
+	0.4354924559593201,
+	0.4367758929729462,
+	0.4380607008934021,
+	0.4393468797206879,
+	0.4406343698501587,
+	0.4419232010841370,
+	0.4432133734226227,
+	0.4445048272609711,
+	0.4457976222038269,
+	0.4470916986465454,
+	0.4483870565891266,
+	0.4496837258338928,
+	0.4509816169738770,
+	0.4522807896137238,
+	0.4535812139511108,
+	0.4548828601837158,
+	0.4561857581138611,
+	0.4574898779392242,
+	0.4587951898574829,
+	0.4601016938686371,
+	0.4614093899726868,
+	0.4627182781696320,
+	0.4640283286571503,
+	0.4653395712375641,
+	0.4666519165039062,
+	0.4679654240608215,
+	0.4692800641059875,
+	0.4705958068370819,
+	0.4719126820564270,
+	0.4732306301593781,
+	0.4745496809482574,
+	0.4758698046207428,
+	0.4771910011768341,
+	0.4785132408142090,
+	0.4798365533351898,
+	0.4811608791351318,
+	0.4824862480163574,
+	0.4838126003742218,
+	0.4851399958133698,
+	0.4864683449268341,
+	0.4877977073192596,
+	0.4891280233860016,
+	0.4904593229293823,
+	0.4917915463447571,
+	0.4931247234344482,
+	0.4944588243961334,
+	0.4957938492298126,
+	0.4971297681331635,
+	0.4984666109085083,
+	0.4998043179512024,
+	0.5011428594589233,
+	0.5024823546409607,
+	0.5038226246833801,
+	0.5051637291908264,
+	0.5065057277679443,
+	0.5078485012054443,
+	0.5091920495033264,
+	0.5105364322662354,
+	0.5118815898895264,
+	0.5132275223731995,
+	0.5145742297172546,
+	0.5159216523170471,
+	0.5172698497772217,
+	0.5186187624931335,
+	0.5199683308601379,
+	0.5213186740875244,
+	0.5226696729660034,
+	0.5240213871002197,
+	0.5253736972808838,
+	0.5267267227172852,
+	0.5280803442001343,
+	0.5294346213340759,
+	0.5307895541191101,
+	0.5321450233459473,
+	0.5335011482238770,
+	0.5348578095436096,
+	0.5362150669097900,
+	0.5375728607177734,
+	0.5389311909675598,
+	0.5402901172637939,
+	0.5416495203971863,
+	0.5430094599723816,
+	0.5443698763847351,
+	0.5457307696342468,
+	0.5470921397209167,
+	0.5484539866447449,
+	0.5498162508010864,
+	0.5511789917945862,
+	0.5525420904159546,
+	0.5539056658744812,
+	0.5552695989608765,
+	0.5566339492797852,
+	0.5579986572265625,
+	0.5593637228012085,
+	0.5607291460037231,
+	0.5620948672294617,
+	0.5634609460830688,
+	0.5648273229598999,
+	0.5661939978599548,
+	0.5675609707832336,
+	0.5689282417297363,
+	0.5702956914901733,
+	0.5716634392738342,
+	0.5730314254760742,
+	0.5743995904922485,
+	0.5757679939270020,
+	0.5771365761756897,
+	0.5785053372383118,
+	0.5798742175102234,
+	0.5812433362007141,
+	0.5826125144958496,
+	0.5839818716049194,
+	0.5853513479232788,
+	0.5867208838462830,
+	0.5880905389785767,
+	0.5894602537155151,
+	0.5908300876617432,
+	0.5921998620033264,
+	0.5935697555541992,
+	0.5949396491050720,
+	0.5963095426559448,
+	0.5976794362068176,
+	0.5990492701530457,
+	0.6004191040992737,
+	0.6017888784408569,
+	0.6031585931777954,
+	0.6045282483100891,
+	0.6058977842330933,
+	0.6072672605514526,
+	0.6086366176605225,
+	0.6100057959556580,
+	0.6113749146461487,
+	0.6127437949180603,
+	0.6141125559806824,
+	0.6154810786247253,
+	0.6168494820594788,
+	0.6182175874710083,
+	0.6195855140686035,
+	0.6209532022476196,
+	0.6223206520080566,
+	0.6236878037452698,
+	0.6250546574592590,
+	0.6264212727546692,
+	0.6277875304222107,
+	0.6291534900665283,
+	0.6305190920829773,
+	0.6318843364715576,
+	0.6332492232322693,
+	0.6346137523651123,
+	0.6359778642654419,
+	0.6373416185379028,
+	0.6387048959732056,
+	0.6400677561759949,
+	0.6414301395416260,
+	0.6427920460700989,
+	0.6441535353660583,
+	0.6455144882202148,
+	0.6468749642372131,
+	0.6482348442077637,
+	0.6495942473411560,
+	0.6509531140327454,
+	0.6523113846778870,
+	0.6536690592765808,
+	0.6550261974334717,
+	0.6563826799392700,
+	0.6577385663986206,
+	0.6590937972068787,
+	0.6604483723640442,
+	0.6618022918701172,
+	0.6631554961204529,
+	0.6645080447196960,
+	0.6658598780632019,
+	0.6672109961509705,
+	0.6685613393783569,
+	0.6699109673500061,
+	0.6712598204612732,
+	0.6726078391075134,
+	0.6739550828933716,
+	0.6753015518188477,
+	0.6766471862792969,
+	0.6779919266700745,
+	0.6793358922004700,
+	0.6806789040565491,
+	0.6820210814476013,
+	0.6833623647689819,
+	0.6847026944160461,
+	0.6860421299934387,
+	0.6873805522918701,
+	0.6887180805206299,
+	0.6900546550750732,
+	0.6913901567459106,
+	0.6927247047424316,
+	0.6940582394599915,
+	0.6953907608985901,
+	0.6967221498489380,
+	0.6980525851249695,
+	0.6993818283081055,
+	0.7007100582122803,
+	0.7020371556282043,
+	0.7033631205558777,
+	0.7046879529953003,
+	0.7060116529464722,
+	0.7073341608047485,
+	0.7086555361747742,
+	0.7099756598472595,
+	0.7112945914268494,
+	0.7126122713088989,
+	0.7139286994934082,
+	0.7152439355850220,
+	0.7165578603744507,
+	0.7178704738616943,
+	0.7191818356513977,
+	0.7204918265342712,
+	0.7218005061149597,
+	0.7231078743934631,
+	0.7244138121604919,
+	0.7257184386253357,
+	0.7270216345787048,
+	0.7283234596252441,
+	0.7296237945556641,
+	0.7309227585792542,
+	0.7322202324867249,
+	0.7335162758827209,
+	0.7348108291625977,
+	0.7361038327217102,
+	0.7373954057693481,
+	0.7386853694915771,
+	0.7399738430976868,
+	0.7412607669830322,
+	0.7425460815429688,
+	0.7438298463821411,
+	0.7451119422912598,
+	0.7463924884796143,
+	0.7476713657379150,
+	0.7489486336708069,
+	0.7502242326736450,
+	0.7514981031417847,
+	0.7527703046798706,
+	0.7540408372879028,
+	0.7553096413612366,
+	0.7565766572952271,
+	0.7578419446945190,
+	0.7591055035591125,
+	0.7603672146797180,
+	0.7616271972656250,
+	0.7628853321075439,
+	0.7641416192054749,
+	0.7653960585594177,
+	0.7666487097740173,
+	0.7678994536399841,
+	0.7691482901573181,
+	0.7703952193260193,
+	0.7716402411460876,
+	0.7728833556175232,
+	0.7741245031356812,
+	0.7753636837005615,
+	0.7766008973121643,
+	0.7778361439704895,
+	0.7790693044662476,
+	0.7803004980087280,
+	0.7815296649932861,
+	0.7827568054199219,
+	0.7839818000793457,
+	0.7852047681808472,
+	0.7864256501197815,
+	0.7876443862915039,
+	0.7888610363006592,
+	0.7900754809379578,
+	0.7912878394126892,
+	0.7924979925155640,
+	0.7937059402465820,
+	0.7949117422103882,
+	0.7961152791976929,
+	0.7973166108131409,
+	0.7985157370567322,
+	0.7997125387191772,
+	0.8009071350097656,
+	0.8020994067192078,
+	0.8032893538475037,
+	0.8044769763946533,
+	0.8056623339653015,
+	0.8068453073501587,
+	0.8080258965492249,
+	0.8092041015625000,
+	0.8103799819946289,
+	0.8115534186363220,
+	0.8127244114875793,
+	0.8138930201530457,
+	0.8150591254234314,
+	0.8162228465080261,
+	0.8173840045928955,
+	0.8185427188873291,
+	0.8196989297866821,
+	0.8208525776863098,
+	0.8220037221908569,
+	0.8231523036956787,
+	0.8242983222007751,
+	0.8254417777061462,
+	0.8265826702117920,
+	0.8277208805084229,
+	0.8288565278053284,
+	0.8299894928932190,
+	0.8311198353767395,
+	0.8322474956512451,
+	0.8333725333213806,
+	0.8344948291778564,
+	0.8356144428253174,
+	0.8367313146591187,
+	0.8378454446792603,
+	0.8389568328857422,
+	0.8400654196739197,
+	0.8411712646484375,
+	0.8422743678092957,
+	0.8433746099472046,
+	0.8444720506668091,
+	0.8455666303634644,
+	0.8466583490371704,
+	0.8477472662925720,
+	0.8488332629203796,
+	0.8499163985252380,
+	0.8509966135025024,
+	0.8520739078521729,
+	0.8531482815742493,
+	0.8542197346687317,
+	0.8552882075309753,
+	0.8563537001609802,
+	0.8574162125587463,
+	0.8584756851196289,
+	0.8595321774482727,
+	0.8605856299400330,
+	0.8616361021995544,
+	0.8626834750175476,
+	0.8637277483940125,
+	0.8647689819335938,
+	0.8658071160316467,
+	0.8668421506881714,
+	0.8678740859031677,
+	0.8689028024673462,
+	0.8699284791946411,
+	0.8709509372711182,
+	0.8719701766967773,
+	0.8729863166809082,
+	0.8739991784095764,
+	0.8750088810920715,
+	0.8760153055191040,
+	0.8770185112953186,
+	0.8780184984207153,
+	0.8790152072906494,
+	0.8800085783004761,
+	0.8809987306594849,
+	0.8819855451583862,
+	0.8829690217971802,
+	0.8839492201805115,
+	0.8849260210990906,
+	0.8858994841575623,
+	0.8868696093559265,
+	0.8878362774848938,
+	0.8887996077537537,
+	0.8897595405578613,
+	0.8907160162925720,
+	0.8916690349578857,
+	0.8926186561584473,
+	0.8935648202896118,
+	0.8945075273513794,
+	0.8954467177391052,
+	0.8963823914527893,
+	0.8973146080970764,
+	0.8982432484626770,
+	0.8991683721542358,
+	0.9000899791717529,
+	0.9010080099105835,
+	0.9019225239753723,
+	0.9028334021568298,
+	0.9037407040596008,
+	0.9046443700790405,
+	0.9055444598197937,
+	0.9064408540725708,
+	0.9073336720466614,
+	0.9082227945327759,
+	0.9091082811355591,
+	0.9099900722503662,
+	0.9108682274818420,
+	0.9117426276206970,
+	0.9126133322715759,
+	0.9134802818298340,
+	0.9143435359001160,
+	0.9152030348777771,
+	0.9160587787628174,
+	0.9169107079505920,
+	0.9177589416503906,
+	0.9186033010482788,
+	0.9194439053535461,
+	0.9202806353569031,
+	0.9211136102676392,
+	0.9219427108764648,
+	0.9227679967880249,
+	0.9235893487930298,
+	0.9244068861007690,
+	0.9252205491065979,
+	0.9260302782058716,
+	0.9268361330032349,
+	0.9276380538940430,
+	0.9284360408782959,
+	0.9292301535606384,
+	0.9300202727317810,
+	0.9308063983917236,
+	0.9315885901451111,
+	0.9323668479919434,
+	0.9331410527229309,
+	0.9339112639427185,
+	0.9346774816513062,
+	0.9354397058486938,
+	0.9361978769302368,
+	0.9369519948959351,
+	0.9377020597457886,
+	0.9384480118751526,
+	0.9391899704933167,
+	0.9399278163909912,
+	0.9406616091728210,
+	0.9413912296295166,
+	0.9421167969703674,
+	0.9428381919860840,
+	0.9435554742813110,
+	0.9442686438560486,
+	0.9449776411056519,
+	0.9456824660301208,
+	0.9463831186294556,
+	0.9470795989036560,
+	0.9477719068527222,
+	0.9484599828720093,
+	0.9491438865661621,
+	0.9498235583305359,
+	0.9504989981651306,
+	0.9511702060699463,
+	0.9518371224403381,
+	0.9524998664855957,
+	0.9531582593917847,
+	0.9538124203681946,
+	0.9544622898101807,
+	0.9551079273223877,
+	0.9557492136955261,
+	0.9563861489295959,
+	0.9570188522338867,
+	0.9576472043991089,
+	0.9582712054252625,
+	0.9588908553123474,
+	0.9595061540603638,
+	0.9601171016693115,
+	0.9607236385345459,
+	0.9613258838653564,
+	0.9619236588478088,
+	0.9625170826911926,
+	0.9631060957908630,
+	0.9636906981468201,
+	0.9642708897590637,
+	0.9648466706275940,
+	0.9654179811477661,
+	0.9659848809242249,
+	0.9665473103523254,
+	0.9671052694320679,
+	0.9676588177680969,
+	0.9682078361511230,
+	0.9687523841857910,
+	0.9692924618721008,
+	0.9698280692100525,
+	0.9703590869903564,
+	0.9708856940269470,
+	0.9714077115058899,
+	0.9719252586364746,
+	0.9724382162094116,
+	0.9729466438293457,
+	0.9734505414962769,
+	0.9739499092102051,
+	0.9744446873664856,
+	0.9749349355697632,
+	0.9754205346107483,
+	0.9759016036987305,
+	0.9763780832290649,
+	0.9768499732017517,
+	0.9773172736167908,
+	0.9777799248695374,
+	0.9782379865646362,
+	0.9786914587020874,
+	0.9791402220726013,
+	0.9795844554901123,
+	0.9800239801406860,
+	0.9804588556289673,
+	0.9808891415596008,
+	0.9813147187232971,
+	0.9817356467247009,
+	0.9821518659591675,
+	0.9825634956359863,
+	0.9829703569412231,
+	0.9833725690841675,
+	0.9837701320648193,
+	0.9841629266738892,
+	0.9845510721206665,
+	0.9849345088005066,
+	0.9853131771087646,
+	0.9856871962547302,
+	0.9860564470291138,
+	0.9864209294319153,
+	0.9867807626724243,
+	0.9871358275413513,
+	0.9874861240386963,
+	0.9878317117691040,
+	0.9881724715232849,
+	0.9885085225105286,
+	0.9888398051261902,
+	0.9891663789749146,
+	0.9894881248474121,
+	0.9898050427436829,
+	0.9901172518730164,
+	0.9904246330261230,
+	0.9907272458076477,
+	0.9910250902175903,
+	0.9913181066513062,
+	0.9916062951087952,
+	0.9918897151947021,
+	0.9921683073043823,
+	0.9924420714378357,
+	0.9927110671997070,
+	0.9929751753807068,
+	0.9932345151901245,
+	0.9934889674186707,
+	0.9937386512756348,
+	0.9939834475517273,
+	0.9942234158515930,
+	0.9944585561752319,
+	0.9946888685226440,
+	0.9949142932891846,
+	0.9951348304748535,
+	0.9953505992889404,
+	0.9955614209175110,
+	0.9957674145698547,
+	0.9959685802459717,
+	0.9961648583412170,
+	0.9963562488555908,
+	0.9965427517890930,
+	0.9967243671417236,
+	0.9969011545181274,
+	0.9970730543136597,
+	0.9972400069236755,
+	0.9974021315574646,
+	0.9975593686103821,
+	0.9977117180824280,
+	0.9978591203689575,
+	0.9980016946792603,
+	0.9981393218040466,
+	0.9982721209526062,
+	0.9983999729156494,
+	0.9985228776931763,
+	0.9986409544944763,
+	0.9987540841102600,
+	0.9988623261451721,
+	0.9989656209945679,
+	0.9990640282630920,
+	0.9991575479507446,
+	0.9992461204528809,
+	0.9993298053741455,
+	0.9994085431098938,
+	0.9994823932647705,
+	0.9995512962341309,
+	0.9996153116226196,
+	0.9996743798255920,
+	0.9997285604476929,
+	0.9997777938842773,
+	0.9998220801353455,
+	0.9998614788055420,
+	0.9998959898948669,
+	0.9999254941940308,
+	0.9999501109123230,
+	0.9999698400497437,
+	0.9999846220016479,
+	0.9999944567680359,
+	0.9999994039535522,
+	0.9999994039535522,
+	0.9999944567680359,
+	0.9999846220016479,
+	0.9999698400497437,
+	0.9999501109123230,
+	0.9999254941940308,
+	0.9998959302902222,
+	0.9998614788055420,
+	0.9998220801353455,
+	0.9997777938842773,
+	0.9997285604476929,
+	0.9996743798255920,
+	0.9996153116226196,
+	0.9995512962341309,
+	0.9994823932647705,
+	0.9994085431098938,
+	0.9993298053741455,
+	0.9992461204528809,
+	0.9991575479507446,
+	0.9990640282630920,
+	0.9989656209945679,
+	0.9988622665405273,
+	0.9987540841102600,
+	0.9986408948898315,
+	0.9985228776931763,
+	0.9983999133110046,
+	0.9982720613479614,
+	0.9981393218040466,
+	0.9980016946792603,
+	0.9978591203689575,
+	0.9977117180824280,
+	0.9975593686103821,
+	0.9974021315574646,
+	0.9972400069236755,
+	0.9970729947090149,
+	0.9969011545181274,
+	0.9967243671417236,
+	0.9965427517890930,
+	0.9963561892509460,
+	0.9961647987365723,
+	0.9959685206413269,
+	0.9957674145698547,
+	0.9955614209175110,
+	0.9953505396842957,
+	0.9951348304748535,
+	0.9949142336845398,
+	0.9946888089179993,
+	0.9944585561752319,
+	0.9942234158515930,
+	0.9939834475517273,
+	0.9937385916709900,
+	0.9934889674186707,
+	0.9932344555854797,
+	0.9929751753807068,
+	0.9927110075950623,
+	0.9924420714378357,
+	0.9921683073043823,
+	0.9918897151947021,
+	0.9916062951087952,
+	0.9913180470466614,
+	0.9910250306129456,
+	0.9907272458076477,
+	0.9904246330261230,
+	0.9901171922683716,
+	0.9898050427436829,
+	0.9894880652427673,
+	0.9891663193702698,
+	0.9888398051261902,
+	0.9885085225105286,
+	0.9881724715232849,
+	0.9878316521644592,
+	0.9874860644340515,
+	0.9871357679367065,
+	0.9867807030677795,
+	0.9864209294319153,
+	0.9860563874244690,
+	0.9856871366500854,
+	0.9853131175041199,
+	0.9849344491958618,
+	0.9845510125160217,
+	0.9841628670692444,
+	0.9837700724601746,
+	0.9833725690841675,
+	0.9829703569412231,
+	0.9825634360313416,
+	0.9821518659591675,
+	0.9817355871200562,
+	0.9813146591186523,
+	0.9808890819549561,
+	0.9804587960243225,
+	0.9800239205360413,
+	0.9795843958854675,
+	0.9791402220726013,
+	0.9786913990974426,
+	0.9782379269599915,
+	0.9777798652648926,
+	0.9773172140121460,
+	0.9768499135971069,
+	0.9763780236244202,
+	0.9759015440940857,
+	0.9754205346107483,
+	0.9749348759651184,
+	0.9744446277618408,
+	0.9739498496055603,
+	0.9734504818916321,
+	0.9729465842247009,
+	0.9724381566047668,
+	0.9719251990318298,
+	0.9714076519012451,
+	0.9708856344223022,
+	0.9703590273857117,
+	0.9698280096054077,
+	0.9692924022674561,
+	0.9687523245811462,
+	0.9682077765464783,
+	0.9676587581634521,
+	0.9671052098274231,
+	0.9665472507476807,
+	0.9659848213195801,
+	0.9654179215431213,
+	0.9648466110229492,
+	0.9642708301544189,
+	0.9636906385421753,
+	0.9631060361862183,
+	0.9625170230865479,
+	0.9619235992431641,
+	0.9613257646560669,
+	0.9607235789299011,
+	0.9601170420646667,
+	0.9595060944557190,
+	0.9588907361030579,
+	0.9582710862159729,
+	0.9576470851898193,
+	0.9570187926292419,
+	0.9563860893249512,
+	0.9557490944862366,
+	0.9551078081130981,
+	0.9544622302055359,
+	0.9538123607635498,
+	0.9531581997871399,
+	0.9524997472763062,
+	0.9518370628356934,
+	0.9511700868606567,
+	0.9504988789558411,
+	0.9498234391212463,
+	0.9491438269615173,
+	0.9484599232673645,
+	0.9477718472480774,
+	0.9470795392990112,
+	0.9463830590248108,
+	0.9456824064254761,
+	0.9449775218963623,
+	0.9442685842514038,
+	0.9435554146766663,
+	0.9428381323814392,
+	0.9421166777610779,
+	0.9413911700248718,
+	0.9406614899635315,
+	0.9399277567863464,
+	0.9391899108886719,
+	0.9384479522705078,
+	0.9377019405364990,
+	0.9369518756866455,
+	0.9361977577209473,
+	0.9354395866394043,
+	0.9346774220466614,
+	0.9339112043380737,
+	0.9331409931182861,
+	0.9323667287826538,
+	0.9315885305404663,
+	0.9308063387870789,
+	0.9300201535224915,
+	0.9292300343513489,
+	0.9284359812736511,
+	0.9276379942893982,
+	0.9268360137939453,
+	0.9260302186012268,
+	0.9252204298973083,
+	0.9244068264961243,
+	0.9235892891883850,
+	0.9227678775787354,
+	0.9219425916671753,
+	0.9211134910583496,
+	0.9202805757522583,
+	0.9194437861442566,
+	0.9186031818389893,
+	0.9177588224411011,
+	0.9169106483459473,
+	0.9160586595535278,
+	0.9152029156684875,
+	0.9143434166908264,
+	0.9134802222251892,
+	0.9126132130622864,
+	0.9117425084114075,
+	0.9108681082725525,
+	0.9099900126457214,
+	0.9091082215309143,
+	0.9082227349281311,
+	0.9073335528373718,
+	0.9064407944679260,
+	0.9055443406105042,
+	0.9046442508697510,
+	0.9037405848503113,
+	0.9028332829475403,
+	0.9019224047660828,
+	0.9010079503059387,
+	0.9000898599624634,
+	0.8991683125495911,
+	0.8982431292533875,
+	0.8973144888877869,
+	0.8963822722434998,
+	0.8954465985298157,
+	0.8945074081420898,
+	0.8935647010803223,
+	0.8926185369491577,
+	0.8916689753532410,
+	0.8907158970832825,
+	0.8897594213485718,
+	0.8887994885444641,
+	0.8878361582756042,
+	0.8868694901466370,
+	0.8858993649482727,
+	0.8849259018898010,
+	0.8839491009712219,
+	0.8829689025878906,
+	0.8819854259490967,
+	0.8809986114501953,
+	0.8800085186958313,
+	0.8790150880813599,
+	0.8780183792114258,
+	0.8770184516906738,
+	0.8760151863098145,
+	0.8750087618827820,
+	0.8739990592002869,
+	0.8729861974716187,
+	0.8719700574874878,
+	0.8709508180618286,
+	0.8699283599853516,
+	0.8689026832580566,
+	0.8678739666938782,
+	0.8668420314788818,
+	0.8658069968223572,
+	0.8647688627243042,
+	0.8637276291847229,
+	0.8626833558082581,
+	0.8616359829902649,
+	0.8605855107307434,
+	0.8595320582389832,
+	0.8584755659103394,
+	0.8574160337448120,
+	0.8563535809516907,
+	0.8552880287170410,
+	0.8542196154594421,
+	0.8531481623649597,
+	0.8520737886428833,
+	0.8509964942932129,
+	0.8499162793159485,
+	0.8488331437110901,
+	0.8477471470832825,
+	0.8466582298278809,
+	0.8455665111541748,
+	0.8444718718528748,
+	0.8433744907379150,
+	0.8422742486000061,
+	0.8411711454391479,
+	0.8400653004646301,
+	0.8389567136764526,
+	0.8378453254699707,
+	0.8367311358451843,
+	0.8356142640113831,
+	0.8344947099685669,
+	0.8333724141120911,
+	0.8322473764419556,
+	0.8311197161674500,
+	0.8299893736839294,
+	0.8288564085960388,
+	0.8277207612991333,
+	0.8265824913978577,
+	0.8254416584968567,
+	0.8242982029914856,
+	0.8231521844863892,
+	0.8220036029815674,
+	0.8208524584770203,
+	0.8196988105773926,
+	0.8185425996780396,
+	0.8173838853836060,
+	0.8162226676940918,
+	0.8150590062141418,
+	0.8138929009437561,
+	0.8127242922782898,
+	0.8115532994270325,
+	0.8103798627853394,
+	0.8092039823532104,
+	0.8080257773399353,
+	0.8068451285362244,
+	0.8056621551513672,
+	0.8044768571853638,
+	0.8032892346382141,
+	0.8020992279052734,
+	0.8009069561958313,
+	0.7997124195098877,
+	0.7985155582427979,
+	0.7973164916038513,
+	0.7961151599884033,
+	0.7949116230010986,
+	0.7937058210372925,
+	0.7924978733062744,
+	0.7912877202033997,
+	0.7900753617286682,
+	0.7888608574867249,
+	0.7876442670822144,
+	0.7864255309104919,
+	0.7852046489715576,
+	0.7839816808700562,
+	0.7827566266059875,
+	0.7815295457839966,
+	0.7803003787994385,
+	0.7790691852569580,
+	0.7778359651565552,
+	0.7766007781028748,
+	0.7753635644912720,
+	0.7741243839263916,
+	0.7728832364082336,
+	0.7716401219367981,
+	0.7703951001167297,
+	0.7691481113433838,
+	0.7678992748260498,
+	0.7666485309600830,
+	0.7653959393501282,
+	0.7641414999961853,
+	0.7628851532936096,
+	0.7616270184516907,
+	0.7603670954704285,
+	0.7591053247451782,
+	0.7578418254852295,
+	0.7565765380859375,
+	0.7553094625473022,
+	0.7540407180786133,
+	0.7527701854705811,
+	0.7514979839324951,
+	0.7502240538597107,
+	0.7489485144615173,
+	0.7476712465286255,
+	0.7463923692703247,
+	0.7451118230819702,
+	0.7438296675682068,
+	0.7425459623336792,
+	0.7412605881690979,
+	0.7399737238883972,
+	0.7386852502822876,
+	0.7373952269554138,
+	0.7361037135124207,
+	0.7348106503486633,
+	0.7335160970687866,
+	0.7322201132774353,
+	0.7309226393699646,
+	0.7296236753463745,
+	0.7283232808113098,
+	0.7270215153694153,
+	0.7257182598114014,
+	0.7244136929512024,
+	0.7231076955795288,
+	0.7218003869056702,
+	0.7204917073249817,
+	0.7191816568374634,
+	0.7178703546524048,
+	0.7165576815605164,
+	0.7152437567710876,
+	0.7139285802841187,
+	0.7126120924949646,
+	0.7112944126129150,
+	0.7099754810333252,
+	0.7086553573608398,
+	0.7073340415954590,
+	0.7060115337371826,
+	0.7046878337860107,
+	0.7033630013465881,
+	0.7020370364189148,
+	0.7007099390029907,
+	0.6993817090988159,
+	0.6980524063110352,
+	0.6967220306396484,
+	0.6953905820846558,
+	0.6940581202507019,
+	0.6927245855331421,
+	0.6913900375366211,
+	0.6900544762611389,
+	0.6887179613113403,
+	0.6873804330825806,
+	0.6860419511795044,
+	0.6847025156021118,
+	0.6833621859550476,
+	0.6820209026336670,
+	0.6806787848472595,
+	0.6793357133865356,
+	0.6779918074607849,
+	0.6766470074653625,
+	0.6753013730049133,
+	0.6739549636840820,
+	0.6726077198982239,
+	0.6712596416473389,
+	0.6699107885360718,
+	0.6685612201690674,
+	0.6672108173370361,
+	0.6658596992492676,
+	0.6645079255104065,
+	0.6631553769111633,
+	0.6618021130561829,
+	0.6604481935501099,
+	0.6590936183929443,
+	0.6577383875846863,
+	0.6563825011253357,
+	0.6550260186195374,
+	0.6536689400672913,
+	0.6523112058639526,
+	0.6509529352188110,
+	0.6495941281318665,
+	0.6482347249984741,
+	0.6468747854232788,
+	0.6455143094062805,
+	0.6441533565521240,
+	0.6427919268608093,
+	0.6414299607276917,
+	0.6400675773620605,
+	0.6387047171592712,
+	0.6373414397239685,
+	0.6359777450561523,
+	0.6346136331558228,
+	0.6332491040229797,
+	0.6318842172622681,
+	0.6305189132690430,
+	0.6291533112525940,
+	0.6277873516082764,
+	0.6264210939407349,
+	0.6250545382499695,
+	0.6236876249313354,
+	0.6223204731941223,
+	0.6209530830383301,
+	0.6195853948593140,
+	0.6182174682617188,
+	0.6168493032455444,
+	0.6154809594154358,
+	0.6141123771667480,
+	0.6127436161041260,
+	0.6113747358322144,
+	0.6100056767463684,
+	0.6086364388465881,
+	0.6072670817375183,
+	0.6058976650238037,
+	0.6045280694961548,
+	0.6031584739685059,
+	0.6017886996269226,
+	0.6004189252853394,
+	0.5990490913391113,
+	0.5976792573928833,
+	0.5963093638420105,
+	0.5949394702911377,
+	0.5935695767402649,
+	0.5921997427940369,
+	0.5908299088478088,
+	0.5894601345062256,
+	0.5880904197692871,
+	0.5867207646369934,
+	0.5853511691093445,
+	0.5839817523956299,
+	0.5826123952865601,
+	0.5812431573867798,
+	0.5798740983009338,
+	0.5785051584243774,
+	0.5771363973617554,
+	0.5757678151130676,
+	0.5743994116783142,
+	0.5730312466621399,
+	0.5716632604598999,
+	0.5702955722808838,
+	0.5689280629158020,
+	0.5675608515739441,
+	0.5661938786506653,
+	0.5648272037506104,
+	0.5634608268737793,
+	0.5620947480201721,
+	0.5607289671897888,
+	0.5593635439872742,
+	0.5579984784126282,
+	0.5566337704658508,
+	0.5552694797515869,
+	0.5539054870605469,
+	0.5525419712066650,
+	0.5511788129806519,
+	0.5498160719871521,
+	0.5484538078308105,
+	0.5470919609069824,
+	0.5457305908203125,
+	0.5443696975708008,
+	0.5430092811584473,
+	0.5416493415832520,
+	0.5402899384498596,
+	0.5389310717582703,
+	0.5375726819038391,
+	0.5362148880958557,
+	0.5348576307296753,
+	0.5335009694099426,
+	0.5321449041366577,
+	0.5307893753051758,
+	0.5294345021247864,
+	0.5280802249908447,
+	0.5267265439033508,
+	0.5253735780715942,
+	0.5240212082862854,
+	0.5226694941520691,
+	0.5213184952735901,
+	0.5199682116508484,
+	0.5186185836791992,
+	0.5172696709632874,
+	0.5159215331077576,
+	0.5145740509033203,
+	0.5132274031639099,
+	0.5118814706802368,
+	0.5105363130569458,
+	0.5091919302940369,
+	0.5078483223915100,
+	0.5065055489540100,
+	0.5051636099815369,
+	0.5038224458694458,
+	0.5024821758270264,
+	0.5011427402496338,
+	0.4998041689395905,
+	0.4984664618968964,
+	0.4971296191215515,
+	0.4957937002182007,
+	0.4944586753845215,
+	0.4931245744228363,
+	0.4917913973331451,
+	0.4904591739177704,
+	0.4891278743743896,
+	0.4877975583076477,
+	0.4864681959152222,
+	0.4851398468017578,
+	0.4838124513626099,
+	0.4824860692024231,
+	0.4811607301235199,
+	0.4798364043235779,
+	0.4785130918025970,
+	0.4771908521652222,
+	0.4758696556091309,
+	0.4745495319366455,
+	0.4732304811477661,
+	0.4719125330448151,
+	0.4705956578254700,
+	0.4692799150943756,
+	0.4679652750492096,
+	0.4666517674922943,
+	0.4653394222259521,
+	0.4640281796455383,
+	0.4627181291580200,
+	0.4614092409610748,
+	0.4601015448570251,
+	0.4587950408458710,
+	0.4574897289276123,
+	0.4561856091022491,
+	0.4548827111721039,
+	0.4535810649394989,
+	0.4522806406021118,
+	0.4509814679622650,
+	0.4496835768222809,
+	0.4483869075775146,
+	0.4470915496349335,
+	0.4457974731922150,
+	0.4445047080516815,
+	0.4432132244110107,
+	0.4419230520725250,
+	0.4406342208385468,
+	0.4393467307090759,
+	0.4380605518817902,
+	0.4367757439613342,
+	0.4354923069477081,
+	0.4342102408409119,
+	0.4329295456409454,
+	0.4316502511501312,
+	0.4303723275661469,
+	0.4290958344936371,
+	0.4278207719326019,
+	0.4265471100807190,
+	0.4252748787403107,
+	0.4240041077136993,
+	0.4227347671985626,
+	0.4214669167995453,
+	0.4202004969120026,
+	0.4189355969429016,
+	0.4176721572875977,
+	0.4164102077484131,
+	0.4151497781276703,
+	0.4138908386230469,
+	0.4126334488391876,
+	0.4113775789737701,
+	0.4101232290267944,
+	0.4088704288005829,
+	0.4076191782951355,
+	0.4063695073127747,
+	0.4051213860511780,
+	0.4038748741149902,
+	0.4026299118995667,
+	0.4013865590095520,
+	0.4001448154449463,
+	0.3989046812057495,
+	0.3976661562919617,
+	0.3964292705059052,
+	0.3951939940452576,
+	0.3939603567123413,
+	0.3927283883094788,
+	0.3914980888366699,
+	0.3902694284915924,
+	0.3890424370765686,
+	0.3878171443939209,
+	0.3865935206413269,
+	0.3853715956211090,
+	0.3841513693332672,
+	0.3829328715801239,
+	0.3817160725593567,
+	0.3805009722709656,
+	0.3792876303195953,
+	0.3780760467052460,
+	0.3768661618232727,
+	0.3756580650806427,
+	0.3744517266750336,
+	0.3732471466064453,
+	0.3720443248748779,
+	0.3708432912826538,
+	0.3696440458297729,
+	0.3684465885162354,
+	0.3672509491443634,
+	0.3660570979118347,
+	0.3648650646209717,
+	0.3636748492717743,
+	0.3624864518642426,
+	0.3612999022006989,
+	0.3601152002811432,
+	0.3589323461055756,
+	0.3577513098716736,
+	0.3565721809864044,
+	0.3553948700428009,
+	0.3542194664478302,
+	0.3530459105968475,
+	0.3518742620944977,
+	0.3507044911384583,
+	0.3495366275310516,
+	0.3483706414699554,
+	0.3472065925598145,
+	0.3460444509983063,
+	0.3448842167854309,
+	0.3437259197235107,
+	0.3425695598125458,
+	0.3414151072502136,
+	0.3402626216411591,
+	0.3391120731830597,
+	0.3379634916782379,
+	0.3368168771266937,
+	0.3356721997261047,
+	0.3345295190811157,
+	0.3333888053894043,
+	0.3322500586509705,
+	0.3311133086681366,
+	0.3299785554409027,
+	0.3288457989692688,
+	0.3277150690555573,
+	0.3265863060951233,
+	0.3254595696926117,
+	0.3243348598480225,
+	0.3232121765613556,
+	0.3220915198326111,
+	0.3209728896617889,
+	0.3198562860488892,
+	0.3187417387962341,
+	0.3176292479038239,
+	0.3165187835693359,
+	0.3154104053974152,
+	0.3143040835857391,
+	0.3131998181343079,
+	0.3120976090431213,
+	0.3109974861145020,
+	0.3098994493484497,
+	0.3088034987449646,
+	0.3077096641063690,
+	0.3066178858280182,
+	0.3055282235145569,
+	0.3044406473636627,
+	0.3033551871776581,
+	0.3022718131542206,
+	0.3011905848979950,
+	0.3001114726066589,
+	0.2990344762802124,
+	0.2979595959186554,
+	0.2968868613243103,
+	0.2958162724971771,
+	0.2947477996349335,
+	0.2936814725399017,
+	0.2926172912120819,
+	0.2915552854537964,
+	0.2904953956604004,
+	0.2894376814365387,
+	0.2883821129798889,
+	0.2873287498950958,
+	0.2862775027751923,
+	0.2852284610271454,
+	0.2841815948486328,
+	0.2831369042396545,
+	0.2820943892002106,
+	0.2810540497303009,
+	0.2800159156322479,
+	0.2789799571037292,
+	0.2779462039470673,
+	0.2769146263599396,
+	0.2758852839469910,
+	0.2748581171035767,
+	0.2738331854343414,
+	0.2728104591369629,
+	0.2717899382114410,
+	0.2707716226577759,
+	0.2697555422782898,
+	0.2687416672706604,
+	0.2677300274372101,
+	0.2667206227779388,
+	0.2657134234905243,
+	0.2647084593772888,
+	0.2637057602405548,
+	0.2627052664756775,
+	0.2617070376873016,
+	0.2607110440731049,
+	0.2597172856330872,
+	0.2587257921695709,
+	0.2577365338802338,
+	0.2567495107650757,
+	0.2557647824287415,
+	0.2547822892665863,
+	0.2538020610809326,
+	0.2528240680694580,
+	0.2518483698368073,
+	0.2508749067783356,
+	0.2499037384986877,
+	0.2489348351955414,
+	0.2479681968688965,
+	0.2470038235187531,
+	0.2460417449474335,
+	0.2450819313526154,
+	0.2441243976354599,
+	0.2431691288948059,
+	0.2422161698341370,
+	0.2412654757499695,
+	0.2403170615434647,
+	0.2393709421157837,
+	0.2384271174669266,
+	0.2374855726957321,
+	0.2365463227033615,
+	0.2356093674898148,
+	0.2346747070550919,
+	0.2337423413991928,
+	0.2328122705221176,
+	0.2318844944238663,
+	0.2309590280056000,
+	0.2300358414649963,
+	0.2291149795055389,
+	0.2281964123249054,
+	0.2272801399230957,
+	0.2263661772012711,
+	0.2254545241594315,
+	0.2245451658964157,
+	0.2236381322145462,
+	0.2227333933115005,
+	0.2218309789896011,
+	0.2209308594465256,
+	0.2200330495834351,
+	0.2191375643014908,
+	0.2182443886995316,
+	0.2173535227775574,
+	0.2164649665355682,
+	0.2155787199735641,
+	0.2146947979927063,
+	0.2138131856918335,
+	0.2129338979721069,
+	0.2120569199323654,
+	0.2111822664737701,
+	0.2103099226951599,
+	0.2094398885965347,
+	0.2085721790790558,
+	0.2077067941427231,
+	0.2068437188863754,
+	0.2059829682111740,
+	0.2051245272159576,
+	0.2042684108018875,
+	0.2034146040678024,
+	0.2025631219148636,
+	0.2017139643430710,
+	0.2008671164512634,
+	0.2000225931406021,
+	0.1991803944110870,
+	0.1983405053615570,
+	0.1975029259920120,
+	0.1966676712036133,
+	0.1958347409963608,
+	0.1950041204690933,
+	0.1941758245229721,
+	0.1933498382568359,
+	0.1925261765718460,
+	0.1917048245668411,
+	0.1908857822418213,
+	0.1900690644979477,
+	0.1892546564340591,
+	0.1884425729513168,
+	0.1876327991485596,
+	0.1868253201246262,
+	0.1860201805830002,
+	0.1852173358201981,
+	0.1844168007373810,
+	0.1836185902357101,
+	0.1828226745128632,
+	0.1820290684700012,
+	0.1812377721071243,
+	0.1804487854242325,
+	0.1796621084213257,
+	0.1788777261972427,
+	0.1780956685543060,
+	0.1773158907890320,
+	0.1765384227037430,
+	0.1757632642984390,
+	0.1749904006719589,
+	0.1742198318243027,
+	0.1734515577554703,
+	0.1726855933666229,
+	0.1719219237565994,
+	0.1711605340242386,
+	0.1704014539718628,
+	0.1696446537971497,
+	0.1688901484012604,
+	0.1681379228830338,
+	0.1673879921436310,
+	0.1666403561830521,
+	0.1658950001001358,
+	0.1651519238948822,
+	0.1644111275672913,
+	0.1636726111173630,
+	0.1629363745450974,
+	0.1622024178504944,
+	0.1614707410335541,
+	0.1607413291931152,
+	0.1600141972303391,
+	0.1592893302440643,
+	0.1585667282342911,
+	0.1578464061021805,
+	0.1571283340454102,
+	0.1564125418663025,
+	0.1556989997625351,
+	0.1549877226352692,
+	0.1542786955833435,
+	0.1535719335079193,
+	0.1528674215078354,
+	0.1521651595830917,
+	0.1514651626348495,
+	0.1507674008607864,
+	0.1500718742609024,
+	0.1493786126375198,
+	0.1486875861883163,
+	0.1479987949132919,
+	0.1473122388124466,
+	0.1466279178857803,
+	0.1459458470344543,
+	0.1452659964561462,
+	0.1445883661508560,
+	0.1439129710197449,
+	0.1432397961616516,
+	0.1425688415765762,
+	0.1419001072645187,
+	0.1412335932254791,
+	0.1405692994594574,
+	0.1399072110652924,
+	0.1392473280429840,
+	0.1385896652936935,
+	0.1379341930150986,
+	0.1372809410095215,
+	0.1366298645734787,
+	0.1359810084104538,
+	0.1353343278169632,
+	0.1346898525953293,
+	0.1340475529432297,
+	0.1334074586629868,
+	0.1327695399522781,
+	0.1321337968111038,
+	0.1315002292394638,
+	0.1308688521385193,
+	0.1302396357059479,
+	0.1296125948429108,
+	0.1289877295494080,
+	0.1283650100231171,
+	0.1277444660663605,
+	0.1271260827779770,
+	0.1265098452568054,
+	0.1258957684040070,
+	0.1252838373184204,
+	0.1246740520000458,
+	0.1240664124488831,
+	0.1234609186649323,
+	0.1228575557470322,
+	0.1222563311457634,
+	0.1216572374105453,
+	0.1210602745413780,
+	0.1204654350876808,
+	0.1198727190494537,
+	0.1192821264266968,
+	0.1186936423182487,
+	0.1181072816252708,
+	0.1175230219960213,
+	0.1169408708810806,
+	0.1163608282804489,
+	0.1157828792929649,
+	0.1152070313692093,
+	0.1146332770586014,
+	0.1140616089105606,
+	0.1134920269250870,
+	0.1129245311021805,
+	0.1123591214418411,
+	0.1117957830429077,
+	0.1112345159053802,
+	0.1106753200292587,
+	0.1101181954145432,
+	0.1095631271600723,
+	0.1090101227164268,
+	0.1084591746330261,
+	0.1079102754592896,
+	0.1073634326457977,
+	0.1068186312913895,
+	0.1062758713960648,
+	0.1057351455092430,
+	0.1051964610815048,
+	0.1046598032116890,
+	0.1041251793503761,
+	0.1035925745964050,
+	0.1030619889497757,
+	0.1025334224104881,
+	0.1020068675279617,
+	0.1014823243021965,
+	0.1009597852826118,
+	0.1004392504692078,
+	0.0999207124114037,
+	0.0994041636586189,
+	0.0988896116614342,
+	0.0983770415186882,
+	0.0978664606809616,
+	0.0973578542470932,
+	0.0968512222170830,
+	0.0963465645909309,
+	0.0958438739180565,
+	0.0953431501984596,
+	0.0948443785309792,
+	0.0943475663661957,
+	0.0938527062535286,
+	0.0933597981929779,
+	0.0928688272833824,
+	0.0923798009753227,
+	0.0918927118182182,
+	0.0914075523614883,
+	0.0909243226051331,
+	0.0904430225491524,
+	0.0899636372923851,
+	0.0894861668348312,
+	0.0890106111764908,
+	0.0885369628667831,
+	0.0880652144551277,
+	0.0875953733921051,
+	0.0871274247765541,
+	0.0866613686084747,
+	0.0861971974372864,
+	0.0857349112629890,
+	0.0852745100855827,
+	0.0848159790039062,
+	0.0843593180179596,
+	0.0839045271277428,
+	0.0834515988826752,
+	0.0830005332827568,
+	0.0825513154268265,
+	0.0821039527654648,
+	0.0816584378480911,
+	0.0812147632241249,
+	0.0807729214429855,
+	0.0803329199552536,
+	0.0798947438597679,
+	0.0794583931565285,
+	0.0790238603949547,
+	0.0785911455750465,
+	0.0781602486968040,
+	0.0777311548590660,
+	0.0773038640618324,
+	0.0768783763051033,
+	0.0764546841382980,
+	0.0760327801108360,
+	0.0756126642227173,
+	0.0751943290233612,
+	0.0747777745127678,
+	0.0743629857897758,
+	0.0739499703049660,
+	0.0735387206077576,
+	0.0731292292475700,
+	0.0727214962244034,
+	0.0723155140876770,
+	0.0719112828373909,
+	0.0715087875723839,
+	0.0711080357432365,
+	0.0707090198993683,
+	0.0703117251396179,
+	0.0699161589145660,
+	0.0695223137736320,
+	0.0691301897168159,
+	0.0687397718429565,
+	0.0683510676026344,
+	0.0679640620946884,
+	0.0675787478685379,
+	0.0671951398253441,
+	0.0668132156133652,
+	0.0664329752326012,
+	0.0660544186830521,
+	0.0656775385141373,
+	0.0653023272752762,
+	0.0649287775158882,
+	0.0645569041371346,
+	0.0641866773366928,
+	0.0638181120157242,
+	0.0634511858224869,
+	0.0630859136581421,
+	0.0627222806215286,
+	0.0623602792620659,
+	0.0619999095797539,
+	0.0616411678493023,
+	0.0612840466201305,
+	0.0609285458922386,
+	0.0605746544897556,
+	0.0602223761379719,
+	0.0598716959357262,
+	0.0595226176083088,
+	0.0591751337051392,
+	0.0588292405009270,
+	0.0584849342703819,
+	0.0581422075629234,
+	0.0578010566532612,
+	0.0574614778161049,
+	0.0571234636008739,
+	0.0567870177328587,
+	0.0564521253108978,
+	0.0561187900602818,
+	0.0557870008051395,
+	0.0554567575454712,
+	0.0551280528306961,
+	0.0548008866608143,
+	0.0544752478599548,
+	0.0541511364281178,
+	0.0538285486400127,
+	0.0535074733197689,
+	0.0531879141926765,
+	0.0528698600828648,
+	0.0525533109903336,
+	0.0522382594645023,
+	0.0519247017800808,
+	0.0516126342117786,
+	0.0513020530343056,
+	0.0509929507970810,
+	0.0506853237748146,
+	0.0503791682422161,
+	0.0500744767487049,
+	0.0497712492942810,
+	0.0494694784283638,
+	0.0491691604256630,
+	0.0488702915608883,
+	0.0485728643834591,
+	0.0482768788933754,
+	0.0479823239147663,
+	0.0476892031729221,
+	0.0473975054919720,
+	0.0471072271466255,
+	0.0468183644115925,
+	0.0465309135615826,
+	0.0462448708713055,
+	0.0459602288901806,
+	0.0456769838929176,
+	0.0453951358795166,
+	0.0451146736741066,
+	0.0448355935513973,
+	0.0445578955113888,
+	0.0442815721035004,
+	0.0440066158771515,
+	0.0437330268323421,
+	0.0434608012437820,
+	0.0431899279356003,
+	0.0429204106330872,
+	0.0426522381603718,
+	0.0423854067921638,
+	0.0421199165284634,
+	0.0418557599186897,
+	0.0415929295122623,
+	0.0413314253091812,
+	0.0410712435841560,
+	0.0408123731613159,
+	0.0405548177659512,
+	0.0402985662221909,
+	0.0400436148047447,
+	0.0397899635136127,
+	0.0395376048982143,
+	0.0392865315079689,
+	0.0390367470681667,
+	0.0387882366776466,
+	0.0385410040616989,
+	0.0382950417697430,
+	0.0380503423511982,
+	0.0378069058060646,
+	0.0375647284090519,
+	0.0373237989842892,
+	0.0370841212570667,
+	0.0368456840515137,
+	0.0366084836423397,
+	0.0363725200295448,
+	0.0361377857625484,
+	0.0359042771160603,
+	0.0356719903647900,
+	0.0354409180581570,
+	0.0352110601961613,
+	0.0349824056029320,
+	0.0347549580037594,
+	0.0345287062227726,
+	0.0343036502599716,
+	0.0340797826647758,
+	0.0338570997118950,
+	0.0336356014013290,
+	0.0334152765572071,
+	0.0331961214542389,
+	0.0329781398177147,
+	0.0327613167464733,
+	0.0325456522405148,
+	0.0323311462998390,
+	0.0321177877485752,
+	0.0319055728614330,
+	0.0316945016384125,
+	0.0314845666289330,
+	0.0312757641077042,
+	0.0310680922120810,
+	0.0308615416288376,
+	0.0306561104953289,
+	0.0304517950862646,
+	0.0302485898137093,
+	0.0300464909523726,
+	0.0298454947769642,
+	0.0296455975621939,
+	0.0294467918574810,
+	0.0292490776628256,
+	0.0290524456650019,
+	0.0288568958640099,
+	0.0286624226719141,
+	0.0284690205007792,
+	0.0282766874879599,
+	0.0280854180455208,
+	0.0278952065855265,
+	0.0277060512453318,
+	0.0275179464370012,
+	0.0273308865725994,
+	0.0271448716521263,
+	0.0269598942250013,
+	0.0267759487032890,
+	0.0265930350869894,
+	0.0264111459255219,
+	0.0262302793562412,
+	0.0260504279285669,
+	0.0258715897798538,
+	0.0256937611848116,
+	0.0255169384181499,
+	0.0253411140292883,
+	0.0251662880182266,
+	0.0249924510717392,
+	0.0248196050524712,
+	0.0246477425098419,
+	0.0244768578559160,
+	0.0243069510906935,
+	0.0241380147635937,
+	0.0239700451493263,
+	0.0238030403852463,
+	0.0236369948834181,
+	0.0234719049185514,
+	0.0233077649027109,
+	0.0231445729732513,
+	0.0229823235422373,
+	0.0228210128843784,
+	0.0226606391370296,
+	0.0225011948496103,
+	0.0223426781594753,
+	0.0221850834786892,
+	0.0220284089446068,
+	0.0218726489692926,
+	0.0217177998274565,
+	0.0215638577938080,
+	0.0214108191430569,
+	0.0212586782872677,
+	0.0211074352264404,
+	0.0209570806473494,
+	0.0208076145499945,
+	0.0206590332090855,
+	0.0205113291740417,
+	0.0203645024448633,
+	0.0202185474336147,
+	0.0200734585523605,
+	0.0199292358011007,
+	0.0197858717292547,
+	0.0196433644741774,
+	0.0195017084479332,
+	0.0193609017878771,
+	0.0192209407687187,
+	0.0190818198025227,
+	0.0189435351639986,
+	0.0188060849905014,
+	0.0186694636940956,
+	0.0185336675494909,
+	0.0183986946940422,
+	0.0182645395398140,
+	0.0181311983615160,
+	0.0179986674338579,
+	0.0178669430315495,
+	0.0177360232919455,
+	0.0176059026271105,
+	0.0174765773117542,
+	0.0173480454832315,
+	0.0172202996909618,
+	0.0170933399349451,
+	0.0169671606272459,
+	0.0168417599052191,
+	0.0167171321809292,
+	0.0165932737290859,
+	0.0164701826870441,
+	0.0163478534668684,
+	0.0162262842059135,
+	0.0161054711788893,
+	0.0159854087978601,
+	0.0158660952001810,
+	0.0157475266605616,
+	0.0156296994537115,
+	0.0155126098543406,
+	0.0153962541371584,
+	0.0152806295081973,
+	0.0151657322421670,
+	0.0150515576824546,
+	0.0149381030350924,
+	0.0148253655061126,
+	0.0147133413702250,
+	0.0146020269021392,
+	0.0144914174452424,
+	0.0143815120682120,
+	0.0142723051831126,
+	0.0141637939959764,
+	0.0140559747815132,
+	0.0139488456770778,
+	0.0138424010947347,
+	0.0137366391718388,
+	0.0136315561830997,
+	0.0135271484032273,
+	0.0134234121069312,
+	0.0133203445002437,
+	0.0132179427891970,
+	0.0131162032485008,
+	0.0130151221528649,
+	0.0129146957769990,
+	0.0128149222582579,
+	0.0127157969400287,
+	0.0126173170283437,
+	0.0125194797292352,
+	0.0124222813174129,
+	0.0123257180675864,
+	0.0122297881171107,
+	0.0121344858780503,
+	0.0120398113504052,
+	0.0119457580149174,
+	0.0118523249402642,
+	0.0117595084011555,
+	0.0116673046723008,
+	0.0115757109597325,
+	0.0114847244694829,
+	0.0113943414762616,
+	0.0113045582547784,
+	0.0112153729423881,
+	0.0111267827451229,
+	0.0110387820750475,
+	0.0109513709321618,
+	0.0108645437285304,
+	0.0107782995328307,
+	0.0106926327571273,
+	0.0106075434014201,
+	0.0105230258777738,
+	0.0104390773922205,
+	0.0103556960821152,
+	0.0102728791534901,
+	0.0101906219497323,
+	0.0101089226081967,
+	0.0100277783349156,
+	0.0099471854045987,
+	0.0098671410232782,
+	0.0097876423969865,
+	0.0097086867317557,
+	0.0096302703022957,
+	0.0095523921772838,
+	0.0094750467687845,
+	0.0093982331454754,
+	0.0093219475820661,
+	0.0092461872845888,
+	0.0091709494590759,
+	0.0090962313115597,
+	0.0090220300480723,
+	0.0089483428746462,
+	0.0088751669973135,
+	0.0088024986907840,
+	0.0087303360924125,
+	0.0086586764082313,
+	0.0085875159129500,
+	0.0085168536752462,
+	0.0084466850385070,
+	0.0083770072087646,
+	0.0083078192546964,
+	0.0082391174510121,
+	0.0081708980724216,
+	0.0081031601876020,
+	0.0080359000712633,
+	0.0079691149294376,
+	0.0079028028994799,
+	0.0078369602560997,
+	0.0077715846709907,
+	0.0077066738158464,
+	0.0076422248966992,
+	0.0075782355852425,
+	0.0075147021561861,
+	0.0074516232125461,
+	0.0073889959603548,
+	0.0073268171399832,
+	0.0072650848887861,
+	0.0072037959471345,
+	0.0071429484523833,
+	0.0070825391449034,
+	0.0070225661620498,
+	0.0069630262441933,
+	0.0069039179943502,
+	0.0068452376872301,
+	0.0067869834601879,
+	0.0067291525192559,
+	0.0066717425361276,
+	0.0066147511824965,
+	0.0065581761300564,
+	0.0065020145848393,
+	0.0064462637528777,
+	0.0063909222371876,
+	0.0063359863124788,
+	0.0062814550474286,
+	0.0062273247167468,
+	0.0061735934577882,
+	0.0061202594079077,
+	0.0060673193074763,
+	0.0060147712938488,
+	0.0059626125730574,
+	0.0059108417481184,
+	0.0058594555594027,
+	0.0058084521442652,
+	0.0057578287087381,
+	0.0057075833901763,
+	0.0056577138602734,
+	0.0056082177907228,
+	0.0055590933188796,
+	0.0055103371851146,
+	0.0054619479924440,
+	0.0054139229469001,
+	0.0053662601858377,
+	0.0053189578466117,
+	0.0052720126695931,
+	0.0052254232577980,
+	0.0051791872829199,
+	0.0051333024166524,
+	0.0050877663306892,
+	0.0050425771623850,
+	0.0049977330490947,
+	0.0049532311968505,
+	0.0049090697430074,
+	0.0048652463592589,
+	0.0048217591829598,
+	0.0047786063514650,
+	0.0047357850708067,
+	0.0046932939440012,
+	0.0046511306427419,
+	0.0046092928387225,
+	0.0045677791349590,
+	0.0045265862718225,
+	0.0044857137836516,
+	0.0044451584108174,
+	0.0044049182906747,
+	0.0043649920262396,
+	0.0043253772892058,
+	0.0042860722169280,
+	0.0042470744810998,
+	0.0042083822190762,
+	0.0041699935682118,
+	0.0041319062002003,
+	0.0040941191837192,
+	0.0040566292591393,
+	0.0040194354951382,
+	0.0039825360290706,
+	0.0039459280669689,
+	0.0039096102118492,
+	0.0038735806010664,
+	0.0038378371391445,
+	0.0038023784290999,
+	0.0037672021426260,
+	0.0037323066499084,
+	0.0036976898554713,
+	0.0036633501295000,
+	0.0036292858421803,
+	0.0035954946652055,
+	0.0035619752015918,
+	0.0035287255886942,
+	0.0034957439638674,
+	0.0034630284644663,
+	0.0034305774606764,
+	0.0033983890898526,
+	0.0033664614893496,
+	0.0033347932621837,
+	0.0033033823128790,
+	0.0032722270116210,
+	0.0032413259614259,
+	0.0032106768339872,
+	0.0031802784651518,
+	0.0031501287594438,
+	0.0031202263198793,
+	0.0030905695166439,
+	0.0030611564870924,
+	0.0030319853685796,
+	0.0030030549969524,
+	0.0029743635095656,
+	0.0029459090437740,
+	0.0029176902025938,
+	0.0028897055890411,
+	0.0028619531076401,
+	0.0028344313614070,
+	0.0028071389533579,
+	0.0027800740208477,
+	0.0027532351668924,
+	0.0027266207616776,
+	0.0027002291753888,
+	0.0026740590110421,
+	0.0026481086388230,
+	0.0026223764289171,
+	0.0025968609843403,
+	0.0025715606752783,
+	0.0025464741047472,
+	0.0025215996429324,
+	0.0024969358928502,
+	0.0024724814575166,
+	0.0024482347071171,
+	0.0024241942446679,
+	0.0024003584403545,
+	0.0023767261300236,
+	0.0023532956838608,
+	0.0023300657048821,
+	0.0023070345632732,
+	0.0022842013277113,
+	0.0022615639027208,
+	0.0022391215898097,
+	0.0022168725263327,
+	0.0021948153153062,
+	0.0021729487925768,
+	0.0021512715611607,
+	0.0021297822240740,
+	0.0021084791515023,
+	0.0020873614121228,
+	0.0020664273761213,
+	0.0020456758793443,
+	0.0020251052919775,
+	0.0020047146826982,
+	0.0019845024216920,
+	0.0019644675776362,
+	0.0019446084043011,
+	0.0019249237375334,
+	0.0019054125295952,
+	0.0018860732670873,
+	0.0018669047858566,
+	0.0018479056889191,
+	0.0018290748121217,
+	0.0018104109913111,
+	0.0017919128295034,
+	0.0017735791625455,
+	0.0017554088262841,
+	0.0017374004237354,
+	0.0017195529071614,
+	0.0017018649959937,
+	0.0016843355260789,
+	0.0016669632168487,
+	0.0016497470205650,
+	0.0016326856566593,
+	0.0016157779609784,
+	0.0015990227693692,
+	0.0015824190340936,
+	0.0015659654745832,
+	0.0015496610431001,
+	0.0015335044590756,
+	0.0015174946747720,
+	0.0015016306424513,
+	0.0014859111979604,
+	0.0014703350607306,
+	0.0014549014158547,
+	0.0014396089827642,
+	0.0014244567137212,
+	0.0014094435609877,
+	0.0013945683604106,
+	0.0013798300642520,
+	0.0013652277411893,
+	0.0013507601106539,
+	0.0013364263577387,
+	0.0013222252018750,
+	0.0013081558281556,
+	0.0012942170724273,
+	0.0012804080033675,
+	0.0012667274568230,
+	0.0012531745014712,
+	0.0012397482059896,
+	0.0012264474062249,
+	0.0012132712872699,
+	0.0012002188013867,
+	0.0011872889008373,
+	0.0011744806542993,
+	0.0011617930140346,
+	0.0011492251651362,
+	0.0011367760598660,
+	0.0011244448833168,
+	0.0011122304713354,
+	0.0011001320090145,
+	0.0010881485650316,
+	0.0010762790916488,
+	0.0010645228903741,
+	0.0010528789134696,
+	0.0010413463460281,
+	0.0010299240238965,
+	0.0010186113649979,
+	0.0010074073215947,
+	0.0009963110787794,
+	0.0009853217052296,
+	0.0009744382696226,
+	0.0009636600152589,
+	0.0009529860108159,
+	0.0009424154413864,
+	0.0009319474338554,
+	0.0009215811733156,
+	0.0009113157866523,
+	0.0009011504589580,
+	0.0008910843171179,
+	0.0008811166044325,
+	0.0008712465059943,
+	0.0008614731486887,
+	0.0008517957176082,
+	0.0008422134560533,
+	0.0008327256073244,
+	0.0008233312983066,
+	0.0008140297723003,
+	0.0008048202726059,
+	0.0007957020425238,
+	0.0007866742089391,
+	0.0007777361315675,
+	0.0007688869955018,
+	0.0007601260440424,
+	0.0007514525204897,
+	0.0007428656681441,
+	0.0007343648467213,
+	0.0007259491831064,
+	0.0007176180370152,
+	0.0007093707099557,
+	0.0007012063870206,
+	0.0006931244279258,
+	0.0006851241341792,
+	0.0006772047490813,
+	0.0006693655741401,
+	0.0006616059690714,
+	0.0006539252353832,
+	0.0006463226745836,
+	0.0006387975881808,
+	0.0006313492776826,
+	0.0006239771610126,
+	0.0006166805396788,
+	0.0006094587151892,
+	0.0006023110472597,
+	0.0005952368956059,
+	0.0005882356781512,
+	0.0005813066381961,
+	0.0005744491936639,
+	0.0005676627042703,
+	0.0005609465879388,
+	0.0005543001461774,
+	0.0005477227969095,
+	0.0005412139580585,
+	0.0005347729311325,
+	0.0005283991922624,
+	0.0005220921593718,
+	0.0005158511339687,
+	0.0005096756503917,
+	0.0005035650101490,
+	0.0004975186311640,
+	0.0004915360477753,
+	0.0004856165905949,
+	0.0004797597066499,
+	0.0004739648429677,
+	0.0004682314174715,
+	0.0004625589062925,
+	0.0004569466982502,
+	0.0004513943276834,
+	0.0004459011543076,
+	0.0004404667124618,
+	0.0004350904200692,
+	0.0004297717532609,
+	0.0004245102172717,
+	0.0004193052300252,
+	0.0004141562967561,
+	0.0004090628935955,
+	0.0004040245548822,
+	0.0003990406985395,
+	0.0003941108298022,
+	0.0003892345121130,
+	0.0003844111633953,
+	0.0003796403470915,
+	0.0003749215684365,
+	0.0003702543035615,
+	0.0003656380868051,
+	0.0003610724525061,
+	0.0003565569058992,
+	0.0003520910104271,
+	0.0003476742422208,
+	0.0003433061938267,
+	0.0003389863704797,
+	0.0003347143065184,
+	0.0003304895944893,
+	0.0003263117105234,
+	0.0003221802762710,
+	0.0003180948260706,
+	0.0003140549233649,
+	0.0003100601024926,
+	0.0003061099560000,
+	0.0003022040764336,
+	0.0002983419981319,
+	0.0002945233136415,
+	0.0002907475864049,
+	0.0002870144380722,
+	0.0002833234320860,
+	0.0002796741900966,
+	0.0002760662464425,
+	0.0002724992518779,
+	0.0002689727698453,
+	0.0002654864219949,
+	0.0002620398008730,
+	0.0002586325572338,
+	0.0002552642836235,
+	0.0002519345725887,
+	0.0002486430457793,
+	0.0002453893830534,
+	0.0002421731333015,
+	0.0002389939763816,
+	0.0002358515339438,
+	0.0002327454276383,
+	0.0002296752936672,
+	0.0002266407973366,
+	0.0002236415748484,
+	0.0002206772478530,
+	0.0002177474962082,
+	0.0002148519706680,
+	0.0002119903074345,
+	0.0002091621718137,
+	0.0002063672436634,
+	0.0002036051591858,
+	0.0002008755982388,
+	0.0001981782406801,
+	0.0001955127518158,
+	0.0001928787969518,
+	0.0001902760704979,
+	0.0001877042377600,
+	0.0001851629931480,
+	0.0001826520165196,
+	0.0001801710022846,
+	0.0001777196448529,
+	0.0001752976386342,
+	0.0001729046634864,
+	0.0001705404429231,
+	0.0001682046713540,
+	0.0001658970286371,
+	0.0001636172673898,
+	0.0001613650529180,
+	0.0001591401378391,
+	0.0001569422020111,
+	0.0001547709834995,
+	0.0001526262058178,
+	0.0001505075779278,
+	0.0001484148233430,
+	0.0001463476801291,
+	0.0001443058717996,
+	0.0001422891218681,
+	0.0001402971829521,
+	0.0001383297785651,
+	0.0001363866613247,
+	0.0001344675547443,
+	0.0001325722114416,
+	0.0001307003694819,
+	0.0001288517960347,
+	0.0001270262291655,
+	0.0001252234214917,
+	0.0001234431256307,
+	0.0001216850942001,
+	0.0001199491016450,
+	0.0001182349005830,
+	0.0001165422581835,
+	0.0001148709343397,
+	0.0001132207034971,
+	0.0001115913328249,
+	0.0001099825967685,
+	0.0001083942697733,
+	0.0001068261190085,
+	0.0001052779334714,
+	0.0001037494876073,
+	0.0001022405631375,
+	0.0001007509490591,
+	0.0000992804343696,
+	0.0000978288007900,
+	0.0000963958373177,
+	0.0000949813475017,
+	0.0000935851130635,
+	0.0000922069375520,
+	0.0000908466172405,
+	0.0000895039556781,
+	0.0000881787418621,
+	0.0000868707938935,
+	0.0000855799153214,
+	0.0000843059096951,
+	0.0000830485805636,
+	0.0000818077533040,
+	0.0000805832314654,
+	0.0000793748258729,
+	0.0000781823691796,
+	0.0000770056576584,
+	0.0000758445312385,
+	0.0000746988007450,
+	0.0000735682915547,
+	0.0000724528290448,
+	0.0000713522385922,
+	0.0000702663455741,
+	0.0000691949899192,
+	0.0000681380042806,
+	0.0000670952067594,
+	0.0000660664445604,
+	0.0000650515576126,
+	0.0000640503712930,
+	0.0000630627328064,
+	0.0000620884820819,
+	0.0000611274663243,
+	0.0000601795254624,
+	0.0000592445103393,
+	0.0000583222608839,
+	0.0000574126352149,
+	0.0000565154768992,
+	0.0000556306404178,
+	0.0000547579802515,
+	0.0000538973472430,
+	0.0000530486067873,
+	0.0000522116097272,
+	0.0000513862178195,
+	0.0000505722891830,
+	0.0000497696892126,
+	0.0000489782833029,
+	0.0000481979295728,
+	0.0000474285006931,
+	0.0000466698656965,
+	0.0000459218863398,
+	0.0000451844389318,
+	0.0000444573925051,
+	0.0000437406197307,
+	0.0000430340005551,
+	0.0000423374040111,
+	0.0000416507064074,
+	0.0000409737949667,
+	0.0000403065387218,
+	0.0000396488248953,
+	0.0000390005370718,
+	0.0000383615515602,
+	0.0000377317555831,
+	0.0000371110363631,
+	0.0000364992811228,
+	0.0000358963770850,
+	0.0000353022151103,
+	0.0000347166824213,
+	0.0000341396698786,
+	0.0000335710756190,
+	0.0000330107905029,
+	0.0000324587090290,
+	0.0000319147256960,
+	0.0000313787422783,
+	0.0000308506532747,
+	0.0000303303586406,
+	0.0000298177601508,
+	0.0000293127595796,
+	0.0000288152587018,
+	0.0000283251629298,
+	0.0000278423740383,
+	0.0000273668010777,
+	0.0000268983476417,
+	0.0000264369245997,
+	0.0000259824373643,
+	0.0000255347986240,
+	0.0000250939192483,
+	0.0000246597101068,
+	0.0000242320838879,
+	0.0000238109550992,
+	0.0000233962364291,
+	0.0000229878460232,
+	0.0000225857002079,
+	0.0000221897134907,
+	0.0000217998094740,
+	0.0000214159026655,
+	0.0000210379166674,
+	0.0000206657696253,
+	0.0000202993869607,
+	0.0000199386886379,
+	0.0000195836018975,
+	0.0000192340467038,
+	0.0000188899521163,
+	0.0000185512453754,
+	0.0000182178500836,
+	0.0000178896953003,
+	0.0000175667119038,
+	0.0000172488289536,
+	0.0000169359755091,
+	0.0000166280842677,
+	0.0000163250861078,
+	0.0000160269155458,
+	0.0000157335034601,
+	0.0000154447880050,
+	0.0000151607009684,
+	0.0000148811805047,
+	0.0000146061629493,
+	0.0000143355846376,
+	0.0000140693846333,
+	0.0000138075020004,
+	0.0000135498758027,
+	0.0000132964469231,
+	0.0000130471553348,
+	0.0000128019437398,
+	0.0000125607548398,
+	0.0000123235304272,
+	0.0000120902159324,
+	0.0000118607549666,
+	0.0000116350929602,
+	0.0000114131744340,
+	0.0000111949484563,
+	0.0000109803595478,
+	0.0000107693567770,
+	0.0000105618883026,
+	0.0000103579022834,
+	0.0000101573505162,
+	0.0000099601811598,
+	0.0000097663451015,
+	0.0000095757950476,
+	0.0000093884827947,
+	0.0000092043601398,
+	0.0000090233806986,
+	0.0000088454989964,
+	0.0000086706686488,
+	0.0000084988441813,
+	0.0000083299819380,
+	0.0000081640382632,
+	0.0000080009685917,
+	0.0000078407301771,
+	0.0000076832820923,
+	0.0000075285815910,
+	0.0000073765872912,
+	0.0000072272587204,
+	0.0000070805558607,
+	0.0000069364386945,
+	0.0000067948681135,
+	0.0000066558054641,
+	0.0000065192130023,
+	0.0000063850520746,
+	0.0000062532858465,
+	0.0000061238779381,
+	0.0000059967919697,
+	0.0000058719920162,
+	0.0000057494426073,
+	0.0000056291091823,
+	0.0000055109571804,
+	0.0000053949524954,
+	0.0000052810614761,
+	0.0000051692513807,
+	0.0000050594899221,
+	0.0000049517439038,
+	0.0000048459824029,
+	0.0000047421740419,
+	0.0000046402869884,
+	0.0000045402916840,
+	0.0000044421572056,
+	0.0000043458544496,
+	0.0000042513538574,
+	0.0000041586263251,
+	0.0000040676436583,
+	0.0000039783776629,
+	0.0000038908001443,
+	0.0000038048840452,
+	0.0000037206025354,
+	0.0000036379285575,
+	0.0000035568359635,
+	0.0000034772988329,
+	0.0000033992916997,
+	0.0000033227893255,
+	0.0000032477666991,
+	0.0000031741994917,
+	0.0000031020633742,
+	0.0000030313346997,
+	0.0000029619895940,
+	0.0000028940053198,
+	0.0000028273586850,
+	0.0000027620271794,
+	0.0000026979889753,
+	0.0000026352215627,
+	0.0000025737040232,
+	0.0000025134145289,
+	0.0000024543323889,
+	0.0000023964369120,
+	0.0000023397078621,
+	0.0000022841247755,
+	0.0000022296680982,
+	0.0000021763182758,
+	0.0000021240562091,
+	0.0000020728630261,
+	0.0000020227198547,
+	0.0000019736082777,
+	0.0000019255105599,
+	0.0000018784083977,
+	0.0000018322843971,
+	0.0000017871213913,
+	0.0000017429021000,
+	0.0000016996099248,
+	0.0000016572281538,
+	0.0000016157405298,
+	0.0000015751310229,
+	0.0000015353839444,
+	0.0000014964834918,
+	0.0000014584145447,
+	0.0000014211619828,
+	0.0000013847107994,
+	0.0000013490465562,
+	0.0000013141547015,
+	0.0000012800211380,
+	0.0000012466318822,
+	0.0000012139732917,
+	0.0000011820318377,
+	0.0000011507941053,
+	0.0000011202471342,
+	0.0000010903778502,
+	0.0000010611738617,
+	0.0000010326225492,
+	0.0000010047116348,
+	0.0000009774290675,
+	0.0000009507630239,
+	0.0000009247017942,
+	0.0000008992339531,
+	0.0000008743481885,
+	0.0000008500333593,
+	0.0000008262786082,
+	0.0000008030731919,
+	0.0000007804065945,
+	0.0000007582684134,
+	0.0000007366484738,
+	0.0000007155367712,
+	0.0000006949234717,
+	0.0000006747989687,
+	0.0000006551536558,
+	0.0000006359782674,
+	0.0000006172636517,
+	0.0000005990008276,
+	0.0000005811809842,
+	0.0000005637953677,
+	0.0000005468355084,
+	0.0000005302930504,
+	0.0000005141598081,
+	0.0000004984277666,
+	0.0000004830889679,
+	0.0000004681356813,
+	0.0000004535603182,
+	0.0000004393554889,
+	0.0000004255138322,
+	0.0000004120281858,
+	0.0000003988915864,
+	0.0000003860970992,
+	0.0000003736380449,
+	0.0000003615078015,
+	0.0000003496999170,
+	0.0000003382080536,
+	0.0000003270260436,
+	0.0000003161478048,
+	0.0000003055673972,
+	0.0000002952790794,
+	0.0000002852771672,
+	0.0000002755560899,
+	0.0000002661104190,
+	0.0000002569349249,
+	0.0000002480244063,
+	0.0000002393738328,
+	0.0000002309782730,
+	0.0000002228329237,
+	0.0000002149331095,
+	0.0000002072742546,
+	0.0000001998519252,
+	0.0000001926617870,
+	0.0000001856996334,
+	0.0000001789613435,
+	0.0000001724429382,
+	0.0000001661405520,
+	0.0000001600504049,
+	0.0000001541688590,
+	0.0000001484923615,
+	0.0000001430174734,
+	0.0000001377408836,
+	0.0000001326593519,
+	0.0000001277697947,
+	0.0000001230691851,
+	0.0000001185546239,
+	0.0000001142233188,
+	0.0000001100725768,
+	0.0000001060998116,
+	0.0000001023025362,
+	0.0000000986783633,
+	0.0000000952250119,
+	0.0000000919403007,
+	0.0000000888221479,
+	0.0000000858685780,
+	0.0000000830777012,
+	0.0000000804477409,
+	0.0000000779770133,
+	0.0000000756639338,
+	0.0000000735070103,
+	0.0000000715048571,
+	0.0000000696561813,
+	0.0000000679597889,
+	0.0000000664145787,
+	0.0000000650195489,
+	0.0000000637737969,
+	0.0000000626765058,
+	0.0000000617269649,
+	0.0000000609245632,
+	0.0000000602687606,
+	0.0000000597591452,
+	0.0000000593953757,
+	0.0000000591772142,
+	0.0000000591045186
+};
+
+
+#endif // !_WINDOWING_H_
+
+

diff -r b9debc14d077 -r 9dd7c64b4a64 LICENSE.md
--- a/LICENSE.md	Tue May 11 11:08:02 2021 +0000
+++ b/LICENSE.md	Mon Dec 06 05:22:28 2021 +0000
@@ -1,205 +1,185 @@
 Analog Devices, Inc. (ADI)
-Source Code Software License Agreement
-20210129-LWSCMOS-CTSLA
-BEFORE YOU SELECT THE "I ACCEPT" BUTTON AT THE BOTTOM OF THIS WINDOW, CAREFULLY READ THE TERMS AND CONDITIONS SET FORTH BELOW.  BY SELECTING THE "I ACCEPT" BUTTON BELOW, OR DOWNLOADING, REPRODUCING, DISTRIBUTING OR OTHERWISE USING THE SOFTWARE, YOU AGREE TO BE BOUND BY THE TERMS AND CONDITIONS SET FORTH BELOW.  IF YOU DO NOT AGREE TO ALL OF THE TERMS AND CONDITIONS, SELECT THE 'I DO NOT ACCEPT' BUTTON AND YOU MUST NOT DOWNLOAD, INSTALL OR OTHERWISE USE THE SOFTWARE.
+EVALUATION LICENSE AGREEMENT 
+20200427-CN0540EC-CTELA
+
+This Evaluation License Agreement (the “Agreement”) is a legal agreement between Analog Devices, Inc., a Massachusetts corporation, with its principal office at One Technology Way, Norwood, Massachusetts, USA 02062 (“Analog Devices”) and you (personally or on behalf of your employer, as applicable) (“Licensee”) for the software and related documentation that accompanies this Agreement (the “Licensed Software”).   YOU AGREE THAT YOU ARE BOUND BY THE TERMS AND CONDITIONS OF THIS AGREEMENT BY DOWNLOADING, INSTALLING, COPYING OR USING THE SOFTWARE. IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY OR USE THE SOFTWARE.  YOU REPRESENT THAT YOU ARE OVER THE AGE OF 18 AND HAVE THE CAPACITY AND AUTHORITY TO BIND YOURSELF OR YOUR EMPLOYER, AS APPLICABLE, TO THE TERMS OF THIS AGREEMENT. 
+
+The Licensed Software consists of (a) embedded software (including firmware) designed to operate in an Analog Devices processor / product (“Embedded Software”) and / or (b) application software designed to run on personal computers (“PC Software”).  
+
+1. License.  Subject to the terms and conditions of this Agreement, Analog Devices grants to Licensee a non-exclusive, non-transferable, non-sublicensable license to:
+(a) internally use and copy the Embedded Software (and modify the Embedded Software if it is provided in source code form) for the sole purpose of evaluating the use of Embedded Software with Analog Devices’ processors / products; and  
+(b) internally use and copy the PC Software for the sole purpose of evaluating use of the PC Software with Analog Devices’ processors / products.  Such evaluation may include configuring, monitoring and controlling Analog Devices processors / products solely in order to evaluate use of the PC Software with Analog Devices’ processors / products. 
 
-DOWNLOADING, REPRODUCING, DISTRIBUTING OR OTHERWISE USING THE SOFTWARE CONSTITUTES ACCEPTANCE OF THIS LICENSE.  THE SOFTWARE MAY NOT BE USED EXCEPT AS EXPRESSLY AUTHORIZED UNDER THIS LICENSE. 
+2. Restrictions and Conditions.  The license granted in Section 1 is conditioned on full compliance with this Section 2 and the other obligations under this Agreement.  
+(a) Licensee shall not modify, reverse engineer, decompile, disassemble or create derivative works of the Licensed Software except and only to the extent that such activity is expressly permitted (i) pursuant to Section 1 above or (ii) by applicable law notwithstanding this limitation. 
+(b) In no event shall Licensee (i) sublicense, rent, lease, permit time-sharing or otherwise make available, transfer, deliver, disclose, or distribute the Licensed Software to any third party or (ii) use the Licensed Software for any commercial purpose, including, without limitation, the manufacture of products intended for commercial sale or the development of any other software, application, product or service for commercial release. 
+(c) The Licensed Software may not be used with any processors / products other than Analog Devices’ processors / products or for any other purpose.  
+(d) Licensee shall not engage in any activities with respect to the Licensed Software that would cause the Licensed Software, in whole or in part to become subject to any terms of an Excluded License.  An “Excluded License” means any license, including licenses for “open source” code (such as defined by the Free Software Foundation), that requires as a condition of use, modification, and/or distribution of the software subject to such Excluded License, that such software or other software combined and/or distributed with such software be (i) disclosed or distributed in source code form; (ii) licensed for the purpose of making derivative works; or (iii) redistributable at no charge.  Examples of Excluded Licenses include, without limitation, the GNU General Public License, the GNU Lesser General Public License, the Mozilla Public License and the Microsoft Reciprocal License.  The restrictions of this section apply regardless of whether the Licensed Software is intended or designed to run in an environment that includes software under an Excluded License.  Any license, agreement or other document issued, entered into or granted by Licensee that purports to apply any Excluded License to any portion of the Licensed Software shall be null and void with regard to the Licensed Software. 
+(e) If Analog Devices elects to make any update, upgrade or new version of the Licensed Software (“Updates”) available to Licensee, such Updates shall be deemed to be the Licensed Software under this Agreement.  If requested by Analog Devices, Licensee shall only use the latest version of the Licensed Software (including Updates).  Analog Devices shall have no obligation to provide support or Updates of any kind.
+(f) In no event shall Licensee remove any copyright or other intellectual property notice or other legend contained on or in copies of the Licensed Software or displayed by the Licensed Software.
+(g) To the extent there are any specifications and/or user manuals for the Licensed Software, as an additional restriction under this Agreement (and in no way expanding any rights under this Agreement), the Licensed Software may not be used in any manner that is inconsistent with such specifications and/or user manuals.  For the avoidance of doubt, Licensee may not distribute the Licensed Software under any circumstances.  
 
-The software is protected by copyright law and international copyright treaties.  
+3. Ownership.  Licensee acknowledges and agrees that Analog Devices and its licensors and suppliers (as applicable) retain all right, title and interest in the Licensed Software and derivative works thereof, including all related patent, copyright and other intellectual property rights in any of the foregoing, and that Licensee’s rights to the Licensed Software are limited to those expressly provided for in Section 1 above (subject to the conditions and restrictions in this Section 3).  Licensee shall not take any action inconsistent with such title and ownership.  Any use of the Licensed Software for any purpose other than as expressly licensed hereunder is outside the scope of this Agreement.  All rights not expressly granted in this Agreement are reserved to Analog Devices.  It is agreed that because of the proprietary nature of the Licensed Software, Analog Devices’ remedies at law for a breach by the Licensee of its obligations under this Agreement or for use of the Licensed Software beyond the scope of the license granted herein will be inadequate and that Analog Devices will, in the event of such breach, be entitled to equitable relief, including injunctive relief, without the posting of any bond, in addition to all other remedies provided under this Agreement or available at law.
 
-1. License:  Subject to the terms and conditions of this license, the software may be reproduced, modified and distributed in source code and object code form.
+4. Publicity. Notwithstanding anything in this Agreement, Licensee may not use any trademark or trade name of Analog Devices or make any public announcement regarding the existence of this Agreement without Analog Devices’ prior written consent.  Licensee may not publish or provide the results of any benchmark or comparison tests run on the Licensed Software to any third party without the prior written consent of Analog Devices.  
+
+5. Feedback. Licensee may from time to time provide modifications, enhancements, improvements, code, suggestions, ideas, comments or other feedback (“Feedback”) to Analog Devices related to the Licensed Software.  Licensee agrees that all Feedback is and shall be given entirely voluntarily. To the extent Licensee provides such Feedback, Licensee (on behalf of itself and its affiliates) hereby grants to Analog Devices and its affiliates a non-exclusive, irrevocable, perpetual, worldwide, royalty-free, transferable license, with the right to sublicense, under Licensee’s (and its affiliates’) intellectual property, to use and disclose Feedback in any manner Analog Devices or its affiliates choose, including, without limitation, displaying, performing, copying, making, having made, using, selling and otherwise disposing of Analog Devices’ and its affiliates and their respective licensees’ software, applications, products or services embodying such Feedback in any manner and via any media, without reference to its source or other obligation to Licensee and even if the Feedback is designated as confidential.
 
-2. Conditions: 
-(a) Any distribution of the software must include a complete copy of this license and retain all copyright and other proprietary notices.  The software that is distributed (including modified versions of the software) shall be subject to the terms and conditions of this license.  
-(b) The software may not be combined or merged with other software in any manner that would cause the software to become subject to terms and conditions which differ from those of this license.
-(c) The software is licensed solely and exclusively for use with processors / products manufactured by or for ADI.
-(d) Licensee shall not use the name or any trademark of ADI (including those of its licensors) or any contributor to endorse or promote products without prior written consent of the owner of the name or trademark.  The term “contributor” means any person or entity that modifies or distributes the software.  
-(e) Modified versions of the Software must be conspicuously marked as such.
-(f) Use of the software may or may not infringe patent rights of one or more patent holders.  This license does not alleviate the obligation to obtain separate licenses from patent holders to use the software.
-(g) All rights not expressly granted hereunder are reserved.  
-(h) This license shall be governed by the laws of Massachusetts, without regard to its conflict of laws rules.  The software shall only be used in compliance with all applicable laws and regulations, including without limitation export control laws.  
+6.  Confidentiality.  
+(a) The Licensed Software and any accompanying documentation, and any other information which a reasonable person would understand is of a confidential or proprietary nature, shall be deemed to be “Confidential Information” of Analog Devices whether or not it is identified in writing as “Confidential.”  Any other materials or information identified by Analog Devices as “Confidential” or with any similar notice shall also be treated as Confidential Information of Analog Devices under this Agreement.  Analog Devices Confidential Information shall include, without limitation, software and information of Analog Devices’ affiliates, suppliers and licensors.  
+(b) Licensee shall protect the confidentiality of Analog Devices Confidential Information. Without limitation, Licensee agrees: (i) not to disclose or otherwise permit any other person or entity access to, in any manner, Confidential Information, or any part thereof in any form whatsoever; except that such disclosure or access shall be permitted to an employee of Licensee (x) requiring access to Confidential Information in the course of his or her employment in connection with this Agreement, (y) who is subject to written confidentiality obligations at least as protective with respect to Confidential Information as the terms and conditions in this Agreement and (z) who complies with all other applicable provisions of this Agreement; (ii) to notify Analog Devices promptly and in writing of the circumstances surrounding any suspected possession, use or knowledge of Confidential Information other than those authorized by this Agreement; and (iii) not to use Confidential Information for any purpose other than as explicitly set forth herein.
+(c) Nothing in this Section 6 shall restrict Licensee with respect to information if such information:  (i) was rightfully possessed by Licensee before it was received from Analog Devices; (ii) is independently developed by Licensee without reference to Confidential Information; (iii) is subsequently furnished to Licensee by a third party not under any obligation of confidentiality with respect to such information, and without restrictions on use or disclosure; or (iv) is or becomes public or available to the general public otherwise than through any act or default of Licensee.
+(d) Because the unauthorized use, transfer or dissemination of any Confidential Information may diminish substantially the value of such materials and may irreparably harm Analog Devices, if Licensee breaches the provisions of this Section 6, Analog Devices shall, without limiting its other rights or remedies, be entitled to equitable relief, including but not limited to injunctive relief.
+
+7. Third Party Software.  The Licensed Software may be accompanied by or include software made available by one or more third parties (“Third Party Software”).  Each portion of Third Party Software is subject to its own separate software license terms and conditions (“Third Party Licenses”).  The Third Party Licenses for Third Party Software delivered with the Licensed Software may be set forth or identified (by URL or otherwise) in (i) Appendix A to this Agreement (if any), (ii) the applicable software header or footer text, (iii) a text file located in the directory of the applicable Third Party Software component, (iv) software documentation, (v) in connection with any Update of the Licensed Software or its documentation, and/or (vi) such other location customarily used for licensing terms. The use of each portion of Third Party Software is subject to the Third Party Licenses, and Licensee agrees that Licensee’s use of any Third Party Software is bound by the applicable Third Party License.  Licensee agrees to review and comply with all applicable Third Party Licenses prior to any use or distribution of any Third Party Software.  Third Party Software is provided on an “as is” basis without any representation, warranty or liability of any kind.  Analog Devices (including its licensors and suppliers) shall have no liability or responsibility for the operation or performance of the Third Party Software and shall not be liable for any damages, costs, or expenses, direct or indirect, arising out of the performance or failure to perform of the Third Party Software.  Analog Devices (including its licensors and suppliers) shall be entitled to the benefit of any and all limitations of liability and disclaimers of warranties contained in the Third Party Licenses. 
+
+8. Required Consents; Indemnification. Licensee acknowledges that use of the Licensed Software may require Licensee to obtain licenses to intellectual property or other consents from one or more third parties.  Licensee is responsible for obtaining any and all such required licenses or consents regarding the Licensed Software and for the performance of any and all required tests or analysis necessary or appropriate for the determination of the suitability of the Licensed Software for its purposes.  Without limitation, Licensee is responsible for obtaining, maintaining and complying with third party licenses in connection with any Industry Standard hereafter defined below (including related intellectual property rights) applicable to the Licensed Software.  "Industry Standard" means any standard, protocol or specification that is promulgated by any standards development organization, consortium, trade association, special interest group, or like group or entity, for the purpose of widespread adoption.  By way of non-limiting examples, industry standards and specifications may include without limitation technical specifications promulgated by organizations such as the International Telecommunications Union (ITU), International Standards Organization (ISO), International Electrotechnical Commission (IEC), 3rd Generation Partnership Project (3GPP), Moving Picture Experts Group (MPEG), World Wide Web Consortium (W3C), Internet Engineering Task Force (IETF), OpenFabrics Alliance, Open Mobile Alliance, UPnP Forum, USB Implementers Forum, Institute of Electrical and Electronics Engineers (IEEE), American National Standards Institute (ANSI), Telecommunications Industry Association (TIA), AUTomotive Open System Architecture (AUTOSAR), High-bandwidth Digital Content Protection (HDCP), High-Definition Multimedia Interface (HDMI), Digital Transmission Content Protection (DTCP), Digital Transmission Licensing Administrator (DTLA), and Ethernet POWERLINK Standardization Group (EPSG).  Licensee shall defend, indemnify and hold Analog Devices, its affiliates, licensors and suppliers, and their respective officers, directors, employees and agents (each an “Indemnified Party”) harmless from and against any damages, fines, penalties, assessments, liabilities, costs and expenses (including reasonable attorneys’ fees and court costs) in the event that any claim is brought against an Indemnified Party arising or alleged to arise directly or indirectly from (i) Licensee’s possession, use, distribution or other exploitation of the Licensed Software or Third Party Software, or (ii) Licensee’s failure to obtain any required license or consent with respect to the Licensed Software or Third Party Software.  
+
+9. Term and Termination.  
+(a) The term of this Agreement is for a period of six (6) months commencing on the date the Licensed Software is first received by Licensee from Analog Devices or its authorized distributor (“Term”).  This Agreement is effective until the expiration of the Term or until terminated in accordance with this Section.  Either party may terminate this Agreement at any time by giving written notice to the other party. This Agreement shall immediately automatically terminate in the event of any failure by Licensee to comply with any term or condition of the Agreement. In the event of termination or expiration (i) all licenses granted to Licensee immediately expire and (ii) Licensee must immediately cease using the Licensed Software and permanently delete all copies of the Licensed Software and all of its component parts, including any backup or archival copies.  The provisions of Sections 2 through 19 shall survive any termination or expiration of this Agreement according to their terms.
+(b) THE LICENSED SOFTWARE MAY BE TIME-SENSITIVE AND MAY NOT FUNCTION UPON EXPIRATION OF TERM.  NOTICE OF EXPIRATION WILL NOT BE GIVEN, SO LICENSEE NEEDS TO PLAN FOR THE EXPIRATION DATE.  In order to protect against unauthorized use of the Licensed Software in commercial applications, Analog Devices may have integrated copy protection into the evaluation software.  Typical protection may include a time-out or periodic beep on audio software or a watermark on imaging software.  
+
+10. DISCLAIMER OF WARRANTIES.  
+THE LICENSED SOFTWARE, THIRD PARTY SOFTWARE AND ANY SUPPORT ARE PROVIDED "AS IS" WITHOUT REPRESENTATION OR WARRANTY OF ANY KIND, AND ANALOG DEVICES, FOR ITSELF AND ITS AFFILIATES, HEREBY DISCLAIMS ALL REPRESENTATIONS AND WARRANTIES, WHETHER EXPRESS OR IMPLIED, ORAL OR WRITTEN, WITH RESPECT TO THE LICENSED SOFTWARE AND THIRD PARTY SOFTWARE AND ANY SUPPORT, INCLUDING, BUT NOT LIMITED TO, ANY EXPRESS OR IMPLIED WARRANTIES OF MERCHANTABILITY; FITNESS FOR ANY PARTICULAR PURPOSE; QUALITY AND ACCURACY OF INFORMATIONAL CONTENT; NON-INFRINGEMENT; QUIET ENJOYMENT; AND TITLE.  LICENSEE AGREES THAT ANY EFFORTS BY ANALOG DEVICES OR ITS AFFILIATES TO MODIFY OR UPDATE THE LICENSED SOFTWARE OR THIRD PARTY SOFTWARE OR PROVIDE SUPPORT SHALL NOT BE DEEMED A WAIVER OF THESE LIMITATIONS, AND THAT ANY ANALOG DEVICES WARRANTIES SHALL NOT BE DEEMED TO HAVE FAILED OF THEIR ESSENTIAL PURPOSE.   
 
-3. WARRANTY DISCLAIMER: THE SOFTWARE AND ANY RELATED INFORMATION AND/OR ADVICE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT REPRESENTATIONS, GUARANTEES OR WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, ORAL OR WRITTEN, INCLUDING WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. There is no obligation to provide software support or updates.  The Software is not fault-tolerant and is not intended for use in high risk applications, including without limitation in the operation of nuclear facilities, aircraft navigation or control systems, air traffic control, life support machines, weapons systems, autonomous driving or other safety critical automotive applications, or any other application in which the failure of the software could lead to death, personal injury, or severe physical or environmental damages.  The software is not authorized to be used under such circumstances.
+11.  Limitation of Liability.  
+(a) TO THE MAXIMUM EXTENT PERMITTED BY LAW, ANALOG DEVICES (INCLUDING ITS AFFILIATES) SHALL NOT BE LIABLE FOR ANY DAMAGES ARISING FROM OR RELATED TO THE LICENSED SOFTWARE, THIRD PARTY SOFTWARE, THEIR USE OR ANY RELATED INFORMATION AND/OR SERVICES, INCLUDING BUT NOT LIMITED TO ANY INDIRECT, INCIDENTAL, SPECIAL, PUNITIVE, EXEMPLARY, CONSEQUENTIAL OR ANALOGOUS DAMAGES (INCLUDING WITHOUT LIMITATION ANY DAMAGES RESULTING FROM LOSS OF USE, DATA, REVENUE, PROFITS, OR SAVINGS, COMPUTER DAMAGE, INTERRUPTION OF BUSINESS, OR ANY OTHER CAUSE), UNDER ANY LEGAL THEORY (INCLUDING WITHOUT LIMITATION CONTRACT, WARRANTY, TORT, NEGLIGENCE, STRICT OR PRODUCT LIABILITY), EVEN IF IT HAS BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES.  
+(b) IN NO EVENT SHALL ANALOG DEVICES’ CUMULATIVE LIABILITY FOR DAMAGES TO LICENSEE FOR ANY AND ALL CAUSES WHATSOEVER, REGARDLESS OF THE FORM OF ANY CLAIMS OR ACTIONS, EXCEED ONE HUNDRED U.S. DOLLARS ($100.00 U.S.).  ANALOG DEVICES’ AFFILIATES, LICENSORS AND SUPPLIERS SHALL HAVE NO LIABILITY WHATSOEVER UNDER THIS AGREEMENT OR IN CONNECTION WITH THE LICENSED SOFTWARE OR ITS USE.
+(c) Some jurisdictions do not permit the exclusion or limitation of liability for consequential, incidental or other damages, and, as such, some portion of the above limitation may not apply to Licensee.  In such jurisdictions, Analog Devices' liability is limited to the greatest extent permitted by law.
+
+12. Choice of Law.  This Agreement and any dispute related to the Licensed Software shall be governed by the laws of the Commonwealth of Massachusetts, United States of America, without reference to its principles of conflicts of laws, and, as to matters affecting copyrights, trademarks and patents, in addition, by applicable United States federal law.  The parties agree that the jurisdiction and venue of any action with respect to this Agreement shall be in a court of competent subject matter jurisdiction located in Boston, Massachusetts, and each of the parties hereby agrees to submit itself to the exclusive jurisdiction and venue of such courts for the purpose of any such action, except that Analog Devices may seek equitable (including injunctive) relief and enforce judgements in any venue of its choosing. Licensee hereby submits to personal jurisdiction in such courts. The parties hereto specifically exclude the United Nations Convention on Contracts for the International Sale of Goods and the Uniform Computer Information Transactions Act from this Agreement.  The parties hereto waive any statute, law, or regulation that might provide an alternative law or forum or to have this Agreement written in any language other than English.
+
+13. U.S. Government Restricted Rights. If the Licensed Software or documentation provided by Analog Devices or its suppliers is procured by or on behalf of the United States Government, the Government agrees that such software or documentation is “commercial computer software” or “commercial computer software documentation” and that absent a written agreement to the contrary, the Government’s rights with respect to such Licensed Software or documentation are limited by the terms of this Agreement, pursuant to FAR § 12.212(a) and/or DFARS § 227.7202-1(a), as applicable. 
 
-4. LIMITATION OF LIABILITY: TO THE MAXIMUM EXTENT PERMITTED BY LAW ADI (INCLUDING ITS LICENSORS) AND CONTRIBUTORS SHALL NOT BE LIABLE FOR ANY DAMAGES ARISING FROM OR RELATED TO THE SOFTWARE, ITS USE OR ANY RELATED INFORMATION AND/OR SERVICES, INCLUDING BUT NOT LIMITED TO ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, PUNITIVE, EXEMPLARY, CONSEQUENTIAL OR ANALOGOUS DAMAGES (INCLUDING WITHOUT LIMITATION ANY DAMAGES RESULTING FROM LOSS OF USE, DATA, REVENUE, PROFITS, OR SAVINGS, COMPUTER DAMAGE OR ANY OTHER CAUSE), UNDER ANY LEGAL THEORY (INCLUDING WITHOUT LIMITATION CONTRACT, WARRANTY, TORT, NEGLIGENCE, STRICT OR PRODUCT LIABILITY), EVEN IF IT HAS BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES.  Some jurisdictions do not permit the exclusion or limitation of liability for consequential, incidental or other damages, and, as such, some portion of the above limitation may not apply.  In such jurisdictions, liability is limited to the greatest extent permitted by law.
-5.  Third Party Software:  The software may be accompanied by or include software made available by one or more third parties (Third Party Software).  Each portion of Third Party Software is subject to its own separate software license terms and conditions (“Third Party Licenses”).  The Third Party Licenses for Third Party Software delivered with the software are set forth or identified (by url or otherwise) in (i) Appendix A to this license (if any), (ii) the applicable software header or footer text, (iii) a text file located in the directory of the applicable Third Party Software component and/or (iv) such other location customarily used for licensing terms. The use of each portion of Third Party Software is subject to the Third Party Licenses, and you agree that your use of any Third Party Software is bound by the applicable Third Party License.  You agree to review and comply with all applicable Third Party Licenses prior to any use or distribution of any Third Party Software.  Third Party Software is provided on an “as is” basis without any representation, warranty or liability of any kind.  ADI (including its licensors) and contributors shall have no liability or responsibility for the operation or performance of the Third Party Software and shall not be liable for any damages, costs, or expenses, direct or indirect, arising out of the performance or failure to perform of the Third Party Software.  ADI (including its licensors) and contributors shall be entitled to the benefit of any and all limitations of liability and disclaimers of warranties contained in the Third Party Licenses. For the avoidance of doubt, this license does not alter, limit or expand the terms and conditions of, or rights granted to you pursuant to, Third Party Licenses.  
- 
-Appendix A - Third Party License
+14.  Export.  Licensee shall only use the Licensed Software in compliance with all applicable laws and regulations, including without limitation export control laws.  Licensee agrees that Licensee will not directly or indirectly export the Licensed Software to another country except in full compliance with all applicable United States Federal Laws and Regulations and other laws and regulations relating to exports and imports.  Licensee will not export/re-export, directly or indirectly, any software, information or technical data acquired under this Agreement or the "direct product" thereof to any country for which the United States Government or any agency thereof, at the time of export, requires an export license or other governmental approval, without first obtaining such license or approval.  The term "direct product" as used herein means the immediate product (including processes and services) produced directly by the use of the technical data or information.  In addition to the above, the Licensed Software and/or any "direct product" thereof, may not be used by, or exported, transferred or re-exported to (i) any U.S. or U.N. or EU-sanctioned or embargoed country, or to nationals or residents of such countries;  (ii) any person , entity, organization, or other party identified on the U.S. Department of Treasury’s lists of “Specially Designated Nationals and Blocked Persons” (iii) any associations, individuals, companies, entities, organizations found in the U.S. Department of Commerce’s Table of Denial Orders or Entity List, as published and revised from time to time (collectively known as the "Denied Parties List" or "Prohibited Parties List"); and/or (iv) any unauthorized or prohibited end-user engaged in any prohibited activities related to weapons of mass destruction, including without limitation, activities related to the design, development, production or use of nuclear weapons, materials, or facilities, missile or the support of missile projects, and chemical or biological weapons.  Licensee understands that the foregoing obligations are legal requirements and agree that they shall survive any expiration or termination of this Agreement.
+
+15. Compliance with Laws; Taxes.  Licensee shall comply with all laws, legislation, rules, regulations, governmental requirements and industry standards with respect to the Licensed Software, and the performance by Licensee of its obligations hereunder, existing in any applicable jurisdiction.  In the event that this Agreement is required to be registered with any governmental authority, Licensee shall notify Analog Devices in writing and cause such registration to be made and shall bear any expense or tax payable in respect thereof.  Licensee shall bear any and all expenses and pay any and all taxes that may be payable in relation to this Agreement.
+
+16. Assignment.  This Agreement is personal to Licensee and Licensee may not transfer, sublicense, lease, rent, or assign its rights under this Agreement, and any such attempt shall be null and void. Analog Devices may assign, transfer, or sublicense this Agreement or any rights or obligations hereunder at any time in its sole discretion.
 
-Mbed-OS
+17.  Waiver; Modification; Severability.  No waiver, consent, modification or change of terms of this Agreement shall bind either party unless in writing signed by both parties, and then such waiver, consent, modification or change shall be effective only in the specific instance and for the specific purpose given.  If any provision of this Agreement is unenforceable, such provision shall be enforced to the extent possible under applicable law, and the remaining provisions will remain in effect.
 
-Download page:https://github.com/ARMmbed/mbed-os/
-Online license: 	https://github.com/ARMmbed/mbed-os/blob/master/LICENSE-apache-2.0.txt
+18.  Audit.  Analog Devices shall have the right upon ten (10) days prior written notice to audit Licensee’s compliance with the terms of this Agreement during normal business hours.  In connection with such audit, Analog Devices shall have access to all reasonably requested documents, equipment, information and personnel.  If requested by Analog Devices, within ten business days of such request, Licensee shall either (i) certify in writing that Licensee is fully compliant with this Agreement or (ii) deliver a notice in writing stating all of the reasons why Licensee is not fully compliant.  
+
+19. Entire Agreement.  This Agreement constitutes the entire, final, and complete agreement between the parties hereto relevant to the subject matter hereof, and supersedes any and all other agreements, either oral or in writing, between the parties with respect to the subject matter of this Agreement.  Any term or condition incorporated in Licensee’s purchase order(s) or any other document provided by Licensee to Analog Devices which is in any way different from, inconsistent with or in addition to the terms and conditions set forth herein shall be of no effect, shall not apply to the licensing of the Licensed Software, and shall not become a part of a contract between the parties or be binding upon Analog Devices.  Analog Devices’ failure to object to terms contained in any communication from Licensee shall not be an acceptance of such terms or a waiver of the terms set forth in this Agreement.  If, for any reason, any provision of this Agreement is held invalid, such invalidity shall not affect the remainder of this Agreement, and this Agreement shall continue in force and effect to the full extent allowed by law.  For the avoidance of doubt, all the Licensed Software under this Agreement is subject to the terms and conditions of this Agreement and not any agreement or terms for purchase of Analog Devices products, even if the Licensed Software is delivered with such products.
 
 
-Apache License 2.0
+
+ 
+Appendix A – Third Party Licenses
+
+
+Analog devices/platform_drivers
+
+Download page: https://os.mbed.com/teams/AnalogDevices/code/platform_drivers/
+Online license: https://github.com/ARMmbed/mbed-os/blob/master/LICENSE-apache-2.0.txt 
+
+Analog devices/platform_drivers are subject to the mbed-os license (Apache 2.0 License) which is reproduced below.
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+
+ARMmbed/mbed-os
+
+Download page: https://github.com/ARMmbed/mbed-os
+Online license: https://github.com/ARMmbed/mbed-os/blob/master/LICENSE-apache-2.0.txt
+
+
+mbed-os Licence
 
 Apache License
 Version 2.0, January 2004
 http://www.apache.org/licenses/
+	
 
 TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
+	
 1. Definitions.
-
-"License" shall mean the terms and conditions for use, reproduction, and
-distribution as defined by Sections 1 through 9 of this document.
+	
+"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
+	
+"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
 
-"Licensor" shall mean the copyright owner or entity authorized by the copyright
-owner that is granting the License.
-
-"Legal Entity" shall mean the union of the acting entity and all other entities
-that control, are controlled by, or are under common control with that entity.
-For the purposes of this definition, "control" means (i) the power, direct or
-indirect, to cause the direction or management of such entity, whether by
-contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the
+"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the
 outstanding shares, or (iii) beneficial ownership of such entity.
-
-"You" (or "Your") shall mean an individual or Legal Entity exercising
-permissions granted by this License.
-
-"Source" form shall mean the preferred form for making modifications, including
-but not limited to software source code, documentation source, and configuration
-files.
-
-"Object" form shall mean any form resulting from mechanical transformation or
-translation of a Source form, including but not limited to compiled object code,
-generated documentation, and conversions to other media types.
-
-"Work" shall mean the work of authorship, whether in Source or Object form, made
-available under the License, as indicated by a copyright notice that is included
-in or attached to the work (an example is provided in the Appendix below).
+	
+"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
+	
+"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
 
-"Derivative Works" shall mean any work, whether in Source or Object form, that
-is based on (or derived from) the Work and for which the editorial revisions,
-annotations, elaborations, or other modifications represent, as a whole, an
-original work of authorship. For the purposes of this License, Derivative Works
-shall not include works that remain separable from, or merely link (or bind by
-name) to the interfaces of, the Work and Derivative Works thereof.
+"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
+	
+
+"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
 
-"Contribution" shall mean any work of authorship, including the original version
-of the Work and any modifications or additions to that Work or Derivative Works
-thereof, that is intentionally submitted to Licensor for inclusion in the Work
-by the copyright owner or by an individual or Legal Entity authorized to submit
-on behalf of the copyright owner. For the purposes of this definition,
-"submitted" means any form of electronic, verbal, or written communication sent
-to the Licensor or its representatives, including but not limited to
-communication on electronic mailing lists, source code control systems, and
-issue tracking systems that are managed by, or on behalf of, the Licensor for
-the purpose of discussing and improving the Work, but excluding communication
-that is conspicuously marked or otherwise designated in writing by the copyright
-owner as "Not a Contribution."
+"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
+	
+"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
 
-"Contributor" shall mean Licensor and any individual or Legal Entity on behalf
-of whom a Contribution has been received by Licensor and subsequently
-incorporated within the Work.
-
+"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
+	
 2. Grant of Copyright License.
-
-Subject to the terms and conditions of this License, each Contributor hereby
-grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
-irrevocable copyright license to reproduce, prepare Derivative Works of,
-publicly display, publicly perform, sublicense, and distribute the Work and such
-Derivative Works in Source or Object form.
+	
+Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
 
 3. Grant of Patent License.
 
-Subject to the terms and conditions of this License, each Contributor hereby
-grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
-irrevocable (except as stated in this section) patent license to make, have
-made, use, offer to sell, sell, import, and otherwise transfer the Work, where
-such license applies only to those patent claims licensable by such Contributor
-that are necessarily infringed by their Contribution(s) alone or by combination
-of their Contribution(s) with the Work to which such Contribution(s) was
-submitted. If You institute patent litigation against any entity (including a
-cross-claim or counterclaim in a lawsuit) alleging that the Work or a
-Contribution incorporated within the Work constitutes direct or contributory
-patent infringement, then any patent licenses granted to You under this License
-for that Work shall terminate as of the date such litigation is filed.
-
+Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+	
 4. Redistribution.
-
-You may reproduce and distribute copies of the Work or Derivative Works thereof
-in any medium, with or without modifications, and in Source or Object form,
-provided that You meet the following conditions:
-
-You must give any other recipients of the Work or Derivative Works a copy of
-this License; and
-You must cause any modified files to carry prominent notices stating that You
-changed the files; and
-You must retain, in the Source form of any Derivative Works that You distribute,
-all copyright, patent, trademark, and attribution notices from the Source form
-of the Work, excluding those notices that do not pertain to any part of the
-Derivative Works; and
-If the Work includes a "NOTICE" text file as part of its distribution, then any
-Derivative Works that You distribute must include a readable copy of the
-attribution notices contained within such NOTICE file, excluding those notices
-that do not pertain to any part of the Derivative Works, in at least one of the
-following places: within a NOTICE text file distributed as part of the
-Derivative Works; within the Source form or documentation, if provided along
-with the Derivative Works; or, within a display generated by the Derivative
-Works, if and wherever such third-party notices normally appear. The contents of
-the NOTICE file are for informational purposes only and do not modify the
-License. You may add Your own attribution notices within Derivative Works that
-You distribute, alongside or as an addendum to the NOTICE text from the Work,
-provided that such additional attribution notices cannot be construed as
-modifying the License.
-You may add Your own copyright statement to Your modifications and may provide
-additional or different license terms and conditions for use, reproduction, or
-distribution of Your modifications, or for any such Derivative Works as a whole,
-provided Your use, reproduction, and distribution of the Work otherwise complies
-with the conditions stated in this License.
-
+	
+You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+	
+You must give any other recipients of the Work or Derivative Works a copy of this License; and
+You must cause any modified files to carry prominent notices stating that You changed the files; and You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
+	
 5. Submission of Contributions.
 
-Unless You explicitly state otherwise, any Contribution intentionally submitted
-for inclusion in the Work by You to the Licensor shall be under the terms and
-conditions of this License, without any additional terms or conditions.
-Notwithstanding the above, nothing herein shall supersede or modify the terms of
-any separate license agreement you may have executed with Licensor regarding
-such Contributions.
-
+Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions.  Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
+	
 6. Trademarks.
 
-This License does not grant permission to use the trade names, trademarks,
-service marks, or product names of the Licensor, except as required for
-reasonable and customary use in describing the origin of the Work and
-reproducing the content of the NOTICE file.
-
+This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
+	
 7. Disclaimer of Warranty.
 
-Unless required by applicable law or agreed to in writing, Licensor provides the
-Work (and each Contributor provides its Contributions) on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied,
-including, without limitation, any warranties or conditions of TITLE,
-NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are
-solely responsible for determining the appropriateness of using or
-redistributing the Work and assume any risks associated with Your exercise of
-permissions under this License.
+Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
 
 8. Limitation of Liability.
-
-In no event and under no legal theory, whether in tort (including negligence),
-contract, or otherwise, unless required by applicable law (such as deliberate
-and grossly negligent acts) or agreed to in writing, shall any Contributor be
-liable to You for damages, including any direct, indirect, special, incidental,
-or consequential damages of any character arising as a result of this License or
-out of the use or inability to use the Work (including but not limited to
-damages for loss of goodwill, work stoppage, computer failure or malfunction, or
-any and all other commercial damages or losses), even if such Contributor has
-been advised of the possibility of such damages.
-
+	
+In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+	
 9. Accepting Warranty or Additional Liability.
+	
+While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
 
-While redistributing the Work or Derivative Works thereof, You may choose to
-offer, and charge a fee for, acceptance of support, warranty, indemnity, or
-other liability obligations and/or rights consistent with this License. However,
-in accepting such obligations, You may act only on Your own behalf and on Your
-sole responsibility, not on behalf of any other Contributor, and only if You
-agree to indemnify, defend, and hold each Contributor harmless for any liability
-incurred by, or claims asserted against, such Contributor by reason of your
-accepting any such warranty or additional liability.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
 
-++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ARM-software/CMSIS/DSP_Lib/Source
+
+Download page: https://github.com/ARM-software/CMSIS/tree/master/CMSIS/DSP_Lib/Source
+Online license: https://github.com/ARM-software/CMSIS/blob/master/CMSIS/DSP_Lib/license.txt
 
 
+DSP_Lib Licence
+
+All files contained in the folders "CMSIS\DSP-Lib\Source" and "CMSIS\DSP-Lib\Examples"
+are guided by the following license:
+
+Copyright (C) 2009-2015 ARM Limited. 
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+ - Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+ - Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+ - Neither the name of ARM nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

diff -r b9debc14d077 -r 9dd7c64b4a64 LTC26X6.lib
--- a/LTC26X6.lib	Tue May 11 11:08:02 2021 +0000
+++ b/LTC26X6.lib	Mon Dec 06 05:22:28 2021 +0000
@@ -1,1 +1,1 @@
-https://os.mbed.com/teams/AnalogDevices/code/LTC26X6/#91316dbef1b6
+https://os.mbed.com/users/jngarlitos/code/LTC26X6/#cab5535713ee

diff -r b9debc14d077 -r 9dd7c64b4a64 README.txt
--- a/README.txt	Tue May 11 11:08:02 2021 +0000
+++ b/README.txt	Mon Dec 06 05:22:28 2021 +0000
@@ -1,27 +1,33 @@
 Evaluation Boards/Products Supported
 ------------------------------------ 
-EVAL-AD1234 (AD1234)
-EVAL-AD1256 (AD1256)
-<< add more here >>
+EVAL-CN0540-ARDZ
+AD7768-1
+ADA4945-1
+LTC2606
+ADG5421
+AD8605
+ADA4807
 
 Overview
 --------
-These code files provide drivers to interface with AD1234 and communicate with 
-EVAL-AD1234 board. This code was developed and tested on SDP-K1 controller board 
+These code files provide drivers to interface with AD7768-1 & LTC2606 and communicate with 
+EVAL-CN0540-ARDZ. This code was developed and tested on SDP-K1 controller board 
 https://os.mbed.com/platforms/SDP_K1/. 
 
-Product details: https://www.analog.com/en/products/ad1234.html
-Eval board details: https://www.analog.com/en/design-center/evaluation-hardware-and-software/evaluation-boards-kits/EVAL-AD1234.html
+Product details: https://www.analog.com/en/design-center/reference-designs/circuits-from-the-lab/cn0540.html
+Eval board details: https://wiki.analog.com/resources/eval/user-guides/circuits-from-the-lab/cn0540
 
 
 Hardware Setup
 --------------
-Required: SDP-K1, EVAL-AD1234, USB cable, 12 V power supply, 60 MHz external 
-clock supply.
-Plug in the EVAL-AD124 board on SDP-K1 board (or any other Mbed enabled 
-controller board) using the SDP connector and screws.
+Required: SDP-K1, EVAL-CN0540-ARDZ, USB cable.
+
+Plug the EVAL-CN0540-ARDZ board into the SDP-K1 board using the Arduino pin header.
 Connect SDP-K1 board to the PC using the USB cable.
 
+A detailed user guide on how to use EVAL-CN0540-ARDZ with SDP-K1 board on Mbed platform is available 
+here: https://wiki.analog.com/resources/eval/user-guides/circuits-from-the-lab/cn0540/sdp-k1
+
 
 How to Get Started
 ------------------
@@ -30,7 +36,7 @@
 instructions on how to import code are here: https://os.mbed.com/docs/mbed-os/v5.12/tools/importing-code.html
 Compile code. Drag and drop binary into SDP-K1 controller board. Find detailed instructions 
 here: https://os.mbed.com/docs/mbed-os/v5.12/tools/getting-your-program-on-your-board.html
-Open Tera Term (or alternative), select 9600 baud rate, and the applicable COM port to see the 
+Open Tera Term (or alternative), select 115200 baud rate, and the applicable COM port to see the 
 list of options.
 
 A detailed user guide on how to use SDP-K1 board on Mbed platform is available 
@@ -41,3 +47,192 @@
 -----
 If using Win 7, install serial drivers for Mbed. https://os.mbed.com/docs/mbed-os/v5.12/tutorials/windows-serial-driver.html
 A detailed user guide on SDP-K1 controller board is available here https://www.analog.com/en/design-center/evaluation-hardware-and-software/evaluation-boards-kits/SDP-K1.html.
+
+
+License
+-------
+Analog Devices, Inc. (ADI)
+EVALUATION LICENSE AGREEMENT 
+20200427-CN0540EC-CTELA
+
+This Evaluation License Agreement (the “Agreement”) is a legal agreement between Analog Devices, Inc., a Massachusetts corporation, with its principal office at One Technology Way, Norwood, Massachusetts, USA 02062 (“Analog Devices”) and you (personally or on behalf of your employer, as applicable) (“Licensee”) for the software and related documentation that accompanies this Agreement (the “Licensed Software”).   YOU AGREE THAT YOU ARE BOUND BY THE TERMS AND CONDITIONS OF THIS AGREEMENT BY DOWNLOADING, INSTALLING, COPYING OR USING THE SOFTWARE. IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY OR USE THE SOFTWARE.  YOU REPRESENT THAT YOU ARE OVER THE AGE OF 18 AND HAVE THE CAPACITY AND AUTHORITY TO BIND YOURSELF OR YOUR EMPLOYER, AS APPLICABLE, TO THE TERMS OF THIS AGREEMENT. 
+
+The Licensed Software consists of (a) embedded software (including firmware) designed to operate in an Analog Devices processor / product (“Embedded Software”) and / or (b) application software designed to run on personal computers (“PC Software”).  
+
+1. License.  Subject to the terms and conditions of this Agreement, Analog Devices grants to Licensee a non-exclusive, non-transferable, non-sublicensable license to:
+(a) internally use and copy the Embedded Software (and modify the Embedded Software if it is provided in source code form) for the sole purpose of evaluating the use of Embedded Software with Analog Devices’ processors / products; and  
+(b) internally use and copy the PC Software for the sole purpose of evaluating use of the PC Software with Analog Devices’ processors / products.  Such evaluation may include configuring, monitoring and controlling Analog Devices processors / products solely in order to evaluate use of the PC Software with Analog Devices’ processors / products. 
+
+2. Restrictions and Conditions.  The license granted in Section 1 is conditioned on full compliance with this Section 2 and the other obligations under this Agreement.  
+(a) Licensee shall not modify, reverse engineer, decompile, disassemble or create derivative works of the Licensed Software except and only to the extent that such activity is expressly permitted (i) pursuant to Section 1 above or (ii) by applicable law notwithstanding this limitation. 
+(b) In no event shall Licensee (i) sublicense, rent, lease, permit time-sharing or otherwise make available, transfer, deliver, disclose, or distribute the Licensed Software to any third party or (ii) use the Licensed Software for any commercial purpose, including, without limitation, the manufacture of products intended for commercial sale or the development of any other software, application, product or service for commercial release. 
+(c) The Licensed Software may not be used with any processors / products other than Analog Devices’ processors / products or for any other purpose.  
+(d) Licensee shall not engage in any activities with respect to the Licensed Software that would cause the Licensed Software, in whole or in part to become subject to any terms of an Excluded License.  An “Excluded License” means any license, including licenses for “open source” code (such as defined by the Free Software Foundation), that requires as a condition of use, modification, and/or distribution of the software subject to such Excluded License, that such software or other software combined and/or distributed with such software be (i) disclosed or distributed in source code form; (ii) licensed for the purpose of making derivative works; or (iii) redistributable at no charge.  Examples of Excluded Licenses include, without limitation, the GNU General Public License, the GNU Lesser General Public License, the Mozilla Public License and the Microsoft Reciprocal License.  The restrictions of this section apply regardless of whether the Licensed Software is intended or designed to run in an environment that includes software under an Excluded License.  Any license, agreement or other document issued, entered into or granted by Licensee that purports to apply any Excluded License to any portion of the Licensed Software shall be null and void with regard to the Licensed Software. 
+(e) If Analog Devices elects to make any update, upgrade or new version of the Licensed Software (“Updates”) available to Licensee, such Updates shall be deemed to be the Licensed Software under this Agreement.  If requested by Analog Devices, Licensee shall only use the latest version of the Licensed Software (including Updates).  Analog Devices shall have no obligation to provide support or Updates of any kind.
+(f) In no event shall Licensee remove any copyright or other intellectual property notice or other legend contained on or in copies of the Licensed Software or displayed by the Licensed Software.
+(g) To the extent there are any specifications and/or user manuals for the Licensed Software, as an additional restriction under this Agreement (and in no way expanding any rights under this Agreement), the Licensed Software may not be used in any manner that is inconsistent with such specifications and/or user manuals.  For the avoidance of doubt, Licensee may not distribute the Licensed Software under any circumstances.  
+
+3. Ownership.  Licensee acknowledges and agrees that Analog Devices and its licensors and suppliers (as applicable) retain all right, title and interest in the Licensed Software and derivative works thereof, including all related patent, copyright and other intellectual property rights in any of the foregoing, and that Licensee’s rights to the Licensed Software are limited to those expressly provided for in Section 1 above (subject to the conditions and restrictions in this Section 3).  Licensee shall not take any action inconsistent with such title and ownership.  Any use of the Licensed Software for any purpose other than as expressly licensed hereunder is outside the scope of this Agreement.  All rights not expressly granted in this Agreement are reserved to Analog Devices.  It is agreed that because of the proprietary nature of the Licensed Software, Analog Devices’ remedies at law for a breach by the Licensee of its obligations under this Agreement or for use of the Licensed Software beyond the scope of the license granted herein will be inadequate and that Analog Devices will, in the event of such breach, be entitled to equitable relief, including injunctive relief, without the posting of any bond, in addition to all other remedies provided under this Agreement or available at law.
+
+4. Publicity. Notwithstanding anything in this Agreement, Licensee may not use any trademark or trade name of Analog Devices or make any public announcement regarding the existence of this Agreement without Analog Devices’ prior written consent.  Licensee may not publish or provide the results of any benchmark or comparison tests run on the Licensed Software to any third party without the prior written consent of Analog Devices.  
+
+5. Feedback. Licensee may from time to time provide modifications, enhancements, improvements, code, suggestions, ideas, comments or other feedback (“Feedback”) to Analog Devices related to the Licensed Software.  Licensee agrees that all Feedback is and shall be given entirely voluntarily. To the extent Licensee provides such Feedback, Licensee (on behalf of itself and its affiliates) hereby grants to Analog Devices and its affiliates a non-exclusive, irrevocable, perpetual, worldwide, royalty-free, transferable license, with the right to sublicense, under Licensee’s (and its affiliates’) intellectual property, to use and disclose Feedback in any manner Analog Devices or its affiliates choose, including, without limitation, displaying, performing, copying, making, having made, using, selling and otherwise disposing of Analog Devices’ and its affiliates and their respective licensees’ software, applications, products or services embodying such Feedback in any manner and via any media, without reference to its source or other obligation to Licensee and even if the Feedback is designated as confidential.
+
+6.  Confidentiality.  
+(a) The Licensed Software and any accompanying documentation, and any other information which a reasonable person would understand is of a confidential or proprietary nature, shall be deemed to be “Confidential Information” of Analog Devices whether or not it is identified in writing as “Confidential.”  Any other materials or information identified by Analog Devices as “Confidential” or with any similar notice shall also be treated as Confidential Information of Analog Devices under this Agreement.  Analog Devices Confidential Information shall include, without limitation, software and information of Analog Devices’ affiliates, suppliers and licensors.  
+(b) Licensee shall protect the confidentiality of Analog Devices Confidential Information. Without limitation, Licensee agrees: (i) not to disclose or otherwise permit any other person or entity access to, in any manner, Confidential Information, or any part thereof in any form whatsoever; except that such disclosure or access shall be permitted to an employee of Licensee (x) requiring access to Confidential Information in the course of his or her employment in connection with this Agreement, (y) who is subject to written confidentiality obligations at least as protective with respect to Confidential Information as the terms and conditions in this Agreement and (z) who complies with all other applicable provisions of this Agreement; (ii) to notify Analog Devices promptly and in writing of the circumstances surrounding any suspected possession, use or knowledge of Confidential Information other than those authorized by this Agreement; and (iii) not to use Confidential Information for any purpose other than as explicitly set forth herein.
+(c) Nothing in this Section 6 shall restrict Licensee with respect to information if such information:  (i) was rightfully possessed by Licensee before it was received from Analog Devices; (ii) is independently developed by Licensee without reference to Confidential Information; (iii) is subsequently furnished to Licensee by a third party not under any obligation of confidentiality with respect to such information, and without restrictions on use or disclosure; or (iv) is or becomes public or available to the general public otherwise than through any act or default of Licensee.
+(d) Because the unauthorized use, transfer or dissemination of any Confidential Information may diminish substantially the value of such materials and may irreparably harm Analog Devices, if Licensee breaches the provisions of this Section 6, Analog Devices shall, without limiting its other rights or remedies, be entitled to equitable relief, including but not limited to injunctive relief.
+
+7. Third Party Software.  The Licensed Software may be accompanied by or include software made available by one or more third parties (“Third Party Software”).  Each portion of Third Party Software is subject to its own separate software license terms and conditions (“Third Party Licenses”).  The Third Party Licenses for Third Party Software delivered with the Licensed Software may be set forth or identified (by URL or otherwise) in (i) Appendix A to this Agreement (if any), (ii) the applicable software header or footer text, (iii) a text file located in the directory of the applicable Third Party Software component, (iv) software documentation, (v) in connection with any Update of the Licensed Software or its documentation, and/or (vi) such other location customarily used for licensing terms. The use of each portion of Third Party Software is subject to the Third Party Licenses, and Licensee agrees that Licensee’s use of any Third Party Software is bound by the applicable Third Party License.  Licensee agrees to review and comply with all applicable Third Party Licenses prior to any use or distribution of any Third Party Software.  Third Party Software is provided on an “as is” basis without any representation, warranty or liability of any kind.  Analog Devices (including its licensors and suppliers) shall have no liability or responsibility for the operation or performance of the Third Party Software and shall not be liable for any damages, costs, or expenses, direct or indirect, arising out of the performance or failure to perform of the Third Party Software.  Analog Devices (including its licensors and suppliers) shall be entitled to the benefit of any and all limitations of liability and disclaimers of warranties contained in the Third Party Licenses. 
+
+8. Required Consents; Indemnification. Licensee acknowledges that use of the Licensed Software may require Licensee to obtain licenses to intellectual property or other consents from one or more third parties.  Licensee is responsible for obtaining any and all such required licenses or consents regarding the Licensed Software and for the performance of any and all required tests or analysis necessary or appropriate for the determination of the suitability of the Licensed Software for its purposes.  Without limitation, Licensee is responsible for obtaining, maintaining and complying with third party licenses in connection with any Industry Standard hereafter defined below (including related intellectual property rights) applicable to the Licensed Software.  "Industry Standard" means any standard, protocol or specification that is promulgated by any standards development organization, consortium, trade association, special interest group, or like group or entity, for the purpose of widespread adoption.  By way of non-limiting examples, industry standards and specifications may include without limitation technical specifications promulgated by organizations such as the International Telecommunications Union (ITU), International Standards Organization (ISO), International Electrotechnical Commission (IEC), 3rd Generation Partnership Project (3GPP), Moving Picture Experts Group (MPEG), World Wide Web Consortium (W3C), Internet Engineering Task Force (IETF), OpenFabrics Alliance, Open Mobile Alliance, UPnP Forum, USB Implementers Forum, Institute of Electrical and Electronics Engineers (IEEE), American National Standards Institute (ANSI), Telecommunications Industry Association (TIA), AUTomotive Open System Architecture (AUTOSAR), High-bandwidth Digital Content Protection (HDCP), High-Definition Multimedia Interface (HDMI), Digital Transmission Content Protection (DTCP), Digital Transmission Licensing Administrator (DTLA), and Ethernet POWERLINK Standardization Group (EPSG).  Licensee shall defend, indemnify and hold Analog Devices, its affiliates, licensors and suppliers, and their respective officers, directors, employees and agents (each an “Indemnified Party”) harmless from and against any damages, fines, penalties, assessments, liabilities, costs and expenses (including reasonable attorneys’ fees and court costs) in the event that any claim is brought against an Indemnified Party arising or alleged to arise directly or indirectly from (i) Licensee’s possession, use, distribution or other exploitation of the Licensed Software or Third Party Software, or (ii) Licensee’s failure to obtain any required license or consent with respect to the Licensed Software or Third Party Software.  
+
+9. Term and Termination.  
+(a) The term of this Agreement is for a period of six (6) months commencing on the date the Licensed Software is first received by Licensee from Analog Devices or its authorized distributor (“Term”).  This Agreement is effective until the expiration of the Term or until terminated in accordance with this Section.  Either party may terminate this Agreement at any time by giving written notice to the other party. This Agreement shall immediately automatically terminate in the event of any failure by Licensee to comply with any term or condition of the Agreement. In the event of termination or expiration (i) all licenses granted to Licensee immediately expire and (ii) Licensee must immediately cease using the Licensed Software and permanently delete all copies of the Licensed Software and all of its component parts, including any backup or archival copies.  The provisions of Sections 2 through 19 shall survive any termination or expiration of this Agreement according to their terms.
+(b) THE LICENSED SOFTWARE MAY BE TIME-SENSITIVE AND MAY NOT FUNCTION UPON EXPIRATION OF TERM.  NOTICE OF EXPIRATION WILL NOT BE GIVEN, SO LICENSEE NEEDS TO PLAN FOR THE EXPIRATION DATE.  In order to protect against unauthorized use of the Licensed Software in commercial applications, Analog Devices may have integrated copy protection into the evaluation software.  Typical protection may include a time-out or periodic beep on audio software or a watermark on imaging software.  
+
+10. DISCLAIMER OF WARRANTIES.  
+THE LICENSED SOFTWARE, THIRD PARTY SOFTWARE AND ANY SUPPORT ARE PROVIDED "AS IS" WITHOUT REPRESENTATION OR WARRANTY OF ANY KIND, AND ANALOG DEVICES, FOR ITSELF AND ITS AFFILIATES, HEREBY DISCLAIMS ALL REPRESENTATIONS AND WARRANTIES, WHETHER EXPRESS OR IMPLIED, ORAL OR WRITTEN, WITH RESPECT TO THE LICENSED SOFTWARE AND THIRD PARTY SOFTWARE AND ANY SUPPORT, INCLUDING, BUT NOT LIMITED TO, ANY EXPRESS OR IMPLIED WARRANTIES OF MERCHANTABILITY; FITNESS FOR ANY PARTICULAR PURPOSE; QUALITY AND ACCURACY OF INFORMATIONAL CONTENT; NON-INFRINGEMENT; QUIET ENJOYMENT; AND TITLE.  LICENSEE AGREES THAT ANY EFFORTS BY ANALOG DEVICES OR ITS AFFILIATES TO MODIFY OR UPDATE THE LICENSED SOFTWARE OR THIRD PARTY SOFTWARE OR PROVIDE SUPPORT SHALL NOT BE DEEMED A WAIVER OF THESE LIMITATIONS, AND THAT ANY ANALOG DEVICES WARRANTIES SHALL NOT BE DEEMED TO HAVE FAILED OF THEIR ESSENTIAL PURPOSE.   
+
+11.  Limitation of Liability.  
+(a) TO THE MAXIMUM EXTENT PERMITTED BY LAW, ANALOG DEVICES (INCLUDING ITS AFFILIATES) SHALL NOT BE LIABLE FOR ANY DAMAGES ARISING FROM OR RELATED TO THE LICENSED SOFTWARE, THIRD PARTY SOFTWARE, THEIR USE OR ANY RELATED INFORMATION AND/OR SERVICES, INCLUDING BUT NOT LIMITED TO ANY INDIRECT, INCIDENTAL, SPECIAL, PUNITIVE, EXEMPLARY, CONSEQUENTIAL OR ANALOGOUS DAMAGES (INCLUDING WITHOUT LIMITATION ANY DAMAGES RESULTING FROM LOSS OF USE, DATA, REVENUE, PROFITS, OR SAVINGS, COMPUTER DAMAGE, INTERRUPTION OF BUSINESS, OR ANY OTHER CAUSE), UNDER ANY LEGAL THEORY (INCLUDING WITHOUT LIMITATION CONTRACT, WARRANTY, TORT, NEGLIGENCE, STRICT OR PRODUCT LIABILITY), EVEN IF IT HAS BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES.  
+(b) IN NO EVENT SHALL ANALOG DEVICES’ CUMULATIVE LIABILITY FOR DAMAGES TO LICENSEE FOR ANY AND ALL CAUSES WHATSOEVER, REGARDLESS OF THE FORM OF ANY CLAIMS OR ACTIONS, EXCEED ONE HUNDRED U.S. DOLLARS ($100.00 U.S.).  ANALOG DEVICES’ AFFILIATES, LICENSORS AND SUPPLIERS SHALL HAVE NO LIABILITY WHATSOEVER UNDER THIS AGREEMENT OR IN CONNECTION WITH THE LICENSED SOFTWARE OR ITS USE.
+(c) Some jurisdictions do not permit the exclusion or limitation of liability for consequential, incidental or other damages, and, as such, some portion of the above limitation may not apply to Licensee.  In such jurisdictions, Analog Devices' liability is limited to the greatest extent permitted by law.
+
+12. Choice of Law.  This Agreement and any dispute related to the Licensed Software shall be governed by the laws of the Commonwealth of Massachusetts, United States of America, without reference to its principles of conflicts of laws, and, as to matters affecting copyrights, trademarks and patents, in addition, by applicable United States federal law.  The parties agree that the jurisdiction and venue of any action with respect to this Agreement shall be in a court of competent subject matter jurisdiction located in Boston, Massachusetts, and each of the parties hereby agrees to submit itself to the exclusive jurisdiction and venue of such courts for the purpose of any such action, except that Analog Devices may seek equitable (including injunctive) relief and enforce judgements in any venue of its choosing. Licensee hereby submits to personal jurisdiction in such courts. The parties hereto specifically exclude the United Nations Convention on Contracts for the International Sale of Goods and the Uniform Computer Information Transactions Act from this Agreement.  The parties hereto waive any statute, law, or regulation that might provide an alternative law or forum or to have this Agreement written in any language other than English.
+
+13. U.S. Government Restricted Rights. If the Licensed Software or documentation provided by Analog Devices or its suppliers is procured by or on behalf of the United States Government, the Government agrees that such software or documentation is “commercial computer software” or “commercial computer software documentation” and that absent a written agreement to the contrary, the Government’s rights with respect to such Licensed Software or documentation are limited by the terms of this Agreement, pursuant to FAR § 12.212(a) and/or DFARS § 227.7202-1(a), as applicable. 
+
+14.  Export.  Licensee shall only use the Licensed Software in compliance with all applicable laws and regulations, including without limitation export control laws.  Licensee agrees that Licensee will not directly or indirectly export the Licensed Software to another country except in full compliance with all applicable United States Federal Laws and Regulations and other laws and regulations relating to exports and imports.  Licensee will not export/re-export, directly or indirectly, any software, information or technical data acquired under this Agreement or the "direct product" thereof to any country for which the United States Government or any agency thereof, at the time of export, requires an export license or other governmental approval, without first obtaining such license or approval.  The term "direct product" as used herein means the immediate product (including processes and services) produced directly by the use of the technical data or information.  In addition to the above, the Licensed Software and/or any "direct product" thereof, may not be used by, or exported, transferred or re-exported to (i) any U.S. or U.N. or EU-sanctioned or embargoed country, or to nationals or residents of such countries;  (ii) any person, entity, organization, or other party identified on the U.S. Department of Treasury’s lists of “Specially Designated Nationals and Blocked Persons”; (iii) any associations, individuals, companies, entities, organizations found in the U.S. Department of Commerce’s Table of Denial Orders or Entity List, as published and revised from time to time (collectively known as the "Denied Parties List" or "Prohibited Parties List"); and/or (iv) any unauthorized or prohibited end-user engaged in any prohibited activities related to weapons of mass destruction, including without limitation, activities related to the design, development, production or use of nuclear weapons, materials, or facilities, missile or the support of missile projects, and chemical or biological weapons.  Licensee understands that the foregoing obligations are legal requirements and agrees that they shall survive any expiration or termination of this Agreement.
+
+15. Compliance with Laws; Taxes.  Licensee shall comply with all laws, legislation, rules, regulations, governmental requirements and industry standards with respect to the Licensed Software, and the performance by Licensee of its obligations hereunder, existing in any applicable jurisdiction.  In the event that this Agreement is required to be registered with any governmental authority, Licensee shall notify Analog Devices in writing and cause such registration to be made and shall bear any expense or tax payable in respect thereof.  Licensee shall bear any and all expenses and pay any and all taxes that may be payable in relation to this Agreement.
+
+16. Assignment.  This Agreement is personal to Licensee and Licensee may not transfer, sublicense, lease, rent, or assign its rights under this Agreement, and any such attempt shall be null and void. Analog Devices may assign, transfer, or sublicense this Agreement or any rights or obligations hereunder at any time in its sole discretion.
+
+17.  Waiver; Modification; Severability.  No waiver, consent, modification or change of terms of this Agreement shall bind either party unless in writing signed by both parties, and then such waiver, consent, modification or change shall be effective only in the specific instance and for the specific purpose given.  If any provision of this Agreement is unenforceable, such provision shall be enforced to the extent possible under applicable law, and the remaining provisions will remain in effect.
+
+18.  Audit.  Analog Devices shall have the right upon ten (10) days prior written notice to audit Licensee’s compliance with the terms of this Agreement during normal business hours.  In connection with such audit, Analog Devices shall have access to all reasonably requested documents, equipment, information and personnel.  If requested by Analog Devices, within ten business days of such request, Licensee shall either (i) certify in writing that Licensee is fully compliant with this Agreement or (ii) deliver a notice in writing stating all of the reasons why Licensee is not fully compliant.  
+
+19. Entire Agreement.  This Agreement constitutes the entire, final, and complete agreement between the parties hereto relevant to the subject matter hereof, and supersedes any and all other agreements, either oral or in writing, between the parties with respect to the subject matter of this Agreement.  Any term or condition incorporated in Licensee’s purchase order(s) or any other document provided by Licensee to Analog Devices which is in any way different from, inconsistent with or in addition to the terms and conditions set forth herein shall be of no effect, shall not apply to the licensing of the Licensed Software, and shall not become a part of a contract between the parties or be binding upon Analog Devices.  Analog Devices’ failure to object to terms contained in any communication from Licensee shall not be an acceptance of such terms or a waiver of the terms set forth in this Agreement.  If, for any reason, any provision of this Agreement is held invalid, such invalidity shall not affect the remainder of this Agreement, and this Agreement shall continue in force and effect to the full extent allowed by law.  For the avoidance of doubt, all the Licensed Software under this Agreement is subject to the terms and conditions of this Agreement and not any agreement or terms for purchase of Analog Devices products, even if the Licensed Software is delivered with such products.
+
+
+
+ 
+Appendix A – Third Party Licenses
+
+
+Analog Devices/platform_drivers
+
+Download page: https://os.mbed.com/teams/AnalogDevices/code/platform_drivers/
+Online license: https://github.com/ARMmbed/mbed-os/blob/master/LICENSE-apache-2.0.txt 
+
+Analog Devices/platform_drivers is subject to the mbed-os license (Apache 2.0 License), which is reproduced below.
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+
+ARMmbed/mbed-os
+
+Download page: https://github.com/ARMmbed/mbed-os
+Online license: https://github.com/ARMmbed/mbed-os/blob/master/LICENSE-apache-2.0.txt
+
+
+mbed-os Licence
+
+Apache License
+Version 2.0, January 2004
+http://www.apache.org/licenses/
+    
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+    
+1. Definitions.
+    
+"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
+    
+"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
+
+"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
+    
+"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
+    
+"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
+
+"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
+    
+
+"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
+
+"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
+    
+"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."
+
+"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
+    
+2. Grant of Copyright License.
+    
+Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License.
+
+Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
+    
+4. Redistribution.
+    
+You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
+    
+You must give any other recipients of the Work or Derivative Works a copy of this License; and
+You must cause any modified files to carry prominent notices stating that You changed the files; and
+You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
+If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
+You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
+    
+5. Submission of Contributions.
+
+Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions.  Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
+    
+6. Trademarks.
+
+This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
+    
+7. Disclaimer of Warranty.
+
+Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability.
+    
+In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
+    
+9. Accepting Warranty or Additional Liability.
+    
+While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+
+ARM-software/CMSIS/DSP_Lib/Source
+
+Download page: https://github.com/ARM-software/CMSIS/tree/master/CMSIS/DSP_Lib/Source
+Online license: https://github.com/ARM-software/CMSIS/blob/master/CMSIS/DSP_Lib/license.txt
+
+
+DSP_Lib Licence
+
+All files contained in the folders "CMSIS\DSP-Lib\Source" and "CMSIS\DSP-Lib\Examples"
+are guided by the following license:
+
+Copyright (C) 2009-2015 ARM Limited. 
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+ - Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+ - Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+ - Neither the name of ARM nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

diff -r b9debc14d077 -r 9dd7c64b4a64 cn0540_app_config.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/cn0540_app_config.h	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,59 @@
+/******************************************************************************
+ *Copyright (c)2020 Analog Devices, Inc.  
+ *
+ * Licensed under the 2020-04-27-CN0540EC License(the "License");
+ * you may not use this file except in compliance with the License.
+ *
+ ****************************************************************************/
+
+#ifndef _APP_CONFIG_H_
+#define _APP_CONFIG_H_
+
+#include <stdint.h>
+#include "platform_drivers.h"
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+#include "ad77681.h"
+#ifdef __cplusplus
+}
+#endif
+
+#define  ARDUINO
+
+/**
+  The ADI SDP-K1 can be used with either the Arduino headers
+  or the 120-pin SDP connector found on ADI evaluation
+  boards. ARDUINO is defined above, so the Arduino headers are used.
+
+  Comment out the ARDUINO #define above to use the SDP connector instead.
+
+*/
+//#warning  check this
+#ifdef ARDUINO
+	#define I2C_SCL     D15
+	#define I2C_SDA     D14
+
+	#define SPI_CS		D10
+	#define SPI_MISO	D12
+	#define SPI_MOSI	D11
+	#define SPI_SCK		D13
+
+#else
+	
+	#define I2C_SCL     SDP_I2C_SCL
+	#define I2C_SDA     SDP_I2C_SDA
+
+	#define SPI_CS		SDP_SPI_CS_A
+	#define SPI_MISO	SDP_SPI_MISO
+	#define SPI_MOSI	SDP_SPI_MOSI
+	#define SPI_SCK		SDP_SPI_SCK
+
+#endif
+
+#endif //_APP_CONFIG_H_
+
+
+
+

diff -r b9debc14d077 -r 9dd7c64b4a64 cn0540_init_params.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/cn0540_init_params.h	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,186 @@
+/******************************************************************************
+ *Copyright (c)2020 Analog Devices, Inc.  
+ *
+ * Licensed under the 2020-04-27-CN0540EC License(the "License");
+ * you may not use this file except in compliance with the License.
+ *
+ ****************************************************************************/
+
+#ifndef _INIT_PARAMS_H_
+#define _INIT_PARAMS_H_
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+#include <mbed.h>
+#include <stdint.h>
+#include "ad77681.h"
+#include "ltc26x6.h"
+//#include "cn0540_piezo.h"
+#include "platform_drivers.h"
+#include "platform_support.h"
+#include "spi_extra.h"
+#include "i2c_extra.h"
+#include "gpio.h"
+
+/******************************************************************************/
+/********************** Macros and Constants Definitions **********************/
+/******************************************************************************/
+
+// DAC address
+#define LTC2606_I2C_ADDRESS 0x10
+// GPIO pins
+#define ARD_RED_LED_PIN		D0
+#define ARD_BLUE_LED_PIN	D1
+#define DRDY_PIN			D2
+#define ADC_RST_PIN			D7
+#define ARD_BUF_EN_PIN		D9
+
+// Init params
+
+// Init SPI extra parameters structure
+mbed_spi_init_param spi_init_extra_params = {
+	.spi_clk_pin =	SPI_SCK,
+	.spi_miso_pin = SPI_MISO,
+	.spi_mosi_pin = SPI_MOSI
+};											 
+// SPI bus init parameters
+spi_init_param spi_params = {
+	20000000,		// SPI Speed
+	SPI_CS,			// SPI CS select index
+	SPI_MODE_3,		// SPI Mode 
+	&spi_init_extra_params, // SPI extra configurations
+};
+
+// Initial parameters for the ADC AD7768-1
+ad77681_init_param init_params = { 
+	
+	spi_params,					// SPI parameters
+	AD77681_ECO,				// power_mode
+	AD77681_MCLK_DIV_16,		// mclk_div
+	AD77681_CONV_CONTINUOUS,	// conv_mode
+	AD77681_AIN_SHORT,			// diag_mux_sel
+	false,						// conv_diag_sel
+	AD77681_CONV_24BIT,			// conv_len
+	AD77681_NO_CRC,				// crc_sel
+	0,							// status bit
+	AD77681_VCM_HALF_VCC,		// VCM setup
+	AD77681_AINn_ENABLED,		// AIN- precharge buffer
+	AD77681_AINp_ENABLED,		// AIN+ precharge buffer			
+	AD77681_BUFn_ENABLED,		// REF- buffer
+	AD77681_BUFp_ENABLED,		// REF+ buffer
+	AD77681_FIR,				// FIR Filter
+	AD77681_SINC5_FIR_DECx32,	// Decimate by 32
+	0,							// OS ratio of SINC3
+	4096,						// Reference voltage
+	16384,						// MCLK in kHz
+	32000,						// Sample rate in Hz
+	1,							// Data frame bytes
+};
+
+//Extra mbed_i2c_init_param
+mbed_i2c_init_param mbed_i2c_init_dac_extra_params = {
+	.i2c_sda_pin =	I2C_SDA,
+	.i2c_scl_pin = I2C_SCL,
+};   	
+	
+// Initial parameters for the DAC LTC2606's I2C bus	
+i2c_init_param i2c_params_dac = {
+		100000, 									// I2C speed (Hz)
+		LTC26X6_WRITE_ADDRESS(LTC2606_I2C_ADDRESS),	// I2C slave address
+		&mbed_i2c_init_dac_extra_params,
+};	
+
+// Initial parameters for the DAC LTC2606 itself
+ltc26x6_init_param init_params_dac = { 
+		i2c_params_dac,	// I2C parameters
+		16,				// Resolution (LTC2606)
+		2.5,			// Reference Voltage
+		-0.001			// Typical offset
+};
+
+/*
+ *  User-defined coefficients for the programmable FIR filter, max 56 coeffs.
+ *
+ *  Note that the inserted coefficients are mirrored afterwards,
+ *  so you must insert only one half of all the coefficients.
+ *
+ *  Note that your original filter must have an ODD count of coefficients,
+ *  allowing the internal ADC circuitry to mirror the coefficients properly.
+ *
+ *  If you use fewer than 56 coefficients, make sure that the variable
+ *  'count_of_active_coeffs' below carries the correct number of
+ *  coefficients, so that the rest of the coefficients can be filled with zeroes.
+ *
+ *  Default coeffs:
+ **/
+const uint8_t count_of_active_coeffs = 56;
+
+const float programmable_FIR[56] = { 
+		
+	-9.53674E-07,
+	 3.33786E-06,
+	 5.48363E-06,
+   - 5.48363E-06,
+   - 1.54972E-05,
+	 5.24521E-06,
+	 3.40939E-05,
+	 3.57628E-06,
+   - 6.17504E-05,
+   - 3.05176E-05,
+	 9.56059E-05,
+	 8.74996E-05,
+   - 0.000124693,
+   - 0.000186205,
+	 0.000128746,
+	 0.000333548,
+   - 7.70092E-05,
+   - 0.000524998,
+   - 6.98566E-05,
+     0.000738144,
+	 0.000353813,
+   - 0.000924349,
+   - 0.000809193,
+	 0.001007795,
+	 0.00144887,
+   - 0.000886202,
+   - 0.002248049,
+	 0.000440598,
+	 0.00312829,
+	 0.000447273,
+   - 0.00394845,
+   - 0.001870632,
+	 0.004499197,
+	 0.003867388,
+   - 0.004512072,
+   - 0.006392241,
+	 0.003675938,
+	 0.009288311,
+   - 0.001663446,
+   - 0.012270451,
+   - 0.001842737,
+	 0.014911652,
+	 0.007131577,
+   - 0.016633987,
+   - 0.014478207,
+	 0.016674042,
+	 0.024231672,
+   - 0.013958216,
+   - 0.037100792,
+	 0.006659508,
+	 0.055086851,
+	 0.009580374,
+   - 0.085582495,
+   - 0.052207232,
+	 0.177955151,
+	 0.416601658,
+};
+	
+#ifdef __cplusplus 
+}				  
+#endif // __cplusplus 	
+#endif // !_INIT_PARAMS_H_
+
+
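
The ADC init parameters above set MCLK to 16384 kHz, the MCLK divider to 16 and the FIR decimation to 32, and list a sample rate of 32000 Hz. The following is a minimal sketch of how these values relate, assuming the output data rate follows the DRDY formula quoted in main.cpp, MCLK / (MCLK_DIV * FILTER_OSR). The function name `odr_hz` is hypothetical and is not part of the ad77681 driver.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the ad77681 driver.
 * Returns the output data rate in Hz, assuming
 * ODR = MCLK / (MCLK_DIV * FILTER_OSR), with MCLK given in kHz
 * as in the init parameters. Returns 0 for invalid settings. */
uint32_t odr_hz(uint32_t mclk_khz, uint32_t mclk_div, uint32_t filter_osr)
{
    if (mclk_div == 0 || filter_osr == 0)
        return 0;   /* avoid division by zero */

    /* 64-bit intermediates so a large MCLK or divider product cannot overflow */
    uint64_t mclk_hz = (uint64_t)mclk_khz * 1000u;
    uint64_t divisor = (uint64_t)mclk_div * filter_osr;

    return (uint32_t)(mclk_hz / divisor);
}
```

With the defaults above, `odr_hz(16384, 16, 32)` evaluates to 32000, which agrees with the sample rate entry in `init_params`.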

diff -r b9debc14d077 -r 9dd7c64b4a64 main.cpp
--- a/main.cpp	Tue May 11 11:08:02 2021 +0000
+++ b/main.cpp	Mon Dec 06 05:22:28 2021 +0000
@@ -1,75 +1,2001 @@
-/* Copyright (c) 2019 Analog Devices, Inc.  All rights reserved.
+/******************************************************************************
+ *Copyright (c)2020 Analog Devices, Inc.  
+ *
+ * Licensed under the 2020-04-27-CN0540EC License(the "License");
+ * you may not use this file except in compliance with the License.
+ *
+ ****************************************************************************/
+#include <mbed.h>
+#include "platform_drivers.h"
+#include <assert.h>
+#include "main.h"
+#include "cn0540_app_config.h"
+#include "platform_drivers.h"
+#include "cn0540_init_params.h"
+
+extern "C"{
+#include "ad77681.h"
+#include "cn0540_adi_fft.h" 
+#include "ltc26x6.h"
+}
+// Descriptor of the main device - the ADC AD7768-1
+ad77681_dev *device_adc;
+// Structure carrying measured data, sampled by the ADC
+adc_data measured_data;
+// AD7768-1's status register structure, carrying all the error flags
+ad77681_status_registers *current_status;
+// Initialize the interrupt event variable
+volatile bool int_event= false;
+
+// Descriptor of the DAC LTC2606
+ltc26x6_dev *device_dac;
+
+// Structure carrying the data the FFT module works with
+fft_entry *FFT_data;
+// Structure carrying measurements from the FFT module
+fft_measurements *FFT_meas;
+
+// Initialize the serial object with TX and RX pins
+Serial pc(USBTX, USBRX);
 
-Redistribution and use in source and binary forms, with or without modification, 
-are permitted provided that the following conditions are met:
-  - Redistributions of source code must retain the above copyright notice, 
-  this list of conditions and the following disclaimer.
-  - Redistributions in binary form must reproduce the above copyright notice, 
-  this list of conditions and the following disclaimer in the documentation 
-  and/or other materials provided with the distribution.  
-  - Modified versions of the software must be conspicuously marked as such.
-  - This software is licensed solely and exclusively for use with processors/products 
-  manufactured by or for Analog Devices, Inc.
-  - This software may not be combined or merged with other code in any manner 
-  that would cause the software to become subject to terms and conditions which 
-  differ from those listed here.
-  - Neither the name of Analog Devices, Inc. nor the names of its contributors 
-  may be used to endorse or promote products derived from this software without 
-  specific prior written permission.
-  - The use of this software may or may not infringe the patent rights of one or 
-  more patent holders.  This license does not release you from the requirement 
-  that you obtain separate licenses from these patent holders to use this software.
+// Initialize the drdy pin as interrupt input
+InterruptIn drdy(DRDY_PIN, PullNone);
+// Initialize the adc_rst_pin pin as digital output
+DigitalOut adc_reset(ADC_RST_PIN);
+// Initialize the buffer enable control pin as digital output
+DigitalOut buffer_en(ARD_BUF_EN_PIN);
+// Initialize the red LED control pin as digital output
+DigitalOut led_red(ARD_RED_LED_PIN);
+// Initialize the blue LED pin as digital output
+DigitalOut led_blue(ARD_BLUE_LED_PIN);
+
+/**
+ * ADC data reception interrupt from DRDY
+ * 
+ * Data is received from the ADC using the interrupt generated by the ADC's DRDY (Data Ready) pin.
+ * The interrupt triggers on the falling edge of the active-high DRDY pulse.
+ * The DRDY pulse is generated by the ADC, and its frequency depends on the ADC settings:
+ * 
+ * DRDY frequency = MCLK / ( MCLK_DIV * FILTER_OSR )
+ */
+void drdy_interrupt()
+{
+    int_event = true;
+
+    if (measured_data.count == measured_data.samples) { // Desired number of samples has been taken, set everything back
+        drdy.disable_irq();                             // Disable interrupt on DRDY pin
+        measured_data.finish = true;                    // Desired number of samples has been taken
+        measured_data.count = 0;                        // Set measured data counter to 0
+    }
+}
+/**
+ *=================================================================================================================================
+ *
+ * //////////////////////////////////////////////////    MAIN function      \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
+ *
+ *==================================================================================================================================
+ */
+int main() {
+
+    int32_t connected = FAILURE;
+    uint32_t menu;
 
-THIS SOFTWARE IS PROVIDED BY ANALOG DEVICES, INC. AND CONTRIBUTORS "AS IS" AND 
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, NON-INFRINGEMENT, 
-TITLE, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN 
-NO EVENT SHALL ANALOG DEVICES, INC. OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, 
-INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, PUNITIVE OR CONSEQUENTIAL DAMAGES 
-(INCLUDING, BUT NOT LIMITED TO, DAMAGES ARISING OUT OF CLAIMS OF INTELLECTUAL 
-PROPERTY RIGHTS INFRINGEMENT; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS 
-OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 
-THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 
-NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, 
-EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+    sdpk1_gpio_setup();                                                     // Setup SDP-K1 GPIOs
+    adc_hard_reset();                                                       // Perform ADC hard reset
+    connected = ad77681_setup(&device_adc, init_params, &current_status);   // SETUP and check connection
+    if(connected == FAILURE)
+        go_to_error();      
+    ltc26x6_init(&device_dac, init_params_dac);                             // Initialize DAC
+    
+    #ifdef CN0540_ADI_FFT_H_                                                // If cn0540_adi_fft.h is included, initialize it
+    FFT_init_params(&FFT_data, &FFT_meas);                                  // Initialize FFT structure and allocate space
+    update_FFT_enviroment(device_adc->vref, device_adc->mclk, device_adc->sample_rate, FFT_data);   // Update the Vref, Mclk, Sampling rate
+    measured_data.samples = 4096;                                           // Initialize FFT with 4096 samples
+    FFT_init(measured_data.samples, FFT_data);                              // Update sample count for FFT
+    #endif // CN0540_ADI_FFT_H_                                                    // FFT module is initialized with 4096 samples and 7-term BH window
 
-2019-01-10-7CBSD SLA
-*/
+    print_title();
+    print_prompt();
 
-#include "mbed.h"
-
-// LED Blinking rate in milliseconds (Note: need to define the unit of a time duration i.e. seconds(s) or milliseconds(ms))
-#define SLEEP_TIME                  500ms
-
-// Initialise the digital pin that controls LED1
-DigitalOut led(LED1);
-// Initialise the serial object with TX and RX pins
-static BufferedSerial  serial_port(USBTX, USBRX);
-
-// The File handler is needed to allow printf commands to write to the terminal
-FileHandle *mbed::mbed_override_console(int fd)
-{
-    return &serial_port;
+//============ MAIN WHILE ====================//    
+    while(1)
+    {
+        
+        if (pc.readable()) { // Settings menu SWITCH
+            getUserInput(&menu);
+            
+            switch (menu) {     
+            case 1:
+                menu_1_set_adc_powermode();     // Set ADC power mode
+                break;
+            case 2:
+                menu_2_set_adc_clock_divider(); // Set ADC clock divider
+                break;
+            case 3:
+                menu_3_set_adc_filter_type();   // Set ad7768-1 filter type
+                break;
+            case 4:
+                menu_4_adc_buffers_controll();  // Set ADC AIN & Reference buffers
+                break;
+            case 5:
+                menu_5_set_default_settings();  // Set ADC default value
+                break;
+            case 6:
+                menu_6_set_adc_vcm();           // Set ADC VCM
+                break;  
+            case 7:
+                menu_7_adc_read_register();     // Read ADC register
+                break;
+            case 8:
+                menu_8_adc_cont_read_data();    // Perform continuous ADC read
+                break;
+            case 9:
+                menu_9_reset_ADC();             // Reset ADC
+                break;
+            case 10:
+                menu_10_power_down();           // ADC Wake up and sleep 
+                break;
+            case 11:
+                menu_11_ADC_GPIO();             // Set ADC GPIOs
+                break;
+            case 12:
+                menu_12_read_master_status();   // Read ADC master status
+                break;
+            case 13:
+                menu_13_mclk_vref();            // Set ADC MCLK / Vref
+                break;
+            case 14:
+                menu_14_print_measured_data();  // Print continuous ADC read data
+                break;
+            case 15:
+                menu_15_set_adc_data_output_mode(); // Set ADC data output mode 
+                break;
+            case 16:
+                menu_16_set_adc_diagnostic_mode();  // Set ADC to diagnostic mode
+                break;
+            case 17:
+                menu_17_do_the_fft();               // Perform FFT
+                break;
+            case 18:
+                menu_18_fft_settings();             // Change FFT settings
+                break;
+            case 19:
+                menu_19_gains_offsets();            // Set ADC gain and offset
+                break;
+            case 20:
+                menu_20_check_scratchpad();         // Perform scratchpad check
+                break;
+            case 21:
+                menu_21_piezo_offset();             // Compensate piezo offset
+                break;
+            case 22:
+                menu_22_set_DAC_output();           // Set DAC output mode
+                break;
+            default:
+                pc.printf("Invalid option");
+                print_prompt();
+                break;
+            }
+        }
+    }   
 }
 
-// main() runs in its own thread in the OS
-int main()
+/**
+ * Error warning, in case of unsuccessful SPI connection
+ *
+ */
+void static go_to_error()
 {
-    // printing the Mbed OS version this example was written to the console
-    printf("This Application has been developed on Mbed OS version 6.4\r\n");
-    
-    // printing the actual Mbed OS version that this application is using to the console.
-    printf(
-        "Mbed OS version %d.%d.%d is what this applicaiton is currently using\r\n",
-        MBED_MAJOR_VERSION,
-        MBED_MINOR_VERSION,
-        MBED_PATCH_VERSION
-    );
-    
-    // The loop will toggle the LED every 500ms(SLEEP_TIME = 500ms) and print LED1s current state to the terminal
+    int32_t connected = FAILURE;
+    uint8_t scratchpad_sequence = 0xAD;
     while (1) {
-        led = !led;          // toggle LED1 state
-        printf("LED1 state: %d \r\n", (uint8_t)led);
-        ThisThread::sleep_for(SLEEP_TIME);
+        pc.printf("ERROR: NOT CONNECTED\nCHECK YOUR PHYSICAL CONNECTION\n\n");  // When not connected, keep showing error message
+        wait(5);
+        connected = ad77681_setup(&device_adc, init_params, &current_status);   // Keep trying to connect
+        if (connected == SUCCESS) {
+            pc.printf("SUCCESSFULLY RECONNECTED\n\n");                          // If reading from the scratchpad succeeds, init the ADC and go back
+            break;
+        }
     }
 }
 
+/**
+ * Print title
+ *
+ */
+void static print_title() {
+    pc.printf("\n\r");
+    pc.printf("****************************************************************\n");
+    pc.printf("*      EVAL-CN0540-ARDZ Demonstration Program -- (mbed)        *\n");
+    pc.printf("*                                                              *\n");
+    pc.printf("*   This program demonstrates IEPE / ICP piezo accelerometer   *\n");
+    pc.printf("*        interfacing and FFT measurements using AD7768-1       *\n");
+    pc.printf("*           Precision 24-bit sigma-delta AD converter          *\n");
+    pc.printf("*                                                              *\n");
+    pc.printf("* Set baud rate to 115200 and select the newline terminator.   *\n");
+    pc.printf("****************************************************************\n");
+}
+
+/**
+ * Print main menu to console
+ *
+ */
+void static print_prompt() {
+    pc.printf("\n\nCommand Summary:\n\n");
+    pc.printf("  1  - Set ADC power mode\n");
+    pc.printf("  2  - Set ADC MCLK divider\n");
+    pc.printf("  3  - Set ADC filter type\n");
+    pc.printf("  4  - Set ADC AIN and REF buffers\n");
+    pc.printf("  5  - Set ADC to default config\n");
+    pc.printf("  6  - Set ADC VCM output\n");
+    pc.printf("  7  - Read desired ADC register\n");
+    pc.printf("  8  - Read continuous ADC data\n");
+    pc.printf("  9  - Reset ADC\n");
+    pc.printf("  10 - ADC Power-down\n");
+    pc.printf("  11 - Set ADC GPIOs\n");
+    pc.printf("  12 - Read ADC master status\n");
+    pc.printf("  13 - Set ADC Vref and MCLK\n");
+    pc.printf("  14 - Print ADC measured data\n");
+    pc.printf("  15 - Set ADC data output mode\n");
+    pc.printf("  16 - Set ADC diagnostic mode\n");
+    pc.printf("  17 - Do the FFT\n");
+    pc.printf("  18 - FFT settings\n");
+    pc.printf("  19 - Set ADC Gains, Offsets\n");
+    pc.printf("  20 - ADC Scratchpad Check\n");
+    pc.printf("  21 - Compensate Piezo sensor offset\n");
+    pc.printf("  22 - Set DAC output\n");
+    pc.printf("\n\r");
+}
+
+/**
+ * Read user input from uart
+ * *UserInput = 0 if failure
+ *
+ */
+int32_t static getUserInput(uint32_t *UserInput)
+{
+    long uart_val;
+    int32_t ret;
+
+    ret = pc.scanf("%ld", &uart_val);       // Returns 1 on success, 0 on a matching failure, EOF on input failure
+
+    if((ret != 1) || (uart_val < 0)) {      // Failure if nothing was parsed (non-digit, EOF) or uart_val is negative
+        *UserInput = 0;
+        return FAILURE;
+    }
+    *UserInput = (uint32_t)(uart_val);
+    return SUCCESS;
+}
+
+/**
+ * Set power mode
+ *
+ */
+ void static menu_1_set_adc_powermode(void)
+ {  
+    uint32_t new_pwr_mode;
+    
+    pc.printf(" Available power modes: \n");
+    pc.printf("  1 - Low power mode\n");
+    pc.printf("  2 - Median power mode\n");
+    pc.printf("  3 - Fast power mode\n");
+    pc.printf(" Select an option: \n");
+
+    getUserInput(&new_pwr_mode);    
+    pc.putc('\n');
+    
+    switch (new_pwr_mode) { 
+    case 1:
+        ad77681_set_power_mode(device_adc, AD77681_ECO);
+        pc.printf(" Low power mode selected\n");
+        break;
+    case 2:
+        ad77681_set_power_mode(device_adc, AD77681_MEDIAN);
+        pc.printf(" Median power mode selected\n");
+        break;
+    case 3:
+        ad77681_set_power_mode(device_adc, AD77681_FAST);
+        pc.printf(" Fast power mode selected\n");
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    print_prompt();
+}
+
+/**
+ * Set clock divider
+ *
+ */
+ void static menu_2_set_adc_clock_divider(void)
+ {
+    uint32_t new_mclk_div;
+    
+    pc.printf(" Available MCLK divide options: \n");
+    pc.printf("  1 - MCLK/16\n");
+    pc.printf("  2 - MCLK/8\n");
+    pc.printf("  3 - MCLK/4\n");
+    pc.printf("  4 - MCLK/2\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_mclk_div);    
+    pc.putc('\n');
+    
+    switch (new_mclk_div) { 
+    case 1:
+        ad77681_set_mclk_div(device_adc, AD77681_MCLK_DIV_16);
+        pc.printf(" MCLK/16 selected\n");
+        break;
+    case 2:
+        ad77681_set_mclk_div(device_adc, AD77681_MCLK_DIV_8);
+        pc.printf(" MCLK/8 selected\n");
+        break;
+    case 3:
+        ad77681_set_mclk_div(device_adc, AD77681_MCLK_DIV_4);
+        pc.printf(" MCLK/4 selected\n");
+        break;
+    case 4:
+        ad77681_set_mclk_div(device_adc, AD77681_MCLK_DIV_2);
+        pc.printf(" MCLK/2 selected\n");
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    // Update the sample rate after changing the MCLK divider
+    ad77681_update_sample_rate(device_adc);         
+    print_prompt();
+}
+
+/**
+ * Set filter type
+ *
+ */
+ void static menu_3_set_adc_filter_type(void) 
+{
+    uint32_t new_filter = 0;
+    int32_t ret;
+    
+    pc.printf(" Available filter types: \n");
+    pc.printf(" 1 - SINC3 Filter\n");
+    pc.printf(" 2 - SINC5 Filter\n");
+    pc.printf(" 3 - Low ripple FIR Filter\n");
+    pc.printf(" 4 - SINC3 50/60Hz rejection\n");
+    pc.printf(" 5 - User-defined FIR filter\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_filter);  
+    pc.putc('\n');
+    
+    switch (new_filter) {   
+    case 1:
+        set_adc_SINC3_filter();
+        break;      
+    case 2:
+        set_adc_SINC5_filter();
+        break;      
+    case 3:
+        set_adc_FIR_filter();
+        break;  
+    case 4:
+        set_adc_50HZ_rej();
+        break;  
+    case 5:
+        set_adc_user_defined_FIR();
+        break;      
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    // Update the sample rate after changing the Filter type
+    ad77681_update_sample_rate(device_adc); 
+    print_prompt();
+}
+
+/**
+ * Set SINC3 filter
+ *
+ */
+void static set_adc_SINC3_filter(void)
+{   
+    uint32_t new_sinc3 = 0;
+    int32_t ret;
+    
+    pc.printf(" SINC3 filter Oversampling ratios: \n");
+    pc.printf(" OSR is calculated as (x + 1)*32\n");
+    pc.printf(" x is SINC3 OSR register value\n");
+    pc.printf(" Please input a value from 0 to 8192 = 2^13\n  :");
+        
+    ret = getUserInput(&new_sinc3);
+
+    if ((new_sinc3 <= 8192) && (ret == SUCCESS)) {   // new_sinc3 is unsigned, so no lower-bound check is needed
+        pc.printf("%lu\n", (unsigned long)new_sinc3);
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx32, AD77681_SINC3, new_sinc3);
+        pc.printf(" SINC3 OSR is set to %lu\n", (unsigned long)((new_sinc3 + 1) * 32));
+    } else {
+        pc.printf("%lu\n", (unsigned long)new_sinc3);
+        pc.printf(" Invalid option - value out of range or not a number\n");
+    }    
+}
+
+/**
+ * Set SINC5 filter
+ *
+ */
+void static set_adc_SINC5_filter(void)
+{
+    uint32_t new_sinc5;
+    
+    pc.printf(" SINC5 filter Oversampling ratios: \n");
+    pc.printf("  1 - Oversampled by 8\n");
+    pc.printf("  2 - Oversampled by 16\n");
+    pc.printf("  3 - Oversampled by 32\n");
+    pc.printf("  4 - Oversampled by 64\n");
+    pc.printf("  5 - Oversampled by 128\n");
+    pc.printf("  6 - Oversampled by 256\n");
+    pc.printf("  7 - Oversampled by 512\n");
+    pc.printf("  8 - Oversampled by 1024\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_sinc5);
+        
+    pc.putc('\n');
+        
+    switch (new_sinc5) {
+    case 1:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx32, AD77681_SINC5_DECx8, 0);
+        pc.printf(" SINC5 with OSRx8 set\n");
+        break;
+    case 2:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx32, AD77681_SINC5_DECx16, 0);
+        pc.printf(" SINC5 with OSRx16 set\n");
+        break;
+    case 3:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx32, AD77681_SINC5, 0);
+        pc.printf(" SINC5 with OSRx32 set\n");
+        break;
+    case 4:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx64, AD77681_SINC5, 0);
+        pc.printf(" SINC5 with OSRx64 set\n");
+        break;
+    case 5:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx128, AD77681_SINC5, 0);
+        pc.printf(" SINC5 with OSRx128 set\n");
+        break;
+    case 6:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx256, AD77681_SINC5, 0);
+        pc.printf(" SINC5 with OSRx256 set\n");
+        break;
+    case 7:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx512, AD77681_SINC5, 0);
+        pc.printf(" SINC5 with OSRx512 set\n");
+        break;
+    case 8:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx1024, AD77681_SINC5, 0);
+        pc.printf(" SINC5 with OSRx1024 set\n");
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }   
+}
+
+/**
+ * Set FIR filter
+ *
+ */
+void static set_adc_FIR_filter(void)
+{
+    uint32_t new_fir;
+    
+    pc.printf(" FIR filter Oversampling ratios: \n");
+    pc.printf("  1 - Oversampled by 32\n");
+    pc.printf("  2 - Oversampled by 64\n");
+    pc.printf("  3 - Oversampled by 128\n");
+    pc.printf("  4 - Oversampled by 256\n");
+    pc.printf("  5 - Oversampled by 512\n");
+    pc.printf("  6 - Oversampled by 1024\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_fir);
+        
+    pc.putc('\n');
+        
+    switch (new_fir) {
+    case 1:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx32, AD77681_FIR, 0);
+        pc.printf(" FIR with OSRx32 set\n");
+        break;
+    case 2:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx64, AD77681_FIR, 0);
+        pc.printf(" FIR with OSRx64 set\n");
+        break;
+    case 3:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx128, AD77681_FIR, 0);
+        pc.printf(" FIR with OSRx128 set\n");
+        break;
+    case 4:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx256, AD77681_FIR, 0);
+        pc.printf(" FIR with OSRx256 set\n");
+        break;
+    case 5:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx512, AD77681_FIR, 0);
+        pc.printf(" FIR with OSRx512 set\n");
+        break;
+    case 6:
+        ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx1024, AD77681_FIR, 0);
+        pc.printf(" FIR with OSRx1024 set\n");
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }           
+}
+
+/**
+ * Set 50HZ rejection bit when SINC3 is being used
+ *
+ */
+void static set_adc_50HZ_rej(void)
+{
+    uint32_t new_50Hz;
+    
+    pc.printf(" SINC3 50/60Hz rejection: \n");
+    pc.printf("  1 - 50/60Hz rejection enable \n");
+    pc.printf("  2 - 50/60Hz rejection disable \n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_50Hz);
+        
+    pc.putc('\n');
+        
+    switch (new_50Hz)
+    {
+    case 1:
+        ad77681_set_50HZ_rejection(device_adc, ENABLE);
+        pc.printf(" SINC3 50/60Hz rejection enabled\n");
+        break;
+    case 2:
+        ad77681_set_50HZ_rejection(device_adc, DISABLE);
+        pc.printf(" SINC3 50/60Hz rejection disabled\n");
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }   
+}
+
+/**
+ * Insert user-defined FIR filter coeffs
+ *
+ */
+void static set_adc_user_defined_FIR(void) 
+{
+    const uint8_t coeff_reg_length = 56; // Maximum allowed number of coefficients in the coeff register                                        
+
+    pc.printf(" User Defined FIR filter\n");
+    
+    if ((ARRAY_SIZE(programmable_FIR) <= coeff_reg_length) && (count_of_active_coeffs <= coeff_reg_length)) {
+        pc.printf("  Applying user-defined FIR filter coefficients from 'FIR_user_coeffs.h'\n");
+        ad77681_programmable_filter(device_adc, programmable_FIR, count_of_active_coeffs);
+        pc.printf(" Coeffs inserted successfully\n");
+    } else
+        pc.printf("  Incorrect count of coefficients in 'FIR_user_coeffs.h'\n");    
+}
+
+/**
+ * AIN and REF buffers control
+ *
+ */
+ void static menu_4_adc_buffers_controll(void)
+ {
+    uint32_t new_AIN_buffer = 0, new_REF_buffer = 0, new_buffer = 0;
+    
+    pc.printf(" Buffers settings: \n");
+    pc.printf("  1 - Set AIN precharge buffers\n");
+    pc.printf("  2 - Set REF buffers\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_buffer);
+    pc.putc('\n');
+    
+    switch (new_buffer) {
+    case 1:
+        pc.printf(" Analog IN precharge buffers settings: \n");
+        pc.printf("  1 - Turn ON  both precharge buffers\n");
+        pc.printf("  2 - Turn OFF both precharge buffers\n");
+        pc.printf("  3 - Turn ON  AIN- precharge buffer\n");
+        pc.printf("  4 - Turn OFF AIN- precharge buffer\n");
+        pc.printf("  5 - Turn ON  AIN+ precharge buffer\n");
+        pc.printf("  6 - Turn OFF AIN+ precharge buffer\n");
+        pc.printf(" Select an option: \n");
+                
+        getUserInput(&new_AIN_buffer);
+        pc.putc('\n');
+        
+        switch (new_AIN_buffer) {
+        case 1:
+            ad77681_set_AINn_buffer(device_adc, AD77681_AINn_ENABLED);
+            ad77681_set_AINp_buffer(device_adc, AD77681_AINp_ENABLED);
+            pc.printf(" AIN+ and AIN- enabled\n");
+            break;
+        case 2:
+            ad77681_set_AINn_buffer(device_adc, AD77681_AINn_DISABLED);
+            ad77681_set_AINp_buffer(device_adc, AD77681_AINp_DISABLED);
+            pc.printf(" AIN+ and AIN- disabled\n");
+            break;
+        case 3:
+            ad77681_set_AINn_buffer(device_adc, AD77681_AINn_ENABLED);
+            pc.printf(" AIN- Enabled\n");
+            break;
+        case 4:
+            ad77681_set_AINn_buffer(device_adc, AD77681_AINn_DISABLED);
+            pc.printf(" AIN- Disabled\n");
+            break;
+        case 5:
+            ad77681_set_AINp_buffer(device_adc, AD77681_AINp_ENABLED);
+            pc.printf(" AIN+ Enabled\n");
+            break;
+        case 6:
+            ad77681_set_AINp_buffer(device_adc, AD77681_AINp_DISABLED);
+            pc.printf(" AIN+ Disabled\n");
+            break;
+        default:
+            pc.printf(" Invalid option\n");
+            break;
+        }       
+        break;
+    case 2:
+        pc.printf(" REF buffers settings: \n");
+        pc.printf("  1 - Full REF- reference buffer\n");
+        pc.printf("  2 - Full REF+ reference buffer\n");
+        pc.printf("  3 - Unbuffered REF- reference buffer\n");
+        pc.printf("  4 - Unbuffered REF+ reference buffer\n");
+        pc.printf("  5 - Precharge  REF- reference buffer\n");
+        pc.printf("  6 - Precharge  REF+ reference buffer\n");
+        pc.printf(" Select an option: \n");
+        
+        getUserInput(&new_REF_buffer);
+        pc.putc('\n');
+        
+        switch (new_REF_buffer) {
+        case 1:
+            ad77681_set_REFn_buffer(device_adc, AD77681_BUFn_FULL_BUFFER_ON);
+            pc.printf(" Fully buffered REF-\n");
+            break;
+        case 2:
+            ad77681_set_REFp_buffer(device_adc, AD77681_BUFp_FULL_BUFFER_ON);
+            pc.printf(" Fully buffered REF+\n");
+            break;
+        case 3:
+            ad77681_set_REFn_buffer(device_adc, AD77681_BUFn_DISABLED);
+            pc.printf(" Unbuffered REF-\n");
+            break;
+        case 4:
+            ad77681_set_REFp_buffer(device_adc, AD77681_BUFp_DISABLED);
+            pc.printf(" Unbuffered REF+\n");
+            break;
+        case 5:
+            ad77681_set_REFn_buffer(device_adc, AD77681_BUFn_ENABLED);
+            pc.printf(" Precharge buffer on REF-\n");
+            break;
+        case 6:
+            ad77681_set_REFp_buffer(device_adc, AD77681_BUFp_ENABLED);
+            pc.printf(" Precharge buffer on REF+\n");
+            break;
+        default:
+            pc.printf(" Invalid option\n");
+            break;
+        }       
+        break;      
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    print_prompt();
+}
+
+/**
+ * Default ADC Settings
+ *
+ */
+ void static menu_5_set_default_settings(void)
+ {  
+    int32_t default_settings_flag  = ad77681_setup(&device_adc, init_params, &current_status);
+    
+    if (default_settings_flag == SUCCESS)
+        pc.printf("\n Default ADC settings successful\n");
+    else
+        pc.printf("\n Error in settings, please reset the ADC\n");
+    print_prompt();
+}
+
+/**
+ * VCM output control
+ *
+ */
+ void static menu_6_set_adc_vcm(void)
+ {
+    uint32_t new_vcm = 0;
+    
+    pc.printf(" Available VCM output voltage levels: \n");
+    pc.printf("  1 - VCM = (AVDD1-AVSS)/2\n");
+    pc.printf("  2 - VCM = 2.5V\n");
+    pc.printf("  3 - VCM = 2.05V\n");
+    pc.printf("  4 - VCM = 1.9V\n");
+    pc.printf("  5 - VCM = 1.65V\n");
+    pc.printf("  6 - VCM = 1.1V\n");
+    pc.printf("  7 - VCM = 0.9V\n");
+    pc.printf("  8 - VCM off\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_vcm);
+    pc.putc('\n');
+    
+    switch (new_vcm) {
+    
+    case 1:
+        ad77681_set_VCM_output(device_adc, AD77681_VCM_HALF_VCC);
+        pc.printf(" VCM set to half of the Vcc\n");
+        break;
+    case 2:
+        ad77681_set_VCM_output(device_adc, AD77681_VCM_2_5V);
+        pc.printf(" VCM set to 2.5V\n");
+        break;
+    case 3:
+        ad77681_set_VCM_output(device_adc, AD77681_VCM_2_05V);
+        pc.printf(" VCM set to 2.05V\n");
+        break;
+    case 4:
+        ad77681_set_VCM_output(device_adc, AD77681_VCM_1_9V);
+        pc.printf(" VCM set to 1.9V\n");
+        break;
+    case 5:
+        ad77681_set_VCM_output(device_adc, AD77681_VCM_1_65V);
+        pc.printf(" VCM set to 1.65V\n");
+        break;
+    case 6:
+        ad77681_set_VCM_output(device_adc, AD77681_VCM_1_1V);
+        pc.printf(" VCM set to 1.1V\n");
+        break;
+    case 7:
+        ad77681_set_VCM_output(device_adc, AD77681_VCM_0_9V);
+        pc.printf(" VCM set to 0.9V\n");
+        break;      
+    case 8:
+        ad77681_set_VCM_output(device_adc, AD77681_VCM_OFF);
+        pc.printf(" VCM OFF\n");
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    print_prompt();
+}
+
+/**
+ * Register read
+ *
+ */
+ void static menu_7_adc_read_register(void)
+ {
+    uint32_t new_reg_to_read = 0;
+    uint8_t reg_read_buf[3], read_adc_data[6], hex_number = 0;
+    uint8_t HI = 0, MID = 0, LO = 0;
+    char binary_number[8], other_register[2] = "";
+    double voltage;
+    
+    pc.printf(" Read desired register: \n");
+    pc.printf("  1 - 0x03        - Chip type\n");
+    pc.printf("  2 - 0x14        - Interface format\n");
+    pc.printf("  3 - 0x15        - Power clock\n");
+    pc.printf("  4 - 0x16        - Analog\n");
+    pc.printf("  5 - 0x17        - Analog2\n");
+    pc.printf("  6 - 0x18        - Conversion\n");
+    pc.printf("  7 - 0x19        - Digital filter\n");
+    pc.printf("  8 - 0x1A        - SINC3 Dec. rate MSB\n");
+    pc.printf("  9 - 0x1B        - SINC3 Dec. rate LSB\n");
+    pc.printf(" 10 - 0x1C        - Duty cycle ratio\n");
+    pc.printf(" 11 - 0x1D        - Sync, Reset\n");
+    pc.printf(" 12 - 0x1E        - GPIO Control\n");
+    pc.printf(" 13 - 0x1F        - GPIO Write\n");
+    pc.printf(" 14 - 0x20        - GPIO Read\n");
+    pc.printf(" 15 - 0x21 - 0x23 - Offset register\n");
+    pc.printf(" 16 - 0x24 - 0x26 - Gain register\n");
+    pc.printf(" 17 - 0x2C        - ADC Data\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_reg_to_read);
+    pc.putc('\n');
+    
+    switch (new_reg_to_read) {  
+    case 1:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_CHIP_TYPE, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x03 - Chip type register is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 2:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_INTERFACE_FORMAT, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x14 - Interface format register is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 3:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_POWER_CLOCK, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x15 - Power clock register is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 4:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_ANALOG, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x16 - Analog register is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 5:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_ANALOG2, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x17 - Analog2 register is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 6:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_CONVERSION, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x18 - Conversion register is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 7:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_DIGITAL_FILTER, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x19 - Digital filter register is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 8:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_SINC3_DEC_RATE_MSB, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x1A - SINC3 Dec. rate MSB is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 9:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_SINC3_DEC_RATE_LSB, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x1B - SINC3 Dec. rate LSB is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 10:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_DUTY_CYCLE_RATIO, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x1C - Duty cycle ratio is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 11:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_SYNC_RESET, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x1D - Sync, Reset is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 12:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_GPIO_CONTROL, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x1E - GPIO Control is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 13:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_GPIO_WRITE, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x1F - GPIO Write is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 14:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_GPIO_READ, reg_read_buf);
+        print_binary(reg_read_buf[1], binary_number);
+        pc.printf(" Value of 0x20 - GPIO Read is: 0x%x  0b%s\n", reg_read_buf[1], binary_number);
+        break;
+    case 15:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_OFFSET_HI, reg_read_buf);
+        HI = reg_read_buf[1];
+        
+        ad77681_spi_reg_read(device_adc, AD77681_REG_OFFSET_MID, reg_read_buf);
+        MID = reg_read_buf[1];
+        
+        ad77681_spi_reg_read(device_adc, AD77681_REG_OFFSET_LO, reg_read_buf);
+        LO = reg_read_buf[1];
+        
+        pc.printf(" Value of 0x21 - 0x23 - Offset register is: 0x%x %x %x\n", HI, MID, LO);
+        break;
+        
+    case 16:
+        ad77681_spi_reg_read(device_adc, AD77681_REG_GAIN_HI, reg_read_buf);
+        HI = reg_read_buf[1];
+        
+        ad77681_spi_reg_read(device_adc, AD77681_REG_GAIN_MID, reg_read_buf);
+        MID = reg_read_buf[1];
+        
+        ad77681_spi_reg_read(device_adc, AD77681_REG_GAIN_LO, reg_read_buf);
+        LO = reg_read_buf[1];
+        
+        pc.printf(" Value of 0x24 - 0x26 - Gain register is: 0x%x %x %x\n", HI, MID, LO);
+        break;  
+    case 17:
+        ad77681_spi_read_adc_data(device_adc, read_adc_data, AD77681_REGISTER_DATA_READ);
+        pc.printf(" Value of 0x2C - ADC data is: 0x%x 0x%x 0x%x\n", read_adc_data[1], read_adc_data[2], read_adc_data[3]);
+        break;  
+
+    default :
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    print_prompt();
+}
+
+/**
+ * Read ADC data
+ *
+ */
+void static menu_8_adc_cont_read_data(void)
+{
+    uint32_t new_sample_count = 0;
+    int32_t ret;
+
+    pc.printf(" Read Continuous ADC Data\n");
+    pc.printf("  Input number of samples (1 to 4096): \n");
+    ret = getUserInput(&new_sample_count);                  // Get user input
+
+    if ((new_sample_count >= 1) && (new_sample_count <= 4096) && (ret == SUCCESS)) {
+        pc.printf("\n%d samples\n", new_sample_count);      // Print the requested sample count
+        measured_data.samples = (uint16_t)(new_sample_count);
+        measured_data.finish = false;
+        measured_data.count = 0;
+        pc.printf("Sampling....\n");
+        cont_sampling();
+        pc.printf("Done Sampling....\n");
+    } else {
+        pc.printf(" Invalid option\n");
+    }
+    print_prompt();
+}
+
+/**
+ * ADC Continuous read
+ *
+ */
+void static cont_sampling()
+{      
+    uint8_t buf[6];
+     
+    ad77681_set_continuos_read(device_adc, AD77681_CONTINUOUS_READ_ENABLE);
+    __enable_irq();                                             // Enable all interrupts
+    drdy.enable_irq();                                          // Enable interrupt on DRDY pin
+    drdy.fall(&drdy_interrupt);                                 // Interrupt on falling edge of DRDY
+
+    while (!measured_data.finish) { // While loop. Waiting for the measurements to be completed
+        if (int_event==true) {      // Checks if Interrupt Occurred
+            ad77681_spi_read_adc_data(device_adc, buf, AD77681_CONTINUOUS_DATA_READ);    // Read the continuous read data
+            if (device_adc->conv_len == AD77681_CONV_24BIT)                                                 // 24bit format
+                measured_data.raw_data[measured_data.count] = (buf[0] << 16 | buf[1] << 8 | buf[2]<< 0);    // Combining the SPI buffers
+            else                                                                                            // 16bit format
+                measured_data.raw_data[measured_data.count] = (buf[0] << 8 | buf[1]<< 0);                   // Combining the SPI buffers
+            measured_data.count++;  // Increment Measured Data Counter
+            int_event=false;        // Clear the interrupt event flag after reading the data
+        }
+    }
+    ad77681_set_continuos_read(device_adc, AD77681_CONTINUOUS_READ_DISABLE);    // Disable continuous ADC read
+}
+
+/**
+ * Reset ADC
+ *
+ */
+ void static menu_9_reset_ADC(void)
+ {
+    uint32_t new_reset_option = 0;
+    
+    pc.printf(" ADC reset options: \n");
+    pc.printf("  1 - Soft reset - over SPI\n");
+    pc.printf("  2 - Hard reset - using RESET pin\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_reset_option);
+    pc.putc('\n');
+    
+    switch (new_reset_option) {
+    case 1:
+        ad77681_soft_reset(device_adc);             // Perform soft reset thru SPI write
+        pc.printf(" ADC after soft reset\n");
+        break;
+    case 2:
+        adc_hard_reset();
+        pc.printf(" ADC after hard reset\n");
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    print_prompt();
+}
+
+/**
+ * Reset ADC thru SDP-K1 GPIO
+ * 
+ * 
+ */
+ void static adc_hard_reset()
+{
+    adc_reset = 0;  // Set ADC reset pin to Low
+    mdelay(100);    // Delay 100ms
+    adc_reset = 1;  // Set ADC reset pin to High
+    mdelay(100);    // Delay 100ms
+}
+
+/**
+ * Sleep mode / Wake up ADC
+ *
+ */
+ void static menu_10_power_down(void)
+ {
+    uint32_t new_sleep = 0;
+    
+    pc.printf(" Control sleep mode of the ADC: \n");
+    pc.printf("  1 - Put ADC to sleep mode\n");
+    pc.printf("  2 - Wake up ADC\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_sleep);
+    pc.putc('\n');
+    
+    switch (new_sleep) {
+    case 1:
+        ad77681_power_down(device_adc, AD77681_SLEEP);
+        pc.printf(" ADC put to sleep mode\n");
+        break;
+    case 2:
+        ad77681_power_down(device_adc, AD77681_WAKE);
+        pc.printf(" ADC powered\n");
+        break;
+    default:
+        pc.printf("Invalid option\n");
+        break;          
+    }
+    print_prompt();
+}
+
+/**
+ * ADC's GPIO Control
+ *
+ */
+ void static menu_11_ADC_GPIO(void) 
+ {
+    uint8_t GPIO_state;
+    uint32_t new_gpio_sel = 0;
+    char binary_number[8];
+    int32_t ret_val = FAILURE, ret;
+    
+    pc.printf(" ADC GPIO Control: \n");
+    pc.printf("  1 - Read from GPIO\n");
+    pc.printf("  2 - Write to  GPIO\n");
+    pc.printf("  3 - Set GPIO as input / output\n");
+    pc.printf("  4 - Change GPIO settings\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_gpio_sel);
+    pc.putc('\n');
+    
+    switch (new_gpio_sel) {     
+    case 1:
+        ad77681_gpio_read(device_adc, &GPIO_state, AD77681_ALL_GPIOS);
+        print_binary(GPIO_state, binary_number);
+        pc.printf(" Current GPIO Values:\n GPIO0: %c\n GPIO1: %c\n GPIO2: %c\n GPIO3: %c\n", binary_number[7], binary_number[6], binary_number[5], binary_number[4]);
+        break;      
+    case 2:
+        adc_GPIO_write();
+        break;  
+    case 3:
+        adc_GPIO_inout();       
+        break;          
+    case 4:
+        adc_GPIO_settings();        
+        break;      
+    default:
+        pc.printf(" Invalid option\n");
+        break;  
+    }
+    print_prompt();
+}
+
+/**
+ * Write to GPIOs, part of the ADC_GPIO function
+ *
+ */
+void static adc_GPIO_write(void)
+{
+    uint32_t new_gpio_write = 0, new_value = 0;
+    int32_t ret, ret_val;
+    
+    pc.printf(" Write to GPIO: \n");
+    pc.printf("  1 - Write to all GPIOs\n");
+    pc.printf("  2 - Write to GPIO0\n");
+    pc.printf("  3 - Write to GPIO1\n");
+    pc.printf("  4 - Write to GPIO2\n");
+    pc.printf("  5 - Write to GPIO3\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_gpio_write);      
+    pc.putc('\n');
+        
+    switch (new_gpio_write)
+    {
+    case 1:
+        pc.printf("Insert value to be written into all GPIOs, same value for all GPIOs: ");
+        ret = getUserInput(&new_value);
+            
+        if (((new_value == GPIO_HIGH) || (new_value == GPIO_LOW)) && (ret == SUCCESS)) {
+            new_value *= 0xF;
+            ret_val = ad77681_gpio_write(device_adc, new_value, AD77681_ALL_GPIOS);
+            pc.printf("\n Value %d successfully written to all GPIOs\n", new_value);
+        } else
+            pc.printf("\nInvalid value\n");             
+        break;          
+    case 2:
+        pc.printf("Insert value to be written into GPIO0: ");
+        ret = getUserInput(&new_value); 
+            
+        if (((new_value == GPIO_HIGH) || (new_value == GPIO_LOW)) && (ret == SUCCESS)) {
+            ret_val = ad77681_gpio_write(device_adc, new_value, AD77681_GPIO0);
+            pc.printf("\n Value %d successfully written to GPIO0\n", new_value);
+        } else
+            pc.printf("\nInvalid value\n");         
+        break;          
+    case 3:
+        pc.printf("Insert value to be written into GPIO1: ");
+        ret = getUserInput(&new_value);         
+            
+        if (((new_value == GPIO_HIGH) || (new_value == GPIO_LOW)) && (ret == SUCCESS)) {
+            ret_val = ad77681_gpio_write(device_adc, new_value, AD77681_GPIO1);
+            pc.printf("\n Value %d successfully written to GPIO1\n", new_value);
+        } else
+            pc.printf("\nInvalid value\n");         
+        break;          
+    case 4:
+        pc.printf("Insert value to be written into GPIO2: ");
+        ret = getUserInput(&new_value); 
+            
+        if (((new_value == GPIO_HIGH) || (new_value == GPIO_LOW)) && (ret == SUCCESS)) {
+            ret_val = ad77681_gpio_write(device_adc, new_value, AD77681_GPIO2);
+            pc.printf("\n Value %d successfully written to GPIO2\n", new_value);
+        } else
+            pc.printf("\nInvalid value\n");         
+        break;          
+    case 5:
+        pc.printf("Insert value to be written into GPIO3: ");
+        ret = getUserInput(&new_value); 
+            
+        if (((new_value == GPIO_HIGH) || (new_value == GPIO_LOW)) && (ret == SUCCESS)) {
+            ret_val = ad77681_gpio_write(device_adc, new_value, AD77681_GPIO3);
+            pc.printf("\n Value %d successfully written to GPIO3\n", new_value);
+        } else
+            pc.printf("\nInvalid value\n");         
+        break;          
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }   
+}
+
+/**
+ * GPIO direction, part of the ADC_GPIO function
+ *
+ */
+void static adc_GPIO_inout(void)
+{
+    uint32_t new_gpio_inout = 0, new_gpio_inout_set = 0;
+    int32_t ret_val;
+    
+    pc.printf(" Set GPIOs as input or output: \n");
+    pc.printf("  1 - Set all GPIOs\n");
+    pc.printf("  2 - Set GPIO0\n");
+    pc.printf("  3 - Set GPIO1\n");
+    pc.printf("  4 - Set GPIO2\n");
+    pc.printf("  5 - Set GPIO3\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_gpio_inout);  
+    pc.putc('\n');
+        
+    switch (new_gpio_inout) {
+    case 1:
+        pc.printf("   1 - Set all GPIOS as inputs\n");
+        pc.printf("   2 - Set all GPIOS as outputs\n");
+            
+        getUserInput(&new_gpio_inout_set);  
+        pc.putc('\n');          
+            
+        if ((new_gpio_inout_set == 1) || (new_gpio_inout_set == 2)) {
+            new_gpio_inout_set -= 1;
+            new_gpio_inout_set *= 0xF;
+            ret_val = ad77681_gpio_inout(device_adc, new_gpio_inout_set, AD77681_ALL_GPIOS);
+            pc.printf("All GPIOs successfully set");
+        } else
+            pc.printf("\nInvalid value\n");         
+        break;
+    case 2:
+        pc.printf("   1 - Set GPIO0 as input\n");
+        pc.printf("   2 - Set GPIO0 as output\n");
+            
+        getUserInput(&new_gpio_inout_set);  
+        pc.putc('\n');  
+            
+        if ((new_gpio_inout_set == 1) || (new_gpio_inout_set == 2)) {
+            new_gpio_inout_set -= 1;
+            ret_val = ad77681_gpio_inout(device_adc, new_gpio_inout_set, AD77681_GPIO0);
+            pc.printf("GPIO0 successfully set");
+        } else
+            pc.printf("\nInvalid value\n");         
+        break;          
+    case 3:
+        pc.printf("   1 - Set GPIO1 as input\n");
+        pc.printf("   2 - Set GPIO1 as output\n");
+            
+        getUserInput(&new_gpio_inout_set);  
+        pc.putc('\n');  
+            
+        if ((new_gpio_inout_set == 1) || (new_gpio_inout_set == 2)) {
+            new_gpio_inout_set -= 1;
+            ret_val = ad77681_gpio_inout(device_adc, new_gpio_inout_set, AD77681_GPIO1);
+            pc.printf("GPIO1 successfully set");
+        } else
+            pc.printf("\nInvalid value\n");         
+        break;          
+    case 4:
+        pc.printf("   1 - Set GPIO2 as input\n");
+        pc.printf("   2 - Set GPIO2 as output\n");
+            
+        getUserInput(&new_gpio_inout_set);  
+        pc.putc('\n');      
+            
+        if ((new_gpio_inout_set == 1) || (new_gpio_inout_set == 2)) {
+            new_gpio_inout_set -= 1;
+            ret_val = ad77681_gpio_inout(device_adc, new_gpio_inout_set, AD77681_GPIO2);
+            pc.printf("GPIO2 successfully set");
+        } else
+            pc.printf("\nInvalid value\n");
+        break;      
+    case 5:
+        pc.printf("   1 - Set GPIO3 as input\n");
+        pc.printf("   2 - Set GPIO3 as output\n");
+            
+        getUserInput(&new_gpio_inout_set);  
+        pc.putc('\n');  
+            
+        if ((new_gpio_inout_set == 1) || (new_gpio_inout_set == 2)) {
+            new_gpio_inout_set -= 1;
+            ret_val = ad77681_gpio_inout(device_adc, new_gpio_inout_set, AD77681_GPIO3);
+            pc.printf("GPIO3 successfully set");
+        } else
+            pc.printf("\nInvalid value\n");
+        break;              
+    default:
+        pc.printf(" Invalid option\n");
+        break;  
+    }
+}
+
+/**
+ * Additional GPIO settings, part of the ADC_GPIO function
+ *
+ */
+void static adc_GPIO_settings(void)
+{
+    uint32_t new_gpio_settings = 0;
+    
+    pc.printf(" GPIO Settings: \n");
+    pc.printf("  1 - Enable  all GPIOs (Global enable)\n"); 
+    pc.printf("  2 - Disable all GPIOs (Global disable)\n");
+    pc.printf("  3 - Set GPIO0 - GPIO2 as open drain\n");
+    pc.printf("  4 - Set GPIO0 - GPIO2 as strong driver\n");
+    pc.printf(" Select an option: \n");
+        
+    getUserInput(&new_gpio_settings);   
+    pc.putc('\n');
+        
+    switch (new_gpio_settings) {
+    case 1:
+        ad77681_global_gpio(device_adc, AD77681_GLOBAL_GPIO_ENABLE);
+        pc.printf(" Global GPIO enable bit enabled");
+        break;
+    case 2:
+        ad77681_global_gpio(device_adc, AD77681_GLOBAL_GPIO_DISABLE);
+        pc.printf(" Global GPIO enable bit disabled");
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;  
+    }
+}
+
+/**
+ * Read ADC status from status registers
+ *
+ */
+ void static menu_12_read_master_status(void)
+ {  
+    uint8_t reg_read_buf[3];
+    char binary_number[8];
+    char ok[3] = { 'O', 'K' }, fault[6] = { 'F', 'A', 'U', 'L', 'T' };
+    
+    ad77681_status(device_adc, current_status);
+    pc.putc('\n');
+    pc.printf("== MASTER STATUS REGISTER\n");
+    pc.printf("Master error:          %s\n", ((current_status->master_error == 0) ? (ok) : (fault)));
+    pc.printf("ADC error:             %s\n", ((current_status->adc_error == 0) ? (ok) : (fault)));
+    pc.printf("Dig error:             %s\n", ((current_status->dig_error == 0) ? (ok) : (fault)));
+    pc.printf("Ext. clock:            %s\n", ((current_status->adc_err_ext_clk_qual == 0) ? (ok) : (fault)));
+    pc.printf("Filter saturated:      %s\n", ((current_status->adc_filt_saturated == 0) ? (ok) : (fault)));
+    pc.printf("Filter not settled:    %s\n", ((current_status->adc_filt_not_settled == 0) ? (ok) : (fault)));
+    pc.printf("SPI error:             %s\n", ((current_status->spi_error == 0) ? (ok) : (fault)));
+    pc.printf("POR Flag:              %s\n", ((current_status->por_flag == 0) ? (ok) : (fault)));
+    
+    if (current_status->spi_error == 1) {
+        pc.printf("\n== SPI DIAG STATUS REGISTER\n");
+        pc.printf("SPI ignore error:      %s\n", ((current_status->spi_ignore == 0) ? (ok) : (fault)));
+        pc.printf("SPI clock count error: %s\n", ((current_status->spi_clock_count == 0) ? (ok) : (fault)));
+        pc.printf("SPI read error:        %s\n", ((current_status->spi_read_error == 0) ? (ok) : (fault)));
+        pc.printf("SPI write error:       %s\n", ((current_status->spi_write_error == 0) ? (ok) : (fault)));
+        pc.printf("SPI CRC error:         %s\n", ((current_status->spi_crc_error == 0) ? (ok) : (fault)));
+    }
+    if (current_status->adc_error == 1) {
+        pc.printf("\n== ADC DIAG STATUS REGISTER\n");
+        pc.printf("DLDO PSM error:        %s\n", ((current_status->dldo_psm_error == 0) ? (ok) : (fault)));
+        pc.printf("ALDO PSM error:        %s\n", ((current_status->aldo_psm_error == 0) ? (ok) : (fault)));
+        pc.printf("REF DET error:         %s\n", ((current_status->ref_det_error == 0) ? (ok) : (fault)));
+        pc.printf("FILT SAT error:        %s\n", ((current_status->filt_sat_error == 0) ? (ok) : (fault)));
+        pc.printf("FILT NOT SET error:    %s\n", ((current_status->filt_not_set_error == 0) ? (ok) : (fault)));
+        pc.printf("EXT CLK QUAL error:    %s\n", ((current_status->ext_clk_qual_error == 0) ? (ok) : (fault)));
+    }   
+    if (current_status->dig_error == 1) {
+        pc.printf("\n== DIGITAL DIAG STATUS REGISTER\n");
+        pc.printf("Memory map CRC error:  %s\n", ((current_status->memoy_map_crc_error == 0) ? (ok) : (fault)));
+        pc.printf("RAM CRC error:         %s\n", ((current_status->ram_crc_error == 0) ? (ok) : (fault)));
+        pc.printf("FUSE CRC error:        %s\n", ((current_status->fuse_crc_error == 0) ? (ok) : (fault)));
+    }   
+    pc.putc('\n');
+    print_prompt();
+}
+
+/**
+ * Set Vref and MCLK as "external" values, depending on your setup
+ *
+ */
+ void static menu_13_mclk_vref(void)
+ {
+    uint32_t input = 0, new_settings = 0;
+    int32_t ret;
+        
+    pc.printf(" Set Vref and Mclk: \n");                                
+    pc.printf("  1 - Change Vref\n"); 
+    pc.printf("  2 - Change MCLK\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_settings);    
+    pc.putc('\n');
+    
+    switch (new_settings) {
+    case 1:
+        pc.printf(" Change Vref from %d mV to [mV]: ", device_adc->vref);       // Vref change
+        ret = getUserInput(&input); 
+                
+        if ((input >= 1000) && (input <= 5000) && (ret == SUCCESS)) {
+            pc.printf("\n New Vref value is %d mV", input);
+            device_adc->vref = input;
+            
+            #ifdef CN0540_ADI_FFT_H_
+            // Update the Vref, Mclk and sampling rate
+            update_FFT_enviroment(device_adc->vref, device_adc->mclk, device_adc->sample_rate, FFT_data); 
+            #endif //CN0540_ADI_FFT_H_         
+        } else
+            pc.printf(" Invalid option\n"); 
+        pc.putc('\n');
+        break;      
+    case 2:
+        pc.printf(" Change MCLK from %d kHz to [kHz]: ", device_adc->mclk);     // MCLK change
+        ret = getUserInput(&input);             
+        
+        if ((input >= 10000) && (input <= 50000) && (ret == SUCCESS)){
+            pc.printf("\n New MCLK value is %d kHz\n", input);
+            device_adc->mclk = input;
+            ad77681_update_sample_rate(device_adc);                             // Update the sample rate after changing the MCLK
+            
+            #ifdef CN0540_ADI_FFT_H_
+            // Update the Vref, Mclk and sampling rate
+            update_FFT_enviroment(device_adc->vref, device_adc->mclk, device_adc->sample_rate, FFT_data); 
+            #endif //CN0540_ADI_FFT_H_
+        } else
+            pc.printf(" Invalid option\n"); 
+        pc.putc('\n');
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    print_prompt();
+}
+
+/**
+ * Print measured data as voltages, codes, and raw values
+ *
+ */
+ void static menu_14_print_measured_data(void)
+ {
+    double voltage;
+    int32_t shifted_data;
+    uint16_t i;
+    char buf[15];
+
+    if (measured_data.finish) {
+        // Printing Voltage
+        pc.printf("\n\nVoltage\n");
+        for ( i = 0; i < measured_data.samples; i++) {
+            ad77681_data_to_voltage(device_adc, &measured_data.raw_data[i], &voltage);
+            pc.printf("%.9f \n",voltage);
+        }
+        // Printing Codes
+        pc.printf("\n\nCodes\n");
+        for(i = 0 ; i < measured_data.samples ; i++) {
+            if (measured_data.raw_data[i] & 0x800000)
+                shifted_data = (int32_t)(0xFF000000u | measured_data.raw_data[i]);  // Sign-extend negative 24-bit codes
+            else
+                shifted_data = (int32_t)measured_data.raw_data[i];
+            pc.printf("%d\n", shifted_data + AD7768_HALF_SCALE);
+        }
+        // Printing Raw Data
+        pc.printf("\n\nRaw data\n");
+        for (i = 0; i < measured_data.samples; i++)
+            pc.printf("%d\n", measured_data.raw_data[i]);
+        // Set  measured_data.finish to false after Printing
+        measured_data.finish = false;
+    } else
+        pc.printf("Data not prepared\n");
+    print_prompt();
+}
+
+/**
+ * Set data output mode
+ *
+ */
+void static menu_15_set_adc_data_output_mode(void) 
+{
+    uint32_t new_data_mode = 0, new_length = 0, new_status = 0, new_crc = 0, ret;
+    
+    pc.printf(" ADC data output modes: \n");
+    pc.printf("  1 - Continuous: waiting for DRDY\n");
+    pc.printf("  2 - Continuous one shot: waiting for SYNC_IN\n");
+    pc.printf("  3 - Single-conversion standby\n");
+    pc.printf("  4 - Periodic standby\n");
+    pc.printf("  5 - Standby mode\n");
+    pc.printf("  6 - 16bit or 24bit data format\n");
+    pc.printf("  7 - Status bit output\n");
+    pc.printf("  8 - Switch from diag mode to measure\n");
+    pc.printf("  9 - Switch from measure to diag mode\n");
+    pc.printf(" 10 - Set CRC type\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_data_mode);
+    pc.putc('\n');
+    
+    switch (new_data_mode) {
+    case 1:
+        ad77681_set_conv_mode(device_adc, AD77681_CONV_CONTINUOUS, device_adc->diag_mux_sel, device_adc->conv_diag_sel);// DIAG MUX NOT SELECTED
+        pc.printf(" Continuous mode set\n");
+        break;
+    case 2:
+        ad77681_set_conv_mode(device_adc, AD77681_CONV_ONE_SHOT, device_adc->diag_mux_sel, device_adc->conv_diag_sel);
+        pc.printf(" Continuous one shot conversion set\n");
+        break;
+    case 3:
+        ad77681_set_conv_mode(device_adc, AD77681_CONV_SINGLE, device_adc->diag_mux_sel, device_adc->conv_diag_sel);
+        pc.printf(" Single conversion standby mode set\n");
+        break;  
+    case 4:
+        ad77681_set_conv_mode(device_adc, AD77681_CONV_PERIODIC, device_adc->diag_mux_sel, device_adc->conv_diag_sel);
+        pc.printf(" Periodic standby mode set\n");
+        break;  
+    case 5:
+        ad77681_set_conv_mode(device_adc, AD77681_CONV_STANDBY, device_adc->diag_mux_sel, device_adc->conv_diag_sel);
+        pc.printf(" Standby mode set\n");
+        break;
+    case 6:
+        pc.printf(" Conversion length select: \n");
+        pc.printf("  1 - 24bit length\n");
+        pc.printf("  2 - 16bit length\n");
+        
+        getUserInput(&new_length);
+        pc.putc('\n');
+        
+        switch (new_length) {
+        case 1:
+            ad77681_set_convlen(device_adc, AD77681_CONV_24BIT);
+            pc.printf(" 24bit data output format selected\n");
+            break;
+        case 2:
+            ad77681_set_convlen(device_adc, AD77681_CONV_16BIT);
+            pc.printf(" 16bit data output format selected\n");
+            break;
+        default:
+            pc.printf(" Invalid option\n");
+            break;
+        }
+        break;
+    case 7:
+        pc.printf(" Status bit output: \n");
+        pc.printf("  1 - Enable status bit after each ADC conversion\n");
+        pc.printf("  2 - Disable status bit after each ADC conversion\n");
+        
+        getUserInput(&new_status);
+        pc.putc('\n');
+        
+        switch (new_status) {
+        case 1:
+            ad77681_set_status_bit(device_adc, true);
+            pc.printf(" Status bit enabled\n");
+            break;
+        case 2:
+            ad77681_set_status_bit(device_adc, false);
+            pc.printf(" Status bit disabled\n");
+            break;
+        default:
+            pc.printf(" Invalid option\n");
+            break;
+        }       
+        break;
+    case 8:
+        ad77681_set_conv_mode(device_adc, device_adc->conv_mode, device_adc->diag_mux_sel, false);// DIAG MUX NOT SELECTED
+        pc.printf(" Measure mode selected\n");
+        break;
+    case 9:
+        ad77681_set_conv_mode(device_adc, device_adc->conv_mode, device_adc->diag_mux_sel, true); // DIAG MUX SELECTED
+        pc.printf(" Diagnostic mode selected\n");
+        break;
+    case 10:
+        pc.printf(" CRC settings \n");
+        pc.printf("  1 - Disable CRC\n");
+        pc.printf("  2 - 8-bit polynomial CRC\n");
+        pc.printf("  3 - XOR based CRC\n");
+        
+        getUserInput(&new_crc);
+        pc.putc('\n');
+        
+        switch (new_crc) {
+        case 1:
+            ad77681_set_crc_sel(device_adc, AD77681_NO_CRC);
+            pc.printf(" CRC disabled\n");
+            break;
+        case 2:
+            ad77681_set_crc_sel(device_adc, AD77681_CRC);
+            pc.printf("  8-bit polynomial CRC method selected\n");
+            break;
+        case 3:
+            ad77681_set_crc_sel(device_adc, AD77681_XOR);
+            pc.printf("  XOR based CRC method selected\n");
+            break;
+        default:
+            pc.printf(" Invalid option\n");
+            break;
+        }
+        break;      
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    print_prompt();
+}
+
+/**
+ * Set diagnostic mode
+ *
+ */
+void static menu_16_set_adc_diagnostic_mode(void)
+{
+    uint32_t new_diag_mode = 0; 
+        
+    pc.printf(" ADC diagnostic modes: \n");
+    pc.printf("  1 - Internal temperature sensor\n");
+    pc.printf("  2 - AIN shorted\n");
+    pc.printf("  3 - Positive full-scale\n");
+    pc.printf("  4 - Negative full-scale\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_diag_mode);
+    pc.putc('\n');
+    
+    switch (new_diag_mode) {
+    case 1:
+        ad77681_set_conv_mode(device_adc, device_adc->conv_mode, AD77681_TEMP_SENSOR, true);          
+        pc.printf(" Diagnostic mode: Internal temperature sensor selected\n");
+        break;
+    case 2:
+        ad77681_set_conv_mode(device_adc, device_adc->conv_mode, AD77681_AIN_SHORT, true);
+        pc.printf(" Diagnostic mode: AIN shorted selected\n");
+        break;
+    case 3:
+        ad77681_set_conv_mode(device_adc, device_adc->conv_mode, AD77681_POSITIVE_FS, true);
+        pc.printf(" Diagnostic mode: Positive full-scale selected\n");
+        break;  
+    case 4:
+        ad77681_set_conv_mode(device_adc, device_adc->conv_mode, AD77681_NEGATIVE_FS, true);
+        pc.printf(" Diagnostic mode: Negative full-scale selected\n");
+        break;  
+    default:
+        pc.printf(" Invalid option\n");
+        break;
+    }
+    print_prompt();
+}
+
+/**
+ * Do the FFT
+ *
+ */
+void static menu_17_do_the_fft(void)
+{
+    pc.printf(" FFT in progress...\n");
+    measured_data.samples = FFT_data->fft_length * 2;
+    measured_data.samples = 4096;   // NOTE: overrides the assignment above, so 4096 samples are always captured
+    measured_data.finish = false;
+    measured_data.count = 0;
+    pc.printf("Sampling....\n");
+    cont_sampling();
+    perform_FFT(measured_data.raw_data, FFT_data, FFT_meas, device_adc->sample_rate);
+    pc.printf(" FFT Done!\n");
+    measured_data.finish = false;   
+    
+    pc.printf("\n THD:\t\t%.3f dB", FFT_meas->THD);
+    pc.printf("\n SNR:\t\t%.3f dB", FFT_meas->SNR);
+    pc.printf("\n DR:\t\t%.3f dB", FFT_meas->DR);
+    pc.printf("\n Fundamental:\t%.3f dBFS", FFT_meas->harmonics_mag_dbfs[0]);
+    pc.printf("\n Fundamental:\t%.3f Hz", FFT_meas->harmonics_freq[0]*FFT_data->bin_width);
+    pc.printf("\n RMS noise:\t%.6f uV", FFT_meas->RMS_noise * 1000000.0);
+    pc.printf("\n LSB noise:\t%.3f", FFT_meas->transition_noise_LSB);
+    
+    print_prompt(); 
+}
+
+/**
+ * Setting of the FFT module
+ *
+ */
+void static menu_18_fft_settings(void)
+{
+    uint32_t new_menu_select, new_window, new_sample_count;
+    
+    pc.printf(" FFT settings: \n");
+    pc.printf("  1 - Set window type\n");
+    pc.printf("  2 - Set sample count\n");
+    pc.printf("  3 - Print FFT plot\n");
+    pc.printf(" Select an option: \n\n");
+    
+    getUserInput(&new_menu_select);
+    
+    switch (new_menu_select) {
+    case 1:
+        pc.printf(" Choose window type:\n");
+        pc.printf("  1 - 7-term Blackman-Harris\n");
+        pc.printf("  2 - Rectangular\n");
+        
+        getUserInput(&new_window);
+        
+        switch (new_window) {
+        case 1:
+            pc.printf("  7-term Blackman-Harris window selected\n");
+            FFT_data->window = BLACKMAN_HARRIS_7TERM;
+            break;          
+        case 2:
+            pc.printf("  Rectangular window selected\n");
+            FFT_data->window = RECTANGULAR;
+            break;          
+        default:
+            pc.printf(" Invalid option\n");
+            break;
+        }       
+        break;  
+    case 2:
+        pc.printf(" Set sample count:\n");
+        pc.printf("  1 - 4096 samples\n");
+        pc.printf("  2 - 1024 samples\n");
+        pc.printf("  3 - 256  samples\n");
+        pc.printf("  4 - 64   samples\n");
+        pc.printf("  5 - 16   samples\n");
+        
+        getUserInput(&new_sample_count);
+        
+        switch (new_sample_count) {
+        case 1:
+            pc.printf(" 4096 samples selected\n");
+            FFT_init(4096, FFT_data);   // Update the FFT module with a new sample count
+            break;
+        case 2:
+            pc.printf(" 1024 samples selected\n");
+            FFT_init(1024, FFT_data);
+            break;
+        case 3:
+            pc.printf(" 256 samples selected\n");
+            FFT_init(256, FFT_data);
+            break;
+        case 4:
+            pc.printf(" 64 samples selected\n");
+            FFT_init(64, FFT_data);
+            break;
+        case 5:
+            pc.printf(" 16 samples selected\n");
+            FFT_init(16, FFT_data);
+            break;
+        default:
+            pc.printf(" Invalid option\n");
+            break;  
+        }       
+        break;      
+    case 3:
+        if (FFT_data->fft_done == true) {
+            pc.printf(" Printing FFT plot in dB:\n");
+            
+            for (uint16_t i = 0; i < FFT_data->fft_length; i++)
+                pc.printf("%.4f\n", FFT_data->fft_dB[i]);
+        }
+        else
+            pc.printf(" Data not prepared\n");
+        break;      
+    default:
+        pc.printf(" Invalid option\n");
+        break;  
+    }
+    print_prompt(); 
+}
+
+/**
+ * Set Gains and Offsets
+ *
+ */
+void static menu_19_gains_offsets(void)
+{
+    uint32_t gain_offset, new_menu_select;
+    int32_t ret;
+    
+    pc.printf(" Gains and Offsets settings: \n");
+    pc.printf("  1 - Set gain\n");
+    pc.printf("  2 - Set offset\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_menu_select);
+    
+    switch (new_menu_select) {
+    case 1:
+        pc.printf(" Insert new Gain value in decimal form\n");
+        ret = getUserInput(&gain_offset);
+        
+        if ((gain_offset <= 0xFFFFFF) && (ret == SUCCESS)) {
+            ad77681_apply_gain(device_adc, gain_offset);
+            pc.printf(" Value %d has been successfully inserted to the Gain register\n", gain_offset);
+        } else
+            pc.printf(" Invalid value\n");      
+        break;  
+    case 2:
+        pc.printf(" Insert new Offset value in decimal form\n");
+        ret = getUserInput(&gain_offset);       
+        if ((gain_offset <= 0xFFFFFF) && (ret == SUCCESS)) {
+            ad77681_apply_offset(device_adc, gain_offset);
+            pc.printf(" Value %d has been successfully inserted to the Offset register\n", gain_offset);
+        } else
+            pc.printf(" Invalid value\n");      
+        break;      
+    default:
+        pc.printf(" Invalid option\n");
+        break;  
+    }   
+    print_prompt();     
+}
+
+/**
+ * Check read and write functionality by writing to and reading from the scratchpad register
+ *
+ */
+void static menu_20_check_scratchpad(void)
+{
+    int32_t ret;
+    uint32_t ret_val;
+    uint32_t new_menu_select;
+    uint8_t chceck_sequence;
+    
+    pc.printf(" Scratchpad check\n");
+    pc.printf("  Insert 8bit number for scratchpad check: \n");
+    
+    ret = getUserInput(&new_menu_select);
+    
+    if ((new_menu_select <= 0xFF) && (new_menu_select >= 0) && (ret == SUCCESS)) {
+        chceck_sequence = (uint8_t)(new_menu_select);
+        ret_val = ad77681_scratchpad(device_adc, &chceck_sequence);
+        pc.printf("  Inserted sequence:  %d\n  Returned sequence: %d\n", new_menu_select, chceck_sequence);
+        if (ret_val == SUCCESS)
+            pc.printf("  SUCCESS!\n");
+        else
+            pc.printf("  FAILURE!\n");
+    } else
+        pc.printf("  Invalid value\n"); 
+    print_prompt(); 
+}
+
+/**
+ * Start with the piezo accelerometer offset compensation
+ * The offset compensation process uses a successive approximation model
+ * Each step averages many samples because the piezo accelerometer is quite noisy
+ * It will take some time
+ */
+void static menu_21_piezo_offset(void)
+{
+    uint8_t ltc2606_res = 16;
+    uint32_t dac_code = 0;
+    uint32_t dac_code_arr[16];
+    double mean_voltage = 0.0, min_voltage; 
+    double mean_voltage_arr[16];
+    int8_t sar_loop, min_find, min_index = 0;   // min_index starts at 0 so it is defined when entry 0 is already the minimum
+    uint16_t SINC3_odr;    
+
+    // Low power mode and MCLK/16
+    ad77681_set_power_mode(device_adc, AD77681_ECO);
+    ad77681_set_mclk_div(device_adc, AD77681_MCLK_DIV_16); 
+    
+    // 4SPS = 7999 SINC3, 10SPS = 3199 SINC3, 50SPS = 639 SINC3
+    ad77681_SINC3_ODR(device_adc, &SINC3_odr, 4);  
+    // Set the oversampling ratio to a high value, to extract DC
+    ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx32, AD77681_SINC3, SINC3_odr);
+ 
+    // successive approximation algorithm
+    pc.printf("\nInitialize SAR loop (DAC MSB set to high)\n");
+    // Set DAC code to half scale  
+    dac_code = (1 << (ltc2606_res - 1 ));
+    // Update output of the DAC
+    ltc26x6_write_code(device_dac, write_update_command, dac_code);
+    // Wait for DAC output to settle                                    
+    wait_ms(500);
+    // Set desired number of samples for every iteration           
+    measured_data.samples = 100;
+    measured_data.finish = false;
+    measured_data.count = 0;
+    // Take X number of samples
+    cont_sampling();
+    // Get the mean voltage of the taken samples stored in the measured_data structure
+    get_mean_voltage(&measured_data, &mean_voltage);
+    // Print the mean ADC read voltage for a given DAC code 
+    pc.printf("DAC code:%x\t\tMean Voltage: %.6f\n", dac_code, mean_voltage);
+    // Store the initial DAC code in the array
+    dac_code_arr[ltc2606_res - 1] = dac_code;
+    // Store the initial mean voltage in the array
+    mean_voltage_arr[ltc2606_res - 1] = mean_voltage;
+    
+    for ( sar_loop = ltc2606_res - 1; sar_loop > 0; sar_loop--) {
+        // Check if the mean voltage is positive or negative
+        if (mean_voltage > 0) {
+            dac_code = dac_code + (1 << (sar_loop - 1));
+            pc.printf("UP\n\n");           
+        } else {
+            dac_code = dac_code - (1 << (sar_loop)) + (1 << (sar_loop-1));
+            pc.printf("DOWN\n\n");        
+        }  
+        // Print loop counter
+        pc.printf("SAR loop #: %d\n",sar_loop);
+        // Update output of the DAC
+        ltc26x6_write_code(device_dac, write_update_command, dac_code);
+        // Wait for DAC output to settle                                    
+        wait_ms(500);
+        // Clear data finish flag
+        measured_data.finish = false;
+        measured_data.count = 0;
+        // Take X number of samples
+        cont_sampling();
+        // Get the mean voltage of the taken samples stored in the measured_data structure
+        get_mean_voltage(&measured_data, &mean_voltage); 
+        pc.printf("DAC code:%x\t\tMean Voltage: %.6f\n", dac_code, mean_voltage); 
+        dac_code_arr[sar_loop - 1] = dac_code;
+        mean_voltage_arr[sar_loop - 1] = mean_voltage;
+    }
+    // fabs() keeps the comparison in floating point; an integer abs() would truncate sub-volt means to 0
+    min_voltage = fabs(mean_voltage_arr[0]); 
+    for (min_find = 0;  min_find < 16; min_find++) {
+        if (min_voltage > fabs(mean_voltage_arr[min_find])) {
+            min_voltage = fabs(mean_voltage_arr[min_find]);
+            min_index = min_find;
+        } 
+    }
+    ltc26x6_write_code(device_dac, write_update_command, dac_code_arr[min_index]);
+    // Wait for DAC output to settle                                    
+    wait_ms(500);
+    // Print the final DAC code               
+    pc.printf("\nFinal DAC code set to:%x\t\tFinal Mean Voltage: %.6f\n", dac_code_arr[min_index], mean_voltage_arr[min_index]);                
+    // Set to original filter
+    ad77681_set_filter_type(device_adc, AD77681_SINC5_FIR_DECx32, AD77681_FIR, 0);
+    ad77681_update_sample_rate(device_adc);      
+    pc.printf("\nOffset compensation done!\n"); 
+    print_prompt(); 
+}
+
+/**
+ * Get mean from sampled data
+ * @param measured_data     The structure carrying measured data
+ * @param mean_voltage      Mean Voltage
+ */
+void static get_mean_voltage(struct adc_data *measured_data, double *mean_voltage)
+{
+    int32_t shifted_data;
+    double sum = 0, voltage = 0;
+    uint16_t i;
+    
+    for ( i = 0; i < measured_data->samples; i++) {
+        ad77681_data_to_voltage(device_adc, &measured_data->raw_data[i], &voltage);
+        sum += voltage; 
+    }
+    *mean_voltage = (double)(sum / (double)(measured_data->samples));    
+}
+
+/**
+ * Set output of the on-board DAC in codes or in voltage
+ *
+ */
+void static menu_22_set_DAC_output(void)
+{
+    int16_t dac_status;
+    uint16_t  code ;
+    uint32_t new_menu_select, new_dac;
+    float dac_voltage;
+    // Gain factor of the on-board DAC buffer, to have full 5V range(ADA4807-1ARJZ)
+    // Non-inverting op-amp resistor ratio => 1 + (2.7 k ohm / 2.7 k ohm)
+    float buffer_gain =  2;  
+    
+    pc.printf(" Set DAC output: \n");
+    pc.printf("  1 - Voltage\n");
+    pc.printf("  2 - Codes\n");
+    pc.printf(" Select an option: \n");
+    
+    getUserInput(&new_menu_select);
+    
+    switch (new_menu_select) {
+    case 1:
+        pc.printf(" Set DAC output in mV: ");
+        getUserInput(&new_dac);
+        
+        dac_voltage = ((float)(new_dac) / 1000.0) / buffer_gain;
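+        // Keep the conversion status so the overflow/underflow checks below read a defined value
+        dac_status = ltc26x6_voltage_to_code(device_dac, dac_voltage, &code);    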
+        ltc26x6_write_code(device_dac, write_update_command, code); 
+        if (dac_status == SUCCESS)    
+            pc.printf("%.3f V at Shift output\n\n", dac_voltage * buffer_gain); 
+        else if (dac_status == LTC26X6_CODE_OVERFLOW)
+            pc.printf("%.3f V at Shift output, OVERFLOW!\n\n", dac_voltage * buffer_gain);  
+        else if (dac_status == LTC26X6_CODE_UNDERFLOW)
+            pc.printf("%.3f V at Shift output, UNDERFLOW!\n\n", dac_voltage * buffer_gain);     
+        break;      
+    case 2:
+        pc.printf(" Set DAC codes in decimal form: ");
+        getUserInput(&new_dac);             
+        ltc26x6_write_code(device_dac, write_update_command, new_dac);
+        pc.printf("%x at DAC output\n\n", new_dac);     
+        break;
+    default:
+        pc.printf(" Invalid option\n");
+        break;          
+    }
+    print_prompt();
+}
+
+/**
+ * Converts a byte into 8 binary characters ('0'/'1'), MSB first; no terminator is appended
+ *
+ */
+void static print_binary(uint8_t number, char *binary_number)
+{   
+    for (int8_t i = 7; i >= 0; i--) {
+        if (number & 1)
+            binary_number[i] = '1';
+        else
+            binary_number[i] = '0';
+        number >>= 1;
+    }
+}
+
+/**
+ * Setup SDP-K1 GPIOs
+ * 
+ * 
+ */
+ void static sdpk1_gpio_setup(void)
+{
+    // Enable DAC buffer & other buffer
+    buffer_en = GPIO_HIGH;
+    // Turn on onboard red LED
+    led_red = GPIO_HIGH;
+    // Turn on onboard blue LED
+    led_blue = GPIO_HIGH;
+}
\ No newline at end of file
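
Note on the "Codes" output in menu_14_print_measured_data: the loop sign-extends each 24-bit two's-complement sample to 32 bits and then adds AD7768_HALF_SCALE, so the printed values run from 0 to 2^24 - 1. The following is a minimal standalone sketch of that arithmetic, not the driver's code. The helper names and kHalfScale are hypothetical, and kHalfScale = 2^23 is an assumption about the value of AD7768_HALF_SCALE, which is defined outside this diff.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value of AD7768_HALF_SCALE (2^23); the real macro lives in the driver.
static const int32_t kHalfScale = 1 << 23;

// Hypothetical helper: sign-extend a 24-bit two's-complement code held in
// the low 24 bits of a 32-bit word.
static int32_t sign_extend_24(uint32_t raw)
{
    raw &= 0x00FFFFFFu;              // keep only the 24 data bits
    if (raw & 0x00800000u)           // bit 23 is the sign bit
        raw |= 0xFF000000u;          // fill the upper byte with ones
    return static_cast<int32_t>(raw);
}

// Hypothetical helper: shift the signed code into the 0 .. 2^24 - 1 range,
// which is what adding the half-scale constant does in the menu code.
static int32_t to_offset_code(uint32_t raw)
{
    const int32_t code = sign_extend_24(raw);
    assert(code >= -kHalfScale && code < kHalfScale);
    return code + kHalfScale;
}
```

With these helpers, the most negative sample 0x800000 maps to code 0, a zero sample maps to mid-scale, and 0x7FFFFF maps to the top of the range. Using unsigned masks avoids left-shifting 0xFF into the sign bit of an int.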

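Note on menu_21_piezo_offset: the routine is a bit-by-bit successive approximation search over the 16-bit DAC code, followed by a pass that picks the tried code with the smallest absolute mean voltage. The sketch below shows that idea in isolation, under stated assumptions. sar_search, SarResult and the measure callback are hypothetical names; the callback stands in for "write the DAC code, wait for settling, sample, and average". As in the "UP" branch of the original, a positive reading is assumed to mean the code should go higher.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <functional>

// Hypothetical result type: the best code tried and the reading it produced.
struct SarResult {
    uint32_t code;
    double voltage;
};

// Hypothetical sketch of a successive approximation search. Each step sets
// one trial bit, measures, keeps the bit if the reading is still positive,
// and remembers the trial whose reading was closest to zero.
static SarResult sar_search(unsigned bits,
                            const std::function<double(uint32_t)> &measure)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t code = 0;
    SarResult best{0, 0.0};
    bool have_best = false;

    for (int bit = static_cast<int>(bits) - 1; bit >= 0; --bit) {
        const uint32_t trial = code | (1u << bit);   // tentatively set this bit
        const double v = measure(trial);
        if (!have_best || std::fabs(v) < std::fabs(best.voltage)) {
            best = {trial, v};                       // closest to zero so far
            have_best = true;
        }
        if (v > 0.0)
            code = trial;                            // positive reading: keep the bit
    }
    return best;
}
```

For a 16-bit DAC the loop takes 16 measurements, which matches the size of dac_code_arr and mean_voltage_arr in the original. Tracking the best trial inside the loop removes the need for a separate minimum search and for an index that could be left unset.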
diff -r b9debc14d077 -r 9dd7c64b4a64 main.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/main.h	Mon Dec 06 05:22:28 2021 +0000
@@ -0,0 +1,62 @@
+/******************************************************************************
+ *Copyright (c)2020 Analog Devices, Inc.  
+ *
+ * Licensed under the 2020-04-27-CN0540EC License(the "License");
+ * you may not use this file except in compliance with the License.
+ *
+ ****************************************************************************/
+
+#ifndef _MAIN_H_
+#define _MAIN_H_
+
+#include <stdint.h>
+#include <string>
+using namespace std;
+
+void drdy_interrupt();
+int32_t static getUserInput(uint32_t *UserInput);
+void static go_to_error();
+void static print_title();
+void static print_prompt();
+int32_t static getMenuSelect(uint16_t *menuSelect);
+int32_t static getLargeMenuSelect(uint32_t *largeSelect);
+void static print_binary(uint8_t number, char *binary_number);
+void static menu_1_set_adc_powermode(void);
+void static menu_2_set_adc_clock_divider(void);
+void static menu_3_set_adc_filter_type(void);
+void static set_adc_FIR_filter(void);
+void static set_adc_SINC5_filter(void);
+void static set_adc_SINC3_filter(void);
+void static set_adc_50HZ_rej(void);
+void static set_adc_user_defined_FIR(void);
+void static menu_4_adc_buffers_controll(void);
+void static menu_5_set_default_settings(void);
+void static menu_6_set_adc_vcm(void);
+void static menu_7_adc_read_register(void);
+void static menu_8_adc_cont_read_data(void);
+void static adc_data_read(void);
+void static cont_sampling();
+void static menu_9_reset_ADC(void);
+void static menu_10_power_down(void);
+void static menu_11_ADC_GPIO(void);
+void static adc_GPIO_write(void);
+void static adc_GPIO_inout(void);
+void static adc_GPIO_settings(void);
+void static menu_12_read_master_status(void);
+void static menu_13_mclk_vref(void);
+void static menu_14_print_measured_data(void);
+void static menu_15_set_adc_data_output_mode(void);
+void static menu_16_set_adc_diagnostic_mode(void);
+void static menu_17_do_the_fft(void);
+void static menu_18_fft_settings(void);
+void static menu_19_gains_offsets(void);
+void static menu_20_check_scratchpad(void);
+void static menu_21_piezo_offset(void);
+void static menu_22_set_DAC_output(void);
+void static get_mean_voltage(struct adc_data *measured_data, double *mean_voltage);
+void static adc_hard_reset(void);
+void static sdpk1_gpio_setup(void);
+
+#endif // !_MAIN_H_
+
+

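Note on the print_binary prototype declared in main.h: the function in main.cpp writes exactly eight '0'/'1' characters and does not append a terminator, so a caller that prints the result with %s needs a buffer of at least nine bytes and must terminate it. A minimal sketch of a safe call pattern follows. The print_binary body repeats the bit loop from the diff in compact form, and byte_to_binary is a hypothetical wrapper that is not part of the project.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same bit loop as print_binary() in main.cpp: fills 8 characters, MSB first.
static void print_binary(uint8_t number, char *binary_number)
{
    for (int8_t i = 7; i >= 0; i--) {
        binary_number[i] = (number & 1) ? '1' : '0';
        number >>= 1;
    }
}

// Hypothetical wrapper: the array reference forces a 9-byte buffer at compile
// time (8 digits plus the terminator that print_binary() does not write).
static const char *byte_to_binary(uint8_t value, char (&out)[9])
{
    print_binary(value, out);
    out[8] = '\0';
    return out;
}
```

For example, formatting 0xA5 into a char buf[9] yields the string 10100101.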
diff -r b9debc14d077 -r 9dd7c64b4a64 mbed-os.lib
--- a/mbed-os.lib	Tue May 11 11:08:02 2021 +0000
+++ b/mbed-os.lib	Mon Dec 06 05:22:28 2021 +0000
@@ -1,1 +1,2 @@
-https://github.com/ARMmbed/mbed-os/#8ef0a435b2356f8159dea8e427b2935d177309f8
+https://github.com/ARMmbed/mbed-os/#73f096399b4cda1f780b140c87afad9446047432
+

diff -r b9debc14d077 -r 9dd7c64b4a64 platform_drivers.lib
--- a/platform_drivers.lib	Tue May 11 11:08:02 2021 +0000
+++ b/platform_drivers.lib	Mon Dec 06 05:22:28 2021 +0000
@@ -1,1 +1,1 @@
-https://os.mbed.com/teams/AnalogDevices/code/platform_drivers/#61ad39564f45
+https://os.mbed.com/teams/AnalogDevices/code/platform_drivers/#70fc373a5f46

Repository details

Type:	Program
Mbed OS support:	Mbed OS
Created:	11 May 2021
Imports:	27
Forks:	0
Commits:	2
Dependents:	0
Dependencies:	3
Followers:	110

This repository is Public (Unlisted).
