CMSIS_DSP_401

V4.0.1 of the ARM CMSIS DSP libraries. Note that arm_bitreversal2.s, arm_cfft_f32.c and arm_rfft_fast_f32.c had to be removed because arm_bitreversal2.s will not assemble with the online tools. As a result, the fast f32 FFT functions are not yet available; all the other FFT functions are available.

Dependents: MPU9150_Example fir_f32 fir_f32 MPU9150_nucleo_noni2cdev ... more

FilteringFunctions/arm_conv_opt_q7.c@0:3d9c67d97d6f, 2014-07-28 (annotated)

Committer: emh203
Date: Mon Jul 28 15:03:15 2014 +0000
Revision: 0:3d9c67d97d6f

1st working commit. Had to remove arm_bitreversal2.s, arm_cfft_f32.c and arm_rfft_fast_f32.c. The .s will not assemble. For now I removed these functions so we could at least have a library for the other functions.

Who changed what in which revision?

User	Revision	Line number	New contents of line
emh203	0:3d9c67d97d6f	1	/* ----------------------------------------------------------------------
emh203	0:3d9c67d97d6f	2	* Copyright (C) 2010-2014 ARM Limited. All rights reserved.
emh203	0:3d9c67d97d6f	3	*
emh203	0:3d9c67d97d6f	4	* $Date: 12. March 2014
emh203	0:3d9c67d97d6f	5	* $Revision: V1.4.3
emh203	0:3d9c67d97d6f	6	*
emh203	0:3d9c67d97d6f	7	* Project: CMSIS DSP Library
emh203	0:3d9c67d97d6f	8	* Title: arm_conv_opt_q7.c
emh203	0:3d9c67d97d6f	9	*
emh203	0:3d9c67d97d6f	10	* Description: Convolution of Q7 sequences.
emh203	0:3d9c67d97d6f	11	*
emh203	0:3d9c67d97d6f	12	* Target Processor: Cortex-M4/Cortex-M3/Cortex-M0
emh203	0:3d9c67d97d6f	13	*
emh203	0:3d9c67d97d6f	14	* Redistribution and use in source and binary forms, with or without
emh203	0:3d9c67d97d6f	15	* modification, are permitted provided that the following conditions
emh203	0:3d9c67d97d6f	16	* are met:
emh203	0:3d9c67d97d6f	17	* - Redistributions of source code must retain the above copyright
emh203	0:3d9c67d97d6f	18	* notice, this list of conditions and the following disclaimer.
emh203	0:3d9c67d97d6f	19	* - Redistributions in binary form must reproduce the above copyright
emh203	0:3d9c67d97d6f	20	* notice, this list of conditions and the following disclaimer in
emh203	0:3d9c67d97d6f	21	* the documentation and/or other materials provided with the
emh203	0:3d9c67d97d6f	22	* distribution.
emh203	0:3d9c67d97d6f	23	* - Neither the name of ARM LIMITED nor the names of its contributors
emh203	0:3d9c67d97d6f	24	* may be used to endorse or promote products derived from this
emh203	0:3d9c67d97d6f	25	* software without specific prior written permission.
emh203	0:3d9c67d97d6f	26	*
emh203	0:3d9c67d97d6f	27	* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
emh203	0:3d9c67d97d6f	28	* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
emh203	0:3d9c67d97d6f	29	* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
emh203	0:3d9c67d97d6f	30	* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
emh203	0:3d9c67d97d6f	31	* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
emh203	0:3d9c67d97d6f	32	* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
emh203	0:3d9c67d97d6f	33	* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
emh203	0:3d9c67d97d6f	34	* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
emh203	0:3d9c67d97d6f	35	* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
emh203	0:3d9c67d97d6f	36	* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
emh203	0:3d9c67d97d6f	37	* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
emh203	0:3d9c67d97d6f	38	* POSSIBILITY OF SUCH DAMAGE.
emh203	0:3d9c67d97d6f	39	* -------------------------------------------------------------------- */
emh203	0:3d9c67d97d6f	40
emh203	0:3d9c67d97d6f	41	#include "arm_math.h"
emh203	0:3d9c67d97d6f	42
emh203	0:3d9c67d97d6f	43	/**
emh203	0:3d9c67d97d6f	44	* @ingroup groupFilters
emh203	0:3d9c67d97d6f	45	*/
emh203	0:3d9c67d97d6f	46
emh203	0:3d9c67d97d6f	47	/**
emh203	0:3d9c67d97d6f	48	* @addtogroup Conv
emh203	0:3d9c67d97d6f	49	* @{
emh203	0:3d9c67d97d6f	50	*/
emh203	0:3d9c67d97d6f	51
emh203	0:3d9c67d97d6f	52	/**
emh203	0:3d9c67d97d6f	53	* @brief Convolution of Q7 sequences.
emh203	0:3d9c67d97d6f	54	* @param[in] *pSrcA points to the first input sequence.
emh203	0:3d9c67d97d6f	55	* @param[in] srcALen length of the first input sequence.
emh203	0:3d9c67d97d6f	56	* @param[in] *pSrcB points to the second input sequence.
emh203	0:3d9c67d97d6f	57	* @param[in] srcBLen length of the second input sequence.
emh203	0:3d9c67d97d6f	58	* @param[out] *pDst points to the location where the output result is written. Length srcALen+srcBLen-1.
emh203	0:3d9c67d97d6f	59	* @param[in] *pScratch1 points to scratch buffer (of type q15_t) of size max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2.
emh203	0:3d9c67d97d6f	60	* @param[in] *pScratch2 points to scratch buffer (of type q15_t) of size min(srcALen, srcBLen).
emh203	0:3d9c67d97d6f	61	* @return none.
emh203	0:3d9c67d97d6f	62	*
emh203	0:3d9c67d97d6f	63	* \par Restrictions
emh203	0:3d9c67d97d6f	64	* If the silicon does not support unaligned memory access, enable the macro UNALIGNED_SUPPORT_DISABLE.
emh203	0:3d9c67d97d6f	65	* In this case the input, output, scratch1 and scratch2 buffers should be aligned to 32-bit boundaries.
emh203	0:3d9c67d97d6f	66	*
emh203	0:3d9c67d97d6f	67	* @details
emh203	0:3d9c67d97d6f	68	* <b>Scaling and Overflow Behavior:</b>
emh203	0:3d9c67d97d6f	69	*
emh203	0:3d9c67d97d6f	70	* \par
emh203	0:3d9c67d97d6f	71	* The function is implemented using a 32-bit internal accumulator.
emh203	0:3d9c67d97d6f	72	* Both the inputs are represented in 1.7 format and multiplications yield a 2.14 result.
emh203	0:3d9c67d97d6f	73	* The 2.14 intermediate results are accumulated in a 32-bit accumulator in 18.14 format.
emh203	0:3d9c67d97d6f	74	* This approach provides 17 guard bits and there is no risk of overflow as long as <code>max(srcALen, srcBLen)<131072</code>.
emh203	0:3d9c67d97d6f	75	* The 18.14 result is then truncated to 18.7 format by discarding the low 7 bits and then saturated to 1.7 format.
emh203	0:3d9c67d97d6f	76	*
emh203	0:3d9c67d97d6f	77	*/
emh203	0:3d9c67d97d6f	78
emh203	0:3d9c67d97d6f	79	void arm_conv_opt_q7(
emh203	0:3d9c67d97d6f	80	q7_t * pSrcA,
emh203	0:3d9c67d97d6f	81	uint32_t srcALen,
emh203	0:3d9c67d97d6f	82	q7_t * pSrcB,
emh203	0:3d9c67d97d6f	83	uint32_t srcBLen,
emh203	0:3d9c67d97d6f	84	q7_t * pDst,
emh203	0:3d9c67d97d6f	85	q15_t * pScratch1,
emh203	0:3d9c67d97d6f	86	q15_t * pScratch2)
emh203	0:3d9c67d97d6f	87	{
emh203	0:3d9c67d97d6f	88
emh203	0:3d9c67d97d6f	89	q15_t *pScr2, *pScr1; /* Intermediate pointers for scratch pointers */
emh203	0:3d9c67d97d6f	90	q15_t x4; /* Temporary input variable */
emh203	0:3d9c67d97d6f	91	q7_t *pIn1, *pIn2; /* inputA and inputB pointer */
emh203	0:3d9c67d97d6f	92	uint32_t j, k, blkCnt, tapCnt; /* loop counter */
emh203	0:3d9c67d97d6f	93	q7_t *px; /* Temporary input1 pointer */
emh203	0:3d9c67d97d6f	94	q15_t *py; /* Temporary input2 pointer */
emh203	0:3d9c67d97d6f	95	q31_t acc0, acc1, acc2, acc3; /* Accumulator */
emh203	0:3d9c67d97d6f	96	q31_t x1, x2, x3, y1; /* Temporary input variables */
emh203	0:3d9c67d97d6f	97	q7_t *pOut = pDst; /* output pointer */
emh203	0:3d9c67d97d6f	98	q7_t out0, out1, out2, out3; /* temporary variables */
emh203	0:3d9c67d97d6f	99
emh203	0:3d9c67d97d6f	100	/* The algorithm implementation is based on the lengths of the inputs. */
emh203	0:3d9c67d97d6f	101	/* srcB is always made to slide across srcA. */
emh203	0:3d9c67d97d6f	102	/* So srcBLen is always considered as shorter or equal to srcALen */
emh203	0:3d9c67d97d6f	103	if(srcALen >= srcBLen)
emh203	0:3d9c67d97d6f	104	{
emh203	0:3d9c67d97d6f	105	/* Initialization of inputA pointer */
emh203	0:3d9c67d97d6f	106	pIn1 = pSrcA;
emh203	0:3d9c67d97d6f	107
emh203	0:3d9c67d97d6f	108	/* Initialization of inputB pointer */
emh203	0:3d9c67d97d6f	109	pIn2 = pSrcB;
emh203	0:3d9c67d97d6f	110	}
emh203	0:3d9c67d97d6f	111	else
emh203	0:3d9c67d97d6f	112	{
emh203	0:3d9c67d97d6f	113	/* Initialization of inputA pointer */
emh203	0:3d9c67d97d6f	114	pIn1 = pSrcB;
emh203	0:3d9c67d97d6f	115
emh203	0:3d9c67d97d6f	116	/* Initialization of inputB pointer */
emh203	0:3d9c67d97d6f	117	pIn2 = pSrcA;
emh203	0:3d9c67d97d6f	118
emh203	0:3d9c67d97d6f	119	/* srcBLen is always considered as shorter or equal to srcALen */
emh203	0:3d9c67d97d6f	120	j = srcBLen;
emh203	0:3d9c67d97d6f	121	srcBLen = srcALen;
emh203	0:3d9c67d97d6f	122	srcALen = j;
emh203	0:3d9c67d97d6f	123	}
emh203	0:3d9c67d97d6f	124
emh203	0:3d9c67d97d6f	125	/* working pointer into the scratch2 buffer */
emh203	0:3d9c67d97d6f	126	pScr2 = pScratch2;
emh203	0:3d9c67d97d6f	127
emh203	0:3d9c67d97d6f	128	/* points to smaller length sequence */
emh203	0:3d9c67d97d6f	129	px = pIn2 + srcBLen - 1;
emh203	0:3d9c67d97d6f	130
emh203	0:3d9c67d97d6f	131	/* Apply loop unrolling and do 4 Copies simultaneously. */
emh203	0:3d9c67d97d6f	132	k = srcBLen >> 2u;
emh203	0:3d9c67d97d6f	133
emh203	0:3d9c67d97d6f	134	/* First part of the processing with loop unrolling copies 4 data points at a time.
emh203	0:3d9c67d97d6f	135	** a second loop below copies for the remaining 1 to 3 samples. */
emh203	0:3d9c67d97d6f	136	while(k > 0u)
emh203	0:3d9c67d97d6f	137	{
emh203	0:3d9c67d97d6f	138	/* copy second buffer in reversal manner */
emh203	0:3d9c67d97d6f	139	x4 = (q15_t) * px--;
emh203	0:3d9c67d97d6f	140	*pScr2++ = x4;
emh203	0:3d9c67d97d6f	141	x4 = (q15_t) * px--;
emh203	0:3d9c67d97d6f	142	*pScr2++ = x4;
emh203	0:3d9c67d97d6f	143	x4 = (q15_t) * px--;
emh203	0:3d9c67d97d6f	144	*pScr2++ = x4;
emh203	0:3d9c67d97d6f	145	x4 = (q15_t) * px--;
emh203	0:3d9c67d97d6f	146	*pScr2++ = x4;
emh203	0:3d9c67d97d6f	147
emh203	0:3d9c67d97d6f	148	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	149	k--;
emh203	0:3d9c67d97d6f	150	}
emh203	0:3d9c67d97d6f	151
emh203	0:3d9c67d97d6f	152	/* If the count is not a multiple of 4, copy remaining samples here.
emh203	0:3d9c67d97d6f	153	** No loop unrolling is used. */
emh203	0:3d9c67d97d6f	154	k = srcBLen % 0x4u;
emh203	0:3d9c67d97d6f	155
emh203	0:3d9c67d97d6f	156	while(k > 0u)
emh203	0:3d9c67d97d6f	157	{
emh203	0:3d9c67d97d6f	158	/* copy second buffer in reversal manner for remaining samples */
emh203	0:3d9c67d97d6f	159	x4 = (q15_t) * px--;
emh203	0:3d9c67d97d6f	160	*pScr2++ = x4;
emh203	0:3d9c67d97d6f	161
emh203	0:3d9c67d97d6f	162	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	163	k--;
emh203	0:3d9c67d97d6f	164	}
emh203	0:3d9c67d97d6f	165
emh203	0:3d9c67d97d6f	166	/* Initialize temporary scratch pointer */
emh203	0:3d9c67d97d6f	167	pScr1 = pScratch1;
emh203	0:3d9c67d97d6f	168
emh203	0:3d9c67d97d6f	169	/* Fill (srcBLen - 1u) zeros in scratch buffer */
emh203	0:3d9c67d97d6f	170	arm_fill_q15(0, pScr1, (srcBLen - 1u));
emh203	0:3d9c67d97d6f	171
emh203	0:3d9c67d97d6f	172	/* Update temporary scratch pointer */
emh203	0:3d9c67d97d6f	173	pScr1 += (srcBLen - 1u);
emh203	0:3d9c67d97d6f	174
emh203	0:3d9c67d97d6f	175	/* Copy (srcALen) samples in scratch buffer */
emh203	0:3d9c67d97d6f	176	/* Apply loop unrolling and do 4 Copies simultaneously. */
emh203	0:3d9c67d97d6f	177	k = srcALen >> 2u;
emh203	0:3d9c67d97d6f	178
emh203	0:3d9c67d97d6f	179	/* First part of the processing with loop unrolling copies 4 data points at a time.
emh203	0:3d9c67d97d6f	180	** a second loop below copies for the remaining 1 to 3 samples. */
emh203	0:3d9c67d97d6f	181	while(k > 0u)
emh203	0:3d9c67d97d6f	182	{
emh203	0:3d9c67d97d6f	183	/* copy the longer sequence into scratch1, widening q7 to q15 */
emh203	0:3d9c67d97d6f	184	x4 = (q15_t) * pIn1++;
emh203	0:3d9c67d97d6f	185	*pScr1++ = x4;
emh203	0:3d9c67d97d6f	186	x4 = (q15_t) * pIn1++;
emh203	0:3d9c67d97d6f	187	*pScr1++ = x4;
emh203	0:3d9c67d97d6f	188	x4 = (q15_t) * pIn1++;
emh203	0:3d9c67d97d6f	189	*pScr1++ = x4;
emh203	0:3d9c67d97d6f	190	x4 = (q15_t) * pIn1++;
emh203	0:3d9c67d97d6f	191	*pScr1++ = x4;
emh203	0:3d9c67d97d6f	192
emh203	0:3d9c67d97d6f	193	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	194	k--;
emh203	0:3d9c67d97d6f	195	}
emh203	0:3d9c67d97d6f	196
emh203	0:3d9c67d97d6f	197	/* If the count is not a multiple of 4, copy remaining samples here.
emh203	0:3d9c67d97d6f	198	** No loop unrolling is used. */
emh203	0:3d9c67d97d6f	199	k = srcALen % 0x4u;
emh203	0:3d9c67d97d6f	200
emh203	0:3d9c67d97d6f	201	while(k > 0u)
emh203	0:3d9c67d97d6f	202	{
emh203	0:3d9c67d97d6f	203	/* copy the remaining samples of the longer sequence */
emh203	0:3d9c67d97d6f	204	x4 = (q15_t) * pIn1++;
emh203	0:3d9c67d97d6f	205	*pScr1++ = x4;
emh203	0:3d9c67d97d6f	206
emh203	0:3d9c67d97d6f	207	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	208	k--;
emh203	0:3d9c67d97d6f	209	}
emh203	0:3d9c67d97d6f	210
emh203	0:3d9c67d97d6f	211	#ifndef UNALIGNED_SUPPORT_DISABLE
emh203	0:3d9c67d97d6f	212
emh203	0:3d9c67d97d6f	213	/* Fill (srcBLen - 1u) zeros at end of scratch buffer */
emh203	0:3d9c67d97d6f	214	arm_fill_q15(0, pScr1, (srcBLen - 1u));
emh203	0:3d9c67d97d6f	215
emh203	0:3d9c67d97d6f	216	/* Update pointer */
emh203	0:3d9c67d97d6f	217	pScr1 += (srcBLen - 1u);
emh203	0:3d9c67d97d6f	218
emh203	0:3d9c67d97d6f	219	#else
emh203	0:3d9c67d97d6f	220
emh203	0:3d9c67d97d6f	221	/* Apply loop unrolling and do 4 Copies simultaneously. */
emh203	0:3d9c67d97d6f	222	k = (srcBLen - 1u) >> 2u;
emh203	0:3d9c67d97d6f	223
emh203	0:3d9c67d97d6f	224	/* First part of the processing with loop unrolling copies 4 data points at a time.
emh203	0:3d9c67d97d6f	225	** a second loop below copies for the remaining 1 to 3 samples. */
emh203	0:3d9c67d97d6f	226	while(k > 0u)
emh203	0:3d9c67d97d6f	227	{
emh203	0:3d9c67d97d6f	228	/* append zeros after the copied samples */
emh203	0:3d9c67d97d6f	229	*pScr1++ = 0;
emh203	0:3d9c67d97d6f	230	*pScr1++ = 0;
emh203	0:3d9c67d97d6f	231	*pScr1++ = 0;
emh203	0:3d9c67d97d6f	232	*pScr1++ = 0;
emh203	0:3d9c67d97d6f	233
emh203	0:3d9c67d97d6f	234	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	235	k--;
emh203	0:3d9c67d97d6f	236	}
emh203	0:3d9c67d97d6f	237
emh203	0:3d9c67d97d6f	238	/* If the count is not a multiple of 4, copy remaining samples here.
emh203	0:3d9c67d97d6f	239	** No loop unrolling is used. */
emh203	0:3d9c67d97d6f	240	k = (srcBLen - 1u) % 0x4u;
emh203	0:3d9c67d97d6f	241
emh203	0:3d9c67d97d6f	242	while(k > 0u)
emh203	0:3d9c67d97d6f	243	{
emh203	0:3d9c67d97d6f	244	/* append the remaining zeros */
emh203	0:3d9c67d97d6f	245	*pScr1++ = 0;
emh203	0:3d9c67d97d6f	246
emh203	0:3d9c67d97d6f	247	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	248	k--;
emh203	0:3d9c67d97d6f	249	}
emh203	0:3d9c67d97d6f	250
emh203	0:3d9c67d97d6f	251	#endif
emh203	0:3d9c67d97d6f	252
emh203	0:3d9c67d97d6f	253	/* Temporary pointer for scratch2 */
emh203	0:3d9c67d97d6f	254	py = pScratch2;
emh203	0:3d9c67d97d6f	255
emh203	0:3d9c67d97d6f	256	/* Initialization of pIn2 pointer */
emh203	0:3d9c67d97d6f	257	pIn2 = (q7_t *) py;
emh203	0:3d9c67d97d6f	258
emh203	0:3d9c67d97d6f	259	pScr2 = py;
emh203	0:3d9c67d97d6f	260
emh203	0:3d9c67d97d6f	261	/* Actual convolution process starts here */
emh203	0:3d9c67d97d6f	262	blkCnt = (srcALen + srcBLen - 1u) >> 2;
emh203	0:3d9c67d97d6f	263
emh203	0:3d9c67d97d6f	264	while(blkCnt > 0)
emh203	0:3d9c67d97d6f	265	{
emh203	0:3d9c67d97d6f	266	/* Initialize temporary scratch pointer as scratch1 */
emh203	0:3d9c67d97d6f	267	pScr1 = pScratch1;
emh203	0:3d9c67d97d6f	268
emh203	0:3d9c67d97d6f	269	/* Clear accumulators */
emh203	0:3d9c67d97d6f	270	acc0 = 0;
emh203	0:3d9c67d97d6f	271	acc1 = 0;
emh203	0:3d9c67d97d6f	272	acc2 = 0;
emh203	0:3d9c67d97d6f	273	acc3 = 0;
emh203	0:3d9c67d97d6f	274
emh203	0:3d9c67d97d6f	275	/* Read two samples from scratch1 buffer */
emh203	0:3d9c67d97d6f	276	x1 = *__SIMD32(pScr1)++;
emh203	0:3d9c67d97d6f	277
emh203	0:3d9c67d97d6f	278	/* Read next two samples from scratch1 buffer */
emh203	0:3d9c67d97d6f	279	x2 = *__SIMD32(pScr1)++;
emh203	0:3d9c67d97d6f	280
emh203	0:3d9c67d97d6f	281	tapCnt = (srcBLen) >> 2u;
emh203	0:3d9c67d97d6f	282
emh203	0:3d9c67d97d6f	283	while(tapCnt > 0u)
emh203	0:3d9c67d97d6f	284	{
emh203	0:3d9c67d97d6f	285
emh203	0:3d9c67d97d6f	286	/* Read two samples from smaller buffer */
emh203	0:3d9c67d97d6f	287	y1 = _SIMD32_OFFSET(pScr2);
emh203	0:3d9c67d97d6f	288
emh203	0:3d9c67d97d6f	289	/* multiply and accumulate */
emh203	0:3d9c67d97d6f	290	acc0 = __SMLAD(x1, y1, acc0);
emh203	0:3d9c67d97d6f	291	acc2 = __SMLAD(x2, y1, acc2);
emh203	0:3d9c67d97d6f	292
emh203	0:3d9c67d97d6f	293	/* pack input data */
emh203	0:3d9c67d97d6f	294	#ifndef ARM_MATH_BIG_ENDIAN
emh203	0:3d9c67d97d6f	295	x3 = __PKHBT(x2, x1, 0);
emh203	0:3d9c67d97d6f	296	#else
emh203	0:3d9c67d97d6f	297	x3 = __PKHBT(x1, x2, 0);
emh203	0:3d9c67d97d6f	298	#endif
emh203	0:3d9c67d97d6f	299
emh203	0:3d9c67d97d6f	300	/* multiply and accumulate */
emh203	0:3d9c67d97d6f	301	acc1 = __SMLADX(x3, y1, acc1);
emh203	0:3d9c67d97d6f	302
emh203	0:3d9c67d97d6f	303	/* Read next two samples from scratch1 buffer */
emh203	0:3d9c67d97d6f	304	x1 = *__SIMD32(pScr1)++;
emh203	0:3d9c67d97d6f	305
emh203	0:3d9c67d97d6f	306	/* pack input data */
emh203	0:3d9c67d97d6f	307	#ifndef ARM_MATH_BIG_ENDIAN
emh203	0:3d9c67d97d6f	308	x3 = __PKHBT(x1, x2, 0);
emh203	0:3d9c67d97d6f	309	#else
emh203	0:3d9c67d97d6f	310	x3 = __PKHBT(x2, x1, 0);
emh203	0:3d9c67d97d6f	311	#endif
emh203	0:3d9c67d97d6f	312
emh203	0:3d9c67d97d6f	313	acc3 = __SMLADX(x3, y1, acc3);
emh203	0:3d9c67d97d6f	314
emh203	0:3d9c67d97d6f	315	/* Read next two samples from smaller buffer */
emh203	0:3d9c67d97d6f	316	y1 = _SIMD32_OFFSET(pScr2 + 2u);
emh203	0:3d9c67d97d6f	317
emh203	0:3d9c67d97d6f	318	acc0 = __SMLAD(x2, y1, acc0);
emh203	0:3d9c67d97d6f	319
emh203	0:3d9c67d97d6f	320	acc2 = __SMLAD(x1, y1, acc2);
emh203	0:3d9c67d97d6f	321
emh203	0:3d9c67d97d6f	322	acc1 = __SMLADX(x3, y1, acc1);
emh203	0:3d9c67d97d6f	323
emh203	0:3d9c67d97d6f	324	x2 = *__SIMD32(pScr1)++;
emh203	0:3d9c67d97d6f	325
emh203	0:3d9c67d97d6f	326	#ifndef ARM_MATH_BIG_ENDIAN
emh203	0:3d9c67d97d6f	327	x3 = __PKHBT(x2, x1, 0);
emh203	0:3d9c67d97d6f	328	#else
emh203	0:3d9c67d97d6f	329	x3 = __PKHBT(x1, x2, 0);
emh203	0:3d9c67d97d6f	330	#endif
emh203	0:3d9c67d97d6f	331
emh203	0:3d9c67d97d6f	332	acc3 = __SMLADX(x3, y1, acc3);
emh203	0:3d9c67d97d6f	333
emh203	0:3d9c67d97d6f	334	pScr2 += 4u;
emh203	0:3d9c67d97d6f	335
emh203	0:3d9c67d97d6f	336
emh203	0:3d9c67d97d6f	337	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	338	tapCnt--;
emh203	0:3d9c67d97d6f	339	}
emh203	0:3d9c67d97d6f	340
emh203	0:3d9c67d97d6f	341
emh203	0:3d9c67d97d6f	342
emh203	0:3d9c67d97d6f	343	/* Update scratch pointer for remaining samples of smaller length sequence */
emh203	0:3d9c67d97d6f	344	pScr1 -= 4u;
emh203	0:3d9c67d97d6f	345
emh203	0:3d9c67d97d6f	346
emh203	0:3d9c67d97d6f	347	/* apply same above for remaining samples of smaller length sequence */
emh203	0:3d9c67d97d6f	348	tapCnt = (srcBLen) & 3u;
emh203	0:3d9c67d97d6f	349
emh203	0:3d9c67d97d6f	350	while(tapCnt > 0u)
emh203	0:3d9c67d97d6f	351	{
emh203	0:3d9c67d97d6f	352
emh203	0:3d9c67d97d6f	353	/* accumulate the results */
emh203	0:3d9c67d97d6f	354	acc0 += (*pScr1++ * *pScr2);
emh203	0:3d9c67d97d6f	355	acc1 += (*pScr1++ * *pScr2);
emh203	0:3d9c67d97d6f	356	acc2 += (*pScr1++ * *pScr2);
emh203	0:3d9c67d97d6f	357	acc3 += (*pScr1++ * *pScr2++);
emh203	0:3d9c67d97d6f	358
emh203	0:3d9c67d97d6f	359	pScr1 -= 3u;
emh203	0:3d9c67d97d6f	360
emh203	0:3d9c67d97d6f	361	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	362	tapCnt--;
emh203	0:3d9c67d97d6f	363	}
emh203	0:3d9c67d97d6f	364
emh203	0:3d9c67d97d6f	365	blkCnt--;
emh203	0:3d9c67d97d6f	366
emh203	0:3d9c67d97d6f	367	/* Store the result in the accumulator in the destination buffer. */
emh203	0:3d9c67d97d6f	368	out0 = (q7_t) (__SSAT(acc0 >> 7u, 8));
emh203	0:3d9c67d97d6f	369	out1 = (q7_t) (__SSAT(acc1 >> 7u, 8));
emh203	0:3d9c67d97d6f	370	out2 = (q7_t) (__SSAT(acc2 >> 7u, 8));
emh203	0:3d9c67d97d6f	371	out3 = (q7_t) (__SSAT(acc3 >> 7u, 8));
emh203	0:3d9c67d97d6f	372
emh203	0:3d9c67d97d6f	373	*__SIMD32(pOut)++ = __PACKq7(out0, out1, out2, out3);
emh203	0:3d9c67d97d6f	374
emh203	0:3d9c67d97d6f	375	/* Initialization of inputB pointer */
emh203	0:3d9c67d97d6f	376	pScr2 = py;
emh203	0:3d9c67d97d6f	377
emh203	0:3d9c67d97d6f	378	pScratch1 += 4u;
emh203	0:3d9c67d97d6f	379
emh203	0:3d9c67d97d6f	380	}
emh203	0:3d9c67d97d6f	381
emh203	0:3d9c67d97d6f	382
emh203	0:3d9c67d97d6f	383	blkCnt = (srcALen + srcBLen - 1u) & 0x3;
emh203	0:3d9c67d97d6f	384
emh203	0:3d9c67d97d6f	385	/* Calculate convolution for remaining samples of Bigger length sequence */
emh203	0:3d9c67d97d6f	386	while(blkCnt > 0)
emh203	0:3d9c67d97d6f	387	{
emh203	0:3d9c67d97d6f	388	/* Initialize temporary scratch pointer as scratch1 */
emh203	0:3d9c67d97d6f	389	pScr1 = pScratch1;
emh203	0:3d9c67d97d6f	390
emh203	0:3d9c67d97d6f	391	/* Clear accumulators */
emh203	0:3d9c67d97d6f	392	acc0 = 0;
emh203	0:3d9c67d97d6f	393
emh203	0:3d9c67d97d6f	394	tapCnt = (srcBLen) >> 1u;
emh203	0:3d9c67d97d6f	395
emh203	0:3d9c67d97d6f	396	while(tapCnt > 0u)
emh203	0:3d9c67d97d6f	397	{
emh203	0:3d9c67d97d6f	398	acc0 += (*pScr1++ * *pScr2++);
emh203	0:3d9c67d97d6f	399	acc0 += (*pScr1++ * *pScr2++);
emh203	0:3d9c67d97d6f	400
emh203	0:3d9c67d97d6f	401	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	402	tapCnt--;
emh203	0:3d9c67d97d6f	403	}
emh203	0:3d9c67d97d6f	404
emh203	0:3d9c67d97d6f	405	tapCnt = (srcBLen) & 1u;
emh203	0:3d9c67d97d6f	406
emh203	0:3d9c67d97d6f	407	/* apply same above for remaining samples of smaller length sequence */
emh203	0:3d9c67d97d6f	408	while(tapCnt > 0u)
emh203	0:3d9c67d97d6f	409	{
emh203	0:3d9c67d97d6f	410
emh203	0:3d9c67d97d6f	411	/* accumulate the results */
emh203	0:3d9c67d97d6f	412	acc0 += (*pScr1++ * *pScr2++);
emh203	0:3d9c67d97d6f	413
emh203	0:3d9c67d97d6f	414	/* Decrement the loop counter */
emh203	0:3d9c67d97d6f	415	tapCnt--;
emh203	0:3d9c67d97d6f	416	}
emh203	0:3d9c67d97d6f	417
emh203	0:3d9c67d97d6f	418	blkCnt--;
emh203	0:3d9c67d97d6f	419
emh203	0:3d9c67d97d6f	420	/* Store the result in the accumulator in the destination buffer. */
emh203	0:3d9c67d97d6f	421	*pOut++ = (q7_t) (__SSAT(acc0 >> 7u, 8));
emh203	0:3d9c67d97d6f	422
emh203	0:3d9c67d97d6f	423	/* Initialization of inputB pointer */
emh203	0:3d9c67d97d6f	424	pScr2 = py;
emh203	0:3d9c67d97d6f	425
emh203	0:3d9c67d97d6f	426	pScratch1 += 1u;
emh203	0:3d9c67d97d6f	427
emh203	0:3d9c67d97d6f	428	}
emh203	0:3d9c67d97d6f	429
emh203	0:3d9c67d97d6f	430	}
emh203	0:3d9c67d97d6f	431
emh203	0:3d9c67d97d6f	432
emh203	0:3d9c67d97d6f	433	/**
emh203	0:3d9c67d97d6f	434	* @} end of Conv group
emh203	0:3d9c67d97d6f	435	*/
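
The scaling described in the header comment (1.7 inputs, 2.14 products, a 32-bit accumulator, then a shift by 7 and saturation to 1.7) can be checked against a plain, unoptimized convolution. The sketch below is not part of the library: conv_q7_ref and sat_q7 are hypothetical names, q7_t is redeclared locally so the block compiles without arm_math.h, and the right shift of a negative accumulator is assumed to be arithmetic, as it is on the usual ARM toolchains.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t q7_t;   /* local stand-in for the CMSIS q7_t typedef */

/* Saturate a 32-bit value to the signed 8-bit range, mirroring __SSAT(x, 8). */
static q7_t sat_q7(int32_t x)
{
    if (x > 127)  return 127;
    if (x < -128) return -128;
    return (q7_t) x;
}

/* Hypothetical reference: direct-form convolution with the documented scaling.
 * dst must hold aLen + bLen - 1 samples. */
static void conv_q7_ref(const q7_t *a, uint32_t aLen,
                        const q7_t *b, uint32_t bLen,
                        q7_t *dst)
{
    assert(aLen > 0u && bLen > 0u);

    for (uint32_t n = 0u; n < aLen + bLen - 1u; n++) {
        int32_t acc = 0;                       /* accumulates 2.14 products */

        for (uint32_t k = 0u; k < bLen; k++) {
            /* only terms where a[n - k] exists contribute */
            if (n >= k && (n - k) < aLen) {
                acc += (int32_t) a[n - k] * (int32_t) b[k];
            }
        }

        /* drop the low 7 bits, then saturate to 1.7 */
        dst[n] = sat_q7(acc >> 7);
    }
}
```

Comparing the output of this loop with the optimized routine on the same inputs is one way to confirm that a port or a restored listing still computes the same values. For example, convolving {64, 64, 64} with {64, 64} (0.5 in 1.7 format) gives {32, 64, 64, 32}.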

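
The scratch buffer sizes in the parameter documentation follow from how the listing lays the data out: scratch2 holds the shorter sequence reversed, and scratch1 holds the longer sequence with (shorter length - 1) zeros on each side, so every output sample becomes a dot product of scratch2 with a window of scratch1. The sketch below illustrates that layout in plain C. The helper names (scratch1_len, build_scratch, window_acc) are made up for this example and the typedefs are local stand-ins for those in arm_math.h.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t  q7_t;   /* local stand-in for the CMSIS q7_t typedef  */
typedef int16_t q15_t;  /* local stand-in for the CMSIS q15_t typedef */

/* Documented scratch1 size: max(srcALen, srcBLen) + 2*min(srcALen, srcBLen) - 2. */
static uint32_t scratch1_len(uint32_t srcALen, uint32_t srcBLen)
{
    uint32_t longLen  = (srcALen >= srcBLen) ? srcALen : srcBLen;
    uint32_t shortLen = (srcALen >= srcBLen) ? srcBLen : srcALen;

    assert(shortLen > 0u);
    return longLen + 2u * shortLen - 2u;
}

/* Hypothetical helper: fill both scratch buffers the way the listing does. */
static void build_scratch(const q7_t *longSeq, uint32_t longLen,
                          const q7_t *shortSeq, uint32_t shortLen,
                          q15_t *scratch1, q15_t *scratch2)
{
    uint32_t i, pos = 0u;

    /* scratch2: shorter sequence, reversed and widened to q15 */
    for (i = 0u; i < shortLen; i++) {
        scratch2[i] = (q15_t) shortSeq[shortLen - 1u - i];
    }

    /* scratch1: leading zeros, longer sequence, trailing zeros */
    for (i = 0u; i < shortLen - 1u; i++) { scratch1[pos++] = 0; }
    for (i = 0u; i < longLen; i++)       { scratch1[pos++] = (q15_t) longSeq[i]; }
    for (i = 0u; i < shortLen - 1u; i++) { scratch1[pos++] = 0; }
}

/* Unscaled accumulator for output sample n: scratch2 dotted with the
 * window of scratch1 that starts at index n. */
static int32_t window_acc(const q15_t *scratch1, const q15_t *scratch2,
                          uint32_t shortLen, uint32_t n)
{
    int32_t acc = 0;

    for (uint32_t k = 0u; k < shortLen; k++) {
        acc += (int32_t) scratch1[n + k] * (int32_t) scratch2[k];
    }
    return acc;
}
```

With a longer sequence {10, 20, 30} and a shorter sequence {1, 2}, scratch1 becomes {0, 10, 20, 30, 0}, scratch2 becomes {2, 1}, and the four windows accumulate to 10, 40, 70 and 60, which is the unscaled convolution of the two inputs. The last window reads index (longLen + shortLen - 2) + (shortLen - 1), which is why scratch1 needs the documented number of elements.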
Repository details

Type:	Library
Created:	28 Jul 2014
Imports:	1167
Forks:	0
Commits:	1
Dependents:	15
Dependencies:	0
Followers:	39