Cortex-M3 Instruction Set Summary
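The processor implements the ARMv7-M Thumb instruction set. Table 1 shows the Cortex-M3 instructions and their cycle counts. The cycle counts are based on a system with zero wait states. Within the assembler syntax, depending on the operation, the <op2> field can be replaced with one of the following options: a simple register specifier (for example, Rm), an immediate-shifted register (for example, Rm, LSL #4), a register-shifted register (for example, Rm, LSL Rs), or an immediate value (for example, #0xE000E000).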
For brevity, not all load and store addressing modes are shown. See the ARMv7-M Architecture Reference Manual for more information. Table 1 uses the following abbreviations in the Cycles column:
P  The number of cycles required for a pipeline refill. This ranges from 1 to 3 depending on the alignment and width of the target instruction, and whether the processor manages to speculate the address early.
B  The number of cycles required to perform the barrier operation. For DSB and DMB, the minimum number of cycles is zero. For ISB, the minimum number of cycles is equivalent to the number required for a pipeline refill.
N The number of registers in the register list to be loaded or stored, including PC or LR.
W The number of cycles spent waiting for an appropriate event.
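As an illustration of how these abbreviations combine, the following minimal Python sketch evaluates Cycles entries of the form base + N + P and returns the best-case and worst-case totals. It assumes zero wait states, as Table 1 does, and takes P from the 1 to 3 range stated above. The function name `cycle_bounds` and its parameters are hypothetical, chosen only for this example.

```python
# Pipeline refill cost P, per the abbreviation list: 1 to 3 cycles.
P_MIN, P_MAX = 1, 3


def cycle_bounds(base, n_regs=0, refill=False):
    """Return (min, max) cycles for an entry of the form base + N + P.

    base   -- the constant term from the Cycles column
    n_regs -- N, the number of registers in the list (PC or LR included)
    refill -- True when the entry contains a "+ P" term
    """
    lo = base + n_regs + (P_MIN if refill else 0)
    hi = base + n_regs + (P_MAX if refill else 0)
    return lo, hi


# POP {R4-R7, PC} is "1 + N + P" with N = 5, because PC counts as a register.
pop_return = cycle_bounds(1, n_regs=5, refill=True)   # (7, 9)

# PUSH {R4-R7, LR} is "1 + N" with N = 5 and no refill term.
push_lr = cycle_bounds(1, n_regs=5)                   # (6, 6)

# BX Rm is "1 + P".
bx = cycle_bounds(1, refill=True)                     # (2, 4)

print(pop_return, push_lr, bx)
```

The W and B terms are left out of the sketch because their upper bounds depend on external events and on outstanding memory traffic, so no fixed range applies.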
Table 1 Cortex-M3 instruction set summary

| Operation | Description | Assembler | Cycles |
|---|---|---|---|
| Move | Register | MOV Rd, <op2> | 1 |
| | 16-bit immediate | MOVW Rd, #<imm> | 1 |
| | Immediate into top | MOVT Rd, #<imm> | 1 |
| | To PC | MOV PC, Rm | 1 + P |
| Add | Add | ADD Rd, Rn, <op2> | 1 |
| | Add to PC | ADD PC, PC, Rm | 1 + P |
| | Add with carry | ADC Rd, Rn, <op2> | 1 |
| | Form address | ADR Rd, <label> | 1 |
| Subtract | Subtract | SUB Rd, Rn, <op2> | 1 |
| | Subtract with borrow | SBC Rd, Rn, <op2> | 1 |
| | Reverse | RSB Rd, Rn, <op2> | 1 |
| Multiply | Multiply | MUL Rd, Rn, Rm | 1 |
| | Multiply accumulate | MLA Rd, Rn, Rm | 2 |
| | Multiply subtract | MLS Rd, Rn, Rm | 2 |
| | Long signed | SMULL RdLo, RdHi, Rn, Rm | 3 to 5^{[a]} |
| | Long unsigned | UMULL RdLo, RdHi, Rn, Rm | 3 to 5^{[a]} |
| | Long signed accumulate | SMLAL RdLo, RdHi, Rn, Rm | 4 to 7^{[a]} |
| | Long unsigned accumulate | UMLAL RdLo, RdHi, Rn, Rm | 4 to 7^{[a]} |
| Divide | Signed | SDIV Rd, Rn, Rm | 2 to 12^{[b]} |
| | Unsigned | UDIV Rd, Rn, Rm | 2 to 12^{[b]} |
| Saturate | Signed | SSAT Rd, #<imm>, <op2> | 1 |
| | Unsigned | USAT Rd, #<imm>, <op2> | 1 |
| Compare | Compare | CMP Rn, <op2> | 1 |
| | Negative | CMN Rn, <op2> | 1 |
| Logical | AND | AND Rd, Rn, <op2> | 1 |
| | Exclusive OR | EOR Rd, Rn, <op2> | 1 |
| | OR | ORR Rd, Rn, <op2> | 1 |
| | OR NOT | ORN Rd, Rn, <op2> | 1 |
| | Bit clear | BIC Rd, Rn, <op2> | 1 |
| | Move NOT | MVN Rd, <op2> | 1 |
| | AND test | TST Rn, <op2> | 1 |
| | Exclusive OR test | TEQ Rn, <op2> | 1 |
| Shift | Logical shift left | LSL Rd, Rn, #<imm> | 1 |
| | Logical shift left | LSL Rd, Rn, Rs | 1 |
| | Logical shift right | LSR Rd, Rn, #<imm> | 1 |
| | Logical shift right | LSR Rd, Rn, Rs | 1 |
| | Arithmetic shift right | ASR Rd, Rn, #<imm> | 1 |
| | Arithmetic shift right | ASR Rd, Rn, Rs | 1 |
| Rotate | Rotate right | ROR Rd, Rn, #<imm> | 1 |
| | Rotate right | ROR Rd, Rn, Rs | 1 |
| | With extension | RRX Rd, Rn | 1 |
| Count | Leading zeroes | CLZ Rd, Rn | 1 |
| Load | Word | LDR Rd, [Rn, <op2>] | 2^{[c]} |
| | To PC | LDR PC, [Rn, <op2>] | 2^{[c]} + P |
| | Halfword | LDRH Rd, [Rn, <op2>] | 2^{[c]} |
| | Byte | LDRB Rd, [Rn, <op2>] | 2^{[c]} |
| | Signed halfword | LDRSH Rd, [Rn, <op2>] | 2^{[c]} |
| | Signed byte | LDRSB Rd, [Rn, <op2>] | 2^{[c]} |
| | User word | LDRT Rd, [Rn, #<imm>] | 2^{[c]} |
| | User halfword | LDRHT Rd, [Rn, #<imm>] | 2^{[c]} |
| | User byte | LDRBT Rd, [Rn, #<imm>] | 2^{[c]} |
| | User signed halfword | LDRSHT Rd, [Rn, #<imm>] | 2^{[c]} |
| | User signed byte | LDRSBT Rd, [Rn, #<imm>] | 2^{[c]} |
| | PC-relative | LDR Rd, [PC, #<imm>] | 2^{[c]} |
| | Doubleword | LDRD Rd, Rd, [Rn, #<imm>] | 1 + N |
| | Multiple | LDM Rn, {<reglist>} | 1 + N |
| | Multiple including PC | LDM Rn, {<reglist>, PC} | 1 + N + P |
| Store | Word | STR Rd, [Rn, <op2>] | 2^{[c]} |
| | Halfword | STRH Rd, [Rn, <op2>] | 2^{[c]} |
| | Byte | STRB Rd, [Rn, <op2>] | 2^{[c]} |
| | Signed halfword | STRSH Rd, [Rn, <op2>] | 2^{[c]} |
| | Signed byte | STRSB Rd, [Rn, <op2>] | 2^{[c]} |
| | User word | STRT Rd, [Rn, #<imm>] | 2^{[c]} |
| | User halfword | STRHT Rd, [Rn, #<imm>] | 2^{[c]} |
| | User byte | STRBT Rd, [Rn, #<imm>] | 2^{[c]} |
| | User signed halfword | STRSHT Rd, [Rn, #<imm>] | 2^{[c]} |
| | User signed byte | STRSBT Rd, [Rn, #<imm>] | 2^{[c]} |
| | Doubleword | STRD Rd, Rd, [Rn, #<imm>] | 1 + N |
| | Multiple | STM Rn, {<reglist>} | 1 + N |
| Push | Push | PUSH {<reglist>} | 1 + N |
| | Push with link register | PUSH {<reglist>, LR} | 1 + N |
| Pop | Pop | POP {<reglist>} | 1 + N |
| | Pop and return | POP {<reglist>, PC} | 1 + N + P |
| Semaphore | Load exclusive | LDREX Rd, [Rn, #<imm>] | 2 |
| | Load exclusive half | LDREXH Rd, [Rn] | 2 |
| | Load exclusive byte | LDREXB Rd, [Rn] | 2 |
| | Store exclusive | STREX Rd, Rt, [Rn, #<imm>] | 2 |
| | Store exclusive half | STREXH Rd, Rt, [Rn] | 2 |
| | Store exclusive byte | STREXB Rd, Rt, [Rn] | 2 |
| | Clear exclusive monitor | CLREX | 1 |
| Branch | Conditional | B<cc> <label> | 1 or 1 + P^{[d]} |
| | Unconditional | B <label> | 1 + P |
| | With link | BL <label> | 1 + P |
| | With exchange | BX Rm | 1 + P |
| | With link and exchange | BLX Rm | 1 + P |
| | Branch if zero | CBZ Rn, <label> | 1 or 1 + P^{[d]} |
| | Branch if nonzero | CBNZ Rn, <label> | 1 or 1 + P^{[d]} |
| | Byte table branch | TBB [Rn, Rm] | 2 + P |
| | Halfword table branch | TBH [Rn, Rm, LSL#1] | 2 + P |
| State change | Supervisor call | SVC #<imm> | - |
| | If-then-else | IT... <cond> | 1^{[e]} |
| | Disable interrupts | CPSID <flags> | 1 or 2 |
| | Enable interrupts | CPSIE <flags> | 1 or 2 |
| | Read special register | MRS Rd, <specreg> | 1 or 2 |
| | Write special register | MSR <specreg>, Rn | 1 or 2 |
| | Breakpoint | BKPT #<imm> | - |
| Extend | Signed halfword to word | SXTH Rd, <op2> | 1 |
| | Signed byte to word | SXTB Rd, <op2> | 1 |
| | Unsigned halfword | UXTH Rd, <op2> | 1 |
| | Unsigned byte | UXTB Rd, <op2> | 1 |
| Bit field | Extract unsigned | UBFX Rd, Rn, #<imm>, #<imm> | 1 |
| | Extract signed | SBFX Rd, Rn, #<imm>, #<imm> | 1 |
| | Clear | BFC Rd, Rn, #<imm>, #<imm> | 1 |
| | Insert | BFI Rd, Rn, #<imm>, #<imm> | 1 |
| Reverse | Bytes in word | REV Rd, Rm | 1 |
| | Bytes in both halfwords | REV16 Rd, Rm | 1 |
| | Signed bottom halfword | REVSH Rd, Rm | 1 |
| | Bits in word | RBIT Rd, Rm | 1 |
| Hint | Send event | SEV | 1 |
| | Wait for event | WFE | 1 + W |
| | Wait for interrupt | WFI | 1 + W |
| | No operation | NOP | 1 |
| Barriers | Instruction synchronization | ISB | 1 + B |
| | Data memory | DMB | 1 + B |
| | Data synchronization | DSB <flags> | 1 + B |

^{[a]} UMULL, SMULL, UMLAL, and SMLAL instructions use early termination depending on the size of the source values. These are interruptible, that is, abandoned and restarted, with a worst-case latency of one cycle.
^{[b]} Division operations use early termination to minimize the number of cycles required based on the number of leading ones and zeroes in the input operands.
^{[c]} Neighboring load and store single instructions can pipeline their address and data phases. This enables these instructions to complete in a single execution cycle.
^{[d]} Conditional branch completes in a single cycle if the branch is not taken.
^{[e]} An IT instruction can be folded onto a preceding 16-bit Thumb instruction, enabling execution in zero cycles.
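
The footnotes on branches, IT folding, and neighboring loads and stores can be turned into rough cycle estimates. The Python sketch below is one way to do that under stated assumptions; it is not a cycle-accurate model. All function names are hypothetical. The lower bound in `load_store_run_bounds` assumes that only the first access in a back-to-back run pays the full two cycles and that every later one completes in a single cycle, which the footnote permits but does not guarantee.

```python
def cond_branch_cycles(taken, p=1):
    """Conditional branch: 1 cycle if not taken, 1 + P if taken (P is 1 to 3)."""
    if not 1 <= p <= 3:
        raise ValueError("P must be between 1 and 3")
    return 1 + p if taken else 1


def it_cycles(folded):
    """IT instruction: 0 cycles when folded onto the preceding 16-bit
    Thumb instruction, otherwise 1 cycle."""
    return 0 if folded else 1


def load_store_run_bounds(count):
    """Return (min, max) cycles for `count` back-to-back single loads/stores.

    max: no pipelining at all, 2 cycles per access.
    min: ASSUMPTION, the first access costs 2 cycles and each later
         access completes in 1 cycle.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return count + 1, 2 * count


# Loop back-edge executed 10 times: taken 9 times, falls through once.
# P = 2 is an arbitrary value picked from the documented 1 to 3 range.
loop_branch_total = sum(cond_branch_cycles(i < 9, p=2) for i in range(10))

print(loop_branch_total)            # 28
print(load_store_run_bounds(4))     # (5, 8)
print(it_cycles(folded=True))       # 0
```

Estimates like these are useful for comparing alternative code sequences. Actual timing also depends on wait states and bus contention, which Table 1 excludes.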