Things that remain constant whichever Mbed LPC1768 is used: peripheral definitions; leds; memory layout; main processor clock. Note that watchdog, default and hard fault handlers; peripheral power and clocks; and the vector table will need to be set up for each project.

Dependents:   test-lpc1768 oldheating gps motorhome ... more

Aim

To program the MBED LPC1768 using just the online compiler and the UM10360.pdf manual.
As a hobbyist I want to do the fun stuff - programming the various peripherals to make them do what I wanted them to do - unfortunately the Mbed OS libraries have done all that while leaving the mundane for me to do.
The Mbed OS libraries have to cope with multiple compilers, multiple targets and all the peripherals even if not used; making them large and putting many layers between you and the registers of the LPC1768.
We, on he other hand, can limit ourselves to a single compiler (the online ARM CC compiler), one target (the MBED LPC1768), and those few peripherals we need which, together with limiting some abilities such as dynamic allocation of interrupt vectors, allows us to dramatically reduces the number of files required and means we can specify and understand each and every bit of our application.

Technical

Bit Banding

Use bit-banding to access the individual bits by mapping each bit in the region base (GPIO 0x2000 0000 or peripheral 0x4000 0000) to a word in the region alias (GPIO 0x2200 0000 or peripheral 0x4200 0000).
The formula is therefore ALIAS + ((((registerAddress - BASE) << 3) + BIT) << 2)) which calculates the number of bits above the base [bytes * 8 plus bit number]; converts it to the number of bytes [ *4] and adds that to the alias.
Writing with bit banding still involves a read modify write cycle; the difference is that the cycle and bit twiddling is managed in hardware and is atomic.
For GPIO writes it is better to use the SET and CLR registers without bit banding to eliminate the surplus read.

Startup code

From the ARM Information Center:

  • __main calls __scatterload then calls __rt_entry
  • __scatterload goes through the region table and initializes the various execution-time regions
  • __rt_entry calls __user_setup_stackheap to initialise the stack and heap, then calls __rt_lib_init to initialise the C library sub systems and finally calls the user-level main

Files used

These are the files listed roughly in the order you might create them. Files you would not normally touch are in the lpc1768 library to make them easier to maintain, while files you would normally modify to suit your application are in the main folder.
lpc1768 contains static files for:

  • startup - memory layout in a scatter file and a routine to calculate the stack and heap locations
  • bitband - macros for each of the two bit band regions
  • system - set up the PLL to take the CPU clock from 12 to 96MHz
  • gpio - addresses of the various GPIO registers
  • led - leds 1 to 4: get, toggle and set
  • serialpc - the serial link to the PC using uart0
  • firmware - contains routines to handle the uploading of new firmware without having to plug into the usb port
  • semihost - contains routines to do various jobs such as MAC, reset and local file handling
  • fault - saves the location of a fault across restarts in a small area left aside in the scatter file
  • handlers - hard fault and default handlers: other handlers are defined in the module that uses them
  • watchdog - watchdog handler

main contains files to modify for:

  • vectors - a simple list of handler addresses without any code
  • periphs - a list of peripherals to power up and their clock dividers
  • main - the start up routine which calls periphs, system, and led initialisation then loops flashing an led

startup.sct

Contains the scatter file where the different memory regions are defined for the linker.
Derived from targets/TARGET_NXP/TARGET_LPC176X/device/TOOLCHAIN_ARM_STD/LPC1768.sct.
I have removed the NIVT and IAP reserved areas by not moving the vector table into ram (therefore have to specify the handlers at link time: they cannot be dynamically allocated) and by not using In Application Programming.
The .CRPSection, together with a line in startup.c ensures that the value -1 is stored in location 0x2FC thereby setting code read protection to none.
The following sections are specified:

  • RESET - used in vectors.S
  • .CRPSection - used in startup.c to ensure that Code Read Protection is set to none
  • InRoot$$Sections - used for code which must execute at its load address
  • AHBSRAM0 - used in the log module for use as buffer; normally dedicated to USB if you use it
  • AHBSRAM1 - used in the net module for the ethernet buffers
  • fault - Leave 4 bytes for the fault code at 20083FFC
  • CANRAM - normally dedicated to Controller Area Network if you use it

startup.sct

#! armcc -E

#define MBED_APP_START 0x00000000

#define MBED_APP_SIZE 0x80000

LR_IROM1 MBED_APP_START MBED_APP_SIZE ; load region size_region
{   
  ER_IROM0 MBED_APP_START 0x2FC    ; load address = execution address
  {
    *.o (RESET, +First)
    .ANY (+RO)
  }
  ER_CRP (MBED_APP_START + 0x2FC) FIXED 4
  {
    *.o (.CRPSection)
  }
  ER_IROM1 (MBED_APP_START + (0x2FC + 4)) FIXED (MBED_APP_SIZE - (0x2FC + 4))
  {
    *(InRoot$$Sections)
    .ANY (+RO)
  }
  RW_IRAM1 0x10000000 0x8000    ; RW and ZI
  {
   .ANY (+RW +ZI)
  }
  RW_IRAM2 0x2007C000 0x4000    ; RW data, USB RAM used by the Log module
  {
   .ANY (AHBSRAM0)
  }
  RW_IRAM3 0x20080000 0x3FFC    ; RW data, ETH RAM used by the Net module. Leave 4 bytes for the fault code at 20083FFC
  {
   .ANY (AHBSRAM1)
  }
  RW_IRAM4 0x40038000 0x0800    ; RW data, CAN RAM
  {
   .ANY (CANRAM)
  }
}

startup.c

Contains the c function __user_setup_stackheap which is called from __main to calculate the stack and heap positions prior to main being called .
Derived from platform/mbed_retarget.cpp.

startup.c

#include <stdint.h>
#include <rt_misc.h>

__attribute__ ((section (".CRPSection"), used)) const long crpKey = -1; //Don't use Code Read Protection

extern char Image$$RW_IRAM1$$ZI$$Limit[];

__value_in_regs struct __initial_stackheap __user_setup_stackheap(uint32_t R0, uint32_t R1, uint32_t R2, uint32_t R3)
{
    uint32_t zi_limit = (uint32_t)Image$$RW_IRAM1$$ZI$$Limit;
    uint32_t sp_limit = __current_sp();

    zi_limit = (zi_limit + 7) & ~0x7;    // ensure zi_limit is 8-byte aligned

    struct __initial_stackheap r;
    r.heap_base = zi_limit;
    r.heap_limit = sp_limit;
    return r;
}

vectors.S

Contains the vector table specified in assembly; there is no code here: just pointers
Derived from targets/TARGET_NXP/TARGET_LPC176X/device/TOOLCHAIN_ARM_STD/startup_LPC17xx.S. There are two difference from Mbed:

  • I have not specified a ResetHandler or SystemInit function but have just put __main directly into the table and called SystemInit from main: there is no longer assembly code for a reset handler which calls SystemInit then __main.
  • I define the handlers in the module they are used (the default and hard fault handlers have been given their own module) and put the name directly into the table: they are no longer weak references which default to an infinite loop if not defined.

To define an interrupt handler in a given application:

  • Define the handler (void parameters and returns void) in its module using the attribute __irq
  • Add the name of the handler to the list of IMPORTs
  • Replace 'DefaultHandler' on the relevant line of __Vectors with the name of the handler

vectors.S

__initial_sp        EQU     0x10008000  ; Top of RAM from LPC1768

                PRESERVE8
                THUMB

; Vector Table Mapped to Address 0 at Reset

                AREA    RESET, DATA, READONLY
                EXPORT  __Vectors
                IMPORT  __main
                IMPORT  DefaultHandler
                IMPORT  HardFaultHandler
                IMPORT  WatchdogHandler

__Vectors       DCD     __initial_sp              ; Top of Stack
                DCD     __main                    ; Reset Handler
                DCD     DefaultHandler            ; NMI Handler
                DCD     HardFaultHandler          ; Hard Fault Handler
                DCD     DefaultHandler            ; MPU Fault Handler
                DCD     DefaultHandler            ; Bus Fault Handler
                DCD     DefaultHandler            ; Usage Fault Handler
                DCD     0                         ; Reserved
                DCD     0                         ; Reserved
                DCD     0                         ; Reserved
                DCD     0                         ; Reserved
                DCD     DefaultHandler            ; SVCall Handler
                DCD     DefaultHandler            ; Debug Monitor Handler
                DCD     0                         ; Reserved
                DCD     DefaultHandler            ; PendSV Handler
                DCD     DefaultHandler            ; SysTick Handler

                ; External Interrupts
                DCD     WatchdogHandler           ; Watchdog Timer
                DCD     DefaultHandler            ; Timer0
                DCD     DefaultHandler            ; Timer1
                DCD     DefaultHandler            ; Timer2
                DCD     DefaultHandler            ; Timer3
                DCD     DefaultHandler            ; UART0
                DCD     DefaultHandler            ; UART1
                DCD     DefaultHandler            ; UART2
                DCD     DefaultHandler            ; UART3
                DCD     DefaultHandler            ; PWM1
                DCD     DefaultHandler            ; I2C0
                DCD     DefaultHandler            ; I2C1
                DCD     DefaultHandler            ; I2C2
                DCD     DefaultHandler            ; SPI
                DCD     DefaultHandler            ; SSP0
                DCD     DefaultHandler            ; SSP1
                DCD     DefaultHandler            ; PLL0 Lock (Main PLL)
                DCD     DefaultHandler            ; Real Time Clock
                DCD     DefaultHandler            ; External Interrupt 0
                DCD     DefaultHandler            ; External Interrupt 1
                DCD     DefaultHandler            ; External Interrupt 2
                DCD     DefaultHandler            ; External Interrupt 3
                DCD     DefaultHandler            ; A/D Converter
                DCD     DefaultHandler            ; Brown-Out Detect
                DCD     DefaultHandler            ; USB
                DCD     DefaultHandler            ; CAN
                DCD     DefaultHandler            ; General Purpose DMA
                DCD     DefaultHandler            ; I2S
                DCD     DefaultHandler            ; Ethernet
                DCD     DefaultHandler            ; Repetitive Interrupt Timer
                DCD     DefaultHandler            ; Motor Control PWM
                DCD     DefaultHandler            ; Quadrature Encoder Interface
                DCD     DefaultHandler            ; PLL1 Lock (USB PLL)

                ALIGN
                END

bitband.h

bitband.h

#define ALIAS4 0x42000000
#define BASE4  0x40000000
#define BIT_BAND4(ADDR_PAR, BIT_PAR) *((volatile unsigned *)(ALIAS4 + ((((ADDR_PAR - BASE4) << 3) + BIT_PAR) << 2)))

periphs.c

Power up, select the clock divider and select the pins for the peripherals you need; before the PLL is enabled.

periphs.c

#define PCONP    (*((volatile unsigned *) 0x400FC0C4))
#define PCLKSEL0 (*((volatile unsigned *) 0x400FC1A8))
#define PCLKSEL1 (*((volatile unsigned *) 0x400FC1AC))
#define PINSEL0  (*((volatile unsigned *) 0x4002C000))
#define PINSEL1  (*((volatile unsigned *) 0x4002C004))
#define PINSEL2  (*((volatile unsigned *) 0x4002C008))
#define PINSEL3  (*((volatile unsigned *) 0x4002C00C))


void PeriphsInit (void)
{
    //Peripheral power - Table 46
    PCONP  = 0;
    PCONP |= 1 <<  1; //TIMER0
    PCONP |= 1 <<  3; //UART0
    PCONP |= 1 <<  4; //UART1
    PCONP |= 1 <<  9; //RTC
    PCONP |= 1 << 10; //SSP1
    PCONP |= 1 << 15; //GPIO
    PCONP |= 1 << 30; //ENET                 

    //Peripheral clocks; default is 00 to divide by 4; specify 01 to divide by 1
    PCLKSEL0  = 0;
    PCLKSEL0 |= 1 <<  2;  //TIM0
    PCLKSEL0 |= 1 <<  6;  //UART0
    PCLKSEL0 |= 1 <<  8;  //UART1
    PCLKSEL0 |= 1 << 20;  //SSP1

    //Pin functions table 80.
    PINSEL0  = 0;
    PINSEL0 |= 1U <<  4; //P0.02 01 TXD0  UART0
    PINSEL0 |= 1U <<  6; //P0.03 01 RXD0  UART0
    PINSEL0 |= 2U << 14; //P0.07 10 SCK1   SSP1
    PINSEL0 |= 2U << 16; //P0.08 10 MISO1  SSP1
    PINSEL0 |= 2U << 18; //P0.09 10 MOSI1  SSP1
    PINSEL0 |= 1U << 30; //P0.15 01 TXD1  UART1
    
    PINSEL1  = 0;
    PINSEL1 |= 1U <<  0; //P0.16 01 RXD1  UART1
    
    PINSEL2  = 0;
    PINSEL2 |= 1U <<  0; //P1.00 01 ENET_TXD0
    PINSEL2 |= 1U <<  2; //P1.01 01 ENET_TXD1
    PINSEL2 |= 1U <<  8; //P1.04 01 ENET_TX_EN
    PINSEL2 |= 1U << 16; //P1.08 01 ENET_CRS
    PINSEL2 |= 1U << 18; //P1.09 01 ENET_RXD0
    PINSEL2 |= 1U << 20; //P1.10 01 ENET_RXD1
    PINSEL2 |= 1U << 28; //P1.14 01 ENET_RX_ER
    PINSEL2 |= 1U << 30; //P1.15 01 ENET_REF_CLK
    
    PINSEL3  = 0;
    PINSEL3 |= 1U <<  0; //P1.16 01 ENET_MDC
    PINSEL3 |= 1U <<  2; //P1.17 01 ENET_MDIO
}

system.c

Sets up the flash wait states, the main oscillator and finally starts the PLL to move from 12MHz to 96MHz. The peripheral clocks have to be set before the PLL is enabled.

system.c

#include "bitband.h"

//Addresses
#define   FLASHCFG_ADDR 0x400FC000 // Flash Accelerator Configuration Register; controls flash access timing R/W 0x303A
#define    CCLKCFG_ADDR 0x400FC104 // CPU Clock Configuration Register R/W 0
#define  CLKSRCSEL_ADDR 0x400FC10C // Clock Source Select Register R/W 0 
#define        SCS_ADDR 0x400FC1A0 // System Control and Status R/W 0
#define    PLL0CON_ADDR 0x400FC080 // PLL0 Control Register R/W 0
#define    PLL0CFG_ADDR 0x400FC084 // PLL0 Configuration Register R/W 0
#define   PLL0STAT_ADDR 0x400FC088 // PLL0 Status Register RO 0
#define   PLL0FEED_ADDR 0x400FC08C // PLL0 Feed Register WO NA
#define  USBCLKCFG_ADDR 0x400FC108 // USB Clock Configuration Register R/W 0
#define  CLKOUTCFG_ADDR 0x400FC1C8 // Clock Output Configuration Register R/W 0

//Registers
#define   FLASHCFG *((volatile unsigned *)  FLASHCFG_ADDR)
#define    CCLKCFG *((volatile unsigned *)   CCLKCFG_ADDR)
#define  CLKSRCSEL *((volatile unsigned *) CLKSRCSEL_ADDR)
#define    PLL0CFG *((volatile unsigned *)   PLL0CFG_ADDR)
#define   PLL0FEED *((volatile unsigned *)  PLL0FEED_ADDR)
#define  USBCLKCFG *((volatile unsigned *) USBCLKCFG_ADDR)
#define  CLKOUTCFG *((volatile unsigned *) CLKOUTCFG_ADDR)

//Bits
#define PLLE0    BIT_BAND4( PLL0CON_ADDR,  0)
#define PLLC0    BIT_BAND4( PLL0CON_ADDR,  1)
#define PLOCK0   BIT_BAND4(PLL0STAT_ADDR, 26) //Reflects the PLL0 Lock status.
#define OSCRANGE BIT_BAND4(     SCS_ADDR,  4)
#define OSCEN    BIT_BAND4(     SCS_ADDR,  5)
#define OSCSTAT  BIT_BAND4(     SCS_ADDR,  6)

void SystemInit()
{
    //Flash config
    FLASHCFG &= ~(0xF << 12); // b15:12 0b0100 Flash accesses use 5 CPU clocks. Use for up to 100 MHz CPU clock.
    FLASHCFG |=   0x4 << 12 ;

    //Enable the main oscillator and wait for it to be ready
    OSCRANGE = 0;                  //Main Oscillator Range Select: 0= 1 MHz to 20 MHz; 1= 15 MHz to 25 MHz
    OSCEN    = 1;                  //Main Oscillator Enable
    while (!OSCSTAT) __nop();      //Wait for Main Oscillator to be ready
    
    // bit 7:0 CCLKSEL: Divide value for the CPU clock (CCLK) from the PLL0 output.
    CCLKCFG   = 2;                 // 2 == divide by 3
    
    //PLL0 Clock Source Select
    CLKSRCSEL = 1;                 //01 selects the main oscillator
    
    //PLLO Configuration
    PLL0CFG   = 11;                //MSEL0=11 == M=12 == multiply by 24; NSEL=0 == divide by 1
    PLL0FEED  = 0xAA;
    PLL0FEED  = 0x55;
    
    //PLL0 Control
    PLLE0     = 1;                 // PLL0 Enable
    PLL0FEED  = 0xAA;
    PLL0FEED  = 0x55;
    while (!PLOCK0) __nop();       // Wait for PLL to be locked
    PLLC0     = 1;                 // PLL0 Connect
    PLL0FEED  = 0xAA;
    PLL0FEED  = 0x55;

    //USB Clock divider
    USBCLKCFG = 5;                 // 5 == divide by 6
    
    //Clock Output
    CLKOUTCFG = 0;                 //No clock output
}

gpio.h

gpio.h

#define FIO0DIR_ADDR 0x2009C000
#define FIO1DIR_ADDR 0x2009C020
#define FIO2DIR_ADDR 0x2009C040

#define FIO0PIN_ADDR 0x2009C014
#define FIO1PIN_ADDR 0x2009C034
#define FIO2PIN_ADDR 0x2009C054

#define FIO0SET_ADDR 0x2009C018
#define FIO1SET_ADDR 0x2009C038
#define FIO2SET_ADDR 0x2009C058

#define FIO0CLR_ADDR 0x2009C01C
#define FIO1CLR_ADDR 0x2009C03C
#define FIO2CLR_ADDR 0x2009C05C

#define FIO0SET(BIT_PAR) *((volatile unsigned *) FIO0SET_ADDR) = 1U << BIT_PAR
#define FIO1SET(BIT_PAR) *((volatile unsigned *) FIO1SET_ADDR) = 1U << BIT_PAR
#define FIO2SET(BIT_PAR) *((volatile unsigned *) FIO2SET_ADDR) = 1U << BIT_PAR

#define FIO0CLR(BIT_PAR) *((volatile unsigned *) FIO0CLR_ADDR) = 1U << BIT_PAR
#define FIO1CLR(BIT_PAR) *((volatile unsigned *) FIO1CLR_ADDR) = 1U << BIT_PAR
#define FIO2CLR(BIT_PAR) *((volatile unsigned *) FIO2CLR_ADDR) = 1U << BIT_PAR

#define ALIAS2 0x22000000
#define BASE2  0x20000000
#define BIT_BAND2(ADDR_PAR, BIT_PAR) *((volatile unsigned *)(ALIAS2 + ((((ADDR_PAR - BASE2) << 3) + BIT_PAR) << 2)))

#define FIO0PIN(BIT_PAR) BIT_BAND2(FIO0PIN_ADDR, BIT_PAR)
#define FIO1PIN(BIT_PAR) BIT_BAND2(FIO1PIN_ADDR, BIT_PAR)
#define FIO2PIN(BIT_PAR) BIT_BAND2(FIO2PIN_ADDR, BIT_PAR)

#define FIO0DIR(BIT_PAR) BIT_BAND2(FIO0DIR_ADDR, BIT_PAR)
#define FIO1DIR(BIT_PAR) BIT_BAND2(FIO1DIR_ADDR, BIT_PAR)
#define FIO2DIR(BIT_PAR) BIT_BAND2(FIO2DIR_ADDR, BIT_PAR)

led.c

led.c

#include <stdbool.h>
#include "bitband.h"
#include "gpio.h"

#define LED1BIT 18
#define LED2BIT 20
#define LED3BIT 21
#define LED4BIT 23

#define LED1DIR BIT_BAND2(FIO1DIR_ADDR, LED1BIT)
#define LED2DIR BIT_BAND2(FIO1DIR_ADDR, LED2BIT)
#define LED3DIR BIT_BAND2(FIO1DIR_ADDR, LED3BIT)
#define LED4DIR BIT_BAND2(FIO1DIR_ADDR, LED4BIT)

#define LED1PIN BIT_BAND2(FIO1PIN_ADDR, LED1BIT)
#define LED2PIN BIT_BAND2(FIO1PIN_ADDR, LED2BIT)
#define LED3PIN BIT_BAND2(FIO1PIN_ADDR, LED3BIT)
#define LED4PIN BIT_BAND2(FIO1PIN_ADDR, LED4BIT)

#define LED1SET FIO1SET = 1U << LED1BIT
#define LED1CLR FIO1CLR = 1U << LED1BIT

#define LED2SET FIO1SET = 1U << LED2BIT
#define LED2CLR FIO1CLR = 1U << LED2BIT

#define LED3SET FIO1SET = 1U << LED3BIT
#define LED3CLR FIO1CLR = 1U << LED3BIT

#define LED4SET FIO1SET = 1U << LED4BIT
#define LED4CLR FIO1CLR = 1U << LED4BIT

void LedInit()
{
    LED1DIR = 1;
    LED2DIR = 1;
    LED3DIR = 1;
    LED4DIR = 1;
}
void Led1Set(bool value) { if (value) LED1SET; else LED1CLR; }
void Led2Set(bool value) { if (value) LED2SET; else LED2CLR; }
void Led3Set(bool value) { if (value) LED3SET; else LED3CLR; }
void Led4Set(bool value) { if (value) LED4SET; else LED4CLR; }

void Led1Tgl() { if (LED1PIN) LED1CLR; else LED1SET; }
void Led2Tgl() { if (LED2PIN) LED2CLR; else LED2SET; }
void Led3Tgl() { if (LED3PIN) LED3CLR; else LED3SET; }
void Led4Tgl() { if (LED4PIN) LED4CLR; else LED4SET; }

bool Led1Get() { return LED1PIN; }
bool Led2Get() { return LED2PIN; }
bool Led3Get() { return LED3PIN; }
bool Led4Get() { return LED4PIN; }

handlers.c

Contains those handlers which are not defined elsewhere in code. Any handler needs to be defined with the __irq attribute.
The hard fault handler is configured to just flash led 4; this leaves the other leds available to help locate the source of the hard fault: typically something like an array overflow or accessing a peripheral that has not been powered up. Faults are saved in the fault module for later debugging

handlers.c

#include <stdint.h>

#include "fault.h"
#include "led.h"

__irq void DefaultHandler()
{
    FaultSave(FAULT_TYPE_DEFAULT);
    int value = 0;
    int count = 0;
    while (1)
    {
        count++;
        if (count > 10000000)
        {
            value = !value;
            Led1Set( value);
            Led2Set(!value);
            Led3Set(!value);
            Led4Set( value);
            count = 0;
        }
    }
}
__irq void HardFaultHandler()
{
    FaultSave(FAULT_TYPE_HARD);
    int value = 0;
    int count = 0;
    while (1)
    {
        count++;
        if (count > 10000000)
        {
            value = !value;
            Led4Set(value);
            count = 0;
        }
    }
}

main.c

Calls PeriphsInit to power up and select the peripheral clocks, then SystemInit and LedInit. After that you can initialise anything else then go into an infinite loop as this routine never exits.

main.c

#include <stdint.h>
#include <stdbool.h>

#include "periphs.h"
#include "system.h"
#include "led.h"

int main()
{
    PeriphsInit();
    SystemInit();
    LedInit();
    
    int count = 0;
    while (true)
    {
        count++;
        if (count > 10000000)
        {
            Led1Set(!Led1Get());
            count = 0;
        }
    }
}

bignum/bn-asm.inc

Committer:
andrewboyson
Date:
2022-06-03
Revision:
71:2f8af23950de
Parent:
55:975f706c67d2

File content as of revision 71:2f8af23950de:

; Define and set macro variables BITS, WORDS and BYTES before including this file
;
; Functions place their arguments in R0, R1, R2, R3 and then after four they will be passed on the stack.
; The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return a result value from a function.
; They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
; A subroutine must preserve the contents of registers r4-r8, r10, r11 and the SP.
; This can be done on the stack with push and pop operations. See the ARM Procedure Call Standard for additional details on argument passing.
; Use   BX   LR  to return to C using link register (Branch indirect using LR - a return)

    EXPORT BnZer$BITS
    EXPORT BnInc$BITS
    EXPORT BnShr$BITS
    EXPORT BnShl$BITS
    EXPORT BnCpy$BITS
    EXPORT BnOrr$BITS
    EXPORT BnAdd$BITS
    EXPORT BnSub$BITS
    EXPORT BnCmp$BITS
    EXPORT BnIse$BITS
    EXPORT BnIne$BITS
    
    GBLA COUNT
    
BnZer$BITS
        MOV   R3, #0          ;Clear our 'zero' register
        
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        STR  R3, [R0], #4     ;R0 contains the address of the data; R0 will be post incremented by 4 bytes
    
        WEND
    
        BX   LR

BnInc$BITS
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR  R3, [R0]       ;R0 contains the address of the data
        ADDS R3,  R3,  #1   ;Add 1 without carry then update the carry flag
        STR  R3, [R0], #4   ;R0 contains the address of the data; R0 will be post incremented by 4 bytes
        BXCC LR             ;Return if the carry is clear
    
        WEND
    
        BX   LR

BnCpy$BITS
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR  R2, [R1], #4   ;R1 contains the address of the donor; R1 will be post incremented by 4 bytes
        STR  R2, [R0], #4   ;R0 contains the address of the recipient; R0 will be post incremented by 4 bytes
    
        WEND
    
        BX   LR
        
BnOrr$BITS
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR  R2, [R0]       ;R0 contains the address of the accumulator
        LDR  R3, [R1], #4   ;R1 contains the address of the value; R1 will be post incremented by 4 bytes
        ORR  R2,  R2,  R3   ;Or
        STR  R2, [R0], #4   ;R0 contains the address of the recipient; R0 will be post incremented by 4 bytes
    
        WEND
    
        BX   LR
        
BnAdd$BITS
        CMN  R0, #0         ;Acts like the addition of zero which will clear the carry flag
        
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR  R2, [R0]       ;R0 contains the address of the accumulator
        LDR  R3, [R1], #4   ;R1 contains the address of the additive; R1 will be post incremented by 4 bytes
        ADCS R2,  R2,  R3   ;Add with carry then update the carry flag
        STR  R2, [R0], #4   ;R0 contains the address of the data; R0 will be post incremented by 4 bytes
    
        WEND
    
        BX   LR
        
BnSub$BITS
        CMP  R0, #0         ;Acts like the subtraction of zero which will set the carry flag
        
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR  R2, [R0]       ;R0 contains the address of the accumulator
        LDR  R3, [R1], #4   ;R1 contains the address of the subtractor; R1 will be post incremented by 4 bytes
        SBCS R2,  R2,  R3   ;Subtract with carry then update the carry flag
        STR  R2, [R0], #4   ;R0 contains the address of the data; R0 will be post incremented by 4 bytes
    
        WEND
    
        BX   LR

BnShr$BITS                  ;IN R0 pData; IN R1 'bit to shift in'; OUT R0 'bit shifted out'
        ADD  R0, R0, #BYTES ;Go just beyond the big end of the data (1024 / 8)
        RRXS R1, R1         ;Put the lsb of 'bit to shift in' into the carry flag
        
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR  R3, [R0, #-4]! ;R0 contains the address of the data; it is pre-decremented by 4
        RRXS R3,  R3        ;Rotate right putting the carry into bit 31 then update the carry flag
        STR  R3, [R0]       ;R0 contains the address of the data
    
        WEND
    
        MOVCC R0, #0       ;Return carry set or carry clear
        MOVCS R0, #1
        BX   LR
        
BnShl$BITS                 ;IN R0 pData; IN R1 'bit to shift in'; OUT R0 'bit shifted out'
        AND   R1, R1, #1   ;Mask out all but the lsb in 'bit to shift in' in case a bool true is represented other than by a 1
        
COUNT   SETA  0
        WHILE COUNT < WORDS
COUNT   SETA  COUNT + 1

        LDR   R3, [R0]     ;R0 contains the address of the data
        LSLS  R3,  R3, #1  ;Shift R3 left by one then update the carry flag with the 31st bit
        ORR   R3,  R1      ;Add the 'bit to shift in' but don't touch the carry flag
        MOVCC R1, #0       ;Set 'bit to shift in' for the next loop from the carry set or carry clear
        MOVCS R1, #1
        STR   R3, [R0], #4 ;R0 contains the address of the data; R0 will be post incremented by 4 bytes
    
        WEND
        MOV   R0, R1       ;Return the 'bit shifted out'
        BX    LR
        
BnCmp$BITS
        ADD  R0, R0, #BYTES    ;Go just beyond the big end of the data (1024 / 8)
        ADD  R1, R1, #BYTES    ;Go just beyond the big end of the data (1024 / 8)
        
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR   R2, [R0, #-4]! ;R0 contains the address of the lhs; R0 will be pre decremented by 4 bytes
        LDR   R3, [R1, #-4]! ;R1 contains the address of the rhs; R1 will be pre decremented by 4 bytes
        CMP   R2,  R3        ;Set flags as result of subtracting R3 from R2. R2 > R3 ==> HI; R2 < R3 ==> LO
        BHI.W %F99           ;Return +1 if R2 is higher than R3
        BLO.W %F98           ;Return -1 if R2 is higher than R3
    
        WEND
    
        MOV  R0, #0          ;Return  0
        BX   LR              ;
99      MOV  R0, #+1         ;Return +1
        BX   LR
98      MOV  R0, #-1         ;Return -1
        BX   LR

BnIse$BITS
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR   R3, [R0], #4   ;R0 contains the address of the lhs; R0 will be post incremented by 4 bytes
        CMP   R3,  #0        ;Set flags as result of subtracting R3 from R2. R2 > R3 ==> HI; R2 < R3 ==> LO
        BNE.W %F99           ;Return not empty - F == search forward 0 too 99 are local numeric labels
    
        WEND

        MOV  R0, #1          ;Return true
        BX   LR
99      MOV  R0, #0          ;Return false
        BX   LR

        
BnIne$BITS
COUNT   SETA    0
        WHILE   COUNT < WORDS
COUNT   SETA    COUNT + 1

        LDR   R3, [R0], #4   ;R0 contains the address of the lhs; R0 will be post incremented by 4 bytes
        CMP   R3,  #0        ;Set flags as result of subtracting R3 from R2. R2 > R3 ==> HI; R2 < R3 ==> LO
        BNE.W %F99           ;Return not empty
    
        WEND

        MOV  R0, #0          ;Return false
        BX   LR
99      MOV  R0, #1          ;Return true
        BX   LR