FLASH array programming using IAP

Hi all, several people seem to have run into problems trying to place arrays in FLASH on the LPC1768.

https://developer.mbed.org/users/okano/notebook/iap-in-application-programming-internal-flash-eras/

Using Igor's method, a workaround was suggested: add an attribute to the array declaration. However, this is less than ideal. It certainly seems to break the mechanism the mbed compiler uses to calculate memory usage, and there are reports of compilation problems, which I experienced too. So I wrote a little wrapper that works with standard const arrays, defined and compiled normally by the MBED compiler. Once a const array is defined, the code can write to any of its elements using the IAP library. There are quite a few things to consider:

  1. Does the running code use the AHBSRAM area (Ethernet, for example)?
  2. Does the code need interrupts enabled between calls to the array fill routine?
  3. Can the array be filled sequentially from start to finish?
  4. Everything is (currently) specific to the LPC1768 (512K Flash).

These points matter because of the techniques I used to minimize the number of erase/write cycles when writing the array. If the system uses neither AHBSRAM0 nor AHBSRAM1, the full 32K is used as a scratch area into which the existing contents of a flash sector are copied before it is erased. The sector is then reconstructed in the scratch area and written back to FLASH. Since these SRAM blocks are otherwise unused, the AHBSRAM contents before and after the calls are irrelevant.

If, however, these areas are used, either by libraries in your code or in any other way, I have implemented a simple technique: the AHBSRAM is temporarily stored in sector 29 of the FLASH, and when the flash operations are complete, sector 29 is copied back into AHBSRAM. Throughout these processes interrupts must be disabled, both to prevent any unwanted access to FLASH while we program it and, by the same token, to prevent access to AHBSRAM while it is in use as a scratch area. Exactly how and when you allow interrupts is specific to your own code.

Consider a system where AHBSRAM is used elsewhere, so its contents must be stored temporarily in sector 29, but where the entire array can still be filled efficiently in 512-byte blocks from start to finish with interrupts disabled throughout. In this case AHBSRAM is stored to sector 29 the first time the fill function needs to erase a sector. Each affected sector is then erased as required by the calls that write each 512-byte block. Finally, the last call to the write function is flagged through its parameters: AHBSRAM is restored, interrupts are re-enabled, and you are good to go.
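To make the save-once/restore-once sequence concrete, here is a host-side model of it. It is only a sketch under assumptions: ScratchModel and fill_call() are hypothetical names, the two arrays stand in for AHBSRAM0/1 and sector 29, and the irq_enabled flag stands in for __disable_irq()/__enable_irq(). No hardware is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side model of the preserve_sram call sequence (hypothetical names, no hardware access).
// 'ahbsram' stands in for AHBSRAM0+1 and 'sector29' for flash sector 29.
struct ScratchModel {
    static constexpr std::size_t kSize = 0x8000;  // 32K: AHBSRAM0 + AHBSRAM1
    std::uint8_t ahbsram[kSize];                  // Live AHBSRAM contents owned by the application
    std::uint8_t sector29[kSize];                 // Backup copy held in flash sector 29
    bool irq_enabled = true;
    bool active_scratch = false;                  // AHBSRAM currently saved in sector 29
    int saves = 0;
    int restores = 0;

    // One call to the fill routine with preserve_sram == true.
    void fill_call(bool needs_erase, bool last_call) {
        irq_enabled = false;                                  // __disable_irq()
        if (needs_erase && !active_scratch) {
            std::memcpy(sector29, ahbsram, kSize);            // Save AHBSRAM once per sequence
            active_scratch = true;
            ++saves;
        }
        if (needs_erase)
            std::memset(ahbsram, 0xA5, kSize);                // Sector caching overwrites AHBSRAM
        if (last_call) {
            if (active_scratch) {
                std::memcpy(ahbsram, sector29, kSize);        // Restore AHBSRAM from sector 29
                active_scratch = false;
                ++restores;
            }
            irq_enabled = true;                               // __enable_irq()
        }
    }
};
```

Call fill_call() once per write and pass last_call on the final one. Interrupts stay off between calls, AHBSRAM is saved only on the first call that needs an erase, and in this model the restore is skipped if nothing was saved during the sequence.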

If, however, interrupts cannot stay disabled throughout, the scratch area must be saved and restored every time a sector has to be erased. I have tried to amortize this cost with a bitmap flag. Every time a sector is erased, a 64-bit bitmap records which 512-byte blocks in that sector are still erased, i.e. have not yet been written by the fill routine; any bytes of the array not supplied by the call are written as 0xFF after the erase. Subsequent calls that write into a still-erased block of this sector therefore need no erase, and so no caching of AHBSRAM into sector 29 either. This means that, even if interrupts are required between calls to the fill routine, sector 29 is only hit when we write into a block that the fill routine has already written. Note that this also implies that changing just a single byte within a block marks the whole 512-byte block as 'written'. Should any other byte(s) within that same block be written later, the entire sector will need to be cached, erased and rewritten. It is therefore far more efficient to pass data, at a minimum, as full 512-byte aligned chunks. This keeps the number of erase and potential caching (sector 29) operations to a minimum, even if the chunks are not written sequentially within the sector.
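The bitmap bookkeeping can be modelled off-target like this. WriteableMap and write_block() are hypothetical names; the sketch assumes each call writes exactly one 512-byte block and only counts erases, it does not touch flash or AHBSRAM.

```cpp
#include <cassert>
#include <cstdint>

// Minimal model of the per-sector "writeable block" bitmap (hypothetical names).
// Bit b of bm[s] set => 512-byte block b of sector s has been erased and not written since.
// LPC1768: sectors 0-15 are 4K (8 blocks), sectors 16-29 are 32K (64 blocks).
struct WriteableMap {
    std::uint64_t bm[30] = {};   // All zero: nothing known to be erased, so the first write erases
    int erases = 0;

    static unsigned blocks_in(unsigned sect) { return sect < 16 ? 8u : 64u; }

    // Record a write to one block; returns true if the sector had to be erased first.
    bool write_block(unsigned sect, unsigned block) {
        bool erased = false;
        if (((bm[sect] >> block) & 1u) == 0) {                  // Already programmed (or unknown)
            unsigned n = blocks_in(sect);
            bm[sect] = (n == 64) ? ~0ULL : ((1ULL << n) - 1);   // Erase: every block writeable again
            ++erases;
            erased = true;
        }
        bm[sect] &= ~(1ULL << block);                           // This block is now programmed
        return erased;
    }
};
```

Filling all 64 blocks of a 32K sector, in any order, costs a single erase; touching an already-written block forces another.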

Different arrays can be defined, and the code will handle any type, BUT!! if calls to the fill routine deal with different arrays, the bitmaps will reflect the array used when the function was last called, because there is only one bitmap per sector. The bitmap records where the fill function has erased and written; it is not concerned with which array owns the data. It improves efficiency but does not keep state for each possible array, so some inefficiency will occur if successive calls use different arrays. However, as arrays do not overlap, this approach should still offer reasonable efficiency. Go figure ;) It makes my head hurt. The function can handle blocks of data from 512 bytes down to a single element, i.e. 1 byte for uint8_t, 4 bytes for uint32_t, etc. It works entirely at byte level and does not care about the element type, apart from the element size, which must be passed to the function. This is not SRAM, so some thought is needed to minimize how much the flash gets hammered. It's your call how you use it...

Finally, the MBED compiler aligns const arrays in flash only according to their element type. It does not take into account the flash sector structure or the block size used by the IAP write function. It is therefore best to align arrays manually to the 512-byte blocks used by the function when they are declared.

Align arrays to 512 bytes at declaration

const uint16_t sensor_lut[0x10000] __attribute__((aligned (0x200))) = {0};

This ensures that each array starts exactly at the beginning of an IAP block. There is a penalty in terms of packing efficiency, but at most 511 bytes are lost. It is still possible to use the fill function with unaligned arrays. In that case, calling the function with num_write_ele = 0 returns the number of array elements from the start of the array up to the first 512-byte block boundary (a full block's worth if the array is already aligned). Nothing is written to flash during this call, but you can use the return value so that the first real call to the fill function writes only the elements up to the last byte before that boundary. From there on, keep things aligned. Clearly, though, it is much simpler to align the const array at compile time and dispense with this issue.
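A sketch of the alignment arithmetic, assuming a host C++11 compiler: elems_to_boundary() and block_aligned() are hypothetical helpers, the first using the same formula as the probe call, and alignas(0x200) is used here as the standard spelling of the GCC attribute shown above (whether your embedded toolchain accepts it for const data is an assumption to check).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Elements from 'addr' up to the next 512-byte block boundary: the same formula the
// fill function uses for its num_write_ele = 0 probe. For an aligned address it
// yields a whole block (512 / elem_size), never 0.
static std::size_t elems_to_boundary(std::uintptr_t addr, std::size_t elem_size,
                                     std::size_t block = 0x200) {
    return (block - (addr % block)) / elem_size;
}

// Standard C++11 spelling of __attribute__((aligned(0x200))) for a host build.
alignas(0x200) static const std::uint16_t demo_lut[1024] = {0};

// True if 'p' sits exactly at the start of a 512-byte block.
static bool block_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 0x200 == 0;
}
```

For a uint16_t array starting 0x100 bytes into a block, the probe value is 128 elements; for an aligned one it is 256, i.e. one full 512-byte chunk.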

So here are my functions; I hope they help someone else too. I am using them to calculate and pre-fill a look-up table for a 16-bit sensor. No compression is possible (the SNR will not allow it), so a full 128 KB block is needed.

Flash Array Write

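#include "mbed.h"
#include "IAP.h"        // Okano's IAP library (linked above): IAP class, error_code, CMD_SUCCESS, SECTOR_NOT_BLANK
#include <string.h>     // memcpy

IAP iap;                // IAP instance used by FlashWrite_IAP_calls(); define it only once in your program.

// Prototypes for the helpers defined further down. The default arguments allow the short form FlashWrite_IAP_calls('A').
uint32_t addr_to_sect(uint32_t faddr);
uint8_t *sect_to_addr(uint32_t sect);
uint32_t addr_to_block(uint8_t *faddr, uint32_t BLOCK_SIZE);
int FlashWrite_IAP_calls(char type, uint32_t BLOCK_SIZE = 0, uint32_t work_sect = 0, uint8_t *cache = NULL, uint8_t *block_low = NULL, uint8_t *scratch = NULL, uint8_t *sect_low = NULL, uint32_t num_blk_in_sect = 0);

int Write_Flash_Array(uint8_t *flash_array, uint32_t array_size, uint32_t array_elem_size, uint8_t *data, uint32_t elem_offset, uint32_t num_write_ele, bool preserve_sram, bool last_call)
{
    static const uint32_t BLOCK_SIZE = 0x200;
    static uint64_t bmWriteable[30];                    // One bitmap per sector 0..29. Static storage is zero-initialised, so no block starts out "writeable".
    static uint8_t *scratch = (uint8_t*)0x2007C000;     // Pointer to the AHBSRAM0 & AHBSRAM1 scratch area (32K).
    static bool active_scratch = false;                 // True while the AHBSRAM contents are saved in sector 29.
    static uint8_t empty = 0xFF;
    uint32_t len = num_write_ele * array_elem_size;
    uint32_t offset = elem_offset * array_elem_size;
    uint8_t cache[BLOCK_SIZE];
    uint32_t fill = 0;
    uint32_t fetch = 0;
    int status = 0;

    //Sanity checks
    uint32_t align = (BLOCK_SIZE - ((uint32_t)flash_array % BLOCK_SIZE)) / array_elem_size; // Elements up to the next 512-byte block boundary (a full block if already aligned).
    if(len == 0) return align;          // Probe call: nothing is written.
    if(len > BLOCK_SIZE) return -1;
    if(len > array_size) len = array_size;

    //Calculate pointers into the sector affected by the write request.
    uint8_t *block_low = (uint8_t*)((uint32_t)(flash_array + offset) - ((uint32_t)(flash_array + offset) % BLOCK_SIZE));
    uint8_t *write_low = flash_array + offset;
    uint8_t *write_high = flash_array + ((offset + len) - 1);
    uint8_t *flash_array_high = flash_array + (array_size - 1);
    if(write_low > flash_array_high) return -1;                         // Offset lies beyond the end of the array.
    if(flash_array_high < write_high) write_high = flash_array_high;    // Only write into the array's own memory, no matter how much data we are passed.
    if(((uint32_t)write_low / BLOCK_SIZE) != ((uint32_t)write_high / BLOCK_SIZE)) return -1;   // Data must not straddle a 512-byte block boundary.
    uint8_t *work_block = 0;
    uint32_t scratch_offset = 0;
    uint32_t write_block = addr_to_block(flash_array + offset, BLOCK_SIZE);

    uint32_t work_sect = addr_to_sect((uint32_t)block_low);
    uint8_t *sect_low = sect_to_addr(work_sect);

    if(work_sect > 28) {
        if(preserve_sram) return -1;        // Sector 29 holds the AHBSRAM backup, never write the array there.
        else if(work_sect > 29) return -1;  // Don't exceed the Flash size.
    }

    __disable_irq();        // Disable IRQs, sync the instruction pipeline and set a data memory barrier so all explicit memory accesses have finished.
    __ISB();
    __DMB();
    // Only the sector containing the target block is affected.
    if((bmWriteable[work_sect] >> write_block) & 1) {       // Block was erased by us and not written since, so just program it.
        //Prepare block data
        fill = 0;
        fetch = 0;
        for(uint8_t *i = block_low; i < (block_low + BLOCK_SIZE); ++i) {
            if((i < write_low) || (i > write_high)) cache[fill] = 0xFF;      // Bytes not given in data stay 0xFF, so they can still be programmed later without an erase.
            else {
                cache[fill] = data[fetch];
                ++fetch;
            }
            ++fill;
        }
        status = FlashWrite_IAP_calls('B', BLOCK_SIZE, work_sect, &cache[0], block_low, NULL, NULL, 0);
        bmWriteable[work_sect] &= ~((uint64_t)0x1 << write_block);          // Block is now programmed; writing it again needs an erase.
    } else {
        uint32_t num_blk_in_sect = work_sect < 0x10 ? 0x8 : 0x40;           // 4K sectors hold 8 blocks, 32K sectors 64.
        if(preserve_sram && !active_scratch) {
            status = FlashWrite_IAP_calls('A');     // Copy AHBSRAM0 & 1 into sector 29.
            if(status != 0) {                       // AHBSRAM has not been touched yet, so it is safe to bail out.
                __enable_irq();
                return status;
            }
            active_scratch = true;
        }
        memcpy(scratch, sect_low, (num_blk_in_sect * BLOCK_SIZE));   // Cache the whole sector before it is erased.
        fetch = 0;                                                    // Index into data[]: reset once per call, not once per block.
        for(work_block = sect_low; work_block < (sect_low + (num_blk_in_sect * BLOCK_SIZE)); work_block += BLOCK_SIZE) {
            for(uint8_t *i = work_block; i < (work_block + BLOCK_SIZE); ++i) {
                scratch_offset = i - sect_low;
                if((i >= flash_array) && (i <= flash_array_high)) {      // Bytes outside this array keep their cached contents.
                    if((i >= write_low) && (i <= write_high)) {
                        scratch[scratch_offset] = data[fetch];          // New data for the requested range.
                        ++fetch;
                    } else {
                        scratch[scratch_offset] = empty;                // The rest of this array within the sector reverts to 0xFF.
                    }
                }
            }
        }
        // After the rewrite a block is "writeable" only if it is still entirely 0xFF, so blocks holding
        // other data (e.g. another array in the same sector) are never programmed again without an erase.
        bmWriteable[work_sect] = 0;
        for(uint32_t b = 0; b < num_blk_in_sect; ++b) {
            bool blank = true;
            for(uint32_t k = 0; (k < BLOCK_SIZE) && blank; ++k) blank = (scratch[(b * BLOCK_SIZE) + k] == 0xFF);
            if(blank) bmWriteable[work_sect] |= ((uint64_t)0x1 << b);
        }
        bmWriteable[work_sect] &= ~((uint64_t)0x1 << write_block);       // The block just written is never treated as writeable.
        status = FlashWrite_IAP_calls('S', BLOCK_SIZE, work_sect, NULL, NULL, scratch, sect_low, num_blk_in_sect);
    }
    if(preserve_sram && last_call) {
        if(active_scratch) memcpy(scratch, (uint8_t*)0x00078000, 0x8000);  // Copy sector 29 back into AHBSRAM0 & 1, only if it was saved during this sequence.
        active_scratch = false;
        __enable_irq();
    }
    if(!preserve_sram) __enable_irq();
    return status;          // 0 on success, otherwise the error returned by FlashWrite_IAP_calls().
}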

uint32_t addr_to_sect(uint32_t faddr)
{
    uint32_t sect = faddr & 0xFF000;
    if (sect < 0x00010000) sect >>= 12;
    else sect = 0x0F + ((sect >> 15) - 1);
    return sect;
}

uint8_t *sect_to_addr(uint32_t sect)
{
    if(sect < 0x10) return (uint8_t*)(0x0 + (sect * 0x1000));
    else return (uint8_t*)(0x10000 + ((sect - 0x10) * 0x8000));
}

uint32_t addr_to_block(uint8_t *faddr, uint32_t BLOCK_SIZE)
{
    uint32_t sectbase = (uint32_t)faddr;
    if(sectbase < 0x10000) sectbase = sectbase & 0xFF000;
    else sectbase = sectbase & 0xF8000;
    uint32_t block = ((uint32_t)faddr - sectbase) / BLOCK_SIZE;
    return block;
}



int FlashWrite_IAP_calls(char type, uint32_t BLOCK_SIZE, uint32_t work_sect, uint8_t *cache, uint8_t *block_low, uint8_t *scratch, uint8_t *sect_low, uint32_t num_blk_in_sect)
{
    uint32_t sect = 29;
    int iap_status = 0;         // Signed, so every IAP error can be returned as a negative value.
    uint32_t write_block = 0;

    switch(type) {
        case 'A':// Copy AHBSRAM0 & AHBSRAM1 into sector 29 on Flash
            iap_status = (error_code) iap.blank_check(sect, sect);
            if(iap_status == SECTOR_NOT_BLANK) {
                iap_status = (error_code) iap.prepare(sect, sect);
                if (iap_status != CMD_SUCCESS) return -iap_status;
                iap_status = (error_code) iap.erase(sect, sect);
                if (iap_status != CMD_SUCCESS) return -iap_status;
            } else if (iap_status != CMD_SUCCESS) return -iap_status;
            for(int i = 0; i < 0x8000; i+=0x1000) {             // 8 writes of 4096 bytes cover the 32K
                iap_status = (error_code) iap.prepare(sect, sect);
                if (iap_status != CMD_SUCCESS) return -iap_status;
                iap_status = (error_code) iap.write((char*)(0x2007C000 + i), (char*)(0x00078000 + i), 0x1000);
                if (iap_status != CMD_SUCCESS) return -iap_status;
            }
            break;

        case 'B':// Copy cache data to a Flash block (the block must already be erased)
            iap_status = (error_code) iap.prepare(work_sect, work_sect);
            if (iap_status != CMD_SUCCESS) return -iap_status;
            iap_status = (error_code) iap.write((char*)cache, (char*)block_low, BLOCK_SIZE);
            if (iap_status != CMD_SUCCESS) return -iap_status;
            break;

        case 'S':// Erase the sector, then write the scratch copy back in 4096-byte chunks and verify each chunk
            iap_status = (error_code) iap.prepare(work_sect, work_sect);
            if (iap_status != CMD_SUCCESS) return -iap_status;
            iap_status = (error_code) iap.erase(work_sect, work_sect);
            if (iap_status != CMD_SUCCESS) return -iap_status;
            for (write_block = 0; write_block < (num_blk_in_sect * BLOCK_SIZE); write_block += (BLOCK_SIZE * 8)) {
                iap_status = (error_code) iap.prepare(work_sect, work_sect);
                if (iap_status != CMD_SUCCESS) return -iap_status;
                iap_status = (error_code) iap.write((char*)(scratch + write_block), (char*)(sect_low + write_block), (BLOCK_SIZE * 8));
                if (iap_status != CMD_SUCCESS) return -iap_status;
                iap_status = (error_code) iap.compare((char*)(scratch + write_block), (char*)(sect_low + write_block), (BLOCK_SIZE * 8));
                if (iap_status != CMD_SUCCESS) return -iap_status;
            }
            break;
    }
    return 0;
}
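The sector arithmetic can be sanity-checked on a PC before it ever touches flash. This is a sketch: the *_host names are hypothetical copies of addr_to_sect(), sect_to_addr() and addr_to_block() that take plain 32-bit addresses instead of pointers, and the map assumes the 512K LPC1768 layout those helpers encode.

```cpp
#include <cassert>
#include <cstdint>

// Host-side copies of the three helpers above (hypothetical names), using plain
// 32-bit flash addresses. LPC1768 (512K): sectors 0-15 are 4K from 0x00000,
// sectors 16-29 are 32K from 0x10000.
static std::uint32_t addr_to_sect_host(std::uint32_t faddr) {
    std::uint32_t sect = faddr & 0xFF000;
    if (sect < 0x00010000) sect >>= 12;
    else sect = 0x0F + ((sect >> 15) - 1);
    return sect;
}

static std::uint32_t sect_to_addr_host(std::uint32_t sect) {
    return sect < 0x10 ? sect * 0x1000 : 0x10000 + (sect - 0x10) * 0x8000;
}

static std::uint32_t addr_to_block_host(std::uint32_t faddr, std::uint32_t block_size) {
    std::uint32_t base = faddr < 0x10000 ? (faddr & 0xFF000) : (faddr & 0xF8000);
    return (faddr - base) / block_size;
}

// Walk the whole 512K part: each address must fall inside the sector it maps to,
// and its block index must be within that sector (8 or 64 blocks of 512 bytes).
static bool check_sector_map() {
    for (std::uint32_t a = 0; a < 0x80000; a += 4) {
        std::uint32_t s = addr_to_sect_host(a);
        if (s > 29) return false;
        std::uint32_t lo = sect_to_addr_host(s);
        if (a < lo || a >= sect_to_addr_host(s + 1)) return false;
        std::uint32_t blk = addr_to_block_host(a, 0x200);
        if (blk >= (s < 16 ? 8u : 64u)) return false;
        if (a - lo - blk * 0x200 >= 0x200) return false;  // 'a' lies inside block 'blk'
    }
    return true;
}
```

Sector 29 starts at 0x78000, which is where the AHBSRAM backup lands.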


The following is a short example of a call sequence for the fill function. It runs across the const array sensor_lut[] and fills it with data generated into a data[] array of the same type as the flash const array. It takes care of aligning the first call and adapts to a change in array type through simple sizeof() calculations. The whole thing could be much simpler if you know your array types and align at compile time, but the details are there for your delight.

Example call to fill function

void Flash_Write(void)
{
    uint16_t data[0x100];                                             // 512 bytes, same element type as sensor_lut
    const int elements = sizeof(sensor_lut) / sizeof(sensor_lut[0]);
    const int chunk = sizeof(data) / sizeof(data[0]);

    // Probe call (num_write_ele = 0): writes nothing, returns the element count up to the first 512-byte boundary.
    int align = Write_Flash_Array((uint8_t*)sensor_lut, sizeof(sensor_lut), sizeof(sensor_lut[0]), (uint8_t*)data, 0, 0, true, false);

    for(int block = 0; block < elements; ) {
        int block_size = (block == 0) ? align : chunk;
        if(block + block_size > elements) block_size = elements - block;   // Never pass elements beyond the end of the array.
        for(int i = 0; i < block_size; ++i) {
            data[i] = (uint16_t)((block + i) - 0x7FFF);                      // data[0] always corresponds to sensor_lut[block].
        }
        bool final = (block + block_size >= elements);                       // The last call restores AHBSRAM and re-enables IRQs.
        int result = Write_Flash_Array((uint8_t*)sensor_lut, sizeof(sensor_lut), sizeof(sensor_lut[0]), (uint8_t*)data, block, block_size, true, final);
        if (result != 0) {
            pc.printf("\033[3;1H%i", result);
        }
        block += block_size;
    }
}
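One way to check a call sequence before touching flash is to compute it on a PC first. This sketch assumes a hypothetical plan_fill() helper and FillCall type; it reproduces the chunking described for the example (a first partial chunk from the probe value, full chunks, then a clamped final chunk with last_call set) and nothing more.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One planned call to the fill function (hypothetical type): element offset, element count, last_call flag.
struct FillCall {
    std::size_t offset;
    std::size_t count;
    bool last;
};

// Plan the call sequence: 'first' elements (the value returned by the num_write_ele = 0 probe),
// then full chunks, with the final chunk clamped to the end of the array.
// Only computes the schedule; nothing is written.
static std::vector<FillCall> plan_fill(std::size_t elements, std::size_t first,
                                       std::size_t chunk) {
    std::vector<FillCall> calls;
    if (first == 0 || chunk == 0) return calls;      // Would never advance
    std::size_t pos = 0;
    while (pos < elements) {
        std::size_t n = (pos == 0) ? first : chunk;
        if (n > elements - pos) n = elements - pos;  // Never run past the array end
        calls.push_back({pos, n, pos + n >= elements});
        pos += n;
    }
    return calls;
}
```

For the 0x10000-entry uint16_t table this gives 256 calls when the array is aligned and 257 when it starts 0x100 bytes into a block, with last_call set exactly once and no call straddling a 512-byte block.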

So, finally, I have normally defined const arrays compiled as standard by the MBED compiler, SRAM and FLASH usage are reported correctly, and there are no issues with memory allocation.

I hope someone else finds this useful and, as always, use with care. I have tested it, but by no means exhaustively. Keep your fly swatter handy :)

