Load/Store

Memory Types

Almost all modern microprocessors can access two types of memory.  The first type is a non-volatile memory that stores the machine instructions used to implement an embedded application.  In addition to the machine instructions, this area also contains constants, such as strings, that are used by the application.  We will refer to this type of memory as FLASH.  For our purposes, FLASH is modified only when we program the board via the JTAG interface; the application itself does not write to it.  The other important characteristic of FLASH is that it is non-volatile, which means that it retains its contents when power is removed from the microprocessor.  As a result, our application does not need to re-initialize FLASH.  Once FLASH is programmed via JTAG, it remains unchanged until the next time the microprocessor is programmed.  In some situations FLASH can be updated by an application, but that is beyond the scope of this class.

The second memory type we will look at is SRAM (Static Random Access Memory).  SRAM is a volatile memory that is used to store values computed by the application.  A key difference between SRAM and FLASH is that SRAM does not retain its contents when power is removed.  When the microprocessor is powered on, the contents of SRAM are uninitialized, so the application must initialize any variable that is stored in SRAM.  Most variables in a high level language fall into this category.

Memory Addressing Modes

Microprocessors can access memory using several different memory addressing modes.  Here are a few of the more common addressing modes:

Direct Addressing                LOAD R0, 0xFF00FF00
The operand address is encoded into the instruction.  In variable length instructions, the full physical address can usually be encoded.
Register Indirect Addressing     LOAD R0, [R1]
The instruction specifies a register that contains the memory address to access.
Indexed Addressing               LOAD R0, [R1], R4
The address is calculated from a constant base address and the contents of a register.
Based Addressing                 LOAD R0, [R1], #64
The address is calculated from a base address contained in a register, plus a constant offset encoded in the instruction.
PC-Relative Addressing           LOAD R0, [PC], #64
The address is computed by adding an offset value encoded in the instruction to the current value of the program counter.

A given processor architecture may support only a few of these memory addressing modes.  The ARM architecture is a RISC architecture whose instructions are at most 32 bits wide, so it does not support direct addressing (a 32-bit address would consume the entire instruction, leaving no room for the opcode).  As a result, the Cortex-M architecture supports register indirect, PC-relative, and based addressing modes.

Memory Map

The Cortex-M architecture has a 32-bit address bus.  32 address bits allow 4,294,967,296 address locations (2^32).  The roughly 4 billion addresses make up what is called a memory map.   On a microcontroller, only a small percentage of those 4 billion address locations will be usable by the application.  The MCU used in our class has only 32KBytes of SRAM and 256KBytes of FLASH.  In order to properly access the SRAM and FLASH, we need to know what addresses the FLASH and SRAM are mapped to.

 Description   Address Range
 SRAM          0x2000.0000 - 0x2000.7FFF
 Reserved      0x0004.0000 - 0x1FFF.FFFF
 FLASH         0x0000.0000 - 0x0003.FFFF
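To make the table concrete, here is a minimal sketch in C (chosen so that it can be compiled and run on a desktop machine) that classifies an address using the ranges listed above.  The names region_t and classify_address are hypothetical and exist only for this illustration; they are not part of any vendor library.

```c
#include <stdint.h>

/* Regions taken from the memory map table above. */
typedef enum { REGION_FLASH, REGION_SRAM, REGION_OTHER } region_t;

/* Hypothetical helper: report which region a 32-bit address falls in. */
region_t classify_address(uint32_t addr)
{
    if (addr <= 0x0003FFFFu) {
        return REGION_FLASH;                 /* 0x40000 bytes = 256 KB */
    }
    if (addr >= 0x20000000u && addr <= 0x20007FFFu) {
        return REGION_SRAM;                  /* 0x8000 bytes = 32 KB */
    }
    return REGION_OTHER;                     /* reserved or not covered by the table */
}
```

For example, classify_address(0x20000004) reports SRAM, while classify_address(0x00040000) falls in the reserved range and reports REGION_OTHER.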

Allocating Memory

Before we access data in the FLASH or SRAM, we need to allocate the memory in the assembler.  The type of memory allocation shown below is static memory allocation.  Static memory allocation reserves a predetermined amount of memory at compile time.  We simply tell the assembler how much memory we want and what type of memory we want to allocate.  Static allocation is what a high level language uses to allocate global variables.  Once a memory location is statically allocated, it cannot be used for any other purpose in the application.  This differs from dynamic memory allocation, which is a run time memory allocation scheme.  In dynamic memory allocation, memory is allocated based on need.  When the application is done with a segment of SRAM, the SRAM is returned to the free pool and can be allocated for some other purpose.  This is a more flexible memory allocation scheme, but it is also more complicated.  Dynamic memory allocation will be covered in more detail at a later time.

Allocating Application Storage

In order for an application to be run on the MCU, space needs to be reserved in the FLASH for the machine instructions that comprise the application.  The example code below can be used to define the main application code for the Keil assembler.  Take a look at the comments below for a more detailed explanation of what each line does.

    export __main                          ;(1) 

    AREA    |.text|, CODE, READONLY        ;(2)    
    align

;**********************
; main assembly program
;**********************
__main PROC                                ;(3)

    ; Instructions go here

    align

    ENDP                                  ;(4)
    END                                   ;(5)

 

  1. The EXPORT directive exports the symbol __main to the linker.  The symbol represents the address where the main routine starts.  This allows other files to import the __main symbol and branch to it.
  2. Indicates to the assembler that the resulting instructions will be located in the FLASH or CODE section of the memory map.  The specific section is given the name .text
  3. A label used to determine the address of __main.  The PROC directive indicates the machine instructions are part of a procedure.  Without the PROC directive, you would not be able to debug the procedure in the debugger.
  4. Ends the procedure
  5. Ends the file

Constant Variables

In addition to allocating space in the FLASH for the application, we can also allocate constant variables.  A constant variable is a variable whose value is known at compile time and cannot be modified while the application is running.  One of the most common types of constant variable is a string constant that is displayed to the user.  Examine the examples below to see how to allocate constants in uVision.

;**********************************************
; Constant Variables (FLASH) Segment
;**********************************************
    AREA    |.text|, CODE, READONLY
CONST_WORD      DCD     0xDEADBEEF          ;(1)
HWORD_CONST     DCW     0xABCD              ;(2)
BYTE_CONST      DCB     0xAB                ;(3)
STRING_CONST    DCB     "Hello ECE353"      ;(4)
    align

;**********************************************
; Code (FLASH) Segment
;**********************************************
ece353_main PROC
    B ece353_main
    align

    ENDP
    END

 

  1. DCD is used to allocate 32-bits (4 bytes) of space and initializes the value to 0xDEADBEEF
  2. DCW is used to allocate 16-bits (2 bytes) of space and initializes the value to 0xABCD
  3. DCB is used to allocate 8-bits   (1 byte) of space and initializes the value to 0xAB
  4. Allocates an array of bytes (12) and initializes the contents to be “Hello ECE353”.  The assembler does not add a terminating null byte.

Loading Addresses

The two most common instructions used to load an address into a register are shown below:

(1) Normally used to access a label in FLASH
ADR      R0, CONST_WORD

(2) Normally used to access a label in SRAM          
LDR      R0, =(CONST_WORD)

 

  1. ADR is used to generate a PC-relative address for a label.  The label must be within -4095 to +4095 bytes of the current PC.  If you are running the application out of FLASH, you cannot use this instruction to access a label in SRAM, because SRAM is more than 4095 bytes away from FLASH in the memory map.
  2. If you need to access a label in SRAM, you can use this version of the LDR instruction.  This is a pseudo-instruction that creates a hidden read-only constant in FLASH (in a literal pool).  The assembler sets the value of the hidden constant to the address of the desired label.  The hidden constant is then loaded using a PC-relative LDR instruction, so it must be located within 4095 bytes of the program counter.
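The range rule in item 1 can be sketched as a small C check.  This is a simplified model that uses only the ±4095 figure quoted above; the real assembler also accounts for how the PC value is aligned and which instruction encoding is chosen.  The function name adr_can_reach is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest offset, in bytes, quoted in the text for ADR. */
#define ADR_MAX_OFFSET 4095

/* Hypothetical check: can a PC-relative ADR at 'pc' reach 'label_addr'?
 * Simplified model: ignores PC alignment and encoding selection. */
bool adr_can_reach(uint32_t pc, uint32_t label_addr)
{
    /* Use a wider signed type so the subtraction cannot wrap. */
    int64_t offset = (int64_t)label_addr - (int64_t)pc;
    return offset >= -ADR_MAX_OFFSET && offset <= ADR_MAX_OFFSET;
}
```

With code executing at 0x00000200, a constant at 0x00000100 is reachable, but any SRAM address (0x20000000 and up) is not, which is why the LDR pseudo-instruction is used for SRAM labels.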

Loading Data

The MCU executes an LDR (Load Register) instruction to read data in either FLASH or SRAM.  The LDR instruction requires that we supply an address as the second operand.  So how do we know what address to use?  The answer is that we use the label supplied for the given piece of data.  The following instruction will load the 4-byte data found at CONST_WORD into R1.  This is the PC-relative version of the LDR instruction.  Note that the data at CONST_WORD is loaded into R1, not the address of CONST_WORD.

LDR      R1, CONST_WORD

If you need to load a 16-bit data value into a register, you will need to use LDRH.  If you wish to load an 8-bit data value into a register, you will need to use LDRB.

LDRH      R1, HWORD_CONST
LDRB      R2, BYTE_CONST

What happens to the upper 16 bits of R1 after the LDRH instruction?  They are set to 0.  Likewise, the upper 24 bits of R2 are set to 0 after the LDRB instruction.

But what if the 8-bit value in BYTE_CONST was a negative number?  In this situation, we need to sign extend the 8-bit value to the entire 32-bit register.  A number is sign extended by taking the sign bit (bit 7 in this case) and replicating its value in all the bit positions above it (bits 31 through 8).  We can do this by adding an S to the load mnemonic (LDRSB for bytes, LDRSH for half words).

Sign extending an 8 or 16-bit value from SRAM or FLASH into a register DOES NOT set any of the flags in the APSR.  Only an ALU-type instruction or a MOV instruction can do that.

LDRSB   R2, BYTE_CONST  ; Sign Extend the 8-bit value to 32-bits
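The difference between the zero-extending and sign-extending loads can be modelled in a few lines of C.  This is only a sketch of the arithmetic, not of the instructions themselves, and the function names are hypothetical.

```c
#include <stdint.h>

/* Models LDRB: upper 24 bits of the result are cleared. */
uint32_t zero_extend8(uint8_t b)
{
    return (uint32_t)b;
}

/* Models LDRSB: bit 7 is copied into bits 31 through 8. */
uint32_t sign_extend8(uint8_t b)
{
    uint32_t r = b;
    if (b & 0x80u) {
        r |= 0xFFFFFF00u;
    }
    return r;
}

/* Models LDRSH: bit 15 is copied into bits 31 through 16. */
uint32_t sign_extend16(uint16_t h)
{
    uint32_t r = h;
    if (h & 0x8000u) {
        r |= 0xFFFF0000u;
    }
    return r;
}
```

For the byte 0xAB used earlier, zero_extend8 gives 0x000000AB while sign_extend8 gives 0xFFFFFFAB.  A byte with bit 7 clear, such as 0x7F, is unchanged by either function.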

The ARM architecture supports load instructions with the following address modes.

; Register Indirect
;  R0 <- MEM[R1]
LDR  R0, [R1]

; Register Indirect with pre-indexed immediate offset
;  R0 <- MEM[R1+4]
LDR  R0, [R1, #4]

; Register Indirect with pre-indexed register offset
;  R0 <- MEM[R1+R2]
LDR  R0, [R1, R2]

; Register Indirect with pre-index, R1 updated
;  R0 <- MEM[R1+4], R1 <- R1 + 4
LDR  R0, [R1, #4]!

; Register Indirect post-indexed
;  R0 <- MEM[R1], R1 <- R1 + 4
LDR  R0, [R1], #4

; PC relative
;  R0 <- MEM[PC ± Offset]
LDR  R0, label_1

; Pseudo Instruction, literal insertion
; Assembler creates a hidden constant in the
; literal pool with the correct value
;  R0 <- ADDR of label_1, not the value at label_1
LDR  R0, =(label_1)
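The effect of the offset, pre-index, and post-index forms on the base register can be traced with a toy model written in C.  Everything here (cpu_t, the 16-word memory, the function names) is a hypothetical simplification for illustration; it only tracks which address is read and whether the base register is written back.

```c
#include <stdint.h>

#define MEM_WORDS 16u

/* Toy machine: four registers and a tiny word-organised memory. */
typedef struct {
    uint32_t r[4];            /* R0..R3 */
    uint32_t mem[MEM_WORDS];  /* mem[i] holds the word at byte address 4*i */
} cpu_t;

static uint32_t read_word(const cpu_t *c, uint32_t addr)
{
    return c->mem[(addr / 4u) % MEM_WORDS];
}

/* LDR Rt, [Rn, #imm]  : address = Rn + imm, Rn unchanged */
void ldr_offset(cpu_t *c, int rt, int rn, uint32_t imm)
{
    c->r[rt] = read_word(c, c->r[rn] + imm);
}

/* LDR Rt, [Rn, #imm]! : address = Rn + imm, then Rn = Rn + imm */
void ldr_pre(cpu_t *c, int rt, int rn, uint32_t imm)
{
    uint32_t addr = c->r[rn] + imm;
    c->r[rn] = addr;
    c->r[rt] = read_word(c, addr);
}

/* LDR Rt, [Rn], #imm  : address = Rn, then Rn = Rn + imm */
void ldr_post(cpu_t *c, int rt, int rn, uint32_t imm)
{
    uint32_t addr = c->r[rn];
    c->r[rn] = addr + imm;
    c->r[rt] = read_word(c, addr);
}
```

Starting from R1 = 8 with an offset of 4, ldr_offset and ldr_pre both read the word at address 12, but only ldr_pre leaves R1 equal to 12.  ldr_post reads the word at address 8 and then advances R1 to 12.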

 

Allocating Global Variables

When an application must write to a variable, that variable must be located in SRAM.  Functionally, SRAM differs from FLASH because we can modify the contents of SRAM.  It is important to note that SRAM cannot be initialized at compile time.  Unlike FLASH, which is non-volatile storage, SRAM is a volatile memory, so the ARM assembler does not generate an ‘SRAM image’.  Generating an ‘SRAM image’ would be pointless: once power is removed from the processor, the initial state of any variables in SRAM would be lost, and the application would not function properly the next time the processor was powered on.

Instead, when the microprocessor is powered on, the application begins to execute from the non-volatile application image stored in FLASH.  The application code is required to have an initialization routine that executes MOV and STR instructions to initialize the variables allocated in SRAM.

;****************************
; SRAM
;****************************
    AREA    SRAM, READWRITE      ; (1)
BYTE_DATA  DCB    0xAB           ; (2)
BYTE_ARRAY SPACE  10*1           ; (3)
WORD_ARRAY SPACE  10*4           ; (4)
    align

  1. Directive indicating that the following data allocations take place in SRAM and can be written and read.
  2. Creates a label called BYTE_DATA in SRAM for 1 byte of data.  The initialization has no effect.  The contents of SRAM need to be set via STR instructions.
  3. Allocates 10 bytes of data. The label BYTE_ARRAY is used to access the beginning of the array.
  4. Allocates 10 words of data.  The label WORD_ARRAY is used to access the beginning of the array.

Writing to SRAM

When the MCU wants to write to SRAM, it must issue an STR instruction.  The STR instruction works in a similar way to the LDR instruction.  It can store 32, 16, or 8 bits from a general purpose register into an address in SRAM using a 32-bit base address.  Why does the Cortex-M architecture support 8 and 16-bit stores into SRAM?  The answer is data density.  Microcontrollers typically have a limited amount of SRAM.  If data can be represented using fewer bits, the programmer can make better use of the SRAM that is available.  For example, the analog to digital converter on the Tiva Launchpad is a 12-bit converter.  This means that every data sample we take is only 12 bits.  If we store this data as a WORD (32-bits), 20 of the 32 bits will be wasted.  A better approach would be to use half words (16-bits) to store the data.  This would allow twice the number of measurements to be stored in the same amount of SRAM as a 32-bit store.
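The data density argument is simple arithmetic, sketched here in C.  The 1024-byte buffer size in the usage note is an arbitrary assumption picked for the example, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of whole samples that fit in a buffer of sram_bytes bytes. */
size_t samples_that_fit(size_t sram_bytes, size_t bytes_per_sample)
{
    if (bytes_per_sample == 0) {
        return 0;   /* avoid dividing by zero */
    }
    return sram_bytes / bytes_per_sample;
}

/* Bits left unused when a sample is stored in a wider container. */
unsigned wasted_bits(unsigned container_bits, unsigned sample_bits)
{
    return container_bits - sample_bits;
}
```

A 1024-byte buffer holds 256 samples when each 12-bit sample is stored as a word and 512 samples when each is stored as a half word.  The wasted space per sample drops from 20 bits to 4 bits.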

The examples below show how to write 32, 16, and 8 bits of data to SRAM.

MOV       R0, #0x20000000

; Stores a 32-bit Value at location 0x20000000
STR       R1, [R0]

; Stores a 16-bit value at location 0x20000004
STRH      R1, [R0, #4] 

; Stores a  8-bit value at location 0x20000006     
STRB      R2, [R0, #6]

The Cortex-M architecture supports the following store instructions:

; Register Indirect
; MEM[R1] <- R0
STR  R0, [R1]

; Register Indirect with pre-indexed
; MEM[R1+4] <- R0
STR  R0, [R1, #4]

; Register Indirect with pre-indexed
; MEM[R1+R0] <- R0
STR R0, [R1, R0]

; Register Indirect with pre-index, R1 updated
; MEM[R1+4] <- R0, R1 <- R1 + 4
STR  R0, [R1, #4]!

; Register Indirect post-indexed
; MEM[R1] <- R0, R1 <- R1 + 4
STR  R0, [R1], #4

Endianness

When you begin to examine the data in SRAM, you may be surprised by the order in which the data is stored in memory.  Data can be stored in one of two ways: big endian or little endian.  A big endian system stores the most significant byte in the smallest address.  A little endian system stores the least significant byte in the smallest address.  The TM4C123 uses little endian by default.
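The two byte orders can be compared with a short C sketch that writes the same 32-bit value into a byte array both ways.  The function names are hypothetical; the little endian version models the byte order that a word store produces on the TM4C123.

```c
#include <stdint.h>

/* Little endian: least significant byte at the smallest address. */
void store_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xFFu);
    dst[1] = (uint8_t)((v >> 8) & 0xFFu);
    dst[2] = (uint8_t)((v >> 16) & 0xFFu);
    dst[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Big endian: most significant byte at the smallest address. */
void store_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)((v >> 24) & 0xFFu);
    dst[1] = (uint8_t)((v >> 16) & 0xFFu);
    dst[2] = (uint8_t)((v >> 8) & 0xFFu);
    dst[3] = (uint8_t)(v & 0xFFu);
}
```

Storing 0xDEADBEEF in little endian order places the bytes EF, BE, AD, DE at increasing addresses, which is the order a memory window would show them in.  In big endian order the bytes appear as DE, AD, BE, EF.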

Examples

Allocating an Array in FLASH

;**********************************************
; Constant Variables (FLASH) Segment
;**********************************************
    AREA    |.text|, CODE, READONLY

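; Allocate an array of 8 bytes (the read, sum, and copy examples
; that follow all access 8 bytes starting at BYTE_ARRAY)
BYTE_ARRAY   DCB 0
             DCB 1
             DCB 2
             DCB 3
             DCB 4
             DCB 5
             DCB 6
             DCB 7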
    align

Allocating Array of WORDs in SRAM

Note that the values stored in the array are unknown at reset.

; This is a symbolic constant.  It does not allocate any space in the 
; application.  The assembler replaces WORD with '4' during assembly. 
WORD   EQU  4

;****************************
; SRAM
;****************************
    AREA    SRAM, READWRITE      
WORD_ARRAY SPACE  10*WORD           
    align

Reading Array – Pre-Indexed

    ; Load the contents of the byte array into R1-R8.
    ; Values treated as unsigned
    ; Register Indirect with pre-indexed
    ADR     R0, BYTE_ARRAY
    LDRB    R1, [R0, #0]
    LDRB    R2, [R0, #1]
    LDRB    R3, [R0, #2]
    LDRB    R4, [R0, #3]
    LDRB    R5, [R0, #4]
    LDRB    R6, [R0, #5]
    LDRB    R7, [R0, #6]
    LDRB    R8, [R0, #7]

Reading Array – Post-Indexed

    ; Load the contents of the byte array into R1-R8.
    ; Values treated as unsigned
    ; Register Indirect post-indexed
    ADR     R0, BYTE_ARRAY
    LDRB    R1, [R0], #1
    LDRB    R2, [R0], #1
    LDRB    R3, [R0], #1
    LDRB    R4, [R0], #1
    LDRB    R5, [R0], #1
    LDRB    R6, [R0], #1
    LDRB    R7, [R0], #1
    LDRB    R8, [R0], #1

Summing Array – Pre-Indexed

    ; Sum the contents of BYTE_ARRAY
    ; using a FOR loop
    ADR    R0, BYTE_ARRAY
    MOV    R1, #0    ; Initialize index
    MOV    R2, #0    ; Initialize Array Sum

FOR_START
    CMP     R1, #8
    BEQ     FOR_END
    LDRB    R3, [R0, R1]
    ADD     R2, R2, R3
    ADD     R1, R1, #1
    B       FOR_START
FOR_END

Summing Array – Post-Indexed

    ; Sum the contents of BYTE_ARRAY
    ; using a FOR loop
    ADR    R0, BYTE_ARRAY
    MOV    R1, #0    ; Initialize loop counter
    MOV    R2, #0    ; Initialize Array Sum

FOR_START2
    CMP     R1, #8
    BEQ     FOR_END2
    LDRB    R3, [R0], #1
    ADD     R2, R2, R3
    ADD     R1, R1, #1
    B       FOR_START2
FOR_END2
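For comparison, the two summing loops above correspond roughly to the following C functions.  This is a sketch for readers who think in a high level language first; the function names are hypothetical, and a compiler is free to generate different instructions than the hand-written assembly.

```c
#include <stdint.h>

/* Like the pre-indexed loop: base stays fixed, index i is added each pass. */
uint32_t sum_indexed(const uint8_t *base, uint32_t count)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i != count; i++) {
        sum += base[i];          /* LDRB R3, [R0, R1] */
    }
    return sum;
}

/* Like the post-indexed loop: the pointer itself advances after each load. */
uint32_t sum_post_increment(const uint8_t *p, uint32_t count)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i != count; i++) {
        sum += *p++;             /* LDRB R3, [R0], #1 */
    }
    return sum;
}
```

Both functions return the same total.  The difference is only in whether the base address is preserved (indexed) or consumed (post-increment) by the loop.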

Array Copy

    ; Copy 8 bytes of data from BYTE_ARRAY to
    ; SRAM_ARRAY_1 (assumed to be allocated in SRAM)
    ADR    R0, BYTE_ARRAY       ; SRC Address
    LDR    R1,=(SRAM_ARRAY_1)   ; DEST Address

    LDRB    R3, [R0], #1         ; Load  1 byte
    STRB    R3, [R1], #1         ; Store 1 byte
    LDRB    R3, [R0], #1         ; Load  1 byte
    STRB    R3, [R1], #1         ; Store 1 byte
    LDRB    R3, [R0], #1         ; Load  1 byte
    STRB    R3, [R1], #1         ; Store 1 byte
    LDRB    R3, [R0], #1         ; Load  1 byte
    STRB    R3, [R1], #1         ; Store 1 byte
    LDRB    R3, [R0], #1         ; Load  1 byte
    STRB    R3, [R1], #1         ; Store 1 byte
    LDRB    R3, [R0], #1         ; Load  1 byte
    STRB    R3, [R1], #1         ; Store 1 byte
    LDRB    R3, [R0], #1         ; Load  1 byte
    STRB    R3, [R1], #1         ; Store 1 byte
    LDRB    R3, [R0], #1         ; Load  1 byte
    STRB    R3, [R1], #1         ; Store 1 byte

    ; Copy 8 bytes of data from BYTE_ARRAY to
    ; SRAM_ARRAY_2 much more efficiently, 4 bytes at a time
    ADR    R0, BYTE_ARRAY       ; SRC Address
    LDR    R1,=(SRAM_ARRAY_2)   ; DEST Address

    LDR    R3, [R0, #0]         ; Load  4 Bytes
    STR    R3, [R1, #0]         ; Store 4 Bytes
    LDR    R3, [R0, #4]         ; Load  4 Bytes
    STR    R3, [R1, #4]         ; Store 4 Bytes
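The saving from the word-sized copy can be counted with a small C model of the two sequences above.  Each function returns the number of load/store pairs it performed.  The function names are hypothetical, and memcpy is used for the 4-byte transfers so that the sketch is valid C regardless of how the arrays are aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy, like the LDRB/STRB sequence. */
size_t copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t transfers = 0;
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
        transfers++;
    }
    return transfers;
}

/* Word-at-a-time copy, like the LDR/STR sequence.
 * Copies only whole 4-byte words; n is expected to be a multiple of 4. */
size_t copy_words(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t transfers = 0;
    for (size_t i = 0; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, sizeof w);   /* load 4 bytes  */
        memcpy(dst + i, &w, sizeof w);   /* store 4 bytes */
        transfers++;
    }
    return transfers;
}
```

Copying 8 bytes takes 8 load/store pairs with copy_bytes and 2 with copy_words, and both leave identical data in the destination.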