Overview
The stack is an area of SRAM that is used to temporarily store the contents of general purpose registers. A register is saved to the stack using a PUSH operation and restored from the stack using a POP operation. In addition to saving and restoring general purpose registers, we will also use the stack to allocate local variables and pass parameters to a function.
Software Based Stack
The Cortex-M architecture uses a software based stack. During the initialization of the microprocessor, a section of SRAM is reserved for the stack. The starting address of the stack is then loaded into R13 (SP is a synonym for R13 in the Keil uVision IDE). The stack pointer is used to keep track of where new data will be added to the stack, or where already saved data will be restored from.
The advantage of a software based stack is that it provides flexibility in how the SRAM is allocated. Many MCUs have less than 128 KB of SRAM, so SRAM is a valuable commodity that we want to allocate judiciously. If an application heavily utilizes the stack, the initialization code can be modified to reserve additional SRAM for the stack. If an application uses very little stack space, the initialization code can be modified to reduce the amount of SRAM dedicated to the stack.
Some MCUs implement their stack using dedicated hardware that resembles a small SRAM used only for stack operations. The primary benefit of a stack implemented in hardware is performance. The primary drawback of a hardware based stack is that you are not able to adjust the size of the stack.
Stack Organization
When a register is pushed on the stack, it can be saved to the largest address in the stack or the smallest address. When registers are saved to the largest free memory address in the stack, the stack is considered to be a descending stack. As data is added to the stack, the value of the stack pointer is decremented by 4. As data is removed from the stack, the stack pointer is incremented by 4.
If a register is pushed to the smallest free memory address in the stack, the stack is considered to be an ascending stack. As data is added to the stack, the value of the stack pointer is incremented by 4. As data is removed from the stack, the stack pointer is decremented by 4.
A stack can also be classified as Full or Empty. A Full stack implies that the address in the stack pointer holds a saved register value. As a result, the stack pointer is adjusted to point to the next free address in the stack before new data is added. An Empty stack implies that the address in the stack pointer does NOT hold valid data. In this situation, a register is written to the stack and then the stack pointer is adjusted to point to the next available address in the stack.
A stack can be either Ascending or Descending, and either Full or Empty. This leads to four possible stack organizations: Full Descending, Full Ascending, Empty Descending, and Empty Ascending. The image below shows how each stack would be initialized if the stack started at 0x2000.0000 and contained 8 entries. The value that SP would be initialized to is shown in parentheses.
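As a rough sketch of the four organizations, the C model below computes the initial stack pointer and the address of the first pushed word for a stack of 32-bit entries. The names (`stack_kind_t`, `initial_sp`, `push_address`) are hypothetical and not part of any toolchain. It assumes that `base` is the lowest address of the reserved region, which is only one way to read "the stack started at 0x2000.0000", so the values may differ from a figure drawn with another convention.

```c
#include <stdint.h>

/* Hypothetical model of the four stack organizations; not a vendor API. */
typedef enum {
    FULL_DESCENDING,
    EMPTY_DESCENDING,
    FULL_ASCENDING,
    EMPTY_ASCENDING
} stack_kind_t;

/* Initial SP for a region of `entries` 32-bit words whose lowest
   address is `base` (assumption: the region grows upward from base). */
uint32_t initial_sp(stack_kind_t kind, uint32_t base, uint32_t entries)
{
    uint32_t top = base + 4u * entries;       /* first address past the region */

    switch (kind) {
    case FULL_DESCENDING:  return top;        /* decrement SP, then store */
    case EMPTY_DESCENDING: return top - 4u;   /* store, then decrement SP */
    case FULL_ASCENDING:   return base - 4u;  /* increment SP, then store */
    case EMPTY_ASCENDING:  return base;       /* store, then increment SP */
    }
    return 0u;                                /* not reached for valid kinds */
}

/* Address the next pushed word is written to, given the current SP. */
uint32_t push_address(stack_kind_t kind, uint32_t sp)
{
    switch (kind) {
    case FULL_DESCENDING:  return sp - 4u;
    case EMPTY_DESCENDING: return sp;
    case FULL_ASCENDING:   return sp + 4u;
    case EMPTY_ASCENDING:  return sp;
    }
    return 0u;                                /* not reached for valid kinds */
}
```

Under these assumptions, an 8-entry Full Descending stack based at 0x20000000 starts with SP one word past the region, and the first push lands in the highest word of the region.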
Stack Initialization
The Cortex-M architecture uses a Full Descending stack. The size of the stack and the address it starts at are set in startup_TM4C123.s. The portion of the code that initializes the stack is shown below.
; <h> Stack Configuration
;   <o> Stack Size (in Bytes) <0x0-0xFFFFFFFF:8>
; </h>

Stack_Size      EQU     0x00000400

                AREA    STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem       SPACE   Stack_Size
__initial_sp
You can set the stack size by modifying the constant Stack_Size. The code above sets the stack to 1 KB (0x400 bytes). Since the MCU uses a full descending stack, a label (__initial_sp) is placed at the memory address immediately after the SRAM reserved for the stack.
A good question is, where is the stack located in SRAM? The Keil uVision assembler places the stack after the last global variable that is allocated in SRAM. The figure below shows an example of how SRAM is partitioned by the assembler.
Global Memory Pool
The global data pool consists of any global variables that were allocated by the compiler or assembler. In C, any variable that is declared outside of a function is added to the global memory pool. Below is an example of how to reserve 1024 bytes of global data.
;**********************************************
; SRAM
;**********************************************
                AREA    SRAM, READWRITE
GLOB_DATA       SPACE   1024
                ALIGN
The global data pool takes up 1 KB and the stack size was set to 1 KB (see Stack_Size above). The rest of SRAM is then reserved for the heap. The heap is used for dynamic memory allocation (malloc), which will be covered later in class.
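To make the arithmetic concrete, the sketch below computes where each region would start under the partitioning described above. The SRAM base address and total size passed in are assumptions you would take from your part's datasheet, and `sram_layout_t` and `compute_layout` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical description of the SRAM partitioning; values are illustrative. */
typedef struct {
    uint32_t globals_start;  /* first byte of the global data pool */
    uint32_t stack_limit;    /* lowest address reserved for the stack (Stack_Mem) */
    uint32_t initial_sp;     /* address just past the stack (__initial_sp) */
    uint32_t heap_bytes;     /* SRAM left over after globals and stack */
} sram_layout_t;

sram_layout_t compute_layout(uint32_t sram_base, uint32_t sram_bytes,
                             uint32_t global_bytes, uint32_t stack_bytes)
{
    sram_layout_t layout;

    layout.globals_start = sram_base;
    /* ALIGN=3 on the STACK area requests 2^3 = 8-byte alignment, so round up. */
    layout.stack_limit = (sram_base + global_bytes + 7u) & ~7u;
    layout.initial_sp  = layout.stack_limit + stack_bytes;
    layout.heap_bytes  = (sram_base + sram_bytes) - layout.initial_sp;
    return layout;
}
```

For example, assuming SRAM starts at 0x20000000 and is 32 KB in size, 1 KB of globals and a 1 KB stack would put Stack_Mem at 0x20000400 and __initial_sp at 0x20000800, leaving 30 KB for the heap.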
It's important to understand how the code you write affects the way memory is allocated by the assembler. Allocating a large number of global variables could reduce the amount of SRAM available to the stack and/or the heap. Not allocating enough SRAM for the stack can also lead to the stack clobbering global variables, commonly called a stack overflow.
Practical Engineering Note: In systems with very little memory, stack overflows can be a common occurrence. If your system begins to fail in a strange way for what seems to be an unexplained reason, increase the size of your stack and see if the problem goes away. As an application grows in complexity, stack usage will go up and the size you chose at the beginning of the project can become too small.
Using the Stack
Saving Register Contents
Perhaps the most common use of the stack is to temporarily save the contents of a general purpose register. The example below shows how a function saves the registers that it would otherwise overwrite. To save a register, we use the PUSH operation. The POP operation is used to restore a register. PUSH is a synonym for STMDB (Store Multiple, Decrement Before) where the base register is the stack pointer (SP). POP is a synonym for LDMIA (Load Multiple, Increment After) where the base register is the stack pointer.
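The decrement-before and increment-after behavior can be sketched with a small C model of a Full Descending stack. This is an illustrative model only: `fd_stack_t` and the `fd_*` functions are hypothetical names, and the stack pointer is represented as a word index into an array rather than a byte address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64u

/* Hypothetical model of a Full Descending stack. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;   /* index of the last pushed word; STACK_WORDS when empty */
} fd_stack_t;

void fd_init(fd_stack_t *s)
{
    s->sp = STACK_WORDS;
}

/* Models STMDB with SP as the base: SP is lowered by the whole list first,
   then the registers are stored upward so that the lowest-numbered
   register lands at the lowest address. */
void fd_push_multiple(fd_stack_t *s, const uint32_t *regs, uint32_t count)
{
    assert(count <= s->sp);                 /* otherwise the stack overflows */
    s->sp -= count;
    for (uint32_t i = 0; i < count; i++) {
        s->mem[s->sp + i] = regs[i];
    }
}

/* Models LDMIA with SP as the base: registers are loaded upward from SP,
   then SP is raised past them. */
void fd_pop_multiple(fd_stack_t *s, uint32_t *regs, uint32_t count)
{
    assert(s->sp + count <= STACK_WORDS);   /* otherwise the stack underflows */
    for (uint32_t i = 0; i < count; i++) {
        regs[i] = s->mem[s->sp + i];
    }
    s->sp += count;
}
```

Pushing four values and then popping four values returns them in the same order and leaves `sp` where it started, which is the property the examples that follow rely on.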
Example: Callee Saved
This example demonstrates a callee saved routine. In a callee saved routine, the function is responsible for saving any registers that it modifies. The idea here is that the function designer knows exactly what registers are modified by the function and can save only those registers to the stack.
;**********************************************
; function add4
; Sums four WORDs of data located in adjacent
; memory locations.
;
; INPUT:
;   R0: Base Address
; OUTPUT:
;   R1: Total
;
; NOTE: Not an EABI compliant routine!
;**********************************************
add4    PROC
        PUSH    {R5-R8}         ; Save to stack
        LDM     R0, {R5-R8}
        ADD     R5, R5, R6
        ADD     R5, R5, R7
        ADD     R1, R5, R8
        POP     {R5-R8}         ; Restore from stack
        BX      LR
        ENDP
The following code segment is the main routine that calls the function:
;**********************************************
; Code (FLASH) Segment
; main assembly program
;**********************************************
__main  PROC
        LDR     R0, =(GLOB_DATA)
        MOV     R1, #0
        MOV     R2, #4
        MOV     R3, #8
        MOV     R4, #12
        STM     R0, {R1-R4}
        BL      add4
        ; Loop forever
        B       __main
        ENDP
Analyzing the two code segments above:
- The function was written in such a way that it expects the address of the first word will be loaded into R0
- The function is called using Branch With Link, saving the address of the next instruction to LR
- When we examine the function, we see that registers R1, R5, R6, R7, and R8 are modified. Since the calling function expects the summed value to be returned in R1, we only need to store R5-R8
- After the computation, we restore the saved values of R5-R8
- Return to the calling function
Example: Caller Saved
This example demonstrates how a caller saved routine is implemented. The idea here is that the calling function knows exactly what registers need to be maintained and saves only those registers to the stack. This allows the function to freely modify any register without affecting the operational state of the calling function.
;**********************************************
; function sub4
; Subtracts four WORDs of data located in
; adjacent memory locations.
;
; INPUT:
;   R0: Base Address
; OUTPUT:
;   R1: Total
;
; NOTE: Not an EABI compliant routine!
;**********************************************
sub4    PROC
        LDM     R0, {R5-R8}
        SUB     R5, R5, R6
        SUB     R5, R5, R7
        SUB     R1, R5, R8
        BX      LR
        ENDP
;**********************************************
; Code (FLASH) Segment
; main assembly program
;**********************************************
__main  PROC
        LDR     R0, =(GLOB_DATA)
        LDR     R5, =(testBit10)
        PUSH    {R5}            ; Caller saves R5 before the call
        BL      sub4
        POP     {R5}            ; Caller restores R5 after the call
        ; Loop forever
        B       __main
        ENDP
Analyzing the two code segments above:
- The function was written in such a way that it expects the address of the first word will be loaded into R0
- Let's assume that R5 contains data we want preserved. A caller saved software design requires that the calling function save the registers that contain useful information, so we save R5 to the stack.
- Branch With Link to the function, saving the address of the next instruction to the LR.
- In caller saved, the callee has no obligations to save any of the registers.
- Restore the saved value of R5
Passing Parameters
In the examples above, R0 was used to pass the address of the first word of the data being examined. R0 was effectively used to pass a parameter to the function. In C, the add4 function could be written as follows:
uint32_t add4(uint32_t baseAddr);
Instead of passing the parameter in a register, we could have alternatively chosen to pass the baseAddr parameter on the stack. This requires the main routine to PUSH the parameter to the stack prior to calling the function. Below is a snippet showing how to pass a single parameter via the stack.
        LDR     R9, =(GLOB_DATA)
        PUSH    {R9}            ; Push the parameter to the stack
        BL      add4_stack_param
        POP     {R9}            ; Deallocate the parameter from the stack
The following snippet shows how the add4 function would be modified if the base address was passed via the stack instead of R0.
;**********************************************
; function add4_stack_param
; Sums four WORDs of data located in adjacent
; memory locations.
;
; INPUT:
;   SP: Base Address
; OUTPUT:
;   R1: Total
;
; NOTE: Not an EABI compliant routine!
;**********************************************
add4_stack_param PROC
        PUSH    {R0, R5-R8}     ; Save to stack
        LDR     R0, [SP, #20]   ; Load the parameter from the stack
        LDM     R0, {R5-R8}
        ADD     R5, R5, R6
        ADD     R5, R5, R7
        ADD     R1, R5, R8
        POP     {R0, R5-R8}     ; Restore from stack
        BX      LR
        ENDP
So why is the parameter located at an address that is 20 greater than the current stack pointer? The reason that 20 is added to the stack pointer is that we pushed 5 registers to the stack immediately before we load the parameter. As a result, the location of the parameter relative to the stack pointer changes. The figure below illustrates where the parameter is located relative to the stack pointer before and after the PUSH instruction. Make sure to note the addresses that R0, R5, R6, R7, and R8 are saved to on the stack. The lowest register number always gets saved to the lowest address in a PUSH operation!
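The offset arithmetic can be written out as a short C sketch. The function names here are hypothetical helpers, and any concrete addresses you feed them are illustrative rather than taken from the figure.

```c
#include <stdint.h>

/* Byte offset from SP to a word that was on top of the stack before
   `regs_pushed` additional 32-bit registers were pushed on top of it. */
uint32_t offset_after_push(uint32_t regs_pushed)
{
    return 4u * regs_pushed;
}

/* SP value after pushing `regs_pushed` registers on a Full Descending stack. */
uint32_t sp_after_push(uint32_t sp_before, uint32_t regs_pushed)
{
    return sp_before - 4u * regs_pushed;
}

/* Address of the k-th register in a PUSH list (k = 0 is the lowest-numbered
   register), given the SP value after the PUSH completes. */
uint32_t saved_reg_address(uint32_t sp_after, uint32_t k)
{
    return sp_after + 4u * k;
}
```

As an illustration with made-up addresses: if the caller's PUSH {R9} leaves SP at 0x200007FC, then PUSH {R0, R5-R8} moves SP to 0x200007E8. R0 is saved at 0x200007E8, R8 at 0x200007F8, and the parameter is still at 0x200007FC, which is SP + 20.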
Allocating Local Variables
In the C programming language, a variable can be a global variable or a local variable. A global variable can be read from or written to from any piece of code in the application. A global variable is allocated from the global memory pool. The address assigned to the global variable is set at compile time and will not change during the execution of the application.
In contrast, a local variable can only be read/written from within the function where it has been declared. When a function exits, the value that the local variable holds is lost. This is due to the fact that local variables are allocated from the stack. Free space in the stack expands and contracts dynamically, so the address of a given local variable also changes. For this reason, we will access local variables using an offset from the stack pointer. We will also see that allocating local variables is a run time operation which will also require us to de-allocate the variable from the stack.
The C code below shows an example of a function that allocates various local variables.
void localVarExample(void)
{
    uint32_t var1 = 0;
    uint8_t  var2 = 0;
    uint8_t  var3 = 0;
    uint16_t var4 = 0;

    /* Code that modifies R0-R4 */
}
If this C code were to be written in ARM assembly, it could look like this:
;**********************************************
; NOT EABI compliant routine!
;**********************************************
localVarsASM PROC
        PUSH    {R0-R4}
        ; Allocate Local Variables
        SUB     SP, SP, #16
        MOV     R0, #0
        STRH    R0, [SP, #0]    ; Access var4
        STRB    R0, [SP, #4]    ; Access var3
        STRB    R0, [SP, #8]    ; Access var2
        STR     R0, [SP, #12]   ; Access var1
        MOV     R4, R3          ; Stand-in for code that modifies R0-R4
        ; De-Allocate Local Variables
        ADD     SP, SP, #16
        POP     {R0-R4}
        BX      LR
        ENDP
A few observations.
- The number of registers pushed to the stack is balanced with the number of registers that are popped.
- Allocating local variables is done with a single SUB instruction. A single subtract can reserve multiple WORDs of stack space at once. Reserving the same space with a PUSH instruction would also write each word to memory, which could take multiple clock cycles to complete.
- Local variables allocated in the function must be de-allocated before exiting the function.
- Since the stack is Full Descending, we use a subtract instruction with an immediate to allocate local variables.
- De-allocation is accomplished with an ADD instruction. The immediate should equal the immediate used in the preceding SUB operation.
- The amount added/subtracted from the stack pointer should be a multiple of 4 bytes that is large enough to meet the requirements of all the local variables. The stack pointer needs to point to an address that is on a WORD boundary.
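The sizing rule in the last observation can be sketched in C. These helpers are hypothetical and only illustrate the arithmetic. The packed variant assumes the locals can be laid out back to back in declaration order without padding between them, which happens to hold for the sizes 4, 1, 1, 2 used in localVarExample but not in general.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 4 so SP stays WORD aligned. */
uint32_t round_up_to_word(uint32_t bytes)
{
    return (bytes + 3u) & ~3u;
}

/* Frame size when every local gets its own WORD, as localVarsASM does. */
uint32_t frame_one_word_per_var(uint32_t var_count)
{
    return 4u * var_count;
}

/* Frame size when locals are packed back to back (assumption: no padding
   is needed between them), rounded up to a WORD multiple. */
uint32_t frame_packed(const uint32_t *sizes, size_t count)
{
    uint32_t total = 0u;

    for (size_t i = 0; i < count; i++) {
        total += sizes[i];
    }
    return round_up_to_word(total);
}
```

For the four locals above, one WORD per variable gives the 16 bytes used in the assembly listing, while packing them would need only 8 bytes. Either value is a valid immediate for the SUB and ADD pair as long as the same value is used in both.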
Symmetrical Stack Operations
When you look at the examples provided above, you will notice that there is symmetry between the PUSH and POP operations. Every PUSH must be paired with a POP. When a function pushes 4 registers onto the stack, it must pop 4 registers before the function returns. If a calling function passes a parameter via the stack, that parameter must be removed after the function returns. If local variables are allocated using a SUB operation, we need a paired ADD operation. By maintaining symmetrical stack operations, the validity of data stored in the stack is maintained and our applications function as expected.
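One way to check symmetry on paper is to total the change in SP over a function body. The C sketch below does that for a list of stack operations on a Full Descending stack; `stack_op_t` and `net_sp_change` are hypothetical names, and the model only tracks SP, not the data that is stored.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_PUSH, OP_POP, OP_SUB_SP, OP_ADD_SP } stack_op_kind_t;

typedef struct {
    stack_op_kind_t kind;
    uint32_t amount;   /* register count for PUSH/POP, bytes for SUB/ADD */
} stack_op_t;

/* Net change in SP, in bytes, after running `ops` in order.
   A result of zero means the stack operations are balanced. */
int32_t net_sp_change(const stack_op_t *ops, size_t count)
{
    int32_t delta = 0;

    for (size_t i = 0; i < count; i++) {
        switch (ops[i].kind) {
        case OP_PUSH:   delta -= (int32_t)(4u * ops[i].amount); break;
        case OP_POP:    delta += (int32_t)(4u * ops[i].amount); break;
        case OP_SUB_SP: delta -= (int32_t)ops[i].amount;        break;
        case OP_ADD_SP: delta += (int32_t)ops[i].amount;        break;
        }
    }
    return delta;
}
```

The sequence used by localVarsASM (push 5 registers, subtract 16, add 16, pop 5 registers) totals zero. A function that pushes 5 registers but pops only 4 would leave SP 4 bytes lower than where it started.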
Nested Functions
The BL and BLX instructions are used to implement function calls. They save the address of the instruction after the BL or BLX into the link register (LR). When the function ends, we issue a BX LR instruction to return from the function. But what happens to the Link Register if we have nested function calls? When the nested function is called, the value of the Link Register is going to get overwritten! This means there is no way to return to the main application if we do not save the contents of the Link Register prior to the nested function call.
For this reason, if a function contains a nested function call, it must PUSH the Link Register to the stack prior to executing the nested function. The example below shows a nested function. Notice that if the Link Register is saved to the stack, we can directly POP the saved Link Register into the Program Counter. Popping to the Program Counter performs the same return as BX LR.
;**********************************************
; function func1
; NOTE: Not an EABI compliant routine!
;**********************************************
func1   PROC
        PUSH    {R0, R1, LR}
        MOV     R0, #0
        MOV     R1, #1
        BL      func2
        POP     {R0, R1, PC}
        ENDP
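The effect of the nested call on the Link Register can be traced with a tiny C model. Everything here is hypothetical: the addresses are made up, the Thumb bit in LR is ignored, and a single `saved_lr` field stands in for the stack slot that would hold LR.

```c
#include <stdint.h>

/* Made-up addresses for illustration only. */
#define MAIN_RETURN_ADDR  0x00000104u  /* instruction after "BL func1" in main  */
#define FUNC1_RETURN_ADDR 0x00000210u  /* instruction after "BL func2" in func1 */

typedef struct {
    uint32_t lr;
    uint32_t pc;
    uint32_t saved_lr;   /* stands in for the stack slot holding LR */
} cpu_model_t;

/* BL: record the return address in LR (the branch itself is not modeled). */
static void bl(cpu_model_t *cpu, uint32_t return_addr) { cpu->lr = return_addr; }

/* BX LR: return to whatever address LR currently holds. */
static void bx_lr(cpu_model_t *cpu) { cpu->pc = cpu->lr; }

/* func1 calls func2 without saving LR; returns the PC after func1 "returns". */
uint32_t func1_return_without_push(void)
{
    cpu_model_t cpu = {0u, 0u, 0u};

    bl(&cpu, MAIN_RETURN_ADDR);    /* main:  BL func1                      */
    bl(&cpu, FUNC1_RETURN_ADDR);   /* func1: BL func2 overwrites LR        */
    bx_lr(&cpu);                   /* func2: BX LR, back in func1          */
    bx_lr(&cpu);                   /* func1: BX LR uses the overwritten LR */
    return cpu.pc;
}

/* func1 saves LR before the nested call and returns by popping it into PC. */
uint32_t func1_return_with_push(void)
{
    cpu_model_t cpu = {0u, 0u, 0u};

    bl(&cpu, MAIN_RETURN_ADDR);    /* main:  BL func1             */
    cpu.saved_lr = cpu.lr;         /* func1: PUSH {LR}            */
    bl(&cpu, FUNC1_RETURN_ADDR);   /* func1: BL func2             */
    bx_lr(&cpu);                   /* func2: BX LR, back in func1 */
    cpu.pc = cpu.saved_lr;         /* func1: POP {PC}             */
    return cpu.pc;
}
```

In this model the version without the PUSH ends up back inside func1 instead of in main, while the version that saves LR returns to the instruction after the original BL.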
I am a little confused about “MOV R4, R3” in localVarsASM PROC. The value in R4 is already the same as the value in R3, so why is this MOV instruction needed?