Related Questions
- What Is An Algorithm In Computer Science?
- Can You Log Into My College And Do My Ucertify Labs For Each Week And Take My Final Exam?
- Which One Is True For Sprint Planning? The Work To Be Performed In The Sprint Is Planned
- The Cyberapolis Water Company Has Been Taken Over By The Carbon Spector Terrorist Organization
- Hacking Assignment- Gain Access To Admin Credentials
- Ai Code & Poster For Training And Testing Images
- Ai Code & Poster For Training And Testing Images
- Testout Office Pro Help!
- Reporting: An Enterprise Network Design Report A Request For Proposal (Rfp) To Design Wired Lans, Wireless Lans And Support Backbone Network
- Hire Someone Doing My Test
- I Am Having Issues With My Program. It Is Already Passing Multiple Tests
- Gis Project, Floodplain And Stormwater Management
- 8 Question Wireshark Activity - Tcp
- Using Backtracking To Generate Search Tree For Cannibal And Missionary
- Wireshark Activity - Answer Questions From Pdf And Post Answer In Word Document
Popular Services
- Coursework writing help
- Term paper writing help
- Writing Help
- Paper Writing Help
- Research paper help
- Thesis Help
- Dissertation Help
- Case study writing service
- Capstone Project Writing Help
- Lab report Writing
- Take my online class
- Take my online exam
- Do my test for me
- Do my homework for me
- Do my math homework for me
- Online Assignment Help
- Do my assignment for me
- Essay Writing Help
- Write my college essay
- Write my essay for me
Post your project now for free and watch professional homework help experts outbid each other in just a few minutes.
Computer Architecture - Homework 5
Asked
Modified
Viewed
25
I have uploaded my homework file and the E20 Manual. Please solve all the questions in the homework file, the E20 manual describes the specific language that we need to solve the homework.
This order does not have tags, yet.
Additional Instructions
CS-UY 2214 — Homework 5
Jeff Epstein
Introduction
Unless otherwise specified, put your answers in a plain text file named hw5.txt. Number each answer.
Submit your work on Gradescope.
You may consult the E20 manual, which is available on Brightspace.
Problems
1. Write an E20 assembly language program that will store the value 1099 at memory cell 456, then halt.
Use your E20 assembler to make sure that your program is correct and can be assembled into valid
machine code. Your solution should consist of no more than seven instructions.
2. Examine the single-cycle circuit diagram and its description in the E20 manual. In particular, pay
attention to the meaning of the various control signals, and the possible values they can carry.
We want to add a new instruction to the single-cycle E20 processor, while preserving as much of the
existing hardware as possible.
Consider the following instruction specification:
swi imm, $regAddr
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 7 bits
??? regAddr 000 imm
Mnemonic: Store word immediate
Example: swi 5, $2
Example: swi foo, $4
Stores the value signed value imm to the memory address in $regAddr.
The memory address is interpreted as an absolute address. All 16 bits of the value of $regAddr
are used to index into memory.
Symbolically: Mem[R[regAddr]] <- imm
Note that the numeric opcode isn’t specified, because it isn’t relevant for this exercise.
Based on the single-cycle E20 circuit diagram in the E20 manual, describe in detail any changes
necessary to the single-cycle E20 design in order to implement the specified instruction. Explicitly
mention any hardware to be added or changed. Include in your discussion any new control wires. In
addition, specify the values of all control signals (old and new) that should be set when the given
instruction is executing. That is, for each of the control wires (FUNCalu, MUXalu, etc), including any
1
additional control wires that you add, give their value. You may, if you like, provide an updated circuit
diagram. Explain and justify all proposed changes.
Your modifications to the single-cycle E20 must not interfere with the execution of any other instruction.
You should add the minimal amount of hardware necessary to accomplish the goal of implementing
the specified instruction.
3. Consider two computers:
• Computer A has a 5 GHz clock frequency and an average CPI of 4.
• Computer B has a 3 GHz clock frequency and an average CPI of 2.
Assuming the two computers are otherwise equivalent, which computer will run your programs faster?
By how much (as a percent)? Justify your answer.
4. Examine the multicycle circuit diagram and its description in the E20 manual. In particular, pay
attention to the meaning of the various control signals, and the possible values they can carry.
We want to add a new instruction to the multicycle E20 processor, while preserving as much of the
existing hardware as possible.
Consider the following instruction specification:
jmpm imm($regAddr)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 7 bits
??? regAddr 000 imm
Mnemonic: Jump via memory
Example: jmpm 5($2)
Calculates a memory pointer by summing the signed number imm and the value $regAddr, loads
the value from that address, and jumps to the loaded address.
The memory address is interpreted as an absolute address. The least significant 13 bits of the
value of $regAddr + imm are used to index into memory.
Symbolically: pc <- Mem[R[regAddr] + imm]
Note that the numeric opcode isn’t specified, because it isn’t relevant for this exercise.
Based on the multicycle E20 circuit diagram in the E20 manual, describe in detail any changes necessary
to the multicycle E20 design in order to implement the specified instruction. Explicitly mention any
hardware to be added or changed.
In addition, For each of the five stages of execution of the instruction, specify the correct behavior of
the processor in one of two ways: either give the number of an existing execution state (0 − 13), or
give a new execution state by specifying the values of all relevant control signals. For control signals
that are not relevant, write “don’t care.”
You may, if you like, provide an updated circuit diagram. Explain and justify all proposed changes.
Your modifications to the multicycle E20 must not interfere with the execution of any other instruction.
You should add the minimal amount of hardware necessary to accomplish the goal of implementing
the specified instruction.
2
5. We’ve previously built a single-cycle E15 implementation in Verilog. Now, we will build a multicycle
implementation. The visible behavior of the multicycle implementation should be identical to that of
the original. However, now each instructions executes in four serial phases: fetch, decode, exec, and
store.
Download and extract the file e15_multicycle_incomplete.zip, containing an incomplete implemen-
tation of a multicycle E15 processor in Verilog.
Read E15Process.v completely. Note the areas marked with a “TODO” comment: your task is to
replace these lines by providing appropriate statements. Do not modify any other code. A correct
implementation will allow your E15Process.v to execute any E15 assembly language program.
In particular, you must provide code at the following points:
• Provide the complete decode phase. The decode phase is responsible for sending values to the
ALU on the mBus and the dBus. Your code should assign appropriate values to the registers
mbEn, dbEn, and addNotSub, based on current instruction, which is accessible in the opCode, src,
dst, and immData registers.
In order to determine the correct values of mbEn and dbEn, you’ll need to examine the continuous
assignment for mBus and dstBus, provided at the end of the module. Those wires provide inputs
directly to the ALU.
You’ll also need to update myState with the next phase.
• Provide code for setting pcIncr in the exec phase. This will be very similar to your code in
the single-cycle version of the processor. pcIncr is passed as an input into the pcALU, which
determines the address of the next instruction.
• In the store stage, provide code for storing the ALU result into an appropriate register, if necessary.
Not all instruction write a value to a register. Because mbEn was set to bEn_ALU in the previous
phase, mBus will contain the ALU output.
Test programs are included. You should verify the correct function of your E15 processor by comparing
each program’s actual output against its expected output.
Submit only your complete E15Process.v.
3
CS-UY 2214 — E20 Manual
Jeff Epstein
Contents
1 Introduction 2
2 Architecture 2
2.1 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.3 Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 Subroutines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.6 E15 vs E20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Instruction set 11
3.1 Instructions with three register arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.1 add $regDst, $regSrcA, $regSrcB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.2 sub $regDst, $regSrcA, $regSrcB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.3 or $regDst, $regSrcA, $regSrcB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.4 and $regDst, $regSrcA, $regSrcB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.5 slt $regDst, $regSrcA, $regSrcB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.6 jr $reg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Instructions with two register arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.1 slti $regDst, $regSrc, imm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.2 lw $regDst, imm($regAddr) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.3 sw $regSrc, imm($regAddr) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.4 jeq $regA, $regB, imm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.5 addi $regDst, $regSrc, imm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Instructions with no register arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3.1 j imm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3.2 jal imm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Pseudo-instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.1 movi $reg, imm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.2 nop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.3 halt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5 Assembler directives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5.1 .fill imm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.6 Undefined bit patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Examples 16
4.1 Math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.2 Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.3 Subroutines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.4 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1
5 Hardware implementation 18
5.1 Single-cycle version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.1.1 Circuit diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.1.2 Control signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.2 Multicycle version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.2.1 Circuit diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.2.2 High-level state diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.2.3 Low-level state diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.3 Pipelined version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.3.1 Circuit diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.3.2 Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3.3 Control modules and wires . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3.4 Architectural differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1 Introduction
This manual covers the architecture of the E20 processor and its instruction set. The E20’s architecture is
more powerful than the E15, and its assembly language is more expressive. The encoding of its machine
language is also more complicated, so we will introduce a program, called an assembler, to convert its
assembly language into machine language, which can be directly interpreted by the processor.
2 Architecture
2.1 Registers
The E20 processor has eight numbered 16-bit registers. By convention, in E20 assembly language, we write
each register name with a dollar sign, so the registers are $0 through $7. However, register $0 is special,
because its value is always zero and cannot be changed. Attempting to change the value of register $0 is
valid, but will not produce any effect: the register is immutable. Therefore, we have only seven mutable
registers that can actually be used as registers, $1 through $7. The initial value of all registers is zero.
In addition, the E20 processor has a 16-bit program counter register, which cannot be accessed directly
through the usual instructions. The program counter stores the memory address of the currently-executing
instruction. The initial value of the program counter register is zero. After each non-jump instruction, the
program counter is incremented. A jump instruction may adjust the program counter.
2.2 Instructions
Syntax Each instruction in E20 assembly language consists of an opcode, as well as zero, one, two, or
three arguments. Opcodes are not case-sensitive. Numeric arguments are expressed in decimal. At least one
space is required between the opcode and the first argument. Arguments are separated by commas. Optional
spaces may occur around each comma. Optional spaces may occur before the opcode, as well as after the
last argument, or immediately after the opcode if there are no arguments. A maximum of one instruction
may appear on a single line.
Any opcode may optionally be prefixed by any number of label declarations, each consisting of a label
name followed by a colon. At least one space is required between a label declaration and any subsequent
label declaration or opcode on the same line. A label name is a non-empty sequence of letters, digits,
and underscores; the first character may not be a digit. Labels may appear either on the same line as an
instruction, or on a line by themselves. A label declaration may also appear on a line after the last instruction
of a program. Label names are not case-sensitive.
Comments begin with a hash mark (#) and continue to the end of the line. Blank lines are ignored.
2
There are three kinds of arguments:
• register argument — an integer 0 . . . 7 prefixed with dollar sign. For example, $4
• immediate argument — may consist of a positive decimal integer, such as 34; a negative decimal
integer, such as -4; or a label name, such as loc2
• memory reference — a combination of an immediate argument and a register argument, with the latter
contained in parentheses. For example, 0($2) or array($0)
Examples The add instruction takes three register arguments: a destination register and two source
registers, as follows:
add $1, $2, $3 # add the value of $2 and the value of $3, store sum into $1
The addi instruction takes a destination register, a source register, and an immediate value, as follows.
Note that the third argument is an immediate value, not a register:
addi $1, $2, 3 # add the value of $2 and 3, store sum into $1
Due to the special property that register $0 always has value zero, we can use the same opcode to store
an immediate value into a register. Below, we add the immediate value to the value of register $0, so the
sum will always equal the immediate value:
addi $1, $0, 3 # store 3 into $1
The j instruction takes an immediate argument, and jumps to the memory address indicated. The
address may be expressed numerically or with a label:
j 8 # jump to address 8
j somelabel # jump to address given by label (defined elsewhere)
The lw instruction has one register argument and one memory reference argument. It will calculate a
pointer given by the memory reference argument, and read the value stored at that memory address, storing
it into the given register. Each memory reference argument has two parts: an immediate value (specified as
a number or as a label) and a parenthesized register name. The memory address to be accessed is calculated
by adding the immediate value and the value stored in the register:
lw $4, 0($3) # the memory reference is 0($3). Here,
# the immediate part is zero, and the
# register is $3. Therefore , the address
# to be read is equal to the value of
# register $3, plus zero. The value at that
# memory location will be stored into
# register $4
lw $4, foo($0) # the memory reference is foo($0). Here, the
# immediate part is foo, a label. The register
# part is $0, which we know has the value 0.
3
# Therefore , the address to be read is equal
# to the value of foo, plus zero. The value
# at that memory location will be stored into
# register $4
lw $4, foo($3) # the memory reference is foo($3). Here, the
# immediate part is foo, a label. The register
# part is $3. Therefore , the address to be read
# is equal to the value of foo, plus the value
# of register $3. The value at that memory location
# will be stored into register $4.
Further examples of instructions are given below, as well as in the provided example programs.
Common errors The format of the addi instruction specifies the following fields:
• opcode — 3 bits
• source register — 3 bits
• destination register — 3 bits
• immediate — 7 bits
This means that the immediate value must be expressed as a signed 7-bit number, in the range of
−64 . . . 63. Any value outside of that range is not expressible in the allocated number of bits, and therefore
the assembler must reject it. A similar restriction exists for other instructions that take an immediate field,
such as jeq and slti.
A consequence of the above stipulation is that it’s impossible to directly set all bits of a register at once,
with a single addi instruction. For that, we would need to use lw to read a stored value from memory.
2.3 Memory
The E20 processor has 8192 16-bit cells of memory. Initially, the program is loaded into memory, starting
at address zero. The program counter is initially zero, so execution begins at the first instruction of the
program. Memory cells that do not initially contain program instructions are initially set to zero.
The 8192 cells of memory are addressed using 13-bit addresses. When accessing a memory cell via
a pointer (either through the program counter, or with the lw and sw opcodes), the 3 most significant
bits of the pointer are ignored. Therefore, an instruction accessing memory via a pointer value 43222
(10101000110101102) refers to the same address as a pointer value 2262 (00001000110101102). Similarly, the
instruction to be executed will be determined by the least significant 13 bits of the program counter.
E20 assembly language allows the use of labels. A label is a symbol that represents a known memory
address. A label’s name is a sequence of characters obeying the following rules: the first character may be
a letter or an underscore; subsequent characters may be a letter, an underscore, or a digit; there must be at
least one character. Label names are not case-sensitive.
A label may be declared in your assembly code by giving its name followed by a colon, preceding any
instruction (including normal instructions, pseudo-instruction, and directives). Multiple labels may be de-
clared at a single instruction, one after another. The value of the label will be the address of the instruction
that it precedes. If the label precedes no instruction, the value of the label will be the address that a subse-
quent instruction would occupy. A label’s name may be used as the imm field of any instruction, assuming
that the value of the label can be expressed in the number of bits allocated to that field. A label’s declaration
need not precede its use.
Labels make it easier to write E20 assembly language programs. Instead of referring to addresses nu-
merically, which will change as we add or remove instructions in the development of our program, we can
4
refer to addresses by name, which the assembly will convert to the appropriate numeric address. Labels can
denote the address of an instruction (which can serve as the target of a jump instruction) or the address of
a data location (which can serve as the target of load/store word instruction).
Here’s an example program, with assembly language on the left, and the corresponding machine code on
the right:
first_label:
movi $1, 1
j first_label
j second_label
second_label:
movi $2, 2
ram[0] = 16'b0010000010000001;
ram[1] = 16'b0100000000000000;
ram[2] = 16'b0100000000000011;
ram[3] = 16'b0010000100000010;
In the above program, the label first_label is declared before the instruction at address 0, therefore
its value is zero. The second_label is declared before the instruction at address 3, so its value is 3. The
instruction j first_label refers to first_label, so the immediate field of that instruction contains the
value of that label. Similarly, the instruction j second_label has an immediate field containing the value
of that label. In the above listing, the color of the labels matches to the color of the corresponding bits.
Memory can be read and written using the lw and sw instructions, respectively.
The lw (load word) instruction will read a memory cell, copying its value into a register:
• The first argument of lw is the register to copy the value into.
• The second argument of lw has two parts:
– an immediate field, which can be a number or a label, and
– a register field.
The immediate will be summed with the value of the register, giving the memory address to read from.
The sw (store word) instruction will write to a memory cell, copying a value from a register:
5
• The first argument of sw is the register holding the value to be copied to be memory.
• The second argument of sw has two parts:
– an immediate field, which can be a number or a label, and
– a register field.
The immediate will be summed with the value of the register, giving the memory address to write to.
Here’s an example:
lw $2, myvariable($0)
movi $3, 1
lw $4, myvariable($3)
myvariable:
.fill 42
.fill 97
ram[0] = 16'b1000000100000011;
ram[1] = 16'b0010000110000001;
ram[2] = 16'b1000111000000011;
ram[3] = 16'b0000000000101010;
ram[4] = 16'b0000000001100001;
In the above program, the instruction at address 0 calculates the memory address myvariable + $0,
which is equal to myvariable. The value of the label myvariable is 3, because the label is declared at
the instruction at memory address 3. Therefore, the first lw reads the value at that address, which is
42, and loads it into register $2. Then, in the instruction at address 2, we calculate the memory address
myvariable + $3, which is myvariable + 1, which is 4. The value at memory address 4 is is 97, which is
loaded into register $4.
Variables We can use labels, lw, and sw to provide variables to our program. Consider the following:
beginning:
lw $1, mycounter($0)
addi $1, $1, 1
sw $1, mycounter($0)
j beginning
mycounter:
.fill 0
ram[0] = 16'b1000000010000100;
ram[1] = 16'b0010010010000001;
ram[2] = 16'b1010000010000100;
ram[3] = 16'b0100000000000000;
ram[4] = 16'b0000000000000000;
In the above program, mycounter is a label with the value 4, referring to a memory address initialized
to the value 0. In each iteration of the loop, the value from that address is read, incremented, and stored
back into memory. The register $1 is used only temporarily. This approach to storing data is convenient,
because although we have relatively few registers, we have access to a large number of memory addresses.
6
Arrays The numeric relationship of memory addresses to each other allows us to loop through a sequence
of adjacent memory addresses, effectively treating them as an array. For example:
movi $1, 0
movi $3, 0
loop:
lw $2, myarray($1)
add $3, $3, $2
addi $1, $1, 1
jeq $2, $0, done
j loop
done:
halt
myarray:
.fill 5
.fill 3
.fill 20
.fill 4
.fill 5
.fill 0
ram[0] = 16'b0010000010000000;
ram[1] = 16'b0010000110000000;
ram[2] = 16'b1000010100001000;
ram[3] = 16'b0000110100110000;
ram[4] = 16'b0010010010000001;
ram[5] = 16'b1100100000000001;
ram[6] = 16'b0100000000000010;
ram[7] = 16'b0100000000000111;
ram[8] = 16'b0000000000000101;
ram[9] = 16'b0000000000000011;
ram[10] = 16'b0000000000010100;
ram[11] = 16'b0000000000000100;
ram[12] = 16'b0000000000000101;
ram[13] = 16'b0000000000000000;
In the above program, myarray identifies the address of the beginning of an array of numbers, which is
really just a sequence of memory locations. The program uses $1 to store an index into the array, starting
at zero, and incrementing it thereafter. A value from the array is read by the instruction at address 2. The
loop will then sum each consecutive array element into a running total in $3, before continuing to the next
array element. When it finds an array element whose value is zero, the loop ends. Thus, this program sums
all element in the array, and the final value of $3 will be 37.
Halting As in the E15, the end of the program is expressed by entering a tight loop. We use the halt
pseudo-instruction for this purpose, which is implemented as a jump to itself. That is, the halt instruction
is equivalent to the following:
endofprogram: j endofprogram
Common errors Although a label may be declared before or after its use, it is an error to refer to a label
that has not been declared at all in the program. For example, the following program cannot be assembled:
lw $1, mycounter($0)
addi $1, $1, 1
sw $1, mycounter($0)
j beginning # undeclared label!
mycounter:
.fill 0
It is an error to declare the same label more than once in a program. On the other hand, it is not an
error to declare a label that is never referred to.
Jumps E20 has several jump opcodes:
• j — jump, with a 13-bit destination address
• jal — jump and link, with a 13-bit destination address
• jr — jump to register, with a register containing a 16-bit destination address
7
• jeq — jump if equal, with a 7-bit relative destination address
In the case of jr, the register argument provides a 16-bit address, which is loaded into the program
counter.
In the case of j and jal, the instruction format provides only a 13-bit destination address. The provided
13 bits will be copied into the least-significant 13 bits of the program counter, and the remaining 3 bits of
the program counter will be set to zero.
In the case of jeq, the instruction format provides only a 7-bit relative destination address, which is
interpreted as a signed 2’s complement number, sign-extended1, and added to the 16-bit address of the
subsequent instruction; this sum is stored into the program counter. This means that using only this
instruction, it is impossible to jump to a location more than 27−1 − 1 = 01111112 = 6310 cells ahead of the
current program counter, or more than −27−1 = 10000002 = −6410 behind it.
In E20 assembly language, it is an error to provide a immediate value, either numerically or by label, that
exceeds the allowed number of bits for the immediate field of the given instruction. For example, j 9000
cannot be assembled, since 9000 cannot be expressed as a 13-bit unsigned integer. Similarly, jeq $0, $0, 90
cannot be assembled if it is located at address 0, since 90 − 0 − 1 = 89 cannot be expressed as a 7-bit 2’s
complement integer.
2.4 Comparison
E20 supports equality comparison and less-than comparison.
Equality Equality comparison is achieved with the jeq instruction. Its arguments are two registers, and
an immediate value that indicates a memory location. If the values of its two register arguments are equal,
then the program will jump to the given address. Otherwise, the program will proceed to the subsequent
instruction. For example, consider the following program:
jeq $1, $0, they_are_equal # if $1 is 0, go to they_are_equal
addi $1, $1, 1 # we execute this only when $1 is not 0
j done # avoid fall-through to next instruction
they_are_equal:
addi $1, $1, 2 # we execute this only when $1 is 0
done:
# end of program
The above E20 assembly program might be expressed roughly using the following Python code:
if Reg1 == 0:
Reg1 += 2
else:
Reg1 += 1
Note that the j instruction is necessary. In contrast, consider the following program, identical except for
the omission of the j instruction:
jeq $1, $0, they_are_equal # if $1 is 0, go to they_are_equal
addi $1, $1, 1 # we execute this only when $1 is not 0
they_are_equal:
addi $1, $1, 2 # we execute this regardless
done:
# end of program
1To sign-extend is to increase the number of bits of a 2’s complement number, in such a way that does not change the value
it represents. The most-significant bit of the number, which represents the sign (negative or non-negative) of the value, must
be reproduced into new bits positions. For example, consider a 4-bit 2’s complement number, 10112 = −510. If we sign-extend
this to an 8-bit number, it will become 111110112.
In this case, we need to sign-extend the 7-bit address to 16 bits, because the result must be added to a signed 16-bit number.
8
This version would be expressed with the following Python code:
if Reg1 != 0:
Reg1 += 1
Reg1 += 2
Less-than Less-than comparison is achieved with the slt or slti instructions, in conjunction with jeq.
The slt or slti instructions will set a given register to 1 if its second argument is less than its third
argument, and to 0 otherwise. Then, we can use jeq to check the value of the given register. For example:
slt $1, $2, $3 # $1 will be set to 1 if $2 < $3
jeq $1, $0, not_less # if $1 is now 0, we know $2 >= $3
addi $2, $2, 1 # we execute this only when $2 < $3
j done # avoid fall-through to next instruction
not_less:
addi $2, $2, 2 # we execute this only when $2 >= $3
done:
# end of program
The above E20 assembly program might be expressed roughly using the following Python code:
if Reg2 < Reg3:
Reg2 += 1
else:
Reg2 += 2
Note that slt and slti perform unsigned comparison. That is, both comparands are assumed to be
non-negative binary numbers.
Less-than-or-equals We can combine the above approaches to provide less-than-or-equals comparison.
First we compare for equality, then for less-than.
jeq $2, $3, less_or_equal # jump if $2 and $3 are equal , otherwise fallthrough
slt $1, $2, $3 # $1 will be set to 1 if $2 < $3
jeq $1, $0, not_less_or_equal # if $1 is now 0, we know $2 > $3
less_or_equal:
addi $2, $2, 1 # we execute this only when $2 <= $3
j done # avoid fall-through to next instruction
not_less_or_equal:
addi $2, $2, 2 # we execute this only when $2 > $3
done:
# end of program
2.5 Subroutines
The jal and jr instructions can be used together to achieve subroutines, a flow control mechanism that
allows execution to return to an early point of code, and can be invoked from multiple places within a
program. A subroutine is therefore like a function, but does not necessarily encompass the parameters and
return values that functions usually entail.
Before we discuss how subroutines work in E20, consider the following C++ code:
1 int x = 0;
2 int y = 0;
3
9
4 void sub1() {
5 x++;
6 y--;
7 }
8
9 void main() {
10 sub1();
11 sub1();
12 }
In the above code, execution starts at line 10. The subroutine sub1 is invoked, which does stuff in lines
5 and 6, and execution then returns to line 11. At line 11, the subroutine is invoked again, lines 5 and 6 are
executed again, and execution returns to line 12, whereupon the program ends.
The subtlety of such code is the mechanism by which these invocations and returns are implemented.
How does sub1 know where to return to when it finishes, particularly when it returns to a different place
each time?
On most architectures, this feat is accomplished by storing the address of the subsequent instruction at
the time of a subroutine’s invocation. Then, when the subroutine has finished, it copies that stored address
into the program counter. This approach is used by the E20 processor.
• The jal (jump and link) instruction is similar to the ordinary jump instruction (j), with the key
difference that in addition to storing the immediate value into the program counter, jal will store the
address of the subsequent instruction into $7.
• The jr (jump to register) instruction copies the address in the given register into the program counter.
Therefore the instruction jr $7 will jump to the address in the register $7, where it was previously
stored by jal.
Below is a direct E20 translation of the above C++ program. Note that we use jal sub1 to invoke the
subroutine, and jr $7 to return from it.
main:
jal sub1 # invoke subroutine
jal sub1 # invoke subroutine again
halt # end program
sub1:
lw $1, x($0) # x++
addi $1, $1, 1
sw $1, x($0)
lw $1, y($0) # y--
addi $1, $1, -1
sw $1, y($0)
jr $7 # return from subroutine
x:
.fill 0
y:
.fill 0
10
Common errors A limitation of subroutines on E20 is that the value of $7 must be preserved for the
duration of the subroutine. If the subroutine modifies $7, then the jr instruction may return to an unintended
location.
A corollary of the above limitation is that a subroutine cannot call another subroutine. Since the jal
instruction modifies $7, this makes recursion awkward on E20.
Both of these limitation have workarounds. For example, a subroutine could save the value of $7 into
another register or into memory. It would then be free to manipulate $7, including by invoking other
subroutines, as long as it restores the original value of $7 before it returns.
2.6 E15 vs E20
The E20 processor can be considered a more powerful version of the E15. Although they have many features
in common and their instruction sets are similar, the E20 expands some capabilities.
E15 E20
registers four 4-bit registers seven 16-bit registers
memory 16 12-bit cells of read-only in-
struction memory
8192 16-bit cells of instruction/-
data memory
instruction format all instructions have 4-bit opcode
field, two 2-bit register fields,
and 4-bit immediate field
three formats:
• 3-bit opcode field, three 3-
bit register fields, 4-bit im-
mediate field
• 3-bit opcode field, two 3-
bit register fields, 7-bit im-
mediate field
• 3-bit opcode field, 13-bit
immediate field
3 Instruction set
Here we discuss the instructions that form the E20 instruction set. We categorize the instructions by their
format, which is based on the number of register arguments. The format determines how many bits are
allocated to each argument.
In addition to instructions proper, we discuss pseudo-instructions, which are assembly mnemonics that
are translated to another instruction. Finally, we cover assembler directives, which change the behavior of
the assembler but do not correspond to any specific instruction.
In the following subsections, we give the following information about each instruction:
• the instruction name and syntax — The header of each subsection gives the assembly language syntax.
• the instruction format — Within each 16-bit instruction, we show which bits must have which values.
• the informal behavior — We give a brief prose description of the behavior of the instruction when
executed.
• the formal behavior — We use a symbolic notation to express the behavior of the instruction. Here,
we use the following symbols:
– <-: indicates that the location on the left will be updated with the value on the right
– $reg (and variants, such as $regA, $regSrc, etc): the instruction’s register argument
11
– imm: the instruction’s immediate argument
– R: the processor’s register file, indexed by the name of a register, 0 . . . 7
– Mem: the processor’s memory, indexed by a memory address
– pc: the program counter
Unless otherwise specified, immediate values can be positive, negative, or zero and are represented in
2’s complement. Unless otherwise specified, all instructions increment the program counter.
3.1 Instructions with three register arguments
3.1.1 add $regDst, $regSrcA, $regSrcB
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 3 bits 4 bits
000 regSrcA regSrcB regDst 0000
Example: add $1, $0, $5
Adds the value of registers $regSrcA and $regSrcB, storing the sum in $regDst.
Symbolically: R[regDst] <- R[regSrcA] + R[regSrcB]
3.1.2 sub $regDst, $regSrcA, $regSrcB
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 3 bits 4 bits
000 regSrcA regSrcB regDst 0001
Mnemonic: Subtract
Example: sub $1, $0, $5
Subtracts the value of register $regSrcB from $regSrcA, storing the difference in $regDst.
Symbolically: R[regDst] <- R[regSrcA] - R[regSrcB]
3.1.3 or $regDst, $regSrcA, $regSrcB
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 3 bits 4 bits
000 regSrcA regSrcB regDst 0010
Example: or $1, $0, $5
Calculates the bitwise OR of the value of registers $regSrcA and $regSrcB, storing the result in $regDst.
Symbolically: R[regDst] <- R[regSrcA] | R[regSrcB]
3.1.4 and $regDst, $regSrcA, $regSrcB
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 3 bits 4 bits
000 regSrcA regSrcB regDst 0011
Example: and $1, $2, $5
Calculates the bitwise AND of the value of registers $regSrcA and $regSrcB, storing the result in
$regDst.
Symbolically: R[regDst] <- R[regSrcA] & R[regSrcB]
12
3.1.5 slt $regDst, $regSrcA, $regSrcB
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 3 bits 4 bits
000 regSrcA regSrcB regDst 0100
Mnemonic: Set if less than
Example: slt $1, $2, $5
Compares the value of $regSrcA with $regSrcB, setting $regDst to 1 if $regSrcA is less than $regSrcB,
and to 0 otherwise.
The comparison performed is unsigned, meaning that the two operands are treated as unsigned 16-bit
integers, not 2’s complement integers. Therefore, 0x0000 < 0xFFFF.
Symbolically: R[regDst] <- (R[regSrcA] < R[regSrcB]) ? 1 : 0
3.1.6 jr $reg
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 3 bits 4 bits
000 reg 000 000 1000
Mnemonic: Jump to register
Example: jr $1
Jumps unconditionally to the memory address in $reg.
The jump destination is expressed as an absolute address. All 16 bits of the value of $reg are stored into
the program counter.
Symbolically: pc <- R[reg]
3.2 Instructions with two register arguments
3.2.1 slti $regDst, $regSrc, imm
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 7 bits
111 regSrc regDst imm
Mnemonic: Set if less than, immediate
Example: slti $1, $2, some_label
Example: slti $1, $2, 30
Compares the value of $regSrc with sign-extended imm, setting $regDst to 1 if $regSrc is less than imm,
and to 0 otherwise.
The comparison performed is unsigned, meaning that the two operands are treated as unsigned 16-bit
integers, not 2’s complement integers. Therefore, 0x0000 < 0xFFFF. This is true even though the argument
is expressed as a 7-bit signed number.
Symbolically: R[regDst] <- (R[regSrc] < imm) ? 1 : 0
3.2.2 lw $regDst, imm($regAddr)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 7 bits
100 regAddr regDst imm
Mnemonic: Load word
Example: lw $1, some_label($2)
13
Example: lw $1, 5($2)
Example: lw $1, -1($2)
Calculates a memory pointer by summing the signed number imm and the value $regAddr, and loads the
value from that address, storing it in $regDst.
The memory address is interpreted as an absolute address. The least significant 13 bits of the value of
$regAddr + imm are used to index into memory.
Symbolically: R[regDst] <- Mem[R[regAddr] + imm]
3.2.3 sw $regSrc, imm($regAddr)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 7 bits
101 regAddr regSrc imm
Mnemonic: Store word
Example: sw $1, some_label($2)
Example: sw $1, 5($2)
Example: sw $1, -1($2)
Calculates a memory pointer by summing the signed number imm and the value $regAddr, and stores
the value in $regSrc to that memory address.
The memory address is interpreted as an absolute address. The least significant 13 bits of the value of
$regAddr + imm are used to index into memory.
Symbolically: Mem[R[regAddr] + imm] <- R[regSrc]
3.2.4 jeq $regA, $regB, imm
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 7 bits
110 regA regB rel_imm
where rel_imm = imm - pc - 1
Mnemonic: Jump if equal
Example: jeq $1, $0, some_label
Example: jeq $2, $3, 23 (jumps to address 23)
Compares the value of $regA with $regB. If the values are equal, jumps to the memory address identified
by the address imm, which is encoded as the signed number rel_imm.
The jump destination, imm, is encoded as a relative address, rel_imm. That is, when a jump is performed,
the value rel_imm is sign-extended and added to the successor value of the program counter. Therefore, the
actual address that will be jumped to is equal to the current program counter plus one plus the immediate
value. This means that when encoding a jeq instruction to machine code, the field in the least significant
seven bits must be the difference between the desired destination and one plus the program counter.
Symbolically: pc <- (R[regA] == R[regB]) ? pc+1+rel_imm : pc+1
3.2.5 addi $regDst, $regSrc, imm
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 3 bits 3 bits 7 bits
001 regSrc regDst imm
Mnemonic: Add immediate
Example: addi $1, $0, some_label
Example: addi $2, $2, -5
Adds the value of register $regSrc and the signed number imm, storing the sum in $regDst.
Symbolically: R[regDst] <- R[regSrc] + imm
14
3.3 Instructions with no register arguments
3.3.1 j imm
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 13 bits
010 imm
Mnemonic: Jump
Example: j some_label
Example: j 42
Jumps unconditionally to the memory address imm.
The jump destination is expressed as a non-negative absolute address. All 13 bits of imm are stored into
the program counter, while the most significant 3 bits of the program counter will be set to zero.
Symbolically: pc <- imm
3.3.2 jal imm
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
3 bits 13 bits
011 imm
Mnemonic: Jump and link
Example: jal some_label
Example: jal 42
Stores the memory address of the next instruction in sequence in register $7, then jumps unconditionally
to the memory address imm.
The jump destination is expressed as a non-negative absolute address. All 13 bits of imm are stored into
the program counter, while the most significant 3 bits of the program counter will be set to zero.
Symbolically: R[7] <- pc+1; pc <- imm
3.4 Pseudo-instructions
Pseudo-instructions, unlike proper instructions, do not have their own unique encoding in machine language.
Instead, pseudo-instructions are a shorthand form of expressing more complicated assembly instructions.
The assembler will translate a pseudo-instruction into an actual instruction, which will then be assembled
normally.
3.4.1 movi $reg, imm
Mnemonic: Move immediate
Example: movi $2, 55
Example: movi $7, some_label
Copies the value imm to the register $reg.
The movi $reg, imm instruction is translated by the assembler as addi $reg, $0, imm.
3.4.2 nop
Mnemonic: No operation
Performs no operation, other than incrementing the program counter.
The nop instruction is translated by the assembler as add $0, $0, $0.
15
3.4.3 halt
Performs no operation at all. The program counter is not incremented, resulting in an infinite loop. By
convention, represents the end of the program.
The halt instruction is translated by the assembler as an unconditional jump (j) to the current memory
location.
3.5 Assembler directives
Directives are not processor instructions, but rather commands to the assembler itself.
3.5.1 .fill imm
Example: .fill some_label
Example: .fill 42
Inserts a 16-bit immediate value directly into memory at the current location. This directive instructs
the assembler to put a number into the place where an instruction would normally be. The immediate value
may be specified numerically (positive, negative, or zero) or with a label.
3.6 Undefined bit patterns
Any bit pattern not covered by any of the previous sections is not valid machine code and its interpretation
is undefined.
4 Examples
Here are some examples of E20 assembly language programs. Further examples, including the machine code
translation, are available on the class website.
4.1 Math
# add, addi, and sub work as you expect , except
# they take three arguments. The first argument
# is the destination , the other two are the
# sources.
# There is no subi. Instead you have to use addi
# with a negative immediate.
addi $1, $0, 5 # $1 := 5
addi $2, $1, -2 # $2 := $1 + (-2)
add $3, $1, $2 # $3 := $1 + $2
# Notice that $0 is special , because it is always
# zero. So using $0 as a source lets us effectively
# do a movi, as in the first instruction below.
addi $4, $0, 55 # $4 := 55
sub $5, $4, $1 # $5 := $4 - $1
# We also have bitwise operators AND and OR.
# (but not ori and andi). As above , the first
# operand is the destination register.
or $6, $2, $5
16
and $7, $2, $5
halt # end the program with an infinite loop
4.2 Loops
# A simple loop, counting down from 10
# Here, we introduce a pseudo -instruction as a synonym
# for addi. Specifically:
# movi $1, 10 <===> addi $1, $0, 10
# In other words , movi is really adding zero to the
# immediate and storing the result in $1.
# We also introduce jeq, which compares its first two
# arguments and conditionally jumps to the label in
# the third.
movi $1, 10 # Initialize counter to 10
beginning:
jeq $1, $0, done # if $1 == $0, go to done
addi $1, $1, -1 # Decrement $1
j beginning # go to top of loop
done:
halt # we've finished
4.3 Subroutines
# Example of simple subroutine. At the jal
# instruction , we call subroutine proc.
# It store a value in $3, then returns.
# Upon return , we go back to the address
# after the subroutine invocation. Thus,
# the addi instructions will be executed
# in the following order:
# X1, X2, X3
addi $1, $0, 1 # X1: assign 1 to $1
jal proc
addi $2, $0, 2 # X3: assign 2 to $2
halt
proc:
addi $3, $0, 3 # X2: assign 3 to $3
jr $7
4.4 Variables
# Examples of variables in memory.
# var1 is a label identifying a memory address
# containing 30, and var2 is a label identifying
17
# a memory address containing 5. We load the value
# at var1 into $1, and the value at var2 into $2.
# Then we AND them into $3, and OR them into $4.
# Then we store $3 into var3.
lw $1, var1($0) # read from address var1 + 0
lw $2, var2($0) # read from address var2 + 0
and $3, $1, $2 # AND the values together
or $4, $1, $2 # then OR them together
sw $3, var3($0) # write the AND result into memory
halt # program ends
var1: # declare a label
.fill 30 # insert the value 30 into memory here
var2:
.fill 5
var3:
.fill 0
5 Hardware implementation
There are three version of the E20 processor: a single-cycle version, a multicycle version, and a pipelined
version. Each version has different performance characteristics, but implements the same architecture: that
is, all three versions must interpret the same machine code and produce identical results, although the way
they achieve those results will differ.
5.1 Single-cycle version
Here we present details of the hardware implementation of the single-cycle version of the E20 processor. In
this version, each instruction executes in its entirety in one clock cycle.
5.1.1 Circuit diagram
In the following diagram, thick lines represent 16-bit wires. Dotted lines represent control signals.
18
Every cycle, the program counter selects an instruction to be read from memory, which is passed to the
control module. The control module acts as a decoder, which, on the basis of the current instruction, sets
all of the control signals appropriately, and all of the control and data lines are set up. Then the new values
are stored in memory, the register file, and the program counter register.
5.1.2 Control signals
The control signals are set based on the basis of only the opcode and the EQ wire from the ALU. The EQ
wire will be 1 when the result of the ALU operation is zero, and 0 otherwise.
The control signals are as follows:
19
FUNCalu This wire directs the ALU which operation to perform.
MUXalu This wire directs a mux to select one of two inputs to the ALU: either from a register,
or a sign-extended value from the immediate field.
MUXpc This 2-bit wire directs a mux to select one of four signals as the next value of the
program counter: the ALU output, the increment of the current program counter,
the zero-extended 13-bit immediate field, or the sum of the immediate field and the
increment of the current program counter.
MUXrf This wire directs a mux to select one of two register selectors as input to the read input
of the register file.
MUXtgt This 2-bit wire directs a mux to select one of three signals as input to the write selector
of the register file: the incremented program counter, the ALU output, or the data
memory output.
MUXdst This 2-bit wire directs a mux to select one of three possible register selectors indicating
where to write the result of the instruction: the second register field, the third register
field, or the literal 7.
WErf This wire enables or disables the write port of the register file. If the signal is 1, the
register file will write the value of TGTdata to register TGT. Otherwise, writing is
blocked.
WEdmem This wire enables or disables the write port of memory. If the signal is 1, memory will
write the value of dataValIn at address dataAddr. Otherwise, writing is blocked.
The control wires can carry the following concrete values with the indicated meaning:
Control signal Values
FUNCalu 0 = add
1 = subtract
2 = and
3 = or
4 = slt
MUXalu 0 = register
1 = immediate
MUXpc 0 = ALU
1 = pc+1
2 = pc+1+imm
3 = 13-bit immediate
MUXrf 0 = rA
1 = rC
MUXtgt 0 = ALU
1 = pc+1
2 = dmem
MUXdst 0 = rB
1 = rC
2 = literal 7
WErf 0 = disable
1 = enable
WEdmem 0 = disable
1 = enable
5.2 Multicycle version
Here we present details of the hardware implementation of the multicycle version of the E20 processor. In
this version, each instruction executes in more than one clock cycle.
20
5.2.1 Circuit diagram
In the following diagram, thick lines represent 16-bit wires. Thin lines represent wires of less than 16 bits.
Dotted lines represent control signals. The control unit, shown to the right, has implicit connection to many
components, including all the multiplexors, but for simplicity the actual wires are not shown.
In addition to the architectural registers $0...$7, this version of the processor has several microarchitec-
tural registers that are not accessible to the programmer:
• IR — the instruction register store the current instruction.
• A and B — these registers temporarily hold the input to the ALU before the EXEC stage.
• aluOut — this register stores the output of the ALU after the EXEC stage.
• MDR — the memory data register stores output from the memory unit between the MEM and WB
stage.
21
• state — stores the current state of execution. Used exclusively by the control unit.
5.2.2 High-level state diagram
In the following diagram, we show the relationship between the states of the multicycle processor.
Execution of all instructions is divided into five stages: instruction fetch (IF), instruction decode (ID),
execute (EXEC), memory (MEM), and writeback (WB). Each stage executes in one clock cycle. All instruc-
tions execute each of the five stages in sequence. In each stage, each instruction will be in one of several
possible states, each identified by a number between 0 and 14. The determination of which state to use is
made by the control module, based on the instruction’s opcode.
Execution of all instructions begins at state 0. Transitions to subsequent states, shown by arrows, are
predicated by the opcode. After executing the last stage of each instruction, execution resumes with the
next instruction at state 0.
Within the box for each state are shown the symbolic micro-instructions necessary to complete that
stage. The advancement of the state for the subsequent clock cycle is implicit. Similar to the notation used
in section 3, the micro-instructions indicate storage operations using the left arrow (<-), memory access
using M[...] and architectural register access using R[...].
The terms to the right side show which execution stage each state belongs to.
5.2.3 Low-level state diagram
The low-level state diagram is similar to the high-level diagram shown above, however instead of showing
symbolic micro-instructions, each state shows the exact control signals necessary to complete that stage.
22
5.3 Pipelined version
Here we present details of the hardware implementation of the pipelined version of the E20 processor. As
in the multicycle version, each instruction executes in more than one clock cycle. In addition, the processor
supports executing parts of different instructions at the same time, in different parts of the processor.
5.3.1 Circuit diagram
In the following diagram, thick lines represent 16-bit wires. Thin lines represent wires of less than 16 bits.
Dotted lines represent control signals. Control modules are shown separately.
Clocked latch modules are shown in gray, and logic modules are represented by white boxes.
To avoid clutter, lines representing wires between successive pipeline registers are omitted. For example,
the ID1 pipeline register passes its value to the ID2 pipeline register, even though the middle section of the
line between them is hidden.
23
5.3.2 Stages
In the circuit diagram, you can see that the processor is divided into areas matching each of the five stages
of execution:
1. Each instruction starts in the first stage, instruction fetch (IF), at the top of the diagram. This stage
is responsible for reading the instruction from memory. Memory is represented by the tall gray box on
the right.
2. Then, each instruction enters the instruction decode (ID) stage, where its register values are read from
the register file. The register file is represented by the tall gray box on the left.
3. The third stage is the execute (EXEC) stage, where most instructions have their operands processed
by the ALU. Much of the complexity of this stage is due to forwarding.
4. Instructions that access memory will do so in the memory (MEM) stage. Other instructions still use
this stage, but perform no useful work in it.
5. Finally, in the writeback (WB) stage, the instruction’s final result is stored back into the register file.
After this stage, each instruction is retired.
Between each stage, intermediary values are stored into pipeline registers, indicated by the wide gray
boxes with dotted border.
5.3.3 Control modules and wires
The processor has several control modules. Their inputs and outputs are described here.
CTLid implements decoding and stalling. The module examines the instructions
in IR1 and IR2; if the latter instruction is an lw and the former instruction reads
a register written by that lw, then we must stall. To perform the stall, the module
asserts Pstall, which holds the current instruction in the IF/ID pipeline register,
preventing an update; disables WEpc, preventing an update to the main program
counter; and also asserts Pnop2, which replaces the instruction in the ID/EXEC
pipeline register with a nop, thus introducing a bubble. If there is no need to stall,
then CTLid will set MUXb and MUXr1 for the registers needed by IR1 and to
store the instruction’s operands in A2 and B2.
CTLexec1 controls inputs of the execution stage. In a simple case, it examines
IR2, sets the appropriate aluOp for that opcode, and passes operands in A2, B2,
and IR2.imm7 to the ALU. The module also implements forwarding: if either or
both of the register operands read by the instruction in IR2 is also written by the
instruction in IR3, IR4, or IR5, then we must pass that result from aluOut,
mOut, or wbOut, respectively, directly to the execution unit, by setting the
control of MUXalu1, MUXalu2, and MUXalu3.
CTLexec2 handles jumps and mispredictions. If IR2 holds an unconditional
jump instruction (j, jal, or jr), or a conditional jump instruction (jeq) when the
two register operands are equal, then the processor must flush the pipeline and
prepare to fetch the instruction at the destination address. To flush the
instructions presently in the fetch and decode stages, the module asserts Pnop1
and Pnop2, which replace the instructions in the IF/ID and ID/EXEC pipeline
registers with a nop. To fetch the next instruction, the module sets MUXifpc,
which overrides the usual fetch logic of simply incrementing the previous program
counter.
CTLmem reads and writes data from the memory unit. If IR3 contains sw, it
asserts WEram to write the value in B3 to the address in aluOut. If IR3 contains
lw, it sets MUXmout so that the value of the address in aluOut is written to
mOut. In other cases, the value of aluOut is passed through to mOut.
CTLwb writes final values to the register file. For most instructions, the register
to be written will be in IR4.reg2 or IR4.reg3, and the value to be stored will be in
mOut. For jal, the value of the program counter plus one will be stored in
register $7. For sw and other jump instructions, no value is written.
25
The control wires can carry the following concrete values with the indicated meaning:
Control signal Values Description
Pnop1 0=no flush
1=flush
Replace IR1 with nop
Pstall 0=no stall
1=stall
Disable write to IR1 and PC1
MUXifpc 0=MUXjmp
1=PC0+1
Select new value of PC1
WEpc 0=disable
1=enable
Write enable for main program
counter
MUXr1 0=literal 0
1=IR1.reg1
Select register to read
MUXb 0=r2dataOut
1=IR1.imm13
Select new value of B2
Pnop2 0=no flush
1=flush
Replace IR2 with nop
MUXalu1 0=aluOut
1=mOut
2=wbOut
3=A2
Select ALU input
MUXalu2 0=aluOut
1=mOut
2=wbOut
3=B2
Select ALU input and B3
MUXalu3 0=IR2.imm7
1=MUXalu2
Select register value or immediate
to ALU
MUXjmp 0=ALU
1=PC2+IR2.imm7+1
Select target in case of jump
EQ 0=not equal
1=equal
Result of equality of ALU operands
aluOp 0=add
1=subtract
2=and
3=or
4=slt
Select ALU operation
MUXmout 0=aluOut
1=data2Out
Select new value of mOut
WEram 0=disable
1=enable
Write enable for memory
MUXrw 0=IR4.reg2
1=IR4.reg3
2=literal 7
Select register to write
MUXtgt 0=mOut
1=PC4+1
Select value to write to register
WEreg 0=disable
1=enable
Write enable for registers
5.3.4 Architectural differences
A design goal of all three versions (single-cycle, multicycle, and pipelined) of the E20 processor is to support
the same architecture: that is, they should run the same machine code and get the same results. However,
the pipelined micro-architecture compels us to compromise in this goal. For example, consider the following
program:
26
movi $1, 1
sw $0, target($0) # replace the instruction at target with nop
target:
add $1, $1, $1 # this instruction should not be run
halt
The above program modifies itself: the sw instruction stores the value 0 (corresponding to the nop
instruction) into the subsequent memory cell, which is then executed. This is a valid E20 program, which,
when run on the single-cycle or multicycle versions of the processor, will halt with the value 1 in register $1,
because the add is never run.
However, if we run the same program on the pipelined processor, we get a different result: in that case,
the add is fetched into the pipeline before the sw’s MEM stage overwrites it in memory. The add is executed,
and the final value of $1 is 2, not 1. Therefore the behavior of the pipelined E20 differs from earlier versions.
Strictly speaking, the pipelined E20 violates the architectural design that we established with the earlier
versions. However, this is a worthwhile sacrifice, thanks to the increased performance of the pipelined version.
Furthermore, in practice, most programs do not modify themselves, so this design characteristic is unlikely
to “break” many programs.
The alternative would be to have the pipelined processor automatically flush the pipeline in response
to self-modification. However, detecting a self-modifying program would add significant complexity to the
design.
27
Introduction
Architecture
Registers
Instructions
Memory
Comparison
Subroutines
E15 vs E20
Instruction set
Instructions with three register arguments
add $regDst, $regSrcA, $regSrcB
sub $regDst, $regSrcA, $regSrcB
or $regDst, $regSrcA, $regSrcB
and $regDst, $regSrcA, $regSrcB
slt $regDst, $regSrcA, $regSrcB
jr $reg
Instructions with two register arguments
slti $regDst, $regSrc, imm
lw $regDst, imm($regAddr)
sw $regSrc, imm($regAddr)
jeq $regA, $regB, imm
addi $regDst, $regSrc, imm
Instructions with no register arguments
j imm
jal imm
Pseudo-instructions
movi $reg, imm
nop
halt
Assembler directives
.fill imm
Undefined bit patterns
Examples
Math
Loops
Subroutines
Variables
Hardware implementation
Single-cycle version
Circuit diagram
Control signals
Multicycle version
Circuit diagram
High-level state diagram
Low-level state diagram
Pipelined version
Circuit diagram
Stages
Control modules and wires
Architectural differences
Answers
0No answers posted
Post your Answer - free or at a fee
Login to your account tutor account to post an answer
Posting a free answer earns you +20 points.
LoginTo get help with a similar task to Computer Architecture - Homework 5, Ask for Computer Science Assignment Help online.