Day 9

Created Wednesday 29 April 2020

We began class by reviewing the homework.

For the rewrite lab, he said he actually wanted us to have two separate programs. One for the main program and another for the subroutine. Also, he wants us to first store then restore registers. I just submitted my lab before this class... I believe I need to rewrite some things.

How do we measure ISAs (Instruction Set Architectures)?
Main memory space occupied by a program, instruction complexity, instruction length (in bits), and total number of instructions in the instruction set.

Aside for assembly

TERMINATE YOUR PROGRAMS! Don't just let it run. He won't take off points but please use good practice!

Designing an ISA

Instruction length. Short, long or variable.
Number of operands.
Number of addressable registers.
Memory organization (is it a byte or word addressable system?)
Addressing modes (direct, indirect, or indexed)

Byte ordering, endianness, is another major architectural consideration.
If we have a two-byte integer, the integer may be stored so that the least significant byte is followed by the most significant byte or vice versa.

In little endian machines, the least significant byte is followed by the most significant. Big endian is the opposite.

For example, the bytes of 0x12345678 are arranged differently in each scheme.
Big: 12 34 56 78
Little: 78 56 34 12

Remember, it's about the byte order and not the digits in the byte!

A computer uses 32-bit integers.
Values 0xABCD1234, 0x00FE4321, and 0x10 would be stored sequentially in memory, starting at address 0x200.
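
To make this example concrete for myself, here is a small Python sketch (mine, not from class) that lays the three values out byte by byte starting at 0x200. The helper name memory_layout is made up for illustration; int.to_bytes is the standard call that does the byte ordering.

```python
def memory_layout(values, start, byteorder):
    """Return {address: byte} for 32-bit values stored back to back."""
    memory = {}
    address = start
    for value in values:
        # to_bytes yields the bytes in the order they would appear in memory
        for byte in value.to_bytes(4, byteorder):
            memory[address] = byte
            address += 1
    return memory


values = [0xABCD1234, 0x00FE4321, 0x10]
big = memory_layout(values, 0x200, "big")
little = memory_layout(values, 0x200, "little")

# One row per byte address, both orderings side by side
for address in sorted(big):
    print(f"{address:#05x}  big: {big[address]:02X}  little: {little[address]:02X}")
```

Address 0x200 holds AB on the big endian machine and 34 on the little endian one. The small value 0x10 still takes a full four bytes: 00 00 00 10 in big endian and 10 00 00 00 in little endian.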

More on Endian

Big endian is more natural for us humans. The sign of the number can be determined by looking at the byte at address offset 0. Strings and integers are stored in the same order.

Little endian makes it easier to place values on non-word boundaries. Conversion from a 16-bit integer address to a 32-bit integer address does not require any arithmetic.
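
My reading of the "no arithmetic" point, as a quick Python sketch (my own interpretation, not something shown in class): in little endian the low 16 bits of a 32-bit integer sit at the same starting address as the integer itself, while in big endian they sit two bytes further along.

```python
value = 0x1234  # fits in 16 bits

little = value.to_bytes(4, "little")  # 32-bit storage, little endian
big = value.to_bytes(4, "big")        # 32-bit storage, big endian

# Read only 2 bytes starting at offset 0 of each 32-bit slot
little_low_half = int.from_bytes(little[0:2], "little")
big_first_half = int.from_bytes(big[0:2], "big")

# In big endian the low half lives at offset 2, so the address must be adjusted
big_low_half = int.from_bytes(big[2:4], "big")

print(hex(little_low_half), hex(big_first_half), hex(big_low_half))
```

So on the little endian layout the same address works for both widths, and on the big endian layout reading at offset 0 gives the (zero) high half instead.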

How the CPU will store data

We have three choices:

stack architecture
accumulator architecture
general purpose register architecture

In choosing one over the other, the tradeoffs are simplicity (and cost) of hardware design with execution speed and ease of use.

In a stack architecture, instructions and operands are implicitly taken from the stack. A stack cannot be accessed randomly.
In an accumulator architecture, one operand of a binary operation is implicitly in the accumulator. The other operand is in memory, creating lots of bus traffic.
In a general purpose register (GPR) architecture, registers can be used instead of memory. This is faster than an accumulator architecture and allows a more efficient implementation for compilers, but it results in longer instructions.

Most systems today are GPR systems. There are three types:

Memory-memory where two or three operands may be in memory.
Register-memory where at least one operand must be in a register.
Load-store where no operands may be in memory.

The number of operands and the number of available registers have a direct effect on instruction length.

Stack machines use one and zero-operand instructions.
LOAD and STORE instructions require a single memory address operand.
Other instructions use operands from the stack implicitly.
PUSH and POP operations involve only the stack's top element.
Binary instructions (e.g. ADD, MULT) use the top two items on the stack.

Stack architecture requires us to think about arithmetic expressions a little differently. We are used to infix notation such as z = x + y

Stack arithmetic requires that we use postfix notation: z = xy+
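
To see how z = xy+ would actually run, here is a toy stack machine in Python. This is only my sketch: the instruction spellings (PUSH x, POP z, ADD, MULT) and the dictionary standing in for memory are assumptions for illustration, not a real ISA.

```python
def run_stack_program(program, memory):
    """Execute a list of instruction strings on a toy stack machine."""
    stack = []
    for instruction in program:
        opcode, *operand = instruction.split()
        if opcode == "PUSH":        # copy a memory value onto the stack
            stack.append(memory[operand[0]])
        elif opcode == "POP":       # move the top of the stack into memory
            memory[operand[0]] = stack.pop()
        elif opcode in ("ADD", "MULT"):
            # Binary instructions take both operands from the stack implicitly
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return memory


memory = {"x": 3, "y": 4}
program = ["PUSH x", "PUSH y", "ADD", "POP z"]  # z = x y +
run_stack_program(program, memory)
print(memory["z"])  # 7
```

Only PUSH and POP name a memory location; ADD has no operands at all, which is why stack machines get away with such short instructions.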

I phased out for a bit because he spent time going over converting from infix to postfix and whatnot.
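
Since I missed that part, here is one standard way to do the infix-to-postfix conversion: a simplified version of Dijkstra's shunting-yard algorithm. This may not be the method he showed in class. It assumes space-separated tokens, only the four left-associative operators below, and parentheses.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(tokens):
    """Convert a list of infix tokens into a list of postfix tokens."""
    output = []
    operators = []  # stack of pending operators and open parentheses
    for token in tokens:
        if token in PRECEDENCE:
            # Emit pending operators that bind at least as tightly
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                output.append(operators.pop())
            operators.append(token)
        elif token == "(":
            operators.append(token)
        elif token == ")":
            # Emit everything back to the matching open parenthesis
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()  # discard the "("
        else:
            output.append(token)  # operand goes straight to the output
    while operators:
        output.append(operators.pop())
    return output


print(" ".join(to_postfix("x + y".split())))          # x y +
print(" ".join(to_postfix("a + b * c".split())))      # a b c * +
print(" ".join(to_postfix("( a + b ) * c".split())))  # a b + c *
```

Note that postfix never needs parentheses: the order of the operators already encodes the grouping.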

Instructions fall into several broad categories

data movement
arithmetic
boolean
bit manipulation
Input/Output (or I/O)
control transfer
special purpose

Addressing modes specify where an operand is located.
They can specify a constant, a register, or a memory location.
The actual location of an operand is its effective address.
Certain addressing modes allow us to determine the address of an operand dynamically.

Immediate addressing is where the data is part of the instruction.
Direct addressing is where the address of the data is given in the instruction.
Register addressing is where the data is located in a register.
Indirect addressing gives the address of the address of the data in the instruction.
Register indirect addressing works like indirect addressing, except that a register (instead of a memory location) holds the address of the data.

By any other name, pointers are actually indirect addresses!
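
A small Python sketch of these modes, to keep them straight. The memory contents, the register value, and the function names are all made up for illustration, and register indirect is modeled the usual way, with the register holding the operand's address.

```python
# Toy machine state (hypothetical values)
memory = {800: 900, 900: 1000, 1000: 500}
registers = {"R1": 800}


def immediate(field):
    return field                      # the field itself is the data


def direct(field):
    return memory[field]              # the field is the address of the data


def indirect(field):
    return memory[memory[field]]      # the field is the address of the address


def register(name):
    return registers[name]            # the data sits in the register


def register_indirect(name):
    return memory[registers[name]]    # the register holds the address of the data


print(immediate(800), direct(800), indirect(800))
print(register("R1"), register_indirect("R1"))
```

With an operand field of 800, immediate gives 800, direct gives 900, and indirect gives 1000. Using R1, register mode gives 800 and register indirect gives 900, the same result as direct addressing with 800 because R1 happens to hold 800.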

Indexed addressing uses a register (implicitly or explicitly) as an offset, which is added to the address in the operand to determine the effective address of the data.
Based addressing is similar except that a base register is used instead of an index register.

The difference between these two is that:
an index register holds an offset relative to the address given in the instruction.
a base register holds a base address where the address field represents a displacement from this base.
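
The arithmetic is the same in both modes; what differs is which piece lives in the instruction and which in the register. A Python sketch with made-up values (an array assumed to start at address 1000):

```python
# Hypothetical array of four values stored at addresses 1000..1003
memory = {1000: 10, 1001: 20, 1002: 30, 1003: 40}


def indexed_address(address_field, index_register):
    # Instruction holds the starting address; register holds the offset
    return address_field + index_register


def based_address(displacement_field, base_register):
    # Register holds the starting address; instruction holds the displacement
    return base_register + displacement_field


ea_indexed = indexed_address(address_field=1000, index_register=2)
ea_based = based_address(displacement_field=2, base_register=1000)

print(ea_indexed, memory[ea_indexed])
print(ea_based, memory[ea_based])
```

Both reach effective address 1002. Indexed fits stepping through an array by bumping the register, while based fits code or data that may be relocated, since only the base register has to change.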

In stack addressing, the operand is assumed to be on top of the stack. There are many variations to these addressing modes including:

indirect indexed
base/offset
self-relative
auto-increment/auto-decrement

We won't be going into these in detail.

Instruction Pipelining

Some CPUs divide the fetch-decode-execute cycle into smaller steps. These smaller steps can often be executed in parallel to increase throughput. Such parallel execution is called instruction pipelining. Instruction pipelining provides for Instruction Level Parallelism (ILP).

Suppose a fetch-decode-execute cycle were broken into the following smaller steps:

Fetch instruction
Decode opcode
Calculate effective address of operands
Fetch operands
Execute instruction
Store result

For every clock cycle, one small step is carried out, and the stages are overlapped.

The theoretical speedup offered by a pipeline can be determined as follows:
Let t_p be the time per stage. Each instruction represents a task, T, in the pipeline.
The first task (instruction) requires k * t_p time to complete in a k-stage pipeline. The remaining (n - 1) tasks emerge from the pipeline one per cycle. So the total time to complete the remaining tasks is (n-1)*t_p.
Thus, to complete n tasks using a k-stage pipeline requires:

(k * t_p) + (n - 1) * t_p = (k + n - 1) * t_p
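
To get from this total time to an actual speedup number, here is a Python sketch. It assumes the usual comparison point: without pipelining, every instruction passes through all k stages with no overlap, so n tasks take n * k * t_p. The function names are mine.

```python
def pipelined_time(n, k, t_p):
    """Time for n tasks on a k-stage pipeline: (k + n - 1) * t_p."""
    return (k + n - 1) * t_p


def unpipelined_time(n, k, t_p):
    """Assumed baseline: each task takes k * t_p and nothing overlaps."""
    return n * k * t_p


def speedup(n, k, t_p=1):
    return unpipelined_time(n, k, t_p) / pipelined_time(n, k, t_p)


# Six-stage pipeline like the one listed above
for n in (1, 10, 100, 10_000):
    print(n, pipelined_time(n, 6, 1), round(speedup(n, 6), 3))
```

A single instruction gets no speedup at all, and as n grows the speedup approaches k (here 6), since (n * k) / (k + n - 1) tends to k for large n.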

This is a sunny day scenario though. Pipeline hazards arise that cause pipeline conflicts and stalls.

An instruction pipeline may stall, or be flushed for any of the following reasons:

Resource conflicts
Data dependencies
Conditional branching

MIPS

MIPS was an acronym for Microprocessor without Interlocked Pipeline Stages.
The architecture is little endian and word-addressable with three-address, fixed-length instructions.
Like Intel, the pipeline size of the MIPS processors has grown: the R2000 and R3000 have five-stage pipelines; the R4000 and R4400 have 8-stage pipelines.

Java

The Java programming language is an interpreted language that runs in a software machine called the Java Virtual Machine (JVM).
A JVM is written in a native language for a wide array of processors, including MIPS and Intel.
Like a real machine, the JVM has an ISA all of its own, called bytecode.
This ISA was designed to be compatible with the architecture of any machine on which the JVM is running.
Bytecode is portable. It can go anywhere.

Compile-time Environment (technically interpreted)

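.java source files go through the Java compiler, which produces program class files (.class) containing the actual bytecode. It is a real compiler, but it targets bytecode rather than native machine code; the interpreting happens later, when the JVM runs that bytecode.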

Run-time Environment

The JVM uses a class loader, which pulls in the program's class files along with the Java API files. These are then passed to the execution engine.

Java bytecode is a stack-based language.
Most instructions are zero-address instructions.
The JVM has 4 registers that provide access to 5 regions of main memory.
All references to memory are offsets from these registers. Java uses no pointers or absolute memory references.
Java was designed for platform interoperability, not performance!

ARM

You may not have heard of ARM, but you most likely use an ARM processor every day. It is the most widely used 32-bit instruction set architecture.
95%+ of smartphones
80%+ of digital cameras
40%+ of all digital television sets
Founded in 1990 by Apple and others, ARM (Advanced RISC Machine) is now a British firm, ARM Holdings.
ARM Holdings does not manufacture these processors; it sells licenses to manufacture.

ARM is a load/store architecture: all data processing must be performed on values in registers, not in memory.
It uses fixed-length, three-operand instructions and simple addressing modes.
ARM processors have a minimum of a three-stage pipeline (consisting of fetch, decode, and execute).
Newer ARM processors have deeper pipelines (more stages). Some ARM implementations have 13-stage integer pipelines.

ARM has 37 total registers but their visibility depends on the processor mode.
ARM allows multiple register transfers.
It can simultaneously load or store any subset of the 16 general-purpose registers from/to sequential memory addresses.
Control flow instructions include unconditional and conditional branching and procedure calls.
Most ARM instructions execute in a single cycle, provided there are no pipeline hazards or memory accesses.

ISAs are distinguished according to their bits per instruction, number of operands per instruction, operand location, and types and sizes of operands.

Endianness is another major architectural consideration.
The CPU can store data based on:

A stack architecture
An accumulator architecture
A general purpose register architecture

Instruction sets are differentiated by the following:

Number of bits per instruction
Stack-based or register-based
Number of explicit operands per instruction
Operand location
Types of operations
Type and size of operands