University of California at Berkeley

## College of Engineering

Computer Science Division --- EECS
Spring 1998J. Wawrzynek

## CS152 Computer A rchitecture and Engineering Midterm II

Your Name: $\qquad$

ID Number: $\qquad$

This is a closed-book, closed-note exam. No cal culators. You have 3 hours.
Each question is marked with its number of points (one point per expected minute of time).
Show your work. Write neatly and be well organized.
Good luck!

1. [30 points] Short answers. Please provide short answers to the following questions.
a. [1 points] True or False. Microprogramming is an effective controller design abstraction for RISC processors.
b. [1 points] "Vertical" microcode refers to encoded microcode. The unencoded version is referred to as $\qquad$ .
c. [1 points] True or False. Superpipelining and Superscalar are both modern techniques for reducing CPI to less than 1.
d. [1 points] DRAM capacity multiplies by 4X approximately every $\qquad$ years.
e. [1 points] True or False. "Page mode" is a style of RAM that speeds access by eliminating refresh.
f. [1 points] True or False. A "write through" cache requires dirty bits.
g. [1 points] True or False. A verage hard-disk seek times are on the order of 8-12us.
h. [1 points] In IO systems, device "polling" is a low-cost alternative to $\qquad$ .
i. [2 points] Name the three "C"s of cache misses:
j. [2 points] What is the ideal speedup of a pipelined processor over a single cycle processor?
k. [2 points] List two techniques for handling data hazards.
2. [2 points] Sketch a typical curve showing latency versus percentage of maximum throughput for a IO system.
m. [2 points] Draw the transistor level circuit diagram for a CMOS dynamic RAM cell:
n. [3 points] Draw the transistor level circuit diagram for a CMOS static RAM cell:
o. [4 points] Sketch a simple diagram showing a system that overlaps TLB and cache lookup. Draw a box for each of the CPU, TLB, Cache, and Main Memory.
p. [5 points] A processor has an ideal CPI=2.0 and executes with a clock cycle of 10ns. Assuming
a instruction mix including 30\% load/ store instructions, a unified cache hit rate of $90 \%$ and a miss penalty of 10 cycles, what is the real CPI?
3. [15 points] Consider the design of 3 different caches, one direct mapped (DM ), one 2- way set associative (SA ), and one fully associative (FA ). Each has 8-bit addresses (word addressing), 2 words/ block, and a total capacity of (only) 8 words.
a. In the space below, for each type, label the address fields for the use as index and/ or tag by the cache. Indicate the number of bits for each.


SA


FA

b. Assume initially the caches are empty (all invalid entries). For each of the read addresses listed in the table below, fill in a 0 or 1 for each cache type, indicating a hit (1) or a miss (0). Use a lest-recently-used (LRU) replacement policy.
c. Assuming a hit time of 1 cycle and a miss penalty of 20 cycles, compute the number of cycles for each cache to process the given address stream. Record you answer in the table.

| address | 0 | 2 | 4 | 6 | 8 | 0 | 2 | 4 | 6 | 8 | 1 | 2 | 3 | 4 | 5 | 10 | \# of cycles |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |


| DM |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| SA |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| FA |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

## 3. [20 points]

Consider the design of the processor with a 7-stage pipeline depicted below:

| IF1 | IF2 | ID | EX | M1 | M2 | WB |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |

The pipeline is similar to the standard 5 stage pipeline with the exception that the icache and dcache are both pipelined with a latency of 2 cycles. As usual, the branch compare is calculated and the new PC loaded during the ID stage.

Assume the processor uses bypassing (forwarding) when necessary and possible.
a. Show the execution of one iteration of the following loop on the pipeline. Use the grid to illustrate the execution. In addition to one iteration, show the execution of one instruction from the next iteration of the loop, to illustrate when the next iteration would begin. A s usual, instruction issue slots move from top to bottom and time from left to right.

## D o not reorder the instruction.

loop: | lw | $\$ 2$, | $0(\$ 4)$ |
| :--- | :--- | :--- |
|  | add | $\$ 1, \$ 1, \$ 2$ |
|  | add | $\$ 5, \$ 5, \$ 1$ |
|  | addi | $\$ 4, \$ 4, \$ 4$ |
|  | addi | $\$ 3, \$ 3,-1$ |
|  | bgez | $\$ 3$, loop |
|  | nop |  |
|  | nop |  |
|  | nop |  |

lw \$2, 0(\$4)

| IF1 | IF2 | ID | EX | M1 | M2 | WB | X | X | X | X | X | X | X | X | X | X |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |

Write down the number of instruction issue slots for one iteration here:
b. Now, assume that you can reorder instructions within one iteration of the loop in an attempt to decrease the CPI. However, your reordered loop must retain the same function as the original loop.

Show execution of your reordered loop below.
lw \$2, 0(\$4)

| IF1 | IF2 | ID | EX | M1 | M2 | WB | X | X | X | X | X | X | X | X | X | X |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |

Write down the number of instruction issue slots for one iteration here:
c. Assuming that the loop executes for a very large number of iterations, what is the average CPI for each part? (Ignore the time it takes to fill the pipeline on the first iteration of the loop and the time it takes to drain the pipeline on the final iteration.)

Original order CPI =

Reordered CPI =
4. [35 points] This problem deals with a multi-cycle processor shown below:


Assume that a memory access can be completed in less than one cycle.
Each control signal is highlighted with a gray box. The signals beginning with "Id_" are register load enable signals and are active high. The write enable signals for the register file and the memory are also active high.

All corner signals are generated from a controller shown in the lower right hand corner. This is a microcode based controller with a section of microcode for each instruction. Each line of microcode includes a set of bits for driving the appropriate control signal and 3 bits used as the low-order bits of
the next microcode line. The 3 bit field is labeled "next". The high-order bits of the microcode address always come from the 3 opcode bits of the instruction register (IR).

For this problem we are interested in the instructions described in the table below, where "R", "M", "s_ext", and "imm", stand for "register file", "MEM ORY", "sign_extend", and immediate respectively.

| Instruction | Name | Description | Op |
| :---: | :---: | :---: | :---: |
| Logical NOR | NOR $\mathrm{x}, \mathrm{y}, \mathrm{z}$ | $\mathrm{R}[\mathrm{x}] \leftarrow \mathrm{R}[\mathrm{y}]$ NOR $\mathrm{R}[\mathrm{z}]$ | 0 |
| Load word | LDW $x, y$, imm | $\mathrm{R}[\mathrm{rx}] \leftarrow \mathrm{M}[\mathrm{R}[\mathrm{y}]+\mathrm{s}$.ext(imm)] | 1 |
| Store word | STW $\mathrm{x}, \mathrm{y}, \mathrm{imm}$ | $\mathrm{M}[\mathrm{R}[\mathrm{y}]+\mathrm{s}$.ext $(\mathrm{imm})] \leftarrow \mathrm{R}[\mathrm{x}]$ | 2 |
| Branch < 0 | BNEG $x, y$, imm | IF R $[\mathrm{x}]<0, \mathrm{PC} \leftarrow \mathrm{R}[\mathrm{y}]+\mathrm{s}$ _ext (imm) | 3 |
| Add to Memory | ADDM $\mathrm{x}, \mathrm{y}$,imm | $\mathrm{M}[\mathrm{R}[\mathrm{y}]+\mathrm{s}$.ext(imm) $] \leftarrow \mathrm{R}[\mathrm{x}]+\mathrm{M}[\mathrm{R}[\mathrm{y}]+\mathrm{s}$ _ext $(\mathrm{imm})]$ | 4 |

The table below represents the first 40 entries of the microcode ROM. The microcode line address is shown as a base-8 number. The first column of the table holds the 30bit "next" field. The remaining columns hold the state of the control signals.

Fill in the microcode table below for the above instructions. Use a base-8 number for the next column and 1's and 0's for the others. You do not need to fill in the 0 entries--any square left blank will be assumed to bea 0 .

Try to minimize the number of cycles for each instruction, and don't forget instruction fetch.

\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline microread address \& next \& 源 \& 馬 \& S

cos

\％ \& $$
\begin{aligned}
& \stackrel{\rightharpoonup}{\infty} \\
& \stackrel{1}{1}
\end{aligned}
$$ \& \[

$$
\begin{aligned}
& \overline{\Phi_{1}} \\
& \infty_{1}
\end{aligned}
$$

\] \& \[

\stackrel{a_{1}}{\square}

\] \& \[

$$
\begin{aligned}
& \Omega^{\prime} \\
& z^{\prime}
\end{aligned}
$$

\] \& \[

$$
\begin{aligned}
& \Xi \\
& \text { ت } \\
& \text { B } \\
& \text { 号 }
\end{aligned}
$$

\] \& \[

$$
\begin{aligned}
& \dot{D} \\
& \frac{D}{5} \\
& \frac{3}{4}
\end{aligned}
$$

\] \& \[

\frac{4}{3}

\] \& \[

$$
\begin{aligned}
& \mathbf{J}_{1} \\
& 0_{1}
\end{aligned}
$$

\] \& \[

$$
\begin{aligned}
& 0 \\
& 0 \\
& 0
\end{aligned}
$$

\] \& \[

$$
\begin{aligned}
& =1 \\
& \Xi
\end{aligned}
$$
\] \& $\stackrel{5}{3}$ \& 说 \&  \& 砢 \&  <br>

\hline 00 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 01 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 02 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 03 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 04 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 05 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 06 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 07 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 08 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 09 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 10 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 11 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 12 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 13 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 14 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 15 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 16 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 17 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 18 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 19 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 20 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 21 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 22 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 23 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 24 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 25 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 26 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 27 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 28 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 29 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 30 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 31 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 32 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 33 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 34 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 35 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 36 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 37 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 38 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 39 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 40 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 41. \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 42 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 43 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 44 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 45 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 46 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline 47 \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& \& <br>
\hline
\end{tabular}

Posted by HKN (Electrical Engineering and Computer Science Honor Society)
University of California at Berkeley
If you have any questions about these online exams please contact mailto:examfile@hkn.eecs.berkeley.edu

