The Focus on Microarchitecture

After interviewing a number of recent EE and CS graduates for RTL and DV positions at three companies, I made an observation: there is a very clear gap between what they learn in school and what the industry needs from them. The graduates usually know some Verilog, can write an FSM, can answer a question about STA, and can explain the 5-stage MIPS / RISC-V pipeline or the Tomasulo algorithm. However, many of them struggle in important areas, for example in organizing data processing pipelines with dependencies and flow control. This know-how is essential in diverse areas: networking, where you process a stream of packets arriving back-to-back with occasional backpressure; GPUs, where fixed-function blocks process a stream of colored triangles in the same way; interconnects, where you process a stream of memory read and write requests; and so on.

There are textbooks that partially cover the subject, for example:

Digital Design: A Systems Approach by William James Dally and R. Curtis Harting. This book covers pipelines and double buffering, and it mentions FIFOs and arbitration. However, it does not go far enough; for example, it does not cover credit-based flow control, which large electronics companies widely use instead of double buffering.
Modern System-on-Chip Design by David J. Greaves. This book does cover credit-based flow control, interconnects, and other topics, but it offers no code examples. Students who study it can talk about the ideas but will have a hard time implementing them unless they work on some open projects before joining the industry.
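Because credit-based flow control is one of the gaps mentioned above, here is a minimal cycle-level sketch of the idea. It is written in Python rather than SystemVerilog so the model runs without an HDL simulator. It assumes a single sender, a receiver buffer of buffer_depth entries, and a credit return path with a delay of credit_delay cycles; simulate_credit_link and its parameters are hypothetical names, not taken from either book. The sender starts with one credit per receiver entry, spends a credit for every item it sends, and gets a credit back whenever the receiver frees an entry, so the receiver buffer cannot overflow even under random backpressure.

```python
import random
from collections import deque


def simulate_credit_link(num_items, buffer_depth, credit_delay, pop_probability, seed=0):
    """Cycle-level sketch of credit-based flow control (hypothetical model).

    A sender pushes items 0..num_items-1 into a receiver buffer with
    `buffer_depth` entries. The receiver frees an entry with probability
    `pop_probability` per cycle (random backpressure) and sends one credit
    back per freed entry; a credit needs `credit_delay` cycles to arrive.
    Returns (received_items, max_buffer_occupancy, cycles).
    """
    assert buffer_depth >= 1 and credit_delay >= 1 and pop_probability > 0
    rng = random.Random(seed)
    credits = buffer_depth                   # one credit per free receiver entry
    rx_buffer = deque()
    credit_pipe = deque([0] * credit_delay)  # credits travelling back to the sender
    next_item = 0
    received = []
    max_occupancy = 0
    cycles = 0
    while len(received) < num_items:
        cycles += 1
        # A credit released credit_delay cycles ago reaches the sender now.
        credits += credit_pipe.popleft()
        # Receiver: consume one entry if downstream is ready; freeing it returns a credit.
        freed = 0
        if rx_buffer and rng.random() < pop_probability:
            received.append(rx_buffer.popleft())
            freed = 1
        credit_pipe.append(freed)
        # Sender: send only while holding a credit; no ready signal is consulted.
        if next_item < num_items and credits > 0:
            rx_buffer.append(next_item)
            credits -= 1
            next_item += 1
        assert len(rx_buffer) <= buffer_depth, "receiver buffer overflow"
        max_occupancy = max(max_occupancy, len(rx_buffer))
    return received, max_occupancy, cycles


if __name__ == "__main__":
    items, occupancy, cycles = simulate_credit_link(
        num_items=20, buffer_depth=4, credit_delay=3, pop_probability=0.5)
    print(items == list(range(20)), occupancy, cycles)
```

The buffer depth has to cover the credit round trip. In this model, with buffer_depth=1 and credit_delay=1, the sender waits for a credit after every item and needs two cycles per item even when the receiver never stalls.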

Here is a specific example, a set of interview questions, to show how basic the problem is.
Suppose there is a pipelined block with a fixed latency of N clock cycles that computes an integer square root.

It takes X and a valid signal for X as inputs and, N clock cycles later, outputs √X.
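A minimal cycle-level model of such a block may help make the later questions concrete. It is written in Python rather than SystemVerilog so it runs without a simulator; the class PipelinedIsqrt, its clock() method, and the run() helper are hypothetical names. The model assumes a fully pipelined block that accepts a new X every cycle. It computes math.isqrt() when X enters and then only carries the (valid, result) pair through N stages, whereas real hardware would spread the computation across those stages.

```python
from collections import deque
from math import isqrt


class PipelinedIsqrt:
    """Hypothetical cycle-level model of a pipelined integer square root
    with a fixed latency of `latency` clock cycles (not RTL)."""

    def __init__(self, latency):
        assert latency >= 1
        # Index 0 is the oldest entry, i.e. the one about to leave the pipeline.
        self.stages = deque([(False, 0)] * latency)

    def clock(self, x_valid, x):
        """Model one clock edge: accept (x_valid, x) and return the
        (res_valid, res) pair leaving the last stage."""
        out = self.stages.popleft()
        self.stages.append((x_valid, isqrt(x) if x_valid else 0))
        return out


def run(latency, inputs, extra_cycles):
    """Feed one input per cycle (None means no valid input) and collect
    (cycle, result) for every valid output."""
    dut = PipelinedIsqrt(latency)
    results = []
    for cycle, x in enumerate(list(inputs) + [None] * extra_cycles):
        valid, res = dut.clock(x is not None, x if x is not None else 0)
        if valid:
            results.append((cycle, res))
    return results


if __name__ == "__main__":
    # Back-to-back inputs in cycles 0..3 with N = 4.
    print(run(4, [16, 25, 36, 49], extra_cycles=4))
    # → [(4, 4), (5, 5), (6, 6), (7, 7)]
```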

The first question is: implement a pipelined block that computes the sum of three square roots, √A + √B + √C.

The block should accept three numbers, A, B, and C, every clock cycle and produce the result N clock cycles later.

This question is easy. A student usually proposes right away a design with three pipelined square root blocks instantiated in parallel.
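The parallel solution can be sketched with a hypothetical PipelinedIsqrt model (one input per cycle, result N cycles later). It works because all three instances receive their arguments in the same cycle and have the same latency, so their results leave the pipelines aligned. The sketch assumes the final adder is combinational, which keeps the total latency at N as stated in the question; SumOfThreeIsqrt and run() are illustrative names.

```python
from collections import deque
from math import isqrt


class PipelinedIsqrt:
    """Hypothetical fixed-latency model: one input per cycle, result N cycles later."""

    def __init__(self, latency):
        self.stages = deque([(False, 0)] * latency)

    def clock(self, x_valid, x):
        out = self.stages.popleft()
        self.stages.append((x_valid, isqrt(x) if x_valid else 0))
        return out


class SumOfThreeIsqrt:
    """isqrt(A) + isqrt(B) + isqrt(C) with three instances side by side.
    The adder on the outputs is assumed to be combinational."""

    def __init__(self, latency):
        self.sa = PipelinedIsqrt(latency)
        self.sb = PipelinedIsqrt(latency)
        self.sc = PipelinedIsqrt(latency)

    def clock(self, valid, a, b, c):
        va, ra = self.sa.clock(valid, a)
        vb, rb = self.sb.clock(valid, b)
        vc, rc = self.sc.clock(valid, c)
        assert va == vb == vc  # the three pipelines stay in lockstep
        return va, ra + rb + rc


def run(latency, triples, extra_cycles):
    """Feed one (A, B, C) triple per cycle (None = idle) and collect (cycle, sum)."""
    dut = SumOfThreeIsqrt(latency)
    results = []
    for cycle, t in enumerate(list(triples) + [None] * extra_cycles):
        valid, res = dut.clock(t is not None, *(t if t is not None else (0, 0, 0)))
        if valid:
            results.append((cycle, res))
    return results


if __name__ == "__main__":
    print(run(3, [(1, 4, 9), (16, 25, 36)], extra_cycles=3))
    # → [(3, 6), (4, 15)]
```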

Now let’s change the problem a little and introduce dependencies between the computations.

Here the struggle starts: many students draw a chain of square root instances that take the remaining arguments directly from the block inputs.

At this point, I tell the student that this is not going to work. By the time the first instance of the square root module finishes computing √C, the corresponding value of B is long gone: it was replaced N-1 clock cycles ago by the next value, B1. The student's next attempts usually go like this:

“OK, I will store B in a register.” My answer: this is not going to work, since a single register can store only one value of B. How do you prevent the next value, B1, from being lost?
“Can I assume that ISQRT is a combinational module?” No, this would be too easy.
“Can I use an FSM to drive the module sequentially?” No, this would limit the bandwidth and prevent the block from processing back-to-back transactions, unless the student can show how to use multiple FSMs.
“Can I use the Tomasulo algorithm?” No, this is gross overkill for a simple fixed-latency operation.
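The failure of the first attempt is easy to reproduce in a small model. The exact dependent formula from the question is not spelled out in the text, so the sketch assumes the simplest dependent case, √(B + √C), with one square root instance feeding another; naive_chain, PipelinedIsqrt, and the test values are hypothetical. Because B is taken from the block inputs in the cycle when √C comes out, the second instance combines √C with a B that belongs to a later transaction, or with 0 when the input is idle.

```python
from collections import deque
from math import isqrt


class PipelinedIsqrt:
    """Hypothetical fixed-latency model: one input per cycle, result N cycles later."""

    def __init__(self, latency):
        self.stages = deque([(False, 0)] * latency)

    def clock(self, x_valid, x):
        out = self.stages.popleft()
        self.stages.append((x_valid, isqrt(x) if x_valid else 0))
        return out


def naive_chain(latency, pairs):
    """Broken design for the assumed formula isqrt(B + isqrt(C)).

    `pairs` holds one (B, C) tuple per cycle (None = idle cycle). B is taken
    straight from the block input when isqrt(C) leaves the first instance."""
    first, second = PipelinedIsqrt(latency), PipelinedIsqrt(latency)
    results = []
    # Idle cycles at the end let the last transactions drain out.
    for p in list(pairs) + [None] * (2 * latency):
        b, c = p if p is not None else (0, 0)
        c_valid, sqrt_c = first.clock(p is not None, c)
        # Bug: `b` belongs to the transaction entering now (or is 0 when
        # idle), not to the one whose isqrt(C) is leaving in this cycle.
        valid, res = second.clock(c_valid, b + sqrt_c)
        if valid:
            results.append(res)
    return results


if __name__ == "__main__":
    pairs = [(7, 4), (12, 16), (5, 16), (21, 16)]
    print("naive:    ", naive_chain(2, pairs))
    print("reference:", [isqrt(b + isqrt(c)) for b, c in pairs])
    # naive:     [2, 5, 2, 2]
    # reference: [3, 4, 3, 5]
```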

At this point, I usually ask whether the student learned about shift registers, FIFOs, or circular buffers in the digital design curriculum. Any of these structures can delay an argument until it reaches the appropriate pipeline stage. Some students take the hint immediately and present a solution, while others continue to struggle.
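Here is one possible FIFO-based fix, again for the assumed formula √(B + √C). B is written into a circular-buffer FIFO when its transaction enters and read back in the cycle when the matching √C leaves the first instance. Because the latency is fixed and the pipeline never stalls, at most N values of B are in flight, so a FIFO of depth N is enough, provided it accepts a read and a write in the same cycle when full, as modeled here. CircularBufferFifo, fixed_chain, and PipelinedIsqrt are hypothetical names. If the full formula nests A one level further out, for example √(A + √(B + √C)), A would need a 2N-cycle delay in the same way.

```python
from collections import deque
from math import isqrt


class PipelinedIsqrt:
    """Hypothetical fixed-latency model: one input per cycle, result N cycles later."""

    def __init__(self, latency):
        self.stages = deque([(False, 0)] * latency)

    def clock(self, x_valid, x):
        out = self.stages.popleft()
        self.stages.append((x_valid, isqrt(x) if x_valid else 0))
        return out


class CircularBufferFifo:
    """FIFO on a fixed-size array with read and write pointers."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.depth = depth
        self.rd_ptr = 0
        self.wr_ptr = 0
        self.count = 0

    def pop(self):
        assert self.count > 0, "underflow"
        value = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        self.count -= 1
        return value

    def push(self, value):
        assert self.count < self.depth, "overflow"
        self.mem[self.wr_ptr] = value
        self.wr_ptr = (self.wr_ptr + 1) % self.depth
        self.count += 1


def fixed_chain(latency, pairs):
    """isqrt(B + isqrt(C)) with B delayed in a FIFO of depth N = latency.
    `pairs` holds one (B, C) tuple per cycle (None = idle cycle)."""
    first, second = PipelinedIsqrt(latency), PipelinedIsqrt(latency)
    b_fifo = CircularBufferFifo(latency)
    results = []
    for p in list(pairs) + [None] * (2 * latency):
        b, c = p if p is not None else (0, 0)
        c_valid, sqrt_c = first.clock(p is not None, c)
        # Read the B that entered together with the C leaving now...
        b_delayed = b_fifo.pop() if c_valid else 0
        # ...then store the B of the transaction entering in this cycle.
        if p is not None:
            b_fifo.push(b)
        valid, res = second.clock(c_valid, b_delayed + sqrt_c)
        if valid:
            results.append(res)
    return results


if __name__ == "__main__":
    print(fixed_chain(2, [(7, 4), (12, 16), (5, 16), (21, 16)]))
    # → [3, 4, 3, 5]
```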

The same thing can be done with shift registers, but in many cases this option is less power-efficient: a value in a shift register is rewritten at every stage it passes through, and more switching means more dynamic power. This is another topic worth exploring with students to prepare them better for a position as an RTL designer.
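A rough way to see the difference is to count how many storage bits flip when the same data stream is delayed by a shift register versus written into a circular buffer. This is only a proxy: it ignores clock-tree power, the pointer logic and read multiplexer of the circular buffer, and SRAM versus flip-flop differences. The function names are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")


def shift_register_toggles(values, depth):
    """Storage bit flips when each value moves through a `depth`-stage shift register."""
    regs = [0] * depth
    toggles = 0
    for v in values:
        shifted = [v] + regs[:-1]  # every stage loads the value of its neighbour
        toggles += sum(popcount(old ^ new) for old, new in zip(regs, shifted))
        regs = shifted
    return toggles


def circular_buffer_toggles(values, depth):
    """Storage bit flips when the same values go into a `depth`-entry circular
    buffer: only the entry under the write pointer changes each cycle."""
    mem = [0] * depth
    wr_ptr = 0
    toggles = 0
    for v in values:
        toggles += popcount(mem[wr_ptr] ^ v)
        mem[wr_ptr] = v
        wr_ptr = (wr_ptr + 1) % depth
    return toggles


if __name__ == "__main__":
    # Worst case for the shift register: 8-bit data alternating every cycle.
    data = [0xFF, 0x00] * 8
    print(shift_register_toggles(data, 4), circular_buffer_toggles(data, 4))
    # → 464 16
```

With alternating data, every shift register stage flips on every cycle once the register is full, while the circular buffer keeps rewriting each entry with the value it already holds. For random data, the shift register flips roughly depth times as many storage bits, since each value is rewritten once per stage instead of once.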

The point is this: many students apparently learn shift registers, FIFOs, and other building blocks as abstract constructs and don’t know how to apply them in real design situations. They are also not trained in RTL coding for low power, in banking schemes for embedded SRAMs, or in other microarchitectural topics. Training junior engineers on the job is costly, both in senior engineers' time and in risks to product quality.

To help the situation, a group of industry engineers, students, and academics is working on an open-source project called systemverilog-homework, a set of exercises covering such topics. Part of this activity happens at Verilog meetups in Hacker Dojo in Mountain View, California, every Sunday from 11 am to 2 pm. If you are unable to come to Mountain View, you can join us online over Zoom at the same time. Please send an email to info@verilog-meetup.com and join the Google Group meetsv and the Telegram channel verilog_meetup to introduce yourself and get started.

Please also read the following related articles:

Self-Education And Educating Others, recommendations from RTL designer Yuri Panchul on how to get up to speed in SystemVerilog, with a list of educational projects we have in the queue
A Case Study On Effective Pipeline Design In Digital Systems by Kiran Jayarama, a Doctoral Student at Wright State University