Last week I gave a seminar on SystemVerilog, ASIC and FPGA at ADA University in Baku, Azerbaijan. I will repeat the last two sessions of this seminar, on RISC-V CPU simulation and synthesis, at the Verilog Meetups on March 3 and March 10 at Hacker Dojo, Mountain View, California. For this reason I am combining the information about the Azerbaijan and California seminars in a single post.

First, let’s talk about ADA University.

The state of affairs before the seminar in Azerbaijan

The ADA University already has an EE and CS curriculum that includes:

  1. An introduction to digital logic design that uses Harris & Harris as a textbook. The students already know the concepts of gates, D flip-flops, muxes, decoders, priority encoders, counters, shift registers and carry-chain adders. Some students also knew the concept of a finite state machine (FSM) from another class (on discrete mathematics?). However, the existing class neither covered hardware description languages (HDLs) nor had FPGA-based laboratory exercises. Another notable omission was static timing analysis: the students knew the function of a D flip-flop and how it can be used to create counters, shift registers and FSMs, but did not know how to calculate the maximum clock frequency from the propagation delays and setup/hold times of a circuit.
  2. An introduction to computer architecture and CPU microarchitecture using materials from ARM. This material included a CPU pipeline demonstration using a software microarchitecture-level simulator. Some students studied assembly language. This class could obviously be complemented by exercises with a CPU on an FPGA board and by synthesizing a CPU for ASIC using the Open Lane open-source toolchain. ARM has FPGA-based educational material called DesignStart, but I believe it is oriented toward using an existing Cortex-M IP rather than motivating the students to design their own CPU. For the latter it would be beneficial to let the students play with RISC-V cores, since this is where the world’s academia is going.
  3. A class on computer graphics. This can be useful to connect with projects like open-source Vortex GPGPU at Georgia Institute of Technology. In addition, ADA University is planning to have a joint program on high-performance computing with George Washington University.
  4. Students know how to program in C/C++ and use Microsoft Visual Studio Code (VS Code), the environment we use during the seminars to edit, simulate and synthesize SystemVerilog designs. However, students did not know how C is translated into assembly language; adding such information to the curriculum would benefit their understanding of the software stack.
  5. Only a few students were exposed to Linux. While this does not hurt our seminar, since we use scripts and FPGA tools compatible with both Windows and Linux, it would be beneficial to add a basic Linux class with some Bash scripting (if they don’t have it already), for two reasons: 1) Many US universities do Linux-based open-source projects. 2) The EDA tools for ASIC design are all Linux-based, both commercial and open-source.
  6. Equipment-wise, ADA University is well equipped with new laboratories and computers donated by Huawei. It also has 10 Altera Cyclone IV-based FPGA boards (ALINX AX4010), suitable for introductory exercises and for synthesizing small microcontroller-class CPU cores. Unfortunately, ALINX AX4010 boards are usually sold without a programmer (USB Blaster). Being aware of this, I asked ADA University whether they had USB Blasters for their boards. They didn’t, but luckily I had 15 extra USB Blasters at home, so the potential disaster (an FPGA seminar without working FPGA boards) was discovered and prevented in time. In addition, I brought several other FPGA boards: 1) Altera-based Omdazz, Saylinx, Terasic DE0-Nano, DE0-CV and DE10-Lite boards (all popular at Russia’s School of Synthesis of Digital Circuits) and a neat Altera-based board from emooc.cc. 2) Two non-Altera boards: Basys3, a standard inexpensive Xilinx board with an Artix-7 FPGA, and the Tang Prime 20K Lite board with a Gowin FPGA. I consider it useful for any university to have expertise with all the popular toolchains to avoid vendor lock-in. I would also recommend using multiple toolchains as a base for student projects and presentations so the students can compare the vendors and share their opinions.
  7. Last but not least, ADA University has courses and projects in robotics, sound processing, DSP and other related topics. In particular, a combination of an FPGA board with microphones and I2S sound generators can be used for various experiments beyond the scope of digital design and computer architecture.
A seminar on SystemVerilog, ASIC & FPGA at ADA University in Baku, Azerbaijan

During the seminar

The mode of the seminar was as follows: I was available for five days, Monday through Friday, from 10 am till 5 pm with a lunch break from 1 pm to 2 pm. Most students had other classes in parallel, so some of them could come for only 3 hours one day, 3 hours another day, and so on. Overall the seminar was attended by over 20 students, with 6-12 students present at any given moment.

This mode dictated the following strategy:

  1. I needed to explain the topics two or three times to different groups of students.
  2. The students who had already learned a topic helped other students get up to speed.
  3. When I had two groups of students with different levels of knowledge, I gave the more advanced group exercises from systemverilog-homework to work on, while explaining to the novice group a topic that was new to them (such as the SystemVerilog syntax for combinational or sequential logic).

I was using the following materials:

  1. A presentation I prepared for the event: The First Step in Digital Design using SystemVerilog and FPGA – the Seminar at ADA University, Azerbaijan.
  2. A repository of synthesizable examples, basics-graphics-music, which we developed together with other contributors to the School of Synthesis of Digital Circuits.
  3. The already mentioned repository systemverilog-homework, which we created with Mike Kuskov from Innopolis. Some problems from the first two parts originated from Digital Logic RTL & Verilog Interview Questions by Trey Johnson.
  4. schoolRISCV developed by Stanislav Zhelnio, an RTL engineer from Syntacore, with contributions from Alexander Romanov from HSE MIEM.
  5. The slides from Digital Design and Computer Architecture, RISC-V Edition by David Harris & Sarah Harris downloadable from DMK Press, the publisher of its Russian edition.

We started the seminar with basic exercises on combinational and sequential logic on FPGA boards, then connected these topics to generating video graphics and to sound recognition, and finally worked with a minimalistic RISC-V-subset CPU core in the way I describe in the second part of this article.

The students have fun generating graphics on a portable display

At the end of the week, after the exercises with FPGA boards and simulations, we watched a presentation by Jørgen Kragh Jakobsen from IC Works, an Open Source Chip Design consulting company. Mr. Jakobsen explained the project called Tiny Tapeout, which allows students and hobbyists to have their own chip made at a foundry by combining many designs in a single chip and using open-source ASIC design tools. This looks like a realistic way for ADA University to make its first ASIC (or maybe even the first ASIC in Azerbaijan?):

Also see a post of Jørgen Kragh Jakobsen on LinkedIn.

An interesting side observation: the most frequent mistake students made in SystemVerilog during the seminar was mistyping a signal name in a module instantiation. IntelFPGA Quartus does not flag it as an error; instead, following the Verilog rule for implicit nets, it quietly creates a dangling wire. After the trip I added to the config.svh header file the “`default_nettype none” directive suggested by Dmitry Smekhov to direct a synthesis tool to generate an explicit error.

The follow-up action items

  1. I would expect ADA University to incorporate the exercises with FPGA boards into their curriculum for both digital design and computer architecture classes.
  2. The students who want to continue mastering SystemVerilog can work through systemverilog-homework and participate in the online meetings I hold on Sundays. I usually have Zoom calls at 9 am California time (21:00 Baku time) for a Russian-speaking audience and at 10 am California time (22:00 Baku time) for an English-speaking audience. Those Zoom calls are in addition to the meetings from 2 pm to 5 pm at Hacker Dojo in California (where I also have a Zoom connection), which would be 2 am to 5 am in Baku and is therefore impractical. The ultimate outcome of those meetings is not only the education of the participants: I also expect the participants to create new materials (examples, documentation, slides) for future students of Verilog, ASIC, FPGA and computer architecture.
  3. The faculty at ADA could assign some students a project to run and document the example flow not only for each Altera-based board, but also for the Xilinx- and Gowin-based boards I left with them. Creating such slides and documentation is a nice exercise for the students, and it helps future students as well.
  4. In addition to the regular FPGA boards for the students, I also left at ADA University 15 small bare-bones boards with an Altera MAX-II PLD that can be used for simplified exercises aimed at math-school children aged 12 to 16, see (12). Such exercises can be evaluated at ADA University as well and can be used for a summer school in Baku.
  5. It is important to study the Tiny Tapeout infrastructure and possibly create an ASIC using it. Alternatively it is possible to work with Europractice IC or Google / eFabless but this might be more difficult than working with Tiny Tapeout.

Here is the recommended literature for the study:

  1. Digital Design and Computer Architecture, RISC-V Edition by Sarah Harris and David Harris
  2. Digital Design: A Systems Approach Illustrated Edition by William James Dally and R. Curtis Harting
  3. Logic Design and Verification Using SystemVerilog (Revised, 2016) by Donald Thomas

You can get some more recommendations in the post Verilog Meetups @ Hacker Dojo: the status and the plans for February 2024.

The last session in Azerbaijan and the next session in California

Now let’s discuss the last session (“Around CPU design”) the ADA University students did in Azerbaijan. I plan the same topic as the next session in Hacker Dojo in Silicon Valley, California.

We start by running simple RISC-V assembly programs in the RARS Instruction Set Simulator (ISS). The fastest way to learn the assembly language is to read a book in parallel with running such exercises.

This activity can be augmented by compiling short C programs into assembly language using the RISC-V gcc toolchain with the -S option. In this way we can see how the different C constructs (loops, ifs, function calls) are implemented in assembly.

I put RARS, the gcc toolchain and the schoolRISCV repository, with a couple of assembly examples, on the bootable SSDs with the Simply Linux distribution that I brought to ADA University. I have the same SSDs in California. Below is one of the examples, the Fibonacci numbers. a0 (register number 10) is used as an output (it will be connected to a seven-segment display on an FPGA board in the follow-up exercises).

This can be invoked by:

cd ~/projects/schoolRISCV/program/01_fibonacci
make rars

The RARS simulator has the simplest interface possible. It has just three key commands: F3 – compile, F5 – run and F7 – step. When you compile with F3, you can see how the pseudo-instruction “li” (load immediate) is expanded in different ways, into either one or two instructions, depending on the value of its argument.
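
The one-or-two-instruction behavior follows from the 12-bit signed immediate of addi: a value that fits is loaded with a single addi, and anything larger needs lui for the upper 20 bits. The sketch below models this usual RV32I rule in Python. The function name expand_li is made up for illustration, and the exact instructions RARS shows after F3 may differ in detail (for example, it may keep the addi even when the lower 12 bits are zero).

```python
from typing import List


def expand_li(rd: str, value: int) -> List[str]:
    """Model the usual RV32I expansion of the pseudo-instruction `li rd, value`."""
    if not -(1 << 31) <= value < (1 << 32):
        raise ValueError("value does not fit in 32 bits")
    value &= 0xFFFFFFFF  # work with the raw 32-bit pattern
    signed = value - (1 << 32) if value & 0x80000000 else value

    # Case 1: the value fits into the 12-bit signed immediate of addi.
    if -2048 <= signed <= 2047:
        return [f"addi {rd}, zero, {signed}"]

    # Case 2: lui loads the upper 20 bits, addi adds the lower 12 bits.
    lower = value & 0xFFF
    if lower >= 0x800:
        lower -= 0x1000  # addi sign-extends, so the upper part must compensate
    upper = ((value - lower) >> 12) & 0xFFFFF
    instructions = [f"lui {rd}, {hex(upper)}"]
    if lower != 0:
        instructions.append(f"addi {rd}, {rd}, {lower}")
    return instructions


for example in (1, -1, 0x1000, 0x12345678, 0x12345FFF):
    print(hex(example & 0xFFFFFFFF), "->", expand_li("a0", example))
```

The last example shows the less obvious case: because addi sign-extends its immediate, a lower part of 0xFFF is encoded as -1 and the lui value is incremented by one to compensate.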

Then you step with F7 and observe how the program is executed inside the instruction set simulator. Yellow highlighting corresponds to the current instruction and green highlighting to a changed register:

Then we simulate the schoolRISCV CPU at the Register Transfer Level (RTL) in the Icarus Verilog simulator. Before doing so we need to generate a hexadecimal file, program.hex, from the assembly file, using the same RARS program but in command-line mode:

make program.hex
make icarus

After simulation, we can invoke the GTKWave waveform viewer and add the signals: clock, reset, program counter, the instruction memory address, the instruction coming from the memory and register 10 (“a0”). Now we can see the simulated CPU in action, running the program that generates Fibonacci numbers:
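
When reading the waveform, it can help to have the expected a0 values at hand. The following is a minimal Python reference sketch, not part of the schoolRISCV repository. It assumes the program starts the sequence at 0 and 1 and that additions wrap around at 32 bits, as they would in a 32-bit register; the function name fibonacci_32bit is made up for illustration.

```python
from itertools import islice


def fibonacci_32bit(first=0, second=1):
    """Yield Fibonacci numbers as a 32-bit register would hold them."""
    a, b = first, second
    while True:
        yield a
        # Assumption: the adder wraps around modulo 2**32 on overflow.
        a, b = b, (a + b) & 0xFFFFFFFF


# Print the first values in hexadecimal, the radix GTKWave uses by default.
expected = list(islice(fibonacci_32bit(), 10))
print([format(value, "08x") for value in expected])
```

If the values of a0 in the waveform diverge from this list, the first mismatch is a good place to start looking at the program counter and the fetched instruction.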

Now it is time to synthesize the schoolRISCV core to run it on an FPGA board.

It is convenient to use Microsoft Visual Studio Code (VSCode) to work with the schoolRISCV, basics-graphics-music and systemverilog-homework repositories. VSCode is available under both Linux and Windows and even on Apple macOS (although we cannot run FPGA synthesis on macOS). We use a terminal inside VSCode to run make commands and bash scripts.

schoolRISCV has already been ported to a number of FPGA boards; however, this number is smaller than the number of boards supported by the basics-graphics-music infrastructure. In addition, porting the core to this infrastructure is a good exercise for students because they learn how to connect the CPU core to an instruction memory and the peripherals.

We start by creating a new example (30_schoolriscv) in the ~/projects/basics_graphics_music/labs directory and copying all the scripts, top.sv and tb.sv from another lab, such as a counter or a shift register. Then we copy into this directory the following files from ~/projects/schoolRISCV/src: sm_register.v, sm_rom.v, sr_cpu.v and sr_cpu.vh. Note: we don’t copy sr_top.sv and sm_hex_display.v because we already have their functionality inside basics_graphics_music.

Before we synthesize the resulting project the first time, we need to run a script 06_choose_another_fpga_board.bash. It is invoked in Linux as “./06_choose_another_fpga_board.bash” and under Windows as “bash 06_choose_another_fpga_board.bash”. The script shows a menu and you need to enter a number that corresponds to a board you have:

You need to add the following to top.sv:

We can synthesize the code by running a script 03_synthesize_for_fpga.bash.

You can also invoke the corresponding synthesis GUI by running the script 05_run_gui_for_fpga_synthesis.bash. The GUI of each commercial FPGA toolchain we support is useful because it can show the schematics generated from our code at different stages (compilation, mapping, place & route).

For example, here is the schematic of the top module in the intelFPGA Quartus GUI after compilation and elaboration but before mapping to the cells of the specific FPGA. It features the instances of the CPU, the instruction memory and the seven-segment display:

If we click on the sr_cpu instance, we see the instances of its submodules: decoder, register file, ALU and control:

After running the synthesis, the script 03_synthesize_for_fpga.bash also configures the FPGA board. If you did not connect the board at first, the script fails, but you can configure the board without running the synthesis by invoking 04_configure_fpga.bash:

Now you can see the CPU running on an FPGA board with a slow 1 Hz clock:

We can also synthesize the schoolRISCV CPU for ASIC using the Open Lane toolchain. We did not do this in Azerbaijan, but we can do it in California. I had to change the scripts to do it, but during the upcoming seminar we will run ASIC synthesis by invoking the script 07_synthesize_for_asic.bash.

Once synthesis succeeds, we can run 08_visualize_asic_synthesis_results_1.bash to see the results in a GUI:

Let’s zoom in:

In addition to the Python-based OpenRoad viewer, there is also a KLayout viewer:

We can also zoom in here:

Finally I need to note that the students should analyze the Quality of Results (QoR) reports (area, timing, power) and especially the reports from the Static Timing Analysis (STA) for both ASIC and FPGA.
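
To connect the STA reports to the textbook theory, the basic setup and hold constraints in the form used by Harris & Harris can be evaluated by hand. The Python sketch below is only an illustration of those two inequalities: the function names are made up, the delay values are arbitrary, and a real STA tool checks every path with many more effects (clock uncertainty, process corners, routing delays).

```python
def min_clock_period_ns(t_pcq, t_pd, t_setup, t_skew=0.0):
    """Setup constraint: Tc >= clock-to-Q + logic propagation + setup (+ skew)."""
    return t_pcq + t_pd + t_setup + t_skew


def max_frequency_mhz(t_pcq, t_pd, t_setup, t_skew=0.0):
    """Maximum clock frequency in MHz for delays given in nanoseconds."""
    return 1000.0 / min_clock_period_ns(t_pcq, t_pd, t_setup, t_skew)


def hold_is_met(t_ccq, t_cd, t_hold, t_skew=0.0):
    """Hold constraint: the shortest path must keep D stable for t_hold (+ skew)."""
    return t_ccq + t_cd >= t_hold + t_skew


# Illustrative numbers only, not taken from any real report.
print(max_frequency_mhz(t_pcq=0.5, t_pd=3.0, t_setup=0.5), "MHz")
print("hold met:", hold_is_met(t_ccq=0.2, t_cd=0.3, t_hold=0.1))
```

Note that the setup constraint limits the clock frequency, while the hold constraint does not depend on the clock period at all, which is why a hold violation cannot be fixed by slowing the clock down.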

This setup (simulation on ISS and RTL level, compilation using RISC-V GCC toolchain, synthesis for ASIC and FPGA) can be used as a base for a number of exercises. Some of them are listed in Verilog Meetups @ Hacker Dojo: the status and the plans for February 2024:

    1. CPU: Adopt schoolRISCV into the basics-graphics-music / ready-valid-etc infrastructure.
    2. CPU: Add a problem to solve: modify schoolRISCV by adding a multiplication instruction support using a combinational multiplier in Verilog. Change the CPU instruction decoder, ALU, test, and testbench to verify the result. Demonstrate the difference in maximum clock frequency using Open Lane ASIC synthesis and FPGA synthesis tools.
    3. CPU: Add a problem to solve: modify schoolRISCV by adding a multiplication instruction support using a pipelined multiplier with the latency of two clock cycles. Two variants: using a stall and using an optimized pipelined CPU implementation.
    4. CPU: Add a problem to solve: modify schoolRISCV to support an instruction memory with zero, one, or two-cycle latency.
    5. CPU: Adopt MIRISCV core into the basics-graphics-music / ready-valid-etc infrastructure.
    6. CPU: Create an example of three MIRISC cores sharing the same memory using simple arbitration. Then make this memory a multi-bank memory. Demonstrate bank conflicts and the performance gain on different memory access patterns.
    7. CPU: Create an example of two MIRISC cores exchanging information with each other using FIFOs connected to memory-mapped registers (gated storage). We can discuss this mechanism in some sessions.
    8. CPU: Connect a cache from the appendix to the Patterson-Hennessy textbook (5th Edition) to the MIRISC core and demonstrate the performance changes with different memory access patterns.
    9. CPU: Adopt YRV-Plus RISC-V core, described in Inside an Open-Source Processor – July 1, 2021 by Monte Dalrymple, into the basics-graphics-music / ready-valid-etc infrastructure. Prepare a presentation that compares and contrasts this core against another Monte Dalrymple’s core that follows the microarchitecture before the RISC revolution and is described in Microprocessor Design Using Verilog HDL Paperback – 2017 by Monte Dalrymple.
    10. CPU: Research and possibly put into basics-graphics-music / ready-valid-etc infrastructure a CPU called CORE-V Wally. It is associated with a textbook and is promoted by David Harris.
    11. CPU, long-term: Branch prediction examples.
    12. CPU, long-term: Cache coherency protocol examples: MSI, MESI, MOESI, snooping-based, directory-based.

Finally, I would like to say “Thank you” to the faculty and the students of ADA University who made our seminar in Azerbaijan possible and useful:

Abzatdin Adamov
Fuad Hajiyev
Wisam Al-Dayyeni
Elman Karimli
Nariman Vahabli
Orkhan Karimzada
Pasha Pashazada
Ayan Mammadzada

Arif Mammadli
Asra Mammadli
Aykhan Aghayev
Huseynali Sadikhov
Jamil Aliyev
Leyla Neymat
Nərgiz Əmiraslanli
Said Akhadov
Sardar Ziyatkhanov
Turkan Hajiyeva

For the people in the San Francisco Bay Area, I would like to invite old and new participants to the seminar at Hacker Dojo in Mountain View on Sunday, March 3, from 2 pm till 5 pm. See you there. If you cannot come in person, try connecting using Zoom.

 

The discussion on habr.com