SNUG Silicon Valley 2026, an influential conference on tools for chip design, published my paper (alt link) that analyzes the mistakes AI tools and EE students make in SystemVerilog. Apparently, neither AI nor students are properly trained in data pipelining, flow control, workload distribution and similar microarchitectural issues. Yes, I do realize that AI abilities are a moving target. Whenever I create a challenge AI cannot crack, somebody leaks it to the internet, and AI is able to solve it within two to six months. However, an intern striving to be a designer should be able to solve such tasks without AI, even if he plans to use AI. Otherwise he will not recognize some nasty AI-generated surprises, and then the millions of phones, internet routers, controllers for airplane turbines or whatever he designs are going to be dead on arrival. Here is the poster for the paper:

One of the challenges I describe, “SystemVerilog Microarchitecture Challenge for AI No.2. Adding the Flow Control”, is a blend of two open-source projects: the SystemVerilog Homework from Verilog Meetup and the CORE-V Wally CPU from OpenHW Group.

The arithmetic sub-blocks for the Challenges are created by wrapping the FPU (Floating Point Unit) from the Wally CPU. David Harris, one of the principal designers of Wally and an author of several college textbooks, presented Wally a week ago at Samsung Semiconductor in Silicon Valley:

I know David from one of my previous jobs, at MIPS Technologies / Imagination Technologies, where our educational program department contracted David to create a product called MIPSfpga, based on my recommendation. Last week David also described the flow used for RISC-V core certification.

The Wally CPU is described in David’s new textbook, which is coming out this year. Since the textbook has not been released yet, AI engines are not trained on it. This partially explains why AI had trouble figuring out the latencies of the sub-blocks in the Challenges I presented in my paper. Those poor software AI creatures were writing “for the illustration, assume the latency is 1 clock cycle”. Since the sub-block latency is not 1, the AI-generated solutions failed my testbench, until August 2025 when somebody trained them to extract the latencies from Verilog code:

On the SNUG floor, I met John Cooley, the most famous blogger in the EDA industry. John became a blogger back in 1991 when the word “blogger” did not even exist yet. His website deepchip.com has eons of industry chronicles.

John read my article, then brought a gentleman and a lady from NVidia to my poster. He was using me as an example of how to write a convincing article, by doing the homework with the arguments, diagrams and code. John compared me to Martin Luther, the founder of the Protestant Reformation in 16th-century Europe. John told the NVidians, “Yuri has a very dangerous idea,” and compared my poster about AI at the SNUG conference to the “Ninety-five Theses or Disputation on the Power and Efficacy of Indulgences” that Luther put on the wall of the Castle Church in Wittenberg.

Some background: back in 1997, John Cooley steered the course of history by publishing a highly influential report, Verilog Won & VHDL Lost? — You Be The Judge!. John simply put several VHDL and Verilog guys in the same room and asked them to design a single-page block.

Then what happened: “Of the 9 Verilog designers in the contest, only 1 didn’t get to a final gate level netlist because he tried to code a look-ahead parity generator. Of the 8 remaining, 3 had netlists that missed on functional test vectors. The surprize was that, during the same time, none of 5 VHDL designers in the contest managed to produce any gate level designs.”

John’s competition was the culmination of the heated debates described in his 1996 post INDUSTRY GADFLY: “From Beirut To Bosnia” & Response. After the competition I heard several people saying “the fate of VHDL is sealed”. Even Synopsys started to move their best engineers from VHDL to Verilog products at the time. Eventually the Verilog vs. VHDL situation stabilized (you can still find VHDL in, for example, Imagination GPUs), but this was much later and is another story.

Then, on the floor, I met the #1 SNUG article author, Cliff Cummings. Cliff wrote popular texts on SystemVerilog, Resets, Clock Domain Crossing (CDC), Finite State Machines (FSMs), and other perennial topics. His articles are a must-read for anybody who prepares for job interviews. I did not take a photo with Cliff Cummings this year, so here is my photo with him from 2019, when he was doing training at Wave Computing, an AI hardware accelerator company where I was working at the time:

I went through my article with Cliff and complained that the majority of students never learn at university about credit-based flow control, a bread-and-butter technique used in electronics companies for the last 20-30 years.

Cliff told me that the universities do not teach Clock Domain Crossing (CDC) either and asked: “Is your example a single-clock design?” I answered “yes”, and Cliff said “OK, in this case you can do this ‘+1’ operation to the counter without any problem, whenever you pop from that FIFO”:
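To make the single-clock case concrete, here is a minimal behavioral sketch in Python rather than SystemVerilog. It is not the code from my paper: the names `CreditSender`, `simulate` and `pop_every` are made up for illustration, and I assume the returned credit becomes visible to the sender one cycle after the pop. The sender starts with one credit per FIFO slot, spends a credit on every send, and gets the “+1” back whenever the receiver pops.

```python
from collections import deque


class CreditSender:
    """Upstream block: it may send only while it holds credits (hypothetical name)."""

    def __init__(self, fifo_depth):
        self.credits = fifo_depth  # one credit per free slot in the receiver FIFO

    def can_send(self):
        return self.credits > 0

    def send(self):
        assert self.can_send(), "sending without a credit would overflow the FIFO"
        self.credits -= 1

    def credit_return(self):
        self.credits += 1  # the "+1" done whenever the receiver pops


def simulate(n_items, fifo_depth, pop_every):
    """Cycle-by-cycle model; the receiver pops at most once per pop_every cycles."""
    sender = CreditSender(fifo_depth)
    fifo = deque()
    sent = 0
    received = []
    max_occupancy = 0
    cycle = 0
    while len(received) < n_items:
        # Receiver side: a slow consumer pops only on some cycles.
        popped = False
        if fifo and cycle % pop_every == 0:
            received.append(fifo.popleft())
            popped = True
        # Sender side: no ready/valid handshake, only the credit counter.
        if sent < n_items and sender.can_send():
            sender.send()
            fifo.append(sent)
            sent += 1
        # Assumption: the credit return is registered, so it is usable next cycle.
        if popped:
            sender.credit_return()
        max_occupancy = max(max_occupancy, len(fifo))
        cycle += 1
    return received, max_occupancy, cycle


received, max_occupancy, cycles = simulate(n_items=20, fifo_depth=4, pop_every=3)
print(received == list(range(20)), max_occupancy, cycles)
```

The point of the model is the invariant: the credits held by the sender plus the FIFO occupancy never exceed the FIFO depth, so the FIFO cannot overflow no matter how slowly the receiver pops.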

Then I asked Cliff: “Do you know a technique for returning credits when the FIFO is a CDC FIFO? You can use a second CDC FIFO that has zero width.”

Cliff was shocked: “What?? What is a ‘Zero-Width FIFO’?” I confirmed: “Yes, not a Zero-Depth FIFO, but a Zero-Width FIFO. This is a FIFO that has only push, pop, empty, full, read and write pointers, but no data”. The data (+1) is implied by the read/write pointers and empty/full flags. I heard about this technique several years ago, and it would be a natural next step to add it to our Challenges for students and for AI companies big and small:
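A rough behavioral sketch of the idea in Python follows. It is my illustration, not a verified design: the class `ZeroWidthFifo` and its methods are hypothetical names, and I assume the usual asynchronous FIFO structure with Gray-coded pointers passed through two-flop synchronizers, which the conversation above did not spell out. The receiver pushes once per pop of its data FIFO, and every successful pop on the sender side means “+1 credit”.

```python
def bin_to_gray(n):
    return n ^ (n >> 1)


def gray_to_bin(g):
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class ZeroWidthFifo:
    """CDC FIFO with pointers and flags but no data storage (hypothetical model)."""

    def __init__(self, depth_log2):
        self.depth = 1 << depth_log2
        self.mask = (1 << (depth_log2 + 1)) - 1  # extra MSB separates full from empty
        self.wr_ptr = 0  # lives in the write (credit producer) clock domain
        self.rd_ptr = 0  # lives in the read (credit consumer) clock domain
        self.rd_gray_sync = [0, 0]  # two-flop synchronizer in the write domain
        self.wr_gray_sync = [0, 0]  # two-flop synchronizer in the read domain

    def wr_clock(self, push):
        """One write-domain clock edge; returns True if the push was accepted."""
        rd_seen = gray_to_bin(self.rd_gray_sync[1])
        full = ((self.wr_ptr - rd_seen) & self.mask) == self.depth
        accepted = push and not full
        if accepted:
            self.wr_ptr = (self.wr_ptr + 1) & self.mask
        # Sample the other domain's Gray pointer into the synchronizer.
        self.rd_gray_sync = [bin_to_gray(self.rd_ptr), self.rd_gray_sync[0]]
        return accepted

    def rd_clock(self, pop):
        """One read-domain clock edge; returns 1 if a credit was popped, else 0."""
        wr_seen = gray_to_bin(self.wr_gray_sync[1])
        empty = wr_seen == self.rd_ptr
        credit = 1 if (pop and not empty) else 0
        if credit:
            self.rd_ptr = (self.rd_ptr + 1) & self.mask
        self.wr_gray_sync = [bin_to_gray(self.wr_ptr), self.wr_gray_sync[0]]
        return credit


fifo = ZeroWidthFifo(depth_log2=2)
for _ in range(3):
    fifo.wr_clock(push=True)  # the receiver popped three items from its data FIFO
credits = [fifo.rd_clock(pop=True) for _ in range(6)]
print(credits, sum(credits))
```

In this model the synchronizer delay only makes the empty and full flags pessimistic: the three credits arrive a couple of read clocks late, but none is lost or duplicated.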

For those who are not familiar with Cliff’s work, here are three selected articles on CDC that every student has to study:

  1. Clock Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog. SNUG Boston 2008. Voted SNUG Best Paper 1st Place.
  2. Simulation and Synthesis Techniques for Asynchronous FIFO Design. SNUG San Jose 2002, updated 2005.
  3. Simulation and Synthesis Techniques for Asynchronous FIFO Design with Asynchronous Pointer Comparisons. SNUG San Jose 2002, updated 2020. Voted SNUG Best Paper 1st Place.

Thank you, please read, send me feedback about my article, and let’s work together to keep the AI companies busy. They have too easy a life demoing single-cycle CPUs and FSM-based BFMs with a single transaction in flight. We need to make sure they address realistic pipelined, out-of-order, multi-bank and other interesting designs.

Contributors to the SystemVerilog Homework that became the basis for the Challenges: Yuri Panchul, Mike Kuskov, Maxim Kudinov, Kiran Jayarama, Maxim Trofimov, Alexey Fedorov, Konstantin Blokhin, Petr Dynin. Creator of Challenge 3, Alex Huang.
