Samsung expands GPU team: open RTL Designer and Performance Architect positions

The Samsung team that designs the Xclipse GPU used in Galaxy phones with Exynos SoCs is expanding. Multiple jobs are available, including positions in the RTL and Architecture teams:

GPU RTL Design Engineer

GPU Performance Architect (PPA)

These positions are in two locations:

Samsung Advanced Computing Lab (ACL) in San Jose, California
Samsung Austin Research and Development Center (SARC) in Austin, Texas

The compensation listed in the job descriptions is base salary only. Total compensation also includes substantial performance-based bonuses.

You can apply on the website or, if you prefer, I can give you an internal referral, since I (Yuri Panchul) am a member of the GPU team. Getting a referral from me is easy:

First, you solve the “SystemVerilog Microarchitecture Challenge for AI No.2. Adding the Flow Control”, an open-source challenge from Verilog Meetup (https://github.com/verilog-meetup/systemverilog-microarchitecture-challenge-for-ai-2), and send the solution to me at yuri@panchul.com.
Then I discuss your solution with you either in person (if you live in the San Francisco Bay Area) or over Zoom. I don’t care whether you solve the challenge manually or use AI for it, but I need you to explain every single line of the solution. In addition, I may ask you a couple more questions, which you solve manually in front of me, without AI.
If I like your solutions, I will enter your data into the Samsung internal referral website. After that, you will get an email from the website and can start an application to go through the official Samsung interview process. I will also forward your resume to the hiring managers and the team’s recruiters.
Note that solving a couple of puzzles for me is not part of the official Samsung interview process. It is just my personal way to make sure I forward only relevant resumes to the company. Every member of our RTL teams would solve such a task easily, so it is not difficult: it covers basic, industry-standard techniques for designing a static pipeline in Verilog.

Our team consists of friendly, proficient, and helpful people. We also have a fancy office that just celebrated its 10-year anniversary.

SystemVerilog Microarchitecture Challenge for AI No.2. Adding the Flow Control.

This repository contains a new challenge for any AI software that claims to generate Verilog code. The challenge is based on a very typical scenario in an electronics company: an engineer has to write a pipelined block using a library of sub-blocks written by somebody else. The engineer then has to verify the block using a testbench written by somebody else. The engineer may also need to figure out the sub-block latencies and handshakes by analyzing the code, since a lot of code in electronics companies is not sufficiently documented.

The SystemVerilog Microarchitecture Challenge for AI No.2 is based on the SystemVerilog Homework project by Verilog Meetup. It also uses the source code of the open-source Wally CPU.

This challenge is a sequel to the SystemVerilog Microarchitecture Challenge for AI No.1, which was challenging for ChatGPT 4 but became less so when ChatGPT 5 appeared.

1. The Prompt

Finish the code of a pipelined block in the file challenge.sv. The block computes a formula “a ** 5 + 0.3 * b - c”. Ready/valid handshakes for the arguments and the result follow the same rules as ready/valid in AXI Stream. When a block is not busy, arg_rdy should be 1, it should not wait for arg_vld.

You are not allowed to implement your own submodules or functions for the addition, subtraction, multiplication, division, comparison or getting the square root of floating-point numbers. For such operations you can only use the modules from the arithmetic_block_wrappers directory. You are not allowed to change any other files except challenge.sv.

You can check the results by running the script “simulate”. If the script outputs “FAIL” or does not output “PASS” from the code in the provided testbench.sv by running the provided script “simulate”, your design is not working and is not an answer to the challenge.

When there is no backpressure, your design must be able to accept a new set of the inputs (a, b and c) each clock cycle back-to-back and generate the computation results without any stalls and without requiring empty cycle gaps in the input.

The solution code has to be synthesizable SystemVerilog RTL. Your design cannot use more than 10 arithmetic blocks from arithmetic_block_wrappers directory or more than 10000 D-flip-flops or other state elements outside those arithmetic blocks. The solution also cannot use any SRAM or other embedded memory blocks.

A human should not help AI by tipping anything on latencies or handshakes of the submodules. The AI has to figure this out by itself by analyzing the code in the repository directories. Likewise a human should not instruct AI how to build a pipeline structure since it makes the exercise meaningless.
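To make the two rules stated in the prompt concrete, here is a minimal Python sketch of what the block is expected to compute and of when a ready/valid transfer counts as accepted. It is only an illustration under assumptions: the function names `reference_result` and `run_source` are hypothetical and do not exist in the repository, double precision is assumed for the arithmetic, and the sketch says nothing about sub-block latencies or the pipeline structure. The provided testbench.sv remains the only authority on whether a solution passes.

```python
from typing import List, Sequence, Tuple


def reference_result(a: float, b: float, c: float) -> float:
    """Value of the formula a ** 5 + 0.3 * b - c (double precision assumed)."""
    return a ** 5 + 0.3 * b - c


def run_source(items: Sequence[float],
               rdy_trace: Sequence[int]) -> List[Tuple[int, float]]:
    """Model a source driving a ready/valid interface, AXI Stream style.

    The source keeps vld high while it has data and holds the same item
    until it is accepted. Returns (cycle, item) for every transfer.
    """
    accepted = []
    idx = 0
    for cycle, rdy in enumerate(rdy_trace):
        vld = idx < len(items)        # source has something to send
        if vld and rdy:               # transfer only when both are 1
            accepted.append((cycle, items[idx]))
            idx += 1                  # move on to the next item
        # otherwise the current item is held for the next cycle
    return accepted


if __name__ == "__main__":
    print(reference_result(2.0, 10.0, 1.0))
    # No backpressure: one item accepted per cycle, back-to-back.
    print(run_source([1.0, 2.0, 3.0], [1, 1, 1]))
    # Backpressure in cycles 1 and 2: the second item waits until cycle 3.
    print(run_source([1.0, 2.0, 3.0], [1, 0, 0, 1, 1]))
```

With rdy held at 1 the model accepts one item per cycle, which is the back-to-back behavior the prompt requires when there is no backpressure. With rdy low, the pending item is simply held until the next cycle in which both signals are 1.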

More: https://github.com/verilog-meetup/systemverilog-microarchitecture-challenge-for-ai-2

Thank you,
Yuri Panchul