FPGA to ASIC

        - one lonesome software engineer's trek into the darkness


First design

The first job was to get the design up and running in simulation, so out with the trusty text editor, and a-typing I would go. The goal here was just to implement the SPI slave and control path that would ultimately let me interface to the chip. A SPI slave isn't particularly hard to implement; it comes down to a shift register and a bit of clock management. Once that was passing its Verilog testbench (different SCLK to chip clock rates, successfully write to register, successfully read just-written value from register, …) using Icarus Verilog and GTKWave, it was time to start pushing it through the qflow toolchain.
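
To make the "shift register" remark concrete, here is a small behavioural model in Python rather than Verilog. It is only a sketch: the class and function names are made up for illustration, it assumes SPI mode 0 with the most significant bit first, and it ignores the clock management entirely (crossing from SCLK into the chip clock domain is the fiddly part in real hardware).

```python
class SpiSlaveModel:
    """Behavioural model of an 8-bit SPI slave (assumed mode 0, MSB first)."""

    def __init__(self):
        self.shift_in = 0      # bits captured from MOSI so far
        self.shift_out = 0     # bits waiting to go out on MISO
        self.bit_count = 0
        self.received = []     # completed bytes

    def load(self, value):
        """Preload the byte to be returned on MISO during the next transfer."""
        self.shift_out = value & 0xFF

    def clock(self, mosi):
        """One SCLK cycle: present the MISO bit, then shift in the MOSI bit."""
        miso = (self.shift_out >> 7) & 1
        self.shift_out = (self.shift_out << 1) & 0xFF
        self.shift_in = ((self.shift_in << 1) | (mosi & 1)) & 0xFF
        self.bit_count += 1
        if self.bit_count == 8:            # a whole byte has arrived
            self.received.append(self.shift_in)
            self.bit_count = 0
            self.shift_in = 0
        return miso


def transfer(slave, byte_out):
    """Master side: send one byte MSB first and collect the byte coming back."""
    byte_in = 0
    for i in range(7, -1, -1):
        miso = slave.clock((byte_out >> i) & 1)
        byte_in = (byte_in << 1) | miso
    return byte_in


slave = SpiSlaveModel()
slave.load(0xA5)
reply = transfer(slave, 0x3C)
```

After the transfer, `reply` holds the byte the slave had preloaded and `slave.received` holds the byte the master sent, which is essentially what the write-then-read-back test in the testbench checks.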

Now, the ultimate goal is to make a crypto core (SHA-3) which can take a message via SPI and return a cryptographic hash of it, streaming data into and out of the chip. This isn't the most practical thing to do (transmitting the message over a SPI bus in plain text isn't exactly a smart idea if you're trying to encrypt it) but it's a reasonably ambitious target design for a first attempt at chip design. The crypto core was going to operate on 512 bits of the message at once (SHA-3 comes in -256 and -512 "standard" packages), and I'd use zero-padding if the message to hash didn't fit exactly.
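
For what it's worth, the zero-padding idea looks like this in Python. This is a sketch of the simplification described above, not of the padding rule in the SHA-3 standard (which defines its own scheme), so a core fed this way shouldn't be expected to match a reference SHA-3 implementation. The function name is hypothetical.

```python
BLOCK_BYTES = 64  # 512 bits, the block size assumed in the text


def zero_pad_blocks(message: bytes, block_bytes: int = BLOCK_BYTES):
    """Split a message into fixed-size blocks, zero-filling the last one."""
    blocks = []
    for start in range(0, len(message), block_bytes):
        chunk = message[start:start + block_bytes]
        # ljust pads on the right with the given fill byte
        blocks.append(chunk.ljust(block_bytes, b"\x00"))
    return blocks


blocks = zero_pad_blocks(b"x" * 100)
```

A 100-byte message becomes two blocks: one full block and one with 36 message bytes followed by 28 zero bytes. Note that an empty message yields no blocks at all in this sketch, which a real design would have to decide how to handle.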

Bearing that in mind, and that we have an 8-bit SPI bus, it seemed reasonable to have a register set that could hold the current input message: 512 bits means 64 8-bit registers, plus another 64 8-bit registers for the current output message. Add some control signalling and we're done. The process would be:

  • Write 64 bytes of message into the first page of registers
  • Wait until the 'done' signal is raised
  • Repeat the above two steps until we're finished sending data
  • Read the 64 bytes of output message from the second page of registers
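
From the host's side, those four steps might look something like the sketch below. Everything named here is an assumption for illustration: the register addresses, the `FakeChip` class, and especially its "processing", which just XORs each block into the output page so the example can run without the real core. It is not SHA-3.

```python
PAGE_BYTES = 64
INPUT_BASE = 0     # hypothetical: input page at register addresses 0..63
OUTPUT_BASE = 64   # hypothetical: output page at register addresses 64..127


class FakeChip:
    """Stand-in for the chip: XORs each block into the output page (not SHA-3)."""

    def __init__(self):
        self.regs = bytearray(2 * PAGE_BYTES)
        self.written = 0
        self.done = False

    def write_reg(self, addr, value):
        self.regs[addr] = value & 0xFF
        self.done = False
        self.written += 1
        if self.written == PAGE_BYTES:      # a full block has arrived
            for i in range(PAGE_BYTES):
                self.regs[OUTPUT_BASE + i] ^= self.regs[INPUT_BASE + i]
            self.written = 0
            self.done = True

    def read_reg(self, addr):
        return self.regs[addr]


def process_message(chip, blocks):
    """Write a block, wait for 'done', repeat, then read the output page."""
    for block in blocks:
        assert len(block) == PAGE_BYTES, "blocks must already be padded"
        for i, byte in enumerate(block):
            chip.write_reg(INPUT_BASE + i, byte)
        while not chip.done:                # poll the 'done' signal
            pass
    return bytes(chip.read_reg(OUTPUT_BASE + i) for i in range(PAGE_BYTES))


result = process_message(FakeChip(), [bytes([1]) * 64, bytes([3]) * 64])
```

With the real chip, `write_reg` and `read_reg` would become SPI transactions and the polling loop would want a timeout.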

All well and good, and it worked fine in the FPGA. Now onto the ASIC implementation…

I created the standard project hierarchy (source, layout, synthesis directories in a top-level project directory) and invoked qflow using the default 0.35µm technology, viz.:
qflow spiControl
then uncommented the lines in the generated 'qflow-exec.sh' and ran the script. It ran, and ran, and ran. Eventually it gave up. (I actually didn't read the log file closely enough the first time around, so I went through the LVS process and of course it failed. Note-to-self: pay attention!) Qrouter had started off with ~7000 nets to route, and failed to route about 340 of them after several hours of work.

OK, so this wasn't going quite as smoothly as the tutorial [grin]. It turns out there's an option in 'project_vars.sh' (automatically generated when I first invoked qflow above) to set the initial packing density of the standard cells: the lower the number, the lower the density, and therefore the more space there is available for routing between them. I experimented with values (0.8, 0.7, 0.5, …), with qrouter getting closer and closer to routing the design. Eventually I went as low as 0.25 for the initial_density value and, lo and behold, we got a routed design.
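
To get a feel for what those density numbers cost, here is a back-of-the-envelope calculation. It assumes (my reading, not something checked against the placer) that the density is simply the fraction of the core area covered by standard cells, so the placed area is the raw cell area divided by the density. The function name is made up.

```python
def core_area(cell_area_mm2: float, density: float) -> float:
    """Approximate placed area if cells cover `density` of the core."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return cell_area_mm2 / density


for d in (0.8, 0.7, 0.5, 0.25):
    print(f"density {d:.2f}: {core_area(1.0, d):.2f}x the raw cell area")
```

Under that assumption, a density of 0.25 means the core is about four times the size of the cells it contains, with the rest given over to routing.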

So then it was a matter of going through the same steps as in the tutorial to check LVS (layout vs. schematic). I knew the design worked at the Verilog level, at least according to simulation, but this checks that the circuit is still the same after qrouter has strutted its funky stuff and connected everything together on a physical basis. Placing the GND and VDD power rails down the right and left sides respectively eventually produced the finished layout ("gds read…" took a long time; I thought it had crashed at first)
… at which point I could run the SPICE extraction (which again took a little while, but eventually completed) and run netgen:
netgen -batch lvs layout/spiControl.spice "synthesis/spiControl.spc spiControl"
…
Netlists match uniquely.
Contents of circuit 1:  Circuit: 'layout/spiControl.spice'
Circuit layout/spiControl.spice contains 4963 device instances.
  Class: OR2X2                 instances: 268
  Class: MUX2X1                instances: 468
  Class: AOI22X1               instances:  40
  Class: NOR2X1                instances: 554
  Class: NAND3X1               instances:  16
  Class: OAI22X1               instances: 128
  Class: DFFPOSX1              instances: 1079
  Class: AOI21X1               instances:  92
  Class: NAND2X1               instances: 427
  Class: OAI21X1               instances: 702
  Class: BUFX2                 instances:   7
  Class: BUFX4                 instances: 370
  Class: AND2X2                instances:  93
  Class: INVX1                 instances: 689
  Class: INVX2                 instances:  18
  Class: INVX4                 instances:   7
  Class: INVX8                 instances:   5
Circuit contains 4970 nets.
Contents of circuit 2:  Circuit: 'spiControl'
Circuit spiControl contains 4963 device instances.
  Class: OR2X2                 instances: 268
  Class: MUX2X1                instances: 468
  Class: AOI22X1               instances:  40
  Class: NOR2X1                instances: 554
  Class: NAND3X1               instances:  16
  Class: OAI22X1               instances: 128
  Class: DFFPOSX1              instances: 1079
  Class: AOI21X1               instances:  92
  Class: NAND2X1               instances: 427
  Class: OAI21X1               instances: 702
  Class: BUFX2                 instances:   7
  Class: BUFX4                 instances: 370
  Class: AND2X2                instances:  93
  Class: INVX1                 instances: 689
  Class: INVX2                 instances:  18
  Class: INVX4                 instances:   7
  Class: INVX8                 instances:   5
Circuit contains 4970 nets.

Circuit 1 contains 4963 elements, Circuit 2 contains 4963 elements.
Circuit 1 contains 4970 nodes,    Circuit 2 contains 4970 nodes.

Netlists match uniquely.
I think Lewis Carroll said it best: "O frabjous day! Callooh! Callay!" He chortled in his joy.

I may not yet have slain the Jabberwock, but it does feel as though I'm in the shop, buying the boots I'm going to use on the journey…

Caveats



  • This is very much a brute-force approach to getting the route completed - to continue the "adventurer" theme, sort of like the barbarian-style "if it moves, hit it! If it doesn't move, hit it until it does!" approach. It seems to me there must be a better way than sacrificing all that real-estate to the routing of D flip-flops.
  • There is such a thing as a memory compiler, and it's possible (to be investigated) that there may be one that works with Magic/qflow. We could certainly use a RAM block instead of all those registers, and the packing density would presumably be greatly improved. All I have to do is figure out (a) whether it does indeed work with qflow, and (b) how to do it. Nothing too difficult then…
  • As it stands, just the SPI slave interface is running at about 3.5 mm², which is already pretty large. I haven't even begun to put the crypto core in there yet…
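
As a rough sanity check on the flip-flop point, the numbers in the netgen report can be set against the size of the register file. This assumes each register bit ends up as exactly one DFFPOSX1, which seems plausible but isn't something I've verified against the synthesized netlist.

```python
REGISTERS = 64 + 64        # input page + output page
BITS_PER_REGISTER = 8
TOTAL_DFF = 1079           # DFFPOSX1 instances in the netgen report
TOTAL_CELLS = 4963         # device instances in the netgen report

register_file_bits = REGISTERS * BITS_PER_REGISTER
other_flops = TOTAL_DFF - register_file_bits
dff_share = TOTAL_DFF / TOTAL_CELLS

print(f"register file flip-flops: {register_file_bits}")
print(f"flip-flops left for SPI/control logic: {other_flops}")
print(f"flip-flops as a share of all cells: {dff_share:.1%}")
```

On that assumption, all but a few dozen of the flip-flops are just holding the two register pages, which is exactly the part a RAM block would replace.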

