FPGA to ASIC

        - one lonesome software engineer's trek into the darkness


The quick brown fox

The algorithm chosen by NIST for the SHA-3 standard is named Keccak, and the goal was always to attach an implementation of it to an SPI interface and make that the first chip design.

Implemented in 0.18µm, the SPI interface and Keccak algorithm together take up approximately 0.4 square mm. A large percentage of this is taken up by the registers (D flip-flops) that are needed when you're operating on a 512-bit hash…

The implementation is pretty simple: you can send up to 64 bytes of data down the SPI interface (registers 64..127), then tell the chip that it has N bytes to hash (register 128 controls this), then tell it to go (register 129).

The hash takes 26 clock cycles (10 for loading the data into the bit-buffer and 16 for calculation, so it's possible I could reduce that to 16 clocks for every successive hash). Once it's done, the Keccak core raises a signal, and the 512 bits of output hash are copied into the SPI registers (0..63, one byte per register), ready for retrieval.

To indicate to the user of the chip that the data is ready to be retrieved, the (imaginatively named) output 'hashReady' is driven and can be used to drive an external IRQ.
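From the host's side, the whole exchange can be sketched roughly as below. This is only an illustration in Python: the `write_reg` / `read_reg` methods, the `MockChip` class, and the value written to the "go" register are all made up for the example, and the SPI framing itself is not modelled. It also assumes the digest occupies 64 byte-wide registers starting at 0. The mock takes whatever digest function you hand it, so it says nothing about what the real core computes.

```python
import hashlib

# Register map as described in the post. The output range assumes one
# byte-wide register per digest byte (64 registers for 512 bits).
REG_HASH_OUT = 0    # digest bytes
REG_DATA_IN = 64    # message bytes, registers 64..127
REG_LENGTH = 128    # number of message bytes to hash
REG_GO = 129        # writing here starts the hash (value 1 is an assumption)
MAX_MESSAGE = 64
DIGEST_BYTES = 64


class MockChip:
    """Software stand-in for the chip (hypothetical, not a model of the RTL)."""

    def __init__(self, digest_fn):
        self.regs = bytearray(130)
        self.hash_ready = False       # stands in for the 'hashReady' output
        self._digest_fn = digest_fn   # must return exactly 64 bytes

    def write_reg(self, addr, value):
        self.regs[addr] = value
        if addr == REG_GO:
            self.hash_ready = False
            n = self.regs[REG_LENGTH]
            msg = bytes(self.regs[REG_DATA_IN:REG_DATA_IN + n])
            digest = self._digest_fn(msg)
            self.regs[REG_HASH_OUT:REG_HASH_OUT + DIGEST_BYTES] = digest
            self.hash_ready = True

    def read_reg(self, addr):
        return self.regs[addr]


def hash_message(chip, message):
    """Load a message, start the hash, wait for hashReady, read the digest."""
    if len(message) > MAX_MESSAGE:
        raise ValueError("the chip accepts at most 64 bytes per hash")
    for i, byte in enumerate(message):
        chip.write_reg(REG_DATA_IN + i, byte)
    chip.write_reg(REG_LENGTH, len(message))
    chip.write_reg(REG_GO, 1)
    while not chip.hash_ready:   # on real hardware: wait for the IRQ instead
        pass
    return bytes(chip.read_reg(REG_HASH_OUT + i) for i in range(DIGEST_BYTES))


if __name__ == "__main__":
    # hashlib.sha3_512 is only a placeholder digest function for the mock.
    chip = MockChip(lambda m: hashlib.sha3_512(m).digest())
    print(hash_message(chip, b"hello").hex())
```

On real hardware the busy-wait would be replaced by an interrupt handler hooked to 'hashReady'.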

The 1000-foot view of the layout looks like the image below. There are some areas that are a bit congested, but overall it's a fairly sparse layout.
[Image: overview of the chip layout]
Of course the big deal is "does it work?". Well, the Verilog simulation tells me that the SHA-3 hash of "The quick brown fox jumps over the lazy dog" is in fact

"d135bb84d0439dbac432247ee573a23ea7d3c9deb2a968eb31d47c4fb45f1ef"
"4422d6c531b5b9bd6f449ebcc449ea94d0a8f05f62130fda612da53c79659f609"

… which is correct! And running the LVS script against the laid-out circuit from Magic tells me that the layout matches the SPICE netlist extracted from the design.
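The digest from the Verilog simulation can also be cross-checked in software. Below is a minimal Python sketch of Keccak-512, written for this check rather than taken from the chip's source. It assumes the original Keccak submission padding (suffix byte 0x01), which is what the digest quoted above appears to correspond to. The finalized FIPS 202 SHA3-512, as found in `hashlib.sha3_512`, uses suffix 0x06 and gives a different value for the same input. The function names are my own.

```python
import hashlib

MASK64 = (1 << 64) - 1
RATE_BYTES = 72  # (1600 - 2 * 512) / 8 for a 512-bit digest


def rol64(value, shift):
    """Rotate a 64-bit lane left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane array."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column's parity into the neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the non-linear step, applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: round constant generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak512(message, suffix=0x01):
    """512-bit sponge. suffix 0x01 = original Keccak (assumed), 0x06 = FIPS 202."""
    state = bytearray(200)

    def permute():
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")

    # Absorb the message one rate-sized block at a time.
    offset = 0
    block_len = 0
    while offset < len(message):
        block_len = min(len(message) - offset, RATE_BYTES)
        for i in range(block_len):
            state[i] ^= message[offset + i]
        offset += block_len
        if block_len == RATE_BYTES:
            permute()
            block_len = 0
    # Pad: suffix byte after the message, final bit at the end of the block.
    state[block_len] ^= suffix
    state[RATE_BYTES - 1] ^= 0x80
    permute()
    return bytes(state[:64])


if __name__ == "__main__":
    msg = b"The quick brown fox jumps over the lazy dog"
    print(keccak512(msg).hex())
    # Sanity check of the permutation against the standard library's SHA3-512.
    print(keccak512(msg, suffix=0x06) == hashlib.sha3_512(msg).digest())
```

Running it with the default suffix should print a value to compare against the two quoted lines joined together. The second line checks the permutation itself against `hashlib`, using the FIPS 202 suffix.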

At this point I'm feeling pretty good about the design :) What I now have to do is understand more of the nuts and bolts of the whole thing.
Circuit contains 6826 nets.

Circuit 1 contains 6819 elements, 
Circuit 2 contains 6819 elements.

Circuit 1 contains 6826 nodes,    
Circuit 2 contains 6826 nodes.

Netlists match uniquely.
Result: Circuits match uniquely.
Some topics where I need more information:

  • I need to understand more about the clock network within the chip. There's some static timing analysis done by qflow that tells me it ought to run at ~240 MHz, but I want to know if there's some way to model and simulate the clock delay across the chip to make sure.
  • Pads are a big issue - there are I/O pads, input pads, output pads, clock pads, VDD pads, and GND pads. I need to know whether I can feed in a 240 MHz clock over the clock pad, or whether there'll have to be a PLL on-board to ramp up the clock from an input frequency that the pad can handle.
  • How does one place pads in Magic, anyway? I've not seen sight nor sound of them - and I only saw MOSI and MISO as signals brought out to the periphery of the chip as I was wiring up the GND and VDD rails.
  • Then there's the whole process of getting involved with MOSIS - what's involved? What are the costs? How many children does one have to sacrifice to get access? Etc.
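As a back-of-the-envelope aside on the clock question: taking qflow's ~240 MHz estimate at face value (it is unverified, which is the whole point of the first bullet), the cycle counts quoted earlier translate into time as in the sketch below. The constant and function names are just for illustration, and SPI transfer time is not included.

```python
CLOCK_HZ = 240e6    # qflow's static-timing estimate, not a measured figure
LOAD_CYCLES = 10    # loading the data into the bit-buffer
CALC_CYCLES = 16    # the hash calculation itself


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a number of clock cycles into nanoseconds."""
    return cycles / clock_hz * 1e9


if __name__ == "__main__":
    print(f"clock period:        {cycles_to_ns(1):.2f} ns")
    print(f"first hash (26 clk): {cycles_to_ns(LOAD_CYCLES + CALC_CYCLES):.1f} ns")
    print(f"calc only (16 clk):  {cycles_to_ns(CALC_CYCLES):.1f} ns")
```

At these numbers the core itself would be far faster than any realistic SPI link could feed it, so the hash latency matters less than the clock distribution and pad questions above.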

Still, getting something that passes LVS and simulates to the correct result is a milestone worthy of a blog post :)