Coriolis2 180nm layout

Simple floorplan

Register files

There are 6 register files: STATE, SPR, INT, CR, XER and FAST.

Access to each regfile port is managed by a "Priority Picker": a picker with a unary input (one request bit per user) and a one-hot output, which allows one and only one "user" of a given regfile port at any one time.
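The behaviour of such a picker can be sketched in a few lines of plain Python. This is only a behavioural model, not the project's HDL: the function name `priority_pick` is hypothetical, and giving bit 0 the highest priority is an assumption.

```python
def priority_pick(requests: int) -> int:
    """Return a one-hot grant for the lowest-numbered set bit of `requests`.

    Bit i of `requests` is 1 when user i wants the port.
    Assumption: bit 0 has the highest priority.
    """
    # In two's complement, x & -x isolates the lowest set bit; 0 stays 0.
    return requests & -requests


# Three users (bits 1, 2 and 4) request the same regfile port.
grant = priority_pick(0b10110)
print(f"{grant:05b}")  # prints 00010: only user 1 is granted
```

However many request bits are raised, at most one bit of the result is set, so only one user drives the port.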

Computation Units

There are 8 Function Units: ALU, Logical, Condition, Branch, ShiftRot, LDST, Trap, and SPRs.

Each Function Unit has operand inputs and operand outputs. Across all pipelines, multiple Function Units require "RA" (Register A, read from the Integer Register File). All such "RA" read requests are connected to the same "Priority Picker" described above. Likewise, all Function Units that need to write to the "RT" register are connected to the same "RT-managing" Write Priority Picker.

Load Store Computation Unit(s)

Load/Store is a special type of Computation Unit that additionally has access to external memory. In the case where multiple LDSTCompUnits are added, L0CacheBuffer is responsible for "merging" these into single requests.

There are however two L0 Caches (both 128-bit wide), with a split on address bit 4 for selecting either the odd L0 Cache or the even L0 Cache.
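Since a 128-bit line is 16 bytes, address bits 0 to 3 index a byte within a line and bit 4 alternates between consecutive lines. A minimal sketch of the selection follows; the function name `l0_select` is hypothetical, and mapping bit 4 = 1 to the "odd" cache is an assumption.

```python
L0_LINE_BYTES = 128 // 8  # a 128-bit wide L0 cache line is 16 bytes


def l0_select(addr: int) -> str:
    """Pick the L0 cache for a byte address using address bit 4.

    Assumption: bit 4 set selects the "odd" cache, clear selects "even".
    """
    return "odd" if (addr >> 4) & 1 else "even"


for addr in (0x00, 0x0F, 0x10, 0x1F, 0x20):
    print(hex(addr), l0_select(addr))
```

Consecutive 16-byte lines therefore land in alternating caches.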

Each of the two L0 caches has dual 64-bit Wishbone interfaces, giving a total of four 64-bit Memory Bus requests. These are merged through an Arbiter onto the same Memory Bus that the I-Cache is also connected to.
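The text does not say which arbitration policy is used. Purely as an illustration, the sketch below models a round-robin arbiter with five masters (the four L0 interfaces plus the I-Cache). The class name, the master numbering and the round-robin policy are all assumptions.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Behavioural model: grant one requesting master per cycle, in rotation."""

    def __init__(self, n_masters: int):
        self.n = n_masters
        self.last = n_masters - 1  # so that master 0 is considered first

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Search from the master after the most recently granted one.
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None  # nobody is requesting the bus


# Hypothetical numbering: masters 0-3 are the L0 interfaces, 4 is the I-Cache.
arb = RoundRobinArbiter(5)
print([arb.grant([False, False, True, False, True]) for _ in range(3)])
```

With only masters 2 and 4 requesting, the grants alternate between them, so neither can starve the other.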

Instructions

Instructions are read from the Instruction Cache by the simple core FSM and then decoded by PowerDecoder2. Currently the Instruction Cache is an extremely simple memory block, to be replaced by a proper I-Cache with a proper connection to the Memory Bus (Wishbone).

IO Ring and JTAG

The IO Ring is autogenerated from the same pinmux program that created the pinouts and the SVG image. The image was used by Greatek for packaging as well as a PCB designed by Professor Galayko of Sorbonne University.

The exact same pinmux program's output, specifying all interfaces, was also used to autogenerate the HDL for the JTAG Boundary Scan.

By strictly using the exact same machine-readable specification for all Interfaces, and using only autogenerated techniques, it was possible to ensure complete consistency across:

  • Markdown file
  • SVG Image for packaging
  • IO Ring
  • JTAG Boundary Scan
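The single-source idea can be illustrated with a toy generator. This does not reproduce the actual pinmux program or its file format: the `PIN_SPEC` tuples, the pad numbers, the signal names and both function names are hypothetical.

```python
# Hypothetical pin specification: (pad number, signal name, direction).
PIN_SPEC = [
    (1, "uart_tx", "out"),
    (2, "uart_rx", "in"),
    (3, "gpio0", "inout"),
]


def markdown_table(spec):
    """Render the specification as a Markdown pinout table."""
    rows = ["| Pad | Signal | Direction |", "|-----|--------|-----------|"]
    rows += [f"| {pad} | {name} | {direction} |" for pad, name, direction in spec]
    return "\n".join(rows)


def boundary_scan_order(spec):
    """List signal names in pad order, as a stand-in for a scan chain."""
    return [name for _pad, name, _direction in sorted(spec)]


print(markdown_table(PIN_SPEC))
print(boundary_scan_order(PIN_SPEC))
```

Because every output is derived from the same list, a pin added or renamed in the specification changes all outputs together.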

JTAG also contains a Wishbone Master for direct access to Memory, and a DMI Interface for controlling the core. For simulations, a JTAG client was implemented in both nmigen HDL and verilator. The exact same openocd scripts, or direct JTAG connectivity using jtagremote, can then be used on:

  • nmigen HDL simulations
  • verilator simulations
  • ECP5 FPGA
  • the actual ls180 ASIC

Building

To build see coriolis2. A tag has been used and the build instructions specify it. The soclayout repository is standalone, containing a snapshot of the verilog autogenerated output.

About coriolis2

There are several talks online now.

Jean-Paul Chaput of LIP6 carried out several improvements to coriolis2 in order for it to cope with an 800,000 transistor 30 mm² 180nm layout. These included:

  • Automatic antenna diodes (needed for stopping ESD)
  • Clock tree improvements
  • Dual Power rings (Core, IO)
  • Automatic buffer insertion (clock tree synchronised)
  • High fanout buffers (1 to 128) and repeater buffers

Overall it was a significant amount of work. The result is an entirely automated RTL2GDS flow, with no manual intervention required.

coriolis2 converts verilog to BLIF using yosys and the Cell Library, then converts BLIF into a VHDL subset. This subset is extremely simple, comprising links (netlists) to cells and nothing more. It can be extracted and converted to actual VHDL and substituted successfully into verilator, ghdl or icarus simulations using cocotb (caveat: the files are enormous).
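To give a feel for how simple a cell-level netlist is, the sketch below reads the `.subckt` lines of a small BLIF fragment into a list of cell instances. The fragment and its cell and pin names are made up for illustration, the parser ignores BLIF line continuations, and none of this is coriolis2's own code.

```python
# Illustrative BLIF fragment; cell and pin names are hypothetical.
BLIF = """\
.model top
.inputs a b
.outputs y
.subckt nand2_x0 i0=a i1=b nq=n1
.subckt inv_x1 i=n1 nq=y
.end
"""


def parse_subckts(text):
    """Return a list of (cell name, {pin: net}) for each .subckt line."""
    cells = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != ".subckt":
            continue  # skip .model, .inputs, .outputs, .end and blank lines
        cell = fields[1]
        connections = dict(f.split("=", 1) for f in fields[2:])
        cells.append((cell, connections))
    return cells


for cell, connections in parse_subckts(BLIF):
    print(cell, connections)
```

Each entry is just a cell name plus its pin-to-net links, which is the same information the VHDL subset carries.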