Skip to content

[WIP] Buses

NPU has some internal components and buses are the highways for those components to communiate. Following the tranditional computer system design, we will be having three major buses in our NPU: CBUS, ABUS and DBUS, which are control bus, address bus and data bus respectively. Control bus will transmit the decoded instructions to every NPU cores and the address bus is used to locate the data.

Control Bus

Our NPU should support multiple functionality including MMA, VSUB, VADD and other ops. As we are combining VLIW with systolic array, we need to expand our control bit length to support independent control to each cores.

For each 1-way pass, the operation byte in the instruction will look like this:

For each 1-way pass, the overall instruction will look like this:

And we have 4 1-way ALU inside a single PE, which mean it could be 4x 1-way ALUs, 2x 2-way ALUs, 2x 1-way ALUs + 1x 2-way ALU or 1x 4-way ALU. We will have a flexible design regarding to this design.

This would cause some issue on the scheduling. When dispatching 2-way input with 1-way inputs, the control will have to mask the adjacent control signal. To align with the little endian standard we will always listen from the last (lower bit positions). For example, if we dispatch CTRL#0&1 for 1-way 16bit OP, CTRL#2 and CTRL#3 for 2 1-way 8bit OP, and CTRL#3 for 1-way 8bit OP, then this is what we will get:

In this case, the instruction for CTRL#2 group will be ignored and only executing regarding to the signal from CTRL#1 group.