
Chisel OpenNPU

An open-source Neural Processing Unit implementation in Chisel 6. Targets low-power, edge-oriented SoC integration.

Source code: GitHub


Notation

The following three symbols appear throughout the documentation, source code, and tests. Confusing them causes hard-to-debug hardware elaboration errors.

Parameter definitions

| Symbol | Meaning | Test default | Top (K=64) |
|---|---|---|---|
| N, written N(bits) | Base lane width in bits. Matches MMALU nbits. Always spelled N(bits) in prose. | 8 | 8 |
| L | Number of base VX registers. Must be divisible by 4. | 32 | 32 |
| K | SIMD lane count per register. Equals the MMALU array-side n at the backend boundary. | 8 | 64 |
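As a sketch of how these constraints might be checked at elaboration time, the Scala case class below encodes the two columns of the parameter table and rejects an L that is not divisible by 4. The name NpuParams and its field names are hypothetical and are not the repository's actual parameter API.

```scala
// Hypothetical parameter bundle mirroring N(bits), L and K from the table.
// NpuParams is an illustrative name, not the project's real configuration class.
final case class NpuParams(nBits: Int, l: Int, k: Int) {
  require(nBits > 0, s"N(bits) must be positive, got $nBits")
  require(l > 0 && l % 4 == 0, s"L must be a positive multiple of 4, got $l")
  require(k > 0, s"K must be positive, got $k")
}

object NpuParams {
  // "Test default" column: N(bits) = 8, L = 32, K = 8
  val testDefault: NpuParams = NpuParams(nBits = 8, l = 32, k = 8)
  // "Top (K=64)" column: N(bits) = 8, L = 32, K = 64
  val top: NpuParams = NpuParams(nBits = 8, l = 32, k = 64)
}
```

Failing fast in the constructor surfaces a bad L as an IllegalArgumentException with a readable message, rather than as an obscure width mismatch later in elaboration.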

All three register classes alias the same physical storage, L × K × N/8 bytes in total:

| Class | Count (L = 32) | Lane width | Aliases |
|---|---|---|---|
| VX[0..L-1] | 32 | N bits | native |
| VE[0..L/2-1] | 16 | 2N bits | VE[i] = VX[2i] ∥ VX[2i+1] |
| VR[0..L/4-1] | 8 | 4N bits | VR[i] = VX[4i..4i+3] |
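The aliasing rules can be made concrete with a small index-mapping sketch: it returns which VX registers back a given VE or VR register and computes the shared storage size. The helper names are hypothetical, and only the register-index mapping is modeled; the bit order inside a concatenated lane is not specified here and is deliberately left out.

```scala
// Sketch of the VX/VE/VR aliasing rules from the register-class table.
// Helper names are hypothetical; lane-internal bit order is not modeled.
object RegAlias {
  // VE[i] = VX[2i] ∥ VX[2i+1], valid for 0 <= i < L/2
  def veBacking(i: Int, l: Int): Seq[Int] = {
    require(i >= 0 && i < l / 2, s"VE index $i out of range for L=$l")
    Seq(2 * i, 2 * i + 1)
  }

  // VR[i] = VX[4i..4i+3], valid for 0 <= i < L/4
  def vrBacking(i: Int, l: Int): Seq[Int] = {
    require(i >= 0 && i < l / 4, s"VR index $i out of range for L=$l")
    4 * i until 4 * i + 4
  }

  // Storage shared by all three classes: L × K × N/8 bytes
  def totalBytes(l: Int, k: Int, nBits: Int): Int = l * k * nBits / 8
}
```

With the test defaults (L = 32, K = 8, N(bits) = 8) the register file holds 256 bytes; the top configuration (K = 64) holds 2048 bytes. Every VX register belongs to exactly one VE and one VR register, which is why L must be divisible by 4.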

ISA Designs


Implementation Details


Tutorials

  • GEMM + Softmax Quantization: a post-accumulation quantization pipeline for transformer attention activations. It demonstrates reduction ops, programmable LUT activation (vlut/vsetlut), numerical stability, and a full end-to-end quantization chain verified against a Scala reference.
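For readers new to the "numerical stability" point, the usual technique is to subtract the row maximum before exponentiating. The floating-point Scala sketch below shows only that generic max-subtraction trick with its two reductions (max and sum); it is not the tutorial's quantized vlut/vsetlut pipeline, and the object name is hypothetical.

```scala
// Generic floating-point reference for numerically stable softmax.
// Standard max-subtraction technique; not the tutorial's quantized pipeline.
object StableSoftmax {
  def softmax(xs: Seq[Double]): Seq[Double] = {
    require(xs.nonEmpty, "softmax of an empty vector is undefined")
    val m = xs.max                          // reduction 1: row maximum
    val exps = xs.map(x => math.exp(x - m)) // every exponent is <= 0, so exp cannot overflow
    val sum = exps.sum                      // reduction 2: row sum, at least 1 (the max term is exp(0))
    exps.map(_ / sum)
  }
}
```

Without the subtraction, an input such as Seq(1000.0, 1000.0) makes math.exp return Infinity and the division produce NaN; with it, the result is exactly Seq(0.5, 0.5).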

Quick Start

# Build the dev image
make image

# Enter the dev container
make container

# Run all tests (inside container or via Docker)
make test

# Elaborate top-level design (writes top.sv)
make build

See README.md for full setup instructions.