A high-frequency, fully synthesizable hardware implementation of the Keccak Permutation and SHA-3/SHAKE hashing algorithms.
This core utilizes a Multi-Cycle Iterative Architecture. To maximize operating frequency (
- FIPS 202 Compliant: Byte-exact implementation of SHA-3 and SHAKE standards. Verified against 3,592 NIST Test Vectors.
- Runtime Configurable: Switch between 4 modes dynamically via input signals:
  - Fixed-Length: SHA3-256, SHA3-512
  - Extendable-Output (XOF): SHAKE128, SHAKE256
- Standard Interface: AXI4-Stream compliant Sink (Input) and Source (Output) with full backpressure support.
- Robust Architecture:
  - Internal Padding: Automatically handles the FIPS 202 `10*1` padding rule and Domain Separation Suffixes.
  - Safety Features: Integrated SystemVerilog Assertions (SVA) verify state machine stability, counter overflows, and AXI protocol compliance in real-time.
- Production Ready: Written with `default_nettype none` to prevent implicit wire hazards and supports explicit width casting.
| Mode | Security Strength | Rate (r) | Capacity (c) | Suffix |
|---|---|---|---|---|
| SHA3-256 | 128-bit | 1088 bits | 512 bits | 01 |
| SHA3-512 | 256-bit | 576 bits | 1024 bits | 01 |
| SHAKE128 | 128-bit | 1344 bits | 256 bits | 1111 |
| SHAKE256 | 256-bit | 1088 bits | 512 bits | 1111 |
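
The rate and capacity columns always sum to the 1600-bit state width, and the suffix bits combine with the first `1` of the `10*1` pad to form a single byte (`0x06` for SHA3, `0x1F` for SHAKE). A minimal Python sketch of that arithmetic follows; the `MODES` dictionary and helper names are illustrative and are not taken from the RTL.

```python
STATE_BITS = 1600  # width of the Keccak-f[1600] state

# (capacity in bits, domain-separation suffix bits in transmission order),
# copied from the table above
MODES = {
    "SHA3-256": (512, "01"),
    "SHA3-512": (1024, "01"),
    "SHAKE128": (256, "1111"),
    "SHAKE256": (512, "1111"),
}

def rate_bits(mode):
    """Rate is whatever remains of the state after the capacity."""
    capacity, _ = MODES[mode]
    return STATE_BITS - capacity

def suffix_byte(mode):
    """Suffix bits followed by the first '1' of the 10*1 pad, packed LSB-first."""
    _, suffix = MODES[mode]
    bits = suffix + "1"
    return sum(int(bit) << position for position, bit in enumerate(bits))

for mode in MODES:
    print(f"{mode}: rate={rate_bits(mode)} bits "
          f"({rate_bits(mode) // 8} bytes), suffix byte=0x{suffix_byte(mode):02X}")
```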
The core follows a strict Start → Absorb → Permute → Squeeze lifecycle.
- Initialization
  - Assert `start_i` for one cycle while in `STATE_IDLE`
  - Internally:
    - Mode, rate, and suffix are latched
    - State array is wiped to zero
    - Absorb/squeeze counters are reset
- Absorption Phase
  - Input data is streamed via AXI4-Stream sink
  - Backpressure is applied when permutations are running
  - `t_last_i` marks the final message fragment
  - Arbitrary message lengths are supported via `t_keep_i`
- Padding & Final Permutation
  - FIPS 202 domain suffix and `10*1` padding are injected automatically
  - A final 24-round permutation is executed
- Squeeze Phase
  - Output is streamed via AXI4-Stream source
  - For SHA3 modes, output terminates automatically
  - For SHAKE modes, output continues indefinitely until `stop_i` is asserted
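
The lifecycle above can be mirrored in software as a byte-level reference model. The sketch below is a compact Python sponge written for illustration only: it is not the repository's golden model, the function names are assumptions, and it checks itself against Python's `hashlib` rather than against the RTL.

```python
import hashlib

MASK64 = (1 << 64) - 1

def rol64(value, shift):
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64

def keccak_f1600(state):
    """Apply the 24-round permutation to a 200-byte state; returns a new bytearray."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1
    for _ in range(24):
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho + pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear mixing along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant (generated by an LFSR) into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y): 8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out

def sponge(rate_bytes, suffix_byte, message, out_len):
    state = bytearray(200)                       # Start: state wiped to zero
    offset = 0
    while len(message) - offset >= rate_bytes:   # Absorb: one permutation per full block
        for i in range(rate_bytes):
            state[i] ^= message[offset + i]
        state = keccak_f1600(state)
        offset += rate_bytes
    tail = message[offset:]
    for i, byte in enumerate(tail):
        state[i] ^= byte
    state[len(tail)] ^= suffix_byte              # Pad: suffix + first '1' of 10*1
    state[rate_bytes - 1] ^= 0x80                # Pad: final '1' of 10*1
    state = keccak_f1600(state)                  # final permutation
    out = bytearray(state[:rate_bytes])          # Squeeze: re-permute when rate is used up
    while len(out) < out_len:
        state = keccak_f1600(state)
        out += state[:rate_bytes]
    return bytes(out[:out_len])

msg = b"abc" * 100
assert sponge(136, 0x06, msg, 32) == hashlib.sha3_256(msg).digest()
assert sponge(168, 0x1F, msg, 200) == hashlib.shake_128(msg).digest(200)
print("reference sponge matches hashlib")
```

The SHAKE call requests 200 bytes, which exceeds the 168-byte rate, so it also exercises the extra permutation between squeeze blocks.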
`keccak_mode_i` must remain stable after `start_i` until the core returns to `STATE_IDLE`.

- `t_ready_o` is deasserted while permutations are running
- Data is only accepted when `t_valid_i && t_ready_o`
- Input signals must remain stable while `t_ready_o` is low
- `t_valid_o` is asserted only when downstream is ready
- Output data, keep, and last remain stable under backpressure
- SHAKE modes may produce unlimited output blocks
The architecture centers around a 1600-bit (200-byte) State Array that circulates through four specialized processing units in a feedback loop:
- Keccak Absorb Unit (KAU): Manages the "Sponge" construction by XORing incoming AXI data streams into the state array. It handles partial-block buffering and rate-boundary crossings.
- Suffix Padder Unit (SPU): Injects the domain separation bits (e.g., `0x06` for SHA3) and the FIPS 202 `10*1` padding rule once the message is complete.
- Keccak Step Unit (KSU): The computational heart of the core. It executes the 24 rounds of permutations ($\theta, \rho, \pi, \chi, \iota$).
- Keccak Output Unit (KOU): Truncates the state array to the desired rate (r) and linearizes the data onto the AXI4-Stream output bus during the Squeeze phase.
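
To make the padder's job concrete, the sketch below builds the final rate-sized block in Python. It assumes byte-aligned messages and the byte-level suffix values `0x06` (SHA3) and `0x1F` (SHAKE); `pad_final_block` is a hypothetical helper and says nothing about how the SPU is actually structured in RTL.

```python
def pad_final_block(tail, rate_bytes, suffix_byte):
    """Return the last block: message tail, suffix byte, zero fill, closing 0x80."""
    if len(tail) >= rate_bytes:
        raise ValueError("tail must be shorter than the rate")
    block = bytearray(rate_bytes)
    block[:len(tail)] = tail
    block[len(tail)] ^= suffix_byte   # domain suffix plus the first '1' of 10*1
    block[-1] ^= 0x80                 # last '1' of 10*1, always in the final rate byte
    return bytes(block)

# Empty SHA3-256 message: suffix in byte 0, closing bit in byte 135.
empty = pad_final_block(b"", 136, 0x06)
print(hex(empty[0]), hex(empty[-1]))

# Tail one byte short of the rate: both pad bits share the last byte.
tight = pad_final_block(b"\xAA" * 135, 136, 0x06)
print(hex(tight[-1]))
```

The second case is the corner worth testing in hardware: the suffix and the closing bit are XORed into the same byte, giving `0x86` rather than two separate bytes.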
The design is orchestrated by a centralized FSM with the following states:
- IDLE
  - Waits for `start_i`
  - Core is quiescent; AXI interfaces inactive
- ABSORB
  - Accepts AXI4-Stream input data
  - Handles partial words using `t_keep`
  - Supports carry-over when rate boundaries are crossed
  - Automatically schedules permutations when the rate is full
- SUFFIX_PADDING
  - Injects FIPS 202 domain separation suffix
  - Appends final `1` bit according to `10*1` padding rule
- PERMUTATION PIPELINE
  - Each Keccak round is decomposed into 5 FSM states: `THETA → RHO → PI → CHI → IOTA`
  - A full permutation requires 24 rounds × 5 cycles = 120 cycles
- SQUEEZE
  - Streams output blocks via AXI4-Stream
  - Automatically re-enters permutation when rate is exhausted
  - Terminates on:
    - Hash completion (SHA3)
    - External `stop_i` (SHAKE)
The absorb unit supports input fragments that cross rate boundaries without data loss.
- Partial input words are tracked using `t_keep`
- Excess bytes are buffered internally (`carry_over`)
- Carry-over data is automatically re-injected on the next absorb cycle
- No external re-alignment or padding is required from the user
This allows seamless hashing of arbitrarily-sized messages using wide AXI data paths.
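
Carry-over is needed because the rate is not a multiple of the bus width: a 136-byte SHA3-256 block holds 4.25 beats of 32 bytes. The Python sketch below models only the byte accounting, with `t_keep` simplified to a count of valid low-order bytes. The function name is hypothetical, and the real unit XORs data into the state rather than collecting it in a buffer.

```python
def split_into_rate_blocks(beats, rate_bytes):
    """Group (data, valid_byte_count) beats into full rate blocks plus leftover bytes."""
    blocks = []
    pending = bytearray()                 # bytes not yet part of a completed block
    for data, valid in beats:
        pending += data[:valid]           # keep only the bytes flagged as valid
        while len(pending) >= rate_bytes:
            # Rate boundary reached: a permutation would be scheduled here.
            blocks.append(bytes(pending[:rate_bytes]))
            # Whatever spilled past the boundary plays the role of carry_over.
            pending = pending[rate_bytes:]
    return blocks, bytes(pending)

# 150-byte message on a 32-byte bus: four full beats plus one 22-byte beat.
message = bytes(range(150))
beats = [(message[i:i + 32].ljust(32, b"\x00"), len(message[i:i + 32]))
         for i in range(0, len(message), 32)]
blocks, carry = split_into_rate_blocks(beats, rate_bytes=136)
print(len(blocks), "full block,", len(carry), "bytes carried over")
```

The fifth beat straddles the boundary: its first 8 bytes complete the 136-byte block and the remainder is held for the next one.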
- Permutation latency: 120 cycles per Keccak-f[1600]
- Absorb throughput: 256 bits per accepted AXI beat
- Squeeze throughput: 256 bits per cycle (subject to backpressure)
- Critical path: Single Keccak step (θ, ρ, π, χ, or ι)
The multi-cycle round decomposition significantly reduces combinational depth, enabling higher achievable clock frequencies compared to single-cycle designs.
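
These figures allow a rough cycle estimate for a whole hash. The Python sketch below is a back-of-the-envelope model, not a measurement: it assumes one input beat per cycle, no downstream stalls, at least one input beat even for an empty message, and zero cycles for FSM transitions and padding states, so real cycle counts will be somewhat higher.

```python
import math

PERMUTATION_CYCLES = 24 * 5   # 24 rounds x 5 step states, as stated above

def estimate_cycles(message_bytes, rate_bits, out_bits, dwidth_bits=256):
    """Optimistic cycle count for one hash under the assumptions in the lead-in."""
    rate_bytes = rate_bits // 8
    beat_bytes = dwidth_bits // 8
    absorb_beats = max(1, math.ceil(message_bytes / beat_bytes))
    # Padding always adds bits, so a message of exactly k blocks needs k + 1 permutations.
    absorb_permutations = message_bytes // rate_bytes + 1
    squeeze_blocks = math.ceil(out_bits / rate_bits)
    squeeze_beats = math.ceil(out_bits / dwidth_bits)
    permutations = absorb_permutations + (squeeze_blocks - 1)
    return absorb_beats + permutations * PERMUTATION_CYCLES + squeeze_beats

size = 1 << 20  # 1 MiB message hashed with SHA3-256
cycles = estimate_cycles(size, rate_bits=1088, out_bits=256)
print(f"{cycles} cycles, about {8 * size / cycles:.2f} message bits per cycle")
```

For long messages the permutation term dominates, so sustained throughput is bounded by roughly the rate divided by 120 cycles, regardless of bus width.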
- `DWIDTH`: Input Data Width (Default: 256 bits)
- `MAX_OUTPUT_DWIDTH`: Output Data Width (Default: 256 bits)
| Signal Group | Name | Direction | Width | Description |
|---|---|---|---|---|
| System | `clk` | Input | 1 | System Clock (Rising Edge) |
| | `rst` | Input | 1 | Synchronous Active-High Reset |
| Control | `start_i` | Input | 1 | Pulse high to reset FSM and start new hash |
| | `keccak_mode_i` | Input | 2 | `00`: SHA3-256, `01`: SHA3-512, `10`: SHAKE128, `11`: SHAKE256 |
| | `stop_i` | Input | 1 | Stops output generation (Required for XOF modes) |
| AXI Sink | `t_data_i` | Input | 256 | Input Message Data |
| | `t_valid_i` | Input | 1 | Master Valid |
| | `t_last_i` | Input | 1 | Assert high on the final chunk of the message |
| | `t_keep_i` | Input | 32 | Byte Enable (1 bit per byte). `t_keep[0]` is LSB. |
| | `t_ready_o` | Output | 1 | Slave Ready. Core pulls low when processing permutation. |
| AXI Source | `t_data_o` | Output | 256 | Hash Output Data |
| | `t_valid_o` | Output | 1 | Master Valid |
| | `t_last_o` | Output | 1 | End of Hash (High for SHA3, Low for SHAKE) |
| | `t_keep_o` | Output | 32 | Byte Enable for output data |
| | `t_ready_i` | Input | 1 | Slave Ready (Backpressure from downstream) |
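
A testbench driver has to turn a byte string into sink-side beats. The Python sketch below shows one way to do that, assuming the first message byte travels in `t_data_i[7:0]` (the usual AXI4-Stream byte-lane convention) and that valid bytes are contiguous from lane 0. Whether the core accepts a single beat with an all-zero `t_keep_i` for an empty message is not stated here, so that case is an assumption too; `to_beats` is a hypothetical helper.

```python
def to_beats(message, dwidth_bits=256):
    """Split a message into (t_data, t_keep, t_last) tuples for the AXI sink."""
    beat_bytes = dwidth_bits // 8
    chunks = [message[i:i + beat_bytes]
              for i in range(0, len(message), beat_bytes)] or [b""]
    beats = []
    for index, chunk in enumerate(chunks):
        # Zero-fill unused lanes; byte 0 of the chunk lands in bits [7:0].
        t_data = int.from_bytes(chunk.ljust(beat_bytes, b"\x00"), "little")
        # One keep bit per valid byte, with t_keep[0] covering the lowest byte.
        t_keep = (1 << len(chunk)) - 1
        t_last = index == len(chunks) - 1
        beats.append((t_data, t_keep, t_last))
    return beats

for t_data, t_keep, t_last in to_beats(bytes(range(40))):
    print(f"t_keep={t_keep:#010x} t_last={int(t_last)}")
```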
This project utilizes a dual-verification strategy: SystemVerilog Assertions (SVA) for runtime protocol checking and Python-generated NIST vectors for standard compliance. Continuous Integration (CI) is handled via GitHub Actions to ensure build integrity on every Pull Request.
This core has been verified against the official NIST Cryptographic Algorithm Validation Program (CAVP) test vectors. A dedicated "Heavy" testbench (keccak_core_heavy_tb.sv) handles the automated regression of over 3,500 test vectors using a Two-Pass Python Runner to optimize disk usage.
| Standard | File Type | Count | Status |
|---|---|---|---|
| SHA3 | ShortMsg, LongMsg | 100% | PASS |
| SHAKE | ShortMsg, LongMsg, VariableOut | 100% | PASS |
| Total | All Vectors | 3,592 | PASS |
The simulation environment relies on ModelSim (Intel FPGA Lite). Since ModelSim ASE is a 32-bit application, running it on modern 64-bit Linux distributions (like Ubuntu 20.04/22.04) requires specific 32-bit compatibility libraries and a kernel check patch.
Install Dependencies:

```bash
# 1. Add architecture and update
sudo dpkg --add-architecture i386
sudo apt-get update
# 2. Install core build tools
sudo apt-get install -y wget build-essential
# 3. Install required 32-bit libraries (Required for ModelSim ASE)
sudo apt-get install -y libc6:i386 libncurses5:i386 libstdc++6:i386 \
    lib32ncurses6 libxft2 libxft2:i386 libxext6 libxext6:i386
```

Patching ModelSim (Critical for Modern Linux):
If ModelSim fails to launch or hangs, apply these patches to the `vco` script (located in `<install_dir>/modelsim_ase/vco`) to fix OS detection and force 32-bit mode:

```bash
# Fix Red Hat directory detection logic
sudo sed -i 's/linux_rh[[:digit:]]\+/linux/g' <path_to_modelsim>/vco
# Force 32-bit mode
sudo sed -i 's/MTI_VCO_MODE:-\"\"/MTI_VCO_MODE:-\"32\"/g' <path_to_modelsim>/vco
```

The repository includes a Makefile that handles compiling, running, and waveform generation for multiple testbenches.
Setup Environment:
Ensure the path in `env.sh` points to your specific ModelSim installation (e.g., /opt/intelFPGA_lite/... or /pkgcache/...).

```bash
source env.sh
```

Run All Tests: This will execute the entire suite of unit tests and the full core integration test.

```bash
make
```

Run Specific Test: You can target individual modules (Unit Tests) using the `run_<tb_name>` target:

```bash
make run_theta_step_tb
make run_keccak_core_tb
make run_keccak_absorb_unit_tb
```

Clean Artifacts: Removes generated work libraries and `.vcd` waveform files.

```bash
make clean
```

Viewing Waveforms: Every simulation run automatically generates a corresponding Value Change Dump (`.vcd`) file (e.g., `keccak_core_tb.vcd`) which can be opened in GTKWave or ModelSim.
This section describes how to execute the full NIST FIPS 202 compliance regression, consisting of vector generation followed by a two-pass simulation run.
Convert the official NIST .rsp files into a single consolidated vectors.txt file consumed by the heavy testbench.
```bash
cd verif/
# Parse ALL vectors (≈ 4,000 total; full compliance run)
python parse_nist_vectors.py --full test_vectors/SHA3/*.rsp test_vectors/SHAKE/*.rsp
# OR parse a reduced subset (default: 10 per file) for quick sanity checks
# python parse_nist_vectors.py test_vectors/SHA3/*.rsp test_vectors/SHAKE/*.rsp
cd ..
```

Invoke the heavy regression runner:

```bash
python tb/run_heavy.py
```

The regression executes in two passes:
- Fast Pass: Runs all test vectors with waveform generation disabled for maximum throughput.
- Debug Pass (On Failure Only): Automatically re-runs the failing test ID with VCD recording enabled and archives the waveform for inspection.
- Logs: Full simulation output is written to `regression.log`.
- Failure Artifacts: Waveform dumps (`.vcd`) for failing vectors are stored in `failures/`.
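
The two-pass idea is simple to express in code. The Python sketch below is not the logic of `run_heavy.py`; it only illustrates the pattern, with a caller-supplied `simulate(vector_id, dump_vcd)` hook (hypothetical) standing in for a ModelSim invocation. It re-runs every failing ID, whereas the actual runner may stop at the first failure.

```python
def run_two_pass(vector_ids, simulate):
    """Run all vectors without waveforms, then re-run only failures with VCD enabled."""
    # Pass 1: fast, no waveform dumping, collect the IDs that fail.
    failures = [vid for vid in vector_ids if not simulate(vid, dump_vcd=False)]
    # Pass 2: pay the disk and runtime cost of VCD only for failing vectors.
    for vid in failures:
        simulate(vid, dump_vcd=True)
    return failures

# Stand-in simulator so the sketch runs without ModelSim: vector 7 "fails".
calls = []

def fake_simulate(vector_id, dump_vcd):
    calls.append((vector_id, dump_vcd))
    return vector_id != 7

failed = run_two_pass(range(10), fake_simulate)
print("failing vectors:", failed)
```

Because waveforms are only written in the second pass, disk usage scales with the number of failures rather than with the size of the vector set.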
The repository is organized into RTL source, testbenches, and verification scripts:
```
.
├── docs/                       # Architecture Diagrams & FSM Specs
├── python_testing/             # Step-mapping Golden Models (Python)
├── rtl/                        # SystemVerilog Source Code
│   ├── keccak_core.sv          # Top-level Module
│   ├── keccak_pkg.sv           # Global Parameters & Enums
│   ├── keccak_step_unit.sv     # Permutation Round Logic
│   ├── keccak_absorb_unit.sv   # Input Buffering & XOR Logic
│   ├── keccak_output_unit.sv   # Output Linearization & Squeeze
│   ├── suffix_padder_unit.sv   # FIPS 202 Padding Logic
│   ├── merge_sv.py             # Script to bundle RTL for synthesis
│   └── *_step.sv               # Individual Step Modules (Chi, Rho, etc.)
├── tb/                         # SystemVerilog Testbenches
│   ├── keccak_core_tb.sv       # Integration Testbench
│   ├── keccak_core_heavy_tb.sv # NIST Compliance Regression
│   ├── run_heavy.py            # Python Automation Runner
│   └── *_step_tb.sv            # Unit Testbenches for Sub-modules
├── verif/                      # NIST Compliance Suite
│   ├── parse_nist_vectors.py   # .rsp to vectors.txt parser
│   └── test_vectors/           # Official NIST CAVP Test Vectors
├── Makefile                    # Simulation & Build automation
└── README.md
```

