Algorithm Accelerators

IP-ALGO-001 v2.0 FPGA Validated

Dyber's algorithm accelerator portfolio provides complete, FPGA-validated hardware implementations of all three NIST post-quantum cryptographic standards (FIPS 203, 204, and 205), plus protocol-level offload engines. Each accelerator performs the full algorithm lifecycle (key generation, encapsulation or signing, and decapsulation or verification) entirely in hardware, delivering roughly 8–28× the performance of optimized software implementations in FPGA-validated measurements.

Overview

Post-quantum cryptographic algorithms are computationally intensive by design. Lattice-based schemes like ML-KEM and ML-DSA rely on polynomial arithmetic over large rings, requiring thousands of modular multiplications per operation. Hash-based schemes like SLH-DSA require extensive Keccak permutation chains. In software, these operations impose 5–50× overhead versus classical cryptography (RSA/ECC), a performance gap that hardware acceleration closes.

Dyber algorithm accelerators are self-contained cryptographic engines. The integrator instantiates each accelerator as a memory-mapped peripheral, writes input data (public keys, messages, ciphertext) through the AMBA interface, triggers the operation, and reads the result. The internal architecture (NTT engines, hash cores, sampling units, and modular arithmetic) is entirely abstracted from the integrator.
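
The write, trigger, read flow described above can be sketched from the host side as follows. This is a minimal illustration only: the register offsets, the `CTRL_START` and `STATUS_DONE` bits, and the `FakeAccelerator` class are all hypothetical and are not taken from any Dyber register map. The fake device simply reverses its input words so the flow can be exercised without hardware.

```python
# Assumed register layout -- hypothetical, NOT from a Dyber datasheet.
REG_CTRL, REG_STATUS = 0x00, 0x04
REG_IN_BASE, REG_OUT_BASE = 0x100, 0x200
CTRL_START, STATUS_DONE = 0x1, 0x1


class FakeAccelerator:
    """Stand-in for the memory-mapped peripheral; it reverses its input words."""

    def __init__(self):
        self.regs = {}
        self.n_in = 0

    def write32(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF
        if REG_IN_BASE <= offset < REG_OUT_BASE:
            # Track how many input words the host has written.
            self.n_in = max(self.n_in, (offset - REG_IN_BASE) // 4 + 1)
        if offset == REG_CTRL and value & CTRL_START:
            words = [self.regs.get(REG_IN_BASE + 4 * i, 0) for i in range(self.n_in)]
            for i, word in enumerate(reversed(words)):
                self.regs[REG_OUT_BASE + 4 * i] = word
            self.regs[REG_STATUS] = STATUS_DONE
            self.n_in = 0

    def read32(self, offset):
        return self.regs.get(offset, 0)


def run_operation(dev, words, max_polls=1000):
    """Write input words, start the operation, poll for completion, read the output."""
    dev.write32(REG_STATUS, 0)
    for i, word in enumerate(words):
        dev.write32(REG_IN_BASE + 4 * i, word)
    dev.write32(REG_CTRL, CTRL_START)
    for _ in range(max_polls):
        if dev.read32(REG_STATUS) & STATUS_DONE:
            return [dev.read32(REG_OUT_BASE + 4 * i) for i in range(len(words))]
    raise TimeoutError("accelerator did not signal completion")


result = run_operation(FakeAccelerator(), [1, 2, 3])
```

A real integration would likely use interrupts or AXI4-Stream transfers for bulk data rather than polling word by word; the sketch only shows the ordering of the steps.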

Accelerator	NIST Standard	Type	Security Levels
DYBER-MLKEM	FIPS 203	Key Encapsulation Mechanism	L1 (512), L3 (768), L5 (1024)
DYBER-MLDSA	FIPS 204	Digital Signature	L2 (44), L3 (65), L5 (87)
DYBER-SLH	FIPS 205	Hash-Based Signature	L1 (128f/128s), L5 (256f)
DYBER-HKEM	IETF Draft	Hybrid KEM Bridge	ECDH + ML-KEM composite
DYBER-TLS	RFC 8446	TLS 1.3 Handshake Offload	ML-KEM + ML-DSA
DYBER-SBOOT	N/A	Secure Boot Verification	ML-DSA chain of trust

DYBER-MLKEM: ML-KEM Key Encapsulation

The DYBER-MLKEM accelerator implements ML-KEM, the key encapsulation mechanism derived from CRYSTALS-Kyber and standardized in FIPS 203. ML-KEM is the primary algorithm for post-quantum key exchange and is expected to replace ECDH in TLS, IPsec, SSH, and most other secure transport protocols.

The accelerator performs three operations: KeyGen (generate public/private key pair), Encapsulate (produce ciphertext and shared secret from a public key), and Decapsulate (recover shared secret from ciphertext and private key). All operations execute entirely in hardware with sub-microsecond to low-microsecond latencies, enabling per-connection key exchange at rates that software implementations cannot approach.

ML-KEM Internal Architecture

Internally, the DYBER-MLKEM accelerator is organized as a multi-stage pipeline:

┌──────────────────────────────────────────────────────────────┐
│  AMBA Bus Interface (AXI4-Lite control / AXI4-Stream data)   │
├──────────────────────────────────────────────────────────────┤
│  Command Sequencer: FSM routing KeyGen/Encaps/Decaps         │
├─────────┬────────────┬──────────────┬────────────────────────┤
│ NTT     │ SHAKE-XOF  │ SAMPLER-CBD  │ POLY-ARITH             │
│ Engine  │ Hash Core  │ Noise Gen    │ Coefficient Ops        │
├─────────┴────────────┴──────────────┴────────────────────────┤
│  Coefficient Memory: banked BRAM for polynomial storage      │
├──────────────────────────────────────────────────────────────┤
│  Key Buffer: isolated storage for private key material       │
└──────────────────────────────────────────────────────────────┘

The Command Sequencer orchestrates the data flow for each operation. For encapsulation, the sequence is: parse public key → generate randomness via SHAKE → CBD noise sampling → NTT forward transform → polynomial multiply-accumulate → NTT inverse → compress → output ciphertext + shared secret. Each sub-operation executes on a dedicated functional unit with data forwarded through internal FIFOs.
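
To make the "randomness via SHAKE, then CBD noise sampling" steps concrete, here is a small software model of centered binomial sampling as FIPS 203 defines it (PRF output from SHAKE-256, then bit sums). It is a reference-style sketch for illustration and says nothing about how the SAMPLER-CBD unit is built internally; the function names are chosen here, not taken from the product.

```python
import hashlib

Q = 3329  # ML-KEM modulus
N = 256   # coefficients per polynomial


def prf(eta, seed, nonce):
    """PRF from FIPS 203: SHAKE-256 over seed || nonce, 64*eta output bytes."""
    return hashlib.shake_256(seed + bytes([nonce])).digest(64 * eta)


def sample_poly_cbd(data, eta):
    """Centered binomial sampling: coefficient = (sum of eta bits) - (sum of eta bits)."""
    if len(data) != 64 * eta:
        raise ValueError("expected 64*eta input bytes")
    # Bits are taken in little-endian order within each byte.
    bits = [(byte >> j) & 1 for byte in data for j in range(8)]
    coeffs = []
    for i in range(N):
        x = sum(bits[2 * i * eta: 2 * i * eta + eta])
        y = sum(bits[2 * i * eta + eta: 2 * i * eta + 2 * eta])
        coeffs.append((x - y) % Q)
    return coeffs


# One noise polynomial from an all-zero 32-byte seed (illustrative input only).
noise = sample_poly_cbd(prf(2, bytes(32), 0), eta=2)
```

Every coefficient lands in the range -eta..eta (stored modulo q), which is why a hardware sampler needs only a few adders per coefficient.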

The NTT engine within DYBER-MLKEM is configurable at synthesis time: integrators choose from NTT-R2 through NTT-R32 depending on throughput requirements. The NTT configuration determines the overall accelerator area and latency characteristics.

ML-KEM Security Level Variants

Variant	NIST Level	Classical Equivalent	Key Size (pk/sk)	Ciphertext	Shared Secret
ML-KEM-512	Level 1	AES-128	800 / 1,632 bytes	768 bytes	32 bytes
ML-KEM-768	Level 3	AES-192	1,184 / 2,400 bytes	1,088 bytes	32 bytes
ML-KEM-1024	Level 5	AES-256	1,568 / 3,168 bytes	1,568 bytes	32 bytes

The accelerator supports runtime parameter selection: a single hardware instance can process ML-KEM-512, -768, or -1024 operations without reconfiguration. The parameter set is specified as part of each command. Resource utilization scales with the chosen NTT engine and maximum supported security level.
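
Because the parameter set travels with each command, host software typically needs a lookup from variant to buffer sizes. The sketch below encodes the sizes from the table above; the `MlKemParams` type and `encaps_buffers` helper are illustrative names, not part of any Dyber driver. The rank `k` (2, 3, 4) comes from FIPS 203.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlKemParams:
    k: int          # module rank from FIPS 203
    pk_bytes: int   # encapsulation (public) key
    sk_bytes: int   # decapsulation (private) key
    ct_bytes: int   # ciphertext
    ss_bytes: int = 32


PARAMS = {
    "ML-KEM-512": MlKemParams(2, 800, 1632, 768),
    "ML-KEM-768": MlKemParams(3, 1184, 2400, 1088),
    "ML-KEM-1024": MlKemParams(4, 1568, 3168, 1568),
}


def encaps_buffers(variant):
    """Return (input_bytes, output_bytes) the host moves for one encapsulation."""
    p = PARAMS[variant]
    return p.pk_bytes, p.ct_bytes + p.ss_bytes


# Sanity check: key sizes follow 384*k + 32 (public) and 768*k + 96 (private).
sizes_consistent = all(
    p.pk_bytes == 384 * p.k + 32 and p.sk_bytes == 768 * p.k + 96
    for p in PARAMS.values()
)
```

Sizing buffers for ML-KEM-1024 covers all three variants, which matches the note that resource use scales with the maximum supported security level.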

Performance note: FPGA-validated hardware delivers 12–28× acceleration across ML-KEM and ML-DSA operations compared with optimized software (OpenSSL 3.2 on current-generation server processors). Exact latency and throughput figures are available under NDA in the evaluation datasheet. Contact ip-sales@dyber.com for access.

DYBER-MLDSA: ML-DSA Digital Signatures

The DYBER-MLDSA accelerator implements ML-DSA, the digital signature algorithm derived from CRYSTALS-Dilithium and standardized in FIPS 204. ML-DSA provides post-quantum digital signatures for code signing, certificate authentication, document integrity, and any application currently using RSA or ECDSA signatures.

Three operations are supported: KeyGen (generate signing/verification key pair), Sign (produce signature over a message), and Verify (validate signature against message and public key).

ML-DSA Internal Architecture

ML-DSA signing is architecturally more complex than ML-KEM due to its rejection sampling loop: the algorithm may need to restart signing if intermediate values exceed certain bounds. Dyber's implementation handles this entirely in hardware with a dedicated retry controller that restarts the signing pipeline without host CPU intervention.

┌──────────────────────────────────────────────────────────────┐
│  AMBA Bus Interface                                          │
├──────────────────────────────────────────────────────────────┤
│  Command Sequencer  +  Rejection Retry Controller            │
├─────────┬──────────┬───────────┬─────────────────────────────┤
│ NTT     │ SHAKE    │ SAMPLER   │ POLY-ARITH                  │
│ Engine  │ XOF      │ Uniform + │ Coefficient arithmetic      │
│         │          │ CBD + Rej │ + norm checking             │
├─────────┴──────────┴───────────┴─────────────────────────────┤
│  Polynomial Memory: larger than ML-KEM (23-bit coefficients) │
├──────────────────────────────────────────────────────────────┤
│  Key Buffer: signing key isolation + hint generation         │
└──────────────────────────────────────────────────────────────┘

The rejection retry controller checks intermediate values after each signing attempt. Among the checks defined in FIPS 204, if the infinity norm of z reaches γ1 − β or the number of 1s in the hint vector exceeds ω, the controller advances the counter used to derive the masking vector and restarts the pipeline. Average signing latency depends on the rejection rate (about 5 attempts on average for ML-DSA-65; FIPS 204 lists an expected 5.1), but the hardware pipeline restarts with no added overhead, so each retry costs less than a software restart.
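
The control flow of such a retry loop, and why average latency scales with the rejection rate, can be modelled in a few lines. This is a toy model under stated assumptions: `make_toy_attempt` accepts at random with a fixed probability instead of evaluating real norm bounds, and all names are illustrative. If each attempt succeeds independently with probability p, the mean number of attempts is 1/p.

```python
import random


def sign_with_retries(attempt, max_attempts=1000):
    """Call attempt(counter) until it returns a signature; return (signature, attempts)."""
    for counter in range(max_attempts):
        signature = attempt(counter)
        if signature is not None:  # every bound check passed
            return signature, counter + 1
    raise RuntimeError("no valid signature within max_attempts")


def make_toy_attempt(rng, accept_probability):
    """Toy stand-in for one signing pass; accepts at random, checks no real norms."""
    def attempt(counter):
        return ("sig", counter) if rng.random() < accept_probability else None
    return attempt


def mean_attempts(accept_probability, trials=20000, seed=0):
    """Estimate the average number of passes per signature by simulation."""
    rng = random.Random(seed)
    attempt = make_toy_attempt(rng, accept_probability)
    return sum(sign_with_retries(attempt)[1] for _ in range(trials)) / trials
```

With an acceptance probability near 0.2, the simulated mean comes out close to five passes, so average signing latency is roughly five times the single-pass latency plus any per-restart cost.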

Verify-only configuration: For applications that only need signature verification (e.g., secure boot, certificate validation), Dyber offers a verify-only variant of DYBER-MLDSA that omits the signing pipeline and rejection controller. This reduces area by approximately 35–40% while retaining full verification throughput.

ML-DSA Security Level Variants

Variant	NIST Level	Classical Equivalent	Public Key	Signature
ML-DSA-44	Level 2	~SHA-256 collision	1,312 bytes	2,420 bytes
ML-DSA-65	Level 3	~AES-192	1,952 bytes	3,309 bytes
ML-DSA-87	Level 5	~AES-256	2,592 bytes	4,627 bytes

Like DYBER-MLKEM, the signature accelerator supports runtime parameter selection across all three security levels from a single hardware instance.

DYBER-SLH: SLH-DSA Hash-Based Signatures

The DYBER-SLH accelerator implements SLH-DSA, the signature scheme derived from SPHINCS+ and standardized in FIPS 205. SLH-DSA provides stateless hash-based digital signatures whose security relies solely on the security of the underlying hash function, offering a conservative alternative to lattice-based schemes for applications requiring defense-in-depth against potential future cryptanalytic advances in lattice problems.

SLH-DSA Architecture

Unlike lattice-based algorithms, SLH-DSA does not use NTT or polynomial arithmetic. Instead, it relies on extensive hash tree computation: generating and traversing Merkle trees of WOTS+ one-time signatures. The computational bottleneck is raw Keccak permutation throughput.

DYBER-SLH integrates multiple KECCAK-CORE instances operating in parallel to accelerate the hash tree construction. The number of parallel hash instances is configurable at synthesis time, trading area for signing speed.
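
A simplified Merkle-root computation shows why hash throughput dominates and why parallel hash cores help. This sketch assumes plain SHAKE-256 over concatenated children; FIPS 205 actually uses tweakable hash functions with addresses and a public seed, which are omitted here, and the function names are illustrative.

```python
import hashlib


def h(data, n=16):
    """n-byte SHAKE-256 digest; a simplification of the tweakable hashes in FIPS 205."""
    return hashlib.shake_256(data).digest(n)


def merkle_root(leaves):
    """Return (root, hash_calls) for a power-of-two list of leaf values."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("leaf count must be a power of two")
    level = [h(leaf) for leaf in leaves]
    calls = len(level)
    while len(level) > 1:
        # Pairs within a level are independent, so parallel hash cores can split them.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        calls += len(level)
    return level[0], calls


root, calls = merkle_root([bytes([i]) for i in range(8)])
```

A tree with 2^h leaves costs 2^(h+1) - 1 hash calls in this model, and each level halves the available parallelism, which is the area-versus-speed trade the synthesis-time core count controls.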

Variant	NIST Level	Signature Size	Use Case
SLH-DSA-128f	Level 1	17,088 bytes	Fast signing when larger signatures are acceptable
SLH-DSA-128s	Level 1	7,856 bytes	Small signatures when signing latency is tolerable
SLH-DSA-256f	Level 5	49,856 bytes	Maximum security with fast signing

SLH-DSA signatures are significantly larger, and signing is significantly slower, than with ML-DSA, but the scheme rests on a fundamentally different security assumption. Dyber recommends SLH-DSA for long-lived root-of-trust signatures (firmware signing, root CA certificates) where independence from lattice assumptions provides additional assurance, and ML-DSA for high-volume operational signatures (TLS, API authentication).

DYBER-HKEM: Hybrid KEM Bridge

The DYBER-HKEM accelerator implements combined classical + post-quantum key exchange for transitional deployments that require backward compatibility with existing PKI infrastructure. The core performs both ECDH and ML-KEM key exchange in parallel and derives the final shared secret by combining both results.

Mode	Classical Component	PQC Component	Combined Security
Hybrid-P256-512	ECDH P-256	ML-KEM-512	128-bit classical + Level 1 PQC
Hybrid-P384-768	ECDH P-384	ML-KEM-768	192-bit classical + Level 3 PQC
Hybrid-X25519-768	X25519	ML-KEM-768	128-bit classical + Level 3 PQC

The hybrid bridge is designed for IETF draft compliance (draft-ietf-tls-hybrid-design) and supports the KDF combination methods specified for TLS 1.3 hybrid key exchange. The classical ECDH component uses a hardened ECC core with constant-time scalar multiplication.
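
The combination step can be illustrated with the concatenation approach described in draft-ietf-tls-hybrid-design, where both shared secrets are concatenated and fed into a KDF. The sketch below uses HKDF-Extract (RFC 5869) with SHA-256 as an example; the salt value, the component order, and the function names are assumptions for illustration, and the exact methods DYBER-HKEM supports are not specified here.

```python
import hashlib
import hmac


def hkdf_extract(salt, ikm):
    """HKDF-Extract from RFC 5869, instantiated with SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def combine_shared_secrets(classical_ss, pqc_ss, salt=bytes(32)):
    """Concatenate both shared secrets and extract one fixed-length secret.

    The order (classical first) is an assumption; real protocols fix it per group.
    """
    return hkdf_extract(salt, classical_ss + pqc_ss)


# Placeholder inputs standing in for ECDH and ML-KEM outputs.
combined = combine_shared_secrets(b"\x01" * 32, b"\x02" * 32)
```

Because both inputs feed the same extraction, the result stays unpredictable as long as either component remains unbroken, which is the property hybrid deployment relies on.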

DYBER-TLS: TLS 1.3 Handshake Offload

The DYBER-TLS engine offloads the complete TLS 1.3 cryptographic handshake from the host CPU. Rather than accelerating individual operations, DYBER-TLS orchestrates the entire key exchange and authentication sequence in hardware:

ClientHello → ServerHello → Key Exchange → Certificate Verify → Finished

The engine integrates DYBER-MLKEM for key exchange, DYBER-MLDSA for certificate verification, and SHA-3 for transcript hashing. The host CPU provides the certificate chain and receives the negotiated session keys; all intermediate cryptographic operations execute in hardware without CPU involvement.

Target application: DPU/SmartNIC deployments where TLS termination is performed at line rate. The DYBER-TLS engine can sustain hundreds of thousands of full handshakes per second, enabling cloud providers to terminate PQC-protected TLS connections at the network edge without CPU overhead.

DYBER-SBOOT: PQC Secure Boot

The DYBER-SBOOT accelerator provides hardware-accelerated firmware chain-of-trust verification using ML-DSA signatures. It is optimized for the boot path: minimal area, fast verification, and simple integration with existing boot ROM architectures.

The core reads firmware images from memory, computes the hash, and verifies the ML-DSA signature against a root-of-trust public key stored in OTP or hardware fuses. Verification completes in microseconds, which is negligible compared to typical firmware load times, enabling quantum-resistant secure boot with no meaningful impact on boot latency.

DYBER-SBOOT supports chained verification (bootloader → kernel → application) with multiple public key slots for key rotation and revocation.
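
The chained flow with key slots can be sketched as below. Everything here is illustrative: `Stage`, `verify_chain`, and the slot layout are hypothetical names, and `toy_sign`/`toy_verify` are a keyed-hash placeholder standing in for ML-DSA so the example runs without a PQC library. The boot stops at the first stage that fails or that references a revoked slot.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    image: bytes
    signature: bytes
    key_slot: int


def toy_sign(key, digest):
    """Placeholder 'signature' (a keyed hash). NOT ML-DSA; for illustration only."""
    return hashlib.sha3_256(key + digest).digest()


def toy_verify(key, digest, signature):
    """Placeholder verifier matching toy_sign; a real design would call ML-DSA verify."""
    return toy_sign(key, digest) == signature


def verify_chain(stages, key_slots, revoked, verify=toy_verify):
    """Check each stage in boot order; return (ok, name_of_failing_stage)."""
    for stage in stages:
        if stage.key_slot in revoked or stage.key_slot not in key_slots:
            return False, stage.name
        digest = hashlib.sha3_256(stage.image).digest()
        if not verify(key_slots[stage.key_slot], digest, stage.signature):
            return False, stage.name
    return True, None
```

Keeping the verifier as a parameter mirrors the hardware split: the sequencing logic is independent of the signature algorithm, and revoking a slot rejects every image signed under it.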

Performance Comparison

Hardware acceleration improves on software implementations by roughly 8–28× across the PQC operations measured. The following table shows representative acceleration factors measured against optimized software on a current-generation server processor (AMD Zen 4):

Operation	Software Baseline	Hardware Acceleration Factor
ML-KEM-768 KeyGen	OpenSSL 3.2 / Zen 4	~28× faster
ML-KEM-768 Encaps	OpenSSL 3.2 / Zen 4	~22× faster
ML-KEM-768 Decaps	OpenSSL 3.2 / Zen 4	~25× faster
ML-DSA-65 Sign	OpenSSL 3.2 / Zen 4	~12× faster (avg)
ML-DSA-65 Verify	OpenSSL 3.2 / Zen 4	~18× faster
SLH-DSA-128f Sign	Reference C / Zen 4	~8× faster
SLH-DSA-128f Verify	Reference C / Zen 4	~15× faster

These figures represent FPGA-validated measurements. ASIC implementation at advanced process nodes is expected to further improve both latency and throughput while substantially reducing power consumption.

Why this matters: A single TLS 1.3 handshake with PQC key exchange (ML-KEM-768) and authentication (ML-DSA-65) requires multiple PQC operations. At cloud provider scale (millions of connections per second across thousands of servers), even small per-operation latency improvements translate to significant infrastructure savings. Hardware acceleration makes PQC deployment cost-neutral versus classical crypto at the infrastructure level.

Multi-Algorithm Deployment

Most real-world PQC deployments require multiple algorithms simultaneously: ML-KEM for key exchange and ML-DSA for authentication at minimum. Dyber accelerators are designed for co-deployment with shared submodules to reduce total area.

Shared NTT: ML-KEM and ML-DSA can share a single NTT engine instance (with different modulus configurations). The NTT engine switches between q=3329 (ML-KEM) and q=8380417 (ML-DSA) via register configuration, with fewer than 10 cycles of switching overhead.
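
The idea of one transform datapath serving two moduli can be modelled with a reconfigurable software NTT. This is a deliberately naive sketch: it computes a plain length-256 cyclic transform in O(n^2), not the incomplete negacyclic NTT of FIPS 203 or the butterfly structure a hardware engine would use, and the `NttEngine` class is a hypothetical name. It relies on 17 being a primitive 256th root of unity modulo 3329 and 1753 being a primitive 512th root modulo 8380417 (so its square is a primitive 256th root).

```python
class NttEngine:
    """Toy length-n cyclic NTT over Z_q, evaluated naively in O(n^2)."""

    def __init__(self, n=256):
        self.n = n
        self.configure(3329, 17)  # default: ML-KEM modulus

    def configure(self, q, root):
        """Switch modulus; `root` must be a primitive n-th root of unity mod q."""
        if pow(root, self.n, q) != 1 or pow(root, self.n // 2, q) != q - 1:
            raise ValueError("root is not a primitive n-th root of unity")
        self.q, self.root = q, root
        self.pows = [pow(root, e, q) for e in range(self.n)]

    def forward(self, a):
        return self._transform(a, 1)

    def inverse(self, a_hat):
        n_inv = pow(self.n, -1, self.q)  # modular inverse of n (Python 3.8+)
        return [x * n_inv % self.q for x in self._transform(a_hat, -1)]

    def _transform(self, a, sign):
        n, q, pows = self.n, self.q, self.pows
        return [sum(a[j] * pows[(sign * i * j) % n] for j in range(n)) % q
                for i in range(n)]


engine = NttEngine()
kem_poly = list(range(256))
kem_roundtrip = engine.inverse(engine.forward(kem_poly))

engine.configure(8380417, pow(1753, 2, 8380417))  # switch to the ML-DSA modulus
dsa_poly = [(7 * i + 3) % 8380417 for i in range(256)]
dsa_roundtrip = engine.inverse(engine.forward(dsa_poly))
```

Only the modulus, the root, and the derived twiddle table change between configurations; the transform logic itself is untouched, which is what makes sharing one engine between the two algorithms plausible.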

Shared SHAKE: Both ML-KEM and ML-DSA use SHAKE-128/256 extensively. A single SHAKE-XOF core can be time-multiplexed between algorithm accelerators when concurrent operation is not required.

Bundled Subsystems: Dyber offers pre-configured multi-algorithm bundles optimized for common deployment scenarios:

Bundle	Included Cores	Target Use Case
PQC-TLS Bundle	MLKEM + MLDSA + shared NTT + SHAKE	TLS termination, web servers
PQC-HSM Bundle	All algorithms + KMU + MASK + QRNG	Hardware security modules
PQC-Edge Bundle	MLKEM-512 + SBOOT + minimal NTT	IoT gateways, constrained devices
PQC-DPU Bundle	TLS Engine + HKEM + KMU + QRNG	SmartNIC/DPU line-rate offload
