Cryptographic Primitives

IP-PRIM-001 v2.0

Dyber's cryptographic primitive IP cores are the building blocks used within the algorithm accelerators, but each is also available as a standalone, independently licensable module. This enables integrators to compose custom cryptographic subsystems, accelerate specific bottleneck operations in software-driven implementations, or build proprietary algorithms on top of validated hardware primitives.

Overview

The primitive library divides into five functional groups: hash and extendable-output functions, modular arithmetic, stochastic sampling, entropy generation, and key management. Each primitive provides a well-defined interface and can be instantiated independently or composed with other primitives via internal FIFO connections.

| Group | Cores | Primary Role in PQC |
| --- | --- | --- |
| Hash / XOF | KECCAK-CORE, SHA3-HASH, SHAKE-XOF | Message hashing, key derivation, matrix expansion, commitment |
| Arithmetic | MOD-BARRETT, MOD-MONT, POLY-ARITH | Modular reduction, polynomial operations, NTT butterfly support |
| Sampling | SAMPLER-CBD, SAMPLER-UNIFORM | Noise vector generation, public matrix generation |
| Entropy | DYBER-QRNG, ENTROPY-COND | True random number generation, seed material, nonce production |
| Key Management | DYBER-KMU | Key lifecycle, storage, derivation, zeroization |

KECCAK-CORE — Raw Permutation Engine

The KECCAK-CORE implements the Keccak-f[1600] permutation — the mathematical foundation underlying the entire SHA-3 family. This is the lowest-level hash primitive: it accepts a 1600-bit state, performs 24 rounds of the Keccak permutation, and outputs the transformed state.

Architecture: One round per clock cycle, so a complete permutation takes 24 clock cycles. The round function (θ, ρ, π, χ, ι) is fully unrolled into a single combinational stage that processes one round per clock edge, with the 1600-bit state held in a register between rounds.

Use cases: Custom sponge constructions, proprietary hash modes, or direct integration into algorithm pipelines where the SHA-3/SHAKE wrapper overhead is unnecessary. Most integrators should use SHA3-HASH or SHAKE-XOF instead.
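A bit-exact software model is a convenient known-answer reference when integrating KECCAK-CORE. The sketch below is a minimal Python rendering of Keccak-f[1600] written from FIPS 202, not Dyber's RTL; each loop iteration applies one complete round (θ, ρ, π, χ, ι), mirroring the one-round-per-cycle schedule described above. The `sha3_256_model` wrapper (a hypothetical helper) adds SHA-3 padding and a sponge so the result can be checked against Python's `hashlib`.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(v, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK64


def rc_bit(t):
    """FIPS 202 Algorithm 5: one output bit of the round-constant LFSR."""
    if t % 255 == 0:
        return 1
    r = [1, 0, 0, 0, 0, 0, 0, 0]
    for _ in range(t % 255):
        r = [0] + r
        for i in (0, 4, 5, 6):
            r[i] ^= r[8]
        r = r[:8]
    return r[0]


# iota constants: bit (2^j - 1) of RC[ir] is rc(j + 7*ir), for j = 0..6
ROUND_CONSTANTS = [
    sum(rc_bit(j + 7 * ir) << ((1 << j) - 1) for j in range(7)) for ir in range(24)
]

# rho offsets per lane (index x + 5*y), walking the (x, y) -> (y, 2x + 3y) orbit
RHO = [0] * 25
_x, _y = 1, 0
for _t in range(24):
    RHO[_x + 5 * _y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def keccak_f1600(a):
    """Apply 24 rounds of Keccak-f[1600] to a list of 25 64-bit lanes."""
    for rc in ROUND_CONSTANTS:  # one iteration = one round
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho + pi: rotate each lane, then move lane (x, y) to (y, 2x + 3y)
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = rol64(a[x + 5 * y], RHO[x + 5 * y])
        # chi: the only non-linear step, applied row by row
        a = [b[i] ^ (~b[(i + 1) % 5 + 5 * (i // 5)] & b[(i + 2) % 5 + 5 * (i // 5)])
             for i in range(25)]
        # iota: XOR the round constant into lane (0, 0)
        a[0] ^= rc
    return a


def sha3_256_model(msg: bytes) -> bytes:
    """SHA3-256 built on the model permutation (rate = 136 bytes)."""
    rate = 136
    padded = bytearray(msg) + b"\x06"          # SHA-3 domain bits 01 + first pad bit
    padded += b"\x00" * (-len(padded) % rate)  # pad10*1 zero fill
    padded[-1] |= 0x80                         # final pad bit
    state = [0] * 25
    for off in range(0, len(padded), rate):   # absorb one block per permutation
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i:off + 8 * i + 8], "little")
        state = keccak_f1600(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state)[:32]  # squeeze


for m in (b"", b"abc", bytes(range(200))):
    assert sha3_256_model(m) == hashlib.sha3_256(m).digest()
```

Changing the rate, padding byte (0x1F for SHAKE) and output length yields the other SHA-3 and SHAKE variants; only the sponge wrapper changes, which is the same division of labour as between KECCAK-CORE and the SHA3-HASH and SHAKE-XOF wrappers.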

SHA3-HASH — SHA-3 Hash Core

Complete SHA-3 implementation supporting SHA3-256 and SHA3-512 digest modes. The core handles message padding, absorption, squeezing, and multi-block processing automatically — the integrator feeds message data and reads the final hash digest.

Interface: AXI4-Stream input for message data, AXI4-Lite control for mode selection and status, AXI4-Stream or register-mapped output for the digest. Supports streaming operation where the message length is not known in advance.

Throughput: Hash rates reach multiple Gbps, depending on message length and clock frequency. Short (single-block) messages are dominated by permutation latency; long messages approach the theoretical sponge rate. Detailed throughput curves are available in the evaluation datasheet.
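The long-message limit follows from simple sponge arithmetic: each 24-cycle permutation absorbs one rate-sized block (136 bytes for SHA3-256, 72 bytes for SHA3-512), and padding always adds at least one byte. The sketch below estimates effective throughput under the assumption that loading the next block overlaps with the permutation; the 250 MHz clock is a hypothetical example, not a Dyber specification.

```python
RATE_BYTES = {"SHA3-256": 136, "SHA3-512": 72}
CYCLES_PER_PERMUTATION = 24  # one round per clock


def blocks_needed(msg_len: int, mode: str) -> int:
    """Padding adds at least one byte, so n message bytes need n // rate + 1 blocks."""
    return msg_len // RATE_BYTES[mode] + 1


def effective_gbps(msg_len: int, mode: str, clock_hz: float) -> float:
    """Message bits hashed per second, counting permutation cycles only."""
    cycles = blocks_needed(msg_len, mode) * CYCLES_PER_PERMUTATION
    return msg_len * 8 * clock_hz / cycles / 1e9


CLOCK_HZ = 250e6  # hypothetical clock frequency
for n in (32, 135, 136, 4096, 1_000_000):
    print(f"{n:>9} B  SHA3-256 {effective_gbps(n, 'SHA3-256', CLOCK_HZ):6.2f} Gbps  "
          f"SHA3-512 {effective_gbps(n, 'SHA3-512', CLOCK_HZ):6.2f} Gbps")
```

At this assumed clock the SHA3-256 limit is 136 × 8 × 250 MHz / 24 ≈ 11.3 Gbps. A 136-byte message runs at roughly half the rate of a 135-byte one, because it needs a second block that carries only padding.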

SHAKE-XOF — Extendable Output Function

SHAKE-128 and SHAKE-256 extendable-output function implementation. Unlike fixed-output SHA-3, SHAKE can produce an arbitrary number of output bytes — making it essential for ML-KEM and ML-DSA, where SHAKE is used to expand seeds into full public matrices and noise vectors.

Architecture: Wraps KECCAK-CORE with sponge state management that supports continuous squeezing. After absorbing the seed, the core produces an unbounded output stream at the rate of one KECCAK permutation per squeeze block. An internal counter tracks the squeeze position for deterministic reproducibility.
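Because the output is a single deterministic stream, any split into squeeze requests must return the same bytes as one large request, and the squeeze position alone determines how many permutations have run. The model below illustrates this with Python's `hashlib.shake_128`; `ShakeStream` and its members are hypothetical names, and because `hashlib` cannot squeeze incrementally, the model recomputes the prefix on every read.

```python
import hashlib

SHAKE128_RATE = 168  # bytes produced per squeeze block (one permutation)


class ShakeStream:
    """Hypothetical model of a continuously squeezed SHAKE-128 instance."""

    def __init__(self, seed: bytes):
        self._seed = seed
        self.position = 0  # squeeze counter: bytes delivered so far

    def read(self, n: int) -> bytes:
        # hashlib has no incremental squeeze, so recompute the prefix and slice it
        out = hashlib.shake_128(self._seed).digest(self.position + n)[self.position:]
        self.position += n
        return out

    def squeeze_blocks(self) -> int:
        """Output-producing permutations so far, assuming the seed fits in one block."""
        return -(-self.position // SHAKE128_RATE)


s = ShakeStream(bytes(32))
chunks = s.read(100) + s.read(300)
assert chunks == hashlib.shake_128(bytes(32)).digest(400)
print(s.position, s.squeeze_blocks())  # 400 bytes -> 3 squeeze blocks
```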

Critical role in PQC: SHAKE is the second most computationally expensive operation in ML-KEM and ML-DSA after NTT. During ML-KEM key generation, SHAKE-128 expands the 32-byte public seed ρ into the full k×k public matrix A: each of the k² entries is sampled from its own SHAKE-128 instance, seeded with ρ and the entry's two index bytes, and each instance requires several squeeze blocks. Hardware SHAKE acceleration can reduce ML-KEM key generation time by 20–30% beyond NTT acceleration alone.
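The cost of matrix expansion can be estimated in software. The sketch below follows the ML-KEM matrix-sampling procedure from FIPS 203 (SampleNTT, with entry (i, j) seeded by ρ‖j‖i) and counts the 168-byte SHAKE-128 squeeze blocks each entry consumes. It is a reference model built on Python's `hashlib`, not a description of the SHAKE-XOF datapath; because `hashlib` cannot squeeze incrementally, it recomputes a longer digest whenever another block is needed.

```python
import hashlib
import os

Q, N, RATE = 3329, 256, 168  # ML-KEM modulus, polynomial degree, SHAKE-128 rate


def sample_ntt(seed34: bytes):
    """Rejection-sample one NTT-domain polynomial; return (coefficients, blocks used)."""
    coeffs, pos, blocks = [], 0, 1
    buf = hashlib.shake_128(seed34).digest(RATE)
    while len(coeffs) < N:
        if pos == len(buf):  # 168 = 56 * 3, so 3-byte groups never straddle blocks
            blocks += 1
            buf = hashlib.shake_128(seed34).digest(blocks * RATE)
        c0, c1, c2 = buf[pos:pos + 3]
        pos += 3
        d1 = c0 + 256 * (c1 % 16)  # two 12-bit candidates per 3 bytes
        d2 = c1 // 16 + 16 * c2
        if d1 < Q:
            coeffs.append(d1)
        if d2 < Q and len(coeffs) < N:
            coeffs.append(d2)
    return coeffs, blocks


def expand_matrix(rho: bytes, k: int):
    """Sample the k x k matrix A (NTT domain) and count SHAKE-128 squeeze blocks."""
    total_blocks = 0
    a_hat = [[None] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            a_hat[i][j], used = sample_ntt(rho + bytes([j, i]))
            total_blocks += used
    return a_hat, total_blocks


rho = os.urandom(32)
for k, name in ((2, "ML-KEM-512"), (3, "ML-KEM-768"), (4, "ML-KEM-1024")):
    _, blocks = expand_matrix(rho, k)
    print(f"{name}: {k * k} entries, {blocks} squeeze blocks")
```

Each entry needs at least three blocks (256 candidates consume at least 384 bytes), and with an acceptance rate of 3329/4096 ≈ 81% three blocks usually suffice, so ML-KEM-768 typically spends around 27 permutations on A alone.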

MOD-BARRETT — Barrett Modular Reduction

Single-cycle Barrett reduction unit with compile-time configurable modulus. The Barrett algorithm replaces division with multiplication by a precomputed reciprocal, enabling efficient modular reduction without a hardware divider.

Supported moduli: q = 3329 (ML-KEM, 12-bit), q = 8380417 (ML-DSA, 23-bit), and custom moduli configurable at synthesis time. Runtime switching between ML-KEM and ML-DSA moduli is supported via configuration register.

Latency: Single clock cycle from input to reduced output. Pipelined for back-to-back operation with no stall cycles.

Usage: Integrated within NTT butterfly units to reduce twiddle-factor products. Also used standalone to reduce polynomial coefficients after addition or subtraction.
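The reduction can be modelled in a few lines. The sketch below uses one common Barrett formulation, with k equal to the bit length of q and a precomputed m = floor(2^(2k) / q), valid for inputs below 2^(2k), which covers the product of two reduced coefficients. The quotient estimate is low by at most one, so a single conditional subtraction completes the reduction. The exact formulation inside MOD-BARRETT is not specified here, so treat this as an illustrative model.

```python
import random


def barrett_params(q: int):
    """Synthesis-time constants: shift k and reciprocal m = floor(2^(2k) / q)."""
    k = q.bit_length()
    return k, (1 << (2 * k)) // q


def barrett_reduce(x: int, q: int, k: int, m: int) -> int:
    """Reduce 0 <= x < 2^(2k) modulo q without a divider."""
    assert 0 <= x < (1 << (2 * k))
    t = (x * m) >> (2 * k)  # estimate of floor(x / q), low by at most 1
    r = x - t * q           # so 0 <= r < 2q
    r -= q * (r >= q)       # one conditional subtraction (a mux in hardware)
    return r


for q in (3329, 8380417):  # ML-KEM and ML-DSA moduli
    k, m = barrett_params(q)
    for _ in range(10_000):
        a, b = random.randrange(q), random.randrange(q)
        assert barrett_reduce(a * b, q, k, m) == (a * b) % q
    print(f"q={q}: k={k}, m={m}")
```

For q = 3329 this gives k = 12 and m = 5039, so the reduction needs only a 12 × 24-bit multiply, a shift, and a subtraction.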

MOD-MONT — Montgomery Multiplication

Montgomery modular multiplication unit. Computes a·b·R⁻¹ mod q, where R is the Montgomery radix (typically a power of two larger than q; q must be odd so that R is invertible mod q). Operands must be in Montgomery representation, but the unit avoids explicit division, which makes it well suited to long sequences of multiplications such as NTT butterfly computations.

Configurable width: Operand width is parameterized at synthesis time. Supports 12-bit (ML-KEM), 23-bit (ML-DSA), and arbitrary widths up to 64-bit for custom applications or future algorithm support.

Constant-time guarantee: Execution time is independent of operand values. No early-exit optimizations that could leak information through timing.
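A minimal model of Montgomery multiplication (REDC) is sketched below. It assumes R = 2^w with w equal to the bit length of q, which is one valid choice for a width-parameterized unit (R must exceed q and be coprime to it); the radix MOD-MONT actually uses at each width is not specified here. The final correction is a single conditional subtraction written without a data-dependent branch, in the spirit of the constant-time requirement.

```python
import random


class Montgomery:
    """Model of a Montgomery multiplier with radix R = 2^w (w = bit length of q)."""

    def __init__(self, q: int):
        assert q % 2 == 1, "q must be odd so that gcd(R, q) = 1"
        self.q = q
        self.w = q.bit_length()
        self.R = 1 << self.w
        self.q_neg_inv = (-pow(q, -1, self.R)) % self.R  # q' with q * q' = -1 mod R

    def redc(self, t: int) -> int:
        """Return t * R^-1 mod q for 0 <= t < q * R."""
        m = ((t & (self.R - 1)) * self.q_neg_inv) & (self.R - 1)
        u = (t + m * self.q) >> self.w    # exact division by R
        return u - self.q * (u >= self.q)  # u < 2q: one conditional subtraction

    def mul(self, a_m: int, b_m: int) -> int:
        """Montgomery product a_m * b_m * R^-1 mod q."""
        return self.redc(a_m * b_m)

    def to_mont(self, a: int) -> int:
        return (a * self.R) % self.q

    def from_mont(self, a_m: int) -> int:
        return self.redc(a_m)


for q in (3329, 8380417):
    mont = Montgomery(q)
    for _ in range(10_000):
        a, b = random.randrange(q), random.randrange(q)
        prod = mont.from_mont(mont.mul(mont.to_mont(a), mont.to_mont(b)))
        assert prod == (a * b) % q
```

In an NTT pipeline, operands stay in Montgomery form across many multiplications and twiddle factors can be stored pre-converted, so the conversions are paid once per polynomial rather than once per product.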

POLY-ARITH — Polynomial Arithmetic Unit

Combined polynomial addition, subtraction, and coefficient-wise multiplication with integrated modular reduction. Operates on polynomial vectors stored in block RAM, processing one or more coefficients per cycle depending on the datapath width configuration.

Operations:

c = a + b  (mod q)     // Polynomial addition
c = a - b  (mod q)     // Polynomial subtraction
c = a · b  (mod q)     // Coefficient-wise multiplication (NTT domain)
c = compress(a, d)     // ML-KEM compression (d-bit output)
c = decompress(a, d)   // ML-KEM decompression

POLY-ARITH includes the ML-KEM-specific compress/decompress functions that map between full-precision coefficients and reduced-precision ciphertext representations. These are constant-time implementations using Barrett reduction internally.
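FIPS 203 defines the two maps as Compress_d(x) = round((2^d / q) · x) mod 2^d and Decompress_d(y) = round((q / 2^d) · y). The sketch below evaluates them with integer arithmetic only (a hardware datapath would replace the division by q with a Barrett step, as described above) and checks the round-trip error bound round(q / 2^(d+1)) that ML-KEM's correctness analysis relies on. It is a software reference, not the POLY-ARITH implementation.

```python
Q = 3329


def compress(x: int, d: int) -> int:
    """round(2^d * x / q) mod 2^d using integers only (no ties occur since q is odd)."""
    return (((x << (d + 1)) + Q) // (2 * Q)) & ((1 << d) - 1)


def decompress(y: int, d: int) -> int:
    """round(q * y / 2^d), rounding halves up."""
    return (Q * y + (1 << (d - 1))) >> d


def centered_error(a: int, b: int) -> int:
    """|a - b| after reducing the difference into the centered range mod q."""
    diff = (a - b) % Q
    return min(diff, Q - diff)


for d in (1, 4, 5, 10, 11):  # d values used by the ML-KEM parameter sets
    bound = (Q + (1 << d)) >> (d + 1)  # round(q / 2^(d+1))
    worst = max(centered_error(decompress(compress(x, d), d), x) for x in range(Q))
    assert worst <= bound
    print(f"d={d:2}: worst-case error {worst}, bound {bound}")
```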

SAMPLER-CBD — Centered Binomial Distribution

Generates noise polynomials from the centered binomial distribution Bη, the noise distribution ML-KEM uses for its secret and error vectors. (ML-DSA does not use CBD noise; it samples its secret coefficients uniformly from [−η, η] by rejection.) The sampler consumes uniform random bytes (from SHAKE output) and produces polynomial coefficients that follow the CBD distribution.

Supported parameters: η = 2 (η₂ in all ML-KEM parameter sets, and η₁ in ML-KEM-768 and ML-KEM-1024), η = 3 (η₁ in ML-KEM-512), and η = 4 (custom). The parameter is selectable at runtime.

Constant-time: The sampling algorithm uses only bitwise operations (AND, popcount, subtract), with no branches, comparisons, or rejection. Every group of 2η input bits produces exactly one output coefficient in a fixed number of cycles.
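The sampling rule is short enough to state as code. The sketch below follows SamplePolyCBD_η from FIPS 203: each coefficient is the popcount of η input bits minus the popcount of the next η bits, reduced mod q, so 64·η bytes yield exactly 256 coefficients in [−η, η]. It is a software reference only; the per-coefficient arithmetic has no data-dependent branches, matching the property described above.

```python
import hashlib

Q, N = 3329, 256


def sample_poly_cbd(data: bytes, eta: int):
    """Map 64*eta uniform bytes to 256 coefficients drawn from B_eta, stored mod q."""
    assert len(data) == 64 * eta
    bits = int.from_bytes(data, "little")  # bit i = bit (i % 8) of byte i // 8
    mask = (1 << eta) - 1
    coeffs = []
    for i in range(N):
        group = bits >> (2 * eta * i)
        x = bin(group & mask).count("1")           # popcount of the first eta bits
        y = bin((group >> eta) & mask).count("1")  # popcount of the next eta bits
        coeffs.append((x - y) % Q)                 # value in [-eta, eta], kept mod q
    return coeffs


for eta in (2, 3):
    # example input shaped like ML-KEM's PRF (SHAKE-256 of a 32-byte seed and a nonce byte)
    noise = sample_poly_cbd(hashlib.shake_256(bytes(32) + b"\x00").digest(64 * eta), eta)
    centered = [c if c <= Q // 2 else c - Q for c in noise]
    assert all(-eta <= c <= eta for c in centered)
    print(eta, min(centered), max(centered))
```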

SAMPLER-UNIFORM — Uniform Rejection Sampling

Generates uniformly random polynomial coefficients in [0, q) from a SHAKE-128 byte stream. Used to generate the public matrix A in both ML-KEM and ML-DSA. Implements rejection sampling: candidate values ≥ q are discarded and the next candidate is drawn.

Bounded timing: While rejection sampling is inherently data-dependent (the number of rejected candidates varies), the timing variation depends only on the public seed and does not leak secret information, because the public matrix A is not secret. For conservative deployments, the sampler also includes a constant-time mode that always processes a fixed number of candidates regardless of the acceptance rate, with a configurable upper bound.
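The difference between the two modes is how many candidates are processed. In the fixed-budget sketch below, a model of the idea with a hypothetical `max_blocks` parameter, the sampler always parses every candidate in a fixed number of SHAKE-128 squeeze blocks, keeps the first 256 accepted values, and flags the rare case where the budget was too small. Candidate parsing follows FIPS 203 SampleNTT for q = 3329.

```python
import hashlib

Q, N, RATE = 3329, 256, 168


def sample_uniform_fixed(seed34: bytes, max_blocks: int = 4):
    """Parse every candidate in max_blocks squeeze blocks, however many are accepted."""
    stream = hashlib.shake_128(seed34).digest(max_blocks * RATE)
    accepted = []
    for pos in range(0, len(stream), 3):  # loop count is fixed by max_blocks
        c0, c1, c2 = stream[pos:pos + 3]
        for d in (c0 + 256 * (c1 % 16), c1 // 16 + 16 * c2):
            # hardware would use a masked write here; only this list update branches
            if d < Q and len(accepted) < N:
                accepted.append(d)
    return accepted, len(accepted) == N


coeffs, ok = sample_uniform_fixed(bytes(34))
print(ok, len(coeffs))
```

With `max_blocks = 4` (448 candidates against an expected requirement of about 315) a shortfall is extremely unlikely; the trade-off is that every polynomial pays for four permutations even when three would have sufficed.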

DYBER-QRNG — Quantum Random Number Generator

Hardware quantum entropy source that exploits quantum physical phenomena to generate true random numbers with information-theoretic guarantees that no algorithmic PRNG can provide. The DYBER-QRNG core integrates the entropy source, health monitoring, conditioning, and output interface into a single licensable IP block.

Entropy source: The raw entropy is derived from quantum noise processes with inherent physical randomness. The source undergoes continuous health testing per NIST SP 800-90B to detect degradation or failure in real time.

Conditioning: Raw entropy passes through an SP 800-90B compliant conditioning component (typically a cryptographic hash) that concentrates the entropy and produces full-entropy output bits.

Output interface: Conditioned random bytes are available via AXI4-Stream or a simple valid/ready FIFO interface. Output rate is configurable based on the number of parallel entropy sources instantiated.

Certification support: DYBER-QRNG includes the self-test sequences and health monitoring required for FIPS 140-3 RNG validation (an SP 800-90B entropy source, plus an SP 800-90A DRBG where one is required). Entropy assessment documentation is provided for certification submissions.

ENTROPY-COND — Entropy Conditioning Module

Standalone entropy conditioning module for applications that bring their own entropy source (ring oscillator, metastable latch, external TRNG) but need SP 800-90B compliant conditioning, health testing, and output formatting.

Inputs: Accepts raw entropy bits from up to 4 independent sources. Each source is independently health-tested with configurable thresholds for repetition count and adaptive proportion tests.
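The two per-source tests named here are the SP 800-90B continuous health tests. The sketch below models them for a binary source. The repetition count cutoff uses the SP 800-90B formula C = 1 + ceil(−log2(α) / H), with α = 2^−20 assumed and an assessed min-entropy H supplied as a hypothetical input; the adaptive proportion cutoff is passed in as a parameter because SP 800-90B derives it from a binomial critical value, and the value used below is only a placeholder. `HealthMonitor` and `first_failure` are hypothetical names.

```python
import math


def rct_cutoff(h_min: float, alpha_log2: int = 20) -> int:
    """Repetition count cutoff C = 1 + ceil(-log2(alpha) / H), alpha = 2^-alpha_log2."""
    return 1 + math.ceil(alpha_log2 / h_min)


class HealthMonitor:
    """Hypothetical model of continuous health testing for one binary source."""

    def __init__(self, h_min: float, apt_cutoff: int, apt_window: int = 1024):
        self.rct_c = rct_cutoff(h_min)
        self.apt_c, self.apt_w = apt_cutoff, apt_window
        self.last, self.run_len = None, 0            # repetition count state
        self.ref, self.seen, self.hits = None, 0, 0  # adaptive proportion state

    def feed(self, bit: int):
        """Process one raw sample; return the name of a failed test, or None."""
        # Repetition count test: too many identical samples in a row
        self.run_len = self.run_len + 1 if bit == self.last else 1
        self.last = bit
        if self.run_len >= self.rct_c:
            return "RCT"
        # Adaptive proportion test: count repeats of each window's first sample
        if self.seen == 0:
            self.ref, self.hits = bit, 0
        self.hits += bit == self.ref
        self.seen += 1
        if self.hits >= self.apt_c:
            return "APT"
        if self.seen == self.apt_w:
            self.seen = 0  # start a new window
        return None


def first_failure(monitor, samples):
    """Return (test name, sample index) of the first alarm, or (None, sample count)."""
    for i, s in enumerate(samples):
        failure = monitor.feed(s)
        if failure:
            return failure, i
    return None, len(samples)


APT_CUTOFF = 600  # placeholder; derive the real cutoff from SP 800-90B section 4.4.2
print(first_failure(HealthMonitor(1.0, APT_CUTOFF), [0] * 4096))     # ('RCT', 20)
print(first_failure(HealthMonitor(1.0, APT_CUTOFF), [0, 1] * 2048))  # (None, 4096)
```

The alternating source passes both tests, which illustrates that these tests are designed to catch gross failures such as a stuck source, not to prove randomness.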

Mixing: Multi-source entropy is combined using a hash-based mixer that guarantees the output entropy is at least as strong as the strongest individual source — even if other sources are compromised.
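One simple way to build a hash-based mixer is to hash a length-prefixed concatenation of every source's contribution, so that no source's bytes can be confused with another's. The sketch below does this with SHA3-256 from Python's `hashlib`; it illustrates the construction rather than the ENTROPY-COND design, and the four-source limit simply mirrors the input count stated above. `mix_sources` is a hypothetical name.

```python
import hashlib


def mix_sources(contributions):
    """Condense up to four raw-entropy contributions into one 256-bit output block."""
    if not 1 <= len(contributions) <= 4:
        raise ValueError("expected between 1 and 4 source contributions")
    h = hashlib.sha3_256()
    for index, raw in enumerate(contributions):
        # separate sources by index and length before absorbing their bytes
        h.update(bytes([index]) + len(raw).to_bytes(4, "big") + raw)
    return h.digest()


healthy = bytes.fromhex("8f3a") * 32  # stand-in data; real inputs come from the sources
stuck = bytes(64)                     # a failed (all-zero) source
out = mix_sources([healthy, stuck])
assert len(out) == 32 and out != mix_sources([stuck, healthy])
```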

Health monitoring: Continuous and on-demand health tests with configurable alert thresholds. Hardware interrupt signals notify the host CPU of entropy source degradation before output quality is affected.

DYBER-KMU — Key Management Unit

Hardware key lifecycle management engine that handles generation, storage, derivation, usage tracking, and guaranteed zeroization for cryptographic key material. The KMU is designed to enforce key isolation policies in hardware — ensuring that private key material never appears on external buses or in host-accessible memory.

Key slots: Up to 256 independently managed key slots, each with configurable permissions (sign-only, decrypt-only, derive-only, export-allowed). Slot permissions are enforced in hardware and cannot be overridden by software.
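The permission model can be pictured as a flag set checked on every request. The sketch below is a hypothetical software analogue of that check: the `Perm` flags, `KeySlot` class, and `authorize` method are illustrative names, not the KMU register interface, and in the KMU itself the check is enforced in hardware.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Perm(Flag):
    SIGN = auto()
    DECRYPT = auto()
    DERIVE = auto()
    EXPORT = auto()


class SlotPolicyError(Exception):
    """Raised when a request is not permitted by the slot's policy."""


@dataclass(frozen=True)  # frozen: the policy cannot be rewritten after provisioning
class KeySlot:
    index: int   # 0..255
    perms: Perm  # fixed when the key is provisioned

    def __post_init__(self):
        if not 0 <= self.index < 256:
            raise SlotPolicyError("slot index out of range")

    def authorize(self, op: Perm) -> None:
        if op not in self.perms:
            raise SlotPolicyError(f"slot {self.index}: {op.name} not permitted")


signing_slot = KeySlot(7, Perm.SIGN)
signing_slot.authorize(Perm.SIGN)        # allowed
try:
    signing_slot.authorize(Perm.EXPORT)  # rejected
except SlotPolicyError as err:
    print(err)                           # slot 7: EXPORT not permitted
```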

Key generation: Keys can be generated internally using the integrated DRBG (seeded from DYBER-QRNG or external entropy) or imported through an encrypted key wrapping protocol. Internally generated keys never leave the KMU boundary in plaintext.

Derivation: Hardware KDF (HKDF-SHA3) for deriving session keys, per-message keys, or hierarchical key structures from master keys. Derivation operations execute entirely within the KMU without exposing intermediate material.
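HKDF is defined in RFC 5869 as an extract step followed by an expand step, both built on HMAC. The sketch below instantiates it with HMAC-SHA3-256 using Python's standard `hmac` and `hashlib` modules. The choice of SHA3-256 (rather than SHA3-512) and the `info` labels are assumptions, since the KMU's exact HKDF-SHA3 parameters are not given here.

```python
import hashlib
import hmac

HASH = hashlib.sha3_256
HASH_LEN = 32


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """PRK = HMAC-Hash(salt, IKM); an empty salt defaults to HashLen zero bytes."""
    return hmac.new(salt or bytes(HASH_LEN), ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """T(i) = HMAC-Hash(PRK, T(i-1) | info | i); return the first `length` bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF output is limited to 255 * HashLen bytes")
    okm, block = b"", b""
    for counter in range(1, -(-length // HASH_LEN) + 1):
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
    return okm[:length]


master = bytes(32)  # placeholder; in the KMU the master key never leaves the boundary
prk = hkdf_extract(b"example-salt", master)
session_key = hkdf_expand(prk, b"session:0001", 32)
message_key = hkdf_expand(prk, b"message:0001", 32)
assert session_key != message_key and len(session_key) == 32
```

Distinct `info` labels give independent keys from one master secret, which is how hierarchical or per-message keys can be derived without ever exposing the master key.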

Zeroization: Immediate hardware-triggered key destruction that overwrites all key slot memory with random data, followed by verification read-back. Zeroization can be triggered by software command, tamper detection, or power-loss detection. Formally verified to complete within a bounded number of clock cycles.
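The overwrite-then-verify sequence can be modelled on a byte array standing in for slot memory. In this hypothetical sketch (`zeroize_slot` and the 64-byte slot size are illustrative), random data is written over one slot, read back, and compared with what was written; the cycle-bounded hardware sequence and its trigger sources are not modelled.

```python
import hmac
import os

SLOT_BYTES = 64  # hypothetical slot size


def zeroize_slot(memory: bytearray, slot: int) -> bool:
    """Overwrite one slot with random bytes, then verify by reading it back."""
    start = slot * SLOT_BYTES
    pattern = os.urandom(SLOT_BYTES)
    memory[start:start + SLOT_BYTES] = pattern
    readback = bytes(memory[start:start + SLOT_BYTES])
    return hmac.compare_digest(readback, pattern)  # comparison without early exit


key_store = bytearray(os.urandom(4 * SLOT_BYTES))  # four slots of stand-in key material
old_key = bytes(key_store[SLOT_BYTES:2 * SLOT_BYTES])
assert zeroize_slot(key_store, 1)
assert key_store[SLOT_BYTES:2 * SLOT_BYTES] != old_key  # overwritten
```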

Audit trail: Hardware counters track usage count, last-used timestamp, and operation type for each key slot. Audit data is available via a read-only register interface for compliance and forensics.

Standalone vs. Submodule Usage

Every primitive in this library can be used in two ways:

Standalone (independently licensed): The primitive is instantiated as a memory-mapped peripheral with its own AMBA bus interface. The host CPU or a DMA engine provides input data and reads results. Best for software-driven PQC implementations that accelerate only specific bottleneck operations, or for non-PQC applications that need standalone hash, RNG, or arithmetic acceleration.

Submodule (included within algorithm accelerators): The primitive is instantiated internally within a DYBER-MLKEM, DYBER-MLDSA, or other algorithm accelerator. Internal interfaces connect directly to the accelerator's data fabric — no external bus access is needed. This is the default mode when using algorithm accelerator IP; the primitives are included automatically.

Licensing note: Primitives included as submodules within algorithm accelerator licenses do not require separate licensing. Standalone licensing is only required when instantiating primitives outside of an algorithm accelerator context. Contact ip-sales@dyber.com for details on standalone primitive licensing.