Integration Guide
This guide covers the integration of Dyber PQC IP cores into SoC designs. Dyber IP operates as dedicated acceleration blocks that interface via standard AMBA buses — the same RTL integrates identically whether paired with x86, ARM, RISC-V, or any other processor architecture.
Overview
Integrating a Dyber IP core follows the same pattern as integrating any AMBA-compliant peripheral. The host CPU communicates with the accelerator through standard bus interfaces for control and data. The accelerator handles the computationally intensive cryptographic operations at hardware speed, returning results through the same bus interface.
A typical integration involves: instantiating the IP core RTL in the SoC design, connecting AMBA bus signals to the system interconnect, providing clock and reset, writing synthesis constraints, and building the software driver from the provided reference implementation.
Architecture-Agnostic Model
Dyber IP cores are designed as co-processor blocks. The CPU handles control flow, key management orchestration, and application logic while the accelerator executes the mathematically intensive operations (number-theoretic transforms, polynomial arithmetic, hash computations) in hardware. This is the same proven offload model used by existing hardware blocks such as AES/SHA co-processors and DMA engines.
The architecture-agnostic property arises from standardized bus interfaces. The AMBA protocol suite (AXI4, APB, AHB) is the de facto standard on-chip interconnect across SoC architectures: the same protocols appear in x86 server SoCs, ARM-based DPUs, RISC-V microcontrollers, and FPGA soft-processor systems. Because Dyber IP speaks the same bus protocols as every other peripheral in the SoC, no architecture-specific modifications are required.
┌──────────────┐    ┌──────────┐    ┌────────────────────────┐
│   Host CPU   │    │   AMBA   │    │     Dyber IP Core      │
│  (x86/ARM/   │◄──►│ Intercon │◄──►│ (MLKEM, MLDSA, NTT...) │
│ RISC-V/any)  │    │          │    │                        │
└──────────────┘    └──────────┘    └────────────────────────┘
AXI4 / APB / AHB — same interface regardless of CPU
AMBA Bus Interfaces
Each IP core exposes one or more AMBA interfaces depending on its complexity and performance requirements:
| Interface | Role | When to Use |
|---|---|---|
| AXI4-Lite | Control registers, status, configuration | All cores — always present for command/status |
| AXI4 (Full) | Bulk data transfer (keys, ciphertext, messages) | Algorithm accelerators handling large data payloads |
| AXI4-Stream | Streaming data path | High-throughput pipeline architectures (DPU, TLS offload) |
| APB | Low-power peripheral access | IoT / ultra-low-power deployments |
| AHB | Legacy system compatibility | Designs using AHB-based interconnect |
| Native FIFO | Minimal-overhead streaming | Direct IP-to-IP connections without bus overhead |
The interface selection is a compile-time configuration parameter. Every memory-mapped control interface exposes the same register map and programming model, so switching from AXI4-Lite to APB changes only the bus protocol wrapper, not the core functionality or driver API.
FPGA Integration
FPGA integration is the primary validation path and the fastest way to evaluate Dyber IP in a working system. The IP is delivered with synthesis scripts and constraint templates for AMD/Xilinx FPGA families.
Vivado block design: Dyber IP cores are packaged as Vivado IP catalog components. Integration into a block design is drag-and-drop: instantiate the core, connect AXI interfaces to the processing system or system interconnect, and run connection automation.
Synthesis constraints: Clock constraints, I/O timing, and placement directives are provided as template XDC files. Timing closure has been achieved on the validated FPGA platforms with the provided constraints — integrators may need to adjust based on their specific design context and utilization.
FPGA-optimized vs. generic RTL: The FPGA-optimized variant uses DSP48E2 inference hints and BRAM primitive optimizations for maximum performance on Xilinx UltraScale+. The generic RTL variant uses pure inference-based coding with no vendor-specific constructs, suitable for porting to other FPGA families or ASIC flows.
Validated platforms: Zynq UltraScale+ (XCZU7EV) with ARM Cortex-A53 host integration via AXI is the primary validated platform. Validation on the Zynq-7000 series has also been completed, confirming that the design migrates across DSP primitive generations (DSP48E1 on 7 series, DSP48E2 on UltraScale+). Synthesis scripts are also provided for Versal AI Core and Versal Premium devices.
ASIC Integration Path
ASIC integration requires technology-specific work that Dyber scopes as joint engineering with the integrator's implementation team. The generic RTL variant (inference-only code with no vendor primitives) is the starting point for all ASIC flows.
| ASIC Deliverable | Description |
|---|---|
| Generic RTL | Inference-only code with no vendor primitives, included with production license |
| Synthesis support | Timing constraints, synthesis guidance, and critical path documentation for target process node |
| STA collaboration | Support for static timing analysis across PVT corners using the integrator's libraries |
| DFT integration | RTL structured for scan insertion; BIST hooks provided for memory and logic testing |
| Gate-level verification | Support for gate-level simulation with post-synthesis netlists |
Dyber provides dedicated engineering support throughout the ASIC flow. Final implementation uses the integrator's standard cell libraries, PDK, and design methodology — Dyber provides the design content and integration expertise.
Driver Architecture
Dyber provides reference driver implementations that demonstrate the complete programming model. Drivers are provided in bare-metal C and Linux kernel module form, targeting both x86-64 and ARM64 architectures.
Programming model: All Dyber IP cores follow a consistent command-based programming model:
1. Configure — Write parameter set and operation type to control registers
2. Load — Write input data (public key, message, ciphertext) to data registers or DMA buffer
3. Execute — Write GO bit to command register
4. Wait — Poll status register or wait for interrupt (deterministic cycle count)
5. Read — Read output data (shared secret, signature, verification result)
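The five steps above can be sketched in C as follows. This is a minimal illustration, not the shipped driver: the register offsets, the `CMD_GO` and `STATUS_DONE` bit positions, the four-word data window, and the function name `pqc_run` are all assumptions made for the example. The actual layout is defined by the register map delivered with the IP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map, expressed as 32-bit word indices.
 * Real offsets come from the register map document shipped with the IP. */
enum {
    REG_CONFIG   = 0,  /* parameter set and operation type */
    REG_COMMAND  = 1,  /* bit 0 = GO (assumed) */
    REG_STATUS   = 2,  /* bit 0 = DONE (assumed) */
    REG_DATA_IN  = 4,  /* first input data word */
    REG_DATA_OUT = 8,  /* first output data word */
    DATA_WORDS   = 4   /* illustrative data window size */
};
#define CMD_GO      0x1u
#define STATUS_DONE 0x1u

/* Run one operation using polling.
 * Returns 0 on success, -1 if DONE is not seen within the poll budget. */
int pqc_run(volatile uint32_t *regs, uint32_t config,
            const uint32_t *in, uint32_t *out, size_t words,
            unsigned max_polls)
{
    assert(words <= (size_t)DATA_WORDS);

    regs[REG_CONFIG] = config;                  /* 1. Configure */
    for (size_t i = 0; i < words; i++)          /* 2. Load */
        regs[REG_DATA_IN + i] = in[i];
    regs[REG_COMMAND] = CMD_GO;                 /* 3. Execute */
    while (!(regs[REG_STATUS] & STATUS_DONE)) { /* 4. Wait (polling) */
        if (max_polls-- == 0)
            return -1;
    }
    for (size_t i = 0; i < words; i++)          /* 5. Read */
        out[i] = regs[REG_DATA_OUT + i];
    return 0;
}
```

On real hardware `regs` would point at the core's memory-mapped base address. In simulation or a unit test it can point at a plain array that stands in for the register file.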
Bare-metal library: Architecture-independent C library with a clean API. Provides register access abstractions, operation wrappers, and self-test functions. No OS dependencies — suitable for firmware, RTOS, or bare-metal environments.
Linux kernel module: Reference kernel module implementing a character device interface (/dev/dyber_pqc0). It supports both x86-64 and ARM64, with architecture-specific DMA configuration. Submission to the mainline kernel is planned in collaboration with integrators.
Adaptation: Reference drivers are starting points, not finished products. Platform-specific adaptation (register base addresses, interrupt routing, DMA configuration, integration with existing crypto subsystems) is expected as part of the integration process. Dyber provides engineering support for driver adaptation.
DMA & High-Throughput Patterns
For high-throughput applications (TLS termination, bulk signing), register-based data transfer becomes a bottleneck. Dyber IP cores with AXI4 or AXI4-Stream interfaces support DMA-based data transfer for maximum throughput.
Scatter-gather DMA: The AXI4 interface supports scatter-gather descriptor lists, so a batch of operations can run without CPU intervention between them. The host CPU populates a descriptor ring with input and output buffer pointers, then kicks the DMA engine. The accelerator processes the queued operations back-to-back from the ring.
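A descriptor ring of this kind might look like the sketch below on the host side. The descriptor layout, the `DESC_HW_OWNED` ownership flag, the ring size, and the function names are hypothetical; the actual descriptor format is defined by the IP documentation. A production driver would also need memory barriers, cache maintenance, and a doorbell register write after queuing, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical depth; must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0, "RING_SIZE must be 2^n");

/* Hypothetical descriptor layout (not the IP's actual format). */
struct pqc_desc {
    uint64_t src_addr;  /* bus address of the input buffer */
    uint64_t dst_addr;  /* bus address of the output buffer */
    uint32_t src_len;   /* input length in bytes */
    uint32_t flags;     /* bit 0 set = descriptor owned by hardware */
};
#define DESC_HW_OWNED 0x1u

struct pqc_ring {
    struct pqc_desc desc[RING_SIZE];
    uint32_t head;  /* next slot software will fill */
    uint32_t tail;  /* oldest slot not yet reclaimed */
};

/* Queue one operation. Returns 0, or -1 if the ring is full. */
int ring_push(struct pqc_ring *r, uint64_t src, uint32_t len, uint64_t dst)
{
    if (r->head - r->tail == RING_SIZE)
        return -1;
    struct pqc_desc *d = &r->desc[r->head & (RING_SIZE - 1u)];
    d->src_addr = src;
    d->src_len  = len;
    d->dst_addr = dst;
    d->flags    = DESC_HW_OWNED;  /* hand the slot to hardware last */
    r->head++;
    return 0;
}

/* Reclaim descriptors the hardware has finished with (ownership bit
 * cleared). Returns the number of completed operations. */
unsigned ring_reap(struct pqc_ring *r)
{
    unsigned n = 0;
    while (r->tail != r->head &&
           !(r->desc[r->tail & (RING_SIZE - 1u)].flags & DESC_HW_OWNED)) {
        r->tail++;
        n++;
    }
    return n;
}
```

Using free-running `head` and `tail` counters masked by the ring size keeps the full and empty conditions distinct without sacrificing a slot.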
Streaming mode: AXI4-Stream interfaces enable continuous pipeline operation where input data flows through the accelerator without store-and-forward overhead. Best for network processing paths where packets flow through a pipeline of processing stages.
Interrupt coalescing: For high-throughput workloads, per-operation interrupts create excessive overhead. The IP supports configurable interrupt coalescing — raising a single interrupt after N operations complete or after a timeout, whichever comes first.
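The "N operations or timeout, whichever comes first" rule can be modeled in a few lines of C. This is a behavioral sketch for reasoning about threshold and timeout settings, not a description of the hardware implementation; the structure, field names, and tick-based timeout are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of interrupt coalescing (hypothetical names). */
struct coalescer {
    uint32_t threshold;  /* fire after this many completions (N) */
    uint32_t timeout;    /* or after this many ticks with work pending */
    uint32_t pending;    /* completions not yet signalled */
    uint32_t age;        /* ticks since the oldest pending completion */
};

/* Record one completed operation. Returns true if an interrupt fires. */
bool coalescer_complete(struct coalescer *c)
{
    assert(c->threshold > 0 && c->timeout > 0);
    if (++c->pending >= c->threshold) {
        c->pending = 0;
        c->age = 0;
        return true;
    }
    return false;
}

/* Advance time by one tick. Returns true if the timeout interrupt fires.
 * The timer only runs while at least one completion is pending. */
bool coalescer_tick(struct coalescer *c)
{
    if (c->pending == 0)
        return false;
    if (++c->age >= c->timeout) {
        c->pending = 0;
        c->age = 0;
        return true;
    }
    return false;
}
```

A larger threshold lowers interrupt rate under load, while the timeout bounds the latency of the last few operations in a burst.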
Multi-Instance Deployment
For applications requiring higher throughput than a single core can provide, multiple IP core instances can be deployed in parallel. Each instance is independently addressable at a different base address on the AMBA bus.
Load balancing: The host driver or a hardware dispatcher distributes operations across available instances. Round-robin scheduling is simplest; priority-based scheduling enables latency-sensitive operations (handshake completion) to preempt bulk operations (certificate validation).
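A host-side dispatcher combining both policies could be sketched as below. The instance count, structure, and function names are hypothetical. Note that the priority rule here is non-preemptive: it decides which queue is served next when an instance becomes free and does not interrupt an operation already in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_INSTANCES 4  /* hypothetical number of IP core instances */

struct dispatcher {
    bool   busy[NUM_INSTANCES];
    size_t next;  /* round-robin cursor */
};

/* Claim the next idle instance in round-robin order.
 * Returns the instance index, or -1 if every instance is busy. */
int dispatch_pick(struct dispatcher *d)
{
    for (size_t i = 0; i < NUM_INSTANCES; i++) {
        size_t idx = (d->next + i) % NUM_INSTANCES;
        if (!d->busy[idx]) {
            d->busy[idx] = true;
            d->next = (idx + 1) % NUM_INSTANCES;
            return (int)idx;
        }
    }
    return -1;
}

/* Mark an instance idle again once its operation has completed. */
void dispatch_release(struct dispatcher *d, int idx)
{
    assert(idx >= 0 && idx < NUM_INSTANCES);
    d->busy[idx] = false;
}

/* Decide which queue to serve next: latency-sensitive work first. */
enum queue_id { QUEUE_NONE = -1, QUEUE_LATENCY = 0, QUEUE_BULK = 1 };

enum queue_id dispatch_next_queue(size_t latency_pending, size_t bulk_pending)
{
    if (latency_pending > 0)
        return QUEUE_LATENCY;
    if (bulk_pending > 0)
        return QUEUE_BULK;
    return QUEUE_NONE;
}
```

The returned index would typically be used to select the per-instance base address before issuing the operation.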
Shared resources: Multiple algorithm accelerator instances can share a single DYBER-KMU for key management and a single DYBER-QRNG for entropy, connected via internal AXI crossbar. This avoids duplicating support infrastructure when scaling compute cores.
Power & Clocking
Clock domain: Each IP core operates in a single clock domain. The core clock can run at a different frequency than the bus clock — an asynchronous FIFO bridge (provided as part of the interface wrapper) handles clock domain crossing. This enables the core to run at its optimal frequency independent of the system bus clock.
Clock gating: IP cores support fine-grained clock gating for power management. When idle (no operation in progress), the internal pipeline clock is gated, reducing dynamic power to near-zero. The bus interface clock remains active for register access. Wake-from-idle latency is a single clock cycle.
Power domains: For ASIC integration, the IP can be placed in an independent power domain with its own voltage rail, enabling power-down of the PQC subsystem when not in use. The key buffer includes a power-down zeroization trigger to protect key material during power state transitions.
Platform Scenarios
| Platform Type | Integration Pattern | Key Considerations |
|---|---|---|
| Adaptive SoC (Zynq/Versal) | PL instantiation, PS-to-PL AXI bridge | Validated reference design available. Block design integration. |
| FPGA (Virtex/Kintex/Artix) | Soft-processor + IP core, or standalone accelerator | MicroBlaze or external host via PCIe/Ethernet. |
| Server CPU (ASIC) | On-die block alongside existing crypto co-processor | PSP firmware integration, MMIO or dedicated command interface. Joint engineering. |
| DPU / SmartNIC | Dedicated PQC block in packet processing pipeline | AXI4-Stream for line-rate operation. P4 pipeline integration. |
| Client CPU (ASIC) | Security co-processor enhancement | TPM integration, fTPM offload, minimal area configuration. |
| MCU / IoT | APB peripheral in constrained SoC | NTT-R2 + ML-KEM-512 minimal configuration. Ultra-low power. |
Integration Deliverables
All IP licenses include the following integration deliverables:
| Category | Contents |
|---|---|
| Design files | Synthesizable RTL (Verilog/SystemVerilog/VHDL), FPGA synthesis scripts, constraint templates, reference block designs |
| Verification | UVM testbench, NIST KAT vectors, functional coverage models, formal verification assertions |
| Software | Bare-metal C driver library, reference Linux kernel module (x86-64 + ARM64), API headers and documentation |
| Documentation | Datasheet with measured specifications, integration guide, register map, programming guide, security target |
Production licenses additionally include generic RTL for ASIC migration, ASIC synthesis guidance, and DFT integration hooks.