ZeroStream

Link (de)compression IP for die-to-die, chip-to-chip, and DRAM interfaces. Fully pipelined to sustain line rate, with deterministic, data-independent latency.

What it does

ZeroStream is a hardware compression and decompression IP block that expands effective memory bandwidth for AI accelerators. It compresses data in real time as it moves across a link or memory interface, so more useful data is delivered per cycle without changing the host software or retraining models.

At a glance

  • Hardware compress and decompress

  • Deterministic, data-independent latency

  • Line-rate, fully pipelined throughput

  • Transparent software encoding library

  • No model retraining required

Where it fits

  • Link compression across chip-to-chip (C2C) and die-to-die (D2D) connections.

  • DRAM bandwidth improvement through custom integration inside the SoC, applicable to HBM, GDDR, LPDDR, and DDR.

  • Flexible integration points, for example per memory channel close to the memory controller, or close to the DMA engine of an NPU.

Why it matters

Compression ratio translates directly into throughput on bandwidth-bound workloads. For LLM decode, which is memory bound, more effective bandwidth means more tokens per second from the same silicon and the same memory.
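As a rough illustration of that relationship, the sketch below estimates an idealized upper bound on decode rate for a memory-bound model: tokens per second is approximately effective bandwidth divided by the data read per token. The function name, the 1000 GB/s memory bandwidth, and the 10 GB read per token are hypothetical values chosen for the arithmetic, not ZeroStream measurements. The 25% uplift is simply one point inside the 20–35% range quoted in the specifications.

```python
def decode_tokens_per_second(raw_bandwidth_gb_s: float,
                             gb_read_per_token: float,
                             bandwidth_uplift: float = 0.0) -> float:
    """Idealized upper bound on decode rate for a memory-bound model.

    Assumes every generated token requires reading gb_read_per_token
    gigabytes from memory and that compute is never the bottleneck.
    bandwidth_uplift is a fraction, e.g. 0.25 for a 25% improvement.
    """
    if raw_bandwidth_gb_s <= 0 or gb_read_per_token <= 0:
        raise ValueError("bandwidth and data per token must be positive")
    if bandwidth_uplift < 0:
        raise ValueError("bandwidth_uplift must be non-negative")
    effective_bandwidth = raw_bandwidth_gb_s * (1.0 + bandwidth_uplift)
    return effective_bandwidth / gb_read_per_token


# Hypothetical system: 1000 GB/s of memory bandwidth, 10 GB read per token.
baseline = decode_tokens_per_second(1000.0, 10.0)
compressed = decode_tokens_per_second(1000.0, 10.0, bandwidth_uplift=0.25)
print(f"baseline:   {baseline} tokens/s")
print(f"compressed: {compressed} tokens/s")
```

The model is deliberately simple. It ignores compute time, caching, and traffic that does not compress, so real gains depend on the workload and the integration point.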

Software library included

The IP ships with a software library that adapts the encoding to different LLMs and data types to improve compression efficiency and effective bandwidth. The library is transparent to the user and the system.

ZeroPoint's technical team provides onsite integration consulting, delivering an immediate bandwidth uplift with low integration risk.

Key specifications

The headline characteristics are listed below. Detailed area, gate count, SRAM, and cycle-level latency figures are configuration-dependent and are marked as placeholders for the public page.

Target data: Any data. Examples: weights, activations, KV cache, databases, data center class workloads.
Use cases: Die-to-die, chip-to-chip, and DRAM bandwidth improvement.
Max clock frequency: Up to 2 GHz (Samsung 4nm).
Bandwidth at 2 GHz: 512-bit interface: 128 GB/s per direction (compress + decompress). 256-bit interface: 64 GB/s per direction.
Compression ratio: Up to 1.5× on LLM weights; up to 2× on activations and KV cache.
Bandwidth improvement: 20–35% across data types, up to 50%.
Data interface: AMBA AXI5, 256-bit or 512-bit.
Latency: Deterministic and data-independent. PLACEHOLDER: cycle counts
Silicon area / gate count: PLACEHOLDER: area & gates (configuration dependent)
SRAM: PLACEHOLDER: SRAM per config
Metadata: Compression state of PLACEHOLDER: bits per superblock. Managed by the integrator or by the IP. No metadata management needed for C2C / D2D links.
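
The per-direction bandwidth figures follow from the interface width and the clock frequency. A minimal sketch of that arithmetic, assuming one full-width transfer per clock cycle (consistent with the fully pipelined, line-rate claim) and a hypothetical helper name:

```python
def interface_bandwidth_gb_s(width_bits: int, clock_ghz: float) -> float:
    """Peak bandwidth in GB/s for one direction of the data interface.

    Assumes one full-width transfer per clock cycle:
    bytes per cycle (width_bits / 8) times cycles per nanosecond (clock_ghz).
    """
    if width_bits <= 0 or width_bits % 8 != 0:
        raise ValueError("width_bits must be a positive multiple of 8")
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return (width_bits / 8) * clock_ghz


# The two interface widths listed above, at the 2 GHz maximum clock.
for width in (256, 512):
    print(f"{width}-bit @ 2 GHz: {interface_bandwidth_gb_s(width, 2.0)} GB/s")
```

At lower clock frequencies the same formula scales the figures down linearly.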

What's included

  • Synthesizable RTL for compressor and decompressor.

  • Verification and test framework.

  • Transparent software encoding library.

  • Integration support and onsite consulting.

Explore the rest of the portfolio

  • ZeroConnect: Compressed memory for CXL devices.

  • ZeroAI: Model compression for weights, KV, activations.

  • ZeroStorage: Standards-based LZ4 acceleration.
