ZeroStream

Link (de)compression IP for die-to-die, chip-to-chip, and DRAM interfaces. Fully pipelined to sustain line rate, with deterministic, data-independent latency.

What it does

ZeroStream is a hardware compression and decompression IP block that expands effective memory bandwidth for AI accelerators. It compresses data in real time as it moves across a link or memory interface, so more useful data is delivered per cycle without changing the host software or retraining models.

At a glance

Hardware compress and decompress
Deterministic, data-independent latency
Line-rate, fully pipelined throughput
Transparent software encoding library
No model retraining required

Where it fits

Link compression across chip-to-chip (C2C) and die-to-die (D2D) connections.
DRAM bandwidth improvement through custom integration inside the SoC, applicable to HBM, GDDR, LPDDR, and DDR.
Flexible integration points, for example per memory channel close to the memory controller, or close to the DMA engine of an NPU.

Why it matters

Compression ratio translates directly into throughput on bandwidth-bound workloads. For LLM decode, which is memory-bound, higher effective bandwidth means more tokens per second from the same silicon and the same memory.
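To make the relationship concrete, here is a minimal back-of-the-envelope sketch. It assumes decode is limited only by memory bandwidth and that a fixed amount of data is read per generated token. The function name, the 1000 GB/s system, the 14 GB per token, and the 1.25 gain are all hypothetical illustration values, not ZeroStream measurements.

```python
def decode_tokens_per_second(bandwidth_gb_s, gb_read_per_token, bandwidth_gain=1.0):
    """Upper bound on decode rate when memory bandwidth is the only limit.

    bandwidth_gain is the effective-bandwidth multiplier from compression
    (1.0 means no compression). All inputs are illustrative assumptions.
    """
    if bandwidth_gb_s <= 0 or gb_read_per_token <= 0 or bandwidth_gain <= 0:
        raise ValueError("all inputs must be positive")
    return bandwidth_gb_s * bandwidth_gain / gb_read_per_token


if __name__ == "__main__":
    # Hypothetical system: 1000 GB/s of memory bandwidth, 14 GB read per token.
    baseline = decode_tokens_per_second(1000.0, 14.0)
    # 1.25 models a 25% bandwidth improvement (an assumed value for illustration).
    compressed = decode_tokens_per_second(1000.0, 14.0, bandwidth_gain=1.25)
    print(f"baseline:   {baseline:.1f} tokens/s")
    print(f"compressed: {compressed:.1f} tokens/s")
    print(f"speedup:    {compressed / baseline:.2f}x")
```

Under these assumptions the token rate scales linearly with the bandwidth gain. Real systems also have compute, latency, and scheduling limits that this sketch ignores.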

Software library included

The IP ships with a software library that adapts the encoding to different LLMs and data types to improve compression efficiency and effective bandwidth. The library is transparent to the user and the system.

ZeroPoint's technical team provides onsite consulting for integration, so the result is an immediate bandwidth uplift with low integration risk.

Key specifications

Headline characteristics are listed below. Detailed area, gate count, and cycle-level latency figures are configuration dependent and are marked as placeholders on the public page.

Target data	Any data. Examples: weights, activations, KV cache, databases, data center class workloads.
Use cases	Die-to-die, chip-to-chip, and DRAM bandwidth improvement.
Max clock frequency	Up to 2 GHz (Samsung 4nm).
Bandwidth at 2 GHz	512-bit interface: 128 GB/s per direction (compress + decompress). 256-bit interface: 64 GB/s per direction.
Compression ratio	Up to 1.5× on LLM weights; up to 2× on activations and KV cache.
Bandwidth improvement	Typically 20–35% across data types; up to 50%.
Data interface	AMBA AXI5, 256-bit or 512-bit.
Latency	Deterministic and data-independent. PLACEHOLDER: cycle counts
Silicon area / gate count	PLACEHOLDER: area & gates (configuration dependent)
SRAM	PLACEHOLDER: SRAM per config
Metadata	Compression state of PLACEHOLDER: bits per superblock. Managed by the integrator or by the IP. No metadata management needed for C2C / D2D links.
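The bandwidth figures in the table follow from interface width times clock frequency. The sketch below reproduces that arithmetic. The function names are hypothetical, and applying the quoted improvement percentage directly to the raw interface bandwidth is an assumption made for illustration only.

```python
def raw_bandwidth_gb_s(width_bits, clock_ghz):
    """Bytes per cycle times cycles per second, in GB/s (1 GB = 1e9 bytes)."""
    if width_bits <= 0 or width_bits % 8 != 0:
        raise ValueError("width_bits must be a positive multiple of 8")
    return (width_bits // 8) * clock_ghz


def effective_bandwidth_gb_s(raw_gb_s, improvement_percent):
    """Assumed model: the improvement percentage scales the raw bandwidth."""
    return raw_gb_s * (100 + improvement_percent) / 100


if __name__ == "__main__":
    for width in (256, 512):
        raw = raw_bandwidth_gb_s(width, 2.0)  # 2 GHz, as in the table
        low = effective_bandwidth_gb_s(raw, 20)
        high = effective_bandwidth_gb_s(raw, 35)
        print(f"{width}-bit: raw {raw:.0f} GB/s, "
              f"effective {low:.1f} to {high:.1f} GB/s at 20-35% improvement")
```

At 2 GHz this gives 64 GB/s for the 256-bit interface and 128 GB/s for the 512-bit interface, matching the table.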

What's included

Synthesizable RTL for compressor and decompressor.
Verification and test framework.
Transparent software encoding library.
Integration support and onsite consulting.
