ZeroStream
Link (de)compression IP for die-to-die, chip-to-chip, and DRAM interfaces. Fully pipelined to sustain line rate, with deterministic, data-independent latency.
What it does
ZeroStream is a hardware compression and decompression IP block that expands effective memory bandwidth for AI accelerators. It compresses data in real time as it moves across a link or memory interface, so more useful data is delivered per cycle without changing the host software or retraining models.
At a glance
Hardware compress and decompress
Deterministic, data-independent latency
Line-rate, fully pipelined throughput
Transparent software encoding library
No model retraining required
Where it fits
Link compression across chip-to-chip (C2C) and die-to-die (D2D) connections.
DRAM bandwidth improvement through custom integration inside the SoC, applicable to HBM, GDDR, LPDDR, and DDR.
Flexible integration points, for example per memory channel close to the memory controller, or close to the DMA engine of an NPU.
Why it matters
Compression ratio translates directly into throughput on bandwidth-bound workloads. For LLM decode, which is memory-bound, more effective bandwidth means more tokens per second from the same silicon and the same memory.
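As a rough illustration of that relationship, the sketch below estimates decode throughput for a purely memory-bound model. It assumes every generated token requires reading all model weights once, and that compression multiplies effective bandwidth by the compression ratio. The function name, the 1000 GB/s memory bandwidth, and the 14 GB weight footprint are hypothetical values chosen for illustration, not ZeroStream figures. KV cache traffic, compute limits, and metadata overhead are ignored.

```python
def decode_tokens_per_second(raw_bandwidth_gbs: float,
                             weights_gb: float,
                             compression_ratio: float = 1.0) -> float:
    """Estimate tokens/s for a memory-bound decode loop (illustrative model only).

    Assumes each token reads all weights once and that compression scales
    effective bandwidth linearly with the compression ratio.
    """
    if raw_bandwidth_gbs <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    if compression_ratio < 1.0:
        raise ValueError("compression ratio must be at least 1.0")
    effective_bandwidth_gbs = raw_bandwidth_gbs * compression_ratio
    return effective_bandwidth_gbs / weights_gb


if __name__ == "__main__":
    # Hypothetical system: 1000 GB/s of memory bandwidth, 14 GB of weights.
    baseline = decode_tokens_per_second(1000.0, 14.0)
    compressed = decode_tokens_per_second(1000.0, 14.0, compression_ratio=1.5)
    print(f"baseline:   {baseline:.1f} tokens/s")
    print(f"compressed: {compressed:.1f} tokens/s")
```

Under these assumptions the speedup equals the compression ratio. Real workloads will see less wherever part of the traffic is incompressible or the workload is not fully bandwidth-bound.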
Software library included
The IP ships with a software library that adapts the encoding to different LLMs and data types to improve compression efficiency and effective bandwidth. The library is transparent to the user and the system.
ZeroPoint's technical team provides onsite consulting for integration, so the result is an immediate bandwidth uplift with low integration risk.
Key specifications
The headline characteristics are listed below. Detailed area, gate count, SRAM, and cycle-level latency figures are configuration dependent and are shown as placeholders.
| Target data | Any data. Examples: weights, activations, KV cache, databases, data-center-class workloads. |
| Use cases | Die-to-die, chip-to-chip, and DRAM bandwidth improvement. |
| Max clock frequency | Up to 2 GHz (Samsung 4 nm). |
| Bandwidth at 2 GHz | 512-bit interface: 128 GB/s in each direction (compression and decompression). 256-bit interface: 64 GB/s in each direction. |
| Compression ratio | Up to 1.5× on LLM weights; up to 2× on activations and KV cache. |
| Bandwidth improvement | 20–35% across data types, up to 50%. |
| Data interface | AMBA AXI5, 256-bit or 512-bit. |
| Latency | Deterministic and data-independent. PLACEHOLDER: cycle counts |
| Silicon area / gate count | PLACEHOLDER: area & gates (configuration dependent) |
| SRAM | PLACEHOLDER: SRAM per config |
| Metadata | Compression state of PLACEHOLDER: bits per superblock. Managed by the integrator or by the IP. No metadata management needed for C2C / D2D links. |
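The bandwidth figures in the table follow from interface width and clock frequency. The minimal sketch below reproduces that arithmetic, assuming one full-width transfer per clock cycle. It also shows the idealized relation between compression ratio and bandwidth uplift, assuming all traffic compresses at the same ratio with no metadata overhead. The function names are hypothetical and are not part of the ZeroStream software library.

```python
def raw_link_bandwidth_gbs(width_bits: int, clock_ghz: float) -> float:
    """Raw bandwidth in GB/s, assuming one full-width transfer per clock cycle."""
    if width_bits <= 0 or width_bits % 8 != 0:
        raise ValueError("width must be a positive multiple of 8 bits")
    bytes_per_cycle = width_bits // 8
    return bytes_per_cycle * clock_ghz


def ideal_uplift_percent(compression_ratio: float) -> float:
    """Idealized bandwidth uplift if all traffic compresses at the given ratio."""
    if compression_ratio < 1.0:
        raise ValueError("compression ratio must be at least 1.0")
    return (compression_ratio - 1.0) * 100.0


def ratio_for_uplift(uplift_percent: float) -> float:
    """Average compression ratio needed to reach a given bandwidth uplift."""
    return 1.0 + uplift_percent / 100.0


if __name__ == "__main__":
    print(raw_link_bandwidth_gbs(512, 2.0))  # 64 bytes/cycle at 2 GHz -> 128.0
    print(raw_link_bandwidth_gbs(256, 2.0))  # 32 bytes/cycle at 2 GHz -> 64.0
    print(ideal_uplift_percent(1.5))         # -> 50.0
```

In this idealized model a 1.5× ratio corresponds to a 50% uplift, and a 20% uplift corresponds to an average ratio of about 1.2× across all traffic on the interface.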
What's included
Synthesizable RTL for compressor and decompressor.
Verification and test framework.
Transparent software encoding library.
Integration support and onsite consulting.
Explore the rest of the portfolio
ZeroConnect
Compressed memory for CXL devices
ZeroAI
Model compression for weights, KV, activations
ZeroStorage
Standards-based LZ4 acceleration