At IEEE Hot Interconnects 2025, Intel’s Debendra Das Sharma outlined how the Universal Chiplet Interconnect Express (UCIe) standard can extend the performance of on-package memory, offering a path to low-power, high-bandwidth, low-latency, and cost-effective solutions for AI and HPC systems.
Das Sharma, joined by colleagues from Intel and AMD, noted that today’s AI workloads are increasingly constrained by memory bandwidth. While HBM offers ~20× the bandwidth of LPDDR, its scaling is increasingly bump-limited, and traditional bi-directional multi-drop buses remain inefficient. UCIe proposes a point-to-point, unidirectional PHY that delivers higher bandwidth density, improved power efficiency, and scalability across packaging generations.
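As a rough back-of-envelope check on that ~20× figure, the sketch below compares one HBM3E stack against a 64-bit LPDDR5X package. The per-pin rates and interface widths are typical published values chosen for illustration, not figures from the talk:

```python
# Illustrative HBM vs. LPDDR peak-bandwidth comparison (assumed typical values,
# not figures from the presentation).

def peak_bandwidth_gbs(pin_rate_gbps: float, width_bits: int) -> float:
    """Peak interface bandwidth in GB/s: per-pin rate (Gb/s) times width, in bytes."""
    return pin_rate_gbps * width_bits / 8

hbm3e_stack = peak_bandwidth_gbs(9.6, 1024)    # one HBM3E stack, 1024-bit interface
lpddr5x_x64 = peak_bandwidth_gbs(8.533, 64)    # a 64-bit LPDDR5X package

print(f"HBM3E stack:  ~{hbm3e_stack:.0f} GB/s")            # ~1229 GB/s
print(f"LPDDR5X x64:  ~{lpddr5x_x64:.0f} GB/s")            # ~68 GB/s
print(f"Ratio:        ~{hbm3e_stack / lpddr5x_x64:.0f}x")  # ~18x, in line with the ~20x claim
```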
Key Metrics and Advantages
- Performance Range: UCIe 2D supports 4–32 GT/s at bump pitches of 100–130 μm and up to 224 GB/s/mm of shoreline bandwidth. UCIe 2.5D (25–55 μm) boosts this to 1.3 TB/s/mm, while UCIe 3D with hybrid bonding (<1–9 μm) can reach 300 TB/s/mm² of areal bandwidth density (a worked sizing example follows this list).
- Power Efficiency:
- 2D: 0.5 pJ/b (≤16G) to 0.6 pJ/b (>16G)
- 2.5D: 0.25 pJ/b (≤16G) to 0.3 pJ/b (>16G)
- 3D hybrid bonding: 0.01–0.05 pJ/b
- Latency: ~2 ns round-trip, with <1 ns low-power state entry/exit and 85%+ dynamic power savings in those states.
- Packaging Flexibility: Supports 2D (cost-effective), 2.5D (CoWoS, EMIB), and 3D advanced options, allowing dies to be manufactured and assembled independently.
- Protocol Support: Compatible with multiple memory and compute protocols, including CHI, LPDDR, HBM, and CXL, with mapping demonstrated for symmetric and asymmetric UCIe implementations.
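To put the bandwidth-density and pJ/b figures in context, here is a minimal sizing sketch. The 2 TB/s target interface is a hypothetical, illustrative assumption; the per-mm and per-bit values are the ones listed above:

```python
# Sizing a hypothetical 2 TB/s UCIe-attached memory interface with the
# bandwidth-density and energy-per-bit figures quoted above.

TARGET_BW_GBS = 2_000  # GB/s, assumed target for illustration

# (shoreline density in GB/s per mm of die edge, energy in pJ per bit)
configs = {
    "UCIe 2D (standard package)":   (224,   0.5),
    "UCIe 2.5D (advanced package)": (1_300, 0.25),
    "UCIe 3D (hybrid bonding)":     (None,  0.05),  # density quoted per mm^2, not per mm of edge
}

for name, (density, pj_per_bit) in configs.items():
    # Link power = energy per bit * bit rate; 1 GB/s = 8e9 bit/s, 1 pJ = 1e-12 J.
    power_w = pj_per_bit * 1e-12 * TARGET_BW_GBS * 8e9
    edge = f"{TARGET_BW_GBS / density:.1f} mm of shoreline" if density else "areal (per-mm^2) budget"
    print(f"{name}: {edge}, ~{power_w:.1f} W of link power")
```

Under these assumptions, the same 2 TB/s costs roughly 8.9 mm of die edge and ~8 W of link power on a standard-package link, ~1.5 mm and ~4 W on an advanced-package link, and under 1 W with 3D hybrid bonding.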
The exponential rise in AI model sizes — from GPT-3 to projected GPT-4-class workloads — has created a widening “memory wall.” Even with large HBM stacks, scaling remains constrained by bump pitch and efficiency. UCIe’s unidirectional PHY offers a scalable roadmap:
- Short-term: Frequency upgrades to 64 GT/s.
- Mid-term: Bump pitch reduction using UCIe-A (Advanced).
- Long-term: 3D hybrid bonding with unprecedented bandwidth density (up to 300 TB/s/mm²); see the scaling sketch below.
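A minimal sketch of why these levers compound, assuming a first-order model in which shoreline bandwidth density scales linearly with per-lane data rate and with the number of bumps that fit along each mm of die edge (roughly 1/pitch for a bump field of fixed row count). The row count, signal-bump fraction, and pitch values are illustrative assumptions, not figures from the talk; the point is the relative scaling, not the absolute numbers:

```python
# First-order scaling model for die-edge (shoreline) bandwidth density.
# All parameters are illustrative; real UCIe modules have fixed lane counts,
# redundancy, and clock/sideband overheads that this ignores.

def bw_density_gbs_per_mm(data_rate_gtps: float, bump_pitch_um: float,
                          bump_rows: int = 16, signal_fraction: float = 0.5) -> float:
    """Approximate GB/s per mm of die edge for a unidirectional-lane PHY."""
    bumps_per_mm = bump_rows * (1_000 / bump_pitch_um) * signal_fraction
    return bumps_per_mm * data_rate_gtps / 8  # Gb/s per lane -> GB/s total

base = bw_density_gbs_per_mm(32, 45)  # today-ish advanced-package starting point
print(f"64 GT/s, same pitch:  {bw_density_gbs_per_mm(64, 45) / base:.1f}x")  # ~2.0x (frequency)
print(f"64 GT/s, 25 um pitch: {bw_density_gbs_per_mm(64, 25) / base:.1f}x")  # ~3.6x (frequency + pitch)
```

Moving to 3D hybrid bonding changes the geometry entirely: bandwidth scales with bonded area rather than die edge, which is why the long-term figure is quoted per mm² rather than per mm.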
By standardizing on an open, interoperable interconnect, UCIe enables the disaggregation of compute and memory resources while maintaining low latency and high efficiency — critical for future AI and HPC systems.

🌐 Analysis
UCIe is rapidly becoming the backbone of heterogeneous chiplet ecosystems, addressing the most pressing challenge in AI infrastructure—memory bandwidth. With backing from Intel, AMD, Arm, and dozens of ecosystem players, UCIe has positioned itself as the “PCI Express for chiplets.”
This presentation underscores UCIe’s evolution beyond die-to-die connectivity into the domain of memory integration. By enabling efficient attachment of HBM and LPDDR via UCIe PHY, the standard offers a practical way to keep pace with the insatiable memory demands of AI training and inference.
The roadmap is particularly aggressive: UCIe 3.0 at 64 GT/s and 3D hybrid bonding with sub-micron bumps promise bandwidth densities that dwarf today’s HBM solutions, while achieving an order-of-magnitude power efficiency improvement. This positions UCIe not just as a chiplet interconnect, but as the critical path forward for next-generation AI accelerators and data center servers.
🌐 We’re tracking UCIe developments and chiplet standards at ConvergeDigest.com.







