Converge Digest

AMD Lays Out Full-Stack Vision for AI Infrastructure

At its “Advancing AI 2025” event, AMD unveiled a complete roadmap for next-generation AI infrastructure, emphasizing open networking standards, scalable system architectures, and developer-accessible software. The announcement included the launch of the Instinct™ MI350 Series GPUs and the ROCm 7 software stack, along with a preview of “Helios,” a fully integrated rack-scale AI platform set to arrive in 2026. These developments underscore AMD’s commitment to disaggregated, standards-based architectures as AI workloads become increasingly distributed and network-bound.

The Instinct MI355X GPU, based on AMD’s CDNA 4 architecture, delivers up to 20 PFLOPS of FP4 compute, backed by 288 GB of HBM3E memory and 8 TB/s of memory bandwidth. Systems scale up to 128 GPUs per rack in liquid-cooled configurations, achieving 2.6 exaFLOPS of AI compute and supporting models with over 500 billion parameters. ROCm 7, AMD’s open-source AI software stack, brings full support for FP4, advanced inference optimization, and turnkey MLOps tools. Together, these technologies form the foundation of AMD’s open rack-scale AI strategy.
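The rack-scale figure follows directly from the per-GPU numbers quoted above; a quick back-of-the-envelope check:

```python
# Sanity check of the quoted rack-scale compute figure,
# using only the per-GPU and per-rack numbers in the announcement.
gpus_per_rack = 128          # liquid-cooled configuration
fp4_pflops_per_gpu = 20      # MI355X peak FP4 throughput, PFLOPS

rack_pflops = gpus_per_rack * fp4_pflops_per_gpu
rack_exaflops = rack_pflops / 1000  # 1 exaFLOPS = 1000 PFLOPS

print(rack_exaflops)  # 2.56, consistent with the ~2.6 exaFLOPS claim
```

The 2.56 exaFLOPS result matches the rounded 2.6 exaFLOPS figure AMD cites for a full 128-GPU rack.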

At the heart of this infrastructure is a strong emphasis on networking innovation. AMD is a founding member of the UALink™ Consortium, which is establishing an open interconnect standard for GPU-to-GPU communications across servers and racks. The MI355X and upcoming MI400 Series GPUs are UALink-enabled, allowing up to 72 GPUs per Helios rack to operate as a unified compute domain. UALink offers 260 TB/s of intra-rack bandwidth, which AMD positions as surpassing proprietary fabrics such as NVIDIA’s NVLink in scalability and openness. AMD’s roadmap also includes support for tunneling UALink over Ultra Ethernet, blending the performance of scale-up fabrics with the flexibility of Ethernet.
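To put the aggregate UALink figure in per-GPU terms, a simple derived calculation (an illustration from the quoted numbers, not a figure AMD publishes):

```python
# Hypothetical derivation: average intra-rack bandwidth per GPU in a
# 72-GPU Helios compute domain, from the aggregate 260 TB/s figure.
intra_rack_tbps = 260    # total UALink intra-rack bandwidth, TB/s
gpus_per_domain = 72     # UALink-connected GPUs per Helios rack

per_gpu_tbps = intra_rack_tbps / gpus_per_domain
print(round(per_gpu_tbps, 2))  # 3.61 TB/s per GPU on average
```

This averages the fabric evenly across GPUs; actual per-link bandwidth depends on the rack’s UALink topology, which the announcement does not detail.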

AMD’s Pensando™ Pollara 400 and upcoming Vulcano AI NICs are fully compliant with the Ultra Ethernet Consortium (UEC) specification, offering programmable congestion control, path-aware routing, and support for up to 800G network throughput. The Vulcano NIC, launching with the Helios rack in 2026, will deliver 8x greater scale-out bandwidth per GPU than current-generation NICs. These networking components enable distributed inference and Mixture of Experts (MoE) workloads to scale with low latency and high throughput, which is critical for the future of agentic AI. AMD’s approach also leverages CXL 3.0 in its “Venice” EPYC CPUs for coherent memory sharing between CPUs and accelerators in heterogeneous deployments.

“AMD is driving AI innovation at an unprecedented pace, highlighted by the launch of our MI350 Series accelerators, our expanding ROCm software ecosystem, and the preview of our Helios rack platform,” said Dr. Lisa Su, Chair and CEO of AMD. “We are building the most open, most performant, and most scalable AI infrastructure portfolio in the industry—one that enables our customers and partners to unlock the full potential of generative and agentic AI at every level of deployment. From silicon to systems to networking, AMD is empowering a new era of open, rack-scale computing that redefines what’s possible in AI.”
