Converge Digest

AMD Lays Out Full-Stack Vision for AI Infrastructure

June 15, 2025

At its “Advancing AI 2025” event, AMD unveiled a complete roadmap for next-generation AI infrastructure, emphasizing open networking standards, scalable system architectures, and developer-accessible software. The announcement included the launch of the Instinct™ MI350 Series GPUs, the ROCm 7 software stack, and a preview of “Helios,” a fully integrated rack-scale AI platform set to arrive in 2026. These developments underscore AMD’s commitment to disaggregated, standards-based architectures as AI workloads become increasingly distributed and network-bound.

The Instinct MI355X GPU, based on AMD’s CDNA 4 architecture, delivers up to 20 PFLOPS FP4, backed by 288GB of HBM3E memory and 8TB/s bandwidth. Systems scale up to 128 GPUs per rack in liquid-cooled configurations, achieving 2.6 exaFLOPS of AI compute and supporting models with over 500 billion parameters. ROCm 7, AMD’s open-source AI software stack, brings full support for FP4, advanced inference optimization, and turnkey MLOps tools. Together, these technologies form the foundation of AMD’s open rack-scale AI strategy.
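The rack-level figures quoted above follow directly from the per-GPU numbers. A quick back-of-envelope check makes the scaling explicit; note that the 4-bit-weights assumption (0.5 bytes per parameter) used for the model-size estimate is an illustration, not an AMD-published figure:

```python
# Sanity-check the quoted rack-scale figures from the per-GPU specs.
GPUS_PER_RACK = 128        # liquid-cooled configuration
FP4_PFLOPS_PER_GPU = 20    # peak FP4 per MI355X
HBM_GB_PER_GPU = 288       # HBM3E capacity per GPU

rack_exaflops = GPUS_PER_RACK * FP4_PFLOPS_PER_GPU / 1000
print(f"Rack FP4 compute: {rack_exaflops:.2f} EF")  # ~2.6 EF, as quoted

# Assumption for illustration: a 520B-parameter model stored as
# 4-bit weights takes 0.5 bytes per parameter.
params = 520e9
weight_gb = params * 0.5 / 1e9
print(f"FP4 weights: {weight_gb:.0f} GB vs {HBM_GB_PER_GPU} GB HBM3E per GPU")
```

Under that assumption the 520B-parameter weights (260 GB) fit within a single GPU's 288 GB of HBM3E, which is consistent with the article's claim that the rack supports models of this size.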

At the heart of this infrastructure is a strong emphasis on networking innovation. AMD is a founding member of the UALink™ Consortium, which is establishing an open interconnect standard for GPU-to-GPU communications across servers and racks. The MI355X and upcoming MI400 Series GPUs are UALink-enabled, allowing up to 72 GPUs per Helios rack to operate as a unified compute domain. UALink offers 260TB/s of intra-rack bandwidth, surpassing proprietary fabrics like NVLink in scalability and openness. AMD’s roadmap also includes support for tunneling UALink over Ultra Ethernet, blending the performance of scale-up fabrics with the flexibility of Ethernet.

AMD’s Pensando™ Pollara 400 and upcoming Vulcano AI NICs are fully compliant with the Ultra Ethernet Consortium (UEC) specification, offering programmable congestion control, path-aware routing, and support for up to 800G network throughput. The Vulcano NIC, launching with the Helios rack in 2026, will deliver 8x greater scale-out bandwidth per GPU than current-generation NICs. These networking components enable distributed inference and Mixture of Experts (MoE) workloads to scale with low latency and high throughput, which is critical for the future of agentic AI. AMD’s approach also leverages CXL 3.0 in its “Venice” EPYC CPUs for coherent memory sharing between CPUs and accelerators in heterogeneous deployments.

  • Instinct MI355X GPU Highlights:
    • 288GB HBM3E memory, 8TB/s bandwidth
    • Up to 20 PFLOPS FP4, 10 PFLOPS FP8, and 5 PFLOPS FP16
    • Scales to 128 GPUs per rack in liquid-cooled configurations, or 64 per rack air-cooled
    • Supports 2.6 EF (FP4) per rack and models up to 520B parameters
    • 40% more tokens-per-dollar vs. NVIDIA B200 in benchmarked inference
  • ROCm 7 AI Software Stack:
    • 3.5x inference and 3x training uplift over ROCm 6
    • Native support for FP4, SGLang, vLLM, PyTorch, JAX
    • Enterprise-ready MLOps for orchestration, compliance, and scaling
    • Available via AMD Developer Cloud, enabling free access for open-source developers
  • Helios Rack-Scale AI Platform (Launching 2026):
    • Up to 72 next-gen MI400 GPUs with 432GB HBM4 and 19.6TB/s bandwidth per GPU
    • “Venice” EPYC CPUs (Zen 6) with 256 cores and 1.6TB/s memory bandwidth
    • 260TB/s UALink fabric bandwidth across rack
    • “Vulcano” NICs with 800G Ethernet and 8x GPU scale-out bandwidth
    • Fully OCP-compliant design with UEC, CXL, and UALink integration
  • Networking Innovation for AI Infrastructure:
    • UALink™: Open accelerator interconnect enabling 1,024+ GPU domains
    • Ultra Ethernet: UEC-compliant NICs support programmable congestion control, path-aware routing
    • Vulcano NIC: PCIe/UALink hybrid interface with 800G throughput
    • CXL 3.0: Coherent CPU-GPU memory sharing in heterogeneous AI systems
    • End-to-end openness: All networking layers designed for interoperability, avoiding vendor lock-in
  • Ecosystem Momentum:
    • Meta: Running Llama 3/4 on MI300X, planning adoption of MI350 and MI400
    • Microsoft Azure: Using Instinct GPUs for proprietary and OSS model hosting
    • Oracle Cloud Infrastructure: Deploying 131,072 MI355X GPUs in zettascale clusters
    • Red Hat: Shipping OpenShift AI on AMD GPU platforms
    • Cohere, xAI, HUMAIN: Building LLM inference and agentic workloads on AMD silicon
    • Astera Labs and Marvell: Collaborating on open interconnects including UALink
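Taken together, the Helios bullet points imply sizeable per-rack aggregates. A small sketch derives them from the per-GPU figures above (simple multiplication, with no allowance for overheads or topology effects):

```python
# Aggregate Helios rack figures derived from the per-GPU specs above.
GPUS = 72             # MI400 GPUs per Helios rack
HBM4_GB = 432         # HBM4 capacity per GPU
HBM_BW_TBPS = 19.6    # memory bandwidth per GPU (TB/s)
UALINK_TBPS = 260     # quoted intra-rack UALink fabric bandwidth

rack_hbm_tb = GPUS * HBM4_GB / 1000
rack_mem_bw = GPUS * HBM_BW_TBPS
per_gpu_fabric = UALINK_TBPS / GPUS  # rough per-GPU share of the fabric

print(f"HBM4 per rack:      {rack_hbm_tb:.1f} TB")      # ~31 TB
print(f"Memory BW per rack: {rack_mem_bw:.0f} TB/s")    # ~1,411 TB/s
print(f"Fabric BW per GPU:  {per_gpu_fabric:.1f} TB/s") # ~3.6 TB/s
```

The per-GPU fabric share is a naive even split of the quoted 260TB/s; actual point-to-point bandwidth depends on the UALink topology.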

“AMD is driving AI innovation at an unprecedented pace, highlighted by the launch of our MI350 Series accelerators, our expanding ROCm software ecosystem, and the preview of our Helios rack platform,” said Dr. Lisa Su, Chair and CEO of AMD. “We are building the most open, most performant, and most scalable AI infrastructure portfolio in the industry—one that enables our customers and partners to unlock the full potential of generative and agentic AI at every level of deployment. From silicon to systems to networking, AMD is empowering a new era of open, rack-scale computing that redefines what’s possible in AI.”



Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley


© 2025 Converge Digest - A private dossier for networking and telecoms.
