Oracle and AMD have announced a major expansion of their partnership to deliver high-performance AI infrastructure, with Oracle Cloud Infrastructure (OCI) set to become one of the first hyperscalers to deploy AMD’s new Instinct MI355X GPUs. The forthcoming OCI supercluster will scale up to 131,072 GPUs, targeting massive AI workloads including large language model training, generative inference, and next-gen agentic applications.
The new MI355X-powered compute shapes promise up to a 2.8X throughput increase over the previous AMD generation, enabled by 288GB of HBM3E memory per GPU and up to 8TB/s of memory bandwidth. Support for the FP4 datatype will allow efficient inference of 4-bit quantized models, while dense liquid-cooled racks of 64 GPUs aim to optimize thermal efficiency and performance density for hyperscale training. Oracle’s zettascale cluster will pair AMD Turin CPUs as high-performance head nodes with AMD Pollara network interface cards for ultra-low-latency RoCE networking.
The collaboration also builds on AMD’s open-source ROCm software stack, giving customers compatibility with existing open frameworks and helping them avoid vendor lock-in. By deploying AMD Pollara NICs on its backend network, Oracle becomes the first cloud provider to implement Ultra Ethernet Consortium (UEC) standards for AI networking at scale. “The latest generation of AMD Instinct GPUs and Pollara NICs on OCI will help support new use cases in inference, fine-tuning, and training,” said Forrest Norrod, EVP at AMD.
- OCI to deploy up to 131,072 AMD Instinct MI355X GPUs in zettascale supercluster
- MI355X offers 2.8X throughput boost and 50% more memory than prior generation
- 288GB HBM3E per GPU and FP4 support for efficient LLM inference
- Liquid-cooled racks at 125kW each, 64 GPUs per rack at 1,400W per GPU
- AMD Pollara NICs bring programmable congestion control and UEC standards to AI networking
“We are dedicated to providing the broadest AI infrastructure offerings,” said Mahesh Thiagarajan, EVP at Oracle Cloud Infrastructure. “AMD Instinct GPUs, paired with OCI’s performance, advanced networking, flexibility, and scale, will help our customers meet their inference and training needs for AI workloads and new agentic applications.”