NVIDIA confirmed that xAI’s Colossus supercomputer in Memphis, Tennessee, housing 100,000 NVIDIA Hopper GPUs, reached that scale using the NVIDIA Spectrum-X Ethernet networking platform. Designed for large-scale AI processing, Spectrum-X builds on standards-based Ethernet while delivering high efficiency for remote direct memory access (RDMA) traffic across AI data centers. Colossus, the largest AI cluster of its kind, trains xAI’s Grok family of large language models, which power chatbot features available to X Premium subscribers.
- 100,000 NVIDIA Hopper GPUs power Colossus, expanding to 200,000
- Spectrum-X Ethernet platform achieves 95% data throughput with RDMA
- Facility built in 122 days, with the system reaching training-ready status 19 days after hardware installation began
- Uses adaptive routing, congestion control, and enhanced performance isolation
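To put the 95% throughput figure above in perspective, here is a minimal back-of-envelope sketch. The per-GPU link rate of 400 Gb/s and the ~60% throughput figure for congested standard Ethernet are assumptions for illustration only, not numbers from the article:

```python
# Illustrative calculation: aggregate effective bandwidth at the 95%
# data throughput cited for Spectrum-X, versus an assumed ~60% for
# standard Ethernet under heavy RDMA flow-collision load.
# ASSUMPTION: 400 Gb/s network link per GPU (hypothetical for this sketch).

LINK_RATE_GBPS = 400      # assumed per-GPU link rate (Gb/s)
NUM_GPUS = 100_000        # Colossus GPU count from the article

def effective_bandwidth_tbps(throughput_fraction: float) -> float:
    """Aggregate effective bandwidth across all links, in Tb/s."""
    return LINK_RATE_GBPS * NUM_GPUS * throughput_fraction / 1_000

spectrum_x = effective_bandwidth_tbps(0.95)   # 38,000 Tb/s
standard = effective_bandwidth_tbps(0.60)     # 24,000 Tb/s

print(f"Spectrum-X at 95%: {spectrum_x:,.0f} Tb/s aggregate")
print(f"Assumed 60% case:  {standard:,.0f} Tb/s aggregate")
print(f"Uplift: {spectrum_x / standard - 1:.0%}")
```

At cluster scale, even a modest gain in sustained throughput per link compounds into tens of petabits per second of aggregate difference, which is why congestion control and adaptive routing matter for training jobs of this size.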
“AI is becoming mission-critical and requires increased performance, security, scalability and cost-efficiency,” said Gilad Shainer, senior vice president of networking at NVIDIA. “The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads, and in turn accelerates the development, deployment and time to market of AI solutions.”
“Colossus is the most powerful training system in the world,” said Elon Musk on X. “Nice work by xAI team, NVIDIA and our many partners/suppliers.”
From @ServeTheHome on X
