Converge Digest
OIF 448: Google’s AI Challenge – Scaling Networks for 100K+ TPU Clusters

May 18, 2025
in Video

At the recent OIF 448G Workshop in Santa Clara, Tad Hofmeister, Optical Hardware Engineer on Google’s Machine Learning Systems team, offered a deep dive into Google’s evolving AI infrastructure and made a compelling case for accelerating industry-wide support for 448Gbps electrical interfaces. Hofmeister, a long-time OIF contributor now focused on data center interconnects for AI workloads, outlined the demands of hyperscale AI clusters—both Google’s custom TPU-based systems and NVIDIA-based GPU clusters—and their growing reliance on high-speed, high-density connectivity to handle scale-up and scale-out traffic.

Hofmeister emphasized that while power and cost are always factors, the central motivation for 448G is simple: XPUs are running out of I/O escape. As Google’s Ironwood TPUs and NVIDIA’s Grace Blackwell GPUs push the limits of on-chip compute, the need to move more data between devices becomes critical. Hofmeister detailed both Google’s proprietary ICI-based TPU topology—which uses optical interconnects between cube-style clusters—and NVIDIA’s rack-contained NVLink GPU architectures, highlighting how both platforms demand massive bandwidth density and flexibility, with increasing adoption of co-packaged copper (CPC) to overcome signal integrity and density challenges.
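To make the I/O-escape argument concrete, here is a rough sketch rather than a figure from the talk. It assumes a hypothetical escape-bandwidth target of 100 Tb/s per package and nominal lane rates of 224 Gb/s and 448 Gb/s, ignoring FEC and encoding overhead. The function name and the target value are illustrative only.

```python
import math


def lanes_needed(target_gbps: float, lane_rate_gbps: float) -> int:
    """Return the number of SerDes lanes needed to carry target_gbps."""
    return math.ceil(target_gbps / lane_rate_gbps)


# Hypothetical per-package escape bandwidth (an assumption, not from the talk).
TARGET_GBPS = 100_000  # 100 Tb/s

for lane_rate in (224, 448):
    print(f"{lane_rate}G lanes needed: {lanes_needed(TARGET_GBPS, lane_rate)}")
```

Under these assumptions, doubling the lane rate roughly halves the number of lanes that must escape the package, which is the constraint Hofmeister describes.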

He urged standards bodies to prioritize fast decision-making, suggesting the industry choose between PAM6 and PAM8 to avoid delays, and supported new front-panel connector MSAs tailored for 448G, even at the expense of backward compatibility. Hofmeister concluded by warning against designs that cannot be reliably serviced at scale and encouraged the community to adopt solutions that support flexibility, testability, and production viability.
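The modulation choice matters because more amplitude levels carry more bits per symbol and so lower the symbol rate a lane must run at. The sketch below computes the ideal symbol rate for a nominal 448 Gb/s lane, assuming log2(levels) bits per symbol and no FEC or coding overhead. These are simplifications for illustration, not numbers from the talk.

```python
import math

LINE_RATE_GBPS = 448  # nominal per-lane bit rate, overhead ignored (assumption)


def symbol_rate_gbd(levels: int, line_rate_gbps: float = LINE_RATE_GBPS) -> float:
    """Ideal symbol rate in GBd for PAM with `levels` amplitude levels."""
    bits_per_symbol = math.log2(levels)  # theoretical upper bound
    return line_rate_gbps / bits_per_symbol


for levels in (4, 6, 8):
    print(f"PAM{levels}: {symbol_rate_gbd(levels):.1f} GBd")
```

Practical PAM6 mappings typically carry 2.5 bits per symbol rather than the theoretical log2(6), so a real PAM6 lane would run somewhat faster than this ideal figure. The trade-off is that higher-order PAM lowers the symbol rate but packs the levels closer together, which tightens signal-to-noise requirements.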

• Google’s TPU-based AI clusters use a proprietary interconnect with optical circuit switching between racks, enabling scale-up to 9,216 TPUs per superpod.
• XPU trays must support both copper and optical interconnects via modular OSFPs for flexible deployment.
• The move to 448G is driven by package I/O limitations, not just performance or power savings.
• Google is skeptical that PAM4 will close at 448G and advocates for PAM6 or PAM8.
• Co-packaged copper is critical to bypass PCB limitations and achieve SerDes targets.
• Front-panel pluggables with improved connectors and possibly 12V power are needed to support up to 50W modules for high-performance optics.
• New connector MSAs should prioritize signal integrity over backward compatibility.
• Reliability, serviceability, and supply chain flexibility must be core design principles.
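
The case for a 12V supply on high-power pluggables follows from I = P / V. The sketch below compares the current a 50W module would draw from a 12V rail against a 3.3V rail. The 3.3V baseline is an assumption based on the supply rail used by existing pluggable form factors such as OSFP, not a figure from the talk.

```python
def supply_current_amps(power_w: float, rail_v: float) -> float:
    """Current drawn from a supply rail, from I = P / V."""
    return power_w / rail_v


MODULE_POWER_W = 50.0  # upper module power quoted in the summary above

# 3.3 V is assumed as the baseline rail; 12 V is the proposed alternative.
for rail_v in (3.3, 12.0):
    amps = supply_current_amps(MODULE_POWER_W, rail_v)
    print(f"{rail_v} V rail: {amps:.1f} A")
```

Because resistive loss in the connector contacts scales with the square of the current, the lower current at 12V eases both contact heating and pin-count pressure for the same module power.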

Tad Hofmeister, Optical Hardware Engineer, Google:

“448G isn’t just about speed—it’s about survival. We’re hitting the ceiling on how many SerDes we can escape from these XPUs. The path forward requires rethinking connector design, embracing co-packaged copper, and accepting that some legacy constraints must be broken to get where AI needs us to go.”

Want to be involved in our video series? Contact info@nextgeninfra.io
https://ngi.fyi/oif448-google-tad

Tags: 448, Google, OIF
Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley
