Converge Digest

Google’s Amin Vahdat: Networking Is the Bottleneck

March 31, 2025
in Optical

San Francisco, March 31, 2025 — In a keynote at the Optica Executive Forum, Amin Vahdat, VP and Fellow at Google, delivered an urgent message for this week’s OFC 2025 conference: the future of AI hinges not just on compute and storage, but on solving the networking bottleneck. “Networking is the number one bottleneck we face,” Vahdat asserted, emphasizing that achieving the next generation of AI breakthroughs will require a radical rethinking of how data moves across systems — from chip-level to global scale.

Vahdat traced the history of computing from the early days of copper links to today’s multi-gigawatt AI infrastructure powered by TPUs and GPUs. He described how AI workloads, particularly model training and serving, now demand ultra-high bandwidth, low-latency interconnects. To support synchronous workloads running on thousands of TPUs, Google has turned to optical circuit switching, using MEMS-based systems to enable real-time failover and reconfiguration. These switches have become essential, not optional, to maintaining the reliability and scale of modern AI clusters.
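
The failover idea described above can be illustrated with a toy model. This is a minimal sketch, not Google's control plane: it treats an optical circuit switch as a one-to-one map from input ports to output ports and re-points a circuit to a spare when an endpoint fails. The class name, the `fail_over` method, and the port numbers are all hypothetical, and the physical mirror steering inside a MEMS device is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class OpticalCircuitSwitch:
    """Toy OCS model (hypothetical): a one-to-one map of input to output ports."""

    circuits: dict[int, int] = field(default_factory=dict)

    def connect(self, in_port: int, out_port: int) -> None:
        # A circuit switch carries at most one circuit per output port.
        if out_port in self.circuits.values():
            raise ValueError(f"output port {out_port} is already in use")
        self.circuits[in_port] = out_port

    def fail_over(self, failed_out: int, spare_out: int) -> list[int]:
        """Re-point any circuit landing on failed_out to spare_out.

        Returns the input ports whose circuits were moved.
        """
        if spare_out in self.circuits.values():
            raise ValueError(f"spare port {spare_out} is already in use")
        moved = [i for i, o in self.circuits.items() if o == failed_out]
        for in_port in moved:
            self.circuits[in_port] = spare_out
        return moved


# Illustrative port numbers only.
ocs = OpticalCircuitSwitch()
ocs.connect(0, 10)
ocs.connect(1, 11)
moved = ocs.fail_over(failed_out=11, spare_out=12)
print(moved, ocs.circuits)
```

Only the mapping changes in this sketch; the surviving circuit on port 0 is left untouched, which is the property that lets a synchronous job continue on the remaining healthy hardware.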

Key Takeaways

• The Network is the Bottleneck:

• Vahdat called networking the primary limiting factor for AI scaling — more than compute or storage.

• Demand for compute is growing 10× per year, but interconnects aren’t keeping pace.

• Optical Circuit Switching at Google:

• Google’s AI clusters use optical circuit switches (OCS) for interconnecting 144 racks (8,960 TPUs) per pod.

• MEMS-based switches enable real-time reconfiguration after failures, ensuring continued synchronous computation.

• Google reconfigures its OCS multiple times daily to manage failures dynamically.

• Optical losses and transceiver reliability remain critical challenges, with optics being the #1 failure point in Google’s infrastructure.

• MEMS and Reliability:

• MEMS switches have proven reliable once qualified, but yield and packaging remain hurdles.

• Google urged the optical industry to drive better reliability standards beyond traditional telecom benchmarks.

• TPU Network Design:

• Google designed a proprietary non-Ethernet protocol for TPU-to-TPU communication to minimize overhead.

• Each TPU has bandwidth equivalent to a mid-sized Ethernet switch to support intensive, synchronous workloads.

• Liquid Cooling and Power Constraints:

• Liquid cooling isn’t optional anymore; it delivers higher compute-per-watt and supports thermally dense systems.

• Power, not cost, is the main constraint on scaling data centers today — prompting a shift to perf-per-watt as the main design metric.

• AI Architecture Trends:

• Moving from general-purpose CPUs to specialized compute (TPUs, GPUs) is key to efficiency gains.

• Google sees microarchitectural optimization, system-level tuning, and algorithmic improvements as main drivers going forward.

• Latency and Cluster Size:

• Latency is still masked by clever software, but reducing it would unlock more efficient scaling.

• Google is hedging with network designs that could scale to million-node clusters, even if not all elements are synchronized at once.

• Future Outlook:

• AI’s next phase is about delivering real-time insight, not just information.

• Vahdat predicts massive breakthroughs over the next five years — contingent on solving networking and reliability constraints.
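
The first takeaway, that compute demand grows 10× per year while interconnects lag, compounds quickly. The sketch below works through the arithmetic. The 10× figure is from the talk as reported above; the interconnect growth factor of 2× per year is purely an illustrative placeholder, not a number from the talk, and the function name is hypothetical.

```python
def demand_supply_gap(
    years: int,
    demand_growth: float = 10.0,        # 10x per year, as reported in the talk
    interconnect_growth: float = 2.0,   # ASSUMPTION: illustrative placeholder only
) -> float:
    """Ratio of compute demand to interconnect capacity after `years`,
    with both normalized to 1.0 at year zero."""
    return (demand_growth / interconnect_growth) ** years


for year in range(1, 4):
    print(year, demand_supply_gap(year))
```

Under these assumed rates the gap widens by a factor of five every year, so any fixed head start in interconnect capacity is used up within a few years.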

Tags: Google, OCS, OFC25, Optica, Optical Switch
Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley

© 2025 Converge Digest - A private dossier for networking and telecoms.
