Converge Digest
Friday, April 10, 2026


Optica Executive Forum: Tech Giants Debate Future of Photonics in AI Clusters

April 1, 2025
in Optical

by James E. Carroll

San Francisco, March 31, 2025 — At the Optica Executive Forum during OFC 2025, an all-star panel of infrastructure leaders from Microsoft, NVIDIA, Meta, and Arista Networks convened to tackle one of the biggest bottlenecks in AI computing: the interconnect. With hyperscale data centers scaling up to millions of GPUs and petabytes of data in motion, the panel explored how photonic interconnects can bridge the performance, power, and reliability gaps emerging in next-gen AI clusters.

Moderated by Chris Pfistner of Avicena Tech, the session broke down the multi-layered topology of data center networks—front-end, scale-out, and the elusive scale-up layer—where traditional copper links still dominate due to cost and simplicity. But as Microsoft’s Pradeep Sindhu explained, AI workloads now require interconnecting thousands of GPUs per pod with ever-increasing bandwidth per device. “Copper is simply not going to scale to 256 or 512 GPUs in a pod,” Sindhu warned. “The opportunity for optics lies squarely in the scale-up layer.”
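
To see why pod sizes of 256 or 512 GPUs strain copper, it helps to total up the bandwidth in play. The per-GPU figure below is an illustrative assumption for the sketch, not a number quoted on the panel:

```python
# Back-of-envelope aggregate scale-up bandwidth per pod. PER_GPU_TBPS is an
# illustrative assumption (~900 GB/s-class per-GPU fabric), not a panel figure.
PER_GPU_TBPS = 7.2

for gpus in (64, 256, 512):
    print(f"{gpus:>3} GPUs -> {gpus * PER_GPU_TBPS:,.1f} Tb/s aggregate in the pod")
```

At hundreds of GPUs per pod, the aggregate climbs into thousands of terabits per second, which is where copper's reach and signal-integrity limits, noted by Sindhu, start to bind.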

NVIDIA’s Ashkan Seyedi showcased the company’s latest advances in co-packaged optics (CPO), introduced just weeks prior, as a way to tackle power inefficiency and reduce network jitter. The new Spectrum-X platform, which integrates optics directly onto the switch package, was framed as a critical enabler of GPU utilization. “Power is directly translatable to money,” Seyedi noted. “With CPO, we can interconnect 3× the number of GPUs at the same network power budget compared to pluggables.”
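
The “3× the GPUs at the same power budget” claim implies roughly a threefold drop in energy per transmitted bit. The pJ/bit and budget figures below are assumptions chosen to make the arithmetic concrete, not NVIDIA-published specs:

```python
def link_watts(pj_per_bit: float, gbps: float) -> float:
    # pJ/bit * Gb/s: (1e-12 J/pJ) * (1e9 bit/s) = 1e-3, so the result is watts
    return pj_per_bit * gbps * 1e-3

PLUGGABLE_PJ_PER_BIT = 15.0   # assumed figure for a retimed pluggable module
CPO_PJ_PER_BIT = 5.0          # assumed figure for co-packaged optics
BUDGET_W = 120.0              # assumed per-switch optics power budget
LINK_GBPS = 800

pluggable_links = int(BUDGET_W // link_watts(PLUGGABLE_PJ_PER_BIT, LINK_GBPS))
cpo_links = int(BUDGET_W // link_watts(CPO_PJ_PER_BIT, LINK_GBPS))
print(pluggable_links, cpo_links)  # prints "10 30": 3x the links in the same budget
```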

Meta’s Drew Alduino provided a sobering reality check, grounding the conversation in operational experience. Meta plans to deploy 1.3 million GPUs this year, backed by a $60–$65 billion CapEx investment. “It’s not all optics,” he said, “but it’s not not optics either.” Alduino emphasized that reliability, not just bandwidth, is now the industry’s Achilles’ heel. A single failing optical link can stall an entire AI training job, and the expected frequency of such failures grows in proportion to cluster size. “With 100,000 nodes, you’re failing every 20 seconds unless your network becomes bulletproof.”
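
Alduino’s “failing every 20 seconds” figure can be sanity-checked with simple rate arithmetic, assuming independent, identically-behaving nodes (a simplification):

```python
# If a 100,000-node cluster sees one failure every 20 s, failure rates add
# across nodes, so the implied per-node MTBF is N times the cluster MTBF.
NODES = 100_000
CLUSTER_MTBF_S = 20
per_node_mtbf_s = NODES * CLUSTER_MTBF_S
print(per_node_mtbf_s / 86_400)  # per-node MTBF in days, roughly 23
```

A per-node mean time between failures of only about 23 days is enough to produce a cluster-wide failure every 20 seconds at that scale, which is why the panel treated reliability as a first-order design constraint.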

Arista’s Andy Bechtolsheim championed linear pluggable optics (LPO) as a more practical and serviceable alternative to CPO. “Yes, you get the same power and latency,” he said, “but pluggables offer better serviceability, faster repair cycles, and open multi-vendor compatibility.” He urged the industry to accelerate development of high-density 64-lane pluggable modules, arguing that many of the benefits attributed to CPO can be achieved in a pluggable form factor without the system-level downsides.


Key Takeaways from the Panel + Q&A

• AI cluster growth is exponential: Meta expects 1.3M GPUs online in 2025, with data centers drawing over 2 GW—equivalent to powering San Francisco.

• Photonic interconnects are already used in scale-out and long-haul links, but have yet to penetrate the scale-up GPU-to-GPU domain.

• Microsoft’s view: Optical transceivers are essential for scaling pod sizes beyond 64 GPUs; copper will hit thermal and signal integrity limits.

• NVIDIA’s Spectrum-X CPO platform promises 3× GPU interconnect density at the same network power footprint as traditional pluggables.

• Meta emphasized that reliability—especially soft/transient failures—has become the biggest barrier to scaling AI infrastructure.

• CPO vs LPO: CPO offers tighter integration and lower power; LPO provides superior modularity and easier diagnostics and replacement.

• Total Cost of Ownership (TCO): Panelists agreed that cost-per-link isn’t the only metric; performance-per-TCO across the entire data center is what really matters.

• Shoreline bottlenecks (limited physical IO off GPUs) are being addressed through 3D packaging, chiplet designs, and short-reach electrical channels.

• Optical Circuit Switching (OCS) is not a substitute for packet switching in AI training workloads—OCS is more akin to a dynamic patch panel.

• Serviceability risk: CPO failures require replacing the entire switch chassis, whereas LPO failures can be isolated to a single module, saving hours.

• Failure trends: Most failures are not lasers but components like wire bonds and connectors; better integration can mitigate risks.

• GR-468 not sufficient: Data center scale brings unique reliability and testing needs not covered by telecom-grade standards.

• Future timeline: Copper will dominate GPU-to-GPU interconnects through 2027, but optical scale-up is inevitable as rack densities rise.

• Call to industry: Bechtolsheim urged development of a new open 64-lane pluggable standard to avoid being locked into closed CPO solutions.


Tags: Arista, OFC25, Optica

Jim Carroll


Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley


© 2025 Converge Digest - A private dossier for networking and telecoms.
