Converge Digest

PECC Summit: Broadcom’s Near Margalit on CPO Evolution

At the Photonic Enabled Cloud Computing (PECC) Summit in Silicon Valley, Near Margalit, Vice President and General Manager of Broadcom’s Optical Systems Division, shared the company’s latest thinking on how optical backplanes and co-packaged optics (CPO) will reshape the physical boundaries of AI infrastructure.

Margalit began by mapping the “cost of moving data” across layers of compute—from on-die interconnects, to package-level communication, to chassis, rack, and finally data center scale. Each expansion in scope, he noted, increases both latency and energy consumption. The traditional boundary between scale-up and scale-out has long been defined by the limits of electrical backplanes. Optical interconnects now make it possible to push that boundary further.

“Our goal is to make in-row optical connectivity equivalent to electrical backplanes in cost, power, and latency,” he said. “That’s the key to extending scale-up systems beyond a single rack.”

Extending Scale-Up to the Row

Broadcom envisions optical links replacing short copper reaches to connect entire rows of AI accelerators—dozens of racks operating as a unified compute domain. Achieving that requires optical systems that match the efficiency of electrical interconnects while dramatically simplifying cabling.

“If you look at a large AI cluster, the number of fibers is astronomical,” Margalit explained. “For around 33,000 fibers per system, about 85 percent are still being manually routed. That’s not sustainable.”

To solve this, Broadcom and its partners are exploring optical flexplane and blind-mate connector concepts that allow for pre-assembled fiber systems to be shipped, installed, and serviced more easily. “This is a huge opportunity for innovation—manufacturable, cleanable, resilient optical backplanes that can scale to thousands of endpoints,” he said.

Designing a 1,000-XPU Optical Cluster

Drawing on Broadcom’s internal modeling, Margalit described how a 1,000-XPU AI cluster could be interconnected using 128 parallel switches, each with 200 Tbps of bandwidth. Each accelerator would have multiple bidirectional fiber links providing full any-to-any connectivity across the system.

“If you look ahead to 2027 or 2028,” he said, “we expect switch bandwidths to reach around 200 terabits per second. That allows you to connect roughly a thousand XPUs using about 128 switches.”
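
As a rough check on these figures, the sketch below works out what they imply per accelerator. It assumes each switch divides its bandwidth evenly across all XPUs and that every XPU has one link to every switch; neither detail was stated in the talk, and the function names are illustrative only.

```python
# Back-of-the-envelope sizing for the cluster described above.
# Figures marked "talk" are quoted in the article; the even split and the
# one-link-per-switch wiring are assumptions, not Broadcom's stated design.

NUM_XPUS = 1_000      # talk: "roughly a thousand XPUs"
NUM_SWITCHES = 128    # talk: "about 128 switches"
SWITCH_TBPS = 200     # talk: projected switch bandwidth for 2027 or 2028


def per_link_gbps(switch_tbps: float, num_xpus: int) -> float:
    """Link rate if one switch splits its bandwidth evenly across all XPUs."""
    return switch_tbps * 1_000 / num_xpus


def per_xpu_tbps(switch_tbps: float, num_xpus: int, num_switches: int) -> float:
    """Aggregate bandwidth per XPU with one such link to every switch."""
    return per_link_gbps(switch_tbps, num_xpus) * num_switches / 1_000


link_gbps = per_link_gbps(SWITCH_TBPS, NUM_XPUS)
xpu_tbps = per_xpu_tbps(SWITCH_TBPS, NUM_XPUS, NUM_SWITCHES)
fabric_tbps = NUM_SWITCHES * SWITCH_TBPS

print(f"per link:   {link_gbps:.0f} Gbps")
print(f"per XPU:    {xpu_tbps:.1f} Tbps")
print(f"whole fabric: {fabric_tbps} Tbps")
```

Under these assumptions each XPU-to-switch link comes out at 200 Gbps and each XPU at 25.6 Tbps of aggregate bandwidth. A real design may count bandwidth or group lanes differently.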

The design emphasizes redundancy and graceful degradation. Even if one switch fails, the overall system loses less than one percent of total capacity thanks to network-level parallelism and recovery mechanisms.
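
The "less than one percent" figure follows from simple arithmetic if traffic is spread evenly across all 128 switches. The sketch below assumes that even spread, which the talk did not spell out; the helper names are hypothetical.

```python
# Capacity lost when switches fail, ASSUMING traffic is spread evenly
# across all parallel switches (an assumption, not a stated design detail).

NUM_SWITCHES = 128  # from the talk


def capacity_lost(num_switches: int, failed: int) -> float:
    """Fraction of total fabric capacity lost when `failed` switches are down."""
    if not 0 <= failed <= num_switches:
        raise ValueError("failed must be between 0 and num_switches")
    return failed / num_switches


def max_failures_below(num_switches: int, threshold: float) -> int:
    """Largest number of failed switches that keeps the loss strictly below threshold."""
    failed = 0
    while failed < num_switches and capacity_lost(num_switches, failed + 1) < threshold:
        failed += 1
    return failed


one_switch_loss = capacity_lost(NUM_SWITCHES, 1)
tolerable = max_failures_below(NUM_SWITCHES, 0.01)

print(f"loss with one switch down: {one_switch_loss:.2%}")
print(f"switch failures tolerated below 1% loss: {tolerable}")
```

With 128 equal switches, one failure removes 1/128 of capacity, or about 0.78 percent. A second simultaneous failure would push the loss to about 1.56 percent under the same assumption.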

Reliability and Serviceability of CPO

Margalit highlighted Broadcom’s multi-year CPO reliability program, which covers millions of device-hours across multiple system generations. Built with semiconductor-style manufacturing, Broadcom’s optical engines integrate photonic and electronic dies on a common substrate for higher uniformity and durability.

“The optical chiplet is built on similar infrastructure as an XPU,” he said. “Instead of memory, you have photonics. We’re applying the same manufacturing maturity that made semiconductors reliable to optics.”

Broadcom’s testing data shows telecom-grade reliability over millions of device-hours, with no non-serviceable CPO failures reported. Margalit noted that the key to serviceability lies in modular laser sources—using pluggable external laser modules that can be replaced in the field without interrupting the rest of the switch.

Designing for Failure Tolerance

Even with extreme reliability, failure is inevitable at hyperscale. A single AI cluster can include more than 260,000 optical endpoints, making some link interruptions statistically unavoidable.

“It’s not realistic to assume zero failures,” Margalit said. “The goal is to tolerate them.”
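
To see why zero failures is unrealistic at this scale, the sketch below estimates how often some endpoint fails. It assumes independent endpoints with a constant failure rate, and the FIT value (failures per billion device-hours) is a placeholder chosen purely for illustration; it is not a Broadcom figure.

```python
import math

# Failure expectation across a large endpoint population.
# ASSUMPTIONS: independent endpoints, constant failure rate (exponential model).

NUM_ENDPOINTS = 260_000    # article: "more than 260,000 optical endpoints"
HYPOTHETICAL_FIT = 100.0   # PLACEHOLDER failures per 1e9 device-hours, not measured data


def expected_failures(endpoints: int, fit: float, hours: float) -> float:
    """Expected number of endpoint failures over `hours` of operation."""
    return endpoints * fit * hours / 1e9


def prob_at_least_one(endpoints: int, fit: float, hours: float) -> float:
    """Probability that at least one endpoint fails within `hours`."""
    return 1.0 - math.exp(-expected_failures(endpoints, fit, hours))


per_year = expected_failures(NUM_ENDPOINTS, HYPOTHETICAL_FIT, 24 * 365)
p_day = prob_at_least_one(NUM_ENDPOINTS, HYPOTHETICAL_FIT, 24)

print(f"expected failures per year: {per_year:.1f}")
print(f"chance of at least one failure in a day: {p_day:.1%}")
```

Even a per-device failure rate that looks small adds up across hundreds of thousands of endpoints, which is the point behind designing the fabric to tolerate failures rather than to avoid them entirely.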

Broadcom’s optical fabric design allows for micro-session-level recovery, in which network software reroutes traffic instantly across redundant links. “If a lane or even a switch fails, we lose less than one percent of the path diversity,” he explained. “That’s how you maintain uptime in massive AI systems.”

Building for Coexistence

Margalit concluded by emphasizing that electrical and optical systems will coexist for years to come. “Electrical backplanes are simple and passive; optics add flexibility and reach,” he said. “The future is about balancing both domains to get the reliability, power, and performance AI demands.”




