AI and HPC clusters are no longer limited by processors; they are constrained by the network fabric tying them together. At today's online Hot Interconnects conference, Cornelis Networks' Field CTO Matt Williams argued that without faster, more efficient interconnects, the next generation of scientific computing and AI training workloads will stall. Cornelis is betting that its Omni-Path architecture, purpose-built for scale-out performance, can outpace InfiniBand with lower latency, higher message rates, and consistent bandwidth at massive scale.
Williams described Cornelis as the "inventors of Omni-Path," an architecture originally developed at Intel and later spun out into an independent company. Today, Cornelis ships an end-to-end solution: Omni-Path SuperNICs, custom-designed switches, and an open-source software stack built on the OPX libfabric provider, with upstream Linux support. The hardware portfolio includes a 48-port top-of-rack switch and a director-class system scaling to 576 ports in just 17 RU at half the power draw of comparable InfiniBand platforms. On the software side, Omni-Path supports MPI, PyTorch, TensorFlow, CUDA, and ROCm out of the box, making it a drop-in fabric for both HPC and AI environments.
Cornelis highlighted architectural features that set Omni-Path apart: sub-microsecond MPI latency, 2.5x the message rate of InfiniBand NDR, and fine-grained adaptive routing (FGAR) that spreads traffic across multiple paths while sharing congestion telemetry between switches. Its link-level retry mechanism retransmits errored packets locally, avoiding the application-level slowdowns caused by end-to-end retransmissions. In early benchmarks, Omni-Path delivered 23–34% lower latency and up to 45% higher performance on latency-sensitive HPC workloads.
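The adaptive-routing idea can be sketched in a few lines of Python. Note that the queue depths, tie-breaking policy, and packet counts below are illustrative assumptions for intuition only, not Omni-Path internals:

```python
# Toy sketch of fine-grained adaptive routing (FGAR): steer each packet
# to the currently least-congested of several equal-cost paths, using
# queue depths as a stand-in for the congestion telemetry that switches
# share. All numbers and the policy itself are illustrative assumptions.

def pick_path(queue_depths):
    """Index of the least-congested path (ties go to the lowest index)."""
    return min(range(len(queue_depths)), key=lambda i: queue_depths[i])

def route_packets(n_packets, initial_depths):
    """Route n_packets one at a time, always onto the emptiest queue."""
    depths = list(initial_depths)
    counts = [0] * len(depths)
    for _ in range(n_packets):
        p = pick_path(depths)
        depths[p] += 1   # packet enqueued on the chosen path
        counts[p] += 1
    return counts

# One path is already congested (depth 4); new traffic flows around it.
print(route_packets(6, [4, 0, 0]))  # -> [0, 3, 3]
```

The point of the per-packet (rather than per-flow) decision is that a single hot flow cannot pin an entire path: every packet re-evaluates the fabric's current state, which is what keeps bandwidth consistent under congestion.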
- Custom SuperNICs and switches designed for HPC + AI
- 576-port director switch in 17RU with 50% less power than rivals
- Sub-µs MPI latency and up to 2.5x InfiniBand NDR message rate
- FGAR ensures consistent bandwidth under congestion
- Link-level retry prevents packet loss from bit errors
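The link-level retry point above can be made concrete with a toy latency model comparing the two recovery strategies. Every timing constant here is an assumed, illustrative number, not a measured Omni-Path figure:

```python
# Toy model contrasting two ways to recover a packet hit by a bit error:
# an end-to-end retransmission (the sender times out, then resends across
# every hop) versus a link-level retry (the switch resends over just the
# errored link). All constants are assumptions chosen for illustration.

def end_to_end_recovery_ns(hops, per_hop_ns, timeout_ns):
    """Sender waits out a retransmit timeout, then retraverses the path."""
    return timeout_ns + hops * per_hop_ns

def link_level_retry_ns(per_hop_ns):
    """Only the single errored hop is retransmitted, immediately."""
    return per_hop_ns

HOPS = 5             # e.g. a multi-tier fabric traversal (assumed)
PER_HOP_NS = 100     # per-hop switch latency (assumed)
TIMEOUT_NS = 10_000  # sender-side retransmit timeout (assumed)

print(end_to_end_recovery_ns(HOPS, PER_HOP_NS, TIMEOUT_NS))  # -> 10500
print(link_level_retry_ns(PER_HOP_NS))                       # -> 100
```

Even with these generous assumptions the gap is two orders of magnitude, which is why recovering errors locally at the link keeps tail latency invisible to the application while end-to-end recovery does not.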
“We designed Omni-Path from the silicon up to deliver the lowest latency, highest message rate, and most efficient bandwidth utilization for HPC and AI applications,” said Matt Williams, Field CTO at Cornelis Networks.

🌐 Analysis: Cornelis is positioning Omni-Path as the fabric of choice for AI and HPC clusters that demand more than incremental gains. By controlling the silicon, system design, and software stack, Cornelis can optimize end-to-end performance in ways that competitors such as NVIDIA, with its InfiniBand portfolio, struggle to match. With cluster sizes expanding into hundreds of thousands of nodes, Cornelis' strategy of low latency, high message rates, and local retry resilience could help HPC sites and hyperscalers reduce training times and energy costs.