At its NETWORKING @SCALE 2024 event in Santa Clara, Meta engineers Jyotsna Sundaresan and Abhishek Gopalan described how Meta's global backbone network powers real-time communication across its platforms, such as Instagram and WhatsApp, and how AI is reshaping that infrastructure. Meta operates one of the world's largest backbone networks, connecting over 25 data centers and 85 points of presence over millions of miles of fiber. Backbone capacity has grown by more than 30% annually for the past five years, and the rapid growth in AI traffic is now placing unprecedented demand on this infrastructure.
The team explained that AI-driven traffic, which initially stayed within regional data centers, has grown more than 100% year over year since 2022 and now relies heavily on the backbone. This shift has created new challenges in data replication, data placement, and data freshness. For example, data for AI is replicated across the globe at a planetary scale, and training jobs require frequent data movement between regions, placing further strain on Meta's global network. To manage the surge, Meta has introduced measures such as improved data caching and better data placement strategies.
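Growth rates like these compound year over year, so their cumulative effect is larger than the annual figures suggest. The following is a minimal Python sketch of that arithmetic, assuming a constant annual rate; the talk gives only approximate rates, and the function name compound_growth is hypothetical.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Return the cumulative growth multiplier for a constant annual rate.

    Assumes the same rate applies every year (an illustrative simplification).
    """
    return (1.0 + annual_rate) ** years


# Backbone capacity: about 30% per year for five years (figures from the talk).
backbone = compound_growth(0.30, 5)

# AI traffic: about 100% per year, i.e. doubling annually. The two-year window
# (2022 to 2024) is an illustrative assumption, not a figure from the talk.
ai_traffic = compound_growth(1.00, 2)

print(f"Backbone capacity multiplier after 5 years: {backbone:.2f}x")
print(f"AI traffic multiplier after 2 years: {ai_traffic:.1f}x")
```

Because the key points describe both rates as lower bounds ("over 30%", "over 100%"), the resulting multipliers (roughly 3.71x and 4.0x) are lower bounds as well.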
Looking ahead, Meta continues to optimize across the stack, including compute, data, and storage. Because AI-driven demand on the network is volatile, the company is focused both on expanding supply, such as fiber, hardware, and power, and on making better use of existing capacity. The engineers emphasized the importance of a flexible network design that can absorb unpredictable growth in AI workloads while maintaining global connectivity for Meta's billions of users.

Key Points:
• Meta operates one of the largest backbone networks in the world, connecting over 25 data centers and 85 points of presence.
• The backbone is supported by millions of miles of terrestrial and subsea fiber routes.
• Backbone capacity has grown by over 30% annually for the last five years.
• AI-driven traffic demand has grown over 100% year over year since 2022.
• AI traffic initially stayed within regional data centers but has expanded to require backbone resources.
• Meta faced unexpected challenges in AI traffic management, including data replication, placement, and freshness.
• Data replication for AI happens at a planetary scale, moving exabytes of data across the globe.
• AI workloads require frequent data movement between regions for training, further straining backbone resources.
• Meta has implemented solutions like data caching and better data placement strategies to manage increased traffic.
• Data-freshness requirements and replication frequency are significantly higher for AI workloads than for non-AI workloads.
• Meta’s infrastructure faces challenges from both demand volatility and supply constraints, including hardware and fiber availability.
• The company is optimizing its network across computing, data, and storage to better support AI workloads.
• Meta operates a network with differentiated classes of service, providing different QoS guarantees for different workloads.
• The backbone is a shared resource used by all of Meta’s products, including Facebook, Instagram, and WhatsApp.
• Meta is designing the backbone to handle unpredictable spikes in AI-driven traffic growth.
• Network optimizations include reducing cross-region data fetches, improving caching, and better understanding data flows.
• AI workloads are less fungible across hardware, which complicates placement and resource allocation.
• The company is addressing the balance between infrastructure supply (fiber, hardware) and volatile AI-driven demand.
• Meta is expanding its physical infrastructure, including power and fiber, to keep pace with AI traffic growth.
• Meta’s infrastructure supports both AI and non-AI traffic, but AI introduces unique complexities in backbone management.
• The company has learned that collaboration across computing, storage, and network teams is essential for scaling AI.
• Meta is proactively adjusting its backbone to meet the growing needs of AI, while still supporting its traditional workloads.
• Future growth in AI models, including generative AI and AGI, will affect network requirements in ways that are not yet known.
• Meta’s approach focuses on long-term scalability, keeping ahead of demand through both supply expansion and optimization.
• The global scale of Meta’s backbone allows it to deliver real-time experiences across its platforms, even in remote areas.
• Meta’s backbone and AI innovations ensure billions of users stay connected, regardless of location or device.
A video replay will be posted on the Networking @Scale site.