Converge Digest
Saturday, April 11, 2026


AI Infrastructure Summit: Google’s Foundations of AI

September 10, 2025

Google outlined its vision for scaling AI compute at the AI Infrastructure Summit in Santa Clara, where Mark Lohmeyer, VP & GM of Compute and AI Infrastructure, delivered the keynote “What’s Next for the Foundations of AI.” Lohmeyer compared the speed of today’s AI breakthroughs to the early internet, pointing to surging demand for compute and power efficiency as the defining challenges of this era.

He revealed that AI token processing across Google products hit 980 trillion tokens per month in June 2025, doubling in just two months. At this scale, power availability—not chips or datacenter space—has become the primary constraint. Google is addressing this by driving efficiency across the stack, claiming a 33x reduction in energy per prompt for Gemini over the past year, with each prompt consuming just 0.25 watt-hours, equivalent to nine seconds of video playback.
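These headline numbers can be sanity-checked with simple arithmetic. The world-population and novel-length figures below are assumptions for illustration, not from the keynote:

```python
# Back-of-envelope checks on the figures quoted above. World population and
# novel length are assumed values, not from Google's presentation.

MONTHLY_TOKENS = 980e12          # tokens processed per month (June 2025)
WORLD_POPULATION = 8.1e9         # assumed
tokens_per_person = MONTHLY_TOKENS / WORLD_POPULATION
# A ~90,000-word novel at ~1.3 tokens per word is roughly 117,000 tokens,
# so ~121,000 tokens per person per month is indeed "a novel a month".
print(f"{tokens_per_person:,.0f} tokens per person per month")

PROMPT_WH = 0.25                 # energy per Gemini prompt
PLAYBACK_SECONDS = 9
implied_watts = PROMPT_WH * 3600 / PLAYBACK_SECONDS
# 0.25 Wh spread over 9 seconds implies a ~100 W playback device,
# consistent with a mid-size television.
print(f"implied playback power: {implied_watts:.0f} W")
```

Both claims hold together: the "novel a month" comparison and the video-playback equivalence are internally consistent with the quoted totals.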

Google also spotlighted its TPU Ironwood platform, which scales to 9,000 chips per pod with 42.5 exaflops of peak compute and 7.3 petabytes of shared high-bandwidth memory, as well as its partnership with NVIDIA on Blackwell GPUs. New services such as Inference Gateway, Dynamic Workload Scheduler, and AI-optimized storage aim to cut costs, reduce latency, and simplify deployment of large-scale inference workloads.

• AI token traffic surged to 980 trillion per month by June 2025 (2x growth in 2 months)

• Equivalent to every person on Earth reading a novel monthly

• Power availability is the new limiting factor for AI infrastructure buildouts

• Gemini prompt uses ~0.25 Wh — equal to 9 seconds of video streaming

• Google cut energy per Gemini prompt 33x in one year

• Efficiency gains driven by speculative decoding, disaggregated serving, and mixture-of-experts architectures

• TPU Ironwood delivers 5x compute and 6x memory vs. prior gen

• 9,000 Ironwood chips scale into a superpod with 42.5 exaflops peak compute

• Pods linked by optical fabric with dynamic reconfiguration for resilience

• 7.3 PB of HBM accessible across 9,000 chips, addressing bottlenecks

• Fifth-generation liquid cooling deployed across Ironwood systems

• Google TPU platform now in its 7th generation, >10 years of iteration

• Native PyTorch support coming to TPUs alongside JAX/TF

• Partnership with NVIDIA: Blackwell-generation GPUs integrated in Google Cloud

• Three major NVIDIA-backed services launched this year for inference/training

• Inference Gateway GA: AI-aware routing balances workloads across servers

• Features include prefix-aware routing and disaggregated serving

• Inference Optimizer delivers best-practice configs and continuous tuning

• Dynamic Workload Scheduler: new consumption model with flex-start and calendar reservations

• Custom classes: workload profiles that auto-shift between TPUs and GPUs across pricing tiers

• AI-optimized storage caches weights near accelerators, reducing load times by 96%

• Eliminates need for customer-built caching solutions (used by Palantir, Toyota)

• Long-context workloads supported via high-performance managed storage

• Cloud WAN interconnect delivers up to 40% lower latency globally

• Example: Toyota reduced AI model creation time by 20% using Google infrastructure

• Small Toyota team built full AI platform in half the expected time

• Google’s approach integrates compute, storage, networking, frameworks, and deployment into a full AI stack
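Of the efficiency techniques listed above, speculative decoding is worth a closer look. The idea: a cheap draft model proposes several tokens, and the expensive target model verifies the whole batch in one pass, so the costly model runs far fewer times than the number of tokens generated. The toy models below are deterministic stand-ins for illustration, not anything Google described:

```python
# Toy sketch of speculative decoding. The "draft" and "target" models are
# simple deterministic functions standing in for a small and a large LLM.

def draft_next(context):        # cheap model: guesses the next character
    return "abc"[len(context) % 3]

def target_next(context):       # expensive model: the ground truth we serve
    return "abcabcx"[len(context) % 7]

def speculative_decode(context, steps, k=4):
    """Generate `steps` tokens, invoking the target model once per batch of
    up to k drafted tokens instead of once per token."""
    target_calls = 0
    while steps > 0:
        # 1. Draft up to k tokens cheaply.
        drafts, ctx = [], context
        for _ in range(min(k, steps)):
            t = draft_next(ctx)
            drafts.append(t)
            ctx += t
        # 2. Verify the whole draft with the target model (one "call").
        target_calls += 1
        for t in drafts:
            expected = target_next(context)
            if t == expected:
                context += t          # draft matches: accept for free
                steps -= 1
            else:
                context += expected   # mismatch: keep the target's token,
                steps -= 1            # discard the rest of the draft
                break
    return context, target_calls

out, calls = speculative_decode("", steps=12)
print(out, calls)   # 12 tokens generated with only 7 target-model calls
```

The output is always identical to what the target model alone would produce; the saving comes entirely from batching verification, which is why the technique reduces energy per token without changing model quality.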

“Power has become one of the most precious commodities we have, and the only way forward is relentless efficiency across the entire stack,” Lohmeyer said.

🌐 Analysis: This keynote reinforced Google’s strategy to pair NVIDIA GPUs with its custom TPU roadmap while attacking the power efficiency challenge head-on. The Ironwood superpod demonstrates Google’s ability to scale custom silicon rivaling AWS Trainium/Inferentia and Microsoft Maia/Cobalt. The focus on inference infrastructure—storage, scheduling, and latency optimization—shows that serving models efficiently is emerging as the next competitive battleground.
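The "prefix-aware routing" highlighted above can be illustrated with a few lines of code: requests sharing a prompt prefix are steered to the same backend so that its KV cache for the shared prefix can be reused. The hashing scheme and prefix length here are illustrative assumptions, not Inference Gateway's actual algorithm:

```python
# Minimal sketch of prefix-aware routing (illustrative, not Google's
# implementation): hash the leading tokens of a prompt and pin requests
# with the same prefix to the same backend for KV-cache reuse.
import hashlib

BACKENDS = ["server-0", "server-1", "server-2"]
PREFIX_TOKENS = 32   # assumed: route on the first N whitespace-split tokens

def route(prompt: str) -> str:
    prefix = " ".join(prompt.split()[:PREFIX_TOKENS])
    digest = hashlib.sha256(prefix.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

# Two requests sharing a long system prompt land on the same backend,
# so the prefix's attention cache is computed once and reused.
sys_prompt = "You are a helpful assistant. " * 8
assert route(sys_prompt + "What is 2+2?") == route(sys_prompt + "Summarize this.")
```

A production gateway would route on token IDs and handle backend failures, but the core trade-off is the same: sacrificing uniform load balancing for cache locality.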

🌐 We’re tracking the latest developments in AI infrastructure. Follow our ongoing coverage at: https://convergedigest.com/category/ai-infrastructure/


Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley


© 2025 Converge Digest - A private dossier for networking and telecoms.
