Converge Digest

Google Cloud Details Ironwood TPUs and Axion CPUs for AI Inference 

Google Cloud announced a sweeping expansion of its AI infrastructure portfolio with the launch of Ironwood, its seventh-generation Tensor Processing Unit (TPU), and Axion, a new line of Arm-based CPUs designed for general-purpose and AI-adjacent workloads. Together, these represent the most comprehensive hardware refresh in Google’s compute lineup since the debut of the TPU v5 family, reflecting the company’s long-term strategy to optimize the entire AI stack—from silicon to software—to power the “age of inference.”

Ironwood TPUs will be generally available in the coming weeks, delivering a 10× increase in peak performance over TPU v5p and more than 4× higher performance per chip than TPU v6e (Trillium) for both training and inference workloads. Purpose-built for large-scale model training, reinforcement learning, and real-time inference, Ironwood extends Google’s design philosophy of tightly integrating custom silicon with advanced cooling, optical interconnects, and orchestration software. Each Ironwood superpod can connect 9,216 TPUs using 9.6 Tbps Inter-Chip Interconnect (ICI) bandwidth, supporting a total of 1.77 petabytes of high-bandwidth memory (HBM). The system can dynamically recover from faults using optical circuit switching (OCS) for live workload rerouting.
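
The pod-level figures allow a useful back-of-envelope check: dividing the 1.77 PB memory pool across 9,216 chips implies roughly 192 GB of HBM per TPU. A minimal sketch of that arithmetic (the per-chip value is derived here, not quoted in the announcement):

```python
# Back-of-envelope arithmetic from the announced Ironwood pod figures.
# The per-chip HBM value is derived, not quoted in the announcement.
chips_per_pod = 9_216        # TPUs per Ironwood superpod
total_hbm_pb = 1.77          # pod-wide high-bandwidth memory, in petabytes

hbm_per_chip_gb = total_hbm_pb * 1_000_000 / chips_per_pod  # PB -> GB
print(f"Implied HBM per chip: ~{hbm_per_chip_gb:.0f} GB")   # ~192 GB
```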

Google describes Ironwood as part of its AI Hypercomputer framework—an integrated supercomputing environment uniting compute, networking, storage, and software for maximal performance. The new architecture allows organizations to deploy frontier models at global scale while maintaining near-constant uptime. According to Google, AI Hypercomputer users have achieved a 353% three-year ROI and 28% lower IT costs on average. Software enhancements for Ironwood include tighter integration with Google Kubernetes Engine (GKE), new optimization techniques in MaxText, support for vLLM to simplify TPU-GPU switching, and Inference Gateway, which cuts time-to-first-token latency by up to 96% and reduces serving costs by 30%.
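
To make the vLLM point concrete, the sketch below uses vLLM’s public offline-inference API; the model name is a placeholder rather than anything named in the announcement, and running it on TPU assumes a TPU-enabled vLLM build, since backend selection happens at install and runtime rather than in application code:

```python
# Minimal vLLM sketch: the same application code can target GPU or,
# with a TPU-enabled vLLM build, TPU. The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Explain time-to-first-token latency in one sentence."],
    params,
)
print(outputs[0].outputs[0].text)
```

The abstraction matters because switching between TPU and GPU capacity becomes a deployment decision rather than a code change.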

Early adopters include Anthropic, which plans to access up to 1 million TPUs to accelerate its Claude model family. “Our customers, from Fortune 500 companies to startups, depend on Claude for their most critical work,” said James Bradbury, Head of Compute at Anthropic. “Ironwood’s improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect.” Other launch partners include Lightricks, which is using Ironwood to improve multimodal image and video generation, and Essential AI, which called the platform “incredibly easy to onboard.”

Alongside Ironwood, Google expanded its Axion CPU portfolio, designed for general-purpose compute that complements AI workloads. Built on Arm Neoverse cores, Axion aims to improve cost, performance, and energy efficiency for applications such as microservices, data analytics, databases, and web serving. The new N4A instance (in preview) provides up to 64 vCPUs, 512GB of DDR5 memory, and 50 Gbps networking, while the upcoming C4A metal offers bare-metal servers with up to 96 vCPUs and 100 Gbps networking for specialized environments.
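
For a sense of what provisioning N4A looks like, here is a hypothetical sketch using the Compute Engine Python client (google-cloud-compute); the n4a-standard-64 machine-type name, zone, and Arm64 boot image are assumptions based on the preview naming, not confirmed values:

```python
# Hypothetical sketch: provisioning an Axion N4A VM with the Compute
# Engine Python client. The "n4a-standard-64" machine type, zone, and
# boot image are assumptions based on the preview, not confirmed values.
from google.cloud import compute_v1

def create_axion_vm(project: str, zone: str = "us-central1-a") -> None:
    instance = compute_v1.Instance(
        name="axion-n4a-test",
        machine_type=f"zones/{zone}/machineTypes/n4a-standard-64",
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    # Arm64 image to match Axion's Arm Neoverse cores
                    source_image="projects/debian-cloud/global/images/family/debian-12-arm64",
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
    )
    operation = compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    )
    operation.result()  # block until the create operation completes

create_axion_vm("my-project-id")  # placeholder project ID
```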

Axion has already delivered measurable gains for early users. Vimeo reported a 30% performance improvement in video transcoding workloads compared with x86 VMs. ZoomInfo measured a 60% better price-performance ratio for its data pipelines, and Rise cited a 20% reduction in compute consumption using C4A instances for ad-serving infrastructure. These deployments demonstrate Google’s ability to extend custom silicon innovation beyond AI accelerators into the wider compute ecosystem.

Both Ironwood and Axion reinforce Google’s commitment to vertical integration, aligning hardware, cooling systems, networking fabrics, and open software layers within a single operational domain. Ironwood uses Google’s third-generation liquid cooling, which the company says has delivered 99.999% fleet-wide uptime at gigawatt scale since 2020, while Titanium SSDs and Hyperdisk storage continue to reduce I/O bottlenecks across diverse workloads.
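
For context on the five-nines claim, 99.999% availability leaves only about five minutes of allowable downtime per year:

```python
# What 99.999% ("five nines") uptime permits per year, in minutes.
minutes_per_year = 365 * 24 * 60                     # 525,600
allowed_downtime = minutes_per_year * (1 - 0.99999)
print(f"~{allowed_downtime:.1f} minutes of downtime per year")  # ~5.3
```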


🌐 Analysis:

Google’s Ironwood and Axion announcements highlight a decisive moment in the evolution of cloud infrastructure — the transition from AI training at scale to inference at planetary scale. This pivot reflects growing demand from enterprises and model developers to deploy frontier models efficiently while managing the spiraling costs of compute. Ironwood represents the culmination of Google’s decade-long TPU roadmap, delivering not just performance but system-level reliability through optical switching, liquid cooling, and hardware-software co-design.

Strategically, Google is responding to intensifying competition from NVIDIA’s rack-scale GB200 NVL72 systems, AWS’s Trainium2 and Inferentia2 chips, and Microsoft’s Maia 100 and Cobalt 100 initiatives. By introducing Ironwood and Axion simultaneously, Google demonstrates a dual-pronged approach: AI-specific acceleration via TPUs and energy-efficient general-purpose compute via Arm-based Axion CPUs. This combination gives Google flexibility to handle both the training and deployment of LLMs while providing cost-optimized compute for data preparation, microservices, and inference orchestration.

Ironwood’s 9.6 Tbps interconnect and shared 1.77 PB memory pool represent one of the densest AI fabrics in commercial deployment, enabling massive parallelization across 9,000+ TPUs per pod. Its integration into Google’s Jupiter data center network, which links multiple superpods into clusters, offers a hyperscale platform comparable to NVIDIA’s NVLink and NVSwitch fabric topology. By embedding optical circuit switching, Google mitigates network fragility at the cluster level, a major reliability advantage for continuous inference workloads supporting tools like Gemini, Claude, Veo, and Imagen.

The Axion line, meanwhile, deepens Google’s investment in Arm-based compute, aligning with broader hyperscaler trends toward in-house CPUs. Like AWS’s Graviton and Microsoft’s Cobalt, Axion aims to reduce reliance on x86 vendors while optimizing for energy and cost efficiency. Its deployment across GKE and Dataflow workloads suggests Google intends to migrate much of its own internal and customer-facing infrastructure to Arm architectures over time, with Axion forming a baseline compute layer beneath Ironwood’s high-intensity accelerators.

In the broader AI ecosystem, this co-design strategy strengthens Google’s vertical control over both software and silicon, echoing the early TPU era that enabled breakthroughs like the Transformer model. Ironwood’s debut further extends that lineage—serving as the hardware foundation for Google’s next-generation Gemini models and potentially for third-party deployments of open frontier models. Together with Axion, the hardware roadmap positions Google Cloud as a full-stack infrastructure provider for AI-native enterprises, aiming to balance scale, efficiency, and cost predictability across heterogeneous compute demands.

🌐 We’re tracking the latest developments in semiconductors and AI infrastructure. Follow our ongoing coverage at: https://convergedigest.com/category/semiconductors/
