Google introduces 8th-gen TPU: Two AI chips target training and inference, challenging Nvidia's pain points
動區 BlockTempo · 2026-04-23 01:16:08


Original title: Google 推第八代 TPU:兩款 AI 晶片分攻訓練與推論,挑戰 Nvidia 痛點
At Cloud Next 2026, Google announced its eighth-generation TPU, splitting training and inference workloads across two dedicated chips for the first time: TPU 8t and TPU 8i. The company claims an 80% improvement in inference performance per dollar.

(Previous coverage: Anthropic announces partnership with Broadcom, Google expands TPU chip adoption, annual revenue jumps to $30 billion)

(Background: Even Nvidia's favorite isn't safe! Core Scientific's "largest shareholder" rejects CoreWeave acquisition: $9 billion valuation too low)

When Google's first-generation TPU (Tensor Processing Unit, a custom chip designed for AI computing) was released in 2016, the market predicted it would be the "Nvidia killer." Instead, Nvidia's market cap has grown dozens of times over the past decade, and most of those predictions never materialized. This time, at Cloud Next 2026, Google launched the eighth-generation TPU and made a decision it had never made before: to separate training and inference, with a dedicated chip for each.

"Training" and "inference" are two completely different stages of AI computing:

- Training is the process of a model learning from massive amounts of data; it demands extremely high computational density.
- Inference is the process of the trained model responding to each user query; it demands low latency and low cost.

In the past, Google used the same TPU for both needs; starting with the eighth generation, the two are officially separated.

TPU 8t is the training-dedicated chip: 12.6 petaFLOPS of FP4 (4-bit floating-point) compute (one petaFLOPS is one quadrillion floating-point operations per second; higher means faster), 216 GB of high-bandwidth memory, and 6.5 TB/s of memory bandwidth. Google claims the chip trains 3 times faster than the previous generation and allows over 1 million TPUs to collaborate simultaneously in a single cluster.
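The spec figures above can be put in perspective with a simple roofline-style calculation. The sketch below (plain Python, using only the TPU 8t numbers quoted above and the Nvidia Vera Rubin numbers quoted elsewhere in the article) computes each chip's arithmetic-intensity break-even point: how many FP4 operations it must perform per byte moved from HBM before compute, rather than memory bandwidth, becomes the bottleneck. This is an illustration of the standard roofline model, not an official benchmark.

```python
# Roofline-style comparison using the spec figures quoted in the article.
# "FLOPS" here means FP4 operations per second; bandwidth is HBM bandwidth.

chips = {
    # name: (peak FP4 petaFLOPS, HBM bandwidth in TB/s)
    "TPU 8t (training)": (12.6, 6.5),
    "Nvidia Vera Rubin": (35.0, 22.0),
}

for name, (pflops, bw_tbps) in chips.items():
    peak_ops = pflops * 1e15      # operations per second
    bandwidth = bw_tbps * 1e12    # bytes per second
    # Arithmetic intensity needed to be compute-bound rather than
    # memory-bound: operations performed per byte moved from HBM.
    intensity = peak_ops / bandwidth
    print(f"{name}: ~{intensity:.0f} FP4 ops per HBM byte to saturate compute")
```

On these quoted figures the two chips land in a similar range (roughly 1,900 vs 1,600 ops per byte), which is one reason raw petaFLOPS alone is a poor basis for comparison.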
TPU 8i is the inference-dedicated chip: 10.1 petaFLOPS of FP4 compute, 288 GB of high-bandwidth memory, and a larger 384 MB of on-chip memory (used to reduce data-movement latency). Google claims an 80% improvement in inference performance per dollar over the previous-generation Ironwood TPU, with particular strength under low-latency targets. Both chips are expected to become generally available in 2026.

Google's decision to split the chips directly targets one of Nvidia's weaknesses: generality. Nvidia's GPUs are a single product line serving both training and inference. The upcoming Nvidia Vera Rubin chip is specified at 35 petaFLOPS of FP4 compute, 288 GB of HBM4 memory, and 22 TB/s of memory bandwidth; on raw compute it still leads the TPU 8t's 12.6 petaFLOPS. But comparing petaFLOPS alone obscures another dimension: cost structure.

Competition in the inference market is essentially about the cost per model response. Google set the TPU 8i's design goal as lowering the unit cost of inference, which is the number large model companies like Anthropic and OpenAI care about most. Notably, Anthropic has announced it will expand its Claude training and serving infrastructure to "multi-gigawatt" TPU capacity, becoming the largest publicly disclosed TPU customer, and OpenAI has also begun using Google's TPU capacity.

Even so, Google has not written off Nvidia. It simultaneously announced that its cloud will offer Nvidia Vera Rubin chips by the end of 2026. Furthermore, the two companies are collaborating to enhance the "Falcon" network protocol, a data-center networking technology Google open-sourced in 2023, to make Nvidia systems run more efficiently on Google Cloud.

The challenge for Google's TPU has never been only whether it can beat Nvidia on specifications. The real problem is the ecosystem.
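The "cost per model response" competition described above can be made concrete with a back-of-the-envelope calculation. In the sketch below, the hourly price and token throughput are invented placeholders (Google has not published TPU 8i pricing); only the 80% performance-per-dollar figure comes from the article.

```python
# Back-of-the-envelope inference cost model. All inputs except the 80%
# performance-per-dollar improvement are hypothetical placeholders.

def cost_per_million_tokens(price_per_hour, tokens_per_second):
    """Dollars to generate one million output tokens at a steady rate."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Hypothetical baseline: a previous-generation (Ironwood) inference chip
# rented at $10/hour, sustaining 5,000 output tokens per second.
baseline = cost_per_million_tokens(price_per_hour=10.0, tokens_per_second=5000)

# An 80% improvement in performance per dollar means 1.8x the tokens for
# the same spend, i.e. the per-token cost divides by 1.8.
improved = baseline / 1.8

print(f"baseline cost:      ${baseline:.3f} per 1M tokens")
print(f"with +80% perf/$:   ${improved:.3f} per 1M tokens")
```

The absolute numbers are meaningless; the point is that an 80% gain in performance per dollar cuts the unit cost of every model response by nearly half, which compounds enormously at the query volumes companies like Anthropic and OpenAI serve.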
Nvidia has CUDA, the AI computing software framework developers have relied on for nearly twenty years (which can be understood as the programming environment used by most AI engineers). From training scripts and optimization tools to the reproduction environments of academic papers, almost everything is built on CUDA. Google's TPU has its own compiler stack, but every workload an engineer must port is a point of friction.

Amazon and Microsoft face the same structural difficulty: both are developing their own chips, both are trying to reduce their reliance on Nvidia, and both keep buying Nvidia chips. The logic of the hyperscale cloud providers is not to eliminate Nvidia, but to keep more of the profit for themselves on the workloads their own chips can effectively cover.