
Wafer-scale accelerators could redefine AI

Credit: Device (2025). DOI: 10.1016/j.device.2025.100834

A technology review paper published by UC Riverside engineers in the journal Device explores the promise of a new type of computer chip that could reshape the future of artificial intelligence while being more environmentally friendly.

Known as wafer-scale accelerators, these massive chips, such as those made by Cerebras, are built on dinner-plate-sized silicon wafers, in stark contrast to traditional graphics processing units (GPUs), which are no bigger than a postage stamp.

The paper by a cross-disciplinary UCR team concludes that wafer-scale processors can deliver far more computing power with much greater energy efficiency—traits that are needed as AI models grow ever larger and more demanding.

“Wafer-scale technology represents a major leap forward,” said Mihri Ozkan, a professor of electrical and computer engineering in UCR’s Bourns College of Engineering and the paper’s lead author. “It enables AI models with trillions of parameters to run faster and more efficiently than traditional systems.”

In addition to Ozkan, co-authors include UCR graduate students Lily Pompa, Md Shaihan Bin Iqbal, Yiu Chan, Daniel Morales, Zixun Chen, Handing Wang, Lusha Gao, and Sandra Hernandez Gonzalez.

GPUs became essential tools for AI development because they can perform many computations at once—ideal for processing images, language, and data streams in parallel. Executing thousands of operations simultaneously lets driverless cars interpret the world around them to avoid collisions, image generators turn text into pictures, and ChatGPT suggest dozens of meal recipes from a given list of ingredients.

But as AI model complexity increases, even high-end GPUs are starting to hit performance and energy limits.

“AI computing isn’t just about speed anymore,” Ozkan said. “It’s about designing systems that can move massive amounts of data without overheating or consuming excessive electricity.”

The UCR analysis compares today’s standard GPU chips with wafer-scale systems like the Cerebras Wafer-Scale Engine 3 (WSE-3), which contains 4 trillion transistors and 900,000 AI-specific cores on a single wafer. Tesla’s Dojo D1, another example, includes 1.25 trillion transistors and nearly 9,000 cores per module. These systems are engineered to eliminate the performance bottlenecks that occur when data must travel between multiple smaller chips.

“By keeping everything on one wafer, you avoid the delays and power losses from chip-to-chip communication,” Ozkan said.

The paper also highlights technologies such as chip-on-wafer-on-substrate packaging, which could make wafer-scale designs more compact and easier to scale, with a potential 40-fold increase in computational density.

While these systems offer substantial advantages, they’re not suited for every application. Wafer-scale processors are costly to manufacture and less flexible for smaller-scale tasks. Conventional GPUs, with their modularity and affordability, remain essential in many settings.

“Single-chip GPUs won’t disappear,” Ozkan said. “But wafer-scale accelerators are becoming indispensable for training the most advanced AI models.”

The paper also addresses a growing concern in AI: sustainability. GPU-powered data centers use enormous amounts of electricity and water to stay cool. Wafer-scale processors, by reducing internal data traffic, consume far less energy per task.

For example, the Cerebras WSE-3 can perform up to 125 quadrillion operations per second while using a fraction of the power required by comparable GPU systems. Its architecture keeps data local, lowering energy draw and thermal output.

Meanwhile, NVIDIA’s H100 GPU—the backbone of many modern data centers—offers flexibility and high throughput, but at greater energy cost. With an efficiency of about 7.9 trillion operations per second per watt, it also requires extensive cooling infrastructure, often involving large volumes of water.
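Using only the figures quoted above—both peak, rounded numbers, so this is strictly a back-of-envelope sketch—one can estimate how much power GPU-class hardware would need to match wafer-scale throughput:

```python
# Back-of-envelope comparison using the figures quoted in this article.
# Both values are rounded peak numbers, so the result is illustrative only.

WSE3_PEAK_OPS = 125e15       # Cerebras WSE-3: ~125 quadrillion operations/s (peak)
H100_OPS_PER_WATT = 7.9e12   # NVIDIA H100: ~7.9 trillion operations/s per watt

# Power that H100-class hardware would need to sustain WSE-3-level throughput,
# assuming (unrealistically) that its peak efficiency holds at full scale.
implied_watts = WSE3_PEAK_OPS / H100_OPS_PER_WATT
print(f"{implied_watts:,.0f} W")  # roughly 15,800 W
```

For context, the article later notes that a wafer-scale chip's thermal design power is around 10,000 watts, so even this crude estimate points in the same direction as the efficiency claims, though real-world workloads rarely reach peak rates on either platform.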

“Think of GPUs as busy highways—effective, but traffic jams waste energy,” Ozkan said. “Wafer-scale engines are more like monorails: direct, efficient, and less polluting.”

Cerebras reports that inference workloads on its WSE-3 system use one-sixth the power of equivalent GPU-based cloud setups. The technology is already being used in climate simulations, sustainable engineering, and carbon-capture modeling.

“We’re seeing wafer-scale systems accelerate sustainability research itself,” Ozkan said. “That’s a win for computing and a win for the planet.”

However, heat remains a challenge. With thermal design power reaching 10,000 watts, wafer-scale chips require advanced cooling. Cerebras employs a glycol-based loop built into the chip package, while Tesla uses a coolant system that distributes liquid evenly across the chip surface.

The authors also emphasize that up to 86% of a system’s total carbon footprint can come from manufacturing and supply chains, not just energy use. They advocate for recyclable materials and lower-emission alloys, along with full lifecycle design practices.

“Efficiency starts at the factory,” Ozkan said. “To truly lower computing’s impact, we need to rethink the whole process—from wafer to waste. This review is the result of a deep interdisciplinary collaboration. We hope it serves as a roadmap for researchers, engineers, and policymakers navigating the future of AI hardware.”

More information:
Mihrimah Ozkan et al, Performance, efficiency, and cost analysis of wafer-scale AI accelerators vs. single-chip GPUs, Device (2025). DOI: 10.1016/j.device.2025.100834

Provided by
University of California – Riverside


Citation:
Wafer-scale accelerators could redefine AI (2025, June 17)


