Breaking the memory wall: next-generation artificial intelligence hardware

Explainer

Front Sci, 16 December 2025

This explainer is part of an article hub related to the lead article: https://doi.org/10.3389/fsci.2025.1611658

Over the memory wall: how new chips could unlock faster, greener AI

Most of the energy used by artificial intelligence (AI) goes into moving data rather than computing. This creates a “memory wall” that slows systems and wastes power as models scale.

In their Frontiers in Science article, Roy et al. outline hardware strategies that bring computation closer to where data live, draw on brain-inspired signaling, and even use controlled randomness to cut energy use. They demonstrate these ideas using autonomous drone navigation as an example.

This explainer summarizes the article’s main points.

What is a “chip” and what is the “memory wall” in AI hardware?

A chip is a tiny piece of hardware packed with circuits. In AI systems, chips often specialize in either computing (for example, CPUs) or memory (for example, DRAM).

The memory wall refers to the loss of speed and efficiency that arises because modern computers keep compute and memory mostly separate. Data must shuttle between these separate chips, and this transfer costs time and energy. As models grow, those costs come to dominate, so performance is limited by moving data, not by the speed of the math. The authors say this “memory wall” drives most of AI’s energy use and wait time.
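
To make the imbalance concrete, here is a minimal back-of-envelope sketch in Python. The energy constants are hypothetical placeholders chosen only to illustrate the effect; they are not figures from the article, and real values vary with the chip technology.

```python
# Toy model of the "memory wall": compare the energy spent moving data
# off-chip with the energy spent on the arithmetic itself. The constants
# below are illustrative placeholders (real values depend on the chip
# technology), not figures from the article.

PJ_PER_MAC = 1.0              # hypothetical: one multiply-accumulate, in picojoules
PJ_PER_BYTE_OFF_CHIP = 100.0  # hypothetical: fetching one byte from off-chip memory

def layer_energy(n_macs, bytes_moved):
    """Split a layer's energy budget into compute vs. data movement."""
    compute = n_macs * PJ_PER_MAC
    movement = bytes_moved * PJ_PER_BYTE_OFF_CHIP
    return compute, movement

# A fully connected layer with a million weights, fetched once per input:
compute_pj, movement_pj = layer_energy(n_macs=1_000_000, bytes_moved=1_000_000)
print(f"compute:  {compute_pj / 1e6:.0f} microjoules")
print(f"movement: {movement_pj / 1e6:.0f} microjoules")  # dominates in this toy example
```

Even with these made-up numbers, the pattern matches the authors’ point: once weights must be fetched from separate memory for every input, the data movement, not the arithmetic, sets the energy bill.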

Most AI today uses artificial neural networks (ANNs). These models run through the same sequence of processing steps every time, even when little has changed, which forces a lot of data to move between compute and memory.

The authors argue that overcoming this wall is critical for faster, more energy-efficient AI.

The article presents two promising routes over the wall:

  • compute-in-memory (CIM) brings computation to where the data live—within the memory itself—to cut data movement

  • stochastic (approximate) hardware uses controlled randomness or low precision where models can tolerate noise.

How does compute-in-memory help?

CIM does the math where the data sit, folding simple tasks into, or right beside, the memory. Because far fewer values must travel between separate compute and memory chips, you cut out the main sources of cost—data movement and wait time—so power consumption drops and processing speeds up.
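
As a rough software analogy (not the authors’ hardware), a compute-in-memory tile can be pictured as a unit whose weights never leave it; only small input and output vectors cross its boundary. The class name and sizes below are illustrative assumptions.

```python
import numpy as np

class CIMTile:
    """Software analogy of a compute-in-memory tile: the weight matrix lives
    inside the tile and never leaves it; only inputs and outputs cross its
    boundary. (A sketch of the concept, not of any specific device.)"""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)  # stays resident "in memory"

    def matvec(self, x):
        # The multiply-accumulate happens where the weights are stored,
        # e.g. along the columns of an analog crossbar in real hardware.
        return self.weights @ x

# Usage: a 4x3 weight tile applied to a 3-element input vector.
tile = CIMTile(np.arange(12).reshape(4, 3))
print(tile.matvec(np.array([1.0, 0.5, -1.0])))
```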

CIM is also scalable: as models grow, CIM can grow with them by adding more memory tiles, each of which computes locally, keeping energy per task lower and responses faster than shuttling everything to off-chip memory.

Why borrow ideas from the brain—and how do spiking neural networks cut energy?

The brain is frugal: neurons stay quiet and send a brief spike only when something changes. Spiking neural networks (SNNs) copy this rule. Unlike ANNs’ always-on, frame-based computing, they fire only when something changes, so the chip can skip idle work.

Each unit accumulates input and fires a short spike when a threshold is crossed; otherwise, nothing happens. Chips can then work in response to events, doing computation only on spikes, so they avoid unnecessary work and move much less data.
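
The accumulate-and-fire rule fits in a few lines. Below is a minimal sketch of a common leaky integrate-and-fire update; the leak and threshold values are arbitrary illustrations, not the article’s specific neuron model.

```python
def lif_step(v, input_current, leak=0.9, threshold=1.0):
    """One time step of a leaky integrate-and-fire neuron (a common SNN
    building block; a sketch, not the article's specific model).

    v: membrane potential carried over from the previous step
    input_current: weighted sum of incoming spikes this step
    Returns the updated potential and 1 if the neuron fired, else 0."""
    v = leak * v + input_current      # accumulate input, with a slow leak
    if v >= threshold:                # threshold crossed -> emit a spike
        return 0.0, 1                 # reset the potential after firing
    return v, 0                       # otherwise stay quiet: no spike, no downstream work

# Usage: a mostly quiet input stream produces almost no spikes (and so little compute).
v, spikes = 0.0, []
for current in [0.0, 0.0, 0.6, 0.6, 0.0, 0.0]:
    v, s = lif_step(v, current)
    spikes.append(s)
print(spikes)   # [0, 0, 0, 1, 0, 0]
```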

Putting simple SNN updates into CIM keeps each connection’s strength and the neuron’s running charge side by side, so the chip doesn’t keep fetching them from elsewhere—cutting data trips and saving power.
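
Combining the two previous sketches, a hypothetical “spiking CIM tile” would keep the weight matrix and the neurons’ membrane potentials in the same place, so a time step reads and writes no off-tile state. This is a conceptual illustration, not a description of the authors’ circuits.

```python
import numpy as np

class SpikingCIMTile:
    """Sketch of folding the SNN update into a compute-in-memory tile: the
    synaptic weights and each neuron's running membrane potential live in
    the same tile, so a time step touches no off-tile state. (Conceptual
    illustration only, not a model of the authors' hardware.)"""

    def __init__(self, weights, leak=0.9, threshold=1.0):
        self.weights = np.asarray(weights, dtype=float)  # stored in the tile
        self.v = np.zeros(self.weights.shape[0])         # stored right beside them
        self.leak, self.threshold = leak, threshold

    def step(self, input_spikes):
        # Weighted sum of this step's input spikes (silent inputs contribute nothing).
        self.v = self.leak * self.v + self.weights @ input_spikes
        fired = self.v >= self.threshold
        self.v[fired] = 0.0                   # reset the neurons that fired
        return fired.astype(int)              # only spikes leave the tile

# Usage: two sparse input events; in between, nothing crosses the tile boundary.
tile = SpikingCIMTile(weights=[[0.6, 0.0], [0.3, 0.8]])
print(tile.step(np.array([1, 0])))   # [0 0]: charge builds but no neuron fires
print(tile.step(np.array([1, 1])))   # [1 1]: both neurons cross threshold and spike
```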

Why design the model and the chip together?

So far, we’ve cut the cost of memory trips with CIM and cut how often those trips are needed with brain-inspired SNNs. Co-design makes those gains add up.

Instead of building a chip and squeezing a model onto it later, the authors argue for shaping them together: choose which operations live in memory, arrange data flows to avoid shuttling, and tune models to exploit sparsity and event timing.

The result is one platform that runs both styles efficiently, with shared building blocks and a memory hierarchy that fits both frame-based and event-based data.

When can randomness make hardware cheaper—and what is “stochastic hardware”?

Many AI models cope well with noise (small, random errors). Stochastic hardware leans into that: it uses circuits or devices that are naturally approximate or random, or it runs at lower precision, to save energy.

With co-design, the model is trained and scheduled to expect this noise, so you trade a tiny accuracy hit (or none) for lower power and simpler circuits. Done this way, randomness can be a feature that cuts energy without breaking the task.
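
A toy illustration of the noise-aware idea follows; the function name and the noise level are hypothetical stand-ins for an approximate or low-precision device, not the article’s hardware.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_matvec(weights, x, noise_std=0.05):
    """Sketch of a forward pass on 'stochastic hardware': each weight read is
    perturbed by small random noise, standing in for an approximate or
    low-precision device. (Illustrative only; noise_std is a hypothetical knob.)"""
    noisy_w = weights + rng.normal(0.0, noise_std, size=weights.shape)
    return noisy_w @ x

# Noise-aware co-design in miniature: if the same kind of perturbation is
# applied during training, the model learns weights that tolerate it, so the
# cheap, noisy hardware costs little or no accuracy at inference time.
w = np.array([[0.5, -0.2], [0.1, 0.9]])
x = np.array([1.0, 2.0])
print(noisy_matvec(w, x))   # slightly different on every call, by design
```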

How would this help AI in the real world—for example, on a rescue drone?

A search-and-rescue drone must “see,” plan, and act in real time without relying on the cloud. That calls for onboard AI that is accurate, fast, and frugal in size, weight, and power.

The authors propose tiny hybrid networks—frame-based tasks on conventional networks plus event streams on SNNs—running on a converged, CIM-rich, event-driven platform. This mix cuts data movement, aligns compute with sparse spikes, and keeps decisions on the vehicle.

In practice, this could look like the following (a toy code sketch appears after this list):

  • sensing: cameras and event sensors feed local memory arrays that also compute (CIM), reducing off-chip traffic

  • perception: convolutions and attention on CIM tiles, plus SNN updates done in-memory so each connection’s strength and the neuron’s running charge sit side by side

  • planning and control: low-power, event-driven cores that exploit sparsity for mapping, obstacle avoidance, and flight control.
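
As a toy illustration of how such a hybrid pipeline might be wired together in software, the sketch below runs a frame-based branch and an event-based branch side by side and lets a small planner combine them. Every function name, value, and threshold here is a hypothetical stand-in, not the authors’ system.

```python
import numpy as np

def perceive_frame(frame):
    """ANN-style branch: dense processing of a camera frame (e.g. on CIM tiles)."""
    return frame.mean()                      # stand-in for "how cluttered is the scene"

def perceive_events(events, v, threshold=3.0):
    """SNN-style branch: integrate sparse event counts, spike on a sudden change."""
    v = 0.9 * v + events.sum()
    return (v >= threshold), (0.0 if v >= threshold else v)

def plan(clutter, obstacle_spike):
    """Planner/control: react immediately to spikes, otherwise follow the map."""
    if obstacle_spike:
        return "evade"
    return "slow" if clutter > 0.5 else "cruise"

# One control tick: a mostly empty frame, then a burst of events from a fast obstacle.
v = 0.0
frame = np.full((4, 4), 0.2)
for events in [np.array([0, 1]), np.array([4, 2])]:
    spike, v = perceive_events(events, v)
    print(plan(perceive_frame(frame), spike))   # "cruise", then "evade"
```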

What are the main challenges and next steps?

By computing in memory, exploiting spikes, and using controlled randomness—designed together from device to algorithm—the authors lay out a roadmap to faster, more efficient AI hardware for everything from data centers to rescue drones.

Turning that roadmap into practice means placing CIM wisely: deciding which layers and operations belong in memory, at what precision, and at which level of the memory hierarchy so speed and efficiency rise without hurting accuracy. The field also needs fair, shared benchmarks that test ANNs, SNNs, and hybrids on the same tasks with accuracy and energy reported side by side.

Finally, true convergence will require tight model–hardware co-design so one platform runs ANNs and SNNs efficiently, and clear guidance on where stochastic hardware can safely cut power without breaking the task.