ORIGINAL RESEARCH article

Front. High Perform. Comput.

Sec. Architecture and Systems

This article is part of the Research Topic: Emerging Trends in Software Tools for Exascale Application Development

Processor Simulation as a Tool for Performance Engineering

Provisionally accepted
  • 1 Forschungszentrum Jülich GmbH, Jülich, Germany
  • 2 Kungliga Tekniska Högskolan, Stockholm, Sweden
  • 3Rijksuniversiteit Groningen, Groningen, Netherlands

The final, formatted version of the article will be published soon.

The diversity of processor architectures used for High-Performance Computing (HPC) applications has increased significantly over the last few years. This trend is expected to continue for several reasons, including the emergence of various instruction set extensions. Examples include the renewed interest in vector instructions such as Arm's Scalable Vector Extension (SVE) or RISC-V's RVV. For application developers, research software developers, and performance engineers, the increased diversity and complexity of architectures pose two challenges: limited access to the different processor architectures, and more difficult root-cause analysis when performance issues arise. To address these challenges, we propose leveraging the much-improved capabilities of processor simulators such as gem5. We enhanced gem5 with a performance analysis framework: we extend the available performance counters and introduce new analysis capabilities to track the temporal behaviour of running applications. An algorithm has been implemented to link these statistics to specific code regions. The resulting performance profiles allow code regions with potential for optimization to be identified. The focus is on observables that monitor quantities usually not directly accessible on real hardware. Different algorithms have been implemented to identify potential performance bottlenecks. The framework is evaluated for different types of HPC applications, including the molecular-dynamics application GROMACS; Ligra, which implements the breadth-first search (BFS) algorithm; and a kernel from the Lattice QCD solver DD-αAMG.
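
The idea of linking time-resolved statistics to code regions can be illustrated with a minimal sketch. This is not the algorithm implemented in the article, whose details are not given in the abstract. It assumes that counter deltas are recorded per sampling window (in simulator ticks) and that activity is spread uniformly within a window. The function name `attribute_to_regions`, the tuple layouts, and the counter names `l1d_misses` and `cycles` are all hypothetical.

```python
from collections import defaultdict


def attribute_to_regions(samples, regions):
    """Split per-window counter deltas across code regions by time overlap.

    samples: list of (t_start, t_end, {counter_name: delta})   # hypothetical layout
    regions: list of (region_name, t_start, t_end)             # hypothetical layout
    Assumption: activity is uniform inside each sampling window.
    """
    profile = defaultdict(lambda: defaultdict(float))
    for s_start, s_end, deltas in samples:
        width = s_end - s_start
        if width <= 0:
            continue  # skip empty or malformed windows
        for name, r_start, r_end in regions:
            overlap = min(s_end, r_end) - max(s_start, r_start)
            if overlap <= 0:
                continue  # region not active during this window
            share = overlap / width
            for counter, value in deltas.items():
                profile[name][counter] += value * share
    return {name: dict(counters) for name, counters in profile.items()}


# Two sampling windows of 100 ticks each, and two regions (made-up numbers).
samples = [
    (0, 100, {"l1d_misses": 40, "cycles": 100}),
    (100, 200, {"l1d_misses": 10, "cycles": 100}),
]
regions = [("init", 0, 50), ("solver", 50, 200)]

profile = attribute_to_regions(samples, regions)

# Rank regions by misses per cycle to flag candidates for closer inspection.
ranked = sorted(
    profile,
    key=lambda r: profile[r]["l1d_misses"] / profile[r]["cycles"],
    reverse=True,
)
```

With these made-up inputs, `init` receives half of the first window and `solver` receives the other half plus the whole second window. A per-region ratio such as misses per cycle is one simple way a profile of this kind could point to regions worth optimizing.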

Keywords: High-performance computing (HPC), Processor Architectures, Instruction set extensions, Vector Instructions, Arm's Scalable Vector Extension (SVE), RISC-V's RVV, performance counters, Performance profiles

Received: 18 Jul 2025; Accepted: 27 Oct 2025.

Copyright: © 2025 Falquez, Long, Ho, Suarez, Pleiter and Pleiter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dirk Pleiter, pleiter@kth.se

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.