MINI REVIEW article

Front. Comput. Sci.

Sec. Networks and Communications

From Distributed Tracing to Proactive SLO Management: A Mini-Review of Trace-Driven Performance Prediction for Cloud-Native Microservices

Provisionally accepted
Miaopeng Yu1,2, Haonan Liu1,2, Jinran Du1,2, Kequan Lin3, Tao Dai1,2, Yanzhe Fu1,2, Chunyan Yang1,2*
  • 1Electric Power Research Institute, CSG, Guangzhou, China
  • 2Guangdong Provincial Key Laboratory of Power System Network Security, Guangzhou, China
  • 3China Southern Power Grid, Guangzhou, China

The final, formatted version of the article will be published soon.

Cloud-native microservices improve development velocity and elasticity, but they also create complex and dynamic service dependencies. Resource contention, queue buildup, and downstream slowdowns can propagate through call chains, amplifying end-to-end tail latency (e.g., p95/p99) and increasing the risk of Service Level Objective (SLO) violations. While many studies focus on post-hoc anomaly detection and root-cause analysis, industrial operations increasingly demand proactive capabilities, such as predicting performance risks before a request finishes, issuing early warnings from partial trace prefixes, and producing actionable signals for mitigation. This mini-review synthesizes recent progress on trace-driven proactive SLO management. We summarize problem formulations and evaluation protocols for SLO violation and tail-quantile prediction, prefix-based early warning under precision constraints, and actionable intermediate outputs such as bottleneck candidate ranking and what-if estimation. We then survey modeling approaches spanning feature-based baselines, sequence models, graph neural networks, sequence-graph fusion, and multimodal/causal extensions, highlighting practical issues such as class imbalance, sampling-induced missing spans, and topology drift. Finally, we review commonly used public benchmarks and traces, and discuss open challenges toward deployable, trustworthy proactive SLO management.

Keywords: causal inference, distributed tracing, graph neural networks, microservices, multimodal learning, prefix-based early warning, SLO violation prediction, tail latency prediction

Received: 09 Jan 2026; Accepted: 03 Feb 2026.

Copyright: © 2026 Yu, Liu, Du, Lin, Dai, Fu and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Chunyan Yang

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.