floritange/Awesome-AIOps-Ops4AI-Papers

Awesome AIOps Ops4AI Papers ⭐️

This repository collects papers and code on AIOps, Ops4AI, LLMs, software engineering, observability, and reliability.

AI4Ops

LLM4AIOps

LLM4RCA
  • 24_holmesgpt [code]
  • 23_k8sgpt [code]
  • 24_OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures? [paper] [code] [data]
  • 24_ASE_LasRCA: The Potential of One-Shot Failure Root Cause Analysis: Collaboration of the Large Language Model and Small Classifier [paper] [code]
  • 24_SIGOPSReview_LLexus: an AI agent system for incident management [paper]
  • 23_arXiv_RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models [paper]
  • 24_EuroSys_RCACOPILOT: Automatic Root Cause Analysis via Large Language Models for Cloud Incidents [paper]
LLM4AD

  • 24_FSE_MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models [paper]
  • 24_arXiv_LLMAD: Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection [paper]

AIOps Challenge

2020    M, T       Telecom
2021    M, T, L    Bank
2022    M, T, L    Market

AIOps

RCA

Survey
  • 24_arXiv_A Comprehensive Survey on Root Cause Analysis in (Micro) Services: Methodologies, Challenges, and Trends [paper]
  • 24_Root Cause Analysis for Distributed Systems [paper]
Multimodal RCA
  • 23_FSE_Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data [paper] [code]
  • 23_ICSE_Eadro: An End-to-End Troubleshooting Framework for Microservices on Multi-source Data [paper] [code]
  • 24_KDD_Microservice root cause analysis with limited observability through intervention recognition in the latent space [paper] [code&data]
  • 24_FSE_Chain-of-event: Interpretable root cause analysis for microservices through automatically learning weighted event causal graph [paper] [code]
  • 24_ASE_ART: A Unified Unsupervised Framework for Incident Management in Microservice Systems [paper] [code&data] ✅ AIOps Challenge 2021, data for GNN
  • 24_ASE_Giving Every Modality a Voice in Microservice Failure Diagnosis via Multimodal Adaptive Optimization [paper]
Metric-Based RCA
  • 24_ASE_RCAEval: Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We? [paper] [code]
Log-Based RCA
Trace-Based RCA

AD

Metric
AutoML
  • 24_ICDE_ADecimo: Model Selection for Time Series Anomaly Detection [paper] [code]
  • 23_ISSRE_AutoKAD: Empowering KPI Anomaly Detection with Label-Free Deployment [paper] [code] [data]
  • 24_VLDB_AutoTSAD: Unsupervised Holistic Anomaly Detection for Time Series Data [paper] [code]
  • 25_ICSE_ADAMAS: Adaptive Domain-Aware Performance Anomaly Detection in Cloud Service Systems [paper] [code]
  • MicroServo: A Scenario-Oriented Benchmark for Assessing AIOps Algorithms in Microservice Management [paper] [code]
uni
  • 24_KDD_Pre-trained KPI Anomaly Detection Model Through Disentangled Transformer [paper] [code]
  • 24_SOSP(Best paper)_FBDetect: Catching Tiny Performance Regressions at Hyperscale through In-Production Monitoring [paper]
Log
  • 24_ASE_End-to-End AutoML for Unsupervised Log Anomaly Detection [paper]

Ops4AI

Reference
