Papers

Surveys & Tutorials & Magazines

  1. [Blog] What is AIOps? Artificial Intelligence for IT Operations Explained, by Seth Paskin. [BMC Software]
  2. [Book'14] I Heart Logs, by Jay Kreps.
  3. [Book'12] Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management, by Anton A. Chuvakin, Kevin J. Schmidt, Christopher Phillips.
  4. [Thesis] Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale Systems, by Weiyi Shang.
  5. [IST'20] A Systematic Literature Review on Automated Log Abstraction Techniques, by Diana El-Masri, Fabio Petrillo, Yann-Gaël Guéhéneuc, Abdelwahab Hamou-Lhadj, Anas Bouziane.
  6. [IEEE Software'16] Operational-Log Analysis for Big Data Systems: Challenges and Solutions, by Andriy V. Miranskyy, Abdelwahab Hamou-Lhadj, Enzo Cialini, Alf Larsson [IBM, Ericsson].

Anomaly Detection

  1. [ICDM'20] Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs, by Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao.
  2. [IJCAI'19] LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs, by Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, Rong Zhou. [Huawei]
  3. [FSE'19] Robust Log-based Anomaly Detection on Unstable Log Data, by Xu Zhang, Yong Xu, Qingwei Lin, Bo Qiao, Hongyu Zhang, Yingnong Dang, Chunyu Xie, Xinsheng Yang, Qian Cheng, Ze Li, Junjie Chen, Xiaoting He, Randolph Yao, Jian-Guang Lou, Murali Chintalapati, Furao Shen, and Dongmei Zhang. [Microsoft]
  4. [ICSE'19] Energy-Based Anomaly Detection: A New Perspective for Predicting Software Failures, by Cristina Monni, Mauro Pezzè.
  5. [DSN'19] Robust Anomaly Detection on Unreliable Data, by Zilong Zhao, Sophie Cerf, Robert Birke, Bogdan Robu, Sara Bouchenak, Sonia Ben Mokhtar, Lydia Y. Chen. [ABB Research]
  6. [BigData'18] Evaluation of Distributed Machine Learning Algorithms for Anomaly Detection from Large-Scale System Logs: A Case Study, by Merve Astekin, Harun Zengin, Hasan Sözer.
  7. [OSDI'18] Capturing and Enhancing In Situ System Observability for Failure Detection, by Peng Huang, Chuanxiong Guo, Jacob R. Lorch, Lidong Zhou, Yingnong Dang. [ByteDance, Microsoft]
  8. [IEEE Access'18] An Integrated Method for Anomaly Detection From Massive System Logs, by Zhaoli Liu, Tao Qin, Xiaohong Guan, Hezhi Jiang, Chenxu Wang.
  9. [NOMS'18] An Unsupervised Framework for Detecting Anomalous Messages from Syslog Log Files, by Risto Vaarandi, Bernhards Blumbergs, Markus Kont.
  10. [CCS'17] DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning, by Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar.
  11. [ISSRE'17] Experience Report: Log Mining using Natural Language Processing and Application to Anomaly Detection, by Christophe Bertero, Matthieu Roy, Carla Sauvanaud and Gilles Tredan.
  12. [ISSRE'16] Experience Report: System Log Analysis for Anomaly Detection, by Shilin He, Jieming Zhu, Pinjia He, Michael R. Lyu.
  13. [ICDM'09] Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis, by Qiang Fu, Jian-Guang Lou, Yi Wang, Jiang Li. [Microsoft]

Failure Prediction

Diagnosis / Debugging / Root Cause Analysis

  1. [CLOUD'19] An Approach to Cloud Execution Failure Diagnosis Based on Exception Logs in OpenStack, by Yue Yuan, Wenchang Shi, Bin Liang, Bo Qin.
  2. [SOSP'19] The Inflection Point Hypothesis: A Principled Debugging Approach for Locating the Root Cause of a Failure, by Yongle Zhang, Kirk Rodrigues, Yu Luo, Michael Stumm, Ding Yuan.
  3. [FSE'19] Latent Error Prediction and Fault Localization for Microservice Applications by Learning from System Trace Logs, by Xiang Zhou, Xin Peng, Tao Xie, Jun Sun, Chao Ji, Dewei Liu, Qilin Xiang, and Chuan He.
  4. [ICSE'19] An Empirical Study On Leveraging Logs For Debugging Production Failures, by An Ran Chen.
  5. [OSDI'18] REPT: Reverse Debugging of Failures in Deployed Software, by Weidong Cui, Xinyang Ge, Baris Kasikci, Ben Niu, Upamanyu Sharma, Ruoyu Wang, Insu Yun. [Microsoft]
  6. [Ebook'17] The Complete Guide to Automated Root Cause Analysis, by Tali Soroker. [OverOps]
  7. [ICSE'13] Assisting Developers of Big Data Analytics Applications When Deploying on Hadoop Clouds, by Weiyi Shang, Zhen Ming Jiang, Hadi Hemmati, Bram Adams, Ahmed E. Hassan and Patrick Martin. [ACM SIGSOFT Distinguished Paper Award]
  8. [ICSM'13] Mining Telecom System Logs to Facilitate Debugging Tasks, by Alf Larsson, Abdelwahab Hamou-Lhadj. [Ericsson]
  9. [ASPLOS'10] SherLog: Error Diagnosis by Connecting Clues from Run-time Logs, by Ding Yuan, Haohui Mai, Weiwei Xiong, Lin Tan, Yuanyuan Zhou and Shankar Pasupathy.

Failure Reproduction

  1. [SOSP'17] Pensieve: Non-Intrusive Failure Reproduction for Distributed Systems using the Event Chaining Approach, by Yongle Zhang, Serguei Makarov, Xiang Ren, David Lion, Ding Yuan.

Performance Issues

  1. [OSDI'16] Non-intrusive Performance Profiling for Entire Software Stacks based on the Flow Reconstruction Principle, by Xu Zhao, Kirk Rodrigues, Yu Luo, Ding Yuan, and Michael Stumm.
  2. [OSDI'14] lprof: A Non-intrusive Request Flow Profiler for Distributed Systems, by Xu Zhao, Yongle Zhang, David Lion, Muhammad FaizanUllah, Yu Luo, Ding Yuan, and Michael Stumm.

Energy Issues

Security Issues

Issue Categorization

  1. [FSE'18] Identifying Impactful Service System Problems via Log Analysis, by Shilin He, Qingwei Lin, Jian-Guang Lou, Hongyu Zhang, Michael R. Lyu, Dongmei Zhang. [Microsoft]
  2. [IWQoS'18] Device-Agnostic Log Anomaly Classification with Partial Labels, by Weibin Meng, Ying Liu, Shenglin Zhang, Dan Pei, Hui Dong, Lei Song, Xulong Luo. [Baidu]
  3. [BigData'17] WEAC: Word Embeddings for Anomaly Classification from Event Logs, by Amit Pande, Vishal Ahuja. [Target Corporation]

Duplicate Issues Identification

  1. [TSE'18] Revisiting the Performance Evaluation of Automated Approaches for the Retrieval of Duplicate Issue Reports, by Mohamed Sami Rakha, Cor-Paul Bezemer, Ahmed E. Hassan.
  2. [DSN'14] Mining Historical Issue Repositories to Heal Large-Scale Online Service Systems, by Rui Ding, Qiang Fu, Jian-Guang Lou, Qingwei Lin, Dongmei Zhang, Tao Xie. [Microsoft]
  3. [ICDM'14] Identifying Recurrent and Unknown Performance Issues, by Meng-Hui Lim, Jian-Guang Lou, Hongyu Zhang, Qiang Fu, Andrew Beng Jin Teoh, Qingwei Lin, Rui Ding, Dongmei Zhang. [Microsoft]

Software Testing

  1. [ICSE'19] Mining Historical Test Logs to Predict Bugs and Localize Faults in the Test Logs, by Anunay Amar, Peter Rigby.
  2. [ASE'18] An Automated Approach to Estimating Code Coverage Measures via Execution Logs, by Boyuan Chen, Jian Song, Peng Xu, Xing Hu, Zhen Ming (Jack) Jiang.

Bug Finding

  1. [OSDI'18] Finding Crash-Consistency Bugs with Bounded Black-Box Crash Testing, by Jayashree Mohan, Ashlie Martinez, Soujanya Ponnapalli, Pandian Raju, Vijay Chidambaram. [VMware]
  2. [FSE'18] CloudRaid: Hunting Concurrency Bugs in the Cloud via Log-Mining, by Jie Lu, Feng Li, Lian Li, Xiaobing Feng.

Workflow Mining

  1. [ASE'19] Statistical Log Differencing, by Lingfeng Bao, Nimrod Busany, David Lo, Shahar Maoz.
  2. [ICSE'18] Inferring Hierarchical Motifs from Execution Traces, by Saba Alimadadi, Ali Mesbah, Karthik Pattabiraman.
  3. [FSE'18] Using Finite-State Models for Log Differencing, by Hen Amar, Lingfeng Bao, Nimrod Busany, David Lo, Shahar Maoz.

Logging Practices

  1. [SANER'20] MobiLogLeak: A Preliminary Study on Data Leakage Caused by Poor Logging Practices, by Rui Zhou, Mohammad Hamdaqa, Haipeng Cai, and Abdelwahab Hamou-Lhadj.
  2. [TSE'19] Which Variables Should I Log?, by Zhongxin Liu, Xin Xia, David Lo, Zhenchang Xing, Ahmed E. Hassan, Shanping Li.
  3. [ICSE'19] DLFinder: Characterizing and Detecting Duplicate Logging Code Smells, by Zhenhao Li, Tse-Hsun Chen, Jinqiu Yang and Weiyi Shang.
  4. [MSR'19] Tracing Back Log Data to its Log Statement: From Research to Practice, by Daan Schipper, Mauricio Aniche, Arie van Deursen.
  5. [ASE'18] Characterizing the Natural Language Descriptions in Software Logging Statements, by Pinjia He, Zhuangbin Chen, Shilin He, Michael R. Lyu.
  6. [SOSP'17] Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold, by Xu Zhao, Kirk Rodrigues, Yu Luo, Michael Stumm, Ding Yuan, Yuanyuan Zhou.
  7. [ICSE'17] Characterizing and Detecting Anti-patterns in the Logging Code, by Boyuan Chen and Zhen Ming (Jack) Jiang.
  8. [Ebook'17] The Complete Guide to Java Logging in Production, by Henn Idan. [OverOps]
  9. [ATC'15] Log2: A Cost-Aware Logging Mechanism for Performance Diagnosis, by Rui Ding, Hucheng Zhou, Jian-Guang Lou, Hongyu Zhang, Qingwei Lin, Qiang Fu, Dongmei Zhang, Tao Xie. [Microsoft]
  10. [OSDI'12] Be Conservative: Enhancing Failure Diagnosis with Proactive Logging, by Ding Yuan, Soyeon Park, Peng Huang, Yang Liu, Michael M. Lee, Xiaoming Tang, Yuanyuan Zhou, and Stefan Savage.
  11. [ICSE'12] Characterising Logging Practices in Open-Source Software, by Ding Yuan, Soyeon Park and Yuanyuan Zhou.
  12. [ICSE'12] Bridging the Divide between Software Developers and Operators using Logs, by Weiyi Shang.
  13. [TOCS'12] [ASPLOS'11] Improving Software Diagnosability via Log Enhancement, by Ding Yuan, Jing Zheng, Soyeon Park, Yuanyuan Zhou, and Stefan Savage.

Tracing Practices

Log Parsing

  1. [TSE'20] Logram: Efficient Log Parsing Using n-Gram Dictionaries, by Hetong Dai, Heng Li, Che-Shao Chen, Weiyi Shang, Tse-Hsun Chen.
  2. [ECML-PKDD'20] Self-Supervised Log Parsing, by Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao.
  3. [ICSE'19] Tools and Benchmarks for Automated Log Parsing, by Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. [Huawei]
  4. [ICPC'18] A Search-based Approach for Accurate Identification of Log Message Formats, by Salma Messaoudi, Annibale Panichella, Domenico Bianculli, Lionel Briand, Raimondas Sasnauskas.
  5. [TDSC'18] Towards Automated Log Parsing for Large-Scale Log Data Analysis, by Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu.
  6. [CIKM'16] LogMine: Fast Pattern Recognition for Log Analytics, by Hossein Hamooni, Biplob Debnath, Jianwu Xu, Hui Zhang, Geoff Jiang, Abdullah Mueen. [NEC]
  7. [ICWS'17] Drain: An Online Log Parsing Approach with Fixed Depth Tree, by Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu.
  8. [ICDM'16] Spell: Streaming Parsing of System Event Logs, by Min Du, Feifei Li.
  9. [DSN'16] An Evaluation Study on Log Parsing and Its Use in Log Mining, by Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu.
  10. [TKDE'12] A Lightweight Algorithm for Message Type Extraction in System Application Logs, by Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios.
  11. [CIKM'11] LogSig: Generating System Events from Raw Textual Logs, by Liang Tang, Tao Li, Chang-Shing Perng.
  12. [KDD'09] Clustering Event Logs Using Iterative Partitioning, by Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios.

Log Compression

  1. [ASE'19] Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression, by Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He, Zibin Zheng, Michael R. Lyu. [Huawei]
  2. [CCGrid'15] Cowic: A Column-Wise Independent Compression for Log Stream Analysis, by Hao Lin, Jingyu Zhou, Bin Yao, Minyi Guo, Jie Li.
  3. [MILCOM'14] Lightweight Packing of Log Files for Improved Compression in Mobile Tactical Networks, by Peter Mell, Richard E. Harang.
  4. [SIGMOD'13] Adaptive Log Compression for Massive Log Data, by Robert Christensen and Feifei Li. [Code]
  5. [DCC'04] High Density Compression of Log Files, by Balázs Rácz, András Lukács.

Empirical Studies

  1. [FSE'19] How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computing Platform, by Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella, and Nematollah Bidokhti. [Futurewei Technologies]
  2. [DSN'19] Characterizing and Understanding HPC Job Failures over The 2K-day Life of IBM BlueGene/Q System, by Sheng Di, Hanqi Guo, Eric R. Pershey, Marc Snir, Franck Cappello.
  3. [DSN'07] What Supercomputers Say: A Study of Five System Logs, by Adam J. Oliner, Jon Stearley.

Industrial Talks