LoggingDescriptions

This repository maintains a set of <code, log> pairs extracted from popular open-source projects, which are amenable to logging description generation research. More details about the dataset can be found in our paper:

Pinjia He, Zhuangbin Chen, Shilin He, Michael R. Lyu. Characterizing the Natural Language Descriptions in Software Logging Statements, in Proc. of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE), 2018.

The projects are listed as follows, including 10 Java projects and 7 C# projects:

No	Java Projects	C# Projects
01	ActiveMQ	Azure SDK
02	Ambari	CoreRT
03	Brooklyn	CoreFX
04	Camel	Mono
05	CloudStack	MonoDevelop
06	Hadoop	Orleans
07	Hbase	SharpDevelop
08	Hive
09	Synapse
10	Ignite

Each folder of a project includes the following two files:

(project)_code_log_pairs.txt: This file contains all the code-log pairs extracted from the project. The pairs from different files of the project are separated.
file_trace.txt: To facilitate our data processing, different files of a project are renamed in the form of "sameple_ID". This file is used to help readers trace back to the original file.

Data Extraction

In the paper, each <code, log> pair is extracted from a single function and consists of two parts: the code text and the logging description. The code text contains the 10 lines of code statements (or fewer, if fewer are available) preceding the studied logging statement. The logging description contains the descriptive text of that logging statement; non-description parts such as variables are removed.
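As a rough illustration of how a logging description might be separated from the non-description parts, the sketch below keeps only the string literals of a logging statement and lowercases them. The class name DescriptionSketch, the method describe, and the regular expression are hypothetical and are not taken from the paper's tooling; treating every double-quoted literal as descriptive text is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DescriptionSketch {
    // Assumption: descriptive text is whatever appears inside double-quoted
    // string literals; escaped characters inside a literal are allowed.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the lowercased literal text of a logging statement, or "" if it has none. */
    static String describe(String loggingStatement) {
        List<String> parts = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(loggingStatement);
        while (m.find()) {
            String text = m.group(1).trim();
            if (!text.isEmpty()) {
                parts.add(text);
            }
        }
        // Variables and other arguments are dropped because they never match the pattern.
        return String.join(" ", parts).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints "exception 1 happens"
        System.out.println(describe("LOGGER.error(\"Exception 1 happens\", e1);"));
        // prints "true": a statement with only a variable has no description
        System.out.println(describe("LOGGER.error(e2);").isEmpty());
    }
}
```

A statement for which describe returns an empty string would not yield a <code, log> pair under this sketch.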

Processing Details:

All empty lines are skipped.
All English characters are converted to lowercase.
In the code text part, code lines are separated by \tab.
Logging statements that contain no description text are treated as ordinary code statements in this dataset, not as logging descriptions.
The 10 preceding lines of code never extend beyond the scope of the current function (see the simplified example below).
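The first three processing rules can be sketched in Java as follows. This is a minimal illustration, not the original extraction script: the class name CodeTextNormalizer and the method normalize are hypothetical, \tab is assumed to mean the tab character, and stripping leading and trailing whitespace from each line is an assumption based on how the code text appears in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CodeTextNormalizer {
    /** Joins the non-empty lines of a code snippet into one lowercased, tab-separated string. */
    static String normalize(List<String> rawLines) {
        List<String> kept = new ArrayList<>();
        for (String line : rawLines) {
            String trimmed = line.trim();      // assumption: indentation is not preserved
            if (trimmed.isEmpty()) {
                continue;                      // empty lines are skipped
            }
            kept.add(trimmed.toLowerCase(Locale.ROOT));
        }
        return String.join("\t", kept);        // assumption: \tab is the tab character
    }

    public static void main(String[] args) {
        List<String> snippet = Arrays.asList(
                "    try {",
                "",
                "        Operation 1;");
        // Show the separator explicitly so it is visible on a terminal.
        System.out.println(normalize(snippet).replace("\t", " \\tab "));
    }
}
```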

A Simplified Example

For ease of demonstration, the following Java example extracts 6 lines of code instead of 10 for the code text part.

public void catchException() {
	try {
		operation 1;
		operation 2;

	} catch (Exception1 e1) {
		LOGGER.error("Exception 1 happens", e1);

	} catch (Exception2 e2) {
		LOGGER.error(e2);

	} catch (Exception3 e3) {
		LOGGER.error("Exception 3 happens", e3);
	}
}

Two <code, log> pairs can be extracted from this function (\tab marks the boundary between consecutive code lines):

<code, log> pair 1:

Code Text:

public void catchexception() {     try {     operation 1;     operation 2;     } catch (exception1 e1) {

Logging Description:

exception 1 happens

<code, log> pair 2:

Code Text:

operation 2;     } catch (exception1 e1) {     } catch (exception2 e2) {     logger.error(e2);     } catch (exception3 e3) {

Logging Description:

exception 3 happens

Further Explanation:

The logging statement "LOGGER.error(e2);" cannot produce a <code, log> pair because it contains no descriptive text, only a variable. Such a statement is treated as an ordinary code line (see <code, log> pair 2), whereas logging statements with descriptive text never appear in the code text of any pair (see <code, log> pair 1).
In <code, log> pair 1, the code text contains only 5 (<6) code lines because the code text never includes code outside the function.
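The example can be reproduced with a short Java sketch. It is not the authors' extraction tool, and it rests on several assumptions: a logging statement is recognized by the hypothetical heuristic "line starts with LOGGER.", the description is the first string literal of that statement, \tab is the tab character, and the window of preceding lines is taken first and description-bearing logging statements are dropped from it afterwards. The last assumption is chosen because it yields exactly the 5 code lines shown in pair 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PairExtractionSketch {
    // Assumption: a double-quoted string literal is the descriptive text.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Hypothetical heuristic for this example only.
    static boolean isLoggingStatement(String line) {
        return line.startsWith("LOGGER.");
    }

    // First string literal of a logging statement, or "" if there is none.
    static String description(String line) {
        if (!isLoggingStatement(line)) {
            return "";
        }
        Matcher m = STRING_LITERAL.matcher(line);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Returns {codeText, description} pairs for the lines of a single function. */
    static List<String[]> extractPairs(List<String> functionLines, int window) {
        List<String> lines = new ArrayList<>();
        for (String raw : functionLines) {
            String trimmed = raw.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);            // empty lines are skipped
            }
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String desc = description(lines.get(i));
            if (desc.isEmpty()) {
                continue;                      // no descriptive text, so no pair
            }
            List<String> code = new ArrayList<>();
            // The window starts at 0 at the earliest, so it never leaves the function.
            for (int j = Math.max(0, i - window); j < i; j++) {
                if (description(lines.get(j)).isEmpty()) {
                    code.add(lines.get(j).toLowerCase(Locale.ROOT));
                }
            }
            pairs.add(new String[] {String.join("\t", code), desc.toLowerCase(Locale.ROOT)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> function = Arrays.asList(
                "public void catchException() {",
                "    try {",
                "        operation 1;",
                "        operation 2;",
                "",
                "    } catch (Exception1 e1) {",
                "        LOGGER.error(\"Exception 1 happens\", e1);",
                "",
                "    } catch (Exception2 e2) {",
                "        LOGGER.error(e2);",
                "",
                "    } catch (Exception3 e3) {",
                "        LOGGER.error(\"Exception 3 happens\", e3);",
                "    }",
                "}");
        for (String[] pair : extractPairs(function, 6)) {
            System.out.println("code: " + pair[0].replace("\t", " \\tab "));
            System.out.println("log:  " + pair[1]);
        }
    }
}
```

With a window of 6, the sketch produces two pairs whose code text and logging descriptions match pair 1 and pair 2 above; passing 10 as the window corresponds to the setting described for the dataset.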

Cite

If you use this dataset, please cite our paper using the following reference:

@inproceedings{he2018characterizing,
title={Characterizing the natural language descriptions in software logging statements},
author={He, Pinjia and Chen, Zhuangbin and He, Shilin and Lyu, Michael R},
booktitle={Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering},
pages={178--189},
year={2018},
organization={ACM}
}

License

All datasets in this repository are released under the MIT license for free reuse.
