\chapter{Data provenance}
\chapterauthor{Sean P. Goggins, Ph.D.
%& ...
}
\chapter*{Challenges and Solutions for Data Provenance Management for Understanding Open Source Ecosystems}
\section{Introduction}
\paragraph{Humans are not great at remembering and recording the work they do every day.} A machinist is unlikely to recall and recount each step involved in setting up a machine to build a new, custom part. Similarly, scientists who examine electronic trace data (logs from human-computer interaction) do not always track specifically how they fill in missing data, standardize categories, or clean, reshape, and reduce the data they use \cite{Goggins et al 2016}.
\paragraph{The Mining Software Repositories, Open Collaboration Data Exchange, and GenBank communities offer exemplars that open source software ecosystem researchers may be able to emulate when tracking data provenance.} Data provenance is a record of how data was collected, processed, altered, and reshaped to provide the foundation for the analysis and findings reported in academic papers or in professional and industrial project dashboards. \#TODO describe prior work looking at data provenance across fields
\paragraph{Reporting of the provenance of data used for research and practice examining open source software is inconsistent.} \#\#TODO Describe examples and make the gap clear
\paragraph{In this chapter we enumerate the challenges of managing and documenting data provenance in open source software research, along with solutions and two exemplars.}
\section{State of the Art}
\section{Challenges in Research Practice}
\section{Methods for Addressing Challenges}
\section{The Open Collaboration Data Exchange Manifest Structure}
\subsection{The SPDX Example in the Linux Foundation}
\subsection{OCDX Current State}
\section{Data Provenance Technical Pipeline Examples From Social Computing Research}
\section{Data Provenance Technical Pipeline Examples From The Mining Software Repositories Community}
\section{Next Steps}
\section{Conclusion}