\section{Background}
This section presents our problem statement, defines the terminology used throughout the paper, and surveys other solutions that have been proposed or built to
address similar problems.
\subsection{Problem Statement}
% User data is extracted but not owned by the user and data is handled unsafely
% - Consent
% - Compensation
% - Transparency
% This leads us to the Snickerdoodle finds the following problem:
\textit{Individuals are constantly producing data from which actionable business intelligence is derived, but these individuals do not own this value creation.}
\newline
\newline
Large companies constantly monitor their users for data, while these users have no viable mechanism by which to control that data. This observation
and collection of end-user data has been a large factor in shaping the modern economy, which some have called surveillance capitalism. It has led to a
variety of negative consequences concerning who captures the value of the data, as well as security, privacy, surveillance, transparency, compensation, and consent.
The Protocol will help individuals control their own data, understand its use, and derive value from it. This will increase the security
and privacy of data and allow people to effectively monetize the data they generate. Additionally, it will make it simpler for companies interested in data to
run analyses while respecting data privacy legislation, as they can use the Protocol as their data infrastructure.
\subsection{Terminology}
This section defines the terms used throughout the paper and explains why they are important.
\subsubsection{Decentralization}
% - Permissionless
% - Trustless
% - Available
\begin{definition}
\label{definition:Decentralization}
Decentralization: When control over a system is held by a group rather than a single authority.
\end{definition}
In order to prevent a single party from acquiring undue influence in the data economy, including Snickerdoodle Labs, the Protocol will be built on top of a
decentralized blockchain data structure.
An important note is that decentralization by itself is not the ultimate goal. Rather, the goal of the Protocol is to be permissionless, trustless, and available.
Currently, the only known viable way to achieve these properties is to design the Protocol to be inherently decentralized.
\textbf{Permissionless}
Anyone should be able to interact with the protocol without the permission of a trusted third party. Snickerdoodle Labs must not be in a position to decide which individuals
are able to collect and share their data, which businesses are able to request data, or which developers are able to build on top of the protocol.
The rules of the protocol will be determined by a Decentralized Autonomous Organization (DAO) which will provide a decentralized mechanism to manage the Protocol.
See section \ref{section:ImplementationDAO} for more details about the Protocol DAO.
\textbf{Trustless}
The operation of the system should not require trust in a particular, centralized third party in order for the protocol to function. Actors in the system must not
need to rely on Snickerdoodle Labs or any other actors in order to own their own data or acquire insights.
It is worth noting that completely trustless systems do not exist, but there are varying levels of trustlessness. Ideally, one can trust many mathematicians
and engineers that the math and systems built will force the system to behave in the correct way. In the worst case, a strong financial incentive can be relied on for
the system to behave correctly. The Snickerdoodle Protocol will rely on both paradigms of trustlessness and aim to update the protocol to use stronger forms of trustlessness
over time.
\textbf{Availability}
Actors in the system should be able to take feasible actions in the system in a reasonable amount of time.
\subsubsection{Data Safety}
% After writing this paper we don't use this term too much. I'm not entirely opposed to removing this and replacing the use of safety else where
% security
% Privacy
\begin{definition}
\label{definition:DataSafety}
Data Safety: Data is considered safe if it is securely stored and privately viewed.
\end{definition}
When describing data, we often say that it is \textit{safe} in order to encompass all aspects of data security and privacy. Safe data is data that is securely
written, stored, transmitted, and accessed in a privacy-preserving manner. This means that the Snickerdoodle Protocol will need strong identity
management that allows only authorized people to access and know about the data. We implement this identity management via the Snickerdoodle Data Wallet
discussed in section \ref{section:DataWallet}.
\subsubsection{Data Subscribers \& Insights}
\begin{definition}
\label{definition:DataSubscriber}
Data Subscriber: A data subscriber is a data-consuming entity that pays for temporary access to data to gain insights.
\end{definition}
\begin{definition}
\label{definition:Insight}
Insight: An insight is actionable intelligence gained from applying a function or algorithm to an appropriately structured data set.
\end{definition}
% Data consuming entity
In the modern data economy, all organizations need to make informed operational decisions. Data subscribers are interested primarily
in the insights that data provides rather than the raw data itself. In a world where data is owned by individuals, organizations
would not be able to store and own individual data forever; rather, they would pay to be granted temporary access to that data to produce insights.
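As a minimal illustration of Definition \ref{definition:Insight}, an insight can be written in function notation. The symbols $D$, $f$, and $I$ are notation
assumed here for illustration only and are not part of the Protocol specification. Let $D$ be an appropriately structured data set and $f$ the function or
algorithm chosen by a data subscriber; the insight is then
\[
I = f(D).
\]
For example, if $D$ holds the ages $a_1, \dots, a_n$ of $n$ individuals, $f$ could be the mean, giving $I = \frac{1}{n}\sum_{i=1}^{n} a_i$. Under this reading,
the data subscriber is interested in $I$ rather than in $D$ itself.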
% old sections we had definitions for
% \subsubsection{Data Terms}
% \paragraph{Data Warehousing}
% \paragraph{Data Mining}
% \paragraph{Verifiability \& Authenticity}
% \paragraph{Data Freshness}
% \subsubsection{Web3 Terms}
% \paragraph{Interoperability}
% \paragraph{Key Management}
% \paragraph{Signing}
\subsection{Other Solutions}
There are a variety of approaches and technologies that aim to allow users to own their own data. In this section, we discuss some of these approaches.
\subsubsection{Policy Solutions}
%Follow GDPR + CPAA
%Warehousing / Data lake
Policy decisions such as GDPR in the European Union or the CCPA in California attempt to regulate consolidated data warehouses and other types of centralized storage.
These give individuals rights that allow them to control how their data is used. These laws are a positive step towards giving individuals ownership over their data.
However, these laws can be hard for both end users and application developers to interpret. Snickerdoodle Labs aims to address these problems by
building a system that is compliant with these regulations by default, easy for developers to leverage for their applications, and easy for individuals to use to exercise
their rights.
\subsubsection{Data Sharing Techniques}
%Perturbation Techniques
% -DP
%Federated Learning
%Data Outsourcing
% -We are making
There also exist a number of solutions that attempt to tackle the issues surrounding data sharing. For example, perturbation techniques like differential privacy
have shown promise in sharing noisy and/or anonymized data with limited value loss \cite{dwork2008differential}. Techniques such as federated learning and
multi-party computation have been used to train models on distributed data sets \cite{li2020federated}\cite{lindell2005secure}. In addition, data outsourcing
techniques have been employed to separate the management of data from its storage \cite{di2007data}. For the most part, these solutions have yet to find practical
applications and are hard for others to build on.
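To make the perturbation approach concrete, the standard definition of differential privacy from the literature can be sketched as follows. The notation
($M$, $D$, $D'$, $S$, $\varepsilon$, $f$, $\Delta f$) is introduced here for illustration only and is not used elsewhere in this paper. A randomized mechanism $M$
is $\varepsilon$-differentially private if, for all data sets $D$ and $D'$ that differ in one individual's record and for every set of outputs $S$,
\[
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S].
\]
One common way to satisfy this for a numeric query $f$ is the Laplace mechanism, which releases
\[
M(D) = f(D) + Y, \qquad Y \sim \mathrm{Lap}\left(\frac{\Delta f}{\varepsilon}\right), \qquad \Delta f = \max_{D, D'} \left| f(D) - f(D') \right|,
\]
where the maximum is taken over neighboring data sets. A smaller $\varepsilon$ gives stronger privacy but adds more noise, which is the source of the value loss
noted above.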
\subsubsection{Web3 Solutions}
% Data pools / unions
% Data set markets (Ocean)
The distributed nature of web3 technologies provides a natural way to explore data ownership and decentralize the control of data. Ceramic provides a way to
link existing databases in a decentralized manner and manage their identity \cite{CeramicNetwork}. Ocean Protocol creates a data set market by allowing
people to sell access to data sets and bring compute to data \cite{OceanProtocol}. While these projects are inspired and create new ways to interact with data
in a decentralized way, they do not address the problem of allowing individuals to own and control their data.