\section{Implementation}
\label{section:Implementation}
%------------------------------------------------------------------------------------------
\subsection{Architecture Abstract}
\label{section:Architecture}
This section will discuss the implementation and design decisions of the Protocol. The Protocol has three primary
components: a data wallet implementation, an on-chain data control plane, and aggregation service providers.
The data wallet is a software client that enables end-users to collect, index, and store their data as well as participate
in the decentralized data network, see section \ref{section:DataWallet}. Organizations (consumers of data insights) will be able
to query populations of data wallets through an on-chain control plane that adheres to a publish-subscribe (pub-sub) pattern, see section \ref{section:OnChain}.
At the core of the control plane is an upgradable contract factory which produces independent instances of an EIP-721 compatible consent registry.
Consent tokens claimed from these consent contracts are non-transferable but can be burned by the recipient. Claiming a consent token denotes a data
wallet user's consent to participate in network queries in return for rewards (which may or may not be web3 digital assets).
Data wallets receive queries via Ethereum Virtual Machine (EVM) events emitted from consent registry contracts they have claimed tokens in. These events contain metadata encoding
instructions (written in Synamint Query Language) to run computations on the data stored in the data wallet to produce insights. Once a data wallet has produced
the requested insight, it performs a digital ``handshake'' with the aggregation service provider specified in the query metadata such that the
data wallet owner receives a reward while the requesting organization receives the anonymized insight, see figure \ref{fig:OnChainOffChain}.
User consent and data flow are thus orchestrated in a distributed manner by the Protocol. It is also worth highlighting that while
Snickerdoodle Labs will develop service infrastructure for the Protocol (such as producing a data wallet client and an associated SaaS product offering
for enterprise participation in the Protocol), the protocol itself is permissionless and open, so that anyone may implement a
data wallet client or act as an aggregation service provider so long as the specifications of the protocol are adhered to.
%----------------------------------------------------------------------------------------------------------------------------------------
\subsection{On-Chain Components}
\label{section:OnChain}
\subsubsection{Consent Contract Factory}
\label{section:ConsentFactory}
\input{ConsentContractPubSubTikz}
The on-chain components of the Protocol function as a decentralized, permissionless, and trustless data control plane. It specifically implements a publish-subscribe pattern
in which organizations publish new instances of an EIP-721 compatible consent registry (see \ref{section:ConsentContract}), and end-users
subscribe to the registries by claiming a non-transferable consent token. The publishing action is performed via the Protocol's upgradable
consent contract factory, see figure \ref{fig:ConsentFactory}. The factory contract will be the entrypoint to the Protocol for new insight
consumers, since a consent registry is required to communicate with the network of data wallets.
\input{ContractUpgradePatternTikz}
The consent contract factory exists as a utility for insight consumers to create new consent registries. The factory is implemented with an
upgradable beacon pattern (see figure \ref{fig:UpgradePattern}) to enable gas-efficient deployments and to allow for seamless extensions of functionality
via DAO proposals (see sections \ref{section:ImplementationDAO} and \ref{section:Governance}).
\paragraph{Factory Pattern}
The factory pattern defines a smart contract which is responsible for creating other contracts. The Protocol uses the factory
pattern to simplify the deployment of new consent registries and to give new insight consumers a single point of entry into the data network.
\paragraph{Upgradable Beacon Pattern}
\label{section:BeaconPattern}
Consent registries are deployed as proxy contract instances that reference an upgradable beacon contract to obtain the correct address to which to delegate
function calls, see figure \ref{fig:UpgradePattern}. This upgrade pattern complements the factory pattern by allowing for very gas-efficient deployments
of new proxy instances. Proxy contracts only store storage variables and a pointer to their designated upgradable beacon contract. The upgradable
beacon contract points to an implementation contract. The implementation contract contains all function implementations as well as the storage
variable declarations that proxy contracts copy.
A Protocol upgrade to the consent registry functionality requires that a new implementation contract first be deployed to the blockchain. Then a
DAO proposal must be initiated to point the upgradeable beacon to the new implementation address. All previously deployed and newly created proxy
consent contracts inherit the functionality (and any new storage variables) defined in the new contract.
The upgradeability pattern is based on EIP-1967 and is implemented through OpenZeppelin libraries.
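As an illustrative sketch of how this indirection can be inspected off-chain (not part of the Protocol specification), the EIP-1967 beacon slot of any deployed proxy can be read with a library such as ethers.js; the proxy address and RPC endpoint below are placeholders:
\begin{verbatim}
import { ethers } from "ethers";

// EIP-1967 beacon slot: keccak256("eip1967.proxy.beacon") - 1
const BEACON_SLOT = ethers.BigNumber.from(
  ethers.utils.id("eip1967.proxy.beacon")).sub(1);

async function resolveImplementation(proxy: string, rpcUrl: string) {
  const provider = new ethers.providers.JsonRpcProvider(rpcUrl);
  // Read the beacon address out of the proxy's reserved storage slot
  const raw = await provider.getStorageAt(proxy, BEACON_SLOT);
  const beacon = ethers.utils.getAddress(ethers.utils.hexDataSlice(raw, 12));
  // The beacon exposes the current implementation address
  const abi = ["function implementation() view returns (address)"];
  return new ethers.Contract(beacon, abi, provider).implementation();
}
\end{verbatim}
Because all proxies produced by the factory resolve through the same beacon, a single beacon update via DAO proposal upgrades every consent registry at once.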
\subsubsection{Consent Registries}
\label{section:ConsentContract}
\input{OnChainOffChainInteractionTikz}
Consent registry contracts are the primary on-chain mechanism by which insight consumers interact with data wallet end-users. Consent
registries allow organizations to create data pools by serving as an on-chain data structure that holds metadata regarding the conditions
under which data is to be collected and used as well as a cryptographically verifiable list of externally owned accounts (EOAs) that
have given consent to participate in the data pool.
Consent registries expose an EIP-721 compatible interface. This is for developer integration convenience as it allows consent registries to
be readable by most existing NFT (non-fungible token) indexing services (such as Snowtrace). Consent is denoted by ownership of a non-transferable consent NFT
which can be burned by the NFT owner at any time.
\paragraph{Request-for-Data Events}
\label{section:RequestForData}
After an organization has published a consent registry via the Consent Contract Factory (see \ref{section:ConsentFactory}), the organization can emit
EVM events by calling a special function, $requestForData$, which takes a content identifier (CID) as its only input. This CID is used to
retrieve the request specifications from a suitable content addressable network (like IPFS) that the data wallets will process (see section \ref{section:AquisitionControlFlow}). Data wallets can detect
past $requestForData$ events by constructing EVM query filters and requesting all EVM logs that match those filters. Thus consent registries offer a
tamper-resistant communication layer between organizations and participants in their data pools; see figure \ref{fig:OnChainOffChain}.
Consent registries also specify a $queryHorizon$ which informs data wallets of the oldest block number to search for these events. This variable is
first initialized to the block number of the proxy contract deployment from the contract factory and can be updated by an EOA with the $DEFAULT\_ADMIN\_ROLE$
to a later block number (but cannot be changed to an earlier block number). Setting a reasonable value for $queryHorizon$ is important since many RPC providers
only allow query filters to search a limited history of the blockchain.
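A minimal sketch of this detection step using ethers.js is given below; the event signature shown is a hypothetical stand-in, as the deployed consent registry implementation defines the authoritative ABI:
\begin{verbatim}
import { ethers } from "ethers";

// Hypothetical ABI fragment; the deployed consent registry
// implementation defines the authoritative event signature.
const abi = [
  "event RequestForData(address indexed requester, string ipfsCID)",
  "function queryHorizon() view returns (uint256)",
];

async function fetchPastRequests(registry: string, rpcUrl: string) {
  const provider = new ethers.providers.JsonRpcProvider(rpcUrl);
  const consent = new ethers.Contract(registry, abi, provider);
  // Never search further back than the registry's queryHorizon
  const horizon = (await consent.queryHorizon()).toNumber();
  const logs = await consent.queryFilter(
    consent.filters.RequestForData(), horizon, "latest");
  return logs.map((log) => log.args!.ipfsCID); // CIDs of query definitions
}
\end{verbatim}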
\paragraph{User Data Permissioning}
\label{section:UserPermissions}
A $requestForData$ event can indicate that it requires access to multiple attributes indexed by the user's data wallet client, such as country of origin, age,
or on-chain contracts the user has interacted with via their linked EOA asset accounts. However, users can set granular permissions regarding their indexed data
attributes.
Every consent token issued from a consent registry has an associated set of binary $agreementFlags$. There are 256 flags in total (the size of an EVM word)
though not all flags will be assigned to specific data attributes at Protocol launch, leaving room for customization. Only the owner of a consent token can
update the granular permissions denoted by the token's $agreementFlags$. The availability of granular consent data on-chain allows organizations to better
understand what kinds of insights they will be able to obtain from their data pool before calling $requestForData$.
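Since the flags fit in a single EVM word, a client can manipulate them as a 256-bit integer. A minimal sketch in TypeScript follows, with hypothetical bit assignments (the Protocol reserves the actual mapping of flags to attributes):
\begin{verbatim}
// Hypothetical bit positions; the Protocol assigns the real mapping.
const AGE_FLAG = 0n;
const LOCATION_FLAG = 1n;

const setFlag = (flags: bigint, bit: bigint) => flags | (1n << bit);
const clearFlag = (flags: bigint, bit: bigint) => flags & ~(1n << bit);
const hasFlag = (flags: bigint, bit: bigint) => (flags & (1n << bit)) !== 0n;

let agreementFlags = 0n;                            // all access withheld
agreementFlags = setFlag(agreementFlags, AGE_FLAG); // grant age access
console.log(hasFlag(agreementFlags, LOCATION_FLAG)); // false
\end{verbatim}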
\paragraph{Consent Invitations}
\label{section:ConsentInvitations}
Consent registry metadata is used for the decentralized, permissionless, and trustless triggering of user flows that should be presented to the data wallet
end user in a format appropriate for the data wallet client environment. In a web browser setting, the data wallet detects the current active URL and, via DNS
over HTTPS (DoH), queries the TXT records associated with the apex domain. If a TXT record contains a reference to a Protocol consent
contract address, the data wallet then fetches the URLs registered in the consent registry $domains$ metadata storage variable and cross-references the domains
listed in the contract against the current URL. If the data wallet detects that the current URL is included in the domains listed in the
consent contract, the data wallet will inject a user flow into the browser DOM. The content of the user flow is fetched from the URL specified by the consent
registry $baseURI$ parameter.
\input{Web3PopupTikz}
The Web3 popup protocol can be extended to other environments, including mobile browsing environments, VR experiences, gaming consoles, etc., as indicated by
figure \ref{fig:PopupProtocol}. The user flows presented by these tamper-resistant popups serve as the on-boarding mechanism for end-users to opt into a data pool.
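A sketch of the DoH lookup step is given below, assuming Cloudflare's public JSON API for DNS over HTTPS; the TXT record key used to reference a consent contract is hypothetical:
\begin{verbatim}
// Query TXT records for an apex domain via DNS over HTTPS (DoH)
async function findConsentContract(apex: string): Promise<string | null> {
  const res = await fetch(
    `https://cloudflare-dns.com/dns-query?name=${apex}&type=TXT`,
    { headers: { accept: "application/dns-json" } });
  const body = await res.json();
  for (const answer of body.Answer ?? []) {
    const txt = (answer.data as string).replace(/"/g, "");
    // Hypothetical record format: snickerdoodle-consent=0x...
    if (txt.startsWith("snickerdoodle-consent=")) {
      return txt.split("=")[1]; // consent contract address
    }
  }
  return null; // no Protocol reference published for this domain
}
\end{verbatim}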
\paragraph{Opt-In methods and Meta-Transactions}
\label{section:OptInMethods}
The consent registries have two modes by which users can join a data pool: open-access and invite-only. In consent registries where open access is
enabled ($openOptInDisabled$ is $false$), any EOA is allowed to claim a consent token by calling the $optIn$ method and paying the associated gas
fees.
Registries where $openOptInDisabled$ is $true$ are invitation-only and require a signature from an EOA with the $SIGNER\_ROLE$ in order to join
the data pool associated with a consent registry. There are two available methods for invitation-only user opt-in: $restrictedOptIn$ and $anonymousRestrictedOptIn$.
The former method requires that the recipient EOA be known in advance by the $SIGNER\_ROLE$ in order to construct the appropriate signature. The receiving EOA then
becomes the only account that can call $restrictedOptIn$ with that signature. If the recipient EOA address is not known ahead of time, the $SIGNER\_ROLE$
can construct a signature for use with the $anonymousRestrictedOptIn$ method, in which case any EOA can submit the signature to that method in order to opt in. Once an opt-in signature is used, it cannot be used a second time.
Support for both open and invitation-only opt-in flows makes the Protocol flexible to a variety of use-cases. However, end users are often hesitant to
spend their own cryptocurrency in order to pay for transaction fees associated with a decentralized application, or simply do not have the tokens
necessary to do so. Additionally, requiring the user to spend their own assets to participate in the Protocol introduces significant user friction and
adoption hurdles. Consent registries implement EIP-2771 compatible metatransaction capabilities to circumvent this issue.
Metatransactions enable the delegation of a user's transaction gas fees to another party in a manner that ensures that the user's transaction cannot be maliciously altered. All opt-in methods implement support for EIP-2771 compatible metatransactions. This offers the
flexibility to have users pay for their own transaction gas fees or have the insight consumer pay.
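As an illustrative sketch, a data wallet could produce such a metatransaction by signing an EIP-712 typed ForwardRequest that a gas-paying relayer then submits; the forwarder here is assumed to be an OpenZeppelin-style MinimalForwarder, and all addresses are placeholders:
\begin{verbatim}
import { ethers } from "ethers";

const types = {
  ForwardRequest: [
    { name: "from", type: "address" },
    { name: "to", type: "address" },
    { name: "value", type: "uint256" },
    { name: "gas", type: "uint256" },
    { name: "nonce", type: "uint256" },
    { name: "data", type: "bytes" },
  ],
};

// Sign an optIn(...) call for later submission by a gas-paying relayer
async function signOptIn(consentKey: ethers.Wallet, forwarder: string,
                         registry: string, calldata: string,
                         nonce: number, chainId: number) {
  const domain = { name: "MinimalForwarder", version: "0.0.1",
                   chainId, verifyingContract: forwarder };
  const request = { from: consentKey.address, to: registry, value: 0,
                    gas: 1000000, nonce, data: calldata };
  const signature = await consentKey._signTypedData(domain, types, request);
  return { request, signature }; // relayer verifies and executes on-chain
}
\end{verbatim}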
\subsubsection{Identity Crumbs}
\label{section:Crumbs}
The Protocol introduces a special EIP-721 compatible registry, called the Crumbs Contract, to facilitate data wallet synchronization when an end user
installs the client on a new device. When a user links a new EOA to their data wallet identity, the linked EOA is used to encrypt the data wallet identity EOA (see section
\ref{section:UserIdentityGeneration}), and the encrypted data is stored in the token URI of an entry in the Crumbs contract.
During a new data wallet client installation, the end user must simply link any EOA that has previously been linked to their data wallet identity in order for the data
wallet to synchronize from their previously saved state that has been stored in the decentralized persistence layer of the data wallet network. The data wallet
checks if the account being linked owns a token in the Crumbs contract; if so, it reads the encrypted content of the token URI, decrypts the information, and loads
the public-private key pair into memory.
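A sketch of the crumb lookup during a fresh installation follows, assuming the Crumbs contract follows the EIP-721 interface; the convention that a crumb's token id equals the numeric value of the linked account's address is hypothetical:
\begin{verbatim}
import { ethers } from "ethers";

const crumbsAbi = [
  "function balanceOf(address owner) view returns (uint256)",
  "function tokenURI(uint256 tokenId) view returns (string)",
];

// Returns the encrypted identity material for a linked account, if any
async function fetchCrumb(account: string, crumbsAddress: string,
                          provider: ethers.providers.Provider) {
  const crumbs = new ethers.Contract(crumbsAddress, crumbsAbi, provider);
  if ((await crumbs.balanceOf(account)).isZero()) return null;
  // Hypothetical convention: token id derived from the account address
  const tokenId = ethers.BigNumber.from(account);
  return crumbs.tokenURI(tokenId); // decrypt client-side, then load keys
}
\end{verbatim}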
\subsubsection{EIP-20 Token}
\label{section:Token}
The Protocol includes a fungible utility token adhering to the EIP-20 standard. This token will be used for paying various fees required to leverage
the Protocol (like publishing a new consent registry) and will also be used for voting in the Protocol DAO (see Tokenomics in section \ref{section:tokenomics}). The voting power that
accompanies possession of a token can be delegated to a different address without relinquishing ownership of the token.
\subsubsection{Decentralized Autonomous Organization}
\label{section:ImplementationDAO}
\input{ProposalLifecycleTikz}
The Protocol will include a decentralized autonomous organization (DAO) implementation as part of its on-chain components. The DAO will be responsible for proposing and executing upgrades to
the Protocol. The particular pattern used by the Protocol DAO is based on Curve Finance's DAO and will be implemented with OpenZeppelin libraries.
Token holders (see section \ref{section:Token}) are responsible for the creation and execution of DAO proposals. At mainnet launch it is anticipated that one
token will equate to one vote, though this too could be modified via a DAO proposal. Token holders will have to reach a pre-specified quorum of voting power in
order to successfully create a proposal in the DAO task queue.
Proposals will be subject to a delay of at least one block before voting begins, as well as before a queued proposal can be executed,
in order to prevent flash loan attacks. Voters who initiate a proposal and subsequently relinquish their voting power, by either selling their tokens or having their delegated voting power revoked, may have their proposal canceled if their remaining voting power is below the quorum threshold. The lifecycle of a DAO proposal is outlined in figure \ref{fig:ProposalLifecycle}.
%----------------------------------------------------------------------------------------------------------------------------------------
\subsection{Off-Chain Components}
\label{section:OffChain}
\subsubsection{Data Wallet}
\label{section:DataWallet}
\input{DataWalletStructureTikz}
The data wallet is the primary client interface for end users to interact with the Protocol. It enables data ownership by facilitating user control
over, and consent to, the collection, storage, and usage of their data. The data wallet should provide the following functionality (as indicated
by figure \ref{fig:DataWalletStructure}):
\begin{itemize}
\item Ingestion and indexing of user data from the data wallet client deployment environment
\item Identity and verifiable credential generation/management
\item Query/Reward discovery and management
\item A query engine for individualized data mining, insight processing and delivery
\item A consent management interface and granular access control
\item Secure storage of the data
\end{itemize}
To the user, the data wallet operates in a conceptually similar way to a conventional cryptocurrency wallet, but with a wider scope.
Instead of managing the keys of a blockchain account, the primary purpose of a data wallet is to manage the storage, collection, and sharing of insights derived
from user data. Data wallet functionality is form-factor agnostic (see section \ref{section:FormFactor}). However, browser extension and
mobile applications are anticipated to be the primary channels for use.
\paragraph{Insight Acquisition and Control Flow}
\label{section:AquisitionControlFlow}
\input{InsightControlFLowTikz}
The acquisition of insights from the data network begins at the consent contract factory (see section \ref{section:ConsentFactory}) where an organization
must first publish a consent registry. Data wallets belonging to end users who have claimed a consent token via an invitation flow (section \ref{section:ConsentInvitations})
detect $requestForData$ events from the associated consent registry. The event will reveal a CID which resolves to a query definition file (section \ref{section:RequestForData}).
The query execution layer of the data wallet will parse the query definition, construct the associated abstract syntax tree (AST), and apply the logic to the
data wallet persistence layer in a manner consistent with the conditions given by the user's on-chain permission settings (section \ref{section:UserPermissions}). This
control flow is outlined in figure \ref{fig:InsiteControlFlow}.
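A high-level sketch of this control flow is given below, with the query execution step stubbed out as a placeholder (the IPFS gateway URL is illustrative; a production client would use its own content-addressable network access):
\begin{verbatim}
type Syql = { returns: { url: string } };

// Placeholder execution engine; the real client builds an AST and
// applies the user's on-chain permission settings (hypothetical stub).
function executeQuery(doc: Syql): object {
  return { insight: "stub" };
}

// End-to-end control flow for a single requestForData event (sketch)
async function handleRequestForData(ipfsCID: string): Promise<void> {
  // 1. Resolve the CID to a SyQL query definition file
  const res = await fetch(`https://ipfs.io/ipfs/${ipfsCID}`);
  const doc: Syql = await res.json();
  // 2. Evaluate the query against the local persistence layer
  const insight = executeQuery(doc);
  // 3. Deliver the insight to the aggregator named in the query
  await fetch(doc.returns.url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(insight),
  });
}
\end{verbatim}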
\paragraph{User Identity Generation via Key Ratchets}
\label{section:UserIdentityGeneration}
A data wallet should allow users to index transaction history and asset ownership from multiple EOAs or smart wallets. However, if they were to use these addresses directly to participate in
the Protocol for consent token ownership, it would readily allow for chain analysis of user behavior and compromise user data privacy. Therefore, a data wallet
implementation should provide key ratchet utilities to allow for the deterministic generation of new EOAs (that cannot be linked back to the generating EOAs) which have the dedicated purpose of holding consent tokens.
A ratchet is a simple machine that only allows unidirectional state increments and prevents backward traversal of the state path. Likewise, a cryptographic
key ratchet is an algorithm that allows for the deterministic generation of new public-private key pairs, using a prior key pair as inputs, in a manner that precludes the feasibility of determining
what public-private key pairs were used in previous iterations. Cryptographic ratchet algorithms are widely used today in consumer-facing private messaging applications.
\begin{algorithm}
\caption{Key Ratchet Proto-algorithm}
\label{alg:KeyRatchet}
\begin{algorithmic}
\Require EOA with message signing utility, Seed Message
\Ensure New EOA with no prior transaction history
\State Seed Message Signature $\gets$ EOA.signMessage(Seed Message)
\State new EOA $\gets$ pbkdf2sync(Seed Message Signature, EOA public address, 100000, 32, sha256)
\State \textbf{return} new EOA
\end{algorithmic}
\end{algorithm}
Procedure \ref{alg:KeyRatchet} outlines a simple technique, leveraging the Password-Based Key Derivation Function 2 (pbkdf2sync), to generate a new public-private key pair from the message signing utility exposed by most consumer crypto-wallets. This
algorithm, used in conjunction with the Crumbs contract described in section \ref{section:Crumbs}, can be used to enhance user data privacy by preventing the cross-referencing of linked EOAs on-chain while at the same time offering an improved user
experience.
Specifically, by using the key ratchet algorithm to derive a dedicated in-memory key pair for consent, the data wallet form factor can sign metatransactions (see section \ref{section:OptInMethods}) without multiple prompts from various wallet applications. Additionally, using derived EOAs for consent provides an additional layer of security for end users, keeping their valuable asset-holding EOAs separate from those
used for participating in various data pools.
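A direct transcription of Procedure \ref{alg:KeyRatchet} into TypeScript is sketched below, using Node's built-in PBKDF2 and ethers.js; the seed message text is left as a parameter, as any fixed client-specific string works:
\begin{verbatim}
import { pbkdf2Sync } from "crypto";
import { ethers } from "ethers";

// Derive a fresh consent EOA from an existing wallet's signing utility
async function deriveConsentKey(eoa: ethers.Wallet,
    seedMessage: string): Promise<ethers.Wallet> {
  // ECDSA signatures from ethers are deterministic (RFC 6979), so the
  // same EOA and seed message always yield the same derived key
  const signature = await eoa.signMessage(seedMessage);
  // pbkdf2sync(signature, public address, 100000 iterations, 32 bytes)
  const privateKey = pbkdf2Sync(signature, eoa.address, 100000, 32, "sha256");
  return new ethers.Wallet(privateKey);
}
\end{verbatim}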
\paragraph{Storage}
\label{section:storage}
Secure storage of data is crucial to allowing end users to own their data. The Protocol does not specify a schema for local storage of user data; that is left to the party implementing a data wallet client. However, the storage layer of a data wallet client should allow for secure, tamper-resistant, and platform independent synchronization of user data across multiple client installations.
The initial data wallet implementation produced by Snickerdoodle Labs exposes a modular storage interface capable of integrating with various object storage provider technologies,
such as Google Storage, Amazon S3, or decentralized options like the Ceramic Network.
It is also important to call out the wallet's storage of public-private key pairs. A data wallet should only store public keys
and digital signatures associated with accounts linked to a user's data wallet, not private keys (other than the in-memory keys generated via key ratchet iterations used for holding consent tokens, see section \ref{section:UserIdentityGeneration}).
Instead, data wallet client implementations should delegate the management of asset-bearing keys to dedicated wallet applications like MetaMask, Coinbase Wallet, etc.
\paragraph{User Data Ingestion and Indexing}
\label{section:DataIngestion}
A data wallet client should implement automatic data collection utilities appropriate for the deployment environment of the client. For example, a browser extension client may collect metrics on sites visited and time spent on those pages. A mobile application client may collect geo-location information. Regardless of
the attributes a data wallet client is collecting in a particular form-factor, attributes can be categorized via three important properties: explicit/implicit, first/third party, and authenticated/unauthenticated.
% Not sure if this should be a paragraph or points
\paragraph{Explicit/Implicit}
The data attributes collected by a data wallet client can be considered explicit or implicit attributes. Explicit data is data that must be indexed and stored in its entirety by the data wallet client in order to produce insights from it. Explicit data offers no deterministic mechanism to regenerate the data set from scratch if it is lost. Implicit data does not require storage of every data element of the attribute in order to evaluate functions to produce insights and can be deterministically recovered if lost.
A simple example of an explicit data attribute is a user's date of birth or geo-location history. This data must be stored by the data wallet in its entirety to generate insights, and if it is lost, it cannot be easily reproduced without a dedicated backup. An example of implicit data is a user's on-chain transaction history. Transaction history can be recovered deterministically by simply linking asset accounts to the user's data wallet. Additionally, the entirety of the user's transaction history does not need to be available at all times; it can be fetched as needed when an insight requires access to it.
\paragraph{First/Third Party}
Data collected can come from different sources. Specifically, first-party data comes directly from the user, and third-party data comes from
someone other than the user. For example, if the user directly inputs their name, their name would be considered first-party data; if the
user imports their name from the DMV, that would be third-party data.
\paragraph{Authenticated/Unauthenticated}
Data can be authenticated if its origin and validity can be verified through cryptographic means, and unauthenticated if no such mechanism exists. For example,
a wallet address can be authenticated via verification of a signed message. Third-party data can be authenticated if it has a known
credential authority (this is the fundamental operating principle of certificate authorities like Digicert).
\paragraph{Localized Processing} % listens to events and processes queries
The data wallet is a local application that stores the owner's data securely and processes computations locally. By collecting and
securely storing user data locally, the data wallet guarantees data ownership to the user by never sharing the raw data. Because computations
run locally, the owner ensures that only analyses they have consented to can run on their data. Insight consumers also benefit
from this model as they can leverage data-driven insights without the risk of liability associated with the custody of user personal identifying information (PII). While the initial version of localized processing will be more limited, there is a myriad of ways we can modify this
approach to add additional data safety and features (see section \ref{section:Future}).
The wallet will learn what computations to run by listening to on-chain $requestForData$ events as depicted in figure \ref{fig:InsiteControlFlow}. These queries are written in a simple language specified by the Protocol. This language, called Synamint Query Language, is discussed in section \ref{section:SyQL}.
\subsubsection{Synamint Query Language (SyQL)} % maybe move to contracts
\label{section:SyQL}
The Protocol specifies a simple query language, called Synamint Query Language (SyQL), which will allow insight consumers to broadcast conditional insight requests to consent registry cohorts in a transparent and interpretable fashion.
SyQL is structured in JSON format containing information on the eligibility requirements, rewards, data to be collected, what processing
to perform, and where to send processed data (as depicted in the insight delivery edge in figure \ref{fig:InsiteControlFlow}). Queries written in SyQL should be stored on a content-addressable network, such as IPFS, to ensure tamper resistance.
A SyQL query is written by specifying nested keywords with associated parameters that inform the data wallet client query processing engine what data attributes are being requested and what insights to return given conditional statements that are met by the user's data state. The keywords associated with SyQL, and their intended usage and behavior, are given in the following subsections (note that the SyQL specification is subject to change before the mainnet launch of the Protocol):
\paragraph{version (required)}
The $version$ keyword is reserved for specifying the version of the SyQL schema a query is based on. This keyword has no sub-keywords.
\paragraph{timestamp (required)}
The time when the SyQL query was created, in ISO 8601 format, i.e., YYYY-MM-DDTHH:MM:SS. For example, 20:20:39 on 13 November 2021 is represented as 2021-11-13T20:20:39. This keyword has no sub-keywords.
\paragraph{expiry (required)}
The time when the SyQL query expires, in ISO 8601 format. Queries that are received after this time are considered stale and will not be executed or rewarded. There are no sub-keywords associated with this top-level keyword.
\paragraph{description (required)}
The $description$ keyword is used for specifying text, markdown, or HTML intended to be displayed to the recipient of a query. There are no sub-keywords.
\paragraph{business (required)}
This keyword is reserved for indicating what entity is broadcasting a query. It has no sub-keywords.
\paragraph{queries (required)}
The $queries$ keyword is used to indicate that a SyQL file is requesting access to the data wallet persistence layer. One or more instances must be specified within a queries block. These query instances can then be referenced by other top-level keywords. A query instance has the following sub-keywords:
\begin{itemize}
\item $name$: The $name$ sub-keyword indicates which attribute must be accessed in the data wallet persistence layer. Supported attributes should include:
\begin{itemize}
\item network: accesses the Web3 data associated with all accounts linked to a data wallet identity
\item age: access to the age of the data wallet user
\item location: access to the location data of the data wallet user
\item browsing\_history: access to the browsing history of the data wallet user
\item gender: access to the gender field of the data wallet user
\item url\_visited\_count: access the number of times URLs are visited by the data wallet user
\item chain\_transactions: accesses the transaction volume (in USD) and count by the data wallet user
\item balance: accesses the balance of the data wallet user on a per-chain basis
\end{itemize}
\item $return$: The $return$ sub-keyword specifies the object type that will be returned by a query. Supported types include:
\begin{itemize}
\item boolean: true or false depending on the conditions applied to the attribute being accessed
\item integer: returns an integer object related to the referenced attribute
\item enum: returns an enum related to the referenced attribute. The enum keys are specified under the enum\_keys sub-keyword
\item object: returns an object describing the referenced attribute. The object schema is specified in the object\_schema sub-keyword
\item array: returns an array describing the referenced attribute. The array items are specified in the array\_items sub-keyword
\item string: returns a string describing the attribute of interest. The string pattern is described using the string\_pattern sub-keyword.
\end{itemize}
\item $conditions$: Conditions are used in conjunction with the boolean return type. They specify the filter to apply to the attribute in order to determine whether true or false should be returned. The following conditions are supported:
\begin{itemize}
\item in: is the attribute in a set of objects
\item ge: is the attribute greater than or equal to a given object
\item l: is the attribute less than an object
\item le: is the attribute less than or equal to an object
\item e: is the attribute equal to an object
\item g: is the attribute greater than an object
\item has: does the attribute include a set of objects
\end{itemize}
\item $networkid$: This sub-keyword is used in conjunction with the balance attribute type. It allows for the specification of which layer 1 protocols a balance query should be run against. The following networkid values are supported:
\begin{itemize}
\item SOL: Solana network
\item 1: Ethereum Mainnet
\item 4: Ethereum Testnet (Rinkeby)
\item 42: Ethereum Testnet (Kovan)
\item 43114: Avalanche Mainnet
\item 43113: Avalanche Testnet (Fuji)
\item 137: Polygon Mainnet
\item 80001: Polygon Testnet (Mumbai)
\item *: all supported networks
\end{itemize}
\item $chain$: This sub-keyword is used in conjunction with the network attribute type. It allows for the specification of which layer 1 protocols a network query should be run against. The following chains are supported:
\begin{itemize}
\item ETH: Ethereum Mainnet
\item AVAX: Avalanche Mainnet
\end{itemize}
\item $contract$: The $contract$ sub-keyword is used in conjunction with the network sub-keyword. Specifying a contract indicates that the query is interrogating whether any accounts linked to a data wallet have made transactions meeting the following required characteristics:
\begin{itemize}
\item address: address of the smart contract of interest
\item networkid: chain ID that the smart contract is deployed to
\item function: function ABI on the target smart contract
\item direction: was the user's account in the to or from field
\item token: is the contract an ERC20 or ERC721 standard
\item timestamp: did the account submit a matching transaction between start and end timestamp
\end{itemize}
\item $enum\_keys$: This sub-keyword is used in conjunction with the enum attribute type. It lists the keys that the attribute type supports.
\item $object\_schema$: This sub-keyword is used in conjunction with the object attribute type. It specifies the schema of the object, including the properties, patternProperties (properties with regex-formatted keys), and required properties of the object.
\item $string\_pattern$: This sub-keyword describes the pattern of the string attribute using a Regular Expression (RegEx).
\item $array\_items$: This sub-keyword is used in conjunction with the array attribute type. It specifies the items of the array. The following array\_items are supported:
\begin{itemize}
\item boolean: an array of booleans
\item integer: an array of integers
\item object: an array of objects described with object\_schema
\item array: an array of arrays
\item number: an array of numbers
\end{itemize}
\end{itemize}
\paragraph{returns}
The $returns$ keyword is used to specify one or more candidate return objects that may be delivered to an insight aggregator. A return object has the following sub-keywords:
\begin{itemize}
\item $name$: The type of the return object:
\begin{itemize}
\item callback: resolves immediately to a pre-specified message delivered to a callback url
\item query\_response: resolves to the result of the specified query
\end{itemize}
\item $message$: An explicit string message to be returned as a result. Used with the callback return type.
\item $query$: A reference to a query specified in the queries block. Used in conjunction with the query\_response return type.
\item $url$: A complete URL specifying the location of the aggregation service associated with this SyQL file.
\end{itemize}
\paragraph{compensations (required)}
The $compensations$ keyword is used to declare one or more possible digital assets associated with the SyQL file. Each compensation has the following required characteristics:
\begin{itemize}
\item $description$: A text, markdown, or html string for displaying to the user information about the digital asset.
\item $callback$: A callback URL for claiming the digital asset.
\end{itemize}
\paragraph{logic}
The $logic$ keyword is used to specify arbitrary logic applied to components declared in the queries, returns, and compensations blocks.
\begin{itemize}
\item $returns$: A sub-keyword of logic used to specify an array of return expressions. A return expression can return objects declared in the returns block given that objects declared in queries have sufficient permissions to access the requisite attributes of the persistence layer.
\item $compensations$: A sub-keyword of logic used to specify an array of compensation expressions. A compensation expression can return objects declared in the compensations block given that objects declared in queries have sufficient permissions to access the requisite attributes of the persistence layer.
\end{itemize}
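Tying the keywords together, below is a hypothetical minimal SyQL document, shown as a TypeScript object literal consistent with the JSON structure described above; the entity name, URLs, and the expression syntax in the logic block are illustrative only:
\begin{verbatim}
// Hypothetical SyQL document: return whether the user is 18 or older
const exampleSyql = {
  version: 0.1,
  timestamp: "2021-11-13T20:20:39",
  expiry: "2022-11-13T20:20:39",
  description: "Help us understand our community.",
  business: "Example Org",                           // placeholder
  queries: {
    q1: { name: "age", return: "boolean", conditions: { ge: 18 } },
  },
  returns: {
    r1: { name: "query_response", query: "q1" },
    url: "https://insights.example.com/ingest",      // placeholder
  },
  compensations: {
    c1: {
      description: "A commemorative digital asset.",
      callback: "https://rewards.example.com/claim", // placeholder
    },
  },
  logic: {
    returns: ["$r1"],                    // deliver r1 to the aggregator
    compensations: ["if $q1 then $c1"],  // illustrative expression
  },
};
\end{verbatim}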
\subsubsection{Aggregation Services} % name for role?
Data ingestion and aggregation is the final stage of the insight aggregation process; it allows subscribers to access the insights generated at the data wallet layer.
This section details how a subscriber can see processed data from the user's data wallet. Any entity can assume the role of a data ingestion service
as long as they follow the protocol and are accepted by the DAO. Snickerdoodle Labs will offer the first data ingestion provider, which will be tied to
a SaaS product.
\paragraph{Insight Ingestion}
The final step of the protocol is for data wallets to send insights from their queries to an endpoint. This endpoint is the ingestion provider and is
specified in the SyQL query. Ingestion is an integral part of the protocol as it allows businesses to see the insights they have paid for.
Once the ingestion provider receives the insight, it must store the insight events and manage access to them. They can also provide any additional services
they want on top of this, such as providing a dashboard to view insights. There is no way to mathematically guarantee that any ingestion provider is behaving
honestly or is free of bugs. The lack of a strong guarantee means that actors in the protocol have to put some trust in ingestion providers. The
Snickerdoodle Protocol reduces this problem by only allowing ingestion services that the DAO has approved. Any provider has to be trustworthy enough for
the DAO to vote them in; if a provider is revealed to be a bad or incompetent actor, they can be voted out. For a deeper discussion of data safety
concerns with ingestion providers, see section \ref{section:IngestionDataSafety}.
\paragraph{Data Safety}
\label{section:IngestionDataSafety}
The initial version of the protocol requires trust in the Insight Platform to maintain data safety. As discussed above, there are no guarantees that the
ingestion provider acts honestly. The ingestion provider can see, delete, and sell any insights they receive. Additionally, because the initial version
of the wallet only implements simple anonymization techniques, ingestion providers could identify who shared the insight and reconstruct the raw data.
To reduce these privacy risks, the initial version of the protocol will only allow queries that don't reveal PII, and future versions will add anonymization
techniques to insight computations (see section \ref{section:Future}). To reduce the risk of malicious ingestion services, the Snickerdoodle DAO will maintain
an allowlist of trusted actors. Additionally, Snickerdoodle Labs will be providing an ingestion service. The Snickerdoodle Insight Service is a SaaS product
that will enforce data safety and has strong financial and legal incentives to behave honestly.
\paragraph{SDL Insight Platform}
\label{section:InsightService}
The Snickerdoodle Insight Platform will be our SaaS product offering for interaction with the Snickerdoodle Protocol. The Insight Platform will provide an
ingestion service and help manage interaction with the rest of the protocol. This includes support for creating and managing queries, consent contracts,
rewards, and website integrations. The Insight Platform will also provide an analytics dashboard to help businesses see their insights.