Dune Sync #45

Open
14 tasks
bh2smith opened this issue Nov 18, 2022 · 0 comments

bh2smith commented Nov 18, 2022

Instructions from Dune:

AWS Stuff

  1. The data is written into Dune’s S3 buckets. We assume we host the data, but the data still belongs to you.
  2. Separate IAM users are created for the community data providers to write data. IAM roles created by data providers are added to the trust policy to allow them to assume IAM roles within our account. This security measure minimizes the number of different credentials maintained by data providers.
  3. Data providers should write objects with the bucket-owner-full-control canned ACL (see [Dune Sync] Add Basic AWS Post #44).
  4. Pre-specified External IDs must be used when assuming the role.
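The AWS steps above can be sketched with boto3 as follows. This is a minimal sketch, not Dune's reference implementation. It assumes boto3 is installed and that the role ARN, external ID, bucket, and key are placeholders for the values Dune provides. The helper names (assume_role_params, put_object_params, upload) and the session name "dune-sync" are hypothetical.

```python
"""Sketch: write one object into a Dune-hosted S3 bucket via an assumed role.

Assumptions (not from Dune's instructions): role ARN, external ID, bucket,
key, and session name are hypothetical placeholders; boto3 is installed.
"""
from typing import Any, Dict


def assume_role_params(role_arn: str, external_id: str,
                       session_name: str = "dune-sync") -> Dict[str, Any]:
    # Keyword arguments for STS AssumeRole. ExternalId must be the
    # pre-specified value agreed with Dune (item 4).
    # "dune-sync" is a hypothetical session name.
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ExternalId": external_id,
    }


def put_object_params(bucket: str, key: str, body: bytes) -> Dict[str, Any]:
    # Keyword arguments for S3 PutObject. The canned ACL gives the bucket
    # owner (Dune) full control of the written object (item 3).
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ACL": "bucket-owner-full-control",
    }


def upload(role_arn: str, external_id: str, bucket: str, key: str,
           body: bytes) -> None:
    # Imported lazily so the parameter helpers above work without boto3.
    import boto3

    # The provider's own credentials only need permission to assume the
    # role in Dune's account (item 2); the temporary credentials returned
    # by STS are then used for the write.
    sts = boto3.client("sts")
    creds = sts.assume_role(**assume_role_params(role_arn, external_id))["Credentials"]
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    s3.put_object(**put_object_params(bucket, key, body))
```

Keeping the parameter builders separate from the boto3 calls makes it easy to check, without AWS credentials, that every write carries the required ExternalId and ACL.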

File Formatting

  1. We use JSON format to write data into S3 buckets.
  2. Filenames should contain a predefined constant prefix (e.g., cow_{the rest of the filename}.json).
  3. We intend to keep filenames very simple. Filenames need to contain an increasing sequence number (e.g., cow_000000000000001.json, cow_000000000000002.json). We are releasing an update that will let us test timestamps in filenames instead of sequence numbers.
  4. The data is written into JSON files as JSON objects.
  5. The data in JSON files should not be enclosed in array brackets.
  6. Generally, we avoid updating files once they are written. If you produce data every minute, write each batch to a separate file.
  7. Written data should follow an append-only approach. We can discuss strategies for data updates and deletion separately.
  8. Name/value pairs in JSON files should correspond to the column names and data values in the target data table.
  9. If you plan to write data into several tables, you can write the data for each table into a different folder in the S3 bucket (e.g., s3://{bucket name}/{folder name}). See [Dune Sync] Add Basic AWS Post #44.
  10. We only support predefined schemas at the moment. Data providers should define schemas of the final tables in advance.
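The file-format rules above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not Dune's specification: it assumes objects are separated by newlines (JSON Lines style), that the sequence number is zero-padded to 15 digits to match the cow_000000000000001.json example, and a hypothetical "trades" table. The names object_key, serialize_rows, and BatchWriter are illustrative only.

```python
"""Sketch of the file-format rules (stdlib only).

Assumptions (not specified by Dune): newline-separated JSON objects,
15-digit zero padding, and hypothetical table/column names.
"""
import json
from typing import Dict, Iterable, List, Tuple


def object_key(table: str, prefix: str, seq: int) -> str:
    # One folder per target table (item 9), a constant prefix (item 2),
    # and an increasing, zero-padded sequence number (item 3).
    return f"{table}/{prefix}_{seq:015d}.json"


def serialize_rows(rows: Iterable[Dict[str, object]],
                   columns: List[str]) -> bytes:
    # Each row becomes one JSON object whose keys are the target table's
    # column names (item 8); objects are NOT wrapped in an array (item 5).
    lines = []
    for row in rows:
        if set(row) != set(columns):
            # Only predefined schemas are supported (item 10).
            raise ValueError(
                f"row keys {sorted(row)} do not match schema {sorted(columns)}")
        lines.append(json.dumps({c: row[c] for c in columns}))
    return "".join(line + "\n" for line in lines).encode("utf-8")


class BatchWriter:
    """Append-only: every batch gets a new file name (items 6 and 7)."""

    def __init__(self, table: str, prefix: str, columns: List[str],
                 start_seq: int = 1) -> None:
        self.table = table
        self.prefix = prefix
        self.columns = columns
        self.seq = start_seq

    def next_file(self, rows: Iterable[Dict[str, object]]) -> Tuple[str, bytes]:
        # Serialize first so a schema error does not consume a sequence number.
        body = serialize_rows(rows, self.columns)
        key = object_key(self.table, self.prefix, self.seq)
        self.seq += 1
        return key, body
```

For example, BatchWriter("trades", "cow", ["block_number", "tx_hash"]).next_file(rows) returns the key and body for the next file, which would then be uploaded as a new object rather than overwriting an earlier one.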
bh2smith added the epic label Nov 18, 2022