Storage v2: add image upload+polling, use CIDs, remove metadata #5032

Merged
theoilie merged 8 commits into main from theo-track-upload-v2-metadata
Apr 3, 2023

Conversation

theoilie
Contributor

@theoilie theoilie commented Mar 30, 2023

Description

  • Changes mediorum to not have metadata routes and instead compute+return CID V1 for transcoded/resized uploads
  • Makes libs poll /uploads/:id endpoint for transcode/resize success
  • Track and image upload flow gets all the way to the end and posts an EntityManager transaction with valid metadata. It will be able to fully complete and play the upload once discovery is able to index the transaction

Tests

With local stack+client, I signed up, pressed "f" to enable the storage_v2 feature flag, uploaded a track with cover art, and verified that polling succeeded followed by an EntityManager transaction being relayed with correct metadata (including track_cid). I also verified that results for jobs now use CID V1:

Job results
{
  "id": "4GVL7GUKBWT4TKPI433ZTF7RQE3C2OXG",
  "template": "img_square",
  "orig_filename": "[object Object]",
  "probe": {
    "format": {
      "filename": "[object Object]",
      "nb_streams": 1,
      "nb_programs": 0,
      "format_name": "jpeg_pipe",
      "format_long_name": "piped jpeg sequence",
      "size": "240898",
      "probe_score": 26
    }
  },
  "mirrors": [
    "http://localhost:1991"
  ],
  "status": "done",
  "created_by": "http://localhost:1991",
  "created_at": "2023-03-30T20:56:33.473712044Z",
  "transcoded_by": "http://localhost:1991",
  "transcode_progress": 1,
  "transcoded_at": "2023-03-30T20:56:34.185948669Z",
  "results": {
    "1000x1000": "QmbKtpchC1W6qg4PpjNUYysBKenqtKDaRbmQSnnEJsRbfb",
    "150x150": "QmR7wD3n2ANVxwtVykPXeAUYRiwXEZi97dfQDZFJyRtCRc",
    "480x480": "QmceU8VCVc6V4LU28jcAPeJHTGfEHjytdqYAqiWokcv13D"
  }
}
Metadata created by V2 upload
{
  "track_cid": "QmXDimCZGsfVE2C2n3w42Ro99bhUUFyyviRuq7YuNBSTuE",
  "owner_id": 391084037,
  "title": "80x27s-islandy-loop-925bpm-132431",
  "length": null,
  "cover_art": null,
  "cover_art_sizes": "6DFEFVOWWLW7MBVCQJWGMZRJOHLVLLG2",
  "tags": null,
  "genre": "Experimental",
  "mood": null,
  "credits_splits": null,
  "created_at": null,
  "create_date": null,
  "updated_at": null,
  "release_date": "Thu Mar 30 2023 15:15:36 GMT-0700",
  "file_type": null,
  "track_segments": [],
  "has_current_user_reposted": false,
  "followee_reposts": [],
  "followee_saves": [],
  "is_current": true,
  "is_unlisted": false,
  "is_premium": false,
  "premium_conditions": null,
  "field_visibility": {
    "genre": true,
    "mood": true,
    "tags": true,
    "share": false,
    "play_count": false,
    "remixes": true
  },
  "remix_of": null,
  "repost_count": 0,
  "save_count": 0,
  "description": null,
  "license": "All rights reserved",
  "isrc": null,
  "iswc": null,
  "download": null,
  "is_playlist_upload": false,
  "artwork": {
    "url": "blob:http://localhost:3000/8136377f-5f3a-48c8-b512-2cf79af38311",
    "file": {},
    "source": "original"
  }
}
Metadata created by the same V1 upload
{
  "track_cid": "QmcWqWwQSJ5mgZoLDqvqsLQ8eX3k6ZchxG33DraFQ8wDM1",
  "owner_id": 391084037,
  "title": "80x27s-islandy-loop-925bpm-132431",
  "length": null,
  "cover_art": null,
  "cover_art_sizes": "QmS8Eb9uUuVetiNFVmHe55XZRoxAKb1TrrD7wxKwqYy6Fp",
  "tags": null,
  "genre": "Experimental",
  "mood": null,
  "credits_splits": null,
  "created_at": null,
  "create_date": null,
  "updated_at": null,
  "release_date": "Thu Mar 30 2023 15:18:39 GMT-0700",
  "file_type": null,
  "track_segments": [
    {
      "multihash": "QmZZz46oQmN6WtTsJLdy4fTe4AzDjJMe7VVpubFmfabWGM",
      "duration": 6.016
    },
    {
      "multihash": "QmckAGvaTQ1Gyr99JLvPFEqcUc3pfQ3umANUjh76Z1K6BT",
      "duration": 5.994667
    },
    {
      "multihash": "QmTW45Mhy6aaUANfJEftLfFTaKavyTDB52JVafLMEnnvPA",
      "duration": 5.994667
    },
    {
      "multihash": "QmZkjfuGSrWtjmPGATvH7uWyqkTAjVr3spSHEx6y4dznyP",
      "duration": 5.994667
    },
    {
      "multihash": "QmcGMuBvn6RutoqRCuCJnsjb35facptkfHPKkjqGpMReSP",
      "duration": 2.013189
    }
  ],
  "has_current_user_reposted": false,
  "followee_reposts": [],
  "followee_saves": [],
  "is_current": true,
  "is_unlisted": false,
  "is_premium": false,
  "premium_conditions": null,
  "field_visibility": {
    "genre": true,
    "mood": true,
    "tags": true,
    "share": false,
    "play_count": false,
    "remixes": true
  },
  "remix_of": null,
  "repost_count": 0,
  "save_count": 0,
  "description": null,
  "license": "All rights reserved",
  "isrc": null,
  "iswc": null,
  "download": null,
  "is_playlist_upload": false,
  "artwork": {
    "url": "blob:http://localhost:3000/0342668b-f7b0-4738-8a2e-12077d54dad4",
    "file": {},
    "source": "original"
  },
  "stem_of": null
}

Note: track CIDs are different due to #4631.

Monitoring - How will this change be monitored? Are there sufficient logs / alerts?

This is all gated by the storage_v2 flag.

Contributor

@sliptype sliptype left a comment

Amazing work! All looks good to me but I don't have much context on the mediorum stuff

Contributor

@stereosteve stereosteve left a comment

Looks nice. Added a note about not using ReadAll

The other thing we should also do is update postBlob function to compute CID for incoming blob instead of trusting the multipart filename from peer.

file, err := upload.Open()
func computeFileCID(file io.Reader) (string, error) {
builder := cid.V1Builder{}
contents, err := ioutil.ReadAll(file)
Contributor

@stereosteve stereosteve Mar 31, 2023

Two problems with using ReadAll here:

  • it consumes the reader passed in... so calling code needs to Seek(0,0) if it expects reader to still work.
  • it reads whole file into memory, which for a wav could easily mean 100+ MB.

Anyway I think you can solve both by:

  • making the param an io.ReadSeeker
  • removing ReadAll and using CidFromReader (actually, that one is for parsing a CID... still looking for the reader-friendly equivalent)
  • doing a defer file.Seek(0,0)

next week planning to do the "parallel replicate" change, which will pass around files everywhere so we can have independent readers which will improve this.

Contributor

@stereosteve stereosteve Mar 31, 2023

for not using ReadAll, something like:

h, err := multihash.SumStream(file, multihash.SHA2_256, -1)

basically modify the utils function, drop utils dep and use multihash directly.

Contributor Author

@theoilie theoilie

I appreciate all the details! Re-tested now after updating the CID computation to use SumStream and seek back to the beginning, and after making postBlob verify CIDs instead of trusting the multipart filename from the peer.

@theoilie theoilie requested a review from stereosteve April 3, 2023 17:13
Copy link
Contributor

@stereosteve stereosteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@theoilie theoilie merged commit 518758a into main Apr 3, 2023
@theoilie theoilie deleted the theo-track-upload-v2-metadata branch April 3, 2023 19:00
audius-infra pushed a commit that referenced this pull request Apr 4, 2023
## Changelog

- 2023-04-03 [ac2f3f3] Fix discover-node-selector response (#5043) [Dylan Jeffers]
- 2023-04-03 [dfee3cc] SDK: Add getBlockers(), getBlockees(), unfurl(), and getPermissions() (#5041) [Marcus Pasell]
- 2023-04-03 [1d8db84] SDK: Snake case DN selector services key (#5039) [Marcus Pasell]
- 2023-04-03 [518758a] Storage v2: add image upload+polling, use CIDs, remove metadata (#5032) [Theo Ilie]
- 2023-04-03 [0115b13] Bump sdk to v2.0.3-beta.2 [audius-infra]