The downloaded data did not match the data from the server #654

Closed
beshkenadze opened this issue Jun 10, 2015 · 27 comments · Fixed by #745
Labels
api: storage Issues related to the Cloud Storage API.

Comments

@beshkenadze

Hey,
Every file I download from the bucket "pubsite_prod_rev_" fails with error code CONTENT_DOWNLOAD_MISMATCH.

// `bucket` is an existing Cloud Storage bucket object
bucket.file("reviews/reviews_com.sample.android_201409.csv").download({
  destination: './reviews.csv'
}, function(err) {
  if (err) {
    console.log(err.code); // CONTENT_DOWNLOAD_MISMATCH
  }
});
@jgeewax jgeewax added the api: storage Issues related to the Cloud Storage API. label Jun 10, 2015
@beshkenadze
Author

:(

@jgeewax
Contributor

jgeewax commented Jul 18, 2015

Any thoughts, @stephenplusplus?

@stephenplusplus
Contributor

@beshkenadze sorry it's taken so long to get to this. It's hard to say what could be causing this, or whether it's related to #651.

The error is returned because either the MD5 or the CRC32c validation check isn't passing. In other words, the data you received doesn't match the data stored in the bucket. You can work around this by disabling validation:

bucket.file("reviews/reviews_com.sample.android_201409.csv").download({
  destination: './reviews.csv',
  validation: false // skip the MD5/CRC32c comparison
}, function(err) {
  // No more mismatch error (hopefully)
});
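To make the error concrete: with validation on, the client compares a digest of the bytes it received against the hash the server reports, and a difference surfaces as CONTENT_DOWNLOAD_MISMATCH. Below is a minimal sketch of an MD5 comparison of that kind, assuming the expected value is the base64 string from the md5=... entry of the x-goog-hash header. `md5Matches` is a hypothetical helper for illustration, not part of the gcloud API.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not the gcloud API): true when the base64-encoded MD5
// digest of `data` equals `expectedMd5Base64`, the format used by the
// md5=... entry of the x-goog-hash header.
function md5Matches(data, expectedMd5Base64) {
  const actual = crypto.createHash('md5').update(data).digest('base64');
  return actual === expectedMd5Base64;
}
```

Note that `validation: false` skips this comparison entirely, so a genuinely corrupted download would also go unnoticed.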

If you want to do some debugging, I pushed a branch you can swap in for your gcloud dependency. It just adds some console.log output to help show what's going on:

$ npm install --save stephenplusplus/gcloud-node#spp--654

@beshkenadze
Author

Hey @stephenplusplus,
How is the hash computed for the local file?
Play Cloud serves these files compressed in GZ format.
Could the hashes be calculated from the extracted file?

@jgeewax
Contributor

jgeewax commented Jul 20, 2015

Not sure I understand the question, @beshkenadze...

To give some background (sorry if you already know this, just adding for context):

I don't see anything indicating that we're looking at the uncompressed file: we treat the data as nothing more than bytes and ignore the file type altogether.

It's possible that Play (when uploading the data) is somehow bypassing the step where the CRC32c and MD5 hashes are set for the file, which would cause this error on all Play-uploaded files. Could you tell us what you get back in the headers that start with x-goog-hash when you GET the files from GCS?

/cc @stephenplusplus

@beshkenadze
Author

I'm looking for a real file now and will try to post an example.

@stephenplusplus
Contributor

Could you tell us what you get back in the headers that start with x-goog-hash when you GET the files from GCS?

That's what this will do:

$ npm install --save stephenplusplus/gcloud-node#spp--654

@beshkenadze
Author

@stephenplusplus that version ("version": "0.8.1") is too old :)

@stephenplusplus
Contributor

That branch (spp--654) is tracking master: https://github.com/stephenplusplus/gcloud-node/tree/spp--654

@beshkenadze
Author

I used request-debug and got this:

{
  response: {
    debugId: 1,
    headers: {
      'x-guploader-uploadid': 'XXXX',
      expires: 'Mon, 20 Jul 2015 13:40:35 GMT',
      date: 'Mon, 20 Jul 2015 13:40:35 GMT',
      'cache-control': 'private, max-age=0',
      'last-modified': 'Sun, 19 Jul 2015 18:53:59 GMT',
      etag: 'W/"XXXX"',
      'x-goog-generation': '1437332039288000',
      'x-goog-metageneration': '1',
      'x-goog-stored-content-encoding': 'gzip',
      'x-goog-stored-content-length': '5939',
      'content-type': 'text/csv; charset=utf-16le',
      'x-goog-hash': 'crc32c=66rJzQ==, md5=2T/NKanU9vTItoiF7+tMAA==',
      'x-goog-storage-class': 'STANDARD',
      vary: 'Accept-Encoding',
      'content-length': '24148',
      server: 'UploadServer',
      'alternate-protocol': '443:quic,p=1',
      connection: 'close'
    },
    statusCode: 200
  }
}
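In this dump, x-goog-stored-content-encoding is gzip and x-goog-stored-content-length (5939) is smaller than content-length (24148), a hint that the bytes being served may not be the gzip bytes that were stored. GCS sends the CRC32c and MD5 values as separate x-goog-hash headers, and Node's http module joins repeated headers of this kind with ", ", which is why both appear in one string above. A minimal parser for that combined value might look like the following; `parseGoogHash` is a hypothetical helper, not a gcloud function.

```javascript
// Hypothetical helper: splits a combined x-goog-hash value such as
// 'crc32c=66rJzQ==, md5=2T/NKanU9vTItoiF7+tMAA==' into { crc32c, md5 }.
function parseGoogHash(headerValue) {
  const hashes = {};
  for (const part of headerValue.split(',')) {
    const entry = part.trim();
    // Split on the first '=' only: base64 values end with '=' padding.
    const eq = entry.indexOf('=');
    if (eq === -1) continue;
    hashes[entry.slice(0, eq)] = entry.slice(eq + 1);
  }
  return hashes;
}
```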

@stephenplusplus
Contributor

Nice :) Using my branch will show the hashes that are being built locally as well.

@beshkenadze
Author

@stephenplusplus ;)

Headers: { 'x-guploader-uploadid': 'XXXX',
  expires: 'Mon, 20 Jul 2015 13:46:35 GMT',
  date: 'Mon, 20 Jul 2015 13:46:35 GMT',
  'cache-control': 'private, max-age=0',
  'last-modified': 'Sun, 19 Jul 2015 18:53:59 GMT',
  etag: 'W/"XXXX"',
  'x-goog-generation': '1437332039288000',
  'x-goog-metageneration': '1',
  'x-goog-stored-content-encoding': 'gzip',
  'x-goog-stored-content-length': '5939',
  'content-type': 'text/csv; charset=utf-16le',
  'x-goog-hash': 'crc32c=66rJzQ==, md5=2T/NKanU9vTItoiF7+tMAA==',
  'x-goog-storage-class': 'STANDARD',
  vary: 'Accept-Encoding',
  'content-length': '24148',
  server: 'UploadServer',
  'alternate-protocol': '443:quic,p=1',
  connection: 'close' }

Local CRC32c Hash: Fw==
Local MD5 Hash: Hwt6cw9joXTy4EOtQqh0pg==
crypto.js:126
  return this._handle.digest(outputEncoding);
                      ^
Error: Not initialized
    at Error (native)
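For comparison with these local values: GCS encodes crc32c as the base64 of the 4-byte checksum in big-endian order, which is why the server value 66rJzQ== is eight characters long. The sketch below is a standard table-driven CRC-32C (Castagnoli) written from the textbook algorithm, not taken from gcloud-node; `crc32c` and `crc32cBase64` are hypothetical names.

```javascript
// CRC-32C uses the Castagnoli polynomial; 0x82F63B78 is its bit-reversed form.
const CRC32C_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? (c >>> 1) ^ 0x82f63b78 : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// Returns the CRC-32C of `buf` as an unsigned 32-bit integer. Pass a previous
// result as `crc` to continue the checksum across several chunks.
function crc32c(buf, crc = 0) {
  let state = ~crc >>> 0;
  for (const byte of buf) {
    state = CRC32C_TABLE[(state ^ byte) & 0xff] ^ (state >>> 8);
  }
  return ~state >>> 0;
}

// Encodes the checksum the way x-goog-hash does: 4 bytes, big-endian, base64.
function crc32cBase64(buf) {
  const out = Buffer.alloc(4);
  out.writeUInt32BE(crc32c(buf), 0);
  return out.toString('base64');
}
```

The chunked form matters for downloads: each incoming chunk can be folded in with `crc = crc32c(chunk, crc)` without buffering the whole file.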

@stephenplusplus
Contributor

Those don't match even a little bit! As you pointed out, @beshkenadze, I think we're running into issues because request automatically decodes the file as it's being downloaded, resulting in different hashes. I can't immediately think of a great workaround, other than:

  1. shut off the auto-decoding for all downloads (don't think we want this),
  2. ignore the hash mismatch if we see the file was gzip'd in the response headers,
  3. branch off from the request download stream, and run the calculation on the native http.IncomingMessage response stream (which won't do the decoding)
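Option 3 can be sketched as follows, assuming `raw` is the undecoded response stream (for example an http.IncomingMessage read before any gunzip step) and that its body really is the stored gzip data. Every raw chunk feeds an MD5 hash, so the digest covers the bytes as stored, while the same chunks are piped through zlib's gunzip for the caller. If the server has already decompressed the body, the raw bytes will not match the stored hash either. `hashRawAndGunzip` is a hypothetical helper, not the change that was eventually merged.

```javascript
const crypto = require('crypto');
const zlib = require('zlib');

// Hypothetical helper (not the gcloud API): hash the raw, still-gzipped bytes
// from `raw` while returning a stream of the decompressed data.
function hashRawAndGunzip(raw, onHash) {
  const md5 = crypto.createHash('md5');

  // Every raw chunk feeds the hash, so the digest covers the encoded bytes.
  raw.on('data', (chunk) => md5.update(chunk));
  raw.on('end', () => onHash(md5.digest('base64')));

  // The same chunks are decompressed for whoever consumes the result.
  return raw.pipe(zlib.createGunzip());
}
```

The design keeps decoding for the caller (so option 1 is avoided) while still validating, at the cost of tapping the stream before request's own decoding step.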

@jgeewax
Contributor

jgeewax commented Jul 20, 2015 via email

@stephenplusplus
Contributor

Yep, just confirmed it still happens.

@jgeewax
Contributor

jgeewax commented Jul 20, 2015 via email

@stephenplusplus
Contributor

I think idea no. 3 is probably the best way to handle this. I'll take a stab at it and PR soon.

@beshkenadze
Author

👍

@adielmil

Hi, has this issue been resolved? It happened to me more than once today.
I'm running on the Google App Engine flexible environment.
@google-cloud/storage version: 1.5.1

@stephenplusplus
Contributor

@adielmil Could you post a new issue on https://github.com/googleapis/nodejs-storage? Be sure to fill out the issue template and provide sample code/files that we can use to reproduce.

@kirillgroshkov

Started happening to us too (also with a .gz file). Excerpt from the logs:

ApiError: Multiple errors occurred during the request. Please see the `errors` array for complete details.

    1. Bad Request
    2. Metadata part is too large.


    at new ApiError (/root/repo/node_modules/@google-cloud/common/build/src/util.js:73:15)
    at Util.parseHttpRespMessage (/root/repo/node_modules/@google-cloud/common/build/src/util.js:175:41)
    at Util.handleResp (/root/repo/node_modules/@google-cloud/common/build/src/util.js:149:76)
    at /root/repo/node_modules/@google-cloud/common/build/src/util.js:477:22
    at onResponse (/root/repo/node_modules/retry-request/index.js:228:7)
    at /root/repo/node_modules/teeny-request/src/index.ts:244:13
    at processTicksAndRejections (internal/process/task_queues.js:95:5) {
  code: 400,
  errors: [],
  response: PassThrough {

...

statusMessage: 'Bad Request',
    request: {
      agent: [Agent],
      headers: [Object],
      href: 'https://storage.googleapis.com/upload/storage/v1/b/our-bucket-id/o?uploadType=multipart&name=incremental%2FUserAchievements.ndjson.gz'
    },
    body: 'Metadata part is too large.',
    headers: {
      'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"',
      'content-length': '27',
      'content-type': 'text/plain; charset=utf-8',
      date: 'Wed, 22 Sep 2021 10:17:23 GMT',
      server: 'UploadServer',
      'x-guploader-uploadid': 'ADPycdvEW5Od42f4mkWWh8AJtU2Pm2qcEBgw2rXrv5_GC6iOu72CaG9gOIjFLQX3LJuwTh4yiizP2Sc18Vito9g9gsHb96lSHg'
    },

@acSpock

acSpock commented Oct 14, 2021

To add another angle to this issue: I'm currently seeing this with the Firebase Storage emulator when developing locally. So far, not in production.

@JarnoRFB

JarnoRFB commented Dec 2, 2021

@acSpock I hit the same problem using the Firebase emulator. What helped me was to use file.download({validation: false}), as described here: https://www.reddit.com/r/Firebase/comments/nhilth/help_in_new_firebase_storage_emulator/

@LB-Digital


Perhaps a slightly better approach would be...

bucket.file(filePath).download({
    validation: !process.env.FUNCTIONS_EMULATOR,
});

So then you keep the validation in prod. In addition, if you're using TypeScript, you can satisfy the type checking by following the suggestion in this SO post: https://stackoverflow.com/a/53981706/6506026

@JarnoRFB

JarnoRFB commented Dec 7, 2021

@LB-Digital is FUNCTIONS_EMULATOR an environment variable you expect to be set by default when running with the emulator? In tests I run with firebase emulators:exec, it doesn't seem to be set.

@ElBouhaliMohamed

@JarnoRFB Yes, you need it so that firebase-admin knows you're running the local emulator.
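Since FUNCTIONS_EMULATOR may not be set under firebase emulators:exec, a slightly broader check is one option. The sketch below treats the process as emulated when any of a few variables is set: FUNCTIONS_EMULATOR (from this thread), FIREBASE_STORAGE_EMULATOR_HOST (read by firebase-admin) and STORAGE_EMULATOR_HOST (read by @google-cloud/storage). Whether your tooling actually sets them is an assumption to verify against your firebase-tools version; `isStorageEmulator` is a hypothetical helper.

```javascript
// Assumed emulator markers; confirm which ones your setup really exports.
const EMULATOR_ENV_VARS = [
  'FUNCTIONS_EMULATOR',
  'FIREBASE_STORAGE_EMULATOR_HOST',
  'STORAGE_EMULATOR_HOST',
];

// Hypothetical helper: true when any assumed emulator variable is non-empty.
function isStorageEmulator(env = process.env) {
  return EMULATOR_ENV_VARS.some((name) => Boolean(env[name]));
}

// Usage: keep validation everywhere except against the emulator.
// bucket.file(filePath).download({ validation: !isStorageEmulator() });
```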

draperunner added a commit to draperunner/framejoy that referenced this issue Apr 24, 2022
@johnnyoshika

I'm running into this error when using the emulator as well
