aws s3 cp --recursive not downloading all small files, when it says it does #3087

Closed
rterwedo opened this issue Jan 17, 2018 · 16 comments
Labels
closing-soon This issue will automatically close in 4 days unless further comments are made.

Comments

@rterwedo

rterwedo commented Jan 17, 2018

We are in the process of writing a few thousand ~500 byte files to an s3 bucket/folder.

Currently the console shows 280 files in the folder (we assume this is correct, as we have not written them all yet).

aws s3 ls bucket/folder/ --recursive shows 280 files.

aws s3 cp bucket/folder/ ./ --recursive shows it has copied 280 files on the command line. You can read the number explicitly, and I also counted the lines of "copied file" output in the console.

However, on macOS (right-click, Get Info) it shows 211 files.

Additionally, ls . | wc -l shows 211 files.
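
Note that `ls . | wc -l` counts only the top-level entries of the current directory, so keys that land in subdirectories are not counted one per file. A minimal sketch of a recursive comparison, assuming the expected key names (relative to the copied prefix) are already available as a list; the function name `find_missing` and its arguments are hypothetical and not part of the AWS CLI:

```python
import os


def find_missing(expected_keys, local_root):
    """Return the expected keys that have no matching regular file under local_root."""
    present = set()
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, local_root)
            # S3 keys use "/" as the separator, so normalise local paths to match.
            present.add(rel_path.replace(os.sep, "/"))
    return sorted(key for key in expected_keys if key not in present)
```

Calling `find_missing(keys, ".")` after the copy would list exactly which keys are absent locally, which is more informative than comparing two totals.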

I have tried reducing the number of concurrent requests to 1. I cannot understand why it would show a file as completed when the file doesn't exist locally. This is over Wi-Fi, by the way, so if there is no check that a file was downloaded correctly, maybe that's it? Unsure where to look next.

UPDATE: We put together a hack to get around this bug... basically, if you run:

aws s3 ls bucket/folder/ --recursive --dryrun >> filestodownload.txt

It shows all the files you want to download and saves the list in a text file that's easy to parse. We then parsed it and ran a separate aws s3 cp command for each individual file. All downloaded successfully, albeit slowly... so we are now working on concurrency... but files are still MIA (even though we have local checks after each download that the file size is > 0, and all succeed).
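
The parse-then-copy workaround described above can be sketched as follows. This is an illustration only: it assumes the saved file has the usual four columns of `aws s3 ls --recursive` output (date, time, size, key), and the helper names `parse_listing` and `build_copy_commands` are hypothetical.

```python
def parse_listing(lines):
    """Extract object keys from lines shaped like 'DATE TIME SIZE KEY'."""
    keys = []
    for line in lines:
        # maxsplit=3 keeps keys that contain spaces in one piece.
        parts = line.rstrip("\r\n").split(None, 3)
        if len(parts) == 4 and parts[2].isdigit():
            keys.append(parts[3])
    return keys


def build_copy_commands(bucket, keys, dest_dir="."):
    """Build one 'aws s3 cp' argument list per key (nothing is executed here)."""
    return [
        ["aws", "s3", "cp", f"s3://{bucket}/{key}", f"{dest_dir}/{key}"]
        for key in keys
    ]
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)`, which raises if a single copy exits with a non-zero status, so a failed file cannot go unnoticed.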

Really confusing...

UPDATE 2: We spun up an AWS AMI and repeated the process above. It worked as expected. IDK, seems to be related to Mac? I am on macOS Sierra 10.12.4.
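
This thread never confirms a root cause. One possibility worth ruling out, given that the same steps worked on a Linux AMI, is that some keys differ only by letter case: the default macOS file systems are case-insensitive, so such keys would be written to the same local file. That is an assumption, not something established in this issue. A minimal sketch of the check, with a hypothetical helper name and using `str.casefold()` as an approximation of the file system's comparison rules:

```python
from collections import defaultdict


def case_collisions(keys):
    """Group keys that become identical when letter case is ignored."""
    groups = defaultdict(list)
    for key in keys:
        # casefold() only approximates how a case-insensitive volume compares names.
        groups[key.casefold()].append(key)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

If this returns any groups for the listed keys, the gap between the 280 reported copies and the 211 local files could be explained by later downloads overwriting earlier ones.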

@rterwedo rterwedo reopened this Jan 18, 2018
@stealthycoin
Contributor

You can rerun the commands with the --debug flag, which will give you a lot of information to help pinpoint the issue. If you still think something is a bug, you can post the sanitized logs here for us to take a look at.

@stealthycoin stealthycoin added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jan 18, 2018
@rterwedo
Author

Extremely large amount of output, let me look through it.

@rterwedo
Author

What are you looking for? There are a bunch of items relating to region redirect, and stuff like the below:

2018-01-18 18:28:17,014 - ThreadPoolExecutor-0_5 - botocore.hooks - DEBUG - Event after-call.s3.GetObject: calling handler <function enhance_error_msg at 0x104583230>
2018-01-18 18:28:17,014 - ThreadPoolExecutor-0_2 - botocore.retryhandler - DEBUG - No retry needed.
2018-01-18 18:28:17,015 - ThreadPoolExecutor-0_1 - botocore.hooks - DEBUG - Event after-call.s3.GetObject: calling handler <function enhance_error_msg at 0x104583230>
2018-01-18 18:28:17,015 - ThreadPoolExecutor-0_5 - s3transfer.tasks - DEBUG - IOWriteTask(transfer_id=6, {'offset': 0}) about to wait for the following futures []
2018-01-18 18:28:17,015 - ThreadPoolExecutor-0_0 - botocore.hooks - DEBUG - Event before-parameter-build.s3.GetObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x104cd23d0>>
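
With this much output, one way to narrow it down is to drop the DEBUG records and look at whatever remains. A minimal sketch, assuming every record follows the five-field layout of the excerpt above (timestamp, thread, logger, level, message, separated by " - "); the function name `non_debug_lines` is hypothetical, and lines that do not match the layout, such as tracebacks, are skipped:

```python
def non_debug_lines(lines, ignore=("DEBUG",)):
    """Return (level, logger, message) for records whose level is not ignored."""
    hits = []
    for line in lines:
        # Split into at most five fields so " - " inside the message is preserved.
        parts = line.rstrip("\n").split(" - ", 4)
        if len(parts) == 5 and parts[3] not in ignore:
            hits.append((parts[3], parts[2], parts[4]))
    return hits
```

An empty result only means no record was logged above DEBUG level in that format; it does not prove that every file was written to disk.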

@JordonPhillips
Member

What version of the CLI were you using that was running into these errors? Sounds like an upgrade might have resolved it, but it would be good to know for sure.

@joguSD
Contributor

joguSD commented Mar 14, 2018

Closing due to inactivity.

@joguSD joguSD closed this as completed Apr 24, 2018
@HamedAlemo

I have the same problem. Trying to copy all files from a bucket to another one with --recursive but at the end some of the files are not copied, and no error is reported.
I simply use:

aws s3 cp s3://old_bucket/folder1 s3://new_bucket/folder2/ --recursive

@davidhusselmann

I also have this issue. At the moment I have a feeling it might be to do with S3's eventual consistency model and me doing something weird when poking the files into S3. In case this helps anyone.

@mukteshkrmishra

Facing the same issue. ls shows all directories and files. cp or sync only copies some of them. No error. Also, there is no policy set up for inclusion or exclusion.

@jqadev

jqadev commented Dec 13, 2018

The same problem here, awscli==1.16.74
The aws s3 cp command with the --recursive parameter goes to the deepest directory and then back up only one level.
This way it omits many directories on higher levels.

@JordonPhillips please open this issue

@S0uLHun43r

The same problem here, awscli==1.16.186
aws s3 cp and aws s3 sync behave the same way.
But when I use s3cmd, it works. @JordonPhillips please open this issue

@frankyaorenjie

frankyaorenjie commented Feb 18, 2021

Maybe it is caused by the Python version. I had this problem on EC2 Linux with awscli 1.18 and Python 3.7. But after installing Python 3.7 and awscli 1.19 with pip, it works: aws s3 cp downloads all files.

@debu99

debu99 commented Aug 13, 2021

Same issue: it works on the command line but does not work in a bash script. No errors.

@kdaily
Member

kdaily commented Aug 13, 2021

@debu99 - if you are still experiencing this issue, can you please open up a new bug report with our template so we can get all of the information from you? A minimal reproducible example where it does work and does not work would be most useful. Thanks!

@debu99

debu99 commented Aug 14, 2021

I fixed the issue. It was due to a \r at the end of every line in the file-list file. I removed it and used aws s3 sync, and now my bash script works.
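
A file list saved with Windows (CRLF) line endings leaves a trailing `\r` on every line when it is read line by line in a script, so each entry no longer matches the intended name. A minimal sketch of reading such a list safely; the function name `read_file_list` and the one-entry-per-line layout are assumptions for illustration:

```python
def read_file_list(path):
    """Read one entry per line, dropping CR/LF line endings and blank lines."""
    # newline="" keeps the original line endings so they are stripped explicitly.
    with open(path, "r", encoding="utf-8", newline="") as fh:
        return [line.rstrip("\r\n") for line in fh if line.strip()]
```

The cleaned entries can then be used to build the per-file commands without the stray carriage return.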

thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this issue Feb 12, 2022
@usamec

usamec commented Jun 28, 2022

Check your logs.
They often contain something like:

@Asce099

Asce099 commented May 16, 2023

Screenshot 2023-05-16 114440
I am new at this. I tried using the command aws s3 cp s3://bucket_name/ E:/file_name --recursive. I get the download message, yet I can't find the files in the local folder. Why?
