Huge download number from Amazon Linux 1 #4278

Closed
methane opened this issue Jun 27, 2019 · 11 comments

Comments

@methane
Contributor

methane commented Jun 27, 2019

According to pypistats, awscli and its dependencies are the most downloaded packages. I am trying to find out who downloads awscli from PyPI so often.

I found a very interesting result: awscli seems to be downloaded heavily from Amazon Linux 1.

| date | kernel | downloads |
| --- | --- | --- |
| 2019-05-14 | 4.14.77-70.59.amzn1.x86_64 | 244827 |
| 2019-05-14 | 4.4.23-31.54.amzn1.x86_64 | 55211 |
| 2019-05-15 | 4.14.77-70.59.amzn1.x86_64 | 168414 |
| 2019-05-15 | 4.14.114-83.126.amzn1.x86_64 | 74483 |
| 2019-05-16 | 4.14.114-83.126.amzn1.x86_64 | 208952 |
| 2019-05-16 | 4.4.23-31.54.amzn1.x86_64 | 63206 |
| 2019-05-17 | 4.14.114-83.126.amzn1.x86_64 | 206870 |
| 2019-05-17 | 4.4.23-31.54.amzn1.x86_64 | 64965 |
| ... | ... | ... |
| 2019-06-17 | 4.14.114-83.126.amzn1.x86_64 | 211850 |
| 2019-06-17 | 4.4.23-31.54.amzn1.x86_64 | 56809 |
| 2019-06-18 | 4.14.123-86.109.amzn1.x86_64 | 167728 |
| 2019-06-18 | 4.14.114-83.126.amzn1.x86_64 | 67278 |
| ... | ... | ... |
| 2019-06-25 | 4.14.123-86.109.amzn1.x86_64 | 234755 |
| 2019-06-25 | 4.4.23-31.54.amzn1.x86_64 | 66793 |

I suspect that this huge number of downloads does not come from regular EC2 users, because:

  • Although Amazon Linux 2 was released a year ago, downloads from Amazon Linux 1 are not decreasing.
  • Downloads from Amazon Linux 1 seem to be much higher than downloads from Ubuntu, even though Ubuntu is popular too.

I'm sorry if I am wrong, but could you confirm whether some AWS service based on Amazon Linux 1 runs pip install awscli with a very old pip (6.1.1) about 200k times/day?
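
As a rough cross-check of the table above, the following sketch sums the listed downloads per day and recognizes Amazon Linux 1 kernels by the `amzn1` component of the release string. The rows are copied from the first three days of the table; the helper names `is_amazon_linux_1` and `daily_totals` are illustrative and not part of any PyPI tooling.

```python
from collections import defaultdict

# Rows copied from the table above: (date, kernel release, downloads).
ROWS = [
    ("2019-05-14", "4.14.77-70.59.amzn1.x86_64", 244827),
    ("2019-05-14", "4.4.23-31.54.amzn1.x86_64", 55211),
    ("2019-05-15", "4.14.77-70.59.amzn1.x86_64", 168414),
    ("2019-05-15", "4.14.114-83.126.amzn1.x86_64", 74483),
    ("2019-05-16", "4.14.114-83.126.amzn1.x86_64", 208952),
    ("2019-05-16", "4.4.23-31.54.amzn1.x86_64", 63206),
]


def is_amazon_linux_1(kernel_release):
    """Assumption: Amazon Linux 1 kernels carry an 'amzn1' dot-separated component."""
    return "amzn1" in kernel_release.split(".")


def daily_totals(rows):
    """Sum downloads per date, counting only Amazon Linux 1 kernels."""
    totals = defaultdict(int)
    for date, kernel, downloads in rows:
        if is_amazon_linux_1(kernel):
            totals[date] += downloads
    return dict(totals)


for date, total in sorted(daily_totals(ROWS).items()):
    print(date, total)
```

Even with only the two largest kernels per day, each daily total lands in the 240k to 300k range.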

@methane
Contributor Author

methane commented Jun 28, 2019

| date | kernel | python | pip |
| --- | --- | --- | --- |
| 2019-06-18~ | 4.14.123-86.109.amzn1.x86_64 | 2.7.16 | 6.1.1 |
| 2019-05-15~2019-06-17 | 4.14.114-83.126.amzn1.x86_64 | 2.7.16 | 6.1.1 |
| 2018-11-20~2019-05-15 | 4.14.77-70.59.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| 2018-08-17~2018-11-20 | 4.14.62-65.117.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| 2018-05-18~2018-08-21 | 4.14.33-51.37.amzn1.x86_64 | 2.7.14 | 6.1.1 |
| ~2018-05-14 | 4.14.26-46.32.amzn1.x86_64 | 2.7.13 | 6.1.1 |

  • pip 6.1.1 is very old, and the combination of Python 2.7.16, pip 6.1.1, and an amzn1 kernel is uncommon. For example, the latest Amazon Linux 1 preinstalls Python 2.7.16 and pip 9.0.3, while Amazon Linux 1 2017.03 preinstalls Python 2.7.12 and pip 6.1.1.
  • The Python and kernel versions changed several times, and each switch completed within one or two days.
  • About 200k downloads/day, even on weekends.

It seems very strange. I suspect these huge download numbers come from AWS itself or from a very large company's systems.

@justnance justnance assigned justnance and unassigned justnance Jul 1, 2019
@justnance justnance added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Jul 1, 2019
@methane
Contributor Author

methane commented Jul 10, 2019

I found that the "CloudWatch Logs Agent" downgrades pip to 6.1.1 and installs awscli!

I ran this query on BigQuery:

SELECT
  details.system.release,
  COUNT(*) AS cnt
FROM
  [the-psf:pypi.downloads20190709]
WHERE
  file.project = "pip"
  AND file.version = "6.1.1"
  AND details.implementation.version = "2.7.16"
GROUP BY
  details.system.release
ORDER BY
  cnt DESC

Result:

| details_system_release | cnt |
| --- | --- |
| 4.14.123-86.109.amzn1.x86_64 | 195311 |
| 4.14.109-80.92.amzn1.x86_64 | 3578 |
| 4.9.27-14.33.amzn1.x86_64 | 2348 |

Bingo! About 200k DL!
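
To put the "about 200k" figure in proportion, here is a small sketch that computes how much of the listed result the top kernel accounts for. It uses only the three rows shown above, so the share is relative to those rows, not to the full query output; `top_share` is an illustrative helper name.

```python
# Counts copied from the query result above (downloads of pip 6.1.1
# under Python 2.7.16 on 2019-07-09, grouped by kernel release).
RESULT = {
    "4.14.123-86.109.amzn1.x86_64": 195311,
    "4.14.109-80.92.amzn1.x86_64": 3578,
    "4.9.27-14.33.amzn1.x86_64": 2348,
}


def top_share(counts):
    """Return (key, fraction of the total) for the largest entry."""
    total = sum(counts.values())
    key = max(counts, key=counts.get)
    return key, counts[key] / total


kernel, share = top_share(RESULT)
print(f"{kernel}: {share:.1%} of the listed rows")
```

For these three rows the top kernel accounts for roughly 97% of the downloads, which is consistent with a single fleet running the same image.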

@methane
Contributor Author

methane commented Jul 10, 2019

I created a pull request to update the docs to use standalone mode.

However, many users already use the online install in their UserData.
Could you update the awslogs-agent-setup.py file to download dependencies from S3 instead of PyPI?

@stealthycoin
Contributor

Thanks for digging into this so much. I have raised this internally with the CloudWatch Logs team.

@stealthycoin stealthycoin added pr:work-in-progress This PR is a draft and needs further work. and removed pr:work-in-progress This PR is a draft and needs further work. labels Jul 11, 2019
@methane
Contributor Author

methane commented Jul 16, 2019

FWIW, the download count of awscli is still huge even after excluding downloads from awslogs-agent-setup.py.
It would be helpful to recommend the bundled installer over pip install.

sudo pip install awscli may conflict with system packages. The bundled installer is easier than manually setting up a virtual environment. Additionally, users get frozen dependency libraries, so a broken library update or a PyPI outage doesn't affect their server provisioning.

So the bundled installer is much better than pip for regular sysadmins.
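
For comparison, the manual virtual-environment route mentioned above might look like the sketch below. It is not the bundled installer; it assumes Python 3 with the standard `venv` module (the hosts discussed in this thread ran Python 2.7), and the function names and the example path are hypothetical.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create an isolated environment at env_dir and return its interpreter path."""
    venv.create(env_dir, with_pip=with_pip)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return Path(env_dir) / bin_dir / exe


def install_awscli(env_python):
    """Install awscli into the environment; this step still downloads from PyPI."""
    subprocess.check_call([str(env_python), "-m", "pip", "install", "awscli"])
```

A call such as `install_awscli(create_env("/opt/awscli-venv"))` keeps awscli away from system packages, but every run still depends on PyPI being reachable and on the latest dependency releases working, which is the drawback the bundled installer avoids.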

@pradyunsg

A gentle ping on this. Any updates?

@kyleknap kyleknap removed the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Aug 26, 2019
@pradyunsg

pradyunsg commented Oct 8, 2019

Pinging again, to see if folks are interested in taking this forward.

@methane
Contributor Author

methane commented Oct 17, 2019

May I write a patch for awslogs-agent-setup.py to download files from S3?

@methane
Contributor Author

methane commented Jan 15, 2020

I found that https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py has been updated to download dependencies from 'https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/AgentDependencies.tar.gz'.

I will close this issue later this week, after I confirm the change on PyPIStats.
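
The thread does not show how the updated script consumes the tarball, so the following is only a general illustration of installing from a pre-downloaded bundle instead of PyPI. It builds a pip command using pip's `--no-index` and `--find-links` options; the function name and the directory in the usage note are hypothetical, and the sketch assumes the bundle has already been unpacked into a local directory of wheels or sdists.

```python
import sys


def offline_install_command(bundle_dir, requirement):
    """Build a pip command that resolves packages only from bundle_dir.

    --no-index stops pip from contacting PyPI, and --find-links points it
    at a local directory containing the package files.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",
        "--find-links", str(bundle_dir),
        requirement,
    ]
```

Passing the result of `offline_install_command("/tmp/AgentDependencies", "awscli")` to `subprocess.check_call` would then install without producing any PyPI download.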

@methane
Contributor Author

methane commented Jan 16, 2020

Confirmed.

image

It affects the Python 2 vs 3 ratio of some packages. For example, these are the download stats of urllib3.

image

Thank you for fixing this.

@methane methane closed this as completed Jan 16, 2020
@methane
Contributor Author

methane commented Jan 29, 2020

I found there is still a huge number of downloads from pip 6.1.1.
Is there an installer like awslogs-agent-setup.py, but for awscli?

query:

SELECT
  file.project AS proj,
  COUNT(*) AS cnt
FROM
  `the-psf.pypi.downloads20200128`
WHERE
  details.installer.name = "pip"
  AND details.installer.version = "6.1.1"
GROUP BY
  proj
ORDER BY
  cnt DESC

Result:

| proj | cnt |
| --- | --- |
| botocore | 188069 |
| s3transfer | 184599 |
| urllib3 | 181705 |
| awscli | 179487 |
| six | 174167 |
| python-dateutil | 173112 |
| docutils | 172611 |
| pyasn1 | 170876 |
| jmespath | 169021 |
| colorama | 168216 |
| rsa | 167941 |
| pyyaml | 166260 |
| futures | 163741 |
| simplejson | 128451 |
| argparse | 128146 |
| ordereddict | 126778 |
| awscli-cwlogs | 25806 |
| boto3 | 25099 |
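
One way to read this result is to compare each project against awscli itself. The sketch below, using the counts from the table above, lists the projects downloaded less than half as often as awscli; the helper names and the 0.5 threshold are arbitrary choices for illustration.

```python
# Counts copied from the query result above (pip 6.1.1, 2020-01-28).
COUNTS = {
    "botocore": 188069, "s3transfer": 184599, "urllib3": 181705,
    "awscli": 179487, "six": 174167, "python-dateutil": 173112,
    "docutils": 172611, "pyasn1": 170876, "jmespath": 169021,
    "colorama": 168216, "rsa": 167941, "pyyaml": 166260,
    "futures": 163741, "simplejson": 128451, "argparse": 128146,
    "ordereddict": 126778, "awscli-cwlogs": 25806, "boto3": 25099,
}


def relative_to(counts, baseline):
    """Express every count as a fraction of the baseline project's count."""
    base = counts[baseline]
    return {name: n / base for name, n in counts.items()}


def below(counts, baseline, threshold):
    """Return the projects whose ratio to the baseline is under threshold."""
    ratios = relative_to(counts, baseline)
    return sorted(name for name, r in ratios.items() if r < threshold)


print(below(COUNTS, "awscli", 0.5))  # ['awscli-cwlogs', 'boto3']
```

Only awscli-cwlogs and boto3 fall below the threshold, at roughly 14% of the awscli count. That gap is what suggests most of these pip 6.1.1 installs of awscli do not come from the CloudWatch Logs agent setup.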
