Update VM #4

Open
8 of 13 tasks
shntnu opened this issue May 1, 2020 · 12 comments

Comments

@shntnu
Member

shntnu commented May 1, 2020

  • cytominer_ami.json: update ami_name (latest Ubuntu) and related items in builders
  • docker.sh: update to latest instructions
  • efs.sh: nothing
  • init.sh: nothing
  • python.sh: specify versions for everything, use cytominer-database from PyPI. Add back Python 2
  • r.sh: specify versions for everything, figure out how to do that for R packages, use cytominer on CRAN
  • s3.sh: update s3fs-fuse installation procedure
  • tools.sh: use apt instead of apt-get
  • README.md: explain why we need to mount S3, and any other FAQs
  • all: wrap column to 80
  • Figure out why an instance created with this AMI gives *** System restart required *** on launch. Maybe read hashicorp/packer#4555 (comment), "New provisionner: posix-restart"
  • See if there is anything to be done here: https://github.com/cytomining/cytominer-vm/pull/5/files#r607784533. @bethac07 had followed up with: "So I added sudo mount -a to the bashrc as well and that makes the bucket auto-mount, so in that sense it's 'fixed', but I would be curious to understand how to make it work the way it does in the AMI."
  • Read this https://serverfault.com/a/261807 and make sure we are consistent with the roles of the three files.

Notes:
@connersk said: "Yup, used https://github.com/cytomining/cytominer-vm but I had to tweak a few things. The VM runs Ubuntu 14.04, which is no longer supported, so I updated it to Ubuntu 18.04, changed the R and Python installations, and used a different means of creating a Python 3 virtual environment."
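
For the "System restart required" item in the task list above, here is a minimal sketch of how a provisioning script could check for the condition. On Ubuntu the login banner is driven by the flag file /var/run/reboot-required, which package upgrades create. The helper name needs_reboot and its optional path argument are hypothetical, added only so the check can be exercised against a temporary file.

```shell
# Sketch: succeed when the Ubuntu "restart required" flag file exists.
# needs_reboot and its optional argument are hypothetical, not part of the repo.
needs_reboot() {
  flag_file="${1:-/var/run/reboot-required}"
  [ -f "$flag_file" ]
}

# Possible use near the end of provisioning (left commented out on purpose):
# if needs_reboot; then
#   echo "Reboot required before the AMI is captured"
#   cat /var/run/reboot-required.pkgs 2>/dev/null || true  # packages that asked for it
# fi
```

If the check succeeds at the end of the build, the image was presumably captured after an upgrade without a reboot, which may be what the linked Packer discussion addresses.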

@shntnu shntnu mentioned this issue May 3, 2020
@shntnu
Member Author

shntnu commented May 3, 2020

@bethac07 I have updated the provisioning script for the DCP Control Node by looking up instructions on the [wiki](https://github.com/CellProfiler/Distributed-CellProfiler/wiki/Before-you-get-started%3A-setting-up).

I have copied it nearly verbatim here https://github.com/cytomining/cytominer-vm/blob/issues/4/python.sh#L68 so I think we should be all set, but can you confirm that there's nothing extra you do to set up the node that is not documented in the DCP wiki?

Also, a minor question: do you know why you do `pip install awscli --ignore-installed six` and not `pip install awscli`?

@bethac07
Member

bethac07 commented May 4, 2020

DCP isn't tested in Python3 yet.

@shntnu
Member Author

shntnu commented May 4, 2020

> DCP isn't tested in Python3 yet.

Ah I missed that. I think I had confused it for the fact that CPv3 is now in DCP.
I assume it is non-trivial to do so? If so I'll re-add Python2 support in this VM (I had some ssl issues in doing so but should be easily fixable)

PS - I just saw DistributedScience/Distributed-CellProfiler#83, so it's certainly more than a few hours of work. I'll go ahead and add back Python 2.

@bethac07
Member

bethac07 commented May 4, 2020 via email

@shntnu
Member Author

shntnu commented May 4, 2020

> I don't think there's any urgent need to update the VM until that is ready.

Sounds good. I'll leave this PR open

@bethac07
Member

DCP now does work on Python 3 (or 2, still). Only dependency for the login node is boto3

@shntnu shntnu linked a pull request Mar 2, 2021 that will close this issue
@shntnu
Member Author

shntnu commented Mar 2, 2021

@BolekZapiec asked:

> any advice as to what Ubuntu/python are most up to date but still functional to make it as future proof as possible? I’ve gotten it working on 18.04, any experience with 20.04? Is python 2 still needed?

  1. Python 2 is indeed still needed, because of https://github.com/broadinstitute/pe2loaddata
  2. Re Python 3 version, what's the latest version you'd recommend for DCP @bethac07?
  3. Re 20.04 – I haven't tested it but please do create it in 20.04 @BolekZapiec and we can test it out

@bethac07
Member

bethac07 commented Mar 2, 2021

I haven't tested DCP on Python 3.9, but IIRC it should support anything 2.7-3.8, and I have no reason to think it doesn't support 3.9. It's not a huge codebase, and the only dependency now is boto3.

I think we can probably pretty easily python3-ize pe2loaddata, I can prioritize that if need be.

@shntnu
Member Author

shntnu commented Apr 6, 2021

@BolekZapiec – just checking whether you tried working further with this VM and whether you have any notes to share.

@bethac07
Member

bethac07 commented Apr 6, 2021

Maybe it's outside the scope of this, but I think it might be worth explaining the security group setup for the EFS and how to select an IAM role that allows bucket access (whether we're using it at build rather than runtime), if only for the benefit of our future selves, because it definitely took me a while to figure out the EFS setup.

@bethac07
Member

bethac07 commented Apr 6, 2021

Also, this assumes an R-cytominer world. Do we currently live in a py-cytominer world? If so, we should add in whatever we need for that.

@BolekZapiec

@shntnu Thanks for the reminder - I got sidetracked with getting the ECS (finally!) working on our internal subnet and forgot to update this. Below are my notes on the necessary changes.

tools.sh

change
mysql-client-core-5.5
to
mysql-client  # 5.5 is not supported starting with 16.04

change
libreadline6 libreadline6-dev
to
libreadline8 libreadline-dev

Change
sudo apt-get install -y python python-dev python-pip python-pip python-setuptools
to
sudo apt-get install -y python-is-python2 python-dev-is-python2 python3-pip python-setuptools

python.sh
Ubuntu 20.04 includes python3, and global packages should be installed with apt

add
sudo apt install python3-pip
curl https://bootstrap.pypa.io/2.7/get-pip.py --output get-pip.py
sudo python2 get-pip.py

change
pyenv install 3.5.1
to
pyenv install 3.8.0

change
pyenv install 2.7.12
to (for compatibility with the libssl in 20.04; 18.04 still had an older libssl that 2.7.12 works with)
pyenv install 2.7.17
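
The libssl note above could be captured in python.sh so that one script serves several Ubuntu releases. This is only a sketch: py2_version_for_release is a hypothetical helper, and defaulting every release other than 14.04 and 18.04 to 2.7.17 is an assumption, not something tested in this thread.

```shell
# Sketch: choose the Python 2.7 patch release to build with pyenv.
# py2_version_for_release is a hypothetical helper, not part of python.sh.
py2_version_for_release() {
  case "$1" in
    14.04|18.04) echo "2.7.12" ;;  # older libssl, 2.7.12 still builds (per the notes)
    *)           echo "2.7.17" ;;  # assumption: newer releases need 2.7.17
  esac
}

# Possible use (left commented out; needs pyenv and lsb_release):
# pyenv install "$(py2_version_for_release "$(lsb_release -rs)")"
```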

docker.sh
change
sudo sh -c 'echo "deb https://apt.dockerproject.org/repo ubuntu-trusty main" > /etc/apt/sources.list.d/docker.list'
to
sudo apt-get install apt-transport-https ca-certificates curl gnupg
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

change
sudo apt-get install -y linux-image-extra-$(uname -r)
to
sudo apt-get install -y linux-modules-extra-$(uname -r)

change
sudo apt-get install -y docker-engine
to
sudo apt-get install docker.io

delete
sudo groupadd docker
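
As a sketch of how the repository step above might look inside docker.sh, the source line can be built by a small function so the Ubuntu codename is not hard-coded. docker_repo_line is a hypothetical helper; the keyring path and URL are the ones from the notes above.

```shell
# Sketch: print the apt source line for Docker's repository.
# docker_repo_line is a hypothetical helper, not part of docker.sh.
docker_repo_line() {
  codename="$1"
  keyring="/usr/share/keyrings/docker-archive-keyring.gpg"
  echo "deb [arch=amd64 signed-by=${keyring}] https://download.docker.com/linux/ubuntu ${codename} stable"
}

# Possible use (left commented out; needs root and lsb_release):
# docker_repo_line "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```

One thing that may be worth checking when re-creating the VM: docker.io is the package from Ubuntu's own archive, while Docker's repository provides docker-ce, so the repository step might be unnecessary if docker.io stays the install target.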

cytominer_ami.json

change
"ami_name": "cytomining/images/hvm-ssd/cytominer-ubuntu-trusty-14.04-amd64-server-{{timestamp}}",
to
"ami_name": "cytomining/images/hvm-ssd/cytominer-ubuntu-focal-20.04-amd64-server-{{timestamp}}",

change
"name": "ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-",
to
"name": "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-",
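
Both cytominer_ami.json edits above replace the same substring, so they could be applied in one pass. The following is a minimal sketch; retarget_to_focal and the output file name in the usage comment are hypothetical.

```shell
# Sketch: rewrite the Ubuntu release in Packer template text read from stdin.
# retarget_to_focal is a hypothetical helper, not part of the repo.
retarget_to_focal() {
  sed 's/ubuntu-trusty-14\.04/ubuntu-focal-20.04/g'
}

# Possible use (left commented out; the output file name is an assumption):
# retarget_to_focal < cytominer_ami.json > cytominer_ami.focal.json
```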

For S3, for some reason the command in fstab didn't work for me. Instead, I first needed to generate a credentials file:
echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ${HOME}/.passwd-s3fs
chmod 600 ${HOME}/.passwd-s3fs
then mount with:
s3fs mybucket /path/to/mountpoint -o passwd_file=${HOME}/.passwd-s3fs
or debug with:
s3fs mybucket /path/to/mountpoint -o passwd_file=${HOME}/.passwd-s3fs -o dbglevel=info -f -o curldbg
I'm guessing this has the same cause as my ecsinstance IAM role being unable to access the S3 bucket (it needed kmsdecryptgenerate), so I'll try adding it to ec3-spot-fleet-role and see if the original S3 mount method works.
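
The credential-file workaround above could be wrapped up roughly as follows. This is a sketch under assumptions: write_s3fs_passwd and s3fs_fstab_line are hypothetical helpers, and the fstab line follows the fuse.s3fs form from the s3fs-fuse README, not the entry currently in s3.sh, which is not shown in this thread.

```shell
# Sketch: create the s3fs password file with owner-only permissions.
# $1 = access key id, $2 = secret access key, $3 = output file
# (hypothetical helper, not part of s3.sh)
write_s3fs_passwd() {
  ( umask 077 && printf '%s:%s\n' "$1" "$2" > "$3" )
  chmod 600 "$3"  # s3fs rejects password files readable by others
}

# Sketch: print an fstab entry that mounts the bucket with that password file.
# $1 = bucket, $2 = mount point, $3 = password file
# The option list is an assumption based on the s3fs-fuse README.
s3fs_fstab_line() {
  echo "$1 $2 fuse.s3fs _netdev,allow_other,passwd_file=$3 0 0"
}

# Possible use (left commented out; needs real credentials and root):
# write_s3fs_passwd "$ACCESS_KEY_ID" "$SECRET_ACCESS_KEY" "${HOME}/.passwd-s3fs"
# s3fs_fstab_line mybucket /path/to/mountpoint "${HOME}/.passwd-s3fs" | sudo tee -a /etc/fstab
```

If the IAM role fix works out, the password file would not be needed and the role-based mount could stay as it is.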

Before pushing these changes, I'd like to re-create a cytominer VM with all of them implemented, so that I can be sure there were no additional debugging steps I did post hoc that made things work and that the edits listed are comprehensive.
