Update for CI #2741

Merged
merged 4 commits into from
Oct 15, 2022

Conversation

puhuk
Contributor

@puhuk puhuk commented Oct 14, 2022

Description: Update the Python, PyTorch, and CUDA versions used in CI

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@github-actions github-actions bot added the ci CI label Oct 14, 2022
@vfdev-5
Collaborator

vfdev-5 commented Oct 14, 2022

@puhuk can you try running the workflow .github/workflows/pytorch-version-tests.yml on your fork to see whether it passes?

@puhuk
Contributor Author

puhuk commented Oct 14, 2022

@vfdev-5
Collaborator

vfdev-5 commented Oct 14, 2022

https://github.com/pytorch/ignite/actions/workflows/pytorch-version-tests.yml?query=branch%3Aci_version on your fork

- pytorch-version: 1.3.1
  python-version: 3.10
- pytorch-version: 1.4.0
  python-version: 3.10
Collaborator

@vfdev-5 vfdev-5 Oct 14, 2022

Have you checked whether 1.5.1, 1.6.0, etc. have Python 3.10 binaries?

Contributor Author

Let me check.
When I test with GitHub Actions, it runs with Python 3.1, not 3.10. Why is the trailing 0 not recognized?
https://github.com/puhuk/ignite/actions/runs/3249191243/jobs/5331308987

Collaborator

@vfdev-5 vfdev-5 Oct 14, 2022

It should be a string, "3.10". Unquoted, YAML reads 3.10 as the number 3.1.

python-version: [3.7, 3.8, 3.9, "3.10"]
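The quoting issue above can be illustrated with a small Python sketch. It is only a rough stand-in for how a YAML loader resolves an unquoted scalar (the helper `yaml_like_scalar` is hypothetical and does not call any YAML library), but it shows why 3.10 collapses to 3.1 once it is treated as a number while 3.7, 3.8, and 3.9 survive unchanged.

```python
def yaml_like_scalar(text: str):
    """Simplified, hypothetical model of unquoted-scalar resolution:
    try int, then float, otherwise keep the text as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


unquoted = yaml_like_scalar("3.10")  # treated as a number
quoted = "3.10"                      # a quoted scalar stays a string

print(repr(unquoted))  # the float no longer carries the trailing zero
print(repr(quoted))    # the string keeps "3.10" intact
print(repr(yaml_like_scalar("3.9")))  # 3.9 round-trips, so the bug only shows at 3.10
```

Real YAML resolvers apply more rules than this, so treat the helper as an illustration of the float round-trip only.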

Contributor Author

@puhuk puhuk Oct 14, 2022

I checked this page and excluded every PyTorch version paired with Python 3.10 except 1.11.0 and 1.12.1:

https://download.pytorch.org/whl/torch_stable.html
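A check like this could be scripted roughly as follows. This is a minimal sketch that assumes wheel filenames carry a CPython tag such as `cp310`; the names in `SAMPLE_WHEELS` are made-up examples in that style, not data fetched from the index, and `versions_with_tag` is a hypothetical helper.

```python
import re

# Matches e.g. "torch-1.12.1-cp310-..." or "torch-1.12.1%2Bcpu-cp310-...".
WHEEL_RE = re.compile(
    r"torch-(?P<version>\d+\.\d+\.\d+)(?:(?:%2B|\+)[\w.]+)?-(?P<pytag>cp\d+)-"
)

# Illustrative filenames only (assumption, not copied from the real index).
SAMPLE_WHEELS = [
    "torch-1.10.0-cp39-cp39-manylinux1_x86_64.whl",
    "torch-1.11.0-cp310-cp310-manylinux1_x86_64.whl",
    "torch-1.12.1-cp310-cp310-manylinux1_x86_64.whl",
    "torch-1.12.1-cp39-cp39-manylinux1_x86_64.whl",
]


def versions_with_tag(filenames, pytag):
    """Return the sorted torch versions that ship a wheel for the given tag."""
    found = set()
    for name in filenames:
        match = WHEEL_RE.search(name)
        if match and match.group("pytag") == pytag:
            found.add(match.group("version"))
    return sorted(found)


print(versions_with_tag(SAMPLE_WHEELS, "cp310"))
```

Any version missing from the result for `cp310` would be a candidate for exclusion from the Python 3.10 row of the matrix.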

@vfdev-5
Collaborator

vfdev-5 commented Oct 14, 2022

Can you run the workflow on your fork and post a link to the run status here?

@puhuk
Contributor Author

puhuk commented Oct 14, 2022

It fails on PyTorch 1.11.0 with Python 3.10:
https://github.com/puhuk/ignite/actions/runs/3250367041/jobs/5333960054

@vfdev-5
Collaborator

vfdev-5 commented Oct 14, 2022

> It fails on PyTorch 1.11.0 with Python 3.10 https://github.com/puhuk/ignite/actions/runs/3250367041/jobs/5333960054

Let's skip that combination and add a comment explaining why.
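The idea of skipping a single combination with a recorded reason can be sketched in Python. The version lists and reason strings below are illustrative assumptions, not the contents of the actual workflow file; the sketch only mirrors the general shape of a test matrix with an exclude list.

```python
from itertools import product

# Illustrative subsets (assumption); note "3.10" is kept as a string.
python_versions = ["3.7", "3.8", "3.9", "3.10"]
pytorch_versions = ["1.10.0", "1.11.0", "1.12.1"]

# (pytorch, python) -> why the combination is skipped (hypothetical wording).
excluded = {
    ("1.10.0", "3.10"): "no Python 3.10 binaries for this PyTorch version",
    ("1.11.0", "3.10"): "test run fails on CI, see the PR discussion",
}

# Expand the full matrix, then drop the excluded pairs.
jobs = [
    (pytorch, python)
    for pytorch, python in product(pytorch_versions, python_versions)
    if (pytorch, python) not in excluded
]

for pytorch, python in jobs:
    print(f"pytorch={pytorch} python={python}")
```

Keeping the reason next to each excluded pair serves the same purpose as the requested comment: a later reader can tell whether the skip is still needed.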

@puhuk
Contributor Author

puhuk commented Oct 15, 2022

I removed 1.11.0 with Python 3.10 and the tests pass:
https://github.com/puhuk/ignite/actions/runs/3254402175

The error seems to occur when running torchrun; I'll share the cause as soon as I figure it out.

subprocess.CalledProcessError: Command '['torchrun', '--nproc_per_node=4', '/data/projects/sangho/projects/oss/ignite/tests/ignite/distributed/check_idist_parallel.py', '--backend=gloo', '--init_method=file:///tmp/tmpqgwyyg02/shared']' died with <Signals.SIGSEGV: 11>.
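For readers unfamiliar with this message: `subprocess` reports a child killed by a signal as a negative return code, and `CalledProcessError` renders it as "died with ...". The sketch below reproduces that shape on a POSIX system with a hypothetical child process that sends SIGSEGV to itself; it stands in for the crashing `torchrun` command and says nothing about the actual cause of the crash.

```python
import signal
import subprocess
import sys

# Hypothetical child: a one-liner that kills itself with SIGSEGV (POSIX only).
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_checked(cmd):
    """Run cmd; return (returncode, signal name or None, error text or None)."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        # A negative return code means the child was terminated by that signal.
        sig = signal.Signals(-err.returncode).name if err.returncode < 0 else None
        return err.returncode, sig, str(err)
    return 0, None, None


returncode, signal_name, message = run_checked([sys.executable, "-c", CHILD])
print(returncode, signal_name)
print(message)
```

Checking the sign of `returncode` distinguishes a crash (signal) from an ordinary non-zero exit, which helps when triaging whether a failure comes from the test itself or from a native library.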

@vfdev-5
Collaborator

vfdev-5 commented Oct 15, 2022

Where do you see this error?

Collaborator

@vfdev-5 vfdev-5 left a comment

Thanks @puhuk !

@puhuk
Contributor Author

puhuk commented Oct 15, 2022

I see it when I reproduce the failure on my own server.

@vfdev-5 vfdev-5 merged commit 32ef11d into pytorch:master Oct 15, 2022
@puhuk puhuk deleted the ci_version branch October 21, 2022 02:41
Labels
ci CI
2 participants