
Update pairwise_dist, mse and lambda in test for generating data with different rank #2670

Merged — 14 commits merged into pytorch:master on Sep 1, 2022

Conversation

@puhuk (Contributor) commented on Aug 23, 2022

Description: Update the `mean_pairwise_distance`, `mean_squared_error`, and `metrics_lambda` tests to generate data with different rank.

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

… different rank

Update `mean_pairwise_distance`, `mean_squared_error`, and `metrics_lambda` to add `assert` clauses.
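As a rough illustration of the intent (the helper name and the seed-by-rank pattern below are assumptions for this sketch, not the exact test diff), generating rank-dependent data in a distributed metric test can look like:

```python
import torch
import ignite.distributed as idist

def _make_rank_dependent_data(device, n_samples=100, n_classes=10):
    # Hypothetical helper: seed by process rank so every distributed
    # rank generates different inputs instead of identical copies.
    rank = idist.get_rank()
    torch.manual_seed(41 + rank)

    y_true = torch.randint(0, n_classes, size=(n_samples,), device=device)
    y_preds = torch.rand(n_samples, n_classes, device=device)

    # Assert clauses of this kind guard shape and device agreement
    # before the metric is updated.
    assert y_true.shape[0] == y_preds.shape[0]
    assert y_true.device == y_preds.device
    return y_true, y_preds
```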
@vfdev-5 (Collaborator) commented on Aug 31, 2022

@puhuk the XLA tests are failing, and the failures are related to your changes. Please take a look.

@vfdev-5 (Collaborator) left a review comment

Thanks @puhuk!

@vfdev-5 enabled auto-merge (squash) on Sep 1, 2022 at 15:21
@vfdev-5 merged commit 96f717b into pytorch:master on Sep 1, 2022
@vfdev-5 (Collaborator) commented on Sep 1, 2022

@puhuk this PR breaks the GPU tests: https://app.circleci.com/pipelines/github/pytorch/ignite/2733/workflows/7b587ea4-8495-4e63-abf8-aaf613507355/jobs/7944

Please fix it!

=================================== FAILURES ===================================
____________________________ test_distrib_nccl_gpu _____________________________

obj = tensor([[0.4265, 0.3351, 0.4854,  ..., 0.2016, 0.6155, 0.0631],
        [0.5099, 0.5548, 0.3666,  ..., 0.0475, 0.8877,... ..., 0.4133, 0.7024, 0.7195],
        [0.1103, 0.6870, 0.6060,  ..., 0.9267, 0.7757, 0.7151]],
       device='cuda:0')
method = 'argmax', args = (), kwds = {'axis': -1, 'out': None}
bound = <built-in method argmax of Tensor object at 0x7f21700964d0>

    def _wrapfunc(obj, method, *args, **kwds):
        bound = getattr(obj, method, None)
        if bound is None:
            return _wrapit(obj, method, *args, **kwds)
    
        try:
>           return bound(*args, **kwds)
E           TypeError: argmax() got an unexpected keyword argument 'axis'

/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py:57: TypeError

During handling of the above exception, another exception occurred:

distributed_context_single_node_nccl = {'local_rank': 0, 'rank': 0, 'world_size': 1}

    @pytest.mark.distributed
    @pytest.mark.skipif(not idist.has_native_dist_support, reason="Skip if no native dist support")
    @pytest.mark.skipif(torch.cuda.device_count() < 1, reason="Skip if no GPU")
    def test_distrib_nccl_gpu(distributed_context_single_node_nccl):
    
        device = idist.device()
        _test_distrib_integration(device)
>       _test_distrib_metrics_on_diff_devices(device)

tests/ignite/metrics/test_metrics_lambda.py:545: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/ignite/metrics/test_metrics_lambda.py:533: in _test_distrib_metrics_on_diff_devices
    f1_true = f1_score(y_true.ravel(), np.argmax(y_preds.reshape(-1, n_classes), axis=-1), average="macro")
<__array_function__ internals>:6: in argmax
    ???
/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py:1195: in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py:66: in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
/opt/conda/lib/python3.7/site-packages/numpy/core/fromnumeric.py:43: in _wrapit
    result = getattr(asarray(obj), method)(*args, **kwds)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = tensor([[0.4265, 0.3351, 0.4854,  ..., 0.2016, 0.6155, 0.0631],
        [0.5099, 0.5548, 0.3666,  ..., 0.0475, 0.8877,... ..., 0.4133, 0.7024, 0.7195],
        [0.1103, 0.6870, 0.6060,  ..., 0.9267, 0.7757, 0.7151]],
       device='cuda:0')
dtype = None

    def __array__(self, dtype=None):
        if has_torch_function_unary(self):
            return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
        if dtype is None:
>           return self.numpy()
E           TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

/opt/conda/lib/python3.7/site-packages/torch/_tensor.py:757: TypeError
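The root cause is visible in the chained traceback: `np.argmax` first tries the tensor's own `argmax` with an `axis=` keyword (PyTorch's `Tensor.argmax` takes `dim=`, hence the first `TypeError`), then falls back to `asarray(obj)`, which calls `Tensor.__array__` and refuses to convert a `cuda:0` tensor to numpy. A minimal sketch of the usual fix, assuming `y_true` and `y_preds` both live on the GPU as in the failing test (variable names copied from the traceback; the actual follow-up patch may differ):

```python
import numpy as np
from sklearn.metrics import f1_score

# Copy the tensors to host memory before handing them to numpy/sklearn;
# np.argmax cannot operate on a CUDA tensor directly.
y_true_np = y_true.cpu().numpy().ravel()
y_preds_np = y_preds.cpu().numpy().reshape(-1, n_classes)
f1_true = f1_score(y_true_np, np.argmax(y_preds_np, axis=-1), average="macro")
```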

@puhuk mentioned this pull request on Sep 1, 2022