cuml.dask.decomposition.PCA: Fails with 'AttributeError:' #4027

Closed
rilango opened this issue Jul 2, 2021 · 1 comment
Labels
bug Something isn't working

rilango commented Jul 2, 2021

Describe the bug
With the dataset at https://drive.google.com/file/d/15aRM1_KtSjiD7wGKAYA7bmL6vRSJvsXH/view?usp=sharing, the transform step of cuml.dask.decomposition.PCA fails with the following trace.


distributed.worker - WARNING - Compute Failed
Function:  _transform_func
args:      (PCAMG(),       0    1    2    3    4    5    6    ...  505  506  507  508  509  510  511
0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  1.0
1     0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  1.0
2     0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  1.0
3     0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  1.0
4     0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  1.0
...   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
9995  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  1.0  0.0  0.0  0.0  0.0
9996  0.0  0.0  0.0  0.0  0.0  0.0  1.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0
9997  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0
9998  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0
9999  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0  0.0  0.0  0.0  0.0

[10000 rows x 512 columns])
kwargs:    {}
Exception: AttributeError()

INFO:cuchemcommon.utils.logger:### Runtime pca time (hh:mm:ss.ms) 0:00:02.120781
[1625256148.660650] [cuchemUI:24   :0]           sock.c:451  UCX  ERROR recv(fd=62) failed: Connection reset by peer
Traceback (most recent call last):
  File "/workspace/cuchem//startdash.py", line 382, in <module>
    main()
  File "/workspace/cuchem//startdash.py", line 378, in main
    Launcher()
  File "/workspace/cuchem//startdash.py", line 95, in __init__
    getattr(self, args.command)()
  File "/workspace/cuchem//startdash.py", line 343, in analyze
    mol_df = workflow.cluster()
  File "/workspace/cuchem/cuchem/wf/cluster/gpukmeansumap.py", line 186, in cluster
    self)
  File "/opt/conda/envs/rapids/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/workspace/cuchem/cuchem/wf/cluster/gpukmeansumap.py", line 52, in _
    return _gpu_cluster_wrapper(embedding, n_pca, self)
  File "/opt/conda/envs/rapids/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/workspace/cuchem/cuchem/wf/cluster/gpukmeansumap.py", line 65, in _
    return self._cluster(embedding, n_pca)
  File "/workspace/cuchem/cuchem/wf/cluster/gpukmeansumap.py", line 114, in _cluster
    embedding = self.pca.transform(embedding)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/dask/decomposition/pca.py", line 210, in transform
    delayed=delayed)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/dask/common/base.py", line 340, in _transform
    **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/dask/common/base.py", line 311, in _run_parallel_func
    output = dask.dataframe.from_delayed(preds)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/dask/dataframe/io/io.py", line 592, in from_delayed
    meta = delayed(make_meta)(dfs[0]).compute()
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/dask/base.py", line 285, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/dask/base.py", line 567, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py", line 2674, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py", line 1989, in gather
    asynchronous=asynchronous,
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py", line 852, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 354, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/utils.py", line 337, in f
    result[0] = yield future
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/client.py", line 1848, in _gather
    raise exception.with_traceback(traceback)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/dask/common/base.py", line 432, in _transform_func
    return model.transform(data, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.7/site-packages/cuml/internals/api_decorators.py", line 586, in inner_get
    ret_val = func(*args, **kwargs)
  File "cuml/decomposition/pca.pyx", line 689, in cuml.decomposition.pca.PCA.transform
  File "cuml/common/base.pyx", line 270, in cuml.common.base.Base.__getattr__
AttributeError

Steps/Code to reproduce bug

!pip install tables

from cuml.dask.decomposition import PCA as cuDaskPCA
import cudf
import dask_cudf
import cupy
from dask.distributed import Client, LocalCluster
from dask_cuda import initialize, LocalCUDACluster
from dask_cuda.local_cuda_cluster import cuda_visible_devices
from dask_cuda.utils import get_n_gpus

enable_tcp_over_ucx = True
enable_nvlink = True
enable_infiniband = True

initialize.initialize(create_cuda_context=True,
                      enable_tcp_over_ucx=enable_tcp_over_ucx,
                      enable_nvlink=enable_nvlink,
                      enable_infiniband=enable_infiniband)
n_gpu = get_n_gpus()

device_list = cuda_visible_devices(1, range(n_gpu)).split(',')
CUDA_VISIBLE_DEVICES = list(map(lambda x : int(x), device_list))

cluster = LocalCUDACluster(protocol="ucx",
                           dashboard_address=':8787',
                           CUDA_VISIBLE_DEVICES=CUDA_VISIBLE_DEVICES,
                           enable_tcp_over_ucx=enable_tcp_over_ucx,
                           enable_nvlink=enable_nvlink,
                           enable_infiniband=enable_infiniband)

client = Client(cluster)
client.run(cupy.cuda.set_allocator)

embedding = cudf.read_hdf('/data/test.h5', 'test')
embedding = dask_cudf.from_cudf(embedding, npartitions=1).reset_index()

pca = cuDaskPCA(client=client, n_components=7)
pca.fit(embedding)
embedding = pca.transform(embedding)

Expected behavior
Create result dataset without error.

Environment details (please complete the following information):

  • Environment location: Docker
  • Linux Distro/Architecture: Ubuntu 20.04.2 LTS
  • GPU Model/Driver: Quadro P5000 Driver Version: 465.19.01
  • CUDA: 11.3
  • Method of cuDF & cuML install: Docker
    • If method of install is [Docker], provide docker pull & docker run commands used
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 -v /home/rilango/testdata/:/data  rapidsai/rapidsai-core:21.06-cuda11.2-runtime-ubuntu18.04-py3.7 bash
@rilango rilango added ? - Needs Triage Need team to review and classify bug Something isn't working labels Jul 2, 2021
@hcho3 hcho3 removed the ? - Needs Triage Need team to review and classify label Jul 2, 2021
@lowener lowener self-assigned this Jul 5, 2021

lowener commented Jul 5, 2021

I was able to reproduce this error. The AttributeError is raised when n_components is looked up during transform. This has been resolved in PR #3912.
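
For readers wondering why the trace ends in a bare "AttributeError" with no message: the last frame is Base.__getattr__, and a __getattr__ hook that raises the exception class without arguments produces exactly that. The sketch below is only an illustration of the mechanism. SketchBase and SketchPCA are hypothetical stand-ins, not cuML's actual classes, and the real lookup logic in cuML is likely more involved.

```python
class SketchBase:
    """Hypothetical stand-in for an estimator base class; not cuML's code."""

    def __getattr__(self, name):
        # __getattr__ runs only after normal attribute lookup has failed.
        # Raising the bare class gives an exception with an empty message.
        raise AttributeError


class SketchPCA(SketchBase):
    def transform(self, data):
        # n_components was never set on this instance, so the lookup
        # falls through to SketchBase.__getattr__.
        return data[: self.n_components]


try:
    SketchPCA().transform([1, 2, 3])
except AttributeError as exc:
    error_repr = repr(exc)
    print(error_repr)
```

The printed representation carries no attribute name, which matches the "Exception: AttributeError()" line in the worker log.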

With that fix applied, your code works on my side, so try updating to the latest version of cuml (branch-21.08).
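
A quick way to tell whether an environment is likely to include the fix is to compare the installed version against 21.08. This is a minimal sketch: has_pca_fix is a hypothetical helper, and the (21, 8) threshold is an assumption based only on the branch name mentioned above, not on a confirmed release note.

```python
def has_pca_fix(version: str, fixed_in=(21, 8)) -> bool:
    """Return True if `version` (e.g. "21.06.00") is at least `fixed_in`.

    Hypothetical helper; the default threshold is an assumption.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= fixed_in


# With cuML installed, pass cuml.__version__ instead of a literal string.
print(has_pca_fix("21.06.00"))  # version of the Docker image in this report
print(has_pca_fix("21.08.00"))
```

Only the major and minor components are compared, so suffixes after the second dot are ignored.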

@lowener lowener closed this as completed Jul 16, 2021