
[BUG] cudf_udf nightly cudf import rmm failed #4277

Closed · pxLi opened this issue Dec 3, 2021 · 1 comment · Fixed by #4278
Labels: bug (Something isn't working), build (Related to CI / CD or cleanly building)

Comments

pxLi (Collaborator) commented Dec 3, 2021

Describe the bug
The conda install step resolves inconsistent nightly builds. Sometimes it pulls
cudf-22.02.00a211129 cuda_11_py38_g0f7c5328db_101
and sometimes
cudf-22.02.00a211202 cuda_11_py38_gb848dd5c9c_142
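A quick way to fail fast when the resolver drifts would be to check which build conda actually installed before the tests run, without importing cudf (the import itself is what breaks). A minimal sketch, assuming `conda list --json` is available in the test image; the expected build string here is just the known-good build from this report:

```python
# Hypothetical pre-test guard: verify which cudf nightly conda resolved
# before launching the cudf_udf tests. Querying `conda list` avoids
# importing cudf, since the import itself fails on the bad build.
import json
import subprocess

EXPECTED_BUILD = "cuda_11_py38_gb848dd5c9c_142"  # known-good build from this report

out = subprocess.run(
    ["conda", "list", "--json", "^cudf$"],
    check=True, capture_output=True, text=True,
).stdout
builds = {p["name"]: p.get("build_string", "") for p in json.loads(out)}
if builds.get("cudf") != EXPECTED_BUILD:
    raise SystemExit(
        f"cudf build {builds.get('cudf')!r} != expected {EXPECTED_BUILD!r}"
    )
```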

With build cuda_11_py38_g0f7c5328db_101, the test fails to import rmm:

21/12/03 00:59:34 ERROR Executor: Exception in task 5.0 in stage 1.0 (TID 11)
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:120)
	at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:136)
	at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:135)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
	at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:131)
	at org.apache.spark.sql.rapids.execution.python.GpuArrowEvalPythonExec.$anonfun$doExecuteColumnar$2(GpuArrowEvalPythonExec.scala:618)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/12/03 00:59:34 INFO CoarseGrainedExecutorBackend: Got assigned task 12
21/12/03 00:59:34 INFO Executor: Running task 5.1 in stage 1.0 (TID 12)
Traceback (most recent call last):
  File "/home/jenkins/agent/workspace/jenkins-rapids_cudf_udf-dev-github-21-cuda11.0/jars/rapids-4-spark_2.12-22.02.0-SNAPSHOT.jar/rapids/daemon.py", line 131, in manager
  File "/home/jenkins/agent/workspace/jenkins-rapids_cudf_udf-dev-github-21-cuda11.0/jars/rapids-4-spark_2.12-22.02.0-SNAPSHOT.jar/rapids/worker.py", line 37, in initialize_gpu_mem
    from cudf import rmm
  File "/opt/conda/lib/python3.8/site-packages/cudf/__init__.py", line 12, in <module>
    from cudf import api, core, datasets, testing
  File "/opt/conda/lib/python3.8/site-packages/cudf/datasets.py", line 5, in <module>
    from cudf._lib.transform import bools_to_mask
ImportError: /opt/conda/lib/python3.8/site-packages/cudf/_lib/transform.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN4cudf21generalized_masked_opERKNS_10table_viewERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_9data_typeEPN3rmm2mr22device_memory_resourceE
21/12/03 00:59:35 ERROR Executor: Exception in task 4.0 in stage 1.0 (TID 10)
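
For reference, the undefined symbol above demangles to cudf::generalized_masked_op(cudf::table_view const&, std::string const&, cudf::data_type, rmm::mr::device_memory_resource*), i.e. the cudf Python extension references a libcudf function that the installed libcudf does not export — consistent with the mixed nightly builds described above. A minimal sketch for demangling such symbols, assuming binutils' c++filt is on PATH:

```python
# Demangle the undefined symbol from the ImportError using binutils' c++filt
# (assumed to be on PATH). The symbol is copied verbatim from the traceback.
import subprocess

sym = (
    "_ZN4cudf21generalized_masked_opERKNS_10table_viewERKNSt7__cxx1112"
    "basic_stringIcSt11char_traitsIcESaIcEEENS_9data_typeEPN3rmm2mr"
    "22device_memory_resourceE"
)
print(subprocess.run(["c++filt", sym], capture_output=True, text=True).stdout.strip())
```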
pxLi added the bug and build labels on Dec 3, 2021
pxLi (Collaborator, Author) commented Dec 3, 2021

I am going to try using mamba to see if it resolves the inconsistent-versions issue.
