Use tf.function for list column operations #938

edknv · 2022-12-26T19:13:32Z

Goals ⚽

In TensorFlow >= 2.10, there seems to be a race condition or other thread-safety issue when the Dataloader loads list columns. The errors described in the above issue occur non-deterministically: if the list column tensors have been fully copied by the time TensorFlow begins execution, the tests pass; if the copy is incomplete, the tests fail. This PR proposes a workaround: apply the @tf.function decorator to the methods that involve list columns.

Implementation Details 🚧

Some observations:

This does not happen on CPU, and only happens on GPUs.
This does not happen in graph mode, and only happens in eager mode.

The above two observations suggest two workarounds:

Move the problematic operations to CPU (by using with tf.device("CPU")).
Take the problematic operations out of eager execution and convert them to graph executions (by using tf.function).
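A minimal sketch of the two workarounds, for illustration only. The helper names (list_column_to_dense, to_dense_on_cpu, to_dense_in_graph) are hypothetical and are not the methods changed in this PR; the sketch assumes a list column arrives as flat values plus row lengths and is converted to a ragged tensor.

```python
import tensorflow as tf


def list_column_to_dense(values, row_lengths):
    # Hypothetical stand-in for a list column operation:
    # build a ragged tensor from flat values, then pad it to a dense tensor.
    ragged = tf.RaggedTensor.from_row_lengths(values, row_lengths)
    return ragged.to_tensor()


def to_dense_on_cpu(values, row_lengths):
    # Workaround 1: pin the operation to the CPU.
    with tf.device("CPU"):
        return list_column_to_dense(values, row_lengths)


@tf.function
def to_dense_in_graph(values, row_lengths):
    # Workaround 2: trace the operation into a graph instead of running it eagerly.
    return list_column_to_dense(values, row_lengths)


values = tf.constant([1, 2, 3, 4, 5])
row_lengths = tf.constant([2, 0, 3])

dense_cpu = to_dense_on_cpu(values, row_lengths)
dense_graph = to_dense_in_graph(values, row_lengths)
```

Both calls return the same padded tensor; the difference is only where and how the operation executes. The tf.function variant keeps the work on the default device, which is one reason it may be preferable on GPU.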

The second workaround seems to be the superior approach. One could also argue that all methods written in eager mode, not just the ones involving list columns, should be wrapped with @tf.function for efficiency. However, I consider that out of scope for this PR. This PR contains the minimum changes required for the unit tests to complete successfully; the @tf.function decorator is applied only to the methods that are failing unit tests in TensorFlow 2.10.

This PR also includes a minor fix for cudf.RangeIndex no longer having an empty property; the fix was copied from #904 (comment).

Testing Details 🔍

Raised the upper bound on tensorflow from 2.10 to 2.11 in GPU CI / gpu-ci (pull_request).
This issue is currently blocking the 22.12 release pipeline. Tested manually with nvcr.io/nvstaging/merlin/tmp-merlin-tensorflow-stg:22.12 (which has tensorflow 2.10).

github-actions · 2022-12-26T19:19:11Z

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-938

EvenOldridge

Looks good. Wrapping everything in a function for graph optimization may be something we want to explore in the future. Please create a brief issue describing what's proposed and why.

Use tf.function for list column operations

b74aa69

edknv force-pushed the ragged_tf_function branch from 88629bc to b74aa69 Compare December 26, 2022 21:34

edknv mentioned this pull request Dec 27, 2022

Use tf.function for list column operations NVIDIA-Merlin/dataloader#89

Merged

edknv self-assigned this Dec 27, 2022

edknv added bug Something isn't working area/tensorflow ci labels Dec 27, 2022

edknv added this to the Merlin 22.12 milestone Dec 27, 2022

edknv marked this pull request as ready for review December 27, 2022 00:54

rnyak requested review from sararb and marcromeyn December 27, 2022 18:29

EvenOldridge approved these changes Dec 27, 2022

edknv merged commit 73d650f into NVIDIA-Merlin:main Dec 27, 2022

edknv deleted the ragged_tf_function branch December 27, 2022 19:02

edknv added a commit that referenced this pull request Dec 27, 2022

Use tf.function for list column operations (#938)

a4b0d14

edknv mentioned this pull request Dec 27, 2022

[BUG] Exception in model when using ragged tensors with tensorflow 2.10.0 NVIDIA-Merlin/dataloader#74

Closed

gabrielspmoreira mentioned this pull request Feb 6, 2023

Fixes support of sequential continuous features for sequential and non-sequential models #969

Merged

edknv mentioned this pull request Feb 26, 2023

Wrap SequencePredictLast with tf.function #1001

Merged
