Remove sparse tensor output type for list features #103

oliverholworthy · 2023-02-20T12:59:33Z

Remove sparse tensor output type for list features

Motivation

The value count attributes of columns in dataset.schema currently control the output type of list columns.

With the addition of shape to the schema (NVIDIA-Merlin/Merlin#813), value counts will be specified more often. This will result in unexpected output types for columns that previously had no value count specified.

The dataloader can also currently return a sparse tensor, an output type that doesn't have a clear use case and appears to be an implementation detail of padding ragged columns to dense.

Current

Dense Tensor
- value_count.max specified and is_ragged=False
Sparse Tensor
- value_count.max specified and is_ragged=True
Ragged Representation (values, offsets)
- value_count.max not specified and is_ragged=True
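
The three cases above amount to a small decision table. A minimal sketch in Python, for illustration only: the function name `output_type_before_pr` and the string labels are hypothetical and are not part of the dataloader API. The list does not cover the combination of no value_count.max with is_ragged=False, so the sketch raises for that case rather than guessing.

```python
from typing import Optional


def output_type_before_pr(value_count_max: Optional[int], is_ragged: bool) -> str:
    """Illustrative mapping of schema properties to the pre-PR output type.

    Hypothetical helper: it only encodes the three cases listed in this
    PR description and is not the dataloader's actual code.
    """
    if value_count_max is not None and not is_ragged:
        return "dense"
    if value_count_max is not None and is_ragged:
        return "sparse"
    if value_count_max is None and is_ragged:
        return "ragged"  # (values, offsets)
    # Not covered by the description, so no behaviour is assumed here.
    raise ValueError("combination not described: no value_count.max, is_ragged=False")


print(output_type_before_pr(10, False))
print(output_type_before_pr(10, True))
print(output_type_before_pr(None, True))
```

The sketch makes the motivation concrete: a ragged column that gains a value_count.max moves from the "ragged" case to the "sparse" case without any change to the data itself.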

After - With this PR

Ragged Representation (values, offsets)
- Always returned for all list columns. The value count does not influence the output type.
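
For readers unfamiliar with the (values, offsets) form, here is a minimal sketch in plain Python. It assumes the common convention that offsets has one more entry than there are rows and that row i spans values[offsets[i]:offsets[i + 1]]. The helper names `to_ragged` and `from_ragged` and the column name `item_ids` are hypothetical; the `__values` / `__offsets` key suffixes follow the naming mentioned in the linked Models change, and the real dataloader returns framework tensors rather than Python lists.

```python
def to_ragged(rows):
    """Flatten a list of variable-length rows into (values, offsets)."""
    values = []
    offsets = [0]
    for row in rows:
        values.extend(row)
        offsets.append(len(values))  # end position of this row
    return values, offsets


def from_ragged(values, offsets):
    """Rebuild the rows from (values, offsets)."""
    return [values[start:end] for start, end in zip(offsets[:-1], offsets[1:])]


rows = [[1, 2], [3], [4, 5, 6]]
values, offsets = to_ragged(rows)

# Hypothetical batch layout: one entry for values and one for offsets.
batch = {"item_ids__values": values, "item_ids__offsets": offsets}
print(batch)
```

No padding is involved in this form, so the output has the same structure whether or not a maximum value count is declared in the schema.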

A follow-up, independent change is planned in #97 to return a dense tensor when the schema specifies that a column is of fixed size (is_ragged=False).
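
To illustrate why a fixed-size column can be returned as a dense tensor without padding, here is a hedged sketch in plain Python. It is not the implementation from #97, which is not shown here. The helper name `ragged_to_dense_fixed` and its `row_length` parameter are hypothetical, and nested lists stand in for a 2D tensor.

```python
def ragged_to_dense_fixed(values, offsets, row_length):
    """Reshape (values, offsets) into rows of equal length.

    Hypothetical helper: it raises if any row deviates from row_length,
    because a fixed-size column should never need padding.
    """
    dense = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        if end - start != row_length:
            raise ValueError(
                f"row has length {end - start}, expected {row_length}"
            )
        dense.append(values[start:end])
    return dense


values = [1, 2, 3, 4, 5, 6]
offsets = [0, 2, 4, 6]  # every row has exactly two elements
dense = ragged_to_dense_fixed(values, offsets, row_length=2)
print(dense)
```

When every row has the same length, the offsets carry no extra information, which is why a dense output is a natural fit for columns with is_ragged=False.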

Remove sparse tensor output type for list features

5a43f21

oliverholworthy added the enhancement New feature or request label Feb 20, 2023

oliverholworthy self-assigned this Feb 20, 2023

oliverholworthy mentioned this pull request Feb 27, 2023

Updates Models to support new dataloader format for lists (__values and __offsets in dict) and scalar (1D) NVIDIA-Merlin/models#999

Merged

oliverholworthy mentioned this pull request Mar 14, 2023

Update padding of ragged features to enable dataloader change NVIDIA-Merlin/Transformers4Rec#647

Merged

oliverholworthy added this to the Merlin 23.03 milestone Mar 14, 2023

oliverholworthy added 3 commits March 16, 2023 10:15

Merge branch 'main' into remove-sparse-tensor-output

fab1a40

Remove unused _get_indices and _get_max_seq_len methods

f0a2e22

Remove unused _pull_values_offsets method

ba1cdbc

oliverholworthy marked this pull request as ready for review March 16, 2023 10:58

karlhigley requested review from jperez999 and bschifferer March 16, 2023 14:11

karlhigley approved these changes Mar 16, 2023

karlhigley merged commit a9a2c78 into NVIDIA-Merlin:main Mar 16, 2023

karlhigley added the breaking label Mar 16, 2023

oliverholworthy mentioned this pull request Mar 16, 2023

Remove tests for sparse tensors in dataloader NVIDIA-Merlin/NVTabular#1783

Merged
