using sum() causes dtype mismatch #135

Closed
JackTemaki opened this issue Apr 27, 2022 · 5 comments

@JackTemaki
Contributor

JackTemaki commented Apr 27, 2022

Writing e.g.

    l2loss = sum(nn.reduce(param ** 2.0, axis=list(param.shape), mode="sum") for param in net.parameters())

causes a dtype mismatch when running the network (surprisingly not during construction with returnn_common):

  File "/work/asr4/rossenbach/sisyphus_work_folders/tts_asr_2021_work/i6_core/tools/git/CloneGitRepositoryJob.UbdRLURNoRpi/output/repository/returnn/tf/util/data.py", line 3614, in Data.copy_compatible_to
    line: assert not check_dtype or self.dtype == data.dtype
    locals:
      check_dtype = <local> True
      self = <local> Data{'reduce_output', []}
      self.dtype = <local> 'float32', len = 7
      data = <local> Data{'add_output', [], dtype='int32'}
      data.dtype = <local> 'int32', len = 5

The error on the Returnn side is justified, because the generated layers are:

    "reduce": {
        "class": "reduce",
        "from": "pow",
        "mode": "sum",
        "axis": [transformer_default_model_dim, _3_transformer_default_model_dim],
        "out_shape": {},
    },
    "constant_0": {"class": "constant", "value": 0},
    "add": {
        "class": "combine",
        "from": ["constant_0", "reduce"],
        "kind": "add",
        "out_shape": {},
    },

As the value is 0 and not 0.0, Returnn assumes int32 instead of float32 (float32 is the correct dtype of the reduce output). The 0 is probably added by the first internal addition inside sum(), which calls:

  def __add__(self, other: Union[RawTensorTypes, Tensor]) -> Tensor:
    from ._generated_layers import _combine
    return _combine([self, nn.convert_to_tensor(other)], kind="add", name="add")

I am not sure how to deal with the correct dtypes here, as when calling sum() we may not know exactly which dtype we want to have.

One thing I could imagine is that we infer the dtype from self if other is a zero, and extend nn.convert_to_tensor to have a dtype parameter.
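
For reference, Python's built-in sum() starts from the int 0 unless an explicit start value is passed, which is where the int constant comes from. A possible user-side workaround (just a sketch, assuming at least one parameter exists) would avoid the implicit 0 entirely:

  import functools
  import operator

  # Python's sum() is roughly: total = 0; for x in xs: total = total + x,
  # so the first addition is int 0 + Tensor. functools.reduce with operator.add
  # never introduces that initial int constant.
  l2loss = functools.reduce(
    operator.add,
    (nn.reduce(param ** 2.0, axis=list(param.shape), mode="sum")
     for param in net.parameters()),
  )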

@albertz
Member

albertz commented Apr 27, 2022

Yea, sum([p1, p2]) is equivalent to 0 + p1 + p2.
In PyTorch, int + float is allowed and automatically upcasted.
In TF, int + float is not allowed (but when one part is not a tensor, e.g. int, it will automatically be casted to the dtype of the other tensor).
In RETURNN, in the CombineLayer, I think we don't allow this (I assume this is your error; you have not posted the full traceback).

So, obviously we cannot change how sum is defined, as this is defined by Python itself.

We could add an optimization on returnn-common side which automatically reduces 0 + x or x + 0 to x, no matter the type of x.

We could extend RETURNN CombineLayer to allow int + float.

Not sure which way is best or which you would prefer. I think the returnn-common optimization would be nice anyway and would already solve this.
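
A minimal sketch of what that optimization could look like inside Tensor.__add__ (not the actual implementation; 0 + x would need the same shortcut in __radd__):

  def __add__(self, other: Union[RawTensorTypes, Tensor]) -> Tensor:
    from ._generated_layers import _combine
    # Sketch: short-circuit the neutral element, so no constant layer and no
    # combine layer get created and the dtype of the scalar never matters.
    if isinstance(other, (int, float)) and other == 0:
      return self
    return _combine([self, nn.convert_to_tensor(other)], kind="add", name="add")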

@albertz
Member

albertz commented Apr 27, 2022

we may not know exactly which dtype we want to have.

Note that such upcasting (promotion) rules are well defined in PyTorch/JAX/Numpy.

https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.dtype
https://jax.readthedocs.io/en/latest/type_promotion.html
https://jax.readthedocs.io/en/latest/design_notes/type_promotion.html

I think TF in its normal behavior actually does not have such promotion rules.

https://www.tensorflow.org/api_docs/python/tf/math/add
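
A small illustration of the difference (assuming current eager-mode behavior of both frameworks):

  import tensorflow as tf
  import torch

  # PyTorch promotes int32 + float32 to float32:
  (torch.tensor(1, dtype=torch.int32) + torch.tensor(1.0)).dtype  # torch.float32

  # TF does not promote between two tensors of different dtypes:
  # tf.add(tf.constant(1, dtype=tf.int32), tf.constant(1.0))  # raises InvalidArgumentError
  # but a plain Python scalar is cast to the other tensor's dtype:
  tf.constant(1.0) + 2  # -> float32 tensor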

@JackTemaki
Contributor Author

Okay, then let's start with the optimizations like x+0 -> x; x-0 -> x; x*1 -> x and so on... I think those are a good idea anyway.

@albertz
Member

albertz commented Apr 27, 2022

Ok done.

@albertz
Member

albertz commented Apr 27, 2022

Btw, if you want to have an L2 loss, I would not recommend doing it this way. Rather, see #59. And you should also enable decouple_constraints then.
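
For context, decouple_constraints is a RETURNN config option; as far as I understand it, enabling it applies constraint terms such as L2 decoupled in the optimizer update instead of adding them to the total loss. A minimal fragment, showing only this flag:

  # RETURNN config fragment (sketch; only the relevant option shown)
  decouple_constraints = True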
