
Fixes for PyTorch 1.7 release #2683

Merged 11 commits into dev on Nov 17, 2020

Conversation

@fritzo (Member) commented Oct 28, 2020

See release notes

Tasks

  • replace .expand(...) -> .expand(...).clone() if the result must support .__setitem__()

  • update to use torch.fft; see Deprecate old fft functions pytorch/pytorch#44876 (comment)
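The gist of the torch.fft migration (a sketch of the API change, not the exact Pyro diff) is that the old function-style torch.rfft returned a real tensor with a trailing (re, im) dimension, while the new torch.fft module returns complex tensors directly:

```python
import torch

t = torch.randn(8)
# Old, deprecated in 1.7:  torch.rfft(t, signal_ndim=1)
# returned a real tensor of shape (5, 2) packing (re, im) pairs.
f = torch.fft.rfft(t)        # new API: complex output of length n // 2 + 1
r = torch.fft.irfft(f, n=8)  # round-trip back to the real signal
```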

  • fix examples/sparse_regression.py

    To reproduce, run python -m pdb -cc examples/sparse_regression.py --num-steps=2 --num-data=50 --num-dimensions 20

    Traceback (most recent call last):
    File "/Users/fobermey/opt/miniconda3/envs/pyro/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 867, in split
      len(indices_or_sections)
    TypeError: object of type 'int' has no len()
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
    File "/Users/fobermey/opt/miniconda3/envs/pyro/lib/python3.7/pdb.py", line 1697, in main
      pdb._runscript(mainpyfile)
    File "/Users/fobermey/opt/miniconda3/envs/pyro/lib/python3.7/pdb.py", line 1566, in _runscript
      self.run(statement)
    File "/Users/fobermey/opt/miniconda3/envs/pyro/lib/python3.7/bdb.py", line 585, in run
      exec(cmd, globals, locals)
    File "<string>", line 1, in <module>
    File "/Users/fobermey/github/pyro-ppl/pyro/examples/sparse_regression.py", line 4, in <module>
      import argparse
    File "/Users/fobermey/github/pyro-ppl/pyro/examples/sparse_regression.py", line 290, in main
      median['sigma'].double())
    File "/Users/fobermey/opt/miniconda3/envs/pyro/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
      return func(*args, **kwargs)
    File "/Users/fobermey/github/pyro-ppl/pyro/examples/sparse_regression.py", line 173, in compute_posterior_stats
      active_quadratic_dims = np.split(active_quadratic_dims, active_quadratic_dims.shape[0])
    File "<__array_function__ internals>", line 6, in split
    File "/Users/fobermey/opt/miniconda3/envs/pyro/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 871, in split
      if N % sections:
    ZeroDivisionError: integer division or modulo by zero
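The ZeroDivisionError comes from numpy: np.split(a, sections) with an integer sections computes N % sections, so sections == 0 (i.e. no active quadratic dimensions) divides by zero. A sketch of the failure and one obvious guard (the guard is an assumption, not necessarily the fix that landed):

```python
import numpy as np

active = np.empty((0, 2))  # no active quadratic dimensions
try:
    np.split(active, active.shape[0])  # sections == 0 -> N % 0
except ZeroDivisionError:
    pass
# one possible guard: skip the split entirely when nothing is active
chunks = [] if active.shape[0] == 0 else np.split(active, active.shape[0])
```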
    
  • fix multi-chain MCMC, failing in the baseball example

    The baseball example is failing. To reproduce:

    python examples/baseball.py --num-samples=200 --warmup-steps=100 --jit
    
    RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).
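The detach-before-serialize workaround that the error message suggests can be sketched as follows (sanitize_args is a hypothetical helper name, not the function in the Pyro codebase):

```python
import torch

def sanitize_args(args):
    # Detach tensors before pickling them across process boundaries;
    # this drops requires_grad without copying the underlying data.
    return [a.detach() if torch.is_tensor(a) else a for a in args]

x = torch.ones(3, requires_grad=True) * 2.0  # non-leaf, requires_grad=True
safe = sanitize_args([x, "label", 1])
```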
    

Tested

Ran tests locally against torch==1.7. (Note: CI still uses PyTorch 1.6, since that is the oldest supported version.)

  • pytest -vx --stage unit
  • pytest -vx --stage integration
  • make test-tutorials
  • pytest -vx --stage test_examples (currently failing)

@fritzo fritzo added the WIP label Oct 28, 2020
@fritzo fritzo added this to the 1.5.1 milestone Oct 28, 2020
@fehiepsi fehiepsi self-assigned this Nov 14, 2020
@fehiepsi fehiepsi removed their assignment Nov 15, 2020
@fritzo (Member Author) commented Nov 16, 2020

Thanks for helping, @fehiepsi 🎉

@fritzo (Member Author) commented Nov 16, 2020

@martinjankowiak can you please take a look at your failing sparse_regression.py example?

@neerajprad @fehiepsi any idea how to fix multi-chain mcmc in the baseball example?

@fritzo fritzo mentioned this pull request Nov 16, 2020
@fritzo (Member Author) commented Nov 16, 2020

@neerajprad I'm seeing weird unexpected .requires_grad in the baseball example. I think PyTorch 1.7 might be overly propagating .requires_grad during jitting, even to .shape (which under the jit is a tensor). 😕

@neerajprad (Member) replied:

> @neerajprad I'm seeing weird unexpected .requires_grad in the baseball example. I think PyTorch 1.7 might be overly propagating .requires_grad during jitting, even to .shape (which under the jit is a tensor). 😕

I'll take a look at this, @fritzo.

Review thread on pyro/infer/mcmc/api.py:

    # at https://github.com/pytorch/pytorch/issues/10375
    # This also resolves "RuntimeError: Cowardly refusing to serialize non-leaf tensor which
    # requires_grad", which happens with `jit_compile` under PyTorch 1.7
    args = [arg.clone().detach() if torch.is_tensor(arg) else arg for arg in args]
Member commented:

@neerajprad Could you double-check if this is a good solution? This resolves the issues for:

  • cuda + jit/nojit
  • cpu + jit

There is no issue with cpu + nojit, so should we only apply this under the `if self.num_chains > 1` check above?

Member commented:

How about we remove the num_chains restriction and simply detach (instead of cloning)? That should be very cheap, and we can do it as a sanity measure anyway. I think @fritzo has correctly identified that the jit is incorrectly propagating up requires_grad, so this seems like a bigger problem.

Member commented:

Thanks! It seems that .detach() solves the issue. Removing the num_chains restriction helps for single-chain too (without detach, baseball failed even with num_chains=1!). I think the problem comes from this line of Binomial.log_prob. If I change k * self.logits to (k + 0) * self.logits, the problem goes away without having to detach... That observation agrees with: the jit is incorrectly propagating up requires_grad.

Member commented:

Thanks, @fehiepsi. I wonder if the problem is due to the cached logits. I have seen this previously for transforms where the cached value has its requires_grad set during backward, but in that case we get an error saying that JIT cannot insert a constant with a requires_grad attribute set. That doesn't explain why adding 0 helps take care of this.

Member commented:

I'm not sure. I remember that the issue still happened when I changed self.logits to self.probs.

(Outdated review thread on pyro/infer/mcmc/api.py, resolved.)
@fritzo (Member Author) commented Nov 17, 2020

Thanks again for fixing the baseball example @fehiepsi! I think this is ready to merge.

@neerajprad neerajprad merged commit ae55140 into dev Nov 17, 2020
@fritzo fritzo deleted the torch-1.7-fixes branch September 27, 2021 14:47
4 participants