# This is a combination of 197 commits. · nailimixaM/pytorch@a3763c7

# This is a combination of 197 commits.

# This is the 1st commit message:

Add Gaussian negative log likelihood loss

# This is the commit message #2:

flake8 compliance of test file

# This is the commit message #3:

flake8 compliance loss math description

# This is the commit message #4:

flake8 compliance loss docstring

# This is the commit message #5:

Fix tests and docs

# This is the commit message #6:

Add loss to init script

# This is the commit message #7:

Change eps

# This is the commit message #8:

Fix test and docs

# This is the commit message #9:

Cleaner docs and fix tests

# This is the commit message #10:

Update docs for var clamping change

# This is the commit message #11:

Fix overridetests

# This is the commit message #12:

Fix reduction mode bug and var view bug

# This is the commit message #13:

Update class init to have kwargs

# This is the commit message #14:

Add note and reference to docs

# This is the commit message #15:

Fix typos

# This is the commit message #16:

Preserve memory format in qconv op (#49533)

Summary:
* qconv used to return NHWC no matter the input format
* this change returns NCHW format if the input was NCHW

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49533

Test Plan:
pytest test/quantization/test_quantized_op.py::\
TestQuantizedConv::test_qconv2d_preserve_mem_format

Fixes https://github.com/pytorch/pytorch/issues/47295

Reviewed By: kimishpatel

Differential Revision: D25609205

Pulled By: axitkhurana

fbshipit-source-id: 83f8ca4a1496a8a4612fc3da082d727ead257ce7

# This is the commit message #17:

Added linalg.inv (#48261)

Summary:
This PR adds `torch.linalg.inv` for NumPy compatibility.

`linalg_inv_out` uses in-place operations on provided `result` tensor.

I modified `apply_inverse` to accept tensor of Int instead of std::vector, that way we can write a function similar to `linalg_inv_out` but removing the error checks and device memory synchronization.

I fixed `lda` (leading dimension parameter which is max(1, n)) in many places to handle 0x0 matrices correctly.
Zero batch dimensions are also working and tested.

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48261

Reviewed By: ngimel

Differential Revision: D25690129

Pulled By: mruberry

fbshipit-source-id: edb2d03721f22168c42ded8458513cb23dfdc712

# This is the commit message #18:

Mod lists to neutral+descriptive terms in caffe2/docs (#49803)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49803

Per "https://fb.workplace.com/groups/e/permalink/3320810064641820/" we can no longer use the terms "whitelist" and "blacklist", and editing any file containing them results in a critical error signal. Let's embrace the change.
This diff changes "blacklist" to "blocklist" in a number of non-interface contexts (interfaces would require more extensive testing and might interfere with reading stored data, so those are deferred until later).

Test Plan: Sandcastle

Reviewed By: vkuzo

Differential Revision: D25686924

fbshipit-source-id: 117de2ca43a0ea21b6e465cf5082e605e42adbf6

# This is the commit message #19:

Improve docs for scatter and gather functions (#49679)

Summary:
- Add warning about non-unique indices
- And note that these functions don't broadcast
- Add missing `torch.scatter` and `torch.scatter_add` doc entries
- Fix parameter descriptions
- Improve code examples to make indexing behaviour easier to understand

Closes gh-48214
Closes gh-26191
Closes gh-37130
Closes gh-34062
xref gh-31776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49679

Reviewed By: mruberry

Differential Revision: D25693660

Pulled By: ngimel

fbshipit-source-id: 4983e7b4efcbdf1ab9f04e58973b4f983e8e43a4

# This is the commit message #20:

removes more unused THC functions (#49788)

Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49788

Reviewed By: mruberry

Differential Revision: D25693328

Pulled By: ngimel

fbshipit-source-id: 244a096214d110e4c1a94f2847ff8457f1afb0d1

# This is the commit message #21:

[pt][quant] Make the CUDA fake quantize logic consistent with CPU fake quantize logic (#49808)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49808

PyTorch's CPU implementation uses `dst = std::nearbyint(src * inv_scale) + zero_point` instead of the legacy `dst = std::nearbyint(src * inv_scale + zero_point)`. However, the CUDA implementation doesn't match this. This diff makes the CPU and CUDA implementations consistent.

- FBGEMM code pointer: https://github.com/pytorch/FBGEMM/blob/master/include/fbgemm/QuantUtils.h#L76-L80
- PyTorch code pointer:
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer.cpp#L306
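
The two formulas differ only when `src * inv_scale` lands exactly halfway between two integers, because halfway cases are rounded to the nearest even integer under the default rounding mode. A minimal Python sketch of the effect follows. The function names are hypothetical, Python's built-in `round` (which also rounds halves to even) stands in for `std::nearbyint`, and clamping to the quantized range is omitted:

```python
def quantize_current(src: float, inv_scale: float, zero_point: int) -> int:
    # Round first, then shift by the zero point (the formula described above).
    return round(src * inv_scale) + zero_point


def quantize_legacy(src: float, inv_scale: float, zero_point: int) -> int:
    # Shift by the zero point first, then round (the legacy formula).
    return round(src * inv_scale + zero_point)


# With an odd zero point and a value exactly halfway between two integers,
# round-half-to-even makes the two formulas disagree.
current = quantize_current(0.5, 1.0, 1)
legacy = quantize_legacy(0.5, 1.0, 1)
print(current, legacy)
```

For inputs that are not exactly halfway between integers, both functions return the same value.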

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D25694235

fbshipit-source-id: 0a615e559132aafe18543deac1ea5028dd840cb9

# This is the commit message #22:

[numpy] `torch.erfinv`: promote integer inputs to float (#49155)

Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49155

Reviewed By: ngimel

Differential Revision: D25664234

Pulled By: mruberry

fbshipit-source-id: 630fd1d334567d78c8130236a67dda0f5ec02560

# This is the commit message #23:

[reland] Early terminate when CUDA assert were thrown (#49799)

Summary:
this is a reland of https://github.com/pytorch/pytorch/issues/49527.

This fixes the slow test not running properly on py36: the test relied on capture_output, which was only introduced in py37.
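
As an illustration of the compatibility issue, and assuming `capture_output` here refers to the keyword argument of `subprocess.run`, the sketch below captures output in a way that works on both Python 3.6 and 3.7+. The helper name `run_and_capture` is hypothetical and is not taken from the PR:

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (returncode, stdout, stderr) as text."""
    # Passing explicit pipes is equivalent to capture_output=True,
    # but is also accepted by Python 3.6.
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # `text=` was also only added in 3.7
    )
    return proc.returncode, proc.stdout, proc.stderr


code, out, err = run_and_capture([sys.executable, "-c", "print('ok')"])
print(code, out.strip())
```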

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49799

Reviewed By: janeyx99

Differential Revision: D25692616

Pulled By: walterddr

fbshipit-source-id: 9c5352220d632ec8d7464e5f162ffb468a0f30df

# This is the commit message #24:

Fix typo in complex autograd docs (#49755)

Summary:
Update complex autograd docs to fix a typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49755

Reviewed By: mruberry

Differential Revision: D25692649

Pulled By: soulitzer

fbshipit-source-id: 43c2113b4c8f2d1828880102189a5a9b887dc784

# This is the commit message #25:

Revert D25690129: [pytorch][PR] Added linalg.inv

Test Plan: revert-hammer

Differential Revision:
D25690129 (https://github.com/pytorch/pytorch/commit/8554b58fbdd865c760d92bfa50c1119cc8fc65e9)

Original commit changeset: edb2d03721f2

fbshipit-source-id: 8679ea18e637423d35919544d2b047a62ac3abd8

# This is the commit message #26:

Creation of test framework for Sparse Operators (#48488)

Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48488

Reviewed By: ngimel

Differential Revision: D25696487

Pulled By: mruberry

fbshipit-source-id: dc4f57c6628f62b74dd321f3f6b0fff86f25b040

# This is the commit message #27:

Revert D25692616: [pytorch][PR] [reland] Early terminate when CUDA assert were thrown

Test Plan: revert-hammer

Differential Revision:
D25692616 (https://github.com/pytorch/pytorch/commit/e6a215592ea5b7f7f7e59e89116b507089bfb8d0)

Original commit changeset: 9c5352220d63

fbshipit-source-id: dade8068cad265d15ee908d98abe0de5b81a195d

# This is the commit message #28:

[quant][graphmode][fx] Standalone module support {input/output}_quantized_idxs (#49754)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49754

This PR adds the support for {input/output}_quantized_idxs for standalone module.

If input_quantized_idxs = [] and output_quantized_idxs = [], the standalone module expects float
input and produces float output, and it quantizes the input and dequantizes the output internally.

If input_quantized_idxs = [0] and output_quantized_idxs = [0], the standalone module expects quantized
input and produces quantized output; the input is quantized in the parent module, and the output is dequantized
in the parent module as well. This is similar to current quantized modules like nn.quantized.Conv2d.

For more details, please see the test case

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_standalone_module

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25684692

fbshipit-source-id: 900360e01c0e35b26fe85f4a887dc1fd6f7bfb66

# This is the commit message #29:

Clip small scales to fp16 min

Summary: When the FC output min max range is very small, we want to enforce a cutoff on the scale parameter to better generalize for future values that could fall beyond the original range.
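
A rough sketch of the idea, under stated assumptions: the commit message does not define "fp16 min" or how the scale is derived, so this example assumes the cutoff is the smallest positive normal half-precision value (2**-14) and that the scale comes from an 8-bit min/max range. Both the constant name and the function are hypothetical:

```python
# Assumption: "fp16 min" is the smallest positive normal float16, about 6.1e-5.
FP16_MIN_NORMAL = 2.0 ** -14


def clipped_scale(out_min: float, out_max: float, num_levels: int = 255) -> float:
    """Compute a quantization scale and enforce a lower cutoff on it."""
    scale = (out_max - out_min) / num_levels
    # A very narrow observed range would give a tiny scale; clip it so that
    # future values outside the original range are represented less badly.
    return max(scale, FP16_MIN_NORMAL)


print(clipped_scale(0.0, 1e-6))   # narrow range: the cutoff applies
print(clipped_scale(0.0, 255.0))  # wide range: the scale is unchanged
```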

Test Plan:
More analysis about the output distributions can be found in N425166

An example workflow using fp16 min clipping is f240972205

Reviewed By: jspark1105

Differential Revision: D25681249

fbshipit-source-id: c4dfbd3ee823886afed06e6c2eccfc29d612f7e6

# This is the commit message #30:

Revert D25684692: [quant][graphmode][fx] Standalone module support {input/output}_quantized_idxs

Test Plan: revert-hammer

Differential Revision:
D25684692 (https://github.com/pytorch/pytorch/commit/89b4899ea5363fd69872c0cabf0dedea2dc533c8)

Original commit changeset: 900360e01c0e

fbshipit-source-id: 8b65fa8fbc7b364fbddb5f23cc696cd9b7db98cd

# This is the commit message #31:

[numpy] `torch.digamma` : promote integer inputs to float (#48302)

Summary:
**BC-breaking Note:**

This PR updates PyTorch's digamma function to be consistent with SciPy's special.digamma function. This changes the result of the digamma function on the nonpositive integers, where the gamma function is not defined. Since the gamma function is undefined at these points, the (typical) derivative of the logarithm of the gamma function is also undefined at these points, and for negative integers this PR updates digamma to return NaN. For zero, however, it returns -inf to be consistent with SciPy.

Interestingly, SciPy made a similar change, which was noticed by at least one user: https://github.com/scipy/scipy/issues/9663#issue-396587679.

SciPy's returning of negative infinity at zero is intentional:
https://github.com/scipy/scipy/blob/59347ae8b86bcc92c339efe213128f64ab6df98c/scipy/special/cephes/psi.c#L163

This change is consistent with the C++ standard for the gamma function:
https://en.cppreference.com/w/cpp/numeric/math/tgamma
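
The special-case behaviour described above can be sketched in pure Python. This is not PyTorch's or SciPy's implementation: `digamma_sketch` is a hypothetical name, the numeric part uses the standard recurrence plus a truncated asymptotic series, and signed zero is not distinguished:

```python
import math


def digamma_sketch(x: float) -> float:
    """Approximate digamma with the pole handling described above."""
    if x == 0.0:
        return -math.inf  # zero: -inf, matching the SciPy convention cited above
    if x < 0.0 and x == math.floor(x):
        return math.nan   # negative integers: NaN
    if x < 0.0:
        # Reflection formula: psi(x) = psi(1 - x) - pi / tan(pi * x)
        return digamma_sketch(1.0 - x) - math.pi / math.tan(math.pi * x)
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x, applied until x is large enough.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


print(digamma_sketch(0.0), digamma_sketch(-2.0), digamma_sketch(1.0))
```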

**PR Summary:**
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48302

Reviewed By: ngimel

Differential Revision: D25664087

Pulled By: mruberry

fbshipit-source-id: 1168e81e218bf9fe5b849db0e07e7b22e590cf73

# This is the commit message #32:

early termination of CUDA tests (#49869)

Summary:
This is follow up on https://github.com/pytorch/pytorch/issues/49799.

* uses `torch.cuda.synchronize()` to validate CUDA assert instead of inspecting error message.
* remove non CUDA tests.

Hopefully this helps reproduce why slow_tests fails while the normal test does not, since the test still runs for >1min.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49869

Reviewed By: mruberry

Differential Revision: D25714385

Pulled By: walterddr

fbshipit-source-id: 04f8ccb50d8c9ee42826a216c49baf90285b247f

# This is the commit message #33:

[*.py] Rename "Arguments:" to "Args:" (#49736)

Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.
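
A replacement of this kind could be scripted roughly as follows. This is only a sketch of one possible approach, not the tool used for the PR; the function name is hypothetical, and it rewrites only lines that consist of the `Arguments:` header alone:

```python
import re

# Match a line that contains only "Arguments:" plus indentation.
# [ \t]* is used instead of \s* so the pattern never swallows newlines.
_ARGUMENTS_HEADER = re.compile(r"^([ \t]*)Arguments:[ \t]*$", flags=re.MULTILINE)


def rename_arguments_header(source: str) -> str:
    """Rewrite docstring section headers from 'Arguments:' to 'Args:'."""
    return _ARGUMENTS_HEADER.sub(r"\1Args:", source)


example = 'def f(x):\n    """Doc.\n\n    Arguments:\n        x: a value\n    """\n'
print(rename_arguments_header(example))
```

Occurrences of `Arguments:` inside running text are left alone, which keeps the change limited to section headers.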

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619

# This is the commit message #34:

Support the `in` operator with str (#47057)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47057

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D24863370

Pulled By: ansley

fbshipit-source-id: 5d17165b06052f0a4676537c5f6757083185a591

# This is the commit message #35:

[NNC] masked fill (#49627)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49627

There was a bug in the test that was hidden by the `If eager mode doesn't support a dtype/op/device combo` try / catch, so cuda wasn't being tested. The fix is just to rename `aten::masked_fill` to `aten_masked_fill`.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25696409

Pulled By: eellison

fbshipit-source-id: 83de1f5a194df54fe317b0035d4a6c1aed1d19a0

# This is the commit message #36:

[JIT] Constant prop getattr (#49806)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49806

Fix for https://github.com/pytorch/pytorch/issues/47089

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25696791

Pulled By: eellison

fbshipit-source-id: 914c17b8effef7f4f341775ac2b8150ee4703efd

# This is the commit message #37:

fx quant: hook up ConvTranspose{n}d (#49717)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49717

Quantization of `ConvTranspose{n}d` is supported in Eager mode. This PR
adds the support for FX graph mode.

Note: this currently only works in `qnnpack` because per-channel weights
are not supported by quantized conv transpose. In a future PR we should throw
an error when someone tries to quantize a ConvTranspose model with per-channel
weight observers until this is fixed.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_conv_transpose_1d
python test/test_quantization.py TestQuantizeFxOps.test_conv_transpose_2d
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25674636

fbshipit-source-id: b6948156123ed55db77e6337bea10db956215ae6

# This is the commit message #38:

fx quant: split linear test cases (#49740)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49740

1. Separates the module and functional linear test cases.
2. Combines the test case which tests for linear bias observation into
the main linear test case, as requested in
https://github.com/pytorch/pytorch/pull/49628.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_linear_module
python test/test_quantization.py TestQuantizeFxOps.test_linear_functional
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25681272

fbshipit-source-id: 0ed0ebd5afb8cdb938b530f7dbfbd79798eb9318

# This is the commit message #39:

Implement torch.linalg.qr (#47764)

Summary:
I am opening this PR early to have a place to discuss design issues.
The biggest difference between `torch.qr` and `numpy.linalg.qr` is that the former `torch.qr` takes a boolean parameter `some=True`, while the latter takes a string parameter `mode='reduced'` which can be one of the following:

`reduced`
this is completely equivalent to `some=True`, and both are the default.

`complete`
this is completely equivalent to `some=False`.

`r`
this returns only `r` instead of a tuple `(q, r)`. We have already decided that we don't want different return types depending on the parameters, so I propose to return `(r, empty_tensor)` instead. I **think** that in this mode it will be impossible to implement the backward pass, so we should raise an appropriate error in that case.

`raw`
in this mode, it returns `(h, tau)` instead of `(q, r)`. Internally, `h` and `tau` are obtained by calling lapack's `dgeqrf` and are later used to compute the actual values of `(q, r)`. The numpy docs suggest that these might be useful to call other lapack functions, but at the moment none of them is exposed by numpy and I don't know how often it is used in the real world.
I suppose that implementing the backward pass needs some attention: the most straightforward solution is to use `(h, tau)` to compute `(q, r)` and then use the normal logic for `qr_backward`, but there might be faster alternatives.

`full`, `f`
alias for `reduced`, deprecated since numpy 1.8.0

`economic`, `e`
similar to `raw` but it returns only `h` instead of `(h, tau)`. Deprecated since numpy 1.8.0

To summarize:
  * `reduced`, `complete` and `r` are straightforward to implement.

  * `raw` needs a bit of extra care, but I don't know how high a priority it is: since it is used rarely, we might want to not support it right now and maybe implement it in the future?

  * I think we should just leave `full` and `economic` out, and possibly add a note to the docs explaining what you need to use instead
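
The correspondence between the two parameter styles can be written down as a small lookup. This is a hypothetical helper for illustration only, not code from the PR; it covers just the two modes that the description says are completely equivalent to a `some` value:

```python
def some_from_mode(mode: str) -> bool:
    """Map a numpy-style `mode` string to the boolean `some` flag of `torch.qr`."""
    # Only these two modes have a direct `some` equivalent per the text above.
    table = {"reduced": True, "complete": False}
    if mode not in table:
        raise ValueError(f"mode {mode!r} has no equivalent `some` value")
    return table[mode]


print(some_from_mode("reduced"), some_from_mode("complete"))
```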

/cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47764

Reviewed By: ngimel

Differential Revision: D25708870

Pulled By: mruberry

fbshipit-source-id: c25c70a23a02ec4322430d636542041e766ebe1b

# This is the commit message #40:

Fix errata (#49903)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49903

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25718411

Pulled By: ansley

fbshipit-source-id: 0cc365c5a53077752dc1c5a5c4a65b873baa3604

# This is the commit message #41:

Update gather documentation to allow index.shape[k] <= input.shape[k] rather than ==. (#41887)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41887
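
To illustrate the relaxed shape rule, here is a pure-Python sketch of gathering along dimension 1 of a 2-D nested list. It is a simplified model, not the `torch.gather` implementation, and the function name is hypothetical. Note that `index` may have fewer rows than the input:

```python
def gather_dim1(inp, index):
    """Return out with out[i][j] = inp[i][index[i][j]] for 2-D nested lists."""
    # index only needs to be no larger than inp along the non-gather dimension.
    if len(index) > len(inp):
        raise ValueError("index has more rows than the input")
    return [[inp[i][col] for col in row] for i, row in enumerate(index)]


inp = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
index = [[2, 0]]  # one row only, although inp has three
print(gather_dim1(inp, index))
```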

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D22680014

Pulled By: gchanan

fbshipit-source-id: b162fccabc22a1403c0c43c1131f0fbf4689a79d

# This is the commit message #42:

Enable tests using named temp files on Windows (#49640)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49640

Reviewed By: ngimel

Differential Revision: D25681548

Pulled By: malfet

fbshipit-source-id: 0e2b25817c98d749920cb2b4079033a2ee8c1456

# This is the commit message #43:

added fuse_op and list_construct - list_unpack pass

Summary: Added fuse_op and list_construct and list_unpack pass

Test Plan:
jit_graph_opt_test.py
jit_graph_optimizer_test.cc
sparsenn_fused_operator_test.py

Reviewed By: qizzzh

Differential Revision: D25715079

fbshipit-source-id: fa976be53135a83f262b8f2e2eaedadd177f46c4

# This is the commit message #44:

Clean up type annotations in caffe2/torch/nn/modules (#49938)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49938

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25718705

fbshipit-source-id: 6a9e3e6d17aa458726cd32aa0a71a63c51b601d9

# This is the commit message #45:

[Tensorexpr]Copying header files in tensorexpr dir (#49933)

Summary:
Previously header files from jit/tensorexpr were not copied, this PR should enable copying.

This will allow other OSS projects like Glow to use TE.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49933

Reviewed By: Krovatkin, mruberry

Differential Revision: D25725927

Pulled By: protonu

fbshipit-source-id: 9d5a0586e9b73111230cacf044cd7e8f5c600ce9

# This is the commit message #46:

Clean up some type annotations in caffe2/torch/quantization (#49942)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49942

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: vkuzo

Differential Revision: D25717551

fbshipit-source-id: 1b63dc485ecf6641641b05f7ce095ae1d2d87346

# This is the commit message #47:

Revert D25718705: Clean up type annotations in caffe2/torch/nn/modules

Test Plan: revert-hammer

Differential Revision:
D25718705 (https://github.com/pytorch/pytorch/commit/891759f8609f300203d41cccc7337089b38858bd)

Original commit changeset: 6a9e3e6d17aa

fbshipit-source-id: 1a4ef0bfdec8eb8e7ce149bfbdb34a4ad8d964b6

# This is the commit message #48:

added List as an option to the unflattened_size (#49838)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/49743

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49838

Reviewed By: mruberry

Differential Revision: D25727971

Pulled By: ngimel

fbshipit-source-id: 60142dae84ef107f0083676a2a78ce6b0472b7e1

# This is the commit message #49:

Fix auto exponent issue for torch.pow (#49809)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49809

Fixes https://github.com/pytorch/xla/issues/2688 #46936

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25724176

Pulled By: anjali411

fbshipit-source-id: 16287a1f481e9475679b99d6fb45de840da225be

# This is the commit message #50:

Adding JIT support for cuda streams and events (#48020)

Summary:
=======

This PR addresses the following:

 * Adds JIT support for CUDA Streams
 * Adds JIT support for CUDA Events
 * Adds JIT support for CUDA Stream context manager

Testing:
======

python test/test_jit.py -v TestCUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48020

Reviewed By: navahgar

Differential Revision: D25725749

Pulled By: nikithamalgifb

fbshipit-source-id: b0addeb49630f8f0c430ed7badeca43bb9d2535c

# This is the commit message #51:

Remove THPWrapper (#49871)

Summary:
Remove `THPWrapper` from PyTorch C code since it is not used anymore and because we have dropped Python 2 compatibility, its usage can be replaced by capsule objects (`PyCapsule_New`, `PyCapsule_CheckExact`, `PyCapsule_GetPointer` and `PyCapsule_GetDestructor`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49871

Reviewed By: mruberry

Differential Revision: D25715038

Pulled By: albanD

fbshipit-source-id: cc3b6f967bbe0dc42c692adf76dff4e4b667fdd5

# This is the commit message #52:

Enable test_fusions TanhQuantize (#49970)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49970

enable test_fusions:test_tanhquantize

Test Plan: https://internalfb.com/intern/testinfra/testrun/6755399469176694

Reviewed By: hyuen

Differential Revision: D25732684

fbshipit-source-id: b8479e43b5248ba5510f0c78c993d534d3ffc2b0

# This is the commit message #53:

[numpy] `torch.rsqrt` : promote integer inputs to float (#47909)

Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47909

Reviewed By: ngimel

Differential Revision: D25730876

Pulled By: mruberry

fbshipit-source-id: c87a8f686e1dd64e511640e0278021c4a584ccf2

# This is the commit message #54:

Accept input tensor with 0-dim batch size for MultiLabelMarginLoss (#46975)

Summary:
Fix for one of the layers listed in https://github.com/pytorch/pytorch/issues/12013 or https://github.com/pytorch/pytorch/issues/38115

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46975

Reviewed By: mruberry

Differential Revision: D25719980

Pulled By: ngimel

fbshipit-source-id: 83414bad37c0b004bc7cced04df8b9c89bdba3e6

# This is the commit message #55:

Fix a KaTeX crash and many docstring issues (#49684)

Summary:
The first commit fixes the `MultiheadAttention` docstrings, which are causing a cryptic KaTeX crash.

The second commit fixes many documentation issues in `torch/_torch_docs.py`, and closes gh-43667 (missing "Keyword arguments" headers). It also fixes a weird duplicate docstring for `torch.argmin`; there are more of these, and it looks like they were written based on whether the C++ implementation has an overload. That makes little sense to a Python user though, and the content is simply duplicated.

The `Shape:` heading for https://pytorch.org/docs/master/generated/torch.nn.MultiheadAttention.html looked bad, here's what it looks like with this PR:

<img width="475" alt="image" src="https://user-images.githubusercontent.com/98330/102797488-09a44e00-43b0-11eb-8788-acdf4e936f2f.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49684

Reviewed By: ngimel

Differential Revision: D25730909

Pulled By: mruberry

fbshipit-source-id: d25bcf8caf928e7e8e918017d119de12e10a46e9

# This is the commit message #56:

Remove incorrect usage of layout(std430) on uniform buffers, correctly now treated as error in the latest release of Vulkan SDK. (#49572)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49572

Differential Revision: D25729888

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Pulled By: AshkanAliabadi

fbshipit-source-id: 15dd4acef3dfae72f03e7e3085b1ff5936becf3d

# This is the commit message #57:

quant docs: add common errors section (#49902)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49902

Adds a common errors section, and details the two errors
we see often on the discuss forums, with recommended solutions.

Test Plan: build the docs on Mac OS, the new section renders correctly.

Reviewed By: supriyar

Differential Revision: D25718195

Pulled By: vkuzo

fbshipit-source-id: c5ef2b24831d18d57bbafdb82d26d8fbf3a90781

# This is the commit message #58:

[quant] Quantizable LSTM (#49671)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49671

- Introduces the `torch.nn.quantizable` namespace
- Adds the `torch.nn.quantizable.LSTM` module

The point of the `quantizable` namespace is to segregate the purely quantized modules from the modules that could be quantized through a normal quantization flow, but are not using the quantized kernels explicitly.
That means the quantizable modules are functionally and numerically equivalent to the FP ones and can be used instead of the FP ones without any loss.

The main difference between the `torch.nn.LSTM` and the `torch.nn.quantizable.LSTM` is that the former one does not support observation for the linear layers, because all the computation is internal to the `aten` namespace.
The `torch.nn.quantizable.LSTM`, however, uses explicit linear layers that can be observed for further quantization.

Test Plan: Imported from OSS

Differential Revision: D25663870

Reviewed By: vkuzo

Pulled By: z-a-f

fbshipit-source-id: 70ff5463bd759b9a7922571a5712d3409dfdfa06

# This is the commit message #59:

[PyTorch] Decouple version numbers from c10 and caffe2 targets (#49905)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49905

There's a size regression in model delivery in D25682312. Only the model version numbers are used, yet the dependency on the entire c10 (128 KB) is pulled in.

This diff decouples the version numbers into a separate header file, versions.h. Other targets that refer only to version numbers can depend on `caffe2:version_headers`.
ghstack-source-id: 119161467

Test Plan: CI

Reviewed By: xcheng16, guangyfb

Differential Revision: D25716601

fbshipit-source-id: 07634bcf46eacfefa4aa75f2e4c9b9ee30c6929d

# This is the commit message #60:

Revert D25719980: [pytorch][PR] Accept input tensor with 0-dim batch size for MultiLabelMarginLoss

Test Plan: revert-hammer

Differential Revision:
D25719980 (https://github.com/pytorch/pytorch/commit/6b56b71e61e14bf4de5b371f0d8f2f2029065b31)

Original commit changeset: 83414bad37c0

fbshipit-source-id: 27eddd711a2b9e0adbc08bfab12100562e63ac21

# This is the commit message #61:

Improve `torch.flatten` docs and add tests to test_view_ops (#49501)

Summary:
Addresses https://github.com/pytorch/pytorch/issues/39474

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49501

Reviewed By: mruberry

Differential Revision: D25734450

Pulled By: soulitzer

fbshipit-source-id: 993667dd07acd81a4616465e0a3b94bde449193e

# This is the commit message #62:

Fix inf norm grad (reland) (#48611)

Summary:
Reland of https://github.com/pytorch/pytorch/issues/48122

Does this result in a regression? No significant regression observed.

Timer script:
```
import torch
from torch.utils.benchmark import Timer

setup="""
a = torch.rand((2, 2), requires_grad=True)
gradient = torch.ones(2)
"""

stmt="""
torch.autograd.grad(torch.norm(a, dim=(0,), keepdim=False), a, gradient)
"""

timer = Timer(stmt, setup)

print(timer.timeit(10000))
print(timer.collect_callgrind(100))
```
Note: small matrix, keepdim is False, and dims is non-empty

Before change
```
Runtime   37.37 us
1 measurement, 10000 runs , 1 thread

                           All          Noisy symbols removed
    Instructions:     15279045                   15141710
    Baseline:             4257                       3851
100 runs per measurement, 1 thread
```

After change
```
Runtime 36.08 us
1 measurement, 10000 runs , 1 thread

                           All          Noisy symbols removed
    Instructions:     15296974                   15153534
    Baseline:             4257                       3851
100 runs per measurement, 1 thread
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48611

Reviewed By: albanD, mruberry

Differential Revision: D25309997

Pulled By: soulitzer

fbshipit-source-id: 5fb950dc9259234342985c0e84ada25a7e3814d6

# This is the commit message #63:

Revert D25734450: [pytorch][PR] Improve `torch.flatten` docs and add tests to test_view_ops

Test Plan: revert-hammer

Differential Revision:
D25734450 (https://github.com/pytorch/pytorch/commit/730965c246192c94c804e5ac4a95f175dca2fb18)

Original commit changeset: 993667dd07ac

fbshipit-source-id: 603af25311fc8b29bb033167f3b2704da79c3147

# This is the commit message #64:

Remove flops warnings from the default profiler use case (#49896)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49896

Add missing check for with_flops option set

Test Plan:
python test/test_profiler.py
CI

Reviewed By: xuzhao9, ngimel

Differential Revision: D25716930

Pulled By: ilia-cher

fbshipit-source-id: 0da0bbb6c1a52328f665237e503406f877b41449

# This is the commit message #65:

[c10/**] Fix typos (#49815)

Summary:
All pretty minor. I avoided renaming `class DestructableMock` to `class DestructibleMock` and similar such symbol renames (in this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49815

Reviewed By: VitalyFedyunin

Differential Revision: D25734507

Pulled By: mruberry

fbshipit-source-id: bbe8874a99d047e9d9814bf92ea8c036a5c6a3fd

# This is the commit message #66:

Back out "[pytorch][PR] Preserve memory format in qconv op" (#49994)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49994

Revert preserving memory format in the qconv op because it is negatively affecting performance; we will revert the revert after fixing all issues.

Test Plan: pytest fbcode/caffe2/test/quantization/test_quantized_op.py

Reviewed By: kimishpatel

Differential Revision: D25731279

fbshipit-source-id: 908dbb127210a93b27ada7ccdfa531177edf679a

# This is the commit message #67:

Making ops c10-full: list of optional tensors (#49138)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138

See for details: https://fb.quip.com/QRtJAin66lPN

We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`, instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this.

## Backwards Compatibility

- This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`.
- This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57
- This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where `indices_tensor` is another `Tensor`. That will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`. Another common call pattern was `tensor.index(indices_tensor)`, where previously the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`. Implicitly converting `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would require two implicit conversions, and C++ doesn't allow chaining two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`.

ghstack-source-id: 119269131

Test Plan:
## Benchmarks (C++ instruction counts):
### Forward
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4});
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x[0] = 1                                                                |11566015 |11566015|0      |0.00% |
|x.index({0})                                                            |6807019  |6801019 |-6000  |-0.09%|
|x.index({0, 0})                                                         |13529019 |13557019|28000  |0.21% |
|x.index({0, 0, 0})                                                      |10677004 |10692004|15000  |0.14% |
|x.index({"..."})                                                        |5512015  |5506015 |-6000  |-0.11%|
|x.index({Slice(None, None, None)})                                      |6866016  |6936016 |70000  |1.02% |
|x.index({None})                                                         |8554015  |8548015 |-6000  |-0.07%|
|x.index({false})                                                        |22400000 |22744000|344000 |1.54% |
|x.index({true})                                                         |27624088 |27264393|-359695|-1.30%|
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})|123472000|123463306|-8694|-0.01%|
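
The delta and percentage columns follow directly from the before/after instruction counts. A minimal sketch, assuming a hypothetical helper named `relative_change` (it is not part of the benchmark script), applied to two rows copied from the table above:

```py
from typing import Tuple


def relative_change(before: int, after: int) -> Tuple[int, float]:
    """Return (absolute delta, percent change relative to `before`)."""
    delta = after - before
    return delta, 100.0 * delta / before


# Instruction counts copied from the forward-path results table.
rows = {
    "x.index({Slice(None, None, None)})": (6866016, 6936016),
    "x.index({true})": (27624088, 27264393),
}

for op, (before, after) in rows.items():
    delta, pct = relative_change(before, after)
    print(f"{op}: delta={delta}, change={pct:.2f}%")
```

A negative delta means fewer instructions were executed after the change.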

### Autograd
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4}, torch::requires_grad());
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
Note: the script measures the **forward** path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.

#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x.index({0})                                                            |14839019|14833019|-6000| 0.00% |
|x.index({0, 0})                                                         |28342019|28370019|28000| 0.00% |
|x.index({0, 0, 0})                                                      |24434004|24449004|15000| 0.00% |
|x.index({"..."})                                                       |12773015|12767015|-6000| 0.00% |
|x.index({Slice(None, None, None)})                                      |14837016|14907016|70000| 0.47% |
|x.index({None})                                                        |15926015|15920015|-6000| 0.00% |
|x.index({false})                                                        |36958000|37477000|519000| 1.40% |
|x.index({true})                                                         |41971408|42426094|454686| 1.08% |
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) |168184392|164545682|-3638710| -2.16% |

Reviewed By: bhosmer

Differential Revision: D25454632

fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d

# This is the commit message #68:

Add type annotations to _tensorboard_vis.py and hipify_python.py (#49834)

Summary:
closes gh-49833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49834

Reviewed By: mruberry

Differential Revision: D25725341

Pulled By: malfet

fbshipit-source-id: 7454c7afe07a3ff829826afe02aba05b7f649d9b

# This is the commit message #69:

Run test_type_hints first (#49748)

Summary:
Since it is sort of a linter check and fails frequently

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49748

Reviewed By: vkuzo

Differential Revision: D25682980

Pulled By: malfet

fbshipit-source-id: 7dba28242dced0277bad56dc887d3273c1e9e575

# This is the commit message #70:

Update update_s3_htmls.yml (#49934)

Summary:
It is now running for forks and generates a lot of failure messages for the owners of those forks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49934

Reviewed By: mruberry

Differential Revision: D25739552

Pulled By: seemethere

fbshipit-source-id: 0f9cc430316c0a5e9972de3cdd06d225528c81c2

# This is the commit message #71:

Improve `torch.flatten` docs and add tests to test_view_ops (#49501)

Summary:
Addresses https://github.com/pytorch/pytorch/issues/39474

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49501

Reviewed By: mrshenli

Differential Revision: D25740586

Pulled By: soulitzer

fbshipit-source-id: 3d7bdbab91eb208ac9e6832bb766d9d95a00c103

# This is the commit message #72:

move to non-legacy magma v2 headers (#49978)

Summary:
We recently (https://github.com/pytorch/pytorch/issues/7582) dropped magma v1 support, but we were still including the legacy compatibility headers and using functions only provided by them.
This changes the includes to the new magma_v2 header and fixes the triangular solve functions to use the v2-style magma_queue-using API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49978

Reviewed By: mrshenli

Differential Revision: D25752499

Pulled By: ngimel

fbshipit-source-id: 26d916bc5ce63978b341aefb072af228f140637d

# This is the commit message #73:

Enforce c10-fullness for all ops (#49619)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49619

This is a minimal-change PR that enforces that all operators are c10-full by making it the default.

This does not clean up any code yet, that will happen in PRs stacked on top. But this PR already ensures
that there are no non-c10-full ops left and there will be no non-c10-full ops introduced anymore.
ghstack-source-id: 119269182

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D25650198

fbshipit-source-id: efc53e884cb53193bf58a4834bf148453e689ea1

# This is the commit message #74:

.circleci: Ignore unbound variables for conda (#50053)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50053

For some reason conda likes to re-activate the conda environment when attempting this install
which means that a deactivate is run and some variables might not exist when that happens,
namely CONDA_MKL_INTERFACE_LAYER_BACKUP from libblas so let's just ignore unbound variables when
it comes to the conda installation commands

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D25760737

Pulled By: seemethere

fbshipit-source-id: 9e7720eb8a4f8028dbaa7bcfc304e5c1ca73ad08

# This is the commit message #75:

Construct CppSignatureGroup from NativeFunction (#49245)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49245

This will make it easier to implement the POC in
https://github.com/peterbell10/pytorch/commit/d534f7d4c555a37fd178c143098b8537a5a05d61
see also https://github.com/pytorch/pytorch/pull/45666

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594005

Pulled By: ezyang

fbshipit-source-id: e458d3dc3a765ec77425761b9b17f23769cecf9e

# This is the commit message #76:

Tighten up error checking on manual_kernel_registration (#49341)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49341

I noticed that #49097 was using manual_kernel_registration incorrectly,
so this diff tightens up the testing so that:

1. We don't generate useless wrapper functions when manual_kernel_registration
is on (it's not going to be registered, so it does nothing).

2. manual_kernel_registration shouldn't affect generation of functions in
Functions.h; if you need to stop bindings, use manual_cpp_binding

3. Structured and manual_kernel_registration are a hard error

4. We raise an error if you set dispatch and manual_kernel_registration at the
same time.
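
Checks 3 and 4 above amount to rejecting invalid flag combinations when an operator declaration is parsed. A rough sketch of that kind of validation, assuming a hypothetical `OpEntry` dataclass (the class and its field layout are illustrative, not the real codegen model):

```py
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OpEntry:
    # Hypothetical stand-in for one parsed operator declaration.
    name: str
    structured: bool = False
    manual_kernel_registration: bool = False
    dispatch: Optional[Dict[str, str]] = None

    def __post_init__(self) -> None:
        # Check 3: structured kernels cannot be manually registered.
        if self.manual_kernel_registration and self.structured:
            raise ValueError(
                f"{self.name}: structured and manual_kernel_registration are incompatible"
            )
        # Check 4: a dispatch table makes no sense without generated registration.
        if self.manual_kernel_registration and self.dispatch is not None:
            raise ValueError(
                f"{self.name}: cannot set both dispatch and manual_kernel_registration"
            )


OpEntry("ok_op", manual_kernel_registration=True)  # accepted

try:
    OpEntry("bad_op", manual_kernel_registration=True, dispatch={"CPU": "bad_op_cpu"})
except ValueError as err:
    print(err)
```

Failing at construction time means a bad declaration is reported before any code is generated from it.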

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594003

Pulled By: ezyang

fbshipit-source-id: 655b10e9befdfd8bc95f1631b2f48f995a31a59a

# This is the commit message #77:

codegen: Resolve overload ambiguities created by defaulted arguments (#49348)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348

This is a redux of #45666 post refactor, based off of
https://github.com/peterbell10/pytorch/commit/d534f7d4c555a37fd178c143098b8537a5a05d61
Credit goes to peterbell10 for the implementation.

Fixes #43945.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594004

Pulled By: ezyang

fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f

# This is the commit message #78:

Move default or no default logic into native.argument (#49489)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49489

Previously, it was done at a use site, but that meant other use
sites don't get the right logic.  Pushing it in makes sure everyone
gets it.

I also fixed one case of confusion where defn() was used to define a decl().
If you want to define a declaration with no defaults, say no_default().decl()
which is more direct and will give us code reviewers a clue if you should
have pushed this logic in.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25595407

Pulled By: ezyang

fbshipit-source-id: 89c664f0ed4d95699794a0d3123d54d0f7e4cba4

# This is the commit message #79:

Make use_c10_dispatcher: full mandatory for structured kernels (#49490)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49490

No reason to let people do the legacy thing for the brand new kernel.
This simplifies the codegen.  I have to port the two structured kernels
to this new format.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25595406

Pulled By: ezyang

fbshipit-source-id: b5931873379afdd0f3b00a012e0066af05de0a69

# This is the commit message #80:

Add trace batching forward/backward rule (#49979)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49979

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D25734379

Pulled By: ejguan

fbshipit-source-id: 8f9346afaf324e7ab17bafd6ecc97eed8442fd38

# This is the commit message #81:

[pytorch] add threshold_backward batching for vmap (#49881)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49881

title

Test Plan: pytest test/test_vmap.py -v -k "BatchedGrad"

Reviewed By: zou3519

Differential Revision: D25711289

fbshipit-source-id: f1856193249fda70da41e36e15bc26ea7966b510

# This is the commit message #82:

torch.xlogy: Use wrapped_scalar_tensor / gpu_with_scalars to speed up GPU kernel. (#49926)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49926

While investigating https://github.com/pytorch/pytorch/issues/49758, I changed the xlogy kernel to use the recommended wrapped_scalar_tensor pattern instead of moving the scalar to the GPU as a tensor.
While this doesn't avoid a synchronization (there is no synchronization in the move, as it's done via fill), this does significantly speed up the GPU kernel (almost 50%, benchmark in PR comments).

From looking at the nvprof output, it looks like this code path avoids broadcasting.  Aside: this seems unnecessary, as there is nothing special from the point-of-view of broadcasting whether the Tensor
is ()-sized or marked as a wrapped_scalar.  Still, this is a useful change to make as we avoid extra kernel launches and dispatches to create and fill the tensor.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25724215

Pulled By: gchanan

fbshipit-source-id: 4adcd5d8b3297502672ffeafc77e8af80592f460

# This is the commit message #83:

[BE] unified run_process_no_exception code (#49774)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49774

Reviewed By: janeyx99

Differential Revision: D25756811

Pulled By: walterddr

fbshipit-source-id: 4d2b3bd772572764ff96e5aad70323b58393e332

# This is the commit message #84:

prohibit assignment to a sparse tensor (#50040)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/48225 by prohibiting assignment to a sparse Tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50040

Reviewed By: mrshenli

Differential Revision: D25757125

Pulled By: zou3519

fbshipit-source-id: 3db6f48932eb10bf6ca5e97a6091afcabb60e478

# This is the commit message #85:

Suppress "statement is unreachable" warning (#49495)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49495

Compiling PyTorch currently generates a large number of warnings like this:
```
caffe2/aten/src/ATen/core/builtin_function.h(105): warning: statement is unreachable
```
The offending code
```
  std::string pretty_print_schema() const override {
    TORCH_INTERNAL_ASSERT(false);
    return "";
  }
```
has an unreachable return which prevents a "no return" warning.

We resolve the situation by using NVCC's pragma system to suppress this warning within this function.

Test Plan:
The warning appears when running:
```
buck build mode/dev-nosan //caffe2/torch/fb/sparsenn:test
```
As well as a number of other build commands.

Reviewed By: ngimel

Differential Revision: D25546542

fbshipit-source-id: 71cddd4fdb5fd16022a6d7b2daf0e6d55e6e90e2

# This is the commit message #86:

[ONNX] Handle Sub-block index_put in _jit_pass_onnx_remove_inplace_ops_for_onnx (#48734)

Summary:
For the added UT and existing UTs, this code is independent and ready for review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48734

Reviewed By: izdeby

Differential Revision: D25502677

Pulled By: bzinodev

fbshipit-source-id: 788b4eaa5e5e8b5df1fb4956fbd25928127bb199

# This is the commit message #87:

Don't inline intermediates on cpu (#49565)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49565

Test Plan: Imported from OSS

Reviewed By: Krovatkin, ZolotukhinM

Differential Revision: D25688271

Pulled By: eellison

fbshipit-source-id: 9ea7858e2db4fb31292e04440fc72ee04623c688

# This is the commit message #88:

Drop unused imports from scripts (#49956)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49956

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727347

fbshipit-source-id: 74d0a08aa0cfd0f492688a2b8278a0c65fd1deba

# This is the commit message #89:

Drop unused imports from leftovers (#49953)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49953

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727348

fbshipit-source-id: b3feef80b9b4b535f1bd4060dace5b1a50bd5e69

# This is the commit message #90:

Clean up some type annotations in caffe2/contrib/aten/gen_op (#49945)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49945

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25717502

fbshipit-source-id: 718d93e8614e9d050f4da1c6bd4ac892bab98154

# This is the commit message #91:

[ONNX] Modified var_mean symbolic to support more combinations of dims (#48949)

Summary:
Based on the existing implementation of var_mean, values of dim have to be sequential and start with zero. The formats listed below cause scenarios with incompatible dimensions for the Sub node.
-> dim[1, 2]
-> dim[0, 2]
-> dim[2, 0]

The changes in this PR allow such formats to be supported in var_mean

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48949

Reviewed By: houseroad

Differential Revision: D25540272

Pulled By: SplitInfinity

fbshipit-source-id: 59813a77ff076d138655cc8c17953358f62cf137

# This is the commit message #92:

introduce a flag to disable aten::cat in TE (#49579)

Summary:
introduce a flag to disable aten::cat in TE

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49579

Reviewed By: eellison

Differential Revision: D25763758

Pulled By: Krovatkin

fbshipit-source-id: c4f4a8220964813202369a3383057e77e7f10cb0

# This is the commit message #93:

Complex backward for indexing, slicing, joining, and mutating ops (#49552)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49552

This PR:
1. Migrates independent autograd test for `hstack`, `dstack`, `vstack`, `movedim`, `moveaxis` from `test_autograd.py` to the new `OpInfo` based tests.
2. Migrates autograd test for `gather`, `index_select` from the method_tests to the new `OpInfo` based tests.
3. Enables complex backward for `stack, gather, index_select, index_add_` and adds tests for complex autograd for all the above mentioned ops.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D25682511

Pulled By: anjali411

fbshipit-source-id: 5d8f89db4a9ec340ab99a6196987d44a23e2c6c6

# This is the commit message #94:

[FX] fix Graph python_code return type annotation (#49931)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49931

This fixes #49932. The `maybe_return_annotation` was not being passed by reference, so it was never getting modified.
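
The underlying Python behaviour is easy to reproduce: rebinding a string parameter inside a function never affects the caller, whereas mutating a shared container does. A minimal sketch with hypothetical function names (this is not the actual FX code):

```py
from typing import List


def emit_by_value(annotation: str) -> None:
    # Rebinds the local name only; the caller's string is untouched.
    annotation = " -> int"


def emit_by_reference(annotation: List[str]) -> None:
    # Mutates the shared list, so the caller sees the update.
    annotation[0] = " -> int"


plain = ""
emit_by_value(plain)

boxed = [""]
emit_by_reference(boxed)

print(repr(plain), repr(boxed[0]))
```

Wrapping the value in a one-element list is one common way to get by-reference semantics for an immutable type such as `str`.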

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25725582

Pulled By: esqu1

fbshipit-source-id: 4136ff169a269d6b98f0b8e14d95d19e7c7cfa71

# This is the commit message #95:

[TensorExpr] Fix LLVM 10 build after LLVM API changes

Summary: Use `llvm::CodeGenFileType` for llvm-10+

Test Plan: local build

Reviewed By: asuhan

Differential Revision: D25694990

fbshipit-source-id: c35d973ef2669929715a94da5dd46e4a0457c4e8

# This is the commit message #96:

unit test for fc parallelization aot (#50056)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50056

buck test //caffe2/caffe2/contrib/fakelowp/test:test_chunkingnnpi -- --fallback-classic

Test Plan: https://our.intern.facebook.com/intern/testinfra/testrun/7036874446100155

Reviewed By: venkatacrc

Differential Revision: D25731079

fbshipit-source-id: 4aa4ffc641659cd90bf4670d28cb43e43ae76dcd

# This is the commit message #97:

Fix return value of _vmap_internals._get_name (#49951)

Summary:
This appears to have been a copy-paste error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49951

Reviewed By: mrshenli

Differential Revision: D25757099

Pulled By: zou3519

fbshipit-source-id: e47cc3b0694645bd0025326bfe45852ef0266adf

# This is the commit message #98:

Fix grammar typo in readme.md (#50000)

Summary:
missing `

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50000

Reviewed By: ezyang

Differential Revision: D25759608

Pulled By: mrshenli

fbshipit-source-id: 4dbe06b8978ae5b2b9b66cde163dab4bd8ee2257

# This is the commit message #99:

Fixing error in Readme.md. (#50033)

Summary:
Fix incorrect command in readme.
Fix incorrect url in readme.
Add url for dockerfile.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50033

Reviewed By: ezyang

Differential Revision: D25759567

Pulled By: mrshenli

fbshipit-source-id: 2a3bc88c8717a3890090ddd0d6657f49d14ff05a

# This is the commit message #100:

Revert D25763758: [pytorch][PR] introduce a flag to disable aten::cat in TE

Test Plan: revert-hammer

Differential Revision:
D25763758 (https://github.com/pytorch/pytorch/commit/9e0b4a96e48132190220820684033a77a92e8a33)

Original commit changeset: c4f4a8220964

fbshipit-source-id: 98775ad9058b81541a010e646b0cf4864854be3e

# This is the commit message #101:

Patch death tests/fork use after D25292667 (part 3)

Summary: (Note: this ignores all push blocking failures!)

Test Plan: unit tests

Differential Revision: D25775357

fbshipit-source-id: 0ae3c59181bc123d763ed9c0d05c536998ae5ca0

# This is the commit message #102:

fixes indices computation for trilinear interpolate backwards (#50084)

Summary:
https://github.com/pytorch/pytorch/issues/48675 had some typos in indices computations so that results for trilinear interpolation where height is not equal to width were wrong. This PR fixes it.
cc xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50084

Reviewed By: BIT-silence

Differential Revision: D25777083

Pulled By: ngimel

fbshipit-source-id: 71be545628735fe875b7ea30bf6a09df4f2fae5c

# This is the commit message #103:

Run mypy on more test files (#49658)

Summary:
Improves one annotation for `augment_model_with_bundled_inputs`

Also add a comment to not work on caffe2 type annotations, that's not worth the effort - those ignores can stay as they are.

xref gh-16574

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49658

Reviewed By: heitorschueroff

Differential Revision: D25757721

Pulled By: ezyang

fbshipit-source-id: 44c396d8da9ef3f41b97f9c46a528f0431c4b463

# This is the commit message #104:

Run mypy over test/test_utils.py (#49654)

Summary:
This caught one incorrect annotation in `cpp_extension.load`.

xref gh-16574.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49654

Reviewed By: heitorschueroff

Differential Revision: D25757691

Pulled By: ezyang

fbshipit-source-id: 145ce3ae532cc585d9ca3bbd5381401bad0072e2

# This is the commit message #105:

quant: ensure observers do not crash for empty Tensors (#49800)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49800

Ensures that having a Tensor with 0 elements does not crash observers.
Note: it's illegal to pass Tensors with 0 elements to reductions such
as min and max, so we gate this out before the logic hits min/max.

This should not be hit often in practice, but it's coming up
during debugging of some RCNN models with test inputs.

Test Plan:
```
python test/test_quantization.py TestObserver.test_zero_numel
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25693230

fbshipit-source-id: d737559697c98bd923356edacba895835060bb38

# This is the commit message #106:

quant: nice error message on convtranspose with per-channel weight (#49899)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49899

Per channel weights observer in conv transpose is not supported yet.  Adding an
error message which fails instantly instead of making the user wait until after
calibration/training finishes.

Test Plan:
```
python test/test_quantization.py TestPostTrainingStatic.test_convtranspose_per_channel_fails_early
python test/test_quantization.py TestQuantizeFx.test_convtranspose_per_channel_fails_early
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25717151

fbshipit-source-id: 093e5979030ec185e3e0d56c45d7ce7338bf94b6

# This is the commit message #107:

quant: throw a nice error message for allclose with quantized inputs (#49802)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49802

Currently `torch.allclose` is not supported with quantized inputs.
Throw a nice error message instead of a cryptic one.

Test Plan:
```
torch.allclose(x_fp32, y_fp32)

torch.allclose(x_int8, y_int8)
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D25693538

fbshipit-source-id: 8958628433adfca3ae6ce215f3e3ec3c5e29994c

# This is the commit message #108:

eager quant: fix error with removing forward hooks (#49813)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49813

https://github.com/pytorch/pytorch/issues/49739 reports a crash
where removing forward hooks results in a

```
RuntimeError: OrderedDict mutated during iteration
```

Unfortunately I cannot repro this inside the PyTorch module, but the issue
author has a good point and we should not mutate the dict inside
of the iteration.
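
The failure mode can be shown without PyTorch. A minimal sketch, assuming a hypothetical `hooks` registry standing in for a module's forward hooks: deleting entries while iterating over the dict itself raises, while iterating over a snapshot of the keys does not.

```py
from collections import OrderedDict

# Hypothetical hook registry standing in for a module's forward hooks.
hooks = OrderedDict(a=1, b=2, c=3)

# Unsafe: deleting while iterating over the dict itself.
raised = False
try:
    for key in hooks:
        del hooks[key]
except RuntimeError as err:
    raised = True
    print("unsafe removal failed:", err)

# Safe: iterate over a snapshot of the keys, then delete from the dict.
for key in list(hooks):
    del hooks[key]

print("remaining hooks:", len(hooks))
```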

Test Plan:
```
// test plan from https://github.com/pytorch/pytorch/pull/46871 which
// originally added this
python test/test_quantization.py TestEagerModeQATOps
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25698725

fbshipit-source-id: 13069d0d5017a84038c8f7be439a3ed537938ac6

# This is the commit message #109:

[JIT] Remove buffer metadata serialization forward-compat gate (#49990)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49990

**Summary**
This commit removes the forward-compatibility gate for buffer metadata
serialization. It was introduced to allow versions of fbcode
binaries statically linked against older versions of PyTorch (without
buffer metadata in JIT) to deserialize archives produced by new versions
of PyTorch. Enough time has probably passed that these old binaries
don't exist anymore, so it should be safe to remove the gate.

**Test Plan**
Internal tests.

Test Plan: Imported from OSS

Reviewed By: xw285cornell

Differential Revision: D25743199

Pulled By: SplitInfinity

fbshipit-source-id: 58d82ab4362270b309956826e36c8bf9d620f081

# This is the commit message #110:

Add an option to disable aten::cat in TE (re-revert) (#50101)

Summary:
This reverts commit ace78ddb6a2bdbf03f08c69767eba57306dd69ed.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50101

Reviewed By: eellison

Differential Revision: D25784785

Pulled By: Krovatkin

fbshipit-source-id: cbb3d377e03303f6c8c71f4c59c6d90ab40d55f7

# This is the commit message #111:

[distributed] Provide parameter to pass GPU ID in barrier function (#49069)

Summary:
For a multi GPU node, rank and corresponding GPU mapping can be different.
Provide optional parameter to specify the GPU device number for the
allreduce operation in barrier function.

Add test cases to validate barrier device_ids.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Fixes https://github.com/pytorch/pytorch/issues/48110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49069

Reviewed By: mrshenli

Differential Revision: D25658528

Pulled By: rohan-varma

fbshipit-source-id: 418198b6224c8c1fd95993b80c072a8ff8f02eec

# This is the commit message #112:

[RPC] Relax some profiling tests (#49983)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49983

We have observed very rare flakiness in some profiling tests recently,
i.e.: . However, we were not able to reproduce these even with thousands of
runs on the CI machines where the failure was originally reported. As a result,
relaxing these tests and re-enabling them to reduce failure rates.
ghstack-source-id: 119352019

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D25739416

fbshipit-source-id: 4dbb6b30f20d3af94ba39f4a7ccf4fb055e440bc

# This is the commit message #113:

support building with conda installed libraries (#50080)

Summary:
This should fix a bunch of shared library compilation errors when libraries are installed in the conda lib or lib64 folders.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50080

Reviewed By: seemethere

Differential Revision: D25781923

Pulled By: walterddr

fbshipit-source-id: 78a74925981d65243b98bb99a65f1f2766e87a2f

# This is the commit message #114:

Fix store based barrier to only use 'add'. (#49930)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49930

Certain store implementations don't work well when we use get() and
add() on the same key. To avoid this issue, we only use add() in the store
based barrier. The buggy store implementations can't be properly fixed due to
legacy reasons.
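
An add-only barrier can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `InMemoryStore` and `store_barrier` are hypothetical names, and threads stand in for processes. Each worker announces itself with `add(key, 1)` and then polls with `add(key, 0)`, which returns the current count without ever calling `get()`.

```py
import threading
import time
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a key-value store that exposes only add()."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, int] = {}

    def add(self, key: str, amount: int) -> int:
        # Atomically increment the counter and return the new value.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]


def store_barrier(store: InMemoryStore, key: str, world_size: int,
                  timeout: float = 5.0) -> None:
    store.add(key, 1)  # announce arrival
    deadline = time.monotonic() + timeout
    # add(key, 0) reads the counter, so get() is never needed on this key.
    while store.add(key, 0) < world_size:
        if time.monotonic() > deadline:
            raise TimeoutError("barrier timed out")
        time.sleep(0.01)


store = InMemoryStore()
arrived: List[int] = []


def worker(rank: int) -> None:
    store_barrier(store, "barrier", world_size=4)
    arrived.append(rank)


threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(arrived))
```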

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: osalpekar

Differential Revision: D25725386

fbshipit-source-id: 1535e2629914de7f78847b730f8764f92cde67e7

# This is the commit message #115:

[caffe2][a10] Move down pragma pop to properly suppress warning 4522 (#49233)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49233

As the comments on line 160 say, we should suppress this overly aggressive warning with MSVC:
```
caffe2\tensorbody.h_ovrsource#header-mode-symlink-tree-only,headers\aten\core\tensorbody.h(1223): warning C4522: 'at::Tensor': multiple assignment operators specified
```

However, in order to remove the warning, the closing brace of the class must be between the `#pragma warning` push and its corresponding pop. Move the pop down to ensure that.

Test Plan: Built locally using clang for Windows without buck cache, confirmed the warning resolved

Reviewed By: bhosmer

Differential Revision: D25422447

fbshipit-source-id: c1e1c66fb8513af5f9d4e3c1dc48d0070c4a1f84

# This is the commit message #116:

Drop unused imports from caffe2/python (#49980)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49980

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727359

fbshipit-source-id: c4f60005b10546423dc093d31d46deb418352286

# This is the commit message #117:

Update MultiHeadAttention docstring (#49950)

Summary:
Fixes MultiHeadAttention docstring.

Currently, https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html#torch.nn.MultiheadAttention
is

<img width="648" alt="Screen Shot 2020-12-29 at 21 06 43" src="https://user-images.githubusercontent.com/2459423/103311124-cd10cc00-4a19-11eb-89c9-0ee261364963.png">

and with the fix will be

<img width="648" alt="Screen Shot 2020-12-29 at 22 41 35" src="https://user-images.githubusercontent.com/2459423/103315838-0dc31200-4a27-11eb-82e2-ca8f13d713a1.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49950

Reviewed By: mrshenli

Differential Revision: D25732573

Pulled By: zhangguanheng66

fbshipit-source-id: b362f3f617ab26b0dd25c3a0a7d4117e522e620c

# This is the commit message #118:

Revert D25757691: [pytorch][PR] Run mypy over test/test_utils.py

Test Plan: revert-hammer

Differential Revision:
D25757691 (https://github.com/pytorch/pytorch/commit/c86cfcd81da46b5e8226441edb58f0b11a97f215)

Original commit changeset: 145ce3ae532c

fbshipit-source-id: 3dfd68f0c42fc074cde15c6213a630b16e9d8879

# This is the commit message #119:

Enable distribution validation if __debug__ (#48743)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/47123
Follows https://github.com/pyro-ppl/pyro/pull/2701

This turns on `Distribution` validation by default. The motivation is to favor beginners by providing helpful error messages. Advanced users focused on speed can disable validation by calling
```py
torch.distributions.Distribution.set_default_validate_args(False)
```
or by disabling individual distribution validation via `MyDistribution(..., validate_args=False)`.

In practice I have found many beginners forget or do not know about validation. Therefore I have [enabled it by default](https://github.com/pyro-ppl/pyro/pull/2701) in Pyro. I believe PyTorch could also benefit from this change. Indeed validation caught a number of bugs in `.icdf()` methods, in tests, and in PPL benchmarks, all of which have been fixed in this PR.
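
The default-on behaviour described above can be pictured with a small stand-alone sketch. It does not use PyTorch: `ToyDistribution` and `ToyBernoulli` are hypothetical stand-ins that only mimic the pattern of a class-level default taken from `__debug__`, a class-wide setter, and a per-instance `validate_args` override.

```python
class ToyDistribution:
    """Hypothetical stand-in for a distribution base class (not PyTorch code)."""

    # Class-wide default: validation is on unless Python runs with -O,
    # in which case __debug__ is False.
    _validate_args = __debug__

    @staticmethod
    def set_default_validate_args(value: bool) -> None:
        # Changes the default seen by every subclass that has not overridden it.
        ToyDistribution._validate_args = value

    def __init__(self, validate_args=None):
        # A per-instance argument overrides the class-wide default.
        if validate_args is not None:
            self._validate_args = validate_args


class ToyBernoulli(ToyDistribution):
    def __init__(self, p, validate_args=None):
        super().__init__(validate_args)
        # Validation costs a check per construction, which is the trade-off
        # the commit message mentions for speed-focused users.
        if self._validate_args and not 0.0 <= p <= 1.0:
            raise ValueError(f"p must lie in [0, 1], got {p}")
        self.p = p
```

With the default left on, `ToyBernoulli(1.5)` raises a `ValueError`, while `ToyBernoulli(1.5, validate_args=False)` is accepted silently.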

## Release concerns
- This may slightly slow down some models. Concerned users may disable validation.
- This may cause new `ValueErrors` in models that rely on unsupported behavior, e.g. `Categorical.log_prob()` applied to continuous-valued tensors (only {0,1}-valued tenso…


M.L. Croci committed Jan 13, 2021

1 parent 5171bd9 commit a3763c7

.circleci/config.yml

-Original file line number
+Diff line change
@@ Expand Up / @@ -11,6 +11,9 @@ parameters: @@
       run_binary_tests:
         type: boolean
         default: false
+      run_build:
+        type: boolean
+        default: true
     docker_config_defaults: &docker_config_defaults
       user: jenkins
@@ Expand Down Expand Up / @@ -9762,6 +9765,7 @@ workflows: @@
                   only:
                     - postnightly
               executor: windows-with-nvidia-gpu
+        when: << pipeline.parameters.run_build >>
       ecr_gc:
         triggers:
           - schedule:
@@ Expand Down @@

.circleci/generate_config_yml.py

-Original file line number
+Diff line change
@@ Expand Up / @@ -112,7 +112,10 @@ def gen_build_workflows_tree(): @@
                     "when": r"<< pipeline.parameters.run_binary_tests >>",
                     "jobs": [f() for f in binary_build_functions],
                 },
-                "build": {"jobs": [f() for f in build_workflows_functions]},
+                "build": {
+                    "when": r"<< pipeline.parameters.run_build >>",
+                    "jobs": [f() for f in build_workflows_functions]
+                },
             }
         }
@@ Expand Down @@
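
How the new `when` key on the `build` workflow might interact with a label that switches `run_build` off can be sketched in a few lines. This is an illustration only: `resolve_when` and `apply_label_overrides` are hypothetical helpers, not part of CircleCI or the PyTorch tooling, and they assume that `when` holds a single `<< pipeline.parameters.NAME >>` reference and that `set_to_false` simply forces the listed parameters to `False`.

```python
import re

# Matches a lone pipeline-parameter reference such as
# "<< pipeline.parameters.run_build >>".
PARAM_REF = re.compile(r"<<\s*pipeline\.parameters\.(\w+)\s*>>")


def resolve_when(workflow: dict, params: dict) -> bool:
    """Return True if the workflow would run under the given parameters."""
    when = workflow.get("when")
    if when is None:
        # No condition: the workflow always runs.
        return True
    match = PARAM_REF.fullmatch(when.strip())
    if match is None:
        raise ValueError(f"unsupported 'when' expression: {when!r}")
    return bool(params[match.group(1)])


def apply_label_overrides(defaults: dict, set_to_false: list) -> dict:
    """Copy the defaults and force every listed parameter to False (assumed semantics)."""
    params = dict(defaults)
    for name in set_to_false:
        params[name] = False
    return params


# Defaults taken from the parameters block in this commit.
defaults = {"run_binary_tests": False, "run_build": True}
build = {"when": r"<< pipeline.parameters.run_build >>", "jobs": []}
```

Under these assumptions `resolve_when(build, defaults)` is true, and it becomes false once `apply_label_overrides(defaults, ["run_build"])` is used instead.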

.circleci/scripts/binary_linux_test.sh

-Original file line number
+Diff line change
@@ Expand Up / @@ -51,7 +51,14 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then @@
         else
           cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
         fi
-        retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch "cudatoolkit=\${cu_ver}"
+        (
+          # For some reason conda likes to re-activate the conda environment when attempting this install
+          # which means that a deactivate is run and some variables might not exist when that happens,
+          # namely CONDA_MKL_INTERFACE_LAYER_BACKUP from libblas so let's just ignore unbound variables when
+          # it comes to the conda installation commands
+          set +u
+          retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch "cudatoolkit=\${cu_ver}"
+        )
       fi
     elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
       pip install "\$pkg"
@@ Expand Down @@

.circleci/scripts/windows_cuda_install.sh

            
-Original file line number
+Diff line change
@@ -1,11 +1,13 @@
     #!/bin/bash
     set -eux -o pipefail
-    if [[ "$CUDA_VERSION" =~ ^10.* ]]; then
+    cuda_major_version=${CUDA_VERSION%.*}
+    if [[ "$cuda_major_version" == "10" ]]; then
         cuda_installer_name="cuda_10.1.243_426.00_win10"
         msbuild_project_dir="CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
         cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
-    elif [[ "$CUDA_VERSION" =~ ^11.* ]]; then
+    elif [[ "$cuda_major_version" == "11" ]]; then
         cuda_installer_name="cuda_11.1.0_456.43_win10"
         msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
         cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
@@ -14,7 +16,7 @@ else
         exit 1
     fi
-    if [[ "$CUDA_VERSION" =~ ^11.* && "${JOB_EXECUTOR}" == "windows-with-nvidia-gpu" ]]; then
+    if [[ "$cuda_major_version" == "11" && "${JOB_EXECUTOR}" == "windows-with-nvidia-gpu" ]]; then
         cuda_install_packages="${cuda_install_packages} Display.Driver"
     fi
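
The switch from a regex test on `$CUDA_VERSION` to an exact comparison on `${CUDA_VERSION%.*}` can be explored with a small Python model. The two functions below are hypothetical helpers written for illustration; they assume the usual bash semantics, where `%.*` removes the shortest suffix that starts at a dot and `=~ ^10.*` is an unanchored-at-the-end regex match.

```python
import re


def strip_last_dot_suffix(version: str) -> str:
    """Model of bash ${version%.*}: drop the shortest suffix beginning with a dot."""
    head, sep, _ = version.rpartition(".")
    # Without any dot the pattern does not match and the value is unchanged.
    return head if sep else version


def old_regex_check(version: str, major: str) -> bool:
    """Model of the removed test, e.g. [[ "$CUDA_VERSION" =~ ^10.* ]] for major "10"."""
    return re.search(rf"^{major}.*", version) is not None


for candidate in ["10.1", "11.1", "11", "10.1.243", "101.0"]:
    print(candidate, strip_last_dot_suffix(candidate), old_regex_check(candidate, "10"))
```

In this model a two-component value such as `11.1` reduces to `11`, whereas a three-component value such as `10.1.243` reduces to `10.1` and would not equal `"10"`. The old pattern, by contrast, also accepts strings like `101.0`. Which formats CI actually passes in `CUDA_VERSION` is not stated in the commit.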

.circleci/scripts/windows_cudnn_install.sh

            
-Original file line number
+Diff line change
@@ -1,9 +1,11 @@
     #!/bin/bash
     set -eux -o pipefail
-    if [[ "$CUDA_VERSION" =~ ^10.* ]]; then
+    cuda_major_version=${CUDA_VERSION%.*}
+    if [[ "$cuda_major_version" == "10" ]]; then
         cudnn_installer_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.4.38"
-    elif [[ "$CUDA_VERSION" =~ ^11.* ]]; then
+    elif [[ "$cuda_major_version" == "11" ]]; then
         cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.0.5.39"
     else
         echo "CUDNN for CUDA_VERSION $CUDA_VERSION is not supported yet"

.circleci/verbatim-sources/header-section.yml

-Original file line number
+Diff line change
@@ Expand Up / @@ -11,6 +11,9 @@ parameters: @@
       run_binary_tests:
         type: boolean
         default: false
+      run_build:
+        type: boolean
+        default: true
     docker_config_defaults: &docker_config_defaults
       user: jenkins
@@ Expand Down @@

.github/pytorch-circleci-labels.yml

-Original file line number
+Diff line change
@@ Expand Up / @@ -9,3 +9,5 @@ labels_to_circle_params: @@
             - release/.*
           tags:
             - v[0-9]+(\.[0-9]+)*-rc[0-9]+
+        set_to_false:
+          - run_build

.github/workflows/update_s3_htmls.yml

-Original file line number
+Diff line change
@@ Expand Up / @@ -9,6 +9,7 @@ on: @@
     jobs:
       update-html:
         runs-on: ubuntu-latest
+        if: ${{ github.repository_owner == 'pytorch' }}
         strategy:
           matrix:
             prefix: ["whl", "whl/test", "whl/nightly"]
@@ Expand Down @@

.gitignore

-Original file line number
+Diff line change
@@ Expand Up / @@ -10,6 +10,7 @@ @@
     .coverage
     coverage.xml
+    .dmypy.json
     .gradle
     .hypothesis
     .mypy_cache
@@ Expand Down @@

.jenkins/pytorch/README.md

            
-Original file line number
+Diff line change
@@ -10,9 +10,9 @@ it is very easy to run these tests yourself:
        ``registry.pytorch.org/pytorch/pytorch-$BUILD_ENVIRONMENT:$DOCKER_VERSION``,
        where ``$BUILD_ENVIRONMENT`` is one of the build environments
        enumerated in
-       [pytorch-dockerfiles](https://github.com/pietern/pytorch-dockerfiles/blob/master/build.sh)
+       [pytorch-dockerfiles](https://github.com/pytorch/pytorch/blob/master/.circleci/docker/build.sh). The dockerfile used by jenkins can be found under the `.circle` [directory](https://github.com/pytorch/pytorch/blob/master/.circleci/docker)
-    2. Run ``docker -it -u jenkins $DOCKER_IMAGE``, clone PyTorch and
+    2. Run ``docker run -it -u jenkins $DOCKER_IMAGE``, clone PyTorch and
        run one of the scripts in this directory.
     The Docker images are designed so that any "reasonable" build commands
@@ -38,5 +38,5 @@ mechanisms we use:
       build scripts.
     - We reroute well known paths like `/usr/bin/gcc` to alternate
-      implementations with `update-alternatives, instead of setting
+      implementations with `update-alternatives`, instead of setting
       `CC` and `CXX` in our implementations.

.jenkins/pytorch/codegen-test.sh

-Original file line number
+Diff line change
@@ Expand Up / @@ -48,13 +48,6 @@ python -m tools.autograd.gen_autograd \ @@
       "$OUT"/autograd \
       tools/autograd
-    # unboxing_wrappers codegen (called by torch codegen but can run independently)
-    mkdir -p "$OUT"/unboxing_wrappers
-    python -m tools.jit.gen_unboxing_wrappers \
-      "$OUT"/torch/share/ATen/Declarations.yaml \
-      "$OUT"/unboxing_wrappers \
-      tools/jit/templates
     # annotated_fn_args codegen (called by torch codegen but can run independently)
     mkdir -p "$OUT"/annotated_fn_args
     python -m tools.autograd.gen_annotated_fn_args \
@@ Expand Down @@

.jenkins/pytorch/macos-test.sh

-Original file line number
+Diff line change
@@ Expand Up @@
     # TODO move this to docker
     pip install unittest-xml-reporting pytest
-    # faulthandler become built-in since 3.3
-    if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
-      pip install -q faulthandler
-    fi
     if [ -z "${IN_CI}" ]; then
       rm -rf ${WORKSPACE_DIR}/miniconda3/lib/python3.6/site-packages/torch*
     fi
@@ Expand Down @@

.jenkins/pytorch/win-test-helpers/setup_pytorch_env.bat

-Original file line number
+Diff line change
@@ Expand Up / @@ -41,8 +41,6 @@ popd @@
     :: The version is fixed to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
     pip install "ninja==1.10.0.post1" future "hypothesis==4.53.2" "librosa>=0.6.2" psutil pillow unittest-xml-reporting pytest coverage
     if %errorlevel% neq 0 ( exit /b %errorlevel% )
-    :: No need to install faulthandler since we only test Python >= 3.6 on Windows
-    :: faulthandler is builtin since Python 3.3
     set DISTUTILS_USE_SDK=1
@@ Expand Down @@

BUILD.bazel

-Original file line number
+Diff line change
@@ Expand Up / @@ -193,9 +193,6 @@ libtorch_cpp_generated_sources = [ @@
             "torch/csrc/autograd/generated/Functions.h",
             "torch/csrc/autograd/generated/Functions.cpp",
             "torch/csrc/autograd/generated/variable_factories.h",
-            "torch/csrc/jit/generated/generated_unboxing_wrappers_0.cpp",
-            "torch/csrc/jit/generated/generated_unboxing_wrappers_1.cpp",
-            "torch/csrc/jit/generated/generated_unboxing_wrappers_2.cpp",
     ]
     libtorch_python_generated_sources = [
@@ Expand Down @@

CMakeLists.txt

-Original file line number
+Diff line change
@@ Expand Up / @@ -173,6 +173,8 @@ option(USE_NATIVE_ARCH "Use -march=native" OFF) @@
     cmake_dependent_option(
         USE_NCCL "Use NCCL" ON
         "USE_CUDA OR USE_ROCM;UNIX;NOT APPLE" OFF)
+    cmake_dependent_option(USE_RCCL "Use RCCL" ON
+        USE_NCCL OFF)
     cmake_dependent_option(
         USE_STATIC_NCCL "Use static NCCL" OFF
         "USE_NCCL" OFF)
@@ Expand Down Expand Up / @@ -316,7 +318,7 @@ set(OP_DEPENDENCY "" CACHE STRING @@
     # symbol lookup error: miniconda3/envs/pytorch-py3.7/lib/libmkl_intel_lp64.so: undefined symbol: mkl_blas_dsyrk
     # https://software.intel.com/en-us/articles/symbol-lookup-error-when-linking-intel-mkl-with-gcc-on-ubuntu
     if(LINUX)
-      set(CMAKE_SHARED_LINKER_FLAGS "-Wl,--no-as-needed")
+      set(CMAKE_SHARED_LINKER_FLAGS "-Wl,--no-as-needed ${CMAKE_SHARED_LINKER_FLAGS}")
     endif()
     if(MSVC)
@@ Expand Down @@

Dockerfile

-Original file line number
+Diff line change
@@ Expand Up @@
     RUN /opt/conda/bin/pip install torchelastic
     FROM ${BASE_IMAGE} as official
+    ARG PYTORCH_VERSION
     LABEL com.nvidia.volumes.needed="nvidia_driver"
     RUN --mount=type=cache,id=apt-final,target=/var/cache/apt \
         apt-get update && apt-get install -y --no-install-recommends \
@@ Expand All / @@ -71,6 +72,7 @@ ENV PATH /opt/conda/bin:$PATH @@
     ENV NVIDIA_VISIBLE_DEVICES all
     ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
     ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
+    ENV PYTORCH_VERSION ${PYTORCH_VERSION}
     WORKDIR /workspace
     FROM official as dev
@@ Expand Down @@

android/test_app/app/src/main/AndroidManifest.xml

-Original file line number
+Diff line change
@@ Expand Up / @@ -18,4 +18,10 @@ @@
         </application>
         <uses-permission android:name="android.permission.CAMERA" />
+        <!--
+         Permissions required by the Snapdragon Profiler to collect GPU metrics.
+        -->
+        <uses-permission android:name="android.permission.INTERNET" />
+        <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
     </manifest>

aten/conda/meta.yaml

-Original file line number
+Diff line change
@@ Expand Up / @@ -24,7 +24,7 @@ requirements: @@
         - mkl # [not osx]
     about:
-      home: https://github.com/zdevito/ATen
+      home: https://github.com/pytorch/pytorch
       license: BSD
       summary: A TENsor library for C++14
@@ Expand Down @@

aten/src/ATen/ATen.h

-Original file line number
+Diff line change
@@ Expand Up / @@ -31,3 +31,4 @@ @@
     #include <c10/util/Exception.h>
     #include <ATen/core/UnsafeFromTH.h>
     #include <ATen/core/ivalue.h>
+    #include <ATen/core/jit_type.h>

aten/src/ATen/BatchingRegistrations.cpp

-Original file line number
+Diff line change
@@ Expand Up @@
       return self_physical.getPhysicalToLogicalMap().apply(result);
     }
+    Tensor trace_batching_rule(const Tensor& self) {
+      auto self_physical = MultiBatchVmapTransform::logicalToPhysical(self);
+      // Batched Diagonal View
+      auto self_diag = at::diagonal(self_physical.tensor(), /*offset*/0, /*dim1*/-2, /*dim2*/-1);
+      auto result =  at::sum(self_diag, -1);
+      return self_physical.getPhysicalToLogicalMap().apply(result);
+    }
+    Tensor trace_backward_batching_rule(const Tensor& grad, IntArrayRef input_sizes) {
+      auto grad_physical = MultiBatchVmapTransform::logicalToPhysical(grad);
+      auto grad_input = at::zeros(grad_physical.getPhysicalShape(input_sizes), grad.options());
+      // Batched Diagonal View
+      auto grad_input_diag = at::diagonal(grad_input, /*offset*/0, /*dim1*/-2, /*dim2*/-1);
+      // Append a dimension of size one to the grad output
+      auto grad_physical_tensor = grad_physical.tensor().unsqueeze(-1);
+      grad_input_diag.copy_(grad_physical_tensor);
+      return grad_physical.getPhysicalToLogicalMap().apply(grad_input);
+    }
     Tensor transpose_int_batching_rule(const Tensor& self, int64_t dim0, int64_t dim1) {
       // PyTorch has a special case where scalar_tensor.transpose(dim0, dim1) works
       // for dim0, dim1 in {0, -1} and returns the scalar tensor. If the following happens:
@@ Expand Down Expand Up / @@ -996,7 +1015,7 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { @@
       m.impl("_add_batch_dim", native::_add_batch_dim);
       m.impl("_remove_batch_dim", native::_remove_batch_dim);
-      m.impl_UNBOXED("sum.dim_IntList", sum_batching_rule);
+      m.impl("sum.dim_IntList", sum_batching_rule);
       m.impl("is_complex", native::is_complex);
       m.impl("conj", native::conj);
@@ Expand Down Expand Up / @@ -1029,6 +1048,7 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { @@
       m.impl("squeeze", squeeze_batching_rule);
       m.impl("squeeze.dim", squeeze_dim_batching_rule);
       m.impl("t", native::t); // composite wrt autograd
+      m.impl("trace", trace_batching_rule);
       m.impl("transpose.int", transpose_int_batching_rule);
       m.impl("unbind.int", unbind_batching_rule);
       m.impl("unfold", unfold_batching_rule);
@@ Expand Down Expand Up / @@ -1089,6 +1109,7 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { @@
     #undef TO_BATCHING_RULE
       m.impl("clone", clone_batching_rule);
+      using TensorTensorScalarType = Tensor (*)(const Tensor&, const Tensor&, Scalar);
       using TensorTensorType = Tensor (*)(const Tensor&, const Tensor&);
       using TensorScalarType = Tensor (*)(const Tensor&, Scalar);
@@ Expand All / @@ -1115,6 +1136,12 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { @@
       m.impl("pow.Scalar", pow_scalar_Tensor_batching_rule);
       m.impl("sigmoid_backward", binary_pointwise_batching_rule<TensorTensorType, at::sigmoid_backward>);
+      m.impl(
+          "threshold_backward",
+          binary_pointwise_batching_rule<
+              TensorTensorScalarType,
+              at::threshold_backward,
+              Scalar>);
       // for at::result_type, call the native::result_type implementation.
       // We don't have to do anything special because native::result_type operates
@@ Expand Down Expand Up / @@ -1150,6 +1177,7 @@ TORCH_LIBRARY_IMPL(aten, Batched, m) { @@
       // backward operators
       m.impl("select_backward", select_backward_batching_rule);
       m.impl("slice_backward", slice_backward_batching_rule);
+      m.impl("trace_backward", trace_backward_batching_rule);
       m.impl("diagonal_backward", diagonal_backward_batching_rule);
       // Tensor.new_* operators
@@ Expand Down @@
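
The idea behind `trace_batching_rule` and `trace_backward_batching_rule` (take the diagonal over the last two dimensions and sum it; in the backward pass, copy each incoming gradient onto the diagonal of a zero matrix) can be sketched without ATen. The version below is a plain-Python illustration with nested lists standing in for tensors and a single leading batch dimension; the function names are hypothetical and nothing here calls PyTorch.

```python
def batched_trace(batch):
    """For each matrix in the batch, sum the entries on its main diagonal."""
    return [
        sum(m[i][i] for i in range(min(len(m), len(m[0]))))
        for m in batch
    ]


def batched_trace_backward(grads, rows, cols):
    """For each scalar gradient, build a rows x cols zero matrix and
    write the gradient onto its main diagonal."""
    out = []
    for g in grads:
        m = [[0.0] * cols for _ in range(rows)]
        for i in range(min(rows, cols)):
            m[i][i] = g
        out.append(m)
    return out


batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(batched_trace(batch))
print(batched_trace_backward([2.0, 3.0], 2, 2))
```

The real rules work on the physical tensor produced by `MultiBatchVmapTransform::logicalToPhysical`, so they handle any number of batch dimensions at once; the loop over `batch` here only stands in for that.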

aten/src/ATen/CMakeLists.txt

-Original file line number
+Diff line change
@@ Expand Up / @@ -72,7 +72,7 @@ file(GLOB metal_h "metal/*.h") @@
     file(GLOB metal_cpp "metal/*.cpp")
     file(GLOB_RECURSE native_metal_h "native/metal/*.h")
     file(GLOB metal_test_srcs "native/metal/mpscnn/tests/*.mm")
-    file(GLOB_RECURSE native_metal_srcs "native/metal/*.mm", "native/metal/*.cpp")
+    file(GLOB_RECURSE native_metal_srcs "native/metal/*.mm" "native/metal/*.cpp")
     EXCLUDE(native_metal_srcs "${native_metal_srcs}" ${metal_test_srcs})
     file(GLOB metal_prepack_h "native/metal/MetalPrepackOpContext.h")
     file(GLOB metal_prepack_cpp "native/metal/MetalPrepackOpRegister.cpp")
@@ Expand Down @@
