
[Minor] Open sourcing update #9

Merged · 3 commits · Oct 19, 2021

4 changes: 4 additions & 0 deletions HOWTO.md
@@ -364,6 +364,10 @@ Transformer(

We don't have the exact same interfaces, but we have something fairly close with the [model_factory](xformers/factory/model_factory.py).

It’s worth noting that xFormers’ blocks expect tensors to be batch-first, while PyTorch’s transformers use a sequence-first convention. Don’t forget to permute if you use xFormers’ blocks as drop-in replacements.
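
For instance, a minimal sketch of the required permutation (shapes here are illustrative):

```python
import torch

# PyTorch's nn.Transformer expects (seq_len, batch, embed_dim),
# while xFormers blocks expect (batch, seq_len, embed_dim).
seq_first = torch.randn(128, 4, 512)     # (S, B, E), PyTorch convention
batch_first = seq_first.transpose(0, 1)  # (B, S, E), xFormers convention
```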

Similarly, the attention mask conventions differ: in PyTorch, the mask is *True* when an element should *not* be attended to, whereas in xFormers it’s the opposite. Don’t forget to negate your attention masks to use xFormers’ blocks as drop-in replacements.
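
Again as a sketch, assuming a boolean mask (names and shapes are illustrative):

```python
import torch

# PyTorch convention: True marks positions that must NOT be attended to.
# xFormers uses the opposite convention, so flip the boolean mask.
pytorch_mask = torch.rand(4, 128) > 0.5  # True = masked out (PyTorch)
xformers_mask = ~pytorch_mask            # True = attended to (xFormers)
```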

The equivalent with xFormers would look like the following. You can think of it as a declaration of the sequence of blocks that you would like instantiated.

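As a hedged sketch of what such a declaration can look like, assuming the `xFormerConfig`/`xFormer.from_config` entry points of the factory (the exact keys and values below are illustrative, not exhaustive — check [model_factory](xformers/factory/model_factory.py) for the full schema):

```python
import torch
from xformers.factory.model_factory import xFormer, xFormerConfig

# One stack of identical encoder blocks, declared as plain data.
my_config = [
    {
        "block_type": "encoder",
        "num_layers": 2,
        "dim_model": 384,
        "multi_head_config": {
            "num_heads": 6,
            "residual_dropout": 0.1,
            "attention": {
                "name": "scaled_dot_product",
                "dropout": 0.1,
                "causal": False,
            },
        },
        "feedforward_config": {
            "name": "MLP",
            "dropout": 0.1,
            "activation": "gelu",
            "hidden_layer_multiplier": 4,
        },
    }
]

model = xFormer.from_config(xFormerConfig(my_config))

x = torch.randn(4, 128, 384)  # batch-first, as noted above
y = model(x)                  # (4, 128, 384)
```
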
10 changes: 6 additions & 4 deletions README.md
@@ -1,9 +1,11 @@
<img src="./docs/assets/logo.png" width=800>

![PyPI](https://img.shields.io/pypi/v/xformers)
[![Documentation Status](https://readthedocs.org/projects/xformers/badge/?version=latest)](https://xformers.readthedocs.io/en/latest/?badge=latest)
<!-- FIXME @lefaudeux - PyPI package -->
<!-- ![PyPI](https://img.shields.io/pypi/v/xformers)
![PyPI - License](https://img.shields.io/pypi/l/xformers) -->

[![Documentation Status](https://github.com/facebookresearch/xformers/actions/workflows/gh-pages.yml/badge.svg)](https://github.com/facebookresearch/xformers/actions/workflows/gh-pages.yml/badge.svg)
[![CircleCI](https://circleci.com/gh/facebookresearch/xformers.svg?style=shield)](https://app.circleci.com/pipelines/github/facebookresearch/xformers/)
![PyPI - License](https://img.shields.io/pypi/l/xformers)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
--------------------------------------------------------------------------------

@@ -13,7 +15,7 @@ xFormers is a modular and field agnostic library to flexibly generate transforme

## Getting started

The full [documentation](https://xformers.readthedocs.io/) contains instructions for getting started, deep dives and tutorials about the various APIs.
The full [documentation](https://facebookresearch.github.io/xformers/) contains instructions for getting started, deep dives and tutorials about the various APIs.
If in doubt, please check out the [HOWTO](HOWTO.md). Only some general considerations are laid out in the README.

### Installation
1 change: 0 additions & 1 deletion xformers/triton/softmax.py
@@ -330,7 +330,6 @@ def _softmax_dispatch(x: torch.Tensor, log: bool, mask: Optional[torch.Tensor],
and x.is_cuda
and not _triton_registered_overflow
):
# pyre-ignore[16]: Pyre is unable to find the `apply` method.
return _softmax_triton.apply(x, mask, log, causal)
except triton.code_gen.OutOfResources:
# Catch cases where the current GPU does not have enough registers to hold a full tensor line