diff --git a/HOWTO.md b/HOWTO.md
index a6a293a827..2bbe7e9c99 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -364,6 +364,10 @@ Transformer(
 We don't have the exact same interfaces, but we have something fairly close with the
 [model_factory](xformers/factory/model_factory.py).
 
+It’s worth noting that xFormers’ blocks expect tensors to be batch first, while PyTorch’s transformers use a sequence-first convention. Don’t forget to permute if you use xFormers’ blocks as drop-in replacements.
+
+Similarly, the attention mask conventions are different: in PyTorch, the mask is *True* when an element should *not* be attended to, whereas in xFormers it’s the opposite. Don’t forget to negate your attention masks to use xFormers’ blocks as drop-in replacements.
+
 The equivalent with xFormers would look like the following. You can think of it as a declaration of the sequence of blocks that you would like instantiated.
 
 ```python
diff --git a/README.md b/README.md
index 87af758b14..3ccdebbc8f 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,11 @@
-![PyPI](https://img.shields.io/pypi/v/xformers)
-[![Documentation Status](https://readthedocs.org/projects/xformers/badge/?version=latest)](https://xformers.readthedocs.io/en/latest/?badge=latest)
+
+
+
+[![Documentation Status](https://github.com/facebookresearch/xformers/actions/workflows/gh-pages.yml/badge.svg)](https://github.com/facebookresearch/xformers/actions/workflows/gh-pages.yml/badge.svg)
 [![CircleCI](https://circleci.com/gh/facebookresearch/xformers.svg?style=shield)](https://app.circleci.com/pipelines/github/facebookresearch/xformers/)
-![PyPI - License](https://img.shields.io/pypi/l/xformers)
 [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
 
 --------------------------------------------------------------------------------
@@ -13,7 +15,7 @@ xFormers is a modular and field agnostic library to flexibly generate transforme
 
 ## Getting started
 
-The full [documentation](https://xformers.readthedocs.io/) contains instructions for getting started, deep dives and tutorials about the various APIs.
+The full [documentation](https://facebookresearch.github.io/xformers/) contains instructions for getting started, deep dives and tutorials about the various APIs.
 If in doubt, please check out the [HOWTO](HOWTO.md). Only some general considerations are laid out in the README.
 
 ### Installation
diff --git a/xformers/triton/softmax.py b/xformers/triton/softmax.py
index 067ba49081..909b91de4e 100644
--- a/xformers/triton/softmax.py
+++ b/xformers/triton/softmax.py
@@ -330,7 +330,6 @@ def _softmax_dispatch(x: torch.Tensor, log: bool, mask: Optional[torch.Tensor],
             and x.is_cuda
             and not _triton_registered_overflow
         ):
-            # pyre-ignore[16]: Pyre is unable to find the `apply` method.
             return _softmax_triton.apply(x, mask, log, causal)
     except triton.code_gen.OutOfResources:
         # Catch cases where the current GPU does not have enough registers to hold a full tensor line
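For context, here is a minimal sketch (not part of the patch) of the two conventions called out in the HOWTO.md hunk above: batch-first tensors and inverted mask polarity. The shapes, padding layout, and variable names are made up for illustration, and only plain PyTorch ops are used rather than the xFormers API itself.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
seq_len, batch_size, embed_dim = 128, 4, 256

# PyTorch's nn.Transformer default convention: (sequence, batch, embedding).
x_seq_first = torch.randn(seq_len, batch_size, embed_dim)

# xFormers blocks expect batch-first tensors: (batch, sequence, embedding).
x_batch_first = x_seq_first.permute(1, 0, 2)

# PyTorch-style boolean padding mask: True marks positions that should NOT be attended to.
pytorch_style_mask = torch.zeros(batch_size, seq_len, dtype=torch.bool)
pytorch_style_mask[:, -10:] = True  # pretend the last 10 positions are padding

# xFormers uses the opposite polarity, so negate the mask before passing it in.
xformers_style_mask = ~pytorch_style_mask
```

Note that recent PyTorch releases also expose a `batch_first=True` flag on `nn.Transformer`, which removes the need for the permute.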