This repository has been archived by the owner on Mar 19, 2024. It is now read-only.
@bigfootjon has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@bigfootjon merged this pull request in 1142dc4.
sburman added a commit to sageailabs/fastText that referenced this pull request May 22, 2024
* Replace outdated url in the scripts

Summary: Replace outdated url in the scripts
Reviewed By: piotr-bojanowski
Differential Revision: D43464784
fbshipit-source-id: 51a98a9ad5a0939acd0d578126290909a613938b

* Add documentation about Hugging Face integration (facebookresearch#1335)

Summary: [Word vectors](https://huggingface.co/facebook/fasttext-en-vectors) for 157 languages are now hosted on the Hugging Face Hub as well as the [language identification model](https://huggingface.co/facebook/fasttext-language-identification). (cc ajoulin) A newer language model [referred in the NLLB project](https://github.com/facebookresearch/fairseq/blob/nllb/README.md#lid-model) is not mentioned in the official website, so I updated the doc accordingly.
Pull Request resolved: facebookresearch#1335
Reviewed By: Celebio
Differential Revision: D46507563
Pulled By: jmp84
fbshipit-source-id: 64883a6829c68b968acd980ba77a712b8e7a1365

* Migrate "deeplearning/fastText" from LLVM-12 to LLVM-15

Summary: fbcode is migrating to LLVM-15 for safer and more up-to-date code and new compiler features. All contbuilds in your directory have passed our build test with LLVM-15, and your directory does not host any packages. This diff will migrate it to LLVM-15.
If you approve of this diff, please use the "Accept & Ship" button. If you have a reason for why it should not build with LLVM 15, please make a comment and send it back to author. Otherwise we will land this on Thursday 06/15/2023.
See the [FAQ post](https://fb.workplace.com/groups/llvm15platform010/posts/749154386769776/)! Please also direct any questions to [this group](https://fb.workplace.com/groups/llvm15platform010).
- If you approve of this diff, please use the "Accept & Ship" button :-)
Reviewed By: meyering
Differential Revision: D46661531
fbshipit-source-id: 7278fbfcadec2392c94efd6deb710bdd5e9280f8

* Del `(object)` from 200 inc deeplearning/aicamera/trainer/utils/metrics.py

Summary: Python3 makes the use of `(object)` in class inheritance unnecessary. Let's modernize our code by eliminating this.
Reviewed By: itamaro
Differential Revision: D48673901
fbshipit-source-id: 3e0ef05efe886b32a07bb58bd0725fa2ec934c14

* deeplearning, dcp (2972240286315620591)

Reviewed By: r-barnes
Differential Revision: D49677606
fbshipit-source-id: ec5b375177586c76ecccb83a29b562bc6e9961f6

* Add pyproject.toml to comply with PEP-518 (facebookresearch#1292)

Summary: Adds pyproject.toml to comply with PEP-518, which fixes the building of the library by poetry - See python-poetry/poetry#6113. This is a copy of facebookresearch#1270, but I have signed the CLA.
Pull Request resolved: facebookresearch#1292
Differential Revision: D51601444
Pulled By: alexkosau
fbshipit-source-id: 357d702281ca3519c3640483eba04d124d0744b4

* fix compile error with gcc13 facebookresearch#1281 (facebookresearch#1340)

Summary: Due to [header dependency changes](https://gcc.gnu.org/gcc-13/porting_to.html#header-dep-changes) in GCC 13, we need to include the <cstdint> header.
Pull Request resolved: facebookresearch#1340
Reviewed By: jmp84
Differential Revision: D51602433
Pulled By: alexkosau
fbshipit-source-id: cc9bffb276cb00f1db8ec97a36784c484ae4563a

* Predict 1.9-4.2x faster (facebookresearch#1341)

Summary: I made prediction 1.9x to 4.2x faster than before.

# Motivation
I want to use https://tinyurl.com/nllblid218e and similarly parametrized models to run language classification on petabytes of web data.

# Methodology
The costliest operation is summing the rows for each model input. I've optimized this in three ways:
1. `addRowToVector` was a virtual function call for each row. I've replaced this with one virtual function call per prediction by adding `averageRowsToVector` to `Matrix` calls.
2. `Vector` and `DenseMatrix` were not 64-byte aligned so the CPU was doing a lot of unaligned memory access. I've brought in my own `vector` replacement that does 64-byte alignment.
3. Write the `averageRowsToVector` in intrinsics for common vector sizes. This works on SSE, AVX, and AVX512F.

See the commit history for a breakdown of speed improvement from each change.

# Experiments
Test set [docs1000.txt.gz](https://github.com/facebookresearch/fastText/files/11832996/docs1000.txt.gz) which is a bunch of random documents https://data.statmt.org/heafield/classified-fasttext/
CPU: AMD Ryzen 9 7950X 16-Core

Model https://tinyurl.com/nllblid218e with 256-dimensional vectors
Before
real 0m8.757s
user 0m8.434s
sys 0m0.327s
After
real 0m2.046s
user 0m1.717s
sys 0m0.334s

Model https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin with 16-dimensional vectors
Before
real 0m0.926s
user 0m0.889s
sys 0m0.037s
After
real 0m0.477s
user 0m0.436s
sys 0m0.040s

Pull Request resolved: facebookresearch#1341
Reviewed By: graemenail
Differential Revision: D52134736
Pulled By: kpuatfb
fbshipit-source-id: 42067161f4c968c34612934b48a562399a267f3b

* deeplearning/fastText 2/2

Reviewed By: azad-meta
Differential Revision: D53908330
fbshipit-source-id: b2215f0522c32a82cd876633210befefe9317d76

* Delete .circleci directory (facebookresearch#1366)

Summary: Pull Request resolved: facebookresearch#1366
Reviewed By: jailby
Differential Revision: D54850920
Pulled By: bigfootjon
fbshipit-source-id: 9a3eec7b7cb42335a786fb247cb16be9ed3c2d59

* this page intentionally left blank

---------

Co-authored-by: Onur Çelebi <celebio@meta.com>
Co-authored-by: Sheon Han <sheon.han@gmail.com>
Co-authored-by: generatedunixname89002005320047 <generatedunixname89002005320047@meta.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
Co-authored-by: generatedunixname89002005287564 <generatedunixname89002005287564@meta.com>
Co-authored-by: Chris Culhane <cfculhane@gmail.com>
Co-authored-by: Cherilyn Buren <88433283+NiuBlibing@users.noreply.github.com>
Co-authored-by: Kenneth Heafield <github@kheafield.com>
Co-authored-by: Jon Janzen <jon@jonjanzen.com>
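
As a quick sanity check on the "Predict 1.9-4.2x faster" entry in the commit message above, the headline range can be recomputed from the quoted `real` timings. The sketch below is illustrative only: the helper names `parse_real_seconds` and `speedup` are hypothetical and not part of fastText, and the only inputs are the four wall-clock values quoted in that commit message.

```python
def parse_real_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '0m8.757s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def speedup(before: str, after: str) -> float:
    """Ratio of wall-clock time before the change to time after it."""
    return parse_real_seconds(before) / parse_real_seconds(after)


# `real` timings copied from the commit message (before, after).
TIMINGS = {
    "NLLB LID model, 256-dimensional vectors": ("0m8.757s", "0m2.046s"),
    "lid.176.bin, 16-dimensional vectors": ("0m0.926s", "0m0.477s"),
}

for model, (before, after) in TIMINGS.items():
    print(f"{model}: {speedup(before, after):.2f}x")
```

The two ratios come out to roughly 4.28 and 1.94, which matches the quoted 1.9x to 4.2x range if the figures are truncated rather than rounded.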
No description provided.