Merge pull request #1585 from PAIR-code/dev
Merge dev onto main for v1.3 release
RyanMullins authored Oct 22, 2024
2 parents 61faeb6 + 7c3000c commit cfcb439
Showing 136 changed files with 2,896 additions and 9,866 deletions.
11 changes: 4 additions & 7 deletions .github/workflows/ci.yml
@@ -45,14 +45,10 @@ jobs:
         uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
-      - name: Install Python dependencies
-        run: python -m pip install -r requirements.txt
-      - name: Install LIT package
-        run: python -m pip install -e .
+      - name: Install LIT package with testing dependencies
+        run: python -m pip install -e '.[test]'
       - name: Test LIT
-        run: |
-          python -m pip install pytest
-          pytest -v
+        run: pytest -v
       - name: Setup Node ${{ matrix.node-version }}
         uses: actions/setup-node@v2
         with:
@@ -73,4 +69,5 @@ jobs:
       - name: Build Docker image
         uses: docker/build-push-action@v4
         with:
+          target: lit-nlp-prod
           tags: lit-nlp:ci-${{ github.sha }}
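
To reproduce the updated test steps outside of CI, a minimal local equivalent might look like this (a sketch, assuming a checkout of the repo root; the image tag is illustrative):

```sh
# Install LIT with its testing dependencies, then run the suite (as the workflow now does).
python -m pip install -e '.[test]'
python -m pytest -v

# Build the CI image against the new production target (tag name is illustrative).
docker build --file Dockerfile --target lit-nlp-prod --tag lit-nlp:ci-local .
```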
46 changes: 34 additions & 12 deletions Dockerfile
@@ -14,40 +14,62 @@
# ==============================================================================
# Use the official lightweight Python image.
# https://hub.docker.com/_/python
FROM python:3.10-slim

# ---- LIT Base Container ----

FROM python:3.11-slim AS lit-nlp-base

# Update Ubuntu packages and install basic utils
RUN apt-get update
RUN apt-get install -y wget curl gnupg2 gcc g++ git

# Copy local code to the container image.
ENV APP_HOME=/app
WORKDIR $APP_HOME

COPY ./lit_nlp/examples/gunicorn_config.py ./



# ---- LIT Container for Hosted Demos ----

FROM lit-nlp-base AS lit-nlp-prod

RUN python -m pip install 'lit-nlp[examples-discriminative-ai]'

WORKDIR $APP_HOME
ENTRYPOINT ["gunicorn", "--config=gunicorn_config.py"]



# ---- LIT Container for Developing and Testing Hosted Demos ----

FROM lit-nlp-base AS lit-nlp-dev

# Install yarn
RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
RUN echo "deb https://dl.yarnpkg.com/debian/ stable main" | \
tee /etc/apt/sources.list.d/yarn.list
RUN apt update && apt -y install yarn

# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME

# Set up python environment with production dependencies
# This step is slow as it installs many packages.
COPY ./requirements*.txt ./
RUN python -m pip install -r requirements.txt
COPY requirements.txt \
requirements_examples_common.txt \
requirements_examples_discriminative_ai.txt \
./
RUN python -m pip install -r requirements_examples_discriminative_ai.txt

# Copy the rest of the lit_nlp package
COPY . ./

# Build front-end with yarn
WORKDIR $APP_HOME/lit_nlp/client
ENV NODE_OPTIONS "--openssl-legacy-provider"
ENV NODE_OPTIONS="--openssl-legacy-provider"
RUN yarn && yarn build && rm -rf node_modules/*

# Run LIT server
# Note that the config file supports configuring the LIT demo that is launched
# via the DEMO_NAME and DEMO_PORT environment variables.
WORKDIR $APP_HOME
ENTRYPOINT [ \
"gunicorn", \
"--config=lit_nlp/examples/gunicorn_config.py" \
]
ENTRYPOINT ["gunicorn", "--config=gunicorn_config.py"]
110 changes: 55 additions & 55 deletions README.md
@@ -51,91 +51,81 @@ For a broader overview, check out [our paper](https://arxiv.org/abs/2008.05122)

## Download and Installation

LIT can be run via container image, installed via `pip` or built from source.
Building from source is necessary if you update any of the front-end or core
back-end code.
LIT can be installed via `pip` or built from source. Building from source is
necessary if you want to make code changes.

### Build container image
### Install from PyPI with pip

Build the image using `docker` or `podman`:
```sh
git clone https://github.com/PAIR-code/lit.git && cd lit
docker build --file Dockerfile --tag lit-nlp .
pip install lit-nlp
```

See the [advanced guide](https://pair-code.github.io/lit/documentation/docker) for detailed instructions on using the
default LIT Docker image, running LIT as a containerized web app in different
scenarios, and how to create your own LIT images.

### pip installation
The default `pip` installation will install all required packages to use the LIT
Python API, built-in interpretability components, and web application. To
install dependencies for the provided demos or test suite, install LIT with the
appropriate optional dependencies.

```sh
pip install lit-nlp
```
# To install dependencies for the discriminative AI examples (GLUE, Penguin)
pip install 'lit-nlp[examples-discriminative-ai]'

The `pip` installation will install all necessary prerequisite packages for use
of the core LIT package.
# To install dependencies for the generative AI examples (Prompt Debugging)
pip install 'lit-nlp[examples-generative-ai]'

It **does not** install the prerequisites for the provided demos, so you need to
install those yourself. See
[requirements_examples.txt](./requirements_examples.txt) for the list of
packages required to run the demos.
# To install dependencies for all examples plus the test suite
pip install 'lit-nlp[test]'
```
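
To sanity-check either installation, one can confirm the package resolves and imports (a sketch; the version reported will vary):

```sh
pip show lit-nlp            # confirms the installed package and its version
python -c "import lit_nlp"  # verifies the package imports cleanly
```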

### Install from source

Clone the repo:

```sh
git clone https://github.com/PAIR-code/lit.git && cd lit
git clone https://github.com/PAIR-code/lit.git
cd lit
```


Note: be sure you are running Python 3.9+. If you have a different version on
your system, use the `conda` instructions below to set up a Python 3.9
environment.

Set up a Python environment with `venv` (or your preferred environment manager).
Note that these instructions assume you will be making code changes to LIT and
include the full requirements for all examples and the test suite. See the
"Install from PyPI with pip" section above for the other optional dependency
sets.

```sh
python -m venv .venv
source .venv/bin/activate
python -m pip install -e '.[test]'
```

Or set up a Python environment using `conda`:
The LIT repo does not include a distributable version of the LIT app. You must
build it from source.

```sh
conda create --name lit-nlp
conda activate lit-nlp
conda install python=3.9
conda install pip
```

Once you have the environment, install LIT's dependencies:
```sh
python -m pip install -r requirements.txt
python -m pip install cudnn cupti # optional, for GPU support
python -m pip install torch # optional, for PyTorch

# Build the frontend
(cd lit_nlp; yarn && yarn build)
```

Note: Use the `-r requirements.txt` option to install every dependency required
for the LIT library, its test suite, and the built-in examples. You can also
install subsets of these using the `-r requirements_core.txt` (core library),
`-r requirements_test.txt` (test suite), `-r requirements_examples.txt`
(examples), and/or any combination thereof.

Note: if you see [an error](https://github.com/yarnpkg/yarn/issues/2821)
running `yarn` on Ubuntu/Debian, be sure you have the
[correct version installed](https://yarnpkg.com/en/docs/install#linux-tab).
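
On Debian-based systems, one way to get a working Yarn is to install it from Yarn's own package repository, mirroring the dev Dockerfile above (a sketch; requires root privileges):

```sh
# Install Yarn from its Debian repository (same steps as the lit-nlp-dev stage).
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | \
    tee /etc/apt/sources.list.d/yarn.list
apt update && apt -y install yarn
```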


## Running LIT

Explore a collection of hosted demos on the
[demos page](https://pair-code.github.io/lit/demos).

### Using container images

See the [containerization guide](https://pair-code.github.io/lit/documentation/docker) for instructions on using LIT
locally in Docker, Podman, etc.

LIT also provides pre-built images that can take advantage of accelerators,
making Generative AI and LLM use cases easier to work with. Check out the
[LIT on GCP docs](https://codelabs.developers.google.com/codelabs/responsible-ai/lit-on-gcp)
for more.

### Quick-start: classification and regression

To explore classification and regression tasks from the popular
@@ -154,7 +144,6 @@ but you can switch to
[STS-B](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) or
[MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) using the toolbar or the
gear icon in the upper right.
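
For example, a from-source launch of the GLUE demo typically looks like this (a sketch; the module path follows the LIT examples package, and flags may vary by version):

```sh
# Start the GLUE demo server on the default port.
python -m lit_nlp.examples.glue.demo --port=5432 --quickstart
```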

And navigate to http://localhost:5432 for the UI.

@@ -220,18 +209,19 @@ Google's [Python](https://google.github.io/styleguide/pyguide.html) and

```sh
# Run Pylint on your code using the following command from the root of this repo
pushd lit_nlp & pylint & popd
(cd lit_nlp; pylint)

# Run ESLint on your code using the following command from the root of this repo
pushd lit_nlp & yarn lint & popd
(cd lit_nlp; yarn lint)
```

## Citing LIT

If you use LIT as part of your work, please cite
[our EMNLP paper](https://arxiv.org/abs/2008.05122):
If you use LIT as part of your work, please cite the
[EMNLP paper](https://arxiv.org/abs/2008.05122) or the
[Sequence Salience paper](https://arxiv.org/abs/2404.07498):

```
```BibTeX
@misc{tenney2020language,
title={The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for {NLP} Models},
author={Ian Tenney and James Wexler and Jasmijn Bastings and Tolga Bolukbasi and Andy Coenen and Sebastian Gehrmann and Ellen Jiang and Mahima Pushkarna and Carey Radebaugh and Emily Reif and Ann Yuan},
@@ -243,12 +233,22 @@ If you use LIT as part of your work, please cite
}
```

```BibTeX
@article{tenney2024interactive,
title={Interactive prompt debugging with sequence salience},
author={Tenney, Ian and Mullins, Ryan and Du, Bin and Pandya, Shree and Kahng, Minsuk and Dixon, Lucas},
journal={arXiv preprint arXiv:2404.07498},
year={2024}
}
```

## Disclaimer

This is not an official Google product.

LIT is a research project and under active development by a small team. There
will be some bugs and rough edges, but we're releasing at an early stage because
we think it's pretty useful already. We want LIT to be an open platform, not a
walled garden, and we would love your suggestions and feedback - drop us a line
in the [issues](https://github.com/pair-code/lit/issues).
LIT is a research project and under active development by a small team. We want
LIT to be an open platform, not a walled garden, and would love your suggestions
and feedback – please
[report any bugs](https://github.com/pair-code/lit/issues) and reach out on the
[Discussions page](https://github.com/PAIR-code/lit/discussions/landing).

42 changes: 41 additions & 1 deletion RELEASE.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,45 @@
# Learning Interpretability Tool Release Notes

## Release 1.3

This release updates how the Learning Interpretability Tool (LIT) can be
deployed on Google Cloud. You can now use LIT to interpret foundation
models—including
[Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference),
[Gemma](https://ai.google.dev/gemma), [Llama](https://www.llama.com/), and
[Mistral](https://mistral.ai/technology/#models)—using LIT's prompt
debugging workflows. LIT now provides public container images to make it easier
to deploy on your hosting platform of choice, with an updated
[tutorial](https://codelabs.developers.google.com/codelabs/responsible-ai/lit-on-gcp)
for deploying LIT with [Cloud Run](https://cloud.google.com/run).
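
As a sketch of that deployment path (the service name and image URI below are hypothetical placeholders; see the linked tutorial for the actual values):

```sh
# Deploy a LIT container image to Cloud Run (service name and image URI are placeholders).
gcloud run deploy lit-app \
    --image=us-docker.pkg.dev/YOUR_PROJECT/lit/lit-app:latest \
    --port=5432 \
    --allow-unauthenticated
```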

### New Stuff
* LIT on GCP -
[1075325](https://github.com/PAIR-code/lit/commit/1075325c6a08d8fdef3bcf66f193b8d5aef673fb),
[1acc868](https://github.com/PAIR-code/lit/commit/1acc868d4a5fa0fd2a135f132f56bb4cb8ba3990),
[55bfc99](https://github.com/PAIR-code/lit/commit/55bfc993cc27fd25ae5089d58ae822bfeca296a3),
[180f68a](https://github.com/PAIR-code/lit/commit/180f68ad3774f8b276e262c0dcb7307ad87e42a3),
[64114d5](https://github.com/PAIR-code/lit/commit/64114d553ffd2c0ffd7bc674fb32a36e564ea0f4),
[2488aa7](https://github.com/PAIR-code/lit/commit/2488aa7cf8f8a112607ca0c8b40870efde73ec24),
[9baac29](https://github.com/PAIR-code/lit/commit/9baac29b96970ef7fa64f2f36ce2c79ff73707b7),
[60bdc7c](https://github.com/PAIR-code/lit/commit/60bdc7cf382bd0c5ead2576c119277230a6080c9),
[7681476](https://github.com/PAIR-code/lit/commit/7681476d5056d927905f24333b890501a36df040),
[4c81182](https://github.com/PAIR-code/lit/commit/4c81182a7db1fda7f8ba071a9542876f462a13fa),
[4e5e8e2](https://github.com/PAIR-code/lit/commit/4e5e8e25c2abb658dc141f0d9c6059dd41e14535),
[b9a0b82](https://github.com/PAIR-code/lit/commit/b9a0b8210263da9ee6d741e4e0f0444849e3a141),
[424adce](https://github.com/PAIR-code/lit/commit/424adce9cf8c9cbabdf5d89d485cdc5f3fd098ed),
[1d019c7](https://github.com/PAIR-code/lit/commit/1d019c7a1bf5f135ea42104889167b79c3f795cd),
[f4436a2](https://github.com/PAIR-code/lit/commit/f4436a26ed79f481e16e2c53c0551703e7ba8c4f),

### Non-breaking Changes, Bug Fixes, and Enhancements
* Upgrade LIT to MobX v6. - [c1f5055](https://github.com/PAIR-code/lit/commit/c1f5055eb7ee8b3671484c863a0967c05fa58338)
* Fix indexing issue in Sequence Salience module. - [58b1d2](https://github.com/PAIR-code/lit/commit/58b1d2b6d0d27c6dca086520cef45bf75466a101)
* Load multiple model wrappers with shared model. - [ba4d975](https://github.com/PAIR-code/lit/commit/ba4d975a90612b0c41a02b3dcb4dbb548261fdd7)
* Add the custom model and dataset loaders to prompt debugging notebook. - [338c6b](https://github.com/PAIR-code/lit/commit/338c6b12de98b61287a25650ad2c6ad7f7bb80cd)
* Convert hosted demos images to multi-stage builds. - [4bf1f8](https://github.com/PAIR-code/lit/commit/4bf1f81666fe546357f00c86a2315d2852346ebe)
* Adding testing instructions to README. - [f24b841](https://github.com/PAIR-code/lit/commit/f24b841959f0402498a056a5164a86ecae6dbb94)
* More LIT documentation updates. - [2e9d267](https://github.com/PAIR-code/lit/commit/2e9d26738d9344cde0eebd66d49dfc14cd800e74)

## Release 1.2

This release covers clean-ups on various obsolete demos, as well as improved
@@ -270,7 +310,7 @@ A full list of contributors to this repo can be found at https://github.com/PAIR
[a95ed67](https://github.com/PAIR-code/lit/commit/a95ed67100f24163624edb4bb659ccfa871dc9bf)
* Add output embeddings and attention options to GlueConfig -
[6e0df41](https://github.com/PAIR-code/lit/commit/6e0df41636405b4ee5556cbf797fcce5887c6070)
* Allow downloading/copying data from the slice editor -
[57fac3a](https://github.com/PAIR-code/lit/commit/57fac3aeb98fa49c508b20837eded3f4ec80e8f9)
* Use new custom tooltip elemement in various places -
[d409900](https://github.com/PAIR-code/lit/commit/d409900984336d4f8ac73735b1fff57c92623ca4),
53 changes: 53 additions & 0 deletions docs/documentation/_sources/faq.md.txt
@@ -171,3 +171,56 @@ this, using LIT's `Dataset` objects to manage training data along with standard
training APIs (such as Keras' `model.fit()`). See
[`glue/models.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/models.py)
for examples.

### Debug LIT UI in Colab

A LIT instance launched from the CLI typically shows helpful error messages in
the UI. This is not the case for the LIT UI in Colab, where errors are reported
without a stack trace, which makes debugging very difficult.

![LIT UI error in colab](./images/lit-ui-error-in-colab.png "LIT UI error in colab")

In
[Chrome developer tools](https://support.google.com/campaignmanager/answer/2828688?hl=en)
you can debug issues related to the frontend, but not issues related to the
backend or on the HTTP request path.

Thus, to see the full stack trace, you need to find the HTTP request sent from
the frontend to the backend, compose the same request in Colab, and send it to
the server.

1. When rendering the UI, display it in a separate tab to make things a bit
easier to work with, e.g. `lit_widget.render(open_in_new_tab=True)`.
2. Open
[Chrome developer tools](https://support.google.com/campaignmanager/answer/2828688?hl=en),
go to "Sources" tab and find the file
[client/services/api_service.ts](https://github.com/PAIR-code/lit/blob/main/lit_nlp/client/services/api_service.ts) and set a
breakpoint right after where the HTTP request is set up in the `queryServer`
method, e.g. after this line `const res = await fetch(url, {method: 'POST',
body});`.
   * Note that the frontend source code may be compiled into a single
     `main.js` file whose code is not exactly the same as the LIT frontend
     source, so you may have to do a bit of digging to find the right line.
3. Go to the UI and trigger the behavior that causes the error. Now in Chrome
developer tools you will be able to see the variables and their values in
the `queryServer` method. Copy the values of the `url` and `body` variables
in the method.
4. Go back to Colab and compose your HTTP request. Look for the main server
   address printed out by `lit_widget.render(open_in_new_tab=True)`.

![LIT colab server address](./images/lit-colab-server-address.png "LIT colab server address")

Let's say the server address is "https://localhost:32943/?" as shown above, the
`body` variable obtained earlier has the value "request_body_text", and the
`url` variable has the value "./get_preds?param1=value1". Your HTTP request
would then look like this:

```sh
! curl -H "Content-Type: application/json" \
-d "request_body_text" \
-X POST "http://localhost:32943/get_preds?param1=value1"
```

Run this in Colab and you should be able to retrieve the full stack trace of
the error.
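
If the response body alone is not informative, the same request can be sent with verbose output to inspect status codes and headers as well (a sketch; the URL and body are the placeholder values from above):

```sh
# -v prints the request/response headers and HTTP status alongside the body.
! curl -v -H "Content-Type: application/json" \
    -d "request_body_text" \
    -X POST "http://localhost:32943/get_preds?param1=value1"
```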