Add GPU testing for Jax+Torch #1935

Merged 27 commits on Jul 12, 2023
Changes from 25 commits
Commits
6087fbd
Add GPU testing for torch and jax
ianstenbit Jul 11, 2023
6a1eae3
Consolidate cloudbuild files
ianstenbit Jul 11, 2023
f885ddb
Reformat image name
ianstenbit Jul 11, 2023
fdf201c
gcr.io/cloud-builders/docker
ianstenbit Jul 11, 2023
9066e3f
Underscores are hard
ianstenbit Jul 11, 2023
16ef366
Yay Docker
ianstenbit Jul 11, 2023
8fb05ea
I have activated my second brain cell
ianstenbit Jul 11, 2023
f66a72c
IMAGE_NAME
ianstenbit Jul 11, 2023
6a55cfb
Entrypoint fix in jssonnet
ianstenbit Jul 11, 2023
c41ab45
Re-do env variables in jssonnet
ianstenbit Jul 11, 2023
48d3039
Another one
ianstenbit Jul 11, 2023
dac2df3
Testing an idea
ianstenbit Jul 11, 2023
c8886b9
Try string format
ianstenbit Jul 11, 2023
039d822
Remove bad export
ianstenbit Jul 11, 2023
f4e2962
Rename + try Torch docker image
ianstenbit Jul 11, 2023
ed1d6e0
Create a base test case with Numpy conversion
ianstenbit Jul 12, 2023
32fd07a
Some test fixes
ianstenbit Jul 12, 2023
c76e352
Some test fixes
ianstenbit Jul 12, 2023
1a4a0e1
We out here fixing tests
ianstenbit Jul 12, 2023
b5ee69a
Test fixes -- morning style!
ianstenbit Jul 12, 2023
495e6ed
Merge conflict
ianstenbit Jul 12, 2023
24aaa65
Update README and include a CUDA verification test
ianstenbit Jul 12, 2023
a746374
Better cuda test
ianstenbit Jul 12, 2023
f5c74e0
Working docker config
ianstenbit Jul 12, 2023
6491da2
Last round of test fixes ... maybe?
ianstenbit Jul 12, 2023
dd6730f
Merge from master
ianstenbit Jul 12, 2023
91760f9
Fix docstring + add attribution to Matt
ianstenbit Jul 12, 2023
2 changes: 1 addition & 1 deletion .github/workflows/actions.yml
@@ -78,7 +78,7 @@ jobs:
KERAS_BACKEND: ${{ matrix.backend }}
JAX_ENABLE_X64: true
run: |
pytest --run_large keras_cv/bounding_box \
Contributor

I assume --run_large was a mistake here since this is CI?

Contributor Author

Not a mistake; I was just doing this during development of the rebase so that we covered all the tests. But yes, we shouldn't include it anymore.

pytest keras_cv/bounding_box \
keras_cv/callbacks \
keras_cv/losses \
keras_cv/layers/object_detection \
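
For context on the `--run_large` exchange above: a flag like this is usually wired up in a `conftest.py`. The following is a minimal sketch of the common pytest pattern, not a copy of this repository's configuration. The `large` marker name comes from the `@pytest.mark.large` usage elsewhere in this diff; the help text, skip reason, and marker description are assumptions.

```python
import pytest


def pytest_addoption(parser):
    # Register an opt-in command-line flag; it defaults to off.
    parser.addoption(
        "--run_large",
        action="store_true",
        default=False,
        help="also run tests marked as large",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "large: slow or resource-heavy test")


def pytest_collection_modifyitems(config, items):
    # With the flag set, nothing is skipped.
    if config.getoption("--run_large"):
        return
    skip_large = pytest.mark.skip(reason="needs --run_large to run")
    for item in items:
        if "large" in item.keywords:
            item.add_marker(skip_large)
```

With a setup along these lines, a plain `pytest keras_cv/...` in CI skips the large tests, while the GPU job that passes `--run_large` runs everything.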
4 changes: 2 additions & 2 deletions cloudbuild/Dockerfile
@@ -1,4 +1,4 @@
# keras-cv-image:deps has all deps of KerasCV for testing.
FROM us-west1-docker.pkg.dev/keras-team-test/keras-cv-test/keras-cv-image:deps
ARG IMAGE_NAME
FROM $IMAGE_NAME
COPY . /kerascv
WORKDIR /kerascv
25 changes: 24 additions & 1 deletion cloudbuild/README.md
@@ -44,6 +44,29 @@ RUN pip install -r keras-cv/requirements.txt
```
- Run the following command from the directory with your `Dockerfile`:
```
gcloud builds submit --region=us-west1 --tag us-west1-docker.pkg.dev/keras-team-test/keras-cv-test/keras-cv-image:deps --timeout=10m
gcloud builds submit --region=us-west1 --tag us-west1-docker.pkg.dev/keras-team-test/keras-cv-test/keras-cv-image-tensorflow:deps --timeout=20m
```
- Repeat the last two steps for Jax and Torch (replacing "tensorflow" with "jax"
or "torch" in the docker image target name). `Dockerfile` for jax:
```
FROM nvidia/cuda:11.7.1-base-ubuntu20.04
RUN apt-get update
RUN apt-get install -y python3 python3-pip
RUN apt-get install -y git
RUN git clone https://github.com/{path_to_keras_cv_fork}.git
RUN cd keras-cv && git checkout {branch_name}
RUN pip install -r keras-cv/requirements.txt
RUN pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
and for torch:
```
FROM nvidia/cuda:11.7.1-base-ubuntu20.04
RUN apt-get update
RUN apt-get install -y python3 python3-pip
RUN apt-get install -y git
RUN git clone https://github.com/{path_to_keras_cv_fork}.git
RUN cd keras-cv && git checkout {branch_name}
RUN pip install -r keras-cv/requirements.txt
RUN pip install torch torchvision
```
- Merge the PR adding the dependency
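
The "repeat for Jax and Torch" step above amounts to running the same `gcloud builds submit` command once per backend with a different image name. A small sketch that prints the three commands; the `build_command` helper is hypothetical, and the region, registry path, and timeout are copied from the TensorFlow command above.

```python
REGISTRY = "us-west1-docker.pkg.dev/keras-team-test/keras-cv-test"
BACKENDS = ("tensorflow", "jax", "torch")


def build_command(backend):
    # Hypothetical helper: the gcloud invocation for one backend's deps image.
    image = f"{REGISTRY}/keras-cv-image-{backend}:deps"
    return [
        "gcloud", "builds", "submit",
        "--region=us-west1",
        "--tag", image,
        "--timeout=20m",
    ]


for backend in BACKENDS:
    print(" ".join(build_command(backend)))
```

Each command still has to be run from the directory containing the `Dockerfile` for that backend.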
16 changes: 7 additions & 9 deletions cloudbuild/cloudbuild.yaml
@@ -6,17 +6,14 @@ substitutions:
# Location of GKE cluster.
_CLUSTER_ZONE: 'us-west1-b'
# Image name.
_IMAGE_NAME: 'us-west1-docker.pkg.dev/keras-team-test/keras-cv-test/keras-cv-image'
_IMAGE_NAME: 'us-west1-docker.pkg.dev/keras-team-test/keras-cv-test/keras-cv-image-${_BACKEND}'
steps:
- name: 'docker'
- name: 'gcr.io/cloud-builders/docker'
id: build-image
args: [
'build',
'.',
'-f', 'cloudbuild/Dockerfile',
'-t', '$_IMAGE_NAME:$BUILD_ID',
]
- name: 'docker'
entrypoint: 'bash'
args:
['-c', 'docker build -f cloudbuild/Dockerfile -t $_IMAGE_NAME:$BUILD_ID --build-arg IMAGE_NAME=$_IMAGE_NAME:deps .']
- name: 'gcr.io/cloud-builders/docker'
id: push-image
waitFor:
- build-image
@@ -50,6 +47,7 @@ steps:
'--ext-str', 'image=$_IMAGE_NAME',
'--ext-str', 'tag_name=$BUILD_ID',
'--ext-str', 'gcs_bucket=$_GCS_BUCKET',
'--ext-str', 'backend=$_BACKEND',
'-o', 'output.yaml',
]
- name: 'gcr.io/cloud-builders/gcloud'
31 changes: 20 additions & 11 deletions cloudbuild/unit_test_jobs.jsonnet
@@ -4,10 +4,11 @@ local gpus = import 'templates/gpus.libsonnet';
local image = std.extVar('image');
local tagName = std.extVar('tag_name');
local gcsBucket = std.extVar('gcs_bucket');
local backend = std.extVar('backend');

local unittest = base.BaseTest {
// Configure job name.
frameworkPrefix: "tf",
frameworkPrefix: backend,
modelName: "keras-cv",
mode: "unit-tests",
timeout: 3600, # 1 hour, in seconds
@@ -21,20 +22,28 @@ local unittest = base.BaseTest {
entrypoint: [
'bash',
'-c',
|||
# Build custom ops from source
python build_deps/configure.py
bazel-5.4.0 build keras_cv/custom_ops:all --verbose_failures
cp bazel-bin/keras_cv/custom_ops/*.so keras_cv/custom_ops/
export TEST_CUSTOM_OPS=true
std.format(
|||
export KERAS_BACKEND=%s
export JAX_ENABLE_X64=true
Member

Why do you end up needing int64/float64? Just curious, really.

In KerasNLP we have just been trying to go int32 everywhere by default.

Contributor Author

I think we probably should just do int32 everywhere. During NMS porting I had some things that were real sticklers about int64 in TF, and I never found a way to work around them.

Contributor

Could we potentially just change the dtype for those tests? Asking since we're always flirting with OOM issues, and int64 can't help.

Contributor Author

Potentially, yes. I did some looking and it seems like the only place we're using int64 internally with KerasCore is in the YOLOV8 label encoder -- I guess I already got rid of them from NMS. I'll check with Tirth if he's planning on getting rid of those.
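
The memory concern raised in this thread is easy to quantify: an int64 array takes twice the bytes of an int32 array of the same shape. A small NumPy illustration (the shape is arbitrary and chosen only for the example):

```python
import numpy as np

shape = (4, 100, 4)  # arbitrary example shape, 1600 elements

as_int32 = np.zeros(shape, dtype="int32")
as_int64 = np.zeros(shape, dtype="int64")

# 4 bytes per element versus 8 bytes per element.
print(as_int32.nbytes)  # 6400
print(as_int64.nbytes)  # 12800
```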


# Run whatever is in `command` here.
${@:0}
|||
# Run whatever is in `command` here.
${@:0}
|||,
backend
)
],
command: [
'pytest --run_large --durations 0',
'keras_cv',
'keras_cv/bounding_box',
'keras_cv/callbacks',
'keras_cv/losses',
'keras_cv/layers/object_detection',
'keras_cv/layers/preprocessing',
'keras_cv/models/backbones',
'keras_cv/models/classification',
'keras_cv/models/object_detection/retinanet',
'keras_cv/models/object_detection/yolo_v8',
],
};
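
The `std.format` call above substitutes the backend name into the `%s` placeholder of the shell snippet, so each job exports its own `KERAS_BACKEND` before running the test command. Python's `%` operator behaves the same way for this template, which gives a quick way to preview the rendered entrypoint. This is only an illustration; `render_entrypoint` is a hypothetical name and is not part of the build.

```python
ENTRYPOINT_TEMPLATE = """\
export KERAS_BACKEND=%s
export JAX_ENABLE_X64=true

# Run whatever is in `command` here.
${@:0}
"""


def render_entrypoint(backend):
    # Mirrors std.format(template, backend) from the jsonnet file.
    return ENTRYPOINT_TEMPLATE % backend


print(render_entrypoint("jax"))
```

The `${@:0}` line passes through untouched because it contains no `%` character.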

3 changes: 2 additions & 1 deletion keras_cv/bounding_box/converters_test.py
@@ -20,6 +20,7 @@
from absl.testing import parameterized

from keras_cv import bounding_box
from keras_cv.tests.test_case import TestCase

xyxy_box = np.array([[[10, 20, 110, 120], [20, 30, 120, 130]]], dtype="float32")
yxyx_box = np.array([[[20, 10, 120, 110], [30, 20, 130, 120]]], dtype="float32")
@@ -88,7 +89,7 @@
] + [("xyxy_xyxy", "xyxy", "xyxy")]


class ConvertersTestCase(tf.test.TestCase, parameterized.TestCase):
class ConvertersTestCase(TestCase):
@parameterized.named_parameters(*test_cases)
def test_converters(self, source, target):
source_box = boxes[source]
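
`keras_cv/tests/test_case.py` itself is not shown in this excerpt. Going by the commit message "Create a base test case with Numpy conversion", the idea is presumably to convert backend tensors to NumPy before comparing them, so the same assertions work on TensorFlow, JAX, and Torch. A rough sketch of that idea using only `unittest` and NumPy follows. The `_to_numpy` helper and the tolerances are assumptions, and the real class may differ; for instance, it presumably also supports `parameterized`, which this sketch omits.

```python
import unittest

import numpy as np


def _to_numpy(x):
    # Torch tensors may live on the GPU and track gradients; move them to
    # host memory first. Other array types go straight through np.asarray.
    if hasattr(x, "detach") and hasattr(x, "cpu"):
        x = x.detach().cpu()
    return np.asarray(x)


class TestCase(unittest.TestCase):
    def assertAllClose(self, x1, x2, atol=1e-6, rtol=1e-6, msg=None):
        np.testing.assert_allclose(
            _to_numpy(x1),
            _to_numpy(x2),
            atol=atol,
            rtol=rtol,
            err_msg=msg or "",
        )

    def assertAllEqual(self, x1, x2, msg=None):
        np.testing.assert_array_equal(
            _to_numpy(x1), _to_numpy(x2), err_msg=msg or ""
        )
```

Test classes then inherit from this one class instead of `tf.test.TestCase`, which is the substitution repeated throughout the test files in this diff.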
4 changes: 2 additions & 2 deletions keras_cv/bounding_box/ensure_tensor_test.py
@@ -12,13 +12,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import tensorflow as tf

from keras_cv import bounding_box
from keras_cv.backend import ops
from keras_cv.tests.test_case import TestCase


class BoundingBoxEnsureTensorTest(tf.test.TestCase):
class BoundingBoxEnsureTensorTest(TestCase):
def test_convert_list(self):
boxes = {"boxes": [[0, 1, 2, 3]], "classes": [0]}
output = bounding_box.ensure_tensor(boxes)
8 changes: 4 additions & 4 deletions keras_cv/bounding_box/iou_test.py
@@ -14,20 +14,20 @@
"""Tests for iou functions."""

import numpy as np
import tensorflow as tf

from keras_cv.bounding_box import iou as iou_lib
from keras_cv.tests.test_case import TestCase


class IoUTest(tf.test.TestCase):
class IoUTest(TestCase):
def test_compute_single_iou(self):
bb1 = np.array([[100, 101, 200, 201]])
bb1_off_by_1 = np.array([[101, 102, 201, 202]])
# area of bb1 and bb1_off_by_1 are each 10000.
# intersection area is 99*99=9801
# iou=9801/(2*10000 - 9801)=0.96097656633
self.assertAlmostEqual(
iou_lib.compute_iou(bb1, bb1_off_by_1, "yxyx")[0], 0.96097656633
self.assertAllClose(
iou_lib.compute_iou(bb1, bb1_off_by_1, "yxyx")[0], [0.96097656633]
)
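
The expected value in the comment can be checked by hand. A standalone NumPy sketch of the IoU computation for two `yxyx` boxes; `iou_yxyx` is a hypothetical helper written for this illustration and is not `keras_cv.bounding_box.compute_iou`:

```python
import numpy as np


def iou_yxyx(a, b):
    # Boxes are [y_min, x_min, y_max, x_max].
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(bottom - top, 0) * max(right - left, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


bb1 = np.array([100, 101, 200, 201])
bb1_off_by_1 = np.array([101, 102, 201, 202])

# intersection = 99 * 99 = 9801, union = 2 * 10000 - 9801 = 10199
print(iou_yxyx(bb1, bb1_off_by_1))  # approximately 0.960977
```

The move from `assertAlmostEqual` to `assertAllClose` with a one-element list presumably lets the comparison accept a backend tensor rather than a Python float.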

def test_compute_iou(self):
6 changes: 4 additions & 2 deletions keras_cv/bounding_box/mask_invalid_detections_test.py
@@ -18,9 +18,10 @@

from keras_cv import bounding_box
from keras_cv.backend import ops
from keras_cv.tests.test_case import TestCase


class MaskInvalidDetectionsTest(tf.test.TestCase):
class MaskInvalidDetectionsTest(TestCase):
def test_correctly_masks_based_on_max_dets(self):
bounding_boxes = {
"boxes": ops.random.uniform((4, 100, 4)),
@@ -32,7 +33,8 @@ def test_correctly_masks_based_on_max_dets(self):

negative_one_boxes = result["boxes"][:, 5:, :]
self.assertAllClose(
negative_one_boxes, -np.ones_like(negative_one_boxes)
negative_one_boxes,
-np.ones_like(ops.convert_to_numpy(negative_one_boxes)),
)

preserved_boxes = result["boxes"][:, :2, :]
3 changes: 2 additions & 1 deletion keras_cv/bounding_box/to_dense_test.py
@@ -15,9 +15,10 @@
import tensorflow as tf

from keras_cv import bounding_box
from keras_cv.tests.test_case import TestCase


class ToDenseTest(tf.test.TestCase):
class ToDenseTest(TestCase):
@pytest.mark.tf_keras_only
def test_converts_to_dense(self):
bounding_boxes = {
4 changes: 2 additions & 2 deletions keras_cv/bounding_box/to_ragged_test.py
@@ -13,13 +13,13 @@
# limitations under the License.
import numpy as np
import pytest
import tensorflow as tf

from keras_cv import backend
from keras_cv import bounding_box
from keras_cv.tests.test_case import TestCase


class ToRaggedTest(tf.test.TestCase):
class ToRaggedTest(TestCase):
@pytest.mark.tf_keras_only
def test_converts_to_ragged(self):
bounding_boxes = {
4 changes: 2 additions & 2 deletions keras_cv/bounding_box/utils_test.py
@@ -12,13 +12,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import tensorflow as tf

from keras_cv import bounding_box
from keras_cv.backend import ops
from keras_cv.tests.test_case import TestCase


class BoundingBoxUtilTest(tf.test.TestCase):
class BoundingBoxUtilTest(TestCase):
def test_clip_to_image_standard(self):
# Test xyxy format unbatched
height = 256
3 changes: 2 additions & 1 deletion keras_cv/bounding_box/validate_format_test.py
@@ -14,9 +14,10 @@
import tensorflow as tf

from keras_cv import bounding_box
from keras_cv.tests.test_case import TestCase


class ValidateTest(tf.test.TestCase):
class ValidateTest(TestCase):
def test_raises_nondict(self):
with self.assertRaisesRegex(
ValueError, "Expected `bounding_boxes` to be a dictionary, got "
4 changes: 2 additions & 2 deletions keras_cv/callbacks/pycoco_callback_test.py
@@ -13,17 +13,17 @@
# limitations under the License.

import pytest
import tensorflow as tf

import keras_cv
from keras_cv.callbacks import PyCOCOCallback
from keras_cv.metrics.coco.pycoco_wrapper import METRIC_NAMES
from keras_cv.models.object_detection.__test_utils__ import (
_create_bounding_box_dataset,
)
from keras_cv.tests.test_case import TestCase


class PyCOCOCallbackTest(tf.test.TestCase):
class PyCOCOCallbackTest(TestCase):
@pytest.mark.large # Fit is slow, so mark these large.
def test_model_fit_retinanet(self):
model = keras_cv.models.RetinaNet(
3 changes: 2 additions & 1 deletion keras_cv/callbacks/waymo_evaluation_callback_test.py
@@ -17,6 +17,7 @@
from tensorflow import keras

from keras_cv.callbacks import WaymoEvaluationCallback
from keras_cv.tests.test_case import TestCase

NUM_RECORDS = 10
POINT_FEATURES = 3
@@ -32,7 +33,7 @@
]


class WaymoEvaluationCallbackTest(tf.test.TestCase):
class WaymoEvaluationCallbackTest(TestCase):
@pytest.mark.skipif(True, reason="Requires Waymo Open Dataset")
def test_model_fit(self):
# Silly hypothetical model
5 changes: 2 additions & 3 deletions keras_cv/core/factor_sampler/constant_factor_sampler_test.py
@@ -12,12 +12,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import tensorflow as tf

import keras_cv
from keras_cv.tests.test_case import TestCase


class ConstantFactorSamplerTest(tf.test.TestCase):
class ConstantFactorSamplerTest(TestCase):
def test_sample(self):
factor = keras_cv.ConstantFactorSampler(0.3)
self.assertEqual(factor(), 0.3)
5 changes: 2 additions & 3 deletions keras_cv/core/factor_sampler/normal_factor_sampler_test_.py
@@ -12,12 +12,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import tensorflow as tf

from keras_cv import core
from keras_cv.tests.test_case import TestCase


class NormalFactorTest(tf.test.TestCase):
class NormalFactorTest(TestCase):
def test_sample(self):
factor = core.NormalFactor(
mean=0.5, stddev=0.2, min_value=0, max_value=1
5 changes: 2 additions & 3 deletions keras_cv/core/factor_sampler/uniform_factor_sampler_test.py
@@ -12,12 +12,11 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import tensorflow as tf

import keras_cv
from keras_cv.tests.test_case import TestCase


class UniformFactorSamplerTest(tf.test.TestCase):
class UniformFactorSamplerTest(TestCase):
def test_sample(self):
factor = keras_cv.UniformFactorSampler(0.3, 0.6)
self.assertTrue(0.3 <= factor() <= 0.6)
3 changes: 2 additions & 1 deletion keras_cv/datasets/pascal_voc/segmentation_test.py
@@ -19,11 +19,12 @@
from absl import flags

from keras_cv.datasets.pascal_voc import segmentation
from keras_cv.tests.test_case import TestCase

extracted_dir = os.path.join("VOCdevkit", "VOC2012")


class PascalVocSegmentationDataTest(tf.test.TestCase):
class PascalVocSegmentationDataTest(TestCase):
def setUp(self):
super().setUp()
self.tempdir = self.get_tempdir()
5 changes: 3 additions & 2 deletions keras_cv/datasets/waymo/load_test.py
@@ -14,7 +14,8 @@
import os

import pytest
import tensorflow as tf

from keras_cv.tests.test_case import TestCase

try:
from keras_cv.datasets.waymo import load
@@ -24,7 +25,7 @@
pass


class WaymoOpenDatasetLoadTest(tf.test.TestCase):
class WaymoOpenDatasetLoadTest(TestCase):
def setUp(self):
super().setUp()
self.test_data_path = os.path.abspath(
4 changes: 3 additions & 1 deletion keras_cv/datasets/waymo/transformer_test.py
@@ -16,6 +16,8 @@
import pytest
import tensorflow as tf

from keras_cv.tests.test_case import TestCase

try:
from keras_cv.datasets.waymo import load
from keras_cv.datasets.waymo import transformer
@@ -25,7 +27,7 @@
pass


class WaymoOpenDatasetTransformerTest(tf.test.TestCase):
class WaymoOpenDatasetTransformerTest(TestCase):
def setUp(self):
super().setUp()
self.test_data_path = os.path.abspath(