Refactor ml extension support #1521
Conversation
This makes a few changes, specifically to fastText, but they can apply to other DJL ML wrappers as well. First, it creates several passthrough utility classes so that ML models can be loaded through the model zoo and run through the predictor.

Next, it modifies the base of the ML construct to better support engines that have multiple applications. Each application now manifests as a type of SymbolBlock rather than a model, so a single model class can run any of them. The model is still created for each engine because it contains general loading functionality that determines which block should be used to load the given target.

It also updates the fastText JNI to support this. First, it fixes modelType to actually return different results. Then, it changes the predictProba method to use an ArrayList instead of an array, because a topK of -1 can be passed to load all elements, and that doesn't work with a fixed-size array.

Change-Id: I5fbaceb2ac4711a942b9d15dac6e6fed386dd46e
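The topK handling described above can be sketched with a minimal, self-contained Java example. The class and label names here are illustrative stand-ins, not the actual fastText JNI: the point is only that a growable list can represent "return everything" (topK = -1), where a fixed-size array would need a length up front.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the predictProba change: with a fixed-size array,
// topK = -1 ("return all labels") has no valid length to allocate ahead of
// time, so results are collected into a growable list instead.
public final class PredictProbaSketch {

    // Stand-in label set; a real model would supply its own labels.
    private static final String[] LABELS = {"label_a", "label_b", "label_c"};

    private PredictProbaSketch() {}

    public static List<String> predictProba(int topK) {
        // topK < 0 means "all labels"; otherwise cap at the label count.
        int limit = topK < 0 ? LABELS.length : Math.min(topK, LABELS.length);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < limit; i++) {
            result.add(LABELS[i]);
        }
        return result;
    }
}
```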
Looks good in general. But I have one doubt about whether we should add fit to FtClassificationBlock. It has a discoverability problem. I think adding fit to a utility class is better. Also, we might not want to expose .classify() as public.
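The reviewer's suggestion can be sketched as a static utility class that hosts training entry points, instead of a fit method on each block. All names below are hypothetical, not the actual DJL API; the stub only illustrates the shape of the design.

```java
// Hypothetical sketch: training entry points live on one utility class, so
// users find them in a single place rather than on each block subclass.
public final class FtTrainingUtils {

    private FtTrainingUtils() {}

    // One static method per training task instead of fit() on every block.
    public static String fitClassification(String trainingFilePath, int epochs) {
        // Real training would call into the fastText JNI; this stub just
        // reports what it would do.
        return "trained classification model on " + trainingFilePath
                + " for " + epochs + " epochs";
    }
}
```

The trade-off raised later in the thread is that this pattern may not scale if one extension package wraps many engines with many applications.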
@@ -125,7 +140,7 @@ public void testBlazingText() throws IOException, ModelException {
         model.load(modelFile);
         String text =
                 "Convair was an american aircraft manufacturing company which later expanded into rockets and spacecraft .";
-        Classifications result = model.classify(text, 5);
+        Classifications result = ((FtTextClassification) model.getBlock()).classify(text, 5);
We should be able to replace this test with Criteria as well.
I am not quite sure what you are thinking here. One of the problems I am also trying to deal with is what happens when the number of applications is quite large. For example, if we have an extension package wrapping sparkml, corenlp, smile, etc. I don't think it would fit well in a single utility for all of the training possibilities. Do you have a better solution?
I plan to leave it for the moment, because the predictor API is not quite as flexible. For example, I didn't try to figure out how to support predict while specifying topK, so that option is only available through classify.
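The limitation described here can be illustrated with a small self-contained sketch (not the real DJL API): a generic predictor interface fixes the call signature to a single input, so an extra argument like topK is only reachable through a block-specific method such as classify.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of why topK is only available through classify():
// a generic Predictor fixes the signature to one input, so it cannot pass
// extra arguments and must bake in a default. All names are hypothetical.
public class TopKSketch {

    public interface Predictor<I, O> {
        O predict(I input);
    }

    public static class TextClassificationBlock {
        private static final List<String> LABELS =
                Arrays.asList("label_a", "label_b", "label_c");

        // Block-specific entry point that accepts topK (-1 means all labels).
        public List<String> classify(String text, int topK) {
            int limit = topK < 0 ? LABELS.size() : Math.min(topK, LABELS.size());
            return LABELS.subList(0, limit);
        }

        // The generic predictor can only forward a fixed signature, so it
        // has to choose a default topK on the caller's behalf.
        public Predictor<String, List<String>> newPredictor() {
            return text -> classify(text, -1);
        }
    }
}
```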
Codecov Report
@@ Coverage Diff @@
## master #1521 +/- ##
============================================
- Coverage 72.08% 70.68% -1.41%
- Complexity 5126 5349 +223
============================================
Files 473 500 +27
Lines 21970 23369 +1399
Branches 2351 2552 +201
============================================
+ Hits 15838 16518 +680
- Misses 4925 5569 +644
- Partials 1207 1282 +75
Continue to review full report at Codecov.
Change-Id: Iadf598d930dca954ee1a0f450012cd4b4b2baab5