
Yolov5 multiple objects detection in a video #2754

Open
VojtenkoRN opened this issue Aug 17, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@VojtenkoRN

Description

In my Quarkus project I'm trying to use DJL with a yolov5x model for multi-object detection in a video. I passed all MS COCO class names to the synset and then wanted to filter the result, but too many objects were detected and all of them had wrong bounding boxes.

Expected Behavior

Correct detection with correct bounding boxes

Error image example

result

How to Reproduce?

Example

Steps to reproduce

Just run the app as described in the README.

What have you tried to solve it?

I've tried passing a single class name to the synset (repeated 80 times), but I need several classes, and then every object is detected as that one class.

Environment Info

Engine.debugEnvironment() result:

----------- System Properties -----------
java.specification.version: 17
sun.jnu.encoding: UTF-8
java.vm.vendor: GraalVM Community
sun.arch.data.model: 64
java.vendor.url: https://www.graalvm.org/
logging.initial-configurator.min-level: 500
java.vm.specification.version: 17
os.name: Linux
sun.java.launcher: SUN_STANDARD
sun.boot.library.path: ~/.jdks/graalvm-ce-17/lib
jdk.debug: release
sun.cpu.endian: little
jboss.log-version: false
java.specification.vendor: Oracle Corporation
java.version.date: 2023-04-18
java.home: ~/.jdks/graalvm-ce-17
file.separator: /
java.vm.compressedOopsMode: Zero based
jdk.internal.vm.ci.enabled: true
line.separator: 

java.vm.specification.vendor: Oracle Corporation
java.specification.name: Java Platform API Specification
sun.management.compiler: HotSpot 64-Bit Tiered Compilers
java.runtime.version: 17.0.7+7-jvmci-22.3-b18
path.separator: :
os.version: 5.15.0-79-generic
java.runtime.name: OpenJDK Runtime Environment
file.encoding: UTF-8
java.vm.name: OpenJDK 64-Bit Server VM
java.vendor.version: GraalVM CE 22.3.2
java.vendor.url.bug: https://github.com/oracle/graal/issues
java.io.tmpdir: /tmp
java.version: 17.0.7
java.util.concurrent.ForkJoinPool.common.threadFactory: io.quarkus.bootstrap.forkjoin.QuarkusForkJoinWorkerThreadFactory
user.dir: ~/some-path/video-detection-example
os.arch: amd64
java.vm.specification.name: Java Virtual Machine Specification
native.encoding: UTF-8
java.util.logging.manager: org.jboss.logmanager.LogManager
java.library.path: /usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
java.vm.info: mixed mode, sharing
java.vendor: GraalVM Community
java.vm.version: 17.0.7+7-jvmci-22.3-b18
sun.io.unicode.encoding: UnicodeLittle
java.class.version: 61.0

--------- Environment Variables ---------
PATH: ~/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-17-oracle/bin:/usr/lib/jvm/java-17-oracle/db/bin
XAUTHORITY: ~/.Xauthority
GDMSESSION: cinnamon
XDG_DATA_DIRS: /usr/share/cinnamon:/usr/share/gnome:~/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share:/var/lib/snapd/desktop
JAVA_HOME: /usr/lib/jvm/java-17-oracle
XDG_CONFIG_DIRS: /etc/xdg/xdg-cinnamon:/etc/xdg
XDG_SEAT_PATH: /org/freedesktop/DisplayManager/Seat0
DBUS_SESSION_BUS_ADDRESS: unix:path=/run/user/1000/bus
XDG_SESSION_TYPE: x11
XDG_SESSION_ID: c2
XDG_CURRENT_DESKTOP: X-Cinnamon
DISPLAY: :0
SESSION_MANAGER: local/host-Linux:@/tmp/.ICE-unix/3463,unix/host-Linux:/tmp/.ICE-unix/3463
CINNAMON_VERSION: 5.6.8
PWD: ~/some-path/video-detection-example
DERBY_HOME: /usr/lib/jvm/java-17-oracle/db
XDG_SESSION_CLASS: user
GJS_DEBUG_TOPICS: JS ERROR;JS LOG
SHELL: /bin/bash
GTK3_MODULES: xapp-gtk3-module
GIO_LAUNCHED_DESKTOP_FILE: ~/.local/share/applications/jetbrains-idea.desktop
XDG_GREETER_DATA_DIR: /var/lib/lightdm-data/username
J2SDKDIR: /usr/lib/jvm/java-17-oracle
DESKTOP_SESSION: cinnamon
GPG_AGENT_INFO: /run/user/1000/gnupg/S.gpg-agent:0:1
GIO_LAUNCHED_DESKTOP_FILE_PID: 5011
QT_ACCESSIBILITY: 1
GNOME_DESKTOP_SESSION_ID: this-is-deprecated
GJS_DEBUG_OUTPUT: stderr
XDG_SEAT: seat0
J2REDIR: /usr/lib/jvm/java-17-oracle
GTK_MODULES: gail:atk-bridge
SSH_AUTH_SOCK: /run/user/1000/keyring/ssh
GTK_OVERLAY_SCROLLING: 1
XDG_SESSION_PATH: /org/freedesktop/DisplayManager/Session0
QT_QPA_PLATFORMTHEME: qt5ct
XDG_RUNTIME_DIR: /run/user/1000
XDG_SESSION_DESKTOP: cinnamon
XDG_VTNR: 7
SHLVL: 0
HOME: ~

-------------- Directories --------------
temp directory: /tmp
DJL cache directory: ~/.djl.ai
Engine cache directory: ~/.djl.ai

------------------ CUDA -----------------
GPU Count: 1
CUDA: 122
ARCH: 86
GPU(0) memory used: 1167327232 bytes

----------------- Engines ---------------
DJL version: 0.23.0

----------------- Startup logs ---------------
Default Engine: PyTorch:1.13.1, capabilities: [
	CUDA,
	CUDNN,
	OPENMP,
	MKL,
	MKLDNN,
]
PyTorch Library: ~/.djl.ai/pytorch/1.13.1-SNAPSHOT-cu117-linux-x86_64
Default Device: gpu(0)
PyTorch: 2
@VojtenkoRN VojtenkoRN added the bug Something isn't working label Aug 17, 2023
@KexinFeng
Contributor

KexinFeng commented Aug 17, 2023

To debug this, the first observation from the output image is that the bounding boxes are not in the right positions. So in the detection result (DetectedObjects) you can check the x, y, w, h of the bounding boxes to see whether the numbers are indeed wrong. You can also check the classNames in the same DetectedObjects. After this, you will know whether it is purely a plotting problem (the bounding box numbers should be compatible with the image size) or a postprocessing problem.
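One quick way to do that check (a sketch only: the helper below takes plain doubles standing in for the x, y, width, height you would read from each DetectedObject's bounding box, and the `BoxCheck` name is hypothetical):

```java
// Sketch: flag bounding boxes whose coordinates fall outside the image.
// In DJL these values would come from the detected object's bounding box
// bounds; here they are plain doubles so the check is self-contained.
final class BoxCheck {

    static boolean isPlausible(
            double x, double y, double w, double h, double imgW, double imgH) {
        return x >= 0 && y >= 0 && w > 0 && h > 0
                && x + w <= imgW && y + h <= imgH;
    }

    public static void main(String[] args) {
        // A 100x80 box at (50, 40) fits inside a 640x640 image.
        System.out.println(isPlausible(50, 40, 100, 80, 640, 640));
        // A box extending past the right edge does not.
        System.out.println(isPlausible(600, 40, 100, 80, 640, 640));
    }
}
```

Boxes that fail such a check point toward postprocessing rather than plotting.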

You can take a look at this example: /djl/examples/src/main/java/ai/djl/examples/inference/MaskDetection.java. It is similar to your use case. Check how the images are rescaled, if needed. The relevant PR is #2452, which contains a detailed .md file.

Also, a first check might be on the ONNX model itself: you can load it in Python and run inference to see what the correct bounding boxes and class names should be.

@VojtenkoRN
Author

VojtenkoRN commented Aug 18, 2023

1. I've added .optArgument("optApplyRatio", true) and .optArgument("rescale", true), but it doesn't change anything.

debug0

2. As I said earlier, too many objects were detected and all of them had wrong bounding boxes (result for the video frame above).

debug1
debug2

3. I changed torchscript to ONNX and installed TensorRT. I also rewrote the code like your example (branch onnx-model).
    It throws File not found: ~/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/51b9a350d1dd717c7e5d1c1c133e6afb8d7c12f0/synset.txt. I haven't found in the docs how to specify the synset for YoloV5TranslatorFactory. But if I put synset.txt into that path in the DJL cache (which is rather inconvenient), it throws UnsupportedOperationException: This NDArray implementation does not currently support this operation when trying to predict.
    Synset.txt:
    debug3

@frankfliu
Contributor

A few obvious issues:

First of all, .optArgument() only affects a TranslatorFactory, but you are passing a Translator directly. In your case, you need to configure your Translator directly:

Translator<Image, DetectedObjects> translator = YoloV5Translator
        .builder()
        .setPipeline(pipeline)
        .optSynset(Synset.asNameList())
        .optThreshold(THRESHOLD)
        .optRescaleSize(IMAGE_SIZE, IMAGE_SIZE)
        .optApplyRatio(true)
        .optOutputType(YoloV5Translator.YoloOutputType.AUTO)
        .build();

Another small issue: you have the following code:

              .optDevice(Device.gpu())
              .optDevice(Device.cpu())
  1. DJL detects GPU vs CPU automatically; in general you don't need to specify a device.
  2. The last call overrides the previous one, so in your code only the CPU will be used.
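This last-call-wins behavior is just how setter-style builders work: each call overwrites the same stored field. A minimal toy illustration (not DJL's actual Criteria class):

```java
// Toy builder demonstrating why only the last optDevice(...) call matters:
// every call stores into the same field, discarding the previous value.
final class ToyCriteria {

    private String device;

    ToyCriteria optDevice(String device) {
        this.device = device; // overwrites any value set earlier
        return this;
    }

    String device() {
        return device;
    }

    public static void main(String[] args) {
        ToyCriteria criteria = new ToyCriteria()
                .optDevice("gpu(0)")
                .optDevice("cpu"); // this call wins
        System.out.println(criteria.device()); // prints: cpu
    }
}
```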

@VojtenkoRN
Author

VojtenkoRN commented Aug 18, 2023

Thanks for advice!
I rewrote the Translator as you said and changed the model to run it on the GPU (my bad).
The problems still remain :(

Detected:
debug0
debug1

On screen:
debug3

UPD: All changes committed

@frankfliu
Contributor

I tried your model and it seems to work fine:

Path imageFile = Paths.get("src/test/resources/dog_bike_car.jpg");
Image img = ImageFactory.getInstance().fromFile(imageFile);

Criteria<Image, DetectedObjects> criteria =
        Criteria.builder()
                .optApplication(Application.CV.OBJECT_DETECTION)
                .setTypes(Image.class, DetectedObjects.class)
                .optModelPath(Paths.get("video-detection-example/model/yolov5x.torchscript"))
                .optEngine("PyTorch")
                .optArgument("width", "640")
                .optArgument("height", "640")
                .optArgument("resize", "true")
                .optArgument("rescale", "true")
                .optArgument("optApplyRatio", "true")
                .optArgument("threshold", "0.4")
                .optArgument("synsetUrl", "https://djl-ai.s3.amazonaws.com/mlrepo/model/cv/object_detection/ai/djl/pytorch/classes_coco.txt")
                .optTranslatorFactory(new YoloV5TranslatorFactory())
                .optProgress(new ProgressBar())
                .build();

try (ZooModel<Image, DetectedObjects> model = criteria.loadModel();
    Predictor<Image, DetectedObjects> predictor = model.newPredictor()) {
    DetectedObjects detection = predictor.predict(img);
    saveBoundingBoxImage(img, detection);
    return detection;
}

@VojtenkoRN
Author

VojtenkoRN commented Aug 18, 2023

Ok, thanks a lot! I'll check it as best I can.
But is there any other way to pass the synset list, besides .optArgument("synsetUrl", "..."), when using .optTranslatorFactory(new YoloV5TranslatorFactory())? Putting synset.txt into ~/.djl.ai/cache/.../ is rather inconvenient :(

@frankfliu
Contributor

frankfliu commented Aug 19, 2023

@VojtenkoRN

Yes, you can use .optArgument("synset", "dog,cat,car,..."), but we don't do any CSV-like comma escaping, so it has limitations.
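For what it's worth, building that comma-separated value from a class list is a one-liner with String.join (the class names below are just the first few COCO labels, and `SynsetArg` is a hypothetical helper name):

```java
import java.util.List;

// Join class names into the comma-separated form expected by
// .optArgument("synset", ...). Note: no CSV-style escaping is applied,
// so class names containing commas would break the argument.
final class SynsetArg {

    static String asString(List<String> classes) {
        return String.join(",", classes);
    }

    public static void main(String[] args) {
        System.out.println(asString(List.of("person", "bicycle", "car")));
        // prints: person,bicycle,car
    }
}
```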

@VojtenkoRN
Author

VojtenkoRN commented Aug 20, 2023

Thank you all!

Option 1 (was):

Pipeline pipeline = new Pipeline()
        .add(new Resize(IMAGE_SIZE))
        .add(new ToTensor());

Translator<Image, DetectedObjects> translator = YoloV5Translator
        .builder()
        .setPipeline(pipeline)
        .optSynset(Synset.asNameList())
        .optThreshold(THRESHOLD)
        .optRescaleSize(IMAGE_SIZE, IMAGE_SIZE)
        .optApplyRatio(true)
        .optOutputType(YoloV5Translator.YoloOutputType.AUTO)
        .build();

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optModelUrls(modelPath)
        .optModelName(modelName)
        .optDevice(Device.gpu())
        .optApplication(Application.CV.OBJECT_DETECTION)
        .optTranslator(translator)
        .optEngine(Engine.getDefaultEngineName())
        .optProgress(new ProgressBar())
        .build();

Option 2 (now, synset passed from code, branch master):

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optModelUrls(modelPath)
        .optModelName(modelName)
        .optDevice(Device.gpu())
        .optApplication(Application.CV.OBJECT_DETECTION)
        .optEngine(Engine.getDefaultEngineName())
        .optArgument("width", IMAGE_SIZE)
        .optArgument("height", IMAGE_SIZE)
        .optArgument("resize", "true")
        .optArgument("rescale", "true")
        .optArgument("optApplyRatio", "true")
        .optArgument("threshold", THRESHOLD)
        .optArgument("synset", Synset.asString())
        .optTranslatorFactory(new YoloV5TranslatorFactory())
        .optProgress(new ProgressBar())
        .build();

Result:
debug0
debug1
debug2

Option 3 (now, synset passed from a txt file, branch synset-in-txt):

final var synsetUrl = Path.of(synsetPath).toUri().toString();

Criteria<Image, DetectedObjects> criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optModelUrls(modelPath)
        .optModelName(modelName)
        .optDevice(Device.gpu())
        .optApplication(Application.CV.OBJECT_DETECTION)
        .optEngine(Engine.getDefaultEngineName())
        .optArgument("width", IMAGE_SIZE)
        .optArgument("height", IMAGE_SIZE)
        .optArgument("resize", "true")
        .optArgument("rescale", "true")
        .optArgument("optApplyRatio", "true")
        .optArgument("threshold", THRESHOLD)
        .optArgument("synsetUrl", synsetUrl)
        .optTranslatorFactory(new YoloV5TranslatorFactory())
        .optProgress(new ProgressBar())
        .build();

Result (this workaround seems to be working):
fine

But it seems a bit odd to me that options 1 and 2 don't work while option 3 does, although outwardly there is not much difference between them, especially considering that all classes seem to be defined correctly in option 2:
debug3

Thanks for the help again! If you think it isn't a bug, you can close this issue :)

@frankfliu
Contributor

I don't see any difference between the options. I just tested with the following code, and it works fine:

        String url = "https://djl-ai.s3.amazonaws.com/mlrepo/model/cv/object_detection/ai/djl/pytorch/classes_coco.txt";
        List<String> list;
        try (InputStream is = new URL(url).openStream()) {
            list = Utils.readLines(is);
        }
        String synset = String.join(",", list);

and:

        String url = "https://djl-ai.s3.amazonaws.com/mlrepo/model/cv/object_detection/ai/djl/pytorch/classes_coco.txt";
        List<String> list;
        try (InputStream is = new URL(url).openStream()) {
            list = Utils.readLines(is);
        }

        Pipeline pipeline = new Pipeline()
                .add(new Resize(640))
                .add(new ToTensor());

        Translator<Image, DetectedObjects> translator = YoloV5Translator
                .builder()
                .setPipeline(pipeline)
                .optSynset(list)
                .optThreshold(0.4f)
                .optRescaleSize(640, 640)
                .optApplyRatio(true)
                .optOutputType(YoloV5Translator.YoloOutputType.AUTO)
                .build();
