Add bounding box #219

Closed
peterdesmet opened this issue Aug 19, 2022 · 13 comments

@peterdesmet
Member

See discussion in #203. The best solution was to add the bounding box as a property of the observation. What isn't defined yet is the expected format.

@ddachs

ddachs commented Sep 28, 2022

I just want to confirm the need for the bounding box info in the observations.csv table. I guess the format is rather secondary, as long as it is defined, because you can easily transform the coordinates.

@peterdesmet
Member Author

Discussed with @kbubnicki

  • Name term boundingBox (most recognizable)
  • Insert right after mediaID (to zoom in further)
  • Only use it for media-observations table (not event-observations)
  • Definition to be provided
  • Recommended format to be provided

@ddachs

ddachs commented Jan 26, 2023

I recommend using the YOLO format.
This way the coordinates will be independent of the image size (which can vary).
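
As a minimal sketch of why this is size-independent, assume a box given in pixels as a top-left corner plus width and height. The function name `to_yolo` and the pixel values are hypothetical and only serve as illustration; they are not part of Camtrap DP. Dividing by the image dimensions gives the normalized YOLO values:

```python
def to_yolo(x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel box (top-left corner + size) to normalized YOLO values.

    Returns (x_center, y_center, width, height), each expressed as a
    fraction of the image width or height.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return (x_center, y_center, box_w / img_w, box_h / img_h)


# The same relative box in two differently sized images (hypothetical values)
small = to_yolo(100, 50, 200, 100, img_w=800, img_h=400)
large = to_yolo(200, 100, 400, 200, img_w=1600, img_h=800)
```

Both calls describe the same region relative to the image, so `small` and `large` hold identical values.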

@peterdesmet
Member Author

peterdesmet commented Feb 6, 2023

@kbubnicki for Agouti, we would like the bounding box field to also support the [x, y] position of animals. I guess that should be possible in YOLO format ([x_center, y_center, width, height]) by writing it as [x, y, 0, 0]?

@peterdesmet
Member Author

@danstowell in reply to #314 (comment), if you want to classify a media file containing 3 sparrows with bounding boxes, you would have the following 3 observations:

| observationID | mediaID | scientificName | start | end | boundingBox |
| --- | --- | --- | --- | --- | --- |
| obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x1, y1, width1, height1] |
| obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x2, y2, width2, height2] |
| obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | [x3, y3, width3, height3] |

@kbubnicki
Contributor

Alternatively, we could store bounding box data in 4 separate columns, thus enforcing exactly one bounding box per observation row:

| observationID | mediaID | scientificName | start | end | bboxX | bboxY | bboxWidth | bboxHeight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| obs1 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x1 | y1 | width1 | height1 |
| obs2 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x2 | y2 | width2 | height2 |
| obs3 | med1 | Passer domesticus | 2020-08-02T05:00:15Z | 2020-08-02T05:00:15Z | x3 | y3 | width3 | height3 |

@danstowell I remember your comment about storing structured data within a CSV cell. What do you think?

@kbubnicki
Contributor

The format would be:

[
    {
        "name": "bboxX",
        "description": "The relative X coordinate of a bounding box center, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxY",
        "description": "The relative Y coordinate of a bounding box center, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxWidth",
        "description": "The relative width of a bounding box, normalized to the image width.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    },
    {
        "name": "bboxHeight",
        "description": "The relative height of a bounding box, normalized to the image height.",
        "type": "number",
        "constraints": {
            "required": false,
            "minimum": 0,
            "maximum": 1
        },
        "example": 0.5
    }
]

This is the YOLO format (also suggested by @ddachs). The advantage of this format (i.e. coordinates of the center instead of e.g. the upper-left corner) is that the bboxX and bboxY columns can be used to store information on the relative position of an animal in an image (e.g. estimated using image-calibration methods for distance sampling applications) without defining an entire bounding box. In that case bboxWidth and bboxHeight are simply zeros.
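
A rough sketch of how a consumer might apply the constraints proposed above, assuming a row has already been read into a dict with numeric values. The helper names `validate_bbox` and `is_point` are hypothetical and not part of any Camtrap DP tooling:

```python
FIELDS = ("bboxX", "bboxY", "bboxWidth", "bboxHeight")


def validate_bbox(row):
    """Return a list of error messages for the four bbox fields of a row."""
    errors = []
    for name in FIELDS:
        value = row.get(name)
        if value is None:
            # "required": false in the schema, so a missing value is accepted
            continue
        if not 0 <= value <= 1:
            errors.append(f"{name}={value} is outside [0, 1]")
    return errors


def is_point(row):
    """True when the row stores only a position (zero width and height)."""
    return row.get("bboxWidth") == 0 and row.get("bboxHeight") == 0


# Hypothetical rows: a position only, and a box with an out-of-range value
point_row = {"bboxX": 0.5, "bboxY": 0.5, "bboxWidth": 0, "bboxHeight": 0}
bad_row = {"bboxX": 1.2, "bboxY": 0.5, "bboxWidth": 0.1, "bboxHeight": 0.1}
```

Here `point_row` passes validation and is recognized as a point, while `bad_row` yields one error for bboxX.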

@peterdesmet
Member Author

I like that approach.

@danstowell
Contributor

Yes, this is indeed a bit clearer. I wasn't planning to comment on that aspect though, because I don't know which of those two options (i.e. single compound column, or separated into columns) will be easier for your target users to produce/consume. If it matches YOLO format then that's an argument in support of it.

Within AudioVisual Core we specified something similar except it was a top-left corner. I rather wish the centrepoint had been an option we considered, since it has some handy properties. (I note also that in AC, zero-sized rectangles are explicitly disallowed, though zero-sized circles are to be used instead! So that's compatible.)

@peterdesmet
Member Author

Thanks @danstowell! Given that AudioVisual Core adopted the top-left corner, we might consider that too ... so we can reference the terms?

bboxX -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/xFrac
bboxY -- skos:exactMatch --> http://rs.tdwg.org/ac/terms/yFrac

@danstowell @kbubnicki Or would you advise against that?

Note: the advantage of splitting into columns is that validation is easier to write (e.g. x should be between 0 and 1).
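
To illustrate the difference, here is a rough sketch comparing the two options. It assumes the single-column value would be serialized as a JSON array, which this thread does not specify, and both function names are hypothetical:

```python
import json


def check_compound(cell):
    """Validate a single-column value such as "[0.1, 0.2, 0.3, 0.4]".

    The cell has to be parsed and its structure checked before the
    range check can even start.
    """
    try:
        values = json.loads(cell)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(values, list)
        and len(values) == 4
        and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) and 0 <= v <= 1
            for v in values
        )
    )


def check_split(x, y, width, height):
    """Validate four separate numeric columns: one range check per value."""
    return all(0 <= v <= 1 for v in (x, y, width, height))
```

With separate columns the range check can also be expressed directly as `minimum`/`maximum` constraints in the table schema, so a custom parser like `check_compound` is not needed.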

@peterdesmet
Member Author

@danstowell @baskaufs I'd like to know how we should reference the AC terms and how important the AC Notes are.

For example, our bboxWidth follows the definition of http://rs.tdwg.org/ac/terms/widthFrac exactly:

The width of the bounding rectangle, expressed as a decimal fraction of the width of the media item.

But we might allow 0 widths, which contradicts the notes of http://rs.tdwg.org/ac/terms/widthFrac:

Zero-sized bounding rectangles are not allowed. To designate a point, use the radius option with a zero value.

Is our bboxWidth then still an exact match, or is it broader (because we allow more)?

@peterdesmet
Member Author

Update based on #323

  • We have now adopted the top-left corner rather than the center. It aligns with the MegaDetector format and AC
  • We don't allow 0 values anymore
  • AC terms are broader than Camtrap DP terms, because the bounding boxes should encompass observed individuals, not just any object.
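
For data already stored in the earlier center-based form, the switch could be sketched as below. The function names are hypothetical, and the check that the box stays inside the image (`x + width <= 1`) is an assumption added here for illustration, not something stated in this thread:

```python
def center_to_top_left(x_center, y_center, width, height):
    """Convert a normalized center-based box to (x, y, width, height),
    where (x, y) is the top-left corner."""
    return (x_center - width / 2, y_center - height / 2, width, height)


def is_valid_top_left(x, y, width, height):
    """Check a normalized top-left box: non-zero size, inside the image."""
    return (
        0 <= x <= 1
        and 0 <= y <= 1
        and 0 < width <= 1   # zero values are no longer allowed
        and 0 < height <= 1
        and x + width <= 1   # assumption: box must not extend past the image
        and y + height <= 1
    )


# Hypothetical box centered in the image
box = center_to_top_left(0.5, 0.5, 0.5, 0.25)
```

A point stored as `x, y, 0, 0` under the earlier proposal would fail `is_valid_top_left`, consistent with zero values no longer being allowed.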

@baskaufs

@peterdesmet Cool. Prior to adopting the AC terms, we looked at a number of systems for defining bounding boxes. Most (nearly all?) had 0,0 as the upper left corner. So following that convention simplifies the conversion to other systems.
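
As an illustration of such a conversion, a minimal sketch that turns a normalized top-left box into pixel corner coordinates. The function name and the rounding to whole pixels are assumptions made for this example:

```python
def to_pixel_corners(x, y, width, height, img_w, img_h):
    """Convert a normalized top-left box to pixel (x_min, y_min, x_max, y_max).

    The origin (0, 0) is the upper-left corner of the image.
    """
    x_min = round(x * img_w)
    y_min = round(y * img_h)
    x_max = round((x + width) * img_w)
    y_max = round((y + height) * img_h)
    return (x_min, y_min, x_max, y_max)


# Hypothetical box in an 800 x 400 pixel image
corners = to_pixel_corners(0.25, 0.5, 0.5, 0.25, img_w=800, img_h=400)
```

Because both systems share the same origin, the conversion is only a scaling by the image dimensions, with no offset to account for.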
