Update dataset_stats() for HUB #3536

Merged
merged 4 commits into from
Jun 9, 2021

@glenn-jocher glenn-jocher commented Jun 8, 2021

Cleanup of b6fdd2e

πŸ› οΈ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced dataset handling with autodownload feature and simplified usage.

📊 Key Changes

  • Added an autodownload parameter to the dataset_stats function for conditional dataset downloading.
  • Simplified data file paths by removing the 'data/' prefix from the default path argument.
  • Streamlined the process of checking dataset existence and potential automatic download within check_dataset function.
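
A minimal sketch of how an autodownload flag might gate the existence check described in the list above. The function name `check_dataset_sketch`, the `downloader` callback, and the `val`/`download` keys are assumptions made for illustration; this is not the code from this PR.

```python
from pathlib import Path
from typing import Callable, Optional


def check_dataset_sketch(data: dict, autodownload: bool = True,
                         downloader: Optional[Callable[[str], None]] = None) -> bool:
    """Return True if every 'val' path exists, optionally downloading first.

    Hypothetical stand-in for the behaviour described above, not the YOLOv5 code.
    """
    val = data.get('val')
    if not val:
        return True  # nothing to check
    paths = [Path(p) for p in (val if isinstance(val, list) else [val])]
    if all(p.exists() for p in paths):
        return True  # dataset already present, no download needed
    if autodownload and data.get('download') and downloader is not None:
        downloader(data['download'])  # caller-supplied fetch step (assumed)
        return all(p.exists() for p in paths)  # re-check after the download
    return False  # missing, and downloading was not requested or not possible
```

With `autodownload=False` the check only reports whether the data is present and the callback is never invoked, which is the kind of control the summary describes.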

🎯 Purpose & Impact

  • 🔄 Purpose: The updates give users more control over dataset downloads and streamline the developer experience when working with datasets.
  • ⬇️ Impact: Users can now opt to download missing datasets automatically during stat checks, which reduces manual download steps and the errors associated with missing data. Dropping the 'data/' prefix from the default path also makes the function simpler to call.

glenn-jocher commented Jun 8, 2021

@kalenmike this is our new function for dataset statistics. Example usage on VOC (20 classes). If test split is missing then it will have a value of None.

from utils.datasets import *; dataset_stats('voc.yaml', verbose=True)

Scanning '../coco128/labels/train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|██████████| 128/128 [00:06<?, ?it/s]
Scanning '../VOC/labels/train.cache' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:00<?, ?it/s]
Statistics: 100%|██████████| 16551/16551 [00:00<00:00, 260314.63it/s]
Scanning '../VOC/labels/val.cache' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 4952/4952 [00:00<?, ?it/s]
Statistics: 100%|██████████| 4952/4952 [00:00<00:00, 288326.74it/s]

The function returns a dictionary:

- nc: 20  # number of classes
  names:  # class names
  - aeroplane
  - bicycle
  - bird
  - boat
  - bottle
  - bus
  - car
  - cat
  - chair
  - cow
  - diningtable
  - dog
  - horse
  - motorbike
  - person
  - pottedplant
  - sheep
  - sofa
  - train
  - tvmonitor
  train:  # < ---  train split
    instances:
      total: 40058
      per_class:
      - 1171
      - 1064
      - 1605
      - 1140
      - 1764
      - 822
      - 3267
      - 1593
      - 3152
      - 847
      - 824
      - 2025
      - 1072
      - 1052
      - 13256
      - 1487
      - 1070
      - 814
      - 925
      - 1108
    images:
      total: 16551
      unlabelled: 0
      per_class:
      - 908
      - 795
      - 1095
      - 689
      - 950
      - 607
      - 1874
      - 1417
      - 1564
      - 444
      - 738
      - 1707
      - 769
      - 771
      - 6095
      - 772
      - 421
      - 736
      - 805
      - 831
  val:  # < --- val split
    instances:
      total: 12032
      per_class:
      - 285
      - 337
      - 459
      - 263
      - 469
      - 213
      - 1201
      - 358
      - 756
      - 244
      - 206
      - 489
      - 348
      - 325
      - 4528
      - 480
      - 242
      - 239
      - 282
      - 308
    images:
      total: 4952
      unlabelled: 0
      per_class:
      - 204
      - 239
      - 282
      - 172
      - 212
      - 174
      - 721
      - 322
      - 417
      - 127
      - 190
      - 418
      - 274
      - 222
      - 2007
      - 224
      - 97
      - 223
      - 259
      - 229
  test: null  # < --- test split
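
A result of this shape could be consumed roughly as follows. This is a sketch that assumes the keys shown above (`names`, and per split `instances` and `images`) and that a missing split is `None`; the helper name `summarize_splits` and the toy values are made up for illustration.

```python
def summarize_splits(stats: dict) -> dict:
    """Collapse per-split statistics into a few headline numbers."""
    summary = {}
    for split in ('train', 'val', 'test'):
        s = stats.get(split)
        if s is None:  # split absent from the dataset, e.g. test: null
            summary[split] = None
            continue
        per_class = s['instances']['per_class']
        # index of the class with the most labelled instances
        top = max(range(len(per_class)), key=per_class.__getitem__)
        summary[split] = {
            'images': s['images']['total'],
            'instances': s['instances']['total'],
            'most_common': stats['names'][top],
        }
    return summary


# Toy input with made-up numbers, only to show the expected structure
toy = {
    'nc': 2,
    'names': ['cat', 'dog'],
    'train': {'instances': {'total': 5, 'per_class': [2, 3]},
              'images': {'total': 4, 'unlabelled': 0, 'per_class': [2, 3]}},
    'val': None,
    'test': None,
}
print(summarize_splits(toy))
```

Checking for `None` before indexing keeps the helper usable on datasets such as the VOC example, which has no test split.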

@glenn-jocher glenn-jocher requested a review from kalenmike June 8, 2021 21:18

@kalenmike kalenmike left a comment

Looks good

@glenn-jocher glenn-jocher merged commit 1b5edb6 into master Jun 9, 2021
@glenn-jocher glenn-jocher deleted the glenn-jocher-patch-4 branch June 9, 2021 08:56
@glenn-jocher

@kalenmike great! PR is merged.

Lechtr pushed a commit to Lechtr/yolov5 that referenced this pull request Jul 20, 2021
* Update `dataset_stats()` for HUB 

Cleanup of b6fdd2e

* autodownload flag

* Update general.py

* cleanup

(cherry picked from commit 1b5edb6)
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
* Update `dataset_stats()` for HUB 

Cleanup of 03b286e

* autodownload flag

* Update general.py

* cleanup