Update `dataset_stats()` for HUB #3536

glenn-jocher · 2021-06-08T21:15:28Z

Cleanup of b6fdd2e

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced dataset handling with an optional autodownload feature and simpler default paths.

📊 Key Changes

Added an `autodownload` parameter to `dataset_stats()` so that downloading a missing dataset is conditional on the caller's choice.
Removed the `data/` prefix from the default `path` argument, which simplifies data file paths.
Streamlined the dataset existence check and the optional automatic download inside `check_dataset()`.
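The conditional-download behavior described in the changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ensure_dataset` and its `download_fn` argument are hypothetical names, and the `val` and `download` keys are assumed to mirror the fields of a dataset YAML.

```python
from pathlib import Path
from typing import Callable, Optional


def ensure_dataset(data: dict, autodownload: bool = True,
                   download_fn: Optional[Callable[[str], None]] = None) -> bool:
    """Return True if a download was triggered, False if nothing was needed.

    Hypothetical helper for illustration only; not the real check_dataset().
    `data` mimics a parsed dataset YAML with optional 'val' and 'download' keys.
    """
    val = data.get("val")
    if not val:
        return False  # no validation path declared, nothing to check

    # Accept either a single path or a list of paths
    paths = [Path(p) for p in (val if isinstance(val, list) else [val])]
    missing = [p for p in paths if not p.exists()]
    if not missing:
        return False  # dataset already present locally

    source = data.get("download")
    if autodownload and source and download_fn is not None:
        download_fn(source)  # the caller decides how to fetch the data
        return True

    # Download disabled or no source available: fail loudly
    raise FileNotFoundError(f"Dataset not found, missing paths: {[str(p) for p in missing]}")
```

Passing `autodownload=False` turns a missing dataset into an immediate error instead of a network fetch, which may be the safer default for a statistics-only call.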

🎯 Purpose & Impact

🔄 Purpose: Give users more control over dataset downloads and streamline the developer experience when working with datasets.
⬇️ Impact: Users can now choose to download missing datasets automatically during stat checks, which reduces manual download steps and the errors that come with missing data. The simpler file paths also make the API easier to use.

glenn-jocher · 2021-06-08T21:17:51Z

@kalenmike this is our new function for dataset statistics. Example usage on VOC (20 classes) is below. If the test split is missing, its value is None.

from utils.datasets import *; dataset_stats('voc.yaml', verbose=True)

Scanning '../coco128/labels/train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|██████████| 128/128 [00:06<?, ?it/s]
Scanning '../VOC/labels/train.cache' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:00<?, ?it/s]
Statistics: 100%|██████████| 16551/16551 [00:00<00:00, 260314.63it/s]
Scanning '../VOC/labels/val.cache' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 4952/4952 [00:00<?, ?it/s]
Statistics: 100%|██████████| 4952/4952 [00:00<00:00, 288326.74it/s]

The function returns a dictionary:

- nc: 20  # number of classes
  names:  # class names
  - aeroplane
  - bicycle
  - bird
  - boat
  - bottle
  - bus
  - car
  - cat
  - chair
  - cow
  - diningtable
  - dog
  - horse
  - motorbike
  - person
  - pottedplant
  - sheep
  - sofa
  - train
  - tvmonitor
  train:  # < ---  train split
    instances:
      total: 40058
      per_class:
      - 1171
      - 1064
      - 1605
      - 1140
      - 1764
      - 822
      - 3267
      - 1593
      - 3152
      - 847
      - 824
      - 2025
      - 1072
      - 1052
      - 13256
      - 1487
      - 1070
      - 814
      - 925
      - 1108
    images:
      total: 16551
      unlabelled: 0
      per_class:
      - 908
      - 795
      - 1095
      - 689
      - 950
      - 607
      - 1874
      - 1417
      - 1564
      - 444
      - 738
      - 1707
      - 769
      - 771
      - 6095
      - 772
      - 421
      - 736
      - 805
      - 831
  val:  # < --- val split
    instances:
      total: 12032
      per_class:
      - 285
      - 337
      - 459
      - 263
      - 469
      - 213
      - 1201
      - 358
      - 756
      - 244
      - 206
      - 489
      - 348
      - 325
      - 4528
      - 480
      - 242
      - 239
      - 282
      - 308
    images:
      total: 4952
      unlabelled: 0
      per_class:
      - 204
      - 239
      - 282
      - 172
      - 212
      - 174
      - 721
      - 322
      - 417
      - 127
      - 190
      - 418
      - 274
      - 222
      - 2007
      - 224
      - 97
      - 223
      - 259
      - 229
  test: null  # < --- test split
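As a rough illustration of consuming this output, the sketch below summarizes one split and tolerates a missing one. It assumes the result is a plain dict keyed as shown above; `summarize_split` is a hypothetical helper, and the `stats` literal is trimmed to the `val` totals and per-class instance counts copied from the output (the per-class image counts are left out for brevity).

```python
from typing import Optional

# Trimmed copy of the output above: only the val split and the empty test split.
stats = {
    "nc": 20,
    "names": ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
              "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
              "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"],
    "val": {
        "instances": {
            "total": 12032,
            "per_class": [285, 337, 459, 263, 469, 213, 1201, 358, 756, 244,
                          206, 489, 348, 325, 4528, 480, 242, 239, 282, 308],
        },
        "images": {"total": 4952, "unlabelled": 0},
    },
    "test": None,  # missing split
}


def summarize_split(stats: dict, split: str) -> Optional[dict]:
    """Return a small summary for one split, or None if the split is missing."""
    info = stats.get(split)
    if info is None:
        return None
    counts = info["instances"]["per_class"]
    total = info["instances"]["total"]
    top = max(range(len(counts)), key=counts.__getitem__)  # index of largest class
    return {
        "most_common_class": stats["names"][top],
        "most_common_share": counts[top] / total,
        "instances_per_image": total / info["images"]["total"],
    }


print(summarize_split(stats, "val"))
print(summarize_split(stats, "test"))
```

Checking for `None` before indexing keeps callers from failing on datasets that ship without a test split.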

kalenmike

Looks good

glenn-jocher · 2021-06-09T08:56:24Z

@kalenmike great! PR is merged.

Update dataset_stats() for HUB

e54fc33

Cleanup of b6fdd2e

glenn-jocher requested a review from kalenmike June 8, 2021 21:18

glenn-jocher added 3 commits June 8, 2021 23:22

autodownload flag

800cc98

Update general.py

96aa25b

cleanup

d2defd6

kalenmike reviewed Jun 8, 2021

glenn-jocher merged commit 1b5edb6 into master Jun 9, 2021

glenn-jocher deleted the glenn-jocher-patch-4 branch June 9, 2021 08:56

Lechtr pushed a commit to Lechtr/yolov5 that referenced this pull request Jul 20, 2021

Update dataset_stats() for HUB (ultralytics#3536)

34847ed

* Update `dataset_stats()` for HUB
  Cleanup of b6fdd2e
* autodownload flag
* Update general.py
* cleanup

(cherry picked from commit 1b5edb6)

glenn-jocher mentioned this pull request Oct 12, 2021

YOLOv5 release v6.0 #5141

Merged

glenn-jocher mentioned this pull request Nov 7, 2021

YOLOv5 v6.0 compatibility update (draft) ultralytics/yolov3#1855

Closed

glenn-jocher mentioned this pull request Nov 14, 2021

YOLOv5 v6.0 compatibility update ultralytics/yolov3#1857

Merged

BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022

Update dataset_stats() for HUB (ultralytics#3536)

7001eb6

* Update `dataset_stats()` for HUB
  Cleanup of 03b286e
* autodownload flag
* Update general.py
* cleanup
