Dataset index and slicing #878
Conversation
Biggest code changes are in
What's the intended behavior when overlapping splits are provided?
Is it of interest to allow negative indices when (1) specifying a single index and/or (2) providing a list of indices? For (1), e.g., if I only want to evaluate on the penultimate sample; for (2), e.g., if I want to evaluate on the first and penultimate samples.
The intended behavior is simply to match the underlying TFDS behavior, which allows duplicate and out-of-order splits. I don't expect people to make much use of it, but it is much simpler on our end than trying to determine uniqueness.
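The TFDS behavior referenced here can be illustrated on a plain list: a split expression is a "+"-joined sum of slices, and nothing forbids repeating or reordering them. The helper below is a toy analogy written for this comment, not armory or TFDS code.

```python
def take(data, split_expr):
    """Toy emulation of TFDS split-sum semantics on a list.

    A split expression like "5:7+0:2+5:7" concatenates the named
    slices in order; duplicates and out-of-order ranges are simply
    accepted, mirroring what tfds.load allows in its `split` argument.
    """
    out = []
    for piece in split_expr.split("+"):
        start, stop = piece.split(":")
        out.extend(data[int(start):int(stop)])
    return out

samples = list(range(10))
# Out-of-order and duplicated slices are concatenated as given:
print(take(samples, "5:7+0:2+5:7"))  # [5, 6, 0, 1, 5, 6]
```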
I had not considered using negative indices. I didn't realize they would work. I can make an update to allow them if you think that makes sense.
I think, since it's unlikely they would be commonly used, we can hold off and only add later if it's something that a user is specifically looking for. |
LGTM
Fixes #871
This PR does a few related things:

- Renames "split_type" to "split" to make clear that it is passed verbatim to TFDS load (same kwarg name): https://www.tensorflow.org/datasets/api_docs/python/tfds/load
- Modifies scenarios so that configs can override the train and eval splits. This enables TFDS slicing operations on them such as "test[:10]" or "train_clean100+train_clean360+train_other500": https://www.tensorflow.org/datasets/splits#examples
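For example, a scenario config might override the evaluation split along these lines (the field names shown here are illustrative, not guaranteed to match the armory config schema exactly):

```json
{
  "dataset": {
    "name": "librispeech",
    "batch_size": 1,
    "eval_split": "test[:10]",
    "train_split": "train_clean100+train_clean360+train_other500"
  }
}
```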
- Adds additional parsing of splits to enable indexing a specific data point, such as "test[355]", and indexing from a list, such as "test[[1, 4, 5, 7, 9]]". Under the hood, these are translated to single-element slices. Advanced slicing operations involving a step parameter, such as "test[10:20:2]", are not permitted and raise a NotImplementedError.
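The index-to-slice translation described above could be sketched as follows. This is a hypothetical stand-alone helper (`expand_indexed_split` is a made-up name), not armory's actual parser, and it does not handle negative indices:

```python
import re


def expand_indexed_split(split: str) -> str:
    """Translate index-style split strings into TFDS slice syntax.

    "test[355]"       -> "test[355:356]"
    "test[[1, 4, 5]]" -> "test[1:2]+test[4:5]+test[5:6]"
    Plain TFDS splits and slices pass through unchanged; a step
    parameter (e.g. "test[10:20:2]") raises NotImplementedError.
    """
    # List of indices: "name[[i, j, ...]]" -> sum of single-element slices
    m = re.fullmatch(r"(\w+)\[\[([\d,\s]+)\]\]", split)
    if m:
        name, nums = m.group(1), m.group(2)
        indices = [int(n) for n in nums.split(",")]
        return "+".join(f"{name}[{i}:{i + 1}]" for i in indices)
    # Single index: "name[i]" -> "name[i:i+1]"
    m = re.fullmatch(r"(\w+)\[(\d+)\]", split)
    if m:
        name, i = m.group(1), int(m.group(2))
        return f"{name}[{i}:{i + 1}]"
    # Step slicing is explicitly rejected
    if re.fullmatch(r"\w+\[\d*:\d*:\d*\]", split):
        raise NotImplementedError("step slicing is not supported")
    # Anything else is assumed to be valid TFDS split syntax already
    return split


print(expand_indexed_split("test[355]"))           # test[355:356]
print(expand_indexed_split("test[[1, 4, 5]]"))     # test[1:2]+test[4:5]+test[5:6]
```

The resulting string can then be handed directly to TFDS, since sums of single-element slices are ordinary split expressions.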