add indexing to datasets #1003

Merged 5 commits into twosixlabs:dev on Mar 5, 2021

@davidslater davidslater commented Mar 4, 2021

Fixes #895
Currently, you can pass a numeric index when loading a dataset, like this:

import logging
import coloredlogs
coloredlogs.install(logging.DEBUG)
import numpy as np
from armory.data import datasets
ds = datasets.mnist("test", shuffle_files=False, index=None)
print(np.hstack([next(ds)[1] for i in range(10)]))
# [2 0 4 8 7 6 0 6 3 1]
ds = datasets.mnist("test", shuffle_files=False, index=[1, 3, 5, 7, 9])
print(np.hstack([next(ds)[1] for i in range(5)]))
# [0 8 6 6 1]
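
The effect of `index` in the snippet above can be illustrated without TensorFlow. The following is a minimal sketch, not the implementation in this PR: `select_by_index` is a hypothetical helper that keeps only the elements of an iterable whose positions are listed in the index, and the labels are the first ten values printed in the output above.

```python
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def select_by_index(items: Iterable[T], index: Sequence[int]) -> Iterator[T]:
    """Yield the elements of `items` whose positions appear in `index`.

    Hypothetical helper for illustration; assumes `index` holds unique,
    non-negative integers.
    """
    wanted = set(index)
    last = max(wanted) if wanted else -1
    for position, item in enumerate(items):
        if position > last:
            break  # nothing left to select, so stop consuming the input
        if position in wanted:
            yield item


# First ten test labels, copied from the output shown above.
labels = [2, 0, 4, 8, 7, 6, 0, 6, 3, 1]
selected = list(select_by_index(labels, [1, 3, 5, 7, 9]))
print(selected)  # [0, 8, 6, 6, 1]
```

The selected labels match the second printed line above, which suggests the index refers to positions in the unshuffled split.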

Before this is ready, I still need to:

  • check that this works with config files
  • enable handling of simple slices
  • update documentation
  • update ArmoryDataGenerator size
  • add tests
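
For the "simple slices" item, one possible approach is to turn a slice string into an explicit list of indices, which also gives the filtered size directly. This is only a sketch under assumptions: the `"[start:stop]"` string format and the `slice_to_index` name are hypothetical and are not taken from this PR.

```python
import re
from typing import List


def slice_to_index(text: str, dataset_size: int) -> List[int]:
    """Convert a string such as "[2:5]" or "[:3]" into a list of indices.

    Hypothetical helper: the accepted format is an assumption made for
    illustration, and only non-negative start/stop values are handled.
    """
    match = re.fullmatch(r"\[(\d*):(\d*)\]", text.strip())
    if match is None:
        raise ValueError(f"not a simple slice: {text!r}")
    start = int(match.group(1)) if match.group(1) else 0
    stop = int(match.group(2)) if match.group(2) else dataset_size
    # Cap stop at the dataset size so no index points past the data.
    return list(range(start, min(stop, dataset_size)))


index = slice_to_index("[2:5]", dataset_size=10)
print(index)       # [2, 3, 4]
print(len(index))  # 3, the size a data generator would need to report
```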

I'd be happy for any thoughts at this point.

@davidslater

@ng390 I think it's ready for another review.

@@ -269,6 +269,72 @@ def parse_split_index(split: str):
    return "+".join(output_tokens)


def filter_by_index(dataset: "tf.data.Dataset", index: list, dataset_size: int):

Very minor point - if dataset_size is expected to be an int, why is it cast to an int later? Should it be a Union of int and another type?

@davidslater

It's primarily so that if the input is not an int, it will fail fast.
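
The fail-fast behaviour described here can be sketched as follows. This is an illustration, not the body of `filter_by_index`: the `check_index` name and the specific checks on `index` are assumptions. Note that `int()` rejects values such as `None` or `"many"` immediately but silently truncates floats, so it acts as a coarse guard rather than a strict type check.

```python
def check_index(index: list, dataset_size: int) -> int:
    """Validate inputs up front and return the size as an int.

    Hypothetical helper; the checks on `index` are assumptions made for
    illustration only.
    """
    # int() raises TypeError for None and ValueError for non-numeric
    # strings, so bad input fails here rather than deep in a pipeline.
    dataset_size = int(dataset_size)
    if dataset_size < 1:
        raise ValueError("dataset_size must be positive")
    if sorted(set(index)) != list(index):
        raise ValueError("index must be sorted and unique")
    if index and not (0 <= index[0] and index[-1] < dataset_size):
        raise ValueError("index values must be in [0, dataset_size)")
    return dataset_size


print(check_index([1, 3, 5, 7, 9], 10))  # 10
for bad_size in (None, "many"):
    try:
        check_index([0], bad_size)
    except (TypeError, ValueError) as err:
        print(type(err).__name__)  # TypeError, then ValueError
```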

armory/data/datasets.py (outdated, resolved)
@davidslater

Okay, pushed change. Any further review?

@ng390

ng390 commented Mar 5, 2021

LGTM -- good to drop the WIP label?

@davidslater davidslater changed the title from "WIP: add indexing to datasets" to "add indexing to datasets" Mar 5, 2021
@ng390 ng390 merged commit 230f3d7 into twosixlabs:dev Mar 5, 2021
lcadalzo added a commit that referenced this pull request Mar 25, 2021
* Run CI tests on PRs/commits to dev (#1000)

* add indexing to datasets (#1003)

* kwargs

* update slicing

* update data loader

* add tests

* neal change

* Docker updates (#1008)

* added object detection metrics

* WIP: update docker dependencies (#1006)

* updating docker dependencies

* removing poisoning images

* update deepspeech image dependencies

* update host-requirements.txt

* removing poisoning images from release.yml

* add coloredlogs

* update ART pip install command

* remove accidental new line character

* Docker build simplification (#1017)

* remove -dev

* update

* docker build

* distribute docker containers

* update dockerfiles and github workflows

* update docs

* add pytorch-deepspeech to images

* add dev tag

* update release

* update error handling

* remove comment

* remove unneeded command

* tool to update versions

* removing calls to ART's set_learning_phase()

* updating image version

* formatting

* apricot skip (#1020)

Co-authored-by: davidslater <david.slater@twosixlabs.com>

* Fix missing validation folder in whl (#972)

* Fix missing validation folder in whl

* Automatically find test folder relative to armory installation

* Ignore pytest cache warning

* Fix import, add warning filter

* Disable test caching

* skip misclassified examples (#1005)

* added object detection metrics

* adding proof of concept

* black formatting

* move cli/config check to base scenario before _evaluate()

* refactor image_classification skip_misclassified

* update __main__ with skip-misclassified

* record_metric_per_sample doesn't need to be true

* add skip_misclassified to so2sat scenario

* adding skip_misclassified to scenarios where it shouldnt be used

* add skip_misclassified to video

* minor refactor of audio_classification plus adding in skip_misclassified

* docs for skip-misclassified

* Filter by class (#1019)

* added object detection metrics

* adding ability to filter by class; also modified error messages for filter_by_index()

* flake8

* fix new test

* updated docs

* update warning logic for filter_by_index

* add warning if filtering by class and using train split

* fix filter by class dset test

* Micronnet model using pytorch sequential api (#1023)

* Sequential API Pytorch version of MicronNet

* Fix pytest

* unify sysconfig and command line directives (#1027)

* merge sysconfig and command arguments

* unit test for config merge

arguments.py created because __main__  is not pytest importable
(or at least not easily)

additional args like filepath get added to sysconfig, I don't know
if that's a bad thing

* code correct but the test was wrong

* flake8 compliance

* args merge documented

* collapse loops to comprehensions

* remove obsoleted truth table comment

* add cross-reference to command_line.md

* DAPRICOT integration (#1021)

* initial draft of DAPRICOT dataset

* ran black and flake8

* updating checksums for now

* add dapricot dataset fn

* refactor preprocessing + setting cache=False temporarily

* add WIP dapricot scenario and config

* update scenario

* add label preprocessing to unify label format with other OD datasets

* added attack skeleton and minor scenario updates

* testing insertion of random patch

* changes necessary for upgrade to ART 1.6

* update config for attack rename

* typo

* use robust dpatch to generate attack

* update dapricot config

* black formatting

* format json

* flake8

* add masked pgd attack for dapricot

* formatting

* adding dapricot utils script

* fix channel order

* return batch in (3, H, W, C) shape

* fixing case when x_key is a tuple

* update model to be compatible with ART 1.6

* dont slice channel dim in reverse

* updated dapricot_dev with access to all three cameras

* minor update

* ran black, flake8

* update dapricot version and add new cached checksum file

* use cached by default

* add metric fn for dapricot

* parse patch shape

* fixes bug that arises bc there are no non-targeted adversarial metrics

* make scenario code more specific to dapricot threat model

* move config checks to before model loading

* update configs with label targeters

* add physical threat model

* removing some patch insertion functionality

* remove unused variable

* adding cv2 and necessary dependencies

* add skip-misclassified to dapricot scenario

* removing outdated dockerfile that isnt used anymore

* updating dockerfiles with opencv

* add DEBIAN_FRONTEND=noninteractive to dockerfiles

* docs and minor modification of attack config params

* no longer need to pin to ART dev branch

* adding dapricot test

* tweak to label preprocessing

* typo in sysconfig

* enforce batch_size 1 earlier; fix dapricot config

* fix typo

* json formatting

* reset patch_location and patch_shape per example

* add opencv to host_requirements.txt

* keep everything (3, H, W, 3)

* add comments to host_requirements.txt re cv2

Co-authored-by: lucas.cadalzo <lucas.cadalzo@twosixlabs.com>

* Audio echo (#1030)

* added logic

* audio channel

* audio channel

* audio fixes

* Dev merge (#1032)

* Bump pillow from 7.1.2 to 8.1.1 (#1024)

Bumps [pillow](https://github.com/python-pillow/Pillow) from 7.1.2 to 8.1.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
- [Commits](python-pillow/Pillow@7.1.2...8.1.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump tensorflow-gpu from 1.15.0 to 2.4.0 (#1025)

* Bump tensorflow-gpu from 1.15.0 to 2.4.0

Bumps [tensorflow-gpu](https://github.com/tensorflow/tensorflow) from 1.15.0 to 2.4.0.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v1.15.0...v2.4.0)

Signed-off-by: dependabot[bot] <support@github.com>

* Update host-requirements.txt

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: davidslater <david.slater@twosixlabs.com>

Co-authored-by: ng390 <neal.gupta@twosixlabs.com>
Co-authored-by: lcadalzo <39925313+lcadalzo@users.noreply.github.com>
Co-authored-by: kevinmerchant <67436031+kevinmerchant@users.noreply.github.com>
Co-authored-by: matt wartell <matt.wartell@twosixlabs.com>
Co-authored-by: yusong-tan <59029053+yusong-tan@users.noreply.github.com>
Co-authored-by: lucas.cadalzo <lucas.cadalzo@twosixlabs.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>