Include EMNIST #55

andrew-saydjari · 2021-02-04T06:22:46Z

Fixes #51

I think I have properly included all the data and functionality needed to integrate EMNIST into the package. It passed all tests.

I wrote very little fancy functionality and only barebones tests and documentation. I plan to add rotMNIST soon and will try to improve things then, time allowing. I really wanted the different datasets packaged in EMNIST to be callable as EMNIST.byclass.traindata() etc. I am not sure that the way I implemented this, just creating a whole bunch of submodules, was the cleanest possible. If you have advice on better ways to do this (passing functions defined outside a module into it so I do not have to redefine them every time, as well as having the `using` statements propagate into the submodules), that would be great for future reference.

Super open to suggestions and feedback. This is my first PR in Julia (and ever!), so please be kind. I love the package and am happy to contribute.
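One possible answer to the submodule question above, offered only as a minimal sketch and not as what this PR actually does: define the shared function once in the parent module and generate the submodules in a loop with `@eval`, importing the parent's function through a relative path. The names `EMNISTSketch` and `load` are hypothetical placeholders, and the loader returns a string instead of reading any data.

```julia
# Hypothetical stand-in for the package module; not the code from this PR.
module EMNISTSketch

# Shared loader, defined once in the parent module.
# A real loader would read files from disk; this one only labels the request.
load(subset::Symbol, part::Symbol) = "$(subset)/$(part)"

# Generate one submodule per sub-dataset instead of writing each by hand.
for name in (:Balanced, :ByClass, :ByMerge, :Digits, :Letters, :MNIST)
    @eval module $name
        import ..load   # reuse the parent's function via a relative import
        traindata() = load($(QuoteNode(name)), :train)
        testdata()  = load($(QuoteNode(name)), :test)
    end
end

end # module EMNISTSketch

println(EMNISTSketch.ByClass.traindata())  # prints "ByClass/train"
```

Because each generated submodule is an ordinary module, `using` statements still have to be repeated inside the `@eval` body if the submodule needs them. The loop just keeps that repetition in one place.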

johnnychen94

Excellent work! My only concern is the lack of documentation: unlike other datasets (e.g., MNIST.traindata), EMNIST provides six types of sub-datasets, e.g., EMNIST.balanced.traindata and EMNIST.digits.traindata. This could be a source of confusion, so I hope some usage documentation can be added for it.

johnnychen94 · 2021-02-04T08:23:58Z

src/EMNIST/EMNIST.jl

+            ))
+        end
+
+        module balanced


To avoid confusion, I think we should stick to the convention that module names are capitalized: EMNIST.Balanced, EMNIST.ByMerge, EMNIST.ByClass, EMNIST.Letters, EMNIST.Digits, and EMNIST.MNIST. This looks clearer to me; what do you think?

andrew-saydjari

Agreed. The module names have been updated. I have also added some clarification to the README front page on how the modules are nested for EMNIST, since it contains roughly six times as many datasets as the others. I have also added a table for quick lookup of the number of samples (rather than forcing people to go all the way to the EMNIST webpage for that). Does that solve both problems?

johnnychen94 · 2021-02-05T09:39:54Z

@CarloLucibello I noticed that Flux plans to deprecate its dataset-loading code, so it would be nice to have you double-check this.

johnnychen94 · 2021-02-09T09:01:54Z

@andrew-saydjari Thank you for doing this! This will be available soon in MLDatasets v0.5.5.

include EMNIST

93bdb0d

johnnychen94 reviewed Feb 4, 2021

andrew-saydjari added 3 commits February 4, 2021 15:24

nomenclature shift

f548c1e

add doc on EMNIST

6d3fdb8

Update README.md

f3a2e37

johnnychen94 approved these changes Feb 5, 2021

johnnychen94 requested a review from CarloLucibello February 5, 2021 09:39

johnnychen94 merged commit a1e524a into JuliaML:master Feb 9, 2021
