-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include EMNIST #55
Include EMNIST #55
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! My only concern is the miss of documentation; unlike other datasets (e.g., MNIST.traindata
, EMNIST provides six types of sub-datasets, e.g., EMNIST.balanced.traindata
and EMNIST.digits.traindata
. This could be a source of confusion so I hope there's some usage documentation to it.
src/EMNIST/EMNIST.jl
Outdated
)) | ||
end | ||
|
||
module balanced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid confusion, I think we should stick to the conversion that the module name is uppercased. EMNIST.Balanced
, EMNIST.ByMerge
, EMNIST.ByClass
, EMNIST.Letters
, EMNIST.Digits
, and EMNIST.MNIST
. This looks clearer to me, how do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. The module names have been updated. I have also added some clarification on the front readme page on how the modules are nested for EMNIST since it is sort of 6x as many datasets as the rest. I have also added a table for fast look up of the number of samples (rather than forcing people to go all the way to the EMNIST webpage for that). Does that solve both problems?
@CarloLucibello I noticed that Flux has planed to deprecate its dataset loading codes, so it would be nice to have you double-check this. |
@andrew-saydjari Thank you for doing this! This will be available soon in MLDatasets v0.5.5 |
Fixes #51
I think I have properly included all the data and functionality needed to integrate EMNIST into the package. It passed all tests.
I wrote very little fancy functionality and barebones tests/documentation. I plan to add rotMNIST soon and will try to improve things then time allowing. I really wanted the different datasets packaged in EMNIST to be called like EMNIST.byclass.traindata() etc. I am not sure how I implemented was the cleanest way possible, just creating a whole bunch of submodules. If you have advice on better ways to do this (passing functions outside the module inside so I do not have to redefine it every time, as well as having the using modules propagate), that would be great for future reference.
Super open to suggestions and feedback. This is my first PR in Julia (and ever!) so please be kind. Love the Package and am happy to contribute.