Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include EMNIST #55

Merged
merged 4 commits into from
Feb 9, 2021
Merged

Conversation

andrew-saydjari
Copy link
Contributor

Fixes #51

I think I have properly included all the data and functionality needed to integrate EMNIST into the package. It passed all tests.

I wrote very little fancy functionality and barebones tests/documentation. I plan to add rotMNIST soon and will try to improve things then time allowing. I really wanted the different datasets packaged in EMNIST to be called like EMNIST.byclass.traindata() etc. I am not sure how I implemented was the cleanest way possible, just creating a whole bunch of submodules. If you have advice on better ways to do this (passing functions outside the module inside so I do not have to redefine it every time, as well as having the using modules propagate), that would be great for future reference.

Super open to suggestions and feedback. This is my first PR in Julia (and ever!) so please be kind. Love the Package and am happy to contribute.

Copy link
Member

@johnnychen94 johnnychen94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work! My only concern is the miss of documentation; unlike other datasets (e.g., MNIST.traindata, EMNIST provides six types of sub-datasets, e.g., EMNIST.balanced.traindata and EMNIST.digits.traindata. This could be a source of confusion so I hope there's some usage documentation to it.

))
end

module balanced
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion, I think we should stick to the conversion that the module name is uppercased. EMNIST.Balanced, EMNIST.ByMerge, EMNIST.ByClass, EMNIST.Letters, EMNIST.Digits, and EMNIST.MNIST. This looks clearer to me, how do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The module names have been updated. I have also added some clarification on the front readme page on how the modules are nested for EMNIST since it is sort of 6x as many datasets as the rest. I have also added a table for fast look up of the number of samples (rather than forcing people to go all the way to the EMNIST webpage for that). Does that solve both problems?

@johnnychen94
Copy link
Member

@CarloLucibello I noticed that Flux has planed to deprecate its dataset loading codes, so it would be nice to have you double-check this.

@johnnychen94 johnnychen94 merged commit a1e524a into JuliaML:master Feb 9, 2021
@johnnychen94
Copy link
Member

@andrew-saydjari Thank you for doing this! This will be available soon in MLDatasets v0.5.5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EMNIST
2 participants