UC Irvine Machine Learning Repository is one the most popular collection of datasets that are avalaible for free.
This Package provides functions for the user to easily download from the website directly into a DataFrame.
Additionally, another function allows the user to view the accompanying metadata about the dataset.
julia> Pkg.clone("git://github.com/siddhantjain/UCIMLRepo.jl.git")
note: There are some errors that have been reported so far when trying to run this package on a windows machine. This space will be updated as and when the errors are cleared for windows machine
Two functions are available
1. ucirepodata("DataSetName")
2. ucirepoinfo("DataSetName")
3. ucirepolist()
Obtain a DataFrame with the entire iris data set
using UCIMLRepo
df = ucirepodata("iris")
Alternatively, you may mention the exact link of the dataset to be loaded. There is an optional argument that you need to set to false to do so.
using UCIMLRepo
df = ucirepodata("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",false)
print on STDOUT all the relevant information regarding the dataset
using UCIMLRepo
ucirepoinfo("iris")
As before the exact link may be mentioned for more information on the dataset
using UCIMLRepo
ucirepoinfo("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.names", false)
The package also displays all the packages that are available at the UCI ML repositories. For this end, a simple function as follows can be used
using UCIMLRepo
ucirepolist()
-
Add functionality to parse the output from ucirepoinfo and automatically name the attributes in the DataFrame
-
Add functionality to have a seperate datatype for each attribute in the dataset based on the output from ucirepoinfo
-
Better error handling routines
-
Allow for user to enter the url of the dataset -
Improve speed of ucirepolist