
removing data, models and logs directories from repo? #111

Closed
fmikaelian opened this issue Apr 29, 2019 · 13 comments
@fmikaelian
Collaborator

We could just let the script create them?

@andrelmfarias
Collaborator

@fmikaelian

I just made a pull request with the proposed changes. By the way, I think we should explain clearly in the README how to use download.py. I remember there was a section about it... did you remove it?

The correct way to use it should be:

  1. Navigate to the cdQA folder
  2. Run in the terminal: python ./cdqa/utils/download.py

@fmikaelian
Collaborator Author

fmikaelian commented May 7, 2019

Sorry, I removed the section because I was thinking about the ability to load the data directly from the package, either from disk or from the internet, instead of using a Python command.

Like they do in Keras for the data: https://keras.io/datasets/
And for the models: https://keras.io/applications/#xception
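
For illustration, the user-facing pattern in Keras looks roughly like this (standard Keras API, shown here only as the pattern we could mimic; files are downloaded on the first call and cached under ~/.keras afterwards):

    # Datasets: downloaded once, then read from the local ~/.keras/datasets cache
    from keras.datasets import boston_housing
    (x_train, y_train), (x_test, y_test) = boston_housing.load_data()

    # Pretrained models: weights downloaded once, cached under ~/.keras/models
    from keras.applications import Xception
    model = Xception(weights='imagenet')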

@andrelmfarias
Collaborator

Ah ok, I see. I think it is a good idea!

However, what about the creation of the data, models and logs folders locally? Do you think they should be created when the user loads a model or a dataset?

@fmikaelian
Collaborator Author

fmikaelian commented May 9, 2019

I think we can be inspired by Keras' get_file() and load_data() methods. The idea is to download a resource if it is not already cached in a .cdqa directory; otherwise, read it directly from that directory.

path = get_file(
        path,
        origin='https://s3.amazonaws.com/keras-datasets/boston_housing.npz',
        file_hash='f553886a1f8d56431e820c5b82552d9d95cfcb96d1e678153f8839538947dff5')
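
As a minimal sketch, a cdQA helper along those lines could look like this (the fetch_resource name, the cache layout and the arguments are hypothetical placeholders, not an existing API):

    import os
    import urllib.request

    def fetch_resource(url, filename, cache_subdir='data'):
        """Download a file into ~/.cdqa/<cache_subdir>/ unless it is already cached,
        and return its local path (hypothetical helper)."""
        cache_dir = os.path.join(os.path.expanduser('~'), '.cdqa', cache_subdir)
        os.makedirs(cache_dir, exist_ok=True)
        local_path = os.path.join(cache_dir, filename)
        if not os.path.exists(local_path):
            urllib.request.urlretrieve(url, local_path)
        return local_path

A trained model could then be fetched with something like fetch_resource(model_url, 'bert_qa.joblib', cache_subdir='models'), where model_url would point to a release asset (both names are placeholders here).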

@fmikaelian
Collaborator Author

Ideally, we can do the same for the datasets and the models. For the logs directory, we could maybe use the logging library and create .cdqa/logs with it to manage logs.
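
For example, with the standard logging module it could look roughly like this (the .cdqa/logs location is just the proposal above, not something that exists in the package today):

    import logging
    import os

    # Write log files under ~/.cdqa/logs instead of a logs/ folder inside the repo
    log_dir = os.path.join(os.path.expanduser('~'), '.cdqa', 'logs')
    os.makedirs(log_dir, exist_ok=True)

    logger = logging.getLogger('cdqa')
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.FileHandler(os.path.join(log_dir, 'cdqa.log')))
    logger.info('training started')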

@andrelmfarias
Collaborator

> I think we can be inspired by Keras' get_file() and load_data() methods:
>
> The idea is to download a resource if it is not already cached in a .cdqa directory; otherwise, read it directly from that directory.
>
>     path = get_file(
>             path,
>             origin='https://s3.amazonaws.com/keras-datasets/boston_housing.npz',
>             file_hash='f553886a1f8d56431e820c5b82552d9d95cfcb96d1e678153f8839538947dff5')

So we will actually create a cache directory .cdqa with subdirectories models and data? And the idea is to get rid of download.py?

> Ideally, we can do the same for the datasets and the models. For the logs directory, we could maybe use the logging library and create .cdqa/logs with it to manage logs.

I don't know this library, but I will take a look at it.

@fmikaelian
Collaborator Author

fmikaelian commented May 9, 2019

Yes.

It is just an idea, and I think this issue is not the most important at this stage. I think we should focus on #122, #91 and #115 for now. What do you think?

@andrelmfarias
Collaborator

I agree. We can do it later; let's focus on these other issues.

@fmikaelian
Collaborator Author

Came across this after the sklearn consortium; it might be a good source of inspiration for this issue: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html
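
For reference, its usage looks like this (real scikit-learn API; the data_home argument controls where the download is cached, which is close to the .cdqa cache idea, although it only covers datasets):

    from sklearn.datasets import fetch_openml

    # Downloaded from OpenML on the first call, read from the local cache afterwards
    mnist = fetch_openml('mnist_784', version=1, data_home='~/.cdqa/data')
    X, y = mnist.data, mnist.target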

@andrelmfarias
Collaborator

> Came across this after the sklearn consortium; it might be a good source of inspiration for this issue: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_openml.html

I am not sure it suits our needs, as our interest is in being able to download the models, and it works only for datasets. Do you also want to make it possible to download the BNP data?

@fmikaelian
Collaborator Author

I think the primary idea is to remove the data, models and logs directories from this repo. I usually don't see such directories across Python repositories. Users would therefore be able to download the models manually from the releases, then read them from anywhere and manage the data and logs as they wish.

The only thing is: if we remove those 3 directories now, what do we need to change? What is the impact? If none, let's remove them and close the issue.

@andrelmfarias
Collaborator

andrelmfarias commented Jun 14, 2019

Got it.

I do not think it will have an impact if we delete them, but I will run some tests to make sure.

The other point I want to address in my PR (which I am already working on) is #119.
