Open datapackage.json #284

Closed
Stephen-Gates opened this issue Dec 2, 2017 · 27 comments
Labels
f:Feature-request This issue is a request for a new feature fn:Open-Data
Milestone

Comments

@Stephen-Gates
Contributor

Stephen-Gates commented Dec 2, 2017

Desired Behaviour

Open a data package .zip or .json from a url or file

Acceptance test

@Stephen-Gates Stephen-Gates added f:Feature-request This issue is a request for a new feature fn:Open-Data labels Dec 2, 2017
@Stephen-Gates Stephen-Gates added this to the v1.x.x milestone Dec 2, 2017
@Stephen-Gates Stephen-Gates modified the milestones: v1.x.x, v1.6.0 Jan 10, 2018
@Stephen-Gates
Contributor Author

This is potentially a very important feature if the CKAN Packager extension does not generate a datapackage.zip. The datapackage.json generated may reference the data, schema and csv dialect by URL. The open function will need to deal with this.

If only a datapackage.json is provided, then the README.md will not be provided in the download.

@Stephen-Gates
Contributor Author

Related to #3

@Stephen-Gates Stephen-Gates modified the milestones: v1.5.0, v1.6.0 Feb 8, 2018
@Stephen-Gates
Contributor Author

@louisjasek to confirm this does not overlap with the CKAN work and, if it does, propose another issue for this sprint.

@Stephen-Gates
Contributor Author

Stephen-Gates commented Apr 18, 2018

This remains in scope. @Stephen-Gates - some of these are invalid according to spec (see fields where the sub-key is empty "" )

Test data

| location | notes | valid |
| --- | --- | --- |
| https://raw.githubusercontent.com/frictionlessdata/example-data-packages/master/cpi-data-via-url/datapackage.json | datapackage.json at url, data at url, schema and dialect in-line | valid |
| https://raw.githubusercontent.com/frictionlessdata/example-data-packages/master/iso-639-1-language-codes/datapackage.json | This data package implements the Language support pattern. | invalid |
| https://github.com/frictionlessdata/example-data-packages/raw/master/zip/cpi.zip | local datapackage.zip file, data in package, schema and dialect in-line | |
| https://github.com/frictionlessdata/example-data-packages/raw/master/zip/cpi.zip | datapackage.zip at url, data in package, schema and dialect in-line | |
| https://github.com/frictionlessdata/example-data-packages/raw/master/zip/donation-codes-via-url.zip | datapackage.zip at url, data, schema and dialect at url | |

@Stephen-Gates
Contributor Author

@mattRedBox I've clarified the acceptance tests, I hope.

Not sure how much you're able to respect all the dialect settings in a data package vs just dealing with comma/tab/semicolon separated files. See questions in the test.

@ghost

ghost commented Apr 23, 2018

@Stephen-Gates
There seem to be a few use cases here. So just need some clarification (as the acceptance tests don't seem to spell this out):

  1. Open a URL which is a datapackage.json, which contains URLs for each resource's path
  2. Open a URL which is a zipped datapackage (the equivalent of what our export function would create), containing all the csv files within the zipped package
  3. Open a URL which is a zipped datapackage, which contains URLs for each resource path

It seems there are more use cases here, but I'm not entirely sure about these ones:
4. Open a URL which is a datapackage.json, which somehow refers to resource paths as local relative file paths in a zip file. What is the indicator here that the zip file exists? My understanding is that there is no reference to a zip within a datapackage.json. Do we attempt to guess this from the resource parent folder and the base URL of the datapackage.json, or is there something within the spec that allows this to be spelt out (which would be a lot neater)?
5. Could there be a mix of URLs or local relative file paths for resources in any of the above use cases? Or is it always only 1 or the other (the resources are either: all URLS or all local relative file paths)
6. Any of the above could have an invalid datapackage.json or invalid zip, which returns an error indicating this to the user
7. Any of the above could have an invalid URL, which returns an error
8. Any of the above could have a URL which has no content ie: 404

Have I gone too far with these use cases? -> 4 and 5 I'm not so sure about, but just need clarification.
(I'm guessing there are other use cases I haven't thought of here - but it's unlikely I'll get through all of the above by the milestone date).
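
For what it's worth, cases 6 and 7 above could be sketched roughly as follows. This is only an illustration in Python, not Data-Curator code: `OpenPackageError`, `check_url` and `parse_descriptor` are made-up names, and the only spec rule assumed is that a datapackage.json must be a JSON object with a non-empty `resources` array. Case 8 (a 404) is left to whatever HTTP client does the download.

```python
import json
from urllib.parse import urlparse


class OpenPackageError(Exception):
    """Hypothetical error type shown to the user when a package cannot be opened."""


def check_url(url: str) -> None:
    """Case 7: reject anything that is not a syntactically usable http(s) URL."""
    try:
        parts = urlparse(url)
    except ValueError as err:
        raise OpenPackageError(f"Invalid URL: {url}") from err
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise OpenPackageError(f"Invalid URL: {url}")


def parse_descriptor(text: str) -> dict:
    """Case 6: reject a datapackage.json that is not JSON or has no resources."""
    try:
        descriptor = json.loads(text)
    except json.JSONDecodeError as err:
        raise OpenPackageError("datapackage.json is not valid JSON") from err
    if not isinstance(descriptor, dict):
        raise OpenPackageError("datapackage.json must be a JSON object")
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        raise OpenPackageError("datapackage.json has no resources")
    return descriptor
```

Both functions raise the same error type so the caller can show one kind of message regardless of which check failed.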

@Stephen-Gates
Contributor Author

@mattRedBox

  5. A datapackage.zip could refer to a mix of resources at a URL or locally.
  6. yes, file could be invalid or missing
  7. yes, URL could be invalid
  8. yes, URL could be missing

thinking about 4....

@ghost

ghost commented Apr 23, 2018

Hi @Stephen-Gates
Also thinking about what happens once Data-Curator downloads URLs (both datapackage.json and resource paths):
Once Data-Curator downloads these and opens resource paths in tabs, does the Data-Curator datapackage.json now become 'localised' ie: datapackage.json resource paths are now local and no longer URLs? This seems to make more sense ie: we don't hold on to URLs as Data-Curator export doesn't handle this (yet).

@Stephen-Gates
Contributor Author

@mattRedBox for 4. the .zip file is not in the mix at all...

You open a datapackage.json locally
- its data resources .csv may be at a URL or relative file location
- there is no README.md
- for this sprint the schema and dialect are inline in the datapackage.json

You open a datapackage.json at a URL
- its data resources .csv may be at a URL
- there is no README.md
- for this sprint the schema and dialect are inline in the datapackage.json
- example

You open a datapackage.json at a URL
- its data resources .csv may be at a location relative to the datapackage.json URL
- there is no README.md
- for this sprint the schema and dialect are inline in the datapackage.json
- example

@Stephen-Gates
Contributor Author

Also thinking about what happens once Data-Curator downloads URLs (both datapackage.json and resource paths):
Once Data-Curator downloads these and opens resource paths in tabs, does the Data-Curator datapackage.json now become 'localised' ie: datapackage.json resource paths are now local and no longer URLs? This seems to make more sense ie: we don't hold on to URLs as Data-Curator export doesn't handle this (yet).

Yes, data and data package properties are now local.
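
A minimal sketch of what "localising" could mean for the descriptor, assuming each resource `path` is a single string (the spec also allows an array of paths, which is not handled here). The function name `localise_paths` and the `data` folder are illustrative assumptions, not how Data-Curator does it.

```python
import copy
import posixpath
from urllib.parse import urlparse


def localise_paths(descriptor: dict, data_dir: str = "data") -> dict:
    """Return a copy of the descriptor whose URL resource paths point at local files."""
    local = copy.deepcopy(descriptor)  # leave the downloaded descriptor untouched
    for resource in local.get("resources", []):
        path = resource.get("path")
        if isinstance(path, str) and urlparse(path).scheme in ("http", "https"):
            # Keep only the file name from the URL and place it in data_dir.
            filename = posixpath.basename(urlparse(path).path)
            resource["path"] = posixpath.join(data_dir, filename)
    return local
```

Paths that are already relative are left as they are, so a package with a mix of URL and relative paths ends up with relative paths only.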

@Stephen-Gates
Contributor Author

Stephen-Gates commented Apr 23, 2018

This is the part of the spec that refers to file locations https://frictionlessdata.io/specs/data-resource/#path-data-in-files


@Stephen-Gates
Contributor Author

I guess you've discovered an edge case. What if a datapackage.json at a URL refers to my.csv relative to that URL and also references a .csv at another URL, e.g. http://xyz.com/another.csv ?

Perhaps the user is only interested in editing my.csv and not another.csv because it is someone else's data. We don't provide:

  • a facility to not "localise" another.csv
  • a facility to delete another.csv after localising and replace it with a reference to the URL

The only way to reference data at another URL will be via foreignKeys.

Does that make sense, @mattRedBox?

@Stephen-Gates
Contributor Author

Stephen-Gates commented Apr 23, 2018

If you can’t get it all done these priorities may help

  1. open datapackage.zip at URL with all resources in zip file (e.g. someone uploaded the zip to CKAN)
  2. open datapackage.zip at URL with all resources at a URL (e.g. how CKAN Data Package extension currently works)
  3. open datapackage.json at URL with all resources at relative URL (e.g. how most Data Packages are stored on GitHub)
  4. More to follow…

@ghost

ghost commented Apr 23, 2018

Hi @Stephen-Gates
Thanks for clarification:
So for the first and third case, you've just outlined:

  • when the datapackage.json is a URL, download and process
  • when any/all of the csvs are URLs, download and process (open in separate tabs)
  • when any/all of the csvs are relative file locations, create the URL, based on the datapackage URL using the relative location to append to the base URL, then download and process (open in separate tabs)
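
The last two bullets could look something like this in Python, using the standard library's `urljoin`. The function name `resolve_resource_url` and the example.com URL are just placeholders for illustration.

```python
from urllib.parse import urljoin, urlparse


def resolve_resource_url(descriptor_url: str, resource_path: str) -> str:
    """Return an absolute URL for a resource path found in a datapackage.json."""
    # A path that already has an http(s) scheme is downloaded as-is.
    if urlparse(resource_path).scheme in ("http", "https"):
        return resource_path
    # Otherwise it is relative to the folder that holds datapackage.json.
    return urljoin(descriptor_url, resource_path)


base = "https://example.com/packages/cpi/datapackage.json"
print(resolve_resource_url(base, "data/cpi.csv"))
# prints https://example.com/packages/cpi/data/cpi.csv
```

`urljoin` drops the final `datapackage.json` segment of the base URL before appending the relative path, which is the "append to the base URL" step described above.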

@ghost

ghost commented Apr 23, 2018

Hi @Stephen-Gates
OK, your last comment indicates there are more cases, so for those cases involving zip files:

  • if there is a datapackage.json as URL, any other files will never be in a zip, but either URLs or relative paths, for Data-Curator to interpret the URLs as required
  • if there is a zip package, download it and:
    -> process the contained datapackage.json as URL or file
    -> process resource paths (described in datapackage.json) as URLs or files
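
For the zip case, a rough Python sketch of pulling the datapackage.json out of a downloaded archive. It assumes the zip has already been fetched into memory as bytes, and that the descriptor may sit inside a single wrapping folder; `read_descriptor_from_zip` is a made-up name.

```python
import io
import json
import posixpath
import zipfile


def read_descriptor_from_zip(zip_bytes: bytes) -> tuple[dict, str]:
    """Return the parsed datapackage.json and the folder it sits in inside the zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        candidates = [
            name for name in archive.namelist()
            if posixpath.basename(name) == "datapackage.json"
        ]
        if not candidates:
            raise ValueError("zip does not contain a datapackage.json")
        # Prefer the shallowest match in case the archive nests other packages.
        name = min(candidates, key=lambda n: n.count("/"))
        descriptor = json.loads(archive.read(name).decode("utf-8"))
        return descriptor, posixpath.dirname(name)
```

The returned folder is the base for any relative resource paths in the descriptor; resource paths that are URLs would still need to be downloaded separately.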

@ghost

ghost commented Apr 23, 2018

@Stephen-Gates
Need further clarification (after this I'm going to post a summary of cases which I can then prioritise):

  • I've started with the case that there is a datapackage.json as a URL (so no ZIP file, just resource paths as URLS or relative file locations) - is this still a correct case?

@Stephen-Gates
Contributor Author

@mattRedBox In a training course until 2pm. Will be hard to reply.

ghost pushed a commit that referenced this issue Apr 24, 2018
ghost pushed a commit that referenced this issue Apr 24, 2018
ghost pushed a commit that referenced this issue Apr 24, 2018
ghost pushed a commit that referenced this issue Apr 24, 2018
ghost pushed a commit that referenced this issue Apr 25, 2018
@ghost

ghost commented Apr 26, 2018

Need to check licenses and sources in table/package
Also need to pull in package properties.

@Stephen-Gates
Contributor Author

Interim menu decision

Open Data Package...
  |- zip from URL
  |- zip from file
  |- json from URL
  |- json from file

@ghost

ghost commented Apr 26, 2018

Hi @Stephen-Gates
Datapackage-js, unfortunately, doesn't seem to support a zip as a URL yet.
The logic to load the datapackage first checks if the descriptor ends in '.zip' and then tries to load as a local resource:
Flags that this is a zip
Loads as local resource only
The logic to download a url comes later, but it is too late, as the '.zip' flag has already run.
I'll add loading zip as url to our logic to attempt to get in for this release.
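
To illustrate the ordering problem (this is a Python sketch of the idea, not the datapackage-js code): if the remote check runs before the '.zip' check, all four menu cases can be told apart. `classify_source` and its return values are hypothetical.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Classify a data package source as zip/json and URL/file."""
    # Decide remote vs local first, so a zip at a URL is not treated as a local file.
    is_remote = urlparse(source).scheme in ("http", "https")
    # For URLs, check the extension on the path only, ignoring any query string.
    path = urlparse(source).path if is_remote else source
    is_zip = path.lower().endswith(".zip")
    if is_remote:
        return "zip-url" if is_zip else "json-url"
    return "zip-file" if is_zip else "json-file"
```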

@Stephen-Gates
Contributor Author

Stephen-Gates commented Apr 26, 2018

ghost pushed a commit that referenced this issue Apr 26, 2018
ghost pushed a commit that referenced this issue Apr 26, 2018
…always receives string. Handle extra zipped folder in reading csvs from data package json.
ghost pushed a commit that referenced this issue Apr 26, 2018
@ghost

ghost commented Apr 26, 2018

Implemented all but json from file.

@Stephen-Gates
Contributor Author

Closing. Will add new issues for bugs found in v0.15.0.

@louisjasek

I haven't had a chance to test this yet - will do Mon/Tues next week

@louisjasek

louisjasek commented Apr 30, 2018

Just had a look at this, all seems to be ok as previously reported, however I have a couple of questions/observations:

  • in our showcase, it was raised that table properties might not be displayed (e.g. license information). However, it is working for me. I think the JSON/ZIP we opened in the showcase had license information at the package level rather than the table level. @mattRedBox did you have a look into this? Is this also as you would expect?
  • @Stephen-Gates @mattRedBox - do we display key words in the data curator UI? Where? This data package example contains key words at the package level (https://github.com/frictionlessdata/example-data-packages/raw/master/zip/cpi.zip). When I open it in DataCurator and export it as a new data package, the key words are still there, but I can't find the key words in the DataCurator UI in package properties.

@ghost

ghost commented Apr 30, 2018

Hi @louisjasek
Yes, I came to the same conclusion as you - package vs table level.
And the behaviour for key words is a bug I need to fix. To try to align with Frictionless as the spec changes, much of the time we pull/push whole settings in/out to capture new changes, so it seems I need to explicitly remove properties, if that's the behaviour that's required. That is, when pulling in an existing datapackage, do we:

  • ignore, but retain on export (ie: we try to keep the existing incoming json but don't allow the user to update it; there are also other package/table properties we write in the background that the user is currently not allowed to change. I'm not 100% sure how consistently this is applied off the top of my head, but it's something to keep an eye on once we determine what the business logic should be)
  • only retain certain properties that the package/table can change (ie: remove those not in Data-Curator). This would have to be spelt out explicitly; as mentioned above, there are also properties we add without allowing any user input, which the user would only see on export.
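
To make the two options concrete, here is a small Python sketch. The `EDITABLE` set is purely an assumed example of properties the UI exposes, and the function names are hypothetical; the real list would need to be agreed.

```python
# Assumed example of package properties the UI lets the user edit.
EDITABLE = {"name", "title", "description", "licenses", "sources"}


def split_properties(descriptor: dict, editable: set = EDITABLE) -> tuple[dict, dict]:
    """Split an incoming descriptor into UI-editable and hidden properties."""
    shown = {k: v for k, v in descriptor.items() if k in editable}
    hidden = {k: v for k, v in descriptor.items() if k not in editable}
    return shown, hidden


def export_retaining(shown: dict, hidden: dict) -> dict:
    """Option 1: hidden properties pass through unchanged; user edits win."""
    return {**hidden, **shown}


def export_filtering(shown: dict) -> dict:
    """Option 2: only properties known to the UI are written on export."""
    return dict(shown)
```

With option 1 a package that arrives with `keywords` still has them after export even though the user never saw them; with option 2 they are dropped.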

@Stephen-Gates
Contributor Author

Stephen-Gates commented May 1, 2018

See #730 and #419 #394

ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
…gainst URL syntax, no resource and invalid datapackage.json.
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
ghost pushed a commit that referenced this issue Apr 21, 2021
…always receives string. Handle extra zipped folder in reading csvs from data package json.
ghost pushed a commit that referenced this issue Apr 21, 2021