-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate out of org.json library #26
Comments
@georgeslabreche can you take a look at this as this is a blocker for integrations. |
The action item for this issue is to replace the org.json library in the datapackage-java code as it has a non OSI-compliant license. We should use a library with an open license such as MIT, Apache 2.0, or others listed on the OSI website. |
Based on usage stats, it's a choice between either GSON or Jackson, the others are either too old or small projects with relatively little use. For those two projects, long-term survival is not an issue. I would question whether exposing methods tying the external API to one specific JSON provider is a good design choice, though. I'd think it should use one lib internally and only communicate via JSON strings, not JSON objects. This way, a client is free in choosing their JSON provider. |
Thanks @iSnow! Do you think you would have time to work on this? Or do you think you could help me estimate how much work this would be? I don't know any Java, so it is hard for me to estimate how long it would take to fix this. |
Sure, @lwinfree. I am currently in discussion with @roll about an issue I created that deals with streamlining the datapackage API. Part of that is getting the dependency on org.json classes out of the interface so people could use datapackage in their projects and not be tied to one JSON provider. If that is accepted and the interface changes are landed, I would work on switching over to another JSON provider internally as well - it's not a super small task as the org.json classes are used for a lot of things, among them conversion to CSV. But it's totally doable. Working on both in parallel is a bit too much for me, even if we split off into a feature branch - since we also have to redo JSON handling in tableschema-java, there would be too many moving parts. |
@iSnow awesome. It is really great to have your input and help! |
@iSnow any update here? It would be so great to get the licensing issue resolved here. This would unblock getting this back in OpenRefine. |
I checked what it would take to migrate, but it is not looking good - my guesstimate would be a solid 3-4 weeks for both Java libs together as the json.org code is woven into the whole logic. Since the organization I work for can live with the strange json.org license, other issues like more rigid parsing, security and decent test coverage have a higher prio. And, unfortunately, working on this in parallel isn't really feasible, too many moving parts. |
Thanks for the update @iSnow! That is really helpful information, and I understand that you have higher priorities. To try and get some additional help on the issue, we are going to post this on bountysource (if you end up having the time to work on it, please accept the bounty yourself :-) ). |
Here's the bounty: https://www.bountysource.com/issues/62062088-migrate-out-of-org-json-library |
It's always hard to put a price tag on an issue, but in this case the bounty seems pretty small compared to the task. (just to be clear: I am not trying to bargain, I have no intention to work on this myself)
I agree - for instance the licenses in a Data Package are stored as JSON objects directly, using the underlying library, as the example in the README shows. It would probably be healthier to change this (by introducing the appropriate Given that migrating out of org.json will probably break a lot of interfaces, it might be the appropriate time to also tackle more profound architectural changes (for instance those suggested in #29). But of course that means more work - it could well mean rewriting most of the code! So in terms of effort estimate, I would rate this on the same sort of order of magnitude as what it took to write this library in the first place, IMHO. |
Thanks for the write-up, @wetneb - that's exactly it. I am at liberty to spend some time on improving the code on company costs, but not on topics that don't matter to the org (like this one) and not the amount of hours that would be needed for refactoring both libraries to a different JSON provider. I also like working on it in my free time, but fixing licensing issues is not that interesting, tbh. What I did with the Consequently I removed myself from this issue - if maybe someone drops by who wants to work on this, please go ahead, but check back here first, so I can give some guidance (eg. start with |
Hi @iSnow & others - we've updated the bounty amount based on your feedback! Again, thanks for the comments :-) |
Sounds great, @devmindyg. Please use Jackson as a replacement JSON library as I am already relying on it in a small way. Please start with the Tableschema-java project (https://github.com/frictionlessdata/tableschema-java) as this is the more mature project and the Datapackage-java project depends on it. Please check back in with me before tackling the Datapackage-java project, as this needs more work and we need to plan and align our work packages. To clarify: this is not about just dropping out the json.org library and replacing it with Jackson code, but about improving the library to some extend. The projects were built around taking a JSON object, storing it, taking properties from it, and writing it back out. This leads to a lot of ugly code and switch/case statements as it disregards Java type safety. I changed a lot to make it more idiomatic Java, meaning object inheritance and generic types. Please continue in this spirit. You might want to start with moving the tests to Jackson so you know you have working tests and then switch the library code to Jackson. Jackson features the ObjectMapper that allows you to throw in a JSON string and get back a Java object, not just a Map<String, Object>. It probably would be best if you (and I) worked on feature branches and committed to trunk once a work package is done. |
Jackson would indeed be a sensible choice to migrate to. It has some support for JSON schema, so that could potentially replace the current "Everit" implementation: https://github.com/FasterXML/jackson-module-jsonSchema |
To add to the comment by @wetneb the Everit JSON Schema parser library is specific to json.org JSON objects, it cannot be used with Jackson. So the schema validation logic needs to be refactored to use a different JSON Schema validator. The Jackson module mentioned is mostly for generating a JSON Schema from a POJO (which is something tableschema-java supports), and there's also https://github.com/networknt/json-schema-validator for JSON Schema validation. |
@iSnow @wetneb points taken! |
Forking/PR is fine, @devmindyg, but please keep in mind you absolutely should start with tableschema-java as it is the lower-level library, it has decent test coverage, and is in a relatively stable state. datapackage-java (the one you cloned) must come later, it is currently in no state for changing the JSON lib. The bounty isn't really clear about this, but if you only switch the JSON library for datapackage-java, it will still have a transitive dependency on json.org code via tableschema-java, and OpenRefine will still not be able to use datapackage-java. @lwinfree take note, the scope of the bounty hasn't been clearly defined, if @devmindyg claims the bounty for just refactoring this here library, you do not get what you are looking for. |
Thanks for the clarification @iSnow |
Hey @devmindyg thanks for working on the library. I absolutely like what I see, very good work, and please go ahead and submit a PR. A couple of small things:
So again thanks and you have green light to submit a PR :) |
Hi @iSnow |
Hello again, |
Yes, let's keep this the main communication channel. I'd rather we completely parse the JSON to Java collections/objects and then use type-safe methods to access their properties. Currently, things like I am not 100% sure how to replace the What I'd like to ask you is to either research this and come up with a better idea or to create a branch and to prototypically implement the Schema de-serialization with the help of MixIns or custom Deserializers so that we can see whether that's a route we should take considering effort and code size. The foreign keys aren't the important part, what is essential is the Oh, and please pull in the changes I made in the last two days. |
I haven't had a close look but the Or if the list of subclasses is fixed (and not meant to be extensible by users), then annotations could do it too: |
Thanks, @wetneb, very helpful. Yes, the list of subclasses is fixed and is not user-extensible as the list of subclasses is tied to the schema definition by frictionlessdata. |
@rufuspollock due to COVID-19, I am currently busy |
@iSnow, @rufuspollock do you want me to start on tableschema-java ? |
It might be worth checking with @devmindyg first, to see if they got anywhere near completing this. |
@Amraneze You'll have to wait for @devmindyg to declare whether they are still working on it. |
@devmindyg could you let us know in the next few days if you are still working on this? If not we are planning to see if someone else could take it on. |
@lwinfree could you state in the description of the issue the details of the bounty as the bountysource link is giving an error atm. |
Sounds like an interesting task. |
@jdbranham please go ahead - we have not heard from @devmindyg for a while. /cc @lwinfree @lauragift21 |
Thanks for your interest @jdbranham! @devmindyg thanks for your work on this, but we will need to move ahead here. |
There were no existing conflicts, so I merged |
@iSnow @rufuspollock @lwinfree According to the everit library -
Should I implement schema validation with this library? Is the license acceptable ASL2.0 + LGPL? |
Actually this one is active, Apache 2.0, and lightweight - I'll go ahead with this, but I've abstracted the schema validation, so it will be easier to switch if necessary. |
Sounds great, @jdbranham. It's clear that we have to move away from the everit library, as this is too tightly bound to the org.json API. I don't have a strong opinion on which JSON-schema lib to use, the one you found seems like a very good choice. From what I read, you are on a good path using the Jackson features to read/write JSON, so please carry on :) |
Thanks @iSnow - Workaround for the deserialization process - when the table row is parsed, we can replace the 'speech' quotes the previous library used, with regular double quotes. So it should be backwards compatible. For serialization, it was a little tricker. Basically we change the Json Question about one test - when we infer the schema, one of the tests is expecting the field title to be present, but I don't see it set anywhere. I could make the field title match the name during deserialization, but I wanted to make sure this didn't cause any unwanted side-effects. Let me know what you guys think about this - if it's cool, then I'll fix it, and start on the frictionlessdata/tableschema-java@master...savantly-net:integration |
Actually the title isn't considered in https://travis-ci.org/github/frictionlessdata/tableschema-java/builds/679147569 |
PR opened for datapackage-java PR - |
Hi @iSnow + @jdbranham! Thanks for all the work here! Has frictionlessdata/tableschema-java#55 closed this issue? Or are there other parts remaining? |
hi @lwinfree - not fully, I need to review the PR for datapackage-java, which I will do in the coming days, then it should be good to close the issue and pay the bounty. |
Do I need to do anything else? =) |
Hi @jdbranham thank you for all your work and for pinging me. I'm going to close this so we can pay you the bounty. Cheers! |
hey @jdbranham can you claim the bounty on bounty source? Let me know if you have problems! |
Using org.json as a JSON library is problematic, as the licence of this library includes an additional clause "The Software shall be used for Good, not Evil."
This is regarded as a non-free licence (it is non OSI-compliant).
That makes it impossible for projects relying on this library (such as OpenRefine) to be OSI-compliant in turn.
The text was updated successfully, but these errors were encountered: