Consider moving JSON schemas to own repository? #201

Closed
flaw opened this issue Aug 29, 2018 · 11 comments

Comments

@flaw
Contributor

flaw commented Aug 29, 2018

Description

This is more of a question than anything, but has it been considered whether we could move the JSON schemas from this repo to something like eiffel-community/eiffel-json-schemas?

Motivation

Given the prevalence of jsonschema-to-X tools and the convenience of using them when doing Eiffel implementations, I'd like to be able to add a repo as a submodule to pull in the schemas, which I can then convert to whatever I need during the build. Adding this whole repo as a submodule is less than ideal, since I don't need 95% of it in order to generate the code I need.

Exemplification & Benefits

In our case we're looking into implementing two to three Java-based plugins. It makes sense that all of them should use the same versions of the JSON schemas, and that all of them should be updatable to the same version in a consistent manner.
As things stand, the way to do that is to check out this repository, use something like the CLI mode of https://github.com/joelittlejohn/jsonschema2pojo to generate the Java classes, package and publish them to some artifact repository, then add the resulting artifact as a dependency to all the projects. This gets more complicated if (when) we need to generate something other than POJOs while still making sure all our implementations use the same versions of the schemas.
In an ideal world, I'd just add the schemas repo as a submodule to all of them, checkout the same version, and have them generate the necessary files during build.
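A minimal sketch of how plugins built from the same submodule checkout could confirm that they really use the same schema set. It assumes the schemas live as `*.json` files under one directory; the fingerprint approach and `schema_fingerprint` are illustrative assumptions, not existing Eiffel tooling.

```python
import hashlib
from pathlib import Path


def schema_fingerprint(schema_root: str) -> str:
    """Return a SHA-256 hex digest covering every *.json file under schema_root.

    Files are processed in sorted relative-path order, and each entry hashes the
    relative path plus the file contents (both length-prefixed), so adding,
    removing, renaming or editing a schema all change the result.
    """
    root = Path(schema_root)
    digest = hashlib.sha256()
    files = sorted(root.rglob("*.json"), key=lambda p: p.relative_to(root).as_posix())
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        for chunk in (rel, path.read_bytes()):
            digest.update(len(chunk).to_bytes(8, "big"))
            digest.update(chunk)
    return digest.hexdigest()
```

Each plugin's build could, for example, compare this digest against a value recorded next to the submodule pin and fail on a mismatch; that check is likewise hypothetical.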

Possible Drawbacks

Protocol updates would require two PRs instead of one: one for the documentation side of things in this repo and another for the actual schemas.

@d-stahl-ericsson
Contributor

d-stahl-ericsson commented Aug 29, 2018

Glad to hear that you're considering writing plugins! Anything we might look forward to as contributions further down the road? :)

Once upon a time, schemas, examples and documentation were actually split into individual repositories and then merged (#26). This was because, as you mention, such a split is not without drawbacks. The main problem is not the overhead of multiple PRs, but keeping the repositories in sync. It's important that relationships between documentation versions and schema versions are absolutely unambiguous, and a single repo solution helps with that.

That being said, since then we have changed schema versioning. All historical schema versions exist in parallel in the repo, with explicit mapping to event type versions. This introduces some independence, making a split repo solution easier to manage.
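A minimal sketch of what parallel historical versions can look like from a consumer's point of view, assuming a hypothetical layout of `schemas/<EventType>/<version>.json` with dotted numeric versions such as `1.0.0` (the layout and helper names are assumptions). Parsing versions into integer tuples keeps `1.10.0` ordered after `1.9.0`, which a plain string sort would get wrong.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version like '1.10.0' into (1, 10, 0) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def index_schemas(schema_root: str) -> Dict[str, Dict[str, Path]]:
    """Map each event type directory to {version string: schema file path}."""
    index: Dict[str, Dict[str, Path]] = {}
    for path in Path(schema_root).glob("*/*.json"):
        event_type = path.parent.name
        version = path.stem  # '1.0.0.json' -> '1.0.0'
        index.setdefault(event_type, {})[version] = path
    return index


def versions_of(index: Dict[str, Dict[str, Path]], event_type: str) -> List[str]:
    """List every known version of one event type, oldest first."""
    return sorted(index.get(event_type, {}), key=parse_version)
```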

Then again, I'm not sure I quite understand why the current setup is a problem. If you add the repo as a git submodule, yes, you get some additional baggage, but the schemas would still be available under a stable path, just as if they were in their own repo?

Edit: The current schema versioning was a result of Issue #76. Thanks @e-backmark-ericsson for the detective work! :)

@flaw
Contributor Author

flaw commented Aug 29, 2018

Basically the problem is that this repo is already over 35MB, of which the schemas and their history probably account for a few hundred kilobytes at most. I would prefer not to have to pull all of that in for every implementation where I need the schemas.

It's not a huge deal, I've just had bad experiences with submodules bloating and pulling in far more than they ever should, ending up with atrocious hacks involving post-merge hooks and shallow clones.

Of course, since you have valid reasons for keeping them in the same repo, and there are benefits to having the documentation available alongside the schemas even on the consuming side, I won't complain (too much) if you decide to keep things as they are. :)

As for contributions, the answer is most probably yes. There are some policy-related clarifications required there, as usual, but the plan is to write the plugins from the get-go in a manner that lets them be open sourced with minimal effort.
Speaking of open sourcing, do you happen to have any guesstimate of when the persistence solution you guys use at Ericsson might be available? That's the one area where I'd rather not roll our own if I can help it. In the short term I plan on just cramming all events into some database as-is and calling it a day, but that's not exactly what I'd call a robust long-term solution.

@d-stahl-ericsson
Contributor

Thanks for elaborating. I can see the pros and cons of both approaches, and where I'm standing right now I don't see them weighing heavily in either direction. I like to pride myself on being open to persuasion, though :)

About the persistence solution... Well, hmm. The best information I had previously said 2018Q2, but that's past now. At this time I'd prefer not to make any more guesses. It's not something I'm happy about (and I'm somewhat embarrassed by it), but there it is. Meanwhile, there's been some discussion of a persistence solution in our Google Group: https://groups.google.com/forum/#!topic/eiffel-community/8PuUt4fWjLY. I'm not sure what the status of the work done by @azeem59 is right now, but perhaps that can be used?

@flaw
Contributor Author

flaw commented Aug 29, 2018

I see, that can't be helped then. I just see the event persistence as the one thing that's most likely to have similar requirements in most Eiffel deployments, so having one proper implementation would make sense there.

I've looked at what @azeem59 has done, and it's basically what I referred to as the short-term solution. The persistence side of it stores all the events from RMQ in MongoDB, and the rest is a Meteor-based replacement for Vici. My problem there is that I'm not at all convinced MongoDB can actually handle the volume of events produced by a full-blown Eiffel setup over any longer period of time. Then again, that's just my gut feeling.

There's also https://github.com/eiffel-community/eiffel-persistence-technology-evaluation . Something semi-related seems to be happening there, but it'll take me a while to figure out exactly what.

@d-stahl-ericsson
Contributor

d-stahl-ericsson commented Aug 29, 2018

That is a thesis student we have looking at various DBMS alternatives (graph, non-graph and hybrid) to get a sense of how they might scale for Eiffel data, both in terms of data volume and number of users.

The interesting thing about Eiffel data is that queries aren't purely graph traversal, nor purely object lookup, but a little bit of both. Which suggests that a hybrid DBMS should be a good fit.
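A toy in-memory sketch of that mix, assuming events carry their identity in `meta.id` and reference other events through a `links` array of `{"type": ..., "target": ...}` objects: fetching one event is a key lookup, while finding everything upstream of it is a breadth-first graph walk. The `EventStore` class is hypothetical; a real persistence solution would back both operations with a DBMS.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class EventStore:
    """Keeps events keyed by meta.id; links are followed on demand."""

    def __init__(self) -> None:
        self._by_id: Dict[str, dict] = {}

    def add(self, event: dict) -> None:
        self._by_id[event["meta"]["id"]] = event

    def get(self, event_id: str) -> Optional[dict]:
        # Object lookup: a single key access.
        return self._by_id.get(event_id)

    def upstream(self, event_id: str, link_types: Optional[Set[str]] = None) -> List[str]:
        """Breadth-first walk over link targets, returning reachable event ids.

        If link_types is given, only links of those types are followed.
        """
        seen = {event_id}
        order: List[str] = []
        queue = deque([event_id])
        while queue:
            event = self.get(queue.popleft())
            if event is None:
                continue  # target not stored (yet); nothing further to follow
            for link in event.get("links", []):
                if link_types is not None and link["type"] not in link_types:
                    continue
                target = link["target"]
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order
```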

There's nothing in the repo, but once finished I'll make sure results get published. If not in that repo, then in the Google Group.

@flaw
Contributor Author

flaw commented Aug 29, 2018

That does make sense and sounds promising. There actually seem to be some results in that repo already, in the development branch rather than master.
It's just raw data though, so I think I'll keep an eye out for your conclusions instead of analyzing it myself.
For the time being I'm sure we'll be more than fine with a "dumb" MongoDB/DynamoDB-based setup.

@e-backmark-ericsson
Member

Regarding the repo size: I'd guess a single file, the reference data set, is responsible for more than 95% of it. We should probably have left that out of the repo had we considered this 'problem' back then. Now it's kind of too late to change, since the history is there. But I think we should be very restrictive about updating that zip file. Maybe we should actually move that file to Sepia? Wouldn't a reference data set fit quite well together with the reference architecture?

@flaw
Contributor Author

flaw commented Aug 30, 2018

One could also consider utilizing GitHub releases. You already tag editions of Eiffel, correct?
I think it should be relatively trivial to have Travis or some similar service build an archive from the tags that contains the schemas which could then be pulled in using the GitHub APIs on the consuming side.
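A minimal consumer-side sketch, assuming the archive is attached as a release asset. Rather than calling the REST API, it uses the direct download URL form GitHub serves release assets from (`https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>`); the owner, repository, tag and asset name are caller-supplied, and the helper names are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import quote


def release_asset_url(owner: str, repo: str, tag: str, asset: str) -> str:
    """Build the direct download URL GitHub uses for a release asset."""
    return "https://github.com/{}/{}/releases/download/{}/{}".format(
        quote(owner), quote(repo), quote(tag), quote(asset)
    )


def download(url: str, dest: Path) -> Path:
    """Fetch url into dest (no retries or checksum verification in this sketch)."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        out.write(response.read())
    return dest


def extract_json_schemas(archive: Path, dest_dir: Path) -> list:
    """Unpack only *.json members of a .tar.gz, refusing paths that escape dest_dir."""
    dest_dir = Path(dest_dir).resolve()
    extracted = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            target = (dest_dir / member.name).resolve()
            if dest_dir not in target.parents:
                raise ValueError("unsafe path in archive: " + member.name)
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            extracted.append(target.relative_to(dest_dir).as_posix())
    return sorted(extracted)
```

Only `*.json` members are unpacked, and paths that would escape the destination directory are rejected, since the archive arrives over the network.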
It's at least preferable over pulling in the reference data sets and such.

@d-stahl-ericsson
Contributor

Good point! I like that. An attached artifact to the release containing the latest included version of every event type for an edition would be very useful.

Would you be willing to look into that and potentially do a PR?

@flaw
Contributor Author

flaw commented Aug 31, 2018

Yeah, I'll take a look. Can't promise a timeframe but I'll try to make some time.

d-stahl-ericsson pushed a commit that referenced this issue Sep 5, 2018
Addressing issue #201, two tarballs of schemas are generated and attached to each edition (i.e. git tag): event-types-all.tar.gz and event-type-latest.tar.gz. The former contains all schema files, while the latter contains the latest schema per event type present in the tagged repository version. This is implemented using a Travis CI deployment script.
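The commit message describes the mechanism only in outline. A minimal sketch of the same idea in Python, assuming a `schemas/<EventType>/<version>.json` layout and dotted numeric versions (both assumptions, as is `build_schema_tarballs`); the actual Travis CI deployment script may work differently.

```python
import tarfile
from pathlib import Path
from typing import Dict, List, Tuple


def _version_key(path: Path) -> Tuple[int, ...]:
    # '1.10.0.json' -> (1, 10, 0), so 1.10.0 sorts after 1.9.0
    return tuple(int(part) for part in path.stem.split("."))


def build_schema_tarballs(schema_root: str, out_dir: str) -> Dict[str, List[str]]:
    """Write the 'all' and 'latest' tarballs; return each archive's member names."""
    root = Path(schema_root)
    all_files = sorted(root.glob("*/*.json"))

    # Keep only the highest version seen for each event type directory.
    latest: Dict[str, Path] = {}
    for path in all_files:
        current = latest.get(path.parent.name)
        if current is None or _version_key(path) > _version_key(current):
            latest[path.parent.name] = path

    # Archive names as given in the commit message.
    archives = {
        "event-types-all.tar.gz": all_files,
        "event-type-latest.tar.gz": sorted(latest.values()),
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    result: Dict[str, List[str]] = {}
    for name, files in archives.items():
        members = [p.relative_to(root).as_posix() for p in files]
        with tarfile.open(out / name, "w:gz") as tar:
            for path, arcname in zip(files, members):
                tar.add(path, arcname=arcname)
        result[name] = members
    return result
```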
@d-stahl-ericsson
Contributor

Closing this issue. The problem has been addressed by attaching relevant schema versions to GitHub releases (i.e. protocol editions).


3 participants