-
-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove standard deviation from stats aggregation #1788
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1788 +/- ##
==========================================
- Coverage 94.14% 94.13% -0.01%
==========================================
Files 275 275
Lines 51916 51895 -21
==========================================
- Hits 48875 48851 -24
- Misses 3041 3044 +3
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Why do we have to remove this just because ES does not have it? That does not seem to justify anything. Can you explain why it is necessary that Tantivy's API strictly matches that of ES instead of just being compatible? Callers of this API who are not interested in the field can just ignore it after all. |
I see where you're coming from. This decision seems arbitrary, but I'll give you more context. Quickwit is built on top of tantivy. Quickwit aims to be fully ES-compatible for a few endpoints. We want to use tools that automatically check whether our ES API implementation conforms to the spec. This field is not part of the spec and will break our tests. Yes, we could deal with that in Quickwit, but it's much simpler to do so in tantivy. Today, Quickwit, Inc., a for-profit company, drives the tantivy project and invests significant resources ($$$) into it. Consequently, the project has evolved positively in the last two years. We are mindful of developing tantivy in a way that allows Quickwit to achieve its goals while ensuring it remains a coherent general-purpose search engine library. In some instances, our design and implementation decisions will be more favorable to the Quickwit users than the pure tantivy users. That's the reality of commercial open-source software. Nevertheless, it's always a tradeoff that we evaluate carefully. In this case, we determined it was perfectly reasonable to move one field from the |
I think for an open-source project, it would nevertheless make sense if this evaluation was public, i.e. giving that context as part of issues and/or pull requests instead of just pushing the resulting decision. (As it is, the related issue here for example adds nothing on top of the PR.)
This seems reasonable to me, but it isn't what this PR proposes. It strictly removes functionality from Tantivy instead of just moving it to another part of the API. (AFAIK, there is no
Finally, w.r.t. the rest of the given context, I have to admit that the comment feels somewhat aggressive to me personally. I have never questioned which organization drives Tantivy or argued about who invests how many resources. I just questioned a change that I cannot understand using the publicly available information. I would probably not have done so, if the rationale of automatically ensuring field-for-field compatibility would have been discussed up front. |
@adamreichold The spec of this collector is to mimick what elasticsearch does. A deviation is a bug that needs to be solved. I agree this spec can be discussed. We considered for a long time leaving this collector in quickwit under the AGPL license, and thought it might be useful for other users. Opensource comes in different forms and a license is not telling much about stewardship, and I understand that this might be frustrating. I'm afraid neither tantivy nor quickwit are very democratic. There are at most consultative democracy. On tantivy, I will usually have the last word, and will always prioritize the interest of, in that order:
I am open to shift this priority when larger users show up publicy, or if companies want to contribute financially to the development of tantivy. |
99c4876
to
0caaf13
Compare
We most likely will not qualify as "large" in terms of data set size, but we do use Tantivy at an official project at the Federal Environment Agency in Germany. (Under AGPL which is planned to stay that way even when commercial contracts are awarded.) But we are still in the prototype phase without a commitment to the technical approach. Whether that commitment happens will not depend on the presence of the standard deviation in the I do understand that in FOSS-without-a-contract, there are no guarantees beyond go-fork-yourself and that the offer of keeping Tantivy open source while building a commercial product on top of it is generous. So I can take that or leave it if I think I have a better alternative (which I do not, otherwise I would not have made the choice and tried to start contributing). But the message that got delivered here does not help me with the taking part. |
(this is actually not part of the elastic api)