Remove standard deviation from stats aggregation #1788

guilload · 2023-01-13T23:52:07Z

(this is actually not part of the elastic api)

codecov-commenter · 2023-01-14T00:10:03Z

Codecov Report

Merging #1788 (0caaf13) into main (a59bd96) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1788      +/-   ##
==========================================
- Coverage   94.14%   94.13%   -0.01%     
==========================================
  Files         275      275              
  Lines       51916    51895      -21     
==========================================
- Hits        48875    48851      -24     
- Misses       3041     3044       +3

Impacted Files	Coverage Δ
src/aggregation/agg_req.rs	`94.03% <ø> (ø)`
src/aggregation/bucket/histogram/histogram.rs	`99.56% <ø> (-0.01%)`	⬇️
src/aggregation/metric/stats.rs	`96.99% <100.00%> (-0.24%)`	⬇️
src/fastfield/multivalued/mod.rs	`98.45% <0.00%> (-0.78%)`	⬇️
src/schema/schema.rs	`98.78% <0.00%> (-0.14%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

adamreichold · 2023-01-14T07:33:46Z

Why do we have to remove this just because ES does not have it? That does not seem to justify anything. Can you explain why it is necessary that Tantivy's API strictly matches that of ES instead of just being compatible? Callers of this API who are not interested in the field can just ignore it after all.

guilload · 2023-01-14T16:22:03Z

I see where you're coming from. This decision seems arbitrary, but I'll give you more context. Quickwit is built on top of tantivy. Quickwit aims to be fully ES-compatible for a few endpoints. We want to use tools that automatically check whether our ES API implementation conforms to the spec. This field is not part of the spec and will break our tests. Yes, we could deal with that in Quickwit, but it's much simpler to do so in tantivy.

Today, Quickwit, Inc., a for-profit company, drives the tantivy project and invests significant resources ($$$) into it. Consequently, the project has evolved positively in the last two years. We are mindful of developing tantivy in a way that allows Quickwit to achieve its goals while ensuring it remains a coherent general-purpose search engine library. In some instances, our design and implementation decisions will be more favorable to the Quickwit users than the pure tantivy users. That's the reality of commercial open-source software. Nevertheless, it's always a tradeoff that we evaluate carefully. In this case, we determined it was perfectly reasonable to move one field from the stats aggregation to the extended_stats one.

adamreichold · 2023-01-14T16:53:59Z

Nevertheless, it's always a tradeoff that we evaluate carefully.

I think for an open-source project, it would nevertheless make sense if this evaluation was public, i.e. giving that context as part of issues and/or pull requests instead of just pushing the resulting decision. (As it is, the related issue here for example adds nothing on top of the PR.)

In this case, we determined it was perfectly reasonable to move one field from the stats aggregation to the extended_stats one.

This seems reasonable to me, but it isn't what this PR proposes. It strictly removes functionality from Tantivy instead of just moving it to another part of the API. (AFAIK, there is no ExtendedStats API at the moment?)

Today, Quickwit, Inc., a for-profit company, drives the tantivy project [..]

Finally, w.r.t. the rest of the given context, I have to admit that the comment feels somewhat aggressive to me personally. I have never questioned which organization drives Tantivy or argued about who invests how many resources.

I just questioned a change that I cannot understand using the publicly available information. I would probably not have done so, if the rationale of automatically ensuring field-for-field compatibility would have been discussed up front.

fulmicoton · 2023-01-17T03:32:18Z

@adamreichold The spec of this collector is to mimick what elasticsearch does. A deviation is a bug that needs to be solved. I agree this spec can be discussed. We considered for a long time leaving this collector in quickwit under the AGPL license, and thought it might be useful for other users.

Opensource comes in different forms and a license is not telling much about stewardship, and I understand that this might be frustrating. I'm afraid neither tantivy nor quickwit are very democratic. There are at most consultative democracy.

On tantivy, I will usually have the last word, and will always prioritize the interest of, in that order:

quickwit
actual official tantivy users weighted by the importance of their usage.
contributors
non-contributors people

I am open to shift this priority when larger users show up publicy, or if companies want to contribute financially to the development of tantivy.

adamreichold · 2023-01-17T06:11:16Z

We most likely will not qualify as "large" in terms of data set size, but we do use Tantivy at an official project at the Federal Environment Agency in Germany. (Under AGPL which is planned to stay that way even when commercial contracts are awarded.) But we are still in the prototype phase without a commitment to the technical approach.

Whether that commitment happens will not depend on the presence of the standard deviation in the Stats aggregator. But it will depend on our work group making the conscious choice to rely on Tantivy for the multi-year project duration (which is admittedly the same for all non-trivial dependencies). And that depends on whether we are able to build up sufficient trust in the stewardship of the project during the prototype phase.

I do understand that in FOSS-without-a-contract, there are no guarantees beyond go-fork-yourself and that the offer of keeping Tantivy open source while building a commercial product on top of it is generous. So I can take that or leave it if I think I have a better alternative (which I do not, otherwise I would not have made the choice and tried to start contributing). But the message that got delivered here does not help me with the taking part.

guilload linked an issue Jan 13, 2023 that may be closed by this pull request

Remove standard_deviation field from Stats #1786

Closed

guilload requested a review from PSeitz January 13, 2023 23:52

fulmicoton approved these changes Jan 17, 2023

View reviewed changes

Remove standard deviation from stats aggregation

0caaf13

guilload force-pushed the guilload/remove-std-dev-from-stats-agg branch from 99c4876 to 0caaf13 Compare January 17, 2023 03:58

guilload merged commit c9cb3d0 into main Jan 17, 2023

guilload deleted the guilload/remove-std-dev-from-stats-agg branch January 17, 2023 04:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove standard deviation from stats aggregation #1788

Remove standard deviation from stats aggregation #1788

guilload commented Jan 13, 2023 •

edited by fulmicoton

Loading

codecov-commenter commented Jan 14, 2023 •

edited

Loading

adamreichold commented Jan 14, 2023

guilload commented Jan 14, 2023

adamreichold commented Jan 14, 2023

fulmicoton commented Jan 17, 2023

adamreichold commented Jan 17, 2023

Remove standard deviation from stats aggregation #1788

Remove standard deviation from stats aggregation #1788

Conversation

guilload commented Jan 13, 2023 • edited by fulmicoton Loading

codecov-commenter commented Jan 14, 2023 • edited Loading

Codecov Report

adamreichold commented Jan 14, 2023

guilload commented Jan 14, 2023

adamreichold commented Jan 14, 2023

fulmicoton commented Jan 17, 2023

adamreichold commented Jan 17, 2023

guilload commented Jan 13, 2023 •

edited by fulmicoton

Loading

codecov-commenter commented Jan 14, 2023 •

edited

Loading