Add top_hits aggregation #6124

martijnvg · 2014-05-12T10:44:11Z

The top_hits aggregator keeps track of the most relevant documents being aggregated. It should be used as a sub-aggregator of a bucket-based aggregator, so that the top documents per bucket are computed.

This aggregator makes grouping / field collapsing possible, and it is very versatile: you can group by a field (using a terms aggregator as parent) or by time (using a histogram aggregator as parent). In every case the parent bucket aggregator determines how to group. How accurate the top hits are depends on the parent aggregator. For example, when top_hits is used under a terms aggregator, some documents may not end up in the response because the shard_size on the terms aggregator is less than the field's cardinality.
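
The accuracy caveat can be illustrated with a toy simulation in Python. This is only a sketch, not how Elasticsearch implements the terms aggregator: the two in-memory "shards", the shard_size values, and the helpers `top_terms_per_shard` and `merge_shard_results` are all made up for illustration.

```python
from collections import Counter


def top_terms_per_shard(shard_docs, shard_size):
    """Return the shard_size most frequent tags on one shard, with their counts."""
    counts = Counter(doc["tag"] for doc in shard_docs)
    return dict(counts.most_common(shard_size))


def merge_shard_results(per_shard):
    """Sum the counts that each shard reported for a tag."""
    merged = Counter()
    for result in per_shard:
        merged.update(result)
    return merged


# Two hypothetical shards; the field has a cardinality of 2 (linux, osx).
shards = [
    [{"id": 1, "tag": "linux"}, {"id": 2, "tag": "linux"}, {"id": 3, "tag": "osx"}],
    [{"id": 4, "tag": "osx"}, {"id": 5, "tag": "osx"}, {"id": 6, "tag": "linux"}],
]

# shard_size >= cardinality: every document is counted in its bucket.
exact = merge_shard_results([top_terms_per_shard(s, 10) for s in shards])

# shard_size < cardinality: each shard drops its less frequent tag, so
# documents 3 and 6 never reach the merged buckets.
truncated = merge_shard_results([top_terms_per_shard(s, 1) for s in shards])

print(exact, truncated)
```

With the smaller shard_size each bucket loses one document, so a top_hits sub-aggregator under it could never return that document.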

The top_hits aggregator should have the following options:

size - The number of hits to collect.
sort - Defines how the top hits should be sorted.
Any other fetch phase option, such as source filtering and highlighting.

The prototype currently attached to this PR integrates nicely with the fetch phase, which allows all fetch-like features to be implemented easily. It also executes as if search_type were set to query_and_fetch, so aggregations don't need extra round trips.

Example usage of the current prototype:

GET /stack/question/_search?search_type=count
{
  "aggs": {
    "terms": {
      "terms": {
        "field": "tags",
        "size": 10
      },
      "aggs": {
        "top_tag_hits": {
          "top_hits": {
            "_source": {
              "include": [
                "title"
              ]
            },
            "sort": [
              {
                "last_activity_date": {
                  "order": "desc"
                }
              }
            ],
            "size" : 3
          }
        }
      }
    }
  }
}

In this example the hits in each bucket are sorted by the last_activity_date field and only the top 3 hits are returned. Only the title field is included for each hit.
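
Since the parent bucket aggregator decides the grouping, the same top_hits body can be placed under different parents. The Python sketch below builds the request body as a plain dict. The helper `top_hits_per_bucket` and the aggregation names `groups` and `top_group_hits` are hypothetical, and the date_histogram variant assumes the 1.x `interval` parameter.

```python
import json


def top_hits_per_bucket(parent_agg, sort_field, size, source_fields):
    """Build an "aggs" body with a top_hits sub-aggregation under one parent bucket aggregation."""
    return {
        "aggs": {
            "groups": {
                **parent_agg,  # e.g. {"terms": {...}} or {"date_histogram": {...}}
                "aggs": {
                    "top_group_hits": {
                        "top_hits": {
                            "_source": {"include": list(source_fields)},
                            "sort": [{sort_field: {"order": "desc"}}],
                            "size": size,
                        }
                    }
                },
            }
        }
    }


# Group by field value, as in the example request above.
by_tag = top_hits_per_bucket(
    {"terms": {"field": "tags", "size": 10}}, "last_activity_date", 3, ["title"]
)

# Group by time instead; only the parent aggregation changes.
by_month = top_hits_per_bucket(
    {"date_histogram": {"field": "last_activity_date", "interval": "month"}},
    "last_activity_date",
    3,
    ["title"],
)

print(json.dumps(by_month, indent=2))
```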

Response:

{
   "took": 151,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 175275,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "terms": {
         "buckets": [
            {
               "key": "windows-7",
               "doc_count": 25365,
               "top_tag_hits": {
                  "hits": {
                     "total": 25365,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602679",
                           "_score": 1,
                           "_source": {
                              "title": "Windows port opening"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602570",
                           "_score": 1,
                           "_source": {
                              "title": "Counter Strike Screen Resolution"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602249",
                           "_score": 1,
                           "_source": {
                              "title": "Hardware error while burning DVD+"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "linux",
               "doc_count": 18342,
               "top_tag_hits": {
                  "hits": {
                     "total": 18342,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602672",
                           "_score": 1,
                           "_source": {
                              "title": "Ubuntu RFID Screensaver lock-unlock"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "543625",
                           "_score": 1,
                           "_source": {
                              "title": "Linux Mint doesn't boot after creating a swap partition"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602434",
                           "_score": 1,
                           "_source": {
                              "title": "Is desktop pc support ssd and sata hard disk in one machine?"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows",
               "doc_count": 18119,
               "top_tag_hits": {
                  "hits": {
                     "total": 18119,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602678",
                           "_score": 1,
                           "_source": {
                              "title": "If I change my computers date / time, what could be affected?"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "472446",
                           "_score": 1,
                           "_source": {
                              "title": "Remove the Browser ballot app from Windows 8"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "321988",
                           "_score": 1,
                           "_source": {
                              "title": "How do I determine if my Windows is 32-bit or 64-bit using a command?"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "osx",
               "doc_count": 10971,
               "top_tag_hits": {
                  "hits": {
                     "total": 10971,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602680",
                           "_score": 1,
                           "_source": {
                              "title": "How to Install Google Chrome from the command line"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602482",
                           "_score": 1,
                           "_source": {
                              "title": "All Mac OS X apps crash as opened"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "517263",
                           "_score": 1,
                           "_source": {
                              "title": "Create a shortcut for application on Google Chrome for MacOSX"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "ubuntu",
               "doc_count": 8743,
               "top_tag_hits": {
                  "hits": {
                     "total": 8743,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602665",
                           "_score": 1,
                           "_source": {
                              "title": "Add more partitions to Grub2 - Ubuntu"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "190759",
                           "_score": 1,
                           "_source": {
                              "title": "Ubuntu 10.04 Keyboard and Mouse Freezing Problem"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "597297",
                           "_score": 1,
                           "_source": {
                              "title": "Curly braces with LCtrl+LShift+LAlt+è - how?"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows-xp",
               "doc_count": 7517,
               "top_tag_hits": {
                  "hits": {
                     "total": 7517,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602679",
                           "_score": 1,
                           "_source": {
                              "title": "Windows port opening"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "510161",
                           "_score": 1,
                           "_source": {
                              "title": "Windows 8 Hyper-V fails to boot Windows XP ISO"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "180565",
                           "_score": 1,
                           "_source": {
                              "title": "Logitech Optical Mouse Frozen In Middle of Windows XP Pro Screen"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "networking",
               "doc_count": 6739,
               "top_tag_hits": {
                  "hits": {
                     "total": 6739,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602679",
                           "_score": 1,
                           "_source": {
                              "title": "Windows port opening"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602645",
                           "_score": 1,
                           "_source": {
                              "title": "are there any requirements for the sequence number on CP RST packets?"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602449",
                           "_score": 1,
                           "_source": {
                              "title": "Vmware Dev Server not allowing HTTP traffic"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "mac",
               "doc_count": 5590,
               "top_tag_hits": {
                  "hits": {
                     "total": 5590,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602482",
                           "_score": 1,
                           "_source": {
                              "title": "All Mac OS X apps crash as opened"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602553",
                           "_score": 1,
                           "_source": {
                              "title": "How can I load VLC instead of iTunes on my Mac when I press the player buttons on my Mac keyboard?"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602460",
                           "_score": 1,
                           "_source": {
                              "title": "startup mac mini using boot usb with MASTER BOOT RECORD scheme but failed"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "wireless-networking",
               "doc_count": 4409,
               "top_tag_hits": {
                  "hits": {
                     "total": 4409,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "265142",
                           "_score": 1,
                           "_source": {
                              "title": "Connect to Wi-Fi access point with specific MAC address"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602586",
                           "_score": 1,
                           "_source": {
                              "title": "How to adjust Tx Power for Macbook Air mid-2012 Wi-Fi card"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "435290",
                           "_score": 1,
                           "_source": {
                              "title": "Use wifi and ethernet simultaneously?"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows-8",
               "doc_count": 3601,
               "top_tag_hits": {
                  "hits": {
                     "total": 3601,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "510161",
                           "_score": 1,
                           "_source": {
                              "title": "Windows 8 Hyper-V fails to boot Windows XP ISO"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "591388",
                           "_score": 1,
                           "_source": {
                              "title": "Android USB Driver on Windows 8"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "553339",
                           "_score": 1,
                           "_source": {
                              "title": "Can I have 2 PDF documents open in Windows 8?"
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}
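
Reading the per-bucket hits out of a response shaped like the one above can be sketched in Python on the parsed JSON. This does not use any Elasticsearch client API; `titles_by_bucket` is a hypothetical helper, and `sample` is a trimmed copy of the "osx" bucket from the response.

```python
def titles_by_bucket(response, bucket_agg="terms", hits_agg="top_tag_hits"):
    """Map each bucket key to the titles of its top hits, in response order."""
    buckets = response["aggregations"][bucket_agg]["buckets"]
    return {
        bucket["key"]: [
            hit["_source"]["title"] for hit in bucket[hits_agg]["hits"]["hits"]
        ]
        for bucket in buckets
    }


# Trimmed copy of the "osx" bucket; all other buckets and fields are omitted.
sample = {
    "aggregations": {
        "terms": {
            "buckets": [
                {
                    "key": "osx",
                    "doc_count": 10971,
                    "top_tag_hits": {
                        "hits": {
                            "total": 10971,
                            "hits": [
                                {"_id": "602680", "_source": {"title": "How to Install Google Chrome from the command line"}},
                                {"_id": "602482", "_source": {"title": "All Mac OS X apps crash as opened"}},
                                {"_id": "517263", "_source": {"title": "Create a shortcut for application on Google Chrome for MacOSX"}},
                            ],
                        }
                    },
                }
            ]
        }
    }
}

titles = titles_by_bucket(sample)
for key, bucket_titles in titles.items():
    print(key, bucket_titles)
```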

jpountz · 2014-05-13T10:28:48Z

This looks good to me. Let's add some documentation and tests and I think that will be it.

yao23 · 2014-05-13T16:41:17Z

It's great to have this aggregation feature with top_hits, but how can I use it? Should I wait for elasticsearch:master to accept the merge request and then update, or clone your branch, compile it, and import it in Maven? Thanks in advance for your response.

yao23 · 2014-05-13T21:13:37Z

Besides parsing the JSON API response, are there any Java or Scala APIs to retrieve or iterate over bucket aggregation results? For example, under "key": "osx", how do I get "title": "How to Install Google Chrome from the command line", "title": "All Mac OS X apps crash as opened" and "title": "Create a shortcut for application on Google Chrome for MacOSX"? Thanks!

kimchy · 2014-05-14T00:05:23Z

@yao23 the feature, when it gets in, will be in the 1.3 release (we still have a 1.2 release that will hopefully happen soonish). I would not build this now; wait until it gets into the master + 1.x branches, and then, if you are eager to try it out, you can build the 1.x branch once it's in.

Regarding the API, there is a full Java client API as part of Elasticsearch; how to access aggregations using it is best asked on the mailing list.

yao23 · 2014-05-14T03:48:37Z

@kimchy I appreciate your quick response. I will try to build it, use the Java APIs to access the aggregation buckets' content, and post the results here after experimenting. For reference, a link about the Java APIs: http://stackoverflow.com/questions/21018493/how-to-access-aggregations-result-with-elasticsearch-java-api-in-searchresponse

martijnvg · 2014-05-22T20:43:00Z

@jpountz I added tests and documentation.

jpountz · 2014-05-22T21:10:00Z

docs/reference/search/aggregations/bucket/tophits-aggregations.asciidoc

+* {ref}/search-request-source-filtering.html[Source filtering]
+* {ref}/search-request-script-fields.html[Script fields]
+* {ref}/search-request-fielddata-fields.html[Fielddata fields]
+* {ref}/search-request-version.html[Include versions]


jpountz · 2014-05-22T22:15:29Z

@martijnvg this looks great. I left some minor comments about the documentation but other than that I'm good with pushing this change!

s1monw · 2014-05-23T07:21:12Z

src/test/java/org/elasticsearch/search/aggregations/bucket/TopHitsTests.java

+/**
+ *
+ */
+@ElasticsearchIntegrationTest.SuiteScopeTest()


Why is this suite scoped? I don't see where this test modifies the cluster, nor does it need any specific node-level settings.

All agg tests are suite scoped, so that is why I made this test suite scoped as well.

martijnvg · 2014-05-23T12:07:45Z

@s1monw @jpountz I updated the PR: added more tests and updated the docs and disallow sub aggs in the top_hits agg.

s1monw · 2014-05-23T12:09:33Z

thanks @martijnvg LGTM

jpountz · 2014-05-23T12:40:02Z

LGTM

vvaradhan · 2014-06-26T18:24:52Z

Is there a master-snapshot version available through maven? I can start on my development till 1.3.0 gets officially released.

Also, what would be a likely release date of 1.3.0?

clintongormley · 2014-07-01T13:35:57Z

> Is there a master-snapshot version available through maven? I can start on my development till 1.3.0 gets officially released.

You can always compile from source. See https://github.com/elasticsearch/elasticsearch/blob/master/README.textile

> Also, what would be a likely release date of 1.3.0?

Shortly before 1.4.0 ;)

It'll be released when it is ready.

murugan-sundararaj · 2014-07-24T06:38:19Z

Thanks for the top_hits feature in 1.3.
Can I use top_hits to get ALL the documents in a terms bucket? I understand that the top_hits "size" accepts only numbers greater than zero, so how do I get all the documents? Thanks.

jpountz · 2014-07-24T06:41:15Z

You should not try to get all documents; this would blow up CPU and memory on your cluster.

murugan-sundararaj · 2014-07-24T07:20:40Z

My use case is this. I store product data like:

{
  "_id": "product_id",
  "group": "A",
  "page_views": 1000,
  "field_x": "123abc",
  "field_y": "1010zzz"
}

I want to run a terms aggregation on "group" and get ALL the products in each bucket, in descending order of their "page_views". I would use this result for further calculations.

Query:

GET /my_idx/my_type/_search
{
  "size": 0,
  "aggs": {
    "product_group": {
      "terms": {
        "field": "group",
        "size": 0
      },
      "aggs": {
        "top_products": {
          "top_hits": {
            "sort": [
              {
                "page_views": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "include": [
                "page_views", "field_x", "field_y"
              ]
            },
            "size": 1000000 // this size is not known
          }
        }
      }
    }
  }
}

Please let me know if there is an alternate way to accomplish this.

jpountz · 2014-07-24T07:54:12Z

The only reasonable way to do it would be to first send a request to compute the top groups, and then send one request per group (with a filter), using scroll for pagination.
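
A rough Python sketch of that two-step approach follows. It only builds request bodies and walks pages supplied by the caller; it makes no HTTP calls. The helper names, the page size, and the fake pages are hypothetical. The per-group body assumes the 1.x `filtered` query and that the request is sent with a scroll parameter such as `?scroll=1m`.

```python
def top_groups_request(field="group"):
    """Step 1: a terms aggregation that only lists the groups (no hits)."""
    return {
        "size": 0,
        "aggs": {"product_group": {"terms": {"field": field, "size": 0}}},
    }


def group_page_request(group, page_size=500):
    """Step 2: one filtered, sorted request per group, paged with scroll."""
    return {
        "size": page_size,
        "query": {"filtered": {"filter": {"term": {"group": group}}}},
        "sort": [{"page_views": {"order": "desc"}}],
        "_source": {"include": ["page_views", "field_x", "field_y"]},
    }


def drain_scroll(first_page, next_page):
    """Yield hits from first_page, then call next_page(scroll_id) until a page is empty."""
    page = first_page
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = next_page(page["_scroll_id"])


# Fake scroll pages standing in for real responses, keyed by scroll id.
pages = {
    "s1": {"_scroll_id": "s2", "hits": {"hits": [{"_id": "p3"}]}},
    "s2": {"_scroll_id": "s3", "hits": {"hits": []}},
}
first = {"_scroll_id": "s1", "hits": {"hits": [{"_id": "p1"}, {"_id": "p2"}]}}

ids = [hit["_id"] for hit in drain_scroll(first, pages.__getitem__)]
print(ids)
```

Memory use per request stays bounded by the page size, rather than by the size of the largest group.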

murugan-sundararaj · 2014-07-24T08:17:21Z

@jpountz thank you. Will try your approach.

martijnvg added feature labels May 12, 2014

martijnvg mentioned this pull request May 12, 2014

Terms facet results retrieve and pick certain number products for top users #6109

Closed

This was referenced May 12, 2014

Grouping prototype implementation #2326

Closed

Field Collapsing/Combining #256

Closed

martijnvg added 4 commits May 22, 2014 22:42

First prototype of the top_hits agg

2e64b35

Added builder, first test and more options to parsers

eab8b17

Added more tests

19a2a9d

Added docs.

7f16277

jpountz reviewed May 22, 2014

s1monw reviewed May 23, 2014

Added more tests

685cea4

Fail if sub aggs are specified Updated docs

martijnvg added a commit that referenced this pull request May 23, 2014

Added top_hits aggregation that keeps track of the most relevant do…

90458ca

…cument being aggregated per bucket. Closes #6124

martijnvg closed this in 5fafd24 May 23, 2014

clintongormley mentioned this pull request May 23, 2014

Add a variety constraint to querying and hit collection #500

Closed

jpountz added the highlight label Jun 19, 2014

jpountz changed the title ~~Add top_hits aggregation~~ Aggregations: Add top_hits aggregation Jun 19, 2014

clintongormley mentioned this pull request Jul 2, 2014

Feature Request: The ability to "join" parent and children #761

Closed

gmarz mentioned this pull request Jul 22, 2014

Top hits aggregation support elastic/elasticsearch-net#820

Closed

martijnvg deleted the feature/top_hits_aggs branch May 18, 2015 23:31

clintongormley added the :Analytics/Aggregations Aggregations label Jun 6, 2015

clintongormley changed the title ~~Aggregations: Add top_hits aggregation~~ Add top_hits aggregation Jun 6, 2015

clintongormley added :Top Hits and removed :Analytics/Aggregations Aggregations labels Jun 7, 2015

colings86 added :Analytics/Aggregations Aggregations and removed :Analytics/Aggregations Aggregations labels Mar 31, 2017
