Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add top_hits aggregation #6124

Closed
wants to merge 5 commits into from

Conversation

martijnvg
Copy link
Member

The top_hits aggregator keeps track of the most relevant document being aggregated. This aggregator should be used as a sub aggregator of a bucket based aggregator, so that the top documents per bucket are computed.

Via this aggregator grouping / field collapsing can be achieved and is very versatile. Someone can group by a field (using a terms aggregator as parent) or by time (using a histogram aggregator as parent), in any case the parent bucket aggregator determines how to group. How correct the top hits will depend on the parent aggregator. For example when using the terms aggregator and the top_hits aggregator some document may not end up in the response, because the shard_size on the terms aggregator is less then the field's cardinality.

The top_hits aggregator should have the following options:

  • size - The amount of hits to collect.
  • sort - Defines how the top hits should be sorted.
  • and any other fetch phase options. Like source filtering and highlighting.

The prototype that is attached right now to this PR integrates nicely with the fetch phase, which allows all fetch like features to be implemented easily. Also it executes as if the search_type is set to query_and_fetch, this way aggregations don't need to execute extra round trips.

Example usage of the current prototype:

GET /stack/question/_search?search_type=count
{
  "aggs": {
    "terms": {
      "terms": {
        "field": "tags",
        "size": 10
      },
      "aggs": {
        "top_tag_hits": {
          "top_hits": {
            "_source": {
              "include": [
                "title"
              ]
            },
            "sort": [
              {
                "last_activity_date": {
                  "order": "desc"
                }
              }
            ],
            "size" : 3
          }
        }
      }
    }
  }
}

In this example the hits are sorted by the field last_activity_date and only the top 3 hits are returned. Also per hit only the title field is included.

Response:

{
   "took": 151,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 175275,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "terms": {
         "buckets": [
            {
               "key": "windows-7",
               "doc_count": 25365,
               "top_tag_hits": {
                  "hits": {
                     "total": 25365,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602679",
                           "_score": 1,
                           "_source": {
                              "title": "Windows port opening"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602570",
                           "_score": 1,
                           "_source": {
                              "title": "Counter Strike Screen Resolution"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602249",
                           "_score": 1,
                           "_source": {
                              "title": "Hardware error while burning DVD+"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "linux",
               "doc_count": 18342,
               "top_tag_hits": {
                  "hits": {
                     "total": 18342,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602672",
                           "_score": 1,
                           "_source": {
                              "title": "Ubuntu RFID Screensaver lock-unlock"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "543625",
                           "_score": 1,
                           "_source": {
                              "title": "Linux Mint doesn't boot after creating a swap partition"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602434",
                           "_score": 1,
                           "_source": {
                              "title": "Is desktop pc support ssd and sata hard disk in one machine?"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows",
               "doc_count": 18119,
               "top_tag_hits": {
                  "hits": {
                     "total": 18119,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602678",
                           "_score": 1,
                           "_source": {
                              "title": "If I change my computers date / time, what could be affected?"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "472446",
                           "_score": 1,
                           "_source": {
                              "title": "Remove the Browser ballot app from Windows 8"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "321988",
                           "_score": 1,
                           "_source": {
                              "title": "How do I determine if my Windows is 32-bit or 64-bit using a command?"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "osx",
               "doc_count": 10971,
               "top_tag_hits": {
                  "hits": {
                     "total": 10971,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602680",
                           "_score": 1,
                           "_source": {
                              "title": "How to Install Google Chrome from the command line"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602482",
                           "_score": 1,
                           "_source": {
                              "title": "All Mac OS X apps crash as opened"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "517263",
                           "_score": 1,
                           "_source": {
                              "title": "Create a shortcut for application on Google Chrome for MacOSX"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "ubuntu",
               "doc_count": 8743,
               "top_tag_hits": {
                  "hits": {
                     "total": 8743,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602665",
                           "_score": 1,
                           "_source": {
                              "title": "Add more partitions to Grub2 - Ubuntu"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "190759",
                           "_score": 1,
                           "_source": {
                              "title": "Ubuntu 10.04 Keyboard and Mouse Freezing Problem"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "597297",
                           "_score": 1,
                           "_source": {
                              "title": "Curly braces with LCtrl+LShift+LAlt+è - how?"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows-xp",
               "doc_count": 7517,
               "top_tag_hits": {
                  "hits": {
                     "total": 7517,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602679",
                           "_score": 1,
                           "_source": {
                              "title": "Windows port opening"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "510161",
                           "_score": 1,
                           "_source": {
                              "title": "Windows 8 Hyper-V fails to boot Windows XP ISO"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "180565",
                           "_score": 1,
                           "_source": {
                              "title": "Logitech Optical Mouse Frozen In Middle of Windows XP Pro Screen"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "networking",
               "doc_count": 6739,
               "top_tag_hits": {
                  "hits": {
                     "total": 6739,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602679",
                           "_score": 1,
                           "_source": {
                              "title": "Windows port opening"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602645",
                           "_score": 1,
                           "_source": {
                              "title": "are there any requirements for the sequence number on CP RST packets?"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602449",
                           "_score": 1,
                           "_source": {
                              "title": "Vmware Dev Server not allowing HTTP traffic"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "mac",
               "doc_count": 5590,
               "top_tag_hits": {
                  "hits": {
                     "total": 5590,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602482",
                           "_score": 1,
                           "_source": {
                              "title": "All Mac OS X apps crash as opened"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602553",
                           "_score": 1,
                           "_source": {
                              "title": "How can I load VLC instead of iTunes on my Mac when I press the player buttons on my Mac keyboard?"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602460",
                           "_score": 1,
                           "_source": {
                              "title": "startup mac mini using boot usb with MASTER BOOT RECORD scheme but failed"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "wireless-networking",
               "doc_count": 4409,
               "top_tag_hits": {
                  "hits": {
                     "total": 4409,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "265142",
                           "_score": 1,
                           "_source": {
                              "title": "Connect to Wi-Fi access point with specific MAC address"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "602586",
                           "_score": 1,
                           "_source": {
                              "title": "How to adjust Tx Power for Macbook Air mid-2012 Wi-Fi card"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "435290",
                           "_score": 1,
                           "_source": {
                              "title": "Use wifi and ethernet simultaneously?"
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "windows-8",
               "doc_count": 3601,
               "top_tag_hits": {
                  "hits": {
                     "total": 3601,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "510161",
                           "_score": 1,
                           "_source": {
                              "title": "Windows 8 Hyper-V fails to boot Windows XP ISO"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "591388",
                           "_score": 1,
                           "_source": {
                              "title": "Android USB Driver on Windows 8"
                           }
                        },
                        {
                           "_index": "stack",
                           "_type": "question",
                           "_id": "553339",
                           "_score": 1,
                           "_source": {
                              "title": "Can I have 2 PDF documents open in Windows 8?"
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}

@jpountz
Copy link
Contributor

jpountz commented May 13, 2014

This looks good to me. Let's add some documentation and tests and I think that will be it.

@yao23
Copy link

yao23 commented May 13, 2014

It's so great to have this aggregation feature with top_hits, but how to use it? Wait for elasticsearch:master accept the merge request and update? Or clone your branch and compile, import in Maven? Thanks in advance for your future response.

@yao23
Copy link

yao23 commented May 13, 2014

Beside using JSON api to parse, do you have any Java or Scala APIs to retrieve or iterate bucket aggregation results, for example, in "key": "osx", how to get "title": "How to Install Google Chrome from the command line", "title": "All Mac OS X apps crash as opened" and "title": "Create a shortcut for application on Google Chrome for MacOSX"? Thanks!

@kimchy
Copy link
Member

kimchy commented May 14, 2014

@yao23 the feature, when it gets in, will be on the 1.3 release (we still have a 1.2 release that will happen hopefully soonish). I would not build this now, wait till it gets into master + 1.x branch, and then if you are eager to try it out, you can build the 1.x branch release once its in.

Regarding the API, there is a full Java client API as part of Elasticsearch, how to access aggregations using it is best asked on the mailing list.

@yao23
Copy link

yao23 commented May 14, 2014

@kimchy Appreciate for your immediate response, I will try to build it and use Java APIs to access aggregations buckets content and post results here after experiment. Reference for other guys, link about Java APIs: http://stackoverflow.com/questions/21018493/how-to-access-aggregations-result-with-elasticsearch-java-api-in-searchresponse

@martijnvg
Copy link
Member Author

@jpountz I added tests and documentation.

* {ref}/search-request-source-filtering.html[Source filtering]
* {ref}/search-request-script-fields.html[Script fields]
* {ref}/search-request-fielddata-fields.html[Fielddata fields]
* {ref}/search-request-version.html[Include versions]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice :)

@jpountz
Copy link
Contributor

jpountz commented May 22, 2014

@martijnvg this looks great. I left some minor comments about the documentation but other than that I'm good with pushing this change!

/**
*
*/
@ElasticsearchIntegrationTest.SuiteScopeTest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this suite scoped? I don't see where this test modifies the cluster neither does it need any specific node level settings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All agg tests are suite scoped, so that is why I made this test suite scoped as well.

Fail if sub aggs are specified
Updated docs
@martijnvg
Copy link
Member Author

@s1monw @jpountz I updated the PR: added more tests and updated the docs and disallow sub aggs in the top_hits agg.

@s1monw
Copy link
Contributor

s1monw commented May 23, 2014

thanks @martijnvg LGTM

@jpountz
Copy link
Contributor

jpountz commented May 23, 2014

LGTM

martijnvg added a commit that referenced this pull request May 23, 2014
@martijnvg martijnvg closed this in 5fafd24 May 23, 2014
@jpountz jpountz changed the title Add top_hits aggregation Aggregations: Add top_hits aggregation Jun 19, 2014
@vvaradhan
Copy link

Is there a master-snapshot version available through maven? I can start on my development till 1.3.0 gets officially released.

Also, what would be a likely release date of 1.3.0?

@clintongormley
Copy link
Contributor

Is there a master-snapshot version available through maven? I can start on my development till 1.3.0 gets officially released.

You can always compile from source. See https://github.com/elasticsearch/elasticsearch/blob/master/README.textile

Also, what would be a likely release date of 1.3.0?

Shortly before 1.4.0 ;)

It'll be released when it is ready.

@murugan-sundararaj
Copy link

Thanks for the top_hits feature in 1.3
Can I use top_hits to get ALL the documents present under a terms bucket. I understand the top_hits "size" accepts only numbers greater than zero. But how to get all the documents?. Thanks

@jpountz
Copy link
Contributor

jpountz commented Jul 24, 2014

You should not try to get all documents, this would blow up CPU and memory on your cluster.

@murugan-sundararaj
Copy link

My use case is like this. I store product data

{
"_id": "product_id",
 "group": "A",
 "page_views": 1000,
 "field_x": "123abc",
 "field_y": "1010zzz"
}

I want to do terms bucket on "group" and get ALL the products, present under each bucket, in the descending order of their "page_views". I would use this result for further calculation.

Query:
GET /my_idx/my_type/_search
{
    "size": 0, 
    "aggs": {
     "product_group": {
      "terms": {
        "field": "group",
        "size": 0
      },
      "aggs": {
        "top_products": {
          "top_hits": {
            "sort": [
              {
                "page_views": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "include": [
                "page_views", "field_x", "field_y"
              ]
            },
            "size": 1000000 //this size is not known
          }
        }
      }
    }
  }
}

Please let me know if there is an alternate way to accomplish this.

@jpountz
Copy link
Contributor

jpountz commented Jul 24, 2014

The only reasonable way to do it would be to first start a request to compute the top groups and then one request per group (with a filter) using scroll for pagination.

@murugan-sundararaj
Copy link

@jpountz thank you. Will try your approach.

@martijnvg martijnvg deleted the feature/top_hits_aggs branch May 18, 2015 23:31
@clintongormley clintongormley changed the title Aggregations: Add top_hits aggregation Add top_hits aggregation Jun 6, 2015
@colings86 colings86 added :Analytics/Aggregations Aggregations and removed :Analytics/Aggregations Aggregations labels Mar 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants