[BUG] [Segment replication] Search requests to index hang indefinitely #3834

Closed
Poojita-Raj opened this issue Jul 8, 2022 · 2 comments · Fixed by #4118

Labels: bug, distributed framework

Comments

@Poojita-Raj
Contributor

Describe the bug
After indexing documents into an index with segment replication enabled, search requests sent to that index sometimes hang indefinitely. This behavior occurs both before and after refreshing the index.

To Reproduce
Steps to reproduce the behavior:

  1. Run gradle with 2 nodes

  2. Create an index with segment replication enabled
    curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d '{"settings" : { "index" : { "replication.type" : "SEGMENT" }}}'

  3. Add some documents to the index

curl -X POST "localhost:9200/my-index-000001/_doc/?pretty" -H 'Content-Type: application/json' -d'
{
  "message": "GE70000",
  "user": {
    "id": "kimchy2"
  }
}
'
  4. Use the cat shards API to check that documents are being added and deleted as expected. A refresh is occasionally needed before the updated count appears.

curl -X GET "localhost:9200/_cat/shards?pretty"
curl -X POST "localhost:9200/my-index-000001/_refresh?pretty"

  5. Run any search request against the index and check whether it hangs indefinitely instead of returning a response. It may take a few tries to see this behavior.
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'
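Because the hang is intermittent, it can help to bound each attempt with a client-side timeout and repeat the search several times. The following is a minimal sketch, not part of the original report: the function names, the `OS_HOST`, `OS_INDEX` and `TIMEOUT_SECS` variables, and the 10-second default are all illustrative assumptions. It relies on curl's `--max-time` option and on curl exit code 28 (operation timeout) to tell a hang apart from other failures.

```shell
# Classify a curl exit status: 0 = answered, 28 = timed out (treated here as a hang).
classify_curl_status() {
  case "$1" in
    0)  echo "ok" ;;
    28) echo "hang" ;;    # curl's exit code for "operation timeout"
    *)  echo "error" ;;   # e.g. connection refused, DNS failure
  esac
}

# Send one match_all search and report whether it answered within the limit.
# OS_HOST, OS_INDEX and TIMEOUT_SECS are hypothetical knobs with assumed defaults.
probe_search() {
  host="${OS_HOST:-localhost:9200}"
  index="${OS_INDEX:-my-index-000001}"
  limit="${TIMEOUT_SECS:-10}"
  status=0
  curl -sS -o /dev/null --max-time "$limit" \
    -X GET "$host/$index/_search" \
    -H 'Content-Type: application/json' \
    -d '{"query": {"match_all": {}}}' || status=$?
  classify_curl_status "$status"
}

# Repeat the probe, since a single attempt may happen to succeed.
run_probes() {
  attempts="${1:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    echo "attempt $i: $(probe_search)"
    i=$((i + 1))
  done
}
```

After sourcing the functions, `run_probes 10` would issue ten searches and print one line per attempt; any `hang` line indicates a request that did not answer within the timeout.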

Expected behavior
All search requests should return quickly with the correct documents from the index.

Poojita-Raj added the bug and untriaged labels on Jul 8, 2022
kartg self-assigned this on Aug 1, 2022
@Poojita-Raj
Contributor Author

On digging into this issue, I found that search requests hang whenever they are routed to the replica instead of the primary. This happens every time, even when the files have visibly been copied over, both in the directory and in the _cat/shards output.

The steps below confirm this:

  1. Use these two commands to determine which shard copy is the primary, which is the replica, and the node ID of each.
  • curl -X GET "localhost:9200/_cat/nodes?v=true&h=id,ip,port,v,m,n&pretty"
  • curl -X GET "localhost:9200/_cat/shards?pretty"
  2. Send search requests with the preference set to a specific node ID to see which requests hang. The query has this form:
    curl -X GET "localhost:9200/my-index-000001/_search?preference=&pretty" -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match_all": {}
      }
    }
    '
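The command above leaves the preference value blank. One way to pin a search to a single node is the `_only_nodes:<node-id>` preference form; using it here is an assumption about what was intended, not something the report states. The sketch below builds such a URL and sends the search with a client-side timeout. The function names and the `OS_HOST`, `OS_INDEX` and `TIMEOUT_SECS` variables are hypothetical. Note that `_cat/nodes` may print shortened node IDs by default, so the full ID (for example via the `full_id=true` parameter) may be needed.

```shell
# Build a search URL pinned to one node via the preference parameter.
# Assumption: the `_only_nodes:<node-id>` preference form; the original
# command leaves the preference value empty.
search_url_for_node() {
  node_id="$1"
  host="${OS_HOST:-localhost:9200}"
  index="${OS_INDEX:-my-index-000001}"
  printf '%s/%s/_search?preference=_only_nodes:%s&pretty' "$host" "$index" "$node_id"
}

# Run a match_all search against the given node, giving up after TIMEOUT_SECS
# (assumed default: 10) so that a hanging replica does not block the shell.
search_on_node() {
  curl --max-time "${TIMEOUT_SECS:-10}" \
    -X GET "$(search_url_for_node "$1")" \
    -H 'Content-Type: application/json' \
    -d '{"query": {"match_all": {}}}'
}
```

Calling `search_on_node` once with the primary's node ID and once with the replica's makes the difference visible: a request that hangs ends with curl's timeout error instead of a JSON response.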

All search requests routed to the primary return output, and all requests routed to the replica hang indefinitely. I think this is a good starting point for digging into what could be wrong.

@mch2
Member

mch2 commented Aug 3, 2022

This is related to external refresh listeners not getting wired in NRTReplicationEngine, so awaitShardSearchActive never resolves. Working on a PR to fix.
