Tuning Elasticsearch for search improvements #4909

Merged (4 commits) on Nov 19, 2018
2 changes: 1 addition & 1 deletion readthedocs/search/documents.py
@@ -59,7 +59,7 @@ class Meta(object):
title = fields.TextField(attr='processed_json.title')
headers = fields.TextField(attr='processed_json.headers')
content = fields.TextField(attr='processed_json.content')
- path = fields.TextField(attr='processed_json.path')
+ path = fields.KeywordField(attr='processed_json.path')

# Fields to perform search with weight
search_fields = ['title^10', 'headers^5', 'content']
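Switching path from TextField to KeywordField changes how the value is indexed. A text field is passed through an analyzer and stored as separate terms, while a keyword field is stored as one exact term, which is what exact-match filters, sorting and aggregations on the path rely on. The sketch below models that difference in plain Python. simple_analyze is a hypothetical, simplified tokenizer (not Elasticsearch's standard analyzer), and the sample path is made up.

```python
import re
from typing import List


def simple_analyze(value: str) -> List[str]:
    # Hypothetical stand-in for a full-text analyzer (NOT Elasticsearch's
    # standard analyzer): lowercase, then split on non-alphanumeric characters.
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def keyword_analyze(value: str) -> List[str]:
    # A keyword field indexes the whole value as a single exact term.
    return [value]


def term_matches(indexed_terms: List[str], query_term: str) -> bool:
    # A term query looks up the query term as-is, without analyzing it.
    return query_term in indexed_terms


path = "guides/Getting-Started.html"
print(simple_analyze(path))                        # ['guides', 'getting', 'started', 'html']
print(term_matches(simple_analyze(path), path))    # False: the full path was split into tokens
print(term_matches(keyword_analyze(path), path))   # True: the full path is one term
```

With the text mapping, an exact lookup on the full path finds nothing because only its fragments were indexed; the keyword mapping keeps the path intact.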
17 changes: 14 additions & 3 deletions readthedocs/settings/base.py
@@ -343,18 +343,29 @@ def USE_PROMOS(self): # noqa
ES_INDEXES = {
'project': {
'name': 'project_index',
- 'settings': {'number_of_shards': 5,
+ # We do not have much data in the project index, so a single shard with
+ # 1 replica is appropriate for it.
+ 'settings': {'number_of_shards': 1,
'number_of_replicas': 1
}
},
'page': {
'name': 'page_index',
'settings': {
- 'number_of_shards': 5,
- 'number_of_replicas': 1,
+ # We have 3 nodes, so 3 shards, each with 3 replicas, is a good fit for our
+ # infrastructure: all 9 (3*3) shards will be allocated across the 3 nodes.
+ # If one node fails, its data is still present on the other nodes and
+ # Elasticsearch can keep serving requests.
+ 'number_of_shards': 3,
+ 'number_of_replicas': 3,
+ "index": {
Review comment (Member): We should put comments in the code that explain why we're doing 3 of each here.

Reply (Member Author): Added! Thanks!

+ "sort.field": ["project", "version"]
+ }
}
},
}
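As a rough check of the shard counts in the page index comment, the sketch below applies two Elasticsearch rules: number_of_replicas counts replicas per primary shard, so each shard has 1 + number_of_replicas copies, and two copies of the same shard are never allocated to the same node. plan_shards and ShardPlan are hypothetical helpers, and the model ignores other allocation constraints such as disk watermarks. Under these rules, 3 shards with 3 replicas each is 12 copies, of which only 9 can be placed on 3 nodes; the remaining replica of each shard stays unassigned (yellow cluster health) until a fourth node joins.

```python
from dataclasses import dataclass


@dataclass
class ShardPlan:
    total_copies: int  # primaries + replicas
    assignable: int    # copies that can be placed on distinct nodes
    unassigned: int    # replica copies left without a node


def plan_shards(nodes: int, shards: int, replicas: int) -> ShardPlan:
    # Each primary shard has 1 + `replicas` copies in total.
    copies_per_shard = 1 + replicas
    # Two copies of the same shard never share a node, so at most
    # `nodes` copies of each shard can be assigned.
    placeable_per_shard = min(copies_per_shard, nodes)
    total = shards * copies_per_shard
    assignable = shards * placeable_per_shard
    return ShardPlan(total, assignable, total - assignable)


print(plan_shards(nodes=3, shards=3, replicas=3))
# ShardPlan(total_copies=12, assignable=9, unassigned=3)
print(plan_shards(nodes=3, shards=3, replicas=2))
# ShardPlan(total_copies=9, assignable=9, unassigned=0)
print(plan_shards(nodes=3, shards=1, replicas=1))
# ShardPlan(total_copies=2, assignable=2, unassigned=0)
```

With number_of_replicas set to 2, all 9 copies fit with one copy of each shard per node, so the index still survives the loss of any single node. The project index (1 shard, 1 replica) needs only 2 of the 3 nodes.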
+ # Disable auto refresh to improve indexing performance
+ ELASTICSEARCH_DSL_AUTO_REFRESH = False

ALLOWED_HOSTS = ['*']
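The "sort.field" setting enables index sorting: documents in each segment are stored ordered by project and then version, so all pages of one project version sit next to each other. The plain-Python model below (not how Lucene implements it) illustrates why that helps a query filtered on both fields: in the model, the matching documents form one contiguous run of doc ids, found with a binary search rather than by scanning every document. The page data is made up. Index sorting has to be configured when the index is created, and the sort fields must have doc values; this sketch assumes project and version are keyword fields, which the diff does not show.

```python
from bisect import bisect_left, bisect_right

# Hypothetical page documents: (project, version, path).
pages = [
    ("pip", "latest", "index.html"),
    ("docs", "stable", "intro.html"),
    ("pip", "stable", "index.html"),
    ("docs", "latest", "intro.html"),
    ("pip", "latest", "usage.html"),
]

# Model of index sorting: documents are stored ordered by the sort fields,
# so doc ids follow (project, version) order.
sorted_pages = sorted(pages, key=lambda p: (p[0], p[1]))
keys = [(p[0], p[1]) for p in sorted_pages]


def matching_doc_ids(project, version):
    # All docs for one (project, version) pair form a contiguous run of ids,
    # whose bounds can be found by binary search over the sorted keys.
    lo = bisect_left(keys, (project, version))
    hi = bisect_right(keys, (project, version))
    return list(range(lo, hi))


ids = matching_doc_ids("pip", "latest")
print(ids)                                # [2, 3]
print([sorted_pages[i][2] for i in ids])  # ['index.html', 'usage.html']
```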

1 change: 1 addition & 0 deletions readthedocs/settings/test.py
@@ -18,6 +18,7 @@ class CommunityTestSettings(CommunityDevSettings):
TEMPLATE_DEBUG = False
ES_PAGE_IGNORE_SIGNALS = False
ELASTICSEARCH_DSL_AUTOSYNC = False
+ ELASTICSEARCH_DSL_AUTO_REFRESH = True

@property
def ES_INDEXES(self): # noqa - avoid pep8 N802
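ELASTICSEARCH_DSL_AUTO_REFRESH is a django-elasticsearch-dsl setting that, when True, forces an index refresh after each document update so the change is searchable immediately. A refresh is what makes newly indexed documents visible to search, so doing one on every save costs indexing throughput. With the setting disabled, documents become visible at the next periodic refresh (every second by default). The test settings keep it enabled so a test can search for a document right after indexing it. NearRealTimeIndex below is a hypothetical toy model of that behavior, not an Elasticsearch client.

```python
class NearRealTimeIndex:
    """Toy model of near-real-time search (hypothetical, not a real client)."""

    def __init__(self, auto_refresh: bool):
        self.auto_refresh = auto_refresh
        self._buffer = []   # indexed but not yet visible to search
        self._visible = []  # visible to search after a refresh
        self.refresh_count = 0

    def index(self, doc: str) -> None:
        self._buffer.append(doc)
        if self.auto_refresh:
            # Models AUTO_REFRESH = True: force a refresh after every write.
            self.refresh()

    def refresh(self) -> None:
        # Make everything written so far visible to search.
        self._visible.extend(self._buffer)
        self._buffer.clear()
        self.refresh_count += 1

    def search(self, term: str) -> list:
        return [d for d in self._visible if term in d]


prod = NearRealTimeIndex(auto_refresh=False)
tests = NearRealTimeIndex(auto_refresh=True)
for doc in ["install pip", "pip usage", "sphinx themes"]:
    prod.index(doc)
    tests.index(doc)

print(prod.search("pip"))   # [] (nothing refreshed yet)
prod.refresh()              # in a real cluster, the periodic refresh does this
print(prod.search("pip"))   # ['install pip', 'pip usage']
print(tests.search("pip"))  # ['install pip', 'pip usage']
print(prod.refresh_count, tests.refresh_count)  # 1 3
```

The trade-off in the model is the same one the settings make: one refresh for a batch of writes in production, one refresh per write in tests.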