Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timing metrics for case seach #34373

Merged
merged 10 commits into from
Apr 3, 2024
Merged

Add timing metrics for case seach #34373

merged 10 commits into from
Apr 3, 2024

Conversation

esoergel
Copy link
Contributor

@esoergel esoergel commented Apr 2, 2024

Product Description

Admin-only view to see case search performance breakdown. Works best with the LongCaseSearchRequest log in sentry

image

Technical Summary

This initializes a TimingContext object when executing a case search request and keeps track of various sections of the process. Turns out at least in some instances it's not just the ES query that's slow, but also building the query (and executing secondary queries), wrapping cases, serializing cases, fetching an app (seems like that shouldn't be happening here), and enriching the data with related cases. I want to see what this looks like on production, since the circumstances there are pretty different than staging.

The timing code is not necessary in normal case search queries, only when initiating through this admin view, but I don't know of a better way to do this, and I can't imagine this having an appreciable performance impact. It does add a bit of boilerplate to the code, which I don't like, and I'd support removing this when we are no longer as focused on case search performance.

Feature Flag

Safety Assurance

Safety story

The parts of this that involve production code is relatively small, and that's the critical part. The bulk of the changeset is admin-only, and I'm not concerned about errors there. This has been on staging for about a week and I've tested it there without issue.

Automated test coverage

QA Plan

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@esoergel esoergel force-pushed the es/case-search-timings branch from cf0358b to fcfec6d Compare April 2, 2024 21:18
@esoergel esoergel force-pushed the es/case-search-timings branch from fcfec6d to 7879d57 Compare April 2, 2024 21:33
@esoergel esoergel marked this pull request as ready for review April 3, 2024 01:54
@esoergel esoergel requested a review from snopoke as a code owner April 3, 2024 01:54
@esoergel esoergel requested review from Robert-Costello, Jtang-1 and a team April 3, 2024 01:55
Copy link
Contributor

@Jtang-1 Jtang-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I focused more on the non-admin page related changes. But this is super cool! Just one question

return results


@time_function()
def get_related_cases_result(helper, app, case_types, source_cases, include_all_related_cases):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could either result in _get_all_related_cases OR _get_search_detail_path_defined_cases and _get_child_cases_referenced_in_app. Would it also be helpful to get the label the timing context for those parts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - those can all be given the decorator, so I might as well, huh? Just pushed a commit adding a few more.

@snopoke
Copy link
Contributor

snopoke commented Apr 3, 2024

I haven't looked at the code but I definitely support improving our visibility into core platform components like case search. I can imagine this being generally useful in future, even if we aren't specifically focusing on case search performance, though it would be more useful if app builders could somehow get access to this info (like having a debug mode in web apps or something) - but that's getting a bit off topic.

One thing I notice from the screenshot is that there's no indication of how many times a function was called. For queries that use subcase / ancestor functions it would be useful to have that split out of get some insight into it.

I also wonder if being able to see the actual ES queries that were executed would ever be useful.

Just some thought....

self.searchButtonIcon = ko.observable("fa fa-search");
self.exampleRequestDict = JSON.stringify({
"request_dict": {
"_xpath_query": ["name: 'Ethan'", "subcase-exists(parent, name='Matilda')"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

</div>
</div>
{% if query_additions %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that references to this were removed here: #28278

</div>
<button class="btn btn-default" id="add-xpath" data-bind="click: addXPath">
<i class="fa fa-plus"></i> {% trans "Add XPath" %}
</button>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seem unrelated to the profiling, are they just a bonus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I was initially hoping to reuse the existing case search tool (which does profile ES queries), so I added this to be able to specify the types of queries we commonly see (with more than one CSQL expression). Ended up going in a different direction with the new page though once I realized the changes I needed were too fundamental to work here.

return {
key: (value if isinstance(value, list) else [value])
for key, value in params.items() if value is not None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do none of the uses rely on any of the methods of MultiValueDict e.g. getlist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it's only used in one place, which converts it to a normal dict. I moved that coercion up the stack to make it easier to initialize in other contexts.

def extract_search_request_config(request_dict):
params = dict(request_dict.lists())

@esoergel
Copy link
Contributor Author

esoergel commented Apr 3, 2024

it would be more useful if app builders could somehow get access to this info (like having a debug mode in web apps or something)

That would be really cool. We could also store the timing information of real requests over some threshold and view it like a report later - that'd be super helpful.

One thing I notice from the screenshot is that there's no indication of how many times a function was called. For queries that use subcase / ancestor functions it would be useful to have that split out of get some insight into it.
I also wonder if being able to see the actual ES queries that were executed would ever be useful.

I agree with both of these - I'd like to explore that next, though I think they'll be a little harder, as you need access to the timing_context deeper down the stack. I was thinking about maybe making some profiler object to hold the timing context, the total counts, and things like actual JSON of the queries run, but I'll have to figure out a way to make it accessible that's not too crufty.

@esoergel esoergel merged commit 7e67343 into master Apr 3, 2024
13 checks passed
@esoergel esoergel deleted the es/case-search-timings branch April 3, 2024 21:20
Copy link

sentry-io bot commented Apr 4, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ ESError: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='10.202.40.159', port=9200... /a/{domain}/phone/search/{app_id}/ View Issue
  • ‼️ ESError: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='10.202.41.10', port=9200)... /a/{domain}/phone/search/{app_id}/ View Issue
  • ‼️ ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='10.202.40.159', port=9200... /a/{domain}/phone/search/{app_id}/ View Issue
  • ‼️ JSONDecodeError: Expecting ',' delimiter: line 1 column 81 (char 80) /a/{domain}/case/profile/ View Issue
  • ‼️ CaseSearchUserError: Search request must specify case_types /a/{domain}/case/profile/ View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants