-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Admin report for domains which should have data source restriction FF enabled #34813
Conversation
restriction_ff_status=UCRRestrictionFFStatus.ShouldEnable.name, | ||
) | ||
subject = "Weekly report: projects for ucr restriction FF" | ||
recipient = "solutions-tech-app-engineers@dimagi.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I review this, this should probably be an environment variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we already have this email? if not, might as well use the soltech email we used to report issues to soltech only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR @Charl1996 , well drafted.
Just wanted to check about pagination on this report. I see we have added 4 options for filter status that we are using to choose to show domains. Is that to work with the pagination or the pagination is not supported on this report?
If you are going to use pagination, Just noting that any filter on the report should be applied when fetching the records to show, then paginated and then they should not be reduced.
A simple code improvement could be do order methods in order of use, the Newspaper metaphor of clean code.
corehq/apps/hqadmin/reports.py
Outdated
restriction_ff_status: str | ||
|
||
def __init__(self, *args, **kwargs): | ||
self.selected_domain = kwargs.get('selected_domain') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can select only one domain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a chat with AE and the domain filter is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Charl1996 Could you please update the screenshot in the PR description to show the updated report filters since the domain filter has now been removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Let me do that now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
corehq/apps/hqadmin/reports.py
Outdated
|
||
for domain in self.ucr_domains(): | ||
case_count = CaseES().domain(domain).count() | ||
form_count = FormES().domain(domain).count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am worried about the number of queries this would fire up. Lets do this in one go?
Should be straight forward to add a new filter that does that like we have for case_ids filters.term('_id', case_ids)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, will look into this a bit.
corehq/apps/hqadmin/reports.py
Outdated
|
||
@staticmethod | ||
def _rebuild_restricted_ff_enabled(domain): | ||
return RESTRICT_DATA_SOURCE_REBUILD.enabled(domain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we memoize RESTRICT_DATA_SOURCE_REBUILD.get_enabled_domains()
so we do this only once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
restriction_ff_status=UCRRestrictionFFStatus.ShouldEnable.name, | ||
) | ||
subject = "Weekly report: projects for ucr restriction FF" | ||
recipient = "solutions-tech-app-engineers@dimagi.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we already have this email? if not, might as well use the soltech email we used to report issues to soltech only.
corehq/apps/hqwebapp/tasks.py
Outdated
|
||
filter_name = UCRRebuildStatusFilter.slug | ||
filter_value = UCRRestrictionFFStatus.ShouldEnable.name | ||
report_url = f"{protocol}://{host}{endpoint}?{filter_name}={filter_value}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not reverse already handle this for you? Is the function get_url_base
doing what you need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_url_base
does exactly what I need! Thanks!
I see that goes to commcare-hq/corehq/apps/hqwebapp/views.py Line 890 in f838c3e
As I don't think it would make sense to send this report to the feedback email I think it's probably fine as is? |
3869669
to
76f9c74
Compare
Please note that I force pushed due to lots of small updates baked into existing commits. |
Oh ya, feedback email should not be used. |
@Charl1996 please let me know when its ready for review again. I believe you would either respond to the pagination query I had or update code accordingly. |
Ah, I forgot to push the "re-request review" button. It's ready for review. Thanks for checking in. |
corehq/apps/hqadmin/reports.py
Outdated
self.restrict_rebuild_column_data(domain, case_count, form_count), | ||
] | ||
|
||
def restrict_rebuild_column_data(self, domain, case_count, form_count): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to should_restrict_rebuild_column_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, now that I think about it, how does _ucr_rebuild_restriction_status_column_data
sound? It's more in line with the column name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds much better 👍
corehq/apps/hqadmin/reports.py
Outdated
def rows(self): | ||
start = self.pagination.start | ||
end = self.pagination.start + self.pagination.count | ||
return self.table_data.rows[start:end] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean that we are creating all rows on each page and then picking the ones for the page?
Don't think we should be doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I don't think we need pagination on this report since we are only fetching domains with UCR feature flag enabled which should not be many.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean that we are creating all rows on each page and then picking the ones for the page?
Correct. I do feel slightly uncomfortable about that, but I'm not sure how else I could achieve pagination, considering that I need to pull the domains first and then check if the domain applies based on the filter value. In this sense I need to construct all the applicable rows, but then paginate for the UI.
Note: I did check with an AE on whether the filters are necessary in the first place (since they seem kinda nice-to-have) but it was confirmed that they would be very helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need pagination on this report since we are only fetching domains with UCR feature flag enabled which should not be many.
The domains on prod having this FF is already quite long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The domains on prod having this FF is already quite long.
Good call on checking that, it is a big list. And that is the reason the current approach won't work, because it will fetch data for all of them and then do the pagination. So, on production, this would have performance issues.
I think the choice here is to
- get rid of filters for status (you could add them as a column instead so that AE still knows it but can't filter by it)
- get rid of pagination
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one small correction here / correct me if I am wrong:
The domains are iterated and a subset of domains are selected based on the filters (this is something that needs to happen on the application level as there's various things to check)
The domains are iterated and data (case and form count) is fetched for all of them one at a time.
For each domain, It is after the data fetch, we decide if we want to list the domain or not based on filters.
And then once we have the final list of valid domains based on filters, we then paginate.
I agree that performance implication isn't noticeable on staging and then possibly would be same on production.
But I would not consider that as a validation to go with this approach.
If you look at the code, it is like, if these were SQL records, you are fetching all records from DB and then paginating using a list, instead of asking SQL to give you a set of records. And its reasonable because the table is small?
So, I believe, irrespective of the performance implication, you wont go with the above approach to paginate the list instead of via SQL.
The program is fetching and processing all domains for each page anyway, so if we just skip the pagination, the page itself won't become any fast for any single page, but the user won't do multiple requests (different pages) anymore.
So, if you wish to keep the filters, we should skip pagination. Initially, we noticed that the number of UCRs is high, but then the program is going through all of them anyway even now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The domains are iterated and data (case and form count) is fetched for all of them one at a time.
This is not true anymore. It was initially the way I did it by changed it here. Now all the domains' case data is fetched in one query to ES and the forms data is fetched in another query to ES.
The program is fetching and processing all domains for each page anyway, so if we just skip the pagination, the page itself won't become any fast for any single page, but the user won't do multiple requests (different pages) anymore.
This is true. I see your point. Yeah, so if we skip pagination and keep the filters I can now see how that would be an improvement to the current approach. I'll update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted pagination here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the page still looks paginated (maybe this is the standard look) but all records are loaded in the table, since there's no data loading going on when I switch between pages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to set disable_pagination=True
on the class?
This reverts commit 72e3af4.
@Charl1996 Thinking through this report, I'm wondering if there would be any value in having an action column as well? This could be even only be enabled for relevant rows that have a "Should Disable" or "Should Enable". Clicking on the button in the new column would then make an Ajax call to the back-end and add/remove the domain from the feature flag. This feels more like a nice-to-have, but it could make things a bit easier by not having to go and manually add/remove domains under the feature flags page. |
@zandre-eng Currently I do something like that, but not as explicitly as an action column. What currently happens is that when the status is "should disable" or "should enable" a link to the FF edit page is included for quick access. See here. |
@Charl1996 Ah, that's perfect. Thanks for the explanation. |
I like that approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go once we are able to remove the pagination entirely.
You mean from the UI? Update: |
@Charl1996 yes. If there is no pagination there should be no pagination on UI.
Spent a couple of mins and I could not get this to work either. let me know if you get blocked on this, I will give this a better try. |
@mkangia I'm having no success so far with removing the pagination, but I'm wondering if we should not complete that part in another PR so as to unblock this one. Thoughts? |
Hey @Charl1996 Got behind this this time, and looks like what we need to set is I tried |
Nice! Thanks! |
@@ -259,7 +259,7 @@ def clear_expired_oauth_tokens(): | |||
call_command('cleartokens') | |||
|
|||
|
|||
@periodic_task(run_every=crontab(day_of_week=1)) | |||
@periodic_task(run_every=crontab(minute=0, hour=1, day_of_week='mon')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did the day_of_week=1 sent the email at a wrong time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it. Apparently there was emails sent every hour or so, but this new configuration seems to have fixed it. It's weird.
default_rows = 10 | ||
disable_pagination = True | ||
ajax_pagination = False | ||
use_datatables = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need all these? I thought use_datatables
was all that was needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think they're technically necessary, but I thought of keeping it there to make it obvious to the developer that disabling pagination is the goal, since user_datatables
doesn't really give me an indication that I'm trying to get rid of pagination. I also thought it's maybe a good idea to be explicit by setting the values to their current state.
Let me know if you think I should remove it or if it's fine to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay 👍
I am fine with leaving it how you have done it.
I believe the other options make pagination static, i.e all records are loaded on the UI but there is still pagination shown so there is no ajax request to fetch more records but the user still sees pages. I could see that behavior locally.
I thought of keeping it there to make it obvious to the developer that disabling pagination is the goal
I like that. 👍
Product Description
A new admin report is added to allow us to identify domains which have large amounts of data (forms or cases) which might cause data source rebuilds to be very slow. We use a heuristic (number of cases/form > 1,000,000) to determine whether or not those domains should have the
RESTRICT_DATA_SOURCE_REBUILD
feature flag enabled.The report looks like this:
The column,
UCR rebuild restriction status
(feel free to suggest other names), shows the user whether the domainWhen the FF should be enabled or disabled, the column will have a link (see image above) to the FF page.
In addition, a weekly email is sent to a particular admin email (soltech AEs) using this report's data to identify domains which we should look at. The email links to this report.
Technical Summary
Ticket
I decided to add a separate class,
UCRRebuildRestrictionTable
, handling the table feeding theUCRDataLoadReport
report. This was decided so that we can easily have access to the report data when sending the email (e.g. specifying in the email the number of identified domains based on a certain filter) without having to create a mock request object as is needed when initializing a report class.Feature Flag
No particular feature flag
Safety Assurance
Safety story
Tested locally. Added some tests
Automated test coverage
Added automated tests
QA Plan
No QA from QA team, but will have AE's test it.
Rollback instructions
Labels & Review