
Change Case and User Ownership (Script) #34588
Status: Open. Wants to merge 70 commits into base: master.
Conversation

@zandre-eng (Contributor) commented May 9, 2024:

Technical Summary

Link to ticket here.

Please note: This PR is not intended to be merged. It exists purely so that other devs can give feedback on a script that will be run on production.

The purpose of this script is to achieve two things for a specific domain (a rough sketch follows below):

(1) Change Location of Mobile Workers
All mobile workers that have a usertype user property matching rc will need to have their primary location changed to match what is saved under the user's rc_number property.

(2) Change Ownership of Cases
All cases matching a given set of case types will be queried and have their owner_id property changed to match the primary location of the mobile worker that created the specific case.
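For orientation, here is a minimal sketch of the two updates. iter_rc_users() and get_case_ids() are hypothetical helpers standing in for the actual queries, and the imports assume the current commcare-hq module layout:

```python
from casexml.apps.case.mock import CaseBlock
from corehq.apps.hqcase.utils import submit_case_blocks
from corehq.apps.locations.models import SQLLocation
from corehq.apps.users.models import CommCareUser
from corehq.form_processor.models import CommCareCase

DOMAIN = 'alafiacomm-prod'

# (1) Move each 'rc' mobile worker down to the child location named
# after their rc_number user property
for user in iter_rc_users(DOMAIN):  # hypothetical helper
    rc_number = user.get_user_data(DOMAIN)['rc_number']
    loc = SQLLocation.objects.get(
        domain=DOMAIN,
        parent__location_id=user.location_id,
        name=rc_number,
    )
    user.location_id = loc.location_id
    user.save(fire_signals=False)

# (2) Reassign each matching case to the primary location of the
# mobile worker who opened it
case_ids = get_case_ids(DOMAIN)  # hypothetical helper
for case_obj in CommCareCase.objects.iter_cases(case_ids, domain=DOMAIN):
    opener = CommCareUser.get_by_user_id(case_obj.opened_by, domain=DOMAIN)
    block = CaseBlock(create=False, case_id=case_obj.case_id, owner_id=opener.location_id)
    submit_case_blocks([block.as_text()], DOMAIN)
```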

The first draft of this script, though later made redundant, made use of SQLite for keeping track of which users/cases to update. Before executing the script, we would first do an initial run to retrieve and store all the relevant case and user IDs in SQLite. After this had been done, the script would retrieve and process chunks from the DB. Using this process alleviates the need for managing chunking in ES, and makes starting/stopping/reversing the script more manageable (since we can simply query the DB to get the cases/users we need).
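A minimal sketch of that first-draft bookkeeping, assuming a single-table schema (table and file names are illustrative):

```python
import sqlite3

conn = sqlite3.connect('case_ownership_script.db')  # illustrative filename
conn.execute(
    'CREATE TABLE IF NOT EXISTS case_ids ('
    ' case_id TEXT PRIMARY KEY,'
    ' processed INTEGER DEFAULT 0)'
)

def next_chunk(chunk_size=100):
    # Only unprocessed rows come back, so a stopped run resumes cleanly
    rows = conn.execute(
        'SELECT case_id FROM case_ids WHERE processed = 0 LIMIT ?',
        (chunk_size,),
    ).fetchall()
    return [case_id for (case_id,) in rows]

def mark_processed(case_ids):
    conn.executemany(
        'UPDATE case_ids SET processed = 1 WHERE case_id = ?',
        [(case_id,) for case_id in case_ids],
    )
    conn.commit()
```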

The updated approach didn't need to keep track of which IDs had been updated for the purpose of tracking or resuming the script in case of failures, so the use of a temporary DB was dropped.

)
# Set this new descendant location as user location
user.location_id = loc.location_id
user.save(fire_signals=False)
@Charl1996 (Contributor), May 9, 2024:

Can we maybe use bulk_save?
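For reference, the batched shape might look roughly like this (the PR does end up using CommCareUser.bulk_save; the loop variables and batch size here are illustrative):

```python
BATCH_SIZE = 100  # illustrative
users_to_save = []
for user in users:
    user.location_id = loc.location_id
    users_to_save.append(user)
    if len(users_to_save) >= BATCH_SIZE:
        CommCareUser.bulk_save(users_to_save)
        users_to_save = []
if users_to_save:  # flush the final partial batch
    CommCareUser.bulk_save(users_to_save)
```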


### Task 2 & 3
def transfer_case_ownership():
    print("---MOVING CASE OWNERSHIP---")
    case_ids = (
@zandre-eng (Contributor Author), May 9, 2024:

I'm curious to get thoughts from others on using ES to get the case IDs. When the script is run we will be dealing with quite a lot of cases, and I feel fetching them all at once in ES could lead to memory problems.

@gherceg (Contributor):

Yeah, you don't want to try to hold all IDs in memory at once. I don't have much experience paginating ES queries, but I have done something like this in SQL on CommCareCase. You could use paginate_query, with the downside being you'd have to iterate over each partition, but that isn't too bad. If helpful, here's a not-yet-merged example.

@zandre-eng (Contributor Author):

Thanks for the suggestion, this is very helpful. I've implemented something similar in my approach, where I'm storing case IDs in SQLite. The script retrieves the relevant cases using paginate_query_across_partitioned_databases(). Once we have the case IDs in the DB, we can then easily handle chunking the cases appropriately.
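A sketch of that combination; it assumes paginate_query_across_partitioned_databases from corehq.sql_db.util accepts a values kwarg, and conn is the SQLite connection from the tracking DB:

```python
from django.db.models import Q

from corehq.form_processor.models import CommCareCase
from corehq.sql_db.util import paginate_query_across_partitioned_databases

# Stream matching case IDs out of each partitioned DB; memory stays
# bounded because results arrive page by page
rows = paginate_query_across_partitioned_databases(
    CommCareCase,
    Q(domain=DOMAIN, type='seance_educative'),
    values=['case_id'],
)
for (case_id,) in rows:
    conn.execute('INSERT OR IGNORE INTO case_ids (case_id) VALUES (?)', (case_id,))
conn.commit()
```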

@gherceg (Contributor) left a comment:

Nice work. Just left a comment about paginating case ids. The Flake8 issue looks legit as well.

@mkangia (Contributor) left a comment:

Hey @zandre-eng

This looks like a serious migration, so I suggest we make it much more robust. I have shared a few ways to do so.

It looks like we are doing two updates in a single file? It is currently not clear what the entry point is. How about using vertical formatting, or converting this to 2 management commands?

It's good to check out some existing management commands for ideas. Examples:
https://github.com/dimagi/commcare-hq/blob/f0ad6f2433b0e43ab0ce29062ce10d6e1ade641a/corehq/motech/management/commands/create_repeat_records.py
https://github.com/dimagi/commcare-hq/blob/267dc437bfc0b2c158fad699948be654e6369a36/custom/covid/management/commands/update_owner_ids.py



success_count = fail_count = 0
for user in valid_users:
Contributor:

I could not find valid_users. Where is this assigned?

@zandre-eng (Contributor Author):

Apologies, this is not part of the actual script and was left in for my own testing. Please ignore all the code that comes after this; I'll be removing it.

loc = SQLLocation.objects.get(
    domain=DOMAIN,
    parent__location_id=user.location_id,
    name=user_data['rc_number']
Contributor:

Should we check if this value is even present before firing off the query?
Is the parent check only for the immediate parent? I believe that is what this query would do, though I'm not certain. You could look at get_queryset_descendants if you need all descendants.

Could searching by name yield multiple results? Are we expected to handle that?

Assuming there are quite a number of cases, you could extract this into a function and cache the fetch by name & parent_location_id, so we fetch each name only once. If the number of locations is small, you could simply create a dict mapping even before starting the updates.
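Something like this hypothetical cached helper would make each (parent, rc_number) pair hit the database only once; DoesNotExist and MultipleObjectsReturned would still need explicit handling:

```python
from functools import lru_cache

from corehq.apps.locations.models import SQLLocation

@lru_cache(maxsize=None)
def get_rc_location(parent_location_id, rc_number):
    # Cached per (parent, rc_number) pair; .get() still raises for a
    # missing or ambiguous name under that parent
    return SQLLocation.objects.get(
        domain=DOMAIN,  # assumes the script's module-level DOMAIN constant
        parent__location_id=parent_location_id,
        name=rc_number,
    )
```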

@zandre-eng (Contributor Author), May 10, 2024:

Please refer to this comment.

Contributor:

I think that link didn't open up as you wanted it to @zandre-eng

@zandre-eng (Contributor Author):

Apologies, the link has been fixed now.

try:
    loc = SQLLocation.objects.get(
        domain=DOMAIN,
        parent__location_id=user.location_id,
Contributor:

Should we check if this value is present before firing the query?

@zandre-eng (Contributor Author), May 10, 2024:

Please refer to this comment.

Contributor:

Same, that link isn't opening to a comment.

@zandre-eng (Contributor Author):

The link is fixed for this comment as well.

case_type('seance_educative'),
case_type('fiche_pointage')
)
.sort('opened_on') # sort so that we can continue
Contributor:

What if the run fails partway through? How do we resume?

How about dumping these IDs to a file first and then reading from there instead?

Contributor:

Do we need closed cases as well, or just open cases?

@zandre-eng (Contributor Author):

Both closed and open cases.

end_time = total_time = 0
success_count = fail_count = 0
start_time = time.time()
for case_obj in CommCareCase.objects.iter_cases(case_ids, domain=DOMAIN):
Contributor:

You should consider using chunked here along with with_progress_bar, like here. This way you can see progress and have better control over the chunk size, which is 100 by default; you could probably do more, though even 100 is okay.
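A sketch of the suggested pattern, assuming chunked from dimagi.utils.chunked and with_progress_bar from corehq.util.log:

```python
from dimagi.utils.chunked import chunked
from corehq.util.log import with_progress_bar

for case_id_chunk in chunked(with_progress_bar(case_ids, length=case_count), 100):
    for case_obj in CommCareCase.objects.iter_cases(list(case_id_chunk), domain=DOMAIN):
        ...  # build and submit the CaseBlock as before
```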

@zandre-eng (Contributor Author):

I've implemented chunking along with the with_progress_bar function for both users and cases (bf3b264)

case_block = CaseBlock(
    create=False,
    case_id=case_obj.case_id,
    owner_id=user.location_id,
Contributor:

Should we avoid the update if it's already this owner?

@zandre-eng (Contributor Author):

This is a good point. We could probably log these separately and then simply skip the update.

@zandre-eng (Contributor Author):

I've implemented the same for users, so that we don't save user updates if they are already at the correct location (bf3b264)
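The skip itself is small; skipped_case_ids here is a hypothetical list backing the separate log mentioned above:

```python
if case_obj.owner_id == user.location_id:
    skipped_case_ids.append(case_obj.case_id)  # log separately, skip the update
    continue
```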

user_to_save = []
for idx, user in enumerate(users):
    start_time = time.time()
    percentage_done = round((idx / user_count) * 100, 2)
Contributor:

You could use with_progress_bar to do this for you.

@zandre-eng (Contributor Author):

I'll take a look, thanks!

@zandre-eng (Contributor Author):

I've gotten rid of all the timing/percentage metrics and made use of with_progress_bar (bf3b264).

    continue

try:
    # Get a descendant of user location which has the same rc number
Contributor:

Do we need to look at just the immediate descendants?

@zandre-eng (Contributor Author):

I can confirm that all the mobile workers need to be moved one level down, so we only need to consider the immediate descendants.

Skipped: {skip_count},
Total Time: {round(total_time / 60, 2)} minutes"
)
CommCareUser.bulk_save(user_to_save)
Contributor:

Are we holding off on saving users until the end? This could be problematic and is prone to failing; we should update them as we go.

@zandre-eng (Contributor Author):

Yeah, I can see how this might become an issue. I'll change this so that we save the users in batches (similar to what we are doing for cases).

@zandre-eng (Contributor Author):

The user class now processes/saves users in the same way that we do for cases (bf3b264).

# DOMAIN = 'alafiacomm'
DOMAIN = 'alafiacomm-prod'

success_case_log_file_path = os.path.expanduser('~/script_success.log')
Contributor:

Nit: I prefer adding timestamps to filenames so that re-runs don't produce conflicting files.
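For example, something along these lines:

```python
import os
from datetime import datetime

timestamp = datetime.utcnow().strftime('%Y%m%d_%H%M%S')
success_case_log_file_path = os.path.expanduser(f'~/script_success_{timestamp}.log')
```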

@zandre-eng (Contributor Author):

Good idea! Given the scope of how many cases we'll be dealing with, do you think it will be fine to have everything inside a single file, or should we chunk our logging?

Contributor:

Oh yes, good point. A single file won't be great, though this is just a text file and we'd just be dumping data into it.

I think we should also consider breaking it down by iterating one case type at a time. This way you can run multiple of these at once!

@zandre-eng (Contributor Author):

This is a great suggestion, and I think one that is easily achievable with how the script has been set up now. We can simply store the data for each case type in a separate DB table for processing.

I've updated the script to make CaseUpdater only handle a single case type now (559dfad).

@zandre-eng (Contributor Author):

The number and length of functions was starting to make it difficult to follow along with the script, so I've refactored everything into classes, taking into account some of the feedback thus far.

@zandre-eng (Contributor Author):

> It looks like we are doing two updates in a single file? It is currently not clear what the entry point is. How about using vertical formatting, or converting this to 2 management commands?

@mkangia I've split everything off into relevant classes to make keeping track of everything easier. The entry point would simply be calling the relevant class's start() function. Let me know if this refactoring helps clear things up.

@mkangia (Contributor) left a comment:

Hey @zandre-eng

Looking better. Considering the number of cases, should we do one case type at a time and run them in parallel?

with open(file_path, 'a') as log:
    for id in ids:
        log.write(f'{id} {msg_out}\n')
log.close()
Contributor:

This is going to be a lot of file operations. Can we consider keeping the file open with a with block and, if needed, using flush to write to the file? Though the writer, or csv.writer, might do that automatically for you.
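A sketch of that shape, holding one handle open across all chunks (chunked as elsewhere in the thread; the flush is only needed if progress must be durable mid-run):

```python
from dimagi.utils.chunked import chunked

with open(file_path, 'a') as log:
    for id_chunk in chunked(ids, 1000):
        log.writelines(f'{id_} {msg_out}\n' for id_ in id_chunk)
        log.flush()  # optional; the context manager flushes on exit anyway
```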

@zandre-eng (Contributor Author):

Noting that this is no longer relevant, since the script has moved away from logging to a file and now stores all the necessary data in a SQLite DB.

user.set_location(location)
log(f"User {user.username}:{user.user_id} location updated to {location.location_id}")
user.set_location(new_location)
user.unset_location_by_id(existing_location.location_id)
Contributor:

Oh, this was needed?

Contributor:

Yes, based on the discussion with Adam, it was decided to have only the rc location for the mobile worker.
(Side note: having only 1 location that owns cases also helps avoid location-ambiguity issues with Case Sharing.)

Contributor:

Okay. It would be good to not do this if existing_location is the same as new_location; not sure if that is possible or not.

@ajeety4 (Contributor), Jul 1, 2024:

new_location will always be different from existing_location for this script, because it is a child location of the existing location.



@task(queue='background_queue', ignore_result=True)
def process_updates_for_village_async(domain, village, dry_run):
Contributor:

Celery is a great approach. Just one thing to note: once things are in celery you won't have much visibility into, or control over, the updates, as you lose the logging, and once the tasks are queued up you don't know when they will get picked up.

Just good to consider that before you finalize the approach.

Contributor:

Noted. That is a valid concern.
I have added a new logger (file) for the script; however, I am not sure if it will work when tasks are run across multiple celery machines.
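For reference, the fan-out would presumably look something like this, where villages is whatever iterable the script derives; once queued, monitoring shifts to the celery workers' logs:

```python
# Queue one task per village so the updates run in parallel on celery workers
for village in villages:
    process_updates_for_village_async.delay(DOMAIN, village, dry_run)
```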

@zandre-eng zandre-eng marked this pull request as ready for review July 4, 2024 12:56
@zandre-eng (Contributor Author):

I will be merging in this PR so that we can have the celery task available in the production environment for when we need to execute this script.

custom/benin/tasks.py (thread outdated; resolved)
@ajeety4 (Contributor) left a comment:

Looks good and safe to me; since this is a management command, it does not affect any current functionality.
