Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Testing] Test that the migrated database works as expected #2789

Closed
szoupanos opened this issue Apr 22, 2019 · 4 comments · Fixed by #3090
Closed

[Testing] Test that the migrated database works as expected #2789

szoupanos opened this issue Apr 22, 2019 · 4 comments · Fixed by #3090
Assignees
Labels
Milestone

Comments

@szoupanos
Copy link
Contributor

Since a lot of information are migrated (attributes, extras and settings) it is good to get a project that has started in Django EAV and migrate it to Django JSONB and see if the calculations continue correctly and if the engine works as expected with the migration of the settings.

@szoupanos szoupanos added this to the v1.0.0 milestone Apr 22, 2019
@szoupanos szoupanos self-assigned this Apr 22, 2019
@szoupanos
Copy link
Contributor Author

@giovannipizzi @sphuber
I just checked the attributes of some nodes of Davide's database (that I migrated from the Django EAV to Django JSONB) and I see that some of the attributes have not been migrated. They should be ~500 from a group of 1400 elements.

So it needs some investigation of what went wrong.
I don't know where I can put the files for other people to have a look (maybe it is not the best idea to attach them to the ticket)

@szoupanos
Copy link
Contributor Author

The files were added to the common GDrive (MARVEL and AiiDA/AiiDA/AiiDA Django JSONB/migration problems)

The attributes and the extras of all the nodes belonging to a specific group were printed before and after the JSONB migration. So the problem is that several nodes after the migration have null value in their attributes and extras.

The database is the one that I got from Davide and can be found at theospc22 - I can give access to whoever wants to have a look (I will also look into this).

@sphuber
Copy link
Contributor

sphuber commented Jun 15, 2019

The problem with the migration was that the llazy fetcher that was retrieving nodes to migrate in batches was not sorting the query sets in any way. The result is that if your node count is larger than the batch size, which is currently a 1000, then you run the risk of fetching the same node twice. Since the code fetches exactly as many nodes as there are in the databsae, this runs the risk of some nodes never being fetched and migrated. This is where the nodes without attributes comes from. Sorting the fetch batches by id solves the problem. I have pushed the fix to the django_jsonb branch and have performed the migration after the fix. Now the migration seems to have worked.

Original database

aiida_jsonb_original=# SELECT count(*) FROM db_dbnode LEFT JOIN db_dbextra ON db_dbnode.id = db_dbextra.dbnode_id WHERE db_dbextra.id IS NULL;
  count  
---------
 5970100
(1 row)

aiida_jsonb_original=# SELECT count(*) FROM db_dbnode LEFT JOIN db_dbattribute ON db_dbnode.id = db_dbattribute.dbnode_id WHERE db_dbattribute.id IS NULL;
 count  
--------
 739472
(1 row)

Broken migration

aiida_jsonb=# select count(*) from db_dbnode where extras = '{}';
  count  
---------
 4846303
(1 row)

aiida_jsonb=# select count(*) from db_dbnode where extras is null;
  count  
---------
 3258942
(1 row)

aiida_jsonb=# select count(*) from db_dbnode where attributes = '{}';
 count  
--------
 507864
(1 row)

aiida_jsonb=# select count(*) from db_dbnode where attributes is null;
  count  
---------
 3258942
(1 row)

aiida_jsonb=# select count(*) from db_dbnode where attributes is null and node_type = 'data.structure.StructureData.';
 count  
--------
 636890
(1 row)

Fixed migration

aiida_jsonb_migrated=# select count(*) from db_dbnode where extras = '{}';
  count  
---------
 5970096
(1 row)

aiida_jsonb_migrated=# select count(*) from db_dbnode where extras is null;
 count
-------
     0
(1 row)

aiida_jsonb_migrated=# select count(*) from db_dbnode where attributes = '{}';
 count  
--------
 739472
(1 row)

aiida_jsonb_migrated=# select count(*) from db_dbnode where attributes is null;
 count
-------
     0
(1 row)

aiida_jsonb_migrated=# select count(*) from db_dbnode where attributes is null and node_type = 'data.structure.StructureData.';
 count
-------
     0
(1 row)

As you can see, the new numbers match perfectly with that of the original database.
The only difference is that there now 5,970,096 nodes with no extras, so 4 less than the original. The number of nodes seems to have stayed the same. So I guess there are 4 nodes that have gotten extras during the migration that they didn’t have before. This is due a migration that moves the hidden attribute from code nodes to extras. Proof:

aiida_jsonb_original=# select count(*) from db_dbattribute JOIN db_dbnode ON db_dbattribute.dbnode_id = db_dbnode.id WHERE db_dbnode.type = 'code.Code.' and db_dbattribute.key = 'hidden';
 count
-------
     4
(1 row)

@szoupanos
Copy link
Contributor Author

I also had a look at Seb's fix and indeed the sorting seems to fix the problem.
To reproduce the problem, there is a test called TestAttributesExtrasToJSONMigrationManyNodes which create an predefined number of nodes with different attributes and extras. By creating a sufficient number of nodes and reducing the batch size of the migration the problem can be demonstrated. In the same way, we can demonstrate that the fix works.

I also compared the database that resulted from his final migration and checked for nodes that failed to migrate correctly in my initial attempt. All the nodes that I checked (I checked a few) had attributes and extras and also had the expected values.

  • One concern could be the overhead of sorting on the id (for a migration of a big database). Maybe this is not a lot if we have the right indices.
  • Another comment would be why the lazy fetcher was failing when the result was not sorted - I didn't have time to investigate that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants