Refactor archive migrations #4532

Merged: 31 commits into aiidateam:develop on Nov 3, 2020

Conversation

@chrisjsewell (Member) commented Oct 29, 2020

This PR primarily refactors the archive migrations, to provide an ArchiveMigratorAbstract interface, which is agnostic to the internal implementation of the archive (i.e. not dependent on the presence of metadata.json and data.json).
This will allow for subsequent changes to the archive format.
To facilitate this:

  • MIGRATE_FUNCTIONS now includes both the from and to versions of each migration;
  • this allows for a change from the recursive migration approach to pre-computing the migration pathway and then applying the migrations iteratively (see the sketch after this list);
  • this also allows for progress reporting of the migration steps.
  • The signature of the migration step functions has been changed, such that they now only receive the uncompressed archive folder, and no longer the data.json and metadata.json dicts specifically.
  • Instead, the folder is wrapped in a new CacheFolder class, which caches file writes in memory, so each file is read from the file system only once and written back after all the migrations have finished.
  • The --verbose flag has been added to verdi export migrate; unfortunately only the long form can be used, since -v is already reserved for the version.
  • Consolidated the extraction of tar/zip archives into safe_extract_tar/safe_extract_zip.
    These accept progress callbacks, for which I created the convenience function aiida.common.progress_reporter::create_callback.
  • Moved all migration tests to pytest.
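
For illustration, keying MIGRATE_FUNCTIONS by version makes the pathway pre-computation a simple walk. A minimal sketch with placeholder names, not the actual aiida-core code:

from typing import Callable, Dict, List, Tuple

# Each entry maps a source version to (target version, migration function).
# The lambdas are placeholders; real steps mutate the extracted archive folder.
MIGRATE_FUNCTIONS: Dict[str, Tuple[str, Callable]] = {
    '0.4': ('0.5', lambda folder: None),
    '0.5': ('0.6', lambda folder: None),
    '0.6': ('0.7', lambda folder: None),
}

def compute_pathway(current: str, target: str) -> List[str]:
    """Pre-compute the ordered version pathway, e.g. ['0.4', '0.5', '0.6', '0.7']."""
    pathway = [current]
    while pathway[-1] != target:
        if pathway[-1] not in MIGRATE_FUNCTIONS:
            raise ValueError(f'no migration pathway from version {pathway[-1]}')
        pathway.append(MIGRATE_FUNCTIONS[pathway[-1]][0])
    return pathway

# The migrator can then iterate over the pre-computed steps, updating a
# progress reporter before applying each one:
# for version in compute_pathway('0.4', '0.7')[:-1]:
#     MIGRATE_FUNCTIONS[version][1](folder)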

This is the output of an example migration:

$ verdi export migrate tmp/mount_folder/tests/static/export/migrate/export_v0.4_simple.aiida --verbosity DEBUG -v 0.9 -f -F tar.gz tmp/mount_folder/tests/static/export/migrate/test.tar.gz 
Migration pathway: 0.4 -> 0.5 -> 0.6 -> 0.7 -> 0.8 -> 0.9
Extracting archive to temporary folder
Performing migrations: 0.8 -> 0.9        100.0%|█████████████████████████████████████████████████████| 5/5
Re-compressing archive as 'tar.gz'
Moving archive to: tmp/mount_folder/tests/static/export/migrate/test.tar.gz
Success: migrated the archive to version 0.9

@chrisjsewell chrisjsewell requested a review from ltalirz October 29, 2020 16:11
@chrisjsewell (Member Author):

@ltalirz I have not yet updated the tests, but you can start looking at this 😄

@ltalirz (Member) commented Oct 29, 2020

  • -v is already reserved for the version.

Hm yeah, I think this should be deprecated.

@ltalirz (Member) commented Oct 29, 2020

Looks very nice!

@chrisjsewell (Member Author) commented Oct 29, 2020

Oh and I also need to update cmd_import, to improve the migration integration (e.g. to use the correct verbosity)

@ltalirz (Member) commented Oct 29, 2020

Oh and I also need to update cmd_import

By the way, cmd_import can probably be shortened a lot; there is a lot of mostly unnecessary duplication between archives coming from URLs and those coming from files.

@ltalirz (Member) commented Oct 29, 2020

Came across this #3156 - just in case it's easy to fix

@ltalirz (Member) commented Oct 29, 2020

and this #3193

@ltalirz (Member) commented Oct 29, 2020

And curious to see how migration tests will be affected in #3678

@chrisjsewell (Member Author):

And curious to see how migration tests will be affected in #3678

Some of them should definitely be quicker: get_json_files was extracting the whole archive just to retrieve data.json and metadata.json, so I've replaced it with the new read_file_in_zip and read_file_in_tar functions (sketched below).
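
For illustration, reading a single member without unpacking the whole archive is cheap with the standard library; the helpers are along these lines (a sketch — the exact signatures in aiida-core may differ):

import tarfile
import zipfile

def read_file_in_zip(archive_path: str, name: str) -> str:
    """Read one member of a zip archive without extracting the rest."""
    with zipfile.ZipFile(archive_path, 'r') as handle:
        return handle.read(name).decode('utf8')

def read_file_in_tar(archive_path: str, name: str) -> str:
    """Read one member of a (possibly compressed) tar archive."""
    with tarfile.open(archive_path, 'r:*') as handle:
        fileobj = handle.extractfile(name)
        if fileobj is None:
            raise IOError(f'{name} is not a regular file in the archive')
        return fileobj.read().decode('utf8')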

@chrisjsewell (Member Author):

Ok @ltalirz, I have updated all the tests!

I moved all the migration tests to pytest, and additionally I have consolidated the extraction of tar/zip archives into safe_extract_tar/safe_extract_zip.
These accept progress callbacks, for which I created the convenience function aiida.common.progress_reporter::create_callback (a sketch of the pattern follows the example output below).
This means that the CLI now includes a progress bar for the extraction:

$ verdi export migrate tmp/mount_folder/tests/static/export/migrate/export_v0.4_simple.aiida --verbosity DEBUG -v 0.9 -f -F tar.gz tmp/mount_folder/tests/static/export/migrate/test.tar.gz 
Migration pathway: 0.4 -> 0.5 -> 0.6 -> 0.7 -> 0.8 -> 0.9
Extracting archive to temporary folder
Extracting zip files                     100.0%|██████████████████████| 76/76
Performing migrations: 0.8 -> 0.9        100.0%|████████████████████████| 5/5
Re-compressing archive as 'tar.gz'
Moving archive to: tmp/mount_folder/tests/static/export/migrate/test.tar.gz
Success: migrated the archive to version 0.9
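
For illustration, the consolidated extraction helpers follow a simple callback pattern. A minimal sketch, assuming "safe" means refusing members that would escape the destination folder — this is not the actual aiida-core implementation:

import tarfile
from pathlib import Path
from typing import Any, Callable

def safe_extract_tar(
    archive: Path,
    destination: Path,
    callback: Callable[[str, Any], None] = lambda action, value: None,
) -> None:
    """Extract a tar archive, reporting progress through the callback."""
    destination = destination.resolve()
    with tarfile.open(archive, 'r:*') as handle:
        members = handle.getmembers()
        callback('init', {'total': len(members), 'description': 'Extracting tar files'})
        for member in members:
            # guard against path traversal, e.g. members named '../../evil'
            target = (destination / member.name).resolve()
            if target != destination and destination not in target.parents:
                raise ValueError(f'unsafe member path: {member.name}')
            handle.extract(member, destination)
            callback('update', 1)

create_callback then simply adapts a progress reporter into this callback(action, value) form, so the same helpers can drive the CLI progress bar or run silently.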

In terms of test timings, here are the top ones. The long-running ones all belong to test_links.py, which I have not touched (and don't intend to in this PR 😬).

9.72s call     tests/tools/importexport/orm/test_links.py::TestLinks::test_high_level_workflow_links
5.32s call     tests/tools/importexport/orm/test_links.py::TestLinks::test_complex_workflow_graph_export_sets
3.78s call     tests/tools/importexport/orm/test_links.py::TestLinks::test_link_flags
0.90s call     tests/tools/importexport/test_complex.py::TestComplex::test_reexport
0.73s call     tests/tools/importexport/migration/test_migration.py::TestExportFileMigration::test_migrate_to_newest[0.2]
0.66s call     tests/tools/importexport/orm/test_calculations.py::TestCalculations::test_calcfunction
0.66s call     tests/tools/importexport/migration/test_migration.py::TestExportFileMigration::test_migrate_to_newest[0.6]
0.64s call     tests/tools/importexport/migration/test_migration.py::TestExportFileMigration::test_migrate_to_newest[0.4]
0.63s call     tests/tools/importexport/orm/test_users.py::TestUsers::test_non_default_user_nodes
0.62s call     tests/tools/importexport/orm/test_links.py::TestLinks::test_complex_workflow_graph_links
0.62s call     tests/tools/importexport/migration/test_migration.py::TestExportFileMigration::test_migrate_to_newest[0.7]
0.62s call     tests/tools/importexport/orm/test_comments.py::TestComments::test_reimport_of_comments_for_single_node
0.61s call     tests/tools/importexport/migration/test_migration.py::TestExportFileMigration::test_migrate_to_newest[0.5]
0.60s call     tests/tools/importexport/migration/test_migration.py::TestExportFileMigration::test_migrate_to_newest[0.8]
0.59s call     tests/tools/importexport/orm/test_extras.py::TestExtras::test_extras_import_mode_correct
0.59s call     tests/tools/importexport/test_prov_redesign.py::TestProvenanceRedesign::test_node_process_type
0.56s call     tests/tools/importexport/test_complex.py::TestComplex::test_complex_graph_import_export
0.55s call     tests/tools/importexport/orm/test_logs.py::TestLogs::test_reimport_of_logs_for_single_node
0.52s call     tests/tools/importexport/orm/test_computers.py::TestComputer::test_different_computer_same_name_import

@chrisjsewell chrisjsewell marked this pull request as ready for review October 30, 2020 01:15
@ltalirz (Member) left a comment


thanks @chrisjsewell , looks great!

I started going through this; will continue tomorrow at some point

Review threads (resolved) on: aiida/cmdline/commands/cmd_export.py, aiida/common/progress_reporter.py, aiida/tools/importexport/archive/common.py, aiida/tools/importexport/archive/migrations/__init__.py, aiida/tools/importexport/archive/migrations/v01_to_v02.py
@chrisjsewell chrisjsewell requested a review from ltalirz October 30, 2020 03:03
@codecov bot commented Oct 30, 2020

Codecov Report

Merging #4532 into develop will increase coverage by 0.12%.
The diff coverage is 86.00%.


@@             Coverage Diff             @@
##           develop    #4532      +/-   ##
===========================================
+ Coverage    79.40%   79.51%   +0.12%     
===========================================
  Files          480      482       +2     
  Lines        35087    35333     +246     
===========================================
+ Hits         27856    28092     +236     
- Misses        7231     7241      +10     
Flag        Coverage Δ
django      73.67% <86.00%> (+0.15%) ⬆️
sqlalchemy  72.85% <85.80%> (+0.16%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...ida/tools/importexport/archive/migrations/utils.py 86.21% <ø> (ø)
...ools/importexport/archive/migrations/v01_to_v02.py 75.00% <73.92%> (ø)
aiida/tools/importexport/archive/common.py 79.56% <77.22%> (-16.73%) ⬇️
aiida/cmdline/commands/cmd_import.py 82.86% <80.33%> (+3.33%) ⬆️
aiida/cmdline/commands/cmd_export.py 94.02% <89.29%> (+2.81%) ⬆️
aiida/tools/importexport/archive/migrators.py 90.84% <90.84%> (ø)
...ools/importexport/archive/migrations/v03_to_v04.py 91.16% <93.34%> (ø)
aiida/common/progress_reporter.py 92.46% <100.00%> (+1.35%) ⬆️
aiida/tools/importexport/archive/__init__.py 100.00% <100.00%> (ø)
.../tools/importexport/archive/migrations/__init__.py 100.00% <100.00%> (ø)
... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9ff07c1...982500a.

@ltalirz (Member) left a comment


thanks @chrisjsewell

I went through the code except for the individual tests.

As discussed, we may want to think a bit more about the CacheFolder approach.

Review threads (resolved) on: aiida/cmdline/commands/cmd_import.py, aiida/tools/importexport/archive/common.py, aiida/tools/importexport/archive/migrators.py


@pytest.fixture()
def core_archive():
@ltalirz (Member):

what is the core archive?

@ltalirz (Member):

and in general, do we want to define constants as fixtures as well?
it's true that then you don't need to import them, but perhaps importing constants actually makes it easier to understand.

@chrisjsewell (Member Author):

The 'core' archives are ones that point to files in the aiida-core repository, as opposed to the 'external' ones, which point to the aiida-export-migration-tests package.
Although I did point out that having the archives in a separate repo may not really be necessary anymore: aiidateam/aiida-export-migration-tests#13

@chrisjsewell (Member Author):

It was just the easiest way really, to minimise the changes required during the conversion to pytest. I don't think it's too bad if they are tightly scoped (i.e. the conftest only applies to this single folder),
plus it's a lot easier now that VS Code's IntelliSense recognises pytest fixtures, so they behave exactly the same as if you had imported them 😄

@ltalirz (Member) commented Oct 31, 2020:

Although I did point out that having the archives in a separate repo may not really be necessary anymore: aiidateam/aiida-export-migration-tests#13

Right, you are very welcome to move them back to the repo (not sure when someone will next touch these tests).

By the way, there is also git-lfs for storing larger binary files, so it wouldn't even clog up the repository size; I think git now automatically suggests it when you try to store larger files.
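
For reference, tracking the archive files with git-lfs would be something like this (the "*.aiida" pattern is an assumption based on the test-archive file names):

$ git lfs install
$ git lfs track "*.aiida"
$ git add .gitattributes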

@chrisjsewell (Member Author):


Interesting, thanks, I will check that out.

Review thread (resolved) on: tests/tools/importexport/migration/test_migration.py
allow for only a single copy of a dict to exist in memory at any time
@chrisjsewell (Member Author) commented Oct 31, 2020

@ltalirz in b44e350, I made the alterations to CacheFolder to allow for only one copy of a dict to be in memory at any time. The catch is that any mutations of such a dict will obviously affect the cache, so hopefully I have made that clear in CacheFolder.load_json. Roughly, the idea is as follows:
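
A rough sketch of the single-copy idea (illustrative only; the real CacheFolder also caps the number of cached items via _max_items):

import json
from pathlib import Path
from typing import Dict

class CacheFolder:
    """In-memory write cache over a folder of JSON files."""

    def __init__(self, path: Path):
        self._path = path
        self._cache: Dict[str, dict] = {}

    def load_json(self, name: str) -> dict:
        # The returned dict IS the cached instance, so any mutation by the
        # caller mutates the cache (the caveat mentioned above).
        if name not in self._cache:
            self._cache[name] = json.loads((self._path / name).read_text(encoding='utf8'))
        return self._cache[name]

    def write_json(self, name: str, data: dict) -> None:
        # Defer the disk write; the file is only written out on flush().
        self._cache[name] = data

    def flush(self) -> None:
        # One write per file, after all migrations have run.
        for name, data in self._cache.items():
            (self._path / name).write_text(json.dumps(data), encoding='utf8')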

Unscientific test of caching performance, with test duration for: tests/tools/importexport/migration/test_migration.py::TestExportFileMigration::test_migrate_to_newest[0.2]

CacheFolder._max_items = 100

0.64s, 0.65s, 0.63s, 0.67s, 0.66s

CacheFolder._max_items = 0 (i.e. always read/write)

0.65s, 0.70s, 0.98s, 0.68s, 0.77s

But obviously it's a bit hard to tell definitively from such a small archive.

what do you think?

@chrisjsewell (Member Author) commented Nov 2, 2020

@ltalirz, ok I've moved the compression code to the common module and added a progress bar.

Run-through below with both zip and tar; speeds seem as expected.
The longest part is the re-compression (noticeably longer for tar).
As such, the last thing I am going to do is figure out the best way in cmd_import to avoid (needlessly) compressing after the migration, allowing the import to read directly from the migrated temp folder.

(base) aiida@162adc8d0bf8:/$ verdi export migrate tmp/mount_folder/two_dimensional_database.tar.gz tmp/mount_folder/output_08.zip --verbosity DEBUG --version 0.8
Reading archive version
Migration pathway: 0.3 -> 0.4 -> 0.5 -> 0.6 -> 0.7 -> 0.8
Extracting archive to temporary folder
Extracting tar files                     100.0%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 618360/618360
Performing migrations: 0.7 -> 0.8        100.0%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5
Flushing cache
Re-compressing archive as 'zip'
Compressing objects as zip               100.0%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 614731/614731
Moving archive to: tmp/mount_folder/output_08.zip
Cleaning temporary folder
Success: migrated the archive to version 0.8
(base) aiida@162adc8d0bf8:/$ verdi export migrate tmp/mount_folder/output_08.zip tmp/mount_folder/output_09.tar.gz --verbosity DEBUG --version 0.9 -F tar.gz
Reading archive version
Migration pathway: 0.8 -> 0.9
Extracting archive to temporary folder
Extracting zip files                     100.0%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 614731/614731
Performing migrations: 0.8 -> 0.9        100.0%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Flushing cache
Re-compressing archive as 'tar.gz'
Compressing objects as tar               100.0%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 614731/614731
Cleaning temporary folder
Success: migrated the archive to version 0.9

@chrisjsewell (Member Author):

Ok done. So there is no longer a compression/extraction step during a verdi import auto-migration (it reads directly from the migrated folder in the temporary directory):

$ verdi import -w https://archive.materialscloud.org/record/2018.0001/v1
Info: retrieving archive URLS from https://archive.materialscloud.org/record/2018.0001/v1
Success: 2 archive URLs discovered and added
Info: downloading archive https://archive.materialscloud.org/record/file?record_id=20&file_id=c8c80235-cf81-4660-b0cf-3ca5824c3725&filename=SSSP_efficiency_pseudos.aiida
Success: archive downloaded, proceeding with import
Info: incompatible version detected for https://archive.materialscloud.org/record/file?record_id=20&file_id=c8c80235-cf81-4660-b0cf-3ca5824c3725&filename=SSSP_efficiency_pseudos.aiida, trying migration
Reading archive version
Migration pathway: 0.3 -> 0.4 -> 0.5 -> 0.6 -> 0.7 -> 0.8 -> 0.9
Extracting archive to work directory
Info: proceeding with import of migrated archive                                                                                                                                            

IMPORT
--------  ---------
Archive   extracted

Parameters
--------------------------  ------
Comment rules               newest
New Node Extras rules       import
Existing Node Extras rules  kcl
                                                                                                                                                                                            
Summary
-----------------------  ---------------
Auto-import Group label  20201102-132157
User(s)                  1 existing
Node(s)                  85 existing
Group(s)                 1 existing

Success: imported archive https://archive.materialscloud.org/record/file?record_id=20&file_id=c8c80235-cf81-4660-b0cf-3ca5824c3725&filename=SSSP_efficiency_pseudos.aiida
Info: downloading archive https://archive.materialscloud.org/record/file?record_id=20&file_id=bb240983-f808-4a0c-aba9-ce4a728820ac&filename=SSSP_accuracy_pseudos.aiida
Success: archive downloaded, proceeding with import
Info: incompatible version detected for https://archive.materialscloud.org/record/file?record_id=20&file_id=bb240983-f808-4a0c-aba9-ce4a728820ac&filename=SSSP_accuracy_pseudos.aiida, trying migration
Reading archive version
Migration pathway: 0.3 -> 0.4 -> 0.5 -> 0.6 -> 0.7 -> 0.8 -> 0.9
Extracting archive to work directory
Info: proceeding with import of migrated archive                                                                                                                                            

IMPORT
--------  ---------
Archive   extracted

Parameters
--------------------------  ------
Comment rules               newest
New Node Extras rules       import
Existing Node Extras rules  kcl
                                                                                                                                                                                            
Summary
-----------------------  ---------------
Auto-import Group label  20201102-132200
User(s)                  1 existing
Node(s)                  85 existing
Group(s)                 1 existing

Success: imported archive https://archive.materialscloud.org/record/file?record_id=20&file_id=bb240983-f808-4a0c-aba9-ce4a728820ac&filename=SSSP_accuracy_pseudos.aiida

@chrisjsewell (Member Author):

@ltalirz ready for your review 👍

@ramirezfranciscof ramirezfranciscof added this to the v1.5.0 milestone Nov 2, 2020
@ltalirz (Member) left a comment


thanks @chrisjsewell - just one request: could you check whether there currently are tar files in the test set?
If not, could you please add one (e.g. use one of the early zipped archives and tar it), so that we are safe from accidentally breaking tar in the future.

I'll be checking the timings & memory consumption in the meantime.

Review threads (resolved) on: aiida/cmdline/commands/cmd_import.py, aiida/tools/importexport/archive/common.py
@ltalirz (Member) commented Nov 2, 2020

@chrisjsewell In my test, the run in this PR takes quite a bit more memory than both develop and v1.4.2. Can we get this down?

It also runs slower than develop, but if I understand correctly, we accidentally introduced a bug into the migration (is that correct? if yes, can we add a test that would have flagged this?).

This PR (5d62743): 16m18s; top memory consumption ~2200 MB

$ time verdi export migrate two_dimensional_database.aiida 2d_migrated_develop.aiida
Reading archive version
Migration pathway: 0.3 -> 0.4 -> 0.5 -> 0.6 -> 0.7 -> 0.8 -> 0.9
Extracting archive to work directory
Re-compressing archive as 'zip'
Moving archive to: 2d_migrated_develop.aiida
Success: migrated the archive to version 0.9

real    16m18.294s
user    12m1.563s
sys     1m43.576s

[plot: memory usage over time]

develop: 12m28s; top memory consumption ~1400MB

$ time verdi export migrate two_dimensional_database.aiida 2d_migrated_develop.aiida
Success: migrated the archive from version 0.3 to 0.9

real    12m28.035s
user    8m39.272s
sys     1m27.630s

[plot: memory usage over time]

v1.4.2: 19m30s; top memory consumption ~1400MB

$ time verdi export migrate two_dimensional_database.aiida 2d_migrated_v1.4.2.aiida
Success: migrated the archive from version 0.3 to 0.9

real    19m30.655s
user    10m21.675s
sys     1m48.365s

[plot: memory usage over time]

chrisjsewell and others added 2 commits November 3, 2020 01:43
Co-authored-by: Leopold Talirz <leopold.talirz@gmail.com>
@chrisjsewell (Member Author) commented Nov 3, 2020

Can we get this down?

Firstly, I looked at procpath and it seems way over-complicated 😬.
See below: I've used https://github.com/jeetsukumaran/Syrupy, which is a lot simpler and easier to customise.

In 4dd7fa8 I have reduced the memory usage to the same as develop (and it also looks to have reduced the time a bit).

This is the run when only making the _write_object modification (you see the same memory signature as your plots, but with the large peak removed).

$ cd tmp
$ syrupy.py -i 1 --separator=, --no-align verdi export migrate mount_folder/two_dimensional_database.tar.gz mount_folder/output_09.zip
SYRUPY: Writing process resource usage samples to 'syrupy_20201103014840.ps.log'
SYRUPY: Writing raw process resource usage logs to 'syrupy_20201103014840.ps.raw'
SYRUPY: Executing command 'verdi export migrate mount_folder/two_dimensional_database.tar.gz mount_folder/output_09.zip'
SYRUPY: Redirecting command output stream to 'syrupy_20201103014840.out.log'
SYRUPY: Redirecting command error stream to 'syrupy_20201103014840.err.log'
SYRUPY: Completed running: verdi export migrate mount_folder/two_dimensional_database.tar.gz mount_folder/output_09.zip
SYRUPY: Started at 2020-11-03 01:48:40.621727
SYRUPY: Ended at 2020-11-03 02:12:45.366769
SYRUPY: Total run time: 0 hour(s), 24 minute(s), 04.745042 second(s)
$ python -c 'import pandas as pd; ax = pd.read_csv("syrupy_20201103014840.ps.log").set_index("ELAPSED").plot(y="RSS", grid=True); ax.get_figure().savefig("mount_folder/output.png")'

[plot: RSS memory usage over elapsed time]

This is the run when also removing the full json.dumps check in write_json:

$ syrupy.py -i 1 --separator=, --no-align verdi export migrate mount_folder/two_dimensional_database.tar.gz mount_folder/output_09.zip
SYRUPY: Writing process resource usage samples to 'syrupy_20201103023904.ps.log'
SYRUPY: Writing raw process resource usage logs to 'syrupy_20201103023904.ps.raw'
SYRUPY: Executing command 'verdi export migrate mount_folder/two_dimensional_database.tar.gz mount_folder/output_09.zip'
SYRUPY: Redirecting command output stream to 'syrupy_20201103023904.out.log'
SYRUPY: Redirecting command error stream to 'syrupy_20201103023904.err.log'
SYRUPY: Completed running: verdi export migrate mount_folder/two_dimensional_database.tar.gz mount_folder/output_09.zip
SYRUPY: Started at 2020-11-03 02:39:04.003449
SYRUPY: Ended at 2020-11-03 03:01:28.654327
SYRUPY: Total run time: 0 hour(s), 22 minute(s), 24.650878 second(s)
$ python -c 'import pandas as pd; ax = pd.read_csv("syrupy_20201103023904.ps.log").set_index("ELAPSED").plot(y="RSS", grid=True); ax.get_figure().savefig("mount_folder/output.png")'

[plot: RSS memory usage over elapsed time]

@chrisjsewell (Member Author) commented Nov 3, 2020

It also runs slower than develop, but if I understand correctly, we accidentally introduced a bug into the migration (is that correct? if yes, can we add a test that would have flagged this?).

No, the bug was in this PR, in the tar unpacking. You would not be able to catch it with a normal test, since it was just writing the same files multiple times without actually causing an error (just overwriting them).

Obviously this could be picked up by a benchmark timing test.
Currently there is only one for export and import (of a dynamically created provenance graph), but not for migration.
It's a bit unclear what a benchmark test for migration should cover though; if you benchmark a migration from v0.1 to the newest version, it will inherently become slower as new versions are added.
You could run it against a migration of the last five versions or something, but I'm not sure that is particularly helpful.

Generally though, I think as long as the CPU/memory performance is equal to v1.4.2, that's fine for now, since the real performance enhancements will come with the new archive format.

@chrisjsewell (Member Author):

could you check whether there currently are tar files in the test set?
If not, could you please add one (e.g. use one of the early zipped archives and tar it), so that we are safe from accidentally breaking tar in the future.

Added in 348242f

@chrisjsewell chrisjsewell requested a review from ltalirz November 3, 2020 04:24
@ltalirz (Member) commented Nov 3, 2020

In 4dd7fa8 I have reduced the memory usage to the same as develop

Thanks for fixing!

This is the run when also removing the full json.dumps check in write_json

Ah, I see... I was wondering where this weird oscillation came from; so it's throwing the JSON string away but only after it reaches some buffer size?

No the bug was in this PR, for unpacking tar.

Ok.

@ltalirz (Member) left a comment


thanks @chrisjsewell , this is good to go from my side

@chrisjsewell (Member Author):

Thanks!

so it's throwing the JSON string away but only after it reaches some buffer size?

Yeah, I guess something like that 🤷‍♂️

@chrisjsewell chrisjsewell merged commit 8326050 into aiidateam:develop Nov 3, 2020
@chrisjsewell chrisjsewell deleted the archive/migrate-refactor branch November 3, 2020 09:31
@chrisjsewell chrisjsewell mentioned this pull request Nov 18, 2020