Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Refactor Gluon parameter serialization format #18749

Closed
wants to merge 2 commits into from

Conversation

leezu
Copy link
Contributor

@leezu leezu commented Jul 17, 2020

Switch to npz serialization format https://numpy.org/devdocs/reference/generated/numpy.lib.format.html & preserve duplicate names of shared parameters (while storing them only once).

At this point in time, only Python bindings exist and no functionality is introduced to load the npz files in the backend (as it's not needed). Once the need arises, bindings can be introduced with ease via https://github.com/leezu/cnpy/tree/libzip

Fixes #18717 #18667

@leezu leezu requested a review from szha as a code owner July 17, 2020 23:26
@mxnet-bot
Copy link

Hey @leezu , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-gpu, clang, sanity, unix-cpu, windows-cpu, website, centos-cpu, windows-gpu, miscellaneous, unix-gpu, edge]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu
Copy link
Contributor Author

leezu commented Jul 17, 2020

@zheyuye please verify if this addresses #18717

params[names[0]] = param._reduce().asnumpy()
for name in names[1:]:
# Shared parameters are known under multiple names. We save the
# parameter according to it's first name and save the mapping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo nit: its

szha
szha previously requested changes Jul 18, 2020
Copy link
Member

@szha szha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test for backward compatibility. The deserialization test should not rely on mxnet serialization to avoid the situation when backward compatibility is broken in serialization and deserialization simultaneously.

@szha
Copy link
Member

szha commented Jul 18, 2020

Also, I'd like to see backend serialization/deserialization happen since it likely will be needed. If we merge this for python only first and there's any issue in adding that support later we won't be able to rollback.

@leezu
Copy link
Contributor Author

leezu commented Jul 18, 2020

There's no code in the backend that interacts with parameters. You can search for "MXNDArrayLoad" and you see that all code invoking this API has been removed. As the backend does not interact with the format, there is no point in adding APIs here, as their use is not yet defined.
The zipfile and numpy formats are trivial to work with, so there's no issue in adding the support at any time.

Please add test for backward compatibility

The old format is faulty and I don't see a strong reason to provide backwards compatibility throughout 2.x series. We may remove the C API for loading the old format in a later PR (for example when removing the ndarray operators). As of this PR, backwards compatibility provided as there is no change to C APIs and the Python code backs of to using the C API if the input is not of the new format. It's tested already via unittests and integration tests (via the model-zoo API). Unittest:

https://github.com/apache/incubator-mxnet/blob/a4ea4a8330251dd244947074cf4ddb875e611dd5/tests/python/unittest/test_ndarray.py#L1988-L2001

@zheyuye
Copy link
Contributor

zheyuye commented Jul 18, 2020

Yes, this PR do slove #18717. We might also need do refactor mx.npx.load to load 'npz' and return a parameters dict. What do you think?

@leezu
Copy link
Contributor Author

leezu commented Jul 18, 2020

I think we can delete mx.npx.load (or make it private to have a transition period). This PR marks it as deprecated. Do you see any problems?

@szha
Copy link
Member

szha commented Jul 19, 2020

their use is not yet defined

Frontend for JVM is planned in #17783. We will at least need to support inference with backend API which will require loading the parameters. MXNet will not be a Python-only framework and the design decision now on the serialization affects other frontends, so it must be taken with care.

@szha
Copy link
Member

szha commented Jul 20, 2020

Discussed offline and the conclusions are:

  • we will go forward with this PR to allow serialization of parameters without duplicates.
  • we will take into account serialization of symbol graphs in the same format later on, potentially under a different file name/extension. serialization of graph is optional and we need to support serializing parameters alone.
  • @leezu will post an RFC to summarize the serialization use cases in frontends and backend.

@szha szha dismissed their stale review July 20, 2020 19:22

see comment above

@leezu
Copy link
Contributor Author

leezu commented Jul 21, 2020

I'll close this and proceed with a PR of larger scope later.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants