Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only write the modified tablets to file system. #304

Merged
merged 6 commits into from
Mar 9, 2018

Conversation

jetfuel
Copy link
Collaborator

@jetfuel jetfuel commented Mar 9, 2018

Move some functions definitions from headers to cc files.

The approach is to track what tag is modified in the storage during training. Only process the modified tablets at the PersistToDisk.

This should cover the first step in #284.
We should also look into if we can only call persistToDisk once at the end of training. Protobuf always serialize all data to the filesystem and that's a huge waste.

Move some functions definitions from headers to cc files.
@jetfuel jetfuel self-assigned this Mar 9, 2018
}

Record Tablet::AddRecord() {
parent()->MarkTabletModified(internal_encoded_tag_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MarkTableModified use a set and that takes an overhead if AddRecord is triggered frequently(e.g. in a scalar), and theoretically, it should be placed in WRITE_GUARD so that modifying a record will also mark the tablet "modified". But currently, this implementation is ok.

There is a trick that can be faster, assigning each tablet an internal member id_ (continuously ascending from zero), and make modified_tablet_set_ a vector, the marking logic like this

vector<bool> modified_tablet_set_; // each time add a tablet will append a true
Record Tablet::AddRecord() {
  parent()->MarkTabletModified(id_);
  // ...
}
inline void Storage::MarkTabletModified(int tagid) {
  modified_tablet_set_[tagid] = true;
}
void Storage::PersistToDisk(const std::string& dir) ... // clear states

@jetfuel jetfuel merged commit 8f9339e into PaddlePaddle:develop Mar 9, 2018
@jetfuel jetfuel deleted the onlyWriteModifiedTablets branch March 9, 2018 21:40
@jetfuel
Copy link
Collaborator Author

jetfuel commented Mar 10, 2018

I spent some time on converting to vector, but encounter some errors when adding records. Will put this on hold for a bit. :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants