Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up technical debt #20

Merged
merged 13 commits into from
Aug 8, 2022

Conversation

georgepisaltu
Copy link
Contributor

Fixes #14 #19

This PR cleans up various parts of the codebase which accumulated technical debt. Here are the main points:

  • Have proper error handling and propagation through subcommands (using thiserror).
  • Added --overwrite option to subcommands which didn't have it already and write files to the file system.
  • Added progress tracker util which logs progress for long running tasks. The exception is archive create where the function we are using from tar is a blackbox which takes file paths as input, so there was no way to integrate the progress tracker.
  • Added util for counting entries in an LMDB database. There is a PR open in lmdb to do this, but the project has been inactive for a long time so I don't think it will get merged soon.
  • Made DB-PATH parameter consistent across the codebase. It now always means the path of the directory with the storage.lmdb file.

Added @sacherjj for review of the CLI usage and overall UX.

This commit removes the old subcommand success reporting system where
subcommands would log errors and return a `bool`. Now all
subcommand errors are defined with `thiserror` and are all unified in
a big error enum at the `subcommand` mod level. This means errors
are properly propagated all the way to the top level and can be handled
in all lower levels as appropriate.

Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Rename the entry counting util from `entries_count` to `entry_count`
and fix various unnecessary mutable borrows

Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Comment on lines 18 to 19
while self.processed > (self.total_to_process * self.progress_factor as usize) / 20 {
log_progress(self.progress_factor * 5);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two magic numbers (20 and 5) are bound - it could be better to just derive one from the other, like:

steps = 20;
progress_multiplier = (100/steps)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

use lmdb_sys::{mdb_stat, MDB_stat};

/// Retrieves the number of entries in a database.
#[allow(unused)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, forgot to delete it. It's gone now.

assert_eq!(entry_count(&txn, db).unwrap(), 2);
txn.commit().unwrap();
};
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit, but can you extend the test to del the previously added element and call the entry_count(), just to prove how the system behaves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +6 to +7
pub use create::Error as CreateError;
pub use unpack::Error as UnpackError;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can get rid of this name aliasing, which can be confusing?
Did you consider just renaming multiple pub enum Error to specific versions per module, like pub enum ArchiveError, etc.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explicitly did it this way for 2 reasons:

  • Flexibility - you can use whatever name you need for the error you're importing (similar to the library style) and you can avoid having your error names to verbose (in order to avoid name conflicts)
  • Organizing - no matter what module you're working in, functions return Error and you construct Error variants - it is a method of organizing and separating the main error thrown by this module and other error types used.

As you can see, for me at least it's not confusing. I personally would like to keep it this way, but if people feel strongly about this, we can change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favour of the approach taken in this PR personally. I dislike the stuttering effect of having e.g. unpack::UnpackError.

I'm also ok with avoiding aliases if folks have an objection, but we generally end up with almost the same readability - e.g. Unpack(#[from] unpack::Error) instead of Unpack(#[from] UnpackError).

@@ -50,7 +50,7 @@ fn archive_create_roundtrip() {
let out_dir = tempfile::tempdir().unwrap();
let archive_path = dst_dir.path().join("test_archive.tar.zst");
// Create the compressed archive.
assert!(pack::create_archive(&src_dir, &archive_path).is_ok());
assert!(pack::create_archive(&src_dir, &archive_path, false).is_ok());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be valuable to have at least one test case where overwrite=true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a new test.

// Destination directory isn't empty.
let root_dst = tempfile::tempdir().unwrap();
let existing_file = NamedTempFile::new_in(&root_dst).unwrap();
assert!(pack::create_archive(&src_dir, existing_file.path(), false,).is_err());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert!(pack::create_archive(&src_dir, existing_file.path(), false,).is_err());
assert!(pack::create_archive(&src_dir, existing_file.path(), false).is_err());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +61 to +64
Err(Error::Destination(IoError::new(
ErrorKind::InvalidInput,
"not an empty directory",
)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test that proves this condition is detected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already is a test for this, archive_unpack_existing_destination. Its code comment was outdated, but now it should be fine.

let bytes_read = self.reader.read(buf)?;
if let Some(progress_tracker) = self.maybe_progress_tracker.as_mut() {
progress_tracker.advance(bytes_read, |completion| {
info!("Archive reading {}% complete...", completion)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
info!("Archive reading {}% complete...", completion)
info!("Decompression {}% complete...", completion)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but it's not really decompression but archive reading, though I couldn't find a better name for this. I'm open to other suggestions as well and if we find no better ones, I'll switch to your suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's both really isn't it? Maybe "Archive decompressing and reading"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with "Archive reading and decompressing" because it's happening in that order. Let me know if it's ok with you both.

let path = matches.value_of(DB_PATH).unwrap();
let failfast = !matches.is_present(NO_FAILFAST);
let specific = matches.value_of(SPECIFIC);
let start_at: usize = matches
.value_of(START_AT)
.unwrap()
.expect("should have a default")
.parse()
.expect("Value of \"--start-at\" must be an integer.");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.expect("Value of \"--start-at\" must be an integer.");
.unwrap_or_else(|_| panic!("Value of \"--{}\" must be an integer.", START_AT));

A nit, just to avoid hardcoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Copy link
Contributor

@Fraser999 Fraser999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking a lot cleaner overall - nice work!

src/common/lmdb_utils.rs Outdated Show resolved Hide resolved
Comment on lines +6 to +7
pub use create::Error as CreateError;
pub use unpack::Error as UnpackError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favour of the approach taken in this PR personally. I dislike the stuttering effect of having e.g. unpack::UnpackError.

I'm also ok with avoiding aliases if folks have an objection, but we generally end up with almost the same readability - e.g. Unpack(#[from] unpack::Error) instead of Unpack(#[from] UnpackError).

src/common/progress.rs Show resolved Hide resolved
src/common/progress.rs Outdated Show resolved Hide resolved
src/common/progress.rs Outdated Show resolved Hide resolved
src/subcommands/archive/unpack/download_stream.rs Outdated Show resolved Hide resolved
let bytes_read = self.reader.read(buf)?;
if let Some(progress_tracker) = self.maybe_progress_tracker.as_mut() {
progress_tracker.advance(bytes_read, |completion| {
info!("Archive reading {}% complete...", completion)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's both really isn't it? Maybe "Archive decompressing and reading"?

src/subcommands/archive/unpack/file_stream.rs Outdated Show resolved Hide resolved
src/subcommands/trie_compact.rs Outdated Show resolved Hide resolved
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
Signed-off-by: George Pisaltu <georgep@casperlabs.io>
This commit brings the following improvements to `ProgressTracker`:
- added documentation for the struct and its methods
- covered a zero initialization error case which would have caused an
  infinite loop
- removed the need for the custom `Drop` implementation
- moved the logging function to the constructor instead of the step
  function
- added warning logs for implementations around `ProgressTracker` when
  it can't be initialized

Signed-off-by: George Pisaltu <georgep@casperlabs.io>
@georgepisaltu georgepisaltu merged commit 4160609 into casper-network:master Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use this error throughout db-utils
4 participants