Clean up technical debt #20

georgepisaltu · 2022-07-26T09:07:52Z

This PR cleans up various parts of the codebase which accumulated technical debt. Here are the main points:

Added proper error handling and propagation through subcommands (using thiserror).
Added an --overwrite option to the subcommands that write files to the file system and didn't already have one.
Added a progress tracker util which logs progress for long-running tasks. The exception is archive create, where the function we use from tar is a black box that takes file paths as input, so there was no way to integrate the progress tracker.
Added a util for counting entries in an LMDB database. There is a PR open in lmdb to do this, but the project has been inactive for a long time, so I don't think it will get merged soon.
Made the DB-PATH parameter consistent across the codebase. It now always means the path of the directory containing the storage.lmdb file.

Added @sacherjj for review of the CLI usage and overall UX.

This commit removes the old subcommand success reporting system where subcommands would log errors and return a `bool`. Now all subcommand errors are defined with `thiserror` and are all unified in a big error enum at the `subcommand` mod level. This means errors are properly propagated all the way to the top level and can be handled in all lower levels as appropriate.

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Rename the entry counting util from `entries_count` to `entry_count` and fix various unnecessary mutable borrows

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

rafal-ch · 2022-07-28T14:34:24Z

src/common/progress.rs

+        while self.processed > (self.total_to_process * self.progress_factor as usize) / 20 {
+            log_progress(self.progress_factor * 5);


These two magic numbers (20 and 5) are tied together, so it would be better to derive one from the other, like:

steps = 20; progress_multiplier = (100/steps)
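A minimal sketch of that suggestion, not the code that was merged. The names `STEPS`, `PERCENT_PER_STEP` and `reported_percent` are made up for illustration; only the values 20 and 5 come from the hunk above.

```rust
/// Number of progress reports per task; 20 mirrors the hunk above.
const STEPS: usize = 20;
/// Derived, so changing `STEPS` cannot leave the percentage out of sync.
const PERCENT_PER_STEP: usize = 100 / STEPS;
// Compile-time guard: `STEPS` must divide 100 evenly.
const _: () = assert!(100 % STEPS == 0);

/// Hypothetical helper: the percentage that would have been logged so far,
/// rounded down to a whole step. Returns `None` for an empty task.
fn reported_percent(processed: usize, total: usize) -> Option<usize> {
    if total == 0 {
        return None;
    }
    let completed_steps = (processed * STEPS / total).min(STEPS);
    Some(completed_steps * PERCENT_PER_STEP)
}

fn main() {
    for processed in [10, 55, 199, 200] {
        println!("{processed}/200 -> {:?}", reported_percent(processed, 200));
    }
}
```

With a single source of truth, switching to 10 or 50 reports only means changing `STEPS`.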

rafal-ch · 2022-07-28T14:37:15Z

src/common/lmdb_utils.rs

+use lmdb_sys::{mdb_stat, MDB_stat};
+
+/// Retrieves the number of entries in a database.
+#[allow(unused)]


Should not be needed.

You're right, forgot to delete it. It's gone now.

rafal-ch · 2022-07-28T14:40:45Z

src/common/lmdb_utils.rs

+            assert_eq!(entry_count(&txn, db).unwrap(), 2);
+            txn.commit().unwrap();
+        };
+    }


A nit, but can you extend the test to delete the previously added element and then call entry_count() again, just to prove how the system behaves?

rafal-ch · 2022-07-28T14:48:32Z

src/subcommands/archive.rs

+pub use create::Error as CreateError;
+pub use unpack::Error as UnpackError;


I wonder if we can get rid of this name aliasing, which can be confusing.
Did you consider renaming the multiple pub enum Error definitions to module-specific names, like pub enum ArchiveError, etc.?

I explicitly did it this way for 2 reasons:

Flexibility - you can use whatever name you need for the error you're importing (similar to the library style), and you can avoid making your error names too verbose (in order to avoid name conflicts).

Organizing - no matter what module you're working in, functions return Error and you construct Error variants. It is a way of organizing and separating the main error thrown by a module from the other error types used.

As you can see, for me at least it's not confusing. I personally would like to keep it this way, but if people feel strongly about this, we can change it.

I'm in favour of the approach taken in this PR personally. I dislike the stuttering effect of having e.g. unpack::UnpackError.

I'm also ok with avoiding aliases if folks have an objection, but we generally end up with almost the same readability - e.g. Unpack(#[from] unpack::Error) instead of Unpack(#[from] UnpackError).
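For readers unfamiliar with the pattern being debated, here is a self-contained sketch using only the standard library. The real code derives these impls with thiserror, whose `#[from]` attribute generates the `From` conversions written by hand below. The `run` functions, `run_all` and the `EmptyArchive` variant are hypothetical and exist only for this illustration.

```rust
use std::fmt;

mod create {
    use std::{fmt, io};

    /// Inside the module the main error type is just `Error`.
    #[derive(Debug)]
    pub enum Error {
        Destination(io::Error),
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::Destination(err) => write!(f, "invalid destination: {err}"),
            }
        }
    }

    /// Hypothetical stand-in for the subcommand entry point.
    pub fn run(destination_exists: bool) -> Result<(), Error> {
        if destination_exists {
            let io_err = io::Error::new(io::ErrorKind::AlreadyExists, "file exists");
            return Err(Error::Destination(io_err));
        }
        Ok(())
    }
}

mod unpack {
    use std::fmt;

    #[derive(Debug)]
    pub enum Error {
        /// Hypothetical variant, used only for this illustration.
        EmptyArchive,
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::EmptyArchive => write!(f, "archive is empty"),
            }
        }
    }

    pub fn run(archive_len: usize) -> Result<(), Error> {
        if archive_len == 0 {
            return Err(Error::EmptyArchive);
        }
        Ok(())
    }
}

// The aliases exist only at the import site, so the parent can tell the two
// `Error` types apart without either module repeating its own name.
use create::Error as CreateError;
use unpack::Error as UnpackError;

/// Parent-level error unifying the subcommand errors.
#[derive(Debug)]
enum Error {
    Create(CreateError),
    Unpack(UnpackError),
}

impl From<CreateError> for Error {
    fn from(err: CreateError) -> Self {
        Error::Create(err)
    }
}

impl From<UnpackError> for Error {
    fn from(err: UnpackError) -> Self {
        Error::Unpack(err)
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Create(err) => write!(f, "create failed: {err}"),
            Error::Unpack(err) => write!(f, "unpack failed: {err}"),
        }
    }
}

/// `?` converts each module's `Error` into the parent `Error` via `From`.
fn run_all(destination_exists: bool, archive_len: usize) -> Result<(), Error> {
    create::run(destination_exists)?;
    unpack::run(archive_len)?;
    Ok(())
}

fn main() {
    if let Err(err) = run_all(true, 1) {
        println!("error: {err}");
    }
}
```

Dropping the aliases would only change the variant declarations to `Create(create::Error)` and `Unpack(unpack::Error)`, which is the near-identical readability mentioned above.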

rafal-ch · 2022-07-28T15:09:46Z

src/subcommands/archive/create/tests.rs

@@ -50,7 +50,7 @@ fn archive_create_roundtrip() {
    let out_dir = tempfile::tempdir().unwrap();
    let archive_path = dst_dir.path().join("test_archive.tar.zst");
    // Create the compressed archive.
-    assert!(pack::create_archive(&src_dir, &archive_path).is_ok());
+    assert!(pack::create_archive(&src_dir, &archive_path, false).is_ok());


It could be valuable to have at least one test case where overwrite=true.

Added a new test.
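As a rough illustration of the behaviour such a test pins down, the sketch below guards a plain file write with an overwrite flag. `write_output` is a hypothetical helper, not `pack::create_archive`, and the real implementation may check for the existing file differently.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` to `path`, refusing to replace an
/// existing file unless `overwrite` is set.
fn write_output(path: &Path, data: &[u8], overwrite: bool) -> io::Result<()> {
    let mut options = OpenOptions::new();
    options.write(true);
    if overwrite {
        // Replace whatever is already there.
        options.create(true).truncate(true);
    } else {
        // `create_new` fails with `AlreadyExists` if the file is present.
        options.create_new(true);
    }
    options.open(path)?.write_all(data)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("overwrite_sketch_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let path = dir.join("out.bin");

    write_output(&path, b"first", true)?;
    // Without the flag the existing file is left untouched.
    let refused = write_output(&path, b"second", false);
    println!("second write without overwrite: {refused:?}");
    // With the flag the content is replaced.
    write_output(&path, b"second", true)?;
    println!("final content: {:?}", fs::read(&path)?);

    fs::remove_dir_all(&dir)
}
```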

rafal-ch · 2022-07-28T15:10:28Z

src/subcommands/archive/create/tests.rs

+    // Destination directory isn't empty.
+    let root_dst = tempfile::tempdir().unwrap();
+    let existing_file = NamedTempFile::new_in(&root_dst).unwrap();
+    assert!(pack::create_archive(&src_dir, existing_file.path(), false,).is_err());


Suggested change

-    assert!(pack::create_archive(&src_dir, existing_file.path(), false,).is_err());
+    assert!(pack::create_archive(&src_dir, existing_file.path(), false).is_err());

rafal-ch · 2022-07-28T15:13:57Z

src/subcommands/archive/unpack.rs

+                Err(Error::Destination(IoError::new(
+                    ErrorKind::InvalidInput,
+                    "not an empty directory",
+                )))


Please add a test that proves this condition is detected.

There already is a test for this, archive_unpack_existing_destination. Its code comment was outdated, but now it should be fine.

rafal-ch · 2022-07-28T15:18:17Z

src/subcommands/archive/unpack/file_stream.rs

+        let bytes_read = self.reader.read(buf)?;
+        if let Some(progress_tracker) = self.maybe_progress_tracker.as_mut() {
+            progress_tracker.advance(bytes_read, |completion| {
+                info!("Archive reading {}% complete...", completion)


Suggested change

-                info!("Archive reading {}% complete...", completion)
+                info!("Decompression {}% complete...", completion)

I thought about this, but it's not really decompression but archive reading, though I couldn't find a better name for this. I'm open to other suggestions as well and if we find no better ones, I'll switch to your suggestion.

Well, it's both really, isn't it? Maybe "Archive decompressing and reading"?

I went with "Archive reading and decompressing" because it's happening in that order. Let me know if it's ok with you both.
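To make the mechanism behind the message concrete, here is a simplified sketch of a reader wrapper that reports progress from inside `read`. It assumes the wrapper sits around the raw archive bytes and inlines the percentage calculation instead of delegating to the progress tracker; `ProgressReader` and its fields are hypothetical names.

```rust
use std::io::{self, Cursor, Read};

/// Wraps any reader and reports the percentage of `total_bytes` consumed.
struct ProgressReader<R, F> {
    reader: R,
    total_bytes: usize,
    bytes_read: usize,
    last_reported: usize,
    on_progress: F,
}

impl<R: Read, F: FnMut(usize)> ProgressReader<R, F> {
    fn new(reader: R, total_bytes: usize, on_progress: F) -> Self {
        ProgressReader {
            reader,
            total_bytes,
            bytes_read: 0,
            last_reported: 0,
            on_progress,
        }
    }
}

impl<R: Read, F: FnMut(usize)> Read for ProgressReader<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.reader.read(buf)?;
        self.bytes_read += n;
        // Skip reporting when the total size is unknown (zero).
        if self.total_bytes > 0 {
            let percent = (self.bytes_read * 100 / self.total_bytes).min(100);
            // Only report when the percentage actually moved forward.
            if percent > self.last_reported {
                self.last_reported = percent;
                (self.on_progress)(percent);
            }
        }
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let archive = Cursor::new(vec![0u8; 1000]);
    let mut reader = ProgressReader::new(archive, 1000, |percent| {
        println!("Archive reading and decompressing {percent}% complete...")
    });
    let mut buf = [0u8; 250];
    while reader.read(&mut buf)? != 0 {}
    Ok(())
}
```

Because the count advances as bytes are pulled from the source, whatever consumes this reader (a decompressor, in the unpack case) drives the progress, which is why neither "reading" nor "decompression" alone describes it fully.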

rafal-ch · 2022-07-28T15:22:12Z

src/subcommands/check.rs

    let path = matches.value_of(DB_PATH).unwrap();
    let failfast = !matches.is_present(NO_FAILFAST);
    let specific = matches.value_of(SPECIFIC);
    let start_at: usize = matches
        .value_of(START_AT)
-        .unwrap()
+        .expect("should have a default")
        .parse()
        .expect("Value of \"--start-at\" must be an integer.");


Suggested change

-        .expect("Value of \"--start-at\" must be an integer.");
+        .unwrap_or_else(|_| panic!("Value of \"--{}\" must be an integer.", START_AT));

A nit, just to avoid hardcoding.
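The switch to `unwrap_or_else` is needed because `expect` only accepts a `&str`, so a formatted message has to be built inside a closure. A small stand-alone sketch of the same idea follows, returning a `Result` instead of panicking. The value `"start-at"` for `START_AT` is an assumption (the constant's definition is not shown in this PR), and `parse_start_at` is a hypothetical helper.

```rust
/// Assumed value of the argument-name constant; the real one is defined
/// elsewhere in the crate.
const START_AT: &str = "start-at";

/// Parses the raw flag value, building the error message from the constant
/// so the flag name is never spelled out twice.
fn parse_start_at(raw: &str) -> Result<usize, String> {
    raw.parse::<usize>()
        .map_err(|_| format!("Value of \"--{}\" must be an integer.", START_AT))
}

fn main() {
    println!("{:?}", parse_start_at("42"));
    println!("{:?}", parse_start_at("abc"));
}
```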

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Fraser999

Looking a lot cleaner overall - nice work!

src/common/lmdb_utils.rs

Fraser999 · 2022-08-03T13:40:18Z

src/subcommands/archive.rs

src/common/progress.rs

src/subcommands/archive/unpack/download_stream.rs

Fraser999 · 2022-08-03T14:27:14Z

src/subcommands/archive/unpack/file_stream.rs

src/subcommands/archive/unpack/file_stream.rs

src/subcommands/trie_compact.rs

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

This commit brings the following improvements to `ProgressTracker`:
- added documentation for the struct and its methods
- covered a zero initialization error case which would have caused an infinite loop
- removed the need for the custom `Drop` implementation
- moved the logging function to the constructor instead of the step function
- added warning logs for implementations around `ProgressTracker` when it can't be initialized

Signed-off-by: George Pisaltu <georgep@casperlabs.io>
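A sketch of what a tracker with these properties could look like; it is not the merged implementation. The zero check matters because a loop of the shape reviewed earlier compares `processed` against a threshold scaled by the total, and with a total of zero that threshold never grows. The names `ProgressTrackerError`, `CannotInitializeToZero` and `advance_by` are assumptions made for this example.

```rust
const STEPS: usize = 20;
const PERCENT_PER_STEP: usize = 100 / STEPS;

#[derive(Debug, PartialEq)]
enum ProgressTrackerError {
    /// Hypothetical variant: a task with nothing to process cannot be tracked.
    CannotInitializeToZero,
}

struct ProgressTracker<F: FnMut(usize)> {
    total_to_process: usize,
    processed: usize,
    /// Index of the next step whose percentage has not been logged yet.
    next_step: usize,
    /// Logging function, supplied once at construction time.
    log_progress: F,
}

impl<F: FnMut(usize)> ProgressTracker<F> {
    /// Rejects a zero total up front instead of looping forever later.
    fn new(total_to_process: usize, log_progress: F) -> Result<Self, ProgressTrackerError> {
        if total_to_process == 0 {
            return Err(ProgressTrackerError::CannotInitializeToZero);
        }
        Ok(ProgressTracker {
            total_to_process,
            processed: 0,
            next_step: 1,
            log_progress,
        })
    }

    /// Records `step` more processed items and logs every threshold crossed.
    fn advance_by(&mut self, step: usize) {
        self.processed += step;
        // Bounded by `STEPS`, so the loop always terminates.
        while self.next_step <= STEPS
            && self.processed * STEPS >= self.total_to_process * self.next_step
        {
            (self.log_progress)(self.next_step * PERCENT_PER_STEP);
            self.next_step += 1;
        }
    }
}

fn main() {
    match ProgressTracker::new(200, |percent| println!("{percent}% complete...")) {
        Ok(mut tracker) => {
            tracker.advance_by(10);
            tracker.advance_by(45);
        }
        // Callers can warn and carry on without progress reporting.
        Err(err) => println!("progress tracking disabled: {err:?}"),
    }
}
```

Taking the logging function in the constructor keeps every call to `advance_by` free of closures, at the cost of fixing the message for the lifetime of the tracker.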

src/common/progress.rs

georgepisaltu added 6 commits July 13, 2022 12:30

Add overwrite flags to guard against accidents

53c5e59

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Make DB_PATH arg usage consistent

8087879

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Add util for counting db entries

cbc8ce6

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Add util for tracking progress in long running cmd

301e86f

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Clean up entry counting util

9b4b21b

Rename the entry counting util from `entries_count` to `entry_count` and fix various unnecessary mutable borrows

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

georgepisaltu requested review from Fraser999, sacherjj, goral09, marc-casperlabs and rafal-ch July 26, 2022 09:07

georgepisaltu self-assigned this Jul 26, 2022

rafal-ch reviewed Jul 28, 2022

georgepisaltu added 4 commits August 3, 2022 16:10

Add del case to entry count test

e68b95e

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Add overwrite test for archive create

88b5552

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Fix outdated comment in archive unpack test

a99c5c6

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Refactor previous work

fa898c9

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Fraser999 reviewed Aug 3, 2022

georgepisaltu added 3 commits August 3, 2022 18:46

Restrict unnecessary unsafe usage in lmdb utils

d41795c

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

Fix typo in trie compact error

5e89028

Signed-off-by: George Pisaltu <georgep@casperlabs.io>

goral09 reviewed Aug 3, 2022

src/common/progress.rs

Fraser999 approved these changes Aug 3, 2022

georgepisaltu merged commit 4160609 into casper-network:master Aug 8, 2022

georgepisaltu mentioned this pull request Aug 8, 2022

Clean up casper-db-utils #19

Closed
