Rewrite Storage Component in C++ #270

Merged on Jan 11, 2024 (622 commits)

Collaborator

@vGsteiger vGsteiger commented Jun 7, 2023

In the interest of #197 and #147, we have rewritten the storage component from Python in C++. In the process, we substantially overhauled the workflows involved while preserving the logic and functionality of the component.

For additional testing, we extended the pipeline workflow with C++ testing and linting. Note that the unit of time values stored in the database changes from milliseconds to seconds.
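
For anyone adapting code or data to the unit change mentioned above, a minimal sketch of the conversion follows. It assumes the values are integer Unix timestamps and that truncating the sub-second part is acceptable; the PR description does not state either, and the function names are hypothetical.

```python
def ms_to_seconds(timestamp_ms: int) -> int:
    """Convert a Unix timestamp in milliseconds to whole seconds (truncating)."""
    return timestamp_ms // 1000


def seconds_to_ms(timestamp_s: int) -> int:
    """Inverse conversion; the sub-second part dropped above is not recoverable."""
    return timestamp_s * 1000


# Hypothetical value as it might have been stored before the change.
old_value_ms = 1_686_124_800_123
new_value_s = ms_to_seconds(old_value_ms)
print(new_value_s)
```

Callers that compare timestamps across components would need to agree on one unit; mixing both silently yields values that differ by a factor of 1000.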

Open TODOs for Maxi:

  • Can we avoid having to rebuild the C++ storage every time Python source code changes? This needs some changes to the Dockerfile but is necessary for fast iteration. Alternatively, we can add a flag to use a local gRPC install instead of building gRPC from source.
  • Support gRPC 1.59.1
  • Move from sync to async? (also see https://stackoverflow.com/questions/52298811/c-grpc-thread-number-configuration) Otherwise, can we set the size of the thread pool?
  • Add an explanation of some CMake flags to the README (e.g., MODYN_BUILD_STORAGE for fast development)
  • Swap primary key order in SQL for Postgres?
  • Does it still build without MODYN_BUILD_STORAGE?
  • Use ASan/TSan in the integration tests for Storage, but NOT for Extensions (Python breaks with them)
  • Does the storage use bulk insertion the same way as SQLAlchemy?
  • Move templated functions in storage service impl into header
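
To make the bulk-insertion question in the list above concrete, here is a small sketch contrasting row-by-row inserts with a single batched call. It uses Python's built-in `sqlite3` purely as a stand-in; the table name, columns, and helper names are hypothetical and are not taken from the storage schema, and neither SQLAlchemy nor the C++ implementation is shown.

```python
import sqlite3

INSERT_SQL = "INSERT INTO samples (file_id, sample_index, label) VALUES (?, ?, ?)"


def make_connection() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical samples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (file_id INTEGER, sample_index INTEGER, label INTEGER)"
    )
    return conn


def insert_row_by_row(conn: sqlite3.Connection, rows: list) -> None:
    """One statement execution per row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
    conn.commit()


def insert_bulk(conn: sqlite3.Connection, rows: list) -> None:
    """Hand the whole batch to the driver in a single call."""
    conn.executemany(INSERT_SQL, rows)
    conn.commit()


rows = [(1, i, i % 10) for i in range(1000)]

bulk_conn = make_connection()
insert_bulk(bulk_conn, rows)
count = bulk_conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(count)
```

Both helpers leave the table in the same state; the difference is only in how many round trips the driver makes, which is what the question about matching the previous behaviour is about.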

@vGsteiger vGsteiger self-assigned this Jun 7, 2023

github-actions bot commented Jun 8, 2023

✅ Result of Pytest Coverage

---------- coverage: platform linux, python 3.11.7-final-0 -----------

Name Stmts Miss Cover
modyn/common/benchmark/stopwatch.py 26 0 100%
modyn/common/example_extension/example_extension.py 28 2 93%
modyn/common/ftp/ftp_server.py 31 18 42%
modyn/common/ftp/ftp_utils.py 83 69 17%
modyn/common/grpc/grpc_helpers.py 67 36 46%
modyn/common/trigger_sample/trigger_sample_storage.py 158 9 94%
modyn/database/abstract_database_connection.py 35 0 100%
modyn/database/partition_by_meta.py 33 12 64%
modyn/evaluator/evaluator.py 25 1 96%
modyn/evaluator/evaluator_entrypoint.py 32 3 91%
modyn/evaluator/internal/dataset/evaluation_dataset.py 68 3 96%
modyn/evaluator/internal/grpc/evaluator_grpc_server.py 22 0 100%
modyn/evaluator/internal/grpc/evaluator_grpc_servicer.py 185 27 85%
modyn/evaluator/internal/metric_factory.py 18 1 94%
modyn/evaluator/internal/metrics/abstract_decomposable_metric.py 10 1 90%
modyn/evaluator/internal/metrics/abstract_evaluation_metric.py 29 2 93%
modyn/evaluator/internal/metrics/abstract_holistic_metric.py 10 1 90%
modyn/evaluator/internal/metrics/accuracy.py 20 2 90%
modyn/evaluator/internal/metrics/f1_score.py 63 0 100%
modyn/evaluator/internal/metrics/roc_auc.py 36 1 97%
modyn/evaluator/internal/pytorch_evaluator.py 121 31 74%
modyn/evaluator/internal/utils/evaluation_info.py 34 1 97%
modyn/evaluator/internal/utils/evaluation_process_info.py 8 0 100%
modyn/evaluator/internal/utils/evaluator_messages.py 3 0 100%
modyn/metadata_database/metadata_base.py 3 0 100%
modyn/metadata_database/metadata_database_connection.py 55 3 95%
modyn/metadata_database/models/pipelines.py 22 1 95%
modyn/metadata_database/models/sample_training_metadata.py 15 0 100%
modyn/metadata_database/models/selector_state_metadata.py 46 10 78%
modyn/metadata_database/models/trained_models.py 18 0 100%
modyn/metadata_database/models/trigger_partitions.py 10 0 100%
modyn/metadata_database/models/trigger_training_metadata.py 14 0 100%
modyn/metadata_database/models/triggers.py 10 0 100%
modyn/metadata_database/utils/model_storage_strategy_config.py 21 3 86%
modyn/metadata_processor/internal/grpc/metadata_processor_grpc_servicer.py 18 0 100%
modyn/metadata_processor/internal/grpc/metadata_processor_server.py 24 0 100%
modyn/metadata_processor/internal/metadata_processor_manager.py 23 4 83%
modyn/metadata_processor/metadata_processor.py 11 0 100%
modyn/metadata_processor/metadata_processor_entrypoint.py 24 1 96%
modyn/metadata_processor/processor_strategies/abstract_processor_strategy.py 29 0 100%
modyn/metadata_processor/processor_strategies/basic_processor_strategy.py 17 2 88%
modyn/metadata_processor/processor_strategies/processor_strategy_type.py 6 1 83%
modyn/model_storage/internal/grpc/grpc_server.py 23 0 100%
modyn/model_storage/internal/grpc/model_storage_grpc_servicer.py 54 0 100%
modyn/model_storage/internal/model_storage_manager.py 117 5 96%
modyn/model_storage/internal/storage_strategies/abstract_difference_operator.py 11 2 82%
modyn/model_storage/internal/storage_strategies/abstract_model_storage_strategy.py 16 1 94%
modyn/model_storage/internal/storage_strategies/difference_operators/sub_difference_operator.py 12 0 100%
modyn/model_storage/internal/storage_strategies/difference_operators/xor_difference_operator.py 14 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/abstract_full_model_strategy.py 26 2 92%
modyn/model_storage/internal/storage_strategies/full_model_strategies/binary_full_model.py 16 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/pytorch_full_model.py 15 0 100%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/abstract_incremental_model_strategy.py 26 10 62%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/weights_difference.py 98 1 99%
modyn/model_storage/internal/utils/model_storage_policy.py 31 0 100%
modyn/model_storage/model_storage.py 27 3 89%
modyn/model_storage/model_storage_entrypoint.py 32 3 91%
modyn/models/articlenet/articlenet.py 30 16 47%
modyn/models/coreset_methods_support.py 29 1 97%
modyn/models/dlrm/cuda_ext/dot_based_interact.py 24 13 46%
modyn/models/dlrm/dlrm.py 67 9 87%
modyn/models/dlrm/nn/embeddings.py 123 64 48%
modyn/models/dlrm/nn/factories.py 24 9 62%
modyn/models/dlrm/nn/interactions.py 50 11 78%
modyn/models/dlrm/nn/mlps.py 77 23 70%
modyn/models/dlrm/nn/parts.py 60 4 93%
modyn/models/dlrm/setup.py 5 5 0%
modyn/models/dlrm/utils/install_lib.py 11 7 36%
modyn/models/dlrm/utils/utils.py 28 0 100%
modyn/models/fmownet/fmownet.py 25 0 100%
modyn/models/resnet18/resnet18.py 28 0 100%
modyn/models/resnet50/resnet50.py 28 0 100%
modyn/models/resnet152/resnet152.py 28 0 100%
modyn/models/tokenizers/distill_bert_tokenizer.py 11 0 100%
modyn/models/yearbooknet/yearbooknet.py 23 0 100%
modyn/selector/internal/grpc/selector_grpc_servicer.py 90 24 73%
modyn/selector/internal/grpc/selector_server.py 33 12 64%
modyn/selector/internal/selector_manager.py 141 47 67%
modyn/selector/internal/selector_strategies/abstract_selection_strategy.py 153 15 90%
modyn/selector/internal/selector_strategies/coreset_strategy.py 71 10 86%
modyn/selector/internal/selector_strategies/downsampling_strategies/abstract_downsampling_strategy.py 32 2 94%
modyn/selector/internal/selector_strategies/downsampling_strategies/craig_downsampling_strategy.py 14 9 36%
modyn/selector/internal/selector_strategies/downsampling_strategies/downsampling_scheduler.py 55 2 96%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradmatch_downsampling_strategy.py 12 7 42%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradnorm_downsampling_strategy.py 5 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/kcentergreedy_downsampling_strategy.py 12 7 42%
modyn/selector/internal/selector_strategies/downsampling_strategies/loss_downsampling_strategy.py 5 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/no_downsampling_strategy.py 14 2 86%
modyn/selector/internal/selector_strategies/downsampling_strategies/submodular_downsampling_strategy.py 19 13 32%
modyn/selector/internal/selector_strategies/downsampling_strategies/uncertainty_downsampling_strategy.py 15 10 33%
modyn/selector/internal/selector_strategies/downsampling_strategies/utils.py 10 0 100%
modyn/selector/internal/selector_strategies/freshness_sampling_strategy.py 133 14 89%
modyn/selector/internal/selector_strategies/new_data_strategy.py 102 11 89%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_balanced_strategy.py 55 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_presampling_strategy.py 26 2 92%
modyn/selector/internal/selector_strategies/presampling_strategies/label_balanced_presampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/no_presampling_strategy.py 16 1 94%
modyn/selector/internal/selector_strategies/presampling_strategies/random_no_replacement_presampling_strategy.py 39 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/random_presampling_strategy.py 16 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/trigger_balanced_presampling_strategy.py 12 1 92%
modyn/selector/internal/selector_strategies/presampling_strategies/utils.py 14 0 100%
modyn/selector/internal/storage_backend/abstract_storage_backend.py 32 6 81%
modyn/selector/internal/storage_backend/database/database_storage_backend.py 85 7 92%
modyn/selector/internal/storage_backend/local/local_storage_backend.py 15 5 67%
modyn/selector/selector.py 82 14 83%
modyn/selector/selector_entrypoint.py 31 3 90%
modyn/supervisor/entrypoint.py 47 7 85%
modyn/supervisor/internal/evaluation_result_writer/abstract_evaluation_result_writer.py 14 2 86%
modyn/supervisor/internal/evaluation_result_writer/json_result_writer.py 17 0 100%
modyn/supervisor/internal/evaluation_result_writer/log_result_writer.py 4 1 75%
modyn/supervisor/internal/evaluation_result_writer/tensorboard_result_writer.py 13 0 100%
modyn/supervisor/internal/grpc_handler.py 357 71 80%
modyn/supervisor/internal/triggers/amounttrigger.py 15 0 100%
modyn/supervisor/internal/triggers/timetrigger.py 27 1 96%
modyn/supervisor/internal/triggers/trigger.py 6 0 100%
modyn/supervisor/internal/utils/evaluation_status_tracker.py 23 0 100%
modyn/supervisor/internal/utils/training_status_tracker.py 121 12 90%
modyn/supervisor/supervisor.py 354 61 83%
modyn/tests/common/example_extension/test_example_extension.py 13 0 100%
modyn/tests/common/grpc/test_grpc_helpers.py 3 0 100%
modyn/tests/common/trigger_sample/test_trigger_sample_storage.py 128 0 100%
modyn/tests/database/test_abstract_database_connection.py 19 0 100%
modyn/tests/evaluator/internal/dataset/test_evaluation_dataset.py 128 2 98%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_server.py 20 0 100%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_servicer.py 333 16 95%
modyn/tests/evaluator/internal/metrics/test_accuracy.py 45 0 100%
modyn/tests/evaluator/internal/metrics/test_f1_score.py 53 0 100%
modyn/tests/evaluator/internal/metrics/test_roc_auc.py 31 0 100%
modyn/tests/evaluator/internal/test_metric_factory.py 13 0 100%
modyn/tests/evaluator/internal/test_pytorch_evaluator.py 166 19 89%
modyn/tests/evaluator/test_evaluator.py 30 0 100%
modyn/tests/evaluator/test_evaluator_entrypoint.py 22 0 100%
modyn/tests/metadata_database/models/test_pipelines.py 48 0 100%
modyn/tests/metadata_database/models/test_sample_training_metadata.py 40 0 100%
modyn/tests/metadata_database/models/test_selector_state_metadata.py 46 0 100%
modyn/tests/metadata_database/models/test_trained_models.py 48 0 100%
modyn/tests/metadata_database/models/test_trigger_training_metadata.py 38 0 100%
modyn/tests/metadata_database/models/test_triggers.py 33 0 100%
modyn/tests/metadata_database/test_metadata_database_connection.py 47 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_grpc_servicer.py 26 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_server.py 27 0 100%
modyn/tests/metadata_processor/internal/test_metadata_processor_manager.py 42 3 93%
modyn/tests/metadata_processor/processor_strategies/test_abstract_processor_strategy.py 60 0 100%
modyn/tests/metadata_processor/processor_strategies/test_basic_processor_strategy.py 43 0 100%
modyn/tests/metadata_processor/test_metadata_processor.py 22 3 86%
modyn/tests/metadata_processor/test_metadata_processor_entrypoint.py 22 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_server.py 16 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py 100 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_sub_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_xor_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_binary_full_model.py 27 1 96%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_pytorch_full_model.py 36 1 97%
modyn/tests/model_storage/internal/storage_strategies/incremental_model_strategies/test_weights_difference.py 88 2 98%
modyn/tests/model_storage/internal/test_model_storage_manager.py 215 1 99%
modyn/tests/model_storage/internal/utils/test_model_storage_policy.py 28 0 100%
modyn/tests/model_storage/test_model_storage.py 37 0 100%
modyn/tests/model_storage/test_model_storage_entrypoint.py 22 0 100%
modyn/tests/models/test_bert_tokenizer.py 24 0 100%
modyn/tests/models/test_dlrm.py 46 0 100%
modyn/tests/models/test_embedding_recorder.py 27 0 100%
modyn/tests/models/test_fmownet.py 25 0 100%
modyn/tests/models/test_resnet18.py 22 0 100%
modyn/tests/models/test_resnet50.py 22 0 100%
modyn/tests/models/test_resnet152.py 22 0 100%
modyn/tests/models/test_yearbook_net.py 47 0 100%
modyn/tests/selector/internal/grpc/test_selector_grpc_servicer.py 156 0 100%
modyn/tests/selector/internal/grpc/test_selector_server.py 16 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_abstract_downsampling_strategy.py 15 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_gradnorm_downsampling_strategy.py 13 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_loss_downsampling_strategy.py 17 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_no_downsampling_strategy.py 5 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_scheduler.py 114 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_abstract_balanced_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_empty_presampling_strategy.py 0 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_label_balanced_presampling_strategy.py 164 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_no_replacement_presampling_strategy.py 51 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_presampling_strategy.py 95 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_trigger_balanced_presampling.py 139 0 100%
modyn/tests/selector/internal/selector_strategies/test_abstract_selection_strategy.py 171 0 100%
modyn/tests/selector/internal/selector_strategies/test_coreset_strategy.py 232 0 100%
modyn/tests/selector/internal/selector_strategies/test_freshness_sampling_strategy.py 308 0 100%
modyn/tests/selector/internal/selector_strategies/test_new_data_strategy.py 519 0 100%
modyn/tests/selector/internal/storage_backend/database/test_database_storage_backend.py 123 0 100%
modyn/tests/selector/internal/test_selector_manager.py 162 4 98%
modyn/tests/selector/test_selector.py 92 4 96%
modyn/tests/selector/test_selector_entrypoint.py 26 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_abstract_evaluation_result_writer.py 7 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_json_result_writer.py 16 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_tensorboard_result_writer.py 21 0 100%
modyn/tests/supervisor/internal/test_grpc_handler.py 351 0 100%
modyn/tests/supervisor/internal/triggers/test_amounttrigger.py 30 0 100%
modyn/tests/supervisor/internal/triggers/test_timetrigger.py 26 0 100%
modyn/tests/supervisor/internal/triggers/test_trigger.py 5 0 100%
modyn/tests/supervisor/internal/utils/test_evaluation_status_tracker.py 41 0 100%
modyn/tests/supervisor/internal/utils/test_training_status_tracker.py 133 0 100%
modyn/tests/supervisor/test_entrypoint.py 39 0 100%
modyn/tests/supervisor/test_supervisor.py 420 2 99%
modyn/tests/trainer_server/internal/data/key_sources/test_local_key_source.py 89 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_selector_key_source.py 92 0 100%
modyn/tests/trainer_server/internal/data/test_data_utils.py 22 1 95%
modyn/tests/trainer_server/internal/data/test_local_dataset_writer.py 59 0 100%
modyn/tests/trainer_server/internal/data/test_online_dataset.py 335 3 99%
modyn/tests/trainer_server/internal/data/test_per_class_online_dataset.py 53 3 94%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_server.py 17 0 100%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py 404 8 98%
modyn/tests/trainer_server/internal/metadata_collector/test_metadata_collector.py 41 0 100%
modyn/tests/trainer_server/internal/trainer/metadata_pytorch_callbacks/test_loss_callback.py 51 1 98%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/deepcore_comparison_tests_utils.py 21 1 95%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_matrix_downsampling_strategy.py 75 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_remote_downsampling_strategy.py 12 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_craig_remote_downsampling.py 249 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_get_tensor_subset.py 56 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradmatch_downsampling_strategy.py 116 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradnorm_downsample.py 92 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_kcenter_downsampling_strategy.py 104 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_loss_downsample.py 82 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_submodular_downsampling_strategy.py 101 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_uncertainty_downsampling_strategy.py 49 0 100%
modyn/tests/trainer_server/internal/trainer/test_pytorch_trainer.py 361 33 91%
modyn/tests/trainer_server/test_trainer_server.py 34 0 100%
modyn/tests/trainer_server/test_trainer_server_entrypoint.py 22 0 100%
modyn/tests/utils/test_utils.py 176 0 100%
modyn/trainer_server/custom_lr_schedulers/dlrm_lr_scheduler/dlrm_scheduler.py 33 33 0%
modyn/trainer_server/internal/dataset/data_utils.py 17 2 88%
modyn/trainer_server/internal/dataset/key_sources/abstract_key_source.py 21 5 76%
modyn/trainer_server/internal/dataset/key_sources/local_key_source.py 23 1 96%
modyn/trainer_server/internal/dataset/key_sources/selector_key_source.py 54 2 96%
modyn/trainer_server/internal/dataset/local_dataset_writer.py 53 3 94%
modyn/trainer_server/internal/dataset/online_dataset.py 275 28 90%
modyn/trainer_server/internal/dataset/per_class_online_dataset.py 14 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_server.py 22 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py 244 38 84%
modyn/trainer_server/internal/metadata_collector/metadata_collector.py 33 0 100%
modyn/trainer_server/internal/mocks/mock_metadata_processor.py 22 2 91%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/base_callback.py 15 3 80%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/loss_callback.py 21 0 100%
modyn/trainer_server/internal/trainer/pytorch_trainer.py 477 152 68%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_matrix_downsampling_strategy.py 66 4 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_per_label_remote_downsample_strategy.py 9 1 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_remote_downsampling_strategy.py 32 3 91%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/cossim.py 28 17 39%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/euclidean.py 29 12 59%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/k_center_greedy.py 38 4 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/orthogonal_matching_pursuit.py 66 34 48%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/shuffling.py 9 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_function.py 103 15 85%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_optimizer.py 116 78 33%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_craig_downsampling.py 95 7 93%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_grad_match_downsampling_strategy.py 17 1 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsampling.py 42 5 88%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_kcenter_greedy_downsampling_strategy.py 15 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_loss_downsampling.py 34 5 85%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_submodular_downsampling_strategy.py 30 3 90%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_uncertainty_downsampling_strategy.py 61 18 70%
modyn/trainer_server/internal/utils/metric_type.py 3 0 100%
modyn/trainer_server/internal/utils/trainer_messages.py 4 0 100%
modyn/trainer_server/internal/utils/training_info.py 51 2 96%
modyn/trainer_server/internal/utils/training_process_info.py 10 0 100%
modyn/trainer_server/trainer_server.py 19 0 100%
modyn/trainer_server/trainer_server_entrypoint.py 32 3 91%
modyn/utils/utils.py 167 12 93%
TOTAL 16059 1414 91%
Coverage HTML written to
================ 1830 passed, 639

@MaxiBoether MaxiBoether changed the title Storage cpp Overhaul Rewrite Storage Component in C++ Jun 9, 2023
@vGsteiger vGsteiger mentioned this pull request Jun 25, 2023
modyn/protos/storage.proto (outdated, resolved)
Contributor

@MaxiBoether MaxiBoether left a comment

Thank you so much for the work already. I did a first review, excluding the actual gRPC service and the binary and CSV file wrappers, since I could no longer focus. I will try to review these tomorrow.

I hope my comments make sense. Let's address these comments, get the integration tests up and running and then I will make another pass when we know everything is working with postgres end to end. I think my biggest note is that I don't understand the multithreading within the FileWatcher and think it can be simplified a lot. For everything else, see the comments :)

.pylintrc (resolved)
docker/Storage/Dockerfile (outdated, resolved; 2 threads)
modyn/storage/__init__.py (resolved)
modyn/storage/internal/__init__.py (resolved)
Contributor

@MaxiBoether MaxiBoether left a comment

Alright, I went through the file wrappers and gRPC servicer.

  1. The gRPC servicer has the same MT pattern as the file watcher which should be changed.
  2. The gRPC servicer currently does not use sql partitions if I see correctly (see the comment on it for details) and materializes everything in memory instead like in Python. We should align this.
  3. For the CSV File wrapper, I wasn't sure whether we should use a library or not.
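
The concern in point 2 about materializing everything in memory can be illustrated with a small sketch that streams a query result in fixed-size chunks instead of fetching all rows at once. It uses the standard-library `sqlite3` module as a stand-in; the table, query, and `iter_in_chunks` helper are hypothetical and do not reflect the servicer's actual code.

```python
import sqlite3
from typing import Iterator


def iter_in_chunks(
    conn: sqlite3.Connection, query: str, chunk_size: int
) -> Iterator[list]:
    """Yield query results in lists of at most chunk_size rows."""
    cursor = conn.execute(query)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield chunk


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, label INTEGER)")
conn.executemany(
    "INSERT INTO samples (sample_id, label) VALUES (?, ?)",
    [(i, i % 3) for i in range(10)],
)

# Only one chunk is held in memory at a time by the consumer.
chunk_sizes = [
    len(chunk)
    for chunk in iter_in_chunks(
        conn, "SELECT sample_id, label FROM samples ORDER BY sample_id", 4
    )
]
print(chunk_sizes)  # [4, 4, 2]
```

A streaming gRPC response could send one message per chunk, so peak memory depends on the chunk size rather than on the size of the dataset.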

See the individual comments again for the details. Thank you so much for the work. Sorry for the amount of comments, it's just a huge PR, but I think step-by-step the things are addressable.

As discussed, for now, the idea would be that you adjust according to the review. Then, we get the integration tests up and running (in some places during the review I was not sure whether it will actually work end to end on Postgres), and then I do another round of review, including the tests that time :) If the integration tests require some setup work I will be happy to assist, but I am not sure where the current issues lie.

modyn/storage/src/internal/grpc/storage_service_impl.cpp (outdated, resolved; 4 threads)
Contributor

@MaxiBoether MaxiBoether left a comment

Alright, another round done. I resolved most of the old comments, with the exception of a few that are still relevant.

I think a bigger question is how the gRPC server executes things with threads/processes. Furthermore, we should try to use something similar to sqlalchemy's yield_per interface. The servicer probably needs some more work/cleanup, the other parts are looking solid. There, my comments were mostly minor.

I didn't get to review the tests tonight, unfortunately.
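
As a point of reference for the threads/processes question in this comment, the sketch below shows how a fixed-size worker pool bounds the number of requests handled concurrently. It uses only `concurrent.futures` from the Python standard library; `handle_request` and `MAX_WORKERS` are hypothetical, and how the C++ gRPC server schedules its handlers is exactly the open question and is not modelled here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # hypothetical pool size

_lock = threading.Lock()
_active = 0
peak_active = 0


def handle_request(request_id: int) -> int:
    """Simulate a request handler and record the peak number of concurrent calls."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # stand-in for database or file I/O
    with _lock:
        _active -= 1
    return request_id * 2


# 20 requests are submitted, but at most MAX_WORKERS run at the same time.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(handle_request, range(20)))

print(len(results), peak_active <= MAX_WORKERS)
```

With an explicit bound like this, per-request memory (for example, one chunk of rows per handler) multiplies by at most the pool size, which makes resource usage easier to reason about.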

docker/Dependencies/Dockerfile (outdated, resolved; 2 threads)
integrationtests/storage/integrationtest_storage.py (outdated, resolved)
integrationtests/storage/integrationtest_storage_binary.py (outdated, resolved)
modyn/storage/src/internal/grpc/storage_service_impl.cpp (outdated, resolved; 5 threads)

github-actions bot commented Nov 3, 2023

Line Coverage: -% ( % to main)
Branch Coverage: -% ( % to main)

Contributor

@MaxiBoether MaxiBoether left a comment

(Viktor, in case you see this, ignore it; it's just for me.)

modyn/playground/playground.cpp (resolved)
modyn/storage/src/internal/file_watcher/file_watcher.cpp (outdated, resolved; 2 threads)
modyn/storage/src/internal/grpc/storage_service_impl.cpp (outdated, resolved)
@MaxiBoether MaxiBoether merged commit fe4e614 into main Jan 11, 2024
24 checks passed
2 participants