Clean up unused methods. Prepare for file_system argument. #315

Merged on Jul 26, 2024 (3 commits)

Conversation

delucchi-cmu
Contributor

Change Description

Closes #301

Accomplishes a few clean-ups that I noticed in the course of implementing issue #307. To reduce that PR, I'm splitting them off.

  • Allows for more duplicated code (consider docstrings when determining duplication; it's misleading).
  • write_parquet_metadata had previously been moved; removes the remaining soft links.
  • Adds load_csv_to_pandas_generator, a generator method that owns the file connection, so the connection stays within the generator method.
  • CSV reading first opens the file using file_system, then passes the open file along to pandas. This avoids many inconsistencies with pandas' file operations.
  • read_parquet_file was only called within unit tests; removes the method.
  • Adds test coverage by attempting to load all catalog types via the loader.
  • Removes an unused test fixture and a print statement.
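
For readers who want a concrete picture of the CSV change, here is a minimal sketch of the pattern described above. It is not the code merged in this PR: only the name load_csv_to_pandas_generator comes from the description, while the signature, the chunksize default, and the requirement that file_system expose an fsspec-style open(path, mode) method are assumptions made for illustration.

```python
from typing import Any, Iterator

import pandas as pd


def load_csv_to_pandas_generator(
    file_system: Any, path: str, chunksize: int = 10_000, **kwargs
) -> Iterator[pd.DataFrame]:
    """Yield DataFrame chunks from a CSV file (illustrative sketch).

    `file_system` is assumed to be any object with an fsspec-style
    `open(path, mode)` method that returns a context manager.
    """
    # The file is opened here, by the file system, not by pandas.
    # The handle lives inside the generator and is closed when the
    # generator is exhausted or garbage-collected.
    with file_system.open(path, "r") as handle:
        # pandas only ever sees an already-open file object, so no
        # connection options need to be passed to read_csv.
        with pd.read_csv(handle, chunksize=chunksize, **kwargs) as reader:
            yield from reader
```

Because the handle is created by the file system object, any connection settings stay with that object, and callers simply iterate over the returned chunks.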


codecov bot commented Jul 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.85%. Comparing base (2c625ce) to head (e37ba01).
Report is 32 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #315   +/-   ##
=======================================
  Coverage   93.84%   93.85%           
=======================================
  Files          58       58           
  Lines        2048     2049    +1     
=======================================
+ Hits         1922     1923    +1     
  Misses        126      126           


github-actions bot commented Jul 25, 2024

Before [2c625ce]  After [92cd542]  Ratio  Benchmark (Parameter)
19.4±1ms          21.5±0.6ms       ~1.11  benchmarks.MetadataSuite.time_load_partition_info_order6
76.1±0.4ms        77.7±1ms         1.02   benchmarks.MetadataSuite.time_load_partition_info_order7
76.3±0.2ms        77.9±1ms         1.02   benchmarks.MetadataSuite.time_load_partition_join_info
13.3±0.6ms        13.4±0.4ms       1.01   benchmarks.Suite.time_inner_pixel_alignment
86.7±2ms          87.9±2ms         1.01   benchmarks.Suite.time_paths_creation
118±0.7ms         120±0.3ms        1.01   benchmarks.time_test_alignment_even_sky
384±3ms           381±1ms          0.99   benchmarks.Suite.time_outer_pixel_alignment
985±8μs           977±10μs         0.99   benchmarks.time_test_cone_filter_multiple_order
41.6±0.8ms        40.9±0.6ms       0.98   benchmarks.Suite.time_pixel_tree_creation


delucchi-cmu requested a review from Schwarzam on July 26, 2024 12:02
@delucchi-cmu
Contributor Author

@Schwarzam

Part of this change is to read CSVs with pandas with an already-open file handle. I think this means that we don't need to change the headers inside storage_options to accommodate pandas' idiosyncrasies. Our unit tests don't have anything that exercises the headers, so I wanted to check with you that this doesn't impact your work.

delucchi-cmu requested a review from hombit on July 26, 2024 12:03
Contributor

hombit left a comment

looks good

@Schwarzam
Contributor

Tested locally, and everything is working perfectly!

Co-authored-by: Konstantin Malanchev <hombit@gmail.com>
delucchi-cmu merged commit 3234fcf into main on Jul 26, 2024
12 checks passed
delucchi-cmu deleted the issue/307/prefactor branch on July 26, 2024 17:47
Development

Successfully merging this pull request may close these issues.

Cannot read HTTP catalog
3 participants