Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHORE] Move out datatype and schema from daft-core #2806

Merged
merged 25 commits into from
Sep 9, 2024

Conversation

samster25
Copy link
Member

@samster25 samster25 commented Sep 6, 2024

  • Factors out DataType, Field, Schema into daft-schema
  • Factors out our display logic to common-display
  • Factors out array ffi code into common-arrow-ffi
  • Create StrValue trait that allows us to pass &dyn StrValue to the common-display crate without it knowing what a series is.
  • Reexport daft-schema::{DataType, Schema, Field} for convenience

Follow on:

  • Factor out daft-scan and daft-plan dependance on daft-core. This will allow them to directly depend on daft-schema

@github-actions github-actions bot added the chore label Sep 6, 2024
Copy link

codspeed-hq bot commented Sep 6, 2024

CodSpeed Performance Report

Merging #2806 will degrade performances by 37.79%

Comparing sammy/datatype-move-out (7647320) with main (f9b7af2)

Summary

⚡ 2 improvements
❌ 1 regressions
✅ 13 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main sammy/datatype-move-out Change
test_count[1 Small File] 10.5 ms 16.9 ms -37.79%
test_explain[100 Small Files] 45.3 ms 36.8 ms +23.13%
test_show[100 Small Files] 105 ms 55.6 ms +88.64%

@Eventual-Inc Eventual-Inc deleted a comment from netlify bot Sep 8, 2024
@samster25 samster25 marked this pull request as ready for review September 8, 2024 22:42
Copy link

github-actions bot commented Sep 8, 2024

@github-actions github-actions bot temporarily deployed to pull request September 8, 2024 22:52 Inactive
Copy link

codecov bot commented Sep 8, 2024

Codecov Report

Attention: Patch coverage is 78.81944% with 61 lines in your changes missing coverage. Please review.

Project coverage is 63.26%. Comparing base (045dc05) to head (33cf2cf).

Files with missing lines Patch % Lines
src/daft-core/src/array/ops/image.rs 34.00% 33 Missing ⚠️
src/daft-core/src/utils/display.rs 67.64% 22 Missing ⚠️
src/daft-core/src/datatypes/binary_ops.rs 90.47% 4 Missing ⚠️
src/common/display/src/table_display.rs 94.44% 1 Missing ⚠️
src/daft-table/src/ffi.rs 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2806      +/-   ##
==========================================
- Coverage   63.30%   63.26%   -0.05%     
==========================================
  Files        1007     1008       +1     
  Lines      114142   114189      +47     
==========================================
- Hits        72262    72242      -20     
- Misses      41880    41947      +67     
Files with missing lines Coverage Δ
src/common/arrow-ffi/src/lib.rs 99.21% <ø> (ø)
src/common/daft-config/src/lib.rs 82.14% <ø> (ø)
src/daft-core/src/array/ops/cast.rs 85.70% <ø> (ø)
src/daft-core/src/array/ops/repr.rs 65.69% <100.00%> (ø)
src/daft-core/src/array/ops/utf8.rs 93.19% <100.00%> (ø)
src/daft-core/src/datatypes/mod.rs 28.57% <ø> (ø)
src/daft-core/src/lib.rs 100.00% <ø> (ø)
src/daft-core/src/python/mod.rs 100.00% <100.00%> (ø)
src/daft-core/src/python/series.rs 94.01% <ø> (ø)
src/daft-core/src/series/array_impl/binary_ops.rs 93.33% <100.00%> (+0.47%) ⬆️
... and 94 more

... and 18 files with indirect coverage changes

Copy link
Contributor

@jaychia jaychia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

Main question is why do we do use daft_core::prelude::{Schema, SchemaRef, DataType...} in a bunch of places instead of using daft_schema directly instead? And the corollary: why does daft_core::prelude re-export a bunch of stuff from daft-schema::prelude?

I also feel like we might want to narrow down the things in our prelude::* a little more. I did find myself getting a little confused when it came to certain more "specialized" types like ImageFormat when they just magically appeared and actually came from the daft-schema prelude.

image::ImageFormat::Tiff => ImageFormat::TIFF,
image::ImageFormat::Gif => ImageFormat::GIF,
image::ImageFormat::Bmp => ImageFormat::BMP,
_ => unimplemented!("Image format {:?} is not supported", image_format),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a TryFrom thing instead of panicking?

@@ -937,7 +937,7 @@ impl Utf8Array {
pub fn to_datetime(&self, format: &str, timezone: Option<&str>) -> DaftResult<TimestampArray> {
let len = self.len();
let self_iter = self.as_arrow().iter();
let timeunit = crate::datatypes::utils::infer_timeunit_from_format_string(format);
let timeunit = daft_schema::time_unit::infer_timeunit_from_format_string(format);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I prefer this than the prelude pattern, which the datatypes expose (e.g. felt a little confusing when I was trying to figure out where ImageFormat came from earlier but then finally realized it came from datatypes::prelude)

Maybe we need to find a happy medium of what should actually be on the prelude?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule of thumb im going with is anything that is used in a public API of that module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the prelude is an optional opt-in, it's not automatically imported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule of thumb im going with is anything that is used in a public API of that module.

Isn't that just use daft_core::*? The *::prelude::* pattern feels more like "please import all the stuff I absolutely will need to work with your crate"

For e.g. in pyo3 I think it's all the pyclass and pymethod stuff, but for more specific things we still do separate imports.

https://www.lurklurk.org/effective-rust/wildcard.html?highlight=prelude#item-23-avoid-wildcard-imports

However, there's another common exception where wildcard imports make sense. Some crates have a convention that common items for the crate are re-exported from a prelude module, which is explicitly intended to be wildcard imported:
use thing::prelude::*;
Although in theory the same concerns apply in this case, in practice such a prelude module is likely to be carefully curated, and higher convenience may outweigh a small risk of future problems.

Copy link
Member Author

@samster25 samster25 Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can take a look at datafusion-cores's prelude

Also happy to put things within a namespace under prelude.

Also looks like pyo3 puts a ton under their prelude (all the methods and traits)
https://github.com/PyO3/pyo3/blob/main/src/prelude.rs

Copy link
Member Author

@samster25 samster25 Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also use daft_core::* would just import all the modules that are public which is different behavior.

instead of Series, DataArray, ListArray it would do the modules series, array and datatypes which is not what we want.

src/daft-core/src/datatypes/binary_ops.rs Show resolved Hide resolved
src/daft-core/src/datatypes/mod.rs Show resolved Hide resolved
src/daft-core/src/datatypes/prelude.rs Show resolved Hide resolved
src/daft-core/src/python/mod.rs Show resolved Hide resolved
src/daft-csv/src/metadata.rs Show resolved Hide resolved
daft_version: crate::VERSION.into(),
daft_build_type: crate::DAFT_BUILD_TYPE.into(),
// daft_version: crate::VERSION.into(),
// daft_build_type: crate::DAFT_BUILD_TYPE.into(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this and why is it commented out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah these were declared in daft-core (Daft-version and build type). factoring these out to a common crate.

@github-actions github-actions bot temporarily deployed to pull request September 9, 2024 00:49 Inactive
@samster25 samster25 merged commit 5cf876a into main Sep 9, 2024
36 of 37 checks passed
@samster25 samster25 deleted the sammy/datatype-move-out branch September 9, 2024 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants