-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: 📝 Add expected answers to DataFrame
method examples
#12564
Conversation
datafusion/core/src/dataframe/mod.rs
Outdated
@@ -225,7 +225,12 @@ impl DataFrame { | |||
/// # async fn main() -> Result<()> { | |||
/// let ctx = SessionContext::new(); | |||
/// let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?; | |||
/// let df = df.select_columns(&["a", "b"])?; | |||
/// df.select_columns(&["a", "b"])?.show().await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you think about using assert_batches_eq
instead so that the output is automatically checked as part of the doc tests rather than relying on us manually keeping it up to date?
It is of similar readability I think. Here is an example:
datafusion/datafusion/core/tests/sql/select.rs
Lines 27 to 49 in d9cb6e6
let results = ctx | |
.sql("SELECT * FROM test WHERE c1 = $1") | |
.await? | |
.with_param_values(vec![ScalarValue::from(3i32)])? | |
.collect() | |
.await?; | |
let expected = vec![ | |
"+----+----+-------+", | |
"| c1 | c2 | c3 |", | |
"+----+----+-------+", | |
"| 3 | 1 | false |", | |
"| 3 | 10 | true |", | |
"| 3 | 2 | true |", | |
"| 3 | 3 | false |", | |
"| 3 | 4 | true |", | |
"| 3 | 5 | false |", | |
"| 3 | 6 | true |", | |
"| 3 | 7 | false |", | |
"| 3 | 8 | true |", | |
"| 3 | 9 | false |", | |
"+----+----+-------+", | |
]; | |
assert_batches_sorted_eq!(expected, &results); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original thought is to simplify large long column on doctest(less readable), like this: https://github.com/Eason0729/datafusion/blob/4dd44c5a2b0d9810d7e9163689afab227c58d542/datafusion/core/src/dataframe/mod.rs#L734
Therefore, I simplify example_long.csv in the next commit to the point which is just enough to showcase most method on dataframe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the expect behavior on describe
method is more complex, which might require dedicated csv, so I decided to leave as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @Eason0729 for this contribution 🙏
I have started the CI to check this code
I think the PR would be even better if the output was verified (so it is guaranteed to stay in sync). I left a suggestion on how to do this. Let me know what you think
@@ -184,7 +184,7 @@ impl DataFrame { | |||
} | |||
|
|||
/// Creates logical expression from a SQL query text. | |||
/// The expression is created and processed againt the current schema. | |||
/// The expression is created and processed against the current schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks @Eason0729 for your contribution.
Underneath the method can call |
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
This seems reasonable to me -- perhaps you can file a ticket to track the idea and we can as our resident DataFrame experts like @timsaucer and @Omega359 |
Thanks again @Eason0729 |
📝 |
…2564) * docs: 📝 Add expected answers to `DataFrame` method examples * test: 📝 use assert_batches_sorted_eq and simplify example_long.csv * Update datafusion/core/src/dataframe/mod.rs Co-authored-by: Oleks V <comphead@users.noreply.github.com> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Oleks V <comphead@users.noreply.github.com>
Which issue does this PR close?
Closes #12527.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?