Add metadata macros #377

Merged
merged 9 commits into from
Feb 27, 2024

Conversation

pdeffebach
Collaborator

Description of API to come

@pdeffebach
Collaborator Author

@bkamins

This is ready for a review.

The API is very minimal. However I do add two functions, printlabels and printnotes because working with metadata in data cleaning is pretty frustrating without them.

Maybe we can have printlabels and printnotes live in DataFrames.jl?

Long discussion of variable construction.

Unlike labels, notes are appended.
Member

I agree here, but then maybe add a comment how to remove a note from a column?
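
One possible way to remove a note, sketched under assumptions: DataFrames.jl provides `deletecolmetadata!` for deleting a single column-level metadata entry. The sketch assumes DataFramesMeta.jl stores notes under the metadata key `"note"`, which this thread does not confirm, so it lists the keys on the column before and after.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(wage = [12], age = [23])
@note! df :wage = "Derived from American Community Survey"

# List the metadata keys currently set on the column to confirm the key name.
keys_before = collect(colmetadatakeys(df, :wage))

# Assumption: the note is stored under the "note" key.
deletecolmetadata!(df, :wage, "note")

# The "note" key should no longer be present on the column.
keys_after = collect(colmetadatakeys(df, :wage))
```

The column data itself is untouched; only the metadata entry is removed.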

@bkamins
Member

bkamins commented Feb 27, 2024

Thank you. Looks good. I would leave printlabels and printnotes in DataFramesMeta.jl for now as they are only printing things (not returning a value).

@pdeffebach
Collaborator Author

@bkamins

I beefed up the printing a little, trying to make it as useful as possible.

We can change the API at 1.0 if people don't like it. But I think it's pretty useful.

This is ready for merging.

API of printing:

Labels

printlabels(df, [cols=All()]; unlabelled = true)

Pretty-print all labels in a data frame.

Arguments

  • cols: Optional argument to select columns to print. Can
    be any valid multi-column selector, such as Not(...),
    Between(...), or a regular expression.

  • unlabelled: Keyword argument for whether to print
    the columns without user-defined labels. Defaults to true.
    For column col without a user-defined label, label(df, col) returns
    the name of the column, col.

Examples

julia> df = DataFrame(wage = [12], age = [23]);

julia> @label! df :wage = "Hourly wage (2015 USD)";

julia> printlabels(df)
┌────────┬────────────────────────┐
│ Column │                  Label │
├────────┼────────────────────────┤
│   wage │ Hourly wage (2015 USD) │
│    age │                    age │
└────────┴────────────────────────┘

julia> printlabels(df, :wage)
┌────────┬────────────────────────┐
│ Column │                  Label │
├────────┼────────────────────────┤
│   wage │ Hourly wage (2015 USD) │
└────────┴────────────────────────┘

julia> printlabels(df; unlabelled = false)
┌────────┬────────────────────────┐
│ Column │                  Label │
├────────┼────────────────────────┤
│   wage │ Hourly wage (2015 USD) │
└────────┴────────────────────────┘

julia> printlabels(df, r"^wage")
┌────────┬────────────────────────┐
│ Column │                  Label │
├────────┼────────────────────────┤
│   wage │ Hourly wage (2015 USD) │
└────────┴────────────────────────┘

Notes

printnotes(df, cols = All(); unnoted = false)

Print the notes and labels in a data frame.

Arguments

  • cols: Optional argument to select columns to print. Can
    be any valid multi-column selector, such as Not(...),
    Between(...), or a regular expression.
  • unnoted: Keyword argument for whether to print
    the columns without user-defined notes or labels. Defaults to false.

For the purposes of printing, column labels are printed in
addition to notes. However column labels are not returned by
note(df, col).

julia> df = DataFrame(wage = [12], age = [23]);

julia> @label! df :age = "Age (years)";

julia> @note! df :wage = "Derived from American Community Survey";

julia> @note! df :wage = "Missing values imputed as 0 wage";

julia> @label! df :wage = "Hourly wage (2015 USD)";

julia> printnotes(df)
Column: wage
────────────
Label: Hourly wage (2015 USD)
Derived from American Community Survey
Missing values imputed as 0 wage

Column: age
───────────
Label: Age (years)
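
For programmatic access rather than printing, the `label` and `note` accessors mentioned in the docstrings above can be called directly. A minimal sketch reusing the example data frame; the variable names are illustrative only.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(wage = [12], age = [23])
@label! df :wage = "Hourly wage (2015 USD)"
@note! df :wage = "Derived from American Community Survey"

wage_label = label(df, :wage)  # user-defined label
age_label = label(df, :age)    # no label set, so this falls back to the column name
wage_note = note(df, :wage)    # note text only; the label is not included
```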

Member

@bkamins bkamins left a comment

Looks good. Just merge conflicts need to be resolved.

@pdeffebach
Collaborator Author

Thanks! Will merge!

@pdeffebach pdeffebach merged commit edf22c3 into master Feb 27, 2024
13 of 19 checks passed