Handling Chars as Strings implicitly #999

Draft
wants to merge 3 commits into
base: master

@Jolanrensen (Collaborator) commented Dec 11, 2024

Related to #998

This PR can best be reviewed per-commit. It contains 3 parts where we can enhance Char support:

The first commit adds a String fallback for Char columns to df.convert()... and charCol.convertTo<X>(). This means you can now do things like:

enum class EnumClass { A, B }

columnOf('A', 'B').convertTo<EnumClass>()
// previously you could only do this
columnOf('A', 'B').convertToString().convertTo<EnumClass>() 

Converting Char -> Int remains unchanged: it takes the char code rather than the digit value, similar to casting a char to int in Java.
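As a rough illustration of the String fallback described above, here is a minimal sketch in plain Kotlin, without DataFrame. The helper name `toEnumViaString` is hypothetical and is not part of the library; it only shows the idea of going through the Char's String form to reach a type that has no direct Char conversion.

```kotlin
enum class EnumClass { A, B }

// Hypothetical helper (not DataFrame API): convert a Char to an enum constant
// by first turning it into a one-character String, then looking up the name.
inline fun <reified E : Enum<E>> Char.toEnumViaString(): E = enumValueOf<E>(toString())

// Char -> Int, by contrast, stays a char-code conversion.
fun charToIntCode(c: Char): Int = c.code
```

With these definitions, `'A'.toEnumViaString<EnumClass>()` yields `EnumClass.A`, while `charToIntCode('A')` yields `65`. A Char that does not match any constant name makes `enumValueOf` throw an `IllegalArgumentException`.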

The second commit adds "treating chars as strings" behavior to the df.convertTo<Schema> {} DSL. This makes the behavior mentioned in #998 possible again:

df.convertTo<Schema>() {
    // can now be used for both String and chars
    parser { /* it: String -> */ MyCustomClass(it) }

    // will be used instead of parser {} if present
    convert<Char>().with { /* it: Char -> */ ... }
}

We could split this into separate charParser {} and parser {} functions if we prefer. That is still up for debate.
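The precedence described for the DSL (a Char-specific converter wins, otherwise the String parser is reused) can be pictured with the sketch below. It is plain Kotlin under stated assumptions: `convertChar` and its parameters are hypothetical names for illustration and do not reflect how the DataFrame DSL is implemented internally.

```kotlin
// Hypothetical stand-in for a user type; `data` gives it value equality.
data class MyCustomClass(val text: String)

// Hypothetical lookup, not the DataFrame implementation:
// a Char-specific converter wins if present; otherwise the Char is
// turned into a String and handed to the String parser.
fun <T> convertChar(
    value: Char,
    parser: (String) -> T,
    charConverter: ((Char) -> T)? = null,
): T = if (charConverter != null) charConverter(value) else parser(value.toString())
```

Calling `convertChar('x', parser = { MyCustomClass(it) })` goes through the String parser, while also passing a `charConverter` bypasses the parser entirely.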

The third commit introduces parsing of Char columns, similar to String columns.
This means you can now do:

columnOf('1', '2', '3').parse() // results in DataColumn<Int>

columnOf('a', 'b', 'c').parse() // results in DataColumn<String>

and Char columns will also be considered when calling DataFrame.parse().
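To make the two results above concrete, here is a minimal sketch of "parse Chars as if they were Strings" in plain Kotlin. The function `parseChars` is hypothetical, and it only attempts Int; the actual parser presumably tries more target types. The point is the fallback: if not every value parses, the result is a list of Strings, so parsing cannot fail and never yields Char.

```kotlin
// Hypothetical sketch (not DataFrame API): parse a list of Chars by treating
// every Char as a one-character String. Only Int is attempted here.
fun parseChars(values: List<Char>): List<Any> {
    val strings = values.map { it.toString() }
    val ints = strings.map { it.toIntOrNull() }
    // Use Int only if every value parsed; otherwise fall back to String.
    return if (ints.all { it != null }) ints.filterNotNull() else strings
}
```

Here `parseChars(listOf('1', '2', '3'))` gives a list of Ints and `parseChars(listOf('a', 'b', 'c'))` gives a list of Strings; a mixed input such as `listOf('1', 'a')` also falls back to Strings.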

I still need to update the docs, but first I want to be sure there are no other places that could benefit from this Char-as-String treatment.

@Jolanrensen added the enhancement label Dec 11, 2024
@Jolanrensen added this to the 0.16.0 milestone Dec 11, 2024
@Jolanrensen self-assigned this Dec 11, 2024
…sing, but can never result in Char and can never fail (since it can parse to String)
@hantsy commented Dec 14, 2024

> columnOf('a', 'b', 'c').parse() // results in DataColumn<String>

I remember any chars in the ASCII table can be converted into Int automatically in Java.

@Jolanrensen (Collaborator, Author)

> columnOf('a', 'b', 'c').parse() // results in DataColumn<String>
>
> I remember any chars in the ASCII table can be converted into Int automatically in Java.

Yes indeed! However, in Java you need to be careful about whether you are converting to/from ASCII codes or the readable values. Kotlin makes this distinction clearer:

// get ascii code from char
'1'.code == 49

// get readable value
'1'.digitToInt() == 1

// and back, from ascii code
49.toChar() == '1'

// back to readable char
1.digitToChar()  == '1'

In DataFrame, I'd like convert to use the ASCII code conversion methods, while parse should provide the readable-value conversion as the alternative.
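The intended split can be sketched with two plain Kotlin functions. Both names are hypothetical and only mirror the two interpretations; they are not the DataFrame implementation of convert or parse.

```kotlin
// Hypothetical names, illustrating the two interpretations only.

// "convert"-style: a Char becomes its character code.
fun convertCharToInt(c: Char): Int = c.code

// "parse"-style: a Char becomes the digit it displays, or null if it is not a digit.
fun parseCharToInt(c: Char): Int? = c.digitToIntOrNull()
```

For `'1'` the first function returns `49` and the second returns `1`. For `'a'` the first returns `97`, while the second returns `null`, which is where a parse operation would fall back to keeping the value as a String.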
