Derive a parser for delimited data #36

Open
tsheinen opened this issue Dec 2, 2023 · 9 comments
Comments

@tsheinen

tsheinen commented Dec 2, 2023

I'd like to be able to parse delimited data -- for example 1, 2, 3, 4, 5

I envision this looking something like

#[derive(Debug,Display,FromStr)]
struct Container {
   #[from_str(delimiter=", ")]
   numbers: Vec<usize>
}

If I have time over the next couple of days I'll try to PR this; otherwise I thought I'd mention it to see if this would be useful to other people (or if I'm dumb and this is already doable lol)

@fritzrehde
Contributor

Nope, you're definitely not the only one who thinks this would be cool. I was just solving day 2 of advent of code, and had a struct:

/// A semicolon-separated list of subsets of cube-pickings.
#[derive(From)]
#[cfg_attr(test, derive(PartialEq, Eq, Debug))]
struct CubePickingSubsets(Vec<CubePickingSubset>);

// TODO: this FromStr impl should be automated e.g. by parse_display
impl str::FromStr for CubePickingSubsets {
    type Err = Error;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let cube_subsets = s.split("; ").map(str::parse).collect::<Result<_>>()?;
        Ok(Self(cube_subsets))
    }
}

Would love to see this feature in parse_display!

@tsheinen
Author

tsheinen commented Dec 2, 2023

haha yeah that's exactly where this question came from

@frozenlib
Owner

frozenlib commented Dec 3, 2023

I too would find it useful to be able to do that.

Also, I may want to customize the implementation of each field in other ways, such as

However, adding a separate macro feature for every kind of customization would make the macro more complex, so I am considering adding just one feature instead, as follows.

#[display(with = ...)]

Specify an expression that returns a value implementing the following DisplayFormat and FromStrFormat traits, and code that uses these traits will be generated.

trait DisplayFormat<T> {
    fn write(&self, f: &mut Formatter, t: &T) -> core::fmt::Result;
}

trait FromStrFormat<T> {
    type Err;
    fn parse(&self, s: &str) -> core::result::Result<T, Self::Err>;
    fn regex(&self) -> &str {
        "(?s:.*?)"
    }
}

Once this is available, the following code will do the same as #[from_str(delimiter=", ")].

fn delimiter(delimiter: &'static str) -> Delimiter {
    Delimiter(delimiter)
}

struct Delimiter(&'static str);

impl<T: Display> DisplayFormat<Vec<T>> for Delimiter {
    fn write(&self, f: &mut Formatter, value: &Vec<T>) -> core::fmt::Result {
        let mut first = true;
        for v in value {
            if !first {
                write!(f, "{}", self.0)?;
            }
            first = false;
            write!(f, "{v}")?;
        }
        Ok(())
    }
}
impl<T: FromStr> FromStrFormat<Vec<T>> for Delimiter {
    type Err = T::Err;
    fn parse(&self, s: &str) -> core::result::Result<Vec<T>, Self::Err> {
        s.split(self.0)
            .map(str::parse)
            .collect::<core::result::Result<_, _>>()
    }
}

#[derive(Display, FromStr)]
struct Container {
   #[display(with = delimiter(", "))]
   numbers: Vec<usize>
}
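
To make the proposal above concrete without depending on an unreleased version of the crate, here is a minimal std-only sketch. The two traits are local stand-ins copied from the proposal (with regex() left out), and the hand-written Display and FromStr impls for Container are only an assumption about what the derive might generate, not the macro's actual output.

```rust
use std::fmt::{self, Display, Formatter};
use std::str::FromStr;

// Local stand-ins for the proposed traits (not the published parse_display API).
trait DisplayFormat<T> {
    fn write(&self, f: &mut Formatter, t: &T) -> fmt::Result;
}

trait FromStrFormat<T> {
    type Err;
    fn parse(&self, s: &str) -> Result<T, Self::Err>;
}

struct Delimiter(&'static str);

impl<T: Display> DisplayFormat<Vec<T>> for Delimiter {
    fn write(&self, f: &mut Formatter, value: &Vec<T>) -> fmt::Result {
        for (i, v) in value.iter().enumerate() {
            // Write the separator between items, not before the first one.
            if i > 0 {
                f.write_str(self.0)?;
            }
            write!(f, "{v}")?;
        }
        Ok(())
    }
}

impl<T: FromStr> FromStrFormat<Vec<T>> for Delimiter {
    type Err = T::Err;
    fn parse(&self, s: &str) -> Result<Vec<T>, Self::Err> {
        // The first item that fails to parse aborts the whole collection.
        s.split(self.0).map(str::parse).collect()
    }
}

struct Container {
    numbers: Vec<usize>,
}

// Hand-written equivalent of what the derive might generate (assumption).
impl Display for Container {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        Delimiter(", ").write(f, &self.numbers)
    }
}

impl FromStr for Container {
    type Err = std::num::ParseIntError;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // The target type is spelled out so nothing is left to inference.
        let numbers = <Delimiter as FromStrFormat<Vec<usize>>>::parse(&Delimiter(", "), s)?;
        Ok(Container { numbers })
    }
}

fn main() {
    let c: Container = "1, 2, 3, 4, 5".parse().unwrap();
    println!("{c}"); // prints "1, 2, 3, 4, 5"
}
```

Because the format is an ordinary value, the separator is chosen at the call site, which is what lets a single `with = ...` attribute cover many customizations.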

@tsheinen
Author

tsheinen commented Dec 3, 2023

ahhhh that's a good idea. I wrote up a poc for delimited data (https://github.com/tsheinen/parse-display/tree/delimited_fields) and was kinda having the same "well if i add delimited data i would kinda like to be able to do.... as well oh my thats a lot of complexity" realization as i worked on it. add a prelude of common DisplayFormat/FromStrFormat use cases and it's essentially the same thing but implemented more cleanly and extensibly.

@frozenlib
Owner

I implemented #[display(with = ...)].
However, to ensure that type inference works well, the definitions of DisplayFormat and FromStrFormat differ from the ones proposed earlier, as follows.

pub trait DisplayFormat {
    type Value;
    fn write(&self, f: &mut Formatter, value: &Self::Value) -> Result;
}

pub trait FromStrFormat {
    type Value;
    type Err;
    fn parse(&self, s: &str) -> core::result::Result<Self::Value, Self::Err>;
    fn regex(&self) -> &str {
        "(?s:.*?)"
    }
}

I also wrote DisplayFormat and FromStrFormat implementations in the parse_display::formats module, but some inconveniences remain (for example, delimiter() works for Vec but not for slices), so it still needs to be improved.
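
For readers wondering how a format type looks under the associated-type definitions, here is one possible std-only sketch. It is not taken from parse_display::formats: the traits are local copies of the shapes quoted above (regex() omitted), and carrying the element type in a PhantomData marker is an assumption made so that Value is determined by the format type alone.

```rust
use std::fmt::{self, Display, Formatter};
use std::marker::PhantomData;
use std::str::FromStr;

// Local copies of the trait shapes quoted above (regex() omitted for brevity).
trait DisplayFormat {
    type Value;
    fn write(&self, f: &mut Formatter, value: &Self::Value) -> fmt::Result;
}

trait FromStrFormat {
    type Value;
    type Err;
    fn parse(&self, s: &str) -> Result<Self::Value, Self::Err>;
}

// Hypothetical: the element type T is carried as a marker so that
// `Value = Vec<T>` follows from the format type itself.
struct Delimiter<T> {
    sep: &'static str,
    _marker: PhantomData<T>,
}

fn delimiter<T>(sep: &'static str) -> Delimiter<T> {
    Delimiter { sep, _marker: PhantomData }
}

impl<T: Display> DisplayFormat for Delimiter<T> {
    type Value = Vec<T>;
    fn write(&self, f: &mut Formatter, value: &Vec<T>) -> fmt::Result {
        for (i, v) in value.iter().enumerate() {
            if i > 0 {
                f.write_str(self.sep)?;
            }
            write!(f, "{v}")?;
        }
        Ok(())
    }
}

impl<T: FromStr> FromStrFormat for Delimiter<T> {
    type Value = Vec<T>;
    type Err = T::Err;
    fn parse(&self, s: &str) -> Result<Vec<T>, T::Err> {
        s.split(self.sep).map(str::parse).collect()
    }
}

struct Container {
    numbers: Vec<u32>,
}

impl Display for Container {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        delimiter::<u32>("; ").write(f, &self.numbers)
    }
}

fn main() {
    let numbers = delimiter::<u32>("; ").parse("1; 2; 3").unwrap();
    println!("{}", Container { numbers }); // prints "1; 2; 3"
}
```

With this shape a given format type has exactly one Value, whereas the earlier generic traits allow one format type to implement the trait for many target types.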

@fritzrehde
Contributor

I was just curious: Do you know if delimiter("") would also work for splitting a continuous list of chars without a delimiter (e.g. "abcd" into vec![str::parse("a"), str::parse("b"), str::parse("c"), str::parse("d")])? Maybe this should also just be a separate feature, since it would be more efficient/idiomatic to split "abcd" into chars directly, not strings, but then we lose the ability to use our regular FromStr implementation for parsing each character.

@frozenlib
Owner

Since delimiter uses str::split internally and "abcd".split("") is ["", "a", "b", "c", "d", ""], it cannot be used for that purpose.
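
The split behaviour described here can be checked with the standard library alone. In this small sketch the two helper functions are made-up names used only for illustration.

```rust
// Splits on the empty pattern, as a delimiter("") format would do internally.
fn split_on_empty(s: &str) -> Vec<&str> {
    s.split("").collect()
}

// Parses every piece as a u32; the empty leading piece already fails.
fn parse_via_empty_split(s: &str) -> Result<Vec<u32>, std::num::ParseIntError> {
    s.split("").map(str::parse::<u32>).collect()
}

fn main() {
    // The empty pattern matches at every char boundary, including both ends.
    println!("{:?}", split_on_empty("abcd")); // ["", "a", "b", "c", "d", ""]
    println!("{}", parse_via_empty_split("1234").is_err()); // true
}
```

The empty strings at both ends are why parsing each piece with FromStr fails for most element types.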

Instead, you need to implement FromStrFormat as follows.

use std::str::FromStr;

use parse_display::{FromStr, FromStrFormat};

fn main() {
    let s: ParsableVec<u32> = "1234".parse().unwrap();
    println!("{:?}", s.0); // [1, 2, 3, 4]
}

#[derive(FromStr)]
struct ParsableVec<T: FromStr>(#[from_str(with = CharsFormat)] Vec<T>);

struct CharsFormat;

impl<T: FromStr> FromStrFormat<Vec<T>> for CharsFormat {
    type Err = T::Err;
    fn parse(&self, s: &str) -> core::result::Result<Vec<T>, Self::Err> {
        let mut items = Vec::new();
        if !s.is_empty() {
            let mut start = 0;
            for i in 1..=s.len() {
                if s.is_char_boundary(i) {
                    items.push(s[start..i].parse()?);
                    start = i;
                }
            }
        }
        Ok(items)
    }
}
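
The per-character parsing step can also be tried on its own, outside any trait. The sketch below swaps the is_char_boundary loop for str::chars plus char::encode_utf8; parse_chars is a hypothetical helper name, not something parse_display provides.

```rust
use std::str::FromStr;

/// Parses every `char` of `s` as one `T`, stopping at the first failure.
fn parse_chars<T: FromStr>(s: &str) -> Result<Vec<T>, T::Err> {
    // A char needs at most 4 bytes in UTF-8, so one stack buffer is reused.
    let mut buf = [0u8; 4];
    s.chars()
        .map(|c| c.encode_utf8(&mut buf).parse::<T>())
        .collect()
}

fn main() {
    let digits: Vec<u32> = parse_chars("1234").unwrap();
    println!("{digits:?}"); // [1, 2, 3, 4]
}
```

An empty input yields an empty Vec, matching the is_empty guard in the loop-based version, and multi-byte characters are handed to FromStr as whole characters.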

@fritzrehde
Contributor

Any updates on this? Would really love this feature in parse_display!

@frozenlib
Owner

Support for #[display(with = ...)] in the derive macro is complete, but the parse_display::formats module, which contains the DisplayFormat and FromStrFormat implementations that can be used with this attribute, is not yet complete.

Since the parse_display::formats module will undergo breaking changes, I plan to release the following two crates.

  • A new version of parse_display that supports #[display(with = ...)] and contains the definitions of DisplayFormat and FromStrFormat, but excludes the parse_display::formats module.
  • A new crate containing only the parse_display::formats module.

Splitting this into two crates should also make it appropriate to add formats that depend on other crates (e.g., formats that use chrono::NaiveDate::parse_from_str).

These crates will be released as soon as I add #[display(with = ...)] documentation.
