
Cannot deserialize into reference if there are escaped quotation marks in string #742

Closed
arlyon opened this issue Jan 3, 2021 · 6 comments

Comments

@arlyon

arlyon commented Jan 3, 2021

Hi,

Sorry if this issue is already known; I'm not sure how to phrase it. I am dealing with JSON strings, which I parse into structs that store string slices borrowed from the original string.

```rust
use serde::{Deserialize, Serialize};

// `r#"..."#` is a raw string literal, so the `\"` below reaches serde_json as a JSON escape.
const DATA: &str = r#"
[{
    "id": 27,
    "name": "grizzlemaw bear",
    "description": "This breed of honeyvore bear is distinguished by its gray fur. They live in deep snow away from villages, so it's uncommon to encounter one."
},
{
    "id": 28,
    "name": "hylian retriever",
    "description": "The native breed of this mammal varies by region, but one thing remains true: this animals has been known as \"man's best friend\" since ancient times."
}]
"#;

#[derive(Serialize, Deserialize, Debug)]
struct Creature<'a> {
    id: u32,
    // `&str` fields must point directly into `DATA`.
    name: &'a str,
    description: &'a str,
}

fn main() {
    // Panics: the second description contains escaped quotes.
    let creatures: Vec<Creature> = serde_json::from_str(DATA).unwrap();
    println!("{:?}", creatures);
}
```

Strangely, serde is unable to parse the data if any of these string slices contain escaped quotes. For the second list element, I receive this error:

```
    Finished dev [unoptimized + debuginfo] target(s) in 0.78s
     Running `target/debug/botw-rust`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"The native breed of this mammal varies by region, but one thing remains true: this animals has been known as \\\"man\\\'s best friend\\\" since ancient times. They\\\'re very clever and obedient, so aside from serving as pets, they are also put to work watching over grazing livestock. It\\\'s said that all Hylian retrievers are descendants of the dog once owned by the king of Hyrule.\", expected a borrowed string", line: 308, column: 398)', src/main.rs:57:94
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

It seems that, for whatever reason, a string cannot be borrowed if it contains an escaped quotation mark. If I am missing something, please let me know. If you suspect this is a genuine bug, I will happily write a minimal reproducing example.

@jonasbb

jonasbb commented Jan 4, 2021

The problem here is that, because of the escape sequences, serde_json cannot simply hand out a reference into the original string: the escape sequences have to be decoded while deserializing. You can deserialize this into a String or a Cow<'_, str>. You can also apply #[serde(borrow)] to the latter to get zero-copy deserialization whenever possible.

If it has to be zero-copy deserialization, see dtolnay/request-for-implementation#7.

@arlyon
Author

arlyon commented Jan 5, 2021

Ah, interesting, thanks for the reply. Going to leave this here for more context. #318

One thing I'm missing, for my own understanding, is why. What makes escape sequences hard to manage? Otherwise, Cow is fine for my use case, so I'll use that. Thanks!

@jbg

jbg commented Feb 3, 2021

@arlyon the "input" bytes (from the JSON string) contain backslashes, and possibly other characters such as those in a sequence like \u00f8, that should not be in the "output" bytes (the bytes pointed to by the &str in whatever structure serde_json gives back to you). Those output bytes should contain the decoded escape sequence as normal UTF-8 (\" becomes ", \u00f8 becomes ø, etc.). So borrowing is not possible, because the bytes actually need to be different.

#[serde(borrow)] foo: Cow<'a, str> is a nice way to handle this because you get borrowed bytes whenever possible and owned bytes if it's not possible.

@arlyon
Author

arlyon commented Feb 13, 2021

I think, given that, this issue is 'solved'. I meant to close it last week but didn't get around to replying. Thanks for the detailed explanation!

@arlyon arlyon closed this as completed Feb 13, 2021
@arlyon
Author

arlyon commented Mar 3, 2021

Useful article I stumbled across that may help others in this situation: https://d3lm.medium.com/rust-beware-of-escape-sequences-85ec90e9e243#ee0e-58229fc84d02

@ccddan

ccddan commented Jul 10, 2022

> Useful article I stumbled across that may help others in this situation: https://d3lm.medium.com/rust-beware-of-escape-sequences-85ec90e9e243#ee0e-58229fc84d02

Thanks @arlyon, this was very helpful (for someone starting to use Rust).

@serde-rs serde-rs locked and limited conversation to collaborators Jul 10, 2022
Labels
None yet
Development

No branches or pull requests

4 participants