7.0.0-alpha2 Problem with string field with special char #3743

Closed
lukapor opened this issue May 9, 2019 · 5 comments

@lukapor
Contributor

lukapor commented May 9, 2019

I discovered another bug with the new internal serializer. The problem appears when a special character is present in a string. Newtonsoft.Json serializes that character differently than Utf8Json does. The problem occurs on the server side.

Object: {"CreatedDate":"2019-05-07T12:00:00.01","Text":"Pƒ:$Zçâ¡/<Ïþ_-ª¥§R3,9ómS\u0012%5"}
POST http://localhost:9210/default_index/_doc?pretty=true&refresh=wait_for
{"createdDate":"2019-05-07T12:00:00.0100000","text":"Pƒ:$Zçâ¡/<Ïþ_-ª¥§R3,9ómS�%5"}

Status: 400
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [text] of type [text] in document with id 'maasnmoBoQE9XFgPGe4Z'"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [text] of type [text] in document with id 'maasnmoBoQE9XFgPGe4Z'",
    "caused_by" : {
      "type" : "json_parse_exception",
      "reason" : "Illegal unquoted character ((CTRL-CHAR, code 18)): has to be escaped using backslash to be included in string value\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@55e03e0e; line: 1, column: 94]"
    }
  },
  "status" : 400
}

Code to reproduce:

using System;
using System.Text;
using Elasticsearch.Net;
using Nest;
using Newtonsoft.Json;

namespace ConsoleApp11
{
class Program
{
    static void Main(string[] args)
    {
        var defaultIndex = "default_index";
        var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9210"));

        var settings = new ConnectionSettings(pool)
            .DefaultIndex(defaultIndex)
            .DisableDirectStreaming()
            .PrettyJson()
            .OnRequestCompleted(callDetails =>
            {
                if (callDetails.RequestBodyInBytes != null)
                {
                    Console.WriteLine(
                        $"{callDetails.HttpMethod} {callDetails.Uri} \n" +
                        $"{Encoding.UTF8.GetString(callDetails.RequestBodyInBytes)}");
                }
                else
                {
                    Console.WriteLine($"{callDetails.HttpMethod} {callDetails.Uri}");
                }

                Console.WriteLine();

                if (callDetails.ResponseBodyInBytes != null)
                {
                    Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
                             $"{Encoding.UTF8.GetString(callDetails.ResponseBodyInBytes)}\n" +
                             $"{new string('-', 30)}\n");
                }
                else
                {
                    Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
                             $"{new string('-', 30)}\n");
                }
            });

        var client = new ElasticClient(settings);

        if (client.IndexExists(defaultIndex).Exists)
            client.DeleteIndex(defaultIndex);

        client.CreateIndex(defaultIndex, c => c
            .Map<Document>(mm => mm
                .AutoMap()
            )
        );
                           
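        // The string contains the control character U+0012 before "%5"; it is written as an
        // escape in a regular (non-verbatim) literal so it survives copy and paste.
        var specialText = "Pƒ:$Žçâ¡/‹Ïþ_—ª¥§R3,9ómS\u0012%5";
        var doc = new Document { CreatedDate = new DateTime(2019, 5, 7, 12, 0, 0, 10), Text = specialText };
        var serString = JsonConvert.SerializeObject(doc);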
        Console.WriteLine($"Object: {serString}");

        client.Index(
            doc,
            i => i.Refresh(Refresh.WaitFor));
    }
}

public class Document
{
    public DateTime CreatedDate { get; set; }

    public string Text { get; set; }
}

}

@russcam
Contributor

russcam commented May 9, 2019

@lukapor is Pƒ:$Zçâ¡/<Ïþ_-ª¥§R3,9ómS\u0012%5 the original string?

@lukapor
Contributor Author

lukapor commented May 9, 2019

Yes, sorry, I didn't notice that the character was lost when I pasted the code.

@russcam
Contributor

russcam commented May 9, 2019

Thanks, I'm looking at it now.

russcam added a commit that referenced this issue May 10, 2019
This commit corrects serialization of Unicode special characters.
All characters before U+0020 (' ' space character) must be encoded
in bytes as Unicode escape sequences

Fixes #3743
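
The rule in the commit message above can be illustrated with a minimal sketch. This is not the code from #3744 and not part of NEST or Utf8Json; `JsonEscapeSketch` and `Escape` are hypothetical names used only for illustration. It assumes the standard JSON string rules: quotes and backslashes are escaped, the common control characters get their short forms, and any other character below U+0020 is written as a `\uXXXX` sequence.

```csharp
using System.Text;

// Hypothetical helper, for illustration only.
public static class JsonEscapeSketch
{
    // Escapes a string for use inside a JSON string literal
    // (the surrounding double quotes are not added).
    public static string Escape(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (var c in value)
        {
            switch (c)
            {
                case '"': sb.Append("\\\""); break;
                case '\\': sb.Append("\\\\"); break;
                case '\b': sb.Append("\\b"); break;
                case '\f': sb.Append("\\f"); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default:
                    if (c < ' ')
                    {
                        // Remaining control characters (below U+0020) become \uXXXX.
                        sb.Append("\\u").Append(((int)c).ToString("x4"));
                    }
                    else
                    {
                        // Everything else, including non-ASCII text, is copied as-is.
                        sb.Append(c);
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, the tail of the string from the report, `S`, U+0012, `%5`, would be emitted as `S\u0012%5`, which matches the Newtonsoft.Json output shown in the `Object:` line rather than the raw control character that Elasticsearch rejected.
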
@russcam
Contributor

russcam commented May 10, 2019

I've opened #3744 to address this. Thanks for opening, @lukapor 👍

@russcam
Contributor

russcam commented May 16, 2019

Closing this as #3744 is now merged and will be in the next release.

@russcam russcam closed this as completed May 16, 2019