Treat datetime values as Strings during json deserialization in Json… #562

Merged

rogordon01
Contributor

JsonDataParser currently uses the default Newtonsoft datetime deserialization behavior, which turns datetime strings into .NET DateTime objects. This reformats the supplied value and loses its timezone details. The converted value is what gets supplied to Liquid; it may not be what end users want and could cause further issues downstream.

This PR makes several changes:

  • Updates the JsonDataParser to treat datetime strings as .NET strings by default
  • Allows end users to create JsonDataParser classes that change the deserialization behavior
  • Updates JsonParser to allow different implementations of IDataParser to be supplied during construction
  • Updates the test data to include datetime values and ensures that they are converted correctly
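
To make the behavior change concrete, here is a minimal sketch (not code from this PR) that compares Newtonsoft's default date handling with `DateParseHandling.None`. The property name and the sample timestamp are made up for illustration.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class DateParseHandlingDemo
{
    public static void Main()
    {
        // Hypothetical input; the field name and offset are illustrative only.
        const string json = "{\"birthDate\":\"2001-01-10T11:12:00+05:00\"}";

        // Default behavior: ISO 8601-looking strings are read as date tokens.
        JObject parsedByDefault = JObject.Parse(json);
        Console.WriteLine(parsedByDefault["birthDate"].Type); // Date

        // With DateParseHandling.None the value remains a plain JSON string.
        var settings = new JsonSerializerSettings { DateParseHandling = DateParseHandling.None };
        JObject parsedAsString = JsonConvert.DeserializeObject<JObject>(json, settings);
        Console.WriteLine(parsedAsString["birthDate"].Type);    // String
        Console.WriteLine((string)parsedAsString["birthDate"]); // 2001-01-10T11:12:00+05:00
    }
}
```

Because the second value is never turned into a DateTime, the original text, including its offset, is what a template would receive.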

using Newtonsoft.Json.Linq;

namespace Microsoft.Health.Fhir.Liquid.Converter.Parsers
{
public class JsonDataParser : IDataParser
{
private static Func<string, JsonReader> _defaultJsonReaderGenerator = (json) => new JsonTextReader(new StringReader(json))
{
DateParseHandling = DateParseHandling.None,
Contributor Author

@dustinburson @pallar-ms This is technically a breaking change for anyone using the OSS code. But I'd like to address it here as it's a bug. A couple of ideas:

  • Bump the major version of the oss code to account for the breaking change
  • Keep the existing behavior in OSS, and create different implementations in PaaS products which set this value to DateParseHandling.None.

Thoughts?

Member

Have you looked at how this is defined/used in the FHIR service? I am concerned this won't allow us to change the behavior per request at the FHIR service level.

I am not concerned about a breaking change in the OSS. We can do the necessary version update as you said and add documentation if necessary.

Member

An alternative is to just create a StrictJsonDataParser & StrictJsonProcessor. This would preserve backwards compat but allow us to define which parser we use in the API calls.

Contributor Author

The Fhir Service pulls out the correct IFhirConverter at request time from a map.

After this change the map will hold the 'fixed' datetime logic. The plan is to create a new JsonParser with a DateTimeFormattingJsonDataParser supplied to it and store that in the ConvertDataEngine/FhirServer. This parser will perform the current behavior. Based on the incoming request, we either use this parser or pull from the map.
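
A rough sketch of the constructor injection this relies on, using simplified stand-ins for the real classes. The `Parse(string json)` signature comes from the diff; the settings-based constructors and the `ParseInput` helper are assumptions for illustration, not the merged implementation.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Simplified stand-ins for the types discussed in this PR.
public interface IDataParser
{
    object Parse(string json);
}

public class JsonDataParser : IDataParser
{
    private readonly JsonSerializerSettings _settings;

    // Default: leave datetime-looking values as strings.
    public JsonDataParser()
        : this(new JsonSerializerSettings { DateParseHandling = DateParseHandling.None })
    {
    }

    // Hypothetical extension point so derived parsers can change the settings.
    protected JsonDataParser(JsonSerializerSettings settings)
    {
        _settings = settings;
    }

    public object Parse(string json) => JsonConvert.DeserializeObject<JToken>(json, _settings);
}

// Keeps the previous behavior: datetime strings become DateTime values.
public class DateTimeFormattingJsonDataParser : JsonDataParser
{
    public DateTimeFormattingJsonDataParser()
        : base(new JsonSerializerSettings { DateParseHandling = DateParseHandling.DateTime })
    {
    }
}

public class JsonProcessor
{
    private readonly IDataParser _parser;

    public JsonProcessor() : this(new JsonDataParser())
    {
    }

    public JsonProcessor(IDataParser parser)
    {
        _parser = parser ?? throw new ArgumentNullException(nameof(parser));
    }

    // Hypothetical helper that only exposes the parse step.
    public object ParseInput(string json) => _parser.Parse(json);
}
```

With this shape, `new JsonProcessor()` gets the string-preserving parser and `new JsonProcessor(new DateTimeFormattingJsonDataParser())` gets the legacy one.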

Collaborator

@pallar-ms pallar-ms May 28, 2024

Is the plan to use an API version of the request or another request body input to decide which parser to use?
Btw the convert G2 also loads the processors into the map during app init and on request, depending on the input data type selects the processor. https://microsofthealth.visualstudio.com/Health/_git/convert?path=/convert/core/src/Microsoft.Health.Convert.LiquidConverter/Handlers/LiquidConverterHandler.cs
We can update the key tuple with another value (whatever new input we base this on) and update the map that way too, i.e., we can add JsonProcessor() and JsonProcessor(new DateTimeFormattingJsonDataParser()) to the map with different keys.

Contributor Author

@rogordon01 rogordon01 May 29, 2024

My current idea is to create a new request body input called JsonFhirConversionDatesAsStrings with a boolean or enum type of 'enabled/disabled'.

The map idea is interesting but I can see it getting a bit messy as we would need to add in another dimension to the tuple key, as you've said. I think it would be easier with a conditional:

// ConvertDataEngine class
private readonly JsonProcessor strictJsonProcessor = new JsonProcessor(new DateTimeFormattingJsonDataParser());

private string GetConvertDataResult(ConvertDataRequest convertRequest, ITemplateProvider templateProvider, CancellationToken cancellationToken)
{
    IFhirConverter converter;
    if (convertRequest.JsonFhirConversionDatesAsStrings)
    {
        converter = strictJsonProcessor;
    }
    else
    {
        converter = convertMap.Get(...);
    }

    ...
}

We could also capture this logic inside of a factory to keep it a bit cleaner.

Collaborator

Sure, that would work and I agree keeping it in the factory would be better. Might need to adjust this accordingly - https://github.com/microsoft/FHIR-Converter/blob/main/src/Microsoft.Health.Fhir.Liquid.Converter/Processors/ConvertProcessorFactory.cs

The only reason I proposed the map update was to keep the pattern of choosing the processor from the map consistent. Otherwise, at first glance it looks odd that for one request param, 'JsonFhirConversionDatesAsStrings', we pick the processor differently, while for another request param, 'InputDataType', we use the map. But I'm not strongly advocating it either, since adding to the key tuple is not very extensible or neat if we need another field later.
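
For reference, one way the factory option could look, so the map lookup and the per-request override live in one place. Every type and name here is a hypothetical stand-in; this is not the existing ConvertProcessorFactory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins; only the selection logic matters here.
public enum InputDataType { Hl7v2, Ccda, Json }

public interface IProcessor
{
    string Name { get; }
}

public sealed class NamedProcessor : IProcessor
{
    public NamedProcessor(string name) { Name = name; }

    public string Name { get; }
}

public sealed class ProcessorFactory
{
    private readonly IReadOnlyDictionary<InputDataType, IProcessor> _byDataType;
    private readonly IProcessor _dateTimeFormattingJsonProcessor;

    public ProcessorFactory(
        IReadOnlyDictionary<InputDataType, IProcessor> byDataType,
        IProcessor dateTimeFormattingJsonProcessor)
    {
        _byDataType = byDataType ?? throw new ArgumentNullException(nameof(byDataType));
        _dateTimeFormattingJsonProcessor = dateTimeFormattingJsonProcessor
            ?? throw new ArgumentNullException(nameof(dateTimeFormattingJsonProcessor));
    }

    // The map stays keyed by data type alone; the request flag is handled
    // here instead of becoming another dimension of the key tuple.
    public IProcessor Get(InputDataType dataType, bool useDateTimeFormattingParser)
    {
        if (dataType == InputDataType.Json && useDateTimeFormattingParser)
        {
            return _dateTimeFormattingJsonProcessor;
        }

        return _byDataType[dataType];
    }
}
```

Callers would still make a single `Get` call per request, which keeps the selection pattern consistent without widening the map key.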

@rogordon01 rogordon01 changed the title Treat datetime strings as Strings during json deserialization in Json… Treat datetime values as Strings during json deserialization in Json… May 23, 2024
@rogordon01 rogordon01 requested a review from ms-teli May 23, 2024 22:04
@dustinburson dustinburson self-requested a review May 24, 2024 17:26
@rogordon01 rogordon01 requested a review from pallar-ms May 28, 2024 16:53
using Newtonsoft.Json.Linq;

namespace Microsoft.Health.Fhir.Liquid.Converter.Parsers
{
public class JsonDataParser : IDataParser
{
private static Func<string, JsonReader> _defaultJsonReaderGenerator = (json) => new JsonTextReader(new StringReader(json))
{
DateParseHandling = DateParseHandling.None,
Collaborator

Curious if we can supply other configurable settings in JsonTextReader()?
For now this seems fine since we are looking at datetime specifically, but wondering if we should just use JsonSerializerSettings to configure the deserialization behaviour for more options in the future. e.g.,
JsonConvert.DeserializeObject<JObject>(json, new JsonSerializerSettings { DateParseHandling = DateParseHandling.None });
Unless there are perf hits with using this compared to JsonTextReader.

Contributor Author

@rogordon01 rogordon01 May 29, 2024

I was looking into JsonConvert.DeserializeObject originally but it used JsonTextReader under the hood. And since it didn't appear to set the DateParseHandling flag on the reader I thought it wouldn't work. But I actually tried your suggestion and it appears to be honoring the DateParseHandling flag on the JsonSerializerSettings object. So your suggestion makes sense, I'll update accordingly.

Unsure about any perf impacts. Is there a test/perf harness available where I can try and get numbers?

Contributor Author

I set up some unit tests to quickly evaluate perf. For each deserialization method, I performed 5 runs and captured the overall time. Each run performed 1 million deserializations.

The results are below. There doesn't appear to be much difference between the two approaches, so I'll go with JsonConvert.DeserializeObject.
 

JTokenParseTestAsync
00:00:07.7852149
00:00:06.5830398
00:00:06.9395215
00:00:05.9901044
00:00:06.0748714

JsonConvertTestAsync
00:00:06.4465205
00:00:06.6376552
00:00:06.4552853
00:00:06.4866381
00:00:06.3361394
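
A timing comparison along these lines can be sketched with `Stopwatch`. This is not the test code that produced the numbers above; the sample payload, method names, and iteration count are assumptions, and a proper benchmark would add warm-up and use a benchmarking library.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class DeserializationTiming
{
    // Hypothetical payload, used only to exercise both code paths.
    private const string Json = "{\"id\":\"1\",\"birthDate\":\"2001-01-10T11:12:00+05:00\"}";

    // Approach 1: configure the reader directly.
    public static JToken ViaReader(string json)
    {
        using var reader = new JsonTextReader(new StringReader(json))
        {
            DateParseHandling = DateParseHandling.None,
        };
        return JToken.ReadFrom(reader);
    }

    // Approach 2: pass the setting through JsonSerializerSettings.
    public static JToken ViaJsonConvert(string json) =>
        JsonConvert.DeserializeObject<JToken>(
            json,
            new JsonSerializerSettings { DateParseHandling = DateParseHandling.None });

    // Measures the wall-clock time of repeated parses.
    public static TimeSpan Time(Func<string, JToken> parse, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            parse(Json);
        }

        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 100_000;
        Console.WriteLine($"JsonTextReader: {Time(ViaReader, iterations)}");
        Console.WriteLine($"JsonConvert:    {Time(ViaJsonConvert, iterations)}");
    }
}
```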

Collaborator

@pallar-ms pallar-ms May 29, 2024

Great, thanks for evaluating the perf impact.
I initially wondered if JsonTextReader might be more efficient from a memory standpoint since it streams data but if DeserializeObject is internally calling the JsonTextReader, then should be the same.

@@ -22,7 +42,8 @@ public object Parse(string json)

try
{
return JToken.Parse(json).ToObject();
Collaborator

Does JObject.Parse(json) also have the same behaviour? If so, I do see other places where JObject.Parse is used and would be good to check if a similar setting needs to be applied there too depending on the input being parsed in those cases.

Contributor Author

I'd suspect that all JXXX.Parse methods have the same behavior. There seem to be some default settings in Newtonsoft that are undesirable for our needs.

Agree that we should go through and address this if needed across the project. I suggest doing that work in a separate PR to reduce the scope of this one, which is to address a known customer issue.

We may also want to use this opportunity to move away from Newtonsoft and onto .NET's implementation, which would be a bigger effort.

Collaborator

Makes sense. But just to point out, in the thread you forwarded, Dustin mentioned that this problem exists even in the post processing logic which also impacts the customer's issue being addressed.

Contributor Author

I think there is a potential issue with 'pre/post' processing the result in ADF, and not within Convert/FHIR Server. Looking at the internal PostProcessor of Json it looks like DateParsing is already disabled. I've tested out using Convert/Fhir Server directly and verified that results can come back with the original date format preserved.

@@ -56,7 +69,7 @@ public string Convert(JObject data, string rootTemplate, ITemplateProvider templ
{
var jsonData = data.ToObject();
var result = InternalConvertFromObject(jsonData, rootTemplate, templateProvider, traceInfo);
- var hl7Message = GenerateHL7Message(JObject.Parse(result));
+ var hl7Message = GenerateHL7Message(ConvertToJObject(result));
Contributor Author

@rogordon01 rogordon01 Jun 3, 2024

@pallar-ms @dustinburson This class also uses the updated JsonProcessor which defaults to treating dates as strings. Because of this I updated this post processing step to also treat dates as strings.

We can also add this in a backwards compatible way in the Fhir-Server.

As @pallar-ms mentioned there are a few other places where JObject.Parse is used. We can see if they need to be updated in a separate PR.

Collaborator

FYI, the JsonToHL7v2Processor is only used in the new convert preview APIs and is not supported in the FHIR server $convert.

Contributor Author

Great! That makes things easier

pallar-ms
pallar-ms previously approved these changes Jun 3, 2024
@@ -362,8 +362,8 @@ public void GivenJObjectInput_WhenConvertWithJsonProcessor_CorrectResultShouldBe
{
var processor = _jsonProcessor;
var templateProvider = new TemplateProvider(TestConstants.JsonTemplateDirectory, DataType.Json);
var testData = JObject.Parse(_jsonTestData);
var result = processor.Convert(testData, "ExamplePatient", templateProvider);
var testData = JObject.ReadFrom(new JsonTextReader(new StringReader(_jsonTestData)) { DateParseHandling = DateParseHandling.None });
Collaborator

nit suggestion: Probably doesn't matter for this test, but just for consistency and in case a search all is done to get all references, maybe changing this to also do DeserializeObject() with the serializer settings would help.

Contributor Author

Sure, will do

Contributor Author

Done

@rogordon01 rogordon01 merged commit 253307f into main Jun 3, 2024
2 checks passed
@rogordon01 rogordon01 deleted the personal/rogordon/fixDateDeserializationOnJsonConversion branch June 3, 2024 19:43
4 participants