Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition using schema configuration when caching in multi-threaded scenarios #744

Closed
costas80 opened this issue May 8, 2023 · 3 comments · Fixed by #910
Closed

Comments

@costas80
Copy link
Contributor

costas80 commented May 8, 2023

Class SchemaValidatorsConfig is currently used to define all configuration properties for validation before a JsonSchema instance is loaded. This is set on the schema:

  • When the schema is initialised.
  • If schema caching is enabled, once the schema has been retrieved from the cache.

In the second case, the configuration is set on the schema in case a different configuration was used when the schema was first initialised. This works fine if, using caching, we execute multiple validations sequentially, but is not correct in multi-threaded environments where we have parallel validations (for example a web application). The problem comes from setting the configuration on the cached schema itself which means that it may be changed by one thread while another thread is executing a validation.

As an example of the problem consider a web application using schema caching that allows users to choose the language for their JSON validation report between EN and FR. The following steps take place:

  1. Bob selects EN and triggers a validation. The schema is loaded for the first time with EN as its language and cached.
  2. Bob's validation begins and starts collecting EN validation messages.
  3. Before Bob's validation completes, Alice triggers a validation in FR. The same schema is loaded from the cache and FR is set as the language of choice.
  4. Bob's validation continues to produce messages, but given step 3, these are now in FR.

In brief, the issue is one of a shared concurrent cache with mutable state leading to race conditions. The current workaround for this is to either create a new JsonSchemaFactory per validation, or use a shared one with caching disabled.

To correct the problem and fully support caching we could follow one of two approaches:

  • Option A: When a JsonSchema instance is loaded from the cache, clone it in-depth before returning it, and set the configuration on the clone.
  • Option B: Use a ThreadLocal to set and read the SchemaValidatorsConfig instance.

From the two options above, using a ThreadLocal would be the simplest and is the approach used also elsewhere for similar concerns (see CollectorContext).

@stevehu
Copy link
Contributor

stevehu commented May 9, 2023

We are doing the schema validation in the light-4j framework and it works for us as we don't change the configuration once it is created. That is why we didn't experience the issue like you do. Both option A and B will have extra cost and most people will have the static configuration in a multi-thread environment. Can we brainstorm to find a better option?

@costas80
Copy link
Contributor Author

Fair point. There is however no way around the current design whereby cached schemas are basically the same instances that may be reused concurrently. This needs either different instances per thread execution or something based on ThreadLocals.

What I would suggest to address the (valid) performance concern you mention, is to use a ThreadLocal approach but only if explicitly requested by the user. I would add a new ValidationConfig, or even a second instance of SchemaValidatorsConfig that the user would explicitly pass when calling the factory.getSchema() methods. This second instance would be set and accessed via a ThreadLocal and, if set would override for the current validation run the default configuration set on the initial SchemaValidatorsConfig instance (if any).

Using this approach there is zero impact on performance given that the new per-validation configuration applies only if explicitly set by the user. How does this sound?

@stevehu
Copy link
Contributor

stevehu commented May 19, 2023

I think this should be OK. The default is to use a singleton.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants