-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized deserialization for enum values #95
Comments
From joelittl...@gmail.com on March 13, 2013 13:50:49
When you refer to creating a String in memory, are you talking about calling something like value.toUpperCase() before passing the value into this method? Could we not solve this problem by simply changing the above code snippet to:
Or maybe I have misunderstood what the problem is or the proposed solution? |
From sak...@gmail.com on March 13, 2013 13:54:01 |
From joelittl...@gmail.com on March 13, 2013 16:37:09 I think we do have a penalty at runtime here. It looks to me like we're trading off iteration for regex matching and hash calculation. The question is, how many enum values are required before the latter becomes more efficient (and is the cost ever big enough to warrant this extra complexity). I'd be interested to see some stats on what the performance characteristics are for each approach as the number of enum constants grows. |
From joelittl...@gmail.com on March 13, 2013 16:38:11 |
From sak...@gmail.com on March 13, 2013 16:38:18 |
From joelittl...@gmail.com on March 13, 2013 16:53:42
For every string we parse, we need to perform a regex match and calculate a hash. When the number of constants in the enum is small, iteration may be faster than a regex match plus hash calculation (particularly as the JIT may flatten this iteration into a single set of comparisons). When the number of enum constants is large, I'm sure that iteration will be more costly than regex match plus hash calculation. I don't have a good idea what these performance characteristics would be, but I'd be interested to see the stats as the number of enum constants grows. |
From sak...@gmail.com on March 13, 2013 17:11:49 Current strategy: repeated iteration of enum values for every membership check or parse attempt (you don't currently have a membership check other than the thrown exception) Proposed strategy: one time iteration of enum values to build regex and calculate hashes The current strategy has O(n), or linear, order of complexity. The proposed strategy has O(1), or constant, order of complexity since it uses hashes. Yes, there is a penalty for the proposed strategy of O(n), but it is a one-time penalty. And unless you're concerned with how quickly your JVM loads your classes, it's not really a problem at all. In fact, unless you're in the business of writing an application that let's users load plug-ins at runtime where you've probably written your own class loader or use a IoC container like Spring or Giuce to manage the class loading for you, I cannot imagine where the proposed strategy isn't the preferred one. No doubt it would be interesting to see how long the JVM takes to load enum with 1000 values under this new strategy. But even if it takes a full minute, if you're writing Java for a webapp or a server app, the start time is usually sacrificed without concern for faster runtime operations. |
From joelittl...@gmail.com on March 13, 2013 17:20:28 I don't think I'm leaving out any detail, but I don't know how I can rephrase this another time. Maybe I'm misunderstanding your pseudo-code? The 'Proposed' strategy performs the following on parsing a value:
These are not one-time penalties, correct? To repeat the above: "When the number of constants in the enum is small, iteration may be faster than a regex match plus hash calculation (particularly as the JIT may flatten this iteration into a single set of comparisons). When the number of enum constants is large, I'm sure that iteration will be more costly than regex match plus hash calculation." |
From sak...@gmail.com on March 13, 2013 17:22:41 Yes, these are not one-time penalties. I'm not sure either. I'll test it when I get a chance and upload the test project to github. |
From joelittl...@gmail.com on March 13, 2013 17:31:45 I guess there may be situations in which mechanically generated schemas produce a very large number of enum constants, and this iteration time would become significant. In this case there would be no real workaround other than to replace the enum with a hand-coded version. We should probably avoid replacing this with a case-insensitive mapping, this would not be compatible with the schema definition for enum and would potentially allow bad json data to be bound. This should simplify the implementation though, so if you do come to test this it would be good to test without the regex step too. |
From joelittl...@gmail.com on March 24, 2013 21:37:32 Thanks for raising this sakutz, and please feel free to build a snapshot from the current master and try the new implementation. |
From sak...@gmail.com on April 02, 2013 01:50:18 |
Original author: sak...@gmail.com (March 13, 2013 12:48:52)
The way enums are currently generated is not as effective as it could be, especially when it comes to parsing them from their string values. Check out a quick example I wrote at https://gist.github.com/akutz/5151745. The reason I used both a pattern string and a map was to gain the efficiency of a case-insensitive comparison for testing the validity of the text as an enum vs always transforming the tested string. There is no reason to create a new String in memory for the comparison if we can handle it up-front with a case-insensitive regex pattern.
Original issue: http://code.google.com/p/jsonschema2pojo/issues/detail?id=95
The text was updated successfully, but these errors were encountered: