
Limit deep merging for mergeDeep patches when properties do not conflict #1217

Merged - 6 commits into swagger-api:master on Mar 3, 2018

Conversation

dballance
Contributor

Description

@shockey - Not sure if this is a good change to make or not, but would like to start a conversation to see if we can pull this (or similar change) in. 😄

Basically, with large, cyclical specs, deep-extend cannot handle mergeDeep patches. This change checks the properties object of a mergeDeep patch - and if the props do not conflict, performs a shallow merge instead of a deep merge. This prevents quite a bit of time spent on recursion in deep-extend.

Example of a mergeDeep between valPatch.value and patch.value:

// Do we really need to deeply merge these objects?
// properties in valPatch.value and patch.value don't conflict
valPatch.value = {
  '$$ref': '#/definitions/OData.Entity',
  properties: {
    '@odata.type': {
      type: "string"
    }
  }
}
patch.value = {
  description: 'A base type for all Entities',
  properties: {
    Id: {
      type: "string",
      description: "The unique entity id (key)."
    }
  }
}
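The idea, roughly (a hypothetical sketch, not the PR's actual code - `canShallowMerge` and its argument names are illustrative): compare the keys under each value's `properties` object, and only fall back to a deep merge when they overlap.

```javascript
// Sketch: decide whether a shallow merge is safe for two patch values.
// If no key under `properties` appears on both sides, nothing needs
// recursive merging and Object.assign is enough.
function canShallowMerge(currentValue, patchValue) {
  const currentProps = Object.keys(currentValue.properties || {})
  const newProps = Object.keys(patchValue.properties || {})
  return !newProps.some(key => currentProps.includes(key))
}

const valPatchValue = {
  $$ref: '#/definitions/OData.Entity',
  properties: { '@odata.type': { type: 'string' } }
}
const patchValue = {
  description: 'A base type for all Entities',
  properties: { Id: { type: 'string' } }
}

console.log(canShallowMerge(valPatchValue, patchValue)) // true
```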

deep-extend works great in the above example, but starts to break down once the object trees are sufficiently large. There is an open item on unclechu/node-deep-extend#36 for large objects, but I'm not sure anything can be done there to avoid this issue.

I had a solution that checked all Object properties of valPatch.value. If it found a matching property in patch.value, it compared the object properties to see if any collisions occurred. That was probably more complete and robust, but less efficient. So for now, this only operates on the properties prop of the patch. We may want to add a property table to pull from during these checks so we can ensure completeness - think freelyNamedKeyParents.

Thoughts?

Motivation and Context

This change allows a very large spec to finally completely resolve. This is the same spec from #1150, #1190, and #1209.

How Has This Been Tested?

All existing tests pass.

With a spec sitting at ~5400 lines, spec resolution goes from ~226ms to ~193ms (average of 5 runs).

In the current spec I'm testing with - spec resolution is ~5.5 seconds. I've never been able to resolve this spec before these changes (and changes in #1209)! 🍾 🥂

Tested with a few specs from #1209 in swagger-ui as well - everything looks okay.

Types of changes

  • No code changes (changes to documentation, CI, metadata, etc)
  • Dependency changes (any modification to dependencies in package.json)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@shockey shockey removed this from the January 19, 2018 milestone Jan 20, 2018
@shockey
Contributor

shockey commented Jan 20, 2018

Sorry for kicking the can down the road here!

Before I greenlight this, I'd like to hear from @ponelat - he architected this bit of Swagger-Client 😄

@shockey shockey requested a review from ponelat January 20, 2018 03:51
@dballance
Contributor Author

All good. Like I said, I'm not even sure it's a good change - unsure whether there is a reason a deep clone is needed. IIRC, the patches don't hold references, so merging the patch value without the deep clone is pretty safe, but it's pretty ugly imo to cherry-pick the merge strategy here.

This PR might not be the best change, but I think it's worthwhile optimizing these merges.

BTW - I'm trying to talk to legal on my side to see if we can share our spec externally. It's going to customers at some point, but I'm unsure of any contractual relationship related to the data model. Might be worthwhile to share it for testing. I know there will be more optimizations for swagger-ui as well - and having the spec available may help identify targets for optimization. The spec currently holds upwards of 300 definitions and almost 800 paths. The result of trying to slap REST on a 20+ year old monolith. 😨

@ponelat
Member

ponelat commented Feb 5, 2018

Thanks @dballance !
Just adding my 2c here...

Not sure how the shallow merge will differ (in performance, or effect) from a deep merge if the children props are all different? If the props are the same and their values need to be merged (worst case for deep merge), then I understand the perf hit, and we can (should) optimize for that where it's safe to do so.

We're looking at all sorts of performance boosts, so happy to have someone putting their eyes and time on it too!

PS: I'm not very active on this repo, but will try and respond as often as I can!

@dballance
Contributor Author

dballance commented Feb 5, 2018

@ponelat - Yeah, it's not immediately obvious. The issue stems from the behavior of deepExtend, which recursively traverses and copies the entire object - effectively always doing a deep clone, and never a shallow copy.

So, for large specs, it's always spending time making a deep copy of the patch object, and traversing all paths within that object, regardless of whether they conflict or not. It also recursively calls deepExtend on itself with an empty object in order to clone objects as they are merged.

From node-deep-extend, line 130:

// overwrite by new value if source isn't object or array
} else if (typeof src !== 'object' || src === null || Array.isArray(src)) {
	target[key] = deepExtend({}, val);
	return;
}

In my test data, some patches have a path that is 53 levels of object depth. This is where the performance bottleneck is. With Object.assign, we're just doing a shallow merge, and the entire object tree isn't being traversed.
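A tiny illustration of the difference (the JSON round-trip below just stands in for deepExtend's clone behavior):

```javascript
const source = { nested: { deep: { value: 1 } } }

// Shallow merge: top-level references are copied, the tree is never traversed.
const shallow = Object.assign({}, source)
console.log(shallow.nested === source.nested) // true, same reference

// A deep-extend style merge walks and clones every level of the tree.
const deep = JSON.parse(JSON.stringify(source)) // stand-in for deepExtend({}, source)
console.log(deep.nested === source.nested) // false, freshly cloned
```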

const currentProps = Object.keys(valPatch.value.properties || {})
const newProps = Object.keys(patch.value.properties || {})

const hasConflictingProps = currentProps
ponelat (Member) commented on this diff:

Would it be worthwhile to deepExtend the conflicting props, and assign the missing/non-conflicting ones?
This would replace the if(hasConflictingProps) and the deepExtend call.

// if (hasFreelyNamedFields(patch.path)) { ?? Something that looks at the path array to decide if it's a freely named field.
// Do NOT deeply clone all object values, only those that need to be deeply merged.
for (const key in newProps) {
  if (currentProps.hasOwnProperty(key)) {
    deepExtend(currentProps[key], newProps[key])
  } else {
    currentProps[key] = newProps[key]
  }
}

dballance (Contributor, Author) replied:

Not sure I really follow the changes here, but I do think the logic could be simplified here a little. I'll make a few changes to simplify. 😄

@ponelat
Member

ponelat commented Feb 6, 2018

Aha! That is interesting!
Thanks for taking the time to explain @dballance.
So deepExtend will always fully traverse the object, whether or not we need to merge it. This gives us a full/deep clone. Just trying to use my own words to understand it.

Yeah, I don't see why this can't land. Added a comment that might add a touch more perf (instead of dismissing the patch outright if just one prop conflicts).

Thanks for the efforts here! @shockey I imagine we'd still need a test here, to ensure deep merging occurs when we need it, or is that in the suite already?

@webron webron requested a review from shockey February 20, 2018 03:46
@dballance
Contributor Author

dballance commented Feb 21, 2018

@ponelat @shockey - sorry for taking so long to circle back to this. Apologies in advance for the lengthy explanation; I just want to make sure the change is understood.

The typical problem area here is not the top level of the patch value itself, but the properties key of the patch value. This is the Open API properties schema element seen in this sample.

Because Open API properties can be composed via $allOfs and $refs, in highly graph-like APIs (OData via $expand) you can have very deep trees here. Like I mentioned before, a single property tree in my sample data was 53 levels deep, even with the $ref plugin preventing cyclical references. Our API is just so large that it takes 53 references before a cycle occurs - hooray for legacy monolithic codebases! 🤦‍♂️

So, we're actually digging into the patch value here to see if either patch has a properties key, and whether any of the keys of this properties object are the same. If all the keys are unique, we don't need to deepExtend and can short-circuit this expensive deep traversal.

I've re-implemented the logic here, and it should be more readable and not dig into the actual patch value for some specific property. Basically, we perform some of the tasks of deepExtend on the properties of the patch value, but only deepExtend if an Object property has conflicting keys. This should catch all cases when the current value and patch value share an Object prop, instead of only inspecting Open API properties.

Hope this makes sense. The new logic seems just as performant, but more readable and robust. Tests are 💯.
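That strategy - shallow-assign by default, deep-merge only keys that collide as plain objects on both sides - might look roughly like this (a sketch under assumptions; `selectiveMerge` is an illustrative name, not the merged code):

```javascript
// Sketch: deep-merge only where both sides hold a plain object under the
// same key; everything else gets a cheap shallow assignment.
function selectiveMerge(target, patch, deepExtend) {
  const isPlainObject = v =>
    v !== null && typeof v === 'object' && !Array.isArray(v)

  for (const key of Object.keys(patch)) {
    if (isPlainObject(target[key]) && isPlainObject(patch[key])) {
      deepExtend(target[key], patch[key]) // conflicting object prop: deep merge
    } else {
      target[key] = patch[key] // no conflict: shallow assignment
    }
  }
  return target
}
```

With the example patches from the description, only `properties` collides as an object, so the expensive recursion is confined to that one subtree.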

@ponelat
Member

ponelat commented Feb 28, 2018

lgtm.
Thanks for circling back @dballance !


@shockey shockey left a comment


LGTM

@shockey shockey merged commit b6f59dc into swagger-api:master Mar 3, 2018
@shockey
Contributor

shockey commented Mar 3, 2018

thanks, @dballance!

p.s.: have you had a chance to look at the new lazy resolution changes? i'd love to hear about how your massive document is faring with it.

@dballance
Contributor Author

@shockey - I haven't yet, will try to look this week and send some metrics along.

@dballance
Contributor Author

@shockey - Wanted to let you know that the new lazy resolution + these perf changes look really promising. Our massive spec loads quite quickly in the latest UI. Capturing some numbers now.

@dballance
Contributor Author

Did a few quick tests on http://petstore.swagger.io using my massive spec hosted on localhost. Measurements are from performance.mark timings between the click event on the Explore button and the proper title being displayed on the page. It's not 100% accurate, mostly because it was a quick & dirty check.

Button + Title I'm watching for this metric:
[screenshot: Explore button and page title]

Average load time of ~4625ms, after discarding outliers. Again, this is for an absolutely massive spec.

If I enable some config optimizations (docExpansion='list', defaultModelExpandDepth=0, defaultModelsExpandDepth=0), we get an average load time of ~832ms!

Pretty damn good @shockey! 🎉

Measurements:
w/o config optimizations
[screenshot: timings without config optimizations]

w/ config optimizations
[screenshot: timings with config optimizations]

@webron
Contributor

webron commented Mar 14, 2018

@dballance - it sounds like you're happy ;)

@shockey
Contributor

shockey commented Mar 20, 2018

@dballance awesome!! glad to hear the situation has improved on your end.

you've been a great help with these improvements 😄 if y'all run into any other bottlenecks, feel free to write up a report or open a PR!

@lipnitsk lipnitsk mentioned this pull request Nov 6, 2021