
Limit deep merging for mergeDeep patches when properties do not conflict #1217

Merged - 6 commits into swagger-api:master on Mar 3, 2018

Conversation

dballance
Contributor

Description

@shockey - Not sure if this is a good change to make or not, but would like to start a conversation to see if we can pull this (or similar change) in. 😄

Basically, with large, cyclical specs, deep-extend cannot handle mergeDeep patches. This change checks the properties object of a mergeDeep patch - and if the props do not conflict, performs a shallow merge instead of a deep merge. This prevents quite a bit of time spent on recursion in deep-extend.

Example of a mergeDeep between valPatch.value and patch.value:

// Do we really need to deeply merge these objects?
// properties in valPatch.value and patch.value don't conflict
valPatch.value = {
  '$$ref': '#/definitions/OData.Entity',
  properties: {
    '@odata.type': {
      type: "string"
    }
  }
}
patch.value = {
  description: 'A base type for all Entities',
  properties: {
    Id: {
      type: "string",
      description: "The unique entity id (key)."
    }
  }
}
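The idea, roughly (a hypothetical sketch, not the PR's actual code - `canShallowMerge` and its argument names are illustrative): compare the keys under each value's `properties` object, and only fall back to a deep merge when they overlap.

```javascript
// Sketch: decide whether a shallow merge is safe for two patch values.
// If no key under `properties` appears on both sides, nothing needs
// recursive merging and Object.assign is enough.
function canShallowMerge(currentValue, patchValue) {
  const currentProps = Object.keys(currentValue.properties || {})
  const newProps = Object.keys(patchValue.properties || {})
  return !newProps.some(key => currentProps.includes(key))
}

const valPatchValue = {
  $$ref: '#/definitions/OData.Entity',
  properties: { '@odata.type': { type: 'string' } }
}
const patchValue = {
  description: 'A base type for all Entities',
  properties: { Id: { type: 'string' } }
}

console.log(canShallowMerge(valPatchValue, patchValue)) // true
```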

deep-extend works great in the above example, but starts to break down once the object trees are sufficiently large. There is an open item on unclechu/node-deep-extend#36 for large objects, but I'm not sure anything can be done there to avoid this issue.

I had a solution that checked all Object properties of valPatch.value. If it found a matching property in patch.value, it compared the object properties to see if any collisions occurred. That was probably more complete and robust, but less efficient. So for now, this only operates on the properties prop of the patch. We may want to add a property table to pull from during these checks so we can ensure completeness - think freelyNamedKeyParents.

Thoughts?

Motivation and Context

This change allows a very large spec to finally completely resolve. This is the same spec from #1150, #1190, and #1209.

How Has This Been Tested?

All existing tests pass.

With a spec sitting at ~5400 lines, spec resolution goes from ~226ms to ~193ms (average of 5 runs).

In the current spec I'm testing with - spec resolution is ~5.5 seconds. I've never been able to resolve this spec before these changes (and changes in #1209)! 🍾 🥂

Tested with a few specs from #1209 in swagger-ui as well - everything looks okay.

Types of changes

  • No code changes (changes to documentation, CI, metadata, etc)
  • Dependency changes (any modification to dependencies in package.json)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@shockey shockey removed this from the January 19, 2018 milestone Jan 20, 2018
@shockey
Contributor

shockey commented Jan 20, 2018

Sorry for kicking the can down the road here!

Before I greenlight this, I'd like to hear from @ponelat - he architected this bit of Swagger-Client 😄

@shockey shockey requested a review from ponelat January 20, 2018 03:51
@dballance
Contributor Author

All good. Like I said, I'm not even sure it's a good change - unsure whether there is a reason a deep clone is needed. IIRC, the patches don't hold references, so merging the patch value without the deep clone is pretty safe, but it's pretty ugly imo to cherry-pick the merge strategy here.

This PR might not be the best change, but I think it's worthwhile optimizing these merges.

BTW - I'm trying to talk to legal on my side to see if we can share our spec externally. It's going to customers at some point, but I'm unsure of any contractual relationship related to the data model. Might be worthwhile to share it for testing. I know there will be more optimizations for swagger-ui as well - and having the spec available may help identify targets for optimization. The spec currently holds upwards of 300 definitions and almost 800 paths. The result of trying to slap REST on a 20+ year old monolith. 😨

@ponelat
Member

ponelat commented Feb 5, 2018

Thanks @dballance !
Just adding my 2c here...

Not sure how the shallow merge will differ (in performance, or effect) from a deep merge if the children props are all different? If the props are the same and their values need to be merged (worst case for deep merge), then I understand the perf hit, and we can (should) optimize for that where it's safe to do so.

We're looking at all sorts of performance boosts, so happy to have someone putting their eyes and time on it too!

PS: I'm not very active on this repo, but will try and respond as often as I can!

@dballance
Contributor Author

dballance commented Feb 5, 2018

@ponelat - Yeah, it's not immediately obvious. The issue stems from the behavior of deepExtend, which recursively traverses and copies the entire object - effectively always doing a deep clone, and never a shallow copy.

So, for large specs, it's always spending time making a deep copy of the patch object, and traversing all paths within that object, regardless of whether they conflict or not. It also recursively calls deepExtend on itself with an empty object in order to clone objects as they are merged.

From node-deep-extend, line 130:

// overwrite by new value if source isn't object or array
} else if (typeof src !== 'object' || src === null || Array.isArray(src)) {
	target[key] = deepExtend({}, val);
	return;
}

In my test data, some patches have a path that is 53 levels of object depth. This is where the performance bottleneck is. With Object.assign, we're just doing a shallow merge, and the entire object tree isn't being traversed.
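A tiny illustration of the difference (the JSON round-trip below just stands in for deepExtend's clone behavior):

```javascript
const source = { nested: { deep: { value: 1 } } }

// Shallow merge: top-level references are copied, the tree is never traversed.
const shallow = Object.assign({}, source)
console.log(shallow.nested === source.nested) // true, same reference

// A deep-extend style merge walks and clones every level of the tree.
const deep = JSON.parse(JSON.stringify(source)) // stand-in for deepExtend({}, source)
console.log(deep.nested === source.nested) // false, freshly cloned
```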

const currentProps = Object.keys(valPatch.value.properties || {})
const newProps = Object.keys(patch.value.properties || {})

const hasConflictingProps = currentProps
ponelat (Member) commented on this diff:

Would it be worthwhile to deepExtend the conflicting props, and assign the missing/non-conflicting ones?
This would replace the if(hasConflictingProps) and the deepExtend call.

// if (hasFreelyNamedFields(patch.path)) { ?? Something that looks at the path array to decide if it's a freely named field.
// Do NOT deeply clone all object values, only those that need to be deeply merged.
for (const key in newProps) {
  if (currentProps.hasOwnProperty(key)) {
    deepExtend(currentProps[key], newProps[key])
  } else {
    currentProps[key] = newProps[key]
  }
}

dballance (Contributor, Author) replied:

Not sure I really follow the changes here, but I do think the logic could be simplified here a little. I'll make a few changes to simplify. 😄

@ponelat
Member

ponelat commented Feb 6, 2018

Aha! That is interesting!
Thanks for taking the time to explain @dballance.
So deepExtend will always fully traverse the object, whether or not we need to merge it. This gives us a full/deep clone. Just trying to use my own words to understand it.

Yeah, I don't see why this can't land. Added a comment that might add a touch more perf (instead of dismissing the patch outright if just one prop conflicts).

Thanks for the efforts here! @shockey I imagine we'd still need a test here, to ensure deep merging occurs when we need it, or is that in the suite already?

@webron webron requested a review from shockey February 20, 2018 03:46
@dballance
Contributor Author

dballance commented Feb 21, 2018

@ponelat @shockey - sorry for taking so long to circle back to this. Apologies in advance for the lengthy explanation; I just want to make sure the change is understood.

The typical problem area here is not the top level of the patch value itself, but the properties key of the patch value. This is the Open API properties schema element seen in this sample.

Because Open API properties can be composed via $allOfs and $refs, in highly graph-like APIs (OData via $expand) you can have very deep trees here. Like I mentioned before, a single property tree in my sample data was 53 levels deep, even with the $ref plugin preventing cyclical references. Our API is just so large that it takes 53 references before a cycle occurs - hooray for legacy monolithic codebases! 🤦‍♂️

So, we're actually digging into the patch value here to see if either patch has a properties key, and whether any of the keys of this properties object are the same. If all the keys are unique, we don't need to deepExtend and can short-circuit this expensive deep traversal.

I've re-implemented the logic here, and it should be more readable and not dig into the actual patch value for some specific property. Basically, we perform some of the tasks of deepExtend on the properties of the patch value, but only deepExtend if an Object property has conflicting keys. This should catch all cases when the current value and patch value share an Object prop, instead of only inspecting Open API properties.

Hope this makes sense. The new logic seems just as performant, but more readable and robust. Tests are 💯.
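That strategy - shallow-assign by default, deep-merge only keys that collide as plain objects on both sides - might look roughly like this (a sketch under assumptions; `selectiveMerge` is an illustrative name, not the merged code):

```javascript
// Sketch: deep-merge only where both sides hold a plain object under the
// same key; everything else gets a cheap shallow assignment.
function selectiveMerge(target, patch, deepExtend) {
  const isPlainObject = v =>
    v !== null && typeof v === 'object' && !Array.isArray(v)

  for (const key of Object.keys(patch)) {
    if (isPlainObject(target[key]) && isPlainObject(patch[key])) {
      deepExtend(target[key], patch[key]) // conflicting object prop: deep merge
    } else {
      target[key] = patch[key] // no conflict: shallow assignment
    }
  }
  return target
}
```

With the example patches from the description, only `properties` collides as an object, so the expensive recursion is confined to that one subtree.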

@ponelat
Member

ponelat commented Feb 28, 2018

lgtm.
Thanks for circling back @dballance !


@shockey shockey left a comment


LGTM

@shockey shockey merged commit b6f59dc into swagger-api:master Mar 3, 2018
@shockey
Contributor

shockey commented Mar 3, 2018

thanks, @dballance!

p.s.: have you had a chance to look at the new lazy resolution changes? i'd love to hear about how your massive document is faring with it.

@dballance
Contributor Author

@shockey - I haven't yet, will try to look this week and send some metrics along.

@dballance
Contributor Author

@shockey - Wanted to let you know that the new lazy resolution + these perf changes look really promising. Our massive spec loads quite quickly in the latest UI. Capturing some numbers now.

@dballance
Contributor Author

Did a few quick tests on http://petstore.swagger.io using my massive spec hosted on localhost. Measurements are from performance.mark timings between the click event on the Explore button and the proper title being displayed on the page. It's not 100% accurate, mostly because it was a quick & dirty check.

Button + Title I'm watching for this metric:
[screenshot: Explore button and page title]

Average load time of ~4625ms, after discarding outliers. Again, this is for an absolutely massive spec.

If I enable some config optimizations (docExpansion='list', defaultModelExpandDepth=0, defaultModelsExpandDepth=0), we get an average load time of ~832ms!

Pretty damn good @shockey! 🎉

Measurements:
w/o config optimizations
[screenshot: timings without config optimizations]

w/ config optimizations
[screenshot: timings with config optimizations]

@webron
Contributor

webron commented Mar 14, 2018

@dballance - it sounds like you're happy ;)

@shockey
Contributor

shockey commented Mar 20, 2018

@dballance awesome!! glad to hear the situation has improved on your end.

you've been a great help with these improvements 😄 if y'all run into any other bottlenecks, feel free to write up a report or open a PR!

@lipnitsk lipnitsk mentioned this pull request Nov 6, 2021