Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resync watch resource if receive stop event #6024

Closed
wants to merge 1 commit into from

Conversation

ly5156
Copy link
Contributor

@ly5156 ly5156 commented May 24, 2022

Summary

Fixes #

#5997

Occurred changes and/or fixed issues

Call the dispatch('resyncWatch', obj) instead of the dispatch('watch', obj);

Technical notes summary

The UI should not compare the resourceVersion of a collection with the resourceVersion of a collection element to get the maximum value as the watch's resourceVersion parameter, because the resourceVersion of a collection is not the same concept as the resourceVersion of a collection element, refer to the following Link:
https://kubernetes.io/docs/reference/using-api/api-concepts/#resourceversion-in-metadata

ui code:
https://github.com/rancher/dashboard/blob/master/shell/plugins/steve/subscribe.js#L604-L612

Copy link
Member

@rak-phillip rak-phillip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I didn't encounter the original error while testing, there seems to be a few knock-on effects associated with this change.. Here are at least two differences that I've observed between 2.6-head and this PR. I observed these differences while creating an RKE2 Digital Ocean cluster.

  1. Many instances of Uncaught TypeError: can't access property "match", t is undefined. Here's an example of one such instance:

    Uncaught TypeError: can't access property "match", t is undefined
        haveIds type-map.js:1260
        activeProducts type-map.js:1260
        activeProducts type-map.js:1219
        wrappedGetter vuex.esm.js:881
        partial vuex.esm.js:140
        VueJS 3
        get vuex.esm.js:671
        currentProduct index.js:122
        wrappedGetter vuex.esm.js:881
        partial vuex.esm.js:140
        VueJS 3
        get vuex.esm.js:671
        currentStore index.js:139
        defaultTableSortGenerationFn ResourceTable.vue:20
        safeSortGenerationFn ResourceTable.vue:343
        arrangedRows sorting.js:36
        VueJS 3
        filteredRows filtering.js:41
        VueJS 3
        pagedRows paging.js:36
        VueJS 3
        updateLiveColumns index.vue:559
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
        setTimeout handler*updateLiveColumns index.vue:586
        _liveColumnsTimer index.vue:586
    
  2. The console becomes much more chatty throughout this process (not necessarily a problem, but I think we should understand why). Here's an example of some console chatter that I'm seeing from action.js and subscribe.js that wasn't present before

    Find: [rancher] cloudcredential cattle-global-data:cc-gwlcd [actions.js:272](webpack:///shell/plugins/dashboard-store/actions.js)
    Resync [management] 
    Object { type: "management.cattle.io.setting", resourceType: "management.cattle.io.setting", id: undefined, namespace: undefined, selector: undefined }
    [subscribe.js:287](webpack:///shell/plugins/steve/subscribe.js)
    Find All: [management] management.cattle.io.setting [actions.js:110](webpack:///shell/plugins/dashboard-store/actions.js)
    Resync [management] 
    Object { type: "management.cattle.io.feature", resourceType: "management.cattle.io.feature", id: undefined, namespace: undefined, selector: undefined }
    [subscribe.js:287](webpack:///shell/plugins/steve/subscribe.js)
    Find All: [management] management.cattle.io.feature [actions.js:110](webpack:///shell/plugins/dashboard-store/actions.js)
    Resync [management] 
    Object { type: "management.cattle.io.cluster", resourceType: "management.cattle.io.cluster", id: undefined, namespace: undefined, selector: undefined }
    [subscribe.js:287](webpack:///shell/plugins/steve/subscribe.js)
    Find All: [management] management.cattle.io.cluster [actions.js:110](webpack:///shell/plugins/dashboard-store/actions.js)
    Resync [management] 
    Object { type: "namespace", resourceType: "namespace", id: undefined, namespace: undefined, selector: undefined }
    [subscribe.js:287](webpack:///shell/plugins/steve/subscribe.js)
    Find All: [management] namespace [actions.js:110](webpack:///shell/plugins/dashboard-store/actions.js)
    Resync [management] 
    Object { type: "management.cattle.io.fleetworkspace", resourceType: "management.cattle.io.fleetworkspace", id: undefined, namespace: undefined, selector: undefined }
    [subscribe.js:287](webpack:///shell/plugins/steve/subscribe.js)
    

@ly5156
Copy link
Contributor Author

ly5156 commented May 25, 2022

@rak-phillip
If the cluster is being created, the backend will push a stop message for the relevant resource(ui.cattle.io.navlink,namespace ,schema ...), which will trigger resyncWatch resource and cause the js to report an error.

  1. The schema resource needs to be loaded separately because the ui needs to add _id and _group attributes to the schema resource(https://github.com/rancher/dashboard/blob/master/shell/plugins/dashboard-store/actions.js#L55-L58), which was not done before and resulted in the js error. I have therefore amended the PR and tested myself that it does not report errors anymore.

  2. The console logs come from https://github.com/rancher/dashboard/blob/master/shell/plugins/steve/subscribe.js#L287
    If the resyncWatch method is called the log will be printed

@ly5156 ly5156 requested a review from rak-phillip May 25, 2022 05:37
@richard-cox richard-cox added this to the v2.6.6 milestone May 31, 2022
@rak-phillip rak-phillip self-assigned this Jun 9, 2022
Copy link
Member

@rak-phillip rak-phillip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ly5156 once again, thanks for the PR. This looks to be quite a delicate change, so I want to make sure that its able to get the attention it deserves.

I'm still experiencing some regressions as a result of this change. It looks like we've introduced an intermittent timing issue when provisioning a new cluster with this change. I'm not able to reproduce this issue when using watch.

The issue surfaces itself as:

CreateEditView mixin failed to save:  TypeError: can't access property "id", _this13.value.mgmt is null

In reviewing shell/edit/provisioning.cattle.io.cluster/rke2.vue, I can see that mgmt does end up intermittently returning a null value, but I'm not able to identify why this would happen. This will need to be addressed and it has me wondering if always dispatching resyncWatch is the correct course of action here..

I'm thinking we might need to get a few things in place in order to address the issue we're seeing:

  1. Update our caching mechanism to to distinguish between list and item resources. I'm not seeing a way to improve on this with what's immediately available to us in dashboard. We make use of the type (e.g. catalog.cattle.io.app) and attempt to look up the latestresourceVersion by making use of Math.max().
  2. Identify cache invalidation earlier. Dashboard currently doesn't have any information available to discern if the cache is invalidated unless we receive a message that contains too old in the resource.error event.. I think we'd be able to appropriately make use of the local cache and resyncWatch if we could make use of something like 410 Gone on the resource.stop event. https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes

@rak-phillip
Copy link
Member

@ly5156 thanks again for raising this PR and contributing to Rancher.

I'm closing this PR because we have a working resolution via rancher/steve#49. We still need to follow-up after reviewing our web socket implementation and caching strategy to properly resolve other issues identified in #5997.

@rak-phillip rak-phillip closed this Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants