Gate reconfig of primary to node endpoint changing #3830

Merged: theoilie merged 2 commits into master from theo-gate-primary-reconfig on Sep 7, 2022


@theoilie theoilie commented Sep 6, 2022

Description

Regardless of which reconfig mode is enabled, a reconfig will not be issued if both of the following are true:

  • The reconfig would update the primary
  • The update is caused by anything other than the primary's endpoint changing (an endpoint change is detected by the endpoint's absence from our CNodeEndpoint->SP_ID map)
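
The gate described in these two bullets can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `shouldIssueReconfig`, the `ReplicaSet` type, and the `endpointToSpIdMap` parameter are all hypothetical stand-ins for whatever the content node actually uses.

```typescript
// Hypothetical sketch of the gate; names are illustrative and not taken from the PR diff.
type ReplicaSet = { primary: string; secondary1: string; secondary2: string };

function shouldIssueReconfig(
  current: ReplicaSet,
  proposed: ReplicaSet,
  modeAllowsReconfig: boolean, // assumed: result of the existing reconfig-mode check
  endpointToSpIdMap: Map<string, number> // assumed shape of the CNodeEndpoint->SP_ID map
): boolean {
  // The existing mode check still applies first.
  if (!modeAllowsReconfig) return false;

  // Reconfigs that leave the primary alone are not affected by the gate.
  const primaryIsChanging = current.primary !== proposed.primary;
  if (!primaryIsChanging) return true;

  // The primary may only be replaced when its endpoint is no longer
  // present in the endpoint->SP_ID map, i.e. the node's endpoint changed.
  const primaryEndpointChanged = !endpointToSpIdMap.has(current.primary);
  return primaryEndpointChanged;
}
```

Under these assumptions, an unhealthy primary whose endpoint is still in the map yields `false` (the reconfig is skipped), which matches the manual test result in this description.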

Tests

  • Automated tests pass
  • Manual testing of taking down a user's primary locally shows that it skips issuing the reconfig:
{
  "data": {
    "enqueuedBy": "find-replica-set-updates-queue#12",
    "wallet": "0x95a670da5e5fa236ddb586c71670cbf88f3b502d",
    "userId": 2,
    "primary": "http://cn1_creator-node_1:4000",
    "secondary1": "http://cn3_creator-node_1:4002",
    "secondary2": "http://cn4_creator-node_1:4003",
    "unhealthyReplicas": [
      "http://cn1_creator-node_1:4000"
    ],
    "replicaToUserInfoMap": {
      "http://cn4_creator-node_1:4003": {
        "clock": 2,
        "filesHash": "c7f7aaa9a7d02e52ebad83e616f9081b"
      },
      "http://cn1_creator-node_1:4000": {
        "clock": -1
      },
      "http://cn3_creator-node_1:4002": {
        "clock": 2,
        "filesHash": "c7f7aaa9a7d02e52ebad83e616f9081b"
      }
    },
    "parentSpanContext": {
      "traceId": "a2bc2b46719f8c064b357d30f42b1d9d",
      "spanId": "06abd97381ae7ae6",
      "traceFlags": 1
    },
    "enabledReconfigModes": [
      "RECONFIG_DISABLED",
      "ONE_SECONDARY",
      "MULTIPLE_SECONDARIES",
      "PRIMARY_AND_OR_SECONDARIES"
    ]
  },
  "returnValue": {
    "errorMsg": "",
    "issuedReconfig": false,
    "newReplicaSet": {
      "newPrimary": "http://cn3_creator-node_1:4002",
      "newSecondary1": "http://cn4_creator-node_1:4003",
      "newSecondary2": "http://cn2_creator-node_1:4001",
      "issueReconfig": true,
      "reconfigType": "PRIMARY_AND_OR_SECONDARIES"
    },
    "healthyNodes": [
      "http://cn3_creator-node_1:4002",
      "http://cn2_creator-node_1:4001",
      "http://cn4_creator-node_1:4003"
    ],
    "metricsToRecord": [
      {
        "metricName": "audius_cn_state_machine_update_replica_set_queue_job_duration_seconds",
        "metricType": "HISTOGRAM_OBSERVE",
        "metricValue": 5.603,
        "metricLabels": {
          "result": "skip_update_replica_set",
          "issuedReconfig": "false",
          "reconfigType": "primary_and_or_secondaries"
        }
      }
    ]
  }
}

Monitoring - How will this change be monitored? Are there sufficient logs / alerts?

Monitor logs for primaries being reconfigured: search for "PRIMARY_AND_OR_SECONDARIES", then check whether issuedReconfig is true or false.
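
As a rough illustration of that log check, the sketch below counts issued versus skipped primary reconfigs. It assumes each log line is a single JSON object shaped like the job output in the Tests section; that log format, the `JobLog` interface, and the function name `countPrimaryReconfigs` are assumptions made for this example. Since "PRIMARY_AND_OR_SECONDARIES" also appears in `enabledReconfigModes`, the sketch narrows matches by checking `reconfigType` as well.

```typescript
// Hypothetical log shape, modelled on the job output shown in the Tests section.
interface JobLog {
  returnValue?: {
    issuedReconfig?: boolean;
    newReplicaSet?: { reconfigType?: string };
  };
}

function countPrimaryReconfigs(lines: string[]): { issued: number; skipped: number } {
  let issued = 0;
  let skipped = 0;
  for (const line of lines) {
    // Cheap text filter first, mirroring the manual log search.
    if (!line.includes("PRIMARY_AND_OR_SECONDARIES")) continue;

    let parsed: JobLog;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // ignore lines that are not valid JSON
    }

    // Only count jobs whose proposed reconfig actually involved the primary.
    const reconfigType = parsed.returnValue?.newReplicaSet?.reconfigType;
    if (reconfigType !== "PRIMARY_AND_OR_SECONDARIES") continue;

    if (parsed.returnValue?.issuedReconfig === true) issued++;
    else skipped++;
  }
  return { issued, skipped };
}
```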

@theoilie theoilie added the content-node Content Node (previously known as Creator Node) label Sep 6, 2022
@theoilie theoilie requested a review from SidSethi September 6, 2022 21:44

@dmanjunath dmanjunath left a comment

as a temporary patch this seems fine


@SidSethi SidSethi left a comment

looks good! i know you hadn't tested yet so if you can that'd be great

theoilie commented Sep 7, 2022

verified after a fix and updated the PR description to show that. ready to merge once mad dog passes @SidSethi @dmanjunath


@dmanjunath dmanjunath left a comment

i love easy to follow conditional logic

@theoilie theoilie merged commit 97bf563 into master Sep 7, 2022
@theoilie theoilie deleted the theo-gate-primary-reconfig branch September 7, 2022 18:04
audius-infra pushed a commit that referenced this pull request Sep 8, 2022
## Changelog

- 2022-09-08 [1c8ea27] [C-991] Add entity-manager to native-libs (#3842) [Dylan Jeffers]
- 2022-09-08 [be3521b] Split Up `legacyUtil.js` (#3834) [Johannes Naylor]
- 2022-09-08 [108ee77] [CON-332] Refactor ContentNodeInfoManager and consumers for redis caching (#3819) [Theo Ilie]
- 2022-09-08 [4911903] [PAY-487] DN - Handle premium track access for track apis (#3783) [Saliou Diallo]
- 2022-09-07 [9171f7b] [CON-353] Add tracing visualization (#3837) [Johannes Naylor]
- 2022-09-07 [a6e3a54] Fix prune plays test cutoff time (#3839) [Isaac Solo]
- 2022-09-07 [8cdf4f3] Simplify EM social feature deletes to not rely on existing records (#3816) [Isaac Solo]
- 2022-09-07 [f2c4fc1] Add entity manager social features to libs (#3814) [Isaac Solo]
- 2022-09-07 [600fdeb] Add coalesce for release date sort for favorites query (#3836) [Kyle Shanks]
- 2022-09-07 [97bf563] Gate reconfig of primary to node endpoint changing (#3830) [Theo Ilie]
- 2022-09-07 [4ef1d2b] Bump to version 0.3.66 (#3835) [Cheran]
- 2022-09-06 [7ca4fc2] CON-358 - Skip delisted content and do not fail sync jobs anymore (#3765) [vicky :)]
- 2022-09-06 [cc6f005] Retry on cid verification in findCIDInNetwork (#3831) [vicky :)]
- 2022-09-06 [e99e788] Update findCIDInNetwork (#3825) [vicky :)]
- 2022-09-06 [0c3b906] Optimize Entity Manager DB fetch (#3829) [Isaac Solo]
- 2022-09-06 [e90c7ed] Fix duplicate export causing TypeError (#3827) [Theo Ilie]
- 2022-09-06 [aef1b30] Increase recurring sync queue waiting size from 1000 to 10000 (#3828) [Dheeraj Manjunath]
- 2022-09-06 [93cb0b8] Better logging after syncs (#3820) [Dheeraj Manjunath]
- 2022-09-06 [b09a612] Update random sp selection for reconfig (#3824) [vicky :)]
- 2022-09-06 [0a13201] remove utils.js (#3823) [Johannes Naylor]
- 2022-09-02 [4eca681] fix inconsistency with trailing zero (#3758) [Joseph Lee]
- 2022-09-02 [0b85e0c] Provide generateRecoveryLink status (#3817) [Dylan Jeffers]
- 2022-09-02 [12ee57d] [CON-284] Instrumentation of State Machine Queues (#3782) [Johannes Naylor]
- 2022-09-02 [cfe5fb2] [CON-284] Instrumentation of AsyncProcessingQueue (#3781) [Johannes Naylor]
- 2022-09-02 [9944c4b] Let orphaned data recovery run 24/7 (#3815) [Theo Ilie]
- 2022-09-02 [7e5bab1] Add social features to entity manager (#3795) [Isaac Solo]
- 2022-09-02 [c4ca932] Fix registration on dev setup (#3813) [Cheran]
- 2022-09-01 [40de44b] INF-224 Set ${PROTOCOL_DIR} within provision-dev-env.sh (#3811) [Joaquin Casares]
- 2022-09-01 [83394df] Fix get tracks by removing shared cache (#3810) [Isaac Solo]
- 2022-09-01 [3a9227b] [PAY-549] Write out IP with each call to relay (#3798) [Michael Piazza]
- 2022-09-01 [cf99a87] Clear write locks on init (#3809) [vicky :)]
- 2022-09-01 [11783e3] Update replica set and sync concurrency (#3808) [Dheeraj Manjunath]
- 2022-09-01 [de6fc7d] Add a libs function to save transaction metadata (#3804) [Marcus Pasell]
- 2022-09-01 [a475c31] Make queues limit how many jobs they add to each other (#3807) [Theo Ilie]
- 2022-09-01 [aabd49a] [PAY-587] Add transaction metadata to identity for transaction details (#3773) [Marcus Pasell]
- 2022-09-01 [cd0bb38] Fix typo in enum value (#3806) [Johannes Naylor]
- 2022-09-01 [a7ddb23] remove unneeded mount, since we don't load these modules (#3805) [Joaquin Casares]
- 2022-09-01 [f26f4a4] Better get_feed_es error logging.  Fix fetch related saves + reposts (#3797) [Steve Perkins]
- 2022-08-31 [5fc8f62] Add entity manager metrics (#3767) [Isaac Solo]
- 2022-08-31 [e85b8b1] Bump try higher so that lock is always released (#3803) [vicky :)]
- 2022-08-31 [ab26cf1] INF-227 Release Grafana Alerts to Production (#3772) [Joaquin Casares]
- 2022-08-31 [424f2d1] CON-380 CN make primarySyncFromSecondary work for multi-page exports against nodes without export bugfix (#3796) [Sid Sethi]
- 2022-08-31 [5273bf5] Lower orphaned data recovery sync reqs/sec (#3799) [Theo Ilie]
- 2022-08-31 [411b86b] [CON-284] Instrument tracing on syncQueue and immediateSyncQueue (#3780) [Johannes Naylor]
- 2022-08-31 [ff85889] Bump sdk to v1.0.1 [audius-infra]
Labels
content-node Content Node (previously known as Creator Node) size/S

3 participants