Fix NPE during kubernetes cluster creation when network has rules with ports saved as null on DB #9223

Open
GaOrtiga wants to merge 5 commits into main from fix_NPE_Kubernetes_cluster

Conversation

GaOrtiga (Contributor)

Description

During the creation of firewall rules, if one of the port limits is not specified, it is saved in the database as null, indicating that there is no starting/ending limit.

While creating a Kubernetes cluster, if the selected network has a rule that contains ports saved as null, a NullPointerException is thrown, stopping the execution of the process.

This behaviour has been fixed: any port saved as null is now regarded as being at its respective limit (1 for start ports and 65535 for end ports) during the creation of a Kubernetes cluster.
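
For illustration only, a minimal sketch of the idea, with hypothetical class and method names (not the actual patch), assuming the rule's start/end ports arrive as nullable Integers from the DAO:

```java
// Hypothetical sketch: default null port limits before the conflict check,
// so the validation never dereferences a null port value.
public final class PortRangeDefaults {

    // Boundary values from the description: 1 for a missing start port,
    // 65535 for a missing end port.
    private static final int MIN_PORT = 1;
    private static final int MAX_PORT = 65535;

    // Returns the effective range of a rule whose start/end port columns
    // may be stored as null in the database.
    static int[] effectiveRange(Integer sourcePortStart, Integer sourcePortEnd) {
        int start = (sourcePortStart == null) ? MIN_PORT : sourcePortStart;
        int end = (sourcePortEnd == null) ? MAX_PORT : sourcePortEnd;
        return new int[] {start, end};
    }

    // Overlap check used when validating that the ports required by a
    // Kubernetes cluster do not conflict with an existing firewall rule.
    static boolean conflicts(Integer ruleStart, Integer ruleEnd,
                             int requiredStart, int requiredEnd) {
        int[] range = effectiveRange(ruleStart, ruleEnd);
        return range[0] <= requiredEnd && requiredStart <= range[1];
    }
}
```

With this defaulting in place, the rule's range is always a pair of concrete integers before the comparison, which is what previously caused the NPE.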

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I created a rule in an existing network with a null starting port and an ending port of 10, so that it should not conflict with any of the ports required by the Kubernetes cluster. Before applying the changes, an error was thrown.

I repeated the process after applying the changes and the cluster was successfully created.

@DaanHoogland (Contributor) left a comment

code lgtm


codecov bot commented Jun 12, 2024

Codecov Report

Attention: Patch coverage is 13.04348% with 20 lines in your changes missing coverage. Please review.

Project coverage is 15.81%. Comparing base (b19c069) to head (b3be45e).

Files with missing lines Patch % Lines
...va/com/cloud/network/dao/FirewallRulesDaoImpl.java 0.00% 11 Missing ⚠️
...KubernetesClusterResourceModifierActionWorker.java 0.00% 8 Missing ⚠️
...om/cloud/network/firewall/FirewallManagerImpl.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #9223      +/-   ##
============================================
- Coverage     15.81%   15.81%   -0.01%     
+ Complexity    12554    12553       -1     
============================================
  Files          5629     5629              
  Lines        492028   492041      +13     
  Branches      62750    60065    -2685     
============================================
- Hits          77813    77812       -1     
- Misses       405892   405906      +14     
  Partials       8323     8323              
Flag Coverage Δ
uitests 4.48% <ø> (ø)
unittests 16.60% <13.04%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@GaOrtiga force-pushed the fix_NPE_Kubernetes_cluster branch 2 times, most recently from 1be11ff to a8ce0f2, on June 12, 2024 18:35
@weizhouapache (Member) left a comment

code lgtm

not tested

@borisstoyanov (Contributor)

@blueorangutan package

@borisstoyanov self-assigned this Jun 13, 2024
@blueorangutan

@borisstoyanov a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 9927

borisstoyanov (this comment was marked as outdated)

@borisstoyanov (Contributor) left a comment

I think there's a failing test @GaOrtiga

13:56:56 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.128 s - in com.cloud.kubernetes.version.KubernetesVersionManagerImplTest
13:56:56 [INFO] Running com.cloud.kubernetes.version.KubernetesVersionServiceTest
13:56:56 [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.234 s - in com.cloud.kubernetes.version.KubernetesVersionServiceTest
13:56:57 [INFO] 
13:56:57 [INFO] Results:
13:56:57 [INFO] 
13:56:57 [ERROR] Failures: 
13:56:57 [ERROR]   KubernetesClusterManagerImplTest.validateIsolatedNetworkIpRulesApiConflictingRules Expected exception: com.cloud.exception.InvalidParameterValueException
13:56:57 [ERROR]   KubernetesClusterManagerImplTest.validateIsolatedNetworkIpRulesSshConflictingRules Expected exception: com.cloud.exception.InvalidParameterValueException
13:56:57 [INFO] 
13:56:57 [ERROR] Tests run: 39, Failures: 2, Errors: 0, Skipped: 0
13:56:57 [INFO] 
13:56:57 [INFO] ------------------------------------------------------------------------
13:56:57 [INFO] Reactor Summary for Apache CloudStack 4.20.0.0-SNAPSHOT:

@borisstoyanov self-requested a review June 13, 2024 11:43
@borisstoyanov (Contributor)

thanks @GaOrtiga
@blueorangutan package

@blueorangutan

@borisstoyanov a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9929

@FelipeM525 (Collaborator) left a comment

I tested this PR in a local environment by trying to create a k8s cluster on a network with a firewall rule containing an empty start port.

Before: [screenshot]

After: [screenshots]

@DaanHoogland (Contributor)

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10382

@DaanHoogland (Contributor)

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-10884)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 51258 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9223-t10884-kvm-centos7.zip
Smoke tests completed. 129 look OK, 8 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_role_account_acls_multiple_mgmt_servers Error 2.15 test_dynamicroles.py
test_query_async_job_result Error 102.01 test_async_job.py
test_revoke_certificate Error 0.02 test_certauthority_root.py
test_configure_ha_provider_invalid Error 0.01 test_hostha_simulator.py
test_configure_ha_provider_valid Error 0.04 test_hostha_simulator.py
test_ha_configure_enabledisable_across_clusterzones Error 0.01 test_hostha_simulator.py
test_ha_disable_feature_invalid Error 0.01 test_hostha_simulator.py
test_ha_enable_feature_invalid Error 0.01 test_hostha_simulator.py
test_ha_list_providers Error 0.01 test_hostha_simulator.py
test_ha_multiple_mgmt_server_ownership Error 0.01 test_hostha_simulator.py
test_ha_verify_fsm_available Error 0.01 test_hostha_simulator.py
test_ha_verify_fsm_degraded Error 0.01 test_hostha_simulator.py
test_ha_verify_fsm_fenced Error 0.01 test_hostha_simulator.py
test_ha_verify_fsm_recovering Error 0.01 test_hostha_simulator.py
test_hostha_configure_default_driver Error 0.01 test_hostha_simulator.py
test_hostha_configure_invalid_provider Error 0.01 test_hostha_simulator.py
test_hostha_disable_feature_valid Error 0.01 test_hostha_simulator.py
test_hostha_enable_feature_valid Error 0.01 test_hostha_simulator.py
test_hostha_enable_feature_without_setting_provider Error 0.01 test_hostha_simulator.py
test_list_ha_for_host Error 0.01 test_hostha_simulator.py
test_list_ha_for_host_invalid Error 0.01 test_hostha_simulator.py
test_list_ha_for_host_valid Error 0.01 test_hostha_simulator.py
test_01_host_ping_on_alert Error 0.07 test_host_ping.py
test_01_host_ping_on_alert Error 0.07 test_host_ping.py
test_01_browser_migrate_template Error 15.30 test_image_store_object_migration.py
test_06_purge_expunged_vm_background_task Failure 340.83 test_purge_expunged_vms.py
test_hostha_enable_ha_when_host_disabled Error 5.79 test_hostha_kvm.py
test_hostha_enable_ha_when_host_in_maintenance Error 305.91 test_hostha_kvm.py

@blueorangutan

[SF] Trillian test result (tid-10912)
Environment: kvm-alma8 (x2), Advanced Networking with Mgmt server a8
Total time taken: 52725 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9223-t10912-kvm-alma8.zip
Smoke tests completed. 135 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_06_purge_expunged_vm_background_task Failure 339.14 test_purge_expunged_vms.py
test_03_secured_to_nonsecured_vm_migration Error 396.95 test_vm_life_cycle.py

@DaanHoogland (Contributor)

@borisstoyanov, @FelipeM525 tested this (#9223 (review)); do you still want to test this as well?

@JoaoJandre (Contributor)

@blueorangutan package

@blueorangutan

@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 10641

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 10658

@rohityadavcloud (Member)

@GaOrtiga can you review and address build failures? Thanks.

@GaOrtiga (Contributor, Author)

@blueorangutan package

@blueorangutan

@GaOrtiga a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11098

@weizhouapache (Member)

@GaOrtiga
Can you update the title with a short summary of the issue?

@GaOrtiga changed the title from "fix error during kubernetes cluster creation" to "Fix NPE during kubernetes cluster creation when ports are saved as null on DB" on Sep 12, 2024
@GaOrtiga changed the title from "Fix NPE during kubernetes cluster creation when ports are saved as null on DB" to "Fix NPE during kubernetes cluster creation when network has rules with ports saved as null on DB" on Sep 12, 2024
@GaOrtiga (Contributor, Author)

> @GaOrtiga can you review and address build failures? Thanks.

I have rebased it; everything seems fine now.

> @GaOrtiga Can you update the title with a short summary of the issue?

Title updated.

@weizhouapache (Member)

@blueorangutan package

@blueorangutan

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11103

@DaanHoogland (Contributor)

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11481)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 53922 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9223-t11481-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@DaanHoogland (Contributor)

@borisstoyanov is this lgty now?

@kiranchavala (Contributor) left a comment

@GaOrtiga

If the network state is Implemented, it has an existing firewall rule, and the network already has a router in the Running state, CKS cluster creation fails with the following exception:

[screenshot: 2024-09-23, 4:17 PM]

Logs

2024-09-23 10:44:27,659 DEBUG [c.c.k.c.KubernetesClusterManagerImpl] (qtp253011924-23:[ctx-ca5a29af, ctx-1c410861]) (logid:fd9f9d79) Validating rule with purpose: Firewall for network: 79a2d575-5596-4637-9675-8d39614a8515 with ports: 1-65535
2024-09-23 10:44:27,666 INFO  [c.c.a.ApiServer] (qtp253011924-23:[ctx-ca5a29af, ctx-1c410861]) (logid:fd9f9d79) Network ID: 79a2d575-5596-4637-9675-8d39614a8515 has conflicting firewall rules to provision Kubernetes cluster for API access

There is no issue with CKS cluster creation if the network state is Allocated with an existing firewall rule and the network does not have a running router.

[screenshot: 2024-09-23, 4:28 PM]
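
The log above suggests what is happening: a rule whose start and end ports are both stored as null is now treated as spanning 1-65535, so it overlaps whatever ports the cluster validation requires. A standalone sketch of that reasoning (hypothetical names; 6443 is only an assumed example of the cluster's API access port):

```java
// Illustration of the "conflicting firewall rules" log above, not CloudStack code.
public class WidenedRuleConflict {
    public static void main(String[] args) {
        Integer ruleStart = null;                       // ports as stored in the DB
        Integer ruleEnd = null;
        int start = (ruleStart == null) ? 1 : ruleStart;
        int end = (ruleEnd == null) ? 65535 : ruleEnd;  // rule is now treated as 1-65535
        int clusterPort = 6443;                         // assumed example port required by the cluster
        boolean conflicts = start <= clusterPort && clusterPort <= end;
        System.out.println("conflicts = " + conflicts); // true -> cluster creation is rejected
    }
}
```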

@kiranchavala self-assigned this Sep 23, 2024
9 participants