[skip ci] Add ROBO test plans #7297
Conversation
Force-pushed from d421c42 to a352d6d
Force-pushed from 9e6d7f8 to 6e5edeb
Force-pushed from 6e5edeb to 36bf9bc
Force-pushed from 36bf9bc to ef97184
lgtm - just one small non-blocking comment, take it or leave it
* All steps should succeed.
* In Step 2, the VCH should be placed on a host that satisfies the license and other feature requirements.
* In Steps 3 and 6, containers shouldn't fail to be created/started unless the cluster resources/limits are exhausted.
* In Steps 4 and 7, containers should be placed according to the criteria defined in [Purpose](#purpose). More details are TBD.
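A minimal sketch of the placement expectation in Steps 2, 4 and 7 — purely illustrative, since the actual placement criteria are still TBD; the host model and field names here are assumptions, not VIC internals:

```python
# Sketch: place a containerVM on a host that satisfies the license/feature
# requirements and has enough free resources; fail only when exhausted.
# The host dicts and the "most free memory" tie-breaker are assumptions.

def place(hosts: list[dict], required_mb: int):
    """Return the name of the chosen host, or None if placement must fail."""
    eligible = [
        h for h in hosts
        if h["license_ok"] and h["free_mb"] >= required_mb
    ]
    if not eligible:
        return None  # Steps 3/6: creation fails only when resources exhausted
    return max(eligible, key=lambda h: h["free_mb"])["name"]

hosts = [
    {"name": "esx1", "license_ok": True,  "free_mb": 4096},
    {"name": "esx2", "license_ok": True,  "free_mb": 8192},
    {"name": "esx3", "license_ok": False, "free_mb": 16384},  # wrong SKU
]
print(place(hosts, 2048))  # esx2: most free memory among licensed hosts
print(place(hosts, 9000))  # None: no licensed host has enough free memory
```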
I would actually number the references and refer here to the 2nd ref, not the purpose
Force-pushed from ef97184 to a71fd90
@@ -0,0 +1,36 @@
Test 19-3 - ROBO - VM Placement
We should be able to control host resource consumption and test with that in mind. For example, deploy `progrium/stress` containerVMs that will consume resources in a predictable manner - once the hosts are running at a known level of consumption, deploy test containerVMs (ubuntu, busybox, etc.) and ensure they are placed on the host that we know has the necessary free resources.
Additionally, we need a negative test -- i.e. all hosts are consumed, so deployment fails.
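The scenario above can be sketched as a small model — illustrative only; a real run would drive `docker run progrium/stress` against the VCH, and all sizes and host names here are assumptions:

```python
# Sketch: drive hosts to a known consumption level with stress containerVMs,
# check that a test containerVM lands on the host with free resources, and
# that deployment fails once every host is exhausted (the negative test).

def free_mb(host):
    return host["capacity_mb"] - host["used_mb"]

def deploy(hosts, size_mb):
    """Place on the host with the most free memory; None == deployment fails."""
    best = max(hosts, key=free_mb)
    if free_mb(best) < size_mb:
        return None  # negative test: all hosts consumed
    best["used_mb"] += size_mb
    return best["name"]

hosts = [
    {"name": "esx1", "capacity_mb": 8192, "used_mb": 7168},  # stressed
    {"name": "esx2", "capacity_mb": 8192, "used_mb": 2048},  # mostly free
]
print(deploy(hosts, 1024))  # esx2 has the necessary free resources
for _ in range(6):          # exhaust the cluster with more stress cVMs
    deploy(hosts, 1024)
print(deploy(hosts, 1024))  # None: all hosts consumed, deployment fails
```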
Good call, I'll add these details.
Fixed.
Force-pushed from a71fd90 to 63ba0fb
This is a solid start - as we discussed via Slack we'll need to address the
@@ -15,10 +15,10 @@ This test requires access to VMware Nimbus cluster for dynamic ESXi and vCenter
2. Add the ROBO SKU license to the vCenter appliance
3. Assign the ROBO SKU license to each of the hosts within the vCenter
4. Install the VIC appliance onto one of the hosts in the vCenter
Think this should be explicit about standalone hosts, single host clusters, or multi-host clusters.
# Purpose:
To verify that the total container VM limit feature works as expected in a vSphere ROBO Advanced environment.
This configuration option should not be unique to ROBO - I suggest the main body of this test be in the CI test buckets and this test, in the robo specific setting, uses it in the same way the previous robo test references the regression tests.
I brought this up with @mhagen-vmware since the container VM limit feature doesn't apply just to ROBO, but to vic-machine in general. This pull request is intended only for ROBO-focused test plans. A CI test plan for this would come when closing #6529, for example.
I've updated #6529's acceptance criteria.
I'm not sure I understand this response... but I'm not going to push it at this time. We'll simply end up committing the tests directly into the CI test suites and refactoring this test runbook to reference those.
# Test Steps:
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter with a container VM limit of y
Which cluster are we installing to? Is it random, round robin, each in turn?
We could pick any cluster to install into, unless we want to run this test in each cluster configuration, in which case I can note that in the Environment section above.
I'll make this line specific.
13. Delete/stop some containers so the current container VM count is lower than the set limit
14. Attempt to create/run more containers until the set limit
15. Delete the VCH
Suggest we also try:
- running run/delete in series to ensure that we're only enforcing concurrently running limits
- creating more than y cVMs but not starting them (assuming the enforcement is for running cVMs for the first drop)
- starting and deleting containers at the same time so we stay close to the limit but are bouncing off it, and ensure after we stop the concurrent operations that we can still hit the limit and are not over it - this tests the concurrency of the bound tracking.
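The concurrent bound tracking being described could be sketched as follows — a hypothetical VCH-side counter, not actual VIC code; the class and names are illustrative assumptions:

```python
# Sketch: a thread-safe running-cVM counter enforcing the concurrent limit y.
# Threads create/delete containers while bouncing off the limit; afterwards
# we verify we can still hit the limit exactly and are not over it.
import threading

class RunningLimit:
    def __init__(self, limit):
        self.limit = limit
        self.running = 0
        self.high_water = 0
        self.lock = threading.Lock()

    def start(self):
        with self.lock:
            if self.running >= self.limit:
                return False  # enforcement is on *running* cVMs only
            self.running += 1
            self.high_water = max(self.high_water, self.running)
            return True

    def stop(self):
        with self.lock:
            self.running -= 1

def churn(bound, n):
    # Start and immediately delete, staying close to the limit.
    for _ in range(n):
        if bound.start():
            bound.stop()

y = 4
bound = RunningLimit(y)
threads = [threading.Thread(target=churn, args=(bound, 1000)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
# After the concurrent churn we can still hit the limit and are not over it.
started = sum(bound.start() for _ in range(y + 1))
print(started)                # y starts succeed, the (y+1)th is rejected
print(bound.high_water <= y)  # True: the bound was never exceeded
```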
Fixed - thanks for clarifying!
# Test Steps:
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter
Which cluster/clusters? I'm assuming one of the multi-host clusters.
Correct - one of the multi/single-host clusters. I'll make it specific.
3. Deploy containers that will consume resources predictably (e.g. the `progrium/stress` image)
4. Measure ESX host metrics and gather resource consumption
5. Create and run regular containers such as `busybox`
6. Create and run enough containers to consume all available host resources
now I'm thinking single host cluster... and this test is focused on testing the resource exhaustion case.
But in that case I'd expect another test that ensures we can consume the entire cluster resource.
My intent for this test is to target a particular cluster (multi-host or otherwise) - I'll make the steps clearer and specific.
@@ -15,10 +15,10 @@ This test requires access to VMware Nimbus cluster for dynamic ESXi and vCenter
2. Add the ROBO SKU license to the vCenter appliance
I believe the environment of primary interest has an Enterprise license for VC but ROBO for the hosts.
Thanks - I'm changing this line to `Add the Enterprise license to the vCenter appliance`. cc @mhagen-vmware since he wrote this particular test.
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter
3. Visit the VCH Admin page and verify that the License and Feature Status sections show that required license and features are present
4. Assign a more restrictive license such as ROBO (unadvanced) or Standard that does not have the required features (VDS, VSPC) to vCenter
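The license check in Steps 3-4 amounts to a feature-set containment test. A hedged sketch — the feature names come from this review, but the check itself and its output format are illustrative assumptions, not the VCH Admin implementation:

```python
# Sketch: verify the assigned license carries the features VIC requires.
# REQUIRED reflects the features named in this review (VDS, VSPC).
REQUIRED = {"VDS", "VSPC"}

def license_status(license_features: set) -> str:
    """Return "OK" or a list of missing required features."""
    missing = REQUIRED - license_features
    return "OK" if not missing else "missing: " + ", ".join(sorted(missing))

print(license_status({"VDS", "VSPC", "DRS"}))  # OK (ROBO Advanced-like SKU)
print(license_status({"VDS"}))                 # missing: VSPC (restrictive SKU)
```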
`ROBO (unadvanced)` → `ROBO Standard`
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter
3. Create and start some container services such as nginx, wordpress or a database
4. Run a containerized application with docker-compose
I would like this to be an explicitly multi-container application so we exercise bridge communications, etc.
2. Install the VIC appliance on vCenter
3. Create and start some container services such as nginx, wordpress or a database
4. Run a containerized application with docker-compose
5. For each ESXi host that hosts containerVM(s), disconnect it from vCenter
If we're emulating WAN link outage then it should be all hosts in the cluster.
We should also ensure that the hosts can continue to talk to each other.
I'm unsure what is meant by "disconnect" - it needs to be unexpected from both the VC and ESX side so that we don't end up with polite behaviours.
Might be possible with firewall rules on ESX or in Nimbus. Alternatively, if the VC is addressed via a separate network than the other hosts in the cluster.
Force-pushed from 63ba0fb to 4759438
# Test Steps:
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on a particular (multi/single-host) cluster on vCenter
The use of a single host cluster for this test would be:
a. to ensure we deal cleanly with that scenario
b. to check resource exhaustion behaviour in a simple setting.
We must test in a multi-host cluster as that is what this placement logic is expressly for. These should be pulled out as two distinct variants of the test rather than incidentally noted in the current manner.
It's okay to leave comments in for the multi-host test saying we're not sure of the exact behaviour at this time, as it will be contingent on the algorithm design for the placement, but we should note that we are able to reach whatever cluster utilization level we would expect from the cVM size, cluster capacity and placement logic.
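The expected utilization level mentioned above can be derived from cVM size and cluster capacity alone. A sketch with hypothetical numbers — the real figures depend on the testbed sizing and the eventual placement algorithm:

```python
# Sketch: compute the utilization level we would expect to reach before
# placement starts failing. A cVM cannot span hosts, so per-host remainders
# are stranded capacity. All numbers here are illustrative assumptions.

def expected_utilization(cvm_memory_mb: int, host_capacity_mb: list) -> float:
    """Fraction of total cluster memory consumable by whole cVMs."""
    total = sum(host_capacity_mb)
    placeable = sum((cap // cvm_memory_mb) * cvm_memory_mb
                    for cap in host_capacity_mb)
    return placeable / total

# Three 8 GB hosts, 2 GB cVMs: each host fits 4 cVMs exactly.
print(expected_utilization(2048, [8192, 8192, 8192]))  # 1.0
# 3 GB cVMs: 2 per host, stranding 2 GB on each 8 GB host.
print(expected_utilization(3072, [8192, 8192, 8192]))  # 0.75
```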
2. Install the VIC appliance on vCenter
3. Visit the VCH Admin page and verify that the License and Feature Status sections show that required license and features are present
4. Assign a more restrictive license such as ROBO Standard or Standard that does not have the required features (VDS, VSPC) to vCenter
5. Assign the above license to each of the hosts within the vCenter host
`vCenter host` → `vCenter cluster`
We will also need to decide what we will do if not all hosts in the cluster are licensed with ROBO advanced. Do we place only onto the hosts that have met the requirements (which is kind of what we do with incompletely connected cluster datastores and networks via DRS currently) or do we refuse to install in a heterogeneous cluster. @cgtexmex ?
# Test Steps:
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter
`Install the VIC appliance on vCenter` → `Install a VCH in a cluster`
I'm going to be increasingly pedantic about word usage going forwards, as part of my responsibilities to ensure clear/concise communication. This same comment applies to other uses of this term - VIC appliance is an overloaded term at this time, for example I think you're talking about a VCH here, but you may well be talking about testing the VIC appliance with Harbor/Admiral across a WAN link.
Harbor/Admiral over WAN is a good test and we should likely add a section for it, even if it's a statement that we're explicitly not testing that facet at this time, @cgtexmex.
> VIC appliance is an overloaded term at this time

Agree completely. I'm replacing all occurrences of `VIC appliance` with `VCH` in this change. For this particular test, I'll add some steps to test Harbor/Admiral with Engine through the VIC appliance.
Force-pushed from ad3e4dd to 6d3cfa2
This commit adds test plans for the ROBO support features in a new directory (Group19-ROBO) under manual test cases. The existing ROBO-SKU test has been moved into this directory. The test plans include tests for the container limit feature, placement without DRS, the license/feature checks and WAN connectivity. Fixes vmware#7294
Force-pushed from 6d3cfa2 to 3d6969b