[skip ci] Add ROBO test plans #7297

Merged 2 commits into vmware:master from 7294-robo-testplans on Feb 14, 2018

Conversation

Contributor

@anchal-agrawal anchal-agrawal commented Feb 9, 2018

This commit adds test plans for the ROBO support features in a new
directory (Group19-ROBO) under manual test cases. The existing ROBO-SKU
test has been moved into this directory. The test plans include tests
for the container limit feature, placement without DRS, the license/
feature checks and WAN connectivity.

Fixes #7294

@anchal-agrawal anchal-agrawal force-pushed the 7294-robo-testplans branch 3 times, most recently from d421c42 to a352d6d on February 9, 2018 00:10
@anchal-agrawal anchal-agrawal changed the title [WIP] [skip ci] ROBO test plans [WIP] [skip ci] Add ROBO test plans Feb 9, 2018
@anchal-agrawal anchal-agrawal force-pushed the 7294-robo-testplans branch 2 times, most recently from 9e6d7f8 to 6e5edeb on February 9, 2018 19:12
@anchal-agrawal anchal-agrawal changed the title [WIP] [skip ci] Add ROBO test plans [skip ci] Add ROBO test plans Feb 9, 2018
Contributor

@mhagen-vmware mhagen-vmware left a comment


lgtm - just one small non-blocking comment, take it or leave it

* All steps should succeed.
* In Step 2, the VCH should be placed on a host that satisfies the license and other feature requirements.
* In Steps 3 and 6, containers shouldn't fail to be created/started unless the cluster resources/limits are exhausted.
* In Steps 4 and 7, containers should be placed according to the criteria defined in [Purpose](#purpose). More details are TBD.
Contributor


I would actually number the references and refer here to the 2nd ref, not the purpose

@@ -0,0 +1,36 @@
Test 19-3 - ROBO - VM Placement
Contributor


We should be able to control host resource consumption and test with that in mind. For example, deploy progrium/stress containerVMs that will consume resources in a predictable manner -- once the hosts are running at a known level of consumption, deploy test containerVMs (ubuntu, busybox, etc.) and ensure they are placed on the host that we know has the necessary free resources.

Additionally, we need a negative test -- i.e. all hosts are consumed, so deployment fails.
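A rough sketch of how that could be driven from the docker side (the VCH endpoint variable, container names and resource sizes below are assumptions, not part of the test plan yet):

```
# Assumption: VCH_PARAMS holds the docker endpoint/TLS options for the VCH,
# e.g. "-H <vch-ip>:2376 --tls".

# Pre-load the hosts to a known consumption level with predictable stress workloads.
for i in 1 2 3; do
  docker $VCH_PARAMS run -d --name stress-$i \
    progrium/stress --cpu 2 --vm 2 --vm-bytes 256M --timeout 600s
done

# Once consumption has stabilized, deploy a regular test containerVM.
docker $VCH_PARAMS run -d --name placement-probe busybox sleep 600

# Check which ESX host the containerVM landed on (the VM name pattern is hypothetical);
# it should be the host with the most free resources.
govc vm.info 'placement-probe*' | grep -i 'host:'

# Negative test: with all hosts exhausted, a further deployment should fail cleanly.
docker $VCH_PARAMS run -d busybox sleep 600 && echo "unexpected success" || echo "expected failure"
```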

Contributor Author


Good call, I'll add these details.

Contributor Author


Fixed.

@cgtexmex
Contributor

This is a solid start - as we discussed via Slack, we'll need to address the vic-machine specifics, and we'll have additional testing bubble up as we solidify some items (e.g. events, cache invalidation, inventory folder support, etc.).

@@ -15,10 +15,10 @@ This test requires access to VMware Nimbus cluster for dynamic ESXi and vCenter
2. Add the ROBO SKU license to the vCenter appliance
3. Assign the ROBO SKU license to each of the hosts within the vCenter
4. Install the VIC appliance onto one of the hosts in the vCenter
Member


I think this should be explicit about standalone hosts, single-host clusters, or multi-host clusters.


# Purpose:
To verify that the total container VM limit feature works as expected in a vSphere ROBO Advanced environment.
Member


This configuration option should not be unique to ROBO - I suggest the main body of this test be in the CI test buckets and this test, in the robo specific setting, uses it in the same way the previous robo test references the regression tests.

Contributor Author

@anchal-agrawal anchal-agrawal Feb 12, 2018


I brought this up with @mhagen-vmware since the container VM limit feature doesn't apply just to ROBO, but to vic-machine in general. This pull request is intended only for ROBO-focused test plans. A CI test plan for this would come when closing #6529 for example.

Contributor Author


I've updated #6529's acceptance criteria.

Member


I'm not sure I understand this response... but I'm not going to push it at this time. We'll simply end up committing the tests directly into the CI test suites and refactoring this test runbook to reference those.


# Test Steps:
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter with a container VM limit of y
Member


Which cluster are we installing to? Is it random, round robin, each in turn?

Contributor Author

@anchal-agrawal anchal-agrawal Feb 12, 2018


We could pick any cluster to install into, unless we want to run this test in each cluster configuration - in which case I can note that in the Environment section above.

I'll make this line specific.

13. Delete/stop some containers so the current container VM count is lower than the set limit
14. Attempt to create/run more containers until the set limit
15. Delete the VCH

Member


suggest we also try:

  1. running run/delete in series to ensure that we're only enforcing limits on concurrently running containerVMs.
  2. creating more than y cVMs but not starting them (assuming the enforcement is for running cVMs for the first drop)
  3. starting and deleting containers at the same time so we stay close to the limit but are bouncing off it, and ensuring that after we stop the concurrent operations we can still hit the limit and are not over it - this tests the concurrency of the bound tracking.
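
A sketch of how point 3 could be exercised from the docker CLI; the limit value and the VCH_PARAMS endpoint variable are assumptions:

```
LIMIT=5   # assumption: the container VM limit configured at VCH create time
# Assumption: VCH_PARAMS holds the docker endpoint/TLS options for the VCH.

# Bounce off the limit: start and delete containers concurrently.
for i in $(seq 1 20); do
  ( docker $VCH_PARAMS run -d --name churn-$i busybox sleep 300 || true ) &
  ( docker $VCH_PARAMS rm -f churn-$(( (RANDOM % i) + 1 )) >/dev/null 2>&1 || true ) &
done
wait

# After the concurrent churn stops we should still be able to reach the limit exactly...
while [ "$(docker $VCH_PARAMS ps -q | wc -l)" -lt "$LIMIT" ]; do
  docker $VCH_PARAMS run -d busybox sleep 300 || break
done

# ...and never exceed it: the next run should be rejected.
docker $VCH_PARAMS run -d busybox sleep 300 && echo "limit not enforced" || echo "limit enforced"

# Point 2: creating (but not starting) more than LIMIT containers should succeed
# if only running containerVMs count against the limit.
for i in $(seq 1 $((LIMIT + 2))); do docker $VCH_PARAMS create busybox sleep 300; done
```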

Contributor Author


Fixed - thanks for clarifying!


# Test Steps:
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter
Member


Which cluster/clusters? I'm assuming one of the multi-host clusters.

Contributor Author

@anchal-agrawal anchal-agrawal Feb 12, 2018


Correct - one of the multi/single-host clusters. I'll make it specific.

3. Deploy containers that will consume resources predictably (e.g. the `progrium/stress` image)
4. Measure ESX host metrics and gather resource consumption
5. Create and run regular containers such as `busybox`
6. Create and run enough containers to consume all available host resources
Member


Now I'm thinking single-host cluster... and this test is focused on testing the resource exhaustion case.
But in that case I'd expect another test that ensures we can consume the entire cluster's resources.

Contributor Author


My intent for this test is to target a particular cluster (multi-host or otherwise) - I'll make the steps clearer and specific.
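
For reference, a rough sketch of how steps 3-6 could be driven; the inventory path, cluster name and workload sizes are placeholders, not part of the plan yet:

```
CLUSTER=/dc1/host/cluster1   # placeholder inventory path for the target cluster
# Assumption: VCH_PARAMS holds the docker endpoint/TLS options for the VCH.

# Step 4 baseline: gather per-host consumption before and after loading the cluster.
for h in $(govc ls "$CLUSTER" | grep -v /Resources); do
  govc host.info "$h"
done

# Step 3: predictable load via progrium/stress containerVMs.
docker $VCH_PARAMS run -d progrium/stress --cpu 2 --vm 1 --vm-bytes 512M --timeout 900s

# Step 5: regular containers should still be placed on hosts with free resources.
docker $VCH_PARAMS run -d --name probe busybox sleep 300

# Step 6: keep creating stress containers until the cluster is exhausted; the first
# failure should be a clean placement/resource error, not a hang.
while docker $VCH_PARAMS run -d progrium/stress --vm 2 --vm-bytes 1G --timeout 900s; do
  :
done
```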

@@ -15,10 +15,10 @@ This test requires access to VMware Nimbus cluster for dynamic ESXi and vCenter
2. Add the ROBO SKU license to the vCenter appliance
Member


I believe the environment of primary interest has an Enterprise license for VC but ROBO for the hosts.

Contributor Author


Thanks - I'm changing this line to "Add the Enterprise license to the vCenter appliance". cc @mhagen-vmware since he wrote this particular test.

1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter
3. Visit the VCH Admin page and verify that the License and Feature Status sections show that required license and features are present
4. Assign a more restrictive license such as ROBO (unadvanced) or Standard that does not have the required features (VDS, VSPC) to vCenter
Member


Suggest changing "ROBO (unadvanced)" to "ROBO Standard".

1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter
3. Create and start some container services such as nginx, wordpress or a database
4. Run a containerized application with docker-compose
Member


I would like this to be an explicitly multi-container application so we exercise bridge communications, etc.
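
For example, a minimal two-tier compose file along these lines would exercise bridge communication between containerVMs (image tags, credentials and the published port are placeholders):

```
cat > docker-compose.yml <<'EOF'
version: "2"
services:
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: examplepass   # placeholder credential
      MYSQL_DATABASE: wordpress
  web:
    image: wordpress
    ports:
      - "8080:80"
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_PASSWORD: examplepass
    depends_on:
      - db
EOF

# Assumption: DOCKER_HOST (and any TLS variables) point at the VCH endpoint.
docker-compose up -d

# The web tier must reach the db tier over the bridge network; VCH_IP is a placeholder.
curl -fsS "http://$VCH_IP:8080/" >/dev/null && echo "app reachable"
```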

2. Install the VIC appliance on vCenter
3. Create and start some container services such as nginx, wordpress or a database
4. Run a containerized application with docker-compose
5. For each ESXi host that hosts containerVM(s), disconnect it from vCenter
Member


If we're emulating a WAN link outage then it should be all hosts in the cluster.
We should also ensure that the hosts can continue to talk to each other.

I'm unsure what is meant by disconnect - it needs to be unexpected from both the VC and ESX side so that we don't end up with polite behaviours.

Might be possible with firewall rules on ESX or in nimbus. Alternatively, the VC could be addressed via a separate network from the other hosts in the cluster.
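
One possible way to make the disconnect abrupt from both sides is firewall manipulation on each ESXi host; this is only a sketch, and the ruleset names vary by ESXi version (the ones below are assumptions to be verified):

```
# Run on (or via SSH to) every ESXi host in the cluster: cut management traffic
# to VC while leaving host-to-host traffic on other networks untouched.
esxcli network firewall ruleset list                                          # identify VC-facing rulesets
esxcli network firewall ruleset set --ruleset-id=vpxHeartbeats --enabled=false   # assumed ruleset name
esxcli network firewall ruleset set --ruleset-id=vSphereClient --enabled=false   # assumed ruleset name

# ... run the container workload checks while VC considers the hosts unreachable ...

# Restore connectivity at the end of the test.
esxcli network firewall ruleset set --ruleset-id=vpxHeartbeats --enabled=true
esxcli network firewall ruleset set --ruleset-id=vSphereClient --enabled=true
```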


# Test Steps:
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on a particular (multi/single-host) cluster on vCenter
Member


The use of a single host cluster for this test would be:
a. to ensure we deal cleanly with that scenario
b. to check resource exhaustion behaviour in a simple setting.

We must test in a multi-host cluster as that is what this placement logic is expressly for. These should be pulled out as two distinct variants of the test rather than incidentally noted in the current manner.

It's okay to leave comments in for the multi-host test saying we're not sure of the exact behaviour at this time, as it will be contingent on the algorithm design for the placement, but we should note that we are able to reach whatever cluster utilization level we would expect from the cVM size, cluster capacity and placement logic.

2. Install the VIC appliance on vCenter
3. Visit the VCH Admin page and verify that the License and Feature Status sections show that required license and features are present
4. Assign a more restrictive license such as ROBO Standard or Standard that does not have the required features (VDS, VSPC) to vCenter
5. Assign the above license to each of the hosts within the vCenter host
Member


Suggest changing "vCenter host" to "vCenter cluster".

We will also need to decide what we will do if not all hosts in the cluster are licensed with ROBO Advanced. Do we place only onto the hosts that have met the requirements (which is kind of what we do with incompletely connected cluster datastores and networks via DRS currently), or do we refuse to install in a heterogeneous cluster? @cgtexmex


# Test Steps:
1. Deploy a ROBO Advanced vCenter testbed for both environments above
2. Install the VIC appliance on vCenter
Member


Suggest changing "Install the VIC appliance on vCenter" to "Install a VCH in a cluster".

I'm going to be increasingly pedantic about word usage going forward, as part of my responsibility to ensure clear/concise communication. This same comment applies to other uses of this term - VIC appliance is an overloaded term at this time. For example, I think you're talking about a VCH here, but you may well be talking about testing the VIC appliance with Harbor/Admiral across a WAN link.

Harbor/Admiral over WAN is a good test and we should likely add a section for it, even if it's a statement that we're explicitly not testing that facet at this time. @cgtexmex

Contributor Author


VIC appliance is an overloaded term at this time

Agree completely. I'm replacing all occurrences of VIC appliance with VCH in this change. For this particular test, I'll add some steps to test Harbor/Admiral with Engine through the VIC appliance.

Anchal Agrawal added 2 commits February 14, 2018 12:24
This commit adds test plans for the ROBO support features in a new
directory (Group19-ROBO) under manual test cases. The existing ROBO-SKU
test has been moved into this directory. The test plans include tests
for the container limit feature, placement without DRS, the license/
feature checks and WAN connectivity.

Fixes vmware#7294
@anchal-agrawal anchal-agrawal merged commit 97a7881 into vmware:master Feb 14, 2018
sflxn pushed a commit that referenced this pull request Mar 1, 2018
* Dump dmesg if container rootfs blocks or fails mount (#7260)

This is to enable bridging of the guest side state with the virtual hardware if we
see issues such as disks not presenting on a pvscsi controller or a mount operation
hanging.

* updating priority definitions to include features (#7292)

* Change default fellows for gandalf (#7310)

* Avoid exposing test credentials in 12-01-Delete (#7306)

To avoid exposing test credentials, use the established `Run VIC
Machine Delete Command` keyword, which in turn calls a secret keyword.

This changes the behavior of the test slightly:
 - It no longer checks for the absence of "delete failed" in output.
 - It will wait up to 30 seconds for the deletion to succeed.
 - It will clean up cert files at the end of the deletion.

* Bug fix in API delete test: disable volume store cleanup (#7303)

* Remove volume store cleanup before re-installing VIC appliance using existing volume stores
* Cleanup dangling volume stores on test teardown

* Add logging for image upload (#7296)

* Reduce datastore searches during non-vSAN delete operations (#6951)

* Optimize portlayer volume cache rebuild on startup (#7267)

This commit modifies the portlayer volume cache's rebuildCache func to
only process the volumes from the volume store that is currently being
added to the cache. rebuildCache is invoked for every volume store
during portlayer startup.

Before this change, rebuildCache would process volumes from all volume
stores in the cache every time a volume store was added. This led to
unneeded extra computation which could slow down portlayer startup and
overwhelm NFS endpoints if NFS volume stores are being used.

Fixes #6991

* Added local ci testing target to makefile (#7170)

Make testing locally as friction-free as possible by

1. Adding a makefile target 'local-ci-test'
2. Using TARGET_VCH added in VIC 1.3 to use an existing VCH
3. Using a custom script that doesn't utilize drone so that if
   the test fails and craters, we can still access the logs
4. Parameters can come from env vars, arguments, or secrets file

Resolves #7162

* Added upload progress bar tracker for ISO images. (#7320)

* Added upload progress bar tracker for ISO images.

Removed concurrent upload since it doesn't make any significant performance impact.
When I tried to measure the performance difference with and without concurrent upload,
the results fluctuated over a wide range so no good measurement was possible.

* Document the design for the vic-machine API (#6702)

This document proposes a design for a comprehensive vic-machine API,
the implementation of which will be tracked by #6116.

Subsets of this API (tracked by #5721, #6123, and eventually others)
will be implemented incrementally, and the design will be revised as
those efforts progress to reflect changes to the long-term vision.

* Remove superfluous calls to Set Test VCH Name (#7304)

Several tests explicitly call the `Set Test VCH Name` keyword shortly
after calling `Set Test Environment Variables`.

This can lead to test failures when a VCH name collision occurs;
subsequent tests which re-use the VCH name fail because there may be
leftover certificates from the first VCH with that name.

`Set Test Environment Variables` itself calls `Set Test VCH Name` and
then cleans up old certificate directories. Therefore, the explicit
calls to `Set Test VCH Name` are both superfluous and problematic.

* Ensure that static ip worker is on the same nimbus pod as VC otherwise network connectivity not guaranteed (#7307)

* [skip ci] Add ROBO test plans (#7297)

This commit adds test plans for the ROBO support features in a new
directory (Group19-ROBO) under manual test cases. The existing ROBO-SKU
test has been moved into this directory. The test plans include tests
for the container limit feature, placement without DRS, the license/
feature checks and WAN connectivity.

Fixes #7294

* Add hosts to DVS within the test bed as well (#7326)

* Setup updated for Longevity Tests (#7298)

* Setup updated for Longevity Tests to run on 6.5U1

* [skip ci] Terminate gracefully to gather logs (#7331)

* Terminate gracefully to gather logs

* Remove extra whitespace

* Increase timeout to 70 minutes

* Increase ELM timeout to 70 minutes

* Add repo to slack message since we have multiple repos reporting now (#7334)

* Not sending user credentials with every request (#6382)

* Add concurrent testing tool to tests folder (#6534)

Adds a minimized test case for testing our core vSphere interactions at
varying degrees of concurrency. This is intended to simplify debugging
issues that are suspected to be platform problems, or API usage issues
that are conceptually divorced from the VIC engine product code.

* Refactor Install Harbor To Test Server keyword (#7335)

The secret tag on the `Install Harbor To Test Server` makes it difficult
to investigate failures when they occur.

Only one out of 30+ lines actually uses secret information.

Refactor the keyword to extract the secret information into its own
keyword, allowing the tag to be applied in a more focused way. This is
similar to the pattern used by keywords in `VCH-Util`.

* Add ability to cache generated dependency. (#7340)

* Add ability to cache generated dependency, so not much time wasted during the build process.
* Added documentation to reflect necessary steps to leverage such improvements.

* Ignore not-supported result from RetrieveUserGroups in VC 6.0 (#7328)

* Move build time directives from title to body of PR (#7060)

* Retry the harbor setup as well (#7336)

* Skip non vSphere managed datastores when granting perms (#7346)

* Fix volume leak on group 23 test (#7358)

* Fix github status automation filtering (#7344)

Adds filtering for the event source and consolidates remote API calls.
Details the specific builds and their status for quick reference.

* Drone 0.8 and HaaS updates (#7364)

* Add tether.debug in integration test log bundle (#7422)

* Update the gcs plugin (#7421)

* [skip ci] Suggest subnet/gateway to static ip worker

* Ensure that static ip worker is on the same nimbus pod as VC otherwise network connectivity not guaranteed (#7307)

* Refactored some proxy code to reuse with wolfpack

Refactored the system, volume, container, and stream swagger code
into proxy code.

1) Moved the errors.go from backends to a new folder to be accessed
by all folders outside of the backends folder.
2) Refactored Container proxy and moved from engine/backends to engine/proxy
3) Refactored Volume proxy and moved from engine/backends to engine/proxy
4) Refactored System proxy and moved from engine/backends to engine/proxy
5) Refactored Stream proxy and moved from engine/backends to engine/proxy
6) Adopted some common patterns in all the proxies
7) Moved common networking util calls to engine/networking
8) Fix up unit tests
9) Changed all "not yet implemented messages"
10) Updated robot scripts

More refactoring will be needed to make these proxies less dependent on
docker types and portlayer swagger types.

Helps resolve #7210 and #7232

* Add virtual-kubelet binary to VIC ISO (#7315)

* Start virtual-kubelet inside the VCH (#7369)

* Fix value of the PORTLAYER_ADDR environment variable (#7400)

* Use vic kubelet provider

* Made modifications for virtual kubelet

* Added admin log collection and fix env variable content (#7404)

* Added most-vkubelet target (#7418)