Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge branch 'feature/vic-machine-service' [full ci] #6665

Merged
merged 56 commits into from
Nov 20, 2017

Conversation

zjs
Copy link
Member

@zjs zjs commented Oct 31, 2017

Define a new VCH Management API and implement the subset of that API which is necessary for building a VCH Creation Wizard as a part of the H5 Client Plugin.

Fixes #5721


At this stage, review should focus on understanding this large change and identifying work which may need to be completed before the merge. Feedback raised as a part of this review will be separated out into new GitHub issues for tracking.

Instructions:

  • To build the API server locally, use make serviceapi.
    • To run the resulting API server binary, see bin/vic-machine-server --help.
  • To build the container for the API server, see cmd/vic-machine-server/Dockerfile.
    • To build an OVA which includes the container for the API server, see installer/BUILD.md in the vic-product repository.
  • For information on using this API, see Add tutorial for VCH Management API vic-product#1004.

Known remaining work:

Known testing gaps:

Known bugs:

Known debt:

Believe to be out-of-scope for 1.3:

@zjs zjs self-assigned this Oct 31, 2017
@zjs zjs force-pushed the feature/vic-machine-service branch from e2957ac to 66411cf Compare October 31, 2017 23:19
Copy link
Contributor

@mdubya66 mdubya66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All licenses are ok Approved for vendor changes.

Copy link
Contributor

@mhagen-vmware mhagen-vmware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, nicely done

@zjs zjs force-pushed the feature/vic-machine-service branch 6 times, most recently from 2d7026a to 99035e4 Compare November 7, 2017 01:23
@@ -76,6 +77,12 @@ func (t *Target) HasCredentials() error {
return cli.NewExitError("--target argument must be specified", 1)
}

// assume if a vsphere session key exists, we want to use that instead of user/pass
if t.CloneTicket != "" {
t.URL.User = nil // necessary?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jzt?

if err != nil {
return nil, fmt.Errorf("Validation Error: %s", err)
}
// If dc is not set, and multiple datacenter is available, vic-machine ls will list VCHs under all datacenters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/datacenter is/datacenters are

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix in #6721

// If dc is not set, and multiple datacenter is available, vic-machine ls will list VCHs under all datacenters.
validator.AllowEmptyDC()

_, err = validator.ValidateTarget(ctx, d)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error block can be collapsed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix in #6719

if err != nil {
return nil, fmt.Errorf("Target validation failed: %s", err)
}
_, err = validator.ValidateCompute(ctx, d, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error block can be collapsed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix in #6719

return nil, util.NewError(http.StatusNotFound, fmt.Sprintf("Unable to find VCH %s: %s", d.ID, err))
}

err = validate.SetDataFromVM(validator.Context, validator.Session.Finder, vch, d)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error block can be collapsed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declining to fix (same explanation as #6694 (comment))


// getDatastoreHelper validates the VCH and returns the datastore helper for the VCH. It errors when validation fails or when datastore is not ready
func getDatastoreHelper(op trace.Operation, d *data.Data) (*datastore.Helper, error) {
// TODO (angiew): abstract some of the boilerplate into helpers in common.go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for TODO (username)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue filed to track TODO, reference added in #6720

@@ -125,7 +137,7 @@ func (d *Dispatcher) uploadImages(files map[string]string) error {
switch err.(type) {
// if not found, do nothing
case object.DatastoreNoSuchFileError:
// otherwise force delete
// otherwise force delete
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems out of place - it'd fit better inside the default block since that's what the comment is for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #6719

@@ -125,7 +137,7 @@ func (d *Dispatcher) uploadImages(files map[string]string) error {
switch err.(type) {
// if not found, do nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems out of place - it'd fit better inside the case object.DatastoreNoSuchFileError: block since that's what the comment is for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #6719

@@ -101,6 +101,10 @@ func (v *Build) String() string {
}

func (v *Build) ShortVersion() string {
if v == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this is needed - if v were indeed nil we'd get a SIGSEGV in the caller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, we can say:

if v.Version == "" && v.BuildNumber == "" && v.GitCommit == "" {
    return "unknown"
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If v is nil, we get "unknown".

https://play.golang.org/p/A5t_Ad_q_a

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting - TIL! I found this thread with some discussion: https://groups.google.com/forum/#!topic/golang-nuts/wcrZ3P1zeAk.

cc @matthewavery since we were discussing this in person earlier.

@@ -202,6 +204,7 @@ func (s *Session) Connect(ctx context.Context) (*Session, error) {
}

soapClient := soap.NewClient(soapURL, s.Insecure)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: unneeded whitespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to ignore this to avoid further churn in this file.

anchal-agrawal pushed a commit to anchal-agrawal/vic that referenced this pull request Nov 7, 2017
This commit makes the 'Remove a container and its anonymous volume'
test case more robust by force-removing the container that runs the
/bin/ls command, since it's possible for the container to not have
powered off fully and the container state to not have updated to
'Stopped' before the remove command is issued in the test.

Before this change, it was possible for the test to fail if the
container state wasn't updated on time. This was seen in CI build
14551 for PR vmware#6665.

Since that test snippet's intent (verify that the anonymous volume
can be used by a new container) is unaffected by how the container
is removed, this change is valid.

Enhances vmware#6262
Copy link
Member

@hickeng hickeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some items that must be changed. Other comments are informational or left to your discretion - ping me directly for exposition on specific items.

Makefile Outdated
@@ -365,7 +384,7 @@ $(appliance-staging): isos/appliance-staging.sh $(iso-base)
@$(TIME) $< -c $(BIN)/.yum-cache.tgz -p $(iso-base) -o $@

# main appliance target - depends on all top level component targets
$(appliance): isos/appliance.sh isos/appliance/* isos/vicadmin/** $(vicadmin) $(vic-init) $(portlayerapi) $(docker-engine-api) $(appliance-staging)
$(appliance): isos/appliance.sh isos/appliance/* isos/vicadmin/** $(vicadmin) $(vic-init) $(portlayerapi) $(serviceapi-server) $(docker-engine-api) $(appliance-staging)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the appliance you think it is - this is the VCH endpoint ISO target and definitely doesn't have a dependency on the serviceapi.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #6723

Makefile Outdated
@rm -f ./lib/apiservers/service/restapi/doc.go
@rm -f ./lib/apiservers/service/restapi/embedded_spec.go
@rm -f ./lib/apiservers/service/restapi/server.go
@rm -rf ./lib/apiservers/service/restapi/operations/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you have ./lib/apiservers/service/client and ./lib/apiservers/service/models in the list of directories to be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #6723

@@ -76,6 +77,12 @@ func (t *Target) HasCredentials() error {
return cli.NewExitError("--target argument must be specified", 1)
}

// assume if a vsphere session key exists, we want to use that instead of user/pass
if t.CloneTicket != "" {
t.URL.User = nil // necessary?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Hoping to get an answer here: #6665 (comment)

User string
Password *string
CloneTicket string
Thumbprint string `cmd:"thumbprint"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear why this is being added to the vic-machine command without an option by which to set it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jzt may be able to provide more context, but my assumption is that this is essentially due to the poorly defined separation of responsibilities we have between the CLI code and the API and undue coupling between the CLI code and the code it calls.

I'd propose we attempt to clean this up as a part of #6032.

@@ -37,6 +38,7 @@ const (
uploadMaxElapsedTime = 30 * time.Minute
uploadMaxInterval = 1 * time.Minute
uploadInitialInterval = 10 * time.Second
timeFormat = "2006-01-02T15:04:05-0700"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in lib/constants for consistent reference across components.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add this change to #6718.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to handle separately: #6818

}
model.Runtime.PowerState = string(powerState)

if public := vchConfig.ExecutorConfig.Networks["public"]; public != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct.
See https://github.com/vmware/vic/blob/master/cmd/docker/main.go#L224 for the port choice logic.

The API is always served on the client network role (unless something is very wrong with current vic-machine) which MAY share an interface with the public network role.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #6728 for this.

return operations.NewGetTargetTargetVchVchIDLogDefault(util.StatusCode(err)).WithPayload(err.Error())
}

return operations.NewGetTargetTargetVchVchIDLogOK().WithPayload(output)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it this is NOT expected to stream the log? Is it expected to block or truncate at whatever is the end-of-file when we're reading it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The proposal is to eventually use a more appropriate protocol (WebSockets?) to allow for streaming of logs. (See #6702 for more information.)

return nil, util.NewError(http.StatusNotFound, fmt.Sprintf("Unable to find VCH %s: %s", d.ID, err))
}

if err := validate.SetDataFromVM(validator.Context, validator.Session.Finder, vch, d); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a vch.FolderName method that you could call that tells you the directory name of the VM.
It looks like that has been recently changed to only provide the Base name - I am unsure why and @caglar10ur has just left. However this is the call used in lib/install/management/appliance.go:523.

The ImageStores path MAY be the same, but should not be in the case of vSAN - a VM in vSAN MUST have a dedicated namespace and I believe we place the image VMDKS into their own namespace.

Copy link
Member Author

@zjs zjs Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #6729 for this.


// getAllLogFilePaths returns a list of all log file paths under datastore folder, errors out when no log file found
func getAllLogFilePaths(op trace.Operation, helper *datastore.Helper) ([]string, error) {
res, err := helper.Ls(op, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the implementation of helper.Ls it would be trivial to change it so that you can actually provide a pattern instead of a static name. There is already a MatchPattern defined as part of the search spec.
That would basically eliminate the need for this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #6733 for this.

sort.Strings(paths)

for _, p := range paths {
reader, err := helper.Download(op, p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not be able to access the file if it is open via a different process (for example still being uploaded) if not accessed via the same host. I do not know if this is an issue or not.
If it is then a specific host can be chosen for the operation by setting a target host in the context: c.vm.Datastore.HostContext(op, h)
See

ctx = c.vm.Datastore.HostContext(ctx, h)
for usage.

Copy link
Member Author

@zjs zjs Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #6730 for this.

@hickeng
Copy link
Member

hickeng commented Nov 8, 2017

I've not run through the test code but have done a first pass on the rest of it (high level correctness and flow rather than low level detail).

anchal-agrawal pushed a commit to anchal-agrawal/vic that referenced this pull request Nov 8, 2017
This commit makes the 'Remove a container and its anonymous volume'
test case more robust by force-removing the container that runs the
/bin/ls command, since it's possible for the container to not have
powered off fully and the container state to not have updated to
'Stopped' before the remove command is issued in the test.

Before this change, it was possible for the test to fail if the
container state wasn't updated on time. This was seen in CI build
14551 for PR vmware#6665.

Since that test snippet's intent (verify that the anonymous volume
can be used by a new container) is unaffected by how the container
is removed, this change is valid.

Enhances vmware#6262
anchal-agrawal pushed a commit that referenced this pull request Nov 8, 2017
This commit makes the 'Remove a container and its anonymous volume'
test case more robust by force-removing the container that runs the
/bin/ls command, since it's possible for the container to not have
powered off fully and the container state to not have updated to
'Stopped' before the remove command is issued in the test.

Before this change, it was possible for the test to fail if the
container state wasn't updated on time. This was seen in CI build
14551 for PR #6665.

Since that test snippet's intent (verify that the anonymous volume
can be used by a new container) is unaffected by how the container
is removed, this change is valid.

Enhances #6262
zjs and others added 14 commits November 20, 2017 00:14
Include issue numbers with all TODOs added as a part of the initial
VCH creation API project, as suggested during review of the merge
commit for the VCH creation API feature branch.
Clarify an unclear comment identified during review of the merge
commit for the VCH creation API feature branch.
Remove the serviceapi target as a dependency for the appliance.iso and
ensure all generated code is cleaned up.
Introduce a pair of handlers for deleting VCHs within a vSphere target
or datacenter. By default, deletion includes the VCH and powered off
containers. Deletion may optionally include powered on containers
and/or volume stores as well.

If any containers remain, the VCH is not deleted. If the VCH is not
deleted, the response includes a non-2xx status code.

Define a suite of end-to-end tests which verify the intended deletion
behavior. End-to-end tests do not attempt to verify the behavior of
concurrent operations.
Registry blacklist functionality was designed, but has not been fully
implemented in the engine. Remove references to it from the API.
When creating a VCH via the API, we should not write key/certificate
files to the API server's disk.

This change introduces the behavior as a flag so that it could be
exposed as an option for the CLI at some point in the future, but
does not expose it at this time.
The vic-machine CLI has differing requirements for gateway routing
information, depending on the type of network.

According to the CLI help:
 - a client gateway must specify one or more routing destinations
 - a public gateway must not specify any routing destinations
 - a management gateway must specify one or more routing destinations
 - a container gateway must specify exactly one routing destination

This does not seem to be enforced in code, and may simply be more of a
suggestion about how these gateways should be used than a requirement.

Update the parsing for client, public, and management to support all
zero or more routing destinations in all cases; defer to the existing
ProcessNetwork code to ensure consistent validation behavior now and
in the future.

Additionally update the parsing for container to provide a clear error
message if the expected routing destination is not supplied.
The vic-machine CLI requires that static addresses are specified as a
CIDR, which allows the static address and subnet mask to be supplied
in a compact way on the command line. This pattern does not allow for
static addresses to be expressed in terms of a hostname.

Update the API to match this convention. We may wish to allow for more
flexibility in the API in the future, but there's value in at least
starting with consistent behavior.
The vic-machine CLI allows IP ranges to be expressed in CIDR notation
or as simple ranges in some places, but requires that CIDR notation be
used in others.

Initially there was a hope of making the API behave more consistently,
but that requires changes to underlying logic (some of which is not
well covered by existing tests).

For now, be consistent with the CLI.
The vic-machine CLI supports specifying a "trust level" for each
container network using the --container-network-firewall option.

Support the same functionality in the API.

Additionally, fix a small bug with the way ip ranges are returned
by the inspect API.
Use the client network address, not the public network address, when
displaying the admin portal URI and the docker host information as a
part of API responses.

Additionally, eliminate code duplication.
Additionally, move the binary to a subdirectory of opt to allow the
ISO files to be packaged with it.

See also:
 * http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/opt.html
Log each request and response to a configurable destination. Use
trace.Operation to associate the request and response information
with the logs for the handler to allow for easy debugging, even
when there are several requests being handled concurrently.

Adjust logging in server's main.go to avoid printing to stdout.

Relies on external configuration for log rotation.
@zjs zjs force-pushed the feature/vic-machine-service branch 5 times, most recently from 50d22de to e47dfcb Compare November 20, 2017 08:52
@zjs
Copy link
Member Author

zjs commented Nov 20, 2017

(note to self)

Interesting CI runs:

@zjs zjs force-pushed the feature/vic-machine-service branch from e47dfcb to ad673c0 Compare November 20, 2017 09:10
For each conceptual vic-machine operation, create a trace.Operation
with an operation-local logger. Emit log messages using the log
methods on the operation instead of logrus global methods. This will
write to the vic-machine.log using the global trace.Logger and to
the user using the operation-local logger.

This ensures that both the CLI and API can print log messages to both
a local file and the user (via a TTY, if the CLI, and the datastore).
 * In the CLI, use the operation-local logger to print to the console
   (using the default Logrus TTY formatting) and write to both the
   datastore and vic-machine.log file with the global trace.Logger
   (using VIC's custom log formatter).
 * In the API, use the the operation-local logger to write to the
   datastore and write to the server's console with the global
   trace.Logger. In both cases, use VIC's custom log formatter.

Update code in packages used by vic-machine to accept a context,
coerce/convert it into a trace.Operation, and use its log methods.

Improve readability of logs for concurrent operations by including a
operation/context in trace.Begin calls when one is available.

Retain configuration for the global logrus logger in vic-machine so
that we "fail ugly" (i.e., have lines with mismatched log formats) on
the CLI if something uses the old pattern, but don't lose information.

Update tests which parse log messages to understand this new format
instead of or, where necessary, in addition to old format(s).
@zjs zjs force-pushed the feature/vic-machine-service branch 3 times, most recently from 110545c to dc30dd0 Compare November 20, 2017 13:19
@zjs zjs merged commit dc30dd0 into master Nov 20, 2017
zjs added a commit that referenced this pull request Nov 20, 2017
Define a new VCH Management API and implement the subset of that API
which is necessary for building a VCH Creation Wizard as a part of the
H5 Client Plugin.
AngieCris pushed a commit to AngieCris/vic that referenced this pull request Nov 20, 2017
This commit makes the 'Remove a container and its anonymous volume'
test case more robust by force-removing the container that runs the
/bin/ls command, since it's possible for the container to not have
powered off fully and the container state to not have updated to
'Stopped' before the remove command is issued in the test.

Before this change, it was possible for the test to fail if the
container state wasn't updated on time. This was seen in CI build
14551 for PR vmware#6665.

Since that test snippet's intent (verify that the anonymous volume
can be used by a new container) is unaffected by how the container
is removed, this change is valid.

Enhances vmware#6262
@andrewtchin andrewtchin deleted the feature/vic-machine-service branch February 21, 2018 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants