
[skip ci] live streaming VCH creation log to VCH datastore folder #6403

Merged

Conversation

@AngieCris (Contributor) commented Sep 21, 2017

Fixes #6036

Goal
Enable live VCH creation log streaming to VCH datastore folder

Design details
A new package, vchlog, which handles log streaming, is added under lib/install/.
vchlog provides a vchlog.Run() function that blocks until the VCH datastore folder is available, then uploads the contents of a buffered pipe (carrying the log messages) to the datastore folder via govmomi.
vchlog is initialized, and vchlog.Run() is called, in cmd/vic-machine/main.go in a goroutine.

The govmomi upload function spins in a loop, reading from the stream until EOF. By the implementation of the buffered pipe within vchlog, a read blocks until a write has finished (i.e., until new data is available). Thus log messages are read and uploaded to the datastore concurrently, while they are still being written (a "real-time" stream).

Remaining work

  1. Enable live log streaming for other commands
    Currently only vic-machine create can live-stream logs to the VCH datastore folder. For other commands, this PR is a no-op.
  2. Investigate operation ID for each vic-machine operation

Can people please review this PR? Thanks! @zjs @andrewtchin @jzt

@zjs (Member) left a comment

It seems bufferedPipe.go could be simplified by using an io.Pipe and a bufio.Reader instead of trying to replicate their functionality. Is there a reason that is not feasible?

@@ -139,6 +140,12 @@ func main() {
logs = append(logs, f)
}

// create VCH logger for streaming creation logs to datastore folder
@zjs (Member) commented Sep 21, 2017

It seems like the logging framework we're using, logrus, uses hooks as an extensibility model. Could our new logging functionality be simplified by following that pattern? (To be clear: I'm not asserting that we should use the hook pattern, just making sure we've considered it.)

@AngieCris (Contributor, Author) commented Sep 21, 2017

@zjs A hook is essentially a callback function that gets called every time a log message is written. Datastore.Upload, provided by govmomi, overwrites the original content, so we can't upload from within a hook (every log.Infof, log.Debugf, and the like would be followed by an upload, and each previous log message would be overwritten by the next).
Also, we know exactly when the datastore folder becomes ready, while a callback hook is applied to all log messages alike; it's better to signal at that specific point in time than to put the logic in a general callback hook.
(And yes, I discussed this with Hasan before and experimented with adding a hook. On the datastore, the log file ended up containing just one line. That's how I found out govmomi's Datastore.Upload does not append.)

@AngieCris (Contributor, Author) commented Sep 21, 2017

@zjs Writes to the pipe are still blocked: before the datastore path is ready, there is no read request (no Datastore.Upload is called), even if the read side is buffered. The blocking for io.PipeReader and io.PipeWriter is two-way: reads block until a writer is ready, and writes block until a reader is ready.
Also, I think it's the write requests that need to be buffered, not the reads. Before the datastore folder is ready, there are many writes to the log stream that need to be buffered; the read requests come only after that, issued by Datastore.Upload via govmomi.
For the other direction, I tried wrapping the io.PipeWriter with a bufio.Writer, but after inspecting the source code, that writer just accumulates everything in a buffer and won't send it to the underlying writer (for the reader to retrieve) without flushing. Since every log.Infof, log.Debugf, and the like is a write, I could do the flushing in a logrus callback hook, but I think it's neater to implement our own logging package.

@AngieCris (Contributor, Author) commented:
Update:

The log file name now contains a unique operation ID for every vic-machine process.
It still only works for VCH creation, not the other vic-machine commands, because VCH creation is partially asynchronous. This update is in preparation for future development of live log streaming for the other VCH commands.

@AngieCris AngieCris self-assigned this Sep 21, 2017
@zjs (Member) commented Sep 21, 2017

any read from io.PipeReader is blocked until a write is done, even after wrapping it with bufio.Reader

Isn't this the behavior we want? That is, don't we expect the upload process reading the stream to block between writes?

(And isn't this blocking behavior what is implemented? https://github.com/vmware/vic/pull/6403/files/cb5c21cbe8e2ff8019595792d1d1f0bfaec50b13#diff-d6efab20244e6d6571da7b6b44edce17R45)

@AngieCris (Contributor, Author) commented Sep 21, 2017

@zjs I just edited the comment (my bad, I wasn't clear at first, sorry): we want the writes to be buffered (i.e., not blocked), not the reads, so even if we wrap the io.PipeReader with a bufio.Reader, the second write would still block.
As I mentioned in the comment, I tried wrapping the io.PipeWriter with a bufio.Writer, but the writes won't be flushed automatically to the underlying writer stream (so the reader can receive them) unless we explicitly call writer.Flush(). We can't wait to flush until everything is written, because we want it "real-time"; on the other hand, we can't flush everywhere either.

// suffix the log file name with caller operation ID as vic-machine process identifier
callerOp := trace.NewOperation(sig.Context, "vic-machine")
opID := callerOp.ID()
logFileNameWithOpID := sig.LogFileName + "_" + opID
@zjs (Member) commented:
Can we include a timestamp here for easy sorting of the log files? (Maybe a startTime could be stored as a part of the operation created by trace.NewOperation?)

@AngieCris (Contributor, Author) commented:

Good idea. The operation package currently does not have a timestamp field, but I can pass one as part of the signal to the logger along with the operation. Do we want the timestamp as Unix nanoseconds/milliseconds, or as year-month-day?

sig := <-signalChan

// suffix the log file name with caller operation ID as vic-machine process identifier
callerOp := trace.NewOperation(sig.Context, "vic-machine")
@zjs (Member) commented:
This seems like the wrong place to be calling trace.NewOperation. Shouldn't this code be passed a context where the operation has already been created? (That way the operation id in the log messages matches the operation id in the filename.)

@AngieCris (Contributor, Author) commented Sep 22, 2017

Yes, you're right. I wasn't clear on the operation concept before; this is definitely not a good place to call trace.NewOperation.
Fixing it.

"github.com/vmware/vic/pkg/trace"
)

// VCHCreatedSignal serves as a signal struct indicating datastore folder path is available
@zjs (Member) commented:
Can we come up with a more accurate name for this? (We're signaling that the datastore folder is created, not the VCH; we're only part of the way through the VCH creation process.)

@AngieCris (Contributor, Author) commented:

thanks, good catch

@zjs zjs force-pushed the feature/vic-machine-service branch from 92aa233 to 6e38a9b Compare September 22, 2017 20:22
@AngieCris AngieCris force-pushed the vchlivecreatelogs branch 2 times, most recently from 81111c0 to eb7171b Compare September 25, 2017 17:10
@zjs (Member) left a comment

This looks good to me!

@AngieCris (Contributor, Author) commented Sep 25, 2017

Clarification on the design of the self-implemented log package (bufferedPipe):

We have the following goals for the logger that streams VCH creation logs to the VCH datastore folder:

  1. Buffer an undetermined (possibly unbounded) number of writes.
  2. Block reads when there is no data to read (waiting for new data to arrive).

The reasons are:

  1. The VCH datastore folder path becomes available quite late in the process, after parameter parsing, network configuration, pool checking, etc. Before it is available, an uncertain number of log messages are generated, and we should buffer them rather than drop them. Once the datastore folder path is ready, we begin consuming the buffered data.
  2. The logger does not write continuously. Since the VCH creation process spans packages/layers, the necessary context switches introduce lag between some writes. Because Datastore.Upload reads in a tight loop, the data would be exhausted during a context switch, before the next write arrives, so reads must block when the data is exhausted.
    If reads did not block, the log messages would be exhausted during a context switch, and further reads would return EOF, terminating the stream early and producing incomplete log files.

Per 1. above, we need a bytes.Buffer as the underlying structure, since it grows and buffers data.
Per 2., reads always block unless the logger pipe has been explicitly closed. The use of sync.Cond in the bufferedPipe package enables this via Wait() and Signal().

Why Go's standard libraries do not work:

  1. Use io.Pipe and wrap the io.PipeReader with a bufio.Reader:
    bufio.Reader does not buffer writes.
  2. Use io.Pipe and wrap the io.PipeWriter with a bufio.Writer:
    This does buffer writes, but nothing is flushed to the underlying pipe (for the reader to grab) unless we explicitly call writer.Flush(). That means we would flush the buffered writes at the moment the datastore path becomes available, and then flush on every single write after that point (because we want it "real-time").
    Since every log.Infof, log.Debugf, and the like is a write, the only way to do this is via a callback log hook. But a hook is applied to all writes alike; we can't tweak it to act differently for log messages in different situations.
  3. Use an intermediate bytes.Buffer that is consumed and produced at the same time:
    The main issue is that reads must block. As described above, if reads do not block during a context switch, the early EOF produces incomplete log files.

@AngieCris AngieCris requested review from mdharamadas1 and emlin and removed request for mdharamadas1 September 26, 2017 14:55
datastoreReadySignal := vchlog.DatastoreReadySignal{
Datastore: d.session.Datastore,
LogFileName: "vic-machine-create",
Operation: trace.NewOperation(d.ctx, "vic-machine create"),
A Contributor commented:

I don't see much value in the operation ID here, because you're writing logs to different files for different operations at different times.

@AngieCris (Contributor, Author) commented:

There could be more than one vic-machine process running at the same time; to prevent overwriting, we include the operation ID together with the timestamp.

@zjs zjs force-pushed the feature/vic-machine-service branch 2 times, most recently from 9ecfbab to 5511407 Compare September 27, 2017 21:28
@AngieCris AngieCris force-pushed the vchlivecreatelogs branch 3 times, most recently from 709af07 to 9ee460a Compare September 27, 2017 22:08
@zjs zjs force-pushed the feature/vic-machine-service branch from 5511407 to 75bbbc0 Compare September 28, 2017 22:17
@mhagen-vmware (Contributor) left a comment

Can you please add at least one integration test as well? It should just need a govc call, maybe in the existing regression test?

@mhagen-vmware (Contributor) left a comment

lgtm

@AngieCris AngieCris merged commit 9e44bd5 into vmware:feature/vic-machine-service Oct 2, 2017
zjs pushed a commit to zjs/vic that referenced this pull request Oct 9, 2017
…ware#6403)

* clean up after rebase

* integration test added to 6-04

* integration test md file modified

* integration test md file change