ETW exporter how-to - usage instructions document #628

Merged on Mar 30, 2021 (6 commits)
201 changes: 201 additions & 0 deletions exporters/etw/README.md
@@ -0,0 +1,201 @@
# Getting Started with OpenTelemetry C++ SDK and ETW exporter on Windows

Event Tracing for Windows (ETW) is an efficient kernel-level tracing facility that lets you log kernel or
application-defined events to a log file. You can consume the events in real time or from a log file and
use them to debug an application or to determine where performance issues are occurring in the application.

The OpenTelemetry C++ SDK ETW exporter allows code instrumented with the OpenTelemetry API to forward events
to an out-of-process ETW listener for subsequent recording or forwarding to alternate pipelines and
flows. The Windows Event Tracing infrastructure is available to any vendor or application deployed on
Windows.

## API support

These are the features planned to be supported by ETW exporter:

- [x] OpenTelemetry Tracing API and SDK headers are **stable** and moving towards GA.
- [ ] OpenTelemetry Logging API is work-in-progress, pending implementation of [Latest Logging API spec here](https://github.com/open-telemetry/oteps/pull/150)
- [ ] OpenTelemetry Metrics API is not implemented yet.

The OpenTelemetry C++ SDK ETW exporter for Windows is implemented as a `header-only` library:

- full definitions of all macros, functions and classes comprising the library are visible to the compiler
in a header file form.
- implementation does not need to be separately compiled, packaged and installed in order to be used.

All that is required is to point the compiler at the location of the headers and then `#include` the header
files in the application source. The compiler's optimizer can often do a better job when all of the library's
source code is available. Several of the options below may be turned on to build the code against the
standard C++ library, the Microsoft Guidelines Support Library, or the Google Abseil Variant library, or to
enable support for non-standard features, such as 8-bit byte arrays, which allow a performance-efficient
representation of binary blobs on the ETW wire.

## Example project

The following include directories are required, relative to the top-level source tree of OpenTelemetry C++ repo:

- api/include/
- exporters/etw/include/
- sdk/include/

The following code instantiates an ETW `TracerProvider`, obtains a `Tracer` bound to `OpenTelemetry-ETW-Provider`,
and emits a span named `MySpan` with attributes on it, as well as an event named `MyEvent1` within that span.

```cpp

#include <map>
#include <string>

#include "opentelemetry/exporters/etw/etw_tracer_exporter.h"

using namespace OPENTELEMETRY_NAMESPACE;
using namespace opentelemetry::exporter::ETW;

// Supply unique instrumentation name (ETW Provider Name) here:
std::string providerName = "OpenTelemetry-ETW-Provider";

exporter::ETW::TracerProvider tp;

int main(int argc, const char* argv[])
{
  // Obtain a Tracer object for instrumentation name.
  // Each Tracer is associated with unique TraceId.
  auto tracer = tp.GetTracer(providerName, "TLD");

  // Properties is a helper class in ETW namespace that is otherwise compatible
Member:

Sorry if I am missing something, but where is the helper class Properties defined in the ETW namespace? I had a quick look at the code but didn't find it there.

@maxgolov (Contributor Author), Mar 25, 2021:

Sorry, I might have missed the using aliases. I'll refresh it. It's presently defined here, in a PR which contains the code. I wanted to scope this PR to documentation only. This is the code:

using Properties       = opentelemetry::exporter::ETW::Properties;
using PropertyValue    = opentelemetry::exporter::ETW::PropertyValue;
using PropertyValueMap = opentelemetry::exporter::ETW::PropertyValueMap;

Going forward I want to benchmark the pros and cons of passing a container vs. initializer list, the topic we discussed with Johannes above. In general, the concept is not necessarily unique to ETW. It could be applied to other exporters. Maybe I'd propose a refactor of pulling it in a separate (optional) helper class on API surface. Users may still use the existing initializer lists as well, as this does not break the "API". There is a moment where we have to be careful about "ABI". But since the implementation is header-only and linked straight into the app, with the matching runtime, the "ABI" compatibility issue is not applicable in this scenario. I can elaborate on this offline.

@maxgolov (Contributor Author), Mar 25, 2021:

I also want to make sure that even if we pass in a reference to a map, if there is a need to involve a transform to an API call across the ABI boundary, the container itself is KeyValueIterable-interface compatible. This means the SDK code may iterate over it in an ABI-safe manner and copy all key-value pairs to the actual container beyond the ABI boundary. This will only be needed for other (non-ETW) exporters; the ETW exporter is lucky in this regard, as it is currently implemented fully header-only and slim, with very few hops to the actual "Event Write" routine and no shared library. Maybe I can contribute an example that can fork the incoming event properties container (Event) to both ETW and some other destination, such as the Standard Output exporter, for illustrative purposes (in the contrib repo). Basically, an example that shows how to forward that container and dual-home it to two TracerProviders: a MetaTraceProvider that aggregates two different OpenTelemetry provider implementations.

  // with Key-Value Iterable accepted by OpenTelemetry API. Using Properties
  // should enable more efficient data transfer without unnecessary memcpy.
Comment on lines +65 to +67
Contributor:

That's a bit misleading in the context of this example. As far as I can see, none of the usages of Properties here is more efficient than what the API provides out of the box. There's no memcpy involved in this case:

span->AddEvent(eventName, {
    {"uint32Key", (uint32_t)1234},
    {"uint64Key", (uint64_t)1234567890},
    {"strKey", "someValue"}
});

@maxgolov (Contributor Author), Mar 25, 2021:

The answer really depends. Consider cases where we need to decorate the properties with additional attributes, such as passing non-const properties or appending some common properties, maybe a set of common properties unique to a custom TracerProvider. For example, some sort of custom correlation technique that needs to be appended on every record: say, a custom field called cV stamped on every Properties object passed in, or a group of other device.* or app.* fields appended on top. Then we could reuse the original container that was created once in the customer code, passing it around as non-const by reference on that same thread. This would not require a transform to Recordable or key-value iteration on it, nor constructing it in the SDK from an initializer list passed as an argument.

I think I should have moved this comment from Attributes down to event Properties below, to make a stronger case 😄 Because with Attributes, it appears that what you are suggesting is indeed looking nicer. Maybe I should capture both. I can remove the comment if you don't like it, but I see scenarios where it would be more efficient.

@maxgolov (Contributor Author), Mar 25, 2021:

I'll leave this code reference as an example where accepting the collection, then passing the original collection around by reference, decorating it with additional contents, stripping unneeded fields, etc. would be more efficient than constructing a new object (or a copy of an object) from an initializer list. I think I understand your concern that I should have elaborated on exactly which scenarios may (in future) benefit from passing in the original collection. This should work great for a header-only library, as we may later completely bypass the copy/iteration/creation of a new object from an initializer list and operate on the original object created by customer code.

Structure-wise, both initializations look similar: initializer list as an argument to template, or a reference to container that's been already populated in customer code.

We mostly follow this pattern in this "other" library elsewhere:
https://github.com/microsoft/cpp_client_telemetry/blob/master/lib/api/Logger.cpp
And the practice of passing around the container prepared in customer code, e.g. literally passing around a modifiable map of variants, has been working well, especially when the original collection passed to the SDK needs additional "decoration".

@maxgolov (Contributor Author), Mar 25, 2021:

Here is another scenario: let's say your collection does not change and contains something like appName=x,status=Failed,scenario=FileOpen. You'd prepare it once in a static initializer. Then you pass it as a reference to Properties to AddEvent, never destroying it, thus - never needing to construct/reconstruct it. Appears like in the other case (if you used the initializer list instead of passing collection by reference), you'd end up constructing it from Key-Value Iterable every time? It appears like construct-once-then-reuse will be more optimal then, at least for a statically linked code where there is no passing across ABI boundary needed, no key-value iterable transform. I can add some benchmarks to measure those scenarios in my other (code) PR.

@lalitb (Member), Mar 26, 2021:

Can we add the specific use case/scenario as an example in this doc or in a unit test (and refer to it here) to justify the performance statement? In the current api/sdk implementation, the only place where memcpy MAY happen is during Recordable::AddEvent() execution, but that would depend on how the exporter implements this logic.

@maxgolov (Contributor Author), Mar 27, 2021:

the only place where memcpy MAY happen is during Recordable::AddEvent() execution

Well, that is the most commonly invoked function, actually. In ETW use case - it is entirely header-only, and it entirely avoids transform to Recordable upfront. The idea is that Properties may not even need to be transformed to Recordable. Because it stays valid as a container - for the duration of invocation of synchronous AddEvent call, that immediately performs serialization of event from original container + decoration into either ETW/TraceLogging (binary packed in standard TraceLogging format) or ETW/MessagePack. That (transform + IPC) is done right away on user calling thread with no background batching.

In the case of another Key-Value iterable container, e.g. a map, there is a need to iterate over each key-value pair (as you mentioned, in the case of a batching exporter and Recordable). I'll illustrate the difference between when the memcpy transform is unavoidable and where the original container may be reused.

@maxgolov (Contributor Author), Mar 27, 2021:

I'm pushing an update to my code PR to explain this further ( +1 benchmark, yay! ), but that won't change the statement made here.


  // Span attributes
  Properties attribs =
  {
    {"attrib1", 1},
    {"attrib2", 2}
  };

  // Start Span with attributes
  auto span = tracer->StartSpan("MySpan", attribs);

  // Emit an event on Span
  std::string eventName = "MyEvent1";
  Properties event =
  {
    {"uint32Key", (uint32_t)1234},
    {"uint64Key", (uint64_t)1234567890},
    {"strKey", "someValue"}
  };
  span->AddEvent(eventName, event);

  // End Span.
  span->End();

  // Close the Tracer on application stop.
  tracer->CloseWithMicroseconds(0);

  return 0;
}
```

Note that different `Tracer` objects may be bound to different ETW destinations.

## Build options and Compiler Defines

When including the OpenTelemetry C++ SDK with the ETW exporter, customers are in complete control of
which options to enable for their project through `Preprocessor Definitions`.

These options affect how the OpenTelemetry C++ SDK code embedded in the application is compiled:

| Name | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------------|
| HAVE_CPP_STDLIB | Use STL classes for API surface. This option requires at least C++17. C++20 is recommended. Some customers may benefit from STL library provided with the compiler instead of using custom OpenTelemetry `nostd::` implementation due to security and performance considerations. |
| HAVE_GSL | Use [Microsoft GSL](https://github.com/microsoft/GSL) for `gsl::span` implementation. Library must be in include path. Microsoft GSL claims to be the most feature-complete implementation of `std::span`. It may be used instead of `nostd::span` implementation in projects that statically link OpenTelemetry SDK. |
| HAVE_ABSEIL_VARIANT | Use `absl::variant` instead of `nostd::variant`. `nostd::variant` is incompatible with Visual Studio 2015. `absl::variant` should be used instead if targeting Visual Studio 2015. |
| HAVE_TLD | Use ETW/TraceLogging Dynamic protocol. This is the default implementation compatible with existing C# "listeners" / "decoders" of ETW events. This option requires an additional optional Microsoft MIT-licensed `TraceLoggingDynamic.h` header. |
| HAVE_CSTRING_TYPE | Allow passing C-string type `const char *` to API calls. In several scenarios (e.g. integration with C code) it is more performant to construct `nostd::string_view` from original C-string instead of constructing it from `std::string` copy of the C-string, thus avoiding unnecessary `memcpy`. |
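
The `HAVE_CSTRING_TYPE` row can be illustrated with a small sketch. It uses `std::string_view` as a stand-in for `nostd::string_view` (an assumption, so that the snippet builds without the SDK), and `EventNameData` is a hypothetical function rather than an SDK call. The sketch only shows that a view built directly from a C-string refers to the original bytes, whereas going through `std::string` first creates a copy:

```cpp
#include <string>
#include <string_view>

// Hypothetical API entry point that takes a view of the event name.
// Returning the data pointer lets the caller see which buffer was used.
inline const char *EventNameData(std::string_view name)
{
  return name.data();
}

// Path 1: the view is constructed straight from the C-string, no copy.
inline bool ViewSharesBuffer(const char *cstr)
{
  return EventNameData(cstr) == cstr;
}

// Path 2: the C-string is first copied into std::string, so the view
// points at the copy rather than at the original buffer.
inline bool CopyUsesNewBuffer(const char *cstr)
{
  std::string copy(cstr);
  return EventNameData(copy) != cstr;
}
```

Whether the saved copy matters in practice depends on the call frequency and string lengths in the instrumented code.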

Member:

Nicely summarized the build options. A few of the options ( HAVE_ABSEIL_VARIANT, HAVE_CPP_STDLIB, HAVE_ABSEIL_VARIANT ) are applicable as build instructions for opentelemetry-cpp. Do you think we can move them there ( https://github.com/open-telemetry/opentelemetry-cpp/blob/main/INSTALL.md#building-as-standalone-cmake-project ), give its reference, and keep the rest of them here?

Contributor Author:

Yes, but I want to have a separate document created that covers ALL options. Maybe a separate section in INSTALL.md as you are suggesting. For this one - I tried to keep it scoped to what is relevant in this concrete usage scenario.

Contributor:

Besides the description of the options, could we add an extra column listing the pros and even cons of using each option, to help users make a decision?

Contributor Author:

I like the idea of elaborating on what this option is doing. But maybe in that same description column.

Contributor Author:

I added a few more sentences. HAVE_CPP_STDLIB has its own separate detailed design doc here:
https://github.com/open-telemetry/opentelemetry-cpp/blob/main/docs/building-with-stdlib.md

## Debugging

### ETW TraceLogging Dynamic Events

ETW supports a mode called "Dynamic Manifest", where an event may contain strongly-typed
key-value pairs with primitive types such as `integer`, `double`, `string`, etc. This mode
is optional and requires the `TraceLoggingDynamic.h` header.

See the complete [list of ETW types](https://docs.microsoft.com/en-us/windows/win32/wes/eventmanifestschema-outputtype-complextype#remarks).

OpenTelemetry C++ ETW exporter implements the following type mapping:

| OpenTelemetry C++ API type | ETW type |
|----------------------------|-----------------|
| bool | xs:byte |
| int (32-bit) | xs:int |
| int (64-bit) | xs:long |
| uint (32-bit) | xs:unsignedInt |
| uint (64-bit) | xs:unsignedLong |
| double | xs:double |
| string | win:Utf8 |

Support for arrays of primitive types is not implemented yet.
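
The mapping in the table can be sketched as a visitor over a variant. This is only an illustration that assumes C++17: `SketchValue` and `EtwTypeName` are hypothetical names, and the real exporter works with the OpenTelemetry attribute value type and the TraceLogging APIs rather than with strings.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical stand-in for an attribute value holding the types from the table.
using SketchValue =
    std::variant<bool, int32_t, int64_t, uint32_t, uint64_t, double, std::string>;

// Returns the ETW type name that the table lists for the held alternative.
inline const char *EtwTypeName(const SketchValue &value)
{
  struct Visitor
  {
    const char *operator()(bool) const { return "xs:byte"; }
    const char *operator()(int32_t) const { return "xs:int"; }
    const char *operator()(int64_t) const { return "xs:long"; }
    const char *operator()(uint32_t) const { return "xs:unsignedInt"; }
    const char *operator()(uint64_t) const { return "xs:unsignedLong"; }
    const char *operator()(double) const { return "xs:double"; }
    const char *operator()(const std::string &) const { return "win:Utf8"; }
  };
  return std::visit(Visitor{}, value);
}
```

For example, `EtwTypeName(SketchValue{uint32_t{1234}})` selects the `xs:unsignedInt` row.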

In Visual Studio 2019, `View -> Other Windows -> Diagnostic Events` can be used to capture, in a live view,
events that are emitted by the instrumented application and sent to the ETW provider.
The instrumentation name passed to the `GetTracer` API above corresponds to the `ETW Provider Name`.
If the instrumentation name contains a GUID, i.e. it starts with a curly brace, e.g.
`{deadbeef-fade-dead-c0de-cafebabefeed}`, then the parameter is assumed to be the
`ETW Provider GUID`.
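
The name-versus-GUID rule described above can be sketched as follows. `ClassifyProviderId` and `ProviderIdKind` are hypothetical names used for illustration only; the exporter's own check may differ in detail (for example, it may also validate the GUID format).

```cpp
#include <string>

// Hypothetical classification of the instrumentation name passed to GetTracer.
enum class ProviderIdKind
{
  Name,  // treated as an ETW Provider Name
  Guid   // treated as an ETW Provider GUID
};

// A leading curly brace marks the string as a GUID; anything else is a name.
inline ProviderIdKind ClassifyProviderId(const std::string &id)
{
  return (!id.empty() && id.front() == '{') ? ProviderIdKind::Guid
                                            : ProviderIdKind::Name;
}
```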

Click on `Settings` and add the provider to monitor either by its name or by its GUID. In the
example above, the provider name is `OpenTelemetry-ETW-Provider`. Please refer to the Diagnostic Events
usage instructions [here](https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-how-to-monitor-and-diagnose-services-locally#view-service-fabric-system-events-in-visual-studio)
to learn more. Note that running an ETW Listener in Visual Studio requires elevation, i.e.
Visual Studio prompts you to confirm that you agree to run the ETW Listener process as
Administrator. This is a limitation of ETW Listeners: they must run as a privileged process.

### ETW events encoded in MessagePack

The OpenTelemetry ETW exporter can optionally encode the incoming event payload using the
[MessagePack](https://msgpack.org/index.html) compact binary protocol. ETW/MsgPack encoding
requires the [nlohmann/json](https://github.com/nlohmann/json) library to be included in the build
of the OpenTelemetry ETW exporter. Any recent version of `nlohmann/json` is compatible with the ETW
exporter. For example, the version included in the `third_party/nlohmann-json` directory may be
used.
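
To give a feel for the MessagePack wire format, here is a minimal hand-rolled sketch that encodes a small map of short strings. It is not how the exporter produces its payload (that goes through `nlohmann/json`), and it does not reflect the exporter's actual event layout; `AppendFixStr` and `EncodeSmallMap` are hypothetical helpers that cover only the `fixmap` and `fixstr` formats of the MessagePack specification.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Appends a MessagePack "fixstr": one byte 0xa0 | length, then the raw bytes.
// Only strings shorter than 32 bytes fit this format.
inline bool AppendFixStr(std::vector<uint8_t> &out, const std::string &s)
{
  if (s.size() > 31)
  {
    return false;
  }
  out.push_back(static_cast<uint8_t>(0xa0 | s.size()));
  out.insert(out.end(), s.begin(), s.end());
  return true;
}

// Encodes up to 15 string pairs as a MessagePack "fixmap": one byte
// 0x80 | count, then each key and value in turn.
// Returns an empty buffer if the input does not fit these compact formats.
inline std::vector<uint8_t> EncodeSmallMap(const std::map<std::string, std::string> &fields)
{
  std::vector<uint8_t> out;
  if (fields.size() > 15)
  {
    return out;
  }
  out.push_back(static_cast<uint8_t>(0x80 | fields.size()));
  for (const auto &kv : fields)
  {
    if (!AppendFixStr(out, kv.first) || !AppendFixStr(out, kv.second))
    {
      return {};
    }
  }
  return out;
}
```

Encoding the single pair `strKey` / `someValue` yields 18 bytes: the map header `0x81`, the key header `0xa6` followed by six key bytes, and the value header `0xa9` followed by nine value bytes.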

There is currently **no built-in decoder available** for this format. However, there is an ongoing
effort to include the ETW/MsgPack decoder in the [Azure/diagnostics-eventflow](https://github.com/Azure/diagnostics-eventflow)
project, which may be used as a side-car listener to forward incoming ETW/MsgPack events to many
other destinations, such as:

- StdOutput (console output)
- HTTP (json via http)
- Application Insights
- Azure EventHub
- Elasticsearch
- Azure Monitor Logs

And community-contributed exporters:

- Google Big Query output
- SQL Server output
- ReflectInsight output
- Splunk output

[This PR](https://github.com/Azure/diagnostics-eventflow/pull/382) implements the `Input adapter`
for OpenTelemetry ETW/MsgPack protocol encoded events for Azure EventFlow.

Other standard tools for processing ETW events on Windows OS, such as:

- [PerfView](https://github.com/microsoft/perfview)
- [PerfViewJS](https://github.com/microsoft/perfview/tree/main/src/PerfViewJS)

will be augmented in the future with support for ETW/MsgPack encoding.

## Addendum

This document needs to be supplemented with additional information:

- [ ] mapping between OpenTelemetry fields and concepts and their corresponding ETW counterparts
- [ ] links to E2E instrumentation example and ETW listener
- [ ] Logging API example
- [ ] Metrics API example (once Metrics spec is finalized)
- [ ] example how ETW Listener may employ OpenTelemetry .NET SDK to 1-1 transform from ETW events back to OpenTelemetry flow
- [ ] links to NuGet package that contains the source code of SDK that includes OpenTelemetry SDK and ETW exporter