Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide on requirements for C++ library #4

Closed
tigrannajaryan opened this issue Oct 2, 2019 · 17 comments
Closed

Decide on requirements for C++ library #4

tigrannajaryan opened this issue Oct 2, 2019 · 17 comments

Comments

@tigrannajaryan
Copy link
Member

A few topics that need decisions:

  • Bare C or C++ interface.
  • If C++ then minimum C++ version to target (e.g. C++98, 11 or newer).
  • Header-only library or no. If not, do we provide precompiled binaries and for which platforms.
  • Do we depend on C++ std or no.
  • Are exceptions allowed in the API and in the implementation or not.
  • What are external dependencies (other libraries that this library will use and that become a transitive dependency for end users).

Opinions are welcome here in the form of comments and we will also discuss this in SIG meetings.

@rnburn
Copy link
Contributor

rnburn commented Oct 2, 2019

Bare C or C++ interface.

I'd recommend having the first version be based on C++11, but with support for older compilers (for example, gcc 4.8, which would be a subset of C++11)

Gcc 4.8 support is something that's frequently been asked for with OpenTracing and some of their related projects. Example issues
opentracing/opentracing-cpp#50
opentracing-contrib/nginx-opentracing#65

C++98 and C could be supported via a separate C API if there's enough demand for it. And the C API could bind to a C++11 implementation via a bridge so that it's not necessary to have an entire C implementation.

Header-only library or no. If not, do we provide precompiled binaries and for which platforms.

It might be possible to make most of the API usable as a header-only file. But if you want support for a global tracer concept (where a tracer can be registered to a global variable), you'll need a source file. I would expect an implementation of the API to, of course, require source files.

Are exceptions allowed in the API and in the implementation or not.

Given that errors in a tracer should should be non-critical for an application, I'd recommend having most of the API noexcept.

The exception would be cases where user-provided callback code gets invoked and the user might throw exceptions in their own code that gets propagated out of the API function. For example, the OpenTracing API allowed methods like this to throw exceptions if they were initiated by the user's code
https://github.com/opentracing/opentracing-cpp/blob/master/include/opentracing/tracer.h#L117

What are external dependencies (other libraries that this library will use and that become a transitive dependency for end users).

I think the API should have no external dependencies and dependencies from the implementations should be optional depending on which one you use.

@meastp
Copy link
Contributor

meastp commented Oct 3, 2019

I think that we should consider what is most important:

(C-libraries that want to adopt open-telemetry would need a separate C-implementation that does not depend on a C++-runtime. Thus, I think we should at least provide a C++ interface, although we could provide a more API-stable C-interface as well)

  1. If we fully cater for the least common denominator and make a C++98-only implementation using the strictest guidelines and no C++ Standard Library, anyone will be able to use the library in their applications (even though the interface might be less convenient to use).
    However, this will almost certainly make it harder and/or less convenient/fun to contribute to the library.

  2. Another approach could be to use a more modern version of C++ (say, a compiler that fully supports C++14 or C++17) and perhaps provide a C interface for those with older compilers.

Multiple SDKs might be a possibility. Requirements like no use of the C++ Standard Library make the SDK harder/more time consuming to develop - perhaps this could be done in more specialized versions of the SDK?

  1. Please support a widely used build system like CMake. This makes it possible to support package systems like vcpkg and conan etc. which make this library easier to adopt into the CI Build pipeline by users. It also makes contributions easier if one does not have to learn a new build system or the build system is not built into the IDE one is used to.

@tavi-cacina
Copy link

C++11 or newer, std, cmake

@coryan
Copy link
Contributor

coryan commented Oct 22, 2019

While you are consider C++ versions, this might be useful:

https://www.jetbrains.com/lp/devecosystem-2019/cpp/

Consider also that Protobuf is C++11, and so is gRPC.

@coryan
Copy link
Contributor

coryan commented Oct 22, 2019

@coryan
Copy link
Contributor

coryan commented Oct 22, 2019

Note that gRPC decided to depend on std:: after avoiding it for (I think) a couple of years:

https://github.com/grpc/proposal/blob/master/L59-core-allow-cppstdlib.md
https://github.com/grpc/proposal/blob/master/L6-core-allow-cpp.md

I think you should use std:: freely.

@coryan
Copy link
Contributor

coryan commented Oct 22, 2019

@rnburn

Given that errors in a tracer should should be non-critical for an application, I'd recommend having most of the API noexcept.

That is probably not what you want. Declaring a function noexcept is not a compile-time check against exceptions, is a guarantee that at run-time no exceptions will escape, by crashing the program if necessary.

@rnburn
Copy link
Contributor

rnburn commented Oct 22, 2019

@coryan Yes, I'm aware of what it does; and yes, that is what I want.

Given that tracing isn't critical to an app, a tracer should handle errors by logging and dropping spans, data, etc rather than exposing an exception to the end user. So it makes a lot of sense to mark most of the API as noexcept and that's what we did with the OpenTracing API (see, for example, the span interface).

It's is also consistent with recommended guidelines. See Core Guidelines and Effective C++.

Could you explain why you don't think it should be used?

@coryan
Copy link
Contributor

coryan commented Oct 22, 2019

Could you explain why you don't think it should be used?

Because it is trivially easy to make function calls that do raise exceptions and then you get a crash. For example in your span interface:

  virtual std::unique_ptr<SpanContext> Clone() const noexcept = 0;

Assuming the implementation does some memory allocation (I think that is reasonable, but you tell me), then this can throw a std::bad_alloc exception. With the noexcept the program would crash (well, I guess technically it calls std::terminate but that is marked [[noreturn]]).

I am aware that std::bad_alloc happens more or less never, but it can and does happen:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1404r1.html

I might be wrong about all this, and if so I am happy to learn otherwise. noexcept is a great thing to use when you know that both the function itself, and any function it calls, never throw, but otherwise it is better to let exceptions propagate.

@rnburn
Copy link
Contributor

rnburn commented Oct 22, 2019

That function returns a nullptr on failure -- it's documented in the interface.

While the most resilient implementations will turn bad_alloc's into noops (and that's what I did when I implemented it, it's also perfectly accept terminate on bad_alloc.

In fact, it's likely that the default standard allocator will change to terminate on bad_alloc (See Herb Sutter's talk on error handling

g-easy added a commit to g-easy/opentelemetry-cpp that referenced this issue Dec 17, 2019
g-easy added a commit that referenced this issue Dec 19, 2019
@eyakimov-bbg
Copy link
Contributor

I was hoping to add a viewpoint regarding decisions around ABI Compatibility, std:: and versions.

Early vs Late adopters argument

All the following points are based on the viewpoint that: early adopters are more likely to be using latest technologies (versions) whereas those using say C++03 (this includes parts of my org too...) will be less likely to adopt OpenTelemetry until it has been more widely adopted/stable. (I don't have data to back up this opinion 😞 other than personal experience)

Therefore I'd suggest that the best delivery and adoption would be achieved by offering a library to the likely early adopters using latest technologies and deferring the overhead of support for older platforms (and hence late adopters).

ABI Compatibility

How about a hybrid approach of an ABI stable C interface and a header-only C++ interface. This not only avoids the ABI compatibility issue but enables C-only users to use the library, albeit with more difficulty. As per the argument above, this can be deferred and added later, e.g. the initial release can be for say C++17 only, and offer no ABI compatibility guarantees when compiled with older/newer standards.

Standard Library

Presuming ABI issues are solved using the mechanism above, there should be no issue of using std:: in a header-only implementation (presuming care is taken to avoid this). However, if OpenTelemetry is used in a library that is then distributed, then it will inherit this problem which I consider beyond the responsibility of the OpenTelemtry library.

C++ Version

Given the goal of catering to the early adopters who are likely using later versions, I suggest starting with C++17. This version comes with std::string_view, std::variant and std::optional, all of which are useful for this project (e.g. attribute keys and values). Furthermore, both can be backported to earlier versions with reduced features at a later time.

Support for older versions can be added incrementally through version guards (#ifdefs) and allows say C++11 or even C++03 support to be added later and at a higher cost. Perhaps even relying on contributions from those that have the need.

Conclusion

Based on the above I am suggesting:

  1. Use C++17 with STD to start with, taking advantage of useful features
  2. Handle ABI compatibility next, using low-level ABI stable API
  3. Incrementally add language support (C++14/11/03?) with backports as necessary at an increased cost

@rnburn
Copy link
Contributor

rnburn commented Jan 23, 2020

Use C++17 with STD to start with, taking advantage of useful features

Right now we're supporting gcc-4.8 which includes most of C++11. There's still a lot of demand for older compilers and I think targeting something as late as C++17 would be too restrictive. If you look through the issues, you'll find lots of people using opentracing-cpp with older compilers. A few examples:
opentracing/opentracing-cpp#50
opentracing-contrib/nginx-opentracing#65

And we've already gotten most of the benefits of the later versions of C++ by backporting the relevant parts of the standard library (e.g. std::string_view, std::span)

How about a hybrid approach of an ABI stable C interface and a header-only C++ interface.

I'm not opposed to having an additional C API that can wrap the opentelemetry-cpp API. However, a C API would be less convenient to use, have a larger surface area, and have fewer people willing to maintain it.

This came up regularly with opentracing and someone started opentracing-c, but it's unmaintained and I doubt anyone is using it: https://github.com/opentracing/opentracing-c

I'd rather focus on making the C++ API as portable and broadly usable as we can.

@eyakimov-bbg
Copy link
Contributor

Right now we're supporting gcc-4.8 which includes most of C++11. There's still a lot of demand for older

I appreciate the appeal of broader support, my comment was more about the "when" rather than the "if", but I do understand if the answer is "we wish to support C++11 from day 1". If that's the case, might I suggest that we catalogue a list of key projects that we're tackling that require these constraints, e.g. Nginx. It would be a great way to both validate, maintain a list of supported projects/platforms and also provide some marketing of what is being targeted end hence will hopefully offer OpenTelemetry support.

Another thought, for further down-the-line, would be to produce and document an engineering strategy for supporting older platforms, e.g. how many years a certain platform is likely to be supported and the exit strategy. It may be prohibitive to maintain, say C++03 support.

I'm not opposed to having an additional C API that can wrap the opentelemetry-cpp API.

While this does provide an ABI portable C API, the C++ API will not be ABI portable, e.g. it would be hard to guarantee that a version of OpenTelemetry-cpp built with one toolchain would be link-able in a project built in another toolchain. It raises the question of whether this project wants to provide Binary or Source distribution artefacts as this problem does not exist with Source distribution as long as a single toolchain is mandated.

I'm not sure if this is covered by the statement in
https://github.com/open-telemetry/opentelemetry-cpp/blame/master/docs/requirements.md#L18

This also requires a stable C++ ABI. There is quite a bit of interest in this.

Maybe worth while clarifying the stance on this.

Sorry if this comes across critical, my intention is to get clarity and align with the goals.

@rnburn
Copy link
Contributor

rnburn commented Jan 24, 2020

While this does provide an ABI portable C API, the C++ API will not be ABI portable, e.g. it would be hard to guarantee that a version of OpenTelemetry-cpp built with one toolchain would be link-able in a project built in another toolchain.

It's absolutely possible to get a portable C++ ABI. You only need a of handful basic vocabulary types to describe the virtual methods of tracers and meters. We provide those (for example nostd::string_view, nostd::span) so that we fix their ABI and provide implicit conversion operators to their corresponding standard c++ types.

While the C++ standard doesn't officially describe things like vtable layout, in practice, nearly all the compilers we care about will have ABI compatible vtable layouts for a target architecture (it's specified as part of the Itanium C++ ABI).

To get a portable tracer plugin, you then

  1. Statically link the standard c++ library
  2. Apply an export map so that standard C++ library symbols aren't exposed from your shared object (The process is described here)
  3. Dynamically link the standard c library but remap symbols to compatible versions (Process described here).

This will give you a tracer plugin that's usable across a wide range of environments and isn't tied to a particular version of the standard C++ library.

@rnburn
Copy link
Contributor

rnburn commented Feb 10, 2020

Following up from last weeks call, here's my attempt at breaking out the different deployment scenarios that are relevant for ABI stability.

Case 1: Early-binding linking. Both the OpenTelemetry API and implementation are known and linked into the application at compile-time.

This might be one of the most common scenarios. But I don't see it posing any challenges for ABI stability since everything is fixed to the same version.

Case 2: Early-binding linking for the OpenTelemetry API. Late-binding linking for an Implementation. The OpenTelemetry API is known and linked into the application at compile-time. An implementation is loaded as a plugin at runtime.

This approach has the advantage that the application isn't tethered to a particular OpenTelemetry implementation. We are free to use alternative implementations without rebuilding the app, allowing vendors to support target applications without integrating into the OpenTelemetry repo or changing the target application's source code.

Some examples of this pattern:

  • The instrumentation module for NGINX
  • The dynamic-loading support in Envoy
  • Instrumentation in Ceph

It's desirable with this approach to support pre-build portable OpenTelemetry plugin binaries that can be easily deployed. Ensuring that an application can work together with a plugin is possible, but takes some care (See my post above or the plugin PR) as the plugin and application may not have been linked against the same standard C++ library or OpenTelemetry API.

Case 3: Late-binding for instrumented code. In this case, (possibly multiple) instrumented DSOs are loaded into an application at runtime. If we want the DSOs to share global OpenTelemetry objects (tracers, meters, span-contexts), then we need to be careful that 1) the ABIs for global OpenTelemetry objects are stable for different standard C++ libraries and 2) the DSOs are dlopen-ed and dlclose-ed in a way that allows them to share global symbols.

Examples:

  • The OpenTracing implementation for Lua allows it to share a global C++ tracer with the application. With NGINX, this allows traced Lua code to interoperate with tracing done within the NGINX plugin (See example).
  • On our call last week, we discussed loading instrumented native Python modules written in C++ and having them share tracing objects.

I haven't thought through this scenario as thoroughly, but I put together a quick example of how global symbol can be shared across loaded DSOs. A few notes on the linking

  1. The global symbols need to have global visability. The example uses this export map
{
  global:
    f;
    extern "C++" {
      S::Get1*;
    };
  local: *;
};

Only S::Get1 will be shared across DSOs; other symbols like S::Get2 would only have local visibility.

  1. dlopen must be called with RTLD_GLOBAL; If RTLD_LOCAL were used, then the DSO's global symbols won't be exposed to other DSOs.

Python, for example, uses RTLD_LOCAL by default; so for this to work, you'd need to tweak some options.

  1. If globals are shared, you also need to ensure that a DSO isn't dlclose-ed while another DSO still has access to one of its global objects.

@rnburn
Copy link
Contributor

rnburn commented Feb 10, 2020

And to clarify on ABI stability requirements, what's important is to have stable ABI types in the virtual method signatures.

Provided you set up linking correctly, it's fine to use STL types with non-stable ABIs in non-virtual methods and as data members.

Here's an example that sets up two DSOs that interface with a virtual Cont class but have different definitions for the derived class ContImpl (to simulate using incompatible STLs).

If you build it and run, you can see that each DSO can still interact with a ContImpl constructed from the other DSO

root@c905904975ac:/src# ./a.out ./liba.so ./libb.so
constructing ContImpl v1
A.f: A
B.f: A
B.f: B
root@c905904975ac:/src# ./a.out ./libb.so ./liba.so
constructing ContImpl v2
B.f: B
A.f: B
A.f: A

@eyakimov-bbg
Copy link
Contributor

@rnburn this was a great write up in #4 (comment), as per my comments during previous week's SIG meeting, I think it would be worthwhile broadening the options further to consider:

Binding Options:

  • Static
  • Dynamic Early
  • Dynamic Late

Components:

  • Application Binary (the thing with main)
  • OTel library itself (with a singleton)
  • OTel compliant vendor (exporter?) implementation
  • OTel instrumented library

We can then explore each scenario/example, e.g. the Nginx example becomes:

  • Pre-built binary without OTel awareness
  • Dynamic Late bound OTel instrumentation module
  • Dynamic Early OR Static binding of OTel library into instrumentation module
  • Dynamic Late bound vendor-specific OTel compliant

Now, in this case, the 3rd step, whether statically or dynamically early bound has some implications on how an OTel singleton would be loaded.

This would also highlight the challenges of some combinations, such as:
2 dynamically bound modules who themselves may statically bind OTel singletons. This creates a problem that there are now potentially multiple singletons loaded. Therefore this should be discouraged or at least the risks clearly stated.

I suggest documenting the "library binary distribution" policy, including both the ABI compatibility implications, the different binding scenarios, and recommendations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants