Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should "impossible" runtime access checks for not_null being null be disabled in production? #649

Closed
xaxxon opened this issue Mar 28, 2018 · 19 comments

Comments

@xaxxon
Copy link
Contributor

xaxxon commented Mar 28, 2018

Is the goal of not_null to be to try to ensure against runtime corruption of the pointer it manages? Right now, on every access there is a penalty associated with checking whether it's null, even though it's not a valid state for it to get in to.

This feels wrong to me. I would expect that check to be disabled outside of debug builds.

Thoughts?

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 28, 2018

In addition, is the check even going to happen or is it giving a false sense of security? If the result is immediately dereferenced, won't the optimizer just throw out the check since it's "impossible" to dereference a null pointer?

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 28, 2018

I'm not perfect at writing benchmarks, but I think this shows the check is a 50% hit to the time to dereference an int * if the check happens (otherwise the times are identical)

http://quick-bench.com/EASWGo0kJYR4HenJaItSw6gOKCs

I had to make the pointer volatile to make sure the optimizer wouldn't discard the null check since dereferencing a null is UB, but overall having this level of unpredictability (either you think there's a test but there's not, or if the optimizer doesn't discard it, you pay a significant penalty when you think you're dereferencing a pointer) doesn't feel right in a release build.

@MikeGitb
Copy link
Contributor

Unless you deactivate contract checking, the compiler will not optimize the check away based on what happens afterwards, because the check could terminate the program before that. This time jumps due to UB are only possible in cases, where your program will always execute the invalid operation.

However, the compiler might be smart enough to track the origin of a pointer, know it can't be null and eliminate a check based on that. So no, this does not save you from memory corruption.

@xaxxon xaxxon closed this as completed Mar 28, 2018
@xaxxon xaxxon reopened this Mar 28, 2018
@neilmacintosh
Copy link
Collaborator

@xaxxon I believe you are referring to the Ensures statement in get() of not_null. That is a contract statement, that describes (and enforces) the important postcondition that the pointer you obtain from a not_null object is never null.

When originally implementing GSL classes, we took the approach of documenting contracts like this, assuming that compilers are free to optimize away tests that can safely be proven redundant. Where they cannot optimize away a check, then either the test provides safety that is important to users of the class, or there is something that blocks the optimization. Either way, it is worth investigating to learn more.

I'm not sure the test you attached to this issue is a particularly compelling one to investigate. Because you have made the pointer being tested volatile I think you effectively kill many optimizations around that pointer...so yes, I would expect to see worst-case performance on testing its nullness. However, your test case did make me think that using a not_null in conjunction with a volatile pointer would be a mistake worth catching.

Can I suggest the more relevant discussion would be around whether the Ensures is optimized away in more general cases. If it is not, let's dig in as to whether its a matter of how we are expressing the contract, whether there is something we have mis-implemented, or whether particular compilers could do better.

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 28, 2018

The point of the volatile was only to disable compiler optimizations in a straightforward way. It very clearly creates a situation where the compiler cannot know what the value is without introducing any additional overhead.

However, here's a different benchmark which forces the compiler to not assume the value by putting it inside a function call. The function is forced to not be inline to simulate a function which can not be inlined, yet keep the example small and easily sharable, but hopefully we can agree that such a situation is reasonable to happen.

http://quick-bench.com/igyYgPYOleAZ1xsSBWywQ9FPz5o

Honestly I'm not sure why the one with the test is > 6x slower.

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 28, 2018

As for expressing the contract, it seems the contract should be "cannot become null" which doesn't require any test for "isn't currently null" except as part of unit test on the class to make sure it is implemented correctly. I'm not suggesting removing the checks while creating a not_null from a non not_null object.

@neilmacintosh
Copy link
Collaborator

I think the contract is correct, as written. It says what the class offers: that you can't obtain a null pointer from it.

If I understand correctly, the thing you are concerned about is performance when that contract is enforced, in situations where the compiler doesn't optimize away the check.

Your benchmark is interesting, but not really comparing what we're discussing. It appears to compare doing one null-check in a non-inlined function call to doing no null-checks at all. If a programmer wanted to have no null checks at all, why would you use not_null?

I think the more interesting comparison would be passing a not_null parameter (with its check at construction and in get()) against manually checking nullness inside the non-inlined, called function. Of course, that still is just a micro-benchmark and wouldn't tell you much about performance in real-world codebases, where inlining does happen (and whole program optimizations etc). But at least it would be illustrating what I think is the concern you originally raised.

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 29, 2018

I seem to be making some assumptions that we don't share so I'll try to make them more explicit.

First, I assumed the point of a test case isn't to show off actual real-world code with a bunch of irrelevant business logic, but to show the behavior in a clear, minimal example with only relevant details.

I specifically said in my first example " had to make the pointer volatile to make sure the optimizer wouldn't discard the null check" and not that I wanted to actually use a volatile pointer. You responded with " I think you effectively kill many optimizations around that pointer." which was exactly my intent. The thought was that someone would be able to imagine other situations where the optimizer would be similarly unable to make assumptions.

In order to make it more clear, I switched the example to a non-inlined function since functions called across TU's are normally not inlinable. I stated "The function is forced to not be inline to simulate a function which can not be inlined" which I believed that someone would be able to know common situations where this happens.

Second, I assumed the most important performance metric of a pointer abstraction is it's access time. For example, shared_ptr has creation overhead, but has zero access overhead associated with using it. My examples don't have any tests on creation costs because I believe that optimizing for cases with far more accesses than assignments is by far the most common situation. This makes it unnecessary to write a benchmark which takes into consideration a null check on assignment. I definitely never stated that I wanted no null checking at all.

If you don't share these assumptions, then what I thought would be a fairly quick and straightforward discussion is going to be VERY long and drawn out and honestly not worth having as compared to just maintaining my own code.

Going down to first principles on goals for pointer abstractions or dealing with long-form code on my particular use cases for not_null just isn't worth the benefit.

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 29, 2018

The third assumption I was making is that the abstraction shouldn't try to protect the user from malicious use. If the interface strongly enforces a value not being able to be set to null, needing to check that it continues to be null is unnecessary. There is nothing one can do to reasonably accidentally set the value to null so checking that it continues to be null is unnecessary. Programmers make mistakes but I cannot envision anything short of intentional malice or general memory corruption which will cause the access check to trigger and even if it does, it will provide no benefit since the actual error will almost certainly be somewhere completely unrelated to the not_null - and that's assuming the not_null check isn't eliminated by the optimizer anyhow, so it's not a guarantee in the face of memory corruption, anyhow.

@neilmacintosh
Copy link
Collaborator

@xaxxon I genuinely don't mean to frustrate you here.

Your first assumption about test cases is completely valid and I agree with it. But you didn't present a test case to discuss usage. You provided a micro-benchmark that did not even include the class under discussion and focused only on a non-optimizable scenario. We have repeatedly made it clear that while we do not rely on optimization being performed in 100% of cases, we do assume good compiler optimization to help reduce the overhead of runtime checks.

Your second assumption is something I disagree with. I am interested in the overall performance of each of these types - not only one of their dimensions. Moreover, because you are not concerned with the construction cost of a not_null, you omitted it from your benchmark. But the act of checking the pointer in construction may help an optimizer see that access checks can be removed. So by focusing narrowly on one element of the class it is easy to misrepresent opportunities for optimization.

Your third assumption is valid - we don't try to protect users of these types from actively malicious code. That is not the purpose of the Ensures statement inside get(). The purpose of that statement is to describe (and enforce) a contract offered by not_null. If there is evidence that having the enforcement there is a significant cost in real-world usage scenarios, then we could discuss removing it. But evidence of that would have to include the full not_null class.

The GSL has some clear performance goals. If there is a straightforward way to better meet them here I'd love to see it pursued. However, the GSL is also supposed to embody the principles of the Core Guidelines. The Guidelines recommend using Expects and Ensures to describe and enforce contracts in code (in the current absence of a language-provided contract mechanism). So I'd want to see some clear evidence of how it costs unnecessarily here and why compilers don't optimize it. Then it becomes easy to discuss removing that particular enforcement.

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 29, 2018

#include "GSL/include/gsl/pointers"                                                                                                                                                                                                           
#include <benchmark/benchmark.h>                                                                                                                                                                                                              
                                                                                                                                                                                                                                              
int f(gsl::not_null<int *>);                                                                                                                                                                                                                  
int f2(int *);                                                                                                                                                                                                                                
                                                                                                                                                                                                                                              
volatile int * p = new int;                                                                                                                                                                                                                   
void terminate_handler() {                                                                                                                                                                                                                    
  *p++;                                                                                                                                                                                                                                       
}                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                              
static void null_test(benchmark::State& state) {                                                                                                                                                                                              
  std::set_terminate(&terminate_handler);                                                                                                                                                                                                     
  // Code inside this loop is measured repeatedly                                                                                                                                                                                             
                                                                                                                                                                                                                                              
  gsl::not_null<int*> nnpi = new int;                                                                                                                                                                                                         
  for (auto _ : state) {                                                                                                                                                                                                                      
      for (int i = 0; i < 1000; i++)                                                                                                                                                                                                          
      benchmark::DoNotOptimize(f(nnpi));                                                                                                                                                                                                      
  }                                                                                                                                                                                                                                           
}                                                                                                                                                                                                                                             
// Register the function as a benchmark                                                                                                                                                                                                       
BENCHMARK(null_test);                                                                                                                                                                                                                         
                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                              
static void no_null_test(benchmark::State& state) {                                                                                                                                                                                           
                                                                                                                                                                                                                                              
    int * pi = new int;                                                                                                                                                                                                                       
    for (auto _ : state) {                                                                                                                                                                                                                    
        for (int i = 0; i < 1000; i++)                                                                                                                                                                                                        
        benchmark::DoNotOptimize(f2(pi));                                                                                                                                                                                                     
    }                                                                                                                                                                                                                                         
}                                                                                                                                                                                                                                             
BENCHMARK(no_null_test);                                                                                                                                                                                                                      
                                                                                                                                                                                                                                              
BENCHMARK_MAIN();                                                                                                                                                                                                                             

and


#include <iostream>                                                                                                                                                                                                                           
#include <vector>                                                                                                                                                                                                                             
#include <map>                                                                                                                                                                                                                                
#include <string>                                                                                                                                                                                                                             
#include <type_traits>                                                                                                                                                                                                                        
#include "GSL/include/gsl/pointers"                                                                                                                                                                                                           
#include <benchmark/benchmark.h>                                                                                                                                                                                                              
                                                                                                                                                                                                                                              
using namespace std;                                                                                                                                                                                                                          
                                                                                                                                                                                                                                              
int f(gsl::not_null<int*> nnpi) {                                                                                                                                                                                                             
    return *nnpi;                                                                                                                                                                                                                             
}                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                              
int f2(int * pi) {                                                                                                                                                                                                                            
    return *pi;                                                                                                                                                                                                                               
}                    

built with: /Users/xaxxon/Downloads/clang+llvm-6.0.0-x86_64-apple-darwin/bin/clang++ main.cpp foo.cpp -lbenchmark -std=c++17 -O3

Timing on existing not_null code (I ran a bunch of times, this is a median-ish run)


Zacs-MacBook-Air:deleteme xaxxon$ ./a.out 
2018-03-29 13:57:16
Run on (4 X 1700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x2)
  L1 Instruction 32K (x2)
  L2 Unified 262K (x2)
  L3 Unified 3145K (x1)
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
null_test          2686 ns       2684 ns     243860
no_null_test       2409 ns       2400 ns     301106

run without the null check in get() (median-ish run)

Zacs-MacBook-Air:deleteme xaxxon$ ./a.out 
2018-03-29 13:59:49
Run on (4 X 1700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x2)
  L1 Instruction 32K (x2)
  L2 Unified 262K (x2)
  L3 Unified 3145K (x1)
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
null_test          2397 ns       2396 ns     266253
no_null_test       2398 ns       2397 ns     272918

edit: the termination handler stuff doesn't impact timing if it is removed.

@neilmacintosh
Copy link
Collaborator

Thanks @xaxxon I think now we've got something more concrete to discuss.

I note there is no null checking at all in the no_null_test. So, surely people only use not_null because they want to ensure non-nullness. So wouldn't it be more of an apples-to-apples comparison if f2 looked like:

int f2(int * pi) {                                                                                                                                                                                                                            
    if (pi) return *pi;                                                                                                                                                                                                                               
    else std::terminate();
}      

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 29, 2018 via email

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 29, 2018 via email

@neilmacintosh
Copy link
Collaborator

@xaxxon So I think you are suggesting that the reasonable comparison for equivalent null-safety is to compare the times for just the null_test case that you provided...the first time is with the check in get() the second is without. That makes sense, sorry I missed that from your post. So it looks like the difference is a little under 300ns in this non-optimized case.

That's a worst case, if we remove the prohibition on optimizations, do things improve? That way we get an idea of what the two extremes look like.

Again, I don't think you understand my position. I'm not suggesting that the get() check protects against a specific situation that is left vulnerable without it. I agree that we could remove it and the class (in its present form) would be just as null-safe. But that check is just a result of using a contracts convention specified in the Core Guidelines. I'm interested in understanding if using that convention is really a terrible overhead or not. We can see here that in a no-optimization micro-benchmark with one popular compiler, it costs around 12%. That's a useful data point that does concern me...which is why I'd like to see what things look like with optimization as well.

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 30, 2018

if we remove the prohibition on optimizations, do things improve?

I don't quite understand what you mean by "prohibition", I believe I've shown a common every-day use case in which the compiler cannot optimize the code". However, in a simple case, the compiler sees through the entirety of a not_null and can remove it entirely - the constructor and the get(): https://godbolt.org/g/8DwXMz

But that check is just a result of using a contracts convention specified in the Core Guidelines.

I would hope the contracts would be associated with user-defined behaviors which could mess with the invariants of the object. Adding runtime overhead needlessly is very anti-c++ where reliable zero-cost abstractions are what we strive for.

I'd like to see what things look like with optimization as well.

I'm a little confused by what you mean here. All my benchmarks were compiled with full optimizations enabled. As I stated earlier, there are certainly places where the check is discarded by the optimizer at which point the code performs the same, but passing a not_null across compilation units doesn't seem overly contrived. Imagine an object Obj where it has a not_null member in its definition obj.h and its member functions which all use that not_null data member are defined in a separate obj.cpp implementation file. Every other compilation unit calling member functions on that object will incur this overhead - that's basically the exact situation from my last benchmark. You can see the test not being optimized out here https://godbolt.org/g/7RtJ7C on lines 3-4 of the ASM display. I am talking about the situation where the implementation isn't inline in the class definition (as opposed to a templated class which would have the function implementation inline in the class definition) where the compiler can see the code -- in that case it does optimize the test away in many situations (as shown above). However, having a separate .cpp file with implementations is a very common situation for non-templated classes and then you don't get the test optimized away.

I'm not saying that I think the overhead is massive, but I think it's reasonable while programming to do early-optimization on some-cost vs zero-cost. If I know something is zero cost, I don't have to weigh using it at all. If there is some cost, I need to understand what possible tech debt I'm building into my program if it turns out I need to change my approach based on profiling later. I know I'm never going to be creating objects at such a rate that a null check during construction is going to impact me, but saying the same thing about dereferencing a pointer? That's not nearly as easy and I don't want to have to worry about whether I'm using something in a context where my current optimizer is likely to optimize it away or not - that's never any fun.

I'd also like to say that my above assertions are only valid when we never return a reference or pointer to the contained pointer-like object. (I may have submitted a pull request which does that) The current implementation of get() which returns a copy of the contained pointer-like thing doesn't work with move-only types, so something would have to give between removing this test and supporting move-only types, I believe.

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 30, 2018

it was just pointed out to me that one of the guidelines explicitly state that not_null is (edit: among other things) to reduce redundant checks for null:

To improve performance by avoiding redundant checks for nullptr.

http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#reason-24

@neilmacintosh
Copy link
Collaborator

@xaxxon I guess I was misled by the lines containing DoNotOptimize in your tests. I thought these were specifically disabling optimizations. Just out of curiosity, if they don't do that, what do they do (sorry for not being familiar with that benchmark framework)?

I don't think your example about passing not_null across compilation units is necessarily compelling. There is nothing that inherently prevents a compiler from optimizing across compilation units.

With the GSL, we have tried to design things to be simple and semantically correct, and then push on compiler vendors to understand why abstractions that should be zero-cost are not. This contract is one example of that process.

But - from the perspective of a library user - I do see a lot of merit in your "some vs. zero-cost" argument. If we had an actual contracts facility, I would expect this contract to be expressed and not generate any cost and I would want to keep it there and push on vendors. That it does have a cost today - in some cases (and it's hard to know how many) - is a frustrating artifact of the lack of language-supported contracts.

I'd hate to see this block adoption of the type. You could offer a PR that simply comments it out with an explanation that - as long as T is a raw pointer being returned by-value - it's safe to not enforce it. I'd hate to see it go from the source code altogether, though, as it is part of the class contract.

And I agree...if we start wrapping smart pointers and returning things by reference from get(), then we'll need to enforce the contract again(at least for non-raw pointer uses of the class).

@xaxxon
Copy link
Contributor Author

xaxxon commented Mar 30, 2018

I guess I was misled by the lines containing DoNotOptimize in your tests.

When writing a microbenchmark, it is often quite easy for the compiler to realize that the code you're testing doesn't actually do anything and just remove the entirety of the code you want to time. DoNotOptimize forces the compiler to think that the result of what is inside of it is used, but without actually generating any additional output code. It doesn't say anything about how to get to that result, simply that it's "important".

If you haven't watched Chandler Caruth's excellent microbenchmarking talk from cppcon 2015, I very highly recommend it. It's highly informational as well as entertaining. The relevant part is here, though: https://www.youtube.com/watch?v=nXaxk27zwlk&t=40m40s The two "magic" assembly functions he writes in the talk were implemented into google benchmark, one of them being named DoNotOptimize.

I'll put together a PR now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants