Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix time_since_epoch different in different os default return precision #4288

Merged
merged 12 commits into from
Sep 6, 2018

Conversation

zyfjeff
Copy link
Member

@zyfjeff zyfjeff commented Aug 29, 2018

Signed-off-by: tianqian.zyf tianqian.zyf@alibaba-inc.com

Description: time_since_epoch different in different os default return precision, use abseil time library to use unified timestamp
Risk Level: low
Testing: N/A
Docs Changes:
Release Notes:
[Optional Fixes #Issue] #4278
[Optional Deprecated:]

@zyfjeff zyfjeff requested a review from lizan as a code owner August 29, 2018 05:40
@zyfjeff zyfjeff force-pushed the time_precision branch 2 times, most recently from 5577bbb to 33c7e57 Compare August 29, 2018 05:55
@zyfjeff
Copy link
Member Author

zyfjeff commented Aug 29, 2018

==21403==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x00000b5e4e20 at pc 0x000003961e2a bp 0x7ffeb6aada40 sp 0x7ffeb6aada38
READ of size 4 at 0x00000b5e4e20 thread T0
    #0 0x3961e29 in std::__atomic_base<unsigned int>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/atomic_base.h:396:9
    #1 0x3961e29 in absl::base_internal::SpinLock::TryLockImpl() /proc/self/cwd/external/com_google_absl/absl/base/internal/spinlock.h:167
    #2 0x395ec13 in absl::base_internal::SpinLock::Lock() /proc/self/cwd/external/com_google_absl/absl/base/internal/spinlock.h:81:10
    #3 0x399291a in absl::GetCurrentTimeNanosSlowPath() /proc/self/cwd/external/com_google_absl/absl/time/clock.cc:393:8
    #4 0x39927cc in absl::GetCurrentTimeNanos() /proc/self/cwd/external/com_google_absl/absl/time/clock.cc:353:10
    #5 0x39911eb in absl::Now() /proc/self/cwd/external/com_google_absl/absl/time/clock.cc:39:15
    #6 0x588fcc in __cxx_global_var_init.49 /proc/self/cwd/external/envoy/test/test_common/utility.cc:37:47
    #7 0x5891c2 in _GLOBAL__sub_I_utility.cc /proc/self/cwd/external/envoy/test/test_common/utility.cc

Look abseil time library cause AddressSanitizer fail

static const int32_t SEED = std::chrono::duration_cast<std::chrono::nanoseconds>(
std::chrono::system_clock::now().time_since_epoch())
.count();
static const int32_t SEED = absl::ToUnixNanos(absl::Now());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the spot where the static initialization fiasco is getting introduced. I'm actually surprised it wasn't an issue before - maybe it was, and the effect is benign if it just perturbs the random seed.

I would try the technique described in https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use to see if the ASAN error goes away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you are right

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this direction (introducing absl::Time) is right. Though another thing to consideration is incorporate this better with Envoy::TimeSource, @jmarantz?

std::chrono::system_clock::now().time_since_epoch())
.count();
int32_t getSeed() {
static int32_t seed = absl::ToUnixNanos(absl::Now());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const?

Copy link
Member Author

@zyfjeff zyfjeff Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to add a const to the return value that is a value type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what we want, to have the seed be static for the binary, as opposed to static for an individual test? Moreover, this may be related to the AddressSanitizer problem you have above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More good reading about the static initialization "fiasco" referenced in the AddressSanitizer report is here:

https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use

My personal rule of thumb: avoid class statics and pre-main function-calls wherever possible and programs will be easier to reason about and more portable. Compile-time initialized scalars, arrays, and structs are fine.

Re the "const" question; that's covered in https://google.github.io/styleguide/cppguide.html#Use_of_const

@lizan lizan requested a review from jmarantz August 30, 2018 08:39
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain more about why you want to move from std::chrono to absl::time? From the description I don't fully understand what problem this solves.

If we do that I think it should be done pretty broadly over all of Envoy, and might be a bit easier in conjunction with follow-ups to #4257 , with check_format enhancements extending what's in #4248 to ensure we are working with a single coherent time system.

std::chrono::system_clock::now().time_since_epoch())
.count();
int32_t getSeed() {
static int32_t seed = absl::ToUnixNanos(absl::Now());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what we want, to have the seed be static for the binary, as opposed to static for an individual test? Moreover, this may be related to the AddressSanitizer problem you have above.

@dnoe
Copy link
Contributor

dnoe commented Aug 30, 2018

@jmarantz Can you drive this PR to some conclusion that is compatible with your WIP? Thanks.

@zyfjeff
Copy link
Member Author

zyfjeff commented Aug 30, 2018

@jmarantz
There are two main points:

  1. std::chrono::system_clock::now().time_since_epoch() the duration unit returned by the default os is different, uniform use of absl::time can avoid this problem, Different unit of timestamp in metrics_service in linux and macos #4278 it is because of this problem that the timestamp units returned by metrics are different in different os.
  metric->set_timestamp_ms(std::chrono::system_clock::now().time_since_epoch().count());
  1. Use absl::ToUnixMillis(absl::Now() to increase the readability of the code
-   const auto unix_timestamp = std::chrono::duration_cast<std::chrono::seconds>(
                                  std::chrono::system_clock::now().time_since_epoch())
                                  .count();
+   const auto unix_timestamp = absl::ToUnixSeconds(absl::Now());

@jmarantz
Copy link
Contributor

Would it be possible to solve #4278 by using std::chrono differently, rather than switching to a different time library?

std::chrono::system_clock::now().time_since_epoch())
.count();
int32_t getSeed() {
static const int32_t seed = absl::ToUnixNanos(absl::Now());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The old code was probably only working because it got 'lucky', despite its static-init fiasco. With absl::time you are getting less lucky.

Can you try just removing the 'static' keyword here? It will only get called once (per test) when TestRandomGenerator is constructed, and that should be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz I'm reviewing #4314 atm and I tend to think it would be better to make this static. TestRandomGenerator can be constructed more than once per test; it's just a stack or member variable, it seems safer to ensure that it has a consistent seed each time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I followed up in #4314 as that's where the code change is being proposed; hopefully there's enough context there now.

@zyfjeff
Copy link
Member Author

zyfjeff commented Aug 30, 2018

@jmarantz of course, but I still think that replacing std::chrono with absl::time is the right choice.

@zyfjeff zyfjeff force-pushed the time_precision branch 2 times, most recently from c6fad8e to 44b68e9 Compare August 30, 2018 16:00
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree absl::time is the right way to go, but not at just one point in the code. I think you should just fix the bug without introducing an alternate mechanism to measure time. Once there is a strong abstraction in place a global pivot to absl::time should be straightforward and safe.

In particular I would worry about subtle issues of skew.

Of course for a random seed it's probably ok, but seems unrelated to the bug fix and maybe should be split out?

@zyfjeff
Copy link
Member Author

zyfjeff commented Aug 30, 2018

@jmarantz Agree, I am re-editing the code, about the static initialization "fiasco" question, do I need to create an issue and pr?

@jmarantz
Copy link
Contributor

RE 'fiasco' -- I think just putting that getSeed() change into a separate PR would be fine; I'm not sure it needs to fix an issue, since the old code worked (even if just by luck).

metric->set_timestamp_ms(std::chrono::system_clock::now().time_since_epoch().count());
metric->set_timestamp_ms(std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::system_clock::now().time_since_epoch())
.count());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great; can you factor out this verbose function call -- repeated 3x, into a helper function.

Copy link
Member Author

@zyfjeff zyfjeff Aug 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can put the helper function on the anonymous namespace in the current file?Or in the common/common/utility?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easiest to put it in local anon namespace, I will be moving it into TimeSource anyway.

There's a bit of debt here -- we should be using TimeSource::systemTime() rather than directly reading the system-clock, so that this code can have time-injection, but I'll fix that after you merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually, can you add (near your helper method):
// TODO(#4160): use TimeSource::systemTime() rather than directly reading the system-clock

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

jmarantz
jmarantz previously approved these changes Aug 31, 2018
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; looks great.

@zyfjeff
Copy link
Member Author

zyfjeff commented Sep 3, 2018

@dnoe You need to continue to review?

@dnoe
Copy link
Contributor

dnoe commented Sep 4, 2018

@lizan Is this ready to go from your perspective?

dnoe
dnoe previously approved these changes Sep 4, 2018
@dnoe
Copy link
Contributor

dnoe commented Sep 4, 2018

I didn't actually intend to approve, just to remove the "changes requested" in UI. Defer to @lizan for initial maintainer review.

htuch pushed a commit that referenced this pull request Sep 4, 2018
fix static initialization fiasco problem in #4288
Risk Level: low

Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
@@ -16,6 +16,15 @@ namespace Extensions {
namespace StatSinks {
namespace MetricsService {

namespace {
// TODO(#4160): use TimeSource::systemTime() rather than directly reading the system-clock.
int64_t getUnixMicrosForNow() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name suggests that it returns microseconds, but the body is dealing with milliseconds, seems like one of these should change.

@zyfjeff zyfjeff dismissed stale reviews from dnoe and jmarantz via 82d1695 September 5, 2018 00:54
@@ -16,6 +16,15 @@ namespace Extensions {
namespace StatSinks {
namespace MetricsService {

namespace {
// TODO(#4160): use TimeSource::systemTime() rather than directly reading the system-clock.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz what's stopping us switching to this right away? Seems best to avoid exception cases like this if we can given how regular time is now treated in Envoy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing, now that you merged the TimeSystem PR; this TODO is addressable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zyfjeff post #4257 merge, do you think we can now use this time source?

Copy link
Contributor

@jmarantz jmarantz Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI the path to doing this is to plumb Event::TimeSystem& into constructors as needed, up to MainCommon. Note that both ServerImpl and Dispatcher support timeSystem() or timeSource() APIs, so once you go up the constructor stack till you have one of those, you can stop. #4341 is an example of this refactor.

Once we have all the std::chrono removed from everywhere in Envoy we could decide to switch to absl::Time, though I'd want to socialize the benefits of that to envoy-dev on slack first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch Yes i do, I am going to modify it.

@jmarantz I can try to replace all std::chrono with TimeSystem

…ecision

Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for iterating on this. I've kicked the failing CI jobs, will merge when they pass.

@htuch htuch merged commit 7707c3b into envoyproxy:master Sep 6, 2018
@zyfjeff zyfjeff deleted the time_precision branch September 6, 2018 09:09
jmarantz added a commit to jmarantz/envoy that referenced this pull request Sep 6, 2018
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants