Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix static initialization fiasco problem #4314

Merged
merged 2 commits into from
Sep 4, 2018

Conversation

zyfjeff
Copy link
Member

@zyfjeff zyfjeff commented Aug 31, 2018

Signed-off-by: tianqian.zyf tianqian.zyf@alibaba-inc.com

Description: fix static initialization fiasco problem in #4288
Risk Level: low
Testing: N/A
Docs Changes: N/A
Release Notes:
[Optional Fixes #Issue]
[Optional Deprecated:]

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought static initialization fiasco only applies to non-PoD... is this addressing a bug you noticed?

@htuch htuch self-assigned this Aug 31, 2018
@zyfjeff
Copy link
Member Author

zyfjeff commented Sep 1, 2018

static order initialization fiasco for the non-Pod types of data still exist problems, because I'm here to initialize the SEED using a function call

https://isocpp.org/wiki/faq/ctors#static-init-order-on-intrinsics

When I was in fix #4288, accidentally triggered a asan error, so I think this place is a potential problem
related discussion

==21403==ERROR: AddressSanitizer: initialization-order-fiasco on address 0x00000b5e4e20 at pc 0x000003961e2a bp 0x7ffeb6aada40 sp 0x7ffeb6aada38
READ of size 4 at 0x00000b5e4e20 thread T0
    #0 0x3961e29 in std::__atomic_base<unsigned int>::load(std::memory_order) const /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/atomic_base.h:396:9
    #1 0x3961e29 in absl::base_internal::SpinLock::TryLockImpl() /proc/self/cwd/external/com_google_absl/absl/base/internal/spinlock.h:167
    #2 0x395ec13 in absl::base_internal::SpinLock::Lock() /proc/self/cwd/external/com_google_absl/absl/base/internal/spinlock.h:81:10
    #3 0x399291a in absl::GetCurrentTimeNanosSlowPath() /proc/self/cwd/external/com_google_absl/absl/time/clock.cc:393:8
    #4 0x39927cc in absl::GetCurrentTimeNanos() /proc/self/cwd/external/com_google_absl/absl/time/clock.cc:353:10
    #5 0x39911eb in absl::Now() /proc/self/cwd/external/com_google_absl/absl/time/clock.cc:39:15
    #6 0x588fcc in __cxx_global_var_init.49 /proc/self/cwd/external/envoy/test/test_common/utility.cc:37:47
    #7 0x5891c2 in _GLOBAL__sub_I_utility.cc /proc/self/cwd/external/envoy/test/test_common/utility.cc

std::chrono::system_clock::now().time_since_epoch())
.count();
int32_t getSeed() {
return std::chrono::duration_cast<std::chrono::nanoseconds>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, makes sense (https://isocpp.org/wiki/faq/ctors#static-init-order-on-intrinsics). Can you make this a static internally in this function, so that if we instantiate another TestRandomGenerator object, we get the same seed? Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the previous talk about static
#4288 (comment)

It will only get called once (per test) when TestRandomGenerator is constructed, and that should be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replied at #4288 (comment), we can discuss in that thread, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4288 is now a minimal fix to an unrelated bug, which doesn't touch this seed-construction at all (anymore); my suggestion is to work toward merging there; and discuss the static fiasco here.

I am not understanding why TestRandomGenerator wants to get the same seed on each test run within a single process, but is OK getting different seeds when linked into multiple processes. One way that's confusing is that the coverage run would get a single seed for the entire suite of envoy tests, whereas the unit tests ordinarily would get a different seed for each _test.cc. If there's some rationale why that needs to work that way, it would be good to have a comment.

But in any case, if we really do want to share the seed within the binary, we need to do that in a non-fiasco way :) Our attention was called to the fiasco by the explicit use of that term in the ASAN test that failed when @zyfjeff switched from std::chrono to absl::Time in the static initialization expression, in an earlier version of #4288 . My suspicion is that the absl::Time function had its own evil static construction and the linker chose to run that after this one. I didn't investigate deeply because I prefer to avoid static initialization involving function calls.

I'm also not entirely clear when the static initialization is lazy (which at least makes order deterministic). I might have thought it would be lazy here but the ASAN error suggested it was not.

gtest does provide a mechanism for complex static initialization: SetUpTestCase. Could we init the seed in that (if it really needs to be static)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're in agreement that we should fix the static initialization order fiasco and that this current PR is roughly the right thing, just trying to zero in on some minor detail.

I agree that we don't need to have every TestRandomGenerator have the exact same seed (in fact in the interest of intended random behavior, it would be ideal that if we have more than one instance that the do use slightly different seeds). However, what we do need is the ability to be able to replay deterministically the behavior that is provided from logs.

In this current PR, if we don't have the CLI flag set and have multiple generators, we actually have two dynamically created seeds. Later on, if the user wants to recreate behavior, which of these logged seeds do they use? Either way, the rerun won't exhibit the original behavior.

What I'm suggesting is to make an inner variable of getSeed() static. C++ is very clear that this will then only get initialized on first use and solve the static initialization order fiasco. We can chat IRL; I don't think this is controversial.

On a related note, why do we even need this "initialize seed from time" ability? Doesn't gtest always set random_seed? Is this for some situation that we are outside of gtest??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you say that from the point of view that you can replay and reproduce the problem, I recognize that SEED should be static.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm -- I forgot that SEED was previously a statically initialized variable. So making it a lazy-initialized static in a helper function will solve the fiasco.

Can you just add a comment that this wants to be a static so there's just one seed specified for replay?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

jmarantz
jmarantz previously approved these changes Sep 4, 2018
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
Signed-off-by: tianqian.zyf <tianqian.zyf@alibaba-inc.com>
@dnoe
Copy link
Contributor

dnoe commented Sep 4, 2018

Looks like the CI failure is some CircleCI resource problem. Please make an empty commit to re-trigger the CI.

Check out https://gist.github.com/htuch/79caffa2b16eaeda8f74093da4a71326 for an example you can add to your git config to do this easily.

@zyfjeff
Copy link
Member Author

zyfjeff commented Sep 4, 2018

@dnoe I rebase the master code and re-trigger CI.

@dnoe
Copy link
Contributor

dnoe commented Sep 4, 2018

Perfect, that should work too. I see Circle CI has some alert about known resource issue right now, so let's hope it's OK this time.

@dnoe dnoe self-assigned this Sep 4, 2018
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@htuch htuch merged commit 07efc6d into envoyproxy:master Sep 4, 2018
@zyfjeff zyfjeff deleted the static_initialization branch September 5, 2018 00:46
@zyfjeff zyfjeff restored the static_initialization branch September 6, 2018 08:32
rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018
Pulling the following changes from github.com/envoyproxy/envoy:

f936fc6 ssl: serialize accesses to SSL socket factory contexts (envoyproxy#4345)
e34dcd6 Fix crash in tcp_proxy (envoyproxy#4323)
ae6a252 router: fix matching when all domains have wildcards (envoyproxy#4326)
aa06142 test: Stop fake_upstream methods from accidentally succeeding (envoyproxy#4232)
5d73187 rbac: update the authenticated.user to a StringMatcher. (envoyproxy#4250)
c6bfc7d time: Event::TimeSystem abstraction to make it feasible to inject time with simulated timers (envoyproxy#4257)
752483e Fixing the fix (envoyproxy#4333)
83487f6 tls: update BoringSSL to ab36a84b (3497). (envoyproxy#4338)
7bc210e test: fixing interactions between waitFor and ignore_spurious_events (envoyproxy#4309)
69474b3 admin: order stats in clusters json admin (envoyproxy#4306)
2d155f9 ppc64le build (envoyproxy#4183)
07efc6d fix static initialization fiasco problem (envoyproxy#4314)
0b7e3b5 test: Remove declared but undefined class methods (envoyproxy#4297)
1485a13 lua: make sure resetting dynamic metadata wrapper when request info is marked dead
d243cd6 test: set to zero when start_time exceeds limit (envoyproxy#4328)
0a1e92a test: fix heap use-after-free in ~IntegrationTestServer. (envoyproxy#4319)
cddc732 CONTRIBUTING: Document 'kick-ci' trick. (envoyproxy#4335)
f13ef24 docs: remove reference to deprecated value field (envoyproxy#4322)
e947a27 router: minor doc fixes in stream idle timeout (envoyproxy#4329)
0c2e998 tcp-proxy: fixing a TCP proxy bug where we attempted to readDisable a closed connection (envoyproxy#4296)
00ffe44 utility: fix strftime overflow handling. (envoyproxy#4321)
af1183c Re-enable TcpProxySslIntegrationTest and make the tests pass again. (envoyproxy#4318)
3553461 fuzz: fix H2 codec fuzzer post envoyproxy#4262. (envoyproxy#4311)
42f6048 Proto string issue fix (envoyproxy#4320)
9c492a0 Support Envoy to fetch secrets using SDS service. (envoyproxy#4256)
a857219 ratelimit: revert `revert rate limit failure mode config` and add tests (envoyproxy#4303)
1d34172 dns: fix exception unsafe behavior in c-ares callbacks. (envoyproxy#4307)
1212423 alts: add gRPC TSI socket (envoyproxy#4153)
f0363ae fuzz: detect client-side resets in H2 codec fuzzer. (envoyproxy#4300)
01aa3f8 test: hopefully deflaking echo integration test (envoyproxy#4304)
1fc0f4b ratelimit: link legacy proto when message is being used (envoyproxy#4308)
aa4481e fix rare List::remove(&target) segfault (envoyproxy#4244)
89e0f23 headers: fixing fast fail of size-validation (envoyproxy#4269)
97eba59 build: bump googletest version. (envoyproxy#4293)
0057e22 fuzz: avoid false positives in HCM fuzzer. (envoyproxy#4262)
9d094e5 Revert ac0bd74 (envoyproxy#4295)
ddb28a4 Add validation context provider (envoyproxy#4264)
3b47cba added histogram latency information to Hystrix dashboard stream (envoyproxy#3986)
cf87d50 docs: update SNI FAQ. (envoyproxy#4285)
f952033 config: fix update empty stat for eds (envoyproxy#4276)
329e591 router: Add ability of custom headers to rely on per-request data (envoyproxy#4219)
68d20b4  thrift: refactor build files and imports (envoyproxy#4271)
5fa8192 access_log: log requested_server_name in tcp proxy (envoyproxy#4144)
fa45bb4 fuzz: libc++ clocks don't like nanos. (envoyproxy#4282)
53f8944 stats: add symbol table for future stat name encoding (envoyproxy#3927)
c987b42 test infra: Remove timeSource() from the ClusterManager api (envoyproxy#4247)
cd171d9 websocket: tunneling websockets (and upgrades in general) over H2 (envoyproxy#4188)
b9dc5d9 router: disallow :path/host rewriting in request_headers_to_add. (envoyproxy#4220)
0c91011 network: skip socket options and source address for UDS client connections (envoyproxy#4252)
da1857d build: fixing a downstream compile error by noting explicit fallthrough (envoyproxy#4265)
9857cfe fuzz: cleanup per-test environment after each fuzz case. (envoyproxy#4253)
52beb06 test: Wrap proto string in std::string before comparison (envoyproxy#4238)
f5e219e extensions/thrift_proxy: Add header matching to thrift router (envoyproxy#4239)
c9ce5d2 fuzz: track read_disable_count bidirectionally in codec_impl_fuzz_test. (envoyproxy#4260)
35103b3 fuzz: use nanoseconds for SystemTime in RequestInfo. (envoyproxy#4255)
ba6ba98 fuzz: make runtime root hermetic in server_fuzz_test. (envoyproxy#4258)
b0a9014 time: Add 'format' test to ensure no one directly instantiates Prod*Time from source. (envoyproxy#4248)
8567460 access_log: support beginning of epoch in START_TIME. (envoyproxy#4254)
28d5f41 proto: unify envoy_proto_library/api_proto_library. (envoyproxy#4233)
f7d3cb6 http: fix allocation bug introduced in envoyproxy#4211. (envoyproxy#4245)

Fixes istio/istio#8310 (once pulled into istio/istio).

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants