Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: make overload manager non-optional #11777

Merged
merged 15 commits into from
Jul 14, 2020

Conversation

akonradi
Copy link
Contributor

Commit Message:
Make the Overload Manager argument a reference instead of a pointer, and make the ThreadLocalOverloadState an interface.

Additional Description: Making the OM non-optional reduces special case handling in the HTTP Connection Manager impl, and making the thread-local state object an interface the makes it easier to update the OverloadManagerImpl without causing churn in header files.

Risk Level: low
Testing: ran unit tests
Docs Changes: none
Release Notes: none

The only user of a null OverloadManager was the admin console. Switch
that to using a no-op OverloadManager subclass so that the OM can be
provided via reference in all cases. This makes the HCM initialization
simpler since it doesn't need to worry about the null pointer.

Signed-off-by: Alex Konradi <akonradi@google.com>
@akonradi akonradi force-pushed the non-optional-om branch 2 times, most recently from e994350 to ef0f99b Compare June 26, 2020 18:52
Make ThreadLocalOverloadState an interface that the Overload Manager can
return an implementation of. This decouples the behavior specified by
the header from the details of the implementation.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great cleanup. Some questions and nits below.

source/server/admin/admin.h Outdated Show resolved Hide resolved
source/server/admin/admin.cc Outdated Show resolved Hide resolved

NullOverloadManager(Event::Dispatcher& dispatcher) : thread_local_overload_state_(dispatcher) {}
void start() override {}
ThreadLocalOverloadState& getThreadLocalOverloadState() override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const ThreadLocalOverloadState& since ThreadLocalOverloadState has no mutation methods as part of the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to make this const because I expect in the near future that we'll want non-const methods on here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tell me more about said non-const methods. I would expect OverloadManager to be the one doing all the updates to overload state resources. I'm guessing that you're referring to methods to create scalable timers, which in fact would need to be non-const.

That said, I wonder if scalable timers should be more closely integrated with the Event::Dispatcher interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, my concern is around adding methods to create scalable timers. I think we can discuss them in a separate issue, though, and leave this as-is here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my advice is to use the Event::Dispatcher interface to create scalable timers. We can discuss in the PR that defines that API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to make this const for now and then remove it down the line if we start needing non-const access, just so we don't accidentally leave this non-const.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally gave it a try and it turns out that won't work since getState() is non-const. getState is non-const because it looks up a key in the internal state map and creates an entry for it if one doesn't exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is strange that getState returns a reference instead of a value. State is a primitive type (an enum)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callers hold onto the reference to avoid extra map lookups

return false;
}

NullThreadOverloadState thread_local_overload_state_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this object is a thread local. I'm not sure what's the right way to resolve this issue. ThreadLocalOverloadState only has const methods in its interface so it is safe for the null overload manager to be a singleton instead of a thread local. I wonder if renaming to OverloadManagerState would help, or further add to the confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this object explicitly thread-local instead of relying on const accessors.

akonradi added 2 commits June 29, 2020 16:00
All implementations of the interface are for thread-local objects so
specify that with interface inheritance.

Signed-off-by: Alex Konradi <akonradi@google.com>
- rename member
- use thread-local store
- remove dispatcher

Signed-off-by: Alex Konradi <akonradi@google.com>
source/server/admin/admin.cc Show resolved Hide resolved
source/server/admin/admin.h Show resolved Hide resolved
NullOverloadManager(ThreadLocal::SlotAllocator& slot_allocator)
: tls_(slot_allocator.allocateSlot()) {}

void start() override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is start() called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, moved the TLS initialization to the constructor, which seems to work fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you opted to use start() in the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reasoning to keep this in start vs the ctor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor gets invoked with AdminImpl's constructor, which is invoked in server.cc before the TLS state is initialized. Putting this in start lets us defer it.

Signed-off-by: Alex Konradi <akonradi@google.com>
antoniovicente
antoniovicente previously approved these changes Jul 1, 2020
source/server/admin/admin.h Outdated Show resolved Hide resolved
Signed-off-by: Alex Konradi <akonradi@google.com>
akonradi added 4 commits July 1, 2020 16:24
Signed-off-by: Alex Konradi <akonradi@google.com>
Doing it in the constructor was causing ASAN errors.

Signed-off-by: Alex Konradi <akonradi@google.com>
Initialize the AdminImpl after the dispatchers so that it can create
thread-local data structures. Without this, integration tests fail ASAN
runs.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@akonradi
Copy link
Contributor Author

akonradi commented Jul 7, 2020

@antoniovicente can I get another pass on this? I had to make some changes to get the ASAN integration tests to pass.

antoniovicente
antoniovicente previously approved these changes Jul 9, 2020

NullOverloadManager(Event::Dispatcher& dispatcher) : thread_local_overload_state_(dispatcher) {}
void start() override {}
ThreadLocalOverloadState& getThreadLocalOverloadState() override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my advice is to use the Event::Dispatcher interface to create scalable timers. We can discuss in the PR that defines that API.

actions_[action] = state;
} else {
it->second = state;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure why the body of this function isn't simply:

actions_[action] = state;

@@ -428,6 +411,24 @@ void InstanceImpl::initialize(const Options& options,
dispatcher_->initializeStats(stats_store_, "server.");
}

if (initial_config.admin().address()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What specific action you need to happen first? Is it the "thread_local_.registerThread(*dispatcher_, true);"

Does worker initialization happen before or after this point?

I wonder if initialization is still happening too early and possible consequences for internal admin API users which bypass request handling on the admin port.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's the line. The worker initialization happens before, in the ListenerManagerImpl constructor.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@akonradi
Copy link
Contributor Author

akonradi commented Jul 9, 2020

@mattklein123 can I get a review?

antoniovicente
antoniovicente previously approved these changes Jul 9, 2020
NullOverloadManager(ThreadLocal::SlotAllocator& slot_allocator)
: tls_(slot_allocator.allocateSlot()) {}

void start() override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you opted to use start() in the end.

@akonradi
Copy link
Contributor Author

akonradi commented Jul 9, 2020

@ggreenway I've been informed Matt is OOO, and Alyssa is with the same org as me and Antonio. Can I ask you to assign a senior maintainer to review this?

Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this looks good overall, just a few comments

NullOverloadManager(ThreadLocal::SlotAllocator& slot_allocator)
: tls_(slot_allocator.allocateSlot()) {}

void start() override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reasoning to keep this in start vs the ctor?


NullOverloadManager(Event::Dispatcher& dispatcher) : thread_local_overload_state_(dispatcher) {}
void start() override {}
ThreadLocalOverloadState& getThreadLocalOverloadState() override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to make this const for now and then remove it down the line if we start needing non-const access, just so we don't accidentally leave this non-const.


bool registerForAction(const std::string&, Event::Dispatcher&, OverloadActionCb) override {
// This method shouldn't be called by the admin listener
ASSERT(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically we'd use NOT_REACHED_GCOVR_EXCL_LINE to exclude this line from coverage in addition to assert that it wont get called

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@akonradi
Copy link
Contributor Author

@snowp it looks like the TSAN failure is due to #9784. Also, the /retest command appears to only re-run CircleCI tests, not the Azure pipelines.

Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this LGTM.

I think its failing on a well known flake, I'll try to rerun the test a few times.

@mattklein123 mattklein123 merged commit 0c6523c into envoyproxy:master Jul 14, 2020
scheler pushed a commit to scheler/envoy that referenced this pull request Aug 4, 2020
The only user of a null OverloadManager was the admin console. Switch
that to using a no-op OverloadManager subclass so that the OM can be
provided via reference in all cases. This makes the HCM initialization
simpler since it doesn't need to worry about the null pointer.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: scheler <santosh.cheler@appdynamics.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants