Talwai/agent dev mode #1577
Hey @talwai, I tested it a little and have some remarks (not concerning the code). You probably already talked about this with @LeoCavaille, but I'm going to add my 2 cents. I was expecting a lot more "by check" metrics, i.e. at the end of every run, display some stats for each check (and not only globally): for instance run time, memory consumption before/after, and IO read/write (basically the metrics you're already displaying for each run, maybe tagged by check if you want to upload them to DD).
consul:
run_time: 1.01s
memory_before: 60785934
memory_after: 60785935
io read: 2
...
pgbouncer:
run_time: 6.61s
memory_before: 60785834
memory_after: 60785840
io read: 255
...
collector:
run_time: 15.04s
memory_before: 60785834
memory_after: 60786001
io read: 256
...
If you do something like this, I think the profiler should be a separate option, not activated by default with the "developer mode", because we don't always need this level of detail when debugging a check. And maybe also a profiler by check? (Not sure it's needed, just a suggestion.) Another thing: I don't think you should put
and a standard user shouldn't see this (even if it doesn't do anything). The new agent_metrics are really nice! 👍 I think you could just prefix them by
I just realized that we already have some agent_metrics here: https://github.com/DataDog/dd-agent/blob/master/checks/agent_metrics.py, you might want to put yours there too (or maybe you split them on purpose?).
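The per-check stats suggested above could be gathered along these lines. This is only a minimal sketch: `run_with_stats` and `_rss_bytes` are hypothetical helpers, not part of dd-agent, and the memory fields fall back to `None` when `psutil` is not installed.

```python
import time

try:
    import psutil
except ImportError:  # psutil may be missing on some installs
    psutil = None


def _rss_bytes():
    """Resident memory of the current process, or None without psutil."""
    if psutil is None:
        return None
    return psutil.Process().memory_info().rss


def run_with_stats(check_name, run_check):
    """Run one check callable and return per-check stats (hypothetical helper)."""
    memory_before = _rss_bytes()
    start = time.time()
    run_check()
    run_time = time.time() - start
    return {
        "check": check_name,
        "run_time": run_time,
        "memory_before": memory_before,
        "memory_after": _rss_bytes(),
    }
```

Calling `run_with_stats("consul", some_check_callable)` once per check would give one stats dictionary per check, which could then be logged or tagged by check name.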
@degemer Thanks for the feedback!
I agree that we should be getting more data about the performance of individual checks, and not just the collector run as a whole. I can extend
Yep, the output is ugly. Will improve the formatting for the log dumps
I get where you're coming from. In fact, I myself silenced the profiler dump when debugging the
The reason it's in there is:
Basically Will also fix the metric namespace and change
2c3b4cc to 3122e10
@@ -0,0 +1,44 @@
#3p
import psutil
You probably don't need psutil here!?
Nice! Good to see progress, this will prove super helpful to help the community design more performant checks. Made a few remarks just giving a first look.
@@ -22,6 +22,7 @@

# 3rd party
import yaml
import psutil
psutil is actually not available on a source install without a compiler :/
Can you try to catch the import error and disable developer mode accordingly?
@LeoCavaille thanks for the review! I think both your ideas are good, if I understand correctly. I will look into moving the profiler higher in the call stack, and dumping cumulative statistics after longer intervals, rather than on every collector run. This should give us more useful higher-level insight than the current setup does. I will also add an option for dumping
8febb1d to af59b19
The updated default functionality is as follows:
f3db44f to 726abef

# 3rd party
import yaml

try:
    import psutil
    PSUTIL_PRESENT = True
You don't need this flag.
To test if psutil is here you can just do `if psutil is not None`
e.g. dd-agent/checks/system/win32.py, lines 5 to 8 in 1bef95f:
try:
    import psutil
except ImportError:
    psutil = None
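Combining the two review comments, developer mode could be switched off whenever the import fails, without a separate flag. This is a sketch under assumptions: `resolve_developer_mode` is a hypothetical helper, and the `psutil_module` parameter exists only so the behavior can be exercised without uninstalling `psutil`.

```python
import logging

try:
    import psutil
except ImportError:
    psutil = None

log = logging.getLogger(__name__)


def resolve_developer_mode(requested, psutil_module=psutil):
    """Return True only if developer mode was requested and psutil is usable.

    Hypothetical helper: the actual agent may wire this differently.
    """
    if requested and psutil_module is None:
        log.warning("psutil is not available, disabling developer mode")
        return False
    return bool(requested)
```

The `psutil is not None` test replaces a flag such as `PSUTIL_PRESENT`, so there is only one name to keep in sync with the import.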
Great work! Also, does it work properly on Windows?
4b80cc1 to 4d51cb2
    profiler.disable_profiling()
    profiled = False
    collector_profiled_runs = 0
except Exception:
Any idea what kind of exception can be raised here?
If you just want to play it safe and make sure that the profiler doesn't crash the agent (which is a good thing), can you log the raised exception please?
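One way to keep the broad `except Exception` while still recording what went wrong is `logging`'s `exception()` method, which logs the message together with the traceback. The sketch below assumes a profiler object exposing `disable()` (as `cProfile.Profile` does); `disable_profiler_safely` is a hypothetical name, and the PR's own `disable_profiling()` wrapper is not reproduced here.

```python
import cProfile
import logging

log = logging.getLogger(__name__)


def disable_profiler_safely(profiler):
    """Disable a profiler without letting a failure propagate.

    Returns True on success, False if an exception was caught and logged.
    """
    try:
        profiler.disable()
        return True
    except Exception:
        # log.exception records the message plus the full traceback
        log.exception("Unable to disable the profiler")
        return False
```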
384a798 to 9036155
@remh This works on Windows.
@@ -564,6 +648,15 @@ def run(self):
tb=traceback.format_exc()
)
instance_statuses.append(instance_status)

if self.in_developer_mode and self.name != 'agent_metrics':
nitpick, but you should use your `AGENT_METRICS_CHECK_NAME` constant
Looks great besides the last comments! It should be ready to merge once those are fixed.
0885520 to ac9e194
LEGACY_DATADOG_URLS = [
    "app.datadoghq.com",
    "app.datad0g.com",
]

#Checks whose log output to suppress, unless explicitly asked for
HIDDEN_CHECKS = ['agent_metrics']
You should use the constant you defined here.
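The constant-based version the reviewer asks for might look like the following. Where `AGENT_METRICS_CHECK_NAME` is actually defined in the PR is not shown in this thread, so its definition here is an assumption, and `should_log_check` is a hypothetical helper added only to show the constant in use.

```python
# Assumed definition; the PR defines this constant elsewhere.
AGENT_METRICS_CHECK_NAME = 'agent_metrics'

# Checks whose log output to suppress, unless explicitly asked for
HIDDEN_CHECKS = [AGENT_METRICS_CHECK_NAME]


def should_log_check(check_name, show_hidden=False):
    """Hypothetical helper: hidden checks are logged only on request."""
    return show_hidden or check_name not in HIDDEN_CHECKS
```

With a single constant, renaming the check only requires one change instead of hunting for every `'agent_metrics'` string literal.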
ac9e194 to 915ab28
Changes Unknown when pulling 915ab28 on talwai/agent_dev_mode into ** on master**. |
This adds a "developer mode" to the agent, which can be enabled by the `--profile` command line flag or by the `developer_mode` setting in `agentConfig`.

Developer mode can be enabled at the `check` level or the `collector` level. When enabled at the check level, e.g. `./agent.py check nginx --profile`, the `check.run()` function is profiled with `cProfile` and the `pstats` output is dumped to the log.

When enabled at the `collector` level, the following behavior is enabled:
- The collector is profiled with `cProfile`, in the same way as above.
- The `AgentMetrics` check (`checks.d/agent_metrics.py`) is run at the end of each collector loop. This check can be configured to run a set of additional `psutil.Process` methods on the current process. These additional metrics are flushed to the dispatcher as well as dumped to the log. See `conf.d/agent_metrics.yaml.default` for the default configuration, which can be extended as needed. Currently a setting under `process_metrics` is ignored if there is no corresponding `psutil.Process` method.
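As a rough illustration of the two behaviors described above, the sketch below profiles a callable with `cProfile`, dumps the `pstats` report to the log, and looks up metric methods by name while skipping names that have no matching method. `profile_and_log` and `collect_process_metrics` are hypothetical names, not the functions added by this PR, and the real check may read its configuration differently.

```python
import cProfile
import io
import logging
import pstats

log = logging.getLogger(__name__)


def profile_and_log(func, *args, **kwargs):
    """Run func under cProfile and dump the pstats report to the log.

    Returns (result of func, report text).
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(20)  # top 20 entries
    report = stream.getvalue()
    log.debug(report)
    return result, report


def collect_process_metrics(process, method_names):
    """Call each named method on process; skip names with no such method.

    `process` would typically be a psutil.Process instance, but any object
    works, which keeps this sketch independent of psutil.
    """
    metrics = {}
    for name in method_names:
        method = getattr(process, name, None)
        if not callable(method):
            continue  # mirrors "ignored if there is no corresponding method"
        metrics[name] = method()
    return metrics
```

For example, `collect_process_metrics(psutil.Process(), ['num_threads', 'not_a_method'])` would return only the `num_threads` entry, assuming `psutil` is installed.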