This repository has been archived by the owner on Jul 19, 2023. It is now read-only.

Fixes the scrape timeout validation. #465

Merged
merged 4 commits into main from fixes/463 on Jan 11, 2023

Conversation

cyriltovena (Collaborator)

Fixes #463

cyriltovena requested a review from simonswine on January 9, 2023 at 15:25
simonswine (Collaborator) left a comment:

Good catch on the bug fix; I am not sure I agree with the second change.

pkg/agent/config.go (outdated; resolved)
Comment on lines 125 to 130
```diff
-	if c.ScrapeTimeout > c.ScrapeInterval {
-		return fmt.Errorf("scrape timeout must be larger or equal to inverval for: %v", c.JobName)
-	}
 	if c.ScrapeTimeout == 0 {
-		c.ScrapeTimeout = c.ScrapeInterval
+		c.ScrapeTimeout = c.ScrapeInterval + model.Duration(3*time.Second)
 	}
+	if c.ScrapeTimeout <= c.ScrapeInterval {
+		return fmt.Errorf("scrape timeout must be greater than interval for: %v", c.JobName)
+	}
```
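For context, a minimal sketch of how the validation reads after this change, assuming it lives in a Validate method on the agent's scrape config; the surrounding types and package name are approximated from the diff, not the verbatim source:

```go
// Sketch only: reconstructed from the diff above.
package agent

import (
	"fmt"
	"time"

	"github.com/prometheus/common/model"
)

// Config holds just the fields relevant to this thread.
type Config struct {
	JobName        string
	ScrapeInterval model.Duration
	ScrapeTimeout  model.Duration
}

// Validate defaults the timeout to interval+3s, leaving headroom for the
// profile to be built and transferred after the profiling window, then
// rejects any timeout that is not strictly larger than the interval.
func (c *Config) Validate() error {
	if c.ScrapeTimeout == 0 {
		c.ScrapeTimeout = c.ScrapeInterval + model.Duration(3*time.Second)
	}
	if c.ScrapeTimeout <= c.ScrapeInterval {
		return fmt.Errorf("scrape timeout must be greater than interval for: %v", c.JobName)
	}
	return nil
}
```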
Collaborator

I think we should only apply those defaults and checks if delta=true, because otherwise computing the profile is near-instantaneous and we should allow shorter timeouts.
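A rough sketch of what that gating might look like, reusing the Config sketch above; validateTimeout and the hasDelta parameter are hypothetical names, not from this PR:

```go
// Hypothetical variant per the comment above: only enforce the long
// timeout for delta profiles, which block for roughly one scrape interval.
func (c *Config) validateTimeout(hasDelta bool) error {
	if !hasDelta {
		// Non-delta profiles are computed near-instantly, so a short
		// timeout is fine and nothing needs to be defaulted here.
		return nil
	}
	if c.ScrapeTimeout == 0 {
		c.ScrapeTimeout = c.ScrapeInterval + model.Duration(3*time.Second)
	}
	if c.ScrapeTimeout <= c.ScrapeInterval {
		return fmt.Errorf("scrape timeout must be greater than interval for: %v", c.JobName)
	}
	return nil
}
```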

pkg/agent/config.go (resolved)
```diff
@@ -228,7 +228,7 @@ func (tg *TargetGroup) targetsFromGroup(group *targetgroup.Group) ([]*Target, []
 	}

 	if pcfg, found := tg.config.ProfilingConfig.PprofConfig[profType]; found && pcfg.Delta {
-		params.Add("seconds", strconv.Itoa(int(time.Duration(tg.config.ScrapeTimeout)/time.Second)-1))
+		params.Add("seconds", strconv.Itoa(int(time.Duration(tg.config.ScrapeInterval)/time.Second)))
```
Collaborator

I am not sure why this was removed. I always thought it was there to make sure we get a consistent scrape time even when network latency is added (up to 1 second of it).

Let's take this example:

scrape_interval: 15s
scrape_timeout: 18s
profiling_time: 15s (was 14s before)

After this change, a network latency of 500ms would make us miss out on scrapes: the real scrape interval becomes 15s plus the network latency, which creates problems when we look at things over time, e.g. in this histogram screenshot (similar to the Prometheus rate([twice_scrape_interval]) problem).

Here you can see a point missing at 11:12:47 because of that:

[screenshot: scrn-2023-01-11-11-13-41]
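To make the arithmetic concrete, a small illustrative sketch of the two profiling-window computations under the example above (not code from the PR):

```go
// Compares the old (timeout-1) and new (interval) "seconds" parameter.
package main

import (
	"fmt"
	"time"
)

func main() {
	interval := 15 * time.Second
	latency := 500 * time.Millisecond

	// Before the PR: seconds = timeout - 1, with the timeout defaulting to
	// the interval, i.e. a 14s profiling window plus 1s of latency headroom.
	before := interval - time.Second

	// After the PR: seconds = interval, i.e. a full 15s profiling window
	// with no headroom left for network latency.
	after := interval

	fmt.Println("before:", before+latency, "total, fits inside", interval) // 14.5s
	fmt.Println("after: ", after+latency, "total, exceeds", interval)      // 15.5s
}
```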

cyriltovena and others added 3 commits January 11, 2023 12:17
Co-authored-by: Christian Simon <simon@swine.de>
Co-authored-by: Christian Simon <simon@swine.de>
cyriltovena enabled auto-merge on January 11, 2023 at 13:07
cyriltovena merged commit c5e6772 into main on January 11, 2023
cyriltovena deleted the fixes/463 branch on January 11, 2023 at 13:16
simonswine pushed a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023
Fixes the scrape timeout validation.
Development

Successfully merging this pull request may close these issues.

Unable to set scrape timeout higher than scrape interval