TestSamples/logstash-logstash_pv is flaky #7538

Closed
barkbay opened this issue Feb 8, 2024 · 4 comments · Fixed by #7539
Labels
>test Related to unit/integration/e2e tests v2.12.0

Comments

barkbay commented Feb 8, 2024

TestSamples/logstash-logstash_pv failed twice in a row:

=== RUN   TestSamples/logstash-logstash_pv/Logstash_should_respond_to_default_pipeline_requests
Retries (15m0s timeout): .........................................................................................................................................................................................................................................................................................................
    step.go:51: 
        	Error Trace:	/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/utils.go:94
        	Error:      	Received unexpected error:
        	            	fail to request /_node/pipelines/main, status is 404)
        	Test:       	TestSamples/logstash-logstash_pv/Logstash_should_respond_to_default_pipeline_requests
{
    "log.level": "error",
    "@timestamp": "2024-02-07T15:30:17.085Z",
    "message": "continuing with additional tests",
    "service.version": "0.0.0-SNAPSHOT+00000000",
    "service.type": "eck",
    "ecs.version": "1.4.0",
    "error": "test Logstash should respond to default pipeline requests failed"
}
{
    "log.level": "info",
    "@timestamp": "2024-02-07T15:30:17.089Z",
    "log.logger": "e2e",
    "message": "Running eck-diagnostics",
    "service.version": "0.0.0-SNAPSHOT+00000000",
    "service.type": "eck",
    "ecs.version": "1.4.0",
    "cluster": "eck-e2e-gke-all-fpiu-7379",
    "test": "TestSamples/logstash-logstash_pv",
    "step": "Logstash should respond to default pipeline requests"
}
        --- FAIL: TestSamples/logstash-logstash_pv/Logstash_should_respond_to_default_pipeline_requests (900.00s)

I did not investigate, but from a quick look at the logs I can see the following errors:

[2024-02-07T15:14:50,993][WARN ][logstash.persistedqueueconfigvalidator] The persistent queue on path "/usr/share/logstash/data/queue/main" won't fit in file system "/dev/sdd" when full. Please free or allocate 1073741824 more bytes.
...
[2024-02-07T15:14:53,407][ERROR][org.logstash.execution.AbstractPipelineExt] Logstash failed to create queue.
java.io.IOException: Unable to allocate 1073741824 more bytes for persisted queue on top of its current usage of 0 bytes
	at org.logstash.ackedqueue.Queue.ensureDiskAvailable(Queue.java:893) ~[logstash-core.jar:?]
...
[2024-02-07T15:14:53,412][ERROR][logstash.agent           ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"Java::JavaLang::IllegalStateException", :message=>"java.io.IOException: Unable to allocate 1073741824 more bytes for persisted queue on top of its current usage of 0 bytes", :backtrace=>
...

CC @robbavey @kaisecheng, sorry for the ping. Do you think this could be related to a recent change in the Logstash controller? 🙇

@barkbay barkbay added >test Related to unit/integration/e2e tests v2.12.0 labels Feb 8, 2024

thbkrkr commented Feb 8, 2024

Looking at the main pipeline, the test started failing after "Update Elastic Stack version to v8.12.1" (#7535) was merged.

kaisecheng commented

Logstash wants to allocate 1 GB for the persistent queue (PQ), but the filesystem "/dev/sdd" doesn't have enough space. The PQ folder is empty (current usage of 0 bytes). The test case assigns a 1 GB PVC to the PQ path "/usr/share/logstash/data/queue". I am wondering why "/dev/sdd" is running out of space.

kaisecheng commented Feb 8, 2024

TestSamples runs the tests in ../../config/samples/*/*.yaml, and this test case started failing after the file was renamed from yml to yaml. Maybe the test case never passed in CI? Running it locally in minikube always gives me PASS. It fails in GKE.
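
To illustrate why the rename matters, here is a minimal sketch of how a `*/*.yaml` glob treats the two extensions. It is not the actual TestSamples code: the helper `matchesSamplePattern` and the file names are hypothetical, and it uses `path.Match` from the Go standard library as a stand-in for whatever globbing the test harness really does.

```go
package main

import (
	"fmt"
	"path"
)

// matchesSamplePattern reports whether a sample file path (relative to the
// samples directory) is selected by a glob such as "*/*.yaml".
// Hypothetical helper, not part of cloud-on-k8s.
func matchesSamplePattern(pattern, name string) bool {
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

func main() {
	pattern := "*/*.yaml"
	// Hypothetical file names, before and after the rename.
	names := []string{"logstash/logstash_pv.yml", "logstash/logstash_pv.yaml"}
	for _, name := range names {
		fmt.Printf("%-28s matched=%v\n", name, matchesSamplePattern(pattern, name))
	}
}
```

Under these assumptions, a sample with a `.yml` extension would not be picked up by the pattern at all, which would be consistent with the test only starting to run (and fail) in CI after the rename.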

kaisecheng added a commit to kaisecheng/cloud-on-k8s that referenced this issue Feb 8, 2024
kaisecheng commented

Alright, the filesystem simply doesn't have enough space: 1 GB is required, but only 958M is available.
I believe #7539 will fix this.
