Move ES dependencies index mapping to JSON template file #2149

Closed
wants to merge 5 commits into from

frittentheke
Contributor

Which problem is this PR solving?

This is a follow-up to PR #2144

Short description of the changes

This PR removes the inline ES index template that is applied to every newly created dependencies index and replaces it with a template that matches dependencies indices by name. This is similar to PR #1309, which did the same for spans / services.

frittentheke force-pushed the depMappings branch 3 times, most recently from a06aecf to 4d06414 April 3, 2020 17:21
pavolloffay changed the title from "Remove built in ES schema for dependencies and use mapping template via JSON file" to "Move ES dependencies index mapping to JSON template file" Apr 6, 2020
Member

pavolloffay left a comment

The PR looks much cleaner now 👍 . Could you please fix the CI? There seem to be some failures

https://travis-ci.org/github/jaegertracing/jaeger/jobs/670675957#L786

plugin/storage/es/dependencystore/storage.go
@frittentheke
Contributor Author

@pavolloffay I did fix / adjust the tests and also ran the make job to update the statically embedded files (make elasticsearch-mappings) - let's see if we get a green light now.

@frittentheke
Contributor Author

@pavolloffay the two CI builds that are still failing seem to be out of "my" control, as they relate to other components. I already moved up to the current master, with no change though.

@pavolloffay
Member

I have restarted the CI

@frittentheke
Contributor Author

@pavolloffay Maybe you could help me a little with the still failing tests?

  1. I honestly don't understand the purpose of this test here:
    https://travis-ci.org/github/jaegertracing/jaeger/jobs/672665074#L3474
    Why is there an error expected to occur?
    With this PR the index creation does not happen explicitly anymore, so could we just remove this test?

  2. What is this test_driver doing here, and why is it timing out on startup?
    https://travis-ci.org/github/jaegertracing/jaeger/jobs/672665077#L3877

  3. Last but not least, in this case there seems to be no Elasticsearch running to apply the template "jaeger-dependencies" to: https://travis-ci.org/github/jaegertracing/jaeger/jobs/672665079#L3600

But how could that relate to my changes?

@pavolloffay
Member

All the tests have to be fixed before merging. If the logic has changed then also adapt the tests for the new logic.

Sometimes CI is flaky; it's best to run the tests locally and see whether they pass.

@frittentheke
Contributor Author

frittentheke commented Apr 9, 2020

@pavolloffay maybe that was lost in translation, but I was not arguing against fixing the CI. I will gladly dive deeper into things and provide additional commits to get this PR ready to merge.

Just because you wrote so much of that code (and the tests and CI to it) I was hoping you could give me a few hints on why things might be broken.

@pavolloffay
Member

Point me to the code where you have questions. Generally speaking, we have around 100% test coverage in Jaeger, so the tests really examine each branch.

@frittentheke
Contributor Author

@pavolloffay thanks for the offer to help. I actually did ask questions in my previous #2149 (comment). Question (1) was about the validity of one test in general (asking whether I may remove it altogether). The other two questions were about the CI setup in general. And quite honestly, there are a whole lot of scripts interworking with each other to set up a test environment for the CI to run against. That complexity (and the flakiness you mentioned yourself) creates a real burden for new contributors, who have to understand it before they can debug their actual code / tests.

frittentheke force-pushed the depMappings branch 2 times, most recently from 803d924 to fb00969 April 14, 2020 13:57
@frittentheke
Contributor Author

frittentheke commented Apr 14, 2020

@pavolloffay am I right to assume that I should base my PR commits on the release-1.17 branch instead of master itself?

Master seems to be quite active, and there are many unrelated test failures when I apply my three innocent commits on top of it.

@pavolloffay
Member

The PR should be based on the latest master. There is already one git conflict in this PR. About the tests as I mentioned previously, the test should be adapted to code changes. I often run ES integration tests locally. It shouldn't be too hard.

frittentheke force-pushed the depMappings branch 3 times, most recently from 33af9a2 to 820921d April 14, 2020 14:35
@frittentheke
Contributor Author

@pavolloffay alright, things are moving forward: the merge should be clean, and with commit 820921d I adjusted the WriteDependencies test to the changes of this PR, so it is all green now.

What remains are the other two CI jobs - one about ElasticSearch integration (https://travis-ci.org/github/jaegertracing/jaeger/jobs/674885438#L3648). While that one seems to be related to my changes as it affects ES - there is no real indication / output on why it fails and why this service did not properly start up.

And then there is the crossdock docker compose thing not properly starting up (https://travis-ci.org/github/jaegertracing/jaeger/jobs/674885436#L3734) - this is where your CI things actually are a little more complex to understand ... there is no error whatsoever until that service apparently throws a 503: https://travis-ci.org/github/jaegertracing/jaeger/jobs/674885436#L3734

Do you have any more hints on replicating things locally? Following the contribution guidelines (https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING.md#pre-requisites) I was able to run the tests locally via make test, but the CI apparently goes much further and starts quite a few things to run integration tests. Apart from reading the CI scripts, there seems to be no easy way for a newcomer to start all of this on a local box.

@pavolloffay
Member

The xdock e2e tests are harder to understand and debug, but they use ES, and since I have restarted the job a couple of times, the failure seems to be related to this PR. Let's fix the ES integration tests; that should also fix xdock.

The ES storage integration test can be run by

docker run -it --rm -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -p 9200:9200 -p 9300:9300 -e "http.host=0.0.0.0" -e "discovery.type=single-node" --name=elasticsearch docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.4
STORAGE=elasticsearch make storage-integration-test  

Similarly you can run other tests from this script

STORAGE=elasticsearch make storage-integration-test

@frittentheke
Contributor Author

frittentheke commented Apr 16, 2020

Thanks for the hint @pavolloffay .

Well the core ElasticSearch tests are all green (as they were in Travis-CI), but the issue starts with

bash -c "set -e; set -o pipefail; go test -v -race -tags token_propagation -run TestBearTokenPropagation ./plugin/storage/integration/... | sed ''/PASS/s//PASS/'' | sed ''/FAIL/s//FAIL/''"

which is called at https://github.com/jaegertracing/jaeger/blob/master/scripts/travis/es-integration-test.sh#L18.

The script itself is rather silent, and there are 4 additional make targets being called to build and run other things for the TokenPropagation tests. This honestly is a little complex and hard to debug just by looking at the CI output.

I will try to replicate this locally and add some debug statements to see where things fail when this runs in CI.

@frittentheke
Contributor Author

frittentheke commented Apr 16, 2020

@pavolloffay I found (and hopefully fixed) an issue (of course a stupid syntax error in the template file itself), so kudos to CI :-)

When trying to replicate the CI stages locally, I was wondering about the startup of the jaeger-query here: https://github.com/jaegertracing/jaeger/blob/master/scripts/travis/es-integration-test.sh#L25

It expects an Elasticsearch service on port 9200, but the instances used before were all stopped again.
Who starts it, and how?

@pavolloffay
Member

Try it locally; it does not require ES to be up. The tests then use a mocked ES instance.

SPAN_STORAGE_TYPE=elasticsearch ./cmd/query/query-linux --es.server-urls=http://127.0.0.1:9200 --es.tls=false --es.version=7 --query.bearer-token-propagation=true

@@ -22,7 +22,10 @@ make build-crossdock-ui-placeholder
GOOS=linux make build-query

make test-compile-es-scripts
CID=$(docker run --rm -d -p 9200:9200 -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:7.3.0)
Member

Why is this needed?

Contributor Author

frittentheke Apr 17, 2020

I was just playing around with things, and locally this was required: no ES was running (or mocked) on port 9200 for those tests to run against.

Honestly I am a little lost... I simply tried the same approach used a few lines above in that script, a single-node ES in Docker, in order to make things work.

@frittentheke
Contributor Author

frittentheke commented Apr 17, 2020

@pavolloffay how / where is the ES mock started, then? A few lines above, in the run_integration_test function, a single-node Elasticsearch is started as a Docker container right before. You also mentioned that in your instructions #2149 (comment)

My local attempts fail exactly like the last remaining failing test in CI (https://travis-ci.org/github/jaegertracing/jaeger/jobs/676153815#L3596): the tests run by bash -c "set -e; set -o pipefail; go test -v -race -tags token_propagation -run TestBearTokenPropagation ./plugin/storage/integration/... | sed ''/PASS/s//PASS/'' | sed ''/FAIL/s//FAIL/''" need jaeger-query at 127.0.0.1:16686, and it is not there.
Jaeger-query simply cannot connect to an Elasticsearch at port 9200 and does not start:

{"level":"info","ts":1587124547.0320215,"caller":"flags/service.go:116","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1587124547.0323653,"caller":"flags/admin.go:120","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1587124547.0324762,"caller":"flags/admin.go:126","msg":"Starting admin HTTP server","http-addr":":16687"}
{"level":"info","ts":1587124547.0325162,"caller":"flags/admin.go:112","msg":"Admin server started","http.host-port":"[::]:16687","health-status":"unavailable"}
{"level":"fatal","ts":1587124547.051009,"caller":"command-line-arguments/main.go:101","msg":"Failed to create dependency reader","error":"Put \"http://127.0.0.1:9200/_template/jaeger-dependencies\": dial tcp 127.0.0.1:9200: connect: connection refused","stacktrace":"main.main.func1\n\tcommand-line-arguments/main.go:101\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v0.0.3/command.go:762\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v0.0.3/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v0.0.3/command.go:800\nmain.main\n\tcommand-line-arguments/main.go:134\nruntime.main\n\truntime/proc.go:203"}

When I added an ElasticSearch to be running at port 9200 the jaeger-query instance starts successfully and locally the tests run fine against it.

So what I am trying to say is: TestBearTokenPropagation cannot even start while jaeger-query is not running.

… file like with span and service indices

Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
… json files

Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
…and introduce tests for dependencies index / mapping template

Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
… is no explicit index creation anymore

Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
@frittentheke
Contributor Author

@pavolloffay would you kindly take another look at this. I just pushed commit b6d1c8b containing the discussed missing ES instance to run TestBearTokenPropagation.

I believe the Elasticsearch integration tests are skipped most of the time, which is why my tests fail while master works (check out the last run: https://travis-ci.org/github/jaegertracing/jaeger/jobs/677609252#L2285).

@frittentheke
Contributor Author

This PR has been obsoleted by #2285 - closing this one.
