[CI] HaHdfsFailoverTestSuiteIT#testHAFailoverWithRepository fails because it was unable to create repo #31739

Closed
polyfractal opened this issue Jul 2, 2018 · 7 comments
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI


@polyfractal
Contributor

Could not reproduce this locally, even with many iterations. Tagging this as Snapshot/Restore, but it might be a security issue instead, since the failure was authentication-related.

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+31230_gradle48_jdk11+matrix-java-feature-branch/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=java10,nodes=virtual&&linux/16/console

./gradlew :plugins:repository-hdfs:integTestHaRunner \
  -Dtests.seed=6FA024F53EBF2435 \
  -Dtests.class=org.elasticsearch.repositories.hdfs.HaHdfsFailoverTestSuiteIT \
  -Dtests.method="testHAFailoverWithRepository" \
  -Dtests.security.manager=true \
  -Dtests.locale=fr-MC \
  -Dtests.timezone=Europe/Kaliningrad
org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:33111], URI [/_snapshot/hdfs_ha_repo_read], status line [HTTP/1.1 500 Internal Server Error]
{"error":{"root_cause":[{"type":"repository_exception","reason":"[hdfs_ha_repo_read] failed to create repository"}],"type":"repository_exception","reason":"[hdfs_ha_repo_read] failed to create repository","caused_by":{"type":"unchecked_i_o_exception","reason":"Could not retrieve the current user information","caused_by":{"type":"i_o_exception","reason":"failure to login: javax.security.auth.login.LoginException: Security Exception","caused_by":{"type":"login_exception","reason":"Security Exception","caused_by":{"type":"security_exception","reason":null}}}}},"status":500}
	at __randomizedtesting.SeedInfo.seed([6FA024F53EBF2435:9ADF91EB9F7842B1]:0)
	at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:918)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:225)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:319)
	at org.elasticsearch.repositories.hdfs.HaHdfsFailoverTestSuiteIT.testHAFailoverWithRepository(HaHdfsFailoverTestSuiteIT.java:109)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:33111], URI [/_snapshot/hdfs_ha_repo_read], status line [HTTP/1.1 500 Internal Server Error]
{"error":{"root_cause":[{"type":"repository_exception","reason":"[hdfs_ha_repo_read] failed to create repository"}],"type":"repository_exception","reason":"[hdfs_ha_repo_read] failed to create repository","caused_by":{"type":"unchecked_i_o_exception","reason":"Could not retrieve the current user information","caused_by":{"type":"i_o_exception","reason":"failure to login: javax.security.auth.login.LoginException: Security Exception","caused_by":{"type":"login_exception","reason":"Security Exception","caused_by":{"type":"security_exception","reason":null}}}}},"status":500}
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:538)
	at org.elasticsearch.client.RestClient$1.completed(RestClient.java:527)
	at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
	at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
	... 1 more
@polyfractal polyfractal added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Jul 2, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@bleskes
Contributor

bleskes commented Jul 3, 2018

@jbaiera do you mind having a look?

@mayya-sharipova
Contributor

The same test failed today on master, just with different error messages.
@atorok mentioned in a previous email that he had pushed a fix. Has the fix been pushed to master?

REPRODUCE WITH: ./gradlew :plugins:repository-hdfs:integTestHaRunner \
  -Dtests.seed=80D8D1131CF26803 \
  -Dtests.class=org.elasticsearch.repositories.hdfs.HaHdfsFailoverTestSuiteIT \
  -Dtests.method="testHAFailoverWithRepository" \
  -Dtests.security.manager=true \
  -Dtests.locale=hi-IN \
  -Dtests.timezone=Pacific/Rarotonga
16:49:02   2> NOTE: All tests run in this JVM: [HaHdfsFailoverTestSuiteIT]
16:49:02   1> [2018-07-05T16:49:00,658][WARN ][o.e.b.JNANatives         ] unable to install syscall filter: 
16:49:02   1> java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
16:49:02   1> 	at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:341) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
16:49:02   1> 	at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:616) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
16:49:02   1> 	at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:258) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
16:49:02   1> 	at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:113) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
16:49:02   1> 	at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:108) [elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
16:49:02   1> 	at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:83) [framework-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
16:49:02   1> 	at org.elasticsearch.test.ESTestCase.<clinit>(ESTestCase.java:197) [framework-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
16:49:02   1> 	at java.lang.Class.forName0(Native Method) [?:?]
16:49:02   1> 	at java.lang.Class.forName(Class.java:374) [?:?]
16:49:02   1> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:592) [randomizedtesting-runner-2.5.2.jar:?]
16:49:02   1> [2018-07-05T16:49:00,671][WARN ][o.e.b.JNANatives         ] Unable to lock JVM Memory: error=12, reason=Cannot allocate memory
16:49:02   1> [2018-07-05T16:49:00,671][WARN ][o.e.b.JNANatives         ] This can result in part of the JVM being swapped out.
16:49:02   1> [2018-07-05T16:49:00,671][WARN ][o.e.b.JNANatives         ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536
16:49:02   1> [2018-07-05T16:49:00,671][WARN ][o.e.b.JNANatives         ] These can be adjusted by modifying /etc/security/limits.conf, for example: 
16:49:02   1> 	# allow user 'jenkins' mlockall
16:49:02   1> 	jenkins soft memlock unlimited
16:49:02   1> 	jenkins hard memlock unlimited
16:49:02   1> [2018-07-05T16:49:00,671][WARN ][o.e.b.JNANatives         ] If you are logged in interactively, you will have to re-login for the new limits to take effect.
16:49:02   1> [2018-07-05T16:49:01,300][INFO ][o.e.r.h.HaHdfsFailoverTestSuiteIT] [testHAFailoverWithRepository]: before test
16:49:02   1> [2018-07-05T16:49:01,400][INFO ][o.e.r.h.HaHdfsFailoverTestSuiteIT] initializing REST clients against [http://[::1]:33032]
16:49:02   1> [2018-07-05T16:49:01,671][WARN ][o.a.h.u.NativeCodeLoader ] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16:49:02   1> [2018-07-05T16:49:02,027][INFO ][o.e.r.h.HaHdfsFailoverTestSuiteIT] [testHAFailoverWithRepository]: after test
16:49:02 ERROR   0.77s | HaHdfsFailoverTestSuiteIT.testHAFailoverWithRepository <<< FAILURES!
16:49:02    > Throwable #1: java.security.PrivilegedActionException: java.io.IOException: failure to login: javax.security.auth.login.LoginException: Security Exception
16:49:02    > 	at __randomizedtesting.SeedInfo.seed([80D8D1131CF26803:75A7640DBD350E87]:0)
16:49:02    > 	at java.base/java.security.AccessController.doPrivileged(Native Method)
16:49:02    > 	at org.elasticsearch.repositories.hdfs.HaHdfsFailoverTestSuiteIT.testHAFailoverWithRepository(HaHdfsFailoverTestSuiteIT.java:79)
16:49:02    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16:49:02    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
16:49:02    > 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16:49:02    > 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
16:49:02    > 	at java.base/java.lang.Thread.run(Thread.java:832)
16:49:02    > Caused by: java.io.IOException: failure to login: javax.security.auth.login.LoginException: Security Exception
16:49:02    > 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:877)
16:49:02    > 	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:802)
16:49:02    > 	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:675)
16:49:02    > 	at org.elasticsearch.repositories.hdfs.HaHdfsFailoverTestSuiteIT.lambda$testHAFailoverWithRepository$0(HaHdfsFailoverTestSuiteIT.java:102)
16:49:02    > 	... 38 more
16:49:02    > Caused by: javax.security.auth.login.LoginException: Security Exception
16:49:02    > 	at java.base/javax.security.auth.login.LoginContext.invoke(LoginContext.java:805)
16:49:02    > 	at java.base/javax.security.auth.login.LoginContext.access$000(LoginContext.java:194)
16:49:02    > 	at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:665)
16:49:02    > 	at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:663)
16:49:02    > 	at java.base/java.security.AccessController.doPrivileged(Native Method)
16:49:02    > 	at java.base/javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:663)
16:49:02    > 	at java.base/javax.security.auth.login.LoginContext.login(LoginContext.java:574)
16:49:02    > 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:840)
16:49:02    > 	... 41 more
16:49:02    > Caused by: java.lang.SecurityException
16:49:02    > 	at java.base/javax.security.auth.login.LoginContext.invoke(LoginContext.java:806)
16:49:02    > 	... 48 more
16:49:02 Completed [1/1] in 1.87s, 1 test, 1 error <<< FAILURES!

@alpar-t
Contributor

alpar-t commented Jul 6, 2018

"Fixed" only in the sense of getting the build going: I really just muted the test on JDK 11. #31498 is there to get it looked at and fixed.

@alpar-t
Contributor

alpar-t commented Jul 6, 2018

@polyfractal when you tried to reproduce it, did you consider the Java versions involved?
I'm only asking because I think it's easy to miss, and we could include it in the reproduction message since we already require env vars for the different Java homes anyhow. The reproduction line would start with something like JAVA_HOME=$JAVA10_HOME RUNTIME_JAVA_HOME=$JAVA11_HOME ./gradlew ... Do you think that would be helpful?
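The suggested reproduction-line format can be sketched in Java as follows. This is a minimal illustration only: `ReproLine` and `withJavaHomes` are hypothetical names, not part of the Elasticsearch build. The variable names (`JAVA_HOME`, `RUNTIME_JAVA_HOME`, `$JAVA10_HOME`, `$JAVA11_HOME`) are taken from the comment above. Treating `JAVA_HOME` as the build JDK and `RUNTIME_JAVA_HOME` as the runtime JDK is an assumption that has not been checked against the build scripts.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of the Elasticsearch build) that prefixes a
 * Gradle reproduction command with the Java homes it should be run with.
 */
final class ReproLine {
    private ReproLine() {}

    static String withJavaHomes(String buildJavaHome, String runtimeJavaHome, List<String> gradleArgs) {
        StringBuilder sb = new StringBuilder();
        // Assumed: JDK used by Gradle and the compiler
        sb.append("JAVA_HOME=").append(buildJavaHome).append(' ');
        // Assumed: JDK used to run the test cluster and the test JVMs
        sb.append("RUNTIME_JAVA_HOME=").append(runtimeJavaHome).append(' ');
        sb.append("./gradlew");
        for (String arg : gradleArgs) {
            sb.append(' ').append(arg);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withJavaHomes("$JAVA10_HOME", "$JAVA11_HOME", List.of(
            ":plugins:repository-hdfs:integTestHaRunner",
            "-Dtests.seed=6FA024F53EBF2435")));
    }
}
```

Putting the Java homes first means that copying the printed line into a shell sets both environment variables for that single Gradle invocation. A reader then cannot accidentally reproduce with whatever JDK happens to be the default.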

@alpar-t
Contributor

alpar-t commented Jul 6, 2018

I updated the check. It turns out that the test fails if JDK 11 is involved in any way (either at runtime or at compile time).
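The updated condition can be sketched as a simple predicate. `HdfsHaMuteCheck` and `shouldMute` are hypothetical names. The actual change (eaa247d) lives in the build and may be expressed differently. The sketch only encodes the finding above: mute the test when JDK 11 is either the build JDK or the runtime JDK.

```java
/**
 * Hypothetical sketch of the muting condition: skip the HA HDFS test
 * whenever JDK 11 is used for either the build or the runtime.
 */
final class HdfsHaMuteCheck {
    private HdfsHaMuteCheck() {}

    /** Major version of the JDK associated with the login failure in this issue. */
    static final int AFFECTED_MAJOR = 11;

    static boolean shouldMute(int buildJavaMajor, int runtimeJavaMajor) {
        // The original CI failure used build JDK 11 with runtime JDK 10,
        // so the runtime JDK alone does not decide whether to mute.
        return buildJavaMajor == AFFECTED_MAJOR || runtimeJavaMajor == AFFECTED_MAJOR;
    }

    public static void main(String[] args) {
        // Combination from the original report: ES_BUILD_JAVA=java11, ES_RUNTIME_JAVA=java10
        System.out.println(shouldMute(11, 10));
    }
}
```

For the combination in the original CI failure (build JDK 11, runtime JDK 10) the predicate returns `true`. A check that looked only at the runtime JDK would have let that run through.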

@alpar-t alpar-t closed this as completed in eaa247d Jul 6, 2018
@polyfractal
Contributor Author

@atorok I don't quite recall... but I think I missed the Java version at the time. Even if I remembered this time, there have definitely been times in the past when it took me a while to notice that the problem was with one version or another.

++ to making the version more noticeable in the reproduction line; it would help avoid forgetting to check the build.

dnhatn added a commit that referenced this issue Jul 7, 2018
* master:
  [ML] Fix master node deadlock during ML daily maintenance (#31836)
  Build: Switch integ-test-zip to OSS-only (#31866)
  SQL: Remove restriction for single column grouping (#31818)
  Build: Fix detection of Eclipse Compiler Server (#31838)
  Docs: Inconsistency between description and example (#31858)
  Re-enable bwc tests now that #29538 has been backported and 6.x intake build succeeded.
  QA: build improvements related to SQL projects (#31862)
  [Docs] Add clarification to analysis example (#31826)
  Check timeZone() argument in AbstractSqlQueryRequest (#31822)
  SQL: Fix incorrect HAVING equality (#31820)
  Smaller aesthetic fixes to InternalTestCluster (#31831)
  [Docs] Clarify accepted sort case (#31605)
  Temporarily disable bwc test in order to backport #29538
  Remove obsolete parameters from analyze rest spec (#31795)
  [Docs] Fix wrong link in Korean analyzer docs (#31815)
  Fix profiling of ordered terms aggs (#31814)
  Properly mute test involving JDK11 closes #31739
  Do not return all indices if a specific alias is requested via get aliases api. (#29538)
  Get snapshot rest client cleanups (#31740)
  Docs: Explain _bulk?refresh shard targeting
  Fix handling of points_only with term strategy in geo_shape (#31766)