
Workspaces are failing to start #10300

Closed
magbj opened this issue Jul 6, 2018 · 10 comments
Labels
kind/question Questions that haven't been identified as being feature requests or bugs.

Comments


magbj commented Jul 6, 2018

Description

I have been able to get Eclipse Che up and running, and everything seems to work up to the point of creating workspaces. I can go through the workspace-creation page and pull up the IDE, but the container never becomes available.

I am also noticing these types of messages in the browser console:

vendor-73784b1bca.js:108794 WebSocket connection to 'wss://che-cde-che.cde.nonprd.aws.example.com/api/websocket?token=eyJhbGciOiJSUz…mdycmheThiPnaVORPHmMuzIiokjHaCPqq0iuvjxNXeGSl9mRw_65NaY2icLwpeZHkC4YU64dzQ' failed: Error during WebSocket handshake: Unexpected response code: 404
vendor-73784b1bca.js:33381 WebSocket connection is closed.
vendor-73784b1bca.js:31626 GET https://che-cde-che.cde.nonprd.aws.example.com/api/permissions/system 404 ()
vendor-73784b1bca.js:33381 WebSocket is reconnecting, attempt #1 out of 100...
vendor-73784b1bca.js:31626 GET https://che-cde-che.cde.nonprd.aws.example.com/api/oauth/ 403 ()
vendor-73784b1bca.js:31626 GET https://che-cde-che.cde.nonprd.aws.example.com/api/organization/find?name=mbjorkman 404 ()
vendor-73784b1bca.js:108794 WebSocket connection to 'wss://che-cde-che.cde.nonprd.aws.example.com/api/websocket?token=eyJhbGciOiJSUz…mdycmheThiPnaVORPHmMuzIiokjHaCPqq0iuvjxNXeGSl9mRw_65NaY2icLwpeZHkC4YU64dzQ' failed: Error during WebSocket handshake: Unexpected response code: 404
vendor-73784b1bca.js:33381 WebSocket connection is closed.
vendor-73784b1bca.js:33381 WebSocket will be reconnected in 30000 ms...

I am not sure whether these messages are related, but I suspect some breakdown in communication, since I can see the PVC being created in the Kubernetes cluster and the container starting successfully. I am able to launch into the container (kubectl exec -it workspace4khyhc5nf1563y2a.dockerimage-76d7cbbb5b-bftll --namespace cde-che -- /bin/bash) and everything seems to be normal.
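The 403/404 responses on /api/websocket suggest the token the dashboard sends may be rejected or expired. A minimal sketch, assuming only POSIX shell tools, for inspecting a JWT's payload claims (such as exp, in epoch seconds); the token below is a shortened illustrative example, not the real one from the logs:

```shell
# Decode the payload (second dot-separated segment) of a JWT to check
# claims such as "exp" (expiry, in epoch seconds). The token here is a
# shortened illustrative example, not the one from the logs above.
token='eyJhbGciOiJSUzI1NiJ9.eyJleHAiOjE1MzA4MzM4MDZ9.signature'
payload=$(printf '%s' "$token" | cut -d. -f2)
# base64url -> base64: swap the URL-safe alphabet, then pad to a multiple of 4
payload=$(printf '%s' "$payload" | tr '_-' '/+')
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
printf '%s' "$payload" | base64 -d
# prints: {"exp":1530833806}
```

Comparing the decoded exp against the current epoch time shows whether the handshake is failing with an already-expired token or for some other reason.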

Thanks,
Magnus

Reproduction Steps

I have installed Eclipse Che according to: https://www.eclipse.org/che/docs/kubernetes-multi-user.html

Overall the setup looks like:

  • Deployed into AWS in a private subnet (1 AZ/subnet), running on top of AWS EKS/Kubernetes
  • ELB for incoming traffic with an SSL cert served from AWS ACM (Certificate Manager)
  • Using Nginx Ingress Controller (0.16.2). SSL terminates at the ELB, which forwards traffic over HTTP to the Nginx Ingress Controller.
  • Keycloak version: 3.4.3.Final
  • Eclipse Che version: 6.8.0-SNAPSHOT
  • Created a storage class for EBS/GP2
  • All ports/IPs are accessible within private subnet

catalina.log:

2018-07-05 23:30:04,748[ost-startStop-1]  [INFO ] [.e.c.c.d.JNDIDataSourceFactory 59]   - This=org.eclipse.che.core.db.postgresql.PostgreSQLJndiDataSourceFactory@52084554 obj=ResourceRef[className=javax.sql.DataSource,factoryClassLocation=null,factoryClassName=org.apache.naming.factory.ResourceFactory,{type=scope,content=Shareable},{type=auth,content=Container},{type=singleton,content=true},{type=factory,content=org.eclipse.che.api.CommonJndiDataSourceFactory}] name=che Context=org.apache.naming.NamingContext@2afc244e environment={}
2018-07-05 23:30:06,736[ost-startStop-1]  [INFO ] [o.e.c.m.k.s.KeycloakSettings 85]     - Retrieving OpenId configuration from endpoint: https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/.well-known/openid-configuration
2018-07-05 23:30:07,145[ost-startStop-1]  [INFO ] [o.e.c.m.k.s.KeycloakSettings 102]    - openid configuration = {issuer=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che, authorization_endpoint=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/protocol/openid-connect/auth, token_endpoint=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/protocol/openid-connect/token, token_introspection_endpoint=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/protocol/openid-connect/token/introspect, userinfo_endpoint=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/protocol/openid-connect/userinfo, end_session_endpoint=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/protocol/openid-connect/logout, jwks_uri=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/protocol/openid-connect/certs, check_session_iframe=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/protocol/openid-connect/login-status-iframe.html, grant_types_supported=[authorization_code, implicit, refresh_token, password, client_credentials], response_types_supported=[code, none, id_token, token, id_token token, code id_token, code token, code id_token token], subject_types_supported=[public, pairwise], id_token_signing_alg_values_supported=[RS256], userinfo_signing_alg_values_supported=[RS256], request_object_signing_alg_values_supported=[none, RS256], response_modes_supported=[query, fragment, form_post], registration_endpoint=https://che-cde-che.cde.nonprd.aws.example.com/auth/realms/che/clients-registrations/openid-connect, token_endpoint_auth_methods_supported=[private_key_jwt, client_secret_basic, client_secret_post], token_endpoint_auth_signing_alg_values_supported=[RS256], claims_supported=[sub, iss, auth_time, name, given_name, family_name, preferred_username, email], claim_types_supported=[normal], claims_parameter_supported=false, scopes_supported=[openid, offline_access], 
request_parameter_supported=true, request_uri_parameter_supported=true}
2018-07-05 23:30:07,613[ost-startStop-1]  [INFO ] [o.f.c.i.d.DbSupportFactory 44]       - Database: jdbc:postgresql://postgres:5432/dbche (PostgreSQL 9.6)
2018-07-05 23:30:07,642[ost-startStop-1]  [INFO ] [o.f.c.i.util.VersionPrinter 44]      - Flyway 4.2.0 by Boxfuse
2018-07-05 23:30:07,648[ost-startStop-1]  [INFO ] [o.f.c.i.d.DbSupportFactory 44]       - Database: jdbc:postgresql://postgres:5432/dbche (PostgreSQL 9.6)
2018-07-05 23:30:07,683[ost-startStop-1]  [INFO ] [i.f.CustomSqlMigrationResolver 157]  - Searching for sql scripts in locations [classpath:che-schema]
2018-07-05 23:30:07,720[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbValidate 44]       - Successfully validated 32 migrations (execution time 00:00.039s)
2018-07-05 23:30:07,732[ost-startStop-1]  [INFO ] [o.f.c.i.m.MetaDataTableImpl 44]      - Creating Metadata table: "public"."schema_version"
2018-07-05 23:30:07,758[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Current version of schema "public": << Empty Schema >>
2018-07-05 23:30:07,800[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.0.0.8.1 - 1__init.sql
2018-07-05 23:30:07,997[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.0.0.9.1 - 1__add_index_on_workspace_temporary.sql
2018-07-05 23:30:08,016[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.0.0.9.2 - 2__update_local_links_in_environments.sql
2018-07-05 23:30:08,028[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.2.0.1 - 1__increase_project_attributes_values_length.sql
2018-07-05 23:30:08,043[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.4.0.1 - 1__drop_user_to_account_relation.sql
2018-07-05 23:30:08,060[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.4.0.2 - 2__create_missed_account_indexes.sql
2018-07-05 23:30:08,076[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.6.0.1 - 1__add_exec_agent_where_terminal_agent_is_present.sql
2018-07-05 23:30:08,103[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.7.0.1 - 1__add_factory.sql
2018-07-05 23:30:08,175[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.7.0.2 - 2__remove_match_policy.sql
2018-07-05 23:30:08,187[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.8.0.1 - 1__add_foreigh_key_indexes.sql
2018-07-05 23:30:08,286[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.11.0.1 - 1__optimize_user_search.sql
2018-07-05 23:30:08,312[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.19.0.0.1 - 0.1__add_permissions.sql
2018-07-05 23:30:08,395[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.19.0.0.2 - 0.2__add_resources.sql
2018-07-05 23:30:08,422[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 5.19.0.0.3 - 0.3__add_organization.sql
2018-07-05 23:30:08,470[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.1 - 1__add_path_to_serverconf.sql
2018-07-05 23:30:08,482[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.2 - 2__rename_agents_to_installers.sql
2018-07-05 23:30:08,503[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.3 - 3__add_installer.sql
2018-07-05 23:30:08,542[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.3.1 - 3.1__remove_old_recipe_permissions.sql
2018-07-05 23:30:08,554[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.4 - 4__remove_old_recipe.sql
2018-07-05 23:30:08,567[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.5 - 5__add_machine_env.sql
2018-07-05 23:30:08,584[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.6 - 6__remove_snapshots.sql
2018-07-05 23:30:08,596[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.7 - 7__add_machine_volumes.sql
2018-07-05 23:30:08,616[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.8 - 8__add_serverconf_attributes.sql
2018-07-05 23:30:08,643[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.9 - 9__increase_externalmachine_env_value_length.sql
2018-07-05 23:30:08,654[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.10 - 10__move_dockerimage_recipe_location_to_content.sql
2018-07-05 23:30:08,665[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.11 - 11__increase_workspace_attributes_values_length.sql
2018-07-05 23:30:08,675[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.0.0.12 - 12__remove_stack_sources.sql
2018-07-05 23:30:08,686[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.3.0.1 - 1__add_fk_indexes.sql
2018-07-05 23:30:08,702[ost-startStop-1]  [WARN ] [o.f.c.i.dbsupport.JdbcTemplate 48]   - DB: identifier "che_index_factory_on_projects_loaded_action_value_action_entity_id" will be truncated to "che_index_factory_on_projects_loaded_action_value_action_entity" (SQL State: 42622 - Error Code: 0)
2018-07-05 23:30:08,711[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.3.0.1.1 - 1.1__add_fk_indexes.sql
2018-07-05 23:30:08,739[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.4.0.1 - 1__add_workspace_expirations.sql
2018-07-05 23:30:08,760[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.4.0.2 - 2__add_signature_key.sql
2018-07-05 23:30:08,789[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Migrating schema "public" to version 6.4.0.3 - 3__add_k8s_runtimes.sql
2018-07-05 23:30:08,836[ost-startStop-1]  [INFO ] [o.f.c.i.command.DbMigrate 44]        - Successfully applied 32 migrations to schema "public" (execution time 00:01.105s).
2018-07-05 23:30:10,698[ost-startStop-1]  [INFO ] [org.xnio 93]                         - XNIO version 3.3.4.Final
2018-07-05 23:30:10,722[ost-startStop-1]  [INFO ] [org.xnio.nio 55]                     - XNIO NIO Implementation Version 3.3.4.Final
2018-07-05 23:30:15,072[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.terminal:1.0.0' added to the registry.
2018-07-05 23:30:15,083[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ws-agent:1.0.1' added to the registry.
2018-07-05 23:30:15,089[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.terminal:1.0.1' added to the registry.
2018-07-05 23:30:15,098[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ws-agent:1.0.0' added to the registry.
2018-07-05 23:30:15,103[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.php:5.4.0' added to the registry.
2018-07-05 23:30:15,107[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.php:5.3.7' added to the registry.
2018-07-05 23:30:15,112[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.js-ts:1.0.1' added to the registry.
2018-07-05 23:30:15,116[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.php:2.0.1' added to the registry.
2018-07-05 23:30:15,122[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ssh:1.0.0' added to the registry.
2018-07-05 23:30:15,129[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.exec:1.0.0' added to the registry.
2018-07-05 23:30:15,134[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.csharp:1.0.1' added to the registry.
2018-07-05 23:30:15,141[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.exec:1.0.1' added to the registry.
2018-07-05 23:30:15,145[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.camel:1.0.0' added to the registry.
2018-07-05 23:30:15,149[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.git-credentials:1.0.0' added to the registry.
2018-07-05 23:30:15,154[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.clangd:1.0.0' added to the registry.
2018-07-05 23:30:15,159[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.python:1.0.3' added to the registry.
2018-07-05 23:30:15,163[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.golang:0.1.7' added to the registry.
2018-07-05 23:30:15,168[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.python:1.0.4' added to the registry.
2018-07-05 23:30:15,176[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ws-agent:1.0.3' added to the registry.
2018-07-05 23:30:15,184[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ws-agent:1.0.2' added to the registry.
2018-07-05 23:30:15,189[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.unison:1.0.0' added to the registry.
2018-07-05 23:30:15,193[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.json:1.0.1' added to the registry.
2018-07-05 23:30:15,197[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.ls.yaml:1.0.0' added to the registry.
2018-07-05 23:30:15,201[ost-startStop-1]  [INFO ] [a.i.s.i.LocalInstallerRegistry 78]   - Installer 'org.eclipse.che.test.ls:1.0.0' added to the registry.
2018-07-05 23:30:15,833[ost-startStop-1]  [INFO ] [.m.m.a.s.s.SignatureKeyManager 119]  - Generated signature key pair with id signatureKeyiq5t7lr257rf0q44 and algorithm RSA.
2018-07-05 23:30:23,979[ost-startStop-1]  [INFO ] [.WorkspaceNextObjectsRetriever 79]   - Workspace.Next is disabled - Feature API endpoint property 'che.workspace.feature.api' is not configured
2018-07-05 23:30:23,979[ost-startStop-1]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 164]   - Configured factories for environments: '[kubernetes, dockerimage]'
2018-07-05 23:30:23,980[ost-startStop-1]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 165]   - Registered infrastructure 'kubernetes'
2018-07-05 23:30:28,564[ost-startStop-1]  [WARN ] [o.e.c.a.w.s.stack.StackLoader 133]   - No configured image found for stack che-in-che
2018-07-05 23:30:28,595[ost-startStop-1]  [INFO ] [o.e.c.a.w.s.stack.StackLoader 103]   - Stacks initialization finished
2018-07-05 23:30:28,802[ost-startStop-1]  [WARN ] [p.s.AdminPermissionInitializer 68]   - Admin admin not found yet.
2018-07-05 23:32:26,450[nio-8080-exec-8]  [INFO ] [o.e.c.a.w.s.WorkspaceManager 459]    - Workspace 'mbjorkman/testjava' with id 'workspace7ig8toll3blaybxz' created by user 'mbjorkman'
2018-07-05 23:32:33,357[nio-8080-exec-7]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 329]   - Starting workspace 'mbjorkman/testjava' with id 'workspace7ig8toll3blaybxz' by user 'mbjorkman'
2018-07-05 23:32:33,504[aceSharedPool-0]  [WARN ] [i.f.k.c.i.VersionUsageUtils 55]      - The client is using resource type 'ingresses' with unstable version 'v1beta1'
2018-07-05 23:33:19,467[aceSharedPool-0]  [WARN ] [i.f.k.c.i.VersionUsageUtils 55]      - The client is using resource type 'deployments' with unstable version 'v1beta1'
2018-07-05 23:41:26,689[aceSharedPool-0]  [WARN ] [.i.k.KubernetesInternalRuntime 191]  - Failed to start Kubernetes runtime of workspace workspace7ig8toll3blaybxz. Cause: null
2018-07-05 23:41:26,969[aceSharedPool-0]  [WARN ] [i.f.k.c.i.VersionUsageUtils 55]      - The client is using resource type 'replicasets' with unstable version 'v1beta1'
2018-07-05 23:41:27,217[aceSharedPool-0]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 390]   - Workspace 'mbjorkman:testjava' with id 'workspace7ig8toll3blaybxz' start failed
2018-07-05 23:41:27,219[aceSharedPool-0]  [ERROR] [o.e.c.a.w.s.WorkspaceRuntimes 400]   - null
org.eclipse.che.api.workspace.server.spi.InternalInfrastructureException: null
        at org.eclipse.che.workspace.infrastructure.kubernetes.StartSynchronizer.getStartFailureNow(StartSynchronizer.java:274)
        at org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime.internalStart(KubernetesInternalRuntime.java:186)
        at org.eclipse.che.api.workspace.server.spi.InternalRuntime.start(InternalRuntime.java:145)
        at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartRuntimeTask.run(WorkspaceRuntimes.java:366)
        at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:37)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: null
        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime.waitMachines(KubernetesInternalRuntime.java:251)
        at org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime.internalStart(KubernetesInternalRuntime.java:183)
        ... 7 common frames omitted
2018-07-05 23:41:27,257[aceSharedPool-0]  [WARN ] [o.e.c.a.w.s.WorkspaceManager 423]    - Cannot set error status of the workspace workspace7ig8toll3blaybxz. Error is: null

access.log (snippet):

10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /api/workspace HTTP/1.1" 304 -
10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /workspace-loader/mbjorkman/testjava?uid=294921 HTTP/1.1" 200 1091
10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /api/keycloak/settings HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /api/keycloak/settings HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /api/workspace/mbjorkman/testjava HTTP/1.1" 200 1159
10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /mbjorkman/testjava?uid=294921 HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /api/keycloak/settings HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /api/keycloak/settings HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:48 +0000] "GET /_app/compilation-mappings.txt?1530835848987 HTTP/1.1" 200 142
10.97.80.13 - - [06/Jul/2018:00:10:49 +0000] "GET /_app/compilation-mappings.properties?1530835849016 HTTP/1.1" 200 103
10.97.80.13 - - [06/Jul/2018:00:10:49 +0000] "GET /_app/59CB0CD942F9A2D15831446792F4A5B9.cache.js HTTP/1.1" 304 -
10.97.80.13 - - [06/Jul/2018:00:10:49 +0000] "GET /_app/_app.nocache.js HTTP/1.1" 200 7414
10.97.80.13 - - [06/Jul/2018:00:10:49 +0000] "GET /_app/59CB0CD942F9A2D15831446792F4A5B9.cache.js HTTP/1.1" 304 -
10.97.80.13 - - [06/Jul/2018:00:10:50 +0000] "GET /_app/font-awesome-4.5.0/css/font-awesome.min.css HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:50 +0000] "GET /api/project-template/all HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:50 +0000] "GET /api/profile/ HTTP/1.1" 200 1223
10.97.80.13 - - [06/Jul/2018:00:10:50 +0000] "GET /api/preferences HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:50 +0000] "GET /api/websocket?token=eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldjU1OC01ZjRlLTQ1NmQtOWNhZC1lYmZmMzExZWI0ZWQiLCJleHAiOjE1MzA4MzM4MDYsMtNDFhMS1hODdlLWI3NTM4NTZiMjllMyIsInR5cCI6IkJlYXJlciIsImF6cCI6ImNoZS1wdWJsaWMiLCJub25jZSI6IjdhM2YxYjc3LWNmYjEtNGNhNS05ZDAwLTI5MGIwNTkxYTdlZSIsImF1dGhfdGltZSI6MTUzMDgzMzUwNSwic2Vzc2lvbl9zdGF0ZSI6IjY1YWQ2ZTg1LTY3MTMtNDkxZi04OWYyLTk2NjEwODU5MmFhOSIsImFjciI6IjEiLCJhbGxvd2VkLW9yaWdpbnMiOlsiaHR0cHM6Ly9jaGUtY2RlLWNoZS5jZGUubm9ucHJkLmF3cy5tbHJqb3JrbWFuIiwiZ2l2ZW5fbmFtZSI6Ik1hZ251cyIsImZhbWlseV9uYW1lIjoiQmpvcmttYW4iLCJlbWFpbCI6Im1hZ251cy5iam9ya21hbkBtbHAuY29tIn0.Qvmzk3Qh2VQmud-hERe_g_IhbObzi7-WHF6DDiCZGxsoEkPHW_7J0onRswAnKTaAmOzdBMTHspnZCyr386Srflhqcp8KFAbO_RxCh1pVsGn8PLqis63UyanxYRsNi2sYZKbYdKBYX8U7bZ-vrgDwqYOKIn6Kc-1w7Qr-9dG-_CAmdycmheThiPnaVORPHmMuzIiokjHaCPqq0iuvjxNXeGSl9mRw_65NaY2icLwpeZHkC4YU64dzQ HTTP/1.1" 403 -
10.97.80.13 - - [06/Jul/2018:00:10:50 +0000] "GET /api/workspace/mbjorkman/testjava HTTP/1.1" 200 1159
10.97.80.13 - - [06/Jul/2018:00:10:51 +0000] "GET /_app/require.js HTTP/1.1" 200 83083
10.97.80.13 - - [06/Jul/2018:00:10:51 +0000] "GET /_app/font-awesome-4.5.0/fonts/fontawesome-webfont.woff2?v=4.5.0 HTTP/1.1" 200 66624
10.97.80.13 - - [06/Jul/2018:00:10:51 +0000] "GET /_app/activity.js HTTP/1.1" 200 2018
10.97.80.13 - - [06/Jul/2018:00:10:51 +0000] "GET /api/workspace/settings HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:51 +0000] "GET /api/installer HTTP/1.1" 200 -
10.97.80.13 - - [06/Jul/2018:00:10:51 +0000] "GET /_app/term/xterm.js HTTP/1.1" 200 74341
10.97.80.13 - - [06/Jul/2018:00:10:51 +0000] "POST /api/workspace/workspace7ig8toll3blaybxz/runtime?environment=default HTTP/1.1" 200 2011
10.97.80.13 - - [06/Jul/2018:00:10:51 +0000] "GET /api/websocket?token=eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJlMjNGc3kzRlI5dnRUZms3TGlkX1lQOGU0cDNoY0psM20wQTRnckIzNnJJIn0.eyJqdGkiOiIzNDg5Mjk0My05MzNhLTRhMWYtODA3MC04Y2RlZmI4ZWFmMzEiLCJleHAiOjE1MzA4MzYxNDgsIm5iZiI6MCwiaWF0IjoxNTMwODM1ODQ4LCJpc3MiOiJodHRwczovL2NoZS1jZGUtY2hlLmNkZS5ub25wcmQuYXdzLm1scC5jb20vYXV0aC9yZWFsbXMvY2hlIiwiYXVkIjoiY2hlLXB1YmxpYyIsInN1YiI6ImJhYmY1NjZkL2VzcyI6eyJhY2NvdW50Ijp7InJvbGVzIjpbIm1hbmFnZS1hY2NvdW50IiwibWFuYWdlLWFjY291bnQtbGlua3MiLCJ2aWV3LXByb2ZpbGUiXX19LCJuYW1lIjoiTWFnbnVzIEJqb3JrbWFuIiwicHJlZmVycmVkX3VzZXJuYW1lIjoibWJqb3JrbWFuIiwiZ2l2ZW5fbmFtZSI6Ik1hZ251cyIsImZhbWlseV9uYW1lIjoiQmpvcmttYW4iLCJlbWFpbCI6Im1hZ251cy5iam9ya21hbkBtbHAuY29tIn0.IrBnBvqbLAVCdnLNZdpOCo2azUvDi9myrvHQC134ODI12jXi4liEKTIJejXei3VePo0q3vZLhBgOi3Uj_el6mJGt8FPIFW181U6HRlIr4jZVezW48D9-ZTnVsFVbThzQK3_tw_9x82MQRylQQxPLnAiW9Gur2nRjZHZNshJzbmQJO1M31rvMhfyvrWD_2N8QlS1YJ6lU9bnRC-MookupI-kqCw8JKsPjTv4EruhkfMkEqVEf0q1NFtsutetrLWM-Q60mT8LDYV9eqmTOuv9u-oAXtFeR_k9eIwc4336rS6TmRglGxln9VPU6ZB_5-P6_ybvqUnjnDsjuI16ZZL-mww HTTP/1.1" 404 1083

Environment Variables from Che container:

BASH=/bin/bash
BASHOPTS=cmdhist:complete_fullquote:expand_aliases:extquote:force_fignore:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=()
BASH_ARGV=()
BASH_CMDS=()
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="4" [1]="3" [2]="48" [3]="1" [4]="release" [5]="x86_64-alpine-linux-musl")
BASH_VERSION='4.3.48(1)-release'
CHE_API=https://che-cde-che.cde.nonprd.aws.example.com/api
CHE_DEBUG_SERVER=true
CHE_HOST=che-cde-che.cde.nonprd.aws.example.com
CHE_HOST_PORT=tcp://172.20.237.182:8080
CHE_HOST_PORT_8080_TCP=tcp://172.20.237.182:8080
CHE_HOST_PORT_8080_TCP_ADDR=172.20.237.182
CHE_HOST_PORT_8080_TCP_PORT=8080
CHE_HOST_PORT_8080_TCP_PROTO=tcp
CHE_HOST_SERVICE_HOST=172.20.237.182
CHE_HOST_SERVICE_PORT=8080
CHE_HOST_SERVICE_PORT_HTTP=8080
CHE_INFRASTRUCTURE_ACTIVE=kubernetes
CHE_INFRA_KUBERNETES_BOOTSTRAPPER_BINARY__URL=https://che-cde-che.cde.nonprd.aws.example.com/agent-binaries/linux_amd64/bootstrapper/bootstrapper
CHE_INFRA_KUBERNETES_INGRESS_ANNOTATIONS__JSON='{"kubernetes.io/ingress.class": "nginx", "kubernetes.io/tls-acme": "true", "nginx.ingress.kubernetes.io/rewrite-target": "/","nginx.ingress.kubernetes.io/ssl-redirect": "true","nginx.ingress.kubernetes.io/proxy-connect-timeout": "3600","nginx.ingress.kubernetes.io/proxy-read-timeout": "3600"}'
CHE_INFRA_KUBERNETES_INGRESS_DOMAIN=cde.nonprd.aws.example.com
CHE_INFRA_KUBERNETES_MACHINE__START__TIMEOUT__MIN=5
CHE_INFRA_KUBERNETES_MASTER__URL=
CHE_INFRA_KUBERNETES_NAMESPACE=cde-che
CHE_INFRA_KUBERNETES_OAUTH__TOKEN=
CHE_INFRA_KUBERNETES_PASSWORD=
CHE_INFRA_KUBERNETES_POD_SECURITY__CONTEXT_FS__GROUP=0
CHE_INFRA_KUBERNETES_POD_SECURITY__CONTEXT_RUN__AS__USER=0
CHE_INFRA_KUBERNETES_PVC_PRECREATE__SUBPATHS=false
CHE_INFRA_KUBERNETES_PVC_STRATEGY=unique
CHE_INFRA_KUBERNETES_SERVER__STRATEGY=single-host
CHE_INFRA_KUBERNETES_TLS__ENABLED=true
CHE_INFRA_KUBERNETES_TLS__SECRET=che-tls
CHE_INFRA_KUBERNETES_TRUST__CERTS=false
CHE_INFRA_KUBERNETES_USERNAME=
CHE_IN_CONTAINER=true
CHE_KEYCLOAK_AUTH__SERVER__URL=https://che-cde-che.cde.nonprd.aws.example.com/auth
CHE_KEYCLOAK_CLIENT__ID=che-public
CHE_KEYCLOAK_REALM=che
CHE_LOCAL_CONF_DIR=/etc/conf
CHE_LOGS_APPENDERS_IMPL=plaintext
CHE_LOGS_DIR=/data/logs
CHE_LOG_LEVEL=INFO
CHE_MULTIUSER=true
CHE_OAUTH_GITHUB_CLIENTID=
CHE_OAUTH_GITHUB_CLIENTSECRET=
CHE_PORT=8080
CHE_PREDEFINED_STACKS_RELOAD__ON__START=false
CHE_WEBSOCKET_ENDPOINT=wss://che-cde-che.cde.nonprd.aws.example.com/api/websocket
CHE_WORKSPACE_AUTO_START=false
COLUMNS=192
DEFAULT_HTTP_BACKEND_PORT=tcp://172.20.38.18:80
DEFAULT_HTTP_BACKEND_PORT_80_TCP=tcp://172.20.38.18:80
DEFAULT_HTTP_BACKEND_PORT_80_TCP_ADDR=172.20.38.18
DEFAULT_HTTP_BACKEND_PORT_80_TCP_PORT=80
DEFAULT_HTTP_BACKEND_PORT_80_TCP_PROTO=tcp
DEFAULT_HTTP_BACKEND_SERVICE_HOST=172.20.38.18
DEFAULT_HTTP_BACKEND_SERVICE_PORT=80
DIRSTACK=()
DOCKER_BUCKET=get.docker.com
DOCKER_VERSION=1.6.0
EUID=0
GROUPS=()
HISTFILE=/root/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/root
HOSTNAME=che-6768979557-8fkbd
HOSTTYPE=x86_64
IFS=$' \t\n'
INGRESS_NGINX_PORT=tcp://172.20.1.136:80
INGRESS_NGINX_PORT_443_TCP=tcp://172.20.1.136:443
INGRESS_NGINX_PORT_443_TCP_ADDR=172.20.1.136
INGRESS_NGINX_PORT_443_TCP_PORT=443
INGRESS_NGINX_PORT_443_TCP_PROTO=tcp
INGRESS_NGINX_PORT_80_TCP=tcp://172.20.1.136:80
INGRESS_NGINX_PORT_80_TCP_ADDR=172.20.1.136
INGRESS_NGINX_PORT_80_TCP_PORT=80
INGRESS_NGINX_PORT_80_TCP_PROTO=tcp
INGRESS_NGINX_SERVICE_HOST=172.20.1.136
INGRESS_NGINX_SERVICE_PORT=80
INGRESS_NGINX_SERVICE_PORT_HTTP=80
INGRESS_NGINX_SERVICE_PORT_HTTPS=443
JAVA_ALPINE_VERSION=8.131.11-r2
JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk/jre
JAVA_OPTS='-XX:MaxRAMFraction=2 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dsun.zip.disableMemoryMapping=true -Xms20m '
JAVA_VERSION=8u131
KEYCLOAK_PORT=tcp://172.20.206.113:5050
KEYCLOAK_PORT_5050_TCP=tcp://172.20.206.113:5050
KEYCLOAK_PORT_5050_TCP_ADDR=172.20.206.113
KEYCLOAK_PORT_5050_TCP_PORT=5050
KEYCLOAK_PORT_5050_TCP_PROTO=tcp
KEYCLOAK_SERVICE_HOST=172.20.206.113
KEYCLOAK_SERVICE_PORT=5050
KEYCLOAK_SERVICE_PORT_5050=5050
KUBERNETES_PORT=tcp://172.20.0.1:443
KUBERNETES_PORT_443_TCP=tcp://172.20.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=172.20.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_HOST=172.20.0.1
KUBERNETES_SERVICE_PORT=443
KUBERNETES_SERVICE_PORT_HTTPS=443
LANG=C.UTF-8
LINES=54
MACHTYPE=x86_64-alpine-linux-musl
MAILCHECK=60
OLDPWD=/data/logs
OPENSHIFT_KUBE_PING_NAMESPACE=cde-che
OPTERR=1
OPTIND=1
OSTYPE=linux-musl
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/java-1.8-openjdk/jre/bin:/usr/lib/jvm/java-1.8-openjdk/bin
PIPESTATUS=([0]="0")
POSTGRES_PORT=tcp://172.20.17.148:5432
POSTGRES_PORT_5432_TCP=tcp://172.20.17.148:5432
POSTGRES_PORT_5432_TCP_ADDR=172.20.17.148
POSTGRES_PORT_5432_TCP_PORT=5432
POSTGRES_PORT_5432_TCP_PROTO=tcp
POSTGRES_SERVICE_HOST=172.20.17.148
POSTGRES_SERVICE_PORT=5432
POSTGRES_SERVICE_PORT_5432=5432
PPID=0
PS1='\s-\v\$ '
PS2='> '
PS4='+ '
PWD=/data/logs/logs
SHELL=/bin/ash
SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor
SHLVL=1
TERM=xterm
UID=0

Screenshot:
eclipseche_not_starting

@ghost ghost added the kind/question Questions that haven't been identified as being feature requests or bugs. label Jul 6, 2018

ghost commented Jul 6, 2018

@magbj have you attempted to deploy in http mode?

Anything suspicious in the k8s events? Probably not, but worth checking just in case. If the pod has started, the problem is with bootstrapping the workspace.

After a container reaches the Running state, a k8s exec is performed to curl the bootstrapper binary.

When in the workspace container, do you see anything in /tmp/bootstrapper? Running ps ax may also help pin down where exactly the error occurs.
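The checks above can be sketched as a couple of kubectl commands. The pod and namespace names are taken from the original report and will differ for each workspace; the kubectl stub is only a dry-run fallback for machines without cluster access:

```shell
# Check the bootstrapper inside the workspace container, per the advice
# above. POD/NS come from the original report; adjust per workspace.
# When kubectl is not on PATH, fall back to a stub that echoes the
# command instead of running it (dry run).
command -v kubectl >/dev/null 2>&1 || kubectl() { echo "kubectl $*"; }

NS=cde-che
POD=workspace4khyhc5nf1563y2a.dockerimage-76d7cbbb5b-bftll
ws_exec() { kubectl exec "$POD" --namespace "$NS" -- "$@"; }

ws_exec ls -la /tmp/bootstrapper   # was the bootstrapper binary delivered?
ws_exec ps ax                      # is the bootstrapper process running?
```

An empty or missing /tmp/bootstrapper directory would point at the exec/curl step failing, while a present binary with no matching process would point at the bootstrapper itself.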


magbj commented Jul 6, 2018

@eivantsov do you mean like http vs. https?

I did figure out the websocket problem, along with a related problem of Keycloak not recognizing https. It turns out that an ELB in Layer 7 (HTTP) mode cannot proxy WebSockets. If you switch the ELB to Layer 4 (TCP) mode instead, the Nginx Ingress Controller cannot send all the HTTP headers Keycloak needs, specifically x-forwarded-proto, which makes Keycloak mix up http and https. So either ELB mode breaks Eclipse Che/Keycloak. I therefore replaced the ELB with an ALB (https://github.com/kubernetes-sigs/aws-alb-ingress-controller), and it seems much better so far. The ALB routes directly to Keycloak and Che, but also routes the path (/server*) to the nginx-ingress-controller so Che can keep adding ingress rules.

WSS is still a little bit flaky, but I am able to connect now.

So Keycloak seems rock solid now, and the Eclipse Che UI seems great as well, but I am still not able to create containers. Actually, I do not even get to the stage where the command is issued to Kubernetes; it now seems to get stuck on creating ingress rules. From the Eclipse Che catalina.out:

2018-07-06 21:08:20,486[nio-8080-exec-3]  [INFO ] [o.e.c.a.w.s.WorkspaceManager 459]    - Workspace 'mbjorkman/wksp-9erm' with id 'workspace520fhejaqaqqk87g' created by user 'mbjorkman'
2018-07-06 21:08:30,596[nio-8080-exec-8]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 329]   - Starting workspace 'mbjorkman/wksp-9erm' with id 'workspace520fhejaqaqqk87g' by user 'mbjorkman'
2018-07-06 21:08:30,752[aceSharedPool-0]  [WARN ] [i.f.k.c.i.VersionUsageUtils 55]      - The client is using resource type 'ingresses' with unstable version 'v1beta1'
2018-07-06 21:13:30,910[aceSharedPool-0]  [WARN ] [.i.k.KubernetesInternalRuntime 192]  - Failed to start Kubernetes runtime of workspace workspace520fhejaqaqqk87g. Cause: Waiting for ingress 'ingressix99mpve' reached timeout
2018-07-06 21:13:31,247[aceSharedPool-0]  [INFO ] [o.e.c.a.w.s.WorkspaceRuntimes 390]   - Workspace 'mbjorkman:wksp-9erm' with id 'workspace520fhejaqaqqk87g' start failed
2018-07-06 21:24:02,505[nio-8080-exec-3]  [INFO ] [o.e.c.a.w.s.WorkspaceManager 270]    - Workspace 'workspace520fhejaqaqqk87g' removed by user 'mbjorkman'

Interestingly, the ingress rules seem to have been created right away:

kubectl get ingress --namespace cde-che
NAME              HOSTS                                ADDRESS            PORTS     AGE
che-ingress       che-cde-che.cde.nonprd.aws.mlp.com   internal-8585...   80        23m
ingress2dhslpez   che-cde-che.cde.nonprd.aws.mlp.com                      80        1m
ingress2fjh49op   che-cde-che.cde.nonprd.aws.mlp.com                      80        1m
ingress2hbskuug   che-cde-che.cde.nonprd.aws.mlp.com                      80        1m
ingressix99mpve   che-cde-che.cde.nonprd.aws.mlp.com                      80        1m
ingressq00a29kf   che-cde-che.cde.nonprd.aws.mlp.com                      80        1m
ingresstcj2lvqe   che-cde-che.cde.nonprd.aws.mlp.com                      80        1m
ingressucanqo95   che-cde-che.cde.nonprd.aws.mlp.com                      80        1m

The events don't look particularly odd:

28m         28m       1         ingresstcj2lvqe.153ee3b9dd7fc765                Ingress                           Normal    CREATE                  nginx-ingress-controller      Ingress cde-che/ingresstcj2lvqe
28m         28m       1         ingress2hbskuug.153ee3b9de032fd9                Ingress                           Normal    CREATE                  nginx-ingress-controller      Ingress cde-che/ingress2hbskuug
28m         28m       1         ingressq00a29kf.153ee3b9dfc20fcc                Ingress                           Normal    CREATE                  nginx-ingress-controller      Ingress cde-che/ingressq00a29kf
28m         28m       1         ingressucanqo95.153ee3b9df59982d                Ingress                           Normal    CREATE                  nginx-ingress-controller      Ingress cde-che/ingressucanqo95
28m         28m       1         ingressix99mpve.153ee3b9dc7aea75                Ingress                           Normal    CREATE                  nginx-ingress-controller      Ingress cde-che/ingressix99mpve
28m         28m       1         ingress2dhslpez.153ee3b9deefeace                Ingress                           Normal    CREATE                  nginx-ingress-controller      Ingress cde-che/ingress2dhslpez
28m         28m       1         ingress2fjh49op.153ee3b9de7d55a3                Ingress                           Normal    CREATE                  nginx-ingress-controller      Ingress cde-che/ingress2fjh49op
28m         28m       1         claim-che-workspace-3e0xx6a3.153ee3b9f9ab20f7   PersistentVolumeClaim             Normal    ProvisioningSucceeded   persistentvolume-controller   Successfully provisioned volume pvc-bc0ef3c6-8160-11e8-ba26-0a0748dd349e using kubernetes.io/aws-ebs
27m         27m       1         claim-che-workspace-6qo830x0.153ee3bcb66db4be   PersistentVolumeClaim             Normal    ProvisioningSucceeded   persistentvolume-controller   Successfully provisioned volume pvc-bc0c11f9-8160-11e8-ba26-0a0748dd349e using kubernetes.io/aws-ebs
23m         23m       1         ingress2dhslpez.153ee3ffc3b3a578                Ingress                           Normal    DELETE                  nginx-ingress-controller      Ingress cde-che/ingress2dhslpez
23m         23m       1         ingressix99mpve.153ee3ffc58ac3e1                Ingress                           Normal    DELETE                  nginx-ingress-controller      Ingress cde-che/ingressix99mpve
23m         23m       1         ingressq00a29kf.153ee3ffc631bf59                Ingress                           Normal    DELETE                  nginx-ingress-controller      Ingress cde-che/ingressq00a29kf
23m         23m       1         ingresstcj2lvqe.153ee3ffc6e13105                Ingress                           Normal    DELETE                  nginx-ingress-controller      Ingress cde-che/ingresstcj2lvqe
23m         23m       1         ingressucanqo95.153ee3ffc77516f8                Ingress                           Normal    DELETE                  nginx-ingress-controller      Ingress cde-che/ingressucanqo95
23m         23m       1         ingress2fjh49op.153ee3ffc4569c69                Ingress                           Normal    DELETE                  nginx-ingress-controller      Ingress cde-che/ingress2fjh49op
23m         23m       1         ingress2hbskuug.153ee3ffc4f3a5df                Ingress                           Normal    DELETE                  nginx-ingress-controller      Ingress cde-che/ingress2hbskuug

@magbj
Author

magbj commented Jul 6, 2018

After setting stickiness on the ALB Target Groups, all the WSS problems seem to have gone away. No more messages in the console about disconnects.

The main issue still remains, though. Seems somewhat similar to this one: #10231

@i300543
Contributor

i300543 commented Jul 8, 2018

I have the very same issue; the workspace is not starting:
Failed to start Kubernetes runtime of workspace workspace47k8r15v2vbyedvx. Cause: Waiting for ingress 'ingress3u4egcpp' reached timeout

In the k8s dashboard I can see that the namespace is created, as well as the ingress rules and the persistent volume claim, but there is no deployment and no pod in the relevant workspace namespace.

Could it be that the k8s adapter APIs have changed? (the API used for programmatic YAML deployment)
We are using Kubernetes server version "v1.10.5":
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5", GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean", BuildDate:"2018-06-21T11:34:22Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

@sleshchenko
Member

In the k8s dashboard I can see that the namespace is created, as well as the ingress rules and the persistent volume claim, but there is no deployment and no pod in the relevant workspace namespace.

The reason why deployments and pods are not created is that they are only created once the Ingresses are ready.
I think there is an issue with how ingress readiness is evaluated. The current implementation waits until an IP is available in the Ingress's load balancer status: https://github.com/eclipse/che/blob/59c1fa1626faba12bd075debe3a806437a668767/infrastructures/kubernetes/src/main/java/org/eclipse/che/workspace/infrastructure/kubernetes/KubernetesInternalRuntime.java#L587

I remember a similar registered issue (but I can't find it) reporting that the AWS Ingress backend doesn't set the Ingress IP, so Che fails because it keeps waiting for it. @eivantsov Maybe you know where this issue is.

Initially, it was implemented this way because the Ingress IP was required: it was used as the host in server URLs. Now there are three ingress URL strategies: default-host, multi-host and single-host. As far as I understand, the Ingress IP is required only for the default-host strategy. I think we can create an issue not to wait for the Ingress IP when multi-host or single-host is configured.
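A small sketch (in Python, not the actual Java implementation linked above) of the readiness check as described: treat an Ingress as ready only when its load-balancer status carries an `ip` entry. On AWS the status typically carries a `hostname` instead, so a check written this way never succeeds and the start times out.

```python
def ingress_ready(ingress_status: dict) -> bool:
    """Return True only if the Ingress load-balancer status has an IP assigned."""
    entries = ingress_status.get("loadBalancer", {}).get("ingress") or []
    return any("ip" in entry for entry in entries)

# GKE-style status: an IP is set, so the check passes
gke = {"loadBalancer": {"ingress": [{"ip": "35.1.2.3"}]}}
# AWS-style status: only a hostname is set, so the check never passes
aws = {"loadBalancer": {"ingress": [{"hostname": "internal-xyz.elb.amazonaws.com"}]}}

print(ingress_ready(gke))  # True
print(ingress_ready(aws))  # False
```

This is why accepting a hostname (or skipping the wait entirely for the multi-host and single-host strategies) would avoid the timeout.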

@i300543
Contributor

i300543 commented Jul 9, 2018

The issue was with the nginx-ingress-controller getting stuck for some reason.
Restarting it (in kube-system) resolved the problem.

@magbj
Author

magbj commented Jul 9, 2018

Restarting does not solve the issue for me. @i300543 Are you running this in AWS?

@sleshchenko I tried all the different URL strategies, and none of them create an address:

-------------------Single Host----------------------------------

NAME              HOSTS                                ADDRESS            PORTS     AGE
alb-che-ingress   che-cde-che.cde.nonprd.aws.example.com   internal-8585...   80        1h
che-ingress       che-cde-che.cde.nonprd.aws.example.com                      80        2d
ingress1ckfbbri   che-cde-che.cde.nonprd.aws.example.com                      80        21s
ingresse9r5vlex   che-cde-che.cde.nonprd.aws.example.com                      80        21s
ingressjj7mm7c7   che-cde-che.cde.nonprd.aws.example.com                      80        21s
ingressmikpbot9   che-cde-che.cde.nonprd.aws.example.com                      80        21s
ingressn7rk0qeo   che-cde-che.cde.nonprd.aws.example.com                      80        21s
ingressy19qox27   che-cde-che.cde.nonprd.aws.example.com                      80        21s
ingressy2kuufuq   che-cde-che.cde.nonprd.aws.example.com                      80        21s
-------------------Default Host---------------------------------

NAME              HOSTS                                ADDRESS            PORTS     AGE
alb-che-ingress   che-cde-che.cde.nonprd.aws.example.com   internal-8585...   80        2h
che-ingress       che-cde-che.cde.nonprd.aws.example.com                      80        2d
ingress585aj4ev   *                                                       80        4s
ingresselat51xt   *                                                       80        4s
ingressfbq9l9ir   *                                                       80        4s
ingressfk4evy61   *                                                       80        4s
ingressqqm0oygh   *                                                       80        4s
ingresssbxjh6yk   *                                                       80        4s
ingressuxv6pja1   *                                                       80        4s
----------------------Multi Host---------------------------------
NAME              HOSTS                                                                        ADDRESS            PORTS     AGE
alb-che-ingress   che-cde-che.cde.nonprd.aws.example.com,keycloak-cde-che.cde.nonprd.aws.example.com   internal-8585...   80        2h
che-ingress       che-cde-che.cde.nonprd.aws.example.com                                                              80        2d
ingress5otlpa8b   serverbkbxie5t-dev-machine-server-8080.cde.nonprd.aws.example.com                                   80        1m
ingresska186rrm   serverbkbxie5t-dev-machine-server-4412.cde.nonprd.aws.example.com                                   80        1m
ingressl85sxd6s   serverbkbxie5t-dev-machine-server-4401.cde.nonprd.aws.example.com                                   80        1m
ingressm9630flm   serverbkbxie5t-dev-machine-server-8000.cde.nonprd.aws.example.com                                   80        1m
ingressmd3n5w8v   serverbkbxie5t-dev-machine-server-4403.cde.nonprd.aws.example.com                                   80        1m
ingresssmdkub3h   serverbkbxie5t-dev-machine-server-9876.cde.nonprd.aws.example.com                                   80        1m
ingresswptlbcy6   serverbkbxie5t-dev-machine-server-4411.cde.nonprd.aws.example.com                                   80        1m

@magbj
Author

magbj commented Jul 9, 2018

I was able to figure it out. Some lessons learnt:

  • For ingress, the ADDRESS field will be populated based on this flag passed to the nginx-ingress-controller: --publish-service=$(POD_NAMESPACE)/expose-nginx. That service has to exist, and it has to expose an external IP or external hostname.
  • You cannot use ELB Layer 7, because it does not support WebSockets.
  • You cannot use ELB Layer 4 and terminate SSL at the ELB, because nginx is then unable to pass the correct headers to its upstreams (e.g. x-forwarded-proto, x-forwarded-port).
  • You cannot use ALB Layer 7: even though it supports WebSockets, nginx cannot handle them when the ALB is proxied in front of it.

So for AWS, the only way I could currently get this to work is to use ELB Layer 4 with SSL termination on the Nginx nodes.
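For reference, a sketch of the wiring described in the first bullet, under the assumptions in this thread (the expose-nginx name comes from the controller flag above; the namespace and selector labels are illustrative). The Service of type LoadBalancer provisions the ELB, and --publish-service tells the controller to copy that Service's external hostname/IP into the status of every Ingress it manages, which is the address Che waits for:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: expose-nginx
  namespace: ingress-nginx         # illustrative
  annotations:
    # Classic ELB in Layer 4 (TCP) mode; TLS is terminated on the
    # nginx nodes, not at the ELB
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
spec:
  type: LoadBalancer
  selector:
    app: nginx-ingress-controller  # illustrative
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
```

The controller is then started with --publish-service=$(POD_NAMESPACE)/expose-nginx, matching the flag quoted above.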

@magbj magbj closed this as completed Jul 9, 2018
@sleshchenko
Member

@sleshchenko I tried all the different URL strategies, and none of them create an address:

@magbj You're right; that's why I wrote that we can create an issue for that:

I think we can create an issue not to wait for Ingress IP when multi-host and single-host are configured.

So, I'll investigate this issue a bit more and then create an issue if needed.

@i300543
Contributor

i300543 commented Jul 10, 2018

@magbj
No we are running on GCP
