PoolOfflineException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN ... #166

Closed
diegopacheco opened this issue Feb 3, 2017 · 6 comments

@ipapapa

I'm running Dynomite locally as a single node with this config:

dyn_o_mite:
  datacenter: local-dc
  rack: rack1
  dyn_listen: 0.0.0.0:8101
  data_store: 0
  listen: 0.0.0.0:8102
  pem_key_file: conf/dynomite.pem  
  dyn_seed_provider: simple_provider
  servers:
  - 0.0.0.0:6379:1
  tokens: '100'

I'm running it with dyno 1.5.7 using this code:

package com.github.diegopacheco.dynomite.dyno.connection.test;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;

import org.junit.Test;

import com.github.diegopacheco.dynomite.cluster.checker.DynomiteConfig;
import com.github.diegopacheco.dynomite.cluster.checker.parser.DynomiteNodeInfo;
import com.netflix.dyno.connectionpool.Host;
import com.netflix.dyno.connectionpool.HostSupplier;
import com.netflix.dyno.connectionpool.TokenMapSupplier;
import com.netflix.dyno.connectionpool.Host.Status;
import com.netflix.dyno.connectionpool.impl.RetryNTimes;
import com.netflix.dyno.connectionpool.impl.lb.AbstractTokenMapSupplier;
import com.netflix.dyno.contrib.ArchaiusConnectionPoolConfiguration;
import com.netflix.dyno.jedis.DynoJedisClient;

public class SimpleConnectionTest {
	
	@Test
	public void testConnection(){
		
		String clusterName = "local-cluster";
		
		DynomiteNodeInfo node = new DynomiteNodeInfo("127.0.0.1","8102","rack1","local-dc","100");
		
		DynoJedisClient dynoClient = new DynoJedisClient.Builder()
				.withApplicationName(DynomiteConfig.CLIENT_NAME)
	            .withDynomiteClusterName(clusterName)
	            .withCPConfig( new ArchaiusConnectionPoolConfiguration(DynomiteConfig.CLIENT_NAME)
	            					.withTokenSupplier(toTokenMapSupplier(Arrays.asList(node)))
	            					.setMaxConnsPerHost(1)
                                    .setConnectTimeout(2000)
                                    .setPoolShutdownDelay(0)
                                    .setFailOnStartupIfNoHosts(true)
                                    .setFailOnStartupIfNoHostsSeconds(2)
                                    .setMaxTimeoutWhenExhausted(2000)
                                    .setSocketTimeout(2000)
                                    .setRetryPolicyFactory(new RetryNTimes.RetryFactory(1))
	            )
	            .withHostSupplier(toHostSupplier(Arrays.asList(node)))
	            .build();

		dynoClient.set("Z", "200");
		System.out.println("Z: " + dynoClient.get("Z"));
		
	}
	
	private static TokenMapSupplier toTokenMapSupplier(List<DynomiteNodeInfo> nodes){
		StringBuilder jsonSB = new StringBuilder("[");
		int count = 0;
		for(DynomiteNodeInfo node: nodes){
			jsonSB.append(" {\"token\":\""+ node.getTokens() + "\",\"hostname\":\"" + node.getServer() + 
							"\",\"dc\":\"" +  node.getDc() 
							+ "\",\"rack\":\"" +  node.getRack()
							+ "\",\"zone\":\"" +  node.getDc()
							+ "\"} ");
			count++;
			if (count < nodes.size())
				jsonSB.append(" , ");
		}
		jsonSB.append(" ]\"");
		
	   final String json = jsonSB.toString();
	   TokenMapSupplier testTokenMapSupplier = new AbstractTokenMapSupplier() {
			    @Override
			    public String getTopologyJsonPayload(String hostname) {
			        return json;
			    }
				@Override
				public String getTopologyJsonPayload(Set<Host> activeHosts) {
					return json;
				}
		};
		return testTokenMapSupplier;
	}
	
	private static HostSupplier toHostSupplier(List<DynomiteNodeInfo> nodes){
		final List<Host> hosts = new ArrayList<Host>();
		
		for(DynomiteNodeInfo node: nodes){
			hosts.add(buildHost(node));
		}
		
		final HostSupplier customHostSupplier = new HostSupplier() {
		   @Override
		   public Collection<Host> getHosts() {
			   return hosts;
		   }
		};
		return customHostSupplier;
	}
	
	private static Host buildHost(DynomiteNodeInfo node){
		return new Host(node.getServer(),node.getServer(),8102,node.getRack(),node.getDc(),Status.Up);
	}
		
}
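
A side note on the test above: `jsonSB.append(" ]\"")` leaves a stray quote character after the closing bracket of the topology payload. Whether Dyno's parser tolerates that is not established in this thread. A minimal sketch of building the same payload without the stray quote is shown here. `TopologyJson` and `Node` are hypothetical names used only for illustration; they are not part of Dyno, and the values are assumed to contain no characters that need JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Dyno: builds the topology JSON payload
// from plain values, using the same keys as the test above.
final class TopologyJson {

    // One node entry. zone is passed explicitly so the caller decides
    // whether it carries the rack or the DC value.
    static final class Node {
        final String token;
        final String hostname;
        final String dc;
        final String rack;
        final String zone;

        Node(String token, String hostname, String dc, String rack, String zone) {
            this.token = token;
            this.hostname = hostname;
            this.dc = dc;
            this.rack = rack;
            this.zone = zone;
        }
    }

    // Joins the entries with commas and wraps them in a JSON array.
    // No escaping is done: values are assumed to be plain identifiers.
    static String toJson(List<Node> nodes) {
        List<String> entries = new ArrayList<>();
        for (Node n : nodes) {
            entries.add(String.format(
                "{\"token\":\"%s\",\"hostname\":\"%s\",\"dc\":\"%s\",\"rack\":\"%s\",\"zone\":\"%s\"}",
                n.token, n.hostname, n.dc, n.rack, n.zone));
        }
        return "[" + String.join(",", entries) + "]";
    }
}
```

The returned string could be handed back from both `getTopologyJsonPayload` overrides in place of `jsonSB.toString()`.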

This all worked fine with dyno 1.5.1, but after changing to dyno 1.5.7 I get this exception:

com.netflix.dyno.connectionpool.exception.PoolOfflineException: PoolOfflineException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN, port=0, rack: UNKNOWN, datacenter: UNKNOW, status: Down], latency=0(0), attempts=0]host pool is offline and no Racks available for fallback
	at com.netflix.dyno.connectionpool.impl.lb.HostSelectionWithFallback.getConnection(HostSelectionWithFallback.java:161)
	at com.netflix.dyno.connectionpool.impl.lb.HostSelectionWithFallback.getConnectionUsingRetryPolicy(HostSelectionWithFallback.java:120)
	at com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl.executeWithFailover(ConnectionPoolImpl.java:292)
	at com.netflix.dyno.jedis.DynoJedisClient.d_set(DynoJedisClient.java:1233)
	at com.netflix.dyno.jedis.DynoJedisClient.set(DynoJedisClient.java:1223)
	at com.github.diegopacheco.dynomite.dyno.connection.test.SimpleConnectionTest.testConnection(SimpleConnectionTest.java:48)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:678)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

LOGS

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/diego/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-simple/1.7.21/be4b3c560a37e69b6c58278116740db28832232c/slf4j-simple-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/diego/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-log4j12/1.7.21/7238b064d1aba20da2ac03217d700d91e02460fa/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
[main] WARN com.netflix.config.sources.URLConfigurationSource - No URLs will be polled as dynamic configuration sources.
[main] INFO com.netflix.config.sources.URLConfigurationSource - To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
[main] INFO com.netflix.config.DynamicPropertyFactory - DynamicPropertyFactory is initialized with configuration sources: com.netflix.config.ConcurrentCompositeConfiguration@56cbfb61
[main] INFO com.netflix.dyno.contrib.ArchaiusConnectionPoolConfiguration - Dyno configuration: CompressionStrategy = NONE
[main] WARN com.netflix.dyno.jedis.DynoJedisClient - DynoJedisClient for app=[DynomiteClusterChecker] is configured for local rack affinity but cannot determine the local rack! DISABLING rack affinity for this instance. To make the client aware of the local rack either use ConnectionPoolConfigurationImpl.setLocalRack() when constructing the client instance or ensure EC2_AVAILABILTY_ZONE is set as an environment variable, e.g. run with -DEC2_AVAILABILITY_ZONE=us-east-1c
[main] INFO com.netflix.dyno.jedis.DynoJedisClient - Starting connection pool for app DynomiteClusterChecker
[pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - Adding host connection pool for host: Host [hostname=127.0.0.1, ipAddress=127.0.0.1, port=8102, rack: rack1, datacenter: local-dc, status: Up]
[pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Priming connection pool for host:Host [hostname=127.0.0.1, ipAddress=127.0.0.1, port=8102, rack: rack1, datacenter: local-dc, status: Up], with conns:3
[pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - Successfully primed 3 of 3 to Host [hostname=127.0.0.1, ipAddress=127.0.0.1, port=8102, rack: rack1, datacenter: local-dc, status: Up]
[main] WARN com.netflix.dyno.connectionpool.impl.lb.AbstractTokenMapSupplier - Local Datacenter was not defined
[main] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - registered mbean com.netflix.dyno.connectionpool.impl:type=MonitorConsole
[Thread-1] INFO com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Shutting down connection pool for host:Host [hostname=127.0.0.1, ipAddress=127.0.0.1, port=8102, rack: rack1, datacenter: local-dc, status: Up]
[Thread-1] WARN com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=127.0.0.1, ipAddress=127.0.0.1, port=8102, rack: rack1, datacenter: local-dc, status: Up] Unexpected end of stream.
[Thread-1] WARN com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=127.0.0.1, ipAddress=127.0.0.1, port=8102, rack: rack1, datacenter: local-dc, status: Up] Unexpected end of stream.
[Thread-1] WARN com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=127.0.0.1, ipAddress=127.0.0.1, port=8102, rack: rack1, datacenter: local-dc, status: Up] Unexpected end of stream.
[Thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - Remove host: Successfully removed host 127.0.0.1 from connection pool
[Thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - deregistered mbean com.netflix.dyno.connectionpool.impl:type=MonitorConsole

Cheers,
Diego Pacheco

diegopacheco commented Feb 3, 2017

@ipapapa @shailesh33

I found a way to make it work. What I changed was this:

It looks like Dyno treats RACK as equal to DC, which is weird. IMHO the previous code should work.

I had to change the code in 2 places.

  1. In the HostSupplier, when creating a Host, RACK is set to the DC value.
	private static Host buildHost(DynomiteNodeInfo node){
		Host host = new Host(node.getServer(),8102,node.getDc());
		host.setStatus(Status.Up);
		return host;
	}
  2. In the TokenMapSupplier, I set zone to the DC value instead of the RACK.
	private static TokenMapSupplier toTokenMapSupplier(List<DynomiteNodeInfo> nodes){
		StringBuilder jsonSB = new StringBuilder("[");
		int count = 0;
		for(DynomiteNodeInfo node: nodes){
			jsonSB.append(" {\"token\":\""+ node.getTokens() 
			                + "\",\"hostname\":\"" + node.getServer() 
							+ "\",\"zone\":\"" +  node.getDc()
							+ "\"} ");
			count++;
			if (count < nodes.size())
				jsonSB.append(" , ");
		}
		jsonSB.append(" ]\"");

Logs - Working :-)

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/diego/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-simple/1.7.21/be4b3c560a37e69b6c58278116740db28832232c/slf4j-simple-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/diego/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-log4j12/1.7.21/7238b064d1aba20da2ac03217d700d91e02460fa/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
[main] WARN com.netflix.config.sources.URLConfigurationSource - No URLs will be polled as dynamic configuration sources.
[main] INFO com.netflix.config.sources.URLConfigurationSource - To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
[main] INFO com.netflix.config.DynamicPropertyFactory - DynamicPropertyFactory is initialized with configuration sources: com.netflix.config.ConcurrentCompositeConfiguration@56cbfb61
[main] INFO com.netflix.dyno.contrib.ArchaiusConnectionPoolConfiguration - Dyno configuration: CompressionStrategy = NONE
[main] WARN com.netflix.dyno.jedis.DynoJedisClient - DynoJedisClient for app=[DynomiteClusterChecker] is configured for local rack affinity but cannot determine the local rack! DISABLING rack affinity for this instance. To make the client aware of the local rack either use ConnectionPoolConfigurationImpl.setLocalRack() when constructing the client instance or ensure EC2_AVAILABILTY_ZONE is set as an environment variable, e.g. run with -DEC2_AVAILABILITY_ZONE=us-east-1c
[main] INFO com.netflix.dyno.jedis.DynoJedisClient - Starting connection pool for app DynomiteClusterChecker
[pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - Adding host connection pool for host: Host [hostname=127.0.0.1, ipAddress=null, port=8102, rack: local-dc, datacenter: local-d, status: Up]
[pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Priming connection pool for host:Host [hostname=127.0.0.1, ipAddress=null, port=8102, rack: local-dc, datacenter: local-d, status: Up], with conns:3
[pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - Successfully primed 3 of 3 to Host [hostname=127.0.0.1, ipAddress=null, port=8102, rack: local-dc, datacenter: local-d, status: Up]
[main] WARN com.netflix.dyno.connectionpool.impl.lb.AbstractTokenMapSupplier - Local Datacenter was not defined
[main] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - registered mbean com.netflix.dyno.connectionpool.impl:type=MonitorConsole
Z: 200
[Thread-1] INFO com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Shutting down connection pool for host:Host [hostname=127.0.0.1, ipAddress=null, port=8102, rack: local-dc, datacenter: local-d, status: Up]
[Thread-1] WARN com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=127.0.0.1, ipAddress=null, port=8102, rack: local-dc, datacenter: local-d, status: Up] Unexpected end of stream.
[Thread-1] WARN com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=127.0.0.1, ipAddress=null, port=8102, rack: local-dc, datacenter: local-d, status: Up] Unexpected end of stream.
[Thread-1] WARN com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Failed to close connection for host: Host [hostname=127.0.0.1, ipAddress=null, port=8102, rack: local-dc, datacenter: local-d, status: Up] Unexpected end of stream.
[Thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - Remove host: Successfully removed host 127.0.0.1 from connection pool
[Thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - deregistered mbean com.netflix.dyno.connectionpool.impl:type=MonitorConsole

Cheers,
Diego Pacheco

ipapapa commented Mar 17, 2017

Diego, you got it right on how to provide the DC. Dyno reads the environment variable from the instance and uses that to determine the datacenter:

public static String getLocalZone() {
    String az = System.getenv("EC2_AVAILABILITY_ZONE");
    if (az == null) {
        az = System.getProperty("EC2_AVAILABILITY_ZONE");
    }
    return az;
}

/**
 * @return the datacenter that the client is in
 */
public static String getDataCenter() {
    // first try with getEnv
    String dc = System.getenv("EC2_REGION");
    if (dc == null) {
        // then try with getProperty
        dc = System.getProperty("EC2_REGION");
    }
    if (dc == null) {
        return getDataCenterFromRack(getLocalZone());
    } else {
        return dc;
    }
}

You can see that in the WARN message you are getting:

[main] WARN com.netflix.dyno.jedis.DynoJedisClient - DynoJedisClient for app=[DynomiteClusterChecker] is configured for local rack affinity but cannot determine the local rack! DISABLING rack affinity for this instance. To make the client aware of the local rack either use ConnectionPoolConfigurationImpl.setLocalRack() when constructing the client instance or ensure EC2_AVAILABILTY_ZONE is set as an environment variable, e.g. run with -DEC2_AVAILABILITY_ZONE=us-east-1c
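
The lookup order in the snippet above (environment variable first, then JVM system property) can be exercised in isolation. The following is a minimal sketch of that order only. `LocalZoneLookup` and `resolve` are hypothetical names, not Dyno classes; the environment and properties are passed in as parameters so the behaviour can be checked without touching the real process environment.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in that mirrors the lookup order quoted above.
// It is not Dyno code.
final class LocalZoneLookup {

    // Returns the environment value if present, otherwise the system
    // property with the same name, otherwise null.
    static String resolve(String key, Map<String, String> env, Properties props) {
        String value = env.get(key);
        if (value == null) {
            value = props.getProperty(key);
        }
        return value;
    }

    public static void main(String[] args) {
        // With the real process environment and JVM properties, running with
        // -DEC2_AVAILABILITY_ZONE=us-east-1c makes this print that zone,
        // unless an environment variable of the same name overrides it.
        String az = resolve("EC2_AVAILABILITY_ZONE", System.getenv(), System.getProperties());
        System.out.println("local zone: " + az);
    }
}
```

Under this order, a `-D` flag is enough on a laptop where the EC2 environment variables are not set.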

Please feel free to ask any further questions or close the issue if your question has been answered.

@diegopacheco
Author

I see @ipapapa

This makes sense, and we use it in PROD to get a preferred zone: if your microservice is running in us-west-2a and you have Dynomite in us-west-2a, you should pick us-west-2a instead of another zone or region.

The code I sent was just a POC to prove a point. The point is that, IMHO, these two things look wrong:

  1. In the HostSupplier, when creating a Host, RACK is set to the DC value.
  2. In the TokenMapSupplier, I set zone to the DC value instead of the RACK.

diegopacheco commented Apr 5, 2017

I think there may be a corner case here. If I have a Karyon microservice it works; however, if I use a Java main class it does not work unless I rebuild the whole dyno connection.

This code shows the issue: https://github.com/diegopacheco/netflixoss-pocs/tree/master/dynomite-client-dyno-notrca. If I shut down one of my Dynomite nodes (assuming a 3-node cluster in Docker), dyno does not fail over to the other nodes unless I rebuild the connection. Strangely, this works in this simple microservice: https://github.com/diegopacheco/netflixoss-pocs/tree/master/karyon-dyno-microservice

First I get this (right after killing a node):

com.netflix.dyno.connectionpool.exception.FatalConnectionException: FatalConnectionException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN, port=0, rack: UNKNOWN, datacenter: UNKNOW, status: Down], latency=0(0), attempts=1]redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.

Some time later, I start getting this:

com.netflix.dyno.connectionpool.exception.PoolOfflineException: PoolOfflineException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN, port=0, rack: UNKNOWN, datacenter: UNKNOW, status: Down], latency=0(0), attempts=0]host pool is offline and no Racks available for fallback
I debugged ConnectionPoolImpl at line 283:
RetryPolicy retry = cpConfiguration.getRetryPolicyFactory().getRetryPolicy();

At that point: cpConfiguration.getRetryFactory() == RetryONCE == 1

This could be the issue: my .setRetryPolicyFactory(new RetryNTimes.RetryFactory(3,true)) is not being applied, and that is why there is no fallback.

If I do:
ConfigurationManager.getConfigInstance().setProperty("dyno.dynomiteCluster.retryPolicy","RetryNTimes:3:true");

I get the proper retry count, but it still does not fail over as I expect.

@diegopacheco
Author

@ipapapa

You can close this BUG. It works for me now https://github.com/diegopacheco/netflixoss-pocs/tree/master/karyon-dyno-microservice

The problem was that I was not setting LOCAL_RACK, and without a local rack the retry/fallback does not happen since there are no remote hosts to borrow connections from. As soon as I set the local rack, everything started to work just fine.

I did it like this: https://github.com/diegopacheco/netflixoss-pocs/blob/master/karyon-dyno-microservice/src/main/java/com/github/diegopacheco/sandbox/java/netflixoss/dyno/msa/rest/DynoManager.java#L52
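
To make the reasoning above concrete, here is a toy model of it. It is not Dyno's host selection code. It only encodes the explanation given in this comment as an assumption: racks other than the local rack are fallback candidates, and when no local rack is configured there are no fallback candidates at all. `RemoteRackSketch`, `fallbackRacks`, and the rack names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the explanation in this comment. NOT Dyno's actual code.
final class RemoteRackSketch {

    // Returns the racks a client could fall back to.
    // Assumption taken from the thread: with no local rack set,
    // no rack counts as remote, so the result is empty.
    static List<String> fallbackRacks(String localRack, List<String> allRacks) {
        List<String> remote = new ArrayList<>();
        if (localRack == null) {
            return remote;
        }
        for (String rack : allRacks) {
            if (!rack.equals(localRack)) {
                remote.add(rack);
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        // Hypothetical rack names for a 3-node cluster.
        List<String> racks = Arrays.asList("rack1", "rack2", "rack3");
        System.out.println("no local rack: " + fallbackRacks(null, racks));
        System.out.println("local rack1:   " + fallbackRacks("rack1", racks));
    }
}
```

In this model, an empty fallback list corresponds to the "no Racks available for fallback" part of the PoolOfflineException message.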

Thanks anyway @ipapapa

You can close this now.

@ipapapa ipapapa closed this as completed May 25, 2017

ipapapa commented May 25, 2017

Actually, you can also set the environment variables to avoid changing the code between local and Karyon. I use the following on my laptop: -DEC2_AVAILABILITY_ZONE=us-east-1c -DNETFLIX_ENVIRONMENT=test.
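
A `-D` flag sets a JVM system property, so where passing flags is awkward (for example in an IDE test runner), the same values could be supplied as defaults from code. This is a minimal sketch under two assumptions: that the properties are set before the Dyno client is built, and that a value already provided through the environment or a `-D` flag should win. `LocalDevDefaults` and `applyIfAbsent` are hypothetical names.

```java
// Hypothetical helper: supplies a local-development default for a setting
// only when neither the environment nor a -D flag already provides one.
final class LocalDevDefaults {

    static void applyIfAbsent(String key, String value) {
        if (System.getenv(key) == null && System.getProperty(key) == null) {
            System.setProperty(key, value);
        }
    }

    public static void main(String[] args) {
        // Same values as the -D flags quoted above.
        applyIfAbsent("EC2_AVAILABILITY_ZONE", "us-east-1c");
        applyIfAbsent("NETFLIX_ENVIRONMENT", "test");
        System.out.println(System.getProperty("EC2_AVAILABILITY_ZONE"));
    }
}
```

Because existing values are never overwritten, the same code path could run unchanged in an environment that already sets these variables.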
