The first of multiple enabled Transports is used #7378

Closed
DSanchen opened this issue Nov 5, 2024 · 13 comments · Fixed by #7393

Comments

@DSanchen

DSanchen commented Nov 5, 2024

Version Information
Version of Akka.NET? - 1.5.30
Which Akka.NET Modules? - Akka.Remote, Akka.Hosting

Describe the bug
It seems to me that the remote actor system chooses the first enabled transport as the (only) one to communicate payload data to a local actor system, no matter which transport the local actor system specified in the actor selection.

To Reproduce

I tested four setups with two Transports: using my own NamedPipe-Transport (that - as the name suggests - uses Windows Named Pipes as communication medium, specifying "akka.pipe" as protocol) AND the default DotNetty.TCP (with the default akka.tcp protocol). Both the remote and the local actor system enable the two transports. On the remote actor system I switch the order of the enabled transports, in the local actor system I use the actor selection based on the different transport-protocols. I'm tracing network traffic with Wireshark looking at the loopback interface (as my tests are on the same machine), filtering on the port that I specify in the actor system configuration (e.g. port 5001).

  1. Remote actor system enabled-transports: [ pipe, tcp ], client actor selection is akka.pipe://... - Sending messages to this actor, I CANNOT see any data in Wireshark. This is what I would expect, as all communication goes over "my" named pipe transport.
  2. Remote actor system enabled-transports: [ pipe, tcp ], client actor selection is akka.tcp://... - Sending messages to this actor, I CANNOT see payload data in Wireshark on port 5001. What I DO see is probably the association process and some sort of heartbeat messages every few seconds. Here I would expect all communication to go over TCP, as I selected akka.tcp from the local actor system.
  3. Remote actor system enabled-transports: [ tcp, pipe ], client actor selection is akka.tcp://... - Sending messages to this actor, I see association, payload data and heartbeat messages. This is what I would expect, as I specified akka.tcp from the local actor system.
  4. Remote actor system enabled-transports: [ tcp, pipe ], client actor selection is akka.pipe://... - Sending messages to this actor, I AGAIN see all communication data (association, payload, heartbeat) in Wireshark. Here I would have expected to see no TCP traffic at all, as the local actor system wanted to use the akka.pipe protocol.
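
For reference, the two orderings in the scenarios above could be produced with a configuration along these lines. This is only a sketch: the `akka.remote.pipe` section, its `transport-class` value and the `MyCompany.Transports` names are hypothetical placeholders for the custom named-pipe transport, and whether that transport reads `transport-protocol` depends on its implementation.

```csharp
using Akka.Configuration;

public static class TransportConfigSketch
{
    // Builds a remoting config with the two transports in the requested order.
    // ASSUMPTION: the "pipe" section and its transport-class are hypothetical
    // stand-ins for the custom named-pipe transport described in this issue.
    public static Config Build(bool pipeFirst)
    {
        var order = pipeFirst
            ? "\"akka.remote.pipe\", \"akka.remote.dot-netty.tcp\""
            : "\"akka.remote.dot-netty.tcp\", \"akka.remote.pipe\"";

        return ConfigurationFactory.ParseString($@"
            akka.actor.provider = remote
            akka.remote {{
              enabled-transports = [{order}]
              dot-netty.tcp {{
                hostname = localhost
                port = 5001
              }}
              pipe {{
                transport-class = ""MyCompany.Transports.NamedPipeTransport, MyCompany.Transports""
                transport-protocol = pipe
              }}
            }}");
    }
}
```

`Build(true)` corresponds to scenarios 1 and 2, `Build(false)` to scenarios 3 and 4.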

Environment
Windows, .NET 8.0

Additional context
Problem is: When the remote actor system specifies enabled-transports: [ pipe, tcp ] and the local actor system (this time running on another machine) wants to connect to the remote actor system using actor selection as akka.tcp://... no communication at all takes place, as (obviously) no named pipe can be established between the two machines.

Am I doing something wrong, or have I misunderstood the idea of having multiple transports?

@Aaronontheweb
Member

Interesting - the addressing system into Akka.Remote's TransportManager shouldn't allow this, because it uses the scheme on the address to determine which set of remoting actors handle the transmission. If you turn on verbose debugging in your reproduction, which actors are logging activity when you do your sends in scenarios 1-4?

akka {
  loglevel = "DEBUG"      # Sets the global log level to DEBUG for more detailed logging.
  log-config-on-start = on # Log the full configuration on startup for debugging purposes.
  
  actor {
    debug {
      receive = on                # Enable detailed logging for received messages.
      autoreceive = on            # Enable logging for auto-received messages (e.g., Lifecycle hooks).
      lifecycle = on              # Enable logging for actor lifecycle events.
      unhandled = on              # Log warnings for unhandled messages.
      fsm = on                    # Enable logging for FSM (Finite State Machine) transitions.
      event-stream = on           # Enable event stream logging.
    }
  }

  remote {
    log-remote-lifecycle-events = on  # Log lifecycle events for remote actors (e.g., remote association).
    log-remote-events = on            # Additional remote event logging (useful for debugging).
  }
}

@DSanchen
Author

DSanchen commented Nov 7, 2024

I collected the log files for the scenarios and the Wireshark traces (see attached Remoting Scenarios.zip).
But maybe it is rather the actor selection that is causing the issue: while the ActorSelection has an anchor of akka.pipe://..., the actor that is resolved using ResolveOne() has a path of akka.tcp://...:

ActorSelection

Remoting Scenarios.zip

@Aaronontheweb
Member

Aaronontheweb commented Nov 7, 2024

Oh boy, I think I know what the problem is - it's not the outbound transmission on the ActorSelection, it's the computation of which address to use for the Sender. The actor on the other side of the wire sending the reply back is doing the right thing - it's the sender that's at fault. So I think the issue here is going to stem from the way we select which transport address to use inside the serialization system.

@Aaronontheweb
Member

This is the issue

public Address DefaultAddress { get { return Transport?.DefaultAddress; } }

Followed by

_defaultAddress = akkaProtocolTransports.Head().Address;
_addresses = new HashSet<Address>(akkaProtocolTransports.Select(x => x.Address));

We always select the address at the front of the transport list to be the default Address for the ActorSystem - and that's what's being generated when the ActorSystem that is doing the sending is trying to encode the reply-to address for the Sender.
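
The effect can be illustrated with a stripped-down model. This is a sketch only: `Addr`, `DefaultAddressSketch` and the example addresses are made up for illustration and are not Akka.NET types; the model merely mirrors the "head of the transport list" logic quoted above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for an actor system address (illustration only, not Akka.NET's Address).
public sealed record Addr(string Protocol, string System, string Host, int Port)
{
    public override string ToString() => $"{Protocol}://{System}@{Host}:{Port}";
}

public static class DefaultAddressSketch
{
    // Mirrors the quoted logic: the first enabled transport supplies the default address.
    public static Addr DefaultAddress(IReadOnlyList<Addr> transports) => transports.First();

    // A local actor path carries no host, so the default address is stamped onto it,
    // regardless of which transport the message is actually travelling over.
    public static string SerializeSender(IReadOnlyList<Addr> transports, string localPath)
        => $"{DefaultAddress(transports)}{localPath}";
}
```

With transports listed as `[ pipe, tcp ]`, `SerializeSender` yields `akka.pipe://server@localhost:5001/user/echo` even when the conversation was started over akka.tcp, so replies addressed to that Sender travel over the pipe transport.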

@Aaronontheweb
Member

Switching between possible addresses is probably the right way to solve this problem, but the issue (from my side of the table) is that:

  1. It makes the remoting code more complicated
  2. It makes the hottest possible path in Akka.NET (the remoting pipeline) have even more overhead potentially.

I'll mull it over, but I think there might be a way we can minimize the costs of the latter so it doesn't have an adverse effect on performance. I agree that this is a bug though and it should be fixed.
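
One way to picture "switching between possible addresses" is a lookup keyed on the remote side's protocol. The sketch below is an assumption-laden illustration, not the change that was eventually made in Akka.Remote: `LocalEndpoint` and `AddressSelectionSketch` are hypothetical names, and the fallback to the first entry is a design choice of this example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified description of one locally bound transport address.
public sealed record LocalEndpoint(string Protocol, string Host, int Port);

public static class AddressSelectionSketch
{
    // Picks the local address whose protocol matches the remote address's protocol,
    // falling back to the first (default) entry when nothing matches.
    public static LocalEndpoint ForRemote(IReadOnlyList<LocalEndpoint> locals, string remoteProtocol)
        => locals.FirstOrDefault(l => l.Protocol == remoteProtocol) ?? locals[0];
}
```

A per-message linear scan like this is the kind of overhead the hot path would want to avoid; resolving the address once per association instead would keep the per-message cost flat.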

@DSanchen
Author

DSanchen commented Nov 8, 2024

Hi Aaron,
wow, that's an incredibly fast response time, I very much appreciate that 👍👍
Please let me know if I can do anything to help.

@DSanchen
Author

Other observation:
When remotely deploying an actor to the Server with the remote address being akka.tcp://... AND the Server side enabled-transports = [ akka.remote.pipe, akka.remote.dot-netty.tcp ], the coordinated shutdown of the Client (the deployer) fails with a timeout. This is also the case when remotely deploying to akka.pipe://... AND the Server side enabled-transports = [ akka.remote.dot-netty.tcp, akka.remote.pipe ]. The communication in these cases occurs over the protocol that was used for deploying to the remote server. (I attached the log files of both server and client, but only for the first case - client remote deploying to akka.tcp://...)
Remote Deployment - Logs.zip

I guess (or hope) that this is just another incarnation of this same issue?

@Aaronontheweb
Member

Yes I think so @DSanchen - and I have some good news: I am pretty sure this is an isolatable issue inside the EndpointWriter, the actor that performs outbound serialization over Akka.Remote. This is good news because it means fixing this issue shouldn't have any knock-on performance impact.

The address probably ultimately gets encoded here:

private static ActorRefData SerializeActorRef(Address defaultAddress, IActorRef actorRef)
{
    return new ActorRefData()
    {
        Path = (!string.IsNullOrEmpty(actorRef.Path.Address.Host))
            ? actorRef.Path.ToSerializationFormat()
            : actorRef.Path.ToSerializationFormatWithAddress(defaultAddress)
    };
}

But I think it's the wrong address being fed into this actor further up the food chain - this code is just doing what it's told. I am pretty confident we could write a reproduction spec for this and track the problem down that way.

Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this issue Nov 17, 2024
@Aaronontheweb
Member

Added a reproduction here #7391 - bad news - I was not able to reproduce your issue, so this might actually be a bug with your Named Pipe transport implementation.

You can copy my spec and apply it to your transports - let me know if you get a different result! In the meantime, I can also try tweaking this spec (or RemotingSpec) to use the DotNetty transport and see if that's an issue, but I don't think so.

Aaronontheweb added a commit that referenced this issue Nov 17, 2024
Added to prove the existence of #7378
@DSanchen
Author

DSanchen commented Nov 18, 2024

In PingAndVerify():

  // Replacing this
  // selection.Tell("ping", TestActor);

  // with this, I can reproduce the failure with the TestTransport
  var actor = await selection.ResolveOne(TimeSpan.FromSeconds(1));
  actor.Tell("ping", TestActor);

And, yes, when using the ActorSelection-Object instead of a ResolveOne()-ed IActorRef in my client test app (using my pipe-transport), it works...
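
A quick way to spot the mismatch from the client side is to compare the protocol of the path that was asked for with the protocol of the resolved reference. The following is a minimal sketch, assuming the Akka NuGet package is referenced; `ResolveCheck` and its method names are made up for this example.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

public static class ResolveCheck
{
    // True when both paths use the same transport protocol (e.g. "akka.pipe" vs "akka.tcp").
    public static bool SameProtocol(ActorPath requested, ActorPath resolved)
        => requested.Address.Protocol == resolved.Address.Protocol;

    // Resolves the selection and reports whether the resulting IActorRef
    // still points at the transport named in the requested path.
    public static async Task<bool> ResolvesOverRequestedTransport(
        ActorSystem system, string path, TimeSpan timeout)
    {
        ActorSelection selection = system.ActorSelection(path);
        IActorRef actor = await selection.ResolveOne(timeout);
        return SameProtocol(ActorPath.Parse(path), actor.Path);
    }
}
```

In the failing setups described in this issue, a check like this would be expected to return false for the ResolveOne()-ed reference, since its path carries the protocol of the first enabled transport on the remote side.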

@Aaronontheweb
Member

@DSanchen I'll go give that a try and see what I find!

@Aaronontheweb
Member

Yep, I can reproduce your failure - so there's 100% a real bug in Akka.Remote then. THANK YOU for helping me tweak the test to find it.

Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this issue Nov 18, 2024
Need to ensure we use `RemoteActorRef` instead of `ActorSelection` in order to properly reproduce bug
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this issue Nov 18, 2024
@Aaronontheweb
Member

I think I have this issue fixed in #7393 - waiting to see what the build system reports back on the other tests, but that change definitely fixed the reproduction spec.

@Aaronontheweb Aaronontheweb added this to the 1.5.32 milestone Nov 18, 2024
Arkatufus pushed a commit that referenced this issue Nov 19, 2024
…ress` when using multiple transports (#7393)

* harden specs for #7378

Need to ensure we use `RemoteActorRef` instead of `ActorSelection` in order to properly reproduce bug

* changed up `ActorSystem` names to make it easier to debug

* fixed outbound address serialization

close #7378

* fixed compilation issue