Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket #286

Closed
kazu-yamamoto opened this issue Dec 28, 2017 · 17 comments
Closed

Socket #286

kazu-yamamoto opened this issue Dec 28, 2017 · 17 comments

Comments

@kazu-yamamoto
Copy link
Collaborator

@eborden Would you explain why the socket package is fast? I have read the source but I don't understand the key idea.

@eborden
Copy link
Collaborator

eborden commented Dec 28, 2017

@kazu-yamamoto I wrote this up a while ago when @tfausak asked me for a compare and contrast:


There are a few stark and interesting differences.
The primary point of interest where everything jumps off is the Socket type.
network:

data Socket
  = MkSocket
            CInt                 -- File Descriptor
            Family
            SocketType
            ProtocolNumber       -- Protocol Number
            (MVar SocketStatus)  -- Status Flag

socket

newtype Socket f t p
      = Socket (MVar Fd)

The socket library does a really nice job of removing some cruft from network. Instead of explicitly carrying Family, SocketType and ProtocolNumber, socket moves these to the type level and dispatches their values statically via type classes, this is a very nice win for memory residency, socket is just a newtype around Mvar.

socket moves the file descriptor into an Mvar, while network just holds a raw pointer. This means that socket has more predictable concurrency behavior since we need to take the MVar to interact with the file descriptor. Like I previously mentioned this could cause some latency, but it probably isn't a huge issue (some benchmarks would be nice).

The other difference is that socket doesn't carry a status. In network we occasionally check the status to make sure we aren't doing anything unsafe, but network also throws in these cases (not that nice). We also don't check the status reliably or consistently. For example you can write to a closed socket and if the OS has decided to reuse that file descriptor...well...undefined behavior. I think throwing this out is probably okay, we don't have nice ways of safely handling this yet without lots of code bloat. This is where I'd actually like to see linear types to enforce these invariants.

The design decision of pushing socket descriptions to the type level also has some other nice benefits. In network there are areas where we need to check that you are operating on the correct type of socket and socket can rules these out with constraints or explicit types. One prime example is socketPair, you can only safely pair the same types of sockets, in network we just throw if you have done this wrong, this would be impossible in socket.

Though it seems socket is missing pair. I think this is likely due to one of the libraries philosophies "No #ifdef madness in the Haskell sources." socket has chosen to just implement the safe subset of POSIX. network covers many other extensions and unique variations. This has lead to some crazy CPP, but support for more platforms exists. There is a maintenance burden though and we get into the BSD vs GNU land of live on every env partially broken or only target ones with compliance.

In general I like the look of this library.


There was some experimentation with socket in http-client snoyberg/http-client#306

More would have to be done. I'm not really convinced that socket is faster, but I think pushing a fair amount of the book keeping in to the type level provides very nice ergonomics.

@kazu-yamamoto
Copy link
Collaborator Author

Thank you for your explanation. I knew you wrote this somewhere but I lost the reference. And I understand "faster" is my misunderstanding.

In my opinion, the approach of the socket package is good since users can extend all parameters such as socket family. But introducing the approach to the network package breaks backward compatibility. I don't know it's worth trying.

@eborden
Copy link
Collaborator

eborden commented Dec 29, 2017

@kazu-yamamoto Yeah, I'd only really embark on this level of breakage in a epoch version, like 3.0.0.0. I'm also not sure if it is worth it 😄

I had started a branch to experiment with this, but the major changes in module structure would make that difficult to merge or compare. I'll likely open another branch to continue experimenting. That would at least give us a full view of the breakage. The new module structure should actually make this work more digestible!

@kazu-yamamoto
Copy link
Collaborator Author

@eborden Do you know whether or not sockets (ie CInt) can be duplicated like file descriptors? If not, we cannot implement efficient proxys with this approach. In a proxy, one thread reads data from a socket and another thread writes data to the socket. If we use MVar for CInt, the two threads cannot work independently even if we have two cores. Please refer to #277 to see the entire proxy code.

@eborden
Copy link
Collaborator

eborden commented Dec 31, 2017

@kazu-yamamoto That is a valid concern. That would certainly exacerbate contention issues in a MVar based implementation.

@lpeterse
Copy link

lpeterse commented Jan 3, 2018

Hi there (@eborden, @kazu-yamamoto),

I took the time to write a benchmark comparing socket with network with regard to the congestion on the MVar when running strong full-duplex load: https://github.com/lpeterse/network-socket-performance

Here's the result:

half-duplex/full-duplex network socket network (threaded) socket (threaded)
chunkSize=40 bytes 226ms / 453ms 285ms / 1013ms 374ms / 833ms 553ms / 1223ms
chunkSize=4000 bytes 20ms / 38ms 21ms / 47ms 21ms / 42ms 21ms / 41ms

Yes, the concern is valid, but the penalty is still within the same order of magnitude. In general, socket seems to perform slightly worse than network (this is in agreement with the findings by @tfausak).

I wonder whether socket can be tuned to be en par with network - maybe it's not the MVar that's causing the difference. A solution for the congestion problem might be to introduce a second MVar: one for reading and one for writing. Operations like close etc would then be required to acquire both locks before doing anything. I still think the MVar approach is a really good one wrt to safety and fool-proofness.

SocketStatus state tracking: My opinion is DON'T. I thought about it when I wrote socket and I came to the conclusion that this was one bad design decision. For one, not every socket has the concept of a connection and furthermore it is IMHO impossible to keep this in sync with the kernels representation.
The only status one should keep track of is whether the socket is closed or not in order to avoid reuse after free. All other status information can be gathered from the kernel representation via system calls.

I also want to say something more general wrt to the socket vs network issue:

I wrote socket in response to this reddit thread (https://www.reddit.com/r/haskell/comments/2o5558/is_network_library_poorly_implemented_or_am_i). People proposed a general overhaul of the network library, but there were concerns with breaking existing code (even books reference the network library). Also, even a new network major version would cause havoc as one cannot use two versions of the same library in the same project and people might want to do this until all packages depending on network have been migrated to use the "rewrite". I intended socket to be this rewrite with a new clean API and the use of type-safety features like data families and phantom types. As some of its design found approval, I would like to make the unobtrusive request to improve and extend socket to fit all users needs rather than just adopting its ideas for a rewrite.

The following things might be missing right now:

  • Vectored IO: I had doubt about portability, so I left it out for now. I also wonder whether this is really useful as most people will use something like bytestring builders and assemble properly sized chunks in user space.
  • More protocols: There is a socket-sctp as well as a socket-unix package, but people might miss other.
  • sendFile: Should not be too hard to add.

Finally, I honor and appreciate everyone's work taking place right now in refactoring the network library and cleaning up the pile of issues :-)

@kazu-yamamoto
Copy link
Collaborator Author

I wonder whether socket can be tuned to be en par with network - maybe it's not the MVar that's causing the difference. A solution for the congestion problem might be to introduce a second MVar: one for reading and one for writing. Operations like close etc would then be required to acquire both locks before doing anything. I still think the MVar approach is a really good one wrt to safety and fool-proofness.

Yeah. Do dup for CInt and protect two CInts with two MVars. I guess that is good enough.

@kazu-yamamoto
Copy link
Collaborator Author

SocketStatus state tracking: My opinion is DON'T. I thought about it when I wrote socket and I came to the conclusion that this was one bad design decision. For one, not every socket has the concept of a connection and furthermore it is IMHO impossible to keep this in sync with the kernels representation.

I agree. I guess that SocketStatus was necessary for the Network module. But for Network.Socket, it is not necessary.

Also, the Socket type is a bad design. Users cannot extend families by themselves.

@kazu-yamamoto
Copy link
Collaborator Author

sendFile: Should not be too hard to add.

We can use simple-sendfile. :-)

@kazu-yamamoto
Copy link
Collaborator Author

I will spend to refactor network for a while.
But I guess we should encourage users to migrate from network to socket instead of introducing big breaking changes to network.

@eborden What do you think?

@eborden
Copy link
Collaborator

eborden commented Jan 5, 2018

@kazu-yamamoto I'm not sure encouraging a transition to socket is possible right now. The design goals of socket do not seek the same level of support, https://github.com/lpeterse/haskell-socket

@kazu-yamamoto
Copy link
Collaborator Author

@eborden Probably we should consider a migration path from network to socket while maintaining network.

@kazu-yamamoto
Copy link
Collaborator Author

My idea:

  • Keep the Socket abstract type. We should clean up the internal structure such as state.
  • Define a class, say, Socketable which has a method, socketFd :: s -> MVar CInt. Let Socket be an instance of Socketable.
  • Relax APIs. Eg. send :: Socketable s => s -> ByteString -> IO Int.

If we make Socket stateless, we should remove withConnectedScoket.

@kazu-yamamoto
Copy link
Collaborator Author

@eborden Please read https://github.com/haskell/network#milestones
What do you think?

@kazu-yamamoto
Copy link
Collaborator Author

@eborden I made PoC in https://github.com/kazu-yamamoto/network/tree/epoch

Not completed but you can see types of some APIs are relaxed.

@kazu-yamamoto
Copy link
Collaborator Author

I would close this.
Let's discuss on #295.

@vdukhovni
Copy link

@lpeterse SocketStatus state tracking: My opinion is DON'T. I thought about it when I wrote socket and I came to the conclusion that this was one bad design decision. For one, not every socket has the concept of a connection and furthermore it is IMHO impossible to keep this in sync with the kernels representation. The only status one should keep track of is whether the socket is closed or not in order to avoid reuse after free. All other status information can be gathered from the kernel representation via system calls.

My instinct is to strongly disagree with the above. It is not possible to track the sock handle open/close status just in the OS, because the OS representation of a socket (the file descriptor) is reused after close. For safety, just as with Handle the Haskell context for an O/S socket needs to remember which sockets it closed until the socket is GC'd. With that, one could actually have a safe finalizer.

Whether hiding the FD inside an IORef is better than keeping a separate mutable status (checked only on close) depends on whether you want to also invalidate all other operations on close (read, write, ...) by invalidating the FD (on WIndows the magic value is INVALID_SOCKET instead of just -1).

Keeping the FD in an IORef (less contention than MVAR, safer than CInt) would be my starting point, and then change it only on evidence that there's a much better alternative that does not compromise too much safety.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants