Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IORef based socket #303

Merged
merged 12 commits into from
Feb 6, 2018
Merged

Conversation

kazu-yamamoto
Copy link
Collaborator

This implements the idea described in #302.

@kazu-yamamoto kazu-yamamoto requested a review from eborden February 2, 2018 06:47
@kazu-yamamoto
Copy link
Collaborator Author

Cc: @vdukhovni

@@ -17,7 +18,7 @@ import Network.Socket.Types

socketToHandle :: Socket -> IOMode -> IO Handle
socketToHandle s mode = do
let fd = fromIntegral $ fdSocket s
fd <- fromIntegral <$> fdSocket s

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should invalidate the socket's file descriptor making shoud avoid behaviour impossible. Basically, emulate close, but without closing the socket's file descriptor. Or if you prefer (where available) dup(2) the socket's file descriptor, and close the socket.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, replace that should with a MUST. Given that Sockets (in this branch of the code) have finalizers that automatically close them, and doing so would then break the Handle, and trigger a second close outside the IORef guard in sockets, this is NOT optional. You must either invalidate the socket, or at least dup the file descriptor and pass that to the handle, but invalidation is I think more appropriate here, since continued I/O via the socket is not supported or expected.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
And done in 7e6eea8

Copy link
Collaborator

@eborden eborden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many operations in the library that now let you operate on a file descriptor of -1. Do we want to intercept the possible error here and display an error about the socket being closed?

Something like:

fdSocketOrThrow :: Socket -> IO CInt
fdSocketOrThrow s = do
  fd <- fdSocket s
  if fd == (-1)
    then throw ioError "yada yada attempting to operate on a closed socket"
    else pure fd

unsafeFdSocket = unsafePerformIO . fdSocket
{-# NOINLINE unsafeFdSocket #-}

-- | Creating a socket form a file descriptor.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo form to from

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

close (Socket ref) = do
oldfd <- atomicModifyIORef' ref $ \cur -> (-1, cur)
when (oldfd /= -1) $
closeFdWith (void . c_close . fromIntegral) (fromIntegral oldfd)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to throw an error when attempting to close an already closed socket? I can see the argument for both ends of this.

  1. The user might want to know when they are closing a socket that has already been closed. This is available in the underlying api:

close() returns zero on success. On error, -1 is returned, and errno is set appropriately.

  1. Running close on an already closed socket is a safe enough operation. Throwing an exception seems overly explosive.

Maybe we need close', which returns a value indicating whether the socket has been closed or not. The possibilities are binary, but a Bool doesn't really carry enough context to be useful here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this close is to be safe for finalization, it MUST NOT throw exceptions. So if exceptions are to be thrown, there would need to be a second (need not be exported) close' for finalization.

newtype Socket = Socket (IORef CInt)

instance Show Socket where
show s = "<socket " ++ show (unsafeFdSocket s) ++ ">"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that unsafeFdSocket s is a good idea here. Other than distinguishing different sockets, the actual value of fd is not actually that interesting.

To me, it would make more sense to show the Socket type and address family, and ideally if bound the local address, if connected remote address... But some of those properties mutate over time and require IO to query. Perhaps, to avoid the I/O barrier the "show" string could be computed once when the socket is created? Sockets are expensive enough to create that carrying an extra string should not be burdensome. The string could be a ByteString if that's more efficient to store.

   <socket STREAM AF_INET _fd_>
   <socket DGRAM AF_INET6 _fd_>
   <socket STREAM UNIX _fd_>
   ...

Are there backwards compatibility requirements to retain the legacy Show format? Note, I'm not sure whether my alternative suggestion is wise.

Perhaps a minimal Show implementation is fine, and a more complete socket description could be provided as an IO action, that would also display the local address, the remote address (if connected), ...

displaySocket :: Socket -> IO String

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps even a ShortByteString

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, I think it is best to avoid unsafePerformIO in the Show implementation. Precomputing the string also means that we can display it even for closed sockets (where it is the description of the socket when it was opened)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precomputing is done in 40801af
I ensured that show string is exactly the same as the previous one.

@vdukhovni
Copy link

Is CInt representation of Sockets with -1 as the invalid sentinel value also to be used on Windows? Windows sockets are unsigned ints with INVALID_SOCKET instead of -1 as the sentinel value. INVALID_SOCKET happens to be (~0) thus same as (unsigned) -1 on 2's complement machines, so not sure whether we should do anything special here or not.

hSetBuffering h NoBuffering
return h
where
err _ = ioError $ userError $ "socketToHandle: already a Handle"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more accurate error message might be "socket is no longer valid", since it might also have been closed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. But I kept the old error message.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the mStat MVar distinguished between sockets that got converted to handles and other states (e.g. already closed). We no longer do that, so perhaps the message could be more general to avoid misleading error messages that don't match the symptoms... Not critical, but something to consider.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Probably, I was too nervous. I changed it to "socket is no longer valid".

invalidateSocket (Socket ref) errorAction normalAction = do
oldfd <- atomicModifyIORef' ref $ \cur -> (-1, cur)
if oldfd == -1 then errorAction oldfd else normalAction oldfd

-----------------------------------------------------------------------------

-- | Close the socket. Sending data to or receiving data from closed socket
-- may lead to undefined behaviour.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it turns out, we no longer have that problem. When a socket is closed, any later I/O operations will encounter an invalid fd and raise an error (bad file descriptor or similar). The only issue is that in multi-thread programs if one thread is doing I/O and another is closing the socket, then the thread that got the fd just before it was closed, might be using a stale, still valid value.

This is in a semantic improvement. We now have safety for sequential operations. And only risk undefined behaviour in the unlikely case that a thread is closing a socket while another thread is using it.

@@ -74,27 +73,24 @@ import Network.Socket.Imports
-----------------------------------------------------------------------------

-- | Basic type for a socket.
newtype Socket = Socket (IORef CInt)
data Socket = Socket (IORef CInt) String

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you decided to go with String rather than ShortByteString or similar. Perhaps slightly more expensive, but easily changed later, without compatibility issues. Are there any module dependency barriers here (circular dependencies or dependencies we want to avoid) to changing this later? Am I wrong about String being a "more expensive" representation to carry along with the socket?

Of course this is unlikely to matter much...

Copy link
Collaborator Author

@kazu-yamamoto kazu-yamamoto Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GHC 7.6 uses bytestring-0.10.0.2 which does not provide ShortByteString unfortunately.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense to carry around an ascii encoded ByteString and pack it whenever it needs to be shown? Utilizing String here means we are introducing a fair amount of indirection and bloat for a Show instance.

Copy link

@vdukhovni vdukhovni Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my instinct also. (Though of course ByteString show is unpack, but that's just detail).
Perhaps the approach should instead be to store the minimal information needed to avoid having to use unsafePerformIO later. To that end, just the lazy byte string form of the fd is enough.
We could even store an originalFd as a Word32 value, whose different type should be enough to discourage misuse (bypassing the IORef).

data Socket = Socket (IORef CInt) Word32
...
mkSocket :: CInt -> IO Socket
mkSocket fd = do
    ref <- newIORef fd
    let s = Socket ref $ fromIntegral fd
    void $ mkWeakIORef ref $ close s
    return s

and then in the show instance use that:

   show (Socket _ wfd) = "<socket: " ++ show wfd ++ ">"

There's probably no need to pre-format the whole string (was my suggestion, sorry). We could keep in mind that Haskell is lazy, and so the (current) string would I think remain an unevaluated thunk until actually used. Not sure whether the thunk is cheaper to keep around than the string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to data Socket = Socket (IORef CInt) CInt.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. The field is not exported, and the module in which it is visible is small. So the immutable original fd is unlikely to get misused.

@vdukhovni
Copy link

I no longer see any major issues, looks OK to me. At this point any wiser feedback should ideally come from folks deeply versed in Haskell issues, rather than my system programming background.

If anyone else is following along, what do you think of adding the finalizer? Is it a good idea? Or just unnecessary overhead? Either way a mutable fd is I think a win, as it makes sockets safer against various types of misuse. The finalizer could be a luxury, if everyone had been getting along without it all this time... I have no more answers, only questions. Thanks for taking care of the issues I found.

@kazu-yamamoto
Copy link
Collaborator Author

I no longer see any major issues, looks OK to me. At this point any wiser feedback should ideally come from folks deeply versed in Haskell issues, rather than my system programming background.

Since 2.7.0.0 deprecates many APIs, we need to wait for a long time to release 3.0.0.0. Maybe a half or full year.

@kazu-yamamoto
Copy link
Collaborator Author

There are many operations in the library that now let you operate on a file descriptor of -1. Do we want to intercept the possible error here and display an error about the socket being closed?

Something like:

fdSocketOrThrow :: Socket -> IO CInt
fdSocketOrThrow s = do
  fd <- fdSocket s
  if fd == (-1)
    then throw ioError "yada yada attempting to operate on a closed socket"
    else pure fd

I don't have a strong opinion. But when (-1) returned by fdSocket is used in following syscalls, another error occurs eventually. So, unless there are big problems, I would keep this at this moment.

@vdukhovni
Copy link

vdukhovni commented Feb 5, 2018

It is probably OK to let the OS return EBADF for now. If someone finds a compelling reason to check at the Haskell layer, this can be done later. The main win here is that we no longer try to use a previously closed socket's fd. That may even uncover the culprit in the open issue in:

snoyberg/http-client#252
haskell-tls/hs-tls#179

Since perhaps now attempts to use such sockets might be caught a bit earlier with a less confusing context. Or at least the hypothesis that closed sockets are being re-used might be confirmed.

@kazu-yamamoto
Copy link
Collaborator Author

@eborden All issues have been fixed, I believe. Please review this again.

Copy link
Collaborator

@eborden eborden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

kazu-yamamoto added a commit to kazu-yamamoto/network that referenced this pull request Feb 6, 2018
@kazu-yamamoto kazu-yamamoto merged commit d555e65 into haskell:master Feb 6, 2018
@kazu-yamamoto kazu-yamamoto deleted the socket-ioref branch February 6, 2018 03:02
@kazu-yamamoto
Copy link
Collaborator Author

@eborden @vdukhovni Thank you for your comments and review. Merged.

@vdukhovni
Copy link

Thanks. You're welcome.

@kazu-yamamoto kazu-yamamoto mentioned this pull request Feb 7, 2018
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants