Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use socket.sendall to send packets #984

Merged
merged 1 commit into from
Jan 22, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions dmoj/packet.py
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,6 @@ def _connect(self):

log.info('Starting handshake with: [%s]:%s', self.host, self.port)
self.input = self.conn.makefile('rb')
self.output = self.conn.makefile('wb', 0)
self.handshake(problems, versions, self.name, self.key)
log.info('Judge "%s" online: [%s]:%s', self.name, self.host, self.port)

Expand Down Expand Up @@ -238,7 +237,8 @@ def _send_packet(self, packet: dict):
raw = zlib.compress(utf8bytes(json.dumps(packet)))
with self._lock:
try:
self.output.writelines((PacketManager.SIZE_PACK.pack(len(raw)), raw))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python is supposed to send everything with this call. What exactly is the problem you are seeing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We host a very large number of problems, so the handshake packet is quite large (over 100KBs). Everything is fine if both the judge and bridged are run on the same server, but when the judge is moved to a separate server, it fails to handshake.

From what I understand after reading CPython source code, writelines just calls write for every line, which in turn calls socket.send. The document for socket.send clearly notes that "Applications are responsible for checking that all data has been sent; if only some of the data was transmitted, the application needs to attempt delivery of the remaining data." (ref). In our case, raw is so large that socket.send(raw) could not send everything in one pass.

P/s: Before coming up with this fix, we have to monkey patch like this:

self.output.writelines((PacketManager.SIZE_PACK.pack(len(raw)), raw[:100000], raw[100000:]))

which handshakes successfully, so the problem seems to be indeed due to socket.send unable to send everything at once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a Python bug with the makefile interface to sockets because it's supposed to make sockets look like normal streams and handle incomplete transmission. Out of curiosity, what version are you using and how many problems do you have?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The judge uses Python 3.9.7. Currently, we have around 9000 problems. You can also reproduce this by adding an extremely long string to the packet:

self._send_packet({'name': 'handshake', 'problems': problems, 'executors': runtimes, 'id': id, 'key': key, 'dummy': 'a' * 1000000000})

Btw, where do you get the info that "it's supposed to make sockets look like normal streams and handle incomplete transmission"? I can't seem to find it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because makefile is supposed to return a file object and file objects do not have this weird behaviour in which a write could not complete fully. In this case what I would actually recommend is just getting rid of the file objects and use socket.sendall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

assert self.conn is not None
self.conn.sendall(PacketManager.SIZE_PACK.pack(len(raw)) + raw)
except Exception: # connection reset by peer
log.exception('Exception while sending packet to site, will not attempt to reconnect! Quitting judge.')
os._exit(1)
Expand Down