Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prim_inet don't clean up monitor if port_command crashed #8484

Closed
zzydxm opened this issue May 15, 2024 · 5 comments · Fixed by #8546
Closed

prim_inet don't clean up monitor if port_command crashed #8484

zzydxm opened this issue May 15, 2024 · 5 comments · Fixed by #8546
Assignees
Labels
bug Issue is reported as a bug team:PS Assigned to OTP team PS
Milestone

Comments

@zzydxm
Copy link
Contributor

zzydxm commented May 15, 2024

Describe the bug
When the port owner dies because for example a linked process died, the erlang:port_command can crash and fall into this clause
https://github.com/erlang/otp/blob/master/erts/preloaded/src/prim_inet.erl#L616-L619

Problem (1): This will cause a uncleared monitor message so we should demonitor here
Problem (2): The late monitor message appears to us in format
{'DOWN',#Ref<0.754902196.4233101336.24126>,process,#Port<0.55426>,{timeout,{...}}}
which type is process but object is a port, which is not expected anyway and we think it is a bug in BEAM

To Reproduce
No local reproduce yet but from the code problem (1) is straightforward
Problem (2) reproduce see comment below

Expected behavior
There should not be any DOWN message when prim_inet:send failed

Affected versions
OTP 26

@zzydxm zzydxm added the bug Issue is reported as a bug label May 15, 2024
@potatosalad
Copy link
Contributor

potatosalad commented May 15, 2024

Here is a simple way to reproduce the issue manually:

(fun() ->
_ = catch erlang:process_flag(trap_exit, true),
Parent = self(),
{Pid, PidMon} = spawn_monitor(fun() ->
Port = erlang:open_port({spawn, "sleep 10"}, []),
Parent ! {self(), Port},
timer:sleep(timer:seconds(10)),
exit({timeout, expired})
end),
Port = receive {Pid, P} -> P end,
_ = catch erlang:link(Port),
PortMon = erlang:monitor(port, Port),
_ = spawn(fun() -> timer:sleep(timer:seconds(5)), erlang:exit(Pid, {timeout, killed}) end),
#{
    port => Port,
    port_mon => PortMon,
    pid => Pid,
    pid_mon => PidMon
}
end)().
% wait 5 seconds...
flush().
Shell got {'DOWN',#Ref<0.3044587589.3799252993.136686>,process,<0.91.0>,
                  {timeout,killed}}
Shell got {'EXIT',#Port<0.3>,{timeout,killed}}
Shell got {'DOWN',#Ref<0.3044587589.3799252993.136688>,process,#Port<0.3>,
                  {timeout,killed}}

Edit: Confirmed that the above behavior is present in both OTP 26.x and OTP 27.0-rc3

@IngelaAndin IngelaAndin added the team:PS Assigned to OTP team PS label May 20, 2024
@RaimoNiskanen
Copy link
Contributor

It is obviously a bug, and the solution would be simply to add a demonitor(Mref, [flush]) in the catch error: _ -> clause.

But I will have to get the VM guys to take a look at the weird process tag first...

rickard-green added a commit to rickard-green/otp that referenced this issue Jun 4, 2024
…rickard/port-down-fix/25/erlangGH-8484/OTP-19123

* rickard/port-down-fix/24/erlangGH-8484/OTP-19123:
  [erts] Fix DOWN signal from port
@rickard-green rickard-green linked a pull request Jun 4, 2024 that will close this issue
@RaimoNiskanen
Copy link
Contributor

The process tag problem was apparently easy to find. It is on the ToDo list.

rickard-green added a commit that referenced this issue Jun 10, 2024
RaimoNiskanen added a commit that referenced this issue Jun 18, 2024
…-19121' into maint

* raimo/erts/prim_inet-demonitor-on-exception/GH-8484/OTP-19121:
  Update preloaded
  Demonitor after exception from port_command
dgud pushed a commit that referenced this issue Jun 25, 2024
* rickard/port-down-fix/25/GH-8484/OTP-19123:
  [erts] Fix DOWN signal from port
dgud pushed a commit that referenced this issue Jun 25, 2024
…-19121' into maint-26

* raimo/erts/prim_inet-demonitor-on-exception/GH-8484/OTP-19121:
  Update preloaded
  Demonitor after exception from port_command
IngelaAndin pushed a commit that referenced this issue Jul 8, 2024
* rickard/port-down-fix/25/GH-8484/OTP-19123:
  [erts] Fix DOWN signal from port
@RaimoNiskanen RaimoNiskanen modified the milestones: OTP-26.2.5.1, OTP-27.0.1 Jul 9, 2024
@RaimoNiskanen
Copy link
Contributor

RaimoNiskanen commented Jul 9, 2024

Fixed in OTP-26.2.5.1 and planned for OTP-27.0.1.

IngelaAndin pushed a commit that referenced this issue Jul 10, 2024
* rickard/port-down-fix/25/GH-8484/OTP-19123:
  [erts] Fix DOWN signal from port
IngelaAndin pushed a commit that referenced this issue Jul 10, 2024
…-19121' into maint-27

* raimo/erts/prim_inet-demonitor-on-exception/GH-8484/OTP-19121:
  Update preloaded
  Demonitor after exception from port_command

# Conflicts:
#	erts/preloaded/ebin/prim_inet.beam
@potatosalad
Copy link
Contributor

Awesome, thank you @RaimoNiskanen !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:PS Assigned to OTP team PS
Projects
None yet
4 participants