Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace unsafe.Slice with memory copying to avoid potential fault memory issue #6664

Merged
merged 1 commit into from
Sep 14, 2024

Conversation

XinShuYang
Copy link
Contributor

@XinShuYang XinShuYang commented Sep 12, 2024

  • Refactored ListIPForwardRows to deep copy IP forwarding table rows.
  • Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

@XinShuYang XinShuYang marked this pull request as draft September 12, 2024 04:02
@XinShuYang XinShuYang force-pushed the fixwinsyscall branch 3 times, most recently from 602534b to eab0b6c Compare September 12, 2024 04:09
@XinShuYang XinShuYang marked this pull request as ready for review September 12, 2024 04:09
@XinShuYang
Copy link
Contributor Author

/test-windows-all

@XinShuYang
Copy link
Contributor Author

/test-windows-containerd-e2e

cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break down the PR into multiple ones as it is doing essentially 3 unrelated changes. You can then backport specifically what's needed using the normal backporting process, instead of opening an independent / "duplicate" PR (#6665).

cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
Comment on lines -371 to +380
return unsafe.Slice(&table.Table[0], table.NumEntries), nil
rows := make([]MibIPForwardRow, table.NumEntries, table.NumEntries)

pFirstRow := uintptr(unsafe.Pointer(&table.Table[0]))
rowSize := unsafe.Sizeof(table.Table[0])

for i := uint32(0); i < table.NumEntries; i++ {
row := *(*MibIPForwardRow)(unsafe.Pointer(pFirstRow + rowSize*uintptr(i)))
rows[i] = row
}
return rows, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this fix a specific issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We received a report indicating that when host memory is less than 8GB, using unsafe.Slice can lead to memory faults. It’s possible that this function didn’t perform a deep copy, so after calling freeMibTable, we may encounter invalid memory addresses when access rows again. We have already verified that this deep copy method resolves this issue in the same environment.

@XinShuYang
Copy link
Contributor Author

Please break down the PR into multiple ones as it is doing essentially 3 unrelated changes. You can then backport specifically what's needed using the normal backporting process, instead of opening an independent / "duplicate" PR (#6665).

@antoninbas Can I merge the agent crash fix and windows CI fix commits into one PR? Otherwise, the Windows CI pipeline will still fail due to rate limit issue.

@antoninbas
Copy link
Contributor

Please break down the PR into multiple ones as it is doing essentially 3 unrelated changes. You can then backport specifically what's needed using the normal backporting process, instead of opening an independent / "duplicate" PR (#6665).

@antoninbas Can I merge the agent crash fix and windows CI fix commits into one PR? Otherwise, the Windows CI pipeline will still fail due to rate limit issue.

Yes, that's ok given that SNATFullyRandomPorts was introduced after the last minor release and the change doesn't need to be backported.

@@ -368,7 +399,16 @@ func (n *netIO) ListIPForwardRows(family uint16) ([]MibIPForwardRow, error) {
if err != nil {
return nil, os.NewSyscallError("iphlpapi.GetIpForwardTable", err)
}
return unsafe.Slice(&table.Table[0], table.NumEntries), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know the only issue with the previous code is that we were not making a copy of the slice prior to returning, while at the same time freeing the underlying memory by calling n.freeMibTable(unsafe.Pointer(table)).

The Wireguard Windows code has a simpler implementation of this, which still uses unsafe.Slice. It achieves correctness by making a proper copy of the slice: https://github.com/WireGuard/wireguard-windows/blob/e70799b1440690e7d4140bffc7c73baf903c7b54/tunnel/winipcfg/winipcfg.go#L151

That being said, if this new version has already been tested thoroughly, there is probably no string reason to change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas , we used to think about the solution as what Wiregurad is taking, a concern is some fields in MibIPForwardRow are using type []byte, which still exists the risk with a shadow copy if the memory resource is limited (e.g., Windows OS re-allocate the memory to other objects after the MibIPForwardTable is free). So we finally choose the solution to deep copy the objects before free the memory. The latest code is verified on the problematic cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean [N]byte types like the [26]byte field in RawSockAddrInet ([]byte would not be possible here AFAIK)? These should be no different than integer types, they should be inlined in the struct with no indirection.

The code before this change was wrong for obvious reasons. I don't think the Wireguard code is wrong. But if you feel like manual copy is the best approach and this has been well-tested, as I wrote before we can merge it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fields in MibIPForwardRow is array not slice. It will be deepcopied automatically.
And the current code is inconsistent: it makes a manual deepcopy for RawSockAddrInet but not AddressPrefix which also has a RawSockAddrInet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wenyingd
wenyingd previously approved these changes Sep 14, 2024
Copy link
Contributor

@wenyingd wenyingd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@XinShuYang
Copy link
Contributor Author

/test-windows-all

@luolanzone
Copy link
Contributor

Hi @antoninbas @tnqn could you help to double check if we can more forward? this issue is a blocker for downstream customers. Thanks.

@XinShuYang
Copy link
Contributor Author

/test-windows-containerd-e2e

antoninbas
antoninbas previously approved these changes Sep 14, 2024
@@ -368,7 +399,16 @@ func (n *netIO) ListIPForwardRows(family uint16) ([]MibIPForwardRow, error) {
if err != nil {
return nil, os.NewSyscallError("iphlpapi.GetIpForwardTable", err)
}
return unsafe.Slice(&table.Table[0], table.NumEntries), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean [N]byte types like the [26]byte field in RawSockAddrInet ([]byte would not be possible here AFAIK)? These should be no different than integer types, they should be inlined in the struct with no indirection.

The code before this change was wrong for obvious reasons. I don't think the Wireguard code is wrong. But if you feel like manual copy is the best approach and this has been well-tested, as I wrote before we can merge it as is.

Comment on lines 137 to 139
sockData := [26]byte{}
copy(a.data[:], sockData[:])
return RawSockAddrInet{Family: a.Family, data: sockData}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return *a would make a copy given that data is an array, not a slice
am I missing something @tnqn ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. The whole function is what struct copy does exactly.

…ory issue

* Refactored ListIPForwardRows to copy IP forwarding table rows.
* Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
Signed-off-by: Wenying Dong <wenyingd@vmware.com>
@XinShuYang
Copy link
Contributor Author

XinShuYang commented Sep 14, 2024

Thanks for all comments @antoninbas @tnqn , I have updated the code based on the current Windows network management API implementation(https://github.com/tailscale/winipcfg-go/blob/84d0a1a3b210638dfe24d6169581a7af315c13f0/wt_mib_ipforward_row2.go#L51C1-L78C2).

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Sep 14, 2024

Please let me know once the verification is done

@XinShuYang
Copy link
Contributor Author

/test-windows-all

@tnqn
Copy link
Member

tnqn commented Sep 14, 2024

Thanks for all comments @antoninbas @tnqn , I have updated the code based on the current Windows network management API implementation(https://github.com/tailscale/winipcfg-go/blob/84d0a1a3b210638dfe24d6169581a7af315c13f0/wt_mib_ipforward_row2.go#L51C1-L78C2).

Both are correct, The code in wireguard windows is more concise.

@XinShuYang
Copy link
Contributor Author

Please let me know once the verification is done

@tnqn Windows CI passed.

@tnqn
Copy link
Member

tnqn commented Sep 14, 2024

/skip-all

1 similar comment
@tnqn
Copy link
Member

tnqn commented Sep 14, 2024

/skip-all

@tnqn tnqn merged commit b96d22f into antrea-io:main Sep 14, 2024
54 of 60 checks passed
XinShuYang added a commit to XinShuYang/antrea that referenced this pull request Sep 14, 2024
…ory issue (antrea-io#6664)

* Refactored ListIPForwardRows to copy IP forwarding table rows.
* Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
Signed-off-by: Wenying Dong <wenyingd@vmware.com>
XinShuYang added a commit to XinShuYang/antrea that referenced this pull request Sep 14, 2024
…ory issue (antrea-io#6664)

* Refactored ListIPForwardRows to copy IP forwarding table rows.
* Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
Signed-off-by: Wenying Dong <wenyingd@vmware.com>
XinShuYang added a commit to XinShuYang/antrea that referenced this pull request Sep 14, 2024
…ory issue (antrea-io#6664)

* Refactored ListIPForwardRows to copy IP forwarding table rows.
* Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
Signed-off-by: Wenying Dong <wenyingd@vmware.com>
tnqn pushed a commit that referenced this pull request Oct 18, 2024
…ory issue (#6664)

* Refactored ListIPForwardRows to copy IP forwarding table rows.
* Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
Signed-off-by: Wenying Dong <wenyingd@vmware.com>
tnqn pushed a commit that referenced this pull request Oct 18, 2024
…ory issue (#6664)

* Refactored ListIPForwardRows to copy IP forwarding table rows.
* Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
Signed-off-by: Wenying Dong <wenyingd@vmware.com>
tnqn pushed a commit that referenced this pull request Oct 18, 2024
…ory issue (#6664)

* Refactored ListIPForwardRows to copy IP forwarding table rows.
* Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
Signed-off-by: Wenying Dong <wenyingd@vmware.com>
hangyan pushed a commit to hangyan/antrea that referenced this pull request Oct 29, 2024
…ory issue (antrea-io#6664)

* Refactored ListIPForwardRows to copy IP forwarding table rows.
* Removed unsafe.Slice and replaced with manual pointer dereferencing and copying.

This change addresses a potential fault memory issue when iterating through the IP forwarding table,
caused by the use of slices after corresponding memory has been freed, leading to access failure.

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
Signed-off-by: Wenying Dong <wenyingd@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants