RFC: supporting third-party network stack such as TLDK #9266

Open
amysaq2023 opened this issue Aug 11, 2023 · 26 comments
Labels: area:networking/tldk, area: networking, type: enhancement

@amysaq2023
Contributor

amysaq2023 commented Aug 11, 2023

Description

As an application kernel, gVisor provides developers with the opportunity to build a lightweight pod-level kernel and allows for more agile development and deployment than the host kernel. To maximize the advantage of gVisor's flexibility, we propose an enhancement to its network module: a solution to support TLDK for better performance. We would also like to discuss whether there is a more general way to support other third-party network stacks such as Smoltcp, F-Stack, etc.

Our Implementation to support TLDK

Since cloud-native applications are highly sensitive to network performance, we have expanded gVisor to support a high-performance user-level network stack called TLDK. This has resulted in significantly better network I/O performance in certain scenarios.
To support the TLDK network stack, we need to enable CGO in gVisor, as TLDK is currently implemented in C. We then initialize the TLDK stack through a cgo wrapper, based on the network type specified in the container boot config, and set up the TLDK socket ops interface in gVisor. Subsequent network syscalls use gVisor's TLDK socket ops and invoke the TLDK socket operation implementations through the cgo wrapper.
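A minimal sketch of such a cgo wrapper is shown below. The C entry points (tldk_stack_init, tldk_socket) and Go names are illustrative assumptions about a TLDK wrapper library, not gVisor's or TLDK's actual API:

package tldk

/*
#include <stdlib.h>

// Assumed entry points exported by a TLDK wrapper library.
extern int tldk_stack_init(const char *config);
extern int tldk_socket(int domain, int type, int protocol);
*/
import "C"

import (
    "fmt"
    "unsafe"
)

// InitStack brings up TLDK when the boot config selects the TLDK network type.
func InitStack(config string) error {
    cfg := C.CString(config)
    defer C.free(unsafe.Pointer(cfg))
    if rc := C.tldk_stack_init(cfg); rc != 0 {
        return fmt.Errorf("tldk_stack_init failed: %d", rc)
    }
    return nil
}

// Socket asks TLDK for a new socket handle; the remaining socket ops
// (bind, connect, sendmsg, ...) cross the cgo boundary in the same way.
func Socket(domain, typ, proto int) (int, error) {
    fd := C.tldk_socket(C.int(domain), C.int(typ), C.int(proto))
    if fd < 0 {
        return -1, fmt.Errorf("tldk_socket failed: %d", fd)
    }
    return int(fd), nil
}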
One of the key factors for gVisor's significant performance improvement with TLDK is that we support device (SR-IOV) passthrough with TLDK. This not only enhances network I/O performance but also reduces the attack surface on the host kernel. The original gVisor netstack cannot support drivers for device passthrough, but TLDK can work with DPDK as the frontend driver for device passthrough.
Moreover, we have provided a proper thread model and enabled an interrupt mode to avoid busy polling in typical DPDK scenarios. In this mode, the I/O thread wakes up when an event is raised by the host kernel upon receiving a packet from the NIC, and starts to read all available packets in DMA. It then wakes up the corresponding goroutine to receive the packets. This approach ensures efficient use of CPU resources, while avoiding unnecessary busy polling that can negatively impact application performance.
[architecture diagram]
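For illustration, the interrupt-mode I/O loop roughly takes the following shape. This is a sketch with hypothetical names (rxEventFD, readPacket, demux), not the actual implementation:

package tldk

import "golang.org/x/sys/unix"

type packet []byte

type endpoint interface {
    enqueue(packet)  // queue the packet on the socket's receive buffer
    notifyReaders()  // wake goroutines blocked in recvmsg
}

type ioThread struct {
    rxEventFD  int                    // eventfd signalled by the host on NIC RX
    readPacket func() (packet, bool)  // drains one packet from the DMA ring
    demux      func(packet) endpoint  // finds the socket owning the packet
}

func (t *ioThread) run() {
    buf := make([]byte, 8)
    for {
        // Sleep until the host kernel reports a receive event instead of
        // busy-polling the DPDK queues.
        unix.Read(t.rxEventFD, buf)

        // Drain every packet currently available in DMA, then wake the
        // goroutines waiting on the corresponding sockets.
        for {
            pkt, ok := t.readPacket()
            if !ok {
                break
            }
            ep := t.demux(pkt)
            ep.enqueue(pkt)
            ep.notifyReaders()
        }
    }
}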

Performance with TLDK

We compared runc and gVisor with TLDK, and the results show significant performance improvements in network I/O sensitive scenarios:

            
            Redis SET     Redis GET
runc:       335709 RPS    890301 RPS
runsc/TLDK: 617306 RPS    1391876 RPS

Further Discussion

While supporting TLDK, we had to modify the gVisor code to support another network stack's socket ops, which incurred significant development costs. Therefore, in addition to proposing support for TLDK in gVisor, we would like to open a discussion about whether there is a more general way for users to choose a third-party network stack without modifying gVisor.
One possible solution we are considering is exposing the network interface from the API to the ABI and building third-party network stacks as plugins that fit with these ABIs.

We would appreciate any insights or feedback from the community on this proposal and the further discussion matter and are open to exploring other potential solutions. Thanks.

Is this feature related to a specific bug?

No.

Do you have a specific solution in mind?

As described in the 'Description' section.

@amysaq2023 amysaq2023 added the type: enhancement New feature or request label Aug 11, 2023
@kevinGC kevinGC self-assigned this Aug 11, 2023
@kevinGC
Collaborator

kevinGC commented Aug 11, 2023

Very interested in making this happen. Thinking of this as separate sub-issues:

  1. General third-party stack support - I think this is great. The largest issue I see is API stability -- we develop and build gVisor+netstack as one big binary, so there's no defined API for network stacks. The primary concern for me is getting stuck on an API with problems. From experience I can tell you that we've changed that API within gVisor many times -- that flexibility is useful and finding a way to keep it is ideal. I wonder whether we can do API versioning (think Go modules-esque) so that stable APIs exist, but don't hamper development.
  2. CGO in gVisor - gVisor/runsc can't introduce CGO as a dependency for security reasons. This will have to be explicitly turned on by plugin users.
  3. TLDK performance - With those performance numbers I have a ton of questions. Too many for this post, but generally I'm curious whether your stack is portable or specifically tailored to your environment, e.g:
    1. Can you have multiple pods on a node? Normally DPDK steals the entire NIC, but maybe you use SR-IOV to create multiple NICs.
    2. Does SR-IOV tie you to particular hardware NICs? If I understand correctly it's not fully portable, which could create problems if different nodes have different NICs.
    3. If this is running in Kubernetes, what network plugin (CNI) is used to set everything up?
    4. Do non-gVisor pods run in the same environment?

Please let me know what you think. Also happy to discuss your specific setup in email/chat/wherever if that's easier.

@tanjianfeng
Contributor

@kevinGC Among those sub-issues, the core one is CGO.

gVisor/runsc can't introduce CGO as a dependency for security reasons.

  • Does the CGO interface itself introduce a security issue? In other words, if we introduce a Rust-based component (also memory-safe) in the sentry, does that break the security?
  • gVisor itself is a defense-in-depth solution, with the host kernel jailers (seccomp/cgroup/namespace/capabilities/...) as the last line of defense. Can we trade off sentry security for performance? One example at hand (maybe not a perfectly apt one): directfs sacrifices some security by allowing open() in the sandbox process.

This will have to be explicitly turned on by plugin users.

If we understand it correctly, pure Go needs the decision made at compile time. Do we have a conditional compilation mechanism in gVisor's bazel setup?
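Something like the following config_setting/select() sketch is the kind of mechanism we have in mind; the labels and define value are purely illustrative, not gVisor's actual BUILD files:

config_setting(
    name = "tldk_stack",
    define_values = {"netstack": "tldk"},
)

go_binary(
    name = "runsc",
    srcs = ["main.go"],
    # Allow cgo only when building with --define=netstack=tldk.
    pure = select({
        ":tldk_stack": "off",
        "//conditions:default": "on",
    }),
    deps = [
        "//runsc/cli",
        "//runsc/version",
    ] + select({
        ":tldk_stack": ["//third_party/tldk:provider"],
        "//conditions:default": [],
    }),
)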

@amysaq2023
Contributor Author

amysaq2023 commented Aug 14, 2023

@kevinGC Thanks for your quick response; we are happy to discuss these sub-issues further.

To answer sub-issue 3: in short, our stack is portable to other environments, and the detailed reasons are below:

Can you have multiple pods on a node? Normally DPDK steals the entire NIC, but maybe you use SR-IOV to create multiple NICs.

Yes, we can support multiple pods on a node; this is achieved with SR-IOV, which can create multiple virtual NICs.

Does SR-IOV tie you to particular hardware NICs? If I understand correctly it's not fully portable, which could create problems if different nodes have different NICs.

Our current implementation of gVisor with TLDK+DPDK does not place requirements on the NIC. As long as the NIC can be used as a virtio backend device, our TLDK solution can work with it.

If this is running in Kubernetes, what network plugin (CNI) is used to set everything up?

We do not use any CNI to set up the TLDK stack. Instead, we invoke the CGO wrapper to initialize the TLDK stack while gVisor performs StartRoot().

Do non-gVisor pods run in the same environment?

Yes, non-gVisor pods can run alongside gVisor-with-TLDK pods in the same environment.

@kevinGC
Collaborator

kevinGC commented Aug 15, 2023

Does CGO interface introduce security issue? In other words, if we introduce a rust-based component (also memory-safe) in sentry, does that break the security?

We've never discussed the CGO interface on its own, i.e. with something other than C being called into. But my first take is that the runsc binary should always be flagged as no CGO. I think a good solution would be to leave runsc as pure Go, and have this plugin system usable by defining a different go_binary target. That way we keep the high level of security, and users who want to make the tradeoff just need to write their own BUILD target. So ideally you'd have your own target looking something like:

go_binary(
    name = "runsc-tldk",
    srcs = ["main.go"],
    pure = False,
    visibility = [
        "//visibility:public",
    ],
    deps = [
        "@dev_gvisor//runsc/cli",
        "@dev_gvisor//runsc/version",
        "//my/codebase/tldk:runsc_plugin",
    ],
)

This yields a few benefits:

  • gVisor remains CGO-free
  • Plugin network stacks can be developed independently of upstream gVisor
  • By consuming gVisor as a bazel dependency, you would pin to a specific version of gVisor. This may be useful for avoiding breakage when gVisor's API changes (see the WORKSPACE sketch below)
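A minimal WORKSPACE sketch of that pinning, reusing the @dev_gvisor repo name from the target above (the commit is a placeholder):

load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "dev_gvisor",
    remote = "https://github.com/google/gvisor.git",
    # Pin to a known-good commit so the plugin builds against a stable API.
    commit = "<pinned commit>",
)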

@tanjianfeng what do you think? Since you already have a third-party network stack, we want to hear what setup would work for you. If you have specific ideas in mind, we'd love to hear them. Once we have some agreement here, we can get others onboard and actually make the changes.

gVisor itself is a defense-in-depth solution, with the host kernel jailers (seccomp/cgroup/namespace/capabilities/...) as the last line of defense. Can we trade off sentry security for performance?

Yes. Generally such tradeoffs are implemented but off by default. For example, raw sockets are implemented because people need tools like tcpdump, but must be enabled via a flag. Since CGO introduces a security issue just by being present in the binary, we shouldn't compile it in by default.


@amysaq2023 that's super impressive that you're getting the benefits of kernel bypass without many of the traditional issues (e.g. machines being single-app only). A few more questions (if you can answer):

  • Are the nodes in that Redis benchmark VMs or actual machines? My understanding is that the performance boost mostly comes from cutting out the host network stack, but if these are VMs then I'd expect the host machine's stack to slow things down.
  • Did you consider using XDP instead of DPDK? I wonder how performant it would be relative to DPDK, and given that it's probably easier to use.
  • Generally, do you think it's DPDK or TLDK that provides the bulk of the performance improvement? I'd like to do some experimenting of my own, and I'm wondering whether I'm more likely to see performance differences by hooking kernel bypass up to netstack or TLDK up to an AF_PACKET socket.

@amysaq2023
Contributor Author

@kevinGC

what do you think? Since you already have a third-party network stack, we want to hear what setup would work for you.

Thank you for your insightful suggestion on how to support TLDK while maintaining the high level of security in gVisor. We have an additional proposal to consider:
First, we propose abstracting a set of APIs for gVisor's network stack. This way, third-party network stacks will only need to implement these APIs in order to be compatible with gVisor.
Next, we will compile the third-party network stack with gVisor APIs implemented as an object file. This approach ensures seamless integration between gVisor and the third-party network stack.
Most importantly, gVisor needs to support a method to invoke these APIs within the network stack binary. Currently, we are considering options such as using go plugins or implementing something similar.
We feel that this solution would more thoroughly decouple the development of third-party network stacks from gVisor. Additionally, supporting binary plugins may benefit other modules, like the filesystem, enabling support for third-party implementations in the future.
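To make the proposal concrete, a rough sketch of the kind of API surface we have in mind is below; all names are hypothetical and only illustrate the shape of the abstraction, not a final interface:

package plugin

// Stack is what a third-party network stack would implement and register.
type Stack interface {
    // Init brings the stack up with the sandbox's network configuration
    // (interfaces, addresses, routes) during boot.
    Init(args InitArgs) error
    // Socket creates a stack-owned socket and returns an opaque handle that
    // the sentry's socket ops layer passes back on every later call
    // (Bind, Listen, Connect, Accept, SendMsg, RecvMsg, Shutdown, Close, ...).
    Socket(domain, stype, protocol int) (uintptr, error)
}

// InitArgs carries whatever the stack needs at boot time.
type InitArgs struct {
    IfName string
    Config string
}

var registered Stack

// Register is called from an init() in the plugin package, so that merely
// importing the package wires the stack in instead of netstack.
func Register(s Stack) { registered = s }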

Are the nodes in that Redis benchmark VMs or actual machines? My understanding is that the performance boost mostly comes from cutting out the host network stack, but if these are VMs then I'd expect the host machine's stack to slow things down.

The nodes in the Redis benchmark are actual physical machines.

Did you consider using XDP instead of DPDK? I wonder how performant it would be relative to DPDK, and given that it's probably easier to use.
Generally, do you think it's DPDK or TLDK that provide the bulk of the performance improvement? I'd like to do some experimenting of my own, and am wondering whether I'm more likely to see performance differences by hooking kernel bypass up to netstack or TLDK up to an AF_PACKET socket.

DPDK not only functions as a driver, but also offers various performance enhancements. For instance, it utilizes rte_ring for efficient communication with hardware and introduces its own memory management mechanisms with mbuf and mempool. Moreover, DPDK operates entirely at the user-level, completely detached from the host kernel, unlike XDP which still relies on hooking into the host kernel. Therefore, the performance enhancement achieved with TLDK+DPDK goes beyond just kernel bypass, benefiting from the improvements introduced by both TLDK and DPDK.

@avagin avagin added the area: networking Issue related to networking label Aug 23, 2023
@kevinGC
Collaborator

kevinGC commented Aug 23, 2023

First, we propose abstracting a set of APIs for gVisor's network stack. This way, third-party network stacks will only need to implement these APIs in order to be compatible with gVisor.

Agreed! Maybe you could send a PR with the interface you use now to work with TLDK -- that would be a really good starting point. Much better than trying to come up with an arbitrary API, given that you've got this running already.

Next, we will compile the third-party network stack with gVisor APIs implemented as an object file. This approach ensures seamless integration between gVisor and the third-party network stack.

Right, if I understand correctly the build process for cgo requires building the object file first, then writing a Go layer around it that can call into it using the tools provided by import "C".

Most importantly, gVisor needs to support a method to invoke these APIs within the network stack binary. Currently, we are considering options such as using go plugins or implementing something similar.

Can you help me understand why we couldn't just build a static binary containing gVisor and the third party network stack? As part of the API we talked about above, gVisor can support registering third party netstacks. So the third party stack would contain an implementation of the API (socket ops like in your diagram), the cgo wrapper, the third party stack itself, and an init function that registers the stack to be used instead of netstack:

import "pkg/sentry/socket"

func init() {
  socket.RegisterThirdPartyProvider(linux.AF_INET, &tldkProvider)
  // etc..
}

This keeps everything building statically and avoids issues introduced by go plugins as far as I can tell, but maybe I'm missing something.

@kevinGC
Collaborator

kevinGC commented Aug 23, 2023

Something I should've been more clear about regarding the static binary idea: I'm suggesting that the existing, cgo-free runsc target remain as-is, and that we support third party network stacks by having multiple BUILD targets. So the existing target will look mostly (or entirely) the same as it is today:

go_binary(
    name = "runsc",
    srcs = ["main.go"],
    pure = True,
    tags = ["staging"],
    visibility = [
        "//visibility:public",
    ],
    x_defs = {"gvisor.dev/gvisor/runsc/version.version": "{STABLE_VERSION}"},
    deps = [
        "//runsc/cli",
        "//runsc/version",
    ],
)

And building runsc with a third party network stack requires adding another target (which could be in the same BUILD file, a different one, or even a separate bazel project):

go_binary(
    name = "runsc_tldk",
    srcs = ["main_tldk.go"],
    pure = False,
    tags = ["staging"],
    visibility = [
        "//visibility:public",
    ],
    x_defs = {"gvisor.dev/gvisor/runsc/version.version": "{STABLE_VERSION}"},
    deps = [
        "//runsc/cli",
        "//runsc/version",
        "//othernetstacks/tldk:tldk_provider",
    ],
)

Both go_binary targets are static, avoid go plugins and its headaches, and the default runsc binary remains cgo-free.

@amysaq2023
Contributor Author

@kevinGC
Great! We are fully on board with the idea of introducing an additional target to support third-party networking stacks. To kick things off, we will begin by preparing a PR that covers the gVisor APIs for networking modules, along with our implementation of these APIs in TLDK for seamless integration with gVisor. We sincerely appreciate all the valuable insights shared throughout this discussion thread.

@kevinGC
Collaborator

kevinGC commented Sep 27, 2023

Just want to check on this and see if there's anything I can do to help it along.

@amysaq2023
Contributor Author

Hi Kevin, thanks for checking in. We have finished porting our TLDKv2 support modifications to the current gVisor master branch and are currently refactoring some of the implementation to make it more general. I think we are on the right track; it just needs a little more time due to the amount of code. If everything goes well, we will send out the patch next week.

@kevinGC
Collaborator

kevinGC commented Oct 16, 2023

Hey, back to see whether there's anything I can do to help here. We're really excited to try this out, benchmark, and see the effects on gVisor networking.

amysaq2023 added a commit to amysaq2023/gvisor that referenced this issue Oct 17, 2023
This commit adds network stack and socket interfaces for
supporting external network stack.

- pkg/sentry/stack:
  Interfaces for initializing external network stack. It will be used
  in network setting up during sandbox creating.

- pkg/sentry/socket/externalstack:
  Glue layer for external stack's socket and stack ops with sentry. It
  will also register external stack operations if imported.

- pkg/sentry/socket/externalstack/cgo:
  Interfaces defined in C for external network stack to support.

To build target runsc-external-stack, which imports
pkg/sentry/socket/externalstack package and enables CGO:

bazel build runsc:runsc-external-stack

By using runsc-external-stack binary and setting network type as
external stack, user can use third-party network stack instead of
netstack embedded in gVisor.

This commit only sets up the interfaces template, the specific
implementation for external stack operations will be provided in follow
up commits.

Updates google#9266

Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
@amysaq2023
Contributor Author

amysaq2023 commented Oct 17, 2023

@kevinGC
Thanks for reaching out! Sorry for the delay caused by the National Day holiday. We have just created a PR (#9551) that introduces interface templates to support an external network stack. Specific implementations of these stack and socket operations will be provided in subsequent commits.
We would greatly appreciate any suggestions regarding the current interface setup. For now, we are actively working on decoupling the TLDK-specific stack support from the sentry and making it more adaptable to general third-party stacks.

@kevinGC
Collaborator

kevinGC commented Oct 20, 2023

Thanks a TON. Just responded over there, but want to ask about testing here.

We'll want to test third party netstacks. I'm thinking that what you're contributing will only be testable if we have a similar environment (DPDK and such). Is that correct?

@amysaq2023
Contributor Author

Thanks a TON. Just responded over there, but want to ask about testing here.

We'll want to test third party netstacks. I'm thinking that what you're contributing will only be testable if we have a similar environment (DPDK and such). Is that correct?

Hi Kevin, happy to hear that you are exploring third-party netstack testing too. In the version we are currently working on, once all the necessary glue layers are implemented, the TLDK repository is compiled into the runsc binary. (It will become clearer when we share the socket ops glue layer for the plugin netstack in the next commit.) With this binary, you can easily test it by using 'docker run' to start a container, just as the original runsc with the native netstack does.

amysaq2023 added a commit to amysaq2023/gvisor that referenced this issue Dec 14, 2023
This commit adds network stack and socket interfaces for
supporting external network stack.

- pkg/sentry/socket/externalstack:
  Interfaces for initializing external network stack. It will be used
  in network setting up during sandbox creating.

- pkg/sentry/socket/externalstack/wrapper:
  Glue layer for external stack's socket and stack ops with sentry. It
  will also register external stack operations if imported.

- pkg/sentry/socket/externalstack/cgo:
  Interfaces defined in C for external network stack to support.

To build target runsc-external-stack, which imports
pkg/sentry/socket/externalstack package and enables CGO:

bazel build runsc:runsc-external-stack

By using runsc-external-stack binary and setting network type as
external stack, user can use third-party network stack instead of
netstack embedded in gVisor.

This commit only sets up the interfaces template, the specific
implementation for external stack operations will be provided in follow
up commits.

Updates google#9266

Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
@amysaq2023
Contributor Author

amysaq2023 commented Dec 14, 2023

Hi @kevinGC , we have recently pushed our implementation of supporting plugin network stack into gVisor. You can now compile the runsc binary with support for the plugin stack by executing the following command: bazel build runsc:runsc-plugin-stack. This build process will seamlessly incorporate our sample third-party network stack, TLDK.
To activate the plugin stack, simply adjust the runtimeArgs to include --network="plugin". This enables users to switch to the plugin stack for their networking needs.
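For example, a minimal Docker setup could look like the following; the binary path and runtime name here are assumptions for illustration, not part of the PR. In /etc/docker/daemon.json:

{
    "runtimes": {
        "runsc-plugin": {
            "path": "/usr/local/bin/runsc-plugin-stack",
            "runtimeArgs": ["--network=plugin"]
        }
    }
}

Then restart dockerd and start a container with that runtime, e.g. docker run --rm --runtime=runsc-plugin redis.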

We have conducted performance testing of gVisor when utilizing the plugin stack. We chose Redis as our benchmark and tested network performance under various conditions: 1. within runc; 2. within runsc with netstack on the KVM platform; 3. within runsc with the plugin stack on the KVM platform. The results are quite promising: the performance of runsc with the plugin stack closely rivals that of runc, delivering roughly double the RPS of runsc with netstack. We have documented the detailed performance metrics in the commit log for your review. The current performance test was conducted with the software-implemented virtio-net backend, which is less optimized; performance can be further improved by using VF (SR-IOV) passthrough.

Thanks for your continued support and patience throughout this development process. Your feedback on our design and implementation is greatly welcomed and appreciated.

@amysaq2023
Contributor Author

Besides, we have encountered a specific issue after integrating cgo to support the plugin stack, which we'd like to bring to the table for discussion.
The problem is that the mmap trap mechanism used on the KVM platform leads to a container panic once cgo is introduced. The root cause traces back to the _cgo_sys_thread_start function in the Go runtime: within this function, all signals are blocked. The process then advances to _cgo_try_pthread_create, where an mmap call is made, and that call is trapped by the KVM platform's seccomp rules for mmap.

When the host kernel processes the trapped syscall, it checks whether the SIGSYS signal is blocked. If it finds SIGSYS blocked, it resets the signal handler to its default address, thereby overwriting the handler we established during KVM initialization.
Consequently, when the host kernel attempts to handle SIGSYS, it encounters a 0x0 signal handler, leading to the default action for SIGSYS—coredump—which results in container panic.

bpftrace output at force_sig_info_to_task confirmed this behavior (screenshots omitted).

As a temporary workaround, we have reverted the KVM mmap trap mechanism. However, this solution is not intended for merging. We are actively seeking a more appropriate fix for this issue and would highly appreciate any suggestions, ideas, or discussions on how to resolve this problem.

@amysaq2023
Contributor Author

@kevinGC Happy New Year :)
Just reaching out to check whether there are any comments on #9551?

copybara-service bot pushed a commit that referenced this issue Sep 23, 2024
This commit supports a third-party network stack as a plugin stack for
gVisor.

The overall plugin package structure is the following:

- pkg/sentry/socket/plugin:
  Interfaces for initializing plugin network stack. It will be used
  in network setting up during sandbox creating.

- pkg/sentry/socket/plugin/stack:
  Glue layer for plugin stack's socket and stack ops with sentry. It
  will also register plugin stack operations if imported.

- pkg/sentry/socket/plugin/cgo:
  Interfaces defined in C for plugin network stack to support.

To build target runsc-plugin-stack, which imports
pkg/sentry/socket/plugin/stack package and enables CGO:

bazel build --config=cgo-enable runsc:runsc-plugin-stack

By using runsc-plugin-stack binary and setting "--network=plugin" in
runtimeArgs, user can use third-party network stack instead of
netstack embedded in gVisor to get better network performance.

Redis benchmark with following setups:
1. KVM platform
2. 4 physical cores for target pod
3. target pod as redis server

Runc:
$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 115207.38 requests per second, p50=0.215 msec
GET: 92336.11 requests per second, p50=0.279 msec

$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 113895.21 requests per second, p50=0.247 msec
GET: 96899.23 requests per second, p50=0.271 msec

$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 126582.27 requests per second, p50=0.199 msec
GET: 95969.28 requests per second, p50=0.271 msec

Runsc with plugin stack:
$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 123915.74 requests per second, p50=0.343 msec
GET: 115473.45 requests per second, p50=0.335 msec

$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 120918.98 requests per second, p50=0.351 msec
GET: 117647.05 requests per second, p50=0.351 msec

$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 119904.08 requests per second, p50=0.367 msec
GET: 112739.57 requests per second, p50=0.375 msec

Runsc with netstack:
$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 59952.04 requests per second, p50=0.759 msec
GET: 61162.08 requests per second, p50=0.631 msec

$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 52219.32 requests per second, p50=0.719 msec
GET: 58719.91 requests per second, p50=0.663 msec

$redis-benchmark -h [target ip] -n 100000 -t get,set -q
SET: 59952.04 requests per second, p50=0.751 msec
GET: 60827.25 requests per second, p50=0.751 msec

Updates #9266

Co-developed-by: Tianyu Zhou <wentong.zty@antgroup.com>
Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
FUTURE_COPYBARA_INTEGRATE_REVIEW=#9551 from amysaq2023:support-external-stack 56f2530
PiperOrigin-RevId: 677140616
copybara-service bot pushed a commit that referenced this issue Sep 24, 2024
Its value will be known only on the configuration phase,
before that it can be a select directive.

Updates #9266

PiperOrigin-RevId: 678288252
copybara-service bot pushed a commit that referenced this issue Sep 24, 2024
Its value will be known only on the configuration phase,
before that it can be a select directive.

Updates #9266

PiperOrigin-RevId: 678412518
@avagin
Collaborator

avagin commented Sep 25, 2024

#10954 starts running a minimal set of tests on Buildkite. We need to add more tests; ideally, we would run the image tests and network-specific tests.

alipay/tldk#4 needs to be merged; otherwise, tldk fails to build in the gVisor docker build container.

copybara-service bot pushed a commit that referenced this issue Sep 25, 2024
Updates #9266

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10954 from google:test/avagin/network_plugins 0cb094b
PiperOrigin-RevId: 678845857