Least load balancer #2999

Merged on Feb 18, 2024 (23 commits)

Conversation

yuhan6665
Member

@yuhan6665 yuhan6665 commented Feb 6, 2024

@qwerr0

qwerr0 commented Feb 7, 2024

Not sure whether this is a bug, but the roundrobin strategy seems to raise an error:
infra/conf: failed to parse to strategy config. > infra/conf: unknown config id: roundrobin

@yuhan6665
Member Author

Not sure whether this is a bug, but the roundrobin strategy seems to raise an error: infra/conf: failed to parse to strategy config. > infra/conf: unknown config id: roundrobin

Thanks for testing. I'll fix it and ping you again afterwards :)

@yuhan6665
Member Author

@qwerr0 Could you please test again?
If you are able to, please also test leastLoad. The configuration is roughly as in v2fly/v2ray-core#589

@qwerr0

qwerr0 commented Feb 8, 2024

{
  "tag": "auto-fallback",
  "selector": ["eq-grpc-", "sr-grpc-", "cf-grpc-"],
  "fallbackTag": "reject",
  "strategy": {
    "type": "leastLoad",
    "settings": {
      "healthCheck": {
        "interval": 30,
        "sampling": 5,
        "destination": "http://www.google.com/gen_204",
        "connectivity": "http://connectivitycheck.platform.hicloud.com/generate_204",
        "timeout": 5
      },
      "costs": [
        {
          "match": "eq-grpc-",
          "value": 4
        },
        {
          "match": "sr-grpc-",
          "value": 8
        },
        {
          "match": "cf-grpc-",
          "value": 16
        }
      ]
    }
  }
}

leastPing also hits a nil pointer error.
Writing it this way gives a nil pointer error too:

{
  "tag": "auto-fallback",
  "selector": ["eq-grpc"],
  "fallbackTag": "reject",
  "strategy": {
    "type": "leastLoad"
  }
}
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: panic: runtime error: invalid memory address or nil pointer dereference
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x6ab990]
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]:
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: goroutine 73 [running]:
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*LeastLoadStrategy).getNodes(0x40010be000, {0x40001163c0, 0xc, 0x6a3368?}, 0x0)
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/strategy_leastload.go:143 +0xa0
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*LeastLoadStrategy).pickOutbounds(0x40010be000, {0x40001163c0?, 0x0?, 0x6a4010?})
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/strategy_leastload.go:75 +0x2c
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*LeastLoadStrategy).PickOutbound(0x400049a460?, {0x40001163c0?, 0x3?, 0x6?})
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/strategy_leastload.go:65 +0x20
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*Balancer).PickOutbound(0x400049a460)
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/balancing.go:60 +0x20c
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*Rule).GetTag(...)
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/config.go:20
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*Router).PickRoute(0x102f038?, {0x1037d70?, 0x400000c918?})
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/router.go:76 +0x54
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).routedDispatch(0x40006ecfc0, {0x102f038, 0x4001596e40}, 0x40010b8940, {{0x102f1c0, 0x400090a308}, 0x1466, 0x2})
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/dispatcher/default.go:403 +0x254
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).Dispatch.func1()
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/dispatcher/default.go:266 +0x538
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: created by github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).Dispatch in goroutine 72
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/dispatcher/default.go:239 +0x38c
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: panic: runtime error: invalid memory address or nil pointer dereference
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x6ab990]
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]:
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: goroutine 87 [running]:
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*LeastLoadStrategy).getNodes(0x40010be000, {0x400038a000, 0xc, 0x6a3368?}, 0x0)
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/strategy_leastload.go:143 +0xa0
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*LeastLoadStrategy).pickOutbounds(0x40010be000, {0x400038a000?, 0x0?, 0x6a4010?})
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/strategy_leastload.go:75 +0x2c
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*LeastLoadStrategy).PickOutbound(0x400049a460?, {0x400038a000?, 0x3?, 0x6?})
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/strategy_leastload.go:65 +0x20
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*Balancer).PickOutbound(0x400049a460)
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/balancing.go:60 +0x20c
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*Rule).GetTag(...)
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/config.go:20
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/router.(*Router).PickRoute(0x102f038?, {0x1037d70?, 0x4000120630?})
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/router/router.go:76 +0x54
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).routedDispatch(0x40006ecfc0, {0x102f038, 0x40010bfe00}, 0x40015882e0, {{0x102f1c0, 0x40006b11bc}, 0x1466, 0x2})
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/dispatcher/default.go:403 +0x254
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).Dispatch.func1()
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/dispatcher/default.go:266 +0x538
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: created by github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).Dispatch in goroutine 86
Thu Feb  8 14:17:33 2024 daemon.err xray[21889]: 	github.com/xtls/xray-core/app/dispatcher/default.go:239 +0x38c

Config-wise, leastLoad with "maxRTT": 1000 also reports an error, while leastPing does not:
infra/conf: failed to parse to strategy config. > invalid duration: 1000

  {
    "tag": "auto-fallback",
    "selector": ["eq-grpc"],
    "fallbackTag": "reject",
    "strategy": {
      "type": "leastLoad",
      "settings": {
        "maxRTT": 1000
      }
    }
  }

@yuhan6665
Member Author

yuhan6665 commented Feb 9, 2024

@qwerr0 I looked into it: the config format has changed. Instead of writing "healthCheck", add the following at the top level, at the same level as "routing".
leastPing needs:

  "observatory": {
    "subjectSelector":[
      "tag1",
      "tag2"
    ],
    "probeURL": "http://www.google.com/gen_204",
    "probeInterval": "1h",
    "enableConcurrency": true
  }

leastLoad needs:

  "burstObservatory": {
    "subjectSelector":[
      "tag1",
      "tag2"
    ],
    "pingConfig": {
      "destination": "http://www.google.com/gen_204",
      "interval": "1h",
      "connectivity": "http://connectivitycheck.platform.hicloud.com/generate_204",
      "timeout": "30s",
      "sampling": 2
    }
  }
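
A rough way to check the placement described above. This sketch assumes the balancer objects sit under routing.balancers and only looks at whether the matching top-level section exists. The helper name and the warning text are made up for illustration and are not part of Xray.

```python
from __future__ import annotations

# Hypothetical lint helper, not part of Xray: maps each strategy type to the
# top-level section the maintainer says it needs.
REQUIRED_SECTION = {
    "leastPing": "observatory",
    "leastLoad": "burstObservatory",
}

def missing_observatories(config: dict) -> list[str]:
    """Return one warning per balancer whose required top-level section is absent."""
    warnings = []
    # Assumption: balancers are listed under routing.balancers.
    for balancer in config.get("routing", {}).get("balancers", []):
        strategy = balancer.get("strategy", {}).get("type")
        section = REQUIRED_SECTION.get(strategy)
        if section and section not in config:
            warnings.append(
                f'balancer "{balancer.get("tag")}" uses {strategy} '
                f'but top-level "{section}" is missing'
            )
    return warnings

config = {
    "routing": {
        "balancers": [
            {"tag": "auto-fallback", "selector": ["eq-grpc"],
             "strategy": {"type": "leastLoad"}},
        ]
    },
}
print(missing_observatories(config))
```

Running it on the minimal leastLoad balancer from earlier in the thread reports the missing "burstObservatory" section; adding that section at the top level clears the warning.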

@yuhan6665
Member Author

infra/conf: failed to parse to strategy config. > invalid duration: 1000
This error means the setting is in the right place; a duration has to be written as a string with a unit, for example "1000s"
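
To illustrate the rule, here is a small sketch of a duration check that accepts "number plus unit" strings and rejects bare numbers. It is not Xray's parser: the function name, the unit table, and the restriction to a single number-unit pair are assumptions made to keep the example short.

```python
import re

# Seconds per unit. Only single "<number><unit>" values are handled here
# (an assumption of this sketch; compound values like "1h30m" are rejected).
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")

def parse_duration(value) -> float:
    """Return the duration in seconds, or raise ValueError for anything without a unit."""
    if not isinstance(value, str):
        raise ValueError(f"invalid duration: {value!r}")
    match = _DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]

# Bare numbers fail; strings with a unit pass.
for candidate in (1000, "1000", "1000s", "1h", "30s"):
    try:
        print(repr(candidate), "->", parse_duration(candidate), "seconds")
    except ValueError as err:
        print(repr(candidate), "->", err)
```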

@APT-ZERO

APT-ZERO commented Feb 12, 2024

Hi,
I have some questions.
If I want the balancer to test each outbound multiple times (3 times, for example) and take the average as the result, what should I do? Should I just set 'sampling' to 3, or is 'sampling' something else?

And why are 'destination' and 'connectivity' both there? Is 'connectivity' a fallback for 'destination'?

Can I use the results of a leastLoad balancer inside a roundRobin balancer, to round-robin connections across the configs already tested by leastLoad?

Is tolerance the same as tolerance in Clash.Meta, but only as a rate/percentage?
https://wiki.metacubex.one/config/proxy-groups/url-test/#tolerance

+ an idea
Look at 'expected-status' in Clash.Meta:
https://wiki.metacubex.one/config/proxy-groups/#_1
With 'expected-status', the load balancer would load whatever URL we set in 'destination' and check whether the HTTP status matches.
You could also add another option like 'expected-keyword'; the balancer would then load the 'destination' and look for the keyword we set.
That would help us select only the outbounds that can load Google or Netflix successfully.

@yuhan6665
Member Author

@APT-ZERO This is different from what Clash.Meta has. See v2fly/v2ray-core#589 for details. We will be open to further improvements once this is merged, but please do some testing with it before that.

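@qwerr0

qwerr0 commented Feb 13, 2024

It's the Lunar New Year and I was busy visiting relatives, so I didn't get to test over the past few days. With the observatory defined, everything has been working without problems: leastPing works, and leastLoad also tested fine.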

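@qwerr0

qwerr0 commented Feb 13, 2024

Shouldn't probeURL be probeUrl? Also, it seems a bare domain name is not accepted; it has to be a full URL:
"probeUrl": "http://www.google.com/gen_204",

I think observatory could gain an option similar to Clash Meta's: automatically start a health check after an outbound fails (for example, on a connection reset or a connection timeout). After I took one of my outbounds down, Xray waited at least one interval before it started a health check, and only then switched to a working outbound.

Another point is that load balancing becomes hard to set up. I have several groups of gRPC outbounds; within each group the address and port are identical, the goal being to open multiple TCP connections to ease TCP head-of-line blocking. selector can only pick individual outbounds; it would be great if it could also pick balancers other than itself.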

@yuhan6665 yuhan6665 merged commit fa5d7a2 into XTLS:main Feb 18, 2024
34 checks passed
@yuhan6665
Member Author

Thanks, everyone, for testing and for the valuable feedback. We can keep improving this going forward.

ttc0419 added a commit to ttc0419/Xray-core that referenced this pull request Feb 20, 2024
ttc0419 added a commit to ttc0419/Xray-core that referenced this pull request Feb 26, 2024
ttc0419 added a commit to ttc0419/Xray-core that referenced this pull request Feb 26, 2024
@BI7PRK

BI7PRK commented Feb 29, 2024

@qwerr0 I looked into it: the config format has changed. Instead of writing "healthCheck", add the following at the top level, at the same level as "routing". leastPing needs:

  "observatory": {
    "subjectSelector":[
      "tag1",
      "tag2"
    ],
    "probeURL": "http://www.google.com/gen_204",
    "probeInterval": "1h",
    "enableConcurrency": true
  }

leastLoad needs:

  "burstObservatory": {
    "subjectSelector":[
      "tag1",
      "tag2"
    ],
    "pingConfig": {
      "destination": "http://www.google.com/gen_204",
      "interval": "1h",
      "connectivity": "http://connectivitycheck.platform.hicloud.com/generate_204",
      "timeout": "30s",
      "sampling": 2
    }
  }

May I ask what the difference in meaning between destination and connectivity is?

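@yuhan6665
Member Author

@BI7PRK destination is the URL used to measure speed through the proxy. connectivity is requested locally, without the proxy, to confirm that the network itself is reachable. If the local network is down, nodes are not penalized.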
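
The distinction in this reply can be sketched as a small decision function. This is only an illustration of the described behaviour, not the observatory's code; the function name and the three labels are hypothetical.

```python
def score_probe(proxy_probe_ok: bool, local_network_ok: bool) -> str:
    """Classify one health-check round for a node.

    proxy_probe_ok: the request to `destination` through the proxy succeeded.
    local_network_ok: the direct (non-proxied) request to `connectivity` succeeded.
    """
    if proxy_probe_ok:
        return "ok"
    if not local_network_ok:
        # The local network is down, so the failure says nothing about the node.
        return "ignored"
    # The network is fine but the proxied request failed: count it against the node.
    return "penalized"

for proxy_ok, local_ok in [(True, True), (False, True), (False, False)]:
    print(proxy_ok, local_ok, "->", score_probe(proxy_ok, local_ok))
```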

@terrason

terrason commented Mar 6, 2024

Does leastLoad actually work?
With this config there is no error, but the first outbound is always selected:

  "burstObservatory": {
    "subjectSelector":[
      "tag1",
      "tag2"
    ],
    "pingConfig": {
      "destination": "http://www.google.com/gen_204",
      "interval": "1h",
      "connectivity": "http://connectivitycheck.platform.hicloud.com/generate_204",
      "timeout": "30s",
      "sampling": 2
    }
  }

The log prints app/router: least load: no qualified outbound

2024/03/06 16:06:43 [Info] [3948790889] proxy/socks: TCP Connect request to tcp:52.84.162.122:443
2024/03/06 16:06:43 [Info] app/router: least load: no qualified outbound
2024/03/06 16:06:43 [Info] [3948790889] app/dispatcher: default route for tcp:52.84.162.122:443
2024/03/06 16:06:43 [Info] [3948790889] transport/internet/tcp: dialing TCP to tcp:23.94.57.237:443
2024/03/06 16:06:43 [Debug] transport/internet: dialing to tcp:23.94.57.237:443
2024/03/06 16:06:43 tcp:127.0.0.1:60552 accepted tcp:52.84.162.122:443 [socks-listen >> proxy-RackNerd-Seattle]

Every time I reorder the outbounds in the config file and restart, traffic ends up going through whichever outbound is listed first.

masked-config.json

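@yuhan6665
Member Author

yuhan6665 commented Mar 6, 2024

@terrason tag1 and tag2 are placeholders; you need to change them to the tags of the outbounds you want to test.
Also, try the latest version; otherwise there may be no reaction for an hour or two..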

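@terrason

terrason commented Mar 7, 2024

@terrason tag1 and tag2 are placeholders; you need to change them to the tags of the outbounds you want to test. Also, try the latest version; otherwise there may be no reaction for an hour or two..

Thanks for the reply. I have changed the burstObservatory config as follows: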

"burstObservatory": {
    "subjectSelector":[
      "proxy-DMIT",
      "proxy-RackNerd-Dallas",
      "proxy-RackNerd-Seattle",
      "proxy-bandwagonhost"
    ],
    "pingConfig": {
      "destination": "http://www.google.com/gen_204",
      "interval": "1h",
      "connectivity": "http://connectivitycheck.platform.hicloud.com/generate_204",
      "timeout": "30s",
      "sampling": 2
    }
  }

But I still get the same message:

2024/03/07 11:57:21 [Info] [2479325708] proxy/socks: TCP Connect request to tcp:18.245.46.14:443
2024/03/07 11:57:21 [Info] app/router: least load: no qualified outbound
2024/03/07 11:57:21 [Info] [2479325708] app/dispatcher: default route for tcp:18.245.46.14:443
2024/03/07 11:57:21 [Info] [2479325708] transport/internet/tcp: dialing TCP to tcp:23.94.57.237:443
2024/03/07 11:57:21 [Debug] transport/internet: dialing to tcp:23.94.57.237:443
2024/03/07 11:57:21 tcp:127.0.0.1:48570 accepted tcp:18.245.46.14:443 [socks-listen >> proxy-RackNerd-Seattle]

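I am using version 1.8.8. Does "the latest version" mean building from the latest source?
masked-config.json

Update:
There is no need to build the latest code; in my testing, leastLoad does work in 1.8.8.
It does start on the first outbound, but after running for a while (more than half an hour) it switched to a faster outbound.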

@yuhan6665
Member Author

Right, the latest main adds an initial check.

@s-kile

s-kile commented Mar 15, 2024

can someone explain how this works? least load on the config (i mean least traffic and... ?) does it care about config pings?

@us254

us254 commented Mar 16, 2024

can someone explain how this works? least load on the config (i mean least traffic and... ?) does it care about config pings?

  1. It allows configuring a load balancing strategy for outbound connections. The available strategies include "leastPing", "leastLoad", "roundRobin", "random".

  2. For the "leastLoad" strategy specifically:

    • It periodically checks the health and performance of the configured outbound proxies.
    • The health check is done by making HTTP requests to the specified "destination" URL (e.g. http://www.google.com/gen_204) through each outbound proxy.
    • The "connectivity" URL (e.g. http://connectivitycheck.platform.hicloud.com/generate_204) is used to check local network connectivity without going through the proxy.
    • The "interval" specifies how often the checks are performed (e.g. every 1 hour).
    • The "timeout" sets the maximum time to wait for each check request.
    • The "sampling" parameter determines how many times each outbound is tested per health check round.
  3. Based on the results of the periodic health checks, the "leastLoad" balancer will route traffic to the outbound proxy deemed to have the best performance (least load).

  4. If no qualified outbound is found by the health checks, it falls back to a specified "fallbackTag" outbound.

  5. The "subjectSelector" field specifies which outbound proxy tags the balancer should select from and monitor.

In short, the leastLoad balancer distributes traffic across multiple outbound proxies based on their measured performance, as determined by periodic health checks. This helps routing favor faster, less congested outbounds. The config allows customizing the health check behavior and the fallback logic.
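
Going only by the description above, points 3 and 4 might look roughly like the sketch below: pick the outbound with the best measured result, and use the fallback tag when nothing qualifies. The function name is hypothetical, the RTT numbers are made up, and the real selection logic is more involved than a simple minimum.

```python
from __future__ import annotations

from typing import Optional

def pick_outbound(avg_rtt_ms: dict[str, Optional[float]], fallback_tag: str) -> str:
    """Pick the tag with the lowest average RTT; None marks an outbound with no usable result."""
    qualified = {tag: rtt for tag, rtt in avg_rtt_ms.items() if rtt is not None}
    if not qualified:
        # No outbound passed the health checks: use the configured fallback.
        return fallback_tag
    return min(qualified, key=qualified.get)

# Made-up measurements for illustration only.
results = {"proxy-DMIT": 180.0, "proxy-RackNerd-Dallas": 95.0, "proxy-bandwagonhost": None}
print(pick_outbound(results, fallback_tag="reject"))
print(pick_outbound({"proxy-DMIT": None}, fallback_tag="reject"))
```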

@CrazyBoyFeng

can someone explain how this works? least load on the config (i mean least traffic and... ?) does it care about config pings?

The selectLeastLoad() code is written in a very mathematical style, and it also exposes configuration items for a variance baseline and an expected value. I'm guessing most people don't know how it works and just use the default configuration. The following is my own understanding after reading the code; it is not necessarily right, and corrections are welcome:
By default, with no baseline configured, it simply looks for the three (the default expectation) nodes with the smallest (most stable) delay variance, and then sorts them by average delay.
If you configure baseline and expectation, then in addition to picking the expected number of most stable nodes, it also picks nodes that fall within the baseline near the expected nodes, and then sorts by average delay.
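
A minimal sketch of the default case as read in this comment, not the actual selectLeastLoad() code: keep the `expected` nodes with the smallest delay variance, then order them by average delay. The baseline handling is left out, and the function name and sample data are invented for illustration.

```python
from __future__ import annotations

from statistics import mean, pvariance

def select_stable_nodes(rtt_samples_ms: dict[str, list[float]], expected: int = 3) -> list[str]:
    """Keep the `expected` nodes with the smallest RTT variance, ordered by average RTT."""
    # Most stable first: smallest variance of the RTT samples.
    by_stability = sorted(rtt_samples_ms, key=lambda tag: pvariance(rtt_samples_ms[tag]))
    chosen = by_stability[:expected]
    # Among the chosen nodes, fastest first.
    return sorted(chosen, key=lambda tag: mean(rtt_samples_ms[tag]))

# Invented samples: node "c" is sometimes fast but too unstable to be kept.
samples = {
    "a": [100.0, 102.0, 98.0],
    "b": [40.0, 45.0, 35.0],
    "c": [20.0, 200.0, 30.0],
    "d": [60.0, 61.0, 59.0],
}
print(select_stable_nodes(samples))
```

With these samples the unstable node "c" is dropped even though its best sample is the fastest, which matches the "most stable first, then fastest" reading.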

@pulsarice

To check the "connectivity" URL, how does Xray resolve the destination IP? Will it use the core's internal DNS server or the operating system's resolver?
Also, if we don't have any freedom (direct) outbound, will it still work?

@Goosegit11

Goosegit11 commented Nov 4, 2024

are there any web panels for Xray that support load balancing?

upd: 3x-ui does
