Setting delayed export under the BOLT protocol causes an "RPC cannot find service" exception #1276

Closed
shawnliang1124 opened this issue Dec 29, 2023 · 6 comments

shawnliang1124 commented Dec 29, 2023

Describe the bug

With sofa-boot 3.2.0 and the Bolt protocol, when the provider is configured with provider.delay, consumer calls fail with "RPC cannot find service" after the provider starts.

To Reproduce

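The provider is deployed on two machines. The RPC provider side uses the bolt protocol and sets provider.delay = 100000 in META-INF/sofa-rpc/rpc-config.json. Leave one machine running without a restart; once the other provider has started successfully, a direct call from the consumer reproduces the problem.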

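Root cause:

Debugging the source shows that ProviderConfigContainer#publishAllProviderConfig checks the provider's protocol: if it is not dubbo, the method writes the provider's node data straight to the registry (I am not sure why the author designed it this way).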

    public void publishAllProviderConfig() {
        // ... omitted
        if (!serverConfig.getProtocol().equalsIgnoreCase(
            SofaBootRpcConfigConstants.RPC_PROTOCOL_DUBBO)) {
            // ... omitted
            registry.register(providerConfig);
        }
    }
 

The provider-side error comes from line 148 of BoltServerProcessor#handleRequest in sofa-rpc 5.6: the invoker is null, which results in the "RPC cannot find service" exception.

    @Override
    public void handleRequest(BizContext bizCtx, AsyncContext asyncCtx, SofaRequest request) {
        // ... omitted
        Invoker invoker = boltServer.findInvoker(serviceName);
        if (invoker == null) {
            throwable = cannotFoundService(appName, serviceName);
            response = MessageBuilder.buildSofaErrorResponse(throwable.getMessage());
            break invoke; // leaves the enclosing labeled block (elided here)
        }
    }

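Looking back at the source: the Invoker is initialized in DefaultProviderBootstrap#doExport(). Because delayed export is configured (provider.delay = 100000), doExport is only called 100 seconds later, so the Invoker is not initialized until those 100 seconds have passed.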

    @Override
    public void export() {
        if (providerConfig.getDelay() > 0) { // delayed loading, in milliseconds
            Thread thread = factory.newThread(new Runnable() {
                @Override
                public void run() {
                    try {
                        Thread.sleep(providerConfig.getDelay());
                    } catch (Throwable ignore) { // NOPMD
                    }
                    doExport();
                }
            });
            thread.start();
        } else {
            doExport();
        }
    }

The doExport method. Here, server.registerProcessor(providerConfig, providerProxyInvoker) is the call that initializes the Invoker.

    private void doExport() {
        // ... omitted
        // Register the processor with each server
        for (ServerConfig serverConfig : serverConfigs) {
            try {
                Server server = serverConfig.buildIfAbsent();
                // Register the request invoker
                server.registerProcessor(providerConfig, providerProxyInvoker);
                if (serverConfig.isAutoStart()) {
                    server.start();
                }
            } catch (SofaRpcRuntimeException e) {
                throw e;
            } catch (Exception e) {
                LOGGER.errorWithApp(appName,
                    LogCodes.getLog(LogCodes.ERROR_REGISTER_PROCESSOR_TO_SERVER, serverConfig.getId()), e);
            }
        }
    }

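However, sofa-boot's ProviderConfigContainer#publishAllProviderConfig has already written the data to the registry, so the consumer sees the node and can call the provider directly. Because of the provider's delayed-export setting, the Invoker is still null at that point, and the provider fails to handle the request.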
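The ordering problem described above can be pictured with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in rather than SOFABoot or SOFARPC code: `publishAll` models the registry write, `doExport` models the invoker registration, `call` models a consumer request, and the service name is made up.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the race; none of these names exist in SOFABoot or SOFARPC.
class DelayedExportRace {
    // Models the registry: service names that consumers can discover.
    private final Set<String> registry = ConcurrentHashMap.newKeySet();
    // Models the server-side invoker table that registerProcessor fills.
    private final Map<String, Runnable> invokers = new ConcurrentHashMap<>();

    // Models publishAllProviderConfig: registers without looking at the export state.
    void publishAll(String serviceName) {
        registry.add(serviceName);
    }

    // Models doExport: only after this does the server know how to serve the request.
    void doExport(String serviceName) {
        invokers.put(serviceName, () -> { });
    }

    // Models a consumer call followed by the provider-side invoker lookup.
    String call(String serviceName) {
        if (!registry.contains(serviceName)) {
            return "not discovered";
        }
        Runnable invoker = invokers.get(serviceName);
        if (invoker == null) {
            return "RPC cannot find service";
        }
        invoker.run();
        return "ok";
    }

    public static void main(String[] args) {
        DelayedExportRace race = new DelayedExportRace();
        String service = "com.example.HelloService"; // hypothetical service name
        race.publishAll(service);                    // registry write happens first
        System.out.println(race.call(service));      // prints "RPC cannot find service"
        race.doExport(service);                      // delayed export finishes later
        System.out.println(race.call(service));      // prints "ok"
    }
}
```

The failing window in this sketch is exactly the interval between `publishAll` and `doExport`, which with provider.delay = 100000 would be on the order of 100 seconds.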

Expected behavior

Delayed export should be supported without producing the "RPC cannot find service" exception.

Environment:

  • SOFA BOOT 3.2.0, SOFA-RPC 5.6.1
  • Language Version: Java 8

Additional context

Workaround: in SOFA-BOOT, change ProviderConfigContainer#publishAllProviderConfig so that it does not write the node to the registry.

@HzjNeverStop
Contributor

@EvenLjj please help check this issue.

@Lo1nt

Lo1nt commented Jan 4, 2024

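In a SofaBoot environment, we use SofaBoot's own health check mechanism to control when RPC services are registered with the registry, because the health check reflects the application's current state well. The publishAllProviderConfig method mentioned above implements this mechanism: the registry is not notified when an RPC service is exported, and once the application passes its health check, all RPC services are registered with the registry together.

provider.delay is a small trick designed by SofaRpc itself. It works by returning success immediately on export and then performing the export asynchronously. This appears to conflict with the health-check-based publishing logic above. We will review the logic here and see how to make the two features compatible.

Regarding the fix you proposed (changing ProviderConfigContainer#publishAllProviderConfig so that it does not write the node to the registry), you may need to pay attention to the time gap between the two mechanisms, to make sure that at least one of them eventually publishes the service to the registry.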
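To illustrate the timing concern, here is a minimal sketch of a gate that registers a service exactly once, and only after both the export and the health check have completed, in whichever order they arrive. `RegistrationGate` and its method names are hypothetical and not part of SOFABoot or SOFARPC; `registerAction` stands in for the actual registry write.

```java
// Hypothetical helper, not a SOFABoot or SOFARPC class.
class RegistrationGate {
    private final Runnable registerAction; // stands in for the registry write
    private boolean exported;
    private boolean healthy;
    private boolean registered;

    RegistrationGate(Runnable registerAction) {
        this.registerAction = registerAction;
    }

    // Called when the (possibly delayed) export has finished.
    synchronized void onExported() {
        exported = true;
        tryRegister();
    }

    // Called when the application health check has passed.
    synchronized void onHealthCheckPassed() {
        healthy = true;
        tryRegister();
    }

    // Registers once, only when both preconditions hold.
    private void tryRegister() {
        if (exported && healthy && !registered) {
            registered = true;
            registerAction.run();
        }
    }

    public static void main(String[] args) {
        RegistrationGate gate = new RegistrationGate(
            () -> System.out.println("registered"));
        gate.onHealthCheckPassed(); // nothing happens yet: export still pending
        gate.onExported();          // prints "registered"
        gate.onExported();          // no second registration
    }
}
```

With a gate like this, neither mechanism has to know how long the other takes, so the service cannot end up registered too early or never registered.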

@EvenLjj
Contributor

EvenLjj commented Jan 4, 2024

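> @EvenLjj please help check this issue

This problem does exist. One option is for ProviderConfigContainer#publishAllProviderConfig to recognize providerConfigs that use delayed export and register them with the registry asynchronously, according to the delay.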
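A minimal sketch of that idea, using only the JDK's `ScheduledExecutorService`. `DelayAwarePublisher` is a hypothetical name; `delayMillis` mirrors the value of `providerConfig.getDelay()`, and `registerAction` stands in for the `registry.register(providerConfig)` call. It is not the fix that was eventually merged.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not a SOFABoot class.
class DelayAwarePublisher {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // Registers right away for non-positive delays, otherwise after delayMillis.
    ScheduledFuture<?> publish(int delayMillis, Runnable registerAction) {
        long delay = Math.max(0, delayMillis);
        return scheduler.schedule(registerAction, delay, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws Exception {
        DelayAwarePublisher publisher = new DelayAwarePublisher();
        ScheduledFuture<?> pending = publisher.publish(
            100, () -> System.out.println("registered after delay"));
        pending.get(); // wait for the delayed registration to run
        publisher.shutdown();
    }
}
```

If the export timer started before the publish step runs, waiting the full delay again from publish time should be conservative: registration would then happen after the export has had time to finish, provided the export itself succeeded.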

wangchengming666 pushed a commit to wangchengming666/sofa-boot that referenced this issue Feb 23, 2024
@wangchengming666
Contributor

related #1290

@wangchengming666
Contributor

@shawnliang1124 I have put together a fix; you can pull my code and verify that it works.

wangchengming666 pushed a commit to wangchengming666/sofa-boot that referenced this issue Feb 23, 2024
wangchengming666 pushed a commit to wangchengming666/sofa-boot that referenced this issue Feb 26, 2024
wangchengming666 pushed a commit to wangchengming666/sofa-boot that referenced this issue Feb 26, 2024
Smurf-Engineer pushed a commit to Smurf-Engineer/Kan-E that referenced this issue Apr 9, 2024
* fix sofastack/sofa-boot#1276

* 1.RpcActuatorAutoConfiguration autoconf after ReadinessAutoConfiguration
2.update unit tests

---------

Co-authored-by: 呈铭 <beck.wcm@antgroup.com>
Co-authored-by: 致节 <hzj266771@antgroup.com>
@shawnliang1124
Author

Thanks a lot, the problem has been resolved.
