add hierarchical text classification #2501

Merged 6 commits on Jun 27, 2022
424 changes: 424 additions & 0 deletions applications/text_classification/hierarchical_classification/README.md

Large diffs are not rendered by default.

@@ -0,0 +1,128 @@
# Service Deployment with Paddle Serving

This document describes how to use [Paddle Serving](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) to deploy the ERNIE 2.0-based hierarchical classification pipeline as an online service.

## Contents
- [Environment Setup](#environment-setup)
- [Model Conversion](#model-conversion)
- [Model Deployment](#model-deployment)

## Environment Setup
You need both a [working PaddleNLP environment]() and a Paddle Serving environment.

### Install Paddle Serving
Install with the commands below; for more wheel packages, see the [Serving documentation](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Latest_Packages_CN.md).
```bash
# Install the client and serving app, used to send requests to the service
pip install paddle_serving_app paddle_serving_client

# Install the server, used to start the service
# CPU server
pip install paddle_serving_server

# GPU server: pick the command matching your local environment
# CUDA 10.2 + cuDNN 7 + TensorRT 6
pip install paddle-serving-server-gpu==0.8.3.post102 -i https://pypi.tuna.tsinghua.edu.cn/simple
# CUDA 10.1 + TensorRT 6
pip install paddle-serving-server-gpu==0.8.3.post101 -i https://pypi.tuna.tsinghua.edu.cn/simple
# CUDA 11.2 + TensorRT 8
pip install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsinghua.edu.cn/simple
```

The Tsinghua PyPI mirror is enabled by default to speed up downloads; if you use an HTTP proxy, you can drop it (remove `-i https://pypi.tuna.tsinghua.edu.cn/simple`).


### Install the FasterTokenizer Acceleration Library (Optional)
If deploying on Linux, installing faster_tokenizer is recommended for maximum text-processing efficiency and better service performance. Windows is not yet supported; support is planned for the next release.
```bash
pip install faster_tokenizer
```


## Model Conversion

For deployment with Paddle Serving, the saved inference model must be converted into a format Serving can deploy easily.

Use the installed paddle_serving_client to convert the static-graph model into the Serving format. See [exporting static-graph models](../../README.md) for how to convert a trained model into a static-graph model with the [export script](export_model.py).

```bash
# Set --dirname to the actual model directory
python -m paddle_serving_client.convert --dirname ../../export --model_filename float32.pdmodel --params_filename float32.pdiparams


# Inspect the meaning of each argument with:
python -m paddle_serving_client.convert --help
```
After a successful conversion, the directory looks like this:
```
serving_server/
├── float32.pdiparams
├── float32.pdmodel
├── serving_server_conf.prototxt
└── serving_server_conf.stream.prototxt
```

## Model Deployment

The serving directory contains the models plus the code to start the pipeline service and to send prediction requests:

```
serving/
├── serving_server
│   ├── float32.pdiparams
│   ├── float32.pdmodel
│   ├── serving_server_conf.prototxt
│   └── serving_server_conf.stream.prototxt
├── config.yml        # server configuration for the hierarchical classification service
├── rpc_client.py     # script that sends pipeline prediction requests
└── service.py        # script that starts the server
```

### Edit the Configuration File
The `config.yml` file in this directory documents the meaning of every parameter; adjust it as needed. For example:
```
# Point the model directory at a downloaded model or your own model:
model_config: serving_server => model_config: ernie-3.0-tiny/serving_server

# Change the RPC port to 9998:
rpc_port: 18090 => rpc_port: 9998

# Switch from GPU inference to CPU inference:
device_type: 1 => device_type: 0
```
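These edits can also be made programmatically before launching the service. A minimal sketch, assuming a parsed `config.yml` loads into the nested dict shape below (a hypothetical in-memory mirror; only the keys touched here are shown):

```python
# Hypothetical mirror of the relevant config.yml keys, as a YAML parser
# would load them (only the keys edited here are included).
config = {
    "rpc_port": 18090,
    "op": {
        "seq_cls": {
            "local_service_conf": {
                "model_config": "serving_server",
                "device_type": 1,  # 1 = GPU inference
            }
        }
    }
}

conf = config["op"]["seq_cls"]["local_service_conf"]
conf["model_config"] = "ernie-3.0-tiny/serving_server"  # point at another model dir
config["rpc_port"] = 9998                               # change the RPC port
conf["device_type"] = 0                                 # 0 = CPU inference
```

Dumping the dict back out with a YAML library would complete the round trip; the key paths follow the nesting used in `config.yml`.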

### Classification Task
#### Start the Server
After editing the configuration file, start the service with:
```
python service.py
```
The output looks like:
```
[DAG] Succ init
[PipelineServicer] succ init
......
--- Running analysis [ir_graph_to_program_pass]
I0624 06:31:00.891119 13138 analysis_predictor.cc:1007] ======= optimize end =======
I0624 06:31:00.899907 13138 naive_executor.cc:102] --- skip [feed], feed -> token_type_ids
I0624 06:31:00.899941 13138 naive_executor.cc:102] --- skip [feed], feed -> input_ids
I0624 06:31:00.902855 13138 naive_executor.cc:102] --- skip [linear_147.tmp_1], fetch -> fetch
[2022-06-24 06:31:01,899] [ WARNING] - Can't find the faster_tokenizers package, please ensure install faster_tokenizers correctly. You can install faster_tokenizers by `pip install faster_tokenizers`(Currently only work for linux platform).
[2022-06-24 06:31:01,899] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'ernie-2.0-base-en'.
[2022-06-24 06:31:01,899] [ INFO] - Already cached /root/.paddlenlp/models/ernie-2.0-base-en/vocab.txt
[OP Object] init success
```

#### Run the Client Test
Disable any proxy before sending client requests, and set server_url to the address of the machine where the service is running.
```
python rpc_client.py
```
The output looks like:
```
text: b'a high degree of uncertainty associated with the emission inventory for china tends to degrade the performance of chemical transport models in predicting pm2.5 concentrations especially on a daily basis. in this study a novel machine learning algorithm, geographically -weighted gradient boosting machine (gw-gbm), was developed by improving gbm through building spatial smoothing kernels to weigh the loss function. this modification addressed the spatial nonstationarity of the relationships between pm2.5 concentrations and predictor variables such as aerosol optical depth (aod) and meteorological conditions. gw-gbm also overcame the estimation bias of pm2.5 concentrations due to missing aod retrievals, and thus potentially improved subsequent exposure analyses. gw-gbm showed good performance in predicting daily pm2.5 concentrations (r-2 = 0.76, rmse = 23.0 g/m(3)) even with partially missing aod data, which was better than the original gbm model (r-2 = 0.71, rmse = 25.3 g/m(3)). on the basis of the continuous spatiotemporal prediction of pm2.5 concentrations, it was predicted that 95% of the population lived in areas where the estimated annual mean pm2.5 concentration was higher than 35 g/m(3), and 45% of the population was exposed to pm2.5 >75 g/m(3) for over 100 days in 2014. gw-gbm accurately predicted continuous daily pm2.5 concentrations in china for assessing acute human health effects. (c) 2017 elsevier ltd. all rights reserved.'
label: 0,8
--------------------
...
```
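A `label` value such as `0,8` is produced by the server's post-processing in service.py: a sigmoid over the model's output logits, keeping every label index whose probability exceeds 0.5 and joining the kept indices with commas. A standalone sketch of that decoding step, using synthetic logits rather than real model output:

```python
import math

def decode_multilabel(logits, threshold=0.5):
    # Sigmoid over each logit, keep indices above the threshold, join with commas
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return ",".join(str(i) for i, p in enumerate(probs) if p > threshold)

# Synthetic logits: indices 0 and 3 clear the 0.5 probability cutoff
print(decode_multilabel([3.2, -1.5, -4.0, 0.7]))  # → 0,3
```

Labels above the threshold are independent of one another, which is what allows a single text to carry several hierarchy nodes at once.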
config.yml
@@ -0,0 +1,59 @@
# RPC port. rpc_port and http_port cannot both be empty. If rpc_port is empty
# and http_port is not, rpc_port is automatically set to http_port + 1.
rpc_port: 18090

# HTTP port. rpc_port and http_port cannot both be empty. If rpc_port is usable
# and http_port is empty, no http_port is generated automatically.
http_port: 9999

# worker_num: maximum concurrency.
# If build_dag_each_worker is True, the framework creates worker_num processes,
# each building its own gRPC server and DAG.
# If build_dag_each_worker is False, the framework sets max_workers of the main
# thread's gRPC thread pool to worker_num.
worker_num: 1

# build_dag_each_worker: False builds one DAG inside the process; True builds
# an independent DAG inside each of several processes.
build_dag_each_worker: false

dag:
    # Op resource type: True for the thread model, False for the process model
    is_thread_op: False

    # Number of retries
    retry: 1

    # Profiling: True generates Timeline performance data (with some overhead);
    # False disables it.
    use_profile: false
    tracer:
        interval_s: 10

op:
    seq_cls:
        # Concurrency: thread-level when is_thread_op=True, otherwise process-level
        concurrency: 1

        # When the op has no server_endpoints, the local service configuration
        # is read from local_service_conf.
        local_service_conf:
            # Client type: brpc, grpc, or local_predictor. local_predictor does
            # not start a Serving service; inference runs in-process.
            client_type: local_predictor

            # Model path
            model_config: serving_server

            # Fetch list; use the alias_name of fetch_var in client_config
            fetch_list: ["linear_147.tmp_1"]

            # device_type: 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
            device_type: 1

            # Compute device IDs. "" or unset means CPU inference; "0" or
            # "0,1,2" means GPU inference on the listed cards.
            devices: "3"

            # use_mkldnn
            # use_mkldnn: True

            # thread_num
            thread_num: 1

            # ir_optim
            ir_optim: True

            # Minimum node count of a subgraph optimized when TensorRT is enabled
            # min_subgraph_size: 10
rpc_client.py
@@ -0,0 +1,47 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.pipeline import PipelineClient
# array and float32 are kept in scope for eval() of the server response
from numpy import array, float32

import numpy as np

class Runner(object):

    def __init__(
        self,
        server_url: str,
    ):
        self.client = PipelineClient()
        self.client.connect([server_url])

    def Run(self, data):
        # Encode the raw texts as bytes for the serving feed
        data = np.array([x.encode('utf-8') for x in data], dtype=np.object_)
        ret = self.client.predict(feed_dict={"sentence": data})
        # ret.value[0] is the string repr of the label list returned by the server
        for d, l in zip(data, eval(ret.value[0])):
            print("text: ", d)
            print("label: ", l)
            print("--------------------")
        return


if __name__ == "__main__":
    server_url = "127.0.0.1:18090"
    runner = Runner(server_url)
    texts = [
        "a high degree of uncertainty associated with the emission inventory for china tends to degrade the performance of chemical transport models in predicting pm2.5 concentrations especially on a daily basis. in this study a novel machine learning algorithm, geographically -weighted gradient boosting machine (gw-gbm), was developed by improving gbm through building spatial smoothing kernels to weigh the loss function. this modification addressed the spatial nonstationarity of the relationships between pm2.5 concentrations and predictor variables such as aerosol optical depth (aod) and meteorological conditions. gw-gbm also overcame the estimation bias of pm2.5 concentrations due to missing aod retrievals, and thus potentially improved subsequent exposure analyses. gw-gbm showed good performance in predicting daily pm2.5 concentrations (r-2 = 0.76, rmse = 23.0 g/m(3)) even with partially missing aod data, which was better than the original gbm model (r-2 = 0.71, rmse = 25.3 g/m(3)). on the basis of the continuous spatiotemporal prediction of pm2.5 concentrations, it was predicted that 95% of the population lived in areas where the estimated annual mean pm2.5 concentration was higher than 35 g/m(3), and 45% of the population was exposed to pm2.5 >75 g/m(3) for over 100 days in 2014. gw-gbm accurately predicted continuous daily pm2.5 concentrations in china for assessing acute human health effects. (c) 2017 elsevier ltd. all rights reserved.",
        "previous research exploring cognitive biases in bulimia nervosa suggests that attentional biases occur for both food-related and body-related cues. individuals with bulimia were compared to non-bulimic controls on an emotional-stroop task which contained both food-related and body-related cues. results indicated that bulimics (but not controls) demonstrated a cognitive bias for both food-related and body related cues. however, a discrepancy between the two cue-types was observed with body-related cognitive biases showing the most robust effects and food-related cognitive biases being the most strongly associated with the severity of the disorder. the results may have implications for clinical practice as bulimics with an increased cognitive bias for food-related cues indicated increased bulimic disorder severity. (c) 2016 elsevier ltd. all rights reserved.",
        "posterior reversible encephalopathy syndrome (pres) is a reversible clinical and neuroradiological syndrome which may appear at any age and characterized by headache, altered consciousness, seizures, and cortical blindness. the exact incidence is still unknown. the most commonly identified causes include hypertensive encephalopathy, eclampsia, and some cytotoxic drugs. vasogenic edema related subcortical white matter lesions, hyperintense on t2a and flair sequences, in a relatively symmetrical pattern especially in the occipital and parietal lobes can be detected on cranial mr imaging. these findings tend to resolve partially or completely with early diagnosis and appropriate treatment. here in, we present a rare case of unilateral pres developed following the treatment with pazopanib, a testicular tumor vascular endothelial growth factor (vegf) inhibitory agent."
    ]
    runner.Run(texts)
service.py
@@ -0,0 +1,86 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddle_serving_server.web_service import WebService, Op

# array is kept in scope for eval() of the incoming feed string
from numpy import array

import logging
import numpy as np

_LOGGER = logging.getLogger()


class Op(Op):

    def init_op(self):
        from paddlenlp.transformers import AutoTokenizer
        self.tokenizer = AutoTokenizer.from_pretrained("ernie-2.0-base-en",
                                                       use_faster=True)
        # Output nodes may differ from model to model.
        # See the conf.prototxt file of serving_server for the output node name.
        self.fetch_names = [
            "linear_147.tmp_1",
        ]

    def preprocess(self, input_dicts, data_id, log_id):
        # Convert the input format
        (_, input_dict), = input_dicts.items()
        data = input_dict["sentence"]
        if isinstance(data, str) and "array(" in data:
            # The feed arrives as the string repr of a numpy byte array
            data = eval(data)
        else:
            _LOGGER.error("input value {} is not supported.".format(data))
        data = [i.decode('utf-8') for i in data]

        # Tokenize and pad
        data = self.tokenizer(data,
                              max_length=512,
                              padding=True,
                              truncation=True)
        input_ids = data["input_ids"]
        token_type_ids = data["token_type_ids"]
        # print("input_ids:", input_ids)
        # print("token_type_ids", token_type_ids)
        return {
            "input_ids": np.array(input_ids, dtype="int64"),
            "token_type_ids": np.array(token_type_ids, dtype="int64")
        }, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        results = fetch_dict[self.fetch_names[0]]
        results = np.array(results)
        labels = []

        # Multi-label decoding: sigmoid over logits, keep indices with p > 0.5
        for result in results:
            label = []
            result = 1 / (1 + (np.exp(-result)))
            for i, p in enumerate(result):
                if p > 0.5:
                    label.append(str(i))
            labels.append(','.join(label))
        return {"label": labels}, None, ""


class Service(WebService):

    def get_pipeline_response(self, read_op):
        return Op(name="seq_cls", input_ops=[read_op])


if __name__ == "__main__":
    service = Service(name="seq_cls")
    service.prepare_pipeline_config("config.yml")
    service.run_service()