Serving/python/examples/util at develop · hysunflower/Serving

History

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
README_CN.md		README_CN.md
get_acc.py		get_acc.py
show_profile.py		show_profile.py
timeline_trace.py		timeline_trace.py

README.md

Timeline Tool Tutorial

(简体中文|English)

The serving framework has a built-in function for predicting the timing of each stage of the service. The client controls whether to turn on the environment through environment variables. After opening, the information will be output to the screen.

export FLAGS_profile_client=1 #turn on the client timing tool for each stage
export FLAGS_profile_server=1 #turn on the server timing tool for each stage

After enabling this function, the client will print the corresponding log information to standard output during the prediction process.

In order to show the time consuming of each stage more intuitively, a script is provided to further analyze and process the log file.

When using, first save the output of the client to a file, taking profile as an example.

python show_profile.py profile ${thread_num}

Here the thread_num parameter is the number of processes when the client is running, and the script will calculate the average time spent in each phase according to this parameter.

The script calculates the time spent in each stage, divides by the number of threads to average, and prints to standard output.

python timeline_trace.py profile trace

The script converts the time-dot information in the log into a json format and saves it to a trace file. The trace file can be visualized through the tracing function of the Chrome browser.

Specific operation: Open the chrome browser, enter chrome://tracing/ in the address bar, jump to the tracing page, click the load button, and open the saved trace file to visualize the time information of each stage of the prediction service.

The data visualization output is shown as follow, it uses bert as service example GPU inference service. The server starts 4 GPU prediction, the client starts 4 processes, and the timeline of each stage when the batch size is 1. Among them, bert_pre represents the data preprocessing stage of the client, and client_infer represents the stage where the client completes sending and receiving prediction requests. process represents the process number of the client, and the second line of each process shows the timeline of each op of the server.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

util

util

README.md

Timeline Tool Tutorial

Files

util

Directory actions

More options

Directory actions

More options

Latest commit

History

util

Folders and files

parent directory

README.md

Timeline Tool Tutorial