docs/_blog/gpt-in-review.md

Ever since the release of GPT-3 by OpenAI, the world has been buzzing with excitement.

Many people have told me that it is hard to scale applications with GPT models for many reasons, such as cost, speed, and performance. In this blog post, I will review the GPT models by OpenAI on the following tasks:

- [Summarization based on Google search results](#summarization-based-on-google-search-results)
- [Ontology generation for a specific domain](#ontology-generation-for-a-specific-domain)
- [Named Entity Recognition for a specific domain](#named-entity-recognition-for-a-specific-domain)

Overall, I think GPT models are great for many tasks, but query time is a big issue.

## Summarization based on Google search results

During the development of a firm search and profile generator tool, I used gpt-3.5-turbo-0125 to summarize search results from Google. The summarization task is to generate a short description of the results: the input to the model is the raw search results, and the output is a brief summary.
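
The full prompt and function-calling setup is in the gist at the end of this section, but here is a minimal sketch of what such a request looks like in R; the helper name, prompt wording, and temperature are my own illustrative assumptions, not the production settings:

```r
library(httr)
library(jsonlite)

# sketch: send concatenated Google search results to gpt-3.5-turbo-0125
# and return the model's short summary
summarize_results <- function(search_results,
                              api_key = Sys.getenv("OPENAI_API_KEY")) {
    body <- list(
        model = "gpt-3.5-turbo-0125",
        messages = list(
            list(role = "system",
                 content = "Summarize the following Google search results in two or three sentences."),
            list(role = "user", content = search_results)
        ),
        temperature = 0  # illustrative choice: deterministic summaries
    )
    res <- POST(
        "https://api.openai.com/v1/chat/completions",
        add_headers(Authorization = paste("Bearer", api_key)),
        content_type_json(),
        body = toJSON(body, auto_unbox = TRUE)
    )
    content(res, as = "parsed")$choices[[1]]$message$content
}
```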

Figure 1 plots the query time of gpt-3.5-turbo-0125 against the number of tokens. The query time increases with the token count, and there is also considerable heterogeneity in query time at any given token count.


<div class='figure'>
<img src="/blog/images/gpt-querytime-tokens.png"
alt="euleriana_map"
style="width: 87%; display: block; margin: 0 auto;"/>
<div class='caption'>
<span class='caption-label'>Figure 1.</span> Query time of GPT-3.5-turbo-0125 for different tokens.
</div>
</div>

I am not a premium user of OpenAI, so I do not know whether the query time is the same for all users; perhaps those who pay more or have higher usage get faster responses. As you can see from Figure 1, the average query time is around 2-3 seconds for any task with more than 300 tokens. It is hard to argue that this kind of query time is acceptable for many applications.
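
For context, the query times I analyze here are wall-clock times around the API call, roughly collected like this (a sketch reusing the hypothetical `summarize_results` helper above):

```r
search_results <- "concatenated Google results for one firm"  # placeholder input

t0 <- Sys.time()
answer <- summarize_results(search_results)
query_time <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
```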

<div class='figure'>
<img src="/blog/images/gpt-querycost-tokens.png"
alt="euleriana_map"
style="width: 87%; display: block; margin: 0 auto;"/>
<div class='caption'>
<span class='caption-label'>Figure 2.</span> Query cost of GPT-3.5-turbo-0125 for different tokens.
</div>
</div>

The good news is that the cost per query is not that high. As you can see from Figure 2, the cost is around 0.001 USD for any task with more than 300 tokens, so GPT models are actually very cheap to use for many applications.
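
The cost per query follows directly from the `usage` field the API returns with every response. A sketch of the arithmetic, using the gpt-3.5-turbo-0125 list prices at the time of writing (USD 0.50 per million input tokens and USD 1.50 per million output tokens; check OpenAI's pricing page, as these change):

```r
# per-query cost in USD from the token counts in the API's `usage` field
query_cost <- function(prompt_tokens, completion_tokens,
                       input_rate = 0.50 / 1e6,    # USD per input token
                       output_rate = 1.50 / 1e6) { # USD per output token
    prompt_tokens * input_rate + completion_tokens * output_rate
}

query_cost(prompt_tokens = 1500, completion_tokens = 200)
# 0.00105, about a tenth of a cent, consistent with Figure 2
```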

There is another thing that I want to mention: __the function calling of the GPT models is not stable__. Sometimes you get repeated results, and sometimes you get results in a different format. I do not know why this happens, but it is something to be aware of when you use GPT models. For instance, I set the result format to be a nested JSON object, but sometimes the models could not return the results in that format.
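
In practice, one way to guard against this kind of instability is a parse-and-retry wrapper along these lines (a sketch: `ask_gpt` stands in for whatever function issues the request and returns the raw string, and the `summary` key is a placeholder for whatever schema you asked for):

```r
library(jsonlite)

# re-issue the query until the reply parses as JSON and has the expected key
parse_with_retry <- function(ask_gpt, prompt, max_retries = 3) {
    for (attempt in seq_len(max_retries)) {
        raw <- ask_gpt(prompt)
        parsed <- tryCatch(fromJSON(raw), error = function(e) NULL)
        if (!is.null(parsed) && !is.null(parsed$summary)) return(parsed)
    }
    stop("no valid JSON after ", max_retries, " attempts")
}
```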


I embedded my code, including the prompt and the function calling, in the following gist:


<script src="https://gist.github.com/oceanumeric/fd777cbcfa31bbb4fb211ce02a4c3818.js"></script>

## Ontology generation for a specific domain

This is the task that I personally found most interesting to explore with GPT models. The ontology generation task is to generate a list of concepts and their relationships for a specific domain: the input to the model is domain-specific text, and the output is the list of concepts and relationships.
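
As a rough illustration of what the input looks like, here is the kind of prompt I mean; the wording and the output schema are my own sketch, not the exact prompt I used:

```r
# build an ontology-extraction prompt around a chunk of domain text
onto_prompt <- function(domain_text) {
    paste0(
        "Extract the key concepts and their relationships from the text below. ",
        "Return a JSON object of the form ",
        '{"concepts": ["..."], "relations": ',
        '[{"source": "...", "relation": "...", "target": "..."}]}.',
        "\n\nText:\n", domain_text
    )
}
```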

Overall, the GPT models amazed me with their performance on the ontology generation task, especially the GPT-4 models. However, since this task is relatively hard, the query time is much longer than for the summarization task: the average is 5-6 seconds for any task with more than 300 tokens, roughly twice as long as summarization. Sometimes the query time exceeds 10 seconds, which is not acceptable for many applications. The following figure plots the query time for different token counts, compared with the summarization task.

<div class='figure'>
<img src="/blog/images/gpt-querytime-tasks.png"
alt="euleriana_map"
style="width: 87%; display: block; margin: 0 auto;"/>
<div class='caption'>
<span class='caption-label'>Figure 3.</span> Query time for the ontology generation task compared with the summarization task, for different token counts.
</div>
</div>

It is interesting to notice that GPT models behave like human beings: the _'thinking time'_ is longer for harder tasks. For harder tasks, we observe more heterogeneity in query time across different token counts, while query time is more stable for easier tasks, such as summarization.


## Named Entity Recognition for a specific domain

The last task that I want to review is named entity recognition. I ran many experiments with GPT models on this task because one of my research projects depends on it. The task is to recognize the named entities in a specific domain: the input to the model is a news title, and the expected output is the firm name, the country, and some properties of the entity, such as whether the firm is public or private. The following table gives some examples of the input and output.


| News Title | Company Name | Location | Is Company |
|-------------------------------------------------------------------------------------------------------------|--------------------------------------------------|----------|------------|
| Deputy Director Hu Wei of the Institute of Hydrobiology visited Maotai Group for investigation and exchange | Maotai Group | China | true |
| Institute leaders visited Insect Technology Development Company | Insect Technology Development Company | China | true |
| Vice President Guo Ying of Sugon Group and his team visited Ganjiang Innovation Institute | Sugon Group | China | true |
| Delegation from GlaxoSmithKline visited Institute of Neuroscience | GlaxoSmithKline | Unknown | false |
| Hu Junpeng and his team from Angel Yeast Co., Ltd. visited Yarlong Academy of Sciences and exchanged views | Angel Yeast Co., Ltd. | China | false |
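
To get structured rows like the table above, the natural approach is a function-calling tool whose fields mirror the table columns. Here is a sketch of such a tool definition in R; the field names follow the table, but this is not the exact definition from my experiments:

```r
library(jsonlite)

# tool schema for the chat completions `tools` parameter
ner_tool <- list(
    type = "function",
    `function` = list(
        name = "extract_entities",
        description = "Extract the firm entity mentioned in a news title",
        parameters = list(
            type = "object",
            properties = list(
                company_name = list(type = "string"),
                location = list(type = "string",
                                description = "Country of the firm, or 'Unknown'"),
                is_company = list(type = "boolean")
            ),
            required = c("company_name", "location", "is_company")
        )
    )
)

# serialize with auto_unbox so scalars become JSON scalars, not arrays
toJSON(list(ner_tool), auto_unbox = TRUE, pretty = TRUE)
```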

The performance of GPT models on the named entity recognition task is truly amazing. The average F1 score is around 0.9 for any task with more than 300 tokens. For my research project, the accuracy of the named entity recognition task is around 0.92, which is very high. For some news titles in Chinese, the accuracy is lower, around 0.89, but that is still acceptable for many applications.
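
For clarity, the F1 score here is the harmonic mean of precision and recall over the extracted entities (the exact matching rule, strict or partial, is an assumption on my part):

```r
# F1 from true positives, false positives, and false negatives
f1_score <- function(tp, fp, fn) {
    precision <- tp / (tp + fp)
    recall <- tp / (tp + fn)
    2 * precision * recall / (precision + recall)
}

f1_score(tp = 90, fp = 10, fn = 10)  # 0.9, the level reported above
```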

If you have tried other NLP libraries, such as spaCy, or other NER models, such as BERT, you will find that the performance of GPT models is much better than those models.

In conclusion, the GPT models by OpenAI are very powerful for many NLP tasks: the performance is very high and the cost is very low. However, the models are not that fast, especially on harder tasks. The query time is around 2-3 seconds for summarization, 5-6 seconds for ontology generation, and 1-2 seconds for named entity recognition. Query time is also more stable for easier tasks, such as summarization, while harder tasks show more heterogeneity across different token counts.

I think the trade-off between performance, speed, and cost will shape the competition among GPT models in the future. As we all know, there is no free lunch in the world; how to balance this trade-off is the key to the success of the GPT models.
`scripts/gpt_in_review.R`, the script behind the figures above:

# ------------------------------------------------------------------------------
# Load the required libraries
library(pacman)
p_load(stringr, data.table, magrittr, ggplot2, XML, RCurl, knitr, vtree, SPARQL)

# color palette
gray_scale <- c('#F3F4F8', '#D2D4DA', '#B3B5BD',
                '#9496A1', '#7d7f89', '#777986',
                '#656673', '#5B5D6B', '#4d505e',
                '#404352', '#2b2d3b', '#282A3A',
                '#1b1c2a', '#191a2b',
                '#141626', '#101223')

ft_palette <- c('#990F3D', '#0D7680', '#0F5499', '#262A33', '#FFF1E5')

ft_contrast <- c('#F83', '#00A0DD', '#C00', '#006F9B', '#F2DFCE', '#FF7FAA',
                 '#00994D', '#593380')

# helpers to pretty-print a data.table with knitr::kable
peep_head <- function(dt, n = 5) {
    dt %>%
        head(n) %>%
        kable()
}

# n random rows
peep_sample <- function(dt, n = 5) {
    dt %>%
        .[sample(.N, n)] %>%
        kable()
}

peep_tail <- function(dt, n = 5) {
    dt %>%
        tail(n) %>%
        kable()
}
# ------------------------------------------------------------------------------

dt <- fread('../data/gpt-statistics.csv')

dt %>%
    peep_head()


options(repr.plot.width = 7, repr.plot.height = 4.5, repr.plot.res = 300)
# scatter plot of query time vs total tokens with a linear fit
dt %>%
    ggplot(aes(x = total_tokens, y = query_time)) +
    geom_point() +
    geom_smooth(method = 'lm') +
    theme_bw(base_size = 14) +
    labs(title = 'Query Time vs Total Tokens (gpt-3.5-turbo-0125)',
         x = 'Total Tokens',
         y = 'Query Time (s)')


options(repr.plot.width = 9, repr.plot.height = 5, repr.plot.res = 300)
par(bg = 'white', cex.axis = 1.5, cex.lab = 1.5)
# two panels side by side, the first one wider
layout_matrix <- matrix(c(1, 1, 2), nrow = 1)
layout(layout_matrix, widths = c(1.3, 1), heights = 1)

# first panel: query time by token count
boxplot(query_time ~ total_tokens, data = dt, col = ft_palette[1],
        main = 'Query Time vs Total Tokens (gpt-3.5-turbo-0125)',
        xlab = 'Total Tokens',
        ylab = 'Query Time (s)')

# second panel: overall query time distribution
boxplot(dt$query_time, col = ft_palette[1],
        main = 'Query Time Distribution',
        ylab = 'Query Time (s)')


options(repr.plot.width = 7, repr.plot.height = 4.5, repr.plot.res = 300)
par(bg = 'white', cex.axis = 1, cex.lab = 1)
# query cost by token count
boxplot(query_cost ~ total_tokens, data = dt, col = ft_palette[2],
        main = 'Query Cost vs Total Tokens (gpt-3.5-turbo-0125)',
        xlab = 'Total Tokens',
        ylab = 'Query Cost ($)')


dt2 <- fread('../data/gpt-statistics-onto.csv')

dt2 %>% str()

dt2 %>%
    peep_head()


# query times for the summarization task: sample 60 rows and label the task
dt %>%
    .[, .(query_time, total_tokens)] %>%
    .[sample(.N, 60)] %>%
    .[, task := 'summarization'] -> dt_summ


# query times for the ontology task
dt2 %>%
    .[, .(total_tokens, query_time)] %>%
    .[, task := 'ontology'] -> dt_onto


# combine the two data.tables and compare query time across tasks
rbind(dt_summ, dt_onto, use.names = TRUE) %>%
    .[total_tokens < 4000] %>%
    ggplot(aes(x = total_tokens, y = query_time, color = task)) +
    geom_point(size = 2) +
    theme_bw(base_size = 14) +
    labs(title = 'Query Time vs Total Tokens by Task',
         x = 'Total Tokens',
         y = 'Query Time (s)') +
    # move the legend to the top right with a transparent background
    theme(legend.position = c(0.8, 0.8),
          legend.background = element_rect(fill = 'transparent'))
