
How the function insert_target() in data.py works? #7

Open · yuboona opened this issue Jun 2, 2020 · 8 comments


yuboona commented Jun 2, 2020

  • Question:

I found that when insert_target() in data.py is used, the input data is split into many sequences that share a lot of overlapping words.

I would like to know why the data is processed like this. It seems to produce a lot of repeated data.

thaitrinh commented:

I have the same question. @yuboona, have you figured out why we need to do that? Thank you!


yuboona commented Nov 17, 2020

> I have the same question. @yuboona, have you figured out why we need to do that? Thank you!

I think it's a trick to use more contextual information. In my earlier solution, I treated punctuation restoration as a sequence labeling task: feed one sentence into BERT at a time and output the labels of all the words in that sentence. This repo, by contrast, feeds in one sequence but predicts only the label of the word in the middle of that sequence. I hope this is clear enough.
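For concreteness, here is a minimal sketch of the sequence-labeling framing described above (one sequence in, one punctuation label per token out). The model name and label set are illustrative assumptions, not taken from either repo:

```python
from torch import nn
from transformers import AutoModel, AutoTokenizer

LABELS = ["0", "PERIOD", "COMMA", "QUESTION"]  # hypothetical label set

class SeqLabelPunctuator(nn.Module):
    """Predict a punctuation label for every token in the input."""
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, len(LABELS))

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden)  # (batch, seq_len, num_labels)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer("this is the first sentence this is the second sentence",
                  return_tensors="pt")
logits = SeqLabelPunctuator()(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # one label distribution per token, in a single pass
```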

thaitrinh commented:

Thank you very much!

Is your solution here: https://github.com/yuboona/punctuation-restoration-pytorch?

You meant:

  • In your solution: Input: "this is the first sentence this is the second sentence" -> Output: "This is the first sentence. This is the second sentence."
  • In this repo: Input: "this is the first sentence this is the second sentence" -> Output: "0 0 0 0 PERIOD 0 0 0 0 PERIOD".

Do I understand you correctly?


yuboona commented Nov 17, 2020

> Thank you very much!
>
> Is your solution here: https://github.com/yuboona/punctuation-restoration-pytorch?
>
> You meant:
> In your solution: Input: "this is the first sentence this is the second sentence" -> Output: "This is the first sentence. This is the second sentence."
> In this repo: Input: "this is the first sentence this is the second sentence" -> Output: "0 0 0 0 PERIOD 0 0 0 0 PERIOD".
> Do I understand you correctly?

Not really. I actually meant:

  • In my repo: Input: "this is the first sentence this is the second sentence" -> Output: "0 0 0 0 PERIOD 0 0 0 0 PERIOD".
  • In this repo:
    1. Input: "this is the first sentence this is the second sentence" -> Output: "PERIOD"
    2. Input: "is the first sentence this is the second sentence this" -> Output: "0"
    3. Input: "the first sentence this is the second sentence this is" -> Output: "0"
    4. .......
    5. .......

insert_target() generates input data and labels like this (sketched below).
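For illustration, here is a rough sketch of the windowing these examples describe: slide a fixed-size window over the word stream, wrapping around at the end, and pair each window with the label of its middle word. The window size and exact centering are assumptions; the real insert_target() in data.py may pad or center slightly differently:

```python
def make_windows(words, labels, window_size=10):
    """Yield one (window, target_label) pair per word in the corpus."""
    n = len(words)
    for i in range(n):
        # fixed-size window starting at word i, wrapping past the end
        window = [words[(i + j) % n] for j in range(window_size)]
        # the label belongs to the word in the middle of the window
        target = labels[(i + window_size // 2 - 1) % n]
        yield window, target

words = "this is the first sentence this is the second sentence".split()
labels = ["0", "0", "0", "0", "PERIOD", "0", "0", "0", "0", "PERIOD"]
for window, target in list(make_windows(words, labels))[:3]:
    print(" ".join(window), "->", target)
# this is the first sentence this is the second sentence -> PERIOD
# is the first sentence this is the second sentence this -> 0
# the first sentence this is the second sentence this is -> 0
```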

thaitrinh commented:

I got it. Thank you so much!

Have you compared the performance of the two models? Do you think the data preparation in this repo helps the model's performance?


yuboona commented Nov 17, 2020

> I got it. Thank you so much!
>
> Have you compared the performance of the two models? Do you think the data preparation in this repo helps the model's performance?

In fact, using BERT in these two ways gives nearly the same performance, so I don't really think this data preparation helps. Moreover, it makes the total input much longer than in the first way, so training time increases.
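(As a rough illustration, not from the thread: with a 10-word window, a corpus of N words becomes N overlapping sequences of 10 tokens each, so the model processes about 10×N tokens per epoch, versus roughly N tokens in the sequence-labeling setup.)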

thaitrinh commented:

Great! Many thanks, Yuboona! I will try your approach as well!


kotikkonstantin commented Nov 21, 2021

@thaitrinh @yuboona Guys, I've made a visualization of the target preparation. You can see here.
Also, in my version I use stacked hidden states instead of LM logits. For Russian this works better, and it decreases the input dimension of the FC layer.
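A minimal sketch of what classifying from stacked hidden states might look like, assuming a HuggingFace BERT and a stack of the last 4 layers (both are illustrative assumptions; the exact setup isn't shown in this thread):

```python
import torch
from torch import nn
from transformers import AutoModel

class StackedStatesPunctuator(nn.Module):
    """Classify the middle token from concatenated hidden layers."""
    def __init__(self, model_name="bert-base-uncased",
                 num_layers=4, num_labels=4):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name,
                                              output_hidden_states=True)
        self.num_layers = num_layers
        # 4 * 768 = 3072 FC inputs, versus a ~30k-dim LM-logit vector
        self.fc = nn.Linear(num_layers * self.bert.config.hidden_size,
                            num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states: tuple of (num_layers + 1) tensors,
        # each of shape (batch, seq_len, hidden_size)
        stacked = torch.cat(out.hidden_states[-self.num_layers:], dim=-1)
        center = stacked[:, stacked.size(1) // 2, :]  # the target word
        return self.fc(center)
```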
