How does the function insert_target() in data.py work? #7
I found that if I use insert_target() in data.py, the input data is split into many sequences that have a lot of overlapping words. I would like to know why the data is processed like this; it seems to produce a lot of repeated data.
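For context, here is a minimal sketch of how a sliding-window target preparation like this typically works. It is illustrative only (the function name, padding token, and window size are assumptions), not the repo's actual insert_target():

```python
def make_windows(words, labels, window_size=5):
    """Build one (context_window, label) example per word.

    The window is centered on the word whose label is predicted, so
    consecutive examples share window_size - 1 words -- the overlap
    the question refers to.
    """
    assert window_size % 2 == 1, "odd size so there is a single center word"
    half = window_size // 2
    padded = ["<PAD>"] * half + list(words) + ["<PAD>"] * half
    return [
        (padded[i : i + window_size], label)  # padded[i + half] == words[i]
        for i, label in enumerate(labels)
    ]

words = "how are you doing today".split()
labels = ["O", "O", "O", "O", "QUESTION"]
for window, label in make_windows(words, labels, window_size=3):
    print(window, "->", label)
```

Every word gets its own window, and adjacent windows differ in only one position, which is why the prepared dataset repeats most words many times.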
I have the same question. @yuboona, have you figured out why we need to do that? Thank you!
I think it's a trick to use more contextual information. In my previous solution, I treated punctuation restoration as a sequence labeling task: input one sentence to BERT at a time, and output the labels of all the words in that sentence. But in this repo, the model takes one sequence as input and predicts only the label of the word in the middle of that sequence. I hope this is clear enough.
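To make the two formulations concrete, here is a hedged sketch of the classification head in each case (the dimensions and the plain linear head are illustrative assumptions, not the code of either repo):

```python
import torch
import torch.nn as nn

batch, seq_len, hidden, num_labels = 8, 32, 768, 4
encoder_out = torch.randn(batch, seq_len, hidden)  # stand-in for BERT's output
head = nn.Linear(hidden, num_labels)

# Formulation 1 (sequence labeling): one forward pass per sentence,
# a label for every token position.
all_logits = head(encoder_out)                      # (batch, seq_len, num_labels)

# Formulation 2 (this repo, as described above): one forward pass per
# window, a label only for the token in the middle of the window.
center_logits = head(encoder_out[:, seq_len // 2])  # (batch, num_labels)
```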
Thank you very much! Is your solution here: https://github.com/yuboona/punctuation-restoration-pytorch? You meant:
Not really. What I actually meant is:
I got it. Thank you so much! Have you compared the performance of the two models? Do you think the data preparation in this repo could help the model's performance?
In fact, when using BERT in these two ways, the performance is nearly the same, so I don't really think this data preparation helps. Moreover, this data preparation makes the input sequences much longer (compared to the first way), which increases training time.
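A rough way to see the cost difference: with window length L, every word of the corpus is fed through the model about L times (once per window it falls into), versus once in the sequence-labeling setup, so the total number of tokens processed grows by roughly a factor of L. With illustrative numbers, a 100,000-word corpus and L = 50 means about 5,000,000 window tokens instead of about 100,000.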
Great! Many thanks, Yuboona! I will try your approach as well!
@thaitrinh @yuboona Guys, I've made a visualization of the target preparation. You can see it here.