My graduation project (done with three friends), covering Topic Detection and Tracking (TDT) tasks.
Contains some TDT5 token files.
Contains the corresponding boundary files.
Contains only the main() function.
Code shared by all other parts, e.g., the class 'Story'.
If we don't have boundary files, how can we find the boundary between two documents in a token file?
Read the data and do some pre-processing.
Detect whether two stories are linked, i.e., whether they discuss the same topic.
Detect some topics.
Detect the first story of a certain event.
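
To make the link detection and first story detection tasks above more concrete, here is a minimal sketch in Python. It is not the code from this project: the function names (`term_vector`, `cosine`, `is_linked`, `first_stories`), the whitespace tokenizer, and the threshold of 0.3 are all assumptions chosen for illustration. It treats two stories as linked when the cosine similarity of their word-count vectors reaches the threshold, and treats a story as a first story when it is not linked to any earlier story.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased, whitespace-split bag of words (a deliberately crude tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_linked(story_a, story_b, threshold=0.3):
    """Hypothetical link test: similarity at or above an assumed threshold."""
    return cosine(term_vector(story_a), term_vector(story_b)) >= threshold


def first_stories(stories, threshold=0.3):
    """Return indices of stories that are not linked to any earlier story."""
    seen = []    # term vectors of all stories processed so far
    firsts = []  # indices flagged as the first story of a new event
    for index, text in enumerate(stories):
        vector = term_vector(text)
        if all(cosine(vector, old) < threshold for old in seen):
            firsts.append(index)
        seen.append(vector)
    return firsts
```

For example, given the three stories "earthquake hits city center", "city center earthquake damage reported", and "football team wins final match" in that order, `first_stories` flags the first and third as first stories, because the second shares most of its words with the first. A real system would normally use weighted terms (for instance TF-IDF) and a tuned threshold rather than raw counts.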