This is the code used in the paper:
"Acquiring Predicate Paraphrases from News Tweets"
Vered Shwartz, Gabriel Stanovsky and Ido Dagan. *SEM 2017.
The steps performed to create the resource:
We executed the script `get_daily_news_stream.sh`,
and now we can sit back and relax while the job is performed automatically for us... But if you want a detailed step-by-step explanation:
- Obtain news tweets:
Querying the Twitter Search API for news:
`get_news_tweets_stream.py --consumer_key=<consumer_key> --consumer_secret=<consumer_secret> --access_token=<access_token> --access_token_secret=<access_token_secret> [--until=<until>]`
where `consumer_key`, `consumer_secret`, `access_token` and `access_token_secret` are obtained by registering an app with the Twitter API. The optional argument `until` is a date in the format `YYYY/MM/dd`, in case you'd like to retrieve tweets only up until this date; note that the Search API only supports going back up to one week. If this argument is not specified, the script retrieves current tweets. The script saves the tweets to a file named by the date on which they were created.
Important note: we downloaded TwitterSearch and changed its code to add the news filter to the search URL. If you want to get news tweets, you should do the same.
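For illustration only, here is a minimal sketch of how such a query could look with the TwitterSearch library. The `filter:news` keyword is a stand-in assumption for the authors' patched news filter, and the output file naming is likewise an assumption, not necessarily the actual behavior of `get_news_tweets_stream.py`:

```python
import datetime
import json

from TwitterSearch import TwitterSearch, TwitterSearchOrder, TwitterSearchException


def fetch_news_tweets(consumer_key, consumer_secret, access_token, access_token_secret, until=None):
    """Toy sketch: query the Search API and bucket tweets into one file per creation date."""
    try:
        tso = TwitterSearchOrder()
        # Assumption: approximate the news filter with the 'filter:news' search operator;
        # the authors instead patched TwitterSearch to add the filter to the search URL.
        tso.set_keywords(['filter:news'])
        tso.set_language('en')
        if until is not None:
            # e.g. datetime.date(2017, 2, 28); the Search API only goes back about a week
            tso.set_until(until)

        ts = TwitterSearch(consumer_key=consumer_key,
                           consumer_secret=consumer_secret,
                           access_token=access_token,
                           access_token_secret=access_token_secret)

        for tweet in ts.search_tweets_iterable(tso):
            # Save each tweet to a file named by its creation date, e.g. '2017-02-28'
            day = datetime.datetime.strptime(
                tweet['created_at'], '%a %b %d %H:%M:%S %z %Y').strftime('%Y-%m-%d')
            with open(day, 'a', encoding='utf-8') as out:
                out.write(json.dumps(tweet) + '\n')
    except TwitterSearchException as e:
        print(e)
```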
- Extract propositions:
`prop_extraction --in=[tweet_folder] --out=[prop_folder]`
Note: You can also install our proposition extraction as a stand-alone tool.
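Assuming one tweet file per day accumulates under a folder such as `news_stream/tweets/` (the folder names below are placeholders), this step can be driven from Python roughly like this:

```python
import subprocess
from pathlib import Path

tweet_folder = Path('news_stream/tweets')   # placeholder input folder (one file per day)
prop_folder = Path('news_stream/props')     # placeholder output folder
prop_folder.mkdir(parents=True, exist_ok=True)

# prop_extraction reads the tweet files under --in and writes the
# extracted propositions under --out.
subprocess.run(['prop_extraction',
                '--in={}'.format(tweet_folder),
                '--out={}'.format(prop_folder)],
               check=True)
```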
- Generate positive instances:
`get_corefering_predicates.py [tweets_file] [out_file]`
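As a rough illustration of the idea behind this step (two predicates observed on the same day with matching arguments are taken as a positive paraphrase instance), here is a toy version. The tuple format is assumed, and the real script's argument matching is considerably more involved than exact string match:

```python
from collections import defaultdict
from itertools import combinations


def get_positive_instances(propositions):
    """Pair up predicates whose arguments match within the same day.

    `propositions` is assumed to be an iterable of (date, predicate, arg0, arg1)
    tuples, e.g. ('2017-02-28', '{a0} defeated {a1}', 'Spurs', 'the Rockets').
    """
    predicates_by_key = defaultdict(set)
    for date, pred, arg0, arg1 in propositions:
        # Crude approximation of argument coreference: exact lower-cased match.
        predicates_by_key[(date, arg0.lower(), arg1.lower())].add(pred)

    positives = []
    for (date, arg0, arg1), preds in predicates_by_key.items():
        # Every pair of distinct predicates sharing both arguments on the same
        # day is taken as a candidate paraphrase (positive) instance.
        for p1, p2 in combinations(sorted(preds), 2):
            positives.append((date, p1, p2, arg0, arg1))
    return positives
```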
- Package the resource:
`cat news_stream/positive/* | cut -f1,2,4,5,6,7,8,10,11,12,13,14 > resource`
`python -u package_resource.py resource [repository_dir]`
where `news_stream/positive/` is where we keep all the positive instance files, `cut` is used to remove the tweets (to comply with the Twitter policy), and `package_resource.py` updates the resource file under `[repository_dir]/resource` and pushes the changes.
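For reference, the `cut` step above could equivalently be done in Python along these lines; the column indices come directly from the command above, under the assumption that the dropped columns hold the raw tweet text:

```python
import glob

# 1-based fields kept by `cut -f1,2,4,5,6,7,8,10,11,12,13,14`, as 0-based indices.
KEEP = [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13]

with open('resource', 'w', encoding='utf-8') as out:
    for path in sorted(glob.glob('news_stream/positive/*')):
        with open(path, encoding='utf-8') as positives:
            for line in positives:
                fields = line.rstrip('\n').split('\t')
                # Drop the tweet-text columns to comply with the Twitter policy.
                out.write('\t'.join(fields[i] for i in KEEP) + '\n')
```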