The data sets consist of isolated Swedish sentences from the online discussion forums familjeliv.se and flashback.se. I have hand-labeled each sentence with its speech act. All the data sets are available on Kaggle.
The data were retrieved from corpora by Språkbanken and are licensed under CC BY 4.0. The original corpora were annotated in XML; I reformatted them as CoNLL-U. I also converted the POS tags to Universal Dependencies POS tags and retagged the dependency relations to Universal Dependencies.
These are all CoNLL-U corpora. Every sentence is manually annotated with a speech act and automatically annotated with a sentiment label (positive, negative, or neutral) and that label's probability score.
dev-set.conllu.bz2
- The dev (or validation) set.
dev-test-set.conllu.bz2
- A test split of the dev set. Used for developing the rule-based classifier.
dev-test-set-upsampled.conllu.bz2
- An upsampled version of dev-test-set.
dev-train-set.conllu.bz2
- A train split of the dev set. Used for training the rule-based classifier.
dev-train-set-upsampled.conllu.bz2
- An upsampled version of dev-train-set.
test-set.conllu.bz2
- The test set used for evaluating the classifiers.
test-set-upsampled.conllu.bz2
- An upsampled version of the test-set.
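The README does not say how the upsampled files were produced. As a rough illustration of what upsampling usually means, the sketch below uses random oversampling: sentences from the less frequent speech-act classes are duplicated at random until every class is as large as the largest one. The function name `upsample`, the `label_of` callback, and the fixed seed are all assumptions made for this example, not the procedure actually used for these files.

```python
import random
from collections import defaultdict


def upsample(items, label_of, seed=0):
    """Random oversampling (an assumed method, not confirmed by the README).

    Duplicates items of the less frequent labels, drawn with replacement,
    until every label occurs as often as the most frequent one.
    """
    rng = random.Random(seed)

    # Group the items by their label.
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)
    if not by_label:
        return []

    target = max(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the missing examples with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced
```

With sentences represented as `(text, speech_act)` pairs, `upsample(pairs, lambda pair: pair[1])` returns a shuffled list in which every speech act occurs equally often.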
The train set is too large for GitHub, so it will be available somewhere else.
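Each sentence is stored as a CoNLL-U block whose comment lines carry the metadata, including the speech act and sentiment annotations. For example: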
# sent_id = 2200888
# text = Känns hoppfull med så många exempel.
# date = 2009-10-26 16:19:10
# url = http://www.familjeliv.se/forum/thread/48269320-bara-solsken-och-hopp/1#anchor-m3
# genre = internet_forum
# x_sent_id = 053044fa6
# speech_act = expressive
# sentiment_label = positive
# sentiment_score = 0.9705862402915955
1 Känns känna|kännas VERB VB Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Pass 0 root _ _
2 hoppfull hoppfull ADJ JJ Case=Nom|Definite=Ind|Degree=Pos|Gender=Com|Number=Sing 1 xcomp _ _
3 med med ADP PP _ 6 case _ _
4 så så ADVERB AB _ 5 advmod _ _
5 många _ ADJ JJ Case=Nom|Definite=Def,Ind|Degree=Pos|Gender=Com,Neut|Number=Plur 6 amod _ _
6 exempel exempel NOUN NN Case=Nom|Definite=Ind|Gender=Neut|Number=Plur 1 obl _ SpaceAfter=No
7 . _ PUNCT MAD _ 1 punct _ _
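A block like the one above can be read with the Python standard library alone. The sketch below is one possible way to do it, not a loader shipped with the data sets. The function names `read_sentences` and `parse_conllu` are made up for this example. It assumes the files are UTF-8, that token columns are tab-separated as the CoNLL-U specification requires, and that a blank line ends each sentence.

```python
import bz2


def parse_conllu(lines):
    """Yield (metadata, tokens) for each sentence block in an iterable of lines.

    metadata maps comment keys such as "speech_act" to their string values;
    tokens is a list of column lists, one per token line.
    """
    metadata, tokens = {}, []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence block.
            if metadata or tokens:
                yield metadata, tokens
            metadata, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines look like "# key = value".
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key.strip()] = value.strip()
        else:
            # Assumes tab-separated columns, per the CoNLL-U specification.
            tokens.append(line.split("\t"))
    # Emit the last block if the input does not end with a blank line.
    if metadata or tokens:
        yield metadata, tokens


def read_sentences(path):
    """Yield (metadata, tokens) from a bz2-compressed CoNLL-U file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        yield from parse_conllu(handle)
```

For instance, iterating over `read_sentences("dev-set.conllu.bz2")` gives each sentence's label as `metadata["speech_act"]` and its sentiment probability as `float(metadata["sentiment_score"])`.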