Unsupervised Distributional Semantic Class Induction with Watset

This repository contains the code for running semantic class induction with Watset. Because this experiment mostly reuses code written for other experiments, the repository consists mainly of convenience wrappers around the corresponding tools.

Dataset Download

$ make dt-59g-deps-wpf1k-fpw1k.csv.gz # distributional thesaurus (DT)
$ gunzip dt-59g-deps-wpf1k-fpw1k.csv.gz

$ make super-senses-wordnet.tsv # WordNet super senses dataset
$ make wordnet-flat-cut-depth-4-clusters-2017-minclusize-2.tsv # WordNet slices, d=4
$ make wordnet-flat-cut-depth-5-clusters-5737-minclusize-2.tsv # WordNet slices, d=5
$ make wordnet-flat-cut-depth-6-clusters-11274-minclusize-2.tsv # WordNet slices, d=6
$ make watset.jar # Watset

Dataset Pruning

The original distributional thesaurus (dt-59g-deps-wpf1k-fpw1k.csv.gz) is large, so we used a pruned version of this dataset. The commands below produce pruned versions for two values of the -t option, 0.001 and 0.01.

$ ./dt-wordnet.py -t 0.001 -w super-senses-wordnet.tsv dt-59g-deps-wpf1k-fpw1k.csv -o dt-wordnet-0_001.txt
$ ./dt-wordnet.py -t 0.01 -w super-senses-wordnet.tsv dt-59g-deps-wpf1k-fpw1k.csv -o dt-wordnet-0_01.txt

The WordNet slices are pruned in the same way, e.g., for d=4:

$ ./dt-wordnet.py -t 0.001 -w wordnet-flat-cut-depth-4-clusters-2017-minclusize-2.tsv dt-59g-deps-wpf1k-fpw1k.csv -o dt-wordnet-d4-0_001.txt
$ ./dt-wordnet.py -t 0.01 -w wordnet-flat-cut-depth-4-clusters-2017-minclusize-2.tsv dt-59g-deps-wpf1k-fpw1k.csv -o dt-wordnet-d4-0_01.txt
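The pruning step above can be pictured with a minimal Python sketch. This is not the code of dt-wordnet.py; it only illustrates one plausible reading of the options. It assumes that the thesaurus stores one edge per line as word, neighbour, and weight separated by tabs, that -t is a minimum weight, and that -w restricts the edges to words found in the given WordNet file. The function name prune_dt and the toy data are hypothetical.

```python
import csv
import io


def prune_dt(rows, vocabulary, threshold):
    """Yield (word, neighbour, weight) edges that survive pruning.

    ASSUMPTION: an edge is kept when both words are in `vocabulary`
    and its weight is at least `threshold`.
    """
    for word, neighbour, weight in rows:
        weight = float(weight)
        if word in vocabulary and neighbour in vocabulary and weight >= threshold:
            yield word, neighbour, weight


# Toy stand-in for the thesaurus; ASSUMED layout: word<TAB>neighbour<TAB>weight.
toy_dt = io.StringIO(
    "apple\tpear\t0.120\n"
    "apple\txyzzy\t0.300\n"
    "pear\tapple\t0.0005\n"
    "pear\tplum\t0.020\n"
)
toy_vocabulary = {"apple", "pear", "plum"}  # hypothetical WordNet words

reader = csv.reader(toy_dt, delimiter="\t")
pruned = list(prune_dt(reader, toy_vocabulary, threshold=0.001))
for edge in pruned:
    print(*edge, sep="\t")
```

In this toy run the edge to "xyzzy" is dropped because the word is outside the vocabulary, and the edge with weight 0.0005 is dropped because it falls below the threshold. Raising the threshold from 0.001 to 0.01 can only shrink the result further.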

Running

Make sure that all the variables in evaluate.sh are set correctly. This script runs the whole pipeline: it runs the clustering algorithms and then evaluates them. Note that the -s flag of supersenses_nmpu.groovy enables sampling, which is extremely slow. Sampling can be disabled, which is the recommended behaviour during prototyping.

The results of the sampling can be checked with a t-test using sampled_ttest.groovy. The output is tab-separated, with the columns file1, file2, mean1, mean2, var1, var2, pvalue.

$ ./sampled_ttest.groovy eval/dt-wordnet-0_01-*.ser
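Since the output is tab-separated, it is easy to post-process. The Python sketch below filters the rows whose p-value falls below a chosen significance level. It assumes that the output has no header row and that the columns appear in the order listed above. The function name significant_pairs, the default level of 0.05, and the sample rows are made up for illustration and are not real results.

```python
import csv
import io

# Column order as documented for sampled_ttest.groovy.
COLUMNS = ["file1", "file2", "mean1", "mean2", "var1", "var2", "pvalue"]


def significant_pairs(lines, alpha=0.05):
    """Return (file1, file2, pvalue) for rows whose p-value is below alpha."""
    # ASSUMPTION: no header row, so the field names are supplied explicitly.
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t")
    result = []
    for row in reader:
        pvalue = float(row["pvalue"])
        if pvalue < alpha:
            result.append((row["file1"], row["file2"], pvalue))
    return result


# Made-up rows for illustration only; not real results.
sample = io.StringIO(
    "a.ser\tb.ser\t0.50\t0.40\t0.010\t0.020\t0.003\n"
    "a.ser\tc.ser\t0.50\t0.49\t0.010\t0.010\t0.410\n"
)
pairs = significant_pairs(sample)
print(pairs)
```

To use it on real output, the script's output could be saved to a file and the open file object passed to significant_pairs in place of the sample, assuming the script writes its table to standard output.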

