English Knowledge Resources
### SQL-Based Resources
@TODO: The download links below still point to the old BIUTEE webpage. Update them to refer to the new Maven repository.
Some knowledge resources are stored as MySQL tables and distributed as compressed .sql files. To use them:
- Download the resources from the links in the table below. Each file contains one MySQL schema, which may hold several knowledge resources. You don't need to download all of them; download only the schema files that contain the resources you want to use.
- Install MySQL, a free SQL server.
- Install MySQL Workbench, its administration tool.
- Start the server.
- Connect to the server with MySQL Workbench, and there:
  - Create a user named db_readonly with the password BIUTEE: *Users and Privileges --> Add Account*
  - Import the schema files into the database: *Data Import/Restore --> Import from Dump Project Folder --> (path of the folder containing the uncompressed .sql files) --> Load Folder Contents --> (select all required schemas) --> Start Import*
  - Make sure that user db_readonly has read (SELECT) privileges on all tables in the imported schemas.
- Define an environment variable named MYSQL whose value is the MySQL server address (host name or IP address) and port, for example: dbsql.cs.biu.ac.il:3306.
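As a quick check of the last two steps, the Java sketch below builds a JDBC URL from the MYSQL variable and opens a connection as db_readonly. It assumes the standard MySQL Connector/J URL form `jdbc:mysql://host:port/schema` and that Connector/J is on the classpath (only `connect` needs it at run time). `MysqlSettings`, its methods, and the fallback to port 3306 are hypothetical conveniences, not part of BIUTEE, and the schema name you pass must be the actual name of one of your imported schemas.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper (not part of BIUTEE): builds a JDBC URL from the MYSQL
// environment variable and connects with the db_readonly account.
final class MysqlSettings {
    static final String USER = "db_readonly";
    static final String PASSWORD = "BIUTEE";
    // Assumption: fall back to MySQL's default port if MYSQL holds only a host.
    static final int DEFAULT_PORT = 3306;

    // Turns "host:port" (e.g. "dbsql.cs.biu.ac.il:3306") into
    // "jdbc:mysql://host:port/schema". Handles host names and IPv4 addresses.
    static String jdbcUrl(String mysqlVar, String schema) {
        if (mysqlVar == null || mysqlVar.isBlank()) {
            throw new IllegalArgumentException("MYSQL environment variable is not set");
        }
        String value = mysqlVar.trim();
        int colon = value.lastIndexOf(':');
        String host = colon >= 0 ? value.substring(0, colon) : value;
        int port = colon >= 0 ? Integer.parseInt(value.substring(colon + 1)) : DEFAULT_PORT;
        return "jdbc:mysql://" + host + ":" + port + "/" + schema;
    }

    // Opens a read-only session on one imported schema; requires MySQL
    // Connector/J on the classpath at run time.
    static Connection connect(String schema) throws SQLException {
        String url = jdbcUrl(System.getenv("MYSQL"), schema);
        return DriverManager.getConnection(url, USER, PASSWORD);
    }
}
```

With MYSQL set to dbsql.cs.biu.ac.il:3306, `connect(schema)` would open `jdbc:mysql://dbsql.cs.biu.ac.il:3306/` followed by the schema name.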
| Schema Name | Knowledge Resources in Configuration | Schema Download | File Size (Compressed) |
|---|---|---|---|
| BAP (Directional Similarity) | BAP | Download | 111 MB |
| Lin Similarity | LIN_DEPENDENCY_ORIGINAL, LIN_PROXIMITY_ORIGINAL | Download | 236 MB |
| Original DIRT | ORIG_DIRT | Download | 55 MB |
| Wikipedia Knowledge Resource | WIKIPEDIA | Download | 214 MB |
| Binary Lin, Dependency Reuters | BINARY_LIN, LIN_DEPENDENCY_REUTERS | Download | 2.4 GB |
| Framenet | FRAMENET | Download | 228 KB |
| Geo (Geographical Knowledge Resource) | GEO | Download | 1.4 MB |
| ReVerb (Distributional Similarity with Global Constraints) | REVERB | Download | 161 MB |
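Because one schema file can hold several resources, it can help to work out which downloads a configuration actually needs. The Java sketch below encodes the table above as a lookup from configuration resource name to schema and returns the distinct schemas to fetch. `SchemaPicker` and `schemasFor` are hypothetical helpers, not part of BIUTEE.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: maps each configuration resource name in the table
// to the schema download that contains it.
final class SchemaPicker {
    private static final Map<String, String> RESOURCE_TO_SCHEMA = new HashMap<>();

    static {
        RESOURCE_TO_SCHEMA.put("BAP", "BAP (Directional Similarity)");
        RESOURCE_TO_SCHEMA.put("LIN_DEPENDENCY_ORIGINAL", "Lin Similarity");
        RESOURCE_TO_SCHEMA.put("LIN_PROXIMITY_ORIGINAL", "Lin Similarity");
        RESOURCE_TO_SCHEMA.put("ORIG_DIRT", "Original DIRT");
        RESOURCE_TO_SCHEMA.put("WIKIPEDIA", "Wikipedia Knowledge Resource");
        RESOURCE_TO_SCHEMA.put("BINARY_LIN", "Binary Lin, Dependency Reuters");
        RESOURCE_TO_SCHEMA.put("LIN_DEPENDENCY_REUTERS", "Binary Lin, Dependency Reuters");
        RESOURCE_TO_SCHEMA.put("FRAMENET", "Framenet");
        RESOURCE_TO_SCHEMA.put("GEO", "Geo (Geographical Knowledge Resource)");
        RESOURCE_TO_SCHEMA.put("REVERB", "ReVerb (Distributional Similarity with Global Constraints)");
    }

    // Returns the distinct schemas (sorted by name) needed for the given
    // resources; rejects names that are not SQL-based resources.
    static Set<String> schemasFor(List<String> resources) {
        Set<String> schemas = new TreeSet<>();
        for (String resource : resources) {
            String schema = RESOURCE_TO_SCHEMA.get(resource);
            if (schema == null) {
                throw new IllegalArgumentException("Not an SQL-based resource: " + resource);
            }
            schemas.add(schema);
        }
        return schemas;
    }
}
```

For instance, LIN_PROXIMITY_ORIGINAL and LIN_DEPENDENCY_ORIGINAL both resolve to the single Lin Similarity download.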
### Redis-based Resources
#### Distributional Similarity
Distribution
- Redis database files
- License: MIT license
- Download
##### Lexical
Java interface: SimilarityStorageBasedLexicalResource
###### Lin proximity-based
Distributional similarity rules for English nouns, adjectives, adverbs, and verbs that appear at least 10 times in the corpus. The similarities were calculated by applying Lin's method [Lin 1998] to the Reuters RCV1 and RCV2 corpora, without dependency-based features. The top 1000 similarities were retained for each element.
About 28M rules.
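Lin's measure scores two words by the share of their combined feature weight that falls on features they have in common. The Java sketch below shows that formula together with the top-k cut used to keep the 1000 best similarities per element. It assumes each word's features have already been extracted and weighted (Lin 1998 uses mutual-information weights; here they are simply given as maps), and `LinSimilarity` is a hypothetical name unrelated to the BIUTEE classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Lin's similarity over precomputed feature weights.
final class LinSimilarity {
    // sim(u, v) = sum over shared features f of (w_u(f) + w_v(f))
    //             / (sum of all weights of u + sum of all weights of v)
    static double similarity(Map<String, Double> u, Map<String, Double> v) {
        double shared = 0.0;
        for (Map.Entry<String, Double> e : u.entrySet()) {
            Double vw = v.get(e.getKey());
            if (vw != null) {
                shared += e.getValue() + vw;
            }
        }
        double total = sum(u) + sum(v);
        return total == 0.0 ? 0.0 : shared / total;
    }

    private static double sum(Map<String, Double> weights) {
        double s = 0.0;
        for (double w : weights.values()) {
            s += w;
        }
        return s;
    }

    // Ranks all candidates by similarity to the target and keeps the k best.
    static List<String> topK(Map<String, Double> target,
                             Map<String, Map<String, Double>> candidates, int k) {
        List<String> names = new ArrayList<>(candidates.keySet());
        names.sort(Comparator.comparingDouble(
                (String n) -> similarity(target, candidates.get(n))).reversed());
        return new ArrayList<>(names.subList(0, Math.min(k, names.size())));
    }
}
```

The measure is symmetric, so a rule u → v and its reverse get the same score; the directional resources further below address that.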
###### Lin dependency-based
Distributional similarity rules for English nouns, adjectives, adverbs, and verbs that appear at least 10 times in the corpus. The similarities were calculated by applying Lin's method [Lin 1998] to the Reuters RCV1 and RCV2 corpora, with dependency-based features. The top 1000 similarities were retained for each element.
About 20M rules.
###### Directional similarities, Reuters
Directional similarity rules for English nouns, adjectives, adverbs, and verbs that appear at least 10 times in the corpus. The similarities were calculated by applying the balanced AP (bap) measure [Kotlerman et al. 2009, Kotlerman et al. 2010] to the Reuters RCV1 and RCV2 corpora, with dependency-based features. The top 1000 similarities were retained for each element.
About 10M rules for the left side and about 15M rules for the right side.
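The bap measure is directional: it asks how well the features of the left-hand term are included among those of the right-hand term. The Java sketch below follows a reading of balAPinc from Kotlerman et al. (2010), the geometric mean of Lin similarity and an average-precision-style inclusion score (APinc). It assumes that this is what "bap" denotes here, takes precomputed feature weights as input, and omits details such as feature-vector truncation; `BalancedAp` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of balAPinc(u -> v) = sqrt(Lin(u, v) * APinc(u -> v)).
final class BalancedAp {
    static double lin(Map<String, Double> u, Map<String, Double> v) {
        double shared = 0.0, total = 0.0;
        for (Map.Entry<String, Double> e : u.entrySet()) {
            Double vw = v.get(e.getKey());
            if (vw != null) shared += e.getValue() + vw;
            total += e.getValue();
        }
        for (double w : v.values()) total += w;
        return total == 0.0 ? 0.0 : shared / total;
    }

    // Features sorted by descending weight (index 0 = rank 1).
    private static List<String> ranked(Map<String, Double> weights) {
        List<String> fs = new ArrayList<>(weights.keySet());
        fs.sort((a, b) -> Double.compare(weights.get(b), weights.get(a)));
        return fs;
    }

    // APinc(u -> v) = (1/|Fu|) * sum over ranks r of u's features of
    // P(r) * rel(f_r), where P(r) is the fraction of u's top-r features found
    // in v, and rel(f) = 1 - rank_v(f) / (|Fv| + 1), or 0 if f is not in v.
    static double apInc(Map<String, Double> u, Map<String, Double> v) {
        if (u.isEmpty()) return 0.0;
        List<String> fu = ranked(u);
        List<String> fv = ranked(v);
        Map<String, Integer> rankInV = new HashMap<>();
        for (int i = 0; i < fv.size(); i++) rankInV.put(fv.get(i), i + 1);
        double sum = 0.0;
        int included = 0;
        for (int r = 1; r <= fu.size(); r++) {
            Integer rankV = rankInV.get(fu.get(r - 1));
            if (rankV == null) continue; // rel = 0, contributes nothing
            included++;
            double precision = (double) included / r;
            double rel = 1.0 - (double) rankV / (fv.size() + 1);
            sum += precision * rel;
        }
        return sum / fu.size();
    }

    static double balApInc(Map<String, Double> u, Map<String, Double> v) {
        return Math.sqrt(lin(u, v) * apInc(u, v));
    }
}
```

When a narrow term's features are a subset of a broader term's, the score is higher in the narrow-to-broad direction than in the reverse, unlike the symmetric Lin score.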
###### Directional similarities, UKWac
Directional similarity rules for English nouns, adjectives, adverbs, and verbs that appear at least 10 times in the corpus. The similarities were calculated by applying the balanced AP (bap) measure [Kotlerman et al. 2009, Kotlerman et al. 2010] to the English UKWac corpus, with dependency-based features. The top 1000 similarities were retained for each element.
About 21M rules for the left side and about 33M rules for the right side.
##### Syntactic
Java interface: SimilarityStorageBasedDIRTSyntacticResource
###### DIRT
Distributional similarity rules for English dependency paths that appear at least 100 times in the corpus. The similarities were calculated by applying the DIRT method [Lin 1998] to the Reuters RCV1 and RCV2 corpora. The top 1000 similarities were retained for each element.
About 10M rules.
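DIRT works on dependency paths with two argument slots, X and Y. As it is usually described, the similarity of two paths is the geometric mean of the Lin similarities of their X-slot fillers and of their Y-slot fillers. The Java sketch below assumes the slot-filler weights are already computed; `DirtSimilarity` and `PathFeatures` are hypothetical names, unrelated to SimilarityStorageBasedDIRTSyntacticResource.

```java
import java.util.Map;

// Hypothetical sketch of DIRT-style path similarity.
final class DirtSimilarity {
    // Filler weights for the X and Y slots of one dependency path.
    static final class PathFeatures {
        final Map<String, Double> slotX;
        final Map<String, Double> slotY;

        PathFeatures(Map<String, Double> slotX, Map<String, Double> slotY) {
            this.slotX = slotX;
            this.slotY = slotY;
        }
    }

    // Lin's measure over the fillers of one slot.
    static double slotSimilarity(Map<String, Double> a, Map<String, Double> b) {
        double shared = 0.0, total = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double bw = b.get(e.getKey());
            if (bw != null) shared += e.getValue() + bw;
            total += e.getValue();
        }
        for (double w : b.values()) total += w;
        return total == 0.0 ? 0.0 : shared / total;
    }

    // Geometric mean of the X-slot and Y-slot similarities.
    static double pathSimilarity(PathFeatures p1, PathFeatures p2) {
        return Math.sqrt(slotSimilarity(p1.slotX, p2.slotX)
                * slotSimilarity(p1.slotY, p2.slotY));
    }
}
```

Using the geometric mean means two paths score zero if either slot has no fillers in common, so both arguments must be shared for a rule to be proposed.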
###### Distributional Similarity with Global Constraints (ReVerb), Redis-based
Distributional similarity rules for English predicates. The similarities were calculated by applying Berant's global optimization [Berant et al. 2013] to ReVerb extractions [Fader et al. 2011].
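Berant et al.'s global optimization learns a whole graph of entailment rules at once, subject to constraints; the central one is transitivity (if p → q and q → r are rules, p → r must be one as well). The Java sketch below only illustrates that constraint by listing the rules a rule set is missing for transitivity; it does not perform the optimization itself. `TransitivityCheck` and the example predicates are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: detects violations of the transitivity constraint
// in a set of entailment rules. Input maps each predicate to the
// predicates it entails.
final class TransitivityCheck {
    // For every pair of rules p -> q and q -> r, reports "p -> r" if that
    // rule is missing (reflexive rules p -> p are not required).
    static SortedSet<String> missingRules(Map<String, Set<String>> rules) {
        SortedSet<String> missing = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : rules.entrySet()) {
            String p = e.getKey();
            Set<String> entailedByP = e.getValue();
            for (String q : entailedByP) {
                for (String r : rules.getOrDefault(q, Set.of())) {
                    if (!r.equals(p) && !entailedByP.contains(r)) {
                        missing.add(p + " -> " + r);
                    }
                }
            }
        }
        return missing;
    }
}
```

For example, given only "X buy Y → X acquire Y" and "X acquire Y → X own Y", the check reports the missing rule "X buy Y -> X own Y". A global method must either add such rules or drop one of the premises, which is what distinguishes it from scoring each rule in isolation.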