English Knowledge Resources

Roberto Zanoli edited this page Feb 13, 2014 · 1 revision

###SQL-Based Resources

@TODO: The download links currently point to the old BIUTEE webpage. They should refer to the new Maven repository.

Some knowledge resources are stored as MySQL tables, provided as compressed .sql files. In order to use them:

  • Download the resources from the links in the table below. Each file represents one MySQL schema and may contain several knowledge resources. You do not need to download them all; download only the schema files containing the resources you wish to use.
  • Install the free SQL server MySQL.
  • Install its administration tool, MySQL Workbench.
  • Run the server.
  • Connect to the server via MySQL Workbench, and in it:
      • Create a user named db_readonly, with password BIUTEE: ''Users and Privileges --> Add Account''
      • Import the schema files to the database: ''Data Import/Restore --> Import from Dump Project Folder --> (input folder path containing uncompressed .sql files) --> Load Folder Contents --> (select all required schemas) --> Start Import''
      • Make sure user db_readonly has read (SELECT) privileges on all of the tables in the imported schemas.
  • Define an environment variable named MYSQL with a value referring to the MySQL server address (name or IP address) and port. For example: dbsql.cs.biu.ac.il:3306.
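
As a rough illustration of how a client might read the MYSQL variable described in the last step, the sketch below splits the value into a host and a port. The function name `parse_mysql_address` and the fallback to port 3306 (MySQL's default) when no port is given are assumptions made for this example, not part of the documented setup.

```python
import os


def parse_mysql_address(value):
    """Split a 'host:port' string into (host, port).

    Assumption: if the port is omitted, fall back to 3306, MySQL's default.
    """
    value = value.strip()
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon present: the whole value is the host name.
        return value, 3306
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {value!r}")
    return host, int(port)


if __name__ == "__main__":
    # Read the variable defined in the setup steps; the default here is only
    # a placeholder so the sketch runs when MYSQL is not set.
    address = os.environ.get("MYSQL", "localhost:3306")
    print(parse_mysql_address(address))
```

Raising an error on a malformed value, rather than guessing, makes a misconfigured environment visible early.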

| Schema Name | Knowledge Resources in Configuration | Schema Download | File Size (Compressed) |
|---|---|---|---|
| BAP (Directional Similarity) | BAP | Download | 111 MB |
| Lin Similarity | LIN_DEPENDENCY_ORIGINAL, LIN_PROXIMITY_ORIGINAL | Download | 236 MB |
| Original DIRT | ORIG_DIRT | Download | 55 MB |
| Wikipedia Knowledge Resource | WIKIPEDIA | Download | 214 MB |
| Binary Lin, Dependency Reuters | BINARY_LIN, LIN_DEPENDENCY_REUTERS | Download | 2.4 GB |
| Framenet | FRAMENET | Download | 228 KB |
| Geo (Geographical Knowledge Resource) | GEO | Download | 1.4 MB |
| ReVerb (Distributional Similarity with Global Constraints) | REVERB | Download | 161 MB |
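
For readers who prefer SQL statements over the MySQL Workbench dialogs, the account and privilege steps can be approximated as follows. This is only a sketch: the helper `build_readonly_statements` and the schema names passed to it are hypothetical (use the names of the schemas you actually imported), and the `'%'` host pattern, which allows connections from any host, is an assumption you may want to tighten.

```python
def build_readonly_statements(schemas, user="db_readonly", password="BIUTEE", host="%"):
    """Return MySQL statements that create a read-only account for the schemas."""
    account = f"'{user}'@'{host}'"
    statements = [f"CREATE USER {account} IDENTIFIED BY '{password}';"]
    for schema in schemas:
        # Backticks quote the schema identifier; SELECT is the only privilege granted.
        statements.append(f"GRANT SELECT ON `{schema}`.* TO {account};")
    return statements


if __name__ == "__main__":
    # Hypothetical schema names, for illustration only.
    for statement in build_readonly_statements(["bap", "geo"]):
        print(statement)
```

The printed statements can be pasted into a MySQL client session run by an administrative user.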

###Redis-based Resources

####Distributional Similarity

Distribution

  • Redis database files
  • License: MIT license
  • Download

#####Lexical

Java interface: SimilarityStorageBasedLexicalResource

######Lin proximity-based

Distributional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying Lin's method [Lin 1998] on the Reuters RCV1 and RCV2 corpora, without dependency-based features. Top 1000 similarities were selected for each element.

About 28M rules.
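
To make the measure more concrete, here is a minimal sketch of Lin's similarity as it is commonly stated: the summed weights of the features two words share, divided by the summed weights of all features of both words. The feature vectors are invented toy data and the function name `lin_similarity` is hypothetical; the weights are assumed to be association scores (such as mutual information) computed elsewhere.

```python
def lin_similarity(weights_a, weights_b):
    """Lin-style similarity between two words given {feature: weight} dicts.

    Weights are assumed to be positive association scores computed elsewhere.
    """
    shared = weights_a.keys() & weights_b.keys()
    numerator = sum(weights_a[f] + weights_b[f] for f in shared)
    denominator = sum(weights_a.values()) + sum(weights_b.values())
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy feature vectors, for illustration only.
    car = {"drive": 2.0, "road": 1.0}
    truck = {"drive": 2.0, "cargo": 1.0}
    print(lin_similarity(car, truck))
```

Two words with identical feature vectors score 1.0, and words with no shared features score 0.0.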

######Lin dependency-based

Distributional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying Lin's method [Lin 1998] on the Reuters RCV1 and RCV2 corpora, with dependency-based features. Top 1000 similarities were selected for each element.

About 20M rules.
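
The "top 1000 similarities for each element" filtering can be pictured with the following sketch. It is an assumption about the general shape of such a step, not the code used to build the resource; the function name `top_k_rules` and the toy scores are hypothetical.

```python
import heapq


def top_k_rules(scores, k=1000):
    """Keep the k highest-scoring candidates for every element.

    `scores` maps each element to a {candidate: similarity} dict.
    """
    return {
        element: heapq.nlargest(k, candidates.items(), key=lambda item: item[1])
        for element, candidates in scores.items()
    }


if __name__ == "__main__":
    # Toy similarities, for illustration only.
    toy = {"car": {"automobile": 0.9, "truck": 0.6, "banana": 0.1}}
    print(top_k_rules(toy, k=2))
```

Using `heapq.nlargest` avoids sorting every candidate list in full when only the top entries are kept.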

######Directional similarities, Reuters

Directional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying the balanced AP (bap) measure [Kotlerman et al. 2009, Kotlerman et al. 2010] on the Reuters RCV1 and RCV2 corpora, with dependency-based features. Top 1000 similarities were selected for each element.

About 10M rules for left side, and about 15M rules for right side.

######Directional similarities, ukWaC

Directional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying the balanced AP (bap) measure [Kotlerman et al. 2009, Kotlerman et al. 2010] on the English ukWaC corpus, with dependency-based features. Top 1000 similarities were selected for each element.

About 21M rules for left side, and about 33M rules for right side.
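
Because directional rules are asymmetric, it can help to picture two lookups: one keyed by the left side of a rule and one keyed by the right side. The sketch below builds both from a toy list of rules. It is only an illustration with invented data; the function name `build_directional_indexes` is hypothetical, and nothing here reflects how the Redis files are actually laid out.

```python
from collections import defaultdict


def build_directional_indexes(rules):
    """Index directional rules by their left side and by their right side.

    `rules` is an iterable of (left, right, score) triples, read as
    "left -> right".
    """
    by_left = defaultdict(list)
    by_right = defaultdict(list)
    for left, right, score in rules:
        by_left[left].append((right, score))
        by_right[right].append((left, score))
    return dict(by_left), dict(by_right)


if __name__ == "__main__":
    # Toy rules, for illustration only.
    toy_rules = [("dog", "animal", 0.8), ("cat", "animal", 0.7)]
    by_left, by_right = build_directional_indexes(toy_rules)
    print(by_left["dog"])
    print(by_right["animal"])
```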

#####Syntactic

Java interface: SimilarityStorageBasedDIRTSyntacticResource

######DIRT, Reuters, Redis-based

Distributional similarity rules for English dependency paths (which appear at least 100 times in the corpus). The similarities were calculated by applying the DIRT method [Lin 1998] on the Reuters RCV1 and RCV2 corpora. Top 1000 similarities were selected for each element.

About 10M rules.

######Distributional Similarity with Global Constraints (Reverb), Redis-based

Distributional similarity rules for English predicates. The similarities were calculated by applying Berant's global optimization [Berant et al. 2013] on Reverb extractions [Fader et al. 2011].
