Implementation of the Geo-Vec model for embedding documents

gverkes/Geo-Vec-Model

Geo Vec Model

We introduce a novel document representation learning model, Geometric Document Vectors (Geo-Vec). Inspired by recent developments in geometric deep learning, our model encodes documents as graphs and treats an entire corpus as the result of a latent document topology manifold. Using a modified graph auto-encoder (GAE), our approach propagates complex word relations through the shared weights, creating a semantically rich latent space. An attention module is included that serves as a topic filter to compress the learned embeddings. We compare our model to several classic document representation learning models on an information retrieval task and show that Geo-Vec performs on par with or better than them. The shared weights of the model depend only on the vocabulary, which enables training on very large corpora. Additionally, inference on unseen documents can be done efficiently with a simple forward pass.
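
To make the ideas above concrete, here is a minimal sketch of how a document could be turned into a word graph and embedded with one forward pass through vocabulary-sized shared weights. It is not the code from this repository. The function names (`build_vocab`, `doc_to_graph`, `normalize`, `embed`), the sliding-window co-occurrence rule, the single GCN-style layer with ReLU, and the mean pooling that stands in for the attention module are all assumptions made for illustration. Training and the auto-encoder's decoder are left out.

```python
import math
import random


def build_vocab(corpus):
    """Map each distinct token in the corpus to an integer id."""
    vocab = {}
    for doc in corpus:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def doc_to_graph(doc, vocab, window=2):
    """Return (node_ids, adjacency) for one document.

    Nodes are the distinct in-vocabulary words. An undirected edge links two
    words that occur within `window` positions of each other (assumed rule).
    """
    tokens = [vocab[t] for t in doc.lower().split() if t in vocab]
    nodes = sorted(set(tokens))
    index = {w: i for i, w in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                adj[index[w]][index[v]] = 1.0
                adj[index[v]][index[w]] = 1.0
    return nodes, adj


def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in GCN-style layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def embed(doc, vocab, weights, window=2):
    """One forward pass: propagate shared word vectors over the graph, then pool."""
    nodes, adj = doc_to_graph(doc, vocab, window)
    dim = len(weights[0])
    if not nodes:
        return [0.0] * dim  # no known words: fall back to a zero vector
    a_norm = normalize(adj)
    # With one-hot node features X, A_norm @ X @ W reduces to A_norm @ W[nodes].
    hidden = [
        [max(0.0, sum(a_norm[i][j] * weights[nodes[j]][d]
                      for j in range(len(nodes))))
         for d in range(dim)]
        for i in range(len(nodes))
    ]
    # Mean pooling is a placeholder for the attention-based topic filter.
    return [sum(row[d] for row in hidden) / len(nodes) for d in range(dim)]


# Usage: the weight matrix has one row per vocabulary word, so its size does
# not grow with the number of documents. The weights here are untrained.
corpus = ["graphs encode word relations", "documents become word graphs"]
vocab = build_vocab(corpus)
rng = random.Random(0)
weights = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in vocab]
vector = embed("unseen documents become graphs", vocab, weights)
```

Because the only parameters are indexed by vocabulary id, an unseen document needs no per-document training in this sketch: its graph is built, pushed through the layer once, and pooled. Words outside the vocabulary are simply dropped.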
