VHRanger/YASE


Yet Another Sentence Embedding Library

The goal of this library is to make it easy to turn a list of sentences, or sets of sentences, into a matrix of embeddings (e.g. one row per sentence). Embeddings can be produced at the sentence/document level, or sentence embeddings can be combined into one embedding per group.
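
As an illustration of the grouped case, here is a minimal sketch that averages sentence embeddings sharing a group id. It assumes mean pooling, which may not be how the library combines embeddings, and `group_mean_embeddings` is a hypothetical helper written with plain NumPy, not part of the `yase` API.

```python
import numpy as np

def group_mean_embeddings(embeddings, group_ids):
    """Average the rows of `embeddings` that share a group id.

    Hypothetical helper for illustration only; not part of yase.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    group_ids = np.asarray(group_ids)
    # unique_groups is sorted; inverse maps each row to its group's index
    unique_groups, inverse = np.unique(group_ids, return_inverse=True)
    inverse = inverse.ravel()
    # Sum the rows of each group, then divide by the group size
    sums = np.zeros((len(unique_groups), embeddings.shape[1]))
    np.add.at(sums, inverse, embeddings)
    counts = np.bincount(inverse, minlength=len(unique_groups))
    return unique_groups, sums / counts[:, None]

# Three toy sentence embeddings: two belong to group "a", one to group "b"
sentence_embeddings = np.array([[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]])
groups = np.array(["a", "a", "b"])
group_names, group_embeddings = group_mean_embeddings(sentence_embeddings, groups)
```

The result has one row per group, in the sorted order of the group ids.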

Such embedding matrices can easily be queried with kd-trees (see the notebook in examples) to find the document in the training data most similar to a queried sentence. They can also be used to cluster groups of documents, such as ad campaigns, using only their text.
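
A kd-tree lookup of this kind might look like the sketch below. It uses `scipy.spatial.cKDTree` on random vectors standing in for real embeddings; the notebook in examples may use a different kd-tree implementation, and the matrix size and perturbation are arbitrary choices for the demonstration.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Stand-in for an embedding matrix: 100 documents, 16 dimensions
doc_embeddings = rng.normal(size=(100, 16))
# L2-normalise so that Euclidean distance ranks neighbours like cosine similarity
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

tree = cKDTree(doc_embeddings)

# Stand-in for an embedded query: a slightly perturbed copy of document 42
query = doc_embeddings[42] + 0.01 * rng.normal(size=16)
query /= np.linalg.norm(query)

# The 3 nearest documents, closest first
distances, indices = tree.query(query, k=3)
```

With unit-length vectors, squared Euclidean distance equals 2 - 2·cos(a, b), so the nearest neighbour by distance is also the most similar by cosine.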

Embedding quality can be tested on a handcrafted evaluation dataset by checking how well the sentence embeddings cluster around the natural clusters formed by existing ad campaigns.
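
The README does not say which metric this check uses. One simple option, sketched here as an assumption, is nearest-neighbour label agreement: the fraction of sentences whose closest other sentence belongs to the same campaign. `nearest_neighbour_agreement` is a hypothetical helper, not part of `yase`.

```python
import numpy as np

def nearest_neighbour_agreement(embeddings, labels):
    """Fraction of rows whose nearest other row has the same label.

    Hypothetical helper for illustration only; not part of yase.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances (fine for small evaluation sets)
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # A sentence must not count as its own neighbour
    np.fill_diagonal(dists, np.inf)
    nearest = dists.argmin(axis=1)
    return float((labels[nearest] == labels).mean())

# Toy data: two tight clusters, each corresponding to one campaign
toy_embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
campaign_labels = np.array([0, 0, 1, 1])
score = nearest_neighbour_agreement(toy_embeddings, campaign_labels)
```

A score near 1 means sentences from the same campaign sit close together; a score near the chance level suggests the embeddings do not separate the campaigns.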

Examples

The library works out of the box with gensim models, sentence-transformers models, and fse models.

    import yase

    # df is assumed to be a pandas DataFrame with a text column named "body"
    df = ...

    # Gensim Model
    import gensim.downloader as model_api
    gensim_model = model_api.load("glove-wiki-gigaword-300")
    gsb = yase.encoders.embed_column(df.body, model=gensim_model, verbose=True)

    # sentence-transformers model
    from sentence_transformers import SentenceTransformer
    sb_model = SentenceTransformer('all-MiniLM-L6-v2')
    rsb = yase.encoders.embed_column(df.body, model=sb_model, verbose=True)

    # FSE model
    from fse import Vectors, Average
    vecs = Vectors.from_pretrained("glove-wiki-gigaword-50")
    fse_model = Average(vecs)
    fsb = yase.encoders.embed_column(df.body, model=fse_model, verbose=True)

Model Routing by Language Detection

Running unit tests

    python -m unittest discover tests
