Releases: enjalot/latent-scope
SAE Features
If you've embeded a dataset with nomic-embed-text-v1.5 you can "process SAE" in the embed step.
This will then annotate each row with SAE features from https://enjalot.github.io/latent-taxonomy/articles/about
You can then explore essentially the concepts that the embedding model uses to represent each data point.
You can also filter by a particular SAE feature to see which rows strongly activate for that concept.
0.5: UI Revamp
Numerous updates to Explore and Setup pages with many contributions from @jzhang621
Highlights:
- Explore page redesign with better filtering UX and more screen real estate for the map
- Setup process redesign, step-by-step
- More options for cluster labeling from huggingface, ollama or custom URLs
- Starting to support Sparse Autoencoder features
See closed milestone issues:
https://github.com/enjalot/latent-scope/issues?q=is%3Aissue+milestone%3A0.5+is%3Aclosed
Some more writing on the changes here:
https://enjalot.substack.com/p/hidden-states-and-latent-scope-05
Table improvements & embedding visualization
This release fixes a few bugs with the Explore page table UI and nearest neighbor search, making it much more reliable and performant.
Thank you to @hydrosquall for issues & PRs! #49 #50 #52
A new experimental feature for directly visualizing embeddings in the table is ready to try:
Use any Sentence Transformer from HuggingFace
This release adopts sentence transformers for embedding using local open source models downloaded automatically from HuggingFace hub.
It also keeps track of recently used models and brings it all together in a much improved selector component on the frontend.
Also includes a PR from @hydrosquall that fixed a bug using truncated embeddings in the nearest neighbor search.
One minor note: for now truncating of sentence transformers isn't supported as we don't have a way to tell if the model supports it arbitrarily. We could maintain a list of matroyshka enabled models separately.
export interactive plots
Export interactive DataMapPlots optionally instead of static thanks to @dhruv-anand-aintech
Fixes an unpinned dependency breaking transformers models
Export static plots
Implements #23, creating a UI to easily export static plots using datamapplot
Support more filetype inputs thanks to #40
Support proxy servers / alternate OpenAI compatible endpoints #44
The requirements.txt has been loosened so Python 3.12 should be enabled and more updated versions of some important pip modules will be installed
new models
Adds a few embedding models:
https://huggingface.co/Snowflake/snowflake-arctic-embed-s
https://huggingface.co/Snowflake/snowflake-arctic-embed-m-long
https://huggingface.co/BAAI/bge-m3
Also a new chat model for labeling:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
Tweaked the labeling prompt to perform better
Improve setup flow
Minor improvements to the setup flow
Refined data export
Creating a scope now also creates a combined parquet of the input data and the scope annotations.
This makes loading curated scopes much easier in other workflows
0.2.0 Explore Overhaul
This release makes a number of improvements to the exploring and curation part of Latent Scope. You can now filter a number of ways from a unified interface and perform bulk actions on the filtered points.
The following issues were closed:
This wasn't closed, but now we can show images in the data table if there is an image url:
- #24 showing images
Improved documentation and a number of guides have been published to https://enjalot.github.io/latent-scope/