This directory defines the backend for the Byte-Barometer application. It also provides utility for populating the relevant vector database indices.
The backend application offers a WebSocket endpoint where clients can query for a subject, making use of the hybrid embedding search capabilities of some vector databases. In short, a query involves the following steps:

- The query string is received over the WebSocket, and the sending client's id is noted.
- The query string is converted into a sparse embedding and a dense embedding, using SPLADE and OpenAI respectively. These are weighted to prioritize between semantic and conventional (keyword) search.
- A vector database query is performed to find the N closest entries, sorted by relevancy.
- Aspect-based sentiment analysis is applied to the entries, with the query string as the aspect, and results are published back to the client as they become available.
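The weighting step above can be sketched as a convex combination of the two query vectors, a common pattern for hybrid search. The helper name, the `alpha` parameter, and the sparse-vector dict layout here are illustrative assumptions, not the actual implementation:

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight dense and sparse query vectors.

    alpha=1.0 gives pure semantic (dense) search,
    alpha=0.0 gives pure keyword (sparse) search.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse

# Example: favour semantic search slightly
dense, sparse = hybrid_scale(
    [0.2, 0.4], {"indices": [7, 42], "values": [1.0, 0.5]}, alpha=0.8
)
```

Both scaled vectors would then be sent together in a single vector database query.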
The backend has been containerized, but in order to make use of GPU acceleration it is necessary to install the NVIDIA container toolkit (`nvidia-container-toolkit`) on the host system. Ensure that `nvidia-smi` reports as expected both on the host and inside the container.
Create an `.env` file in the root of the project and add your Hugging Face, OpenAI, and Pinecone API keys and environment details:
```
HUGGINGFACE_API_KEY=<api-key>
OPENAI_API_KEY=<api-key>
PINECONE_ENVIRONMENT=<environment>
PINECONE_API_KEY=<api-key>
PINECONE_INDEX=<index-name>
ENABLE_GPU=True
```
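A minimal sketch of how the backend might consume these variables at startup; the `load_settings` helper is hypothetical, but the variable names match the `.env` file above:

```python
import os

def load_settings():
    """Read required settings from the environment, failing fast if
    any are missing (hypothetical helper, names match .env above)."""
    required = [
        "HUGGINGFACE_API_KEY",
        "OPENAI_API_KEY",
        "PINECONE_ENVIRONMENT",
        "PINECONE_API_KEY",
        "PINECONE_INDEX",
    ]
    missing = [name for name in required if name not in os.environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    settings = {name: os.environ[name] for name in required}
    # ENABLE_GPU is an optional boolean flag, defaulting to off
    settings["ENABLE_GPU"] = os.environ.get("ENABLE_GPU", "False") == "True"
    return settings
```

Failing fast on missing keys keeps misconfiguration errors close to startup rather than surfacing mid-query.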
The backend application will regularly fetch, process, and store new comments from Hacker News so that they may be queried. However, this only applies to new comments; to populate the index with an initial set of data you can do the following:

```shell
source .venv/bin/activate
# Last two months, up to 200 000 documents
python3 populate.py -l 5184000 -d 200000
```
Alternatively, if you prefer the Docker image:

```shell
docker run --gpus all -v .:/app -it --env-file ../.env --entrypoint python3 byte-barometer populate.py -l 72000 -d 10000
```
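Assuming `-l` is a lookback window in seconds and `-d` a document cap, as the comments above suggest, the flag values can be derived with quick arithmetic (a sanity check, not part of the application):

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400

two_months = 60 * SECONDS_PER_DAY  # two months approximated as 60 days
twenty_hours = 20 * 3600           # the shorter window in the docker example

print(two_months)    # 5184000, matching -l in the first command
print(twenty_hours)  # 72000, matching -l in the docker command
```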