benitomartin/scraping-to-sql

Justicio Web Scraping to SQL

Justicio is a question-answering assistant that answers user questions about the official state gazette of Spain: Boletín Oficial del Estado (BOE).

We currently run a free service for users: Website and Repository

All BOE articles are embedded as vectors and stored in a vector database. When a user asks a question, it is embedded in the same latent space and used to query the vector database for the most relevant passages. The retrieved passages are then sent to the LLM, which uses them to construct an answer.
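
The retrieval flow described above can be sketched in a few lines of Python. This is only an illustration, not Justicio's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory `store` list stands in for the vector database, and the sample articles and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, store, k=2):
    """Embed the question and return the k most similar stored texts."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the text that would be sent to the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical sample data; real entries would be BOE articles.
articles = [
    "The minimum wage is updated by royal decree each year",
    "Public holidays are published in the gazette every autumn",
    "Tax filing deadlines for self-employed workers",
]
# The "vector database": each text stored next to its embedding.
store = [(text, embed(text)) for text in articles]

question = "when is the minimum wage updated"
top = retrieve(question, store, k=1)
prompt = build_prompt(question, top)
```

In a real deployment the count vectors would be replaced by dense embeddings from a trained model, and the sorted scan by an approximate nearest-neighbour query against the vector database.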

Tech Stack

Jupyter Notebook, MySQL, Python, Pandas

Contributions

Web scraping of the municipal regulations of La Coruña and Oviedo, and saving the results as an SQL dump for later use in the vector database.
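
As a rough illustration of this scrape-then-dump step, the sketch below parses regulation links out of an HTML page and writes them out as SQL statements. Everything specific here is an assumption: the sample HTML, the `regulation` CSS class, the table schema, and the function name `scrape_to_dump` are hypothetical, and the standard-library `sqlite3` module is used as a stand-in for MySQL so the example runs without a database server or network access.

```python
import sqlite3
from html.parser import HTMLParser


class RegulationParser(HTMLParser):
    """Collects (title, url) pairs from <a class="regulation"> links."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None  # set while inside a matching <a> tag
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "regulation":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical page content; a real run would download the municipal site.
SAMPLE_HTML = """
<ul>
  <li><a class="regulation" href="/ordenanzas/1">Ordenanza de ruidos</a></li>
  <li><a class="regulation" href="/ordenanzas/2">Ordenanza de limpieza</a></li>
  <li><a href="/contacto">Contacto</a></li>
</ul>
"""


def scrape_to_dump(html, city):
    """Parse regulation links and return them as an SQL dump string."""
    parser = RegulationParser()
    parser.feed(html)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE regulations (city TEXT, title TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO regulations VALUES (?, ?, ?)",
        [(city, title, url) for title, url in parser.rows],
    )
    conn.commit()
    dump = "\n".join(conn.iterdump())  # CREATE TABLE and INSERT statements
    conn.close()
    return dump


dump = scrape_to_dump(SAMPLE_HTML, "Oviedo")
```

Writing `dump` to a `.sql` file gives a portable artifact that can be loaded later, before the texts are embedded. A dump produced from MySQL itself would differ in dialect details such as quoting and column types.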