Chat-PDF is an application that lets you upload PDFs and ask questions about their content.
For the frontend React code, please refer here.
- Python: Install Python from its official site.
- Fork the repository into your own GitHub account.
- Clone your newly forked repository from GitHub onto your local computer.
- Run `python -m venv .venv` to create a virtual environment.
- Install the dependencies listed in the `requirements.txt` file (for example, with `pip install -r requirements.txt`).
- Obtain your own OpenAI key from here.
- Create a `.env` file.
- Set up your OpenAI key within it.
- Run the command `uvicorn main:app --reload` to start the application.
- Navigate to http://127.0.0.1:8000/docs to test the APIs.
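
To illustrate how the key stored in `.env` can reach the application, here is a minimal standard-library sketch. The variable name `OPENAI_API_KEY` and the helper `load_env_file` are assumptions made for illustration; this README does not say how the project actually reads the file, and in practice a package such as python-dotenv is often used instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper).

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Variables already set in the environment are not
    overwritten.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and optional quotes around the value.
        value = value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


# Load the file once at startup, then read the key from the environment.
load_env_file()
api_key = os.environ.get("OPENAI_API_KEY")  # assumed variable name
```

With this sketch, a `.env` file containing a line such as `OPENAI_API_KEY=<your key>` makes the key available through `os.environ` without hard-coding it in the source.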
Our application offers three APIs:
- PDF Upload API: This API accepts a PDF file and sets up chains to answer questions related to the PDF content.
- Question Answering API: With this API, you can submit a question; the chains established during upload are used to return a suitable answer extracted from your PDF.
- PDF Retrieval API: This API allows you to retrieve all the PDFs that have been uploaded. (Note: this could be customized per user, but authentication is outside the scope of this project.)
- The API accepts a file and validates that it is a PDF. If not, it returns a 400 error.
- The PDF file is read, and its binary content is loaded into memory as a byte stream using Python's `io` module.
- The filename and file size are stored in a PostgreSQL database for future retrieval.
- The content of the PDF is extracted using pypdf's reader class (`PdfReader` in current pypdf releases).
- The extracted text is divided into smaller chunks for efficient processing.
- Embeddings are created from these chunks, establishing a chain to track the conversation.
- When a question is submitted, semantic search is performed over the stored chunks based on the user's question.
- Using OpenAI's language model, an appropriate answer is generated based on the semantic search results.
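
The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, the chunk sizes are arbitrary, and a toy word-count vector stands in for OpenAI embeddings so the example runs without an API key. It shows a cheap PDF check, fixed-size chunking with overlap, and ranking chunks by cosine similarity to a question.

```python
import math
import re
from collections import Counter


def is_pdf(filename: str, data: bytes) -> bool:
    """Cheap validation: a .pdf extension plus the '%PDF-' magic header.

    A caller would respond with a 400 error when this returns False.
    """
    return filename.lower().endswith(".pdf") and data.startswith(b"%PDF-")


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts.

    ASSUMPTION: the real application would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:k]


# Usage: rank already-extracted text against a question.
document = "The refund policy lasts 30 days. Shipping usually takes one week."
context = top_chunks("How long is the refund policy?", chunk_text(document, 40, 10))
```

In the final step, the selected chunks would be passed to the language model as context for the answer. Overlapping windows reduce the chance that a sentence relevant to the question is cut in half at a chunk boundary.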