Implementing GraphRAG (Graph-based Retrieval-Augmented Generation) using Azure OpenAI
The latest Python version may raise errors when installing Gensim and some other libraries, so I recommend using Python 3.10.

Open VS Code > Terminal (Command Prompt) and run the commands below:

```
# Create a new directory
mkdir Graphrag

# Create a new conda environment with Python 3.10
conda create -p ./graphragvenv python=3.10

# Activate the created conda environment
conda activate ./graphragvenv

# Install and update pip
python -m pip install --upgrade pip

# Install and upgrade the setuptools package
python -m pip install --upgrade setuptools
```
Install GraphRAG:

```
# Install GraphRAG
pip install graphrag

# If installing GraphRAG raises an error, force-reinstall Gensim first
python -m pip install --no-cache-dir --force-reinstall gensim

# Then install GraphRAG again
pip install graphrag

# If the error persists, ensure you have the correct version of the
# graphrag package installed. Sometimes issues arise from using an
# outdated or incorrect version; you can pin the version during
# installation:
pip install graphrag==0.1.1
```
Initialize GraphRAG:

python -m graphrag.index --init --root .

Running this command initializes GraphRAG and creates the starter folders and files (including settings.yaml and .env).
Create deployments of the LLM model and the embedding model in Azure OpenAI:

- For embeddings, the model is text-embedding-3-small.
- For the LLM, the recommended model is GPT-4o.

After creating the deployments, you need to make some changes in the settings.yaml file.
Configure OPENAI_API_KEY:

The .env file is also created when you initialize GraphRAG; you can configure your OPENAI_API_KEY there.
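For reference, the entry in .env looks like the line below (the value is a placeholder, substitute your own Azure OpenAI key):

```
GRAPHRAG_API_KEY=<your_azure_openai_api_key>
```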
Make changes in the settings.yaml file:

Go to the llm section:

- Configure your api_key in .env, i.e. ${GRAPHRAG_API_KEY}.
- Change type from openai_chat to azure_openai_chat.
- Change model to the model name that you deployed in Azure OpenAI. In my case I used GPT-4o; you can try other models as well.
- Uncomment api_base and replace the URL with your Azure OpenAI endpoint. In my case <your_azure_openai_resource_group_name> is used; you can give any name to the instance.
- Uncomment api_version; no changes are needed there, you can use the default api_version.
- Uncomment deployment_name and replace it with the deployment name you gave the model when creating the deployment. In my case, <deployment_name_of_model>.
In the same settings.yaml file, go to the embeddings section and make similar changes:

- Configure your api_key in .env, i.e. ${GRAPHRAG_API_KEY}.
- Change type from openai_embeddings to azure_openai_embeddings.
- Change model to the model name that you deployed in Azure OpenAI. In my case I used text-embedding-3-small; you can try other models as well.
- Uncomment api_base and replace the URL with your Azure OpenAI endpoint. In my case <your_azure_openai_resource_group_name> is used; you can give any name to the instance.
- api_version can stay commented or be uncommented; either way it won't raise an error.
- Uncomment deployment_name and replace it with the deployment name you gave the model when creating the deployment. In my case, <deployment_name_of_model>.

That's it. Now save the changes that you made in the settings.yaml file.
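Putting the steps above together, the relevant parts of settings.yaml would look roughly like this. This is a sketch, not a verbatim copy of the generated file: the angle-bracket values are placeholders, the api_version string is the default from my generated file and may differ in yours, and the nested llm block under embeddings follows GraphRAG's generated template.

```
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: azure_openai_chat
  model: gpt-4o
  api_base: https://<your_azure_openai_resource_group_name>.openai.azure.com
  api_version: "2024-02-15-preview"  # keep the default from the generated file
  deployment_name: <deployment_name_of_model>

embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: azure_openai_embeddings
    model: text-embedding-3-small
    api_base: https://<your_azure_openai_resource_group_name>.openai.azure.com
    api_version: "2024-02-15-preview"  # optional here; commented or not, it works
    deployment_name: <deployment_name_of_model>
```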
Add an input file:

You need to add an input text file. First create a folder by running the command below:

mkdir input

Then place your .txt file (the input text data file) inside this input folder.
Run GraphRAG to create graphs on the data:

python -m graphrag.index --root .

This command runs the GraphRAG indexer and creates parquet files, converting all your text data into entity and relationship graphs. You can check that in output > last folder > artifacts (the parquet files there contain your data converted into graphs).
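Once indexing finishes, you can inspect those parquet artifacts from Python. Below is a minimal sketch, assuming GraphRAG's default output layout (output/&lt;run&gt;/artifacts); the helper function name is my own, and the artifact filename in the usage comment is from GraphRAG 0.1.x and may differ in your version.

```python
from pathlib import Path

def latest_artifacts_dir(root: str = ".") -> Path:
    """Return the artifacts folder of the most recent indexing run,
    assuming GraphRAG's default layout: output/<run_id>/artifacts."""
    runs = sorted(Path(root, "output").glob("*/artifacts"))
    if not runs:
        raise FileNotFoundError("no indexing runs found under ./output")
    # Run folders are timestamped, so lexicographic order is chronological
    return runs[-1]

# Example usage after indexing (requires pandas + pyarrow):
# import pandas as pd
# entities = pd.read_parquet(latest_artifacts_dir() / "create_final_entities.parquet")
# print(entities.head())
```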
Evaluate our RAG model:

Now we need to test our RAG model by asking it questions. Since we are working in the command prompt, each question is passed as part of a command:

python -m graphrag.query --root . --method local/global "Your_Question"

When you run this command, the model uses either the local or the global search method (whichever you wrote after --method) to answer the question.
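If you prefer to drive these queries from Python, a small helper can assemble the exact command shown above. This is just a sketch: the function name is my own, and actually running the command still requires graphrag installed and a completed index.

```python
def graphrag_query_cmd(question: str, method: str = "global", root: str = ".") -> list:
    """Assemble the GraphRAG query CLI invocation.
    method must be 'local' or 'global', matching the --method flag."""
    if method not in ("local", "global"):
        raise ValueError("method must be 'local' or 'global'")
    return ["python", "-m", "graphrag.query",
            "--root", root, "--method", method, question]

# Example usage:
# import subprocess
# subprocess.run(graphrag_query_cmd("What are the main themes?", method="local"))
```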