Implement Query Decompose module #18

Merged
merged 8 commits into from
Jan 27, 2024
Changes from 5 commits
1 change: 1 addition & 0 deletions autorag/nodes/queryexpansion/__init__.py
@@ -0,0 +1 @@
from .query_decompose import query_decompose
93 changes: 93 additions & 0 deletions autorag/nodes/queryexpansion/query_decompose.py
@@ -0,0 +1,93 @@
import asyncio
from typing import List

from llama_index.llms.llm import BaseLLM

decompose_prompt = """Decompose a question in self-contained sub-questions. Use \"The question needs no decomposition\" when no decomposition is needed.

Example 1:

Question: Is Hamlet more common on IMDB than Comedy of Errors?
Decompositions:
1: How many listings of Hamlet are there on IMDB?
2: How many listings of Comedy of Errors are there on IMDB?

Example 2:

Question: Are birds important to badminton?

Decompositions:
The question needs no decomposition

Example 3:

Question: Is it legal for a licensed child driving Mercedes-Benz to be employed in US?

Decompositions:
1: What is the minimum driving age in the US?
2: What is the minimum age for someone to be employed in the US?

Example 4:

Question: Are all cucumbers the same texture?

Decompositions:
The question needs no decomposition

Example 5:

Question: Hydrogen's atomic number squared exceeds number of Spice Girls?

Decompositions:
1: What is the atomic number of hydrogen?
2: How many Spice Girls are there?

Example 6:

Question: {question}

Decompositions:
"""


def query_decompose(queries: List[str], llm: BaseLLM,
                    prompt: str = decompose_prompt) -> List[List[str]]:
    """
    Decompose each query into smaller, self-contained sub-questions.
    :param queries: List[str], queries to decompose.
    :param llm: BaseLLM, language model to use.
    :param prompt: str, prompt to use for query decomposition.
        The default prompt asks for simpler sub-questions, or a statement that no decomposition is needed, and illustrates both with examples.
    :return: List[List[str]], the decomposed queries. A query that cannot be decomposed is returned as a single-item list.
    """
    # Build one query_decompose_pure coroutine per query and run them concurrently
    tasks = [query_decompose_pure(query, llm, prompt) for query in queries]
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(asyncio.gather(*tasks))
    return results


async def query_decompose_pure(query: str, llm: BaseLLM,
                               prompt: str = decompose_prompt) -> List[str]:
    """
    Decompose a single query into smaller, self-contained sub-questions.
    :param query: str, query to decompose.
    :param llm: BaseLLM, language model to use.
    :param prompt: str, prompt to use for query decomposition.
        The default prompt asks for simpler sub-questions, or a statement that no decomposition is needed, and illustrates both with examples.
    :return: List[str], the decomposed query. Returns the input query as a single-item list if it cannot be decomposed.
    """
    full_prompt = "prompt: " + prompt + "\n\n" "question: " + query
    answer = await llm.acomplete(full_prompt)  # async call, so asyncio.gather can overlap requests
    if "the question needs no decomposition" in answer.text.lower():  # ignore case and trailing period
        return [query]
    try:
        lines = [line.strip() for line in answer.text.splitlines() if line.strip()]
        if lines[0].startswith("Decompositions:"):
            lines.pop(0)
        questions = [line.split(':', 1)[1].strip() for line in lines if ':' in line]
        if not questions:
            return [query]
        return questions
    except Exception:  # e.g. IndexError on an empty completion
        return [query]
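The parsing step at the end of query_decompose_pure can be exercised without calling an LLM. What follows is a minimal sketch, not the code merged in this PR: `parse_decomposition` and `NO_DECOMPOSITION` are hypothetical names introduced for illustration, and the sketch assumes the model replies in the numbered `N: question` format shown in the default prompt.

```python
from typing import List

# Assumed marker text, matched case-insensitively against the completion.
NO_DECOMPOSITION = "the question needs no decomposition"


def parse_decomposition(text: str, query: str) -> List[str]:
    """Hypothetical helper: turn a raw completion into sub-questions, or fall back to [query]."""
    if NO_DECOMPOSITION in text.lower():
        return [query]
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Skip a leading "Decompositions:" header if the model echoed it.
    if lines and lines[0].startswith("Decompositions:"):
        lines = lines[1:]
    # Keep the text after the first colon of each "N: question" line.
    questions = [line.split(":", 1)[1].strip() for line in lines if ":" in line]
    questions = [q for q in questions if q]
    return questions or [query]
```

Keeping the parsing in a pure function like this would let it be unit-tested on fixed strings, independently of the network call.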
11 changes: 11 additions & 0 deletions docs/source/api_spec/autorag.nodes.queryexpansion.rst
@@ -1,6 +1,17 @@
autorag.nodes.queryexpansion package
====================================

Submodules
----------

autorag.nodes.queryexpansion.query\_decompose module
----------------------------------------------------

.. automodule:: autorag.nodes.queryexpansion.query_decompose
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

11 changes: 11 additions & 0 deletions tests/autorag/nodes/queryexpansion/test_query_decompose.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
from llama_index.llms.openai import OpenAI

from autorag.nodes.queryexpansion import query_decompose

sample_query = ["Which group has more members, Newjeans or Espa?", "Which group has more members, STAYC or Espa?"]


def test_query_decompose():
    llm = OpenAI(temperature=0.2)
    result = query_decompose(sample_query, llm)
    assert len(result[0]) > 1
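The test above calls the OpenAI API, so it needs an API key and network access, and its result depends on the model's output. An offline variant might look like the sketch below. Everything in it is a hypothetical stand-in rather than AutoRAG or llama_index code: `FakeLLM`, `FakeResponse`, `decompose_one` and `decompose_all` are names made up for illustration, and the stub only mirrors the assumed shape of an async `acomplete` call that returns an object with a `.text` attribute.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class FakeResponse:
    text: str


class FakeLLM:
    """Hypothetical stub: an async `acomplete` returning canned text."""

    async def acomplete(self, prompt: str) -> FakeResponse:
        if "badminton" in prompt:
            return FakeResponse("The question needs no decomposition")
        return FakeResponse(
            "1: How many members does group A have?\n"
            "2: How many members does group B have?"
        )


async def decompose_one(query: str, llm: FakeLLM) -> List[str]:
    # Simplified stand-in for the per-query coroutine.
    answer = await llm.acomplete("question: " + query)
    if "the question needs no decomposition" in answer.text.lower():
        return [query]
    questions = [line.split(":", 1)[1].strip()
                 for line in answer.text.splitlines() if ":" in line]
    return questions or [query]


def decompose_all(queries: List[str], llm: FakeLLM) -> List[List[str]]:
    # Fan out one coroutine per query and collect results in input order.
    async def run() -> List[List[str]]:
        return await asyncio.gather(*(decompose_one(q, llm) for q in queries))

    return asyncio.run(run())
```

With a stub like this, a test can assert on exact sub-questions and on the fallback path, which the live-API test cannot do deterministically.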