Skip to content

Latest commit

 

History

History
165 lines (122 loc) · 8.84 KB

contributing.md

File metadata and controls

165 lines (122 loc) · 8.84 KB

😍 Contributing to GPTCache

Before contributing to GPTCache, it is recommended to read the usage doc example-doc. These two articles will introduce how to use GPTCache and the meaning of parameters of related functions.

In the process of contributing, pay attention to the parameter type, because there is currently no type restriction added.

Note that development MUST be based on the dev branch

First check which part you want to contribute:

  • Add a method to pre-process the llm request
  • Add a scalar store type
  • Add a vector store type
  • Add a new data manager
  • Add a embedding function
  • Add a similarity evaluation function
  • Add a method to post-process the cache answer list
  • Add a new process in handling chatgpt requests

Lazy import and automatic installation

For newly added third-party dependencies, lazy import and automatic installation are required. Implementation consists of the following steps:

  1. Lazy import
# The __init__.py file of the same directory under the new file
__all__ = ['Milvus']

from gptcache.utils.lazy_import import LazyImport

milvus = LazyImport('milvus', globals(), 'gptcache.cache.vector_data.milvus')


def Milvus(**kwargs):
    return milvus.Milvus(**kwargs)
  1. Automatic installation
# 2.1 Add the import method
# add new method to util/__init__.py
__all__ = ['import_pymilvus']

from gptcache.utils.dependency_control import prompt_install


def import_pymilvus():
    try:
        # pylint: disable=unused-import
        import pymilvus
    except ModuleNotFoundError as e:  # pragma: no cover
        prompt_install('pymilvus')
        import pymilvus  # pylint: disable=ungrouped-imports

# 2.2 use the import method in your file
from gptcache.util import import_pymilvus
import_pymilvus()

Add a method to pre-process the llm request

refer to the implementation of Pre.

  1. Make sure the input params, the data represents the original request dictionary object
  2. Implement the post method
  3. Add a usage example to example directory and add the corresponding content to example.md README.md
# The origin openai request
import openai

openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

# This is the pre-process function of openai request, which is to get the last message
def last_content(data, **_):
    return data.get("messages")[-1]["content"]

Add a cache storage type

refer to the implementation of SQLDataBase.

  1. Implement the CacheStorage interface
  2. Make sure the newly added third-party libraries are lazy imported and automatic installation
  3. Add the new store to the CacheBase method
  4. Add a usage example to example directory and add the corresponding content to example.md README.md

Add a vector store type

refer to the implementation of milvus.

  1. Implement the VectorBase interface
  2. Make sure the newly added third-party libraries are lazy imported and automatic installation
  3. Add the new store to the VectorBase method
  4. Add a usage example to example directory and add the corresponding content to example.md README.md

Add a new data manager

refer to the implementation of MapDataManager, SSDataManager.

  1. Implement the DataManager interface
  2. Add the new store to the get_data_manager method
  3. Add a usage example to example directory and add the corresponding content to example.md README.md

Add a embedding function

refer to the implementation of cohere or openai.

  1. Add a new python file to embedding directory
  2. Make sure the newly added third-party libraries are lazy imported and automatic installation
  3. Implement the embedding function and make sure your output dimension
  4. Add a usage example to example directory and add the corresponding content to example.md README.md

Add a similarity evaluation function

refer to the implementation of SearchDistanceEvaluation or OnnxModelEvaluation

  1. Implement the SimilarityEvaluation interface
  2. Make sure the range of return value, the range method return the min and max value
  3. Make sure the input params of evaluation, you can learn more about in the user view model
rank = chat_cache.evaluation_func({
    "question": pre_embedding_data,
    "embedding": embedding_data,
}, {
    "question": cache_question,
    "answer": cache_answer,
    "search_result": cache_data,
}, extra_param=context.get('evaluation', None))
  1. Make sure the newly added third-party libraries are lazy imported and automatic installation
  2. Implement the similarity evaluation function
  3. Add a usage example to example directory and add the corresponding content to example.md README.md

Add a method to post-process the cache answer list

refer to the implementation of first or random_one

  1. Make sure the input params, you can learn more about in the adapter
  2. Make sure the newly added third-party libraries are lazy imported and automatic installation
  3. Implement the post method
  4. Add a usage example to example directory and add the corresponding content to example.md README.md
# Get the most similar one from multiple results
def first(messages):
    return messages[0]


# Randomly fetch one of many results
def random_one(messages):
    return random.choice(messages)

Add a new process in handling chatgpt requests

  1. Need to have a clear understanding of the current process, refer to the adapter
  2. Add a new process
  3. Make sure all examples work properly