VW in a langchain chain

Install requirements.txt

There is an example notebook (rl_chain.ipynb) with basic usage of the chain.

TLDR:

Chain is initialized and creates a Vowpal Wabbit instance - only Contextual Bandits and Slates are supported for now
You can change the arguments at chain creation time
There is a default prompt but it can be changed
There is a default reward function that gets triggered and triggers learn automatically
- This can be turned off and score can be spcified explicitly

Flow:

Developer: creates chain
Developer: sets actions
Developer: calls chain with context and other prompt inputs
Chain: calls VW with the context and selects an action
Chain: action (and other vars) are passed to the LLM with the prompt
Chain: if default reward set, the LLM is called to judge and give a reward score of the response based on the context
Chain: VW learn is triggered with that score

Name		Name	Last commit message	Last commit date
Latest commit History 181 Commits
.github/workflows		.github/workflows
rl_chain		rl_chain
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
prompt_selection.ipynb		prompt_selection.ipynb
requirements.txt		requirements.txt
rl_chain.ipynb		rl_chain.ipynb

Provide feedback