feat: Add `ConditionalRouter` Haystack 2.x component #6147

vblagoje · 2023-10-21T18:36:51Z

Why:

Enable generic and conditionally expressive pipeline routing functionality by introducing a new Router component.
The Router component orchestrates the flow of data by evaluating specified route conditions to determine the appropriate route among a set of provided route alternatives.
Fixes #6109 (Add Conditional Routing in Haystack 2.x Pipelines).

What:

Added a new Router class to haystack/preview/components/routers.
Updated __init__.py to include the new Router class.

How can it be used:

Import the Router component and use it to route connections in your pipelines.
Here is an example:

In this example, we create a Router component with two routes. The first route is selected if there are fewer than 2 streams, and it outputs the query variable. The second route is selected if there are 2 or more streams, and it outputs the streams variable. We also declare the routing variables, query and streams, which must be provided when run() is called. Routing variables can be used both in route conditions and as route output values.

from typing import List

from haystack.preview.components.routers import Router

routes = [
  {"condition": "len(streams) < 2", "output": "query", "output_type": str},
  {"condition": "len(streams) >= 2", "output": "streams", "output_type": List[int]}
]

router = Router(routes=routes, routing_variables=["query", "streams"])


# the second route from above should be selected
kwargs = {"streams": [1, 2, 3], "query": "test"}
result = router.run(**kwargs)
assert result == {"streams": [1, 2, 3]}

# the first route from above should be selected
kwargs = {"streams": [1], "query": "test"}
result = router.run(**kwargs)
assert result == {"query": "test"}
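To make the selection rule explicit without the PR branch, here is a minimal, dependency-free sketch of the first-match semantics described above. It is not the PR's implementation: `SketchRouter` and `NoRouteSelectedError` are hypothetical names, and conditions are plain Python callables instead of expression strings, which sidesteps the question of how strings are evaluated safely.

```python
from typing import Any, Dict, List


class NoRouteSelectedError(Exception):
    """Raised when no route condition holds (hypothetical name)."""


class SketchRouter:
    """First-match router sketch; conditions are plain callables, not strings."""

    def __init__(self, routes: List[Dict[str, Any]], routing_variables: List[str]):
        self.routes = routes
        self.routing_variables = routing_variables

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        # Every declared routing variable must be supplied at run time
        missing = [name for name in self.routing_variables if name not in kwargs]
        if missing:
            raise ValueError(f"Missing routing variables: {missing}")
        # Routes are checked in order and the first matching one wins;
        # "output_type" is carried along as metadata only in this sketch
        for route in self.routes:
            if route["condition"](**kwargs):
                name = route["output"]
                return {name: kwargs[name]}
        raise NoRouteSelectedError(f"No route fired for variables {sorted(kwargs)}")


routes = [
    {"condition": lambda streams, **_: len(streams) < 2, "output": "query", "output_type": str},
    {"condition": lambda streams, **_: len(streams) >= 2, "output": "streams", "output_type": List[int]},
]
router = SketchRouter(routes=routes, routing_variables=["query", "streams"])
print(router.run(streams=[1, 2, 3], query="test"))  # {'streams': [1, 2, 3]}
```

Raising when nothing matches mirrors the behavior discussed later in this thread; whether to raise or drop is an open design question.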

How did you test it:

Unit tests were added to ensure that the new Router component works correctly at the component level. A real-world example is available in this colab.

Notes for reviewer:

This is not a final version but rather the start of a conversation about expressive conditional routing in Haystack 2.x. DO NOT INTEGRATE.

vblagoje · 2023-10-21T18:45:27Z

@ZanSara @masci This is not a final solution but the start of a conversation toward adding such a conditional routing component.

ZanSara · 2023-10-23T11:25:11Z

Hey @vblagoje this seems like a solid start! I have a few questions:

I see you've addressed the case where no route is selected by raising an exception. I think it would be better to just drop the value with a warning, or at least to let the user choose not to raise an exception.
Is it possible to output on two connections at the same time? If the input matches more than one rule, what happens?
Is it possible that the rules have non-overlapping variables? For example, what happens if I have two rules where one checks for streams and another for files (just an example). In this case, are both inputs mandatory, or are they alternatives?
I am also assuming that the conditions can only be applied to the "whole" inputs, regardless of their types, and lists can't be split with this component (something like what FileTypeClassifier does). Is that the case?

In general though, looking promising!

vblagoje · 2023-10-23T13:27:12Z

Hey @ZanSara I don't know what the right answers to these questions are, but we can involve the community to hone in on the details.

Some ideas:

I agree with your suggestion on providing a flexible way to handle unmatched routes. Introducing an unmatched_route_behavior parameter with options like 'warn', 'error', or 'drop' can empower users to dictate how the Router behaves in such scenarios.
Currently, the design allows a single matched route per input to ensure deterministic routing. However, there is nothing wrong with accommodating multiple matched routes if users ask for it; in that case, multiple routes would "fire".
The rule evaluation is designed to be flexible. Whether a variable is mandatory depends on the route conditions: if a condition only checks whether a variable is truthy, that variable can be optional. Again, we can involve the community.
Anything you can express with a boolean expression and variable references should be allowed, right? So if you can access the replies from GPTGenerator, you should be able to access the first ChatMessage and check its role, for example.
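The first two ideas in this list can be sketched together. This is a hypothetical helper, not part of the PR: `select_routes` and `allow_multiple` are illustrative names, while the `unmatched_route_behavior` values follow the proposal in the list. Conditions and outputs are callables over an `inputs` dict, which is an assumption made to keep the sketch free of expression parsing.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


class NoRouteSelectedError(Exception):
    """Raised when no route fires and the behavior is "error" (hypothetical)."""


def select_routes(
    routes: List[Dict[str, Any]],
    inputs: Dict[str, Any],
    unmatched_route_behavior: str = "error",
    allow_multiple: bool = False,
) -> Dict[str, Any]:
    if unmatched_route_behavior not in ("warn", "error", "drop"):
        raise ValueError(f"Unknown unmatched_route_behavior: {unmatched_route_behavior!r}")
    outputs: Dict[str, Any] = {}
    for route in routes:
        if route["condition"](inputs):
            outputs[route["output_name"]] = route["output"](inputs)
            if not allow_multiple:
                break  # deterministic default: only the first matching route fires
    if not outputs:
        if unmatched_route_behavior == "error":
            raise NoRouteSelectedError(f"No route fired for inputs {sorted(inputs)}")
        if unmatched_route_behavior == "warn":
            logger.warning("No route fired; dropping inputs %s", sorted(inputs))
        # "drop" (and "warn", after logging) simply emit no outputs
    return outputs


routes = [
    {"condition": lambda v: len(v["streams"]) > 2, "output": lambda v: v["streams"], "output_name": "enough_streams"},
    {"condition": lambda v: len(v["streams"]) >= 1, "output": lambda v: len(v["streams"]), "output_name": "stream_count"},
]
```

With `allow_multiple=True`, an input of three streams fires both routes; with the default, only `enough_streams` is emitted.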

For the last bullet point, consider the #6138 use case. There we need to put Router after GPTGenerator to check whether an LLM response message is a function call and, if so, route the message to ServiceContainer to handle it. This would make function invocation, which used to be complex and verbose, super simple and easy to understand, while isolating responsibilities exactly where they belong.

ZanSara · 2023-10-23T14:51:00Z

Actually this use case is very interesting. I don't know if it's possible, but imagine this scenario: I query the LLM with n=2 (for whatever reason), so I get two answers. If one is a function call and the other isn't, are both outputs going to carry one of these replies each? That would require unpacking the list. I think your current implementation does not support this case yet.

Not a requirement, I'm just trying to define the use cases that are supported and those that aren't 😊
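The n=2 scenario above requires routing each list item separately rather than the list as a whole. A minimal sketch of that unpacking, assuming replies are plain dicts standing in for ChatMessage objects and that a function call is signalled by `metadata["finish_reason"] == "function_call"` (as in the metadata used later in this thread); `split_replies` and the output names are hypothetical:

```python
from typing import Any, Dict, List


def split_replies(replies: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Route each reply individually instead of routing the whole list."""
    outputs: Dict[str, List[Dict[str, Any]]] = {"function_calls": [], "answers": []}
    for reply in replies:
        if reply["metadata"].get("finish_reason") == "function_call":
            outputs["function_calls"].append(reply)
        else:
            outputs["answers"].append(reply)
    # Emit only the outputs that actually received at least one reply
    return {name: items for name, items in outputs.items() if items}


replies = [
    {"content": "Some function payload", "metadata": {"finish_reason": "function_call"}},
    {"content": "It is sunny in Berlin.", "metadata": {"finish_reason": "stop"}},
]
routed = split_replies(replies)
```

Here both outputs carry one reply each, which is exactly the behavior a whole-input router cannot express.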

haystack/preview/components/routers/router.py

vblagoje · 2023-10-24T14:48:15Z

@masci @silvanocerza @ZanSara, I came across asteval, a compact MIT-licensed library that precisely meets our requirements by providing a safe alternative to eval. It iteratively traverses the AST and executes operations directly. We can configure its Interpreter the way we want in our Router component (see the example below). Here's a snippet demonstrating its use:

import contextlib
import sys
from asteval import Interpreter

aeval = Interpreter(
    minimal=True, 
    use_numpy=False,
    user_symbols={"x": [1,2,3], "y": 2},
    max_statement_length=10
)

# this context manager is totally optional but could be useful for invalid user expressions
@contextlib.contextmanager
def limited_recursion(recursion_limit):
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(recursion_limit)
    try:
        yield
    finally:
        sys.setrecursionlimit(old_limit)

with limited_recursion(50):
    result = aeval.eval("len(x) > y")
    if len(aeval.error) > 0:
        for err in aeval.error:
            print(err.get_error())
    else:
        print(result)

Run this snippet yourself (pip install asteval first), and put breakpoints in the on_compare method and a few other interesting places, such as the run and eval methods.

Please let me know if we should proceed with our experiments using this approach.

ZanSara · 2023-10-26T07:56:52Z

Hey @vblagoje, while asteval looks safer than direct eval, I believe @masci's idea was rather about using something much simpler: for example something that looks like the document store's filters.

Right now your example would not work there because it uses len(); however, such operators can be added to the filtering syntax for this component. I also believe we won't need many of them besides len, especially at the beginning.
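A filter-style condition language along these lines could be evaluated without any eval at all, by walking a nested dict and only allowing whitelisted operators. The sketch below is hypothetical: the operator names (`$and`, `$or`, `$eq`, `$gt`, ..., `$len`) and the nesting shape are assumptions loosely modeled on document-store filter dictionaries, not an existing Haystack syntax.

```python
import operator
from typing import Any, Dict

# Whitelisted comparison operators (hypothetical names)
COMPARISONS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def _check(value: Any, spec: Dict[str, Any]) -> bool:
    """Apply every operator in `spec` to `value`; all must hold."""
    for op, arg in spec.items():
        if op == "$len":
            # Apply len() and evaluate the nested spec on the result
            if not _check(len(value), arg):
                return False
        elif op in COMPARISONS:
            if not COMPARISONS[op](value, arg):
                return False
        else:
            raise ValueError(f"Unsupported operator: {op}")
    return True


def evaluate(condition: Dict[str, Any], variables: Dict[str, Any]) -> bool:
    """Evaluate a filter-like condition against the routing variables."""
    results = []
    for key, spec in condition.items():
        if key == "$and":
            results.append(all(evaluate(c, variables) for c in spec))
        elif key == "$or":
            results.append(any(evaluate(c, variables) for c in spec))
        else:
            results.append(_check(variables[key], spec))
    return all(results)


# Equivalent of "len(streams) > 2"
enough_streams = {"streams": {"$len": {"$gt": 2}}}
```

The trade-off compared with asteval is expressiveness: this form only compares against literals, so a condition like `len(x) > y`, where `y` is another variable, would need an extra operator.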

vblagoje · 2023-10-26T08:26:23Z

@ZanSara the example above was intentionally simple, to show how we can make this work; maybe that biased you. I think the community needs rich boolean expressions to make this component useful. I know I did for the use case I encountered: routing messages based on certain ChatMessage properties, as was the case with ServiceComponent.
Here is the updated code sample (recursion circuit breaker omitted for brevity):

from asteval import Interpreter

from haystack.preview.dataclasses import ChatMessage, ChatRole

function_call_message = ChatMessage.from_assistant("Some function payload")
function_call_message.metadata.update({'model': 'gpt-3.5-turbo-0613', 'index': 0, 'finish_reason': 'function_call'})

messages = [ChatMessage.from_user("What's the weather like in Berlin?"),
            function_call_message]


aeval = Interpreter(minimal=True,
                    use_numpy=False,
                    user_symbols={"messages": messages},
                    max_statement_length=100)

result = aeval.eval('messages[-1].metadata["finish_reason"] == "function_call"')
print(result)

masci

Overall looks good to me, left a couple of comments regarding code documentation.

One last question: I see there's some additional complexity to make output_name optional. I think that always passing output_name would simplify both the code and how we teach this feature, but I'm not sure how big of a burden this would be from the UX perspective. How did you evaluate the tradeoff?

haystack/preview/components/routers/conditional_router.py

vblagoje · 2023-11-15T16:57:45Z

Yeah, exactly @masci - I went on to always use output_name in my code tests. However, that's a sample of 1 and it would be great to have others take a look at the colab and play with it to get a sense. That would be an essential piece of information to conclude this PR.

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

masci · 2023-11-17T13:50:29Z

I would give precedence to simplicity: for now we can make the parameter mandatory, and if we get negative feedback around UX we know we can change it later.
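With `output_name` mandatory, every route dictionary carries the same four keys (`condition`, `output`, `output_name`, `output_type`, as in the routes shown later in this thread). A minimal fail-fast check could look like the sketch below; `validate_routes` is a hypothetical helper, not the PR's code.

```python
from typing import Any, Dict, List

MANDATORY_ROUTE_KEYS = ("condition", "output", "output_name", "output_type")


def validate_routes(routes: List[Dict[str, Any]]) -> None:
    """Fail fast if any route omits one of the four mandatory keys."""
    for index, route in enumerate(routes):
        missing = [key for key in MANDATORY_ROUTE_KEYS if key not in route]
        if missing:
            raise ValueError(f"Route #{index} is missing mandatory key(s): {', '.join(missing)}")


routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": List[int],
    },
]
validate_routes(routes)  # passes silently
```

Checking once at construction time keeps the error close to the user's mistake instead of surfacing it when the pipeline runs.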

vblagoje · 2023-11-17T14:55:21Z

@masci it should be ready now, but please check whether I have overlooked something in this refactored version, which is now much simpler. The UX colab has not changed, as I had intuitively used all four fields as if they were mandatory. Please also check whether the docs are easy to digest.

ZanSara

Overall seems good to me, but there is a serious issue with type serialization that we need to fix before this component can be used in a pipeline.

When serializing a list, producing serialize_type(List[int]) == "typing.List" is not sufficient: canals needs to know that it's a list of int. We need a way to store that information as well, or the deserialized pipeline will fail to reconnect its components.
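One way to keep the generic arguments is sketched below. It is not the code from 7eb0943; it relies on `repr()` of parameterized `typing` generics (e.g. `repr(List[int]) == "typing.List[int]"`) and on a small recursive parser for deserialization. Names without a module (such as `int` inside brackets) are assumed to be builtins. Forms like `Callable[[int], str]` are not handled by this sketch.

```python
import builtins
import importlib
import typing
from typing import Any, Dict, List, Tuple


def serialize_type(target: Any) -> str:
    """Serialize a type to a string, keeping generic arguments."""
    if typing.get_args(target):
        # repr() of a parameterized typing generic spells out module and args,
        # e.g. repr(Dict[str, List[int]]) == "typing.Dict[str, typing.List[int]]"
        return repr(target)
    # Plain classes become "module.QualifiedName", e.g. str -> "builtins.str"
    return f"{target.__module__}.{target.__qualname__}"


def _resolve_name(name: str) -> Any:
    """Resolve "typing.List", "builtins.str" or a bare builtin name like "int"."""
    module_name, _, attr = name.rpartition(".")
    if not module_name:
        return getattr(builtins, attr)
    return getattr(importlib.import_module(module_name), attr)


def deserialize_type(text: str) -> Any:
    """Inverse of serialize_type for dotted names with nested [...] arguments."""

    def parse(pos: int) -> Tuple[Any, int]:
        start = pos
        # Read a (possibly dotted) name up to the next bracket or comma
        while pos < len(text) and text[pos] not in "[],":
            pos += 1
        obj = _resolve_name(text[start:pos].strip())
        if pos < len(text) and text[pos] == "[":
            pos += 1  # skip "["
            args = []
            while True:
                arg, pos = parse(pos)
                args.append(arg)
                if pos < len(text) and text[pos] == ",":
                    pos += 1
                elif pos < len(text) and text[pos] == "]":
                    pos += 1
                    break
                else:
                    raise ValueError(f"Malformed type string: {text!r}")
            # Subscript the origin, e.g. typing.Dict[(str, int)] == Dict[str, int]
            obj = obj[tuple(args)] if len(args) > 1 else obj[args[0]]
        return obj, pos

    result, end = parse(0)
    if end != len(text):
        raise ValueError(f"Unexpected trailing characters in {text!r}")
    return result
```

Round-tripping `List[int]` through these two functions yields `List[int]` again rather than a bare `typing.List`, which is what a connection type check needs.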

ZanSara · 2023-11-20T09:51:14Z

haystack/preview/components/routers/conditional_router.py

+    """Exception raised when there is an error parsing or evaluating the condition expression in ConditionalRouter."""
+
+
+def serialize_type(target: Any) -> str:


Let's move this one (and its sibling function deserialize_type) into an external module, so it can be reused. I think it will be handy for other components too.

ZanSara · 2023-11-20T09:56:29Z

haystack/preview/components/routers/conditional_router.py

+            except Exception as e:
+                raise RouteConditionException(f"Error evaluating condition for route '{route}': {e}") from e
+
+        raise NoRouteSelectedException(f"No route fired. Routes: {self.routes}")


This is interesting: why failing instead of dropping the input (with a loud log if necessary)? I'd say we should at least give the option to either fail or drop the value.

I don't know; it's just a simple solution for now. Let's put it in the hands of users and see what they say. An option would be yet another variable to turn on/off, describe, and test, and one more thing to confuse people with; from my perspective, it's unnecessary.

ZanSara · 2023-11-20T09:58:47Z

haystack/preview/components/routers/conditional_router.py

+    routes = [
+        {
+            "condition": "{{streams|length > 2}}",
+            "output": "{{streams}}",
+            "output_name": "enough_streams",
+            "output_type": List[int],
+        },
+        {
+            "condition": "{{streams|length <= 2}}",
+            "output": "{{streams}}",
+            "output_name": "insufficient_streams",
+            "output_type": List[int],
+        },
+    ]


If I understand correctly, these are dictionaries with four fixed keys. How about a small dataclass to help with code completion?

Then @silvanocerza will ask me why I made a dataclass for this thing 🤣 🤣 Perhaps in the next iteration, for the final release!
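For reference, the dataclass idea could be as small as the sketch below. `Route` and `to_dict` are hypothetical names; the four fields mirror the route dictionaries in the diff above, and `to_dict` converts back to that plain-dict form so the dataclass would stay an optional convenience.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Route:
    """Hypothetical typed container for one route; all four fields are mandatory."""

    condition: str  # e.g. "{{streams|length > 2}}"
    output: str  # e.g. "{{streams}}"
    output_name: str  # name of the router output this route feeds
    output_type: Any  # e.g. List[int]

    def to_dict(self) -> Dict[str, Any]:
        # Plain dict form, matching the route dictionaries in the diff above
        return {
            "condition": self.condition,
            "output": self.output,
            "output_name": self.output_name,
            "output_type": self.output_type,
        }


routes = [
    Route("{{streams|length > 2}}", "{{streams}}", "enough_streams", List[int]),
    Route("{{streams|length <= 2}}", "{{streams}}", "insufficient_streams", List[int]),
]
route_dicts = [route.to_dict() for route in routes]
```

Because the fields have no defaults, a route missing any of the four values fails at construction time with a `TypeError`, and IDEs can complete the field names.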

ZanSara · 2023-11-20T10:00:50Z

test/preview/components/routers/test_conditional_router.py

+
+    def test_output_type_serialization(self):
+        assert serialize_type(str) == "builtins.str"
+        assert serialize_type(List[int]) == "typing.List"


I'm afraid this won't be enough to deserialize it into a type that Canals can use for a connection. We definitely need to preserve the int as well.

@ZanSara I can adjust the code to serialize List[int] into the string "typing.List[int]", but what about deserialization? Is deserializing into typing.List enough?

Ok, all should be covered now with 7eb0943

vblagoje · 2023-11-21T15:57:46Z

Please have another pass @ZanSara and @masci
See the unit tests in 7eb0943: we now cover generics and nested generics. I agree that we should isolate this type (de)serialization, add more tests, and develop it independently, but please let's do that after this PR has been merged. I'm not sure I covered all possible generic serialization cases, but many of them certainly work now.

masci

LGTM, I think custom serialization for the types is the way to go here. Manually parsing the source code in deserialize_type is not ideal, but honestly I couldn't come up with a better alternative.

The rest was already good. Thanks for incorporating the feedback about making output_name mandatory; I can confirm the code is simpler now. Let's re-evaluate later whether optional is better.

dfokina

Pushed a tiny docstring update from my side, all good 👍

vblagoje · 2023-11-22T17:42:50Z

@ZanSara Let's integrate this one and then iron out these kinks during beta as we continue to play with ConditionalRouter

vblagoje added 7 commits October 21, 2023 12:57

Initial commit

cc202f3

First crude working example

5412f1b

Small fix

24dfd91

Simplify routes, change names, add pydocs

4d7d161

Several improvements, add unit tests

6380ca3

Compile expressions eagerly

87eb309

Fix typing mistakes in unit tests

7ddf516

vblagoje requested a review from a team as a code owner October 21, 2023 18:36

vblagoje requested review from ZanSara and removed request for a team October 21, 2023 18:36

github-actions bot added the topic:tests label Oct 21, 2023

vblagoje added 2.x Related to Haystack v2.0 and removed topic:tests labels Oct 21, 2023

github-actions bot added the type:documentation Improvements on the docs label Oct 21, 2023

Add release note

fd9d913

vblagoje requested a review from a team as a code owner October 21, 2023 18:41

vblagoje requested review from dfokina and removed request for a team October 21, 2023 18:41

github-actions bot added the topic:tests label Oct 21, 2023

masci suggested changes Oct 23, 2023

haystack/preview/components/routers/router.py (outdated)

Use asteval

1d57520

github-actions bot added topic:dependencies topic:build/distribution labels Oct 28, 2023

Merge branch 'main' into connection_router_v2

0f4d0f3

masci suggested changes Nov 15, 2023

vblagoje and others added 5 commits November 17, 2023 14:13

Merge branch 'main' into connection_router_v2

d3593e5

Update haystack/preview/components/routers/conditional_router.py

c9d2e2e

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

Update haystack/preview/components/routers/conditional_router.py

be65d7b

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

Update haystack/preview/components/routers/conditional_router.py

e7add96

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

Update haystack/preview/components/routers/conditional_router.py

ddda60b

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>

vblagoje added 3 commits November 17, 2023 15:27

Simplify - make all 4 router fileds mandatory

0eeabc6

Black __init__.py

b7bbe26

More pydocs

be1fa40

vblagoje added 4 commits November 17, 2023 16:00

Minor final touches

115241e

Remote unit test markers

fb3f3d8

Remove unit test markers

fb61b11

Merge branch 'main' into connection_router_v2

af97e65

ZanSara suggested changes Nov 20, 2023

vblagoje added 3 commits November 21, 2023 15:28

Merge branch 'main' into connection_router_v2

332aa5f

Merge branch 'connection_router_v2' of https://github.com/deepset-ai/haystack into connection_router_v2

208b2c6

Improve (de)serialization, handle nested generics

7eb0943

masci approved these changes Nov 22, 2023

lg update

90b68a3

dfokina approved these changes Nov 22, 2023

vblagoje requested a review from ZanSara November 22, 2023 17:41

ZanSara approved these changes Nov 23, 2023

vblagoje merged commit b557f30 into main Nov 23, 2023
22 checks passed

vblagoje deleted the connection_router_v2 branch November 23, 2023 09:28
