Updated 3 months ago

Hey! Since llama-index 0.10.29, there seems to be a change in how the condition_fn works. We had this code:
# If there are players found, search for them in the vector store
# If there are teams found, search for them in the vector store
p.add_link("get_players", "player_vector", condition_fn=lambda x: len(x) > 0) 
p.add_link("get_teams", "teams_vector", condition_fn=lambda x: len(x) > 0)

# Generate a context object that contains the player and team data in a json format
# This is so that the context can be passed to the text_to_sql component with player and team ids etc
p.add_link("player_vector", "generate_prompt", dest_key="data")
p.add_link("teams_vector", "generate_prompt", dest_key="data")
p.add_link("input", "generate_prompt", dest_key="input")


Both player_vector and teams_vector connect to generate_prompt, and either one was optional: generate_prompt ran with whatever data had been extracted, and this worked fine. But now, if either condition_fn returns False, the pipeline stops before reaching generate_prompt. Is that expected, or is it a bug introduced in 0.10.29?
Is that the full code for all your links?
Kind of curious how generate_prompt can run if it doesn't have all its dependencies connected?
that is not the full code but it does have all dependencies connected, data and input
but if player and teams vector don't link, won't it be missing data?

In any case, the full code might be useful for replicating the issue
tbh I'm surprised two links to the same dest key works 😅 I don't remember seeing code to handle the case where more than one dest key is being linked to at runtime -- how does it combine the output of player_vector and teams_vector 🤔

Anyways, let's see if I can reproduce with what I have here
def generate_prompt(input, teams=None, players=None):
    # f-string so the placeholders are actually filled in
    return f"""\
        Query: {input}

        example rows:
        {teams}
        {players}
    """

FnComponent(fn=generate_prompt)

# If there are players found, search for them in the vector store
# If there are teams found, search for them in the vector store
p.add_link("get_players", "player_vector", condition_fn=lambda x: len(x) > 0) 
p.add_link("get_teams", "teams_vector", condition_fn=lambda x: len(x) > 0)

# Generate a context object that contains the player and team data in a json format
# This is so that the context can be passed to the text_to_sql component with player and team ids etc
p.add_link("player_vector", "generate_prompt", dest_key="players")
p.add_link("teams_vector", "generate_prompt", dest_key="teams")
p.add_link("input", "generate_prompt", dest_key="input")

# Generate an SQL Query based on the context object
p.add_link("generate_prompt", "text_to_sql")

return p
After 0.10.29, generate_prompt is not even called if at least one of the condition_fn calls returns False, even if the other one returns True.
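One possible workaround while this gets sorted out (a sketch only, not verified against 0.10.29): drop the condition_fn and move the emptiness check into the component itself, so every link always fires and generate_prompt always receives something for its optional inputs. The names `guarded` and `fake_player_search` below are hypothetical stand-ins, not part of llama-index or of the code above.

```python
from typing import Callable, List, Optional


def guarded(
    search: Callable[[List[str]], List[dict]],
) -> Callable[[List[str]], Optional[List[dict]]]:
    """Wrap a search function so that empty input short-circuits to None."""

    def wrapper(names: List[str]) -> Optional[List[dict]]:
        if not names:
            # Nothing was extracted upstream, so skip the vector search.
            return None
        return search(names)

    return wrapper


def fake_player_search(names: List[str]) -> List[dict]:
    # Hypothetical stand-in for the real vector store lookup.
    return [{"name": n, "id": i} for i, n in enumerate(names, start=1)]


player_vector = guarded(fake_player_search)
```

If that approach fits, the wrapped function would go into `FnComponent(fn=player_vector)` and the link would become a plain `p.add_link("get_players", "player_vector")` with no condition_fn, leaving generate_prompt to handle a None value for the missing side.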
ahh I see, sneaky

So, there was an update to conditional links. Basically, there were some odd scenarios where a conditional link could fail, but we would still try to run the dependencies of that original link without an input, causing a failure.

I actually added a specific unit test for this fix, if you are curious
https://github.com/run-llama/llama_index/blob/8b373239396134a92c9277b36aa7023c633c018a/llama-index-core/tests/query_pipeline/test_query.py#L478

So maybe related to that? I can probably debug later today. Although feel free to make a PR and debug as well if you have time. Managing dependencies of the DAG with conditional links is a tad complex 😅
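To make the expected behaviour concrete, here is a toy model of it in plain Python. This is not the llama-index implementation, just a sketch under the assumption that a node should run whenever all of its required keys are delivered by links that fired from nodes that actually ran. `nodes_that_run`, the `"query"` key, and the data layout are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# (source, destination, dest_key, link_fired)
Edge = Tuple[str, str, str, bool]


def nodes_that_run(
    order: List[str],
    edges: List[Edge],
    required: Dict[str, Set[str]],
) -> List[str]:
    """Walk nodes in topological order and keep those whose required keys arrive."""
    ran: List[str] = []
    for node in order:
        incoming = [e for e in edges if e[1] == node]
        # A key is delivered only if its link fired AND its source node ran.
        delivered = {key for src, _, key, fired in incoming if fired and src in ran}
        if required.get(node, set()) <= delivered:
            ran.append(node)
    return ran


order = ["input", "get_players", "get_teams",
         "player_vector", "teams_vector", "generate_prompt"]
edges = [
    ("input", "get_players", "query", True),
    ("input", "get_teams", "query", True),
    ("get_players", "player_vector", "query", False),  # condition_fn returned False
    ("get_teams", "teams_vector", "query", True),
    ("input", "generate_prompt", "input", True),
    ("player_vector", "generate_prompt", "players", True),
    ("teams_vector", "generate_prompt", "teams", True),
]
required = {
    "get_players": {"query"},
    "get_teams": {"query"},
    "player_vector": {"query"},
    "teams_vector": {"query"},
    "generate_prompt": {"input"},  # teams and players are optional
}

result = nodes_that_run(order, edges, required)
```

In this model player_vector is skipped, but generate_prompt still runs because only `input` is required. That is the behaviour reported as working before 0.10.29.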
I probably can debug later in the day. thanks!
I made a quick test for the issue

from llama_index.core.query_pipeline import FnComponent, InputComponent, QueryPipeline


def get_players(query: str):
  return [1]

def get_teams(query: str):
  return [2]

def player_vector(query: str):
  return [3]

def teams_vector(query: str):
  return [4]

def generate_prompt(input, teams = None, players = None):
  return "\n".join([input, str(teams), str(players)])


pipeline = QueryPipeline(
  modules={
    "input": InputComponent(),
    "get_players": FnComponent(get_players),
    "get_teams": FnComponent(get_teams),
    "player_vector": FnComponent(player_vector),
    "teams_vector": FnComponent(teams_vector),
    "generate_prompt": FnComponent(generate_prompt, opt_params=set(["teams", "players"]))
  },
  verbose=True
)

pipeline.add_link("input", "get_players")
pipeline.add_link("get_players", "player_vector", condition_fn=lambda x: len(x) > 0)

pipeline.add_link("input", "get_teams")
pipeline.add_link("get_teams", "teams_vector", condition_fn=lambda x: len(x) > 0)

pipeline.add_link("input", "generate_prompt", dest_key="input")
pipeline.add_link("teams_vector", "generate_prompt", dest_key="teams")
pipeline.add_link("player_vector", "generate_prompt", dest_key="players")

print(pipeline.run(input="hello!"))


The issue is in this function:
https://github.com/run-llama/llama_index/blob/8b373239396134a92c9277b36aa7023c633c018a/llama-index-core/llama_index/core/query_pipeline/query.py#L588

Which gets called from here in this loop:
https://github.com/run-llama/llama_index/blob/8b373239396134a92c9277b36aa7023c633c018a/llama-index-core/llama_index/core/query_pipeline/query.py#L710


I started exploring a fix, but then it broke the unit tests.