Hey! Since llama-index 0.10.29, there

Hey! Since llama-index 0.10.29, there seems to be a change in how the condition_fn works. We had this code:

# If there are players found, search for them in the vector store
# If there are teams found, search for them in the vector store
p.add_link("get_players", "player_vector", condition_fn=lambda x: len(x) > 0) 
p.add_link("get_teams", "teams_vector", condition_fn=lambda x: len(x) > 0)

# Generate a context object that contains the player and team data in a json format
# This is so that the context can be passed to the text_to_sql component with player and team ids etc
p.add_link("player_vector", "generate_prompt", dest_key="data")
p.add_link("teams_vector", "generate_prompt", dest_key="data")
p.add_link("input", "generate_prompt", dest_key="input")

Both player_vector and teams_vector connect to generate_prompt, and either one was optional: generate_prompt still ran if only one of them produced data. That was working fine. But now, if either of those two condition_fn calls returns false, the pipeline never reaches generate_prompt and stops there. Is that expected, or is it a bug introduced in 0.10.29?

10 comments

Logan M

Is that the full code for all your links?

Logan M

Kind of curious how generate_prompt can run if it doesn't have all its dependencies connected?

juanfe190

that is not the full code but it does have all dependencies connected, data and input

Logan M

but if player and teams vector don't link, won't it be missing data?

In any case, the full code might be useful for replicating the issue

Logan M

tbh I'm surprised two links to the same dest key works 😅 I don't remember seeing code to handle the case where more than one dest key is being linked to at runtime -- how does it combine the output of player_vector and teams_vector 🤔

Anyways, let's see if I can reproduce with what I have here

juanfe190

def generate_prompt(input, teams=None, players=None):
    # f-string so that {input}, {teams} and {players} are actually interpolated
    return f"""\
        Query: {input}

        example rows:
        {teams}
        {players}
    """

FnComponent(fn=generate_prompt)

# If there are players found, search for them in the vector store
# If there are teams found, search for them in the vector store
p.add_link("get_players", "player_vector", condition_fn=lambda x: len(x) > 0) 
p.add_link("get_teams", "teams_vector", condition_fn=lambda x: len(x) > 0)

# Generate a context object that contains the player and team data in a json format
# This is so that the context can be passed to the text_to_sql component with player and team ids etc
p.add_link("player_vector", "generate_prompt", dest_key="players")
p.add_link("teams_vector", "generate_prompt", dest_key="teams")
p.add_link("input", "generate_prompt", dest_key="input")

# Generate an SQL Query based on the context object
p.add_link("generate_prompt", "text_to_sql")

return p

juanfe190

after 0.10.29, generate_prompt is not even called if at least one of the condition_fn returns false, even if the other one returns true
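
One possible workaround, not confirmed anywhere in this thread, is to drop condition_fn and move the emptiness check into the steps themselves, so that every link is unconditional and generate_prompt always receives something (possibly None) for teams and players. The sketch below uses plain Python functions only; the lookup bodies are hypothetical stand-ins for the real vector store calls, and whether this fits the actual pipeline is an assumption.

```python
from typing import List, Optional


def player_vector(players: List[str]) -> Optional[List[str]]:
    # Hypothetical stand-in for the vector store lookup.
    # The emptiness check that condition_fn did now lives inside the step.
    if not players:
        return None
    return [f"player row: {name}" for name in players]


def teams_vector(teams: List[str]) -> Optional[List[str]]:
    # Same idea for teams: return None instead of blocking the link.
    if not teams:
        return None
    return [f"team row: {name}" for name in teams]


def generate_prompt(input: str, teams=None, players=None) -> str:
    # Skip whichever side came back empty (None or []).
    rows = [row for group in (teams, players) if group for row in group]
    return "\n".join([f"Query: {input}", "example rows:", *rows])


# No players found, one team found: the prompt is still generated.
prompt = generate_prompt(
    "who won?",
    teams=teams_vector(["Lakers"]),
    players=player_vector([]),
)
print(prompt)
```

With functions shaped like this, the add_link calls would presumably not need condition_fn at all, at the cost of always running both lookup steps.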

Logan M

ahh I see, sneaky

So, there was some update to conditional links. Basically, there were some odd scenarios where we could have a conditional link fail, but then still try to run dependencies of that original link without having an input, causing a failure.

I actually added a specific unit test for this fix, if you are curious
https://github.com/run-llama/llama_index/blob/8b373239396134a92c9277b36aa7023c633c018a/llama-index-core/tests/query_pipeline/test_query.py#L478

So maybe related to that? I can probably debug later today. Although feel free to make a PR and debug as well if you have time. Managing dependencies of the DAG with conditional links is a tad complex 😅
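
To make the dependency question concrete, here is a toy model of the two policies being discussed. It is a sketch only and not the llama-index implementation: the function name, the `strict` flag, and the `optional_srcs` mapping are all made up for illustration. With `strict=True`, a node is skipped as soon as any incoming link is dead, which matches the behavior reported above. With `strict=False`, a dead link only blocks a node when it comes from a source the node does not treat as optional.

```python
from typing import Dict, List, Set, Tuple

# Toy model only: this is NOT how llama-index is implemented.
# A link is (src, dest, passes); `passes` is the condition_fn result,
# and unconditional links always pass.
Link = Tuple[str, str, bool]


def runnable_nodes(
    links: List[Link],
    optional_srcs: Dict[str, Set[str]],
    strict: bool,
) -> Set[str]:
    """Return the set of nodes that would still run."""
    nodes = {n for src, dest, _ in links for n in (src, dest)}
    incoming: Dict[str, List[Tuple[str, bool]]] = {n: [] for n in nodes}
    for src, dest, passes in links:
        incoming[dest].append((src, passes))

    alive: Set[str] = set(nodes)
    changed = True
    while changed:  # keep propagating skips until nothing changes
        changed = False
        for node in sorted(alive):
            for src, passes in incoming[node]:
                dead = (not passes) or (src not in alive)
                if not dead:
                    continue
                # A dead link blocks the node unless its source is optional.
                if strict or src not in optional_srcs.get(node, set()):
                    alive.discard(node)
                    changed = True
                    break
    return alive


links = [
    ("input", "get_players", True),
    ("input", "get_teams", True),
    ("get_players", "player_vector", False),  # no players found
    ("get_teams", "teams_vector", True),
    ("player_vector", "generate_prompt", True),
    ("teams_vector", "generate_prompt", True),
    ("input", "generate_prompt", True),
]
optional = {"generate_prompt": {"player_vector", "teams_vector"}}

strict_result = runnable_nodes(links, optional, strict=True)
lenient_result = runnable_nodes(links, optional, strict=False)
print("generate_prompt" in strict_result, "generate_prompt" in lenient_result)
```

In this model player_vector is skipped under both policies, but generate_prompt only survives under the lenient one, which is the behavior the original code relied on.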

juanfe190

I probably can debug later in the day. thanks!

Logan M

I made a quick test for the issue

from llama_index.core.query_pipeline import FnComponent, InputComponent, QueryPipeline

def get_players(query: str):
  return [1]

def get_teams(query: str):
  return [2]

def player_vector(query: str):
  return [3]

def teams_vector(query: str):
  return [4]

def generate_prompt(input, teams = None, players = None):
  return "\n".join([input, str(teams), str(players)])


pipeline = QueryPipeline(
  modules={
    "input": InputComponent(),
    "get_players": FnComponent(get_players),
    "get_teams": FnComponent(get_teams),
    "player_vector": FnComponent(player_vector),
    "teams_vector": FnComponent(teams_vector),
    "generate_prompt": FnComponent(generate_prompt, opt_params=set(["teams", "players"]))
  },
  verbose=True
)

pipeline.add_link("input", "get_players")
pipeline.add_link("get_players", "player_vector", condition_fn=lambda x: len(x) > 0)

pipeline.add_link("input", "get_teams")
pipeline.add_link("get_teams", "teams_vector", condition_fn=lambda x: len(x) >0)

pipeline.add_link("input", "generate_prompt", dest_key="input")
pipeline.add_link("teams_vector", "generate_prompt", dest_key="teams")
pipeline.add_link("player_vector", "generate_prompt", dest_key="players")

print(pipeline.run(input="hello!"))

The issue is in this function:
https://github.com/run-llama/llama_index/blob/8b373239396134a92c9277b36aa7023c633c018a/llama-index-core/llama_index/core/query_pipeline/query.py#L588

Which gets called from here in this loop:
https://github.com/run-llama/llama_index/blob/8b373239396134a92c9277b36aa7023c633c018a/llama-index-core/llama_index/core/query_pipeline/query.py#L710

I started exploring a fix but then it broke unit tests :PSadge:
