Updated 4 months ago

Routing

In a RouterQueryEngine, when would it make sense to use PydanticSingleSelector instead of LLMSingleSelector?
Always use the pydantic selector if you are using OpenAI. It uses their function-calling API, which is very reliable.

LLM selectors rely on the LLM to output raw JSON as text, which is more error-prone.
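The difference comes down to how the selector's choice gets parsed. Here is a toy, stdlib-only illustration of the two paths (function names and sample payloads are invented for this sketch, not llama_index internals): the function-calling API hands back arguments that already match a schema, while a text-based selector has to scrape JSON out of free-form output.

```python
import json
import re

def parse_text_selection(llm_output: str) -> int:
    """Fragile path: dig a JSON object out of free-form LLM text."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(match.group(0))["choice"]

def parse_function_call(arguments: str) -> int:
    """Function-calling path: the API returns an arguments string that
    already conforms to the JSON schema you requested."""
    return json.loads(arguments)["choice"]

# Structured arguments from a function call parse directly:
print(parse_function_call('{"choice": 1}'))  # 1

# Free-form text often works too...
print(parse_text_selection('Sure! Here is my answer: {"choice": 1}'))  # 1

# ...but chatty output with no valid JSON breaks the text parser:
try:
    parse_text_selection("I would pick the first query engine.")
except ValueError as e:
    print(f"parse failed: {e}")
```

This is why the pydantic selector is the safer default with OpenAI models: the failure mode above simply never comes up.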
Thank you πŸ™‚
I have one more question: SQLJoinQueryEngine can only search either the SQL or the vector index, but it cannot do a subquery and then synthesize, right? What can I use if I want to query both the SQL and the vector index for one query?
Or maybe I have not described the query tool correctly, as it fails with an error trying to find data that is in the vector index ...
Let me give you more context: I have a simple table about cats that includes the cat name, origin, temperament, size and social behaviour. And I have vector indexes for semantic data about cats. I wanted to use SQLJoinQueryEngine to query both. The query I send in is "Tell me about Maine Coon size and its life span". I want the size to be read from SQL and the life span from the vectors. But the engine comes up with the query:

Plain Text
SELECT size, life_span
FROM cat_breeds
WHERE name = 'Maine Coon'

Here is the Query Engine Tool description:

Plain Text
sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    description=(
        'Useful for translating a natural language query into a SQL query over a table containing: '
        'cat breed, containing the name/origin/temperament/size/social_behaviour of each cat breed. '
        'Use when you need information about the cat origin, temperament, size and social behaviour. '
        'Do NOT use for other characteristics like life span.'
    )
)
Yea the instructions won't be seen by the LLM once it writes the sql query πŸ˜…

The agent sees the description and writes an initial query for the sql engine. But then another instance of the LLM, disconnected from all previous chat history/messages, converts that agent query to a sql query

In the sql query engine itself, try setting the context_query_kwargs to be a dict of table name to description:

context_query_kwargs={"table_name": "table_details"}

This will be inserted alongside the schema and agent query when generating the sql code
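To make the mechanism concrete, here is a rough stdlib-only sketch of how a per-table description gets appended to the schema string before it goes into the text-to-SQL prompt. The function name and exact formatting are illustrative, not llama_index internals (the real format can be seen in the "Table desc str" log output later in this thread):

```python
def build_table_desc(table_name, columns, description=""):
    """Compose a schema string plus optional table description,
    mimicking the 'Table desc str' that gets logged by the engine."""
    cols = ", ".join(f"{name} ({sqltype})" for name, sqltype in columns.items())
    desc = f"Table '{table_name}' has columns: {cols}, and foreign keys: ."
    if description:
        desc += f" The table description is: {description}"
    return desc

schema = {"name": "VARCHAR(16)", "origin": "VARCHAR(16)"}
print(build_table_desc("cat_breeds", schema, "name,origin"))
```

The point is that whatever you put in context_query_kwargs ends up as extra text next to the schema, visible to the LLM instance that actually writes the SQL.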
Great, thank you for the explanation, will try it out πŸ™‚
Hmm, should I format it in another way:

Plain Text
query_engine = NLSQLTableQueryEngine(
    sql_database,
    context_query_kwargs={"cat_breeds": "name,origin,temperament,size,social_behavior"}
)

It does now pass the description:

Plain Text
{'result': [('Maine Coon', 'Very Large')],
 'sql_query': 'SELECT name, size\nFROM cat_breeds\nORDER BY size DESC\nLIMIT 1;'}
[('Savannah', 'USA', 'high energy', 'Large', 'sociable'), ('Ragdoll', 'USA', 'gentle', 'Large', 'sociable'), ('Maine Coon', 'USA (Maine)', 'friendly', 'Very Large', 'sociable')]
INFO:llama_index.indices.struct_store.sql_query:> Table desc str: Table 'cat_breeds' has columns: name (VARCHAR(16)), origin (VARCHAR(16)), temperament (VARCHAR(16)), size (VARCHAR(16)), social_behavior (VARCHAR(16)), and foreign keys: . The table description is: name,origin,temperament,size,social_behavior

But for some reason it is still looking for a non-existent column:

Plain Text
OperationalError: (sqlite3.OperationalError) no such column: life_expectancy
[SQL: SELECT name, life_expectancy
FROM cat_breeds
WHERE name = 'Ragdoll']

I'm using only NLSQLTableQueryEngine now. Should I format it differently?
Try adding more description maybe? ".. These are the only columns available, do not use columns that aren't listed here"
Oh great, thank you!! Now it works πŸ™‚
Oh, but it does not work with every query... for some reason it knows not to query the column life_span, but it still tries to query the column eye_color
This is my code now:

Plain Text
from llama_index.indices.struct_store import NLSQLTableQueryEngine
query_engine = NLSQLTableQueryEngine(
    sql_database,
    context_query_kwargs={"cat_breeds": "These are the only columns available, do not use columns that aren't listed here: name,origin,temperament,size,social_behavior"}
)
maybe I should switch to gpt 4 πŸ™‚
I tried passing a service context to NLSQLTableQueryEngine like this:

Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-4", temperature=0, verbose=True)
service_context = ServiceContext.from_defaults(llm=llm)

from llama_index.indices.struct_store import NLSQLTableQueryEngine
query_engine = NLSQLTableQueryEngine(
    sql_database,
    service_context,
    context_query_kwargs={"cat_breeds": (
        "The only columns available are: name,origin,temperament,size,social_behavior. Do not use other columns and foreign keys.\n"
        "Do not attempt to run a query if the column is not among the available columns.\n"
        "Do not use non-existent columns in the WHERE part of the query.\n"
        "When the question is about characteristics which are not present in the available columns return: SELECT name FROM cat_breeds WHERE 0=1\n"
        )
    }
)

But I get an error with reading a template:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Users/jana/work/LlamaIndexLangChain/llamaindexscripts/queryDatabasesWithNL.ipynb Cell 13 line 1
----> 1 response = query_engine.query("Which cat has the blue almond-shaped eyes?")
      2 response.metadata

File ~/work/LlamaIndexLangChain/llamaindexscripts/myenv/lib/python3.9/site-packages/llama_index/indices/query/base.py:23, in BaseQueryEngine.query(self, str_or_query_bundle)
     21 if isinstance(str_or_query_bundle, str):
     22     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 23 response = self._query(str_or_query_bundle)
     24 return response

File ~/work/LlamaIndexLangChain/llamaindexscripts/myenv/lib/python3.9/site-packages/llama_index/indices/struct_store/sql_query.py:270, in BaseSQLTableQueryEngine._query(self, query_bundle)
    267 table_desc_str = self._get_table_context(query_bundle)
    268 logger.info(f"> Table desc str: {table_desc_str}")
--> 270 response_str = self._service_context.llm_predictor.predict(
    271     self._text_to_sql_prompt,
    272     query_str=query_bundle.query_str,
    273     schema=table_desc_str,
    274     dialect=self._sql_database.dialect,
    275 )
    277 sql_query_str = self._parse_response_to_sql(response_str)
    278 # assume that it's a valid SQL query

File ~/work/LlamaIndexLangChain/llamaindexscripts/myenv/lib/python3.9/site-packages/llama_index/llm_predictor/base.py:140, in LLMPredictor.predict(self, prompt, **prompt_args)
    138 def predict(self, prompt: BasePromptTemplate, **prompt_args: Any) -> str:
    139     """Predict."""
--> 140     self._log_template_data(prompt, **prompt_args)
    142     if self._llm.metadata.is_chat_model:
    143         messages = prompt.format_messages(llm=self._llm, **prompt_args)

File ~/work/LlamaIndexLangChain/llamaindexscripts/myenv/lib/python3.9/site-packages/llama_index/llm_predictor/base.py:130, in LLMPredictor._log_template_data(self, prompt, **prompt_args)
    124 def _log_template_data(
    125     self, prompt: BasePromptTemplate, **prompt_args: Any
    126 ) -> None:
    127     with self.callback_manager.event(
    128         CBEventType.TEMPLATING,
    129         payload={
--> 130             EventPayload.TEMPLATE: prompt.get_template(llm=self._llm),
    131             EventPayload.TEMPLATE_VARS: prompt_args,
    132             EventPayload.SYSTEM_PROMPT: self.system_prompt,
    133             EventPayload.QUERY_WRAPPER_PROMPT: self.query_wrapper_prompt,
    134         },
    135     ):
    136         pass

AttributeError: 'ServiceContext' object has no attribute 'get_template'


I also tried to just find the right prompt, but gpt-3.5-turbo keeps on providing wrong queries even when I very specifically say not to return certain types of queries... any ideas?
gpt-4 might be the way to go for this πŸ˜†
Well, yes, I thought the same, but I get this AttributeError: 'ServiceContext' object has no attribute 'get_template' error when I pass the service context to NLSQLTableQueryEngine. So my question is whether you have any recommendations on what causes the error and how it can be solved. I have pasted the error stack above.
You need to pass it as a kwarg


Plain Text
query_engine = NLSQLTableQueryEngine(
    sql_database,
    service_context=service_context,
    context_query_kwargs={"cat_breeds": (
        "The only columns available are: name,origin,temperament,size,social_behavior. Do not use other columns and foreign keys.\n"
        "Do not attempt to run a query if the column is not among the available columns.\n"
        "Do not use non-existent columns in the WHERE part of the query.\n"
        "When the question is about characteristics which are not present in the available columns return: SELECT name FROM cat_breeds WHERE 0=1\n"
        )
    }
)
Yes, I have done so, as I show in the example above. When I add a service context with gpt-4 I get this error: AttributeError: 'ServiceContext' object has no attribute 'get_template'
Your example above was missing service_context=service_context,
Oh, I was shortsighted, thank you. GPT-4 is smart enough to figure out that the table does not have the column, and it returns: The provided schema does not contain any information about the eye color or shape of the cats. Therefore, it is not possible to create a SQL query to answer this question. But since this is not an SQL statement, I get the error:

OperationalError: (sqlite3.OperationalError) near "The": syntax error
[SQL: The provided schema does not contain any information about the eye color or shape of the cats. Therefore, it is not possible to create a SQL query to answer this question.]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

since NLSQLTableQueryEngine tries to execute the response as a query. Does NLSQLTableQueryEngine have any nice way to handle a non-SQL response from the model?
hmmm, yea it doesn't really handle that well at the moment 😅 Could definitely make a PR to do that better (i.e. return the raw response? detect "non-sql" vs. "broken-sql" responses?)
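In the meantime, one workaround is to sanity-check the model's response yourself before handing it to the database, e.g. treat anything that doesn't start with a SQL verb as a natural-language refusal. This is a hypothetical guard, not a llama_index API (`looks_like_sql` and `safe_execute` are invented names), sketched with stdlib sqlite3:

```python
import sqlite3

SQL_VERBS = ("select", "with", "insert", "update", "delete")

def looks_like_sql(response: str) -> bool:
    """Crude check: real SQL starts with a SQL verb; a refusal like
    'The provided schema does not contain...' does not."""
    return response.strip().lower().startswith(SQL_VERBS)

def safe_execute(conn, response):
    if not looks_like_sql(response):
        # Surface the model's explanation instead of a sqlite syntax error.
        return f"(no SQL generated) {response.strip()}"
    return conn.execute(response).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cat_breeds (name TEXT, size TEXT)")
conn.execute("INSERT INTO cat_breeds VALUES ('Maine Coon', 'Very Large')")

print(safe_execute(conn, "SELECT name, size FROM cat_breeds"))
print(safe_execute(conn, "The provided schema does not contain eye color information."))
```

This catches the "non-sql" case from this thread; distinguishing "broken-sql" would still need a try/except around the execute call.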
Oh yes, that would be better!
@Logan M As I'm exploring agents, I'm not sure I understand the main difference between these three examples: https://gpt-index.readthedocs.io/en/latest/examples/agent/openai_agent_with_query_engine.html, https://gpt-index.readthedocs.io/en/latest/examples/agent/openai_agent_query_plan.html, and https://gpt-index.readthedocs.io/en/latest/examples/agent/react_agent_with_query_engine.html When would I use each of them, and how do I decide which one is better when I have different sources and I want to perform different queries and tasks over them?
  1. Just an example of an openai agent with query engines. It shows how to connect existing query engines to an agent. You can of course also create any function as a tool and pass it to the agent
  2. A small example of getting the agent to plan which tools/query engines to use and in which order. More of a toy example for now imo
  3. Same as 1, but with a ReAct loop. I would only use this if you aren't using an OpenAI LLM. The OpenAIAgent uses the function-calling API rather than a ReAct loop, and the function-calling API is much more reliable
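The reliability gap between 1 and 3 is the same structured-vs-text issue as with the selectors: a function-calling agent receives the tool name and arguments as structured data, while a ReAct agent must scrape "Action:" and "Action Input:" lines out of plain text. A toy, stdlib-only comparison (the parser and sample outputs are invented for illustration):

```python
import json
import re

def parse_react_action(text):
    """ReAct path: scrape the tool name and JSON input out of free text."""
    m = re.search(r"Action:\s*(\w+)\s*\nAction Input:\s*(\{.*\})", text)
    if m is None:
        raise ValueError("could not parse a ReAct action from the output")
    return m.group(1), json.loads(m.group(2))

def parse_function_call(call):
    """Function-calling path: the API already gives name + JSON arguments."""
    return call["name"], json.loads(call["arguments"])

# Structured tool call from the function-calling API:
print(parse_function_call({"name": "sql_tool", "arguments": '{"input": "cat size"}'}))

# A well-formed ReAct trace parses fine...
trace = 'Thought: I need the SQL tool.\nAction: sql_tool\nAction Input: {"input": "cat size"}'
print(parse_react_action(trace))

# ...but a slightly off-format trace does not:
try:
    parse_react_action("Action - sql_tool, input: cat size")
except ValueError as e:
    print(e)
```

A weaker LLM drifts off the expected text format far more often than it produces a malformed function call, which is why the ReAct agent is mainly worth it for non-OpenAI models.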
I'm not quite sure how to use QueryPlanTool. Do you have or know of any in-depth examples where I could learn how to use it and learn what its benefits are?
Tbh I'm not even sure how to use it 😅 I think it's a bit of a beta feature. It probably only works well with gpt-4
That's the only docs page on it (besides reading the source code)