Let the LLM decide: have it return a dict that includes both a topic key and the user query string.
You parse the dict, throw the query string at the index search, then filter by topic metadata (if the vector DB supports this natively).
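A minimal sketch of that flow, assuming the LLM is prompted to answer with JSON containing "topic" and "query" keys; the prompt wording here is made up, `llm` is assumed to be any LlamaIndex LLM, and `vector_index` an existing VectorStoreIndex whose nodes carry a "topic" metadata field:

import json

from llama_index.core.vector_stores import MetadataFilters

# Hypothetical routing prompt; adjust the topic list to your data.
ROUTER_PROMPT = (
    "Classify the question into one topic out of "
    "[sales, labor, technical support] and return a JSON object with "
    '"topic" and "query" keys.\n\nQuestion: {question}'
)

def route_and_search(question: str) -> str:
    # Ask the LLM for a dict holding both the inferred topic and the raw query
    raw = llm.complete(ROUTER_PROMPT.format(question=question)).text
    parsed = json.loads(raw)

    # Filter the index by the inferred topic, then search with the query string
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            [{"key": "topic", "value": parsed["topic"]}]
        ),
    )
    return str(query_engine.query(parsed["query"]))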
Another approach with LlamaIndex is to use tools and MetadataFilters:
In this approach, you define a vector_query tool and pass it the actual user query string and the topic (inferred by the LLM) as arguments.
Something like:

from llama_index.core.tools import FunctionTool
from llama_index.core.vector_stores import MetadataFilters


def vector_query(query: str, topic: str) -> str:
    """Perform a vector search filtered by topic. Try to
    infer the topic category from the user query.

    query (str): the user's question.
    topic (str): filter by topic. Topic can be
        sales, labor, or technical support.
    """
    metadata_dicts = [{"key": "topic", "value": topic}]
    # vector_index is assumed to be an existing VectorStoreIndex
    # whose nodes carry a "topic" metadata field
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(metadata_dicts),
    )
    return str(query_engine.query(query))


vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query,
)
Then pass vector_query_tool to the LLM agent as a tool.
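A sketch of that wiring, assuming a ReAct-style agent and an OpenAI model (both are just one choice; any LlamaIndex agent that accepts tools works, and the model name is illustrative):

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

agent = ReActAgent.from_tools(
    [vector_query_tool],
    llm=OpenAI(model="gpt-4o-mini"),
    verbose=True,
)
response = agent.chat("My payroll deduction looks wrong, what should I do?")
print(response)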
You can also augment the LLM prompt by explicitly asking it to infer the topic (if you don't want to rely only on the function description, as shown above, which is passed to the LLM prompt).
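For example, one way to do that is to prepend an instruction to the question before handing it to the agent built above (the wording is just an illustration):

# Uses the `agent` created above; the instruction text is an assumption,
# tune it to your own topic taxonomy.
user_question = "My invoice total looks wrong, who do I talk to?"
augmented = (
    "First infer the topic of the question as one of "
    "[sales, labor, technical support], then answer it by calling "
    f"vector_tool with that topic.\n\nQuestion: {user_question}"
)
response = agent.chat(augmented)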