Updated 2 years ago

Query Types

I have a table (in CSV), and I would like users to be able to ask both general questions (e.g. "what different types of products does the company have?") and specific ones (e.g. "how much does product X cost?"). What would be the best index structure for that? I tried a TreeIndex, but that does not seem to work well. Any suggestions?
@Logan M Perhaps you can help? 😃
I think two different indexes, combined with a router engine, would make the most sense here

One index could be a pandas index (for structured queries like the second one) and the other a vector index (for general QA)

But, this is just a guess at the best approach here lol
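
To make the routing idea concrete, here is a rough, library-free sketch. Everything in it is an assumption for illustration: the sample table, the function names, and the keyword check that stands in for the selector. A real router engine would typically let an LLM choose between a pandas query engine and a vector index; this only shows the dispatch pattern.

```python
import csv
import io

# Hypothetical sample table; the thread does not show the real CSV.
CSV_TEXT = """product,category,price,delivery_days
Widget,Hardware,9.99,3
Gizmo,Hardware,24.50,5
Helpdesk Plan,Service,99.00,1
"""

ROWS = list(csv.DictReader(io.StringIO(CSV_TEXT)))


def structured_engine(question):
    """Stand-in for the pandas index: look up the row of the product named in the question."""
    q = question.lower()
    for row in ROWS:
        if row["product"].lower() in q:
            return (
                f"{row['product']}: price {row['price']}, "
                f"delivery {row['delivery_days']} days"
            )
    return "No matching product found."


def general_engine(question):
    """Stand-in for the vector index: answer from an overview of the whole table."""
    categories = sorted({row["category"] for row in ROWS})
    return "Product categories: " + ", ".join(categories)


def route(question):
    """Naive selector (assumption): questions that name a product go to the structured engine."""
    q = question.lower()
    names_product = any(row["product"].lower() in q for row in ROWS)
    engine = structured_engine if names_product else general_engine
    return engine(question)
```

Calling `route("How much does Gizmo cost?")` goes to the structured path, while `route("What types of products does the company have?")` falls through to the general one.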
That is a nice idea. I tried that. Here are my findings (for the record):
  • The pandas index worked well for very specific questions. Like, for example, "how long does product x take to deliver"?
  • The pandas index worked less well for more general questions: "What does this company sell?" -> here the index just returned the title of each product. To the question "What type of products does this company sell?" it responded with a list of column names.
  • Using a vector index over the table worked better for more general questions, but still was not able to do a stellar job. I think the problem is the "chunking" process. It made documents for the index that roughly contained 2.5 lines of the table. That led to subsequent products being attributed characteristics of the previous product. I think the real problem here is not the index, but rather the data structure.
@Logan M Something that I was not able to do is modify the prompt that is passed to the LLM behind the Pandas index. How does one do that? I tried creating a custom prompt (str) and then passing it to the PandasPrompt object, but I could not figure out how to pass that on to the query_engine. The documentation seems to suggest that passing the custom prompt to the index during construction is possible but deprecated. What is the real way of doing that?
I think you should be able to pass the custom prompt in the as_query_engine() call

as_query_engine(pandas_prompt=my_prompt)
Hmmm interesting. How do you know it's 2.5 lines? It should be parsing each line in the CSV into its own document/node 🤔
I took a look at the documents in the index's docstore. I am sure. The headings, for example, were parsed together with most of the first row of the table.
Try using a different CSV loader maybe. I like to use this one since it retains column information

https://llamahub.ai/l/file-paged_csv
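
For reference, the general idea of keeping column information with every row can be sketched with only the standard library. This is not the linked loader's code, and the function name `rows_to_documents` is made up for illustration. It assumes the CSV has a header row and turns each data row into one self-describing text document, so a chunk never mixes two products.

```python
import csv
import io


def rows_to_documents(csv_text):
    """Return one text document per CSV row, each value labelled with its column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        # "column: value" pairs keep the header with every row.
        documents.append(", ".join(f"{col}: {val}" for col, val in row.items()))
    return documents


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the thread.
    sample = "product,price\nWidget,9.99\nGizmo,24.50\n"
    for doc in rows_to_documents(sample):
        print(doc)
```

Each resulting string could then be wrapped in a document object and indexed on its own, which avoids the header being merged with the first row.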