That is a nice idea. I tried that. Here are my findings (for the record):
- The pandas index worked well for very specific questions, for example "How long does product x take to deliver?"
- The pandas index worked less well for more general questions: "What does this company sell?" -> here the index just returned the title of each product. To the question "What type of products does this company sell?" it responded with a list of column names.
- Using a vector index over the table worked better for more general questions, but it still did not do a great job. I think the problem is the "chunking" process: it created index documents that each contained roughly 2.5 lines of the table, so chunk boundaries fell in the middle of rows. As a result, subsequent products were attributed characteristics of the previous product. I think the real problem here is not the index, but rather the data structure.
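
One possible way around the chunking issue is to build one document per table row and repeat the column names in each, so that no chunk can mix two products. The snippet below is only a minimal sketch of that idea using the standard library. The table, its column names (`title`, `category`, `delivery_days`) and the helper `rows_to_documents` are made up for illustration and are not tied to any particular index library.

```python
import csv
import io

# Hypothetical product table; column names and values are invented for this example.
TABLE_CSV = """title,category,delivery_days
Widget A,Hardware,3
Widget B,Hardware,5
Gadget C,Electronics,2
"""


def rows_to_documents(csv_text: str) -> list[str]:
    """Return one text document per table row, repeating the column names in each."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        # "column: value" pairs keep every document self-describing,
        # so a chunk never depends on the header or on a neighbouring row.
        documents.append("; ".join(f"{column}: {value}" for column, value in row.items()))
    return documents


documents = rows_to_documents(TABLE_CSV)
for doc in documents:
    print(doc)
```

Each resulting string could then be indexed as its own document instead of letting a text splitter cut the raw table at arbitrary positions. Whether this also helps with the more general questions would still need to be tested.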