What is the best way to index Excel or CSV files for a question-answering bot?
Currently I convert the Excel files to CSV, load the data with SimpleDirectoryReader, and index it with GPTSimpleVectorIndex. But that sometimes gives unreliable answers.
Hi @Erik, re what @yoelk was mentioning, you may want to try setting chunk_size_limit on the vector store index. By default the chunks are quite big.
index = GPTSimpleVectorIndex(documents, ..., chunk_size_limit=512)
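To see why a smaller chunk_size_limit can help, here is a simplified sketch of fixed-size chunking. It is an illustration only, assuming whitespace-separated words as a rough stand-in for tokens; LlamaIndex's own text splitter works on tokens and may differ in the details. The function name chunk_words is hypothetical.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size words, with optional overlap.

    Words approximate tokens here; this is NOT LlamaIndex's actual splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


rows = [f"Name: item{i}, Price: {i * 10}" for i in range(6)]
text = "\n".join(rows)
print(len(chunk_words(text, 512)))  # one big chunk holding every row
print(len(chunk_words(text, 8)))    # several small chunks, two rows each
```

With a large limit, all rows end up in a single chunk, so one embedding has to represent the whole table and retrieval cannot single out the relevant rows. Smaller chunks give each embedding a narrower slice of the data.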
How would I go about finding the right chunk size? Do I just need to experiment to see what works best, or are there best practices to follow? @jerryjliu0 @yoelk
@Erik Looking at your data, what I would do is create one document per row that includes the column names. Each row would look something like this: CName1:value, CName2:value...
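A minimal sketch of the one-document-per-row idea using only Python's standard csv module. It assumes a well-formed CSV with a header row; the function name rows_to_texts is hypothetical. How each string is then wrapped in a document object for the index depends on the llama_index version, so that step is left out.

```python
import csv
from typing import Iterable


def rows_to_texts(lines: Iterable[str]) -> list[str]:
    """Turn each CSV data row into one 'Column:value, Column:value' string."""
    reader = csv.DictReader(lines)  # the first line supplies the column names
    texts = []
    for row in reader:
        # Keep each column name next to its value so every row is self-describing.
        texts.append(", ".join(f"{col}:{val}" for col, val in row.items()))
    return texts


# Usage (the file name is a placeholder):
# with open("data.csv", newline="") as f:
#     texts = rows_to_texts(f)
```

Because every string carries its own column names, a retrieved row still makes sense on its own, without the header row that would otherwise live in a different chunk.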