
Updated 2 years ago

Indexing csv for Q&A

What is the best way to index Excel or CSV files for a question-answering bot?

Currently I convert the Excel files to CSV, load the data with SimpleDirectoryReader, and index it with GPTSimpleVectorIndex. But that sometimes gives unreliable answers.
11 comments
Are all your columns potential answers? Also, you might want to change the default chunk size, which might be too big for this.
No, the idea is to take an Excel file containing financial data and ask questions about it, so there is no set format.
Can you share what a typical line looks like? Have you tried creating a separate document per line?
No, I have not tried it. How would I do that?
Hi @Erik, re what @yoelk was mentioning, you may want to try setting chunk_size_limit in the vector store index. By default the chunk sizes are quite big.

index = GPTSimpleVectorIndex(documents, ..., chunk_size_limit=512)
How would I go about finding the right chunk size? Do I just need to test it out to find what works best, or are there some best practices to follow? @jerryjliu0 @yoelk
@Erik Looking at your data, what I would do is create a document per row that includes the column names. Each line would look something like this:
CName1:value, CName2:value...
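A minimal sketch of this row-per-document format, using only Python's standard csv module. The function name rows_to_documents and the sample data are hypothetical, and skipping empty cells is an assumption rather than something suggested in the thread:

```python
import csv
import io


def rows_to_documents(csv_text):
    """Turn each CSV data row into one 'Column:value, Column:value' string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        # Skip empty cells (assumption) and any extra cells without a column name.
        parts = [
            f"{name}:{value}"
            for name, value in row.items()
            if name is not None and value
        ]
        if parts:
            documents.append(", ".join(parts))
    return documents


# Hypothetical sample data, for illustration only.
sample = "Year,Revenue,Profit\n2021,1000,150\n2022,1200,\n"
docs = rows_to_documents(sample)
for doc in docs:
    print(doc)
```

Each resulting string can then be wrapped in whatever document type your index expects, so that every embedded entry corresponds to exactly one row.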
Yeah, @yoelk's suggestion works well in this case, if you want to explicitly make sure that each embedded entry corresponds to one row.
What if there are multiple tables on a sheet? @yoelk @jerryjliu0
You can still use a vector per line
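One way this could work for a sheet with several tables, sketched under two assumptions that are not from the thread: tables are separated by blank rows, and each table starts with its own header row. The function name sheet_to_row_documents and the sample sheet are hypothetical:

```python
import csv
import io


def sheet_to_row_documents(csv_text):
    """Emit one 'Column:value' string per data row, table by table.

    Assumes tables are separated by blank rows and that the first
    non-blank row of each table is its header.
    """
    documents = []
    header = None
    for cells in csv.reader(io.StringIO(csv_text)):
        if not any(cell.strip() for cell in cells):
            # Blank row: the next non-blank row starts a new table.
            header = None
            continue
        if header is None:
            header = [cell.strip() for cell in cells]
            continue
        pairs = [
            f"{name}:{value.strip()}"
            for name, value in zip(header, cells)
            if name and value.strip()
        ]
        if pairs:
            documents.append(", ".join(pairs))
    return documents


# Hypothetical sheet with three small tables, for illustration only.
sheet = "Year,Revenue\n2021,1000\n\nAsset,Value\nCash,500\n,,\nQuarter,Cost\nQ1,70\n"
table_docs = sheet_to_row_documents(sheet)
```

Because every row carries the column names of its own table, rows from different tables stay distinguishable after embedding.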