Updated 5 months ago

So I would be happy if someone could help.

I am trying to add some CSV data to a VectorStoreIndex so I can query it with questions like "What is the CodeName for a given Code?".

Using SimpleDirectoryReader, I gave it a CSV with 100 rows and 2 columns, Code and CodeName. Then I created the index like this:
index = VectorStoreIndex.from_documents
About 50% of its answers for the given Codes were wrong.

So I gave it only 50 rows, and it answered everything perfectly. What is the limitation?


As I don't know why, I tried splitting the CSV into 2 files of 50 rows each and loading them with the following code:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

data = SimpleDirectoryReader(input_dir="./diagnozy_semicol_noclear_0-50_50-100/").load_data(show_progress=True)
index = VectorStoreIndex.from_documents(data)

It completely forgot the first 50 rows but knew rows 51-100 perfectly. What is happening? How do I teach it more than a few rows?

Thank you so much, I am completely lost.
4 comments
I would suggest you use PagedCSVReader. It will create better nodes for your CSV records and thus improve your results.
https://llamahub.ai/l/readers/llama-index-readers-file?from=
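To illustrate the idea of "better nodes", here is a rough sketch using only the Python standard library, not LlamaIndex itself: each CSV row becomes its own small text record instead of the whole file being one long document. The `rows_to_records` helper, the sample data, and the semicolon delimiter are assumptions made up for this example, and the exact text that PagedCSVReader produces may differ.

```python
import csv
import io


def rows_to_records(csv_text, delimiter=";"):
    """Turn each CSV row into one standalone 'column: value' text record.

    Hypothetical helper for illustration only; not part of LlamaIndex.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [
        "\n".join(f"{column}: {value}" for column, value in row.items())
        for row in reader
    ]


# Made-up sample data with the two columns mentioned in the question.
sample = "Code;CodeName\nA01;Alpha\nB02;Beta\n"
records = rows_to_records(sample)

for record in records:
    print(record)
    print("---")
```

With one record per row, a Code and its CodeName always stay together in the same piece of text, so a retrieved piece is never half of one row and half of another.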
Thank you! So I tried:
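from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.readers.file import PagedCSVReader

parser = PagedCSVReader()
file_extractor = {".csv": parser}  # Map other file extensions to readers here as needed
data = SimpleDirectoryReader(
    "./diagnozy_semicol_noclear_100/", file_extractor=file_extractor
).load_data()
index = VectorStoreIndex.from_documents(data)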

and again, it has no idea. And the thing is, I need to teach it an SQL table with 40,000 rows. Why does it work on the first 50 rows but not on 100? I first tried filling it with the DB result directly, and it messed up. So I saved the SQL query result to a CSV with a limit of 100 rows and gave it to llama, and it messed up big time. Then I tried 50 rows, and it is absolutely perfect. What is the difference?
CSV items are scattered, so textual generation based on queries may not work properly.

For example, if you ask "give me top 5 states", it may or may not be able to answer this correctly.

If your operation is purely CSV based, try https://docs.pandas-ai.com/intro
No, the CSV is just a workaround for the SQL table I need to teach it. I just need to teach it a list of our internal codes and names, and then ask whether any of them show up in a given document.
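For that specific goal (checking which known codes appear in a document), one possible alternative to a vector index is a plain exact-match lookup. The following is only a minimal sketch under assumptions: the `load_codes` and `find_codes` helpers, the sample data, and the semicolon delimiter are made up for illustration, and it assumes codes appear in the document exactly as they are written in the table.

```python
import csv
import io
import re


def load_codes(csv_text, delimiter=";"):
    """Build a Code -> CodeName dictionary from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return {row["Code"].strip(): row["CodeName"].strip() for row in reader}


def find_codes(document_text, codes):
    """Return the known codes that occur as whole tokens in the document."""
    found = {}
    for code, name in codes.items():
        # Lookarounds stop "A01" from matching inside a longer token like "XA010".
        pattern = r"(?<!\w)" + re.escape(code) + r"(?!\w)"
        if re.search(pattern, document_text):
            found[code] = name
    return found


# Made-up sample data; a real table would come from the SQL query or CSV export.
codes = load_codes("Code;CodeName\nA01;Alpha\nB02;Beta\nC03;Gamma\n")
found = find_codes("Patient diagnosed with A01, follow-up for C03.", codes)
print(found)
```

This runs one regular expression search per code, which is simple rather than fast. It does not depend on how many rows a retriever returns, so the size of the table does not affect whether a code is found.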