Updated 2 years ago

llamaindex for big numerical csv data

What’s the best way to ingest big CSV data (10 MB) into LlamaIndex? I tried it, but it hallucinates a bit: the responses make up numbers that don’t exist, and it gets counts wrong (e.g., the number of transactions with amount 190.00).
When I try to predict a trend by city, the answer is somewhat correct. It says Jakarta has the biggest transactions, Surabaya is second, and Bandung comes next.

But I’m worried that it might give misleading results once I actually use it at my company.
@Senna have you tried putting it in a SQL database or a pandas index? Chunking it up and putting it in a vector DB typically isn't a good idea.
Can you tell me more about that? I know how a dataframe or pandas index works, but how does it work with LlamaIndex (since I have to pay attention to token limits)?
Yeah, check out our SQL guide and pandas index demo:

https://gpt-index.readthedocs.io/en/latest/guides/tutorials/sql_guide.html

https://gpt-index.readthedocs.io/en/latest/examples/index_structs/struct_indices/PandasIndexDemo.html

The high-level idea is that you first load the CSV into a SQL database or dataframe (not using an LLM, just via code), and then you can use our text-to-SQL or pandas functionality.
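A minimal sketch of that first, non-LLM step: load the CSV into SQLite with pandas, so exact aggregations (like the "transactions with amount 190.00" count from the question) run as deterministic SQL instead of being guessed by the model. The column names (`city`, `amount`) and table name are illustrative assumptions, not from the original thread; the LlamaIndex text-to-SQL layer from the linked guides would then be pointed at this database.

```python
import sqlite3
import pandas as pd

# Illustrative stand-in for the real file; in practice you'd do:
#   df = pd.read_csv("transactions.csv")
df = pd.DataFrame({
    "city": ["Jakarta", "Surabaya", "Bandung", "Jakarta"],
    "amount": [190.00, 250.00, 190.00, 75.50],
})

# Load the dataframe into a SQL database -- plain code, no LLM involved.
conn = sqlite3.connect(":memory:")
df.to_sql("transactions", conn, index=False, if_exists="replace")

# The kind of count the LLM hallucinated is now an exact query:
count = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE amount = 190.00"
).fetchone()[0]
print(count)  # 2
```

Because the database answers the numeric part exactly, the token limit also stops being a problem: the LLM only ever sees the question, the schema, and the query result, never the whole 10 MB file.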
So LlamaIndex is going to figure out the query for me depending on the prompt, right?
Thanks btw, will read it! If this works well, this LlamaIndex thing is the future 🌎