The community member is having issues ingesting a large 10MB CSV file into llamaindex, as the responses seem to include made-up numbers and incorrect counts. Another community member suggests trying to load the data into a SQL database or Pandas dataframe first, rather than directly into llamaindex, as chunking the data and putting it into a vector database may not be the best approach.
The community members discuss how to use llamaindex with SQL or Pandas, and a link is provided to the llamaindex documentation on SQL integration. The high-level idea is to first load the CSV data into a SQL database or Pandas dataframe, and then use llamaindex's text-to-SQL or Pandas functionality to interact with the data.
The community members seem optimistic that this approach could work well and that llamaindex could be the future for their use case.
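For reference, a minimal sketch of the Pandas route the summary describes, assuming a recent LlamaIndex release (v0.10+) where PandasQueryEngine lives in the separate llama-index-experimental package, and an OpenAI key configured via the OPENAI_API_KEY environment variable; the file name transactions.csv is a placeholder:

```python
import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine

# Load the CSV with plain pandas -- no LLM is involved in ingestion.
df = pd.read_csv("transactions.csv")  # hypothetical file name

# The engine asks the LLM to translate a natural-language question
# into a pandas expression, executes it against df, and returns the
# actual computed result rather than an LLM-generated guess.
query_engine = PandasQueryEngine(df=df, verbose=True)

response = query_engine.query("How many transactions have an amount of 190.00?")
print(response)
```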
What’s the best way to ingest big CSV data (10MB) into llamaindex? I tried it, but it hallucinates a bit: the responses make up numbers that don’t exist and get counts wrong (e.g., the number of transactions with an amount of 190.00).
When I try to predict a trend by city, the answer is somewhat correct: it says Jakarta has the biggest transaction volume, Surabaya is second, and Bandung comes next.
But I’m worried that it might give misleading results once I actually use it at my company.
Will you tell me more about it? I know how a dataframe or Pandas index works, but how is that going to work with llamaindex (since I have to pay attention to token limits)?
The high-level idea is that you first load the CSV into a SQL database or dataframe (not using an LLM, just via code), and then you can use our text-to-SQL or Pandas functionality.
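A minimal sketch of the text-to-SQL route, again assuming LlamaIndex v0.10+ (where SQLDatabase and NLSQLTableQueryEngine live under llama_index.core) and a default OpenAI LLM; the CSV file, database file, and table name are illustrative:

```python
import pandas as pd
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Step 1: load the CSV into SQLite with plain code -- no LLM here.
df = pd.read_csv("transactions.csv")  # hypothetical file/table names
engine = create_engine("sqlite:///transactions.db")
df.to_sql("transactions", engine, index=False, if_exists="replace")

# Step 2: point LlamaIndex at the table. Only the table schema is
# sent to the LLM, which writes a SQL query that the database itself
# then executes.
sql_database = SQLDatabase(engine, include_tables=["transactions"])
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["transactions"],
)

response = query_engine.query("How many transactions have an amount of 190.00?")
print(response)
```

This also addresses the token-limit concern: only the schema and the generated SQL enter the prompt, so the 10MB of rows never hit the context window, and counts and aggregations are computed by the database rather than guessed by the LLM.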