llamaindex for big numerical csv data

At a glance

A community member is having trouble ingesting a large (10MB) CSV file into LlamaIndex: the responses include made-up numbers and incorrect counts. Another community member suggests loading the data into a SQL database or a pandas dataframe first, because chunking tabular data and putting it into a vector database is typically not a good approach.

The thread then covers how to use LlamaIndex with SQL or pandas, with links to the LlamaIndex SQL guide and the pandas index demo. The high-level idea is to load the CSV into a SQL database or dataframe with ordinary code (no LLM involved), and then use LlamaIndex's text-to-SQL or pandas functionality to query the data.

The original poster is optimistic that this approach could work for their use case.

Senna

What’s the best way to ingest big CSV data (10MB) into llamaindex? I tried it, but it hallucinates a bit. The responses make up numbers that don’t exist and get counts wrong (e.g. the number of transactions with amount 190.00).

8 comments

Senna

When I ask it for a trend by city, the answer is somewhat correct. It says Jakarta has the most transactions, Surabaya is second, and Bandung comes next.

But I’m worried that it might give misleading results once I really use it for my company.

jerryjliu0

@Senna have you tried putting it in a SQL database or pandas index? chunking it up and putting it in a vector db typically isn't a good idea

Senna

Can you tell me more about it? I know how dataframes and pandas work, but how is that going to work with llamaindex (since I have to pay attention to token limits)?

Senna

@jerryjliu0

Senna

oh this one https://gpt-index.readthedocs.io/en/latest/guides/tutorials/sql_guide.html

jerryjliu0

yeah check out our SQL guide and pandas index demo:

https://gpt-index.readthedocs.io/en/latest/guides/tutorials/sql_guide.html

https://gpt-index.readthedocs.io/en/latest/examples/index_structs/struct_indices/PandasIndexDemo.html

high-level idea is that you first load CSV into a SQL database or dataframe (not using an LLM, just via code), and then you can use our text-to-sql or pandas functionality
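A minimal sketch of the "load via code" step described above, using only Python's standard library (`csv` and `sqlite3`). The sample rows, the `transactions` table, the column names `city` and `amount`, and the helper `load_csv_into_sqlite` are all hypothetical stand-ins for the real 10MB file. The SQL at the end is written by hand to show the kind of query a text-to-SQL layer would be expected to produce; it does not call llamaindex itself.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the real CSV; column names are assumptions.
SAMPLE_CSV = """city,amount
Jakarta,190.00
Jakarta,250.50
Surabaya,190.00
Bandung,75.25
Jakarta,190.00
"""


def load_csv_into_sqlite(csv_file, conn):
    """Load rows from a CSV file object into a `transactions` table (plain code, no LLM)."""
    conn.execute("CREATE TABLE transactions (city TEXT, amount REAL)")
    reader = csv.DictReader(csv_file)
    rows = [(r["city"], float(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
# For a real file, pass open("your_file.csv", newline="") instead of the StringIO.
n_loaded = load_csv_into_sqlite(io.StringIO(SAMPLE_CSV), conn)

# Hand-written stand-in for the SQL that would answer
# "how many transactions have amount 190.00?"
count_190 = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE amount = ?", (190.0,)
).fetchone()[0]

# Number of transactions per city, largest first (ties broken by city name).
by_city = conn.execute(
    "SELECT city, COUNT(*) AS n FROM transactions "
    "GROUP BY city ORDER BY n DESC, city"
).fetchall()

print(n_loaded, count_190, by_city)
```

Because the database does the counting, the result is exact regardless of file size, and only the table schema and the question need to fit in the prompt rather than the rows themselves. For real monetary data, storing amounts as integer cents instead of `REAL` may avoid floating-point comparison surprises.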

Senna

so llamaindex is going to figure out the query for me depending on the prompt, right?

Senna

Thanks btw will read it! if this works well, this llamaindex thing is the future 🌎
