Updated 2 months ago

Csv

To get started with llama-index, I've been trying to analyze a CSV/Excel file (as a first step, locally on my laptop using ollama due to company regulations). However, the results are rather discouraging (see below). Any hints on how to improve them? Am I missing something fundamental, or is it just the model size (8B)? Thanks!

from pathlib import Path

from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.readers.file import CSVReader

reader = CSVReader()
documents = reader.load_data(file=Path('groceries.csv')) # real file ~5MB
# Simplified example for demonstration purposes:
# Product;Items
# Bananas;3
# Apples;4
# Chocolate bars;2
# Cucumbers;1
# Gummy bears;10
# Carrots;5

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Settings.llm = Ollama(model="llama3.1", request_timeout=360.0)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("...")
print(response)

# Questions:
# How many rows does the table have? => Completely wrong answer for real file (10 instead of 3000)
# What is the total number of grocery items? => Completely wrong answer for real file (by a factor of 2000)
# Show all products that are sweet. => Some are missing, others are wrongly listed for real file.
#   In total, far fewer entries (3-4) than expected and actually matching (>10)
2 comments
CSVs usually do not work well with a vector index.

What it's doing is embedding each row and returning only the top k matches to the LLM. Questions that need the whole table (row counts, totals, complete lists) can't be answered well from a handful of retrieved rows.
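
To make the top-k limitation concrete, here is a toy sketch in plain Python. It does not use llama-index: the word-overlap `score` function is only a stand-in for embedding similarity, and `k = 2` is an assumed value for illustration. The point is simply that the LLM can only reason over the rows the retriever hands it.

```python
# Toy illustration: top-k retrieval passes only k rows to the LLM,
# so it cannot count or sum over the whole table.
rows = [
    ("Bananas", 3), ("Apples", 4), ("Chocolate bars", 2),
    ("Cucumbers", 1), ("Gummy bears", 10), ("Carrots", 5),
]


def score(query: str, text: str) -> int:
    """Stand-in for embedding similarity: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, k: int = 2):
    """Return the k rows whose text best matches the query."""
    ranked = sorted(rows, key=lambda r: score(query, f"{r[0]} {r[1]}"), reverse=True)
    return ranked[:k]


context = retrieve("What is the total number of grocery items?")
visible_total = sum(items for _, items in context)
true_total = sum(items for _, items in rows)

print(len(context), "of", len(rows), "rows reach the LLM")
print("sum over retrieved rows:", visible_total, "| true total:", true_total)
```

However good the model is, a sum or count computed from the retrieved context can only cover those k rows, which would be consistent with the wrong totals reported in the question.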

Probably best to put the CSV into a SQLite or DuckDB database for text-to-SQL, or use a pandas query engine.
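
As a rough sketch of the database route, the snippet below loads the sample data from the question into an in-memory SQLite table using only the Python standard library. The table name `groceries`, the column names, and the two SQL statements are assumptions for illustration; in a text-to-SQL setup the LLM would generate queries like these from the natural-language question, whereas here they are written by hand.

```python
import csv
import io
import sqlite3

# Sample data from the question; a real file would be read with open(...).
CSV_TEXT = """Product;Items
Bananas;3
Apples;4
Chocolate bars;2
Cucumbers;1
Gummy bears;10
Carrots;5
"""

# Hypothetical table and column names, chosen for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE groceries (product TEXT, items INTEGER)")

# The file is semicolon-delimited, so the delimiter is set explicitly.
reader = csv.DictReader(io.StringIO(CSV_TEXT), delimiter=";")
conn.executemany(
    "INSERT INTO groceries (product, items) VALUES (?, ?)",
    [(row["Product"], int(row["Items"])) for row in reader],
)

# Hand-written queries of the kind a text-to-SQL step might produce.
row_count = conn.execute("SELECT COUNT(*) FROM groceries").fetchone()[0]
total_items = conn.execute("SELECT SUM(items) FROM groceries").fetchone()[0]

print("rows:", row_count)
print("total items:", total_items)
```

Counts and totals are computed by the database over every row, so they do not depend on what fits into the LLM's context. A question like "which products are sweet" still needs semantic judgment, so it would likely require an extra step, such as letting the LLM classify the distinct product names.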
@charly-napf what did you end up going with?