To get started with llama-index, I've been trying to analyze a CSV/Excel file (as a first step, locally on my laptop using Ollama, due to company regulations). However, the results are rather discouraging (see below). Any hints on how to improve them? Am I missing something fundamental, or is it just the model size (8B)? Thx!
from pathlib import Path
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.readers.file import CSVReader
reader = CSVReader()
documents = reader.load_data(file=Path('groceries.csv')) # real file ~5MB
# Simplified example for demonstration purposes:
# Product;Items
# Bananas;3
# Apples;4
# Chocolate bars;2
# Cucumbers;1
# Gummy bears;10
# Carrots;5
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Settings.llm = Ollama(model="llama3.1", request_timeout=360.0)
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("...")  # each of the questions below was asked in turn
print(response)
# Questions and results:
# How many rows does the table have? => Completely wrong answer for the real file (10 instead of 3000)
# What is the total number of grocery items? => Completely wrong answer for the real file (off by a factor of ~2000)
# Show all products that are sweet. => For the real file, some matching products are missing
#   and others are wrongly listed; far fewer entries returned (3-4) than actually match (>10)
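For reference, here is how I verified the expected answers outside of llama-index, using plain pandas on the simplified example data (the real file is read the same way from disk; numbers below refer to the six-row example, not the real ~3000-row file):

```python
import io
import pandas as pd

# The simplified example from above, inlined for a self-contained check.
# For the real file: df = pd.read_csv("groceries.csv", sep=";")
csv_text = """Product;Items
Bananas;3
Apples;4
Chocolate bars;2
Cucumbers;1
Gummy bears;10
Carrots;5
"""
df = pd.read_csv(io.StringIO(csv_text), sep=";")

print(len(df))            # row count -> 6
print(df["Items"].sum())  # total number of items -> 25
```

So the ground truth is trivially computable; the vector-index pipeline just doesn't reproduce it.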