RAG with tables.

At a glance

The community member is having trouble using RAG (Retrieval-Augmented Generation) on a Markdown table with six columns, the last two of which contain money values. Their setup uses the Llama 2 7B LLM ('llama2 7b llm'), 'HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")' for embeddings, and a service context chunk size of 1024, but the answers are incorrect: the model mixes up columns, cannot sum values by the ID column, and does not recognize the last row.

Another community member suggested using text2sql or the Pandas query engine instead of chunking the table, and shared links to LlamaIndex examples of the Pandas query engine and SQL indices. The original poster already loads the table with the MarkdownReader, but is unsure whether the model, the embedding model, or the chunk size could be causing the problem. The responder replied that, in their experiments, tables given directly to an LLM did not always produce correct answers.

There is no explicitly marked answer in the comments.


MarioZ

I need help with RAG. I've provided a Markdown table with six columns, the last two of which contain money values. My setup uses 'llama2 7b llm' and 'HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")', with a service context chunk size of 1024. Unfortunately, I didn't get any correct information: the model mixes up columns, can't sum according to the ID column, and doesn't recognize the last row. Any advice would be greatly appreciated.
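One plausible cause of the mixed-up columns and the missed last row (an assumption; nobody in the thread confirmed it): with a chunk size of 1024, a longer table can be split so that later rows land in a chunk without the header row, and retrieval may return only some of the rows. A workaround not discussed in the thread is to turn each row into a self-describing record before indexing. The sketch below uses plain Python; the table contents and column names are hypothetical.

```python
# A minimal sketch, assuming a simple pipe-delimited Markdown table with a
# header row and a |---| separator row. The data below is hypothetical.

TABLE = """
| id | date       | client | item   | net    | gross  |
|----|------------|--------|--------|--------|--------|
| 1  | 2023-10-01 | Acme   | Widget | 100.00 | 120.00 |
| 2  | 2023-10-02 | Beta   | Gadget | 50.00  | 60.00  |
| 1  | 2023-10-03 | Acme   | Bolt   | 25.50  | 30.75  |
"""


def split_cells(line: str) -> list[str]:
    """Split one table line into trimmed cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse the table into one dict per data row, keyed by column name."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = split_cells(lines[0])
    # lines[1] is the |----| separator row, so data rows start at lines[2].
    return [dict(zip(header, split_cells(line))) for line in lines[2:]]


def rows_to_records(rows: list[dict[str, str]]) -> list[str]:
    """Render each row as 'column: value' pairs so it makes sense on its own."""
    return ["; ".join(f"{col}: {val}" for col, val in row.items()) for row in rows]


records = rows_to_records(parse_markdown_table(TABLE))
for record in records:
    print(record)
```

Each record could then be indexed as its own document or node (one row per chunk), so any retrieved chunk carries its column names. This only helps retrieval, not arithmetic: summing by ID still depends on the LLM, which is why the replies below point at text2sql or the Pandas query engine.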

7 comments

ravitheja

@MarioZ For tables, it would be better to use text2sql rather than the chunking process.
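To illustrate why text2sql helps: once the table lives in a SQL database, summing by ID or finding the latest row is done by the database, and the LLM only has to write the query. The sketch below uses Python's built-in sqlite3 module with hypothetical data and column names; the SQL is the kind of query a text2sql engine would ideally generate, written by hand here with no LLM involved.

```python
import sqlite3

# Hypothetical rows standing in for the six-column Markdown table from the
# question; the table and column names are illustrative, not from the post.
ROWS = [
    (1, "2023-10-01", "Acme", "Widget", 100.00, 120.00),
    (2, "2023-10-02", "Beta", "Gadget", 50.00, 60.00),
    (1, "2023-10-03", "Acme", "Bolt", 25.50, 30.75),
]

conn = sqlite3.connect(":memory:")
# Money is stored as REAL to keep the sketch short; these sample values are
# exact in binary floating point, but real data is safer as integer cents.
conn.execute(
    "CREATE TABLE invoices (id INTEGER, date TEXT, client TEXT, item TEXT, "
    "net REAL, gross REAL)"
)
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Query for "What are the total net and gross amounts per ID?"
totals = conn.execute(
    "SELECT id, SUM(net), SUM(gross) FROM invoices GROUP BY id ORDER BY id"
).fetchall()

# Query for "What is the latest row?": here "latest" means the most recent
# date; ISO-8601 date strings sort correctly as text.
latest = conn.execute(
    "SELECT * FROM invoices ORDER BY date DESC LIMIT 1"
).fetchone()

print(totals)
print(latest)
conn.close()
```

The design point is that the arithmetic and row ordering are exact by construction; the remaining risk is only whether the LLM writes the right query, which is easier to check than a free-text answer.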

ravitheja

If it's a small table, you can also use the Pandas query engine: https://docs.llamaindex.ai/en/stable/examples/query_engine/pandas_query_engine.html

ravitheja

And an example of using a SQL index: https://docs.llamaindex.ai/en/stable/examples/index_structs/struct_indices/SQLIndexDemo.html

MarioZ

@ravitheja I already got almost the same advice on this server: use text2sql, or Markdown if the tables are smaller. So I use:
from llama_index import download_loader
MarkdownReader = download_loader("MarkdownReader")
But it's not a problem for me to use the Pandas query engine, as long as it works with LLMs other than OpenAI. I don't think the problem is in the loader, though. Do you think this model, embedding model, and chunk size could give me satisfactory results on the dataset I described?

ravitheja

Not sure. In my experiments, tables given directly to an LLM did not always produce the correct answer.
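If the table has to go to the LLM directly, one hedged workaround (not something suggested in the thread) is to do the arithmetic in code and put only the results into the prompt, so the model reads numbers instead of computing them. A minimal sketch using Python's decimal module; the rows and the prompt wording are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical (id, net, gross) rows already extracted from the Markdown table.
ROWS = [
    ("1", "100.00", "120.00"),
    ("2", "50.00", "60.00"),
    ("1", "25.50", "30.75"),
]


def totals_by_id(rows):
    """Sum net and gross per ID using Decimal, so money values add up exactly."""
    totals = defaultdict(lambda: [Decimal("0"), Decimal("0")])
    for row_id, net, gross in rows:
        totals[row_id][0] += Decimal(net)
        totals[row_id][1] += Decimal(gross)
    return dict(totals)


def build_context(rows):
    """Build a text block of precomputed facts to prepend to the LLM prompt."""
    lines = ["Totals per ID (computed in code):"]
    # Sort numerically so that ID "10" would come after ID "2".
    ordered = sorted(totals_by_id(rows).items(), key=lambda kv: int(kv[0]))
    for row_id, (net, gross) in ordered:
        lines.append(f"- ID {row_id}: net {net}, gross {gross}")
    last_id, last_net, last_gross = rows[-1]
    lines.append(f"Last row: ID {last_id}, net {last_net}, gross {last_gross}")
    return "\n".join(lines)


print(build_context(ROWS))
```

The resulting text would be added to the retrieved context or the system prompt; the model still phrases the answer but no longer has to add numbers across rows or guess which row is last.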

MarioZ

Hmm, I feel there isn't much information out there about structured numeric data, although some examples show an LLM reading financial data from PDFs. I'm a little confused by that. Thank you @ravitheja for sharing your thoughts.
