One thing I've been experimenting with, and it has worked well, is asking GPT to summarize each table (columns + rows). I usually store the textual summary as the retrievable text, add the extracted table (as CSV) to the metadata, and then feed that table (along with the summary) to the LLM after retrieval.
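A minimal sketch of that flow, in case it helps. The function names, the record layout (`text` plus `metadata["table_csv"]`), and the placeholder summarizer are all assumptions made for illustration; in practice `summarize` would be a call to whatever GPT client you use, and the record would go into your vector store.

```python
import csv
import io


def table_to_csv(columns, rows):
    """Serialize an extracted table to a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def build_record(columns, rows, summarize):
    """Build one retrievable record: the summary is the text that gets
    embedded, and the raw CSV travels along in the metadata."""
    table_csv = table_to_csv(columns, rows)
    summary = summarize(table_csv)  # hypothetical hook for the LLM call
    return {"text": summary, "metadata": {"table_csv": table_csv}}


def build_prompt(question, retrieved):
    """After retrieval, give the LLM both the summary and the full table."""
    parts = []
    for rec in retrieved:
        parts.append(
            "Summary:\n" + rec["text"]
            + "\n\nTable (CSV):\n" + rec["metadata"]["table_csv"]
        )
    context = "\n\n---\n\n".join(parts)
    return f"{context}\n\nQuestion: {question}"


def placeholder_summarize(table_csv):
    """Stand-in for the LLM summary so the sketch runs offline."""
    header, *body = table_csv.strip().split("\n")
    return f"Table with columns {header} and {len(body)} rows."


record = build_record(
    ["name", "qty"], [["apple", 3], ["pear", 5]], placeholder_summarize
)
prompt = build_prompt("How many pears are there?", [record])
```

The point of keeping the CSV in the metadata is that the summary only has to be good enough to be retrieved; the LLM still sees the exact cell values when it answers.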
Thanks @lucastonon. I'm trying to build a fuzzy matching process where I have internal data and need to match external data against it. I'm building a RAG pipeline that embeds each row of the internal database, using only the columns I need for matching. Then I embed a row of the corresponding columns from the external data, retrieve the 5 most similar internal rows, and use the LLM to reason about which one is the closest/best match.
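Roughly, the retrieval half of that could look like the sketch below. Note the big assumption: `embed` here is a toy character-trigram counter, swapped in only so the example runs without an API key; a real pipeline would call an embedding model instead. The column names, sample rows, and prompt wording are also made up for illustration.

```python
import math
from collections import Counter


def row_to_text(row, columns):
    """Flatten the matching columns of a row into one normalized string."""
    return " | ".join(
        f"{c}: {str(row.get(c, '')).strip().lower()}" for c in columns
    )


def embed(text, n=3):
    """Toy stand-in for an embedding model: character trigram counts."""
    padded = f"  {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(external_row, internal_rows, columns, k=5):
    """Return the k internal rows most similar to the external row."""
    query = embed(row_to_text(external_row, columns))
    scored = [
        (cosine(query, embed(row_to_text(r, columns))), r)
        for r in internal_rows
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_match_prompt(external_row, candidates, columns):
    """Ask the LLM to pick the best candidate, or none of them."""
    lines = [
        "External record:",
        row_to_text(external_row, columns),
        "",
        "Candidate internal records:",
    ]
    for i, (score, row) in enumerate(candidates, start=1):
        lines.append(f"{i}. {row_to_text(row, columns)} (similarity {score:.2f})")
    lines.append("")
    lines.append("Which candidate is the best match? Answer with its number, or 'none'.")
    return "\n".join(lines)


columns = ["name", "city"]
internal = [
    {"name": "Acme Corporation", "city": "Boston"},
    {"name": "Globex Inc", "city": "Springfield"},
    {"name": "Initech LLC", "city": "Austin"},
]
external = {"name": "ACME Corp.", "city": "boston"}

candidates = top_k(external, internal, columns, k=2)
match_prompt = build_match_prompt(external, candidates, columns)
```

Letting the LLM answer "none" is worth keeping, since the top 5 by similarity will always return something even when the external row has no true counterpart in the internal data.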