I am aware of three options for querying structured data such as a CSV:
- Use a LlamaHub Loader. This will chunk something like a CSV into documents. This injects sequential structure into the data, which in my experience does not work very well.
- TextToSql - the query gets converted to SQL and executed, possibly with an Agent to handle errors and re-execute
- TextToPandas - same as above, but generates pandas code and executes it over your data
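To make the TextToSql idea concrete, here is a minimal sketch of the generate, execute, retry-on-error loop. It uses only the standard library; `fake_llm`, `load_csv_into_sqlite`, `text_to_sql`, and the sample data are hypothetical names made up for illustration, and in practice `fake_llm` would be replaced by a real model call that receives the schema and the previous error message.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for your CSV file.
CSV_TEXT = """name,city,amount
alice,Paris,30
bob,Berlin,45
carol,Paris,25
"""


def load_csv_into_sqlite(csv_text):
    """Load the CSV text into an in-memory SQLite table called `data`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["name"], r["city"], int(r["amount"])) for r in reader]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (name TEXT, city TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    return conn


def fake_llm(question, schema, previous_error=None):
    """Hypothetical stand-in for an LLM call that returns SQL.

    The first attempt deliberately uses a wrong column name so the
    retry path below is exercised.
    """
    if previous_error is None:
        return "SELECT SUM(total) FROM data WHERE city = 'Paris'"
    return "SELECT SUM(amount) FROM data WHERE city = 'Paris'"


def text_to_sql(conn, question, schema, generate_sql=fake_llm, max_attempts=3):
    """Generate SQL, run it, and feed any database error back to the generator."""
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, schema, error)
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)  # passed to the next generation attempt
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")


conn = load_csv_into_sqlite(CSV_TEXT)
schema = "data(name TEXT, city TEXT, amount INTEGER)"
sql, rows = text_to_sql(conn, "What is the total amount for Paris?", schema)
print(sql)
print(rows)
```

The point of the retry loop is that the database itself tells you when the generated query is wrong, which is the part an Agent would automate.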
To your original questions, I have not seen an embedding model that is able to convert CSV rows into meaningful text (or at least one that gives better downstream results than TextToSql). Embedding models (in NLP) are trained on sequential data (text), so they inject sequential structure into the data, which I find is not that useful for most downstream applications. Long story short, take a look at the TextToSql or TextToPandas paradigm as a start.
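For the pandas flavour, a rough sketch looks like the following. It assumes pandas is installed; `fake_llm_pandas`, `text_to_pandas`, and the sample data are hypothetical, with `fake_llm_pandas` standing in for a model that returns a single pandas expression over a DataFrame named `df`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for your CSV file.
CSV_TEXT = """name,city,amount
alice,Paris,30
bob,Berlin,45
carol,Paris,25
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))


def fake_llm_pandas(question, columns):
    """Hypothetical stand-in for an LLM that returns a pandas expression."""
    return "df.loc[df['city'] == 'Paris', 'amount'].sum()"


def text_to_pandas(frame, question, generate_code=fake_llm_pandas):
    """Ask the generator for a pandas expression and evaluate it on the frame."""
    code = generate_code(question, list(frame.columns))
    # Evaluating model output is unsafe outside a real sandbox; removing
    # builtins here is only a minimal precaution, not actual isolation.
    result = eval(code, {"__builtins__": {}}, {"df": frame})
    return code, result


code, result = text_to_pandas(df, "What is the total amount for Paris?")
print(code)
print(int(result))
```

Compared with SQL, this skips the database step, but you are executing generated Python, so it needs more care around where and how the code runs.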