Hi everyone, I’m working with CSV files and exploring the best way to generate and save embeddings for them. I noticed that PagedCSVReader creates one embedding per row, which can be time-consuming for large files.
Could you recommend a more efficient approach to generate embeddings while maintaining accuracy for Retrieval-Augmented Generation (RAG)? I’m looking for something that balances embedding granularity and performance, especially for structured tabular data.
If this doesn't fit your needs, you can check out PandasAI too. It's specifically designed for querying your own data: https://docs.pandas-ai.com/intro
True, but if you use PandasAI or the pandas query engine, you don't have to create embeddings for this.
What happens there is that these two tools take the head of the CSV (its first few rows), form a pandas query based on your question, and apply it to the pandas DataFrame. The answer is then based on the result of that query.
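To make that flow concrete, here is a rough sketch of the idea in plain pandas. It is not how PandasAI or the pandas query engine are actually implemented. The names build_prompt, fake_llm and answer, and the sample CSV, are made up for illustration, and fake_llm is a stand-in that returns a canned expression where a real model call would go.

```python
import io

import pandas as pd

# Made-up sample data, only for illustration.
CSV_TEXT = """city,year,sales
Berlin,2023,120
Paris,2023,95
Berlin,2024,150
Paris,2024,110
"""


def build_prompt(df, question, n=3):
    """Describe the table using only its dtypes and first n rows."""
    return (
        "Columns and dtypes:\n" + df.dtypes.to_string() + "\n\n"
        "First rows:\n" + df.head(n).to_string(index=False) + "\n\n"
        "Question: " + question + "\n"
        "Reply with a single pandas expression over `df`."
    )


def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call; returns a canned pandas expression."""
    return "df[df['city'] == 'Berlin']['sales'].sum()"


def answer(df, question):
    """Turn the question into a pandas expression and run it on the full DataFrame."""
    expr = fake_llm(build_prompt(df, question))
    # eval on model output is unsafe in general; a real tool should sandbox
    # or validate the generated code. Builtins are disabled here as a minimum.
    return eval(expr, {"__builtins__": {}}, {"df": df, "pd": pd})


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = answer(df, "What are the total sales for Berlin?")
print(result)  # 270
```

No embeddings are computed anywhere: the model only sees the column types and the first few rows, and the generated expression runs against the full DataFrame locally.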
You also have the option not to expose your own data. In that case PandasAI creates sample data based on the head and uses it to provide answers.