Hi! I ran into an issue today and I'm not sure how to handle it. I created three vector stores for my different kinds of files: 1 - Policies Index, 2 - Property Information Index, 3 - Property Invoice Index. After that I created query_engine_tools to query the different indexes. The property invoice files contain table data, and the query engine does not return the expected results. Sometimes it says there is not enough information and sometimes it gives wrong information. It didn't give a response about the quantity of different items in a property. I used VectorStoreIndex for all my indexes. Please suggest which index should be used for table data and how to query it in a better way. Thanks
If we have multiple file types, like flat PDFs, docs files, and also CSV and XLSX, what would be the best way to make the chat effective and get accurate answers?
I mean, you could repeat the table extraction process for each document.
But tbh, like I said, this is still an area for improvement. Other loaders like unstructured or deepdoctection offer some table support as well, but no one is doing it perfectly yet.
Most likely you'll need to invest in some custom loading process.
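To make the "custom loading process" idea a bit more concrete, here is a rough sketch of the first step: deciding per file whether it goes through table-aware handling or plain text handling. The extension groups, the `pick_pipeline` helper, and the file names are all made up for illustration and are not part of any library.

```python
from pathlib import Path

# Hypothetical grouping: which extensions get table-aware handling.
TABLE_EXTENSIONS = {".csv", ".xlsx"}
TEXT_EXTENSIONS = {".pdf", ".docx", ".txt"}


def pick_pipeline(filename):
    """Return which (hypothetical) loading pipeline a file should go through."""
    suffix = Path(filename).suffix.lower()
    if suffix in TABLE_EXTENSIONS:
        return "table"
    if suffix in TEXT_EXTENSIONS:
        return "text"
    return "unknown"


# Made-up file names, just to show the routing.
files = ["policy.pdf", "property_info.docx", "invoice_march.xlsx", "items.CSV"]
routing = {name: pick_pipeline(name) for name in files}
print(routing)
```

Note that routing by extension alone won't catch tables embedded in PDFs (like the invoices above); those would still need a table extraction step before they can be treated as table data.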
No idea tbh. Probably parsing the tables into their own nested pandas query engines, like that example shows. It's a tough problem. Either that, or parsing the tables into a format an LLM understands more easily (e.g. every row becomes a document).
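For the "every row becomes a document" option, a minimal sketch using only the Python standard library could look like the following. The `rows_to_documents` helper, the dict layout, and the sample invoice data are assumptions for illustration; in practice you would wrap each snippet in whatever document object your indexing library expects.

```python
import csv
import io


def rows_to_documents(csv_text, source_name):
    """Turn every table row into a small self-describing text snippet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader, start=1):
        # Repeat the column name next to each value so a row still makes
        # sense when it is retrieved without the rest of the table.
        text = "; ".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"text": text, "metadata": {"source": source_name, "row": row_number}}
        )
    return documents


# Made-up invoice data, just for illustration.
sample_csv = "item,quantity,unit_price\nChair,4,25.00\nTable,1,120.00\n"
docs = rows_to_documents(sample_csv, "invoice_001.csv")
for doc in docs:
    print(doc["text"])
```

The point of repeating the headers in each row is that a question like "how many chairs are in the property?" can match a single retrieved row, instead of depending on the whole table surviving chunking intact.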