Hi I ran into a issue today and I m not


Hi! I ran into an issue today and I'm not sure how to handle it. I created three vector stores for my different kinds of files:
1 - Policies Index
2 - Property Information Index
3 - Property Invoice Index
After that, I created query_engine_tools to query the different indexes.
The property invoice files contain table data, and the query engine does not return the expected results. Sometimes it says there is not enough information, and sometimes it gives wrong information. It also did not answer questions about the quantity of different items in a property.
I used VectorStoreIndex for all of my indexes. Please suggest which index should be used for table-formatted data and how to query it more effectively. Thanks

8 comments

Logan M

Table data is still pretty tough to work with, especially when it's contained in larger PDF files

Ideally, you could pull the table out ahead of time (maybe using something like camelot?) and use a pandas or sql query engine for the tables.

There's a small example here
https://gpt-index.readthedocs.io/en/latest/examples/query_engine/pdf_tables/recursive_retriever.html
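
As a rough sketch of the "pull the table out and put a SQL layer on it" idea: the snippet below assumes the invoice rows have already been extracted from the PDF (that step is not shown), and uses Python's built-in `sqlite3` as a stand-in for whatever SQL query engine ends up on top. The table name, column names, and sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical rows standing in for a table already extracted from an invoice PDF.
# Columns: invoice id, item name, quantity, unit price (all values are invented).
rows = [
    ("INV-001", "Chair", 4, 25.0),
    ("INV-001", "Desk", 1, 120.0),
    ("INV-002", "Chair", 2, 25.0),
]


def build_invoice_db(rows):
    """Load the extracted rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoice_items ("
        "invoice_id TEXT, item TEXT, quantity INTEGER, unit_price REAL)"
    )
    conn.executemany("INSERT INTO invoice_items VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def quantity_per_item(conn):
    """Sum the quantity of each item across all invoices."""
    cur = conn.execute(
        "SELECT item, SUM(quantity) FROM invoice_items GROUP BY item ORDER BY item"
    )
    return dict(cur.fetchall())


conn = build_invoice_db(rows)
totals = quantity_per_item(conn)
print(totals)
```

The point is that a "how many of each item" question becomes an exact aggregation over rows instead of a similarity search over text chunks, which is where a plain vector index tends to struggle.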

Ahsan Mirza

@Logan M the example you shared covers a single PDF file. What should we do if there are multiple PDF files along with Excel sheets?

Ahsan Mirza

What if we have multiple file types, such as flat PDFs, docs files, and also CSV and XLSX?
What would be a better way to make the chat effective and get accurate answers?

Ahsan Mirza

@Logan M

Logan M

I mean, you could repeat the table extraction process for each document.

But tbh like I said, this is still an area for improvement. Other loaders like unstructured or deepdoctection offer some table support as well, but no one is doing it perfectly yet.

You'll likely need to invest in some kind of custom loading process.
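
One possible shape for such a custom loading process, sketched under assumptions: route each file to an extraction step based on its extension, so the table extraction is repeated per document. The extension mapping and the step labels below are hypothetical placeholders, not real loader names; each label would be swapped for whichever extractor is actually used.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the extraction step to run.
# The labels are placeholders, not functions from any real library.
EXTRACTORS = {
    ".pdf": "pdf_table_extraction",
    ".docx": "docx_table_extraction",
    ".csv": "tabular_load",
    ".xlsx": "tabular_load",
}


def plan_extraction(paths):
    """Group file paths by the extraction step they should go through."""
    plan = {}
    for p in paths:
        # Unknown extensions fall back to ordinary text loading.
        step = EXTRACTORS.get(Path(p).suffix.lower(), "plain_text")
        plan.setdefault(step, []).append(p)
    return plan


plan = plan_extraction(
    ["policy.pdf", "invoice_01.xlsx", "invoice_02.CSV", "notes.txt"]
)
print(plan)
```

Files that are already tabular (CSV, XLSX) can skip the fragile PDF table detection entirely and go straight to a pandas or SQL style engine.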

Ahsan Mirza

What would be a better way to achieve accurate results?

Logan M

No idea tbh. Probably parsing the tables into their own nested pandas query engines like that example shows. It's a tough problem. Either that or parsing the tables into a format that's easier for an LLM to understand (like every row becomes a document)
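
A minimal sketch of the "every row becomes a document" option, using only the standard library. It assumes the table is already available as CSV text; the column names, sample values, and the `invoice_001.csv` file name are invented for illustration. Each resulting string could then be indexed as its own document.

```python
import csv
import io

# Hypothetical CSV export of one invoice table (values are invented).
csv_text = """item,quantity,unit_price
Chair,4,25.00
Desk,1,120.00
"""


def rows_to_documents(csv_text, source):
    """Turn each table row into one self-describing text snippet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for i, row in enumerate(reader, start=1):
        # Repeat the column name next to every value so the row still
        # makes sense when it is retrieved without the table header.
        body = "; ".join(f"{col}: {val}" for col, val in row.items())
        docs.append(f"Source: {source}, row {i}. {body}")
    return docs


docs = rows_to_documents(csv_text, "invoice_001.csv")
for d in docs:
    print(d)
```

Keeping the header names inside every row is the design choice that matters here: a retrieved chunk like "quantity: 4" is meaningful on its own, whereas a bare "4" from a flattened table is not.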

Ahsan Mirza

thanks
