
Updated last year


Hi, I've created an index of more than 100 PDF files. Now I want to search the PDFs by file name, but the QueryEngine does not pull information from a file by its name.
One more question: I've created 5 indexes. How can we query a specific index? For example, if I ask a question and the answer is inside "index_1", how can we avoid generating extra sub-questions?
@Logan M no answer here
🤷‍♂️

Yeah, by default the query engine is not going to work per file name. You need to use a vector db that supports metadata filtering (and have the filename in the metadata).
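To make the metadata-filtering idea concrete, here is a library-agnostic sketch (not LlamaIndex's API): each chunk carries a `file_name` in its metadata, candidates are first restricted to one file by exact match, and only then ranked. `Chunk`, `overlap_score` and `filtered_search` are hypothetical names, and word overlap stands in for embedding similarity; a vector store with metadata filtering would perform the filter step natively.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical stand-in for an indexed chunk with metadata.
    text: str
    metadata: dict = field(default_factory=dict)


def overlap_score(query: str, text: str) -> int:
    # Toy relevance score: number of shared lowercase words.
    # A real vector store would use embedding similarity instead.
    return len(set(query.lower().split()) & set(text.lower().split()))


def filtered_search(chunks, query, file_name, top_k=2):
    # 1) Metadata filter: keep only chunks whose file_name matches exactly.
    candidates = [c for c in chunks if c.metadata.get("file_name") == file_name]
    # 2) Rank the survivors by relevance and keep the top_k.
    candidates.sort(key=lambda c: overlap_score(query, c.text), reverse=True)
    return candidates[:top_k]


chunks = [
    Chunk("The address is 12 Oak Street", {"file_name": "533.pdf"}),
    Chunk("The address is 9 Elm Road", {"file_name": "774.pdf"}),
    Chunk("Built in 1998, three bedrooms", {"file_name": "533.pdf"}),
]

hits = filtered_search(chunks, "what is the address", "533.pdf", top_k=1)
print(hits[0].text)  # The address is 12 Oak Street
```

Because the filter runs before ranking, chunks from other files can never crowd out the requested one, even when their text is nearly identical.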

Maybe a router query engine is better than a sub-question engine for your use case? Otherwise, the sub-indexes need better descriptions to help with routing.
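A rough sketch of why a router avoids extra sub-questions, under a simplified model: the router sends the query to the single tool whose description matches best, while a sub-question engine may, in the worst case, generate a question for every tool. The word-overlap selector and the tool names and descriptions are hypothetical; the real `RouterQueryEngine` relies on an LLM-based selector reading the descriptions, which is why good descriptions matter.

```python
def overlap(a: str, b: str) -> int:
    # Toy similarity: count of shared lowercase words.
    return len(set(a.lower().split()) & set(b.lower().split()))


def route(query: str, tools: dict) -> str:
    # Router-style selection: pick the single tool whose description best
    # matches the query, so only one engine is queried.
    return max(tools, key=lambda name: overlap(query, tools[name]))


def fan_out(query: str, tools: dict) -> list:
    # Worst-case sub-question behaviour (simplified): every tool gets a
    # sub-question, which is where the extra LLM calls come from.
    return list(tools)


tools = {
    "index_1": "property policies rules and lease terms",
    "index_2": "property inventory furniture and appliances",
}
query = "what are the lease terms"
print(route(query, tools))    # index_1
print(fan_out(query, tools))  # ['index_1', 'index_2']
```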
So, actually there are several engines: 6 CSV engines, and one of them is a PDF engine.
We have to query the CSV engines and the PDF engine separately.
Could we add the filename to the metadata inside the docstore?
@Logan M
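A minimal sketch of stamping the filename onto documents before indexing, assuming each loaded document exposes a mutable `metadata` dict (LlamaIndex `Document` objects do). `Doc` is a hypothetical stand-in class, and the `property_id` key, taken from the leading number of the file name, is an illustrative convention rather than anything the loaders add themselves.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    # Hypothetical stand-in for a loaded document; not a library class.
    text: str
    metadata: dict = field(default_factory=dict)


def tag_with_file_name(docs, path: Path):
    # Record the source file's name, plus the leading number of its stem
    # (hypothetical "property_id" key), on every document from that file.
    for doc in docs:
        doc.metadata["file_name"] = path.name
        doc.metadata["property_id"] = path.stem.split()[0]
    return docs


docs = tag_with_file_name([Doc("page one"), Doc("page two")], Path("533.pdf"))
print(docs[0].metadata)  # {'file_name': '533.pdf', 'property_id': '533'}
```

With the filename stored this way, a store that supports metadata filtering can restrict a query to a single file.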
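from llama_index.query_engine import RouterQueryEngine
from llama_index.selectors.pydantic_selectors import PydanticSingleSelector

# policies_store and information_store must be QueryEngineTool objects,
# each wrapping one index's query engine plus a description.
s_engine = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(),
    query_engine_tools=[
        policies_store,
        information_store,
    ],
)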
This also doesn't pull the information based on the filename.
Some files with this name are pulled, but most of them are not.
Could you please give an example, @Logan M? I'm new to learning LlamaIndex 😔
So is each store here from a specific file? The only way to reliably pull information per filename is to use either metadata filtering or a router engine like you have.

With the router engine, each tool should have a good description, which should probably also mention the filename if that's important.
Let me explain:
there is a combination of PDF and CSV files. The PDF files contain general information about each property, and the CSV files contain the property details.
I've created an index for each CSV file and, on the other side, a single index for all the PDF files.
Now we want to search by property number,
something like "What is the address of property 533?" Here 533 is the name of the file.
Yeah, that's going to be tough. What kind of index did you use for the CSV files? Very specific keywords like that won't work well with vector search.
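One possible workaround, offered as an alternative rather than something suggested in the thread, is to keep vector search out of the routing step: pull the property number out of the question with a regular expression and look up the matching engine exactly. The regex convention and the engine names (plain strings here, standing in for query engines) are hypothetical.

```python
import re


def extract_property_id(query: str):
    # Hypothetical convention: a property number is a standalone run of
    # digits, matching the leading number in names like "774 Home Inventory.csv".
    match = re.search(r"\b(\d+)\b", query)
    return match.group(1) if match else None


def pick_engine(query: str, engines: dict, default: str):
    # Exact lookup by property number; fall back to the general engine
    # when no known number is mentioned (engines.get(None, ...) -> default).
    return engines.get(extract_property_id(query), default)


engines = {"533": "engine_533", "774": "engine_774"}
print(pick_engine("What is the address of 533 property?", engines, "pdf_engine"))
# engine_533
```

The exact-match step handles the identifier, and the chosen engine can still use vector search for the rest of the question.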
VectorStoreIndex for the CSV files, and the same for the PDF files.
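from pathlib import Path

from llama_index import VectorStoreIndex, download_loader

# service_context is assumed to be defined earlier (e.g. ServiceContext.from_defaults()).
PagedCSVReader = download_loader("PagedCSVReader")
loader = PagedCSVReader(encoding="utf-8")

# Load the CSV (PagedCSVReader yields one document per row) and index it.
property_inventory_docs = loader.load_data(file=Path("774 Home Inventory.csv"))
property_inventory_index = VectorStoreIndex.from_documents(
    property_inventory_docs, service_context=service_context
)

# Persist the index to a directory named after the source file.
storage_context = property_inventory_index.storage_context
storage_context.persist("774 Home Inventory")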


This is my code to load the CSV data and create the store.
Please also advise which loader I can use for CSV. @Logan M
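For reference, a paged CSV reader treats each row as its own small document. The sketch below approximates that behaviour with the standard `csv` module (it is not the `PagedCSVReader` source): each row is rendered as `column: value` lines and tagged with its source `file_name`. The function name and the sample data are hypothetical.

```python
import csv
import io


def rows_to_pages(csv_text: str, file_name: str):
    # One "page" per CSV row, rendered as "column: value" lines so each row
    # becomes its own small chunk; file_name is kept alongside as metadata.
    reader = csv.DictReader(io.StringIO(csv_text))
    pages = []
    for row in reader:
        text = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
        pages.append({"text": text, "metadata": {"file_name": file_name}})
    return pages


sample = "Item,Room,Qty\nSofa,Living room,1\nChair,Kitchen,4\n"
pages = rows_to_pages(sample, "774 Home Inventory.csv")
print(pages[1]["text"])
# Item: Chair
# Room: Kitchen
# Qty: 4
```

Row-sized chunks keep each record's fields together, so a retrieved chunk usually contains a complete answer for one item.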
@Logan M still got no solution.
The paged CSV loader is probably the best one to use, but you might need to increase the top k:

property_inventory_index.as_query_engine(similarity_top_k=10)
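A toy illustration of why raising `similarity_top_k` can help, using made-up similarity scores: when several near-duplicate rows outrank the one that actually answers the question, a small `k` cuts it off and a larger `k` lets it through to the LLM. The scores, texts, and the `top_k` helper are hypothetical.

```python
def top_k(scored, k):
    # Keep the texts of the k highest-scoring (score, text) pairs.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]


# Hypothetical similarity scores: the row that actually answers the
# question ("Property 533 ... address") ranks last, below near-duplicates.
scored = [
    (0.82, "Property 774: sofa, living room"),
    (0.81, "Property 775: sofa, bedroom"),
    (0.80, "Property 776: chair, kitchen"),
    (0.79, "Property 533: address 12 Oak Street"),
]
answer = "Property 533: address 12 Oak Street"
print(answer in top_k(scored, 2))   # False
print(answer in top_k(scored, 10))  # True
```

The trade-off is that a larger `k` sends more context to the LLM per query, which costs more tokens.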

I don't have a solution for you, it just takes a lot of knobs to turn and things to try 🤷‍♂️
I greatly appreciate your assistance, @Logan M. By increasing the value of similarity_top_k, I have been able to get more accurate responses from the SubQuestionQueryEngine. Your response has been immensely helpful to me. Thank you!
By the way, sorry to bother you 🙃