Hi I ve created an index of more than


Hi, I've created an index of more than 100 PDF files. Now I want to search for a PDF file by its file name, but the query engine does not pull information from a file based on its name.
One more question: I've created 5 indexes. How can we query a specific index? For example, if I ask a question and the answer is inside "index_1", how can we avoid generating extra sub-questions?

18 comments

Ahsan Mirza

@Logan M no answer here

Logan M

🤷‍♂️

Yeah, by default the query engine is not going to work per file name. You need to use a vector DB that supports metadata filtering (and have the filename in the metadata).

Maybe a router query engine is better than a sub question engine for your use case? Otherwise the sub indexes need better descriptions to help with routing
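
To make the metadata-filtering idea concrete, here is a minimal, library-agnostic sketch. It is not LlamaIndex code: `Chunk`, `overlap_score`, and `search` are hypothetical names, the word-overlap score is only a stand-in for real embedding similarity, and the sample texts are made up. The point is just the order of operations: restrict candidates by the `file_name` stored in metadata first, then rank what is left.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of text plus metadata such as the source file name."""
    text: str
    metadata: dict = field(default_factory=dict)


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of lowercase words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def search(chunks, query, file_name=None, top_k=2):
    """Rank chunks by the toy score, optionally restricted to one file."""
    # Step 1: metadata filter (exact match on the stored file name).
    candidates = [
        c for c in chunks
        if file_name is None or c.metadata.get("file_name") == file_name
    ]
    # Step 2: similarity ranking over the remaining candidates only.
    ranked = sorted(
        candidates, key=lambda c: overlap_score(query, c.text), reverse=True
    )
    return ranked[:top_k]


# Made-up sample data, for illustration only.
chunks = [
    Chunk("The address is 12 Oak Street", {"file_name": "533.pdf"}),
    Chunk("The address is 98 Pine Avenue", {"file_name": "774.pdf"}),
    Chunk("Roof was replaced in 2019", {"file_name": "533.pdf"}),
]

hits = search(chunks, "what is the address", file_name="533.pdf", top_k=1)
print(hits[0].text)
```

Without the `file_name` argument, both address chunks score the same and the search cannot tell the two properties apart, which is roughly the behaviour described in the question.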

Ahsan Mirza

Actually, there are several different engines: 6 CSV engines, and one PDF engine.
We have to query the CSV engines and the PDF engine separately.

Ahsan Mirza

Could we add the filename to the metadata inside the docstore?

Ahsan Mirza

@Logan M
s_engine = RouterQueryEngine(
    selector=PydanticSingleSelector.from_defaults(),
    query_engine_tools=[
        policies_store,
        information_store,
    ],
)
This also does not pull the information based on the filename.

Ahsan Mirza

Some files with this name are pulled but most of them are not.

Ahsan Mirza

@Logan M

Ahsan Mirza

Could you please give an example, @Logan M? I'm new to LlamaIndex 😔

Logan M

So each store here is from a specific file? The only way to reliably pull information per filename is to either use metadata filtering, or a router engine like you have.

With the router engine, each tool should have a good description, that probably also mentions the filename if that's important
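
A rough way to see why the tool description matters: the sketch below swaps the LLM-based selector for a trivial token-overlap scorer, so it is only an analogy for how a router picks a tool, not how `RouterQueryEngine` is implemented. The tool names, the descriptions, and `pick_tool` are all hypothetical.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_tool(question, tools):
    """Return the tool name whose description shares the most tokens with the question.

    Stand-in for an LLM selector: a real router reads the descriptions and
    chooses, but it can only use what the descriptions actually say.
    """
    q = tokens(question)
    return max(tools, key=lambda name: len(q & tokens(tools[name])))


# Hypothetical descriptions that say nothing about which file is behind each tool.
vague = {
    "pdf_general_info": "Useful for answering questions about documents",
    "csv_property_533": "Useful for answering questions about data",
}

# Hypothetical descriptions that name the property number and the source file.
specific = {
    "pdf_general_info": "General policy documents in PDF form",
    "csv_property_533": "Inventory details for property 533, loaded from 533.csv",
}

question = "What is the address of 533 property?"
print(pick_tool(question, specific))
```

With the `vague` descriptions neither tool shares anything with the question, so the choice is arbitrary; with the `specific` ones the number 533 appears in the description and gives the selector something to match on.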

Ahsan Mirza

Let me explain.
There is a combination of PDF and CSV files: the PDF files contain general information about a property, and the CSV files contain the details of the property.
I've created an index for each CSV, and a single index for all the PDF files.
Now we want to search by property number,
something like "What is the address of 533 property?" Here 533 is the name of the file.

Ahsan Mirza

@Logan M

Logan M

Yeah, that's going to be tough. What kind of index did you use for the CSV files? Very specific keywords like that won't work well with vector searches.
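
Since an identifier like "533" carries little meaning for an embedding, one possible workaround is to handle it with plain exact matching before any vector search runs. The sketch below is an assumption-laden illustration, not something proposed in this thread: `extract_ids` and `files_for_question` are hypothetical helpers, and the file names other than "774 Home Inventory.csv" are invented to follow the same pattern.

```python
import re


def extract_ids(question):
    """Pull standalone digit runs (e.g. a property number) out of a question."""
    return re.findall(r"\b\d+\b", question)


def files_for_question(question, file_names):
    """Return file names containing any ID from the question as a whole number."""
    ids = extract_ids(question)
    return [
        name for name in file_names
        # The lookarounds stop "533" from matching inside "1533".
        if any(re.search(rf"(?<!\d){re.escape(i)}(?!\d)", name) for i in ids)
    ]


file_names = [
    "533 Home Inventory.csv",   # hypothetical, same pattern as the real file
    "774 Home Inventory.csv",
    "1533 Home Inventory.csv",  # hypothetical, shows the whole-number check
]

matches = files_for_question("What is the address of 533 property?", file_names)
print(matches)
```

The matched file name could then be used to choose which index to query, or as the value for a metadata filter.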

Ahsan Mirza

VectorStoreIndex for the CSV files, and the same for the PDF files.

Ahsan Mirza

from pathlib import Path
from llama_index import VectorStoreIndex, download_loader

# service_context is created earlier (not shown here)
PagedCSVReader = download_loader("PagedCSVReader")
loader = PagedCSVReader(encoding="utf-8")

property_inventory_docs = loader.load_data(file=Path("774 Home Inventory.csv"))
property_inventory_index = VectorStoreIndex.from_documents(
    property_inventory_docs, service_context=service_context
)

# Persist the index to disk so it can be reloaded later
storage_context = property_inventory_index.storage_context
storage_context.persist("774 Home Inventory")

This is my code for loading the CSV data and creating the store.
Please also advise which loader can be used for CSV files. @Logan M

Ahsan Mirza

@Logan M still got no solution.

Logan M

The paged CSV loader is probably the best one to use, but you might need to increase the top k:

property_inventory_index.as_query_engine(similarity_top_k=10)

I don't have a solution for you; there are just a lot of knobs to turn and things to try 🤷‍♂️

Ahsan Mirza

I greatly appreciate your assistance, @Logan M. By increasing the value of similarity_top_k, I have been able to obtain more accurate responses using the SubQuestionQueryEngine. Your response has been immensely helpful to me. Thank you!

Ahsan Mirza

By the way, sorry to bother you 🙃
