LlamaIndex


Updated 5 months ago

Loading multiple indices from recursive folder with parent path


At a glance

The community members are discussing how to handle multiple indexes in a recursive folder structure. The initial post asks whether all the indexes will be merged when using the load_indices_from_storage() function, and the comments clarify that the indexes are not merged automatically. Instead, according to one reply, the function returns something like a dictionary of indexes keyed by index ID.

The community members suggest several approaches for dealing with multiple indexes, such as:
- Merging the nodes from different files first, then building the index
- Customizing the SimpleDirectoryReader to maintain the hierarchy when indexing
- Using a high-level router to navigate between separate indexes for different sets of documents
- Indexing the most relevant files separately

There is no explicitly marked answer, but the community members provide guidance on best practices for handling multiple indexes in a hierarchical file structure.


Hi guys, if I have a recursive folder with indexes saved in it and I use these lines with the parent folder path, should I expect to see all the indexes merged into one?

# load multiple indices
indices = load_indices_from_storage(storage_context)  # loads all indices
indices = load_indices_from_storage(
    storage_context, index_ids=[index_id1, ...]
)  # loads specific indices

@Logan M


18 comments

indexes do not get merged 🤔 Not sure where you might have read that

It's on the LlamaIndex page, see the screenshot.

Attachment

It says "load multiple indices", and that's what I meant by merge!

But @Logan M, in general, what do you suggest as a best practice when you have multiple files with a kind of hierarchy and you want to index them? Can you point me to the LlamaIndex strategy for this case?

I notice all of the indexes should be in the same directory though! Perhaps I should just set the index IDs to different names, to be able to load them.

I think maybe the best approach is to first merge all my nodes coming from different files (I am calling LlamaParse to create the nodes), then apply the transformations, and eventually build an index out of them!

I noticed that if I merge the nodes first and then try to use transformations like

transformations = [
    SentenceSplitter(),
    TitleExtractor(nodes=5),
    QuestionsAnsweredExtractor(questions=2),
    SummaryExtractor(summaries=["prev", "self"]),
    # KeywordExtractor(keywords=10),
]

then, because I allow overlap between the nodes, the title and summary extractors come up with the wrong document titles and summaries!

Also, almost all the LlamaParse examples cover just one document. Do we have an example for multiple documents?

Or should I use https://docs.llamaindex.ai/en/stable/examples/agent/multi_document_agents/? @WhiteFang_Jr and @Logan M

Yeah, it picks up all the stored indices, but it won't merge them together.
It would be like a dict of indices with the key being the index_id.
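A minimal sketch of working with the result, not taken from the thread. Depending on the LlamaIndex version, load_indices_from_storage may hand back a plain list rather than a dict, so the hypothetical helper by_index_id below accepts either shape and builds the lookup by index ID. The SimpleNamespace objects are stand-ins for real indices, which are assumed to expose an index_id attribute.

```python
from types import SimpleNamespace


def by_index_id(indices):
    """Return a dict mapping index_id -> index.

    Hypothetical helper: accepts either a list of index objects or an
    existing dict, because the exact return shape is an assumption here.
    """
    if isinstance(indices, dict):
        return dict(indices)
    return {index.index_id: index for index in indices}


# Stand-ins for what load_indices_from_storage(storage_context) might return.
loaded = [
    SimpleNamespace(index_id="contracts"),
    SimpleNamespace(index_id="manuals"),
]

lookup = by_index_id(loaded)
manuals_index = lookup["manuals"]  # each index stays separate, nothing is merged
```

Each entry can then be turned into its own query or chat engine; nothing in this step combines the underlying nodes.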

Does this mean we can add this to the query engine? If not, can we build a custom index by loading and merging the indexes ourselves?

When you say merging them does that mean combining everything together and creating a single index?

Not sure why you would do that, though!

If you have different indexes for some reason, you can still use the Router query engine to route the user query to the desired query_engine.
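To make the routing idea concrete, here is a deliberately simplified stand-in, not the Router query engine itself. The real router selects among query engines based on their descriptions (typically with an LLM); this sketch swaps that for plain word overlap so it runs with the standard library only. The names route, tokenize, and DESCRIPTIONS, and the example descriptions, are all hypothetical.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route(query: str, descriptions: dict[str, str]) -> str:
    """Pick the index whose description shares the most words with the query."""
    query_words = tokenize(query)
    scores = {
        index_id: len(query_words & tokenize(description))
        for index_id, description in descriptions.items()
    }
    # On a tie, max() returns the first index_id in insertion order.
    return max(scores, key=scores.get)


# One description per separately stored index (hypothetical examples).
DESCRIPTIONS = {
    "contracts": "legal contracts, clauses, termination and liability terms",
    "manuals": "product manuals, installation steps and troubleshooting",
}

chosen = route("Which clauses cover termination?", DESCRIPTIONS)
```

The chosen ID would then select which index's engine answers the question, which is why each index needs a description that sets it apart from the others.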

So I have multiple documents with a kind of hierarchy to them. Until now I was trying different techniques separately, using only one document set at a time. Now I want to have a chat engine over all of them, and I was hoping that with one index and advanced techniques the user could query all the documents.

For me it's not a query engine, it's a chat engine.

So in my case, if I have separate indexes, each responsible for a set of PDF documents, is it better to maintain the indexes separately and use a high-level router to navigate to the correct index?

So basically I have different files, and I was thinking the best way is to index the more relevant ones separately!

Indexing won't take hierarchy order into account. In the end it keeps everything at the same level, as all the nodes are kept together in the same dict.

So if, let's say, you have a folder with subfolders and you want some sort of hierarchy maintained, I would suggest you customize the SimpleDirectoryReader for the recursive call and update the metadata based on that.
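One way this suggestion could look, as a sketch under assumptions rather than the approach the replier had in mind: a small function that derives folder information from each file path, so the hierarchy survives as flat metadata on every document. The function name make_hierarchy_metadata and the metadata keys are hypothetical; only the standard library is used.

```python
from pathlib import PurePath


def make_hierarchy_metadata(root: str):
    """Build a callable that maps a file path to flat hierarchy metadata."""
    root_path = PurePath(root)

    def file_metadata(file_path: str) -> dict:
        path = PurePath(file_path)
        try:
            rel = path.relative_to(root_path)
        except ValueError:
            # The file is not under root (e.g. absolute vs relative paths):
            # fall back to the path as given.
            rel = path
        folders = list(rel.parts[:-1])
        return {
            "file_name": rel.name,
            "relative_path": rel.as_posix(),
            "folder_path": "/".join(folders),  # e.g. "contracts/2023"
            "depth": len(folders),             # 0 for files directly in root
            "top_level_folder": folders[0] if folders else "",
        }

    return file_metadata


metadata_fn = make_hierarchy_metadata("docs")
example = metadata_fn("docs/contracts/2023/a.pdf")
```

With LlamaIndex installed, a callable like this could be passed as SimpleDirectoryReader(input_dir="docs", recursive=True, file_metadata=metadata_fn). The reader may hand over absolute paths, in which case the root should be given in the same form. The metadata can then be used for filtering or shown alongside retrieved nodes.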

Nice, that looks best. I also don't think I need the advanced query engine routing for the first version.
