Indexes do not get merged. Not sure where you might have read that.
it's on the LlamaIndex page, see the screenshot
it says to load multiple indexes, and by merge that's what I meant!
but @Logan M, in general, what do you suggest as a best practice when you have multiple files with a kind of hierarchy and you want to index them? Can you point me to the LlamaIndex strategy for this case?
I noticed all of the indexes have to be in the same directory though! Perhaps just set the index id to different names, to be able to load them.
I think maybe the best approach is to first merge all my nodes (I am calling LlamaParse to create my nodes) coming from the different files, then apply the transformations, and eventually build an index out of them!
I noticed that if I merge the nodes first and then try to use transformations like transformations = [
    SentenceSplitter(),
    TitleExtractor(nodes=5),
    QuestionsAnsweredExtractor(questions=2),
    SummaryExtractor(summaries=["prev", "self"]),
    # KeywordExtractor(keywords=10),
], then, because the extractors look across neighbouring nodes, they come up with the wrong document title and summary!
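One possible way around the cross-document mix-up described above is to run the transformations once per source document and merge the results afterwards. The sketch below is only an illustration of that idea in plain Python: `transform_per_document`, `add_prev_text`, and the dict-shaped nodes are hypothetical stand-ins, not LlamaIndex API.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

Node = TypeVar("Node")


def transform_per_document(
    nodes: Iterable[Node],
    doc_key: Callable[[Node], Hashable],
    transform: Callable[[List[Node]], List[Node]],
) -> List[Node]:
    """Group nodes by source document, transform each group alone, then merge."""
    groups: Dict[Hashable, List[Node]] = defaultdict(list)
    for node in nodes:
        groups[doc_key(node)].append(node)

    merged: List[Node] = []
    # Dicts keep insertion order, so documents come out in first-seen order.
    for group in groups.values():
        merged.extend(transform(group))
    return merged


def add_prev_text(group: List[dict]) -> List[dict]:
    """Hypothetical stand-in for an extractor that reads the previous node."""
    out = []
    prev = None
    for node in group:
        out.append({**node, "prev": prev})
        prev = node["text"]
    return out


# Toy nodes from two files; a real node would be a LlamaIndex node object.
nodes = [
    {"doc": "a.pdf", "text": "a1"},
    {"doc": "a.pdf", "text": "a2"},
    {"doc": "b.pdf", "text": "b1"},
]
result = transform_per_document(nodes, lambda n: n["doc"], add_prev_text)
```

Here the first node of `b.pdf` gets no "previous" context from `a.pdf`. With real LlamaIndex nodes, `doc_key` could plausibly be `lambda n: n.ref_doc_id` and `transform` could wrap `IngestionPipeline(transformations=[...]).run(nodes=group)`, but that wiring is an assumption and is not tested here.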
and almost all the LlamaParse examples cover the case where we have just one document; do we have an example for multiple documents?
Yeah, it picks up all the stored indices, but it won't merge them together.
It would be like a dict of indices, with the key being the index_id.
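A minimal sketch of the "different index ids in one directory" idea, assuming a recent LlamaIndex release where `StorageContext` and `load_indices_from_storage` are importable from `llama_index.core`. As far as I know, `load_indices_from_storage` returns a list, so the hypothetical helper `key_by_index_id` builds the dict view mentioned above. The helper names and the `persist_dir` value are made up for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional


def persist_with_id(index: Any, index_id: str, persist_dir: str) -> None:
    """Give the index a distinct id, then persist it into the shared directory."""
    index.set_index_id(index_id)
    index.storage_context.persist(persist_dir=persist_dir)


def key_by_index_id(indices: Iterable[Any]) -> Dict[str, Any]:
    """Turn a list of loaded indices into {index_id: index}."""
    return {idx.index_id: idx for idx in indices}


def load_indices_as_dict(
    persist_dir: str, index_ids: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Load the stored indices (all of them if index_ids is None), keyed by id."""
    # Imported lazily so this module loads even without llama_index installed.
    from llama_index.core import StorageContext, load_indices_from_storage

    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    indices = load_indices_from_storage(storage_context, index_ids=index_ids)
    return key_by_index_id(indices)
```

Nothing here merges the indices: each one stays a separate object that can get its own query engine.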
does this mean we can add this to the query engine? If not, can we build a custom index by loading and merging the indexes ourselves?
When you say merging them, does that mean combining everything together and creating a single index?
Not sure why you would do that, though!
If you have different indexes for some reason, you can still use RouterQueryEngine to route the user query to the desired query_engine.
so I have multiple documents with a kind of hierarchy to them. Until now I was trying different techniques separately, each time using only one of them; now I want to have a chat engine over all of them, and I was hoping that with one index and advanced techniques the user could query all the documents.
for me it's not a query engine, it's a chat engine
so in my case, if I have separate indexes, each responsible for a set of PDF documents, is it better that I maintain the indexes separately and use a high-level router to navigate to the correct index?
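A rough sketch of the router idea, assuming a recent LlamaIndex release with `RouterQueryEngine`, `LLMSingleSelector`, `QueryEngineTool`, and `CondenseQuestionChatEngine` at the import paths shown. The helper names `build_router` and `as_chat_engine`, and the `{name: (query_engine, description)}` input shape, are hypothetical. Wrapping the router in a condense-question chat engine is one option for getting chat behaviour on top of routing, not necessarily the recommended one.

```python
from typing import Any, Dict, Tuple


def build_router(engines: Dict[str, Tuple[Any, str]]) -> Any:
    """Build a RouterQueryEngine from {name: (query_engine, description)}."""
    if not engines:
        raise ValueError("at least one query engine is required")

    # Imported lazily so this module loads even without llama_index installed.
    from llama_index.core.query_engine import RouterQueryEngine
    from llama_index.core.selectors import LLMSingleSelector
    from llama_index.core.tools import QueryEngineTool

    # The selector reads each description to decide where to send a query.
    tools = [
        QueryEngineTool.from_defaults(
            query_engine=engine, name=name, description=description
        )
        for name, (engine, description) in engines.items()
    ]
    return RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=tools,
    )


def as_chat_engine(router: Any) -> Any:
    """Wrap a query engine so follow-up questions are condensed with chat history."""
    from llama_index.core.chat_engine import CondenseQuestionChatEngine

    return CondenseQuestionChatEngine.from_defaults(query_engine=router)
```

Each entry would typically come from `index.as_query_engine()` on one of the separately maintained indexes, with a description of which PDFs it covers.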
so basically I have different files, and I was thinking the best way is to index them separately, keeping the ones that are more related together!
Indexing won't preserve the hierarchy order. In the end it keeps everything at the same level, since all the nodes are kept together in the same dict.
So if, let's say, you have a folder with subfolders and you want some sort of hierarchy maintained, I would suggest you customize the SimpleDirectoryReader to read recursively and update the metadata based on that.
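A small sketch of the metadata side of that suggestion, using only the standard library. `make_hierarchy_metadata` and the metadata keys (`relative_path`, `folder_path`, `depth`) are hypothetical names chosen for illustration; the only assumption about LlamaIndex is that `SimpleDirectoryReader` accepts `recursive=True` and a `file_metadata` callable that maps a file path string to a dict.

```python
import os
from pathlib import Path
from typing import Callable, Dict, Union


def make_hierarchy_metadata(root: str) -> Callable[[str], Dict[str, Union[str, int]]]:
    """Return a function that describes where a file sits under `root`."""
    root_path = Path(os.path.abspath(root))

    def file_metadata(file_path: str) -> Dict[str, Union[str, int]]:
        path = Path(os.path.abspath(file_path))
        # Raises ValueError if the file is not located under root.
        relative = path.relative_to(root_path)
        folders = relative.parts[:-1]
        return {
            "file_name": path.name,
            "relative_path": relative.as_posix(),
            "folder_path": "/".join(folders),  # e.g. "hr/policies"
            "depth": len(folders),  # 0 for files directly in root
        }

    return file_metadata
```

This could then be passed along the lines of `SimpleDirectoryReader(input_dir=root, recursive=True, file_metadata=make_hierarchy_metadata(root)).load_data()`, so every document (and the nodes derived from it) carries its folder position for later filtering. Note that a custom `file_metadata` likely replaces the reader's default file metadata rather than adding to it.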
nice, that looks best. I also don't think I need the advanced query engine routing for the first version