
Updated 3 months ago


Hello guys, a technical question: can I create a single vector store from multiple documents that each have different metadata? I want a single query engine that covers all the documents, where each document carries its own data.
"for file_name,category,date in data:
    docs = SimpleDirectoryReader(
        input_files=[f"./documents/{file_name}"]
    ).load_data()
    docs.metadata = {"filename": "<doc_file_name>","date":date,"category":category}"
Is it possible to build a single vector store from these documents by passing them all as one array? I'll appreciate the help, guys.
16 comments
Filename is added by default if the file is pdf or doc IMO

For adding date and category, I would suggest you extend the SimpleDirectoryReader class and make changes as per your requirement to add date and category file wise.
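As an alternative to subclassing, here is a minimal sketch that relies on the `file_metadata` argument of `SimpleDirectoryReader`, a callable that receives a file path and returns a metadata dict. The helper names `make_metadata_fn` and `load_all` are hypothetical, `rows` stands in for the `data` list from the question, and the import path assumes llama-index 0.10 or later. As far as I know, passing `file_metadata` replaces the reader's default metadata, so the filename is set explicitly here.

```python
import os
from typing import Callable, Dict, Iterable, Tuple


def make_metadata_fn(
    rows: Iterable[Tuple[str, str, str]],
) -> Callable[[str], Dict[str, str]]:
    """Build a callable that maps a file path to that file's metadata dict."""
    # rows holds (file_name, category, date) tuples, like `data` in the question.
    lookup = {name: (category, date) for name, category, date in rows}

    def file_metadata(path: str) -> Dict[str, str]:
        # The reader may pass a full path, so look up by base name only.
        name = os.path.basename(path)
        category, date = lookup[name]
        return {"filename": name, "category": category, "date": date}

    return file_metadata


def load_all(rows, folder="./documents"):
    """Load every listed file in one reader call (needs llama-index installed)."""
    # Assumed import path for llama-index >= 0.10; older releases differ.
    from llama_index.core import SimpleDirectoryReader

    rows = list(rows)
    reader = SimpleDirectoryReader(
        input_files=[os.path.join(folder, name) for name, _, _ in rows],
        file_metadata=make_metadata_fn(rows),
    )
    return reader.load_data()
```

With this approach a single `load_all(data)` call returns one list of documents, each already tagged with its own date and category.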
I can do that, but will it let me create a single vector store?
"index = VectorStoreIndex.from_documents(docs, transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)], llm=llm)" Do I pass docs as an array of all the docs I just added? In the above code snippet I'm loading several docs and each has its own metadata.
I want to know whether I can create a single vector store for all of them.
All I found are examples that create a separate vector store and query engine for each one.
Yes this will only create a single index
Sorry mate, but you probably didn't get what I tried to say XD
"for file_name,category,date in data:
    docs = SimpleDirectoryReader(
        input_files=[f"./documents/{file_name}"]
    ).load_data()
    docs.metadata = {"filename": "<doc_file_name>","date":date,"category":category}
"
For example, I want to load all the docs and somehow end up with one docs_all instance that includes every doc loaded across the iterations, then create a single vector store from it. The docs in docs_all are different documents and each has its own metadata.
Every file gets its own document. When you pass all_docs to create the index, it will create only a single index.
Okay, but how do I do that? Honestly, I didn't find a native function to add docs onto docs or just add them all.
Can you give me example code for that using the docs from this example:
for file_name,category,date in data:
    docs = SimpleDirectoryReader(
        input_files=[f"./documents/{file_name}"]
    ).load_data()
    docs.metadata = {"filename": "<doc_file_name>","date":date,"category":category}
How do I add the docs from each file into a single docs list?
I'll appreciate it, man.
Plain Text
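from llama_index.core import SimpleDirectoryReader  # import path for llama-index >= 0.10

total_docs = []  # This will contain the docs from all the files.
for file_name, category, date in data:
    docs = SimpleDirectoryReader(
        input_files=[f"./documents/{file_name}"]
    ).load_data()
    # load_data() returns a list, so set the metadata on each Document in it.
    for doc in docs:
        doc.metadata.update({"filename": file_name, "date": date, "category": category})
    total_docs.extend(docs)  # Add this file's docs to the combined list.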


Something like this?
I'm using extend because the docs returned from the reader are already a list.
Ooooooh, .extend is what I wanted.
So I just pass an array of docs when creating the vector store.
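For reference, a rough sketch of the whole flow discussed in this thread: collect the docs from every file into one list, tag each doc with its own metadata, then build one index from that list. The function names `collect_documents` and `build_index` and the `load_file` parameter are hypothetical, `data` is assumed to be a list of `(file_name, category, date)` tuples as in the question, and the import paths assume llama-index 0.10 or later. Building the index also needs an embedding model to be configured, which is not shown here.

```python
from typing import Any, Callable, Iterable, List, Tuple


def collect_documents(
    data: Iterable[Tuple[str, str, str]],
    load_file: Callable[[str], List[Any]],
) -> List[Any]:
    """Load every file and return one flat list of tagged documents."""
    all_docs: List[Any] = []
    for file_name, category, date in data:
        docs = load_file(f"./documents/{file_name}")  # a list of documents
        for doc in docs:
            # Each document keeps its own metadata dict.
            doc.metadata.update(
                {"filename": file_name, "category": category, "date": date}
            )
        all_docs.extend(docs)  # extend, because docs is already a list
    return all_docs


def build_index(data):
    """Build a single index over all documents (needs llama-index installed)."""
    # Assumed import paths for llama-index >= 0.10; older releases differ.
    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
    from llama_index.core.node_parser import SentenceSplitter

    def load_file(path: str):
        return SimpleDirectoryReader(input_files=[path]).load_data()

    docs = collect_documents(data, load_file)
    # One call with the combined list yields one index, not one per file.
    return VectorStoreIndex.from_documents(
        docs,
        transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)],
    )
```

Usage would then be `index = build_index(data)` followed by `query_engine = index.as_query_engine()`, giving a single query engine over all the files.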