
Updated 3 months ago


How many files can you index? Let's say I wanted to have an entire codebase as an index; is that possible?
Yes. The index is loaded into memory at runtime, so how much you can index depends directly on your memory size.

On top of that, if you are using VectorStoreIndex and keep the embeddings locally, you need about the same amount of space on disk to store them.
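For a rough sense of scale, the raw size of the embeddings is the number of chunks times the embedding dimension times the bytes per value. The sketch below assumes 1536-dimensional embeddings (the size of OpenAI's text-embedding-ada-002; your model may differ) stored as 4-byte float32 values. The chunk count and the helper name `estimate_embedding_bytes` are hypothetical, and a local store may keep vectors less compactly than packed float32, so treat the result as a lower bound.

```python
def estimate_embedding_bytes(num_chunks: int, dim: int = 1536, bytes_per_value: int = 4) -> int:
    """Lower-bound estimate of raw embedding storage in bytes.

    Assumes one embedding per chunk, `dim` values per embedding,
    and `bytes_per_value` bytes per value (4 for float32).
    """
    if num_chunks < 0 or dim <= 0 or bytes_per_value <= 0:
        raise ValueError("num_chunks must be >= 0; dim and bytes_per_value must be > 0")
    return num_chunks * dim * bytes_per_value


# Hypothetical codebase split into 100,000 chunks.
size = estimate_embedding_bytes(100_000)
# Raw float32 vectors only; excludes the chunk text and metadata.
print(f"{size / 1024**2:.1f} MiB")  # prints 585.9 MiB
```

The same estimate applies to both RAM and disk, since the reply above notes the embeddings need roughly the same space in each.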
What would be the best approach if, let's say, I have a development side of things and also financial and marketing data? Would it be a bad idea to have one huge lump of mixed data, so that whenever someone queries they have access to everything, or should I split it off?
Is the first approach even viable?
If the docs are unrelated (let's say the marketing data has nothing to do with the development code), then it is better to keep them in separate vector indexes.


I'm assuming these two can be launched separately; otherwise, you can add a condition where the user selects whether to query the dev docs or the financial docs. Based on that choice, pick the corresponding vector store and proceed accordingly.
Wait, how could I launch multiple vector indexes at once? I am hoping to do this but I am not sure how :D
Yes, you can. It would be something like this:

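```python
# llama_index 0.6-era API (later releases renamed GPTVectorStoreIndex to VectorStoreIndex)
from llama_index import GPTVectorStoreIndex, ServiceContext, SimpleDirectoryReader

DEV_DOCS_PATH = "./dev_docs"              # hypothetical example path
FINANCIAL_DOCS_PATH = "./financial_docs"  # hypothetical example path
service_context = ServiceContext.from_defaults()

# Build one index per document set
dev_docs = SimpleDirectoryReader(DEV_DOCS_PATH).load_data()
financial_docs = SimpleDirectoryReader(FINANCIAL_DOCS_PATH).load_data()
index_dev_docs = GPTVectorStoreIndex.from_documents(dev_docs, service_context=service_context)
index_financial_docs = GPTVectorStoreIndex.from_documents(financial_docs, service_context=service_context)

# One query engine per index
docs_query_engine = index_dev_docs.as_query_engine()
financial_query_engine = index_financial_docs.as_query_engine()

# for dev queries
response = docs_query_engine.query("what is the status on deployment")
print(response)

# for finance queries
response = financial_query_engine.query("How to become financially independent")
print(response)
```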


Now you have two query engines for two different sets of docs; choose one based on your condition.
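One way to implement that condition is a small lookup table from the user's choice to the matching query engine. This is a plain-Python sketch that does not call LlamaIndex: `StubQueryEngine` is a hypothetical stand-in for the engines built above (anything with a `.query(str)` method fits), and the `"dev"`/`"finance"` keys are assumed labels for the user's selection.

```python
from typing import Dict, Protocol


class QueryEngine(Protocol):
    """Anything with a .query(str) method, e.g. the engines created above."""

    def query(self, question: str): ...


class StubQueryEngine:
    """Hypothetical stand-in so the sketch runs without an index or API key."""

    def __init__(self, name: str) -> None:
        self.name = name

    def query(self, question: str) -> str:
        return f"[{self.name}] answer to: {question}"


def route_query(engines: Dict[str, QueryEngine], choice: str, question: str):
    """Send the question to the engine the user selected."""
    try:
        engine = engines[choice]
    except KeyError:
        raise ValueError(f"unknown choice {choice!r}; expected one of {sorted(engines)}") from None
    return engine.query(question)


engines = {
    "dev": StubQueryEngine("dev"),
    "finance": StubQueryEngine("finance"),
}
print(route_query(engines, "dev", "what is the status on deployment"))
# prints: [dev] answer to: what is the status on deployment
```

With real indexes, the dictionary would map to `docs_query_engine` and `financial_query_engine` instead of the stubs; an unknown choice fails loudly rather than silently querying the wrong data.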
No, I meant it the other way round, where everything would be under one index.
Would that be the preferred way?
There's no preferred way; it depends on your use case. If all the docs are interconnected, it is better to keep them together. If not, keeping them separate is a good choice.
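If you do put everything in one index, tagging each document with its source domain keeps the option of narrowing a query later. This toy sketch is not LlamaIndex code: it uses hand-written 3-dimensional vectors in place of real embeddings and a brute-force cosine-similarity search, and the `Doc` type, the `domain` tag, and the `search` function are hypothetical, chosen only to show one combined store with an optional filter.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    text: str
    domain: str          # hypothetical tag, e.g. "dev" or "finance"
    vector: List[float]  # toy stand-in for a real embedding


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(docs: List[Doc], query_vec: List[float], top_k: int = 2,
           domain: Optional[str] = None) -> List[Doc]:
    """Brute-force top-k search over one combined store, optionally filtered by domain."""
    candidates = [d for d in docs if domain is None or d.domain == domain]
    candidates.sort(key=lambda d: cosine(d.vector, query_vec), reverse=True)
    return candidates[:top_k]


store = [
    Doc("deploy pipeline notes", "dev", [1.0, 0.1, 0.0]),
    Doc("API refactor plan", "dev", [0.9, 0.2, 0.1]),
    Doc("Q3 marketing budget", "finance", [0.1, 1.0, 0.2]),
]
# Unfiltered: everything is searchable together
print([d.text for d in search(store, [1.0, 0.0, 0.0])])
# prints ['deploy pipeline notes', 'API refactor plan']
# Filtered: only the finance slice of the same store
print([d.text for d in search(store, [1.0, 0.0, 0.0], domain="finance")])
# prints ['Q3 marketing budget']
```

LlamaIndex supports attaching metadata to documents and filtering on it at query time, but the exact classes and arguments differ between versions, so check the docs for the release you use.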
OK cool, thank you. One more question: is it possible to append to an existing vector DB when new data comes in? More to the point, is that a simple or a tedious task?
Yes, it's a simple task.

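```python
# Continues from the snippet above (dev_docs and service_context already defined)
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# existing index (in practice you might load it from disk instead of rebuilding)
index_dev_docs = GPTVectorStoreIndex.from_documents(dev_docs, service_context=service_context)

# new data
NEW_DOCS_PATH = "./new_docs"  # hypothetical example path
new_docs = SimpleDirectoryReader(NEW_DOCS_PATH).load_data()

# Add each new document to the existing index; no full rebuild needed
for doc in new_docs:
    index_dev_docs.insert(doc)

# If you keep the index on disk, persist again so the new docs survive a restart
index_dev_docs.storage_context.persist()
```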
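If new data arrives repeatedly, one refinement is to insert only content you haven't indexed yet, so re-running the script doesn't add duplicates. This plain-Python sketch keeps a set of SHA-256 content hashes; `select_new_documents` is a hypothetical helper, not part of LlamaIndex, and the file names it returns are the ones you would load and pass to `insert`.

```python
import hashlib
from typing import Dict, List, Set


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_new_documents(files: Dict[str, str], seen_hashes: Set[str]) -> List[str]:
    """Return names of files whose content has not been indexed yet.

    `files` maps file name -> text. `seen_hashes` is updated in place,
    so calling this again with the same files returns an empty list.
    """
    new_names = []
    for name, text in sorted(files.items()):
        digest = content_hash(text)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            new_names.append(name)
    return new_names


seen: Set[str] = set()
batch1 = {"a.py": "print('a')", "b.py": "print('b')"}
print(select_new_documents(batch1, seen))  # prints ['a.py', 'b.py']
batch2 = {"a.py": "print('a')", "c.py": "print('c')"}
print(select_new_documents(batch2, seen))  # prints ['c.py']
```

Because the hashes are content-based, an edited file is reported as new while its old version stays in the index unless you remove it, and two files with identical text are indexed once. Save `seen_hashes` alongside the index if it must survive restarts.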
omg this is amazing ty so much