Updated 3 months ago

I have a dataset of ~2000 documents

I have a dataset of ~2000 documents which contain information about/checklists for various sports trading card sets. I need index retrieval time to be about half of what it currently is (it currently takes about a minute). What considerations should I make when deciding what type of index to use? I am currently using a Vector Store, which gives decent results but takes too long. Will I have to break up the index if I want to retrieve faster?
8 comments
Are you using the simple vector store (i.e. the default in-memory one)? If you have a large number of documents, we recommend using either FaissVectorStore (also in-memory) or an external vector DB (e.g. Pinecone, Weaviate, etc.)
Thank you. Could you specify what you mean by "in-memory"?
I am using
Python
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir=index_path)
to save the index to file, then loading it with
Python
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="index-???"))
query_engine = index.as_query_engine()
to load it from file. Loading it also takes about 30 minutes, so if there is a way to speed that up as well that would be great. I will look into the FaissVectorStore but could you explain how an external vectorDB is different?
Thank you very much
external vectorDB means that your documents are being sent to a separate service, which holds the data for you
It'd generally be much faster to query; queries should be on the order of 10s or 100s of milliseconds
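The raw similarity math over 2000 vectors is itself only milliseconds, which is one reason a minute-long query usually points at overhead elsewhere (index loading, LLM response synthesis) rather than the vector search. A rough self-contained sketch in plain NumPy (not the LlamaIndex API; the 1536-dim embedding size is an assumption):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
docs = rng.random((2000, 1536)).astype("float32")  # ~2000 docs, 1536-dim embeddings (assumed size)
query = rng.random((1536,)).astype("float32")

start = time.perf_counter()
# Brute-force L2 distance from the query to every stored vector, then take the top 5
dists = np.linalg.norm(docs - query, axis=1)
top5 = np.argsort(dists)[:5]
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"top-5 ids: {top5.tolist()}, search took {elapsed_ms:.1f} ms")
```

Even this unoptimized brute-force pass typically finishes in single-digit milliseconds on commodity hardware.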
Thank you, this is amazing.
When you said external vector DBs, was the Redis Vector Store included in that? I seem to be unable to get it to retrieve in under 10 seconds consistently, let alone 10s or 100s of milliseconds.