hmm super weird.
Faiss isn't super used, so I guess this went unnoticed.
I can try and look into this soon
Hey @Logan M, some more context
i have a 9 MB vector store and loading takes longer with SimpleVectorStore than with Faiss
yea that makes sense, faiss is pretty optimized
ultimately it returns the ref_doc_ids and respective similarities
which is great
i just need to map them to the node content now
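Mapping the returned ids back to node text is basically a dict lookup against the docstore. A minimal sketch of that step, where the shapes of `query_result` and `docstore` are assumptions for illustration, not the actual llama_index objects:

```python
# Hypothetical shapes: a vector-store query result (ids + similarities)
# and a docstore mapping ref_doc_id -> node text.
query_result = {"ids": ["2", "0"], "similarities": [0.91, 0.83]}
docstore = {"0": "first node text", "1": "second node text", "2": "third node text"}

# Join the two: each hit gets its id, score, and resolved text.
hits = [
    {"ref_doc_id": i, "score": s, "text": docstore[i]}
    for i, s in zip(query_result["ids"], query_result["similarities"])
]
```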
OH you are using vector_store.query() directly
What happens if you use something like index.as_retriever(similarity_top_k=2).retrieve("query")
that works nicely to retrieve ids even w faiss
a) speed of load - it'd be great to only load the Faiss vector store
b) querying using a pre-obtained raw embedding
not sure what you mean by a)? It needs to load the docstore/indexstore/faiss to operate, since Faiss can't store text
b) This is possible actually
from llama_index import QueryBundle
index.as_retriever(similarity_top_k=2).retrieve(QueryBundle("not used", embedding=[...]))
ok - re: a) -> so it seems at minimum, i'd need the docstore in memory to retrieve the text associated with a ref_doc_id?
wondering how i'd load what's required to resolve a TextNode with minimum time spent (e.g. maybe I can just load the docstore alone & not build the full index?)
I mean, the docstore and vector store are the heaviest components, and both need to be loaded.
In comparison, loading the entire index has minimal impact on top of the above 2
We do have some remote support for the docstore/indexstore (redis, mongodb), which might help with the load times
gotcha
so here i loaded the json docstore manually
and it was super fast <0.1s
Any way to do this style of loading more idiomatically/non-breaking w llama?
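For reference, "loading the json docstore manually" can be as little as a stdlib `json.load` on the persisted file, skipping the full index build. A self-contained sketch (the file layout here is an assumption for illustration, not the exact llama_index persist format):

```python
import json
import tempfile
import time
from pathlib import Path

def load_docstore(path: str) -> dict:
    # Read the persisted docstore JSON directly with the stdlib,
    # instead of reconstructing the whole index.
    start = time.perf_counter()
    with open(path) as f:
        docstore = json.load(f)
    print(f"loaded {len(docstore)} entries in {time.perf_counter() - start:.3f}s")
    return docstore

# Demo with a throwaway file standing in for the persisted docstore.
tmp = Path(tempfile.mkdtemp()) / "docstore.json"
tmp.write_text(json.dumps({"0": {"text": "hello"}, "1": {"text": "world"}}))
store = load_docstore(str(tmp))
```

This only gives you raw dicts, not TextNode objects, so it trades llama_index's abstractions for load speed.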
if i dive deeper and find out i'll let ya know why it's timing like this on my end
do you know if index.refresh
is possible with Faiss?
Hmm, does faiss support delete yet in their API?
if so, then it's probably possible to make it work
@Logan M here's what I did to enable delete:
then had to add this delete_by_value
interesting, I think there might be a better way to do this. I'll try to look into it
Constraint: my ref doc id's are positions/indexes in my vectorstore - so that makes it easier
import faiss
import numpy as np

persist_path = 'intents/intents_llamaindex-faiss/vector_store.json'
index = faiss.read_index(persist_path)
# Reconstruct all vectors from the on-disk index
all_vectors = np.array([index.reconstruct(i) for i in range(index.ntotal)])
# Position of the vector to delete (ref doc ids are positional here)
index_of_item_to_delete = int(ref_doc_id)
if index_of_item_to_delete >= len(all_vectors):  # out of range: nothing to delete
    return
# Remove the vector at the specified position
modified_vectors = np.delete(all_vectors, index_of_item_to_delete, axis=0)
# Create a new IndexFlat with the modified set of vectors
d = self._faiss_index.d  # dimension of the vectors
new_index = faiss.IndexFlatL2(d)
new_index.add(modified_vectors)
# self.persist()
faiss.write_index(new_index, persist_path)
print('saved new index w deleted vector')
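The delete-by-rebuild above boils down to dropping one row from the reconstructed matrix with `np.delete`; here is that core step in plain numpy, so it can be checked without faiss installed:

```python
import numpy as np

# 4 vectors of dimension 3, standing in for index.reconstruct() output.
vectors = np.arange(12, dtype="float32").reshape(4, 3)
idx_to_delete = 1
# Drop one row, exactly as the rebuild approach does before re-adding to a fresh index.
remaining = np.delete(vectors, idx_to_delete, axis=0)
print(remaining.shape)  # (3, 3)
```

One caveat worth noting: deleting row i shifts every later vector's position down by one, so when ref doc ids are positional (as in this setup), ids above the deleted position go stale after the rebuild.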
here's this in case it saves any time
then in data_structs.py:
def delete_by_value(self, doc_id: str) -> None:
    """Delete a Node by value."""
    keys_to_delete = [key for key, value in self.nodes_dict.items() if value == doc_id]
    if not keys_to_delete:
        print(f"Value {doc_id} not found in nodes_dict.")
    for key in keys_to_delete:
        del self.nodes_dict[key]
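The same delete-by-value idea, as a self-contained sketch on a plain dict (the data here is made up for illustration): collect every key whose value matches the target doc_id, then drop them.

```python
# Toy stand-in for nodes_dict: vector positions -> ref doc ids.
nodes_dict = {0: "doc-a", 1: "doc-b", 2: "doc-a"}

def delete_by_value(d: dict, doc_id: str) -> None:
    # Collect matching keys first; mutating a dict while iterating it raises.
    keys = [k for k, v in d.items() if v == doc_id]
    if not keys:
        print(f"Value {doc_id} not found.")
    for k in keys:
        del d[k]

delete_by_value(nodes_dict, "doc-a")
print(nodes_dict)  # {1: 'doc-b'}
```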
then this
@Logan M if you don't mind, keep me posted on the better way to do it if/when you find one!