Updated last month

Implementing Anomaly Detection on Ingested Documents Using In-Memory Vector Stores

At a glance

The community member is implementing an anomaly detection system on top of documents stored in a vector store (Milvus and OpenSearch). They are trying to load the documents and their embeddings into an in-memory vector store (FAISS) to perform clustering and anomaly detection. However, they are having trouble getting the retriever to return the embeddings along with the text of the documents. The community members suggest that most vector stores don't have an option to return embeddings, mostly to save memory. They recommend using the underlying client for the vector database or submitting a pull request to add the desired functionality.

I am implementing anomaly detection on top of documents that have already been ingested into a vector store (I've been using Milvus and OpenSearch so far). I am trying a poor man's approach: load the documents along with their embeddings from the vector store into an in-memory vector store (FAISS), then run some clustering and anomaly detection (LOF, DBSCAN, FAISS), which requires the embeddings to be loaded from the underlying vector store (Milvus, OpenSearch, etc.). Not sure if this is a good approach, so please suggest a better one. Would love to hear it!
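For reference, here is a minimal sketch of the anomaly detection step once the embeddings are in memory, using scikit-learn's LocalOutlierFactor (LOF) directly on the vectors. The function name `find_outliers` and the parameter defaults are illustrative assumptions, not something from the thread, and the L2 normalisation is an optional choice that makes Euclidean distance behave like cosine distance.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def find_outliers(embeddings, n_neighbors=20, contamination="auto"):
    """Score embeddings with LOF (hypothetical helper, not a library API).

    Returns (labels, scores): labels are -1 for outliers and 1 for inliers,
    scores are higher for more anomalous points.
    """
    X = np.asarray(embeddings, dtype=np.float32)

    # L2-normalise each row so Euclidean neighbours match cosine neighbours.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.clip(norms, 1e-12, None)

    # n_neighbors must be smaller than the number of samples.
    k = min(n_neighbors, len(X) - 1)
    lof = LocalOutlierFactor(n_neighbors=k, contamination=contamination)

    labels = lof.fit_predict(X)
    # negative_outlier_factor_ is close to -1 for inliers and much lower
    # for outliers, so negate it to get "higher means more anomalous".
    scores = -lof.negative_outlier_factor_
    return labels, scores
```

With this shape, `scores.argsort()[::-1]` gives document indices from most to least anomalous, which can be mapped back to the document texts loaded alongside the embeddings.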

So, I've been at it for hours and still can't figure out how to get the retriever to return the embeddings along with the text of the documents already stored in the vector store. I tried it with both the Milvus and OpenSearch vector store indexes, and both seem to strip the "embeddings" field somewhere before returning the nodes in the code shown below. I debugged into the MilvusVectorStore code and I can see that the embeddings are returned from the Milvus query but are stripped in MilvusVectorStore#_parse_from_milvus_results(..):
retriever = self.index.as_retriever(similarity_top_k=top_k)
nodes = retriever.retrieve('*')  # Get all documents
4 comments
most vector stores don't have an option to return embeddings
mostly to save memory
I would just use the underlying client for whichever vectordb you are using
Or feel free to make a PR
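As a rough sketch of the "use the underlying client" suggestion for Milvus: pymilvus's `MilvusClient.query` accepts an `output_fields` list, so the vector field can be requested explicitly. The helper name `fetch_embeddings`, the collection name, and the field names `embedding` and `text` are assumptions here; check them against your own collection schema. Querying with an empty filter plus a `limit` is also an assumption about your Milvus version, and large collections would need paging.

```python
def fetch_embeddings(client, collection_name,
                     embedding_field="embedding", text_field="text",
                     limit=1000):
    """Pull texts and raw vectors straight from Milvus (hypothetical helper).

    `client` is expected to be a pymilvus MilvusClient, e.g.:
        from pymilvus import MilvusClient
        client = MilvusClient(uri="http://localhost:19530")
    The field names are assumptions; adjust them to your schema.
    """
    rows = client.query(
        collection_name=collection_name,
        filter="",  # no filter: return any rows, up to `limit`
        output_fields=[text_field, embedding_field],
        limit=limit,
    )

    # Each row is a dict keyed by field name.
    texts = [row.get(text_field) for row in rows]
    vectors = [list(row[embedding_field]) for row in rows]
    return texts, vectors
```

The returned `vectors` can then be stacked into an array and handed to FAISS or scikit-learn without going through the retriever at all.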