di5corder4701
I'm implementing anomaly detection on top of documents that have already been ingested into a vector store (I've been using Milvus and OpenSearch so far). My poor man's approach is to load the documents along with their embeddings out of the vector store into an in-memory index (FAISS) and run clustering and anomaly detection over the embeddings (LOF, DBSCAN, FAISS k-NN), which requires pulling the embeddings back out of the underlying store (Milvus, OpenSearch, etc). I'm not sure this is a good approach, so if there's a better one, I'd love to hear it!
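A minimal sketch of the clustering/outlier step, assuming the embeddings have already been pulled out of the store into a NumPy array (the array name and the n_neighbors, eps and min_samples values are placeholders, not tuned defaults):

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import LocalOutlierFactor

# embeddings: (n_docs, dim) array loaded from the vector store
X = np.asarray(embeddings, dtype="float32")

# LOF flags points whose local density is much lower than their neighbours'
lof = LocalOutlierFactor(n_neighbors=20, metric="cosine")
lof_labels = lof.fit_predict(X)              # 1 = inlier, -1 = outlier
lof_scores = -lof.negative_outlier_factor_   # higher = more anomalous

# DBSCAN labels unclustered points as -1 ("noise"), which doubles as an outlier flag
db = DBSCAN(eps=0.3, min_samples=5, metric="cosine")
db_labels = db.fit_predict(X)
outlier_idx = np.where(db_labels == -1)[0]

Whether the in-memory approach holds up is mostly a question of scale: up to a few hundred thousand vectors this is perfectly workable, and FAISS only really earns its keep once exact neighbour search for LOF gets too slow.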

I've been at it for hours and still can't figure out how to get the retriever to return the embeddings along with the text of the documents already stored in the vector store. I tried it with both the Milvus and OpenSearch vector store indexes, and they both seem to trim the embeddings field somewhere before returning the nodes in the code shown below. I debugged into the MilvusVectorStore code and can see that the embeddings are returned from the Milvus query but are stripped in MilvusVectorStore._parse_from_milvus_results(..).
retriever = self.index.as_retriever(similarity_top_k=top_k)
nodes = retriever.retrieve('*')  # Get all documents
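One way around the stripped embeddings is to bypass the LlamaIndex retriever for this step and read the vectors straight out of the underlying collection. A minimal sketch with pymilvus, where the collection name, the output field names and the "query everything with an empty filter" pattern are assumptions that depend on your schema and Milvus version:

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # adjust to your deployment

rows = client.query(
    collection_name="llamaindex_docs",    # hypothetical collection name
    filter="",                            # empty filter + limit fetches all on recent Milvus; otherwise filter on the pk field
    output_fields=["text", "embedding"],  # field names depend on how MilvusVectorStore built the collection
    limit=10_000,
)

texts = [r["text"] for r in rows]
embeddings = [r["embedding"] for r in rows]

For large collections you would want to page through the results rather than pull everything in one call, but the point is the same: the raw vectors come from the store's own client rather than from the retriever.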
Hi everyone! I have a LlamaIndex pipeline which tokenizes, extracts entities, etc. from a list of documents and stores the results in the vector_store. After that, how do I create a query engine on top of that vector store so that I can ask questions about the documents indexed in it? The vector store already has embeddings from previous pipeline runs, so I don't want to use VectorStoreIndex.from_documents(..) because the embeddings are already in the data store. What do I do?

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.extractors import SummaryExtractor, QuestionsAnsweredExtractor
from llama_index.extractors.entity import EntityExtractor  # requires the llama-index-extractors-entity package

pipeline = IngestionPipeline(
    transformations=[
        SummaryExtractor(summaries=["self"], llm=qa_llm),
        QuestionsAnsweredExtractor(llm=qa_llm, questions=1),
        EntityExtractor(label_entities=True, llm=qa_llm),
        embed_model,  # embedding model runs last so nodes are written with embeddings
    ],
    vector_store=vector_store,
)
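Since the embeddings are already in the store, the usual pattern is to build the index from the existing vector store instead of from documents. A minimal sketch, assuming a recent llama_index core version and that vector_store, embed_model and qa_llm are the same objects used in the pipeline above (the question string is just an example):

from llama_index.core import VectorStoreIndex

# Wrap the already-populated store; nothing is re-ingested or re-embedded
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    embed_model=embed_model,  # only used to embed incoming query text
)

query_engine = index.as_query_engine(llm=qa_llm, similarity_top_k=5)
response = query_engine.query("What entities are mentioned in the documents?")
print(response)

from_vector_store does not re-run any transformations, so this works precisely because the pipeline above already wrote embedded nodes into the store.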