I see this self.db.similarity_search(query, k, filter =...

At a glance

A community member asked @kapa.ai how to perform a similarity search in LlamaIndex, similar to the self.db.similarity_search(query, k, filter = filter) method used in LangChain. The replies provided the following information:

In LlamaIndex, you can perform a similarity search using the similarity_search_with_score method available in the JaguarVectorStore class. This method returns the nodes most similar to the query embedding, along with their ids and scores. The reply also included an example of how to use this method.

After the community member clarified that they pass query text rather than embeddings and use Chroma, the follow-up reply suggested using the retrieve method of a retriever object, with an example that includes metadata filters. It also noted that with Chroma the vector store is created with the ChromaVectorStore class.

Both replies noted that the actual implementation might vary depending on the specific vector store being used, and pointed to the LlamaIndex documentation for more details.

MKhere

@kapa.ai I see this self.db.similarity_search(query, k, filter = filter) used in langchain, how to do the same similarity search in llamaindex

6 comments

kapa.ai

Please stand by @MKhere as I search the knowledge base. I normally take about 20 seconds or so. :wave:

kapa.ai

----------------------
In LlamaIndex, you can perform a similarity search using the similarity_search_with_score method available in the JaguarVectorStore class. This method returns nodes most similar to the query embedding, along with their ids and scores. Here is how you can use it:

Plain Text

similarity_search_with_score(embedding: Optional[List[float]], k: int = 3, form: str = 'node', **kwargs: Any) -> Union[Tuple[List[TextNode], List[str], List[float]], List[Document]]

The parameters are as follows:

embedding: The embedding of the text to look up.
k: The number of nodes to return. Defaults to 3.
form: If "node", return Tuple[List[TextNode], List[str], List[float]]. If "doc", return List[Document].
kwargs: May have where, metadata_fields, args, fetch_k.

The method returns a tuple consisting of a list of nodes, a list of ids, and a list of similarity scores.

Here is an example of how you might use it:

Plain Text

# Assuming you have a JaguarVectorStore instance `store`
embedding = [0.1, 0.2, 0.3, 0.4, 0.5]  # This should be your actual query embedding
k = 5  # Number of similar nodes to return
form = 'node'  # Return format: a tuple of (nodes, ids, scores)
nodes, ids, scores = store.similarity_search_with_score(embedding, k=k, form=form)
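To make the return shape described above concrete, here is a minimal, library-free sketch of a top-k similarity search that returns three parallel lists (texts, ids, scores). It is not the JaguarVectorStore implementation: the function name `toy_similarity_search_with_score`, the dict-based store, and the choice of cosine similarity as the scoring metric are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_similarity_search_with_score(
    store: Dict[str, Tuple[str, List[float]]],
    embedding: List[float],
    k: int = 3,
) -> Tuple[List[str], List[str], List[float]]:
    """Hypothetical stand-in for a vector store lookup.

    `store` maps a node id to (text, embedding). Returns the texts, ids,
    and scores of the k entries most similar to `embedding`, best first.
    """
    scored = [
        (cosine_similarity(embedding, vector), node_id, text)
        for node_id, (text, vector) in store.items()
    ]
    # Highest similarity first, then keep only the top k.
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:k]
    texts = [text for _, _, text in top]
    ids = [node_id for _, node_id, _ in top]
    scores = [score for score, _, _ in top]
    return texts, ids, scores
```

A real vector store would use an approximate index rather than scoring every entry, and its score may be a distance rather than a similarity, so check the store's documentation before comparing scores.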

Please note that the actual implementation might vary depending on the specific vector store you are using. For example, the RocksetVectorStore, VearchVectorStore, LanceDBVectorStore, SingleStoreVectorStore, TxtaiVectorStore, and MongoDBAtlasVectorSearch all have their own implementations of the query method, which is used to perform similarity searches.

kapa.ai

You can find more details in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------

MKhere

@kapa.ai No, here we are not passing any embeddings. Also, I am using ChromaDB. The LangChain definition of similarity_search is below:

def similarity_search(
    self,
    query: str,
    k: int = 4,
    score_threshold: Optional[float] = 0.0,
    **kwargs: Any,
) -> List[Document]:
    """Return docs most similar to query.

    By default, supports Approximate Search.
    Also supports Script Scoring and Painless Scripting.

    Args:
        query: Text to look up documents similar to.
        k: Number of Documents to return. Defaults to 4.
        score_threshold: Specify a score threshold to return only documents
            above the threshold. Defaults to 0.0.

    Returns:
        List of Documents most similar to the query.

        text_field: Document field the text of the document is stored in. Defaults
            to "text".

        metadata_field: Document field that metadata is stored in. Defaults to
            "metadata".
            Can be set to a special value "*" to include the entire document.

    Optional Args for Approximate Search:
        search_type: "approximate_search"; default: "approximate_search"

        boolean_filter: A Boolean filter is a post filter consists of a Boolean
            query that contains a k-NN query and a filter.

            k-NN search with pre-filtering or an approximate search with modified
            post-filtering. (deprecated, use efficient_filter)

        pre_filter: script_score query to pre-filter documents before identifying
            nearest neighbors; default: {"match_all": {}}
    """
    docs_with_scores = self.similarity_search_with_score(
        query, k, score_threshold, **kwargs
    )
    return [doc[0] for doc in docs_with_scores]

kapa.ai

Let me check the knowledge base for you. One moment! :books:

kapa.ai

----------------------
In LlamaIndex, you can perform a similarity search using the retrieve method of a retriever object. This method returns documents most similar to the query text. Here is how you can use it:

First, you need to create a retriever from the index:

Plain Text

retriever = index.as_retriever()

Then, you can use the retrieve method to perform a similarity search:

Plain Text

response = retriever.retrieve("query text")

The retrieve method returns a list of NodeWithScore objects, that is, the nodes most similar to the query text, each paired with its similarity score. You can iterate over the response to get the text, score, and metadata of each result:

Plain Text

for row in response:
    print(f"Text: {row.get_text()}")
    print(f" Score: {row.get_score():.3f}")
    print(f" Metadata: {row.metadata}")
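The LangChain method quoted in the question also takes a score_threshold. As a rough sketch of the same idea on the LlamaIndex side, the results from retrieve can be filtered after the fact. The helper name `filter_by_score` is hypothetical, and the sketch assumes only that each result exposes the get_text() and get_score() methods used in the loop above, and that a higher score means a closer match (which depends on the vector store).

```python
from typing import Any, Iterable, List, Tuple


def filter_by_score(
    results: Iterable[Any], score_threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Keep (text, score) pairs whose score is at least `score_threshold`.

    `results` is expected to be the list returned by retriever.retrieve(...);
    any object with get_text() and get_score() methods works.
    """
    kept: List[Tuple[str, float]] = []
    for row in results:
        score = row.get_score()
        if score >= score_threshold:
            kept.append((row.get_text(), score))
    return kept
```

For example, `filter_by_score(response, score_threshold=0.5)` would drop every result scoring below 0.5 while preserving the original ranking.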

If you want to apply a filter to the search results, you can specify a filters argument when creating the retriever:

Plain Text

from llama_index.core.vector_stores.types import MetadataFilters, MetadataFilter

filters = MetadataFilters(filters=[MetadataFilter(key="color", value="blue")])
retriever = index.as_retriever(filters=filters, similarity_top_k=3)

Please note that the actual implementation might vary depending on the specific vector store you are using. For example, if you are using Chroma as your vector store, you would need to use the ChromaVectorStore class to create your vector store.
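Putting the pieces above together, the following is a hedged sketch of a wrapper that mimics the LangChain-style similarity_search(query, k, filter) call on top of a LlamaIndex index backed by Chroma. The function names `load_chroma_index` and `similarity_search` are hypothetical, as is treating the filter as a flat dict of exact-match metadata values. It assumes the chromadb and llama-index-vector-stores-chroma packages are installed, that the Chroma collection already contains embedded documents, and that an embedding model is configured for LlamaIndex so the query text can be embedded.

```python
from typing import Any, Dict, List, Optional


def load_chroma_index(path: str, collection_name: str) -> Any:
    """Build a LlamaIndex index over an existing Chroma collection.

    Imports are local so the rest of this sketch can be read and used
    without these optional packages installed.
    """
    import chromadb
    from llama_index.core import VectorStoreIndex
    from llama_index.vector_stores.chroma import ChromaVectorStore

    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(collection_name)
    vector_store = ChromaVectorStore(chroma_collection=collection)
    # Query text is embedded with whichever embedding model LlamaIndex is configured to use.
    return VectorStoreIndex.from_vector_store(vector_store)


def similarity_search(
    index: Any,
    query: str,
    k: int = 4,
    filter: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Return the k results most similar to `query`, optionally filtered.

    `filter` is assumed to be a flat dict of metadata key/value pairs that
    must all match exactly, e.g. {"color": "blue"}.
    """
    retriever_kwargs: Dict[str, Any] = {"similarity_top_k": k}
    if filter:
        from llama_index.core.vector_stores.types import (
            MetadataFilter,
            MetadataFilters,
        )

        retriever_kwargs["filters"] = MetadataFilters(
            filters=[
                MetadataFilter(key=key, value=value)
                for key, value in filter.items()
            ]
        )
    retriever = index.as_retriever(**retriever_kwargs)
    return retriever.retrieve(query)
```

With a hypothetical path and collection name, usage would look like `index = load_chroma_index("./chroma_db", "my_collection")` followed by `similarity_search(index, "query text", k=3, filter={"color": "blue"})`. Unlike the LangChain method, the result is a list of scored nodes rather than Document objects.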

You can find more details in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
