Hi guys, I saw this new version 0.11

Hi guys, I saw the new version 0.11 update last night. Is it possible to do a similarity search by vector, just like the search-by-vector functions available in LangChain for the respective vector databases?
27 comments
this has always been possible?

This is exactly what happens when you do retriever.retrieve("query") or query_engine.query("query").
The above code calls the embedding model to generate a query vector and runs it against your vector db.
Or do you mean querying with an embedding vector that you created yourself?
Hi @Logan M, yes that is right. I have a list of vectors that needs to be passed as input to search for similar vectors with k=3. This is possible in LangChain with functions like similarity_search_by_vector() or similarity_search_by_vector_with_score(), where we pass the embeddings, the k value, and other kwargs.
You have been able to do that for a while:

Plain Text
from llama_index.core import QueryBundle

query = QueryBundle(query_str="unused", embedding=[0.1, ...])
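# the retriever/query engine uses the given embedding directly and skips embedding query_str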

nodes = retriever.retrieve(query)
response = query_engine.query(query)
Okay, I will try this out, but is there a way to get the closest values with k=3?
index.as_retriever(similarity_top_k=3).retrieve(..)
Just set the top k there
Alternatively you can use vector_store.query(), but you'll need to import and fill in this vector store query object
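Roughly, that lower-level route looks like the following. A minimal sketch (not from the thread), assuming vector_store is an already-populated LlamaIndex vector store and my_embedding is your precomputed query vector, both placeholder names:

Plain Text
from llama_index.core.vector_stores import VectorStoreQuery

# build the vector store query object: your own embedding plus the top-k
vs_query = VectorStoreQuery(query_embedding=my_embedding, similarity_top_k=3)

# query the vector store directly; returns a VectorStoreQueryResult
result = vector_store.query(vs_query)
for node, score in zip(result.nodes or [], result.similarities or []):
    print(score, node.get_content()[:100])


Depending on the vector store integration, result.nodes can be empty and only result.ids returned, in which case the nodes have to be looked up in the docstore.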
@kapa.ai - how to search by vector in ChromaDB? I am using the following code:

Plain Text
self.db = VectorStoreIndex.from_vector_store(
    vector_store,
    storage_context=storage_context,
    embed_model=self.embeddings
)

Now I want to do something like self.db.similarity_search(...) or self.db.query(embed_query, k=3). How can I do this?
Let me check the knowledge base for you. One moment! :books:
----------------------
To perform a similarity search or query in ChromaDB using LlamaIndex, you can use the as_query_engine method of the VectorStoreIndex class. This method returns a QueryEngine object that you can use to perform queries. Here's an example of how you can do this:

Plain Text
from llama_index.core import VectorStoreIndex

# Assuming you have already created your VectorStoreIndex
index = VectorStoreIndex.from_vector_store(
    vector_store,
    storage_context=storage_context,
    embed_model=self.embeddings
)

# Create a QueryEngine
query_engine = index.as_query_engine(similarity_top_k=3)

# Perform a query
response = query_engine.query("Your query here")


In the above code, replace "Your query here" with your actual query. The similarity_top_k=3 argument specifies that you want the top 3 results.

Please note that the query method expects a string query. If you have an embedding and want to perform a similarity search with it, you might need to use a different method or convert the embedding back to a string query.

For more detailed information, you can refer to the following sources:
  • Jerryjliu > Llama_index > Blob > Main > Docs > Docs > Examples > Ingestion > Redis Ingestion Pipeline > Confirm documents are ingested
  • LlamaIndex > En > Latest > Examples > Vector_stores > Load documents
  • Jerryjliu > Llama_index > Blob > Main > Docs > Docs > Module_guides > Indexing > Vector Store Index usage examples > Connect to external vector stores (with existing embeddings) > Query Part 2
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
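For reference, Logan's earlier QueryBundle approach also works against the Chroma-backed index built above. A minimal sketch, assuming index is the VectorStoreIndex from the snippet above and query_vector is your precomputed embedding (placeholder name):

Plain Text
from llama_index.core import QueryBundle

# top 3 nearest neighbours for an embedding computed outside LlamaIndex
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve(QueryBundle(query_str="unused", embedding=query_vector))
for n in nodes:
    print(n.score, n.node.get_content()[:100])
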
@kapa.ai I am not searching using a string as the query; my application already converts the input query into embeddings (vectors). Using this vector, I need to search the vector store.
Let me check the knowledge base for you. One moment! :books:
----------------------
If you already have the embeddings or vectors for your query, you can use the VectorStoreQuery class to perform a similarity search in your vector store. Here's an example:

Plain Text
from llama_index.core.vector_stores import VectorStoreQuery

# Assuming you have your query_embedding
query_embedding = ...  # Your query embedding here

# Create a VectorStoreQuery
vector_store_query = VectorStoreQuery(query_embedding=query_embedding, similarity_top_k=3)

# Perform the query
response = index.as_query_engine().query(vector_store_query)


In the above code, replace ... with your actual query embedding. The similarity_top_k=3 argument specifies that you want the top 3 results.

Please note that the query method of the QueryEngine class can accept either a string or a VectorStoreQuery object. If you provide a VectorStoreQuery object, it will perform a similarity search using the provided embedding.

For more detailed information, you can refer to the following source:

  • LlamaIndex > En > Latest > Examples > Vector_stores > Query the vector database
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai, I get this error:

result = self.db.as_query_engine().query(query_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MuthuKumaranKuppuswa\AppData\Roaming\Python\Python312\site-packages\llama_index\core\instrumentation\dispatcher.py", line 265, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MuthuKumaranKuppuswa\AppData\Roaming\Python\Python312\site-packages\llama_index\core\base\base_query_engine.py", line 48, in query
dispatcher.event(QueryStartEvent(query=str_or_query_bundle))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MuthuKumaranKuppuswa\AppData\Roaming\Python\Python312\site-packages\pydantic\main.py", line 211, in init
validated_self = self.pydantic_validator.validate_python(data, self_instance=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for QueryStartEvent
query.str
Input should be a valid string [type=string_type, input_value=VectorStoreQuery(query_em...None, hybrid_top_k=None), input_type=VectorStoreQuery]
For further information visit https://errors.pydantic.dev/2.9/v/string_type
query.QueryBundle
Input should be a dictionary or an instance of QueryBundle [type=dataclass_type, input_value=VectorStoreQuery(query_em...None, hybrid_top_k=None), input_type=VectorStoreQuery]
For further information visit https://errors.pydantic.dev/2.9/v/dataclass_type
Let me check the knowledge base for you. One moment! :books:
----------------------
It seems like you're encountering a validation error when trying to use the query method of the QueryEngine class with a VectorStoreQuery object. The error message suggests that the query method is expecting a string or a QueryBundle object, not a VectorStoreQuery object.

Unfortunately, based on the provided knowledge sources, there's no direct example of using a VectorStoreQuery object with the query method of the QueryEngine class. The examples provided in the knowledge sources use a string as the query.

It's possible that the VectorStoreQuery class is not compatible with the query method of the QueryEngine class, or that there's a specific way to use it that's not covered in the provided knowledge sources.

I would recommend checking the documentation of the VectorStoreQuery and QueryEngine classes for more information on how to use them. If the documentation doesn't help, you might want to reach out to the developers or the community for assistance.

Here are the relevant knowledge sources for your reference:
  • LlamaIndex > En > Latest > Examples > Vector_stores > Query the vector database
  • LlamaIndex > En > Latest > Examples > Query_engine > SQLJoinQueryEngine
  • LlamaIndex > En > Latest > Contributing to LlamaIndex > Contribution Guideline > 1. 🆕 Extend Core Modules > Query Engines
  • LlamaIndex > En > Latest > Api_reference > Tools > Query plan > QueryEngineTool Part 2
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
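Concretely, the object the failing call expects is a QueryBundle, not a VectorStoreQuery. A minimal sketch of the corrected call, keeping the self.db and embed_query names from the messages above:

Plain Text
from llama_index.core import QueryBundle

# wrap the precomputed embedding in a QueryBundle so the query engine accepts it
query_obj = QueryBundle(query_str="unused", embedding=embed_query)
result = self.db.as_query_engine(similarity_top_k=3).query(query_obj)
print(result.response)


Note that a query engine still sends query_str to the LLM for answer synthesis, so for a pure nearest-neighbour lookup the retriever approach shown earlier is usually the better fit.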
@kapa.ai, what is the direct approach for passing vectors to search for vectors in a vector index?
Let me check the knowledge base for you. One moment! :books:
----------------------
Based on the provided knowledge sources, the direct approach to pass vectors to search for vectors in a vector index using LlamaIndex is not explicitly mentioned. However, the general approach to perform a query using LlamaIndex involves creating a QueryEngine from your VectorStoreIndex and then using the query method of the QueryEngine.

Here's a basic example:

Plain Text
from llama_index.core import VectorStoreIndex

# Assuming you have already created your VectorStoreIndex
index = VectorStoreIndex.from_vector_store(
    vector_store,
    storage_context=storage_context,
    embed_model=self.embeddings
)

# Create a QueryEngine
query_engine = index.as_query_engine()

# Perform a query
response = query_engine.query("Your query here")


In the above code, replace "Your query here" with your actual query.

Please note that the query method expects a string query. If you have an embedding and want to perform a similarity search with it, you might need to use a different method or convert the embedding back to a string query.

For more detailed information, you can refer to the following sources:

  • Jerryjliu > Llama_index > Blob > Main > Docs > Docs > Examples > Ingestion > Redis Ingestion Pipeline > Confirm documents are ingested
  • LlamaIndex > En > Latest > Examples > Vector_stores > Load documents
  • Jerryjliu > Llama_index > Blob > Main > Docs > Docs > Module_guides > Indexing > Vector Store Index usage examples > Connect to external vector stores (with existing embeddings) > Query Part 2
If you want to perform a similarity search using vectors, you might need to check the specific documentation or API of the vector store you're using, as the method to do this can vary between different vector stores.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
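As the last answer hints, the vector store's own client also exposes a direct embedding search. A minimal sketch using chromadb's native API, assuming a persistent client at ./chroma_db and a collection named "my_collection" (both hypothetical), with query_vector as the precomputed embedding:

Plain Text
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")         # hypothetical path
collection = client.get_or_create_collection("my_collection")  # hypothetical name

# Chroma's native nearest-neighbour search: raw embeddings in, top 3 out
results = collection.query(query_embeddings=[query_vector], n_results=3)
print(results["ids"][0], results["distances"][0])


This bypasses LlamaIndex entirely and returns Chroma's raw ids, documents, and distances rather than LlamaIndex nodes.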
Let me check the knowledge base for you again. One moment! :books:
----------------------
Based on the error message you're encountering, it seems that the TrafilaturaWebReader is not able to extract text from some of the URLs you're trying to parse. This is causing the Document creation to fail because it's receiving None as the text value, which is not a valid string.

To address this issue, you can modify your approach to handle cases where the text extraction fails. Here's a modified version of your code that should help:

Plain Text
import uuid
from llama_index.readers.web import TrafilaturaWebReader
from llama_index.core import Document

def new_uuids(file_name: str):
    """Assign metadata based on the file name."""
    return {"id_": uuid.uuid4().hex}

reader = TrafilaturaWebReader()
docs = []

for url in to_parse[:50]:
    try:
        loaded_docs = reader.load_data([url], show_progress=True)
        if loaded_docs and loaded_docs[0].text:  # Check if text was successfully extracted
            loaded_docs[0].metadata.update(new_uuids(url))
            docs.append(loaded_docs[0])
        else:
            print(f"Failed to extract text from {url}")
    except Exception as e:
        print(f"Error processing {url}: {str(e)}")

print(f"Successfully loaded {len(docs)} documents")


This approach does the following:

  1. It processes each URL individually.
  2. It uses a try-except block to catch any exceptions that might occur during the loading process.
  3. It checks if the loaded document has text before adding it to the docs list.
  4. It adds the UUID to the metadata of successfully loaded documents.
  5. It prints error messages for URLs that fail to load or don't contain text.
This method should prevent the validation error you're seeing and give you more control over the document loading process.
Remember that web scraping can be unpredictable due to various factors like network issues, website structure changes, or content availability. Always ensure you have permission to scrape the websites you're targeting and consider implementing rate limiting to avoid overloading the servers.

For more information on using the TrafilaturaWebReader, you can refer to the LlamaIndex documentation on TrafilaturaWebReader.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------