Hi guys, I saw this new version 0.11

Hi guys, I saw the new version 0.11 update last night. Is it possible to do a similarity search by vector, just like the search-by-vector functions available in LangChain for the respective vector databases?
27 comments
this has always been possible?

This is exactly what happens when you do retriever.retrieve("query") or query_engine.query("query").
The above code calls the embedding model to generate a query vector and runs it against your vector db.
Or do you mean querying with an embedding vector that you created yourself?
Hi @Logan M, yes that is right. I have a list of vectors that needs to be passed as input to search for similar vectors with k=3. This is possible in LangChain with functions like similarity_search_by_vector() or similarity_search_by_vector_with_score(), where we pass the embeddings, the k value, and other kwargs.
You have been able to do that for a while:

Plain Text
from llama_index.core import QueryBundle

query = QueryBundle(query_str="unused", embedding=[0.1, ...])
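# the retriever/query engine uses the given embedding directly and skips embedding query_str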

nodes = retriever.retrieve(query)
response = query_engine.query(query)
Okay, I will try this out, but is there a way to get the closest values with k=3?
index.as_retriever(similarity_top_k=3).retrieve(..)
Just set the top k there
Alternatively you can use vector_store.query(), but you'll need to import and fill in this vector store query object
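Roughly, that lower-level route looks like the following. A minimal sketch (not from the thread), assuming vector_store is an already-populated LlamaIndex vector store and my_embedding is your precomputed query vector, both placeholder names:

Plain Text
from llama_index.core.vector_stores import VectorStoreQuery

# build the vector store query object: your own embedding plus the top-k
vs_query = VectorStoreQuery(query_embedding=my_embedding, similarity_top_k=3)

# query the vector store directly; returns a VectorStoreQueryResult
result = vector_store.query(vs_query)
for node, score in zip(result.nodes or [], result.similarities or []):
    print(score, node.get_content()[:100])


Depending on the vector store integration, result.nodes can be empty and only result.ids returned, in which case the nodes have to be looked up in the docstore.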
@kapa.ai - how to search by vector in ChromaDB? I am using the following code:

Plain Text
self.db = VectorStoreIndex.from_vector_store(
    vector_store,
    storage_context=storage_context,
    embed_model=self.embeddings
)

Now I want to do something like self.db.similarity_search(...) or self.db.query(embed_query, k=3). How can I do this?
Let me check the knowledge base for you. One moment! :books:
----------------------
To perform a similarity search or query in ChromaDB using LlamaIndex, you can use the as_query_engine method of the VectorStoreIndex class. This method returns a QueryEngine object that you can use to perform queries. Here's an example of how you can do this:

Plain Text
from llama_index.core import VectorStoreIndex

# Assuming you have already created your VectorStoreIndex
index = VectorStoreIndex.from_vector_store(
    vector_store,
    storage_context=storage_context,
    embed_model=self.embeddings
)

# Create a QueryEngine
query_engine = index.as_query_engine(similarity_top_k=3)

# Perform a query
response = query_engine.query("Your query here")


In the above code, replace "Your query here" with your actual query. The similarity_top_k=3 argument specifies that you want the top 3 results.

Please note that the query method expects a string query. If you have an embedding and want to perform a similarity search with it, you might need to use a different method or convert the embedding back to a string query.

For more detailed information, you can refer to the following sources:
  • Jerryjliu > Llama_index > Blob > Main > Docs > Docs > Examples > Ingestion > Redis Ingestion Pipeline > Confirm documents are ingested
  • LlamaIndex > En > Latest > Examples > Vector_stores > Load documents
  • Jerryjliu > Llama_index > Blob > Main > Docs > Docs > Module_guides > Indexing > Vector Store Index usage examples > Connect to external vector stores (with existing embeddings) > Query Part 2
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
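For reference, Logan's earlier QueryBundle approach also works against the Chroma-backed index built above. A minimal sketch, assuming index is the VectorStoreIndex from the snippet above and query_vector is your precomputed embedding (placeholder name):

Plain Text
from llama_index.core import QueryBundle

# top 3 nearest neighbours for an embedding computed outside LlamaIndex
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve(QueryBundle(query_str="unused", embedding=query_vector))
for n in nodes:
    print(n.score, n.node.get_content()[:100])
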
@kapa.ai I am not searching using a string as the query; my application already converts the input query into embeddings (vectors). Using this vector, I need to search the vector store.
Let me check the knowledge base for you. One moment! :books:
----------------------
If you already have the embeddings or vectors for your query, you can use the VectorStoreQuery class to perform a similarity search in your vector store. Here's an example:

Plain Text
from llama_index.core.vector_stores import VectorStoreQuery

# Assuming you have your query_embedding
query_embedding = ...  # Your query embedding here

# Create a VectorStoreQuery
vector_store_query = VectorStoreQuery(query_embedding=query_embedding, similarity_top_k=3)

# Perform the query
response = index.as_query_engine().query(vector_store_query)


In the above code, replace ... with your actual query embedding. The similarity_top_k=3 argument specifies that you want the top 3 results.

Please note that the query method of the QueryEngine class can accept either a string or a VectorStoreQuery object. If you provide a VectorStoreQuery object, it will perform a similarity search using the provided embedding.

For more detailed information, you can refer to the following source:

  • LlamaIndex > En > Latest > Examples > Vector_stores > Query the vector database
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai, I get this error:

result = self.db.as_query_engine().query(query_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MuthuKumaranKuppuswa\AppData\Roaming\Python\Python312\site-packages\llama_index\core\instrumentation\dispatcher.py", line 265, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MuthuKumaranKuppuswa\AppData\Roaming\Python\Python312\site-packages\llama_index\core\base\base_query_engine.py", line 48, in query
dispatcher.event(QueryStartEvent(query=str_or_query_bundle))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\MuthuKumaranKuppuswa\AppData\Roaming\Python\Python312\site-packages\pydantic\main.py", line 211, in init
validated_self = self.pydantic_validator.validate_python(data, self_instance=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for QueryStartEvent
query.str
Input should be a valid string [type=string_type, input_value=VectorStoreQuery(query_em...None, hybrid_top_k=None), input_type=VectorStoreQuery]
For further information visit https://errors.pydantic.dev/2.9/v/string_type
query.QueryBundle
Input should be a dictionary or an instance of QueryBundle [type=dataclass_type, input_value=VectorStoreQuery(query_em...None, hybrid_top_k=None), input_type=VectorStoreQuery]
For further information visit https://errors.pydantic.dev/2.9/v/dataclass_type
Let me check the knowledge base for you. One moment! :books:
----------------------
It seems like you're encountering a validation error when trying to use the query method of the QueryEngine class with a VectorStoreQuery object. The error message suggests that the query method is expecting a string or a QueryBundle object, not a VectorStoreQuery object.

Unfortunately, based on the provided knowledge sources, there's no direct example of using a VectorStoreQuery object with the query method of the QueryEngine class. The examples provided in the knowledge sources use a string as the query.

It's possible that the VectorStoreQuery class is not compatible with the query method of the QueryEngine class, or that there's a specific way to use it that's not covered in the provided knowledge sources.

I would recommend checking the documentation of the VectorStoreQuery and QueryEngine classes for more information on how to use them. If the documentation doesn't help, you might want to reach out to the developers or the community for assistance.

Here are the relevant knowledge sources for your reference:
  • LlamaIndex > En > Latest > Examples > Vector_stores > Query the vector database
  • LlamaIndex > En > Latest > Examples > Query_engine > SQLJoinQueryEngine
  • LlamaIndex > En > Latest > Contributing to LlamaIndex > Contribution Guideline > 1. 🆕 Extend Core Modules > Query Engines
  • LlamaIndex > En > Latest > Api_reference > Tools > Query plan > QueryEngineTool Part 2
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
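Concretely, the object the failing call expects is a QueryBundle, not a VectorStoreQuery. A minimal sketch of the corrected call, keeping the self.db and embed_query names from the messages above:

Plain Text
from llama_index.core import QueryBundle

# wrap the precomputed embedding in a QueryBundle so the query engine accepts it
query_obj = QueryBundle(query_str="unused", embedding=embed_query)
result = self.db.as_query_engine(similarity_top_k=3).query(query_obj)
print(result.response)


Note that a query engine still sends query_str to the LLM for answer synthesis, so for a pure nearest-neighbour lookup the retriever approach shown earlier is usually the better fit.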
@kapa.ai, what is the direct approach for passing vectors to search for vectors in a vector index?
Let me check the knowledge base for you. One moment! :books:
----------------------
Based on the provided knowledge sources, the direct approach to pass vectors to search for vectors in a vector index using LlamaIndex is not explicitly mentioned. However, the general approach to perform a query using LlamaIndex involves creating a QueryEngine from your VectorStoreIndex and then using the query method of the QueryEngine.

Here's a basic example:

Plain Text
from llama_index.core import VectorStoreIndex

# Assuming you have already created your VectorStoreIndex
index = VectorStoreIndex.from_vector_store(
    vector_store,
    storage_context=storage_context,
    embed_model=self.embeddings
)

# Create a QueryEngine
query_engine = index.as_query_engine()

# Perform a query
response = query_engine.query("Your query here")


In the above code, replace "Your query here" with your actual query.

Please note that the query method expects a string query. If you have an embedding and want to perform a similarity search with it, you might need to use a different method or convert the embedding back to a string query.

For more detailed information, you can refer to the following sources:

  • Jerryjliu > Llama_index > Blob > Main > Docs > Docs > Examples > Ingestion > Redis Ingestion Pipeline > Confirm documents are ingested
  • LlamaIndex > En > Latest > Examples > Vector_stores > Load documents
  • Jerryjliu > Llama_index > Blob > Main > Docs > Docs > Module_guides > Indexing > Vector Store Index usage examples > Connect to external vector stores (with existing embeddings) > Query Part 2
If you want to perform a similarity search using vectors, you might need to check the specific documentation or API of the vector store you're using, as the method to do this can vary between different vector stores.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
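As the last answer hints, the vector store's own client also exposes a direct embedding search. A minimal sketch using chromadb's native API, assuming a persistent client at ./chroma_db and a collection named "my_collection" (both hypothetical), with query_vector as the precomputed embedding:

Plain Text
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")         # hypothetical path
collection = client.get_or_create_collection("my_collection")  # hypothetical name

# Chroma's native nearest-neighbour search: raw embeddings in, top 3 out
results = collection.query(query_embeddings=[query_vector], n_results=3)
print(results["ids"][0], results["distances"][0])


This bypasses LlamaIndex entirely and returns Chroma's raw ids, documents, and distances rather than LlamaIndex nodes.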
Let me check the knowledge base for you again. One moment! :books:
----------------------
Based on the error message you're encountering, it seems that the TrafilaturaWebReader is not able to extract text from some of the URLs you're trying to parse. This is causing the Document creation to fail because it's receiving None as the text value, which is not a valid string.

To address this issue, you can modify your approach to handle cases where the text extraction fails. Here's a modified version of your code that should help:

Plain Text
import uuid
from llama_index.readers.web import TrafilaturaWebReader
from llama_index.core import Document

def new_uuids(file_name: str):
    """Assign metadata based on the file name."""
    return {"id_": uuid.uuid4().hex}

reader = TrafilaturaWebReader()
docs = []

for url in to_parse[:50]:
    try:
        loaded_docs = reader.load_data([url], show_progress=True)
        if loaded_docs and loaded_docs[0].text:  # Check if text was successfully extracted
            loaded_docs[0].metadata.update(new_uuids(url))
            docs.append(loaded_docs[0])
        else:
            print(f"Failed to extract text from {url}")
    except Exception as e:
        print(f"Error processing {url}: {str(e)}")

print(f"Successfully loaded {len(docs)} documents")


This approach does the following:

  1. It processes each URL individually.
  2. It uses a try-except block to catch any exceptions that might occur during the loading process.
  3. It checks if the loaded document has text before adding it to the docs list.
  4. It adds the UUID to the metadata of successfully loaded documents.
  5. It prints error messages for URLs that fail to load or don't contain text.
This method should prevent the validation error you're seeing and give you more control over the document loading process.
Remember that web scraping can be unpredictable due to various factors like network issues, website structure changes, or content availability. Always ensure you have permission to scrape the websites you're targeting and consider implementing rate limiting to avoid overloading the servers.

For more information on using the TrafilaturaWebReader, you can refer to the LlamaIndex documentation on TrafilaturaWebReader.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------