Guys how do i query the vector embedding

Guys, how do I query the vector embedding?

Plain Text
llama_docs = []
for doc in self.documents:
    llama_docs.append(Document.from_langchain_format(doc))

self.db = VectorStoreIndex.from_documents(
    llama_docs,
    storage_context=storage_context,
    embed_model=self.embeddings,
)

This is my index, self.db. I want to search the db using a vector, something like self.db.query(query_embedding), where query_embedding is a query that my application has already converted into an embedding.
14 comments
Hi, let me know if I'm getting your question right or not.

You want to retrieve the related nodes based on your query, right?

And you do not want to generate a response, just retrieve nodes?
Yes, that's right. I don't want the LLM to generate any response; I want to retrieve the elements closest to the query (which is already in vector/embedding form) from the index.
okay great! Then you can do it like this:

Plain Text
llama_docs = []
for doc in self.documents:
    llama_docs.append(Document.from_langchain_format(doc))

self.index = VectorStoreIndex.from_documents(
    llama_docs,
    storage_context=storage_context,
    embed_model=self.embeddings,
)

retriever = self.index.as_retriever()
nodes = retriever.retrieve("Who is Paul Graham?")


https://docs.llamaindex.ai/en/stable/module_guides/querying/retriever/#retriever
@WhiteFang_Jr Thanks for your response, but my input is not a question (not a string). A query like "what is paul graham" is already converted into vectors/embeddings, and I now need to search the index using those embeddings. In langchain, for Chroma and LanceDB, there is a method called similarity_search_by_vector; do we have such built-in methods in LlamaIndex? The langchain definition of the function is below:

Plain Text
def similarity_search_by_vector(
    self,
    embedding: List[float],
    k: int = 4,
    param: Optional[dict] = None,
    expr: Optional[str] = None,
    timeout: Optional[float] = None,
    **kwargs: Any,
) -> List[Document]:
    """Perform a similarity search against the query string.

    Args:
        embedding (List[float]): The embedding vector to search.
        k (int, optional): How many results to return. Defaults to 4.
You are using chromadb, so you can check out this method: https://github.com/run-llama/llama_index/blob/8dbb6e91e5984a556756caafbd1d03146e029a51/llama-index-integrations/vector_stores/llama-index-vector-stores-chroma/llama_index/vector_stores/chroma/base.py#L349

You can create an object of the VectorStoreQuery type, which contains the query embedding and a top-k value,
and then call the Chroma vector store's query method directly; it will return nodes for you.
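Something along these lines (a sketch; query_embedding stands for the list of floats your application already produced, and chroma_vector_store for the ChromaVectorStore behind your index):

Plain Text
from llama_index.core.vector_stores import VectorStoreQuery

# build the query object from your pre-computed embedding
vs_query = VectorStoreQuery(
    query_embedding=query_embedding,  # list[float] produced by your app
    similarity_top_k=3,               # number of nearest nodes to return
)

# query the Chroma vector store directly; returns a VectorStoreQueryResult
result = chroma_vector_store.query(vs_query)
for node, score in zip(result.nodes, result.similarities):
    print(score, node.get_content()[:80])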
@WhiteFang_Jr thanks again. Yes, I tried this too, but in my case when I call self.db.vector_store.query, the call goes to the following method, not the one you described. Also, where are the embeddings stored in chromadb, so that I can search them with the embedding of my question? This is what I tried:

Plain Text
query_obj = VectorStoreQuery(
    query_embedding=query,  # The embedding vector (list of floats)
    similarity_top_k=k,     # Number of top results
)

query_result = self.db.vector_store.query(query_embedding)

But this calls the following built-in query method of LlamaIndex:

Plain Text
def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    """Query vector store."""

async def aquery(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    """
    Asynchronously query vector store.
    NOTE: this is not implemented for all vector stores. If not implemented,
    it will just call query synchronously.
    """
    return self.query(query, **kwargs)
Hey, apologies. I actually told you the direct way to interact with the vector store and retrieve nodes.
But I found something that is in the same direction you were looking for.

Once you create your retriever, you can create a QueryBundle object, which also has an embedding field for your query.
Check the screenshot.

Plain Text
from llama_index.core.schema import QueryBundle

# create retriever
retriever = index.as_retriever()

# create a QueryBundle for your query and add embeddings
query = QueryBundle(query_str="This is query", embedding=[ADD Embeddings here])

# pass this object to your retriever to get nodes
nodes = retriever.retrieve(query)
(Attachment: image.png)
@WhiteFang_Jr, great stuff - this works like a charm, but it retrieves too many vectors. How do I get only the top 3? Any idea?
Glad it worked for you!
The number of nodes depends directly on the top_k value that you set when instantiating the retriever.
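For example (a sketch, reusing the index from earlier):

Plain Text
# cap retrieval at the 3 most similar nodes
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve(query)  # query can be a plain string or a QueryBundle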
@WhiteFang_Jr, how do I achieve the following langchain code in LlamaIndex?

Plain Text
final_prompt = Chatprompt.from_template(context_prompt)

chain = prompt | MODEL.llm | StrOutputParser()

response = chain.run(
    {"input": question, "history": "\n".join(memory)},
    config={"callbacks": [MODEL.callback]},
)
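If it helps, a rough LlamaIndex equivalent (without the callback wiring) might look like the sketch below; it assumes MODEL.llm is a LlamaIndex LLM and that context_prompt contains {input} and {history} placeholders:

Plain Text
from llama_index.core import PromptTemplate

# build the prompt template, fill in the question and joined history, then call the LLM directly
prompt = PromptTemplate(context_prompt)
formatted_prompt = prompt.format(input=question, history="\n".join(memory))
response = MODEL.llm.complete(formatted_prompt)
print(response.text)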
@kapa.ai how to create a chain of thought and invoke it in LlamaIndex
Let me check the knowledge base for you. One moment! :books:
----------------------
Creating a chain of thought in LlamaIndex involves several steps. Here's a general outline based on the information from the provided sources:

  1. Set up LlamaIndex query engine tools: this could be over a specific dataset or document. For example, in one of the guides, the query engine tools are set up over Uber 10Q filings.
  2. Generate a training/evaluation question dataset: this involves using the dataset generator to create a set of questions over your chosen dataset or document. In the guide, complex variations are added to each question to account for multiple quarters, which helps induce chain-of-thought prompting.
  3. Feed these questions through a GPT-4 ReAct Agent: the questions are fed through a GPT-4 ReAct agent, and the inputs and outputs are logged as a dataset to fine-tune over (see the sketch below).
  4. Fine-tune the model: this involves calling the OpenAI fine-tuning endpoints to fine-tune gpt-3.5-turbo on the dataset created in the previous step.
  5. Run a qualitative evaluation: finally, you can run a qualitative evaluation to show that the fine-tuned model performs better at chain-of-thought prompting than the base model.
Please note that each execution of an agent can involve multiple LLM calls through the ReAct chain-of-thought loop. The prompt input/output pair for each LLM call is logged as an individual datapoint in the training dataset, in the chat message format.
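A very rough sketch of steps 1 and 3 (assuming the ReActAgent/QueryEngineTool API used in that guide; march_engine and llm are placeholders for your own query engine and LLM):

Plain Text
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# wrap a query engine over one filing as a tool (name and description are illustrative)
march_tool = QueryEngineTool.from_defaults(
    query_engine=march_engine,
    name="uber_march_10q",
    description="Answers questions about Uber's March 2022 10-Q filing",
)

# the ReAct agent reasons step by step (chain of thought) over its tools
agent = ReActAgent.from_tools([march_tool], llm=llm, verbose=True)
response = agent.chat("How did Uber's revenue change across the 2022 quarters?")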

For more detailed instructions and examples, you can refer to the LlamaIndex documentation and guides.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------