I have a different fine-tuned embedding model than my fine-tuned LLM. How can I connect these two in SubQuestionQueryEngine so that my index retrieves relevant documents rather than embeddings and sends the relevant documents to the LLM?
Please stand by @balanp as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To connect your fine-tuned embedding model with your fine-tuned LLM in a SubQuestionQueryEngine, you need to create a ServiceContext that includes both your LLM and your embedding model. This ServiceContext is then used to create a VectorStoreIndex from your documents. The VectorStoreIndex can then be used to set up your SubQuestionQueryEngine.

Here is a step-by-step guide based on the information provided in the extracts:

  1. First, you need to set the global tokenizer to match your LLM. This can be done using the set_global_tokenizer function from llama_index. The tokenizer should be the one used by your LLM.
Plain Text
from llama_index import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("Your_LLM_Model_Name").encode
)


  2. Next, set up your embedding model. This can be done using the HuggingFaceEmbedding class from llama_index.embeddings. Replace "Your_Embedding_Model_Name" with the name or path of your fine-tuned embedding model.
Plain Text
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="Your_Embedding_Model_Name")


  3. Now, you can create a ServiceContext using the from_defaults method. This context should include your LLM and your embedding model.
Plain Text
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
    llm=Your_LLM,
    embed_model=embed_model,
)


  4. Load your documents using a SimpleDirectoryReader or any other method you prefer.
Plain Text
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader("Your_Documents_Directory").load_data()


  5. Create a VectorStoreIndex from your documents using the from_documents method. This index uses the ServiceContext you created earlier.
Plain Text
from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)


  6. Finally, you can create a query engine from your VectorStoreIndex using the as_query_engine method.
Plain Text
query_engine = index.as_query_engine()


Now, your query engine is set up to retrieve relevant documents using your fine-tuned embedding model and send those documents to your fine-tuned LLM.
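
As a quick end-to-end check, you can run a simple query against this engine; the question below is only an illustrative placeholder.
Plain Text
# Hypothetical question; replace it with one relevant to your documents.
response = query_engine.query("What are the key findings in my documents?")
print(response)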

Please note that you need to replace "Your_LLM_Model_Name", "Your_Embedding_Model_Name", Your_LLM, and "Your_Documents_Directory" with the actual values you are using.
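
For example, if your fine-tuned LLM is a local Hugging Face checkpoint, one possible way to construct Your_LLM is the sketch below; it assumes the HuggingFaceLLM wrapper from llama_index.llms and a hypothetical local path.
Plain Text
from llama_index.llms import HuggingFaceLLM

# Hypothetical path to your fine-tuned LLM checkpoint.
llm = HuggingFaceLLM(
    model_name="path/to/your_finetuned_llm",
    tokenizer_name="path/to/your_finetuned_llm",
    max_new_tokens=256,
)

You would then pass llm=llm to ServiceContext.from_defaults above.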

For more details, you can refer to the LlamaIndex documentation and the LlamaIndex GitHub repository.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai You have not shown the SubQuestionQueryEngine setup in the above code. What I mean to ask is: how will the index send relevant documents rather than embeddings to the LLM inside the SubQuestionQueryEngine?
Let me check the knowledge base for you. One moment! :books:
----------------------
Apologies for the confusion. To set up a SubQuestionQueryEngine that sends relevant documents rather than embeddings to the LLM, you need to create a QueryEngineTool for each vector index. Each tool has a name and a description, which the LLM agent uses to decide which tool to choose.

Here is how you can set up a SubQuestionQueryEngine:

  1. Define a QueryEngineTool for each vector index:
Plain Text
from llama_index.tools import QueryEngineTool, ToolMetadata

individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="vector_index",
            description="useful for when you want to answer queries about your documents",
        ),
    )
]


  2. Create the SubQuestionQueryEngine, which will allow you to synthesize answers across your documents. Pass in the individual_query_engine_tools you defined above, as well as a service_context that will be used to run the subqueries.
Plain Text
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=individual_query_engine_tools,
    service_context=service_context,
)


Now, when you query the SubQuestionQueryEngine, it will decompose the query into subqueries, each answered by an individual vector index (which retrieves relevant documents), and synthesize the results to answer the overall query.

Please note that you need to replace "vector_index" and "useful for when you want to answer queries about your documents" with the actual values you are using.
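
As a quick sanity check, here is a minimal usage sketch (the question text is illustrative). The source nodes attached to the response are plain text rather than embedding vectors, which reflects what is sent to the LLM during synthesis.
Plain Text
# Illustrative question; replace with one relevant to your documents.
response = query_engine.query(
    "Compare the main topics covered across my documents."
)
print(response)

# The response carries plain-text source nodes, not embedding vectors.
for source_node in response.source_nodes:
    print(source_node.node.get_content()[:200])
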
For more details, you can refer to the LlamaIndex documentation and the LlamaIndex GitHub repository.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai Just to be sure: if I declare my embed model and my LLM separately in my ServiceContext, then the rest of the SubQuestionQueryEngine setup will be traditional, i.e. it will handle the relevant top-k nodes retrieved by my embedding model (a different model) and send them to the LLM (also a different model) by itself?
Let me check the knowledge base for you. One moment! :books:
----------------------
Yes, you're correct. If you declare your embedding model and LLM separately in your ServiceContext, the SubQuestionQueryEngine will handle the process of retrieving the top-k nodes with your embedding model and sending their text to the LLM for further processing.

The SubQuestionQueryEngine is designed to handle this process automatically. It decomposes the query into subqueries, each answered by an individual vector index (which retrieves relevant documents), and synthesizes the results to answer the overall query.

Here's a simplified version of how you might set this up:

Plain Text
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings import OpenAIEmbedding
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

# Set up your embedding model
embed_model = OpenAIEmbedding()

# Set up your LLM
llm = Your_LLM

# Create a ServiceContext with your LLM and embedding model
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

# Load your documents
documents = SimpleDirectoryReader("./data").load_data()

# Create a VectorStoreIndex from your documents, passing the ServiceContext
# so that your embedding model is used for indexing and retrieval
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Define a QueryEngineTool for each vector index
individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="vector_index",
            description="useful for when you want to answer queries about your documents",
        ),
    )
]

# Create the SubQuestionQueryEngine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=individual_query_engine_tools,
    service_context=service_context,
)

In this setup, replace Your_LLM with your actual LLM and "./data" with the path to your documents. The SubQuestionQueryEngine will now handle retrieving the top-k nodes with your embedding model and sending them to your LLM for further processing.
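
If you want to control how many document chunks are retrieved for each sub-question, you can pass similarity_top_k when building the tool's query engine. A small sketch under the same setup (the value 5 is just an example):
Plain Text
# Retrieve the top 5 nodes per sub-question instead of the default.
top_k_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=5),
    metadata=ToolMetadata(
        name="vector_index",
        description="useful for when you want to answer queries about your documents",
    ),
)

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[top_k_tool],
    service_context=service_context,
)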

For more details, you can refer to the LlamaIndex documentation and the LlamaIndex GitHub repository.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------