To set up a `SubQuestionQueryEngine` with a fine-tuned LLM and a fine-tuned embedding model, you need to create a `ServiceContext` that includes both your LLM and your embedding model. This `ServiceContext` is then used to create a `VectorStoreIndex` from your documents, and the `VectorStoreIndex` can then be used to set up your `SubQuestionQueryEngine`.
First, set the global tokenizer using the `set_global_tokenizer` function from `llama_index`. The tokenizer should be the one used by your LLM:

```python
from llama_index import set_global_tokenizer
from transformers import AutoTokenizer

# Use the tokenizer that matches your fine-tuned LLM
set_global_tokenizer(
    AutoTokenizer.from_pretrained("Your_LLM_Model_Name").encode
)
```
Next, load your fine-tuned embedding model using the `HuggingFaceEmbedding` class from `llama_index.embeddings`, replacing the default `"BAAI/bge-small-en-v1.5"` with the name of your fine-tuned embedding model:

```python
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="Your_Embedding_Model_Name")
```
Then create a `ServiceContext` using the `from_defaults` method. This context should include your LLM and your embedding model:

```python
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
    llm=Your_LLM,
    embed_model=embed_model,
)
```
Load your documents using `SimpleDirectoryReader` or any other method you prefer:

```python
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader("Your_Documents_Directory").load_data()
```
Next, create a `VectorStoreIndex` from your documents using the `from_documents` method. This index uses the `ServiceContext` you created earlier:

```python
from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
```
Finally, create a query engine from your `VectorStoreIndex` using the `as_query_engine` method:

```python
query_engine = index.as_query_engine()
```
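As a minimal usage sketch (the question string is just a placeholder), you can then query the engine directly:

```python
# Hypothetical question; replace with a query about your own documents
response = query_engine.query("What are the key points in these documents?")
print(response)
```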
This query engine is now set up to retrieve relevant documents using your fine-tuned embedding model and send these documents to your fine-tuned LLM. Remember to replace `"Your_LLM_Model_Name"`, `"Your_Embedding_Model_Name"`, `Your_LLM`, and `"Your_Documents_Directory"` with the actual values you are using.

To set up a `SubQuestionQueryEngine` that sends relevant documents rather than embeddings to the LLM, you need to create a `QueryEngineTool` for each vector index. Each tool has a name and a description, which the LLM agent uses to decide which tool to choose. Here is how to set up the `SubQuestionQueryEngine`.

First, define a `QueryEngineTool` for each vector index:

```python
from llama_index.tools import QueryEngineTool, ToolMetadata

individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="vector_index",
            description="useful for when you want to answer queries about your documents",
        ),
    )
]
```
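If you have more than one vector index (for example, one per document collection), you would create one tool per index so the agent can route sub-questions to the right place. The following sketch assumes two hypothetical indices, `index_2022` and `index_2023`, each built the same way as `index` above:

```python
# Hypothetical: one tool per index, each with a distinct name and description
individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index_2022.as_query_engine(),
        metadata=ToolMetadata(
            name="vector_index_2022",
            description="useful for answering queries about the 2022 documents",
        ),
    ),
    QueryEngineTool(
        query_engine=index_2023.as_query_engine(),
        metadata=ToolMetadata(
            name="vector_index_2023",
            description="useful for answering queries about the 2023 documents",
        ),
    ),
]
```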
Then create the `SubQuestionQueryEngine`, which will allow you to synthesize answers across your documents. Pass in the `individual_query_engine_tools` you defined above, as well as a `service_context` that will be used to run the subqueries:

```python
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=individual_query_engine_tools,
    service_context=service_context,
)
```

Now, when you query the `SubQuestionQueryEngine`, it will decompose the query into subqueries, each answered by an individual vector index (which retrieves relevant documents), and synthesize the results to answer the overall query. Remember to replace `"vector_index"` and `"useful for when you want to answer queries about your documents"` with the actual values you are using.
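As a usage sketch (the query text is a placeholder), a broad question is broken into sub-questions against the tools defined above before the final answer is synthesized:

```python
# Hypothetical query that benefits from decomposition into sub-questions
response = query_engine.query(
    "Compare the main topics discussed across the different documents"
)
print(response)
```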
Once you have set up your `ServiceContext`, the `SubQuestionQueryEngine` will handle retrieving the top k nodes from your embeddings and sending them to the LLM for further processing. It does this automatically: it decomposes the query into subqueries, each answered by an individual vector index (which retrieves relevant documents), and synthesizes the results to answer the overall query. Putting it all together:

```python
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings import OpenAIEmbedding
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

# Set up your embedding model
embed_model = OpenAIEmbedding()

# Set up your LLM
llm = Your_LLM

# Create a ServiceContext with your LLM and embedding model
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

# Load your documents
documents = SimpleDirectoryReader("./data").load_data()

# Create a VectorStoreIndex from your documents, passing the ServiceContext
# so the index is built with your embedding model
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Define a QueryEngineTool for each vector index
individual_query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="vector_index",
            description="useful for when you want to answer queries about your documents",
        ),
    )
]

# Create the SubQuestionQueryEngine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=individual_query_engine_tools,
    service_context=service_context,
)
```
Remember to replace `Your_LLM` with your actual LLM and `"./data"` with the path to your documents. The `SubQuestionQueryEngine` will then handle retrieving the top k nodes from your embeddings and sending them to the LLM for further processing.
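If you want to control how many nodes are retrieved for each sub-question, the number of top k nodes can be set when building each tool's query engine; a sketch, assuming the default retriever:

```python
# Retrieve the top 3 most similar nodes per sub-question (the default is typically 2)
query_engine_top3 = index.as_query_engine(similarity_top_k=3)
```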