Can someone explain the difference between text_qa_template and system_prompt?

Can someone explain the difference between setting the text_qa_template in a response synthesizer and the system_prompt in the LLM used by the response synthesizer?

The behavior I'm looking to achieve is to tell my LLM what it is and then set an example prompt and example answer. I am using the CitationQueryEngine because I also want to know what the citations for the query are.
Which LLM are you using? Where do you see system prompt?

The citation query engine uses a custom text_qa_template. When you call CitationQueryEngine.from_defaults(), it sets the text QA template to this: https://github.com/jerryjliu/llama_index/blob/ae3e0bb5ca7811e579da39bbfac8c217dc818cfc/llama_index/query_engine/citation_query_engine.py#L22

You could override that to have different instructions, or add a system prompt there using a chat template
Plain Text
from llama_index.llms.base import ChatMessage, MessageRole
from llama_index.prompts.base import ChatPromptTemplate

text_qa_messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="Some system prompt"),
    ChatMessage(
        content="some template string. A good default is the one I linked above in the code base",
        role=MessageRole.USER,
    ),
]

text_qa_template = ChatPromptTemplate(message_templates=text_qa_messages)
If you look at the documentation here, you can see how to add a system prompt to a HuggingFace LLM, for example: https://gpt-index.readthedocs.io/en/stable/core_modules/model_modules/llms/usage_custom.html#example-using-a-huggingface-llm
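Roughly, that docs pattern looks like the sketch below (import paths are for the 0.8.x-era llama_index used in this thread; the model name, prompt strings, and kwargs are just placeholders):
Plain Text
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt

# system_prompt gets prepended to every prompt the model sees;
# query_wrapper_prompt wraps the query in the model's expected chat format.
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    system_prompt="You are a helpful assistant that answers using the given context.",
    query_wrapper_prompt=SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>"),
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",  # placeholder model
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
)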
I can either do this or use text_qa_messages, but I don't know what the difference is.
For more context, this is sort of what my code looks like:
Plain Text
self.model = OpenAI(
    model=self.modelName,
    temperature=self.temperature,
    max_tokens=self.contextBuffer,
    stream=True,
)  # WHY NOT SET SYSTEM PROMPT HERE?????

self.serviceContext = ServiceContext.from_defaults(
    llm=self.model,
    embed_model=self.embedModel,
)

self.index = VectorStoreIndex.from_vector_store(
    service_context=self.serviceContext,
    vector_store=WeaviateVectorStore(
        weaviate_client=self.client, index_name="index_name"
    ),
)

self.queryEngine = CitationQueryEngine.from_args(
    self.index,
    streaming=self.streaming,
    citation_qa_template=ChatPromptTemplate(
        message_templates=self.chatHistory,
    ),  # VS USE CITATION_QA_TEMPLATE HERE?????
    service_context=self.serviceContext,
    similarity_top_k=self.topK,
    citation_chunk_size=self.citationSize,
)
Yea, that's specific to the HuggingFace LLM 😅 It's a little unreliable to do it this way (it can cause token issues), but it's needed for open-source LLMs


For OpenAI, use the approach above tbh
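i.e. wiring the chat template from above into your engine would look roughly like this (a sketch reusing the self.* attributes from your snippet and the text_qa_template variable from mine):
Plain Text
# Sketch: pass the ChatPromptTemplate (which carries the system message)
# as the citation QA template, so every synthesis call includes it.
self.queryEngine = CitationQueryEngine.from_args(
    self.index,
    streaming=self.streaming,
    citation_qa_template=text_qa_template,  # template built earlier
    service_context=self.serviceContext,
    similarity_top_k=self.topK,
    citation_chunk_size=self.citationSize,
)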
OR, slightly sneakier, I just remembered this got added:

Plain Text
from llama_index import LLMPredictor, ServiceContext

llm_predictor = LLMPredictor(llm=self.model, system_prompt="Talk like a pirate")

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, embed_model=self.embedModel,
)
What's "better" though? Is there a difference?
Thanks btw!
No difference, the first method of defining the template just allows for more flexibility (you can customize every part of the overall prompt)
If you are just worried about the system prompt, they should be the same 👀
Last question - does the CitationQueryEngine do anything special? Do the other engines also return the source nodes of the query?
The only thing special the citation query engine does is break nodes into smaller, citable chunks and ask the LLM to write in-text citations

All other engines also return source nodes though 👍
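So grabbing them looks the same as with any other engine, e.g. (sketch only; the query string is a placeholder and this assumes streaming=False):
Plain Text
response = self.queryEngine.query("What are the key findings?")  # placeholder query

print(response.response)  # answer text with [1], [2] style in-text citations

# the smaller citation chunks the answer was synthesized from
for source in response.source_nodes:
    print(source.node.get_text()[:200])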