Looks like the Inference API doesn't actually use those function hooks
If you have a conversational model though, it should automatically handle formatting
Hm, it seems to me that any slight change in the prompt changes the response drastically. My issue now is that the model sometimes returns the metadata with the response (file name, context), other queries and their answers, or even the question itself inside the answer. It is very weird and only happens with Llama3/Mistral via HuggingFaceInferenceAPI on some queries, while GPT-4 via Azure OpenAI is fine
Also, small changes in the prompt can make the chatbot go from a correct answer to a wrong one for the same question
This is the current prompt:
Always answer the query {query_str} using the provided context information {context_str}, and not prior knowledge.
If you do not know the answer, you should say so.
Some rules you must follow:
1. Do not include context information or metadata.
2. Keep your answers concise but comprehensive.
3. Only provide the answer to the query asked and do not provide additional information.
Answer:
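For context, this is roughly how I'm attaching it to the query engine (a minimal sketch, not my exact code; assumes llama-index >= 0.10, an already-built `index`, and the default text_qa_template prompt key):
```python
from llama_index.core import PromptTemplate

# Custom QA prompt, same text as above.
qa_prompt = PromptTemplate(
    "Always answer the query {query_str} using the provided context "
    "information {context_str}, and not prior knowledge.\n"
    "If you do not know the answer, you should say so.\n"
    "Some rules you must follow:\n"
    "1. Do not include context information or metadata.\n"
    "2. Keep your answers concise but comprehensive.\n"
    "3. Only provide the answer to the query asked and do not provide additional information.\n"
    "Answer:"
)

query_engine = index.as_query_engine()
# "response_synthesizer:text_qa_template" is the prompt key used by the
# default response synthesizer.
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt}
)
```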
GPT-4 is leagues ahead of llama3
So that difference would make sense
I don't think llama3/mistral is conversational by default, so it might not be auto-formatting
Actually I lied: looking at the code, you might be able to provide messages_to_prompt, assuming you have a function that transforms the messages into the appropriate format (llama3's format is pretty complex)
I would also set is_chat_model=True
so that this chat() function is used consistently
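Something like this, roughly (just a sketch; I'm assuming the llama-index-llms-huggingface integration here, the import path can differ between versions, and the header tokens are llama3's standard chat format):
```python
# Import path may differ depending on your llama-index version.
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

def messages_to_prompt(messages):
    # Convert llama-index ChatMessage objects into llama3's chat format.
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message.role.value}<|end_header_id|>\n\n"
            f"{message.content}<|eot_id|>"
        )
    # Leave the assistant header open so the model writes the answer.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

llm = HuggingFaceInferenceAPI(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    token="hf_...",  # your HF token (placeholder)
    messages_to_prompt=messages_to_prompt,
    is_chat_model=True,  # so llm.chat() is used consistently
)
```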
Well yeah, but I just want it to freaking stick to the prompt and not give out context information or even file info. Like, it sometimes tells you the reference file size lol
I'm actually using a query engine and not a chat engine. While I am indeed building a chatbot, it seems that the query engine performs better.
I'm using Llama3-8B-instruct, is that the case for that too?
what about the query function?
.chat() is used for query engines as well. Just depends on is_chat_model=True/False
Otherwise you can format the prompt template in the actual format expected for llama3
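Something along these lines (again just a sketch, reusing the same update_prompts hook; the special tokens are llama3's standard chat template, and the wording of the template itself is up to you):
```python
from llama_index.core import PromptTemplate

# Bake llama3's chat tokens straight into the text QA template so that even a
# plain completion call sends a well-formed llama3 prompt.
llama3_qa_prompt = PromptTemplate(
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "Answer the query using only the provided context, not prior knowledge. "
    "Do not include context information or metadata in the answer."
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "Context:\n{context_str}\n\nQuery: {query_str}"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": llama3_qa_prompt}
)
```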
imo the Inference API is so confusing to use. I would just use Ollama
But thats just me
I initially used Ollama, but it takes a long time to run without a GPU. Say I use Ollama, do I need to format the template or not?
Interesting, I think your documentation can def be improved and I would love to contribute once I finish this project
Wait, just to verify: are .query() and .chat() practically references to the same function, the only difference being is_chat_model? I am asking because when I use my query engine with .chat(), I get an error
Ollama handles all the formatting for you. But yea, without any kind of GPU, it will be a tad slow
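The Ollama route is pretty small, for comparison (sketch; assumes the llama-index-llms-ollama package and that you've pulled the model locally with `ollama pull llama3`):
```python
from llama_index.llms.ollama import Ollama

# Ollama applies the model's own chat template server-side, so there is no
# messages_to_prompt or manual template formatting to worry about.
llm = Ollama(model="llama3", request_timeout=120.0)
query_engine = index.as_query_engine(llm=llm)
```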
Sorry, that's on the LLM, not the query engine. You'd still do query_engine.query()
, and under the hood, it will use llm.chat()
or llm.complete()
depending on is_chat_model
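So from your side nothing changes (sketch; the question string is just a placeholder):
```python
# You keep calling .query() on the query engine; whether llm.chat() or
# llm.complete() runs underneath is decided by is_chat_model on the LLM.
response = query_engine.query("What does the document say about X?")
print(response)
```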
Oh alright, makes sense. I will try passing is_chat_model and see the behavior, thank you very much.
btw, the prompt I'm using is clearly having an effect compared to the default prompt, which makes me suspect that the Inference API does do some formatting after all