I am having an issue trying to use Open-Orca/OpenOrca-Platypus2-13B. I am getting [/INST] all over the place and the model keeps chatting with itself. I am currently using vLLM as an "openailike" server.
I looked around and found an issue that suggested using the stop parameter in the API. This made everything work a lot better, actually:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Open-Orca/OpenOrca-Platypus2-13B",
    "stop": ["[INST]", "[/INST]"],
    "messages": [
      {"role": "user", "content": "What is the square root of two"}
    ]
  }'
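In case it helps anyone scripting this instead of using curl: the only thing that matters is that the stop list ends up in the JSON body of the request. A minimal sketch of that body using only the standard library (the helper name build_chat_request is just mine for illustration):

```python
import json

def build_chat_request(model, messages, stop=None):
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    body = {"model": model, "messages": messages}
    if stop:
        # Strings at which the server should cut off generation
        body["stop"] = stop
    return json.dumps(body)

payload = build_chat_request(
    "Open-Orca/OpenOrca-Platypus2-13B",
    [{"role": "user", "content": "What is the square root of two"}],
    stop=["[INST]", "[/INST]"],
)
```

Whatever client ends up sending the request just needs to produce a body like this one.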
But I can't tell whether there is a way to do this in LlamaIndex as well. I have read through the docs and looked at the code but couldn't figure out if there was an easier way to do this. Any ideas?