Just for fun, I confirmed the behaviour works fine on my end
```
(llama-index) loganm@gamingpc:~/llama_index_proper/llama_index$ python
Python 3.11.0 (main, Mar 1 2023, 18:26:19) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_index import LLMPredictor, ServiceContext
>>> from llama_index.prompts import Prompt
>>> from llama_index import set_global_handler
>>> set_global_handler("simple")
>>> from llama_index.llms import OpenAI
>>> llm = OpenAI(model="gpt-3.5-turbo-instruct")
>>> ctx = ServiceContext.from_defaults(llm=llm, query_wrapper_prompt=Prompt("[INST] {query_str} [/INST] "))
>>> response = ctx.llm_predictor.predict(Prompt("Hello world"))
** Prompt: **
[INST] Hello world [/INST]
** Completion: **
Hello world
>>> response
'\n\nHello world'
```
I'm using OpenAI here, but the LLM itself doesn't matter. I did remove the extra `****` characters from the code though; they seem to cause print buffer issues or something similar.
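
For reference, the same `ServiceContext` can also be plugged into an index so the `query_wrapper_prompt` gets applied during actual queries, not just direct `predict()` calls. A minimal sketch, assuming a `./data` folder with a few text files (the `VectorStoreIndex` / `SimpleDirectoryReader` usage here is just for illustration):

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load whatever documents live in ./data (assumed folder, swap in your own path)
documents = SimpleDirectoryReader("./data").load_data()

# Build the index with the same ServiceContext from above, so the
# query_wrapper_prompt is used when prompts are sent to the LLM
index = VectorStoreIndex.from_documents(documents, service_context=ctx)

query_engine = index.as_query_engine()
print(query_engine.query("What is in these documents?"))
```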