Hi Logan, I was able to get the llama-index LLM working, except there's one last thing I can't figure out.
Here's the previous code for the LLM:
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor

self.llm = ChatOpenAI(
    model_name=self.model_name,
    temperature=self.temperature,
    model_kwargs=model_kwargs,
    max_tokens=self.max_output_tokens,
    streaming=True,
    openai_api_key=openai_api_key,
)
predictor = LLMPredictor(llm=self.llm)
I updated it to use the llama-index LLM:
from llama_index.llms import AzureOpenAI

self.llm = AzureOpenAI(
    model=self.model,
    temperature=self.temperature,
    max_tokens=self.max_output_tokens,
    engine=self.azure_chat_engine_name,
    additional_kwargs=model_kwargs,
    api_key=azure_openai_key,
)
self.llm_predictor = LLMPredictor(llm=self.llm)
However, I can't figure out how to set up streaming=True with this LLM. Do I need to call a stream_chat method? If so, how would that work when, instead of calling the llm_predictor directly, we use custom_index.query()?
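For context, here's roughly what I'm imagining on the query side; the as_query_engine(streaming=True) flag and the response_gen iterator are just my guesses from the docs, so please correct me if this isn't the right approach:

# My guess: enable streaming on the query engine built from the index,
# instead of setting streaming=True on the LLM itself.
query_engine = custom_index.as_query_engine(streaming=True)
streaming_response = query_engine.query("What does the document say about X?")

# Iterate over tokens as they arrive (assuming the streaming response
# exposes a response_gen generator).
for token in streaming_response.response_gen:
    print(token, end="")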