Getting streaming working with GPT-35-Turbo and also (ideally) the Azure endpoints too

hmm maybe not atm
import openai
from langchain.llms import AzureOpenAI
from llama_index import LLMPredictor

# wrapped in a function for completeness; jwt is the Azure OpenAI API key,
# and openai.api_base / api_type / api_version must already be set for Azure
def make_predictor(jwt):
    return LLMPredictor(llm=AzureOpenAI(
        openai_api_key=jwt,
        temperature=0,
        max_tokens=512,
        deployment_name='text-davinci-003',
        model_kwargs={
            "api_key": jwt,
            "api_base": openai.api_base,
            "api_type": openai.api_type,
            "api_version": openai.api_version,
        },
    ))
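For reference, the snippet above reads openai.api_base, openai.api_type, and openai.api_version off the openai module, so those have to be set beforehand. A minimal sketch of that setup; the resource name and API version below are placeholders, not values from this thread:

import openai

# Azure OpenAI module-level wiring (placeholder resource name and version;
# check your Azure deployment for the right api_version)
openai.api_type = "azure"
openai.api_base = "https://<your-resource-name>.openai.azure.com/"
openai.api_version = "2023-03-15-preview"
openai.api_key = jwt  # the same key the predictor above receives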

Using AzureOpenAI like this doesn't work with streaming either; would be great to have this working with both Azure and GPT-3.5-Turbo.
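For what it's worth, streaming does work at the raw openai Python SDK (0.x) level against an Azure chat deployment by passing stream=True. A minimal sketch, assuming the Azure module config shown earlier is in place; the deployment name "gpt-35-turbo" is a placeholder for whatever your deployment is called:

import openai

# assumes openai.api_type / api_base / api_version / api_key are set for Azure
response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # Azure deployment name (placeholder)
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
)
for chunk in response:
    # each streamed chunk carries a partial "delta" rather than a full message
    if chunk["choices"]:
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)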
One of my team is looking at it too.
Yeah! I'm waiting for that too.
We got this working. The Azure examples folder works fine, including for streaming.
sweet! @Runonthespot what did you change?
It was more a case of getting all the Azure parameters right: the combination of base_url, the deployment endpoint, needing to set the version and api_type, and then carefully setting the context window size. Some gotchas: currently (28/03) text-embedding-ada-002 has a max input size of 4096, so when using the LangChain embeddings you have to set chunk_size to 1 (a confusing name, since it actually controls how many texts LangChain sends to the embedding endpoint per request, not the size of a chunk). We discovered this while experimenting with the OpenAI and LangChain versions of the embedding and predictor models. We also found we seem to get decent QA results generally with a 2048 chunk size. A rough sketch of the whole setup is below.
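A rough sketch of that setup, assuming the Azure module config shown earlier and llama_index/langchain releases from around that time (these APIs are version-sensitive; names like ServiceContext, LangchainEmbedding, and chunk_size_limit may differ in other releases, and the "data" directory is a placeholder):

from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import AzureOpenAI
from llama_index import (GPTSimpleVectorIndex, LLMPredictor, LangchainEmbedding,
                         ServiceContext, SimpleDirectoryReader)

# Azure embedding deployments accepted one input per request at the time,
# so chunk_size (the batch size LangChain sends per call) must be 1
embed_model = LangchainEmbedding(OpenAIEmbeddings(chunk_size=1))

llm_predictor = LLMPredictor(llm=AzureOpenAI(
    deployment_name='text-davinci-003',
    temperature=0,
    max_tokens=512,
))

# 2048-token chunks gave decent QA results per the discussion above
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
    chunk_size_limit=2048,
)

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)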