I thought I would share some initial results from a couple of tests, in case there are other noobs like me who might find them useful.
I am attempting to create a chat/QA bot that lets my users interact with my documentation using natural language. As source data, I am ingesting the documentation for my product. ChatGPT already has knowledge of older versions of my product, so I am testing it with questions specific to the newer version.
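For context, the ingestion step itself is just loading the docs from disk. Here is a minimal sketch of one way to do it with LlamaIndex's SimpleDirectoryReader; the "./docs" path is only a placeholder for wherever your documentation files live:

from llama_index import SimpleDirectoryReader

# load the product documentation from a local folder
# ("./docs" is a placeholder; point it at your own docs directory)
documents = SimpleDirectoryReader("./docs").load_data()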
I plan to do a lot more testing, but so far I have used gpt-3.5-turbo a couple of times with different tunings for max chunk size and number of tokens. By accident, I queried the indexes that were built with gpt-3.5 using the default davinci query settings, and then queried them again with 3.5. I also indexed and queried a test with the default davinci settings, and I created a knowledge graph index with no embeddings. All the questions I asked were derived directly from the documents I fed in and were based on the exact text in them. I only used 5 questions for an initial sample of model effectiveness, so with such a small sample size, keep in mind the metrics will be skewed accordingly. My best results thus far are with the settings below (thanks to @dagthomas!):
# imports (assuming the llama_index + langchain versions this API comes from)
from llama_index import LLMPredictor, PromptHelper
from langchain.llms import OpenAIChat

# set maximum input size
max_input_size = 4096
# set number of output tokens
num_outputs = 512
# set maximum chunk overlap
max_chunk_overlap = 40
# set chunk size limit
chunk_size_limit = 600

# define LLM
llm_predictor = LLMPredictor(llm=OpenAIChat(
    temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
prompt_helper = PromptHelper(max_input_size,
                             num_outputs,
                             max_chunk_overlap,
                             chunk_size_limit=chunk_size_limit)
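To actually apply these settings, the predictor and prompt helper get passed in when building the index, and then the index is queried directly. Here is a rough sketch of that step; the index type and the example question are just placeholders:

from llama_index import GPTSimpleVectorIndex

# build the index over the loaded documents using the tuned settings above
index = GPTSimpleVectorIndex(
    documents,
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)

# ask a question lifted straight from the docs
response = index.query("How do I enable feature X in the new version?")
print(response)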
With the same settings but num_outputs=2048, the results were the same, just slightly more verbose, which could be positive or negative depending on your needs. I got 4/5 questions correct with these settings.
I have a few more notes I will post to this thread if you're interested ...