
Updated 2 years ago

I thought I would share some initial

I thought I would share some initial results from a couple of tests, in case there are other noobs like me who find them useful 🙂
I am building a chat/QA bot that lets my users interact with my documentation using natural language. As source data, I am ingesting the documentation for my product. ChatGPT already knows about older versions of my product, so I am testing it with questions specific to the newer version.
I plan to do a lot more testing, but so far I have tried gpt-3.5-turbo a couple of times with different values for max chunk size and number of output tokens. By accident, I queried the indexes built with gpt-3.5-turbo using the default davinci query settings, and then again with gpt-3.5-turbo. I also indexed and queried a test with the default davinci settings, and I created a knowledge graph index with no embeddings.
All the questions I asked were derived directly from the exact text of the documents I fed in. I only used 5 questions as an initial sample of model effectiveness, so keep in mind that with such a small sample the numbers are very noisy. My best results so far are with these settings (thanks to @dagthomas!):
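from gpt_index import LLMPredictor, PromptHelper  # gpt_index was later renamed llama_index
from langchain.llms import OpenAIChat
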
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_outputs = 512
# set maximum chunk overlap
max_chunk_overlap = 40
# set chunk size limit
chunk_size_limit = 600

# define LLM
llm_predictor = LLMPredictor(llm=OpenAIChat(
    temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
prompt_helper = PromptHelper(max_input_size,
                             num_outputs,
                             max_chunk_overlap,
                             chunk_size_limit=chunk_size_limit)
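
For anyone new to these settings: chunk_size_limit caps how many tokens go into each chunk, and max_chunk_overlap is how many tokens consecutive chunks share, so text cut at a chunk boundary still appears intact in the neighbouring chunk. Below is a minimal, hypothetical splitter that illustrates the overlap idea, assuming whitespace-separated words stand in for tokens. The library's own splitter works on real tokenizer tokens and differs in the details.

```python
from typing import List


def split_with_overlap(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split a token list into chunks of at most chunk_size tokens.

    Consecutive chunks share `overlap` tokens, so the window advances by
    chunk_size - overlap tokens each step.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once this chunk reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
        start += step
    return chunks


# Words stand in for tokens here; a real splitter would use a tokenizer.
doc = " ".join(f"w{i}" for i in range(1500)).split()
chunks = split_with_overlap(doc, chunk_size=600, overlap=40)
print([len(c) for c in chunks])          # [600, 600, 380]
print(chunks[0][-40:] == chunks[1][:40])  # True
```

With the thread's values (600 and 40), each new chunk starts 560 tokens after the previous one. A larger overlap costs more chunks (and more embedding calls) for the same document.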

I also tried the same settings with num_outputs=2048. The results were the same but slightly more verbose, which could be positive or negative depending on your needs. I got 4/5 questions correct with both configurations.
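A rough way to see what num_outputs trades away: gpt-3.5-turbo's 4096-token context window is shared between the prompt and the completion, so tokens reserved for the answer are no longer available for retrieved chunks. The sketch below is a simplified calculation, not PromptHelper's exact logic, and the 200-token allowance for the prompt template plus question is a hypothetical number chosen for illustration.

```python
def chunks_that_fit(max_input_size: int, num_outputs: int,
                    template_tokens: int, chunk_size_limit: int) -> int:
    """Return how many full-size chunks fit in the prompt.

    Simplified model: context window = template + chunks + reserved output.
    """
    available = max_input_size - num_outputs - template_tokens
    if available <= 0:
        return 0
    return available // chunk_size_limit


# HYPOTHETICAL: 200 tokens for the prompt template plus the question.
TEMPLATE_TOKENS = 200

for num_outputs in (512, 2048):
    n = chunks_that_fit(4096, num_outputs, TEMPLATE_TOKENS, 600)
    print(num_outputs, n)  # 512 -> 5 chunks, 2048 -> 3 chunks
```

Under these assumptions, raising num_outputs from 512 to 2048 leaves room for three 600-token chunks instead of five. Whether that matters depends on how many chunks a query actually retrieves.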
I have a few more notes I will post to this thread if you're interested...
23 comments
The knowledge graph index with no embeddings performed by far the worst, with 0/5 correct.
It was also over 10x more expensive to build, and the multiplier for build time was clearly even higher, though I didn't time it.
I think the knowledge graph performance is limited by how good the triplets actually are.

Out of curiosity, did you set include_text to true or false for the graph?
I only tried false, per the example. I don't understand this parameter, so I wasn't sure whether to try other settings.
But I'm happy to try others if it's interesting!
I will try the knowledge graph with embeddings next,
and I also have tons of other permutations I want to try. I will report back results as I go!
@Logan M By any chance, do you think the Azure integration might work with the knowledge graph index? Given how the mechanics appear, I would assume probably not, and I'm not sure I want to spend time trying if it's pointless, but I'd be curious to hear your perspective.
I'm not really interested in Azure lol, but I have the company account, so it would be nice lol 🙂
include_text=True means that the triplet, along with the text chunk it came from, is sent to the LLM. I would think this would drastically improve the quality of answers (but also add cost to an already expensive index lol)
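To make include_text concrete, here is a toy sketch of a triplet-based lookup. Each stored (subject, predicate, object) triplet remembers the chunk it was extracted from; with include_text=False only the matched triplets go into the context, and with include_text=True their source chunks are appended as well. The class name, the keyword matching, and the sample triplets are all hypothetical; llama_index's knowledge graph index extracts triplets with an LLM call and its retrieval details differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class ToyKnowledgeGraph:
    """Hypothetical in-memory store of triplets linked to their source chunks."""
    chunks: Dict[int, str] = field(default_factory=dict)
    triplets: List[Tuple[Triplet, int]] = field(default_factory=list)

    def add(self, chunk_id: int, text: str, extracted: List[Triplet]) -> None:
        # In a real index, an LLM call would extract the triplets from `text`.
        self.chunks[chunk_id] = text
        for t in extracted:
            self.triplets.append((t, chunk_id))

    def build_context(self, keywords: List[str], include_text: bool) -> str:
        kws = {k.lower() for k in keywords}
        lines: List[str] = []
        source_ids: List[int] = []
        for (subj, pred, obj), chunk_id in self.triplets:
            # Match a triplet when its subject or object is one of the keywords.
            if subj.lower() in kws or obj.lower() in kws:
                lines.append(f"({subj}, {pred}, {obj})")
                if chunk_id not in source_ids:
                    source_ids.append(chunk_id)
        if include_text:
            # Also send the original chunks the matched triplets came from.
            lines.extend(self.chunks[i] for i in source_ids)
        return "\n".join(lines)


kg = ToyKnowledgeGraph()
kg.add(0, "Version 2 adds a plugin API that replaces the old hooks system.",
       [("Version 2", "adds", "plugin API"),
        ("plugin API", "replaces", "hooks system")])
print(kg.build_context(["plugin API"], include_text=False))
print(kg.build_context(["plugin API"], include_text=True))
```

The context is only as good as the triplets: if extraction misses a fact, include_text=False cannot recover it, while include_text=True can still surface it when another triplet from the same chunk matches, at the cost of a longer (more expensive) prompt.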
As in using an LLM from Azure? I think that should work yea!
ooh yay, I will give it a try 🙂
I saw the Azure integration example, I just wasn't sure if it would extend to the knowledge graph,
since it requires some peculiar embedding settings.
Yea! All our indexes will technically work with any LLM and embedding model, assuming you wrap them with the appropriate classes.

(I personally use local embeddings models a lot to save some small cost lol)
The quality of the outputs will of course depend on the quality of the models you use :p
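The "wrap it in the appropriate class" idea can be sketched without any library: the retrieval code below depends only on an `embed` method, so an API-based model or a local one can be swapped in behind an adapter. Everything here (the `EmbeddingModel` protocol, the `FunctionEmbedder` adapter, the toy bag-of-words model, `top_match`) is hypothetical and stands in for real wrapper classes. The toy model is only a placeholder for a real local embedding model, not a substitute for one.

```python
import math
from typing import Callable, List, Protocol


class EmbeddingModel(Protocol):
    """Hypothetical interface the retrieval code depends on."""

    def embed(self, text: str) -> List[float]: ...


class FunctionEmbedder:
    """Adapter: wraps any text -> vector function so it satisfies EmbeddingModel."""

    def __init__(self, fn: Callable[[str], List[float]]) -> None:
        self.fn = fn

    def embed(self, text: str) -> List[float]:
        return self.fn(text)


def make_bag_of_words(vocabulary: List[str]) -> Callable[[str], List[float]]:
    """Toy stand-in for a local embedding model: word counts over a fixed vocabulary."""
    index = {w: i for i, w in enumerate(vocabulary)}

    def embed(text: str) -> List[float]:
        vec = [0.0] * len(index)
        for word in text.lower().split():
            if word in index:
                vec[index[word]] += 1.0
        return vec

    return embed


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_match(model: EmbeddingModel, query: str, chunks: List[str]) -> str:
    # Retrieval only calls model.embed, so any wrapped model can be swapped in.
    q = model.embed(query)
    return max(chunks, key=lambda c: cosine(q, model.embed(c)))


vocab = ["install", "agent", "linux", "configure", "billing", "dashboard"]
local_model = FunctionEmbedder(make_bag_of_words(vocab))
chunks = ["install the agent on linux", "configure the billing dashboard"]
print(top_match(local_model, "how do I install the agent", chunks))
# prints "install the agent on linux"
```

Swapping models means writing one new adapter, not touching the retrieval code, which is the same reason an index can accept an Azure-hosted or local model once it is wrapped.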
My challenge is that my regular lab doesn't have a GPU. There are still Hugging Face models I can run, but there are soo sooo sooo many on Hugging Face that it's hard to choose, and I assume that without a GPU I won't be able to run the best models, even if I could figure out which ones those are in their massive repo. I don't mean to sound negative, it's awesome, but it's difficult to keep up with when this is my side project lol
I could rent a GPU VM from AWS/Azure/GCP, but given the cost, my volume, and the extra effort, I'm not sure it would be worth it.
Oh I totally agree! Stuff is all so expensive tbh.

Most consumer hardware can't run LLMs with anywhere near the same quality (although hopefully this changes soon)

Embeddings models are usually small enough to run locally, but that's only half the battle
If I can get that Mac M2 upgrade, maybe I can run 'em on my laptop 😉
@Logan M FYI, I ran with include_text set to true, and for my use case the results were pretty comparable: it went from 0/5 to 1/5 correct. I look forward to trying it with embeddings as soon as I get the chance.
well, lots of room for improvement lol
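To put numbers on the small-sample caveat from the first post, a 95% Wilson score interval (a standard interval for a binomial success rate, used here only as an illustration) shows how little 5 questions can pin down.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - margin), min(1.0, center + margin)


for correct in (0, 1, 4):
    low, high = wilson_interval(correct, 5)
    print(f"{correct}/5 correct: {low:.2f} to {high:.2f}")
```

For the results in this thread, 4/5 is consistent with a true accuracy anywhere from roughly 38% to 96%, and the 0/5 (about 0% to 43%) and 1/5 (about 4% to 62%) intervals overlap heavily, so a larger question set would be needed to separate configurations reliably.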
Glad to see it worked out for you @afewell, I'll let you know if I get it more accurate 🙂

I also found a pre-prompt on the channel here, made by @smokeoX, that gave me even better results:

https://discord.com/channels/1059199217496772688/1059200010622873741/1075865143650558054
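The linked pre-prompt is not reproduced here. As a generic illustration of the idea, a QA prompt template is just text with placeholders for the retrieved context and the user's question; llama_index's templates use `{context_str}` and `{query_str}` for these. The instruction wording and the `build_prompt` helper below are hypothetical, not @smokeoX's prompt.

```python
from typing import List

# HYPOTHETICAL wording; only the {context_str}/{query_str} placeholder names
# follow llama_index's convention.
QA_TEMPLATE = (
    "You are a support assistant for our product documentation.\n"
    "Answer using only the context below. If the answer is not in the\n"
    "context, say you don't know.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer: "
)


def build_prompt(context_chunks: List[str], question: str) -> str:
    """Fill the template with retrieved chunks and the user's question."""
    return QA_TEMPLATE.format(context_str="\n\n".join(context_chunks),
                              query_str=question)


print(build_prompt(["Version 2 adds a plugin API."], "What does version 2 add?"))
```

Telling the model to answer only from the supplied context is a common way to discourage it from falling back on what it already knows, which matters here because ChatGPT already has knowledge of older versions of the product.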