
Two questions for Qdrant as the vector store

Two questions: with Qdrant as the vector store, does creating the index also pass the extra_info from the documents into Qdrant as the payload? And for PromptHelper, what configuration should I use if I want the output to use as many of the tokens not taken up by the prompt as possible?
4 comments
  1. It should be, yeah 🙂
  2. Mmmm, if you set max_tokens on the OpenAI LLM to -1, the LLM will output as much as it has room for (see the sketch below).
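For reference, a minimal sketch of that setup with the LlamaIndex API of that era. The import paths, the LangChain OpenAI wrapper, the data path, and the collection name are assumptions here and shifted between versions, so treat this as a sketch rather than the exact API:

```python
import qdrant_client
from langchain.llms import OpenAI
from llama_index import GPTQdrantIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader

# Each document's extra_info should be carried into the Qdrant payload
# alongside the node text when the index is built.
documents = SimpleDirectoryReader("data").load_data()  # "data" is a placeholder path

# max_tokens=-1 asks the completion-style OpenAI model to use all of the
# context space not consumed by the prompt for its answer.
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, max_tokens=-1))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

client = qdrant_client.QdrantClient(host="localhost", port=6333)  # assumed local instance
index = GPTQdrantIndex.from_documents(
    documents,
    client=client,
    collection_name="my_collection",  # hypothetical collection name
    service_context=service_context,
)
```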
Hmm, so for PromptHelper I see it calculates the token space as context_length - num_output - num_input. What is this used for? Is it just to decide whether to refine the given text? And what happens if I set num_output to 0, if I want it to use as much of the available space as possible? xD
So, most LLMs these days are decoder models. This means they generate one token at a time, add it to the input, and generate the next token.
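A toy loop makes that decoder behavior concrete. Everything here is a hypothetical stand-in for a real model; the fake next_token just echoes a canned sequence:

```python
# Toy autoregressive decoding loop. `next_token` stands in for a real
# decoder's forward pass; here it returns a canned continuation.
CONTEXT_LENGTH = 8
EOS = -1

def next_token(tokens: list[int]) -> int:
    canned = [10, 11, EOS]
    return canned[len(tokens) - 3]  # pretend the prompt was 3 tokens long

tokens = [1, 2, 3]                  # prompt tokens
while len(tokens) < CONTEXT_LENGTH:
    tok = next_token(tokens)        # predict one token from everything so far
    tokens.append(tok)              # the output becomes part of the next input
    if tok == EOS:
        break

print(tokens)  # [1, 2, 3, 10, 11, -1]
```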

num_output ensures there is a minimum amount of space left to generate an answer when we prompt the LLM 🙂
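To make the budget concrete, here is the arithmetic PromptHelper is doing, with illustrative numbers assuming a 4096-token model:

```python
# Illustrative token budget for a single LLM call.
context_length = 4096   # model's total context window
num_output = 256        # space PromptHelper reserves so the answer isn't cut off
num_prompt = 300        # tokens used by the prompt template + query (example value)

# What's left determines how much retrieved text fits in one call;
# text that doesn't fit gets split and handled via refine calls.
available_for_chunks = context_length - num_output - num_prompt
print(available_for_chunks)  # 3540
```

So setting num_output to 0 would cram more retrieved text into each call, but it would leave no reserved room for the answer, which is why max_tokens=-1 on the LLM is the better lever here.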
Makes sense, thanks!