Two questions for Qdrant as the vector store

Two questions. For Qdrant as the vector store, does creating the index also pass the extra_info from the documents into Qdrant as the payload? And for PromptHelper, what configuration should I use if I want the output to use as many of the tokens not taken up by the prompt as possible?
4 comments
  1. It should be, yea 🙂 (see the first sketch below)
  2. mmmm if you set max_tokens on the OpenAI llm to -1, the LLM will output as much as it has room for (see the second sketch below)
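A minimal sketch of the first answer, assuming an older llama_index release where Document still takes extra_info (later versions renamed it to metadata); the collection name and document contents here are made up for illustration:

```python
import qdrant_client
from llama_index import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore

# Throwaway in-memory Qdrant instance for demonstration
client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# extra_info attached here should end up in the Qdrant point payload
# alongside the node text when the index is built
doc = Document(
    text="Qdrant stores each vector together with a JSON payload.",
    extra_info={"source": "notes.md", "author": "me"},
)
index = VectorStoreIndex.from_documents([doc], storage_context=storage_context)
```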
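And a sketch of the second answer, assuming the LangChain OpenAI wrapper that llama_index used around this time; max_tokens=-1 applies to completion models (e.g. text-davinci-003), where it means "use whatever room the prompt leaves":

```python
from langchain.llms import OpenAI
from llama_index import LLMPredictor, ServiceContext

# max_tokens=-1 asks for as many output tokens as the context
# window allows after accounting for the prompt
llm = OpenAI(model_name="text-davinci-003", max_tokens=-1)
service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))
```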
Hmm, so for PromptHelper I see it calculates the available token space as context_length - num_output - num_input. What is this used for? Is it just to decide whether to refine the given text? And what happens if I set num_output to like 0, if I want it to use as much of the available space as possible? xD
So, most LLMs these days are decoder models. This means they generate one token at a time, append it to the input, and generate the next token

num_output ensures there is a minimum amount of space left to generate an answer when we prompt the LLM 🙂
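A sketch of where num_output sits in PromptHelper, using the older constructor names (max_input_size / num_output / max_chunk_overlap; newer releases call these context_window, num_output, and chunk_overlap_ratio). The numbers are illustrative:

```python
from llama_index import PromptHelper

# The space available for retrieved text is roughly
# max_input_size - num_output - (prompt template tokens),
# so num_output reserves guaranteed room for the answer.
# Setting num_output to 0 would leave nothing reserved
# for the LLM to generate into.
prompt_helper = PromptHelper(
    max_input_size=4096,   # model's total context window
    num_output=256,        # tokens reserved for the generated answer
    max_chunk_overlap=20,  # overlap between consecutive chunks
)
```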
Makes sense, thanks