Query string

so does the query string get passed to the model which will re-calculate the embeddings anyway, then? Or... since (to my understanding) the model doesn't understand the meaning of the string, only the tokens it's parsed into, how does the model use the raw string to synthesize the answer without having to re-embed it anyway?
So there are two main steps with vector indexes in llama index

  1. Retrieve
The embedding of the query string is used to retrieve the top k most similar nodes. This uses an embedding model optimized for that task (it defaults to text-embedding-ada-002 from OpenAI)

  2. Synthesize
Using those top k nodes and the query string, generate a response in natural language. In this step, the LLM reads the query and the top k nodes (along with some extra instructions from a prompt template). Then a response using all that info is generated by the LLM and returned
Since you are passing in the embeddings, step one won't need to calculate the embedding for the query string. But from there, it still retrieves the top k nodes and then synthesizes a response
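To make that concrete, here's a minimal sketch of passing a precomputed query embedding so step one skips the embed call. The import paths and class names (VectorStoreIndex, QueryBundle, OpenAIEmbedding) vary between llama_index versions, so treat this as illustrative rather than the exact code from this thread:

```python
# Sketch only: exact import paths differ across llama_index versions.
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings import OpenAIEmbedding
from llama_index.schema import QueryBundle

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_str = "What does the report say about revenue?"
# Reuse an embedding you already computed (or compute it once here).
query_embedding = OpenAIEmbedding().get_query_embedding(query_str)

# Step 1 (retrieve) uses the embedding; step 2 (synthesize) still needs the raw
# string, because the LLM reads the query text alongside the retrieved nodes.
bundle = QueryBundle(query_str=query_str, embedding=query_embedding)
response = index.as_query_engine(similarity_top_k=2).query(bundle)
print(response)
```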
right, that I get. I think maybe I'm not understanding what goes on inside the LLM's black box when synthesizing the response.
but I understand if that's outside the scope of what you can (or feel like) answering
Haha happy to help make this less of a black box!

So, let's say the retrieve step is done. Right now we have the query string, and a bunch of text that is hopefully relevant to the query

We take the query and the text from the top k nodes, and put that in a prompt template. Basically, the prompt template is just a big instruction string that says "hey, here's a query and some relevant context. Using that context alone, answer the query"
Then the LLM reads all that, and writes an answer
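For reference, the default text_qa prompt looks roughly like this (a paraphrase, not the verbatim template); the {context_str} and {query_str} placeholders get filled in with the retrieved node text and your query:

```python
# Simplified paraphrase of the default text_qa prompt, not the exact template text.
TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
```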
If all the retrieved text doesn't fit into one LLM call, the answer is refined: on each subsequent LLM call, we show the model the next chunk of context plus the previous answer, and ask it to update that answer if needed
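As an illustration of that refine flow (a hand-rolled sketch, not llama_index's actual synthesizer code; `llm`, `text_qa_template`, and `refine_template` are stand-ins for the real objects):

```python
# Illustrative create-and-refine loop; not the library's real implementation.
def synthesize(llm, query_str, context_chunks, text_qa_template, refine_template):
    answer = None
    for chunk in context_chunks:  # each chunk is as much retrieved text as fits in one call
        if answer is None:
            # First call: answer the query from the first chunk of context.
            prompt = text_qa_template.format(context_str=chunk, query_str=query_str)
        else:
            # Later calls: show the next chunk plus the previous answer, and ask the
            # LLM to update that answer only if the new context changes anything.
            prompt = refine_template.format(
                context_msg=chunk, query_str=query_str, existing_answer=answer
            )
        answer = str(llm.complete(prompt))
    return answer
```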
Around here, you can see the actual text_qa and refine templates that are powering this

https://github.com/jerryjliu/llama_index/blob/main/gpt_index/prompts/default_prompts.py#L90
ok, so precalculating embeddings doesn't save us from having OpenAI (or whoever) charge us for the token usage of the generated prompt and the query
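Right, the synthesize step still bills for prompt and completion tokens. One rough way to see what that costs up front is to count the prompt tokens with tiktoken (the encoding choice and the example strings here are assumptions):

```python
# Rough estimate of prompt tokens for the synthesize call; assumes a cl100k_base model.
import tiktoken

query_str = "What does the report say about revenue?"
retrieved_text = "...text from the top k nodes retrieved in step 1..."
prompt = (
    "Context information is below.\n"
    f"{retrieved_text}\n"
    f"Given the context information and not prior knowledge, answer the question: {query_str}\n"
)

enc = tiktoken.get_encoding("cl100k_base")
print(f"~{len(enc.encode(prompt))} prompt tokens go into the synthesize call")
```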
Thank you for all the info. I think I understand better now!
Awesome! πŸ‘πŸ’ͺ