There are two main steps when you query a vector index in LlamaIndex:
- Retrieve
The query string is embedded, and that embedding is used to retrieve the top-k most similar nodes. This uses an embedding model optimized for similarity search (by default, OpenAI's text-embedding-ada-002). A sketch of this step follows the list.
- Synthesize
Using those top-k nodes and the query string, generate a response in natural language. In this step, the LLM reads the query and the top-k nodes (along with some extra instructions from a prompt template), then generates a response using all that info and returns it. See the second sketch below.
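Here's a minimal sketch of the retrieve step on its own. It assumes the current `llama_index.core` import layout (older versions import from `llama_index` directly), an `OPENAI_API_KEY` in the environment, and a couple of toy documents; the texts and the top-k value are made up for illustration:

```python
import os

from llama_index.core import Document, VectorStoreIndex

# Assumes OPENAI_API_KEY is set; the default embedding model is OpenAI's.
assert "OPENAI_API_KEY" in os.environ

# Toy documents, purely for illustration.
documents = [
    Document(text="LlamaIndex builds indexes over your documents."),
    Document(text="A vector index stores one embedding per node."),
]

# Building the index chunks the documents into nodes and embeds each node.
index = VectorStoreIndex.from_documents(documents)

# Retrieve step only: embed the query, return the top-k most similar nodes.
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("How does a vector index store data?")

for n in nodes:
    # Each result carries the node's text and its similarity score.
    print(f"{n.score:.3f}  {n.node.get_content()[:60]}")
```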
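For the synthesize step, the usual route is a query engine, which runs retrieval and then hands the query plus the retrieved nodes (wrapped in a prompt template) to the LLM. Continuing the sketch above, with the same hedges:

```python
# Query engine = retrieve + synthesize in one call.
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("How does a vector index store data?")

# The LLM's answer, generated from the query plus the retrieved nodes.
print(response.response)

# The nodes that were fed to the LLM during synthesis.
for sn in response.source_nodes:
    print(f"{sn.score:.3f}  {sn.node.get_content()[:60]}")
```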