The community members are discussing how to choose the right embedding for a conversational model. They explain that embeddings are independent of the choice of language model and are used in the retrieval step to compare the input to the indexed data. Different embedding models produce vectors of different dimensionality (e.g., 384 vs. 1024), and the members suggest consulting leaderboards to pick a top-performing model, with bge-large-en-v1.5 being a common recommendation. However, they note that many high-performing embeddings behave quite similarly, so the specific choice is not critical as long as it comes from among the top options.
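As a quick illustration of the dimensionality point, here is a minimal sketch assuming the sentence-transformers library (an assumption; any embedding library exposes something similar) and using illustrative model names, one 384-dimensional and one 1024-dimensional:

```python
# Sketch: embedding dimensionality depends on the embedding model, not on the LLM.
# Assumes the sentence-transformers library; model names are illustrative.
from sentence_transformers import SentenceTransformer

small = SentenceTransformer("all-MiniLM-L6-v2")        # produces 384-dimensional vectors
large = SentenceTransformer("BAAI/bge-large-en-v1.5")  # produces 1024-dimensional vectors

print(small.get_sentence_embedding_dimension())  # 384
print(large.get_sentence_embedding_dimension())  # 1024
```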
embeddings change the "retrieval" step -- essentially, the embedding model is what's used to compare a query to all the data you indexed and retrieve the top-k most similar chunks
Then, using the retrieved text, the LLM handles synthesizing a response to the query in natural language
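A rough sketch of those two steps, again assuming sentence-transformers for the embeddings and using a hypothetical `llm_complete` function as a stand-in for whatever LLM call is actually used: embed the indexed documents and the query with the same model, keep the top-k chunks by cosine similarity, then hand the retrieved text to the LLM to synthesize the answer.

```python
# Minimal sketch of embedding-based retrieval followed by LLM synthesis.
# Assumes sentence-transformers; `llm_complete` is a hypothetical stand-in.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # 1024-dim embeddings

documents = [
    "Embeddings map text to vectors for similarity search.",
    "The LLM generates the final natural-language answer.",
    "Top-k retrieval returns the most similar indexed chunks.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

query = "What does the embedding model do in a RAG pipeline?"
query_embedding = model.encode(query, normalize_embeddings=True)

# Retrieval step: cosine similarity between the query and every indexed chunk, keep top-k.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
top_k = scores.topk(k=2)
retrieved = [documents[int(i)] for i in top_k.indices]

# Synthesis step: the LLM answers the query using the retrieved text as context.
prompt = "Context:\n" + "\n".join(retrieved) + f"\n\nQuestion: {query}\nAnswer:"
# answer = llm_complete(prompt)  # hypothetical LLM call
```

Swapping the embedding model only changes which chunks come back in the retrieval step; the synthesis step and the choice of LLM are unaffected.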