Thanks a lot for the notebook on "building production ready pipeline". I'm a complete noob and have a few questions:
- Why did we choose this embedding model? `embed_model = "local:BAAI/bge-small-en-v1.5"`
- And why did we choose this model for reranking? `model="BAAI/bge-reranker-base"` (both appear in the sketch below)
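For context, here is roughly where those two models plug in, as I understand it (paraphrasing the notebook; exact import paths vary across llama-index versions, so treat this as a sketch):

```python
from llama_index import ServiceContext
from llama_index.postprocessor import SentenceTransformerRerank

# Embedding model: encodes text into vectors for first-pass retrieval.
# The "local:" prefix tells llama-index to download and run the
# HuggingFace model locally instead of calling an API.
service_context = ServiceContext.from_defaults(
    embed_model="local:BAAI/bge-small-en-v1.5",
)

# Reranker: a cross-encoder that re-scores the retrieved chunks against
# the query, so only the best ones are passed to the LLM.
rerank = SentenceTransformerRerank(
    model="BAAI/bge-reranker-base",
    top_n=3,  # keep the 3 highest-scoring chunks (my guess at the value)
)
```

My rough understanding is that the small bge model does cheap first-pass retrieval and the cross-encoder reranker cleans up its mistakes, but I don't know why these particular checkpoints were picked over alternatives.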
3.1. Which underlying model does the function below use to generate the questions that are used for evaluation?
3.2. What if there are mistakes in that model's output?
`async agenerate_dataset_from_nodes(num: int | None = None) -> QueryResponseDataset`
Generates questions for each document.
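For 3.1, my understanding is that the question-writing model is simply whatever LLM you hand to the dataset generator, something like the sketch below (paraphrased; `DatasetGenerator` and these import paths are from the older llama-index API and may differ in your version):

```python
from llama_index import ServiceContext, SimpleDirectoryReader
from llama_index.llms import OpenAI
from llama_index.evaluation import DatasetGenerator

# "data/" is a placeholder; the notebook loads its own documents.
documents = SimpleDirectoryReader("data/").load_data()

# The LLM in this service context is the one that writes the questions,
# so 3.1 reduces to: which model was passed in here?
gpt4_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))

dataset_generator = DatasetGenerator.from_documents(
    documents,
    service_context=gpt4_context,
)

# num caps how many questions come back; top-level await works in a notebook.
eval_dataset = await dataset_generator.agenerate_dataset_from_nodes(num=60)
```

If that is right, then 3.2 matters a lot: the eval questions themselves come from an LLM, so bad generations would skew the whole evaluation, and I'd like to know how the notebook guards against that.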
- Relevancy:
Evaluates the relevancy of retrieved contexts and responses to a query. This evaluator considers the query string, retrieved contexts, and response string.
So why do we need an LLM (GPT-4 in this example) to evaluate relevancy? Relevancy tells us whether the generated response is consistent with the retrieved context and the user query, which means we only need three things we already have: the query, the retriever output/context, and the response string (which came from GPT-3.5).
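To make my confusion concrete, this is what the relevancy check looks like to me (paraphrased from the llama-index docs; names may differ by version):

```python
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.evaluation import RelevancyEvaluator

# GPT-4 here is the *judge*, separate from the GPT-3.5 that produced
# the answer under evaluation.
gpt4_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
evaluator = RelevancyEvaluator(service_context=gpt4_context)

result = evaluator.evaluate(
    query="What does the report say about Q3 revenue?",  # made-up example
    contexts=["...retrieved chunk text..."],             # retriever output
    response="...the GPT-3.5 answer being judged...",
)
print(result.passing)  # the judge's True/False verdict
```

All three inputs are strings we already have, so what I don't understand is why deciding whether they agree with each other needs another (expensive) LLM call rather than some cheaper check.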