@ravitheja Thanks a lot for the notebook

At a glance

Thanks a lot for the notebook on "building production ready pipeline". I have a complete noob and have a few questions:

Why did we choose this embedding model? embed_model = "local:BAAI/bge-small-en-v1.5"

And why did we choose this model for reranking? model="BAAI/bge-reranker-base"

3.1. Which underlying model does this function use to generate questions that are used for model evaluation?
3.2 What if there are mistakes in this model's output?

async agenerate_dataset_from_nodes(num: int | None = None) → QueryResponseDataset
Generates questions for each document.

Relevancy:

Evaluates the relevancy of retrieved contexts and responses to a query. This evaluator considers the query string, retrieved contexts, and response string.

So why do we need an LLM (gpt4.0 in this example) for evaluating the relevancy? Relevancy tells us if the generated response is as per the retrieved contents and user query -- which means we just need the query, retriever output/context and the response string (which we got from gpt3.5).

3 comments

bbeaverTango

Why did we choose this embedding model? embed_model = "local:BAAI/bge-small-en-v1.5"

You can refer to the LLM leaderboard on HuggingFace for the metrics. The model seem to perform well on all major benchmarks.

And why did we choose this model for reranking? model="BAAI/bge-reranker-base"

The reranking is also task specific but the bge reranker or something from Sentence Transformer is a good starting point.

As pre the rest of the questions I would need to look at the notebook that you're referrring to would be happy to clear out a few doubts. I think a lot of startups in India especially from Bangalore are eying out at RAG pipelines. There's a great community around these folks.

rravitheja

@Arshdeep Kaur

gpt-3.5-turbo by default
GPT-4 is good for evaluations as LLM Judge, so thats the reason gpt4 was used.

Thanks @beaverTango for clearing the doubts.

AArshdeep Kaur

Thanks @beaverTango and @ravitheja
Here is the notebook @beaverTango

Add a reply

Find answers from the community

@ravitheja Thanks a lot for the notebook