Hey everyone
I have documentation and I want to find the best embedding model for RAG. How can I score / benchmark different embedding models to find the best one?
Please stand by @payload as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To find the best embedding model for your RAG (Retrieval-Augmented Generation) pipeline, you can use the LlamaIndex project's RetrieverEvaluator to evaluate the quality of different embeddings. The evaluation compares the retrieved results against the ground-truth context for each question, and reports two metrics: Hit Rate and MRR (Mean Reciprocal Rank) source ().

Here is a Python code snippet showing how to set up the evaluation process for different embedding types:

Plain Text
import nest_asyncio

# Evaluation utilities and core indexing components
from llama_index.core.evaluation import generate_question_context_pairs
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# LLM used to generate evaluation questions, plus one embedding model to test
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.cohere import CohereEmbedding

# Allow nested event loops (needed when running inside a notebook)
nest_asyncio.apply()

source ()
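
These imports only set the stage. As a rough sketch of the rest of the flow (the ./data directory, model names, chunk size, and top-k below are placeholder choices, not taken from the source), the evaluation generates question-context pairs, indexes the nodes with the embedding model under test, and scores its retriever:

Plain Text
from llama_index.core.evaluation import RetrieverEvaluator

# Load and chunk the documentation
documents = SimpleDirectoryReader("./data").load_data()
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)

# Generate ground-truth (question, context) pairs with an LLM
qa_dataset = generate_question_context_pairs(
    nodes, llm=OpenAI(model="gpt-4"), num_questions_per_chunk=2
)

# Index with the embedding model under test and build a retriever
index = VectorStoreIndex(
    nodes, embed_model=CohereEmbedding(model_name="embed-english-v3.0")
)
retriever = index.as_retriever(similarity_top_k=5)

# Score the retriever on MRR and hit rate
# (top-level await works here because nest_asyncio is applied in the notebook)
evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=retriever
)
eval_results = await evaluator.aevaluate_dataset(qa_dataset)

Repeat the last three steps once per candidate embedding model and compare the aggregated scores.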

You can also fine-tune your embedding model to improve its performance. Fine-tuning can give you a 5-10% increase in retrieval evaluation metrics. LlamaIndex provides comprehensive guides on how to fine-tune embeddings in different ways source ().
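
To make that step concrete, here is a rough sketch using LlamaIndex's finetuning module; the base model, output path, and the reuse of the `nodes` from the snippet above are illustrative assumptions, not prescriptions from the guides:

Plain Text
from llama_index.llms.openai import OpenAI
from llama_index.finetuning import (
    generate_qa_embedding_pairs,
    SentenceTransformersFinetuneEngine,
)

# Build a (question, context) training set from your own nodes
train_dataset = generate_qa_embedding_pairs(nodes=nodes, llm=OpenAI(model="gpt-4"))

# Fine-tune an open-source base model on it
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en-v1.5",  # placeholder base model
    model_output_path="finetuned_model",
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()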

For initial model selection, it helps to look at how well a model performs on a standardized, diverse set of domains and tasks. A useful benchmark for embeddings is the MTEB Leaderboard () source ().

Remember, if you change your embedding model, you must re-index your data. Furthermore, you should ensure the same embedding model is used for both indexing and querying source ().
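
One way to enforce that consistency is to pin the embedding model once, globally (the model name below is a placeholder):

Plain Text
from llama_index.core import Settings
from llama_index.embeddings.cohere import CohereEmbedding

# Both index construction and query-time retrieval will now use this model
Settings.embed_model = CohereEmbedding(model_name="embed-english-v3.0")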
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai what is the use of nest asyncio
Let me check the knowledge base for you. One moment! :books:
----------------------
nest_asyncio is a Python library that works around a limitation of asyncio: a running event loop cannot normally be re-entered. Environments such as Jupyter notebooks and IPython already run their own event loop, so calls like asyncio.run() raise a RuntimeError there. nest_asyncio patches the running loop so that nested use of asyncio becomes possible.

In the context of LlamaIndex, nest_asyncio is used to allow asynchronous operations to run within Jupyter notebooks. This is particularly useful when working with LlamaIndex's asynchronous operations such as creating an index, querying, or running evaluations.

Here is how you can apply it:

Plain Text
import nest_asyncio
nest_asyncio.apply()

This code is often seen at the beginning of the LlamaIndex examples, allowing the rest of the asynchronous code in the notebook to run smoothly.
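
As a quick illustration of the failure it prevents, here is a minimal, self-contained sketch:

Plain Text
import asyncio
import nest_asyncio

nest_asyncio.apply()  # patch the already-running loop so it can be re-entered

async def fetch():
    await asyncio.sleep(0.1)
    return "done"

# Inside a notebook an event loop is already running, so without the patch
# this asyncio.run() call raises "RuntimeError: asyncio.run() cannot be
# called from a running event loop". In a plain script it works either way.
print(asyncio.run(fetch()))
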
Sources:
  • AsyncIndexCreationDemo ()
  • SQLJoinQueryEngine ()
  • ensemble_query_engine ()
  • BatchEvalRunner - Running Multiple Evaluations ()
  • Ensemble Retrieval ()
  • Pairwise Evaluator ()
  • RouterQueryEngine ()
  • QdrantIndexDemo ()
  • SQLRouterQueryEngine ()
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Ok @kapa.ai, so I don't need nest_asyncio when I am serving a LlamaIndex application using FastAPI?
Let me check the knowledge base for you. One moment! :books:
----------------------
That's correct. When you're serving a LlamaIndex application using FastAPI, you don't need to use nest_asyncio. FastAPI is built on Starlette for the web parts and Pydantic for the data parts, both of which natively support asynchronous operations.

FastAPI is designed around Python's async and await keywords, which define coroutine functions. These are native language syntax that FastAPI executes on the asyncio event loop managed by its server, so asynchronous I/O works out of the box.

So, when you're using FastAPI, you can directly use async and await for your asynchronous operations without needing to use nest_asyncio.

Here's an example of how you might define an asynchronous route in FastAPI:

Plain Text
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    # FastAPI awaits this coroutine on its own running event loop
    return {"Hello": "World"}

In this example, read_root is a coroutine that FastAPI runs on its event loop concurrently with other request handlers, improving throughput for I/O-bound work.
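
To connect this back to LlamaIndex, here is a hedged sketch of serving a query engine asynchronously from FastAPI with no nest_asyncio involved; the ./data directory and the route shape are illustrative choices:

Plain Text
from fastapi import FastAPI
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

app = FastAPI()

# Build the index once at import/startup time
documents = SimpleDirectoryReader("./data").load_data()
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()

@app.get("/query")
async def query(q: str):
    # aquery awaits directly on FastAPI's event loop; no nested loop needed
    response = await query_engine.aquery(q)
    return {"answer": str(response)}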
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@Logan M @WhiteFang_Jr just tagging if u have anything to add in the conversation
@kapa.ai https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/cohere_retriever_eval.ipynb I am using this as a reference to evaluate my RAG score with an open embeddings model and the Qdrant vector store

getting this error

^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/payload/miniconda3/envs/mloflo/lib/python3.12/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 112, in _aretrieve
return await self._aget_nodes_with_embeddings(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/payload/miniconda3/envs/mloflo/lib/python3.12/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 184, in _aget_nodes_with_embeddings
query_result = await self._vector_store.aquery(query, **self._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/payload/miniconda3/envs/mloflo/lib/python3.12/site-packages/llama_index/vector_stores/qdrant/base.py", line 927, in aquery
response = await self._aclient.search(
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'search'
Let me check the knowledge base for you. One moment! :books:
----------------------
The error message you're seeing, AttributeError: 'NoneType' object has no attribute 'search', means the _aclient attribute of your QdrantVectorStore instance is None at the time the asynchronous query path calls search. The async retrieval methods (aretrieve / aquery) go through this async client, so the error typically means no async client was set up when the store was constructed.

Here are a few things you can check:

  1. Initialization of QdrantVectorStore: If you call the asynchronous API, the store needs an async client; passing only a synchronous one leaves _aclient as None. Here is an example of how to construct it with both:
Plain Text
    from qdrant_client import QdrantClient, AsyncQdrantClient
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    client = QdrantClient(url="http://localhost:6333")
    aclient = AsyncQdrantClient(url="http://localhost:6333")

    # aclient is what the async query path (aquery/aretrieve) uses
    vector_store = QdrantVectorStore(
        collection_name="my_collection",
        client=client,
        aclient=aclient,
    )


Replace "http://localhost:6333" with your Qdrant server's URL, and "my_collection" with the name of your collection.

  2. Connection to Qdrant server: Ensure that your application can actually reach the Qdrant server; it should be running and accessible from the machine where your code executes. If it is not, client initialization or the first request will fail.
  3. Asynchronous context: Make sure the async methods are awaited inside an async function or a running event loop. If you're running this in a Jupyter notebook, you may need nest_asyncio.apply() to enable nested event loops.
If you're still facing issues, please provide more details about your setup and the steps leading up to the error. This will help in diagnosing the problem more accurately.
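
As a quick way to isolate the problem, you could first verify that an async client can reach the server at all (the URL below is a placeholder):

Plain Text
import asyncio
from qdrant_client import AsyncQdrantClient

async def check():
    aclient = AsyncQdrantClient(url="http://localhost:6333")
    # Lists collections; raises if the server is unreachable
    print(await aclient.get_collections())

asyncio.run(check())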
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------