Updated 11 months ago

Rag

I'm less than thrilled with my RAG results and am looking for suggested reads on metrics, root-causing, etc. that people have found useful. I am reading this at the moment: https://blog.llamaindex.ai/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5. It has good information, but admittedly I'm not a RAG expert, so there could be much better reads I am overlooking. The RAG implementation is basically this: I scraped a bunch of websites related to a topic. When I ask it questions whose answers I know exist in the data, in some cases using the exact title from the metadata, it doesn't find them and instead returns text from other, unrelated blog posts.
12 comments
It might be helpful if you shared a few more details.

How many websites? Did you put it all in a vector index without changing any settings? Any other customization?
Sure. In total, my database (where I send all the parsed info) has 14,837 unique URLs.

Metadata:

Plain Text
{"date": "<date article posted - 2022-09-13>", "url": "<car_review_url>", "section": "SUV", "source": "Car and Driver", "title": "<title of the article>"}


Then I have the raw summary (text tag extraction via soup). As far as "cleaning" goes, at the moment I am just lowercasing everything. I'm also looking at best practices for this.

The text and metadata are then combined into a Document.

Plain Text
class Configuration:
    def __init__(self):
        self.initialize()

    def initialize(self):
        self.llm = Ollama(model="mixtral:8x7b-instruct-v0.1-q6_K", base_url="http://192.168.0.105:1234")
        
        self.embed_model = MistralAIEmbedding(model_name="mistral-embed", 
                                              api_key=MISTRAL_API_KEY,
                                              embed_batch_size=8)
        
        self.client = chromadb.PersistentClient(path="./dbs/vector_dbs_test/cars/")
        self.chroma_collection = self.client.get_or_create_collection(name="cars_rag_test")
        self.vector_store = ChromaVectorStore(chroma_collection=self.chroma_collection)
        self.storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
        self.service_context = ServiceContext.from_defaults(llm=self.llm,
                                                    chunk_size=1024,
                                                    chunk_overlap=25,
                                                    embed_model=self.embed_model)
Building the index:

Plain Text
def extract_and_store_articles_info(db_path):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute('SELECT a.metadata, a.original_summary FROM articles_processed a')
    rows = cursor.fetchall()
    conn.close()
    
    return rows
        
def load_data(metadata, text):
    document = Document(text=text, metadata=metadata)
    
    return [document]

def main():
    config = Configuration()
    
    document_list = []
    
    rows = extract_and_store_articles_info("./dbs/processed_data_test.db")  
    
    for row in rows[:500]:
        metadata, text = json.loads(row[0]), row[1]
        metadata = {key.lower(): value.lower() if isinstance(value, str) else value for key, value in metadata.items()}
        text = text.lower()
        
        documents = load_data(metadata, text)
        document_list.append(documents)
 
    for document in document_list:
        #document[0].excluded_llm_metadata_keys = ["url"]
        #print(document[0].get_content(metadata_mode=MetadataMode.LLM))
        
        try:
            VectorStoreIndex.from_documents(documents=document,
                                            service_context=config.service_context, 
                                            storage_context=config.storage_context,
                                            show_progress=True)
        except Exception as exc:
            # Catch Exception instead of using a bare except, and print the cause.
            print("Error:", exc, document[0].get_content(metadata_mode=MetadataMode.LLM))
            continue
    
if __name__ == "__main__":
    main()
Then, for searching, I use the same service context, storage context, etc.:

Plain Text
    index = VectorStoreIndex.from_vector_store(vector_store=config.vector_store,
                                               service_context=config.service_context)
    
    query_engine = index.as_query_engine(verbose=True)
    
    USER_PROMPT = """
    Can you give me the Pricing and Specs for the 2024 Toyota RAV4 Review article.
    
    Cite the URL references you used to determine your answer.
    
    Think through the steps before responding.
    """
    response = query_engine.query(USER_PROMPT)
    
    print(response)
"Pricing and Specs for the 2024 Toyota RAV4 Review article" in the prompt refers to the title of the article, which is also in the metadata (and summary): 2024 Toyota RAV4 Review, Pricing, and Specs

I just get really weird responses: it grabs information about unrelated things and includes it, then seems to completely ignore what I asked for.
@Logan M hopefully that helps. Sorry for not including that initially.
No worries! That helps immensely!

So, with 14,000+ webpages, it makes sense this approach doesn't work 😅

index.as_query_engine() only uses vector retrieval to return the top 2 most similar chunks. With that amount of data, you can maybe see how a top k of 2 is too small.

Instead, what I might do to improve results here is crank up the top k and then also use a reranker.

For example (I'm going to use v0.10.x code, not sure what you are on right now):

Plain Text
pip install llama-index-postprocessor-flag-embedding-reranker


Plain Text
from llama_index.postprocessor.flag_embedding_reranker import (
    FlagEmbeddingReranker,
)

rerank = FlagEmbeddingReranker(model="BAAI/bge-reranker-base", top_n=3)

query_engine = index.as_query_engine(similarity_top_k=20, node_postprocessors=[rerank])
Another thing you could introduce (on top of adding reranking) is hybrid search as well.
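To make the hybrid search idea a bit more concrete: it generally means running a keyword search and a vector search side by side and fusing the two rankings. Below is a minimal, library-free sketch of one common fusion method, reciprocal rank fusion. This is an illustration only, not how LlamaIndex or Chroma implement it; the function name, the document ids, and `k=60` are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ids: the keyword ranking puts the exact-title match first,
# while the vector ranking places it below loosely related articles.
keyword_hits = ["rav4-2024-review", "rav4-2023-review", "highlander-review"]
vector_hits = ["suv-buyers-guide", "highlander-review", "rav4-2024-review"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

In this toy case the exact-title article comes out on top because it appears in both lists, which is the behaviour you would hope for when querying by article title.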
Thank you @Logan M - made the rerank changes (looking into hybrid search). Using reranker-large. The responses seem a bit better, but there is still room for improvement.
Nice! What kind of top k did you use for the initial similarity top k? 20? (Tweaking that may help, at the cost of runtime)
k = 20 and n = 6 (similarity_top_k and top_n)
But I'm still playing with it a bit and trying some different rerankers (Cohere, etc.).
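When comparing different top k values and rerankers like this, it can help to score each configuration on a small set of questions where the correct article is already known. Below is a rough sketch of two common retrieval metrics, hit rate and mean reciprocal rank (MRR). The function name, the query keys, and the document ids are hypothetical; in practice `results` would be filled from whatever the retriever returns for each question.

```python
def hit_rate_and_mrr(results, expected):
    """Compute hit rate and MRR for a set of labelled queries.

    results:  query -> ranked list of retrieved document ids
    expected: query -> the single document id that should be retrieved
    """
    hits = 0
    reciprocal_rank_sum = 0.0
    for query, target in expected.items():
        ranked = results.get(query, [])
        if target in ranked:
            hits += 1
            # Rank is 1-based, so a target in first place contributes 1.0.
            reciprocal_rank_sum += 1.0 / (ranked.index(target) + 1)
    total = len(expected)
    return hits / total, reciprocal_rank_sum / total


# Hypothetical labelled set: four questions, each with one known source article.
expected = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}
results = {
    "q1": ["a", "x"],            # found at rank 1
    "q2": ["x", "b"],            # found at rank 2
    "q3": ["x", "y"],            # missed
    "q4": ["x", "y", "z", "d"],  # found at rank 4
}

hit_rate, mrr = hit_rate_and_mrr(results, expected)
print(hit_rate, mrr)  # 0.75 0.4375
```

Running the same labelled questions against each setting (for example k = 20 with n = 3 versus n = 6) gives numbers to compare instead of eyeballing individual responses.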