
RAG

At a glance

The community member is new to building RAG (Retrieval-Augmented Generation) pipelines and is trying to build one over their school's CS department website so they can ask it basic questions. They crawled the site and its deeper links with Firecrawl, cleaned up the resulting large JSON file, and converted it to a text file. They have set up Hugging Face embeddings but are stuck on what indexing and storing actually mean and how to do them effectively, and the LLM (Large Language Model) hallucinates a lot even when given context.

In the comments, another community member explains that an index is the structure that holds the embeddings and nodes and that query engine and chat engine instances are built from, while storing refers to where the embeddings and nodes are persisted, either locally or in a vector database. For the hallucination issue, they suggest trying Llama 3.2 if the user has a GPU, as it performs well.

Another community member confirms that the original poster is on the right track in exploring different ways of chunking the data to improve the data ingestion portion of their code.

Hi guys. I'm new to building RAG pipelines. Currently I'm trying to build a RAG that interfaces with my school's CS department info so I can ask it basic questions. I have a couple of questions.

  1. I used Firecrawl to crawl the website and its deeper links, so I have a huge JSON file. I cleaned that JSON up and converted it to a .txt file.
  2. I used Hugging Face embeddings, but from here I'm stuck on what indexing and storing actually mean and how I should do them effectively. Right now the LLM, even when given context, hallucinates A LOT! (Roughly, my setup so far looks like the sketch below.)
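Here's roughly what I have so far, as a sketch (assuming LlamaIndex; cs_dept.txt and the embedding model name are stand-ins for my actual setup):

```python
from llama_index.core import SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load the cleaned-up text file produced from the Firecrawl JSON.
documents = SimpleDirectoryReader(input_files=["cs_dept.txt"]).load_data()

# Embed locally with a Hugging Face model.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```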
3 comments
Hey,
An index is basically the structure that holds your embeddings and nodes; your query engine and chat engine instances are built on top of it.

Storing would mean where you keep the embeddings and nodes, which is either locally on disk or in a vector database.
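For example, something like this (a rough sketch assuming LlamaIndex; the file name and persist directory are placeholders):

```python
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

# Indexing: build a vector index over your documents; query and
# chat engines are created from it.
documents = SimpleDirectoryReader(input_files=["cs_dept.txt"]).load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Storing locally: persist the embeddings and nodes to disk...
index.storage_context.persist(persist_dir="./storage")

# ...then reload them later instead of re-embedding everything.
# (For a vector database, you'd plug a vector store such as Chroma
# into the StorageContext instead.)
index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage")
)
```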

As for hallucinations, open-source LLMs are prone to them. If you have a GPU, try Llama 3.2; it's pretty good.
Thanks for the response. I guess I was thinking I could improve the data ingestion portion of my code. I'm going down the rabbit hole of chunking data. Am I on the right track?
Yes, if the default chunking isn't giving you good results, you can try different chunking strategies and chunk sizes.
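Something like this, for example (a sketch assuming LlamaIndex's SentenceSplitter; the numbers are just starting points to experiment with):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(input_files=["cs_dept.txt"]).load_data()

# Experiment with chunk size and overlap: smaller chunks give more
# precise retrieval, larger ones keep more context together.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)

index = VectorStoreIndex(nodes)
```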