RAG

It is reading documents from the "data" directory, but to do what?
LLMs have a fixed context window, so it is not possible to feed your entire document (say, 100 pages) to the LLM at once.


This is where this code will help you.


When you ask a question about the file, it will extract the chunks that contain a possible answer to your query and pass those to the LLM. That way the LLM understands both your query and the context, and is able to answer.
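A minimal sketch of that chunking step, assuming LlamaIndex's SentenceSplitter; the chunk size and overlap values are illustrative, not prescribed:

Plain Text
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# load raw documents, then split them into chunks ("nodes") small enough
# to fit alongside a question in the LLM's context window
documents = SimpleDirectoryReader("data").load_data()
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)  # illustrative sizes
nodes = splitter.get_nodes_from_documents(documents)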
So this process is called RAG?
You can define the prompt to make it answer like James Bond or Captain Jack Sparrow.
Yes, those are famous people, but what about me? I'm not a famous person.
RAG will give you the documents to use in the few-shot learning I mentioned earlier. I'm doubtful you really need RAG to do this.
You can define your way of answering in the prompt, e.g. "This is how I answer. Always answer in this format..."
If you want an LLM to sound like you, you'll need examples of how you talk/chat
I do, I have 5 PDF documents of me texting and writing essays.
Plain Text
from llama_index.core import PromptTemplate, SimpleDirectoryReader, VectorStoreIndex

text_qa_template_str = (
    "Context information is"
    " below.\n---------------------\n{context_str}\n---------------------\nUsing"
    " both the context information and also using your own knowledge, answer"
    " the question: {query_str}\nIf the context isn't helpful, you can also"
    " answer the question on your own in JACK SPARROW STYLE.\n"
)
text_qa_template = PromptTemplate(text_qa_template_str)

# build the index over the "data" directory and query it with the custom prompt
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(text_qa_template=text_qa_template)
response = query_engine.query("What did the author do growing up?")
print(response)
which is what WhiteFang is showing in code
Yeah, this will clear up your doubts about the basics.
I'm not sure you really need the RAG part at this point
just pick some docs by hand to start with
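For instance, a minimal sketch of that hand-picked few-shot approach; the example exchanges and the style_examples string are hypothetical placeholders you'd replace with real snippets from your own chats and essays:

Plain Text
from llama_index.core import PromptTemplate

# hypothetical examples: replace with real snippets from your own PDFs
style_examples = (
    "Q: how's the project going?\n"
    "A: honestly? chaos, but fun chaos lol\n\n"
    "Q: can you summarize the meeting?\n"
    "A: short version: we ship Friday, panic Thursday\n"
)

few_shot_template = PromptTemplate(
    "Here are examples of how I write:\n---------------------\n"
    + style_examples
    + "---------------------\nAnswer the following question in the same voice: {query_str}\n"
)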
This sounds like a problem where you'd want to try the advanced LLM techniques for extending context.
True, if you only want the LLM to sound like you, then that's not RAG.

If you want the LLM to answer with the help of your documents, as Q&A or chat, whether in your talking style or not,

that is where RAG comes in.
OK wait, I just want to understand: what is RAG? What am I trying to do? I already understood fine-tuning.
OK, and having the llama get indexes from a directory of documents like in the code I pasted, what is that called?
the code you posted is actually doing a few things...

Plain Text
# read documents from disk into Document objects
documents = SimpleDirectoryReader("data").load_data()

# process each document: (1) run it through an embedding model (2) save to a special DB (a vector store)
index = VectorStoreIndex.from_documents(documents)

# ... probably just a factory function with implied settings / workflow (Q&A vs chat, depending on which fn you use here)
query_engine = index.as_query_engine()

# lots of magic here
# 1. embed the question
# 2. get documents from DB by using the output from (1)
# 3. combine both into a prompt, few-shot style
# 4. call the LLM (llama2, openai, ...)
response = query_engine.query("What did the author do growing up?")

# handle the model output, present to user
print(response)
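To make the "magic" steps (1) and (2) concrete, LlamaIndex lets you run the retrieval half on its own; a minimal sketch against the index built above, with an illustrative top-k value:

Plain Text
# embed the question and fetch the closest chunks from the vector store
retriever = index.as_retriever(similarity_top_k=3)  # top-k value is illustrative
nodes = retriever.retrieve("What did the author do growing up?")
for node_with_score in nodes:
    # inspect what would get stuffed into the prompt at step (3)
    print(node_with_score.score, node_with_score.node.get_content()[:100])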
The full process here is typically referred to as a "pipeline". The RAG part is really just the response = query_engine.query("What did the author do growing up?") step (imho); the rest is preparation so you can do that, and then presenting the response to the user.
ohhhh i get it
so this is just querying my documents, but not changing the "style" of the LLM
right, there's a lot of effort that goes into building the database you can use to augment your prompts with relevant context. LlamaIndex makes this surprisingly easy
This:
  1. separates the "build DB" & "use DB" steps into different files (sketched after this list)
  2. uses more advanced techniques / models
  3. has a minimal API
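Separating those two steps with LlamaIndex mostly comes down to persisting the index to disk and reloading it; a minimal sketch, assuming the default local storage backend and an illustrative "storage" directory name:

Plain Text
# build_db.py: embed documents once and persist the index to disk
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="storage")  # directory name is illustrative

# use_db.py: reload the persisted index and query it, no re-embedding needed
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context)
print(index.as_query_engine().query("What did the author do growing up?"))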
I'm working on the NextJS side today
note that it is only a Q&A system today, but from what I understand, switching to chat is a pretty small code change on the LlamaIndex side; more of the effort is in the API/UI.
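For reference, a sketch of that switch on the LlamaIndex side; the chat_mode value shown is just one of several options and is illustrative:

Plain Text
# same index, but a chat interface that keeps conversation history
chat_engine = index.as_chat_engine(chat_mode="condense_question")  # mode is illustrative
print(chat_engine.chat("What did the author do growing up?"))
print(chat_engine.chat("And what did they do after that?"))  # follow-up uses chat history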