Hi,

I am new to RAG. I pulled an article about SQL injection from Wikipedia and wrote the code below, following the LlamaIndex tutorials from DeepLearning.AI:

documents = SimpleDirectoryReader(
    input_files = ["./docs/sql-injection.txt"]
).load_data()

document = Document(text="\n\n".join([doc.text for doc in documents]))

sentence_index = build_sentence_window_index(
    document,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="sentence_index"
)

sentence_window_engine = get_sentence_window_query_engine(sentence_index)

window_response = sentence_window_engine.query("""
        what is sql injection?
""")
print(str(window_response))

And it returns:

The context does not provide information on what SQL injection is.

What is happening here?

When I pass it a research paper in PDF format, it reads the PDF and returns an answer. What's wrong in the sql-injection.txt case?

7 comments

Teemu

Could you run:

print(window_response.source_nodes)

This way you can check whether the issue is with the LLM synthesis or with the retrieval of the correct source nodes.
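
A minimal sketch of that check, in case the raw print is hard to read. It assumes each entry in source_nodes exposes .score and a .node with get_content() and a metadata dict, which is how LlamaIndex's NodeWithScore behaves as far as I know. The helper name describe_source_nodes and the stand-in objects are hypothetical, included only so the snippet runs without an index.

```python
from types import SimpleNamespace


def describe_source_nodes(source_nodes, max_chars=200):
    """Return one summary line per retrieved node: score, source file, text preview."""
    lines = []
    for i, item in enumerate(source_nodes):
        node = item.node
        # Collapse whitespace so each preview fits on one line.
        preview = " ".join(node.get_content().split())[:max_chars]
        file_name = node.metadata.get("file_name", "<unknown>")
        score_str = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"[{i}] score={score_str} file={file_name} text={preview!r}")
    return lines


# Stand-in for window_response.source_nodes (hypothetical data, not real output).
fake_nodes = [
    SimpleNamespace(
        score=0.42,
        node=SimpleNamespace(
            get_content=lambda: "Text from an older PDF about something else.",
            metadata={"file_name": "old-paper.pdf"},
        ),
    )
]

for line in describe_source_nodes(fake_nodes):
    print(line)
```

With a real response you would pass window_response.source_nodes instead of fake_nodes. If the previews come from the wrong document, the problem is in retrieval or the index, not in the LLM synthesis step. Since the code in the question merges everything into a single Document, file_name will probably be missing there, so the text preview is the part to look at.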

Abhimanyu Aryan 🧞

Oh, print(window_response.source_nodes) shows that it still has the old content from the old documents.

Abhimanyu Aryan 🧞

Earlier I used different PDFs here:

documents = SimpleDirectoryReader(
    input_files = ["./docs/sql-injection.txt"]
).load_data()

and it still shows those as the content, even though I ran all the cells again.

Abhimanyu Aryan 🧞

So my document variable has the current file's content.

Abhimanyu Aryan 🧞

The two contents are different:

Attachment

Abhimanyu Aryan 🧞

Shouldn't window_response.source_nodes contain the content from documents?

Attachment

Abhimanyu Aryan 🧞

Instead, window_response.source_nodes contains content from the old documents variable, which pointed to the old PDF.
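
One possible explanation, offered as an assumption and not a confirmed diagnosis: if build_sentence_window_index only builds the index when save_dir does not exist yet, and otherwise loads whatever was persisted there, then rerunning the cells with a new file would keep returning nodes from the earlier PDFs. The sketch below imitates that pattern with a hypothetical build_or_load_index function (it is not the tutorial's code, and the JSON file is just a stand-in for a persisted index).

```python
import json
import shutil
import tempfile
from pathlib import Path


def build_or_load_index(text, save_dir):
    """Hypothetical stand-in for an index builder that caches on disk.

    If save_dir already exists, the cached index is loaded and `text` is ignored.
    """
    path = Path(save_dir)
    if not path.exists():
        path.mkdir(parents=True)
        (path / "index.json").write_text(json.dumps({"text": text}))
    return json.loads((path / "index.json").read_text())


workdir = Path(tempfile.mkdtemp())
save_dir = workdir / "sentence_index"

# First run: the index is built from the old document and persisted.
first = build_or_load_index("contents of an old PDF", save_dir)

# Second run with a new document: the directory exists, so the old index is loaded.
stale = build_or_load_index("SQL injection article", save_dir)

# Removing the persisted directory forces a rebuild from the new document.
shutil.rmtree(save_dir)
fresh = build_or_load_index("SQL injection article", save_dir)

print(stale["text"])
print(fresh["text"])

shutil.rmtree(workdir)  # clean up the temporary directory
```

If that is what is happening here, deleting the sentence_index directory or passing a different save_dir before rebuilding should make source_nodes reflect sql-injection.txt.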
