Hi,

I am new to RAG. I pulled the Wikipedia article about SQL injection and wrote the code below, following the LlamaIndex tutorials from DeepLearning.AI:

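```python
# Imports assumed from the course setup (LlamaIndex 0.9.x); in 0.10+ these live under llama_index.core
from llama_index import SimpleDirectoryReader, Document
from llama_index.llms import OpenAI

# Helpers defined in the course's utils.py, not part of LlamaIndex itself
from utils import build_sentence_window_index, get_sentence_window_query_engine

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)  # assumed; any LlamaIndex LLM works here

# Load the Wikipedia article saved as plain text
documents = SimpleDirectoryReader(
    input_files=["./docs/sql-injection.txt"]
).load_data()

# Merge all loaded pages into a single Document
document = Document(text="\n\n".join([doc.text for doc in documents]))

# Build the sentence-window index; save_dir is where the helper keeps the persisted index
sentence_index = build_sentence_window_index(
    document,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="sentence_index"
)

sentence_window_engine = get_sentence_window_query_engine(sentence_index)

window_response = sentence_window_engine.query("what is sql injection?")
print(str(window_response))
```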


And it returns:


The context does not provide information on what SQL injection is.

What is happening here?

When I pass it a research paper in PDF format, it reads the PDF and returns the answer. What's wrong with the sql-injection.txt scenario?
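For readers new to the technique: the course's `build_sentence_window_index` helper is assumed to wrap LlamaIndex's `SentenceWindowNodeParser`, which embeds each sentence as its own node and stores the surrounding sentences (`window_size` on each side) in the node's metadata, so the query engine can hand the wider window to the LLM. The stdlib-only sketch below illustrates that idea. `SentenceNode`, `split_sentences` and `sentence_window_nodes` are hypothetical names, and the regex splitter is far cruder than a real sentence tokenizer.

```python
import re
from dataclasses import dataclass


@dataclass
class SentenceNode:
    """Hypothetical stand-in for a sentence-window node."""
    text: str    # the single sentence that is embedded and matched at query time
    window: str  # the sentence plus its neighbours, meant for the LLM prompt


def split_sentences(text: str) -> list[str]:
    # Naive split after '.', '!' or '?' followed by whitespace (illustration only).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_window_nodes(text: str, window_size: int = 3) -> list[SentenceNode]:
    """Create one node per sentence, keeping `window_size` sentences on each side."""
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(SentenceNode(text=sentence, window=" ".join(sentences[start:end])))
    return nodes


if __name__ == "__main__":
    for node in sentence_window_nodes("One. Two. Three. Four.", window_size=1):
        print(f"{node.text!r:10} -> {node.window!r}")
```

With `window_size=1`, the node for "Two." carries the window "One. Two. Three.": retrieval matches on the short sentence, while the LLM sees the wider context.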
Could you run

```python
print(window_response.source_nodes)
```

This way you can check whether the issue is in the LLM's synthesis or in retrieving the correct source nodes.
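To act on this suggestion without reading raw object dumps, a small helper can summarize each retrieved node. It assumes only that the items in `source_nodes` look like LlamaIndex's `NodeWithScore` (a `.score` attribute and a `.node` with `get_content()`). `describe_source_nodes` and `fake_node` are hypothetical names; `fake_node` merely builds stand-ins so the sketch runs without an index. If none of the retrieved nodes mention the topic you asked about, the problem lies in retrieval (or in the index itself), not in LLM synthesis.

```python
from types import SimpleNamespace


def describe_source_nodes(source_nodes, keyword: str, preview: int = 60) -> list[dict]:
    """Summarize retrieved nodes and flag whether each one mentions `keyword`.

    Expects NodeWithScore-like objects: `.score` plus `.node.get_content()`.
    """
    rows = []
    for item in source_nodes:
        text = item.node.get_content()
        rows.append({
            "score": item.score,
            "mentions_keyword": keyword.lower() in text.lower(),
            "preview": text[:preview].replace("\n", " "),
        })
    return rows


def fake_node(score: float, text: str) -> SimpleNamespace:
    # Hypothetical stand-in for a NodeWithScore, used only for this demo.
    return SimpleNamespace(score=score, node=SimpleNamespace(get_content=lambda: text))


if __name__ == "__main__":
    nodes = [fake_node(0.81, "Text from an earlier PDF ..."),
             fake_node(0.78, "Another chunk of the earlier PDF ...")]
    # With a real query: describe_source_nodes(window_response.source_nodes, "SQL injection")
    for row in describe_source_nodes(nodes, keyword="SQL injection"):
        print(row)
```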
Oh, `print(window_response.source_nodes)` shows that it still has the old content from the old documents; earlier I used different PDFs.

I now load only this file:

```python
documents = SimpleDirectoryReader(
    input_files=["./docs/sql-injection.txt"]
).load_data()
```

and the source nodes still show the old content, even though I ran all the cells again.
So `document` holds the current file's text, but it is different from what the source nodes show:
[Attachment: Screenshot_2024-03-30_at_07.43.47.png]
Shouldn't `window_response.source_nodes` contain the content from `documents`?
[Attachment: Screenshot_2024-03-30_at_07.44.49.png]
Instead, `window_response.source_nodes` contains content from the old `documents` variable, which pointed to the old PDF.
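One plausible cause, assuming the course's `build_sentence_window_index` helper follows the common persist-or-load pattern (build and persist the index when `save_dir` does not exist, otherwise load it from disk): on later runs the existing `sentence_index` directory is loaded and the new `documents` are never indexed, so re-running the cells does not change the results. Under that assumption, deleting the `sentence_index` folder or passing a new `save_dir` forces a rebuild. The stdlib-only sketch below is hypothetical (`build_or_load`, its `check_fingerprint` flag and the `index.json` file are not LlamaIndex APIs). With `check_fingerprint=False` it mimics the naive directory-exists check; by default it also stores a hash of the input text and rebuilds when that hash changes.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(text: str) -> str:
    # Hash of the input text; it changes whenever the document changes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_or_load(text: str, save_dir: str, check_fingerprint: bool = True) -> dict:
    """Hypothetical persist-or-load cache for a toy "index".

    check_fingerprint=False reproduces the naive pattern: if the directory
    already holds an index, it is reused no matter what text was passed in.
    """
    index_file = Path(save_dir) / "index.json"
    fp = fingerprint(text)
    if index_file.exists():
        cached = json.loads(index_file.read_text(encoding="utf-8"))
        if not check_fingerprint or cached["fingerprint"] == fp:
            cached["loaded_from_disk"] = True
            return cached
    # "Build" the toy index (just naive sentence chunks) and persist it.
    index = {"fingerprint": fp, "chunks": [s for s in text.split(". ") if s]}
    index_file.parent.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(index), encoding="utf-8")
    index["loaded_from_disk"] = False
    return index


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        save_dir = str(Path(tmp) / "sentence_index")
        build_or_load("Old PDF text. About something else.", save_dir)
        stale = build_or_load("SQL injection text. More.", save_dir, check_fingerprint=False)
        fresh = build_or_load("SQL injection text. More.", save_dir)
        print("naive:", stale["chunks"])   # still the old chunks
        print("checked:", fresh["chunks"])  # rebuilt from the new text
```

Keying the cache on the content rather than on the directory's existence is one way to avoid silently reusing an index built from a different document.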