Learning question: I'm using Django to store files, and I'm loading the documents via the file's URL rather than the directory loader. I'm not sure the files are actually being indexed. Like this:

from llama_index import Document, GPTSimpleVectorIndex
from llama_index.node_parser import SimpleNodeParser

# One Document per uploaded source, parsed into nodes and indexed
documents = [Document(source.file.url) for source in project.context.sources.all()]
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
index = GPTSimpleVectorIndex(nodes, service_context=service_context)

When I submit my prompt I'm getting a response that more context is needed. Am I messing up the indexing of the files?
Your indexing is fine

Are you using gpt-3.5? It's been a little stubborn to work with lately (usually the answer refinement is a problem for gpt-3.5; over the last few weeks they "updated" the model, and it seems much worse)
Yeah, I'm using the default GPT-3.5 with text-davinci-003. Wondering if I need to change my prompt for it to better understand the context.
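As an aside, if you want to rule the default model in or out, here's a minimal sketch of pinning the LLM explicitly in the service_context used in the snippet above (the model choice is illustrative, not a claim about what your defaults actually resolve to):

from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext

# Pin the underlying LLM explicitly rather than relying on library defaults.
# gpt-3.5-turbo here is illustrative; temperature=0 keeps answers deterministic.
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)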
I've been working on a new refine prompt, maybe give it a shot.

from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

from llama_index.prompts.prompts import RefinePrompt

# Refine Prompt
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "I have more context below which can be used "
        "(only if needed) to update your previous answer.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, update the previous answer to better "
        "answer my previous query."
        "If the previous answer remains the same, repeat it verbatim. "
        "Never reference the new context or my previous query directly.",
    ),
]


CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
...
index.query("my query", similarity_top_k=3, refine_template=CHAT_REFINE_PROMPT)
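(For what it's worth: with similarity_top_k=3, the default response mode drafts an answer from the first retrieved chunk and then runs the refine template over the remaining chunks, so this prompt gets exercised on any query that retrieves more than one chunk.)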
I've had better results using the Paul Graham essay from the tutorial than with my own document.
You might also want to set a smaller chunk size in the simple parser, to help the embeddings be more specific

from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
parser = SimpleNodeParser(text_splitter=TokenTextSplitter(chunk_size=1024))
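That drops in where SimpleNodeParser() is constructed in the first snippet; the get_nodes_from_documents call and the index construction stay the same.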
sorry, just throwing out a bunch of things that can/should likely be tweaked lol
This is great. Thanks for weighing in here. I wasn't sure if I was just indexing the file path text or if the loader was actually reading the text from the document.
I'm using a DOCX file. Do you think I should just use the DOCX loader for better results too?
Yeah, it would probably help! I think docx files have a bunch of encoding crap that can be parsed out using the proper loader
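As a rough sketch, swapping in that loader might look like this (DocxReader comes from LlamaHub via download_loader; the path below is illustrative and assumes the file is reachable on local disk):

from pathlib import Path
from llama_index import download_loader

# Fetch the LlamaHub docx loader and parse the file's actual contents,
# rather than wrapping the URL string in a Document.
DocxReader = download_loader("DocxReader")
loader = DocxReader()
# Illustrative path; with Django's default FileSystemStorage, source.file.path
# points at the file on disk, unlike source.file.url.
documents = loader.load_data(file=Path("media/uploads/source.docx"))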
gonna try that too. Thanks for the help here.
Much better results changing the loader. Going to try adjusting the prompts for even better results. Spent too long yesterday banging my head on the keyboard. Ha.
Haha awesome! Glad it's improving! πŸ’ͺ
let me know if that refine prompt changes things btw, I might make a PR to change the default one to it (it seems to give better results based on my testing with others so far)
Will do! Digging into refining it tomorrow/next week. Fixing some issues I'm having when deploying it.