Citation query engine

My hunch is that when it goes past the context limit of a single LLM call, something is breaking
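For context, the setup in question is presumably something like the standard CitationQueryEngine example from the llama_index docs. A hedged sketch; the data path, query, and the deliberately huge similarity_top_k are illustrative, not taken from this thread:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import CitationQueryEngine

# Build an index over some local documents.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# A huge top k retrieves far more text than fits in one LLM call,
# which forces the refine path discussed below.
query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=100,     # deliberately huge, as in this thread
    citation_chunk_size=512,  # size of each citable source chunk
)

response = query_engine.query("What does the corpus say about X?")
print(response)  # inline citations like [1], [2] are expected here
```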
I’m way down the rabbit hole
In refine.py, line 169, text_chunks has size 1 even if top k is like 100, i.e. a super huge amount of text
It could be multiple chunks

Before calling that function, it's already iterating over the text chunks (i.e. all 100 of them)

But it checks to make sure each chunk is small enough first (L169 could create multiple chunks, depending on the prompt and the size of the current chunk)
I'm logging text_chunks and it appears to be a one-element list containing all the node text concatenated together
But I don't see the citation query in it; not sure if it's supposed to be there at that point
Nope, it's not added at that point
Right, since it's the compact synthesizer, there's a step in the compact class that tries to put as much node text as possible into each LLM call
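Roughly, that compact step greedily concatenates the retrieved node texts into as few prompt-sized chunks as will fit the context window. A toy illustration of the idea, not the library's actual code (whitespace splitting stands in for a real tokenizer, and the real code additionally splits a single oversized text, which is what L169 handles):

```python
from typing import List

def compact_text_chunks(texts: List[str], max_tokens: int) -> List[str]:
    """Greedily merge texts into chunks that stay under max_tokens."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for text in texts:
        n = len(text.split())  # crude stand-in for a real token count
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(text)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# e.g. 100 retrieved nodes can collapse into a handful of packed chunks
# (like the 8 observed later in this thread), one LLM call each.
```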
Sorry, I'm struggling to get to the root of this
Looking at refine_response_single, I only see an LLM call if streaming is true (it's not for me). Where is the LLM call for the non-streaming case?
Trying to find where we are expecting a response with citations but aren’t getting it
Ok made some progress
It’s finding the answer in the first chunk
I see the StructuredRefineResponse has query_satisfied=True
But then it seems to continue calling refine_response_single 13 times in my case
Right, if you have a large top k (or large chunk size), refine will need to be called repeatedly
Yeah see that now
Okay, so there are 8 chunks, and each produced a nice answer that included citations
It’s just the very final response which has no citations
Seems like the LLM just dropped the ball here? 🤔
There's nothing different about the last LLM call 🤔
Isn't the final response like a mash-up of all the others?
It's supposed to be refining as it goes
But how can it do that if I have like 40k tokens?
Can LLMs actually cite their sources? 3.5 seems pretty bad lol
It shows the LLM the existing answer and the new context, and the LLM has to either repeat or modify its existing answer using the new context. This is the core idea behind how refine works
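A condensed sketch of that loop; this is a simplified stand-in for the synthesizer, not the actual refine.py, though `llm.complete(...).text` does follow the llama_index LLM interface:

```python
def refine_over_chunks(llm, query: str, text_chunks: list[str]) -> str:
    """First chunk produces an initial answer; each later chunk asks the
    LLM to repeat or update that answer. Citations survive only if the
    LLM re-emits them at every step, which is where a weak model can
    silently drop them."""
    answer = None
    for chunk in text_chunks:
        if answer is None:
            prompt = (
                f"Context:\n{chunk}\n\n"
                f"Answer the query, citing sources: {query}"
            )
        else:
            prompt = (
                f"Existing answer:\n{answer}\n\n"
                f"New context:\n{chunk}\n\n"
                f"Refine the existing answer to the query, keeping its "
                f"citations: {query}"
            )
        answer = llm.complete(prompt).text
    return answer
```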
That doesn't surprise me haha. Especially when you are doing 13 refine steps.

Maybe try gpt-4, or change the response mode to tree_summarize?
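Concretely, the two suggestions could look like this. A hedged sketch against a recent llama_index API; exact import paths and kwargs may differ by version:

```python
from llama_index.core import Settings
from llama_index.core.query_engine import CitationQueryEngine
from llama_index.core.response_synthesizers import (
    ResponseMode,
    get_response_synthesizer,
)
from llama_index.llms.openai import OpenAI

# 1. Swap in a stronger model so every refine step reliably re-emits
#    the citations from the previous answer.
Settings.llm = OpenAI(model="gpt-4")

# 2. Or replace refine with tree_summarize, which recursively summarizes
#    the packed chunks instead of rewriting one running answer 13 times.
#    (To keep numbered citations, the citation prompt templates may also
#    need to be passed to the synthesizer -- check your version.)
synth = get_response_synthesizer(response_mode=ResponseMode.TREE_SUMMARIZE)
query_engine = CitationQueryEngine.from_args(
    index,                       # the index built earlier
    response_synthesizer=synth,
    similarity_top_k=100,
)
```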