I’m way down the rabbit hole
In refine.py, at line 169, text_chunks has size 1 even when top k is something like 100, i.e. a super huge amount of text
It could be multiple
Before calling that function, it's already iterating over the text chunks (i.e. 100 of them)
But it first checks that each chunk is small enough (line 169 could split the current chunk into multiple chunks, depending on the prompt and the size of that chunk)
I’m logging text_chunks and it appears to just be a one-element list with all the node text concatenated together
But I don’t see the citation query in it, not sure if it’s supposed to be there at that point
Nope, it’s not in there at that point
Right, since it's the compact synthesizer, there's a step in the compact class that packs as much node text as possible into each LLM call
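Very roughly it's doing something like this (just a sketch of the idea, not the actual llama_index code):
```python
# Sketch of the "compact" step (illustration only, not the real llama_index code):
# concatenate the retrieved node texts and re-pack them into as few chunks as
# possible, each small enough to fit into a single LLM call alongside the prompt.
# (The real version works off the prompt's token budget and also handles
# single texts that are too big on their own; this sketch doesn't.)

def compact_text_chunks(node_texts, max_tokens_per_chunk, count_tokens):
    packed, current = [], ""
    for text in node_texts:
        candidate = f"{current}\n\n{text}" if current else text
        if count_tokens(candidate) <= max_tokens_per_chunk:
            current = candidate              # still fits, keep packing
        else:
            if current:
                packed.append(current)       # flush the filled chunk
            current = text                   # start a new chunk
    if current:
        packed.append(current)
    return packed
```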
Sorry, I’m struggling to get to the root of this
Looking at refine_response_single, I only see an LLM call if streaming is true (it’s not for me). Where is the LLM call for non-streaming?
I’m trying to find where we’re expecting a response with citations but aren’t getting one
It’s finding the answer in the first chunk
I see the StructuredRefineResponse has query_satisfied=True
But then it seems to keep calling refine_response_single (13 times in my case)
Right, if you have a large top k (or a large chunk size), refine will need to be called multiple times
Okay, so there are 8 chunks, and each one produced a nice answer that included citations
It’s just the very final response that has no citations
Seems like the LLM just dropped the ball here? 🤔
There's nothing different about the last LLM call 🤔
Isn’t the final response like a mash-up of all the others?
It's supposed to be refining as it goes
But how can it do that if I have like 40k tokens?
Can LLMs actually cite their sources? gpt-3.5 seems pretty bad at it lol
It shows the LLM the existing answer and the new context, and the LLM has to either repeat or modify its existing answer using that new context. That's the core idea behind how refine works
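As a very simplified sketch (assuming a plain text-completion `llm` callable; the real implementation has fancier prompts plus the structured query_satisfied check):
```python
# Simplified sketch of the refine loop (illustration only):
# the first chunk answers the question from scratch; every later chunk
# gets a chance to update that answer.

def refine_over_chunks(llm, query, chunks):
    answer = None
    for chunk in chunks:
        if answer is None:
            # first pass: plain question answering over the first chunk
            answer = llm(f"Context:\n{chunk}\n\nQuestion: {query}\nAnswer:")
        else:
            # later passes: show the existing answer plus the new context and
            # ask the model to keep or revise it
            answer = llm(
                f"The original question is: {query}\n"
                f"We have an existing answer: {answer}\n"
                f"Here is some new context:\n{chunk}\n"
                "Refine the existing answer with the new context, "
                "or repeat it if the context isn't useful."
            )
    return answer or ""
```
Which also means anything you want preserved (like citations) has to survive every one of those refine passes; one sloppy pass at the end and it's gone from the final answer.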
That doesn't surprise me haha, especially when you're doing 13 refine steps.
Maybe try gpt-4, or change the response mode to tree_summarize?
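Something like this, if you're on a standard query engine setup (exact kwargs depend on your llama_index version, and `index` is whatever index you already have):
```python
# Assumes an existing `index` (e.g. a VectorStoreIndex) and that your LLM is
# already configured elsewhere (e.g. gpt-4 instead of gpt-3.5).
query_engine = index.as_query_engine(
    similarity_top_k=10,               # fewer chunks -> fewer LLM calls
    response_mode="tree_summarize",    # build the answer hierarchically instead of refining
)
response = query_engine.query("your question here")
```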