The problem I have is that the prompt sent to the LLM talks about sources, and the LLM also answers with "sources", even when there is only one text (I think it means sources as in chunks)
ah, the citation query engine also mentions sources in its prompt inputs. It treats each retrieved text chunk as a source and prompts the LLM to write in-text citations
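If you want to reword that, something like this might work. This is a rough sketch assuming the pre-0.10 llama-index import paths and that `CitationQueryEngine.from_args` accepts a `citation_qa_template` override; double-check against your installed version:

```python
# Sketch: override the default citation prompt so it doesn't insist on "sources".
# Assumes legacy (pre-0.10) llama-index imports; adjust to your version.
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.prompts import PromptTemplate
from llama_index.query_engine import CitationQueryEngine

# Hypothetical prompt wording that avoids "sources" when you only expect one chunk.
SINGLE_SOURCE_QA_TEMPLATE = PromptTemplate(
    "Use the numbered excerpt(s) below to answer the question, and cite the "
    "excerpt number in brackets, e.g. [1].\n"
    "------\n"
    "{context_str}\n"
    "------\n"
    "Question: {query_str}\n"
    "Answer: "
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = CitationQueryEngine.from_args(
    index,
    citation_qa_template=SINGLE_SOURCE_QA_TEMPLATE,  # reword the default "sources" phrasing
    citation_chunk_size=512,  # each chunk becomes a numbered source in the prompt
)
response = query_engine.query("What does the document say about X?")
print(response)
```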
It's truncated because by default llama-index only leaves room for 256 output tokens. Additionally, lots of models also default to stopping at 256 output tokens.
So you can change the model config (e.g. the LLM's max_tokens), as well as set num_output=300 or similar in the service context
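Something along these lines. This is a rough sketch assuming the legacy ServiceContext API (pre llama-index 0.10) and the OpenAI LLM wrapper; parameter names may differ in your version:

```python
# Sketch: raise both the model's output cap and the room llama-index reserves for the answer.
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", max_tokens=300)  # raise the model's own output limit

service_context = ServiceContext.from_defaults(
    llm=llm,
    num_output=300,  # tell llama-index to reserve ~300 tokens of prompt room for the answer
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```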