Custom LLM pipeline

Hello guys, I have one question. When we define a custom LLM like here:
https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-using-a-custom-llm-model
why are we using the text-generation task instead of, for example, question answering? Is there a reason behind this? And could we also use question answering?
I used the text-generation class because the input is a raw prompt, and the model needs to read that and "continue" the prompt by answering the question.

You could use a question-answering pipeline too, or really anything, as long as it returns an answer. But the question-answering pipeline takes a specific format (the question and context need to be passed separately), which would require more complex string parsing to pull the prompt apart.
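
To make the difference concrete, here is a rough sketch using Hugging Face transformers directly (the model names, prompt, and question are just examples, not what the linked docs page uses): a text-generation pipeline accepts the assembled prompt as one string, while a question-answering pipeline wants the question and context as separate arguments.

```python
from transformers import pipeline

# Text-generation takes the fully assembled prompt as a single string, so the
# prompt LlamaIndex builds (context + question together) can be passed as-is.
text_gen = pipeline("text-generation", model="facebook/opt-iml-1.3b")
prompt = (
    "Context information: LlamaIndex splits documents into chunks.\n"
    "Question: What does LlamaIndex do with documents?\n"
    "Answer:"
)
print(text_gen(prompt, max_new_tokens=64)[0]["generated_text"])

# Question-answering expects the question and context as separate fields, so the
# prompt string would have to be parsed back apart before calling the pipeline.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
print(qa(
    question="What does LlamaIndex do with documents?",
    context="LlamaIndex splits documents into chunks.",
))
```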
Aha. I thought it was because it "re-organizes" the information from the embeddings that were retrieved
One question about this as well :P. Does the re-organization happen with LLMs or with text-splitting?
From my understanding, it's some kind of parsing and text splitting.
So when you input some documents into an index, they get broken into chunks so that they hopefully fit into the model's context size at query time.


Then when you query (this is for a vector index), the top_k nodes that best match the query are retrieved, and an answer to the query is refined across all of those nodes. This means that once the model gives an answer, LlamaIndex presents the LLM with the next chunk of context and asks whether it needs to change the previous answer.
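
As a rough sketch of that flow, assuming current llama_index import paths (the older gpt-index releases the linked docs describe use different class names such as GPTSimpleVectorIndex, and the directory path and question here are placeholders):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Documents get split into chunks (nodes) when the index is built.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# At query time the top_k most similar nodes are retrieved, and one answer is
# refined across them, node by node.
query_engine = index.as_query_engine(similarity_top_k=3, response_mode="refine")
print(query_engine.query("How do I install the library?"))
```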
Got it, thank you very much! For context, I'm trying to create a bot to read an entire documentation set, like the example you provided. I got weird results when I tried various models, though: for example, flan-t5 gave empty responses while opt-iml-1.3b gave decent results, which I thought was curious.
Do you think this is due to the "customization" of the LLM or due to the LLM itself? Could customizing the CustomLLM make flan-t5 give reasonable results like opt-iml does?
Flan has a very limited context size, which means that if it needs to refine an answer, it can quickly run out of room in the input (since the refine step needs the prompt template + question + context + previous answer).

Even without the refine step, the 512-token limitation makes things a little tricky.
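
A quick way to see the problem is to count tokens for a refine-shaped prompt. This is only a sketch: the template below is a stand-in for LlamaIndex's actual refine prompt, and the question, previous answer, and context text are made up.

```python
from transformers import AutoTokenizer

# Flan-T5's encoder input is limited to roughly 512 tokens.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

question = "How do I configure logging?"
previous_answer = "Set the LOG_LEVEL environment variable before starting the app."
new_context = "Logging is configured through the LOG_LEVEL variable. " * 30  # stand-in for a retrieved chunk

# Shape of a refine-style prompt: template + question + new context + previous answer.
refine_prompt = (
    f"The original question is as follows: {question}\n"
    f"We have provided an existing answer: {previous_answer}\n"
    "We have the opportunity to refine the existing answer with some more context below.\n"
    f"------------\n{new_context}\n------------\n"
    "Given the new context, refine the original answer."
)

n_tokens = len(tokenizer(refine_prompt)["input_ids"])
print(f"refine prompt uses {n_tokens} tokens of a ~512-token budget")
```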
Oh I get it. Thanks for your help πŸ™‚ and awesome framework!
One more thing: which structure do you think is more suitable for my problem (a chatbot over documentation), a tree index or a vector index?
I think I would start with a vector index πŸ‘
You can also make some more complex structures, such as many sub-indexes wrapped by a top-level index: https://gpt-index.readthedocs.io/en/latest/how_to/index_structs/composability.html

But really, a vector index is pretty capable of handling most things.
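
A rough sketch of that composability pattern, written against the older API that the linked page documents (class names like GPTVectorStoreIndex, GPTListIndex, and ComposableGraph moved or were renamed in later releases, and the directory paths, summaries, and question here are placeholders):

```python
from llama_index import GPTVectorStoreIndex, GPTListIndex, SimpleDirectoryReader
from llama_index.indices.composability import ComposableGraph

# One sub-index per section of the documentation.
api_docs = SimpleDirectoryReader("./docs/api").load_data()
guide_docs = SimpleDirectoryReader("./docs/guides").load_data()
api_index = GPTVectorStoreIndex.from_documents(api_docs)
guide_index = GPTVectorStoreIndex.from_documents(guide_docs)

# Wrap the sub-indexes with a top-level index; the summaries tell the top
# level what each child covers so queries can be routed to the right one.
graph = ComposableGraph.from_indices(
    GPTListIndex,
    [api_index, guide_index],
    index_summaries=[
        "API reference for the project",
        "How-to guides and tutorials",
    ],
)

print(graph.as_query_engine().query("How do I configure the client?"))
```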