Isn’t FLAN-T5 already a fine-tuned model?

When I studied FLAN-T5 in the lecture, I understood that it was created by fine-tuning the T5 model on a large number of multi-task datasets. However, in the image below, FLAN-T5 is listed as the base model, and on the right it says full fine-tuning. I don't understand.
Attachment: videoframe_366244.png
10 comments
Yes, FLAN-T5 is technically fine-tuned on a ton of instruction datasets.
Because of this, it's generally pretty good at following instructions. And if you want to fine-tune it further, it doesn't require a huge dataset.
Is there a difference between full fine-tuning and FLAN??
I mean, more fine-tuning will just make it more aligned to what you train it on (maybe you have a very specific task).
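If you do go down that road, a further full fine-tune of FLAN-T5 with Hugging Face transformers looks roughly like the sketch below. This is only a sketch: the dataset file, the "question"/"answer" column names, and the hyperparameters are made-up examples.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical task-specific data with "question" and "answer" columns.
dataset = load_dataset("json", data_files="my_task.jsonl")["train"]

def preprocess(example):
    model_inputs = tokenizer(example["question"], truncation=True, max_length=256)
    labels = tokenizer(text_target=example["answer"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-full-ft",
    learning_rate=1e-4,              # full fine-tuning updates every weight, so keep the LR modest
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The only real difference from fine-tuning any other seq2seq model is the starting checkpoint: you begin from the already instruction-tuned FLAN-T5 weights instead of plain T5.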
Oh, please let me know if I understand correctly.
Full fine-tuning updates all weights, so catastrophic forgetting may occur.

However, catastrophic forgetting does not occur in FLAN-T5 because it was trained on multi-task datasets, so it can be used for multi-tasking.

Did I understand it well?
Fine-tuning doesn't always mean catastrophic forgetting, especially if you use a lower learning rate.

Because of FLAN-T5's training, it's really easy to quickly adapt it to specific datasets/use cases without too much data.

But out of the box, I would expect it to work fairly well.
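For example, trying it out of the box is only a few lines (the prompt below is just an illustration):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# No extra fine-tuning: FLAN-T5's instruction tuning lets it follow a prompt like this directly.
prompt = "Summarize: The meeting covered the Q3 roadmap, hiring plans, and the new budget."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```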
Thank you for telling me. I'm a front-end developer, but I've been studying LLMs recently. Please support me so I can become an LLM engineer hahaha
Currently, I am studying prompt tuning (soft prompts), one of the PEFT techniques.


The description of this photo is as follows:

"One potential issue to consider is the interpretability of learned virtual tokens. Remember, because the soft prompt tokens can take any value within the continuous embedding vector space, the trained tokens don't correspond to any known token, word, or phrase in the vocabulary of the LLM. However, an analysis of the nearest-neighbor tokens to the soft prompt location shows that they form tight semantic clusters. In other words, the words closest to the soft prompt tokens have similar meanings. The words identified usually have some meaning related to the task, suggesting that the prompts are learning word-like representations."


But doesn’t this improve the performance of the prompt? Why is this a problem?
Attachment: videoframe_302338.png
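For context, a minimal prompt-tuning setup with the Hugging Face peft library looks roughly like this sketch; the model choice and the number of virtual tokens are just example values.

```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# 20 trainable virtual tokens are prepended to every input; the base model stays frozen.
peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,   # the soft prompt starts as random vectors
    num_virtual_tokens=20,
)
peft_model = get_peft_model(base_model, peft_config)
peft_model.print_trainable_parameters()   # only the soft prompt embeddings are trainable
```

Only those 20 embedding vectors are trained; the quoted paragraph is about what values those learned vectors end up taking.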
Yeah, that's a weird paragraph. I think it's saying the soft prompt tokens have the possibility to take on any meaning, but they usually end up being something that makes sense.
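The "nearest neighbor" analysis it mentions is basically: take each trained virtual token and check which real vocabulary embeddings it sits closest to. A rough sketch of that idea (treat `soft_prompt` as a placeholder; how you pull the learned prompt matrix out of the adapter depends on your peft version):

```python
import torch.nn.functional as F

# Assumptions: `model`/`tokenizer` are the FLAN-T5 model and tokenizer, and `soft_prompt`
# is the trained virtual-token matrix of shape (num_virtual_tokens, d_model), e.g. taken
# from the adapter (in some peft versions: peft_model.prompt_encoder["default"].embedding.weight.detach()).
vocab_emb = model.get_input_embeddings().weight               # (vocab_size, d_model)

# Cosine similarity between every virtual token and every real vocabulary embedding.
sims = F.normalize(soft_prompt, dim=-1) @ F.normalize(vocab_emb, dim=-1).T

top_ids = sims.topk(k=5, dim=-1).indices                      # 5 nearest real tokens per virtual token
for i, ids in enumerate(top_ids):
    print(f"virtual token {i}: {tokenizer.convert_ids_to_tokens(ids.tolist())}")
```

If those neighbors cluster around task-related words, that's the "tight semantic clusters" the lecture describes. It's framed as an interpretability issue, not a performance one: the learned tokens work, they just aren't directly readable as text.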
I don't quite understand. Could you explain a little?