Thanks. So, request ‘context’ in the prompt?
not quite
Use the dataset generator to generate questions
Ask those questions to a query engine, with the finetuning handler attached
The fine tuning handler records the llm inputs and outputs, which includes the retrieved context
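as a rough sketch of what the handler ends up recording (stdlib-only, hypothetical placeholder strings — llama_index's real prompt template differs, but the JSONL layout is the chat format openai fine-tuning expects):

```python
import json

# Hypothetical sketch of ONE recorded event: the handler captures each
# LLM call as an OpenAI chat-format training record, so the retrieved
# context is baked into the user message the query engine constructed.
record = {
    "messages": [
        {"role": "system", "content": "Answer using only the provided context."},
        {
            "role": "user",
            "content": "Context:\n<retrieved chunk 1>\n<retrieved chunk 2>\n\n"
                       "Question: <generated question>",
        },
        {"role": "assistant", "content": "<the LLM's answer>"},
    ]
}

# one JSON object per line is the JSONL file the fine-tuning job consumes
jsonl_line = json.dumps(record)
print(jsonl_line[:30])
```

the point being: you never pass the context explicitly — it rides along inside the prompt the query engine already built.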
On another thought, it’s not very economical to extract the closest vector/context with gpt-4, no?
it's not, and llamaindex doesn't use gpt-4 for that 🙂
There's two models, the LLM and the embedding model
the default embedding model is text-embedding-ada-002
from openai
you can also use local embedding models
The LLM can change at any time, but the embedding model has to be constant across indexing and querying
If you change embed models, you need to re-index all your data
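to illustrate the retrieval step (toy stdlib sketch; made-up 3-d vectors standing in for ada-002's 1536-d embeddings, not llama_index code):

```python
import math

def cosine(a, b):
    # cosine similarity: dot product over the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# the "index": chunk -> embedding, all produced by ONE embed model
index = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.1, 0.9, 0.1],
}

# at query time the SAME model embeds the question, then we take the
# nearest chunk; vectors from a different model live in a different
# space, which is why changing embed models forces a full re-index
query_vec = [0.8, 0.2, 0.1]
best = max(index, key=lambda k: cosine(query_vec, index[k]))
print(best)  # chunk_a
```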
Ok, so it’s just a simple embedding of the question and retrieval to extract context, right?
I got worried when I saw gpt-4 as a model in that function
well, it's using gpt-4 to answer the query using the retrieved context
you can change it to gpt-3.5 if you want
gpt-4 will generate higher-quality training data though
depends on how many questions you want to run 🙂
i don't know, i've gone through the code, and i am now using the fine-tuned model in the openai playground, and it just gives me general answers, or doesn't give me the answers that were in the training set
it is clearly not fine-tuned
the thing is here, that the goal is to fine-tune for RAG, not to fine-tune for general knowledge.
Fine-tuning for general knowledge generally does not work too well, and is usually not a good idea either.
I would only fine-tune to inherit some kind of personality, or to train it to understand domain specific terms. But I would continue using RAG
your point being, fine-tune the model to use it later in rag-based q&a?
have you done the research, say retrieve top 3 vectors + summary by the existing model vs. retrieve top 3 vectors + summary by a fine-tuned model? is it really worth it?
I would say fine-tuning is not really worth it, unless you have a super specific use-case or domain
especially with openai, because the LLM costs are quite a bit higher for fine-tuned models
here is where i am coming from: say i have a legal textbook, 'how to conduct legal research, analysis and write legal briefs'. it contains both content (the what: 'a primary legal source is ...') and instructions (the how: 'to brief a case, here are the steps ...')
and as you probably saw, i want to figure out how to teach an llm to reason legally
but before that, i wanted to get the right setup in place.
fine-tuned model 1: how to do x, y, z
fine-tuned model 2: what is x, y, z
the latter could be rag + a fine-tuned model
but i definitely need model 1
i tried routing and agents; once they find the first hit, that's it
i need to 'embed' into a model the 'how' element
how to apply rule by analogy , etc
sorry for my verbose background
got this result
{'ragas_score': 0.9058, 'answer_relevancy': 0.9560, 'faithfulness': 0.8606}
seems pretty good to me tbh for ragas
the initial one : {'ragas_score': 0.8664, 'answer_relevancy': 0.9721, 'faithfulness': 0.7814}
That seems like a good improvement then!
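fwiw, the overall ragas_score is just the harmonic mean of the component metrics — a quick stdlib check against the two runs reported above (the numbers bear it out to 4 decimals):

```python
from statistics import harmonic_mean

# the two runs above: (answer_relevancy, faithfulness)
finetuned = (0.9560, 0.8606)
initial = (0.9721, 0.7814)

# harmonic mean punishes a low faithfulness much harder than a plain
# average would, so the faithfulness gain drives the score improvement
print(round(harmonic_mean(finetuned), 4))  # 0.9058
print(round(harmonic_mean(initial), 4))    # 0.8664
```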
doesn't ragas need to be fine-tuned on legal knowledge too? 🙂
i wonder how it assesses the relevancy
plus, isn't ragas for rag retrievals?
i'll keep digging, and maybe try fine-tuning mistral or llama 2 for comparison.
thank you for all the help, Logan, and sorry i kept you busy