
Embeddings fine-tuning: issue when generating the dataset

Hi. I decided to use the "outdated" approach for generating the dataset to be used for embeddings fine-tuning:

https://github.com/run-llama/finetune-embedding/blob/main/generate_dataset.ipynb

I am getting the error
module 'openai' has no attribute 'error'
when performing this step:
train_queries, train_relevant_docs = generate_queries(train_corpus)

I am aware that the newest guidelines are available at
https://docs.llamaindex.ai/en/latest/examples/finetuning/embeddings/finetune_embedding/
but I prefer having control over the prompt (I am not working in English), which is no longer exposed there. A sketch of what I mean is below.

Any idea which versions would let it run smoothly?
The llama-index version I am using is 0.8.5.post2.

The specific error is thrown at

......./llama_index/llms/openai_utils.py:119, in _create_retry_decorator(max_retries)
    111     max_seconds = 10
    112     # Wait 2^x * 1 second between each retry starting with
    113     # 4 seconds, then up to 10 seconds, then 10 seconds afterwards
    114     return retry(
    115         reraise=True,
    116         stop=stop_after_attempt(max_retries),
    117         wait=wait_exponential(multiplier=1, min=min_seconds, max=max_seconds),
    118         retry=(
--> 119             retry_if_exception_type(openai.error.Timeout)

Thanks.
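For reference, here is roughly the kind of call I mean, as a sketch only: generate_queries and train_corpus are defined in that notebook, and the prompt_template keyword is my assumption based on its code, so double-check the actual signature.

```python
# Sketch only: generate_queries / train_corpus come from
# run-llama/finetune-embedding's generate_dataset.ipynb, and the
# prompt_template keyword is assumed from that notebook's code.
custom_prompt = """\
Context information is below.
---------------------
{context_str}
---------------------
Using only the context above, generate {num_questions_per_chunk}
questions (in my target language, not English) that a reader could ask.
"""

train_queries, train_relevant_docs = generate_queries(
    train_corpus,
    num_questions_per_chunk=2,
    prompt_template=custom_prompt,
)
```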
6 comments
I'm pretty sure the "newer" method still has a way to control the prompt?

0.8.x is quite old (6 months maybe?). I'm guessing your openai package version is too new. You probably need openai<1.0
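A quick way to confirm that, as a sketch that just inspects the installed package:

```python
# openai.error only exists in openai < 1.0; in 1.x the exception classes
# moved to the top level (e.g. openai.APITimeoutError), which is why the
# old retry code in llama_index fails with an AttributeError.
import openai

print(openai.__version__)
print(hasattr(openai, "error"))  # True on <1.0, False on >=1.0
```

If it prints False, pinning with pip install "openai<1.0" should restore the module the old code path expects.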
Thanks. I was arriving at the same conclusion (openai version too new). As for the newer approach: no, it does not expose a visible option to adjust the prompt (and more) through configuration.
The newer approach uses generate_qa_embedding_pairs, while the old one used generate_queries.
hmmm
[Attachment: image.png]
I see lots of config there
Here's the default prompt that is being used
[Attachment: image.png]
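For the record, a minimal sketch of passing a custom (non-English, if you like) template without touching common.py. This assumes the 0.8/0.9-era signature of generate_qa_embedding_pairs, so verify the keyword names against your installed version.

```python
# Sketch assuming llama-index 0.8/0.9-era APIs; requires OPENAI_API_KEY
# for the default question-generation LLM.
from llama_index.schema import TextNode
from llama_index.finetuning import generate_qa_embedding_pairs

nodes = [TextNode(text="Your chunk text here.")]  # normally parsed from docs

# Custom template; {context_str} and {num_questions_per_chunk} are the
# placeholders the default template uses.
MY_PROMPT_TMPL = """\
Context information is below.
---------------------
{context_str}
---------------------
Using only the context above, generate {num_questions_per_chunk}
questions in your target language that a reader could ask.
"""

train_dataset = generate_qa_embedding_pairs(
    nodes,
    qa_generate_prompt_tmpl=MY_PROMPT_TMPL,
    num_questions_per_chunk=2,
)
train_dataset.save_json("train_dataset.json")
```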
Hey Logan, thanks very much for pointing to the prompt.
Let me explain better what I meant about the parameters: I knew that generate_qa_embedding_pairs was the (new) place to modify them; it's just that I didn't trust myself enough to change things in the core (common.py), and I wanted to keep the decisions/configuration in the notebook.
So instead, I thought the way embedding fine-tuning was previously done at https://github.com/run-llama/finetune-embedding/tree/main
was much more straightforward for changing what I needed (batch size, epochs, prompts, questions per chunk, etc.), at least in what I called the "configuration", i.e. the notebook.
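To illustrate what I mean by notebook-level configuration, roughly this shape; it assumes, as in that repo, that sentence-transformers does the training, and the model name and pairs are placeholders.

```python
# Sketch of notebook-level knobs for the old-style fine-tuning flow
# (assumes sentence-transformers, as in run-llama/finetune-embedding).
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

BATCH_SIZE = 10   # tweak here, in the notebook,
EPOCHS = 2        # not in library code

# (query, relevant_doc) pairs, e.g. built from the generated dataset
pairs = [("What does fine-tuning do?", "Fine-tuning adapts a pretrained model.")]

model = SentenceTransformer("BAAI/bge-small-en")  # placeholder model
examples = [InputExample(texts=[q, d]) for q, d in pairs]
loader = DataLoader(examples, shuffle=True, batch_size=BATCH_SIZE)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=EPOCHS, show_progress_bar=True)
model.save("finetuned-embedding-model")
```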

Anyway, I managed to downgrade the package and run it through to the end (the old way). I am really happy with the resulting work, and let me once more express my appreciation for both the work and the support.