
Embeddings fine-tuning: issue when generating the dataset

Hi. I decided to use the "outdated" approach for generating the dataset to be used for embeddings fine-tuning:

https://github.com/run-llama/finetune-embedding/blob/main/generate_dataset.ipynb

I am getting the error
module 'openai' has no attribute 'error'
when performing this step:
train_queries, train_relevant_docs = generate_queries(train_corpus)

I am aware that the newest guidelines are available at
https://docs.llamaindex.ai/en/latest/examples/finetuning/embeddings/finetune_embedding/
but I prefer having control over the prompt (I am not working in English), which is no longer exposed there. A sketch of what I mean is below.

Any idea which versions would let it run smoothly?
The llama-index version I am using is 0.8.5.post2.

The specific error is thrown at

......./llama_index/llms/openai_utils.py:119, in _create_retry_decorator(max_retries)
    111     max_seconds = 10
    112     # Wait 2^x * 1 second between each retry starting with
    113     # 4 seconds, then up to 10 seconds, then 10 seconds afterwards
    114     return retry(
    115         reraise=True,
    116         stop=stop_after_attempt(max_retries),
    117         wait=wait_exponential(multiplier=1, min=min_seconds, max=max_seconds),
    118         retry=(
--> 119             retry_if_exception_type(openai.error.Timeout)

Thanks.
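For reference, here is roughly the kind of call I mean, as a sketch only: generate_queries and train_corpus are defined in that notebook, and the prompt_template keyword is my assumption based on its code, so double-check the actual signature.

```python
# Sketch only: generate_queries / train_corpus come from
# run-llama/finetune-embedding's generate_dataset.ipynb, and the
# prompt_template keyword is assumed from that notebook's code.
custom_prompt = """\
Context information is below.
---------------------
{context_str}
---------------------
Using only the context above, generate {num_questions_per_chunk}
questions (in my target language, not English) that a reader could ask.
"""

train_queries, train_relevant_docs = generate_queries(
    train_corpus,
    num_questions_per_chunk=2,
    prompt_template=custom_prompt,
)
```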
6 comments
I'm pretty sure the "newer" method still has a way to control the prompt?

0.8.x is quite old (6 months maybe?). I'm guessing your openai package version is too new. You probably need openai<1.0
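A quick way to confirm that, as a sketch that just inspects the installed package:

```python
# openai.error only exists in openai < 1.0; in 1.x the exception classes
# moved to the top level (e.g. openai.APITimeoutError), which is why the
# old retry code in llama_index fails with an AttributeError.
import openai

print(openai.__version__)
print(hasattr(openai, "error"))  # True on <1.0, False on >=1.0
```

If it prints False, pinning with pip install "openai<1.0" should restore the module the old code path expects.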
Thanks. I was arriving at the same conclusion (openai version too new). As for the newer approach: no, it does not expose a visible option to adjust the prompt (and more) through configuration.
The newer approach uses generate_qa_embedding_pairs, while the old one used generate_queries.
hmmm
[Attachment: image.png]
I see lots of config there
Here's the default prompt that is being used
[Attachment: image.png]
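For the record, a minimal sketch of passing a custom (non-English, if you like) template without touching common.py. This assumes the 0.8/0.9-era signature of generate_qa_embedding_pairs, so verify the keyword names against your installed version.

```python
# Sketch assuming llama-index 0.8/0.9-era APIs; requires OPENAI_API_KEY
# for the default question-generation LLM.
from llama_index.schema import TextNode
from llama_index.finetuning import generate_qa_embedding_pairs

nodes = [TextNode(text="Your chunk text here.")]  # normally parsed from docs

# Custom template; {context_str} and {num_questions_per_chunk} are the
# placeholders the default template uses.
MY_PROMPT_TMPL = """\
Context information is below.
---------------------
{context_str}
---------------------
Using only the context above, generate {num_questions_per_chunk}
questions in your target language that a reader could ask.
"""

train_dataset = generate_qa_embedding_pairs(
    nodes,
    qa_generate_prompt_tmpl=MY_PROMPT_TMPL,
    num_questions_per_chunk=2,
)
train_dataset.save_json("train_dataset.json")
```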
Hey Logan, thanks very much for pointing to the prompt.
Let me explain better what I meant about the parameters: I knew that generate_qa_embedding_pairs was the (new) place to modify them; it's just that I didn't trust myself enough to change things in the core (common.py), and I wanted to keep the decisions/configuration in the notebook.
So instead, I thought the way embedding fine-tuning was previously done at https://github.com/run-llama/finetune-embedding/tree/main
was much more straightforward for changing what I needed (batch size, epochs, prompts, questions per chunk, etc.), at least in what I called the "configuration", i.e. the notebook.
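To illustrate what I mean by notebook-level configuration, roughly this shape; it assumes, as in that repo, that sentence-transformers does the training, and the model name and pairs are placeholders.

```python
# Sketch of notebook-level knobs for the old-style fine-tuning flow
# (assumes sentence-transformers, as in run-llama/finetune-embedding).
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

BATCH_SIZE = 10   # tweak here, in the notebook,
EPOCHS = 2        # not in library code

# (query, relevant_doc) pairs, e.g. built from the generated dataset
pairs = [("What does fine-tuning do?", "Fine-tuning adapts a pretrained model.")]

model = SentenceTransformer("BAAI/bge-small-en")  # placeholder model
examples = [InputExample(texts=[q, d]) for q, d in pairs]
loader = DataLoader(examples, shuffle=True, batch_size=BATCH_SIZE)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=EPOCHS, show_progress_bar=True)
model.save("finetuned-embedding-model")
```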

Anyway, I managed to downgrade the package and run it through to the end (the old way). I am really happy with the resulting work, and let me once more express my appreciation for both the work and the support.