Updated 3 months ago

Rate limit

Hi guys, I am really new to this type of technology. I have managed to create a VectorStore index from my data using the OpenAI API, and I would like to know how I can create larger VectorStore indexes. I now have a lot of files, and when I try to build the index, I get an OpenAI token limit error. So I was wondering how I can merge/load different VectorStores, or how I can load a lot more files.

This is how I am loading the files (failing code due to token limit):

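# Imports are assumed (not shown in the original post); the model_name
# argument suggests the LangChain OpenAI wrapper.
from langchain.llms import OpenAI
from llama_index import (
    GPTVectorStoreIndex,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
)


def construct_index(directory_path):
    # Maximum number of tokens the LLM may generate per response
    num_outputs = 1024

    llm_predictor = LLMPredictor(
        llm=OpenAI(
            temperature=0.1, model_name="text-davinci-003", max_tokens=num_outputs
        )
    )

    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

    # Load every file in the directory as Document objects
    docs = SimpleDirectoryReader(directory_path).load_data()

    # Embeds everything in one go; this is the call that hits the limit.
    # GPTVectorStoreIndex.from_documents(docs, ...) is the usual entry point,
    # since it chunks the documents into nodes first.
    index = GPTVectorStoreIndex(nodes=docs, service_context=service_context)

    # Save the index to disk so it can be reloaded later
    index.storage_context.persist(persist_dir="index")

    return index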


Thanks in advance for the help 😉
12 comments
Do you have a paid OpenAI account? I know the trial usage is very rate limited.

In any case, you can try lowering the embedding batch size (the default is 10)
https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/usage_pattern.html#batch-size
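
To make the batch size suggestion concrete, here is a minimal, library-free sketch of what batching does: texts are sent to the embedding endpoint in groups, so a smaller batch means fewer tokens per request at the cost of more requests. The names `embed_in_batches` and `fake_embed_batch` are made up for illustration and are not part of LlamaIndex or the OpenAI client. In LlamaIndex itself, as far as I understand the linked page, the equivalent setting is `embed_batch_size` on `OpenAIEmbedding`, passed in through `ServiceContext.from_defaults(embed_model=...)`.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 10,
) -> List[List[float]]:
    """Embed texts by calling embed_batch once per group of batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        # Each slice becomes one request; smaller slices mean smaller requests.
        batch = list(texts[start:start + batch_size])
        embeddings.extend(embed_batch(batch))
    return embeddings


def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a real embedding API call."""
    return [[float(len(text))] for text in batch]


if __name__ == "__main__":
    chunks = ["first chunk", "second chunk", "third chunk"]
    # batch_size=1 sends one chunk per request: slower, but each request is small.
    vectors = embed_in_batches(chunks, fake_embed_batch, batch_size=1)
    print(len(vectors))
```

With a real embedding function swapped in for `fake_embed_batch`, dropping `batch_size` from 10 to 1 should give ten times as many requests, each carrying roughly a tenth of the tokens.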
Yes, I have a paid OpenAI account, but I still reach the limit, so I have to divide the files into 4 smaller folders. But this results in having 4 different vector stores.
Lowering the batch size will likely help then (like lowering to 1)

Although it will just be slower
Is there a way I can merge different VectorStores?
Yeah, there's no merge function really, but you could wrap each index as a tool in an agent or sub-question query engine

Although normally you'd want to do this and sort your data into specific topics per index
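
For anyone wondering what querying several indexes without merging them can look like, here is a rough, library-free sketch. Note that it is not the agent or sub-question query engine mentioned above: it swaps in a much simpler fan-out, where every index is searched and the hits are ranked together by cosine similarity. `ToyIndex`, `query_all`, and the sample vectors are all hypothetical and only stand in for real vector stores.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
ToyIndex = Dict[str, Vector]  # maps a text chunk to its embedding


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_all(
    indexes: Dict[str, ToyIndex], query_vec: Vector, top_k: int = 3
) -> List[Tuple[float, str, str]]:
    """Search every index and return the top_k (score, index name, text) hits."""
    hits: List[Tuple[float, str, str]] = []
    for name, index in indexes.items():
        for text, vec in index.items():
            hits.append((cosine_similarity(query_vec, vec), name, text))
    # Rank hits from all indexes together, best match first.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    indexes = {
        "folder_1": {"cats": [1.0, 0.0]},
        "folder_2": {"dogs": [0.0, 1.0], "kittens": [0.9, 0.1]},
    }
    for score, name, text in query_all(indexes, [1.0, 0.0], top_k=2):
        print(name, text, round(score, 3))
```

The index name is kept with each hit so you can still tell which folder an answer came from, which is roughly the benefit of keeping one index per topic.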
Embeddings are super cheap though, if you need to recreate
So do you recommend that I change the batch size and recreate the index with all the files?
I believe so! Probably worth a shot
Will do and tell you how it goes!! Thanks for your time
@Logan M It worked!! Thanks for the advice. I have a question: does OpenAI store the data I trained on? I read that they don't, but I just want to double-check. Thanks for your time.
They store it for up to 30 days apparently, but they state that it's not used for any training data 🤷‍♂️
Also, glad it works now!