You are passing storage_context to the service_context kwarg while creating the index.
It should look like this:
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
# OR define the service context globally, so there's no need to pass it anywhere
from llama_index import set_global_service_context
set_global_service_context(service_context)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
does the line "documents = SimpleDirectoryReader("./Sources/5-1-placement.pdf").load_data()" go after this, or should I remove it?
I keep getting the same error:
You need to update this:
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
You are passing storage_context to the service_context kwarg.
No, just replace the line where you are creating the index.
Comment out the line where you are getting the error, and move the global setting up one line.
Yes, but with a small change; let me write out this part:
from llama_index import set_global_service_context
mongodb_client = pymongo.MongoClient(_mongoURI)
db_name = f"{dossier}"
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=db_name)
storage_context = StorageContext.from_defaults(vector_store=store)
# You need to create the service context above this line
set_global_service_context(service_context)
documents = SimpleDirectoryReader("./Sources").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
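For the "create the service context above this line" comment, a minimal sketch of what that could look like (the LLM and embedding model below are placeholders I'm assuming; swap in whatever you actually use, e.g. the AzureOpenAIEmbedding shown later in this thread):
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding
# Placeholder models: replace with your own LLM / embedding configuration
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo"),
    embed_model=OpenAIEmbedding(),
)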
It works now, thank you! ^^
Hello @WhiteFang_Jr, I'm facing an issue. What you gave me yesterday works, but when I want to index something like 150 documents in a row it takes too long (about 1h or more, or it never ends). Is there a way to make the code below index one document, return a 200 status, and then move on to the next document until all documents are indexed?
dossier = requestDTO.Index
# Initialize the parameters for MongoDB Atlas requests
mongodb_client = pymongo.MongoClient(_mongoURI)
db_name = f"{dossier}"
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=db_name)
storage_context = StorageContext.from_defaults(vector_store=store)
# Create or update an index from the documents in the 'Sources' folder
set_global_service_context(service_context)
documents = SimpleDirectoryReader("./Sources/Zephyr").load_data()
#index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
#while documents:
# index.add_documents([documents], storage_context=storage_context)
responseDTO = IndexCreationResponse.IndexCreationResponseDTO(False, None, "The index was successfully created or updated.")
# Done, send the final response
return GenerateIndexResponse(requestDTO, responseDTO), 200
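(Aside on the commented-out loop: VectorStoreIndex has no add_documents method. If you wanted to ingest documents one at a time, a rough sketch, reusing the storage_context from above, would be:)
# Sketch: start from an empty index bound to the Mongo-backed vector store,
# then insert documents one by one; each insert embeds and writes that document's nodes
index = VectorStoreIndex.from_documents([], storage_context=storage_context)
for doc in documents:
    index.insert(doc)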
You can run the indexing process on a separate thread and simply return a response saying that indexing is in progress.
Are you using any Python framework?
Can you show me how to split it off onto a thread?
Sure, it will look something like this:
from flask import Flask, request, jsonify
import threading
app = Flask(__name__)
def process_data(data):
# Perform the indexing here!
# Process data here (simulated by printing)
print(f"Processing data: {data}")
# Simulate a long-running task
# Replace this with your actual data processing logic
import time
time.sleep(5)
return f"Processed data: {data}"
@app.route('/process', methods=['POST'])
def process():
# Receive your files and pass them to the method
data = request.json # Assuming data is sent in JSON format
# Start a new thread to process the data
thread = threading.Thread(target=process_data, args=(data,))
thread.start()
# Return an immediate response to the client
return jsonify({"message": "Indexing data on a separate thread."})
if __name__ == '__main__':
app.run(debug=True)
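For example, a client call against that route could look like this (the URL and payload are illustrative, not from your app):
import requests
# The route returns immediately while indexing continues on the background thread
resp = requests.post("http://127.0.0.1:5000/process", json={"folder": "./Sources/Zephyr"})
print(resp.status_code, resp.json())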
For that, do I need my index app.route to call the process app.route?
No, it's just an example to show you how you can implement threading.
- You need to create a method where you will do the indexing.
- Use threading, as shown in the example above, inside the API method where you are currently doing the indexing.
Okay, so I just need to call my processing function from my index method to run the indexing on a separate thread, right?
Could you make a more precise example with the code I gave you? I don't really understand what I have to do.
Okay, let me try to add it to your code.
Can you send me your entire API method, so that I can make the change?
This should work for your case
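Something along these lines, as a sketch (it reuses the names from your snippet; the route/DTO details and the Flask-style setup are assumptions on my side, and pymongo, MongoDBAtlasVectorSearch, etc. are imported as in your existing code):
import threading

def run_indexing(dossier: str) -> None:
    # Background job: everything that used to run inline in the API method
    # (assumes set_global_service_context(service_context) was already called at startup)
    mongodb_client = pymongo.MongoClient(_mongoURI)
    store = MongoDBAtlasVectorSearch(mongodb_client, db_name=f"{dossier}")
    storage_context = StorageContext.from_defaults(vector_store=store)
    documents = SimpleDirectoryReader("./Sources/Zephyr").load_data()
    VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Inside your API method:
dossier = requestDTO.Index
threading.Thread(target=run_indexing, args=(dossier,), daemon=True).start()
# Respond immediately; indexing continues in the background
responseDTO = IndexCreationResponse.IndexCreationResponseDTO(False, None, "Indexing started in the background.")
return GenerateIndexResponse(requestDTO, responseDTO), 200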
Sorry for the late answer. This works, but the problem now is the storage quota in my Mongo database, so I need to upgrade that first.
Huge thanks for your help!
Also, you can increase the embedding batch size; that will also reduce the time for index building.
from llama_index.embeddings import AzureOpenAIEmbedding
embed_model = AzureOpenAIEmbedding(
model=model.LearningModel.Model,
deployment_name=model.LearningModel.Name,
api_key=openai.api_key,
azure_endpoint=openai.base_url,
api_version=openai.api_version,
embed_batch_size=50 # This is by default 10
)
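That embed_model then gets wired into the service context you set globally; a minimal sketch (the llm is left at its default here, adjust as needed):
from llama_index import ServiceContext, set_global_service_context
# Use the Azure embedding model (with the larger batch size) for all index building
service_context = ServiceContext.from_defaults(embed_model=embed_model)
set_global_service_context(service_context)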