
Updated 3 months ago

Hello, I'm coding a script that stores my index in a MongoDB database. I'm using Flask, MongoDB, OpenAI, llama_index, and LangChain. When I execute the code I get the error "StorageContext has no attribute callback_manager". How can I fix this?

Here's the code that doesn't work:
mongodb_client = pymongo.MongoClient(_mongoURI)
db_name = f"{dossier}"
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=db_name)

storage_context = StorageContext.from_defaults(vector_store=store)

# Create or update an index from the documents in the 'Sources' folder

documents = SimpleDirectoryReader("./Sources/5-1-placement.pdf").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=storage_context)

responseDTO = IndexCreationResponse.IndexCreationResponseDTO(False, None, "L'index a bien été créé ou a été mis à jour.")
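For context, this kind of AttributeError arises because the library dereferences `callback_manager` on whatever object is passed as `service_context`, and a `StorageContext` has no such attribute. A minimal stand-alone reproduction of the same pattern (the class names here are stand-ins, not the real llama_index classes):

```python
class StorageContext:
    """Stand-in: holds storage settings, has no callback_manager."""
    pass

class ServiceContext:
    """Stand-in: the kind of object the kwarg actually expects."""
    callback_manager = object()

def from_documents(documents, service_context):
    # The library internally reads this attribute, so passing a
    # StorageContext here raises AttributeError.
    return service_context.callback_manager

try:
    from_documents([], service_context=StorageContext())
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError
```

Passing the wrong object to a keyword argument only fails at the point where the callee first touches a missing attribute, which is why the traceback mentions `callback_manager` rather than the call site.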
28 comments
You are passing storage_context to the service_context kwarg while creating the index.

It should be like this:
Plain Text
    index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)


## OR define the service context globally, so there is no need to pass it anywhere
from llama_index import set_global_service_context

set_global_service_context(service_context)

index = VectorStoreIndex.from_documents(documents,storage_context=storage_context)
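The "define globally" option relies on a module-level default that library calls fall back to when no explicit value is passed. A plain-Python sketch of that pattern (illustrative names only, not the llama_index internals):

```python
_global_service_context = None

def set_global_service_context(ctx):
    """Store a module-level default, mirroring llama_index's helper."""
    global _global_service_context
    _global_service_context = ctx

def from_documents(documents, service_context=None, storage_context=None):
    # Fall back to the global default when no explicit context is given.
    ctx = service_context or _global_service_context
    if ctx is None:
        raise ValueError("no service context configured")
    return f"indexed {len(documents)} docs with {ctx}"

set_global_service_context("my-service-context")
print(from_documents(["doc1", "doc2"]))  # indexed 2 docs with my-service-context
```

The trade-off is the usual one with globals: every call in the process shares the same default, which is convenient for a single-configuration app but awkward if different indexes need different LLM or embedding settings.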
Does the line `documents = SimpleDirectoryReader("./Sources/5-1-placement.pdf").load_data()` go after this, or should I remove it?
I keep getting the same error:
[Attachment: image.png]
You need to update this:

index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)

You are passing storage_context to the service_context kwarg.
No, just replace the line where you are creating the index.
Comment out the line where you are getting the error and move the global setting up one line.
Like this?
[Attachment: image.png]
Yes, but with a little change. Let me write down this part:
Plain Text
from llama_index import set_global_service_context

mongodb_client = pymongo.MongoClient(_mongoURI)
db_name = f"{dossier}"
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=db_name)

storage_context = StorageContext.from_defaults(vector_store=store)

# You need to create the service context above this line
set_global_service_context(service_context)
documents = SimpleDirectoryReader("./Sources").load_data()
index = VectorStoreIndex.from_documents(documents,storage_context=storage_context)
It works now, thank you! ^^
Hello @WhiteFang_Jr, I'm facing an issue. What you gave me yesterday works, but when I want to index something like 150 documents in a row it takes too long (about 1 hour or more, or it never ends). Is there a way to make the code below index one document, return status 200, and then continue with the next document until all documents are indexed?


dossier = requestDTO.Index

# Initialize the parameters for queries against MongoDB Atlas

mongodb_client = pymongo.MongoClient(_mongoURI)
db_name = f"{dossier}"
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=db_name)

storage_context = StorageContext.from_defaults(vector_store=store)

# Create or update an index from the documents in the 'Sources' folder
set_global_service_context(service_context)
documents = SimpleDirectoryReader("./Sources/Zephyr").load_data()
#index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

#while documents:
# index.add_documents([documents], storage_context=storage_context)


responseDTO = IndexCreationResponse.IndexCreationResponseDTO(False, None, "L'index a bien été créé ou a été mis à jour.")

# Done, send the final response

return GenerateIndexResponse(requestDTO, responseDTO), 200
You can run the indexing process on a separate thread and simply return a response saying that indexing is in progress.

Are you using any Python framework?
I'm using these:
[Attachment: image.png]
Can you show me how to separate it onto a thread?
Sure, it will look something like this:
Plain Text
from flask import Flask, jsonify, request
import threading

app = Flask(__name__)

def process_data(data):
    # Perform the indexing here!!!
    # Process data here (simulated by printing)
    print(f"Processing data: {data}")
    # Simulate a long-running task
    # Replace this with your actual data processing logic
    import time
    time.sleep(5)
    return f"Processed data: {data}"

@app.route('/process', methods=['POST'])
def process():
    # Receive your files and send them to the indexing method
    data = request.json  # Assuming data is sent in JSON format
    # Start a new thread to process the data
    thread = threading.Thread(target=process_data, args=(data,))
    thread.start()
    # Return an immediate response to the client
    return jsonify({"message": "Indexing data on a separate thread."})

if __name__ == '__main__':
    app.run(debug=True)
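One caveat with bare `threading.Thread`: every request spawns a fresh thread, and an exception inside it disappears silently. A bounded pool from the standard library keeps concurrency in check and lets the caller inspect the result or failure later; a sketch of the same idea, independent of Flask:

```python
from concurrent.futures import ThreadPoolExecutor

# At most two indexing jobs run at once; extra submissions queue up.
executor = ThreadPoolExecutor(max_workers=2)

def process_data(data):
    # Replace with the real indexing logic.
    return f"Processed data: {data}"

# submit() returns immediately with a Future, so an HTTP handler can
# respond right away and the job keeps running in the background.
future = executor.submit(process_data, {"folder": "Sources"})
print(future.result())  # Processed data: {'folder': 'Sources'}
```

In a Flask route you would typically keep the `Future` (or just fire and forget), return the JSON acknowledgement immediately, and call `future.exception()` somewhere to surface indexing errors instead of losing them.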
For that, does my index route need to call the /process route?
No, it's just an example to show you how you can implement threading.

  • You need to create a method where you will do the indexing.
  • Use threading, as in the example above, inside the API method where you are currently doing the indexing.
Okay, so I just need to call my process function from my index method to put my documents on a separate thread, right?
Could you make a more precise example with the code I gave you? I don't really understand what I have to do.
Okay, let me try to add it to your code.
Can you send me your entire API method so that I can make the change?
This should work for your case
Sorry for the late answer. This works, but the problem is the storage quota in my Mongo database, so I need to sort that out first.
huge thanks for your help
Awesome, happy to help
Also, you can increase the batch size; that will also reduce the index-building time.

Plain Text
    embed_model = AzureOpenAIEmbedding(
        model=model.LearningModel.Model,
        deployment_name=model.LearningModel.Name,
        api_key=openai.api_key,
        azure_endpoint=openai.base_url,
        api_version=openai.api_version,
        embed_batch_size=50  # the default is 10
    )
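The speed-up from a larger `embed_batch_size` comes from making fewer round trips to the embedding endpoint. The batching itself is just list chunking; a stdlib sketch where `embed_all` stands in for the embedding loop and list lengths stand in for real vectors:

```python
def chunked(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def embed_all(texts, batch_size=50):
    calls = 0
    vectors = []
    for batch in chunked(texts, batch_size):
        calls += 1                                # one network round trip per batch
        vectors.extend([len(t) for t in batch])   # stand-in for real embeddings
    return vectors, calls

# 120 texts with batch size 50 -> 3 round trips instead of 120
_, calls = embed_all(["x"] * 120, batch_size=50)
print(calls)  # 3
```

The trade-off is payload size per request: a very large batch can hit the provider's per-request token limits, so 50 is a reasonable middle ground rather than a hard rule.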