You are passing storage_context to the service_context kwarg while creating the index.
It should look like this:
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
# OR define the service context globally, so there's no need to pass it anywhere
from llama_index import set_global_service_context
set_global_service_context(service_context)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
does the line "documents = SimpleDirectoryReader("./Sources/5-1-placement.pdf").load_data()" go after this, or should I remove it?
I keep getting the same error:
You need to update this:
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
You are passing storage_context to the service_context kwarg.
No, just replace the line where you are creating the index.
Comment out the line where you are getting the error, and move the global setting up one line.
Yes, but with a small change; let me write out this part:
from llama_index import set_global_service_context
mongodb_client = pymongo.MongoClient(_mongoURI)
db_name = f"{dossier}"
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=db_name)
storage_context = StorageContext.from_defaults(vector_store=store)
# You need to create the service context above this line
set_global_service_context(service_context)
documents = SimpleDirectoryReader("./Sources").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
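For the "create the service context above this line" comment, a minimal sketch of what that could look like (the LLM and embedding model below are placeholders I'm assuming; swap in whatever you actually use, e.g. the AzureOpenAIEmbedding shown later in this thread):
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding
# Placeholder models: replace with your own LLM / embedding configuration
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo"),
    embed_model=OpenAIEmbedding(),
)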
It works now, thank you! ^^
Hello @WhiteFang_Jr, I'm facing an issue. What you gave me yesterday works, but when I want to index something like 150 documents in a row it takes too long (about 1h or more, or it never ends). Is there a way to make the code below index one document, return a 200 status, and then move on to the next document until all documents are indexed?
dossier = requestDTO.Index
# Initialize the parameters for MongoDB Atlas requests
mongodb_client = pymongo.MongoClient(_mongoURI)
db_name = f"{dossier}"
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=db_name)
storage_context = StorageContext.from_defaults(vector_store=store)
# Create or update an index from the documents in the 'Sources' folder
set_global_service_context(service_context)
documents = SimpleDirectoryReader("./Sources/Zephyr").load_data()
#index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
#while documents:
# index.add_documents([documents], storage_context=storage_context)
responseDTO = IndexCreationResponse.IndexCreationResponseDTO(False, None, "The index was successfully created or updated.")
# Done, send the final response
return GenerateIndexResponse(requestDTO, responseDTO), 200
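(Aside on the commented-out loop: VectorStoreIndex has no add_documents method. If you wanted to ingest documents one at a time, a rough sketch, reusing the storage_context from above, would be:)
# Sketch: start from an empty index bound to the Mongo-backed vector store,
# then insert documents one by one; each insert embeds and writes that document's nodes
index = VectorStoreIndex.from_documents([], storage_context=storage_context)
for doc in documents:
    index.insert(doc)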
You can run the indexing process on a separate thread and simply return a response saying that indexing is in progress.
Are you using any Python framework?
Can you show me how to split it off onto a thread?
Sure, it will look something like this:
from flask import Flask, request, jsonify
import threading
app = Flask(__name__)
def process_data(data):
# Perform the indexing here!
# Process data here (simulated by printing)
print(f"Processing data: {data}")
# Simulate a long-running task
# Replace this with your actual data processing logic
import time
time.sleep(5)
return f"Processed data: {data}"
@app.route('/process', methods=['POST'])
def process():
# Receive your files and pass them to the method
data = request.json # Assuming data is sent in JSON format
# Start a new thread to process the data
thread = threading.Thread(target=process_data, args=(data,))
thread.start()
# Return an immediate response to the client
return jsonify({"message": "Indexing data on a separate thread."})
if __name__ == '__main__':
app.run(debug=True)
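For example, a client call against that route could look like this (the URL and payload are illustrative, not from your app):
import requests
# The route returns immediately while indexing continues on the background thread
resp = requests.post("http://127.0.0.1:5000/process", json={"folder": "./Sources/Zephyr"})
print(resp.status_code, resp.json())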
For that, do I need my index app.route to call the process app.route?
No, it's just an example to show you how you can implement threading.
- You need to create a method where you will do the indexing.
- Use threading, as shown in the example above, inside the API method where you are currently doing the indexing.
Okay, so I just need to call my processing function from my index method to run the indexing on a separate thread, right?
Could you make a more precise example with the code I gave you? I don't really understand what I have to do.
Okay, let me try to add it to your code.
Can you send me your entire API method, so that I can make the change?
This should work for your case
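Something along these lines, as a sketch (it reuses the names from your snippet; the route/DTO details and the Flask-style setup are assumptions on my side, and pymongo, MongoDBAtlasVectorSearch, etc. are imported as in your existing code):
import threading

def run_indexing(dossier: str) -> None:
    # Background job: everything that used to run inline in the API method
    # (assumes set_global_service_context(service_context) was already called at startup)
    mongodb_client = pymongo.MongoClient(_mongoURI)
    store = MongoDBAtlasVectorSearch(mongodb_client, db_name=f"{dossier}")
    storage_context = StorageContext.from_defaults(vector_store=store)
    documents = SimpleDirectoryReader("./Sources/Zephyr").load_data()
    VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Inside your API method:
dossier = requestDTO.Index
threading.Thread(target=run_indexing, args=(dossier,), daemon=True).start()
# Respond immediately; indexing continues in the background
responseDTO = IndexCreationResponse.IndexCreationResponseDTO(False, None, "Indexing started in the background.")
return GenerateIndexResponse(requestDTO, responseDTO), 200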
Sorry for the late answer. This works, but the problem now is the storage quota in my Mongo database, so I need to upgrade that first.
Huge thanks for your help!
Also, you can increase the embedding batch size; that will also reduce the time for index building.
from llama_index.embeddings import AzureOpenAIEmbedding
embed_model = AzureOpenAIEmbedding(
model=model.LearningModel.Model,
deployment_name=model.LearningModel.Name,
api_key=openai.api_key,
azure_endpoint=openai.base_url,
api_version=openai.api_version,
embed_batch_size=50 # This is by default 10
)
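That embed_model then gets wired into the service context you set globally; a minimal sketch (the llm is left at its default here, adjust as needed):
from llama_index import ServiceContext, set_global_service_context
# Use the Azure embedding model (with the larger batch size) for all index building
service_context = ServiceContext.from_defaults(embed_model=embed_model)
set_global_service_context(service_context)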