
Updated 9 months ago


Hello, I'm facing an issue while indexing .txt files into my MongoDB database. When I store my documents in the database, the encoding changes, and I don't know why. I checked my files, and they are all encoded in UTF-8.
Here is my code:

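# Imports, assuming the LlamaIndex >= 0.10 package layout (needed for Settings)
import pymongo
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

# _model, _contextWindow, _numOutput, _mongoURI, _index and _directory are
# configuration values defined elsewhere in the script (not shown here).

print("Initializing OpenAI...")

llm = AzureOpenAI(
    model=_model.ChatModel.Model,
    deployment_name=_model.ChatModel.Name,
    api_key=_model.Key1,
    azure_endpoint=_model.Server,
    api_version=_model.ChatModel.ApiVersion,
)

embed_model = AzureOpenAIEmbedding(
    model=_model.LearningModel.Model,
    deployment_name=_model.LearningModel.Name,
    api_key=_model.Key1,
    azure_endpoint=_model.Server,
    api_version=_model.LearningModel.ApiVersion,
)

# Register the models and context settings globally
Settings.llm = llm
Settings.embed_model = embed_model
Settings.context_window = _contextWindow
Settings.num_output = _numOutput

# Set up the parameters for queries on MongoDB Atlas
print("Initializing MongoDB...")

mongodb_client = pymongo.MongoClient(_mongoURI)
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=_index)

storage_context = StorageContext.from_defaults(vector_store=store)

print("Starting the import...")

reader = SimpleDirectoryReader(
    _directory,
    recursive=True,
    encoding="utf-8",
    required_exts=[".pdf", ".docx", ".pptx", ".csv", ".txt", ".xml", ".dng"],
)

# Iterate over each file: iter_data() yields the list of Documents for one file at a time
for docs in reader.iter_data():
    print("F > " + docs[0].get_text())
    VectorStoreIndex.from_documents(docs, storage_context=storage_context)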


It seems that it's VectorStoreIndex that changes the encoding, so my question is: can I force VectorStoreIndex to encode in UTF-8? If so, how can I do that?
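The post says the input files were checked and are UTF-8. One way to verify that claim is a minimal sketch using only the standard library; find_non_utf8 is a hypothetical helper, not part of LlamaIndex. It walks the directory recursively (as recursive=True does) and strictly decodes each plain-text file, skipping binary formats such as .pdf, .docx or .pptx, which are not plain text.

```python
from pathlib import Path


def find_non_utf8(directory, suffixes=(".txt", ".csv", ".xml")):
    """Return the plain-text files under `directory` that are not valid UTF-8."""
    bad = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            try:
                # Strict decoding raises on any byte sequence that is not valid UTF-8
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return bad
```

Calling find_non_utf8(_directory) with the script's own directory variable should return an empty list if every text file really is UTF-8. Note that pure ASCII files also pass, since ASCII is a subset of UTF-8.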
It's not really possible. You'd have to make the change inside the MongoDBAtlasVectorSearch class.
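Before patching the class, it may help to pin down what "the encoding changes" actually looks like, since the thread doesn't show the stored text. A minimal sketch, assuming you can compare a snippet of an original file with the same snippet as it appears in MongoDB: the hypothetical helper classify_change (standard library only) separates two common symptoms. Sequences like \u00e8 are JSON escapes (Python's json.dumps escapes non-ASCII characters by default), and they parse back to the identical text. Sequences like Ã¨ are mojibake, meaning UTF-8 bytes were decoded with a single-byte codec somewhere. Which of these, if either, happens in this pipeline is not confirmed in the thread.

```python
import json


def classify_change(original: str, stored: str) -> str:
    """Hypothetical helper: guess how `stored` relates to `original`."""
    if stored == original:
        return "identical"
    # json.dumps with the default ensure_ascii=True writes non-ASCII characters
    # as \uXXXX escapes; [1:-1] strips the surrounding double quotes.
    if stored == json.dumps(original)[1:-1]:
        return "json-escaped"
    # Mojibake: the UTF-8 bytes were decoded with a single-byte codec.
    for codec in ("latin-1", "cp1252"):
        try:
            if original.encode("utf-8").decode(codec) == stored:
                return f"mojibake ({codec})"
        except UnicodeDecodeError:
            # cp1252 leaves a few byte values undefined
            pass
    return "different"


print(classify_change("paramètres", "paramètres"))        # identical
print(classify_change("paramètres", "param\\u00e8tres"))  # json-escaped
print(classify_change("paramètres", "paramÃ¨tres"))       # mojibake (latin-1)
```

If the result is json-escaped, the data itself is intact and only needs json.loads when read back; if it is mojibake, the fix belongs wherever bytes are being decoded with the wrong codec.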