When I run
documents = SimpleDirectoryReader(input_files=file_paths).load_data()
with a single element in file_paths,
it returns 13 documents. I don't understand why.
With the filename_as_id=True parameter, it just appends a part_X
to each doc id, so that the ids are still deterministic:

documents = []
for x, file_path in enumerate(file_paths):
    docs = SimpleDirectoryReader(input_files=[file_path]).load_data()
    # Add doc_id to documents
    logging.info("Adding doc_id to documents")
    for i in range(len(docs)):
        docs[i].doc_id = f"{azure_path[x]}_part_{i}"
    documents.extend(docs)
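To show the id scheme on its own, here is a minimal sketch that does not need llama_index at all. FakeDoc and assign_part_ids are made-up names for illustration, and "container/report.pdf" is just a placeholder path. The only point is that the ids depend on nothing but the base path and the part index, so loading the same file twice should give the same ids (assuming the loader returns the parts in the same order each time).

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a loaded document; only doc_id matters here."""
    text: str
    doc_id: str = ""


def assign_part_ids(docs, base_id):
    """Give each document the id '<base_id>_part_<index>', in load order."""
    for i, doc in enumerate(docs):
        doc.doc_id = f"{base_id}_part_{i}"
    return docs


# Two separate "loads" of the same three-part file.
first = assign_part_ids([FakeDoc("a"), FakeDoc("b"), FakeDoc("c")], "container/report.pdf")
second = assign_part_ids([FakeDoc("a"), FakeDoc("b"), FakeDoc("c")], "container/report.pdf")

print([d.doc_id for d in first])
print([d.doc_id for d in first] == [d.doc_id for d in second])
```

If the loader ever returned the parts in a different order, the same id would end up on different content, so this scheme relies on a stable load order.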