Try lowering the sentence window to 2. If that doesn't work, either your data isn't really sentence-based, or our sentence splitter is doing a very bad job.
I checked the data, and all of the documents are sentence-based. Some are downloaded HTML pages that include irrelevant markup for menus; the others are PDFs. I'm not entirely sure how SimpleDirectoryReader preprocesses the files, so could it be that I should clean up the HTML files first?
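If the menu markup does turn out to be polluting the sentences, one option is to strip it before indexing. Below is a minimal sketch using only the Python standard library. It assumes the menus and other boilerplate live inside `nav`, `header`, `footer`, or `aside` elements, which is a guess about these pages and may need adjusting. The names `html_to_text`, `SKIP_TAGS`, and `BLOCK_TAGS` are hypothetical helpers for illustration, not part of LlamaIndex.

```python
from html.parser import HTMLParser

# Assumption: boilerplate (menus, scripts, page chrome) sits inside these tags.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Block-level tags after which a space is inserted so paragraphs don't fuse.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested in SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse all runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string with boilerplate removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The cleaned text could then be written out as plain `.txt` files and loaded in place of the raw HTML, so the sentence splitter only sees the actual article sentences.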