I'm trying to find the "best" way to load PDFs. For now I've settled on pymupdf4llm: I was using the PdfMarkdownReader from the marker-py lib, but I was hoping to avoid the insanely long indexing time.
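For anyone who wants to compare loaders, here's a rough sketch of how the conversion time could be measured. `time_loader` and its `convert` parameter are names I made up for illustration. The default path assumes `pymupdf4llm` is installed and uses its `to_markdown` function, but any callable that takes a path and returns text can be passed in instead.

```python
import time
from typing import Callable, Optional, Tuple


def time_loader(
    path: str, convert: Optional[Callable[[str], str]] = None
) -> Tuple[str, float]:
    """Convert one PDF to text and return (text, elapsed_seconds).

    `convert` is a hypothetical hook: any function mapping a file path
    to a string, so different loaders can be timed the same way.
    """
    if convert is None:
        # Assumption: pymupdf4llm is installed. It is imported lazily so
        # other loaders can be timed without it.
        import pymupdf4llm

        convert = pymupdf4llm.to_markdown

    start = time.perf_counter()
    text = convert(path)
    elapsed = time.perf_counter() - start
    return text, elapsed
```

Something like `text, seconds = time_loader("report.pdf")` would then give a number to compare against another reader on the same file ("report.pdf" is just a placeholder).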
I think SentenceSplitters require torch, which makes sense, but won't they take a long while on any decent server, if it only goes at 2.19 it/s on my computer with a 24 GB 4090?
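To put that rate in perspective, here's a back-of-the-envelope estimate. It's only a sketch: I'm assuming one "iteration" is one unit of work (a page or a chunk), and the 1000-item workload is a made-up number, not a measurement.

```python
def estimated_seconds(num_items: int, items_per_second: float) -> float:
    """Rough wall-clock estimate for processing num_items at a fixed rate."""
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    return num_items / items_per_second


# Hypothetical workload: 1000 items at the 2.19 it/s quoted above.
minutes = estimated_seconds(1000, 2.19) / 60
print(f"{minutes:.1f} min")  # prints "7.6 min"
```

A slower machine scales this linearly, so at half the rate the same workload would take roughly twice as long.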
I think it would be beneficial to offer that to users too, because the only change required would be using a vision LLM to "see" what's on the page instead of using torch.