The community member is building a RAG application for complex PDFs and has found that the indexing step now takes significantly longer locally (a couple of minutes or more) than it does when the application is hosted on Streamlit community cloud (~45 seconds). Neither the code nor the data has changed. The community member has tried multiprocessing to speed up the ingestion pipeline, but indexing is still not as fast as it used to be. The community member is unsure of the cause and is asking for suggestions.
The comments suggest that the slower local runs could be caused by other processes competing for CPU on the local machine. They also note that the Streamlit cloud machine may itself be under heavier load, which would slow execution on that platform. No definitive answer is provided in the comments.
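One rough way to test the CPU-contention idea from the comments is to compare wall-clock time with CPU time for the indexing step. The sketch below uses only the standard library; `time_stage` and `fake_index` are hypothetical names, and `fake_index` stands in for the real indexing call. If wall time is much larger than CPU time, the process is spending most of its time waiting (on other processes, I/O, or the network) rather than computing. Note that `time.process_time()` only counts the current process, so work done in multiprocessing workers would not show up here.

```python
import time


def time_stage(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process only
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def fake_index(n):
    # Hypothetical stand-in for the real indexing step.
    return sum(i * i for i in range(n))


result, wall, cpu = time_stage(fake_index, 200_000)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

Running the same measurement locally and on the hosted app would show whether the gap comes from computation or from waiting.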
Hi everyone, I'm building a RAG application for complex PDFs and am running into a strange issue where my indexing step is suddenly taking a lot longer than it used to. I am hosting my app on Streamlit community cloud, and indexing still takes a reasonable amount of time there (~45 seconds) whereas it suddenly takes a good couple of minutes+ locally (and shows a lot more progress bars than it used to). The code hasn't changed at all (I also tested this with an earlier version of my code that ran fine locally before), so I'm confused as to what could've caused this. I've tried multiprocessing during my ingestion pipeline to speed things up, but it still doesn't index nearly as fast as it used to. The data I'm using has not changed at all, either. I'm assuming the issue is with my machine or perhaps dependencies, but any ideas would be much appreciated, thanks!
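Since dependencies are one of the suspects, a quick check is to record the installed package versions in both environments and compare them. This is a minimal sketch using the standard library's `importlib.metadata`; `snapshot_versions` and `diff_versions` are hypothetical helper names, and the package list is a placeholder to replace with whatever the pipeline actually imports.

```python
import sys
from importlib import metadata


def snapshot_versions(packages):
    """Return {name: installed version or None} plus the Python version."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def diff_versions(local, remote):
    """Return {name: (local, remote)} for every entry that differs."""
    keys = sorted(set(local) | set(remote))
    return {
        k: (local.get(k), remote.get(k))
        for k in keys
        if local.get(k) != remote.get(k)
    }


# Placeholder package names; substitute the ones your app depends on.
local = snapshot_versions(["streamlit", "pypdf"])
for name, version in sorted(local.items()):
    print(f"{name}=={version}")
```

Printing the same snapshot from the hosted app (for example into its logs) and passing both dictionaries to `diff_versions` would highlight any package that was upgraded locally but not on the cloud side.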