The community member is experiencing an issue with the llama-index library, where their deployments are failing due to a "Resource stopwords not found" error after updating to version 0.10.26. They are wondering if it's possible to delete, remove, or disable NLTK, as they are not using any NLTK functionalities directly.
The comments indicate that NLTK cannot be removed, but that its resources should already be bundled with the llama-index build. One community member suggests that the error might come from a specific feature whose resources are not bundled, or from the last release failing to bundle them. Another community member provides a Google Colab example where the stopwords are bundled properly and no download is needed.
The community member eventually resolved the issue by adding the following to their Dockerfile:
ENV NLTK_DATA="/opt/nltk_data"
RUN python -c "import nltk; nltk.download('stopwords', download_dir='/opt/nltk_data'); nltk.download('punkt', download_dir='/opt/nltk_data')"
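These lines download the stopwords and punkt resources into /opt/nltk_data when the image is built, and the NLTK_DATA variable tells NLTK where to look for them at runtime, which fixed the deployment issue.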
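For anyone applying the same fix, it may help to confirm at container startup that the resources actually landed in the image, so a broken build fails fast instead of failing on the first request. The following is a minimal sketch, not part of the original thread. It assumes NLTK_DATA points to a single directory and that nltk.download() stores stopwords under corpora/ and punkt under tokenizers/, either unpacked or as a .zip file. The function names and the REQUIRED tuple are hypothetical.

```python
import os
from pathlib import Path

# Assumption: resource locations relative to the NLTK data directory,
# matching what the Dockerfile lines above download.
REQUIRED = ("corpora/stopwords", "tokenizers/punkt")


def missing_nltk_resources(data_dir, required=REQUIRED):
    """Return the required resources that are not present under data_dir."""
    root = Path(data_dir)
    missing = []
    for rel in required:
        unpacked = root / rel            # e.g. /opt/nltk_data/corpora/stopwords
        zipped = root / (rel + ".zip")   # e.g. /opt/nltk_data/corpora/stopwords.zip
        if not unpacked.is_dir() and not zipped.is_file():
            missing.append(rel)
    return missing


def check_nltk_data(data_dir=None):
    """Raise RuntimeError if any required resource is missing."""
    # Falls back to the path used in the Dockerfile when NLTK_DATA is unset.
    data_dir = data_dir or os.environ.get("NLTK_DATA", "/opt/nltk_data")
    missing = missing_nltk_resources(data_dir)
    if missing:
        raise RuntimeError(
            f"NLTK resources missing under {data_dir}: {', '.join(missing)}"
        )
```

Calling check_nltk_data() once at application startup, before any llama-index import that needs the stopwords, would surface a missing resource as a clear error message. The check only looks at the filesystem, so it does not prove that NLTK can load the data.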
Hey, is it possible to delete/remove/disable nltk? I updated to llama-index 0.10.26 and my deployments are failing because of "Resource stopwords not found." It used to work fine before the update, and afaik I'm not using any nltk functionality, at least not directly. Does anyone know a fix for this?
pretty surprised to see this, unless you are using some specific feature that isn't bundled (well either that, or the last release missed bundling the resources, not sure)