The community members are discussing an error encountered when running a GPU-powered model on Colab. Suggestions point to the service_context configuration and the amount of VRAM the model requires. Community members note that the message may not be an actual error but a benign warning related to the tokenizer. They also mention that VRAM requirements vary with the specific model, and that techniques like 8-bit or 16-bit loading can help reduce them. However, there is no explicitly marked answer in the comments.
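For context, a minimal sketch of what 8-bit or 16-bit loading looks like with the transformers library; the model name below is a placeholder, not necessarily the model discussed in this thread, and bitsandbytes/accelerate must be installed for the 8-bit path:

```python
# Sketch: reducing VRAM by loading a HuggingFace model in 16-bit or 8-bit.
# The model name is a placeholder; these are alternatives, not meant to be loaded together.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "StabilityAI/stablelm-tuned-alpha-3b"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 16-bit (half precision) roughly halves VRAM compared to full fp32 weights.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# 8-bit quantization (requires bitsandbytes) roughly halves VRAM again.
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)
```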
Seems like it's something to do with service_context = ServiceContext.from_defaults(chunk_size_limit=512, llm_predictor=hf_predictor, embed_model=embed_model)
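For reference, one way the pieces in that call might be wired up, assuming an older llama_index API where ServiceContext.from_defaults accepts chunk_size_limit and llm_predictor; the hf_predictor and embed_model definitions below are a reconstruction with placeholder models, not the original poster's code:

```python
# Hypothetical reconstruction of the hf_predictor / embed_model setup referenced above.
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import ServiceContext, LLMPredictor, LangchainEmbedding

# Local text-generation pipeline (placeholder model).
hf_pipe = pipeline(
    "text-generation",
    model="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    max_new_tokens=256,
)
hf_predictor = LLMPredictor(llm=HuggingFacePipeline(pipeline=hf_pipe))

# Local embedding model wrapped for llama_index.
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)

# The call from the comment above.
service_context = ServiceContext.from_defaults(
    chunk_size_limit=512,
    llm_predictor=hf_predictor,
    embed_model=embed_model,
)
```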
Not an error, just a warning. Something to do with the tokenizer somewhere I think, but it's pretty benign (I think the chunk size being the same is just a coincidence)
Update on this one: using an A6000 with 48GB of VRAM, I'm 20 min in and I've got the same error and no response so far. How much time did it take you to load? @Logan M